CPSC 220 Introduction to Computer Architecture Fall 2009

Lab 11: A Simple C Compiler

Introduction

In this lab, which is due in a little over one week at the start of the final exam (7PM on 12/16), you will write another significant C program. You will write a simple compiler for C so that you can gain some understanding for how compilers work (although there is a lot more to learn, which you will see if you take CPSC 433). You will want to consult your class notes and the textbook for details on the C programming language as well as the following resources:

Setup

Create a lab11 directory within the ~/cs220 directory that you created in lab 1 for use in this lab. Change to the newly-created lab11 directory. There are several files in this directory. The first is scc.c (short for simple C compiler), which must be completed in this lab. In addition, there is some auxiliary code in parser.c, parser.h, and scc.h. You will need to look over these files but you should not need to modify them. There is also a Larc assembler, asm, and Larc simulator, sim, which you will need to assemble and run simple C programs compiled using your compiler. The final file, test.c, is a simple test program for testing your compiler.

To compile your C program (which, interestingly enough, is also a compiler) you will use the commercial C compiler gcc. The following command will compile scc.c into machine code that can be executed:

bash$ gcc -o scc scc.c parser.c
bash$

This command creates an executable, scc, which we can run as follows:

bash$ ./scc test.c
...

You will provide a single argument to scc, which is the C program to compile. A working version of scc will convert test.c to Larc assembly code and print the converted program to the screen (note: a commercial compiler would probably store it in a file ending in ".s").

To send this output to a file you can do the following:

bash$ ./scc test.c > test.s
bash$

You can then assemble this code and simulate it as in previous labs:

bash$ ./asm test.s
bash$ ./sim test.out
...

Initially, your program will compile although it will only print the C program back to the screen as it is not complete. You must complete it in this lab.

Note: the rest of this lab assumes you are in your lab11 directory.

Exercises

There is one exercise for this week's lab due in a little over a week at the start of the final exam on Wednesday, December 16th at 7PM.

  1. In this exercise, you will implement a simple C compiler, which you will also write in C (we will use the pre-existing commercial gcc to compile it). Your compiler will translate simple C programs into Larc assembly programs, which can then be assembled with a Larc assembler, and run with a Larc simulator. It will work similarly to the Larc assembler you wrote in lab 7. You might even find it useful to relook at the assembler lab when doing this lab.

    Simple C language. We will compile a small subset of the C language (one aspect of the language is not from C so it's technically not a subset, but it's close). We will call this language the simple C language. Many of the C features have been removed including functions, control flow, and arrays. In fact, the language is so simple that it is not a universal computing language (i.e., there are many programs that cannot be written in simple C). It includes just simple statements for performing integer, binary, and unary operations, moving values into variables, and printing variables. As such, it will allow for simple calculation (similar to the stack program from lab 10) but not much else.

    Here are the following classes of statements that can be written in simple C:

    TypeFormatExample
    Print<var> ;x ;
    Move<var> = <operand> ;x = 3 ;
    Unary<var> = <unary op> <operand> ;x = - y ;
    Binary<var> = <operand> <binary op> <operand> ;x = 3 + y ;

    A simple C programmer can write statements to perform a binary operation on two variable or integer operands, and store the result in a variable. The programmer can write statements to perform a unary operation on a single variable or integer operand, and store the result in a variable. The programmer can move a variable or integer into another variable. Finally, the programmer can print out the value of a variable by putting it in a statement by itself (note: a newline is also printed after the variable's value is printed).

    The following unary and binary operations are supported:

    TypeOperationExample
    UnaryNegationx = - 3 ;
    UnaryBitwise NOTx = ~ y ;
    BinaryAdditionx = 3 + 4 ;
    BinarySubtractionx = y - 5 ;
    BinaryMultiplicationx = 2 * y ;
    BinaryDivisionx = y / z ;
    BinaryModulusx = x % 2 ;
    BinaryBitwise ANDx = x & y ;
    BinaryBitwise ORx = x | y ;

    Here is an example simple C program called test.c, which has been included in your working directory:

    x = 20 * 2 ;
    x ;
    y = x / 3 ;
    y ;
    z = x % 3 ;
    z ;
    w = y + z ;
    w ;
    

    After compiling and running this program, it would print:

    40
    13
    1
    14
    

    There are a couple of additional simplifications to the language that make the compiler easier to implement. First, each token (variable, integer constant, operator, etc.) must be separated by one or more spaces. This includes the terminating semi-colon. Second, instructions cannot span multiple lines. Third, comments are not supported in simple C programs.

    Simple C compiler. The simple C compiler takes as input a simple C program consisting of these 4 classes of statements and generates Larc assembly code, which it prints to the screen (normally this would be sent to a file). The Larc assembly code can use any features of the Larc assembly language including extended features so long as the resulting program assembles properly. On a buggy simple C program, the compiler should print an error message and exit.

    To make this lab doable in a single week, you have been some code to get you started. In particular, you have been given a file parser.c, which will parse a simple C program and build an array of instructions, which it will return. It will also stop compilation if it finds any statements are incorrectly formatted. parser.c also contains some functions for converting between integers and strings. The interfaces to the functions in parser.c can be viewed in the header file parser.h. You should look over this file although you should not modify it or parser.c.

    In addition, there is a header file called scc.h, which contains several types you will need to use in your compiler. For example, it contains an enumeration representing a boolean type, a struct representing an operation type, and a struct representing an instruction. You will need to use the latter two types when working with the array of instructions that you get back from the parser (each element is an instruction struct, and inside the instruction struct is an operation type). You should look over this file before writing your compiler. You should not need to modify it.

    After looking over parser.h and scc.h, you should look at the function print_cprog in scc.c, which prints the C program back to the screen. Although this function is not compiling the program, it shows how to use the instruction array.

    Completing the compiler. Recall that a Larc assembly program consists of a data and text section. In the data section, we will put all the C variables. (Note: they will not actually be placed anywhere but instead printed to the screen.) We will put a word for each one of them (initially 0) with a label for accessing them. The variable name should be used for the label name. In addition, a newline string will be placed in the data section so that a newline can be printed after printing any variable.

    After the data section will come the Larc text section. In this section you will put instructions for performing every statement in the simple C program. In addition, at the end of the section, you will add Larc instructions for exiting the program.

    Your compiler will probably need to do two passes:

    1. Pass 1 -- find and check all variables, potentially keeping track of them in an array.

    2. Pass 2 -- generate the .data and .text sections of the program, printing them out to the screen.

    You might implement each pass in a separate function.

    Here are some tips to keep in mind when performing the translation:

    Error handling. The parser takes care of most of the error handling but there is one potential semantic error that you must catch: variables cannot be used before they are assigned to. For example, the following program is incorrect since y is not assigned to before it is used:

    x = y + 3 ;
    x ;
    

    Getting started. The file scc.c already contains some code to help you get started. In particular, it contains code for calling the parser to parse the program as well as code for printing out the C program to the screen (which you will want to eventually remove). You will need to complete this file, however, by completing the function compile inside it, as well as adding other functions and declarations. In addition, parser.c, scc.h, and parser.h contain some auxiliary code for parsing the C program and converting between ints and strings. You should not need to modify these functions but you should look them over (the header files in particular).

Handin

Verify that your lab11 folder contains all of the files you created or modified for this lab (e.g., scc.c), then copy your entire lab11 folder to the handin directory ~mcorliss/handin/cs220/username (where username is replaced with your username). For example, if your working directory is ~/cs220/lab11/ then you could do the following:

cp -r ~/cs220/lab11 ~mcorliss/handin/cs220/username

where username is replaced with your particular username (e.g., mcorliss).


Good luck and have fun!