CPSC 220, Fall 2012
Lab 9: Larc Assembler
For this lab, you will write (part of) a Larc assembler program that converts Larc assembly language programs into Larc machine language. This lab is taken with very minimal changes from the Larc manual.
We will discuss in the class before the lab whether everyone will work on this project individually, or whether you are allowed to work with a partner.
You will be working with a set of Java files written by Larc's author, Marc Corliss. The only change that has been made is to the main program: It was originally designed to work only on the command line; it has been modified so that when the program is run with no command-line parameters, it will use a JFileChooser to ask the user to specify the input and output files.
The Java files that you need can be found in /classes/cs220/lab9-files. A subfolder named tests contains the sample assembly language programs. You should copy lab9-files into your own account. If you want to use Eclipse, copy the Java files into an Eclipse project. The API for the supplied Java files can be found at http://math.hws.edu/eck/cs220/f12/lab9/api. You should familiarize yourself with the API before attempting to do the lab.
This is a two-week lab, but there is a checkpoint after one week. By next week, you should have a basic assembler finished and working. The basic assembler does not handle any extended instructions, and it assumes that each assembly language instruction in the program corresponds to one machine language instruction (so all labels and immediates can be represented in 8 bits). This first version of the lab should be in your homework folder before class next Tuesday.
For the second week, you will upgrade your assembler to handle labels and immediates that require more than 8 bits, and you will implement at least a few features of the extended assembly language instructions. I expect people working with a partner to implement more extended instructions than people working on their own. You should consult with me about how much is expected.
The Assembler Project
In this exercise, you will write an assembler for translating Larc assembly code into Larc machine code. For example, once your assembler is working, the following command will convert the assembly program in tests/hello-world.s to a machine code file in tests/hello-world.out:
java Assembler tests/hello-world.s
The base name of the machine code file (tests/hello-world) is taken from the name of the assembly file, and .out is appended to it. You can then run the machine code program in the simulator (as in the previous labs):
sim tests/hello-world.out
The machine language programs output by your assembler should have the same behavior as programs produced by the standard Larc assembler asm, but the two machine language programs don't necessarily have to be identical. (Note: If you run the program without a command-line argument, it will ask you to select the input and output files using a JFileChooser dialog box.)
Section 5 of the Larc manual describes the Larc assembly language in detail. (The manual was handed out in class and is available in /classes/cs220/larc/manual.) Before starting this assignment, you should read that section carefully. You are responsible for implementing an assembler that follows the specifications laid out in the manual. Note: For the first version of the assembler, you are only responsible for the base assembly instructions. For the second version, you will implement some extended instructions.
Some parts of the assembler will be provided for you such as a class for parsing a Larc assembly program (Parser.java); classes for representing a generic assembly item (Asm.java), instruction (Inst.java), data item (Datum.java), and label (Label.java); a class containing some useful variables and methods pertaining to the ISA (ISA.java); and a class containing some useful conversion methods (Convert.java). You are responsible for completing Assembler.java, which converts the parsed assembly program into a machine language program. It contains skeleton code to get you started. You will also be responsible for understanding the other files, at least functionally, even though you do not have to implement them. (A web address for the API is given above.)
Parts of Assembler.java are already completed. In particular, Assembler.java already calls a parse method from Parser.java to obtain the parsed program. This is stored in two Vectors of assembly items (type Asm). The Vectors are called insn and data. Each item in the vectors is one of: an instruction (type Inst), a data item (type Datum), or a label (type Label). (Note: Inst, Datum, and Label are subclasses of Asm.) Your assembler must perform multiple passes over these Vectors of assembly items. In the final pass, your assembler will build a Vector of binary 16-bit words (type String). The name of this vector is binaryProgram. The contents of binaryProgram are the machine language program, which will be written to the output file. (The output is already implemented in Assembler.java; you just have to put the program into the binaryProgram vector.)
Currently, Assembler.java simply prints the assembly program back to the screen in a method called printProgram() (you will eventually remove the call to printProgram()). This method is provided to show you how to work with the Vectors insn and data. You should look at it carefully before writing new code. Each pass that you perform over the assembly program will look similar to the pair of loops in printProgram(). You will probably need to make two or three passes over the assembly program. In the first pass, you will need to compute an address for each instruction, data item, and label; these addresses will be used later. In the second pass, you will patch each instruction or data item that uses a label: the reference to the label will be replaced with an immediate value. Finally, in a third pass (which could be merged with the second pass), each assembly item is converted into binary words, which are placed as strings in the Vector binaryProgram. For the basic version of the assembler, most items will correspond to one 16-bit word. The exceptions are the .asciiz and .space data directives, which may correspond to multiple 16-bit words. In the second version of the assembler, some instructions will have to be expanded into several machine language instructions.
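The first pass described above can be sketched roughly as follows. This is only an illustration, not the supplied code: ItemSketch and FirstPassSketch are hypothetical stand-ins for the real Asm, Inst, Datum, and Label classes, whose actual fields and methods are documented in the API. The sketch assumes that a label occupies no words and takes the address of the item that follows it; check the manual for how instruction and data addresses are actually laid out.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the supplied Asm hierarchy (not the real API).
class ItemSketch {
    final String kind;   // "label", "inst", or "data"
    final String name;   // label name, or null for other kinds
    final int words;     // number of 16-bit words the item occupies
    int address = -1;    // filled in by the first pass

    ItemSketch(String kind, String name, int words) {
        this.kind = kind;
        this.name = name;
        this.words = words;
    }
}

class FirstPassSketch {
    // Walks the items once, giving each an address and recording label addresses.
    static Map<String, Integer> assignAddresses(List<ItemSketch> items, int startAddress) {
        Map<String, Integer> labels = new HashMap<>();
        int next = startAddress;
        for (ItemSketch item : items) {
            item.address = next;
            if (item.kind.equals("label")) {
                // Assumption: a label names the address of the next item.
                labels.put(item.name, next);
            }
            next += item.words;   // labels are created with words == 0
        }
        return labels;
    }
}
```

In the real assembler, the word count would come from the instruction or directive itself (one word for most items in the basic version, possibly several for .asciiz and .space), and the label addresses would be recorded with the Label class rather than a local map.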
When you compute addresses for each assembly item in the first pass, you will need to keep track of the address corresponding to each label. The class Label has a static method addToMap for adding a (label name, address) pair to a map so that later you can look up the address of that particular label. To look up the address of a label, use the static method getFromMap in the Label class. If the label exists, this method will return the address that you specified for the label; otherwise, it will return -1. For example, the following adds the label "label1" to the map, giving it address 2000:
Label.addToMap("label1", 2000);
and the following sets variable addr to the address of the label named "label1":
int addr = Label.getFromMap("label1");
If "label1" has not been added to the map then Label.getFromMap("label1") returns -1.
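Because getFromMap returns -1 for an unknown name, the same call can be used both to detect a label that is defined twice and to detect a reference to a label that was never defined. The sketch below shows one possible way to do this. LabelMapSketch is a hypothetical stand-in that only mimics the two static methods described above so that the example is self-contained; the helper names define and checkReference are also invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in; in the lab, call Label.addToMap and Label.getFromMap.
class LabelMapSketch {
    private static final Map<String, Integer> map = new HashMap<>();

    static void addToMap(String name, int address) {
        map.put(name, address);
    }

    static int getFromMap(String name) {
        Integer address = map.get(name);
        return address == null ? -1 : address;
    }

    // First pass: returns an error message, or null if the definition is new.
    static String define(String name, int address) {
        if (getFromMap(name) != -1) {
            return "label defined more than once: " + name;
        }
        addToMap(name, address);
        return null;
    }

    // Second pass: returns an error message, or null if the label resolves.
    static String checkReference(String name) {
        if (getFromMap(name) == -1) {
            return "reference to undefined label: " + name;
        }
        return null;
    }
}
```

In Assembler.java, a non-null message would instead be reported through the provided error-reporting method rather than returned to the caller.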
Error handling: Your assembler will need to handle errors in the assembly file. The parser (in Parser.java) already handles several errors such as a bad register name or unknown operator. But some errors must be caught in Assembler.java. In particular, your assembler must catch the following:
- Labels that are defined more than once.
- References to non-existent labels.
- Use of any extended assembly language instruction that is not implemented by your program. (The legal assembly language opcodes for non-extended instructions are 0 to 15 plus 17 for la. In a machine language program, la is actually implemented as li. ISA.java provides symbolic names for both extended and non-extended opcodes.)
- Immediates that are out of range (at least in the first version of the program).
- Labels that map to immediate values that are too large (at least in the first version of the program).
A method assemblyError is provided for reporting errors. You should call it with the appropriate message (a string) if your assembler discovers an error. This method does not return; it exits the program after printing the error message.
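The two range checks in the list above come down to testing whether a value fits in 8 bits before it is encoded. The following is a minimal sketch under two assumptions that you should verify against the manual: that 8-bit immediates are signed two's complement values (-128 to 127), and that each entry of binaryProgram is a string of '0' and '1' characters. RangeCheckSketch and its methods are hypothetical names; Convert.java may already provide equivalent conversions, in which case use those.

```java
class RangeCheckSketch {
    // Assumption: 8-bit immediates are signed two's complement, -128..127.
    static boolean fitsInSigned8(int value) {
        return value >= -128 && value <= 127;
    }

    // Formats the low 'bits' bits of value as a string of '0' and '1'
    // characters, most significant bit first.
    static String toBinary(int value, int bits) {
        StringBuilder sb = new StringBuilder();
        for (int i = bits - 1; i >= 0; i--) {
            sb.append(((value >> i) & 1) == 1 ? '1' : '0');
        }
        return sb.toString();
    }
}
```

For example, a label whose address is 2000 fails fitsInSigned8, so the first version of the assembler would report an error for an instruction that uses it as an immediate.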