Larc Computer and Machine Language


"Larc," or "little architecture," is a very simple model computer and an ISA (Instruction Set Architecture) for that computer. It was developed about a dozen years ago by Marc Corliss for use in CS 220 when he was a professor at Hobart and William Smith Colleges. We will use it in CS 220 this semester as a first example of an ISA, and later in the semester you will create a working simulation of the Larc computer in the Logisim logic circuit simulation program.

This document describes the Larc model computer and machine language. It also discusses a very simple Larc assembly language that is really just an alternative notation for the machine language, without all the convenient features of a normal assembly language. A second document discusses a program that simulates the Larc computer.


The Larc Computer

The Larc CPU contains 16 general-purpose registers, in addition to a PC (Program Counter) register. Each register holds a 16-bit binary number. The general-purpose registers are numbered 0 to 15, or 0b0000 to 0b1111 in binary; that is, a register number is a four-bit binary number. Register number 0 has the peculiar property that its value is always zero.

The main memory, or RAM, for Larc has 65536, or 2^16, locations, and each location holds a 16-bit binary number. Locations are numbered 0 to 65535, or 0b0000000000000000 to 0b1111111111111111 in binary. The number of a location is called its address. Note that the address of a location in RAM is a 16-bit binary number.

A machine language (ML) instruction for Larc is a 16-bit binary number. The structure of the machine language is discussed below. Any RAM location can hold one instruction, but the contents of a memory location can also be interpreted in other ways, such as an integer that is meant as data for the program. The meaning of a number depends on how that number is used.

The operation of the computer is the basic fetch-and-execute cycle. The 16-bit number in the PC gives the address of the memory location that contains the next ML instruction to be executed. The computer fetches the instruction from that address, adds one to the PC, and then executes the instruction. It repeats the cycle indefinitely until it is halted in some way. Some instructions modify the PC when they are executed; these are "branch" or "jump" instructions. Adding 1 to the PC gets the PC ready for the next instruction in the usual sequence, but the sequence can be changed by a branch or jump.
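The cycle described above can be sketched in a few lines of Python. This is only an illustration, not the course's simulator: the names run, sext8, and max_cycles are made up for this sketch, only three opcodes (add, li, and bnez, taken from the table in the next section) are handled, and a cycle limit stands in for "halted in some way."

```python
MASK = 0xFFFF  # registers, addresses, and memory words are all 16 bits wide

def sext8(x):
    """Sign-extend an 8-bit field to a 16-bit value."""
    return (x | 0xFF00) if (x & 0x80) else x

def run(memory, max_cycles):
    """Run the fetch-and-execute cycle for max_cycles instructions (hypothetical helper)."""
    reg = [0] * 16
    pc = 0
    for _ in range(max_cycles):
        instr = memory[pc]            # fetch the instruction the PC points to
        pc = (pc + 1) & MASK          # add one to the PC before executing
        opcode = instr >> 12
        ra = (instr >> 8) & 0xF
        if opcode == 0b0000:          # add
            rb, rc = (instr >> 4) & 0xF, instr & 0xF
            reg[ra] = (reg[rb] + reg[rc]) & MASK
        elif opcode == 0b1000:        # li
            reg[ra] = sext8(instr & 0xFF)
        elif opcode == 0b1011:        # bnez: a taken branch modifies the PC
            if reg[ra] != 0:
                pc = (pc + sext8(instr & 0xFF)) & MASK
        else:
            raise ValueError("opcode not handled in this sketch")
        reg[0] = 0                    # register 0 always holds zero
    return reg, pc

# A five-instruction loop that adds 3 + 2 + 1 into register 2.
program = [0x8103, 0x83FF, 0x0221, 0x0113, 0xB1FD]
reg, pc = run(program, 11)
```

After the eleven cycles, register 2 holds 6 and the PC holds 5, the address just past the branch instruction.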


The Larc Machine Language

An ML instruction is a 16-bit number. The four high-order bits of an instruction are an opcode that specifies the operation encoded by the instruction, and the remaining twelve bits contain data for the operation. There are several formats for the data in an instruction, depending on the instruction's opcode.
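As a small illustration of this layout, the opcode and the data bits can be separated with a shift and a mask. The function name split_instruction is an assumption made for this sketch.

```python
def split_instruction(instr):
    """Return (opcode, data): the 4 high-order bits and the remaining 12 bits."""
    opcode = (instr >> 12) & 0xF
    data = instr & 0xFFF
    return opcode, data

opcode, data = split_instruction(0b0001001100100111)
print(format(opcode, "04b"), format(data, "012b"))  # prints: 0001 001100100111
```

How the twelve data bits are divided further depends on the opcode, as the instruction formats below show.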

The following table describes the 16 ML instructions. Reg[n] represents the value stored in register number n. The function sext(x) sign-extends x to a 16-bit number. The mnemonic for an instruction is used in the assembly language notation for that instruction. The semantics are expressed using Java-like notation.

Arithmetic format. The twelve data bits in an instruction contain three four-bit register numbers, which are referred to here as ra, rb, and rc.

Opcode  Mnemonic  Semantics
0000    add       Reg[ra] = Reg[rb] + Reg[rc]
0001    sub       Reg[ra] = Reg[rb] - Reg[rc]
0010    mul       Reg[ra] = Reg[rb] * Reg[rc]
0011    div       Reg[ra] = Reg[rb] / Reg[rc]; if Reg[rc] is zero, the computer halts because of a division-by-zero error
0100    sll       Reg[ra] = Reg[rb] << (Reg[rc] & 15); shift left logical (with zero fill)
0101    srl       Reg[ra] = Reg[rb] >>> (Reg[rc] & 15); shift right logical (with zero fill)
0110    nor       Reg[ra] = ~(Reg[rb] | Reg[rc]); bitwise logical NOR operation
0111    slt       Reg[ra] = (Reg[rb] < Reg[rc]) ? 1 : 0; set if less than

Immediate format. The first four data bits, ra, represent a register number. The last eight data bits, shown here as limm, represent a signed 8-bit number. "limm" stands for "long immediate," and an "immediate" is a field in an instruction that represents a constant rather than a register number.

Opcode  Mnemonic  Semantics
1000    li        Reg[ra] = sext(limm); load immediate
1001    lui       Reg[ra] = limm << 8; load upper immediate
1010    beqz      if (Reg[ra] == 0) PC = PC + sext(limm); branch if equal to zero
1011    bnez      if (Reg[ra] != 0) PC = PC + sext(limm); branch if not equal to zero

Memory format. The data bits are two 4-bit register numbers, ra and rb, and a signed 4-bit number, simm. "simm" stands for "short immediate."

Opcode  Mnemonic  Semantics
1100    lw        Load value from memory location Reg[rb]+sext(simm) into Reg[ra]
1101    sw        Store value from Reg[ra] into memory location Reg[rb]+sext(simm)

Jump format. Two 4-bit fields, ra and rb; the last four data bits are ignored.

Opcode  Mnemonic  Semantics
1110    jalr      Jump-and-link-register: save current PC in Reg[ra] and set PC to Reg[rb]

Syscall format. All data bits are ignored.

Opcode  Mnemonic  Semantics
1111    syscall   Call system subroutine number Reg[1]

For example, the 16-bit binary number 0b0001001100100111 represents a subtraction operation (opcode 0001) applied to register numbers 3, 2, and 7 (binary numbers 0011, 0010, and 0111). That is, when it is executed, the value in register 7 is subtracted from the value in register 2, and the result is stored in register 3.

The number 0b0000100011110000 means to add the values from registers 15 and 0, and store the result in register 8. Since the value in register 0 is always zero, this has the effect of copying the value from register 15 into register 8.

Or consider 0b1100000100100011. This is a "load" command with arguments 1, 2, and 3. It takes the value from register 2 and adds the number 3 to that value, and it uses the result as the address of a location in memory. The value from that location is copied into register 1. (The simm value in a memory instruction will most often be zero. Non-zero values are sometimes useful when you want to load a sequence of values from consecutive memory locations.)
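Decoding by hand, as in the examples above, can be checked with a short Python sketch that turns a 16-bit instruction into the textual notation described later in this document. The names MNEMONICS, signed, and disassemble are assumptions made for this illustration, and the choice to print every immediate as a signed decimal number is also just a convention of the sketch.

```python
MNEMONICS = ["add", "sub", "mul", "div", "sll", "srl", "nor", "slt",
             "li", "lui", "beqz", "bnez", "lw", "sw", "jalr", "syscall"]

def signed(value, bits):
    """Interpret a bits-wide field as a signed (two's complement) number."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def disassemble(instr):
    """Translate one 16-bit Larc instruction into assembly notation."""
    opcode = instr >> 12
    ra, rb, rc = (instr >> 8) & 0xF, (instr >> 4) & 0xF, instr & 0xF
    name = MNEMONICS[opcode]
    if opcode <= 0b0111:                      # arithmetic: ra, rb, rc
        return f"{name} ${ra} ${rb} ${rc}"
    if opcode <= 0b1011:                      # immediate: ra, limm
        return f"{name} ${ra} {signed(instr & 0xFF, 8)}"
    if opcode <= 0b1101:                      # memory: ra, rb, simm
        return f"{name} ${ra} ${rb} {signed(rc, 4)}"
    if opcode == 0b1110:                      # jump: ra, rb (last 4 bits ignored)
        return f"{name} ${ra} ${rb}"
    return name                               # syscall: all data bits ignored

print(disassemble(0b0001001100100111))  # prints: sub $3 $2 $7
print(disassemble(0b0000100011110000))  # prints: add $8 $15 $0
print(disassemble(0b1100000100100011))  # prints: lw $1 $2 3
```

The third call uses the encoding of a load with arguments 1, 2, and 3: opcode 1100 followed by the fields 0001, 0010, and 0011.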

The load-upper-immediate (lui) command exists because a limm is an 8-bit value, but a register holds a 16-bit value. The lui command loads an 8-bit constant value into the leftmost, or upper, eight bits of a register, and fills the lower 8 bits of the register with zeros. Note that it can be difficult to get an arbitrary 16-bit constant into a register; it requires a combination of lui, li, and other commands.
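One possible way to build an arbitrary 16-bit constant is the three-instruction sequence lui, li, add, using a second register as scratch space. The complication is that li sign-extends its constant, so when the low byte is 0x80 or more, the upper byte given to lui has to be one larger to cancel the extra 0xFF00. The Python sketch below checks that idea; the function names are hypothetical, and this is one approach among several, not a method prescribed by Larc.

```python
def constant_fields(value):
    """Return (lui_limm, li_limm) for loading a 16-bit value with lui, li, add."""
    low = value & 0xFF
    high = (value >> 8) & 0xFF
    if low & 0x80:
        high = (high + 1) & 0xFF      # compensate for li sign-extending the low byte
    return high, low

def simulate(high, low):
    """Mimic: lui $1 high; li $2 low; add $1 $1 $2. Returns the final Reg[1]."""
    upper = (high << 8) & 0xFFFF                       # lui: limm << 8
    lower = (low | 0xFF00) if (low & 0x80) else low    # li: sext(limm)
    return (upper + lower) & 0xFFFF                    # add, kept to 16 bits

high, low = constant_fields(0x12F3)
print(hex(high), hex(low), hex(simulate(high, low)))   # prints: 0x13 0xf3 0x12f3
```

For 0x12F3, the sketch gives lui the value 0x13 rather than 0x12, because li 0xF3 produces 0xFFF3 and the sum wraps around to 0x12F3.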

The jalr command can be used to jump to any memory address. Before jumping, it stores the current PC value in a register to make it easy to jump back to that address later; this can be used for calling and returning from subroutines. The conditional branch instructions, beqz and bnez, can only be used to jump to nearby addresses. They add a constant amount to the current value of the PC. The constant is treated as a signed 8-bit value, in the range -128 to 127. Note that the constant is added to the PC value that has already been incremented as part of the fetch-and-execute cycle, so the constant actually gives an offset from the instruction that follows the branch instruction in memory.
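The offset arithmetic for branches is easy to get wrong by one, so here is a small Python sketch of it in both directions. The function names are invented for this illustration, and the second function ignores wraparound at the ends of memory.

```python
def branch_target(branch_address, limm):
    """Address reached when a beqz/bnez stored at branch_address is taken."""
    offset = limm - 256 if (limm & 0x80) else limm    # sext of the 8-bit field
    return (branch_address + 1 + offset) & 0xFFFF     # PC was already incremented

def branch_limm(branch_address, target):
    """8-bit limm field that makes a branch at branch_address reach target."""
    offset = target - (branch_address + 1)
    if not -128 <= offset <= 127:
        raise ValueError("target is too far away for beqz/bnez")
    return offset & 0xFF

print(branch_target(4, 0xFD))    # prints: 2
print(hex(branch_limm(4, 2)))    # prints: 0xfd
```

A limm of zero therefore does not loop on the branch itself; it simply continues with the next instruction. To jump back to the branch instruction, the offset must be -1.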

The syscall instruction needs more explanation. It really only makes sense if the computer is running under the control of an operating system. In a real computer, the operating system has full access to all aspects of the computer, while other programs have restricted access. A syscall instruction is used to call subroutines in the operating system. This is different from calling a normal subroutine because it means removing the restrictions that apply to normal programs. We will not implement an operating system, but the Larc Simulator fakes a few system subroutines to give syscall something useful to do. Note that in a real computer, a system error, such as division by zero, also transfers control to the operating system.


Larc ML as Assembly Language Instructions

An ML instruction is a 16-bit binary number. But for our purposes, we can also represent ML instructions in a textual form, using the mnemonics from the above table. An instruction is represented by a mnemonic, followed by its arguments, separated by spaces.

In an assembly language command, an argument that is a register number is represented by a $ followed by a decimal number in the range 0 to 15: $0, $1, $2, $3, ..., $15. A limm immediate value, which is an 8-bit binary number, can be represented by a decimal constant in the range -128 to 127, by a hexadecimal constant in the range 0x00 to 0xFF, or by a binary constant in the range 0b00000000 to 0b11111111. (Note that Java notation is always used for binary and hexadecimal constants.) Similarly, the simm value in a memory instruction, which is a 4-bit binary number, can be represented by a decimal constant in the range -8 to 7, by a hexadecimal constant in the range 0x0 to 0xF, or by a binary constant in the range 0b0000 to 0b1111.
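To make the argument notation concrete, here is a Python sketch that converts register and immediate arguments into the bit fields of an instruction. The function names are hypothetical. For simplicity the sketch is more lenient than the ranges listed above: it accepts any value whose bit pattern fits in the field, whatever notation is used, and it does not describe how the actual Larc tools validate their input.

```python
def parse_register(text):
    """Convert '$0' .. '$15' to a register number."""
    if not text.startswith("$") or not text[1:].isdigit():
        raise ValueError(f"not a register: {text!r}")
    number = int(text[1:])
    if number > 15:
        raise ValueError(f"no such register: {text!r}")
    return number

def parse_immediate(text, bits):
    """Convert a decimal, 0x..., or 0b... constant to a bits-wide field."""
    value = int(text, 0)          # base 0 understands the 0x and 0b prefixes
    if not -(1 << (bits - 1)) <= value < (1 << bits):
        raise ValueError(f"{text!r} does not fit in {bits} bits")
    return value & ((1 << bits) - 1)   # two's complement bit pattern

print(parse_register("$15"))              # prints: 15
print(hex(parse_immediate("-13", 8)))     # prints: 0xf3
print(hex(parse_immediate("0xF3", 8)))    # prints: 0xf3
```

The last two calls show that -13 and 0xF3 are two ways of writing the same eight bits.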

Some examples: add $2 $2 $3, beqz $2 -13, li $1 0xF3, sw $3 $2 0, jalr $11 $6, syscall.

The following table shows the assembly language syntax for Larc machine language instructions. Here, ra, rb, and rc represent register numbers in the range 0 to 15. LIMM and SIMM represent constants with the appropriate number of bits. There are two entries for each memory instruction, because the SIMM value can be omitted from the assembly language instruction when its value is zero.

add $ra $rb $rc     Reg[ra] = Reg[rb] + Reg[rc]
sub $ra $rb $rc     Reg[ra] = Reg[rb] - Reg[rc]
mul $ra $rb $rc     Reg[ra] = Reg[rb] * Reg[rc]
div $ra $rb $rc     Reg[ra] = Reg[rb] / Reg[rc]
sll $ra $rb $rc     Reg[ra] = Reg[rb] << (Reg[rc] & 15)
srl $ra $rb $rc     Reg[ra] = Reg[rb] >>> (Reg[rc] & 15)
nor $ra $rb $rc     Reg[ra] = ~(Reg[rb] | Reg[rc])
slt $ra $rb $rc     Reg[ra] = (Reg[rb] < Reg[rc]) ? 1 : 0
li $ra LIMM         Reg[ra] = sext(LIMM)
lui $ra LIMM        Reg[ra] = LIMM << 8
beqz $ra LIMM       if Reg[ra] == 0, then PC = PC + sext(LIMM)
bnez $ra LIMM       if Reg[ra] != 0, then PC = PC + sext(LIMM)
lw $ra $rb          Reg[ra] = Mem[ Reg[rb] ]
sw $ra $rb          Mem[ Reg[rb] ] = Reg[ra]
lw $ra $rb SIMM     Reg[ra] = Mem[ Reg[rb] + sext(SIMM) ]
sw $ra $rb SIMM     Mem[ Reg[rb] + sext(SIMM) ] = Reg[ra]
jalr $ra $rb        temp = PC; PC = Reg[rb]; Reg[ra] = temp
syscall             call system subroutine number Reg[1]
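Because this notation maps one-to-one onto machine instructions, translating a line of it into a 16-bit number is mostly a matter of packing fields. The Python sketch below shows the idea; the names OPCODES and assemble are assumptions for this illustration, and the sketch does no error checking, so it should not be read as a description of any real Larc assembler.

```python
OPCODES = {name: code for code, name in enumerate(
    ["add", "sub", "mul", "div", "sll", "srl", "nor", "slt",
     "li", "lui", "beqz", "bnez", "lw", "sw", "jalr", "syscall"])}

def assemble(line):
    """Encode one line of Larc assembly as a 16-bit machine instruction."""
    mnemonic, *args = line.split()
    opcode = OPCODES[mnemonic]
    regs = [int(a[1:]) for a in args if a.startswith("$")]     # $n arguments
    imms = [int(a, 0) for a in args if not a.startswith("$")]  # constants
    word = opcode << 12
    if opcode <= 7:                       # arithmetic: ra rb rc
        ra, rb, rc = regs
        word |= ra << 8 | rb << 4 | rc
    elif opcode <= 11:                    # immediate: ra LIMM
        word |= regs[0] << 8 | (imms[0] & 0xFF)
    elif opcode <= 13:                    # memory: ra rb, with optional SIMM
        simm = imms[0] if imms else 0
        word |= regs[0] << 8 | regs[1] << 4 | (simm & 0xF)
    elif opcode == 14:                    # jump: ra rb, last four bits left zero
        word |= regs[0] << 8 | regs[1] << 4
    return word                           # syscall: data bits left zero

for text in ["add $2 $2 $3", "beqz $2 -13", "sw $3 $2 0", "jalr $11 $6"]:
    print(format(assemble(text), "016b"), text)
```

Applied to the examples given earlier, add $2 $2 $3 becomes 0x0223, beqz $2 -13 becomes 0xA2F3, and syscall becomes 0xF000.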