Programming Assignment PA10: CS 270 Computer Organization

CS 270, Fall 2014
Programming Assignment PA10
LC-3 Assembler in C

PA10 due Thursday, Dec. 12 at 11:59pm, no late submissions.

Goals of Assignment

In this assignment, you will complete an assembler for the LC3 assembly language by completing the file assembler.c. You will reuse code that you wrote in previous assignments, add new code and integrate with code provided to you. Some of that code is C source code. Other parts are just a library. You are given header files so that you know what functionality is provided and can call functions in the library even though you do not have the source code. This assignment serves several purposes:

Learn how to translate an assembly language to binary code
Learn how to do simple file I/O in C
Learn to decompose functionality into smaller pieces
Practice working with structures and pointers
Practice working on a larger project

Overview of an assembler

An assembler is a program that translates assembly language statements into machine code. For example, it must translate ADD R1,R2,R3 to the hex code 0x1283. It is like a compiler, except that the language it deals with is much simpler than a high-level language like Java or C. Several things make assembly language easier to deal with:

Every statement is on a single line of source code
There is at most one statement per source line
The syntax is very regular and is quite simple.

For the LC3 assembly language the syntax is:


    [label] opcode operands [; optional end of line comment]
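To make the ADD R1,R2,R3 example from the start of this section concrete, the sketch below packs the fields of a register-mode ADD into one 16-bit word. The function name encode_add_reg is hypothetical and is not part of the provided library; the bit layout is the standard LC-3 format for ADD with two source registers.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the assignment's library):
 * pack a register-mode ADD into one LC-3 word.
 * Layout: bits 15..12 = 0001 (opcode), 11..9 = DR, 8..6 = SR1,
 *         5..3 = 000, 2..0 = SR2. */
uint16_t encode_add_reg(unsigned dr, unsigned sr1, unsigned sr2)
{
    return (uint16_t)((0x1u << 12)      /* opcode ADD        */
                    | ((dr  & 7u) << 9) /* destination       */
                    | ((sr1 & 7u) << 6) /* first source      */
                    |  (sr2 & 7u));     /* second source     */
}
```

Calling encode_add_reg(1, 2, 3) yields 0x1283, which matches the hex code quoted above.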

The assembler reads the source code a line at a time, analyzes each line and produces the output file(s) required to run the program. The assembler produces two output files: 1) an object file containing the code; 2) a symbol table file. Because the assembly code may contain references to things that have not yet been encountered (e.g. a branch to a location later in the code), the assembler normally makes two passes over the "code".

The first pass of the assembler must, at a minimum, do two things:

Verify that each line is syntactically correct. This involves identifying the LC3 operator (if any) and verifying that the operands are correct in number and type for that operator. If a line is empty or contains only a comment, simply continue with the following line. For lines that contain code, you must determine what the address of the next instruction is. Most instructions take only one word, and in those cases the address is simply incremented. Pseudo-ops such as .ORIG and .BLKW may update the address differently.
Whenever a line contains a label, insert it and its address into the symbol table. This is required so that the PCoffset for the LD/ST/LDI/STI/BR/JSR/LEA operators can be computed in the second pass.

Additionally, the first pass may store the results of the syntactic analysis and pass them on to the second pass. This makes the second pass easier, but it requires building a list of information about each instruction.

Alternatively, you can skip storing this information and only create the symbol table. Then, in the second pass, the source file is re-read and the syntactic analysis is done again. At this point there are no syntactic errors, because they would have been found in the first pass. This approach requires reading the source file twice.
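The address bookkeeping of the first pass can be sketched as follows. Everything here is a simplified stand-in: the types demo_line and demo_symbol and the function demo_pass_one are hypothetical, the lines are assumed to be already tokenized, and duplicate-label detection is left out. The real assignment works with line_info_t and the symbol table declared in the provided headers.

```c
#include <string.h>

/* Hypothetical, simplified stand-ins for illustration only. */
struct demo_line {
    const char *label;   /* NULL when the line has no label          */
    const char *opcode;  /* e.g. "ADD", ".ORIG", ".BLKW", ".END"     */
    int         value;   /* operand of .ORIG or .BLKW, else unused   */
};

struct demo_symbol {
    const char *name;
    int         address;
};

/* Walk the lines once and record the address of every label.
 * Returns the number of symbols stored, or -1 if the table is full. */
int demo_pass_one(const struct demo_line *lines, int nlines,
                  struct demo_symbol *table, int capacity)
{
    int addr = 0;
    int nsyms = 0;

    for (int i = 0; i < nlines; i++) {
        if (strcmp(lines[i].opcode, ".ORIG") == 0) {
            addr = lines[i].value;       /* sets the address, emits nothing */
            continue;
        }
        if (lines[i].label != NULL) {
            if (nsyms == capacity)
                return -1;
            table[nsyms].name = lines[i].label;
            table[nsyms].address = addr;
            nsyms++;
        }
        if (strcmp(lines[i].opcode, ".BLKW") == 0)
            addr += lines[i].value;      /* reserves several words */
        else if (strcmp(lines[i].opcode, ".END") != 0)
            addr += 1;                   /* ordinary line: one word */
    }
    return nsyms;
}
```

The point of the sketch is only that a label is bound to the address in effect before the line's own words are counted.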

The second pass of the assembler is responsible for generating the object code from the .asm file. This pass should write a .obj or .hex file. The actual work depends on how the second pass was structured. It may:

scan a data structure created with the first pass, or
re-read the source file and reprocess each line.

In either case, it must generate the LC3 word(s) that are required for each instruction. This involves creating the correct 16-bit pattern(s) that define the instruction. When an LD/ST/LDI/STI/BR/JSR/LEA instruction is encountered, the code needs to compute the PCoffset, determine if it is in range and insert it into the bit pattern. Offsets out of range are reported and are the only errors generated during the second pass.
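The PCoffset computation described above can be sketched as three small helpers. The names are hypothetical. The sketch assumes the standard LC-3 behaviour in which the PC has already been incremented when the offset is applied, and the standard field widths of 9 bits for LD/ST/LDI/STI/LEA/BR and 11 bits for JSR.

```c
#include <stdbool.h>

/* Hypothetical helper: offset from the instruction at instr_addr to the
 * label at target_addr. The +1 accounts for the already incremented PC. */
int demo_pc_offset(int instr_addr, int target_addr)
{
    return target_addr - (instr_addr + 1);
}

/* True when value fits in a two's-complement field of the given width. */
bool demo_fits_signed(int value, int bits)
{
    int lo = -(1 << (bits - 1));
    int hi = (1 << (bits - 1)) - 1;
    return value >= lo && value <= hi;
}

/* Keep only the low bits so a negative offset can be OR-ed into the word. */
unsigned demo_field(int value, int bits)
{
    return (unsigned)value & ((1u << bits) - 1u);
}
```

A caller would compute the offset, report an error if demo_fits_signed() fails, and otherwise OR demo_field() into the instruction word.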

Getting Started

Create a directory for this assignment and cd to it
Either copy this PA10.tar file and unpack it, or copy the following files to the directory you created. It is easiest to right click on the link, and do a Save Target As.. for each of the files.
- assembler.c (complete this file)
- assembler.h (do not modify)
- field.h (do not modify)
- lc3as.a (library binary)
- lc3.h (do not modify)
- main.c (do not modify)
- Makefile (do not modify)
- seeLC3.c (do not modify)
- symbol.h (do not modify)
- testTokens.c (do not modify)
- tokens.h (do not modify)
- util.h (do not modify)
Build the executable by typing make. There should be two warnings about unused variables.

You are now ready to begin implementation. However, before writing ANY code, you should make yourself a map of all the components in this project, what functionality each provides, and how they relate to one another. In doing this you are creating a model of the project. Modeling is something you will formally study when you take your Software Engineering course cs314.

Implementing Convenience functions

There are many small tasks that one can identify before taking on the main ones. For example, the program will need to open several files. A good program will report errors to notify the user of any problems. One tactic is to wrap the two operations (file opening and error reporting) into a small convenience function. Similarly, registers are used frequently in the LC3 language. A convenience function to get a register and report an error on failure may prove useful. This is a list of convenience functions that you may find useful in your assembler. They are optional and you do not need to implement or use them unless you want to.

open_read_or_error()
open_write_or_error()
get_reg_or_error()
get_comma_or_error()
get_immediate_or_error()
get_PCoffset_or_error()

Each of these is just a few lines of code. A suggested way of completing these and other functions is to write only a couple of lines of C, save your result, and compile. Small syntax errors can lead to a large number of error messages. Thus, by doing just a few lines at a time, you can be sure that the new lines caused all the errors. Do NOT add more lines until you have a clean compile.

Although there is no way to test them directly at this point, they are simple enough you should be able to "test" them by inspection. You might want to implement them only if/when your other code requires them.
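As an illustration of how small such a convenience function can be, here is one possible shape for open_read_or_error(). The single-parameter signature and the use of stderr are assumptions; the real function may take different arguments and should report the problem through whatever error mechanism the provided headers define.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Sketch only: parameter list and error reporting are assumptions.
 * Opens file_name for reading and prints a message if that fails. */
FILE *open_read_or_error(const char *file_name)
{
    FILE *fp = fopen(file_name, "r");
    if (fp == NULL)
        fprintf(stderr, "cannot open '%s' for reading: %s\n",
                file_name, strerror(errno));
    return fp;   /* NULL tells the caller to stop */
}
```

The caller only has to test the result against NULL, which keeps the error handling out of the main logic.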

Stubbing in `asm_pass_two()`

This may seem like putting the cart before the horse, but a little work here will greatly aid in writing and debugging asm_pass_one().

The second pass of the assembler will loop over the data structure created in the first pass and generate code. So, write a loop that traverses the elements of the linked list defined by infoHead and infoTail and calls asm_print_line_info() on each element.
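The traversal described above is an ordinary singly linked list walk. In the sketch below, demo_info_t and its next field are hypothetical; the real node type is declared in assembler.h and its field names may differ. The visit callback stands in for the call to asm_print_line_info().

```c
#include <stddef.h>

/* Hypothetical node type, for illustration only. */
typedef struct demo_info {
    int               lineNum;
    struct demo_info *next;    /* NULL marks the end of the list */
} demo_info_t;

/* Visit every node starting at head and call visit on each one.
 * Returns the number of nodes visited. */
int demo_walk(demo_info_t *head, void (*visit)(const demo_info_t *))
{
    int count = 0;
    for (demo_info_t *p = head; p != NULL; p = p->next) {
        if (visit != NULL)
            visit(p);
        count++;
    }
    return count;
}
```

Starting from the head and stopping at NULL means the tail pointer is not needed for reading the list; it only matters when appending.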

Implementing `asm_pass_one()`, phase 1

Study the documentation for this routine. You may translate this outline directly into code. For initial testing, two code snippets will prove useful.

Between steps 3 and 4, print the line you read and call print_tokens().
Replace step 4.3 with code that saves the source line in the reference field using the function strdup().

To test your code, make the assembler and run it with one or more small assembly files. The name of your assembler is mylc3as. You should get several things:

each source line and the tokens that it contains
a listing of the source file with source code line numbers in it
a symbol table file (.sym) with a header, but no symbols
a symbol table file (.sym) with a header, but no symbols

You can see from step 4.4 that this assembler is building a data structure that will be re-used in the second pass. This will let you practice your C dynamic memory management skills.
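The memory management involved can be sketched as below: each source line is copied to the heap and appended to a list that keeps both a head and a tail pointer. All names are hypothetical. The copy routine is written out by hand so the sketch does not depend on any feature-test macros; it has the same effect as strdup().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node and list types, for illustration only. */
typedef struct demo_node {
    char             *text;   /* heap copy of one source line */
    struct demo_node *next;
} demo_node_t;

typedef struct {
    demo_node_t *head;
    demo_node_t *tail;
} demo_list_t;

/* Same effect as strdup(s): a heap copy of s, or NULL on failure. */
char *demo_copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Append a copy of line to the list. Returns 0 on success, -1 on failure. */
int demo_append(demo_list_t *list, const char *line)
{
    demo_node_t *node = calloc(1, sizeof *node);
    if (node == NULL)
        return -1;
    node->text = demo_copy_string(line);
    if (node->text == NULL) {
        free(node);
        return -1;
    }
    if (list->tail != NULL)
        list->tail->next = node;   /* link after the current last node */
    else
        list->head = node;         /* first node in an empty list */
    list->tail = node;
    return 0;
}

/* Release every node and the string it owns. */
void demo_free_list(demo_list_t *list)
{
    demo_node_t *p = list->head;
    while (p != NULL) {
        demo_node_t *next = p->next;
        free(p->text);
        free(p);
        p = next;
    }
    list->head = list->tail = NULL;
}
```

The copy matters because the buffer used to read a line is normally reused for the next line; storing a pointer to that buffer would leave every node pointing at the same text.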

To help you understand what your output should look like, you may execute the program ~cs270/PA10/phase1. This program is a reference implementation of the code described above.

Implementing `asm_pass_one()`, phase 2

You are now ready to add a bit more. Specifically, you will implement three functions:

check_for_label()
update_address()
check_line_syntax()

Study the documentation for these methods. You can code check_for_label() from its description.

Now start to code update_address(). Just handle the "default" case. How should the "output" of your program change? Build it, and run it and verify that your "output" has changed in the way you expected.

Now begin to code check_line_syntax(). Specifically, write code for the first four steps. Step 5 is an error check and will be deferred until later. In your asm_pass_one(), add the call to check_line_syntax(). Once you have completed this, you can test by running your assembler with sample file(s). If you have labels in your assembly code, those labels should appear in the symbol table file. And the opcode field printed out should correspond to the opcode on each line. Study the sample code in file seeLC3.c for ideas.

To help you understand what your output should look like, you may execute the program ~cs270/PA10/phase2.


Implementing `asm_pass_one()`, phase 3

This phase is where you must finish translating the assembly code for each line into values in the line_info_t structure. First remove the code that stores the source line in the reference field.

This phase will be much easier if you do incremental development. To do this, you want a series of "small" assembly language programs that will be used to test code in your assembler. The programs need not even be legal LC3 programs. For example, you can test your handling of .ORIG with a file that contains only a single statement. Develop the code necessary to handle that single instruction and make sure it works. Create additional examples and add the code necessary to support them. Instructions that take NO operands are particularly easy.

Code the function scan_operands(). Study the documentation to determine what it does. The basic structure is a loop. However, the loop is a little different than what you are used to writing. How do the values of the operand_t vary from one to the next? Once you understand this, the loop is easy to write. For each operand, call get_operand(). Once you have written this function, build your assembler and test it with "small" assembly files. Do you understand the output, and does it match what you think should be output?
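One common way to arrange such a loop is sketched below. It rests on an assumption that should be checked against the provided headers: that each operand kind is encoded as a single-bit flag, so that successive kinds are successive powers of two. The enum values and the function demo_scan are hypothetical and are not the real operand_t.

```c
/* Hypothetical flag values; consult the provided headers for the
 * real operand_t before relying on this pattern. */
enum demo_operand {
    DEMO_OP_R1   = 0x01,
    DEMO_OP_R2   = 0x02,
    DEMO_OP_R3   = 0x04,
    DEMO_OP_IMM  = 0x08,
    DEMO_OP_LAST = DEMO_OP_IMM
};

/* Visit each operand kind present in mask, lowest bit first.
 * Stores the kinds found in out and returns how many were stored. */
int demo_scan(unsigned mask, unsigned *out, int cap)
{
    int n = 0;
    for (unsigned bit = 1; bit <= DEMO_OP_LAST; bit <<= 1) {
        if ((mask & bit) != 0 && n < cap)
            out[n++] = bit;   /* real code would handle this operand here */
    }
    return n;
}
```

Under that assumption the loop variable is shifted left rather than incremented, which is what makes it look different from a counting loop.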

Once you have completed scan_operands(), code the function get_operand(). Fill in the code for one of the cases and build your assembler. Then test it with a simple assembly file that will exercise that case. Code a little, build, test to perfection, and repeat until you are done!
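As an example of how small one such case can be, the sketch below recognizes a register token. The function demo_parse_reg is hypothetical, and accepting a lowercase "r" is an assumption. Check the provided headers first, since an equivalent routine may already exist there.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 0..7 for the tokens "R0".."R7"
 * (upper or lower case), or -1 if tok is not a register name. */
int demo_parse_reg(const char *tok)
{
    if (tok == NULL || strlen(tok) != 2)
        return -1;
    if (toupper((unsigned char)tok[0]) != 'R')
        return -1;
    if (tok[1] < '0' || tok[1] > '7')
        return -1;
    return tok[1] - '0';
}
```

A small file containing one instruction with a valid register and another with an invalid one, such as R8, is enough to exercise both outcomes.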


Implementing `asm_pass_two()`

This phase is where you must use the values in the line_info_t structure to generate machine code. This involves taking the various operands discovered in the first pass and encoding them in the 16-bit LC3 word. The only error checking performed is to make sure that the PCoffset value is in range.

Write code incrementally, handling a single LC3 instruction at a time. You might do the ones with no operands first. Test to make sure the simple instructions work correctly. Add more cases and continue testing.
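Two encodings that differ from the plain register case are sketched below: ADD with an immediate, and BR with its condition codes. The function names are hypothetical; the bit positions follow the standard LC-3 instruction formats, in which imm5 holds values from -16 to 15 and the n, z and p flags occupy bits 11, 10 and 9.

```c
#include <stdint.h>

/* Hypothetical helper: true when v fits in the 5-bit signed immediate. */
int demo_imm5_ok(int v)
{
    return v >= -16 && v <= 15;
}

/* ADD DR,SR1,#imm5: opcode 0001, bit 5 set to select immediate mode. */
uint16_t demo_encode_add_imm(unsigned dr, unsigned sr1, int imm5)
{
    return (uint16_t)(0x1000u
                    | ((dr  & 7u) << 9)
                    | ((sr1 & 7u) << 6)
                    | 0x0020u
                    | ((unsigned)imm5 & 0x1Fu));  /* keep low 5 bits */
}

/* BR: opcode 0000, n/z/p flags in bits 11..9, PCoffset9 in bits 8..0. */
uint16_t demo_encode_br(int n, int z, int p, int offset9)
{
    return (uint16_t)(((n ? 1u : 0u) << 11)
                    | ((z ? 1u : 0u) << 10)
                    | ((p ? 1u : 0u) << 9)
                    | ((unsigned)offset9 & 0x1FFu));
}
```

For instance, demo_encode_add_imm(5, 6, 8) gives 0x1BA8. Whatever helpers you write, compare the result for each instruction against the output of lc3as.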

At some point you will want to turn off your debugging output, if you have any. Do NOT remove this code; you may need it again. You may comment it out or, if you are adventurous, learn to use conditional compilation to turn your debugging output off and on without changing your code.
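A minimal sketch of conditional compilation for debugging output follows. The macro name DEBUG_PRINT, the symbol DEBUG, and the function demo_add_words are all hypothetical choices; the idea is that the debugging calls stay in the source and are enabled only when the symbol is defined at compile time, for example by passing -DDEBUG to the compiler.

```c
#include <stdio.h>

/* With DEBUG defined, DEBUG_PRINT writes to stderr.
 * Without it, the macro expands to a no-op and the calls vanish. */
#ifdef DEBUG
#define DEBUG_PRINT(...) fprintf(stderr, __VA_ARGS__)
#else
#define DEBUG_PRINT(...) ((void)0)
#endif

/* Example function whose debugging line can be switched on and off
 * without editing the function itself. */
int demo_add_words(int a, int b)
{
    DEBUG_PRINT("adding %d and %d\n", a, b);
    return a + b;
}
```

Sending the messages to stderr keeps them out of any output file that the program writes to stdout or to disk.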


Testing your code

You should be able to test your assembler with any valid LC-3 program, but we have provided several files, including some with one instruction each, to allow you to do incremental development. The test files that are currently available are here, and more may be added.


Disassembler Option

For the honors or extra credit option you can use all of the same utilities and files to write a disassembler. A disassembler reads a .hex file (no .obj files!) and produces LC-3 source code. Here is an overview of how this can be accomplished:

Create an option to mylc3as called -disassemble.
When called with -disassemble, change asm_pass_one as follows:
- Add code to read a file in .hex format.
- For each line in the file, allocate a line_info structure.
- For each line in the file, fill in the machineCode field.
- Set the line_info fields based on the machineCode values.
- This requires you to interpret the bits in the machineCode field.
- Use the existing utilities to read the symbol table.
When called with -disassemble, change asm_pass_two as follows:
- Iterate over the line_info structures, as before.
- Translate the line_info fields into assembly mnemonics.
- For example, x1BA8 turns into ADD R5,R6,#8.
- Query the symbol table for each address, and print the label.
Test your code by assembling a .asm file with lc3as and disassembling with your program.
Some formatting tips so that we can test:
- The label should be in the columns 0..14 of the output, left justified.
- No labels that we test with will ever exceed 14 characters.
- The opcode should be in columns 15..20 of the output, left justified.
- No opcodes that we test with will ever exceed 5 characters.
- The operands (or assembler directives) should start at column 21 of the output, left justified.
- There should be no spaces between the operands, just a comma.
- There should be no comments in the file.

For example:
    
Label         ADD   R5,R6,#8
1234567890123456789012345678901234567890
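The bit interpretation needed for disassembly can be sketched for a single opcode. The functions demo_sext and demo_disasm_add are hypothetical, handle only ADD, and write the mnemonic and operands separated by one space rather than in the column layout required above. The field positions follow the standard LC-3 ADD format.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: sign-extend the low `bits` bits of value. */
int demo_sext(unsigned value, int bits)
{
    int v = (int)(value & ((1u << bits) - 1u));
    if (v & (1 << (bits - 1)))
        v -= (1 << bits);       /* top bit set: value is negative */
    return v;
}

/* Write the text form of an ADD word into buf.
 * Returns 0 on success, or -1 if word is not an ADD instruction. */
int demo_disasm_add(uint16_t word, char *buf, size_t len)
{
    unsigned dr  = (word >> 9) & 7u;
    unsigned sr1 = (word >> 6) & 7u;

    if ((word >> 12) != 0x1)
        return -1;
    if (word & 0x20u)           /* bit 5 set: immediate mode */
        snprintf(buf, len, "ADD R%u,R%u,#%d", dr, sr1, demo_sext(word, 5));
    else                        /* bit 5 clear: register mode */
        snprintf(buf, len, "ADD R%u,R%u,R%u", dr, sr1, word & 7u);
    return 0;
}
```

Applied to the word 0x1BA8 this produces ADD R5,R6,#8, the example given in the overview.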

    
Grading Information:
- There will be a separate checkin assignment.
- You must meet with the instructor to get this graded.
- The maximum number of extra points will be 50.
- Do not attempt this unless you already have PA10 working well!



Grading Criteria

Here is a list of final tests that you will be graded on, along with the number of points for each. In each case you must exactly match the .hex file produced by lc3as to get credit.

NOTE: The AND,ADD,NOT opcodes require only register operands.
- Handle an assembly program with a single register ADD. (5 points)
- Handle an assembly program with a single register AND. (5 points)
- Handle an assembly program with a single NOT command. (5 points)

NOTE: The AND,ADD,TRAP opcodes require an immediate value:
- Handle an assembly program with a single immediate ADD. (5 points)
- Handle an assembly program with a single immediate AND. (5 points)
- Handle an assembly program with a single TRAP command. (5 points)

NOTE: The JSR,LD,LDI,LEA,ST,STI opcodes require a register and PC offset.
- Handle an assembly program with a single JSR command. (5 points)
- Handle an assembly program with a single LD command. (5 points)
- Handle an assembly program with a single LDI command. (5 points)
- Handle an assembly program with a single LEA command. (5 points)
- Handle an assembly program with a single ST command. (5 points)
- Handle an assembly program with a single STI command. (5 points)

NOTE: The JMP,JSRR,RET opcodes require a single register:
- Handle an assembly program with a single JMP command. (5 points)
- Handle an assembly program with a single JSRR command. (5 points)
- Handle an assembly program with a single RET command. (5 points)

NOTE: The LDR,STR opcodes require a register and offset.
- Handle an assembly program with a single LDR command. (5 points)
- Handle an assembly program with a single STR command. (5 points)

NOTE: The BR opcode requires special handling of condition codes.
- Handle an assembly program with a BR instruction. (2 points)
- Handle an assembly program with a BRn instruction. (3 points)
- Handle an assembly program with a BRz instruction. (3 points)
- Handle an assembly program with a BRp instruction. (3 points)
- Handle an assembly program with a BRnz instruction. (3 points)
- Handle an assembly program with a BRzp instruction. (3 points)
- Handle an assembly program with a BRnzp instruction. (3 points)

NOTE: The .FILL and .BLKW directives must generate the correct data:
- Handle an assembly program with a .FILL instruction. (5 points)
- Handle an assembly program with a .BLKW instruction. (10 points)

NOTE: Code to detect the following errors is required:
- Handle an assembly program with an ERR_OPEN_READ error. (5 points)
- Handle an assembly program with an ERR_OPEN_WRITE error. (5 points)
- Handle an assembly program with an ERR_DUPLICATE_LABEL error. (5 points)
- Handle an assembly program with an ERR_MISSING_LABEL error. (5 points)
- Handle an assembly program with an ERR_IMM_TOO_BIG error. (10 points)
- Handle an assembly program with an ERR_BAD_PC_OFFSET error. (10 points)

NOTE: The following are full LC-3 programs:
- Simple assembly program to add two numbers. (10 points)
- Assembly program with negative and positive PC offsets. (10 points)
- Moderate assembly program from master PA8 solution. (10 points)
- Complex assembly program from master PA7 solution. (10 points)




Checking in Your Code

You will submit the single file assembler.c using the Checkin tab under the PA10 assignment. Submit only the assembler.c file; do not modify any other source files!
