CS 453 Programming Assignment #5 — Semantic Analysis

Due Friday, April 16th (by 11:59pm); see the late policy below

Preliminary Subversion Log
Due Monday, March 31st (by 11:59pm)
see below


For your last assignment you wrote a parser using JavaCUP. The parser generates an abstract syntax tree representation of the program. Now that we have a clean representation of the input program, it's time to perform semantic analysis.
  1. As a first step, you will write a visitor that determines what line and position to associate with each node in the abstract syntax tree.
  2. We provide most of a symbol table implementation. You will need to implement the lookup method for the class Scope.
  3. Then you will use the symbol table classes and ast.analysis classes to write a short visitor class that stores away information about classes, methods, and variables as it traverses an AST. This visitor will also generate errors when a symbol is declared more than once in the visible scopes (e.g. in the case of multiple declarations of the same variable name). You will test all the pieces by compiling with a driver that will generate an AST, traverse it with your visitors, and dump the resulting symbol table information to the file inputfile.ST.dot.
  4. While generating the symbol table entries for variables, your symbol table building visitor will need to assign offsets for each variable. For local variables including formal parameters, the offsets are offsets relative to the frame pointer in the target MIPS code. For member variables, the offset is relative to the start of the memory allocation for the class instance.
  5. The final step will be to perform some semantic analysis. You will be performing typechecking. You will test the semantic analysis by writing test cases that contain the kinds of errors for which you are expected to print error messages. The error messages should be printed to the standard error stream. The more test case sharing that occurs on the mailing list, the better your chances of fully testing your implementation.
To get started, download and unpack MJSemDriverStart.tar. Start a Subversion repository and verify that the starter project compiles and runs using the main method in SemDriver.java. Then copy your parser from PA4 into the MJSemDriverStart/src/mjparser directory and call your parser within SemDriver.java.

Lines and Positions

In order to print reasonable error messages, you will need to know the line and position for each error. We provide the Lines data structure (MJSemDriverStart/lines/Lines.java) for maintaining the mapping between nodes and their line and position information. You need to write a visitor over the AST that determines this information for each node. Token nodes already have line and position information associated with them. Non-token nodes should be given the same line and position as the FIRST child token with which they are associated. If a non-token node has no children, then it should be given the line and position of the first token that follows it in a depth-first traversal. For example,
  a + b - c * d
Assume that the variable c in the above is at line 10 and position 11. The line and position associated with the multiplication (i.e. the MulExp node) should then be line 10 and position 11. You may want to use the ReversedDepthFirstAdapter to implement the visitor that calculates lines and positions.
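The rule above can be sketched with a reversed depth-first walk. The following is only an illustration under stated assumptions: SketchNode and LinePosSketch are hypothetical stand-ins, not the generated AST classes, the Lines structure, or ReversedDepthFirstAdapter from the starter project. Visiting children right to left means that when a non-token node is finished, the most recently seen token is either its own first token or, if it has no children, the first token that follows it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an AST node; the real node classes come from the starter project.
class SketchNode {
    final String label;
    final int line;  // -1 for non-token nodes
    final int pos;   // -1 for non-token nodes
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String label) { this(label, -1, -1); }

    SketchNode(String label, int line, int pos) {
        this.label = label;
        this.line = line;
        this.pos = pos;
    }

    boolean isToken() { return line >= 0; }

    SketchNode add(SketchNode child) { children.add(child); return this; }
}

// Hypothetical visitor: records {line, pos} for every node it visits.
class LinePosSketch {
    final Map<SketchNode, int[]> positions = new HashMap<>();
    // Position of the token seen most recently in the reversed walk.
    private int[] nearest = null;

    void visit(SketchNode node) {
        if (node.isToken()) {
            nearest = new int[] { node.line, node.pos };
        } else {
            // Right-to-left, so the leftmost token of the subtree is seen last.
            for (int i = node.children.size() - 1; i >= 0; i--) {
                visit(node.children.get(i));
            }
        }
        // For a childless non-token node, nearest is still the token that follows it.
        positions.put(node, nearest);
    }
}
```

With c placed at line 10, position 11 as in the example, a MulExp node built from c and d is recorded at line 10, position 11. The positions of the other tokens would be whatever the lexer reports.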

Symbol Tables

We provide most of a symbol table implementation in MJSemDriverStart/symtable. The SymTable data structure contains a stack of scopes. The outermost, or bottom-most, scope is the global scope. It is then possible to have a class scope and a method scope on the stack as well. Each Scope instance refers to its enclosing Scope, so if a lookup does not find a symbol in the current scope, it can continue the search in the enclosing scope. Each symbol will have a symbol table entry (STE) associated with it. Your first task will be to fill in the missing lookup method body in the Scope class.
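As a rough illustration of the chained lookup described above, here is a minimal sketch. SketchScope and SketchSTE are hypothetical, simplified stand-ins; the fields and method signatures of the real Scope and STE classes in MJSemDriverStart/symtable may differ, so treat this as the idea rather than the method body to paste in.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified symbol table entry.
class SketchSTE {
    final String name;
    SketchSTE(String name) { this.name = name; }
}

// Hypothetical, simplified scope that knows its enclosing scope.
class SketchScope {
    private final Map<String, SketchSTE> entries = new HashMap<>();
    private final SketchScope enclosing;  // null for the global scope

    SketchScope(SketchScope enclosing) { this.enclosing = enclosing; }

    void insert(SketchSTE ste) { entries.put(ste.name, ste); }

    // Look in this scope first, then walk outward through the enclosing scopes.
    SketchSTE lookup(String name) {
        for (SketchScope s = this; s != null; s = s.enclosing) {
            SketchSTE found = s.entries.get(name);
            if (found != null) {
                return found;
            }
        }
        return null;  // not visible from this scope
    }
}
```

A lookup that starts in a method scope can therefore see class members and class names, while a lookup that starts in the global scope cannot see locals.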

The Symbol Table Builder Visitor

Once you've finished with the symbol table code, define a visitor class that will add entries to the table as it traverses the AST. It may help to look at PrintVisitor.java. The main difference between BuildSymTable and PrintVisitor is that in BuildSymTable you don't have to override the case methods. Instead you will be able to implement your symbol table builder visitor using the in and out methods for various AST nodes. The parser has built lists of variable declarations, method definitions, and class declarations, so you won't have to look through much of the tree to find the information you need. Make sure your visitor class has a SymTable instance variable, and a getSymTable() method that returns it so that we can get at the fruits of its labor.

For the MiniJava we are implementing, no symbol can be declared twice in the same scope. Also, no name can be declared in a scope when an enclosing scope already has a definition for that name. This means that no method name or member variable name can be the same as the name of any class that has been defined before it, and no local variable name can be the same as the name of any class, current class method, or current class member that has already been defined.
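The redefinition rule above amounts to checking every visible scope before adding a name. The sketch below is one hypothetical way to express that check with a plain stack of name sets; RedefinitionSketch is not part of the starter project, and the real visitor would work with SymTable and STE objects and print the assignment's required error message instead of just counting.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the "no redefinition in any visible scope" rule.
class RedefinitionSketch {
    // Top of the stack is the innermost scope; the bottom is the global scope.
    private final Deque<Set<String>> scopes = new ArrayDeque<>();
    private int errorCount = 0;

    void pushScope() { scopes.push(new HashSet<>()); }

    void popScope() { scopes.pop(); }

    // Returns true if the name was added, false if it was already visible.
    // A scope must have been pushed before the first call.
    boolean declare(String name) {
        for (Set<String> scope : scopes) {
            if (scope.contains(name)) {
                errorCount++;  // the real visitor reports the error here
                return false;
            }
        }
        scopes.peek().add(name);
        return true;
    }

    int errorCount() { return errorCount; }
}
```

Because a method scope is popped when the visitor leaves the method, two different methods may reuse the same local variable name without triggering the check.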

The symbol table builder should print out the following error to standard error anytime a symbol is redefined:

where LINENUM is the line number for the symbol, POSNUM is the position number for the symbol, and SYMNAME is the symbol name. All such errors should be printed to standard error, and at the end of symbol table construction if any such errors are printed then the following error should be printed to standard error and your compiler should exit:
Errors found while building SymTable
The BuildSymTable visitor should just skip adding symbol table entries for any of the identifiers within any class or method if the class name or method name is multiply defined. Notice that SemDriver.java expects to catch a SemanticException. You can find the definition for SemanticException in MJSemDriverStart/exceptions.

The MethodSTE constructor takes a Frame interface as one of its parameters. For now, just pass the following:

    new Temp.Label(mCurrentClass.getName()+"_"+node.getName().getText()),
                   new LinkedList())
The LocalSTE constructor takes an Access interface as one of its parameters. For now, just pass the following:
    new Mips.InReg(new Temp.Temp())
The correct way to create this parameter is discussed in the next step of the assignment.

Memory Layout

Next, you will change the MemberSTE and LocalSTE constructor calls so that each LocalSTE and MemberSTE ends up with information about where it will be stored at runtime. Local variables and parameters will be stored in the stack frame. Class member variables will be stored in a class instance, which will be allocated on the heap.

Stack Layout

The handout provided in class describes the Frame.Frame, Frame.Access, Mips.Frame, Mips.InFrame, Mips.InReg, Temp.Temp, and Temp.Label classes. These classes have been provided for you in the MJSemDriverStart.

When the BuildSymTable is processing a method declaration, it should call newFrame to create a frame for the method and then it should create LocalSTEs for each of the parameters including the implicit this parameter.

When the BuildSymTable visitor is processing a local variable declaration, that local should be given space in the frame by calling allocLocal() on the enclosing method's frame. Local variables should be allocated space in the order they are declared.

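public abstract class Frame {
  // creates a frame for a method; formals has one entry per formal parameter
  public abstract Frame newFrame(Label name, List<Boolean> formals);
  public Label name;
  public List<Access> formals;
  // allocates space for one local variable in this frame
  public abstract Access allocLocal(boolean escape);
  // size of a machine word in bytes
  public abstract int wordSize();
}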
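To make the ordering requirement concrete, here is a small sketch of a frame that hands out word-sized slots in declaration order. Everything in it is an assumption made for illustration: SketchFrame is hypothetical, and the choice of non-negative offsets for formals and negative, downward-growing offsets for locals is only one possible convention. The actual layout is defined by Mips.Frame and the handout.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a stack frame; NOT the Mips.Frame from the starter project.
class SketchFrame {
    static final int WORD_SIZE = 4;  // MIPS word size in bytes

    private final List<Integer> formalOffsets = new ArrayList<>();
    // ASSUMPTION: locals grow downward from the frame pointer.
    private int nextLocalOffset = -WORD_SIZE;

    // formalCount includes the implicit "this" parameter.
    // ASSUMPTION: formals sit at non-negative offsets, "this" first.
    SketchFrame(int formalCount) {
        for (int i = 0; i < formalCount; i++) {
            formalOffsets.add(i * WORD_SIZE);
        }
    }

    int formalOffset(int index) { return formalOffsets.get(index); }

    // Each call returns the next free slot, so locals keep their declaration order.
    int allocLocal() {
        int offset = nextLocalOffset;
        nextLocalOffset -= WORD_SIZE;
        return offset;
    }
}
```

For a method with two declared parameters, the frame would be created with three formals (this plus the two parameters), and each local variable declaration would then trigger one allocLocal() call as the visitor encounters it.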

Class Layout

The ClassSTE will need to keep track of the next available offset within its layout. For example, let's say the class has three member variables: int a, boolean b, and Class1 c. When the declaration for int a is being processed, the BuildSymTable visitor will need to call allocMember(frame.wordSize()), and the first offset returned will be 0. When allocMember() is called for boolean b, the return value will be frame.wordSize(), which is 4 in the case of MIPS. The offset returned from allocMember() for Class1 c will be 8. The integer, boolean, and class reference datatypes each occupy wordSize() bytes in MiniJava. Member variables must be allocated space in the order they are declared.
public class ClassSTE extends STE {
  // returns an offset into the class for storing a member variable
  public int allocMember(int sizeInBytes) { ... }
  // (other members omitted)
}
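The offset bookkeeping described above can be sketched as a running counter. SketchClassLayout is a hypothetical stand-in that shows only the counter; the real ClassSTE extends STE and carries more state, and the sizeInBytes() accessor is added here purely for illustration.

```java
// Hypothetical sketch of member offset allocation; not the real ClassSTE.
class SketchClassLayout {
    private int nextOffset = 0;  // next free byte offset within an instance

    // Returns the offset for the new member, then reserves its space.
    int allocMember(int sizeInBytes) {
        int offset = nextOffset;
        nextOffset += sizeInBytes;
        return offset;
    }

    // Illustrative helper: total bytes reserved so far.
    int sizeInBytes() { return nextOffset; }
}
```

With a word size of 4, three successive allocMember(4) calls for int a, boolean b, and Class1 c return 0, 4, and 8, which matches the example in the text.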

The Type Check Visitor

The type check visitor is responsible for flagging incorrect type usage within a language, for example adding an integer to a boolean. The type errors you are responsible for generating along with hints as to where such errors might occur are listed below. To make your grade better and Alan's life easier, do not change the phrasing of the error messages.
[LINENUM,POSNUM] Invalid type returned from method METHODNAME
    // any method declaration

[LINENUM,POSNUM] Class CLASSNAME does not exist
    // anywhere an id token should be a class name

[LINENUM,POSNUM] Invalid condition type for if statement
[LINENUM,POSNUM] Invalid condition type for while statement

[LINENUM,POSNUM] Undeclared variable VARNAME
    // anywhere an id token could be a variable name

[LINENUM,POSNUM] Invalid expression type assigned to variable VARNAME
    // assignment statements

[LINENUM,POSNUM] Array reference to non-array type
    // array expressions and array assignments

[LINENUM,POSNUM] Invalid index expression type for array reference

[LINENUM,POSNUM] Operator length called on non-array type

[LINENUM,POSNUM] Invalid expression type assigned into array 
    // array assignments

[LINENUM,POSNUM] Invalid left operand type for operator OP

[LINENUM,POSNUM] Invalid right operand type for operator OP

[LINENUM,POSNUM] Invalid operand type for operator !

[LINENUM,POSNUM] Receiver of method call must be a class type

[LINENUM,POSNUM] Method METHODNAME does not exist in class type CLASSNAME

[LINENUM,POSNUM] Method METHODNAME requires exactly NUM arguments

[LINENUM,POSNUM] Invalid argument type for method METHODNAME
    // any call to a method (note that println is a method)

[LINENUM,POSNUM] Invalid operand type for new array operator

With this visitor, you should stop type checking after the first error is detected and reported. If multiple errors are relevant for the same AST node, then use the above list as an ordering of which error message should be generated. For example, if a call is made to a non-existing method and passed the wrong number of arguments, then the following error should be printed to standard error:
[LINENUM,POSNUM] Method METHODNAME does not exist in class type CLASSNAME
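The stop-at-first-error rule and the ordering of messages can be sketched for method calls as follows. This is a hedged illustration only: CallCheckSketch, SketchSemanticException, and the nested-map class table are hypothetical, and type names are compared as plain strings for brevity. The real visitor would consult the symbol table and throw the SemanticException from MJSemDriverStart/exceptions.

```java
import java.util.List;
import java.util.Map;

// Hypothetical exception standing in for the project's SemanticException.
class SketchSemanticException extends RuntimeException {
    SketchSemanticException(String message) { super(message); }
}

// Hypothetical checker for a single method call.
class CallCheckSketch {
    // Class name -> method name -> formal parameter type names.
    private final Map<String, Map<String, List<String>>> classes;

    CallCheckSketch(Map<String, Map<String, List<String>>> classes) {
        this.classes = classes;
    }

    // Checks run in the order of the error list; the first failure ends checking.
    void checkCall(int line, int pos, String receiverType, String method,
                   List<String> argTypes) {
        String where = "[" + line + "," + pos + "] ";
        Map<String, List<String>> methods = classes.get(receiverType);
        if (methods == null) {
            throw new SketchSemanticException(
                where + "Receiver of method call must be a class type");
        }
        List<String> formals = methods.get(method);
        if (formals == null) {
            throw new SketchSemanticException(
                where + "Method " + method + " does not exist in class type " + receiverType);
        }
        if (formals.size() != argTypes.size()) {
            throw new SketchSemanticException(
                where + "Method " + method + " requires exactly " + formals.size() + " arguments");
        }
        for (int i = 0; i < formals.size(); i++) {
            if (!formals.get(i).equals(argTypes.get(i))) {
                throw new SketchSemanticException(
                    where + "Invalid argument type for method " + method);
            }
        }
    }
}
```

Because the existence check precedes the argument-count check, a call to a missing method with the wrong number of arguments reports only the "does not exist" message, as in the example above.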

The Assignment:

After each piece of the assignment you should use the SemDriver.java driver to test that piece (just comment out calls to later pieces). We will use the SemDriver and diff the files you generate and your standard error output against the corresponding output from our reference compiler implementation. There are example programs and their output in MJSemDriver/TestCases/. The .out files include the standard error and standard output results for running the reference compiler on the associated input file. To partially test the visitor that builds the symbol table, you can compare your results with the .ST.dot files. Other relevant output files include .ast.dot and .astlines.dot.


Preliminary Subversion Log


  • Submit assignment using checkin utility
    ~cs453/bin/checkin PA5 MJSemDriver-groupname.jar
  • Sanity Check (procedure TA will use to run your assignment):
      % java -classpath java-cup-11a-runtime.jar -jar MJSemDriver-groupname.jar filename.java
      % mkdir MJSemDriver-groupname/src/
      % cd MJSemDriver-groupname/src/
      % cp ../../MJSemDriver-groupname.jar .
      % jar xf MJSemDriver-groupname.jar
    Note that you need to have a copy of the java-cup-11a-runtime.jar file in the same directory as the MJSemDriver-groupname.jar file. We will provide our own copy of the runtime jar file for testing. We will be running your parser with semantic analysis on multiple test files. Also, the TA will be looking at the source files.
  • Late Policy:

    Late assignments will be accepted up to 24 hours past the due date for a deduction of 20% and will not be accepted past this period.

    mstrout@cs.colostate.edu .... March 23, 2008
    Originally written by Brad Richards 2006, modified with permission by Michelle Strout 2007 and 2008.