CS 453 Programming Assignment #5 — MiniJava: variable declarations, boolean expressions, and control flow

Due Wednesday April 7th (by 11:59pm)


This assignment is to be done in groups of two or three. In this assignment you will extend your MiniJava compiler from PA4 to handle programs containing variable declarations, boolean expressions, and if and while statements, in addition to the constant expressions, assignment statements, and println statements it already handles. The goals of this assignment are for you to learn how to

The Assignment

You should create a jar file, PA5_groupname.jar, that can be executed as follows:
   java -jar PA5_groupname.jar --two-pass-mips input_file

Your program should generate an AST and print it to an input_file.ast.dot file. Next, your program should build a symbol table and write an input_file.ST.dot file that visualizes it. Your program should also write a dot file, input_file.astlines.dot, that shows the line and position associated with each AST node. A visitor that checks type usage in the program will use the symbol table to look up type information. Finally, the code generation visitor should be extended to handle the new program constructs and generate a MIPS program in the input_file.s file. The generated MIPS program must be executable by the MARS interpreter.

The input_file will contain a special main construct that starts with the keywords "special" and "main", followed by curly braces enclosing zero or more variable declarations and then zero or more statements: assignments of boolean and integer expressions to variables, println statements, if statements, and while statements. For example, the input_file might contain the following:

    special main {
        int a;
        boolean b;
        a = 1;
        b = true && ! false && (a<3);
        if (b && true) {
            a = 42;
        } else {
            a = 7;
        }
        System.out.println(a*1 + 2);
        while (0 < a) {
            a = a - 10;
        }
    }

The output from running the compiler should be as follows:
    Printing dot file to input_file.ast.dot
    Printing symbol table to input_file.ST.dot
    Printing AST with line and position info to input_file.astlines.dot
    Printing MARS MIPS file to input_file.s
The output from running MARS on the file input_file.s should be as follows:
    MARS 3.8  Copyright 2003-2010 Pete Sanderson and Kenneth Vollmar


For this assignment, you can no longer assume that input programs are correct. The programs will be syntactically correct, but they can contain semantic errors, which your compiler must report. The two kinds of semantic errors are (1) redeclarations of variables and (2) inappropriate type usage, including uses of undeclared variables. ALL variable redeclarations should be reported. For example, the following input:

    special main
    {
      int a;
      int a;
      int b;
      int a;
      a = 5;
      b = 5;
      System.out.println( a + b );
    }
should result in the following output:
    Printing dot file to two-redeclares.txt.ast.dot
    [4,11] Redeclared symbol a
    [6,11] Redeclared symbol a
    Errors found while building symbol table
For inappropriate type usage, it is only necessary to report the first error. For example, the following input file:
    special main {
        int b;
        boolean a;
        b = 3;
        System.out.println(a + b * 2);
        System.out.println(2 * (6 - 1) + 2 - x);
    }
results in only the first type error being reported, even though the variable x is also not declared.
    Printing dot file to error-op1.txt.ast.dot
    Printing symbol table to error-op1.txt.ST.dot
    Printing AST with line and position info to error-op1.txt.astlines.dot
    [5,24] Invalid left operand type for operator +
This is a significant assignment that you should start right away. We strongly recommend the following progression:
  1. Add the tokens, grammar rules, and AST building actions for variable declarations and the special main construct. Test that your compiler can parse and create ASTs for such programs.
  2. Create and test a symbol table data structure that is capable of inserting symbols with type and offset information, looking up symbols, and generating dot output.
  3. Write and test a build symbol table AST visitor that inserts entries into the symbol table while visiting variable declarations. The build symbol table visitor should also generate any redeclaration errors.
  4. Write a compute lines and positions visitor that generates a lines.Lines data structure (see MJControlFlowStart/src/lines/Lines.java), which maps each node in the AST to an approximate line and position.
  5. Write and test a check types visitor that uses the symbol table to determine if variables are being used properly within integer expressions.
  6. Implement and test the tokens, grammar productions, type checking, and code generation for each of the following program constructs in order:
    1. if statements and the true and false boolean constants
    2. less than (<) operator
    3. NOT (!) and short-circuited AND (&&) operators
    4. while loop
    We strongly recommend creating test cases for each of the different constructs.
The remainder of this writeup provides details about each of the recommended phases.

Phase 1: Adding Grammar Rules and New Tokens

The special main construct is not part of the online MiniJava grammar. It can be described with the following grammar rule:
    SpecialMain ::= "special" "main" "{"  (VarDeclaration)* (Statement)* "}"
The statement and variable declaration grammar productions can be the same as those in the online MiniJava grammar. Keep in mind that, as in the last assignment, you will need list productions for the statement list, and likewise for the variable declarations. The only variable types you need to handle for this assignment are "int" and "boolean".

As in PA4, all of the AST nodes needed have been provided in MJControlFlowStart/src/ast/nodes. Also, the ast/analysis package has been updated.

Phase 2: The Symbol Table

Create a new package, symtable, that will contain the symbol table and symbol table entry classes. A symbol table entry (STE) for this assignment should record a symbol's (variable's) base, offset, and type. You will need to create your own STE classes. We recommend having an STE subclass for variables now; eventually you will need to add subclasses for MiniJava classes and methods.

The symbol table data structure should be able to insert and lookup the symbol table entries for symbols and print itself to a dot file. Here is a possible interface:

    STE lookup(String)
    void insert(STE)
    void outputDot(...)
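A minimal sketch of the interface above, assuming a single flat scope (enough for special main). The class names VarSTE and SymTable, the dotLabel() helper, the PrintStream parameter to outputDot, and the "$fp"/negative-offset layout are all hypothetical choices for illustration, not the reference compiler's design.

```java
import java.io.PrintStream;
import java.util.LinkedHashMap;
import java.util.Map;

// Base class for symbol table entries; subclasses for MiniJava classes and
// methods can be added in later assignments.
abstract class STE {
    private final String name;

    STE(String name) { this.name = name; }

    String getName() { return name; }

    // Text shown for this entry in the dot visualization.
    abstract String dotLabel();
}

// Entry for a variable: its type plus where it lives (base register + offset).
class VarSTE extends STE {
    private final String type;   // "int" or "boolean" in this assignment
    private final String base;   // hypothetical, e.g. "$fp"
    private final int offset;    // byte offset from the base register

    VarSTE(String name, String type, String base, int offset) {
        super(name);
        this.type = type;
        this.base = base;
        this.offset = offset;
    }

    String getType() { return type; }
    String getBase() { return base; }
    int getOffset() { return offset; }

    @Override
    String dotLabel() {
        return getName() + " : " + type + " @ " + offset + "(" + base + ")";
    }
}

// One flat scope; LinkedHashMap keeps declaration order for the dot output.
class SymTable {
    private final Map<String, STE> entries = new LinkedHashMap<>();

    // Returns null when the symbol has not been declared.
    STE lookup(String name) { return entries.get(name); }

    // Callers check lookup() first to detect redeclarations; this just stores.
    void insert(STE entry) { entries.put(entry.getName(), entry); }

    // Emits one record-shaped node listing every entry in the scope.
    void outputDot(PrintStream out) {
        out.println("digraph SymTable {");
        out.println("  node [shape=record];");
        StringBuilder label = new StringBuilder("Scope");
        for (STE e : entries.values()) {
            label.append(" | ").append(e.dotLabel());
        }
        out.println("  scope [label=\"{" + label + "}\"];");
        out.println("}");
    }
}

class SymTableDemo {
    public static void main(String[] args) {
        SymTable st = new SymTable();
        st.insert(new VarSTE("a", "int", "$fp", -4));
        st.insert(new VarSTE("b", "boolean", "$fp", -8));
        System.out.println(st.lookup("a") != null);   // prints true
        st.outputDot(System.out);
    }
}
```

Keeping insert() separate from the redeclaration check lets the symbol table builder decide how to report errors, which matters in Phase 3 where every redeclaration must be printed.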
MJControlFlowStart includes the reference compiler, MJ.jar, for this assignment. Use the reference compiler to generate example symbol table dot files (the .ST.dot files). You may use any format you prefer, but some form of dot visualization of the symbol table is required.

Phase 3: Building the Symbol Table

The symbol table builder should print the following error to standard error whenever a symbol is redeclared:
[LINENUM,POSNUM] Redeclared symbol SYMNAME
where LINENUM is the line number for the symbol, POSNUM is the position number for the symbol, and SYMNAME is the symbol name. ALL such errors should be printed to standard error. If any were printed, then at the end of symbol table construction your compiler should print the following message to standard error and exit:
Errors found while building symbol table
Note that since the reference compiler is provided in jar form, you can compare your output with the output from the reference MJ.jar. Also note that id tokens (instances of the Token class) already have line and position information.
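The key point of this phase is that a redeclaration does not stop the builder; it reports the error and keeps going, and only afterwards decides whether to exit. A minimal sketch, assuming a hypothetical Decl class that stands in for a variable declaration node (type, name, and the id token's line and position) and a plain map in place of the symbol table:

```java
import java.io.PrintStream;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a variable declaration AST node.
final class Decl {
    final String type, name;
    final int line, pos;

    Decl(String type, String name, int line, int pos) {
        this.type = type; this.name = name; this.line = line; this.pos = pos;
    }
}

class BuildSymTableSketch {
    // Inserts every declaration and reports ALL redeclarations to err.
    // Returns true if any error was reported.
    static boolean build(List<Decl> decls, Map<String, String> table, PrintStream err) {
        boolean errors = false;
        for (Decl d : decls) {
            if (table.containsKey(d.name)) {
                err.println("[" + d.line + "," + d.pos + "] Redeclared symbol " + d.name);
                errors = true;   // keep going so later redeclarations are reported too
            } else {
                table.put(d.name, d.type);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // Declarations from the two-redeclares example; line numbers are
        // chosen to match the errors shown in that example's output.
        List<Decl> decls = List.of(
                new Decl("int", "a", 3, 11), new Decl("int", "a", 4, 11),
                new Decl("int", "b", 5, 11), new Decl("int", "a", 6, 11));
        if (build(decls, new HashMap<>(), System.err)) {
            System.err.println("Errors found while building symbol table");
            // A real compiler would stop here (e.g. with a nonzero exit status);
            // the sketch simply returns.
            return;
        }
    }
}
```

Run on the example declarations, this prints the two "Redeclared symbol a" lines followed by the summary message.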

Phase 4: Lines and Positions

In order to print reasonable error messages, you will need to know the line and position for each error, even for errors that occur at non-Token nodes in the AST. We provide an example Lines data structure (MJControlFlowStart/src/lines/Lines.java) for maintaining the mapping between nodes and their line and position information. We also provide a visitor for generating a dot file with the line and position information (MJControlFlowStart/src/ast_visitors/DotLinesVisitor.java). You need to write a visitor over the AST that determines this information for each node in the AST. Token nodes already have line and position information associated with them. Non-token nodes should be given the same line and position as the FIRST child token with which they are associated. If a non-token node has no children, then it should be given the line and position of the first token that follows it in a depth-first traversal. For example, consider the expression
  a + b - c * d
and assume that the variable c is at line 10 and position 11. The line and position associated with the multiplication (i.e., the MulExp node) should then also be line 10 and position 11. You may want to use the ReversedDepthFirstAdapter (in MJControlFlowStart/src/ast/analysis/) to implement the visitor that calculates lines and positions.
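Why a reversed traversal helps: if children are visited right to left and each node is labeled on the way back up, then at that moment the most recently seen token is the leftmost token in the node's subtree, or, for a subtree with no tokens, the first token that follows it in depth-first order. That covers both rules in one pass. A minimal sketch over a hypothetical mini-AST (the Node and LinesSketch classes are illustrative; the ReversedDepthFirstAdapter's actual method names are not assumed here):

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mini-AST: a node is either a token (with line/pos) or an
// interior node whose children are stored in source order.
class Node {
    final String label;
    final int line, pos;                      // -1 for non-token nodes
    final List<Node> children = new ArrayList<>();

    Node(String label, int line, int pos) { this.label = label; this.line = line; this.pos = pos; }

    Node(String label, Node... kids) { this(label, -1, -1); children.addAll(List.of(kids)); }

    boolean isToken() { return line >= 0; }
}

class LinesSketch {
    final Map<Node, int[]> positions = new IdentityHashMap<>();
    private int curLine = -1, curPos = -1;    // most recently visited token

    // Right-to-left traversal; each node is labeled after its children.
    void visit(Node n) {
        for (int i = n.children.size() - 1; i >= 0; i--) {
            visit(n.children.get(i));
        }
        if (n.isToken()) {
            curLine = n.line;
            curPos = n.pos;
        }
        positions.put(n, new int[] { curLine, curPos });
    }
}

class LinesDemo {
    public static void main(String[] args) {
        // a + b - c * d parses as MinusExp(PlusExp(a, b), MulExp(c, d)).
        Node mul = new Node("MulExp", new Node("c", 10, 11), new Node("d", 10, 15));
        Node minus = new Node("MinusExp",
                new Node("PlusExp", new Node("a", 10, 3), new Node("b", 10, 7)), mul);
        LinesSketch lines = new LinesSketch();
        lines.visit(minus);
        int[] p = lines.positions.get(mul);
        System.out.println("MulExp at [" + p[0] + "," + p[1] + "]");  // prints MulExp at [10,11]
    }
}
```

A node with no tokens at all, such as an empty statement list, receives the position of the first token to its right, which matches the rule for childless non-token nodes.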

To generate correct line and position information, you will have to modify the .lex files so that they reset the yychar variable to 0 every time an end-of-line token is found.

    {EOL} { /* reset position counter */ yychar = 0; }
Note also that the reference compiler adds 1 to yyline when creating tokens, because the generated lexer starts the line number at 0.
    "+"  {return new Symbol(sym.PLUS,new TokenValue(yytext(), yyline+1, yychar));}

Phase 5: Type Checking

For this assignment, variables are declared as either boolean or int, and your compiler must perform type checking. It should be able to report the following error messages:
    [LINENUM,POSNUM] Undeclared variable VARNAME
        // anywhere an id token could be a variable name

    [LINENUM,POSNUM] Invalid expression type assigned to variable VARNAME
        // assignment statements

    [LINENUM,POSNUM] Invalid left operand type for operator OP

    [LINENUM,POSNUM] Invalid right operand type for operator OP

    [LINENUM,POSNUM] Invalid operand type for operator !

    [LINENUM,POSNUM] Invalid argument type for method println

    [LINENUM,POSNUM] Invalid condition type for if statement
    [LINENUM,POSNUM] Invalid condition type for while statement

where LINENUM is the line number for the symbol, POSNUM is the position number for the symbol, OP is a specific binary operator, and VARNAME is a specific variable name.

To make testing and grading easier, do not change the phrasing of the error messages. In this phase, implement the first four errors where relevant; then, in Phase 6, implement the other variants of errors three and four along with the remaining errors.
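Since only the first type error must be reported, one convenient structure is to throw on the first error and let the driver print it. A minimal sketch, assuming hypothetical expression classes (IntLit, BoolLit, Id, BinExp) that already carry the line and position computed in Phase 4, and a map in place of the real symbol table. It covers only the undeclared-variable and binary-operator messages, and reports operator errors at the operator node's position, which is what the [5,24] example above shows.

```java
import java.util.HashMap;
import java.util.Map;

enum MJType { INT, BOOLEAN }

// Thrown at the first type error; only the first one needs to be reported.
class TypeError extends RuntimeException {
    TypeError(String msg) { super(msg); }
}

// Hypothetical expression nodes with precomputed line/position.
abstract class Exp {
    final int line, pos;
    Exp(int line, int pos) { this.line = line; this.pos = pos; }
    String where() { return "[" + line + "," + pos + "] "; }
}

class IntLit extends Exp { IntLit(int l, int p) { super(l, p); } }

class BoolLit extends Exp { BoolLit(int l, int p) { super(l, p); } }

class Id extends Exp {
    final String name;
    Id(String name, int l, int p) { super(l, p); this.name = name; }
}

class BinExp extends Exp {
    final String op;
    final Exp lhs, rhs;
    BinExp(String op, Exp lhs, Exp rhs, int l, int p) {
        super(l, p); this.op = op; this.lhs = lhs; this.rhs = rhs;
    }
}

class CheckTypesSketch {
    private final Map<String, MJType> symtab;   // stands in for the symbol table

    CheckTypesSketch(Map<String, MJType> symtab) { this.symtab = symtab; }

    MJType check(Exp e) {
        if (e instanceof IntLit) return MJType.INT;
        if (e instanceof BoolLit) return MJType.BOOLEAN;
        if (e instanceof Id) {
            Id id = (Id) e;
            MJType t = symtab.get(id.name);
            if (t == null) throw new TypeError(e.where() + "Undeclared variable " + id.name);
            return t;
        }
        BinExp b = (BinExp) e;
        // && takes booleans; +, -, * and < take ints.
        MJType operand = b.op.equals("&&") ? MJType.BOOLEAN : MJType.INT;
        if (check(b.lhs) != operand)
            throw new TypeError(e.where() + "Invalid left operand type for operator " + b.op);
        if (check(b.rhs) != operand)
            throw new TypeError(e.where() + "Invalid right operand type for operator " + b.op);
        // < compares ints but produces a boolean.
        return (b.op.equals("&&") || b.op.equals("<")) ? MJType.BOOLEAN : MJType.INT;
    }
}

class CheckTypesDemo {
    public static void main(String[] args) {
        Map<String, MJType> st = new HashMap<>();
        st.put("a", MJType.BOOLEAN);
        st.put("b", MJType.INT);
        // a + b * 2 from the error-op1 example (line 5).
        Exp e = new BinExp("+", new Id("a", 5, 24),
                new BinExp("*", new Id("b", 5, 28), new IntLit(5, 32), 5, 28), 5, 24);
        try {
            new CheckTypesSketch(st).check(e);
        } catch (TypeError err) {
            System.err.println(err.getMessage());  // [5,24] Invalid left operand type for operator +
        }
    }
}
```

The remaining messages (assignment, !, println, if and while conditions) follow the same pattern: compute the subexpression's type, compare it with the required type, and throw with the node's position.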

Phase 6: Implementing New Language Features

At this point, your compiler should have a lexer, a parser, a visitor that creates the symbol table, a check types visitor, and a MIPS code generator visitor. Each new language feature will require minor extensions to each of those components. We recommend implementing one feature at a time.
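For the code generator, the new constructs all reduce to conditional branches around freshly generated labels. A minimal sketch of the emitted MIPS for if/else, while, and short-circuit &&, assuming (hypothetically) that every expression's code leaves its value pushed on the run-time stack and that booleans are represented as 1 and 0; adapt it to whatever stack convention your PA4 generator uses. The class and method names are illustrative, and code generation for subexpressions is passed in as Runnable callbacks to keep the sketch self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Assumed conventions: expression code pushes its value on the stack;
// booleans are 1 (true) and 0 (false).
class MipsSketch {
    private final List<String> out = new ArrayList<>();
    private int labelCount = 0;

    // Each call returns a fresh, unique label.
    private String newLabel(String prefix) { return prefix + (labelCount++); }

    private void emit(String line) { out.add(line); }

    void push(String reg) {
        emit("    addi $sp, $sp, -4");
        emit("    sw   " + reg + ", 0($sp)");
    }

    void pop(String reg) {
        emit("    lw   " + reg + ", 0($sp)");
        emit("    addi $sp, $sp, 4");
    }

    // Code for an integer or boolean constant.
    void pushConst(int value) {
        emit("    li   $t0, " + value);
        push("$t0");
    }

    // e1 && e2: when e1 is 0, skip e2's code entirely and push 0.
    // Both paths push exactly one value, so the stack stays balanced.
    void and(Runnable genLeft, Runnable genRight) {
        String falseL = newLabel("and_false_");
        String endL = newLabel("and_end_");
        genLeft.run();
        pop("$t0");
        emit("    beq  $t0, $zero, " + falseL);
        genRight.run();                 // e2's value is the value of the &&
        emit("    j    " + endL);
        emit(falseL + ":");
        pushConst(0);
        emit(endL + ":");
    }

    // if (cond) thenPart else elsePart
    void ifElse(Runnable genCond, Runnable genThen, Runnable genElse) {
        String elseL = newLabel("if_else_");
        String endL = newLabel("if_end_");
        genCond.run();
        pop("$t0");
        emit("    beq  $t0, $zero, " + elseL);
        genThen.run();
        emit("    j    " + endL);
        emit(elseL + ":");
        genElse.run();
        emit(endL + ":");
    }

    // while (cond) body: test at the top, leave the loop when cond is 0.
    void whileLoop(Runnable genCond, Runnable genBody) {
        String topL = newLabel("while_top_");
        String endL = newLabel("while_end_");
        emit(topL + ":");
        genCond.run();
        pop("$t0");
        emit("    beq  $t0, $zero, " + endL);
        genBody.run();
        emit("    j    " + topL);
        emit(endL + ":");
    }

    String code() { return String.join("\n", out); }
}

class MipsDemo {
    public static void main(String[] args) {
        MipsSketch m = new MipsSketch();
        // while (true && false) { } with an empty body
        m.whileLoop(() -> m.and(() -> m.pushConst(1), () -> m.pushConst(0)), () -> { });
        System.out.println(m.code());
    }
}
```

Using a single label counter for all constructs guarantees unique labels even when ifs, whiles, and && expressions are nested inside one another.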

For expressions, use the following precedence, which is listed from lowest to highest with operators and tokens listed on the same line having equal precedence:

  1. Exp && Exp
  2. Exp < Exp
  3. Exp + Exp, Exp - Exp
  4. Exp * Exp
  5. ! Exp
  6. INTEGER_LITERAL, false, true, id, ( Exp )
Of the above operators, only the not (!) operator has right-to-left associativity; the binary operators associate left to right.
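One way to check your understanding of the precedence table is to see how it groups a concrete expression. The sketch below is an illustration only: it uses hand-written recursive descent, a different technique from the grammar-based parser your compiler uses, to fully parenthesize an expression. Each method handles one precedence level and calls the next-higher level, binary operators loop (left-associative), and ! recurses on itself (right-to-left). The class name PrecedenceSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Fully parenthesizes an expression according to the precedence table.
class PrecedenceSketch {
    private final List<String> toks = new ArrayList<>();
    private int i = 0;

    PrecedenceSketch(String src) {
        // Tokens: &&, single-character operators and parentheses,
        // and runs of letters/digits (ids, true, false, integer literals).
        for (int k = 0; k < src.length(); ) {
            char ch = src.charAt(k);
            if (Character.isWhitespace(ch)) {
                k++;
            } else if (src.startsWith("&&", k)) {
                toks.add("&&");
                k += 2;
            } else if (Character.isLetterOrDigit(ch)) {
                int start = k;
                while (k < src.length() && Character.isLetterOrDigit(src.charAt(k))) k++;
                toks.add(src.substring(start, k));
            } else {
                toks.add(String.valueOf(ch));
                k++;
            }
        }
    }

    private String peek() { return i < toks.size() ? toks.get(i) : ""; }
    private String next() { return toks.get(i++); }

    // Level 1 (lowest): &&
    String and() {
        String left = less();
        while (peek().equals("&&")) { next(); left = "(" + left + " && " + less() + ")"; }
        return left;
    }

    // Level 2: <
    String less() {
        String left = add();
        while (peek().equals("<")) { next(); left = "(" + left + " < " + add() + ")"; }
        return left;
    }

    // Level 3: + and -
    String add() {
        String left = mul();
        while (peek().equals("+") || peek().equals("-")) {
            String op = next();
            left = "(" + left + " " + op + " " + mul() + ")";
        }
        return left;
    }

    // Level 4: *
    String mul() {
        String left = not();
        while (peek().equals("*")) { next(); left = "(" + left + " * " + not() + ")"; }
        return left;
    }

    // Level 5: prefix !, right-to-left, so "!!b" groups as (!(!b)).
    String not() {
        if (peek().equals("!")) { next(); return "(!" + not() + ")"; }
        return primary();
    }

    // Level 6 (highest): literals, true, false, ids, and ( Exp ).
    String primary() {
        String t = next();
        if (t.equals("(")) {
            String inner = and();
            next();                 // consume ")"
            return inner;
        }
        return t;
    }
}

class PrecedenceDemo {
    public static void main(String[] args) {
        System.out.println(new PrecedenceSketch("true && ! false && (a<3)").and());
        // prints ((true && (!false)) && (a < 3))
    }
}
```

For example, "a - 10 - 3" groups as ((a - 10) - 3), and "a*1 + 2" groups as ((a * 1) + 2), which is the grouping your parser's precedence and associativity declarations should produce.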

Getting Started

Download the MJControlFlowStart.tgz file and untar it. As in previous assignments, we have given you all of the needed AST nodes and the DepthFirstAdapter class that is capable of visiting all the AST nodes.

The TestCases directory includes some test cases.

Start working through the phases as outlined above.

Submitting the Assignment

Late Policy

Late assignments will be accepted up to 24 hours past the due date and time with a 20% deduction, and will not be accepted after that. Late means anything after 11:59pm on the day the assignment is due, including 12 midnight.

mstrout@cs.colostate.edu .... April 7, 2010