CS 453 Programming Assignment #7 — MiniJava: arrays and classes

Due Wednesday May 5th (by 11:59pm)

Introduction

This assignment is to be done in groups of two or three. In this assignment you will be extending your MiniJava compiler from PA6 to handle programs with arrays and class reference declarations. Your PA7 MiniJava compiler will also need to add another scoping level to enable class types with member variables as well as methods. Also, you will be implementing type checking and code generation for arrays and classes. PA7 implements all of MiniJava EXCEPT for inheritance. The goals of this assignment are for you to learn how to

The Assignment

Your Makefile should be able to create a jar file, PA7_groupname.jar, that can be executed as follows:
   java -jar PA7_groupname.jar --two-pass-mips input_file
   
When called with the --two-pass-mips option, the program should generate a MIPS program in the input_file.s file. The generated MIPS program must be capable of being interpreted by the MARS interpreter.

The input file to your compiler will contain a legal MiniJava/Java program, including inline comments and C-like comments. For example, the following program is a possible PA7 input:

class ArrayAssign {
    public static void main(String[] a){
        System.out.println(new MyClass().testing());
    }
}

/* here is a multi-line C comment
   goodbye */
class MyClass {
    int [] y;	// member variable
    public int testing() {
        int [] x;	// local variable
        y = new int [80];
        x = new int [10];
        x[0] = 7;
        x[3] = 77;
        y[75] = 42;
        return x[0] + y[x[3]-2];
    }

}
We will NOT be giving you a reference compiler for this assignment. Use javac and java to compare your results. Your type errors will not be identical, but you should be catching the same first error that javac catches. If you are concerned about matching the output of the reference compiler for type errors, then send example test cases to the mailing list and we will send the output back to the whole list.

For this assignment, you need to maintain the semantic analysis from the previous assignments and additionally catch the following errors:

[LINENUM,POSNUM] Class CLASSNAME does not exist
    // anywhere an id token should be a class name

[LINENUM,POSNUM] Array reference to non-array type
    // array expressions and array assignments

[LINENUM,POSNUM] Invalid index expression type for array reference

[LINENUM,POSNUM] Operator length called on non-array type

[LINENUM,POSNUM] Invalid expression type assigned into array 
    // array assignments

[LINENUM,POSNUM] Receiver of method call must be a class type

[LINENUM,POSNUM] Method METHODNAME does not exist in class type CLASSNAME

[LINENUM,POSNUM] Invalid operand type for new array operator
The above errors are all considered inappropriate type usage errors and therefore it is only necessary to report the first error.

As always, you should start this assignment right away. We recommend the following progression:

  1. Add the tokens, grammar rules, and AST building actions for the remaining grammar rules. Since you are not doing inheritance, you don't need to handle the grammar rule with the keyword "extends". Test that your compiler can parse and create ASTs for such programs.
  2. Extend your symbol table data structure and the visitor that creates it to include class symbol table entries.
  3. Extend your check types visitor so that it finds class and array related type errors.
  4. Implement code generation for each of the following program constructs:
    1. the new classname() construct
    2. the implicit "this" parameter
    3. calling class methods
    4. uses and defines of class member variables
    5. the new int[exp] construct
    6. the arrayref.length expression
    7. array assignment and array expressions
The remainder of this writeup provides details about each of the recommended phases.

Phase 1: Adding Grammar Rules and New Tokens

Besides classes and arrays, PA7 also includes being able to parse MiniJava programs with comments in them. You can handle the comments in the lexer by organizing the following in your lexer:
LINE_COMMENT="//"{NOT_EOL}*{EOL}
C_COMMENT="/*"{NOT_STAR}*("*"({NOT_STAR_OR_SLASH}{NOT_STAR}*)?)*"*/"
NOT_EOL=[^\r\n]
NOT_STAR=[^*]
NOT_STAR_OR_SLASH=[^*/]
{LINE_COMMENT} { /* ignore comments */ yychar = 0; yy_buffer_start = yy_buffer_index-1; /* reset for EOL */ }
{C_COMMENT} { /* ignore comments */  }
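
One way to convince yourself that the C_COMMENT pattern stops at the first closing "*/" is to try the same expression outside the lexer. The sketch below transcribes the macro into java.util.regex syntax; the class name CommentPatternCheck and the helper isWholeComment are hypothetical and are not part of the provided starter code.

```java
import java.util.regex.Pattern;

/** Hypothetical stand-alone check of the C_COMMENT macro; not part of the lexer. */
final class CommentPatternCheck {
    // "/*" NOT_STAR* ( "*" ( NOT_STAR_OR_SLASH NOT_STAR* )? )* "*/"
    static final Pattern C_COMMENT =
        Pattern.compile("/\\*[^*]*(\\*([^*/][^*]*)?)*\\*/");

    /** True only if the whole string is exactly one C-style comment. */
    static boolean isWholeComment(String text) {
        return C_COMMENT.matcher(text).matches();
    }

    public static void main(String[] args) {
        // The multi-line comment from the example program above.
        System.out.println(isWholeComment("/* here is a multi-line C comment\n   goodbye */"));
        // Two comments with code in between must not be swallowed as one comment.
        System.out.println(isWholeComment("/* a */ x = 1; /* b */"));
    }
}
```
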
For expressions, use the following precedence, which is listed from lowest to highest with operators and tokens listed on the same line having equal precedence:
  1. Exp && Exp
  2. Exp < Exp
  3. Exp + Exp, Exp - Exp
  4. Exp * Exp
  5. ! Exp
  6. Exp [ Exp ], Exp . length, Exp . id ( ExpList )
  7. INTEGER_LITERAL, false, true, this, id, new id (), new int [ Exp ], ( Exp )
Of the above operators, only the not (!) operator has right-to-left associativity.

You will probably run into some shift/reduce parser issues for the full MiniJava grammar. The AmbiguousGrammarExamples.tar example that we went over in class covers how to generate debug information for solving shift/reduce errors.

As in previous assignments, all of the AST nodes needed have been provided in MJFuncStart/src/ast/nodes. Also, the ast/analysis package has been updated.

Phase 2: The Symbol Table

Extend your symbol table package to include a symbol table entry for classes (ClassSTE). Class STEs should also contain a scope that can then contain member variables and methods. Each class also has its own namespace, and therefore different classes can have the same method name in them. This means that the method name will need to be mangled, both when the label is printed and for any calls to that method. We recommend that you extend the method STEs to keep track of a mangled name for the method. The mangled name can be some concatenation of the class name and the method name.
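
As a minimal sketch of the mangling idea, the helper below joins the class name and the method name with an underscore. The class name NameMangler, the method mangle, and the underscore separator are all assumptions made for illustration; any concatenation that yields a unique, legal MIPS label would do.

```java
/** Hypothetical helper; the writeup only asks for "some concatenation". */
final class NameMangler {
    private NameMangler() {}

    /** Joins class name and method name with an underscore (an assumed separator). */
    static String mangle(String className, String methodName) {
        return className + "_" + methodName;
    }

    public static void main(String[] args) {
        // Label for the testing() method of MyClass in the example program.
        System.out.println(mangle("MyClass", "testing"));
    }
}
```

Note that a plain underscore can collide when identifiers themselves contain underscores (class A_b with method c versus class A with method b_c), so you may prefer a separator that cannot appear in a MiniJava identifier.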

Here are some routines you might want to implement for class STEs:

    int allocMember(int size) - Given the size of a member variable, 
        allocates space for the member variable in the class layout and 
        returns the member variable's offset.
    
    int getNumVariables() - Returns the number of member variables that
        a class contains.  Useful when computing the size of a class
        instance.  Note that in MiniJava, all member variables are of
        size 4 bytes.
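
The two routines above might be sketched as follows. This is only an illustration under the stated assumption that every member variable occupies 4 bytes; the field names and the extra getInstanceSizeInBytes helper are hypothetical, and your ClassSTE will also need the scope for member variables and methods, which is omitted here.

```java
/** Sketch of the layout bookkeeping in a class symbol table entry. */
final class ClassSTE {
    private final String name;
    private int nextOffset = 0;    // byte offset of the next member variable
    private int numVariables = 0;  // how many members have been allocated

    ClassSTE(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    /** Reserves size bytes in the class layout and returns the member's offset. */
    int allocMember(int size) {
        int offset = nextOffset;
        nextOffset += size;
        numVariables++;
        return offset;
    }

    /** Number of member variables the class contains. */
    int getNumVariables() {
        return numVariables;
    }

    /** Hypothetical helper: bytes needed for one instance of the class. */
    int getInstanceSizeInBytes() {
        return nextOffset;
    }
}
```
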
While building the symbol table for PA7, you will probably want member variables in your various AST visitors that keep track of the current class being traversed and whether or not the visitor is in a method subtree.

Phase 3: Type Checking

Your compiler will need to perform some additional type checking. The following is a list of ALL the error messages that will be expected:
    [LINENUM,POSNUM] Class CLASSNAME does not exist
        // anywhere an id token should be a class name

    [LINENUM,POSNUM] Array reference to non-array type
        // array expressions and array assignments

    [LINENUM,POSNUM] Invalid index expression type for array reference
        // index expressions must be of type integer

    [LINENUM,POSNUM] Operator length called on non-array type

    [LINENUM,POSNUM] Invalid expression type assigned into array 
        // array assignments

    [LINENUM,POSNUM] Receiver of method call must be a class type

    [LINENUM,POSNUM] Method METHODNAME does not exist in class type CLASSNAME

    [LINENUM,POSNUM] Invalid operand type for new array operator



// Errors from previous assignment that are still relevant

    [LINENUM,POSNUM] Invalid type returned from method METHODNAME
        // any method declaration

    [LINENUM,POSNUM] Method METHODNAME requires exactly NUM arguments

    [LINENUM,POSNUM] Invalid argument type for method METHODNAME
        // any call to a method (note that println is a method)



    [LINENUM,POSNUM] Undeclared variable VARNAME
        // anywhere an id token could be a variable name

    [LINENUM,POSNUM] Invalid expression type assigned to variable VARNAME
        // assignment statements

    [LINENUM,POSNUM] Invalid left operand type for operator OP

    [LINENUM,POSNUM] Invalid right operand type for operator OP

    [LINENUM,POSNUM] Invalid operand type for operator !

    [LINENUM,POSNUM] Invalid condition type for if statement
    
    [LINENUM,POSNUM] Invalid condition type for while statement

where LINENUM is the line number for the symbol, POSNUM is the position number for the symbol, OP is a specific binary operator, CLASSNAME is a specific class name, METHODNAME is the unmangled method name, and VARNAME is a specific variable name.

To enable easier testing and grading, do not change the phrasing of the error messages.

Method calls in the MiniJava grammar include an expression on the left-hand-side of the ".", which is called the receiver. The receiver is a reference/pointer to an object instance on which a method is invoked; it "receives" the method call. The receiver is what is passed in as the implicit "this" parameter.

When at a method call (CallExp), it is necessary to know the class type for the receiver in order to look up the method name. Keep in mind that all of the methods in MiniJava are public and therefore can be called from within other classes. This means that you will have to look up the class symbol table entry for the method being called in order to look up the method symbol table entry. In the type checking visitor, you are already maintaining a mapping of expression nodes to their type. Therefore, determining the type of the receiver is simply a matter of looking up the type for the receiver expression while you are in the call expression AST node. You will also need this information in other AST visitors, so you will need to maintain a mapping of expressions to types either in the symbol table or for each visitor.
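
The lookup order described above (receiver type first, then class entry, then method entry) can be sketched as below. This is a simplified stand-in, not the provided framework: types are plain strings, the symbol table is a map of maps, and the names CallResolver, declareMethod, and resolve are hypothetical. Only the error message texts are taken from this writeup.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified sketch of resolving a call from the receiver's type. */
final class CallResolver {
    // class name -> (method name -> mangled label); stands in for ClassSTE/MethodSTE
    private final Map<String, Map<String, String>> methodsByClass = new HashMap<>();

    void declareMethod(String className, String methodName) {
        methodsByClass.computeIfAbsent(className, k -> new HashMap<>())
                      .put(methodName, className + "_" + methodName);
    }

    /** Returns the mangled label of the callee, or throws with the expected message. */
    String resolve(String receiverType, String methodName, int line, int pos) {
        String where = "[" + line + "," + pos + "] ";
        // Assumed spelling of the non-class types in this sketch.
        if (receiverType.equals("int") || receiverType.equals("boolean")
                || receiverType.equals("int[]")) {
            throw new IllegalStateException(
                where + "Receiver of method call must be a class type");
        }
        Map<String, String> methods = methodsByClass.get(receiverType);
        if (methods == null) {
            throw new IllegalStateException(
                where + "Class " + receiverType + " does not exist");
        }
        String label = methods.get(methodName);
        if (label == null) {
            throw new IllegalStateException(
                where + "Method " + methodName
                      + " does not exist in class type " + receiverType);
        }
        return label;
    }
}
```

In a real visitor the receiver type would come from the expression-to-type mapping rather than from a string argument.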

Phase 4: Implementing New Language Features

While your partner is implementing the type checking, you can be implementing MIPS code generation for the following language features:
  1. the new classname() construct: Generate code that calls halloc with the number of bytes needed to represent an instance of the class. The reference/pointer returned by halloc should end up on the top of the run-time stack. Recall that in PA2 you wrote a halloc function in MIPS.
  2. the implicit "this" parameter: The implicit this parameter is just another parameter. It should be put into the symbol table for each method, have an offset in the stack frame (it is the first parameter), and have code generated for it like IdExps for local variables.
  3. calling class methods: When generating code for method calls, you need to modify caseCallExp so that code is generated for the receiver subtree and then generate code that passes in the value generated for the receiver subtree as the implicit "this" parameter.
  4. uses and defines of class member variables: Member variables are addressed relative to the implicit "this" reference: the "this" value is the base address and each member variable has an offset from it. Therefore, in IdExp you will need to determine whether the variable being accessed is a member variable or not (Hint: keep a flag in VarSTE), and if it is then load the value of the "this" reference into a register and use that register as your variable's base. You also need to determine whether a variable is a member variable or not in the assignment statements.
  5. the new int[exp] construct: We recommend writing an initArray run-time library routine that calls halloc to allocate space for the array and then initializes all of the array elements to zero. Keep in mind that the array should have an extra entry at the front that stores the size of the array.
  6. the arrayref.length expression: Generate code that fetches the array size from the first word of the memory allocated for an array.
  7. array assignment and array expressions: The array reference will provide a pointer to the front of the array. Generate code that performs the necessary array addressing arithmetic to determine the address of the array element being accessed. Either write to that array element address or read from it based on whether generating code for an array assignment or an array expression.
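
The array layout in items 5 through 7 can be checked with a small model before writing any MIPS. The sketch below is not generated code and not the run-time library: it simulates a heap of 4-byte words in Java, with a stand-in halloc, to show that element i lives at base + 4 * (i + 1) because the first word holds the length. All names here (ArrayLayoutSketch, elementAddress, load, store, runExample) are hypothetical.

```java
/** Toy model of the array layout: word 0 holds the length, elements follow. */
final class ArrayLayoutSketch {
    static final int WORD = 4;                 // bytes per MiniJava value
    private final int[] heap = new int[1024];  // one int per 4-byte word
    private int nextFree = 0;                  // byte address of the next free word

    /** Stand-in for halloc: returns the byte address of a fresh block. */
    int halloc(int bytes) {
        int address = nextFree;
        nextFree += bytes;
        return address;
    }

    /** Allocates length + 1 words, stores the length, and zeroes the elements. */
    int initArray(int length) {
        int base = halloc(WORD * (length + 1));
        heap[base / WORD] = length;
        for (int i = 0; i < length; i++) {
            heap[elementAddress(base, i) / WORD] = 0;
        }
        return base;
    }

    /** Byte address of element index, skipping the length word at the front. */
    static int elementAddress(int base, int index) {
        return base + WORD * (index + 1);
    }

    int length(int base) {
        return heap[base / WORD];
    }

    int load(int base, int index) {
        return heap[elementAddress(base, index) / WORD];
    }

    void store(int base, int index, int value) {
        heap[elementAddress(base, index) / WORD] = value;
    }

    /** Mirrors MyClass.testing() from the example program in this writeup. */
    static int runExample() {
        ArrayLayoutSketch m = new ArrayLayoutSketch();
        int y = m.initArray(80);
        int x = m.initArray(10);
        m.store(x, 0, 7);
        m.store(x, 3, 77);
        m.store(y, 75, 42);
        return m.load(x, 0) + m.load(y, m.load(x, 3) - 2);
    }

    public static void main(String[] args) {
        System.out.println(runExample());
    }
}
```

The same arithmetic is what your generated MIPS needs to perform: multiply the index by 4, add 4 for the length word, and add the result to the array reference.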

Getting Started

Download the MJFullStart.tgz file and untar it. As for previous assignments, we have given you all of the needed AST nodes and the DepthFirstAdapter class that is capable of visiting all the AST nodes.

The TestCases directory includes some test cases.

Start working through the phases as outlined above.

Submitting the Assignment

Late Policy

Late assignments will be accepted up to 24 hours past the due date and time for a deduction of 20% and will not be accepted past this period.


mstrout@cs.colostate.edu .... April 21, 2010