CS 453 Programming Assignment #3 — Lexical Analysis

Due Friday, February 25th (by 6:00pm)
late policy

Preliminary Subversion Log
Due Monday, February 20 (by 11:59pm)
see below

Introduction:

Sit down and take a deep breath. This project is the first step in building a compiler for MiniJava! You will use a tool called JLex to construct a lexical analyzer that recognizes all of the tokens in MiniJava. You will describe the tokens in MiniJava to JLex using regular expressions.

Getting Started

We provide you with PA3-start.tar, which includes the JLex and JavaCUP packages, a driver you will use for your lexer, and the start of a minijava.lex file for specifying the MiniJava tokens. After downloading PA3-start.tar, do the following:
  1. Untar the start directory.
        % tar xf PA3-start.tar
        
    It should unpack into the following:
        PA3-start/
            JLex/
                ...
            MJLexerStart/    
                src/
                    LexDriver.java
                    mjparser/
                        TokenValue.java
                        java-cup-11a.jar
                        java-cup-11a-runtime.jar
                        minijava.cup
                        minijava.lex
                TestCases/
        
  2. Rename MJLexerStart to MJLexer-groupname, and start a Subversion repository for MJLexer-groupname.
  3. Start a new Eclipse project with the existing files in MJLexer-groupname.
        
            File -> New -> Project
                Use project specific JRE (1.5 or 5.0)
                Build project from existing source files (MJLexer-groupname)
                Remove TestCases from the buildpath
        
  4. Put java-cup-11a-runtime.jar in CLASSPATH.
        
            Eclipse -> Preferences -> Java -> Build Path -> Classpath Variables
                JAVA_CUP 
                Path: put in full path for java-cup-11a-runtime.jar, 
                      which is in the mjparser/ subdirectory
                
            Project -> Properties -> Libraries -> Add Variable
                select JAVA_CUP
        
  5. Generate parser.java and sym.java from minijava.cup. See the notes in src/mjparser/minijava.cup.
  6. Move the JLex directory into the mjparser/ subdirectory.
        % cd PA3-start
        % mv JLex MJLexer-groupname/src/mjparser/
        
  7. Refresh the Eclipse project. Remove JLex from the project build by right-clicking on it in Eclipse and selecting Build->Exclude.
  8. See the minijava.lex file for instructions for generating the Yylex.java file.
  9. Run LexDriver on the TestCases/test.file to observe the following output:
    [1, 0] AND
    [1, 2] BOOLEAN
    [1, 9] SEMI
        
JLex/ contains the implementation of the JLex tool. java-cup-11a-runtime.jar contains the definition of the Symbol class, which the lexer will eventually use to pass tokens to the parser generated by JavaCUP. For more information, see a JLex Tutorial or search the web for JLex.

Lexical Descriptions

The MJLexerStart code includes a starting lexer specification in the minijava.lex file. An abbreviated JLex tutorial can be found at http://pages.cs.wisc.edu/~bodik/cs536/NOTES/2a.JLEX.html.

We recommend you specify one token at a time and then unit test each token with a file that contains instances of that token.
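
To make the idea of describing tokens with regular expressions concrete, here is a minimal sketch in plain Java (java.util.regex), not in JLex syntax. The class name TokenSketch, the four-entry token table, the identifier pattern, and the input string are all assumptions made for illustration; they are not taken from the provided minijava.lex or from TestCases/test.file. The sketch picks the longest match at each position and breaks ties in favor of the earlier rule, which is the convention lex-style tools generally follow.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenSketch {
    // Hypothetical token table (name -> regular expression), in rule order.
    // The ID pattern is an assumption; check the MiniJava language reference.
    private static final Map<String, Pattern> TOKENS = new LinkedHashMap<>();
    static {
        TOKENS.put("AND", Pattern.compile("&&"));
        TOKENS.put("BOOLEAN", Pattern.compile("boolean"));
        TOKENS.put("SEMI", Pattern.compile(";"));
        TOKENS.put("ID", Pattern.compile("[A-Za-z][A-Za-z0-9_]*"));
    }

    // Returns one "[line, column] NAME" entry per token found in the input.
    static List<String> scan(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0, line = 1, col = 0;   // lines count from 1, columns from 0
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (c == '\n') { line++; col = 0; pos++; continue; }
            if (Character.isWhitespace(c)) { col++; pos++; continue; }

            // Try every rule at this position; keep the longest match.
            // The strict '>' means an earlier rule wins a tie.
            String bestName = null;
            int bestLen = 0;
            for (Map.Entry<String, Pattern> e : TOKENS.entrySet()) {
                Matcher m = e.getValue().matcher(input).region(pos, input.length());
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestName = e.getKey();
                    bestLen = m.end() - pos;
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException(
                    "Unrecognized character '" + c + "' at [" + line + ", " + col + "]");
            }
            out.add("[" + line + ", " + col + "] " + bestName);
            pos += bestLen;
            col += bestLen;
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed input, chosen only to exercise the three non-ID rules.
        for (String token : scan("&&boolean;")) {
            System.out.println(token);
        }
    }
}
```

Because BOOLEAN is listed before ID, the input "boolean" is reported as BOOLEAN, while "booleans" is reported as ID since the identifier rule matches one more character. Testing one rule at a time with small inputs like these is the same approach recommended above for your JLex specification.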

Hints

The Assignment:

Create a lexical description for MiniJava in minijava.lex, and verify that it is correct by feeding it to JLex and testing the constructed lexical analyzer. The BNF grammar for MiniJava can be found here. From the grammar, you can determine which tokens the language requires and describe them formally. (The context-free grammar specifies how to string these tokens together to construct a legal program. You won't be specifying an order for the tokens yet, just what the tokens are. For example, "main /* hi */ System.out.println" is a legal input file at this point.)

You will probably find this information from the MiniJava language reference useful as well; it describes identifiers and other items in more detail. It says that you should be able to handle nested /*...*/ style comments, but I'm giving you a break: you need to handle both // and /*...*/ style comments, but you may assume that a comment (of any style) never appears within another comment.
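
Since comments may be assumed not to nest, both styles can be described by ordinary regular expressions. The following is a rough sketch in plain Java of one way to write them; the class name CommentSketch and the method stripComments are hypothetical, and JLex's own regular expression syntax differs from java.util.regex, so treat this as an illustration of the patterns rather than text to paste into minijava.lex.

```java
import java.util.regex.Pattern;

class CommentSketch {
    // Line comment: "//" followed by everything up to (not including) the newline.
    // Block comment: "/*", then any text, then the first "*/" that follows.
    // The block pattern is written without lazy quantifiers so that it stays
    // close to what a lex-style regular expression can express.
    private static final Pattern COMMENT = Pattern.compile(
        "//[^\\n]*"
        + "|"
        + "/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/");

    // Removes every non-nested comment from the source text.
    // An unterminated "/*" does not match and is left in place.
    static String stripComments(String source) {
        return COMMENT.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripComments("main /* hi */ System.out.println"));
    }
}
```

In a lexer, a rule with a pattern like these would typically have an empty action, so that the comment is consumed without producing a token.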

Submitting:

Preliminary Subversion Log

Assignment

  • Submit assignment using checkin utility
    ~cs453/bin/checkin PA3 MJLexer-groupname.jar
  • Sanity Check (procedure TA will use to run your assignment):
      % java -classpath java-cup-11a-runtime.jar -jar MJLexer-groupname.jar Test.java
      % mkdir MJLexer-groupname/src/
      % cd MJLexer-groupname/src/
      % cp ../../MJLexer-groupname.jar .
      % jar xf MJLexer-groupname.jar
      
    Note that you need to have a copy of the java-cup-11a-runtime.jar file in the same directory as the MJLexer-groupname.jar file. We will provide our own copy of the runtime jar file for testing. We will be running your lexer on multiple test files. Also, the TA will be looking at the source files and possibly running JLex on the minijava.lex file.
  • Late Policy:

    Late assignments will be accepted up to 24 hours past the due date for a deduction of 20% and will not be accepted after that period.


    mstrout@cs.colostate.edu .... February 16, 2007
    Originally written by Brad Richards 2006, modified with permission by Michelle Strout and Dave Rostron 2006. Project changed to use a new tool, Michelle Strout 2008.