CS453 Colorado State University ========================================== Undergraduate Compilers Review ========================================== 8/27/09 ------------------------------ Outline - lexer - parser - AST creation - visitors over AST - semantic analysis - code generation ------------------------------ JLex syntax how to express regular expressions to JLex -page 121-124 in book, look in JLex manual as well single characters "a", "b", "\n", "\t" character classes [a-z], [a-zA-Z] union/alternation a|b JLex: set of characters that does not include x [^x] JLex: intermediate regular expressions, or regular definitions MYRE=(a|b) HELLO={MYRE}* Important notes -no spaces allowed in regular expression definition -suggest adding one regular expression at a time and then continually create Java file with JLex by doing a "make source" or "./MakeSource.sh" -can use parentheses in regular expression specification -can not nest union and * in the same regular expression definition ---------------------------- Using JLex with JavaCUP (also see Makefile) To generate a lexer: % java -jar JLex.jar mj.lex % mv mj.lex.java Yylex.java This assumes the current directory contains JLex.jar. To create sym.java and mj_ast.java: % java -jar java-cup-11a.jar -parser mj_ast mj_ast.cup ==================== Outline of .lex file ==================== package mjparser; import java_cup.runtime.Symbol; %% %line %char %cup %public %eofval{ return new Symbol(sym.EOF, new TokenValue("EOF", yyline, yychar)); %eofval} EOL=\r|\n|\r\n NOT_EOL=[^\r\n] %% "&&" { return new Symbol(sym.AND, new TokenValue(yytext(), yyline+1, yychar)); } . { System.out.println("Illegal character: "+yytext()); } ==================== ==================== Outline of .cup file ==================== package mjparser; import java_cup.runtime.*; terminal AND, BOOLEAN, SEMI; // more terminal declarations non terminal garbage; // Grammar rule with terminals to avoid error message. garbage ::= AND BOOLEAN SEMI; ==================== ---------------------------- Conflict resolution (selecting amongst >1 possible tokens) How can the lexer tell the difference between "if" and "ifmyvar"? (1) longest match The generated lexer will pick from amongst the longest possible tokens. How can the lexer tell the difference between the keyword "if" and an identifier "if"? (2) priority rule The generated lexer will select the token of highest priority from amongst a set of possible tokens of the same length. Priority is indicated by the order that the regular expressions for the tokens are specified. ------------------------------- Parser Construction with JavaCUP Also details on how JavaCUP deals with ambiguity in the expression grammar. ... terminal PLUS, MINUS, TIMES, LPAREN, RPAREN, NUMBER; non terminal expr; precedence left PLUS,MINUS; precedence left TIMES; expr ::= NUMBER:n {: RESULT = ...; :} | expr:a PLUS expr:b {: RESULT = ...; :} | expr:a MINUS expr:b {: RESULT = ...; :} | expr:a TIMES expr:b {: RESULT = ...; :} | LPAREN expr:e RPAREN {: RESULT = ... :} Can use the precedence keyword and left and right associativity key words. - precedence should be listed from lowest to highest Note the following: - terminals in JavaCUP are typically written with all capitals - ::= is the right arrow in the production and | indicates another production rule for the nonterminal expr - The {: :} syntax are brackets around the action being associated with the production rule it follows. - The :varname syntax give a name to the value being "passed up" the implicit parse tree. The varname can then be used in the action associated with the production. - RESULT is a special variable name that represents the value being associated with the lhs nonterminal of the production and being passed up the implicit parse tree. ------------------------- --> Show usage of ./MakeSource.sh or "make source" ------------------------- --------------------- Abstract Syntax Trees General Concepts - ast has a lot fewer nodes - often is language specific - we will be generating MIPS code from the AST Constructing an AST Concepts in the book and in PA3 - in book each node has an attribute n, which is an AST node reference/pointer - in JavaCUP the same thing is true, execpt you don't explicitly manipulate the mapping of nodes to AST node references. -> instead you specify the reference type for each nonterminal (e.g. IExp) -> and then specify variable names for nonterminals on the rhs of a production to access their attribute in the action code - Figure 2.39 provides an example of how an AST is generated using actions. So does 5.10 and 5.11. ------------------------- --> Run MJDriver.java on ParamClash.java --> Show AST for ParamClash.java ------------------------- -------------------------- Symbol Table Discussed in Ch. 2.7.1 as well. Data structures (-> data structures slide (20)) -The book suggests an Env class that contains a hashtable and a link to its enclosing Env. With this Env class, you can construct a tree structure while building the symbol table to represent scoping. The semantic rules in 2.38 maintain a stack of Env and keep a reference to the most deeply nested scope. -In the MiniJava compiler, we are creating a symbol table that is then passed on to later stages of the compiler. We recommend that you have a single symbol table class instance that represents the symbol table and provides an interface for its creation and usage. We recommend the following set of classes: SymTable The symbol table should maintain a stack of scopes with the current most deeply nested scope at the top of the stack. The symbol table should also maintain a reference to the outermost (or global) scope. void insertAndPushScope(NamedScopeSTE) For first time a named scope is created like a MethodSTE. void pushScope(String) Looks up a named scope like a method and then pushes its scope on the stack. void popScope() Pops the top scope off the stack. STE lookup(String) Does a lookup in most nested scope. void insert(STE) Inserts symbol table entry into most deeply nested scope. MethodSTE The method symbol table entry contains a reference to signature information and to the method's scope. Scope Contains a dictionary that maps strings to symbol table entries. Also maintains a reference to enclosing scope. int num() Returns number of symbols in this scope. STE lookup(String) Looks for given symbol in this scope. If it doesn't find it, then calls lookup on enclosing scope. void insert(STE) Inserts symbol table entry into this scope. ------------------------- --> Show symbol table for ParamClash.java ------------------------- ------------------------- Type data structure ------------------------- --> still on Slide 21 ------------------------- What are the ways we can construct types in MiniJava? VarDecl -> Type id ; Type -> int [] | boolean | int | id ClassDecl -> class id { VarDecl* } MethodDecl -> public Type id ( FormalList ) ------------------------- --> still on Slide 22 ------------------------- Discussion of the Type data structure usage for following example: class C { int i; int [] y; C c; public int foo( int [] x, boolean p ) { B b; -> write these down in any order and ask around the room as to the correctness // correct in terms of typing p = true; i = y[i]; x = y; c = this; // legal? b = new B(); // incorrect c = new B(); i = x; } } Representing types in the symbol table. [Extra Notes not covered in class] - AST has nodes IntType, ClassType, and BoolType that derive from IType. Probably want a helper method that can convert one of those instances into the appropriate symtable.Type instance. Other information to maintain in the symbol table [Extra Notes not covered in class] - for each VarSTE, is it a local variable or a member variable - offset for each member variable - for each class, how many member variables does it have? What will the object size be? - for each method, - a VarSTE for "this" parameter - how many parameters including the implicit "this" parameter? - how many local variables - mangled method name, e.g. classname_method ------------------------- --> Slide 23: show that MJDriver generated a .s file and run that through MARS ------------------------- ----------------- Why MIPS? [Extra Notes not covered in class] - real machine - considered close to pure RISC (reduced instruction set computer) - appendix A is an important resource - MARS tool we are using to interpret MIPS has some online help we will cover MARS tool in recitation -> for now show help in MARS --------------- Why is it useful to be comfortable with assembly? [Extra Notes not covered in class] -programs for embedded processors are sometimes written in assembly -some of these low-level concepts are needed to implement drivers -useful to understand an example of how data is organized in memory because might have to use a hexdump to do some nasty debugging some day --------------- MIPS basics ISA: instruction set architecture, interface machine provides -delay slot in MIPS does not show up in ISA, YEAH! - ld-store architecture -> draw picture of abstraction: memory (data, text, heap, stack), registers, CPU - only load-store operations have access to memory - Example: a = b assume a is stored at address $fp-0 assume b is stored at address $fp-4 - loads lw $t0, -4($fp) - stores sw $t0, 0($fp) [Extra Notes not covered in class] - registers (page 22 in Appendix) $0 or $zero always have the value 0 $v0 $v1 - typically used for results of an expression eval or function return value, we don't use these when emulating the Patt and Patel calling convention $a0-$a3 - function arguments, we don't use these when emulating the Patt and Patel calling convention $t0-$t9 - caller-saved registers $s0-$s7 - callee-saved registers $sp - the stack pointer $fp - the frame pointer $ra - the return address [Extra Notes not covered in class] - some instructions and pseudo-instructions add $t0, $t1, $t2 semantics: $t0 = $t1 + $t2 addi $t0, $t1, 4 semantics: $t0 = $t1+4 li $t0, num $t0 = num move $t0, $t1 $t0 = $t1 [Extra Notes not covered in class] - outline for the main function (MARS specific) - The main function must be the first function in the assembly file given to MARS because MARS starts executing from the top of the file. # the following three lines are not necessary for MARS, # but make things more readable for humans .text .globl main main: ... # HALT the MARS simulation. li $v0, 10 syscall - creating constant strings .data astr: .asciiz "Hello\n" - system calls li $v0, 4 la $a0, astr syscall ----------------------------------------------------- MIPS instructions probably need for MiniJava code gen [Extra Notes not covered in class] .text .global L1 L1: beq $t0, $t1, L1 bne $t0, $t1, L1 blt $t0, $t1, L1 j L1 jal L1 jr $ra sw $t0, 4($t1) lw $t0, -8($sp) li $t0, 42 move $t0, $t1 addi $t0, $t1, 8 and $t0, $t1, $t2 sub, mul, xor, add syscall ----------------------------- Expression Evaluation in MIPS We do expression evaluation in the MJ compiler by pushing popping operands off the stack and then pushing the result on the top of the stack. This means we only need enough registers to hold operands and the result for a single operation. ------------------------- --> still on Slide 23: show MIPS code for Expr1.java ------------------------- ------------------------------ calling convention ------------------------- --> Slide 24 ------------------------- - calling convention is interface between caller and callee - callers have to pass parameters to callee - callees have to pass return values to caller - callers and callees have to agree who is storing what registers - stack pointer and frame pointer are used to keep track of the runtime stack --> Slide 24: step through main.c and foo example -------------------- Summarize rules (slide 25) [Extra Notes not covered in class] Why these rules? - in general, easier for compiler to generate code - all locals and parameters have a constant offset from the frame pointer - don't have to find max args because just push parameters on stack - don't have different logic for parameters past number of registers set aside for parameter passing - since pushing parameters backwards can handle variable argument lists (e.g. printf) NOTE: The Patt and Patel rules are similar to but NOT identical to the rules used most often on real machines. - typically return values are passed through registers - typically parameters are passed through registers - there are variations on where the frame pointer is set ------------------------- --> Slide 26: show in MJDriver.java where all pieces are being called ------------------------- ------------------------ mstrout@cs.colostate.edu, 8/27/09