CS453 Colorado State University
===============================================
Building the AST and the Visitor Design Pattern
===============================================

--------------
Plan for today

    What is an AST.
    How to build an AST while parsing.
    The AST data structure that will be provided for PA4 through PA7.
    What is a visitor design pattern.
    The AST visitor design pattern that will be provided for PA4 through PA7.
    The DotVisitor we are providing.
    Evaluating byte and integer expressions in the code generator visitor.

---------------
PA3 (slide 2)

    Using lexer and parser generator tools.
    Syntax-directed byte and integer expression evaluation.
    Syntax-directed code generation of main prologue, main epilogue,
    and setPixel() calls.

---------------
PA4 (slide 2)

    What stays the same
        For lexer will just have to add some more regular expressions.
        Build for lexer and parser and driver will stay the same.

    What changes
        Now will build an Abstract Syntax Tree (AST) to represent the program.
        Type checking will be implemented with a visitor.
        Will need a line and position for each token in the AST.
        Byte and integer expression evaluation and code generation will
        now be implemented with a visitor.
        Code generator visitor will handle control-flow constructs and
        some of the boolean expressions.
        Boolean expression evaluation will be done on the AVR run-time stack.
        Code generator visitor will also generate code for Meggy.delay()
        and Meggy.checkButton().

---------------------
Abstract Syntax Trees

    Example
        -> show PA4OneIfStmt.java, show compilation command, then show AST

    General Concepts
        - AST has a lot fewer nodes than a parse tree
        - AST is often language specific, for example the MeggyJava AST
          does not have a ForStatement node and does have a node for each
          of the built-in Meggy methods
        - An AST is useful in that the compiler can be written as concrete
          visitors over the AST.
          In other words, we will be doing multiple traversals of the AST:
          for type checking, building the symbol table, and code generation.

--------------------------------------
The AST data structures being provided

    Node class hierarchy (slide 3)
        - each Node subclass is just representational
        - each subclass constructor takes children in AST as input
          (show in Eclipse)
            PlusExp(leftExpNode, rightExpNode);
            IfStatement(testExp, trueStmtList, falseStmtList);

    Nodes of interest for PA4
        Integer and Byte Expression nodes
            ByteCast IntegerExp MinusExp MulExp PlusExp
        Most Boolean Expressions
            TrueExp FalseExp AndExp EqualExp NotExp
        Other expressions
            ButtonExp ColorExp
        Statements
            BlockStatement IfStatement MeggyDelay MeggySetPixel WhileStatement
        Other Nodes
            Token Program MainClass

    Converting the TokenValue instance into a Token (slide 4)
        We recommend you have a TokenValue class that the lexer uses to
        return each token's lexeme together with its line and position.

        // in .lex file
        "Meggy.Button.B" {return new Symbol(sym.BUTTON_LITERAL,
                              new TokenValue(yytext(), yyline+1, yychar));}
        {number}         {return new Symbol(sym.INT_LITERAL,
                              new TokenValue(yytext(), yyline+1, yychar));}

        // in .cup file
        terminal TokenValue INT_LITERAL;
        ...
        | BUTTON_LITERAL:n
          {: RESULT = new ButtonExp(new Token(n.text,n.line,n.pos)); :}
        | INT_LITERAL:n
          {: RESULT = new IntegerExp(new Token(n.text,n.line,n.pos)); :}

        -> now look at PA4OneIfStmt.java AST again

-------------------
Constructing an AST

    Concepts in the book and in PA4
        - in book Section 2.8 each node has an attribute n, which is an
          AST node reference/pointer
        - in JavaCUP the same thing is true, except you don't explicitly
          manipulate the mapping of nodes to AST node references.
            -> instead you specify the reference type for each
               nonterminal (e.g. IExp)
            -> and then specify variable names for nonterminals on the
               rhs of a production to access their attribute in the
               action code
        - In book, Figure 2.39 provides an example of how an AST is
          generated using actions. So do Figures 5.10 and 5.11.
    (slide 5)
        - Declarations for nonterminals and terminals in grammar for JavaCUP
            non terminal IExp exp;
            non terminal List statement_list;

        - Action for building PlusExp in JavaCUP
            | exp:a PLUS exp:b {: RESULT = new PlusExp(a,b); :}
            | ( E:e ) {: RESULT = e; :}

            statement_list ::= statement_list:list statement:s
                    {: if (s!=null) { list.add(s); } RESULT = list; :}
                | /* epsilon */
                    {: RESULT = new LinkedList(); :}
                ;

----------------------------
Abstract Syntax Tree Example

    Example
        2 + (3 - 1)

    Parse tree
        -> using grammar for PA3, have students indicate what is in the
           parse tree

    AST node class hierarchy
        -> provide students with a possible AST node class hierarchy
           (go back to slide 3)

    Generating AST while parsing
        -> have students indicate how to represent the example expression
           using the given AST node class hierarchy and the order in
           which the nodes will be constructed

----------------------
Visitor Design Pattern (slide 6)

    Situation
        - Want to perform some processing on all items in a data structure
        - Will be adding many different ways to process items, different
          features
        - Will not be changing the classes of the data structure itself much

    Possibilities
        - For each functionality add a method to all of the classes
            - Each new functionality is spread over multiple files
            - Sometimes can't add methods to existing class hierarchy
        - Use a large if-then-else statement in visit method
            pro: keeps all the code for the feature in one place
            con: can be costly and involve lots of casting
        - Visitor design pattern

----------------------
Visitor Design Pattern

    Example
        class Degree { String who; }
        subclasses BS MS Ph.D.

        List<Degree> peopleList;

        - want algorithm that tells a joke. Joke involves a different
          string per subclass found in the list.
            BSstring = "Think you know everything"
            MSstring = "Think you don't know anything"
            PHDstring = "Realize neither does anyone else"
        - want another algorithm that pretty prints degree
        - other ideas for algorithms that operate on this list?
        - options
            1) Put a method for each algorithm into each of the subclasses.
               If there are a lot of different subclasses (think ast.node),
               then this becomes quite cumbersome and spread out.
            2) Have each algorithm use instanceof and do a lot of casting.
               How do you know if all were covered?
               Have to write the traversal code in every algorithm.
            3) Visitor Design pattern
               - Each subclass has an apply/accept method that accepts a
                 Visitor/Switch interface.
               - Each algorithm satisfies the Visitor/Switch interface and
                 can write code specific to each subclass in data structure.
               - Visitor interface can do traversal or leave that to the
                 specific algorithm.

-----------------
version where the algorithm does the traversal itself

    public interface Visitor {
        void visitMS( MS degree );
        void visitBS( BS degree );
        void visitPHD( PHD degree );
    }

    public interface Degree {
        void accept( Visitor v );
        Degree getNext();
    }

    public class MS implements Degree {
        public void accept( Visitor v ) { v.visitMS(this); }
        public Degree getNext() { ... }
    }

    // algorithm
    public class Joke implements Visitor {
        public void visitBS( BS degree ) {
            System.out.println(BSstring);
            // the algorithm itself moves on to the next list element
            if (degree.getNext() != null) { degree.getNext().accept(this); }
        }
        public void visitMS( MS degree ) {
            System.out.println(MSstring);
            if (degree.getNext() != null) { degree.getNext().accept(this); }
        }
        public void visitPHD( PHD degree ) {
            System.out.println(PHDstring);
            if (degree.getNext() != null) { degree.getNext().accept(this); }
        }
    }

    // using the algorithm
    Degree dl = ... something that initializes the list ...
    Joke myj = new Joke();
    dl.accept(myj);

-----------------------------------------------
Provided Visitor Design Pattern Implementation

    -> show them the class hierarchy (slide 7)
    -> show slide with excerpts from various parts of code defining and
       using the visitor classes (slide 8)
    -> show Visitor
    -> show DepthFirstVisitor
        defaultOut
        defaultIn
    -> step through the AVRgenVisitor with the Eclipse debugger
    -> bring up PA4OneIfStmt.java and show where in and out methods are
       called during a depth-first traversal of the nodes in that tree

    - Annotations such as @SuppressWarnings("unused") are used to
      suppress javac compilation warnings.
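
    The order in which a depth-first visitor calls its in and out methods
    can be seen on a tiny tree. The sketch below is NOT the provided
    ast.visitor code: Node, IntegerExp, PlusExp, Visitor, and
    DepthFirstVisitor here are simplified, hypothetical stand-ins (for
    instance, IntegerExp takes an int instead of a Token), and TraceVisitor
    and TraceDemo are names made up for the illustration. It only assumes
    the general shape described above: each visit method calls an in hook,
    visits the children, then calls an out hook, and the hooks fall back to
    defaultIn/defaultOut.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch; every class here is a hypothetical stand-in for ast.node / ast.visitor.
class TraceDemo {
    interface Visitor {
        void visitIntegerExp(IntegerExp node);
        void visitPlusExp(PlusExp node);
    }

    static abstract class Node {
        abstract void accept(Visitor v);
    }

    static class IntegerExp extends Node {
        final int value;   // the real node is built from a Token; an int keeps the sketch short
        IntegerExp(int value) { this.value = value; }
        void accept(Visitor v) { v.visitIntegerExp(this); }
    }

    static class PlusExp extends Node {
        private final Node lExp, rExp;
        PlusExp(Node lExp, Node rExp) { this.lExp = lExp; this.rExp = rExp; }
        Node getLExp() { return lExp; }
        Node getRExp() { return rExp; }
        void accept(Visitor v) { v.visitPlusExp(this); }
    }

    // Owns the traversal; subclasses override only the hooks they care about.
    static class DepthFirstVisitor implements Visitor {
        public void defaultIn(Node node) { }
        public void defaultOut(Node node) { }

        public void inIntegerExp(IntegerExp node) { defaultIn(node); }
        public void outIntegerExp(IntegerExp node) { defaultOut(node); }
        public void visitIntegerExp(IntegerExp node) {
            inIntegerExp(node);
            outIntegerExp(node);
        }

        public void inPlusExp(PlusExp node) { defaultIn(node); }
        public void outPlusExp(PlusExp node) { defaultOut(node); }
        public void visitPlusExp(PlusExp node) {
            inPlusExp(node);                 // before the children
            node.getLExp().accept(this);
            node.getRExp().accept(this);
            outPlusExp(node);                // after the children
        }
    }

    // Records every hook call so the traversal order can be inspected.
    static class TraceVisitor extends DepthFirstVisitor {
        final List<String> trace = new ArrayList<>();
        @Override public void defaultIn(Node node) { trace.add("in " + label(node)); }
        @Override public void defaultOut(Node node) { trace.add("out " + label(node)); }
        private static String label(Node node) {
            if (node instanceof IntegerExp) {
                return "IntegerExp(" + ((IntegerExp) node).value + ")";
            }
            return node.getClass().getSimpleName();
        }
    }

    // Builds the AST for 2 + 3 and returns the recorded hook calls.
    static List<String> run() {
        Node root = new PlusExp(new IntegerExp(2), new IntegerExp(3));
        TraceVisitor v = new TraceVisitor();
        root.accept(v);
        return v.trace;
    }

    public static void main(String[] args) {
        for (String line : run()) {
            System.out.println(line);
        }
    }
}
```

    For 2 + 3 the in call for PlusExp comes first and its out call comes
    last, with each IntegerExp's in/out pair in between. Work that needs
    results from the children therefore belongs in an out method.
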
-----------------
version of the joke visitor that uses something similar to what is
provided in ast.visitor

    public interface Visitor {
        void visitBS(BS degree);
        void visitMS(MS degree);
        void visitPhD(PhD degree);
        ...
    }

    public interface Degree extends IVisitable {
        void accept( Visitor v );  // in IVisitable
        Degree getNext();
    }

    // setup in subclasses, only need to do this once
    public class BS implements Degree {
        public void accept( Visitor v ) { v.visitBS(this); }
    }
    public class MS implements Degree {
        public void accept( Visitor v ) { v.visitMS(this); }
    }

    // implementing an algorithm with a visitor;
    // ListTraversalVisitor does the traversal, so Joke only prints
    public class Joke extends ListTraversalVisitor {
        public void visitBS( BS degree ) { System.out.println(BSstring); }
        public void visitMS( MS degree ) { System.out.println(MSstring); }
        ...
    }

    // using the algorithm
    Degree dl = ...;
    Joke myj = new Joke();
    dl.accept(myj);

-----------
FAQ (slide 10)

    - How do we associate data to nodes?
        Use generic (templated) HashMaps that map nodes to information
        such as AST node references.

    - What if we want to do the same logic at each node in data structure?
        -> Show DotVisitor.java, which just defines defaultCase,
           defaultIn, and defaultOut.

----------------------------
The Dot Visitor

    -> show the reference compiler running and look at the generated
       dot file
        % java -jar MJ_PA4.jar TODO example that used at beginning

        // generating a png file of the graph
        % dot -Tpng -otest.png test.dot

        // generating a pdf file of the graph
        % dot -Tps2 -Gsize=64,64 -Gmargin=0 -otest.ps2 test.dot
        % ps2pdf test.ps2 test.pdf

        // use dotty on PCs or GraphViz on Macs to do interactive
        // rendering

    -> show how the DotVisitor is called in the reference compiler
        ast.node.Node ast_root = (ast.node.Node)parser.parse().value;
        java.io.PrintStream astout = new java.io.PrintStream(
            new java.io.FileOutputStream(filename + ".ast.dot"));
        ast_root.accept(new DotVisitor(new PrintWriter(astout)));

    -> show the DotVisitor implementation
        It overrides the defaultIn and defaultOut methods and will be
        provided for you.

====> On March 1, 2011 we got here.
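
    To make the DotVisitor idea above concrete, here is a minimal sketch of
    a visitor that emits Graphviz dot text using nothing but defaultIn and
    defaultOut. It is an illustration under assumptions, not the provided
    DotVisitor: the Node and DepthFirstVisitor classes are simplified
    stand-ins (a single visit method instead of one per node type), and the
    node numbering, label format, and graph name are invented here.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Minimal sketch; all classes are hypothetical stand-ins, not the provided ones.
class DotSketch {
    static abstract class Node {
        List<Node> children() { return Collections.emptyList(); }
        void accept(DepthFirstVisitor v) { v.visit(this); }
    }

    static class IntegerExp extends Node { }

    static class PlusExp extends Node {
        private final Node lExp, rExp;
        PlusExp(Node lExp, Node rExp) { this.lExp = lExp; this.rExp = rExp; }
        @Override List<Node> children() { return Arrays.asList(lExp, rExp); }
    }

    // Simplified traversal: one visit method shared by every node type.
    static class DepthFirstVisitor {
        void defaultIn(Node node) { }
        void defaultOut(Node node) { }
        void visit(Node node) {
            defaultIn(node);
            for (Node child : node.children()) {
                child.accept(this);
            }
            defaultOut(node);
        }
    }

    static class DotVisitor extends DepthFirstVisitor {
        private final PrintWriter out;
        private final Deque<Integer> ancestors = new ArrayDeque<>(); // ids on the current path
        private int nextId = 0;

        DotVisitor(PrintWriter out) { this.out = out; }

        @Override void defaultIn(Node node) {
            int id = nextId++;
            if (ancestors.isEmpty()) {
                out.println("digraph AST {");      // entering the root: open the graph
            }
            out.println("  " + id + " [label=\"" + node.getClass().getSimpleName() + "\"];");
            if (!ancestors.isEmpty()) {
                out.println("  " + ancestors.peek() + " -> " + id + ";"); // edge from parent
            }
            ancestors.push(id);
        }

        @Override void defaultOut(Node node) {
            ancestors.pop();
            if (ancestors.isEmpty()) {
                out.println("}");                  // leaving the root: close the graph
                out.flush();
            }
        }
    }

    // Renders the AST for an expression of the form int + int as dot text.
    static String render() {
        StringWriter buffer = new StringWriter();
        Node root = new PlusExp(new IntegerExp(), new IntegerExp());
        root.accept(new DotVisitor(new PrintWriter(buffer)));
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(render());
    }
}
```

    The stack of ancestor ids is what lets defaultIn draw the edge from
    parent to child without any per-node-type code, which is why a visitor
    like this needs only the two default hooks.
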
-------------------
Examples to test understanding

    - put example on board and ask what numbering we get, preorder or
      postorder?

        public class Test1 extends DepthFirstVisitor {
            private HashMap<Node,Integer> mNodeToNum =
                new HashMap<Node,Integer>();
            int count = 0;
            ...
            public void defaultIn(Node node) {
                mNodeToNum.put(node,count);
                count++;
            }
        }

    Suggested exercise: how would we write a visitor that reprints the
    expressions with parentheses around all of the sub-expressions so as
    to indicate the full expression evaluation order?

---------------------------------------------------------------------
Evaluating byte and integer expressions in the code generator visitor

    Map expression nodes to an integer value.
    Look up child values, perform operation, and then map computed value
    to current node.

    // Snippets from concrete visitor.
    class AVRgenVisitor extends DepthFirstVisitor {
        HashMap<Node,Integer> mExpVal = new HashMap<Node,Integer>();

        public void outPlusExp(PlusExp node) {
            // Look up child values,
            Integer lexpval = mExpVal.get(node.getLExp());
            Integer rexpval = mExpVal.get(node.getRExp());
            // perform operation,
            Integer value = lexpval + rexpval;
            // and then map computed value to current node.
            mExpVal.put(node, value);
        }

        public void outByteCast(ByteCast node) {
            // Look up child value,
            Integer expval = mExpVal.get(node.getExp());
            // perform operation,
            Integer value = expval;
            // and then map computed value to current node.
            mExpVal.put(node, value);
        }
    }

---------------
Debugging Ideas

    - System.out.println in parser actions
    - break points in visitor methods
        -> actually do an example in Eclipse
        -> step through the apply on the root and through all of the
           applys on a subtree so that the functionality of the given
           analysis classes is introduced
        -> show how to set a break point
    - more debugging ideas will be covered in recitation...
      [override defaultOut to print an error message if the node has
      not been implemented yet.]

------------------------
mstrout@cs.colostate.edu, 3/1/11
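
    As a worked instance of the expression-evaluation scheme in these notes
    (look up child values, perform the operation, map the result to the
    current node), the sketch below evaluates the earlier example
    2 + (3 - 1). The node and visitor classes are simplified, hypothetical
    stand-ins for the provided ones: IntegerExp takes an int rather than a
    Token, only out hooks are modeled, and EvalVisitor and EvalSketch are
    names chosen for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch; the AST and visitor classes are stand-ins, not the provided code.
class EvalSketch {
    interface Visitor {
        void visitIntegerExp(IntegerExp node);
        void visitPlusExp(PlusExp node);
        void visitMinusExp(MinusExp node);
    }

    static abstract class Node {
        abstract void accept(Visitor v);
    }

    static class IntegerExp extends Node {
        final int value;
        IntegerExp(int value) { this.value = value; }
        void accept(Visitor v) { v.visitIntegerExp(this); }
    }

    static class PlusExp extends Node {
        private final Node lExp, rExp;
        PlusExp(Node lExp, Node rExp) { this.lExp = lExp; this.rExp = rExp; }
        Node getLExp() { return lExp; }
        Node getRExp() { return rExp; }
        void accept(Visitor v) { v.visitPlusExp(this); }
    }

    static class MinusExp extends Node {
        private final Node lExp, rExp;
        MinusExp(Node lExp, Node rExp) { this.lExp = lExp; this.rExp = rExp; }
        Node getLExp() { return lExp; }
        Node getRExp() { return rExp; }
        void accept(Visitor v) { v.visitMinusExp(this); }
    }

    // Children first, then the out hook, so child values exist when the hook runs.
    static class DepthFirstVisitor implements Visitor {
        public void outIntegerExp(IntegerExp node) { }
        public void outPlusExp(PlusExp node) { }
        public void outMinusExp(MinusExp node) { }

        public void visitIntegerExp(IntegerExp node) { outIntegerExp(node); }
        public void visitPlusExp(PlusExp node) {
            node.getLExp().accept(this);
            node.getRExp().accept(this);
            outPlusExp(node);
        }
        public void visitMinusExp(MinusExp node) {
            node.getLExp().accept(this);
            node.getRExp().accept(this);
            outMinusExp(node);
        }
    }

    static class EvalVisitor extends DepthFirstVisitor {
        // Maps each expression node to its computed value.
        final Map<Node, Integer> mExpVal = new HashMap<>();

        @Override public void outIntegerExp(IntegerExp node) {
            mExpVal.put(node, node.value);
        }
        @Override public void outPlusExp(PlusExp node) {
            int value = mExpVal.get(node.getLExp()) + mExpVal.get(node.getRExp());
            mExpVal.put(node, value);
        }
        @Override public void outMinusExp(MinusExp node) {
            int value = mExpVal.get(node.getLExp()) - mExpVal.get(node.getRExp());
            mExpVal.put(node, value);
        }
    }

    // Runs the visitor over the tree and returns the value mapped to the root.
    static int eval(Node root) {
        EvalVisitor v = new EvalVisitor();
        root.accept(v);
        return v.mExpVal.get(root);
    }

    public static void main(String[] args) {
        Node root = new PlusExp(new IntegerExp(2),
                                new MinusExp(new IntegerExp(3), new IntegerExp(1)));
        System.out.println(eval(root)); // prints 4
    }
}
```

    Because the values are stored in a map keyed by node rather than in the
    nodes themselves, the AST classes stay purely representational and
    other visitors can attach their own data the same way.
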