CS453 Colorado State University
==========================================
Top-Down Predictive Parsers
==========================================

class announcements
    - HW2 and PA2 have been posted
    - expect Quiz 2 to be posted Friday night for completion by Monday night

---------------
Goals for Today

    - Finish predictive parser for Mesh grammar
    - Predictive parser for MiniSVG
    - General error recovery for predictive parsers
    - General parsing versus top-down predictive
        - complexity
        - requirements on the grammar
        - the need to remove left-recursion for a recursive-descent parser
            - use MiniSVG modification as an example
        - the need to sometimes perform left-factoring
    - MeggyJava language intro

-----------------------------
Predictive Parser Table

    A predictive parser is a recursive descent parser that does not require
    backtracking. Using the FIRST and FOLLOW sets we can construct a parse
    table that, for each pairing of nonterminal and terminal, indicates the
    relevant production rule.

    -> construct the nullable, FIRST, and FOLLOW sets for the Mesh grammar
       [Put on board somewhere that I can keep this info]

                 nullable   FIRST        FOLLOW
    -----------------------------------------------
    start      | false      {NUM}        {}
    mesh       | false      {NUM}        {EOF}
    node       | false      {NODE}       {NODE,NUM}
    elem       | false      {TRI,SQR}    {TRI,SQR,EOF}
    node_list  | true       {NODE}       {NUM}
    elem_list  | true       {TRI,SQR}    {EOF}

    -> use predictive parser table algorithm on slide to construct table
       for Mesh grammar [keep parse table on white board]

                 NUM     REAL    TRI     SQR     EOF
    ---------------------------------------------------------------
    start      |
    mesh       |
    node       |
    elem       |
    node_list  |
    elem_list  |

    -> show relationship between table and switch statements

-----------------------------
Error Recovery

    Goals
        - Provide the programmer with a list of as many errors as possible
        - Provide USEFUL error messages
            - appropriate line and position information
            - guidance for fixing the error
        - Avoid infinite loops or recursion
        - Add minimal
          overhead to the processing of correct programs

    Approaches
        - Stop after first error
        - Panic mode, skip tokens until a synchronizing token is found
            (1) in function A() use FOLLOW(A) as set of synchronizing tokens
            (2) might want to also include start of other statement constructs
            (3) could put FIRST(A) in synchronizing tokens and restart
                parsing A
            (4) Use all tokens other than current error token as
                synchronizing set. Attempt to continue in same nonterminal
                function.
        - Phrase-level recovery

-------------------------------------
Panic mode using heuristic (1)

    Predictive Parser
        - each nonterminal requires a function definition
        - the function for nonterminal X should have one phrase for each
          possible production rule for X. A phrase includes a case for every
          token in the FIRST set for the rhs of the production, a case for
          each token in FOLLOW(X) if the rhs is nullable, and calls to match
          the tokens or calls to other nonterminal functions to process the
          rhs of the production.
        - For panic mode, skip tokens until the lookahead is in the FOLLOW
          set of the nonterminal currently being parsed

        void match(Token tok) {
            if (tok == lookahead) {
                lookahead = scan();     // consume the token and advance
            } else {
                throw new Error(message);
            }
        }

        // write one panic method for each nonterminal
        void panic_nonterminal() {
            print error;
            // skip tokens, keeping lookahead up to date, until a
            // synchronizing token (or EOF) is the current lookahead
            while ( lookahead not in (FOLLOW(nonterminal) union {EOF}) ) {
                lookahead = scan();
            }
        }

    Float Assignments Grammar

        S       -> StmList EOF
        Stm     -> id assign float
        StmList -> Stm StmList
        StmList -> epsilon

    What is the FOLLOW set for each nonterminal?
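One way to check an answer to this question is to compute the sets mechanically. The following is a minimal sketch in Java of the usual fixed-point computation of nullable, FIRST, and FOLLOW; it is not code from the course. It assumes the list production is StmList -> Stm StmList, and the class name FloatAssignSets and its fields are hypothetical names chosen only for this illustration.

```java
import java.util.*;

// Hypothetical helper, not from the course code: computes nullable, FIRST,
// and FOLLOW for the Float Assignments grammar by fixed-point iteration.
public class FloatAssignSets {
    // Each production is {lhs, rhs symbols...}; a bare {lhs} is lhs -> epsilon.
    // Assumption: the list rule is StmList -> Stm StmList.
    static final String[][] PRODUCTIONS = {
        {"S", "StmList", "EOF"},
        {"Stm", "id", "assign", "float"},
        {"StmList", "Stm", "StmList"},
        {"StmList"}
    };
    static final List<String> NONTERMINALS = Arrays.asList("S", "Stm", "StmList");

    final Set<String> nullable = new TreeSet<>();
    final Map<String, Set<String>> first = new TreeMap<>();
    final Map<String, Set<String>> follow = new TreeMap<>();

    FloatAssignSets() {
        for (String nt : NONTERMINALS) {
            first.put(nt, new TreeSet<>());
            follow.put(nt, new TreeSet<>());
        }
        boolean changed = true;
        while (changed) {                       // repeat until no set grows
            changed = false;
            for (String[] p : PRODUCTIONS) {
                changed |= updateFirst(p);
                changed |= updateFollow(p);
            }
        }
    }

    // FIRST of a single grammar symbol (a terminal is its own FIRST set).
    private List<String> firstOf(String symbol) {
        return NONTERMINALS.contains(symbol)
                ? new ArrayList<>(first.get(symbol))
                : Collections.singletonList(symbol);
    }

    // Scan the rhs left to right while the prefix seen so far is nullable.
    private boolean updateFirst(String[] p) {
        boolean changed = false;
        boolean prefixNullable = true;
        for (int i = 1; i < p.length && prefixNullable; i++) {
            changed |= first.get(p[0]).addAll(firstOf(p[i]));
            prefixNullable = nullable.contains(p[i]);
        }
        if (prefixNullable) changed |= nullable.add(p[0]);
        return changed;
    }

    // For each nonterminal in the rhs, add FIRST of what follows it, and
    // FOLLOW(lhs) if everything after it is nullable.
    private boolean updateFollow(String[] p) {
        boolean changed = false;
        for (int i = 1; i < p.length; i++) {
            if (!NONTERMINALS.contains(p[i])) continue;
            boolean restNullable = true;
            for (int j = i + 1; j < p.length && restNullable; j++) {
                changed |= follow.get(p[i]).addAll(firstOf(p[j]));
                restNullable = nullable.contains(p[j]);
            }
            if (restNullable) {
                changed |= follow.get(p[i]).addAll(new ArrayList<>(follow.get(p[0])));
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        FloatAssignSets sets = new FloatAssignSets();
        for (String nt : NONTERMINALS) {
            System.out.println(nt + ": nullable=" + sets.nullable.contains(nt)
                    + " FIRST=" + sets.first.get(nt)
                    + " FOLLOW=" + sets.follow.get(nt));
        }
    }
}
```

Running main prints one line per nonterminal. The FOLLOW sets it reports are exactly the synchronizing sets that panic-mode heuristic (1) would use in each nonterminal's function.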
        symbol or string   nullable   FIRST      FOLLOW
        Stm                no         id         id, EOF
        StmList            yes        id         EOF
        S                  no         id, EOF    none

    - (slide 5) show the predictive parser code that has been modified to
      have match(tok) throw an exception and has try and catch blocks around
      the processing of each rhs of production rule

-----------------------------
Left Recursion

    -> show slide with modified MiniSVG grammar (slide 6)
    -> calculate nullable, FIRST, and FOLLOW sets
    -> create predictive parse table and observe problem
    -> show example parse tree for modified MiniSVG slide (slide 7)

    Removing Left Recursion
        - For lists, just make it right recursive.
        - For general case, see 4.3.3 in book.
        - Will not be tested on how to remove left recursion in the general
          case, but you should be able to make a left-recursive list grammar
          into a right-recursive list grammar.

    Observation
        - In our mesh grammar, what if node_list -> epsilon | node_list node?
          In other words, what if it were left recursive?
        - We would have to know ahead of time how many nodes were in the
          list, which we do, but that is part of the language semantics and
          not part of the syntax or grammar. Using that info would make the
          language context-sensitive instead of context-free.

-----------------------------
Left Factoring

    There is more than one way to specify a circle. What if we had the
    following two productions in the MiniSVG grammar?

        elem -> CIRCLE_START KW_CX EQ NUM KW_CY EQ NUM KW_R EQ NUM
                KW_FILL EQ COLOR ELEM_END
    and
        elem -> CIRCLE_START KW_X EQ NUM KW_Y EQ NUM KW_WIDTH EQ NUM
                KW_HEIGHT EQ NUM KW_FILL EQ COLOR ELEM_END

    Both production rules have the same FIRST set, {CIRCLE_START}, so a
    predictive parser cannot choose between them with one token of lookahead.
    Left factor as follows:

        elem   -> CIRCLE_START circle
        circle -> KW_CX EQ NUM KW_CY EQ NUM KW_R EQ NUM
                  KW_FILL EQ COLOR ELEM_END
    and
        circle -> KW_X EQ NUM KW_Y EQ NUM KW_WIDTH EQ NUM
                  KW_HEIGHT EQ NUM KW_FILL EQ COLOR ELEM_END

-----------------------------
Predictive Parsing Complexity

    grammar classes
        LL(k) - left-to-right scan, left-most derivation, k tokens of
                lookahead

    General parsing versus top-down predictive
        - complexity of general parsing is O(N^3), where N is the number of
          tokens
            Earley parser does a form of dynamic programming:
                O(N^3) in the general case
                O(N^2) for unambiguous grammars
                O(N) for most LR(k) grammars
            http://en.wikipedia.org/wiki/Earley_parser
        - complexity of predictive parsing is O(N)
        - requirements on the grammar
            - for all the productions for nonterminal A
                - none of the FIRST(rhs) sets for the A production rules
                  can overlap
                  [-> ask students: if they do overlap, what do we do?]
                - if nullable(A)==true, then FOLLOW(A) must not overlap with
                  FIRST(rhs) for any A->rhs

    -> Note that MiniSVG grammar satisfies these constraints.

-----------------------------
MeggyJava Intro

    Overview PA2 writeup.

------------------------
mstrout@cs.colostate.edu, 1/25/10