CS453 Colorado State University
==============================================
LR Parse Tables
==============================================

-------------------------
Outline
- Bottom-up parsing with LR parse tables
- JavaCUP debug information
- Using an LR parse table
- Building the LR parse table for LR(0)
- Building an LR(1) parse table
- Debugging shift/reduce errors
- Relationship between LR parse tables and pushdown automata

-------------------------
JavaCUP debug information (review)

See AmbiguousGrammarExamples.tar, posted on the recitation website.

    java -jar java-cup-11a.jar -dump minimal.cup >& LRtable.txt

In minimal.cup (or wherever you call the parser):

    parser code {:
        public static void main(String args[]) throws Exception {
            new parser(new Yylex(System.in)).debug_parse();
        }
    :}

-> Show them the generated output.

Today is about understanding the LR parse table output and the debug_parse
output, and learning how to use them to debug grammars.

-------------------------
Bottom-Up Parsing with LR parse tables (review)

bottom-up parsing
- “builds” the parse tree from the leaves to the root
- performs a rightmost derivation in reverse
- a reduction is the reverse of a derivation step

shift-reduce parsing
- a form of bottom-up parsing
- grammar symbols are held on the stack and the token string is in the input buffer
- an LR(k) parser is a kind of shift-reduce parser
    - Left-to-right scan of the input
    - Rightmost derivation (in reverse)
    - k tokens of lookahead

-----------------------
Example LR parse table (review)

- You can find another good example in the book and in the Wikipedia entry
  for LR parser. Working through examples is important for understanding
  the material and for studying for the final.

- rules for the parse table
  Look at the state on top of the stack and the input symbol to get the action.
  shift(n):  advance the input, push n on the stack
    - We put the input symbol on the stack as well to help us humans keep
      track. As described in the book, the computer only needs the state numbers.
  reduce(k): pop the rhs of grammar rule k;
             look up the state now on top of the stack and lhs(k) to get goto n;
             push lhs(k) and n onto the stack
  accept:    stop and succeed
  error:     stop and fail

- Parsing the string ((a)) using the LR parse table generated by JavaCUP.
  - See the slide for the grammar and the LR parse table.
  - See how the JavaCUP output corresponds to the LR parse table.

  Stack        Input    Action
  0            ((a))$   shift 3
  0(3          (a))$    shift 3
  0(3(3        a))$     shift 1
  0(3(3a1      ))$      reduce by [3] S -> ID, goto 4
  0(3(3S4      ))$      shift 5
  0(3(3S4)5    )$       reduce by [0] S -> ( S ), goto 4
  0(3S4        )$       shift 5
  0(3S4)5      $        reduce by [0] S -> ( S ), goto 2
  0S2          $        accept

--------------------
How to build an LR(0) parse table (new material in the rest of these notes)

- item: a grammar rule with a dot indicating the position in the parse

- Closure(I), where I is a set of items
    repeat
      for all A -> alpha . X beta in I,
          where alpha and beta are strings and X is a nonterminal
        for all rules X -> gamma
          I = I union { X -> . gamma }
    until I does not change
    return I

- Goto(I, X), where I is a set of items and X is a nonterminal or terminal
    the set of items J obtained by parsing X:
    J = {}
    for all A -> alpha . X beta in I
      add A -> alpha X . beta to J
    return Closure(J)

-> Use the above definitions to create a graph for the nested parentheses
   example. See slides for the answer.

- Produce the LR parse table
  0) Make a table with the states down the first column and the tokens and
     nonterminals across the top
  1) I -- X --> J, where X is a terminal: put shift J at (I,X)
  2) I -- X --> J, where X is a nonterminal: put goto J at (I,X)
  3) S' -> S . $ in I: put accept at (I,$)
  4) A -> gamma . in I, where A -> gamma is rule (k): put reduce k at (I,Y)
     for every token Y

-------------------------
But if we need to know the next token, how is this LR(0)?
- Notice that once we are in a particular state, we end up with the same
  reduction no matter what the next token is. The decision really depends on
  what is on the stack. The table is laid out this way so that the same LR
  parse table format can be used for LR(0) and LR(1).

-----------------------
Another example (item sets are shown)

(0) S' -> . E $
(1) E -> . E || B
(2) E -> . B
(3) B -> . t
(4) B -> . f

S' -> E . $
E -> E . || B

E -> B .

B -> t .

B -> f .

E -> E || . B
B -> . t
B -> . f

E -> E || B .

--------------------
LR(1) parse tables

Sometimes lookahead is necessary: in a particular state, the parser decides
whether to shift or to reduce based on the lookahead token. To create an LR(1)
parse table, we need to construct FIRST sets (which requires knowing which
symbols are nullable); FOLLOW sets are reviewed alongside them.

Example showing the need for LR(1):
    S -> epsilon | b S
- Start making an LR(0) table and you run into trouble: the start state
  contains both S -> . (reduce on every token) and S -> . b S (shift on b).

--------------------
First and Follow Sets

- why?
  - both are needed to construct a predictive parser, which is a type of
    top-down parser
  - for LR(1) parsers, FIRST is used to determine which tokens can start the
    string after a nonterminal:
        (A -> alpha . X beta, z)    what is FIRST[beta z]?

- Go through an example as a review of the definitions.
    S  -> ( S )
    S  -> ID
    S' -> S $

  FIRST[ terminal ] = { terminal }

  FIRST[ X ], where X is a nonterminal
    = union of FIRST[ rhs ] over all rules X -> rhs, where FIRST[ rhs ] is the
      union of FIRST[ Y ] for the symbols Y on the rhs, up to and including
      the first nonnullable symbol

  FIRST[ X gamma ] = FIRST[X] union FIRST[ gamma ]   if X is nullable
                   = FIRST[X]                        otherwise

  FOLLOW[ Y ], where Y is a nonterminal
    look for Y in the rhs of every rule A -> alpha Y beta;
    union the FIRST sets of the symbols after Y, up to and including the first
    nonnullable symbol; if everything after Y is nullable, also union in FOLLOW[ A ]

  string    FIRST     nullable    FOLLOW
  ------------------------------------------
  ( S )     (         false
  S         ( ID      false       ) $

-----------------------
How to build an LR(1) parse table

- an item is (A -> alpha . X beta, z)
    A is a nonterminal
    alpha and beta are strings of terminals and nonterminals
    alpha is on top of the stack
    X is a terminal or nonterminal
    z is a terminal/token (the lookahead)

- Closure(I), where I is a set of items
    repeat
      for all (A -> alpha . X beta, z) in I,
          where alpha and beta are strings and X is a nonterminal
        for all rules X -> gamma and all w in FIRST(beta z)
          I = I union { (X -> . gamma, w) }
    until I does not change
    return I

- Goto(I, X), where I is a set of items and X is a nonterminal or terminal
    J = {}
    for all (A -> alpha . X beta, z) in I
      add (A -> alpha X . beta, z) to J
    return Closure(J)

- Use the above definitions to create a graph for the following grammar:
    (0) S' -> E $
    (1) E  -> T + E
    (2) E  -> T
    (3) T  -> x

- Produce the LR(1) parse table
  1) I -- X --> J, where X is a terminal: put shift J at (I,X)
  2) I -- X --> J, where X is a nonterminal: put goto J at (I,X)
  3) S' -> S . $ in I: put accept at (I,$)
  4) (A -> gamma ., w) in I, where A -> gamma is rule (k): put reduce k at
     (I,w), only for the lookahead token w (the only change from LR(0))

------------------------------------------
Debugging shift/reduce errors

Example: two zero-or-more lists right next to each other whose repeated
items start with the same token

    prog ::= dl sl ;
    decl ::= ID ID SEMI;
    dl   ::= | dl decl;
    stm  ::= ID EQ;
    sl   ::= | sl stm;

    Warning : *** Shift/Reduce conflict found in state #1
      between sl ::= (*)
      and     decl ::= (*) ID ID SEMI
      under symbol ID
      Resolved in favor of shifting.

    START lalr_state [0]: {
      [dl ::= (*) dl decl , {EOF ID }]
      [$START ::= (*) prog EOF , {EOF }]
      [dl ::= (*) , {EOF ID }]
      [prog ::= (*) dl sl , {EOF }]
    }
    transition on prog to state [2]
    transition on dl to state [1]

    -------------------
    lalr_state [1]: {
      [decl ::= (*) ID ID SEMI , {EOF ID }]
      [sl ::= (*) sl stm , {EOF ID }]
      [dl ::= dl (*) decl , {EOF ID }]
      [sl ::= (*) , {EOF ID }]
      [prog ::= dl (*) sl , {EOF }]
    }
    transition on decl to state [6]
    transition on ID to state [5]
    transition on sl to state [4]

Intuition for the problem
- In state [1], when we see an ID we can either shift it onto the stack to
  start parsing a decl, or reduce using the (sl -> epsilon) rule, because ID
  is in the lookahead set of sl ::= (*).

Solution
- Give the (sl -> epsilon) reduction a different lookahead by making
  (sl -> sl stm) right recursive, so it becomes (sl -> stm sl). With right
  recursion an empty sl is only ever followed by EOF, so sl ::= (*) no longer
  has ID as a lookahead.
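The Warning above can be reproduced by hand with the LR(1) Closure definition. The Java sketch below is a minimal illustration under stated assumptions, not how JavaCUP is implemented: the class name Lr1ConflictCheck, the Prod/Item records, and the choice of one lookahead token per item are all made up for this example, and it needs Java 16+ for records. It computes nullable and FIRST by fixed-point iteration, closes the kernel of lalr_state [1] (the prog and dl items reached from state [0] on dl, with the lookaheads JavaCUP reports), and returns every token on which the resulting state can both shift and reduce.

```java
import java.util.*;

// Hypothetical helper for illustration (not JavaCUP's code): LR(1) Closure plus a shift/reduce check.
class Lr1ConflictCheck {
    record Prod(String lhs, List<String> rhs) {}  // lhs ::= rhs; an empty rhs is an epsilon rule
    // LR(1) item (lhs ::= alpha . X beta, la), one lookahead token per item
    record Item(Prod p, int dot, String la) {
        String next() { return dot < p.rhs().size() ? p.rhs().get(dot) : null; }
    }

    final List<Prod> prods = new ArrayList<>();
    final Set<String> nonterms = new HashSet<>(), nullable = new HashSet<>();
    final Map<String, Set<String>> first = new HashMap<>();
    void add(String lhs, String... rhs) {
        prods.add(new Prod(lhs, List.of(rhs)));
        nonterms.add(lhs);
    }
    Set<String> firstOf(String sym) { return nonterms.contains(sym) ? first.get(sym) : Set.of(sym); }

    // Fixed-point iteration: repeat until nullable and FIRST stop changing.
    void computeFirst() {
        for (String n : nonterms) first.put(n, new HashSet<>());
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Prod p : prods) {
                Set<String> f = first.get(p.lhs());
                boolean allNullable = true;
                for (String s : p.rhs()) {  // up to and including the first nonnullable symbol
                    changed |= f.addAll(new ArrayList<>(firstOf(s)));
                    if (!nullable.contains(s)) { allNullable = false; break; }
                }
                if (allNullable) changed |= nullable.add(p.lhs());
            }
        }
    }

    // FIRST(beta z): z is included only if all of beta is nullable.
    Set<String> firstOfSeq(List<String> beta, String z) {
        Set<String> out = new HashSet<>();
        for (String s : beta) {
            out.addAll(firstOf(s));
            if (!nullable.contains(s)) return out;
        }
        out.add(z);
        return out;
    }
    // Closure(I): for (A -> alpha . X beta, z) with X a nonterminal, add
    // (X -> . gamma, w) for every rule X -> gamma and every w in FIRST(beta z).
    Set<Item> closure(Set<Item> kernel) {
        Set<Item> result = new LinkedHashSet<>(kernel);
        Deque<Item> work = new ArrayDeque<>(kernel);
        while (!work.isEmpty()) {
            Item it = work.pop();
            String x = it.next();
            if (x == null || !nonterms.contains(x)) continue;
            List<String> rhs = it.p().rhs();
            for (String w : firstOfSeq(rhs.subList(it.dot() + 1, rhs.size()), it.la()))
                for (Prod q : prods)
                    if (q.lhs().equals(x) && result.add(new Item(q, 0, w)))
                        work.push(new Item(q, 0, w));
        }
        return result;
    }

    // Tokens that some item shifts on and some completed item reduces on.
    Set<String> shiftReduceConflicts(Set<Item> state) {
        Set<String> shifts = new TreeSet<>(), reduces = new TreeSet<>();
        for (Item it : state) {
            if (it.next() == null) reduces.add(it.la());
            else if (!nonterms.contains(it.next())) shifts.add(it.next());
        }
        shifts.retainAll(reduces);
        return shifts;
    }

    // Closure of the kernel of lalr_state [1] (items reached from state [0] on dl).
    static Set<String> conflictsInState1(boolean rightRecursiveSl) {
        Lr1ConflictCheck g = new Lr1ConflictCheck();
        g.add("prog", "dl", "sl");
        g.add("decl", "ID", "ID", "SEMI");
        g.add("dl");  g.add("dl", "dl", "decl");          // dl ::= | dl decl
        g.add("stm", "ID", "EQ");
        g.add("sl");                                       // sl ::= (empty)
        if (rightRecursiveSl) g.add("sl", "stm", "sl");    // sl ::= stm sl (the fix)
        else g.add("sl", "sl", "stm");                     // sl ::= sl stm (original)
        g.computeFirst();
        Prod prog = g.prods.get(0), dlDecl = g.prods.get(3);
        Set<Item> kernel = new LinkedHashSet<>(List.of(new Item(prog, 1, "EOF"),
                new Item(dlDecl, 1, "EOF"), new Item(dlDecl, 1, "ID")));
        return g.shiftReduceConflicts(g.closure(kernel));
    }
    public static void main(String[] args) {
        System.out.println("sl ::= | sl stm  conflicts on " + conflictsInState1(false));
        System.out.println("sl ::= | stm sl  conflicts on " + conflictsInState1(true));
    }
}
```

conflictsInState1(false) gives [ID], the same symbol named in the Warning: closing sl ::= (*) sl stm adds sl ::= (*) with lookahead FIRST(stm) = {ID}. conflictsInState1(true) gives [], since with sl ::= stm sl the empty sl is reduced only on EOF, which matches lalr_state [1] after the fix.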
-----------------------------------
Fix, and show the second set of items after the fix
(see minimal.cup in AmbiguousGrammarExamples.tar for the full example)

    prog ::= dl sl ;
    decl ::= ID ID SEMI;
    dl   ::= | dl decl;
    stm  ::= ID EQ;
    sl   ::= | stm sl;

    START lalr_state [0]: {
      [dl ::= (*) dl decl , {EOF ID }]
      [$START ::= (*) prog EOF , {EOF }]
      [dl ::= (*) , {EOF ID }]
      [prog ::= (*) dl sl , {EOF }]
    }
    transition on prog to state [2]
    transition on dl to state [1]

    -------------------
    lalr_state [1]: {
      [stm ::= (*) ID EQ , {EOF ID }]
      [decl ::= (*) ID ID SEMI , {EOF ID }]
      [sl ::= (*) stm sl , {EOF }]
      [dl ::= dl (*) decl , {EOF ID }]
      [sl ::= (*) , {EOF }]
      [prog ::= dl (*) sl , {EOF }]
    }
    transition on stm to state [7]
    transition on decl to state [6]
    transition on ID to state [5]
    transition on sl to state [4]

--------------------------
Suggested Exercise
Show the LR(1) table for a JavaCUP grammar that has a shift/reduce error.
What is causing the error? How can the error be fixed easily in JavaCUP?
    (0) S' -> S $
    (1) S  -> E
    (2) E  -> E - E
    (3) E  -> num

-----------------------
The pushdown automaton behind the nested parens parse table [May not get to this]

- LR parse tables are a specialized form of pushdown automaton, tailored to parsing
- review pushdown automata and notation
    stack           stack alphabet, X
                    initial item on stack
    input string    language alphabet, A
                    end-of-string token
    DFA             start state
                    set of states, Q
                    set of accept states
                    transition function: Q x A x X -> Q x X

- show the correspondence of the automaton with the LR parse table. An LR
  parse table is not EXACTLY a pushdown automaton, but it can be converted
  to one.

- a pushdown automaton is only supposed to pop or push one symbol at a time,
  but the goto transition pushes two:
      ( 0, epsilon, S ) -> ( 2, S2 )     [goto 2]
  - we can replace it with two transitions and an intermediate state 0a:
      (0, epsilon, S) -> (0a, S)
      (0a, epsilon, epsilon) -> (2, 2)

- show that we can make a pushdown automaton that does not use the input
  symbol as lookahead
  State 3: replace
      (3, ID, epsilon) -> (1, ID 1)
      (3, '(', epsilon) -> (3, '(' 3)
  with
      (3, *, epsilon) -> (3a, *)
      (3a, epsilon, '(' ) -> (3, 3)
      (3a, epsilon, ID ) -> (1, 1)

- as the automaton is drawn in the slides, the number of states is the same
  as in the LR table

patterns
- when reducing, pop a lot off the stack; the next state depends on the last
  state popped
- in places where we can reduce epsilon to a nonterminal, we get a self cycle
  for the reduce. The example in the slides does not include this.
- when shifting, push the input symbol and the state we are going to onto
  the stack
- when doing a goto, take nothing from the input; there must be a nonterminal
  on top of the stack, and we push the nonterminal back along with the state
  we are going to
- accept when the nonterminal for the start symbol and some state are on top
  of the stack and the next input token is $/EOF

- use the automaton to recognize the string (b)

- storage needed for the pushdown automaton
    |Q| * |A| * |X| * 2
- storage needed for the LR parsing table
    |Q| * (|A| + |K|), where K is the set of nonterminals (the goto columns)

------------------------
mstrout@cs.colostate.edu, 4/4/11