CS453 Colorado State University
==========================================
Top-Down Predictive Parsers
==========================================

class announcements
    - PA1 due Wednesday night at 11:59, can submit more than once
    - HW2 and PA2 should be posted Wednesday night by midnight

---------------
Goals for Today
    - Finish up complexity discussion of exhaustive search parsing
    - Learn about predictive parsing
    - Calculate FIRST and FOLLOW sets to determine lookahead tokens
      for each case in nonterminal function switch statements
    - error messages for MiniSVG

--------------------
Exhaustive search parsing algorithm
    - exhaustive search algorithm and its complexity
    - need for O(N) solutions

--------------------
Recursive descent parsing, specifically predictive parsing
    - when you want to write a parser quickly by hand...
    - parse with a recursive set of procedures
    - one procedure for each nonterminal
    - if only one lookahead token is needed to determine the production
      for each nonterminal, then can use a predictive parser
    - a predictive parser does a switch on the lookahead within each
      procedure to determine the production
    - each production for a nonterminal corresponds to "one case" in the
      switch statement
    - each case calls procedures for nonterminals and match() for
      terminals for the production to which it corresponds

---------------------
Example Predictive Parser

// ------------------ Example Grammar
(1)  start    -> mesh EOF
(2)  mesh     -> NUM nodelist NUM elemlist
(3a) nodelist ->
(3b) nodelist -> node nodelist
(4)  node     -> NODE NUM REAL REAL          // node_id x_coord y_coord
(5a) elemlist ->
(5b) elemlist -> elem elemlist
(6a) elem     -> TRI NUM NUM NUM NUM
(6b) elem     -> SQR NUM NUM NUM NUM NUM

// ------------------ Code to implement predictive parser
// m_lookahead is a variable visible to all of the
// following procedures

void start() {
    switch (m_lookahead) {
        case NUM:                       // start -> mesh EOF
            mesh(); match(Token.Tag.EOF);
            break;
        default: throw new ParseException(…);
    }
}

void mesh() {
    switch (m_lookahead) {
        case NUM:                       // mesh -> NUM nodelist NUM elemlist
            // read each token's value before match() advances the lookahead
            num_nodes = ((Num)m_lookahead).value; match(NUM);
            nodelist();
            num_elem = ((Num)m_lookahead).value; match(NUM);
            elemlist();
            break;
        default: throw new ParseException(…);
    }
}

void nodelist() {
    switch (m_lookahead) {
        case NUM:  break;                       // nodelist -> epsilon
        case NODE: node(); nodelist(); break;   // nodelist -> node nodelist
        default: throw new ParseException(…);
    }
}

// -------------------------------------------------
void node() {
    switch (m_lookahead) {
        case NODE:                      // node -> NODE NUM REAL REAL
            match(NODE);
            node_id = ((Num)m_lookahead).value; match(NUM);
            x[node_id] = ((RealNum)m_lookahead).value; match(REAL);
            y[node_id] = ((RealNum)m_lookahead).value; match(REAL);
            break;
        default: throw new ParseException(…);
    }
}

// -----------------------------------------------
void elemlist() {
    switch (m_lookahead) {
        case EOF: break;                        // elemlist -> epsilon
        case TRI:
        case SQR: elem(); elemlist(); break;    // elemlist -> elem elemlist
        default: throw new ParseException(…);
    }
}

void elem() {
    switch (m_lookahead) {
        case TRI:                       // elem -> TRI NUM NUM NUM NUM
            match(TRI);
            elem_id = ((Num)m_lookahead).value; match(NUM);
            n1[elem_id] = ((Num)m_lookahead).value; match(NUM);
            n2[elem_id] = ((Num)m_lookahead).value; match(NUM);
            n3[elem_id] = ((Num)m_lookahead).value; match(NUM);
            break;
        case SQR:                       // elem -> SQR NUM NUM NUM NUM NUM
            match(SQR);
            elem_id = ((Num)m_lookahead).value; match(NUM);
            n1[elem_id] = ((Num)m_lookahead).value; match(NUM);
            n2[elem_id] = ((Num)m_lookahead).value; match(NUM);
            n3[elem_id] = ((Num)m_lookahead).value; match(NUM);
            n4[elem_id] = ((Num)m_lookahead).value; match(NUM);
            break;
        default: throw new ParseException(…);
    }
}

Step through parsing the following example: (slide 8)

    2
    NODE 1 0.3 42.7
    NODE 2 0.9 43.0
    0
    EOF

The calls to the various nonterminal procedures in the recursive
descent, predictive parser correspond to a pre-order traversal of the
parse tree.

-----------------------------
Predictive Parser for MiniSVG
    -> look at given parser code again
        - m_lookahead
        - match() method
    -> write nonterminal method for svg and one of the elem production cases

-------------------
Making it work using FIRST and FOLLOW sets

Used ...
    - in top-down parsers to select the appropriate production to apply
    - in both top-down and bottom-up parsers for panic-mode error recovery

Terminology note
    - We will use the definitions of FIRST and FOLLOW sets as described in
      the Tiger book instead of the Dragon book. The main difference is
      that we do not have epsilon in the FIRST and FOLLOW sets. Instead we
      use a helper set of nullable nonterminals.

Here is our definition of a FIRST set. Let alpha be a string of symbols
(nonterminals and terminals),

    FIRST[ alpha ] = set of first tokens for all possible strings
                     derivable from alpha

To compute FIRST for all of the production right-hand sides, we first
determine for each symbol whether it is nullable or not.

    nullable(terminal) = false
    nullable(X) = true, if X can derive the empty string

Algorithm to calculate nullable

    do
        for each X -> gamma
            if (gamma == epsilon)
                nullable(X) = true
            else if (gamma == Y_1 Y_2 ... Y_n and all Y_i are
                     nonterminals such that nullable(Y_i) = true)
                nullable(X) = true
    while nullable changes

A string of symbols, alpha, is nullable if all of its symbols are nullable.
    -> Can alpha be nullable if it contains a terminal?

Then we compute the FIRST sets for all production right-hand sides.
We iterate over the following rules until convergence upon a solution.

    FIRST[ terminal ]    = { terminal }
    FIRST[ nonterminal ] = union over all FIRST[ rhs ], where rhs is a
                           production right-hand side for nonterminal
    FIRST[ alpha ]       = union of FIRST[ sym ] for each symbol sym in
                           alpha, up to and including the first
                           nonnullable symbol

FOLLOW[ Y ], where Y is a nonterminal
    - look for Y in rhs of rules
    - union all FIRST sets for symbols after Y up to and including the
      first nonnullable symbol
    - if all symbols after Y are nullable (or Y is the last symbol),
      then also union in FOLLOW[lhs]
    - iterate over the grammar until the FOLLOW sets converge

-----------------------------
Predictive Parser Table

A predictive parser is a recursive descent parser that does not require
backtracking.
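
The nullable, FIRST, and FOLLOW rules from the previous section can be
sketched as one fixed-point loop over the Mesh grammar. This is only a
minimal sketch, not the course's parser code: the class MeshSets, its
field names, and the encoding of each production as a string array are
all assumptions made for illustration.

```java
import java.util.*;

// Hypothetical helper, not part of the course code: computes nullable, FIRST,
// and FOLLOW for the Mesh grammar by iterating the rules to a fixed point.
final class MeshSets {
    // Each row is {lhs, rhs symbols...}; a row with only the lhs is an epsilon production.
    static final String[][] PRODUCTIONS = {
        {"start", "mesh", "EOF"},
        {"mesh", "NUM", "nodelist", "NUM", "elemlist"},
        {"nodelist"},
        {"nodelist", "node", "nodelist"},
        {"node", "NODE", "NUM", "REAL", "REAL"},
        {"elemlist"},
        {"elemlist", "elem", "elemlist"},
        {"elem", "TRI", "NUM", "NUM", "NUM", "NUM"},
        {"elem", "SQR", "NUM", "NUM", "NUM", "NUM", "NUM"},
    };

    final Set<String> nullable = new TreeSet<>();
    final Map<String, Set<String>> first = new TreeMap<>();
    final Map<String, Set<String>> follow = new TreeMap<>();

    MeshSets() {
        for (String[] p : PRODUCTIONS) {          // every lhs is a nonterminal
            first.put(p[0], new TreeSet<>());
            follow.put(p[0], new TreeSet<>());
        }
        boolean changed = true;
        while (changed) {                         // repeat until no set grows
            changed = false;
            for (String[] p : PRODUCTIONS) {
                changed |= applyRules(p);
            }
        }
    }

    // Applies the nullable, FIRST, and FOLLOW rules to one production.
    // Returns true if any set grew.
    private boolean applyRules(String[] p) {
        String lhs = p[0];
        boolean changed = false;
        boolean prefixNullable = true;            // are all symbols before p[i] nullable?
        for (int i = 1; i < p.length; i++) {
            if (prefixNullable) {                 // FIRST: up to and including first nonnullable
                changed |= first.get(lhs).addAll(firstOf(p[i]));
            }
            prefixNullable = prefixNullable && nullable.contains(p[i]);
            if (!first.containsKey(p[i])) {
                continue;                         // terminals have no FOLLOW set
            }
            boolean restNullable = true;          // are all symbols after p[i] nullable?
            for (int j = i + 1; j < p.length && restNullable; j++) {
                changed |= follow.get(p[i]).addAll(firstOf(p[j]));
                restNullable = nullable.contains(p[j]);
            }
            if (restNullable) {                   // nothing nonnullable follows: add FOLLOW[lhs]
                changed |= follow.get(p[i]).addAll(new ArrayList<>(follow.get(lhs)));
            }
        }
        if (prefixNullable) {                     // whole rhs is nullable (or epsilon)
            changed |= nullable.add(lhs);
        }
        return changed;
    }

    // FIRST of a single symbol: the symbol itself for a terminal, else a copy of its set.
    private Set<String> firstOf(String sym) {
        return first.containsKey(sym)
                ? new TreeSet<>(first.get(sym))
                : Collections.singleton(sym);
    }

    public static void main(String[] args) {
        MeshSets sets = new MeshSets();
        System.out.println("nullable = " + sets.nullable);
        System.out.println("FIRST    = " + sets.first);
        System.out.println("FOLLOW   = " + sets.follow);
    }
}
```

Running main prints the three sets, which can be compared against sets
worked out by hand. Because EOF appears explicitly in production (1),
the sketch does not add a separate end-of-input marker to FOLLOW[start].
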
Using the FIRST and FOLLOW sets we can construct a parse table that, for
each pairing of nonterminal and terminal, indicates the relevant
production rule.

-> construct the nullable, FIRST, and FOLLOW sets for the Mesh grammar

               nullable      FIRST         FOLLOW
    -----------------------------------------------
    start    |             |             |
    mesh     |             |             |
    node     |             |             |
    elem     |             |             |
    nodelist |             |             |
    elemlist |             |             |

-> use predictive parser table algorithm on slide to construct table
   for Mesh grammar [keep parse table on white board]

               NUM      NODE     REAL     TRI      SQR      EOF
    ---------------------------------------------------------------
    start    |        |        |        |        |        |
    mesh     |        |        |        |        |        |
    node     |        |        |        |        |        |
    elem     |        |        |        |        |        |
    nodelist |        |        |        |        |        |
    elemlist |        |        |        |        |        |

-> show relationship between table and switch statements

------------------------
MiniSVG error messages (slide 21)

------------------------
mstrout@cs.colostate.edu, 1/31/11