Midterm Exam Advice and Hints CSE4100 Prof. Steven A. Demurjian, Sr. Computer Science & Engineering Department The University of Connecticut 191 Auditorium Road, Box U-155 Storrs, CT 06269-3155 steve@engr.uconn.edu http://www.engr.uconn.edu/~steve (860) 486 - 4818 Dr. Robert LaBarre United Technologies Research Center 411 Silver Lane E. Hartford, CT 06018 LaBarrRE@utrc.utc.com MTE.1 Core Material CSE4100 Chapter 1: Introduction to Compilers Basic Compiler Ideas and Concepts The “Big Picture” Chapter 2: A Simple One-Pass Compiler A Look at All Phases of Compilation Process From Lexical Analysis Thru Code Generation FOCUS: Chapter 3: Lexical Analysis Specifying/Recognizing Tokens Patterns (Regular Expressions) and Lexemes Regular Expressions and DFA/NFA Algorithms for Regular Expression to NFA NFA to DFA MTE.2 Core Material FOCUS Chapter 4: Syntax Analysis Context Free Grammar: Defs and Concepts CSE4100 Derivations, Specification, Languages Writing Grammars Ambiguity, Left Recursion, Left Factoring, Removing epsilon Moves Algorithm for Left Recursion Removal Top-Down Parsing Recursive Descent and Predictive Parsing First and Follow Calculation Constructing LL(1) Parsing Table Ambiguity and Error Handling Lex and Yacc will not be Tested! MTE.3 Hints for Taking Exam CSE4100 Read the Questions Carefully! Ask Questions if you are Confused! Answer Questions in Any Order Organized to fit on minimum number of pages Answer “Easiest” questions for you! Assess Points per Time Unit 75 minutes = 75 points 30 minutes = 30 points; 20 minutes = 20 points Don't Be Afraid to Not Answer a Question 60% Correct for 100 Points = 60 Points 90% Correct For 80 Points = 72 Points Partial Credit is the Norm MTE.4 Possible Questions CSE4100 Open Notes and Open Book 5 to 6 Total Multi-Part Questions Possibilities… Constructive and Algorithm Questions Writing and Using Grammar Understanding Significance and Relevance of Concepts Know your Algorithms and Constructs (Regular Expressions, NFA, DFA, CFG) Show All Work to Receive Partial (Any) Credit Do Not Jump to Final Answer Avoid Run-on Explanations MTE.5 Chapter 3 Excerpted Material Introducing Basic Terminology Token CSE4100 Sample Lexemes Informal Description of Pattern const const const if if if relation <, <=, =, < >, >, >= < or <= or = or < > or >= or > id pi, count, D2 letter followed by letters and digits num 3.1416, 0, 6.02E23 any numeric constant literal “core dumped” any characters between “ and “ except “ Classifies Pattern Actual values are critical. Info is : 1. Stored in symbol table 2. Returned to parser MTE.6 Language Concepts A language, L, is simply any set of strings over a fixed alphabet. CSE4100 Alphabet {0,1} Languages {0,10,100,1000,100000…} {0,1,00,11,000,111,…} {a,b,c} {abc,aabbcc,aaabbbccc,…} {A, … ,Z} {TEE,FORE,BALL,…} {FOR,WHILE,GOTO,…} {A,…,Z,a,…,z,0,…9, { All legal PASCAL progs} +,-,…,<,>,…} { All grammatically correct English sentences } Special Languages: - EMPTY LANGUAGE - contains string only MTE.7 Formal Language Operations CSE4100 OPERATION union of L and M written L M concatenation of L and M written LM Kleene closure of L written L* DEFINITION L M = {s | s is in L or s is in M} LM = {st | s is in L and t is in M} L*= Li i 0 L* denotes “zero or more concatenations of “ L positive closure of L written L+ L+= L i i 1 L+ denotes “one or more concatenations of “ L MTE.8 Formal Language Operations Examples L = {A, B, C, D } CSE4100 D = {1, 2, 3} L D = {A, B, C, D, 1, 2, 3 } LD = {A1, A2, A3, B1, B2, B3, C1, C2, C3, D1, D2, D3 } L2 = { AA, AB, AC, AD, BA, BB, BC, BD, CA, … DD} L4 = L2 L2 = ?? L* = { All possible strings of L plus } L+ = L* - L (L D ) = ?? L (L D )* = ?? MTE.9 Language & Regular Expressions A Regular Expression is a Set of Rules / Techniques for Constructing Sequences of Symbols (Strings) From an Alphabet. Let Be an Alphabet, r a Regular Expression CSE4100 Then L(r) is the Language That is Characterized by the Rules of R MTE.10 Rules for Specifying Regular Expressions: 1. is a regular expression denoting { } CSE4100 If a is in , a is a regular expression that denotes {a} 2. 3. Let r and s be regular expressions with languages L(r) and L(s). Then p r e c e d e n c e (a) (r) | (s) is a regular expression L(r) L(s) (b) (r)(s) is a regular expression L(r) L(s) (c) (r)* is a regular expression (L(r))* (d) (r) is a regular expression L(r) All are Left-Associative. MTE.11 EXAMPLES of Regular Expressions L = {A, B, C, D } D = {1, 2, 3} CSE4100 A|B|C|D =L (A | B | C | D ) (A | B | C | D ) = L2 (A | B | C | D )* = L* (A | B | C | D ) ((A | B | C | D ) | ( 1 | 2 | 3 )) = L (L D) MTE.12 Algebraic Properties of Regular Expressions CSE4100 AXIOM r|s=s|r r | (s | t) = (r | s) | t (r s) t = r (s t) r(s|t)=rs|rt (s|t)r=sr|tr r = r r = r r* = ( r | )* r** = r* DESCRIPTION | is commutative | is associative concatenation is associative concatenation distributes over | Is the identity element for concatenation relation between * and * is idempotent MTE.13 Automata & Language Theory Terminology FSA A recognizer that takes an input string and determines whether it’s a valid string of the language. CSE4100 Non-Deterministic FSA (NFA) Has several alternative actions for the same input symbol Deterministic FSA (DFA) Has at most 1 action for any given input symbol Bottom Line expressive power(NFA) == expressive power(DFA) Conversion can be automated MTE.14 Finite Automata & Language Theory Finite Automata : CSE4100 A recognizer that takes an input string & determines whether it’s a valid sentence of the language Non-Deterministic : Has more than one alternative action for the same input symbol. Can’t utilize algorithm ! Deterministic : Has at most one action for a given input symbol. Both types are used to recognize regular expressions. MTE.15 NFAs & DFAs CSE4100 Non-Deterministic Finite Automata (NFAs) easily represent regular expression, but are somewhat less precise. Deterministic Finite Automata (DFAs) require more complexity to represent regular expressions, but offer more precision. We’ll discuss both plus conversion algorithms, i.e., NFA DFA and DFA NFA MTE.16 Non-Deterministic Finite Automata An NFA is a mathematical model that consists of : CSE4100 • S, a set of states • , the symbols of the input alphabet • move, a transition function. • move(state, symbol) state • move : S S • A state, s0 S, the start state • F S, a set of final or accepting states. MTE.17 Example NFA a S = { 0, 1, 2, 3 } CSE4100 start s0 = 0 a 0 F={3} 1 b b 2 3 b = { a, b } What Language is defined ? What is the Transition Table ? input s t a t e a b 0 { 0, 1 } {0} 1 -- {2} 2 -- {3} (null) moves possible i j Switch state but do not use any input symbol MTE.18 Epsilon-Transitions CSE4100 Given the regular expression: (a (b*c)) | (a (b |c+)?) Find a transition diagram NFA that recognizes it. Solution ? MTE.19 Deterministic Finite Automata CSE4100 A DFA is an NFA with the following restrictions: • moves are not allowed • For every state s S, there is one and only one path from s for every input symbol a . Since transition tables don’t have any alternative options, DFAs are easily simulated via an algorithm. s s0 c nextchar; while c eof do s move(s,c); c nextchar; end; if s is in F then return “yes” else return “no” MTE.20 Example - DFA b a CSE4100 start a 0 b 1 b 2 3 a b a What Language is Accepted? Recall the original NFA: a start a 0 1 b 2 b 3 b MTE.21 Regular Expression to NFA Construction We now focus on transforming a Reg. Expr. to an NFA CSE4100 This construction allows us to take: • Regular Expressions (which describe tokens) • To an NFA (to characterize language) • To a DFA (which can be computerized) The construction process is componentwise Builds NFA from components of the regular expression in a special order with particular techniques. NOTE: Construction is syntax-directed translation, i.e., syntax of regular expression is determining factor for NFA construction and structure. MTE.22 Motivation: Construct NFA For: : CSE4100 start i f a: b: start start ab: start b A 0 a 0 1 B a 1 A b B | ab : a* ( | ab )* : MTE.23 Construction Algorithm : R.E. NFA Construction Process : CSE4100 1st : Identify subexpressions of the regular expression symbols r|s rs r* 2nd : Characterize “pieces” of NFA for each subexpression MTE.24 Piecing Together NFAs 1. For in the regular expression, construct NFA CSE4100 start i f L() 2. For a in the regular expression, construct NFA start i a f L(a) MTE.25 Piecing Together NFAs – continued(1) 3.(a) If s, t are regular expressions, N(s), N(t) their NFAs CSE4100 s|t has NFA: N(s) start i f N(t) L(s) L(t) where i and f are new start / final states, and -moves are introduced from i to the old start states of N(s) and N(t) as well as from all of their final states to f. MTE.26 Piecing Together NFAs – continued(2) 3.(b) If s, t are regular expressions, N(s), N(t) their NFAs st (concatenation) has NFA: CSE4100 start i N(s) N(t) f L(s) L(t) overlap Alternative: start i N(s) N(t) f where i is the start state of N(s) (or new under the alternative) and f is the final state of N(t) (or new). Overlap maps final states of N(s) to start state of N(t). MTE.27 Piecing Together NFAs – continued(3) 3.(c) If s is a regular expressions, N(s) its NFA, s* (Kleene star) has NFA: CSE4100 start i N(s) f where : i is new start state and f is new final state -move i to f (to accept null string) -moves i to old start, old final(s) to f -move old final to old start (WHY?) MTE.28 Properties of Construction Let r be a regular expression, with NFA N(r), then CSE4100 1. N(r) has at most 2*(#symbols + #operators) of r 2. N(r) has exactly one start and one accepting state 3. Each state of N(r) has at most one outgoing edge a and at most two outgoing ’s 4. BE CAREFUL to assign unique names to all states ! MTE.29 Detailed Example See example 3.16 in textbook for (a | b)*abb 2nd Example - (ab*c) | (a(b|c*)) CSE4100 Parse Tree for this regular expression: r13 r5 r3 r12 | r4 r11 a r1 r2 a r10 ( r7 r0 * ) r9 r8 | c b r6 * b What is the NFA? Let’s construct it ! c MTE.30 Detailed Example – Construction(1) CSE4100 r0: b r3: a r2: c r1: b r4 : r1 r2 b c r5 : r3 r4 a b c MTE.31 Detailed Example – Construction(2) b r7: r8: CSE4100 r : 11 a c b c r6: r9 : r7 | r 8 c r10 : r9 b r12 : r11 r10 a c MTE.32 Detailed Example – Final Step r13 : r5 | r12 CSE4100 a 2 3 4 b 5 6 c 7 17 1 b 10 a 9 8 11 12 13 c 14 15 16 MTE.33 Conversion : NFA DFA Algorithm • Algorithm Constructs a Transition Table for DFA from NFA CSE4100 • Each state in DFA corresponds to a SET of states of the NFA • Why does this occur ? • moves • non-determinism Both require us to characterize multiple situations that occur for accepting the same string. (Recall : Same input can have multiple paths in NFA) • Key Issue : Reconciling AMBIGUITY ! MTE.34 Converting NFA to DFA – 1st Look CSE4100 a 2 b 3 4 0 1 5 8 6 c 7 From State 0, Where can we move without consuming any input ? This forms a new state: 0,1,2,6,8 What transitions are defined for this new state ? MTE.35 The Resulting DFA a 0, 1, 2, 6, 8 CSE4100 3 a a c b 1, 2, 5, 6, 7, 8 c 1, 2, 4, 5, 6, 8 c Which States are FINAL States ? a A a B a b c D c c How do we handle alphabet symbols not defined for A, B, C, D ? C MTE.36 Algorithm Concepts NFA N = ( S, , s0, F, MOVE ) -Closure(S) : s S CSE4100 : set of states in S that are reachable No input is from s via -moves of N that originate consumed from s. -Closure of T : T S : NFA states reachable from all t T on -moves only. move(T,a) : T S, a : Set of states to which there is a transition on input a from some t T These 3 operations are utilized by algorithms / techniques to facilitate the conversion process. MTE.37 Illustrating Conversion – An Example Start with NFA: CSE4100 (a | b)*abb a 2 3 start 0 1 6 b 4 5 a 7 b 8 9 b 10 First we calculate: -closure(0) (i.e., state 0) -closure(0) = {0, 1, 2, 4, 7} (all states reachable from 0 on -moves) Let A={0, 1, 2, 4, 7} be a state of new DFA, D. MTE.38 Chapter 4 Excerpted Material Context Free Grammars Definition: A Context Free Grammar, CFG, is described by T, NT, S, PR, where: CSE4100 T: Terminals / tokens of the language NT: Non-terminals to denote sets of strings generatable by the grammar & in the language S: Start symbol, SNT, which defines all strings of the language PR: Production rules to indicate how T and NT are combines to generate valid strings of the language. PR: NT (T | NT)* Like a Regular Expression / DFA / NFA, a Context Free Grammar is a mathematical model ! MTE.39 Context Free Grammars : A First Look assign_stmt id := expr ; expr term operator term CSE4100 term id term real term integer What do “BLUE” symbols represent? What do “BLACK” symbols represent? operator + operator - Derivation: A sequence of grammar rule applications and substitutions that transform a starting non-term into a collection of terminals / tokens. Simply stated: Grammars / production rules allow us to “rewrite” and “identify” correct syntax. MTE.40 How is Grammar Used ? Given the rules on the previous slide, suppose CSE4100 id := real + int; is input. Is it syntactically correct? How do we know? expr is represented as: expr term operator term Is this accurate / complete? expr expr operator term expr term How does this affect the derivations which are possible? MTE.41 Grammar Concepts A step in a derivation is zero or one action that replaces a NT with the RHS of a production rule. CSE4100 EXAMPLE: E -E (the means “derives” in one step) using the production rule: E -E EXAMPLE: E E A E E * E E * ( E ) DEFINITION: derives in one step + derives in one step * derives in zero steps EXAMPLES: A if A is a production rule * * 1 2 … n 1 n ; for all * and then * If MTE.42 Leftmost and Rightmost Derivations Leftmost: Replace the leftmost non-terminal symbol CSE4100 E E A E id A E id * E id * id lm lm lm lm Rightmost: Replace the leftmost non-terminal symbol E rm EAE E A id E * id id * id rm rm rm Important Notes: A If A , what’s true about ? lm If A , what’s true about ? rm Derivations: Actions to parse input can be represented pictorially in a parse tree. MTE.43 Examples of LM / RM Derivations E E A E | ( E ) | -E | id CSE4100 A+|-|*| / | A leftmost derivation of : id + id * id A rightmost derivation of : id + id * id MTE.44 Derivations & Parse Tree E E EAE E CSE4100 A E E E*E E A E * E id * E id * id E A E id * E E A E id * id MTE.45 Parse Trees and Derivations Consider the expression grammar: E E+E | E*E | (E) | -E | id CSE4100 Leftmost derivations of id + id * id E E EE+E E + E + E id + E E E + E id E id + E id + E * E E + E id E * E MTE.46 Removing Ambiguity Take Original Grammar: CSE4100 stmt if expr then stmt | if expr then stmt else stmt | other (any other statement) Revise to remove ambiguity: stmt matched_stmt | unmatched_stmt matched_stmt if expr then matched_stmt else matched_stmt | other unmatched_stmt if expr then stmt | if expr then matched_stmt else unmatched_stmt How does this grammar work ? MTE.47 Resolving Difficulties : Left Recursion A left recursive grammar has rules that support the + derivation : A A, for some . CSE4100 Top-Down parsing can’t reconcile this type of grammar, since it could consistently make choice which wouldn’t allow termination. A A A A … etc. A A | Take left recursive grammar: A A | To the following: A’ A’ A’ A’ | MTE.48 Why is Left Recursion a Problem ? CSE4100 Consider: EE+T | T TT*F | F F ( E ) | id Derive : id + id + id EE+T How can left recursion be removed ? EE+T | T What does this generate? EE+TT+T EE+TE+T+TT+T+T How does this build strings ? What does each string have to start with ? MTE.49 Resolving Difficulties : Left Recursion (2) Informal Discussion: CSE4100 Take all productions for A and order as: A A1 | A2 | … | Am | 1 | 2 | … | n Where no i begins with A. Now apply concepts of previous slide: A 1A’ | 2A’ | … | nA’ A’ 1A’ | 2A’| … | m A’ | For our example: EE+T | T TT*F | F F ( E ) | id E TE’ E’ + TE’ | F ( E ) | id T FT’ T’ * FT’ | MTE.50 Resolving Difficulties : Left Recursion (3) Problem: If left recursion is two-or-more levels deep, this isn’t enough S Aa | b CSE4100 A Ac | Sd | S Aa Sda Algorithm: Input: Grammar G with no cycles or -productions Output: An equivalent grammar with no left recursion 1. Arrange the non-terminals in some order A1,A2,…An 2. for i := 1 to n do begin for j := 1 to i – 1 do begin replace each production of the form Ai Aj by the productions Ai 1 | 2 | … | k where Aj 1|2|…|k are all current Aj productions; end eliminate the immediate left recursion among Ai productions end MTE.51 Using the Algorithm Apply the algorithm to: A1 A2a | b A2 A2c | A1d | CSE4100 i=1 For A1 there is no left recursion i=2 for j=1 to 1 do Take productions: A2 A1 and replace with A2 1 | 2 | … | k | where A1 1 | 2 | … | k are A1 productions in our case A2 A1d becomes A2 A2ad | bd What’s left: A1 A2a | b A2 A2 c | A2 ad | bd | Are we done ? MTE.52 Using the Algorithm (2) No ! We must still remove A2 left recursion ! CSE4100 A1 A2a | b A2 A2 c | A2 ad | bd | Recall: A A1 | A2 | … | Am | 1 | 2 | … | n A 1A’ | 2A’ | … | nA’ A’ 1A’ | 2A’| … | m A’ | Apply to above case. What do you get ? MTE.53 Removing Difficulties : -Moves Very Simple: A and B uAv implies add B uv to the grammar G. CSE4100 Why does this work ? Examples: E TE’ E’ + TE’ | T FT’ T’ * FT’ | F ( E ) | id A1 A2 a | b A2 bd A2’ | A2’ A2’ c A2’ | bd A2’ | MTE.54 Removing Difficulties : Cycles How would cycles be removed ? CSE4100 S SS | ( S ) | Has: S SS S S MTE.55 Removing Difficulties : Left Factoring Problem : Uncertain which of 2 rules to choose: CSE4100 stmt if expr then stmt else stmt | if expr then stmt When do you know which one is valid ? What’s the general form of stmt ? A 1 | 2 : if expr then stmt 1: else stmt 2 : Transform to: A A’ A’ 1 | 2 EXAMPLE: stmt if expr then stmt rest rest else stmt | MTE.56 Resolving Grammar Problems : Another Look • Forcing Matching Within Grammar CSE4100 - if-then-else else must go with most recent if stmt if expr then stmt | if expr then stmt else stmt | other Leads to: stmt matched_stmt | unmatched_stmt matched_stmt if expr then matched_stmt else matched_stmt | other unmatched_stmt if expr then stmt | if expr then matched_stmt else unmatched_stmt MTE.57 Resolving Grammar Problems : Another Look (2) • Left Factoring - Know correct choice in advance ! CSE4100 where_queries where are_does declarations of symbol | where are_does “search_option” appear Rules given in Assignment or where_queries where rest_where rest_where are declarations of symbol | does “search_option” appear MTE.58 Resolving Grammar Problems : Another Look (3) • Left Recursion - Avoid infinite loop when choosing rules CSE4100 A A | EXAMPLE: EE+T | T TT*F | F F ( E ) | id A A’ A’ A’ | E TE’ E’ + TE’ | F ( E ) | id T FT’ T’ * FT’ | If we use the original production rules on input: id + id (with leftmost): * T+T+T+…+T EE+TE+T+TE+T+T+T not assured If we use the transformed production rules on input: id + id (with leftmost): * id + TE’ * id + id E’ id + id E TE’ T + TE’ MTE.59 Overall Top-Down Algorithm CSE4100 Data structures A stack Holds the symbols to treat A lookahead window to choose a prediction Algorithm Startup Initialize stack to start symbol (a non-terminal) Initialize lookahead window at start of token stream Recursive Process Find out if front of window and top of stack match If match – Consume the symbols No match – Pop / Select / Push MTE.60 Lookahead Size and Languages With k tokens of lookahead Some languages can be parsed (this way) Some languages cannot be parsed What about k+1 tokens ? CSE4100 More languages can be parsed.... So the set of languages recognized with k is a subset of the set of languages recognized with k+1! We have a hierarchy! Still... In practice LL(1) should be enough. MTE.61 Top-Down Parsing CSE4100 Identify a leftmost derivation for an input string Why ? By always replacing the leftmost non-terminal symbol via a production rule, we are guaranteed of developing a parse tree in a left-to-right fashion consistent with scanning the input. A aBc adDc adec (scan a, scan d, scan e, scan c - accept!) Recursive-descent parsing concepts Predictive parsing Recursive / Brute force technique non-recursive / table driven Error recovery Implementation MTE.62 Non-Recursive / Table Driven a + b $ CSE4100 Stack X NT + T symbols of CFG Y Empty stack symbol $ Z Input Predictive Parsing Program Output What actions parser should take based on stack / input Parsing Table M[A,a] General parser behavior: X : top of stack (String + terminator) a : current input 1. When X=a = $ halt, accept, success 2. When X=a $ , POP X off stack, advance input, go to 1. 3. When X is a non-terminal, examine M[X,a] if it is an error call recovery routine if M[X,a] = {X UVW}, POP X, PUSH W,V,U DO NOT expend any input MTE.63 Algorithm for Non-Recursive Parsing Set ip to point to the first symbol of w$; repeat CSE4100 let X be the top stack symbol and a the symbol pointed to by ip; if X is terminal or $ then Input pointer if X=a then pop X from the stack and advance ip else error() else /* X is a non-terminal */ if M[X,a] = XY1Y2…Yk then begin pop X from stack; push Yk, Yk-1, … , Y1 onto stack, with Y1 on top output the production XY1Y2…Yk end else error() May also execute other code based on the production used until X=$ /* stack is empty */ MTE.64 Example E TE’ E’ + TE’ | T FT’ T’ * FT’ | F ( E ) | id CSE4100 Our well-worn example ! Table M Nonterminal E INPUT SYMBOL id ( TFT’ $ E’ E’ T’ T’ TFT’ T’ Fid ) ETE’ E’+TE’ T’ F * ETE’ E’ T + T’*FT’ F(E) MTE.65 Trace of Example STACK CSE4100 $E $E’T $E’T’F $E’T’id $E’T’ $E’ $E’T+ $E’T $E’T’F $E’T’id $E’T’ $E’T’F* $E’T’F $E’T’id $E’T’ $E’ $ INPUT id + id * id$ id + id * id$ id + id * id$ id + id * id$ + id * id$ + id * id$ + id * id$ id * id$ id * id$ id * id$ * id$ * id$ id$ id$ $ $ $ OUTPUT E TE’ T FT’ F id T’ E’ +TE’ Expend Input T FT’ F id T’ *FT’ F id T’ E’ MTE.66 Leftmost Derivation for the Example The leftmost derivation for the example is as follows: CSE4100 E TE’ FT’E’ id T’E’ id E’ id + TE’ id + FT’E’ id + id T’E’ id + id * FT’E’ id + id * id T’E’ id + id * id E’ id + id * id MTE.67 What’s the Missing Puzzle Piece ? Constructing the Parsing Table M ! CSE4100 1st : Calculate First & Follow for Grammar 2nd: Apply Construction Algorithm for Parsing Table Conceptual Perspective: First: Let be a string of grammar symbols. First() are the first terminals that can appear in in any possible derivation. NOTE: If * , then is First( ). Follow: Let A be a non-terminal. Follow(A) is the set of terminals that can appear directly to the right of A in * * some sentential form. (S Aa, for some and ). NOTE: If S A, then $ is Follow(A). MTE.68 Computing First(X) : All Grammar Symbols 1. If X is a terminal, First(X) = {X} 2. If X is a production rule, add to First(X) CSE4100 3. If X is a non-terminal, and X Y1Y2…Yk is a production rule Place First(Y1) in First(X) * if Y1 , Place First(Y2) in First(X) * , if Y2 Place First(Y3) in First(X) … * , if Yk-1 Place First(Yk) in First(X) * , Stop. NOTE: As soon as Yi May repeat 1, 2, and 3, above for each Yj MTE.69 Computing First(X) : All Grammar Symbols - continued Informally, suppose we want to compute CSE4100 First(X1 X2 … Xn ) = First (X1) “+” First(X2) if is in First(X1) “+” First(X3) if is in First(X2) “+” … First(Xn) if is in First(Xn-1) Note 1: Only add to First(X1 X2 … Xn) if is in First(Xi) for all i Note 2: For First(X1), if X1 Z1 Z2 … Zm , then we need to compute First(Z1 Z2 … Zm) ! MTE.70 Example Computing First for: CSE4100 First(TE’) First(E) E TE’ E’ + TE’ | T FT’ T’ * FT’ | F ( E ) | id First(T) “+” First(E’) * Not First(E’) since T First(T) First(F) “+” First(T’) First((E)) “+” First(id) Overall: First(F) Not First(T’) since F * “(“ and “id” First(E) = { ( , id } = First(F) First(E’) = { + , } First(T’) = { * , } First(T) First(F) = { ( , id } MTE.71 Example 2 Given the production rules: CSE4100 S i E t SS’ | a S’ eS | E b Verify that First(S) = { i, a } First(S’) = { e, } First(E) = { b } MTE.72 Computing Follow(A) : All Non-Terminals 1. Place $ in Follow(S), where S is the start symbol and $ signals end of input CSE4100 2. If there is a production A B, then everything in First() is in Follow(B) except for . * 3. If A B is a production, or A B and (First() contains ), then everything in Follow(A) is in Follow(B) (Whatever followed A must follow B, since nothing follows B from the production rule) We’ll calculate Follow for two grammars. MTE.73 Example Compute Follow for: CSE4100 E TE’ E’ + TE’ | T FT’ T’ * FT’ | F ( E ) | id • Follow(E) - contains $ since E is the start symbol. Also, since F (E) then First(“)”) is in Follow(E). Thus Follow(E) = { ) , $ } • Follow(E’) : E TE’ implies Follow(E) is in Follow(E’), and Follow(E’) = { ) , $ } * , put in • Follow(T) : E TE’ implies put in First(E’). Since E’ * , put in Follow(E). Since E’ +TE’ , Put in First(E’), and since E’ Follow(E’). Thus Follow(T) = { +, ), $ }. • Follow(T’) • Follow(F) You do these ! MTE.74 Computing Follow : 2nd Example Recall: CSE4100 S i E t SS’ | a First(S) = { i, a } S’ eS | First(S’) = { e, } E b First(E) = { b } Follow(S) – Contains $, since S is start symbol Since S i E t SS’ , put in First(S’) – not * , Put in Follow(S) Since S’ Since S’ eS, put in Follow(S’) So…. Follow(S) = { e, $ } Follow(S’) = Follow(S) HOW? Follow(E) = { t } MTE.75 Motivation Behind First & Follow First: CSE4100 Is used to indicate the relationship between non-terminals (in the stack) and input symbols (in input stream) Example: If A , and a is in First(), then when a=input, replace with . ( a is one of first symbols of , so when A is on the stack and a is input, POP A and PUSH . Follow: Is used when First has a conflict, to resolve choices. * , then what follows A dictates the When or next choice to be made. Example: If A , and b is in Follow(A ), then when a * , and if b is an input character, then we expand A with , which will eventually expand to , of which b follows! ( Above * . Here First( ) contains .) MTE.76 Constructing Parsing Table Algorithm: CSE4100 1. Repeat Steps 2 & 3 for each rule A 2. Terminal a in First()? Add A to M[A, a ] 3.1 in First()? Add A to M[A, a ] for all terminals b in Follow(A). 3.2 in First() and $ in Follow(A)? Add A to M[A, $ ] 4. All undefined entries are errors. MTE.77 Constructing Parsing Table - Example E TE’ E’ + TE’ | T FT’ CSE4100 T’ * FT’ | F ( E ) | id First(E,F,T) = { (, id } First(E’) = { +, } First(T’) = { *, } Follow(E,E’) = { ), $} Follow(F) = { *, +, ), } Follow(T,T’) = { +, ), } Expression Example: E TE’ : First(TE’) = First(T) = { (, id } M[E, ( ] : E TE’ M[E, id ] : E TE’ by rule 2 (by rule 2) E’ +TE’ : First(+TE’) = + : M[E’, +] : E’ +TE’ (by rule 3) E’ : in First( ) T’ : in First( ) M[E’, )] : E’ (3.1) M[T’, +] : T’ (3.1) M[E’, $] : E’ (3.2) M[T’, )] : T’ (3.1) (Due to Follow(E’) M[T’, $] : T’ (3.2) MTE.78 Constructing Parsing Table – Example 2 CSE4100 S i E t SS’ | a First(S) = { i, a } Follow(S) = { e, $ } S’ eS | First(S’) = { e, } Follow(S’) = { e, $ } E b First(E) = { b } Follow(E) = { t } S i E t SS’ Sa Eb First(i E t SS’)={i} First(a) = {a} First(b) = {b} S’ eS First(eS) = {e} S First() = {} Follow(S’) = { e, $ } INPUT SYMBOL Nonterminal a S S a b i t $ S iEtSS’ S’ S’ eS S’ E e S E b MTE.79 Example CSE4100 Step 1 Compute Follow First S→E$ T → F T’ E → T E’ T’ → * F T’ E’→ + T E’ → → F→(E) → Id MTE.80 Example CSE4100 Step 2 Build the parser table Step 3 Input: Id + Id * Id $ S→E$ T → F T’ E → T E’ T’ → * F T’ E’→ + T E’ → → F→(E) → Id Parser Table Input Symbols NT Id + * ( S E → TE’ E → TE’ E E → TE’ E →TE’ E’ T E’ →+TE’ T → FT’ F F → Id $ E’ → E’ → T’ → T’ → T →FT’ T’ → T’ ) T’ →*FT’ F → (E) MTE.81 Motivation Behind First & Follow First: CSE4100 Is used to indicate the relationship between non-terminals (in the stack) and input symbols (in input stream) Example: If A , and a is in First(), then when a=input, replace with . ( a is one of first symbols of , so when A is on the stack and a is input, POP A and PUSH . Follow: Is used when First has a conflict, to resolve choices. * , then what follows A dictates the When or next choice to be made. Example: If A , and b is in Follow(A ), then when a * , and if b is an input character, then we expand A with , which will eventually expand to , of which b follows! ( Above * . Here First( ) contains .) MTE.82 Constructing Parsing Table Algorithm: CSE4100 1. Repeat Steps 2 & 3 for each rule A 2. Terminal a in First()? Add A to M[A, a ] 3.1 in First()? Add A to M[A, a ] for all terminals b in Follow(A). 3.2 in First() and $ in Follow(A)? Add A to M[A, $ ] 4. All undefined entries are errors. MTE.83 Constructing Parsing Table - Example E TE’ E’ + TE’ | T FT’ CSE4100 T’ * FT’ | F ( E ) | id First(E,F,T) = { (, id } First(E’) = { +, } First(T’) = { *, } Follow(E,E’) = { ), $} Follow(F) = { *, +, ), } Follow(T,T’) = { +, ), } Expression Example: E TE’ : First(TE’) = First(T) = { (, id } M[E, ( ] : E TE’ M[E, id ] : E TE’ by rule 2 (by rule 2) E’ +TE’ : First(+TE’) = + : M[E’, +] : E’ +TE’ (by rule 3) E’ : in First( ) T’ : in First( ) M[E’, )] : E’ (3.1) M[T’, +] : T’ (3.1) M[E’, $] : E’ (3.2) M[T’, )] : T’ (3.1) (Due to Follow(E’) M[T’, $] : T’ (3.2) MTE.84 Constructing Parsing Table – Example 2 CSE4100 S i E t SS’ | a First(S) = { i, a } Follow(S) = { e, $ } S’ eS | First(S’) = { e, } Follow(S’) = { e, $ } E b First(E) = { b } Follow(E) = { t } S i E t SS’ Sa Eb First(i E t SS’)={i} First(a) = {a} First(b) = {b} S’ eS First(eS) = {e} S First() = {} Follow(S’) = { e, $ } INPUT SYMBOL Nonterminal a S S a b i t $ S iEtSS’ S’ S’ eS S’ E e S E b MTE.85 Example CSE4100 Step 1 Compute Follow First S→E$ T → F T’ E → T E’ T’ → * F T’ E’→ + T E’ → → F→(E) → Id MTE.86 Example CSE4100 Step 2 Build the parser table Step 3 Input: Id + Id * Id $ S→E$ T → F T’ E → T E’ T’ → * F T’ E’→ + T E’ → → F→(E) → Id Parser Table Input Symbols NT Id + * ( S E → TE’ E → TE’ E E → TE’ E →TE’ E’ T E’ →+TE’ T → FT’ F F → Id $ E’ → E’ → T’ → T’ → T →FT’ T’ → T’ ) T’ →*FT’ F → (E) MTE.87 LL(1) Grammars L : Scan input from Left to Right L : Construct a Leftmost Derivation CSE4100 1 : Use “1” input symbol as lookahead in conjunction with stack to decide on the parsing action LL(1) grammars have no multiply-defined entries in the parsing table. Properties of LL(1) grammars: • Grammar can’t be ambiguous or left recursive • Grammar is LL(1) when A 1. & do not derive strings starting with the same terminal a 2. Either or can derive , but not both. Note: It may not be possible for a grammar to be manipulated into an LL(1) grammar MTE.88 Error Recovery When Do Errors Occur? Recall Predictive Parser Function: a + b $ CSE4100 Stack X Y Z $ Predictive Parsing Program Input Output Parsing Table M[A,a] 1. If X is a terminal and it doesn’t match input. 2. If M[ X, Input ] is empty – No allowable actions Consider two recovery techniques: A. Panic Mode B. Phase-level Recovery MTE.89 Panic Mode Recovery Augment parsing table with action that attempts to realign / synchronize token stream with the expected input. CSE4100 Suppose : A on top of stack doesn’t mesh with current input symbol 1. Use Follow(A) to remove input tokens – sync (discard) 2. Use First(A) to determine when to restart parsing 3. Incorporate higher level language concepts (begin/end, while, repeat/until) to sync actions we don’t skip tokens unnecessarily. Other actions: 4. When A , use it to manipulate stack to postpone error detection 5. Use non-matching terminal on stack as token that is inserted into input. MTE.90 Revised Parsing Table / Example Nonterminal CSE4100 E INPUT SYMBOL id ( ) ETE’ synch E’+TE’ TFT’ T’ F * ETE’ E’ T + Fid E’ synch TFT’ T’ T’*FT’ synch synch From Follow sets. Pop stack entry – T or NT synch T’ F(E) synch $ synch E’ synch T’ synch Skip input symbol MTE.91 Skip & Synch Meaning Skip CSE4100 Discard input symbol Synch Pop top of stack Messages Constructed based on lookahead an non-terminal Example NT = F Lookahead = + Expecting a FACTOR. Got + for a Term. So a factor is missing. MTE.92 Revised Parsing Table / Example(2) STACK CSE4100 $E $E $E’T $E’T’F $E’T’id $E’T’ $E’T’F* $E’T’F $E’T’ $E’ $E’T+ $E’T $E’T’F $E’T’id $E’T’ $E’ $ INPUT ) id * + id$ id * + id$ id * + id$ id * + id$ id * + id$ * + id$ * + id$ + id$ + id$ + id$ + id$ id$ id$ id$ $ $ $ Remark error, skip ) id is in First(E) error, M[F,+] = synch F has been popped MTE.93 Phase-Level Recovery CSE4100 Fill in blanks entries of parsing table with error handling routines These routines modify stack and / or input stream issue error message Problems: Modifying stack has to be done with care, so as to not create possibility of derivations that aren’t in language Infinite loops must be avoided Can be used in conjunction with panic mode to have more complete error handling MTE.94