Sebesta 4

CSci210.BA4 Chapter 4 Topics Introduction  Lexical and Syntax Analysis  The Parsing Problem  Recursive-Descent Parsing  Bottom-Up Parsing  Introduction Syntax analyzers almost always based on a formal description of the syntax of the source language (grammars)  Almost all compilers separate analyzing syntax into:   Lexical Analysis – low-level  Syntax Analysis – high-level Reasons to Separate Syntax and Lexical Analysis  Simplicity – lexical analysis is less complex, so the process is simpler when separated Efficiency – allows for selective optimization  Portability – lexical analyzer is somewhat  platform dependent whereas the syntax analyzer is more platform independent Lexical Analysis A pattern matcher for character strings  Performs syntax analysis at the lowest level of the program structure  Extracts lexemes from a given input string and produce the corresponding tokens  Lexical Analysis (continued) result = oldsum – value / 100; Token Lexeme IDENT ASSIGN_OP IDENT SUB_OP IDENT DIV_OP INT_LIT SEMICOLON result = oldsum value / 100 ; Building a Lexical Analyzer Write a formal description of the tokens and use a software tool that constructs lexical analyzers when given such a description  Design a state transition diagram that describes the tokens and write a program that implements the diagram  Design a state transition diagram that describes the tokens and hand-construct a table-driven implementation of the state diagram  State (Transition) Diagram Design A directed graph with nodes labeled with state names and arcs labeled with input characters  Including states and transitions for each and every token pattern would be too large and complex  Transitions can be combined to simplify the state diagram  The Parsing Problem  Two goals of syntax analysis:  Check the input program for any syntax errors, produce a diagnostic message if an error is found, and recover  Produce the parse tree, or at least a trace of the parse tree, for the program  Two Classes of parsers:  Top-down  Bottom-up Top-Down Parsers Traces or builds a parse tree in preorder (leftmost derivation)  The most common top-down parsing algorithms:   Recursive descent  LL parsers Bottom-Up Parsers Produce the parse tree by beginning at the leaves and progressing towards the root  Most common bottom-up parsers are in the LR family  Complexity of Parsing Parsing algorithms that work for any unambiguous grammar are complex and inefficient: O(n3)  Compilers use parsers that only work for a subset of all unambiguous grammars, but do it in linear time: O(n)  Recursive-Descent Parsing Top-Down Parser  EBNF is ideal for the basis of a recursive-descent parser   Each terminal maps to a function  For a non-terminal with more than one RHS, look at the next token to determine which side to choose  No mapping = syntax error Recursive-Descent Parsing  Grammar for an expression: <expr> → <term> {+ <term>} <term> → <factor> {* <factor>} <factor> → id | int_constant | ( <expr> )  How do we parse? Expression: 1 + 2 <expr> → <term> + <term> → <factor> + <term> → 1 + <term> Recursive-Descent Parsing  Grammar for an expression: <expr> → <term> {+ <term>} <term> → <factor> {* <factor>} <factor> → id | int_constant | ( <expr> )  What does code look like? void expr() { term(); while (nextToken == ADD_OP) { lex(); term(); } } Recursive-Descent Parsing  The LL (Left Recursion) Problem <expr> → <expr> + <term> <expr> → <expr> + <term> + <term> <expr> → <expr> + <term> + <term> + <term>  How do we fix it?  Modify grammar to remove left recursion Before: After: <expr> → <expr> + <term> <expr> → <term> + <term> <term> → id | int_constant | <expr> Recursive-Descent Parsing  The Pairwise Disjointness Problem  If the grammar is not pairwise disjoint, how do you know which RHS to pick based on the next token? <variable> → identifier | identifier[<expr>]  How do we fix it?  Left Factoring <variable> → identifier<new> <new> → ø | [<expr>] Bottom-Up Parsing  Parsing is based on reduction  Reverse of a rightmost derivation  At each step, find the correct RHS that reduces to the previous step in the derivation  Example Grammar <S> → <A>b <A> → a <A> → b Input: ab Step 1: <A>b Step 2: <S> Bottom-Up Parsing  Most bottom-up parsers are shift-reduce algorithms  Shift – move token onto the stack  Reduce – replace RHS with LHS Bottom-Up Parsing  Handles  Def:  is the handle of the right sentential form iff  = w if and only if S =>*rm Aw =>rm w  The handle of a right sentential form is its leftmost simple phrase  Bottom-Up Parsing is essentially looking for handles and replacing them with their LHS Bottom-Up Parsing  Advantages of Shift Reduction Parsers  They can be built for all programming languages  They can detect syntax errors as soon as it is possible in a left-to-right scan  They LR class of grammars is a proper superset of the class parsable by LL parsers (for example, many left recursive grammars are LR, but none are LL) Bottom-Up Parsing  Shift Reduction Algorithms  Input Sequence – input to be parsed  Parse Stack – input is shifted onto the parse stack  ACTION Table – what the parser does  GOTO Table – holds state symbols to be pushed onto the stack when a reduction is completed Bottom-Up Parsing  ACTION Table (or Parse Table)  Rows = State Symbols  Columns = Terminal symbols  Values  Shift – push token on stack  Reduce – replace handle with LHS  Accept – stack only has start symbol and input is empty  Error – original input is invalid Bottom-Up Parsing  GOTO Table (or Parse Table)  Rows = State Symbols  Columns = Nonterminal Symbols  Values indicate which state symbol should be pushed onto the parse stack after a reduction has been completed

Sebesta 4

Related documents

Products

Support

Sebesta 4

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib