Sebesta 4

CSci210.BA4
Chapter 4 Topics
Introduction
 Lexical and Syntax Analysis
 The Parsing Problem
 Recursive-Descent Parsing
 Bottom-Up Parsing

Introduction
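Syntax analyzers are almost always based
on a formal description of the syntax of
the source language (a grammar)
 Almost all compilers split syntax analysis
into two parts:

 Lexical Analysis – low-level: small-scale constructs
such as names and numeric literals
 Syntax Analysis – high-level: large-scale constructs
such as expressions and statements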
Reasons to Separate Syntax and
Lexical Analysis

Simplicity – lexical analysis is less complex than syntax
analysis, so each part is simpler when the two are separated
Efficiency – the lexical analyzer can be optimized on its
own, without touching the syntax analyzer
 Portability – the lexical analyzer is somewhat platform
dependent, whereas the syntax analyzer can be platform
independent
Lexical Analysis
A pattern matcher for character strings
 Performs syntax analysis at the lowest
level of the program structure
 Extracts lexemes from the input string
and produces the corresponding tokens

Lexical Analysis (continued)
result = oldsum - value / 100;

Token        Lexeme
IDENT        result
ASSIGN_OP    =
IDENT        oldsum
SUB_OP       -
IDENT        value
DIV_OP       /
INT_LIT      100
SEMICOLON    ;
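The token/lexeme pairing above can be sketched as a small hand-written scanner. This is a minimal illustration only: the function name `next_token`, the `Token` struct, the 32-character lexeme buffer, and the extra `UNKNOWN` and `END_OF_INPUT` codes are assumptions made for this sketch and do not come from the slides.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token codes named as in the table; the last two are added for this sketch. */
typedef enum {
    IDENT, INT_LIT, ASSIGN_OP, SUB_OP, DIV_OP, SEMICOLON,
    UNKNOWN, END_OF_INPUT
} TokenType;

typedef struct {
    TokenType type;
    char lexeme[32];                 /* longer lexemes are truncated */
} Token;

/* Reads one token starting at *src and advances *src past it. */
Token next_token(const char **src) {
    assert(src != NULL && *src != NULL);
    const char *p = *src;
    Token tok;
    size_t n = 0;

    memset(&tok, 0, sizeof tok);
    tok.type = END_OF_INPUT;
    while (isspace((unsigned char)*p)) p++;          /* skip whitespace */

    if (isalpha((unsigned char)*p)) {                /* letter (letter | digit)* */
        while (isalnum((unsigned char)*p)) {
            if (n < sizeof tok.lexeme - 1) tok.lexeme[n++] = *p;
            p++;
        }
        tok.type = IDENT;
    } else if (isdigit((unsigned char)*p)) {         /* digit digit* */
        while (isdigit((unsigned char)*p)) {
            if (n < sizeof tok.lexeme - 1) tok.lexeme[n++] = *p;
            p++;
        }
        tok.type = INT_LIT;
    } else if (*p != '\0') {                         /* single-character tokens */
        tok.lexeme[n++] = *p;
        switch (*p++) {
            case '=': tok.type = ASSIGN_OP; break;
            case '-': tok.type = SUB_OP;    break;
            case '/': tok.type = DIV_OP;    break;
            case ';': tok.type = SEMICOLON; break;
            default:  tok.type = UNKNOWN;   break;
        }
    }
    tok.lexeme[n] = '\0';
    *src = p;
    return tok;
}
```

Calling `next_token` repeatedly on the statement above should yield the tokens in the order listed in the table, followed by `END_OF_INPUT`.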
Building a Lexical Analyzer
Three approaches:
 Write a formal description of the tokens
and use a software tool that constructs
lexical analyzers from such a
description
 Design a state transition diagram that
describes the tokens and write a program
that implements the diagram
 Design a state transition diagram that
describes the tokens and hand-construct a
table-driven implementation of the state
diagram

State (Transition) Diagram Design
A directed graph with nodes labeled with
state names and arcs labeled with input
characters
 A diagram with separate states and transitions
for every token pattern would be too
large and complex
 Transitions can be combined to simplify
the state diagram (for example, one LETTER class
for all letters and one DIGIT class for all digits)

The Parsing Problem

Two goals of syntax analysis:
 Check the input program for any syntax
errors, produce a diagnostic message if an
error is found, and recover
 Produce the parse tree, or at least a trace of
the parse tree, for the program

Two Classes of parsers:
 Top-down
 Bottom-up
Top-Down Parsers
Trace or build the parse tree in preorder,
which corresponds to a leftmost derivation
 The most common top-down parsing
algorithms:

 Recursive descent
 LL parsers
Bottom-Up Parsers
Produce the parse tree by beginning at
the leaves and progressing towards the
root
 Most common bottom-up parsers are in
the LR family

Complexity of Parsing
Parsing algorithms that work for any
unambiguous grammar are complex and
inefficient: O(n^3), where n is the length of the input
 Compilers use parsers that work for only
a subset of all unambiguous grammars,
but do so in linear time: O(n)

Recursive-Descent Parsing
Top-Down Parser
 EBNF is ideally suited as the basis for a
recursive-descent parser

 Each nonterminal maps to a function (subprogram)
 For a nonterminal with more than one RHS,
look at the next token to determine which
RHS to choose
 No matching RHS = syntax error
Recursive-Descent Parsing

Grammar for an expression:
<expr> → <term> {+ <term>}
<term> → <factor> {* <factor>}
<factor> → id | int_constant | ( <expr> )

How do we parse?
Expression: 1 + 2
<expr>
=> <term> + <term>
=> <factor> + <term>
=> 1 + <term>
=> 1 + <factor>
=> 1 + 2
Recursive-Descent Parsing

Grammar for an expression:
<expr> → <term> {+ <term>}
<term> → <factor> {* <factor>}
<factor> → id | int_constant | ( <expr> )

What does code look like?
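/* Parses strings generated by <expr> → <term> {+ <term>} */
void expr() {
    term();                          /* parse the first <term> */
    while (nextToken == ADD_OP) {    /* as long as a + follows */
        lex();                       /* consume the + */
        term();                      /* parse the next <term> */
    }
}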
Recursive-Descent Parsing

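The Left Recursion Problem (LL Grammars)
A rule such as <expr> → <expr> + <term> makes expr() call
itself before consuming any input, so the expansion never ends:
<expr> => <expr> + <term>
=> <expr> + <term> + <term>
=> <expr> + <term> + <term> + <term>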

How do we fix it?
 Modify grammar to remove left recursion
Before:
<expr> → <expr> + <term> | <term>
After:
<expr> → <term> <expr_tail>
<expr_tail> → + <term> <expr_tail> | ε
<term> → id | int_constant | ( <expr> )
Recursive-Descent Parsing

The Pairwise Disjointness Problem
 If the RHSs of a nonterminal do not begin with different
tokens (the grammar is not pairwise disjoint), how do you know
which RHS to pick based on the next token?
<variable> → identifier | identifier[<expr>]
(both RHSs begin with identifier)

How do we fix it?
 Left Factoring
<variable> → identifier<new>
<new> → ε | [<expr>]
Bottom-Up Parsing

Parsing is based on reduction
 A reduction is the reverse of a step in a rightmost derivation
 At each step, find the RHS in the current
sentential form that reduces to the previous
step in the derivation

Example Grammar
<S> → <A>b
<A> → a
<A> → b
Input: ab
Step 1: <A>b (reduce a using <A> → a)
Step 2: <S> (reduce <A>b using <S> → <A>b)
Bottom-Up Parsing

Most bottom-up parsers are shift-reduce
algorithms
 Shift – move the next input token onto the stack
 Reduce – replace the RHS (handle) on top of the stack with its LHS
Bottom-Up Parsing

Handles
 Def: β is the handle of the right sentential
form γ = αβw if and only if
S =>*rm αAw =>rm αβw
 The handle of a right sentential form is its
leftmost simple phrase
 Bottom-Up Parsing is essentially looking for
handles and replacing them with their LHS
Bottom-Up Parsing

Advantages of LR (Shift-Reduce) Parsers
 They can be built for all programming
languages
 They can detect syntax errors as soon as
possible in a left-to-right scan
 The LR class of grammars is a proper
superset of the class parsable by LL parsers
(for example, many left recursive grammars
are LR, but none are LL)
Bottom-Up Parsing

Shift-Reduce Algorithms
 Input Sequence – the tokens still to be parsed
 Parse Stack – input tokens are shifted onto the
parse stack
 ACTION Table – specifies what the parser does
for each state and next input token
 GOTO Table – holds the state symbols to be
pushed onto the stack after a reduction is
completed
Bottom-Up Parsing

ACTION Table (part of the Parse Table)
 Rows = State Symbols
 Columns = Terminal Symbols

Values
 Shift – push the next input token on the stack
 Reduce – replace the handle with its LHS
 Accept – the stack holds only the start symbol and
the input is empty
 Error – no action is defined, so the input has a syntax error
Bottom-Up Parsing

GOTO Table (part of the Parse Table)
 Rows = State Symbols
 Columns = Nonterminal Symbols

Values indicate which state symbol
should be pushed onto the parse stack
after a reduction has been completed