Chapter 5

CS 3240 – Chapter 5
Language                  Machine               Grammar
Regular                   Finite Automaton      Regular Expression, Regular Grammar
Context-Free              Pushdown Automaton    Context-Free Grammar
Recursively Enumerable    Turing Machine        Unrestricted Phrase-Structure Grammar

5.1: Context-Free Grammars
 Derivations
 Derivation Trees


5.2: Parsing and Ambiguity
5.3: CFGs and Programming Languages
 Precedence
 Associativity
 Expression Trees
CS 3240 - Context-Free Languages


S → aaSa | λ
It is not right-linear or left-linear
 so it is not a “regular grammar”

But it is linear
 each rule has at most one variable on its right-hand side

What is its language?
S → aSb | λ
Deriving aaabbb:
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb
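
The derivation above can be reproduced mechanically. The following is a minimal Python sketch, not taken from the course materials; the function name `derive_anbn` is hypothetical. It applies S → aSb n times and then S → λ, recording each sentential form.

```python
def derive_anbn(n):
    """Return the sentential forms in a derivation of a^n b^n from S -> aSb | λ."""
    form = "S"
    steps = [form]
    for _ in range(n):
        form = form.replace("S", "aSb", 1)   # apply S -> aSb
        steps.append(form)
    steps.append(form.replace("S", "", 1))   # apply S -> λ (the empty string)
    return steps

print(" ⇒ ".join(derive_anbn(3)))
# S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb
```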

Variables
 aka “non-terminals”

Letters from some alphabet, Σ
 aka “terminals”

Rules (“substitution rules”)
 of the form V → s
▪ where s is any string of letters and variables, or λ
 Rules are often called productions
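
One possible way to hold these pieces in a program is sketched below. The dictionary layout, the use of `""` for λ, and the helper `is_well_formed` are all assumptions made for illustration; a start variable is included because a derivation needs somewhere to begin.

```python
# Hypothetical representation: single-character symbols, "" stands for λ.
grammar = {
    "variables": {"S"},
    "terminals": {"a", "b"},
    "start": "S",
    "rules": {"S": ["aSb", ""]},   # S -> aSb | λ
}

def is_well_formed(g):
    """Check that every rule has the form V -> s, with s a string of letters and variables (or λ)."""
    symbols = g["variables"] | g["terminals"]
    return (g["start"] in g["variables"]
            and all(lhs in g["variables"] and all(ch in symbols for ch in rhs)
                    for lhs, alternatives in g["rules"].items()
                    for rhs in alternatives))

print(is_well_formed(grammar))  # True
```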








a^n c b^n
a^n b^(2n)
a^n b^m, where 0 ≤ n ≤ m ≤ 2n
a^n b^m, n ≠ m
Palindrome (start with a recursive definition)
Non-Palindrome
Equal
a^n b^n a^m
S → aSbSbS | bSaSbS | bSbSaS | λ
Trace ababbb
When building CFGs, remember that the start variable (S)
represents a string in the language. So, for example, if S has
twice as many b’s as a’s, then so does aSbSbS, etc.
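
One way to trace ababbb is with a leftmost derivation. The sketch below is illustrative only: `leftmost_derivation` and the particular sequence of rule choices are assumptions, and other derivations of the same string may exist.

```python
def leftmost_derivation(start, choices):
    """Replace the leftmost S with each chosen right-hand side in turn."""
    form = start
    steps = [form]
    for rhs in choices:
        assert "S" in form, "no variable left to expand"
        form = form.replace("S", rhs, 1)
        steps.append(form)
    return steps

# One possible choice of rules for ababbb ("" stands for λ).
steps = leftmost_derivation("S", ["aSbSbS", "", "aSbSbS", "", "", "", ""])
print(" ⇒ ".join(steps))
result = steps[-1]
print(result, result.count("b") == 2 * result.count("a"))
```

Each application of a non-λ rule adds one a and two b's, which is why every generated string has twice as many b's as a's.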


A derivation is a sequence of applications of
grammatical rules, eventually yielding a
string in the language
A CFG can have multiple variables on the
right-hand side of a rule
 Giving a choice of which variable to expand first

By convention, we usually use a leftmost
derivation: always expand the leftmost variable first
<S> → <NP> <VP>
<NP> → the <N>
<VP> → <V> <NP>
<V> → sings | eats
<N> → cat | song | canary
<S> ⇒ <NP> <VP>
⇒ the <N> <VP>
⇒ the canary <VP>
⇒ the canary <V> <NP>
⇒ the canary sings <NP>
⇒ the canary sings the <N>
⇒ the canary sings the song
Each line of the derivation above is a “sentential form”
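
Because the grammar above has no recursion, it generates only finitely many sentences, and they can be enumerated. The following is a rough sketch; the names `GRAMMAR` and `expand` are hypothetical.

```python
from itertools import product

GRAMMAR = {
    "<S>": [["<NP>", "<VP>"]],
    "<NP>": [["the", "<N>"]],
    "<VP>": [["<V>", "<NP>"]],
    "<V>": [["sings"], ["eats"]],
    "<N>": [["cat"], ["song"], ["canary"]],
}

def expand(symbol):
    """Yield every list of words derivable from symbol (terminates because the grammar is not recursive)."""
    if symbol not in GRAMMAR:            # a terminal word
        yield [symbol]
        return
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each right-hand-side symbol.
        for parts in product(*(list(expand(s)) for s in rhs)):
            yield [word for part in parts for word in part]

sentences = [" ".join(words) for words in expand("<S>")]
print(len(sentences))                               # 18
print("the canary sings the song" in sentences)     # True
```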




A graphical representation of a derivation
The start symbol is the root
Each symbol on the right-hand side of an applied rule
becomes a child of the variable it replaces, all at the same level
Continue until the leaves are all terminals
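
A derivation tree can be written down as nested tuples, with the variable first and its children after it. This encoding and the helper `leaves` are assumptions for illustration. Reading the leaves left to right recovers the derived string, here for “the canary sings the song”.

```python
# (variable, child, child, ...); plain strings are terminal leaves.
tree = ("<S>",
        ("<NP>", "the", ("<N>", "canary")),
        ("<VP>",
         ("<V>", "sings"),
         ("<NP>", "the", ("<N>", "song"))))

def leaves(node):
    """Collect the terminal leaves of a tree from left to right."""
    if isinstance(node, str):
        return [node]
    _label, *children = node
    return [leaf for child in children for leaf in leaves(child)]

print(" ".join(leaves(tree)))  # the canary sings the song
```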

Note how there was only one parse tree for
the string “the canary sings the song”
 And only one leftmost derivation


This is not true of all grammars!
Some grammars allow choices of distinct
rules to generate the same string
 Or equivalently, where there is more than one
parse tree for the same string

Such a grammar is ambiguous
 Not easy to process programmatically
<exp> → <exp> + <exp> | <exp> * <exp> | (<exp>) | a | b | c
<exp> ⇒ <exp> + <exp>
⇒ a + <exp>
⇒ a + <exp> * <exp>
⇒ a + b * <exp>
⇒ a + b * c

<exp> ⇒ <exp> * <exp>
⇒ <exp> + <exp> * <exp>
⇒ a + <exp> * <exp>
⇒ a + b * <exp>
⇒ a + b * c
Which one is “correct”?
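
The question matters because the two parse trees compute different values. The sketch below encodes each tree as a nested tuple and evaluates it. The tuple encoding, the `evaluate` helper, and the values assigned to a, b and c are all assumptions chosen for illustration.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def evaluate(tree, env):
    """Evaluate an expression tree of the form (operator, left, right) or a variable name."""
    if isinstance(tree, str):
        return env[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, env), evaluate(right, env))

env = {"a": 1, "b": 2, "c": 3}                 # hypothetical values

plus_at_root = ("+", "a", ("*", "b", "c"))     # tree for the first derivation
times_at_root = ("*", ("+", "a", "b"), "c")    # tree for the second derivation

print(evaluate(plus_at_root, env))    # 7
print(evaluate(times_at_root, env))   # 9
```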

The process of determining if a string is
generated by a grammar
 And often we want the parse tree
 So that we know the order of operations

Top-down Parsing
 Easiest conceptually

Bottom-up Parsing
 Often the most efficient (used by many commercial compilers)
 We will use a simple one in Chapter 6


Try to match a string, w, to a grammar
If there is a rule S → w, we’re done!
 Fat chance :-)

Try to find rules that match the first character
 A “look-ahead” strategy
 This is what we do “in our heads” anyway


Repeat on the rest of the string…
Very “brute force”
S → SS | aSb | bSa | λ
Parse “aabb”:
Candidate rules: 1) S → SS, 2) S → aSb
1) SS ⇒ SSS, SS ⇒ aSbS
2) aSb ⇒ aSSb, aSb ⇒ aaSbb
Answer: S ⇒ aSb ⇒ aaSbb ⇒ aabb (via candidate 2)
Not a well-defined algorithm (yet)!
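
To see what it takes to turn this into an algorithm, here is one possible sketch: a breadth-first search over leftmost derivations that discards sentential forms whose terminal prefix cannot match w. The names `viable` and `brute_force_parse` are hypothetical, and the length cutoff `max_len` is an assumption added so the search terminates. A result of `None` therefore only means "not found within the cutoff".

```python
from collections import deque

RULES = ["SS", "aSb", "bSa", ""]   # S -> SS | aSb | bSa | λ ("" stands for λ)

def viable(form, w, max_len):
    """Keep a sentential form only if it could still lead to w."""
    j = form.find("S")
    if j == -1:
        return form == w                         # all terminals: must be w itself
    return (len(form) <= max_len                 # assumed cutoff, see lead-in
            and len(form.replace("S", "")) <= len(w)
            and w.startswith(form[:j]))          # "look-ahead" on the matched prefix

def brute_force_parse(w, max_len=None):
    """Return a leftmost derivation of w as a list of sentential forms, or None."""
    if max_len is None:
        max_len = 2 * len(w) + 2                 # hypothetical default
    queue = deque([["S"]])
    seen = {"S"}
    while queue:
        path = queue.popleft()
        form = path[-1]
        if form == w:
            return path
        i = form.find("S")
        if i == -1:
            continue
        for rhs in RULES:
            new = form[:i] + rhs + form[i + 1:]  # expand the leftmost S
            if new not in seen and viable(new, w, max_len):
                seen.add(new)
                queue.append(path + [new])
    return None

print(" ⇒ ".join(brute_force_parse("aabb")))
# S ⇒ aSb ⇒ aaSbb ⇒ aabb
```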


A top-down parsing technique
Grammar Requirements:
 no ambiguity
 no lambdas
 no left-recursion (e.g., A → Ab)
 … and some other stuff



Create a function for each variable
Check first character to choose a rule
Start by calling S( )
CS 3240 - Context-Free Languages
20

Grammar:
S → aSb | ab

Function S:
 if length == 2, check to see if it is “ab”
 otherwise, consume the outer ‘a’ and ‘b’, then call S
on what’s left
 See parseanbn.py, parseanbn2.py
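
The function described above might look like the sketch below. This is not the contents of parseanbn.py, which is not reproduced here; the name `parse_S` is hypothetical.

```python
def parse_S(s):
    """Return True if s is generated by S -> aSb | ab."""
    if len(s) == 2:
        return s == "ab"                      # base case: S -> ab
    if len(s) > 2 and s[0] == "a" and s[-1] == "b":
        return parse_S(s[1:-1])               # S -> aSb: strip the outer a and b
    return False

print(parse_S("aaabbb"))  # True
print(parse_S("aabbb"))   # False
```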

Grammar:
A → BA | a
B → bB | b

See parsebstara.cpp
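
A Python sketch of the same idea follows, with one function per variable and a shared position in the input. It is not a translation of parsebstara.cpp, which is not reproduced here; the name `parse_b_star_a` is hypothetical. Because both B rules start with ‘b’, this sketch looks one character further ahead to choose between them.

```python
def parse_b_star_a(s):
    """Return True if s is generated by A -> BA | a, B -> bB | b."""
    pos = 0

    def B():
        nonlocal pos
        if pos < len(s) and s[pos] == "b":
            pos += 1                              # consume one 'b'
            if pos < len(s) and s[pos] == "b":
                return B()                        # B -> bB
            return True                           # B -> b
        return False

    def A():
        nonlocal pos
        if pos < len(s) and s[pos] == "b":
            return B() and A()                    # A -> BA
        if pos < len(s) and s[pos] == "a":
            pos += 1                              # A -> a
            return True
        return False

    return A() and pos == len(s)                  # all input must be consumed

print(parse_b_star_a("bba"))  # True
print(parse_b_star_a("ab"))   # False
```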

Lambda rules can cause sentential forms to shrink
 Then they can grow, and shrink again
 And grow, and shrink, and grow, and shrink…

How then can we know if the string isn’t in
the language?
 That is, how do we know when we’re done so we
can stop and reject the string?




A rule of the form A → B (a “unit rule”) doesn’t increase the
size of the sentential form
Once again, we could spend a long time
cycling through unit rules before the sentential form reaches length |w|
We prefer a method where the sentential form always grows
toward |w|, so we can stop and answer “yes” or
“no” efficiently
So, we will remove lambda and unit rules
 In Chapter 6


Precedence
Associativity

The expression grammar was ambiguous because it
treated all operators equally
 But multiplication should have higher precedence
than addition

So we introduce a new variable for
multiplicative expressions
 And place it further down in the rules
 Because we want it to appear further down in the
parse tree
<exp> → <exp> + <mulexp> | <mulexp>
<mulexp> → <mulexp> * <rootexp> | <rootexp>
<rootexp> → (<exp>) | a | b | c
Now only one leftmost derivation for a + b * c:
<exp> ⇒ <exp> + <mulexp> ⇒ <mulexp> + <mulexp>
⇒ <rootexp> + <mulexp>
⇒ a + <mulexp>
⇒ a + <mulexp> * <rootexp>
⇒ a + <rootexp> * <rootexp>
⇒ a + b * <rootexp>
⇒ a + b * c
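
A small parser for this grammar shows the effect on the resulting tree. The sketch below is an assumption-laden illustration, not the course's code: a function-per-variable parser cannot follow the left-recursive rules directly, so each one is replaced here by a loop that builds the same left-leaning tree. The names `parse_expression`, `peek` and `take` are hypothetical.

```python
def parse_expression(text):
    """Parse text with the <exp>/<mulexp>/<rootexp> grammar; return a nested-tuple tree."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"unexpected {tok!r}")
        pos += 1
        return tok

    def rootexp():                      # <rootexp> -> (<exp>) | a | b | c
        if peek() == "(":
            take("(")
            tree = exp()
            take(")")
            return tree
        tok = take()
        if tok not in ("a", "b", "c"):
            raise SyntaxError(f"unexpected {tok!r}")
        return tok

    def mulexp():                       # <mulexp> -> <mulexp> * <rootexp> | <rootexp>
        tree = rootexp()
        while peek() == "*":            # loop stands in for the left recursion
            take("*")
            tree = ("*", tree, rootexp())
        return tree

    def exp():                          # <exp> -> <exp> + <mulexp> | <mulexp>
        tree = mulexp()
        while peek() == "+":
            take("+")
            tree = ("+", tree, mulexp())
        return tree

    tree = exp()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree

print(parse_expression("a + b * c"))   # ('+', 'a', ('*', 'b', 'c'))
```

The multiplication ends up deeper in the tree than the addition, so it is evaluated first. Parsing "a + b + c" with the same sketch gives `('+', ('+', 'a', 'b'), 'c')`.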



Derive the parse tree for a + b + c …
Note how you get (a + b) + c, in effect
Left-recursion gives left associativity
 Analogously, right-recursion gives right associativity

Exercise:
 Add a right-associative power (exponentiation)
operator (^, with variable <powerexp>) to the
grammar with the proper precedence