Basic Parsing with Context- Free Grammars 1

advertisement
Basic Parsing with ContextFree Grammars
Slides adapted from Julia Hirschberg and Dan Jurafsky
1
Homework – Getting Started

Data
–
–
–
–
–

2
News articles in TDT4
Make sure you are looking in ENG sub-directory
You need to represent each article in .arff form
You need to write a program that will extract “features” from
each article
(Note that files with POS tags are now available in
Eng_POSTAGGED)
The .arff file contains independent variables as well
as the dependent variable
An example



Start with classifying into topic
Suppose you want to start with just the
words
Two approaches
1.
2.
3
Use your intuition to choose a few words that
might disambiguate
Start with all words
What would your .arff file look like?

Words are the attributes. What are the
values?



4
Binary: present or not
Frequency: how many times it occurs
TF*IDF: how many times it occurs in this document (TF
= term frequency) divided by how many times it occurs
in all documents (DF = document frequency
news_2865.input
5
<DOC>
<SOURCE_TYPE>NWIRE</SOURCE_TYPE>
<SOURCE_LANG>ENG</SOURCE_LANG>
<SOURCE_ORG>NYT</SOURCE_ORG>
<DOC_DATE>20010101</DOC_DATE>
<BROAD_TOPIC>BT_9</BROAD_TOPIC>
<NARROW_TOPIC>NT_34</NARROW_TOPIC>
<TEXT>
Reversing a policy that has kept medical errors secret for more than two decades, federal
officials say they will soon allow Medicare beneficiaries to obtain data about doctors who
botched their care. Tens of thousands of Medicare patients file complaints each year about
the quality of care they receive from doctors and hospitals. But in many cases, patients get
no useful information because doctors can block the release of assessments of their
performance. Under a new policy, officials said, doctors will no longer be able to veto
disclosure of the findings of investigations. Federal law has for many years allowed for
review of care received by Medicare patients, and the law says a peer review organization
must inform the patient of the ``final disposition of the complaint'' in each case. But the
federal rules used to carry out the law say the peer review organization may disclose
information about a doctor only ``with the consent of that practitioner.'' The federal manual
for peer review organizations includes similar language about disclosure. Under the new
plan, investigators will have to tell patients whether their care met ``professionally
recognized standards of health care'' and inform them of any action against the doctor or
the hospital. Patients could use such information in lawsuits and other actions against
doctors and hospitals that provided substandard care. The new policy came in response to
All words for news_2865.input












6
Class
Reversing
A
Policy
That
Has
Kept
Commonwealth
Independent
States
Preemptive
Refugees
BT_9
dependent
1
independent
100
20
50
75
3
0
(news_2816.input)
0
0
0
0
Try it!

Open your file
Select attributes using Chi-square
You can cut and paste resulting attributes to a file
Classify

How does it work?

Try n-grams, POS or date next in same way



–
7
How many features would each give you?
CFG: Example

Many possible CFGs for English, here is an example
(fragment):
–
–
–
–
–
–
–
–
–
8
S  NP VP
VP  V NP
NP  DetP N | AdjP NP
AdjP  Adj | Adv AdjP
N  boy | girl
V  sees | likes
Adj  big | small
Adv  very
DetP  a | the
the very small boy likes a girl
Modify the grammar
9
10
Derivations of CFGs


12
String rewriting system: we derive a string
(=derived structure)
But derivation history represented by phrasestructure tree (=derivation structure)!
Formal Definition of a CFG

G = (V,T,P,S)
V: finite set of nonterminal symbols

T: finite set of terminal symbols, V and T are disjoint

P: finite set of productions of the form
A  , A  V and   (T  V)*

13
S  V: start symbol
Context?
14

The notion of context in CFGs has nothing to do with
the ordinary meaning of the word context in
language

All it really means is that the non-terminal on the lefthand side of a rule is out there all by itself (free of
context)
A -> B C
Means that I can rewrite an A as a B followed by a C
regardless of the context in which A is found
Key Constituents (English)




15
Sentences
Noun phrases
Verb phrases
Prepositional phrases
Sentence-Types

Declaratives: I do not.
S -> NP VP

Imperatives: Go around again!
S -> VP

Yes-No Questions: Do you like my hat?
S -> Aux NP VP

WH Questions: What are they going to do?
S -> WH Aux NP VP
16
18
19
NPs

NP -> Pronoun
–

NP -> Proper-Noun
–
–


The president
NP -> Det Nominal
Nominal -> Noun Noun
–
20
New Jersey is west of New York City
Lee Bollinger is the president of Columbia
NP -> Det Noun
–

I came, you saw it, they conquered
A morning flight to Denver
PPs

PP -> Preposition NP
–
–
–
–
–
21
Over the house
Under the house
To the tree
At play
At a party on a boat at night
22
23
24
26
Recursion

We’ll have to deal with rules such as the
following where the non-terminal on the left
also appears somewhere on the right
(directly)
NP -> NP PP
VP -> VP PP
27
[[The flight] [to Boston]]
[[departed Miami] [at noon]]
Recursion

Of course, this is what makes syntax interesting
Flights from Denver
Flights from Denver to Miami
Flights from Denver to Miami in February
Flights from Denver to Miami in February on a Friday
Flights from Denver to Miami in February on a Friday under $300
Flights from Denver to Miami in February on a Friday under $300 with
lunch
28
Recursion
[[Flights] [from Denver]]
[[[Flights] [from Denver]] [to Miami]]
[[[[Flights] [from Denver]] [to Miami]] [in February]]
[[[[[Flights] [from Denver]] [to Miami]] [in February]] [on a
Friday]]
Etc.
NP -> NP PP
29
Implications of recursion and
context-freeness

30
If you have a rule like
–
VP -> V NP
–
It only cares that the thing after the verb is an NP
It doesn’t have to know about the internal affairs
of that NP
The point


VP -> V NP
(I) hate
flights from Denver
flights from Denver to Miami
flights from Denver to Miami in February
flights from Denver to Miami in February on a Friday
flights from Denver to Miami in February on a Friday under $300
flights from Denver to Miami in February on a Friday under $300 with
lunch
31
Grammar Equivalence

Can have different grammars that generate same set of
strings (weak equivalence)
–
–

Can have different grammars that have same set of
derivation trees (strong equivalence)
–
–
–

32
Grammar 1: NP  DetP N and DetP  a | the
Grammar 2: NP  a N | NP  the N
With CFGs, possible only with useless rules
Grammar 2: NP  a N | NP  the N
Grammar 3: NP  a N | NP  the N, DetP  many
Strong equivalence implies weak equivalence
Normal Forms &c


33
There are weakly equivalent normal forms
(Chomsky Normal Form, Greibach Normal
Form)
There are ways to eliminate useless
productions and so on
Chomsky Normal Form
A CFG is in Chomsky Normal Form (CNF) if all
productions are of one of two forms:
 A  BC with A, B, C nonterminals
 A  a, with A a nonterminal and a a terminal
Every CFG has a weakly equivalent CFG in CNF
34
“Generative Grammar”



35
Formal languages: formal device to generate a set of
strings (such as a CFG)
Linguistics (Chomskyan linguistics in particular):
approach in which a linguistic theory enumerates all
possible strings/structures in a language
(=competence)
Chomskyan theories do not really use formal devices
– they use CFG + informally defined transformations
Nobody Uses Simple CFGs
(Except Intro NLP Courses)



36
All major syntactic theories (Chomsky, LFG, HPSG,
TAG-based theories) represent both phrase structure
and dependency, in one way or another
All successful parsers currently use statistics about
phrase structure and about dependency
Derive dependency through “head percolation”: for
each rule, say which daughter is head
Penn Treebank (PTB)




Syntactically annotated corpus of newspaper texts
(phrase structure)
The newspaper texts are naturally occurring data, but
the PTB is not!
PTB annotation represents a particular linguistic
theory (but a fairly “vanilla” one)
Particularities
–
–
–
37
Very indirect representation of grammatical relations (need
for head percolation tables)
Completely flat structure in NP (brown bag lunch, pink-andyellow child seat )
Has flat Ss, flat VPs
Example from PTB
( (S (NP-SBJ It)
(VP 's
(NP-PRD (NP (NP the latest investment craze)
(VP sweeping
(NP Wall Street)))
:
(NP (NP a rash)
(PP of
(NP (NP new closed-end country funds)
,
(NP (NP those
(ADJP publicly traded)
portfolios)
(SBAR (WHNP-37 that)
(S (NP-SBJ *T*-37)
(VP invest
(PP-CLR in
(NP (NP stocks)
(PP of
(NP a single foreign country)))))))))))
Types of syntactic constructions

Is this the same construction?
An elf decided to clean the kitchen
– An elf seemed to clean the kitchen
An elf cleaned the kitchen
–

Is this the same construction?
An elf decided to be in the kitchen
– An elf seemed to be in the kitchen
An elf was in the kitchen
–
39
Types of syntactic constructions
(ctd)

Is this the same construction?
There is an elf in the kitchen
– *There decided to be an elf in the kitchen
– There seemed to be an elf in the kitchen

Is this the same construction?
It is raining/it rains
– ??It decided to rain/be raining
– It seemed to rain/be raining
40
Types of syntactic constructions
(ctd)
Conclusion:
 to seem: whatever is embedded surface
subject can appear in upper clause
 to decide: only full nouns that are referential
can appear in upper clause
 Two types of verbs
41
Types of syntactic constructions:
Analysis
S
NP
S
VP
an elf V
VP
S
to decide NP
VP
an elf V
S
V
to seem NP
PP
to be in the
kitchen
VP
an elf V
PP
to be in the
kitchen
Types of syntactic constructions:
Analysis
S
NP
S
VP
an elf V
VP
S
decided NP
VP
PRO V
S
V
seemed NP
PP
to be in the
kitchen
VP
an elf V
PP
to be in the
kitchen
Types of syntactic constructions:
Analysis
S
NP
S
VP
an elf V
VP
S
decided NP
VP
PRO V
S
V
seemed NP
PP
to be in the
kitchen
VP
an elf V
PP
to be in the
kitchen
Types of syntactic constructions:
Analysis
S
NP
S
NPi
VP
an elf V
an elf V
S
decided NP
VP
PRO V
VP
S
seemed NP
PP
to be in the
kitchen
ti
VP
V
PP
to be in the
kitchen
Types of syntactic constructions:
Analysis
to seem: lower surface subject raises to
upper clause; raising verb
seems (there to be an elf in the kitchen)
there seems (t to be an elf in the kitchen)
it seems (there is an elf in the kitchen)
46
Types of syntactic constructions:
Analysis (ctd)

to decide: subject is in upper clause and co-refers with an
empty subject in lower clause; control verb
an elf decided (an elf to clean the kitchen)
an elf decided (PRO to clean the kitchen)
an elf decided (he cleans/should clean the kitchen)
*it decided (an elf cleans/should clean the kitchen)
47
Lessons Learned from the
Raising/Control Issue




48
Use distribution of data to group phenomena into
classes
Use different underlying structure as basis for
explanations
Allow things to “move” around from underlying
structure -> transformational grammar
Check whether explanation you give makes
predictions
Empirical Matter
The Big Picture
or
Formalisms
•Data structures
•Formalisms
•Algorithms
•Distributional Models
uses
descriptive
theory is
about
predicts
Maud expects
there to be a
riot
*Teri
promised there
to be a riot
Maud expects
the shit to hit
the fan
*Teri
promised the
shit to hit the
explanatory
theory is about
Linguistic Theory
Content: Relate morphology to semantics
• Surface representation (eg, ps)
• Deep representation (eg, dep)
• Correspondence
Syntactic Parsing
50
Syntactic Parsing


51
Declarative formalisms like CFGs, FSAs
define the legal strings of a language -- but
only tell you ‘this is a legal string of the
language X’
Parsing algorithms specify how to recognize
the strings of a language and assign each
string one (or more) syntactic analyses
Parsing as a Form of Search

Searching FSAs
–
–

Searching CFGs
–
–

52
Finding the right path through the automaton
Search space defined by structure of FSA
Finding the right parse tree among all possible
parse trees
Search space defined by the grammar
Constraints provided by the input sentence
and the automaton or grammar
CFG for Fragment of English
S  NP VP
S  Aux NP VP
VP  V
NP  Det Nom
PP -> Prep NP
N  old | dog | footsteps |
young
NP PropN
V  dog | include | prefer
Nom -> Adj Nom
Nom  N Nom
Nom  N
Aux  does
Prep from | to | on | of
PropN  Bush | McCain |
Obama
Det  that | this | a| the
Nom  Nom PP
VP  V NP
TopD BotUp
E.g.
LC’s
Parse Tree for ‘The old dog the footsteps of the
young’ for Prior CFG
S
NP
VP
V
DET
NOM
old
NOM
DET
N
The
NP
dog
the
N
footsteps
PP
of the young
Top-Down Parser



Builds from the root S node to the leaves
Expectation-based
Common search strategy
–
–
–
–
–
55
Top-down, left-to-right, backtracking
Try first rule with LHS = S
Next expand all constituents in these trees/rules
Continue until leaves are POS
Backtrack when candidate POS does not match input string
Rule Expansion

“The old dog the footsteps of the young.”



56
Where does backtracking happen?
What are the computational disadvantages?
What are the advantages?
Bottom-Up Parsing

Parser begins with words of input and builds
up trees, applying grammar rules whose
RHS matches
Det N V Det N
Prep Det N
The old dog the footsteps of the young.
57
Det Adj N Det N
Prep Det N
The old dog the footsteps of the young.
Parse continues until an S root node reached or
no further node expansion possible
What’s right/wrong with….



Top-Down parsers – they never explore illegal parses
(e.g. which can’t form an S) -- but waste time on trees that
can never match the input
Bottom-Up parsers – they never explore trees
inconsistent with input -- but waste time exploring illegal
parses (with no S root)
For both: find a control strategy -- how explore search
space efficiently?
–
–
58
–
Pursuing all parses in parallel or backtrack or …?
Which rule to apply next?
Which node to expand next?
Some Solutions
Dynamic Programming Approaches – Use a chart
to represent partial results
 CKY Parsing Algorithm
–
–
–

Early Parsing Algorithm
–
–
–

59
Bottom-up
Grammar must be in Normal Form
The parse tree might not be consistent with linguistic theory
Top-down
Expectations about constituents are confirmed by input
A POS tag for a word that is not predicted is never added
Chart Parser
Earley Parsing


Allows arbitrary CFGs
Fills a table in a single sweep over the input
words
–
–
Table is length N+1; N is number of words
Table entries represent



60
Completed constituents and their locations
In-progress constituents
Predicted constituents
States

61
The table-entries are called states and are
represented with dotted-rules.
S -> · VP
A VP is predicted
NP -> Det · Nominal
An NP is in progress
VP -> V NP ·
A VP has been found
States/Locations

62
It would be nice to know where these things are in
the input so…
S -> · VP [0,0]
A VP is predicted at the
start of the sentence
NP -> Det · Nominal [1,2]
An NP is in progress; the
Det goes from 1 to 2
VP -> V NP ·
A VP has been found
starting at 0 and ending at 3
[0,3]
Graphically
63
Earley



As with most dynamic programming
approaches, the answer is found by looking
in the table in the right place.
In this case, there should be an S state in the
final column that spans from 0 to n+1 and is
complete.
If that’s the case you’re done.
–
64
S –> α · [0,n+1]
Earley Algorithm


March through chart left-to-right.
At each step, apply 1 of 3 operators
–
Predictor

–
Scanner

–
Match word predictions (rule with word after dot) to
words
Completer

65
Create new states representing top-down expectations
When a state is complete, see what rules were looking
for that completed constituent
Predictor

Given a state
–
–
–
–
–
With a non-terminal to right of dot
That is not a part-of-speech category
Create a new state for each expansion of the non-terminal
Place these new states into same chart entry as generated state,
beginning and ending where generating state ends.
So predictor looking at

–
results in


66
S -> . VP [0,0]
VP -> . Verb [0,0]
VP -> . Verb NP [0,0]
Scanner

Given a state
–
–
–
–
–
With a non-terminal to right of dot
That is a part-of-speech category
If the next word in the input matches this part-of-speech
Create a new state with dot moved over the non-terminal
So scanner looking at

–
If the next word, “book”, can be a verb, add new state:

–
–
67
VP -> . Verb NP [0,0]
VP -> Verb . NP [0,1]
Add this state to chart entry following current one
Note: Earley algorithm uses top-down input to disambiguate POS!
Only POS predicted by some state can get added to chart!
Completer



Applied to a state when its dot has reached right end of role.
Parser has discovered a category over some span of input.
Find and advance all previous states that were looking for this
category
–

Given:
–
–

NP -> Det Nominal . [1,3]
VP -> Verb. NP [0,1]
Add
–
68
copy state, move dot, insert in current chart entry
VP -> Verb NP . [0,3]
Earley: how do we know we are
done?



How do we know when we are done?.
Find an S state in the final column that spans
from 0 to n+1 and is complete.
If that’s the case you’re done.
–
69
S –> α · [0,n+1]
Earley

More specifically…
1.
2.
Predict all the states you can upfront
Read a word
1.
2.
3.
3.
70
Extend states based on matches
Add new predictions
Go to 2
Look at N+1 to see if you have a winner
Example


71
Book that flight
We should find… an S from 0 to 3 that is a
completed state…
Example
72
Example
73
Example
74
Details

What kind of algorithms did we just describe
–
Not parsers – recognizers



75
The presence of an S state with the right attributes in the
right place indicates a successful recognition.
But no parse tree… no parser
That’s how we solve (not) an exponential problem in
polynomial time
Converting Earley from
Recognizer to Parser


76
With the addition of a few pointers we have a
parser
Augment the “Completer” to point to where
we came from.
Augmenting the chart with
structural information
S8
S9
S10
S11
S12
S13
S8
S9
S8
Retrieving Parse Trees from Chart






78
All the possible parses for an input are in the table
We just need to read off all the backpointers from every
complete S in the last column of the table
Find all the S -> X . [0,N+1]
Follow the structural traces from the Completer
Of course, this won’t be polynomial time, since there could be
an exponential number of trees
So we can at least represent ambiguity efficiently
Left Recursion vs. Right
Recursion

Depth-first search will never terminate if
grammar is left recursive (e.g. NP --> NP PP)
*
*
( 

, 

)
79

Solutions:
–
Rewrite the grammar (automatically?) to a weakly
equivalent one which is not left-recursive
e.g. The man {on the hill with the telescope…}
NP  NP PP (wanted: Nom plus a sequence of PPs)
NP  Nom PP
NP  Nom
Nom  Det N
…becomes…
NP  Nom NP’
Nom  Det N
NP’  PP NP’ (wanted: a sequence of PPs)
NP’  e
 Not so obvious what these rules mean…
–
Harder to detect and eliminate non-immediate left
recursion
–
NP --> Nom PP
– Nom --> NP
–
–
Fix depth of search explicitly
Rule ordering: non-recursive rules first


81
NP --> Det Nom
NP --> NP PP
Another Problem: Structural
ambiguity

Multiple legal structures
–
–
–
82
Attachment (e.g. I saw a man on a hill with a
telescope)
Coordination (e.g. younger cats and dogs)
NP bracketing (e.g. Spanish language teachers)
NP vs. VP Attachment
83

Solution?
–
84
Return all possible parses and disambiguate using
“other methods”
Summing Up

Parsing is a search problem which may be
implemented with many control strategies
–
Top-Down or Bottom-Up approaches each have
problems

–
–

Left recursion
Syntactic ambiguity
Next time: Making use of statistical
information about syntactic constituents
–
85
Combining the two solves some but not all issues
Read Ch 14
Download