English Syntax

Read J & M Chapter 9.
Two Kinds of Issues
•Linguistic – what are the facts about language?
The rules of syntax (grammar)
•Algorithmic – what are effective computational procedures
for dealing with those facts?
Building parsers
What is Syntax?
Try 1: the rules for stringing words together to form sentences.
The boys hit the ball. vs. Ball boys hit the the.
I gave Sue a ride to the store. vs. I gave Sue ride to store.
I saw the book that Mary had written. vs.
I saw the book what Mary had written.
But if that were all syntax is, it would contribute little to
understanding, at least when we can assume well-formed input.
What is Syntax?
Try 2: The rules for forming constituents that correspond to
meaningful entities.
Example: The cat with the furry tail purred.
Why Do We Care about Syntax?
Morphology
POS Tagging
Syntax
Semantics
Discourse Integration
Understanding runs through these stages in order; generation runs them in
reverse. For this reason, we generally want declarative representations
of the facts, which can be used in either direction.
Sometimes We Need It Even if We Don’t Go All the Way
Question answering:
Lawyers whose clients committed fraud
vs
Lawyers who committed fraud
vs
Clients whose lawyers committed fraud
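To see why word-level matching is not enough here, the sketch below compares the three phrases as bags of words. It assumes simple lowercase whitespace tokenization; `bag_of_words` is a hypothetical helper, not part of any particular QA system.

```python
from collections import Counter

def bag_of_words(phrase: str) -> Counter:
    """Count lowercase, whitespace-separated tokens, ignoring word order."""
    return Counter(phrase.lower().split())

a = "Lawyers whose clients committed fraud"
b = "Clients whose lawyers committed fraud"
c = "Lawyers who committed fraud"

# a and b contain exactly the same words, so a bag-of-words matcher
# cannot tell them apart, even though they pick out different people.
print(bag_of_words(a) == bag_of_words(b))  # True
print(bag_of_words(a) == bag_of_words(c))  # False
```

Only a syntactic analysis tells us which noun heads the phrase (the people being asked about) and which one sits inside the relative clause.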
Finding Constituents in Sentences
A constituent is a word or group of words that functions as a
unit.
How can we discern constituents?
•Semantically:
The cat with the furry tail purred.
•What can be chopped out and replaced by a single word?
Agnes purred.
* Agnes tail purred.
Finding Constituents in Sentences, cont’d
•Preposed and postposed constructions:
Early next year I’d like to go to Paris.
I’d like to go to Paris early next year.
I’d like early next year to go to Paris.
* Early I’d like to go to Paris next year.
* I’d like early to go to Paris next year.
* The early next year old man would like to go to Paris.
How Many Kinds of Constituents are There?
Although there may be an infinite number of possible
constituent tokens, there’s quite a small number of constituent
types, e.g., NP, PP, VP.
On what basis can we group tokens into types? Occurrence in
similar contexts.
How Many Kinds of Constituents are There, cont’d
The cat with the furry tail purred.
Every dog wore a collar.
Most of the children in the room brought a dog with a furry
tail and a collar.
The furry tail brought a room.
Every room purred.
A dog with a furry tail and a collar purred.
Mary saw most of the children in the room.
NPs occur as subjects, objects of verbs, and objects of
prepositions.
Single Word Constituents
Single word constituents are exactly the parts of speech that we
have already considered.
How many of these single word constituent types are there?
Look at sizes of tagsets.
Lots of design decisions:
Sue bought the big white house.
* Sue bought the white big house.
Are big and white the same POS?
Simple Constituent Types Don’t Capture Everything
* The cat with a furry tail purred a collar.
Mary imagined a cat with a furry tail.
Mary decided to go.
* Mary decided a cat with a furry tail.
Mary decided a cat with a furry tail would be her next pet.
Mary gave Lucy the food.
* Mary decided Lucy the food.
Subcategorization
Frame       | Verb                     | Example
Ø           | eat, sleep, …            | I want to eat
NP          | prefer, find, leave, …   | Find [NP the flight from Pittsburgh to Boston]
NP NP       | show, give, …            | Show [NP me] [NP airlines with flights from Pittsburgh]
PPfrom PPto | fly, travel, …           | I would like to fly [PP from Boston] [PP to Philadelphia]
NP PPwith   | help, load, …            | Can you help [NP me] [PP with a flight]
VPto        | prefer, want, need, …    | I would prefer [VPto to go by United airlines]
VPbrst      | can, would, might, …     | I can [VPbrst go from Boston]
S           | mean                     | Does this mean [S AA has a hub in Boston]?
(VPto = infinitival VP with to; VPbrst = bare-stem VP)
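A subcategorization table like this naturally lives in the lexicon, where a parser can consult it. The sketch below is a toy version under stated assumptions: `SUBCAT` and `licenses` are hypothetical names, frames are tuples of the labels in the table, and "decide" is filled in from the earlier Mary examples rather than from the table.

```python
# Hypothetical toy lexicon: verb -> set of complement frames it allows.
# Frames are tuples of labels from the table; () stands for the empty frame Ø.
# "decide" comes from the earlier examples (Mary decided to go /
# Mary decided [S a cat with a furry tail would be her next pet]).
SUBCAT = {
    "eat": {()},
    "sleep": {()},
    "find": {("NP",)},
    "prefer": {("NP",), ("VPto",)},
    "show": {("NP", "NP")},
    "give": {("NP", "NP")},
    "fly": {("PPfrom", "PPto")},
    "help": {("NP", "PPwith")},
    "want": {("VPto",)},
    "can": {("VPbrst",)},
    "mean": {("S",)},
    "decide": {("VPto",), ("S",)},
}

def licenses(verb, complements):
    """True if the lexicon lists exactly this sequence of complements for the verb."""
    return tuple(complements) in SUBCAT.get(verb, set())

print(licenses("give", ["NP", "NP"]))    # True:  Mary gave [NP Lucy] [NP the food]
print(licenses("decide", ["NP", "NP"]))  # False: * Mary decided Lucy the food
```

Exact matching is a simplification: many arguments are optional (one can simply "fly"), so a fuller lexicon would need to mark optionality rather than list every combination.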
The Role of the Lexicon in Parsing
•Serves as the starting point for POS tagging.
•Provides additional information such as subcategorization:
•For verbs
•For adjectives:
I’m angry with Mary.
I’m mad at Mary.
I’m angry at Mary.
* I’m mad with Mary.
•For nouns:
Jane has a passion for old movies.
Jane has an interest in old movies.
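The same mechanism carries over to adjectives and nouns, whose entries can record which preposition may head their PP complement. A minimal sketch, assuming a hypothetical `PP_HEADS` table built only from the example sentences above:

```python
# Hypothetical lexicon fragment: (word, POS) -> prepositions its PP complement may use.
# Entries come only from the example sentences on this slide.
PP_HEADS = {
    ("angry", "Adj"): {"with", "at"},
    ("mad", "Adj"): {"at"},
    ("passion", "N"): {"for"},
    ("interest", "N"): {"in"},
}

def pp_ok(word, pos, prep):
    """True if the lexicon lets `word` (with part of speech `pos`) take a PP headed by `prep`."""
    return prep in PP_HEADS.get((word, pos), set())

print(pp_ok("angry", "Adj", "with"))  # True:  I'm angry with Mary.
print(pp_ok("mad", "Adj", "with"))    # False: * I'm mad with Mary.
```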
One Other Barrier to a Small Number of Kinds of Constituents: Agreement
Number agreement:
The boys want to go to the game(s).
* The boy want to go to the game(s).
Case agreement:
I want to give it to him.
* Me want to give it to he.
In English, case is marked only on
pronouns, but this is not so in
many other languages.
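A common way to enforce such constraints is to attach feature values to words and require compatible values where words combine. The sketch below assumes a tiny hypothetical lexicon with `num` and `case` features; it checks only third-person subject-verb number agreement and pronoun case, and the helper names are illustrative.

```python
# Hypothetical feature lexicon. A missing feature means "unspecified".
LEXICON = {
    "boy":   {"num": "sg"},
    "boys":  {"num": "pl"},
    "wants": {"num": "sg"},
    "want":  {"num": "pl"},  # only third-person subjects are checked here
    "I":     {"case": "nom"},
    "he":    {"case": "nom"},
    "me":    {"case": "acc"},
    "him":   {"case": "acc"},
}

def number_agrees(subject, verb):
    """Third-person subject-verb agreement: values must match if both are specified."""
    s = LEXICON[subject].get("num")
    v = LEXICON[verb].get("num")
    return s is None or v is None or s == v

def case_ok(pronoun, position):
    """Subjects need nominative case; objects of verbs and prepositions need accusative."""
    required = "nom" if position == "subject" else "acc"
    return LEXICON[pronoun].get("case", required) == required

print(number_agrees("boys", "want"), number_agrees("boy", "want"))  # True False
print(case_ok("me", "subject"), case_ok("he", "object"))            # False False
```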
The Solution – Augmenting the Constituent Types
To solve these and other problems, one strategy is to augment
constituent types with other sorts of information:
Show                V +pl +[NP NP], equivalently VP/NP/NP +pl
Show me             VP/NP +pl
Show me the book.   VP +pl
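One way to read the slash notation is "a VP still missing these NPs": each NP that follows cancels one slash, and the number feature is carried along unchanged. A minimal sketch of that bookkeeping, assuming a hypothetical `Cat` record and treating every complement simply as an NP:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cat:
    base: str        # e.g. "VP"
    missing: tuple   # complements still needed, in order, e.g. ("NP", "NP")
    num: str         # agreement feature, e.g. "pl"

    def __str__(self):
        return "/".join((self.base,) + self.missing) + " +" + self.num

def combine(head, arg):
    """Cancel the first missing complement if `arg` matches it; otherwise return None."""
    if head.missing and head.missing[0] == arg:
        return Cat(head.base, head.missing[1:], head.num)
    return None

show = Cat("VP", ("NP", "NP"), "pl")      # "Show"
show_me = combine(show, "NP")             # "Show me"
show_me_book = combine(show_me, "NP")     # "Show me the book."
print(show, "|", show_me, "|", show_me_book)
```

Once nothing is missing, `combine` returns `None` for any further NP, mirroring the fact that "Show me the book" takes no third object.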
Specifying a Language
•The set of sentences in English is large (maybe even infinite).
•We want a concise (i.e., much shorter than a list of sentences)
definition of it.
•We have a finite (in fact quite small) set of constituent types (NP,
VP, etc.) from which to build our description.
So we appeal to recursion and write grammar rules such as:
S → NP VP
VP → V NP
NP → NP PP
NP → NP S (The boy who went to the store won the game.)
PP → prep NP
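To see how a handful of recursive rules yields an unbounded set of sentences, the sketch below expands the rules above up to a depth bound and counts the derivations. It assumes an extra base rule NP → Det N (not listed above, but needed to stop the recursion), leaves out NP → NP S to keep things small, and uses a hypothetical one-word-per-POS lexicon.

```python
from itertools import product

# Rules from the slide plus one assumed base rule NP -> Det N
# (NP -> NP S is left out to keep the output small).
RULES = {
    "S":  [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["Det", "N"], ["NP", "PP"]],
    "PP": [["prep", "NP"]],
}
# Assumed toy lexicon: one word per part of speech.
WORDS = {"Det": "the", "N": "dog", "V": "saw", "prep": "near"}

def expand(symbol, depth):
    """Yield one word list per derivation of `symbol` using at most `depth` levels of rules."""
    if symbol in WORDS:
        yield [WORDS[symbol]]
        return
    if depth == 0:
        return
    for rhs in RULES[symbol]:
        # Expand each child independently, then take every combination.
        for parts in product(*(list(expand(child, depth - 1)) for child in rhs)):
            yield [w for part in parts for w in part]

for d in range(3, 8):
    print(d, len(list(expand("S", d))))
```

The number of derivations keeps growing as the bound rises (1, 2, 6, 21, 154 for bounds 3 through 7), and nothing in the rules caps it. Some derivations yield the same word string with different PP attachments, so these counts are parse trees, not distinct sentences.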
A Context-Free Grammar for English
If we ignore:
•subcategorization
•agreement
•gapping
Then we can build a context-free grammar for English that does
a pretty good job of:
•generating all and only the acceptable sentences, and of
•building reasonable parse trees for those sentences.
We’ll look at whether English is formally context free later.
Context-Free Grammars
A context-free grammar (CFG) is a 4-tuple:
1. A set of non-terminal symbols N
2. A set of terminals $\Sigma$ (disjoint from N)
3. A set of productions P, each of the form $A \rightarrow \beta$, where A
is a non-terminal and $\beta$ is a string of symbols from the
infinite set of strings $(\Sigma \cup N)^*$
4. A designated start symbol S
In our grammar of English:
• $\Sigma$ is the set of POS, and
• N is the set of remaining constituent types, e.g., NP, VP, PP
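The 4-tuple maps directly onto a small data structure. The sketch below assumes POS labels Det, N, Name, V, prep as $\Sigma$ (illustrative choices) and a hypothetical `check` method that enforces the conditions in the definition: N and $\Sigma$ disjoint, S in N, and every production of the form $A \rightarrow \beta$ with $A \in N$ and $\beta \in (\Sigma \cup N)^*$.

```python
from dataclasses import dataclass

@dataclass
class CFG:
    nonterminals: set   # N
    terminals: set      # Sigma (here: parts of speech)
    productions: list   # P: pairs (A, beta), beta a tuple of symbols
    start: str          # S

    def check(self):
        """Raise ValueError if the 4-tuple violates the definition; otherwise return True."""
        if self.nonterminals & self.terminals:
            raise ValueError("N and Sigma must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("start symbol must be a non-terminal")
        symbols = self.nonterminals | self.terminals
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not a non-terminal")
            if any(sym not in symbols for sym in rhs):
                raise ValueError(f"right-hand side of {lhs!r} uses an unknown symbol")
        return True

g = CFG(
    nonterminals={"S", "NP", "VP", "PP"},
    terminals={"Det", "N", "Name", "V", "prep"},
    productions=[("S", ("NP", "VP")), ("NP", ("Det", "N")), ("NP", ("Name",)),
                 ("NP", ("NP", "PP")), ("VP", ("V", "NP")), ("PP", ("prep", "NP"))],
    start="S",
)
print(g.check())  # True
```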
Derivations Using CFGs
The standard formal definition:
The language $L_G$ generated by grammar G is the set of strings composed
of terminal symbols that can be derived from the
designated start symbol S:
$L_G = \{ w \mid w \in \Sigma^* \text{ and } S \Rightarrow^* w \}$
But we generally won’t want our grammar to go all the way
down to words. We want to let the lexicon do that.
That’s why we let $\Sigma$ be the set of POS. So the grammar
may generate strings such as:
N V Det N
Derivations Using CFGs, cont’d
So we will use the following definition:
$L_G = \{ s \mid w \in \Sigma^* \text{ and } S \Rightarrow^* w \text{ and } s \text{ can be derived from } w \}$,
where deriving s from w means substituting words for POS as licensed by
the lexicon.
Note that this doesn’t change the formal picture. We could
instead augment our grammar with tens of thousands of rules
of the form:
N → phlogiston
This is a system design decision.
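The two-stage definition can be sketched directly: first derive a string of POS symbols from S, then substitute every word the lexicon licenses for each POS. The grammar fragment, the hypothetical `LEXICON`, and the helper names are assumptions for illustration; rule choices are passed in explicitly rather than searched for.

```python
from itertools import product

RULES = {"S": [("NP", "VP")], "NP": [("Det", "N"), ("Name",)], "VP": [("V", "NP")]}
POS = {"Det", "N", "Name", "V"}
# Hypothetical lexicon: POS -> words it may be replaced by.
LEXICON = {"Det": ["the"], "N": ["pizza", "phlogiston"], "Name": ["John"], "V": ["ate"]}

def leftmost_derivation(choices):
    """Rewrite the leftmost non-terminal at each step, using the given rule indices."""
    form = ["S"]
    steps = [" ".join(form)]
    for i in choices:
        # Index of the leftmost symbol that is not a POS (i.e., a non-terminal).
        k = next(j for j, sym in enumerate(form) if sym not in POS)
        form[k:k + 1] = RULES[form[k]][i]
        steps.append(" ".join(form))
    return steps

def lexicalize(pos_string):
    """All word strings obtainable by substituting licensed words for each POS."""
    return [" ".join(ws) for ws in product(*(LEXICON[p] for p in pos_string.split()))]

steps = leftmost_derivation([0, 1, 0, 0])
print(" => ".join(steps))       # S => NP VP => Name VP => Name V NP => Name V Det N
print(lexicalize(steps[-1]))    # ['John ate the pizza', 'John ate the phlogiston']
```

Keeping the words out of the grammar means adding "phlogiston" is a lexicon edit, not a new rule, which is the design choice described above.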
Context-Free Grammars and Parse Trees
S → NP VP
NP → Name
NP → Det N
VP → V NP
Parse tree for "John ate the pizza", in bracketed form:
(S (NP (Name John))
   (VP (V ate)
       (NP (Det the)
           (N pizza))))
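A tree like this is easy to represent and check in code. The sketch below assumes trees encoded as nested tuples `(label, child, ...)` with words as plain strings; it prints the bracketed form, recovers the sentence from the leaves, and verifies that every phrasal node expands by one of the four rules. All names are hypothetical.

```python
# A parse tree as nested tuples: (label, child, child, ...); leaves are plain strings.
tree = ("S",
        ("NP", ("Name", "John")),
        ("VP", ("V", "ate"),
               ("NP", ("Det", "the"), ("N", "pizza"))))

RULES = {("S", ("NP", "VP")), ("NP", ("Name",)), ("NP", ("Det", "N")), ("VP", ("V", "NP"))}

def bracketed(t):
    """Render a tree in labeled-bracket notation."""
    if isinstance(t, str):
        return t
    label, *children = t
    return "(" + label + " " + " ".join(bracketed(c) for c in children) + ")"

def leaves(t):
    """Return the words at the leaves, left to right."""
    if isinstance(t, str):
        return [t]
    return [w for c in t[1:] for w in leaves(c)]

def licensed(t):
    """True if every phrasal node matches a grammar rule; POS nodes just dominate a word."""
    label, *children = t
    if len(children) == 1 and isinstance(children[0], str):
        return True  # preterminal: a POS over a word
    kids = tuple(c[0] for c in children)
    return (label, kids) in RULES and all(licensed(c) for c in children)

print(bracketed(tree))     # (S (NP (Name John)) (VP (V ate) (NP (Det the) (N pizza))))
print(" ".join(leaves(tree)), licensed(tree))  # John ate the pizza True
```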
Long Distance Dependencies
Who did she say she saw ____ coming down the hill?
She did say she saw who coming down the hill.
The boy she saw _____ coming down the road was crying.
Long Distance Dependencies – A Linguistic Solution
Transformational Grammar (Chomsky, 1965):
•A context free grammar generates base forms
•A transformational component moves constituents around
and may delete them from the surface form.
But how can we run these rules backwards, recovering the base form from the surface string?
This approach went out of fashion at least 20 years ago.
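The asymmetry shows up even in a toy version of wh-movement over word lists. Moving "who" forward is deterministic, but undoing it from the surface string means guessing where the gap was. This is a deliberately simplified sketch (fronting plus a swap of the first two words for subject-auxiliary inversion, nothing more), and `front_wh`, `possible_sources`, and `NEEDS_NP` are hypothetical names.

```python
def front_wh(base):
    """Toy wh-movement: move 'who' to the front and invert subject and auxiliary."""
    words = [w for w in base if w != "who"]
    words[0], words[1] = words[1], words[0]   # "she did" -> "did she"
    return ["who"] + words

def possible_sources(surface):
    """Undo the toy transformation: 'who' could have come from any position."""
    rest = surface[1:]
    rest[0], rest[1] = rest[1], rest[0]       # undo the inversion
    return [rest[:i] + ["who"] + rest[i:] for i in range(len(rest) + 1)]

base = "she did say she saw who coming down the hill".split()
surface = front_wh(base)
candidates = possible_sources(surface)
print(" ".join(surface))                  # who did she say she saw coming down the hill
print(len(candidates), "candidate base forms")

# Subcategorization narrows the search. Assumption: only "saw" here is missing an NP object.
NEEDS_NP = {"saw"}
def gap_after_np_taker(c):
    i = c.index("who")
    return i > 0 and c[i - 1] in NEEDS_NP
filtered = [c for c in candidates if gap_after_np_taker(c)]
print([" ".join(c) for c in filtered])
```

Without the subcategorization filter, every position is a possible gap site; lexical information is what lets a parser rule most of them out.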
Long Distance Dependencies – Computational Solutions
•Augmented Transition Networks: Allow arbitrary actions on the
arcs. These permit insertions and movements of constituents.
But no procedural solution of this kind is reversible, so it can’t
be reused for generation.
•Unification systems: Declarative patterns for assigning
constituents to fill subcategorization slots.
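Unification combines two partial descriptions into one if they are compatible and fails otherwise. A minimal sketch over feature structures encoded as nested dicts; it assumes no shared (reentrant) substructures, which real unification grammars do support, and the feature names are illustrative.

```python
def unify(a, b):
    """Unify two feature structures (nested dicts or atomic values); return None on failure."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, b_val in b.items():
            if key in result:
                merged = unify(result[key], b_val)
                if merged is None:
                    return None          # conflicting values for the same feature
                result[key] = merged
            else:
                result[key] = b_val      # feature only in b: just add it
        return result
    return a if a == b else None         # atoms unify only if equal

slot = {"cat": "NP", "case": "acc"}      # an object slot from a verb's subcat frame
print(unify(slot, {"cat": "NP", "wh": True}))  # slot filled, features merged
print(unify(slot, {"cat": "PP"}))              # None: a PP cannot fill an NP slot
```

Because the result does not depend on the order in which constraints are combined, the same declarative patterns can in principle serve both a parser and a generator.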
Spoken Language Syntax
Speech comes in utterances rather than in written text.
Spoken language is looser than written language, with more pauses, ‘nonverbal
events’, and disfluencies such as er, uh, and um.
[Figure: sample spoken-language utterances from users interacting with ATIS]
Spoken Language Syntax, cont’d
The repair often has the same structure as the constituent
immediately before the interruption point.
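That observation suggests a crude heuristic for spotting repairs: at a filled pause, look for the longest stretch of tags just before it that is repeated just after it. The sketch below is our own simplification, not a published algorithm. It assumes the utterance is already POS-tagged, treats only er/uh/um as interruption points, and uses a hypothetical example utterance.

```python
FILLERS = {"uh", "um", "er"}

def find_repair(tagged, max_len=4):
    """Heuristic: at a filler, find the longest k where the k tags before it equal the k tags after it."""
    words = [w for w, _ in tagged]
    tags = [t for _, t in tagged]
    for i, w in enumerate(words):
        if w not in FILLERS:
            continue
        for k in range(max_len, 0, -1):
            if i - k < 0:
                continue
            if tags[i - k:i] == tags[i + 1:i + 1 + k]:
                # (reparandum, repair): same structure on both sides of the interruption point
                return words[i - k:i], words[i + 1:i + 1 + k]
    return None

utterance = [("any", "Det"), ("one-way", "Adj"), ("flights", "N"), ("uh", "UH"),
             ("one-way", "Adj"), ("fares", "N"), ("for", "Prep"),
             ("160", "Num"), ("dollars", "N")]
print(find_repair(utterance))  # (['one-way', 'flights'], ['one-way', 'fares'])
```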