Syntax: Structural Descriptions of Sentences

Why Study Syntax?
Syntax provides
• systematic rules for forming new sentences in a language.
• a way to verify whether a sentence is legitimate in a language.
• a step closer to the “meaning” of a sentence.
– “Who did what to whom” semantics
Applications
• Improving precision in search applications
– Yankees beat red sox
– Red sox beat yankees
• Paraphrasing
– John loves Mary = Mary is loved by John
• Information Extraction
– Fill in a form by extracting information from a document.
Structure of Words
What are words?
• Orthographic tokens separated by white space.
In some languages the distinction between words and sentences is less clear.
• Chinese, Japanese: no white space between words
– nowhitespace → no white space / no whites pace / now hit esp ace
• Turkish: a word can represent a complete “sentence”
– E.g.: uygarlastiramadiklarimizdanmissinizcasina
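The segmentation ambiguity in the “nowhitespace” example can be reproduced with a small sketch. It enumerates every way to split an unspaced string into words from a lexicon. The function name `segmentations` and the toy lexicon are assumptions made for illustration; a real segmenter would use a much larger lexicon and a scoring model.

```python
def segmentations(text, lexicon):
    """Return every way to split `text` into words found in `lexicon`."""
    if not text:
        return [[]]                      # one way to segment the empty string
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in lexicon:
            # Keep this prefix and segment the remainder recursively.
            for rest in segmentations(text[end:], lexicon):
                results.append([prefix] + rest)
    return results

# Toy lexicon (an assumption, chosen to cover the example string).
LEXICON = {"no", "now", "white", "whites", "space", "pace", "hit", "esp", "ace"}

for words in segmentations("nowhitespace", LEXICON):
    print(" ".join(words))
```

With this lexicon the string has three segmentations, so white space alone cannot be relied on to define a word in such languages.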
Morphology: the structure of words
• Basic elements: morphemes
• Morphological rules: how to combine morphemes.
Syntax: the structure of sentences
• Rules for ordering words in a sentence
• Elementary units: phrases and clauses
Morphology and Syntax
Interplay between syntax and morphology
• How much information does a language allow to be packed into a word, and how easy is it to unpack?
• More information → less rigid syntax → freer word order
• Hindi: “John likes Mary” – all six orders are possible, due to rich morphological information.
– John-nom Mary-acc likes
English expresses relations between words through word order.
Morphologically rich languages have freer word order.
• However, some parts have rigid word order.
– Noun groups in Hindi: “one yellow book”
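A rough sketch of why case marking frees up word order: if each noun carries its role in a suffix, the roles can be read off regardless of position. The `-nom` and `-acc` glosses follow the slide; the `roles` function and the use of English words in place of Hindi forms are simplifying assumptions.

```python
from itertools import permutations

def roles(tokens):
    """Read grammatical roles off case suffixes, ignoring word order."""
    found = {}
    for token in tokens:
        if token.endswith("-nom"):
            found["subject"] = token[:-4]   # strip the "-nom" gloss
        elif token.endswith("-acc"):
            found["object"] = token[:-4]    # strip the "-acc" gloss
        else:
            found["verb"] = token
    return found

# All orderings of the three glossed words.
orders = list(permutations(["John-nom", "Mary-acc", "likes"]))
for order in orders:
    print(" ".join(order), "->", roles(order))
```

Every ordering yields the same role assignment. In English, by contrast, swapping “John” and “Mary” changes who likes whom.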
Outline
Constituency
• How does this notion arise?
• Type of constituents
• Representation: Tree Structure
Formal device: Context-Free Grammars
• Derived tree and derivation tree
• Grammar Equivalence
– Strong and weak generative capacity
– Chomsky Normal Form
• Other Formal Frameworks (Tree-Adjoining Grammar)
Other topics in syntax
• Dependency
• Spoken language syntax
• Structural Priming
Constituency
Words are grouped into part-of-speech groups
• Similar morphological inflections
• Allows us to create new word forms (“blog”, “xerox”)
• Nouns, Verbs, Determiners, Adjectives etc.
Certain sequences of words in a sentence are grouped as constituents
• Distributionally similar behavior
• Cohesive units (move around in a sentence as a unit)
– In the morning I take a walk
– I take a walk in the morning
• Substrings are typed “Clause”, “Noun Phrase”, “Verb Phrase”, “Preposition Phrase” etc.
Constituency – contd.
Examples of constituents:
• Noun phrase:
– the dog, two big light blue vans
• Preposition phrase:
– in the box, under the bridge
• Clause:
– the dog bit the man, John thought the dog bit the man
The type of a constituent is derived from the “head word” of the constituent.
Constituent Structure
Decomposition of a sentence into its constituents.
Attaching constituents to each other to reflect relations among words: emergence of tree structure
• John saw the man with the telescope
• (S (NP John) saw (NP (NP the man) (PP with (NP the telescope))))
• (S (NP John) saw (NP the man) (PP with (NP the telescope)))
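The two structures for the telescope sentence can be inspected with a small bracket reader. This is a minimal sketch: the nested-list representation and the helper names `parse`, `words` and `attachment_site` are assumptions made for illustration, not a standard treebank API.

```python
def parse(bracketed):
    """Parse a bracketed string such as "(NP the man)" into nested lists."""
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token              # a bare word
        node = [tokens[pos]]          # the constituent label
        pos += 1
        while tokens[pos] != ")":
            node.append(read())
        pos += 1                      # skip the closing ")"
        return node

    tree = read()
    if pos != len(tokens):
        raise ValueError("unbalanced brackets")
    return tree


def words(tree):
    """Return the words covered by a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in words(child)]


def attachment_site(tree, label="PP", parent=None):
    """Return the label of the constituent that directly contains `label`."""
    if isinstance(tree, str):
        return None
    if tree[0] == label:
        return parent
    for child in tree[1:]:
        found = attachment_site(child, label, tree[0])
        if found:
            return found
    return None


noun_attach = parse("(S (NP John) saw (NP (NP the man) (PP with (NP the telescope))))")
verb_attach = parse("(S (NP John) saw (NP the man) (PP with (NP the telescope)))")

print(attachment_site(noun_attach))   # PP sits inside the object NP
print(attachment_site(verb_attach))   # PP sits directly under S
```

Both trees cover the same word string. Only the attachment of the PP differs, and that difference decides whether the man or the act of seeing involves the telescope.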
Select a sentence from a newspaper text and provide its constituent
structure.
Evidence of another constituent – verb phrase (“VP”)
• Substrings involving a verb move around and can be referred to as a unit.
– VP-fronting (and quickly clean the carpet he did!)
– VP-ellipsis (He cleaned the carpets quickly, and so did she)
– Can have adjuncts before and after VP, but not in VP (He often eats beans, *he eats often beans)
Relations among Words
Types of relations between words
• Arguments: subject, object, indirect object, prepositional object
• Adjuncts: temporal, locative, causal, manner, …
• Function Words
Subcategorization: List of arguments of a word (verb)
• with features about realization (POS, perhaps case, verb form etc.)
For English, the argument order: Subject-Object-IndirectObj
Example:
• like: NP-NP (“John likes Mary”), NP-VP(to-inf) (“John likes to watch movies”)
• think: NP-S (“John thought Mary was going to the party”)
• put: NP-NP-PP
Adjuncts are optional (typically modifiers of an action)
• John put the book on the table at 3pm yesterday
There are words with “demands” and words that fill the “demands”.
• Demands are typed (NP, VP, PP, S)
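One way to picture typed “demands” is a table of subcategorization frames that a candidate argument list must match. The sketch below encodes the three verbs from the examples. The dictionary layout, the function names, and the simplification that adjuncts always follow the arguments are assumptions made for illustration.

```python
# Frames list the argument types each verb demands, subject first.
SUBCAT = {
    "like":  [("NP", "NP"), ("NP", "VP[to-inf]")],
    "think": [("NP", "S")],
    "put":   [("NP", "NP", "PP")],
}

def satisfies(verb, arguments):
    """True if the typed arguments fill one of the verb's frames exactly."""
    return tuple(arguments) in SUBCAT.get(verb, [])

def split_adjuncts(verb, phrases):
    """Split (type, text) phrases into (arguments, adjuncts).

    Uses the longest frame that matches a prefix of the phrase types;
    anything left over is treated as an optional adjunct.
    Returns None if no frame fits.
    """
    types = tuple(t for t, _ in phrases)
    for frame in sorted(SUBCAT.get(verb, []), key=len, reverse=True):
        if types[:len(frame)] == frame:
            return phrases[:len(frame)], phrases[len(frame):]
    return None

sentence = [("NP", "John"), ("NP", "the book"), ("PP", "on the table"),
            ("PP", "at 3pm"), ("Adv", "yesterday")]
arguments, adjuncts = split_adjuncts("put", sentence)
print("arguments:", [text for _, text in arguments])
print("adjuncts: ", [text for _, text in adjuncts])
```

Dropping “on the table” leaves the frame of “put” unfilled, while dropping “at 3pm” or “yesterday” does not. That is the argument/adjunct distinction in miniature.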
English Syntax: A Sample
Sentence types:
• Declarative (John closed the door)
• Imperative (close the door!!)
• Yes-No-Question (can you close the door?)
• Wh-question (who closed the door? What did John close?)
Clause types:
• Infinitival (to read a book)
• Gerundive (reading of a book)
• Relative Clause (that has a green cover)
English Syntax: A Sample – contd.
Noun Phrase:
• Before the head noun:
– Pre-determiner Determiner Post-determiner (Adjective|Noun) Noun
• After the head noun (Modifiers)
– Preposition phrases
– Relative Clauses (the book that has only one sentence)
– Gerundive (the flight arriving after 10pm)
Auxiliary Verbs
• Modal (could, might, will, should…) < perfect (have) < progressive (be) < passive (be)
• “might have been destroyed”
Large wide-coverage grammars have been developed or are under development
• XTAG (www.cis.upenn.edu/~xtag), HPSG, LFG
Two Representations of Syntactic Structure
Phrase structure: shows the constituents and their types.
Dependency structure: relations between words without intervening structure.
[Figure: phrase-structure tree (node labels S, NP, DetP, Adv) and dependency tree (arc labels arg0, arg1, adj, fw) for “the boy reads a book slowly”]
Context-Free Grammars
String Rewriting Systems
• Transform one string to another (until termination)
G = (V, T, P, S)
where V: vocabulary of non-terminals
T: vocabulary of terminals
S: start symbol
P: set of productions of the form
α → β where α ∈ V and β ∈ (V ∪ T)*
Derivation: Rewrite a non-terminal using a production of the grammar until no non-terminals remain in the string.
• Start with “S”
Sample Context-Free Grammar, derivation and derived structure.
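As a concrete illustration of G = (V, T, P, S) and of a derivation, the sketch below runs a leftmost derivation over a toy grammar. The grammar rules, the symbol names (including `Vt` for the verb, chosen to avoid clashing with the set V) and the function `leftmost_derivation` are assumptions made for illustration; they are not the sample grammar from the original slide.

```python
# A toy grammar G = (V, T, P, S); the rules are illustrative assumptions.
V = {"S", "NP", "VP", "Det", "N", "Vt"}
T = {"the", "a", "boy", "book", "reads"}
P = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["Vt", "NP"]],
    "Det": [["the"], ["a"]],
    "N":   [["boy"], ["book"]],
    "Vt":  [["reads"]],
}
START = "S"

def leftmost_derivation(choices):
    """Rewrite the leftmost non-terminal at each step.

    `choices` gives, per step, the index of the production to apply.
    Returns the list of sentential forms, beginning with the start symbol.
    """
    form = [START]
    steps = [form]
    for choice in choices:
        # Position of the leftmost non-terminal in the current string.
        i = next(k for k, sym in enumerate(form) if sym in V)
        form = form[:i] + P[form[i]][choice] + form[i + 1:]
        steps.append(form)
    if any(sym in V for sym in form):
        raise ValueError("derivation incomplete: non-terminals remain")
    return steps

for form in leftmost_derivation([0, 0, 0, 0, 0, 0, 0, 1, 1]):
    print(" ".join(form))
```

The last line printed is the derived string. The sequence of rule applications is the derivation history, which is what a phrase-structure tree records.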
Two Representations
String rewriting system: we derive a string (= derived structure).
But the derivation history is represented by a phrase-structure tree (= derivation structure)!
Grammar Equivalence
• Different grammars can generate the same set of strings (weak equivalence)
• Different grammars can have the same set of derivation trees (strong equivalence)
• Strong equivalence implies weak equivalence
CFG Normal Forms:
• Chomsky Normal Form (α → β γ with β, γ ∈ V, or α → w with w ∈ T)
• Greibach Normal Form (α → w β with w ∈ T and β ∈ V*)
• Convert a grammar into CNF and GNF
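One step of a conversion to Chomsky Normal Form is binarization: a rule with more than two symbols on the right-hand side is replaced by a chain of binary rules. The sketch below shows only that step. A full conversion also has to deal with empty productions, unit productions and terminals inside long rules. The function name `binarize` and the fresh non-terminal names X1, X2, … are assumptions, and the names are assumed not to clash with existing symbols.

```python
def binarize(rules):
    """Replace every rule with more than two right-hand symbols by a chain
    of binary rules, introducing fresh non-terminals named X1, X2, ...

    `rules` is a list of (lhs, rhs) pairs, where rhs is a list of symbols.
    """
    result = []
    counter = 0
    for lhs, rhs in rules:
        while len(rhs) > 2:
            counter += 1
            fresh = f"X{counter}"
            # Keep the first symbol; push the remainder under a new symbol.
            result.append((lhs, [rhs[0], fresh]))
            lhs, rhs = fresh, rhs[1:]
        result.append((lhs, list(rhs)))
    return result

for lhs, rhs in binarize([("VP", ["V", "NP", "PP"])]):
    print(lhs, "->", " ".join(rhs))
```

The binarized grammar generates the same strings as the original but assigns them different trees, so the two grammars are weakly equivalent without being strongly equivalent.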
Penn Treebank (PTB)
Syntactically annotated corpus (phrase structure)
Contains 1 million words of Wall Street Journal sentences marked up with syntactic structure.
• Can be converted into a dependency treebank.
– need for head percolation tables
• Completely flat structure in NP
– brown bag lunch, pink-and-yellow child seat
• Represents a particular linguistic theory
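The role of a head percolation table in converting phrase structure to dependencies can be sketched as follows. For each phrase, the table names the child that carries the head, and every other child's head word becomes a dependent of it. The table entries, the tuple-based tree format and the function names are toy assumptions made for illustration; they are not the tables actually used with the PTB.

```python
# Toy head table: for each phrase label, child labels to try in priority order.
HEAD_TABLE = {
    "S":  ["VP"],
    "VP": ["VBZ", "VP"],
    "NP": ["NN", "NP"],
}

def is_preterminal(tree):
    """A pre-terminal is a (tag, word) pair."""
    return len(tree) == 2 and isinstance(tree[1], str)

def head_index(label, children):
    """Index of the head child according to HEAD_TABLE (rightmost if no match)."""
    for wanted in HEAD_TABLE.get(label, []):
        for i, child in enumerate(children):
            if child[0] == wanted:
                return i
    return len(children) - 1

def head_word(tree):
    """Follow head children down to a word."""
    if is_preterminal(tree):
        return tree[1]
    children = tree[1:]
    return head_word(children[head_index(tree[0], children)])

def dependencies(tree):
    """Return (head, dependent) word pairs for the whole tree."""
    if is_preterminal(tree):
        return []
    children = tree[1:]
    h = head_index(tree[0], children)
    deps = []
    for i, child in enumerate(children):
        if i != h:
            deps.append((head_word(tree), head_word(child)))
        deps.extend(dependencies(child))
    return deps

tree = ("S",
        ("NP", ("DT", "the"), ("NN", "boy")),
        ("VP", ("VBZ", "reads"),
               ("NP", ("DT", "a"), ("NN", "book"))))

for head, dependent in dependencies(tree):
    print(head, "->", dependent)
```

A flat NP such as “brown bag lunch” gives such a procedure no internal structure to work with, so every modifier ends up attached directly to the head noun.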
PropBank
• PTB with some grammatical relations made explicit
Unification
Mechanism needed to pass and check constraints.
Constraints, syntactic and semantic:
• Subject-verb agreement
– S → NP VP
– the boy reads / the boys read / * the boys reads
• Subject/Auxiliary inversion: (Yes-no-question)
– S → AuxVerb NP VP
– Do you have flights / * does you have flights
• Selectional restrictions:
– An apple reads a book (well-formed syntactically, odd semantically)
Need a mechanism to encode these constraints
• Refine the non-terminal set to encode these constraints.
• S → 3sgAux 3sgNP VP ; 3sgAux → does | has …
• S → Non3sgAux Non3sgNP VP ; Non3sgAux → do | have | can
• We need to split the NP rule into the 3sgNP and Non3sgNP.
• Size of the grammar grows;
• can we factor these constraints out of the structure of the rules?
Unification – contd.
Attribute value matrix:
boy: [Cat N, Number sg, Person 3]
boys: [Cat N, Number pl, Person 3]
read: [Cat V, Subj agr [Number pl] or [Number sg, Person 1|2]]
reads: [Cat V, Subj agr [Number sg, Person 3]]
Percolate Constraints
VP → V
VP.subj.agr.number = V.subj.agr.number
VP.subj.agr.person = V.subj.agr.person
Check Constraints
S → NP VP
NP.number = VP.subj.agr.number
NP.person = VP.subj.agr.person
The boy reads / * the boys reads / the boys read
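The agreement check can be sketched with a small unification routine over nested dictionaries. This is a simplified illustration, not a full unification grammar: the `unify` function, the `FEATURES` table and the feature names are assumptions, and “read” is reduced to its plural reading, since disjunctive values are not modelled here.

```python
def unify(a, b):
    """Unify two feature structures (nested dicts with atomic leaf values).

    Returns the merged structure, or None if the two structures clash.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for feature, value in b.items():
            if feature in merged:
                sub = unify(merged[feature], value)
                if sub is None:
                    return None          # clash somewhere below this feature
                merged[feature] = sub
            else:
                merged[feature] = value
        return merged
    return a if a == b else None         # atomic values must be identical

FEATURES = {
    "boy":   {"cat": "N", "number": "sg", "person": 3},
    "boys":  {"cat": "N", "number": "pl", "person": 3},
    "reads": {"cat": "V", "subj_agr": {"number": "sg", "person": 3}},
    # Simplification: only the plural reading of "read" is listed.
    "read":  {"cat": "V", "subj_agr": {"number": "pl"}},
}

def agrees(noun, verb):
    """Check S -> NP VP: the noun's number and person must unify with the
    subject agreement features percolated up from the verb."""
    n = FEATURES[noun]
    np_agr = {"number": n["number"], "person": n["person"]}
    return unify(np_agr, FEATURES[verb]["subj_agr"]) is not None

for noun, verb in [("boy", "reads"), ("boys", "reads"), ("boys", "read")]:
    print(f"the {noun} {verb}:", "ok" if agrees(noun, verb) else "*")
```

The grammar keeps a single S → NP VP rule. The agreement constraint lives in the feature structures instead of in a split set of non-terminals.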
Structural Priming
The structure of preceding sentences speeds up or slows down the reading of subsequent sentences.
• Dative alternation
– The woman gave her car to the church
– The woman gave the church her car
• One of these forms is primed depending on what the prime was
– V NP NP → gave the church her car
– V NP PP → gave her car to the church
Spoken Language Syntax
Not as “clean”; disfluency is rampant.
• Edits (restarts, repairs)
• Filled pauses
• Ungrammaticality
Sentence ≠ utterance.
“Clean up” the utterance before trying to understand it.