Syntax: Structural Descriptions of Sentences

Why Study Syntax?
Syntax
• provides systematic rules for forming new sentences in a language.
• can be used to verify whether a sentence is legitimate in a language.
• brings us a step closer to the “meaning” (semantics) of a sentence.
  – Who did what to whom
Applications
• Improving precision in search applications
  – Yankees beat Red Sox
  – Red Sox beat Yankees
• Paraphrasing
  – John loves Mary = Mary is loved by John
• Information Extraction
  – Fill in a form by extracting information from a document.

Structure of Words
What are words?
• Orthographic tokens separated by white space.
In some languages the distinction between words and sentences is less clear.
• Chinese, Japanese: no white space between words
  – nowhitespace: no white space / no whites pace / now hit esp ace
• Turkish: a word can represent a complete “sentence”
  – E.g.: uygarlastiramadiklarimizdanmissinizcasina
Morphology: the structure of words
• Basic elements: morphemes
• Morphological rules: how to combine morphemes.
Syntax: the structure of sentences
• Rules for ordering words in a sentence
• Elementary units: phrases and clauses

Morphology and Syntax
Interplay between syntax and morphology
• How much information does a language allow to be packed into a word, and how easy is it to unpack?
• More information in the word means less rigid syntax and freer word order.
• Hindi: “John likes Mary”
  – All six orders are possible, due to rich morphological information.
  – John-nom Mary-acc likes
English expresses relations between words through word order.
Morphologically rich languages have freer word order.
• However, some parts have rigid word order.
  – Noun groups in Hindi: “one yellow book”

Outline
Constituency
• How does this notion arise?
• Types of constituents
• Representation: Tree Structure
Formal device: Context-Free Grammars
• Derived tree and derivation tree
• Grammar Equivalence
  – Strong and weak generative capacity
  – Chomsky Normal Form
• Other Formal Frameworks (Tree-Adjoining Grammar)
Other topics in syntax
• Dependency
• Spoken language syntax
• Structural Priming

Constituency
Words are grouped into part-of-speech groups
• Similar morphological inflections
• Allows us to create new word forms (“blog”, “xerox”)
• Nouns, Verbs, Determiners, Adjectives etc.
Certain sequences of words in a sentence are grouped as constituents
• Distributionally similar behavior
• Cohesive units (move around in a sentence as a unit)
  – In the morning I take a walk
  – I take a walk in the morning
• Substrings are typed: “Clause”, “Noun Phrase”, “Verb Phrase”, “Preposition Phrase” etc.

Constituency – contd.
Examples of constituents:
• Noun phrase:
  – the dog, two big light blue vans
• Preposition phrase:
  – in the box, under the bridge
• Clause:
  – the dog bit the man, John thought the dog bit the man
The type of a constituent is derived from the “head word” of the constituent.

Constituent Structure
Decomposition of a sentence into its constituents.
Attaching constituents to each other to reflect relations among words: Emergence of Tree Structure
• John saw the man with the telescope
• (S (NP John) saw (NP (NP the man) (PP with (NP the telescope))))
• (S (NP John) saw (NP the man) (PP with (NP the telescope)))
Select a sentence from a newspaper text and provide its constituent structure.
Evidence of another constituent: the verb phrase (“VP”)
• Substrings involving a verb move around and can be referred to as a unit.
  – VP-fronting (and quickly clean the carpet he did!)
  – VP-ellipsis (He cleaned the carpets quickly, and so did she)
  – Can have adjuncts before and after VP, but not in VP (He often eats beans, *he eats often beans)

Relations among Words
Types of relations between words
• Arguments: subject, object, indirect object, prepositional object
• Adjuncts: temporal, locative, causal, manner, …
• Function Words
Subcategorization: list of arguments of a word (verb)
• with features about realization (POS, perhaps case, verb form etc.)
For English, the argument order: Subject-Object-IndirectObj
Example:
• like: NP-NP (“John likes Mary”), NP-VP(to-inf) (“John likes to watch movies”)
• think: NP-S (“John thought Mary was going to the party”)
• put: NP-NP-PP
Adjuncts are optional (typically modifiers of an action)
• John put the book on the table at 3pm yesterday
There are words with “demands” and words that fill the “demands”.
• Demands are typed (NP, VP, PP, S)

English Syntax: A Sample
Sentence types:
• Declarative (John closed the door)
• Imperative (close the door!!)
• Yes-No-Question (can you close the door?)
• Wh-question (who closed the door? What did John close?)
Clause types:
• Infinitival (to read a book)
• Gerundive (reading of a book)
• Relative Clause (that has a green cover)

English Syntax: A Sample – contd.
Noun Phrase:
• Before the head noun:
  – Pre-determiner Determiner Post-determiner (Adjective|Noun) Noun
• After the head noun (Modifiers)
  – Preposition phrases
  – Relative Clauses (the book that has only one sentence)
  – Gerundive (the flight arriving after 10pm)
Auxiliary Verbs
• Modal (could, might, will, should…) < perfect (have) < progressive (be) < passive (be)
• “might have been destroyed”
Large wide-coverage grammars have been developed or are under development
• XTAG (www.cis.upenn.edu/~xtag), HPSG, LFG

Two Representations of Syntactic Structure
Phrase structure: shows the constituents and their types.
Dependency structure: relations between words without intervening structure.
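The contrast between the two representations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sentence “the boy reads a book slowly”, the particular bracketing (including the VP node), the relation labels (arg0, arg1, adj, fw), and the helper names `leaves`, `constituents`, and `dependents` are all chosen here for the example and do not follow any particular treebank's annotation scheme.

```python
# Phrase structure: nested tuples (label, child, child, ...); leaves are words.
# The bracketing is an assumption made for illustration.
phrase_structure = (
    "S",
    ("NP", ("DetP", "the"), "boy"),
    ("VP", "reads", ("NP", ("DetP", "a"), "book"), ("Adv", "slowly")),
)


def leaves(tree):
    """Return the words of a phrase-structure tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:  # tree[0] is the constituent label
        words.extend(leaves(child))
    return words


def constituents(tree):
    """List (label, covered words) for every constituent in the tree."""
    if isinstance(tree, str):
        return []
    result = [(tree[0], " ".join(leaves(tree)))]
    for child in tree[1:]:
        result.extend(constituents(child))
    return result


# Dependency structure: (head, relation, dependent) triples between words,
# with no intermediate phrase nodes. Labels are illustrative assumptions.
dependencies = [
    ("reads", "arg0", "boy"),
    ("reads", "arg1", "book"),
    ("reads", "adj", "slowly"),
    ("boy", "fw", "the"),
    ("book", "fw", "a"),
]


def dependents(head):
    """Return (relation, dependent) pairs for one head word."""
    return [(rel, dep) for h, rel, dep in dependencies if h == head]
```

Calling `constituents(phrase_structure)` lists typed spans such as the NP covering “a book”, while `dependents("reads")` returns only word-to-word relations. Both structures cover the same six words; they differ in whether phrase-level nodes are made explicit.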
[Figure: a phrase-structure tree (nodes S, NP, DetP, Adv) and a dependency tree for “the boy reads a book slowly”. In the dependency tree, “reads” is the root, with “boy” (arg0), “book” (arg1), and “slowly” (adj) as dependents; “the” and “a” attach to their nouns as function words (fw).]

Context-Free Grammars
String Rewriting Systems
• Transform one string to another (until termination)
$G = (V, T, P, S)$ where
• $V$: vocabulary of non-terminals
• $T$: vocabulary of terminals
• $S$: start symbol
• $P$: set of productions of the form $\alpha \rightarrow \beta$, where $\alpha \in V$ and $\beta \in (V \cup T)^*$
Derivation: repeatedly rewrite a non-terminal using a production of the grammar until no non-terminals remain in the string.
• Start with “S”
Sample Context-Free Grammar, derivation and derived structure.

Two Representations
String rewriting system: we derive a string (= derived structure).
But the derivation history is represented by a phrase-structure tree (= derivation structure)!

Grammar Equivalence
• Different grammars can generate the same set of strings (weak equivalence)
• Different grammars can have the same set of derivation trees (strong equivalence)
• Strong equivalence implies weak equivalence
CFG Normal Forms:
• Chomsky Normal Form ($\alpha \rightarrow \beta\,\gamma$)
• Greibach Normal Form ($\alpha \rightarrow w\,\beta$)
• Convert a grammar into CNF and GNF

Penn Treebank (PTB)
Syntactically annotated corpus (phrase structure)
Contains 1 million words of Wall Street Journal sentences marked up with syntactic structure.
• Can be converted into a dependency treebank.
  – need for head percolation tables
• Completely flat structure in NP
  – brown bag lunch, pink-and-yellow child seat
• Represents a particular linguistic theory
PropBank
• PTB with some grammatical relations made explicit

Unification
Mechanism needed to pass and check constraints.
Constraints, syntactic and semantic:
• Subject-verb agreement
  – S → NP VP
  – the boy reads / the boys read / * the boys reads
• Subject/Auxiliary inversion (Yes-no-question):
  – S → AuxVerb NP VP
  – Do you have flights / * does you have flights
• Selectional restrictions:
  – An apple reads a book
Need a mechanism to encode these constraints
• Refine the non-terminal set to encode these constraints.
• S → 3sgAux 3sgNP VP; 3sgAux → does | has …
• S → Non3sgAux Non3sgNP VP; Non3sgAux → do | have | can
• We need to split the NP rule into 3sgNP and Non3sgNP.
• The size of the grammar grows.
• Can we factor these constraints out of the structure of the rules?

Unification – contd.
Attribute value matrix:
• boy: [Cat: N, Number: sg, Person: 3]
• boys: [Cat: N, Number: pl, Person: 3]
• read: [Cat: V, Subj agr: [Number: pl] or [Number: sg, Person: 1|2]]
• reads: [Cat: V, Subj agr: [Number: sg, Person: 3]]
Percolate constraints (VP → V):
• VP.number = V.subj.agr.number
• VP.person = V.subj.agr.person
Check constraints (S → NP VP):
• NP.number = VP.number
• NP.person = VP.person
The boy reads / * the boys reads / the boys read

Structural Priming
The structure of preceding sentences speeds up or slows down the reading of subsequent sentences.
• Dative alternation
  – The woman gave her car to the church
  – The woman gave the church her car
• One of these forms is primed, depending on what the prime was
  – V NP NP: gave the church her car
  – V NP PP: gave her car to the church

Spoken Language Syntax
Not as “clean”; disfluency is rampant.
• Edits (restarts, repairs)
• Filled pauses
• Ungrammaticality
Sentence utterance.
“Clean up” the utterance first before understanding it.
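
Returning to the Unification slides, the percolate-and-check idea can be sketched as unification over small feature structures. This is a minimal sketch under stated assumptions, not a full unification grammar: feature structures are flat Python dicts, a disjunctive value such as 1|2 is modelled as a set, a missing feature is treated as unconstrained, and the names `unify`, `NOUNS`, `VERBS`, and `agrees` are hypothetical.

```python
def unify(a, b):
    """Unify two flat feature structures (dict: feature -> set of allowed values).

    Returns the merged structure, or None if a shared feature has no value
    in common. A feature missing from one side is unconstrained there.
    """
    result = dict(a)
    for feature, values in b.items():
        if feature in result:
            common = result[feature] & values
            if not common:
                return None  # clash, e.g. number sg vs. pl
            result[feature] = common
        else:
            result[feature] = values
    return result


# Agreement features of the nouns.
NOUNS = {
    "boy": {"number": {"sg"}, "person": {3}},
    "boys": {"number": {"pl"}, "person": {3}},
}

# Each verb lists the alternative subject-agreement structures it accepts.
VERBS = {
    "reads": [{"number": {"sg"}, "person": {3}}],
    "read": [{"number": {"pl"}}, {"number": {"sg"}, "person": {1, 2}}],
}


def agrees(noun, verb):
    """Check S -> NP VP: the NP's features must unify with at least one
    subject-agreement structure percolated up from the verb."""
    return any(unify(NOUNS[noun], agr) is not None for agr in VERBS[verb])
```

With these entries, `agrees("boy", "reads")` and `agrees("boys", "read")` succeed, while `agrees("boys", "reads")` fails on the number feature. The grammar keeps a single S → NP VP rule, and the agreement constraint lives in the feature structures rather than in split non-terminals such as 3sgNP.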