Introduction to Syntax, with Part-of-Speech Tagging
Owen Rambow
September 17 & 19

Admin Stuff
• These slides are available at
  o http://www.cs.columbia.edu/~rambow/teaching.html
• For Eliza in the homework, you can use a tagger or chunker if you want – details at:
  o http://www.cs.columbia.edu/~ani/cs4705.html
• Special office hours (Ani): today after class, tomorrow at 10am in CEPSR 721

Statistical POS Tagging
• Want to choose the most likely string of tags (T), given the string of words (W)
• W = w_1, w_2, …, w_n
• T = t_1, t_2, …, t_n
• I.e., want argmax_T p(T | W)
• Problem: sparse data

Statistical POS Tagging (ctd)
• p(T|W) = p(T,W) / p(W) = p(W|T) p(T) / p(W)
• argmax_T p(T|W) = argmax_T p(W|T) p(T) / p(W) = argmax_T p(W|T) p(T)
  (p(W) does not depend on T, so it can be dropped from the argmax)

Statistical POS Tagging (ctd)
p(T) = p(t_1, t_2, …, t_{n-1}, t_n)
     = p(t_n | t_1, …, t_{n-1}) p(t_1, …, t_{n-1})
     = p(t_n | t_1, …, t_{n-1}) p(t_{n-1} | t_1, …, t_{n-2}) p(t_1, …, t_{n-2})
     = ∏_i p(t_i | t_1, …, t_{i-1})
     ≈ ∏_i p(t_i | t_{i-2}, t_{i-1})     ← trigram (n-gram) approximation

Statistical POS Tagging (ctd)
p(W|T) = p(w_1, w_2, …, w_n | t_1, t_2, …, t_n)
       = ∏_i p(w_i | w_1, …, w_{i-1}, t_1, t_2, …, t_n)
       ≈ ∏_i p(w_i | t_i)

Statistical POS Tagging (ctd)
argmax_T p(T|W) = argmax_T p(W|T) p(T)
                ≈ argmax_T ∏_i p(w_i | t_i) p(t_i | t_{i-2}, t_{i-1})
• Relatively easy to get data for parameter estimation (next slide)
• But: need smoothing for unseen words
• Easy to determine the argmax (Viterbi algorithm, in time linear in sentence length)

Probability Estimation for Trigram POS Tagging
• Maximum-likelihood estimation:
  o p'(w_i | t_i) = c(w_i, t_i) / c(t_i)
  o p'(t_i | t_{i-2}, t_{i-1}) = c(t_{i-2}, t_{i-1}, t_i) / c(t_{i-2}, t_{i-1})
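The estimation and decoding steps above fit in a short program. Below is a minimal sketch, assuming a toy training corpus given as lists of (word, tag) pairs; the names train and viterbi and the <s> padding convention are illustrative choices, not from the lecture, and the smoothing the slides call for is omitted, so an unseen word zeroes out every path.

```python
from collections import defaultdict

START = "<s>"  # hypothetical padding tag so the first words have a trigram history

def train(tagged_sentences):
    """MLE from the slide: p'(w|t) = c(w,t)/c(t), p'(t|u,v) = c(u,v,t)/c(u,v)."""
    emit_c, tag_c = defaultdict(int), defaultdict(int)
    tri_c, bi_c = defaultdict(int), defaultdict(int)
    for sent in tagged_sentences:                 # sent = [(word, tag), ...]
        tags = [START, START] + [t for _, t in sent]
        for w, t in sent:
            emit_c[(w, t)] += 1
            tag_c[t] += 1
        for u, v, t in zip(tags, tags[1:], tags[2:]):
            tri_c[(u, v, t)] += 1
            bi_c[(u, v)] += 1
    emit = {(w, t): n / tag_c[t] for (w, t), n in emit_c.items()}
    trans = {(u, v, t): n / bi_c[(u, v)] for (u, v, t), n in tri_c.items()}
    return emit, trans, sorted(tag_c)

def viterbi(words, emit, trans, tagset):
    """argmax_T of prod_i p(w_i|t_i) p(t_i|t_{i-2},t_{i-1}); linear in len(words)."""
    pi = {(START, START): 1.0}   # pi[(u, v)] = best score of a path ending in u, v
    back = []
    for w in words:
        new_pi, bp = {}, {}
        for (u, v), score in pi.items():
            for t in tagset:
                # no smoothing: an unseen (word, tag) or tag trigram scores 0
                q = score * trans.get((u, v, t), 0.0) * emit.get((w, t), 0.0)
                if q > new_pi.get((v, t), 0.0):
                    new_pi[(v, t)] = q
                    bp[(v, t)] = u          # remember the tag two positions back
        back.append(bp)
        pi = new_pi
    tags = list(max(pi, key=pi.get))        # best final tag pair
    for i in range(len(words) - 1, 1, -1):  # follow back-pointers right to left
        tags.insert(0, back[i][(tags[0], tags[1])])
    return tags[-len(words):]               # strip <s> padding on short inputs
```

Representing the decoder state as the tag pair (t_{i-1}, t_i) turns the trigram dependency into a first-order Markov chain, which is what makes the runtime linear in sentence length, as claimed above.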
Statistical POS Tagging
• Method common to many tasks in speech & NLP
• "Noisy Channel Model", Hidden Markov Model

Back to Syntax
• (((the/Det) boy/N) likes/V ((a/Det) girl/N))
[Phrase-structure tree: (S (NP (DetP the) boy) likes (NP (DetP a) girl)); nonterminal symbols = constituents, terminal symbols = words]

Phrase Structure and Dependency Structure
[Phrase-structure tree (S (NP (DetP the) boy) likes (NP (DetP a) girl)) side by side with the dependency tree: likes/V dominates boy/N and girl/N; boy/N dominates the/Det; girl/N dominates a/Det]

Types of Dependency
[Dependency tree with labeled arcs: likes/V –Subj→ boy/N, –Obj→ girl/N, –Adj(unct)→ sometimes/Adv; boy/N –Fw→ the/Det, –Adj→ small/Adj; small/Adj –Adj→ very/Adv; girl/N –Fw→ a/Det]

Grammatical Relations
• Types of relations between words
  o Arguments: subject, object, indirect object, prepositional object
  o Adjuncts: temporal, locative, causal, manner, …
  o Function words

Subcategorization
• List of the arguments of a word (typically, a verb), with features about their realization (POS, perhaps case, verb form, etc.)
• In canonical order: Subject-Object-IndObj
• Example:
  o like: N-N, N-V(to-inf)
  o see: N, N-N, N-N-V(inf)
• Note: J&M talk about subcategorization only within the VP

Where is the VP?
[Two candidate trees: flat (S (NP (DetP the) boy) likes (NP (DetP a) girl)) vs. (S (NP (DetP the) boy) (VP likes (NP (DetP a) girl)))]

Where is the VP?
• The existence of VP is a linguistic (empirical) claim, not a methodological claim
• Semantic evidence???
• Syntactic evidence:
  o VP-fronting (and quickly clean the carpet he did!)
  o VP-ellipsis (He cleaned the carpets quickly, and so did she)
  o Can have adjuncts before and after the VP, but not inside it (He often eats beans, *He eats often beans)
• Note: in all right-branching structures, the issue is different again

Penn Treebank, Again
• Syntactically annotated corpus (phrase structure)
• The PTB is not naturally occurring data!
• Represents a particular linguistic theory (but a fairly "vanilla" one)
• Particularities:
  o Very indirect representation of grammatical relations (need for head percolation tables)
  o Completely flat structure in the NP (brown bag lunch, pink-and-yellow child seat)
  o Has flat Ss, flat VPs

Context-Free Grammars
• Defined in formal language theory (comp sci)
• Terminals, nonterminals, start symbol, rules
• String-rewriting system
• Start with the start symbol, rewrite using the rules, done when only terminals are left

CFG: Example
• Rules:
  o S → NP VP
  o VP → V NP
  o NP → Det N | AdjP NP
  o AdjP → Adj | Adv AdjP
  o N → boy | girl
  o V → sees | likes
  o Adj → big | small
  o Adv → very
  o Det → a | the
• Example string: the very small boy likes a girl

Derivations of CFGs
• String-rewriting system: we derive a string (= derived structure)
• But the derivation history is represented by a phrase-structure tree (= derivation structure)!

Grammar Equivalence and Normal Form
• Can have different grammars that generate the same set of strings (weak equivalence)
• Can have different grammars that have the same set of derivation trees (strong equivalence)

Nobody Uses CFGs Only (Except Intro NLP Courses)
• All major syntactic theories (Chomsky, LFG, HPSG, TAG-based theories) represent both phrase structure and dependency, in one way or another
• All successful parsers currently use statistics about phrase structure and about dependency

Massive Ambiguity of Syntax
• For a standard sentence, and a grammar with wide coverage, there are 1000s of derivations!
• Example:
  o The large head master told the man that he gave money and shares in a letter on Wednesday

Some Syntactic Constructions
• Wh-movement
• Control
• Raising
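To connect the CFG slides back to code, here is a minimal sketch of the string-rewriting view: a naive top-down recognizer for the toy grammar on the CFG: Example slide above. The dict encoding and the names spans and recognize are illustrative choices, not from the lecture; note that the rules as transcribed never combine Det with an AdjP, so the demonstration uses a shorter sentence than the slide's example string.

```python
# Toy grammar from the "CFG: Example" slide, encoded as a dict mapping each
# nonterminal to its alternative right-hand sides.
GRAMMAR = {
    "S":    [["NP", "VP"]],
    "VP":   [["V", "NP"]],
    "NP":   [["Det", "N"], ["AdjP", "NP"]],
    "AdjP": [["Adj"], ["Adv", "AdjP"]],
    "N":    [["boy"], ["girl"]],
    "V":    [["sees"], ["likes"]],
    "Adj":  [["big"], ["small"]],
    "Adv":  [["very"]],
    "Det":  [["a"], ["the"]],
}

def spans(symbol, words, i):
    """All end positions j such that symbol rewrites to words[i:j]."""
    if symbol not in GRAMMAR:  # terminal: must match the next word
        return {i + 1} if i < len(words) and words[i] == symbol else set()
    ends = set()
    for rhs in GRAMMAR[symbol]:
        frontier = {i}
        for sym in rhs:        # rewrite the right-hand side left to right
            frontier = {j for k in frontier for j in spans(sym, words, k)}
        ends |= frontier
    return ends

def recognize(sentence):
    """True iff the start symbol S derives the whole sentence."""
    words = sentence.split()
    return len(words) in spans("S", words, 0)

print(recognize("the boy likes a girl"))   # True
print(recognize("boy the likes a girl"))   # False
```

Each call to spans rewrites one symbol, so a successful recognition traces exactly the kind of derivation described on the Derivations of CFGs slide, with the recursion tree playing the role of the phrase-structure tree.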