CS 4705 Final Review
CS4705
Julia Hirschberg

Format and Coverage
• Covers only material from <date> through <date> (i.e. beginning with Probabilistic Parsing)
• Same format as midterm:
  – Short answers: 2-3 sentences
  – True/False: for false statements, provide a true correction that is not just the negation of the false statement, e.g.
  – Good answer:
    • The exam is on Dec 14. FALSE! The exam is on Dec 16.
  – Bad answer:
    • The exam is on Dec 14. FALSE! The exam is not on Dec 14.
• Exercises
• Short essays: 2 essays, 3-5 paragraphs each
• The final will be only slightly longer than the midterm, although you will have the full 3h to complete it.

Probabilistic Parsing
• Problems with CFGs:
  – Rules unordered, many possible parses
• Solutions:
  – Weight the rules by their probabilities
  – But rules aren’t sensitive to lexical items or subcategorization frames
  – Add headwords to trees
  – Add subcategorization probabilities
  – Add complement/adjunct distinction
  – Etc.

Semantics
• Meaning Representations
  – Predicate/argument structure and FOPC
      ∃x, y {Having(x) ∧ Haver(S, x) ∧ HadThing(y, x) ∧ Car(y)}
  – Problems with mapping to NL (e.g. and ^)
• Frame semantics
      Having
        Haver: S
        HadThing: Car
  – Problems with reasoning from representation

Subcategorization Frames and Thematic Roles
• What patterns of arguments can different verbs take?
  – NP likes NP
  – NP likes Inf-VP
  – NP likes NP Inf-VP
• What roles can arguments take?
  – Agent, Patient, Theme (The ice melted), Experiencer (Bill likes pizza), Stimulus (Bill likes pizza), Goal (Bill ran to Copley Square), Recipient (Bill gave the book to Mary), Instrument (Bill ate the burrito with a plastic spork), Location (Bill sits under the tree on Wednesdays)

Selectional Restrictions
George assassinated the senator.
?The spider assassinated the fly.
*Cain assassinated Abel.
George broke the bank.
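To make "weight the rules by their probabilities" from the Probabilistic Parsing slide concrete, here is a minimal sketch of scoring two competing parses with a toy PCFG. The grammar, the rule probabilities, and the helper name `tree_prob` are all invented for illustration (they are not from the course materials), and multi-word leaves such as "the burrito" are a simplification to keep the trees short.

```python
# Hypothetical rule probabilities for a toy PCFG (illustrative numbers only).
# For each left-hand side, the probabilities of its expansions sum to 1.
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.6,
    ("VP", ("V", "NP", "PP")): 0.4,
    ("NP", ("NP", "PP")): 0.2,
    ("NP", ("Bill",)): 0.3,
    ("NP", ("the burrito",)): 0.3,
    ("NP", ("a spork",)): 0.2,
    ("PP", ("P", "NP")): 1.0,
    ("V", ("ate",)): 1.0,
    ("P", ("with",)): 1.0,
}


def tree_prob(tree):
    """Probability of a parse: the product of the probabilities of its rules.

    A tree is a tuple (label, child, ...); a child is a subtree or a leaf string.
    """
    label, *children = tree
    # The right-hand side is the sequence of child labels (or leaf words).
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = RULE_PROB[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            p *= tree_prob(child)
    return p


# Two parses of "Bill ate the burrito with a spork".
pp = ("PP", ("P", "with"), ("NP", "a spork"))

# PP attached to the verb phrase (instrument reading).
vp_attach = ("S", ("NP", "Bill"),
             ("VP", ("V", "ate"), ("NP", "the burrito"), pp))

# PP attached to the object noun phrase.
np_attach = ("S", ("NP", "Bill"),
             ("VP", ("V", "ate"), ("NP", ("NP", "the burrito"), pp)))

best = max([vp_attach, np_attach], key=tree_prob)
```

With these made-up numbers the VP attachment scores higher, and it would score higher for any verb and any noun, because the rule probabilities never look at the words. That is the limitation the slide notes ("rules aren't sensitive to lexical items or subcategorization frames") and the motivation for adding headwords and subcategorization probabilities.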
Lexical Semantics
• Lexemes
• Lexicon
• WordNet: synsets
• FrameNet: subcategorization frames/verb semantics

Word Relations
• Types of word relations
  – Homonymy: bank/bank
  – Homophones: red/read
  – Homographs: bass/bass
  – Polysemy: bank/sperm bank
  – Synonymy: big/large
  – Hyponym/hypernym: poodle/dog
  – Metonymy: (printing press)/the press
  – Meronymy: (wheel)/car
  – Metaphor: Nothing scares Google.

Word Sense Disambiguation
Time flies like an arrow.
• Tasks: all-words vs. lexical sample
• Techniques:
  – Supervised, semi-supervised bootstrapping, unsupervised
  – Corpora needed
  – Features that are useful
  – Competitions and Evaluation methods
• Specific approaches:
  – Naïve Bayes, Decision Lists, Dictionary-based, Selectional Restrictions

Discourse Structure and Coherence
• Topic segmentation
  – Useful Features
  – Hearst’s TextTiling – how does it work?
  – Supervised methods – how do we evaluate?
• Coherence relations
  – Hobbs’
  – Rhetorical Structure Theory – what are its problems?

Reference Terminology
• Referring expressions
• Discourse referents
• Anaphora and cataphora
• Coreference
• Antecedents
• Pronouns
• One-anaphora
• Definite and indefinite NPs
• Anaphoric chains

Constraints on Anaphoric Reference
• Salience
• Recency of mention: rule of 2 sentences
• Discourse structure
• Agreement
• Grammatical function
• Repeated mention
• Parallel construction
• Verb semantics/thematic roles
• Pragmatics

Algorithms for Coreference Resolution
• Lappin & Leass
• Hobbs
• Centering Theory
• Supervised approaches
• Evaluation

Information Extraction
• Template-based IE
  – Named Entity Tagging
  – Sequence-based relation tagging: supervised and bootstrapping
  – IE for Question Answering, e.g.
biographical information (Biadsy’s ‘bouncing’ between Wikipedia and Google)

Information Retrieval
• Vector-Space model
  – Cosine similarity
  – TF/IDF weighting
• NIST competition retrieval tasks
• Techniques for improvement
• Metrics
  – Precision, recall, F-measure

Question Answering
• Factoid questions
• Useful Features
• Answer typing
• UT Dallas System

Summarization
• Types and approaches to summarization
  – Indicative vs. informative
  – Generative vs. extractive
  – Single vs. multi-document
  – Generic vs. user-focused
• Useful features
• Evaluation methods
• Newsblaster – how does it work?
  – Multi-document
  – Sentence fusion and ordering
  – Topic tracking

MT
• Multilingual challenges
  – Orthography, Lexical ambiguity, morphology, syntax
• MT Approaches:
  – The Pyramid
  – Statistical vs. Rule-based vs. Hybrid
• Evaluation metrics
  – Human vs. BLEU score
  – Criteria: fluency vs. accuracy

Dialogue
• Turns and Turn-taking
• Speech Acts and Dialogue Acts
• Grounding
• Intentional Structure: Centering
• Pragmatics
  – Presupposition
  – Conventional Implicature
  – Conversational Implicature

The Final
• Dec. 16, MUDD 535, 1:10-4pm
• Good luck!
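Worked reference for the Information Retrieval slide: a minimal sketch of TF/IDF weighting, cosine similarity, and precision/recall/F-measure in plain Python. It assumes one common TF/IDF variant (raw term frequency times log(N/df)); other weightings exist, and the course may have used a different one. The function names and the three tiny "documents" are made up for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: tf * idf} vector.

    Assumed variant: tf = raw count, idf = log(N / document frequency).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is zero)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and balanced F-measure (F1)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy collection (hypothetical): two "financial" documents and one "river" one.
docs = [
    ["bank", "loan", "money"],
    ["river", "bank", "water"],
    ["loan", "money", "interest"],
]
v0, v1, v2 = tfidf_vectors(docs)

sim_finance = cosine(v0, v2)  # shares "loan" and "money"
sim_river = cosine(v0, v1)    # shares only "bank"

# A system returns documents 1-4; documents 2, 4, 5 are actually relevant.
p, r, f = precision_recall_f1([1, 2, 3, 4], [2, 4, 5])
```

Here the two finance documents come out more similar to each other than to the river document, even though "bank" appears in both senses. The metrics example gives precision 2/4, recall 2/3, and F-measure 4/7.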