Parsing & Parsing Speech
Benjamin Lambert
10/23/09
"MLSP" Group

Outline
• What is parsing? Why parse?
• Review: Formal Language Theory
  – History
• Parsing
  – CYK parsing (example)
  – LR parsing (concept)
  – GLR parsing (concept)
• Parsing speech
  – Open domain: GLR*
  – Restricted domain: semantic grammars, and PHOENIX
• Open research areas & additional topics

What is parsing?
"Parsing is the process of structuring a linear representation in accordance with a given grammar. … The linear representation may be a sentence, a computer program, a knitting pattern, a sequence of geological strata, a piece of music, actions in ritual behaviors, in short any linear sequence in which the preceding elements in some way restrict the next element." (From "Parsing Techniques," Grune and Jacobs, 1990)

"Structure a linear representation"
• "The dog, that I saw, was fast" (the linear representation)
• The dog, that I saw, was fast (the same sentence with its structure made explicit, e.g. as a parse tree)

Why parse speech? To add constraints to the model
• Specifically, parsing can help us identify:
  – Grammatical errors:
    • The dog, that I saw, was fast
    • *The dogs, that I saw, was fast
  – Semantic errors:
    • The pulp will be made into newsprint
    • *The Pope will be made into newsprint
• (These are all long-distance dependencies, too distant for an n-gram language model.)

Why parse speech? To extract information from the speech (e.g. in a dialog system)
Linear representation: "I'd like to fly from Boston to, um, Pittsburgh on Saturday on US Airways"
Structured representation:
  FLIGHT frame
    Origin        Boston
    Destination   Pittsburgh
    Date          Saturday
    Airline       US Airways
    …             …

Outline (repeated)

What is a (formal) language?
"A language is a 'set' of sentences, and each sentence is a 'sequence' of 'symbols'… that is all there is: no meaning, no structure, either a sentence belongs to a language or it does not." ("Parsing Techniques," Grune)
• A linguist would disagree
• We'll use this definition for formal language theory only

Example formal languages:
• Binary numbers:
  – 0, 1, 10, 11, 100, …
• Binary numbers with an odd number of ones:
  – 1, 111, 1000, 1011, …
  – *11, *101, *1111, …
• n zeros followed by n ones (0^n 1^n):
  – 01, 0011, 000111, 00001111, …
  – *0, *1, *100, …
• Grammatically correct English:
  – "The pope will be made into newsprint", …
  – *"The pope will are made into newsprint", …
• Semantically correct English (semantic validity determined by some world model):
  – "The pulp will be made into newsprint", …
  – *"The Pope will be made into newsprint", …

DFAs
(All diagrams of DFAs, NFAs, and PDAs are from the Sipser book.)

NFAs (and equivalent DFAs)

PDA for {0^n 1^n | n ≥ 0}

CFG for {0^n 1^n | n ≥ 0}
• S → A
• A → 0A1
• A → ε
• Any CFG can be converted to Chomsky normal form, in which every rule has the form A → BC or A → a

Non-deterministic PDA for {a^i b^j c^k | i, j, k ≥ 0 and i = j or i = k}

Review: Formal Language Theory

Language class    | Formalism                  | "Machine"                               | Chomsky type | Example
Regular           | Regular expressions        | DFA / NFA                               | 3            | 01*
Context-free      | Context-free grammars      | (Non-deterministic) push-down automaton | 2            | 0^n 1^n; programming languages; (natural languages?)
Context-sensitive | Context-sensitive grammars | Linear-bounded automaton                | 1            | 0^n 1^n 2^n; (natural languages, mildly)
Unrestricted      | ? (Perl?)                  | Turing machine?                         | 0            | ?
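To make the PDA and CFG for {0^n 1^n} above concrete, here is a minimal Python sketch (not part of the original slides) of a stack-based recognizer for that language; the function name and the stack symbol are purely illustrative.

```python
def accepts_0n1n(s: str) -> bool:
    """PDA-style recognizer for {0^n 1^n | n >= 0}.

    Push one stack symbol per leading '0', then pop one per '1'.
    Accept iff the input is 0s followed by 1s and the stack empties exactly.
    """
    stack = []
    i = 0
    # Phase 1: read the 0s, pushing one symbol per 0.
    while i < len(s) and s[i] == "0":
        stack.append("A")
        i += 1
    # Phase 2: read the 1s, popping one symbol per 1.
    while i < len(s) and s[i] == "1":
        if not stack:            # more 1s than 0s
            return False
        stack.pop()
        i += 1
    # Accept only if the whole string was consumed and the counts matched.
    return i == len(s) and not stack

assert accepts_0n1n("") and accepts_0n1n("01") and accepts_0n1n("000111")
assert not accepts_0n1n("0") and not accepts_0n1n("10") and not accepts_0n1n("00111")
```

The stack is exactly what a finite automaton lacks: it is what lets the 1s be checked against the 0s, which is why this language sits one level above the regular languages in the table above.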
Outline (repeated)

History
(See the chapter summaries in Hopcroft et al.)
1931     Kurt Gödel                        Incompleteness theorem (see Nagel, 2001)
1936     Alan Turing                       Turing machines & undecidability of unrestricted languages
1936     Church, Kleene, Post              Computability
1955/56  Huffman, Mealy, Moore             DFAs
1956     Shannon and McCarthy              NFAs
1956     S.C. Kleene                       Regular expressions
1956     Chomsky                           Context-free & context-sensitive grammars (Chomsky, 1957)
1959/60  Backus and Naur                   CFGs for Fortran & Algol (respectively)
1961/63  Oettinger / Schützenberger        Push-down automata
1963     P.C. Fischer                      Deterministic PDAs
1965     D.E. Knuth                        LR(k) grammars
1967     D.H. Younger (and Cocke, Kasami)  CYK parsing algorithm
1967     Charles Fillmore                  "Case" theory of linguistics (Fillmore, 1967)
1971/72  S.A. Cook & R.M. Karp             NP-completeness

Outline (repeated)

Parsing Context-Free Grammars
• We want to retrieve the structure, not just accept/reject
• Parsing algorithms:
  – CYK parsing algorithm (not PDA-based) (1967); example below
  – LR parsing (Knuth, 1965); most like a PDA
  – GLR parsing: LR parsing that allows ambiguity by simulating a non-deterministic PDA (Tomita, 1984)

Parsing CFGs: the goal
Given a grammar:
  S → NP VP
  NP → (Det) N
  VP → V (NP)
  N → pope
  V → ran
  Det → the
and an input string, "The pope ran.", find the structure that the grammar assigns to it.

How to parse?
• CYK (Cocke–Younger–Kasami, ~1965):
  – Convert the grammar to CNF (Chomsky normal form)
  – Really simple algorithm: O(n^3)
  – Bottom-up, so we apply the grammar rules in reverse ("substitution" in reverse)
• In practice:
  – We don't want to convert to CNF
  – We can do much better in the average case
  – Other algorithms (Earley, chart parsing, LR parsing) are similar, but tedious to go through

CYK algorithm
• Bottom-up
• Chart-based
• (A small Python sketch of CYK appears at the end of this section, after the LR parsing tables.)

CYK algorithm
(From Alon Lavie's lecture slides on CYK parsing, from CMU's 11-711 course.)

CYK Algorithm Example
(Chart for the input "b a a b a": columns are start positions 1–5, rows are substring lengths 1–5.)

Running time
• CYK: O(n^3)
• Most reasonable algorithms, worst case: O(n^3)
  – (CYK isn't "reasonable")
• Current fastest: O(n^2.376), by reduction to matrix multiplication
• (Some grammars can be parsed much faster with other algorithms, e.g. LR parsing in linear time.)

Outline (repeated)

LR Parsing
• Shift-reduce parsing
• Simulates a push-down automaton
• Developed by Knuth in the mid-1960s for programming-language compilers
• Only works for (mostly) unambiguous grammars (i.e. LR(k) grammars)
• Very fast: linear-time parsing

A Simple Arithmetic Grammar
(From Grune and Jacobs.)

Non-deterministic LR Parsing Table
(From Grune and Jacobs.)

Deterministic LR Parsing Table
(From Grune and Jacobs.)
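Returning to the CYK slides above: the following is a minimal, illustrative Python sketch of CYK recognition for a hand-converted CNF version of the toy "pope" grammar. The CNF conversion and the table layout are my own, not from the slides; a real parser would also keep back-pointers in each cell so the parse tree can be recovered, while this sketch only accepts or rejects. The nested loops over span length, start position, and split point are where the O(n^3) running time comes from.

```python
from itertools import product

# A CNF version of the toy grammar from the slides (unit rules folded in);
# this particular conversion and these names are illustrative only.
binary_rules = {            # A -> B C
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
lexical_rules = {           # A -> word
    "the": {"Det"},
    "pope": {"N", "NP"},
    "ran": {"V", "VP"},
}

def cyk_recognize(words, start="S"):
    """CYK recognition: chart[i][j] holds the nonterminals deriving words[i:j]."""
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                    # length-1 spans
        chart[i][i + 1] = set(lexical_rules.get(w, ()))
    for length in range(2, n + 1):                   # longer spans, bottom-up
        for i in range(0, n - length + 1):
            j = i + length
            for k in range(i + 1, j):                # split point
                for B, C in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= binary_rules.get((B, C), set())
    return start in chart[0][n]

print(cyk_recognize("the pope ran".split()))    # True
print(cyk_recognize("pope the ran".split()))    # False
```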
Outline (repeated)

GLR Parsing
• Just like LR parsing, but allows ambiguity, simulating non-determinism where the ambiguity arises
• Masaru Tomita (1984)
  – Co-founder of the LTI

GLR Parsing
(From Tomita's 1985 IJCAI paper.)

GLR Parse Forest
(From Tomita's 1985 IJCAI paper.)

Outline (repeated)

Challenges in parsing speech
• Disfluencies:
  – Uh, um
  – Interruptions/restarts ("I want to drive… I want to fly to Pittsburgh"; "Boston, no I mean, Maui")
• Incomplete sentences: "from Boston to Pittsburgh"
• Non-sentences: "Hello."
• Ellipsis: "I want to fly from Boston <silence> Pittsburgh"
• Segments with poor acoustics/LM: "I want to fly from Boston purple monkey dishwasher Pittsburgh"
• No sentence boundaries (<s>, </s>): "I want to fly to Pittsburgh she wants to fly to Maui"
• Two example approaches:
  – GLR*
  – PHOENIX

GLR*
• Alon Lavie (1993–1996)
  – His PhD thesis at CMU, under Tomita
• Conceptually the same as GLR, plus:
  – Can parse multiple simultaneous trees, i.e. more than one complete "S" tree
  – Can "skip" words in the input
• These create lots of additional ambiguity, so GLR* adds heuristics

How to skip filler words?
(From Grune and Jacobs.)

GLR* parsing of speech
• Allows skips, allows multiple S-nodes

GLR* (1993)
• Multiple S's and skip edges blow up the search space, so GLR* does a beam search with heuristics:
  – Number of words skipped
  – Fragmentation of the parse (number of S-nodes)

How can we use GLR* in ASR?
• Add parsability constraints
• Search through the n-best list looking for hypotheses that parse (a small code sketch of this idea appears at the end of this section):
  1. THOUGH THE GAME HAD AROUND THE SCIENCE AND THE NINETEEN FIFTIES IT NEVER REGAINED THE POPULARITY OF ITS GOLD MANAGED [Sphinx: -29494370] [WER: .4211]
  2. THOUGH THE GAME HAD AROUND THE SCIENCE AND THE NINETEEN FIFTIES IT NEVER REGAINED THE POPULARITY OF ITS GOAL NATURE [Sphinx: -29557043] [WER: .4211]
  3. THOUGH THE GAME HAD EVER RENAISSANCE AND THE NINETEEN FIFTIES IT NEVER REGAINED THE POPULARITY OF ITS GOLD MANAGED [Sphinx: -29571010] [WER: .2105]
  …
  N?. THOUGH THE GAME HAD A RENAISSANCE IN THE NINETEEN FIFTIES IT NEVER REGAINED THE POPULARITY OF ITS GOLDEN AGE
• Examples from the WSJ dataset

GLR*: pros and cons
• Pros:
  – *General*, open domain
• Cons:
  – Still imposes pretty strict grammatical constraints

Outline
• What is parsing? Why parse?
• Review: Formal Language Theory
  – History
• Parsing
  – CYK parsing
  – LR parsing (concept)
  – GLR parsing (concept)
• Parsing speech
  – Open domain: GLR*
  – Restricted domain: semantic grammars or PHOENIX
• Alternatives/extensions
• How can we use these?

Parsing in restricted-domain ASR
• Two options:
  – Make the grammar extremely specific (a "semantic grammar")
  – Or forget about trying to do a "complete" parse, and just look for the information we know we need (information extraction): PHOENIX
• Most (all?) dialog systems use one or the other, or both (see McTear, 2004)
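This is not from the slides, but a minimal sketch of the "parsability constraint" idea from the GLR*-in-ASR slide above: walk the recognizer's n-best list in score order and keep the first hypothesis the grammar accepts, falling back to the top hypothesis when nothing parses. The function names are illustrative, and `parses` stands in for whatever robust parser is available (GLR*, PHOENIX, the CYK sketch above, …).

```python
from typing import Callable, List, Tuple

def pick_with_parsability(nbest: List[Tuple[str, float]],
                          parses: Callable[[str], bool]) -> str:
    """Pick a hypothesis from an ASR n-best list using a parser as a filter.

    nbest: (hypothesis, recognizer score) pairs, best-scoring first.
    parses: returns True if the grammar accepts the hypothesis.
    Falls back to the top recognizer hypothesis if nothing parses.
    """
    for hyp, _score in nbest:          # list is already sorted by recognizer score
        if parses(hyp):
            return hyp
    return nbest[0][0]                 # nothing parsed: trust the recognizer

# Toy usage with a stand-in "parser" that just checks a vocabulary constraint.
toy_nbest = [("though the game had around the science", -29494370.0),
             ("though the game had a renaissance", -29571010.0)]
toy_parses = lambda hyp: "renaissance" in hyp
print(pick_with_parsability(toy_nbest, toy_parses))
```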
Semantic grammars
• In a restricted domain, we can (potentially) greatly simplify the parsing
• Instead of abstract grammatical categories:
    S → NP VP
    NP → Adj* N
• we use specific, meaningful, actionable categories:
    AddToSpreadsheetCommand → SelectCommand SpreadsheetLocation Text
    SelectCommand → "select" | "highlight"
    SpreadsheetLocation → …

Semantic grammars (continued)
• A semantic grammar is just another grammar; use GLR* or something else to parse it
• SOUP parser (Marsal Gavaldà, CMU, ~2000)
  – Essentially GLR*, but with a "semantic grammar"?
  – On ATIS-style data (semantic classes: location, time, etc.)
  – Uses equivalent probabilistic recursive transition networks
  – "Inspired by PHOENIX"
  – Used in the JANUS speech-to-speech MT system
• Semantic grammars are not portable/reusable
  – A lot of work to create

PHOENIX: case frames
"I'd like to fly from Boston to, um, Pittsburgh on Saturday on US Airways"
  FLIGHT frame
    Origin        Boston
    Destination   Pittsburgh
    Date          Saturday
    Airline       US Airways
    …             …

PHOENIX (Ward, ~1990)
• Predates GLR*, etc., but perhaps the most influential in ASR, ASU, and dialogue systems
  – "Set the bar for the ATIS task" (Alex Acero, paraphrased)
• Still used in "RavenClaw-Olympus," a CMU dialog-system framework (Bohus and Rudnicky)
  – Thus, used in "Let's Go," the CMU Communicator, etc.
• Doesn't attempt a full parse; it just extracts the important bits of information
  – Each field is recognized by a "mini" CFG parser (strict, no skips)
  – Any amount of skipping/noise is allowed in between informative fields

Numerous frames; one grammar per slot
• The ATIS 1994 system has 70 slot grammars. For example:
• Grammar #1: ORIGIN_CITY
    ORIGIN_CITY → [from | beginning in] [Atlanta | Pittsburgh | Boston | …]
• Grammar #2: DEPARTURE_TIME
    DEPARTURE_TIME → [leaving at | on] TIME_EXPRESSION
    TIME_EXPRESSION → [DAY_OF_WEEK]
    TIME_EXPRESSION → [DAY_OF_WEEK] [TIME_OF_DAY]

PHOENIX
• "… As slot fillers (semantic phrases) are recognized, they are added to frames to which they apply. The algorithm is basically a dynamic programming beam search on frames. Many different frames, and several different versions of a frame, are pursued simultaneously. The score for each frame hypothesis is the number of words that it accounts for. A file of words not to be counted in the score is included. At the end of an utterance the parser picks the best scoring frame as a result… The output of the parser is the frame name and the parse trees for the filled slots." (Ward and Issar, 1994)
• (A toy sketch of this slot-by-slot matching appears below, after the open research areas slide.)

Applications
• Dialogue systems
  – Frames are used to represent the actions the system can take
  – E.g. robot voice commands (TeamTalk):
      MOVE frame
        DIRECTION   <N, E, S, W>
        DISTANCE
  – Most (?) dialogue systems require very specific/constrained semantic grammars to achieve acceptable performance (?)
  – Semantic grammars are not portable
• Speech-to-speech machine translation
  – Frame (?) as the target in an interlingua (a shallow semantic representation) (Levin et al., 2000)

Outline (repeated)

Open research areas
• Using non-domain-specific frame-like constraints for open-domain ASR
  – E.g. a "turn X into Y" frame:
    • X = pulp, Y = newsprint
    • X = Pope, Y = newsprint
• "Reusable" semantic grammars?
  – "Re-use" the grammar
  – But substitute new semantics for a new application
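To make the PHOENIX-style slot filling quoted above (Ward and Issar, 1994) concrete, here is a toy sketch of the idea: one small pattern per slot, unlimited skipping between slot fillers, and a frame score equal to the number of words the filled slots account for. The patterns below are regular expressions only for brevity; the actual system uses a hand-written mini-CFG per slot, and every name and pattern here is illustrative, not PHOENIX's own.

```python
import re

# Toy per-slot "mini-grammars," written as regular expressions for brevity;
# the real system uses a small hand-written CFG per slot.
SLOT_PATTERNS = {
    "Origin":      r"from (boston|pittsburgh|atlanta)",
    "Destination": r"to (?:um )?(boston|pittsburgh|atlanta|maui)",
    "Date":        r"on (monday|friday|saturday|sunday)",
    "Airline":     r"on (us airways|united)",
}

def fill_flight_frame(utterance: str):
    """PHOENIX-style sketch: match each slot's pattern somewhere in the utterance,
    skip everything in between, and score the frame by the number of words the
    filled slots account for."""
    # Normalize: lowercase, drop punctuation, collapse whitespace.
    text = " ".join(re.sub(r"[^\w\s]", " ", utterance.lower()).split())
    frame, words_covered = {}, 0
    for slot, pattern in SLOT_PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            frame[slot] = match.group(1)
            words_covered += len(match.group(0).split())
    return frame, words_covered

frame, score = fill_flight_frame(
    "I'd like to fly from Boston to, um, Pittsburgh on Saturday on US Airways")
print(frame)   # fills Origin, Destination, Date, Airline; everything else is skipped
print(score)   # number of words accounted for by the filled slots
```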
Additional topics in parsing
• Alternative algorithms:
  – Top-down parsing (faster, but probably not good for speech)
• Probabilistic CFGs
• Mildly context-sensitive grammars?
• Dependency parsing, "chunking"
• Feature-unification grammars (e.g. LFG)
  – Attach additional constraints to each grammar rule
• Other constraints (Scone)

References
Dick Grune and Ceriel J.H. Jacobs, Parsing Techniques: A Practical Guide, Ellis Horwood, Chichester, England, 1990.
Michael Sipser, Introduction to the Theory of Computation, Course Technology, 2005.
John Hopcroft, Rajeev Motwani, and Jeffrey Ullman, Introduction to Automata Theory, Languages, and Computation, 2nd ed., Addison-Wesley, 2000.
Ernest Nagel and James Newman, Gödel's Proof, NYU Press, 2001.
Charles Fillmore, "The Case for Case," in the April 1967 Texas Symposium on Linguistic Universals.
Noam Chomsky, Syntactic Structures, 1957.

References (2)
Masaru Tomita, "An Efficient Context-Free Parsing Algorithm for Natural Languages," in Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 1985, pp. 756–764.
Alon Lavie and Masaru Tomita, "GLR* – An Efficient Noise-Skipping Parsing Algorithm for Context-Free Grammars," in Proceedings of the Third International Workshop on Parsing Technologies, 1993.
Marsal Gavaldà, "SOUP: A Parser for Real-World Spontaneous Speech," in New Developments in Parsing Technology, Kluwer Academic Publishers, Norwell, MA, 2004.
Michael McTear, Spoken Dialogue Technology: Toward the Conversational User Interface, Springer, 2004.

References (3)
Wayne Ward, "Understanding Spontaneous Speech: The Phoenix System," ICASSP, 1991.
Wayne Ward and Sunil Issar, "Recent Improvements in the CMU Spoken Language Understanding System," in HLT, 1994.
Lori Levin, Alon Lavie, Monika Woszczyna, Donna Gates, Marsal Gavaldà, Detlef Koll, and Alex Waibel, "The Janus-III Translation System: Speech-to-Speech Translation in Multiple Domains," Machine Translation, Volume 15, Issue 1/2, June 2000.

End of slides. Extra slides follow.

Tree-Adjoining Grammar

Non-robust parsers
(From Lavie and Rosé's LCFLEX paper (?).)

"Robust" parsers
(From Lavie and Rosé's LCFLEX paper (?).)

More speech parsing
• LCFLEX (Rosé and Lavie, ~2000): like GLR*, but faster
  – Uses a "left-corner" algorithm, a combination of top-down and bottom-up parsing
• Other case-frame parsers? (1980s–90s)
  – MINDS
  – DYPAR
• For additional references, see my thesis proposal (forthcoming)

Turing machine for {0^(n^2)}