Working with Discourse Representation Theory
Patrick Blackburn & Johan Bos
Lecture 5: Applying DRT

Today
Given what we know about DRT, both from a theoretical and a practical perspective, can we use it for practical applications?

Outline
Spoken dialogue system with DRT
Using DRT and inference to control a mobile robot
Wide-coverage parsing with DRT
Recognising Textual Entailment

Human-Computer Dialogue
Focus on small domains
Grammatical coverage ensured
Background knowledge encoding

Spoken Dialogue System
Godot the robot
Speech recognition and synthesis
People could give Godot directions, ask it questions, and tell it new information
Godot was a REAL robot

Godot the Robot
Godot with Tetsushi Oka

Typical conversation with Godot
Simple dialogues in the beginning:
Human: Robot?
Robot: Yes?
Human: Where are you?
Robot: I am in the hallway.
Human: OK. Go to the rest room!

Adding DRT to the robot
The language model that the robot used for speech recognition returned DRSs
We used the model builder MACE and the theorem prover SPASS for inference
The model produced by MACE was used to find out what the robot should do
This was possible because MACE produces minimal models
Of course we also checked for consistency and informativity

Advanced conversation with Godot
Dealing with inconsistency and informativity:
Human: Robot, where are you?
Robot: I am in the hallway.
Human: You are in my office.
Robot: No, that is not true.
Human: You are in the hallway.
Robot: Yes, I know!

Videos of Godot
Video 1: Godot in the basement of Buccleuch Place
Video 2: Screenshot of the dialogue manager with DRSs and camera view of Godot

Minimal Models
Model builders normally generate models by iterating over the domain size
As a side effect, the output is a model with a minimal domain size
From a linguistic point of view this is interesting, as such a model contains no redundant information

Using models
Examples:
Turn on a light.
Turn on every light.
Turn on everything except the radio.
Turn off the red light or the blue light.
Turn on another light.

Adding presupposition
Godot was connected to an automated home environment
One day, I asked Godot to switch on all the lights
However, Godot refused to do this, responding that it was unable to do so
Why was that? At first I thought that the theorem prover had made a mistake
But it turned out that one of the lights was already on

Intermediate Accommodation
Because I had coded "switch on X" with the precondition that X is not on, the theorem prover found a proof of inconsistency.
Coding this as a presupposition instead would not give an inconsistency, but a beautiful case of intermediate accommodation.
In other words:
Switch on all the lights!
[= All lights are off; switch them on.]
[= Switch on all the lights that are currently off.]

Sketch of resolution
(Writing a box with referents x,... and conditions C,... as [x,...: C,...], and marking the unresolved presupposition with braces.)
[x: Robot(x),
  [y: Light(y)] => [e: {Off(y)}, switch(e), Agent(e,x), Theme(e,y)]]
The presupposition Off(y), triggered inside the scope of "all the lights", still has to find a home.

Global Accommodation
[x: Robot(x), Off(y),
  [y: Light(y)] => [e: switch(e), Agent(e,x), Theme(e,y)]]

Intermediate Accommodation
[x: Robot(x),
  [y: Light(y), Off(y)] => [e: switch(e), Agent(e,x), Theme(e,y)]]

Local Accommodation
[x: Robot(x),
  [y: Light(y)] => [e: switch(e), Agent(e,x), Theme(e,y), Off(y)]]
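Aside: the three landing sites, spelled out
Roughly, and reading x as the robot, the three boxes above correspond to the following first-order glosses (this spelling-out is ours, not part of the original slides):
Global:       Off(y) ends up outside the scope of "all the lights", so y is left unbound and the reading is ill-formed
Intermediate: ∀y((Light(y) ∧ Off(y)) → ∃e(switch(e) ∧ Agent(e,x) ∧ Theme(e,y)))   "switch on every light that is off"
Local:        ∀y(Light(y) → ∃e(switch(e) ∧ Agent(e,x) ∧ Theme(e,y) ∧ Off(y)))     "switch on every light, and every light is off", which clashes with a light already being on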
Outline
Spoken dialogue system with DRT
Using DRT and inference to control a mobile robot
Wide-coverage parsing with DRT
Recognising Textual Entailment

Wide-coverage DRT
Nowadays we have robust wide-coverage parsers that use stochastic methods to produce a parse tree
Trained on the Penn Treebank
Examples are parsers like those of Collins and Charniak

Wide-coverage parsers
Say we wished to produce DRSs from the output of these parsers
We would need quite detailed syntactic derivations
Closer inspection reveals that many of these parsers use a great many [several thousand] phrase structure rules
Often, long-distance dependencies are not recovered

Combinatory Categorial Grammar
CCG is a lexicalised theory of grammar (Steedman 2001)
Deals with complex cases of coordination and long-distance dependencies
Lexicalised, hence easy to implement
English wide-coverage grammar
Fast robust parser available

Categorial Grammar
Lexicalised theory of syntax
Many different lexical categories, few grammar rules
Finite set of categories defined over a base of core categories
Core categories: s, np, n, pp
Combined categories: np/n, s\np, (s\np)/np

CCG: type-driven lexicalised grammar
Category        Name                Examples
N               noun                Ralph, car
NP              noun phrase         Everyone
NP/N            determiner          a, the, every
S\NP            intransitive verb   walks, smiles
(S\NP)/NP       transitive verb     loves, hates
(S\NP)\(S\NP)   adverb              quickly

CCG: combinatorial rules
Forward Application (FA)
Backward Application (BA)
Generalised Forward Composition (FC)
Backward Crossed Composition (BC)
Type Raising (TR)
Coordination

CCG derivation
NP/N:a   N:spokesman   S\NP:lied
----------------------- (FA)
NP: a spokesman
------------------------------- (BA)
S: a spokesman lied

Coordination in CCG
np:Artie   (s\np)/np:likes   (x\x)/x:and   np:Tony   (s\np)/np:hates   np:beans
(TR)  Artie: s/(s\np)                      (TR)  Tony: s/(s\np)
(FC)  Artie likes: s/np                    (FC)  Tony hates: s/np
(FA)  and Tony hates: (s/np)\(s/np)
(BA)  Artie likes and Tony hates: s/np
(FA)  Artie likes and Tony hates beans: s
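Aside: application in Prolog
Forward and backward application need very little machinery. Here is a minimal sketch in Prolog (a toy of our own, not the C&C parser; fs/2 and bs/2 are our encodings of the forward and backward slash categories X/Y and X\Y):

% Toy lexicon: categories are atoms (s, np, n) or fs(X,Y) = X/Y, bs(X,Y) = X\Y.
lex(a,         fs(np, n)).     % determiner        NP/N
lex(spokesman, n).             % noun              N
lex(lied,      bs(s, np)).     % intransitive verb S\NP

% Forward application:  X/Y  Y  =>  X
combine(fs(X, Y), Y, X).
% Backward application: Y  X\Y  =>  X
combine(Y, bs(X, Y), X).

% parse(+Words, ?Category): split the string, parse both halves, combine.
parse([Word], Cat) :-
    lex(Word, Cat).
parse(Words, Cat) :-
    append(Left, Right, Words),
    Left \= [], Right \= [],
    parse(Left, CatL),
    parse(Right, CatR),
    combine(CatL, CatR, Cat).

% ?- parse([a, spokesman, lied], Cat).
% Cat = s.

The query mirrors the derivation above: a and spokesman combine by FA to NP, which then combines with lied by BA to S.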
The Glue
Use the lambda calculus to combine CCG with DRT
Each lexical entry gets a DRS with lambda-bound variables, representing the "missing" information
Each combinatorial rule in CCG gets a semantic interpretation, again using the tools of the lambda calculus

Interpreting Combinatorial Rules
Each combinatorial rule in CCG is expressed in terms of the lambda calculus:
Forward Application:   FA(α,β) = α@β
Backward Application:  BA(α,β) = β@α
Type Raising:          TR(α) = λx.(x@α)
Function Composition:  FC(α,β) = λx.(α@(β@x))

CCG: lexical semantics
(Boxes are written as before, ";" is DRS merge and "@" is application.)
Category   Semantics                                   Example
N          λx.[: spokesman(x)]                         spokesman
NP/N       λp.λq.(([x: ] ; p@x) ; q@x)                 a
S\NP       λx.(x @ λy.[e: lie(e), agent(e,y)])         lied

CCG derivation (with semantics)
NP/N:a                       N:spokesman            S\NP:lied
λp.λq.(([x: ];p@x);q@x)      λz.[: spokesman(z)]    λx.(x @ λy.[e: lie(e), agent(e,y)])
--------------------------------------------------- (FA)
NP: a spokesman
λp.λq.(([x: ];p@x);q@x) @ λz.[: spokesman(z)]
= λq.(([x: ];[: spokesman(x)]);q@x)
= λq.([x: spokesman(x)];q@x)
--------------------------------------------------- (BA)
S: a spokesman lied
λx.(x @ λy.[e: lie(e), agent(e,y)]) @ λq.([x: spokesman(x)];q@x)
= λq.([x: spokesman(x)];q@x) @ λy.[e: lie(e), agent(e,y)]
= [x: spokesman(x)] ; [e: lie(e), agent(e,x)]
= [x,e: spokesman(x), lie(e), agent(e,x)]
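Aside: the glue in Prolog
The same composition can be prototyped in a few lines of Prolog. The sketch below is our own ad-hoc encoding (lam/2 and app/2 for abstraction and application, drs/2 for boxes, merge/2 for ";"), not the course software; it substitutes by unification, which is only safe for linear, capture-free terms like these.

% betaConvert(+Term, -Reduced): naive beta reduction via unification.
betaConvert(X, X) :-
    var(X), !.
betaConvert(app(Fun, Arg), Result) :- !,
    betaConvert(Fun, Fun1),
    (   nonvar(Fun1), Fun1 = lam(Var, Body)
    ->  Var = Arg,                 % substitute the argument for the bound variable
        betaConvert(Body, Result)
    ;   Result = app(Fun1, Arg)
    ).
betaConvert(Term, Result) :-
    Term =.. [Functor|Args],
    maplist(betaConvert, Args, Args1),
    Result =.. [Functor|Args1].

% Lexical semantics of "a", "spokesman" and "lied" (cf. the table above).
sem(a, lam(P, lam(Q, merge(merge(drs([X], []), app(P, X)), app(Q, X))))).
sem(spokesman, lam(Y, drs([], [pred(spokesman, Y)]))).
sem(lied, lam(S, app(S, lam(Z, drs([E], [pred(lie, E), rel(agent, E, Z)]))))).

% ?- sem(a, A), sem(spokesman, N), sem(lied, V),
%    betaConvert(app(V, app(A, N)), Drs).
% Drs = merge(merge(drs([X], []), drs([], [pred(spokesman, X)])),
%             drs([E], [pred(lie, E), rel(agent, E, X)]))    (up to variable names)

This is the merged form of the DRS derived above for "a spokesman lied".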
The Clark & Curran Parser
Uses standard statistical techniques
Robust wide-coverage parser (Clark & Curran, ACL 2004)
Grammar derived from CCGbank, with 409 different categories (Hockenmaier & Steedman, ACL 2002)
Results: 96% coverage of the WSJ (Bos et al., COLING 2004)
Example output: [CCG derivation shown on slide]

Applications
Has been used for different kinds of applications:
Question Answering
Recognising Textual Entailment

Recognising Textual Entailment
A task for NLP systems: recognise entailment between two (short) texts
Introduced in 2004/2005 as part of the PASCAL Network of Excellence
Proved to be a difficult, but popular task
PASCAL provided a development set and a test set of several hundred examples

RTE Example (entailment)
RTE 1977 (TRUE)
His family has steadfastly denied the charges.
----------------------------------------------------
The charges were denied by his family.

RTE Example (no entailment)
RTE 2030 (FALSE)
Lyon is actually the gastronomical capital of France.
----------------------------------------------------
Lyon is the capital of France.

Aristotle’s Syllogisms
ARISTOTLE 1 (TRUE)
All men are mortal. Socrates is a man.
----------------------------------------------------
Socrates is mortal.

How to deal with RTE
There are several methods
We will look at five of them to see how difficult RTE actually is

Method 1: Flipping a coin
Advantages: easy to implement
Disadvantages: just 50% accuracy

Method 2: Calling a friend
Advantages: high accuracy (95%)
Disadvantages: you lose friends, and the phone bill is high

Method 3: Ask the audience
RTE 893 (????)
The first settlements on the site of Jakarta were established at the mouth of the Ciliwung, perhaps as early as the 5th century AD.
---------------------------------------------------------------
The first settlements on the site of Jakarta were established as early as the 5th century AD.

Human Upper Bound
RTE 893 (TRUE): the gold answer for the pair shown above is TRUE.

Method 4: Word Overlap
Popular approach, ranging in sophistication from a simple bag of words to the use of WordNet
Accuracy rates of ca. 55%
Advantages: relatively straightforward algorithm
Disadvantages: hardly better than flipping a coin

RTE State-of-the-Art
The PASCAL RTE challenge is a hard problem; it requires semantics
[Histogram on slide: accuracy of the RTE 2004/5 systems (n=25), binned from 0.49-0.50 up to 0.59-0.60]

Method 5: Using DRT

Inference
How do we perform inference with DRSs?
Translate the DRSs into first-order logic and use off-the-shelf inference engines
What kind of inference engines? Theorem provers and model builders

Using Theorem Proving
Given a textual entailment pair T/H with text T and hypothesis H:
Produce DRSs for T and H
Translate these DRSs into FOL, giving T′ and H′
Give T′ → H′ to the theorem prover
If the theorem prover finds a proof, then T entails H
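Aside: translating DRSs to FOL in Prolog
The translation step is mechanical. Below is a deliberately cut-down sketch of the standard translation (only discourse referents, basic conditions, negation and implication; no merge, disjunction or accessibility checking), using the same ad-hoc drs/2 encoding as before rather than the actual course software.

% drs2fol(+Drs, -Formula): discourse referents become existential quantifiers...
drs2fol(drs([], [Cond]), Formula) :-
    cond2fol(Cond, Formula).
drs2fol(drs([], [Cond|Conds]), and(F1, F2)) :-
    Conds = [_|_],
    cond2fol(Cond, F1),
    drs2fol(drs([], Conds), F2).
drs2fol(drs([Ref|Refs], Conds), some(Ref, Formula)) :-
    drs2fol(drs(Refs, Conds), Formula).

% ...except in the antecedent of an implication, where they become universal.
cond2fol(not(Drs), not(Formula)) :-
    drs2fol(Drs, Formula).
cond2fol(imp(drs([], Conds), Drs2), imp(F1, F2)) :-
    drs2fol(drs([], Conds), F1),
    drs2fol(Drs2, F2).
cond2fol(imp(drs([Ref|Refs], Conds), Drs2), all(Ref, Formula)) :-
    cond2fol(imp(drs(Refs, Conds), Drs2), Formula).
cond2fol(pred(Sym, Arg), Atom) :-
    Atom =.. [Sym, Arg].
cond2fol(rel(Sym, Arg1, Arg2), Atom) :-
    Atom =.. [Sym, Arg1, Arg2].

% ?- drs2fol(drs([X,E], [pred(spokesman,X), pred(lie,E), rel(agent,E,X)]), F).
% F = some(X, some(E, and(spokesman(X), and(lie(E), agent(E, X)))))
% ?- drs2fol(drs([], [imp(drs([Y],[pred(light,Y)]), drs([],[pred(off,Y)]))]), F).
% F = all(Y, imp(light(Y), off(Y)))

The resulting formula terms can then be printed in whatever input syntax a prover such as Vampire or SPASS expects.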
Vampire (Riazanov & Voronkov 2002)
Let's try this. We will use the theorem prover Vampire (currently the best-known theorem prover for FOL)
This gives us good results for:
apposition
relative clauses
coordination
intersective adjectives/complements
passive/active alternations

Example (Vampire: proof)
RTE-2 112 (TRUE)
On Friday evening, a car bomb exploded outside a Shiite mosque in Iskandariyah, 30 miles south of the capital.
----------------------------------------------------
A bomb exploded outside a mosque.

Example (Vampire: proof)
RTE-2 489 (TRUE)
Initially, the Bundesbank opposed the introduction of the euro but was compelled to accept it in light of the political pressure of the capitalist politicians who supported its introduction.
----------------------------------------------------
The introduction of the euro has been opposed.

Background Knowledge
However, this does not give us good results for cases requiring additional knowledge:
lexical knowledge
world knowledge
We will use WordNet as a starting point for obtaining additional knowledge
All of WordNet is too much, so we create MiniWordNets

MiniWordNets
Use the hyponym relations from WordNet to build an ontology
Do this only for the relevant symbols
Convert the ontology into first-order axioms

MiniWordNet: an example
Example text:
There is no asbestos in our products now. Neither Lorillard nor the researchers who studied the workers were aware of any research on smokers of the Kent cigarettes.
[Slide shows the MiniWordNet ontology extracted for this text]
Some of the resulting axioms:
∀x(user(x) → person(x))
∀x(worker(x) → person(x))
∀x(researcher(x) → person(x))
∀x(person(x) → ¬risk(x))
∀x(person(x) → ¬cigarette(x))
...

Using Background Knowledge
Given a textual entailment pair T/H with text T and hypothesis H:
Produce DRSs for T and H
Translate drs(T) and drs(H) into FOL
Create background knowledge BK for T and H
Give (BK ∧ T′) → H′ to the theorem prover

MiniWordNets at work
RTE 1952 (TRUE)
Crude oil prices soared to record levels.
----------------------------------------------------
Crude oil prices rise.
Background knowledge: ∀x(soar(x) → rise(x))

Troubles with theorem proving
Theorem provers are extremely precise: they won't tell you when there is "almost" a proof
Even if only a little background knowledge is missing, Vampire will say: NO

Vampire: no proof
RTE 1049 (TRUE)
Four Venezuelan firefighters who were traveling to a training course in Texas were killed when their sport utility vehicle drifted onto the shoulder of a highway and struck a parked truck.
---------------------------------------------------------------
Four firefighters were killed in a car accident.

Using Model Building
We need a robust way of doing inference
Use the model builder Paradox (Claessen & Sorensson 2003)
Use the size of the (minimal) model: compare the size of the model of T with that of T & H
If the difference is small, then it is likely that T entails H

Using Model Building
Given a textual entailment pair T/H with text T and hypothesis H:
Produce DRSs for T and H
Translate these DRSs into FOL
Generate background knowledge BK
Give to the model builder: i) BK ∧ T′ and ii) BK ∧ T′ ∧ H′
If the models for i) and ii) are similar in size, then we predict that T entails H

Example 1
T: John met Mary in Rome
H: John met Mary
Model of T: 3 entities
Model of T+H: 3 entities
Model size difference: 0
Prediction: entailment

Example 2
T: John met Mary
H: John met Mary in Rome
Model of T: 2 entities
Model of T+H: 3 entities
Model size difference: 1
Prediction: no entailment

Model size differences
Of course this is a very rough approximation, but it turns out to be a useful one
It gives us a notion of robustness
Of course we need to deal with negation as well: also give ¬T′ and ¬(T′ ∧ H′) to the model builder
There is not necessarily one unique minimal model
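Aside: the model-size heuristic in Prolog
Once a model builder has produced the domain sizes, the decision itself is tiny. A sketch (our own naming; in practice the two sizes come from running Paradox on BK ∧ T′ and BK ∧ T′ ∧ H′, and the threshold would be tuned on development data):

% predict(+SizeT, +SizeTH, +Threshold, -Prediction)
% SizeT:  domain size of a minimal model of BK & T'
% SizeTH: domain size of a minimal model of BK & T' & H'
% Predict entailment when adding H forces (almost) no new entities into the model.
predict(SizeT, SizeTH, Threshold, entailment) :-
    Diff is SizeTH - SizeT,
    Diff =< Threshold, !.
predict(_, _, _, no_entailment).

% Example 1: T = "John met Mary in Rome", H = "John met Mary"
% ?- predict(3, 3, 0, P).   P = entailment.
% Example 2: T = "John met Mary", H = "John met Mary in Rome"
% ?- predict(2, 3, 0, P).   P = no_entailment.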
Lack of Background Knowledge
RTE-2 235 (TRUE)
Indonesia says the oil blocks are within its borders, as does Malaysia, which has also sent warships to the area, claiming that its waters and airspace have been violated.
--------------------------------------------------------------
There is a territorial waters dispute.

How well does this work?
We tried this at RTE 2004/05
We combined the deep approach with a shallow approach (word overlap)
We used standard machine learning methods to build a decision tree
Features used:
Proof (yes/no)
Model size
Model size difference
Word overlap
Task (source of the RTE pair)

RTE Results 2004/5
                Accuracy   CWS
Shallow         0.569      0.624
Deep            0.562      0.608
Hybrid (S+D)    0.577      0.632
Hybrid+Task     0.612      0.646
(CWS = confidence-weighted score; Bos & Markert 2005)

Conclusions
We have the tools for doing computational semantics in a principled way using DRT
For many applications, success depends on the ability to systematically generate background knowledge:
small restricted domains [dialogue]
open domain

What we did in this course
We introduced DRT, a notational variant of first-order logic
Semantically, we can handle in DRT anything we can handle in FOL, including events
Moreover, because it is so close to FOL, we can use first-order methods to implement inference for DRT
The DRT box syntax is essentially about nesting contexts, which allows a uniform treatment of anaphoric phenomena
Moreover, this works not only on the theoretical level, but is also implementable, and even applicable

What we hope you got out of it
First, we hope we made you aware that computational semantics is nowadays able to handle some difficult problems
Second, we hope we made you aware that DRT is not just a theory: it is a complete architecture that allows us to experiment with computational semantics
Third, we hope you are aware that state-of-the-art inference engines can help to study or apply semantics

Where you can find more
For more on DRT, read the standard textbook devoted to DRT by Kamp and Reyle. It discusses not only the basic theory, but also plurals, tense, and aspect.
For more on the basic architecture underlying this work on computational semantics, in particular on implementations of the lambda calculus and the parallel use of theorem provers and model builders, see www.blackburnbos.org
All of the theory we discussed in this course is implemented in Prolog. The software can be downloaded from www.blackburnbos.org. For an introduction to Prolog written very much with this software in mind, try Learn Prolog Now! (www.learnprolognow.org)