Information Status

Varieties of Information Status
– Contrast
    John wanted a poodle but Becky preferred a corgi.
– Topic/comment
    The corgi they bought turned out to have fleas.
– Theme/rheme
    The corgi they bought turned out to have fleas.
– Focus/presupposition
    It was Becky who took him to the vet.
– Given/new
    Some wildcats bite, but this wildcat turned out to be a sweetheart.

Today: Given/New
• Why do we care about Given/New?
• Defining Given/New: why is this hard?
  – Hearer-based and Discourse-based models
• Uses of Given/New information in NLP
• Identifying Given/New information automatically
  – Rule-based
  – Corpus-based
  – The Boston Directions Corpus
  – Laboratory studies suggest new directions

Why do we care about the given/new distinction?
• Building a model of the discourse
  – What do S and H believe to be true?
  – What is in their consciousness now?
  – What is ‘grounded’?
• Speech technologies
  – TTS: Given information is often deaccented while new information is usually accented
  – ASR?

Defining Given/New
• Halliday ’67:
  – Given: Recoverable from some form of context
  – New: Not recoverable
• Chafe ’74, ’76:
  – Given: what S believes is in H’s consciousness
  – New: what S believes is not…
  – “Chafe-givenness”
    Yesterday I had my class disrupted by a bulldog/dog. I’m beginning to dislike dogs/bulldogs.
• But not vice versa….

Prince ’81: A Given/New Taxonomy
• Text as a set of instructions from S to H on how to construct a discourse model
  – Model includes discourse entities, attributes, and links between entities
  – Discourse entities: individuals, classes, exemplars, substances, concepts (NPs)
  – Entities as ‘hooks’ on which to hang attributes (Webber ’78)
• Entities when first introduced are new
  – Brand-new (H must create a new entity)
      I saw a dinosaur today.
  – Unused (H already knows of this entity)
      I saw your mother today.
• Evoked entities are old -- already in the discourse
  – Textually evoked
      The dinosaur was scaly and gray.
  – Situationally evoked
      The light was red when you went through it.
• Inferrables
  – Containing
      I bought a carton of eggs. One of them was broken.
  – Non-containing
      A bus pulled up beside me. The driver was a monkey.

Given/New and Definiteness/Indefiniteness
– Definiteness: subject NPs tend to be syntactically definite and old
– Indefiniteness: object NPs tend to be indefinite and new
    I saw a black cat yesterday. The cat looked hungry.
• Definite articles, demonstratives, possessives, personal pronouns, proper nouns, quantifiers like all, every signal definiteness…but…
    There were the usual suspects at the bar.
• Indefinite articles, quantifiers like some, any, one signal indefiniteness…but….
    This guy came into the room

What’s wrong with a simple Hearer-centric model of given/new?
• Hearer-centric information status:
  – Given: what S believes H has in his/her consciousness
  – New: what S believes H does not have in his/her consciousness
• But discourse entities may also be given and new wrt the current discourse
  – Discourse-old: already evoked in the discourse
  – Discourse-new: not evoked
    (1) A: I’ve decided to make an appointment with Lee Bollinger.
    (2) B: Why do you want to see Bollinger?
• Hearer status of discourse entities in 1? 2?
  – If B is your roommate? your mother? a guy on the subway?
• Discourse status of discourse entities in 1? 2?
• What would be the hearer/discourse status of discourse entities in this version?
    (1) A: I’ve decided to make an appointment with Lee Bollinger.
    (2a) B: Why do you want to see the president?
    (2b) B: Have you talked to his secretary?

What does this new Hearer/Discourse given/new distinction provide?
• A way to separate what is explicit in the discourse model from what is believed to be in the speaker/hearer cognitive model
• A way to explain given/new in more complex terms
  – To identify coreference relations
  – To explain deaccenting in ASR and TTS

Gross Oversimplification: Given Items Tend to be Deaccented
• Accenting and deaccenting: making items intonationally prominent or not
• Critical to get this distinction ‘right’ in TTS
  – Accenting everything makes it hard for people to understand anything, e.g.
      I like my cat and my cat adores me.
      One potato, two potato, three potato,…
If a discourse entity is given for one speaker then it may or may not be given for another speaker.

How can we determine automatically whether a discourse entity is given or new?
• A rule-based approach:
  – Stem the content words in the discourse
  – Select a window: incoming items that share a stem with a previous entity inside this window are labeled ‘given’
• Other items are ‘new’
• Is this hearer-based? Discourse-based?
• How well does it work?
  – 65-75% accurate (precision) depending on genre, domain

Boston Directions Corpus (Hirschberg & Nakatani ’96)
• Experimental Design
• 12 speakers: 4 used
• Spontaneous and read versions of 9 direction-giving tasks
• Corpus: 50m read; 67m spontaneous
• Labeling
  – Prosodic: ToBI intonational labeling
  – Discourse: Grosz & Sidner
  – Given/new (Prince ’92), grammatical function, p.o.s.,…

Boston Directions Corpus: Describe how to get to MIT from Harvard
d1: dsp1: step 1: enter and get token
    first enter the Harvard Square T stop and buy a token
d2: dsp2: inbound on red line
    then proceed to get on the inbound um Red Line uh subway
dp3 dsp3: take subway from hs, to cs to ks
    and take the subway from Harvard Square to Central Square and then to Kendall Square
dp4: dsp4: get off T.
    then get off the T

Hearer and Discourse Given/New Labeling
first enter <HG/DN the Harvard Square T stop> and buy <HI/DN a token>
then proceed to get on <HI/DN the inbound um Red Line uh subway>
and take <HG/DG the subway> from <HG/DG Harvard Square> to <HG/DN Central Square> and then to <HG/DN Kendall Square>
then get off <HG/DG the T>

What could we do with this labeled data?
• Can we predict given/new?
• Can we predict what will be accented and what will be deaccented?

Does Given/New Status Predict Deaccenting?
NP      Deaccented    Total
HG      37.1%         1009
HI      53.9%         406
HN      26.2%         130
DG      43.3%         596
DN      38.8%         950

What else might be at work?
• Given/new and grammatical function
• Hypothesis: how discourse entities are evoked in a discourse influences how ‘given’ they are
• E.g., How might grammatical function and surface position interact with the accentuation of ‘given’ items?
• Cases:
  – X has not been mentioned in the prior context
  – X has been mentioned, with the same grammatical function/surface position
  – X has been mentioned but with a different grammatical function/surface position

Experimental Design
• Major problem:
  – How to elicit ‘spontaneous’ productions while varying desired phenomena systematically?
  – Key: simple variations and actions can capitalize upon the natural tendency to associate grammatical functions with particular thematic roles for a given set of verbs

[Figures: scenes showing five labeled shapes (rectangle, triangle, cylinder, octagon, diamond) in varying arrangements: Context 1, Context 2, Context 3, Target (A), Target (B)]

Experimental Conditions
• 10 native speakers of standard American English
• Subject and experimenter in soundproof booth
• Subject told to describe scenes to a confederate outside the booth, visible but providing no feedback
• 10 practice scenarios
• ~20 minutes per subject

Prosodic Analysis
• Target turns excised and analyzed by two judges independently for location of pitch accents
• Each judge rated each referring expression as accented (2), unsure (1), or deaccented (0), giving a combined accentedness score from 0-4 (81% agreement for 0 and 2 scores)

Grammatical Role/Surface Position Accenting
            CONTEXT
            GIVEN                        NEW
TARGET      Subj     D-obj    Pp-obj
Subj        2.1      3.6      3.2        3.7
D-obj       3.3      0.6      1.6        3.8
Pp-obj      3.0      1.4      0.7        --

Findings
• In general
  – Items that differ from context to target in grammatical function or surface position tend to be accented
  – Items that share grammatical function and surface position tend to be deaccented
• But
  – Subjects tend to be accented more often than objects, even if previously mentioned in the same role
  – Direct objects and pp-objects tend to be more distinguished from subjects than from one another

How can we explain these observations?
• Consider our examples, e.g. subj → D.O.
    The TRIANGLE touches the CYLINDER.
    The triangle touches the DIAMOND.
    The triangle touches the OCTAGON.
    The RECTANGLE touches the TRIANGLE.
• An entity may be ‘given’ or ‘new’ wrt the role it plays in the discourse

Given/New Sensitive to the Role the Discourse Entity Plays
• E.g., a discourse entity may retain a given thematic role or take on a new one
  – By the time the target is uttered, ‘triangle’ is established both as a ‘given’ discourse entity and as the discourse topic (or backward-looking center in centering theory)
  – But this status has been established for ‘triangle’ as agent
  – What is new, and, perhaps, focused in the target is ‘triangle’s’ new thematic role as patient – the players are the same but the roles are different

Consequences for NLP
– Identification of given/new status must be sensitive to a more complex model of context (grammatical function/thematic role)
– Will this help us predict deaccenting more accurately?
– Stay tuned…..

Next Class
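
Sketch: the rule-based given/new labeler
The rule-based approach described earlier (stem the content words, then label an item ‘given’ if the same stem occurred within a preceding window) can be sketched roughly as below. This is only an illustration, not the system behind the 65-75% figure: the crude suffix-stripping stemmer, the small stopword list, the default window of 20 content words, and the names `crude_stem` and `label_given_new` are all assumptions made here.

```python
import re

# Assumed, deliberately small stopword list (function words and fillers).
STOPWORDS = {
    "a", "an", "the", "and", "but", "or", "to", "of", "in", "on", "by",
    "from", "i", "me", "my", "it", "is", "was", "then", "um", "uh",
}

def crude_stem(word):
    """Strip a few common English suffixes (assumed stand-in for a real stemmer)."""
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def label_given_new(text, window=20):
    """Label each content word 'given' if its stem occurred among the
    previous `window` content words, and 'new' otherwise."""
    tokens = re.findall(r"[a-z']+", text.lower())
    content = [t for t in tokens if t not in STOPWORDS]
    labels = []
    seen = []  # stems of the content words processed so far, in order
    for word in content:
        stem = crude_stem(word)
        recent = seen[max(0, len(seen) - window):]
        labels.append((word, "given" if stem in recent else "new"))
        seen.append(stem)
    return labels

if __name__ == "__main__":
    for word, status in label_given_new("I like my cat and my cat adores me."):
        print(word, status)
```

On "I like my cat and my cat adores me." the second `cat` comes out ‘given’ and everything else ‘new’. Because the sketch consults only the preceding text, it approximates discourse status rather than hearer status: an unused entity such as "your mother" would still be labeled ‘new’.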