Discourse Structure in Generation
Julia Hirschberg
CS 4706
7/15/2016

Today
• Models of Discourse Structure
  – Do we have them?
  – Grosz & Sidner ’86
• What identifies discourse structure to Hearers?
  – Textual cues
  – Spoken cues
• How can we produce appropriate discourse structure in TTS systems?
• Can we identify discourse structure automatically, from speech?

Is there structure in this discourse?
A beautiful mallard spotted the dove I was feeding. The duck dove supply is small this year. That dove was history in a minute. Well, to recover from this horrible scene, I went to the park snack bar for a cup of cocoa. To my surprise, I ran into a friend from back home. When I told her of my recent experience she questioned my sanity.

Is this a reasonable structure? This? This?
[The same passage is repeated on each of these three slides; any segmentation markup shown with it did not survive extraction.]

What information do we use in segmenting a discourse?
• ‘Topic’ coherence?
• Repeated reference?
• ‘Cue’ phrases?
• ????
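To make one of the candidate cues above concrete, here is a toy sketch of segmenting on cue phrases alone. It is an illustration under stated assumptions, not a method from the lecture: the cue list `CUE_PHRASES`, the naive sentence splitter, and the rule "a sentence-initial cue phrase opens a new segment" are all hypothetical simplifications.

```python
import re

# Hypothetical cue-phrase list, chosen for illustration only.
CUE_PHRASES = ("well", "now", "anyway", "so")

PASSAGE = (
    "A beautiful mallard spotted the dove I was feeding. "
    "The duck dove supply is small this year. "
    "That dove was history in a minute. "
    "Well, to recover from this horrible scene, I went to the park "
    "snack bar for a cup of cocoa. "
    "To my surprise, I ran into a friend from back home. "
    "When I told her of my recent experience she questioned my sanity."
)


def split_sentences(text):
    """Naive splitter: break after '.', '?' or '!' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]


def segment_on_cues(sentences, cues=CUE_PHRASES):
    """Open a new segment whenever a sentence begins with a cue phrase."""
    segments = []
    for sentence in sentences:
        first_word = re.match(r"[A-Za-z']+", sentence)
        starts_with_cue = bool(first_word) and first_word.group(0).lower() in cues
        if starts_with_cue or not segments:
            segments.append([sentence])      # start a new segment
        else:
            segments[-1].append(sentence)    # continue the current segment
    return segments


segments = segment_on_cues(split_sentences(PASSAGE))
for number, segment in enumerate(segments, start=1):
    print(f"Segment {number}: {' '.join(segment)}")
```

On the mallard passage this splits once, at "Well, ...". A single surface cue like this says nothing about topic coherence or repeated reference, which is why the slide lists several sources of information and leaves the question open.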
Structures of Discourse Structure (Grosz & Sidner ’86)
• A leading theory of discourse structure
  – Based upon Speaker intentions and Speaker and Hearer attentional state
  – Identifies a few, general relations that hold among Speaker intentions
  – Identifies a model of attentional state
• Three components:
  – Linguistic structure
  – Intentional structure
  – Attentional structure

Linguistic Structure
• What is actually said or written
• How is the linguistic structure represented?
  – Assume discourse is segmented into Discourse Segments (DS)
• What is the basic unit of analysis?
• Do we all segment alike?
• Do we all use the same cues?

Linguistic Structure of Discourse D
S1: A beautiful mallard spotted the dove I was feeding. The duck dove supply is small this year. That dove was history in a minute.
S2: Well, to recover from this horrible scene, I went to the park snack bar for a cup of cocoa. To my surprise, I ran into a friend from back home. When I told her of my recent experience she questioned my sanity.

Intentional Structure
• Discourse purpose (DP): basic purpose of the Speaker in producing the discourse
• Discourse segment purposes (DSPs): the Speaker’s purpose in producing the segment
• Segments are related to one another by their purposes:
  – Satisfaction-precedence: DSP1 must be satisfied before DSP2
  – Dominance: DSP1 dominates DSP2 if fulfilling DSP2 constitutes part of fulfilling DSP1

Linguistic Structure of Discourse D
DSP1: Describe murder of dove by duck.
S1: A beautiful mallard spotted the dove I was feeding. The duck dove supply is small this year. That dove was history in a minute.
DSP2: Describe meeting of old friend.
S2: Well, to recover from this horrible scene, I went to the park snack bar for a cup of cocoa. To my surprise, I ran into a friend from back home. When I told her of my recent experience she questioned my sanity.

DSP2: Describe recovery process.
S2:
  DSP3: Describe snack.
  S3: Well, to recover from this horrible scene, I went to the park snack bar for a cup of cocoa.
  DSP4: Describe meeting old friend.
  S4: To my surprise, I ran into a friend from back home.
  DSP5: Describe friend’s reaction.
  S5: When I told her of my recent experience she questioned my sanity.

Attentional State: The Focus Stack
• Stack of focus spaces, each containing objects, properties and relations salient during each DS, plus the DSP
• State changes: transition rules controlling the addition/deletion of focus spaces
  – Information at lower levels may or may not be available at higher levels
  – Focus spaces are pushed onto the stack when
    • A new DS is begun
    • An embedded DS (e.g. a DS dominated by another DS) is begun
  – Focus spaces are popped when their DS is completed
• State of focus stack models felicitous reference, coherence in discourse

Focus stack (top first):
S2: DSP2, scene, Speaker, snack_bar, cocoa, friend, home, sanity
S1: DSP1, duck, dove, Speaker, duck_dove_supply

Limits of the Theory
• Assumes discourses are task-oriented
• Assumes a single, hierarchical structure shared by S and H
• Questions:
  – Do people really build such structures when they converse?
  – Use them in interpreting what others say?
  – How could they do it?

How might people recognize discourse structure?
• Linguistic markers?
  – tense and aspect
  – cue phrases
• Inference of Speaker intentions?
• Inference from task structure?
• Intonational information?

Acoustic and Prosodic Cues to Discourse Structure
• Intuition:
  – Speakers vary acoustic and prosodic cues to convey variation in discourse structure
  – Systematic? In read or spontaneous speech?
• Evidence:
  – Observations from recorded corpora
  – Laboratory experiments
  – Machine learning of discourse structure from acoustic/prosodic features

Prosodic Correlates of Discourse/Topic Structure
• Pitch range: Lehiste ’75, Brown et al ’83, Silverman ’86, Avesani & Vayra ’88, Ayers ’92, Swerts et al ’92, Grosz & Hirschberg ’92, Swerts & Ostendorf ’95, Hirschberg & Nakatani ’96
• Preceding pause: Lehiste ’79, Chafe ’80, Brown et al ’83, Silverman ’86, Woodbury ’87, Avesani & Vayra ’88, Grosz & Hirschberg ’92, Passonneau & Litman ’93, Hirschberg & Nakatani ’96
• Rate: Butterworth ’75, Lehiste ’80, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96
• Amplitude: Brown et al ’83, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96
• Contour: Brown et al ’83, Woodbury ’87, Swerts et al ’92

Issues
• Do we find significant and reliable cues to discourse structure in prosodic variation
  – When tested against an independent theory of discourse structure?
  – In spontaneous as well as read speech?
• Are Hearers’ interpretations of discourse structure influenced by intonational variation?

Grosz & Hirschberg ’92
• Small corpus of read AP newswire
  – Read by professional speaker
  – Labeled for discourse structure from text alone or from text and speech
  – Pre-ToBI labeled
  – Acoustic-prosodic features extracted for each intermediate (level 3) phrase
    • Pitch range and change from prior phrase
    • Intensity (rms) and change in dB from prior phrase
    • Preceding and subsequent pause
    • Speaking rate
• Analysis of phrases in different segment positions: SBEG (segment-beginning), SF (segment-final), parentheticals, quoted speech
  – ANOVAs and t-tests on means
• Results:
  – Direct quotes: larger pitch range
  – Parentheticals: smaller range, negative change from prior phrase, negative change in dB, faster rate
  – SBEG: larger range, louder, greater preceding pause, less subsequent pause
  – SF: greater subsequent pause
• Machine learning experiments identified:
  – SBEG with 91.5% estimated accuracy (cross-validation)
  – SF, 92.5%
  – Attributive tags, 96.9%
  – Direct quotations, 86.4%
  – Indirect quotations, 88.5%
  – Parentheticals, 89.2%
• Conclusion: Acoustic/prosodic information is available to permit Hearers to identify discourse structure…

Next
• The midterm
  – Closed book, no notes or electronic devices
  – Will include material through today
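Review sketch for the Attentional State slides: the focus stack's push and pop behaviour can be written down as a small data structure. This is a minimal sketch under stated assumptions, not Grosz & Sidner's formalization. The class names `FocusSpace` and `FocusStack`, the method `is_salient`, and the `search_lower` flag are hypothetical; the flag stands in for the slide's remark that information at lower levels "may or may not be available" at higher levels. The example mirrors the two focus spaces listed on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class FocusSpace:
    """Salient entities for one discourse segment, plus its purpose (DSP)."""
    segment: str
    dsp: str
    entities: set = field(default_factory=set)


class FocusStack:
    """Hypothetical focus stack: push when a DS begins, pop when it completes."""

    def __init__(self):
        self._spaces = []

    def push(self, space):
        # A new (possibly embedded) discourse segment has begun.
        self._spaces.append(space)

    def pop(self):
        # The segment on top of the stack has been completed.
        return self._spaces.pop()

    def top(self):
        return self._spaces[-1] if self._spaces else None

    def is_salient(self, entity, search_lower=True):
        """Check the top space, and optionally the spaces beneath it."""
        if not self._spaces:
            return False
        spaces = reversed(self._spaces) if search_lower else [self._spaces[-1]]
        return any(entity in space.entities for space in spaces)


# Mirror the stack listed on the slide: S1 at the bottom, S2 on top.
stack = FocusStack()
stack.push(FocusSpace("S1", "DSP1",
                      {"duck", "dove", "Speaker", "duck_dove_supply"}))
stack.push(FocusSpace("S2", "DSP2",
                      {"scene", "Speaker", "snack_bar", "cocoa",
                       "friend", "home", "sanity"}))

dove_in_top_space = stack.is_salient("dove", search_lower=False)
dove_anywhere = stack.is_salient("dove")
finished = stack.pop()  # S2 is completed, so S1 is on top again
```

With S2 on top, "dove" is found only if the lower space is searched. Whether that lookup should be allowed is exactly the question the transition rules leave open.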