Recognizing Structure: Dialogue Acts and Segmentation
Julia Hirschberg
CS 6998
7/15/2016
Today
Recognizing structural information from speech
Topic structure
Speech/dialogue acts
Applications
Speech browsing and search of large corpora
Broadcast News (NIST TREC SDR track)
Topic Detection and Tracking (NIST/DARPA TDT)
Customer care call recordings, focus groups, voicemail
SCAN
SCAN demo
Discourse Structure and Topic Structure
Intention-based accounts
Grosz & Sidner ’86
Conversational moves (games)
Edinburgh Map Task dialogues
Adjacency pairs
Schegloff, Sacks, Jefferson
Indicators of Topic Structure
Cue phrases: now, well, first
Pronominal reference
Orthography and formatting (in text)
Lexical information (Hearst ’94, Reynar ’98, Beeferman et al. ’99)
In speech?
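The lexical-cohesion idea behind Hearst ’94 (TextTiling) can be sketched as follows: compare word counts in adjacent windows of sentences and flag gaps where similarity dips. This is a simplified sketch, assuming pre-tokenized sentences, a hypothetical window size `w`, raw term counts, and no smoothing or stopword removal; Hearst's algorithm works on fixed-size token sequences and smooths the scores. For speech, the input would be ASR transcripts.

```python
from collections import Counter
import math
from typing import List, Sequence


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gap_similarities(sentences: Sequence[List[str]], w: int = 2) -> List[float]:
    """Similarity across each gap g, i.e. between sentence g-1 and sentence g."""
    scores = []
    for g in range(1, len(sentences)):
        left = Counter(t for s in sentences[max(0, g - w):g] for t in s)
        right = Counter(t for s in sentences[g:g + w] for t in s)
        scores.append(cosine(left, right))
    return scores


def depth_scores(scores: Sequence[float]) -> List[float]:
    """TextTiling-style depth: how far each gap sits below the nearest peaks on both sides."""
    depths = []
    for i, s in enumerate(scores):
        left, j = s, i
        while j > 0 and scores[j - 1] >= left:  # climb toward the left peak
            left, j = scores[j - 1], j - 1
        right, j = s, i
        while j < len(scores) - 1 and scores[j + 1] >= right:  # climb toward the right peak
            right, j = scores[j + 1], j + 1
        depths.append((left - s) + (right - s))
    return depths


sents = [s.split() for s in [
    "the cat chased the dog",
    "the dog and cat are pets",
    "stocks fell in heavy trading",
    "trading in stocks resumed",
]]
sims = gap_similarities(sents, w=1)
depths = depth_scores(sims)
print(depths.index(max(depths)))  # 1: deepest dip is the gap before sentence 2
```

Deep valleys (high depth scores) are candidate topic boundaries; a threshold on depth, left as a free parameter here, would turn them into decisions.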
Prosodic Correlates of Discourse/Topic Structure
Pitch range
Lehiste ’75, Brown et al. ’83, Silverman ’86, Avesani & Vayra ’88, Ayers ’92, Swerts et al. ’92, Grosz & Hirschberg ’92, Swerts & Ostendorf ’95, Hirschberg & Nakatani ’96
Preceding pause
Lehiste ’79, Chafe ’80, Brown et al. ’83, Silverman ’86, Woodbury ’87, Avesani & Vayra ’88, Grosz & Hirschberg ’92, Passonneau & Litman ’93, Hirschberg & Nakatani ’96
Rate
Butterworth ’75, Lehiste ’80, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96
Amplitude
Brown et al. ’83, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96
Contour
Brown et al. ’83, Woodbury ’87, Swerts et al. ’92
Prosodic Cues to Sentence and Topic Boundaries: Shriberg et al. ’00
Prosodic cues perform as well as or better than text-based cues at topic segmentation; do they also generalize better?
Goal: identify sentence and topic boundaries at ASR-defined word boundaries
CART decision trees provided prosodic boundary predictions
An HMM combined these with lexical (language model) boundary predictions
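Shriberg et al. combine the tree outputs with lexical predictions inside an HMM. As a much-simplified sketch (not the paper's decoder), assume each candidate boundary is decided on its own and that the words and the prosodic features are conditionally independent given the boundary label. The tree posterior divided by the class prior then acts as a scaled likelihood that multiplies the LM posterior. The function names and the 0.5 threshold are illustrative assumptions.

```python
from typing import List, Sequence


def combine_boundary_posteriors(p_lex: float, p_pros: float, prior_b: float) -> float:
    """Combine P(B | words) and P(B | prosody) into P(B | words, prosody).

    Assumes conditional independence given the label, so
    P(label | w, x) is proportional to P(label | w) * P(label | x) / P(label).
    """
    score_b = p_lex * p_pros / prior_b
    score_n = (1 - p_lex) * (1 - p_pros) / (1 - prior_b)
    total = score_b + score_n
    if total == 0:
        raise ValueError("one model is certain of B and the other certain of no-B")
    return score_b / total


def label_boundaries(
    p_lex: Sequence[float],
    p_pros: Sequence[float],
    prior_b: float,
    threshold: float = 0.5,
) -> List[bool]:
    """Mark a boundary after word i when the combined posterior exceeds the threshold."""
    return [
        combine_boundary_posteriors(lx, pr, prior_b) > threshold
        for lx, pr in zip(p_lex, p_pros)
    ]


# If the LM posterior equals the prior (uninformative), the prosodic posterior passes through.
print(round(combine_boundary_posteriors(0.1, 0.9, prior_b=0.1), 3))  # 0.9
print(label_boundaries([0.1, 0.6], [0.2, 0.7], prior_b=0.1))  # [False, True]
```

Dividing by the prior keeps the class prior from being counted twice, once in each posterior.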
Features
For each potential boundary location:
Pause at boundary (raw and normalized by speaker)
Pause at the word before the boundary (does that word begin a new ‘turn’, or is it part of a continuous speech segment?)
Phone and rhyme duration, normalized by inherent duration (cue to phrase-final lengthening?)
F0 (smoothed and stylized): reset, range (topline, baseline), slope, and continuity
Voice quality (F0 halving/doubling estimates as correlates of creak or glottalization)
Speaker change, time from start of turn, number of turns in conversation, and speaker gender
Trained/tested on Switchboard and Broadcast News
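A few of the features listed above can be sketched from word-aligned ASR output. This is a minimal sketch under several assumptions: the `Word` record and its fields are hypothetical, F0 values are assumed to come from a pitch tracker plus stylization step not shown, "normalized by speaker" is implemented as a per-speaker z-score, and pitch reset is taken as the semitone difference between the last F0 before and the first F0 after the boundary. The study's exact normalizations may differ.

```python
import math
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List


@dataclass
class Word:
    text: str
    start: float     # seconds, from ASR word alignment
    end: float       # seconds
    speaker: str
    f0_first: float  # Hz, stylized F0 at word onset (hypothetical field)
    f0_last: float   # Hz, stylized F0 at word offset (hypothetical field)


def boundary_features(words: List[Word]) -> List[Dict[str, float]]:
    """Features for each candidate boundary between words[i] and words[i + 1]."""
    pauses = [max(0.0, words[i + 1].start - words[i].end) for i in range(len(words) - 1)]

    # Speaker normalization, assumed here to be a per-speaker z-score of pause length.
    by_speaker: Dict[str, List[float]] = {}
    for i, p in enumerate(pauses):
        by_speaker.setdefault(words[i].speaker, []).append(p)
    stats = {spk: (mean(ps), pstdev(ps)) for spk, ps in by_speaker.items()}

    feats = []
    for i, p in enumerate(pauses):
        mu, sd = stats[words[i].speaker]
        feats.append({
            "pause": p,
            "pause_z": (p - mu) / sd if sd > 0 else 0.0,
            # pause before words[i]: does the pre-boundary word start a new stretch of speech?
            "prev_pause": pauses[i - 1] if i > 0 else 0.0,
            # pitch reset in semitones; positive means F0 jumps up after the boundary
            "f0_reset_st": 12 * math.log2(words[i + 1].f0_first / words[i].f0_last),
            "speaker_change": float(words[i].speaker != words[i + 1].speaker),
        })
    return feats


words = [
    Word("okay", 0.00, 0.30, "A", 120.0, 100.0),
    Word("so", 0.35, 0.70, "A", 110.0, 100.0),
    Word("now", 1.50, 1.90, "A", 200.0, 150.0),
]
feats = boundary_features(words)
print(round(feats[1]["f0_reset_st"], 1))  # 12.0: F0 doubles across the long pause
```

The resulting dictionaries are the kind of per-boundary feature vector a decision tree classifier would consume.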
Sentence segmentation results
• Prosodic features
• Better than LM for BN
• Worse than LM on reference transcripts, and about the same on ASR transcripts, for SB
• All better than chance
• Useful features for BN
• Pause at boundary, turn/no turn, F0 difference across boundary, rhyme duration
• Useful features for SB
• Phone/rhyme duration before boundary, pause at boundary, turn/no turn, pause at preceding word boundary, time in turn
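The results above are stated relative to chance. A minimal sketch of such a comparison, assuming "chance" means always predicting the majority label (usually "no boundary") and that error is counted per candidate word boundary; the labels in the example are invented for illustration.

```python
from collections import Counter
from typing import Sequence


def error_rate(ref: Sequence[bool], hyp: Sequence[bool]) -> float:
    """Fraction of candidate word boundaries whose predicted label is wrong."""
    if len(ref) != len(hyp) or not ref:
        raise ValueError("ref and hyp must be non-empty and the same length")
    return sum(r != h for r, h in zip(ref, hyp)) / len(ref)


def chance_error_rate(ref: Sequence[bool]) -> float:
    """Error of always predicting the most frequent label (majority-class baseline)."""
    majority_label, _ = Counter(ref).most_common(1)[0]
    return error_rate(ref, [majority_label] * len(ref))


ref = [False, False, True, False, False, False, True, False]  # True = boundary here
hyp = [False, False, True, False, False, False, False, False]  # misses the second boundary
print(error_rate(ref, hyp), chance_error_rate(ref))  # 0.125 0.25
```

Because boundaries are rare, the chance baseline is already low, so "better than chance" is a meaningful but modest bar.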
Topic segmentation results (BN only):
• Useful features
• Pause at boundary, F0 range, turn/no turn, gender, time in turn
• Prosody alone better than LM
• Combining prosody with the LM improves results significantly
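Topic segmentations are often scored with the Pk measure of Beeferman et al. ’99 (cited above); this sketch is not necessarily the metric used in the study. It assumes each segmentation is given as one segment id per unit (e.g. per sentence), with ids unique per segment, and that k defaults to half the mean reference segment length.

```python
from typing import Optional, Sequence


def num_segments(seg_ids: Sequence[int]) -> int:
    """Count segments as runs of identical consecutive ids."""
    return 1 + sum(a != b for a, b in zip(seg_ids, seg_ids[1:]))


def pk(ref: Sequence[int], hyp: Sequence[int], k: Optional[int] = None) -> float:
    """Probability that two units k apart are wrongly judged same-segment vs. different-segment."""
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must cover the same units")
    n = len(ref)
    if k is None:
        k = max(1, round(n / (2 * num_segments(ref))))
    if n - k <= 0:
        raise ValueError("window k must be smaller than the number of units")
    errors = 0
    for i in range(n - k):
        same_ref = ref[i] == ref[i + k]
        same_hyp = hyp[i] == hyp[i + k]
        errors += same_ref != same_hyp
    return errors / (n - k)


ref = [0, 0, 0, 1, 1, 1]
print(pk(ref, ref))           # 0.0: perfect segmentation
print(pk(ref, [0] * 6, k=2))  # 0.5: the single topic change is missed
```

Lower is better; unlike exact boundary matching, Pk gives partial credit to near-miss boundaries.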
Speech Act Theory
John Searle
Locutionary acts: semantic meaning
Illocutionary acts: ask, promise, answer, threaten
Perlocutionary acts: effect intended to be produced on the hearer: regret, fear
Dialogue acts
Many tagging schemes (e.g. DAMSL)
Practical Motivations: Spoken Dialogue Systems
Add more information about speaker intentions
Disambiguate utterances with several possible functions, e.g.
Okay
Um
Right
Experimental Evidence: Nickerson & Chu-Carroll ’99
Can/would/would…willing questions
Can you move the piano?
Would you move the piano?
Would you be willing to move the piano?
À la Sag & Liberman ’75: can intonation disambiguate?
Experiments
Production studies:
Subjects read ambiguous questions in
disambiguating contexts
Control for given/new and contrastiveness
Polite/neutral/impolite
Problems:
Cells imbalanced
No pretesting
No distractors
Same speaker reads both contexts
Results
Indirect requests
If L%, more likely (73%) to be indirect
46% H%: differences in height of boundary
tone?
Politeness: can questions differ between impolite (higher rise) and neutral readings
Variation in speaker strategy
Corpus Studies: Jurafsky et al. ’98
Lexical, acoustic/prosodic, and syntactic differentiators for yeah, ok, uhuh, mhmm, um…
Continuers: Mhmm (not taking floor)
Assessments: Mhmm (tasty)
Agreements: Mhmm (I agree)
Yes answers: Mhmm (That’s right)
Incipient speakership: Mhmm (taking floor)
Corpus Study
Switchboard telephone conversation corpus
Hand segmented and labeled with DA information (initially from text)
Relabeled for this study
Analyzed for
Lexical realization
F0 and RMS energy features
Syntactic patterns
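The F0 and RMS energy features analyzed here can be approximated from raw samples. A minimal pure-Python sketch, assuming a mono signal as a list of floats and one analysis frame at a time; the autocorrelation peak-picking is a textbook approximation, not the pitch tracker used in the study (real trackers add voicing decisions and correct octave errors, the halving/doubling issue mentioned earlier). The search range `fmin`/`fmax` is an assumed default.

```python
import math
from typing import Sequence


def rms_db(frame: Sequence[float], ref: float = 1.0) -> float:
    """Root-mean-square energy of a frame, in dB relative to `ref`."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20 * math.log10(rms / ref) if rms > 0 else float("-inf")


def autocorr_f0(frame: Sequence[float], sr: int, fmin: float = 75.0, fmax: float = 400.0) -> float:
    """Estimate F0 (Hz) as sr / lag of the strongest autocorrelation peak in [fmin, fmax]."""
    min_lag = int(sr / fmax)
    max_lag = min(int(sr / fmin), len(frame) - 1)
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # unnormalized autocorrelation; fewer terms at long lags favors the true period
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sr / best_lag


sr = 8000
frame = [math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]  # 100 ms, 200 Hz tone
print(autocorr_f0(frame, sr))   # 200.0
print(round(rms_db(frame), 2))  # -3.01
```

Per-token features such as mean F0 or mean energy would then be averages of these frame values over the token's time span.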
Results: Lexical Differences
Agreements
yeah (36%), right (11%),...
Continuer
uhuh (45%), yeah (27%),…
Incipient speaker
yeah (59%), uhuh (17%), right (7%),…
Yes-answer
yeah (56%), yes (17%), uhuh (14%),...
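Percentages like those above are conditional distributions P(word | DA): for each dialogue act, the share of its tokens realized by each word. A minimal sketch, assuming the corpus is available as (word, DA label) pairs; the toy data is invented and does not reproduce the study's counts.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def realization_distribution(tokens: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Map each DA label to {word: P(word | DA)}, most frequent word first."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for word, da in tokens:
        counts[da][word.lower()] += 1
    result = {}
    for da, word_counts in counts.items():
        total = sum(word_counts.values())
        result[da] = {w: c / total for w, c in word_counts.most_common()}
    return result


toy = [
    ("yeah", "agreement"), ("right", "agreement"), ("Yeah", "agreement"),
    ("uhuh", "continuer"), ("yeah", "continuer"),
]
dist = realization_distribution(toy)
print(dist["continuer"])  # {'uhuh': 0.5, 'yeah': 0.5}
```

The same word (here yeah) appears under several DAs, which is exactly why lexical identity alone cannot settle the DA.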
Results: Prosodic and Syntactic Cues
Relabeling from speech changes only 2% of labels overall (114/5757)
43/987 continuers → agreements
Why?
Shorter duration, lower F0, lower energy, longer preceding pause
Across all DAs, duration is the best differentiator, but…
It is highly correlated with length in words
Assessments: That’s X (good, great, fine,…)
Future Work
Speaker differences?
Higher-level prosodic differences among DAs realized by the same ambiguous word?
Next Week
Turn-taking and disfluencies