Turn-Taking, Grounding and Speaker Segmentation Julia Hirschberg LSA07 353

7/15/2016
Today
• Turn-taking behaviors in human-human
conversation
– Conversational Analysis accounts
• Task/circumstance/individual dependencies
• Linguistic/cultural differences
– Grounding analyses
• Diarization: Automatic Turn Identification
Turn-taking Behavior
• How do speakers know when it is appropriate to
contribute to a conversation?
• Conversational Analysis Theory: Conversational
partners expect certain patterns of behavior in
normal conversation
Pat: You got an A? That’s great!
Chris: Yeah, I’m really smart you know.
Chris: Well, I was just lucky I happened to read the
chapter on dialogue systems right before the test.
Otherwise I never would have squeaked through.
– General patterns in ordinary conversation
– Deviation is significant
• Children learn turn taking within first 2 years
(Stern ’74)
– Children liked by their peers are more skilled (Black &
Hazen ’90)
• General individual differences
– Shy people pause longer and speak less and less
often (Pilkonis ’77)
– Schizophrenics, neurotics, depressed people less
skilled in turn-taking
Expectations of What to Say Depend on Task at
Hand
• Telephone
– Openings
Pat: Hello?
Chris: Hi, Pat. It’s Chris.
Pat: Hi!
– Closings (6-turn)
Chris: Well, I just wanted to see how you were doing
Pat: Thanks for calling. We'll have to have lunch sometime
Chris: I'd like to
Pat: Okay
Chris: Okay
Pat: See you
Chris: Yeah, see you
• Email
Pat: “Hi, can we switch lunch to 12:30? I’m running late.”
Chris: “Sure. 12:30.”
Pat: “Great. See you.”
• Service encounters
Clerk: Good morning. Is there something I can help you with?
Pat: Hi. Yeah. I wonder if you could show me….
• Meetings
Boss: Today I want to focus on next year’s goal statements. Chris,
could you report please….
Chris: …
Boss: Pat, now let’s hear from you…
Pat: …
• News broadcasts
Anchor: …Chris Smith reports from Rome now on the upcoming
conclave. Chris?
Reporter: Thanks, Pat….. And now back to Pat Jones in New
York.
Conversational Analysis (Sacks et al ’74)
• Can we characterize expectations of ‘what to
say’ more generally?
• ‘Rules’ of turn-taking
– If, during this turn, the current speaker has selected A
as the next speaker, then A must speak next
– If the current speaker does not select the next
speaker, any other speaker may take the next turn
– If no one else takes the next turn, the current speaker
may take the next turn
• Rules Apply at Transition Relevant Places
(TRPs) where something allows speaker
changes to occur
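The three rules above can be read as an ordered procedure applied at each TRP. The sketch below is only an illustration of that ordering, not part of the Sacks et al. account: the function name `next_speaker`, its parameters, and the "first to start talking wins" tie-break under the second rule are all assumptions made for the example.

```python
from typing import Optional, Sequence


def next_speaker(current: str,
                 selected: Optional[str],
                 self_selectors: Sequence[str]) -> str:
    """Choose the next speaker at a TRP by applying the three rules in order.

    current        -- who holds the turn now
    selected       -- whom the current speaker selected, or None
    self_selectors -- other participants who start talking, in order
                      (hypothetical input; the tie-break is an assumption)
    """
    # Rule 1: the current speaker selected A, so A must speak next.
    if selected is not None:
        return selected
    # Rule 2: nobody was selected, so any other speaker may take the turn.
    for speaker in self_selectors:
        if speaker != current:
            return speaker
    # Rule 3: no one else took the turn, so the current speaker may continue.
    return current


print(next_speaker("Boss", "Chris", ["Pat"]))  # Boss selects Chris
print(next_speaker("Pat", None, ["Chris"]))    # Chris self-selects
print(next_speaker("Pat", None, []))           # Pat continues
```

In the meeting example, the Boss selects Chris by name, so the first rule applies and no one else may take the turn.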
Where Can Speaker Shifts Occur
• Adjacency pairs
– Question/answer
– Greeting/greeting
– Compliment/downplayer
• Dispreferred responses
– Silence
– ‘No’ to a simple request without explanation
– Changing the topic abruptly without transition
– Important for Spoken Dialogue Systems
Cultural Differences in Turn-Taking
• Chinese telephone conversations
– Openings (Zhu ’04)
• Mandarin vs. British
• Identification differences
– British self-report
– Chinese callees ask the caller
– Closings (Sun ’05)
• 39 female-female telephone conversations
• Closings initiated through matter-of-fact statement of
intention to end conversation
• Verbalized thanking occurs except in mother/daughter
closings – not the standard English model
– Finnish business calls (Halmari ’93) vs. American
• Americans get right to the point
• Finns chat
But where is the intent? Purpose?
Grounding Approaches to Conversational
Modeling
• Conversation is a joint process through which S
and H are constantly negotiating a common
ground (Stalnaker ’78, Clark ’96 inter alia)
– Cf mutual belief
– Principle of Closure: agents performing an action
require evidence that they have succeeded (Norman
’88)…or not
– Clark & Schaefer ’89
• Presentation (by S) and Acceptance (by H) via
– Continued attention, relevant next contribution,
acknowledgement/assessment, demonstration, display
• S: Jon Stewart is my favorite comedian
– H: {continued attention}
– H: The Daily Show is not to miss {relevant next contribution}
– H: Mhmm {acknowledgement}
– H: He’s the funniest person you know {demonstration}
– H: Your favorite comedian {display}
Importance in SDS
• Turn-taking models and theories of grounding of
considerable potential use in SDS
– What is the User likely to say next and when?
– How can we be sure what the User has said and its
relationship to what s/he believes to be true?
– What type of response does s/he expect the system
to make? When?
• Obstacles for practical use:
– What cues signal when it is appropriate to speak?
– How do we negotiate a common system/user ground?
When Is It Appropriate to Speak? (Beattie ’82)
• Data: 25m televised interviews before 1979
British General election
– Margaret Thatcher (Tory leader): the Iron Lady
– Jim Callaghan (Prime Minister): Sunny Jim
• Who interrupts?
– Less intelligent, highly neurotic, extroverted
– Men interrupt women
– Interruptions may indicate
• Desire for dominance
• Desire for social approval
• Conveyance of ‘joint enthusiasm’, heightened
involvement
• Beattie’s classification scheme:
– Identify spkr 2 attempts to take the turn
• Smooth switches: no simultaneous speech, spkr
1’s utterance complete, turn to spkr 2
• Simple interruptions: simultaneous speech, spkr 1
doesn’t complete utterance, turn to spkr 2
• Overlap: simultaneous speech, spkr 1 completes
utterance, turn to spkr 2
• Butting-in: simultaneous speech but no change of
turn, spkr 1 keeps the turn
• Silent interruption: spkr 1’s utterance incomplete,
no simultaneous speech, turn to spkr 2
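The five categories above are distinguished by three yes/no observations: whether there was simultaneous speech, whether spkr 1 completed the utterance, and whether the turn passed to spkr 2. A minimal sketch of that decision table follows; the function name `classify_attempt` and the "not in scheme" label for combinations the list does not cover are assumptions for illustration.

```python
def classify_attempt(simultaneous_speech: bool,
                     first_speaker_completes: bool,
                     turn_changes: bool) -> str:
    """Label one attempt by spkr 2 to take the turn, following the list above."""
    if not turn_changes:
        # The only listed category in which spkr 1 keeps the turn.
        if simultaneous_speech:
            return "butting-in"
        # Not covered by the list above; this label is an assumption.
        return "not in scheme"
    if simultaneous_speech:
        # Turn changes with simultaneous speech.
        return "overlap" if first_speaker_completes else "simple interruption"
    # Turn changes without simultaneous speech.
    return "smooth switch" if first_speaker_completes else "silent interruption"


print(classify_attempt(simultaneous_speech=True,
                       first_speaker_completes=False,
                       turn_changes=True))
```

Written this way, the scheme makes clear that only "simple interruption" and "silent interruption" cut spkr 1 off before the utterance is complete.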
– Analyze acoustic/prosodic and gestural information
• Turn-yielding behavior
– Pauses
– Speaking rate slows
– Drawl at end of clause
– Drop in pitch or loudness
– Completion of syntactic clause
– Gesture of termination
• Attempt suppression signals
– Filled pauses
– Gestures
Results
• Thatcher is interrupted almost twice as often as
she interrupts the interviewer (19 vs. 10), unlike
Callaghan (14 vs. 23)
– Thatcher: Starts slow and gets faster, few FPs (4)
– Callaghan: starts fast and gets slower, many FPs (22)
• Public perception: Thatcher is domineering in
interviews and Callaghan is a ‘nice guy’
– But Thatcher does not dominate
– Why is Thatcher interrupted?
• Interruptions come at the end of a syntactic clause
when there is a drawl on the stressed syllable in the
clause and falling intonation
• No suppression signals
– Why does she do this?
• Speech training before election?
– Why is she still perceived as domineering?
• When interrupted she doesn’t cede the floor
despite lengthy stretches of simultaneous speech
Automatic Speaker
Identification/Segmentation
• Diarization: Segmentation of audio corpora
(Broadcast News, meetings, telephone
conversations) into speaker segments
– Speaker turns
– Speaker identification
– Speech and music
• Speaker segmentation
– Initial segmentation
– Segment clustering based on acoustic features
– State-of-the-art: 8.47% error
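To make the "segment clustering based on acoustic features" step concrete, here is a toy sketch that groups already-segmented audio by speaker. It assumes each segment has been reduced to one small feature vector (a hypothetical stand-in for real acoustic features) and merges the closest clusters until they are farther apart than a threshold. Plain Euclidean distance and the fixed threshold are simplifications chosen for the example, not the method behind the error rate quoted above.

```python
import math
from typing import List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Mean vector of a non-empty group of feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def cluster_segments(features: Sequence[Sequence[float]],
                     threshold: float) -> List[int]:
    """Return one cluster label per segment (same label = same speaker)."""
    clusters = [[i] for i in range(len(features))]  # start: one per segment
    while len(clusters) > 1:
        # Find the pair of clusters whose centroids are closest.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = centroid([features[i] for i in clusters[a]])
                cb = centroid([features[i] for i in clusters[b]])
                d = math.dist(ca, cb)
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:  # nothing similar enough is left to merge
            break
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    labels = [0] * len(features)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Five segments; made-up 2-d features place them near two "speakers".
segments = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.1]]
print(cluster_segments(segments, threshold=1.0))
```

The labels only say which segments belong together; attaching a name to each cluster is the separate speaker identification problem.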
<DOC>
<DOCNO> CNN19980104.1130.0000 </DOCNO>
<DOCTYPE> MISCELLANEOUS TEXT (automatic initial) </DOCTYPE>
<DATE_TIME> 01/04/1998 11:30:00.00 </DATE_TIME>
<BODY>
<TEXT>
</TEXT>
</BODY>
<END_TIME> 01/04/1998 11:30:34.71 </END_TIME>
</DOC>
<DOC>
<DOCNO> CNN19980104.1130.0034 </DOCNO>
<DOCTYPE> NEWS STORY </DOCTYPE>
<DATE_TIME> 01/04/1998 11:30:34.71 </DATE_TIME>
<BODY>
<TEXT>
in northern kentucky are forcing 3,000 people in two states to flee their
homes.
the fire started early this morning at the cargill company plant in
maysville near the ohio river.
authorities have been going door-to-door advising people in kentucky and
ohio to take shelter in area high schools.
the fire is in a building where several fertilizers and chemicals are
stored.
officials say all they can do is let the fire burn itself out, because
spraying water on the flames would be too dangerous.
<TURN>
at the current time, our only way of getting it under control is to stay
away from it.
we've backed everyone off from the fire by about a mile and a quarter and
evacuated homes in that radius and the chief threat at this point is a very
small risk of a very large explosion caused by 400 tons of ammonia nitrate
stored in the building.
<TURN>
four people have been taken to hospitals.
one firefighter was injured and treated on the scene.
</TEXT>
</BODY>
<END_TIME> 01/04/1998 11:31:31.00 </END_TIME>
</DOC>
<DOC>
<DOCNO> CNN19980104.1130.0091 </DOCNO>
<DOCTYPE> NEWS STORY </DOCTYPE>
<DATE_TIME> 01/04/1998 11:31:31.00 </DATE_TIME>
<BODY>
<TEXT>
authorities in brooklyn, new york, say an explosion at a tire company has
caused at least three buildings to collapse.
it set off a four-alarm fire, which has been contained.
officials tell cnn one person was injured.
investigators have not determined the cause of the incident.
</TEXT>
</BODY>
<END_TIME> 01/04/1998 11:31:48.11 </END_TIME>
</DOC>
<DOC>
<DOCNO> CNN19980104.1130.0108 </DOCNO>
<DOCTYPE> NEWS STORY </DOCTYPE>
<DATE_TIME> 01/04/1998 11:31:48.11 </DATE_TIME>
<BODY>
<TEXT>
unexpected weather conditions are the rule across much of the united states
this weekend.
angela astore reports.
<TURN>
<ANNOTATION> Reporter: </ANNOTATION>
it was a nice day to play along the beach -- spend a few hours fishing --
or get in a game of golf -- not uncommon -- unless it's january in chicago.
record high temperatures were set yesterday from minnesota to
massachusetts.
warm air drawn northward from the gulf of mexico was behind the rise in the
mercury.
it was a different scene in the northwest, where snow is the story.
but the winter weather didn't stop this man from getting in some warmer
pursuits.
and he wasn't bothered by the fact that he couldn't see where his golf
balls landed.
<TURN>
it's not really where it's going to land that's important at this point
while you are learning.
once you've learned, then it is.
we'll worry about that when the snow clears.
right now, it's probably better that i don't see where they land.
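In the sample above, each `<TURN>` tag inside a `<TEXT>` body marks a speaker change. A minimal sketch of recovering the turns from one `<DOC>` record is shown below. The function name `split_turns` is an assumption, and regular expressions are used instead of an XML parser because records like the last one may be cut off before their closing tags.

```python
import re
from typing import List

# Capture the <TEXT> body; tolerate a record cut off before </TEXT>.
TEXT_RE = re.compile(r"<TEXT>(.*?)(?:</TEXT>|\Z)", re.DOTALL)
# Annotations such as "<ANNOTATION> Reporter: </ANNOTATION>" are not speech.
ANNOTATION_RE = re.compile(r"<ANNOTATION>.*?</ANNOTATION>", re.DOTALL)


def split_turns(doc: str) -> List[str]:
    """Return the speaker turns in the first <TEXT> body of one record."""
    match = TEXT_RE.search(doc)
    if match is None:
        return []
    body = ANNOTATION_RE.sub("", match.group(1))
    # Each <TURN> starts a new speaker turn; collapse line breaks in a turn.
    turns = [" ".join(chunk.split()) for chunk in body.split("<TURN>")]
    return [turn for turn in turns if turn]


record = """<DOC>
<TEXT>
angela astore reports.
<TURN>
<ANNOTATION> Reporter: </ANNOTATION>
record high temperatures were set yesterday.
<TURN>
we'll worry about that when the snow clears.
</TEXT>
</DOC>"""
for number, turn in enumerate(split_turns(record), start=1):
    print(number, turn)
```

The markup records that the speaker changed but not who is speaking, which is why speaker identification has to be treated as a separate step.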
Speaker Identification
– Linguistic information to identify speaker types and
speaker names (LIMSI ’04)
• Templates (“<name> has this report from
<location>”)
• Results: 10.9% error on test set
– But only 10% of segments contain relevant patterns
– Estimate 25% error on broadcast news if segmentation
and clustering is done to id all of each speaker’s
segments
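A template such as “<name> has this report from <location>” can be approximated with a regular expression over the lowercased transcript. The sketch below is illustrative only and does not reproduce the LIMSI system: the two-word-name restriction, the second pattern (“<name> reports”, modeled on “angela astore reports” in the sample transcript), and the function name `upcoming_speaker` are assumptions.

```python
import re
from typing import Optional

# Hypothetical hand-written templates; names are assumed to be two words.
TEMPLATES = [
    re.compile(r"(?P<name>[a-z]+ [a-z]+) has this report from "
               r"(?P<location>[a-z ]+)"),
    re.compile(r"(?P<name>[a-z]+ [a-z]+) reports\b"),
]


def upcoming_speaker(sentence: str) -> Optional[str]:
    """Return the speaker name announced by a hand-off sentence, if any."""
    text = sentence.lower()
    for template in TEMPLATES:
        match = template.search(text)
        if match:
            return match.group("name")
    return None


print(upcoming_speaker("angela astore reports."))
print(upcoming_speaker("Chris Smith has this report from Rome."))
print(upcoming_speaker("officials tell cnn one person was injured."))
```

Most sentences match no template at all, as the last call shows, which is consistent with the point above that only about 10% of segments contain relevant patterns.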
Online Turn Identification for SDS
• Push-to-talk systems
• Silence detection
• Speech detection
– Barge-in
• Need more ‘natural’ turn-taking support
– When are users ready to be interrupted?
– When do they want to keep the floor?
– When do they expect the system to backchannel?
– How can we indicate when the system has finished its
turn?
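Silence detection, the simplest of the approaches above, can be sketched as a threshold on per-frame energy: the system takes the turn once the user has spoken and then stayed quiet for a fixed number of frames. The function name `end_of_turn_frame`, the energy values, and both thresholds below are made up for illustration.

```python
from typing import Optional, Sequence


def end_of_turn_frame(energies: Sequence[float],
                      silence_threshold: float,
                      min_silent_frames: int) -> Optional[int]:
    """Index of the frame at which the system would take the turn, or None.

    A frame counts as silent when its energy is below silence_threshold.
    The turn is judged finished after min_silent_frames consecutive silent
    frames following at least one speech frame.
    """
    heard_speech = False
    silent_run = 0
    for index, energy in enumerate(energies):
        if energy >= silence_threshold:
            heard_speech = True
            silent_run = 0          # speech resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= min_silent_frames:
                return index
    return None


# Speech, a short pause, more speech, then a longer pause.
frames = [0.0, 0.9, 0.8, 0.1, 0.7, 0.1, 0.0, 0.1, 0.6]
print(end_of_turn_frame(frames, silence_threshold=0.5, min_silent_frames=3))
```

A detector like this cannot tell a mid-turn pause from a finished turn: set the frame count too low and it interrupts users who want to keep the floor, set it too high and the system responds sluggishly. That is one reason the questions above call for more than silence.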
Conclusions
• Turn-taking models and theories of grounding of
considerable potential use in SDS
– What is the User likely to say next and when?
– How can we be sure what the User has said and its
relationship to what s/he believes to be true?
– What type of response does s/he expect the system
to make? When?
• Obstacles for practical use:
– What cues signal when it is appropriate to speak?
– How do we negotiate a common system/user ground?
Next Class
• We know a few things we need to accomplish
and a bit about the difficulties….
• What tools do we have to use in tackling the
problems?
• Components of SDS:
– Automatic Speech Recognition
– Text-to-Speech
• Readings: J&M 22.2