Dan Bohus
Dialogs on Dialogs Group, October 2003
We’re trying to build systems that can deal with a noisy recognition channel
Q: How good are humans at that?
More importantly, how do they do it?
What strategies do they use?
How do they decide which one to use when?
What kind of knowledge is used in the process?
Modify the WOZ setting so that the wizard does not hear the user, but rather receives the recognition result
(text in these cases)
Exploring Human Error Handling Strategies
[Gabriel Skantze]
A Study of Human Dialogue Strategies in the Presence of Speech Recognition Errors
[Teresa Zollo]
Exploring Human Error Handling Strategies
[Gabriel Skantze]
Problem-solving task
Wizard is guiding user through a campus
Wizard has detailed map
User has small fraction of map with their current surroundings
Experiments
8 users, 8 operators, balanced male/female
5 scenarios per user → 40 dialogs
Wizard receives recognition results on a GUI
Not parsed (the wizard also plays the parser)
Confidence denoted by color intensity
Users know they are talking to a human
A conventional WOZ setup is more costly
Hard to maintain subjects for longitudinal studies
Conflicting information on change in linguistic patterns when speaking to a machine vs. to a human
Operators are naïve; they are also subjects of the study
43% WER, 7.3% OOV
Manual labeling of operator understanding
Full understanding
Partial understanding
Non-understanding
Misunderstanding
Very few misunderstandings
Operators good at rejecting
Users thought they were almost always understood
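For reference, the WER figure quoted above is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. A minimal sketch, assuming whitespace tokenization and unit costs for substitutions, insertions and deletions; the function name and the example sentences are illustrative and not taken from the study:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one deleted word out of five reference words.
wer = word_error_rate("turn left at the library", "turn left at library")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.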
3 main operator strategies (approximately equally distributed) for dealing with non- and partial understandings:
1. Continuation of route description
2. Signal of non-understanding
3. Task-related question
PARADISE-like regression indicates that strategy 2 is negatively correlated with users’ answers to “how well do you think you did?”
A Study of Human Dialogue Strategies in the Presence of Speech Recognition Errors
[Teresa Zollo]
TRIPS-Pacifica: planning the evacuation of the fictitious island Pacifica
Construct a plan to transport all the civilians on Pacifica to Barnacle by 5 am so that they can be evacuated from the island (the plan will be deployed at midnight)
+ the road between Calypso and Ocean Beach is impassable
Only 7 dialogs (September ’99)
Wizard assisted by GUI for quick information access and generating synthesized responses
Sphinx-2 (CMU), TrueTalk (Entropic)
Wizard receives string of words (paper does not mention confidence scores)
User debriefing questionnaire
Wizard annotates interaction transcript with knowledge sources used in decisions, etc…
Small corpus
7 dialogs
348 utterances
Manually labeled misunderstandings
Overall WER: 30%
Looked at positive and negative feedback
Request for full repetition: 33/80
24/33 cases users complied and repeated/rephrased
WH-replacement of missing or erroneous word: 12/80
8/12 cases users responded with the precise info
Attempt to salvage correct word: 20/80
Possibly increases user satisfaction?
User responses similar to those for a request for repetition
Request for verification: 15/80
10/15 responded by explicit affirmations
Wizards gave negative feedback in 80 cases (35%) of the total 227 utterances recognized incorrectly
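As a quick consistency check on the counts reported for the Zollo study, the four negative-feedback types should add up to the 80 cases, and 80 out of 227 should come to about 35%. A small sketch that tallies them and derives the user response rates; the dictionary layout and names are illustrative, and only the numbers come from the slides:

```python
# (times used by wizard, times the user responded as intended); None = not reported
NEGATIVE_FEEDBACK = {
    "request for full repetition": (33, 24),
    "WH-replacement of missing or erroneous word": (12, 8),
    "attempt to salvage correct word": (20, None),
    "request for verification": (15, 10),
}
MISRECOGNIZED_UTTERANCES = 227

total_negative = sum(used for used, _ in NEGATIVE_FEEDBACK.values())
share_of_misrecognized = total_negative / MISRECOGNIZED_UTTERANCES


def response_rate(strategy: str):
    """Fraction of uses where the user responded as intended, if reported."""
    used, responded = NEGATIVE_FEEDBACK[strategy]
    return None if responded is None else responded / used


for name, (used, _) in NEGATIVE_FEEDBACK.items():
    rate = response_rate(name)
    rate_text = "n/a" if rate is None else f"{rate:.0%}"
    print(f"{name}: {used}/{total_negative} uses, response rate {rate_text}")
print(f"negative feedback on {share_of_misrecognized:.0%} of misrecognized utterances")
```

The counts are consistent: 33 + 12 + 20 + 15 = 80, and users responded as intended in roughly two thirds to three quarters of the cases where a rate is reported.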
Compensation for ASR:
Ignoring words that are not salient in the TRIPS domain
Hypothesizing correct words based on phonetic similarity
Q: So, what does that say? Better parsing?
Positive feedback:
Using an acknowledgement term (okay, right)
Simple response to question (next relevant contribution)
Conversational/social response i.e. greetings/thanks
Providing a next unsolicited relevant contribution
Clarifying or correcting
Paraphrasing
Observations consistent with theoretical grounding models (Clark et al.)
Negative feedback only when really needed
Unless ASR is perfect (and sometimes even then), wizards give explicit indications of their understanding
WOZ setting…
Wizard = Parser + Dialog Manager
It seems that humans can extract more info from text than current parsers
Do we need better, more robust parsers?
How about Wizard = Dialog Manager?
Domain choice
Skantze results make sense in chosen domain
How can such results hold across domains?