Error Detection in Human-Machine Interaction Dan Bohus

DoD Group, Oct 2002

Errors in Spoken-Language Interfaces
- Speech recognition is problematic:
  - Input signal quality
  - Accents, non-native speakers
  - Spoken language disfluencies: stutters, false starts, /mm/, /um/
- Typical word error rates in spoken dialog systems (SDS): 10-30%
- Systems today lack the ability to recover gracefully from errors

An example
S: Are you a registered user?
U: No I'm not. No [NO I'M NOT NO]
S: What is your full name?
U: Adam Schumacher [ATHENS IN AKRON]
S: There is an Athens in Georgia and in Greece. Which destination did you want?
U: Neither [NEITHER]
S: I'm sorry, I didn't catch that. There is an Athens in Georgia and in Greece. Which destination did you want?
U: Georgia [GEORGIA]
S: A flight from Athens... Where do you want to go?
U: Start over [START OVER]
S: Do you really want to start over?
U: Yes [YES]
S: What city are you leaving from?
U: Hamilton, Ontario [HILTON ONTARIO]
S: Sorry, I'm not sure I understood what you said. Where are you leaving from?
U: Hamilton [HILTON]
S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Toronto [TORONTO]

Pathway to a solution
- Make systems aware of unreliability in their inputs
  - Confidence scores
- Develop a model which learns to optimally choose between several prevention/repair strategies
  - Identify strategies
  - Express them in a computable manner
  - Develop the model

Papers
- Error Detection in Spoken Human-Machine Interaction [E. Krahmer, M. Swerts, M. Theune, M. Weegels]
- Problem Spotting in Human-Machine Interaction [E. Krahmer, M. Swerts, M. Theune, M. Weegels]
- The Dual of Denial: Disconfirmations in Dialogue and Their Prosodic Correlates [E. Krahmer, M. Swerts, M. Theune, M.
Weegels]

Goals
- [Let’s look at dialog on page 2]
- (1) Analysis of positive and negative cues we use in response to implicit and explicit verification questions
- (2) Explore the possibilities of spotting errors on-line

Explicit vs. Implicit
- Explicit
  - Presumably easier for the system to verify
  - But there’s evidence that it’s not as easy …
  - Leads to more turns, less efficiency, frustration
- Implicit
  - Efficiency
  - But induces a higher cognitive burden, which can result in more confusion
  - ~ Systems don’t deal very well with it…

Clark & Schaefer
- Grounding model
  - Presentation phase
  - Acceptance phase
- Various indicators
  - Go ON / YES
  - Go BACK / NO
- Can we detect them reliably (when following implicit and explicit verification questions)?

Positive and Negative Cues
Positive              Negative
Short turns           Long turns
Unmarked word order   Marked word order
Confirm               Disconfirm
Answer                No answer
No corrections        Corrections
No repetitions        Repetitions
New info              No new info

Experimental Setup / Data
- 120 dialogs: Dutch SDS providing train timetable information
- 487 utterances
- 44 (~10%) not used
  - Users accepting a wrong result
  - Barge-in
  - Users starting their own contribution
- Left 443 resulting adjacent S/U utterances

Results – Nr of words
            Explicit   Implicit
~Problems   1.68       3.21
Problems    3.44       7.12

Results – Empty turns (%)
            Explicit   Implicit
~Problems   0%         3.4%
Problems    2.6%       10.3%

Results – Marked word order %
            Explicit   Implicit
~Problems   3.3%       1.2%
Problems    4.4%       26.9%

Results – Yes/No
                   ~Problems   Problems
Explicit   Yes     92.8%       6.1%
           No      0%          56.6%
           Other   7.1%        37.1%
Implicit   Yes     0%          0%
           No      0%          15.4%
           Other   100% ?      84.6%

Results – Repeated/Corrected/New
                       ~Problems   Problems
Explicit   Repeated    8.5%        23.9%
           Corrected   0%          72.6%
           New         11.4%       12.4%
Implicit   Repeated    2.4%        61.0%
           Corrected   0%          92.3%
           New         53.6%       36.5%

First conclusion
- People use more negative cues when there are problems
- And even more so for implicit confirmations (vs.
explicit ones)

How well can you classify
- Using individual features
  - Look at precision/recall
  - Explicit: absence of confirmation
  - Implicit: non-zero number of corrections
- Multiple features
  - Used memory-based learning
  - 97% accuracy (majority baseline 68%)
  - Confirm + Correct is winning, although individually less good
- This is overall, right? How about for explicit vs. implicit?

BUT !!!
- How many of these features are available on-line?

Positive              Negative
Short turns           Long turns
Unmarked word order   Marked word order
Confirm               Disconfirm
Answer                No answer
No corrections ?      Corrections ?
No repetitions ?      Repetitions ?
New info ?            No new info ?

What else can we throw at it?
- Prosody (next paper)
- Lexical information
- Acoustic confidence scores
  - Maybe also of previous utterances
- Repetitions/Corrections/New info on transcript?
- …

Papers
- Error Detection in Spoken Human-Machine Interaction [E. Krahmer, M. Swerts, M. Theune, M. Weegels]
- Problem Spotting in Human-Machine Interaction [E. Krahmer, M. Swerts, M. Theune, M. Weegels]
- The Dual of Denial: Disconfirmations in Dialogue and Their Prosodic Correlates [E. Krahmer, M. Swerts, M. Theune, M. Weegels]

Goals
- Investigate the prosodic correlates of disconfirmations
- Is this slightly different than before? (i.e. now looking at any corrections? Answer: No)
- Looked at prosody on “NO” as a go_on vs. a go_back:
  - Do you want to fly from Pittsburgh?
  - Shall I summarize your trip?
Human-human
- Higher pitch range, longer duration
- Preceded by a longer delay
- High H% boundary tone
- Expected to see same behavior for disconfirmation in human-machine

Prosodic correlates
Features        POSITIVE (‘go on’)   NEGATIVE (‘go back’)
Boundary tone   Low                  High
Duration        Short                Long
Delay           Short                Long
Pause           Short                Long
Pitch range     Low                  High
- Yes, the correlations are there as expected

Perceptual analysis
- Took 40 “No” from No+stuff, 20 go_on and 20 go_back (note that some features are lost this way…)
- Forced-choice randomized task, with no feedback; 25 native speakers of Dutch
- Results
  - 17 go_on correctly identified above chance
  - 15 go_back correctly identified above chance; but also 1 incorrectly identified above chance

Discussion
- Q1: Blurred relationships …
  - Confidence annotation
  - Go_on / Go_back signal
  - Is that the same as corrections?
  - Is that the most general case for responses to implicit/explicit verifications, or should we have a separate detector?
- Q2: What other features could we throw at these problems? What are the “most juicy” ones?

Discussion
- Q3: For implicit confirms, are these different in terms of induced response behavior?
  - When do you want to leave Pittsburgh?
  - Travelling from Pittsburgh … when do you want to leave?
  - When do you want to leave from Pittsburgh to Boston?
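
As a rough illustration of the single-feature classifiers mentioned on the "How well can you classify" slide, the Python sketch below applies the two rules (absence of confirmation after explicit questions, a non-zero number of corrections after implicit ones) and computes precision and recall. The `Turn` record, its field names, and the toy data are assumptions invented for this example; they are not the annotation scheme or the data of the Dutch corpus.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One user turn after a system verification question (hypothetical schema)."""
    question_type: str   # "explicit" or "implicit"
    confirmed: bool      # user turn contains a confirmation such as "yes"
    n_corrections: int   # number of items the user corrected in this turn
    problem: bool        # gold label: did the preceding system turn contain an error?


def predict_problem(turn: Turn) -> bool:
    """Single-feature rules: no confirmation (explicit), any correction (implicit)."""
    if turn.question_type == "explicit":
        return not turn.confirmed
    return turn.n_corrections > 0


def precision_recall(turns):
    """Precision and recall of predict_problem for the 'problem' class."""
    tp = sum(1 for t in turns if predict_problem(t) and t.problem)
    fp = sum(1 for t in turns if predict_problem(t) and not t.problem)
    fn = sum(1 for t in turns if not predict_problem(t) and t.problem)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented toy data, NOT taken from the corpus described in the talk.
TOY_TURNS = [
    Turn("explicit", True, 0, False),   # confirmed, no problem
    Turn("explicit", False, 0, True),   # no confirmation, problem
    Turn("explicit", False, 0, False),  # no confirmation, but no problem
    Turn("implicit", False, 1, True),   # correction, problem
    Turn("implicit", False, 0, True),   # no correction, yet a problem
    Turn("implicit", False, 0, False),  # no correction, no problem
]

if __name__ == "__main__":
    p, r = precision_recall(TOY_TURNS)
    print(f"precision={p:.2f} recall={r:.2f}")
```

Both rules rely on features (confirmation, corrections) that were hand-annotated in the study, so an on-line system would first have to estimate them from recognizer output.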

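The word error rates quoted at the start of the talk can be made concrete with the example dialog. The sketch below computes WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, using the user utterances and bracketed recognizer outputs from the "An example" slide. The normalization (lowercasing, stripping surrounding punctuation) and the function names are assumptions made for illustration, not part of the original talk.

```python
def normalize(text):
    """Lowercase and strip surrounding punctuation from each word (assumed scheme)."""
    words = (w.strip(".,?!") for w in text.lower().split())
    return [w for w in words if w]


def word_errors(reference, hypothesis):
    """Return (edit distance in words, number of reference words)."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    prev = list(range(len(hyp) + 1))          # distance from empty reference
    for i, r in enumerate(ref, 1):
        curr = [i]                            # distance to empty hypothesis
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1], len(ref)


def corpus_wer(pairs):
    """Total word errors divided by total reference words over all pairs."""
    errors = words = 0
    for reference, hypothesis in pairs:
        e, n = word_errors(reference, hypothesis)
        errors += e
        words += n
    return errors / words


# (what the user said, what the recognizer returned), from the example dialog
EXAMPLE_PAIRS = [
    ("No I'm not. No", "NO I'M NOT NO"),
    ("Adam Schumacher", "ATHENS IN AKRON"),
    ("Neither", "NEITHER"),
    ("Georgia", "GEORGIA"),
    ("Start over", "START OVER"),
    ("Yes", "YES"),
    ("Hamilton, Ontario", "HILTON ONTARIO"),
    ("Hamilton", "HILTON"),
    ("Toronto", "TORONTO"),
]

if __name__ == "__main__":
    print(f"WER over the example dialog: {corpus_wer(EXAMPLE_PAIRS):.1%}")
```

Because insertions count as errors, the WER of a single utterance can exceed 100%, as with "Adam Schumacher" recognized as three wrong words.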