Asking Questions to Limited Domain Virtual Characters: How Good Does Speech Recognition Have to Be?
Dr. Anton Leuski, 2LT Brandon Kennedy, Ronak Patel, Dr. David Traum

Outline
• Question-Answering Characters
• Sgt Blackwell
• Answer selection mechanisms
• Research questions
  – How good are the responses?
  – What is the impact of imperfect ASR?
• Experiment and Results
• Summary, Future Work, & Final Thoughts

Question-answering characters
• Q&A dialogue
  – Focus on information and social interaction
• Simulate person answering question, e.g.:
  – From reporter
  – From interviewer
  – From police interrogator
  – Different from Question-answering system
    • Give appropriate answer rather than correct information
  – Different from believable characters
    • Focus on simulation of question-answering process rather than Turing test
• Uses for Q&A characters
  – Simulation
  – Training
  – Games

Examples of ICT Question-answering Characters
• Sgt Blackwell
• C3IT/TACQ: Raed
• Be a Reporter

Sgt Blackwell
• Technology demo for ASC 24
• Highlights:
  – Life-sized, mixed reality
    • Trans-screen
  – High-production quality
    • Rendering (> 60K polygons)
    • Voice
    • Authored Text
    • Robust responsiveness
  – Speech recognition
  – Speech and non-verbal reply
  – Limited domain of interaction: responding to interview/Q&A

Virtual Character Creation: Data-driven method
1. Collect data (questions)
   a) Scripted
   b) Paraphrases
   c) Wizard of Oz
   d) System
2. Annotate data
   – Pick appropriate answers
   – Rate level of appropriateness
3. Train statistical algorithms
4. Integrate in system
5. Evaluate (= step 1d)

WOZ Data collection

Sgt Blackwell Dialogue Model
• Set of pre-constructed answers
  – In domain
  – Off-topic
  – Prompt
• Local history
• IR-based classification
  – Given (possibly previously unseen) question, map to best answer

Methods for Computing Responses to Questions
• Classification - Response selection
• Extraction - Template selection - Template filling
• Parsing/Interpretation - Inference - Generation

Some Word-based Classification Approaches
• Key-word spotting
• Bayesian Classification
• Latent Semantic Analysis (LSA)
• Support-Vector Machines (SVM)
• Relevance Model Retrieval
• Cross-language Relevance Model

Sgt Blackwell Evaluation Questions:
1. How good are the responses to expected questions?
2. How good are the responses to open questions?
3. How much do speech recognition errors affect performance?

Experiment
• Data:
  – 20 users asked Sgt Blackwell 20 questions each:
    • 10 given, in domain
    • 10 composed by user
• Metrics
  – Accuracy of Speech recognizer & classifier
  – Appropriateness of replies (including to unseen and out of domain questions)
    • Answers rated for relevance (scale from 1-6)

Example Session: user questions

Q Who created you (Who created you)
Tell you the truth, I have spent a lot of time with the smart folks at ICT, testing and tweaking. I am about ready to get out there and train with real soldiers. I am thinking first stop is going to be Fort Sill. But my ultimate goal is to get forward deployed...

Q Why should we spend money on you (What should we spend money on you)
Why do you need me?! What you should be saying is "How did you get along without me?" I will show you how to be a leader, how to make critical decisions under stress... And I am glad to be here, because I can help. I know I can. Look... I have one mission: to make better soldiers, make better warfighters. And if I can ... can save one life... bring one loved one back to his family... Well, then, I am definitely worth it. You be the judge.

Q What's your value (What's the value in)
I can do "Virtually" anything. I do mission rehearsals, Urban Combat training, language training - I speak Farsi, Pashtun and Arabic. "Mar Haba shabab!" That means how are you in Arabic.

Q Do you really work or is this just a demo (G. really worker is this just a demo)
Sorry. That's outside my AO.

Q How could you be better (How could you be better)
Stop mumbling. Just kidding. I didn't get that.

Q What do you wish you could do (When you wish you could do)
Career Army. That is what I am thinking. Just call me a lifer.

Q What kind of people have you trained (What other people in the turn in)
I teach cultural awareness, function as a training mentor and can call in the full range of Joint fires and effects.

Q What's the hardest part of your job (He was the hardest are you job)
I am not authorized to comment on that.

Q You have any virtual friends (The universal friends)
Sorry, I can't hear you.

Q Are you hard of hearing (Are you had a hearing)
I can't understand you.

Response coherence coding (Gandhe et al., ASC 2004)
1. Response is not related in any way to the question.
2. Response contains some discussion of people or objects mentioned in the question, but does not really address the question itself.
3. Response partially addresses the question, but little or no coherence between the question and response.
4. Response does mostly address the question, but with major problems in the coherence between question and response; seems like the response is really addressing a different question than the one asked.
5. Response does address the question, but the transition is somewhat awkward.
6. Response answers the question in a perfectly fluent manner.

Performance on in-domain questions
[Chart; axis labels: % data, Avg Ans Score, ASR WER]

Performance on user-selected questions
[Chart; axis labels: % data, Avg Ans Score, ASR WER]

Summary
• Question Answering Characters
• How important is Speech recognition accuracy?
  – Not very
    • Getting some correct words is good enough
  – Even a moderate-quality recognizer is good enough, and worth the convenience factor of speech

Future Work
• More use of context
  – Information transfer
  – Mood of character
• New domains
  – Extended Blackwell (Cooper-Hewitt Museum)
  – Tactical questioning
    • ELECT BiLAT character Hassan
    • C3IT character Raed

Closing thought: NL Dialogue Processing
What are the best techniques for a task?

• Sgt. Blackwell, RAED BAR, TACQ, C3IT
  – Understand language: Text classification
  – Manage dialog: Keep history
  – Generate language: Recorded answers
• Radiobot
  – Understand language: Information extraction
  – Manage dialog: Follow protocol
  – Generate language: Template-based
• MRE, SASO (Doctor Perez)
  – Understand language: Semantic parsing
  – Manage dialog: Rule-based reasoning
  – Generate language: Statistical & grammar-based generation
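
Illustration: why a few correct words can be enough

The summary's point that "getting some correct words is good enough" can be made concrete with a small sketch. The code below computes word error rate (WER) for one pair from the example session and then picks an answer by plain word overlap. Everything here is an assumption for illustration: the `TRAINING` questions, the labels `origin` and `worth`, the Jaccard overlap score, and the 0.3 threshold are hypothetical, and the overlap scorer is a simple stand-in, not the classifier used in Sgt Blackwell.

```python
"""Toy sketch: WER plus word-overlap answer selection (hypothetical stand-in)."""


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical training questions, grouped by an invented answer label.
TRAINING = {
    "origin": ["who created you", "who made you"],
    "worth": ["why should we spend money on you", "what is your value"],
}


def select_answer(recognized: str, threshold: float = 0.3) -> str:
    """Return the label of the best-overlapping training question.

    Falls back to "off_topic" when no question overlaps enough, mirroring
    the off-topic answer class in the dialogue model.
    """
    words = set(recognized.lower().split())
    best_label, best_score = "off_topic", 0.0
    for label, questions in TRAINING.items():
        for question in questions:
            question_words = set(question.split())
            union = words | question_words
            # Jaccard overlap between recognized words and a known question
            score = len(words & question_words) / len(union) if union else 0.0
            if score > best_score:
                best_label, best_score = label, score
    return best_label if best_score >= threshold else "off_topic"


spoken = "why should we spend money on you"
recognized = "what should we spend money on you"  # ASR output from the session
print(word_error_rate(spoken, recognized))
print(select_answer(recognized))
print(select_answer("the universal friends"))
```

In this toy setup the misrecognized first word does not change the selected label, while a hypothesis that shares no words with any known question falls through to the off-topic class. A real system would use one of the word-based classification approaches listed earlier rather than raw overlap.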