Asking Questions to Limited Domain Virtual Characters:
How Good Does Speech Recognition Have to Be?
Dr. Anton Leuski, 2LT Brandon Kennedy,
Ronak Patel, Dr. David Traum
Outline
• Question-Answering Characters
• Sgt Blackwell
• Answer selection mechanisms
• Research questions
– How good are the responses?
– What is the impact of imperfect ASR?
• Experiment and Results
• Summary, Future Work, & Final Thoughts
Question-answering characters
• Q&A dialogue
– Focus on information and social interaction
• Simulate person answering question, e.g.:
– From reporter
– From interviewer
– From police interrogator
– Different from Question-answering system
• Give appropriate answer rather than correct information
– Different from believable characters
• Focus on simulation of question-answering process rather than Turing test
• Uses for Q&A characters
– Simulation
– Training
– Games
Examples of ICT Question-answering Characters
Sgt Blackwell
C3IT/TACQ: Raed
Be a Reporter
Sgt Blackwell
• Technology demo for ASC 24
• Highlights:
– Life-sized, mixed reality
• Trans-screen
– High-production quality
• Rendering (> 60K polygons)
• Voice
• Authored Text
• Robust responsiveness
– Speech recognition
– Speech and non-verbal reply
– Limited domain of interaction: responding to interview/Q&A
Virtual Character Creation: Data-driven method
1. Collect data (questions)
a) Scripted
b) Paraphrases
c) Wizard of Oz
d) System
2. Annotate data
– Pick appropriate answers
– Rate level of appropriateness
3. Train statistical algorithms
4. Integrate in system
5. Evaluate = (1d)
WOZ Data collection
Sgt Blackwell Dialogue Model
• Set of pre-constructed answers
– In domain
– Off-topic
– Prompt
• Local history
• IR-based classification
– Given (possibly previously unseen) question, map to best answer
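The dialogue model above can be illustrated with a minimal sketch: score an incoming question against stored example questions and return the linked answer, falling back to an off-topic reply when nothing matches well. This sketch substitutes plain bag-of-words cosine similarity for whatever ranking the system actually uses; the class name `AnswerSelector`, the `threshold` value, and the short placeholder answers are assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class AnswerSelector:
    """Hypothetical selector: maps a question to the best pre-constructed answer."""

    def __init__(self, training_pairs, off_topic, threshold=0.3):
        # training_pairs: list of (example question, linked answer)
        self.examples = [(Counter(tokenize(q)), a) for q, a in training_pairs]
        self.off_topic = off_topic
        self.threshold = threshold  # assumed cut-off, not taken from the source

    def select(self, question):
        vec = Counter(tokenize(question))
        best_score, best_answer = 0.0, None
        for example_vec, answer in self.examples:
            score = cosine(vec, example_vec)
            if score > best_score:
                best_score, best_answer = score, answer
        # A weak best match means the question is treated as off-topic.
        if best_answer is None or best_score < self.threshold:
            return self.off_topic
        return best_answer


selector = AnswerSelector(
    [("Who created you", "ANSWER_ORIGIN"), ("What is your name", "ANSWER_NAME")],
    off_topic="Sorry. That's outside my AO.",
)
print(selector.select("who made you"))
print(selector.select("the universal friends"))
```

Because the match is word-based, a question that shares only some words with a stored example can still reach the intended answer, which is the property the evaluation later probes with imperfect speech recognition.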
Methods for Computing Responses to Questions
• Classification - Response selection
• Extraction - Template selection, template filling
• Parsing/Interpretation - Inference - Generation
Some Word-based Classification Approaches
• Key-word spotting
• Bayesian Classification
• Latent Semantic Analysis (LSA)
• Support-Vector Machines (SVM)
• Relevance Model Retrieval
• Cross-language Relevance Model
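As one concrete instance of the approaches listed above, here is a minimal sketch of Bayesian classification: a multinomial Naive Bayes classifier with add-one smoothing that assigns a question to an answer class. The class name, the answer labels, and the training questions are hypothetical, and nothing here is claimed to match the configuration used for Sgt Blackwell.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesAnswerClassifier:
    """Hypothetical multinomial Naive Bayes over question words."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # answer id -> word counts
        self.class_counts = Counter()            # answer id -> number of questions
        self.vocab = set()

    def train(self, pairs):
        """pairs: iterable of (question text, answer id)."""
        for question, answer_id in pairs:
            words = question.lower().split()
            self.class_counts[answer_id] += 1
            self.word_counts[answer_id].update(words)
            self.vocab.update(words)

    def classify(self, question):
        # Words never seen in training carry no evidence and are skipped.
        words = [w for w in question.lower().split() if w in self.vocab]
        total = sum(self.class_counts.values())
        best_id, best_logp = None, float("-inf")
        for answer_id, n in self.class_counts.items():
            logp = math.log(n / total)  # class prior
            denom = sum(self.word_counts[answer_id].values()) + len(self.vocab)
            for w in words:
                # Add-one (Laplace) smoothing avoids zero probabilities.
                logp += math.log((self.word_counts[answer_id][w] + 1) / denom)
            if logp > best_logp:
                best_id, best_logp = answer_id, logp
        return best_id


clf = NaiveBayesAnswerClassifier()
clf.train([
    ("who created you", "origin"),
    ("who made you", "origin"),
    ("what is your mission", "mission"),
    ("what do you do", "mission"),
])
print(clf.classify("who built you"))
```

The unseen word "built" is ignored, so the remaining words still pull the question toward the "origin" class.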
Sgt Blackwell Evaluation
Questions:
1. How good are the responses to expected questions?
2. How good are the responses to open questions?
3. How much do speech recognition errors affect performance?
Experiment
• Data:
– 20 users asked Sgt Blackwell 20 questions each:
• 10 given, in domain
• 10 composed by user
• Metrics
– Accuracy of Speech recognizer & classifier
– Appropriateness of replies (including to unseen and out of domain questions)
• Answers rated for relevance (scale from 1-6)
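Speech recognizer accuracy is conventionally reported as word error rate (WER): the word-level edit distance between what the user said and what the recognizer produced, divided by the number of words spoken. The sketch below is a standard dynamic-programming implementation, not code from the study; the function name is an assumption.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the ref words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)


# One substituted word out of seven.
print(word_error_rate("Why should we spend money on you",
                      "What should we spend money on you"))
```

Note that WER can exceed 1.0 when the recognizer inserts more words than the speaker said.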
Example Session: user questions (speech recognizer output in parentheses)
Q Who created you (Who created you)
Tell you the truth, I have spent a lot of time with the smart folks at ICT, testing and tweaking. I am about ready to get out there and train with real soldiers. I am thinking first stop is going to be Fort Sill. But my ultimate goal is to get forward deployed...
Q Why should we spend money on you (What should we spend money on you)
Why do you need me?! What you should be saying is "How did you get along without me?" I will show you how to be a leader, how to make critical decisions under stress... And I am glad to be here, because I can help. I know I can. Look... I have one mission: to make better soldiers, make better warfighters. And if I can ... can save one life... bring one loved one back to his family... Well, then, I am definitely worth it. You be the judge.
Q What's your value (What's the value in)
I can do "Virtually" anything. I do mission rehearsals, Urban Combat training, language training - I speak Farsi, Pashtun and Arabic. "Mar Haba shabab!" That means how are you in Arabic.
Q Do you really work or is this just a demo (G. really worker is this just a demo)
Sorry. That's outside my AO.
Q How could you be better (How could you be better)
Stop mumbling. Just kidding. I didn't get that.
Q What do you wish you could do (When you wish you could do)
Career Army. That is what I am thinking. Just call me a lifer.
Q What kind of people have you trained (What other people in the turn in)
I teach cultural awareness, function as a training mentor and can call in the full range of Joint fires and effects.
Q What's the hardest part of your job (He was the hardest are you job)
I am not authorized to comment on that
Q You have any virtual friends (The universal friends)
Sorry, I can't hear you
Q Are you hard of hearing (Are you had a hearing)
I can't understand you.
Response coherence coding
Gandhe et al ASC 2004
1. Response is not related in any way to the question
2. Response contains some discussion of people or objects mentioned in the question, but does not really address the question itself.
3. Response partially addresses the question, but little or no coherence between the question and response.
4. Response does mostly address the question, but with major problems in the coherence between question and response; seems like the response is really addressing a different question than the one asked.
5. Response does address the question, but the transition is somewhat awkward.
6. Response answers the question in a perfectly fluent manner.
Performance on in-domain questions
[Figure: % of data and average answer score by ASR WER]
Performance on user-selected questions
[Figure: % of data and average answer score by ASR WER]
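The two performance slides relate ASR WER to the share of the data and the average answer score. A minimal sketch of how such a summary could be computed from per-question records follows; the bucket edges and the sample records are invented purely for illustration and are not the study's data or results.

```python
from bisect import bisect_right


def summarize_by_wer(records, edges=(0.0, 0.25, 0.5, 0.75)):
    """Group (wer, score) records into WER ranges.

    Returns rows of (range lower edge, % of data, average answer score),
    skipping ranges that received no records. The edges are assumed values.
    """
    buckets = {edge: [] for edge in edges}
    for wer, score in records:
        # Find the last edge that is <= wer; clamp anything below the first edge.
        idx = max(bisect_right(edges, wer) - 1, 0)
        buckets[edges[idx]].append(score)
    total = sum(len(scores) for scores in buckets.values())
    rows = []
    for edge in edges:
        scores = buckets[edge]
        if scores:
            rows.append((edge, 100.0 * len(scores) / total, sum(scores) / len(scores)))
    return rows


# Hypothetical records: (WER of the recognized question, 1-6 coherence rating).
sample = [(0.0, 6), (0.1, 5), (0.3, 4), (0.8, 1)]
for lower, pct, avg in summarize_by_wer(sample):
    print(f"WER >= {lower:.2f}: {pct:.0f}% of data, avg score {avg:.2f}")
```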
Summary
• Question Answering Characters
• How important is speech recognition accuracy?
– Not very
• Getting some correct words is good enough
– Even a moderate-quality recognizer is good enough, and the convenience of speech makes it worthwhile
Future Work
• More use of context
– Information transfer
– Mood of character
• New domains
– Extended Blackwell (Cooper-Hewitt Museum)
– Tactical questioning
• ELECT BiLAT character Hassan
• C3IT character Raed
Closing thought: NL Dialogue Processing: what are the best techniques for a task?
• Sgt. Blackwell, RAED, BAR, TACQ, C3IT
– Understand language: text classification
– Manage dialog: keep history
– Generate language: recorded answers
• Radiobot
– Understand language: information extraction
– Manage dialog: follow protocol
– Generate language: template-based
• MRE, SASO (Doctor Perez)
– Understand language: semantic parsing
– Manage dialog: rule-based reasoning
– Generate language: statistical & grammar-based generation