Non-Native Users in the
Let’s Go!! Spoken Dialogue System:
Dealing with Linguistic Mismatch
Antoine Raux & Maxine Eskenazi
Language Technologies Institute
Carnegie Mellon University
Background

• Speech-enabled systems use models of the user’s language
• Such models are tailored for native speech
• Great loss of performance for non-native users who don’t follow typical native patterns
Previous Work on Non-Native Speech Recognition

• Assumes knowledge about/data from a specific non-native population
• Often based on read speech
• Focuses on acoustic mismatch:
  • Acoustic adaptation
  • Multilingual acoustic models
Linguistic Particularities of Non-Native Speakers

• Non-native speakers might use different lexical and syntactic constructs
• Non-native speakers are in a dynamic process of L2 acquisition
Outline of the Talk

• Baseline system and data collection
• Study of non-native/native mismatch and the effect of additional non-native data
• Adaptive lexical entrainment
The CMU Let’s Go!! System:
Bus Schedule Information for the Pittsburgh Area

• ASR: Sphinx II
• Parsing: Phoenix
• Dialogue Management: RavenClaw
• Hub: Galaxy
• Speech Synthesis: Festival
• NLG: Rosetta
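To make the data flow concrete, here is a toy sketch of a single user turn passing through the hub architecture above. Every function is a hypothetical stand-in; this is not the real Galaxy, Sphinx II, Phoenix, RavenClaw, or Rosetta code.

```python
# Toy sketch of one turn through the hub-and-spoke architecture.
# Each function is a hypothetical stand-in for the corresponding server.

def recognize(audio):
    # Sphinx II stand-in: audio in, word hypothesis out.
    return "I WANT TO GO TO THE AIRPORT"

def parse(hypothesis):
    # Phoenix stand-in: extract concept slots from the hypothesis.
    slots = {}
    if "AIRPORT" in hypothesis:
        slots["destination"] = "airport"
    return slots

def manage_dialogue(slots):
    # RavenClaw stand-in: decide the next dialogue action.
    if "departure" not in slots:
        return ("request", "departure")
    return ("inform", "schedule")

def generate(action):
    # Rosetta stand-in: turn the dialogue action into text;
    # the text then goes to Festival for synthesis.
    act, concept = action
    if act == "request" and concept == "departure":
        return "Where are you leaving from?"
    return "The next bus is ..."

def hub_turn(audio):
    # Galaxy-hub stand-in: route each module's output to the next module.
    return generate(manage_dialogue(parse(recognize(audio))))

print(hub_turn(b"<audio>"))  # -> Where are you leaving from?
```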
Data Collection

• Baseline system accessible since February 2003
• Experiments with scenarios
• Publicized the phone number inside CMU in Fall 2003
Data Collection Web Page

[Screenshot of the data collection web page]
Data

• Directed experiments: 134 calls
  • 17 non-native speakers (5 from India, 7 from Japan, 5 others)
• Spontaneous: 30 calls
• Total: 1768 utterances
• Evaluation data:
  • Non-Native: 449 utterances
  • Native: 452 utterances
Speech Recognition Baseline

• Acoustic models:
  • semi-continuous HMMs (codebook size: 256)
  • 4000 tied states
  • trained on CMU Communicator data
• Language model:
  • class-based backoff 3-gram
  • trained on 3074 utterances from native calls
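As a concrete illustration of the class-based idea (a sketch only; the actual Let’s Go!! class inventories, vocabulary, and backoff estimation are not shown here), domain entities such as bus routes and places are mapped to class tokens before n-gram counting, so one observation of “61C” generalizes to all routes:

```python
from collections import Counter

# Illustrative class inventory -- not the actual Let's Go!! classes.
CLASSES = {
    "61C": "[ROUTE]", "28X": "[ROUTE]",
    "AIRPORT": "[PLACE]", "DOWNTOWN": "[PLACE]",
}

def to_classes(words):
    # Replace known entities with their class token.
    return [CLASSES.get(w, w) for w in words]

def count_trigrams(utterances):
    # Count class-level trigrams with sentence-boundary padding.
    counts = Counter()
    for utt in utterances:
        words = ["<s>", "<s>"] + to_classes(utt.split()) + ["</s>"]
        for i in range(len(words) - 2):
            counts[tuple(words[i:i + 3])] += 1
    return counts

train = ["I WANT TO GO TO THE AIRPORT", "WHEN IS THE NEXT 61C"]
print(count_trigrams(train).most_common(3))
```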
Speech Recognition Results

Word Error Rate:
• Native: 20.4%
• Non-Native: 52.0%

Causes of discrepancy:
• Acoustic mismatch (accent)
• Linguistic mismatch (word choice, syntax)
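The word error rates above follow the standard definition: word-level Levenshtein distance between reference and hypothesis, divided by the reference length. A minimal implementation (the example strings are made up):

```python
def wer(reference, hypothesis):
    """Word error rate in percent: edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)

print(wer("WHEN IS THE NEXT 61C", "ONE IS THE NEXT SIX"))  # 40.0
```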
Language Model Performance

Evaluation on transcripts. Initial model: 3074 native utterances.

[Charts: OOV rate (% tokens), rate of utterances with OOV (% utterances), and perplexity, compared for the native and non-native evaluation sets; all three metrics are higher on non-native speech.]
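For reference, the two OOV metrics above can be computed directly from transcripts and the LM vocabulary. A minimal sketch (the vocabulary and utterances are illustrative):

```python
def oov_rates(utterances, vocab):
    """Return (% OOV tokens, % utterances containing at least one OOV)."""
    oov_tokens = total_tokens = utts_with_oov = 0
    for utt in utterances:
        words = utt.split()
        misses = sum(w not in vocab for w in words)
        oov_tokens += misses
        total_tokens += len(words)
        utts_with_oov += misses > 0
    return (100.0 * oov_tokens / total_tokens,
            100.0 * utts_with_oov / len(utterances))

vocab = {"WHEN", "IS", "THE", "NEXT", "BUS"}
print(oov_rates(["WHEN IS THE NEXT BUS",
                 "WHEN DOES THE BUS REACH"], vocab))  # (20.0, 50.0)
```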
Language Model Performance

Adding non-native data: 3074 native + 1308 non-native utterances.

[Charts: OOV rate (% tokens), rate of utterances with OOV (% utterances), and perplexity for the initial (native) model vs. the mixed model, on the native and non-native evaluation sets.]
Natural Language Understanding

• Grammar manually written incrementally, as the system was being developed
• Initially built with native speakers in mind
• Phoenix: robust parser (less sensitive to non-standard expressions)
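To illustrate what “robust” means here (a simplified stand-in, not Phoenix’s actual grammar formalism or matching algorithm): slot patterns are matched wherever they occur in the utterance, and words that match nothing are simply skipped rather than failing the parse. The patterns below are toy examples, not the real Let’s Go!! grammar.

```python
import re

# Toy slot patterns standing in for a semantic grammar.
SLOT_PATTERNS = {
    "place": re.compile(r"\b(airport|downtown|forbes and murray)\b"),
    "route": re.compile(r"\b(61c|28x)\b"),
}

def robust_parse(utterance):
    # Match each slot anywhere in the utterance; skip everything else.
    text, slots = utterance.lower(), {}
    for slot, pattern in SLOT_PATTERNS.items():
        m = pattern.search(text)
        if m:
            slots[slot] = m.group(1)
    return slots

# Fillers and non-standard word order do not block slot extraction:
print(robust_parse("Which bus uh maybe 61C I want go downtown"))
# -> {'place': 'downtown', 'route': '61c'}
```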
Grammar Coverage

Initial grammar: manually written for native utterances.

[Charts: parse utterance coverage (% utterances not fully parsed) and parse word coverage (% words not covered by the parse), native vs. non-native.]
Grammar Coverage

Grammar designed to accept some non-native patterns (illustrated in the sketch below):
• “reach” = “arrive”
• “What is the next bus?” = “When is the next bus?”

[Charts: parse utterance coverage (% utterances not fully parsed) and parse word coverage (% words not covered by the parse), native vs. non-native, with the extended grammar.]
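One cheap way to picture the effect of these equivalences (the real change was made in the Phoenix grammar itself; this rewrite table is only illustrative) is to normalize non-native variants onto forms the grammar already covers:

```python
# Illustrative rewrite table mapping non-native variants to
# canonical forms the grammar accepts.
REWRITES = {
    "reach": "arrive",                            # "reach" = "arrive"
    "what is the next bus": "when is the next bus",
}

def normalize(utterance):
    text = utterance.lower()
    for variant, canonical in REWRITES.items():
        text = text.replace(variant, canonical)
    return text

print(normalize("What is the next bus to reach the airport"))
# -> "when is the next bus to arrive the airport"
```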
Relative Improvement due to Additional Data

[Chart: % relative improvement from adding non-native training data, for % OOV, % utterances with OOV, perplexity, word coverage, and utterance coverage, shown separately for the native and non-native sets.]
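The relative improvements plotted here follow the usual definition for metrics where lower is better (OOV rate, perplexity, uncovered words/utterances):

```python
def relative_improvement(before, after):
    """Percent relative improvement for a lower-is-better metric."""
    return 100.0 * (before - after) / before

# e.g. a perplexity drop from 30 to 24 is a 20% relative improvement
print(relative_improvement(30.0, 24.0))  # 20.0
```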
Effect of Additional Data on Speech Recognition

[Chart: word error rate (%) of the native model vs. the mixed model, on the native and non-native sets.]
Adaptive Lexical Entrainment

• “If you can’t adapt the system, adapt the user”
• System should use the same expressions it expects from the user
• But non-native speakers might not master all target expressions
• Use expressions that are close to the non-native speaker’s language
• Use prosody to stress incorrect words
Adaptive Lexical Entrainment: Example

User: “I want to go the airport”
System: “Did you mean: I want to go TO the airport?” (the two word sequences are aligned; the missing “TO” is stressed)
Adaptive Lexical Entrainment: Algorithm

ASR Hypothesis → DP-based Alignment (against the Target Prompts) → Prompt Selection → Emphasis → Confirmation Prompt

Example:
• ASR hypothesis: “I want to go the airport”
• Target prompt: “I’d like to go to the airport”
• Selected prompt after alignment: “I want to go to the airport”
• Confirmation prompt: “Did you mean: I want to go TO the airport?”
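A compact sketch of this pipeline, using Python’s difflib as a stand-in for the DP-based alignment (the slides do not specify the exact alignment implementation; the target prompts and the *word* emphasis markup are illustrative). It also includes the trigger condition from the next slide: if the best target matches the hypothesis exactly, no entrainment prompt is produced.

```python
import difflib

# Illustrative target prompts the system is willing to entrain toward.
TARGET_PROMPTS = [
    "i'd like to go to the airport",
    "i want to go to the airport",
]

def alignment_score(hyp, target):
    # difflib stand-in for the DP-based word alignment score.
    return difflib.SequenceMatcher(None, hyp.split(), target.split()).ratio()

def entrainment_prompt(asr_hypothesis):
    hyp = asr_hypothesis.lower()
    # Prompt selection: pick the target closest to what the user said.
    target = max(TARGET_PROMPTS, key=lambda t: alignment_score(hyp, t))
    if target == hyp:
        return None  # perfect match: entrainment not triggered
    # Emphasis: stress target words the hypothesis got wrong or missed.
    sm = difflib.SequenceMatcher(None, hyp.split(), target.split())
    out = []
    for op, _, _, j1, j2 in sm.get_opcodes():
        words = target.split()[j1:j2]
        out.extend(words if op == "equal" else [f"*{w}*" for w in words])
    return "Did you mean: " + " ".join(out) + "?"

print(entrainment_prompt("I want to go the airport"))
# -> Did you mean: i want to go *to* the airport?
```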
Adaptive Lexical Entrainment: Study

• 24 dialogues, 12.4 turns/dialogue, 2.5 entrainment prompts/dialogue
• Entrainment triggered whenever the best target does not perfectly match the ASR hypothesis
  • Too simple? 40.7% false positives
• 64.4% of prompts are triggered by misrecognitions
Conclusion

• LM and grammar perform significantly worse on non-native speech
• Including non-native data reduces the gap in linguistic metrics, but not in WER
• Future work:
  • Improve speech recognition
    • More data for the LM
    • Acoustic/lexical adaptation
  • Integrate adaptive lexical entrainment into the dialogue management framework
Example Dialogue

S: Welcome to the Let’s Go!! bus information system. What can I do for you?
U: Excuse me. I’m in the place which uh where the Forbes Avenue and Murray Avenue are crossing and I want to go to the place mm Forbes where the Forbes Avenue and Craig Street are crossing.
S: I heard “SIX”. Is that correct?
U: Which bus uh should I take? Uh sorry uh maybe 61C bus I have… What time the next 61C bus will come?
S: The time is currently 5:37 PM. Where are you leaving from?
…
Example Dialogue

S: Welcome to the Let’s Go!! bus information system. What can I do for you?
U: I want to go to downtown now I’m at Fifth and Bigelow.
S: Sorry, did you say “I want to go to downtown”?
U: Which number bus can I take and when will the next bus come here?
S: ... What can I do for you?
…