Non-Native Users in the Let’s Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch
Antoine Raux & Maxine Eskenazi
Language Technologies Institute, Carnegie Mellon University

Background
• Speech-enabled systems use models of the user’s language
• Such models are tailored for native speech
• Great loss of performance for non-native users, who don’t follow typical native patterns

Previous Work on Non-Native Speech Recognition
• Assumes knowledge about, or data from, a specific non-native population
• Often based on read speech
• Focuses on acoustic mismatch:
  • Acoustic adaptation
  • Multilingual acoustic models

Linguistic Particularities of Non-Native Speakers
• Non-native speakers might use different lexical and syntactic constructs
• Non-native speakers are in a dynamic process of L2 acquisition

Outline of the Talk
• Baseline system and data collection
• Study of the non-native/native mismatch and the effect of additional non-native data
• Adaptive lexical entrainment

The CMU Let’s Go!! System: Bus Schedule Information for the Pittsburgh Area
[Architecture diagram: components connected through the Galaxy HUB]
• ASR: Sphinx II
• Parsing: Phoenix
• Dialogue management: RavenClaw
• NLG: Rosetta
• Speech synthesis: Festival

Data Collection
• Baseline system accessible since February 2003
• Experiments with scenarios
• Publicized the phone number inside CMU in Fall 2003

Data Collection Web Page
[Screenshot of the data collection web page]

Data
• Directed experiments: 134 calls
  • 17 non-native speakers (5 from India, 7 from Japan, 5 others)
• Spontaneous: 30 calls
• Total: 1768 utterances
• Evaluation data:
  • Non-native: 449 utterances
  • Native: 452 utterances

Speech Recognition Baseline
• Acoustic models:
  • semi-continuous HMMs (codebook size: 256)
  • 4000 tied states
  • trained on CMU Communicator data
• Language model:
  • class-based backoff 3-gram
  • trained on 3074 utterances from native calls

Speech Recognition Results
• Word error rate: Native 20.4%, Non-Native 52.0%
• Causes of the discrepancy:
  • Acoustic mismatch (accent)
  • Linguistic mismatch (word choice, syntax)

Language Model Performance
• Evaluation on transcripts; initial model trained on 3074 native utterances
[Bar charts: OOV rate (% tokens), rate of utterances with an OOV (% utterances), and perplexity, each markedly higher on the non-native set than on the native set]

Language Model Performance
• Adding non-native data: mixed model trained on 3074 native + 1308 non-native utterances
[Bar charts comparing the initial (native) model and the mixed model on OOV rate, rate of utterances with an OOV, and perplexity, for the native and non-native sets]
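The three metrics compared above can be computed directly from transcripts. Below is a minimal sketch of that computation, assuming a plain add-one-smoothed bigram model as a stand-in for the class-based backoff 3-gram the system actually used; all function and variable names are illustrative, not taken from the system.

```python
import math
from collections import Counter

def oov_stats(train_utts, test_utts):
    """OOV rate (% tokens) and rate of test utterances containing an OOV."""
    vocab = {w for utt in train_utts for w in utt.split()}
    n_tokens = n_oov = n_utts_oov = 0
    for utt in test_utts:
        words = utt.split()
        oov = [w for w in words if w not in vocab]
        n_tokens += len(words)
        n_oov += len(oov)
        n_utts_oov += bool(oov)
    return 100.0 * n_oov / n_tokens, 100.0 * n_utts_oov / len(test_utts)

def bigram_perplexity(train_utts, test_utts):
    """Perplexity of an add-one-smoothed bigram model; OOVs map to <unk>.
    (A stand-in for the class-based backoff 3-gram used in the talk.)"""
    vocab = {w for utt in train_utts for w in utt.split()} | {"<s>", "</s>", "<unk>"}
    bigrams, histories = Counter(), Counter()
    for utt in train_utts:
        words = ["<s>"] + utt.split() + ["</s>"]
        histories.update(words[:-1])              # count each word as a history
        bigrams.update(zip(words[:-1], words[1:]))
    log_prob, n, V = 0.0, 0, len(vocab)
    for utt in test_utts:
        words = ["<s>"] + [w if w in vocab else "<unk>" for w in utt.split()] + ["</s>"]
        for h, w in zip(words[:-1], words[1:]):
            log_prob += math.log((bigrams[(h, w)] + 1) / (histories[h] + V))
            n += 1
    return math.exp(-log_prob / n)
```

Running both functions once with the native-only training set and once with the pooled native + non-native set, against native and non-native test transcripts, reproduces the kind of comparison made on these slides.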
Natural Language Understanding
• Grammar manually written incrementally, as the system was being developed
• Initially built with native speakers in mind
• Phoenix: robust parser (less sensitive to non-standard expressions)

Grammar Coverage
• Initial grammar: manually written for native utterances
[Bar charts: parse word coverage (% words not covered by a parse) and parse utterance coverage (% utterances not fully parsed), native vs. non-native]

Grammar Coverage
• Grammar redesigned to accept some non-native patterns:
  • “reach” = “arrive”
  • “What is the next bus?” = “When is the next bus?”
[Same bar charts, with the extended grammar]

Relative Improvement due to Additional Data
[Bar chart: % improvement in OOV rate, % utterances with an OOV, perplexity, word coverage, and utterance coverage, on the native and non-native sets]

Effect of Additional Data on Speech Recognition
[Bar chart: word error rate (%) of the native model vs. the mixed model, on the native and non-native sets]

Adaptive Lexical Entrainment
• “If you can’t adapt the system, adapt the user”
• The system should use the same expressions it expects from the user
• But non-native speakers might not master all target expressions
• Use expressions that are close to the non-native speaker’s language
• Use prosody to stress incorrect words

Adaptive Lexical Entrainment: Example
U: I want to go the airport
S: Did you mean: I want to go TO the airport?

Adaptive Lexical Entrainment: Algorithm
(a code sketch of this pipeline follows the example dialogues below)
• ASR hypothesis: “I want to go the airport”
• DP-based alignment of the hypothesis against the target prompts (e.g., “I’d like to go to the airport”, “I want to go to the airport”)
• Prompt selection: choose the closest target, “I want to go to the airport”
• Emphasis: stress the words the user got wrong
• Confirmation prompt: “Did you mean: I want to go TO the airport?”

Adaptive Lexical Entrainment: Study
• 24 dialogues, 12.4 turns/dialogue, 2.5 entrainment prompts/dialogue
• Entrainment triggered whenever the best target does not perfectly match the ASR hypothesis
  • Too simple? 40.7% false positives; 64.4% of the prompts are triggered by misrecognitions

Conclusion
• The LM and the grammar perform significantly worse on non-native speech
• Including non-native data reduces the gap in linguistic metrics, but not in WER
• Future work:
  • Improve speech recognition
    • More data for the LM
    • Acoustic/lexical adaptation
  • Integrate adaptive lexical entrainment into the dialogue management framework

Example Dialogue
S: Welcome to the Let’s Go!! bus information system. What can I do for you?
U: Excuse me. I’m in the place which uh where the Forbes Avenue and Murray Avenue are crossing and I want to go to the place mm Forbes where the Forbes Avenue and Craig Street are crossing.
S: I heard “SIX”. Is that correct?
U: Which bus uh should I take? Uh sorry uh maybe 61C bus I have… What time the next 61C bus will come?
S: The time is currently 5:37 PM. Where are you leaving from?
…

Example Dialogue
S: Welcome to the Let’s Go!! bus information system. What can I do for you?
U: I want to go to downtown now I’m at Fifth and Bigelow.
S: Sorry, did you say “I want to go to downtown”?
U: Which number bus can I take and when will the next bus come here?
S: ... What can I do for you?
…
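As referenced on the algorithm slide above, here is a minimal sketch of the entrainment pipeline: a dynamic-programming (Levenshtein) alignment of the ASR hypothesis against each target prompt, selection of the closest prompt, and emphasis of the words the alignment flags. Emphasis is rendered as uppercase here, whereas the talk describes prosodic stress; the alignment cost is a plain word edit distance, which may differ from the system’s actual scoring, and all names are illustrative.

```python
def align(hyp, target):
    """Levenshtein DP alignment between two word sequences.
    Returns (cost, ops), where ops tags each target word as 'match',
    'sub' (replaces a hypothesis word), or 'ins' (absent from the hypothesis)."""
    n, m = len(hyp), len(target)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j - 1] + (hyp[i - 1] != target[j - 1]),
                          d[i - 1][j] + 1,   # delete a hypothesis word
                          d[i][j - 1] + 1)   # insert a target word
    # Backtrace, tagging each target word.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (hyp[i - 1] != target[j - 1]):
            ops.append((target[j - 1], "match" if hyp[i - 1] == target[j - 1] else "sub"))
            i, j = i - 1, j - 1
        elif j > 0 and d[i][j] == d[i][j - 1] + 1:
            ops.append((target[j - 1], "ins"))
            j -= 1
        else:
            i -= 1
    return d[n][m], list(reversed(ops))

def entrainment_prompt(asr_hyp, target_prompts):
    """Pick the closest target prompt and emphasize the words the user got wrong.
    Emphasis is uppercase here; the system used prosodic stress.
    Returns None when the best target matches the hypothesis exactly."""
    hyp = asr_hyp.lower().split()
    cost, ops = min((align(hyp, t.lower().split()) for t in target_prompts),
                    key=lambda r: r[0])
    if cost == 0:
        return None  # perfect match: no entrainment prompt needed
    words = [w.upper() if tag != "match" else w for w, tag in ops]
    return "Did you mean: " + " ".join(words) + "?"

print(entrainment_prompt(
    "I want to go the airport",
    ["I'd like to go to the airport", "I want to go to the airport"]))
# -> Did you mean: i want to go TO the airport?
```

As on the study slide, a confirmation prompt is triggered only when the best target does not perfectly match the hypothesis (cost > 0), which is exactly the triggering rule the study found too simple.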