Powerpoint file - people.csail.mit.edu

Multilingual Conversational Systems
[Figure: system architecture. Components: SPEECH RECOGNITION, LANGUAGE UNDERSTANDING, DISCOURSE CONTEXT, DIALOGUE MANAGER, DATABASE, LANGUAGE GENERATION, SPEECH SYNTHESIS, with Graphs & Tables as an additional output. A shared Meaning Representation connects the components. Models and Rules are marked as Language Dependent, Language Independent, or Language Transparent.]
Computer Science and Artificial Intelligence Laboratory
Steps to Develop Language Learning System
1. Begin with existing mature system in English
2. Develop English-to-Mandarin translation capability
3. Induce Mandarin corpus from English corpus
4. Train LM statistics for both recognizers from corpora
5. Develop parsing grammar for Mandarin queries and generation rules for Mandarin responses
Not yet completed:
1. Develop domain-specific user simulation capability
2. Generate thousands of dialogues in both languages
3. Train recognizers and users from simulated dialogues
Activities over the Last Nine Months
• Translation from English to Mandarin
– Mainly focused on user queries (as contrasted with responses)
– Integrating generation-based translation with example-based
approach
– Exploring the use of statistical machine translation
* Use phrase-based statistical translation framework developed
by Philipp Koehn
* Used the formal methods to generate a domain-specific parallel
corpus in the weather query domain
* Implemented a finite-state transducer version of the decoder
and integrated it with Galaxy
• Translation from Mandarin to English
– Use statistical method to obtain Chinese to English translation
capability
– Explore grammar induction techniques to create parsing grammar
for Mandarin queries, towards developing formal methods for
Mandarin to English translation
Activities over the Last Nine Months, Cont’d
• System Development
– Upgraded weather harvesting process
– Upgraded database server to support Postgres in addition to Oracle
– Improved dialogue management
* Better handling of meta queries
– Developed a new GUI that overcomes firewall limitations
* Support automatic checking and correction of typed tone errors
* Better display of tones as diacritics
– Developed a new concatenative speech synthesis capability for high
quality translation of user queries spoken in English using Envoice
– Developed a batchmode capability to process synthetic speech
through dialogue interaction to aid system development
Activities over the Last Nine Months, Cont’d
Presentations:
– Three talks at InStill Workshop in Venice
* Wang and Seneff: Translation
* Seneff et al. : LL Systems
* Peabody et al.: Web based interface for tone acquisition
– ISCSLP:
* Seneff et al.: Focused on MuXing system overall
– SigDial Demo Session
* Wang and Seneff: Presentation and live demonstration
– One hour seminar at Microsoft China’s Speech Group
– One hour seminar at Defense Language Institute in Monterey
– Demonstrated system to Julian Wheatley, head of Chinese
department at MIT and to Henry Jenkins, director of MIT
Comparative Media Studies
Activities over the Last Nine Months, Cont’d
Data collection initiatives:
– Eight subjects have completed Web-based exercise at MIT
– Two visits by Stephanie Seneff to Defense Language Institute in
Monterey, California
* One successful class participation exercise
* Another attempted but aborted due to power outage
– Installed Web-based exercise system on computers at MIT
Language Lab
* Julian Wheatley has agreed to support data collection initiatives
with students in the MIT Chinese classes
Bilingual Recognizer Construction
[Figure: the English corpus is parsed into semantic frames, from which the Chinese corpus is generated. Each corpus trains the language model of its own recognizer network (English network, Chinese network), and the two networks are combined in a single recognizer.]
• Two languages compete in common search space
• Automatically translate existing English corpus into Mandarin
• Use NL grammar to automatically induce language model for
both English and Mandarin recognizers
Automatic Grammar Induction
Once translation ability exists from English to target
language, can create reverse system almost effortlessly
[Figure: English sentence → parse → interlingua → generate → Mandarin sentence. The resulting corpus pairs feed grammar induction, which produces a Mandarin parsing grammar.]
Utilizes English parse tree and Mandarin generation lexicon to induce Mandarin parse tree
Multilingual Spoken
Translation Framework
Common meaning representation: semantic frame
[Figure: for each language (English, Chinese, Spanish, Japanese), recognition and NLU map input speech to the semantic frame, and NLG and synthesis map the semantic frame to output speech. Supporting resources: speech corpora, models, parsing rules, generation rules.]
Challenges in Cross-language
Generation for Translation
• Some expressions have very different syntactic structures in
different languages
– What is your name?
你(you) 叫(call) 什么(what) 名字(name)?
– I like her.
Ella me gusta.
• Syntactic features are expressed in many different ways
– Determiners (English but not Chinese)
* Where is a bank nearby?
附近(vicinity) 哪儿(where) 有(have) 银行(bank)?
– Particles (Chinese but not English)
* that hotel
那(that) 家(<particle>) 旅馆(hotel)
* I lost my key.
我(I) 丢(lose) 了(<past tense>) 我的(my) 钥匙(key).
– Gender (extensive in Spanish)
An Example: English/Chinese
English: How long does it take to take a taxi there?
Chinese: 坐 出租车 去 那里 要 多久
Gloss: ( take taxi go there need how long )
• Function words disappear in Chinese
• Two instances of “take” have different translations
• Verb “go” omitted in English
• Sentence structure is very different
Semantic Frame for Example
{c wh_question
   :aux “do”
   :phatic_pronoun “it”
   :pred {p take_time
      :trace “how_long”
      :aux “to_inf”
      :v_complement {p take_ride
         :topic {q taxi
            :quantifier “indef” }
         :pred {p destination
            :topic “there” } } } }
• Semantic frame is identical for both inputs, except for missing function
words in Mandarin
• Where necessary, constituent movement is invoked to render the same
hierarchical structure
• English generation predicts missing function words
• Mandarin generation infers “go” from “destination” predicate
Strategies for Achieving
High Quality and Robustness
• Interlingua-based translation
– Maintain consistency of semantic frame representation
across different languages whenever possible
– Seed grammar rules for each new language on English
grammar rules
– Target language dependent generation rules specify
constituent order
– Word sense disambiguation achieved through semantic
features
• Restricted conversational domains (lesson plans)
– Emphasis on mechanisms to enable rapid porting to
new domains and languages
• Use parsability to assess quality of translation outputs
– Back off to example-based method when parse fails
Schematic of Generation into Mandarin
{c verify
   :aux “will”   (“will” conditioned by “verify”)
   :subject “it”
   :pred {p rain
      :pred {p locative
         :prep “in”
         :topic {q city
            :name “boston” } }
      :pred {p temporal   (pulled to the front)
         :topic {q weekday
            :quantifier “this”
            :name “weekend” } } } }
Mandarin outputs with glosses:
bo1 shi4 dun4 zhe4 zhou4 mo4 hui4 bu2 hui4 xia4 yu3 ?
( Boston this weekend will-not-will rain ? )
zhe4 zhou4 mo4 bo1 shi4 dun4 hui4 xia4 yu3 ma5 ?
( this weekend Boston will rain <question-particle> ? )
Generation-based Translation
• Semantic frame serves as interlingua
• Translation achieved by parsing and generation
• Use Mandarin grammar to detect potential problems
• Rejected sentences routed to example-based translation for a
second chance
[Figure: English input → parse (English grammar) → semantic frame → generate (Chinese rules) → Chinese sentence → parse? (Chinese grammar). Accepted sentences become the Chinese output; rejected sentences go to example-based translation.]
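The control flow on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the actual Galaxy/TINA implementation: the four callables (`parse_english`, `generate_chinese`, `parses_in_chinese`, `example_based`) are hypothetical stand-ins for the real components.

```python
from typing import Callable, Optional


def translate(
    sentence: str,
    parse_english: Callable[[str], Optional[dict]],
    generate_chinese: Callable[[dict], Optional[str]],
    parses_in_chinese: Callable[[str], bool],
    example_based: Callable[[dict], Optional[str]],
) -> Optional[str]:
    """Generation-based translation with an example-based second chance.

    All four callables are hypothetical placeholders for system components.
    """
    frame = parse_english(sentence)  # semantic frame acts as the interlingua
    if frame is None:
        return None  # English input could not be parsed
    candidate = generate_chinese(frame)
    if candidate is not None and parses_in_chinese(candidate):
        return candidate  # accepted by the Chinese grammar
    # Rejected (or nothing generated): fall back to example-based retrieval,
    # which may itself return None (the NULL output case).
    return example_based(frame)
```

Using parsability as the acceptance test means the Chinese grammar serves as a quality filter in addition to its role in understanding Mandarin queries.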
Example-based Translation
• Requires translation pairs and a retrieval mechanism
– Corpus automatically obtained via the generation-based approach
– Retrieval based on lean semantic information
* Encoded as key-value pairs
* Obtained from semantic frame via simple generation rules
* Generalizes words to classes (e.g., city name, weekday, etc.) to
overcome data sparseness
Example-based Translation Procedure
[Figure: English input → parser (English grammar) → semantic frame → generator (key-value rules) → KV string → KV-Chinese table → Chinese output]
Example:
English input: Is there any chance of rain in San Francisco?
KV string: WEATHER: rain CITY: San Francisco
City name masked as <CITY>, with { <CITY> : jiu4 jin1 shan1 }
Retrieved from table: <CITY> hui4 bu2 hui4 xia4 yu3?
Chinese output: jiu4 jin1 shan1 hui4 bu2 hui4 xia4 yu3?
• Key-value string serves as interlingua
• Translation achieved by parsing and table lookup
• City name masked during retrieval and recovered in final
surface string
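The masking and lookup steps can be illustrated with a short Python sketch. The table contents, the city lexicon, and the function names are assumptions made for this example. In the system described here, the key-value string comes from the semantic frame and the table is induced automatically.

```python
# Hypothetical toy data standing in for the automatically induced table.
CITY_LEXICON = {
    "San Francisco": "jiu4 jin1 shan1",
    "Boston": "bo1 shi4 dun4",
}
KV_TO_CHINESE = {
    "WEATHER: rain CITY: <CITY>": "<CITY> hui4 bu2 hui4 xia4 yu3?",
}


def mask_city(kv_string):
    """Replace a known city name with the class token <CITY>."""
    for city in CITY_LEXICON:
        if city in kv_string:
            return kv_string.replace(city, "<CITY>"), city
    return kv_string, None


def example_based_translate(kv_string):
    """Look up the masked key-value string, then restore the city name."""
    masked, city = mask_city(kv_string)
    template = KV_TO_CHINESE.get(masked)
    if template is None:
        return None  # no matching example in the table
    if city is not None:
        template = template.replace("<CITY>", CITY_LEXICON[city])
    return template


print(example_based_translate("WEATHER: rain CITY: San Francisco"))
```

Because retrieval runs on the masked string, a single table entry covers every city in the lexicon, which is how generalizing words to classes reduces data sparseness.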
Evaluation: English to Mandarin
Weather Domain
• Evaluation data
– Drawn from the publicly available Jupiter weather system
– Telephone recordings; conversational speech
– Unparsable utterances (English grammar) were excluded
– Total of 695 utterances, with 6.5 words per utterance on average
• System configuration
– Text input or speech input
* Recognizer achieved 6.9% word error rate, and 19.0% sentence
error rate
– Generation-based method preferred over example-based method
– NULL output if both failed
• Evaluation criteria
– Yield of each translation method
– Human judgment of translation quality
Spoken Language Translation:
Evaluation Results
Method    Perfect     Adequate   Wrong      Failed   %
Rule      550         34         8                   85%
Example   27          16         5          55       15%
Total     577 (83%)   50 (7%)    13 (2%)    8%       100%
• Recognizer WER was 6.9%
• Bilingual judge rated translations
• Example-based translation increased yield by 6%
• Incorrect translation provided only 2% of the time
– Often due to recognition errors
– English paraphrase provides context for errors
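As a quick consistency check, the percentages on this slide can be recomputed from the raw counts and the 695-utterance total given on the previous slide. The variable names are illustrative. The assignment of counts to rows follows the Rule, Example, and Total labels.

```python
TOTAL = 695  # utterances in the evaluation set

rule = {"perfect": 550, "adequate": 34, "wrong": 8}
example = {"perfect": 27, "adequate": 16, "wrong": 5, "failed": 55}


def pct(count):
    """Percentage of all utterances, rounded to the nearest integer."""
    return round(100 * count / TOTAL)


rule_share = pct(sum(rule.values()))        # share handled by the rule-based method
example_share = pct(sum(example.values()))  # share routed to the example-based method
perfect = pct(rule["perfect"] + example["perfect"])
adequate = pct(rule["adequate"] + example["adequate"])
wrong = pct(rule["wrong"] + example["wrong"])
failed = pct(example["failed"])

print(rule_share, example_share, perfect, adequate, wrong, failed)
```

The two methods together account for all 695 utterances, and the recomputed shares agree with the figures reported in the table.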
Multilingual Weather Responses
English source:
Some thunderstorms may be accompanied by gusty winds and hail
Semantic frame (indexed under wind, rain, storm, and hail):
clause: weather_event
   topic: precip_act, name: thunderstorm, num: pl
      quantifier: some
   pred: accompanied_by
      adverb: possibly
      topic: wind, num: pl, pred: gusty
      and: precip_act, name: hail
Japanese:
Spanish:
Algunas tormentas posiblemente acompañadas
por vientos racheados y granizo
Chinese:
一 些 雷 雨 可 能 會 伴 有 陣 風 和 冰 雹
Stage 1: Drill Exercises
• Web-based Interface to provide practice in typing queries in
the weather domain
• 10 weather scenarios to be solved using typed pinyin: “Boston,
rain, tomorrow”
– Student given feedback on both query completeness and tone
accuracy
• Separate recording sessions allow user to practice both read
and spontaneous spoken queries
– Recordings will be used to train the system on accented speech
– Recordings will also be assessed for tone quality
• The Defense Language Institute in Monterey conducted a
successful experiment using this Web-based interface in a
class of 30 students
• We are planning to introduce the exercise in the language
laboratory at MIT
Lexical Tone Correction
• Character representation does not explicitly encode tone:
– 洛杉矶星期一刮风吗?
• Exploit pinyin to help student acquire tonal knowledge:
– Diacritic: luò shān jī xīng qī yī guā fēng ma?
– Numeric: luo4 shan1 ji1 xing1 qi1 yi1 gua1 feng1 ma5?
• Hypothesis: Errors in typed pinyin reflect inaccurate
knowledge of tones
– luo3 shan1 ji3 xing1 qi2 yi1 gua4 feng2 ma2?
• Provide explicit feedback about typed tone errors
Lexical Tone Correction
• Exploit some features of Chinese
– Syllable lexicon is small, approximately 420 unique syllables
– 5 tones (including neutral tone)
• Exploit some abilities of TINA NL system
– Ability to parse weighted word FST using probabilistic models
– FST normally represents a list of recognizer hypotheses
– A path through the FST represents the most likely correct parse
• Given some input
1) Generate FST of single sentence
2) Expand the tones on each syllable
3) Attempt to parse FST
4) Selected path through FST represents corrected tones
FST Example: Step 1
Step 1: Generate simple FST
Given: luo3 shan1 ji3 xing1 qi2 yi1 gua4 feng2 ma2
FST Example: Step 2
Step 2: Assign benefit of doubt to items that appear in lexicon
Items that do not appear in lexicon are removed.
Given: luo3 shan1 ji3 xing1 qi2 yi1 gua4 feng2 ma2
FST Example: Step 3
Step 3: Expand each syllable to alternate tones. More compact than
specifying each possible sentence variant.
Given: luo3 shan1 ji3 xing1 qi2 yi1 gua4 feng2 ma2
FST Example: Step 4
Step 4: Remaining probability is uniformly distributed among alternate tones
Given: luo3 shan1 ji3 xing1 qi2 yi1 gua4 feng2 ma2
FST Example: Step 5
Step 5: Parsing reveals the correct tones
Given: luo3 shan1 ji3 xing1 qi2 yi1 gua4 feng2 ma2
Correct: luo4 shan1 ji1 xing1 qi1 yi1 gua1 feng1 ma5
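The five steps can be approximated in a small Python sketch. Several simplifications are assumed here: a toy word lexicon (`LEXICON`) stands in for the TINA parsing grammar, a list of per-syllable weight tables stands in for the FST, and the weight given to the typed tone (`TYPED_WEIGHT`) is an arbitrary value chosen for illustration.

```python
import math

TONES = "12345"
TYPED_WEIGHT = 0.6  # assumed "benefit of the doubt" mass for the typed tone

# Hypothetical toy lexicon standing in for the parsing grammar.
LEXICON = {
    ("luo4", "shan1", "ji1"),  # Los Angeles
    ("xing1", "qi1", "yi1"),   # Monday
    ("gua1", "feng1"),         # windy
    ("ma5",),                  # question particle
}


def tone_weights(syllable):
    """Expand one typed syllable to all five tones with weights."""
    base, typed = syllable[:-1], syllable[-1]
    other = (1.0 - TYPED_WEIGHT) / (len(TONES) - 1)  # uniform remainder
    return {base + t: (TYPED_WEIGHT if t == typed else other) for t in TONES}


def correct_tones(typed_sentence):
    """Return the best-scoring tone assignment that the lexicon can cover."""
    lattice = [tone_weights(s) for s in typed_sentence.split()]
    n = len(lattice)
    best = [None] * (n + 1)  # best[i] = (log score, syllables) for a prefix
    best[0] = (0.0, [])
    for i in range(n):
        if best[i] is None:
            continue
        for word in LEXICON:
            j = i + len(word)
            if j > n:
                continue
            if all(syl in lattice[i + k] for k, syl in enumerate(word)):
                score = best[i][0] + sum(
                    math.log(lattice[i + k][syl]) for k, syl in enumerate(word)
                )
                if best[j] is None or score > best[j][0]:
                    best[j] = (score, best[i][1] + list(word))
    return " ".join(best[n][1]) if best[n] else None


print(correct_tones("luo3 shan1 ji3 xing1 qi2 yi1 gua4 feng2 ma2"))
```

With this toy lexicon only one covering path exists, so the weights do not affect the outcome. They matter once several lexical entries compete for the same syllables, in which case the path closest to what the student typed wins.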
Web interface: Practice Exercise
San Francisco
Tuesday
Hot
Student is prompted for city, time, and event
Web interface: Practice Exercise
Student types in:
• A question concerning this topic in Mandarin using pinyin
OR
• An English word or phrase for a translation
Example input: Xing1 qi1 er3 jiu3 jin3 shan1 hui4 bu2 hui4 re1
Web interface: Practice Exercise
Student is given feedback
Web interface
Spoken Conversational Interaction
• Weather information domain (rain, snow, wind, temperature, etc.)
• Initial version configured for Americans learning Mandarin
• Recognizer supports both English and Mandarin
– Seamless language switching
• English queries are translated into Mandarin
• Mandarin queries are answered in Mandarin
– User can ask for a translation into English of the response at any
time
• Uses Mandarin synthesizer provided by DELTA Electronics for
responses, Envoice concatenative synthesizer for query
translations
• System can be configured as telephone-only or as telephone
augmented with a Web-based GUI
Illustration of Dialogue Interaction
User: Bo1 Shi4 Dun4 ming2 tian1 hui4 xia4 yu3 ma5?
(Is it going to rain tomorrow in Boston?)
System: Tian1 qi4 yu4 bao4 ming2 tian1 Bo1 shi4 dun4
mei2 you3 yu3. (The forecast calls for no rain
tomorrow in Boston)
User: (in English) What is the temperature?
System: (translates) Qi4 wen1 shi4 duo1 shao3?
User: (emulates) Qi4 wen1 shi4 duo1 shao3?
System: Bo1 Shi4 Dun4 ming2 tian1 zui4 gao1 qi4 wen1
er4 she4 shi4 du4, ming2 tian1 ye4 jian1, zui4 di1
qi4 wen1 ling2 xia4 wu3 she4 shi4 du4.
User: Could you translate that?
System: In Boston tomorrow, high 2 degrees Celsius,
Tomorrow night, low -5 Celsius.
Example Dialogue in Weather Domain
• “What is the forecast for San Francisco tomorrow?”
• System paraphrases request, then answers
• “Please translate”
• High quality synthesis for translation using MIT’s Envoice
concatenative synthesis framework
• “Could you repeat that” – system provides translation
• User emulates in Mandarin and system repeats previous
response
• “Will it rain in London?”
• “I’m sorry I didn’t understand you.” – response given when it
fails to recognize or parse the user query
Assessment
• Phonetic aspects
– Expand phonological rules to support non-native realizations (e.g.,
/dh/ → /d/ or schwa insertion)
– Allow realizations of selected phones from native language to
compete in recognizer search
• Tonal aspects (Mandarin)
– Use tone recognition system (Wang et al., 1998) to score tone
productions; highlight worst-scoring words
– Tabulate frequencies of tone errors in typed inputs (pinyin)
– Use phase-vocoder techniques (Tang et al., 2001) to repair user’s
tone productions by replacing prosodic contour with native speech
patterns
• Fluency measures
– Word-by-word speaking rate (Chung & Seneff, 1999)
– Percentage of utterance containing pauses and disfluencies
Tone analysis:
Native vs Non-Native Mandarin
• Creating pitch contours
– F0 extracted using algorithm in (Wang and Seneff, 2000)
– Statistics of each pitch contour over each syllable considered
without regard for left or right contexts
• Normalization
– Duration normalized by sampling at 10% intervals
– Pitch normalized according to:
$$T(x) = 5 \cdot \frac{\lg x - \lg L}{\lg H - \lg L}$$
• Comparisons based on (Wang et al., 2003)
– Include normalized F0 value, peak, valley, range, peak position,
valley position, falling range, and rising range
• Corpus (from the Defense Language Institute)
– 2065 utterances from 4 native speakers
– 4657 utterances from 20 non-native speakers
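The two normalization steps can be sketched in Python as follows. The slide does not define H and L, so this sketch assumes they are the highest and lowest F0 values of the speaker's pitch range, and it assumes that "10% intervals" means 11 sample points from 0% to 100% of the syllable. The function names are illustrative.

```python
import math


def normalize_pitch(f0_hz, low_hz, high_hz):
    """Map an F0 value onto a 0-5 scale: 5 * (lg x - lg L) / (lg H - lg L)."""
    return (
        5.0
        * (math.log10(f0_hz) - math.log10(low_hz))
        / (math.log10(high_hz) - math.log10(low_hz))
    )


def resample_contour(f0_values, n_points=11):
    """Sample a syllable's F0 track at equal fractions of its duration.

    Uses linear interpolation between neighboring frames.
    """
    if len(f0_values) < 2:
        raise ValueError("need at least two F0 frames")
    last = len(f0_values) - 1
    samples = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)  # fractional frame index
        i = min(int(pos), last - 1)
        frac = pos - i
        samples.append(f0_values[i] * (1 - frac) + f0_values[i + 1] * frac)
    return samples


contour = resample_contour([120.0, 150.0, 180.0, 160.0])
print([round(normalize_pitch(x, 100.0, 200.0), 2) for x in contour])
```

Working in the log domain means that equal pitch ratios map to equal distances on the normalized scale, which makes contours from speakers with different ranges comparable.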
Tonal averages over all syllables:
Native Example
Tonal averages over all syllables:
Non-Native Example
Capturing Phonological Errors
• Leverage phonological modeling capabilities of SUMMIT
– Model typical pronunciation errors explicitly
– Direct and intuitive mapping from linguistic rules
– Support both within-language and cross-language substitutions
• Initial experiments completed on Koreans learning English
(Kim et al., ICSLP 2004)
– Phonological rules capture typical problems such as schwa insertion and
/dh/ → /d/ confusions
– Best path in alignment used to detect errors
– Verbal feedback given to student
• Current research to apply to Americans learning Mandarin
– Build single recognizer to support both languages
– Use data-driven approaches to discover most likely cross-language phone
substitution errors
– Explicitly encode such errors in formal phonological rules
– Side benefit may be improved recognition for English-accented Mandarin
Detecting Phonological Errors
{CONSONANT} td {CONSONANT} => [tcl] [t] | tcl t [ax];
// No CCC allowed in Korean
{} dd {} => dcl [d [ax]] ;
// A vowel may be inserted after a coda consonant (Staccato Rhythm)
{} dh {} => dh | [dcl] d ;
// Becomes an onset stop as in 'they'. No [dh] in Korean phonemes..
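The idea of detecting an error by finding which rule alternative matches the student's production can be illustrated with a simplified Python sketch. This is not the SUMMIT rule engine. It ignores rule contexts, hand-expands only the /dh/ rule above (assuming square brackets mark optional phones and `|` separates alternatives), treats the first alternative as the canonical realization, and uses exact matching in place of a best-path alignment.

```python
from itertools import product

# Hand-expanded alternatives for the /dh/ rule, ignoring context.
# The first alternative is assumed to be the canonical realization.
VARIANTS = {
    "dh": [("dh",), ("d",), ("dcl", "d")],
}


def _options(phonemes):
    """Alternatives for each phoneme; unknown phonemes map to themselves."""
    return [VARIANTS.get(p, [(p,)]) for p in phonemes]


def expand(phonemes):
    """Return every surface phone sequence licensed by VARIANTS."""
    return {
        tuple(ph for part in combo for ph in part)
        for combo in product(*_options(phonemes))
    }


def flag_errors(phonemes, observed):
    """List phonemes realized non-canonically, or None if nothing matches."""
    options = _options(phonemes)
    for combo in product(*options):
        if tuple(ph for part in combo for ph in part) == tuple(observed):
            return [
                p for p, part, opts in zip(phonemes, combo, options)
                if part != opts[0]
            ]
    return None


print(flag_errors(["dh", "ey"], ["d", "ey"]))
```

Encoding the expected errors as explicit alternatives is what allows the matched path to be read back as feedback: the flagged phoneme identifies the sound the student substituted.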
Future Plans
• Develop tools to rapidly port to new domains and
languages
– Automatic grammar induction
– Generic dialogue modeling
– Simulated dialogue interactions
• Develop various scoring algorithms for quality
assessment of student’s speech
• Develop high quality synthesis capability for Mandarin
translations, for multiple domains of knowledge
• Collect and transcribe data from language learners and
evaluate both system and students
– Begin with weather domain, our most mature system
– Extend to other domains once they are better developed
• Refine all aspects of systems based on collected data