Dagstuhl 2000
Wolfgang Wahlster
German Research Center for
Artificial Intelligence, DFKI GmbH
Stuhlsatzenhausweg 3
66123 Saarbruecken, Germany
phone: (+49 681) 302-5252/4162
fax: (+49 681) 302-5341
e-mail: wahlster@dfki.de
WWW: http://www.dfki.de/~wahlster
Speech-controlled coffee machine: A cappuccino in 10 minutes, please!
Speech-based car navigation: Let's go to Baker Street in Berkeley!
Speech-enabled music selection: I would like to hear Mozart's piano concerto No. 3!
Dictation: Send the following email to Mark Maybury: Hi Mark, please forward the following agenda to your project partners!
© Wolfgang Wahlster, DFKI
Information on demand: Show me all CNN news of the last 3 months that feature Bill Clinton discussing health care!
Audio Mining: What has Jim Hendler said about DAML during our recent Dagstuhl seminar?
Speech-to-Speech Translation: I would like to make an appointment with Dr. Kurematsu in Kyoto next week!
Speech Input
Acoustic Language Models
Speech Recognition
Word Lists
What has the speaker said? 100 Alternatives
Speech Analysis
Grammar
Lexical Meaning
Speech Understanding
What has the speaker meant? 10 Alternatives
Discourse Context
Knowledge about Domain of Discourse
What does the speaker want? Unambiguous Understanding in the Dialog Context
Input Conditions | Naturalness | Adaptability | Dialog Capabilities
Close-Speaking Microphone/Headset, Push-to-talk | Isolated Words | Speaker Dependent | Monolog Dictation
Telephone, Pause-based Segmentation | Read Continuous Speech | Speaker Independent | Information-seeking Dialog
Open Microphone, GSM Quality | Spontaneous Speech | Speaker adaptive | Multiparty Negotiation
Wann fährt der nächste
Zug nach Hamburg ab?
Wo befindet sich das nächste
Hotel?
When does the next train to Hamburg depart?
Where is the nearest hotel?
Verbmobil
Server
As the name Verbmobil suggests, the system supports verbal communication with foreign dialog partners in mobile situations:
1. face-to-face conversations
2. telecommunication
Verbmobil Speech
Translation Server
Solution: Conference Call: The Verbmobil Speech Translation Server is accessed by GSM mobile phones.
Audio Signal Recognizers
German
English
Japanese
Word Hypotheses Graph
Edge = Word
Best Hypothesis
Acoustic Score
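The word hypotheses graph above can be illustrated with a small sketch. It assumes that each edge carries one word and an acoustic score interpreted as a cost (lower is better), that nodes are integer time points, and that every edge runs from a smaller to a larger node. The function name, the toy words, and the scores are all invented for illustration; this is not the Verbmobil implementation.

```python
from collections import defaultdict

def best_hypothesis(edges, start, end):
    """Return (total_cost, words) of the cheapest path from start to end.

    edges: list of (from_node, to_node, word, cost). Assumption: every edge
    goes from a smaller to a larger integer node, so the graph is acyclic
    and nodes can be processed in numeric order.
    """
    outgoing = defaultdict(list)
    for src, dst, word, cost in edges:
        outgoing[src].append((dst, word, cost))

    best = {start: (0.0, [])}  # node -> (cost so far, words so far)
    for node in sorted({e[0] for e in edges} | {start}):
        if node not in best:
            continue  # node is not reachable from the start node
        cost_so_far, words = best[node]
        for dst, word, cost in outgoing[node]:
            candidate = (cost_so_far + cost, words + [word])
            if dst not in best or candidate[0] < best[dst][0]:
                best[dst] = candidate
    return best.get(end)  # None if the end node cannot be reached

# Toy graph: words and scores are made up for this example.
toy_edges = [
    (0, 1, "we", 1.0),
    (0, 1, "he", 1.5),
    (1, 2, "meet", 2.0),
    (1, 2, "meat", 2.4),
    (2, 3, "tomorrow", 1.2),
]
best_cost, best_words = best_hypothesis(toy_edges, 0, 3)
```

Keeping the whole graph rather than only the best path is what allows later stages to revise the choice when further constraints become available.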
Massive Data Collection Efforts
Transliteration Variant 1
Transliteration Variant 2
Lexical Orthography
Canonical Pronunciation
Manual Phonological Segmentation
Automatic Phonological Segmentation
Word Segmentation
Prosodic Segmentation
Dialog Acts
Noises
Superimposed Speech
Syntactic Category
Word Category
Syntactic Function
Prosodic Boundaries
The so-called Partitur (German word for musical score) orchestrates fifteen strata of annotations
3,200 dialogs (182 hours) with 1,658 speakers
79,562 turns distributed on
56 CDs, 21.5 GB
Extracting Statistical Properties from Large Corpora
Transcribed Speech Data
Segmented Speech with Prosodic Labels
Annotated Dialogs with Dialog Acts
Treebanks & Predicate-Argument Structures
Aligned Bilingual Corpora
Machine Learning for the Integration of Statistical Properties into Symbolic Models for Speech Recognition, Parsing, Dialog Processing, Translation
Hidden Markov Models
Neural Nets, Multilayered Perceptrons
Probabilistic Automata
Probabilistic Grammars
Probabilistic Transfer Rules
Multi-Agent Architecture
Modules: M1, M2, M3, M4, M5, M6
Each module must know which module produces what data
Direct communication between modules
Each module has only one instance
Heavy data traffic for moving copies around
Multiparty and telecooperation applications are impossible
Software: ICE and ICE Master
Basic Platform: PVM
Multi-Blackboard Architecture
Modules: M1, M2, M3, M4, M5, M6
Blackboards: BB 1, BB 2, BB 3
All modules can register for each blackboard dynamically
No direct communication between modules
Each module can have several instances
No copies of representation structures
(word lattice, VIT chart)
Multiparty and Telecooperation applications are possible
Software: PCA and Module Manager
Basic Platform: PVM
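The contrast between the two architectures can be sketched in a few lines. The example below is a single-process toy, assuming a simple callback interface; the class `Blackboard`, its methods, and the module labels are hypothetical and do not reflect the actual PCA or Module Manager software. It shows modules registering dynamically, several module instances reading the same blackboard, and no module addressing another module directly.

```python
class Blackboard:
    """Shared store: modules post entries, registered modules are notified."""

    def __init__(self, name):
        self.name = name
        self.entries = []      # single shared copy of every posted item
        self.subscribers = []  # callbacks registered at run time

    def register(self, callback):
        """Register a module callback; can be called at any time."""
        self.subscribers.append(callback)

    def post(self, item):
        """Store the item once and notify every registered module."""
        self.entries.append(item)
        for callback in list(self.subscribers):
            callback(item)

word_lattice = Blackboard("word lattice")
vit_chart = Blackboard("VIT chart")

def make_parser(label):
    """Build a toy parser instance that reads one blackboard and writes another."""
    def on_lattice(item):
        vit_chart.post((label, item))
    return on_lattice

# Two parser instances register for the same blackboard.
word_lattice.register(make_parser("chunk parser"))
word_lattice.register(make_parser("statistical parser"))

received = []
vit_chart.register(received.append)

# A recognizer would post here; it does not know who consumes the data.
word_lattice.post("wir treffen uns")
```

Because producers only know the blackboard, adding a further parser instance requires no change to the recognizer.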
A Multi-Blackboard Architecture for the Combination of Results from Deep and Shallow Processing Modules
Command Recognizer
Audio Data
Spontaneous Speech Recognizer
Channel/Speaker Adaptation
Prosodic Analysis
Statistical Parser
Chunk Parser
Dialog Act Recognition
Word Hypotheses Graph with Prosodic Labels
HPSG Parser
Semantic Construction
Robust Dialog Semantics
VITs: Underspecified Discourse Representations
Semantic Transfer
Generation
The Use of Prosodic Information at All Processing Stages
Speech Signal, Word Hypotheses Graph
Multilingual Prosody Module
Prosodic features: duration, pitch, energy, pause
Prosodic information: Accented Words, Boundary Information, Sentence Mood, Prosodic Feature Vector
Parsing: Search Space Restriction
Dialog Understanding: Dialog Act Segmentation and Recognition
Translation: Constraints for Transfer
Generation: Lexical Choice
Speech Synthesis: Speaker Adaptation
Concurrent processing modules combine deep semantic translation with shallow surface-oriented translation methods.
Expensive, but precise Translation
Principled and compositional syntactic and semantic analysis
Semantic-based transfer of Verbmobil Interface Terms (VITs) as a set of underspecified DRSs
Results with Confidence Values
Word Lattice time out?
Cheap, but approximate Translation
Case-based Translation
Dialog-act based translation
Statistical translation
Selection of best result
Results with Confidence Values
Acceptable Translation Rate
Integrating Shallow and Deep Analysis Components in a Multi-Blackboard Architecture
Augmented Word Hypotheses Graph
Statistical Parser: partial VITs
Chunk Parser: partial VITs
HPSG Parser: partial VITs
Chart with a combination of partial VITs
Robust Dialog Semantics: Combination and knowledge-based reconstruction of complete VITs
Complete and Spanning VITs
VHG: A Packed Chart Representation of Partial Semantic Representations
Incremental chart construction and anytime processing
Rule-based combination and transformation of partial UDRS coded as VITs
Selection of a spanning analysis using a bigram model for VITs
(trained on a tree bank of 24 k VITs)
Chart Parser using cascaded finite-state transducers
Statistical LR parser trained on treebank
Very fast HPSG parser
Semantic Construction
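The selection of a spanning analysis with a bigram model can be sketched as a dynamic program over chart edges. The version below is a simplification under stated assumptions: each chart edge is reduced to a (start, end, label) triple, the bigram model is a plain function returning log probabilities over labels, and all labels and numbers are invented. The actual model is trained on a treebank of VITs, and its details are not given on the slide.

```python
def best_spanning_sequence(edges, n, logprob):
    """Pick the best sequence of adjacent chart edges covering positions 0..n.

    edges: list of (start, end, label) with 0 <= start < end <= n.
    logprob(prev_label, label): bigram log probability (assumed interface).
    Returns (score, labels) or None if no spanning sequence exists.
    """
    # Processing edges by start position guarantees predecessors come first.
    order = sorted(range(len(edges)), key=lambda i: edges[i][0])
    best = {}  # edge index -> (score, labels) of best sequence ending there
    for i in order:
        start, _end, label = edges[i]
        if start == 0:
            best[i] = (logprob("<s>", label), [label])
            continue
        options = [
            (score + logprob(edges[j][2], label), labels + [label])
            for j, (score, labels) in best.items()
            if edges[j][1] == start
        ]
        if options:
            best[i] = max(options, key=lambda o: o[0])
    finals = [
        (score + logprob(edges[i][2], "</s>"), labels)
        for i, (score, labels) in best.items()
        if edges[i][1] == n
    ]
    return max(finals, key=lambda f: f[0]) if finals else None

# Toy bigram table; unseen pairs get a low default score.
TOY_BIGRAMS = {
    ("<s>", "NP"): -0.5, ("NP", "VP"): -0.3, ("VP", "</s>"): -0.2,
    ("<s>", "FRAG"): -1.0, ("FRAG", "PP"): -1.5, ("PP", "</s>"): -0.4,
}

def toy_logprob(prev_label, label):
    return TOY_BIGRAMS.get((prev_label, label), -5.0)

toy_chart = [(0, 1, "NP"), (1, 3, "VP"), (0, 2, "FRAG"), (2, 3, "PP")]
spanning = best_spanning_sequence(toy_chart, 3, toy_logprob)
```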
The Understanding of Spontaneous Speech Repairs
Original Utterance: I need a car next Tuesday oops Monday
Reparandum: Tuesday
Editing Phase (Hesitation): oops
Repair Phase (Reparans): Monday
Recognition of Substitutions
Transformation of the Word Hypothesis Graph
Intended utterance: I need a car next Monday
Verbmobil Technology: Understands Speech Repairs and extracts the intended meaning
Dictation systems such as ViaVoice, VoiceXpress, FreeSpeech, and NaturallySpeaking cannot deal with spontaneous speech; they transcribe the corrupted utterances as spoken.
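The substitution repair described above can be sketched on a flat word sequence. This is a deliberate simplification: the slides describe a transformation of the word hypothesis graph, whereas the heuristic below works on a single word list. The set of editing terms, the look-back window, and the function name are assumptions made for illustration only.

```python
EDITING_TERMS = {"oops", "uh", "äh"}  # assumed hesitation markers

def repair(words, window=3):
    """Remove the reparandum and the editing term, keep the reparans.

    Heuristic: at an editing term, look at the first word of the restart.
    If that word also occurs within the last `window` words, the reparandum
    starts there; otherwise only the single preceding word is replaced.
    """
    result = []
    for i, word in enumerate(words):
        is_edit = word.lower() in EDITING_TERMS
        if is_edit and result and i + 1 < len(words):
            restart = words[i + 1]
            lookback = result[-window:]
            if restart in lookback:
                # Cut at the last occurrence of the restart word.
                offset = len(lookback) - 1 - lookback[::-1].index(restart)
                cut = len(result) - len(lookback) + offset
            else:
                cut = len(result) - 1
            del result[cut:]  # drop the reparandum; the editing term is skipped
        else:
            result.append(word)
    return result

fixed_car = repair("I need a car next Tuesday oops Monday".split())
fixed_meeting = repair("We are meeting in Mannheim oops in Saarbruecken".split())
```

The second call shows a two-word reparandum ("in Mannheim") being replaced because the restart repeats the preposition.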
Automatic Understanding and Correction of Speech Repairs in Spontaneous Telephone Dialogs
German: Wir treffen uns in Mannheim, äh, in Saarbrücken.
(We are meeting in Mannheim, oops, in Saarbruecken.)
English: We are meeting in Saarbruecken.
Robust Dialog Semantics: Combining and Completing Partial Representations
Let us meet (in) the late afternoon to catch the train to Frankfurt
Let us meet the late afternoon to catch the train to Frankfurt
The preposition 'in' is missing in all paths through the word hypotheses graph.
A temporal NP is transformed into a temporal modifier using an underspecified temporal relation:
[temporal_np(V1)] → [typeraise_to_mod(V1, V2)] & V2
The modifier is applied to a proposition:
[type(V1, prop), type(V2, mod)] → [apply(V2, V1, V3)] & V3
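The two rules above can be mimicked with a small sketch. It assumes a toy representation in which every partial analysis is a typed item; the class `Item`, the function names, and the marker string for the underspecified relation are hypothetical and only loosely follow the rule notation on the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    kind: str        # "temporal_np", "mod", or "prop"
    content: object  # toy payload standing in for a partial VIT

def typeraise_to_mod(item):
    """Turn a temporal NP into a modifier with an underspecified relation."""
    if item.kind != "temporal_np":
        return None  # rule does not apply
    return Item("mod", ("temp_rel_underspecified", item.content))

def apply_mod(mod, prop):
    """Apply a modifier to a proposition, yielding a new proposition."""
    if mod.kind != "mod" or prop.kind != "prop":
        return None  # rule does not apply
    return Item("prop", (prop.content, mod.content))

late_afternoon = Item("temporal_np", "the late afternoon")
meeting = Item("prop", "meet(us)")
modifier = typeraise_to_mod(late_afternoon)
combined = apply_mod(modifier, meeting)
```

The relation stays underspecified, so the missing preposition does not have to be guessed at this stage.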
Integrating Deep and Shallow Processing: Combining Results from Concurrent Translation Threads
Segment 1: If you prefer another hotel,
Segment 2: please let me know.
Concurrent translation threads: Statistical Translation, Case-Based Translation, Dialog-Act Based Translation, Semantic Transfer
Alternative Translations with Confidence Values
Selection Module
Segment 1: Translated by Semantic Transfer
Segment 2: Translated by Case-Based Translation
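The per-segment selection can be sketched as follows. The sketch assumes that every translation thread returns a confidence value on a common scale, which in practice would need calibration across engines. The function name, the placeholder output strings, and the confidence numbers are invented; they are chosen only so that the outcome matches the slide.

```python
def select_translations(candidates_by_segment):
    """For each segment, keep the candidate with the highest confidence.

    candidates_by_segment: dict mapping a segment id to a list of
    (engine, translation, confidence) tuples. Assumption: confidences
    from different engines are directly comparable.
    """
    return {
        segment: max(candidates, key=lambda c: c[2])
        for segment, candidates in candidates_by_segment.items()
    }

# Placeholder outputs and made-up confidence values.
toy_candidates = {
    "segment 1": [
        ("semantic transfer", "<output A1>", 0.82),
        ("case-based", "<output B1>", 0.55),
        ("statistical", "<output C1>", 0.61),
    ],
    "segment 2": [
        ("semantic transfer", "<output A2>", 0.40),
        ("case-based", "<output B2>", 0.90),
        ("dialog-act based", "<output D2>", 0.35),
    ],
}
chosen = select_translations(toy_candidates)
```

Selecting per segment rather than per utterance lets one utterance mix results from deep and shallow threads.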
Sentence to synthesize: I have time on monday.
[Figure: graph over the words I, have, time, on, monday; edge direction marked]
Linguatronic: Spoken Dialogs with Mercedes-Benz
Please call Doris Wahlster.
Microphone
Push-to-talk
Switch
Open the left window in the back.
I want to hear the weather channel.
When will I reach the next gas station?
Where is the next parking lot?
Speech control of: cellular phone, radio, windows / AC, route guidance system
Option for S-, C-, and E-Class of Mercedes and BMW
Speaker-independent, Garbage models for non-speech (blinker, AC, wheels)
International Research Trends in Multilingual Systems
Multilingual Language Technology
Speech Recognition, Language Understanding, Language Generation, and Speech Synthesis
Dialog Translation
Call Centers
E-Commerce
Mobile Travel Assistance
Telephone Translations
Verbmobil
Multilingual Indexing and Annotation of Videos
Video Archives
News Archives
Multilingual Audio Retrieval and Audio Mining
Discussions
Lecture Notes
Organizers
Speech-based Web Access to Multilingual Web Pages
WAP Phones
WebTV
Multilingual and Mobile Communication Assistants
Multimodal Interfaces
SmartKom
Spontaneous Speech, Robust Processing and Translation, Semantic and Pragmatic Understanding
Real-world problems in language technology, such as the understanding of spoken dialogs, speech-to-speech translation, and multimodal dialog systems, can only be cracked by the combined muscle of deep and shallow processing approaches.
In a multi-blackboard architecture based on packed representations on all processing levels (speech recognition, parsing, semantic processing, translation, generation), using charts with underspecified representations (e.g. UDRS), the results of concurrent processing threads can be combined incrementally.
All results of concurrent processing modules should come with a confidence value, so that a selection module can choose the most promising result at each processing stage.
Packed representations together with formalisms for underspecification capture the uncertainties in each processing phase, so that they can be reduced by linguistic, discourse, and domain constraints as soon as these become applicable.
Deep Processing can be used for merging, completing and repairing the results of shallow processing strategies.
Shallow methods can be used to guide the search in deep processing.
Statistical methods must be augmented by symbolic models (e.g. class-based language modelling, word order normalization as part of statistical translation).
Statistical methods can be used to learn operators or selection strategies for symbolic processes.
Problems with current machine learning approaches
- Expensive data collection
- Cognitively unrealistic training data
- Data sparseness
Problems with current hand-crafted knowledge sources
- Brittleness
- Domain dependence
- Limited scalability
-500 years | TODAY | +50 years
Oral Society | Textual Society | Oral Society
News and knowledge are passed orally | News and knowledge are passed textually | News and knowledge are passed orally
No mass storage | Mass storage of texts | Mass storage of speech
No automatic processing | Text Processing | Speech Processing
No automatic retrieval | Text Retrieval | Audio Retrieval