From Speech Recognition Towards Speech Understanding

Heraeus-Seminar
„Speech Recognition and Speech Understanding“
Physikzentrum Bad Honnef, April 5, 2000
From Speech Recognition
Towards Speech Understanding
Wolfgang Wahlster
German Research Center for Artificial
Intelligence, DFKI GmbH
Stuhlsatzenhausweg 3
66123 Saarbruecken, Germany
phone: (+49 681) 302-5252/4162
fax: (+49 681) 302-5341
e-mail: wahlster@dfki.de
WWW: http://www.dfki.de/~wahlster
Outline
1. Speech-to-Speech Translation: Challenges for Language Technology
2. A Multi-Blackboard Architecture for the Integration of Deep and
Shallow Processing
3. Integrating the Results of Multiple Deep and Shallow Parsers
4. Packed Chart Structures for Partial Semantic Representations
5. Robust Semantic Processing: Merging and Completing Discourse
Representations
6. Combining the Results of Deep and Shallow Translation Threads
7. The Impact of Verbmobil on German Language Industry
8. SmartKom: Integrating Verbmobil Technology Into an Intelligent
Interface Agent
9. Conclusion
Signal-Symbol-Signal Transformations in
Spoken Dialog Systems
Input Speech Signal → Speech Recognition (subsymbolic processing) → Speech Understanding & Generation (symbolic processing) → Speech Synthesis (subsymbolic processing) → Output Speech Signal
 W. Wahlster, DFKI
Three Levels of Language Processing
Telephone speech input passes through three levels of processing; uncertainty is reduced and complexity increases from level to level:
1. Speech Recognition: What has the caller said? About 100 alternatives, constrained by acoustic and language models and word lists.
2. Speech Analysis / Speech Understanding: What has the caller meant? About 10 alternatives, constrained by grammar and lexical meaning.
3. Understanding in the Dialog Context: What does the caller want? An unambiguous interpretation, constrained by the discourse context and knowledge about the domain of discourse.
Challenges for Language Engineering
Verbmobil operates at the hard end of four dimensions of difficulty:
- Input conditions: close-speaking microphone/headset with push-to-talk → telephone with pause-based segmentation → open microphone with GSM quality
- Naturalness: isolated words → read continuous speech → spontaneous speech
- Adaptability: speaker dependent → speaker independent → speaker adaptive
- Dialog capabilities: monolog dictation → information-seeking dialog → multiparty negotiation
Telephone-based Dialog Translation
[Setup diagram: an ISDN conference call with three participants. The German and the American dialog partner are connected through a BinTec Bianca/Brick XS ISDN-LAN router to the Verbmobil server cluster (Sun ULTRA 60/80, Sun Server 450, Linux servers). Verbmobil joins the call as a third participant, and the conference call itself is set up by speech.]
Context-Sensitive Speech-to-Speech Translation
- Wann fährt der nächste Zug nach Hamburg ab? → When does the next train to Hamburg depart?
- Wo befindet sich das nächste Hotel? → Where is the nearest hotel?

Final Verbmobil demos:
- ECAI-2000 (Berlin)
- CeBIT-2000 (Hannover)
- COLING-2000 (Saarbrücken)
Dialog Translation 1
English: If I get the train at 2 o'clock, I am in Frankfurt at 4 o'clock. We could meet at the airport.
German: Wenn ich den Zug um 14 Uhr bekomme, bin ich um 4 in Frankfurt. Am Flughafen könnten wir uns treffen.
Dialog Translation 2
English: We could go out for dinner in the evening. / What time in the evening?
German: Abends könnten wir essen gehen. / Wann denn am Abend?
Dialog Translation 3
English: I could reserve a table for 8 o'clock.
German: Ich könnte für 8 Uhr einen Tisch reservieren.
Verbmobil II: Three Domains of Discourse
- Scenario 1: Appointment Scheduling. Key question: When? Focus on temporal expressions. Vocabulary size: 2,500/6,000.
- Scenario 2: Travel Planning & Hotel Reservation. Key questions: When? Where? How? Focus on temporal and spatial expressions. Vocabulary size: 7,000/10,000.
- Scenario 3: PC-Maintenance Hotline. Key questions: What? When? Where? How? Integration of special sublanguage lexica. Vocabulary size: 15,000/30,000.
Data Collection with Multiple Input Devices
Recordings were made with a room microphone, a close-speaking microphone, a GSM mobile phone, and an ISDN phone:
- > 43 CDs of transliterated speech data with aligned translations
- > 5,000 dialogs
- > 50,000 turns
- > 10,000 lemmata
Extracting Statistical Properties from Large Corpora
Annotated corpora:
- transcribed speech data
- segmented speech with prosodic labels
- dialogs annotated with dialog acts
- treebanks & predicate-argument structures
- aligned bilingual corpora

Machine learning integrates the statistical properties of these corpora into symbolic models for speech recognition, parsing, dialog processing, and translation:
- hidden Markov models
- neural nets / multilayer perceptrons
- probabilistic automata
- probabilistic grammars
- probabilistic transfer rules
Verbmobil Partner
- TU Braunschweig
- Rheinische Friedrich-Wilhelms-Universität Bonn
- DaimlerChrysler
- Ludwig-Maximilians-Universität München
- Universität Bielefeld (Phase 2)
- Universität des Saarlandes
- Technische Universität München
- Universität Hamburg
- Friedrich-Alexander-Universität Erlangen-Nürnberg
- Eberhard-Karls-Universität Tübingen
- Universität Karlsruhe
- Universität Stuttgart
- Ruhr-Universität Bochum
The Control Panel of Verbmobil
From a Multi-Agent Architecture to a Multi-Blackboard
Architecture
Verbmobil I: Multi-Agent Architecture
- Each module must know which module produces what data
- Direct communication between modules
- Each module has only one instance
- Heavy data traffic for moving copies around
- Multiparty and telecooperation applications are impossible
- Software: ICE and ICE Master
- Basic platform: PVM

Verbmobil II: Multi-Blackboard Architecture
- All modules can register for each blackboard dynamically
- No direct communication between modules
- Each module can have several instances
- No copies of representation structures (word lattice, VIT chart)
- Multiparty and telecooperation applications are possible
- Software: PCA and Module Manager
- Basic platform: PVM
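The dynamic-registration idea above can be sketched in a few lines. This is a minimal illustration with hypothetical class and variable names, not the PCA/Module Manager implementation: modules subscribe to a blackboard at run time, several instances of the same module type may listen, and posted items are passed by reference rather than copied.

```python
class Blackboard:
    """A shared data store; modules subscribe instead of messaging each other."""
    def __init__(self, name):
        self.name = name
        self.entries = []         # shared representation structures (no copies)
        self.subscribers = []     # callbacks of registered module instances

    def register(self, callback):
        # any module instance can register dynamically, at any time
        self.subscribers.append(callback)

    def post(self, item):
        self.entries.append(item)
        for cb in self.subscribers:
            cb(item)              # notify with a reference, not a copy

# Two instances of the same module type listening to the same blackboard:
word_lattice = Blackboard("word_lattice")
received = []
word_lattice.register(lambda item: received.append(("parser_1", item)))
word_lattice.register(lambda item: received.append(("parser_2", item)))
word_lattice.post("hypothesis: wir treffen uns")
```

Because both parser instances receive the same posted object, no representation structures are duplicated, matching the "no copies" property claimed for Verbmobil II.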
Multi-Blackboard/Multi-Agent Architecture
All communication runs through blackboards: Blackboard 1 holds the preprocessed speech signal, Blackboard 2 the word lattice, Blackboard 3 the syntactic representation (parsing results), Blackboard 4 the semantic representation (lambda-DRS), and Blackboard 5 the dialog acts. Modules 1-6 read from and write to these blackboards instead of communicating directly.
A Multi-Blackboard Architecture for the Combination
of Results from Deep and Shallow Processing Modules
Audio data is processed by channel/speaker adaptation, a command recognizer, the spontaneous speech recognizer, and prosodic analysis, yielding a word hypothesis graph with prosodic labels. Working on this graph, a statistical parser, a chunk parser, and an HPSG parser, together with dialog act recognition and semantic construction, produce VITs (underspecified discourse representations) on a shared blackboard, which feeds robust dialog semantics, semantic transfer, and generation.
Integrating Shallow and Deep Analysis
Components in a Multi-Blackboard Architecture
The augmented word lattice is analyzed concurrently by the statistical parser, the chunk parser, and the HPSG parser, each of which contributes partial VITs to a chart with a combination of partial VITs. Robust dialog semantics then performs the combination and knowledge-based reconstruction of complete and spanning VITs.
VHG: A Packed Chart Representation of Partial
Semantic Representations
- Incremental chart construction and anytime processing
- Rule-based combination and transformation of partial UDRSs coded as VITs
- Selection of a spanning analysis using a bigram model for VITs (trained on a treebank of 24k VITs)
- Chunk parser using cascaded finite-state transducers (Abney, Hinrichs)
- Statistical LR parser trained on a treebank (Block, Ruland)
- Very fast HPSG parser (see two papers at ACL99, Kiefer, Krieger et al.)
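The bigram-based selection of a spanning analysis can be sketched with a small dynamic program over the chart. Everything here is illustrative: the edges stand in for partial VITs, and the bigram table stands in for the model trained on the VIT treebank.

```python
import math

# Hypothetical chart over a 3-position input: each edge is a partial
# analysis (start, end, semantic label standing in for a partial VIT).
edges = [(0, 1, "np"), (1, 3, "vp"), (0, 2, "np"), (2, 3, "pp")]

# Hypothetical bigram scores over adjacent labels; the real system trains
# such a model on a treebank of 24k VITs.
bigram = {("<s>", "np"): 0.9, ("np", "vp"): 0.8, ("np", "pp"): 0.3,
          ("vp", "</s>"): 0.9, ("pp", "</s>"): 0.7}

def best_spanning(edges, n, floor=1e-6):
    """Best label sequence covering positions 0..n by dynamic programming."""
    best = {0: (0.0, ["<s>"])}            # position -> (log-prob, label path)
    for pos in range(n):                  # edges only go forward, so a left-
        if pos not in best:               # to-right sweep suffices
            continue
        score, path = best[pos]
        for start, end, label in edges:
            if start != pos:
                continue
            p = bigram.get((path[-1], label), floor)
            cand = (score + math.log(p), path + [label])
            if end not in best or cand[0] > best[end][0]:
                best[end] = cand
    score, path = best[n]
    score += math.log(bigram.get((path[-1], "</s>"), floor))
    return path[1:], score

path, logp = best_spanning(edges, 3)      # -> ['np', 'vp'] is the best cover
```

Because partial results are kept in the chart, the search can be interrupted at any position and still return the best cover found so far, which is the anytime property the slide refers to.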
Robust Dialog Semantics: Deep Processing of
Shallow Structures
Goals of robust semantic processing (Pinkal, Worm, Rupp):
- Combination of unrelated analysis fragments
- Completion of incomplete analysis results
- Skipping of irrelevant fragments

Method: transformation rules on the VIT Hypothesis Graph:
Conditions on VIT structures → Operations on VIT structures

The rules are based on various knowledge sources:
- lattice of semantic types
- domain ontology
- sortal restrictions
- semantic constraints

Results: 20% of the analyses are improved; 0.6% get worse.
Semantic Correction of Recognition Errors
German input: "Wir treffen uns Kaiserslautern." (We are meeting Kaiserslautern.)
English output: "We are meeting in Kaiserslautern."
Robust Dialog Semantics: Combining and
Completing Partial Representations
Let us meet (in) the late afternoon to catch the train to Frankfurt
Fragments: [Let us] [meet] [the late afternoon] [to catch] [the train] [to Frankfurt]
The preposition 'in' is missing in all paths through the word hypothesis graph.
A temporal NP is transformed into a temporal modifier using an underspecified temporal relation:
[temporal_np(V1)] → [typeraise_to_mod(V1, V2)] & V2
The modifier is applied to a proposition:
[type(V1, prop), type(V2, mod)] → [apply(V2, V1, V3)] & V3
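The two rules above can be mimicked on toy data. The dictionaries below are stand-ins for VITs (only a semantic type plus an illustrative predicate string), so this is a sketch of the rule mechanism, not of the actual VIT operations.

```python
# Toy stand-ins for VIT fragments: each carries a semantic type, as in the
# rule conditions above; the predicate strings are purely illustrative.
frags = [{"type": "prop", "pred": "meet(we)"},
         {"type": "temporal_np", "pred": "late_afternoon"}]

def typeraise_to_mod(vit):
    """[temporal_np(V1)] -> [typeraise_to_mod(V1, V2)] & V2:
    raise a temporal NP to a modifier with an underspecified relation."""
    if vit["type"] == "temporal_np":
        return {"type": "mod", "pred": "temp_rel(_, %s)" % vit["pred"]}
    return vit

def apply_mod(prop, mod):
    """[type(V1, prop), type(V2, mod)] -> [apply(V2, V1, V3)] & V3."""
    return {"type": "prop", "pred": "%s & %s" % (prop["pred"], mod["pred"])}

raised = [typeraise_to_mod(f) for f in frags]
prop = next(f for f in raised if f["type"] == "prop")
mod = next(f for f in raised if f["type"] == "mod")
result = apply_mod(prop, mod)
# result combines the unrelated fragments into one spanning proposition
```

The underspecified relation (here just a `_` placeholder) is what lets the missing preposition be left open until discourse or domain constraints resolve it.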
The Understanding of Spontaneous Speech
Repairs
Original utterance: "I need a car next Tuesday oops Monday"
- Reparandum: "Tuesday"; editing phase: the hesitation "oops"; repair phase: the reparans "Monday"
After recognition of the substitution, the word hypothesis graph is transformed into: "I need a car next Monday"
Verbmobil technology understands speech repairs and extracts the intended meaning. Dictation systems like ViaVoice, VoiceXpress, FreeSpeech, and Naturally Speaking cannot deal with spontaneous speech and transcribe the corrupted utterances.
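A drastically simplified version of the repair transformation can be written as a word-list rewrite. The hesitation list and the one-word-reparandum heuristic are simplifying assumptions for illustration, not the actual Verbmobil recognizer, which operates on the word hypothesis graph.

```python
# Hypothetical editing-term list; real systems detect hesitations
# prosodically and lexically on the word hypothesis graph.
EDIT_TERMS = {"oops", "uh", "um"}

def repair(words):
    """Replace the reparandum with the reparans around an editing term."""
    for i, w in enumerate(words):
        if w in EDIT_TERMS and 0 < i < len(words) - 1:
            reparans = words[i + 1]
            # naive substitution: drop the word before the hesitation
            # (the reparandum) plus the hesitation, keep the correction
            return words[:i - 1] + [reparans] + words[i + 2:]
    return words

fixed = repair("I need a car next Tuesday oops Monday".split())
# -> ['I', 'need', 'a', 'car', 'next', 'Monday']
```

The key point the slide makes survives even in this sketch: the transformation yields the intended utterance rather than a transcript of the corrupted one.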
Automatic Understanding and Correction of Speech
Repairs in Spontaneous Telephone Dialogs
German input: "Wir treffen uns in Mannheim, äh, in Saarbrücken." (We are meeting in Mannheim, oops, in Saarbruecken.)
English output: "We are meeting in Saarbruecken."
Integrating a Deep HPSG-based Analysis with
Probabilistic Dialog Act Recognition for
Semantic Transfer
The HPSG analysis delivers a VIT, while a probabilistic HMM-based analysis recognizes the dialog act type. The dialog act type feeds the recognition of dialog plans (plan operators), which determines the current dialog phase. Robust dialog semantics passes the VIT, together with dialog act type and dialog phase, to semantic transfer.
The Dialog Act Hierarchy used for Planning,
Prediction, Translation and Generation
The hierarchy is rooted in the node Dialog Act and has three top-level branches:
- CONTROL_DIALOG: GREETING (GREETING_BEGIN, GREETING_END), INTRODUCE, POLITENESS_FORMULA, THANK, DELIBERATE, BACKCHANNEL
- MANAGE_TASK: INIT, DEFER, CLOSE
- PROMOTE_TASK: REQUEST (REQUEST_SUGGEST, REQUEST_CLARIFY, REQUEST_COMMENT, REQUEST_COMMIT), SUGGEST, EXCLUDE, INFORM, CLARIFY, GIVE_REASON, COMMIT, DEVIATE_SCENARIO (REFER_TO_SETTING, DIGRESS), FEEDBACK (FEEDBACK_NEGATIVE: REJECT, EXPLAINED_REJECT; FEEDBACK_POSITIVE: ACCEPT, CONFIRM, CLARIFY_ANSWER)
Combining Statistical and Symbolic Processing
for Dialog Processing
The dialog module combines statistical and symbolic processing:
- Statistical prediction yields dialog act predictions.
- Context evaluation determines the main propositional content and the focus.
- Plan recognition determines the dialog phase.
The recognized dialog act, dialog phase, and content feed dialog-act based translation, transfer by rules, the dialog memory, and the generation of minutes.
Statistical Dialog Act Recognition
- Statistical approach: find the most probable dialog act D for the recognized words W:
  D = argmax_{D'} P(D' | W)
- Bayes' formula:
  D = argmax_{D'} P(W | D') P(D')
- Use of the dialog context H:
  D = argmax_{D'} P(W | D') P(D' | H)
- The word probabilities P(W | D) and dialog act probabilities P(D | H) are estimated from the corpus.
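The final argmax can be sketched directly, assuming toy probability tables; the act names, cue words, and numbers below are illustrative placeholders, not corpus estimates, and P(W | D) is factored into unigram word likelihoods only for brevity.

```python
import math

# Toy probability tables (a real system estimates these from the
# annotated corpus; all entries here are made up for illustration).
P_WORD_GIVEN_ACT = {("suggest", "how"): 0.05, ("suggest", "monday"): 0.03,
                    ("reject", "no"): 0.06, ("reject", "monday"): 0.01}
P_ACT_GIVEN_HIST = {("suggest", ("greet",)): 0.4, ("reject", ("greet",)): 0.1}

def classify(words, history, acts=("suggest", "reject"), floor=1e-4):
    """D = argmax_{D'} P(W | D') P(D' | H), with unigram word likelihoods
    and a floor probability for unseen events."""
    def score(act):
        logp = math.log(P_ACT_GIVEN_HIST.get((act, history), floor))
        for w in words:
            logp += math.log(P_WORD_GIVEN_ACT.get((act, w), floor))
        return logp
    return max(acts, key=score)

act = classify(["how", "monday"], ("greet",))   # -> 'suggest'
```

Working in log space avoids underflow for long word sequences, which is the standard trick for this kind of noisy-channel argmax.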
Learning of Probabilistic Plan Operators from
Annotated Corpora
( OPERATOR-s-10523-6
goal [IN-TURN confirm-s-10523 ?SLASH-3314 ?SLASH-3316]
subgoals (sequence
[IN-TURN confirm-s-10521 ?SLASH-3314 ?SLASH-3315]
[IN-TURN confirm-s-10522 ?SLASH-3315 ?SLASH-3316])
PROB 0.72)
( OPERATOR-s-10521-8
goal [IN-TURN confirm-s-10521 ?SLASH-3321 ?SLASH-3322]
subgoals (sequence [DOMAIN-DEPENDENT accept ?SLASH-3321 ?SLASH-3322])
PROB 0.95)
( OPERATOR-s-10522-10
goal [IN-TURN confirm-s-10522 ?SLASH-3325 ?SLASH-3326]
subgoals (sequence [DOMAIN-DEPENDENT confirm ?SLASH-3325 ?SLASH-3326])
PROB 0.83)
Automatic Generation of Multilingual Protocols
of Telephone Conversations
Dialog translation by Verbmobil is followed by the multilingual generation of protocols: both the German and the American dialog partner receive an HTML document in English, transferred by Internet or fax.
Automatic Generation of Minutes
A and B greet each other.
A: (INIT_DATE, SUGGEST_SUPPORT_DATE, REQUEST_COMMENT_DATE)
I would like to make a date. How about the seventeenth? Is that ok with you?
B: (REJECT_DATE, ACCEPT_DATE)
The seventeenth does not suit me. I’m free for one hour at three o’clock.
A: (SUGGEST_SUPPORT_DATE)
How about the sixteenth in the afternoon?
B: (CLARIFY_QUERY, ACCEPT_DATE, CONFIRM)
The sixteenth at two o’clock? That suits me. Ok.
A and B say goodbye.
Minutes generated automatically on 23 May 1999 08:35:18 h
Dialog Protocol
Participants: Speaker B, Speaker A
Date: 22.3.2000
Time: 8:57 AM to 10:03 AM
Theme: Appointment schedule with trip and accommodation
DIALOGUE RESULTS:
Scheduling:
Speaker B and speaker A will meet in the train station on the 1st of March 2000 at a quarter to 10 in the morning.
Travelling:
There the trip from Hamburg to Hanover by train will start on the 2nd of March at 10 o'clock in the morning.
The way back by train will start on the 2nd of March at half past 6 in the evening.
Accommodation:
The hotel Luisenhof in Hanover was agreed on. Speaker A is taking care of the
hotel reservation.
Summary automatically generated at 22.3.2000 12:31:24 h
Spoken Clarification Dialogs between the
User and the Verbmobil System
Verbmobil conducts spoken clarification subdialogs in German with User 1 (German input) while translating into English for User 2 (English input). Clarification is caused by:
- speech recognition problems
- lack of context knowledge
- inconsistency with regard to the system's knowledge

Examples:
- confusion with similar words (Sonntag vs. Sonntags)
- unknown words (heuer → dieses Jahr)
- lexical ambiguity (Noch einen Termin bitte!)
- inconsistent dates (Freitag, 24. Oktober)
Competing Strategies for Robust Speech
Translation
Concurrent processing modules of Verbmobil combine deep semantic translation with shallow surface-oriented translation methods. Both threads work on the word lattice.

Expensive but precise translation:
- Principled and compositional syntactic and semantic analysis
- Semantic-based transfer of Verbmobil Interface Terms (VITs) as sets of underspecified DRSs

Cheap but approximate translation:
- Case-based translation
- Dialog-act based translation
- Statistical translation

All threads deliver results with confidence values; subject to a timeout on the deep thread, a selection module picks the best result, yielding an acceptable translation rate.
Architecture of the Semantic Transfer Module
The transfer module relates underspecified and refined VITs on both sides: for the source language L1, monolingual refinement rules and disambiguation rules turn the underspecified VIT (L1) into a refined VIT (L1); lexical transfer (via a bilingual dictionary) and phrasal transfer (via a phrasal dictionary) map it onto the refined VIT (L2); monolingual refinement rules and disambiguation rules relate this to the underspecified VIT (L2) of the target language.
Extensions of Discourse Representation
Theory
The Verbmobil version of λ-DRT (Pinkal et al.) includes various extensions of DRT:
- lambda: λ-abstraction over DRSs
- merge operator: combination of DRSs
- functional application: basic composition operation
- quants feature: allows scope-free semantic representation
- alfa expressions: representation of anaphoric elements with underspecified reference
- anchors list: representation of deictic information
- epsilon expressions: underspecification of elliptical expressions
- modal expressions: representation of propositional attitudes
Three English Translations of the German
Word “Termin” Found in the Verbmobil Corpus
1. Verschieben wir den Termin. → Let's reschedule the appointment.
2. Schlagen Sie einen Termin vor. → Suggest a date.
3. Da habe ich einen Termin frei. → I have got a free slot there.

Subsumption relations in the domain model link the three readings to different sorts: appointment falls under scheduled_event (the default), date under set_start_time, and slot under time_interval, all of them temporal specifications.
Entries in the Transfer Lexicon:
German → English (Simplified)
tau_lex (termin, appointment, pred_sort (subsumption (scheduled_event))).
tau_lex (termin, date, pred_sort (subsumption (set_start_time))).
tau_lex (termin, slot, pred_sort (subsumption (time_interval))).
tau_lex (verschieben, reschedule, [tau (#S), tau (#O)], pred_args ([#S, #O & pred_sort (scheduled_event)])).
tau_lex (ausmachen, make, [tau (#S), tau (#O)], pred_args ([#S, #O & pred_sort (scheduled_event)])).
tau_lex (ausmachen, fix, [tau (#S), tau (#O)], pred_args ([#S, #O & pred_sort (set_start_time)])).
tau_lex (freihaben, have_free, [tau (#S), tau (#O)], pred_args ([#S, #O & pred_sort (time_interval)])).
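The mechanism behind these entries — the governing verb imposes a sort on its object, and the target word whose sort subsumes it is chosen — can be sketched as follows. The subsumption links and the flat entry format are approximate reconstructions for illustration, not the actual domain model.

```python
# Hypothetical subsumption links from the domain model (child -> parent);
# the slide's hierarchy is reconstructed here only approximately.
ISA = {"appointment": "scheduled_event", "date": "set_start_time",
       "slot": "time_interval",
       "scheduled_event": "temporal_specification",
       "set_start_time": "temporal_specification",
       "time_interval": "temporal_specification"}

def subsumes(general, specific):
    """True if 'general' is reachable from 'specific' via ISA links."""
    while specific is not None:
        if specific == general:
            return True
        specific = ISA.get(specific)
    return False

# Simplified tau_lex entries: (source word, target word, required sort).
TAU_LEX = [("termin", "appointment", "scheduled_event"),
           ("termin", "date", "set_start_time"),
           ("termin", "slot", "time_interval")]

def translate(word, context_sort):
    """Pick the target word whose required sort subsumes the sort imposed
    by the governing verb (e.g. 'verschieben' imposes scheduled_event)."""
    for src, tgt, sort in TAU_LEX:
        if src == word and subsumes(sort, context_sort):
            return tgt
    return None

translate("termin", "scheduled_event")   # -> 'appointment'
translate("termin", "set_start_time")    # -> 'date'
```

So "Verschieben wir den Termin" selects appointment, while "einen Termin ausmachen" in the fix reading selects date, exactly the pattern the lexicon entries encode.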
Context-Sensitive Translation Exploiting a Discourse
Model
Three different translations of the German word Platz (room / table / seat), chosen from the dialog context:
1. Nehmen wir dieses Hotel, ja. (Let us take this hotel.) Ich reserviere einen Platz. → I reserve a room.
2. Machen wir das Abendessen dort. (Let us have dinner there.) Ich reserviere einen Platz. → I reserve a table.
3. Gehen wir ins Theater. (Let us go to the theater.) Ich möchte Plätze reservieren. → I would like to reserve seats.
All other dialog translation systems translate sentence by sentence without taking the dialog context into account.
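A minimal sketch of such discourse-sensitive lexical choice: a discourse model tracks the most recent relevant referent and licenses one translation of Platz. The cue words and the mapping are assumptions made here for illustration, not the Verbmobil discourse model.

```python
# Hypothetical cue-word -> translation mapping for German 'Platz'.
CONTEXT_TO_TRANSLATION = {"hotel": "room", "dinner": "table", "theater": "seat"}

class DiscourseModel:
    def __init__(self):
        self.topic = None                 # most recent relevant referent

    def observe(self, utterance):
        for cue in CONTEXT_TO_TRANSLATION:
            if cue in utterance.lower():
                self.topic = cue          # update the discourse model

    def translate_platz(self):
        # choose the translation licensed by the current discourse topic
        return CONTEXT_TO_TRANSLATION.get(self.topic, "place")

dm = DiscourseModel()
dm.observe("Let us take this hotel.")
first = dm.translate_platz()              # 'room'
dm.observe("Let us have dinner there.")
second = dm.translate_platz()             # 'table'
```

The same source sentence ("Ich reserviere einen Platz.") thus receives different translations depending only on what the discourse model has observed, which is exactly what a sentence-by-sentence translator cannot do.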
The Use of Underspecified Representations
Two readings in the source language: "Wir telephonierten mit Freunden aus Schweden."
Two readings in the target language: "We called friends from Sweden."
An underspecified semantic representation gives a compact representation of such ambiguities in a logical language without using disjunctions, which enables ambiguity-preserving translations.
The Control Panel of Verbmobil
Integrating Deep and Shallow Processing: Combining
Results from Concurrent Translation Threads
[Example turn, split into two segments: "Wenn wir den Termin vorziehen, das würde mir gut passen."; the slide interleaves a second, English example: "If you prefer another hotel, please let me know."]
Four concurrent threads process each segment: statistical translation, case-based translation, dialog-act based translation, and semantic transfer. All deliver alternative translations with confidence values to the selection module, which chooses the best thread per segment: here, Segment 1 is translated by semantic transfer and Segment 2 by case-based translation.
A Context-Free Approach to the Selection of the Best
Translation Result
SEQ := set of all translation sequences for a turn
Seq ∈ SEQ := sequence of translation segments s1, s2, ..., sn

Input: each translation thread provides for every segment an online confidence value confidence(thread, segment)

Task: compute normalized confidence values for a translated Seq:
CONF(Seq) = Σ_{segment ∈ Seq} Length(segment) * (alpha(thread) + beta(thread) * confidence(thread, segment))

Output: Best(SEQ) = {Seq ∈ SEQ | Seq is a maximal element of SEQ under CONF}
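The CONF formula and the selection of the best sequence translate almost directly into code. The alpha/beta values and the candidate sequences below are made-up placeholders (the real factors are learned offline from an annotated corpus).

```python
# Hypothetical per-thread normalization factors alpha and beta.
ALPHA = {"STAT": 0.1, "SEMT": 0.3}
BETA = {"STAT": 0.5, "SEMT": 0.9}

def conf(seq):
    """CONF(Seq): sum over segments of
    Length(segment) * (alpha(thread) + beta(thread) * confidence)."""
    return sum(length * (ALPHA[thread] + BETA[thread] * c)
               for thread, length, c in seq)

# Candidate sequences: each segment is (thread, length in words, confidence).
candidates = [
    [("SEMT", 5, 0.9), ("SEMT", 3, 0.4)],   # deep transfer throughout
    [("SEMT", 5, 0.9), ("STAT", 3, 0.8)],   # fall back to statistical MT
]
best = max(candidates, key=conf)            # maximal element under CONF
```

The normalization matters because the raw online confidences of different threads are not directly comparable: alpha and beta rescale each thread onto a common quality scale before the maximum is taken.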
Learning the Normalizing Factors Alpha and
Beta from an Annotated Corpus
Turn := segment1, segment2, ..., segmentn
For each turn in a training corpus, all segments translated by one of the four translation threads are manually annotated with a score for translation quality.
For the sequence of n segments resulting in the best overall translation score, at most 4^n linear inequalities are generated, stating that the selected sequence is better than all alternative translation sequences.
From this set of inequalities for spanning analyses (≤ 4^n), the values of alpha and beta can be determined offline by solving the constraint system.
Example of a Linear Inequation Used for Offline Learning
Turn := Segment_1 Segment_2 Segment_3
Statistical translation = STAT, case-based translation = CASE, dialog-act based translation = DIAL, semantic transfer = SEMT
Suppose quality(CASE, Segment_1), quality(SEMT, Segment_2), quality(STAT, Segment_3) is optimal. Then, for example:
Length(Segment_1) * (alpha(CASE) + beta(CASE) * confidence(CASE, Segment_1))
+ Length(Segment_2) * (alpha(SEMT) + beta(SEMT) * confidence(SEMT, Segment_2))
+ Length(Segment_3) * (alpha(STAT) + beta(STAT) * confidence(STAT, Segment_3))
>
Length(Segment_1) * (alpha(DIAL) + beta(DIAL) * confidence(DIAL, Segment_1))
+ Length(Segment_2) * (alpha(DIAL) + beta(DIAL) * confidence(DIAL, Segment_2))
+ Length(Segment_3) * (alpha(DIAL) + beta(DIAL) * confidence(DIAL, Segment_3))
The Context-Sensitive Selection of the Best Translation
Using probabilities of dialog acts in the normalization process:
CONF(Seq) = Σ_{segment ∈ Seq} Length(segment) * (alpha(thread) + dialog-act(thread, segment) + beta(thread) * confidence(thread, segment))
e.g. Greet(Statistical_Translation, Segment) > Greet(Semantic_Transfer, Segment)
Suggest(Semantic_Transfer, Segment) > Suggest(Case_based_Translation, Segment)

Exploiting meta-knowledge:
If the semantic transfer generates ≥ x disambiguation tasks, then increase the alpha and beta values for semantic transfer.
e.g. einen Termin vorziehen → prefer / give priority to / bring forward <a date>

Observation: even on the meta-control level (selection module) a hybrid approach is advantageous.
Verbmobil: Long-Term, Large-Scale Funding and
Its Impact
- Funding by the German Ministry for Education and Research (BMBF): Phase I (1993-1996): $33 M; Phase II (1997-2000): $28 M
- 60% industrial funding according to a shared-cost model
- Additional R&D investments of the industrial partners: $17 M (Phase I) and $11 M (Phase II)
- Total: $89 M

Impact:
- > 400 publications (> 250 refereed)
- Many patents
- > 10 commercial spin-off products
- Many new spin-off companies
- > 100 new jobs in the German language industry
- > 50 academics transferred to industry
Philips, DaimlerChrysler and Siemens are leaders in spoken dialog applications.
SmartKom: Intuitive Multimodal Interaction
Project budget: $34 M; project duration: 4 years.
The SmartKom consortium, with DFKI Saarbrücken as main contractor (project management, testbed, software integration), includes: Univ. of Munich, MediaInterface Dresden, Berkeley, Univ. of the Saarland (Saarbrücken), European Media Lab Heidelberg, Univ. of Erlangen, DaimlerChrysler, Univ. of Stuttgart, and partner sites in Aachen, Ulm, Munich, and Stuttgart.
The Architecture of the SmartKom Agent (cf. Maybury/Wahlster 1998)
Input processing performs media analysis on language, gesture, and biometrics from the user(s), followed by media fusion and intention recognition. Interaction management covers discourse modeling, user modeling, and media design. On the output side, presentation design drives language, graphics, gesture, and an animated presentation agent before rendering. An application interface connects to information applications and people. All components draw on shared representation and inference services: the user model, discourse model, domain model, task model, and media models.
SmartKom: A Transportable and Transmutable Interface
Agent
The kernel of the SmartKom interface agent (media analysis, media design, interaction management, application management) is shared by three instantiations:
- SmartKom-Public: a multimodal communication booth
- SmartKom-Mobile: a handheld communication assistant
- SmartKom-Home/Office: a versatile agent-based interface
SmartKom-Public:
A Multimodal
Communication Booth
- Loudspeaker and room microphone
- Smartcard / credit card for authentication and billing
- Face-tracking camera
- Virtual touchscreen, protected against vandalism
- Multipoint video conferencing
- Docking station for PDA / notebook / camcorder
- High-speed, broad-bandwidth Internet connectivity
- High-resolution scanner
SmartKom-Mobile: A Handheld Communication Assistant
- GPS
- GSM for telephone, fax, and Internet connectivity
- Camera
- Wearable compute server
- Stylus-activated sketch pad
- Microphone and loudspeaker
- Biosensor for authentication & emotional feedback
- Docking station for the car PC
SmartKom-Home/Office:
A Versatile Agent-based Interface
- SpeechMike
- Natural gesture recognition
- Virtual touchscreen
Speech-based Interaction with an Organizer
on a WAP Phone (Voice In - WML out)
"With Maier on 25 October, with Tetzlaff, and with Streit too. Oops, not with Streit. From 2 to 3. Okay!"
Conclusion
- Real-world problems in language technology, such as the understanding of spoken dialogs, speech-to-speech translation, and multimodal dialog systems, can only be cracked by the combined muscle of deep and shallow processing approaches.
- In a multi-blackboard architecture based on packed representations on all processing levels (speech recognition, parsing, semantic processing, translation, generation) and on charts with underspecified representations (e.g. UDRS), the results of concurrent processing threads can be combined incrementally.
Conclusion
- All results of concurrent processing modules should come with a confidence value, so that a selection module can choose the most promising result at each processing stage.
- Packed representations together with formalisms for underspecification capture the uncertainties in each processing phase, so that the uncertainties can be reduced by linguistic, discourse, and domain constraints as soon as they become applicable.
Conclusion
- Deep processing can be used for merging, completing, and repairing the results of shallow processing strategies.
- Shallow methods can be used to guide the search in deep processing.
- Statistical methods must be augmented by symbolic models (e.g. class-based language modelling, word order normalization as part of statistical translation).
- Statistical methods can be used to learn operators or selection strategies for symbolic processes.

It is much more than a balancing act... (see Klavans and Resnik 1996)