From Speech Recognition Towards Speech Understanding

Heraeus-Seminar
„Speech Recognition and Speech Understanding“
Physikzentrum Bad Honnef, April 5, 2000
From Speech Recognition
Towards Speech Understanding
Wolfgang Wahlster
German Research Center for Artificial
Intelligence, DFKI GmbH
Stuhlsatzenhausweg 3
66123 Saarbruecken, Germany
phone: (+49 681) 302-5252/4162
fax: (+49 681) 302-5341
e-mail: wahlster@dfki.de
WWW: http://www.dfki.de/~wahlster
Outline
1. Speech-to-Speech Translation: Challenges for Language Technology
2. A Multi-Blackboard Architecture for the Integration of Deep and
Shallow Processing
3. Integrating the Results of Multiple Deep and Shallow Parsers
4. Packed Chart Structures for Partial Semantic Representations
5. Robust Semantic Processing: Merging and Completing Discourse
Representations
6. Combining the Results of Deep and Shallow Translation Threads
7. The Impact of Verbmobil on German Language Industry
8. SmartKom: Integrating Verbmobil Technology Into an Intelligent
Interface Agent
9. Conclusion
Signal-Symbol-Signal Transformations in
Spoken Dialog Systems
Input Speech Signal → Speech Recognition (subsymbolic processing) → Speech Understanding & Generation (symbolic processing) → Speech Synthesis (subsymbolic processing) → Output Speech Signal
 W. Wahlster, DFKI
Three Levels of Language Processing
Telephone speech input passes through three levels of processing; uncertainty is reduced and complexity increases from level to level:
1. Speech Recognition: What has the caller said? About 100 alternatives, constrained by acoustic and language models and word lists.
2. Speech Analysis / Speech Understanding: What has the caller meant? About 10 alternatives, constrained by grammar and lexical meaning.
3. Understanding in the Dialog Context: What does the caller want? An unambiguous interpretation, constrained by the discourse context and knowledge about the domain of discourse.
Challenges for Language Engineering
Verbmobil operates at the hard end of four dimensions of difficulty:
- Input conditions: close-speaking microphone/headset with push-to-talk → telephone with pause-based segmentation → open microphone with GSM quality
- Naturalness: isolated words → read continuous speech → spontaneous speech
- Adaptability: speaker dependent → speaker independent → speaker adaptive
- Dialog capabilities: monolog dictation → information-seeking dialog → multiparty negotiation
Telephone-based Dialog Translation
[Setup diagram: an ISDN conference call with three participants. The German and the American dialog partner are connected through a BinTec Bianca/Brick XS ISDN-LAN router to the Verbmobil server cluster (Sun ULTRA 60/80, Sun Server 450, Linux servers). Verbmobil joins the call as a third participant, and the conference call itself is set up by speech.]
Context-Sensitive Speech-to-Speech Translation
- Wann fährt der nächste Zug nach Hamburg ab? → When does the next train to Hamburg depart?
- Wo befindet sich das nächste Hotel? → Where is the nearest hotel?

Final Verbmobil demos:
- ECAI-2000 (Berlin)
- CeBIT-2000 (Hannover)
- COLING-2000 (Saarbrücken)
Dialog Translation 1
English: If I get the train at 2 o'clock, I am in Frankfurt at 4 o'clock. We could meet at the airport.
German: Wenn ich den Zug um 14 Uhr bekomme, bin ich um 4 in Frankfurt. Am Flughafen könnten wir uns treffen.
Dialog Translation 2
English: We could go out for dinner in the evening. / What time in the evening?
German: Abends könnten wir essen gehen. / Wann denn am Abend?
Dialog Translation 3
English: I could reserve a table for 8 o'clock.
German: Ich könnte für 8 Uhr einen Tisch reservieren.
Verbmobil II: Three Domains of Discourse
- Scenario 1: Appointment Scheduling. Key question: When? Focus on temporal expressions. Vocabulary size: 2,500/6,000.
- Scenario 2: Travel Planning & Hotel Reservation. Key questions: When? Where? How? Focus on temporal and spatial expressions. Vocabulary size: 7,000/10,000.
- Scenario 3: PC-Maintenance Hotline. Key questions: What? When? Where? How? Integration of special sublanguage lexica. Vocabulary size: 15,000/30,000.
Data Collection with Multiple Input Devices
Recordings were made with a room microphone, a close-speaking microphone, a GSM mobile phone, and an ISDN phone:
- > 43 CDs of transliterated speech data with aligned translations
- > 5,000 dialogs
- > 50,000 turns
- > 10,000 lemmata
Extracting Statistical Properties from Large Corpora
Annotated corpora:
- transcribed speech data
- segmented speech with prosodic labels
- dialogs annotated with dialog acts
- treebanks & predicate-argument structures
- aligned bilingual corpora

Machine learning integrates the statistical properties of these corpora into symbolic models for speech recognition, parsing, dialog processing, and translation:
- hidden Markov models
- neural nets / multilayer perceptrons
- probabilistic automata
- probabilistic grammars
- probabilistic transfer rules
Verbmobil Partner
- TU Braunschweig
- Rheinische Friedrich-Wilhelms-Universität Bonn
- DaimlerChrysler
- Ludwig-Maximilians-Universität München
- Universität Bielefeld (Phase 2)
- Universität des Saarlandes
- Technische Universität München
- Universität Hamburg
- Friedrich-Alexander-Universität Erlangen-Nürnberg
- Eberhard-Karls-Universität Tübingen
- Universität Karlsruhe
- Universität Stuttgart
- Ruhr-Universität Bochum
The Control Panel of Verbmobil
From a Multi-Agent Architecture to a Multi-Blackboard
Architecture
Verbmobil I: Multi-Agent Architecture
- Each module must know which module produces what data
- Direct communication between modules
- Each module has only one instance
- Heavy data traffic for moving copies around
- Multiparty and telecooperation applications are impossible
- Software: ICE and ICE Master
- Basic platform: PVM

Verbmobil II: Multi-Blackboard Architecture
- All modules can register for each blackboard dynamically
- No direct communication between modules
- Each module can have several instances
- No copies of representation structures (word lattice, VIT chart)
- Multiparty and telecooperation applications are possible
- Software: PCA and Module Manager
- Basic platform: PVM
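The dynamic-registration idea above can be sketched in a few lines. This is a minimal illustration with hypothetical class and variable names, not the PCA/Module Manager implementation: modules subscribe to a blackboard at run time, several instances of the same module type may listen, and posted items are passed by reference rather than copied.

```python
class Blackboard:
    """A shared data store; modules subscribe instead of messaging each other."""
    def __init__(self, name):
        self.name = name
        self.entries = []         # shared representation structures (no copies)
        self.subscribers = []     # callbacks of registered module instances

    def register(self, callback):
        # any module instance can register dynamically, at any time
        self.subscribers.append(callback)

    def post(self, item):
        self.entries.append(item)
        for cb in self.subscribers:
            cb(item)              # notify with a reference, not a copy

# Two instances of the same module type listening to the same blackboard:
word_lattice = Blackboard("word_lattice")
received = []
word_lattice.register(lambda item: received.append(("parser_1", item)))
word_lattice.register(lambda item: received.append(("parser_2", item)))
word_lattice.post("hypothesis: wir treffen uns")
```

Because both parser instances receive the same posted object, no representation structures are duplicated, matching the "no copies" property claimed for Verbmobil II.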
Multi-Blackboard/Multi-Agent Architecture
All communication runs through blackboards: Blackboard 1 holds the preprocessed speech signal, Blackboard 2 the word lattice, Blackboard 3 the syntactic representation (parsing results), Blackboard 4 the semantic representation (lambda-DRS), and Blackboard 5 the dialog acts. Modules 1-6 read from and write to these blackboards instead of communicating directly.
A Multi-Blackboard Architecture for the Combination
of Results from Deep and Shallow Processing Modules
Audio data is processed by channel/speaker adaptation, a command recognizer, the spontaneous speech recognizer, and prosodic analysis, yielding a word hypothesis graph with prosodic labels. Working on this graph, a statistical parser, a chunk parser, and an HPSG parser, together with dialog act recognition and semantic construction, produce VITs (underspecified discourse representations) on a shared blackboard, which feeds robust dialog semantics, semantic transfer, and generation.
Integrating Shallow and Deep Analysis
Components in a Multi-Blackboard Architecture
The augmented word lattice is analyzed concurrently by the statistical parser, the chunk parser, and the HPSG parser, each of which contributes partial VITs to a chart with a combination of partial VITs. Robust dialog semantics then performs the combination and knowledge-based reconstruction of complete and spanning VITs.
VHG: A Packed Chart Representation of Partial
Semantic Representations
- Incremental chart construction and anytime processing
- Rule-based combination and transformation of partial UDRSs coded as VITs
- Selection of a spanning analysis using a bigram model for VITs (trained on a treebank of 24k VITs)
- Chunk parser using cascaded finite-state transducers (Abney, Hinrichs)
- Statistical LR parser trained on a treebank (Block, Ruland)
- Very fast HPSG parser (see two papers at ACL99, Kiefer, Krieger et al.)
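The bigram-based selection of a spanning analysis can be sketched with a small dynamic program over the chart. Everything here is illustrative: the edges stand in for partial VITs, and the bigram table stands in for the model trained on the VIT treebank.

```python
import math

# Hypothetical chart over a 3-position input: each edge is a partial
# analysis (start, end, semantic label standing in for a partial VIT).
edges = [(0, 1, "np"), (1, 3, "vp"), (0, 2, "np"), (2, 3, "pp")]

# Hypothetical bigram scores over adjacent labels; the real system trains
# such a model on a treebank of 24k VITs.
bigram = {("<s>", "np"): 0.9, ("np", "vp"): 0.8, ("np", "pp"): 0.3,
          ("vp", "</s>"): 0.9, ("pp", "</s>"): 0.7}

def best_spanning(edges, n, floor=1e-6):
    """Best label sequence covering positions 0..n by dynamic programming."""
    best = {0: (0.0, ["<s>"])}            # position -> (log-prob, label path)
    for pos in range(n):                  # edges only go forward, so a left-
        if pos not in best:               # to-right sweep suffices
            continue
        score, path = best[pos]
        for start, end, label in edges:
            if start != pos:
                continue
            p = bigram.get((path[-1], label), floor)
            cand = (score + math.log(p), path + [label])
            if end not in best or cand[0] > best[end][0]:
                best[end] = cand
    score, path = best[n]
    score += math.log(bigram.get((path[-1], "</s>"), floor))
    return path[1:], score

path, logp = best_spanning(edges, 3)      # -> ['np', 'vp'] is the best cover
```

Because partial results are kept in the chart, the search can be interrupted at any position and still return the best cover found so far, which is the anytime property the slide refers to.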
Robust Dialog Semantics: Deep Processing of
Shallow Structures
Goals of robust semantic processing (Pinkal, Worm, Rupp):
- Combination of unrelated analysis fragments
- Completion of incomplete analysis results
- Skipping of irrelevant fragments

Method: transformation rules on the VIT Hypothesis Graph:
Conditions on VIT structures → Operations on VIT structures

The rules are based on various knowledge sources:
- lattice of semantic types
- domain ontology
- sortal restrictions
- semantic constraints

Results: 20% of the analyses are improved; 0.6% get worse.
Semantic Correction of Recognition Errors
German input: "Wir treffen uns Kaiserslautern." (We are meeting Kaiserslautern.)
English output: "We are meeting in Kaiserslautern."
Robust Dialog Semantics: Combining and
Completing Partial Representations
Let us meet (in) the late afternoon to catch the train to Frankfurt
Fragments: [Let us] [meet] [the late afternoon] [to catch] [the train] [to Frankfurt]
The preposition 'in' is missing in all paths through the word hypothesis graph.
A temporal NP is transformed into a temporal modifier using an underspecified temporal relation:
[temporal_np(V1)] → [typeraise_to_mod(V1, V2)] & V2
The modifier is applied to a proposition:
[type(V1, prop), type(V2, mod)] → [apply(V2, V1, V3)] & V3
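The two rules above can be mimicked on toy data. The dictionaries below are stand-ins for VITs (only a semantic type plus an illustrative predicate string), so this is a sketch of the rule mechanism, not of the actual VIT operations.

```python
# Toy stand-ins for VIT fragments: each carries a semantic type, as in the
# rule conditions above; the predicate strings are purely illustrative.
frags = [{"type": "prop", "pred": "meet(we)"},
         {"type": "temporal_np", "pred": "late_afternoon"}]

def typeraise_to_mod(vit):
    """[temporal_np(V1)] -> [typeraise_to_mod(V1, V2)] & V2:
    raise a temporal NP to a modifier with an underspecified relation."""
    if vit["type"] == "temporal_np":
        return {"type": "mod", "pred": "temp_rel(_, %s)" % vit["pred"]}
    return vit

def apply_mod(prop, mod):
    """[type(V1, prop), type(V2, mod)] -> [apply(V2, V1, V3)] & V3."""
    return {"type": "prop", "pred": "%s & %s" % (prop["pred"], mod["pred"])}

raised = [typeraise_to_mod(f) for f in frags]
prop = next(f for f in raised if f["type"] == "prop")
mod = next(f for f in raised if f["type"] == "mod")
result = apply_mod(prop, mod)
# result combines the unrelated fragments into one spanning proposition
```

The underspecified relation (here just a `_` placeholder) is what lets the missing preposition be left open until discourse or domain constraints resolve it.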
The Understanding of Spontaneous Speech
Repairs
Original utterance: "I need a car next Tuesday oops Monday"
- Reparandum: "Tuesday"; editing phase: the hesitation "oops"; repair phase: the reparans "Monday"
After recognition of the substitution, the word hypothesis graph is transformed into: "I need a car next Monday"
Verbmobil technology understands speech repairs and extracts the intended meaning. Dictation systems like ViaVoice, VoiceXpress, FreeSpeech, and Naturally Speaking cannot deal with spontaneous speech and transcribe the corrupted utterances.
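A drastically simplified version of the repair transformation can be written as a word-list rewrite. The hesitation list and the one-word-reparandum heuristic are simplifying assumptions for illustration, not the actual Verbmobil recognizer, which operates on the word hypothesis graph.

```python
# Hypothetical editing-term list; real systems detect hesitations
# prosodically and lexically on the word hypothesis graph.
EDIT_TERMS = {"oops", "uh", "um"}

def repair(words):
    """Replace the reparandum with the reparans around an editing term."""
    for i, w in enumerate(words):
        if w in EDIT_TERMS and 0 < i < len(words) - 1:
            reparans = words[i + 1]
            # naive substitution: drop the word before the hesitation
            # (the reparandum) plus the hesitation, keep the correction
            return words[:i - 1] + [reparans] + words[i + 2:]
    return words

fixed = repair("I need a car next Tuesday oops Monday".split())
# -> ['I', 'need', 'a', 'car', 'next', 'Monday']
```

The key point the slide makes survives even in this sketch: the transformation yields the intended utterance rather than a transcript of the corrupted one.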
Automatic Understanding and Correction of Speech
Repairs in Spontaneous Telephone Dialogs
German input: "Wir treffen uns in Mannheim, äh, in Saarbrücken." (We are meeting in Mannheim, oops, in Saarbruecken.)
English output: "We are meeting in Saarbruecken."
Integrating a Deep HPSG-based Analysis with
Probabilistic Dialog Act Recognition for
Semantic Transfer
The HPSG analysis delivers a VIT, while a probabilistic HMM-based analysis recognizes the dialog act type. The dialog act type feeds the recognition of dialog plans (plan operators), which determines the current dialog phase. Robust dialog semantics passes the VIT, together with dialog act type and dialog phase, to semantic transfer.
The Dialog Act Hierarchy used for Planning,
Prediction, Translation and Generation
The hierarchy is rooted in the node Dialog Act and has three top-level branches:
- CONTROL_DIALOG: GREETING (GREETING_BEGIN, GREETING_END), INTRODUCE, POLITENESS_FORMULA, THANK, DELIBERATE, BACKCHANNEL
- MANAGE_TASK: INIT, DEFER, CLOSE
- PROMOTE_TASK: REQUEST (REQUEST_SUGGEST, REQUEST_CLARIFY, REQUEST_COMMENT, REQUEST_COMMIT), SUGGEST, EXCLUDE, INFORM, CLARIFY, GIVE_REASON, COMMIT, DEVIATE_SCENARIO (REFER_TO_SETTING, DIGRESS), FEEDBACK (FEEDBACK_NEGATIVE: REJECT, EXPLAINED_REJECT; FEEDBACK_POSITIVE: ACCEPT, CONFIRM, CLARIFY_ANSWER)
Combining Statistical and Symbolic Processing
for Dialog Processing
The dialog module combines statistical and symbolic processing:
- Statistical prediction yields dialog act predictions.
- Context evaluation determines the main propositional content and the focus.
- Plan recognition determines the dialog phase.
The recognized dialog act, dialog phase, and content feed dialog-act based translation, transfer by rules, the dialog memory, and the generation of minutes.
Statistical Dialog Act Recognition
- Statistical approach: find the most probable dialog act D for the recognized words W:
  D = argmax_{D'} P(D' | W)
- Bayes' formula:
  D = argmax_{D'} P(W | D') P(D')
- Use of the dialog context H:
  D = argmax_{D'} P(W | D') P(D' | H)
- The word probabilities P(W | D) and dialog act probabilities P(D | H) are estimated from the corpus.
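The final argmax can be sketched directly, assuming toy probability tables; the act names, cue words, and numbers below are illustrative placeholders, not corpus estimates, and P(W | D) is factored into unigram word likelihoods only for brevity.

```python
import math

# Toy probability tables (a real system estimates these from the
# annotated corpus; all entries here are made up for illustration).
P_WORD_GIVEN_ACT = {("suggest", "how"): 0.05, ("suggest", "monday"): 0.03,
                    ("reject", "no"): 0.06, ("reject", "monday"): 0.01}
P_ACT_GIVEN_HIST = {("suggest", ("greet",)): 0.4, ("reject", ("greet",)): 0.1}

def classify(words, history, acts=("suggest", "reject"), floor=1e-4):
    """D = argmax_{D'} P(W | D') P(D' | H), with unigram word likelihoods
    and a floor probability for unseen events."""
    def score(act):
        logp = math.log(P_ACT_GIVEN_HIST.get((act, history), floor))
        for w in words:
            logp += math.log(P_WORD_GIVEN_ACT.get((act, w), floor))
        return logp
    return max(acts, key=score)

act = classify(["how", "monday"], ("greet",))   # -> 'suggest'
```

Working in log space avoids underflow for long word sequences, which is the standard trick for this kind of noisy-channel argmax.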
Learning of Probabilistic Plan Operators from
Annotated Corpora
( OPERATOR-s-10523-6
goal [IN-TURN confirm-s-10523 ?SLASH-3314 ?SLASH-3316]
subgoals (sequence
[IN-TURN confirm-s-10521 ?SLASH-3314 ?SLASH-3315]
[IN-TURN confirm-s-10522 ?SLASH-3315 ?SLASH-3316])
PROB 0.72)
( OPERATOR-s-10521-8
goal [IN-TURN confirm-s-10521 ?SLASH-3321 ?SLASH-3322]
subgoals (sequence [DOMAIN-DEPENDENT accept ?SLASH-3321 ?SLASH-3322])
PROB 0.95)
( OPERATOR-s-10522-10
goal [IN-TURN confirm-s-10522 ?SLASH-3325 ?SLASH-3326]
subgoals (sequence [DOMAIN-DEPENDENT confirm ?SLASH-3325 ?SLASH-3326])
PROB 0.83)
Automatic Generation of Multilingual Protocols
of Telephone Conversations
Dialog translation by Verbmobil is followed by the multilingual generation of protocols: both the German and the American dialog partner receive an HTML document in English, transferred by Internet or fax.
Automatic Generation of Minutes
A and B greet each other.
A: (INIT_DATE, SUGGEST_SUPPORT_DATE, REQUEST_COMMENT_DATE)
I would like to make a date. How about the seventeenth? Is that ok with you?
B: (REJECT_DATE, ACCEPT_DATE)
The seventeenth does not suit me. I’m free for one hour at three o’clock.
A: (SUGGEST_SUPPORT_DATE)
How about the sixteenth in the afternoon?
B: (CLARIFY_QUERY, ACCEPT_DATE, CONFIRM)
The sixteenth at two o’clock? That suits me. Ok.
A and B say goodbye.
Minutes generated automatically on 23 May 1999 08:35:18 h
Dialog Protocol
Participants: Speaker B, Speaker A
Date: 22.3.2000
Time: 8:57 AM to 10:03 AM
Theme: Appointment schedule with trip and accommodation
DIALOGUE RESULTS:
Scheduling:
Speaker B and speaker A will meet in the train station on the 1st of March 2000 at a quarter to 10 in the morning.
Travelling:
There the trip from Hamburg to Hanover by train will start on the 2nd of March at 10 o'clock in the morning.
The way back by train will start on the 2nd of March at half past 6 in the evening.
Accommodation:
The hotel Luisenhof in Hanover was agreed on. Speaker A is taking care of the
hotel reservation.
Summary automatically generated at 22.3.2000 12:31:24 h
Spoken Clarification Dialogs between the
User and the Verbmobil System
Verbmobil conducts spoken clarification subdialogs in German with User 1 (German input) while translating into English for User 2 (English input). Clarification is caused by:
- speech recognition problems
- lack of context knowledge
- inconsistency with regard to the system's knowledge

Examples:
- confusion with similar words (Sonntag vs. Sonntags)
- unknown words (heuer → dieses Jahr)
- lexical ambiguity (Noch einen Termin bitte!)
- inconsistent dates (Freitag, 24. Oktober)
Competing Strategies for Robust Speech
Translation
Concurrent processing modules of Verbmobil combine deep semantic translation with shallow surface-oriented translation methods. Both threads work on the word lattice.

Expensive but precise translation:
- Principled and compositional syntactic and semantic analysis
- Semantic-based transfer of Verbmobil Interface Terms (VITs) as sets of underspecified DRSs

Cheap but approximate translation:
- Case-based translation
- Dialog-act based translation
- Statistical translation

All threads deliver results with confidence values; subject to a timeout on the deep thread, a selection module picks the best result, yielding an acceptable translation rate.
Architecture of the Semantic Transfer Module
The transfer module relates underspecified and refined VITs on both sides: for the source language L1, monolingual refinement rules and disambiguation rules turn the underspecified VIT (L1) into a refined VIT (L1); lexical transfer (via a bilingual dictionary) and phrasal transfer (via a phrasal dictionary) map it onto the refined VIT (L2); monolingual refinement rules and disambiguation rules relate this to the underspecified VIT (L2) of the target language.
Extensions of Discourse Representation
Theory
The Verbmobil version of λ-DRT (Pinkal et al.) includes various extensions of DRT:
- lambda: λ-abstraction over DRSs
- merge operator: combination of DRSs
- functional application: basic composition operation
- quants feature: allows scope-free semantic representation
- alfa expressions: representation of anaphoric elements with underspecified reference
- anchors list: representation of deictic information
- epsilon expressions: underspecification of elliptical expressions
- modal expressions: representation of propositional attitudes
Three English Translations of the German
Word “Termin” Found in the Verbmobil Corpus
1. Verschieben wir den Termin. → Let's reschedule the appointment.
2. Schlagen Sie einen Termin vor. → Suggest a date.
3. Da habe ich einen Termin frei. → I have got a free slot there.

Subsumption relations in the domain model link the three readings to different sorts: appointment falls under scheduled_event (the default), date under set_start_time, and slot under time_interval, all of them temporal specifications.
Entries in the Transfer Lexicon:
German → English (Simplified)
tau_lex (termin, appointment, pred_sort (subsumption (scheduled_event))).
tau_lex (termin, date, pred_sort (subsumption (set_start_time))).
tau_lex (termin, slot, pred_sort (subsumption (time_interval))).
tau_lex (verschieben, reschedule, [tau (#S), tau (#O)], pred_args ([#S, #O & pred_sort (scheduled_event)])).
tau_lex (ausmachen, make, [tau (#S), tau (#O)], pred_args ([#S, #O & pred_sort (scheduled_event)])).
tau_lex (ausmachen, fix, [tau (#S), tau (#O)], pred_args ([#S, #O & pred_sort (set_start_time)])).
tau_lex (freihaben, have_free, [tau (#S), tau (#O)], pred_args ([#S, #O & pred_sort (time_interval)])).
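The mechanism behind these entries — the governing verb imposes a sort on its object, and the target word whose sort subsumes it is chosen — can be sketched as follows. The subsumption links and the flat entry format are approximate reconstructions for illustration, not the actual domain model.

```python
# Hypothetical subsumption links from the domain model (child -> parent);
# the slide's hierarchy is reconstructed here only approximately.
ISA = {"appointment": "scheduled_event", "date": "set_start_time",
       "slot": "time_interval",
       "scheduled_event": "temporal_specification",
       "set_start_time": "temporal_specification",
       "time_interval": "temporal_specification"}

def subsumes(general, specific):
    """True if 'general' is reachable from 'specific' via ISA links."""
    while specific is not None:
        if specific == general:
            return True
        specific = ISA.get(specific)
    return False

# Simplified tau_lex entries: (source word, target word, required sort).
TAU_LEX = [("termin", "appointment", "scheduled_event"),
           ("termin", "date", "set_start_time"),
           ("termin", "slot", "time_interval")]

def translate(word, context_sort):
    """Pick the target word whose required sort subsumes the sort imposed
    by the governing verb (e.g. 'verschieben' imposes scheduled_event)."""
    for src, tgt, sort in TAU_LEX:
        if src == word and subsumes(sort, context_sort):
            return tgt
    return None

translate("termin", "scheduled_event")   # -> 'appointment'
translate("termin", "set_start_time")    # -> 'date'
```

So "Verschieben wir den Termin" selects appointment, while "einen Termin ausmachen" in the fix reading selects date, exactly the pattern the lexicon entries encode.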
Context-Sensitive Translation Exploiting a Discourse
Model
Three different translations of the German word Platz (room / table / seat), chosen from the dialog context:
1. Nehmen wir dieses Hotel, ja. (Let us take this hotel.) Ich reserviere einen Platz. → I reserve a room.
2. Machen wir das Abendessen dort. (Let us have dinner there.) Ich reserviere einen Platz. → I reserve a table.
3. Gehen wir ins Theater. (Let us go to the theater.) Ich möchte Plätze reservieren. → I would like to reserve seats.
All other dialog translation systems translate sentence by sentence without taking the dialog context into account.
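A minimal sketch of such discourse-sensitive lexical choice: a discourse model tracks the most recent relevant referent and licenses one translation of Platz. The cue words and the mapping are assumptions made here for illustration, not the Verbmobil discourse model.

```python
# Hypothetical cue-word -> translation mapping for German 'Platz'.
CONTEXT_TO_TRANSLATION = {"hotel": "room", "dinner": "table", "theater": "seat"}

class DiscourseModel:
    def __init__(self):
        self.topic = None                 # most recent relevant referent

    def observe(self, utterance):
        for cue in CONTEXT_TO_TRANSLATION:
            if cue in utterance.lower():
                self.topic = cue          # update the discourse model

    def translate_platz(self):
        # choose the translation licensed by the current discourse topic
        return CONTEXT_TO_TRANSLATION.get(self.topic, "place")

dm = DiscourseModel()
dm.observe("Let us take this hotel.")
first = dm.translate_platz()              # 'room'
dm.observe("Let us have dinner there.")
second = dm.translate_platz()             # 'table'
```

The same source sentence ("Ich reserviere einen Platz.") thus receives different translations depending only on what the discourse model has observed, which is exactly what a sentence-by-sentence translator cannot do.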
The Use of Underspecified Representations
Two readings in the source language: "Wir telephonierten mit Freunden aus Schweden."
Two readings in the target language: "We called friends from Sweden."
An underspecified semantic representation gives a compact representation of such ambiguities in a logical language without using disjunctions, which enables ambiguity-preserving translations.
The Control Panel of Verbmobil
Integrating Deep and Shallow Processing: Combining
Results from Concurrent Translation Threads
[Example turn, split into two segments: "Wenn wir den Termin vorziehen, das würde mir gut passen."; the slide interleaves a second, English example: "If you prefer another hotel, please let me know."]
Four concurrent threads process each segment: statistical translation, case-based translation, dialog-act based translation, and semantic transfer. All deliver alternative translations with confidence values to the selection module, which chooses the best thread per segment: here, Segment 1 is translated by semantic transfer and Segment 2 by case-based translation.
A Context-Free Approach to the Selection of the Best
Translation Result
SEQ := set of all translation sequences for a turn
Seq ∈ SEQ := sequence of translation segments s1, s2, ..., sn

Input: each translation thread provides for every segment an online confidence value confidence(thread, segment)

Task: compute normalized confidence values for a translated Seq:
CONF(Seq) = Σ_{segment ∈ Seq} Length(segment) * (alpha(thread) + beta(thread) * confidence(thread, segment))

Output: Best(SEQ) = {Seq ∈ SEQ | Seq is a maximal element of SEQ under CONF}
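The CONF formula and the selection of the best sequence translate almost directly into code. The alpha/beta values and the candidate sequences below are made-up placeholders (the real factors are learned offline from an annotated corpus).

```python
# Hypothetical per-thread normalization factors alpha and beta.
ALPHA = {"STAT": 0.1, "SEMT": 0.3}
BETA = {"STAT": 0.5, "SEMT": 0.9}

def conf(seq):
    """CONF(Seq): sum over segments of
    Length(segment) * (alpha(thread) + beta(thread) * confidence)."""
    return sum(length * (ALPHA[thread] + BETA[thread] * c)
               for thread, length, c in seq)

# Candidate sequences: each segment is (thread, length in words, confidence).
candidates = [
    [("SEMT", 5, 0.9), ("SEMT", 3, 0.4)],   # deep transfer throughout
    [("SEMT", 5, 0.9), ("STAT", 3, 0.8)],   # fall back to statistical MT
]
best = max(candidates, key=conf)            # maximal element under CONF
```

The normalization matters because the raw online confidences of different threads are not directly comparable: alpha and beta rescale each thread onto a common quality scale before the maximum is taken.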
Learning the Normalizing Factors Alpha and
Beta from an Annotated Corpus
Turn := segment1, segment2, ..., segmentn
For each turn in a training corpus, all segments translated by one of the four translation threads are manually annotated with a score for translation quality.
For the sequence of n segments resulting in the best overall translation score, at most 4^n linear inequalities are generated, stating that the selected sequence is better than all alternative translation sequences.
From this set of inequalities for spanning analyses (≤ 4^n), the values of alpha and beta can be determined offline by solving the constraint system.
Example of a Linear Inequation Used for Offline Learning
Turn := Segment_1 Segment_2 Segment_3
Statistical translation = STAT, case-based translation = CASE, dialog-act based translation = DIAL, semantic transfer = SEMT
Suppose quality(CASE, Segment_1), quality(SEMT, Segment_2), quality(STAT, Segment_3) is optimal. Then, for example:
Length(Segment_1) * (alpha(CASE) + beta(CASE) * confidence(CASE, Segment_1))
+ Length(Segment_2) * (alpha(SEMT) + beta(SEMT) * confidence(SEMT, Segment_2))
+ Length(Segment_3) * (alpha(STAT) + beta(STAT) * confidence(STAT, Segment_3))
>
Length(Segment_1) * (alpha(DIAL) + beta(DIAL) * confidence(DIAL, Segment_1))
+ Length(Segment_2) * (alpha(DIAL) + beta(DIAL) * confidence(DIAL, Segment_2))
+ Length(Segment_3) * (alpha(DIAL) + beta(DIAL) * confidence(DIAL, Segment_3))
The Context-Sensitive Selection of the Best Translation
Using probabilities of dialog acts in the normalization process:
CONF(Seq) = Σ_{segment ∈ Seq} Length(segment) * (alpha(thread) + dialog-act(thread, segment) + beta(thread) * confidence(thread, segment))
e.g. Greet(Statistical_Translation, Segment) > Greet(Semantic_Transfer, Segment)
Suggest(Semantic_Transfer, Segment) > Suggest(Case_based_Translation, Segment)

Exploiting meta-knowledge:
If the semantic transfer generates ≥ x disambiguation tasks, then increase the alpha and beta values for semantic transfer.
e.g. einen Termin vorziehen → prefer / give priority to / bring forward <a date>

Observation: even on the meta-control level (selection module) a hybrid approach is advantageous.
Verbmobil: Long-Term, Large-Scale Funding and
Its Impact
- Funding by the German Ministry for Education and Research (BMBF): Phase I (1993-1996): $33 M; Phase II (1997-2000): $28 M
- 60% industrial funding according to a shared-cost model
- Additional R&D investments of the industrial partners: $17 M (Phase I) and $11 M (Phase II)
- Total: $89 M

Impact:
- > 400 publications (> 250 refereed)
- Many patents
- > 10 commercial spin-off products
- Many new spin-off companies
- > 100 new jobs in the German language industry
- > 50 academics transferred to industry
Philips, DaimlerChrysler and Siemens are leaders in spoken dialog applications.
SmartKom: Intuitive Multimodal Interaction
Project budget: $34 M; project duration: 4 years.
The SmartKom consortium, with DFKI Saarbrücken as main contractor (project management, testbed, software integration), includes: Univ. of Munich, MediaInterface Dresden, Berkeley, Univ. of the Saarland (Saarbrücken), European Media Lab Heidelberg, Univ. of Erlangen, DaimlerChrysler, Univ. of Stuttgart, and partner sites in Aachen, Ulm, Munich, and Stuttgart.
The Architecture of the SmartKom Agent (cf. Maybury/Wahlster 1998)
Input processing performs media analysis on language, gesture, and biometrics from the user(s), followed by media fusion and intention recognition. Interaction management covers discourse modeling, user modeling, and media design. On the output side, presentation design drives language, graphics, gesture, and an animated presentation agent before rendering. An application interface connects to information applications and people. All components draw on shared representation and inference services: the user model, discourse model, domain model, task model, and media models.
SmartKom: A Transportable and Transmutable Interface
Agent
The kernel of the SmartKom interface agent (media analysis, media design, interaction management, application management) is shared by three instantiations:
- SmartKom-Public: a multimodal communication booth
- SmartKom-Mobile: a handheld communication assistant
- SmartKom-Home/Office: a versatile agent-based interface
SmartKom-Public:
A Multimodal
Communication Booth
- Loudspeaker and room microphone
- Smartcard / credit card for authentication and billing
- Face-tracking camera
- Virtual touchscreen, protected against vandalism
- Multipoint video conferencing
- Docking station for PDA / notebook / camcorder
- High-speed, broad-bandwidth Internet connectivity
- High-resolution scanner
SmartKom-Mobile: A Handheld Communication Assistant
- GPS
- GSM for telephone, fax, and Internet connectivity
- Camera
- Wearable compute server
- Stylus-activated sketch pad
- Microphone and loudspeaker
- Biosensor for authentication & emotional feedback
- Docking station for the car PC
SmartKom-Home/Office:
A Versatile Agent-based Interface
- SpeechMike
- Natural gesture recognition
- Virtual touchscreen
Speech-based Interaction with an Organizer
on a WAP Phone (Voice In - WML out)
"With Maier on 25 October, with Tetzlaff, and with Streit too. Oops, not with Streit. From 2 to 3. Okay!"
Conclusion
- Real-world problems in language technology, such as the understanding of spoken dialogs, speech-to-speech translation, and multimodal dialog systems, can only be cracked by the combined muscle of deep and shallow processing approaches.
- In a multi-blackboard architecture based on packed representations on all processing levels (speech recognition, parsing, semantic processing, translation, generation) and on charts with underspecified representations (e.g. UDRS), the results of concurrent processing threads can be combined incrementally.
Conclusion
- All results of concurrent processing modules should come with a confidence value, so that a selection module can choose the most promising result at each processing stage.
- Packed representations together with formalisms for underspecification capture the uncertainties in each processing phase, so that the uncertainties can be reduced by linguistic, discourse, and domain constraints as soon as they become applicable.
Conclusion
- Deep processing can be used for merging, completing, and repairing the results of shallow processing strategies.
- Shallow methods can be used to guide the search in deep processing.
- Statistical methods must be augmented by symbolic models (e.g. class-based language modelling, word order normalization as part of statistical translation).
- Statistical methods can be used to learn operators or selection strategies for symbolic processes.

It is much more than a balancing act... (see Klavans and Resnik 1996)