presentation source

advertisement

Dagstuhl 2000

Pervasive Speech and

Language Technology

Wolfgang Wahlster

German Research Center for

Artificial Intelligence, DFKI GmbH

Stuhlsatzenhausweg 3

66123 Saarbruecken, Germany phone: (+49 681) 302-5252/4162 fax: (+49 681) 302-5341 e-mail: wahlster@dfki.de

WWW:http://www.dfki.de/~wahlster

Pervasive Speech and Language Technology

A capuccino in

10 minutes, please!

Speech-controlled coffee machine

Let‘s go to Baker

Street in Berkeley!

Speech-based car navigation

I would like to hear

Mozart‘s piano concert

No. 3!

Send the following email to

Mark Maybury: Hi Mark, please forward the following agenda to your project partners!

© Wolfgang Wahlster, DFKI

Speech-enabled music selection

Dictation

Dagstuhl 2000

Pervasive Speech and Language Technology

Show me all CNN news of the last 3 months that feature Bill

Clinton discussing health care!

Information on demand

What has Jim Hendler said about DAML during our recent Dagstuhl seminar?

Audio Mining

I would like to make an appointment with

Dr. Kuremastu in Kyoto next week!

© Wolfgang Wahlster, DFKI

Speech-to-Speech

Translation

Dagstuhl 2000

Three Levels of Language Processing

Speech Input

Acoustic

Language Models

Speech Recognition

Word Lists

What has the speaker said?

100

Alternatives

Sprachanalyse

Speech Analysis

Grammar

Lexical

Meaning

Speech

Understanding

What has the speaker meant?

10

Alternatives

Discourse Context

Knowledge about Domain of Discourse

© Wolfgang Wahlster, DFKI

What does the speaker want?

Unambiguous

Understanding in the

Dialog Context

Dagstuhl 2000

Challenges for Language Engineering

Input Conditions Naturalness Adaptability Dialog Capabilities

Close-Speaking

Microphone/Headset

Push-to-talk

Isolated Words Speaker

Dependent

Monolog

Dictation

Informationseeking Dialog

Telephone,

Pause-based

Segmentation

Read Continuous

Speech

Speaker

Independent

Open Microphone,

GSM Quality

Spontaneous

Speech

Speaker adaptive

Verbmobil

© Wolfgang Wahlster, DFKI

Multiparty

Negotiation

Dagstuhl 2000

Context-Sensitive Speech-to-Speech Translation

Wann fährt der nächste

Zug nach Hamburg ab?

Wo befindet sich das nächste

Hotel?

When does the next train to Hamburg depart?

Where is the nearest hotel?

Verbmobil

Server

© Wolfgang Wahlster, DFKI Dagstuhl 2000

Mobile Speech-to-Speech Translation of

Spontaneous Dialogs

As the name Verb mobil suggests, the system supports verb al communication with foreign dialog partners in mobil e situations.

1 face-to-face conversations

2 telecommunication

© Wolfgang Wahlster, DFKI Dagstuhl 2000

Mobile Speech-to-Speech Translation of

Spontaneous Dialogs

Verbmobil Speech

Translation Server

Solution: Conference Call: The Verbmobil Speech Translation Server is accessed by GSM mobile phones.

© Wolfgang Wahlster, DFKI Dagstuhl 2000

© Wolfgang Wahlster, DFKI

Speech-to-Speech Translation

Dagstuhl 2000

The Control Panel of Verbmobil

© Wolfgang Wahlster, DFKI Dagstuhl 2000

General Speech Recognition Task

Audio Signal Recognizers

German

English

Japanese

Word Hypotheses Graph

© Wolfgang Wahlster, DFKI Dagstuhl 2000

Word Hypotheses Graphs (WHGs)

WHGs realize the interface between acoustic and linguistic processing

Edge = Word

Best Hypothesis

Acoustic Score

© Wolfgang Wahlster, DFKI Dagstuhl 2000

Massive Data Collection Efforts

Transliteration Variant 1

Transliteration Variant 2

Lexical Orthography

Canonical Pronounciation

Manual Phonological Segmentation

Automatic Phonological Segmentation

Word Segmentation

Prosodic Segmentation

Dialog Acts

Noises

Superimposed Speech

Syntactic Category

Word Category

Syntactic Function

Prosodic Boundaries

The so-called Partitur (German word for musical score) orchestrates fifteen strata of annotations

© Wolfgang Wahlster, DFKI

3,200 dialogs (182 hours) with 1,658 speakers

79,562 turns distributed on

56 CDs, 21.5 GB

Dagstuhl 2000

Extracting Statistical Properties from Large Corpora

Transcribed

Speech Data

Segmented

Speech with Prosodic

Labels

Annotated

Dialogs with

Dialog Acts

Treebanks &

Predicate-

Argument

Structures

Aligned

Bilingual

Corpora

Machine Learning for the Integration of Statistical Properties into

Symbolic Models for Speech Recognition, Parsing,

Dialog Processing, Translation

Hidden

Markov

Models

© Wolfgang Wahlster, DFKI

Neural Nets,

Multilayered

Perceptrons

Probabilistic

Automata

Probabilistic

Grammars

Probabilistic

Transfer

Rules

Dagstuhl 2000

From Multi-Agent Architectures to a Multi-

Blackboard Architectures

Multi-Agent Architecture

M3

M1 M2

M4 M5

M6

 Each module must know, which module produces what data

Direct communication between modules

Each module has only one instance

 Heavy data traffic for moving copies around

 Multiparty and telecooperation applications are impossible

Software: ICE and ICE Master

 Basic Platform: PVM

Multi-Blackboard Architecture

M1 M2 M3

BB 1 BB 2 BB 3

Blackboards

M4 M5 M6

 All modules can register for each blackboard dynamically

No direct communication between modules

Each module can have several instances

 No copies of representation structures

(word lattice, VIT chart)

 Multiparty and Telecooperation applications are possible

Software: PCA and Module Manager

 Basic Platform: PVM

© Wolfgang Wahlster, DFKI Dagstuhl 2000

A Multi-Blackboard Architecture for the Combination of Results from Deep and Shallow Processing Modules

Command

Recognizer

Audio Data

Spontaneous

Speech Recognizer

Channel/Speaker

Adaptation

Prosodic

Analysis

Statistical

Parser

Chunk

Parser

Dialog Act

Recognition

Word Hypotheses

Graph with

Prosodic Labels HPSG

Parser

Semantic

Construction

Robust Dialog

Semantics

VITs

Underspecified

Discourse

Representations

Semantic

Transfer

Generation

© Wolfgang Wahlster, DFKI Dagstuhl 2000

The Use of Prosodic Information at All Processing Stages

Speech Signal Word Hypotheses Graph

Multilingual Prosody Module

Prosodic features:

 duration

 pitch

 energy

 pause

Accented

Words

Boundary

Information

Boundary

Information

Sentence

Mood

Search Space

Restriction

Parsing

© Wolfgang Wahlster, DFKI

Dialog Act

Segmentation and

Recognition

Dialog

Understanding

Constraints for

Transfer

Translation

Lexical

Choice

Generation

Prosodic Feature

Vector

Speaker

Adaptation

Speech

Synthesis

Dagstuhl 2000

Competing Strategies for Robust Speech

Translation

Concurrent processing modules combine deep semantic translation with shallow surface-oriented translation methods.

Expensive, but precise Translation

 Principled and compositional syntactic and semantic analysis

 Semantic-based transfer of

Verbmobil Interface Terms (VITs) as set of underspecified DRS

Results with

Confidence Values

Word Lattice time out?

Cheap, but approximate Translation

 Case-based Translation

 Dialog-act based translation

 Statistical translation

Selection of best result

Results with

Confidence Values

Acceptable Translation Rate

© Wolfgang Wahlster, DFKI Dagstuhl 2000

Integrating Shallow and Deep Analysis

Components in a Multi-Blackboard Architecture

Augmented

Word Hypotheses

Graph

Statistical Parser partial VITs

Chunk Parser partial VITs

Chart with a combination of partial VITs

Robust Dialog Semantics

Combination and knowledgebased reconstruction of complete VITs

Complete and Spanning

VITs

© Wolfgang Wahlster, DFKI

HPSG Parser partial VITs

Dagstuhl 2000

VHG: A Packed Chart Representation of Partial

Semantic Representations

 Incremental chart construction and anytime processing

 Rule-based combination and transformation of partial UDRS coded as VITs

 Selection of a spanning analysis using a bigram model for VITs

(trained on a tree bank of 24 k VITs)

 Chart Parser using cascaded finite-state transducers

 Statistical LR parser trained on treebank

 Very fast HPSG parser

© Wolfgang Wahlster, DFKI

Semantic

Construction

Dagstuhl 2000

The Understanding of Spontaneous Speech Repairs

Original Utterance

I need a car next Tuesday

Reparandum

Editing Phase Repair Phase oops

Hesitation

Monday

Reparans

Recognition of

Substitutions

Transformation of the

Word Hypothesis Graph

I need a car next Monday

Verbmobil Technology: Understands Speech Repairs and extracts the intended meaning

Dictation Systems like: ViaVoice, VoiceXpress, FreeSpeech, Naturally Speaking cannot deal with spontaneous speech and transcribe the corrupted utterances.

© Wolfgang Wahlster, DFKI Dagstuhl 2000

Automatic Understanding and Correction of Speech

Repairs in Spontaneous Telephone Dialogs

Wir treffen uns in

Mannheim, äh, in Saarbrücken.

(We are meeting in

Mannheim, oops, in Saarbruecken.)

German

© Wolfgang Wahlster, DFKI

English

We are meeting in Saarbruecken.

Dagstuhl 2000

Robust Dialog Semantics: Combining and

Completing Partial Representations

Let us meet (in) the late afternoon to catch the train to Frankfurt

Let us meet the late afternoon to catch the train to Frankfurt

The preposition ‚in‘ is missing in all paths through the word hypotheses graph.

A temporal NP is transformed into a temporal modifier using a underspecified temporal relation:

[temporal_np(V1)]

[typeraise_to_mod (V1, V2)] & V2

The modifier is applied to a proposition:

[type (V1, prop), type (V2, mod)]

[apply (V2, V1, V3)] & V3

© Wolfgang Wahlster, DFKI Dagstuhl 2000

Integrating Deep and Shallow Processing: Combining

Results from Concurrent Translation Threads

Segment 1

If you prefer another hotel,

Segment 2 please let me know.

Statistical

Translation

Case-Based

Translation

Dialog-Act Based

Translation

Alternative Translations with Confidence Values

Selection Module

Semantic

Transfer

Segment 1

Translated by Semantic Transfer

© Wolfgang Wahlster, DFKI

Segment 2

Translated by Case-Based Translation

Dagstuhl 2000

Unit Selection Algorithm

Sentence to synthesize

I have time on

I

I

I I

I have have time

Edge direction on on on monday.

monday monday

© Wolfgang Wahlster, DFKI Dagstuhl 2000

Linguatronic : Spoken Dialogs with Mercedes-Benz

Please call Doris Wahlster.

Microphone

Push-to-talk

Switch

Open the left window in the back.

I want to hear the weather channel.

When will I reach the next gas station?

Where is the next parking lot?

 Speech control of: cellular phone, radio, windows / AC, route guidance system

 Option for S-, C-, and E-Class of Mercedes and BMW

 Speaker-independent, Garbage models for non-speech (blinker, AC, wheels)

© Wolfgang Wahlster, DFKI Dagstuhl 2000

International Research Trends in Multilingual Systems

Multilingual Language Technology

Speech Recognition, Language Understanding, Language Generation, and Speech Synthesis

Dialog Translation

 Call Centers

 ECommerce

 Mobile Travel

Assistance

 Telephone

Translations

Verbmobil

Multilingual

Indexing and

Annotation of

Videos

 Video Archives

 News Archives

Multilingual

Audio Retrieval and Audio Mining

 Discussions

 Lecture Notes

 Organizers

Speech-based

Web Access to Multilingual

Web pages

 WAP Phones

 WebTV

Multilingual and Mobile

Communication

Assistants

 Multimodal

Interfaces

SmartKom

Spontaneous Speech, Robust Processing and Translation, Semantic and Pragmatic Understanding

© Wolfgang Wahlster, DFKI Dagstuhl 2000

Conclusion I

 Real-world problems in language technology like the understanding of spoken dialogs, speech-to-speech translation and multimodal dialog systems can only be cracked by the combined muscle of deep and shallow processing approaches .

In a multi-blackboard architecture based on packed representations on all processing levels (speech recognition, parsing, semantic processing, translation, generation) using charts with underspecified representations (eg. UDRS) the results of concurrent processing threads can be combined in an incremental fashion.

© Wolfgang Wahlster, DFKI Dagstuhl 2000

Conclusion II

All results of concurrent processing modules should come with a confidence value , so that a selection module can choose the most promising result at a each processing stage.

Packed representations together with formalisms for underspecification capture the uncertainties in a each processing phase, so that the uncertainties can be reduced by linguistic, discourse and domain constraints as soon as they become applicable.

© Wolfgang Wahlster, DFKI Dagstuhl 2000

Conclusion III

 Deep Processing can be used for merging, completing and repairing the results of shallow processing strategies.

Shallow methods can be used to guide the search in deep processing.

Statistical methods must be augmented by symbolic models (eg. Class-based language modelling, word order normalization as part of statistical translation).

Statistical methods can be used to learn operators or selection strategies for symbolic processes.

It is much more than a balancing act...

(see Klavans and Resnik 1996)

© Wolfgang Wahlster, DFKI Dagstuhl 2000

Open Problems for the Next Decade

 Problems with current machine learning approaches

L Expensive data collection

L Cognitively unrealistic training data

L Data sparseness

 Problems with current hand-crafted knowledge sources

L Brittleness

L Domain dependence

L Limited scalability

© Wolfgang Wahlster, DFKI Dagstuhl 2000

A Speculative Conclusion (+50 years)

-500 years TODAY +50 years

Oral Society  Textual Society  Oral Society

News and knowledge is passed orally

No mass storage

No automatic processing

No automatic retrieval

News and knowledge is passed textually

Mass storage of texts

Text Processing

Text Retrieval

News and knowledge is passed orally

Mass storage of speech

Speech Processing

Audio Retrieval

© Wolfgang Wahlster, DFKI Dagstuhl 2000

Download