
Automatic Speech Attribute Transcription (ASAT)
• Project Period: 10/01/04 – 9/30/08
• The ASAT Team
– Mark Clements (clements@ece.gatech.edu)
– Sorin Dusan (sdusan@speech.rutgers.edu)
– Eric Fosler-Lussier (fosler@cse.ohio-state.edu)
– Keith Johnson (kjohnson@ling.ohio-state.edu)
– Fred Juang (juang@ece.gatech.edu)
– Larry Rabiner (lrr@caip.rutgers.edu)
– Chin Lee (Coordinator, chl@ece.gatech.edu)
• NSF HLC Program Director: Mary Harper (mharper@nsf.gov)
ASAT Paradigm and Statement of Work (SoW)
1. Bank of Speech Attribute Detectors
2. Event Merger
3. Evidence Verifier
4. Knowledge Sources: Definition & Evaluation
5. Overall System Prototypes and Common Platform
1. Bank of Speech Attribute Detectors
• Each detected attribute is represented by a time series (event)
– An example: frame-based detector (0-1 output simulating a posterior probability); see the sketch after this list
• ANN-based Attribute Detectors
– An example: nasal and stop detectors
• Sound-specific parameters and feature detectors
– An example: voice onset time (“VOT”) for voiced/unvoiced stop discrimination
• Biologically-motivated processors and detectors
– Analog detectors, short-term and long-term detectors
• Perceptually-motivated processors and detectors
– Converting speech into neural activity level functions
• Others?
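To make the frame-based detector bullet concrete, here is a minimal sketch, not the project's actual detectors: a hypothetical one-hidden-layer network maps each acoustic frame to a 0-1 score per attribute, so every attribute produces a time series that can be read as a posterior-like event curve. The attribute set, 39-dimensional features, and layer sizes are assumptions.

```python
# Hypothetical frame-based attribute detector (illustrative only).
import numpy as np

ATTRIBUTES = ["stop", "nasal", "vowel"]  # assumed attribute inventory

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class FrameAttributeDetector:
    """One-hidden-layer network mapping each acoustic frame (e.g. 39-dim
    MFCCs) to a 0-1 score per attribute, read as a posterior-like value."""

    def __init__(self, n_features=39, n_hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_features, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, len(ATTRIBUTES)))
        self.b2 = np.zeros(len(ATTRIBUTES))

    def detect(self, frames):
        """frames: (T, n_features) array -> (T, n_attributes) scores in [0, 1].
        Each column is one attribute's time series (its event curve)."""
        h = np.tanh(frames @ self.W1 + self.b1)
        return sigmoid(h @ self.W2 + self.b2)

# Usage: 100 frames of 39-dim features -> three detector curves over time.
scores = FrameAttributeDetector().detect(np.random.randn(100, 39))
nasal_curve = scores[:, ATTRIBUTES.index("nasal")]
```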
An Example: More Visible than a Spectrogram?
[Figure: stop, nasal, and vowel detector outputs for the Mandarin utterance “j+ve d+ing z+ii j+i g+ong h+e g+uo d+e m+ing +vn”]
Early acoustic-to-linguistic mapping!
2. Event Merger
• Merge multiple time series into another time series
– Maintaining the same detector output characteristics
• Combine temporal events
– An example: combining phones into words (word detectors); see the sketch after this list
• Combine spatial events
– An example: combining vowel and nasal features into nasalized vowels
• Extreme: build a 20K-word recognizer by implementing 20K keyword detectors
• Others: out-of-vocabulary (OOV) words, partial recognition
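As a toy illustration of the two merging modes above (assumed combination rules, not the project's algorithms): a spatial merge combines two attribute curves frame by frame, e.g. vowel and nasal evidence into nasalized-vowel evidence, while a temporal merge pools phone-level evidence over consecutive segments into a single word-level score, i.e. a crude word detector.

```python
# Hypothetical event merger over detector output curves (illustrative only).
import numpy as np

def merge_spatial(curve_a, curve_b):
    """Spatial merge: combine two attribute curves frame by frame,
    e.g. vowel x nasal -> nasalized-vowel evidence (geometric mean)."""
    return np.sqrt(curve_a * curve_b)

def merge_temporal(phone_curves, segments):
    """Temporal merge: pool phone-level evidence over consecutive segments
    into one word-level score (a crude word detector).
    phone_curves: dict phone -> (T,) score array in [0, 1]
    segments:     list of (phone, start_frame, end_frame)"""
    per_phone = [phone_curves[p][s:e].mean() for p, s, e in segments]
    return float(np.prod(per_phone) ** (1.0 / len(per_phone)))

# Usage with random stand-in curves: build a nasalized-vowel curve and
# score the word "one" from its phone segments /w/ + /ah/ + /n/.
T = 100
curves = {p: np.random.rand(T) for p in ["w", "ah", "n", "vowel", "nasal"]}
nasalized_vowel = merge_spatial(curves["vowel"], curves["nasal"])
word_one_score = merge_temporal(curves, [("w", 0, 30), ("ah", 30, 70), ("n", 70, 100)])
```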
3. Evidence Verifier
• Provide confidence measures for events and evidence
– Utterance verification algorithms can be used (see the sketch after this list)
• Output recognized evidence (words and other units)
– Hypothesis testing is needed at every stage
• Prune event and evidence lattices
– Pruning threshold decisions
• Minimum verification error (MVE) verifiers
• Many new theories can be developed
• Others?
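A minimal sketch of the verification step, assuming each hypothesized event carries a target score and a competing (anti-model) score: the confidence is a log-likelihood-ratio-style measure tested against a pruning threshold. The scores and threshold below are illustrative; in practice the verifiers and thresholds would be trained, e.g. under a minimum verification error (MVE) criterion.

```python
# Hypothetical evidence verifier (illustrative scores and threshold).
import math

def confidence(target_score, anti_score, eps=1e-10):
    """Log-likelihood-ratio-style confidence measure for one event."""
    return math.log(target_score + eps) - math.log(anti_score + eps)

def verify(events, threshold=0.0):
    """Per-event hypothesis test: accept events whose confidence exceeds
    the pruning threshold, reject (prune) the rest.
    events: list of (label, target_score, anti_score)"""
    return [(label, confidence(t, a))
            for label, t, a in events
            if confidence(t, a) > threshold]

# Usage: prune a small lattice of competing word hypotheses.
lattice = [("one", 0.81, 0.20), ("won", 0.42, 0.55), ("wand", 0.15, 0.60)]
accepted = verify(lattice, threshold=0.5)  # only "one" survives
```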
Word and Phone Verifiers
(/w/+//+/n/ = “one”)
4. Knowledge Sources: Definition & Evaluation
• Explore the large body of speech science literature
• Define training, evaluation, and testing databases
• Develop an objective evaluation methodology
– Defining detectors, mergers, verifiers, recognizers
– Defining/collecting evaluation data for all
• Document all pieces on the web
5. Prototype ASR Systems and Platform
• Continuous Phone Recognition: TIMIT?
• Continuous Speech Recognition
– Connected digit recognition
– Wall Street Journal
– Switchboard?
• Establishment of a collaborative platform
– Implementing a divide-and-conquer strategy
– Developing a user community
Summary
• ASAT Goal: Go beyond the state of the art
• ASAT Spirit: Work for team excellence
• ASAT team member responsibilities
– MAC: Event Fusion
– SD: Perception-based processing
– EF: Knowledge Integration (Event Merger)
– KJ: Acoustic Phonetics
– BHJ: Evidence Verifier
– LRR: Attribute Detector
– CHL: Overall