Automatic Speech Attribute Transcription (ASAT)
• Project Period: 10/01/04 – 9/30/08
• The ASAT Team
– Mark Clements (clements@ece.gatech.edu)
– Sorin Dusan (sdusan@speech.rutgers.edu)
– Eric Fosler-Lussier (fosler@cse.ohio-state.edu)
– Keith Johnson (kjohnson@ling.ohio-state.edu)
– Fred Juang (juang@ece.gatech.edu)
– Larry Rabiner (lrr@caip.rutgers.edu)
– Chin Lee (Coordinator, chl@ece.gatech.edu)
• NSF HLC Program Director: (mharper@nsf.gov)

ASAT Paradigm and SoW
[Figure: diagram with components numbered 1 to 5; component 5 is "Overall System Prototypes and Common Platform"]

1. Bank of Speech Attribute Detectors
• Each detected attribute is represented by a time series (event)
– An example: frame-based detector (0-1 simulating posterior probability)
• ANN-based Attribute Detectors
– An example: nasal and stop detectors
• Sound-specific parameters and feature detectors
– An example: “VOT” for V/UV stop discrimination
• Biologically-motivated processors and detectors
– Analog detectors, short-term and long-term detectors
• Perceptually-motivated processors and detectors
– Converting speech into neural activity level functions
• Others?

An Example: More Visible than Spectrogram?
[Figure: Stop, Nasal, and Vowel detector outputs for the utterance labeled j+ve d+ing z+ii j+i g+ong h+e g+uo d+e m+ing +vn]
Early acoustic to linguistic mapping!

2. Event Merger
• Merge multiple time series into another time series
– Maintaining the same detector output characteristics
• Combine temporal events
– An example: combining phones into words (word detectors)
• Combine spatial events
– An example: combining vowel and nasal features into nasalized vowels
• Extreme: Build a 20K-word recognizer by implementing 20K keyword detectors
• Others: OOV, partial recognition

3. Evidence Verifier
• Provide confidence measures to events and evidences
– Utterance verification algorithms can be used
• Output recognized evidences (words and others)
– Hypothesis testing is needed in every stage
• Prune event and evidence lattices
– Pruning threshold decisions
• Minimum verification error (MVE) verifiers
• Many new theories can be developed
• Others?

Word and Phone Verifiers
(/w/+//+/n/ = “one”)

4. Knowledge Sources: Definition & Evaluation
• Explore large body of speech science literature
• Define training, evaluation and testing databases
• Develop Objective Evaluation Methodology
– Defining detectors, mergers, verifiers, recognizers
– Defining/collecting evaluation data for all
• Document all pieces on the web

5. Prototype ASR Systems and Platform
• Continuous Phone Recognition: TIMIT?
• Continuous Speech Recognition
– Connected digit recognition
– Wall Street Journal
– Switchboard?
• Establishment of a collaborative platform
– Implementing divide-’n’-conquer strategy
– Developing a user community

Summary
• ASAT Goal: Go beyond state-of-the-art
• ASAT Spirit: Work for team excellence
• ASAT team member responsibilities
– MAC: Event Fusion
– SD: Perception-based processing
– EF: Knowledge Integration (Event Merger)
– KJ: Acoustic Phonetics
– BHJ: Evidence Verifier
– LRR: Attribute Detector
– CHL: Overall
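
Illustrative sketch (not the ASAT implementation)
The detector, merger, and verifier stages on slides 1 to 3 can be pictured with a toy example. Everything in it is an assumption made for illustration: the function names (merge_spatial, find_events, verify), the product rule for combining vowel and nasal scores, the mean-score confidence measure, and the 0.5 and 0.7 thresholds. The slides do not specify any of these. Each detector output is taken to be a frame-based series of scores in [0, 1], the merger returns a series with the same characteristics, and the verifier prunes events whose confidence falls below a threshold.

```python
from typing import List, Tuple

Series = List[float]            # one detector score in [0, 1] per frame
Event = Tuple[int, int, float]  # (start frame, end frame exclusive, confidence)


def merge_spatial(a: Series, b: Series) -> Series:
    """Combine two co-occurring attribute series frame by frame.

    Assumption: scores are multiplied, so the output is again a
    frame-based series in [0, 1], like the detector outputs.
    """
    if len(a) != len(b):
        raise ValueError("both series must cover the same frames")
    return [x * y for x, y in zip(a, b)]


def find_events(series: Series, on: float = 0.5) -> List[Event]:
    """Turn a score series into events: runs of frames scoring >= on.

    Assumption: the confidence of an event is its mean frame score.
    """
    events: List[Event] = []
    start = None
    for i, score in enumerate(series):
        if score >= on:
            if start is None:
                start = i
        elif start is not None:
            segment = series[start:i]
            events.append((start, i, sum(segment) / len(segment)))
            start = None
    if start is not None:  # close a run that reaches the last frame
        segment = series[start:]
        events.append((start, len(series), sum(segment) / len(segment)))
    return events


def verify(events: List[Event], accept: float = 0.7) -> List[Event]:
    """Prune events whose confidence is below the acceptance threshold."""
    return [e for e in events if e[2] >= accept]


# Hypothetical detector outputs for eight frames.
vowel = [0.1, 0.2, 0.9, 0.9, 0.8, 0.9, 0.2, 0.1]
nasal = [0.0, 0.1, 0.9, 1.0, 0.6, 0.7, 0.9, 0.1]

nasalized_vowel = merge_spatial(vowel, nasal)
print(verify(find_events(nasalized_vowel)))
```

With these made-up scores, two candidate nasalized-vowel events are found and the verifier keeps only the one with the higher mean score. A real verifier would replace the fixed threshold with a hypothesis test.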