more - SJTU SpeechLab

Time: 20th September, 2014, 14:00-15:30
Location: SEIEE-410
Organizer: Dr. Yanmin Qian
Title: High Accuracy Keyword Spotting from Low Resource Languages
Abstract
This talk will describe several different methods we developed to build a system that performs
accurate Keyword Spotting on languages with very little training data. Our focus has been on keyword
spotting in totally new languages with very few resources, including only a few hours of transcribed
speech, no available text (other than the transcriptions), and no phonetic dictionary, with only a few
weeks of effort. The talk will describe the basic speech recognition and keyword spotting techniques
that we use, which are based on the BBN Byblos speech recognition system. The main theme of this talk
is that, while good keyword spotting requires good speech recognition, that is not enough. Because the
goal of keyword spotting differs from that of speech recognition, we have developed many algorithms
for different parts of the system. We will briefly describe methods for
deriving improved features using deep neural networks, the high-recall keyword search method using
confusion networks, a new search criterion using “white listing” to ensure that we maintain high recall
for all words, score normalization for assuring that scores are consistent across all keywords, and
techniques for detecting out-of-vocabulary keywords – words that were not in the vocabulary at the
time the speech recognition was done – with accuracy approaching that of known words. The
combination of all of these techniques has resulted in a system that has consistently outperformed
other systems in keyword spotting research evaluations.
Bio
Richard Schwartz graduated from MIT in 1971, where his Bachelor’s thesis was on vowel
recognition. He has been working at BBN (Raytheon BBN Technologies) since 1972, where he is now a
Principal Scientist in the Speech Language and Multimedia Department. He has worked on, and
developed new methods that have become standard practice in, speech recognition,
keyword spotting, speech synthesis, speech enhancement, speaker identification, optical character
recognition, topic spotting, information retrieval, information extraction, and machine translation.
Some of the important breakthroughs include the use of context-dependent phonetic models for speech
recognition, the first real-time continuous speech recognition in 1992 on off-the-shelf computers, the
use of probability density estimation for speaker identification, higher performance through the use of
probabilistic models for topic spotting, information retrieval, and text information extraction, the
application of speech recognition techniques for language-independent OCR, large improvements in
speech recognition by semi-supervised training using large amounts of untranscribed speech, a system
for highly accurate translation of patents, and new techniques for improved keyword spotting.