Time: 20th September, 2014, 14:00-15:30
Location: SEIEE-410
Organizer: Dr. Yanmin Qian
Title: High Accuracy Keyword Spotting from Low Resource Languages

Abstract

This talk will describe several methods we developed to build a system that performs accurate keyword spotting on languages with very little training data. Our focus has been on keyword spotting in entirely new languages with very few resources (only a few hours of transcribed speech, no text other than the transcriptions, and no phonetic dictionary) and with only a few weeks of effort. The talk will describe the basic speech recognition and keyword spotting techniques we use, which are based on the BBN Byblos speech recognition system. The main theme of this talk is that, while good keyword spotting requires good speech recognition, good speech recognition alone is not enough. Because the goal of keyword spotting differs from that of speech recognition, we have developed new algorithms for many different parts of the system. We will briefly describe methods for deriving improved features using deep neural networks; a high-recall keyword search method using confusion networks; a new search criterion using “white listing” to ensure that we maintain high recall for all words; score normalization to ensure that scores are consistent across all keywords; and techniques for detecting out-of-vocabulary keywords (words that were not in the vocabulary when speech recognition was performed) with accuracy approaching that of in-vocabulary words. The combination of all of these techniques has resulted in a system that has consistently outperformed other systems in keyword spotting evaluations.

Bio

Richard Schwartz graduated from MIT in 1971; his Bachelor’s thesis was on vowel recognition. He has worked at BBN (Raytheon BBN Technologies) since 1972 and is now a Principal Scientist in the Speech Language and Multimedia Department.
He has worked on, and developed methods that have become standard practice in, speech recognition and keyword spotting, speech synthesis, speech enhancement, speaker identification, optical character recognition, topic spotting, information retrieval, information extraction, and machine translation. Important breakthroughs include the use of context-dependent phonetic models for speech recognition; the first real-time continuous speech recognition on off-the-shelf computers, in 1992; the use of probability density estimation for speaker identification; higher performance through probabilistic models for topic spotting, information retrieval, and text information extraction; the application of speech recognition techniques to language-independent OCR; large improvements in speech recognition from semi-supervised training on large amounts of untranscribed speech; a system for highly accurate translation of patents; and new techniques for improved keyword spotting.