Welcome
2009 Almost-Spring Short Course on Speech Recognition
Instructors: Bhiksha Raj and Rita Singh

What will the course be about
• We will cover most relevant topics of speech recognition
• The focus will be on the theory and practice
  – We will not discuss code for the most part
  – We will keep maths out of it as far as possible
• However, we will discuss algorithms and implementation details

Instructors
• Bhiksha Raj: Carnegie Mellon University
  – Expert in speech recognition
• Rita Singh: Carnegie Mellon University
  – Expert in speech recognition
• Peter Wolf: Independent Consultant
  – Previously at Dragon Systems Inc.
  – Sphinx4 expert, expert in speech recognition application development
  – Brought in primarily as a resource for helping with Sphinx4 and answering application-related questions

Format of Course
• 3 lectures daily
  – Morning: 8:00 AM, 1 to 1.5 hours
  – Late Morning / Early Afternoon: 11:00 AM
  – Afternoon: 2:30 PM
• The schedule is flexible: timings may vary depending on how much is covered
• Lectures expected to last 1 to 1.5 hours each
• Intervening times expected to be taken up by exercises

Instruction Format
• Lectures will be pictorially oriented
• Although we will cover general topics, the specific implementations described will be based on CMU Sphinx
  – Most other systems are similar
• Exercises will be based on Sphinx

Lecture Outline: Day 1
• Lecture 1: “Speech recognition for dummies”
  – A quick development of speech recognition as string matching
• Lecture 2: “Feature computation”
  – Explaining how features are computed for speech recognition, including all signal processing
• Lecture 3: “Hidden Markov Models”
  – Describing HMMs and all associated problems

Lecture Outline: Day 2
• Lecture 1: “Training From Continuous Speech”
  – How to train models from continuous speech
  – Phonemes, why we need them and how to train them
• Lecture 2: “Context dependent phonemes”
  – What are context dependent phonemes
  – Various types of context dependent phonemes
  – Training CD phonemes
• Lecture 3: “Decision Trees and State Tying”
  – All about decision trees for parameter sharing in ASR systems

Lecture Outline: Day 3
• Lecture 1: “Training context-dependent models with tied states”
  – A (relatively) short lecture explaining the final overall process for training models
• Lecture 2: “Language Modelling”
  – How to model “language” for speech recognition
  – Statistical language modelling
• Lecture 3: “Decoding: Basics”
  – Describing the basic ideas behind the decoding strategies for continuous speech

Lecture Outline: Day 4
• Lecture 1: “Decoding: Advanced”
  – Explaining various more advanced approaches to decoding
    • Arriving at the state of the art
• Lecture 2: “Advanced Topics”
  – Adaptation, Normalization, Discriminative Training etc.
• Session 3: Open
  – Any spillover
  – Question answering

Exercises: Day 1
• There will be exercises following most lectures
• Lecture 1: None
• Lecture 2: Exercise on capture and feature computation from speech signals
• Lecture 3: None

Exercises: Day 2
• Lecture 1: “Training From Continuous Speech”
  – Exercise on training phoneme models and recognizing with them
• Lecture 2: “Context dependent phonemes”
  – Exercise on training models for context-dependent phonemes and recognizing with them
• Lecture 3: “Decision Trees and State Tying”
  – Exercise on learning decision trees

Exercises: Day 3
• Lecture 1: “Training context-dependent models with tied states”
  – Exercise on complete training of the ASR system
• Lecture 2: “Language Modelling”
  – Exercises on building JSGF grammars and N-gram LMs for speech recognition
• Lecture 3: “Decoding: Basics”

Exercises: Day 4
• Lecture 1: “Decoding: Advanced”
  – Decoding with various speech recognition system variants:
    • Sphinx3 flat, Sphinx3 tree, Sphinx4
• Lecture 2: “Advanced Topics”
  – No exercises
• Session 3: Open
  – No exercises

Software to Install
• We will be using CMU Sphinx extensively
  – Sphinxtrain
  – Sphinx3 decoder
  – Sphinx4 decoder
  – CMU LM Toolkit or SRI LM Toolkit
• We will need additional software to go with it
  – Java, ant, groovy for S4

Sphinx Downloads: http://cmusphinx.sourceforge.net
• Sphinxbase:
  – Click on the “sphinxbase” link on the left
  – Click “all releases”
  – Download version 0.4.1
    • http://downloads.sourceforge.net/cmusphinx/sphinxbase0.4.1.tar.bz2?use_mirror=superb-east
• Sphinx3:
  – Click on the “sphinx3” link on the left
  – Click on “all releases”
  – Download version 3-0.8
    • http://downloads.sourceforge.net/cmusphinx/sphinx30.8.zip?use_mirror=internap
• Cepview:
  – Click on the “cepview” link on the left
• lm3g2dmp:
  – Click on the “lm3g2dmp” link on the left
• The above two are visualization / data-structure optimization tools and are not critical
  – But they are small, so you might as well download them
• CMU LM toolkit: you may install the SRI LM toolkit instead
  – Better maintained
  – CMU toolkit is not currently maintained
• Sphinx4:
  – For this workshop, download a copy of Sphinx4 that is under development at github.com
  – http://github.com/juanzanos/sphinx4/tree/master
    • Click on the download link
  – Caveat: some scripts may not run; if so, we will revert to the release version
• Sphinx4 will also need
  – Java JDK 1.6, from http://javasoft.com
  – Apache ant, from http://ant.apache.org
  – A useful scripting tool (some of our latest scripts are in it): Groovy
  – Groovy can be had from http://groovy.codehaus.org
• Bookmark this link:
  – http://cmusphinx.sourceforge.net/sphinx4/doc/UsingSphinxTrainModels.html

Operating Systems
• Sphinxbase and Sphinx3 packages have been tried and tested on Linux
  – We are not Windows people
• Suggestion: prefer Linux-based machines
  – You may also try to run these programs on Cygwin under Windows
    • Sphinx* should compile under Cygwin
    • Install “tcsh” under Cygwin
    • We will provide tcsh scripts
• Sphinx4 is platform independent

Additional Packages
• It would be useful to have a visualization tool
  – Need to visualize matrices as surfaces
• Matlab would be great
• If you don’t have Matlab, download Octave
  – http://www.gnu.org/software/octave/

Data
• You may use any data you wish to
• For the exercises we will attempt to provide a small amount of data
  – As much as can be dealt with on your computers

Questions
• ?