Welcome
2009 Almost-Spring Short Course
on Speech Recognition
Instructors:
Bhiksha Raj and Rita Singh
What will the course be about
• We will cover the most relevant topics in speech
recognition
• The focus will be on the theory and practice
– We will not discuss code for the most part
– We will keep maths out of it as far as possible
• However, we will discuss algorithms and implementation
details
Instructors
• Bhiksha Raj: Carnegie Mellon University
– Expert in speech recognition
• Rita Singh: Carnegie Mellon University
– Expert in speech recognition
• Peter Wolf: Independent Consultant
– Previously at Dragon Systems Inc.
– Sphinx4 expert, expert in speech recognition
application development
– Brought in primarily as a resource for helping with
Sphinx4 and answering application-related questions
Format of Course
• 3 Lectures daily
– Morning: 8:00 AM, 1.00 – 1.30 hours
– Late Morning / Early Afternoon: 11:00 AM
– Afternoon: 2:30 PM
• The schedule is flexible – timings may vary
depending on how much is covered
• Lectures are expected to last 1 – 1.5 hours each
• The time between lectures is expected to be taken up by
exercises
Instruction Format
• Lectures will be pictorially oriented
• Although we will cover general topics, the
specific implementations described will be
based on CMU Sphinx
– Most other systems are similar
• Exercises will be based on Sphinx
Lecture Outline: Day 1
• Lecture 1: “Speech recognition for dummies”
– a quick development of speech recognition as string
matching
• Lecture 2: “Feature computation”
– Explaining how features are computed for speech
recognition, including all signal processing
• Lecture 3: “Hidden Markov Models”
– Describing HMMs and all associated problems
Lecture Outline: Day 2
• Lecture 1: “Training From Continuous Speech”
– How to train models from continuous speech
– Phonemes, why we need them and how to train them
• Lecture 2: “Context dependent phonemes”
– What are context dependent phonemes
– Various types of context dependent phonemes
– Training CD phonemes
• Lecture 3: “Decision Trees and State Tying”
– All about decision trees for parameter sharing in ASR systems
Lecture Outline: Day 3
• Lecture 1: “Training context-dependent models with tied
states”
– A (relatively) short lecture explaining the final overall process for
training models
• Lecture 2: “Language Modelling”
– How to model “language” for speech recognition
– Statistical language modelling
• Lecture 3: “Decoding: Basics”
– Describing the basic ideas behind the decoding strategies for
continuous speech
Lecture Outline: Day 4
• Lecture 1: “Decoding: Advanced”
– Explaining various more advanced approaches to decoding
• Arriving at the state of the art
• Lecture 2: “Advanced Topics”
– Adaptation, Normalization, Discriminative Training etc.
• Session 3: Open.
– Any spillover
– Question Answering
Exercises: Day 1
• There will be exercises following most
lectures
• Lecture 1: None
• Lecture 2: Exercise on capture and
feature computation from speech signals
• Lecture 3: None
Exercises: Day 2
• Lecture 1: “Training From Continuous Speech”
– Exercise on training phoneme models and
recognizing with them
• Lecture 2: “Context dependent phonemes”
– Exercise on training models for context-dependent
phonemes and recognizing with them
• Lecture 3: “Decision Trees and State Tying”
– Exercise on learning decision trees
Exercises: Day 3
• Lecture 1: “Training context-dependent models
with tied states”
– Exercise on complete training of the ASR system
• Lecture 2: “Language Modelling”
– Exercises on building JSGF grammars and N-gram
LMs for speech recognition
• Lecture 3: “Decoding: Basics”
Exercises: Day 4
• Lecture 1: “Decoding: Advanced”
– Decoding with various speech recognition system
variants:
• Sphinx3 flat, Sphinx3 tree, Sphinx4
• Lecture 2: “Advanced Topics”
– No exercises
• Session 3: Open.
– No exercises
Software to Install
• We will be using CMU Sphinx extensively
– Sphinxtrain
– Sphinx3 decoder
– Sphinx4 decoder
– CMU LM Toolkit or SRI LM Toolkit
• We will need additional software to go with it
– Java, ant, groovy for S4
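Before the first Sphinx4 exercise it may help to confirm that the supporting tools are installed. The following is a minimal sketch, not part of the course materials. The command names `java`, `ant` and `groovy` are assumptions about how these tools are usually launched, and `find_missing` is a hypothetical helper name.

```python
import shutil

# Assumed launcher names for the tools the slide lists (Java, Apache Ant, Groovy).
REQUIRED_TOOLS = ("java", "ant", "groovy")


def find_missing(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = find_missing()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools were found on PATH")
```

The `which` parameter exists only so the lookup can be replaced when trying the function out; in normal use the default `shutil.which` is enough.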
Sphinx Downloads:
http://cmusphinx.sourceforge.net
• Sphinxbase:
– Click on the “sphinxbase” link on the left
– Click “all releases”
– Download version 0.4.1
• http://downloads.sourceforge.net/cmusphinx/sphinxbase0.4.1.tar.bz2?use_mirror=superb-east
• Sphinx3:
– Click on “sphinx3” link on left
– Click on “all releases”
– Download version 3-0.8
• http://downloads.sourceforge.net/cmusphinx/sphinx30.8.zip?use_mirror=internap
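For those who prefer a script to clicking through the site, the two downloads above could be fetched roughly as follows. This is a sketch under assumptions: the URLs are copied exactly as they appear on this slide and may no longer resolve, and `archive_name`, `fetch_and_unpack` and the `sphinx-src` directory are hypothetical names chosen for illustration.

```python
import os
import posixpath
import tarfile
import urllib.request
import zipfile
from urllib.parse import urlparse

# URLs exactly as given on the slide; they may have moved since.
SPHINXBASE_URL = "http://downloads.sourceforge.net/cmusphinx/sphinxbase0.4.1.tar.bz2?use_mirror=superb-east"
SPHINX3_URL = "http://downloads.sourceforge.net/cmusphinx/sphinx30.8.zip?use_mirror=internap"


def archive_name(url):
    """File name part of a download URL, without the ?use_mirror=... query."""
    return posixpath.basename(urlparse(url).path)


def fetch_and_unpack(url, dest_dir="."):
    """Download one archive into dest_dir and unpack it there."""
    os.makedirs(dest_dir, exist_ok=True)
    archive = os.path.join(dest_dir, archive_name(url))
    urllib.request.urlretrieve(url, archive)
    if archive.endswith(".zip"):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest_dir)
    else:
        # The default read mode detects bz2 or gzip compression by itself.
        with tarfile.open(archive) as tf:
            tf.extractall(dest_dir)
    return archive


# Example call (needs network access, so it is left commented out):
# fetch_and_unpack(SPHINXBASE_URL, "sphinx-src")
```

Only unpack archives from a source you trust, since extraction writes whatever paths the archive contains.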
• Cepview:
– Click on the “cepview” link on the left
• lm3g2dmp:
– Click on “lm3g2dmp” link on left
• The above two are visualization / data-structure
optimization tools and are not critical
– But they are small, so you might as well download them
• CMU LM toolkit: you may install the SRI LM toolkit instead
– It is better maintained; the CMU toolkit is not currently maintained
• Sphinx4:
– For this workshop, download the copy of Sphinx4 that is under development
at github.com
– http://github.com/juanzanos/sphinx4/tree/master
• Click on download link
– Caveat: some scripts may not run; if so we will revert to release version
• Sphinx4 will also need
– Java JDK 1.6 -- from http://javasoft.com
– Apache ant -- from http://ant.apache.org
– A useful scripting tool (some of our latest scripts are in it): Groovy
– Groovy can be had from http://groovy.codehaus.org
• Bookmark this link:
– http://cmusphinx.sourceforge.net/sphinx4/doc/UsingSphinxTrainModels.html
Operating Systems
• The Sphinxbase and Sphinx3 packages have been tried and
tested on Linux
– We are not Windows people
• Suggestion: Prefer Linux-based machines
– You may also try to run these programs on Cygwin under
Windows
• Sphinx* should compile under Cygwin
• Install “tcsh” under Cygwin
• We will provide tcsh scripts
• Sphinx4 is platform independent
Additional Packages
• It would be useful to have a visualization
tool
– We need to visualize matrices as surfaces
• Matlab would be great
• If you don’t have Matlab, download Octave
– http://www.gnu.org/software/octave/
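As one possible way to get a matrix into Octave for surface plotting, the sketch below writes it out as plain whitespace-separated text. This is an illustration only, not a format the course prescribes. The helper `save_matrix`, the file name `demo_matrix.txt` and the small demo matrix are all hypothetical.

```python
import os
import tempfile


def save_matrix(rows, path):
    """Write equal-length numeric rows as whitespace-separated text, one row per line."""
    width = len(rows[0])
    with open(path, "w") as out:
        for row in rows:
            if len(row) != width:
                raise ValueError("all rows must have the same length")
            out.write(" ".join(repr(float(v)) for v in row) + "\n")


# Hypothetical 3x4 matrix standing in for real data.
demo = [
    [0.0, 1.0, 2.0, 3.0],
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 3.0, 4.0, 5.0],
]
DEMO_PATH = os.path.join(tempfile.gettempdir(), "demo_matrix.txt")
save_matrix(demo, DEMO_PATH)
print("wrote", DEMO_PATH)
```

In Octave, a file written this way can usually be read with `M = load("demo_matrix.txt");` and drawn as a surface with `surf(M);`.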
Data
• You may use any data you wish
• For the exercises we will attempt to provide a
small amount of data
– As much as can be dealt with on your
computers
Questions
• ?