final_speech

Introduction
Sphinx-4 is an open source framework for speech recognition, written in the Java programming
language to support research on speech recognition systems. Sphinx-4 has three main components:
1. The FrontEnd
2. The Decoder
3. The Linguist
When the recognizer starts up, it constructs the front end (which generates features from speech),
the decoder, and the linguist (which generates the search graph) according to the configuration
specified by the user. These components will in turn construct their own subcomponents. For
example, the linguist will construct the acoustic model, the dictionary, and the language model. It
will use the knowledge from these three components to construct a search graph that is
appropriate for the task. The decoder will construct the search manager, which in turn constructs
the scorer, the pruner, and the active list.
In this project we focus on the Linguist component, which has three subcomponents:
1. The Acoustic Model
The acoustic model describes how the individual sound units of speech, known as phonemes, are
pronounced.
2. The Dictionary
The dictionary gives the pronunciation of every word that the system can recognize.
3. The Language Model
The language model describes what the grammar of the language looks like.
The Linguist translates any type of standard language model, along with pronunciation
information from the Dictionary and structural information from one or more sets of acoustic
models, into a search graph.
In automatic speech recognition, one problem is how to create a new dictionary that matches a
specific task. The dictionary shipped with Sphinx-4 does not contain every word, which causes
problems when it is used for a specific task. In this project we show the process of adding words
to, or deleting words from, the Sphinx-4 dictionary.
Acoustic model in Sphinx-4
The acoustic model in Sphinx-4 consists of a set of left-to-right Hidden Markov Models, which are
a type of statistical model for basic sound units. The units represent phones in a triphone context.
[Figure: definition of the Hidden Markov Models of the acoustic models in Sphinx-4]
The acoustic model in Sphinx-4 is packaged in a JAR file. The advantage of this packaging is that
the file can be included in the classpath and referenced in the configuration file, so that a
Sphinx-4 application can use it. Sphinx-4 allows URIs of the form resource:<acoustic or language
model path>, which lets XML configuration files reference models in JAR files easily.
The scheme resource:/path causes Sphinx-4 to search the classpath for the path.
The acoustic model is a very important part of the recognizer. Sphinx-4 comes with two important
models that serve different purposes:
1. TIDIGITS_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar is designed and created for
numbers. If you need to recognize numbers, you should use this model.
2. WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar is designed and created for text. If you
want to recognize text, you should use this model.
Dictionary in Sphinx-4
The dictionary provides pronunciations for the words found in the language model. The
pronunciations split words into sequences of phonemes that are found in the acoustic model.
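As a rough illustration of what one dictionary entry holds, the sketch below parses a single line in the CMU-style text format (a word followed by its phonemes, separated by whitespace). The class name DictionaryLine and the sample entry are assumptions made for this example; this is not the Sphinx-4 dictionary loader itself.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative parser for one line of a CMU-style pronunciation dictionary. */
final class DictionaryLine {
    final String word;          // the word, without any "(2)" variant marker
    final List<String> phones;  // the phoneme sequence for this pronunciation

    private DictionaryLine(String word, List<String> phones) {
        this.word = word;
        this.phones = phones;
    }

    static DictionaryLine parse(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 2) {
            throw new IllegalArgumentException(
                    "Expected a word and at least one phoneme: " + line);
        }
        // Alternative pronunciations are written as WORD(2), WORD(3), ...
        String word = fields[0].replaceAll("\\(\\d+\\)$", "");
        List<String> phones =
                Arrays.asList(Arrays.copyOfRange(fields, 1, fields.length));
        return new DictionaryLine(word, phones);
    }

    public static void main(String[] args) {
        // Sample entry chosen for illustration.
        DictionaryLine entry = DictionaryLine.parse("HELLO  HH AH L OW");
        System.out.println(entry.word + " -> " + entry.phones);
        // prints: HELLO -> [HH, AH, L, OW]
    }
}
```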
Create new dictionary
In Sphinx-4 the dictionary is packaged in the same JAR file as the acoustic model, which can be
either Wall Street Journal (WSJ) or TIDIGITS. To create a new dictionary you must therefore change
it inside that JAR file, because the XML configuration refers to the model files in that JAR.
When creating a new dictionary, check that there is no mismatch between the dictionary and the
acoustic model: every phoneme used in a pronunciation must be one that the acoustic model contains.
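One way to check for such a mismatch is to compare every phoneme in the dictionary against the set of phonemes the acoustic model knows. The sketch below assumes the dictionary lines and the phone set are already available in memory; the class name PhoneSetCheck and the small phone set in main are hypothetical and only for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative check that dictionary phonemes all exist in a given phone set. */
final class PhoneSetCheck {

    /** Returns, per dictionary entry, the phonemes that are missing from modelPhones. */
    static Map<String, Set<String>> findUnknownPhones(
            List<String> dictionaryLines, Set<String> modelPhones) {
        Map<String, Set<String>> problems = new TreeMap<>();
        for (String line : dictionaryLines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 2) {
                continue; // skip blank or malformed lines
            }
            for (int i = 1; i < fields.length; i++) {
                if (!modelPhones.contains(fields[i])) {
                    problems.computeIfAbsent(fields[0], k -> new TreeSet<>())
                            .add(fields[i]);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Hypothetical, deliberately incomplete phone set for the demonstration.
        Set<String> modelPhones =
                new HashSet<>(Arrays.asList("HH", "AH", "L", "OW", "W", "ER", "D"));
        List<String> lines = Arrays.asList(
                "HELLO HH AH L OW",
                "WORLD W ER L D",
                "SPHINX S F IH NG K S");
        // Only SPHINX is reported, because its phonemes are not in the set above.
        System.out.println(findUnknownPhones(lines, modelPhones));
    }
}
```

An empty result means that every pronunciation can be built from the model's phonemes.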
Language Model in Sphinx-4
There are two types of model that describe a language:
1. Grammar language model
Grammars describe very simple types of languages for command and control. They are
written by hand or generated automatically with plain code.
2. Statistical language model
A statistical language model estimates the probability distribution of word sequences in natural
language. The most widely used statistical language model is the N-gram.
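To make the N-gram idea concrete, the sketch below estimates bigram (N = 2) probabilities from raw counts: the probability of a word given the previous word is the count of the pair divided by the count of the previous word. This is a simplified illustration with no smoothing or back-off, both of which practical language models add; the class name BigramSketch and the training sentences are assumptions for the example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative maximum-likelihood bigram model (no smoothing, no back-off). */
final class BigramSketch {
    // How often each word occurs as the first word of a pair.
    private final Map<String, Integer> historyCounts = new HashMap<>();
    // How often each "previous next" pair occurs.
    private final Map<String, Integer> bigramCounts = new HashMap<>();

    void addSentence(String sentence) {
        String[] words = sentence.trim().toLowerCase(Locale.ROOT).split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            historyCounts.merge(words[i], 1, Integer::sum);
            bigramCounts.merge(words[i] + " " + words[i + 1], 1, Integer::sum);
        }
    }

    /** P(next | previous) = count(previous next) / count(previous). */
    double probability(String previous, String next) {
        String p = previous.toLowerCase(Locale.ROOT);
        String n = next.toLowerCase(Locale.ROOT);
        int history = historyCounts.getOrDefault(p, 0);
        if (history == 0) {
            return 0.0; // unseen history: a real model would back off here
        }
        return bigramCounts.getOrDefault(p + " " + n, 0) / (double) history;
    }

    public static void main(String[] args) {
        BigramSketch model = new BigramSketch();
        model.addSentence("good morning paul");
        model.addSentence("good morning rita");
        model.addSentence("hello paul");
        System.out.println(model.probability("good", "morning")); // prints 1.0
        System.out.println(model.probability("morning", "paul")); // prints 0.5
    }
}
```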
Configuring Sphinx-4
Once you have everything you need for your Sphinx-4 application, you have to configure it. You do
this in the configuration file, which is an XML file.
The XML configuration file
The configuration of a particular Sphinx-4 system is determined by a configuration file. This
configuration file defines the following:
1. The names and types of all of the components of the system.
2. The connectivity of these components – that is, which components talk to each other.
3. The detailed configuration for each of these elements.
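The three points above can be read directly from the XML. The sketch below uses only the standard Java XML parser to list the components of a configuration (name and type) and to guess which components refer to each other, by treating any property whose value equals another component's name as a connection. The class name ConfigOutline, the SAMPLE string, and the connection heuristic are assumptions for illustration; Sphinx-4 has its own configuration manager, which is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Illustrative outline of a component-based XML configuration. */
final class ConfigOutline {
    // Hypothetical two-component configuration used for the demonstration.
    static final String SAMPLE =
            "<config>"
            + "<component name=\"dictionary\""
            + " type=\"edu.cmu.sphinx.linguist.dictionary.FastDictionary\">"
            + "<property name=\"allowMissingWords\" value=\"false\"/>"
            + "<property name=\"unitManager\" value=\"unitManager\"/>"
            + "</component>"
            + "<component name=\"unitManager\""
            + " type=\"edu.cmu.sphinx.linguist.acoustic.UnitManager\"/>"
            + "</config>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable XML configuration", e);
        }
    }

    /** Maps each component name to its type, in document order. */
    static Map<String, String> componentTypes(Document doc) {
        Map<String, String> types = new LinkedHashMap<>();
        NodeList components = doc.getElementsByTagName("component");
        for (int i = 0; i < components.getLength(); i++) {
            Element c = (Element) components.item(i);
            types.put(c.getAttribute("name"), c.getAttribute("type"));
        }
        return types;
    }

    /** Lists "A -> B" for every property of A whose value names component B. */
    static List<String> connections(Document doc) {
        Map<String, String> types = componentTypes(doc);
        List<String> links = new ArrayList<>();
        NodeList components = doc.getElementsByTagName("component");
        for (int i = 0; i < components.getLength(); i++) {
            Element c = (Element) components.item(i);
            NodeList props = c.getElementsByTagName("property");
            for (int j = 0; j < props.getLength(); j++) {
                String value = ((Element) props.item(j)).getAttribute("value");
                if (types.containsKey(value)) {
                    links.add(c.getAttribute("name") + " -> " + value);
                }
            }
        }
        return links;
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        componentTypes(doc).forEach((name, type) -> System.out.println(name + " : " + type));
        connections(doc).forEach(System.out::println);
    }
}
```

The heuristic can report a false connection if a plain string value happens to match a component name, so treat the output as a reading aid only.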
Using a model in Sphinx-4
There are four steps to use a new model in Sphinx-4:
1. Defining a language model.
2. Defining a dictionary.
3. Defining an acoustic model.
4. Configuring a front end.
As a base for building the application configuration, take any existing configuration. In this
project we take the configuration from the HelloWorld demo.
Define a language model
<component name="trigramModel"
           type="edu.cmu.sphinx.linguist.language.ngram.large.LargeTrigramModel">
    <property name="unigramWeight" value="0.7"/>
    <property name="maxDepth" value="3"/>
    <property name="logMath" value="logMath"/>
    <property name="dictionary" value="dictionary"/>
    <!-- location: the name of the language model file, for example: -->
    <property name="location"
              value="<your_training_folder>/etc/<your_model_name>.lm.DMP"/>
</component>
Define a dictionary
<component name="dictionary"
           type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
    <!-- dictionaryPath: the name of the dictionary file, for example: -->
    <property name="dictionaryPath"
              value="<your_training_folder>/etc/<your_model_name>.dic"/>
    <!-- fillerPath: the name of the filler file, for example: -->
    <property name="fillerPath"
              value="<your_training_folder>/etc/<your_model_name>.filler"/>
    <property name="addSilEndingPronunciation" value="false"/>
    <property name="allowMissingWords" value="false"/>
    <property name="unitManager" value="unitManager"/>
</component>
Define an acoustic model
<component name="sphinx3Loader"
           type="edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader">
    <property name="logMath" value="logMath"/>
    <property name="unitManager" value="unitManager"/>
    <!-- location: the path to the model folder, for example: -->
    <property name="location"
              value="<your_training_folder>/model_parameters/<your_model_name>.cd_cont_<senones>"/>
</component>
<component name="acousticModel"
           type="edu.cmu.sphinx.linguist.acoustic.tiedstate.TiedStateAcousticModel">
    <property name="loader" value="sphinx3Loader"/>
    <property name="unitManager" value="unitManager"/>
</component>
Example
In this project we use the HelloWorld demo as an example to show how to create a dictionary and a
grammar. The project focuses on two processes:
1. The process of creating a new dictionary
   1. Extract WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar, which is in the lib directory.
   2. Go to the dict folder and open the cmudict.0.6d file in that folder.
   3. Insert the new words and their phonemes into the cmudict.0.6d file and save it.
   4. Zip the extracted folder into a zip file.
   5. Remove WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar from the libraries in the build
      path and add the zip file to the libraries in the build path.
   6. The new dictionary is now ready to use.
2. The process of changing the grammar
   In HelloWorld we change the grammar by writing the sentences that we want in
   hello.gram.
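The sentences for hello.gram can also be generated with plain code instead of being typed by hand. The sketch below builds the text of a small JSGF grammar from a list of sentences, with each sentence becoming one alternative of a single public rule. The class name GrammarWriter, the rule name greet, and the sample sentences are assumptions for illustration; the result would still need to be saved as hello.gram, and each word in it should have an entry in the dictionary.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative generator for a one-rule JSGF grammar. */
final class GrammarWriter {

    /** Builds JSGF text in which every sentence is one alternative of the rule. */
    static String toJsgf(String grammarName, String ruleName, List<String> sentences) {
        if (sentences.isEmpty()) {
            throw new IllegalArgumentException("At least one sentence is required");
        }
        StringBuilder sb = new StringBuilder();
        sb.append("#JSGF V1.0;\n\n");
        sb.append("grammar ").append(grammarName).append(";\n\n");
        sb.append("public <").append(ruleName).append("> = ");
        for (int i = 0; i < sentences.size(); i++) {
            if (i > 0) {
                sb.append(" | ");
            }
            // Parentheses keep each sentence together as one alternative.
            sb.append("( ").append(sentences.get(i).trim()).append(" )");
        }
        sb.append(";\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample sentences chosen for illustration.
        System.out.print(toJsgf("hello", "greet",
                Arrays.asList("hello world", "good morning")));
    }
}
```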