final_speech

Introduction

Sphinx-4 is an open-source framework for speech recognition, written in the Java programming language to support research on speech recognition systems. Sphinx-4 has three main components:

1. The FrontEnd
2. The Decoder
3. The Linguist

When the recognizer starts up, it constructs the front end (which generates features from speech), the decoder, and the linguist (which generates the search graph) according to the configuration specified by the user. These components in turn construct their own subcomponents. For example, the linguist constructs the acoustic model, the dictionary, and the language model, and uses the knowledge from these three components to construct a search graph that is appropriate for the task. The decoder constructs the search manager, which in turn constructs the scorer, the pruner, and the active list.

In this project we focus on the Linguist component, which has three subcomponents:

1. The Acoustic Model. The acoustic model describes the sounds of the basic units of speech, known as phonemes.
2. The Dictionary. The dictionary gives the pronunciation of every word that the system can recognize.
3. The Language Model. The language model describes the grammar, that is, which sequences of words the system expects.

The Linguist translates any type of standard language model, along with pronunciation information from the Dictionary and structural information from one or more sets of acoustic models, into a search graph.

One problem in automatic speech recognition is how to create a new dictionary that matches a specific task. The dictionary shipped with Sphinx-4 may not contain all the words a particular task needs, which causes problems when the system is used for that task. In this project we show the process of adding words to, and deleting words from, the Sphinx-4 dictionary.

Acoustic model in Sphinx-4

The acoustic model in Sphinx-4 consists of a set of left-to-right Hidden Markov Models (HMMs), a type of statistical model, for the basic sound units. The units represent phones in a triphone context, that is, each phone is modeled together with the phones immediately before and after it.
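To make "left-to-right" concrete, here is a minimal Java sketch of the forward algorithm on a toy three-state left-to-right HMM. It is an illustration under simplifying assumptions, not Sphinx-4 code: the class name LeftToRightHmm, the single shared self-loop probability, the self-looping final state, and the discrete observation symbols are all hypothetical (real acoustic models score the continuous feature vectors produced by the front end). The key property is that a state can only stay where it is or move one state to the right, so the inner loop never considers states to the right of the target state.

```java
/**
 * Toy left-to-right HMM with discrete observation symbols (hypothetical
 * illustration, not Sphinx-4 code). Each state may only stay where it is
 * or move one state to the right.
 */
final class LeftToRightHmm {
    private final double[][] trans; // trans[i][j] = P(next state j | current state i)
    private final double[][] emit;  // emit[i][k] = P(observation symbol k | state i)

    /** stay = self-loop probability shared by every non-final state. */
    LeftToRightHmm(double stay, double[][] emit) {
        int n = emit.length;
        this.trans = new double[n][n];
        for (int i = 0; i < n - 1; i++) {
            trans[i][i] = stay;           // remain in the same state
            trans[i][i + 1] = 1.0 - stay; // advance one state to the right
        }
        trans[n - 1][n - 1] = 1.0;        // simplification: final state loops on itself
        this.emit = emit;
    }

    /**
     * Forward algorithm: probability of obs (at least one symbol), summed over
     * all paths that start in the first state and end in the last state.
     */
    double forward(int[] obs) {
        int n = trans.length;
        double[] alpha = new double[n];
        alpha[0] = emit[0][obs[0]];       // every path starts in the leftmost state
        for (int t = 1; t < obs.length; t++) {
            double[] next = new double[n];
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int i = 0; i <= j; i++) { // no backward jumps: only states i <= j reach j
                    sum += alpha[i] * trans[i][j];
                }
                next[j] = sum * emit[j][obs[t]];
            }
            alpha = next;
        }
        return alpha[n - 1];              // only paths ending in the rightmost state count
    }
}
```

For example, with stay = 0.5, emission rows {0.9, 0.1}, {0.2, 0.8} and {0.7, 0.3}, and observations {0, 1, 0}, the only allowed path through three states in three steps is 0, 1, 2, so forward returns 0.9 × 0.5 × 0.8 × 0.5 × 0.7 = 0.126 (up to floating-point rounding). Two observations cannot reach the last state at all, so they score 0.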
The following diagram illustrates the definition of the Hidden Markov Models of the acoustic models in Sphinx-4.

The acoustic model in Sphinx-4 is packaged in a JAR file. The advantage of packaging it in a JAR file is that the file can be included in the classpath and referenced in the configuration file, so that it can be used in a Sphinx-4 application. Sphinx-4 allows URIs of the form resource:<acoustic or language model path>, which lets XML configuration files easily reference models inside JAR files: the scheme resource:/path makes Sphinx-4 search the classpath for that path.

The acoustic model is a very important part of the recognizer. Sphinx-4 provides two important models that serve different purposes:

1. TIDIGITS_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar is designed for digits. If you need to recognize numbers, use this model.
2. WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar is designed for general text. If you want to recognize text, use this model.

Dictionary in Sphinx-4

The dictionary provides pronunciations for the words found in the language model. The pronunciations split words into sequences of phonemes that are found in the acoustic model.

Create a new dictionary

In Sphinx-4 the dictionary is packaged in the JAR file of the acoustic model, either Wall Street Journal (WSJ) or TIDIGITS. Therefore, if you want to create a new dictionary, you must change it inside the JAR file, because the XML configuration refers to the acoustic model there. When creating a new dictionary, check that there is no mismatch between the dictionary and the acoustic model, that is, every phoneme used in the dictionary must exist in the acoustic model.

Language Model in Sphinx-4

There are two types of models that describe a language:

1. Grammar language model. Grammars describe very simple types of languages for command and control; they are usually written by hand or generated automatically with plain code.
2. Statistical language model. A statistical language model estimates the probability distribution of natural language, that is, how likely each sequence of words is.
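As an illustration of a statistical language model, the sketch below estimates bigram (N = 2) probabilities by maximum likelihood, P(w2 | w1) = count(w1 w2) / count(w1), from a few training sentences. The class BigramModel and the <s> and </s> sentence markers are illustrative assumptions, not Sphinx-4 code. Practical models, such as the trigram .lm.DMP file referenced in the configuration section of this document, also apply smoothing and back-off so that unseen word pairs do not receive zero probability; this toy version deliberately omits that.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Maximum-likelihood bigram (N = 2) language model: a toy illustration with
 * no smoothing or back-off, not the Sphinx-4 implementation.
 */
final class BigramModel {
    private final Map<String, Integer> historyCounts = new HashMap<>(); // count(w1) as a history
    private final Map<String, Integer> pairCounts = new HashMap<>();    // count(w1 w2)

    /** Counts adjacent word pairs; <s> and </s> mark the start and end of each sentence. */
    void train(List<String> sentences) {
        for (String sentence : sentences) {
            String[] words = ("<s> " + sentence.toLowerCase() + " </s>").split("\\s+");
            for (int i = 1; i < words.length; i++) {
                historyCounts.merge(words[i - 1], 1, Integer::sum);
                pairCounts.merge(words[i - 1] + " " + words[i], 1, Integer::sum);
            }
        }
    }

    /** P(word | previous) = count(previous word) / count(previous). */
    double probability(String previous, String word) {
        String prev = previous.toLowerCase();
        int history = historyCounts.getOrDefault(prev, 0);
        if (history == 0) {
            return 0.0; // unseen history: a real model would back off to a lower order
        }
        return pairCounts.getOrDefault(prev + " " + word.toLowerCase(), 0) / (double) history;
    }
}
```

After training on "hello world", "hello there" and "good morning", probability("hello", "world") is 0.5, probability("<s>", "hello") is 2/3, and probability("hello", "morning") is 0 because that pair never occurs.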
The most widely used statistical language model is the N-gram model.

Configuring Sphinx-4

Once you have everything your application needs, you have to configure your Sphinx-4 application. You do this in the configuration file, which is an XML file.

The XML configuration file

The configuration of a particular Sphinx-4 system is determined by a configuration file. This configuration file defines:

1. The names and types of all of the components of the system.
2. The connectivity of these components, that is, which components talk to each other.
3. The detailed configuration of each of these components.

Use a new model in Sphinx-4

There are four steps to use a new model in Sphinx-4:

1. Define a language model.
2. Define a dictionary.
3. Define an acoustic model.
4. Configure a front end.

As a base for your application's configuration, take any existing configuration. In this project we take the configuration from the HelloWorld demo.

Define a language model

<component name="trigramModel"
           type="edu.cmu.sphinx.linguist.language.ngram.large.LargeTrigramModel">
    <property name="unigramWeight" value="0.7"/>
    <property name="maxDepth" value="3"/>
    <property name="logMath" value="logMath"/>
    <property name="dictionary" value="dictionary"/>
    <property name="location" value="the name of the language model file for example <your_training_folder>/etc/<your_model_name>.lm.DMP"/>
</component>

Define a dictionary

<component name="dictionary" type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
    <property name="dictionaryPath" value="the name of the dictionary file for example <your_training_folder>/etc/<your_model_name>.dic"/>
    <property name="fillerPath" value="the name of the filler file for example <your_training_folder>/etc/<your_model_name>.filler"/>
    <property name="addSilEndingPronunciation" value="false"/>
    <property name="allowMissingWords" value="false"/>
    <property name="unitManager" value="unitManager"/>
</component>

Define an acoustic model

<component name="sphinx3Loader"
           type="edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader">
    <property name="logMath" value="logMath"/>
    <property name="unitManager" value="unitManager"/>
    <property name="location" value="the path to the model folder for example <your_training_folder>/model_parameters/<your_model_name>.cd_cont_<senones>"/>
</component>

<component name="acousticModel"
           type="edu.cmu.sphinx.linguist.acoustic.tiedstate.TiedStateAcousticModel">
    <property name="loader" value="sphinx3Loader"/>
    <property name="unitManager" value="unitManager"/>
</component>

Example

In this project we use the HelloWorld demo to show how to create a dictionary and change a grammar; the project focuses on these two tasks.

1. Creating a new dictionary
   1. Extract WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar in the lib directory.
   2. Go to the dict folder and open the cmudict.0.6d file in that folder.
   3. Insert the new words and their phonemes into the cmudict.0.6d file and save it.
   4. Compress the extracted folder into a zip file.
   5. Remove WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar from the libraries in the build path and add the zip file to the libraries in the build path.
   6. The new dictionary can now be used.

2. Changing the grammar
   In HelloWorld we change the grammar by writing the sentences that we want to recognize in hello.gram.
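hello.gram in the HelloWorld demo is written in the Java Speech Grammar Format (JSGF). The sketch below, a hypothetical JsgfWriter class rather than part of Sphinx-4, shows the shape of such a file by generating a grammar that accepts exactly a given list of sentences, one alternative per sentence. The grammar name and rule name are parameters chosen by the caller; keeping the grammar name equal to the file name (hello for hello.gram) is an assumption here. Every word in the sentences must also have an entry in the dictionary.

```java
import java.util.List;
import java.util.StringJoiner;

/**
 * Hypothetical helper that writes a JSGF grammar accepting exactly the given
 * sentences, as one public rule with one alternative per sentence.
 */
final class JsgfWriter {
    static String build(String grammarName, String ruleName, List<String> sentences) {
        StringJoiner alternatives = new StringJoiner(" | ");
        for (String sentence : sentences) {
            // Every word used here must also have an entry in the dictionary.
            alternatives.add("( " + sentence.trim() + " )");
        }
        return "#JSGF V1.0;\n"
                + "grammar " + grammarName + ";\n"
                + "public <" + ruleName + "> = " + alternatives + ";\n";
    }
}
```

For example, build("hello", "greet", List.of("good morning", "hello world")) returns a grammar whose last line is public <greet> = ( good morning ) | ( hello world ); and the result could be saved as hello.gram.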
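Step 3 of the dictionary process above edits cmudict.0.6d by hand. The same edit, together with the dictionary and acoustic model mismatch check recommended earlier, can be sketched in Java. DictionaryEditor is a hypothetical helper, and the line format (a word, whitespace, then its phonemes, with words in upper case) is an assumption about the file; the set of phonemes that the acoustic model defines must be supplied by the caller.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical helper for editing a CMUdict-style dictionary in memory.
 * Assumed format: one entry per line, the word followed by whitespace and
 * then its space-separated phonemes, e.g. "HELLO  HH AH L OW".
 */
final class DictionaryEditor {
    private final Map<String, String> entries = new TreeMap<>(); // word -> phonemes, sorted by word

    DictionaryEditor(Collection<String> lines) {
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+", 2);
            if (parts.length == 2) {         // skip blank or malformed lines
                entries.put(parts[0], parts[1].trim());
            }
        }
    }

    /** Adds a word or replaces its pronunciation; words are stored in upper case. */
    void addWord(String word, String phonemes) {
        entries.put(word.toUpperCase(), phonemes.trim());
    }

    /** Deletes a word; returns false if it was not in the dictionary. */
    boolean removeWord(String word) {
        return entries.remove(word.toUpperCase()) != null;
    }

    /** Phonemes used by the dictionary that the acoustic model does not define. */
    Set<String> unknownPhonemes(Set<String> modelPhonemes) {
        Set<String> unknown = new LinkedHashSet<>();
        for (String phonemes : entries.values()) {
            for (String phoneme : phonemes.split("\\s+")) {
                if (!modelPhonemes.contains(phoneme)) {
                    unknown.add(phoneme);
                }
            }
        }
        return unknown;
    }

    /** Entries as tab-separated lines, ready to be written back to the file. */
    List<String> toLines() {
        List<String> lines = new ArrayList<>();
        entries.forEach((word, phonemes) -> lines.add(word + "\t" + phonemes));
        return lines;
    }
}
```

In practice the lines could be read from the extracted folder with Files.readAllLines(Paths.get("dict/cmudict.0.6d")) and written back with Files.write(path, editor.toLines()) before the folder is zipped again. A non-empty result from unknownPhonemes signals the mismatch between dictionary and acoustic model described in the dictionary section.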
