Speech To Text Conversion using Java API

In this post, I explain speech-to-text conversion using the Java Speech API.
Here is the agenda of this post:

- A brief introduction to speech recognition
- Setting up the speech recognition engine
- Verifying the speech recognition engine installation
- Sample code
- Conclusion
A Brief Idea
Speech Recognition is the process of converting spoken input to digital output, such as text.
Speech recognition systems provide computers with the ability to listen to user speech
and determine what is said.
The Speech Recognition process can be divided into these four steps:
1. Speech is converted to digital signals.
2. Actual speech sounds are extracted from the sounds (based on energy of the sounds).
3. The extracted sounds are put together into 'speech frames.'
4. The speech frames are compared with words from the grammar file to determine the spoken word.
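The second and third steps (extracting speech sounds by energy and grouping them into frames) can be pictured with a small sketch. It is only an illustration under simple assumptions, not how the TalkingJava engine works internally: it cuts a signal into fixed-size frames and keeps the frames whose average energy exceeds a threshold. The class name EnergyFramer, the frame length, the threshold and the sample values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class EnergyFramer {

    /** Average energy (mean of the squared samples) of one frame. */
    public static double frameEnergy(double[] samples, int start, int length) {
        double sum = 0.0;
        for (int i = start; i < start + length; i++) {
            sum += samples[i] * samples[i];
        }
        return sum / length;
    }

    /** Returns the start index of every complete frame whose energy exceeds the threshold. */
    public static List<Integer> speechFrames(double[] samples, int frameLength, double threshold) {
        List<Integer> starts = new ArrayList<>();
        for (int start = 0; start + frameLength <= samples.length; start += frameLength) {
            if (frameEnergy(samples, start, frameLength) > threshold) {
                starts.add(start);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // Hypothetical signal: near silence, a louder burst, then near silence again.
        double[] signal = {0.0, 0.01, -0.01, 0.0, 0.8, -0.7, 0.9, -0.6, 0.0, 0.01, 0.0, -0.01};
        // Only the middle frame (samples 4..7) is loud enough to count as speech.
        System.out.println(speechFrames(signal, 4, 0.1)); // prints [4]
    }
}
```

A real engine works on audio captured from the microphone and uses far more elaborate features, but the idea of discarding low-energy frames before matching is the same.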
I am going to use a third-party Java speech recognition engine, the TalkingJava SDK. It is a full
implementation of Sun's Java Speech API and provides both Text-To-Speech and Speech-Recognition engines.
Setup
Download TalkingJava SDK : Click Here
Four easy steps to set up the TalkingJava SDK:

1. Unpack TalkingJavaSDK-1xx.zip or TalkingJavaSDK-1xx.jar into a new directory
   [TalkingJavaSDK-1xx], then double-click Setup.exe to install the SDK.
2. Unpack packet.jar, found inside the TalkingJavaSDK-1xx directory, into a new
   directory called packet.
3. Copy the cgjsapi.jar and cgjsapi1xx.dll files from the packet directory to
   your Java\jdk1.x.x.x\jre\lib\ext\ directory.
4. Add the path of your packet directory to your CLASSPATH environment variable.
For more installation details : Click here
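Before running the engine itself, it can help to confirm that the copied jar is actually visible to the JVM. The sketch below is a minimal check that needs nothing beyond the JDK to compile: it only tries to load the two class names used later in this post. The class name JsapiClasspathCheck and the helper isOnClasspath are hypothetical names chosen for this example.

```java
public class JsapiClasspathCheck {

    /** Returns true if the named class can be loaded from the current classpath. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, JsapiClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names taken from the sample code in this post.
        String[] required = {
            "javax.speech.Central",
            "com.cloudgarden.speech.CGEngineCentral"
        };
        for (String name : required) {
            System.out.println(name + (isOnClasspath(name) ? " : found" : " : NOT found"));
        }
    }
}
```

If either class is reported as not found, the jar is probably not in the directory or CLASSPATH that your JVM is using. This check says nothing about whether a recognition engine is installed.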
Verify Setup
I have written a small Java class to verify whether the speech recognition engine has
been installed successfully.
package com.sarf.talkingjava;

import java.util.Locale;
import javax.speech.Central;
import javax.speech.EngineList;
import javax.speech.recognition.RecognizerModeDesc;

public class TestRecognizerConfig {
    public static void main(String[] args) {
        try {
            // Register the CloudGarden engine with the Java Speech API
            Central.registerEngineCentral("com.cloudgarden.speech.CGEngineCentral");
            // Describe the required recognizer: US English
            RecognizerModeDesc desc = new RecognizerModeDesc(Locale.US, Boolean.TRUE);
            // List the installed recognizers that match the description
            EngineList el = Central.availableRecognizers(desc);
            if (el.size() < 1) {
                System.out.println("Recognition Engine is not available");
                System.exit(1);
            } else {
                System.out.println("Recognition Engine is available");
                System.exit(0);
            }
        } catch (Exception exception) {
            exception.printStackTrace();
            System.exit(1);
        }
    }
}
Run this class and check the output.
If the output is "Recognition Engine is available", the recognition engine has been
installed successfully and you are good to go.
Here We Go
package com.sarf.talkingjava;

import java.io.FileReader;
import java.util.Locale;
import javax.speech.Central;
import javax.speech.recognition.*;

public class SpeechToTextConverter extends ResultAdapter {
    static Recognizer recognizer;

    // Called when the engine accepts a result that matches the grammar
    public void resultAccepted(ResultEvent resultEvent) {
        Result result = (Result) (resultEvent.getSource());
        ResultToken[] resultTokens = result.getBestTokens();
        // Print the recognized words separated by spaces
        for (int nIndex = 0; nIndex < resultTokens.length; nIndex++) {
            System.out.print(resultTokens[nIndex].getSpokenText() + " ");
        }
        System.out.println();
        try {
            // Deallocate the recognizer
            recognizer.forceFinalize(true);
            recognizer.deallocate();
        } catch (Exception exception) {
            exception.printStackTrace();
        }
        System.exit(0);
    }

    public static void main(String[] args) {
        try {
            Central.registerEngineCentral("com.cloudgarden.speech.CGEngineCentral");
            RecognizerModeDesc desc = new RecognizerModeDesc(Locale.US, Boolean.TRUE);
            // Create a recognizer that supports US English.
            recognizer = Central.createRecognizer(desc);
            // Start up the recognizer
            recognizer.allocate();
            // Load the grammar from a file, and enable it
            FileReader fileReader = new FileReader("D:\\my_grammar.grammar");
            RuleGrammar grammar = recognizer.loadJSGF(fileReader);
            grammar.setEnabled(true);
            // Add the listener to get results
            recognizer.addResultListener(new SpeechToTextConverter());
            // Commit the grammar
            recognizer.commitChanges();
            recognizer.waitEngineState(Recognizer.LISTENING);
            // Request focus and start listening
            recognizer.requestFocus();
            recognizer.resume();
            recognizer.waitEngineState(Recognizer.FOCUS_ON);
            recognizer.forceFinalize(true);
            // Block until the listener above has deallocated the recognizer
            recognizer.waitEngineState(Recognizer.DEALLOCATED);
        } catch (Exception e) {
            e.printStackTrace();
            // Exit with a non-zero status to signal the failure
            System.exit(1);
        }
    }
}
Little Grammar
The JSpeech Grammar Format (JSGF) is a platform-independent, vendor-independent
textual representation of grammars for use in speech recognition. Grammars are used
by speech recognizers to determine what the recognizer should listen for, and so
describe the utterances a user may say. JSGF adopts the style and conventions of the
Java™ Programming Language in addition to the use of traditional grammar notations.
To learn more about JSGF: Click here
Here is the content of the my_grammar.grammar file:
#JSGF V1.0;
grammar com.sarf.talkingjava.example;
public <startExample> = (please | My name is sarf | What is your Name |
    Open Firefox | Open notepad | Open grammar | Please to meet you) *;
public <endExample> = [thanks | thank you | thank you very much];
Limitations
Speech recognition technology has a few limitations:

- It does not transcribe free-form speech input, so the transcription may differ from
  what was actually said.
- Speech recognition is constrained by the grammar.
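The grammar constraint can be illustrated without the engine. The sketch below is a simplified stand-in, not part of the Java Speech API: it takes the alternatives from the <startExample> rule of my_grammar.grammar and checks whether a piece of text is zero or more of those phrases in a row, which is what the `|` and `*` operators in that rule express. The class name GrammarCheck, the case-insensitive comparison and the whitespace handling are assumptions made for this example, and the <endExample> rule is not modelled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class GrammarCheck {

    // Alternatives copied from the <startExample> rule, lower-cased for comparison.
    static final List<String> PHRASES = Arrays.asList(
            "please", "my name is sarf", "what is your name",
            "open firefox", "open notepad", "open grammar", "please to meet you");

    /** True if the text consists of zero or more grammar phrases, one after another. */
    public static boolean matches(String utterance) {
        String text = utterance.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
        return matchesFrom(text);
    }

    // Tries every phrase as a prefix and recurses on the remainder (simple backtracking).
    private static boolean matchesFrom(String rest) {
        if (rest.isEmpty()) {
            return true;
        }
        for (String phrase : PHRASES) {
            if (rest.equals(phrase)) {
                return true;
            }
            if (rest.startsWith(phrase + " ")
                    && matchesFrom(rest.substring(phrase.length() + 1))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("Open notepad"));        // prints true
        System.out.println(matches("please open firefox")); // prints true
        System.out.println(matches("open the window"));     // prints false
    }
}
```

Anything outside the listed phrases, such as "open the window", has no path through the rule, so a grammar-based recognizer either rejects it or maps it to the closest phrase it knows.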