Lesson 3: Hearing Things
In this module, you will learn:
● What speech recognition is
● How to use Speech Recognition on NAO Framework.
○ English Speech Recognition: online and offline.
○ Vietnamese Speech Recognition: online and offline.
Contents:
● Speech Recognition on NAO
● Speech Recognition on NAO Framework
● Task 1: English Speech Recognition
● Task 2: Vietnamese Speech Recognition
Speech Recognition on NAO:
Humans frequently communicate through speech. For example, a common greeting when we meet
someone is “hi” or “how are you?” We process speech automatically, and understand the meaning
of the words we hear nearly instantaneously. On a robot, this process is more involved. The NAO
humanoid robot has microphones on its head that it uses to listen to sounds around it.
However, unlike our ears that listen for sounds all the time, the NAO has to be programmed to
listen for sounds at specific times. After it hears human speech, the NAO performs speech
recognition with an algorithm to convert what it hears into words that it knows.
To do so, the NAO requires a library of words that it expects to hear. For example, the library can
contain two words, “yes” and “no”. When the NAO processes the sounds it hears, it will classify
them as either “yes”, “no”, or neither of the two. You may have had experience with a similar system
when using automated phone services or voice control on your cellphone, where you are given a list
of options that you can speak to select.
Once a word is recognized, the NAO can then be programmed to react in different ways. After
hearing “yes”, the NAO could reply with “I am happy” and after hearing “no”, the NAO could say
“I am sad”. If the NAO doesn’t understand the words (it did not sound like “yes” or “no”) then the
NAO could reply “I don’t know.” This is called a conditional on the robot, and we will go into more
detail in the tasks below.
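The conditional described above can be sketched in plain Java. The reaction table and the respond helper here are illustrative only, not part of the NAO API; they just show how a recognized word maps to a reply:

```java
import java.util.Map;

public class YesNoConditional {
    // Illustrative reaction table: recognized word -> spoken reply.
    private static final Map<String, String> REACTIONS = Map.of(
            "yes", "I am happy",
            "no", "I am sad");

    // Return the robot's reply for a recognized word, or the fallback
    // reply when the word is neither "yes" nor "no".
    static String respond(String recognizedWord) {
        return REACTIONS.getOrDefault(recognizedWord, "I don't know.");
    }

    public static void main(String[] args) {
        System.out.println(respond("yes"));   // I am happy
        System.out.println(respond("maybe")); // I don't know.
    }
}
```

On the real robot, the recognized word would come from the speech recognition event rather than a method argument, but the branching logic is the same.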
Speech Recognition on NAO Framework:
NAO Framework provides two choices of speech recognition:
- On the Android phone:
If our application uses speech recognition on the Android phone, the picture below shows the
programming flow:
Advantage:
- Can use popular speech recognition engines such as Google's.
Disadvantage:
- On the NAO robot:
If we want to use the speech recognition service on the NAO, we first have to register the service
on the NAO robot via the NAO Framework.
Task 1: English Speech Recognition
In this lesson, we will learn how to use speech recognition with the NAO Framework. We will
program the NAO to recognize a question about its name and give a response.
- Option 1: Using speech recognition on Android phone
Using Android Speech Recognition
(http://developer.android.com/reference/android/speech/package-summary.html)
- Option 2: Using speech recognition on NAO robot
Step 1. Create new Android project using Robot Activity template.
Step 2. Open res/layout/activity_main.xml in your android project and replace its content with
following:
File: res/layout/activity_main.xml
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent" >

    <LinearLayout
        android:layout_width="fill_parent"
        android:layout_height="fill_parent"
        android:orientation="vertical">

        <LinearLayout
            android:layout_width="fill_parent"
            android:layout_height="wrap_content"
            android:orientation="horizontal">

            <Button
                android:id="@+id/btSpeechRecognitionStart"
                android:layout_width="fill_parent"
                android:layout_height="fill_parent"
                android:layout_weight="1"
                android:text="Start Recognition" />

            <Button
                android:id="@+id/btSpeechRecognitionStop"
                android:layout_width="fill_parent"
                android:layout_height="fill_parent"
                android:layout_weight="1"
                android:text="Stop Recognition" />
        </LinearLayout>
    </LinearLayout>
</RelativeLayout>
The UI is very simple: nested LinearLayouts organize the two buttons. Note the ids of the two
buttons, btSpeechRecognitionStart and btSpeechRecognitionStop, which we will use in our Java code.
Step 3: Register RobotEvent to handle result.
Use RobotEventReceiver.register() to register for events from the NAO Robot.
Step 4: Setting the language to use.
In this step, we have to set the language, the language parameters, and the sentence list.
Use RobotSpeechRecognition.setVisualExpression() to set the visual expression.
Use RobotSpeechRecognition.setAudioExpression() to set the audio expression.
Use RobotSpeechRecognition.setVocabulary() to set the vocabulary.
Use RobotSpeechRecognition.getAvailableLanguages() to get all available languages.
Use RobotSpeechRecognition.getCurrentLanguage() to get the current language.
Use RobotSpeechRecognition.setCurrentLanguage() to set the current language.
Step 5: Subscribe the speech recognition event.
We have to subscribe to the speech recognition event to use it on the NAO Robot, with the
RobotEventSubscriber.subscribeEvent() function.
Step 6: Start Recognition process and handle the result.
After that, we can use speech recognition on the NAO: speak a sentence from the sentence list
above, and the NAO Robot will recognize it and return the result.
Step 7: Unsubscribe the speech recognition event.
To stop the speech recognition module, you have to unsubscribe the robot event that you
subscribed above, with the RobotEventSubscriber.unsubscribeEvent() function.
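Steps 3 through 7 can be sketched as follows. Only the class and method names come from the steps above; the argument lists, the event name, and the example vocabulary are assumptions, since the exact NAO Framework signatures are not shown in this lesson:

```java
// Sketch only: argument lists, the event name, and the vocabulary
// below are assumptions, not verified NAO Framework signatures.

// Step 3: register a receiver for events coming from the NAO robot.
RobotEventReceiver.register(this);

// Step 4: configure the recognition language and vocabulary.
RobotSpeechRecognition.setCurrentLanguage("English");
RobotSpeechRecognition.setVocabulary(
        new String[] { "what is your name", "hello" });

// Step 5: subscribe to the speech recognition event.
RobotEventSubscriber.subscribeEvent("WordRecognized");

// Step 6: the registered receiver is now called with recognition
// results whenever the robot hears a sentence from the vocabulary.

// Step 7: when done, unsubscribe to stop the recognition module.
RobotEventSubscriber.unsubscribeEvent("WordRecognized");
```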
The State-Machine of NAO Speech Recognition.
Task 2: Vietnamese Speech Recognition
This next exercise is similar to the previous task, but it uses Vietnamese speech recognition.
- Option 1: Using online speech recognition:
Step 1: Create new Android project using Robot Activity template.
Step 2: Initiating a Recognition.
Before you use speech recognition, ensure that you have set up the core Speech Kit library with the
SpeechKit.initialize() method.
Then create and initialize a Recognizer object:
recognizer = sk.createRecognizer(Recognizer.RecognizerType.Dictation,
                                 Recognizer.EndOfSpeechDetection.Short,
                                 "en_US", this, handler);
The SpeechKit.createRecognizer method initializes a recognizer and starts the speech recognition process.

● The type parameter is a String, generally one of the recognition type constants defined in the
Speech Kit library and available in the class documentation for Recognizer. Nuance may
provide you with a different value for your unique recognition needs, in which case you will
enter the raw String.
● The detection parameter determines the end-of-speech detection model and must be one of
the Recognizer.EndOfSpeechDetection types.
● The language parameter defines the speech language as a string in the format of the ISO 639
language code, followed by an underscore "_", followed by the ISO 3166-1 country code.
● The this parameter defines the object to receive status, error, and result messages from the
recognizer. It can be replaced with any object that implements the RecognizerListener interface.
● The handler parameter should be an android.os.Handler object that was created with
Handler handler = new Handler();. A Handler is a special Android object that processes
messages. It is needed to receive callbacks from the Speech Kit library. This object can be
created inside an Activity that is associated with the main window of your application, or
with the windows or controls where voice recognition will actually be used.
Start the recognition by calling start() .
The Recognizer.Listener passed into SpeechKit.createRecognizer receives the recognition results or error
messages, as described below.
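The initialization steps above can be put together as a rough sketch. The SpeechKit.initialize() arguments (application context and Nuance server credentials) are elided because they are account-specific, so this fragment is illustrative rather than complete:

```java
// Sketch: the initialize() arguments (context, app ID, server,
// port, credentials) come from Nuance and are omitted here.
SpeechKit sk = SpeechKit.initialize(/* context and Nuance credentials */);

// Handler created on the UI thread so the library can post callbacks.
Handler handler = new Handler();

Recognizer recognizer = sk.createRecognizer(
        Recognizer.RecognizerType.Dictation,   // type
        Recognizer.EndOfSpeechDetection.Short, // detection model
        "en_US",                               // ISO 639 + "_" + ISO 3166-1
        this,                                  // a Recognizer.Listener
        handler);

// Begin listening; results arrive via the listener's onResults().
recognizer.start();
```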
Step 3: Receiving Recognition Results
To retrieve the recognition results, implement the Recognizer.Listener.onResults method. For example:
public void onResults(Recognizer recognizer, Recognition results) {
    currentRecognizer = null;
    int count = results.getResultCount();
    Recognition.Result[] rs = new Recognition.Result[count];
    for (int i = 0; i < count; i++) {
        rs[i] = results.getResult(i);
    }
    setResults(rs);
}
This method will be called only on successful completion, and the results list will have zero or more
results.
Even in the absence of an error, there may be a suggestion, present in the recognition results object,
from the speech server. This suggestion should be presented to the user.
Step 4: Using Prompts
Prompts are short audio clips or vibrations that are played during a recognition. Prompts may be
played at the following stages of the recognition:
● Recording start: the prompt is played before recording. The moment the prompt completes,
recording will begin.
● Recording stop: the prompt is played when the recorder is stopped.
● Result: the prompt is played if a successful result is received.
● Error: the prompt is played if an error occurs.
The SpeechKit.defineAudioPrompt method defines an audio prompt from a raw resource ID packaged with
the Android application. Audio prompts may consume significant system resources until release is
called, so try to minimize the number of instances. The Prompt.vibrate method defines a vibration
prompt. Vibration prompts are inexpensive: they can be created on the fly as they are used, and
there is no need to release them.
Call SpeechKit.setDefaultRecognizerPrompts to specify default audio or vibration prompts to play during all
recognitions by default. To override the default prompts in a specific recognition, call setPrompt prior
to calling start .
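A minimal sketch of the prompt setup, assuming plausible parameter lists: the resource ID, vibration duration, and the order of the four default-prompt arguments (start, stop, result, error) are assumptions based on the description above, not verified signatures:

```java
// Sketch only: parameter lists here are assumptions, not verified
// Speech Kit signatures.

// An audio prompt from a raw resource packaged with the app
// (R.raw.beep_start is a hypothetical resource name).
Prompt startBeep = sk.defineAudioPrompt(R.raw.beep_start);

// A vibration prompt; cheap to create and never needs releasing.
Prompt stopBuzz = Prompt.vibrate(100);

// Assumed order: recording start, recording stop, result, error.
// Null means no prompt at that stage.
sk.setDefaultRecognizerPrompts(startBeep, stopBuzz, null, null);
```

To override these defaults for one recognition only, call setPrompt on that recognizer before calling start.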
Step 5: Handling Errors
To be informed of any recognition errors, implement the onError method of the Recognizer.Listener
interface. In the case of errors, only this method will be called; conversely, on success this method
will not be called. In addition to the error, a suggestion, as described in the previous section, may or
may not be present. Note that both the Recognition and the SpeechError class have a getSuggestion method
that can be used to check for a suggestion from the server.
Example:
public void onError(Recognizer recognizer, SpeechError error) {
    if (recognizer != currentRecognizer) return;
    currentRecognizer = null;
    // Display the error + suggestion in the edit box
    String detail = error.getErrorDetail();
    String suggestion = error.getSuggestion();
    if (suggestion == null) suggestion = "";
    setResult(detail + "\n" + suggestion);
}
Step 6: Managing Recording State Changes
Optionally, to be informed when the recognizer starts or stops recording audio, implement
the onRecordingBegin and onRecordingDone methods of the Recognizer.Listener interface. There may be a delay
between initialization of the recognizer and the actual start of recording, so the onRecordingBegin
message can be used to signal to the user when the system is listening.
public void onRecordingBegin(Recognizer recognizer) {
    // Update the UI to indicate the system is now recording
}
The onRecordingDone message is sent before the speech server has finished receiving and processing the
audio, and therefore before the result is available.
public void onRecordingDone(Recognizer recognizer) {
    // Update the UI to indicate that recording has stopped and the
    // speech is still being processed
}
This message is sent both with and without end-of-speech detection models in place. The message
is sent regardless of whether recording was stopped by a call to the stopRecording method or by
detection of end-of-speech.
The state machine:
- Option 2: Using offline speech recognition:
Step 1. Create new Android project using Robot Activity template.
Step 2. Open res/layout/activity_main.xml in your android project and replace its content with
following:
File: res/layout/activity_main.xml
<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:orientation="vertical"
    android:layout_width="fill_parent"
    android:layout_height="fill_parent"
    android:background="@color/white">

    <EditText
        android:id="@+id/EditText01"
        android:layout_width="fill_parent"
        android:layout_height="wrap_content"
        android:layout_weight="1"
        android:contentDescription="Recognition results"
        android:text="Text goes here..."
        android:textColor="@color/black" >
    </EditText>

    <ProgressBar
        android:id="@+id/progressbar_level"
        style="?android:attr/progressBarStyleHorizontal"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:layout_weight="1" />

    <Button
        android:id="@+id/Button01"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_weight="0"
        android:layout_gravity="center_horizontal"
        android:text="Start" >
    </Button>
</LinearLayout>
The UI is very simple: one LinearLayout organizes the button and the text view. Note the id of the
button, Button01, and of the text view, EditText01, which we will use in our Java code.
Step 3. Create the speech recognizer using
SphinxSpeechRecognizer createSpeechRecognizer(Context context, ArrayList<String> grammarContent, String modelPath)
@param context: the current context
@param grammarContent: the sentences used in your application
@param modelPath: the path to the language models in the Assets folder
Note: We have to create a string list that includes all the sentences we want to use. Then we have
to initialize the recognizer and set the listener that handles the result and any error after running
recognition.
Step 4. Use prepareSpeechRecognizer() and startSpeechRecognizer() to prepare and start the recognition process.
Step 5. Use stopSpeechRecognizer() to stop the recognition process and handle the result in onResults(Bundle b).
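The offline steps can be sketched as follows. Only the method names come from the steps above; the asset path, the example Vietnamese sentences, and calling the prepare/start/stop methods on the recognizer instance are assumptions:

```java
// Step 3: list every sentence the recognizer should accept.
// The sentences are illustrative examples.
ArrayList<String> grammarContent = new ArrayList<>();
grammarContent.add("xin chao");   // "hello"
grammarContent.add("ban ten gi"); // "what is your name"

// Create the recognizer; "models/vn" is a hypothetical path to the
// language models inside the Assets folder.
SphinxSpeechRecognizer recognizer =
        createSpeechRecognizer(this, grammarContent, "models/vn");

// Step 4: prepare and start the recognition process.
recognizer.prepareSpeechRecognizer();
recognizer.startSpeechRecognizer();

// Step 5: stop recognition; the result is delivered to the
// listener's onResults(Bundle b) callback.
recognizer.stopSpeechRecognizer();
```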
The State-Machine of VN Speech Recognition.