Lesson 3: Hearing Things

In this module, you will learn:
● What speech recognition is
● How to use speech recognition on the NAO Framework:
  ○ English speech recognition: online and offline
  ○ Vietnamese speech recognition: online and offline

Contents:
● Speech Recognition on NAO
● Speech Recognition on NAO Framework
● Task 1: English Speech Recognition
● Task 2: Vietnamese Speech Recognition

Speech Recognition on NAO:

Humans frequently communicate through speech. For example, a common greeting when we meet someone is “hi” or “how are you?” We process speech automatically and understand the meaning of the words we hear nearly instantaneously.

On a robot, this process is more involved. The NAO humanoid robot has microphones on its head that it uses to listen to the sounds around it. However, unlike our ears, which listen all the time, the NAO has to be programmed to listen for sounds at specific times. After it hears human speech, the NAO performs speech recognition with an algorithm that converts what it hears into words that it knows. To do so, the NAO requires a library of words that it expects to hear. For example, the library can contain two words, “yes” and “no”. When the NAO processes the sounds it hears, it will classify them as “yes”, “no”, or neither of the two. You may have had experience with a similar system when using automated phone services or voice control on your cellphone, where you are given a list of options that you can speak to select.

Once a word is recognized, the NAO can then be programmed to react in different ways. After hearing “yes”, the NAO could reply “I am happy”, and after hearing “no”, the NAO could say “I am sad”. If the NAO does not understand the words (they did not sound like “yes” or “no”), then the NAO could reply “I don’t know.” This is called a conditional on the robot, and we will go into more detail in the tasks below.
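To make the idea concrete, here is a minimal Java sketch of that conditional. The reactTo() method and the say() helper are hypothetical placeholders, not NAO Framework calls; the real framework calls are introduced in the tasks below.

    // A minimal sketch of the yes/no conditional described above.
    // say() is a hypothetical stand-in for the robot's text-to-speech call.
    static void reactTo(String recognizedWord) {
        if (recognizedWord.equals("yes")) {
            say("I am happy");
        } else if (recognizedWord.equals("no")) {
            say("I am sad");
        } else {
            say("I don't know");  // the sound matched neither word in the library
        }
    }

    static void say(String sentence) {
        System.out.println("NAO says: " + sentence);  // placeholder for speech output
    }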
Speech Recognition on NAO Framework:

The NAO Framework provides two choices of speech recognition:

- On an Android phone: if our application uses speech recognition on the Android phone, the programming flow is shown in the figure below.
  Advantage:
  - Can use popular speech recognition engines such as Google’s.
  Disadvantage:
  -
- On the NAO robot: if we want to use a speech recognition service on the NAO itself, we first have to register the service on the NAO robot via the NAO Framework.

Task 1: English Speech Recognition

In this lesson, we will learn how to use speech recognition with the NAO Framework. We will program the NAO to recognize a question about the robot’s name and to give a response.

- Option 1: Using speech recognition on the Android phone
  Use Android speech recognition (http://developer.android.com/reference/android/speech/package-summary.html).
- Option 2: Using speech recognition on the NAO robot

Step 1. Create a new Android project using the Robot Activity template.

Step 2. Open res/layout/activity_main.xml in your Android project and replace its content with the following:

File: res/layout/activity_main.xml

<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent" >

    <LinearLayout
        android:layout_width="fill_parent"
        android:layout_height="fill_parent"
        android:orientation="vertical">

        <LinearLayout
            android:layout_width="fill_parent"
            android:layout_height="wrap_content"
            android:orientation="horizontal">

            <Button
                android:id="@+id/btSpeechRecognitionStart"
                android:layout_width="fill_parent"
                android:layout_height="fill_parent"
                android:layout_weight="1"
                android:text="Start Recognition" />

            <Button
                android:id="@+id/btSpeechRecognitionStop"
                android:layout_width="fill_parent"
                android:layout_height="fill_parent"
                android:layout_weight="1"
                android:text="Stop Recognition" />
        </LinearLayout>
    </LinearLayout>
</RelativeLayout>

The UI is very simple: nested LinearLayouts organize the two buttons. Note the ids of the two buttons, btSpeechRecognitionStart and btSpeechRecognitionStop, which we will use in our Java code.

Step 3. Register a RobotEvent to handle the result. Use RobotEventReceiver.register() to register events from the NAO robot.

Step 4. Set the language to use. In this step, we set the language, the language parameters, and the sentence list:
- Use RobotSpeechRecognition.setVisualExpression() to set the visual expression.
- Use RobotSpeechRecognition.setAudioExpression() to set the audio expression.
- Use RobotSpeechRecognition.setVocabulary() to set the vocabulary.
- Use RobotSpeechRecognition.getAvailableLanguages() to get all available languages.
- Use RobotSpeechRecognition.getCurrentLanguage() to get the current language.
- Use RobotSpeechRecognition.setCurrentLanguage() to set the current language.

Step 5. Subscribe to the speech recognition event. We have to subscribe to the speech recognition event to use it on the NAO robot, using the RobotEventSubscriber.subscribeEvent() function.

Step 6. Start the recognition process and handle the result. After that, we can use speech recognition on the NAO: speak a sentence from the sentence list above, and the NAO robot will recognize it and return the result.

Step 7. Unsubscribe from the speech recognition event. To stop the speech recognition module, unsubscribe from the robot event that you subscribed to above, using the RobotEventSubscriber.unsubscribeEvent() function. A sketch tying Steps 3–7 together follows the state-machine figure.

The State-Machine of NAO Speech Recognition.
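The following is a minimal sketch of Steps 3–7. The class and method names come from the steps above, but every signature, the event name "WordRecognized", and the callback shape are assumptions, not the documented NAO Framework API; consult the framework documentation for the real signatures.

    import java.util.ArrayList;

    // Hedged sketch of Task 1, Steps 3-7, with assumed signatures.
    public class EnglishRecognitionSketch {

        public void runRecognition() {
            // The sentence list the NAO should listen for.
            ArrayList<String> vocabulary = new ArrayList<String>();
            vocabulary.add("what is your name");

            // Step 3: register to receive robot events (assumed to take a listener).
            RobotEventReceiver.register(this);

            // Step 4: language, feedback expressions, and vocabulary.
            RobotSpeechRecognition.setCurrentLanguage("English");
            RobotSpeechRecognition.setAudioExpression(true);   // beep while listening (assumed flag)
            RobotSpeechRecognition.setVisualExpression(true);  // eye LEDs while listening (assumed flag)
            RobotSpeechRecognition.setVocabulary(vocabulary);

            // Step 5: subscribe to the recognition event (assumed event name).
            RobotEventSubscriber.subscribeEvent("WordRecognized");
        }

        // Step 6: handle the result (hypothetical callback shape).
        public void onRobotEvent(String eventName, String recognizedSentence) {
            if (recognizedSentence.equals("what is your name")) {
                // Reply, e.g. with text-to-speech: "My name is NAO".
            }
        }

        public void stopRecognition() {
            // Step 7: unsubscribe to stop the speech recognition module.
            RobotEventSubscriber.unsubscribeEvent("WordRecognized");
        }
    }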
Task 2: Vietnamese Speech Recognition

This next exercise is similar to the previous task, but uses Vietnamese speech recognition.

- Option 1: Using online speech recognition

Step 1. Create a new Android project using the Robot Activity template.

Step 2. Initiating a recognition. Before you use speech recognition, ensure that you have set up the core Speech Kit library with the SpeechKit.initialize() method. Then create and initialize a Recognizer object:

recognizer = sk.createRecognizer(Recognizer.RecognizerType.Dictation,
                                 Recognizer.EndOfSpeechDetection.Short,
                                 "en_US", this, handler);

The SpeechKit.createRecognizer method initializes a recognizer and starts the speech recognition process.
- The type parameter is a String, generally one of the recognition type constants defined in the Speech Kit library and available in the class documentation for Recognizer. Nuance may provide you with a different value for your unique recognition needs, in which case you will enter the raw String.
- The detection parameter determines the end-of-speech detection model and must be one of the Recognizer.EndOfSpeechDetection types.
- The language parameter defines the speech language as a string in the format of the ISO 639 language code, followed by an underscore “_”, followed by the ISO 3166-1 country code; for example, Vietnamese is “vi_VN”.
- The this parameter defines the object that receives status, error, and result messages from the recognizer. It can be replaced with any object that implements the Recognizer.Listener interface.
- The handler parameter should be an android.os.Handler object created with:

Handler handler = new Handler();

A Handler is a special Android object that processes messages; it is needed to receive callbacks from the Speech Kit library. This object can be created inside an Activity that is associated with the main window of your application, or with the windows or controls where voice recognition will actually be used.

Start the recognition by calling start(). The Recognizer.Listener passed into SpeechKit.createRecognizer receives the recognition results or error messages, as described below.

Step 3. Receiving Recognition Results. To retrieve the recognition results, implement the Recognizer.Listener.onResults method. For example:

public void onResults(Recognizer recognizer, Recognition results) {
    currentRecognizer = null;
    int count = results.getResultCount();
    Recognition.Result[] rs = new Recognition.Result[count];
    for (int i = 0; i < count; i++) {
        rs[i] = results.getResult(i);
    }
    setResults(rs);
}

This method will be called only on successful completion, and the results list will have zero or more results. Even in the absence of an error, the recognition results object may contain a suggestion from the speech server; this suggestion should be presented to the user.

Step 4. Using Prompts. Prompts are short audio clips or vibrations that are played during a recognition. Prompts may be played at the following stages of the recognition:
- Recording start: the prompt is played before recording. The moment the prompt completes, recording will begin.
- Recording stop: the prompt is played when the recorder is stopped.
- Result: the prompt is played if a successful result is received.
- Error: the prompt is played if an error occurs.

The SpeechKit.defineAudioPrompt method defines an audio prompt from a raw resource ID packaged with the Android application. Audio prompts may consume significant system resources until release is called, so try to minimize the number of instances. The Prompt.vibrate method defines a vibration prompt. Vibration prompts are inexpensive: they can be created on the fly as they are used, and there is no need to release them. Call SpeechKit.setDefaultRecognizerPrompts to specify default audio or vibration prompts to play during all recognitions by default. To override the default prompts in a specific recognition, call setPrompt prior to calling start (see the sketch below).
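The following sketch shows how these prompt calls might be combined. It assumes a SpeechKit instance sk, a raw audio resource R.raw.start_beep bundled with the application, a 100 ms vibration, and a (start, stop, result, error) parameter order for setDefaultRecognizerPrompts; all of these are assumptions to verify against the Speech Kit class documentation.

    // Hedged sketch: define prompts and install them as the defaults for
    // all recognitions. Resource id, duration, and parameter order are assumed.
    void configurePrompts() {
        Prompt startBeep = sk.defineAudioPrompt(R.raw.start_beep); // plays just before recording begins
        Prompt stopVibe = Prompt.vibrate(100);                     // short vibration when recording stops

        // Assumed order: recording-start, recording-stop, result, error.
        // No result or error prompts (null) in this sketch.
        sk.setDefaultRecognizerPrompts(startBeep, stopVibe, null, null);

        // Audio prompts hold system resources; call release() on startBeep
        // once it is no longer needed.
    }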
Step 5. Handling Errors. To be informed of any recognition errors, implement the onError method of the Recognizer.Listener interface. In the case of an error, only this method will be called; conversely, on success this method will not be called. In addition to the error, a suggestion, as described in the previous section, may or may not be present. Note that both the Recognition and the SpeechError classes have a getSuggestion method that can be used to check for a suggestion from the server. Example:

public void onError(Recognizer recognizer, SpeechError error) {
    if (recognizer != currentRecognizer) return;
    currentRecognizer = null;

    // Display the error + suggestion in the edit box
    String detail = error.getErrorDetail();
    String suggestion = error.getSuggestion();
    if (suggestion == null) suggestion = "";
    setResult(detail + "\n" + suggestion);
}

Step 6. Managing Recording State Changes. Optionally, to be informed when the recognizer starts or stops recording audio, implement the onRecordingBegin and onRecordingDone methods of the Recognizer.Listener interface. There may be a delay between initialization of the recognizer and the actual start of recording, so the onRecordingBegin message can be used to signal to the user when the system is listening.

public void onRecordingBegin(Recognizer recognizer) {
    // Update the UI to indicate the system is now recording
}

The onRecordingDone message is sent before the speech server has finished receiving and processing the audio, and therefore before the result is available.

public void onRecordingDone(Recognizer recognizer) {
    // Update the UI to indicate that recording has stopped and the speech is still being processed
}

This message is sent both with and without end-of-speech detection models in place, regardless of whether recording was stopped by calling the stopRecording method or by detecting end-of-speech.

The state machine:

- Option 2: Using offline speech recognition

Step 1. Create a new Android project using the Robot Activity template.

Step 2. Open res/layout/activity_main.xml in your Android project and replace its content with the following:

File: res/layout/activity_main.xml

<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:orientation="vertical"
    android:layout_width="fill_parent"
    android:layout_height="fill_parent"
    android:background="@color/white">

    <EditText
        android:id="@+id/EditText01"
        android:layout_width="fill_parent"
        android:layout_height="wrap_content"
        android:layout_weight="1"
        android:contentDescription="Recognition results"
        android:text="Text goes here..."
        android:textColor="@color/black" >
    </EditText>

    <ProgressBar
        android:id="@+id/progressbar_level"
        style="?android:attr/progressBarStyleHorizontal"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:layout_weight="1" />

    <Button
        android:id="@+id/Button01"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_weight="0"
        android:layout_gravity="center_horizontal"
        android:text="Start">
    </Button>
</LinearLayout>

The UI is very simple: one LinearLayout organizes the text view, progress bar, and button. Note the ids of the button, Button01, and the text view, EditText01, which we will use in our Java code.

Step 3. Create the speech recognizer using SphinxSpeechRecognizer:

createSpeechRecognizer(Context context, ArrayList<String> grammarContent, String modelPath)
@param context: the current context
@param grammarContent: the sentences used in your application
@param modelPath: the path to the language models in the Assets folder

Note: we have to create a string list containing all the sentences we want to use. Then we have to initialize the recognizer and set the listener that handles the result and any error after recognition runs.

Step 4. Use prepareSpeechRecognizer() and startSpeechRecognizer() to prepare and start the recognition process.

Step 5. Use stopSpeechRecognizer() to stop the recognition process, and handle the result in onResults(Bundle b). A minimal sketch tying these calls together follows the state-machine figure.
The State-Machine of Vietnamese Speech Recognition.
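Here is a minimal sketch of Steps 3–5. Only the five method names (createSpeechRecognizer, prepareSpeechRecognizer, startSpeechRecognizer, stopSpeechRecognizer, onResults) come from the steps above; the static factory call, the listener wiring, the model path "models/vi", and the Bundle key "result" are assumptions.

    import java.util.ArrayList;
    import android.content.Context;
    import android.os.Bundle;

    // Hedged sketch of offline Vietnamese recognition (Steps 3-5 above),
    // with assumed class structure and signatures.
    public class VietnameseRecognitionSketch {

        private SphinxSpeechRecognizer recognizer;

        public void start(Context context) {
            // The sentence list the recognizer should accept.
            ArrayList<String> grammarContent = new ArrayList<String>();
            grammarContent.add("xin chào");      // "hello"
            grammarContent.add("bạn tên là gì"); // "what is your name"

            // Step 3: create the recognizer with the sentences and model path
            // ("models/vi" is a hypothetical path inside the Assets folder).
            recognizer = SphinxSpeechRecognizer.createSpeechRecognizer(
                    context, grammarContent, "models/vi");

            // Step 4: prepare, then start listening.
            recognizer.prepareSpeechRecognizer();
            recognizer.startSpeechRecognizer();
        }

        public void stop() {
            // Step 5: stop; the result is delivered to onResults(Bundle b).
            recognizer.stopSpeechRecognizer();
        }

        // Hypothetical listener callback:
        public void onResults(Bundle b) {
            String sentence = b.getString("result"); // assumed key
            // Display the recognized sentence in EditText01.
        }
    }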