Kinect
Lab 4: Sound
Sjoerd Houben
Introduction
In this lab we’ll look into the possibilities of the Kinect’s microphone array. One of the cool features
of the Kinect is its ability to recognize speech. This should not be confused with voice recognition,
which is the recognition of an individual voice.
Stroop effect
As an example of speech recognition, we are going to use the Stroop effect. The Stroop effect concerns
how we read a word that names a colour when the word itself is printed in a colour. For example:
You would normally read the text as black, blue and red, without even noticing the colours. But if
we use the following example:
It might take you longer to read the actual text. This is because the brain notices that the colour
of the text differs from the colour the word names.
Using this effect, we can build a working application that exercises the Kinect’s speech
recognition.
Speech recognition
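First create a new WPF Application, add a reference to the Kinect assembly (Microsoft.Kinect) and add
its namespace to MainWindow. In the Kinect for Windows SDK 1.x samples, the speech types used below
come from the Microsoft.Speech assembly (namespaces Microsoft.Speech.Recognition and
Microsoft.Speech.AudioFormat), so reference that as well. We will be using the following variables: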
private List<string> colorNames;
private List<Color> colors;
private int index = 0, indexC = 0, score = 0;
private KinectSensor kinectSensor;
private SpeechRecognitionEngine speechRecogEng;
private KinectAudioSource source;
As you can see, we are going to use two new types: SpeechRecognitionEngine and
KinectAudioSource. Their purpose will become clear in the next few steps.
Next, add two TextBlocks to the XAML file and name them textBlock1 and textBlockScore. The first
shows the actual text, the second shows the score.
The constructor calls a couple of new methods and hooks up the Loaded and Unloaded events.
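public MainWindow()
{
    InitializeComponent();
    textBlockScore.Text = score.ToString();
    InitColors();
    InitNames();
    NextColor();

    // Stop recognition and release the sensor when the window goes away.
    this.Unloaded += delegate
    {
        if (speechRecogEng != null)
        {
            // Cancel ends recognition immediately, so a separate Stop call is not needed.
            speechRecogEng.RecognizeAsyncCancel();
            speechRecogEng.Dispose();
        }
        if (kinectSensor != null)
        {
            kinectSensor.Stop();
        }
    };

    // Start the first connected Kinect once the window has loaded.
    // Note: KinectSensors[0] throws if no sensor is connected.
    this.Loaded += delegate
    {
        kinectSensor = KinectSensor.KinectSensors[0];
        kinectSensor.Start();
        StartSpeechRecognition();
    };
}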
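The methods InitColors and InitNames fill the two lists with the colours and their names. Both lists
must use the same order, because InterpretWord later uses the colour index (indexC) to look up the
expected name in colorNames.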
private void InitColors()
{
    colors = new List<Color>();
    colors.Add(Colors.Blue);
    colors.Add(Colors.Brown);
    colors.Add(Colors.Purple);
    colors.Add(Colors.Gray);
    colors.Add(Colors.Yellow);
    colors.Add(Colors.Black);
    colors.Add(Colors.Green);
}

private void InitNames()
{
    colorNames = new List<string>();
    colorNames.Add("Blue");
    colorNames.Add("Brown");
    colorNames.Add("Purple");
    colorNames.Add("Gray");
    colorNames.Add("Yellow");
    colorNames.Add("Black");
    colorNames.Add("Green");
}
The Loaded handler registered in the constructor calls the StartSpeechRecognition method. This method
is the core of the application: it sets up the audio source, the grammar and the SpeechRecognitionEngine.
private void StartSpeechRecognition()
{
    source = CreateAudioSource();

    // Select the installed recognizer that is flagged for Kinect and uses en-US.
    Func<RecognizerInfo, bool> matchingFunc = r =>
    {
        string value;
        r.AdditionalInfo.TryGetValue("Kinect", out value);
        return "True".Equals(value,
                StringComparison.InvariantCultureIgnoreCase)
            && "en-US".Equals(r.Culture.Name,
                StringComparison.InvariantCultureIgnoreCase);
    };
    RecognizerInfo ri = SpeechRecognitionEngine.InstalledRecognizers()
        .Where(matchingFunc).FirstOrDefault();
    if (ri == null)
    {
        // No matching recognizer is installed, so recognition cannot start.
        return;
    }

    speechRecogEng = new SpeechRecognitionEngine(ri.Id);
    CreateGrammar(ri);
    speechRecogEng.SpeechRecognized += sre_SpeechRecognized;
    speechRecogEng.SpeechHypothesized += sre_SpeechHypothesized;

    // Feed the Kinect audio to the engine as 16 kHz, 16-bit, mono PCM.
    Stream s = source.Start();
    speechRecogEng.SetInputToAudioStream(s,
        new SpeechAudioFormatInfo(
            EncodingFormat.Pcm, 16000, 16, 1,
            32000, 2, null));
    speechRecogEng.RecognizeAsync(RecognizeMode.Multiple);
}
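StartSpeechRecognition first looks up the RecognizerInfo of the Kinect recognizer and creates the
engine from it. It then loads the grammar with CreateGrammar and attaches two event handlers:
SpeechHypothesized fires while the engine is still guessing at a word, and SpeechRecognized fires once
it has settled on one. Finally it starts the audio stream, hands it to the engine and begins
asynchronous recognition.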
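The last two numeric arguments passed to SpeechAudioFormatInfo (32000 and 2) follow from the first three (16000 samples per second, 16 bits per sample, 1 channel). As a minimal sketch of that arithmetic, the helper below recomputes them; the class PcmFormat and its method names are made up for illustration and are not part of the lab or of any SDK.

```csharp
using System;

// Illustrative helper (an assumption, not part of the lab code): derives the
// block-align and byte-rate values from the basic PCM settings.
public static class PcmFormat
{
    // Bytes in one sample frame, i.e. one sample for every channel.
    public static int BlockAlign(int bitsPerSample, int channels)
    {
        return bitsPerSample / 8 * channels;
    }

    // Average bytes per second: frames per second times bytes per frame.
    public static int AverageBytesPerSecond(int samplesPerSecond, int bitsPerSample, int channels)
    {
        return samplesPerSecond * BlockAlign(bitsPerSample, channels);
    }
}
```

For the format used in this lab, PcmFormat.BlockAlign(16, 1) gives 2 and PcmFormat.AverageBytesPerSecond(16000, 16, 1) gives 32000, matching the values in the listing.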
To get the audio from the Kinect, we use the CreateAudioSource method. The method
gets the audio source from the KinectSensor and disables automatic gain control and echo cancellation.
private KinectAudioSource CreateAudioSource()
{
    // Use the sensor that was started in the Loaded handler.
    var source = kinectSensor.AudioSource;
    source.AutomaticGainControlEnabled = false;
    source.EchoCancellationMode = EchoCancellationMode.None;
    return source;
}
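CreateGrammar loads the strings from the colorNames list so that the Kinect can recognize the words.
It also loads a second grammar containing the word "quit", although the code in this lab does not act on it.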
private void CreateGrammar(RecognizerInfo ri)
{
    // One choice per colour name.
    var commands = new Choices();
    foreach (string colorName in colorNames)
    {
        commands.Add(colorName);
    }
    var gb = new GrammarBuilder();
    gb.Culture = ri.Culture;
    gb.Append(commands);
    var g = new Grammar(gb);
    speechRecogEng.LoadGrammar(g);

    // Separate grammar for "quit"; give it the recognizer's culture as well,
    // so loading does not depend on the culture of the machine.
    var q = new GrammarBuilder();
    q.Culture = ri.Culture;
    q.Append("quit");
    var quit = new Grammar(q);
    speechRecogEng.LoadGrammar(quit);
}
Next are the two event handlers we attached to the speech recognition engine:
private void sre_SpeechHypothesized(object sender,
    SpeechHypothesizedEventArgs e)
{
    HypothesizedText = e.Result.Text;
}

private void sre_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    Dispatcher.BeginInvoke(new
        Action<SpeechRecognizedEventArgs>(InterpretWord), e);
}
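The HypothesizedText property calls OnPropertyChanged in its setter, which raises the PropertyChanged
event so that anything bound to the property is updated. For bindings to pick up the change,
MainWindow has to be declared as implementing INotifyPropertyChanged (namespace System.ComponentModel).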
private string hypothesizedText;
public string HypothesizedText
{
    get { return hypothesizedText; }
    set
    {
        hypothesizedText = value;
        OnPropertyChanged("HypothesizedText");
    }
}

public event PropertyChangedEventHandler PropertyChanged;
private void OnPropertyChanged(string propertyName)
{
    if (PropertyChanged != null)
    {
        PropertyChanged(this, new PropertyChangedEventArgs(propertyName));
    }
}
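InterpretWord, later in this lab, assigns a Confidence property that does not appear in the listings. A minimal sketch of such a property, following the same pattern as HypothesizedText, could look like the block below. The class name RecognitionStatus is hypothetical and only keeps the example self-contained; in the lab the members would presumably sit in MainWindow itself.

```csharp
using System.ComponentModel;

// Hypothetical stand-alone class (an assumption for illustration). It shows the
// same property-change pattern as HypothesizedText, applied to Confidence.
public class RecognitionStatus : INotifyPropertyChanged
{
    private string confidence;

    public string Confidence
    {
        get { return confidence; }
        set
        {
            confidence = value;
            OnPropertyChanged("Confidence");
        }
    }

    public event PropertyChangedEventHandler PropertyChanged;

    private void OnPropertyChanged(string propertyName)
    {
        // Copy the delegate first so a concurrent unsubscribe cannot null it
        // between the check and the call.
        PropertyChangedEventHandler handler = PropertyChanged;
        if (handler != null)
        {
            handler(this, new PropertyChangedEventArgs(propertyName));
        }
    }
}
```

Because the class implements INotifyPropertyChanged, a TextBlock bound to Confidence would refresh whenever the setter runs.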
When the engine matches the incoming audio against one of the loaded grammars, the SpeechRecognized
event fires. Its handler passes the result to the InterpretWord method on the UI thread.
private void InterpretWord(SpeechRecognizedEventArgs e)
{
    var result = e.Result;
    Confidence = Math.Round(result.Confidence, 2).ToString();
    if (result.Words[0].Text == colorNames[indexC])
    {
        score++;
        NextColor();
    }
}
When the word we say matches the name of the colour in which the text is drawn (colorNames[indexC]),
the program increments the score and calls the NextColor method. Note the Confidence value: it is a
number between 0 and 1 that shows how sure the recognizer is of the word.
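InterpretWord accepts any matching word, however low its confidence. If false positives turn out to be a problem, one option is to require a minimum confidence before counting an answer. The sketch below isolates that check; the class AnswerCheck, the method IsCorrect and the 0.7 cut-off are assumptions chosen for illustration, not values from the lab or the SDK.

```csharp
using System;

// Illustrative helper (an assumption, not part of the lab code).
public static class AnswerCheck
{
    // Assumed cut-off; tune it for your own room and voice.
    public const float MinConfidence = 0.7f;

    // True only when the recognizer is confident enough and the spoken
    // word names the expected colour (case is ignored).
    public static bool IsCorrect(string spokenWord, float confidence, string expectedColorName)
    {
        if (confidence < MinConfidence)
        {
            return false;
        }
        return string.Equals(spokenWord, expectedColorName, StringComparison.OrdinalIgnoreCase);
    }
}
```

Inside InterpretWord this could be called with result.Words[0].Text, result.Confidence and colorNames[indexC] in place of the plain string comparison.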
The NextColor method uses a randomizer to get a random colour and word to appear on screen.
private void NextColor()
{
    Random rdm = new Random();

    // Pick a word that differs from the previous word.
    int i;
    do
    {
        i = rdm.Next(0, colorNames.Count);
    } while (i == index);
    index = i;
    textBlock1.Text = colorNames[index];

    // Pick a colour that differs from the previous colour and from the word,
    // so the text never names its own colour.
    int j;
    do
    {
        j = rdm.Next(0, colors.Count);
    } while (j == indexC || j == index);
    indexC = j;
    textBlock1.Foreground = new SolidColorBrush(colors[indexC]);
    textBlockScore.Text = score.ToString();
}
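Assuming the intent of NextColor is that the new word differs from the previous word, and that the new colour differs from both the previous colour and the new word, the selection logic can be isolated in a small helper that is easy to test without a Kinect. The class StroopPicker and the method PickDifferent are hypothetical names introduced for this sketch.

```csharp
using System;

// Illustrative helper (an assumption, not part of the lab code).
public static class StroopPicker
{
    // Returns a random index in [0, count) that is not listed in 'excluded'.
    // The caller must make sure at least one index is left to choose from,
    // otherwise the loop never ends.
    public static int PickDifferent(Random rng, int count, params int[] excluded)
    {
        int candidate;
        do
        {
            candidate = rng.Next(0, count);
        } while (Array.IndexOf(excluded, candidate) >= 0);
        return candidate;
    }
}
```

With seven colours, the word index would come from PickDifferent(rng, 7, previousWord) and the colour index from PickDifferent(rng, 7, previousColour, word). Passing in one shared Random instance also avoids creating a new generator on every call.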
That’s all the code you need to get this working. This is what the result should look like.