Natural User Interface with Kinect for Windows

Clemente Giorio & Paolo Patierno
Natural User Interface
Hardware Overview
3-axis ACCELEROMETER
IR PROJECTOR
MIC ARRAY
DEPTH CAMERA
RGB CAMERA
TILT MOTOR
Hardware Requirements:
• Windows 7, Windows 8, Windows Embedded Standard 7, or Windows Embedded POSReady 7
• x86 or x64 CPU
• Dual-core 2.66 GHz processor
• Dedicated USB 2.0 bus
• 2 GB RAM
Inside Kinect
IR Projector
Wavelength: 827 nm
The pattern is composed of 3x3 sub-patterns of 211x165 dots each (633x495 dots in total). In each sub-pattern, one spot is much brighter than all the others.
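The dot counts above are simple multiples: 3 x 211 = 633 columns and 3 x 165 = 495 rows. The sketch below finds which of the 3x3 sub-patterns a dot belongs to. It assumes dots are indexed by (column, row) from the top-left corner of the full pattern. That indexing scheme and the IrPattern name are illustrative assumptions, not part of the Kinect SDK.

```csharp
using System;

// Hypothetical helper: locate the sub-pattern that contains a dot of the IR pattern.
// Assumption: dots are indexed (column, row) from the top-left of the full 633x495 grid.
public static class IrPattern
{
    public const int SubWidth = 211;  // dots per sub-pattern, horizontally
    public const int SubHeight = 165; // dots per sub-pattern, vertically
    public const int Tiles = 3;       // the pattern is a 3x3 grid of sub-patterns

    public const int TotalWidth = SubWidth * Tiles;   // 633
    public const int TotalHeight = SubHeight * Tiles; // 495

    // Returns the (column, row) of the sub-pattern that contains the given dot
    public static (int TileColumn, int TileRow) SubPatternOf(int column, int row)
    {
        if (column < 0 || column >= TotalWidth) throw new ArgumentOutOfRangeException(nameof(column));
        if (row < 0 || row >= TotalHeight) throw new ArgumentOutOfRangeException(nameof(row));
        return (column / SubWidth, row / SubHeight);
    }
}
```

For example, the dot at column 211, row 164 is the first dot of the middle column of sub-patterns, in the top row: SubPatternOf(211, 164) gives (1, 0).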
Depth Camera
CMOS sensor with an IR-pass filter, up to 640x480 pixels.
Each pixel uses 11 bits, so it can represent 2^11 = 2048 levels of depth.

Mode     Physical limits                Practical limits
Near     0.4 to 3 m (1.3 to 9.8 ft)     0.8 to 2.5 m (2.6 to 8.2 ft)
Normal   0.8 to 4 m (2.6 to 13.1 ft)    1.2 to 3.5 m (4 to 11.5 ft)
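To make the difference between the physical and practical limits concrete, here is a minimal sketch that classifies a depth reading in millimeters against the limits listed above for each mode. The DepthRangeMode, DepthQuality and DepthLimits names are hypothetical and used only for illustration. The actual SDK selects the range on the depth stream itself.

```csharp
using System;

// Hypothetical types: the two depth modes and how usable a reading is in that mode
public enum DepthRangeMode { Near, Normal }
public enum DepthQuality { OutOfRange, PhysicalOnly, Practical }

public static class DepthLimits
{
    // Physical limits in millimeters, taken from the table
    static (int Min, int Max) Physical(DepthRangeMode mode) =>
        mode == DepthRangeMode.Near ? (400, 3000) : (800, 4000);

    // Practical limits in millimeters, taken from the table
    static (int Min, int Max) Practical(DepthRangeMode mode) =>
        mode == DepthRangeMode.Near ? (800, 2500) : (1200, 3500);

    public static DepthQuality Classify(int depthMm, DepthRangeMode mode)
    {
        var practical = Practical(mode);
        if (depthMm >= practical.Min && depthMm <= practical.Max) return DepthQuality.Practical;

        var physical = Physical(mode);
        if (depthMm >= physical.Min && depthMm <= physical.Max) return DepthQuality.PhysicalOnly;

        return DepthQuality.OutOfRange;
    }
}
```

For example, an object at 0.5 m can still be measured in Near mode (PhysicalOnly) but is too close for Normal mode (OutOfRange).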
RGB Camera
[Figure: IR frame]
CMOS sensor, 8 bits per channel:
• 1280x960 @ 12 fps
• 640x480 @ 30 fps
The sensor produces Bayer-filter output with an RGGB pattern.
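In an RGGB Bayer mosaic, each sensor pixel records only one color, following a repeating 2x2 tile (red, green on the first row; green, blue on the second). A full RGB image is then reconstructed by interpolation. The sketch below returns the color a pixel samples. It assumes the tile starts with red at pixel (0, 0). The actual origin of the Kinect sensor's mosaic is not stated in the slides.

```csharp
using System;

public enum BayerChannel { Red, Green, Blue }

// Sketch: the color sampled at (x, y) in an RGGB mosaic.
// Assumption: red sits at (0, 0), so each 2x2 tile is  R G / G B.
public static class Bayer
{
    public static BayerChannel ChannelAt(int x, int y)
    {
        bool evenRow = (y & 1) == 0;
        bool evenCol = (x & 1) == 0;
        if (evenRow) return evenCol ? BayerChannel.Red : BayerChannel.Green;
        return evenCol ? BayerChannel.Green : BayerChannel.Blue;
    }
}
```

Each 2x2 tile therefore holds one red, two green, and one blue sample. That is why green, to which the eye is most sensitive, gets twice the resolution of the other two channels.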
Tilt Motor & 3-axis Accelerometer
3-axis accelerometer configured for a 2g range (where g is the acceleration due to gravity), with 1 to 3 degrees of accuracy.
Tilt Motor
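At rest, the accelerometer measures only gravity, so the tilt of the sensor can be estimated from the direction of that vector. The sketch below computes a forward/backward tilt angle from a reading in g units. It assumes the level sensor reads (0, -1, 0) and that tilting it moves gravity into the Z component. Both this axis and sign convention and the Tilt name are illustrative assumptions. The result is only as accurate as the 1 to 3 degrees quoted above.

```csharp
using System;

// Sketch: tilt angle (degrees) from a gravity reading expressed in g units.
// Assumption: a level sensor reads (0, -1, 0); tilting rotates gravity in the Y-Z plane.
// The sign of the result depends on that assumed convention.
public static class Tilt
{
    public static double PitchDegrees(double x, double y, double z)
    {
        // Angle of the gravity vector away from the sensor's -Y axis, in the Y-Z plane.
        // The X component (sideways roll) does not affect this angle.
        return Math.Atan2(z, -y) * 180.0 / Math.PI;
    }
}
```

For example, a reading of (0, -0.866, 0.5) corresponds to a tilt of about 30 degrees under this convention.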
Mic Array
• 4 microphones
• 24-bit analog-to-digital converter
• The captured audio is encoded using pulse-code modulation (PCM) with a 16 kHz sampling rate and 16-bit depth.
Advantages of a microphone array:
• Enhanced noise suppression
• Acoustic echo cancellation
• Beamforming
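The PCM format above fixes the data rate: 16,000 samples per second x 16 bits = 256 kbit/s, or 32,000 bytes per second per channel. The sketch below sizes audio buffers from that format. It assumes a single (mono) output channel. The PcmFormat name is hypothetical.

```csharp
using System;

// Sketch: buffer sizing for 16 kHz, 16-bit PCM audio.
// Assumption: one (mono) channel reaches the application.
public static class PcmFormat
{
    public const int SampleRateHz = 16000;
    public const int BitsPerSample = 16;
    public const int Channels = 1;

    // 16000 samples/s * 1 channel * 2 bytes/sample = 32000 bytes/s
    public static int BytesPerSecond => SampleRateHz * Channels * BitsPerSample / 8;

    // Bytes needed to hold the given duration of audio
    public static int BufferBytes(int milliseconds) => BytesPerSecond * milliseconds / 1000;
}
```

For example, a 100 ms buffer needs 3,200 bytes, and one minute of audio is about 1.9 MB.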
SDK Overview
Camera Data
Step 1: Register for the ColorFrameReady Event
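/// Active Kinect sensor
private KinectSensor sensor;
// Pick the first connected Kinect sensor
foreach (KinectSensor potentialSensor in KinectSensor.KinectSensors)
    if (potentialSensor.Status == KinectStatus.Connected) { this.sensor = potentialSensor; break; }
// Turn on the color stream to receive 640x480 color frames at 30 fps
this.sensor.ColorStream.Enable(ColorImageFormat.RgbResolution640x480Fps30);
// Allocate the pixel buffer and the WPF bitmap that each frame is copied into
this.colorPixels = new byte[this.sensor.ColorStream.FramePixelDataLength];
this.colorBitmap = new WriteableBitmap(this.sensor.ColorStream.FrameWidth,
    this.sensor.ColorStream.FrameHeight, 96.0, 96.0, PixelFormats.Bgr32, null);
// Add an event handler to be called whenever there is new color frame data
this.sensor.ColorFrameReady += this.SensorColorFrameReady;
// Start the sensor!
this.sensor.Start();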
Step 2: Read the Stream
/// Event handler for Kinect sensor's ColorFrameReady event
private void SensorColorFrameReady(object sender, ColorImageFrameReadyEventArgs e)
{
    using (ColorImageFrame colorFrame = e.OpenColorImageFrame())
    {
        if (colorFrame != null)
        {
            // Copy the pixel data from the image to a temporary array
            colorFrame.CopyPixelDataTo(this.colorPixels);
            // Write the pixel data into our bitmap (4 bytes per pixel)
            this.colorBitmap.WritePixels(
                new Int32Rect(0, 0, this.colorBitmap.PixelWidth, this.colorBitmap.PixelHeight),
                this.colorPixels,
                this.colorBitmap.PixelWidth * sizeof(int),
                0);
        }
    }
}
DepthFrameReady Event
void sensor_DepthFrameReady(object sender, DepthImageFrameReadyEventArgs e)
{
    using (DepthImageFrame depthFrame = e.OpenDepthImageFrame())
    {
        if (depthFrame != null)
        {
            // Copy the pixel data from the image to a temporary array
            depthFrame.CopyDepthImagePixelDataTo(this.depthPixels);
            // Convert the depth pixels to colored pixels (application helper, not shown)
            ConvertDepthData2RGB(depthFrame.MinDepth, depthFrame.MaxDepth);
            // Write the colored pixels into the bitmap (4 bytes per pixel)
            this.depthBitmap.WritePixels(
                new Int32Rect(0, 0, this.depthBitmap.PixelWidth, this.depthBitmap.PixelHeight),
                this.colorDepthPixels,
                this.depthBitmap.PixelWidth * sizeof(int),
                0);
            // Update the frame-rate counter (application helper, not shown)
            UpdateFrameRate();
        }
    }
}
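The handler above calls ConvertDepthData2RGB, whose code the slides do not show. The sketch below shows one common way such a conversion can work: it maps each depth to a gray level, with near pixels bright and far pixels dark. This is a hypothetical stand-in, not the authors' helper. It assumes the depths are millimeters in a short[] and that the output is Bgr32 (4 bytes per pixel), matching the PixelWidth * sizeof(int) stride used when writing the bitmap. The DepthToBgr32 name is illustrative.

```csharp
using System;

// Hypothetical stand-in for a depth-to-color conversion (not the original helper).
// Input: depths in millimeters. Output: Bgr32 bytes (blue, green, red, unused) per pixel.
public static class DepthColorizer
{
    public static byte[] DepthToBgr32(short[] depthsMm, int minDepth, int maxDepth)
    {
        var output = new byte[depthsMm.Length * 4];
        for (int i = 0; i < depthsMm.Length; i++)
        {
            int depth = depthsMm[i];
            byte intensity = 0; // unknown or out-of-range depth is drawn black
            if (maxDepth > minDepth && depth >= minDepth && depth <= maxDepth)
            {
                // Linear scale: minDepth -> 255 (bright), maxDepth -> 0 (dark)
                intensity = (byte)(255 - (depth - minDepth) * 255 / (maxDepth - minDepth));
            }
            output[i * 4] = intensity;     // blue
            output[i * 4 + 1] = intensity; // green
            output[i * 4 + 2] = intensity; // red
            // output[i * 4 + 3] stays 0: the fourth byte is unused in Bgr32
        }
        return output;
    }
}
```

Using the frame's MinDepth and MaxDepth as the scale, as the handler does, spreads the available gray levels over the range the sensor can actually measure.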
Depth Data
• ImageFrame.Image.Bits
• Array of bytes: public byte[] Bits;
• Array
– Starts at the top-left pixel of the image
– Moves left to right, then top to bottom
– Represents the distance of each pixel in millimeters
Distance
• 2 bytes per pixel (16 bits)
• Depth: distance per pixel
– Shift the second byte left by 8 bits
– C#: Distance (0,0) = (int)(Bits[0] | Bits[1] << 8);
– VB: Distance (0,0) = CInt(Bits(0)) Or (CInt(Bits(1)) << 8)
• DepthAndPlayerIndex: includes the player index
– The low 3 bits of the first byte hold the player index, so shift the first byte right by 3 and the second byte left by 5
– C#: Distance (0,0) = (int)(Bits[0] >> 3 | Bits[1] << 5);
– VB: Distance (0,0) = (CInt(Bits(0)) >> 3) Or (CInt(Bits(1)) << 5)
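The bit layouts above can be wrapped in small helpers. The sketch below decodes one pixel from its two bytes (low byte first) in both formats and also extracts the player index, which is the low 3 bits of the first byte. The DepthBits class and method names are illustrative, not part of the Kinect SDK.

```csharp
using System;

// Sketch: decode one depth pixel stored as two bytes, low byte first.
public static class DepthBits
{
    // Depth-only format: the 16 bits are the distance in millimeters
    public static int Distance(byte low, byte high) => low | (high << 8);

    // DepthAndPlayerIndex format: bits 0-2 = player index, bits 3-15 = distance in millimeters
    public static int DistanceWithPlayer(byte low, byte high) => (low >> 3) | (high << 5);

    // Player index from the low byte of a DepthAndPlayerIndex pixel
    public static int PlayerIndex(byte low) => low & 0x07;
}
```

For example, a pixel 2000 mm away that belongs to player 2 is stored as (2000 << 3) | 2 = 0x3E82. Its bytes are 0x82 and 0x3E. DistanceWithPlayer(0x82, 0x3E) returns 2000 and PlayerIndex(0x82) returns 2.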
Skeleton Tracking
• Skeleton Data
[Figure: skeleton with X, Y, Z coordinate axes]
Skeleton modes:
• Default: 20 joints
• Seated: 10 joints
Skeleton API
Joint Data
• At most two players tracked at once
• Up to six player proposals (people detected in the scene)
• Each player has a set of <x, y, z> joints, in meters
• Each joint has an associated state:
– Tracked, Not tracked, or Inferred
– Inferred: occluded, clipped, or low-confidence joints
Step 1: SkeletonFrameReady event
// Turn on the skeleton stream to receive skeleton frames
this.sensor.SkeletonStream.Enable();
// Add an event handler to be called whenever there is new skeleton frame data
this.sensor.SkeletonFrameReady += this.SensorSkeletonFrameReady;

/// Event handler for Kinect sensor's SkeletonFrameReady event
private void SensorSkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
{
    Skeleton[] skeletons = new Skeleton[0];
    using (SkeletonFrame skeletonFrame = e.OpenSkeletonFrame())
    {
        if (skeletonFrame != null)
        {
            // Copy the skeleton data of this frame into the array
            skeletons = new Skeleton[skeletonFrame.SkeletonArrayLength];
            skeletonFrame.CopySkeletonDataTo(skeletons);
        }
    }
Step 2: Read the skeleton data
    using (DrawingContext dc = this.drawingGroup.Open())
    {
        // Draw a black background to set the render size
        dc.DrawRectangle(Brushes.Black, null, new Rect(0.0, 0.0, RenderWidth, RenderHeight));
        if (skeletons.Length != 0)
        {
            foreach (Skeleton skel in skeletons)
            {
                // Mark the edges of the view where the body is clipped
                RenderClippedEdges(skel, dc);
                if (skel.TrackingState == SkeletonTrackingState.Tracked)
                {
                    // Fully tracked: draw every bone and joint
                    this.DrawBonesAndJoints(skel, dc);
                }
                else if (skel.TrackingState == SkeletonTrackingState.PositionOnly)
                {
                    // Position only: draw a single dot at the body center
                    dc.DrawEllipse(this.centerPointBrush, null,
                        this.SkeletonPointToScreen(skel.Position),
                        BodyCenterThickness, BodyCenterThickness);
                }
            }
        }
        // Prevent drawing outside of our render area
        this.drawingGroup.ClipGeometry =
            new RectangleGeometry(new Rect(0.0, 0.0, RenderWidth, RenderHeight));
    }
}
Step 3: Use the joint data
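// Inside DrawBonesAndJoints(Skeleton skeleton, DrawingContext drawingContext):
// draw the left arm as three bones between consecutive joints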
this.DrawBone(skeleton, drawingContext, JointType.ShoulderLeft, JointType.ElbowLeft);
this.DrawBone(skeleton, drawingContext, JointType.ElbowLeft, JointType.WristLeft);
this.DrawBone(skeleton, drawingContext, JointType.WristLeft, JointType.HandLeft);
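Because joint positions are <x, y, z> coordinates in meters, the same joint pairs used to draw the left arm can also be measured. The sketch below computes bone lengths with the Euclidean distance. JointPosition is a hypothetical stand-in for the SDK's joint position, so the example is self-contained. With the SDK, the coordinates would come from the skeleton's joints, ideally only those whose state is Tracked.

```csharp
using System;

// Hypothetical stand-in for a joint position (meters)
public readonly struct JointPosition
{
    public readonly double X, Y, Z;
    public JointPosition(double x, double y, double z) { X = x; Y = y; Z = z; }
}

public static class Bones
{
    // Euclidean distance between two joints, in meters
    public static double Length(JointPosition a, JointPosition b)
    {
        double dx = a.X - b.X, dy = a.Y - b.Y, dz = a.Z - b.Z;
        return Math.Sqrt(dx * dx + dy * dy + dz * dz);
    }

    // Total length of a chain of joints, e.g. shoulder -> elbow -> wrist -> hand
    public static double ChainLength(params JointPosition[] joints)
    {
        double total = 0;
        for (int i = 1; i < joints.Length; i++)
            total += Length(joints[i - 1], joints[i]);
        return total;
    }
}
```

Summing the three left-arm bones this way gives an arm length estimate. Such an estimate can, for example, normalize gestures for users of different sizes.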
Step 4: Fine-tune
Audio
• As a microphone
• For speech recognition
Speech Recognition
• Kinect grammar available for download
• Grammar: what the recognizer is listening for, defined either
– in code, with GrammarBuilder and Choices, or
– with the Speech Recognition Grammar Specification (SRGS)
• Sample grammars: C:\Program Files (x86)\Microsoft Speech Platform SDK\Samples\Sample Grammars\
Grammar
<grammar version="1.0" xml:lang="it-IT" root="rootRule" tag-format="semantics/1.0-literals" xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="rootRule">
    <one-of>
      <item>
        <tag>FORWARD</tag>
        <one-of>
          <item> avanti </item>        <!-- "forward" -->
          <item> vai avanti </item>    <!-- "go forward" -->
          <item> avanza </item>        <!-- "advance" -->
        </one-of>
      </item>
      <item>
        <tag>BACKWARD</tag>
        <one-of>
          <item> indietro </item>      <!-- "back" -->
          <item> vai indietro </item>  <!-- "go back" -->
          <item> indietreggia </item>  <!-- "step back" -->
        </one-of>
      </item>
    </one-of>
  </rule>
</grammar>
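With tag-format="semantics/1.0-literals", the semantic result of a match is the literal text of its tag. Any of the three Italian phrases for moving forward therefore yields FORWARD, and any of the three for moving back yields BACKWARD. The sketch below expresses that same phrase-to-command mapping as a plain lookup, which can be handy for testing the robot side without a recognizer. It does not use the Speech Platform API. The RobotCommands and TagFor names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Sketch: the phrase -> semantic tag mapping that the SRGS grammar expresses,
// as a plain dictionary (no speech recognition involved).
public static class RobotCommands
{
    static readonly Dictionary<string, string> PhraseToTag = new Dictionary<string, string>
    {
        ["avanti"] = "FORWARD",        // "forward"
        ["vai avanti"] = "FORWARD",    // "go forward"
        ["avanza"] = "FORWARD",        // "advance"
        ["indietro"] = "BACKWARD",     // "back"
        ["vai indietro"] = "BACKWARD", // "go back"
        ["indietreggia"] = "BACKWARD", // "step back"
    };

    // Returns the semantic tag for a phrase, or null if the grammar does not contain it
    public static string TagFor(string phrase)
    {
        string key = phrase.Trim().ToLowerInvariant();
        return PhraseToTag.TryGetValue(key, out var tag) ? tag : null;
    }
}
```

Keeping the command names (FORWARD, BACKWARD) separate from the spoken phrases means new synonyms or another language can be added to the grammar without changing the code that drives the robot.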
Netduino Plus based robot
• Magician chassis: frame and 2 DC motors
• Motor driver
• WiFi bridge
• Netduino Plus
Demo
[Figure: demo architecture with MotionControlRemote, MotionClient, MotionServer (Connect & Commands), MotionControlTB6612FNG, and the TB6612FNG motor driver]
Demo
To watch a few moments recorded during the session:
Gesture Recognition demo: https://vimeo.com/58336449
Speech Recognition demo in Neapolitan: https://vimeo.com/58336020
Resources & Contact
Kinect for Windows:
http://www.microsoft.com/en-us/kinectforwindows/
MSDN:
http://msdn.microsoft.com/en-us/library/hh855347.aspx
Clemente Giorio:
http://it.linkedin.com/pub/clemente-giorio/11/618/3a
Paolo Patierno:
http://it.linkedin.com/in/paolopatierno
Thanks to our sponsors!