Natural User Interface with Kinect for Windows
Clemente Giorio & Paolo Patierno

Natural User Interface

Hardware Overview
Main components: 3-axis accelerometer, IR projector, microphone array, depth camera, RGB camera, tilt motor.

Hardware Requirements:
• Windows 7, Windows 8, Windows Embedded Standard 7, or Windows Embedded POSReady 7
• x86 or x64 CPU, dual-core, 2.66 GHz
• Dedicated USB 2.0 bus
• 2 GB RAM

Inside Kinect

IR Projector
Emits at 827 nm. The pattern is composed of 3x3 sub-patterns of 211x165 dots each (633x495 dots in total). In each sub-pattern, one spot is much brighter than all the others.

Depth Camera
CMOS sensor with an IR-pass filter, up to 640x480 pixels. Each pixel, based on 11 bits, can represent 2048 levels of depth.

Mode     Physical limits               Practical limits
Near     0.4 to 3 m (1.3 to 9.8 ft)    0.8 to 2.5 m (2.6 to 8.2 ft)
Normal   0.8 to 4 m (2.6 to 13.1 ft)   1.2 to 3.5 m (4 to 11.5 ft)

RGB Camera
CMOS sensor, also usable to read the raw IR frame: 1280x960 at 12 fps, or 640x480 at 30 fps with 8 bits per channel, producing a Bayer filter output with an RGGB pattern.

Tilt Motor & 3-axis Accelerometer
The 3-axis accelerometer is configured for a 2g range (g is the acceleration due to gravity) and reads the sensor tilt with 1-3 degrees of accuracy. The tilt motor adjusts the sensor's pitch.

Mic Array
• 4 microphones with a 24-bit analog-to-digital converter
• Captured audio is encoded with Pulse-Code Modulation (PCM) at a 16 kHz sampling rate and 16-bit depth
• Advantages of multiple microphones: enhanced noise suppression, acoustic echo cancellation, and beam-forming

SDK Overview

Camera Data
Step 1: Register for the ColorFrameReady event

    /// Active Kinect sensor
    private KinectSensor sensor;

    // Turn on the color stream to receive color frames
    this.sensor.ColorStream.Enable(ColorImageFormat.RgbResolution640x480Fps30);

    // Add an event handler to be called whenever there is new color frame data
    this.sensor.ColorFrameReady += this.SensorColorFrameReady;

    // Start the sensor!
    this.sensor.Start();
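Step 1 assumes this.sensor has already been assigned. A minimal sketch of the setup the official SDK samples perform before this point: the discovery loop and the buffer allocation are assumptions (not the slides' code), while colorPixels and colorBitmap are the fields used in Step 2 below.

    using Microsoft.Kinect;
    using System.Windows.Media;
    using System.Windows.Media.Imaging;

    // Pick the first Kinect sensor that reports itself as connected
    foreach (KinectSensor potentialSensor in KinectSensor.KinectSensors)
    {
        if (potentialSensor.Status == KinectStatus.Connected)
        {
            this.sensor = potentialSensor;
            break;
        }
    }

    if (this.sensor != null)
    {
        // After ColorStream.Enable(): a byte buffer sized for one color frame,
        // and a WriteableBitmap to display it
        this.colorPixels = new byte[this.sensor.ColorStream.FramePixelDataLength];
        this.colorBitmap = new WriteableBitmap(
            this.sensor.ColorStream.FrameWidth, this.sensor.ColorStream.FrameHeight,
            96.0, 96.0, PixelFormats.Bgr32, null);
    }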
Step 2: Read the stream

    /// Event handler for the Kinect sensor's ColorFrameReady event
    private void SensorColorFrameReady(object sender, ColorImageFrameReadyEventArgs e)
    {
        using (ColorImageFrame colorFrame = e.OpenColorImageFrame())
        {
            if (colorFrame != null)
            {
                // Copy the pixel data from the image to a temporary array
                colorFrame.CopyPixelDataTo(this.colorPixels);

                // Write the pixel data into our bitmap
                this.colorBitmap.WritePixels(
                    new Int32Rect(0, 0, this.colorBitmap.PixelWidth, this.colorBitmap.PixelHeight),
                    this.colorPixels,
                    this.colorBitmap.PixelWidth * sizeof(int),
                    0);
            }
        }
    }

DepthFrameReady event

    void sensor_DepthFrameReady(object sender, DepthImageFrameReadyEventArgs e)
    {
        using (DepthImageFrame depthFrame = e.OpenDepthImageFrame())
        {
            if (depthFrame != null)
            {
                // Copy the pixel data from the image to a temporary array
                depthFrame.CopyDepthImagePixelDataTo(this.depthPixels);

                // Convert the depth pixels to colored pixels
                ConvertDepthData2RGB(depthFrame.MinDepth, depthFrame.MaxDepth);

                this.depthBitmap.WritePixels(
                    new Int32Rect(0, 0, this.depthBitmap.PixelWidth, this.depthBitmap.PixelHeight),
                    this.colorDepthPixels,
                    this.depthBitmap.PixelWidth * sizeof(int),
                    0);

                UpdateFrameRate();
            }
        }
    }

Depth Data
• ImageFrame.Image.Bits
• Array of bytes: public byte[] Bits;
• The array starts at the top left of the image and moves left to right, then top to bottom
• Each entry represents the distance for a pixel in millimeters

Distance
• 2 bytes per pixel (16 bits)
• Depth mode - distance per pixel: shift the second byte left by 8
  Distance(0,0) = (int)(Bits[0] | Bits[1] << 8);
  VB: CInt(Bits(0)) Or (CInt(Bits(1)) << 8)
• DepthAndPlayerIndex mode - the low 3 bits of the first byte hold the player index: shift the first byte right by 3 and the second byte left by 5
  Distance(0,0) = (int)(Bits[0] >> 3 | Bits[1] << 5);
  VB: (CInt(Bits(0)) >> 3) Or (CInt(Bits(1)) << 5)

Skeleton Tracking
• Skeleton data: each joint position is an (X, Y, Z) coordinate in meters in the sensor's skeleton space
• Default mode tracks 20 joints; Seated mode tracks 10 joints

Skeleton API - Joint Data
• Maximum two players tracked at once
• Six player proposals
• Each player with a set of <x, y, z> joints in meters
• Each joint has an associated state: Tracked, Not Tracked, or Inferred
• Inferred: occluded, clipped, or low-confidence joints

Step 1: Register for the SkeletonFrameReady event

    // Turn on the skeleton stream to receive skeleton frames
    this.sensor.SkeletonStream.Enable();

    // Add an event handler to be called whenever there is new skeleton frame data
    this.sensor.SkeletonFrameReady += this.SensorSkeletonFrameReady;

    /// Event handler for the Kinect sensor's SkeletonFrameReady event
    private void SensorSkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
    {
        Skeleton[] skeletons = new Skeleton[0];

        using (SkeletonFrame skeletonFrame = e.OpenSkeletonFrame())
        {
            if (skeletonFrame != null)
            {
                skeletons = new Skeleton[skeletonFrame.SkeletonArrayLength];
                skeletonFrame.CopySkeletonDataTo(skeletons);
            }
        }

Step 2: Read the skeleton data

        using (DrawingContext dc = this.drawingGroup.Open())
        {
            // Draw a transparent background to set the render size
            dc.DrawRectangle(Brushes.Black, null, new Rect(0.0, 0.0, RenderWidth, RenderHeight));

            if (skeletons.Length != 0)
            {
                foreach (Skeleton skel in skeletons)
                {
                    RenderClippedEdges(skel, dc);

                    if (skel.TrackingState == SkeletonTrackingState.Tracked)
                    {
                        this.DrawBonesAndJoints(skel, dc);
                    }
                    else if (skel.TrackingState == SkeletonTrackingState.PositionOnly)
                    {
                        dc.DrawEllipse(this.centerPointBrush, null,
                            this.SkeletonPointToScreen(skel.Position),
                            BodyCenterThickness, BodyCenterThickness);
                    }
                }
            }

            // Prevent drawing outside of our render area
            this.drawingGroup.ClipGeometry =
                new RectangleGeometry(new Rect(0.0, 0.0, RenderWidth, RenderHeight));
        }
    }

Step 3: Use the joint data

    // Left arm
    this.DrawBone(skeleton, drawingContext, JointType.ShoulderLeft, JointType.ElbowLeft);
    this.DrawBone(skeleton, drawingContext, JointType.ElbowLeft, JointType.WristLeft);
    this.DrawBone(skeleton, drawingContext, JointType.WristLeft, JointType.HandLeft);
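The ConvertDepthData2RGB helper called in the DepthFrameReady handler earlier is not shown on the slides. A minimal sketch, assuming this.depthPixels is the DepthImagePixel[] filled by CopyDepthImagePixelDataTo and this.colorDepthPixels is a byte[] with four bytes (Bgr32) per depth pixel; the grey-scale mapping is an assumption, not the authors' implementation.

    // Map each depth value to a grey intensity (sketch, not the slides' code)
    private void ConvertDepthData2RGB(int minDepth, int maxDepth)
    {
        int colorIndex = 0;
        for (int i = 0; i < this.depthPixels.Length; ++i)
        {
            // DepthImagePixel exposes the distance in millimeters directly,
            // so no manual bit-shifting is needed here
            short depth = this.depthPixels[i].Depth;

            // Out-of-range pixels become black; in-range ones scale to 0-255
            byte intensity = (byte)(depth >= minDepth && depth <= maxDepth
                ? depth * 255 / maxDepth
                : 0);

            this.colorDepthPixels[colorIndex++] = intensity; // blue
            this.colorDepthPixels[colorIndex++] = intensity; // green
            this.colorDepthPixels[colorIndex++] = intensity; // red
            ++colorIndex;                                    // skip the unused alpha byte
        }
    }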
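Likewise, SkeletonPointToScreen (used in Step 2) has to project a skeleton-space point, in meters, onto image pixels. A sketch using the SDK's CoordinateMapper (SDK 1.6 and later); the choice of the 640x480 depth format is an assumption. DrawBonesAndJoints and DrawBone presumably follow the same pattern: map each joint with this helper, then draw lines between the mapped points.

    using System.Windows;   // Point
    using Microsoft.Kinect; // SkeletonPoint, DepthImagePoint, CoordinateMapper

    // Convert a SkeletonPoint (meters) to 2D depth-image pixel coordinates (sketch)
    private Point SkeletonPointToScreen(SkeletonPoint skelPoint)
    {
        // CoordinateMapper handles the projection from skeleton space to depth space
        DepthImagePoint depthPoint = this.sensor.CoordinateMapper.MapSkeletonPointToDepthPoint(
            skelPoint, DepthImageFormat.Resolution640x480Fps30);

        return new Point(depthPoint.X, depthPoint.Y);
    }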
Step 4: Fine-tune

Audio
• As a microphone
• For speech recognition

Speech Recognition
• A Kinect grammar is available to download
• Grammar - what we are listening for:
  - In code: GrammarBuilder, Choices
  - Speech Recognition Grammar Specification (SRGS)
• Sample grammars: C:\Program Files (x86)\Microsoft Speech Platform SDK\Samples\Sample Grammars\

Grammar
An SRGS grammar for Italian voice commands (avanti / vai avanti / avanza = "forward"; indietro / vai indietro / indietreggia = "backward"):

    <grammar version="1.0" xml:lang="it-IT" root="rootRule"
             tag-format="semantics/1.0-literals"
             xmlns="http://www.w3.org/2001/06/grammar">
      <rule id="rootRule">
        <one-of>
          <item>
            <tag>FORWARD</tag>
            <one-of>
              <item> avanti </item>
              <item> vai avanti </item>
              <item> avanza </item>
            </one-of>
          </item>
          <item>
            <tag>BACKWARD</tag>
            <one-of>
              <item> indietro </item>
              <item> vai indietro </item>
              <item> indietreggia </item>
            </one-of>
          </item>
        </one-of>
      </rule>
    </grammar>

Netduino Plus based robot
• Magician chassis: frame and 2 DC motors
• Motor driver (TB6612FNG)
• WiFi bridge
• Netduino Plus

Demo: MotionControlRemote
MotionClient connects and sends commands to MotionServer, which drives the motors through the MotionControlTB6612FNG component and the TB6612FNG motor driver.

DEMO
To watch some moments recorded during the session:
• Gesture recognition demo: https://vimeo.com/58336449
• Speech recognition demo in Neapolitan: https://vimeo.com/58336020

Resources & Contact
Kinect for Windows: http://www.microsoft.com/en-us/kinectforwindows/
MSDN: http://msdn.microsoft.com/en-us/library/hh855347.aspx
Clemente Giorio: http://it.linkedin.com/pub/clemente-giorio/11/618/3a
Paolo Patierno: http://it.linkedin.com/in/paolopatierno

We thank the sponsors!