A REAL-TIME PERSONALIZED GESTURE INTERACTION SYSTEM USING WII REMOTE AND KINECT FOR TILED-DISPLAY ENVIRONMENT
Yihua Lou, Wenjun Wu, Beihang University

OUTLINE
• Background & Problem
• Related Works
• System & Algorithm Design
• Experiments
• Conclusion

BACKGROUND
• Large tiled-display environment
  • Large virtual desktop (tens of millions of pixels)
  • View and manipulate juxtaposed applications
  • Suitable for multi-user interaction & collaboration
• Somatosensory devices
  • Wii Remote: 3-axis accelerometer
  • Kinect: 30 fps RGB & depth images, skeleton tracking

PROBLEM
• Interaction methods
  • Device-based: not suitable in a large-space environment
  • Gesture-based: suitable, but poses some technical challenges
• Gesture interaction challenges
  • Gestures are personal
  • Gestures may vary over time
  • It is difficult to define a standard gesture vocabulary

RELATED WORKS
• Gesture recognition systems
  • uWave: acceleration-based, personalized, DTW-based
  • Wiigee: acceleration-based, user-independent, HMM-based
  • Several other Kinect-based recognition systems
• Gesture recognition algorithms
  • HMM: the most popular choice for user-independent recognition; requires a large training dataset
  • DTW: needs no training; suitable for personalized recognition

SYSTEM & ALGORITHM DESIGN: OVERVIEW
• Design aspects
  • Easy to use
  • Personalized
  • Two-handed
  • Ongoing gesture recognition
• System architecture
  • Gesture Input Client

SYSTEM & ALGORITHM DESIGN: FEATURE SELECTION

SYSTEM & ALGORITHM DESIGN: FILTER AND QUANTIZATION
• Moving-average filter: reduces noise
  • Window: five samples for acceleration data, three samples for skeleton position data
  • Step: one sample
• Quantization: improves efficiency
  • Acceleration data: non-linear, [-3g, +3g] → [-31, 31]
  • Skeleton position data: linear, [-1 m, +1 m] → [-30, 30]; ±31 for other
values

  Original acceleration value   Quantized value
  a > +2g                       31
  +g < a ≤ +2g                  21 to 30 (linearly)
  -g ≤ a ≤ +g                   -20 to 20 (linearly)
  -2g ≤ a < -g                  -30 to -21 (linearly)
  a < -2g                       -31

SYSTEM & ALGORITHM DESIGN: DYNAMIC TIME WARPING
(Figure: warping path aligning an input time series with a template time series)

SYSTEM & ALGORITHM DESIGN: TEMPLATE ADAPTATION
• Template accuracy is important
  • It directly affects recognition accuracy
  • Data may vary between performances of the same gesture
• Templates are adapted when rejections occur
  • Triggered by two consecutive or three accumulated rejections
  • The previously recognized gesture input with the minimal accumulated DTW distance becomes the new template

SYSTEM & ALGORITHM DESIGN: PROCESSING FLOW
• Processing flow of the Gesture Input Client

EXPERIMENTS: ENVIRONMENT
• Hardware
  • ThinkPad T420s: Core i5-2520M / 4 GB RAM
  • Wii Remote controller with the Nunchuk extension
  • Kinect for Windows sensor
• Software
  • Windows 7 SP1
  • Visual C++ 2012
  • Kinect for Windows SDK 1.6

EXPERIMENTS: DATASET
• Eight two-handed gestures: Horizontal Zoom-In (HI), Horizontal Zoom-Out (HO), Vertical Zoom-In (VI), Vertical Zoom-Out (VO), Rotate Left (RL), Rotate Right (RR), Push Forward (PF), Pull Back (PB)
• Six individuals (two females, four males)
• Five days
• 2400 samples

EXPERIMENTS: RECOGNITION ACCURACY
• Data from different days
  • Without template adaptation: 6.7% rejection and 1.8% error on average
  • With template adaptation: 4.4% rejection and 1.1% error on average

  Without template adaptation
             HI     HO     VI     VO     RL     RR     PF     PB     Avg
  Correct    94.4%  95.6%  98.9%  98.9%  76.1%  78.9%  97.8%  91.7%  91.5%
  Error      0.0%   0.5%   0.0%   0.0%   7.8%   6.1%   0.0%   0.0%   1.8%
  Rejected   5.6%   3.9%   1.1%   1.1%   16.1%  15.0%  2.2%   8.3%   6.7%

  With template adaptation
             HI     HO     VI     VO     RL     RR     PF     PB     Avg
  Correct    96.7%  95.6%  98.9%  98.9%  78.3%  92.8%  98.9%  96.1%  94.5%
  Error      0.0%   0.5%   0.0%   0.0%   4.5%   3.9%   0.0%   0.0%   1.1%
  Rejected   3.3%   3.9%   1.1%   1.1%   17.2%  3.3%   1.1%   3.9%   4.4%

EXPERIMENTS: RECOGNITION ACCURACY
• Data from the same day
  • Without template
adaptation: 1.7% rejection and 0.2% error on average
  • With template adaptation: 1.7% rejection and 0.2% error on average

  Without template adaptation
             HI     HO     VI     VO     RL     RR     PF     PB     Avg
  Correct    96.7%  99.4%  99.4%  99.4%  97.8%  94.4%  99.4%  98.3%  98.1%
  Error      0.0%   0.6%   0.6%   0.0%   0.0%   0.6%   0.0%   0.0%   0.2%
  Rejected   3.3%   0.0%   0.0%   0.6%   2.2%   5.0%   0.6%   1.7%   1.7%

  With template adaptation
             HI     HO     VI     VO     RL     RR     PF     PB     Avg
  Correct    96.7%  99.4%  99.4%  99.4%  97.8%  94.4%  99.4%  98.3%  98.1%
  Error      0.0%   0.6%   0.6%   0.0%   0.0%   0.6%   0.0%   0.0%   0.2%
  Rejected   3.3%   0.0%   0.0%   0.6%   2.2%   5.0%   0.6%   1.7%   1.7%

EXPERIMENTS: RECOGNITION ACCURACY
• Comparison of error rates with different inputs (different days)
  • Acceleration or skeleton data alone: 5.6% error on average
  • Combined input: 1.8% error on average

  Error rate   Acceleration        Skeleton position   Combination
               Same    Different   Same    Different   Same    Different
               day     days        day     days        day     days
  HI           0.6%    1.7%        3.3%    5.6%        0.0%    0.0%
  HO           0.6%    0.6%        0.6%    4.4%        0.6%    0.5%
  VI           0.6%    0.0%        0.6%    1.1%        0.6%    0.0%
  VO           0.6%    0.6%        0.0%    0.6%        0.0%    0.0%
  RL           1.7%    15.6%       0.6%    19.4%       0.0%    7.8%
  RR           4.4%    21.1%       2.8%    8.3%        0.6%    6.1%
  PF           0.0%    2.2%        0.6%    0.0%        0.0%    0.0%
  PB           0.0%    2.8%        1.7%    5.6%        0.0%    0.0%
  Avg          1.0%    5.6%        1.2%    5.6%        0.2%    1.8%

EXPERIMENTS: ONGOING ACCURACY
• Recognition accuracy: 70% of input data reached at least an 85% accuracy rate
• Recognition error: 70% of input data stayed at or below a 5% error rate

EXPERIMENTS: PRACTICAL EVALUATION
• Tested in our SAGE-based tiled-display environment

CONCLUSION
• A personalized gesture interaction system
  • Uses both acceleration data from the Wii Remote and skeleton data from the Kinect
  • A DTW-based real-time gesture recognition algorithm
  • Supports ongoing gesture recognition
• Future work
  • Adding a user-identification feature

THANK YOU!
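The quantization scheme from the Filter and Quantization slides can be sketched as below. This is a minimal illustration, not the authors' code: the function names are invented, `g` is treated as a unit value, and the exact rounding inside each linear band is an assumption.

```python
def quantize_acceleration(a, g=1.0):
    """Non-linear quantization of an acceleration sample: the central
    [-g, +g] band keeps fine resolution, the outer bands up to +/-2g
    are compressed, and values beyond +/-2g saturate at +/-31."""
    if a > 2 * g:
        return 31
    if a > g:                        # (+g, +2g] -> 21..30, linearly
        return 21 + round((a - g) / g * 9)
    if a >= -g:                      # [-g, +g] -> -20..20, linearly
        return round(a / g * 20)
    if a >= -2 * g:                  # [-2g, -g) -> -30..-21, linearly
        return -21 - round((-a - g) / g * 9)
    return -31                       # a < -2g saturates

def quantize_position(p):
    """Linear quantization of a skeleton coordinate in metres:
    [-1 m, +1 m] -> [-30, 30], with +/-31 for out-of-range values."""
    if p > 1.0:
        return 31
    if p < -1.0:
        return -31
    return round(p * 30)
```

Quantizing to small integers shrinks the alphabet the DTW comparison works over, which is where the efficiency gain mentioned on the slide comes from.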
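The Dynamic Time Warping step can be illustrated with the textbook O(n*m) dynamic program below. This is a generic sketch of DTW over 1-D quantized series, not the authors' exact implementation.

```python
def dtw_distance(input_seq, template_seq):
    """Accumulated DTW distance between two 1-D quantized series,
    using absolute difference as the local cost."""
    n, m = len(input_seq), len(template_seq)
    INF = float("inf")
    # dp[i][j] = minimal accumulated cost of aligning the first i
    # input samples with the first j template samples
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(input_seq[i - 1] - template_seq[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j],       # shrink input
                                  dp[i][j - 1],       # stretch input
                                  dp[i - 1][j - 1])   # advance both
    return dp[n][m]
```

The warping path lets the same gesture performed at different speeds still match its template: `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0 because the repeated sample is absorbed by stretching, which is exactly the property that makes DTW suitable for personalized, training-free recognition.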
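The template adaptation rule on the Template Adaptation slide (two consecutive or three accumulated rejections trigger replacement of the template by the best previously accepted input) might look like the following sketch. The class name, the threshold handling, and the simplified equal-length distance function are all assumptions for illustration; the real system compares inputs by accumulated DTW distance.

```python
def _dist(a, b):
    # Placeholder distance for equal-length series; the real system
    # would use the accumulated DTW distance here instead.
    return sum(abs(x - y) for x, y in zip(a, b))

class AdaptiveRecognizer:
    """Illustrative per-gesture template adaptation: after two
    consecutive or three accumulated rejections, the previously
    accepted input closest to the template replaces the template."""

    def __init__(self, template, threshold):
        self.template = template
        self.threshold = threshold
        self.consecutive_rejects = 0
        self.total_rejects = 0
        self.accepted = []            # (distance, input) history

    def classify(self, sample):
        dist = _dist(sample, self.template)
        if dist <= self.threshold:
            self.consecutive_rejects = 0
            self.accepted.append((dist, sample))
            return True
        self.consecutive_rejects += 1
        self.total_rejects += 1
        if (self.consecutive_rejects >= 2 or self.total_rejects >= 3) \
                and self.accepted:
            # Adapt: promote the best previously accepted input
            self.template = min(self.accepted, key=lambda t: t[0])[1]
            self.consecutive_rejects = 0
            self.total_rejects = 0
        return False
```

Adapting only after repeated rejections keeps the template stable under occasional sloppy performances while still tracking the gradual drift in how a user performs a gesture over days.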