How Does Kinect Work?
Po-Hsiang Chen
Advisor: Sheng-Jyh Wang
2/13/2012

Major References
• Shotton, J., A. Fitzgibbon, et al. (2011). "Real-Time Human Pose Recognition in Parts from Single Depth Images." Microsoft Research Cambridge & Xbox Incubation
  • CVPR 2011 Best Paper
• Freedman, B., A. Shpunt, et al. (2008). Depth mapping using projected patterns, US 2010/0118123A1
  • PrimeSense Patent

Outline
• What is Kinect?
• Kinect Architecture
• From IR to depth image
  • History of Structured Light
  • PrimeSense Invented Structured Light
• From depth image to joint positions
  • Body Part Inference
  • Joint Proposals
• Experiments and Results
• Conclusion
• References

What is Kinect?
• Motion sensing input device by Microsoft
• Depth camera tech. developed by PrimeSense
  • Invented in 2005
• Software tech. developed by Rare
• First announced at E3 2009 as “Project Natal”
• Windows SDK Releases
  • http://www.microsoft.com/en-us/kinectforwindows/discover/features.aspx

Kinect IR Structured Light

Kinect Architecture
• IR Structured Light → Depth Image → Random Decision Forest → Body Parts → Mean Shift → Joint Position

3D Imaging of Surfaces

Triangulation
• Main problem
  • To recover shape from multiple views, we need CORRESPONDENCES between the images
  • The matching/correspondence problem is hard
    • Occlusions, texture, colors, etc.
• Solution: structured light
  • Idea: simplify matching
  • Strategy: use illumination to create your own correspondences

Structured Light
• Basic principle
  • Use a projector to create unambiguous correspondences
• Light projection
  • If we project a single point, matching is unique

Structured Light
• Line projection (line scan)
  • For calibrated cameras, the epipolar geometry is known
  • Project a line instead of a single point

Structured Light
• Project multiple stripes or grids
  • Which stripe matches which?
  • The correspondence problem again

Structured Light
• Answer 1: Assume surface continuity
  • Ordering constraint

Structured Light
• Answer 2: Coloured stripes (De Bruijn)
  • Difficult to use for coloured surfaces

Structured Light
• Answer 2: Coloured dots (M-array)
  • Difficult to use for coloured surfaces

Structured Light
• Answer 3: Pattern dots (M-array)
  • Difficult for industrial manufacturing

Structured Light
• Answer 4: Time-coded light patterns (time multiplexing)
  • Use a sequence of binary patterns → (log N) images
  • Each stripe has a unique binary illumination code

Structured Light
• All of the above are categorized as discrete methods
  • Salvi, J., S. Fernandez, et al. (2010). "A state of the art in structured light patterns for surface profilometry." Pattern Recognition 43(8): 2666-2680
• There are many more continuous structured light methods, such as phase shifting
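
The time-multiplexing idea from the "Answer 4" slide can be sketched in a few lines. This is a minimal illustration, assuming plain binary codes with the most significant bit projected first; the function names `binary_stripe_patterns` and `decode_stripe` are hypothetical, and a real scanner would also have to threshold camera intensities into bits, which is omitted here.

```python
def binary_stripe_patterns(num_stripes):
    """Build the sequence of on/off patterns for num_stripes stripes.

    Pattern k holds one bit of every stripe index (most significant bit
    first), so about log2(num_stripes) projected images are enough.
    """
    num_bits = max(1, (num_stripes - 1).bit_length())
    patterns = []
    for bit in range(num_bits - 1, -1, -1):
        # 1 = stripe is lit in this image, 0 = stripe is dark
        patterns.append([(stripe >> bit) & 1 for stripe in range(num_stripes)])
    return patterns


def decode_stripe(observed_bits):
    """Recover a stripe index from the bits seen at one camera pixel."""
    index = 0
    for bit in observed_bits:
        index = (index << 1) | bit
    return index


if __name__ == "__main__":
    patterns = binary_stripe_patterns(8)
    for k, pattern in enumerate(patterns):
        print("image", k, pattern)
    # A pixel that saw (on, off, on) across the three images lies on stripe 5.
    print(decode_stripe([1, 0, 1]))
```

Because every stripe receives a distinct bit sequence, the stripe-to-stripe ambiguity of a single multi-stripe image disappears, at the cost of needing several images of a static scene.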

Structured Light
• All of the above are human-designed patterns
• Random speckle
  • Structured light using randomly generated patterns
  • May obtain denser depth information by solving the correspondence problem

What can we do better?
• A projector is just the inverse of a camera
  • Needs calibration
• One projector and one camera are enough for triangulation

PrimeSense Patents
• US 2010/0118123
  • Projector-camera system
  • Already calibrated structure
  • δZ results in δX in 32

PrimeSense Patents
• US 2010/0118123
• Structured Light-1
  • Pseudo-random distribution
  • Local: random
  • Global: gray level decreases
  • Can make a rough estimate in a low-resolution image

PrimeSense Patents
• US 2010/0118123
• Structured Light-2
  • Quasi-periodic pattern
  • Five-fold symmetry
  • Results in distinct peaks in the frequency domain
  • Contains no unit cell that repeats over the spatial domain
  • Used to reduce noise and ambient light from the environment

Kinect IR Structured Light

PrimeSense Patents
• US 2010/0290698

PrimeSense Patents
• US 2010/0290698
  • Uses a special (“astigmatic”) lens with different focal lengths in the x- and y-directions
  • The orientation of the circle indicates depth

From depth to joints
• Shotton, J., A. Fitzgibbon, et al. (2011). "Real-Time Human Pose Recognition in Parts from Single Depth Images."
  Microsoft Research Cambridge & Xbox Incubation
• Treat body segmentation as a per-pixel classification task (no pairwise term or CRF is used)
• The algorithm runs in 5 ms per frame on the Xbox GPU
• Novelty: intermediate body parts representation

Body Part Inference
• Body part labeling
  • 31 body parts
  • Distinct parts for left and right allow the classifier to disambiguate the left and right sides of the body

Body Part Inference
• Depth image features
  • f_θ(I, x) = d_I(x + u / d_I(x)) - d_I(x + v / d_I(x))
  • d_I(x) is the depth at pixel x in image I
  • θ = (u, v) describes offsets u and v
  • Each feature needs to read at most 3 image pixels and perform at most 5 arithmetic operations

Randomized Decision Forests
• Fast and effective multi-class classifier
• Each split node consists of a feature f_θ and a threshold τ
• At the leaf node reached in tree t, a learned distribution P_t(c | I, x) over body part labels c is given
• Final classification: average the distributions of all T trees, P(c | I, x) = (1/T) Σ_t P_t(c | I, x)

Combining Models
• Multiple classifiers work together
• Committees
  • Averaging the predictions of a set of individual models
  • E.g. majority votes
• Boosting
  • Classifiers trained in sequence
  • E.g. AdaBoost
• Decision tree
  • Binary selection corresponding to the traversal of a tree

Decision Tree
• Three major aspects
  • A splitting criterion
  • A stop-splitting rule
  • A rule to assign each leaf to a specific class
• Decision forests
  • A committee of decision trees

Randomized Decision Forests
• How to train?

Randomized Decision Forests
• Training
  • Each tree is trained on a different set of images
  • 2000 example pixels are picked from each image
  • Algorithm

Randomized Decision Forests
• Algorithm (cont.)
  • Shannon entropy given Z on Y

Randomized Decision Forests
• Algorithm (cont.)
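
A minimal sketch of how one split node could be scored during training, assuming the depth-comparison feature of Shotton et al. (the difference of two depth probes at offsets u and v, each scaled by 1 / d_I(x)) and an information-gain criterion built on Shannon entropy. The function names, the large background constant, the rounding of offsets to whole pixels, and the toy samples are illustrative assumptions, not the production implementation.

```python
import math
from collections import Counter

BACKGROUND_DEPTH = 1e6  # assumed large constant for probes that leave the image


def probe(depth, row, col):
    """Depth at (row, col), or a large constant when outside the image."""
    if 0 <= row < len(depth) and 0 <= col < len(depth[0]):
        return depth[row][col]
    return BACKGROUND_DEPTH


def depth_feature(depth, pixel, u, v):
    """f_theta(I, x) = d_I(x + u / d_I(x)) - d_I(x + v / d_I(x)).

    Assumes pixel lies on the body, so its depth is strictly positive.
    Reads at most 3 pixels: the pixel itself and the two probes.
    """
    row, col = pixel
    d = depth[row][col]
    first = probe(depth, row + round(u[0] / d), col + round(u[1] / d))
    second = probe(depth, row + round(v[0] / d), col + round(v[1] / d))
    return first - second


def entropy(labels):
    """Shannon entropy (in bits) of a list of body part labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(samples, threshold):
    """Entropy reduction when (feature value, label) samples are split at threshold."""
    labels = [label for _, label in samples]
    left = [label for value, label in samples if value < threshold]
    right = [label for value, label in samples if value >= threshold]
    gain = entropy(labels)
    for part in (left, right):
        if part:
            gain -= len(part) / len(labels) * entropy(part)
    return gain


def best_threshold(samples, candidates):
    """Pick the candidate threshold with the largest information gain."""
    return max(candidates, key=lambda t: information_gain(samples, t))
```

In a full trainer, `depth_feature` would be evaluated for many random (u, v) pairs on the sampled pixels, and the pair and threshold with the highest gain would be stored in the node before recursing on the left and right subsets.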
• Training takes a lot of effort
  • Training 3 trees to depth 20 from 1 million images takes about a day on a 1000-core cluster
• Where does the training data come from?

Training Data
• Depth imaging
  • Simplifies the task of background subtraction
  • Most important: easy to synthesize!
• Take real images → Learn synthesis parameters → Generate lots of training data

Kinect Architecture
• IR Structured Light → Depth Image → Random Decision Forest → Body Parts → Mean Shift → Joint Position

Joint Position Proposals
• Start from the per-pixel body part inference of the previous section
• Use mean shift with a weighted Gaussian kernel

Mean Shift
• Kernel density estimator
  • Discrete points → continuous function
• Calculate the gradient at the initial point and shift
• Iterate until the shift stops

Experiments and Results
• Synthetic
• Real

Experiments and Results
• Failure

Experiments and Results
• Training parameters vs. classification accuracy

Experiments and Results
• Comparisons

Conclusion
• Depth images may contain enough information to solve human pose problems
• Depth images are color and texture invariant, which greatly simplifies the correspondence problem
• A deep combined model with sufficient training data can become a good classifier even with simple features
• Buy a Kinect for the lab
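
The mean shift step from the Joint Position Proposals and Mean Shift slides can be sketched as follows. This is a minimal illustration, assuming 3D points with caller-supplied weights and a Gaussian kernel of the form exp(-||x - x_i||^2 / b^2); the function name `mean_shift_mode`, the bandwidth, and the toy data are assumptions made for the example, not Kinect's actual code.

```python
import math


def mean_shift_mode(points, weights, start, bandwidth, tol=1e-6, max_iter=100):
    """Climb from start to a nearby mode of a weighted Gaussian density.

    points: list of coordinate tuples, weights: one weight per point,
    bandwidth: kernel width b in exp(-||x - x_i||^2 / b^2).
    """
    x = list(start)
    for _ in range(max_iter):
        numerator = [0.0] * len(x)
        denominator = 0.0
        for point, weight in zip(points, weights):
            dist2 = sum((p - xi) ** 2 for p, xi in zip(point, x))
            k = weight * math.exp(-dist2 / bandwidth ** 2)
            denominator += k
            for i, p in enumerate(point):
                numerator[i] += k * p
        if denominator == 0.0:
            break  # no point is close enough to influence the estimate
        # The kernel-weighted mean is the mean shift update for a Gaussian kernel.
        new_x = [n / denominator for n in numerator]
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_x, x)))
        x = new_x
        if shift < tol:
            break
    return x


if __name__ == "__main__":
    # Three pixels of one body part near (0, 0, 2) and one weak outlier.
    points = [(0.0, 0.0, 2.0), (0.1, 0.0, 2.0), (-0.1, 0.0, 2.0), (5.0, 5.0, 3.0)]
    weights = [1.0, 1.0, 1.0, 0.2]
    print(mean_shift_mode(points, weights, start=(0.2, 0.1, 2.0), bandwidth=0.5))
```

Starting the iteration from several pixels of the same body part and keeping the strongest mode gives one joint proposal per part, while low-weight outliers such as the last point barely move the result.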

References
• Shotton, J., A. Fitzgibbon, et al. (2011). "Real-Time Human Pose Recognition in Parts from Single Depth Images." Microsoft Research Cambridge & Xbox Incubation.
• Freedman, B., A. Shpunt, et al. (2008). Depth mapping using projected patterns, US 2010/0118123A1.
• Freedman, B., A. Shpunt, et al. (2008). Distance-Varying Illumination and Imaging Techniques for Depth Mapping, US 2010/0290698A1.
• Salvi, J., S. Fernandez, et al. (2010). "A state of the art in structured light patterns for surface profilometry." Pattern Recognition 43(8): 2666-2680.
• Albitar, I., P. Graebling, et al. (2007). "Robust structured light coding for 3D reconstruction." IEEE.
• Scharstein, D. and R. Szeliski (2003). "High-accuracy stereo depth maps using structured light." IEEE.
• Breiman, L. (2001). "Random forests." Machine Learning 45(1): 5-32.
• Amit, Y. and D. Geman (1997). "Shape quantization and recognition with randomized trees." Neural Computation 9(7): 1545-1588.
• Chen, Y. S. and B. T. Chen (2003). "Measuring of a three-dimensional surface by use of a spatial distance computation." Applied Optics 42(11): 1958-1972.
• John MacCormick, "How does the Kinect work?" users.dickinson.edu/~jmac/selected-talks/kinect.pdf
• "Structured Light", www.igp.ethz.ch/photogrammetry/.../MV-SS2011structured.pdf
• http://en.wikipedia.org/wiki/Kinect
• http://en.wikipedia.org/wiki/Structured-light_3D_scanner
• http://en.wikipedia.org/wiki/Triangulation
• http://dms.irb.hr/tutorial/tut_dtrees.php
• http://www.anandtech.com/show/4057/microsoft-kinect-the-anandtech-review/2