Random Forest and Graph Cut based segmentation of human limbs
Nadezhda Zlateva, IICT-BAS
7 Sept. 2011

Outline
• Human Pose Recognition
• Case Study
• Randomized Decision Tree
• Random Forest
• Experimental results with RF
• Graph Cut
• Experimental results with GC
• Application to hand classification
• Conclusion
• References

Human Pose Recognition
Recognition via
• conventional intensity cameras
• depth cameras
Frame to frame points tracking – slow to re-initialize
Pose Recognition in parts:
• Body parts segmentation - Per pixel classification
• 3D skeletal joints estimation
[1] Shotton et al., 11

Case Study
Upper limbs segmentation for hand gesture recognition
Application:
• Sign language interpretation
• Medical environments
  - Robot medical assistants [Purdue University]
  - CT & MRI review in sterile environments [Sunnybrook Hospital, Toronto]

Binary Decision Tree: Basics
[Figure: binary decision tree. A feature vector v enters at the root; split nodes branch on < or ≥ tests; leaf nodes assign category c.]

DT over depth images:
• Training feature vector – pixel x = [x, y, z]^T of depth image I
• split function – depth comparison features fθ as a function of x, where dI(x) is the depth at pixel x
[1] Shotton, 11
[Figure: example features θ1, θ2]
Combination of weak but computationally efficient features

Randomized DT: Training
1. Random selection of a set of split candidates ϕ = (θ, τ), where τ is the set of split thresholds for each θ for tree t.
2. Definition of the set of training pixels Q = {(I, x)} over all training images for the tree t. Q is the set of pixels at the root node.
3. Find the best split candidate ϕ* at node n: the one with the largest information gain from splitting Q into Qleft & Qright.
4. Recurse for Qleft(ϕ*) & Qright(ϕ*) until a stop condition is reached:
   - Maximum depth
   - Minimum information gain
   - Minimum number of node pixels
5. Estimation of Pt(c|I,x) at each leaf node over body part labels c – use normalized histogram
Note:
• dependent on choice of parameters
• prone to over-fitting

Random Forest
Forest - ensemble of T decision trees
• Divide training (depth) images into T subsets – unique subset for each tree t
• Train each tree
[3] Breiman 01, [1] Shotton et al. 11

Random Forest: Classification
[Figure: pixel x is passed through each tree t1 … tT; each tree outputs label c]
• Classification combines the outputs of trees t1 … tT

Random Forest: Toy demo
[2] Shotton et al. 09

Random Forest: Summary
• Improves generalization to new data
• Ensemble of trees gives robustness
• Good for multi-class problems
• Resistant to over-fitting
• Fast training on large data sets
• Efficient classifier

RF: Experiments and results
• Ground truth: 500 (upper limb) labeled depth images (640x480)
• Number of trees: T=3
• Tree depth: 15
• Split candidates: |θ|=100, |τ|=20 for each θ
• Random pixels per image: 1000
• 5-fold cross validation => 100 test images, 130 training images per tree
Table 1. Average per class accuracy with RF classification
[Figure panels: Ground truth & training | Per pixel classification]

Segmentation by Graph Cut: Motivation
RF classification results:
• Fuzzy body part boundaries
• Left/Right uncertainty
Subsequent hand sign recognition – requires cleaner hand region segmentation
Graph Cut framework:
• Energy minimization framework
• Binary and multi-label image segmentation
• Combines local and contextual information

Pixel labeling problem
Given
• Pixels
• Assignment cost – U (unary potential)
• Separation cost – B (boundary potential) - pairs of neighboring pixels
Find
• Labels that minimize the total assignment and separation cost
[4] Boykov et al. 01

Graph Cut: Binary case
• Image as directed graph G(V, E)
  - t-link: Assignment cost
  - n-link: Separation cost
• Energy minimization problem = min s-t cut on G = max-flow
Theorem: In a graph G, the maximum source-to-sink flow possible is equal to the capacity of the minimum cut in G. [L. R. Foulds, Graph Theory Applications, 1992 Springer-Verlag New York Inc., 247-248]

Graph Cut: Multi-label case
• Energy = cut cost: $\|C\| = \sum_{e_{ij} \in C} w_{ij}$
• Suboptimal approximation of the minimum energy

Graph Cut: Potentials
[Equations: energy function with importance weight; unary potential (prob. by RF); boundary potential (prior constraints)]
[5] Boykov et al. 06

Graph Cut: Results
Spatial Coherence:
[Figure panels: RF classifications | GC segmentation]

RF & GC for hands
[Figure panels: Ground truth | Random Forest | Graph Cut]
• 63 frames
• 500 random pixels
• |Omax| = 45
• 58.5% per class accuracy
• 70.9% per class accuracy

Conclusion
• RF – strong classifier
• RF + GC over depth maps – good object segmentation

Future Work
• Increase available data
• Improve pixel label inference
• Estimate upper limb/hand joints
• Recognize finger configuration

References
[1] Shotton, J., A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, A. Blake. Real-time Human Pose Recognition in Parts from a Single Depth Image. CVPR, 2011.
[2] Shotton, J. Boosting and Random Forest for Visual Recognition. ICCV Tutorial, 2009. http://www.iis.ee.ic.ac.uk/~tkkim/iccv09_tutorial
[3] Breiman, L. Random forests. Mach. Learning, 45(1):5–32, 2001. http://www.stat.berkeley.edu/~breiman/RandomForests
[4] Boykov, Y., and M. P. Jolly. Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In Proc. IEEE Int. Conf. on Computer Vision, 2001.
[5] Boykov, Y., and G. Funka-Lea. Graph cuts and efficient n-d image segmentation. IJCV, 70:109–131, 2006.
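
Sketch: depth comparison feature and split selection

As an illustration of steps 1 to 3 of the Randomized DT: Training slides, the following is a minimal sketch, not the implementation used for the reported experiments. It assumes the depth comparison feature of [1], f = dI(x + u/dI(x)) − dI(x + v/dI(x)) with θ = (u, v), and Shannon entropy as the information gain measure. The function names, the `background` constant for probes that leave the image, and the toy arrays are hypothetical.

```python
import numpy as np

def depth_feature(depth, px, u, v, background=1e6):
    """Depth comparison feature in the style of [1] for pixel px = (row, col).

    The offsets u and v are divided by the depth at px, so the probe pattern
    shrinks for far pixels. Probes outside the image return a large constant
    (an assumption of this sketch).
    """
    d = depth[px[0], px[1]]

    def probe(offset):
        r = int(round(px[0] + offset[0] / d))
        c = int(round(px[1] + offset[1] / d))
        inside = 0 <= r < depth.shape[0] and 0 <= c < depth.shape[1]
        return depth[r, c] if inside else background

    return probe(u) - probe(v)

def entropy(labels, n_classes):
    """Shannon entropy (bits) of the normalized label histogram."""
    if len(labels) == 0:
        return 0.0
    p = np.bincount(labels, minlength=n_classes) / len(labels)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def best_split(features, labels, thresholds, n_classes):
    """Pick the (theta index, tau) pair with the largest information gain.

    features: array (n_pixels, n_theta) of feature values for the pixels Q
    at the current node; labels: integer class per pixel.
    """
    n = len(labels)
    h_q = entropy(labels, n_classes)
    best = (-1.0, None, None)  # (gain, theta index, tau)
    for j in range(features.shape[1]):
        for tau in thresholds:
            left = features[:, j] < tau          # Q_left; the rest is Q_right
            n_left = int(left.sum())
            gain = (h_q
                    - (n_left / n) * entropy(labels[left], n_classes)
                    - ((n - n_left) / n) * entropy(labels[~left], n_classes))
            if gain > best[0]:
                best = (gain, j, tau)
    return best

# Toy usage: feature 0 separates the two classes, feature 1 does not.
feats = np.array([[0.0, 5.0], [1.0, 5.0], [10.0, 5.0], [11.0, 5.0]])
labels = np.array([0, 0, 1, 1])
gain, theta_index, tau = best_split(feats, labels, [5.0], n_classes=2)
```

In a full trainer, `best_split` would be called recursively on the left and right pixel sets until one of the stop conditions from step 4 is met.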
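
Sketch: per-pixel classification with a forest

To make the Random Forest: Classification slide concrete, here is a small sketch that assumes the combination rule of [1], where the forest posterior is the average of the per-tree leaf distributions Pt(c|I,x). The nested-dictionary tree layout and the names `leaf_distribution` and `forest_posterior` are hypothetical choices for this example only.

```python
import numpy as np

def leaf_distribution(tree, feature_values):
    """Walk one tree and return the normalized histogram stored at the leaf.

    Split nodes are dicts with keys theta (feature index), tau, left, right;
    leaf nodes are dicts with a single key hist (label counts).
    """
    node = tree
    while "hist" not in node:
        go_left = feature_values[node["theta"]] < node["tau"]
        node = node["left"] if go_left else node["right"]
    hist = np.asarray(node["hist"], dtype=float)
    return hist / hist.sum()

def forest_posterior(forest, feature_values):
    """Average the per-tree distributions over all T trees."""
    return np.mean([leaf_distribution(t, feature_values) for t in forest], axis=0)

# Toy forest with T = 2 trees and two labels.
tree_a = {"theta": 0, "tau": 0.5,
          "left": {"hist": [3, 1]}, "right": {"hist": [0, 4]}}
tree_b = {"theta": 1, "tau": 2.0,
          "left": {"hist": [1, 1]}, "right": {"hist": [0, 4]}}

posterior = forest_posterior([tree_a, tree_b], feature_values=[0.2, 5.0])
label = int(np.argmax(posterior))  # most probable label for this pixel
```

Keeping the full posterior rather than only the winning label is what allows a later stage, such as the graph cut unary potential, to reuse the RF probabilities.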
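
Sketch: binary graph cut on a chain of pixels

The Graph Cut: Binary case slide equates energy minimization with a minimum s-t cut. The sketch below illustrates that idea on a 1-D chain of pixels, assuming an energy of the form sum of unary costs plus a constant weight for every pair of neighbors with different labels. It uses a plain Edmonds-Karp max-flow rather than the specialized solvers of [4] and [5]; the function name and the toy costs are hypothetical.

```python
from collections import deque

def min_cut_labels(unary, pairwise_weight):
    """Binary labeling of a pixel chain by min s-t cut.

    unary: list of (cost_of_label_0, cost_of_label_1) per pixel.
    Returns (labels, cut_cost); label 0 means the pixel stays on the source side.
    """
    n = len(unary)
    s, t = n, n + 1
    cap = [dict() for _ in range(n + 2)]   # residual capacities

    def add_edge(a, b, c):
        cap[a][b] = cap[a].get(b, 0.0) + c
        cap[b].setdefault(a, 0.0)          # reverse residual edge

    for p, (u0, u1) in enumerate(unary):
        add_edge(s, p, u1)                 # t-link, cut when p takes label 1
        add_edge(p, t, u0)                 # t-link, cut when p takes label 0
    for p in range(n - 1):                 # n-links between neighbors
        add_edge(p, p + 1, pairwise_weight)
        add_edge(p + 1, p, pairwise_weight)

    flow = 0.0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            a = queue.popleft()
            for b, c in cap[a].items():
                if c > 1e-12 and b not in parent:
                    parent[b] = a
                    queue.append(b)
        if t not in parent:
            break                          # no path left: flow is maximal
        bottleneck = float("inf")
        b = t
        while parent[b] is not None:
            bottleneck = min(bottleneck, cap[parent[b]][b])
            b = parent[b]
        b = t
        while parent[b] is not None:
            a = parent[b]
            cap[a][b] -= bottleneck
            cap[b][a] += bottleneck
            b = a
        flow += bottleneck

    # Nodes still reachable from the source form the source side of the cut.
    labels = [0 if p in parent else 1 for p in range(n)]
    return labels, flow

# The middle pixel weakly prefers label 1; its neighbors strongly prefer 0.
noisy_unary = [(0, 4), (0, 4), (3, 1), (0, 4), (0, 4)]
smooth_labels, smooth_cost = min_cut_labels(noisy_unary, pairwise_weight=2.0)
rough_labels, rough_cost = min_cut_labels(noisy_unary, pairwise_weight=0.5)
```

With the larger pairwise weight the isolated pixel is relabeled to agree with its neighbors, which is the spatial coherence effect the results slides refer to; with the smaller weight the unary preference wins.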