SOMM: Self Organizing Markov Map for Gesture Recognition
G. Caridakis et al., Pattern Recognition Letters, Vol. 31, pp. 52-59, 2010.
Pattern Recognition 2010 Spring
Seung-Hyun Lee
SOFT COMPUTING @ YONSEI UNIV. KOREA

Contents
• Introduction
• Related Work
  – Hidden Markov Models
  – Other Method
• Proposed Method
• Experiments
• Conclusion

Introduction
• Gesture
  – A motion of the body that conveys information
• In this paper
  – Focus on hand gestures

Introduction
• Taxonomy of gesture (McNeill, 1992)
  – Gesticulation
  – Speech-linked
  – Pantomime
  – Emblems
  – Sign Languages
• Other (Kendon, 1992) (Quek, 1994)

Introduction
• Taxonomy by functionality

  Gestures              Definition
  Symbolic gestures     Gestures that, within each culture, have come to
                        have a single meaning.
  Deictic gestures      The type of gesture most commonly seen in HCI;
                        gestures that point to entities or directions.
  Iconic gestures       Gestures used to convey information about the size,
                        spatial relations, actions, shape or orientation of
                        the object of discourse.
  Pantomimic gestures   Gestures typically used to mimic an action, object
                        or concept.

Related Work: Hidden Markov Model
• Cogan (2006)
  – Discrete HMM that fuses hand shape and position
• Hossain (2005)
  – Implicit/Explicit Temporal Information Encoded HMM
  – Discriminates between attention and non-attention gestures
• Mantyla (2000)
  – On mobile devices
  – Uses SOM and HMM together
• Starner (1998)
  – HMM-based American Sign Language (ASL) recognition
  – Sentence-level recognition is possible

Related Work: Other Method
• Black and Jepson (1998)
  – CONDitional dENSity propagATION (CONDENSATION) algorithm
• Wong and Cipolla (2006)
  – Sparse Bayesian classifier
• Hong et al. (2000)
  – Finite State Machines (FSM)
• Su (2000)
  – Fuzzy logic and rule-based approaches, and hyper-rectangular composite neural networks (HRCNNs)
• Juang and Ku (2005)
  – Fuzzified Takagi-Sugeno-Kang (TSK) type recurrent network
• Yang et al. (2002)
  – Time Delay Neural Network
• Huang and Huang (1998)
  – 3D Hopfield Neural Network

Proposed Method: Overview
• Modules
  – Image processing: detection and tracking of hands
  – SOM: quantization of hand location and direction
  – HMM: transition probability matrix

Proposed Method: Feature Extraction
• Video-based method
  – Creation of moving skin masks (skin color area)
  – Tracking the centroid of the skin masks
  – Prior knowledge is required
    • It should indicate the different body parts (left hand, right hand, and head)
• Environment
  – PC platform
  – OpenCV

Proposed Method
• Dataset
• Gesture instances

Proposed Method: Position Model
• cf.) SOM
  (1) continuous input space
  (2) discrete output space in the form of a lattice
  (3) time-varying neighborhood function defined around the winning neuron
  (4) decreasing learning rate parameter

Proposed Method: Position Model
• SOM-based representation of hand position

Proposed Method: Direction Model
• Additional information: moving direction

Proposed Method: Generalized Median
• Based on Levenshtein distance (edit distance)
  – Measures the amount of difference between two sequences
• Generalized median of data set Mj
• Mean Levenshtein distance between members
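
The Levenshtein distance and the median named on the Generalized Median slide can be sketched as follows. This is a minimal sketch, assuming each gesture instance has already been encoded as a sequence of SOM unit indices. It computes the set median (the member of the data set with the smallest summed distance to all members), a common stand-in for the generalized median, and it is not necessarily the exact formulation used in the paper. The function names `levenshtein`, `set_median` and `mean_pairwise_distance` are hypothetical.

```python
from typing import Hashable, List, Sequence


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Edit distance: minimum number of insertions, deletions and
    substitutions needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute x by y
        prev = curr
    return prev[-1]


def set_median(sequences: List[Sequence[Hashable]]) -> Sequence[Hashable]:
    """Member of the set with the smallest summed distance to all members.
    Assumption: used here as an approximation of the generalized median."""
    return min(sequences,
               key=lambda s: sum(levenshtein(s, t) for t in sequences))


def mean_pairwise_distance(sequences: List[Sequence[Hashable]]) -> float:
    """Mean Levenshtein distance over all unordered pairs of members."""
    n = len(sequences)
    if n < 2:
        return 0.0
    total = sum(levenshtein(sequences[i], sequences[j])
                for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)


# Hypothetical data set Mj: four instances of one gesture as SOM unit indices.
M_j = [(1, 2, 3, 4), (1, 2, 4), (1, 3, 4), (1, 2, 3, 4, 5)]
median = set_median(M_j)                 # (1, 2, 3, 4)
spread = mean_pairwise_distance(M_j)     # 8 / 6
```

In this toy data set the instance `(1, 2, 3, 4)` is one edit away from every other member, so it is selected as the median; the mean pairwise distance gives a rough measure of how much the instances of one gesture vary.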

Proposed Method: Gesture Decoding
• Position
  – Probability
  – Calculation of Ssom
    • First state: initial probability
    • From the second state on: transition probability
  – Unit u

Proposed Method: Gesture Decoding
• Direction
  – Probability
  – Calculation of Sof
  – Unit u

Proposed Method: Gesture Decoding
• Similarity measurement
  – Problem
    • Shorter gesture instances tend to gain an advantage because they have fewer transitions and therefore fewer probability multiplications
  – Measurement

Proposed Method: Error Propagation
• Error definition for function f
• SOM-based approach
  – If data containing a small error is mapped to the same SOM node: no problem
  – Otherwise: because of the neighboring relation of unit u, the error is consequently not propagated to the next steps of the recognition process

Experiment: Data Set
• 30 gestures, 10 repetitions each

Experiment: Result
• SOM clustering
  – Blue: close to input vector
  – Red: not close
• Recognition accuracy
  – Test with training data: 100%
  – 10-fold cross validation: 93%; 0.843 ms for decoding a gesture
  – Only HMM-based classifier: 86.36%

Conclusion
• Key features
  – SOM and HMM based automatic recognition architecture
  – ROI
    • Relative hand position
    • Moving direction
    • Similarity of pattern
• Application
  – Sign language
  – Gaming environment

Thank you
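
As a supplement to the Gesture Decoding slides, the position score (initial probability for the first state, transition probabilities from the second state on) and the length bias it suffers from can be sketched as below. This is a minimal sketch under stated assumptions: the slides do not preserve the actual similarity measure, so the per-state normalization used here (the mean log-probability, i.e. the log of the geometric mean) is an illustrative choice rather than the paper's formula. The names `normalized_log_score`, `initial`, `transition` and `floor` are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def normalized_log_score(units: Sequence[int],
                         initial: Dict[int, float],
                         transition: Dict[Tuple[int, int], float],
                         floor: float = 1e-6) -> float:
    """Mean log-probability of a sequence of SOM units.

    The first unit contributes its initial probability, every later unit
    the transition probability from its predecessor. Dividing by the
    sequence length is an assumed normalization that removes the advantage
    of short sequences; unseen events are floored to avoid log(0).
    """
    if not units:
        raise ValueError("empty unit sequence")
    log_p = math.log(max(initial.get(units[0], 0.0), floor))
    for prev, curr in zip(units, units[1:]):
        log_p += math.log(max(transition.get((prev, curr), 0.0), floor))
    return log_p / len(units)


# Toy model (hypothetical numbers): two units, every modeled event has p = 0.5.
initial = {0: 0.5, 1: 0.5}
transition = {(0, 0): 0.5, (0, 1): 0.5}

short = [0, 1]          # raw product: 0.5 ** 2 = 0.25
long_ = [0, 0, 0, 1]    # raw product: 0.5 ** 4 = 0.0625

score_short = normalized_log_score(short, initial, transition)
score_long = normalized_log_score(long_, initial, transition)
```

With the raw product the shorter instance scores four times higher even though both sequences are equally typical under the model; after normalization both receive the same score, `log(0.5)`.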