9.913 Pattern Recognition for Vision Class XIII, Motion and Gesture Yuri Ivanov Fall 2004 TOC • Movement – Activity – Action • View-based representation • Sequence comparison • Hidden Markov Models • Hierarchical representations Fall 2004 Pattern Recognition for Vision From Tracking to Classification How do we describe that? How do we classify that? Fall 2004 Figure by MIT OCW. Pattern Recognition for Vision Sequence Analysis • We might want to ask: – Is <it> doing something meaningful? – What exactly? – How does it do it? • How fast – e.g. conducting • How accurately – e.g. dance instruction • What style? • That leads us to a sequence analysis Fall 2004 Pattern Recognition for Vision Motion Taxonomy • Movement – Primitive motion – Self-evidential, is “what it looks like” • Activity – Requires explicit sequence model • Action Images removed due to copyright considerations. Please see: Bobick, A. "Movement, Activity, and Action: The Role of Knowledge in Perception of Motion." Phyl Trans of Royal Statistical Society 352 (1997). – Requires contextual information – Requires relational information – And many other things… Fall 2004 Pattern Recognition for Vision Basic Problem Object trajectory in the feature space Ø x11 ø Œ 1œ Œ y1 œ Œq 1 œ Œ 1œ Œº � œß Fall 2004 Ø x1M ø Œ 1 œ Œ yM œ Œq 1 œ Œ Mœ Œº � œß Ø xN2 ø Œ 2œ Œ yN œ Œ 2œ Œq N œ Œº � œß s1 = x11...x1M ? s1 <> s2 Ø x12 ø Œ 2œ Œ y1 œ Œq 2 œ Œ 1œ Œº � œß s2 = x12 ...x 2N Pattern Recognition for Vision Motion Energy Image First idea –implicit representation of time D (x, y , t ) - frame differencing Sum the differences over the last τ frames: t -1 Et ( x , y , t ) = ∪ D ( x, y , t - i ) - WHERE motion happened i =0 Photographs and figures from: Bobick, A., and J. Davis. "The Representation and Recognition of Action Using Temporal Templates." IEEE Transactions on Pattern Analysis and Machine Intelligence 23, no. 3 (2002). Courtesy of IEEE, A. Bobick, and J. Davis. Copyright 2002 IEEE. Used with Permission. Fall 2004 Pattern Recognition for Vision Motion History Image Step two: include temporal information if D( x, y , t ) = 1 �t Ht ( x, y , t ) = � � max(0, Ht ( x, y , t - 1) - 1) otherwise t - HOW motion happened Aside - you can compute a similar measure recursively: Ht ( x, y , t ) = Ht ( x, y , t - 1) + a ( D( x , y , t ) - Ht ( x, y , t - 1)) Photographs and figures from: Bobick, A., and J. Davis. "The Representation and Recognition of Action Using Temporal Templates." IEEE Transactions on Pattern Analysis and Machine Intelligence 23, no. 3 (2002). Courtesy of IEEE, A. Bobick, and J. Davis. Copyright 2002 IEEE. Used with Permission. Fall 2004 t Pattern Recognition for Vision Illustration OpenCV – Intel Open source Computer Vision Library Fall 2004 Pattern Recognition for Vision Classification Feature vector: x = [7 Hu moments for MEI + 7 Hu moments for MHI] RTS invariant shape descriptors (see the end of notes) With the usual Gaussian assumption on distribution of x: mw = E[ xw ]; Sw = E[( xw - mw ) 2 ] Then the class, ω : T w = argmin Ø( x - mw ) Sw-1 ( x - mw ) ø º ß Fall 2004 Pattern Recognition for Vision Multi-View Recognition The model is replicated for a discrete number of views: For q = {0�...90� } ( V (wi ) = min Ø x - m q Œ º w = argmin [V (wi )] q wi ) T S - 1, q wi ( x - m )øœß q wi Closest member of each class Nearest class Photographs and figures from: Bobick, A., and J. Davis. "The Representation and Recognition of Action Using Temporal Templates." IEEE Transactions on Pattern Analysis and Machine Intelligence 23, no. 3 (2002). Courtesy of IEEE, A. Bobick, and J. Davis. Copyright 2002 IEEE. Used with Permission. Fall 2004 Pattern Recognition for Vision Example Application KidsRoom • Interactive story • Autonomous system • Narration is controlled • Input from cameras and mike • Visual events: • position • motion energy • motion direction • gross body motion Fall 2004 Image removed due to copyright considerations. See: http://whitechapel.media.mit.edu/vismod/demos/kidsroom/kidsroom.html Pattern Recognition for Vision Motion Energy Reference object MEI Move right Move left Move straight Vismod Tech Report # 398 Fall 2004 Pattern Recognition for Vision Movement Classification “Flap” MEI/MHI “Spin” Last game sequence Fall 2004 Pattern Recognition for Vision Temporal Alignment Another idea – temporal alignment X Y ? <> If sequences are aligned to a common time axis, then we can treat them as vectors Fall 2004 Pattern Recognition for Vision Temporal Alignment Find re-indexing sequences ix and iy that align X and Y to a common time axis k while minimizing dissimilarity. iy ix = f x ( k ) Ty Tx 1 1 T i y = f y (k ) 1 Tx 1 k ix Ty 1 1 T k One solution – Dynamic Time Warp algorithm 1 E= Mf �{m(n) (s [i (n)] - s T n=1 Global normalization Fall 2004 x x y غiy (n ) øß )} 2 Local weighting Pattern Recognition for Vision Example: Utterance Classification “sit” Class Prototypes Master Prototype Time normalization: 1. Find the least distortion prototype in each class 2. Pick the longest one 3. Warp all data to it 4. Train classifier “roll over” Original 30-70 samples Fall 2004 Normalized 59 samples Pattern Recognition for Vision Example: Utterance Classification Alternative - pair-wise alignment: N SVM: f ( x) = �a i yi K ( x, xi ) + b i =1 1. Compute the symmetric DTW between all pairs d ij = D( si , s j ) + D( s j , si ) 2 2. Compute an RBF Kernel K ( si , s j ) = exp( -g dij ) Danger: K might not be a proper kernel matrix – need to regularize Fall 2004 Pattern Recognition for Vision Example: Utterance Classification Japanese Vowel Set (UCI Machine Learning Repository): • Speaker identification task • 9 speakers • saying the same Japanese vowel • features 12 cepstral coefficients • each utterance – 7-30 samples • 340 training examples • 240 testing examples Accuracy KNN MCC HMM SVM DynSVM Fall 2004 94.60% 94.10% 96.20% 98.20% 98.20% Pattern Recognition for Vision Hidden Markov Model (HM&M) Yet another idea: Fall 2004 Figure by MIT OCW. Pattern Recognition for Vision Hidden Markov Model (HM&M) Yet another idea: Fall 2004 Figure by MIT OCW Pattern Recognition for Vision HMMs Another view – “Graphical Model”: Figure by MIT OCW. Fall 2004 Pattern Recognition for Vision Components of an HMM l = {p , A, B} 1) π - probability of starting from a particular state ⎪ πi = p(q1=i) N �p i =1 i =1 2) A - probability of moving to a state, given the history aij=p(qt=i| qt-1=j) – Markov assumption N �a ij =1 j =1 3) B - probability of outputing a particular observation from a given state: bi (o)=p(ot |qt=i) � b ( x)dx = 1 i Fall 2004 Pattern Recognition for Vision HMM in Pictures T transitions π q1 qT N states … qt qt-1 o1 • A = p(qt| qt-1,…, q1)= p(qt| qt-1) – Markov assumption • p(qt| qt-1) is independent of t – stationary => a matrix Fall 2004 Pattern Recognition for Vision HMM Example Input Alphabet Output distributions Fall 2004 How HMM sees it Initial state Transition matrix Pattern Recognition for Vision Three Tasks of HMM 1. Given a sequence of observations find a probability of it given the model, p(O|λ) 2. Given a sequence of observations recover a sequence of states, P(q|O,λ) 3. Given a sequence, estimate parameters of the model Fall 2004 Pattern Recognition for Vision Problem I – Probability Calculation Take I – brute force: Given: O = (o1 ,..., oT ) Calculate: P(O | l ) Marginalize: P(O | l ) = � P(O, q | l ) = � P(O | q, l )P (q | l ) "q "q P(O | q, l ) = bq1 (o1 )bq2 (o2 )...bqT ( o3 ) P(q | l ) = p q1 aq1q2 aq2q3 ...aqT -1qT P(O | q, l ) P (q | l ) = p q1 bq1 (o1 ) aq1q2 bq2 (o2 )aq2 q3b q3 (o3 )...aqT -1q T bqT (oT ) N states, T transitions => |q|=NT !!!! N=5, T=100 => 2TNT= 2*100*5 100 ~ 1072 computations 65536*1072 particles in the universe Fall 2004 Pattern Recognition for Vision Try Again P (O | l ) = � P (O , q | l ) "q q 1072 = � p q (1)bq (1) (o1 ) aq (1) q (2) bq (2) (o2 ) aq (2) q (3) bq (3) (o3 )...aq (T -1) q (T )bq (T ) (oT ) q= q1 S » 2TN T = ��� �� p i bi ( o1 ) aij b j (o2 ) a jk bk (o3 )...almbm (oT ) m l j i S Fall 2004 » N 2T Pattern Recognition for Vision Problem I – Probability Calculation Take II – forward procedure: Define a “forward variable”, α a t (i) = P(o1o2 ...ot , qt = i | l ) - probability of seeing the string up to t and ending up in state i 1. Initialize a1 (i) = p i bi ( o1 ) … 2. Induce ØN ø a t +1 ( j ) = Œ� a t (i )aij œ b j (ot +1 ) º i =1 ß … 3. Terminate N P(O | l ) = �a T (i) … i =1 Fall 2004 Pattern Recognition for Vision While We Are At It… Define a “backward variable”, β b t (i) = P(ot +1ot + 2 ...oT | qt = i, l ) - probability of seeing the rest of the string after t and after visiting state i at t 1. Initialize b T (i ) = 1 2. Induce N bt (i) = � aij b j (ot +1 ) bt +1 ( j ) … … j =1 3. Terminate … Fall 2004 Pattern Recognition for Vision Task II – Optimal State Sequence “Optimality” - maximum probability of being in a state i at time t. Given: O = (o1 ,..., oT ) Find: qt = argmax P( qt | O, l ) q by Bayes rule P( qt = i | O) = P(O, qt = i ) N � P(O, q i =1 t … = i) o1 o2 o3 oT -1 oT P(O, qt ) = P( o1...ot , ot +1 ...oT , qt ) = P( o1...ot , qt ) P( ot +1...oT | o1...ot , qt ) = P( o1...ot , qt ) P( ot +1...oT | qt ) = a t bt Fall 2004 Pattern Recognition for Vision State Posterior So, P( qt = i | O) = P(O, qt = i) N � P(O, q j =1 1. 2. 3. 4. t = j) = a t (i) bt (i) N �a ( j )b ( j ) j =1 t Forward pass – compute α matrix Backward pass – compute β matrix Multiply element-by element Normalize columns = g t (i ) t » N 2T » N 2T NT » N 2T What’s the problem? Inconsistent paths – some might not even be allowed But not entirely useless! We will need it later. Fall 2004 Pattern Recognition for Vision Task II – Viterbi Algorithm “Optimality” – single maximum probability path. Given: Find: O = (o1 ,..., oT ) argmax P( q | O, l ) q Define: d t (i ) = max P(q1...qt -1 , qt = i, o1 ...ot ) q1q2 ... qt-1 Max prob. path so far By the optimality principle (Bellman, ’57): d t +1 ( j) = Ømax d t (i)aij ø b j (ot +1 ) º i ß Just need to keep track of max probability states along the way Fall 2004 Pattern Recognition for Vision Task II – Viterbi Algorithm (cont.) 1. Initialize d1 (i) = p i bi ( o1 ) y 1 (i ) = 0 1£ i £ N Housekeeping variable 2. Recurse d t ( j) = max غd t -1 (i) aij øß b j ( ot ) 2£ t £T 1£ j £ N y t ( j ) = argmax غd t -1 (i )aij øß 2£ t £T 1£ j £ N 1£i£ N 1£i £ N 3. Terminate P* = max d T (i) 1£i £ N q = argmax d T (i ) * T 1£ i £ N 4. Backtrack qt* = y t +1 (qt*+1 ) Fall 2004 t = (T - 1), ..., 1 Pattern Recognition for Vision Viterbi Illustration • Similar to the forward procedure • Typically, you’ll do it in log space for speed and underflows: - replace all parameters with their logarithms - replace all multiplications with additions Fall 2004 Pattern Recognition for Vision Task III – Parameter Estimation Baum-Welch algorithm (EM for HMMs) Given: Find: O = (o1 ,..., oT ) p , A, B First, introduce another greek letter: xt (i, j ) = P(qt = i, qt +1 = j | O) = P( qt = i, qt +1 = j , O) P(O) aij bj (ot +1 ) = a i (t ) Fall 2004 a t (i )aij b j (ot +1 ) bt ( j ) P(O) b j (t + 1) Pattern Recognition for Vision Transition Probability S … o1 o2 o3 expected # of transitions from i to j oT -1 oT This leads to: T -1 aij x (i , j ) � E [ #(i fi j )] = = E [ #(i fi .) ] � g (i ) t =1 T -1 t =1 t t The rest is easy Fall 2004 Pattern Recognition for Vision Priors and Outputs Prior distribution: p i = E [ #(i, t = 1) ] = g 1 (i) Output distribution (discrete): T � g (i) E [ #(i, vk )] to=t 1=vk bi (k ) = = T E [ #(i) ] t � g (i) t =1 Sum probabilities of being in state i while seeing symbol vk t Normalize Figure by MIT OCW. Fall 2004 Pattern Recognition for Vision Continuous Output Case Output distribution (continuous, Gaussian): bi (o) = N (mi ,S i ) Observation at time t weighted by the probability of being in the state at that time T mi = � g (i ) � o t t =1 t T � g (i ) t =1 t These should look VERY familiar T Si = T g ( i ) � ( o m )( o m ) �t t i t i t =1 T � g (i ) t =1 Fall 2004 t Pattern Recognition for Vision Semi-Continuous HMM Example Input Initial state Fall 2004 Output distributions Transition matrix How HMM sees it Pattern Recognition for Vision Gesture Recognition –Trajectory Model Modeling a tracked hand trajectory. Fall 2004 Pattern Recognition for Vision HMM Classifier Nothing unusual: l1 l2 Input Sequence l3 ArgMax l l4 Bank of HMMs Fall 2004 Pattern Recognition for Vision Applications – American Sign Language Task: Recognition of sentences of American Sign Language part of speech pronoun verb 40 word lexicon: noun • Single camera • No special markings on hands • Real-time adjective vocabulary I, you, he, we, you(pl), they want, like, lose, dontwant, dontlike, love, pack, hit, loan box, car, book, table, paper, pants, bicycle, bottle, can, wristwatch, umbrella, coat, pencil, shoes, food, magazine, fish, mouse, pill, bowl red, brown, black, gray, yellow Table from: Starner, T., and et. al. "Real-Time American Sign Language Recognition Using Desk and Wearable Computer Based Video." IEEE Transactions on Pattern Analysis and Machine Intelligence (1998). Courtesy of IEEE. Copyright 1998 IEEE. Used with Permission. Fall 2004 Pattern Recognition for Vision ASL – Features and Model “Word” model – a 4-state L-R HMM with a single skip transition: Features (from skin model): o = غ( x, y , dx, dy, area,q , lmax ,lmax / lmin )right ,(...)left øß System 1: Second person T System 2: First person Photo marked with black box due to copyright consideration. Nose could be used for initializing the skin model Courtesy of Thad Starner. Used with permission. Fall 2004 Pattern Recognition for Vision ASL – In Action Courtesy of Thad Starner. Used with permission. Fall 2004 Pattern Recognition for Vision Applications – American Sign Language 500 sentences (400 training, 100 testing) System 1: o`qs,ne,roddbg fq`ll`q experiment all features relative features all features & unrestricted grammar training set 94.10% 89.60% 81.0% (87%) (D=31, S=287, I=137, N=2390) test set 91.90% 87.20% 74.5% (83%) (D=3, S=76, I=41, N=470) grammar part-of-speech 5-word sentence training set 99.30% 98.2% (98.4%) (D = 5, S=36, I=5 N =2500) 96.4% (97.8%) (D=24, S=32, I=35, N=2500) test set 97.80% System 2: unrestricted Fall 2004 % words recognized correctly Word accuracy, D+ S+ I 1N 97.80% 96.8% (98.0%) (D=4, S=6, I=6, N=500) Pattern Recognition for Vision Beyond HMM Where can we go if HMM is not sufficient? Ideas: • Hierarchical HMM • More complex models - SCFG Explicit representation of structure Capable of generating only a regular language Compact representation More expressive, may include memory, but harder to deal with Figures from: Ivanov, Y., and A. Bobick. "Recognition of Visual Activities and Interactions." IEEE Transactions of Pattern Analysis and Machine Intelligence (2000). Courtesy of IEEE. Copyright 2000 IEEE. Used with Permission. Fall 2004 Pattern Recognition for Vision Structured Gesture Problem: 2 directions = 2 models WHY??? Fall 2004 Solution – split the model in two: • Components (trajectories) • Structure (events) Pattern Recognition for Vision Heterogeneous Representation • Many high-level activities are sequences of primitives – Pitching, cooking, dancing, stealing a car from a parking lot • Components – Signal level model – Variability in performance – Hidden state representation (HMM, etc.) • Structure – Event-level model – Uncertainty in component detections – State is NOT hidden (SRG, SCFG, etc) • Right tool for the right task! Fall 2004 Pattern Recognition for Vision Two-tier Recognition Architecture Fall 2004 Pattern Recognition for Vision Application: Conducting Music V V V V 4 4 “Dictionary” gestures Fall 2004 Pattern Recognition for Vision Application: Conducting Music Jean Sibelius, Second Symphony, Opus 43, D Major 56 = 3 2 1 6 4 5 456 = 34 12 123 Grammar: Correct Fall 2004 Individual ~70% Component ~85% Bar ~95% Pattern Recognition for Vision Component Detection Input sequence HMM Likelihoods Peaks of likelihood Single event Fall 2004 Pattern Recognition for Vision Temporal Consistency Grammar: A fi ab | abA Input a Fall 2004 b a b b a b Pattern Recognition for Vision Temporal Consistency Grammar: A fi ab | abA Terminals have temporal extent! Input a b a b b a b Figures from: Ivanov, Y., and A. Bobick. "Recognition of Visual Activities and Interactions." IEEE Transactions of Pattern Analysis and Machine Intelligence (2000). Courtesy of IEEE. Copyright 2000 IEEE. Used with Permission. Fall 2004 Pattern Recognition for Vision Temporal Consistency Grammar: A fi ab | abA Terminals have temporal extent! Input a b a b b a b Inconsistent parse Figures from: Ivanov, Y., and A. Bobick. "Recognition of Visual Activities and Interactions." IEEE Transactions of Pattern Analysis and Machine Intelligence (2000). Courtesy of IEEE. Copyright 2000 IEEE. Used with Permission. Consistent parse Fall 2004 Pattern Recognition for Vision Temporal Consistency Grammar: A fi ab | abA Terminals have temporal extent! Input a b a b b a b Inconsistent parse Consistent parse Figures from: Ivanov, Y., and A. Bobick. "Recognition of Visual Activities and Interactions." IEEE Transactions of Pattern Analysis and Machine Intelligence (2000). Courtesy of IEEE. Copyright 2000 IEEE. Used with Permission. Fall 2004 Pattern Recognition for Vision Parsing The idea is that the top level parse will filter out mistakes in low level detections a b c d e Parser Detection Fall 2004 adecb Pattern Recognition for Vision Stochastic Context-Free Grammar Example Grammar: TRACK -> CAR-TRACK | PERSON-TRACK [0.50] [0.50] CAR-TRACK -> CAR-THROUGH | CAR-PICKUP | CAR-OUT | CAR-DROP [0.25] [0.25] [0.25] [0.25] car-exit | SKIP car-exit [0.70] [0.30] CAR-THROUGH . . . Non-terminals ­ semantically significant groups of events Terminals - individual events . . . CAR-EXIT -> SKIP-rule - noise symbol Fall 2004 Rule probabilities Pattern Recognition for Vision Event Parsing X fi Zc - production rules (states) Z fi ab X Z X - target non-terminal (label) a b c a x y b p q r c SKIP SKIP Z - intermediate non-terminal - input stream (tracking events) - noise rules For the production X, events a, b and c should be consistent Fall 2004 Pattern Recognition for Vision Application: Musical Conducting Courtesy of Teresa Marrin-Nakra. Used with permission. Individual Component Bar Fall 2004 Correct ~70% ~85% ~95% Pattern Recognition for Vision Application: Surveillance System • Outdoor environment - occlusions and lighting changes • Static cameras • Real-time performance • Labeling activities and person-vehicle interactions in a parking lot • Handling simultaneous events Fall 2004 Pattern Recognition for Vision Monitoring System Photos and figures from: Stauffer, Chris, and Eric Grimson, "Learning Patterns of Activity Using Real-Time Tracking." IEEE Transactions on Pattern Recognition and Machine Intelligence (TPAMI) 22, no. 8 (2000): 747-757. Courtesy of IEEE, Chris Stauffer, and Eric Grimson. Copyright 2000 IEEE. Used with Permission. • Tracker (Stauffer, Grimson) – assigns identity to the moving objects – collects the trajectory data into partial tracks • Event Generator – maps partial tracks onto a set of events • Parser – labels sequences of events according to a grammar – enforces spatial and temporal constraints Fall 2004 Pattern Recognition for Vision Tracker • Adaptive to slow lighting changes: – Each pixel is modeled by a mixture K P ( X t ) = � wi ,t *h ( X t , m i ,t , S i ,t ) i =1 • Foreground regions are found by connected components algorithm • Object dynamics is modeled in 2D by a set of Kalman filters • Details - (Stauffer, Grimson CVPR 99) Fall 2004 Pattern Recognition for Vision Tracker Camera view Trajectories over time Connected components An object Photos and figures from: Stauffer, Chris, and Eric Grimson, "Learning Patterns of Activity Using Real-Time Tracking." IEEE Transactions on PatternRecognition and Machine Intelligence (TPAMI) 22, no. 8 (2000): 747-757. Courtesy of IEEE, Chris Stauffer, and Eric Grimson. Copyright 2000 IEEE. Used with Permission. Fall 2004 Pattern Recognition for Vision Event Generator Photos and figures from: Stauffer, Chris, and Eric Grimson, "Learning Patterns of Activity Using Real-Time Tracking." IEEE Transactions on Pattern Recognition and Machine Intelligence (TPAMI) 22, no. 8 (2000): 747-757. Courtesy of IEEE, Chris Stauffer, and Eric Grimson. Copyright 2000 IEEE. Used with permission. Map tracks onto events: car-enter, person-enter, car-found, person-found, car-lost, person-lost, stopped • Events along with class likelihoods are posted at the endpoints of each track (car-appear [0.5], car-disappear [1.0]) • Action label is assigned to each event in accordance with the environment map (car-enter [0.5], car-exit [1.0]) • Each event is complemented if the label probability is < 1 (car-enter [0.5], person-enter [0.5], car-exit [1.0]) Fall 2004 Pattern Recognition for Vision Parking Lot Grammar (Partial) Photos and figures from: Stauffer, Chris, and Eric Grimson, "Learning Patterns of Activity Using Real-Time Tracking." IEEE Transactions on Pattern Recognition and Machine Intelligence (TPAMI) 22, no. 8 (2000): 747-757. Courtesy of IEEE, Chris Stauffer, and Eric Grimson. Copyright 2000 IEEE. Used with Permission. Fall 2004 Pattern Recognition for Vision Consistency • Temporal – Events should happen in particular order – Temporally close events are more likely to be related – Tracks overlapping in time are definitely not related to the same object • Spatial – Spatially close events are more likely to be related • Other – Objects don’t change identity within a track Fall 2004 Pattern Recognition for Vision Spatio-Temporal Consistency r = ( x, y ), reject X dr = ( dx , dy ) Z X t Predict new position: t Penalize: Z X penalize Z R|0, if ( t - t ) < 0 f ( r , r ) = S F (r - r ) (r - r ) I JK |TexpGH q 2 1 T p t Fall 2004 rp = r1 + dr1( t2 - t1 ) accept 2 2 p 2 p Pattern Recognition for Vision Input Data Fall 2004 Pattern Recognition for Vision Event Generator Interleaved events in the input stream Figures from Ivanov, Yuri, Chris Stauffer, Aaron Bobick, W. E. L. Grimson. "Video Surveillance of Interactions." IEEE Workshop on Visual Surveillance (ICCV 2001) (1999). Courtesy of IEEE, Yuri Ivanov, Chris Stauffer, Aaron Bobick, and W. E. L. Grimson. Copyright 1999 IEEE. Used with Permission. Fall 2004 Pattern Recognition for Vision Parse 1: Person-Pass-Through Action label Component labels Object track Temporal extent Figures from Ivanov, Yuri, Chris Stauffer, Aaron Bobick, W. E. L. Grimson. "Video Surveillance of Interactions." IEEE Workshop on Visual Surveillance (ICCV 2001) (1999). Courtesy of IEEE, Yuri Ivanov, Chris Stauffer, Aaron Bobick, and W. E. L. Grimson. Copyright 1999 IEEE. Used with Permission. Fall 2004 Pattern Recognition for Vision Parse 2: Drive-In Action label Component labels Object tracks Temporal extent Figures from Ivanov, Yuri, Chris Stauffer, Aaron Bobick, W. E. L. Grimson. "Video Surveillance of Interactions." IEEE Workshop on Visual Surveillance (ICCV 2001) (1999). Courtesy of IEEE, Yuri Ivanov, Chris Stauffer, Aaron Bobick, and W. E. L. Grimson. Copyright 1999 IEEE. Used with Permission. Fall 2004 Pattern Recognition for Vision Parse 3: Drop-off Action label Component labels Object track Temporal extent Figures from Ivanov, Yuri, Chris Stauffer, Aaron Bobick, W. E. L. Grimson. "Video Surveillance of Interactions." IEEE Workshop on Visual Surveillance (ICCV 2001) (1999). Courtesy of IEEE, Yuri Ivanov, Chris Stauffer, Aaron Bobick, and W. E. L. Grimson. Copyright 1999 IEEE. Used with Permission. Fall 2004 Pattern Recognition for Vision Parse 4: Car-Pass-Through Action label Component labels Object track Temporal extent Figures from Ivanov, Yuri, Chris Stauffer, Aaron Bobick, W. E. L. Grimson. "Video Surveillance of Interactions." IEEE Workshop on Visual Surveillance (ICCV 2001) (1999). Courtesy of IEEE, Yuri Ivanov, Chris Stauffer, Aaron Bobick, and W. E. L. Grimson. Copyright 1999 IEEE. Used with Permission. Fall 2004 Pattern Recognition for Vision Summary Real-time system First of a kind end-to end system Extended robust parsing algorithm Events are staged in real environment with other cars and people • ~10-15 events per minute • Staged events - 100% detected • Accidental events - ~80% detected • • • • Fall 2004 Pattern Recognition for Vision Automatic Surveillance System • • • • Outdoor environment - occlusions and lighting changes Static cameras Real-time performance Labeling activities and person-vehicle interactions in a parking lot • Handling simultaneous events Fall 2004 Pattern Recognition for Vision Appendix: Hu Moments Full appendix from: Bobick, A., and J. Davis. "The Representation and Recognition of Action Using Temporal Templates." IEEE Transactions on Pattern Analysis and Machine Intelligence 23, no. 3 (2002). Courtesy of IEEE. Copyright 2002 IEEE. Used with Permission. Fall 2004 Pattern Recognition for Vision