CSCI498B/598B Human-Centered Robotics
October 5, 2015

Skeleton-based representation
• Solution 2: Feature extraction - compute additional features (figure labels: 𝒅, 𝜽)
• Representations:
  • Histogram of Joint Position Differences
  • Joint Movement Volume Features
  • Covariance of 3D Joints (with Temporal Pyramid)
  • Histogram of Oriented Displacement (temporal pyramid also applies)

Skeleton-based representation
• Advantages:
  • Fast!
  • Invariant to human body rotations
  • Able to obtain satisfactory performance
• Disadvantages:
  • Cannot model human-object interaction
  • Cannot incorporate background/context information
  • Performance depends heavily on skeleton data quality

General Pipeline
1. Data acquisition
2. Representation construction
3. Pattern recognition (classification or clustering)
4. Decision making
5. Taking an action

Bag-of-Words representation
• Question: Can we use the local features (e.g., “regions of corners”) to represent an observation or a human activity?
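
Before turning to local features, one of the skeleton-based features named above, the covariance of 3D joints with a temporal pyramid, can be made concrete. The sketch below is only an illustration under stated assumptions: the slides give no formula, so the input layout `(T, J, 3)`, the function names `cov3dj` and `temporal_pyramid`, the use of non-overlapping windows, and the choice of two pyramid levels are all hypothetical.

```python
import numpy as np

def cov3dj(joints):
    """Covariance descriptor for one window of skeleton data.

    joints: array of shape (T, J, 3), the 3D positions of J joints over
    T frames (assumed layout). Returns the upper triangle of the
    (3J x 3J) sample covariance matrix, flattened to a vector.
    """
    T = joints.shape[0]
    X = joints.reshape(T, -1)        # one 3J-dimensional sample per frame
    C = np.cov(X, rowvar=False)      # covariance of coordinates over time
    iu = np.triu_indices(C.shape[0]) # the matrix is symmetric, keep half
    return C[iu]

def temporal_pyramid(joints, levels=2):
    """Concatenate covariance descriptors over a pyramid of time windows.

    Level 0 uses the whole sequence, level 1 splits it into two halves,
    level 2 into four quarters, and so on (non-overlapping windows are
    an assumption of this sketch).
    """
    T = joints.shape[0]
    parts = []
    for level in range(levels):
        n_windows = 2 ** level
        bounds = np.linspace(0, T, n_windows + 1).astype(int)
        for start, end in zip(bounds[:-1], bounds[1:]):
            parts.append(cov3dj(joints[start:end]))
    return np.concatenate(parts)

# Usage with synthetic data: 40 frames of a 20-joint skeleton.
rng = np.random.default_rng(0)
sequence = rng.normal(size=(40, 20, 3))
descriptor = temporal_pyramid(sequence, levels=2)
```

The whole-sequence covariance discards the order of the frames, and the finer pyramid levels reintroduce some of it by describing each part of the sequence separately. Each window needs at least two frames for the covariance to be defined.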
Bag-of-Words representation
• Bag-of-words representation has its origin in text processing
• Orderless document representation: frequencies of words from a dictionary
• Texture is characterized by the repetition of basic elements

Bag-of-Words representation
• Procedures:
  • Feature detection: e.g., corners
  • Feature description: e.g., incorporate the information of local regions around the corners
  • Dictionary learning: e.g., through clustering, then assigning each feature descriptor an index
  • Bag-of-words representation

Bag-of-Words representation
• Advantages:
  • Robust to partial occlusion, illumination changes, etc.
  • Independent of global skeleton data
  • Capable of modeling more complicated scenarios, such as human-object interaction
  • Capable of capturing context information contained in the scene
• How about modeling human activities?

Bag-of-Words representation
• Limitations:
  • Incapable of modeling time
  • Incapable of dealing with 3D perceptual data
  • Ignores spatial information, since it assumes there is no order among the visual features/words
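
The last two steps of the procedure above (dictionary learning and building the bag-of-words histogram) can be sketched in a few lines. This is a minimal illustration, not the course's implementation: it assumes feature detection and description have already produced one descriptor vector per local feature, it uses plain k-means as the clustering method (the slides say only "clustering"), and the function names `learn_dictionary`, `assign_words`, and `bag_of_words` are hypothetical.

```python
import numpy as np

def assign_words(descriptors, centers):
    """Index of the nearest dictionary word for each descriptor."""
    # Squared Euclidean distance between every descriptor and every word.
    diff = descriptors[:, None, :] - centers[None, :, :]
    return (diff ** 2).sum(axis=2).argmin(axis=1)

def learn_dictionary(descriptors, k, n_iter=20, seed=0):
    """Learn k visual words with a basic k-means (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    # Initialize the words with k distinct training descriptors.
    start = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[start].astype(float)
    for _ in range(n_iter):
        labels = assign_words(descriptors, centers)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:     # leave an empty cluster where it is
                centers[j] = members.mean(axis=0)
    return centers

def bag_of_words(descriptors, centers):
    """Normalized histogram of word frequencies for one observation."""
    labels = assign_words(descriptors, centers)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / max(hist.sum(), 1.0)

# Usage with synthetic 8-dimensional descriptors.
rng = np.random.default_rng(0)
training_descriptors = rng.normal(size=(200, 8))
dictionary = learn_dictionary(training_descriptors, k=5)
observation = rng.normal(size=(30, 8))
histogram = bag_of_words(observation, dictionary)
```

Because the histogram only counts how often each word occurs, permuting the descriptors of an observation leaves it unchanged. That is the orderless property listed under the limitations: neither the time nor the position at which a feature was detected is recorded.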