Pose Estimation of Gymnasts
Brian Reily

I. Problem, Motivation, and Data Set
II. Simple Pose Estimation
III. Advanced Pose Estimation
IV. Future Work

I. The Problem
➔ The pommel horse is a gymnastics event in which most moves involve spinning around a central axis.

I. Motivation
➔ Gymnasts score better when their spins are consistently timed.
➔ Knowing whether a gymnast tends to speed up or slow down can provide direction for training and conditioning.
➔ The United States Olympic Committee (USOC) wants to use computer vision techniques to determine this.

I. Data Set
➔ The data set was collected with a Kinect 2, which provides depth data using infrared time of flight.
➔ The Kinect 2 includes ‘skeleton’ generation, but it fails on the gymnasts’ body poses.

II. Simple Pose Estimation
➔ To provide data about the gymnast’s spin rate, we only need to know where his feet are.
➔ Basic idea: track the gymnast’s feet and identify the points where they reach as far left or as far right as possible in the image frame.

II. Image Processing
1) Depth image of a gymnast.
2) Using the depth of the pommel horse as a base, threshold each pixel by its depth.
3) Build a mean background over frames without a gymnast, threshold it by depth in the same way, and subtract it.
4) Use a built-in Matlab or OpenCV function to find the contour of the gymnast’s body.

II. How to Identify Feet?
The idea is to find the major axis of the body, bent at the waist.
1) Find the longest vector from the centroid (green).
2) Find the shortest vector from the centroid (red).
3) Find the vector that best matches this angle (blue).

II. Finding Extrema of a Spin
➔ By tracking the last 5 positions of the feet, we look for a position that is farther left or farther right than the surrounding positions.

II. Interpolating Frame Times
➔ The Kinect 2 peaks at 30 fps and typically delivers less.
➔ Olympic gymnasts are very fast and very consistent, so whole-frame timing (about 33 ms per frame at 30 fps) is too coarse.
➔ By fitting the extremum point and its 2 nearest points with a cubic spline, we can interpolate times between frames.

II. Demonstration
https://www.youtube.com/v/IFTE_Lna9So

II. Data for Olympic Coaches

II. Evaluation
➔ Ground truth data comes from hand-annotated video (marking foot positions), interpolated for time between frames using the same cubic spline method.
➔ Typical RMSE is ~10 ms for each individual routine.

III. Advanced Pose Estimation
➔ Knowing the location of the hands, the individual feet, or ideally an entire skeleton can provide information about other aspects of the pommel horse.
➔ The work could also be extended to other events and sports.
➔ For now, we are starting with ‘normal’ poses instead of gymnasts.

III. Example ‘Normal’ Pose

III. Detecting Body Parts
➔ Use geodesic distance instead of Euclidean distance (geodesic distance is measured along a surface).
(Figure: Geodesic Distance vs. Euclidean Distance)

III. Detecting Body Parts
1) Find the geodesic distance from the centroid to the entire body with Dijkstra’s shortest path algorithm.
2) Find the top N maximum points; these are likely locations for the hands, feet, and head.
3) The hands and head can then be picked out with a heuristic.

III. Recognizing Body Parts
➔ Plan: run a classifier on these likely points, which should give significant speed improvements over a sliding window approach.
➔ For RGB images, the state of the art is HOG features to describe body parts.
➔ For depth images, however, there is no consensus feature.

III. Fitting a Skeleton
➔ If we know the head and hands, we can fit a basic skeleton as a snake.
➔ Using a gradient vector field, the snake can be pushed away from the body edges.

IV. Future Work
➔ The foot tracking algorithm will be used by an undergraduate field session team this summer to build a full application for the USOC.
➔ Begin work on a feature descriptor for depth images.
➔ Improve the snake-skeleton fitting, and work on incorporating kinematic data into skeleton fitting.

Questions?
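Sketch for ‘II. Image Processing’: a minimal plain-Python version of the masking steps, so it runs without Matlab or OpenCV. It assumes depth frames arrive as 2D lists in millimetres with 0 meaning "no reading", and that `horse_depth` and `band` (hypothetical names) are chosen by hand. Step 3 is read literally: any pixel that also survives thresholding in the mean background is removed.

```python
from typing import List

Depth = List[List[float]]  # depth image in mm; 0 = no reading (assumption)
Mask = List[List[bool]]


def mean_background(frames: List[Depth]) -> Depth:
    """Per-pixel mean depth over frames recorded without a gymnast."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]


def depth_band(depth: Depth, horse_depth: float, band: float) -> Mask:
    """Keep pixels with a valid reading within +/- band of the pommel horse depth."""
    return [[d > 0 and abs(d - horse_depth) <= band for d in row] for row in depth]


def gymnast_mask(frame: Depth, background: Depth,
                 horse_depth: float, band: float) -> Mask:
    """Threshold frame and background the same way, then drop every pixel
    that is also 'on' in the thresholded background."""
    fg = depth_band(frame, horse_depth, band)
    bg = depth_band(background, horse_depth, band)
    return [[f and not b for f, b in zip(frow, brow)]
            for frow, brow in zip(fg, bg)]
```

The resulting mask could then be converted to an 8-bit image and passed to a contour routine such as OpenCV's `cv2.findContours` for step 4. Note that this literal subtraction also removes gymnast pixels that overlap the horse in the image; a per-pixel depth-difference test against the background is one way to keep them.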
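Sketch for ‘II. How to Identify Feet?’. The slide does not say which angle step 3 matches, so this code makes a hypothetical choice: the blue vector is the contour vector whose direction is closest to the green vector mirrored about the red one, i.e. the second arm of a body bent at the waist. The function name and inputs (contour points and a centroid in image coordinates) are illustrative.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _angle_between(a: float, b: float) -> float:
    """Absolute difference between two angles, wrapped to [0, pi]."""
    return abs(math.atan2(math.sin(a - b), math.cos(a - b)))


def bent_major_axis(contour: List[Point], centroid: Point) -> Tuple[Point, Point, Point]:
    """Return the contour points at the tips of the green, red and blue vectors.

    Assumption: 'best matches this angle' = closest in direction to the green
    vector reflected across the red vector's direction.
    """
    cx, cy = centroid
    vecs = [(x - cx, y - cy) for x, y in contour]
    lengths = [math.hypot(vx, vy) for vx, vy in vecs]
    g = max(range(len(vecs)), key=lengths.__getitem__)  # longest vector (green)
    s = min(range(len(vecs)), key=lengths.__getitem__)  # shortest vector (red)
    gx, gy = vecs[g]
    rx, ry = vecs[s][0] / lengths[s], vecs[s][1] / lengths[s]  # unit red direction
    # Reflect green across the line through the red direction.
    dot = gx * rx + gy * ry
    tx, ty = 2 * dot * rx - gx, 2 * dot * ry - gy
    target = math.atan2(ty, tx)
    b = min(range(len(vecs)),
            key=lambda i: _angle_between(math.atan2(vecs[i][1], vecs[i][0]), target))
    return contour[g], contour[s], contour[b]
```

The function raises `ZeroDivisionError` if the centroid lies exactly on the contour, since the red vector then has zero length.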
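Sketch for ‘II. Finding Extrema of a Spin’: one way to implement the 5-position window as a streaming check on the horizontal foot coordinate. It assumes larger x means farther right in the image and that an extremum must be strictly beyond all four neighbouring positions; the names are illustrative, not from the original work.

```python
from collections import deque
from typing import Iterable, List, Tuple


def find_spin_extrema(xs: Iterable[float], window: int = 5) -> List[Tuple[int, str]]:
    """Return (frame_index, 'left' | 'right') for each frame whose foot x-position
    is strictly beyond every other position in the window centered on it."""
    half = window // 2
    recent = deque(maxlen=window)  # last `window` foot x-positions, oldest first
    extrema = []
    for i, x in enumerate(xs):
        recent.append(x)
        if len(recent) < window:
            continue
        vals = list(recent)
        mid = vals[half]
        others = vals[:half] + vals[half + 1:]
        center = i - half  # frame index of the middle sample
        if all(mid > v for v in others):
            extrema.append((center, "right"))
        elif all(mid < v for v in others):
            extrema.append((center, "left"))
    return extrema
```

For example, `find_spin_extrema([0, 2, 4, 5, 3, 1, -1, -2, 0, 2])` returns `[(3, 'right'), (7, 'left')]`. Each extremum is reported two frames after it occurs, because the window needs the following samples.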
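Sketch for ‘II. Interpolating Frame Times’. The slides only say that a cubic spline is fitted through the extremum and its 2 nearest points. This version assumes a natural cubic spline (zero second derivative at both ends) through the three (time, foot x) samples and takes the time where the spline's slope is zero as the sub-frame extremum time, located by bisection. The boundary condition and the zero-slope criterion are assumptions.

```python
from typing import Sequence, Tuple


def _natural_spline_m(ts: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Second derivatives (M0, M1, M2) of the natural cubic spline through 3 points."""
    h0, h1 = ts[1] - ts[0], ts[2] - ts[1]
    m1 = 3.0 * ((ys[2] - ys[1]) / h1 - (ys[1] - ys[0]) / h0) / (h0 + h1)
    return 0.0, m1, 0.0


def _spline_slope(t, ts, ys, ms) -> float:
    """First derivative of the spline at time t, for ts[0] <= t <= ts[2]."""
    i = 0 if t <= ts[1] else 1  # which cubic piece t falls in
    h = ts[i + 1] - ts[i]
    u = t - ts[i]
    c = (ys[i + 1] - ys[i]) / h - (ms[i + 1] - ms[i]) * h / 6.0
    return -ms[i] * (h - u) ** 2 / (2 * h) + ms[i + 1] * u ** 2 / (2 * h) + c


def extremum_time(ts: Sequence[float], ys: Sequence[float], iters: int = 60) -> float:
    """Time where the spline through (ts[k], ys[k]) has zero slope.

    ts: times of the frame before, at, and after the detected extremum.
    ys: horizontal foot positions at those times.
    """
    ms = _natural_spline_m(ts, ys)

    def slope(t: float) -> float:
        return _spline_slope(t, ts, ys, ms)

    s0, s1, s2 = slope(ts[0]), slope(ts[1]), slope(ts[2])
    if s1 == 0:
        return float(ts[1])
    if s0 * s1 < 0:
        lo, hi = ts[0], ts[1]
    elif s1 * s2 < 0:
        lo, hi = ts[1], ts[2]
    else:
        return float(ts[1])  # no turning point found; fall back to the frame time
    for _ in range(iters):  # bisection on the spline's slope
        mid = (lo + hi) / 2
        if slope(lo) * slope(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2
```

Frame times can be passed in milliseconds, so the result is directly comparable with the ~10 ms RMSE on the Evaluation slide.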
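Sketch for ‘II. Evaluation’: the reported metric is a root-mean-square error between detected and hand-annotated extremum times. This assumes the two lists are already paired one-to-one; the slides do not describe how detections are matched to annotations.

```python
import math
from typing import Sequence


def rmse_ms(predicted: Sequence[float], ground_truth: Sequence[float]) -> float:
    """Root-mean-square error between matched extremum times (milliseconds)."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("need two non-empty, equally long, matched sequences")
    sq = [(p - g) ** 2 for p, g in zip(predicted, ground_truth)]
    return math.sqrt(sum(sq) / len(sq))
```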
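Sketch for steps 1 and 2 of ‘III. Detecting Body Parts’. Body pixels form an 8-connected graph and Dijkstra's algorithm runs from the centroid; each step costs its 3D length, where the hypothetical `depth_scale` converts depth units into pixel units (a real system would derive it from the camera intrinsics). Two choices here are assumptions rather than details from the slides: the centroid is snapped to the nearest body pixel, because a bent body's centroid can fall outside the silhouette, and the N largest local maxima are returned, since the N largest raw distances would all cluster at one extremity.

```python
import heapq
import math
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]
NEIGHBORS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def body_center(mask: List[List[bool]]) -> Pixel:
    """Mean body pixel, snapped to the nearest pixel inside the body."""
    pix = [(r, c) for r, row in enumerate(mask) for c, on in enumerate(row) if on]
    mr = sum(r for r, _ in pix) / len(pix)
    mc = sum(c for _, c in pix) / len(pix)
    return min(pix, key=lambda p: (p[0] - mr) ** 2 + (p[1] - mc) ** 2)


def geodesic_distances(mask: List[List[bool]], depth: List[List[float]],
                       source: Pixel, depth_scale: float = 1.0) -> Dict[Pixel, float]:
    """Dijkstra over 8-connected body pixels; a step costs
    sqrt(dr^2 + dc^2 + (dz * depth_scale)^2)."""
    rows, cols = len(mask), len(mask[0])
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[(r, c)]:
            continue  # stale heap entry
        for dr, dc in NEIGHBORS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc]:
                dz = (depth[nr][nc] - depth[r][c]) * depth_scale
                nd = d + math.sqrt(dr * dr + dc * dc + dz * dz)
                if nd < dist.get((nr, nc), math.inf):
                    dist[(nr, nc)] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return dist


def extremity_candidates(dist: Dict[Pixel, float], n: int) -> List[Pixel]:
    """Top-n local maxima of geodesic distance (candidate hands, feet, head)."""
    maxima = []
    for (r, c), d in dist.items():
        around = [dist[(r + dr, c + dc)] for dr, dc in NEIGHBORS
                  if (r + dr, c + dc) in dist]
        if all(d >= a for a in around):
            maxima.append((d, (r, c)))
    maxima.sort(reverse=True)
    return [p for _, p in maxima[:n]]
```

On a plus-shaped mask of uniform depth, the four arm tips come out as the candidates.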
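Sketch for the force on ‘III. Fitting a Skeleton’. It computes a gradient vector flow (GVF) field in the style of Xu and Prince, iterating w ← w + μ∇²w − (w − f)|∇f|² on the edge map of the body mask, and then negates it. The negation reflects an assumption that "pushed away from the body edges" means driving the snake toward the middle of the limbs. The snake itself (internal energy, anchoring at the head and hands) is not shown, and `mu` and `iterations` are illustrative values.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def _at(g: Grid, r: int, c: int) -> float:
    """Read g[r][c], replicating the border for out-of-range indices."""
    r = min(max(r, 0), len(g) - 1)
    c = min(max(c, 0), len(g[0]) - 1)
    return g[r][c]


def _gradient(g: Grid) -> Tuple[Grid, Grid]:
    """Central-difference gradient: (d/dcolumn, d/drow)."""
    rows, cols = len(g), len(g[0])
    gx = [[(_at(g, r, c + 1) - _at(g, r, c - 1)) / 2 for c in range(cols)] for r in range(rows)]
    gy = [[(_at(g, r + 1, c) - _at(g, r - 1, c)) / 2 for c in range(cols)] for r in range(rows)]
    return gx, gy


def gradient_vector_flow(edge: Grid, mu: float = 0.1, iterations: int = 200) -> Tuple[Grid, Grid]:
    """Diffuse the edge-map gradient (fx, fy) with the explicit GVF update."""
    fx, fy = _gradient(edge)
    rows, cols = len(edge), len(edge[0])
    mag2 = [[fx[r][c] ** 2 + fy[r][c] ** 2 for c in range(cols)] for r in range(rows)]

    def step(w: Grid, f: Grid) -> Grid:
        return [[w[r][c]
                 + mu * (_at(w, r - 1, c) + _at(w, r + 1, c)
                         + _at(w, r, c - 1) + _at(w, r, c + 1) - 4 * w[r][c])
                 - (w[r][c] - f[r][c]) * mag2[r][c]
                 for c in range(cols)] for r in range(rows)]

    u, v = [row[:] for row in fx], [row[:] for row in fy]
    for _ in range(iterations):
        u, v = step(u, fx), step(v, fy)
    return u, v


def push_from_body_edges(mask: List[List[bool]], **gvf_kwargs) -> Tuple[Grid, Grid]:
    """Negated GVF of the silhouette's edge map: inside the body it points away
    from the nearest edge, toward the middle of the body (assumed snake force)."""
    m = [[1.0 if on else 0.0 for on in row] for row in mask]
    mx, my = _gradient(m)
    edge = [[math.hypot(a, b) for a, b in zip(rx, ry)] for rx, ry in zip(mx, my)]
    u, v = gradient_vector_flow(edge, **gvf_kwargs)
    return [[-x for x in row] for row in u], [[-y for y in row] for row in v]
```

The explicit update stays stable roughly while 4·mu plus the largest |∇f|² stays below 1. A snake point would sample this field at its position, with bilinear interpolation in practice.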