Pose Estimation of Gymnasts Brian Reily

Pose Estimation of Gymnasts
I. Problem, Motivation, and Data Set
II. Simple Pose Estimation
III. Advanced Pose Estimation
IV. Future Work
I. The Problem
➔The pommel horse is a gymnastics event,
where most moves involve spinning around a
central axis
I. Motivation
➔Gymnasts score better when their spins are
consistently timed
➔Knowing if a gymnast tends to speed up or
slow down can provide direction for training
and conditioning
➔The United States Olympic Committee wants
to use computer vision techniques to
determine this
I. Data Set
➔The data set was collected with a
Kinect 2, providing depth data
using infrared time of flight
➔Kinect 2 includes ‘skeleton’
generation, but this fails on
the gymnasts’ body poses
II. Simple Pose Estimation
➔To provide data about the gymnast’s spin
rate, we only need to know where his feet
are
➔Basic idea: Track gymnast’s feet, identify
points where they reach as far left or far right
as possible in the image frame
II. Image Processing
1) Depth image of a
gymnast.
2) Using the depth of the
pommel horse as a
base, we can threshold
by the depth of each
pixel.
3) A mean background
(built over frames
without a gymnast) is
also thresholded by
depth and subtracted.
4) A built-in Matlab or
OpenCV function is
used to find the
contour of the
gymnast’s body.
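Steps 2 and 3 above can be sketched in plain Python. This is one possible reading of the slide, not the actual pipeline: the depth band (`near`, `far`), the `min_diff` tolerance, and all function names are hypothetical, and the contour step is left to the Matlab or OpenCV function mentioned in step 4.

```python
def mean_background(frames):
    """Per-pixel mean depth over frames recorded without a gymnast."""
    rows, cols = len(frames[0]), len(frames[0][0])
    n = len(frames)
    return [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]


def threshold_depth(depth, near, far):
    """Keep a depth value inside the [near, far] band, else zero it."""
    return depth if near <= depth <= far else 0


def gymnast_mask(frame, background, near, far, min_diff):
    """Binary mask of pixels that are in the depth band and differ
    from the thresholded mean background by more than min_diff."""
    mask = []
    for frame_row, bg_row in zip(frame, background):
        row = []
        for d, b in zip(frame_row, bg_row):
            fg = threshold_depth(d, near, far)
            bg = threshold_depth(b, near, far)
            row.append(1 if fg > 0 and abs(fg - bg) > min_diff else 0)
        mask.append(row)
    return mask


# Toy 2x3 depth frames in millimetres (made-up values).
empty_frames = [
    [[4000, 2000, 4000], [4000, 2000, 4000]],
    [[4000, 2010, 4000], [4000, 2010, 4000]],
]
background = mean_background(empty_frames)
frame = [[1800, 2004, 4000], [4000, 1700, 4000]]
mask = gymnast_mask(frame, background, near=1500, far=2500, min_diff=50)
```

In this toy frame, pixels that match the background (the pommel horse column) or lie outside the depth band are dropped, and only the two pixels that changed inside the band remain.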
II. How to Identify Feet?
The idea is to find the major axis of the body,
which is bent at the waist.
1) Find the longest vector from the
centroid (green).
2) Find the shortest vector from the
centroid (red).
3) Find the vector that best matches this
angle (blue).
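Steps 1 and 2 can be sketched as below, assuming the contour is a list of (x, y) points and the centroid is the mean of those points (an area-based centroid would also be reasonable). Step 3 is not sketched because the slide does not specify the matching rule. The function names are hypothetical.

```python
import math


def centroid(points):
    """Mean of the contour points (an assumption, not an area centroid)."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def extreme_vectors(contour):
    """Return the centroid plus the longest and shortest vectors
    from the centroid to any contour point."""
    cx, cy = centroid(contour)
    vectors = [(x - cx, y - cy) for x, y in contour]
    longest = max(vectors, key=lambda v: math.hypot(v[0], v[1]))
    shortest = min(vectors, key=lambda v: math.hypot(v[0], v[1]))
    return (cx, cy), longest, shortest


def angle_between(u, v):
    """Unsigned angle in degrees between two 2D vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    cross = u[0] * v[1] - u[1] * v[0]
    return abs(math.degrees(math.atan2(cross, dot)))


# Toy contour: a thin box with one far-away point standing in for the feet.
contour = [(0, 0), (8, 0), (8, 2), (0, 2), (-16, 1)]
center, longest, shortest = extreme_vectors(contour)
angle = angle_between(longest, shortest)
```

Here the longest vector points at the outlying contour point, which is the role the feet play in the real silhouette.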
II. Finding Extrema of a Spin
➔By tracking the last 5 positions of the feet,
we look for a position that is farther left or
farther right than the surrounding positions.
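A minimal sketch of this extrema search, assuming the foot position is reduced to one horizontal pixel coordinate per frame and the 5 positions form a centered window (so an extremum is confirmed two frames after it occurs). The function name and the strict comparison are assumptions.

```python
def find_extrema(xs, window=5):
    """Return (frame_index, kind) pairs for horizontal extrema.

    A frame is an extremum when its x position is strictly smaller
    ("left") or strictly larger ("right") than every other position
    in the centered window around it.
    """
    half = window // 2
    extrema = []
    for i in range(half, len(xs) - half):
        neighbours = xs[i - half:i] + xs[i + 1:i + half + 1]
        if all(xs[i] < n for n in neighbours):
            extrema.append((i, "left"))
        elif all(xs[i] > n for n in neighbours):
            extrema.append((i, "right"))
    return extrema


# Made-up foot x positions over 11 frames.
foot_x = [10, 20, 30, 25, 15, 5, 12, 22, 31, 28, 18]
extrema = find_extrema(foot_x)
```

For this toy sequence the feet swing right, then left, then right again, and one frame is flagged at each turn.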
II. Interpolating Frame Times
➔ Kinect 2 peaks at 30 fps;
the actual rate is typically lower
➔ Olympic gymnasts are very
fast and very consistent
➔ By fitting the extremum and
the 2 nearest points to a
cubic spline, we can interpolate
the time of the extremum
between frames.
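To illustrate sub-frame interpolation, the sketch below swaps in a simpler stand-in for the cubic spline: it fits a parabola through the extremum sample and its two neighbours and returns the time of the parabola's vertex. This is not the spline fit described above, only the same idea with less machinery. The function name and the sample values are hypothetical.

```python
def parabolic_extremum_time(p0, p1, p2):
    """Time of the vertex of the parabola through three (t, x) samples.

    p1 is the sample flagged as the extremum; p0 and p2 are its
    neighbours. Timestamps do not need to be evenly spaced.
    """
    (t0, x0), (t1, x1), (t2, x2) = p0, p1, p2
    a = (t1 - t0) * (x1 - x2)
    b = (t1 - t2) * (x1 - x0)
    denom = a - b
    if denom == 0:
        raise ValueError("samples are collinear; no extremum to locate")
    return t1 - 0.5 * ((t1 - t0) * a - (t1 - t2) * b) / denom


def x_of(t):
    """Toy foot trajectory whose true peak is at t = 0.05 s."""
    return 100.0 - 50.0 * (t - 0.05) ** 2


# Three consecutive frames at a nominal 30 fps.
times = [0.0, 1.0 / 30.0, 2.0 / 30.0]
samples = [(t, x_of(t)) for t in times]
peak_time = parabolic_extremum_time(*samples)
```

The middle frame is at about 0.033 s, while the recovered peak time is close to 0.05 s, which falls between the second and third frames.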
II. Demonstration
https://www.youtube.com/v/IFTE_Lna9So
II. Data for Olympic Coaches
II. Evaluation
➔ Ground truth data is from
hand annotated video
(marking foot position),
interpolated for time
between frames using the
same cubic spline
method.
➔ Typical RMSE of ~10 ms
for each individual
routine.
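For reference, the RMSE between estimated and hand-annotated extremum times can be computed as below. The timestamps are made up for illustration and are not results from the data set; the function name is hypothetical.

```python
import math


def rmse(estimated, truth):
    """Root-mean-square error between two equal-length sequences."""
    if len(estimated) != len(truth) or not estimated:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(e - t) ** 2 for e, t in zip(estimated, truth)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative extremum times in seconds (not measured values).
estimated_times = [1.010, 2.000, 3.010, 4.000]
annotated_times = [1.000, 2.010, 3.000, 4.010]
error_s = rmse(estimated_times, annotated_times)
```

Each toy estimate is off by 10 ms, so the RMSE also comes out at about 0.010 s.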
III. Advanced Pose Estimation
➔Knowing the location of hands, individual
feet, or ideally an entire skeleton, can
provide information about other aspects of
the pommel horse.
➔The work could also be extended to other
events and sports.
➔For now, we start with ‘normal’ poses
instead of gymnasts’ poses.
III. Example ‘Normal’ Pose
III. Detecting Body Parts
➔ Use geodesic distance instead of Euclidean distance
(geodesic distance is measured along a surface)
Geodesic Distance
Euclidean Distance
III. Detecting Body Parts
1) Find the geodesic
distance from the
centroid for the
entire body, with
Dijkstra’s shortest
path algorithm.
2) Find the top N
maximum points; these
are likely locations for
hands, feet, and the head.
3) The hands and head
can then be picked
using a heuristic.
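Steps 1 and 2 can be sketched with Dijkstra's algorithm on a binary body mask. This simplified version measures distance inside the 2D silhouette only (it ignores the depth values), assumes the start pixel lies inside the mask, and treats "maximum points" as local maxima of the distance map. All names are hypothetical.

```python
import heapq
import math

DIAG = math.sqrt(2.0)
# 8-connected neighbour offsets with their step lengths.
STEPS = [(-1, 0, 1.0), (1, 0, 1.0), (0, -1, 1.0), (0, 1, 1.0),
         (-1, -1, DIAG), (-1, 1, DIAG), (1, -1, DIAG), (1, 1, DIAG)]


def geodesic_distances(mask, start):
    """Dijkstra shortest-path distance from start to every reachable
    mask pixel, moving only through pixels where mask is non-zero."""
    rows, cols = len(mask), len(mask[0])
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[(r, c)]:
            continue  # stale heap entry
        for dr, dc, w in STEPS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc]:
                nd = d + w
                if nd < dist.get((nr, nc), math.inf):
                    dist[(nr, nc)] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return dist


def extremities(dist, n):
    """Top n local maxima of the distance map, farthest first."""
    def is_local_max(p):
        r, c = p
        return all(dist.get((r + dr, c + dc), -1.0) <= dist[p]
                   for dr, dc, _ in STEPS)
    peaks = [p for p in dist if is_local_max(p)]
    return sorted(peaks, key=lambda p: dist[p], reverse=True)[:n]


# Toy silhouette: a cross with arms of different lengths.
mask = [
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
]
dist = geodesic_distances(mask, start=(2, 2))
tips = extremities(dist, 3)
```

In the toy cross, the three longest arm tips are returned, which is the role hands, feet, and head play on a real silhouette.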
III. Recognizing Body Parts
➔Plan: run a classifier only at these likely
points, which should be significantly faster
than a sliding-window approach.
➔For RGB images, state-of-the-art is HOG
features to describe body parts.
➔But for depth images, there’s no consensus
feature.
III. Fitting a Skeleton
➔ If we know the head and hands, we can fit a basic skeleton as a snake (an active contour).
➔ Using a gradient vector field, the snake can be pushed away from the body
edges.
IV. Future Work
➔The foot tracking algorithm will be used by an
undergraduate field session team this
summer to build a full application for USOC
➔Begin work on a feature descriptor for depth
images
➔Improve the snake-skeleton fitting, and work
on incorporating kinematic data into skeleton
fitting
Questions?