
Learning Human Pose and Motion Models for Animation
Aaron Hertzmann
University of Toronto
Animation is maturing …
… but it’s still hard to create
Keyframe animation
[Figure: keyframe poses q_1, q_2, q_3 and the interpolated curve q(t)]
http://www.cadtutor.net/dd/bryce/anim/anim.html
Characters are very complex
Woody:
- 200 facial controls
- 700 controls in his body
http://www.pbs.org/wgbh/nova/specialfx2/mcqueen.html
Motion capture
[Images from NYU and UW]
Motion capture
Mocap is not a panacea
Goal: model human motion
What motions are likely?
Applications:
• Computer animation
• Computer vision
Related work: physical models
• Accurate, in principle
• Too complex to work with (but see [Liu, Hertzmann, Popović 2005])
• Computationally expensive
Related work: motion graphs
Input: raw motion capture
“Motion graph”
(slide from J. Lee)
Approach: statistical models of motions
Learn a PDF over motions, and synthesize from this PDF [Brand and Hertzmann 1999]
What PDF do we use?
Style-Based Inverse Kinematics
with: Keith Grochow, Steve Martin, Zoran Popović
Motivation
Body parameterization
Pose at time t: q_t
• Root position/orientation (6 DOFs)
• Joint angles (29 DOFs)
Motion
X = [q_1, …, q_T]
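As a concrete illustration, a minimal NumPy sketch of this pose/motion layout; only the DOF counts come from the slide, the array layout is an assumption.

    import numpy as np

    # Pose q_t: 6 root DOFs (position + orientation) followed by 29 joint angles.
    POSE_DIM = 6 + 29

    # A motion X = [q_1, ..., q_T]: one pose per row.
    T = 120                      # number of frames (arbitrary example)
    X = np.zeros((T, POSE_DIM))  # motion matrix
    q_t = X[10]                  # the pose at frame t = 10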
Forward kinematics
Pose to 3D positions: q_t → FK → [x_i, y_i, z_i]_t
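A toy forward-kinematics sketch (a planar chain rather than a full character skeleton); the chain structure and link lengths are illustrative assumptions.

    import numpy as np

    # Toy FK: accumulate rotations along a chain and return 3D joint positions.
    def forward_kinematics(joint_angles, link_lengths):
        positions = [np.zeros(3)]            # root at the origin
        total_angle = 0.0
        for theta, length in zip(joint_angles, link_lengths):
            total_angle += theta             # rotation accumulates down the chain
            offset = length * np.array([np.cos(total_angle), np.sin(total_angle), 0.0])
            positions.append(positions[-1] + offset)
        return np.stack(positions)           # (num_joints + 1, 3) array of [x_i, y_i, z_i]

    # Example: a 3-link chain.
    print(forward_kinematics([0.3, -0.2, 0.5], [1.0, 0.8, 0.5]))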
Problem Statement
Generate a character pose based on a chosen style, subject to constraints
• Degrees of freedom (DOFs): q
• Constraints
Approach
Off-line learning: motion → learning → style
Real-time pose synthesis: style + constraints → synthesis → pose
Features
y(q) = [ q, orientation(q), velocity(q) ]
     = [ q_0 q_1 q_2 … , r_0 r_1 r_2 , v_0 v_1 v_2 … ]
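A sketch of this feature map y(q); the slicing of q (position, orientation, joints) and the finite-difference velocity are assumptions for illustration.

    import numpy as np

    # y(q): joint angles, root orientation (r_0, r_1, r_2), and root velocity
    # (v_0, v_1, v_2) from finite differences between consecutive frames.
    def pose_features(q_t, q_prev, dt=1.0 / 120.0):
        joint_angles = q_t[6:]                   # q_0, q_1, q_2, ...
        orientation = q_t[3:6]                   # r_0, r_1, r_2 (assumed layout)
        velocity = (q_t[:3] - q_prev[:3]) / dt   # v_0, v_1, v_2
        return np.concatenate([joint_angles, orientation, velocity])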
Goals for the PDF
• Learn PDF from any data
• Smooth and descriptive
• Minimal parameter tuning
• Real-time synthesis
Mixtures-of-Gaussians
GPLVM
Gaussian Process Latent Variable Model [Lawrence 2004]
x ~ N(0, I)
y ~ GP(x; θ)
[Figure: latent space (x_1, x_2) mapped by a GP to feature space (y_1, y_2, y_3)]
Learning: arg max p(X, θ | Y) = arg max p(Y | X, θ) p(X)
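A minimal sketch of this learning step in NumPy/SciPy: maximize p(Y | X, θ) p(X) over the latent coordinates. The RBF kernel, the fixed hyperparameters (gamma, noise), the PCA initialization, and all variable names are my assumptions, not the original implementation.

    import numpy as np
    from scipy.optimize import minimize

    def rbf_kernel(X, gamma=1.0, noise=1e-2):
        # Squared distances between latent points, then an RBF kernel plus noise.
        sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
        return np.exp(-0.5 * gamma * sq) + noise * np.eye(len(X))

    def neg_log_posterior(x_flat, Y, latent_dim):
        N, D = Y.shape
        X = x_flat.reshape(N, latent_dim)
        K = rbf_kernel(X)
        _, logdet = np.linalg.slogdet(K)
        # -ln p(Y | X, theta), up to constants: all D output dims share the kernel.
        nll = 0.5 * D * logdet + 0.5 * np.trace(np.linalg.solve(K, Y @ Y.T))
        # -ln p(X): unit Gaussian prior on the latent coordinates.
        return nll + 0.5 * np.sum(X**2)

    def learn_gplvm(Y, latent_dim=2):
        # Initialize the latents with PCA (via SVD), then optimize them.
        Yc = Y - Y.mean(0)
        U, S, _ = np.linalg.svd(Yc, full_matrices=False)
        X0 = U[:, :latent_dim] * S[:latent_dim]
        res = minimize(neg_log_posterior, X0.ravel(), args=(Y, latent_dim),
                       method="L-BFGS-B")
        return res.x.reshape(Y.shape[0], latent_dim)

    # Example: 50 poses with 35 feature dimensions of stand-in data.
    X_latent = learn_gplvm(np.random.randn(50, 35))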
Scaled Outputs
Different DOFs have different “importances”
Solution: scale the RBF kernel function k(x, x') per output dimension:
k_i(x, x') = k(x, x') / w_i²
Equivalently: learn x → W y, where W = diag(w_1, w_2, …, w_D)
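A sketch of the scaling, under the same assumptions as the GPLVM sketch above: weight each feature dimension by w_i and fit the model to W y (the weights here are placeholders; in the real model they are learned with everything else).

    import numpy as np

    Y = np.random.randn(50, 35)        # stand-in feature matrix
    w = np.ones(Y.shape[1])            # per-dimension importance weights w_1..w_D
    Y_scaled = Y * w                   # each column scaled by w_i, i.e. rows of W y
    # X_latent = learn_gplvm(Y_scaled) # reuse the GPLVM sketch above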
Precision in Latent Space
σ²(x): the model's prediction variance as a function of latent position x
SGPLVM Objective Function
[Figure: latent point x mapped by f(x; θ) to a pose y in feature space, with constraint C]
L_IK(x, y; θ) = ||W_θ (y − f(x; θ))||² / (2 σ²(x; θ)) + (D/2) ln σ²(x; θ)
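A direct transcription of this objective as a sketch; f_mean, f_var, and W stand for the learned GP's predictive mean, its variance, and the output weights, all assumed to come from a trained model.

    import numpy as np

    # L_IK(x, y) = ||W (y - f(x))||^2 / (2 sigma^2(x)) + (D/2) ln sigma^2(x)
    def L_IK(x, y, f_mean, f_var, W):
        mu = f_mean(x)                     # GP predictive mean at latent point x
        s2 = f_var(x)                      # GP predictive variance sigma^2(x)
        D = len(y)
        r = W @ (y - mu)                   # weighted reconstruction residual
        return r @ r / (2.0 * s2) + 0.5 * D * np.log(s2)

Posing then amounts to minimizing L_IK over x (and the unconstrained DOFs) subject to the user's constraints.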
Baseball Pitch
Track Start
Jump Shot
Style interpolation
Given two styles 1 and 2, can we
“interpolate” them?
p1 ( y )  exp(  LIK ( y; θ1 ))
p2 (y)  exp(  LIK (y; θ2 ))
Approach: interpolate in log-domain
Style interpolation
p_1(y) ∝ exp(−L_IK(y; θ_1)), weight (1 − s)
p_2(y) ∝ exp(−L_IK(y; θ_2)), weight s
Linear blend: (1 − s) p_1(y) + s p_2(y)
Style interpolation in log space
exp(−L_IK(y; θ_1))^(1−s) · exp(−L_IK(y; θ_2))^s = exp(−((1 − s) L_IK(y; θ_1) + s L_IK(y; θ_2)))
Interactive Posing
Multiple motion style
Realtime Motion Capture
Style Interpolation
Trajectory Keyframing
Posing from an Image
Modeling motion
GPLVM doesn’t model motions
• Velocity features are a hack
How do we model and learn dynamics?
Gaussian Process Dynamical Models
with: David Fleet, Jack Wang
Dynamical models
x_t → x_{t+1}
Dynamical models
• Hidden Markov Model (HMM)
• Linear Dynamical Systems (LDS) [van Overschee et al. '94; Doretto et al. '01]
• Switching LDS [Ghahramani and Hinton '98; Pavlovic et al. '00; Li et al. '02]
• Nonlinear Dynamical Systems [e.g., Ghahramani and Roweis '00]
Gaussian Process Dynamical Model (GPDM)
Latent dynamical model:
x_t = f(x_{t−1}; A) + n_{x,t}   (latent dynamics)
y_t = g(x_t; B) + n_{y,t}       (pose reconstruction)
Assume IID Gaussian noise, with Gaussian priors on the mapping weights A and B.
Marginalize out A and B, and then optimize the latent positions to simultaneously minimize pose reconstruction error and (dynamic) prediction error on the training data.
Dynamics
The latent dynamic process on X has a similar form:
p(X | ᾱ) ∝ p(x_1) |K_X|^(−d/2) exp(−½ tr(K_X⁻¹ X_out X_outᵀ)),   X_out = [x_2, …, x_N]ᵀ
where K_X is a kernel matrix defined by the kernel function k_X(x, x') with hyperparameters ᾱ.
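A sketch of one plausible dynamics kernel k_X (an RBF term plus a linear term, with a noise term on the diagonal), following the usual GPDM parameterization; the hyperparameter names are assumptions.

    import numpy as np

    # k_X(x, x') = a1 * exp(-a2/2 ||x - x'||^2) + a3 * x.x'  (+ 1/a4 on the diagonal)
    def k_X(x, x_prime, alpha, same_point=False):
        a1, a2, a3, a4 = alpha
        k = a1 * np.exp(-0.5 * a2 * np.sum((x - x_prime) ** 2)) + a3 * np.dot(x, x_prime)
        if same_point:
            k += 1.0 / a4          # white-noise variance on the diagonal of K_X
        return k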
Markov Property
Subspace dynamical model: x_t = f(x_{t−1}; A) + n_{x,t}
Remark: Conditioned on A, the dynamical model is 1st-order Markov, but the marginalization over A introduces longer-range temporal dependence.
Learning
GPDM posterior:
p(X, ᾱ, β̄ | Y) ∝ p(Y | X, β̄) p(X | ᾱ) p(ᾱ) p(β̄)
(Y: training motions; X: latent trajectories; ᾱ, β̄: kernel hyperparameters; the factors are the reconstruction likelihood, the dynamics likelihood, and the priors.)
To estimate the latent coordinates & kernel parameters, we minimize −ln p(X, ᾱ, β̄ | Y) with respect to X, ᾱ, and β̄.
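A minimal sketch of this objective: a GPLVM-style reconstruction term plus a GP term on the latent dynamics (first-order prediction), with kernel hyperparameters held fixed for brevity; the real model also optimizes ᾱ and β̄ under their priors.

    import numpy as np

    def rbf(A, B, gamma=1.0, noise=1e-2):
        sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        K = np.exp(-0.5 * gamma * sq)
        if A.shape == B.shape:
            K = K + noise * np.eye(A.shape[0])   # noise term for the training kernel
        return K

    def gp_term(K, Z):
        # -ln of a zero-mean GP likelihood over the columns of Z, up to constants.
        _, logdet = np.linalg.slogdet(K)
        return 0.5 * Z.shape[1] * logdet + 0.5 * np.trace(np.linalg.solve(K, Z @ Z.T))

    def gpdm_neg_log_posterior(X, Y):
        X_in, X_out = X[:-1], X[1:]
        recon = gp_term(rbf(X, X), Y)             # pose reconstruction term
        dynam = gp_term(rbf(X_in, X_in), X_out)   # dynamics (prediction) term
        return recon + dynam + 0.5 * np.sum(X[0] ** 2)   # Gaussian prior on x_1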
Motion Capture Data
~2.5 gait cycles (157 frames) from the CMU motion capture database
56 joint angles + 3 global translational velocities + 3 global orientations
Learned latent coordinates (1st-order prediction, RBF kernel)
3D GPLVM Latent Coordinates
Large “jumps” in latent space
Reconstruction Variance
Volume visualization of the reconstruction variance σ²(x) (1st-order prediction, RBF kernel)
Motion Simulation
Initial state
Random trajectories from MCMC (~1 gait cycle, 60 steps)
Animation of mean motion (200-step sequence)
Simulation: 1st-Order Mean Prediction
Red: 200 steps of mean prediction
Green: 60-step MCMC mean
Animation
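A sketch of mean-prediction simulation: iterate the dynamics GP's predictive mean from an initial latent state, then map each latent point to a pose; dyn_mean and pose_mean are hypothetical predictive-mean functions of the learned GPs.

    import numpy as np

    def simulate_mean(x0, steps, dyn_mean, pose_mean):
        xs = [x0]
        for _ in range(steps):
            xs.append(dyn_mean(xs[-1]))              # x_{t+1} = mean of p(x_{t+1} | x_t)
        return np.stack([pose_mean(x) for x in xs])  # one pose (row) per latent state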
Missing Data
50 of 147 frames dropped (almost a full gait cycle)
spline interpolation
Missing Data: RBF Dynamics
Determining hyperparameters
Data: six distinct walkers
Compared: GPDM, Neil's parameters, MCEM
Where do we go from here?
Let’s look at some limitations of the model
[Figure: results at 60 Hz vs. 120 Hz]
What do we want?
[Figure: a walk cycle in latent space (x_1, x_2), parameterized by phase and variation]
Branching motions
[Figure: branching latent trajectories: walk vs. run]
Stylistic variation
Current work: manifold GPs
[Figure: manifold GP mapping between latent space (x) and data space (y)]
Summary
• GPLVM and GPDM provide priors from small data sets
• Dependence on initialization, hyperpriors, latent dimensionality
• Open problems: modeling data topology and stylistic variation
Download