Learning Human Pose and Motion Models for Animation
Aaron Hertzmann, University of Toronto

Animation is maturing…
…but it is still hard to create.

Keyframe animation
The animator specifies key poses q1, q2, q3, …, and the system interpolates a smooth curve q(t) through them.
[Illustration: http://www.cadtutor.net/dd/bryce/anim/anim.html]

Characters are very complex
Woody:
• 200 facial controls
• 700 controls in his body
[Image: http://www.pbs.org/wgbh/nova/specialfx2/mcqueen.html]

Motion capture
[Images from NYU and UW]

Mocap is not a panacea

Goal: model human motion
Which motions are likely?
Applications:
• Computer animation
• Computer vision

Related work: physical models
• Accurate, in principle
• Too complex to work with (but see [Liu, Hertzmann, Popović 2005])
• Computationally expensive

Related work: motion graphs
Input: raw motion capture, assembled into a "motion graph" (slide from J. Lee)

Approach: statistical models of motions
Learn a PDF over motions, and synthesize from this PDF [Brand and Hertzmann 1999].
What PDF do we use?

Style-Based Inverse Kinematics
with Keith Grochow, Steve Martin, and Zoran Popović

Motivation

Body parameterization
• Pose at time t: qt
• Root position/orientation (6 DOFs)
• Joint angles (29 DOFs)
• Motion: X = [q1, …, qT]

Forward kinematics
Pose to 3D positions: qt → FK → [xi, yi, zi]t

Problem statement
Generate a character pose, described by its degrees of freedom (DOFs) q, in a chosen style, subject to constraints.

Approach
• Off-line learning: motion data → learning → style model
• Real-time pose synthesis: constraints → synthesis → pose

Features
y(q) = [ q0, q1, q2, …, orientation(q), velocity(q) ]
(joint angles q0, q1, q2, …; root orientation r0, r1, r2; velocities v0, v1, v2, …)

Goals for the PDF
• Learn the PDF from any data
• Smooth and descriptive
• Minimal parameter tuning
• Real-time synthesis

Mixtures-of-Gaussians

GPLVM
Gaussian Process Latent Variable Model [Lawrence 2004]
A low-dimensional latent space (x1, x2) is mapped by a Gaussian process to the feature space (y1, y2, y3): x ~ N(0, I), y ~ GP(x; θ).
Learning: arg max p(X, θ | Y) = arg max p(Y | X, θ) p(X)

Scaled outputs
Different DOFs have different "importances."
Solution: scale the RBF kernel per output dimension, ki(x, x′) = k(x, x′) / wi².
Equivalently: learn x → Wy, where W = diag(w1, w2, …, wD).

Precision in latent space
The GP also provides a reconstruction variance σ²(x) at every latent point x.

SGPLVM objective function
L_IK(x, y; θ) = ||W(y − f(x; θ))||² / (2σ²(x; θ)) + (D/2) ln σ²(x; θ)
where f(x; θ) is the GP mean prediction in feature space and σ²(x; θ) its variance.

Example styles: baseball pitch, track start, jump shot.

Style interpolation
Given two styles θ1 and θ2, can we "interpolate" them?
p1(y) ∝ exp(−L_IK(y; θ1)),  p2(y) ∝ exp(−L_IK(y; θ2))
Approach: interpolate in the log domain.

Style interpolation
Blend the densities geometrically, p1(y)^(1−s) p2(y)^s, rather than linearly, (1−s) p1(y) + s p2(y).

Style interpolation in log space
p1(y)^(1−s) p2(y)^s = exp(−L_IK(y; θ1))^(1−s) exp(−L_IK(y; θ2))^s = exp(−((1−s) L_IK(y; θ1) + s L_IK(y; θ2)))

Demos
• Interactive posing
• Multiple motion styles
• Real-time motion capture
• Style interpolation
• Trajectory keyframing
• Posing from an image

Modeling motion
The GPLVM does not model motions:
• Velocity features are a hack.
How do we model and learn dynamics?

Gaussian Process Dynamical Models
with David Fleet and Jack Wang

Dynamical models
xt → xt+1
• Hidden Markov Models (HMMs)
• Linear Dynamical Systems (LDS) [van Overschee et al. '94; Doretto et al. '01]
• Switching LDS [Ghahramani and Hinton '98; Pavlovic et al. '00; Li et al. '02]
• Nonlinear dynamical systems [e.g., Ghahramani and Roweis '00]

Gaussian Process Dynamical Model (GPDM)
Latent dynamical model:
xt = f(xt−1; A) + nx,t  (latent dynamics)
yt = g(xt; B) + ny,t  (pose reconstruction)
Assume IID Gaussian noise, with Gaussian priors on the mapping parameters A and B. Marginalize out A and B, and then optimize the latent positions to simultaneously minimize pose reconstruction error and (dynamic) prediction error on the training data.
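Before turning to the marginalized form of the dynamics, here is a minimal sketch of how a learned GPDM is used for synthesis: both the latent dynamics and the pose reconstruction are GP regressions, so a new frame is generated by predicting the next latent point from the previous one and then mapping it into pose space. This is an illustrative sketch only, not the authors' implementation; the names (rbf_kernel, gp_predict, gpdm_step), the plain RBF kernel, and the fixed noise level are assumptions, and the learned hyperparameters, output scaling, and mean-centering are omitted.

```python
import numpy as np

def rbf_kernel(A, B, beta1=1.0, beta2=1.0):
    # k(x, x') = beta1 * exp(-beta2/2 * ||x - x'||^2)
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return beta1 * np.exp(-0.5 * beta2 * d2)

def gp_predict(x_star, X, Z, noise=1e-4):
    """GP posterior mean and variance at x_star, given training inputs X and outputs Z."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    k_star = rbf_kernel(x_star[None, :], X)                # (1, N)
    alpha = np.linalg.solve(K, Z)                          # K^{-1} Z
    mean = (k_star @ alpha)[0]                             # predictive mean, shape (D,)
    v = np.linalg.solve(K, k_star.T)
    var = (rbf_kernel(x_star[None, :], x_star[None, :]) - k_star @ v).item()
    return mean, var

def gpdm_step(x_prev, X_latent, Y_poses):
    """One step of mean prediction: latent dynamics GP, then pose-reconstruction GP."""
    X_in, X_out = X_latent[:-1], X_latent[1:]              # training pairs (x_{t-1}, x_t)
    x_next, _ = gp_predict(x_prev, X_in, X_out)            # predict next latent point
    y_next, sigma2 = gp_predict(x_next, X_latent, Y_poses) # reconstruct the pose
    return x_next, y_next, sigma2
```

Iterating gpdm_step from an initial latent state on the training trajectory produces mean-prediction sequences of the kind shown later in the simulation results.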
Dynamics
After marginalizing out the mappings, the pose reconstruction term becomes a GP likelihood on Y, and the latent dynamic process on X has a similar form:
p(X | ᾱ) ∝ |K_X|^(−d/2) exp( −½ tr(K_X⁻¹ X_out X_outᵀ) )
where K_X is the kernel matrix on (x1, …, xN−1), defined by a kernel function kX(x, x′) with hyperparameters ᾱ, and X_out = [x2, …, xN]ᵀ.

Markov property
Subspace dynamical model: the dynamics are a linear combination of (nonlinear) basis functions of the previous latent state.
Remark: conditioned on the weights, the dynamical model is 1st-order Markov, but marginalizing them out introduces longer temporal dependence.

Learning
GPDM posterior:
p(X, ᾱ, β̄ | Y) ∝ p(Y | X, β̄) p(X | ᾱ) p(ᾱ) p(β̄)
i.e., the reconstruction likelihood of the training motions, the dynamics likelihood of the latent trajectories, and priors on the hyperparameters.
To estimate the latent coordinates and kernel parameters, we minimize −ln p(X, ᾱ, β̄ | Y) with respect to X, ᾱ, and β̄.

Motion capture data
• ~2.5 gait cycles (157 frames) from the CMU motion capture database
• 56 joint angles + 3 global translational velocities + 3 global orientations
• Learned latent coordinates (1st-order prediction, RBF kernel)

3D GPLVM latent coordinates
Large "jumps" in latent space.

Reconstruction variance
Volume visualization of σ²(x) (1st-order prediction, RBF kernel).

Motion simulation
• Random trajectories from MCMC, starting from an initial state (~1 gait cycle, 60 steps)
• Animation of the mean motion (200-step sequence)

Simulation: 1st-order mean prediction
Red: 200 steps of mean prediction. Green: 60-step MCMC mean.

Animation

Missing data
50 of 147 frames dropped (almost a full gait cycle); compared against spline interpolation.

Missing data: RBF dynamics

Determining hyperparameters
Data: six distinct walkers.
Methods compared: GPDM, Neil's parameters, MCEM.

Where do we go from here?
Let's look at some limitations of the model (e.g., sensitivity to frame rate: 60 Hz vs. 120 Hz).

What do we want?
• Phase variation: a walk cycle traced in the latent space (x1, x2)
• Branching motions: walk vs. run
• Stylistic variation

Current work: manifold GPs
Latent space (x) ↔ data space (y)

Summary
• GPLVM and GPDM provide motion priors learned from small data sets.
• Limitations: dependence on initialization, hyperpriors, and latent dimensionality.
• Open problems: modeling data topology and stylistic variation.
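To make the learning objective above concrete, the following is a minimal, hypothetical sketch of the negative log posterior that GPDM training minimizes over the latent trajectory and the kernel hyperparameters: a GP marginal-likelihood term for pose reconstruction plus an analogous term for the first-order latent dynamics. The function names, the plain RBF kernel, and the fixed noise level are illustrative assumptions; the hyperparameter priors, the output scaling W, and the prior on the first latent point are omitted.

```python
import numpy as np

def rbf_kernel(X1, X2, beta1=1.0, beta2=1.0):
    # k(x, x') = beta1 * exp(-beta2/2 * ||x - x'||^2)
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    return beta1 * np.exp(-0.5 * beta2 * d2)

def gp_neg_log_likelihood(K, Z, noise=1e-4):
    """-ln p(Z | K) up to constants: (D/2) ln|K| + (1/2) tr(K^{-1} Z Z^T)."""
    K = K + noise * np.eye(len(K))             # observation noise / jitter
    D = Z.shape[1]
    _, logdet = np.linalg.slogdet(K)
    return 0.5 * D * logdet + 0.5 * np.trace(np.linalg.solve(K, Z) @ Z.T)

def gpdm_objective(X, Y):
    """Negative log posterior (priors omitted): pose reconstruction term on
    poses Y given latents X, plus a first-order dynamics term on X itself."""
    recon = gp_neg_log_likelihood(rbf_kernel(X, X), Y)
    X_in, X_out = X[:-1], X[1:]                # training pairs (x_{t-1}, x_t)
    dyn = gp_neg_log_likelihood(rbf_kernel(X_in, X_in), X_out)
    return recon + dyn
```

In practice this objective would be minimized jointly over X and the kernel hyperparameters with a gradient-based optimizer, typically starting from a PCA initialization of the latent coordinates, which is one source of the initialization dependence noted in the summary.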