Supervised Descent Method and its
Applications to Face Alignment
Xuehan Xiong, Fernando De la Torre (CVPR-2013, extensions: submitted to TPAMI)
Zoltán Szabó
Gatsby Unit, Tea Talk
March 16, 2015
Zoltán Szabó
Motivation
Computer vision: many tasks boil down to continuous nonlinear optimization.
Our focus: facial feature detection/tracking.
Zoltán Szabó
Newton’s method (and its variants)
Task: min x f ( x ) , f ∈ C
2
.
Newton’s method: locally quadratic approximation, x
0
: given.
Second order Taylor expansion ( k = 0 , 1 , 2 , ...
) around x k
: f ( x k
+ ∆ x ) ≈ f ( x k
) + J f
( x k
)
T
(∆ x ) +
1
2
(∆ x )
T
H ( x k
)(∆ x ) ⇒
∆ x k + 1
= − H −
1
( x k
) J f
( x k
) , x k + 1
= x k
+ ∆ x k + 1
.
Zoltán Szabó
Newton’s method: pro & contra
Advantages:
If it converges ⇒ quadratic rate ( q = 2) lim k
→∞ k x k k x k
−
1
− x
∗ k
− x
∗ k q
= L > 0 .
If x
0 is ”close enough” to x
∗
Disadvantages (in CV):
⇒ convergence.
f : non-differentiable (SIFT) → numerical J f
Linear equation: expensive (quasi too).
, H : slow.
Zoltán Szabó
Optimization idea (z-axis: reversed)
Zoltán Szabó
Face alignment: x
0
– initial estimation, x
∗
– manual labels
Zoltán Szabó
Face alignment: formulation
Image: d .
Landmark locations ( p ): x = [ x
1
; y
1
; . . .
; x p
; y p
] ∈ R
2 p .
Feature extraction function (SIFT): h .
Features around the landmarks: h ( x ; d ) ∈ R
128 p
.
Task: for given x
0 g (∆ x ) = f ( x
0
+ ∆ x ) = k h ( x
0
+ ∆ x ; d ) − φ
∗ k
2
2
→ min
∆ x
,
φ
∗
= h ( x
∗
; d ) .
Zoltán Szabó
Face alignment
Task: f ( x
0
+ ∆ x ) = k h ( x
0
+ ∆ x ; d ) − φ
∗ k
2
2
→ min
∆ x
.
By the Newton trick (+chain rule):
∆ x
1
= − H −
1
( x
0
) J f
( x
0
) = − 2 H −
1
( x
0
) J
T h
( x
0
)( φ
0
− φ
∗
) ,
φ
0
= h ( x
0
; d ) ,
φ
∗
= h ( x
∗
; d ) .
Zoltán Szabó
Face alignment
Task: f ( x
0
+ ∆ x ) = k h ( x
0
+ ∆ x ; d ) − φ
∗ k
2
2
→ min
∆ x
.
By the Newton trick (+chain rule):
∆ x
1
= − H −
1
( x
0
) J f
( x
0
) = − 2 H −
1
( x
0
) J
T h
( x
0
)( φ
0
− φ
∗
) ,
φ
0
= h ( x
0
; d ) ,
φ
∗
= h ( x
∗
; d ) .
Idea [ H := H ( x
0
) , J h
:= J h
( x
0
) ]:
∆ x
1
= − 2 H −
1
J
T h
( φ
0
− φ
∗
) = h
− 2 H −
1
J
T h i
φ
0
+ h
2 H −
1
J
T h
φ
∗ i
= R
0
φ
0
+ b
0
⇒ optimize for ( R
0
, b
0
) based on samples .
Zoltán Szabó
Face alignment
The algorithm is unlikely to converge in 1 iteration.
Cascade of regressors:
∆ x
1
= R
0
φ
0
+ b
0
,
−− − −
∆ x k
= R k
−
1
φ k
−
1
+ b k
−
1
, where
φ k
−
1
= h ( x k
−
1
; d ) : features at the previous landmarks .
Zoltán Szabó
Face alignment: optimization ( k
=
0)
Given: set of images: { d i } N i = 1 initial estimates: { x i
0
, hand-labelled landmarks: { x i
∗
} N i = 1
⇒
} N i = 1
,
Optimal updates, extracted features:
∆ x i
∗
0
= x i
∗
− x i
0
, φ i
0
= h x i
0
; d i
.
Objective:
J ( R
0
, b
0
) =
1
N
N
X i = 1
∆ x i
∗
0
− R
0
φ i
0
− b
0
2
2
→ min
R
0
, b
0
.
Zoltán Szabó
Face alignment: optimization (general k )
Update the landmark estimates ( x k
):
∆ x i k
= R k
−
1
φ i k
−
1
+ b k
−
1
( i = 1 , . . . ,
Compute optimal updates ( ∀ i ), extract features:
N ) .
∆ x i
∗ k
= x i
∗
− x i k
, φ i k
= h x i k
; d i
.
Objective:
J ( R k
, b k
) =
1
N
N
X i = 1
∆ x i
∗ k
− R k
φ i k
− b k
2
2
→ min
R k
, b k
.
Numerical experience: convergence in 4 − 5 steps.
Zoltán Szabó
Face alignment: training, testing
Training ⇒ { R k
, b k
} .
Testing: test image:
˜
, inital estimate: x
0
, extract features: φ
0
= h x
0
; ˜ , iteratively compute ∆ x k
, the features at x k
( k = 1 , . . .
):
∆ x k
= R k
−
1
φ k
−
1
+ b k
−
1
,
φ k
= h x k
; ˜ .
Zoltán Szabó
Facial feature detection: 2 ”face in the wild” datasets
Last row: 10 worst cases.
Zoltán Szabó
Facial feature detection: cumulative error distribution curves
Zoltán Szabó
Facial feature tracking
Initialization = landmark estimate from the previous frame.
(a): average RMSE-s on 29 videos, (b): RMSE=5 .
03 demo.
Zoltán Szabó
Summary
Focus: continuous nonlinear optimization.
Newton’s method: expensive.
Idea: supervised Newton method, learn cascade of affine regressors based on samples.
Application: facial feature detecion, face tracking.
Zoltán Szabó
Thank you for the attention!
Zoltán Szabó