EM for Motion Segmentation
“Perceptually organized EM: A framework that combines information about form and motion”
“A unified mixture framework for motion segmentation: incorporating spatial coherence and estimating the number of models”
By: Yair Weiss and Edward H. Adelson.
Presenting: Ady Ecker and Max Chvalevsky.
Contents
 Motion segmentation.
 Expectation Maximization.
 EM for motion segmentation.
 EM modifications for motion segmentation.
 Summary.
Part 1: Motion Segmentation
Motion segmentation problem
 Input:
1. Sequence of images.
2. Flow vector field – output of a standard algorithm.
 Problem: find a small number of moving objects in the sequence of images.
[Figure: flow vector v with components vx, vy]
Segmentation Output
 Classification of each pixel in each image to its object.
 Full velocity field.
[Figure: flow data and the recovered velocity field]
Segmentation goal
Motion vs. static segmentation
 Combination of motion and spatial data.
An object can contain parts with different static parameters (several colors).
 An object's representation in an image can be non-contiguous when:
 There are occlusions.
 Only parts of the object are captured…
Difficulties
 Motion estimation.
 Integration versus segmentation dilemma.
 Smoothing inside the model while keeping models independent.
Motion estimation - review
 Motion cannot be estimated from local measurements only; the local measurements must be integrated.
Motion integration
 In reality there is no clear distinction between corners and lines.
Integration without segmentation
 When there are several motions, we might get false intersection points of velocity constraints at T-junctions.
Integration without segmentation
 False corners (T-junctions) introduce false dominant directions (upwards).
Contour ownership
 Most pixels inside an object supply no movement information; they simply move with the whole object.
Smoothing
 We would like to smooth information inside objects, not between objects.
Smoothness in layers
Human segmentation
 Humans perform segmentation effortlessly.
 Segmentation may be illusory.
 Tendency to prefer (and trade off):
 A small number of models.
 Slow and smooth motion.
 The segmentation depends on factors such as contrast and speed, which affect our confidence in the possible motions.
Segmentation illusion – the split herringbone
Segmentation illusion – plaids
Part 2: Expectation Maximization
Clustering
Clustering Problems
 Structure:
 Vectors in a high-dimensional space belong to (disjoint) groups (clusters, classes, populations).
 Given a vector, find its group (label).
 Examples:
 Medical diagnosis.
 Vector quantization.
 Motion segmentation.
Clustering by distance to known centers
Finding the centers from known clustering
EM: Unknown clusters and centers
 Start with random model parameters.
 Expectation step: classify each vector to the closest center.
 Maximization step: find the center (mean) of each class.
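A minimal NumPy sketch of this loop (the hard-EM / k-means special case); the helper name `hard_em` and its defaults are illustrative, not from the papers:

```python
import numpy as np

def hard_em(points, k, n_iter=20, seed=0):
    """Hard EM (k-means style): alternate classification and mean updates."""
    rng = np.random.default_rng(seed)
    # Start with random model parameters: k centers drawn from the data.
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Expectation step: classify each vector to the closest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Maximization step: find the center (mean) of each class.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return centers, labels
```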
Illustration
EM Characteristics
 Simple to program.
 Separates the iterative stage into two independent, simple stages.
 Convergence is guaranteed, to some local minimum.
 Speed and quality depend on:
 Number of clusters.
 Geometric shape of the real clusters.
 Initial clustering.
Soft EM
Each point is given a probability (weight) of belonging to each class.
 The E step: the probabilities of each point are updated according to the distances to the centers.
 The M step: class centers are computed as a weighted average over all data points.
Soft EM (cont.)
 Final E step: classify each point to the nearest (most probable) center.
 As a result:
 Points near the center of a cluster have high influence on the location of that center.
 Points near cluster boundaries have small influence on several centers.
 Convergence to local minima is avoided, as each point can softly change its group.
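A matching NumPy sketch of soft EM with isotropic Gaussian weights (names and defaults are ours, for illustration):

```python
import numpy as np

def soft_em(points, k, s=1.0, n_iter=50, seed=0):
    """Soft EM: every point carries a weight for every class."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # E step: update each point's probabilities from its distances to the centers.
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / s**2)  # stabilized exponentials
        w /= w.sum(axis=1, keepdims=True)                         # each row sums to 1
        # M step: each center is a weighted average over *all* data points.
        centers = (w[:, :, None] * points[:, None, :]).sum(axis=0) / w.sum(axis=0)[:, None]
    # Final E step: classify each point to the nearest (most probable) center.
    return centers, w.argmax(axis=1)
```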
Perceptual Organization
 Neighboring or similar points are likely to be of the same class.
 Account for this in the computation of weights by prior probabilities.
Example: Fitting 2 lines to data points
 Input: data points (x_i, y_i) that were generated by 2 lines with Gaussian noise:
y = a_1 x + b_1 + s v
y = a_2 x + b_2 + s v, \quad v \sim N(0, 1)
 Output:
 The parameters of the 2 lines.
 The assignment of each point to its line.
The E Step
 Compute residuals assuming known lines:
r_1(i) = a_1 x_i + b_1 - y_i
r_2(i) = a_2 x_i + b_2 - y_i
 Compute soft assignments:
w_k(i) = \frac{e^{-r_k^2(i)/s^2}}{e^{-r_1^2(i)/s^2} + e^{-r_2^2(i)/s^2}}, \quad k = 1, 2
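In code, this E step is a softmax over the negative squared residuals; a sketch where `lines` is a list of (a, b) pairs:

```python
import numpy as np

def e_step(x, y, lines, s):
    """Soft assignments w_k(i) from the residuals r_k(i) = a_k*x_i + b_k - y_i."""
    r = np.stack([a * x + b - y for (a, b) in lines])   # shape (2, n)
    w = np.exp(-r**2 / s**2)
    return w / w.sum(axis=0, keepdims=True)             # columns: w_1(i) + w_2(i) = 1
```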
Least-Squares review
 In the case of a single line and normal i.i.d. errors, maximum likelihood estimation reduces to least squares:
\min_{a,b} \sum_i (a x_i + b - y_i)^2 = \min_{a,b} \sum_i r_i^2
 The line parameters (a, b) are the solution of the system:
\begin{pmatrix} \sum_i x_i^2 & \sum_i x_i \\ \sum_i x_i & \sum_i 1 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum_i x_i y_i \\ \sum_i y_i \end{pmatrix}
The M Step
 In the weighted case we find
\min_{a,b} \sum_i w_1(i) r_1^2(i) + \sum_i w_2(i) r_2^2(i)
 The weighted least-squares system is solved twice, once for (a_1, b_1) and once for (a_2, b_2):
\begin{pmatrix} \sum_i w_k(i) x_i^2 & \sum_i w_k(i) x_i \\ \sum_i w_k(i) x_i & \sum_i w_k(i) \end{pmatrix} \begin{pmatrix} a_k \\ b_k \end{pmatrix} = \begin{pmatrix} \sum_i w_k(i) x_i y_i \\ \sum_i w_k(i) y_i \end{pmatrix}, \quad k = 1, 2
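The matching M step solves the 2x2 weighted normal equations once per model (a sketch continuing `e_step` above; alternating the two steps until the assignments settle completes the algorithm):

```python
import numpy as np

def m_step(x, y, w):
    """Weighted least squares per model: returns [(a_1, b_1), (a_2, b_2)]."""
    lines = []
    for wk in w:                                   # wk = w_k(i), one row per model
        A = np.array([[np.sum(wk * x**2), np.sum(wk * x)],
                      [np.sum(wk * x),    np.sum(wk)]])
        rhs = np.array([np.sum(wk * x * y), np.sum(wk * y)])
        lines.append(tuple(np.linalg.solve(A, rhs)))
    return lines
```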
Illustrations
Illustration
Estimating the number of models
 In the weighted scenario, additional models will not necessarily reduce the total error.
 The optimal number of models is a function of the s parameter – how well we expect the model to fit the data.
 Algorithm: start with many models; redundant models will collapse.
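One way to realize “redundant models will collapse” in the line-fitting sketch above (a heuristic of ours, not necessarily the paper's exact criterion) is to drop models whose total ownership becomes negligible:

```python
def prune_models(lines, w, min_ownership=1.0):
    # Keep only models that still "own" a non-negligible share of the points.
    kept = [k for k, wk in enumerate(w) if wk.sum() > min_ownership]
    return [lines[k] for k in kept]
```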
Illustration
[Plot: l = log(likelihood)]
Part 3: EM for Motion Segmentation
Segmentation of image motion: Input
Products of the image sequence:
 Local flow – output of a standard algorithm.
 Pixel intensities and color.
 Pixel coordinates.
 Static segmentation:
 Based on the same local data.
 Problematic, as explained before.
Segmentation output
 Segmentation.
 Models: ‘blue’ model and ‘red’ model.
Notations
 r - pixel.
 O(r) - flow vector at pixel r.
 k - model id.
 \theta_k - parameters of model k.
 v_k(r) - velocity predicted by model k at location r.
 D_k(r) = D(r, \theta_k) - distance measure.
 s - expected noise variance.
 g_k(r) - probability that pixel r is a member of model k.
Segmentation output
 Segmented O: O(r) at each pixel r.
 Model parameters: \theta_blue and \theta_red, with predicted velocity fields v_blue(r) and v_red(r).
The E Step
 Purpose: determine a probabilistic classification of every pixel to the models:
g_k(r) = \frac{\pi_k \exp(-D_k^2(r)/s^2)}{\sum_j \pi_j \exp(-D_j^2(r)/s^2)}
 \pi_k(r) - prior probability granted to model k.
 For classical EM, \pi_k(r) are equal for all k.
The E Step (cont)
 Alternative representation:
g(r) = \mathrm{softmin}(D_1^2(r)/s^2, D_2^2(r)/s^2, \ldots)
 The soft decision enables slow convergence to a better minimum, instead of getting trapped in a poor local minimum.
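A vectorized sketch of this E step (our helper; `D2` holds D_k^2(r) per model and `pi` the priors \pi_k(r)):

```python
import numpy as np

def e_step_motion(D2, pi, s):
    """g_k(r): prior-weighted softmin over the scaled distances D_k^2(r)/s^2."""
    z = D2 / s**2
    z -= z.min(axis=0, keepdims=True)        # the softmin trick: stabilize the exponentials
    g = pi * np.exp(-z)
    return g / g.sum(axis=0, keepdims=True)  # normalize over the models
```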
Distance measure functionality
 Correct physical interpretation of the motion data.
 If possible – enable an analytic solution.
Distance measures (1)
 Optic flow constraint:
D_k^2(r) = \sum_{s \in \Omega} \omega_{rs} \left( v_k(r) \cdot \frac{\partial I}{\partial r}(s) + \frac{\partial I}{\partial t}(s) \right)^2
 \Omega – window centered at r; \omega_{rs} – window weights.
 v_k(r) – velocity of model k at location r.
 Quadratic; provides a closed-form MLE solution for the M step.
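For a purely translational model, v_k(r) is a constant (vx, vy) and the distance reduces to one convolution; a sketch assuming precomputed image derivatives Ix, Iy, It (hypothetical helper):

```python
import numpy as np
from scipy.ndimage import convolve

def flow_distance(Ix, Iy, It, v, window):
    """D_k^2(r) for a translational model v = (vx, vy):
    windowed sum of squared optic-flow residuals v . grad(I) + I_t."""
    resid2 = (Ix * v[0] + Iy * v[1] + It) ** 2
    # Convolving with the window weights sums the residuals around each pixel r.
    return convolve(resid2, window, mode='nearest')
```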
Distance measures (2)
 Deviation from constant intensity:
D_k^2(r) = \sum_{s \in \Omega} \omega_{rs} \left( I(s + v_k(r) \Delta t,\, t + \Delta t) - I(s, t) \right)^2
 \Omega – window centered at r.
 Good for high-speed motion.
 Resolved by successive linearizations.
The M step
 Purpose: layer optimization (according to the soft classification of the pixels):
\theta_k = \arg\min_{\theta} \left[ J(\theta_k) + \sum_r g_k(r) D^2(r, \theta_k) \right]
 Produces a weighted ‘average’ of the model.
 The ‘average’ depends on the definition of D.
 Constrained by J (slow & smooth motion).
J (cost) definition
 For loosely constrained \theta (typical for image segmentation):
J(\theta) = \sum_{x,y} \sum_{n=0}^{\infty} a_n \left( \frac{\partial^n \theta}{\partial x^n} \right)^2
 For highly constrained \theta (#degrees of freedom < #owned pixels):
J(\theta) = 0
EM: Unknown clusters and centers
 Start with random model parameters.
 Expectation step: classify each vector to the closest center.
 Maximization step: find the center (mean) of each class.
Natural image processing without segmentation
a) Frame of a movie taken from a driving car.
b) Flow data along the dotted line.
c) Smooth global approximation of the motion (along the line).
EM natural image processing
a) The same picture.
b) Rigid-like model segmentation.
c) EM result.
Textureless Regions
 Homogeneous regions have no clear layer preference (they stay gray in the ownership plots).
 Wrong segmentation decisions for “similar” motions (squares example).
 Probabilistic resolution of ambiguities (bars example).
Illustration
 Bars: probabilistic solution, vertical vs. horizontal.
 2 squares moving diagonally right:
 No segmentation for the vertical lines.
 No segmentation for the background.
 Motion directions identified correctly.
 Hand: noisy segmentation.
Energy formulation of EM
E_{eff}(\theta, g) = \sum_{r,k} g_k(r) D_k^2(r) + s^2 \sum_{r,k} g_k(r) \log g_k(r)
 E step: optimization with respect to g_k(r).
 The first term prefers a hard decision.
 The second term (entropy) prefers no decision.
 M step: optimization with respect to \theta (embedded in D).
Part 4: EM Modifications for Motion Segmentation
Proposed modifications
 POEM – Perceptually Organized EM: combines local & neighbor motion with static analysis.
 Regional grouping.
 Color & intensity data.
 Contour ownership.
 Outlier detection:
 T-junction points.
 Statistical outliers.
POEM algorithm idea
 Determine the segmentation based on:
 Local pixel flow (standard EM).
 Neighbor pixel segmentation.
 Static data (optionally).
 Reason:
 Neighboring pixels have a higher probability of belonging to the same object.
 Similar pixels have an even higher probability of belonging to the same object.
PO
 Window of influence:
w(r, s) = \exp\left( -\frac{\|r - s\|^2}{s_1^2} - \frac{(I(r) - I(s))^2}{s_2^2} \right)
 Neighbor votes:
V_k(r) = \sum_s g_k(s) \, w(r, s)
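A sketch of the votes on a regular pixel grid, looping over a small set of neighbor offsets (our helper; borders wrap for brevity):

```python
import numpy as np

def neighbor_votes(g, I, offsets, s1, s2):
    """V_k(r) = sum_s g_k(s) w(r, s); g has shape (K, H, W), I has shape (H, W)."""
    V = np.zeros_like(g)
    for dy, dx in offsets:                      # e.g. [(-1, 0), (1, 0), (0, -1), (0, 1)]
        gs = np.roll(g, (dy, dx), axis=(1, 2))  # g_k at the neighbor offset by (dy, dx)
        Is = np.roll(I, (dy, dx), axis=(0, 1))  # intensities at the same neighbor
        w = np.exp(-(dy**2 + dx**2) / s1**2 - (I - Is)**2 / s2**2)
        V += gs * w[None]                       # broadcast w over the K models
    return V
```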
POE step
 Basic equation:
g_k(r) = \frac{\pi_k \exp(-D_k^2(r)/s^2)}{\sum_j \pi_j \exp(-D_j^2(r)/s^2)}
 \pi_k estimation:
\hat{\pi}_k(r) = \frac{\exp(V_k(r))}{\sum_j \exp(V_j(r))}
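Combining the two equations gives the modified E step (a sketch reusing `neighbor_votes` above):

```python
import numpy as np

def poe_step(D2, V, s):
    """g_k(r) with priors pi_hat_k(r) derived from the neighbor votes V_k(r)."""
    pi = np.exp(V - V.max(axis=0, keepdims=True))   # stabilized softmax over the models
    pi /= pi.sum(axis=0, keepdims=True)             # pi_hat_k(r)
    g = pi * np.exp(-(D2 - D2.min(axis=0, keepdims=True)) / s**2)
    return g / g.sum(axis=0, keepdims=True)
```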
POE step alternative representation
g(r) = \mathrm{softmin}(D_1^2(r)/s^2 - V_1(r),\; D_2^2(r)/s^2 - V_2(r),\; \ldots)
 The solution is computationally intensive.
M step in POEM
 The M step is unchanged:
\theta_k = \arg\min_{\theta} \left[ J(\theta_k) + \sum_r g_k(r) D^2(r, \theta_k) \right]
POEM Energy formulation
E_{eff}(\theta, g) = \sum_{r,k} g_k(r) D_k^2(r) + s^2 \sum_{r,k} g_k(r) \log g_k(r) - \sum_{r,k} \sum_{s \neq r} w(r, s) g_k(r) g_k(s)
 PO is represented by the additional (last) term.
Contour ownership
 Implemented by modifying the PO function:
 Step 1: preliminary segmentation & border detection.
 Step 2: contour ownership determination – by relative depth, consistent with T-junctions.
 Step 3: combining in the voting procedure.
 Equation modification: the window of influence between pixels, w(r, s), gives additional weight to pixels on a segment's borders.
Results of POEM
Advantages:
 Resolves regions without texture by propagating information from the borders.
 More robust to noise.
Illustration
 2 squares moving diagonally right: partially correct segmentation for the vertical lines.
 Moving hand: smooth solution.
Bars results
 Input.
 Classical EM segmentation.
 POEM segmentation without contour ownership.
 POEM segmentation with contour ownership.
Flow outliers
 Segmentation into k layers plus an additional layer of “outliers”.
 The probability of being an outlier is a function of:
 Prior – e.g. for T-junctions.
 Likelihood – the likelihood of being an outlier.
 Outliers don't participate directly in PO.
 The outlier layer is neither smooth nor slow.
Part 5: Summary
Advantages
 The system showed relatively good results:
 For natural images.
 For artificial, challenging images.
 Simple – has few parameters.
 Universal.
 Modular – enables improvements:
 Utilizing additional data (mostly static).
 Optimizing parameters.
 Using advanced convergence methods.
 Altering priors to fit non-symmetric biological phenomena.
Drawbacks
 Some images weren't resolved completely:
 Edge deviations in the ‘hand’ image.
 The system includes an input-dependent s, and no process to determine the value of s was proposed.
 The ‘optic flow constraint’ measure is appropriate only for instantaneous motion.
 Other distance measures are much more difficult to solve.
Conclusions
 The system tackles ambiguity & noise.
 It estimates the degree of ambiguity.
 It assumes slow & smooth motions.
 The system can explain its input by segmentation into separate layers of motion.
 It exploits static data to improve the segmentation.