Manifold Learning in the Wild
A New Manifold Modeling and Learning Framework for
Image Ensembles
Aswin C. Sankaranarayanan
Rice University
Richard G. Baraniuk
Chinmay Hegde
Sriram Nagaraj
Sensor Data Deluge
Concise Models
• Efficient processing / compression requires
concise representation
• Our interest in this talk:
Collections of images
Concise Models
• Our interest in this talk:
Collections of images
parameterized by q ∈ Q
– translations of an object
q: x-offset and y-offset
– wedgelets
q: orientation and offset
– rotations of a 3D object
q: pitch, roll, yaw
• Image articulation manifold
Image Articulation Manifold
• N-pixel images: I_q ∈ ℝ^N
• K-dimensional articulation space Q
• Then M = { I_q : q ∈ Q } is a K-dimensional manifold
in the ambient space ℝ^N
• Very concise model
[Figure: manifold M in ℝ^N over the articulation parameter space Q]
Smooth IAMs
• N-pixel images: I_q ∈ ℝ^N
• Local isometry:
image distance ≈ parameter space distance
• Linear tangent spaces
are close approximations
locally
• Low-dimensional
articulation space
[Figure: IAM with a local tangent plane over the articulation parameter space]
Ex: Manifold Learning
• Methods: LLE, ISOMAP, LE, HE, diff. geo., …
• K = 1: rotation

Ex: Manifold Learning
• K = 2: rotation and scale
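To make the learning step concrete, here is a minimal sketch using scikit-learn's Isomap; the data shapes and neighborhood size are placeholder assumptions, not values from the talk.

```python
# Minimal manifold-learning sketch (assumed data shapes): embed an
# ensemble of N-pixel images into K dimensions with Isomap.
import numpy as np
from sklearn.manifold import Isomap

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 32 * 32))   # stand-in for 200 flattened images

embedding = Isomap(n_neighbors=10, n_components=2)  # K = 2 (e.g., rotation, scale)
Y = embedding.fit_transform(X)   # row i ~ recovered articulation of image i
```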
Theory/Practice Disconnect
Smoothness
• Practical image manifolds are not smooth!
• If images have sharp edges, then manifold is
everywhere non-differentiable [Donoho and Grimes]
Tangent approximations?
Failure of Tangent Plane Approx.
• Ex: cross-fading when synthesizing / interpolating
images that should lie on the manifold
[Figure: two input images; geodesic vs. linear path between them]
Theory/Practice Disconnect
Isometry
• Ex: translation manifold:
all blue images are equidistant
from the red image
• Local isometry
– satisfied only when sampling is dense
[Plot: Euclidean distance vs. translation offset (0 to 100 pixels)]
Theory/Practice Disconnect
Nuisance articulations
• Unsupervised data invariably has additional,
undesired articulations
– Illumination
– Background clutter, occlusions, …
• Image ensemble is no longer low-dimensional
Image representations
• Conventional representation for an image
– A vector of pixels
– Inadequate!
• Remainder of the talk
– TWO novel image representations that alleviate the
theoretical/practical challenges in manifold learning on
image ensembles
Transport operators for image
manifolds
Beyond the point cloud model for image
manifolds
The concept of
transport operators
A transport operator f maps the reference image to an articulated image: I_{q_ref} → I_q
Example: I_q(x) = I_{q_ref}(f(x))
Example: Translation
• 2D translation manifold (code sketch below):
I_1(x) = I_0(x + D_1),   f_1(x) = x + D_1
• Set of all transport operators = ℝ²
• Beyond a point cloud model
– Action of the articulation is more accurate and meaningful
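A minimal sketch of this translation transport in code, on a toy image; the offsets and interpolation order are illustrative assumptions.

```python
# Translation transport operator f1(x) = x + D1, applied to a toy image.
import numpy as np
from scipy.ndimage import shift

I0 = np.zeros((64, 64))
I0[24:40, 24:40] = 1.0          # toy "object"

D1 = np.array([5.0, -3.0])      # hypothetical (row, col) offset
# scipy's shift computes out(x) = in(x - s), so pass -D1 to get I1(x) = I0(x + D1)
I1 = shift(I0, -D1, order=1)
```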
Optical Flow
• Generalizing this idea: pixel correspondences
between I1 and I2
• Idea: use the optical flow (OF) from I1 to I2
– OF between two images is a natural
and accurate transport operator
(Figures from Ce Liu’s optical flow page)
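A sketch of estimating such a transport operator: the talk's results use Ce Liu's variational optical-flow code, and OpenCV's Farneback estimator below is only a stand-in (file names are hypothetical).

```python
# Sketch: a dense optical-flow transport operator from I1 to I2.
import cv2

I1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)   # hypothetical files
I2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

flow = cv2.calcOpticalFlowFarneback(
    I1, I2, None,
    pyr_scale=0.5, levels=3, winsize=15,
    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
# flow[y, x] = (dx, dy): where pixel (x, y) of I1 moves to in I2
```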
Optical Flow Transport
• Consider a reference image I_{q0}
and a K-dimensional articulation
• Collect optical flows from I_{q0}
to all images reachable by a
K-dimensional articulation
• Collection of OFs is a smooth, K-dimensional manifold
(even if the IAM is not smooth)
for a large class of articulations
[Figure: OFM at I_{q0} above the IAM; images I_{q0}, I_{q1}, I_{q2} at articulations q0, q1, q2]
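A sketch of assembling the OFM point cloud at the reference image; the flow fields below are random placeholders for estimates computed as in the previous sketch.

```python
# Sketch: the OFM at the reference image as a point cloud. Each optical
# flow I_{q0} -> I_{qi} is flattened into one row.
import numpy as np

H, W, M = 64, 64, 20
rng = np.random.default_rng(1)
flows = [rng.standard_normal((H, W, 2)) for _ in range(M)]  # placeholder flows

F = np.stack([f.ravel() for f in flows])  # M points on the OFM at I_{q0}
```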
OFM is Smooth
[Figure: pixel intensity I(q) and optical flow v_x(q), in pixels, at three points, as the rotation articulation q varies over [-80°, 80°]; the intensity traces are non-smooth while the flow is nearly linear]
Main results
• Local model at each I_{q0}: the OFM at I_{q0}
• Each point on the OFM
defines a transport operator
– Each transport operator maps I_{q0}
to one of its neighbors
• For a large class of
articulations, OFMs are
smooth and locally
isometric
– Traditional manifold processing
techniques work on OFMs
[Figure: OFM at I_{q0} above the IAM; articulations q0, q1, q2]
Linking it all together
• Nonlinear dimensionality reduction maps the OFM at I_{q0}
to low-dimensional coordinates (e0, e1, e2)
• The non-differentiability does not
disappear: it is embedded in
the mapping from the OFM to the IAM.
However, this is a known map:
{I_ref, f_q} → I_q such that
I_q(x) = I_ref(f_q(x))
[Figure: OFM at I_{q0}, its embedding, and the IAM with articulations q0, q1, q2]
The Story So Far…
[Figure: linear tangent space at I_{q0} on the IAM vs. the (nonlinear) OFM at I_{q0}; articulations q0, q1, q2]
[Figure: synthesis between two input images: the linear path on the IAM cross-fades, while the OFM path follows the geodesic]
OFM Manifold Learning
• Data: 196 images of two
bears moving linearly
and independently
• Task: find a low-dimensional
embedding
[Figure: embeddings using IAM vs. OFM distances]
OFM ML + Parameter Estimation
• Data: 196 images of
a cup moving
on a plane
• Task 1: find a low-dimensional
embedding
• Task 2: parameter estimation
for new images
(tracing an “R”)
[Figure: estimated paths using IAM vs. OFM distances]
Karcher Mean
• Point on the manifold that minimizes the sum of
squared geodesic distances to every other point
• Important concept in nonlinear data modeling,
compression, shape analysis
[Srivastava et al]
[Figure: 10 images from an IAM; ground truth KM vs. OFM KM vs. linear KM]
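A sketch of the fixed-point Karcher mean iteration; the unit sphere stands in for the manifold because its Exp/Log maps are closed form. On an OFM the same iteration runs with flow-based geodesics.

```python
# Karcher-mean sketch on the unit sphere (illustrative stand-in manifold).
import numpy as np

def log_map(mu, x):                # tangent vector at mu pointing toward x
    theta = np.arccos(np.clip(mu @ x, -1.0, 1.0))
    v = x - (mu @ x) * mu
    n = np.linalg.norm(v)
    return np.zeros_like(mu) if n < 1e-12 else theta * v / n

def exp_map(mu, v):                # geodesic step from mu along tangent v
    n = np.linalg.norm(v)
    return mu if n < 1e-12 else np.cos(n) * mu + np.sin(n) * (v / n)

def karcher_mean(points, iters=50):
    mu = points[0]
    for _ in range(iters):         # mu <- Exp_mu(average of Log_mu(x_i))
        mu = exp_map(mu, np.mean([log_map(mu, x) for x in points], axis=0))
    return mu
```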
Sparse keypoint-based image
representation
Image representations
• Conventional representation for an image
– A vector of pixels
– Inadequate!
Image representations
• Replace vector of pixels with an abstract
bag of features
– Ex: SIFT (Scale Invariant Feature Transform) selects
keypoint locations in an image and computes
keypoint descriptors for each keypoint
– Feature descriptors are local; it is very easy to make them
robust to nuisance imaging parameters
Loss of Geometrical Info
• Bag of features representations hide
potentially useful image geometry
[Figure: from image space to keypoint space]
• Goal: make salient
image geometrical info
more explicit for
exploitation
Key idea
• Keypoint space can be endowed with a rich
low-dimensional structure in many situations
• Mechanism: define kernels K_loc and K_desc
between keypoint locations and between keypoint descriptors
Keypoint Kernel
• Joint keypoint kernel between two images: sum the
location and descriptor kernels over all keypoint pairs (sketch below),
K(I1, I2) = Σ_{i,j} K_loc(x_i, y_j) · K_desc(f_i, g_j)
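A plausible instantiation of such a joint kernel (assumed form: Gaussian on locations, inner product on descriptors, summed over all keypoint pairs; sigma is illustrative).

```python
# Joint keypoint kernel sketch: Gaussian on locations x inner product on
# descriptors, summed over all keypoint pairs.
import numpy as np

def keypoint_kernel(locs1, desc1, locs2, desc2, sigma=10.0):
    # locs: (n, 2) keypoint locations; desc: (n, d) descriptors (e.g., SIFT)
    d2 = ((locs1[:, None, :] - locs2[None, :, :]) ** 2).sum(-1)
    k_loc = np.exp(-d2 / (2.0 * sigma ** 2))   # Gaussian on locations
    k_desc = desc1 @ desc2.T                   # inner product on descriptors
    return float((k_loc * k_desc).sum())       # sum over all keypoint pairs
```

Note that no combinatorial matching of keypoints is required; every pair simply contributes to the sum.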
Keypoint Geometry
Theorem: Under the metric induced by the kernel,
certain ensembles of articulating images form
smooth, isometric manifolds
• In contrast: the conventional approach to image fusion
via image articulation manifolds (IAMs) is fraught
with non-differentiability (due to sharp image edges)
– not smooth
– not isometric
Application: Manifold Learning
2D Translation
[Figure: embeddings of the 2D translation ensemble: IAM vs. KAM]
Manifold Learning in the Wild
• Rice University’s Duncan Hall Lobby
– 158 images
– 360° panorama using handheld camera
– Varying brightness, clutter
Manifold Learning in the Wild
• Duncan Hall Lobby
• Ground truth obtained using state-of-the-art
structure-from-motion software
[Figure: embeddings: ground truth vs. IAM vs. KAM]
Internet-scale imagery
• Notre Dame cathedral
– 738 images collected from Flickr
– Large variations in illumination
(night/day/saturation), clutter (people,
decorations), and camera parameters
(focal length, field of view, …)
– Non-uniform sampling of the space
Organization
• k-nearest neighbors
Organization
• “Geodesics”: “zoom out”, “walk closer”, 3D rotation
Summary
• Need for novel image representations
– Transport operators
→ enables differentiability and meaningful transport
– Sparse features
→ robustness to outliers, nuisance articulations, etc.
→ learning in the wild: unsupervised imagery
• True power of manifold signal processing lies in fast
algorithms that mainly use neighbor relationships
– What are compelling applications where such methods can
achieve state-of-the-art performance?
Summary
• IAMs are a useful concise model for many image
processing problems involving image collections
and multiple sensors/viewpoints
• But practical IAMs are non-differentiable
– IAM-based algorithms have not lived up to their promise
• Optical flow manifolds (OFMs)
– smooth even when IAM is not
– OFM ~ nonlinear tangent space
– support accurate image synthesis, learning, charting, …
• Barely discussed here: OF enables the safe
extension of differential geometry concepts
– Log/Exp maps, Karcher mean, parallel transport, …
Summary: Rice U
• Bag of features representations pervasive in
image information fusion applications (ex: SIFT)
• Progress to date:
– Bags of features can contain rich geometrical structure
– Keypoint distance very efficient to compute
(contrast with the combinatorial complexity of feature
correspondence algorithms)
– Keypoint distance significantly outperforms standard image
Euclidean distance in learning and fusion applications
– Punch line: KAM preferred over IAM
• Current/future work:
– More exhaustive experiments with occlusion and clutter
– Studying geometrical structure of feature representations
of other kinds of non-image data (ex: documents)
Open Questions
• Our treatment is specific to
image manifolds under
brightness constancy
• What are the natural transport operators for
other data manifolds?
dsp.rice.edu
Related Work
• Analytic transport operators
– transport operator has group structure [Xiao and Rao 07]
[Culpepper and Olshausen 09] [Miller and Younes 01] [Tuzel et al 08]
– non-linear analytics [Dollar et al 06]
– spatio-temporal manifolds [Li and Chellappa 10]
– shape manifolds [Klassen et al 04]
• Analytic approach limited to a small class of standard
image transformations
(ex: affine transformations, Lie groups)
• In contrast, OFM approach works reliably with
real-world image samples (point clouds) and
broader class of transformations
Limitations
• Brightness constancy violations
– Optical flow is no longer meaningful
• Occlusion
– Undefined pixel flow in theory, arbitrary flow estimates in
practice
– Heuristics to deal with it
• Changing backgrounds etc.
– Transport operator assumption too strict
– Sparse correspondences?
Occlusion
• Detect occlusion using forward-backward flow
reasoning
[Figure: occluded region flagged by forward-backward flow]
• Remove occluded pixel computations
• Heuristic: formal occlusion handling is hard (a sketch of the forward-backward check follows)
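A sketch of this forward-backward check; the (dy, dx) flow layout and the threshold are assumptions.

```python
# A pixel is flagged occluded when following the forward flow and then
# the backward flow does not return near the start.
import numpy as np
from scipy.ndimage import map_coordinates

def occlusion_mask(fwd, bwd, tol=1.0):
    H, W, _ = fwd.shape
    yy, xx = np.mgrid[0:H, 0:W].astype(float)
    ty, tx = yy + fwd[..., 0], xx + fwd[..., 1]    # landing points under fwd
    by = map_coordinates(bwd[..., 0], [ty, tx], order=1)
    bx = map_coordinates(bwd[..., 1], [ty, tx], order=1)
    err = np.hypot(fwd[..., 0] + by, fwd[..., 1] + bx)  # round-trip residual
    return err > tol                                # True where occluded
```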
Open Questions
• Theorem: random measurements stably embed a
K-dim manifold w.h.p. [Baraniuk and Wakin, FOCM ’08]
• Q: Is there an analogous
result for OFMs?
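A sketch of the theorem's setting: a Gaussian measurement matrix approximately preserves pairwise distances; the sizes here are illustrative.

```python
# Random-measurement embedding sketch: Phi maps N-pixel images to
# M << N measurements while nearly preserving distances.
import numpy as np

N, M = 64 * 64, 200
rng = np.random.default_rng(2)
Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # random measurement matrix

x1, x2 = rng.standard_normal(N), rng.standard_normal(N)  # stand-in images
# ratio is close to 1: ||Phi (x1 - x2)|| ~ ||x1 - x2||
print(np.linalg.norm(Phi @ (x1 - x2)) / np.linalg.norm(x1 - x2))
```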
Tools for manifold processing
• Algebraic manifolds (smooth differential manifolds):
geodesics, exponential maps, log-maps,
Riemannian metrics, Karcher means, …
• Data manifolds (point cloud model):
LLE, kNN graphs
Concise Models
• Efficient processing / compression requires
concise representation
• Sparsity of an individual image
[Figure: pixel image and its few large wavelet coefficients (blue = 0)]
History of Optical Flow
• Dark ages (<1981)
– special cases solved
– linearized brightness constancy (LBC): an under-determined set of linear equations
• Horn and Schunck (1981)
– regularization term: smoothness prior on the flow
• Brox et al (2005)
– shows that linearization of brightness constancy (BC) is
a bad assumption
– develops optimization framework to handle BC directly
• Brox et al (2010), Black et al (2010), Liu et al
(2010)
– practical systems with reliable code
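For reference, a bare-bones Horn-Schunck iteration; the parameters are illustrative, not a practical system.

```python
# Horn-Schunck sketch: linearized brightness constancy Ix*u + Iy*v + It = 0
# plus a smoothness prior with weight alpha, solved by Jacobi-style updates.
import numpy as np
from scipy.ndimage import uniform_filter

def horn_schunck(I1, I2, alpha=10.0, iters=100):
    Iy, Ix = np.gradient(I1.astype(float))     # image gradients (axis 0 = y)
    It = I2.astype(float) - I1.astype(float)   # temporal derivative
    u = np.zeros_like(It); v = np.zeros_like(It)
    for _ in range(iters):
        u_bar, v_bar = uniform_filter(u, 3), uniform_filter(v, 3)
        t = (Ix * u_bar + Iy * v_bar + It) / (alpha ** 2 + Ix ** 2 + Iy ** 2)
        u, v = u_bar - Ix * t, v_bar - Iy * t
    return u, v                                # flow components along x, y
```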
Manifold Learning
2D rotations
[Figure: ISOMAP embedding error for OFM vs. IAM; reference image shown]
Manifold Learning
2D rotations
[Figure: two-dimensional Isomap embedding of the OFM, with neighborhood graph; reference image shown]
OFM Implementation details
[Figures: reference image; pairwise distances and Isomap embedding; residual variance vs. Isomap dimensionality; flow embedding]
OFM Synthesis
Manifold Charting
• Goal: build a generative model for an entire
IAM/OFM based on a small number of base images
• eLCheapo™ algorithm (sketched in code below):
– choose a reference image randomly
– find all images that can be generated from this image by OF
– compute the Karcher (geodesic) mean of these images
– compute the OF from the Karcher mean image to the other images
– repeat on the remaining images until no images remain
• Exact representation when there are no occlusions
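A hedged sketch of this loop; of_reachable, karcher_mean, and flow_to are hypothetical helpers supplied by the caller (e.g., built from the OF and Karcher-mean sketches earlier).

```python
# Greedy charting sketch: cover the ensemble with base images + flows.
import random

def elcheapo_chart(images, of_reachable, karcher_mean, flow_to):
    charts, remaining = [], list(range(len(images)))
    while remaining:
        ref = random.choice(remaining)                    # random reference
        patch = [i for i in remaining if of_reachable(images[ref], images[i])]
        base = karcher_mean([images[i] for i in patch])   # geodesic mean image
        charts.append((base, [flow_to(base, images[i]) for i in patch]))
        remaining = [i for i in remaining if i not in patch]
    return charts   # base images + flows reproduce the ensemble
```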
Manifold Charting
• Goal: build a generative model for an entire
IAM/OFM based on a small number of base images
• Ex: cube rotating about an axis. All cube images can be
represented using 4 reference images + OFMs
[Figure: optimal vs. greedy charting of the IAM over q = -180° to 180°]
• Many applications
– selection of target templates for classification
– “next-view” selection for adaptive sensing applications
q  180 
SIFT Features
• Features (including SIFT) ubiquitous in fusion and
processing apps (15k+ cites for 2 SIFT papers)
– image stitching
– organizing internet-scale databases
– building 3D models
– part-based object recognition
Figures courtesy Rob Fergus (NYU), Phototourism website, Antonio Torralba (MIT), and Wei Lu
Many Possible Kernels
• Euclidean kernel
• Gaussian kernel
• Polynomial kernel
• Pyramid match kernel [Grauman et al. ’07]
• Many others
Keypoint Kernel
• Joint keypoint kernel between two images:
K(I1, I2) = Σ_{i,j} K_loc(x_i, y_j) · K_desc(f_i, g_j)
• Using the Euclidean/Gaussian (E/G) combination yields a closed-form kernel between keypoint sets
From Kernel to Metric
Theorem: The E/G keypoint kernel is a Mercer kernel
– enables algorithms such as SVM
Theorem: The E/G keypoint kernel induces a metric
on the space of images
– alternative to conventional L2 distance between images
– keypoint metric robust to nuisance imaging parameters,
occlusion, clutter, etc.
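A sketch of the induced metric, which is just the feature-space distance computed from three kernel evaluations; it reuses the earlier keypoint_kernel sketch, with kp = (locations, descriptors).

```python
# Metric induced by a Mercer kernel K:
# d(I1, I2)^2 = K(I1, I1) + K(I2, I2) - 2 K(I1, I2).
import numpy as np

def keypoint_metric(kp1, kp2, kernel):
    k11, k22, k12 = kernel(*kp1, *kp1), kernel(*kp2, *kp2), kernel(*kp1, *kp2)
    return np.sqrt(max(k11 + k22 - 2.0 * k12, 0.0))
```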
Keypoint Geometry
NOTE: the metric requires no permutation-based
matching of keypoints
(which has combinatorial complexity in general)
Theorem: The E/G keypoint kernel induces a metric
on the space of images
Keypoint Geometry
Theorem: Under the metric induced by the kernel,
certain ensembles of articulating images form
smooth, isometric manifolds
• Keypoint representation compact, efficient, and …
• Robust to illumination variations, non-stationary backgrounds,
clutter, occlusions
Manifold Learning in the Wild
• Viewing angle: 179 images
[Figure: IAM vs. KAM embeddings]
Manifold Learning in the Wild
• Rice University’s Brochstein Pavilion
– 400 outdoor images of a building
– occlusions, movement in foreground, varying background
[Figure: IAM vs. KAM embeddings of the Brochstein Pavilion ensemble]