Lecture 3&4 Modelling vision problems: Applications & Insights Carsten Rother

Comments and corrections from last lecture
All slides are now online:
http://research.microsoft.com/en-us/um/cambridge/projects/tutorials/course5F16/
Prior – 4x4 Grid
Pure prior model (as written last lecture): P(x) = 1/f ∏_{i,j ∈ N4} exp{-10 |xi-xj|}
It was incorrect; it should read:
P(x) = 1/f ∏_{i,j ∈ N4} 10^{-|xi-xj|} = 1/f ∏_{i,j ∈ N4} exp{-2.3 |xi-xj|}
[Figure: probability distribution over all 2^16 configurations, and samples drawn from it]
Recap: Decision Theory
Assume w has been learned and P(x|z,w) is:
[Figure: P(x|z,w) over all 2^16 configurations; best and worst solutions sorted by probability]
Which solution x* would you choose?
Recap: How to make a decision
Assume model P(x|z,w) is known
Goal: Choose x* which minimizes the risk R
x* = argmin_x R(x)
Risk R is the expected loss:
R(x) = ∑_{x'} P(x'|z,w) Δ(x',x),   where Δ is the "loss function"
Recap: Decision Theory
[Figure: best and worst solutions sorted by probability]
Risk: R(x) = ∑_{x'} P(x'|z,w) Δ(x',x)
0/1 loss: Δ(x,x') = 0 if x'=x, 1 otherwise
x* = argmin_x R(x) = argmin_x ∑_{x'} P(x'|z,w) Δ(x',x) = argmin_x [1 - P(x|z,w)]
Therefore: x* = argmax_x P(x|z,w), which is the MAP solution
Question from last lecture:
Loss-based learning and Marginals
Minimize R = 1/|T| ∑_t Δ(xt, x*t)
Search with Hamming loss (MAP):       x* = argmax_x P(x|z,w)   (MAP)
Search with Hamming loss (Marginals): x*i = argmax_{xi} P(xi|z,w)   (marginals)
Testing:
1. Hamming loss, MAP (w=0.1):       Hamming error 0.093
2. Hamming loss, Marginals (w=0.4): Hamming error 0.1039
Big Picture: Statistical Models in Computer Vision
Model:
• discrete or continuous variables?
• discrete or continuous space?
• dependence between variables?
• …

Applications:
• 2D/3D image segmentation
• Object recognition
• 3D reconstruction
• Stereo matching
• Image denoising
• Texture synthesis
• Pose estimation
• Panoramic stitching
• …

Optimisation/Prediction/Inference:
• Combinatorial optimization: e.g. graph cut
• Message passing: e.g. BP, TRW
• Iterated Conditional Modes (ICM)
• LP relaxation: e.g. cutting-plane
• Problem decomposition + subgradient
• …

Learning:
• Maximum likelihood learning
  – Pseudo-likelihood approximation
• Loss-minimizing parameter learning
  – Exhaustive search
  – Constraint generation
  – …
Graphical models
Write probability distribution as a Graphical model:
- Directed graphical model (also called Bayesian Networks)
- Undirected graphical model (also called Markov Random Fields)
- Factor graph (most explicit representation)
Basic idea:
• A model represents a family of distributions
• The key concept is conditional independence
References:
- Pattern Recognition and Machine Learning [Bishop ‘08, chapter 8]
- several lectures at the Machine Learning Summer School 2009 (see video lectures)
- ML Lecture
Factor Graph: formal definition
P(x) = 1/Z ∏_{F ∈ 𝔽} ΨF(x_N(F))   with   Z = ∑_x ∏_{F ∈ 𝔽} ΨF(x_N(F))

Z: partition function
F: factor
𝔽: set of all factors
N(F): neighbourhood of a factor
ΨF: a function (not a distribution) depending on x_N(F)

Can be written as a Gibbs distribution (last lecture):
We set ΨF(x_N(F)) = exp{-θF(x_N(F))}, so that
P(x) = 1/f exp{-E(x)}   with energy   E(x) = ∑_{F ∈ 𝔽} θF(x_N(F))
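To make the definition concrete, here is a minimal Python sketch (my own illustration, not from the lecture): it enumerates all configurations of a tiny binary factor graph, computes the partition function Z by brute force, and evaluates P(x) = 1/Z exp{-E(x)}. The factors are hypothetical toy potentials; brute-force enumeration is only feasible for such tiny models.

import itertools
import math

# Toy binary factor graph over x1..x4 with pairwise factors (hypothetical potentials)
def theta(xi, xj, w=1.0):
    return w * abs(xi - xj)          # encourages neighbouring variables to agree

factors = [(0, 1), (1, 2), (2, 3)]   # neighbourhoods N(F) of the pairwise factors

def energy(x):
    return sum(theta(x[i], x[j]) for i, j in factors)

# Partition function by exhaustive enumeration over all 2^4 configurations
configs = list(itertools.product([0, 1], repeat=4))
Z = sum(math.exp(-energy(x)) for x in configs)

def prob(x):
    return math.exp(-energy(x)) / Z

print(Z, prob((0, 0, 0, 0)), prob((0, 1, 0, 1)))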
Example - Factor Graphs
P(x) ~ exp{-E(x)}   with   E(x) = θ1(x1,x2,x3) + θ2(x2,x4) + θ3(x3,x4) + θ4(x3,x5)
"Gibbs distribution with 4 factors"
[Figure: factor graph over the unobserved variables x1,…,x5; each factor node connects the variables that appear together in that factor]
Definitions
E(x) = θ1(x1,x2,x3) + θ2(x2,x4) + θ3(x3,x4) + θ4(x3,x5)
[Figure: the factor graph over x1,…,x5 with order 3; the triple factor has arity 3, the pairwise factors arity 2]
Definitions:
• Order: the arity (number of variables) of the largest factor
• Markov Random Field: random field with low-order factors
• Some people use "cliques" for "factors" (not the same, since a clique may or may not be decomposable into smaller factors)
Comment: Undirected Graphical models
[Figure: undirected model with a triple clique over x1, x2, x3]
Undirected model:
E(x) = θ1(x1,x2,x3) + θ2(x1,x2) + θ3(x2,x3) + θ4(x1,x3) + θ5(x1) + θ6(x2) + θ7(x3)

Possible factor graph (with a triple factor):
E(x) = θ1(x1,x2,x3) + θ2(x2)

Possible factor graph (pairwise factors only):
E(x) = θ1(x1,x2) + θ2(x2,x3) + θ3(x1,x3)
Discrete domain
Discrete label space (x∊ 𝓛)
Examples - Order
• 4-connected, pairwise MRF:   E(x) = ∑_{i,j ∈ N4} θij(xi,xj)   (order 2, "pairwise energy")
• Higher(8)-connected, pairwise MRF:   E(x) = ∑_{i,j ∈ N8} θij(xi,xj)   (order 2)
• Higher-order RF:   E(x) = ∑_{i,j ∈ N4} θij(xi,xj) + θ(x1,…,xn)   (order n, "higher-order energy")
Random field models
• 4-connected, pairwise MRF:   E(x) = ∑_{i,j ∈ N4} θij(xi,xj)   (order 2, "pairwise energy")
• Higher(8)-connected, pairwise MRF:   E(x) = ∑_{i,j ∈ N8} θij(xi,xj)   (order 2)
• Higher-order RF:   E(x) = ∑_{i,j ∈ N4} θij(xi,xj) + θ(x1,…,xn)   (order n, "higher-order energy")
Example: Image segmentation
P(x|z) ~ exp{-E(x)},   E(x): {0,1}^n → ℝ
E(x) = ∑_i θi(xi,zi) + ∑_{i,j ∈ N4} θij(xi,xj)
[Figure: factor graph with observed variables zi (image) and unobserved (latent) variables xi (segmentation)]
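A minimal sketch (my own, not from the lecture) of how such a segmentation energy could be evaluated for a given labelling; the unary term is a toy potential that assumes bright pixels are foreground, and the pairwise term is a plain Ising term over the 4-connected grid.

import numpy as np

def segmentation_energy(x, z, lam=2.0):
    # E(x) = sum_i theta_i(x_i, z_i) + sum_{i,j in N4} theta_ij(x_i, x_j)
    # x: binary label image (0 = background, 1 = foreground)
    # z: grey-value image in [0, 1]
    unary = np.where(x == 1, 1.0 - z, z).sum()          # toy data term
    pairwise = lam * (np.abs(np.diff(x, axis=0)).sum() +  # vertical N4 edges
                      np.abs(np.diff(x, axis=1)).sum())   # horizontal N4 edges
    return unary + pairwise

z = np.random.rand(5, 5)
x = (z > 0.5).astype(int)          # a candidate segmentation
print(segmentation_energy(x, z))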
Stereo matching
[Figure: left image (a), right image (b), ground-truth depth; disparities range from d=0 to d=4]
• Images are rectified
• Ignore occlusion for now
Energy: E(d): {0,…,D-1}^n → ℝ,   labels di (depth/shift)
Stereo matching - Energy
Energy: E(d): {0,…,D-1}^n → ℝ
E(d) = ∑_i θi(di) + ∑_{i,j ∈ N4} θij(di,dj)

Unary: θi(di) = ∑_{j ∈ window around i} ‖ lj - r_{j-di} ‖²
"SSD: sum of squared differences" (many others possible: NCC, …)
[Figure: pixel i in the left image is compared with pixel i-di in the right image (e.g. di=2), using a window around it; a window is good to overcome differences in lighting, etc.]

Pairwise: θij(di,dj) = g(|di-dj|)
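To make the SSD data term concrete, a small sketch (assuming rectified greyscale images and a square matching window; this is my own illustration, not the lecture's code) that builds the unary costs θi(d) for every pixel and disparity, and takes a winner-takes-all disparity without any MRF:

import numpy as np
from scipy.ndimage import uniform_filter

def ssd_cost_volume(left, right, D, window=5):
    # theta_i(d) = sum over a window around i of (l_j - r_{j-d})^2, for d = 0..D-1
    H, W = left.shape
    cost = np.empty((D, H, W))
    for d in range(D):
        shifted = np.empty_like(right)
        shifted[:, d:] = right[:, :W - d]     # right image shifted by d pixels
        shifted[:, :d] = right[:, :1]         # crude border handling (replicate)
        sq_diff = (left - shifted) ** 2
        # box aggregation: mean over the window times window area = windowed sum
        cost[d] = uniform_filter(sq_diff, size=window) * window ** 2
    return cost

left = np.random.rand(20, 30)
right = np.roll(left, -3, axis=1)             # synthetic horizontal shift of 3 pixels
cost = ssd_cost_volume(left, right, D=8)
wta = cost.argmin(axis=0)                     # pixel-independent (WTA) disparity
print(wta[5:15, 10:20])                       # mostly recovers the shift of 3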
Stereo matching - prior
θij(di,dj) = g(|di-dj|)
[Plot: cost g as a function of |di-dj|]
• No truncation: the global minimum can be computed [Olga Veksler PhD thesis; Daniel Cremers et al.]
• With truncation: discontinuity-preserving potentials [Blake & Zisserman '83, '87]; optimization becomes NP-hard
Stereo matching
see http://vision.middlebury.edu/stereo/
[Figure: result panels: no MRF (pixel-independent, WTA); no vertical links (efficient, since independent chains); pairwise MRF (Boykov et al. '01); ground truth]
Texture synthesis
[Figure: input texture and synthesized output; a good and a bad seam between the two source copies a and b]
E: {0,1}^n → ℝ   (xi = 0: pixel i is copied from source a; xi = 1: from source b)
E(x) = ∑_{i,j ∈ N4} |xi-xj| [ |ai-bi| + |aj-bj| ]
[Kwatra et al. Siggraph '03]
Video Synthesis
[Figure: input video and synthesized (duplicated) output video]
Panoramic stitching
[Figure: input images and stitched panorama]
Large alphabet
Label-space has some ordering:
• Examples: depth (stereo), motion (optical flow)
• Pairwise costs have a (simple) parametric form, e.g. g(|di-dj|) in the stereo example
• Optical flow estimation: often continuous labels are used for this task
[Plot: pairwise cost as a function of |di-dj|]

Label-space has no ordering:
• Examples: panoramic stitching, in-painting, super-resolution
• Label space can get very large (very hard problems)
• θij(di,dj) is an arbitrary table
Exemplar-based in-painting
[Freeman et al., MRF book, ch. 10]
E(x) = ∑_{i,j ∈ N4} ‖Patch(xi) - Patch(xj)‖²,   xi ∈ 𝓛
[Figure: input image and various local minima]
Higher-order model [Rother et al. TR '11]; global techniques [Komodakis et al. CVPR '06]; greedy techniques [Criminisi et al. CVPR '03]
Recap: 4-connected MRFs
• A lot of useful vision systems are based on 4-connected pairwise MRFs.
• Possible reason (see optimization part): a lot of fast and good (globally optimal) optimization techniques exist.
Random field models
• 4-connected, pairwise MRF:   E(x) = ∑_{i,j ∈ N4} θij(xi,xj)   (order 2, "pairwise energy")
• Higher(8)-connected, pairwise MRF:   E(x) = ∑_{i,j ∈ N8} θij(xi,xj)   (order 2)
• Higher-order RF:   E(x) = ∑_{i,j ∈ N4} θij(xi,xj) + θ(x1,…,xn)   (order n, "higher-order energy")
Why larger connectivity?
We have seen…
• "Knock-on" effect (every pixel influences every other pixel)
• Many good systems
What is missing:
1. Modelling real-world texture (images)
2. Reduce discretization artefacts
3. Encode complex prior knowledge
4. Use non-local parameters
Reason 1: Texture modelling
[Figure: training images; test image and result with a 4-connected (neighbour) MRF; test image with 60% noise and result with a 4-connected MRF; result with a 9-connected MRF (7 attractive, 2 repulsive links)]
Reason2: Discretization (metrication) artefacts
[Figure/table: lengths of example paths measured with the 4-connected metric, the 8-connected metric, and the Euclidean metric; the 4-connected metric overestimates the true length the most]
Larger connectivity can model the true Euclidean length (other metrics are also possible)
[Boykov et al. '03, '05]
Reason2: Discretization (metrication) artefacts
[Figure: segmentation results: 4-connected Euclidean; 8-connected Euclidean (MRF); 8-connected geodesic (CRF)]
(Last lecture: θij(xi,xj,zi,zj) = |xi-xj| exp{-ß‖zi-zj‖})
[Boykov et al. '03, '05]
3D reconstruction
Metrication artefacts are more visible when no observation z is available
[Slide credits: Daniel Cremers]
Reason 3: Encode complex prior knowledge:
Stereo with occlusion
E(d): {1,…,D}^(2n) → ℝ
Each pixel is connected to D pixels in the other image.
[Figure: left and right views; the D×D pairwise table θlr(dl,dr) between a left-view pixel and a right-view pixel, with entries for match, ∞ cost, and 0 cost; illustrated for d=1 (∞ cost), d=10 (match), d=20 (0 cost)]
Stereo with occlusion
[Figure: ground truth; stereo with occlusion (Kolmogorov et al. '02); stereo without occlusion (Boykov et al. '01)]
Reason 4: Use Non-local parameters:
Interactive Segmentation (GrabCut)
Model segmentation and the colour model w jointly:
E(x,w): {0,1}^n × {GMMs} → ℝ
E(x,w) = ∑_i θi(xi,w) + ∑_{i,j ∈ N4} θij(xi,xj)
An object is a compact set of colours.
[Figure: foreground and background colour models (GMMs) in colour space]
[Rother et al. Siggraph '04]
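A rough sketch of the alternating scheme behind this joint model (my own illustration, not the original GrabCut implementation): fit the colour models w given the current segmentation x, then update x given w. GrabCut's x-step is a graph cut with a contrast-sensitive pairwise term; here a unary-only update stands in, and sklearn's GaussianMixture is an assumed stand-in for the GMM colour models.

import numpy as np
from sklearn.mixture import GaussianMixture

def grabcut_like(image, init_mask, n_iter=5, n_components=5):
    # image: H x W x 3 float array; init_mask: H x W binary array (1 = initial foreground)
    pixels = image.reshape(-1, 3)
    x = init_mask.reshape(-1).astype(bool)
    for _ in range(n_iter):
        # w-step: fit one GMM per region to the pixels currently assigned to it
        fg = GaussianMixture(n_components).fit(pixels[x])
        bg = GaussianMixture(n_components).fit(pixels[~x])
        # x-step: unary theta_i(x_i, w) = -log p(z_i | GMM); pick the cheaper label
        # (the real GrabCut solves a graph cut including the pairwise term here)
        x = fg.score_samples(pixels) > bg.score_samples(pixels)
        if x.all() or not x.any():
            break
    return x.reshape(init_mask.shape)

img = np.random.rand(40, 40, 3)
mask = np.zeros((40, 40), dtype=int)
mask[10:30, 10:30] = 1                      # e.g. a user-drawn box
print(grabcut_like(img, mask).sum())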
Reason 4: Use Non-local parameters:
Object recognition & segmentation
xi ∈ {1,…,K} for K object classes (e.g. sky, building, tree, grass)
E(x,ω) = ∑_i θi(ω,xi) + ∑_i θi(xi) + ∑_i θi(xi) + ∑_{i,j} θij(xi,xj)
(the three unary terms: colour, location, class (boosted textons); the pairwise term: edge-aware Ising prior)
[Figure: class potential (boosted textons) and location potential, e.g. sky + grass]
[TextonBoost; Shotton et al. '06]
Reason 4: Use Non-local parameters:
Object recognition & segmentation
[Figure: results using class + location + edges + colour]
[TextonBoost; Shotton et al. '06]
Reason 4: Use Non-local parameters:
Object recognition & segmentation
Good results …
[TextonBoost; Shotton et al, ‘06]
Reason 4: Use Non-local parameters:
Object recognition & segmentation
Failure cases…
Reason 4: Use Non-local parameters:
Object recognition & segmentation
Goal: detect and segment the test image.
Large set of example segmentations: shape templates T(1), T(2), T(3), … (up to 2,000,000 shape templates), indexed by a global variable w.
E(x,w): {0,1}^n × {Exemplars} → ℝ
E(x,w) = ∑_i |T(w)i - xi| + ∑_{i,j ∈ N4} θij(xi,xj)
The first term is the Hamming distance to the chosen template T(w).
[Lempitsky et al. ECCV '08]
Reason 4: Use Non-local parameters:
Object recognition & segmentation
UIUC dataset: 98.8% accuracy
[Lempitsky et al. ECCV '08]
Reason 4: Use Non-local parameters:
Recognition with Latent/Hidden CRFs
[Figure: LayoutCRF with "instance", "instance label", and "part" variables]
[LayoutCRF; Winn et al. '06]
• Many other examples:
  • ObjCut [Kumar et al. '05]
  • Deformable Part Model [Felzenszwalb et al. CVPR '08]
  • PoseCut [Bray et al. '06]
  • Branch&Mincut [Lempitsky et al. ECCV '08]
• Minimization over hidden variables vs. marginalization over hidden variables
Random field models
• 4-connected, pairwise MRF:   E(x) = ∑_{i,j ∈ N4} θij(xi,xj)   (order 2, "pairwise energy")
• Higher(8)-connected, pairwise MRF:   E(x) = ∑_{i,j ∈ N8} θij(xi,xj)   (order 2)
• Higher-order RF:   E(x) = ∑_{i,j ∈ N4} θij(xi,xj) + θ(x1,…,xn)   (order n, "higher-order energy")
Why Higher-order Functions?
In general θ(x1,x2,x3) ≠ θ(x1,x2) + θ(x1,x3) + θ(x2,x3)
Reasons for higher-order RFs:
1. Even better image (texture) models:
  – Field of Experts [FoE; Roth et al. '05]
  – Curvature [Woodford et al. '08]
2. Use global priors:
  – Connectivity [Vicente et al. '08; Nowozin et al. '09]
  – Better encoding of label statistics [Woodford et al. '09]
  – Convert global variables to global factors [Vicente et al. '09]
Reason1: Better Texture Modelling
[Figure: training images; test image with 60% noise; result with a pairwise 9-connected MRF (higher-order structure is not preserved); result with a higher-order RF]
[Rother et al. CVPR '09]
Reason 2: Use global Prior
Foreground object must be connected:
E(x) = P(x) + h(x)   with   h(x) = ∞ if the foreground is not 4-connected, 0 otherwise
[Figure: user input; result with the standard MRF; result with the connectivity prior]
[Vicente et al. '08]
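The hard constraint h(x) can be made concrete with a simple flood fill. A sketch (my own illustration; the optimization scheme of Vicente et al. is far more involved) that returns ∞ when the foreground pixels do not form a single 4-connected component:

import numpy as np
from collections import deque

def h(x):
    # 0 if the foreground of x is one 4-connected component, infinity otherwise
    fg = list(zip(*np.nonzero(x)))
    if not fg:
        return 0.0
    seen = {fg[0]}
    queue = deque([fg[0]])
    while queue:                                   # breadth-first search from one seed
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            n = (r + dr, c + dc)
            if (0 <= n[0] < x.shape[0] and 0 <= n[1] < x.shape[1]
                    and x[n] and n not in seen):
                seen.add(n)
                queue.append(n)
    return 0.0 if len(seen) == len(fg) else np.inf

x = np.zeros((5, 5), dtype=int)
x[0, 0] = x[4, 4] = 1            # two disconnected foreground pixels
print(h(x))                      # inf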
Reason 2: Use global Prior
Remember the bias of the prior (pairwise term |xi-xj|).
[Figure: ground truth; noisy input; results with increased pairwise strength]
Instead, introduce a global term which controls the label statistics.
[Woodford et al. ICCV '09]
Summary:
Discrete labels and discrete domain
• Order of the random field
– 4-connected to higher-order
• Global, auxiliary variables (e.g. color model of an object)
• Number of labels
– Can be a very large number, e.g. stereo, optical flow, in-painting
=> Discretization artefact/hierarchical approaches
• Labels have a structure?
  – Depth (or pixel intensity) is ordered
    => pairwise potential functions can have a simpler form, e.g. |xi-xj|
  – No order for labels: panoramic stitching, in-painting, object recognition
• Comment: discrete labels are often relaxed to continuous ones for "better" optimization (… will come in later lectures)
Discrete labels: Optimization
• Combinatorial optimization
  – Binary, pairwise MRF: graph cut, BHS (QPBO)
  – Multi-label, pairwise: move-making; transformation
  – Binary, higher-order factors: transformation
  – Multi-label, higher-order factors: move-making + transformation
• Dual/problem decomposition
  – Decompose the (NP-)hard problem into tractable ones; solve with e.g. a sub-gradient technique
• Local search / genetic algorithms
  – ICM, simulated annealing (a minimal ICM sketch follows below)
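Of the methods listed above, ICM is simple enough to spell out. A minimal sketch (my own, assuming toy inputs) for a binary 4-connected model with unary costs U[i, j, label] and an Ising pairwise weight:

import numpy as np

def icm(U, lam=1.0, n_sweeps=10):
    # Iterated Conditional Modes for E(x) = sum_i U[i, x_i] + lam * sum_{N4} |x_i - x_j|
    # U: H x W x 2 array of unary costs for labels {0, 1}
    # Each pixel is greedily set to the label that minimises its local energy,
    # keeping all other pixels fixed; this converges to a local minimum only.
    H, W, _ = U.shape
    x = U.argmin(axis=2)                     # independent (unary-only) initialisation
    for _ in range(n_sweeps):
        changed = False
        for i in range(H):
            for j in range(W):
                costs = U[i, j].copy()
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < H and 0 <= nj < W:
                        costs += lam * np.abs(np.array([0, 1]) - x[ni, nj])
                best = costs.argmin()
                if best != x[i, j]:
                    x[i, j] = best
                    changed = True
        if not changed:
            break
    return x

U = np.random.rand(8, 8, 2)
print(icm(U, lam=0.5))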
Discrete labels: Optimization
• Message passing techniques
  – In theory the methods can be applied to any model (higher order, multi-label, etc.)
  – BP, TRW, TRW-S
• Relaxations (e.g. LP)
  – Relax the original problem (e.g. {0,1} to [0,1]) and solve with existing techniques (e.g. sub-gradient)
  – Can be applied to any model (depending on the solver used)
  – Connections to message passing (TRW) and combinatorial optimization (QPBO)
  – Commercial solvers exist (e.g. CPLEX)
Discrete labels: Higher-order models
• Arbitrary potentials are only tractable for order < 7 (memory, computation time)
• For order ≥ 7, the potentials need some structure that can be exploited to make them tractable (e.g. a cost over the number of labels)
Discrete domain
Continuous label space (x∊ ℝn)
• Gaussian Random Fields
• Arbitrary continuous label Random Fields
Gaussian MRF (GMRF)
P(x) = 1/√det(2πΣ) exp{-½ (x-μ)ᵀ Σ⁻¹ (x-μ)},   x ∈ ℝⁿ: one big Gaussian system
Gibbs distribution: P(x) = 1/f exp{-E(x)}   with   E(x) = ½ xᵀAx - xᵀb
Solution: Ax - b = 0,  i.e.  x* = (AᵀA)⁻¹Aᵀb   (or other solvers)
Example - GMRF
We would like to model E(x) = ∑_{i,j} (xi-xj)² on the chain x1, x2, x3, x4:

E(x) = (x1-x2, x2-x3, x3-x4)(x1-x2, x2-x3, x3-x4)ᵀ
     = (x1, x2, x3, x4) [  1 -1  0  0 ] (x1, x2, x3, x4)ᵀ
                        [ -1  2 -1  0 ]
                        [  0 -1  2 -1 ]
                        [  0  0 -1  1 ]

The matrix is ½A, so this has the form E(x) = ½ xᵀAx - xᵀb with b = 0.
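A quick numpy check of this example (my own sketch): build the chain matrix L = ½A and verify that xᵀLx equals the sum of squared differences over the chain edges.

import numpy as np

# Chain x1 - x2 - x3 - x4: L = (1/2) A from the slide
L = np.array([[ 1, -1,  0,  0],
              [-1,  2, -1,  0],
              [ 0, -1,  2, -1],
              [ 0,  0, -1,  1]], dtype=float)

x = np.array([0.3, 1.0, -0.5, 2.0])
direct = sum((x[i] - x[i + 1]) ** 2 for i in range(3))   # sum over chain edges
quadratic = x @ L @ x                                     # x^T L x = 1/2 x^T A x
print(direct, quadratic)                                  # identical values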
Example GMRF – image denoising
GMRF: E(x) = ½ xᵀAx - xᵀb
Prior term Ep encodes prior knowledge:   Ep(x) = ∑_{i,j ∈ N4} (xi-xj)²
Data term Eu encodes the dependency on the data z:
Eu(x) = ∑_i (xi-zi)² = (x-z)ᵀ I (x-z) = xᵀIx - 2xᵀz + zᵀz
[Figure: original image; noisy input z; result with continuous labels x (HBF [Szeliski '06], ~15 times faster); result with 256 discrete labels x (TRW-S)]
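The quadratic energy Eu + λ·Ep has a closed-form minimiser obtained by solving one linear system. A 1D toy sketch of my own (a signal instead of an image):

import numpy as np

def gmrf_denoise_1d(z, lam=5.0):
    # Minimise E(x) = sum_i (x_i - z_i)^2 + lam * sum_i (x_i - x_{i+1})^2.
    # Setting the gradient to zero gives (I + lam * L) x = z, L = chain graph Laplacian.
    n = len(z)
    L = np.zeros((n, n))
    for i in range(n - 1):                 # one entry pattern per chain edge
        L[i, i] += 1;  L[i + 1, i + 1] += 1
        L[i, i + 1] -= 1;  L[i + 1, i] -= 1
    return np.linalg.solve(np.eye(n) + lam * L, z)

clean = np.sin(np.linspace(0, 3 * np.pi, 200))
noisy = clean + 0.3 * np.random.randn(200)
denoised = gmrf_denoise_1d(noisy, lam=10.0)
print(np.abs(noisy - clean).mean(), np.abs(denoised - clean).mean())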
Remember from last lecture
[Figure: noisy input image z; ideal output labelling x; unary potential |zi-xi|; pairwise potential |xi-xj|]
Learning these potentials (loss-minimizing parameter learning) gives non-Gaussian potentials.
["Putting MAP back on the map", Pletscher et al. DAGM 2010]
Detour:
Data-term is often arbitrary (non-Gaussian)
Unary: θi(di) = ∑_{j ∈ window around i} ‖ lj - r_{j-di} ‖²
The cost as a function of the disparity d is not quadratic!
[Figure: left image (a), right image (b), ground-truth depth; matching window around pixel i in the left image and pixel i-2 in the right image (di=2); cost as a function of disparity d; scanline costs (black = low cost) [Scharstein et al. IJCV '02]]
Detour:
Data-term is expensive to evaluate everywhere
Unary: θi(di) = ∑_{j ∈ window around i} ‖ lj - r_{j-di} ‖²
[Figure: discrete scanline matching vs. a continuous-valued 3D plane per scanline]
Insight: 3D-aligned windows give an improved data cost (bigger windows can also be used: 3x3 discrete vs. 35x35 continuous 3D plane).
Optimizing this is hard: here PatchMatch [Barnes et al. '09] is used (even more challenging for optical flow, i.e. 2D motion).
Detour: PatchMatch stereo [Bleyer et al. BMVC '11]
Gaussian MRF for Matting
[Levin et al. CVPR ‘06]
[Figure: input images z and output alpha mattes α]
Input: z.   Output: α, F, B.
Constraint: zi = αi Fi + (1-αi) Bi
Each pixel has 3 constraints and 7 unknowns.
E(α) = αᵀ L α
α* = argmin_α E(α)   subject to the brush strokes
L is called the matting Laplacian: L(i,j) = 0 if i and j do not share a window (e.g. 3x3); otherwise it is given by the closed-form expression of Levin et al.
The true solution is a global optimum if in each window the foreground (F) and background (B) colours lie on an arbitrary line in RGB space.
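Given the matting Laplacian L and the brush strokes, the constrained quadratic E(α) = αᵀLα reduces to one linear system in the unconstrained entries. A small sketch of my own (assuming L is already built; a chain matrix stands in for the real matting Laplacian in the toy check):

import numpy as np

def solve_matting(L, known_mask, known_alpha):
    # Minimise alpha^T L alpha subject to alpha = known_alpha at the brush strokes.
    # Zeroing the gradient w.r.t. the free entries gives L_uu alpha_u = -L_uk alpha_k.
    u, k = ~known_mask, known_mask
    alpha = np.zeros(L.shape[0])
    alpha[k] = known_alpha
    alpha[u] = np.linalg.solve(L[np.ix_(u, u)], -L[np.ix_(u, k)] @ known_alpha)
    return alpha

# Toy check with a chain Laplacian standing in for the matting Laplacian:
n = 6
L = (np.diag([1., 2, 2, 2, 2, 1])
     - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1))
mask = np.array([True, False, False, False, False, True])
print(solve_matting(L, mask, np.array([0.0, 1.0])))   # smooth ramp from 0 to 1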
… a more physics-based view [Rhemann et al. CVPR '10]:
Problem: E(α) has many solutions! Use a spatially varying PSF-based prior.
Matting database and competition: http://www.alphamatting.com/
Non-Gaussian (arbitrary) RFs
Image segmentation
[Singaraju, Grady et al. MRF book, ch. 8]
As with matting: no data term, just hard constraints (brush strokes).
Energy: E(x) = ∑_{i,j ∈ N4} (wij |xi-xj|)^p,   xi ∈ ℝ
Here: 4-connected grid, wij = exp{-ß‖zi-zj‖} (as before)
[Figure: image with foreground and background brush strokes]
Optimize: x* = argmin_x E(x)   subject to xi=1 for foreground and xi=0 for background strokes
Output: x° = round(x*), i.e. x°i = 0 if x*i < 0.5, 1 otherwise
For any p ≥ 1 this can be solved to global optimality (but the solution is not guaranteed to be unique). Here it is optimized with IRLS (iteratively reweighted least squares).
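A minimal IRLS sketch of my own for this p-norm energy, on a weighted chain graph instead of an image grid (the reweighting r_ij = wij^p |xi-xj|^(p-2) is the standard IRLS step; the seeds play the role of the brush strokes):

import numpy as np

def irls_segmentation(w, seeds, p=1.5, n_iter=50, eps=1e-6):
    # E(x) = sum_edges (w_ij |x_i - x_j|)^p on a chain graph, with hard seed constraints.
    n = len(w) + 1
    x = np.full(n, 0.5)
    for i, v in seeds.items():
        x[i] = v
    known = np.zeros(n, dtype=bool)
    known[list(seeds)] = True
    for _ in range(n_iter):
        r = w ** p * (np.abs(np.diff(x)) + eps) ** (p - 2)   # IRLS edge weights
        L = np.zeros((n, n))
        for e, rw in enumerate(r):                           # weighted chain Laplacian
            L[e, e] += rw;  L[e + 1, e + 1] += rw
            L[e, e + 1] -= rw;  L[e + 1, e] -= rw
        u, k = ~known, known
        x[u] = np.linalg.solve(L[np.ix_(u, u)] + eps * np.eye(u.sum()),
                               -L[np.ix_(u, k)] @ x[k])      # weighted Gaussian step
    return (x > 0.5).astype(int), x

w = np.array([1.0, 1.0, 0.1, 1.0, 1.0])        # a weak edge in the middle
labels, x = irls_segmentation(w, seeds={0: 0, 5: 1})
print(labels, np.round(x, 2))                  # the cut falls on the weak edge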
Varying “p”
Energy: E(x) = ∑_{i,j ∈ N4} (wij |xi-xj|)^p
• p = 1: the rounded solution is the same as the solution of the discrete problem xi ∈ {0,1} (i.e. graph cut [Boykov, Jolly ICCV '01])
• p = 2: Gaussian MRF (known as the random walker solution [Grady PAMI '06])
• p → ∞: the solution gets very ambiguous; only one edge is important if x is discrete:
  x* = argmin_x ∑_{i,j ∈ N4} (wij |xi-xj|)^p  →  argmin_x max_{i,j ∈ N4} (wij |xi-xj|)   as p → ∞
  (one solution is the shortest geodesic path [Bai et al. ICCV '07])
Biases of the p-norm model (see [Singaraju, Grady et al. MRF book, ch. 8]):
• Metrication (discretization) artefacts: "blockiness" of the segmentation due to the underlying pixel grid. Larger p is better.
• Proximity bias: sensitivity of the segmentation to the placement of the brush strokes. Small p may be better … but the user can adapt to the system!
• Shrinking bias: shortening of the segmentation due to regularization. Larger p may be better … again, the user can adapt to the system!
  For p → ∞ only one edge is important if x is discrete: x* = argmin_x max_{i,j ∈ N4} (wij |xi-xj|).
Field of Expert
[Roth et al. ‘05]
Generative model for a noisy image:
P(x|z,w) = P(z|x) P(x|w) 1/P(z)
MAP: x* = argmax_x P(z|x) P(x|w)
Likelihood (pixel-independent Gaussian noise):
P(z|x,w) ~ ∏_i exp{-1/(2σ²) (zi-xi)²}
[Figure: data z and label x]
Prior for real images, so far: E(x) = ∑_{i,j} θ(xi-xj), e.g. the pairwise potential |xi-xj|
FoE: P(x|w) = 1/f ∏_{F ∈ 𝔽} ∏_k (1 + ½ (Jkᵀ x_N(F))²)^(-wk)
• product over windows F (e.g. 5x5) and over all linear filters Jk (with weights wk)
• each factor is a heavy-tailed Student-t distribution
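A sketch of my own showing how the negative log of this FoE prior could be evaluated; the toy derivative filters stand in for the learned 5x5 experts of Roth & Black, which are not reproduced here.

import numpy as np
from scipy.ndimage import convolve

def foe_energy(x, filters, weights):
    # Negative log of the FoE prior (up to the constant log f):
    #   E(x) = sum_cliques sum_k w_k * log(1 + 0.5 * (J_k^T x_clique)^2)
    energy = 0.0
    for J, w in zip(filters, weights):
        response = convolve(x, J, mode='nearest')       # J_k^T x for every clique
        energy += w * np.log1p(0.5 * response ** 2).sum()
    return energy

# Toy filters: horizontal and vertical derivatives stand in for learned experts
filters = [np.array([[1.0, -1.0]]), np.array([[1.0], [-1.0]])]
weights = [1.0, 1.0]
x = np.random.rand(16, 16)
print(foe_energy(x, filters, weights))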
Results
Even better:
• Discriminative models [Tappen, MRF book, ch. 13]
• (hand-crafted) Discriminative functions such as BM3D
In-painting:
Take advantage of a Generative model
P(x|z,w) ~ p(z|x) p(x|w)
Likelihood:
p(z|x) is either given (observed area) or uniform (unobserved area)
In-painting results
[Figure: image (zoom); FoE result; smoothing-MRF result [Bertalmio et al. Siggraph '00]]
Summary: Continuous labels
• Most of the time used for problems where the discrete labels are ordered:
  – image intensity, depth, motion, transparency (matting)
  – the regularization term (pairwise, higher-order) can be written in some parametric form (Gaussian, filters, etc.)
• The data term can be quite arbitrary and expensive to evaluate everywhere (e.g. 3D planes for depth estimation)
• Often done in the context of continuous-domain methods (comes next)
Continuous labels: Optimization
• Closed-form for Gaussian
• Gradient-descent style optimization
(e.g. BFGS (Broyden–Fletcher–Goldfarb–Shanno method))
• Convex Relaxation
• Message passing
• Fusion move (see later lectures)
Continuous domain
Continuous label space
… just a few comments
From continuous domain to discrete domain
[Schelten and Roth CVPR’11]
[Figure: image 𝒖 (zoom); piecewise-linear functions 𝒇 (zoom), many other realizations possible; MRF factor graph with factors for the smoothness term; results]
Domain: Ω ⊂ ℝ²,   data 𝒖: Ω → ℝ,   function 𝒇: Ω → ℝ
Energy: E(𝒇) = ∫_Ω (𝒇-𝒖)² + ∫_Ω |𝛁𝒇|
(the first term is a convex data term, the second is total-variation smoothness)
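A discretized sketch of my own of this energy, with plain gradient descent on a smoothed version of the TV term as a crude stand-in for the primal-dual solvers mentioned below:

import numpy as np

EPS = 1e-2  # smoothing of |grad f| so that plain gradient descent works

def grad(f):
    # forward differences; zero at the right/bottom border
    fx = np.diff(f, axis=1, append=f[:, -1:])
    fy = np.diff(f, axis=0, append=f[-1:, :])
    return fx, fy

def tv_energy(f, u, lam):
    # discretised E(f) = sum (f - u)^2 + lam * sum |grad f|  (smoothed)
    fx, fy = grad(f)
    return ((f - u) ** 2).sum() + lam * np.sqrt(fx ** 2 + fy ** 2 + EPS).sum()

def tv_denoise(u, lam=0.5, step=0.05, n_iter=300):
    f = u.copy()
    for _ in range(n_iter):
        fx, fy = grad(f)
        norm = np.sqrt(fx ** 2 + fy ** 2 + EPS)
        px, py = fx / norm, fy / norm
        g = 2.0 * (f - u)                               # gradient of the data term
        g -= lam * px;  g[:, 1:] += lam * px[:, :-1]    # TV gradient, x part
        g -= lam * py;  g[1:, :] += lam * py[:-1, :]    # TV gradient, y part
        f -= step * g
    return f

u = np.zeros((32, 32));  u[:, 16:] = 1.0                # a step edge ...
u += 0.2 * np.random.randn(32, 32)                      # ... plus noise
f = tv_denoise(u)
print(tv_energy(u, u, 0.5), tv_energy(f, u, 0.5))       # energy before vs. after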
Continuous domain (my take)
• Main advantage: do continuous math and then commit to a certain discretization of the image
• An argument is that the world is continuous … but then you have to model the image formation process (e.g. camera PSF), which leads to a higher-order model
• No probabilistic view is available (only loss-based learning is possible)
• Fast (primal-dual) GPU solvers are available; the relationship to discrete counterparts would be good to establish
• Most continuous-domain methods use first-order derivatives (e.g. total variation). This is inferior to higher-order derivatives; it would be interesting to write an FoE model in the continuous domain
• The data term often has to be discretized
• In general: the connection to discrete-domain and/or discrete-label techniques has to be better explored
Lecture Summary (my take)
• Discrete domain, discrete labels:
– Many applications (low-level and high-level vision)
– A lot of research on optimization and learning
• Discrete domain, continuous labels
– less often used in vision
– data-term often has to be discretized
• Continuous domain, continuous labels
– connections to discrete domain/labels have to be better
understood
– data-term often has to be discretized