shen - University of Alberta Statistics Center

advertisement
A Young Mathematician’s Reflection on
Vision:
Illposedness & Regularizations
Jackie (Jianhong) Shen
School of Mathematics
University of Minnesota, Minneapolis, MN 55455
Workshop on Regularization in Statistics
Banff International Research Station, Canada; September 6-11, 2003
Abstract
Vision, the perception of the 3-D world from its 2-D partial projections
onto the left and right retinas, is fundamentally an illposed inverse
problem. But after millions of years' of evolution, human vision has
become astonishingly accurate and satisfactory. How could it have
become such a remarkable inverse problem solver, and what are the
hidden (or subconscious) regularization techniques it employs?
This talk attempts to reflect and shed some light on these “billion-dollar”
or Nobel-level questions (the works of several Nobel Laureates will
be mentioned indeed), based on the limited but unique experience
and philosophy of a young mathematician. In particular, we discuss
the topological, geometric, statistical/Bayesian, and psychological
regularization techniques of visual perception.
I owe deep appreciation to my former advisors, mentors, and teachers
leading me to the Door of Vision : Professors Gil Strang, Tony Chan,
Stan Osher, David Mumford, Stu Geman, and Dan Kersten.
2
Agenda
 An Abstract View of Vision: Imaging and Perception
 Illposedness of Visual Perception: Root and Solutions
 Conscious or Subconscious Regularizations




3
Topological Regularization: Generic Viewpoint Principle
Geometric Regularization: Perception & Role of Curvature
Statistical Regularization: Gibbs Fields and Learning
Psychological Regularization: Role of Weber’s Law.
Dedication
“ Dedicated to all pioneering mathematicians in
Mathematical Image and Vision Analysis (Miva),
on whose shoulders we the younger generations are
standing,
and on whose shoulders, we the younger must think
deeper, speak louder, and look further beyond …”
- Jackie Shen
4
An Abstract View of Vision :
Passive Imaging
&
Active Visual Perception
Biological or Digital (Passive) Imaging Process
I: Illuminance or incident light
G: 3-D surface geometry & topology
R: Reflectance or material property
Passive Imaging Process:
Optical imaging process (human vision or digital
camera) can be modeled as a function (or
operator):
u( x 2 )  F[G( x 3 ), R( x 3 ); a1, a2 ,, an ]
Here, 2 and 3 indicate the dimension of the spatial variables.
G and R denote the configuration (scene | geometry) and the
reflectance. All the a’s are parameters such as I and q.
6
Active Visual Perception
I: Illuminance or incident light
G: 3-D surface geometry & topology
R: Reflectance or material property
Active Visual Perception:
Perception is to reconstruct the 3-D world
(geometry, topology, material surface properties,
light source, etc.) from the observed 2-D image:
[G( x 3 ), R( x 3 ); a1 , a2 ,, an ]  V [u( x 2 )],
[G( x 3 ),R( x 3 )]  V [u( x 2 ); a1 , a2 ,, an ].
7
or
A Quick Overview of Mathematicians’ Missions
 Develop mathematical formulations for all the important
psychological and cognitive discoveries in visual
perception.
 As in geometry and physics (Hamiltonian, Statistical or
Quantum Mechanics), unify and extract the most
fundamental laws or axioms of perception, and develop
their mathematical foundations.
 Decode the computational and information-theoretic
efficiency behind human visual perception, and integrate
such knowledge into digital and computational decision
and optimization algorithms.
 Develop novel mathematical tools and theories arising
from such studies, and apply them to other scientific and
engineering fields, such as data visualization/mining;
pattern recognition/learning/coding; multiscale modeling.
8
Is It Risky for a Young Mathematician to Study Perception?
 Quoting Albert Einstein:
“The object of all science, whether natural science or
psychology , is to co-ordinate our experiences and to bring
them into a logical system.”
- from “Space and Time in Pre-Relativity Physics,” May, 1921.
To me, as for Special or General Relativities, being logical
necessarily means being mathematical :
o Cause-effect modeling and description (e.g., learning theory);
o
o
o
o
9
Formulating basic psychological/perceptual laws (e.g. Weber);
Simulating brain computation using computers;
Verifying existing data, and furnishing reasonable predictions.
etc.
Illposedness of
Visual Perception:
 Root of Illposedness
 Solutions to Illposedness
Vision is an Inverse Computer Graphics Problem
Dan Kersten (1997)
3-D Computer Graphics: (Hollywood animated movies)
G: Geometric configuration of a 3-D scene (bike, table,…)
R: Material surface property: reflectance field.
I : ILLUMINANCE (incident lights, light source and type).
Goal: Generate a 2-D image U, which looks exactly like the
image that one sees if facing such a real 3-D scene.
Visual Perception is an Inverse C.G. Problem (Kersten):
Given a 2-D image U, one attempts to reconstruct its 3-D
world: G, R, I, viewing position & angle, etc.
11
Vision is an illposed Inverse Problem (A-1)
A. Geometry is not invertible: depth or range is lost !
Mathematical Model (1): Projective Imaging
P: (x1, x2, x3)  (x1, x2) is not invertible!
For any given 2-D curve g2, there are infinitely many 3-D
curves g3, so that P(g3) = g2 .
Example.
g3 = (cos t, sin t, t )
g2 is just like the projection of a circle
12
Vision is an illposed Inverse Problem (A-2)
Mathematical Model (2): Perspective Imaging
a hyperbola
a parabola
an ellipse
Imaging plane
(or retina)
All three different types of curves are imaged as a 2-D circle !
13
Vision is an illposed Inverse Problem (B)
B. The Reflectance-Illuminance entanglement.
R
I
different 3-D scenes
R
I
14
identical images !
But vision still makes sense,
after millions of years’ evolution ,
which implies that:
The human vision system is a well developed system of
software and hardware, which can solve this highly illposed inverse problem efficiently and robustly.
Fundamental Questions:
 What kind of regularization techniques the human
vision system employs to conquer the illposedness?
 What kind of features or variables to regularize ?
15
Deterministic Model: Tikhonov Regularization
Tikhonov conditioning technique for inverse problems:
forward data generating: U0 = F(X).
backward (inverse) reconstruction of X:
min d [ F(X), U0 ] + R(X),
where d is a suitable discrepancy metric, and R is the
regularizing/conditioning term.
Example. (Cleaning and sharpening of astronomical images)
U0 = h * X + n . (h : atmosphere blurring ; n: white noise)
Such an inverse problem is typically solved by
min
X
16

2


(U 0  h  X ) 2 dx   | X | dx,   1.

Tikhonov Meets Bayes: Perception as Inference
Bayesian Perception: X = (geometry G, reflectance R, …).
Prob (U 0 | X ) Prob ( X )
max Prob ( X | U 0 ) 
X
Prob (U 0 )
Gibbs’ energy formula: Prob (y) = 1/Z exp ( - E[y] / k T ).
Thus, in terms of energies, we have
min E[ X | U 0 ]  E[U 0 | X ]  E[ X ]  const ,
X
which leads to Tikhonov Inversion !
Conclusion:
It is our a priori knowledge of the world (i.e., Prob [X ] )
that regularizes our visual perception!
17
A Priori Knowledge (Common Sense) of the World
Regularizes Vision
 G: Knowledge of curve and shape geometry;
 I: Knowledge of light sources & illuminance
(sun, lamps, indoor or outdoor, …);
 R: Knowledge of materials (wood, bricks, …)
and surface reflectance (metal shines and
water sparkles, …)
 Q : Knowledge of the viewers (often standing
perpendicularly to the ground, viewing more
horizontally, several feet away for indoor
scenes, …)
 ……
18
Example I:
Prior Knowledge on Curves & Boundaries
 Straight lines: length(g), or 1-D Hausdorff.
 Euler’s elastica (Mumford [1994], Chan-Kang-Shen [2002]):
g
( a  b 2 ) ds.
 Piecewise elastica (Shah [2001]):
e[g , S ]  
g \S
(a  b 2 )ds  c # S ,
which allows corners or hinges (i.e. the
discrete set S) along the curve.
19
Example II. Prior Knowledge on Material Reflectance
Reflectance distribution R(x) in the visual field 

The Mumford-Shah [1989] free-boundary model:

 \g
| R |
2
dx  e[g ].
curve energy
 Many challenging free-boundary problems.
 Connections to interface motions (such as the
mean curvature motion).
20
Deeper Question: What to Learn & How
 How does the vision system subconsciously choose what
knowledge to learn and store, out of massive visual data in
daily life?
 For such knowledge, to which degree of regularity or
compression that the vision system “decides” to process, in
order to achieve maximum efficiency and robustness?
 How to mathematically model (or quantify) such activities?
Partial answer by Nobel Laureate (1972) Gerald Edelman:
Neural Darwinism: (Basic Books, Inc., 1987)
The vision system in each individual operates as a selective system
resembling natural section in evolution, but operating by different
mechanisms.
21
Topological Regularization:
Principle of Generic Viewpoint
vs.
Theorem of Transversality in
Differential Topology
Transversal Phenomenon
1
3-D world
2
projection
Digital or Biological
2-D Retina
A
B
g1
C
g2
g 1 and g 2 are transversal, in the sense of differential topology:
g 1 and g 2 are not tangent at any point A  g 1  g 2 .
23
Transversal Is Generic: Stability & Universality
Suppose that (2, 3 are dimensions, and P is the projection)




x 2 (t )  P[ x 3 (t )] and y 2 (t )  P[ y 3 (t )]
are transversal. Then for any perturbation Pe to P , within an e
distance under a suitable smoothness norm,




xe2 (t )  Pe [ x 3 (t )] and ye2 (t )  Pe [ y 3 (t )]
are still transversal.
Stability
or
Openness
Theorem (Milnor): Let M and N be two smooth manifolds, and
P  N submanifold. Then any smooth mapping f: M  N can be
approximated arbitrarily closely by one which is transversal to P. Universality
or
Denseness
P
M
N
24
Principle of Generic Viewpoint in Vision
Generic Viewpoint: (Nakayama & Shimojo, Science [1992]; Freeman, Nature [1994])
What we see is generic w.r.t. the viewpoint !
Mathematical Model:
Suppose we see an image
u( x 2 )  F[G( x 3 ), R( x 3 ); a1, a2 ,, an ]
Then
lim e 0 D{u ( x 2 ), F [G ( x 3 ), R( x 3 ); a1e , a2e ,, ane ]}  0,
where ake ' s denote small perturbations of the a’s, controlled by e,
and D is a vision motivated measure, metric or distance.
In terms of curves, D can be based on invariants of differential topology:
smoothness, corners, connected components, branching, …
25
How G.V. Principle Regularizes Perception: 3 Examples
 How the Generic Viewpoint Principle regularizes visual
perception (conditioning the illposedness):
 Set up preference/biases among all possible scenes.
 (Uniqueness) Lead to topologically unique solutions.
 (Stability) Lead to the stability of perceptual solutions.
26
Example 1: Curve Perception
2
1
A pitchfork
Two curves
g
Given the image g , which 3-D scene (curve) satisfies the G.V. Principle?
Answer: 1. For 2, when the viewing angle q is perturbed a little bit, the image
would be:
OR
which is not diffeomorphic to g (since branching degree d and number of
connected components c are invariants). Take the metric D, for example,
D = difference in d + difference in c + other visual metric terms.
27
Example 2: Shape Perception
A painted planar plate
a uniform V-wing
u
Given the image u, which 3-D scenario satisfies the G. V. Principle?
Answer: Planar Plate. If the viewer moves a little bit (i.e. viewing angle q qe), the
image of the V-wing scene would become
ue
The two 2-D shapes are NOT diffeomorphic since corner is an invariant. (This is
because: if f is a diffeomorphism, then the Jacobian Jf is non-singular, and a
corner can never be smoothed out leading to a straight line.
28
Example 3: shape perception : Simian or Human
u
v
d
d=0
 For image u, we “see” two human faces. No doubt.
 If we move the face contours closer till touching (image v), do we still
percept two human faces? or two simians? Psychologists show that most
of us percept the latter. WHY? My explanation based on G. V. Principle:
In 3-D, give the two faces a slight distance difference (from the
viewer). Then the perception of two human faces violates the G.V.
Principle, while the perception of two (100% transparent except
along the outlines) simian faces satisfies the G.V. Principle, or the
Stability Principle. [Classical Interpretation is based on Gestalt.]
29
Two simian faces
Two human faces
Dan Kersten, Vision Psychology, UMN
Geometric Regularization:
Perception and Roles of
Curvature
Why Curvature: (A) Architectural Evidence
St. Louis Arch, USA
Eiffel Tower, Paris
MIT’s Big Dome, Cambridge, USA
 If human vision were blind to curvature, architects
would not have bothered to build these beautiful
structures !
(Disclaimer: All pictures are Internet downloads, not taken by the author.)
31
Why Curvature: (B) Anatomic Evidence

The work of Nobel Laureates (1981) David Hubel & Torstern Wiesel (1962):
x
Visual cortical surface
y
In my opinion, this provides
the most intimate evidence
of visual processing of the
curvature information.
WHY ? Because
50 mm
Curvature= Spatial Organization
of Orientations:   dq/ds !
Down to the interior
z
Quantized array of cortical visual neurons and their oriented receptive fields
32
Why Curvature: (C) Heuristic Mathematical Argument
Taylor expansion for a curve:




x( s)  x(0)  x (0)s  x(0)s 2 / 2  .

x (0)
s
Question: Which orders do matter to human visual perception?
Answer: Like Newtonian mechanics, only up to the second order.
Heuristic Mathematical Argument:
 First order matters:
since we can detect rotation!




 
x (0)  x (0)  ev is like a rotation Re x( s) if v  x (0).
 Second order matters:
since we have a
strong sense of concavity/convexity/inflection.
 Third order?
x( s)  s 2 , s [1,1].
x( s)  s 2  0.1s3 , s [1,1].
33
What Do All These Mean?
 The proceeding evidences clearly demonstrate the
significance, feasibility, legitimacy of curvature
processing in visual perception. So what? It implies to:
We
focus
on
this
point.
 Employ curvature as a deterministic regularizer for solving
numerous ill-posed inverse problems in image and vision
analysis ( Mumford’s proposal of Euler’s elastica regularizer (1994), also
Masnou-Morel (1998, ICIP ), Chan-Kang-Shen, (2002, SJAP ), Esedoglu-Shen
(2002, EJAP ) );
 Encode curvature in data structure and organization (e.g.,
Candes-Donoho’s curvelets (1999; 2002, IEEE Trans. IP ));
 Develop and study curvature-based mathematical models
(e.g., Chan-Shen’s CDD 3rd nonlinear PDE model (2001, JVCIR ), Esedoglu-
Shen’s proposal of Mumford-Shah-Euler model (2002, EJAP ); Euclidean and
affine invariant scale spaces (Alvarez-Guichard-Lions-Morel, 1993; CalabiOlver-Tannenbaum, 1996); mean curvature motions (Evans-Spruck, 1991)).
34
Perceptual Interpolation of Missing Boundaries

np
Occluding object

nq
p
q

np
g
p

nq
q
Problem: Find a curve g (t), 0<t<1, which passes through the two
endpoints p and q , and looks “natural.”
Vision background: Most objects in our material world are not
transparent. Occlusion is universal. Human vision must be
able to integrate the dissected information (by suitable
interpolation), in order to successfully perceive the world.
Why regularization is needed: (1) (non-uniqueness) otherwise
too many curves to choose from: Brownian paths, say; (2) to
regularize is to properly model “being natural.”
35
Second Order Polynomial Regularization
Modeling
“Being
Natural”
g (0)  p, g (1)  q; and


n p , g(0)  0, nq , g(1)  0.

np
g
p

nq
q
Let us try polynomial regularization:
   2 3
g (t )  a  b t  c t  dt  ...
But which order is generically sufficient?
Count the constraints: in 2-D, totally 2+2+1+1=6 conditions.
 
n
Theorem. As long as p  nq  0 , there exists a unique parabolic
interpolant in the form of

g (t )  qt  2b t(1  t )  p(1  t )2 .
2
This solution echoes the earlier claim of second order sufficiency
36
Second Order Geometric Regularization: Euler’s Elastica
g (0)  p, g (1)  q; and


n p , g(0)  0, nq , g(1)  0.
Mumford [1994]

np
g

nq
p
q
Variational approach: under these constraints, to minimize
g
( a  b 2 ) ds,
called Euler’s elastica. Energy was studied by Euler (1744) to
model the steady shape of a thin and torsion free elastica rod.
The equilibrium equation:
 ' ' ( s) 
 a
2
    with the first integral ds 
2b

2d
S  R  
2
4
which is an elliptic integral, and the solution can be expressed by
elliptic functions (Mumford [1994]).
37
,
Image Interpolation or Inpainting
 Inpainting is an artistic way of saying Image Interpolation
(as first used in IP by Bertalmio et al (SIGGRAPH, 2000));
 Partial image information loss is very common:




Occlusion caused by non-transparent objects;
Data loss in wireless transmission;
Cracks in ancient paintings due to pigment aging/weather;
Insufficient number of image acquisition sensors. etc
Example I: Occlusion
38
Example II: Cracks
Read Shen (SIAM News, 36(5), Inpainting and Fundamental Problem of IP, 2003)
Chan-Shen’s Inpainting Model via BV Regularizer
Chan-Shen (SIAM J. Appl. Math., 62(3), 2001)
u 0 | \ D given
missing on D

Existence is guaranteed,
but uniqueness is not.
The TV inpainting model:
BV regularizer
min E[u | u 0 , D]   | Du | 

u

2

\ D
| Ku  u 0 |2 dx,
least square (for uniform Gaussian noise)
The associated formal Euler-Lagrange equation on :
 u
0  

 u

t
0
  K   D ( x )( K  u  u ),


Fidelity Index
D ( x)   1 \ D ( x).
with Neumann adiabatic condition along the boundary of .
39
Formally looks very similar to Rudin-Osher-Fatemi’s denoising model !
Chan-Shen’s Model for Inpainting Noisy & Blurry Images
Chan-Shen
(AMS Contemporary Math., 2002)
movie forever
40
Suppose K=Gt,
is the Gaussian kernel.
Then, the model gives
a good inverting of
heat diffusion. Without
the BV regularization,
backward diffusion is
notoriously ill-posed.
The BV Regularizer is Insufficient for Inpainting
(Kanisza, Nitzberg-Mumford, Chan-Shen)
w
Long distance
connection is
too expensive
for the TV cost !
l
Cheaper to
simply break it.
We need curvature!
When l >> w
41
Human
perception
BV breaks it
Lifting Curve Regularity to Image Regularity
 Using the level-sets of an image, we can “lift” a curve model to
an image model (formally; theoretical study by Bellettini, et al., 1992):


E2 [u ] 
D
1
0
Curvature of level sets
( a  b 2 ) | u | dx
d 
g  :u  
 u
 | u
  
( a  b 2 ) ds





n
|

 Connection to the mean curvature flow (Evans-Spruck, 1991):
ut  ( ij 
d
dt
42

D
u xi u x j
| u |
2
)u xi u x j ,
| u | dx     2 | u | dx.
D
Elastica Inpainting Model: Curvature Incorporated
Chan-Kang-Shen (SJAP, 2002; also ref. to Masnou-Morel, 1998 )
E[u | u 0 , D]    ( ) | u |dx 

2  \ D

    n is the curvature,  ( )  a  b 2 .

| u  u 0 |2 dx,
Theorem (associated E.-L. PDE).
t
The gradient descent flow is given by

u
   V   D ( x)(u 0  u ),
t



t
 ( ' ( ) | u |)
V   ( ) n 
,

| u |
t
n
level sets of u
where V is called the flux field , with proper boundary
conditions.
43
Elastica Inpainting: Nonlinear Tranport + CDD
Chan-Kang-Shen (SJAP, 2002 )
 Transport along the isophotes (level sets):

t
 ( ' ( ) | u |)
Tangential component Vt  
,

| u |
t
t
n
 Curvature driven diffusion (CDD) across the isophotes:
Normal component

u
Vn   ( )n   ( )
.
| u |
Conclusion:
Elastica inpainting unifies the earlier work of
Bertalmio, Sapiro, Caselles, and Ballester (SIGGRAPH,
2000) on transport based inpainting, and that of Chan
and Shen (JVCIR, 2001) on CDD inpainting (motivated
by human visual perception).
44
level sets
Elastica Inpainting. I. Smoother Completion
Effect 1: as b/a increases,
connection becomes smoother.
Chan-Kang-Shen (SJAP, 2002 )
45
Euler' s elastica :  ( )  a  b 2 .
Elastica Inpainting. II. Long Distance is Cheaper
Effect 2: as b/a increases,
long distance connection gets cheaper.
Euler' s elastica :  ( )  a  b 2 .
For more theoretical and computational (4th order nonlinear!)
details, please see Chan-Kang-Shen (SJAP, 63(2), 2002).
46
Inpainting Regularized by Mumford-Shah Regulerizer
Chan-Shen (SJAP, 2000), Tsai-Yezzi-Willsky (2001), Esedoglu-Shen (EJAP, 2002)
The Mumford-Shah (1989) image model was initially designed
for the segmentation application:
Ems [u , ]  E[u | ]  E[] 
g
2  \ 
| u |2 dx   length (),
Mumford-Shah based inpainting is to minimize:
Ems [u,  | u 0 , D, K ]  Ems [u, ]  E[u 0 | u, , D, K ].
Inpainting domain possible blurring
A free boundary optimization problem.
47
Mumford-Shah Inpainting:
Algorithm
Esedoglu-Shen (Europ. J. Appl. Math., 2002)
For the current guess of edge completion , find

u
to minimize
Ems [u | , u 0 , D]  E[u | ]  E[u 0 | u, , D]  const .

equivalent to solving the elliptic equation on \:
gu  D ( x)(u 0  u )  0.

This updated guess of
dx
dt
u
then guides the motion of :

 (  [ R] )n.
M. C. Motion
R+
R-
Jump across  of the roughness measure
R  g2 | u |2  2D (u  u 0 ) 2 .
[We can then benefit from the level-set implementation by Chan-Vese.]
48
M.-S. Inpainting via -Convergence Approximation
Esedoglu-Shen (Europ. J. Appl. Math., 2002)
The
-convergence approximation of Ambrosio-Tortorelli (1990):
Ems [u , ] 
g
g
2

\
| u |2 dx   length (),

(1  z ) 2
2
Ee [u, z ]   ( z  o(e )) | u | dx     e | z | 

2 
4e

2
2

dx.

z=1
z=0

edge  is approximated by a signature function z.
49
Esedoglu-Shen shows
that inpainting is the perfect
market for -convergence
Leading to Simple
Elliptic
Implementation
Esedoglu-Shen (2002)
u 0 | \ D given
missing on D

The associated equilibrium PDEs are two coupled elliptic
equations for u and z, with Neuman boundary conditions:
D ( x)(u  u 0 )  g  ( z 2u )  0,
z 1 

(g | u | ) z     2ez 
  0,
2e 

2
which can be solved numerically by any efficient elliptic solver.
50
Applications: Disocclusion and Text Removal
Esedoglu-Shen (2002)
51
inpainted u
the edge signature z
Inpainting domain
inpainted u
Insufficiency of Mumford-Shah Regularity for Inpainting
Defect I:
Artificial corners
Defect II:
Fail to realize
the Connectivity
Principle, like BV.
52
Esedoglu-Shen’s Proposal: Mumford-Shah-Euler
Esedoglu-Shen (2002)
Idea: change the straight-line curve model embedded in the
Mumford-Shah model to Euler’s elastica:
Emse [u, ] 
g
2
\
| u |2 dx  e(),
e()   length ()     2 ds, the elastica energy.

The -convergence approximation (conjecture) of De Giorgi (1991):
Ee [u , z ] 
g
W ( z) 

2
2
2
(
z

o
(
e
))
|

u
|
dx


e
|

z
|


dx
 
2 
4e 


e
2
W ' ( z) 

2
e

z


 dx,
 
4e 
W ( z )  (1  z ) 2 (1  z ) 2 is the double - well potential.
For the technical and computational details, please see Esedoglu-Shen.
53
Inpainting Based on Mumford-Shah-Euler Regularity
Esedoglu-Shen (2002)
Issues:
• Very costly: 4th order PDE
• Many local minima
• Without approximation, it is
difficult to implement geometry
54
Conclusion for Curvature Based Regularization
 Curvature processing has its anatomical vision foundation
 Curvature processing has its cognitive/perceptual foundation
 Curvature based regularization is necessary for
 Image analysis and coding
 Image processing and modeling
 Image computation: algorithms and implementation schemes
 Curvature imposes both theoretical & computational challenges:
non-quadratic objective functionals; nonlinear 3rd and 4th order
PDEs; insufficient theory for wellposedness (existence,
uniqueness, definition domains, etc.) …
55
Statistical Regularization:
Gibbs Fields and Learning
Stochastic View of Images: Random Fields
Geman-Geman (1984), Grenander (1993), Zhu-Mumford (1997)
 An observed image u is treated as a randomly drawn sample.
 The sample space is an ensemble of images with its own (often
unknown) distribution.
 An ensemble equipped with a probability distribution m or p
naturally leads to a random field (R. F.) on the image domain.
 Two distinguished features of such stochastic view:
 We are not interested in the pixelwise details of a particular image,
rather, the key statistical features of the R. F. associated.
 The R. F. is assumed to be ergodic : for a typical sample image u,
spatial statistics converge to ensemble/field statistics.
 In this view, image regularization is built into the distribution m .
57
Stochastic View of Images: Examples
(Shen-Jung, 2003)
(Shen-Jung, 2003)
Reaction-diffusion spots
Reaction-diffusion stripes
Wood texture (Internet download)
Crops (Internet download)
58
Stochastic Regularization: Gibbs Fields
Geman-Geman (1984), Grenander (1993), Zhu-Mumford (1997)
 Stochastic regularity is encoded or specified by the random
field distribution p (u ) .
 If p is a uniform distribution, then there is not much tangible
“information” associated to the images, or in Shannon’s
language, the information content is the lowest (equivalently,
no regularity is in presence as the entropy reaches highest).
 In Grenander’s (founder of Pattern Theory, Brown U) language,
textures are built from basic building elements (“atoms”), and
these “atoms,” like molecules, are bound together by local
regularity energies, leading to informative images.
 Thus Gibbs Fields Model seems natural to characterize such
stochastic regularity: Gibbs Canonical Ensemble/Formula:
p(u ) 
1
Z
exp(  E[u ] / kT )
partition function
59
“regularity energy”
visual “temperature”
Regularization: Visual Potentials and Their Duals
 What can a box of air (molecules) tell us:
 Microscopically never stops fluctuating (lacking regularity)
 Macroscopically there are a few key feature potentials:
o Temperature T
o Pressure p
o Chemical potentials m
 These potentials have their dual (additive) variables
o Energy E, dual to the inverse temperature 1/kT.
o Volume V, dual to the pressure p.
o Mole numbers N, dual to the chemical potential m.
 Gibbs Generalized Ensemble Model says,
Prob (a micro state) = 1/Z .exp(-  E[.] -  p V[.] +  m N[.] ).
 Conclusion: to apply Gibbs regularity in image analysis, one
needs to properly identify visually meaningful potentials and
their duals. (These feature parameters will regularize images.)
60
Example of Gibbs Canonical Images:
Binary Ising Images
 Ising’s Model (1925, for ferromagnetic phase transition originally):
 Binary images on the Cartesian lattice Z2.
 u=u(i,j)=+1 or –1 (spin up or down).
 In an external (biasing) magnetic field H , the energy
associated to each observed field u is:
E[u ]   Jk

H
u
(
a
)
u
(
b
)

k
a ~b
 u(c)
c
o J : internal energy of magnetic dipole (neighbor pair)
o a, b, c : representing general pixels, ~: neighboring.
 But generic (natural) images are not generated from
such clean physics. Challenge is to develop good
models for (generalized) energies, and properly model
short-range or long-range visual interactions.
61
What Features to Regularize: Visual Filters
Zhu-Mumford (1997)
 Treat the human vision system as a system of linear filters: T=
(F1 , F2 , …, FN ).
 Each filter Fn is characterized by its special capability in
resolving a particular orientation q with a particular spatial
frequency w at a particular scale s . That is, Fn’s are
parametrized by the feature vector ( q, w, s ).
 Basic Assumption:
To human vision, the random image fields are completely describable
(at least in satisfactory approximation) by the filter response T[u].
All other features are blindly filtered out, and treated visually
insignificant. [ This is the hidden rule of visual regularization!! ]
 Justification (from Statistical Mechanics):
Though the dimension of the phase space of a box of air molecules is
huge, in equilibrium such a system can be accurately described by
three feature parameters: temperature, pressure, and volume.
62
From Visual Filters to Potentials:
Maximum Entropy Learning of Zhu-Mumford
Zhu-Mumford (1997)
 Following the proceeding basic assumption, a Gibbs image
model would be ONLY based on the statistics of the filter
outputs: T[u] = ( F1 [u], F2 [u], … , FN [u] ).
 But how exactly? Zhu-Mumford’s MEL Model/Scheme:
 Each filter output vn = Fn [u] is by itself a random field.
 In the ideal case, suppose we do know the random field
distributions: qn ( vn ), n =1:N. Then, these can be used as a set of
constraints on the original Gibbs image u. That is, p=p(u) should
lead to Prob( vn ) = qn ( vn ), n =1:N.
 Under these set of constraints, one can use Gibbs variational
formulation of Statistical Mechanics to find the unique Gibbs field
that maximizes the entropy, which necessarily takes the form of
p(u ) 
1
Z
exp (-

n ,v
n
 (n, v n )   F
n
[ u ]v n
).
Dirac’s delta
 In reality, the spatial structure of vn is often ignored and each
is treated as a field of i.i.d.’s. Thus the joint p.d.f qn is a direct
tensor product, and is replaced by its 1-D histogram.
63
Learning of Regularization Will Be Momentous …
 A brief conclusion:
 Statistical regularization is often achieved through
visually meaningful filters and filtering processes.
 The Gibbs field model can be learned based on the
empirical statistics (e.g. histograms) of these filter
outputs of a typical image sample, and the ergodicity
assumption is thus crucial in such learning processes.
 Maximum Entropy based Learning Processes (Melp) will
play an increasingly important role in vision and pattern
analysis.
64
Psychological Regularization:
Role of Weber’s Law: A Case Study
Weber’s Law
Shen (Physica D, 175, 2003)
 Weber’s Law in Psychology:
“Let u denote the mean field of a background (sound/light), and
u the intensity increment just detectable by human
perception (ears/eyes) [or the so-called JND: justnoticeable-difference]. Then u/u = constant.”
 First qualitatively described by German physiologist E. H.
Weber in 1834; Later formulated quantitatively by the great
experimental psychologist G. T. Fechner in 1858.
 Search your own experience for validating Weber’s Law:
 In a fully packed stadium: high bgsound u => have to cry loud to be
effectively heard by other folks (i.e., u has to be high as well);
 “月明星稀” (An ancient Chinese idiom): translated to “In a night with
a bright full moon, the stars always look scarce.”
66
Is Weber’s Law Psychological or Physiological ?
James Keener (Math. Physiology, 1998)
 Jackie Shen’s Theorem: Any commonly shared psychological
phenomenon (among most of the 6,000,000,000 people on this
planet) has to be physiological.
 Weber’s Law expresses the light adaptive capability of the frontal
end of the entire vision system – the two retinas.
 Without Weber’s Law, the retinas would not be able to operate over
a wide range of light intensities: from several photons to bright
solar light, since the membrane potential V of a neuron cell has a
saturated maximum value.
 Weber’s Law is the result of a feedback mechanism (TranchinaPerskin, 1988) of the retina system, which is “implemented” by the
biochemical physiology (ion channels) of the photoreceptors
(McNaughton, 1990).
67
Weber’s Law: Regularization Can Respect Real Perception
Shen (Physica D, vol. 175, 2003)
 What does Weber’s Law have to do with regularization?
 Before Shen (2003), all variational image regularizers in image
and vision analysis are defined by either the Sobolev norm


| u |2 dx
as in the Linear Filtering Theory, and the Mumford-Shah model
(1989), or the Total Variation (TV) Radon measure


| Du |
as in the Rudin-Osher-Fatemi model (Physica D, 1992).
 The fundamental assumption of such regularizers is that “human
visual sensitivity to small fluctuations (or irregularities) depends
on nothing else but themselves.” But this is inappropriate
according to Weber’s Law !
68
Weberized Regularization and Applications
Shen (Physica D, vol. 175, 2003)
 Thus Shen (2003) proposed to Weberize (a word conveniently
coined) the conventional Sobolev or TV regularizers to


or,
| u | 2
dx
u2


| Du |
u
 For example, the Weberized TV denoising and deblurring
model (for additive Gaussian noise) would be to minimize the
energy
| Du |

0
0 2
E[u | u , K ] 


u

2


| Ku  u
|
dx,
where the light intensity field (or image) u is non-negative.
69
Existence and Uniqueness of Weberization
Shen (Physica D, vol. 175, 2003)
E[u | u 0 ] 
| Du |


 u
2


| u  u 0 | 2 dx
 Admissible space of u:
D  {u  0 | u  L2 (), TV (ln u )  , u  12 u 0 }
 Existence Theorem:
Assume that 0  u 0  A and ln u 0  L1loc () . Then there exists at
least one minimizer in the admissible space D.
 Uniqueness Theorem:
0
Assume that u = z(x) in D is a minimizer and z  u ( x) / 2 at
each pixel x. Then z(x) is unique.
70
Weberized TV Restoration: An Example
Minimum irregularities are
adaptively distributed
according to Weber’s Law
rougher
smoother
Weberized new model
still maintains the old
virtue of faithfully
restoring sharp edges.
71
Profile of one horizontal slice
from noisy image
after Weberized TV restoration
One-Sentence Conclusion of Today’s Talk
Regularization is crucial for visual perception, and presents
numerous challenges as well as opportunities for further
statistical and mathematical modeling.
72
That is all, folks…
Thank you for your patience!
73
Acknowledgments
 School of Mathematics and IMA, University of Minnesota (UMN).
 Tony Chan, Stan Osher, Luminita Vese, Selim Esedoglu (UCLA); S.-H. Kang (U.
Kentucky); Yoon-Mo Jung (UMN).
 Gil Strang (my Ph.D. advisor, MIT) for his vision, guide, and support on research.
 David Mumford and Stu Geman (Division Appl Math, Brown U).
 Dan Kersten and Paul Schrater (Psychology & EECS, UMN).
 S. Masnou and J.-M. Morel (France); G. Sapiro and M. Bertalmio (EECS, UMN).
 Fadil Santosa, Peter Olver, Hans Othmer, Bob Gulliver, Willard Miller, Doug
Arnold, Mitch Luskin (Colleagues at Math, UMN).
 National Science Foundations (NSF), Program of Applied Mathematics; Office
of Navy Research (ONR).
 All the generous support and warm words from:
Jayant Shah, Andrea Bertozzi, David Donoho, James Murray, Rachid Deriche.
74
Download