Computational Theories & Low-level
Pixels To Percepts
A. Efros, CMU, Spring 2009
Four Stages of Visual Perception
[Diagram: Light → Vision (Image-based processing → Surface-based processing → Object-based processing → Category-based processing) → "Ceramic cup on a table". Other inputs: Sound → Audition; Odor (etc.). Memory: STM, LTM. Output: Motor → Movement.]
David Marr, 1982
© Stephen E. Palmer, 2002
Four Stages of Visual Perception
The Retinal Image
An Image
(blowup)
Receptor Output
Four Stages of Visual Perception
Retinal
Image
Image-based
Representation
Image-based
processes
Edges
Lines
Blobs
etc.
An Image
(Line Drawing)
Primal Sketch
(Marr)
We likely throw away a lot
line drawings are universal
Four Stages of Visual Perception
Image-based
Representation
Surface-based
Representation
Surface-based
processes
Stereo
Shading
Motion
etc.
Primal Sketch
2.5-D Sketch
Single Surface
(Koenderink’s trick)
Figure/Ground Organization
A contour belongs to one of the two (but not both) abutting regions.
Figure (face) / Ground (shapeless)
Ground (shapeless) / Figure (goblet)
Important for the perception of shape
Figure-Ground Organization
Properties of figures vs. grounds
Figure: thing-like, closer, shaped
Ground: not thing-like, farther, extends behind
Figure-Ground Organization
Principles of figure-ground organization:
Surroundedness
Surrounded region --> Figure
Surrounding region --> Ground
Figure-Ground Organization
Principles of figure-ground organization:
Size
Smaller region --> Figure
Larger region --> Ground
Figure-Ground Organization
Principles of figure-ground organization:
Orientation
Horizontal/vertical region --> Figure
Oblique region --> Ground
Figure-Ground Organization
Principles of figure-ground organization:
Contrast
Higher contrast region --> Figure
Lower contrast region --> Ground
Figure-Ground Organization
Principles of figure-ground organization:
Symmetry
Symmetrical region --> Figure
Asymmetrical region --> Ground
Figure-Ground Organization
Principles of figure-ground organization:
Convexity
More convex region --> Figure
Less convex region --> Ground
Figure-Ground Organization
Principles of figure-ground organization:
Parallelism
More parallel region --> Figure
Less parallel region --> Ground
Figure-Ground Organization
Principles of figure-ground organization:
Lower region
Lower region --> Figure
Upper region --> Ground
Figure-Ground Organization
Principles of figure-ground organization:
Meaningfulness
More meaningful region --> Figure
Less meaningful region --> Ground
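Some of the image-based principles above (size, surroundedness, lower region) can be turned into a toy voting rule. The sketch below is only an illustration: the function name `figure_ground_votes`, the equal weighting of cues, and the use of image-border contact as a proxy for surroundedness are all assumptions, not part of Palmer's account.

```python
import numpy as np

def figure_ground_votes(labels):
    """Vote on which of two regions (labels 0 and 1) is the figure.

    Each cue casts one vote; equal weighting is an assumption.
    """
    votes = {0: 0, 1: 0}
    masks = {k: labels == k for k in (0, 1)}

    # Size: the smaller region tends to be seen as figure.
    areas = {k: int(m.sum()) for k, m in masks.items()}
    if areas[0] != areas[1]:
        votes[min(areas, key=areas.get)] += 1

    # Surroundedness (proxy): the region touching less of the image
    # border is more likely to be surrounded by the other one.
    border = np.ones_like(labels, dtype=bool)
    border[1:-1, 1:-1] = False
    contact = {k: int((m & border).sum()) for k, m in masks.items()}
    if contact[0] != contact[1]:
        votes[min(contact, key=contact.get)] += 1

    # Lower region: the region whose centroid sits lower in the image
    # (larger row index) tends to be seen as figure.
    rows = np.arange(labels.shape[0])[:, None]
    mean_row = {k: float((rows * m).sum() / max(areas[k], 1))
                for k, m in masks.items()}
    if mean_row[0] != mean_row[1]:
        votes[max(mean_row, key=mean_row.get)] += 1

    return votes

# A small square (label 1) in the lower half of a larger field (label 0).
labels = np.zeros((10, 10), dtype=int)
labels[6:9, 3:6] = 1
votes = figure_ground_votes(labels)
figure = max(votes, key=votes.get)
```

In this example all three cues agree on the small square. In real images the cues often conflict, which is why a fixed vote count like this is at best a starting point.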
Figure-Ground Organization
Relation to Depth Factors
Figure-ground organization as edge assignment:
To which side does the edge belong?
To the closer side. This fact connects figure-ground
organization with depth perception.
Depth cues can also be figure-ground factors
and
Figure-ground factors can be depth cues.
Figure-Ground Organization
Principles of figure-ground organization:
Occlusion
Occluding region --> Figure
Occluded region --> Ground
Figure-Ground Organization
Principles of figure-ground organization:
Cast Shadows
Shadowing region --> Figure
Shadowed region --> Ground
Figure-Ground Organization
Principles of figure-ground organization:
Shading
Shaded region --> Figure
Nonshaded region --> Ground
Line Labeling
[Clowes 1971, Huffman 1971; Waltz 1972; Malik 1986]
Constraint
Propagation
> : contour direction
+ : convex edge
- : concave edge
possible junctions
(constraints)
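Constraint propagation over junction labels can be sketched as repeated pruning: a junction labeling survives only if every edge it shares with a neighbor can take the same label there. The code below is a minimal illustration under loud assumptions: the junction catalogue is a made-up toy (it is not the Huffman-Clowes catalogue), only the "+" and "-" labels are used, and the occluding-contour arrows are left out because their direction flips when seen from the other end of an edge. The name `waltz_filter` is hypothetical.

```python
def waltz_filter(junction_edges, candidates):
    """Prune junction labelings until every shared edge is consistent.

    junction_edges: {junction: tuple of edge names, in a fixed order}
    candidates:     {junction: set of label tuples, one label per edge}
    """
    candidates = {j: set(c) for j, c in candidates.items()}
    changed = True
    while changed:
        changed = False
        for j, edges_j in junction_edges.items():
            for k, edges_k in junction_edges.items():
                if j == k:
                    continue
                for e in set(edges_j) & set(edges_k):
                    i_j, i_k = edges_j.index(e), edges_k.index(e)
                    # Labels that junction k still allows on edge e.
                    allowed = {lab[i_k] for lab in candidates[k]}
                    keep = {lab for lab in candidates[j] if lab[i_j] in allowed}
                    if keep != candidates[j]:
                        candidates[j] = keep
                        changed = True
    return candidates

# Toy problem: three junctions in a chain, A -e1- B -e2- C.
junction_edges = {"A": ("e1",), "B": ("e1", "e2"), "C": ("e2",)}
candidates = {
    "A": {("+",)},                              # A forces e1 to be convex
    "B": {("+", "+"), ("-", "-"), ("-", "+")},  # toy catalogue for B
    "C": {("+",), ("-",)},
}
result = waltz_filter(junction_edges, candidates)
```

The constraint at A propagates through B to C, leaving a single labeling at each junction. Pruning of this kind only enforces local consistency, so in general a search over the remaining candidates may still be needed.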
Line Labeling
Four Stages of Visual Perception
Object-based
Representation
Surface-based
Representation
Object-based
processes
Grouping
Parsing
Completion
etc.
2.5-D Sketch
Volumetric Sketch
Geons
(Biederman '87)
Four Stages of Visual Perception
Category-based
Representation
Object-based
Representation
Category-based
processes
Category: cup
Color: light-gray
Pattern recognition
Size: 6”
Location: table
Spatial description
Volumetric Sketch
Basic-level Category
We likely throw away a lot
line drawings are universal
However, things are not so simple…
● Problems with the feed-forward model of processing…
Junctions in Real Images
Are Junctions local evidence?
J McDermott, 2004
Early vs. Late Grouping
Is grouping an early or late process?
[Diagram: Light → Image-based processing → Surface-based processing → Object-based processing → Category-based processing, with "?" marking each stage as a candidate site for grouping]
Early vs. Late Grouping
Before or after stereoscopic depth?
(Rock & Brosgole, 1964)
Early vs. Late Grouping
Before or after lightness constancy?
Opaque
paper strip
(Rock, Nijhawan, Palmer & Tudor, 1992)
Early vs. Late Grouping
Before or after visual completion?
(Palmer, Neff & Beck, 1996)
Early vs. Late Grouping
Before or after illusory contours?
?
(Palmer & Nelson, 2000)
Early vs. Late Grouping
Conclusion: Grouping can occur “late”
Question: Can grouping also occur “early”?
(Palmer & Brooks, in preparation)
Early vs. Late Grouping
Grouping affects shape constancy
Ambiguous
Circle in depth
Flat oval
(Palmer & Brooks, in preparation)
Early vs. Late Grouping
Proximity effects
Biased toward oval
Biased toward circle
Early vs. Late Grouping
Color similarity effects
Biased toward oval
Biased toward circle
Early vs. Late Grouping
Common fate effects
Biased toward oval
Biased toward circle
Early vs. Late Grouping
Conclusion: Grouping occurs both “early”
and “late” -- possibly everywhere!
[Diagram: Light → Image-based processing → Surface-based processing → Object-based processing → Category-based processing, with "Grouping" marked at every stage]
two-tone images
“attached shadow” contour
hair (not shadow!)
“cast shadow” contour
inferred external contours
Cavanagh's argument
Finding 3D structure in two-tone images requires distinguishing cast shadows, attached shadows, and areas of low reflectivity.
The images do not contain this information a priori (at the low level).
A Classical View of Vision
High-level
Object and Scene
Recognition
Figure/Ground
Organization
Mid-level
Grouping /
Segmentation
Low-level
pixels, features,
edges, etc.
A Contemporary View of Vision
High-level
Mid-level
Object and Scene
Recognition
Figure/Ground
Organization
Grouping /
Segmentation
But where do we
draw this line?
Low-level
pixels, features,
edges, etc.
Question #1:
What (if anything) should be done at
the “Low-Level”?
N.B. I have already told you everything
that is known. From now on, there
aren’t any answers. Only questions…
Who cares? Why not just use pixels?
Pixel differences vs. Perceptual differences
Eye is not a photometer!
"Every light is a shade, compared to the higher
lights, till you come to the sun; and every
shade is a light, compared to the deeper
shades, till you come to the night."
— John Ruskin, 1879
Cornsweet Illusion
Sine wave
Campbell-Robson contrast sensitivity curve
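A Campbell-Robson style chart is easy to synthesize, which makes the "eye is not a photometer" point concrete: the pixel contrast in each row is constant, yet the visible boundary of the grating rises and falls with spatial frequency. The sketch below is an assumption-laden illustration; the function name `campbell_robson` and all parameter values (image size, frequency range, contrast range) are arbitrary choices, not taken from the slides.

```python
import numpy as np

def campbell_robson(width=512, height=256, f_min=1.0, f_max=64.0,
                    c_min=0.005, c_max=1.0):
    """Sinusoidal grating: frequency grows along x, contrast grows along y.

    Frequencies are in cycles per image width (assumed units).
    Returns an array of shape (height, width) with values in [0, 1].
    """
    x = np.linspace(0.0, 1.0, width)
    ratio = f_max / f_min
    # Instantaneous frequency is f_min * ratio**x (logarithmic sweep);
    # the phase is 2*pi times its integral from 0 to x.
    phase = 2.0 * np.pi * f_min * (ratio ** x - 1.0) / np.log(ratio)
    # Contrast grows logarithmically from c_min (top row) to c_max (bottom row).
    y = np.linspace(0.0, 1.0, height)[:, None]
    contrast = c_min * (c_max / c_min) ** y
    return 0.5 + 0.5 * contrast * np.sin(phase)[None, :]

chart = campbell_robson()
```

Displaying `chart` as a grayscale image should show the grating fading out at different heights for different frequencies, even though every row has the same contrast at every column.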
Metamers
Question #1:
What (if anything) should be done at
the “Low-Level”?
i.e. What input stimulus should we be
invariant to?
Invariant to:
• Brightness / Color changes?
low-frequency changes
small brightness / color changes
But one can be too invariant
Invariant to:
• Edge contrast / reversal?
I shouldn’t care what background I am on!
but be careful of exaggerating noise
Representation choices
Raw Pixels
Gradients:
Gradient Magnitude:
Thresholded gradients (edge + sign):
Thresholded gradient mag. (edges):
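The representation choices listed above can be sketched in a few lines of NumPy. This is a minimal sketch, assuming central-difference gradients and an arbitrary threshold of 0.1; the function name `representations` and the dictionary keys are illustrative, not from the slides. The example also ties back to the edge contrast / reversal question: gradient magnitude and the thresholded edge map are unchanged when the contrast is reversed, while raw pixels and signed gradients are not.

```python
import numpy as np

def representations(image, threshold=0.1):
    """Compute the five candidate low-level representations of an image."""
    image = np.asarray(image, dtype=float)
    # np.gradient returns derivatives along rows (y) first, then columns (x).
    gy, gx = np.gradient(image)
    magnitude = np.hypot(gx, gy)
    return {
        "pixels": image,
        "gradients": (gx, gy),
        "magnitude": magnitude,
        # Edge + sign: -1, 0 or +1 per axis, zero where the gradient is weak.
        "signed_edges": (np.sign(gx) * (np.abs(gx) > threshold),
                         np.sign(gy) * (np.abs(gy) > threshold)),
        # Edges only: boolean map, sign discarded.
        "edges": magnitude > threshold,
    }

# A vertical step edge and its contrast-reversed version.
step = np.zeros((4, 6))
step[:, 3:] = 1.0
reversed_step = 1.0 - step

a = representations(step)
b = representations(reversed_step)
same_edges = np.array_equal(a["edges"], b["edges"])            # invariant
flipped_sign = np.array_equal(a["signed_edges"][0],
                              -b["signed_edges"][0])           # not invariant
```

Each step down the list discards more information, so it buys more invariance at the cost of discriminability ("one can be too invariant").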
Spatial invariance
• Rotation, Translation, Scale
• Yes, but not too much…
• In brain: complex cells – partial invariance
• In Comp. Vision: histogram-binning methods
(SIFT, GIST, Shape Context, etc.) or,
equivalently, blurring (e.g. Geometric Blur); will discuss later
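The "yes, but not too much" trade-off can be illustrated with spatial binning alone. The sketch below sums a feature map over coarse cells; it is a deliberately simplified stand-in and is not SIFT, GIST, or Shape Context (which also bin orientation or use other layouts). The function name `pool_cells` and the cell size of 4 are assumptions.

```python
import numpy as np

def pool_cells(feature_map, cell=4):
    """Sum a 2-D feature map over non-overlapping cell x cell blocks."""
    h, w = feature_map.shape
    h_c, w_c = h // cell, w // cell
    trimmed = feature_map[:h_c * cell, :w_c * cell]
    return trimmed.reshape(h_c, cell, w_c, cell).sum(axis=(1, 3))

# A single edge response, then the same response shifted a little and a lot.
edge = np.zeros((8, 8))
edge[1, 1] = 1.0
small_shift = np.roll(edge, 1, axis=1)   # moves to (1, 2): same cell
large_shift = np.roll(edge, 4, axis=1)   # moves to (1, 5): next cell

pooled = pool_cells(edge)
pooled_small = pool_cells(small_shift)
pooled_large = pool_cells(large_shift)
```

The pooled descriptor ignores the one-pixel shift but still registers the four-pixel shift, which is the partial invariance the slide asks for. Hard cell boundaries are a weakness of this sketch: a feature sitting next to a boundary changes cells after a tiny shift, which is one motivation for soft binning or blurring.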
Many lives of a boundary
Often, context-dependent…
[Figure panels: input, Canny, human]
Maybe low-level is never enough?
1/f amplitude spectra for natural images
There are statistical regularities in
the natural world, and image
statistics reflect that.
(Burton & Moorhead 1987; Field 1987; Tolhurst et al. 1992)
(Field 1987)
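The 1/f claim can be checked numerically by radially averaging the Fourier amplitude and fitting a line on log-log axes. The sketch below is an illustration under assumptions: the helper names are hypothetical, and the test image is synthetic noise filtered to have a 1/f amplitude spectrum, not a natural photograph.

```python
import numpy as np

def radial_amplitude_spectrum(image):
    """Radially averaged Fourier amplitude, indexed by integer frequency radius."""
    h, w = image.shape
    amp = np.abs(np.fft.fftshift(np.fft.fft2(image)))
    fy = np.arange(h) - h // 2          # zero frequency sits at index h // 2
    fx = np.arange(w) - w // 2
    radius = np.hypot(fy[:, None], fx[None, :]).round().astype(int)
    sums = np.bincount(radius.ravel(), weights=amp.ravel())
    counts = np.bincount(radius.ravel())
    return sums / counts

def spectral_slope(image):
    """Slope of log amplitude versus log frequency (about -1 for 1/f images)."""
    profile = radial_amplitude_spectrum(image)
    max_r = min(image.shape) // 2
    freqs = np.arange(1, max_r)         # skip DC, stay inside the Nyquist circle
    slope, _ = np.polyfit(np.log(freqs), np.log(profile[1:max_r]), 1)
    return slope

# Synthetic images: white noise, and the same noise with amplitude scaled by 1/f.
rng = np.random.default_rng(0)
n = 128
f = np.hypot(np.fft.fftfreq(n)[:, None], np.fft.fftfreq(n)[None, :])
f[0, 0] = 1.0                           # avoid division by zero at DC
white = rng.standard_normal((n, n))
pink = np.real(np.fft.ifft2(np.fft.fft2(white) / f))

slope_white = spectral_slope(white)     # close to 0
slope_pink = spectral_slope(pink)       # close to -1
```

Running `spectral_slope` on a grayscale natural photograph would be the actual test of the slide's claim; the reported slopes vary somewhat from image to image.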
Why 1/f?
Scale invariance
Edges have 1/f structure
Object distribution in real world
(Ruderman 1997; Lee & Mumford 1999)
(Image source: smokiesguidebook.com
Slide content: Simoncelli & Olshausen 2001)
A closer look at amplitude spectra
(Torralba & Oliva 2003)
Do natural image statistics matter?
Sensory coding might exploit statistical regularities of
our world according to various criteria:
Representational efficiency
Decorrelate input responses, make them independent, sparse,
information theoretic metrics etc.
Metabolic efficiency
Spike efficiency, minimal wiring.
Learning efficiency
Sparseness, invariance, overcompleteness etc.
Lots and lots of work; see reviews Graham & Field
(2007), Simoncelli & Olshausen (2001)
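As one concrete reading of "decorrelate input responses", the sketch below applies PCA whitening to a pair of strongly correlated values, standing in for neighboring pixels. It is a toy under stated assumptions: the data are synthetic, the function name `whiten` is hypothetical, and whitening is only one of the efficiency criteria listed above (it removes second-order correlations and does not by itself give independence or sparseness).

```python
import numpy as np

def whiten(samples, eps=1e-8):
    """PCA-whiten the rows of `samples` (n_samples x n_features)."""
    centered = samples - samples.mean(axis=0)
    cov = centered.T @ centered / len(centered)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Rotate onto the principal axes, then rescale each axis to unit variance.
    return centered @ eigvecs / np.sqrt(eigvals + eps)

rng = np.random.default_rng(0)
# Two strongly correlated "pixel" values (synthetic stand-in for neighbors).
base = rng.standard_normal(5000)
pixels = np.stack([base, 0.9 * base + 0.1 * rng.standard_normal(5000)], axis=1)

corr_before = np.corrcoef(pixels.T)[0, 1]
whitened = whiten(pixels)
cov_after = whitened.T @ whitened / len(whitened)   # approximately the identity
```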