Using geometry and related things

Qualitative → Explicit (more quantitative, more precise):
• Region labels
• + Boundaries and objects
• Stronger geometric constraints from domain knowledge
• Reasoning on aspects and poses
• 3D point clouds
[D. Hoiem, A. A. Efros, and M. Hebert. Recovering surface layout from an image. IJCV, 75(1):151–172, 2007]
What level of representation?
How qualitative?
What type of training information is available?
Assumptions (camera geometry, etc…)?
Learning from image features to depth + MRF: A. Saxena et al. 3-D depth reconstruction from a single still image. IJCV, 76, 2007.
Make3D: Learning 3D Scene Structure from a Single Still Image: A. Saxena, et al. TPAMI, 2010.
Stage classes: Nedovic, V., Smeulders, A., Redert, A., Geusebroek, J.: Stages as models of scene geometry. In: PAMI (2010)
Next….
Can coarse surface labels be used for improving object
recognition and scene analysis performance through
better geometric reasoning?
Object Detection
Surface Estimates
Viewpoint Prior
Local Car Detector
From geometry to objects and back?
Distributions versus decisions?
D. Hoiem, A. Efros, M. Hebert. Putting objects in perspective. IJCV 2009
Local Ped Detector
S.Y. Bao, M. Sun, S. Savarese. Toward Coherent Object Detection And Scene Layout Understanding. CVPR 2010.
B. Leibe, N. Cornelis, K. Cornelis, and L. Van Gool. Dynamic 3D Scene Analysis from a Moving Vehicle. CVPR 2007.
• Is a more precise representation possible?
• Boundaries, interposition, relative depth ordering...
• Still low-level: Can we combine reasoning about semantic labels
Qualitative → Explicit (more quantitative, more precise):
• Region labels
• + More geom. relations and semantic labels
• Stronger geometric constraints from domain knowledge
• Reasoning on aspects and poses
• 3D point clouds
Iterative refinement
Iter 1
Iter 2
Toward true integration of geometric and semantic cues: How to beat the intractable nature of the problem?
Global optimization
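F(D | I, L) = \sum_p \psi_p(d_p) + \sum_{pqr} \psi_{pqr}(d_p, d_q, d_r) + \sum_{pg} \psi_{pg}(d_p, d_g)
F(\alpha | I, L, S) = \sum_i \psi_i(\alpha_i) + \sum_i \psi_g(\alpha_i) + \sum_{ij} \psi_{ij}(\alpha_i, \alpha_j), with plane parameters satisfying \alpha_i^T x = 1
What features/cues can be used?
D. Hoiem, A. A. Efros, and M. Hebert. Closing the loop on scene interpretation. In CVPR, 2008
Final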
B. Liu, S. Gould, D. Koller. Single image depth estimation from predicted semantic labels. CVPR 2010
S. Gould, R. Fulton, D. Koller. Decomposing a scene into geometric and semantically consistent regions. ICCV 2009
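A minimal sketch of how an energy of the form F(D | I, L) can be evaluated, assuming a 1-D depth profile. The quadratic potentials, the weights `lam_smooth` and `lam_geom`, and the reference-pixel array `g` are all illustrative assumptions; they are not the potentials used in the cited papers.

```python
import numpy as np

def depth_energy(d, d_pred, g, lam_smooth=1.0, lam_geom=0.5):
    """Energy of a 1-D depth profile d under three hypothetical terms.

    data term:     (d_p - d_pred_p)^2 for every pixel p
    smoothness:    (d_p - 2*d_q + d_r)^2 for consecutive triples (p, q, r),
                   which is zero when the three depths are collinear
    geometry term: (d_p - d_g)^2, tying pixel p to a reference pixel g[p]
    """
    d = np.asarray(d, dtype=float)
    d_pred = np.asarray(d_pred, dtype=float)
    g = np.asarray(g, dtype=int)

    data = np.sum((d - d_pred) ** 2)
    smooth = np.sum((d[:-2] - 2.0 * d[1:-1] + d[2:]) ** 2)
    geom = np.sum((d - d[g]) ** 2)
    return float(data + lam_smooth * smooth + lam_geom * geom)

# A linear profile that matches its prediction and references itself costs nothing.
e_flat = depth_energy([1, 2, 3, 4], [1, 2, 3, 4], [0, 1, 2, 3])

# A profile that deviates from the prediction and from its reference pixel is penalized.
e_bent = depth_energy([1, 2, 4], [1, 2, 3], [0, 0, 0])
```

A global optimizer would search over `d` to minimize this value; the sketch only shows how a candidate depth map is scored.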
Qualitative → Explicit (more quantitative, more precise):
• Region labels
• + Boundaries and objects
• Stronger geometric constraints from domain knowledge
• Reasoning on aspects and poses
• 3D point clouds
Limitations so far:
• Still mostly bottom-up classification approach
• No use of domain constraints or constraints governing the physical world
Score
How to generate and search through hypotheses
(in a tractable manner)?
How to evaluate score?
How to avoid early decisions?
How to represent constraints in a more general
way?
D. Lee, T. Kanade, M. Hebert. Geometric Reasoning for Single Image Structure Recovery. CVPR 2009.
Lines
Faces
V. Hedau, D. Hoiem, D. Forsyth, “Recovering the Spatial Layout of Cluttered Rooms,” IEEE International Conference on Computer Vision (ICCV), 2009.
H. Wang, S. Gould, D. Koller. Discriminative learning with latent variables for cluttered indoor scene understanding. ECCV 2010.
Classifiers
f(x, y, w) = w^T \psi(x, y)
• w: learned weight vector
• \psi(x, y): feature vector measuring agreement between lines, faces and labels
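A minimal sketch of how a linear score of this form can rank layout hypotheses. The feature function is called `psi` here as an assumed name, and the weights and feature values are made-up numbers for illustration; in the cited work the weights are learned.

```python
import numpy as np

def score(w, psi_xy):
    """Linear score w^T psi(x, y) for one hypothesis y."""
    return float(np.dot(w, psi_xy))

def best_hypothesis(w, features):
    """Return the hypothesis name with the highest score.

    features maps a hypothesis name to its (precomputed) feature vector psi(x, y).
    """
    return max(features, key=lambda y: score(w, features[y]))

# Hypothetical weights and agreement features for two room layouts.
w = np.array([1.0, 2.0, -1.0])
features = {
    "layout_a": np.array([0.9, 0.2, 0.5]),  # score 0.8
    "layout_b": np.array([0.4, 0.8, 0.1]),  # score 1.9
}
winner = best_hypothesis(w, features)
```

Keeping the score linear in `w` is what makes structured learning of the weights practical; the hard part is generating the candidate set.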
Qualitative → Explicit (more quantitative, more precise):
• Region labels
• + Boundaries and objects
• Stronger geometric constraints from domain knowledge
• + more constraints
• 3D point clouds
• Finite volume
• Spatial exclusion
• Containment
• Stability
• Contact
• Proximity
• ...
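The first three constraints above can be sketched as simple tests on boxes. This is a minimal illustration assuming axis-aligned boxes in a shared 3D frame; the `Box` class and all function names are hypothetical, and stability, contact and proximity are not modeled.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Box:
    lo: tuple  # (x, y, z) minimum corner
    hi: tuple  # (x, y, z) maximum corner

def has_finite_volume(b):
    """Finite volume: the box has positive extent along every axis."""
    return all(l < h for l, h in zip(b.lo, b.hi))

def overlaps(a, b):
    """True if the interiors intersect (touching faces do not count)."""
    return all(al < bh and bl < ah
               for al, ah, bl, bh in zip(a.lo, a.hi, b.lo, b.hi))

def contains(outer, inner):
    """Containment: inner lies entirely within outer."""
    return all(ol <= il and ih <= oh
               for ol, oh, il, ih in zip(outer.lo, outer.hi, inner.lo, inner.hi))

def is_valid(room, objects):
    """Check finite volume, containment in the room, and spatial exclusion."""
    if not all(has_finite_volume(b) for b in [room, *objects]):
        return False
    if not all(contains(room, o) for o in objects):
        return False
    return not any(overlaps(a, b) for a, b in combinations(objects, 2))

room = Box((0, 0, 0), (5, 4, 3))
bed = Box((0, 0, 0), (2, 3, 1))
table = Box((3, 0, 0), (4, 1, 1))
intruder = Box((1, 1, 0), (3.5, 2, 1))   # interpenetrates the bed
outside = Box((4, 3, 0), (6, 5, 1))      # sticks out of the room
```

Tests like these are cheap, so they can be used to reject hypotheses before any image-based scoring is done.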
• Search through hypothesis space
Input image
Line segments and
Vanishing points
Room hypotheses
Reject invalid
configurations
Geometric context
Orientation map
Object hypotheses
f(x, y) = w^T \psi(x, y) + w_\phi^T \phi(y)
• x: features from image (surface labels, vanishing points, etc.)
• y: hypothesis (scene layout + object hypothesis)
• w^T \psi(x, y): compatibility of image data with geometric configuration
• w_\phi^T \phi(y): penalty term for incompatible configurations
D. Lee, A. Gupta, M. Hebert, and T. Kanade. Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces. Advances in Neural Information Processing Systems (NIPS), Vol. 24, 2010.
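A minimal sketch of the search described on this slide: enumerate room and object hypotheses, reject invalid configurations, and keep the configuration with the best score, where the score adds a compatibility term and a penalty term. The function names, the toy feature functions, and all numbers are assumptions for illustration, and exhaustive enumeration is only feasible for tiny hypothesis sets.

```python
from itertools import combinations
import numpy as np

def search(rooms, objects, psi, phi, w, w_phi, is_valid, max_objects=2):
    """Exhaustively score (room, object subset) hypotheses and return the best."""
    best, best_score = None, -np.inf
    for room in rooms:
        for k in range(max_objects + 1):
            for objs in combinations(objects, k):
                y = (room, objs)
                if not is_valid(y):          # reject invalid configurations
                    continue
                s = float(np.dot(w, psi(y)) + np.dot(w_phi, phi(y)))
                if s > best_score:
                    best, best_score = y, s
    return best, best_score

# Toy setup (hypothetical): image evidence is folded into these lookup tables.
room_fit = {"r1": 0.5, "r2": 0.9}
obj_fit = {"bed": 0.8, "desk": 0.2}

def psi(y):   # compatibility of the image data with the configuration
    room, objs = y
    return np.array([room_fit[room], sum(obj_fit[o] for o in objs)])

def phi(y):   # penalty features: here simply the number of objects
    return np.array([float(len(y[1]))])

def is_valid(y):  # pretend the bed does not fit inside room r2
    return not (y[0] == "r2" and "bed" in y[1])

best, best_score = search(["r1", "r2"], ["bed", "desk"], psi, phi,
                          w=np.array([1.0, 1.0]), w_phi=np.array([-0.3]),
                          is_valid=is_valid)
```

Note that room `r2` fits the image better on its own, but the joint hypothesis with the bed in `r1` wins, which is the point of scoring layout and objects together.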
Figure (blocks-world pipeline): Original Image; Surface Layout; Density Map; Bag of Segments; Catalogue (Frontal, Front-Right, Front-Left, Left-Right, Left-Occluded, Right-Occluded, Porous, Solid); 3D Parse Graph with sky and ground nodes, density labels (Medium, High) and relations (above, in front, supported, point-supported).
A. Gupta, A. Efros, and M. Hebert. Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics. ECCV 2010.
http://www.cs.cmu.edu/~abhinavg/blocksworld
• Direct search through hypothesis space
• Sampling
L. Del Pero, J. Guan, E. Brau, J. Schlecht, K. Barnard. Sampling Bedrooms. CVPR 2011.
Diffusion moves:
• Sample room boundary: change r = (x, y, z, w, h, l, …)
• Sample camera: change c = (f, …)
• Sample object parameters: change o = (x, y, z, w, h, l)
• Sample over a block edge only
Jump moves:
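The diffusion moves listed above perturb continuous parameters such as an object box o = (x, y, z, w, h, l). A minimal sketch of one way to do this is a random-walk Metropolis sampler, shown below; this is a generic technique swapped in for illustration, not the sampler of the cited paper, and `toy_log_post` is a made-up stand-in for an image-based posterior. Jump moves, which change the number or type of objects, are not covered.

```python
import math
import random

def metropolis_diffusion(theta, log_post, n_steps=2000, step=0.1, seed=0):
    """Random-walk Metropolis over a parameter vector; returns the best sample seen."""
    rng = random.Random(seed)
    theta = list(theta)
    cur_lp = log_post(theta)
    best, best_lp = list(theta), cur_lp
    for _ in range(n_steps):
        proposal = list(theta)
        i = rng.randrange(len(proposal))        # perturb one parameter at a time
        proposal[i] += rng.gauss(0.0, step)
        lp = log_post(proposal)
        # Accept with probability min(1, exp(lp - cur_lp)).
        if rng.random() < math.exp(min(0.0, lp - cur_lp)):
            theta, cur_lp = proposal, lp
            if lp > best_lp:
                best, best_lp = list(proposal), lp
    return best, best_lp

# Hypothetical target box (x, y, z, w, h, l) and a toy log-posterior peaked on it.
target = [1.0, 2.0, 0.0, 3.0, 2.5, 4.0]

def toy_log_post(o):
    return -sum((a - b) ** 2 for a, b in zip(o, target))

start = [0.0] * 6
best, best_lp = metropolis_diffusion(start, toy_log_post)
```

The same loop applies to room or camera parameters by swapping the parameter vector and the log-posterior.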
• Direct search through hypothesis space
• Sampling
• (Constrained) object detection
P(O_1, ..., O_N, L, H | I) = P(H) P(L | H, I) \prod_i P(O_i | L, H, I)
V. Hedau, D. Hoiem, D. Forsyth, “Thinking Inside the Box: Using Appearance Models and Context Based on Room Geometry,”
European Conference on Computer Vision (ECCV), 2010.
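A minimal sketch of evaluating this factorization in log space, which avoids underflow when many object terms are multiplied. The probability values are made-up inputs; how each factor is estimated from the image is outside the scope of the sketch.

```python
import math

def log_joint(p_h, p_l_given_h, p_objs):
    """log P(O_1..O_N, L, H | I) = log P(H) + log P(L | H, I) + sum_i log P(O_i | L, H, I)."""
    return math.log(p_h) + math.log(p_l_given_h) + sum(math.log(p) for p in p_objs)

# Hypothetical factor values for one scene hypothesis with two objects.
lj = log_joint(p_h=0.5, p_l_given_h=0.4, p_objs=[0.9, 0.5])
joint = math.exp(lj)  # 0.5 * 0.4 * 0.9 * 0.5
```

Because each object term is conditioned on the layout L and hypothesis H, detections that are implausible for the room geometry lower the joint score even if their local appearance score is high.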
• Direct search through hypothesis space
• Sampling
• Object detection
• Grammars
Figure labels: Truth, Max; classes: Window, Wall, Balcony, Door, Roof
L. Simon, O. Teboul, P. Koutsourakis and N. Paragios. Random Exploration of the Procedural Space for Single-View 3D Modeling of Buildings. International Journal of Computer Vision (IJCV), 2010.
O. Teboul, I. Kokkinos, P. Koutsourakis, L. Simon and N. Paragios. Shape Grammar Parsing via Reinforcement Learning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
• Direct search through hypothesis space
• Sampling
• Object detection
• Grammars
What constraints?
How to combine low level classifiers with “higher-level” reasoning?
What are the right tools to search through and score hypotheses?
How to represent partial interpretations without early
commitment?
G. Tsai et al. Realtime Indoor Scene Understanding using Bayesian Filtering with Motion Cues. ICCV 2011.
Qualitative → Explicit (more quantitative, more precise):
• Region labels
• + Boundaries and objects
• + sparse/partial 3D data
• Stronger geometric constraints from domain knowledge
• + more constraints
• + other constraints
• + (large) prior data
• 3D point clouds
• What level of representation? Do we need explicit
parse of the input or how far can we go with
associations?
– Hierarchical, region labels, associations,...
• How to incorporate knowledge/context external to
the input image?
– Task, geometry, contextual info (scene type,
location), text, ...
• Should we use global models vs. sequences of
simpler models? Is the problem too hard as posed,
i.e., intractable?
• What are the right definitions of actions, activities,
behaviors?
• How to combine temporal (actions) and spatial
(scenes, objects) information effectively?
• What should be evaluated, and at what stage?
– bounding boxes, pixelwise labels, 3D models,
actions/behaviors, predictions?