Object Recognition

Tom McGrath
CIS 601
What is object recognition?
Perception of objects is different for
humans than for computers.
For humans: perception of familiar items.
For computers: perception of familiar
patterns.
Are they really the same thing?
What do we mean by ‘objects’?
What we call object recognition may also
be called pattern recognition.
A pattern is an arrangement of descriptors.
Descriptors can take other forms, but
they are primarily vectors and strings.
More generally…
Object recognition is the process whereby
observers are able to recognize three-dimensional
objects despite receiving only
two-dimensional input that varies greatly
depending on viewing conditions.(2)
2 main approaches
Decision-theoretic
– Patterns described using quantitative
descriptors.
Structural
– Patterns represented by symbolic information.
– Strings, for example.
Decision Theoretic
Based on discriminant functions
Let x = (x1, x2, …, xn)^T represent an
n-dimensional pattern vector.
Let w1, w2, …, wW be the W pattern
classes.
Basic problem of decision-theoretic
We want to find W decision functions d1(x),
d2(x), …, dW(x) with the property:
If a pattern x belongs to class wi, then
di(x) > dj(x),
where j = 1, 2, …, W; j != i
In other words..
We want to classify x, which is a pattern.
We are given a finite set of classes of
objects.
We want to categorize the pattern x into
one of the classes.
To do so, we apply x to all decision
functions, and categorize x to the class of
best fit.
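A minimal sketch of the "apply x to all decision functions" step. The slides do not name a particular discriminant; this assumes a minimum-distance classifier, where each class wi is represented by its mean vector mi and di(x) = x·mi - 0.5 mi·mi. The training samples and function names are made up for illustration.

```python
def mean_vector(samples):
    """Component-wise mean of a list of equal-length pattern vectors."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]

def make_decision_function(mean):
    """Build d(x) = x.m - 0.5 * m.m for one class (minimum-distance assumption)."""
    half_norm = 0.5 * sum(m * m for m in mean)
    def d(x):
        return sum(xi * mi for xi, mi in zip(x, mean)) - half_norm
    return d

def classify(x, decision_functions):
    """Evaluate every decision function on x and return the class with the largest value."""
    scores = {label: d(x) for label, d in decision_functions.items()}
    return max(scores, key=scores.get)

# Hypothetical 2-dimensional training patterns for two classes
training = {
    "w1": [(1.0, 1.2), (1.4, 0.8), (0.9, 1.0)],
    "w2": [(4.0, 4.2), (3.8, 3.9), (4.3, 4.1)],
}
decision_functions = {
    label: make_decision_function(mean_vector(samples))
    for label, samples in training.items()
}

print(classify((1.1, 0.9), decision_functions))  # w1
print(classify((4.0, 4.0), decision_functions))  # w2
```

Choosing the largest di(x) here is equivalent to choosing the class whose mean is nearest to x.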
Structural
Represents objects as strings, trees,
graphs, etc.
Define descriptors and recognition rules
based on the representations.
What does finite classification
imply?
The idea of a finite set of classes is quite
limiting.
This matches industry's use of object
recognition, which is very application specific.
Indicates that computer object recognition
techniques lack some abilities which are
simple for humans.
Differences in classification
Techniques thus far only classify objects
based on their shape, color, texture, etc.
These are only representative of the light
reflected by an object.
Humans classify objects many ways,
including an object’s function.
For example…
We classify a ring of rocks with a fire
inside as a fire pit.
We classify a board as a joist once it is
installed as support for the floor.
We classify our computer as a
paperweight once it is more than five
years old.
Correlation
Given an image, we want to find all places
in the image which contain a subimage,
also called a template.
Very useful for answering “where is the x
in this picture?”
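A minimal sketch of template matching by correlation. The slides do not say which correlation measure was used; this assumes the normalized correlation coefficient, and the function names and the tiny 5x5 test image are made up for illustration. Pure Python like this is far too slow for a 1600X1200 image; it only shows the idea.

```python
import math

def normalized_correlation(image, template):
    """Slide the template over the image; return a grid of scores in [-1, 1]."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    t_vals = [v for row in template for v in row]
    t_mean = sum(t_vals) / len(t_vals)
    t_dev = [v - t_mean for v in t_vals]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    scores = []
    for r in range(ih - th + 1):
        row_scores = []
        for c in range(iw - tw + 1):
            # Pixels of the image window currently under the template
            w_vals = [image[r + i][c + j] for i in range(th) for j in range(tw)]
            w_mean = sum(w_vals) / len(w_vals)
            w_dev = [v - w_mean for v in w_vals]
            w_norm = math.sqrt(sum(d * d for d in w_dev))
            denom = t_norm * w_norm
            # Flat windows (zero variance) carry no pattern: score them 0
            score = sum(a * b for a, b in zip(w_dev, t_dev)) / denom if denom > 0 else 0.0
            row_scores.append(score)
        scores.append(row_scores)
    return scores

def best_match(scores):
    """Return (row, col, score) of the highest value in the correlation grid."""
    score, r, c = max((s, r, c) for r, row in enumerate(scores) for c, s in enumerate(row))
    return r, c, score

# Hypothetical data: the 2x2 template is embedded at row 2, column 1
template = [[9, 1],
            [1, 9]]
image = [[0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 9, 1, 0, 0],
         [0, 1, 9, 0, 0],
         [0, 0, 0, 0, 0]]

row, col, score = best_match(normalized_correlation(image, template))
print(row, col)
```

The brightest point of the correlation grid marks the top-left corner of the best-matching window.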
Notice…
Recognition models typically rely on input
from optical sensors.
Such input is represented entirely in
two-dimensional space.
Is a 3D representation necessary?
The DARPA Grand Challenge was not successfully
completed.
Army’s LADAR sensors, which provide
depth data, have demonstrated more
capability.
3D Object recognition with neural
trees
First stage extracts features from the input
range images.
These features are used in the second
stage to group image pixels into different
surface patches according to the six
surface classes defined in
differential geometry.(4)
Invariants
Basic idea:
– D(g(A),g(B)) = D(A,B)
– For all g in transformation group G
Limitations:
There are very many possible transformations
in G, and computation time becomes a
problem.
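A small check of the invariance property D(g(A),g(B)) = D(A,B). This is only a sketch under assumptions not in the slides: G is taken to be the rigid motions of the plane (rotation plus translation), D is the sum of Euclidean distances between corresponding points, and the point sets are invented.

```python
import math

def distance(a, b):
    """D(A, B): sum of Euclidean distances between corresponding points."""
    return sum(math.dist(p, q) for p, q in zip(a, b))

def rigid_transform(points, angle, tx, ty):
    """One element g of G: rotate by `angle` (radians), then translate by (tx, ty)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def scale(points, factor):
    """A transformation outside G, used to show that D is not invariant to it."""
    return [(factor * x, factor * y) for x, y in points]

# Hypothetical point sets
A = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0)]
B = [(0.5, 0.5), (2.5, 0.0), (1.0, 2.0)]

original = distance(A, B)
moved = distance(rigid_transform(A, 0.7, 3.0, -1.0),
                 rigid_transform(B, 0.7, 3.0, -1.0))
scaled = distance(scale(A, 2.0), scale(B, 2.0))

print(abs(moved - original) < 1e-9)   # same g applied to both sets: D unchanged
print(abs(scaled - original) < 1e-9)  # scaling is not in G: D changes
```

Checking a single g is cheap; the limitation named above comes from having to account for every g in G.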
Varying goals of object recognition
Are we looking for “that” object?
– Face recognition
Are we looking for “one of those” objects?
– Web search for 1987 Chevy pickup.
Notice…
Just because an object exists in an image
doesn’t mean it is recognizable.
Example from Late Night with Conan
O’Brien
We don’t know what this is…
Recognizable as a human face…
Recognizable as the pope…
The Punchline
Histogram approach…
Very bad results for images with:
– Much noise
– Small target objects
With tightly controlled conditions, moderate
success can be achieved.
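The slides do not give details of the histogram approach. One common form, assumed here, compares normalized gray-level histograms by histogram intersection. The bin count, function names, and pixel values below are all hypothetical.

```python
def gray_histogram(image, bins=8, max_value=256):
    """Normalized histogram of gray levels in [0, max_value)."""
    counts = [0] * bins
    total = 0
    for row in image:
        for v in row:
            counts[v * bins // max_value] += 1
            total += 1
    return [c / total for c in counts]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means the two normalized histograms are identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Hypothetical images: a target, a slightly noisy view of it, and an unrelated image
target = [[10, 10, 200, 200],
          [10, 10, 200, 200]]
noisy_view = [[12, 12, 210, 190],
              [15, 8, 205, 199]]
unrelated = [[100, 110, 120, 130],
             [100, 110, 120, 130]]

h_target = gray_histogram(target)
same = histogram_intersection(h_target, gray_histogram(noisy_view))
other = histogram_intersection(h_target, gray_histogram(unrelated))
print(same, other)
```

A histogram discards all spatial layout, so a small target contributes only a few counts and noise easily moves pixels across bin boundaries. That is consistent with the poor results reported above.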
Noisy histograms
Correlation example
Find the flower
Create template
Actual template size:
32X32
Acquire input image
Actual image size:
1600X1200
Compute correlation image
Actual image size:
1600X1200
Show areas of best match
Actual image size:
1600X1200
Find flower with more noise
Source image:
1600X1200
Correlation image
Area of best match?
Templates for a coin
Acquire a template:
Acquire target image
Actual size:
1600X1200
Compute correlation image
Display area of best match
Finding coin among noise
Correlation image
Brightest coin – wrong one
Among different noise…
Correlation image
Coin found
Structural approach to stapler
Acquire source
stapler image
Segment and find the stapler edge
Compute the boundary
Image recreated from
computed boundary:
Select boundary points
Boundary points at
distance of 8:
Image recreated from boundary
points:
Compare to a different view…
Segment and acquire boundary
Image redrawn from boundary
data:
Boundary points selected at a
distance of 8:
Redrawn from selected boundary
points:
Final step
Compare the chain code strings of the 2
sets of boundary points.
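The boundary steps above can be sketched as follows. The slides do not say how the chain codes were built or compared, so this assumes a Freeman-style 8-direction code between consecutive sampled points and an edit (Levenshtein) distance between the two code strings. The square boundaries stand in for the stapler outlines and, like all function names here, are hypothetical. Directions use the mathematical convention (y increasing upward).

```python
import math

def square_boundary(side):
    """Ordered boundary pixels of an axis-aligned square, one pixel per step."""
    top = [(x, 0) for x in range(side)]
    right = [(side, y) for y in range(side)]
    bottom = [(x, side) for x in range(side, 0, -1)]
    left = [(0, y) for y in range(side, 0, -1)]
    return top + right + bottom + left

def subsample(points, step=8):
    """Keep every `step`-th boundary point (the 'distance of 8' selection)."""
    return points[::step]

def chain_code(points):
    """8-direction code (0 = +x, 2 = +y, 4 = -x, 6 = -y) around a closed boundary."""
    codes = []
    n = len(points)
    for i in range(n):
        (x0, y0), (x1, y1) = points[i], points[(i + 1) % n]
        angle = math.atan2(y1 - y0, x1 - x0)
        # Quantize the angle to the nearest multiple of 45 degrees
        codes.append(str(round(angle / (math.pi / 4)) % 8))
    return "".join(codes)

def edit_distance(a, b):
    """Levenshtein distance between two chain code strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

code_a = chain_code(subsample(square_boundary(16)))
code_b = chain_code(subsample(square_boundary(24)))
print(code_a)  # 00224466
print(code_b)  # 000222444666
print(edit_distance(code_a, code_b))
```

A smaller edit distance means more similar boundaries. The two views would normally need scale and starting-point normalization first, which this sketch leaves out.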
Finding boundaries with noise
Custom filters for
each target image
may be required:
Conclusions
Modern object recognition techniques can
provide much functionality in controlled
environments.
Simulation of human object recognition
capabilities is a long way off.
Best way to search for objects
The best approach to building an image
search engine requires extensive human
labor: organizing every image in
the database into its correct hierarchical
position.
Input as text can provide as much
functionality as input from images in this
approach.
References
Digital Image Processing using Matlab
– Prentice Hall, ISBN 0-13-008519-7
Michael Tarr, Brown University
http://www.cog.brown.edu/~tarr/pdf/Tarr02ECS.pdf#search='object%20recognition'
3D Object Recognition by Neural Trees
http://csdl.computer.org/comp/proceedings/icip/1997/8183/03/81830408abs.htm
Longjin Jan Latecki, CIS 601 Lecture notes
http://www.cis.temple.edu/~latecki/CIS601-04/lectures_fall04.htm