Object Recognition
Tom McGrath
CIS 601

What is object recognition?
Perception of objects is different for humans than for computers.
For humans: perception of familiar items.
For computers: perception of familiar patterns.
Are they really the same thing?

What do we mean by ‘objects’?
What we call object recognition may also be called pattern recognition.
A pattern is an arrangement of descriptors.
Descriptors may take other forms, but they are primarily vectors and strings.

More generally…
Object recognition is the process whereby observers are able to recognize three-dimensional objects despite receiving only two-dimensional input that varies greatly depending on viewing conditions.(2)

Two main approaches
Decision-theoretic
– Patterns described using quantitative descriptors.
Structural
– Patterns represented by symbolic information.
– Strings, for example.

Decision-theoretic
Based on discriminant (decision) functions.
Let x = (x1, x2, …, xn)^T represent an n-dimensional pattern vector.
Let w1, w2, …, wW be the W pattern classes.

Basic problem of the decision-theoretic approach
We want to find W decision functions d1(x), d2(x), …, dW(x) with the property:
If a pattern x belongs to class wi, then di(x) > dj(x) for j = 1, 2, …, W; j ≠ i.

In other words…
We want to classify x, which is a pattern.
We are given a finite set of classes of objects.
We want to categorize the pattern x into one of the classes.
To do so, we evaluate every decision function at x and assign x to the class whose function gives the largest value (the class of best fit).

Structural
Represents objects as strings, trees, or graphs.
Defines descriptors and recognition rules based on those representations.

What does finite classification imply?
The idea of a finite set of classes is quite limiting.
It corresponds with industry’s use of object recognition, which is very application-specific.
It indicates that computer object recognition techniques lack some abilities that are simple for humans.
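
The decision-theoretic rule above (evaluate every decision function at x and pick the largest) can be sketched in a few lines. This is only an illustration: it assumes a minimum-distance classifier, where d_i(x) = x·m_i − ½ m_i·m_i and m_i is the mean vector of class w_i. The slides do not name a specific decision function, and the class means and function names below are hypothetical.

```python
def decision_values(x, class_means):
    """Evaluate d_i(x) = x.m_i - 0.5 * m_i.m_i for every class mean m_i."""
    values = []
    for m in class_means:
        dot_xm = sum(xi * mi for xi, mi in zip(x, m))
        dot_mm = sum(mi * mi for mi in m)
        values.append(dot_xm - 0.5 * dot_mm)
    return values


def classify(x, class_means):
    """Return the index i of the class whose decision value d_i(x) is largest."""
    values = decision_values(x, class_means)
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical mean vectors for W = 3 classes of 2-dimensional patterns.
MEANS = [[0.0, 0.0], [5.0, 5.0], [0.0, 8.0]]

if __name__ == "__main__":
    # The pattern lies closest to the second mean, so index 1 is printed.
    print(classify([4.2, 4.9], MEANS))
```

With this choice of d_i, picking the largest decision value is equivalent to picking the class mean nearest to x in Euclidean distance. Other decision functions would plug into the same classify step.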

Differences in classification
Techniques thus far classify objects based only on their shape, color, texture, etc.
These properties represent only the light reflected by an object.
Humans classify objects in many ways, including by an object’s function.

For example…
We classify a ring of rocks with a fire inside as a fire pit.
We classify a board as a joist once it is installed as support for the floor.
We classify our computer as a paperweight once it is more than five years old.

Correlation
Given an image, we want to find all places in the image that contain a given subimage, also called a template.
Very useful for answering ‘where is the x in this picture?’

Notice…
Recognition models typically rely on input from optical sensors.
Such input is represented entirely in two-dimensional space.

Is a 3D representation necessary?
The DARPA challenge was not successfully completed.
The Army’s LADAR sensors, which provide depth data, have demonstrated more capability.

3D object recognition with neural trees
The first stage extracts features from the input range images.
The second stage uses these features to group image pixels into surface patches according to the six surface classes proposed by differential geometry.(4)

Invariants
Basic idea:
– D(g(A), g(B)) = D(A, B)
– for all g in a transformation group G
Limitations: G contains very many possible transformations, so computation time becomes a problem.

Varying goals of object recognition
Are we looking for “that” object?
– Face recognition
Are we looking for “one of those” objects?
– Web search for a 1987 Chevy pickup

Notice…
Just because an object exists in an image doesn’t mean it is recognizable.

Example from Late Night with Conan O’Brien
We don’t know what this is…
Recognizable as a human face…
Recognizable as the pope…
The punchline

Histogram approach…
Very bad results for images with:
– Much noise
– Small target objects
With tightly controlled conditions, moderate success can be achieved.
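
Correlation, as described earlier (finding every place in an image that contains a template), can be sketched as sliding the template over the image and scoring each position. This is a minimal sketch that assumes normalized cross-correlation on small grayscale images stored as nested lists. The slides do not say which correlation variant was used, and the function names and sample data are hypothetical.

```python
import math


def ncc_at(image, template, top, left):
    """Normalized cross-correlation between the template and one image window."""
    h, w = len(template), len(template[0])
    window = [image[top + r][left + c] for r in range(h) for c in range(w)]
    tmpl = [template[r][c] for r in range(h) for c in range(w)]
    mean_w = sum(window) / len(window)
    mean_t = sum(tmpl) / len(tmpl)
    num = sum((a - mean_w) * (b - mean_t) for a, b in zip(window, tmpl))
    den = math.sqrt(sum((a - mean_w) ** 2 for a in window)
                    * sum((b - mean_t) ** 2 for b in tmpl))
    # A perfectly flat window or template has no pattern to match; score it 0.
    return num / den if den > 0 else 0.0


def correlation_image(image, template):
    """Score every position where the template fits entirely inside the image."""
    rows = len(image) - len(template) + 1
    cols = len(image[0]) - len(template[0]) + 1
    return [[ncc_at(image, template, r, c) for c in range(cols)]
            for r in range(rows)]


def best_match(image, template):
    """Return (row, col) of the top-left corner with the highest score."""
    scores = correlation_image(image, template)
    positions = [(r, c) for r in range(len(scores)) for c in range(len(scores[0]))]
    return max(positions, key=lambda rc: scores[rc[0]][rc[1]])


# Hypothetical 2x2 template embedded in a 5x5 image at row 2, column 1.
TEMPLATE = [[9, 1],
            [1, 9]]
IMAGE = [[0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 9, 1, 0, 0],
         [0, 1, 9, 0, 0],
         [0, 0, 0, 0, 0]]

if __name__ == "__main__":
    print(best_match(IMAGE, TEMPLATE))
```

Normalizing by the window and template energy is a design choice made here. Plain, unnormalized correlation tends to reward bright regions regardless of their shape, so the two variants can disagree on the best match.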

Noisy histograms

Correlation example
Find the flower
Create template (actual template size: 32×32)
Acquire input image (actual image size: 1600×1200)
Compute correlation image (actual image size: 1600×1200)
Show areas of best match (actual image size: 1600×1200)

Find flower with more noise
Source image: 1600×1200
Correlation image
Area of best match?

Templates for a coin
Acquire a template
Acquire target image (actual size: 1600×1200)
Compute correlation image
Display area of best match

Finding coin among noise
Correlation image
Best match is the brightest coin: the wrong one

Among different noise…
Correlation image
Coin found

Structural approach to stapler
Acquire source stapler image
Segment and find the stapler edge
Compute the boundary
Image recreated from computed boundary
Select boundary points
Boundary points at a distance of 8
Image recreated from boundary points

Compare to a different view…
Segment and acquire boundary
Image redrawn from boundary data
Boundary points selected at a distance of 8
Image redrawn from selected boundary points

Final step
Compare the chain code strings of the two sets of boundary points.

Finding boundaries with noise
Custom filters for each target image may be required.

Conclusions
Modern object recognition techniques can provide much functionality in controlled environments.
Simulation of human object recognition capabilities is a long way off.

Best way to search for objects
The best approach to building an image search engine requires extensive human labor: organizing every image in the database into its correct hierarchical position.
In this approach, text input can provide as much functionality as image input.
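
The final step of the stapler example, comparing the chain code strings of two sets of boundary points, can be sketched as follows. This is a minimal sketch under stated assumptions: it uses 8-direction chain codes computed from the angle between successive boundary points (direction 0 is +x, counted counterclockwise with y pointing up), and it compares the strings with an edit distance minimized over all starting points. The slides do not state which comparison was used, and all names and sample boundaries are hypothetical.

```python
import math


def chain_code(points):
    """8-direction chain code of a closed boundary given as a list of (x, y)."""
    codes = []
    # Pair each point with its successor, wrapping around to close the boundary.
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        angle = math.atan2(y1 - y0, x1 - x0)
        codes.append(round(angle / (math.pi / 4)) % 8)
    return codes


def edit_distance(a, b):
    """Levenshtein distance between two code sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def boundary_distance(points_a, points_b):
    """Smallest edit distance over all cyclic shifts of the second chain code."""
    a, b = chain_code(points_a), chain_code(points_b)
    return min(edit_distance(a, b[k:] + b[:k]) for k in range(max(len(b), 1)))


# Hypothetical boundaries sampled at a spacing of 8, as in the slides.
SQUARE = [(0, 0), (8, 0), (8, 8), (0, 8)]
SQUARE_SHIFTED_START = [(8, 8), (0, 8), (0, 0), (8, 0)]
TRIANGLE = [(0, 0), (8, 0), (0, 8)]

if __name__ == "__main__":
    print(chain_code(SQUARE))
    print(boundary_distance(SQUARE, SQUARE_SHIFTED_START))
    print(boundary_distance(SQUARE, TRIANGLE))
```

Trying every cyclic shift compensates only for a different starting point on the boundary. It does not compensate for rotation or a change of viewpoint, both of which alter the codes themselves.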

References
Digital Image Processing using Matlab, Prentice Hall, ISBN 0-13-008519-7
Michael Tarr, Brown University, http://www.cog.brown.edu/~tarr/pdf/Tarr02ECS.pdf#search='object%20recognition'
3D Object Recognition by Neural Trees, http://csdl.computer.org/comp/proceedings/icip/1997/8183/03/81830408abs.htm
Longjin Jan Latecki, CIS 601 Lecture notes, http://www.cis.temple.edu/~latecki/CIS601-04/lectures_fall04.htm