From: AAAI-83 Proceedings. Copyright ©1983, AAAI (www.aaai.org). All rights reserved. ORGANIZATION PERCEPTUAL FOR David VISUAL G. Lowe and Thomas Computer Science Stanford TJniversity, is prescnled showiug that bottom-up is usually prerequisite in terprc tation of images. Ihcse models. Scvernl ing which relations to the extent by accident relations from should which rcsofutions, structures the linking The \vhich no single-resolution Its j>crformance is demonstrated such as the have and on been aspect ment images goal of computer to prior label and vision systems panding cnvironmcnts to the image pcrTormnnce same of searching and the of the models sible mntchcs. dirncnsiorlal ing, far the increasing the shnpc from imagesP-usiilg However, considerably. solved even formation, and these of human stcrco, to reduce when methods performance the given models and fail in such simple the domains lhcsc of trcat- regularities this paper of inference plays nnd~rlying prinprcscnt new seg- principles. of irlformation of image the problem and for many In some and scurcps orgailixxtion simplify 1) A major reduction segmentation related features can of matching which provide, against in the -the search division features. This hzs space of the world long is achieved image been inlo by sets of as a recognized crucial problem in image interpretation. want to match models against all possible of posthree- of matching three-dimensional bottom-up up”n valuable this a unified that form but Wit,kin knowledge: gcncrality shnd- this ;~p- of detecting on the irn;lgc describe these [7] was explicit, non-accidental. that b;v;ed are three all of which models search to explain the will motion, are Lhe role methods There is one lhc size of this problem full mentation is not one of between number based models in ex- problem describe but Recenlly, They to such general formulation on the assumption image efforts lackctl relations structure b>:;cd in the many for the importance here. or- description, problcrns: often in model based recognition, develop ciples for this level of interprct>.t,ioll, tightly- results in an excessively large space Continued research into recovering from level Rather, correspondences tcxlurc--promises md space since domains 1haL two different data. for potential image, in with a few well-specified [Z, 8, It]. The difficulty unlikely image model- only general WC will visual and thereby current demonstrated to more is very fit the constituents, However, bceu is to relate given of inference detected research of their them. have ambiguity--it fully knowledge interpret constrained to compare vision reasons been developed. con- has gone perceptual slcctch of these fully topics or connectivity, primal and imposirlg the same of uniformly texture have have [12] h ave argued regularities related scgmcntstion and some body of their segmentalion, of collinearity to make in a form be applied phenomena, inilial little boltom-up n large knowledge and There of it was never Tenenbaum Introduction image integrated wc have image into cm any for specific Marr’s intended natural as the index on this perception. plicability. data. A major names detection not effect,ivcJ describe we have research algorithms which paper we will describe methods the significance of relations be- figure/ground Gestalt of imaging could detect. OR synthetic under about to use in a way that before Previous and is a.ble to detect algorithm elements tents. an algorithm of segments algorithm image develop structure and to selectively In this and cva,luating ganization, must be based significant be used knowledge. tween of features, images it is necessary to structure can to all images are are few altcrna- over a range to interpret for detecting to have arisen we develop including the basis of curvilinearity. there detects in- world object relations and relations are invariant segmentation at multiple image where that 94305 knowledge, techniques of for dctermin- distribution Using these principles for curve for accessjug that they are unlikely proximity, which conditions. of functions be formed: the surrounding tives within tilt same Rinford California prior and 2) three-dimcnsiomrl descriptions can only be formed on properties three principlcc J are hypothesized image significant WC describe and 3) stable grouping to the recognition 1) scgmcnlation, groupings: tcrpreta,tion, 0. In order im&ge features A BASIS Department Stanford, Abstract Evidence AS RECOGNITION we do cornhinations IlGt of features in an image, so good scgmrntat ion is crucial for reducing the combinatorics of this search. 2) Two-dimcusional is in- cxccllent inter_:\rrtations, pnpcrs [I, 51. For mu::t as line he co!lirlear viewpoint. drawings. 255 rrlations sional lead to specific as we have c:;amp!c, collinear in 3-sp.7cc, A corollary of ibis three-dimcn- described barring is thal lines an thesr in previous in tilt image accident image of rcla- tions are which normally greatly invariant simplilies three-dimensional 3) To the objects extent different that imaging used, Not but can be and description For example, can under they can of the relations will have be several in the values collinear line segments relative sizes I of world relative provides be used stable a body relation whose which that tempted that the names by the gaps, are viewpoints, to access each characterized ments Note can of variation can be used. and to orientation. relations terms in addition parameters image index to viewpoint, of malching of unknown these only respect problem conditions be used as reliable knowledge. with the of the seg- a viewpoint-invariant to select a model for at- a. matching. all three of these points assume that z/ the relations found in the image are a result of regularities in the obThis means that any relations which jects being viewed. happen to arise accidentally from independent features will only confuse the significant and interpretations. accidental This relations distinction is a point I- between to which we I will return. The importance of perceptual recognition: The importance of these grouping the processing of images by the be demonstrated by periment. In Figure drawing for bottom-up most parallelism, with identity of the object, difficult able to recognize and the of the ready been level performed---the task would embedded in a normal I(b) segment location segments other circular termination this segmentation is in has be even harder scene nition times than for the would would al- if the containing be expected first, this second with 3 out as in l(a) segment allow ma,ny The that the the group- recognizing lower it 5 seconds and with 7 out of 10 subjects the 60 second time limit. Presumably, recognizing it if the added segment had been placed at some location lend itself to perceptual groupings the change which did not in recognition times would have These figures human capability been negligible. can also be used to demonstrate to make use of top-down contextual 1935 to limit the in- the demonstrated [3], verbal clues search space in expcrirnents naming even for forming performed vague a match. as early non-visual as object greatly reduce the recognition time. Subjects interpret Figure l(a) immediately upon being told that it is a bicycle. esting borderline where formation can to recognition. recog- dramatically of 10 subjects As was classes can can usually bottom-up in recognition, were in center with to further if we assume figure only it to be combined be coincident leading role with was placed grouping. then segment, an important for added in a curvilinear grouping play drawing The which of another As might groupings same added. a strategic within within that unlimit, 1: When opportunities for bottom-up grouping of’image features have been removed, as was done for the line drawing of a bicycle in (a), the drawing is remarkably difficult to recognize. The average recognition time for (a) was over one minute when the subjects had no prior knowledge of the object’s identity. When a single line segment was added in (b), which provided local evidence l’or a curvilinear grouping, the recognition times were greatly reduced. formation is the with ings. Note were time features. Figure of this the to be rcmark- a 60 second 45 seconds. \/ Figure experiabout of 10 subjects object were the b. endpoint nothing proved out within took that told line we have In informal drawing object subject fact surrounding a single the (e.g., l ex- a partial collinearity, were -- can opportunities eliminated Nine in system most symmetry). this to recognize. tenth spite that of signilicant who as a stage visual constructed are and 10 subjects ably bicycle cases for psychophysical a way segmentation eliminated ments we have in such proximity, operations human a straightforward l(a) of a bicycle organization A demonstration Thus this figure is on an intereither bottom-up or top-down in- suddenly reduce the search One can imagine a series space and lead of expcrirnents this search space that would systematically explore reduction in its size created by different botlorn-up and the or top- down clues. These figures can also be used to demonstrate the hunran eqlrivalcnt of a back-projection algorit,hrn [(I] followed partial by image-level matches cm matching, where ccr(.n.in hypothesized bc used to solve for the position, oricn- tation, and internal parameters of the model, which in turn lead to accurate predictions for further nlatxhrs at specific locations in the image. Principles There are virtually be forrncd bctwccn principles can we dcrivc arc forming worth was rrlcntioncd and that they than accidental for measuring their are rcprcspn t actual arisen by accident. call arise positioning as well as from by rxarnining rounding to relation measures If there were the tions, could this mcntations. However, to be so wide very be used that weak. of signi6cancc tures are and any We have with scale. Significance probability that is biased sumption. But pendently since positioned respect to orientation, discrimination seems typically objects (leading location, operation major must is obviously tances that so the do not particular type in which a highly include being tion to detect this failure resources. limited tempted than can the examined. such relations, background is that each It over 2 shows to human vision targets. It would at that principles and but how will bc attempted. scale above they they at some scale, which will be evaluated do not specify There some relation scenes, groupings classes factors image, the only then are the very rarely it is more likely is accidental be possible to distinguish the relation image measurements). to devote than is useful easiest if right-angles image from play resources some accidentals searching are of instance it would Of course, which If structure (although for those and a role attempted. the from would Witkin to distinguish be that of to detect probabilities in an image arise even in the therefore on irnage- viewpoints in viewpoint. the relation seldom all be pointless since prior should depend or other collinearity angle change relations do not over it would that and arises An which rccogniso in creating can perform images mented edge been smooth without of still given it is also for properties common. these bottom-up is the problem segmentation a computer perceptual of creating program processes appropriately on seg- descriptions. considerable The best current edge operators which are then linked using ncarestinto lists of points. Although there research into the problem of fitting curves to these lists exception these efforts of points [9, 10, have concentrated 111, almost on a single pre-selected resolution of segmentation and have attempted merely to smooth out noise induced by the imaging process. of (We tion of relations which for curve bottleneck natural has will bc class algorithm detect “edge points” neighbor algorithms no groupings for a given which arc several which accidentals, But any in selecting from A significant of description. describe almost [12] argue which when there presumably grouping, then image imagingit is only an example and significant example, scene Tenenbaum of five equally-spaced of intcrprctlation candidates with less productive dis- of the For in the the accurate-enough of features candidates in which position, in the in the scene. at right-angles typical significance. in any absolute sense, since groupings can be atat many different scales; however, if there are more be formed formed the must be aI,1 ributablc to a lack of computational This does not mean that groupings are diameter- 2. few false The with image), be formed Figure a statistically inde- relations light-source paramct,cr. common change as- complexity. false grouping purposes in the to this only too many significant sys- many all combinations can collinear dots is not apparent are enough surrounding false be USC~III for contains for judging to test visual to independence computational rclat,ions lines are for irn:lge it is present collinearity such important factor is the same that was mrnt ioned earlier--- viewpoint, because to the from looking formation for psychologi- of scgmcntation limited fca- independence scale wort,h a specific location, human respect criterion impossible in an image, with principle have and be that arisen this a scene like a reasonable A second the from this choice. One invariance condition seems proportional It is a matter direction of relations hypothesis have to see whether in any rela- computation to orientation, would features. experimentation tem null by enough competing candidates for grouping within the same proximity, as in (b). Th is occurs even though the relation remains highly significant in the et L) atistical sense and would thcrcfore likely be of use for recognition. of seg- levcll must our of five equally-spaced collinear dots in (a) spontaneously by human vision if it is surrounded is not detected the and images at this out inversely relation a set of independent cal to the respect is then the prior 2: The pattern Figure any meaSsures the significance to carry with that of features knowledge 0 sur- knowledge respect independent the level prior . it is possible of common . . 0 However, and during chosen . * e we have nonrandomness range . or random likelihood of e 0 e those for significance for judging the from 0 . b. test distributions . . be Lo distinguish, These a significant expected scene image, of the can then bc used as the basic segmentation process. regarding of the a central relation in the . As to the in the image. of each . a. only of viewpoint structure is accidental. significnnce? All of the relations of featrtrcs probabili~;l,ic which structures accidents accuracy disLribution give given the from relations useful must significant could general Therefore, process considered that What struct,urc align rncn ts. as possible, have those scgnlcntations of the segmentation as accurately of rclalions of any image. for selecting earlier, rather which number Lhc elements CXIcnt funcLioll of segmentation an infinite use “resolution” to refer to the the smoothed influence 257 curve in the allowable description.) context of curve segmentatransverse deviation from Although these smoothed results may that the appear is because lower been lustrates the problem, other structures developed earlier, curve selects description. all groupings These can detection are lower 3(c). degree a wide significant human primitive curve - - we examine curva- arbitrary that b. for or of constant to represent of range structures implementation, linear of seg- the over resoluWe have measures either general when on the principles it is possible of more in Figure based the most not 3 il- of collinearity, which be splincd although segmentation as in Figure In our which have Figure instance lists eye, perform they the in edge-point and still program. apparent algorithm, structure of resolutions only human can one recognized outlined nonrandom curves, are arc a new mentation ture. where naive though by the to recognize groupings the system even described is adequate but to visual groupings and tion the human resolution detected 3(b) reasonable the vision \ smooth includes groupings, the such as spirals. 6. Measuring The is to the significance first task determine confuse in developing how each grouping. linked on the of a curve a segmentation we will measure this nonrandomness in proximity of their from the would since the point they lie, as is shown have chosen would a point distance from is equal to can looking the points for that the considered point previously which in terms an overall Testing this angle between endpoint, far and of the points Since given curve. the by away its This curve in 4(c). recursively from any of the likelihood proximity these we to be how as shown calculating to these likelihood together values to produce curve. all possible significance divide the initial have the highest of the can be multiplied for the those points Therefore, point is farthest points. they value 4(b). of its minimum considered are independent, Given 8 is the closest thus with as it is to a curve to 4 or more point the distance of two. However, in linearity defining the be extended for 4(a) and nonrandomness where from at at least two Figure 3: The data in (a) can be segmented different resolutions of description, as shown in (b) and (c). One instance of collinearity can only be detected in segmentation (b) while the other instance of collinearity and the parallelism can only be detected straightforwardly in (c). by the be close to the line on which in Figures closest 20/n, the vector the measurement to one of the other is to be as close the with of proximity close automatically to define unlikely This \ of For example, if we start points, we might measure effects by being third and algorithm significance linearity by measuring line joining the other confuse linearity, the In this case, since the points were originally basis of proximity, we must be careful not to of nonrandomness in linearity. looking at a set of only three significance one point segmentation test linked list significance for groupings sets of points, of points values. we want to into segments which Previous methods of curve segmentation have usually attempted corners (tangent discontinuities) on a curve, to search for where the curve F’igure 4: The middle dots in (a) and (b) are both the same distance from the dashed line joining t,he other two dots. Yet the three dots in (b) are much more significant in terms of their collinearity than those in (a), since the middle dot in (a) could be close to the line merely as a result of its proximity to the first endpoint. Therefore, we measure the probability of a l)oint being within a given distance from a line in terms of its proximity to the clo.;est endpoint defining the line, as shown in (c). can bo divided into different segments. Our approach is the dual---WC look for segments of the cwve which exhibit significant nonrandomness, continuities are assigned ing segments. method In contrast will fail to assign and tangent to the junctions to the earlier any segmentation and curvature between approaches, where dis- neighborour the curve 258 Figure 5: The small 30 by 45 pixel region of an aerial photograph shown in (a) was run through the Marimont edge detector to produce the linked edge points shown in (b). lcigure (c) shows all the segments at different scales and locations v;hich were tested for significance. After selecting only t’hose sc;mrnts which were loca!ly rnaximurn with respect to size of grouping, and thresholding out those which arc not statistically significant, we are left with the segments shov~n in (d). It is t.hen fairly simple ho form collinearity and curvilincarity relations between t!rese scgrnents as shown by dotted lines in (e). Figure 6: The hand-input curves in (a) have been created to cxhibit significant structure at multiple resolutions. When these are given as data to the curve segmentation algorithm, it produces the results shown in (b), which makes these multiple levels of structure explicit. 259 appears to wander sign multiple turc randomly at diffcrcnt clearly of the allow and groupings. up groupings (amounting scale, from that any After of the than at one local the given this dctcct each 5(n). from data, shown tusl lists. Figure apparent. Given collinearity also he fairly parallelism, Figure which these and by the local :;cgmcnts, lines straightforward constant differing and perceptual image relations search space pa.per play that must by a major role be considered from the could the for human in limiting when the matching of background be applied to problems or segmentation O., “Inferring surfaces 17 (1981), 205-244. from images,” Artifi- study of a neglected portion of Icarningof sensory organization,” Journal of Genetic 46 (1935), 41-75. in ACRONYM,” Proceed- of visual information,” [7] Marr, David, “Early processing sophical Transactions of the Royal Society of London, PhiloSeries [lo] perccp- importance vision. range are the most (1976), Workshop 483-524. Recognition (New of York, NY: Rutkowski, W.S., and hzriel Rosenfeld, “A comparison of corner-detection techniques for chain-coded curves,” TR-623, Computer Science Center, TJniversity of Maryland, 1978. Shirai, Y., “Recognition of man-made objects cues,” Computer Vision Systems, A. Hanson, eds. (New York: Academic Press, 1978). demonstrating organization pcrccptual which Calif- [ll] this accident im- of segmenting over a wide those methodology Pattern [9] Pavlidis, ‘I‘., Structural Springer-Verlag, 1977). 5(c). Summary We began preliminary (Stanford, groupings. bottom-up from analysis. R, 275 endpoint other taken “Description and recognition [S] Nevatia, R., and T.O. Binford, Artificial Intelligence, 9 (1977). of curved objects,” relations in Figure to detect intervals, the ma.xima it is rela- curvilinearity dotted all the 5(d) shows sclccts by same of other at groupings retaining arisen This range and arc methodology [G] Marirnont, David, “Segmentation ings ARPA Image Understanding ornia, September 1982). by to subpixcl widely in images represpect [5] Lowe, David G. and Thomas Binford, “The interpretation of three-dimensional structure from image cnrvcs,” Proceedings UC/U-7 (Vancouver, Canada, August 1979), 613-628. in generated 5(c) shows the to have Psychology, on written points with for the parameters of object models [4] Lowe, David G., ‘Solving from image descriptions” Proceedings ARPA Image Understanding Workshop (College Park, MD, April 1980), 121-127. of an is shown data relations [3] Leeper, R., “A the development produced. program edge process as shown edge although to resolution. them linked image reasoning among 3-D models [2] Brooks, Rodney A., “Symbolic and 2-D images,” Artificial Intelligence, 16 (1981). to algorithm image groupings correspondence References G(b) ability have these invariant developed general by looking [l] Binford, Thomas cial Intelligence, when photograph dotccts into are not to form proximity, is also group- at different the many We would like to thank Peter Blicher, Chris Goad, David Marimont, Marty Tenenbaurn, Brian Wandcll, Andrew Witkin, and our other colleagues for stimulating cliscussions and suggestions. This work was supported under AlU’A contract NOO039-82-C0250. The first author was also supported by a postgraduate scholarship from the Natural Sciences and Engineering Research Council of Canada. different Figure arc prin- arc Acknowledgements which 3. could detection all resolulions, the thinning It would signal resolutions--results digit,ized some (61, which values locations and and will be algorithm of an aerial by an edge them data scales a wide There the algorithm’s original significance between in Figure of running region results easy there below at multiple s h ows pcrccptual the these and be detected algorithms of the maximum exhibits segmentation results The links at respect a curve structures algorithm the specific distribution. in action different structure 5(b) and with level, a,nd demonstrates Marimont tively that of grouping. algoriLhm as was image after the arc locally if the curve exhibit 5 shows groupings does through along which of the curve facility. accuracy one but grouping, steps location resolutions which Figure David least that these There stereo and An viewpoints. The data as well as edges 6(a) shows some hand- much this of each which maximum 30 by 45 pixel oil tank at have of its length It is possible no single-resolution a small will will upon in which is the non- complexity, were suggested. based recognit,ion in the scene they of detecting demonstrated. to the extent plementations means and tested on synthetic natural images. Figure output Figure This 50% each in MA~I,ISF’ on a KL- significant which At since different and An example structure to viewpoint, curve the curve, by 50%. significance have been curves gives of the along problem, principles combinatorial relations segmentation, besides be useful. resent implemented algorithm resolutions, by points curve dcvclopcd problems would unlikely values. The drawn length curve covers segmentations at different 10 computer derived from differing of 100 points). a t,hreshold at the 0.05 significance ings are not considered significant. This of adjacent at all locations the signilicancc structures a curve is executed only those in their full overlapping measuring resolutions selects for three number for was other to cover small at all scales ciples, if we unifying structure, avoiding for viewpoint-invariant algorithm possible its borders. procedure different every The knowledge. random looking struc- Ilowcvcr, of only of the which outside thinning more size segment attempted test it is possible groupings groupings extend world will as- diCfercr;t a relatively groupings given grouping to of error, groupings to 6 scales adjacent costly with the we examine with and it exhibits nonrandomncss. margin We examine of two, to for locations factors not bc too curve a rcasonal~le all scales whcro resolir Lions. It would scgmcnt at all resolutions, segmc;ltxtions using edge E. Riseman, [12] Witkin, Andrew P. and .Jay M. Tenenbaum, “On the role To appear in Human and Machine of structure in vision.” Vision, Roscnfeld and Beck, eds. (New York: Academic Press, 1983). These size of the against 260