From: AAAI Technical Report FS-01-01. Compilation copyright © 2001, AAAI (www.aaai.org). All rights reserved.

From Stereoscopic Vision to Symbolic Representation

Paulo E. Santos* and Murray P. Shanahan
{p.santos, m.shanahan}@ic.ac.uk
Intelligent and Interactive Systems Group
Imperial College of Science, Technology and Medicine
University of London
Exhibition Road, London SW7 2AZ

*Supported by CAPES.
Copyright © 2001, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

This paper describes a symbolic representation of the data provided by the stereo-vision system of a mobile robot. The representation proposed is constructed from qualitative descriptions of transitions in the depth information given by the stereograms. The goal of this paper is to investigate how qualitative spatial events, such as relative motion between objects, spatial occlusion and object deformation (expansion and contraction), manifest themselves as transitions in this depth information. This causal relationship between spatial events and depth transitions forms a background theory for an abductive process of sensor data assimilation.

Introduction

One of the most important issues in mobile robot navigation is the connection between the physical world and the internal states of the robot. The term "anchoring" (Coradeschi & Saffiotti 2000) refers to the process of assigning abstract symbols to real sensor data, and is related to the symbol grounding problem (Harnad 1990). There are three main approaches to this problem: the connection can be made explicitly via a knowledge representation theory, as in (Shanahan 1997) and (Hahnel, Burgard, & Lakemeyer 1998); it can be hidden in the layers of neural networks (Racz & Dubrawski 1995); or, finally, there can be no representation of the robot's internal states whatsoever, as is the case with purely reactive systems (Brooks 1991a) (Brooks 1991b). Although the last two frameworks provide important solutions to certain sets of problems, they are insufficient for the construction of task-planning systems that use sensor information. For this reason the present work assumes a knowledge representation view of the connection between sensor information and abstract symbols.

Briefly, the goal of the present work is to develop a logic-based system for scene interpretation using dynamic qualitative spatial notions. Therefore, not only is anchoring a central topic in this work, but the scene interpretation issues discussed in this paper also contribute to the understanding of how to track and predict future states of symbolic representations of objects in a robot's knowledge base.

This paper assumes a stereo vision system embedded in a mobile robot as the source of data about the world. A symbolic representation of the sensor data from the vision system is constructed assuming a simplification of the stereograms: a one-dimensional view of the scene (a horizontal cut). The depths and edges of the bright contrasting regions of the stereogram noted in this view are then turned into two-dimensional charts, named depth profiles. Peaks in depth profiles are connected to sensed objects (e.g. physical bodies, occluding physical bodies, light reflections, sensor noise). Each depth profile is thus a snapshot of the robot's environment as noted by its sensors at a time instant. An example of the output of the vision system, a horizontal cut and the corresponding depth profile is shown in figure 1.
In this paper we are interested in explaining incoming sensor data by hypothesising the existence of, and dynamic relationships between, physical objects (and between physical objects and the observer). The incoming sensor data is assumed to take the form of temporally ordered sequences of depth profiles. This process of entertaining hypotheses to interpret a sequence of sensor information recalls the treatment of sensor data assimilation as abduction proposed in (Shanahan 1996). In fact, the inspiration for the present work was to propose a new qualitative background theory about space for that framework. In this paper the background theory about space takes dynamical relations into account, so the notion of time plays an important role in the theory.

There are many approaches to qualitative theories of time and of space. Van Benthem (van Benthem 1991) discusses the main problems in defining an ontology for time. A comprehensive overview of the field of spatial reasoning is presented in (Stock 1997). Although theories of qualitative space and time address important problems, most of them are not concerned with central issues in robotics, such as how space is related to real sensor data, nor is there any concern with predicting sensor data as events happen in the world. The framework presented in this paper introduces some ideas to overcome these limitations.

In terms of qualitative theories of space, particularly relevant to this work is the Region Occlusion Calculus (ROC) (Randell, Witkowski, & Shanahan 2001). ROC is a first-order axiomatisation that can be used to model spatial occlusion of arbitrarily shaped objects. The formal theory is defined on the degree of connectivity between regions representing the two-dimensional images of bodies as seen from some viewpoint. This paper presents a dynamic characterisation of occlusion as noted by a mobile robot's sensors, which could extend ROC by introducing time into its ontology.

Stereo-Vision System

The stereo-vision system used in this work is the Small Vision System (SVS), an efficient implementation of an area correlation algorithm for computing range from stereo images. The images are delivered by an STH-MD1 stereo head, a digital device with megapixel imagers that uses the 1394 bus (FireWire) to deliver high-resolution, real-time stereo imagery to any PC equipped with a 1394 interface card.¹ Figure 1 shows a pair of input images, the resultant stereo disparity image from the vision system and the depth profile of a scene.

¹Both SVS and STH-MD1 were developed in the SRI laboratories, http://www.ai.sri.com/konolige/svs/index.htm.

Figure 1: Images of a scene (top pictures); stereogram given by SVS (bottom left picture); and depth profile of a scene's horizontal cut (bottom centre image).
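SVS itself is a proprietary package and no details of its implementation are given here. Purely to illustrate what an area correlation algorithm does, the sketch below computes a disparity estimate for a single row of a rectified stereo pair by minimising the sum of absolute differences over a small window; the window size and disparity range are assumptions of the sketch, not parameters of SVS.

import numpy as np

def disparity_row(left_row, right_row, window=5, max_disp=32):
    """Area-correlation (SAD) stereo matching for one image row.

    left_row, right_row: 1-D grey-level numpy arrays from a rectified pair.
    For each pixel of the left row, returns the disparity that minimises
    the sum of absolute differences over a horizontal window.  Larger
    disparity means a closer surface (depth is proportional to 1/disparity).
    """
    n = len(left_row)
    half = window // 2
    disp = np.zeros(n)
    for x in range(half, n - half):
        best_cost, best_d = np.inf, 0
        for d in range(0, min(max_disp, x - half) + 1):
            ref = left_row[x - half:x + half + 1]
            cand = right_row[x - d - half:x - d + half + 1]
            cost = np.abs(ref.astype(int) - cand.astype(int)).sum()
            if cost < best_cost:
                best_cost, best_d = cost, d
        disp[x] = best_d
    return disp

A full system would of course work over the whole image and convert disparity to metric depth using the calibrated baseline and focal length of the stereo head.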
Depth Profiles

In order to transform the stereograms from the vision system into a high-level interpretation of the world, a simplification of the initial data is considered. Instead of taking the whole stereogram of a scene, only a horizontal line of it (a horizontal cut) is analysed. Each horizontal cut is represented by a two-dimensional chart (a depth profile) relating the depths and the extreme edges of the bright contrasting regions in the portion of the stereogram crossed by the cut. An example of a depth profile is shown in figure 2b.

Depth profiles can be interpreted as snapshots of the environment, each profile taken at a time instant; a sequence of profiles is thus a sequence of temporally ordered snapshots of the world. For brevity, assume throughout this paper that a profile Pi represents the snapshot taken at the time point ti, for i ∈ N.

Information about the objects in the world, as noted by the robot's sensors, is encoded as peaks (depth peaks) in depth profiles (figure 2b). The variables edge and depth in the depth profiles range over the positive real numbers. The edges of a peak in a profile represent the image boundaries of the visible objects in a horizontal cut. The depth of a peak is assumed to be a measure of the relative distance between the observer (the robot) and the objects in its field of view. Further investigation should be done in order to extend this notion to take into account the depth variation within an object's shape; this would allow the shape of a peak to be treated as additional information in the theory. The depth axis of a profile is bounded by the furthest point that can be noted by the robot's sensors; this limiting distance is represented by L in the charts (figure 2b). The value of L is thus determined by the specification of the robot's sensors.²

²The practical limit (L) for depth perception in the apparatus available for this work is about 5 metres; readings beyond this distance have proved to be very unreliable.

Figure 2: a) Two objects a, b noted from a robot's viewpoint v; b) depth profile (relative to the viewpoint v in a) representing the objects a and b respectively by the peaks p and q.

An important piece of information embedded in the depth profiles is the size of the peaks. The size of a peak is given by the modulus of the difference between the values of its edges (e.g. in figure 2b the size of the peak q is defined by |e − d|). Thus, the size of a peak is connected to the angular size of the corresponding object in the world with respect to a viewpoint. Differences in the sizes (and/or depths) of peaks in the same profile, and transitions of the size (depth) of a peak in a sequence of profiles, encode information about dynamical relations between objects in the world and between the objects and the observer. The main goal of this work is to develop methods to extract this information for scene interpretation.

It is worth pointing out that, for the purposes of this work, the variables depth and edge need not represent precise measurements of regions in a horizontal cut of a scene, but approximate values of these quantities within a predefined range. These values should, however, be accurate enough to represent the order of magnitude of the actual measurements of the depth and size of objects in the world from the robot's perspective.

In this work, depth profiles are assumed to be the only information available to the robot about its environment. The purpose of this assumption is to permit an analysis of how much information can be extracted from the vision system alone, making precise to what extent this information is ambiguous and how the assumption of new data (such as the robot's proprioception) could be used to disambiguate it.
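As an illustration of how a depth profile might be obtained in practice, the following sketch turns a horizontal cut of a range image into the peak representation used in this paper. The background threshold, the helper names and the example readings are assumptions of the sketch, not part of the system described here.

def extract_peaks(depth_row, L=5.0, jump=0.5):
    """Turn a horizontal cut of a range image into a depth profile.

    depth_row: list of depth readings (metres) along the cut; readings at
    or near L (the sensor's practical limit) count as background.  A peak
    is a maximal run of foreground readings, reported as
    (left_edge, right_edge, depth), where depth is the minimum reading in
    the run, i.e. the distance to the nearest visible surface.
    """
    peaks, start = [], None
    for i, d in enumerate(depth_row + [L]):          # sentinel closes a trailing peak
        foreground = d < L - jump
        if foreground and start is None:
            start = i
        elif not foreground and start is not None:
            peaks.append((start, i - 1, min(depth_row[start:i])))
            start = None
    return peaks

# e.g. two objects at ~1.2 m and ~2.5 m against background at the limit L
row = [5.0, 5.0, 1.3, 1.2, 1.2, 5.0, 5.0, 2.5, 2.6, 5.0]
print(extract_peaks(row))   # [(2, 4, 1.2), (7, 8, 2.5)]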
Motivating Examples

In order to motivate the development of a theory about depth information, two examples are presented.

Example 1: Approaching an Object. Assume that the two profiles in figure 3 constitute a sequence of snapshots of the world taken from the viewpoint of one observer (v). Common sense makes us suppose that there is an object in the world, represented by the peak p, which either:
1. expanded its size, or
2. was approached by the observer, or
3. is approaching the observer.

It is worth pointing out that any combination of the previous three hypotheses would also be a plausible explanation for this profile sequence. The choice of the best set of hypotheses to consider should follow a model of priority, which is not developed in the present paper. The transition in the profile sequence that suggests these hypotheses is the decrease in the relative depth of the peak p from P1 to P2. The task of a reasoning system, in this domain, is to hypothesise similar facts given sequences of depth profiles.

Figure 3: Depth profile sequence P1, P2.

There are other plausible hypotheses to explain this profile sequence that were not considered in this example. For instance, it could be supposed that in the interval after P1 and before P2 a different object was placed in front of the robot's sensors which totally occludes the object represented in P1. This is not a weakness of the present work, but an example of its basic assumption concerning the incompleteness of information. For the sake of this simple example it might have been possible to enumerate all of the possible hypotheses explaining the scenario in figure 3; in a more complex situation, however, the number of hypotheses and their logical descriptions would be too extensive to be taken into account completely. Thus, the strategy of the reasoning system is not always to provide the correct answer, but to conclude a possibly correct interpretation of the sensor transition. If this interpretation turns out to be false, another hypothesis should be searched for.

Example 2: Occlusion between two objects. Figure 4 shows another depth profile sequence. In this case the profiles contain two peaks, p and q, possibly representing two different objects in the observer's field of view.

Figure 4: A depth profile sequence P1, P2, P3 and P4.

Intuitively, analysing the profile sequence in figure 4, the transition from P1 to P2 suggests that either the two objects represented by p and q were getting closer to each other, or the observer was circling around p and q in the direction that decreases the observer-relative distance between these two objects. Moreover, from P2 to P3 it is possible to suppose that the objects represented by p and q are in a relation of occlusion, since the peaks p and q in P2 became one peak in P3. Likewise, from P2, P3 and P4, total occlusion between the two objects can be entailed, since the peaks p and q changed from two distinct peaks in P2 to one peak in P3 (which is a composition of the peaks in P2), and eventually to one single peak (in P4). In the next section these notions are made more rigorous.
Peaks, Attributes and Predicate Grounding

In this section a formal language is defined to describe depth profile transitions in terms of relations between the objects of the world and the observer's viewpoint.

The theory's universe of discourse includes sorts for time points, depths, sizes, peaks, physical bodies and viewpoints. Time points, depths and sizes are variables that range over the positive real numbers (R+), peaks are variables for depth peaks, physical bodies are variables for objects of the world, and viewpoints are points in R³. A many-sorted first-order logic is assumed in this work.

The relation peak_of(p, b, v) represents that there is a peak p assigned to the physical body b with respect to the viewpoint v. This relation plays a role similar to that of the predicate grounding relation defined in (Coradeschi & Saffiotti 2000).

A peak assigned to an object is associated with the attributes of depth and size of that object's image. In order to refer explicitly to particular attributes of a peak at a given time instant t, two functions are defined:
• depth: peak × time point → depth, which gives the peak's depth at a time instant; and
• size: peak × time point → size, which maps a peak and a time point to the peak's angular size.

Besides the information about individual peaks, information about the relative distance between two peaks is also important when analysing sequences of depth profiles. This data is obtained by introducing the function dist described below.
• dist: peak × peak × time point → size, which maps two peaks and a time point to the angular distance separating the peaks at that instant. This angular distance is the modulus of the difference between the nearest extreme edges of the two peaks (e.g. in figure 2b the distance between peak p and peak q is given by |b − e|).

It is worth pointing out that the depth profiles of a stereogram may contain peaks representing not only real physical objects of the world but also illusions of objects (e.g. shadows and reflections) and sensor noise. In most cases this kind of data cannot easily be distinguished from the data about real objects in the robot's environment. In order to cope with this problem, the framework presented in this paper assumes, as a rule of conjecture, that every peak in a profile represents a physical body unless there is reason to believe the contrary. In other words, the assumption is that the sensors (by default) note only physical bodies in the world; illusions of objects (and noise) are exceptions. One way to state this default rule formally is by the use of circumscription, in a similar way to that presented in (Shanahan 1996) and (Shanahan 1997). A complete solution to this problem, however, is a subject for future research.
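A minimal sketch of how peaks and the functions depth, size and dist could be realised is given below, assuming each profile is stored as a mapping from peak names to observations; the class and function names are illustrative only, not part of the formal theory.

from dataclasses import dataclass

@dataclass
class Peak:
    """A depth peak observed in one depth profile (one time point)."""
    left: float    # left image edge of the peak
    right: float   # right image edge of the peak
    depth: float   # relative distance from the viewpoint to the object

# profile[t] maps a peak name to its observation at time point t
def size(profile, p, t):
    """size(p, t): angular size of peak p, |right - left|."""
    pk = profile[t][p]
    return abs(pk.right - pk.left)

def depth(profile, p, t):
    """depth(p, t): relative depth of peak p at time t."""
    return profile[t][p].depth

def dist(profile, p, q, t):
    """dist(p, q, t): angular distance between the nearest edges of p and q."""
    a, b = profile[t][p], profile[t][q]
    if a.right <= b.left:
        return b.left - a.right
    if b.right <= a.left:
        return a.left - b.right
    return 0.0   # overlapping or touching peaks have zero distance

P1 = {1: {'p': Peak(2.0, 4.0, 1.2), 'q': Peak(6.0, 7.5, 2.5)}}
print(size(P1, 'q', 1), dist(P1, 'p', 'q', 1))   # 1.5 2.0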
Relations and Axioms

Assuming that the symbols a and b represent physical bodies, v represents the observer's viewpoint, and t represents a time point, the following relations are investigated in this paper:
1. being_approached_by(v, a, t), read as "the viewpoint v is being approached by the object a at time t";
2. is_distancing_from(a, v, t), read as "the object a is distancing itself from the viewpoint v at time t";
3. approaching(v, a, t), read as "the viewpoint v is approaching the object a at time t";
4. receding(v, a, t), read as "the viewpoint v is receding from the object a at time t";
5. expanding(a, v, t), read as "the object a is expanding as seen from the viewpoint v at time t";
6. contracting(a, v, t), read as "the object a is contracting as seen from the viewpoint v at time t";
7. getting_closer(a, b, v, t), read as "the objects a and b are getting closer to each other with respect to v at time t";
8. getting_further(a, b, v, t), read as "the objects a and b are getting further from each other with respect to v at time t";
9. circling<(v, a, b, t), read as "v is circling around the objects a and b at time t in the direction in which the distance between their images decreases";
10. circling>(v, a, b, t), read as "v is circling around the objects a and b at time t in the direction in which the distance between their images increases";
11. occluding(a, b, v, t), read as "a is occluding b from v at time t"; and finally
12. totally_occluding(a, b, v, t), read as "a is totally occluding b from v at time t".

These relations are hypotheses assumed to be possible explanations for transitions in the attributes of a peak (or set of peaks). The formulae [A1] to [A6] below express the connections between the relations 1 to 12 stated above and the transitions in the peaks' attributes.

[A1] ∀a,v,t  being_approached_by(v,a,t) ∨ approaching(v,a,t) ∨ expanding(a,v,t) ←
     ∃p,t1,t2  (t1 < t) ∧ (t < t2) ∧ peak_of(p,a,v) ∧ (depth(p,t1) > depth(p,t2))

Formula [A1] states that if there exists a peak p representing the object a from the viewpoint v such that its depth decreases through a sequence of profiles, then either the observer v is being approached by the object, or the observer is approaching the object, or the object is expanding. The relations is_distancing_from/3, receding/3 and contracting/3 can be characterised similarly, as shown in formula [A2].

[A2] ∀a,v,t  is_distancing_from(a,v,t) ∨ receding(v,a,t) ∨ contracting(a,v,t) ←
     ∃p,t1,t2  (t1 < t) ∧ (t < t2) ∧ peak_of(p,a,v) ∧ (depth(p,t1) < depth(p,t2))

Likewise, the event of two bodies getting closer to each other (or of the observer moving around the objects) at an instant t follows as a hypothesis from the fact that the angular distance between the depth peaks of the two objects decreases through time. Formula [A3] below states this fact; the opposite relation, getting further, is stated by formula [A4].

[A3] ∀a,b,v,t  [getting_closer(a,b,v,t) ∨ circling<(v,a,b,t)] ∧ (a ≠ b) ←
     ∃p,q,t1,t2  (t1 < t) ∧ (t < t2) ∧ peak_of(p,a,v) ∧ peak_of(q,b,v) ∧ (dist(p,q,t2) < dist(p,q,t1))

[A4] ∀a,b,v,t  [getting_further(a,b,v,t) ∨ circling>(v,a,b,t)] ∧ (a ≠ b) ←
     ∃p,q,t1,t2  (t1 < t) ∧ (t < t2) ∧ peak_of(p,a,v) ∧ peak_of(q,b,v) ∧ (dist(p,q,t2) > dist(p,q,t1))

Assume the relation connected(p, q, t), read as "peak p is connected to peak q at time t"; connected(p, q, t) holds if the distance between the peaks p and q at time t is zero:

∀p,q,t  connected(p,q,t) ← (dist(p,q,t) = 0)

Note that this relation is defined only for peaks; it bears no relation to the degree of connectivity between two physical bodies. In this article the relation connected/3 is used only to characterise the occluding/4 relation, as stated in formula [A5] below.

[A5] ∀a,b,v,t  occluding(a,b,v,t) ∧ (a ≠ b) ←
     ∃p,q,t1  (t1 < t) ∧ peak_of(p,a,v) ∧ peak_of(q,b,v) ∧ ¬connected(p,q,t1) ∧ connected(p,q,t) ∧ [size(q,t) < size(q,t1) ∨ size(p,t) < size(p,t1)]

Formula [A5] states that one of the objects a and b is occluding the other at time instant t if the peaks p and q (connected at t) were not connected at some time point before t and the size of one of the peaks has shrunk through the given profile sequence. Similarly to occluding/4, if at an instant t1 (t1 < t) it was the case that occluding(a,b,v,t1), and at t the peak representing b has disappeared from the profile (i.e. size(q,t) = 0), then one of the objects is totally occluding the other, as represented in formula [A6].

[A6] ∀a,b,v,t  totally_occluding(a,b,v,t) ∧ (a ≠ b) ←
     ∃t1 ∀q  (t1 < t) ∧ occluding(a,b,v,t1) ∧ peak_of(q,b,v) ∧ (size(q,t) = 0)

The formulae [A1] to [A6] represent a set of weak connections between depth peak transitions and dynamic relations between objects, and between the observer and those objects. By weak connections we mean that some (but not all) of the hypotheses that could explain the sensor data are represented within the formulae. This set of hypotheses does not exhaust all of the possible configurations of the objects compatible with a given sequence of snapshots (as briefly discussed in example 1 above). The purpose of this incomplete set of hypotheses is, initially, to give a hint about what could possibly be a high-level explanation for the sensor data; this high-level interpretation could then be used by a planning system in order to produce a possibly executable plan.

As a matter of fact, there are two ways of reading the formulae above. Firstly, if the right-hand side of one of these formulae unifies with a logical description of the sensor data, then the predicates on its left-hand side should be inferred. Because the set of hypotheses is not complete, this inference process may lead to false conclusions, which can be revised when new data become available; for this reason we assume abduction as the inference rule. A second way of reading the formulae above is to take the predicates on the left-hand side as actions to be manipulated by a planning system. In this case, the formulae [A1] to [A6] can be understood as rewriting rules for these actions in terms of the expected sensor transitions. As the present paper is concerned with the first interpretation, sensor data assimilation via an abductive process is briefly discussed in the next section.
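Before turning to that, formulae [A1] to [A6] can be pictured operationally as a mapping from observed transitions of peak attributes to the candidate relations they license. The sketch below is only an illustration of that reading; the observation encoding and the simplified tests (in particular for [A5] and [A6]) are assumptions of the sketch, not of the formal theory.

def hypotheses(obs):
    """Candidate relations licensed by one observed transition (cf. [A1]-[A6]).

    obs describes a peak p (for object a) and, optionally, a second peak q
    (for object b) at two time points t1 < t2.  Each entry is a pair
    (value at t1, value at t2); 'connected_pq' is a pair of booleans.
    The variables v, a, b in the returned strings are schematic.
    """
    hyps = []
    d1, d2 = obs['depth_p']
    if d1 > d2:                                   # [A1]: depth of p decreases
        hyps += ['being_approached_by(v,a,t)', 'approaching(v,a,t)', 'expanding(a,v,t)']
    elif d1 < d2:                                 # [A2]: depth of p increases
        hyps += ['is_distancing_from(a,v,t)', 'receding(v,a,t)', 'contracting(a,v,t)']
    if 'dist_pq' in obs:
        g1, g2 = obs['dist_pq']
        if g2 < g1:                               # [A3]: peaks drawing together
            hyps += ['getting_closer(a,b,v,t)', 'circling<(v,a,b,t)']
        elif g2 > g1:                             # [A4]: peaks drawing apart
            hyps += ['getting_further(a,b,v,t)', 'circling>(v,a,b,t)']
    if 'connected_pq' in obs and 'size_q' in obs:
        c1, c2 = obs['connected_pq']
        s1, s2 = obs['size_q']
        if (not c1) and c2 and s2 < s1:           # [A5]: peaks merge and one shrinks
            hyps += ['occluding(a,b,v,t)']
        if c1 and s2 == 0:                        # simplified [A6]: b's peak has vanished
            hyps += ['totally_occluding(a,b,v,t)']
    return hyps

# the transition of figure 3: a single peak whose depth decreases
print(hypotheses({'depth_p': (3.0, 1.5)}))        # -> the three disjuncts of [A1]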
Sensor Data Assimilation

As already mentioned, this work follows an approach similar to that proposed in (Shanahan 1996): the task of assimilating a stream of sensor data is considered to be an abductive process over a background theory about the world. Basically, abduction is the process of adopting hypotheses to explain a set of facts according to a knowledge theory. Given a set of formulae about a certain field of knowledge and a set of observations, the task of the abductive inference is to hypothesise the causes of these observations from the knowledge available. Abduction is thus a non-monotonic inference (Boutilier & Becher 1995), since an explanation inferred from the available knowledge may turn out to be false when new information is acquired.

Taking into account the formal system described in the previous section, the task of the abductive process is to infer the conclusions of formulae [A1] to [A6] as hypotheses, given a description (observation) of the sensor data in terms of depth peak transitions. More formally, assuming that Σ is a background theory comprising formulae [A1] to [A6], and Ψ is a description (in terms of depth profiles) of a sequence of stereo images, the task is to find a set of formulae Δ such that

  Σ ∪ Δ |~ Ψ,   (1)

where the symbol |~ refers to non-monotonic logical consequence. In order to make this clearer, examples 1 and 2 are recalled using the formal language and the concept of abduction presented above.

Examples Revisited

Example 1: Approaching an Object. Consider the profile sequence in figure 3. A description of the sensor data given by the transition of the peak p from P1 to P2 is the set

  Ψ = { ∃p,a,v,t1,t2  peak_of(p,a,v) ∧ (size(p,t1) < size(p,t2)) ∧ (depth(p,t1) > depth(p,t2)) },

where peak_of(p,a,v) is the result of the default assumption that every peak represents a physical body. From Ψ and the formula [A1], abduction infers that being_approached_by(v,a,t) or approaching(v,a,t) or expanding(a,v,t) for some time point t ∈ [t1, t2]. Considering formula (1) in the previous section, the set

  Δ = { being_approached_by(v,a,t), approaching(v,a,t), expanding(a,v,t) }

constitutes the set of explanations.

Example 2: Occlusion between two objects. Similarly to example 1, a description of the profile sequence represented in figure 4 can be written as the set

  Ψ' = { ∃p,q,a,b,v,t1,t2,t3,t4  (t1 < t2) ∧ (t2 < t3) ∧ (t3 < t4) ∧ peak_of(p,a,v) ∧ peak_of(q,b,v) ∧ (dist(p,q,t1) > dist(p,q,t2)) ∧ ¬connected(p,q,t1) ∧ ¬connected(p,q,t2) ∧ connected(p,q,t3) ∧ ∀t ∈ [t1,t4] (depth(p,t) > depth(q,t)) ∧ (size(p,t4) = 0) }.

From Ψ' and the set of formulae Σ (as defined in the previous section), an abductive explanation for Ψ' would include some of the following relations:

  getting_closer(a,b,v,t), with t1 ≤ t ≤ t2;
  circling<(v,a,b,t), with t1 ≤ t ≤ t2;
  occluding(b,a,v,t3);
  totally_occluding(b,a,v,t4).
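The abductive step (1) can be illustrated with a small sketch in which the background theory Σ is a set of rules whose bodies are tests on the observation Ψ and whose heads are the candidate explanations collected into Δ. The rule contents and the observation encoding below are assumptions of this illustration, not a description of an implemented system.

# Background theory Sigma as (test, head-disjuncts) pairs; each test inspects
# an observation Psi encoded as a dict of peak-attribute transitions.
SIGMA = [
    (lambda o: o['depth_p'][0] > o['depth_p'][1],        # body of [A1]
     ['being_approached_by(v,a,t)', 'approaching(v,a,t)', 'expanding(a,v,t)']),
    (lambda o: o['depth_p'][0] < o['depth_p'][1],        # body of [A2]
     ['is_distancing_from(a,v,t)', 'receding(v,a,t)', 'contracting(a,v,t)']),
]

def abduce(sigma, psi):
    """Return Delta: the head disjunctions whose rule bodies are satisfied by psi.

    The inference is non-monotonic: any explanation in Delta may be
    withdrawn if later observations contradict it.
    """
    delta = []
    for test, heads in sigma:
        try:
            if test(psi):
                delta.append(heads)       # a disjunction of candidate explanations
        except KeyError:
            pass                          # rule not applicable to this observation
    return delta

# Example 1 revisited: size grows and depth shrinks for the single peak p
psi = {'size_p': (1.0, 2.0), 'depth_p': (3.0, 1.5)}
print(abduce(SIGMA, psi))
# [['being_approached_by(v,a,t)', 'approaching(v,a,t)', 'expanding(a,v,t)']]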
Image segmentation: future plans

This work assumes a bottom-up approach to image segmentation. The implementation of the system is in its very early stages. Up to now there is an edge detection task followed by a region segmentation (and labelling) step on the stereograms. Depth profiles are therefore going to be constructed from the vertical edges extracted from the image of the scene.

The central idea is to hypothesise the existence of objects from these image regions, and to propose tests in order to verify the hypotheses or to search for more complete explanations whenever this is needed. The need to search for more complete explanations should be decided by a planning system capable of handling priorities within the hypotheses. This issue, however, is outside the scope of this paper.

Discussion and Open Issues

This paper proposed a symbolic representation of the data from a mobile robot's stereo-vision system using depth information and dynamic notions relative to a viewpoint. The symbolic representation proposed is based on a horizontal cut of the image given by the stereo-vision system. Transitions in the depth profiles were then linked to high-level descriptions of the environment by first-order axioms. The main purpose of this work is to develop a mechanism of inference for concluding hypotheses about the relative motion between objects, spatial occlusion and object deformation from transitions in the sensor data. This relationship between spatial events and their effects on the depth profile transitions constitutes a background theory for a logic-based abductive process of sensor data assimilation.

The reason for assuming a logic-based approach in this work is to adopt a solid common language for visual data interpretation and task planning. Other options for bridging the gap between vision and planning rely on the use of special purpose algorithms, or on the development of a common language other than first-order logic, to solve this issue. A limitation of the first approach is that it treats visual interpretation and planning as two distinct processes, while the present work assumes that they can be handled by a single abductive mechanism, as proposed in (Shanahan 1996). On the other hand, by taking the second option, the development of a common language distinct from logic may have to cope with many problems that have already been solved through the long tradition of logic-based systems in AI.

In this work the simplification of the stereograms in terms of horizontal cuts may seem very limiting at first sight. However, it guaranteed the development of a qualitative spatial reasoning theory intrinsically related to sensor information. The core purpose of this ongoing research is to investigate how much of a spatial reasoning theory can be extracted from poor sensor information, and to conclude from that what kinds of robotic tasks can be accomplished within such a theory. Extending this one-dimensional view of the scene to a more general analysis could be done by considering the joint information given by depth profiles of cuts made in other directions of the scene. This analysis may lead to a better notion of individual objects and of the degree of connectivity between objects. This issue, however, is left for future research.

Another issue overlooked by the present paper, due to time constraints, is the information encoded by the shape of the depth peaks. Assuming profiles taken from many different points of view around an object, the shapes of these peaks are intrinsically related to the shape of the body and to the two-dimensional region this object occupies in the environment. Thus, the shape of the depth peaks could play an important role in the perceptual signature of the objects (Coradeschi & Saffiotti 2000) and in theories about spatial occupancy and persistence (Shanahan 1995).

A further subject for future research is the improvement of the representational system as more precise and richer visual information is considered. Future work should also investigate the possible transitions between the relations 1 to 12 (described in the section Peaks, Attributes and Predicate Grounding). This information would be useful for constraining the possible explanations in the abductive task of sensor data assimilation. The consideration of default rules in the theory is another important open problem worth mentioning.

In the section Peaks, Attributes and Predicate Grounding, the connection between the symbolic and the physical representation of the objects in the world is defined by the predicate peak_of/3. The statement peak_of(p, a, v) can be interpreted in terms of anchoring as "the anchor p is assigned to the object a from the viewpoint v". It is the task of the abductive process to maintain this anchor in time. For instance, recalling the profile P4 in example 2 (figure 4), where only one peak is shown, and supposing that in a subsequent profile P5 another peak appears next to peak q, the reasoning system should conclude that this new peak is the peak p (previously involved in the profile sequence P1, P2, P3 and P4), maintaining the previously stated connection between p and the object a. This fact can be concluded by assuming a default rule about the permanence of objects, another important open issue of this work.
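One way the peak_of anchor could be maintained as new profiles arrive is sketched below: each new peak is matched to the nearest previously anchored peak, then to a dormant (totally occluded) body, and only as a last resort is a new body conjectured. The matching policy, the threshold and the names are assumptions of this sketch, not a commitment of the theory.

def update_anchors(anchors, dormant, new_peaks, max_shift=1.0):
    """Maintain peak_of anchors across consecutive depth profiles.

    anchors: body -> (left, right) edges of its peak in the previous profile.
    dormant: bodies whose peaks have vanished (e.g. totally occluded) but which,
             by a default rule of object permanence, are assumed to still exist.
    Each new peak inherits the anchor of the nearest visible peak if close
    enough; failing that it is attributed to a dormant body; otherwise a new
    physical body is conjectured (every peak denotes some body by default).
    """
    updated, used = {}, set()
    for left, right in new_peaks:
        centre = (left + right) / 2.0
        best, name = max_shift, None
        for body, (l, r) in anchors.items():
            shift = abs(centre - (l + r) / 2.0)
            if body not in used and shift < best:
                best, name = shift, body
        if name is None and dormant:
            name = dormant.pop()                    # a reappearing occluded body
        if name is None:
            name = 'body_%d' % (len(anchors) + len(updated) + 1)   # a new body
        updated[name] = (left, right)
        used.add(name)
    return updated

# after P4 only object b's peak q is visible; a is dormant (totally occluded).
# In a hypothetical P5 a second peak appears next to q and is re-anchored to a.
print(update_anchors({'b': (3.0, 5.0)}, ['a'], [(3.1, 5.0), (5.2, 6.0)]))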
The connection between the depth profile representation and the Region Occlusion Calculus (ROC) (Randell, Witkowski, & Shanahan 2001), mentioned in the introduction of this paper, is also an interesting topic for further investigation. The main contribution of investigating the relationship between the Region Occlusion Calculus and the framework presented in this article is the possibility of extending ROC to include dynamic sensor information in its ontology.

Acknowledgments

Thanks to David Randell and Mark Witkowski for the invaluable discussions, and to the two anonymous reviewers for their useful suggestions for improving this paper.
References

Boutilier, C., and Becher, V. 1995. Abduction as belief revision. Artificial Intelligence.
Brooks, R. A. 1991a. Intelligence without reason. A.I. Memo 1293, MIT.
Brooks, R. A. 1991b. Intelligence without representation. Artificial Intelligence Journal 47:139-159.
Coradeschi, S., and Saffiotti, A. 2000. Anchoring symbols to sensor data: preliminary report. In Proc. of the 17th National Conference on Artificial Intelligence (AAAI-2000), 129-135.
Hahnel, D.; Burgard, W.; and Lakemeyer, G. 1998. GOLEX: bridging the gap between logic (GOLOG) and a real robot. LNCS. Springer Verlag.
Harnad, S. 1990. The symbol grounding problem. Physica D 42:335-346.
Racz, J., and Dubrawski, A. 1995. Artificial neural network for mobile robot topological localization. Journal of Robotics and Autonomous Systems 16(1):73-80.
Randell, D.; Witkowski, M.; and Shanahan, M. 2001. From images to bodies: modelling and exploiting spatial occlusion and motion parallax. In Proceedings of the International Joint Conference on Artificial Intelligence. To appear.
Shanahan, M. 1995. Default reasoning about spatial occupancy. Artificial Intelligence 74(1):147-163.
Shanahan, M. 1996. Robotics and the common sense informatic situation. In Proceedings of the 12th European Conference on Artificial Intelligence. John Wiley and Sons.
Shanahan, M. 1997. Solving the Frame Problem. The MIT Press.
Stock, O., ed. 1997. Spatial and Temporal Reasoning. Kluwer Academic Publishers.
van Benthem, J. 1991. The Logic of Time: A Model-Theoretic Investigation into the Varieties of Temporal Ontology and Temporal Discourse. Kluwer Academic.