From: AAAI Technical Report FS-01-01. Compilation copyright © 2001, AAAI (www.aaai.org). All rights reserved.
TowardBootstrap Learning for Place Recognition
Benjamin Kuipers and Patrick Beeson
Computer Science Department
The University of Texas at Austin
Austin, Texas 78712
Abstract
Pierce and Kuipers [1997] explored howsuch an agent
couldprogressivelylearn: (1) properties of its sensors, (2)
basis set of motorcommands,
(3) sets of sensoryfeatures useful as local state variables, and(4) controllawsfor trajectoryfollowing and hill-climbing. Thesecontrol laws are sufficient to supporttravel amongdistinctive states (dstates), and
henceto supportcreation of the causal, topological and metrical levels of the Spatial SemanticHierarchy(SSH)[Kuipers
&Byun, 1991; Kuipers, 2000]. Weuse the term bootstrap
learningfor this kind of progressivecreation of a hierarchy
of representations.
In this paper, given that the agent has learned to movereliably amongdistinctive states, weask howit can learn to
recognizeplaces.
Wepresent a methodwherebya robot with no prior
knowledgeof its sensors, effectors or environment
can learn to recognizeplaces with high accuracy,in
spite of perceptualalia.sing (different places appear
the same)and imagevariability (the sameplace appears differently). Previous workshowedhowsuch
a robot could learn from its experience a useful
set of sensory features, motionprimitives, and local control laws to movefromone distinctive state
to another. Suchprogressive learning of a hierarchical representation is called bootstraplearning.
Thefirst step in learning place recognition eliminates imagevariability in twosteps: (a) focusing
on recognition of distinctive states defined by the
robot’s control laws, and (b) unsupervisedlearning
of clusters of similar sensory images. The clusters define viewsassociatedwithdistinctive states,
often increasing perceptual aliasing. The second
step eliminates perceptual aliasing by building a
cognitive mapand using history information gathered during exploration to disambiguatedistinctive
states. The third step uses the labeled imagesfor
supervisedlearning of direct associations fromsensory imagesto distinctive states. Weevaluate the
methodusing a physical mobile robot in two environments, showinglarge amountsof perceptual
aliasing and highresulting recognitionrates.
1 Bootstrap Learning
Suppose an agent awakes in an unknownenvironment with
an uninterpretedset of sensors and effectors. Howcanit learn
the nature of its ownsensorimotorsystemand then learn the
structure of its environment?
This problemis important in praetical terms because we
wantrobots with very rich sensorimotorsystemsto be able to
adapt to newsenses or to changesin its existing sensors. Future robot sensors based on MEMS
technology mayalso have
irregular structures similar to biological sensors, rather than
being (for example)a rectangular array of pixels. This problem is an aspect of the fundamentalquestion of howsymbols
in a knowledgerepresentation gain their meaningby being
groundedin sensorimotorinteraction with the world.
25
2 Place Recognition
Wewant a robot to learn from experience to recognize the
place it is at andits orientation at that place. Together,the
robot’s position andorientation constitute its state in the environment. Withoutcontextual information, this recognition
problemis unsolvableevenwithperfect sensors, since different places mayhave identical sensory images.Realistically,
sensors are imperfect, so evenif subtle distinguishing features are present, they maybe buried in sensor noise. There
are twodifficulties that mustbe overcome
for effeetive place
recognition.
¯ Perceptualaliasing: different places mayhavesimilar or
identical sensory images.
¯ Imagevariability: the sameposition and orientation may
havedifferent sensoryimageson different occasions,for
exampleat different times of day.
Thesetwodifficulties trade offagalnst eachother. Withrelatively impoverishedsensors (e.g., a sonar ring) manyplaces
have similar images, so the dominantproblemis perceptual
aliasing. Withmuchricher sensors such as vision or laser
range-finders, discriminatingfeatures are morelikely to be
present in the image,but so are noise and dynamicchanges,
so the dominantproblemfor recognition becomesimagevariability.
3 A Hybrid Solution
In order to bootstrap to an effective solution to the place
recognition problem,we combineseveral different learning
Action(i)
Image(i)
Figure 2: Lassie explores a rectangular roomwhoseonly distingnishing feature is a small notch out of one comer. Image
variability arises fromposition and orientation variation when
Lassiereachesa distinctive state, andfromthe intrinsic noise
in the laser range-finder.Perceptualaliasing arises fromthe
symmetryof the environment,and the lack of a compass.The
notch is designedto be a distinguishingfeature that is small
enoughto be obscuredby imagevariability.
Dstate(i)
(a)
1
SSHTopological
Map
-- by exploration and abduction from the observedsequence of views and actions [Kuipers, 2000; Remolina
&Kuipers, 2001]. If neededto disambignatethe current
distinctive state, use history-basedmethodssuch as the
Place(i),Path(j)
rehearsal procedure [Kuipers & Byun,1991] or homing
sequences LRivest &Schapire, 1989]. While theoretically tractable, these are expensiveboth in computation
Figure 1." Bootstraplearning of place recognition. Solid arand in additionaltravel. Thisis feedbackpath (a) in Figrowsrepresent the majorinference paths, while dotted arrows
ure 1.
represent feedback.
4. Nowthe sequence of images is classified into views
and labeled with the correct dstate. Applya supermethods(Figure 1). Westart by attacking the problem
vised learning algorithmto learn a direct association
imagevariability, first by focusingon distinctive states and
from
sensory image to dstate. The added information
second by clustering to efiminate noise. Then we apply a
in
supervised
learning makesit possible to identify subrelatively expensivedeductiveand exploration methodto distle discriminatingfeaturas that wouldbe indistinguishambiguateperceptual aliasing, someof which maybe caused
able fromnoise by an unsupervisedclustering algorithm.
by the first step. Andfinally, weuse our expensively-bought
This is feedbackpath (b) in Figure
knowledgeof distinctive states to learn direct associations
from sensory imagesto dstates. The resulting association
Wedescribe the individual steps in moredetail along with
maynot be perfect, in the case of dstates with identical sena simple experiment. Lassie is an RWIMagellanrobot. It
sory images, but the moreexpensivecontextual methodsreperceives its environmentusing a laser range-finder: each
sensory imageis a point in Ris°. Althougheach imagerepremainavailable.
sents the rangesto obstaclesin the 180° arc in front of Lassie,
Thesteps of our solution are the following.
because
we are doing bootstrap learning the robot does not
1. Restrict attention to recognizing distinctive states
havethis knowledge,so it cannotapply powerfulspatial mod(dstates). Distinctive states are well-separated in the
eling methodslike occupancygrids [Moravec,1988].
robot’s state space, and their sensory imagestend to be
As Lassie performsclockwisecircuits of its environment,
either well-separatedor verysimilar.
it encounterseight distinctive states, one immediately
before,
2. Applyan unsupervisedclustering algorithm to the senand one immediatelyafter the turn at each comer(Figure 2).
sory imagesobtained at the dstates in the environment.
This reduces imagevariability by mappingdifferent im4 Focuson Distinctive States
agesof the sameplace into the samecluster, evenat the
TheSSH[Kuipers, 2000]is a hierarchyof distinct but closely
cost of increasing perceptual aliasing by mappingimrelated representationsfor knowledge
of large-scale space. It
ages of different states into the samecluster. Wedeshowshowthe cognitive mapcan be robustly acquired during
fine each cluster to be a view, in the sense of the SSH exploration and used for problem-solvingevenin the face of
[Kuipers & Byun,1991; Kuipers, 2000].
resource limitations and incompleteknowledge.
3. Build the SSHcausal and topological map-- a symA distinctive state is the isolated fixed-point of a hillboric description madeup of dstates, places and paths
climbingcontrol law. Travel amongdistinctive states elim-
26
inates cumulative estimated position error. A sequence of
control laws taking the robot fromone dstate to the next is
abstracted to an action.
For a typical mobilerobot, the state variable x = [x, V, 0]
is three-dimensionalwith components
for position and orientation, and a two-dimensional
motorvector u = Iv, w] specifies linear and angularvelocities. In contrast, a realistic robot
of the present and future will have a very high-dimensional
sense vector s, including such sensors as binocular vision,
laser, sonar and IR range-finders, bumpsensors, odometry,
compass and GPS.
In a qualitatively uniformsegmentof the environment,the
robot governsits behaviorby selecting a reactive control law
Xi. The robot and its environment,coupled through the sensorimotorsystem, are described as a dynamicalsystemwhich
evolvesto a fixed-point x (the distinctive state) where± =
= ¢(x,u)
s = ~(x)
u = x~(s)
(1)
(2)
(3)
cI, represents the physicsof the robot and the constraints
of the environment.~I, represents the sensory systemof the
robot. Xi is the reactive control lawfor the current segmentof
the robot’s behavior. Wedefine an imageas being the highdimensional sensory input I = s = ~(x) when x is a distinctivestate.
Anyimplementation of the dynamical system (1-3) will
have finite tolerances, so the values of x when± = 0 on
different occasionswill not be precisely equal. However,
they
will be clustered very closely, andthey will be separatedfrom
other distinctive states by the basin of attraction of the system (1-3). The samewill be true of the dependentvariable
ts = ~(x).
5 Cluster ImagesInto Views
Anenvironmentcontains a relatively small discrete set of
dstates. In a high-dimensionalsensory space, their sensory
imagesare likely to be well-separated,so a clustering algorithm can eliminate amountsof variation that are small compared with the separation betweengroups of images. When
imagesfrom different dstates happento be very close (due
to highly structured environmentsor weaksensors), they can
simplybe mappedto the samecluster, resulting in perceptual
aliasing.
Distinctivestates are significantly easier to recognizethan
places selected at regularly spacedintervals in the environment [Yamauchi & Langley, 1997; Duckett & Nehmzow,
2000].Regularlyspacedstates are unlikely to be as well separated in sensoryspaceso it will be difficult to eliminateall image variability by clustering withoutincurring large amounts
of perceptualaliasing.
TheSSHa~ssumesthat each dstate is associatedwith a single view, thoughdifferent dstates mayhavethe sameview, so
clustering musteliminate imagevariability.
Wecluster images into k clusters using k-means[Duda,
Hart, &Stork, 2001], searchingfor the value of k that maximizesour "internal measure"of clustering quality,
k
M=/r
k
.o
L c=1 i=1
]_x
I,o,,
- o1’1
<4)
where ne is the numberof elements and dc is the meanof
cluster c, and xc,i is the ith elementof cluster c. Mrewards
tight clusters but penalizeslarger numbersof clusters.
An"external measure"of cluster quality uses knowledge
of
the true dstate associated with each image.The uncertainty
coefficient U(VIS) measuresthe extent to which knowledge
of S predicts the view V [Press et aL, 1992, pp. 632--635].
(p/,j is the probabilitythat the current viewis l~ andthe current dstate is Sj.) Thelargest value of k with U = 1 correspondsto the greatest discriminating powerwhile completely
eliminatingimagevariability.
= HCV)-HCVIS
)
H(V)
H(V) = ~"~Pi. ln pi. wh
erepi.
V(VlS)
= ~--~p,,j
j
H
(VlS) = - ~p,,j In "J whereP4 = ~ P,,~
P4
i
i,./
In 50 circuits of the notchedrectangle environment(Figure 2), Lassie experiences400 images.Applyingthe internal
measure(4) ofeluster quality, Lassie determinesthat k = 4
the clear winner(Figure 3(a)). Figure 3(b) showsthat
is also optimalto the external measure.
Thenotchin the rectangle is clearly being treated as noise
by the clustering algorithm, so diagonally opposite dstates
havethe sameview. In this environment,the four viewscorrespondto the followingtable of distances perceivedto the
robot’sleft, front, andright.
left front right
0.5
4.5
V0 0.5
V1 0.5
4.5
2.5
0.5
2.5
V2 0.5
2.5
4.5
Vs 0.5
l
6 Build the CausalandTopologicalMaps
As the robot travels amongdistinctive states, its continuous
experienceis abstracted, first to an alternating sequenceof
images Ik and actions Ak, then images are clustered into
views Vk, and finally views are associated with dstates Sk.
The SSHCausal mapcan be represented as a simple tableau.
to Io
Ao
tl /’1
(5)
An-1
tNotethat ¯ cannotbehavepathologicallyin the neighborhood
Since clustering imagesinto views h&seliminated image
of a dstate x, or the dynamical
systemwouldnot convergestablyto
x, andit couldn’tbea distinctivestate.
variability, but leaves perceptual aliasing, the problemis to
27
ploratory sequenceof actions likely to producethe relevant
experience.
At the SSHtopologicallevel, actions are classified as turns
and travels. Sets of distinctive states that are connectedby
turns withouttravel defineplaces, and sets of dstates that are
connectedby travels without turns (except TurnAround)
define paths. The SSHcausal and topological levels describe
the environment
at the discrete set of distinctive states, when
the robot is at a dstate S, a place P, and on a directed path
(Pa, fir), where dir E {pos, neg} is the one-dimensional
orientation along the path.
As Lassie explores the notched-rectangle environment,it
creates the followingtableau of experienceat the SSHcausal
and topologicallevels. Notethat the sequencesof time-points
tk, imagesIk and actions Akare not periodic. The sequence
of viewsV~has period four, but the sequenceof distinctive
states Sk has period eight.
to
O,OE
O.9E
tl
O.94
0.9~
0.9
0.84
0.62
0.8
i
3
i
4
i
5
k
i
6
i
7
Figure 3: After Lassie’s explorationof the notchedrectangle,
k = 4 is selected as the best numberof clusters by the internal
measureM(top), and is confirmedas optimal by the external
measure U (bottom).
determinethe correct distinctive states Sk. Weresolve this
ambiguityusing the "rehearsal procedure"[Kuipers &Byun,
1991], or methodsfor learning deterministic finite automata
(DFA)[Rivest &Schapire, 1989], evenin the face of stochastic uncertainty [Basye, Dean, &Kaelbling, 1995]. However,
by focusing on distinctive states and clustering to eliminate
imagevariability, we believe that the remaininguncertainty
is not stochastic. Perceptualaliasing is simplydifferent states
of the DFAhavingthe sameoutput (i.e., view).
Thebasic idea for identifying distinctive states is to assumethat two dstates with the sameview are equal, unless
they can be provedto be different. Theycan be proveddifferent with experiences stored in the SSHcausal level when
the twodstates are directly linked by a non-nullaction. This
can also be done whenthe SSHtopological level includes a
relation incompatiblewith dstate identity, for examplewhen
the twodstates are associated with different places, or when
they mustlie on different sides of a path servingas a dividing
boundary.In ease discriminating information is not already
in the cognitive map,the rehearsal procedureproposesan ex-
28
It
At
Vo So
(turn - °)
90
/1 11181
A1
t2 12 V2
A2
ts 13 Vs
A4
t4 h Vo
A5
t5 15 Vx
Ae
te I6 Vz
A7
tr 17 Va
As
ts Is Vo
:
:
(travel 4m)
82
°)
(turn - 90
Ss
(travel 2m)
&
(turn S5
(travel
$6
(turn $7
(travel
So
:
°)
90
P0
Pat pos
Po Pal pos
1:’1 Pal
pos
P1
pos
Pas
P2 Pa2 pos
P2
Pas
pos
Ps
Pas
pos
Pa
Pat
pos
Po
Pat
pos
:
:
:
(6)
4m)
°)
90
2m)
to ~ on(Po,Pao)
tl ~ on(Po,Pal)
t2 --+ on(P1,Pal)A PO(Pal,pos,Po,
The connectivity of the graph of places and paths is derived from the tableau above. (on(Po, Pat) meansthat place
Po is on path Pat, and PO(Pal,pos,Po, P1) means that
in the place order associated with path Pal in the pos direction, place Po precedes Px.) The detailed rules are beyond the scope of this paper. The concepts are described
in [Kuipers, 2000] and the formal axiomsare given in [Remolina & Kuipers, 2001]. Roughly,we conclude that $4
So because$4 is at P2, whichis to the right of boundaryPat,
while So is at Po, whichis on Pat. Similarly for $5, Se and
$7. Weconclude that Ss = So by a minimality argument,
since there is no necessityfor themto be different.
Thus, by constructing the SSHcausal and topological
maps, Lassie determines that the four views correspond to
eight distinctive states, four placesandfour paths.
I
9~
8£
70
Figure 5: Taylor Hall, secondfloor hallway. Theactual environmentis 80 meters long and includes trash cans, lockers,
benches,desks and a portable blackboard.
4O
3O
20
8 A Natural Office Environment
10
0
11111
o
SO
n
100
i
K
n
150
200
250
Humber
(XIrekdng
examples
t.amIn Taylor2 HLWVsy
n
300
r
350
A natural environment,even an office environment,contains
muchmoredetail than the simplified notched-rectangleenvironment.To a robot with rich sensors, imagesat distinctive
states are muchmoredistinguishable. Imagevariability is the
problem,not perceptual aliasing.
Lassie explored the main hallway on the secondfloor of
Taylor Hall (Figure 5). It collected 100imagesfrom 20 distinctive states. The topological maplinking themcontained
sevenplaces and four paths. Whenclustering the images, the
internal measure Mhad its maximum
at k = 8. The external measureU confirms that this is optimal (Figure 6).
building the causal and topological mapthe robot is able to
disambiguateall twentydistinctive states, eventhoughthere
are onlyeight views.Giventhe correct labeling with dstates,
the supervisedlearner reaches 88%accuracywithin 90 trials
(Figure4(b)).
400
9O
8O
7O
60
4O
30
9
Humb~~ trading
oxanq)les
Figure 4: Learningcurve (using 10-fold cross validation) for
nearest neighbor classification of dstates given sensory images for (a) the notched-rectangleenvironment,or (b) second
floor of Taylor Hall. In both cases, recognition is approximately 90%after 90 training examples.
7 Supervised Learning to Recognize Dstates
Withunique identifiers for dstates, the supervisedlearning
step quicklylearns to identify the correct distinctive state directly from the sensory image with high accuracy. The supervised learning methodis the nearest neighbor algorithm
[Duda, Hart, &Stork, 2001]. Experienced images are represented in the sensoryspace, labeled with their true dstate.
Whena new image is experienced, the dstate label on the
nearest stored imagein the sensoryspace is proposed,and the
accuracyof this guess is recorded. Thenthe imageis stored
withits correct dstate label. Figure4 showsthe rate of correct
answers as a function of numberof imagesexperienced. In
both cases, accuracyrises near 90%at about 90 images.
In general, of course, recognition of dstates from sensory
imagescannot be doneperfectly, since there can be different
dstates whosedistinguishing features, if present at all, cannot be discerned by the robot’s sensors. In such cases, the
robotcan fall backon the historical contextof its travel, or on
further exploration.
29
Conclusion and Future Work
Wehaveestablished that bootstrap learning for place recognition can achieve high accuracy with real sensory images
from a physical robot exploring amongdistinctive states in
real environments. The methodstarts by eliminating image
variability by focusing on distinctive states and doingunsupervised clustering of images. Then,by building the causal
and topological map,distinctive states are disambiguatedand
perceptualaliasing is eliminated.Finally, supervisedlearning
of labeled imagesachieves high accuracy direct recognition
of distinctive states fromsensoryimages.
The current unsupervised and supervised learning algorithms we use are k-meansand nearest neighbor. Weplan
to experimentwithother algorithmsto fill these roles in the
learning method. Other clustering techniques maybe more
sensitive to the kinds of similarities and distinctions present
in sensor images. Supervisedlearning methodslike backprop
maymakeit possible to analyze hidden units to determine
whichfeatures are significant to the discriminationand whieh
are noise. Usingmethodslike these, it maybe possible to
identify and explain certain aspects of imagevariability, for
examplethe effect of time of day on visual imageillumination.
References
[Basye, Dean, &Kaelbling, 1995] Basye, K.; Dean, T.; and
Kaelbling, L.P. 1995. Learning dynamics: system iden-
"~
1.2~x 10
Irll~n~ measwenle~
d ciuNerlng
[Pierce &Kuipers, 1997] Pierce, D. M., and Kuipers, B. J.
1997. Maplearning with uninterpreted sensors and effeetors. Artificial Intelligence 92:169-227.
[Press et al., 1992]Press, W.H.; Teukolsky,S. A.; Vitterling,
W.T.; and Flannery, B. P. 1992. NumericalRecipes in C:
The Art of Scientific Computing.CambridgeUniversity
Press, secondedition.
[Remolina&Kuipers, 2001] Remolina, E., and Kuipers, B.
2001. A logical account of causal and topological maps.
In Proc. 17th Int. Joint Conf. on Artificial Intelligence
(IJCAI-01), 5-11. San Mateo, CA: MorganKaufmann.
[Rivest &Sehapire, 1989]Rivest, R. L., and Sehapire, R. E.
1989. Inference of finite automata using homing sequences. In Proceedings of the 21st Annual ACMSymposium on Theoretical Computing, 411-420. ACM.
[Yamauchi& Langley, 1997] Yamauehi,B., and Langley, P.
1997. Place recognition in dynamicenvironments.Journal
of Robotic Systems 14:107-120.
I,;
:i 1.1
1.05
i
4
i
8
4
i
6
~
8
i
10
i
i
12
14
k
Exlemal rnJ Of cluNor ~1
116
i
18
2O
116
18
20
0,96
k
12
14
Figure 6: After Lassie’s exploration of the Taylor hallway,
k = 8 is selected as the best number
of clusters by the internal
measureM(top), and is confirmedas optimal by the external
measure U (bottom).
tification for perceptuallychallengedagents. Artificial Intelligence 72:139-171.
[Duckett & Nehmzow,2000] Duckett, T., and Nehmzow,U.
2000. Performancecomparison of landmark recognition
systems for navigating mobilerobots. In Proc. 17th National Conf. on Artificial Intelligence (AAAI-2000),826831. AAAIPress/The MITPress.
[Duda, Hart, &Stork, 2001] Duda, R. O.; Hart, P. E.; and
Stork, D. G. 2001. Pattern Classification. NewYork:
JohnWiley&Sons, Inc., secondedition.
[Kuipers &Byun,1991] Kuipers, B. J., and Byun, Y.-T.
1991. A robot exploration and mappingstrategy based on
a semantichierarchyof spatial representations. Journalof
Robotics and AutonomousSystems 8:47-63.
[Kuipers, 2000] Kuipers, B. J. 2000. The spatial semantic
hierarchy. Artificial Intelligence 119:191-233.
[Moravee,1988] Moravec,H. P. 1988. Sensor fusion in certainty grids for mobilerobots. AI Magazine61-74.
30