Learning Distinctions and Rules in
a Continuous World through Active
Exploration
PAPER BY
JONATHAN MUGAN & BENJAMIN KUIPERS
PRESENTED BY
DANIEL HOUGH
The Challenge
 To build a robot which learns its environment like
children do.
 Piaget [1952] theorised that children constructed
this knowledge in stages
 Cohen [2002] proposed that children have a
domain-general information processing system for
bootstrapping knowledge.
Foundations
 The Focus of the work: How a developing agent can
learn temporal contingencies in the form of
predictive rules over events.
 Watson [2001] proposed a model of contingencies based on his observations of infant behaviour (a sketch follows this list):
 Prospective temporal contingency: Event B tends to follow Event A with a likelihood greater than chance.
 Retrospective temporal contingency: Event A tends to come before Event B more often than chance.
 Distinctions must be found to determine
when an event has occurred.
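As a rough illustration only (my own sketch, not from the paper or the presentation), a prospective contingency can be estimated by comparing how often Event B follows Event A within a short window against B's base rate:

# Minimal sketch (assumption: my own helper, not the authors' code).
# Estimates a prospective contingency: does B follow A more often than chance?

def prospective_contingency(a_events, b_events, window=3):
    """Return (P(B within `window` steps after A), base rate of B)."""
    a_times = [t for t, a in enumerate(a_events) if a]
    followed = sum(any(b_events[t + 1:t + 1 + window]) for t in a_times)
    p_b_after_a = followed / len(a_times) if a_times else 0.0
    base_rate = sum(b_events) / len(b_events)
    return p_b_after_a, base_rate

# Toy trace: B reliably follows A one step later.
a = [1, 0, 0, 0, 1, 0, 0, 0]
b = [0, 1, 0, 0, 0, 1, 0, 0]
print(prospective_contingency(a, b))   # (1.0, 0.25): well above chance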
Foundations
 Drescher [1991] proposed a model inspired by
Piaget where contingencies (here schemas) are found
using marginal attribution.
 Results that follow actions are found, in a manner similar to Watson’s contingencies.
 For each schema (in the form of an action + result),
the algorithm searches for context (situation) that
makes the result more likely to follow that action.
The Method
Introduction
 Here, prospective contingencies as well as
contingencies in which events occur simultaneously
are represented using predictive rules
 These rules are learned using a method inspired by
marginal attribution
 The difference from Drescher’s work is that the variables here are continuous.
 This brings up the issue of determining when events
occur, so distinctions must be found.
The Method
Introduction
 The motor babbling method from last week is used to learn distinctions and contingencies.
 This was undirected and does not scale to larger problems – too much effort is wasted on uninteresting portions of the state space.
The Method
Introduction
 In this algorithm, the agent receives as input the values
of time-varying continuous variables but can only
represent, reason about and construct knowledge using
discrete values.
 Continuous values are discretised using distinctions in the form of landmarks (see the sketch below):
 A discrete value v(t) is defined for each continuous variable v’(t).
 If, for landmarks v1 and v2, v1 < v’(t) < v2, then v(t) takes the open interval between v1 and v2 as its value, v = (v1, v2).
 This association means the agent can focus on changes of v, i.e. events.
 The agent greedily learns rules that use one event to predict another.
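A minimal sketch of landmark-based discretisation (my own illustration under the definitions above, not the paper's implementation):

# Map a continuous reading v'(t) to a qualitative value: either a landmark
# (within a small tolerance) or the open interval between adjacent landmarks.

def discretise(value, landmarks, eps=1e-6):
    points = [float("-inf")] + sorted(landmarks) + [float("inf")]
    for lo, hi in zip(points, points[1:]):
        if abs(value - lo) < eps:
            return lo                # value sits (approximately) on a landmark
        if lo < value < hi:
            return (lo, hi)          # value falls in an open interval
    return None                      # unreachable for finite inputs

print(discretise(0.3, [0.0, 1.0]))   # (0.0, 1.0)
print(discretise(1.0, [0.0, 1.0]))   # 1.0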
The Method
How it’s evaluated
 The method is evaluated using a simulated robot
based on the situation of a baby sitting in a high
chair.
Fig. 1: Adorable
Fig. 2: Less adorable
The Method
Knowledge Representation & Learning
 The goal is for the agent to learn to identify
landmark values from its own experience.
 The importance of a qualitative distinction is
estimated from the reliability of the rules that can be
learned, given that distinction.
 The qualitative representation is based on QSIM [Kuipers, 1994]
The Method
Knowledge Representation & Learning
 A continuous variable x’(t), which ranges over some subset of the real number line (-∞, +∞), is represented by a discrete variable x(t) for its magnitude and a discrete variable x’’(t) for its direction of change.
 In QSIM, magnitude is abstracted to a discrete variable x(t) that ranges over a quantity space Q(x) of qualitative values (a small construction sketch follows):
Q(x) = L(x) ∪ I(x)
where
L(x) = {x1, ..., xn} are the landmark values
I(x) = {(-∞, x1), (x1, x2), ..., (xn, +∞)} are mutually disjoint open intervals
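A small construction sketch (my own, following the definition above): n landmarks give n + (n+1) = 2n+1 qualitative values.

# Build Q(x) = L(x) ∪ I(x) from a list of landmark values.

def quantity_space(landmarks):
    lms = sorted(landmarks)
    pts = [float("-inf")] + lms + [float("inf")]
    intervals = list(zip(pts, pts[1:]))        # I(x): n+1 open intervals
    qspace = []
    for lm, interval in zip(lms + [None], intervals):
        qspace.append(interval)
        if lm is not None:
            qspace.append(lm)                  # interleave L(x) between intervals
    return qspace

print(quantity_space([1.0, 2.0]))
# [(-inf, 1.0), 1.0, (1.0, 2.0), 2.0, (2.0, inf)]  ->  2*2 + 1 = 5 values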
The Method
Knowledge Representation & Learning
 A quantity space with two landmarks might be described as (x1, x2), which implies five distinct qualitative values:
Q(x) = {(-∞, x1), x1, (x1, x2), x2, (x2, +∞)}
 A discrete variable x’’(t) for the direction of change of x’(t) has a single intrinsic landmark at 0, so its initial quantity space is
Q(x’’) = {(-∞, 0), 0, (0, +∞)}
The Method
Knowledge Representation & Learning: Events
 If a is the qualitative value of a discrete variable A, meaning a ∈ Q(A), then the event At→a is defined by A(t – 1) ≠ a and A(t) = a (see the sketch below)
 That is, an event takes place when a discrete variable
A changes to value a at time t, from some other
value.
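A minimal sketch of event detection over a recorded trace of qualitative values (my own helper; the names are illustrative):

# The event "A changes to value a at time t" fires exactly when
# A(t-1) != a and A(t) == a.

def event_times(trace, a):
    return [t for t in range(1, len(trace))
            if trace[t] == a and trace[t - 1] != a]

trace = ["(0,1)", "(0,1)", "1", "1", "(1,2)"]   # qualitative values of A over time
print(event_times(trace, "1"))      # [2]: A reached the landmark value 1 at t = 2
print(event_times(trace, "(1,2)"))  # [4]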
The Method
Knowledge Representation & Learning: Predictive Rules
 This is how temporal contingencies are described
 There are two types of predictive rules (a rough representation is sketched below):
 Causal: one event occurs after another, later in time
 Functional: the events are linked by a function, so they happen at the same time
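A rough sketch of how such rules might be represented (my own encoding; the paper's rule structure carries more statistics than this):

from dataclasses import dataclass, field

@dataclass
class PredictiveRule:
    antecedent: str                  # event u that triggers the prediction
    consequent: str                  # event h that the rule predicts
    context: list = field(default_factory=list)   # conditions added later
    causal: bool = True              # True: h follows u in time; False: simultaneous
    successes: int = 0               # times h occurred when the rule applied
    attempts: int = 0                # times the rule applied at all

    def reliability(self) -> float:
        return self.successes / self.attempts if self.attempts else 0.0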
The Method
Learning a predictive rule
 The agent wants to learn a rule which predicts a certain event h
 It looks at other events and, if one of them, u, makes h more likely than the others do, it creates a rule with that event as the antecedent (see the sketch below)
 It does so by starting from an initial rule with no context
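A simplified sketch of the antecedent search (my own approximation of marginal attribution; the paper's statistics and thresholds are richer than this):

# For a target event h, pick the candidate event u that most increases the
# probability of h, and propose the initial rule u -> h with no context.

def propose_rule(h, candidates, follows, counts, base_rate):
    # follows[u]: times h occurred soon after u;  counts[u]: times u occurred
    best_u, best_lift = None, 1.0
    for u in candidates:
        if counts.get(u, 0) == 0:
            continue
        p_h_given_u = follows.get(u, 0) / counts[u]
        lift = p_h_given_u / base_rate if base_rate > 0 else float("inf")
        if lift > best_lift:
            best_u, best_lift = u, lift
    return (best_u, h) if best_u is not None else None

print(propose_rule("h", ["u1", "u2"],
                   follows={"u1": 8, "u2": 1},
                   counts={"u1": 10, "u2": 10},
                   base_rate=0.1))    # ('u1', 'h'): u1 makes h far more likely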
The Method
Landmarks
 When a new landmark is inserted into Q(x), we replace one interval with two intervals and the dividing landmark: for a new landmark x* inside (xi, xi+1) we get (xi, x*), x*, (x*, xi+1) (see the sketch below)
 Whenever a new landmark is inserted, statistics about the previous state space are thrown out and new ones are built up, so the reliability of each rule must be re-checked.
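A small sketch of the interval split (my own helper, following the slide's notation):

# Inserting a new landmark x_star splits the interval (xi, xi+1) that
# contains it into (xi, x_star), x_star, (x_star, xi+1). Statistics keyed
# on the old interval no longer apply and must be re-gathered.

def split_interval(interval, x_star):
    lo, hi = interval
    assert lo < x_star < hi, "new landmark must fall inside the interval"
    return [(lo, x_star), x_star, (x_star, hi)]

print(split_interval((1.0, 3.0), 2.0))   # [(1.0, 2.0), 2.0, (2.0, 3.0)]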
The Method
The Learning Process
Do 7 times (a code sketch follows this list):
1. a) Actively explore the world, with a set of candidate goals coming from the discrete variables in M, for 1000 timesteps
   b) Learn new causal and functional rules
   c) Learn new landmarks by examining statistics stored in rules and events
2. Gather 3000 more timesteps of experience to solidify the learned rules
3. Update the strata
4. Goto 1
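A high-level sketch of the loop as read from this slide (the function names on agent are hypothetical placeholders, not the authors' API):

def learning_process(agent, world, iterations=7):
    for _ in range(iterations):
        # 1a. Actively explore, choosing candidate goals from discrete variables in M
        agent.explore(world, goals=agent.candidate_goals(), timesteps=1000)
        # 1b. Learn new causal and functional rules from the gathered statistics
        agent.learn_rules()
        # 1c. Learn new landmarks from statistics stored in rules and events
        agent.learn_landmarks()
        # 2. Gather more experience to solidify the newly learned rules
        agent.explore(world, goals=agent.candidate_goals(), timesteps=3000)
        # 3. Update the strata
        agent.update_strata()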
Evaluation
Experimental Setup
 The robot has two motor variables, one for each of its
degrees of freedom
 A perceptual system creates variables for each of the two tracked objects in the environment: the hand and the block.
 There are too many variables to reasonably list here; each has various constraints
 During learning, if the block is knocked off the tray or is not moved for 300 timesteps, it is put back on the tray in a random position within reach of the agent
Evaluation
Experimental Results
 The algorithm was evaluated using the simple task of
moving the block in a specified direction.
 It was run five times using passive learning and five times using active learning, and each run lasted 120,000 timesteps.
 Each active run of the algorithm resulted in an
average of 62 predictive rules.
 The agent gains proficiency as it learns until reaching a performance threshold at approximately 70,000 timesteps in both conditions.
Evaluation
Experimental Results
 Active exploration appears to do better: at 40,000 timesteps, active learning achieves the level that passive learning only reaches at 60,000 timesteps.
The Complexity of Space and Time
 The storage required to learn new rules is O(e²), as is the number of possible rules – but only a small number are actually learned by the agent.
 Using marginal attribution, each rule requires O(e) storage, although all pairs of events are stored for simplicity.
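 As a rough illustration of these orders of growth (my own arithmetic, not a figure from the paper): with e = 100 distinct events, statistics over all event pairs amount to roughly 100² = 10,000 entries, while the context search for any single rule needs only on the order of 100 entries.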
Conclusion
 At first, the agent could only determine the direction of movement of an object
 Through active exploration of its environment, using rules to learn distinctions and then using those distinctions to learn more rules, the agent progressed from a very simple representation towards a representation that is aligned with the natural “joints” of its environment.