Mobile Robotics COM596
Laboratory Session: week 9
Reinforcement Learning
A good way to understand reinforcement learning is to consider some of the examples and possible
applications that have guided its development.

A mobile robot decides whether it should enter a new room in search of more trash to collect or start
trying to find its way back to its battery recharging station. It makes its decision based on how quickly
and easily it has been able to find the recharger in the past.
Examples like this share features that are so basic that they are easy to overlook. All involve interaction between an
active decision-making agent and its environment, within which the agent seeks to achieve a goal despite
uncertainty about its environment. The agent's actions are permitted to affect the future state of the
environment (e.g., the next chess position, the level of reservoirs of the refinery, the next location of the
robot), thereby affecting the options and opportunities available to the agent at later times. Correct choice
requires taking into account indirect, delayed consequences of actions, and thus may require foresight or
planning.
At the same time, in all these examples the effects of actions cannot be fully predicted; thus the agent must
monitor its environment frequently and react appropriately. The example above involves goals that are explicit
in the sense that the agent can judge progress toward its goal based on what it can sense directly.
In all of these examples the agent can use its experience to improve its performance over time. The knowledge
the agent brings to the task at the start--either from previous experience with related tasks or built into it by
design or evolution--influences what is useful or easy to learn, but interaction with the environment is
essential for adjusting behavior to exploit specific features of the task.
The robot is frontally approaching an obstacle (e.g. one of the wood-bots that are used in the assignment). It
continuously checks its front IR-sensors. As soon as the robot senses the approaching obstacle it turns either
left or right, say left. Now it should sense the obstacle with the sensors at its right side. If this happens (What
could the robot do if not?) the robot can go straight again until it starts to lose contact with the obstacle. If this
happens it will turn again, but now to the right. Moving forward it should sense the obstacle again. (What could
the robot do if not?) Now the loop can start again. This obstacle-avoidance-and-re-approach should continue
until the robot is "behind" the obstacle. The robot can use either its angle or the extrapolated path in order to
determine whether it has already circumvented the obstacle.
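The loop described above can be sketched as a small state machine. The following is a hedged Python sketch, not the Matlab lab code; the robot methods used here (front_ir, right_ir, turn_left, turn_right, go_straight, behind_obstacle) are hypothetical placeholders for whatever sensing and motion interface you build on the Khepera.

```python
def circumnavigate(robot, threshold=400):
    """Obstacle-avoidance-and-re-approach loop: approach, turn left on
    contact, follow the obstacle on the right side, re-approach when
    contact is lost, and stop once the robot is behind the obstacle."""
    state = "APPROACH"
    while not robot.behind_obstacle():        # e.g. decided from odometry
        if state == "APPROACH":
            robot.go_straight()
            if robot.front_ir() > threshold:  # obstacle ahead: turn away
                robot.turn_left()
                state = "FOLLOW"
        elif state == "FOLLOW":
            if robot.right_ir() > threshold:  # obstacle on the right: keep going
                robot.go_straight()
            else:                             # losing contact: turn back towards it
                robot.turn_right()
                state = "REAPPROACH"
        elif state == "REAPPROACH":
            robot.go_straight()
            if robot.right_ir() > threshold:  # regained contact, loop again
                state = "FOLLOW"
```

The "What could the robot do if not?" questions in the text correspond to the transitions where the expected sensor reading never arrives; a robust version would add timeouts to these states.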
The angle can be calculated from the movements of the robot by a simple formula, but it may be better
to use complete odometry (x-coordinate, y-coordinate and angle) in order to keep track of the robot's
position.
Classical odometry with Khepera
The Khepera has two shaft encoders for its left and right wheels. Those two sensors increment or decrement
two tick counters according to the rotation of the wheels. One tick corresponds to a displacement of 0.08 mm of
the wheel. Implementing odometry consists of repeatedly reading these counters in order to update the pose of
the robot.

Figure 8.1: Pose of the Khepera
The pose of the robot is made of three values (x, y, θ), where x and y are the absolute Cartesian coordinates
and θ the orientation of the robot, as shown in Figure 8.1. Generally, the absolute values of the counters are
not relevant, so we are more interested in their variations during the iteration of the algorithm. Let us denote
these variations dL and dR for the left and right wheel respectively.
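Since one tick is 0.08 mm, converting the counter variations into wheel displacements dL and dR in millimetres is a one-liner. An illustrative Python sketch (the lab code itself is Matlab):

```python
TICK_MM = 0.08  # mm of wheel travel per encoder tick (Khepera specification)

def ticks_to_mm(d_left_ticks, d_right_ticks):
    """Return (dL, dR): wheel displacements in mm since the last read."""
    return d_left_ticks * TICK_MM, d_right_ticks * TICK_MM

dL, dR = ticks_to_mm(125, 125)  # both counters advanced by 125 ticks
# each wheel has travelled 125 * 0.08 = 10 mm
```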
The implementation of odometry for a typical robotic application can be done as follows:
% Global pose
Pose = [x0, y0, theta0];                        % initialise pose
...
[dL, dR] = read_encoders;                       % get encoder variations
[dx, dy, dtheta] = Odometry(dL, dR, Pose(3));   % calculate displacement
Pose = Pose + [dx, dy, dtheta];                 % update pose
...
% moving commands
where the function Odometry() is a direct implementation of the forward kinematics equations. The basis for
the equations is that the Khepera always moves on an arc of a circle, because of its geometry. The centre and
radius of this arc can be found using dL, dR, and some geometric features of the Khepera (diameter of wheels,
distance between wheels, shaft encoders step).
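A possible shape for such an Odometry() step, written here as a hedged Python sketch of the arc model (the lab itself uses Matlab): the heading change and turn radius follow equations (8.3)–(8.4) with displacements in place of velocities, and the position update follows (4.2)–(4.3) below. The 53 mm wheel separation is an assumed value; check it against your robot's specification.

```python
import math

WHEELBASE_MM = 53.0  # assumed wheel separation L for the Khepera

def odometry(dL, dR, theta, L=WHEELBASE_MM):
    """One odometry step under the arc model: given wheel displacements
    dL, dR (mm) and current heading theta (rad), return (dx, dy, dtheta)."""
    dtheta = (dR - dL) / L                  # heading change, cf. eq. (8.4)
    if abs(dtheta) < 1e-9:                  # straight-line special case (R infinite)
        d = (dL + dR) / 2.0
        return d * math.cos(theta), d * math.sin(theta), 0.0
    R = (L / 2.0) * (dR + dL) / (dR - dL)   # turn radius, cf. eq. (8.3)
    dx = R * (math.sin(theta + dtheta) - math.sin(theta))
    dy = -R * (math.cos(theta + dtheta) - math.cos(theta))
    return dx, dy, dtheta
```

For equal displacements (dL = dR = 10 mm at heading 0) this yields a pure forward step (dx = 10, dy = 0), and for opposite displacements (dL = -dR) a pure rotation in place, matching the two special cases discussed below.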
Forward kinematics
The following is fully extracted from [Lundgren, 2003].
Figure 8.2: Robot’s specification.
Each wheel follows a path around the ICC (instantaneous centre of curvature) at the same angular rate ω, thus
ω(R + L/2) = Vr (8.1)
ω(R - L/2) = Vl (8.2)
where L is the distance between the two wheels, the right wheel moves with velocity Vr and the left wheel
moves with velocity Vl. R is the distance from the ICC to the midpoint between the wheels. All these control
parameters are functions of time, which gives
R = (L/2)(Vl + Vr)/(Vr - Vl) (8.3)
ω = (Vr - Vl)/L (8.4)
Two special cases come from these equations. If Vl = Vr, then the radius R is infinite and the
robot moves in a straight line. If Vl = -Vr, then the radius R is zero and the robot rotates in place about the
midpoint of the wheel axis at angular rate ω.
The forward kinematics equations can be derived easily now that we have established the
basics. Our focus is on how the x and y coordinates and the orientation change with respect to time [2]. Let θ
be the angle of orientation, measured in radians, counter-clockwise from the x axis. If we let m(t) and θ(t) be
functions of time representing speed and orientation for the robot, then the solution will be in the form:
dx/dt = m(t)cos(θ(t)) (3.1)
dy/dt = m(t)sin(θ(t)) (3.2)
The change of orientation with respect to time is the same as the angular rate ω. Therefore
dθ/dt = ω = (Vr – Vl) / L (3.3)
Integrating this equation yields a function for the robot's orientation with respect to time, where the
robot's initial orientation θ(0) is written θ0:
θ(t) = (Vr – Vl)t / L + θ0 (3.4)
Since the velocity in functions (3.1) and (3.2) above simply equals the average speed of the two wheels, that
is m(t) = (Vl + Vr)/2, substituting this into (3.1) and (3.2) gives:
dx/dt = [(Vr + Vl)/2]cos(θ(t)) (3.5)
dy/dt = [(Vr + Vl)/2]sin(θ(t)) (3.6)
The final step is to integrate equations (3.5) and (3.6), taking the initial positions to be x(0) = x0 and y(0)
= y0, to get:
x(t) = x0 + (L/2)(Vr + Vl)/(Vr - Vl)[sin((Vr - Vl)t/L + θ0) - sin(θ0)] (3.8)
y(t) = y0 - (L/2)(Vr + Vl)/(Vr - Vl)[cos((Vr - Vl)t/L + θ0) - cos(θ0)] (3.9)
Noting that (L/2)(Vr + Vl)/(Vr - Vl) = R, the robot's turn radius, and that (Vr - Vl)/L = ω, equations (3.8) and
(3.9) reduce to:
x(t) = x0 + R[sin(ωt + θ0)-sin(θ0)] (4.0)
y(t) = y0 - R[cos(ωt + θ0)-cos(θ0)] (4.1)
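As a quick sanity check of (4.0) and (4.1), a robot starting at the origin with θ0 = 0 that turns left through ωt = π/2 should end up a quarter circle to its front-left, at (R, R). The radius of 100 mm is just an example value:

```python
import math

R = 100.0           # turn radius in mm (arbitrary example value)
theta0 = 0.0        # initial orientation
wt = math.pi / 2    # quarter turn: omega * t = pi/2

x = 0.0 + R * (math.sin(wt + theta0) - math.sin(theta0))  # eq. (4.0)
y = 0.0 - R * (math.cos(wt + theta0) - math.cos(theta0))  # eq. (4.1)
# x ≈ 100.0, y ≈ 100.0: a quarter circle to the front-left, as expected
```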
This is the theory that lies behind implementing dead reckoning on a wheeled mobile robot using differential
steering. The only thing one has to do is to substitute the terms Vr and Vl with Sr and Sl, the distances
travelled by the right and left wheels respectively, so that displacements are used rather than speeds; as a
result the time value t is also dropped. Equations (3.8) and (3.9) then become:
x = x0 + (L/2)(Sr + Sl)/(Sr - Sl)[sin((Sr - Sl)/L + θ0) - sin(θ0)] (4.2)
y = y0 - (L/2)(Sr + Sl)/(Sr - Sl)[cos((Sr - Sl)/L + θ0) - cos(θ0)] (4.3)
The following Matlab code will be useful for coding the controller of the Khepera robot.
Example 8.1: Matlab code for obstacle avoidance
function avoid(port,baud,time)
% avoid(port,baud,time)
% port = serial port to communicate with (port<0 ==> simulated robot, port>=0 ==> real robot)
% baud = baud rate
% time = time to run behaviour
ref = kiks_kopen([port,baud,100]);
kSetEncoders(ref,0,0);
reflex = 0;
speed = [10 10];
t0 = clock;
loops = 0;
while (kiks_ktime(port) < time)
    loops = loops + 1;
    reflex = kProximity(ref);
    lights = kAmbient(ref); % ambient light readings (not used by this behaviour)
    weightsL = [10  2  4  6  6  4  2 4 4];
    weightsR = [10 -2 -4 -6 -6 -4 -2 4 4];
    speed = calcSpd(weightsL,weightsR,reflex)/2;
    kSetSpeed(ref,speed(1),speed(2));
end;
%kGetEncoders(ref)
t = etime(clock,t0);
fprintf('%.1f loops in %.1f seconds = %.1f loops/second (%.0f%%)\n', ...
    loops, t, loops/t, (time/t)*100);
kSetSpeed(ref,0,0);
kiks_kclose(ref);

function out = calcSpd(weightsL, weightsR, reflex)
mL = weightsL(1);
mR = weightsR(1);
for i = 2:9
    mL = weightsL(i)*(1/400)*reflex(i-1) + mL;
    mR = weightsR(i)*(1/400)*reflex(i-1) + mR;
end
if sum(reflex(1:4)) > sum(reflex(5:8))
    out = [round(mL) round(mR)];
else
    out = [round(mR) round(mL)];
end;
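The calcSpd helper above is a Braitenberg-style weighted sum: the first weight is a base speed, and each of the eight proximity readings is scaled and added to it. A rough Python transcription of the core arithmetic, just to make the formula explicit (the left/right output swap at the end of the Matlab version is omitted here, and the scale of 400 matches Example 8.1's divisor):

```python
def calc_spd(weights_l, weights_r, reflex, scale=400.0):
    """Python transcription of the weighted sum in calcSpd: weights[0] is
    the base speed, weights[1:9] weight the eight proximity readings."""
    m_l = weights_l[0] + sum(w * r / scale for w, r in zip(weights_l[1:9], reflex[:8]))
    m_r = weights_r[0] + sum(w * r / scale for w, r in zip(weights_r[1:9], reflex[:8]))
    return round(m_l), round(m_r)

# With no obstacle (all readings 0) the robot drives at the base speeds:
# calc_spd([10,2,4,6,6,4,2,4,4], [10,-2,-4,-6,-6,-4,-2,4,4], [0]*8) == (10, 10)
```

Choosing the weights is the interesting part: positive weights on one side and negative on the other make the robot steer away from whichever side sees the stronger reflection.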
Example 8.2: Matlab code for wall following
function follow(port,baud,time)
% follow(port,baud,time)
% port = serial port to communicate with (port<0 ==> simulated robot, port>=0 ==> real robot)
% baud = baud rate
% time = time to run behaviour
ref = kiks_kopen([port,baud,100]); % KiKS starts up automatically when kiks_kopen is called
reflex = 0;
speed = [1 1];
loops = 0;  % for calculating loops/second, not needed for the simulation
t0 = clock; % for calculating loops/second, not needed for the simulation
kSetEncoders(ref,0,0);
while (kiks_ktime(port) < time)
    % kiks_pause(0.2); % just to show how much a simulation can be accelerated
    loops = loops + 1; % for calculating loops/second, not needed for the simulation
    reflex = kProximity(ref);
    weightsL = [5 0 2  3  3 0 0 0 0];
    weightsR = [7 0 0 -3 -3 0 0 0 0];
    speed = calcspd(weightsL,weightsR,reflex);
    kSetSpeed(ref,speed(1),speed(2));
end;
kSetSpeed(ref,0,0);
% calculate stats
fprintf('%1.0f seconds simulated in %1.0f seconds (%3.0f%% of real time)\n', ...
    kiks_ktime(port), etime(clock,t0), (kiks_ktime(port)/etime(clock,t0))*100);
fprintf('%1.1f loops / second\n', loops/kiks_ktime(port));
% close port
kiks_kclose(ref);

function out = calcspd(weightsL, weightsR, reflex)
mL = weightsL(1);
mR = weightsR(1);
for i = 2:9
    mL = weightsL(i)*(1/1023)*reflex(i-1) + mL;
    mR = weightsR(i)*(1/1023)*reflex(i-1) + mR;
end
out = [round(mL) round(mR)];
Example 8.3: Matlab code for maze following
function maze(port,baud,maze_size)
% maze(port,baud,maze_size)
% port = serial port to communicate with (port<0 ==> simulated robot, port>=0 ==> real robot)
% baud = baud rate
% maze_size = size of maze
if (nargin<3) maze_size = 16; end;
if (nargin<2) baud = 9600; end;
if (nargin<1) port = -1; end;
if port<0 % only if this is a simulated robot
    % set up a new arena using the kiks_generate_maze function
    kiks_arena(kiks_generate_maze(maze_size,maze_size));
end;
ref = kiks_kopen([port,baud,100]);
reflex = 0;
speed = [1 1];
loops = 0;  % for calculating loops/second
t0 = clock; % for calculating loops/second
kSetEncoders(ref,0,0);
lights = kAmbient(ref);
while (min(lights) > 0) % stop when the robot gets near the light
    loops = loops + 1; % for calculating loops/second
    reflex = kProximity(ref);
    weightsL = [-1 4 0  3 0 0 0 0 0];
    weightsR = [ 3 0 0 -8 0 0 0 0 0];
    speed = calcspd(weightsL,weightsR,reflex);
    if all(speed == 0) speed = [1 1]; end;
    kSetSpeed(ref,speed(1),speed(2));
    mov_err = kGetStatus(ref);
    backup(ref,[mov_err(3) mov_err(6)],speed);
    lights = kAmbient(ref);
end;
kSetSpeed(ref,0,0);
% calculate stats
fprintf('%1.0f seconds simulated in %1.0f seconds (%3.0f%% of real time)\n', ...
    kiks_ktime(port), etime(clock,t0), (kiks_ktime(port)/etime(clock,t0))*100);
fprintf('%1.1f loops / second\n', loops/kiks_ktime(port));
% close port
kiks_kclose(ref);

function out = calcspd(weightsL, weightsR, reflex)
mL = weightsL(1);
mR = weightsR(1);
for i = 2:9
    mL = weightsL(i)*(1/1023)*reflex(i-1) + mL;
    mR = weightsR(i)*(1/1023)*reflex(i-1) + mR;
end
out = [round(mL) round(mR)];

function backup(ref,mov_err,spd)
% back up if both wheels report a movement error (robot stalled against a wall)
if mov_err > spd*0.8
    kSetEncoders(ref,0,0);
    kMoveTo(ref,-100,-500);
    % kiks_pause(1);
    status = kGetStatus(ref);
    while (status(1)+status(2) < 2) % wait until the backup move completes
        status = kGetStatus(ref);
    end;
end;