Mobile Robotics COM596
Laboratory Session: Week 9
Reinforcement Learning

A good way to understand reinforcement learning is to consider some of the examples and possible applications that have guided its development. A mobile robot decides whether it should enter a new room in search of more trash to collect or start trying to find its way back to its battery recharging station. It makes its decision based on how quickly and easily it has been able to find the recharger in the past.

Examples like this share features that are so basic that they are easy to overlook. All involve interaction between an active decision-making agent and its environment, within which the agent seeks to achieve a goal despite uncertainty about its environment. The agent's actions are permitted to affect the future state of the environment (e.g., the next chess position, the level of the reservoirs of the refinery, the next location of the robot), thereby affecting the options and opportunities available to the agent at later times. Correct choice requires taking into account indirect, delayed consequences of actions, and thus may require foresight or planning. At the same time, in all these examples the effects of actions cannot be fully predicted, so the agent must monitor its environment frequently and react appropriately.

The examples above involve goals that are explicit in the sense that the agent can judge progress toward its goal based on what it can sense directly. In all of these examples the agent can use its experience to improve its performance over time. The knowledge the agent brings to the task at the start (either from previous experience with related tasks or built into it by design or evolution) influences what is useful or easy to learn, but interaction with the environment is essential for adjusting behavior to exploit specific features of the task.

Circumventing an obstacle

The robot is frontally approaching an obstacle (e.g. one of the wood-bots that are used in the assignment). It continuously checks its front IR sensors. As soon as the robot senses the approaching obstacle it turns either left or right, say left. Now it should sense the obstacle with the sensors on its right side. If this happens (What could the robot do if not?) the robot can go straight again until it starts to lose contact with the obstacle. When that happens it turns again, but now to the right. Moving forward, it should sense the obstacle again. (What could the robot do if not?) Now the loop can start again. This obstacle-avoidance-and-re-approach behaviour should continue until the robot is "behind" the obstacle. The robot can use either its heading angle or the extrapolated path to determine whether it has already circumvented the obstacle. The angle can be calculated from the robot's movements with a simple formula, but it may be better to use complete odometry (x-coordinate, y-coordinate and angle) to keep track of the robot's position.
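As an illustration only, a minimal sketch of this turn-and-re-approach loop is given below. It is not the assignment solution: the sensor indices, the threshold and the wheel speeds are assumptions that have to be tuned in the arena, and it relies on the KiKS toolbox functions (kiks_kopen, kiks_ktime, kProximity, kSetSpeed) used in the examples at the end of this session.

function circumvent(time)
% circumvent(time) - rough sketch of the obstacle-circumvention loop (simulated robot).
% The sensor indices, threshold and speeds below are assumptions to be tuned.
ref = kiks_kopen([-1,9600,100]);       % open the simulated Khepera
THRESH = 300;                          % assumed "obstacle seen" proximity level
while (kiks_ktime(-1) < time)
    reflex = kProximity(ref);          % the 8 IR proximity readings
    front = max(reflex(3:4));          % assumed front-facing sensors
    right = max(reflex(5:6));          % assumed right-hand side sensors
    if front > THRESH
        kSetSpeed(ref,-3,3);           % obstacle ahead: turn left on the spot
    elseif right > THRESH
        kSetSpeed(ref,5,5);            % obstacle on the right side: follow it straight
    else
        kSetSpeed(ref,5,2);            % lost contact: curve right to re-approach it
    end
    % TODO: use the pose from odometry (next section) to decide when the
    % robot is "behind" the obstacle and the loop can stop.
end
kSetSpeed(ref,0,0);
kiks_kclose(ref);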
Classical odometry with Khepera

The Khepera has two shaft encoders, one for its left wheel and one for its right wheel. These two sensors increment or decrement two tick counters according to the rotation of the wheels; 1 tick corresponds to a displacement of 0.08 mm of the wheel. Implementing odometry consists of repeatedly reading these counters in order to update the pose of the robot.

Figure 8.1: Pose of the Khepera

The pose of the robot is made of three values (x, y, θ), where x and y are the absolute Cartesian coordinates and θ is the orientation of the robot, as shown in Figure 8.1.

Generally, the absolute values of the counters are not relevant; we are more interested in their variations between two iterations of the algorithm. Let us denote these variations dL and dR for the left and right wheel respectively. The implementation of odometry for a typical robotic application can be done as follows:

% Global pose
Pose = [x0, y0, theta0]                       % initialise the pose
...
[dL, dR] = read_encoders                      % get the encoder variations
[dx, dy, dtheta] = Odometry(dL, dR, theta)    % calculate the displacement
Pose = Pose + [dx, dy, dtheta]                % update the pose
...                                           % motion commands

where the function Odometry() is a direct implementation of the forward kinematics equations. The basis for these equations is that, because of its geometry, the Khepera always moves on an arc of a circle. The centre and radius of this arc can be found from dL, dR and some geometric features of the Khepera (wheel diameter, distance between the wheels, shaft-encoder step).

Forward kinematics

The following derivation is taken from [Lundgren, 2003].

Figure 8.2: Robot's specification.

Each wheel follows a path around the ICC (the instantaneous centre of curvature) at the same angular rate ω, thus

ω(R + L/2) = Vr        (8.1)
ω(R - L/2) = Vl        (8.2)

where L is the distance between the two wheels, the right wheel moves with velocity Vr, the left wheel moves with velocity Vl, and R is the distance from the ICC to the midpoint between the wheels. All these parameters are functions of time. Solving for R and ω gives

R = (L/2)(Vl + Vr)/(Vr - Vl)        (8.3)
ω = (Vr - Vl)/L                      (8.4)

Two special cases follow from these equations. If Vl = Vr, the radius R is infinite and the robot moves in a straight line. If Vl = -Vr, the radius R is zero and the robot rotates in place about the midpoint of its wheel axis with angular rate ω = 2Vr/L.

The forward kinematics equations can be derived easily now that we have established the basics. Our focus is on how the x and y coordinates and the orientation change with respect to time [2]. Let θ be the angle of orientation, measured in radians, counter-clockwise from the x axis. If we let m(t) and θ(t) be functions of time representing the speed and orientation of the robot, then the motion is described by

dx/dt = m(t) cos(θ(t))        (3.1)
dy/dt = m(t) sin(θ(t))        (3.2)

The change of orientation with respect to time is the same as the angular rate ω, therefore

dθ/dt = ω = (Vr - Vl)/L        (3.3)

Integrating this equation yields the robot's orientation as a function of time, with the initial orientation θ(0) written as θ0:

θ(t) = (Vr - Vl)t/L + θ0        (3.4)

Since the speed m(t) in (3.1) and (3.2) simply equals the average speed of the two wheels, m(t) = (Vl + Vr)/2, substituting into (3.1) and (3.2) gives

dx/dt = [(Vr + Vl)/2] cos(θ(t))        (3.5)
dy/dt = [(Vr + Vl)/2] sin(θ(t))        (3.6)

The final step is to integrate equations (3.5) and (3.6), taking the initial position to be x(0) = x0 and y(0) = y0:

x(t) = x0 + (L/2)(Vr + Vl)/(Vr - Vl)[sin((Vr - Vl)t/L + θ0) - sin(θ0)]        (3.8)
y(t) = y0 - (L/2)(Vr + Vl)/(Vr - Vl)[cos((Vr - Vl)t/L + θ0) - cos(θ0)]        (3.9)

Noting that (L/2)(Vr + Vl)/(Vr - Vl) = R, the robot's turn radius, and that (Vr - Vl)/L = ω, equations (3.8) and (3.9) can be reduced to

x(t) = x0 + R[sin(ωt + θ0) - sin(θ0)]        (4.0)
y(t) = y0 - R[cos(ωt + θ0) - cos(θ0)]        (4.1)

This is the theory behind implementing dead reckoning on a wheeled mobile robot using differential steering. The only thing one has to do is to substitute the speeds Vr and Vl with the displacements Sr and Sl (the distances travelled by the right and left wheels respectively), and as a result also drop the time value t. When this has been done, equations (3.8) and (3.9) become

x = x0 + (L/2)(Sr + Sl)/(Sr - Sl)[sin((Sr - Sl)/L + θ0) - sin(θ0)]        (4.2)
y = y0 - (L/2)(Sr + Sl)/(Sr - Sl)[cos((Sr - Sl)/L + θ0) - cos(θ0)]        (4.3)
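A minimal MATLAB sketch of the Odometry() function used in the pose-update loop above is given below, directly implementing (4.2) and (4.3). It is only a sketch: the tick size is the 0.08 mm quoted earlier, but the wheel separation L = 53 mm is an assumption that should be checked against the Khepera documentation, and the straight-line case Vl = Vr is handled separately because (4.2) and (4.3) divide by (Sr - Sl). Saved as Odometry.m, it returns the displacement since the previous call rather than the absolute position.

function [dx, dy, dtheta] = Odometry(dL, dR, theta)
% Odometry - one possible dead-reckoning update for the Khepera (sketch).
% dL, dR = encoder-count variations since the last call, theta = current orientation.
TICK = 0.08;     % mm of wheel travel per encoder tick (from the text above)
L    = 53.0;     % ASSUMED wheel separation in mm - check the robot's documentation

Sl = dL * TICK;                    % left-wheel displacement  [mm]
Sr = dR * TICK;                    % right-wheel displacement [mm]
dtheta = (Sr - Sl) / L;            % change of orientation    [rad]

if abs(Sr - Sl) < 1e-6
    % special case Vl = Vr: straight-line motion (the turn radius R is infinite)
    dx = Sl * cos(theta);
    dy = Sl * sin(theta);
else
    R  = (L/2) * (Sr + Sl) / (Sr - Sl);           % turn radius, as in (8.3)
    dx =  R * (sin(dtheta + theta) - sin(theta)); % equation (4.2) without x0
    dy = -R * (cos(dtheta + theta) - cos(theta)); % equation (4.3) without y0
end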
The following MATLAB code examples will be useful for coding the controller of the Khepera robot.

Example 8.1: Matlab code for obstacle avoidance

function avoid(port,baud,time)
% avoid(port,baud,time)
% port = serial port to communicate with (port<0 ==> simulated robot, port>=0 ==> real robot)
% baud = baud rate
% time = time to run behaviour
ref = kiks_kopen([port,baud,100]);
kSetEncoders(ref,0,0);
reflex = 0;
speed = [10 10];
t0 = clock;
loops = 0;
while (kiks_ktime(port) < time)
    loops = loops + 1;
    reflex = kProximity(ref);               % the 8 IR proximity readings
    lights = kAmbient(ref);                 % ambient light readings (not used here)
    weightsL = [10 2 4 6 6 4 2 4 4];        % bias + per-sensor weights for the left motor
    weightsR = [10 -2 -4 -6 -6 -4 -2 4 4];  % bias + per-sensor weights for the right motor
    speed = calcSpd(weightsL,weightsR,reflex)/2;
    kSetSpeed(ref,speed(1),speed(2));
end;
%kGetEncoders(ref)
t = etime(clock,t0);
disp(sprintf('%.1f loops in %.1f seconds = %.1f loops/second (%.0f%%)\n',loops,t,loops/t,(time/t)*100));
kSetSpeed(ref,0,0);
kiks_kclose(ref);

function out = calcSpd(weightsL, weightsR, reflex)
% combine a constant bias with a weighted sum of the 8 proximity readings
mL = weightsL(1);
mR = weightsR(1);
for i = 2:9
    mL = weightsL(i)*(1/400)*reflex(i-1) + mL;
    mR = weightsR(i)*(1/400)*reflex(i-1) + mR;
end
if sum(reflex(1:4)) > sum(reflex(5:8))
    out = [round(mL) round(mR)];    % keep the computed left/right speeds
else
    out = [round(mR) round(mL)];    % otherwise mirror them
end;
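Roughly speaking, calcSpd() adds to the forward bias in the first weight a weighted contribution from each proximity reading, and the final swap mirrors the two motor speeds so that the robot steers away from the side that sees the obstacle. Assuming the KiKS toolbox is installed and on the MATLAB path, the behaviour can be tried on the simulated robot directly from the command window; the 30-second run time below is just an example value:

avoid(-1, 9600, 30)     % simulated robot (port = -1), 30 s of obstacle avoidance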
Example 8.2: Matlab code for wall following

function follow(port,baud,time)
% follow(port,baud,time)
% port = serial port to communicate with (port<0 ==> simulated robot, port>=0 ==> real robot)
% baud = baud rate
% time = time to run behaviour
ref = kiks_kopen([port,baud,100]);   % KiKS starts up automatically when kiks_kopen is called
reflex = 0;
speed = [1 1];
loops = 0;    % for calculating loops/second, not needed for the simulation
t0 = clock;   % for calculating loops/second, not needed for the simulation
kSetEncoders(ref,0,0);
while (kiks_ktime(port) < time)
    % kiks_pause(0.2);   % just to show how much a simulation can be accelerated
    loops = loops + 1;   % for calculating loops/second, not needed for the simulation
    reflex = kProximity(ref);
    weightsL = [5 0 2 3 3 0 0 0 0];
    weightsR = [7 0 0 -3 -3 0 0 0 0];
    speed = calcspd(weightsL,weightsR,reflex);
    kSetSpeed(ref,speed(1),speed(2));
end;
kSetSpeed(ref,0,0);
% calculate stats
fprintf('%1.0f seconds simulated in %1.0f seconds (%3.0f%% of real time)\n',kiks_ktime(port),etime(clock,t0),(kiks_ktime(port)/etime(clock,t0))*100)
fprintf('%1.1f loops / second\n',loops/kiks_ktime(port))
% close port
kiks_kclose(ref);

function out = calcspd(weightsL, weightsR, reflex)
% combine a constant bias with a weighted sum of the 8 proximity readings
mL = weightsL(1);
mR = weightsR(1);
for i = 2:9
    mL = weightsL(i)*(1/1023)*reflex(i-1) + mL;
    mR = weightsR(i)*(1/1023)*reflex(i-1) + mR;
end
out = [round(mL) round(mR)];

Example 8.3: Matlab code for maze following

function maze(port,baud,maze_size)
% maze(port,baud,maze_size)
% port = serial port to communicate with (port<0 ==> simulated robot, port>=0 ==> real robot)
% baud = baud rate
% maze_size = size of maze
if (nargin < 3) maze_size = 16; end;
if (nargin < 2) baud = 9600; end;
if (nargin < 1) port = -1; end;
if port < 0
    % only if this is a simulated robot:
    % set up a new arena using the kiks_generate_maze function
    kiks_arena(kiks_generate_maze(maze_size,maze_size));
end;
ref = kiks_kopen([port,baud,100]);
reflex = 0;
speed = [1 1];
loops = 0;    % for calculating loops/second
t0 = clock;   % for calculating loops/second
kSetEncoders(ref,0,0);
lights = kAmbient(ref);
while (min(lights) > 0)     % stop when the robot gets near the light
    loops = loops + 1;      % for calculating loops/second
    reflex = kProximity(ref);
    weightsL = [-1 4 0 3 0 0 0 0 0];
    weightsR = [ 3 0 0 -8 0 0 0 0 0];
    speed = calcspd(weightsL,weightsR,reflex);
    if speed == [0 0]       % never stand completely still
        speed = [1 1];
    end;
    kSetSpeed(ref,speed(1),speed(2));
    mov_err = kGetStatus(ref);
    backup(ref,[mov_err(3) mov_err(6)],speed);   % reverse if the wheels are blocked
    lights = kAmbient(ref);
end;
kSetSpeed(ref,0,0);
% calculate stats
fprintf('%1.0f seconds simulated in %1.0f seconds (%3.0f%% of real time)\n',kiks_ktime(port),etime(clock,t0),(kiks_ktime(port)/etime(clock,t0))*100)
fprintf('%1.1f loops / second\n',loops/kiks_ktime(port))
% close port
kiks_kclose(ref);

function out = calcspd(weightsL, weightsR, reflex)
% combine a constant bias with a weighted sum of the 8 proximity readings
mL = weightsL(1);
mR = weightsR(1);
for i = 2:9
    mL = weightsL(i)*(1/1023)*reflex(i-1) + mL;
    mR = weightsR(i)*(1/1023)*reflex(i-1) + mR;
end
out = [round(mL) round(mR)];

function backup(ref,mov_err,spd)
% back up a short distance when both speed errors exceed 80% of the commanded speeds
if mov_err > spd*0.8
    kSetEncoders(ref,0,0);
    kMoveTo(ref,-100,-500);
    % kiks_pause(1);
    status = kGetStatus(ref);
    while (status(1)+status(2) < 2)   % wait until both wheels have reached the target
        status = kGetStatus(ref);
    end;
end;
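Finally, for the circumvention task described at the start of this session, the odometry update can be folded into any of the control loops above so that the robot knows when it is "behind" the obstacle. The fragment below is a sketch of the idea only; it assumes the Odometry() function sketched earlier (saved as Odometry.m) and that kGetEncoders(ref), used in commented-out form in Example 8.1, returns the two tick counters as a [left right] vector, which is an assumption about the toolbox.

% sketch: keep track of the pose while driving (simulated robot)
ref = kiks_kopen([-1,9600,100]);
kSetEncoders(ref,0,0);                 % start the tick counters at zero
Pose = [0, 0, 0];                      % [x in mm, y in mm, theta in rad]
lastTicks = [0 0];
kSetSpeed(ref,5,5);                    % example: just drive forward
while (kiks_ktime(-1) < 10)            % example: log the pose for 10 seconds
    ticks = kGetEncoders(ref);         % ASSUMED to return the [left right] counters
    d = ticks - lastTicks;             % encoder variations dL, dR since the last loop
    lastTicks = ticks;
    [dx, dy, dtheta] = Odometry(d(1), d(2), Pose(3));
    Pose = Pose + [dx, dy, dtheta];    % use Pose to decide, for example, whether the
                                       % robot is already "behind" the obstacle
end
kSetSpeed(ref,0,0);
kiks_kclose(ref);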