Goal Finding Robot using Fuzzy Logic and Approximate Q-Learning
CAP 5636 PROJECT
PRESENTATION BY S SANTHOSH CHAITANYA

Background
Significant usage in indoor environments:
◦ Hospitals
◦ Shopping stores
◦ Hotels
Robot navigation in a semi-structured environment has three main problems:
◦ Cognition by sensors
◦ Navigation
◦ Obstacle avoidance
Fuzzy logic algorithms can solve complex navigational problems with reasonable accuracy.
Q-learning helps the robot understand the environment, so the navigation strategy learned during the training phase can be adapted to a new environment.

Program Design using Fuzzy Logic
Robot uses:
◦ Goal location
◦ Direction
φ = arctan((Z_g − Z_r) / (X_g − X_r))
θ = robot's odometry angle
β, the orientation angle of the goal relative to the robot, is computed from φ and θ.

Continued…
[Figure: membership function of ω (angular velocity), with breakpoints at −100·β, −25·β, 0·β, 25·β, 100·β]
Z: zero, SN: small negative, SP: small positive, BP: big positive

Continued…
  //initialize odometry readings of the robot
  while (d_goal > distance_threshold) {
    get_sensor_data();
    compute_distance_to_goal();
    compute_orientation_angle();        // beta
    angular_speed1 = compute_membership();
    angular_speed2 = oam();             // obstacle avoidance module
    if (obstacle_exists) {
      // guided by the OAM module
    } else {
      // guided by the membership function
    }
  }

Obstacle avoidance module
  void oam(void) {
    double oam_delta = 0;
    oam_speed[LEFT] = oam_speed[RIGHT] = 0;
    if (obstacle[PS_1] || obstacle[PS_2]) {          // turn left
      oam_delta += (int)(k_2 * ps_value[PS_LEFT_90]);
      oam_delta += (int)(k_1 * ps_value[PS_LEFT_45]);
    } else if (obstacle[PS_5] || obstacle[PS_6]) {   // turn right
      oam_delta -= (int)(k_2 * ps_value[PS_RIGHT_90]);
      oam_delta -= (int)(k_1 * ps_value[PS_RIGHT_45]);
    }
    oam_speed[LEFT]  -= oam_delta;
    oam_speed[RIGHT] += oam_delta;
  }

Program Design for Approximate Q-learning
Actions:
◦ Forward
◦ Forward Left
◦ Forward Right
◦ Circular Left
◦ Circular Right
For each state, the feature vector combines the conditions of these three values:
◦ Distance to obstacle: val = 1.0 if an obstacle is present, else val = 0.0
◦ Distance to goal: val = d_goal / 10
◦ Difference angle between the goal and the bot (d_angle), binned into indicator features:
  ◦ −0.1 < d_angle < 0.1 → forward = 1
  ◦ d_angle > 0.1 → 1 (goal off to one side)
  ◦ d_angle < −0.1 → 1 (goal off to the other side)

Main algorithm flow
  for i in range(1, no_of_training_episodes):
    until the goal state is reached or a wall is hit:
      for the current state, compute Q(s, a) = (dot product of weights and feature vector) for each action
      take the action with the maximum Q value
      update the reward using the reward function for the chosen action
      update the weights:
        w_i = w_i + α · difference · f_i(s, a)
        difference = (r + γ · max_a′ Q(s′, a′)) − Q(s, a)
epsilon = 0.05
γ (gamma) = 0.8
α (alpha) = 0.2

Rewards
  // if there is no obstacle, the reward is calculated as below
  if (action == FORWARD)
    reward += 0.3;
  else if (action == LEFT || action == RIGHT || action == FORWARDLEFT || action == FORWARDRIGHT)
    reward += 0.07;
  // if there is an obstacle, the reward is calculated this way
  if (obstacle is forward) {
    reward += -0.15;
  } else if (obstacle == LEFT || obstacle == RIGHT || obstacle == FORWARDLEFT || obstacle == FORWARDRIGHT) {
    reward += -0.05;
  }
  // reward for orientation toward the goal direction:
  // the closer the heading is to the goal, the higher the reward, and vice versa
  reward += 1 / (fabsf(d_angle) * 10);
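Continued…
A minimal sketch of how the Q computation and weight update above could look in C (an illustration under assumed names, not the exact controller code: NUM_FEATURES, NUM_ACTIONS, ALPHA, GAMMA, weights, q_value, best_action and update_weights are all hypothetical):

  /* Linear value function: Q(s,a) = sum_i w_i * f_i(s,a) over the feature vector */
  #define NUM_FEATURES 3
  #define NUM_ACTIONS  5
  #define ALPHA 0.2f   /* learning rate alpha */
  #define GAMMA 0.8f   /* discount factor gamma */

  static float weights[NUM_FEATURES];

  static float q_value(const float f[NUM_FEATURES]) {
    float q = 0.0f;
    for (int i = 0; i < NUM_FEATURES; ++i)
      q += weights[i] * f[i];
    return q;
  }

  /* Greedy action: evaluate Q(s,a) for every action's feature vector, keep the best */
  static int best_action(float f_all[NUM_ACTIONS][NUM_FEATURES], float *best_q) {
    int best = 0;
    *best_q = q_value(f_all[0]);
    for (int a = 1; a < NUM_ACTIONS; ++a) {
      float q = q_value(f_all[a]);
      if (q > *best_q) { *best_q = q; best = a; }
    }
    return best;
  }

  /* One learning step:
     difference = (r + gamma * max_a' Q(s',a')) - Q(s,a)
     w_i <- w_i + alpha * difference * f_i(s,a) */
  static void update_weights(const float f_sa[NUM_FEATURES], float reward, float max_q_next) {
    float difference = (reward + GAMMA * max_q_next) - q_value(f_sa);
    for (int i = 0; i < NUM_FEATURES; ++i)
      weights[i] += ALPHA * difference * f_sa[i];
  }

In each simulation step the controller would build the feature vector for every action, pick the action via best_action() (presumably exploring with probability epsilon), compute the reward as on the previous slide, and then call update_weights() with the chosen action's features and the best Q value of the next state.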
Implementation
Both algorithms are implemented on the e-puck robot in the Webots simulator.
The e-puck has 8 IR sensors, of which the 6 forward-facing sensors are used for sensing obstacles.
During the learning phase:
◦ Supervisor controller: sends the weight values to the robot controller
◦ Robot controller: saves the weight values in a text file before the simulation is restarted

Continued…
Assumptions:
◦ The map contains only the bot and obstacles
◦ Only static obstacles are considered for the simulation

Analysis
Fuzzy logic appears to provide a solution in every episode and across multiple scenarios.
Q-learning requires a lot of tuning of the features to learn the environment (the algorithm took around 1.0 hours to learn the simulation arena). Number of episodes used for training: 1500.

Future Goals
Implement the Q-learning algorithm in more complex scenarios, such as moving obstacles in the environment, by fine-tuning the feature vectors.

References
[1] C. Strauss and F. Sahin, "Autonomous navigation based on a Q-learning algorithm for a robot," 2008.
[2] Webots guide, http://www.cyberbotics.com/guide.pdf
[3] Mohannad Abid Shehab Ahmed, "Optimum Short Path Finder for Robot Using Learning," vol. 05, no. 01, pp. 13-24, June 2012.
[4] Reinforcement Learning on the Lego Mindstorms NXT Robot: Analysis and Implementation, http://babel.isa.uma.es/angelmt/Reinforcement_Learning/AMT_MASTER_THESIS.pdf

Questions?