Goal Finding Robot using Fuzzy Logic and
Approximate Q-Learning
CAP 5636 PROJECT
PRESENTATION BY S SANTHOSH CHAITANYA
Background
Significant usage in indoor environments:
◦ Hospitals
◦ Shopping stores
◦ Hotels
Robot navigation in a semi-structured environment has three main problems:
◦ Cognition through sensors
◦ Navigation
◦ Obstacle avoidance
Fuzzy logic algorithms can solve complex navigational problems with reasonable accuracy.
Q-learning helps the robot understand the environment, so the navigational strategy learned during the training phase can be adapted to new environments.
Program Design using Fuzzy Logic
Robot uses
◦ Goal location
◦ Direction
φ = arctan((Zg − Zr) / (Xg − Xr))
θ = robot’s odometer angle
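A minimal sketch of this computation in C, assuming the orientation error β used by the controller on the following slides is the difference φ − θ, and using atan2 so the Xg = Xr case is handled; the function and variable names are illustrative, not from the project code:

#include <math.h>

/* ASSUMPTION: beta = phi - theta is the error fed to the fuzzy controller. */
double compute_orientation_error(double Xr, double Zr, double theta,
                                 double Xg, double Zg) {
    double phi = atan2(Zg - Zr, Xg - Xr);     /* angle from the robot to the goal     */
    double beta = phi - theta;                /* error relative to the robot heading  */
    while (beta > M_PI)   beta -= 2.0 * M_PI; /* wrap into (-pi, pi]                  */
    while (beta <= -M_PI) beta += 2.0 * M_PI;
    return beta;
}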
Continued…
[Figure: membership function of ω (angular velocity), with levels −100·β, −25·β, 0·β, 25·β, 100·β]
Z: Zero, SN: Small negative, SP: Small positive, BP: Big positive
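The exact membership shapes are not given in the slides, so the following is only a rough sketch of what compute_membership() (called in the loop on the next slide) could look like, assuming the gains shown in the figure (0, ±25, ±100 times β) and hypothetical thresholds for the zero / small / big sets:

#include <math.h>

/* ASSUMPTION: piecewise gains taken from the figure; thresholds are made up. */
double compute_membership(double beta) {
    double mag = fabs(beta);
    double gain;
    if (mag < 0.05)
        gain = 0.0;      /* Z : zero, keep heading              */
    else if (mag < 0.3)
        gain = 25.0;     /* SN / SP : small correction          */
    else
        gain = 100.0;    /* BP (and its negative counterpart)   */
    return gain * beta;  /* angular velocity omega              */
}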
Continued…
// initialize odometry readings of the robot
while (d_goal > distance_threshold)
{
    get_sensor_data();
    compute_distance_to_goal();
    compute_orientation_angle();            // beta
    angular_speed1 = compute_membership();  // fuzzy membership output
    angular_speed2 = oam();                 // obstacle avoidance module
    if (obstacle_exists)
    {
        // robot is guided by the oam module (angular_speed2)
    }
    else
    {
        // robot is guided by the membership function (angular_speed1)
    }
}
Obstacle avoidance module
void oam(void) {
    double oam_delta = 0;
    oam_speed[LEFT] = oam_speed[RIGHT] = 0;

    if (obstacle[PS_1] || obstacle[PS_2]) {
        // obstacle on the right-hand sensors: turn left,
        // proportionally to the right proximity readings
        oam_delta += (int) (k_2 * ps_value[PS_RIGHT_90]);
        oam_delta += (int) (k_1 * ps_value[PS_RIGHT_45]);
    } else if (obstacle[PS_5] || obstacle[PS_6]) {
        // obstacle on the left-hand sensors: turn right,
        // proportionally to the left proximity readings
        oam_delta -= (int) (k_2 * ps_value[PS_LEFT_90]);
        oam_delta -= (int) (k_1 * ps_value[PS_LEFT_45]);
    }

    // a positive delta slows the left wheel and speeds up the right one (left turn)
    oam_speed[LEFT] -= oam_delta;
    oam_speed[RIGHT] += oam_delta;
}
Program Design for Approximate Q-learning
Actions:
◦ Forward
◦ Forward Left
◦ Forward Right
◦ Circular Left
◦ Circular Right
For each state, the feature vector is built from conditions on the following 3 values (see the sketch after this list):
◦ Distance to obstacle:
    if an obstacle is present, val = 1.0
    else val = 0.0
◦ Distance to goal: val = (distance to goal) / 10
◦ Angle difference between goal and bot (d_angle):
    -0.1 < d_angle < 0.1 → forward = 1
    d_angle > 0.1 → forward = 1
    d_angle < -0.1
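A minimal sketch of such a feature extractor in C. The slides leave the exact encoding open, so the feature count, names, and the angle case shown here are illustrative assumptions:

#define NUM_FEATURES 3

/* ASSUMPTION: names, feature count and angle handling are illustrative only. */
void compute_features(double features[NUM_FEATURES],
                      int obstacle_present,     /* from the IR sensors          */
                      double distance_to_goal,  /* from odometry                */
                      double d_angle) {         /* goal angle minus bot heading */
    features[0] = obstacle_present ? 1.0 : 0.0;                   /* obstacle flag        */
    features[1] = distance_to_goal / 10.0;                        /* scaled goal distance */
    features[2] = (d_angle > -0.1 && d_angle < 0.1) ? 1.0 : 0.0;  /* roughly facing goal  */
}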
Main algorithm flow
for i in range(1, no_of_training_episodes):
    until the goal state is reached or a wall is hit:
        for each action a in the current state s:
            compute Q(s, a) = dot product of the weight vector and the feature vector
        pick the action with the maximum Q value
        take that action
        update the reward using the reward function for the current action
        update the weights:
            w_i = w_i + α · difference · f_i(s, a)
            difference = (r + γ · max_a' Q(s', a')) − Q(s, a)
Hyperparameters: ε (epsilon) = 0.05, γ (gamma) = 0.8, α (alpha) = 0.2
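The update above can be written out directly. Below is a small sketch assuming Q(s, a) is the dot product of weights and features and using the α and γ values from this slide; function and array names are illustrative:

#define NUM_FEATURES 3
#define ALPHA 0.2
#define GAMMA 0.8

/* Q(s,a) = sum_i w_i * f_i(s,a) */
double q_value(const double w[NUM_FEATURES], const double f[NUM_FEATURES]) {
    double q = 0.0;
    for (int i = 0; i < NUM_FEATURES; i++)
        q += w[i] * f[i];
    return q;
}

void update_weights(double w[NUM_FEATURES],
                    const double f[NUM_FEATURES],  /* f_i(s,a) of the taken action */
                    double reward,                 /* r from the reward function   */
                    double q_sa,                   /* Q(s,a) before the update     */
                    double max_q_next) {           /* max_a' Q(s',a')              */
    double difference = (reward + GAMMA * max_q_next) - q_sa;
    for (int i = 0; i < NUM_FEATURES; i++)
        w[i] += ALPHA * difference * f[i];         /* w_i = w_i + alpha*difference*f_i */
}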
Rewards
// if there is no obstacle, the reward is calculated as below
if (action == FORWARD)
    reward += 0.3;
else if (action == LEFT || action == RIGHT || action == FORWARDLEFT || action == FORWARDRIGHT)
    reward += 0.07;

// if there is an obstacle, the reward is calculated in this way
if (obstacle == FORWARD)
    reward += -0.15;
else if (obstacle == LEFT || obstacle == RIGHT || obstacle == FORWARDLEFT || obstacle == FORWARDRIGHT)
    reward += -0.05;

// reward for orientation toward the goal direction:
// the more the robot faces the goal, the higher the reward
reward += 1 / (fabsf(d_angle) * 10);
Implementation
Both algorithms are implemented on the e-puck robot in the Webots simulator.
The e-puck has 8 IR sensors, of which the 6 forward-facing sensors are used for sensing obstacles.
During the learning phase:
◦ Supervisor controller: sends the weight values to the robot controller
◦ Robot controller: sends the weight values to the supervisor controller and saves them in a text file before the simulation is restarted
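As one possible shape for this exchange, below is a sketch of the robot-controller side using the Webots C receiver API. The device name, time step, number of weights, and message format are assumptions; the slides do not specify them.

#include <webots/robot.h>
#include <webots/receiver.h>

#define TIME_STEP 64    /* assumed control step in ms        */
#define NUM_WEIGHTS 3   /* assumed size of the weight vector */

int main(void) {
    double weights[NUM_WEIGHTS] = {0.0};
    wb_robot_init();
    WbDeviceTag receiver = wb_robot_get_device("receiver");  /* assumed device name */
    wb_receiver_enable(receiver, TIME_STEP);

    while (wb_robot_step(TIME_STEP) != -1) {
        // copy the latest weight vector sent by the supervisor, if any
        while (wb_receiver_get_queue_length(receiver) > 0) {
            const double *data = (const double *) wb_receiver_get_data(receiver);
            for (int i = 0; i < NUM_WEIGHTS; i++)
                weights[i] = data[i];
            wb_receiver_next_packet(receiver);
        }
        /* ... compute features, pick the max-Q action, act, update weights ... */
    }
    wb_robot_cleanup();
    return 0;
}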
Continued…
Assumptions:
◦ The map contains only the bot and obstacles
◦ Only static obstacles are considered for the simulation
Analysis
Fuzzy logic seems to provide a solution in any episode and across multiple scenarios.
Q-learning requires a lot of feature tuning to learn the environment (the algorithm took around 1 hour to learn the simulation arena).
The number of episodes used for training was 1500.
Future Goals
Implement the Q-learning algorithm in more complex scenarios, such as environments with moving obstacles, by fine-tuning the feature vectors.
Questions?