Computer Engineering Department, Faculty of Engineering
Chapter 21: Adaptive Dynamic Programming
Presented by Amr Salah El.din Hassan, 6/25/2020

ADP

Me: I have a problem and I want to think with you about a solution to this problem.
ADP Agent: Hi, I am an Adaptive Dynamic Programming agent.

Problem

Me: I am in an environment like the one shown. My initial position is the start state. I want to find the best way to the state which contains +1, and I do not want to enter the state which contains -1.

[Figure: the grid environment, showing the start state and the terminal states containing the +1 and -1 rewards.]

Facts

Me: In each state there is a reward R(s), but unfortunately I do not know the value of the reward in any state until I enter that state. I have a certain fixed policy π, which means that when I am in a certain state I will perform the action π(s) with high probability. I also do not know the result of performing an action in a certain state until I perform this action and see the result (which state I will end up in), i.e. I do not know T(s,π(s),s'). Let us now think together: could you help me to solve this problem?

Solution

ADP Agent: Oh, I have just remembered something which may be our solution: the utility function U^π that we discussed in Chapter 17. I will calculate the utility value of each state and store them in a table.
Me: What an idea!
ADP Agent: As you see here, it is a linear equation, so for n states I will make n equations in n unknowns, and I will be able to calculate the utility of each state.
Me: Great. But wait a second, I do not know the values of R(s) and T(s,π(s),s'). What should I do now? I am disappointed.
ADP Agent: OK, I will be able to estimate the transition probabilities T(s,π(s),s') by making some trials in the environment. In each trial I will store the sequence of states of the trial and the value of the reward in each state, and I will stop a trial when I reach the state that has the +1 or the -1 reward. By storing these sequences from my trials I will also get to know the reward R(s) of each state.
Me: Now I can calculate the utility function for my environment. When I am in a certain state I will decide to which state I will go, and my decision will depend on which of my neighbors has the greatest utility value.
Me: Now I have found the solution to my problem. Thank you for thinking with me. I am so happy that we reached the solution.
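For concreteness, the linear equation the agent refers to is the fixed-policy utility (Bellman) equation from Chapter 17. In the deck's own notation, and with a discount factor γ that the slides do not state explicitly, it reads:

U^π(s) = R(s) + γ · Σ_s' T(s, π(s), s') · U^π(s')

Because U^π appears linearly on both sides, writing this equation once per state gives n linear equations in the n unknown utilities, which can be solved exactly as soon as R(s) and T(s,π(s),s') have been estimated from the trials.

The sketch below is one possible Python implementation of the passive ADP agent described in the dialogue, not the presenter's own code. The class and function names, the environment interface (env.reset() / env.step(action)), and the discount factor GAMMA = 0.95 are assumptions made for illustration. The agent records R(s) on first visit, estimates T(s,π(s),s') as maximum-likelihood frequencies over the stored trial sequences, and solves the resulting linear system with NumPy.

```python
import numpy as np
from collections import defaultdict

GAMMA = 0.95  # assumed discount factor; the slides do not specify one


class PassiveADPAgent:
    """Passive ADP sketch: follow the fixed policy pi, learn R(s) on first
    visit and T(s, pi(s), s') from transition counts, then solve the n
    linear utility equations exactly."""

    def __init__(self, states, policy, terminals):
        self.states = list(states)        # all states of the environment
        self.policy = policy              # fixed policy: state -> action
        self.terminals = set(terminals)   # states carrying the +1 / -1 rewards
        self.R = {}                       # learned rewards R(s)
        self.counts = defaultdict(lambda: defaultdict(int))  # counts[s][s']
        self.U = {}                       # computed utilities U^pi(s)

    def record_reward(self, s, r):
        """Store R(s) the first time state s is entered."""
        self.R.setdefault(s, r)

    def record_transition(self, s, s_next):
        """Count one observed transition s -> s' under action pi(s)."""
        self.counts[s][s_next] += 1

    def transition_prob(self, s, s_next):
        """Maximum-likelihood estimate of T(s, pi(s), s')."""
        total = sum(self.counts[s].values())
        return self.counts[s][s_next] / total if total else 0.0

    def solve_utilities(self):
        """Solve U^pi = R + GAMMA * T_pi U^pi as one linear equation per
        visited state (n equations in n unknowns)."""
        visited = [s for s in self.states if s in self.R]
        index = {s: i for i, s in enumerate(visited)}
        A = np.eye(len(visited))
        b = np.zeros(len(visited))
        for s in visited:
            i = index[s]
            b[i] = self.R[s]
            if s in self.terminals:
                continue  # a terminal state's utility is just its reward
            for s_next in list(self.counts[s]):
                if s_next in index:
                    A[i, index[s_next]] -= GAMMA * self.transition_prob(s, s_next)
        u = np.linalg.solve(A, b)
        self.U = {s: u[index[s]] for s in visited}
        return self.U


# Hypothetical trial loop; `env.reset()` / `env.step(action)` returning
# (next_state, reward) are assumed names, not part of the slides:
#
#   agent = PassiveADPAgent(states, policy, terminals={plus_one, minus_one})
#   for _ in range(num_trials):
#       s, r = env.reset()
#       agent.record_reward(s, r)
#       while s not in agent.terminals:
#           s_next, r = env.step(agent.policy[s])
#           agent.record_transition(s, s_next)
#           agent.record_reward(s_next, r)
#           s = s_next
#   utilities = agent.solve_utilities()
```

Once the utilities are available, the final step of the dialogue, moving toward the neighbor with the greatest utility, is simply a greedy choice over these computed U^π values.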