
Adaptive Dynamic Programming

Computer Engineering Department
Faculty of Engineering
Chapter 21
Adaptive Dynamic Programming
Presented by
Amr Salah El.din Hassan
6/25/2020
ADP
Agent: Hi! I am an Adaptive Dynamic Programming agent.

Agent: I have a problem, and I want to think with you about a solution to this problem.
Problem
Agent: I am in an environment like the one shown. My initial position is the state marked "start".

Agent: I want to find the best way to the state which contains +1, and I do not want to enter the state which contains -1.

[Figure: grid-world environment with a start state and two terminal states containing the rewards +1 and -1]
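The slide's figure is not reproduced here, so the following minimal sketch of the problem setup uses an assumed grid layout; the coordinates, the step reward, and the function name `R` are illustrative, not from the slide:

```python
# Sketch of the environment on the slide (exact grid layout not given;
# the coordinates below are assumptions for illustration only).

START = (0, 0)               # "my initial position is start"
TERMINALS = {(2, 3): +1.0,   # the state which contains +1
             (1, 3): -1.0}   # the state which contains -1
STEP_REWARD = -0.04          # assumed small reward for every other state

def R(state):
    """Reward R(s) observed when the agent enters `state`."""
    return TERMINALS.get(state, STEP_REWARD)
```

The agent wants to reach the +1 terminal and avoid the -1 terminal; everything else about the environment is unknown to it, as the next slide explains.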
Facts
Agent: In each state there is a reward R(s), but unfortunately I do not know the value of the reward in any state until I enter that state.

Agent: I have a certain fixed policy π, which means that in a certain state s I will perform a certain action π(s) with high probability.

Agent: I do not know the result of performing an action in a certain state until I perform this action and see the result (which state I will be in), i.e. I do not know the transition model T(s, π(s), s').

Agent: Could you help me to solve this problem?

Presenter: Let us now think together.
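The two facts above (rewards and transitions stay hidden until they are experienced) can be mirrored by an environment object that only reveals the outcome state and its reward when the agent actually acts; this is a sketch under assumed names, not code from the slides:

```python
import random

class HiddenEnvironment:
    """Environment whose reward and transition model the agent cannot
    inspect; it learns them only by acting (the two 'facts' above)."""

    def __init__(self, rewards, transitions, start):
        self._rewards = rewards          # dict: state -> R(s), hidden from agent
        self._transitions = transitions  # dict: (s, a) -> {s': prob}, hidden
        self.state = start

    def step(self, action):
        """Perform `action`; only then does the agent see which state it
        ends up in and that state's reward. T(s, a, s') itself stays hidden."""
        dist = self._transitions[(self.state, action)]
        successors = list(dist)
        weights = [dist[s] for s in successors]
        self.state = random.choices(successors, weights=weights)[0]
        return self.state, self._rewards[self.state]

# Usage: the agent can only call step() and observe the result.
env = HiddenEnvironment(rewards={'s0': 0.0, 's1': 1.0},
                        transitions={('s0', 'go'): {'s1': 1.0}},
                        start='s0')
state, reward = env.step('go')   # outcome revealed only after acting
```

This interface is what forces the agent to *learn* R(s) and T(s, π(s), s') from trials, which is exactly the solution developed next.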
Solution
Agent: Oh, I have just remembered something which may be our solution.

Presenter: What is your idea?

Agent: The utility function U^π that we had discussed in Chapter 17. I will calculate the utility value for each state and store them.

Presenter: OK. As you see here, it is a linear equation, so for n states we will have n equations in n unknowns:

    U^π(s) = R(s) + γ Σ_s' T(s, π(s), s') U^π(s')

Agent: But wait a second: I do not know the values of R(s) and T(s, π(s), s')... I am disappointed.

Agent: I will make some trials in the environment. In each trial I will decide to which state I will go; my decision will depend on which of my neighbors has the greatest utility value, and I will stop when I reach the state that has the +1 or -1 reward.

Agent: In each trial I will store the sequence of the trial and the value of the reward in each state.

Agent: So I will be able to know the reward value R(s) of each state, and I should be able to estimate the transition probability T(s, π(s), s') from my trials, by storing each sequence of states I reach.

Agent: Great, I am so happy! Now I can calculate the utility function for my environment. I have found the solution for my problem. Thank you for thinking with me!
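The whole procedure on this slide can be sketched end to end: run trials, store each sequence with its rewards, estimate R(s) and T(s, π(s), s') from the recorded counts, then solve the resulting n equations in n unknowns for U^π. A toy two-state chain stands in for the grid world, all names and numbers are illustrative, and for simplicity the trials follow the fixed policy's transition model (rather than the greatest-neighbor-utility rule mentioned above); the linear system is solved by fixed-point sweeps, which converge for γ < 1:

```python
import random
from collections import defaultdict

# Toy stand-in for the grid world (state names, rewards, and probabilities
# are assumptions): chain A -> B -> terminal T under the fixed policy pi.
TRUE_T = {'A': {'B': 0.8, 'A': 0.2},     # true T(s, pi(s), s'), hidden
          'B': {'T': 1.0}}
TRUE_R = {'A': 0.0, 'B': 0.0, 'T': 1.0}  # true R(s), revealed only on entry

def run_trial(start='A'):
    """One trial: the stored sequence of (state, reward) pairs, as the
    slide says to record; the trial stops at a terminal state."""
    s, seq = start, []
    while True:
        seq.append((s, TRUE_R[s]))       # reward revealed on entering s
        if s not in TRUE_T:              # terminal (+1 / -1 style) state
            return seq
        succ = list(TRUE_T[s])
        s = random.choices(succ, weights=[TRUE_T[s][x] for x in succ])[0]

def passive_adp(n_trials=2000, gamma=0.9, sweeps=500):
    """Estimate R(s) and T(s, pi(s), s') from trial counts, then solve
    the n linear equations U(s) = R(s) + gamma * sum_s' T(s,pi(s),s') U(s')
    by repeated sweeps (a simple way to solve the linear system)."""
    R_hat = {}
    counts = defaultdict(lambda: defaultdict(int))
    for _ in range(n_trials):
        seq = run_trial()
        for s, r in seq:
            R_hat[s] = r                             # learned reward table
        for (s, _), (s2, _) in zip(seq, seq[1:]):
            counts[s][s2] += 1                       # transition counts
    T_hat = {s: {s2: c / sum(cs.values()) for s2, c in cs.items()}
             for s, cs in counts.items()}            # estimated T(s,pi(s),s')
    U = {s: 0.0 for s in R_hat}
    for _ in range(sweeps):                          # solve for U^pi
        U = {s: R_hat[s] + gamma * sum(p * U[s2]
                                       for s2, p in T_hat.get(s, {}).items())
             for s in R_hat}
    return U
```

With a fixed seed, `passive_adp()` recovers utilities close to the exact solution of the true model (U(T) = 1.0, U(B) = 0.9, U(A) ≈ 0.79); the estimate of U(A) improves as more trials refine T_hat.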