Adaptive Routing: Reinforcement Learning Approaches

Contents
- Routing Protocols
- Reinforcement Learning
- Q-Routing
- PQ-Routing
- Ant Routing
- Summary

Routing Classification
Centralized
- A main controller updates all nodes' routing tables.
- Not fault tolerant (the controller is a single point of failure).
- Suitable for small networks.
Distributed
- Route computation is shared among nodes by exchanging routing information.
- Widely used.
Routing Classification…
Static
- Routing based only on source and destination.
- Current network state is ignored.
Adaptive
- Adapts the routing policy to time and traffic.
- More attractive, but can cause oscillations in paths.
Routing Classification Based on Optimization
- Minimal: shortest-path routing (distance vector (DV), link state (LS))
- Non-minimal: optimal routing
Shortfalls of Static Routing

Dynamic networks are subject to the following changes:
- Topology changes, as nodes are added and removed
- Traffic patterns change cyclically
- Overall network load changes
So, routing algorithms that assume that the network is static don't work in this setting.
Tackling Dynamic Networks
Periodic updates?
- How much routing traffic do they add?
- When to update?

Is adaptive routing the answer?
Reinforcement Learning
- An agent playing against an opponent: chess and tic-tac-toe
- Learning a value function
Learning a Value Function
- Temporal difference update:
  V(e) = V(e) + K [ V(g) – V(e) ]
- Example: with K = 0.4, V(e) = 0.5 and V(g) = 1, we have
  V(e) = 0.5 + 0.4 (1 – 0.5) = 0.5 + 0.2 = 0.7
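To make the update concrete, here is a minimal Python sketch of the temporal-difference rule above. The dictionary-based value table and the function name td_update are illustrative assumptions; K = 0.4 and the values 0.5 and 1 come from the example.

```python
# Minimal sketch of the temporal-difference value update shown above.
def td_update(V, state, next_state, K=0.4):
    """Move V[state] toward V[next_state] by a fraction K of the difference."""
    V[state] = V[state] + K * (V[next_state] - V[state])
    return V[state]

V = {"e": 0.5, "g": 1.0}          # values of the two states from the example
print(td_update(V, "e", "g"))     # 0.5 + 0.4 * (1 - 0.5) = 0.7
```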
Exploration vs. Exploitation
- e and e*
Reinforcement Learning in Networks
[Figure: a source S forwards packets through intermediate routers R toward a destination D]
V(s) = V(s) + K [ V(s’) – V(s) ]
Q-Routing

- Qx(d, y) is the time that node x estimates it will take to deliver a packet to node d through its neighbor y.

- When y receives the packet, it sends back a message (to node x) containing its (i.e. y's) best estimate of the time remaining to get the packet to d, i.e.
  t = min( Qy(d, z) ) over all neighbors z of y

- x then updates Qx(d, y) by:
  [Qx(d, y)]NEW = [Qx(d, y)]OLD + K · ( s + q + t – [Qx(d, y)]OLD )
  where the term in parentheses is ΔQ, and
  • s = RTT from x to y
  • q = time the packet spent in the queue at x
  • t = the new estimate reported by y
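As a rough illustration of this update (not a definitive implementation), the sketch below keeps per-node Q tables in plain dictionaries. The node names, link times and K = 0.25 mirror the worked example on the next slide; the data layout and helper name are assumptions.

```python
# Sketch of the Q-Routing update, assuming a dict-based Q table Q[node][(dest, neighbor)].
def q_routing_update(Q, x, d, y, s, q, K=0.25):
    """Update node x's estimate Qx(d, y) after forwarding a packet to neighbor y.

    s: round-trip time from x to y; q: time the packet spent queued at x.
    """
    # y reports its best remaining-time estimate t = min over its neighbors z of Qy(d, z)
    t = min(est for (dest, z), est in Q[y].items() if dest == d)
    delta_q = s + q + t - Q[x][(d, y)]
    Q[x][(d, y)] += K * delta_q
    return Q[x][(d, y)]

# Numbers matching the worked example on the next slide (queueing time taken as 0):
Q = {
    "x": {("d", "y"): 20.0, ("d", "w"): 21.0},
    "y": {("d", "z1"): 25.0, ("d", "z2"): 17.0, ("d", "ze"): 70.0},
}
print(q_routing_update(Q, "x", "d", "y", s=11, q=0))   # 20 + 0.25*((11+0+17) - 20) = 22.0
```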
Q-Routing…
[Figure: node x forwards a message for destination d to its neighbor y; w is x's other next hop]

Routing table at x before the update:
  Dest  Next  Qx(d, ·)
  …
  d     y     20
  d     w     21
  …

y's own estimates are Qy(d, z1) = 25, Qy(d, z2) = 17, Qy(d, ze) = 70, so it reports back
  t = min( Qy(d, zi) ) = 17
and the measured round-trip time is s = 11.

With K = 0.25 (and ignoring the queueing time q), x updates:
  Qx(d, y) += (0.25) · [ (11 + 17) – 20 ] = 2,  so Qx(d, y) becomes 22.
Results
Shortfalls

- A shortest-path algorithm performs better than Q-Routing under low load.

- Q-Routing fails to converge back to the shortest paths when the network load decreases again.
- It fails to explore new shortcuts.
Shortfalls…
[Figure: the same network; x's routing table now reads Qx(d, y) = 22 and Qx(d, w) = 21]

After the update, the route via w looks cheaper (21 < 22), so x sends all traffic for d through w. Even if the delay of the route via y drops later (y's estimates Qy(d, z1) = 25, Qy(d, z2) = 17, Qy(d, ze) = 70 may improve), x no longer forwards packets through y and therefore never refreshes Qx(d, y). The route via y never gets used until the route via w gets congested.
Predictive Q-Routing

Alongside Qx(d, y), each node keeps a best-ever estimate Bx(d, y) and a recovery rate Rx(d, y), updated as follows:

  ΔQ = s + q + t – [Qx(d, y)]OLD
  [Qx(d, y)]NEW = [Qx(d, y)]OLD + K · ΔQ
  Bx(d, y) = MIN[ Bx(d, y), Qx(d, y) ]

  If (ΔQ < 0)                              // path is improving
      ΔR = ΔQ / (currentTime – lastUpdatedTime)
      Rx(d, y) = Rx(d, y) + β · ΔR         // decrease in R
  Else
      Rx(d, y) = γ · Rx(d, y)              // increase of R (decays back toward 0)
  End If

  lastUpdatedTime = currentTime


PQ-Routing Policy…
Finding the next-hop neighbour y:
For each neighbour y of x
  Δt = currentTime – lastUpdatedTime
  Qx-pred(d, y) = Qx(d, y) + Δt · Rx(d, y)
Choose the y with MIN[ Qx-pred(d, y) ]
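The following Python sketch combines the PQ-Routing update and the policy above. The dataclass layout, the monotonic clock, and the constants K, beta, gamma are illustrative assumptions; note that the original Choi & Yeung paper additionally clips the prediction below by Bx(d, y), which is only flagged in a comment here.

```python
import time
from dataclasses import dataclass, field

@dataclass
class PQEntry:
    q: float                  # Qx(d, y): current delivery-time estimate via y
    b: float = None           # Bx(d, y): best (lowest) Q ever observed
    r: float = 0.0            # Rx(d, y): recovery rate (negative while improving)
    last_update: float = field(default_factory=time.monotonic)

    def update(self, s, q_delay, t, K=0.7, beta=0.7, gamma=0.9):
        now = time.monotonic()
        dq = (s + q_delay + t) - self.q
        self.q += K * dq
        self.b = self.q if self.b is None else min(self.b, self.q)
        if dq < 0:                                   # path is improving
            dr = dq / max(now - self.last_update, 1e-9)
            self.r += beta * dr                      # R decreases
        else:
            self.r *= gamma                          # R increases back toward 0
        self.last_update = now

    def predicted_q(self):
        dt = time.monotonic() - self.last_update
        return self.q + dt * self.r                  # (the paper uses max(..., self.b))

def choose_next_hop(entries):
    """entries: dict neighbor -> PQEntry for one destination d."""
    return min(entries, key=lambda y: entries[y].predicted_q())
```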

PQ-Routing Results
- Performs better than Q-Routing under low, high, and varying network loads.
- Adapts faster if probing of inactive paths for shortcuts is introduced.
- Under high loads, behaves like Q-Routing.
- Uses more memory than Q-Routing.

Comparison – Low Load
Ant Routing
Stigmergy – Inspiration From Nature…

Real ants:
- Sort brood and food items
- Explore particular areas for food, and preferentially exploit the richest available food source
- Cooperate in carrying large items
- Leave pheromones on their way back
- Always find the shortest paths to their nests or food sources
- Are blind, cannot foresee the future, and have very limited memory
Ants
- Each router x in the network maintains, for each destination node d, a list of the form:
  <d, <y1, p1>, <y2, p2>, …, <ye, pe>>,
  where y1, y2, …, ye are the neighbors of x, and p1 + p2 + … + pe = 1

- This is a parallel (multi-path) routing scheme

- This also multiplies the number of degrees of freedom the system has by a factor of |E|
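A minimal sketch of such a per-destination probability table and of picking a next hop according to it; the toy table, node names, and the use of random.choices are assumptions for illustration.

```python
import random

# Routing table at node x: destination d -> {neighbor y: probability p},
# with the probabilities for each destination summing to 1.
table = {
    "d": {"y1": 0.33, "y2": 0.33, "y3": 0.34},
}

def next_hop(table, d):
    """Pick a neighbor for destination d with probability proportional to its entry."""
    neighbors, probs = zip(*table[d].items())
    return random.choices(neighbors, weights=probs, k=1)[0]

print(next_hop(table, "d"))   # y1, y2, or y3, chosen according to the table
```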
Ants…

- Every destination host hd periodically generates an “ant” to a random source host hs

- An “ant” is a 3-tuple of the form:
  < hd, hs, cost >

- cost is a counter of the cost of the path the ant has covered so far
Ant Routing Example
[Figure: a five-node network (nodes 0–4); destination node 4 generates an ant < 4, 0, cost > toward source node 0]

Routing table for node 1:

  Dest.   Next node 0   Next node 2   Next node 3
  0       0.33          0.33          0.33
  2       0.33          0.33          0.33
  3       0.26          0.48          0.26
  4       0.33          0.33          0.33
Ants: Update
When a router x receives an ant < hd, hs, cost > from neighbor yi, it:
1. Updates cost by the cost of traversing the link from x to yi (i.e. the cost of the link in reverse)
2. Updates the entry for host hd (<hd, <y1, p1>, <y2, p2>, …, <ye, pe>>):
   p = k / cost, for some constant k
   pi = (pi + p) / (1 + p)
   pj = pj / (1 + p), for j ≠ i
   (dividing by 1 + p normalizes the sum of probabilities to 1)
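Below is a small Python sketch of this update for a single ant arrival; the reinforcement constant k, the dictionary layout, and the example numbers are illustrative assumptions.

```python
def ant_update(table, hd, y_i, cost, link_cost_to_yi, k=1.0):
    """table: dest -> {neighbor: probability}; returns the updated row for hd."""
    cost += link_cost_to_yi                 # step 1: add the cost of the reverse link
    p = k / cost                            # step 2: reinforcement, larger for cheaper paths
    row = table[hd]
    for y in row:
        if y == y_i:
            row[y] = (row[y] + p) / (1 + p)
        else:
            row[y] = row[y] / (1 + p)       # dividing by 1 + p keeps the row summing to 1
    return row

table = {"d": {"y1": 0.33, "y2": 0.33, "y3": 0.34}}
print(ant_update(table, "d", "y2", cost=4.0, link_cost_to_yi=1.0))
```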
Ants: Propagation

Two sub-species of ant:
- Regular ants: P( ant sent to yi ) = pi
- Uniform ants: P( ant sent to yi ) = 1 / e

Regular ants use the learned tables to route ants; uniform ants explore randomly.
Ants: Comparison
Regular Ants:
- Explore the best paths very thoroughly; others hardly at all
- Propagate “bad news” extremely quickly, “good news” extremely slowly
- Tend to find shortest paths
- Do not converge to Q-Routing
Uniform Ants:
- Explore all paths equally
- Propagate “good” and “bad” news equally fast
- Natural parallel (multi-path) routers
- Converge to Q-Routing in a static network
Q-Routing vs. Ants

- Q-Routing only changes its currently selected route when the cost of that route increases, not when the cost of an alternate route decreases
- Q-Routing involves overhead linear in the volume of traffic in the network; ants are effectively free in moderate traffic
- Q-Routing cannot route messages over parallel paths; uniform ants can
Ants with Evaporation

- Evaporation mirrors real ants: the pheromone laid by real ants evaporates over time.

- Link usage statistics are used to compute the evaporation amount E(x): the usage of a link is the proportion of the number of ants received from node x over the total number of ants received by the current node.

  E(x) = ( 1 – ant_send(x) / Σ_{i=0..N} ant_send(i) ) · P(x)

- The evaporated probability is taken from x's entry and redistributed over the other N – 1 neighbours:

  P(i) = P(i) – E(x)               for i = x
  P(i) = P(i) + E(x) / (N – 1)     for i ≠ x
Summary

- Routing algorithms that assume a static network don’t work well in real-world networks, which are dynamic
- Adaptive routing algorithms avoid these problems, at the cost of a linear increase in the size of the routing tables
- Q-Routing is a straightforward application of Q-Learning to the routing problem
- Routing with ants is more flexible than Q-Routing
References

- Boyan, J., & Littman, M. (1994). Packet routing in dynamically changing networks: A reinforcement learning approach. In Advances in Neural Information Processing Systems 6 (NIPS 6), pp. 671-678. San Francisco, CA: Morgan Kaufmann.
- Di Caro, G., & Dorigo, M. (1998). Two ant colony algorithms for best-effort routing in datagram networks. In Proceedings of the Tenth IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS'98), pp. 541-546. IASTED/ACTA Press.
- Choi, S., & Yeung, D.-Y. (1996). Predictive Q-routing: A memory-based reinforcement learning approach to adaptive traffic control. In Advances in Neural Information Processing Systems 8 (NIPS 8), pp. 945-951. MIT Press.
- Dorigo, M., Maniezzo, V., & Colorni, A. (1996). The ant system: Optimization by a colony of cooperating agents. IEEE Transactions on Systems, Man, and Cybernetics-Part B, 26(1), 29-41.