Distributed Partial Information Management (DPIM) for Survivable Networks
Dahai Xu

1
Content



Basic Concepts of Protection & Restoration
Previous Work on Shared Path Protection
Proposed DPIM Schemes
- What partial information to maintain, and how?
- How is a connection routed under distributed control with partial information?
- How is distributed signaling done, and how is bandwidth (BW) allocated and deallocated?
- A heuristic based on Potential Backup Cost
2
Protection



Path Protection
Link Protection
Advantages & Disadvantages
3
Path Protection



Use more than one path to guarantee that data is delivered successfully
Dedicated Path Protection
Shared Path Protection
4
Dedicated Path Protection


1+1 Protection
Point-to-Point Protection & Mesh Network Protection
5
1+1 Protection
6
Mesh Network Protection
7
Shared Path Protection

1:1 Protection

1:N Protection
8
Link Protection




Use an alternate path when a link fails
Dedicated Link Protection: not practical
Shared Link Protection: practical
Link protection may fail when a node fails
9
Advantages & Disadvantages of Protection

Simple
Quick: requires little extra processing time
Usually recovers only from a single link fault
Inefficient use of resources
10
Restoration



Path Restoration
- The route can be computed after the failure
Link Restoration
- The path is discovered at the end nodes of the failed link
- More practical than path restoration
Advantages & Disadvantages of Restoration
- Can usually recover from multiple element faults
- More efficient use of resources
- Complex
- Slow: requires extra processing time to set up the path and reserve resources
11
Comparison between Protection & Restoration

Characteristic: Protection -- resources are reserved before the failure and may go unused; Restoration -- resources are reserved and used only after the failure
Route: Protection -- predetermined; Restoration -- can be computed dynamically
Resource efficiency: Protection -- low; Restoration -- high
12
Comparison between Protection & Restoration (Cont'd)

Time used: Protection -- short; Restoration -- long
Reliability: Protection -- mainly for a single fault; Restoration -- can survive multiple faults
Implementation: Protection -- simple; Restoration -- complex
13
Offline Routing



Arrange (route) a set of traffic flows
Integer Linear Programming (ILP) to get optimal results
Heuristic Algorithms
- Relaxation of ILP
- Simulated annealing: a stochastic hill-climbing heuristic search method that explores a larger area of the search space without being trapped in local optima
- Genetic algorithm: evolves the current population of “good solutions” toward optimality using carefully designed crossover and mutation operators
- Tabu search
14
Online Routing of Bandwidth
Guaranteed


Online routing of a bandwidth-guaranteed path with a simultaneous protection path
Metrics
- Unlimited link capacity: bandwidth consumption
- Limited link capacity: connection drop/block probability; profit / revenue
15
Assumption


Two connections whose active paths are completely link-disjoint can share backup bandwidth (BBW).
The objective of the algorithm is to exploit this BBW sharing, for example to reduce the total amount of bandwidth (TBW) consumed by the connections.
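
To make the sharing assumption concrete, here is a minimal Python sketch (not from the slides) that compares dedicated and shared BBW per link when only one link fails at a time. The function name `backup_bandwidth`, the link labels, and the three example connections are hypothetical; the only idea taken from the slide is that connections with link-disjoint active paths never need their backup bandwidth at the same time.

```python
from collections import defaultdict

def backup_bandwidth(connections):
    """Return (shared, dedicated) BBW per backup link.

    connections: list of (active_path_links, backup_path_links, bandwidth).
    Assumes a single link failure at a time and non-empty active paths.
    """
    # demand[b][a] = bandwidth that switches onto backup link b when link a fails
    demand = defaultdict(lambda: defaultdict(int))
    dedicated = defaultdict(int)
    for active, backup, w in connections:
        for b in backup:
            dedicated[b] += w          # no sharing: every connection reserves w
            for a in active:
                demand[b][a] += w
    # With sharing, link b only needs enough for the worst single failure
    shared = {b: max(per_failure.values()) for b, per_failure in demand.items()}
    return shared, dict(dedicated)

conns = [
    (["A-B"], ["A-C", "C-B"], 5),   # connection 1
    (["D-B"], ["D-C", "C-B"], 3),   # connection 2: AP link-disjoint from connection 1
    (["A-B"], ["A-C", "C-B"], 2),   # connection 3: same AP link as connection 1
]
shared, dedicated = backup_bandwidth(conns)
print(shared["C-B"], dedicated["C-B"])  # 7 10
```

On link C-B, connections 1 and 3 fail together (both use A-B), so they need 5 + 2 = 7 units; connection 2 fails independently and fits inside those 7 units, whereas dedicated reservation needs 10.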
16
Information for Routing


The amount of BBW sharing depends on the information available to the routing algorithm.
Three important cases are considered:
- No information on how existing connections are routed
- Complete per-flow/aggregate information
- Partial aggregate information
17
No Sharing (NS)





Only the residual (available) bandwidth on each link is known
Residual bandwidth = Link capacity - Reserved active bandwidth (ABW) - Reserved backup bandwidth (BBW)
Can be obtained from OSPF or IS-IS extensions
Only the total used bandwidth (active + backup) is known
BBW cannot be shared, so resources are wasted.
18
Sharing with Complete Information (SCI)

Knows the routes of the active and backup paths of all current connections.
May have too much information to maintain: O(LQ), where L is the average path length and Q is the number of existing connections.
Permits the best sharing and provides a performance upper bound.
19
Partial Information for Routing


Knows some aggregated information about each link
Two schemes
- SPI (Sharing with Partial Information): centralized control, knows the BBW and ABW on every link
- DPIM (Distributed Partial Information Management): distributed control, each ingress edge (source) node decides the routes
20
Notations (I)
21
Notations (II)
22
No Sharing (NS)



Remove links whose residual bandwidth Re < w
Determine two link-disjoint paths for active/backup
Formulation:
- Standard network flow problem
- Each link has unit cost and unit capacity
- s supplies two units, d demands two units
- A minimum cost flow algorithm can be used
23
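
The No Sharing formulation above (unit cost, unit capacity, two units from s to d) can be sketched as a successive-shortest-path minimum cost flow. This is an illustrative Python sketch, not the author's implementation: it assumes directed links that have already been filtered by Re >= w, and the function name `two_disjoint_paths` and the example topology are hypothetical.

```python
def two_disjoint_paths(links, s, d):
    """Two link-disjoint s->d paths with minimum total hop count, or None.

    links: iterable of directed (u, v) pairs, each with unit cost and capacity.
    """
    links = set(links)
    flow = set()  # links currently carrying one unit of flow
    for _unit in range(2):
        # Residual arcs: unused links cost +1, used links can be undone at -1
        residual = [(u, v, 1) for (u, v) in links - flow]
        residual += [(v, u, -1) for (u, v) in flow]
        nodes = {n for (u, v, _c) in residual for n in (u, v)} | {s, d}
        dist = {n: float("inf") for n in nodes}
        dist[s] = 0
        prev = {}
        for _pass in range(len(nodes) - 1):      # Bellman-Ford relaxation
            for u, v, c in residual:
                if dist[u] + c < dist[v]:
                    dist[v] = dist[u] + c
                    prev[v] = u
        if dist[d] == float("inf"):
            return None                           # fewer than two disjoint paths
        v = d
        while v != s:                             # augment one unit along the path
            u = prev[v]
            if (v, u) in flow:
                flow.remove((v, u))               # cancel flow on a used link
            else:
                flow.add((u, v))
            v = u
    # Decompose the two units of flow into two paths
    paths, remaining = [], set(flow)
    for _unit in range(2):
        path = [s]
        while path[-1] != d:
            nxt = next(v for (u, v) in remaining if u == path[-1])
            remaining.remove((path[-1], nxt))
            path.append(nxt)
        paths.append(path)
    return paths

links = [("s", "a"), ("a", "b"), ("b", "d"), ("s", "c"),
         ("c", "b"), ("a", "e"), ("e", "d")]
print(sorted(two_disjoint_paths(links, "s", "d")))
# [['s', 'a', 'e', 'd'], ['s', 'c', 'b', 'd']]
```

In this topology, picking s-a-b-d first and then deleting its links would leave no second path; the flow formulation avoids that trap by allowing the second augmentation to cancel flow on a-b.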
Linear Programming for SCI (I)


For new request (s, d, w), the least cost of
using a on AP and b on BP
The cost of using e on BP
(1)
24
Linear Programming for SCI (II)


Objective
Constraints
25
SPI

In SCI, can be calculated from per-flow
information. Need maintain per-flow information.
Not scalable.
In SPI, is not known, only
is known

Same objective and constraints as in SCI
Further improvement to be discussed in DPIM

26
Survivable Routing (SR)



Distributed control with complete but aggregated
information.
Every edge node essentially maintains a matrix of
for all links a and b
Uses the active path first (APF) heuristic instead of an ILP formulation
- Temporarily remove links with Re < w
- Find a shortest path as the AP
- Put back the temporarily removed links, remove the AP links, and calculate the backup cost using Eq. (1)
- Find a shortest (cheapest) path as the BP
27
Successive SR (SSR)



After
is updated as a result of setting up
a new connection, some existing BPs may
change (route and the amount of additional
BBW reserved)
Such changes may in turn trigger changes to
other existing BPs until an equilibrium state is
reached
Achieve a better BBW sharing, but with a
high signaling and control overhead
28
RAFT




RAFT: Resource Aggregation for Fault Tolerance
Each node maintains a fault management table (FMT), which lists the AP and BP flows on each link e.
The FMT must be updated each time a request is set up or torn down.
The AP and BP routes are node-disjoint; they are found first, using a shortest-path algorithm.
A request is accepted only if the required bandwidth is available on all links of its AP and BP; otherwise it is rejected.
29
Doshi’s






Each node maintains a link capacity control table (LCCT) for each local link
Source nodes use a Content-lock mechanism to avoid deadlock among multiple demands.
BP route search: distributed breadth-first search (BFS) over a residual network
The BFS first queries the residual spare capacity in the LCCT and uses a link only if it has sufficient capacity
If a route is found, the source node stores it as the restoration route for the demand.
If no BP route is found, the capacity optimization procedure is activated, which changes previous BP routes
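
The capacity-filtered search described above can be illustrated with a centralized BFS sketch in Python. The scheme on the slide is distributed; this single-process version only shows the filtering rule. The function name `bfs_backup_route` and the `spare` dictionary, which stands in for the per-link LCCT lookup, are assumptions made for illustration.

```python
from collections import deque

def bfs_backup_route(adj, spare, s, d, w, excluded=frozenset()):
    """Fewest-hop route s->d using only links with spare capacity >= w.

    adj: node -> list of neighbours; spare: (u, v) -> residual spare capacity;
    excluded: links that must not be used (for example the AP links).
    """
    prev = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == d:
            route = []
            while u is not None:          # walk predecessors back to s
                route.append(u)
                u = prev[u]
            return route[::-1]
        for v in adj.get(u, []):
            link = (u, v)
            # Use the link only if it is allowed and has enough spare capacity
            if v not in prev and link not in excluded and spare.get(link, 0) >= w:
                prev[v] = u
                queue.append(v)
    return None                           # caller would trigger re-optimization

adj = {"s": ["a", "b"], "a": ["d"], "b": ["d"]}
spare = {("s", "a"): 1, ("a", "d"): 5, ("s", "b"): 5, ("b", "d"): 5}
print(bfs_backup_route(adj, spare, "s", "d", 2))  # ['s', 'b', 'd']
```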
30
Su’s




Each node maintains “bucket”-based link state
(equivalent to
)
The amount of link state is proportional to the number of failure events/links, not the number of lightpaths
AP and BP are optimized separately. APs are assumed to use minimum-hop paths; BPs are optimized to reduce the wavelength redundancy
The “width” of link l with respect to a failure event k* is defined as the normalized difference between the maximum bucket height and the height of the bucket corresponding to link failure k*; it indicates the sharing capacity of the link.
31
Su’s (Cont’)


The Bellman-Ford algorithm is used to identify the widest path between the end nodes of the protected link, that is, the path that offers the most sharing.
If there is more than one such candidate path, the one that traverses the least number of links with width 0 is selected
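
A widest-path search of the kind described above can be sketched with a Bellman-Ford-style relaxation that maximizes the bottleneck (minimum) link width instead of minimizing a sum. This Python sketch is an assumption-laden illustration: the function name `widest_path` and the example widths are hypothetical, links are directed, and the tie-break on the number of width-0 links is not implemented.

```python
def widest_path(width, s, d):
    """Return (path, bottleneck) maximising the minimum link width from s to d.

    width: dict mapping directed link (u, v) -> width (sharing capacity).
    """
    nodes = {n for link in width for n in link} | {s, d}
    best = {n: float("-inf") for n in nodes}   # best bottleneck found so far
    best[s] = float("inf")
    prev = {}
    for _pass in range(len(nodes) - 1):
        changed = False
        for (u, v), wd in width.items():
            cand = min(best[u], wd)            # bottleneck if we extend via (u, v)
            if cand > best[v]:
                best[v], prev[v] = cand, u
                changed = True
        if not changed:
            break
    if best[d] == float("-inf"):
        return None, None                      # d is unreachable
    path = [d]
    while path[-1] != s:
        path.append(prev[path[-1]])
    return path[::-1], best[d]

width = {("u", "x"): 3, ("x", "v"): 1, ("u", "y"): 2, ("y", "v"): 2}
print(widest_path(width, "u", "v"))  # (['u', 'y', 'v'], 2)
```

The route through x has one wider link (3) but its bottleneck is 1, so the route through y, with bottleneck 2, is preferred.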
32
DPIM-SAM

Distributed Partial Information Management
 Edge node maintains (and exchanges)
non-local information:
for each link e. (O(E) information)
 Each node also maintains profiles of ABW
and BBW
for each local link
e. (O(E) information)
33
Path Determination




This estimated BBW may not be minimal
Using ILP, or APF to find AP and BP
DPIM-M-A: APF with Minimal BBW Allocation
34
Distributed Signaling


Minimal BBW Allocation
Maintaining Partial Information on AP and BP
 Send AP Set-up packet containing BP to
the nodes along AP, each node having an
outgoing link e in AP updates
 Similar way to update
35
Minimal BBW allocation
36
Connection Release


Can’t be done efficiently in SPI
AP Tear-Down and BBW Deallocation. Update
PBe and release bw.
37
Network Topology
38
Performance Evaluation


Traffic Types
- Incremental traffic (established connections last forever)
- Dynamic traffic (with connection durations)
Performance Metrics
- Unlimited link capacity: bandwidth saving (ratio), upper bound 50%
- Limited link capacity: connection drop/block probability; total earning (ratio), with an earning rate matrix (independent of traffic load)
39
Simulation Results

Average Bandwidth Saving Ratio

Total Earning Ratio
40
Active Path First with Potential Backup Cost (APF-PBC)

Challenges
- Integer Linear Programming (ILP) based approaches are notoriously time-consuming. They guarantee minimal allocation of TBW for each request, but do not guarantee an optimal result over all requests.
- Active path first (APF) can only achieve sub-optimal results: it does not consider the potential cost along the BP when selecting the AP
41
Main idea of APF-PBC





Also uses Active Path First
In selecting Active Path, Each capable link a
will be assigned a cost
We use
as the potential
backup cost (and try to minimize TBW).
Intuition: PBC increases with w and
Can apply to SCI and DPIM-SAM (which
determine backup cost and BP differently)
42
Potential Backup Cost Derivation



is derived based on a statistical analysis of experimental data (SCI-ILP for the 15-node network, infinite link capacity)
challenge:
but do not
know which link b to be used to backup link
a, let alone Bb and
solution: guess the (weighted average) value
of Bb (call it x) and (call it s)
43
Derivation based on statistical
analysis of Bb



Distribution of Bb/M
(w,s,M) is the expected value of a(w) when
s is fixed.
Guess the distribution of
and
calculated the weighted average value of
(w,s,M) over all s to obtain a(w)
44
Distribution of Bb/M
45
Graph of (w,s,M) & approximation

Integral (curves) from adaptive Lobatto quadrature

Approximation (line-fitting Y=c1X+c2)
46
Cumulative distribution
function of
47
Graph of
48
Approximation of a(w)



Distribution of
Effect of constants c and  on performance of
APF-PBC

49
Distribution of
50
Effect of constants c and  on
performance of APF-PBC
51
Bandwidth consumed after
500 demands
52
Total earning after 500
demands
53
Simulation Results -PBC

Average Bandwidth
Saving Ratio

Total Earning Ratio
54
Summary





On-line shared path protection (needs to be extended to other schemes)
Amount of information (complete/partial) affects BBW sharing
May use ILP or APF-based heuristics
Proposed a DPIM scheme for distributed, partial/aggregated information management (including signaling for path set-up/tear-down)
Proposed a potential cost heuristic, which runs faster and performs better than ILP
55
Summary II


Have also extended to cases with unprotected (UP) and pre-emptable (PE) connections
- UP: uses just one path, similar to an AP (i.e., no BP); affected if (and only if) the path breaks.
- PE: unprotected and may be affected even if a failure does not break its path
- A PE may use the existing BPs/BBW to carry low-priority traffic in fault-free situations
- A PE is similar, but not identical, to a BP: it can share BBW with other BPs, but cannot share with other PEs
The idea of potential cost can also be applied to solving other joint optimization problems with heuristics
56
Reference
57