Internet Routing Instability Craig Labovitz G. Robert Malan Farnam Jahanian

Appeared: SIGCOMM ’97
Presenters:
Supranamaya Ranjan
Mohammed Ahamed
Internet Structure
• Many small ISPs at the lowest level
• Small number of big ISPs at the core
The Core of the Internet
[Figure labels: Sprint, Verio, UUNet, rice.edu]
• Routing done using BGP at core
• Intra-domain routing could be RIP/OSPF etc.
BGP Overview
[Figure: Sprint, Verio, and UUNet exchanging reachability information for the prefixes 92.92.x.x, 100.100.x.x, 128.42.x.x, and 196.29.x.x]
BGP Overview (contd.)
• Path Vector protocol
• Similar to Distance Vector routing
• Loop detection done using AS_PATH field
[Figure: routers R1 and R2 connected by a peering session (TCP)]
• Exchange full routing table at start
• Updates sent incrementally
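
The AS_PATH loop check mentioned above can be sketched in a few lines. This is a minimal illustration, not router code: the function names are hypothetical, and the AS numbers are taken from the range reserved for documentation.

```python
from typing import List, Optional

def accept_route(local_as: int, as_path: List[int]) -> bool:
    """Accept a route only if our own AS number is not already in its AS_PATH."""
    return local_as not in as_path

def propagate(local_as: int, as_path: List[int]) -> Optional[List[int]]:
    """Return the AS_PATH to advertise onward, or None if the route would loop."""
    if not accept_route(local_as, as_path):
        return None
    # Prepend our own AS before passing the route to the next peer.
    return [local_as] + as_path

# Hypothetical example: AS 64500 hears a route that went through 64501 and 64502.
forwarded = propagate(64500, [64501, 64502])
# AS 64501 hears a path that already contains 64501, so it is dropped.
looped = propagate(64501, [64500, 64501, 64502])
```

Because each AS prepends itself, a route that comes back around always carries the receiver's own number, which is what makes the check sufficient for loop detection.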
Key Point
The volume of BGP messages exchanged is abnormally high
• Most messages are redundant or unnecessary and do not correspond to any topology or policy changes
Consequence: Instability
• Normal data packets handled by dedicated hardware
• BGP packet processing consumes CPU time
• Severe CPU processing overhead takes the router offline
Route Flap Storm:
[Figure: router A peering with routers B and C]
• Router A temporarily fails
• When A comes back up, B and C send it their full routing tables
• B and C fail in turn, producing a cascading effect
How do we avoid or lessen the impact of these problems?
Route Dampening
• Router does not accept frequent route updates to a destination
• Frequent updates might signal that the network has erratic connectivity
• Increment a counter for the destination when its route changes
• When the counter exceeds a threshold, stop accepting updates
• Decrement the counter with time
Problem:
• Future legitimate announcements are accepted only after a delay
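
The counter scheme described above can be sketched as follows. This follows the slide's simple counter, not any particular router implementation (production implementations typically use a decaying penalty value). The class name, threshold, and decay step are assumptions chosen for illustration.

```python
from collections import defaultdict

class Dampener:
    """Per-destination flap counter, as described on the slide (hypothetical sketch)."""

    def __init__(self, threshold: int = 3, decay_per_tick: int = 1):
        self.threshold = threshold            # assumed value, not from the source
        self.decay_per_tick = decay_per_tick  # assumed value, not from the source
        self.counters = defaultdict(int)

    def on_update(self, prefix: str) -> bool:
        """Count a route change; return True if the update is accepted."""
        self.counters[prefix] += 1
        return self.counters[prefix] <= self.threshold

    def tick(self) -> None:
        """Called periodically: move every counter back toward zero."""
        for prefix in list(self.counters):
            self.counters[prefix] = max(0, self.counters[prefix] - self.decay_per_tick)

d = Dampener(threshold=2)
flaps = [d.on_update("192.0.2.0/24") for _ in range(4)]  # counter goes 1, 2, 3, 4
for _ in range(3):
    d.tick()                                             # counter decays to 1
later = d.on_update("192.0.2.0/24")                      # counter is 2: accepted again
```

The three ticks needed before `later` is accepted illustrate the problem noted above: a legitimate announcement arriving right after the flaps would have been ignored until the counter decayed.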
Prefix Aggregation/Supernetting
• Core router advertises a less specific network prefix
• Reduces size of routing tables exchanged
Problems:
Prefix aggregation is not effective because:
- Internet addresses are largely assigned non-hierarchically
- Domains are not renumbered when they change ISPs
- 25% of prefixes are multi-homed
- Multi-homed prefixes should be exposed at the core
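
As a small illustration of aggregation and of why non-hierarchical assignment defeats it, the sketch below uses Python's standard `ipaddress` module. The prefixes are documentation ranges picked for the example, and the `aggregate` helper is hypothetical.

```python
import ipaddress
from typing import List

def aggregate(prefixes: List[str]) -> List[str]:
    """Collapse contiguous prefixes into the fewest covering supernets."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]

# Three customer prefixes that happen to be contiguous collapse into one /24.
contiguous = aggregate([
    "198.51.100.0/26",
    "198.51.100.64/26",
    "198.51.100.128/25",
])

# A customer that kept an unrelated prefix cannot be folded into the aggregate,
# so it still needs its own entry in the core routing tables.
mixed = aggregate([
    "198.51.100.0/26",
    "198.51.100.64/26",
    "198.51.100.128/25",
    "203.0.113.0/24",
])
```

Here `contiguous` shrinks to a single advertisement, while `mixed` still needs two, which mirrors the renumbering and multi-homing problems listed above.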
Route Servers
• Without a route server: O(N) peering sessions per router
• With a route server: 1 peering session per router
[Figure: routers peering through a Route Server]
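
The saving can be made concrete with a short calculation. The sketch assumes a full mesh of bilateral sessions among N routers at an exchange point when no route server is used; the function names and the value N = 60 are illustrative.

```python
from typing import Tuple

def full_mesh_sessions(n: int) -> Tuple[int, int]:
    """(sessions per router, total sessions) when every router peers with every other."""
    return n - 1, n * (n - 1) // 2

def route_server_sessions(n: int) -> Tuple[int, int]:
    """(sessions per router, total sessions) when each router peers only with the server."""
    return 1, n

mesh = full_mesh_sessions(60)       # 59 sessions per router, 1770 in total
served = route_server_sessions(60)  # 1 session per router, 60 in total
```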
In spite of all these measures, the BGP message overhead is unexpectedly high
Evaluation Methodology
• Data from the Route Server at the MAE-East (Washington, D.C.) peering point
• Peering point for more than 60 major ISPs
• Nine-month log
• Time series analysis of message exchange events
Observation: Lots of redundant updates
• Duplicate route withdrawals

ISP    Number of withdrawals    Unique    Ratio
A      23276                    4344      5
F      86417                    12435     7
I      2479023                  14112     175

One reason:
- Stateless BGP
- No state of previous withdrawals maintained
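
The ratio column can be reproduced from a withdrawal log along the following lines. This is a sketch under stated assumptions: "Unique" is taken to mean the number of distinct prefixes withdrawn, and the log format (one `(isp, prefix)` pair per withdrawal message) and the sample data are made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def withdrawal_stats(log: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[int, int, float]]:
    """Map each ISP to (total withdrawals, unique prefixes withdrawn, ratio)."""
    total = defaultdict(int)
    unique = defaultdict(set)
    for isp, prefix in log:
        total[isp] += 1
        unique[isp].add(prefix)
    return {
        isp: (total[isp], len(unique[isp]), total[isp] / len(unique[isp]))
        for isp in total
    }

# Made-up log: ISP "A" withdraws the same prefix three times in a row.
sample_log = [
    ("A", "192.0.2.0/24"),
    ("A", "192.0.2.0/24"),
    ("A", "192.0.2.0/24"),
    ("A", "198.51.100.0/24"),
    ("B", "203.0.113.0/24"),
]
stats = withdrawal_stats(sample_log)
```

A ratio near 1 means almost every withdrawal was for a different prefix; a ratio of 175, as for ISP I in the table, means the same prefixes were withdrawn over and over.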
Observation: Instability Proportional to Activity
After removing duplicate messages:
[Figure: Instability density with time. Axis: time of day (6:00 to 24:00). Annotations: ISP infrastructure upgrade; fewer messages (two regions).]
Evidence from Fine Grained Structure
[Figure: number of instability events, and its power spectral density against frequency (1/hour), marked at periods of 7 days and 24 hours]
Conjecture:
BGP packets are competing with data packets during
high bandwidth activity.
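
To show how a power spectral density exposes daily and weekly cycles, here is a minimal periodogram built on a plain discrete Fourier transform. It is a sketch only: the hourly counts are synthetic, the amplitudes are arbitrary, and the slides do not say which spectral estimation method the authors used.

```python
import cmath
import math
from typing import List, Sequence, Tuple

def periodogram(series: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (period in samples, power) for each DFT frequency up to Nyquist."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the constant (zero-frequency) term
    result = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered)
        )
        result.append((n / k, abs(coeff) ** 2 / n))
    return result

# Synthetic hourly event counts over four weeks with a daily and a weekly cycle.
hours = range(24 * 7 * 4)
counts = [
    100
    + 40 * math.sin(2 * math.pi * h / 24)
    + 20 * math.sin(2 * math.pi * h / 168)
    for h in hours
]
strongest = sorted(periodogram(counts), key=lambda item: item[1], reverse=True)[:2]
periods = sorted(round(period) for period, _ in strongest)  # periods in hours
```

On this synthetic input the two strongest components fall at periods of 24 and 168 hours, the same 24-hour and 7-day marks that appear in the figure.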
Observation: Instability & size uncorrelated
[Figure: three panels (AADiff, WADiff, WADup), each plotting proportion of announcements against proportion of routing table]
• ISPs serving more network prefixes may not contribute more to instability
Observation: Instability distributed over routes
[Figure: cumulative proportion against # of announcements per prefix+AS, marked at 10 announcements and annotated "75% median"]
• 20% to 90% of routes change 10 times or less
• No single route contributes significantly to instability
Observation: Synchronized updates
• Inter-arrival times of updates show periodicity
• 30 s and 1 minute patterns
• Some routers collect and send updates once every 30 s
[Figure: Inter-arrival time distribution for AADiffs (proportion against inter-arrival time), with peaks labeled 30s and 1min]
Possible reasons:
• Routers get synchronized
• Interaction between border router and internal router misconfigured?
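
An inter-arrival distribution like the one in the figure can be computed as sketched below. The 30-second bin width, the function name, and the timestamps are assumptions for illustration, not values from the measurement.

```python
from collections import Counter
from typing import Dict, Sequence

def interarrival_histogram(timestamps: Sequence[float], bin_seconds: int = 30) -> Dict[int, float]:
    """Proportion of consecutive-update gaps falling nearest to each multiple of bin_seconds."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    bins = Counter(int(round(gap / bin_seconds)) * bin_seconds for gap in gaps)
    return {b: count / len(gaps) for b, count in sorted(bins.items())}

# Made-up arrival times (seconds) for updates to one prefix.
arrivals = [0, 30, 61, 90, 150, 181, 240]
hist = interarrival_histogram(arrivals)
```

For these timestamps, four of the six gaps land in the 30-second bin and two in the 60-second bin, the kind of clustering the slide attributes to timer-driven or synchronized routers.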
End-to-end Perspective
Chinoy: “Dynamics of Internet routing information” (SIGCOMM ’93)
Measurements on NSFNET showed:
- Processing and forwarding latency of a BGP update is 3 orders of magnitude more than the latency incurred in forwarding data packets
- This will lead to packet drops during the intervening period
Paxson: “End-to-End routing behavior in the Internet” (SIGCOMM ’96)
- Routing loops introduce loops into other routers’ routing tables
- An end-to-end route changes every 1.5 hours on average
End-to-End perspective (Paxson)

Pathology type               Probability in 1995    Probability in 1996
Long-lived routing loops     ~ 0.14%                same
Short-lived routing loops    ~ 0.065%               same
Outage > 30 s                0.96%                  2.2%
Total                        1.5%                   3.4%
Summary and Conclusions
• Redundant routing information flows in core
• Instability distributed across autonomous systems
Possible reasons for instability:
- Stateless BGP updates
- Misconfigured routers
- Synchronization
- Clocks driving the links not synchronized (link “flaps”)
Follow-up work & impact
“Origins of Internet Routing Instability” (1999)
• Migration from stateless to stateful BGP decreased duplicate withdrawals by an order of magnitude
• But duplicate announcements (AADup) doubled
• Reason: non-transitive attribute filtering not implemented
- BGP specification: “never propagate non-transitive attributes”
- AS_PATH is a transitive attribute
- MED (Multi-Exit Discriminator) is NOT transitive
Propagating MEDs causes oscillations
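
The two fixes named above, keeping per-peer state and filtering non-transitive attributes before comparing, can be sketched together. The class and its fields are hypothetical and only MED is treated as non-transitive, following the slide; this is not how any specific vendor implements it.

```python
from typing import Dict, Optional

NON_TRANSITIVE = {"MED"}  # per the slide; other attributes are ignored in this sketch

class PeerOutState:
    """Remembers what was last advertised to one peer (the "stateful" behaviour)."""

    def __init__(self) -> None:
        self.sent: Dict[str, dict] = {}  # prefix -> attributes last sent to this peer

    def announce(self, prefix: str, attrs: dict) -> Optional[dict]:
        """Return the attributes to send, or None if the announcement is a duplicate."""
        outbound = {k: v for k, v in attrs.items() if k not in NON_TRANSITIVE}
        if self.sent.get(prefix) == outbound:
            return None  # only a non-transitive attribute changed: suppress
        self.sent[prefix] = outbound
        return outbound

    def withdraw(self, prefix: str) -> bool:
        """Send a withdrawal only for a prefix this peer was actually told about."""
        return self.sent.pop(prefix, None) is not None

out = PeerOutState()
first = out.announce("192.0.2.0/24", {"AS_PATH": (64500, 64501), "MED": 10})
second = out.announce("192.0.2.0/24", {"AS_PATH": (64500, 64501), "MED": 20})
w1 = out.withdraw("192.0.2.0/24")  # sent once
w2 = out.withdraw("192.0.2.0/24")  # duplicate withdrawal suppressed
```

In this sketch a MED-only change produces no outbound message, and a second withdrawal of the same prefix is dropped because the state shows nothing left to withdraw.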