Explicit Marking and Prioritized Treatment of Specific
OSPF Packets for Faster Convergence and Improved
Network Scalability and Stability
(draft-ietf-ospf-scalability-02.txt)
Gagan Choudhury
AT&T
gchoudhury@att.com
Vishwas Manral
NetPlane Systems
VishwasM@netplane.com
Anurag Maunder
Sanera Systems
amaunder@sanera.net
Vera Sapozhnikova
AT&T
sapozhnikova@att.com
The Basic Issue
• In Large Operational Networks Running Link-State Protocols, We
have Often Observed Sustained CPU Congestion (Often Memory
Congestion as well) Caused by LSA Storms Triggered By
– Link/Node Failures
– Synchronization of Refreshes
– Software Bugs or Procedural Errors
• Congestion is Reinforced by a Positive Feedback Loop:
– LSA Retransmissions, Possible Packet Drops, Possible Link Failures
due to Missed Hellos, and Eventual Recoveries All Generate More LSAs
• On Rare Occasions the Congestion Spreads to Many Nodes and
Causes Significant Failures
• We Propose Prioritizing Hello and LSA Acknowledgment Packets to
Improve Network Stability and Scalability
• Prioritized Treatment may be Facilitated by Special Marking
• “Smart” Proprietary Implementations are Perhaps Already Doing This,
but We Propose it as a Best Current Practice (BCP) so that All
Implementations Benefit from it
Simulation Study
• Three Priority Scenarios
– 1. Incoming LSUs, Hellos, and LSA Acks at the Same Priority
– 2. Hellos have Priority over LSUs and LSA Acks
– 3. Hellos and LSA Acks have Priority over LSUs
• Network Scenarios:
– Network 1: 100 Nodes, 1200 Links, Up to 50 Adjacencies per Node
– Network 2: 50 Nodes, 600 Links, Up to 48 Adjacencies per Node
• LSA Scenarios
– 1 Router LSA per Node, 1 TE LSA per Link
– 1 Router LSA per Node, 10 AS-External (ASE) LSAs at Every Other Node
• LSA Retransmission Timer Value: 5 Seconds or 10 Seconds
• LSU Processing Time: ~1 ms or ~0.5 ms
• Hello/Router-Dead Interval: 10 Sec/40 Sec or 2 Sec/8 Sec
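The three priority scenarios can be pictured as an input queue at each router. The sketch below is a minimal Python illustration, assuming strict priority between classes and FIFO order within a class; the slides do not say how the simulator schedules packets, and the names InputQueue, SCENARIOS, and service_order are hypothetical. Packet type numbers follow the OSPF specification (RFC 2328).

```python
import heapq
from itertools import count

# OSPF packet type numbers (RFC 2328).
HELLO, DB_DESC, LS_REQUEST, LS_UPDATE, LS_ACK = 1, 2, 3, 4, 5

# Hypothetical priority classes per scenario (lower number is served first).
# Packet types not listed fall into the default class 1.
SCENARIOS = {
    1: {},                       # LSUs, Hellos, and Acks at the same priority
    2: {HELLO: 0},               # Hellos ahead of LSUs and Acks
    3: {HELLO: 0, LS_ACK: 0},    # Hellos and Acks ahead of LSUs
}


class InputQueue:
    """Strict-priority input queue with FIFO order inside each class (assumed)."""

    def __init__(self, scenario: int):
        self.classes = SCENARIOS[scenario]
        self.heap = []
        self.seq = count()  # arrival counter breaks ties, preserving FIFO order

    def enqueue(self, pkt_type: int, payload: str) -> None:
        cls = self.classes.get(pkt_type, 1)
        heapq.heappush(self.heap, (cls, next(self.seq), pkt_type, payload))

    def dequeue(self):
        _, _, pkt_type, payload = heapq.heappop(self.heap)
        return pkt_type, payload

    def __len__(self) -> int:
        return len(self.heap)


def service_order(scenario: int, arrivals):
    """Return payloads in the order the router would process them."""
    q = InputQueue(scenario)
    for pkt_type, payload in arrivals:
        q.enqueue(pkt_type, payload)
    return [q.dequeue()[1] for _ in range(len(q))]
```

For arrivals LSU, LSU, Ack, Hello, scenario 1 serves them in arrival order, scenario 2 moves the Hello to the front, and scenario 3 serves the Ack and the Hello (in arrival order) before both LSUs.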
Six Simulation Cases
• Case 1: Network 1, Link LSAs, Retransmission Timer = 10 Sec,
Proc. Time ~ 1 ms, Hello/Router-Dead-Interval = 10/40 Sec.
• Case 2: Network 1, ASE LSAs, Retransmission Timer = 10 Sec,
Proc. Time ~ 1 ms, Hello/Router-Dead-Interval = 10/40 Sec.
• Case 3: Network 1, Link LSAs, Retransmission Timer = 5 Sec,
Proc. Time ~ 1 ms, Hello/Router-Dead-Interval = 10/40 Sec.
• Case 4: Network 1, Link LSAs, Retransmission Timer = 10 Sec,
Proc. Time ~ 0.5 ms, Hello/Router-Dead-Interval = 10/40 Sec.
• Case 5: Network 1, Link LSAs, Retransmission Timer = 10 Sec,
Proc. Time ~ 1 ms, Hello/Router-Dead-Interval = 2/8 Sec.
• Case 6: Network 2, Link LSAs, Retransmission Timer = 10 Sec,
Proc. Time ~ 1 ms, Hello/Router-Dead-Interval = 10/40 Sec.
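Cases 2 through 6 each change one setting relative to Case 1. A small Python encoding of the parameters listed above makes that one-factor-at-a-time design explicit; the Case record, the CASES table, and the varied_parameters helper are hypothetical names for illustration, not part of the study.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Case:
    network: int          # 1 or 2, as defined on the Simulation Study slide
    lsa_type: str         # "link" or "ase"
    retransmit_s: float   # LSA retransmission timer, seconds
    proc_ms: float        # LSU processing time, milliseconds
    hello_s: float        # Hello interval, seconds
    dead_s: float         # Router-Dead interval, seconds


# The six simulation cases, transcribed from the slide.
CASES = {
    1: Case(1, "link", 10, 1.0, 10, 40),
    2: Case(1, "ase", 10, 1.0, 10, 40),
    3: Case(1, "link", 5, 1.0, 10, 40),
    4: Case(1, "link", 10, 0.5, 10, 40),
    5: Case(1, "link", 10, 1.0, 2, 8),
    6: Case(2, "link", 10, 1.0, 10, 40),
}
BASELINE = CASES[1]


def varied_parameters(case_id: int) -> list:
    """Names of the fields in which a case differs from Case 1."""
    case = CASES[case_id]
    return [f.name for f in fields(Case)
            if getattr(case, f.name) != getattr(BASELINE, f.name)]
```

Case 5 is the only one that changes two fields, because the Hello and Router-Dead intervals are varied together.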
Number of Non-Converged LSAs vs. LSA Storm
[Figure: Number of non-converged LSUs in the network vs. time in seconds for Case 1, no priority to Hello or Ack; the LSA storm starts between 20 and 30 seconds; curves shown for LSA storm sizes 100, 140, and 160.]
LSA Storm Threshold for Sustained CPU Congestion
Maximum Allowable LSA Storm Size

Case      No Priority to    Priority to     Priority to
          Hello or Ack*     Hello Only**    Hello and Ack**
Case 1    150               190             250
Case 2    185               215             285
Case 3    115               127             170
Case 4    320               375             580
Case 5    120               175             225
Case 6    185               224             285

* Congestion Due to Retransmissions and Adjacency Loss Due to Missed Hellos
** Congestion Due to Retransmissions Only (Adjacency Stays Up)
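The relative benefit of prioritization can be read off the table by dividing each prioritized threshold by the no-priority threshold for the same case. This short Python sketch does only that arithmetic on the values above; the THRESHOLDS and gains names are hypothetical.

```python
# Maximum allowable LSA storm size per case, copied from the table:
# (no priority, priority to Hello only, priority to Hello and Ack)
THRESHOLDS = {
    1: (150, 190, 250),
    2: (185, 215, 285),
    3: (115, 127, 170),
    4: (320, 375, 580),
    5: (120, 175, 225),
    6: (185, 224, 285),
}


def gains(case_id: int):
    """Ratio of each prioritized threshold to the no-priority threshold."""
    base, hello_only, hello_ack = THRESHOLDS[case_id]
    return hello_only / base, hello_ack / base


for case_id in sorted(THRESHOLDS):
    h, ha = gains(case_id)
    print(f"Case {case_id}: Hello only x{h:.2f}, Hello and Ack x{ha:.2f}")
```

On these numbers, priority to Hello alone raises the threshold by about 10% (Case 3) to 46% (Case 5), and priority to Hello and Ack by about 48% (Case 3) to 88% (Case 5).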
Proposal
• Process Critical OSPF Packets (Hello, LSA Ack) at
Higher Priority than Other OSPF Packets
– This May be Facilitated by Special Marking (e.g., Use Two
DiffServ Codepoints for OSPF Packets, One for a Higher and
the Other for a Lower Priority Class)
• During Congestion, Use Any Packet Received over an
Interface as a Surrogate for a Hello in Order to Keep the
Link Alive (Same Impact as a Prioritized Hello)
• Other OSPF Packets That Could Potentially Get High Priority
– LSAs Carrying Topology-Change Information
– Database Description Packets from the Slave That Are Used as Acks
• These or Similar Mechanisms are Perhaps Already
Being Used in Smart Proprietary Implementations
– Proposing Them as a BCP would Benefit All Implementations
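Two of the proposed mechanisms, two-codepoint marking and the surrogate Hello, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the codepoint values (CS6 = 48 for the high class, CS5 = 40 for the low class) are placeholders, since the proposal does not fix them; the helpers dscp_for, tos_byte, mark_socket, and NeighborLiveness are hypothetical; and how a router decides it is "congested" is left to the caller. Packet type numbers and the RouterDeadInterval behavior follow RFC 2328.

```python
import socket

# OSPF packet type numbers (RFC 2328).
HELLO, DB_DESC, LS_REQUEST, LS_UPDATE, LS_ACK = 1, 2, 3, 4, 5

# Placeholder codepoints for the two OSPF classes (assumption, not specified
# by the proposal): CS6 = 48 for high priority, CS5 = 40 for low priority.
HIGH_DSCP = 48
LOW_DSCP = 40


def dscp_for(pkt_type: int, slave_ack: bool = False,
             topology_change: bool = False) -> int:
    """Choose the codepoint for an outgoing OSPF packet.

    Hello and LS Ack are always high priority. Optionally, a Database
    Description packet sent by the slave as an implicit ack, or an LS Update
    carrying topology-change LSAs, is promoted (the caller sets the flags).
    """
    if pkt_type in (HELLO, LS_ACK):
        return HIGH_DSCP
    if pkt_type == DB_DESC and slave_ack:
        return HIGH_DSCP
    if pkt_type == LS_UPDATE and topology_change:
        return HIGH_DSCP
    return LOW_DSCP


def tos_byte(dscp: int) -> int:
    """DSCP occupies the upper six bits of the IPv4 TOS/DS byte."""
    return dscp << 2


def mark_socket(sock: socket.socket, dscp: int) -> None:
    """Set the DS byte for subsequent sends on this socket. OSPF itself runs
    over raw IP (protocol 89), which usually needs elevated privileges."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte(dscp))


class NeighborLiveness:
    """Inactivity timer for one neighbor. Normally only a Hello restarts it;
    while the local router is congested, any OSPF packet received from the
    neighbor is accepted as a surrogate Hello."""

    def __init__(self, dead_interval: float, now: float):
        self.dead_interval = dead_interval
        self.last_heard = now

    def on_packet(self, pkt_type: int, now: float, congested: bool) -> None:
        if pkt_type == HELLO or congested:
            self.last_heard = now

    def is_dead(self, now: float) -> bool:
        return now - self.last_heard >= self.dead_interval
```

Marking only changes treatment where the codepoints are honored; the receiving router still has to classify and prioritize packets on input, as in simulation scenario 3.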