Yaping Zhu yapingz@cs.princeton.edu
Advisor: Prof. Jennifer Rexford
Princeton University
Minimize Performance Disruptions
• Network changes affect user experience
– Equipment failures
– Routing changes
– Network congestion
• Network operators have to react and fix problems
– Fix equipment failure
– Change route selection
– Change server selection
2
Diagnosis Framework: Enterprise Network
[Figure: diagnosis loop for an enterprise network]
• Measure: network changes (full visibility)
• Diagnose
• Fix: equipment, configuration, etc. (full control)
3
Challenges to Minimize Wide-Area Disruptions
• The Internet is composed of many networks
– ISP (Internet Service Provider): provides connectivity
– CDN (Content Distribution Network): provides services
• Each network has limited visibility and control
[Figure: a client reaches the CDN through small ISPs and a large ISP]
4
ISP’s Challenge: Provide Good Transit for Packets
• Limited visibility
– The small ISPs lack visibility into the problem
• Limited control
– The large ISP lacks direct control to fix congestion
5
CDN’s Challenge: Maximize Performance for Services
• Limited visibility
– The CDN cannot pinpoint the exact root cause
• Limited control
– The CDN lacks direct control to fix the problem
6
Summary of Challenges of Wide-Area Diagnosis
• Measure: large volume and diverse kinds of data
• Diagnosis today is ad hoc
– It takes a long time to get back to customers
– It does not scale to a large number of events
Our Goal: Build Systems for Wide-Area Diagnosis
Formalize and automate the diagnosis process
Analyze a large volume of measurement data
7
Techniques and Tools for Wide-Area Diagnosis
Tool          Problem Statement                               Results
Route Oracle  Track route changes scalably for ISPs           Deployed at AT&T [IMC09, PER10]
NetDiag       Diagnose wide-area latency increases for CDNs   Deployed at Google; in submission
8
Rethink Routing Protocol Design
• Many performance problems caused by routing
– Route selection not based on performance
– 42.2% of the large latency increases in a large CDN correlated with inter-domain routing changes
– No support for multi-path routing
Our Goal: Routing Protocol for Better Performance
Fast convergence to reduce disruptions
Route selection based on performance
Scalable multi-path to avoid disruptions
Less complexity for fewer errors
9
Thesis Outline
Chapter       Problem Statement                               Results
Route Oracle  Track route changes scalably for ISPs           Deployed at AT&T [IMC09, PER10]
NetDiag       Diagnose wide-area latency increases for CDNs   Deployed at Google; in submission
Next-hop BGP  Routing protocol designed for better performance [HotNets10]; in submission to CoNEXT
10
Work with: Jennifer Rexford
Aman Shaikh and Subhabrata Sen
AT&T Research
Route Oracle: Where Have All the Packets Gone?
[Figure: an IP packet enters the AT&T IP network at the ingress router and leaves at the egress router, then follows the AS path toward the destination IP address]
• Inputs:
– Destination: IP Address
– When? Time
– Where? Ingress router
• Outputs:
– Where leaving the network? Egress router
– What’s the route to destination? AS path
12
Application: Service-Level Performance Management
• Troubleshoot a CDN throughput drop
• Case provided by the AT&T ICDS (Intelligent Content Distribution Service) project
[Figure: AT&T CDN server and router in Atlanta serving Atlanta users; candidate egress points "Leave AT&T in Atlanta" and "Leave AT&T in Washington DC" (toward Sprint)]
13
Background: IP Prefix and Prefix Nesting
• IP prefix: IP address / prefix length
– E.g., 12.0.0.0/8 covers [12.0.0.0, 12.255.255.255]
• Suppose the routing table has routes for these prefixes:
– 12.0.0.0/8: [12.0.0.0-12.255.255.255]
– 12.0.0.0/16: [12.0.0.0-12.0.255.255]
– [12.0.0.0-12.0.255.255] is covered by both the /8 and the /16
• Prefix nesting: IPs covered by multiple prefixes
– 24.2% of IP addresses are covered by more than one prefix
14
Background: Longest Prefix Match (LPM)
• BGP update format
– Keyed by IP prefix
– Carries the egress router and AS path
• Longest prefix match (LPM):
– Routers forward IP packets using the LPM
– The LPM changes as routes are announced and withdrawn
– 13.0% of BGP updates change the LPM for some address
Challenge: determine the route for an IP address
-> LPM for the IP address
-> track LPM changes for the IP address
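To make the nesting and LPM behavior above concrete, here is a small illustration using Python's standard ipaddress module (the egress names and AS paths are made up for the example):

```python
import ipaddress

# Routing table from the example: nested prefixes covering 12.0.0.5
table = {
    ipaddress.ip_network("12.0.0.0/8"): ("egress-NYC", "7018 3356"),
    ipaddress.ip_network("12.0.0.0/16"): ("egress-DC", "7018 1239"),
}

def lpm(ip, table):
    """Return the route of the most-specific (longest) matching prefix."""
    ip = ipaddress.ip_address(ip)
    matches = [p for p in table if ip in p]
    if not matches:
        return None
    return table[max(matches, key=lambda p: p.prefixlen)]

print(lpm("12.0.0.5", table))   # the /16 wins -> ('egress-DC', '7018 1239')
del table[ipaddress.ip_network("12.0.0.0/16")]  # a withdrawal changes the LPM
print(lpm("12.0.0.5", table))   # now the /8 route
```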
15
Challenge: Scale of the BGP Data
• Data collection: BGP Monitor
– Have BGP session with each router
– Receive incremental updates of best routes
• Data Scale
– Dozens of routers (one per city)
– Each router has many prefixes (~300K)
– Each router receives lots of updates (millions per day)
[Figure: BGP routers send their best routes to a software router, which feeds a centralized server]
16
Background: BGP Is an Incremental Protocol
• Incremental protocol
– Unchanged routes are not re-announced
• How to log routes for an incremental protocol?
– Routing table dump: daily
– Incremental updates: every 15 minutes
[Figure: BGP routers send best routes to the software router; the centralized server stores a daily table dump plus 15-minute update files]
17
Route Oracle: Interfaces and Challenges
• Challenges
– Track the longest prefix match
– Scale of the BGP data
– Need to answer queries
• At scale: for many IP addresses
• In real time: for network operations
Inputs: destination IP address, ingress router, time
BGP routing data → Route Oracle → outputs: egress router, AS path
18
• How to implement (strawman approach)
– Run routing software to maintain the forwarding table
– The forwarding table answers queries based on LPM
• Answering a query for one IP address
– Suppose n prefixes in the routing table at t1 and m updates from t1 to t2
– Time complexity: O(n+m)
– Space complexity: O(P), where P is the number of prefixes covering the queried IP address
19
• Answering queries for k IP addresses
– Keep all prefixes in the forwarding table
– Space complexity: O(n)
• Time complexity: major steps
– Initialize n routes: n*log(n) + k*n
– Process m updates: m*log(n) + k*m
– In total: (n+m)*(log(n)+k)
• Goal: reduce query processing time
– Trade more space for less time: pre-processing
– Storing pre-processed results for all 2^32 IPs does not scale
– Need to track LPM scalably
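A minimal sketch of this strawman, assuming routes live in a prefix-to-route dictionary and reusing the lpm() helper from the earlier sketch; it shows why each of the m updates costs O(k) extra work on top of the table maintenance:

```python
import ipaddress

def strawman(query_ips, initial_routes, updates):
    """initial_routes: {ip_network: route}; updates: (kind, ip_network, route).
    Replay the update stream once and, after every update, re-resolve the
    LPM for each queried IP covered by the updated prefix."""
    table = dict(initial_routes)
    answers = {ip: lpm(ip, table) for ip in query_ips}        # O(k*n) setup
    for kind, prefix, route in updates:                        # m updates
        if kind == "announce":
            table[prefix] = route
        else:                                                  # "withdraw"
            table.pop(prefix, None)
        for ip in query_ips:                                   # O(k) per update
            if ipaddress.ip_address(ip) in prefix:
                answers[ip] = lpm(ip, table)
    return answers
```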
20
• Prefix set
– The collection of all matching prefixes for a given IP address
• Address range
– Contiguous addresses that share the same prefix set
• E.g., with 12.0.0.0/8 and 12.0.0.0/16 in the routing table:
– [12.0.0.0-12.0.255.255] has prefix set {/8, /16}
– [12.1.0.0-12.255.255.255] has prefix set {/8}
• Benefits of address ranges
– Track LPM scalably
– No dependency between different address ranges
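A sketch of how address ranges could be derived from a set of prefixes: every prefix boundary starts a new range, and all addresses in a range share the same prefix set (the helper below is illustrative, not the Route Oracle code):

```python
import ipaddress

def address_ranges(prefixes):
    """Split the space covered by `prefixes` into contiguous ranges whose
    member addresses all match exactly the same set of prefixes."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    # Boundaries: first address of each prefix and first address after it.
    cuts = sorted({int(n.network_address) for n in nets} |
                  {int(n.broadcast_address) + 1 for n in nets})
    ranges = []
    for lo, hi in zip(cuts, cuts[1:]):
        covering = [n for n in nets
                    if int(n.network_address) <= lo and hi - 1 <= int(n.broadcast_address)]
        if covering:
            best = max(covering, key=lambda n: n.prefixlen)    # LPM for this range
            ranges.append((ipaddress.ip_address(lo), ipaddress.ip_address(hi - 1),
                           covering, best))
    return ranges

for lo, hi, pset, best in address_ranges(["12.0.0.0/8", "12.0.0.0/16"]):
    print(lo, "-", hi, [str(p) for p in pset], "LPM:", best)
# 12.0.0.0 - 12.0.255.255 ['12.0.0.0/8', '12.0.0.0/16'] LPM: 12.0.0.0/16
# 12.1.0.0 - 12.255.255.255 ['12.0.0.0/8'] LPM: 12.0.0.0/8
```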
21
Track LPM by Address Range:
Data Structure and Algorithm
• Tree-based data structure: node stands for address range
• Real-time algorithm for incoming updates
[Figure: routing table with prefixes 12.0.0.0/8, 12.0.0.0/16, and 12.0.0.0/24, and the tree of address ranges [12.0.0.0-12.0.0.255] (prefix set /8, /16, /24), [12.0.1.0-12.0.255.255] (/8, /16), and [12.1.0.0-12.255.255.255] (/8)]
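A toy sketch of the real-time idea (not the deployed implementation): it assumes the set of prefixes that can ever appear is known up front, so the range boundaries are fixed and an announce or withdraw only touches the ranges nested inside the updated prefix:

```python
import bisect, ipaddress

class RangeLPM:
    """Toy tracker of the LPM per address range. Assumes the prefixes that may
    ever appear are known up front, so range boundaries are fixed; announces
    and withdraws only toggle per-range membership."""
    def __init__(self, all_prefixes):
        nets = sorted((ipaddress.ip_network(p) for p in all_prefixes),
                      key=lambda n: int(n.network_address))
        self.starts = sorted({int(n.network_address) for n in nets} |
                             {int(n.broadcast_address) + 1 for n in nets})
        self.active = {s: {} for s in self.starts}   # range start -> {prefix: route}
        self.nets = {str(n): n for n in nets}

    def _ranges_in(self, net):
        """Starts of the address ranges nested inside `net`."""
        lo = bisect.bisect_left(self.starts, int(net.network_address))
        hi = bisect.bisect_right(self.starts, int(net.broadcast_address))
        return self.starts[lo:hi]

    def announce(self, prefix, route):
        net = self.nets[prefix]
        for s in self._ranges_in(net):
            self.active[s][net] = route

    def withdraw(self, prefix):
        net = self.nets[prefix]
        for s in self._ranges_in(net):
            self.active[s].pop(net, None)

    def lookup(self, ip):
        idx = bisect.bisect_right(self.starts, int(ipaddress.ip_address(ip))) - 1
        if idx < 0:
            return None                              # below every known range
        routes = self.active[self.starts[idx]]
        if not routes:
            return None
        return routes[max(routes, key=lambda n: n.prefixlen)]

r = RangeLPM(["12.0.0.0/8", "12.0.0.0/16"])
r.announce("12.0.0.0/8", "via egress A")             # route strings are placeholders
r.announce("12.0.0.0/16", "via egress B")
print(r.lookup("12.0.0.5"))                          # via egress B
r.withdraw("12.0.0.0/16")
print(r.lookup("12.0.0.5"))                          # via egress A
```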
22
• Pre-processing
– For n initial routes in the routing table and m updates
– Time complexity: O((n+m)*log(n))
– Space complexity: O(n+m)
• Query processing: for k queries
– Time complexity: O((n+m)*k)
– Parallelization using c processors: O((n+m)*k/c)

                        Strawman approach        Route Oracle
Space complexity        O(n)                     O(n+m)
Pre-processing time     n/a                      O((n+m)*log(n))
Query time              O((n+m)*(log(n)+k))      O((n+m)*k)
Query parallelization   No                       Yes
23
Route Oracle: System Implementation
BGP routing data (daily table dump, 15-minute updates)
→ Precomputation
→ Daily snapshot of routes by address range + incremental route updates per address range
→ Query processing (query inputs: destination IP, ingress router, time)
→ Output for each query: egress router, AS path
24
Query Processing: Optimizations
• Optimize for multiple queries
– Amortize the cost of reading address-range records across multiple queried IP addresses
• Parallelization
– Observation: address-range records can be processed independently
– Parallelize across cores on a multi-core machine
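A sketch of these two optimizations using Python's multiprocessing module; record and function names are illustrative. Each worker scans its share of address-range records once and answers all queried IPs that fall inside them, which amortizes the record reads:

```python
from multiprocessing import Pool

def scan_chunk(records, query_ips):
    """records: (lo, hi, route_history) with integer address bounds;
    query_ips: integer addresses. Answer every query covered by the chunk."""
    out = {}
    for lo, hi, route_history in records:        # one record per address range
        for ip in query_ips:
            if lo <= ip <= hi:
                out[ip] = route_history          # e.g. (egress, AS path) over time
    return out

def answer_queries(range_records, query_ips, num_workers=8):
    """Partition the address-range records across workers and merge results."""
    chunks = [range_records[i::num_workers] for i in range(num_workers)]
    with Pool(num_workers) as pool:
        partial = pool.starmap(scan_chunk, [(c, query_ips) for c in chunks])
    merged = {}
    for p in partial:
        merged.update(p)
    return merged
```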
25
Performance Evaluation: Pre-processing
• Experiment on SMP server
– Two quad-core Xeon X5460 Processors
– Each CPU: 3.16 GHz and 6 MB cache
– 16 GB of RAM
• Experiment design
– Batch the BGP updates received over fixed time intervals
– Measure the pre-processing time for each batch of updates
• Can we keep up? Pre-processing time:
– 5 minutes of updates: ~2 seconds
– 20 minutes of updates: ~5 seconds
26
Performance Evaluation: Query Processing
• Query for one IP address (duration: 1 day)
– Route Oracle: 3-3.5 seconds; strawman approach: minutes
• Queries for many IPs: scalability (duration: 1 hour)
27
Performance Evaluation: Query Parallelization
28
Conclusion
Challenge 1: prefix nesting and LPM changes
→ Contribution: introduce the "address range"; track LPM changes scalably for many IPs
Challenge 2: scale of the BGP data
→ Contribution: tree-based data structure; real-time algorithm for incoming updates
Challenge 3: answer queries at scale and in real time
→ Contribution: pre-processing (more space for less time); amortize processing across multiple queries; parallelize query processing
29
Work with: Jennifer Rexford
Benjamin Helsley, Aspi Siganporia, and Sridhar Srinivasan
Google Inc.
Background: CDN Architecture
• Life of a client request
• Front-end (FE) server selection
• Latency map
• Load balancing (LB)
[Figure: a client request enters the CDN network at an ingress router, is served by a front-end server (FE), and the response leaves at an egress router along the AS path back to the client]
31
Challenges
• Many factors contribute to latency increase
– Internal factors
– External factors
• Separate cause from effect
– e.g., FE changes lead to ingress/egress changes
• The scale of a large CDN
– Hundreds of millions of users, grouped by ISP/Geo
– Clients served at multiple FEs
– Clients traverse multiple ingress/egress routers
32
Contributions
• Classification:
– Separating cause from effect
– Identify threshold for classification
• Metrics: analyze over sets of servers and routers
– Metrics for each potential cause
– Metrics by an individual router or server
• Characterization:
– Events of latency increases in Google’s CDN (06/2010)
33
Background: Client Performance Data
[Figure: performance data collected for client requests served by a front-end server (FE) across the CDN's ingress and egress routers and AS path]
Performance Data Format:
IP prefix, FE, Requests Per Day (RPD), Round-Trip Time (RTT)
34
Background: BGP Routing and Netflow Traffic
• Netflow traffic (at edge routers): every 15 minutes, by prefix
– Incoming traffic: ingress router, FE, bytes-in
– Outgoing traffic: egress router, FE, bytes-out
• BGP routing (at edge routers): every 15 minutes, by prefix
– Egress router and AS path
35
Background: Joint Data Set
• Granularity
– Daily
– By IP prefix
• Format
– FE, requests per day (RPD), round-trip time (RTT)
– List of {ingress router, bytes-in}
– List of {egress router, AS path, bytes-out}
[Figure: BGP routing data, Netflow traffic data, and performance data are combined into the joint data set]
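For concreteness, one joint record might look like the dictionary below; the field names and values are illustrative, not the actual schema:

```python
joint_record = {
    "date": "2010-06-15",
    "prefix": "203.0.113.0/24",
    "fe": {"fe_server": "FE-1", "rpd": 120000, "rtt_ms": 85.0},
    "ingress": [{"router": "IR-3", "bytes_in": 4.2e9}],
    "egress": [{"router": "ER-7", "as_path": "64500 64501", "bytes_out": 9.8e9}],
}
```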
36
Classification of Latency Increases
Decision tree for classifying a latency increase:
• Performance data → group by region → identify events
• FE change vs. FE latency increase
– FE changes → latency-map change vs. load balancing (inputs: latency map, FE capacity and demand)
– FE latency increase → routing changes: ingress router vs. egress router, AS path (inputs: BGP routing, Netflow traffic)
37
Case Study: Flash Crowd Leads Some Requests to a Distant Front-End Server
• Identify event: RTT doubled for an ISP in Malaysia
• Diagnose: follow the decision tree
– FE change vs. FE latency increase: 97.9% of the increase explained by FE changes
– Latency-map change vs. load balancing: 32.3% of requests changed FE due to load balancing
– RPD (requests per day) jumped: RPD2/RPD1 = 2.5
38
Classification: FE Server and Latency Metrics
[Decision tree from the classification overview; this step: FE change vs. FE latency increase]
39
FE Change vs. FE Latency Increase
• RTT: weighted by requests across FEs

  RTT = Σ_i RTT_i · (RPD_i / RPD)

  ΔRTT = Σ_i ( RTT_{2,i} · RPD_{2,i}/RPD_2 − RTT_{1,i} · RPD_{1,i}/RPD_1 )

• FE change: ΔFE
• FE latency change: ΔLat
– Clients use the same FE, but the latency to that FE increases
40
FE Change vs. FE Latency Change Breakdown
• FE change (share of ΔRTT explained by traffic shifting among FEs):

  ΔFE = [ Σ_i RTT_{2,i} · ( RPD_{2,i}/RPD_2 − RPD_{1,i}/RPD_1 ) ] / ΔRTT

• FE latency change (share explained by latency changes at the same FE):

  ΔLat = [ Σ_i ( RTT_{2,i} − RTT_{1,i} ) · RPD_{1,i}/RPD_1 ] / ΔRTT

• Important property: the two shares sum to 1

  ΔFE + ΔLat = 1
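A sketch of this breakdown in code, assuming per-FE request counts and RTTs for the two days are available as dictionaries keyed by FE:

```python
def fe_vs_latency_breakdown(day1, day2):
    """day1, day2: {fe: (rpd, rtt)}. Return (delta_fe, delta_lat): the shares
    of the overall RTT change attributed to FE shifts and to per-FE latency
    changes. By construction the two shares sum to 1."""
    rpd1 = sum(rpd for rpd, _ in day1.values())
    rpd2 = sum(rpd for rpd, _ in day2.values())
    fes = set(day1) | set(day2)
    r1 = {f: day1.get(f, (0, 0.0))[0] / rpd1 for f in fes}   # request fractions, day 1
    r2 = {f: day2.get(f, (0, 0.0))[0] / rpd2 for f in fes}   # request fractions, day 2
    rtt1 = {f: day1.get(f, (0, 0.0))[1] for f in fes}
    rtt2 = {f: day2.get(f, (0, 0.0))[1] for f in fes}
    delta_rtt = sum(rtt2[f] * r2[f] - rtt1[f] * r1[f] for f in fes)
    delta_fe = sum(rtt2[f] * (r2[f] - r1[f]) for f in fes) / delta_rtt
    delta_lat = sum((rtt2[f] - rtt1[f]) * r1[f] for f in fes) / delta_rtt
    return delta_fe, delta_lat
```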
41
FE Changes: Latency Map vs. Load Balancing
[Decision tree from the classification overview; this step: latency-map change vs. load balancing]
42
FE Changes: Latency Map vs. Load Balancing
• Classify FE changes by two metrics:
– Fraction of traffic shifted because the latency map changed
– Fraction of traffic shifted by load balancing
[Decision-tree excerpt: FE changes → latency-map change vs. load balancing]
43
Latency Map: Closest FE Server
• Calculate the latency map
– Latency map format: (prefix, closest FE)
– Aggregate by groups of clients: a list of (FE_i, r_i), where r_i is the fraction of requests directed to FE_i
• Define the latency-map metric:

  ΔLatMap = Σ_i | r_{2,i} − r_{1,i} | / 2
44
Load Balancing: Avoiding Busy Servers
• FE request distribution change:

  ΔFEDist = Σ_i | RPD_{2,i}/RPD_2 − RPD_{1,i}/RPD_1 | / 2

• Fraction of requests shifted by the load balancer:

  LoadBalance_1 = Σ_i [ r_{1,i} − RPD_{1,i}/RPD_1 ]^+

– Sum only the positive terms: target request load > actual load
• Metric: more traffic load-balanced on day 2

  ΔLoadBal = LoadBalance_2 − LoadBalance_1
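A sketch of these metrics, assuming request fractions per FE are available for both days; the same half-L1 shift is also what the later ingress and egress/AS-path metrics compute:

```python
def half_l1_shift(frac1, frac2):
    """0.5 * sum_i |frac2[i] - frac1[i]|: fraction of traffic that shifted
    between two distributions (used for ΔLatMap, ΔFEDist, ΔIngress, ...)."""
    keys = set(frac1) | set(frac2)
    return 0.5 * sum(abs(frac2.get(k, 0.0) - frac1.get(k, 0.0)) for k in keys)

def load_balance_metric(latmap_frac, actual_frac):
    """Fraction of requests moved away from their latency-map FE by the load
    balancer: sum of (target - actual) over FEs where target exceeds actual."""
    keys = set(latmap_frac) | set(actual_frac)
    return sum(max(latmap_frac.get(k, 0.0) - actual_frac.get(k, 0.0), 0.0)
               for k in keys)

# ΔLoadBal = load_balance_metric(latmap2, rpd_frac2) - load_balance_metric(latmap1, rpd_frac1)
```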
45
FE Latency Increase: Routing Changes
• Correlate with routing changes:
– Fraction of traffic that shifted ingress router
– Fraction of traffic that shifted egress router or AS path
[Decision-tree excerpt: FE latency increase → routing changes: ingress router vs. egress router, AS path]
46
Routing Changes: Ingress, Egress, AS Path
• Identify the FE with the largest impact:

  ΔLat_i = ( RTT_{2,i} − RTT_{1,i} ) · RPD_{1,i}/RPD_1 / ΔRTT

• Calculate the fraction of traffic that shifted routes:

  ΔIngress = Σ_j | f_{2,j} − f_{1,j} | / 2
  – f_{1,j}, f_{2,j}: fraction of traffic entering ingress router j on days 1 and 2

  ΔEgressASPath = Σ_k | g_{2,k} − g_{1,k} | / 2
  – g_{1,k}, g_{2,k}: fraction of traffic leaving on (egress router, AS path) k on days 1 and 2
47
Identify Significant Performance Disruptions
[Decision tree from the classification overview; this step: identify events from performance data grouped by region]
48
Identify Significant Performance Disruptions
• Focus on large events
– Large increases: RTT grows by >= 100 msec, or more than doubles
– Many clients: affects an entire region (country/ISP)
– Sustained period: lasts an entire day
• Characterize latency changes
– Calculate daily latency changes by region

Event category                            Percentage
Latency increases by more than 100 msec   1%
Latency more than doubles                 0.45%
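A sketch of the event filter implied by these thresholds, assuming daily region-level RTT aggregates in milliseconds:

```python
def is_significant_event(rtt_day1_ms, rtt_day2_ms):
    """Flag a region/day pair as an event if daily RTT rose by >= 100 ms
    or at least doubled."""
    increase = rtt_day2_ms - rtt_day1_ms
    return increase >= 100.0 or rtt_day2_ms >= 2.0 * rtt_day1_ms
```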
49
Latency Characterization for Google’s CDN
• Apply the classification to one month of data (06/2010)
Category                        % Events
FE latency increase             73.9%
  Ingress router                10.3%
  (Egress router, AS path)      14.5%
  Both                          17.4%
  Unknown                       31.5%
FE server change                34.7%
  Latency map                   14.2%
  Load balancing                2.9%
  Both                          9.3%
  Unknown                       8.4%
Total                           100.0%
50
Conclusion and Future Work
• Conclusion
– Method for automatic classification of latency increases
– Tool deployed at Google since 08/2010
• Future work
– More accurate diagnosis on smaller timescale
– Incorporate active measurement data
51
Work with: Michael Schapira, Jennifer Rexford
Princeton University
Motivation: Rethink BGP Protocol Design
• Many performance problems caused by routing
– Slow convergence during path exploration
– Path selection based on AS path length, not performance
– Selecting a single path, rather than multiple
– Vulnerability to attacks that forge the AS-PATH
Many of these problems stem from routing decisions based on AS-path length rather than performance
53
• Control plane: path-based routing -> next-hop routing
– Fast convergence through less path exploration
– Scalable multipath routing without exposing all paths
• Data plane: performance and security
– Path selection based on performance
– Reduced attack surface: forging the AS-PATH no longer attracts traffic
54
Today’s BGP: Path-Based Routing
[Figure: destination d announces "1, 2, I'm available"; AS 1 tells AS 3 "I'm using 1d"; AS 3 ranks 32d > 31d; AS 2's policy: don't export 2d to 3]
55
Background: BGP Decision Process
• Import policy
• Decision process
– Prefer higher local preference
– Prefer shorter AS path length
– etc.
• Export policy
Receive route updates from neighbors → choose a single "best" route (ranking) → send route updates to neighbors (export policy)
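As a reminder of what the ranking step does today, here is a minimal sketch of path-based selection (local preference, then AS-path length, then a deterministic tie-break); it is the piece the next-hop rules replace:

```python
def choose_best_route(routes):
    """routes: list of dicts with 'local_pref', 'as_path' (list of ASNs),
    and 'neighbor'. Return the single best route per BGP-style ranking."""
    return min(routes, key=lambda r: (-r["local_pref"],   # higher local pref first
                                      len(r["as_path"]),  # then shorter AS path
                                      r["neighbor"]))     # then a deterministic tie-break
```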
56
Next-hop Routing Rules
• Rule 1: use next-hop rankings
[Figure: AS 5 ranks its next hops 4 > 3, rather than ranking full paths such as 541d > 53d > 542d]
57
Next-hop Routing Rules
• Rule 1: use next-hop rankings
• Rule 2: prioritize current route
– To minimize path exploration
[Figure: AS 1 ranks next hops 2 = 3; with "prioritize current route" it keeps its current next hop, rather than always breaking the tie in favor of the lower AS number]
58
Next-hop Routing Rules
• Rule 1: use next-hop rankings
• Rule 2: prioritize current route
• Rule 3: consistently export:
– If a route P is exportable to a neighbor AS i, then so must be all routes that are more highly ranked than P.
– To avoid disconnecting upstream nodes
[Figure: AS 3 ranks next hops 1 > 2; exporting 32d but not the more highly ranked 31d to AS 4 would violate consistent export]
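A sketch of how the three rules could fit together, assuming each AS keeps a ranking over next-hop ASes; this is one interpretation of the rules, not a reference implementation:

```python
def choose_next_hop(candidates, ranking, current_next_hop):
    """candidates: next-hop ASes currently offering a route.
    ranking: dict next_hop -> rank, lower is better (Rule 1).
    Ties go to the current next hop (Rule 2), then to the lower AS number."""
    def key(nh):
        return (ranking[nh], 0 if nh == current_next_hop else 1, nh)
    return min(candidates, key=key)

def consistent_export_set(ranking, export_ok, neighbor):
    """Rule 3 (consistent export): if a route via next hop P is exportable to
    `neighbor`, every more highly ranked next hop must be exportable as well.
    One way to enforce this: export only the top of the ranking, stopping at
    the first next hop the raw policy would not export."""
    result = []
    for nh in sorted(ranking, key=lambda nh: ranking[nh]):
        if not export_ok(nh, neighbor):
            break
        result.append(nh)
    return result
```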
59
• Control plane
– Fast convergence
– Scalable multipath routing
• Data plane
– Performance-driven routing
– Reduced attack surface
60
Simulation Setup
• C-BGP simulator with the Cyclops AS-level topology
– Jan 1st, 2010: 34.0k ASes, 4.7k non-stubs
• Protocols
– BGP, Prefer Recent Route (PRR), Next-hop routing
• Metrics
– # updates, # routing changes, # forwarding changes
• Events:
– prefix up, link failure, link recovery
• Methodology:
– 500 experiments
– Vantage points: all non-stubs and 5k randomly chosen stubs
61
• X-axis: # updates after a link failure
• Y-axis: Fraction of non-stubs with more than x updates
62
• X-axis: # routing changes after a link failure
• Y-axis: Fraction of non-stubs with more than x changes
63
• Control plane
– Fast convergence
– Scalable multipath routing
• Data plane
– Performance-driven routing
– Reduced attack surface
64
Multipath with Today’s BGP: Not Scalable
[Figure: AS 5 announces "I'm using 1 and 2", AS 6 announces "I'm using 3 and 4", so AS 7 announces "I'm using 5-1, 5-2, 6-3 and 6-4"; the set of advertised paths grows with each hop]
65
Making Multipath Routing Scalable
• Benefits: availability, failure recovery, load balancing
[Figure: with next-hop routing, AS 5 announces "I'm using {1,2}", AS 6 announces "I'm using {3,4}", and AS 7 announces only the set of next-hop ASes "I'm using {1,2,3,4,5,6}"]
66
• Control plane
– Fast convergence
– Scalable multipath routing
• Data plane
– Performance-driven routing
– Reduced attack surface
67
• Next-hop routing can lead to longer paths
– Evaluated across events: prefix up, link failure/recovery
– 68.7%-89.9% of ASes have the same path length
– Most other ASes see one extra hop
• Decision based on measurements of path quality
– Performance metrics: throughput, latency, or loss
– Adjust ranking of next-hop ASes
– Split traffic over multiple next-hop ASes
68
Monitoring Path Performance: Multi-homed Stub
• Apply existing techniques
– Intelligent route control: supported by routers
• Collect performance measurements
– A stub AS sees traffic in both directions: forward and reverse
[Figure: a multi-homed stub AS connected to Provider A and Provider B]
69
Monitoring Path Performance: Service Provider
• Monitor end-to-end performance for clients
– Collect logs at servers: e.g. round-trip time
• Explore alternate routes: route injection
– Announce more-specific prefix
– Direct a small portion of traffic on alternate path
• Active probing on alternate paths
70
Monitoring Path Performance: ISPs
• Challenges
– Most traffic does not start or end in ISP network
– Asymmetric routing
• Focus on single-homed customers
– Why single-homed?
• See both directions of the traffic
– How to collect passive flow measurement selectively?
• Hash-based sampling
71
• Control plane
– Fast convergence
– Scalable multipath routing
• Data plane
– Performance-driven routing
– Reduced attack surface
72
• Reduced attack surface
– Attack: announce shorter path to attract traffic
– Next-hop routing: AS path not used, cannot be forged!
• Incentive compatible
– Definition: AS cannot get a better next-hop by deviating from protocol (e.g. announce bogus route, report inconsistent information to neighbors)
– Theorem [1]: ASes do not have incentives to violate the next-hop routing protocol
• End-to-end security mechanisms:
– Do not rely on BGP for data-plane security
– Use encryption, authentication, etc.
[1] J. Feigenbaum, V. Ramachandran, and M. Schapira, “Incentive-compatible interdomain routing”, in Proc. ACM Electronic Commerce, pp. 130-139, 2006.
73
• Next-hop routing for better performance
– Control plane: fast convergence, scalable multipath
– Data plane: performance-driven routing, reduced attack surface
• Future work
– Remove the AS path attribute entirely
– Stability and efficiency of performance-driven routing
74
Conclusion
Chapter: Route Oracle
– Contributions: analysis of prefix nesting and LPM changes; track LPM scalably by address range; system implementation with optimizations
– Results: deployed at AT&T [IMC09, PER10]
Chapter: NetDiag
– Contributions: classification of the causes of latency increases; metrics to analyze sets of servers and routers; latency characterization for Google's CDN
– Results: deployed at Google; in submission
Chapter: Next-hop BGP
– Contributions: proposal of a BGP variant based on next-hop routing; evaluation of convergence; scalable multi-path, performance-driven routing
– Results: [HotNets10]; in submission to CoNEXT
75