An Introduction to Interdomain Routing and BGP Timothy G. Griffin

advertisement
An Introduction to
Interdomain Routing and
BGP
Timothy G. Griffin
griffin@research.att.com
http://www.research.att.com/~griffin/interdomain.html
SIGCOMM 2001 Tutorial Session
August 28, 2001
1
Acknowledgements
Thanks to Jay Borkenhagen, Randy Bush,
Anja Feldmann, Matt Grossglauser,
Madan Musuvathi, Jennifer Rexford,
Shubho Sen, and Jia Wang for many
helpful comments
Errors are my own
My opinions should not be taken to
represent AT&T policy
Common View of the Telco Network
Brick
Common View of the IP Network (Layer 3)
What This Tutorial Is About
routing
Goal
Understand how layer 3 connectivity
is maintained in the global Internet
This tutorial will not say much about
the applications that exploit this connectivity.
It will be restricted to IPv4 unicast routing.
• Part I : The basics of interdomain
routing and BGP
• Part II : BGP in practice: Issues of
Scale
Outline Part I
• Forwarding vs. Routing
• IP addressing
• Autonomous Systems (basic units of
interdomain routing)
• The Border Gateway Protocol (BGP)
–
–
–
–
BGP fundamentals
BGP route attributes
Implementing policy with BGP
A wee bit of theory
Outline Part II
• Scaling internal BGP
• BGP table growth
– Address aggregation vs. Multihoming
• Growth in number of autonomous
systems
• Dynamics of BGP
– Route flapping
– BGP convergence
– Rates of BGP updates
Best Effort Connectivity
IP traffic
135.207.49.8
192.0.2.153
This is the fundamental service provided
by Internet Service Providers (ISPs)
All other IP services depend on connectivity:
DNS, email, VPNs, Web Hosting, …
Routing vs. Forwarding
Net
Default to
upstream
router
A
B
R
R2
R
D
R3 R
R1
R4
A
B
C
D
E
default
Nxt Hop
R1
Direct
R3
R1
R3
R1
C
R5
Net
E
Forwarding: determine next hop
Routing: establish end-to-end paths
Forwarding always works
A
B
C
D
E
default
Routing can be badly broken
Net
A
B
C
D
E
default
Nxt Hop
R2
R2
Direct
R5
R5
R2
Nxt Hop
R4
R3
R3
R4
Direct
R4
10
How Are Forwarding Tables
Populated to implement Routing?
Statically
Administrator
manually configures
forwarding table entries
+ More control
+ Not restricted to
destination-based
forwarding
- Doesn’t scale
- Slow to adapt to
network failures
Dynamically
Routers exchange network reachability
information using ROUTING PROTOCOLS.
Routers use this to compute best routes
+ Can rapidly adapt to changes
in network topology
+ Can be made to scale well
- Complex distributed algorithms
- Consume CPU, Bandwidth, Memory
- Debugging can be difficult
- Current protocols are destination-based
In practice : a mix of these.
Static routing mostly at the “edge”
11
Routers Talking to Routers
Routing info
Routing info
• Routing computation is distributed among routers within a
routing domain
• Computation of best next hop based on routing information
is the most CPU/memory intensive task on a router
• Routing messages are usually not routed, but exchanged
via layer 2 between physically adjacent routers (internal
BGP and multi-hop external BGP are exceptions)
Before We Go Any Further …
IP ROUTING PROTOCOLS DO NOT
DYNAMICALLY ROUTE AROUND
NETWORK CONGESTION
• IP traffic can be very bursty
• Dynamic adjustments in routing typically
operate more slowly than fluctuations in
traffic load
• Dynamically adapting routing to account
for traffic load can lead to wild, unstable
oscillations of routing system
Autonomous Routing Domains
A collection of physical networks glued together
using IP, that have a unified administrative
routing policy.
•
•
•
•
Campus networks
Corporate networks
ISP Internal networks
…
Autonomous Systems (ASes)
An autonomous system is an autonomous routing domain
that has been assigned an Autonomous System Number (ASN).
… the administration of an AS appears to other ASes to
have a single coherent interior routing plan and presents a
consistent picture of what networks are reachable through it.
RFC 1930: Guidelines for creation, selection,
and registration of an Autonomous System
AS Numbers (ASNs)
ASNs are 16 bit values.
64512 through 65535 are “private”
•
•
•
•
•
•
•
•
Currently over 11,000 in use.
Genuity: 1
MIT: 3
Harvard: 11
UC San Diego: 7377
AT&T: 7018, 6341, 5074, …
UUNET: 701, 702, 284, 12199, …
Sprint: 1239, 1240, 6211, 6242, …
…
ASNs represent units of routing policy
Architecture of Dynamic Routing
OSPF
BGP
AS 1
IGP = Interior Gateway Protocol
Metric based: OSPF, IS-IS, RIP,
EIGRP (cisco)
EGP = Exterior Gateway Protocol
EIGRP
AS 2
Policy based: BGP
The Routing Domain of BGP is the entire Internet
Technology of Distributed Routing
Link State
•
•
•
•
•
•
Topology information is
flooded within the routing
domain
Best end-to-end paths are
computed locally at each
router.
Best end-to-end paths
determine next-hops.
Based on minimizing
some notion of distance
Works only if policy is
shared and uniform
Examples: OSPF, IS-IS
Vectoring
•
•
•
•
•
•
Each router knows little
about network topology
Only best next-hops are
chosen by each router for
each destination network.
Best end-to-end paths
result from composition
of all next-hop choices
Does not require any
notion of distance
Does not require uniform
policies at all routers
Examples: RIP, BGP
The Gang of Four
Link State
IGP
EGP
OSPF
IS-IS
Vectoring
RIP
BGP
Many Routing Processes Can Run on a Single Router
BGP
RIP Process
BGP Process
RIP Routing tables
BGP Routing tables
OSPF Process
OSPF Routing tables
RIP
Domain
OS kernel
OSPF
Domain
Forwarding Table Manager
Forwarding Table
20
IPv4 Addresses are 32 Bit Values
11111111 00010001 10000111 00000000
255
17
134
0
255.17.134.0
Dotted quad notation
IPv6 addresses have 128 bits
21
Classful Addresses
Class A
0nnnnnnn hhhhhhhh hhhhhhhh hhhhhhhh
Class B
10nnnnnn nnnnnnnn hhhhhhhh hhhhhhhh
Class C
110nnnnn nnnnnnnn nnnnnnnn hhhhhhhh
n = network address bit
h = host identifier bit
Leads to a rigid, flat, inefficient use of address space …
22
RFC 1519: Classless Inter-Domain
Routing (CIDR)
Use two 32 bit numbers to represent a network.
Network number = IP address + Mask
IP Address : 12.4.0.0
Address
Mask
IP Mask: 255.254.0.0
00001100 00000100 00000000 00000000
11111111 11111110 00000000 00000000
Network Prefix
for hosts
Usually written as 12.4.0.0/15
23
Which IP Addresses are Covered by a Prefix?
12.5.9.16 is covered by prefix 12.4.0.0/15
12.5.9.16
00001100 00000101 00001001 00010000
00001100 00000100 00000000 00000000
12.4.0.0/15
11111111 11111110 00000000 00000000
12.7.9.16
00001100 00000111 00001001 00010000
12.7.9.16 is not covered by prefix 12.4.0.0/15
CIDR = Hierarchy in Addressing
12.0.0.0/16
12.1.0.0/16
12.2.0.0/16
12.3.0.0/16
12.0.0.0/8
:
:
:
12.253.0.0/16
12.254.0.0/16
12.3.0.0/24
12.3.1.0/24
:
:
:
:
:
12.3.254.0/24
12.253.0.0/19
12.253.32.0/19
12.253.64.0/19
12.253.96.0/19
12.253.128.0/19
12.253.160.0/19
12.253.192.0/19
25
Classless Forwarding
Destination =12.5.9.16
------------------------------payload
Prefix
OK
better
Next Hop
Interface
0.0.0.0/0
10.14.11.33
ATM 5/0/9
12.0.0.0/8
10.14.22.19
ATM 5/0/8
even better
12.4.0.0/15 10.1.3.77
Ethernet 0/1/3
best!
12.5.8.0/23 attached
Serial 1/0/7
IP Forwarding Table
IP Address Allocation and
Assignment: Internet Registries
IANA
www.iana.org
ARIN
www.arin.org
RIPE
www.ripe.org
APNIC
www.apnic.org
Allocate to National and local registries and ISPs
Addresses assigned to customers by ISPs
RFC 2050 - Internet Registry IP Allocation Guidelines
RFC 1918 - Address Allocation for Private Internets
RFC 1518 - An Architecture for IP Address Allocation with CIDR
Nontransit vs. Transit ASes
ISP 2
ISP 1
NET A
Traffic NEVER
flows from ISP 1
through NET A to ISP 2
(At least not intentionally!)
IP traffic
Internet Service
providers (often)
have transit
networks
Nontransit AS
might be a corporate
or campus network.
Could be a “content
provider”
28
Selective Transit
NET B
NET A DOES NOT
provide transit
Between NET D
and NET B
NET C
NET A
NET A provides transit
between NET B and NET C
and between NET D
and NET C
NET D
Most transit networks transit in a selective manner…
IP traffic
29
Customers and Providers
provider
provider
customer
IP traffic
customer
Customer pays provider for access to the Internet
Customers Don’t Always Need BGP
provider
Nail up routes 192.0.2.0/24
pointing to customer
Nail up default routes 0.0.0.0/0
pointing to provider.
customer
192.0.2.0/24
Static routing is the most common way of connecting an
autonomous routing domain to the Internet.
This helps explain why BGP is a mystery to many …
Customer-Provider Hierarchy
provider
customer
IP traffic
The Peering Relationship
peer
provider
peer
customer
Peers provide transit between
their respective customers
Peers do not provide transit
between peers
traffic
allowed
traffic NOT
allowed
Peers (often) do not exchange $$$
Peering Provides Shortcuts
Peering also allows connectivity between
the customers of “Tier 1” providers.
peer
provider
peer
customer
Peering Wars
Peer
• Reduces upstream
transit costs
• Can increase end-toend performance
• May be the only way to
connect your
customers to some
part of the Internet
(“Tier 1”)
Don’t Peer
• You would rather have
customers
• Peers are usually your
competition
• Peering relationships
may require periodic
renegotiation
Peering struggles are by far the most
contentious issues in the ISP world!
Peering agreements are often confidential.
BGP-4
• BGP = Border Gateway Protocol
• Is a Policy-Based routing protocol
• Is the de facto EGP of today’s global Internet
• Relatively simple protocol, but configuration is complex and the
entire world can see, and be impacted by, your mistakes.
•
1989 : BGP-1 [RFC 1105]
–
•
Replacement for EGP (1984, RFC 904)
1990 : BGP-2 [RFC 1163]
• 1991 : BGP-3 [RFC 1267]
•
1995 : BGP-4 [RFC 1771]
–
Support for Classless Interdomain Routing (CIDR)
36
BGP Operations (Simplified)
Establish session on
TCP port 179
AS1
BGP session
Exchange all
active routes
AS2
Exchange incremental
updates
While connection
is ALIVE exchange
route UPDATE messages
37
Four Types of BGP Messages
• Open : Establish a peering session.
• Keep Alive : Handshake at regular intervals.
• Notification : Shuts down a peering session.
• Update : Announcing new routes or withdrawing
previously announced routes.
announcement
=
prefix + attributes values
38
BGP Attributes
Value
----1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
...
255
Code
--------------------------------ORIGIN
AS_PATH
NEXT_HOP
MULTI_EXIT_DISC
LOCAL_PREF
ATOMIC_AGGREGATE
AGGREGATOR
COMMUNITY
ORIGINATOR_ID
CLUSTER_LIST
DPA
ADVERTISER
RCID_PATH / CLUSTER_ID
MP_REACH_NLRI
MP_UNREACH_NLRI
EXTENDED COMMUNITIES
Reference
--------[RFC1771]
[RFC1771]
[RFC1771]
[RFC1771]
[RFC1771]
[RFC1771]
[RFC1771]
[RFC1997]
[RFC2796]
[RFC2796]
[Chen]
[RFC1863]
[RFC1863]
[RFC2283]
[RFC2283]
[Rosen]
This
tutorial
will cover
these
attributes
reserved for development
From IANA: http://www.iana.org/assignments/bgp-parameters
Not all attributes
need to be present in
every announcement
Attributes are Used to Select
Best Routes
192.0.2.0/24
pick me!
192.0.2.0/24
pick me!
192.0.2.0/24
pick me!
192.0.2.0/24
pick me!
Given multiple
routes to the same
prefix, a BGP speaker
must pick at most
one best route
(Note: it could reject
them all!)
Two Types of BGP Neighbor Relationships
AS1
• External Neighbor (eBGP) in a
different Autonomous Systems
• Internal Neighbor (iBGP) in the
same Autonomous System
iBGP is routed (using IGP!)
eBGP
iBGP
AS2
41
iBGP Peers Must be Fully Meshed
eBGP update
iBGP updates
iBGP neighbors do not announce
routes received via iBGP to other iBGP
neighbors.
• iBGP is needed to
avoid routing loops
within an AS
• Injecting external
routes into IGP
does not scale and
causes BGP policy
information to be
lost
• BGP does not
provide “shortest
path” routing
• Is iBGP an IGP?
NO!
42
BGP Next Hop Attribute
12.125.133.90
AS 7018
12.127.0.121
AT&T
AS 12654
AS 6431
RIPE NCC
RIS project
AT&T Research
135.207.0.0/16
Next Hop = 12.125.133.90
135.207.0.0/16
Next Hop = 12.127.0.121
Every time a route announcement crosses an AS
boundary, the Next Hop attribute is changed to the IP
address of the border router that announced the route.
43
Join EGP with IGP For Connectivity
135.207.0.0/16
Next Hop = 192.0.2.1
135.207.0.0/16
10.10.10.10
Forwarding Table
destination
next hop
192.0.2.0/30
AS 1
192.0.2.1
192.0.2.0/30
10.10.10.10
Forwarding Table
destination
next hop
+
EGP
destination
next hop
135.207.0.0/16
192.0.2.1
135.207.0.0/16
192.0.2.0/30
10.10.10.10
10.10.10.10
AS 2
Next Hop Often Rewritten to Loopback
135.207.0.0/16
Next Hop = 127.22.33.44
135.207.0.0/16
Next Hop = 192.0.2.1
135.207.0.0/16
10.10.10.10
Forwarding Table
destination
next hop
127.22.33.44
AS 1
192.0.2.1
127.22.33.44
10.10.10.10
Forwarding Table
destination
next hop
+
EGP
destination
next hop
135.207.0.0/16
127.22.33.44
135.207.0.0/16
127.22.33.44
10.10.10.10
10.10.10.10
AS 2
Implementing Customer/Provider
and Peer/Peer relationships
Two parts:
• Enforce transit relationships
– Outbound route filtering
• Enforce order of route preference
– provider < peer < customer
Import Routes
provider route
peer route
From
provider
customer route
From
provider
From
peer
From
peer
From
customer
From
customer
ISP route
Export Routes
provider route
peer route
To
provider
customer route
ISP route
From
provider
To
peer
To
peer
To
customer
To
customer
filters
block
How Can Routes be Colored?
BGP Communities!
A community value is 32 bits
Used for signally
within and between
ASes
By convention,
first 16 bits is
ASN indicating
who is giving it
an interpretation
community
number
Very powerful
BECAUSE it
has no (predefined)
meaning
Community Attribute = a list of community values.
(So one route can belong to multiple communities)
Two reserved communities
no_export = 0xFFFFFF01: don’t export out of AS
RFC 1997 (August 1996)
no_advertise 0xFFFFFF02: don’t pass to BGP neighbors
49
Communities Example
• 1:100
• To Customers
– Customer routes
• 1:200
– 1:100, 1:200, 1:300
• To Peers
– Peer routes
• 1:300
– 1:100
• To Providers
– Provider Routes
– 1:100
Import
Export
AS 1
Blackholes
peer
provider
peer
customer
Need Filter Here!
192.0.2.0/24
legitimate
Accidental or malicious
announcement of your prefix
can blackhole your destinations
in large part of the Internet
192.0.2.0/24
not legitimate
Mars Attacks!
Martian list often includes
•
•
•
•
•
•
•
•
•
0.0.0.0/0: default
10.0.0.0/8: private
172.16.0.0/12: private
192.168.0.0/16: private
127.0.0.0/8: loopbacks
128.0.0.0/16: IANA reserved
192.0.2.0/24: test networks
224.0.0.0/3: classes D and E
…..
Import Routes (Revisited)
provider route
peer route
customer route
ISP route
potential
backhole
Martian
From
provider
xxxxxx
Customer
address
filters
xxxxxx
xxxxxx
cccccc
xxxxxx
From
peer
From
provider
xxxxxx
cccccc
From
customer
xxxxxx
cccccc
From
customer
From
peer
xxxxxx
filters
martians
So Many Choices
peer
provider
peer
customer
AS 4
Frank’s
Internet Barn
AS 3
AS 2
AS 1
Which route should
Frank pick to 13.13.0.0./16?
13.13.0.0/16
54
BGP Route Processing
Open ended programming.
Constrained only by vendor configuration language
Receive Apply Policy =
filter routes &
BGP
Updates tweak attributes
Apply Import
Policies
Based on
Attribute
Values
Best
Routes
Best Route
Selection
Best Route
Table
Apply Policy =
filter routes &
tweak attributes
Transmit
BGP
Updates
Apply Export
Policies
Install forwarding
Entries for best
Routes.
IP Forwarding Table
55
Tweak Tweak Tweak
•
For inbound traffic
– Filter outbound routes
– Tweak attributes on
outbound routes in
the hope of
influencing your
neighbor’s best route
selection
•
inbound
traffic
For outbound traffic
– Filter inbound routes
– Tweak attributes on
inbound routes to
influence best route
selection
In general, an AS has more
control over outbound traffic
outbound
traffic
outbound
routes
inbound
routes
Route Selection Summary
Highest Local Preference
Enforce relationships
Shortest ASPATH
Lowest MED
i-BGP < e-BGP
traffic engineering
Lowest IGP cost
to BGP egress
Lowest router ID
Throw up hands and
break ties
Back to Frank …
peer
provider
peer
Local preference only used in iBGP
customer
AS 4
local pref = 80
local pref = 90
AS 3
local pref = 100
AS 2
Higher Local
preference values
are more preferred
AS 1
13.13.0.0/16
58
Implementing Backup Links with Local
Preference (Outbound Traffic)
AS 1
primary link
Set Local Pref = 100
for all routes from AS 1
backup link
AS 65000
Set Local Pref = 50
for all routes from AS 1
Forces outbound traffic to take primary link, unless link is down.
We’ll talk about inbound traffic soon …
59
Multihomed Backups
(Outbound Traffic)
AS 1
AS 3
provider
provider
primary link
backup link
Set Local Pref = 100
for all routes from AS 1
Set Local Pref = 50
for all routes from AS 3
AS 2
Forces outbound traffic to take primary link, unless link is down.
60
ASPATH Attribute
AS 1129
135.207.0.0/16
AS Path = 1755 1239 7018 6341
135.207.0.0/16
AS Path = 1239 7018 6341
AS 1239
Sprint
AS 1755
135.207.0.0/16
AS Path = 1129 1755 1239 7018 6341
Ebone
AS 12654
AS 6341
AT&T Research
RIPE NCC
RIS project
135.207.0.0/16
AS Path = 7018 6341
AS7018
135.207.0.0/16
AS Path = 6341
Global Access
135.207.0.0/16
AS Path = 3549 7018 6341
AT&T
135.207.0.0/16
AS Path = 7018 6341
AS 3549
Global Crossing
135.207.0.0/16
Prefix Originated
61
Interdomain Loop Prevention
AS 7018
BGP at AS YYY will
never accept a
route with ASPATH
containing YYY.
Don’t Accept!
12.22.0.0/16
ASPATH = 1 333 7018 877
AS 1
62
Traffic Often Follows ASPATH
135.207.0.0/16
ASPATH = 3 2 1
AS 1
AS 2
AS 3
AS 4
135.207.0.0/16
IP Packet
Dest =
135.207.44.66
… But It Might Not
135.207.0.0/16
ASPATH = 1
AS 1
135.207.44.0/25
ASPATH = 5
AS 2
AS 2 filters all
subnets with masks
longer than /24
135.207.0.0/16
ASPATH = 3 2 1
AS 3
AS 4
135.207.0.0/16
IP Packet
Dest =
135.207.44.66
AS 5
135.207.44.0/25
From AS 4, it
may look like this
packet will take
path 3 2 1, but it
actually takes
path 3 2 5
Shorter Doesn’t Always Mean Shorter
In fairness:
could you do
this “right” and
still scale?
Mr. BGP says that
path 4 1 is better
than path 3 2 1
Duh!
AS 4
AS 3
Exporting internal
state would
dramatically
increase global
instability and
amount of routing
state
AS 2
AS 1
Shedding Inbound Traffic with
ASPATH Padding Hack
AS 1
provider
192.0.2.0/24
ASPATH = 2 2 2
192.0.2.0/24
ASPATH = 2
primary
backup
customer
AS 2
192.0.2.0/24
Padding will (usually)
force inbound
traffic from AS 1
to take primary link
66
Padding May Not Shut Off All Traffic
AS 1
AS 3
provider
provider
192.0.2.0/24
ASPATH = 2
192.0.2.0/24
ASPATH = 2 2 2 2 2 2 2 2 2 2 2 2 2 2
primary
backup
customer
AS 2
192.0.2.0/24
AS 3 will send
traffic on “backup”
link because it prefers
customer routes and local
preference is considered
before ASPATH length!
Padding in this way is often
used as a form of load
67
balancing
COMMUNITY Attribute to the Rescue!
AS 1
AS 3
provider
provider
AS 3: normal
customer local
pref is 100,
peer local pref is 90
192.0.2.0/24
ASPATH = 2
COMMUNITY = 3:70
192.0.2.0/24
ASPATH = 2
primary
backup
customer
AS 2
192.0.2.0/24
Customer import policy at AS 3:
If 3:90 in COMMUNITY then
set local preference to 90
If 3:80 in COMMUNITY then
set local preference to 80
If 3:70 in COMMUNITY then
set local preference to 70
68
Hot Potato Routing: Go for the Closest
Egress Point
192.44.78.0/24
egress 2
egress 1
15
56
IGP distances
This Router has two BGP routes to 192.44.78.0/24.
Hot potato: get traffic off of your network as
Soon as possible. Go for egress 1!
69
Getting Burned by the Hot Potato
2865
High bandwidth
Provider backbone
17
SFF
Low bandwidth
customer backbone
Heavy
Content
Web Farm
NYC
15
56
San Diego
Many customers want
their provider to
carry the bits!
tiny http request
huge http reply
70
Cold Potato Routing with MEDs
(Multi-Exit Discriminator Attribute)
Prefer lower
MED values
2865
17
Heavy
Content
Web Farm
192.44.78.0/24
MED = 56
192.44.78.0/24
MED = 15
15
56
192.44.78.0/24
This means that MEDs must be considered BEFORE
IGP distance!
Note1 : some providers will not listen to MEDs
Note2 : MEDs need not be tied to IGP distance
71
Route Selection Summary
Highest Local Preference
Enforce relationships
Shortest ASPATH
Lowest MED
i-BGP < e-BGP
traffic engineering
Lowest IGP cost
to BGP egress
Lowest router ID
Throw up hands and
break ties
This is somewhat simplified. Hey, what happened to ORIGIN??
Policies Can Interact Strangely
(“Route Pinning” Example)
backup
customer
1
3
2
Disaster strikes primary link
and the backup takes over
4
Install backup link using community
Primary link is restored but some
traffic remains pinned to backup
News At 11:00
• BGP is not guaranteed to converge on a
stable routing. Policy interactions could lead
to “livelock” protocol oscillations.
See “Persistent Route Oscillations in Inter-domain Routing” by K. Varadhan, R.
Govindan, and D. Estrin. ISI report, 1996
• Corollary: BGP is not guaranteed to recover
from network failures.
A Wee Bit of Theory
Underlying problem
Distributed means of
computing a solution.
Shortest Paths
RIP, OSPF, IS-IS
X?
BGP
What Problem is BGP solving?
X could
•
•
•
•
aid in the design of policy analysis algorithms and heuristics
aid in the analysis and design of BGP and extensions
help explain some BGP routing anomalies
provide a fun way of thinking about the protocol
Separate dynamic and static semantics
static
semantics
BGP Policies
dynamic
semantics
BGP
Stable Paths
SPVP
Problem (SPP)
See [Griffin, Shepherd, Wilfong]
Booo Hooo,
Many, many
complications...
SPVP = Simple Path
Vector Protocol = a
distributed
algorithm for
solving SPP
An instance of the Stable Paths Problem (SPP)
•
•
•
•
A graph of nodes and edges,
Node 0, called the origin,
For each non-zero node, a
set or permitted paths to
the origin. This set always
contains the “null path”.
A ranking of permitted
paths at each node. Null
path is always least
preferred. (Not shown in
diagram)
1
When modeling BGP : nodes represent
BGP speaking routers, and 0 represents
a node originating some address block
210
2
20
5
5210
2
4
420
430
3
30
0
1
130
10
most preferred
…
least preferred (not null)
Yes, the translation
gets messy!
A Solution to a Stable Paths Problem
2
210
20
A solution is an assignment of
permitted paths to each node
such that
•
•
node u’s assigned path is either
the null path or is a path uwP,
where wP is assigned to node w
and {u,w} is an edge in the
graph,
each node1is assigned the
highest ranked path among
those consistent with the paths
assigned to its neighbors.
5
5210
2
4
420
430
3
30
0
1
130
10
A Solution need not represent
a shortest path tree, or
a spanning tree.
An SPP may have multiple solutions
120
10
120
10
1
120
10
1
0
0
2
210
20
DISAGREE
1
2
210
20
First solution
0
2
210
20
Second solution
BAD GADGET : No Solution
2
210
20
4
0
130
10
1
3
3
320
30
SURPRISE : Beware of Backup Policies
210
20
BGP is not robust :
it is not guaranteed
to recover from
network failures.
1
130
10
2
Becomes a BAD GADGET if link
(4, 0) goes down.
4
40
420
430
0
3
3420
30
PRECARIOUS
Has a solution, but can get “trapped”
4
310
3120
5
5310
563120
53120
4310
453120
43120
1
3
120
10
0
6
2
6310
643120
63120
This part has a solution only
when node 1 is assigned the
direct path (1 0).
210
20
As with DISAGREE, this part
has two distinct solutions
Part II
Issues of scale for BGP in the
real world
Big and Getting Bigger
• Scaling the iBGP mesh
– Confederations
– Route Reflectors
• BGP Table Growth
– Address aggregation (CIDR)
– Address allocation
• AS number allocation and use
• Dynamics of BGP
–
–
–
–
Inherent vs. accidental oscillation
Rate limiting and route flap dampening
Lots and lots of noise
Slow convergence time
Scale
Scale
Scale
Scale
Scale
Scale
Scale
Scale
Scale
Scale
Scale
Scale
Scale
iBGP Mesh Does Not Scale
eBGP update
•
N border routers means N(N-1)/2
peering sessions
•
Each router must have N-1 iBGP
sessions configured
•
The addition a single iBGP speaker
requires configuration changes to all
iBGP updates
other iBGP speakers
•
Size of iBGP routing table can be
order N larger than number of best
routes (remember alternate routes!)
•
Each router has to listen to update
noise from each neighbor
Currently four solutions:
(0) Buy bigger routers!
(1) Break AS into smaller ASes
(2) BGP Route reflectors
(3) BGP confederations
85
Route Reflectors
• Route reflectors can pass on
iBGP updates to clients
RR
RR
•
RR
•
Each RR passes along ONLY
best routes
ORIGINATOR_ID and
CLUSTER_LIST attributes are
needed to avoid loops
86
BGP Confederations
AS 65502
AS 65503
AS 65500
AS 65501
AS 65504
AS 1
From the outside, this looks like AS 1
Confederation eBGP (between member ASes) preserves
LOCAL_PREF, MED, and BGP NEXTHOP.
BGP Table Growth
Thanks to Geoff Huston. http://www.telstra.net/ops/bgptable.html on August 8, 2001
Large BGP Tables Considered Harmful
• Routing tables must store best
routes and alternate routes
• Burden can be large for routers with
many alternate routes (route
reflectors for example)
• Routers have been known to die
• Increases CPU load, especially
during session reset
Moore’s Law may save us in theory. But
in practice it means spending money to upgrade
equipment …
Deaggregation Due to Multihoming
May be a Leading Cause
If AS 1 does
not announce the
more specific prefix,
then most traffic
to AS 2 will go
through AS 3
because it is a
longer match
12.2.0.0/16
12.2.0.0/16
12.0.0.0/8
AS 3
AS 1
provider
provider
AS 2
AS 2 is
“punching a hole” in
The CIDR block of AS 1
customer
12.2.0.0/16
How Many ASNs are there?
Thanks to Geoff Huston. http://www.telstra.net/ops on June 23, 2001
When will we run out of ASNs?
64,511
2005?
2007?
What is to be done?
• Make ASNs larger than 16 bits
– How about 32 bits?
– See Internet Draft: “BGP support for four-octet
AS number space” (draft-ietf-idr-as4bytes-03.txt)
– Requires protocol change and wide deployment
• Change the way ASNs are used
– Allow multihomed, non-transit networks to use
private ASNs
– Uses ASE (AS number Substitution on Egress )
– See Internet Draft: “Autonomous System Number
Substitution on Egress” (draft-jhaas-ase-00.txt)
– Works at edge, requires protocol change (for
loop prevention)
– Makes some kinds of debugging harder!
Multihomed and “Private”!
(draft-jhaas-ase-00.txt)
63.63.63.0/24
63.63.63.0/24
ASPATH = 1
ASPATH = 2
ASE-ORIGINATOR = 65535
AS 1
63.63.63.0/24
ASPATH = 65535
ASE-ORIGINATOR = 65535
AS 3
Replace
private ASN
AS 65535
63.63.63.0/24
AS 2
63.63.63.0/24
ASPATH = 65535
A non-transit network
In fairness:
could
you “do this
ASE-ORIGINATOR is a new attribute needed for “sender side loop detection” at AS 1 and 2
right” and still scale?
Choice of “private” ASN requires a bit of additional coordination between providers
BGP Routing Tables
show ip bgp
BGP table version is 111849680, local router ID is 203.62.248.4
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
Network
. . .
*>i192.35.25.0
*>i192.35.29.0
*>i192.35.35.0
*>i192.35.37.0
*>i192.35.39.0
*>i192.35.44.0
*>i192.35.48.0
*>i192.35.49.0
*>i192.35.50.0
*>i192.35.51.0/25
. . .
Next Hop
134.159.0.1
166.49.251.25
134.159.0.1
134.159.0.1
134.159.0.3
166.49.251.25
203.62.248.34
203.62.248.34
203.62.248.34
203.62.248.34
Metric LocPrf Weight Path
50
50
50
50
50
50
55
55
55
55
0
0
0
0
0
0
0
0
0
0
16779 1 701 703 i
5727 7018 14541 i
16779 1 701 1744 i
16779 1 3561 i
16779 1 701 80 i
5727 7018 1785 i
16779 209 7843 225 225
16779 209 7843 225 225
16779 3549 714 714 714
16779 3549 14744 14744
225 225 225 i
225 225 225 i
i
14744 14744 14744 14744 14744 14744 i
Thanks to Geoff Huston. http://www.telstra.net/ops on July 6, 2001
•
•
Use “whois” queries to associate an ASN with “owner” (for
example, http://www.arin.net/whois/arinwhois.html)
7018 = AT&T Worldnet, 701 =Uunet, 3561 = Cable & Wireless, …
• Hey, we can use these paths to draw cool graphs!
AS Graphs Can Be Fun
The subgraph showing all ASes that have more than 100 neighbors in full
graph of 11,158 nodes. July 6, 2001. Point of view: AT&T route-server
AS Graphs Depend on Point of View
peer
peer
provider
customer
1
3
2
4
1
3
5
1
2
4
5
6
3
1
6
4
2
2
6
4
5
3
5
6
This explains why there is no UUNET (701) Sprint (1239) link on previous slide!
AS Graphs Do Not Show Topology!
BGP was designed to
throw away information!
The AS graph
may look like this.
Reality may be closer to this…
BGP Dynamics
• How many updates are flying around
the Internet?
• How long Does it take Routes to
Change?
The goals of
(1) fast convergence
(2) minimal updates
(3) path redundancy
are at odds
Daily Update Count
What is the Sound of One Route Flapping?
A Few Bad Apples …
Most prefixes are
stable most of the time.
On this day, about 83% of the prefixes
were not updated.
Typically, 80% of
the updates are
for less than 5%
Of the prefixes.
Percent of BGP table prefixes
Thanks to Madanlal Musuvathi for this plot.
Data source: RIPE NCC
Two BGP Mechanisms for
Squashing Updates
• Rate limiting on sending updates
– Send batch of updates every
MinRouteAdvertisementInterval
seconds (+/- random fuzz)
– Default value is 30 seconds
– A router can change its mind about
best routes many times within this
interval without telling neighbors
• Route Flap Dampening
– Punish routes for misbehaving
Effective in
dampening
oscillations
inherent in the
vectoring
approach
Must be turned on
with configuration
30 Second Bursts
How Long Does BGP Take to
Adapt to Changes?
100
Cumulative Percentage of Events
90
80
70
60
Tup
Tshort
50
Tlong
40
Tdow n
30
20
10
0
0
20
40
60
80
100
120
140
160
Seconds Until Convergence
Thanks to Abha Ahuja and Craig Labovitz for this plot.
Two Main Factors in Delayed
Convergence
• Rate limiting timer slows everything
down
• BGP can explore many alternate
paths before giving up or arriving at
a new path
– No global knowledge in vectoring
protocols
Why is Rate Limiting Needed?
Updates
to convergence
Time
to convergence
0
0
MinRouteAdvertisementInterval
MinRouteAdvertisementInterval
Rate limiting dampens some of the
oscillation inherent in a vectoring protocol.
Current interval (30 seconds) was picked
“out of the blue sky”
SSFNet (www.ssfnet.org) simulations, T. Griffin and B.J. Premore.
To appear in ICNP 2001.
Route Flap Dampening (RFC 2439)
Routes are given a penalty for changing.
If penalty exceeds suppress limit, the
route is dampened. When the route is not changing,
its penalty decays exponentially. If the penalty goes
below reuse limit, then it is announced again.
• Can dramatically reduce the number of
BGP updates
• Requires additional router resources
• Applied on eBGP inbound only
108
Route Flap Dampening Example
route dampened
for nearly 1 hour
penalty for each flap = 1000
Q: Why All the Updates?
• Networks come, networks go
• There’s always a router rebooting somewhere
• Hardware failure, flaky interface cards, backhoes
digging, floods in Houston, …
This is “normal” --- exactly what
dynamic routing is designed for…
Q: Why All the Updates?
•
•
•
•
•
•
•
•
•
•
•
Misconfiguration
Route flap dampening not widely used
BGP exploring many alternate paths
Software bugs in implementation of routing protocols
BGP session resets due to congestion or lack of
interoperability: BGP sessions are brittle. One malformed
update is enough to reset session and flap 100K routes.
(Consequence of incremental approach)
IGP instability exported by use of MEDs or IGP tie breaker
Sub-optimal vendor implementation choices
Secret sauce routing algorithms attempting fancy-dancy tricks
Weird policy interactions (MED oscillation, BAD GADGETS??)
Gnomes, sprites, and fairies
….
A: NO ONE REALLY KNOWS …
IGP Tie Breaking Can Export Internal
Instability to the Whole Wide World
192.44.78.0/24
AS 1
AS 3
AS 2
10
AS 4
192.44.78.0/24
ASPATH = 4 2 1
15
FLAP
FLAP
FLAP FLAP
56
192.44.78.0/24
ASPATH = 4 3 1
112
MEDs Can Export Internal Instability
2865
17
FLAP
FLAP
192.44.78.0/24
MED = 56 OR 10
192.44.78.0/24
MED = 15
10
15
Heavy
Content
Web Farm
FLAP
FLAP
56
FLAP
FLAP
192.44.78.0/24
113
Implementation Does Matter!
stateless withdraws
widely deployed
stateful withdraws
widely deployed
Thanks to Abha Ahuja and Craig Labovitz for this plot.
How Long Will Interdomain
Routing Continue to Scale?
A quote from some recent email:
... the existing interdomain routing
infrastructure is rapidly nearing the
end of its useful lifetime. It
appears unlikely that mere tweaks
of BGP will stave off fundamental
scaling issues, brought on by growth,
multihoming and other causes.
Is this true or false?
How can we tell?
Research required…
Summary
• BGP is a fairly simple protocol …
• … but it is not easy to configure
• BGP is running on more than 100K
routers (my estimate), making it one
of world’s largest and most visible
distributed systems
• Global dynamics and scaling
principles are still not well
understood
Addressing and ASN RFCs
•
•
•
•
•
•
•
•
•
•
•
•
•
•
RFC 1380 IESG Deliberations on Routing and Addressing (1992)
RFC 1517Applicability Statement for the Implementation of Classless
InterDomain Routing (CIDR) (1993)
RFC 1518 An Architecture for IP Address Allocation with CIDR (1993)
RFC 1519 Classless Inter-Domain Routing (CIDR) (1993)
RFC 1467 Status of CIDR Deployment in the Intrenet (1983)
RFC 1520 Exchanging Routing Information Across Provider Boundaries in
the CIDR Environment (1993)
RFC 1817 CIDR and Classful routing (1995)
RFC 1918 Address Allocation for Private Internets (1996)
RFC 2008 Implications of Various Address Allocation Policies for Internet
Routing (1996)
RFC 2050 Internet Registry IP Allocation Guidelines (1996)
RFC 2260 Scalable Support for Multi-homed Multi-provider Connectivity
(1998)
RFC 2519 A Framework for Inter-Domain Route Aggregation (1999)
RFC 1930 Guidelines for creation, selection, and registration of an
Autonomous System (AS)
RFC 2270 Using a Dedicated AS for Sites Homed to a Single Provider
Selected BGP RFCs
Internet Engineering Task Force (IETF)
http://www.ietf.org
•
IDR : http://www.ietf.org/html.charters/idr-charter.html
•
RFC 1771 A Border Gateway Protocol 4 (BGP-4)
•
Latest draft rewrite: draft-ietf-idr-bgp4-12.txt
•
RFC 1772 Application of the Border Gateway Protocol in the
Internet
•
RFC 1773 Experience with the BGP-4 protocol
•
RFC 1774 BGP-4 Protocol Analysis
•
RFC 2796 BGP Route Reflection An alternative to full mesh IBGP
•
RFC 3065 Autonomous System Confederations for BGP
•
RFC 1997 BGP Communities Attribute
•
RFC 1998 An Application of the BGP Community Attribute in Multihome Routing
•
RFC 2439 Route Flap Dampening
Titles of Some Recent
Internet Drafts
•
•
•
•
•
•
•
•
•
•
•
•
•
•
Dynamic Capability for BGP-4
Application of Multiprotocol BGP-4 to IPv4 Multicast Routing
Graceful Restart mechanism for BGP
Cooperative Route Filtering Capability for BGP-4
Address Prefix Based Outbound Route Filter for BGP-4
Aspath Based Outbound Route Filter for BGP-4
Architectural Requirements for Inter-Domain Routing in the
Internet
BGP support for four-octet AS number space
Autonomous System Number Substitution on Egress
BGP Extended Communities Attribute
Controlling the redistribution of BGP routes
BGP Persistent Route Oscillation Condition
Benchmarking Methodology for Basic BGP Convergence
Terminology for Benchmarking External Routing Convergence
Measurements
BGP is a moving target …
Selected Bibliography on Routing
• Internet Routing Architectures. Bassam Halabi.
Second edition Cisco Press, 2000
• BGP4: Inter-domain Routing in the Internet.
John W. Stewart, III. Addison-Wesley, 1999
• Routing in the Internet. Christian Huitema. 2000
• ISP Survival Guide: Strategies for Running a
Competitive ISP. Geoff Huston. Wiley, 1999.
• Interconnection, Peering and Settlements.
Geoff Huston. The Internet Protocol Journal.
March and June 1999.
120
BGP Stability and Convergence
• The Impact of Internet Policy and Topology on
Delayed Routing Convergence. Craig Labovitz, Abha
Ahuja, Roger Wattenhofer, Srinivasan Venkatachary.
INFOCOM 2001
• An Experimental Study of BGP Convergence. Craig
Labovitz, Abha Ahuja, Abhijit Abose, Farnam
Jahanian. SIGCOMM 2000
• Origins of Internet Routing Instability. C. Labovitz, R.
Malan, F. Jahanian. INFOCOM 1999
• Internet Routing Instability. Craig Labovitz, G. Robert
Malan and Farnam Jahanian. SIGCOMM 1997
Analysis of Interdomain Routing
•
Cooperative Association for Internet Data Analysis (CAIDA)
– http://www.caida.org/
– Tools and analyses promoting the engineering and maintenance
of a robust, scalable global Internet infrastructure
•
Internet Performance Measurement and Analysis (IPMA)
– http://www.merit.edu/ipma/
– Studies the performance of networks and networking protocols
in local and wide-area networks
•
National Laboratory for Applied Network Research (NLANR)
– http://www.nlanr.net/
– Analysis, tools, visualization.
• IRTF Routing Research Group (IRTF-RR)
– http://puck.nether.net/irtf-rr/
Internet Route Registries
• Internet Route Registry
– http://www.irr.net/
• Routing Policy Specification Language (RPSL)
– RFC 2622 Routing Policy Specification Language
(RPSL)
– RFC 2650 Using RPSL in Practice
• Internet Route Registry Daemon (IRRd)
– http://www.irrd.net/
• RAToolSet
– http://www.isi.edu/ra/RAToolSet/
Some BGP Theory
•
•
•
•
•
•
•
Persistent Route Oscillations in Inter-Domain Routing. Kannan Varadhan, Ramesh Govindan,
and Deborah Estrin. Computer Networks, Jan. 2000. (Also USC Tech Report, Feb. 1996)
– Shows that BGP is not guaranteed to converge
An Architecture for Stable, Analyzable Internet Routing. Ramesh Govindan, Cengiz
Alaettinoglu, George Eddy, David Kessens, Satish Kumar, and WeeSan Lee. IEEE Network
Magazine, Jan-Feb 1999.
– Use RPSL to specify policies. Store them in registries. Use registry for conguration
generation and analysis.
An Analysis of BGP Convergence Properties. Timothy G. Griffin, Gordon Wilfong. SIGCOMM
1999
– Model BGP, shows static analysis of divergence in policies is NP complete
Policy Disputes in Path Vector Protocols. Timothy G. Griffin, F. Bruce Shepherd, Gordon
Wilfong. ICNP 1999
– Define Stable Paths Problem and develop sufficient condition for “sanity”
A Safe Path Vector Protocol. Timothy G. Griffin, Gordon Wilfong. INFOCOM 2001
– Dynamic solution for SPVP based on histories
Stable Internet Routing without Global Coordination. Lixin Gao, Jennifer Rexford.
SIGMETRICS 2000
– Show that if certain guidelines are followed, then all is well.
Inherently safe backup routing with BGP. Lixin Gao, Timothy G. Griffin, Jennifer Rexford.
INFOCOM 2001
– Use SPP to study complex backup policies
Thank You!
• Companion links:
• http://www.research.att.com/~griffin/interdomain.html
Download