An open problem in Internet Routing --- Policy Language Design for BGP

Timothy G. Griffin
Intel Research,
Cambridge UK
tim.griffin@intel.com
Nov 3, 2003
Architecture of Dynamic Routing
[Figure: AS 1 and AS 2 each run an IGP internally; an EGP (= BGP) runs between them]
IGP = Interior Gateway Protocol. Metric based: OSPF, IS-IS, RIP, EIGRP (cisco)
EGP = Exterior Gateway Protocol. Policy based: BGP
The Routing Domain of BGP is the entire Internet
Technology of Distributed Routing
Link State
• Topology information is flooded within the routing domain
• Best end-to-end paths are computed locally at each router.
• Best end-to-end paths determine next-hops.
• Based on minimizing some notion of distance
• Works only if policy is shared and uniform
• Examples: OSPF, IS-IS
Vectoring
• Each router knows little about network topology
• Only best next-hops are chosen by each router for each destination network.
• Best end-to-end paths result from composition of all next-hop choices
• Does not require any notion of distance
• Does not require uniform policies at all routers
• Examples: RIP, BGP
The Gang of Four
             IGP            EGP
Link State   OSPF, IS-IS
Vectoring    RIP            BGP
Partial View of www.cl.cam.ac.uk
(128.232.0.20) Neighborhood
AS 3356: Level 3
AS 5459: LINX
AS 6461: AboveNet
AS 20965: GEANT
AS 786: ja.net (UKERNA). Originates > 180 prefixes, including 128.232.0.0/16
AS 7: UK Defense Research Agency
AS 1239: Sprint
AS 702: UUNET
AS 1213: HEAnet (Irish academic and research)
AS 4373: Online Computer Library Center
How Many ASNs are there today?
16,046
Thanks to Geoff Huston. http://bgp.potaroo.net on November 3, 2003
Four Types of BGP Messages
• Open : Establish a peering session.
• Keep Alive : Handshake at regular intervals.
• Notification : Shuts down a peering session.
• Update : Announcing new routes or withdrawing previously announced routes.
announcement = prefix + attribute values
BGP Attributes
Value   Code                      Reference
-----   ------------------------  ---------
1       ORIGIN                    [RFC1771]
2       AS_PATH                   [RFC1771]
3       NEXT_HOP                  [RFC1771]
4       MULTI_EXIT_DISC           [RFC1771]
5       LOCAL_PREF                [RFC1771]
6       ATOMIC_AGGREGATE          [RFC1771]
7       AGGREGATOR                [RFC1771]
8       COMMUNITY                 [RFC1997]
9       ORIGINATOR_ID             [RFC2796]
10      CLUSTER_LIST              [RFC2796]
11      DPA                       [Chen]
12      ADVERTISER                [RFC1863]
13      RCID_PATH / CLUSTER_ID    [RFC1863]
14      MP_REACH_NLRI             [RFC2283]
15      MP_UNREACH_NLRI           [RFC2283]
16      EXTENDED COMMUNITIES      [Rosen]
...
255     reserved for development
Most important attributes
From IANA: http://www.iana.org/assignments/bgp-parameters
Not all attributes need to be present in every announcement
BGP Route Processing
Open ended programming.
Constrained only by vendor configuration language
Receive BGP Updates
-> Apply Import Policies (Apply Policy = filter routes & tweak attributes)
-> Best Route Selection (Based on Attribute Values)
-> Best Route Table (Best Routes)
-> Apply Export Policies (Apply Policy = filter routes & tweak attributes)
-> Transmit BGP Updates
Best Route Table -> Install forwarding entries for best routes -> IP Forwarding Table
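The processing steps on this slide can be sketched roughly as follows. This is a minimal illustration, not a vendor implementation: the `Route` fields, the filtered prefix 10.0.0.0/8, the favoured neighbour AS 64500, and the local AS 64999 are all hypothetical, and selection is reduced to local preference followed by AS path length.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple          # AS numbers, nearest AS first
    local_pref: int = 100


def import_policy(route: Route) -> Optional[Route]:
    """Hypothetical import policy: filter one prefix, tweak local preference."""
    if route.prefix == "10.0.0.0/8":                 # filter: drop this route
        return None
    if route.as_path and route.as_path[0] == 64500:  # tweak: favour one neighbour
        return replace(route, local_pref=200)
    return route


def select_best(routes):
    """Simplified selection: highest local preference, then shortest AS path."""
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))


def export_policy(route: Route, my_as: int) -> Route:
    """Hypothetical export policy: prepend our AS; local preference stays local."""
    return replace(route, as_path=(my_as,) + route.as_path, local_pref=100)


def process(updates, my_as):
    """Receive -> import policy -> best route per prefix -> export policy."""
    accepted = [r for r in map(import_policy, updates) if r is not None]
    best = {}
    for prefix in {r.prefix for r in accepted}:
        best[prefix] = select_best([r for r in accepted if r.prefix == prefix])
    exported = [export_policy(r, my_as) for r in best.values()]
    return best, exported


updates = [
    Route("192.0.2.0/24", (64500, 64510, 64520)),
    Route("192.0.2.0/24", (64501, 64520)),
    Route("10.0.0.0/8", (64501,)),
]
best, exported = process(updates, my_as=64999)
```

In this sketch the longer path through AS 64500 wins for 192.0.2.0/24, because the import policy raised its local preference before selection ran.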
Route Selection Summary
Highest Local Preference           Enforce relationships
Shortest ASPATH                    traffic engineering
Lowest MED                         traffic engineering
i-BGP < e-BGP                      traffic engineering
Lowest IGP cost to BGP egress      traffic engineering
Lowest router ID                   Throw up hands and break ties
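One way to read the selection order is as a lexicographic comparison. The sketch below assumes a hypothetical `Candidate` record and simplifies MED handling (in practice MED is normally compared only between routes from the same neighboring AS).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    local_pref: int
    as_path: tuple
    med: int
    learned_via_ebgp: bool
    igp_cost: int
    router_id: int


def preference_key(r):
    """Smaller key = better route; the fields follow the order on the slide."""
    return (
        -r.local_pref,                   # highest local preference
        len(r.as_path),                  # shortest ASPATH
        r.med,                           # lowest MED (simplified)
        0 if r.learned_via_ebgp else 1,  # e-BGP preferred over i-BGP
        r.igp_cost,                      # lowest IGP cost to BGP egress
        r.router_id,                     # lowest router ID breaks remaining ties
    )


def best_route(candidates):
    return min(candidates, key=preference_key)


short_path = Candidate(100, (2,), 0, True, 10, 1)
high_pref = Candidate(110, (2, 2, 2, 2), 0, True, 10, 2)
winner = best_route([short_path, high_pref])
```

Here `winner` is `high_pref`: local preference is compared first, so the longer AS path does not matter.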
ASPATH Attribute
[Figure: the AS path for 135.207.0.0/16 grows as the announcement propagates]
AS 6341 (AT&T Research): Prefix Originated, 135.207.0.0/16, AS Path = 6341
AS 7018 (AT&T): 135.207.0.0/16, AS Path = 7018 6341
AS 1239 (Sprint): 135.207.0.0/16, AS Path = 1239 7018 6341
AS 1755 (Ebone): 135.207.0.0/16, AS Path = 1755 1239 7018 6341
AS 1129 (Global Access): 135.207.0.0/16, AS Path = 1129 1755 1239 7018 6341
AS 3549 (Global Crossing): 135.207.0.0/16, AS Path = 3549 7018 6341
AS 12654 (RIPE NCC RIS project)
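The way the AS path grows hop by hop can be sketched as below. The helper names `announce` and `accept` are hypothetical; the AS numbers are the ones on the slide.

```python
def announce(as_path, my_as):
    """An AS prepends its own number before passing the route on."""
    return (my_as,) + tuple(as_path)


def accept(as_path, my_as):
    """Loop prevention: reject a path that already contains our own AS number."""
    return my_as not in as_path


# Follow 135.207.0.0/16 from its origin (AS 6341) along one branch of the figure.
path = ()
for asn in (6341, 7018, 1239, 1755, 1129):
    path = announce(path, asn)
```

After the loop, `path` is `(1129, 1755, 1239, 7018, 6341)`, and `accept(path, 7018)` is false because AS 7018 already appears in it.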
Shorter Doesn’t Always Mean Shorter
[Figure: AS 1, AS 2, AS 3, AS 4]
Mr. BGP says that path 4 1 is better than path 3 2 1. Duh!
In fairness: could you do this “right” and still scale?
Exporting internal state would dramatically increase global instability and amount of routing state
Shedding Inbound Traffic with
ASPATH Prepending
[Figure: customer AS 2 (192.0.2.0/24) has a primary and a backup link to provider AS 1]
primary: 192.0.2.0/24, ASPATH = 2
backup: 192.0.2.0/24, ASPATH = 2 2 2
Prepending will (usually) force inbound traffic from AS 1 to take primary link
Yes, this is a Glorious Hack …
… But Padding Does Not Always Work
[Figure: customer AS 2 (192.0.2.0/24) has a primary link to provider AS 1 and a backup link to provider AS 3]
primary: 192.0.2.0/24, ASPATH = 2
backup: 192.0.2.0/24, ASPATH = 2 2 2 2 2 2 2 2 2 2 2 2 2 2
AS 3 will send traffic on “backup” link because it prefers customer routes and local preference is considered before ASPATH length!
Padding in this way is often used as a form of load balancing
COMMUNITY Attribute to the Rescue!
[Figure: customer AS 2 (192.0.2.0/24) has a primary link to provider AS 1 and a backup link to provider AS 3]
primary: 192.0.2.0/24, ASPATH = 2
backup: 192.0.2.0/24, ASPATH = 2, COMMUNITY = 3:70
AS 3: normal customer local pref is 100, peer local pref is 90
Customer import policy at AS 3:
• If 3:90 in COMMUNITY then set local preference to 90
• If 3:80 in COMMUNITY then set local preference to 80
• If 3:70 in COMMUNITY then set local preference to 70
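The customer import policy at AS 3 can be sketched as a small function. The function name is hypothetical, and the three rules are applied in the order listed, so a later match overrides an earlier one.

```python
def customer_import_local_pref(communities, default=100):
    """Local preference AS 3 assigns to a customer route, given its COMMUNITY values."""
    local_pref = default                      # normal customer local pref is 100
    for tag, pref in (("3:90", 90), ("3:80", 80), ("3:70", 70)):
        if tag in communities:
            local_pref = pref
    return local_pref


PEER_LOCAL_PREF = 90                          # peer local pref at AS 3

backup_pref = customer_import_local_pref({"3:70"})   # route on the backup link
untagged_pref = customer_import_local_pref(set())    # customer route with no tag
```

With the tag 3:70 the backup route gets local preference 70, below the peer value of 90, so AS 3 would prefer a route to 192.0.2.0/24 learned from a peer over the direct backup link.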
Don’t celebrate just yet…
[Figure: Provider A (Tier 1) and Provider B (Tier 1) joined by a peering link; two provider/customer links; Provider C (Tier 2); customer]
Now, customer wants a backup link to C….
Customer installs a “backup link” …
[Figure: Provider A (Tier 1); Provider C (Tier 2) on the backup link; Provider B (Tier 1) on the primary link; customer]
customer sends “lower my preference” Community value
Disaster Strikes!
[Figure: Provider A (Tier 1); Provider C (Tier 2) on the backup link; Provider B (Tier 1) on the primary link; customer]
customer is happy that backup was installed …
The primary link is repaired, and
something odd occurs…
[Figure: Provider A (Tier 1); Provider C (Tier 2) on the backup link; Provider B (Tier 1) on the primary link; customer]
YIKES --- routing DOES NOT return to normal!!!
WAIT! It Gets Better…
[Figure: nodes A, B, C, D; links labelled P = primary, B = backup]
OOOOOPS!
[Figure: nodes A, B, C, D; links labelled P = primary, B = backup]
Suppose A, B, C all break ties in the same direction (clockwise or counter-clockwise)
No solution = Protocol Divergence
What the heck is going on?
• There is no guarantee that a BGP configuration has a unique routing solution.
– When multiple solutions exist, the (unpredictable) order of updates will determine which one wins.
• There is no guarantee that a BGP configuration has any solution!
– And checking configurations is NP-Complete [GW1999]
• Complex policies (weights, communities setting preferences, and so on) increase chances of routing anomalies.
– … yet this is the current trend!
What Problem is BGP Solving?
Underlying problem                   Distributed means of computing a solution
Shortest Paths                       RIP, OSPF, IS-IS
Stable Paths [GSW1998, GSW2002]      BGP
An instance of the Stable Paths Problem (SPP)
• A graph of nodes and edges,
• Node 0, called the origin,
• For each non-zero node, a set of permitted paths to the origin. This set always contains the “null path”.
• A ranking of permitted paths at each node. Null path is always least preferred. (Not shown in diagram)
When modeling BGP: nodes represent BGP speaking routers, and 0 represents a node originating some address block
[Figure: example SPP; permitted paths listed from most preferred to least preferred]
node 1: 130, 10
node 2: 210, 20
node 3: 30
node 4: 420, 430
node 5: 5210
A Solution to a Stable Paths Problem
A solution is an assignment of permitted paths to each node such that
• node u’s assigned path is either the null path or is a path uwP, where wP is assigned to node w and {u,w} is an edge in the graph,
• each node is assigned the highest ranked path among those consistent with the paths assigned to its neighbors.
[Figure: the example SPP; node 1: 130, 10; node 2: 210, 20; node 3: 30; node 4: 420, 430; node 5: 5210]
A Solution need not represent a shortest path tree, or a spanning tree.
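The definition can be checked mechanically on a small instance. The sketch below is a brute-force search, not how BGP itself computes routes; it writes path 130 as the tuple `(1, 3, 0)`, uses `()` for the null path, and assumes the graph's edges are exactly those used by the permitted paths.

```python
from itertools import product

NULL = ()  # the null path

# Permitted paths of the example SPP, most preferred first; NULL is always last.
permitted = {
    1: [(1, 3, 0), (1, 0), NULL],
    2: [(2, 1, 0), (2, 0), NULL],
    3: [(3, 0), NULL],
    4: [(4, 2, 0), (4, 3, 0), NULL],
    5: [(5, 2, 1, 0), NULL],
}


def best_choice(u, assignment):
    """Highest-ranked permitted path of u consistent with its neighbours' paths."""
    for path in permitted[u]:
        if path == NULL:
            return NULL
        rest = path[1:]                       # the path wP of the next hop w
        if rest == (0,) or assignment[rest[0]] == rest:
            return path
    return NULL


def solutions():
    """Try every assignment and keep those where each node holds its best choice."""
    nodes = sorted(permitted)
    found = []
    for combo in product(*(permitted[u] for u in nodes)):
        assignment = dict(zip(nodes, combo))
        if all(best_choice(u, assignment) == assignment[u] for u in nodes):
            found.append(assignment)
    return found


sols = solutions()
```

Under these assumptions the search finds a single solution: node 1 takes 130, node 2 takes 20, node 3 takes 30, node 4 takes 420, and node 5 is left with the null path, so the result is not a spanning tree.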
An SPP may have multiple solutions
[Figure: DISAGREE; permitted paths, most preferred first]
node 1: 120, 10
node 2: 210, 20
[Figures: First solution and Second solution, drawn on the same graph]
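A small simulation shows how the order of updates picks between the two solutions of DISAGREE. This is a simplified model that assumes a node sees its neighbour's current path immediately when it is activated (no message delay); `run` and `best_choice` are hypothetical helper names.

```python
NULL = ()  # the null path

# DISAGREE: each node prefers the path through the other node over its direct path.
permitted = {
    1: [(1, 2, 0), (1, 0)],
    2: [(2, 1, 0), (2, 0)],
}


def best_choice(u, state):
    """Highest-ranked permitted path of u consistent with the current state."""
    for path in permitted[u]:
        rest = path[1:]
        if rest == (0,) or state.get(rest[0]) == rest:
            return path
    return NULL


def run(activation_order):
    """Activate nodes one at a time; each picks its best currently available path."""
    state = {1: NULL, 2: NULL}
    for u in activation_order:
        state[u] = best_choice(u, state)
    return state


node1_first = run([1, 2, 1, 2])
node2_first = run([2, 1, 2, 1])
```

When node 1 moves first the run settles on 1: 10, 2: 210; when node 2 moves first it settles on 1: 120, 2: 20. Both end states are stable.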
BAD GADGET : No Solution
[Figure: BAD GADGET; permitted paths, most preferred first]
node 1: 130, 10
node 2: 210, 20
node 3: 320, 30
node 4
This is an SPP version of the example first presented in
Persistent Route Oscillations in Inter-Domain Routing. Kannan Varadhan, Ramesh Govindan,
and Deborah Estrin. Computer Networks, Jan. 2000
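The oscillation can be reproduced with a toy simulation of the three nodes whose paths are listed (1: 130, 10; 2: 210, 20; 3: 320, 30). The round-robin activation schedule and the helper names are assumptions made for illustration; real BGP timing is far less regular.

```python
NULL = ()  # the null path

permitted = {
    1: [(1, 3, 0), (1, 0)],
    2: [(2, 1, 0), (2, 0)],
    3: [(3, 2, 0), (3, 0)],
}


def best_choice(u, state):
    """Highest-ranked permitted path of u consistent with the current state."""
    for path in permitted[u]:
        rest = path[1:]
        if rest == (0,) or state.get(rest[0]) == rest:
            return path
    return NULL


def is_stable(state):
    return all(best_choice(u, state) == state[u] for u in permitted)


def simulate(max_rounds=20):
    """Activate nodes 1, 2, 3 in turn; stop on a stable state or a repeated state."""
    state = {u: NULL for u in permitted}
    seen = []
    for _ in range(max_rounds):
        for u in sorted(permitted):
            state[u] = best_choice(u, state)
        if is_stable(state):
            return "stable", state
        snapshot = tuple(sorted(state.items()))
        if snapshot in seen:              # same state after a full round: a cycle
            return "oscillates", state
        seen.append(snapshot)
    return "undecided", state


outcome, final_state = simulate()
```

Under this schedule the state after round three equals the state after round one, so the run reports `"oscillates"` and never reaches a stable assignment.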
SURPRISE!
[Figure: permitted paths, most preferred first]
node 1: 130, 10
node 2: 210, 20
node 3: 3420, 30
node 4: 40, 420, 430
Becomes a BAD GADGET if link (4, 0) goes down.
BGP is not robust: it is not guaranteed to recover from network failures.
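The effect of the failure can be checked by brute force. In this sketch (hypothetical helper names, paths written as tuples, null path `()`), `without_link` simply drops every permitted path that crosses the failed link, and `solutions` enumerates all stable assignments.

```python
from itertools import product

NULL = ()  # the null path


def best_choice(u, assignment, permitted):
    """Highest-ranked permitted path of u consistent with its neighbours' paths."""
    for path in permitted[u]:
        rest = path[1:]
        if rest == (0,) or assignment[rest[0]] == rest:
            return path
    return NULL


def solutions(permitted):
    """Enumerate every assignment in which each node holds its best choice."""
    nodes = sorted(permitted)
    options = [permitted[u] + [NULL] for u in nodes]
    found = []
    for combo in product(*options):
        assignment = dict(zip(nodes, combo))
        if all(best_choice(u, assignment, permitted) == assignment[u] for u in nodes):
            found.append(assignment)
    return found


def without_link(permitted, a, b):
    """Drop every permitted path that traverses the link {a, b}."""
    def uses(path):
        return any({x, y} == {a, b} for x, y in zip(path, path[1:]))
    return {u: [p for p in paths if not uses(p)] for u, paths in permitted.items()}


surprise = {
    1: [(1, 3, 0), (1, 0)],
    2: [(2, 1, 0), (2, 0)],
    3: [(3, 4, 2, 0), (3, 0)],
    4: [(4, 0), (4, 2, 0), (4, 3, 0)],
}

before = solutions(surprise)
after = solutions(without_link(surprise, 4, 0))
```

With the link in place there is exactly one solution (1: 130, 2: 20, 3: 30, 4: 40); once the link (4, 0) is removed, `after` is empty.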
PRECARIOUS
[Figure: PRECARIOUS; permitted paths, most preferred first]
node 1: 120, 10
node 2: 210, 20
node 3: 310, 3120
node 4: 4310, 453120, 43120
node 5: 5310, 563120, 53120
node 6: 6310, 643120, 63120
Nodes 1 and 2: As with DISAGREE, this part has two distinct solutions
Nodes 3 to 6: This part has a solution only when node 1 is assigned the direct path (1 0).
Has a solution, but path vector may not find it!
A Sufficient Condition for
Robustness
$P \preceq Q$ : transitive closure of (subpath relation on permitted paths union the path ranking relation at each node)
Partially Partially Ordered (PPO):
For all paths $P$ and $Q$, $P \preceq Q$ and $Q \preceq P$ implies ($P = Q$ or $\mathrm{head}(P) = \mathrm{head}(Q)$)
This is a sufficient condition for robustness
PPO iff ranking functions can be rewritten to be strictly increasing along all paths
Checking PPO at the “language level” is an NP-Complete problem
Why is BGP not causing more trouble?
If the provider/customer digraph is acyclic and every AS obeys the commandments
• Thou shall prefer customer routes over all others
• Thou shall use provider routes only as a last resort
• Thou shall not provide transit between peers or providers
then the BGP configuration is robust.
[see Gao-Rexford and Gao-Griffin-Rexford]
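The three commandments translate into a preference rule and an export rule. In this sketch the local preference numbers are hypothetical (only their order matters), and `may_export` is an illustrative name.

```python
CUSTOMER, PEER, PROVIDER = "customer", "peer", "provider"

# Hypothetical values: customer routes first, provider routes as a last resort.
LOCAL_PREF = {CUSTOMER: 100, PEER: 90, PROVIDER: 80}


def local_pref(learned_from):
    """Rank a route by the kind of neighbour it was learned from."""
    return LOCAL_PREF[learned_from]


def may_export(learned_from, export_to):
    """No transit between peers or providers.

    Customer routes may go to any neighbour; routes learned from a peer or a
    provider may only be passed on to customers.
    """
    return learned_from == CUSTOMER or export_to == CUSTOMER
```

For example, `may_export(PEER, PROVIDER)` is false, while `may_export(PROVIDER, CUSTOMER)` and `may_export(CUSTOMER, PEER)` are both true.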
Hierarchical BGP (HBGP)
HBGP + PEER + BU
HBGP + BU
HBGP + PEER
HBGP
[GR2000, GGR2001]
Can BGP be fixed?
• BGP policy languages have evolved organically
• A policy language really should be designed!
• But how?
Joint work with Aaron Jaggard (UPenn Math) and Vijay Ramachandran (Yale CS), to appear at SIGCOMM 2003
Design Dimensions
• Robustness (required!)
• Transparency (required!)
• Expressive Power
• Autonomy (“local wiggle room”)
• Local vs. Global Constraints
• Policy Opacity
Tradeoffs galore
General Autonomy
Suppose C and K are any predicates that partition all routes.
Then it is possible to write policies, with no inbound filtering,
such that for all imported routes, those that satisfy C are ranked
below those that satisfy K.
A Partial Order for the Design Space
$J$ : Global Constraint, $L$ : Local Constraint
$(J_1, L_1) < (J_2, L_2)$ if and only if for all $S$ : SPP
1. $J_2(S)$ implies $J_1(S)$
2. $L_1(S)$ implies $L_2(S)$
Robust Designs
( J, L ) is a robust design if (J and L) implies PPO
Examples:
( True, SP )
( PPO, True )
Expressive Power
[Figure: the Robust Subspace, with axes Expressive Power and Constraint Simplicity]
( PPO, True ): Not tractable
( True, SP ): Tractable
Need Global Constraints
Theorem: Any robust system supporting both transparency and autonomy must have a non-trivial global constraint
Global constraints must be a part of design from the start
Next?
• Need techniques for constructing policy languages.
• Design of protocols to enforce global constraints.
• Can ad-hocery be avoided?