L10 - Inter-domain routing 1 (Intro to BGP)

advertisement
CPS-356
Network Layer:
Inter-domain Routing
Theophilus Benson
Based partly on lecture notes by Xiaowei Yang, Rodrigo Foncesa, Rob Sherwood, David Mazières, Phil Levis, John Jannotti
Administrivia
• Assignment #2 is out
– You’ll be making your own router
• Implementing routing: RIP
• Implementing forwarding: IP-forwarding
– Due in about three weeks (March 02, 2015)
• Groups of 2
Today
• Inter-Domain Routing (EGP)
• BGP
– Implementation details
• BGP Challenges
Why study BGP?
• Critical protocol: makes the Internet run
– Only widely deployed EGP
• Active area of problems!
–
–
–
–
–
Efficiency
Cogent vs. Level3: Internet Partition
Spammers use prefix hijacking
Pakistan accidentally took down YouTube
Egypt disconnected for 5 days
Why Inter vs. Intra
• Why not just use OSPF everywhere?
– E.g., hierarchies of OSPF areas?
– Hint: scaling is not the only limitation
• BGP is a policy control and information
hiding protocol
–
–
–
–
Autonomous System (AS) owner of a network
intra == trusted, inter == untrusted
Different policies by different ASs
Different costs by different ASs
Types of ASs
• Local Traffic – source or destination in local AS
• Transit Traffic – passes through an AS
• Stub AS
– Connects to only a single other AS
• Multihomed AS
– Connects to multiple ASs
– Carries no transit traffic
• Transit AS
– Connects to multiple ASs and carries transit traffic
AT&T
(Transit)
Comcast
(Stub)
Abilene
(Transit)
BGP
All ASes are not equal
Duke
Cogent
(Transit)
Multihomed
AS relationships
• Very complex economic landscape
• Simplifying a bit:
– Transit: “I pay you to carry my packets to
everywhere” (provider-customer)
– Peering: “For free, I carry your packets to my
customers only.” (peer-peer)
• Technical definition of tier-1 ISP: In the
“default-free” zone. No transit.
– Note that other “tiers” are marketing, but convenient.
“Tier 3” may connect to tier-1.
Zooming in 4x
Tier 1 ISP
Tier 1 ISP
Default free,
Has information on
every prefix
Default: provider
$$
$$
Tier 2
Regional
Tier 2: Regional/National
Tier 2
Tier 2
$$
Tier 3: Local
Tier 3
(local
)
Who pays whom?
• Transit: Customer pays the provider
– Who is who? Usually, the one who can “live
without” the other. AT&T does not need Duke, but
Duke needs some ISP.
• What if both need each other? Free
Peering.
– Instead of sending packets over $$ transit, set up
a direct connection and exchange traffic for free!
– http://vijaygill.wordpress.com/2009/09/08/peeringpolicy-analysis/
• Tier 1s must all peer with each other by
definition
– Tier 1s form a full mesh Internet core
• Peering can give:
– Better performance
– Lower cost
– More “efficient” routing (keeps packets local)
• But negotiating can be very tricky!
Tussles between Business and Peering
• Cooperative competition (competition)
• Much more desirable to have your peer’s customers
– Much nicer to get paid for transit
• Peering “tiffs” are relatively common
Peering Drama
• Cogent vs. Level3 were peers
• In 2003, Level3 decided to start charging
Cogent
• Cogent said no
• Internet partition: Cogent’s customers couldn’t
get to Level3’s customers and vice-versa
– Other ISPs were affected as well
• Took 3 weeks to reach an undisclosed
agreement
How do you choose a routing
protocol for the WAN?
• Constraints
– Respect $$$$$$
• Autonomy (policy and privacy)
– Scaling
Sometimes you need to lie.
Tier 1 ISP
Tier 1 ISP
Default free,
Has information on
every prefix
Default: provider
Tier 2
Regional
$$
$$
$$
Tier 2
$$
Tier 2
$$
$$
Tier 3
(local)
Tier 3
(local)
Choice of Routing Algorithm
• Link-state?
– Requires sharing of complete information
– Information exchange does not scale
• Hence areas ….
– Can’t express policy
• Distance Vector?
– Scales and retains privacy
– Can’t implement policy
– Can’t avoid loops if shortest path not taken
• Hence Count-to-infinity ..
Path Vector Protocol
• Distance vector algorithm with extra
information
– For each route, store the complete path (ASs)
– No extra computation, just extra storage (and
traffic)
• Advantages
– Can make policy choices based on set of ASs in
path
– Can easily avoid loops
BGP
BGP - High Level
• Single EGP protocol in use today
• Abstract each AS to a single node
• Destinations are CIDR prefixes
• Exchange prefix reachability with all neighbors
– E.g., “I can reach prefix 128.148.0.0/16 through ASes
44444 3356 14325 11078”
• Select a single path by routing policy
• Critical: learn many paths, propagate one
– Add your ASN to advertised path
Terms
• Route: a network prefix plus path attributes
152.2.3.4
AS 1
AS 2
AS 3
• Customer/provider/peer routes:
– Route advertisements heard from
customers/providers/peers
• Transit service: If A advertises a route to B, it
implies that A will forward packets coming from B
to any destination in the advertised prefix
152.3/16
152.3/16
Duke
UNC
NC
RegNet
152.2.3.4
152.3.137.179
BGP: Distance Vector+Paths
Autonomous Systems (ASes)
Route Advertisement
Traffic
Session (over
TCP)
BGP peers
BGP: Distance Vector+Paths
Autonomous Systems (ASes)
Why should I let
others know of the
Route Advertisements?
Route Advertisement
Traffic
Which of two paths to use?
Enforcing relationships
• Two mechanisms
– Route export filters
• Control what routes you send to neighbors
– Route import ranking
• Controls which route you prefer of those you hear.
• “LOCALPREF” – Local Preference. More later.
Export Policies
• Provider  Customer
– All routes so as to provide transit service
• Customer  Provider
– Only customer routes
– Why?
– Only transit for those that pay
• Peer  Peer
– Only customer routes
Sometimes you need to lie.
Tier 1 ISP
Tier 1 ISP
Default free,
Has information on
every prefix
Default: provider
Tier 2
Regional
$$
$$
$$
Tier 2
$$
Tier 2
$$
$$
Tier 3
(local)
Tier 3
(local)
Import policies
• Same routes heard from providers,
customers, and peers, whom to choose?
– customer > peer > provider
– Why?
– Choose the most economic routes!
• Customer route: charge $$ 
• Peer route: free
• Provider route: pay $$ 
Valley Free Routing
• Number links as(+1,0,-1) for provider,
peer and customer
– In any valid path should only see sequence of+1
, followed by at most one 0, followed by
sequence of -1 – Why?
• Consider the economics of the situation
Sometimes you need to lie.
0
Tier 1 ISP
Default free,
Has information on
every prefix
Default: provider
Tier 2
Regional
Tier 1 ISP
$$
$$
+1$$
Tier 2
$$
-1
Tier 2
$$
+1$$
Tier 3
(local)
Tier 3
(local)
Sometimes you need to lie.
0
Tier 1 ISP
Default free,
Has information on
every prefix
Default: provider
Tier 2
Regional
Tier 1 ISP
$$
$$
+1$$
Tier 2
$$
-1
-1
Tier 2
$$
+1$$
Tier 3
(local)
Tier 3
(local)
BGP Implementation Details
BGP
• BGP = Border Gateway Protocol
– Currently in version 4, specified in RFC 1771. (~ 60 pages)
• Inter-domain routing protocol for routing between
autonomous systems
• Uses TCP to establish a BGP session and to send
routing messages over the BGP session
• BGP is a path vector protocol
– Similar to distance vector routing, but routing messages in BGP
contain complete paths
• Network administrators can specify routing policies
BGP policy routing
• BGP’s goal is to find any path (not an
optimal one)
– Since the internals of the AS are never revealed,
finding an optimal path is not feasible
• Network administrator sets BGP’s
policies to determine the “best path” to
reach a destination network
BGP messages
– OPEN
– UPDATE
• Announcements
– Dest Next-hop AS Path … other attributes …
– 128.2.0.0/16 196.7.106.245 2905 701 1239 5050 9
• Withdrawals
– KEEPALIVE
• Keepalive timer / hold timer
• Key thing: The Next Hop attribute
Path Vector
• ASPATH Attribute
– Records what ASes a route goes through
– Loop avoidance: Immediately discard
– Shortest path heuristics
• Like distance vector, but fixes the countto-infinity problem
I can reach d via B,D
A
B
D
I can reach d
Via A,B,D
d
C
I can reach d
Via C,A,B,D
Common BGP path attributes
• Origin: indicates how BGP learned about a particular route
– IGP (internal gateway protocol)
– EGP (external gateway protocol)
– Incomplete
• AS path :
– When a route advertisement passes through an autonomous system,
the AS number is added to an ordered list of AS numbers that the route
advertisement has traversed
• Next hop
• Multi_Exit_Disc (MED, multiple exit discriminator):
-used as a suggestion to an external AS regarding the preferred route
into the AS
• Local_pref: is used to prefer an exit point from the local
autonomous system
• Community: apply routing decisions to a group of destinations
BGP route selection process
Routes sent
to peers
Routes recved from peers
Input
Policy
Engine
Decision
process
Best
routes
Out
Policy
Engine
• Input/output engine may filter routes or
manipulate their attributes
Decision Engine:
Best path selection algorithm
1.
If next hop is inaccessible, ignore routes
2.
Prefer the route with the largest local preference value.
3.
If local prefs are the same, prefer route with the shortest AS
path
4.
If AS_path is the same, prefer route with lowest origin (IGP <
EGP < incomplete)
5.
If origin is the same, prefer the route with lowest MED
6.
IF MEDs are the same, prefer eBGP paths to iBGP paths
7.
If all the above are the same, prefer the route that can be
reached via the closest IGP neighbor.
8.
If the IGP costs are the same, prefer the router with lowest
router id.
Some BGP Challenges
• Convergence
• Traffic engineering: Load Balancing
– How to assure certain routes are selected
• Scaling (route reflectors)
• Security (Next class)
Convergence
• Some reasons for change
–
–
–
–
Topology changes
BGP session failures
Changes in policy
Conflicts between policies can cause oscillation
• Given a change, how long until the network
re-stabilizes?
–
–
–
–
Depends on the change: sometimes never
Depends on the topology
Depends on the time of updates
Open research problem: “tweak and pray”
Change: D goes down.
I can reach d via E, B,D
E
I can reach d via B,D
I can reach d
Via A,E,B,D
I can reach d
IVia
canA,E,B,D
reach d
Via A,B,D
A
B
D
d
C
I can reach d
Via C,A,B,D
I can reach d
Via C,A,E,B,D
Avoiding BGP Instabilities
• Detecting conflicting policies
– Centralized: NP-Complete problem!
– Distributed: open research problem
– Requires too much cooperation
• Detecting oscillations
– Monitoring for repetitive BGP messages
• Restricted routing policies and topologies
– Some topologies / policies proven to be safe*
* Gao & Rexford, “Stable Internet Routing
without Global Coordination”, IEEE/ACM ToN, 2001
Some BGP Challenges
• Convergence
• Traffic engineering: Load Balancing
– How to assure certain routes are selected
• Scaling (route reflectors)
• Security (Next class)
Traffic Engineering: Load balancing
• Same route from two providers
• Outbound is “easy” (you have control)
– Set localpref according to goals
• Inbound is tough (nobody has to listen)
– AS path prepending
– MEDs
• Hot and Cold Potato Routing (picture)
• Often ignored unless contracts involved
• Practical use: tier-1 peering with a content provider
Hot-Potato Routing (early exit)
Foo
12/8
SF
12/8
AT&T
NYC
12.0.0.1
12//8
SF
Sprint
12/8
Bar
NYC
Why is Hot-Potato Bad?
• Forward and reverse paths have different
properties
– Different latencies and loss rates affect TCP
– inaccuracies in measurements of the forwarding
system
• packet loss due to forwarding loops
Cold-Potato Routing (MED)
SF
NYC
Akamai
Med=100
Med=200
SF
Sprint
NYC
Other Traffic Engineering
• Want to:
– Helps Improve revenue
• Route filtering
– Avoid Hot-potato routing?
• Setting MED weights
– Avoid other ISPs using you to get to ISP X?
• AS prepending: “477 477 477 477”
• People prefer shortest routes
• More of an art than science
Some BGP Challenges
• Convergence
• Traffic engineering: Load Balancing
– How to assure certain routes are selected
• Scaling (route reflectors)
• Security (Next class)
Routing table scalability with Classful IP
Addresses
• Fast growing routing table size
• Classless inter-domain routing aims to
address this issue
CIDR hierarchical address
allocation
ISP
128.1.0.0/16
128.2.0.0/16
128.0.0.0/8
128.195.0.0/16
University
Foo.com
Bar.com
Library
128.195.1.0/24
•
•
•
•
•
128.195.4.150
CS
128.195.4.0/24
IP addresses are hierarchically allocated.
An ISP obtains an address block from a Regional Internet
Registry
An ISP allocates a subdivision of the address block to an
organization
An organization recursively allocates subdivision of its address
block to its networks
A host in a network obtains an address within the address block
assigned to the network
CIDR hierarchical address
allocation
ISP
128.1.0.0/16
128.0.0.0/8
128.195.0.0/16
128.2.0.0/16
University
Foo.com
Bar.com
Library
128.195.1.0/24
• ISP obtains an address block
128.195.4.150
CS
128.195.4.0/24
– 128.0.0.0/8  [128.0.0.0, 128.255.255.255]
• ISP allocates
–
128.195.0.0/16 ([128.195.0.0, 128.195.255.255]) to the university.
• University allocates
– 128.195.4.0/24 ([128.195.4.0, 128.195.4.255]) to the CS department’s
network
• A host on the CS department’s network gets one IP
address 128.195.4.150
CIDR allows route aggregation
You can reach 128.0.0.0/8 via ISP1
128.1.0.0/16
Foo.com
ISP3
ISP1
128.2.0.0/16
I
128.0.0.0/8
128.0.0.0/8 ISP1
128.195.0.0/16
Bar.com
University
Library
CS
• ISP1 announces one address prefix
128.0.0.0./8 to ISP2
• ISP2 can use one routing entry to reach
all networks connected to ISP1
Multi-homing increases routing
table size
204.0.0.0/8
ISP2
You can reach 128.0.0.0/8
And 204.1.0.0/16 via ISP1
ISP1
ISP2
128.0.0.0/8
204.0.0.0/8
128.0.0.0/8
204.1.0.0/16
Mutil-home.com
204.1.0.0/16
ISP3
ISP1
204.1.0.0/16 ISP1
Global routing tables continue to
grow (1994-now)
BGP Summary
• BGP uses path vector algorithm
– Intra Versus Inter-domain
• Challenges with BGP
– Scalability
– Security
– Convergence
• Policy is mostly determined by economic
considerations
– Valley Free routing
Download