Deploying and Troubleshooting BGP Networks
CCIE’00 Paris
© 2000, Cisco Systems, Inc.

Agenda
• Basics
• Peering
• Attributes and Route Selection Algorithm
• Prefix Generation and Aggregation
• Soft Reconfiguration
• Internal mesh reduction
• MP-BGP

Basics

Autonomous system
[Figure: routers A to E spread across AS 123, AS 456 and AS 678]
• Collection of networks under a single technical administration
• Range: 1 to 65,535 (private: 64512 to 65534)

Autonomous systems
• Stub AS
[Figure: two stub ASs attached to an ISP]

Autonomous systems
• Multihomed Nontransit AS
[Figure: AS 1 connected to AS 2 and AS 3]

Autonomous systems
• Multihomed Transit AS
[Figure: AS 1 connected to AS 2 and AS 3]

BGP session
• BGP session established on top of TCP (port 179)
• Reliable transport layer
• TCP needs a routing layer (IGP)

BGP table
[Figure: IGP and BGP both feeding the FIB]
• BGP uses a database (BGP table)
• Databases are exchanged after session setup
• Incremental updates afterwards

Generalities
• BGP supports CIDR
• NLRI: Network Layer Reachability Information
    Information carried and exchanged by BGP

iBGP vs eBGP
• eBGP is used to exchange NLRI between Autonomous Systems
• iBGP is used to carry NLRI within the Autonomous System
• A BGP router has internal and/or external neighbors

iBGP vs eBGP
[Figure: AS 1 and AS 2, showing the eBGP session between them and the iBGP sessions inside them]

General operation
• Learns multiple paths via internal and external BGP speakers
• Picks THE best path and installs it in the IP forwarding table
• Policies applied by influencing the best path selection
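
The AS number ranges quoted on the Autonomous system slide can be turned into a quick check. This is only an illustrative sketch: the function name `is_private_asn` is an assumption, and it covers just the 2-byte range stated on the slide.

```python
def is_private_asn(asn: int) -> bool:
    """Return True if asn falls in the private range quoted on the slide."""
    # The slide gives the valid range as 1 to 65,535.
    if not 1 <= asn <= 65535:
        raise ValueError(f"AS number outside the 2-byte range: {asn}")
    # Private range per the slide: 64512 to 65534 inclusive.
    return 64512 <= asn <= 65534


if __name__ == "__main__":
    for asn in (123, 64512, 65534, 65535):
        print(asn, "private" if is_private_asn(asn) else "not private")
```

A check like this is handy when deciding whether `remove-private-AS` style handling is even relevant for a given neighbor.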

General operation
• BGP speaker advertises only the routes that it uses itself
    “hop-by-hop” routing paradigm
• Reliable Transport Protocol
    no need to implement fragmentation, retransmission, ACKs and sequencing
    assumes a “graceful” close: all outstanding data will be delivered

Information Transfer
• From eBGP -> advertise to all
• From iBGP -> advertise only to eBGP
    full iBGP mesh is required!!
• Propagate ONLY the best path

When should you use BGP?
• Most appropriate for
    Multihomed transit and non-transit AS
    Scaling large networks
    Deploying new IP-VPN services (MBGP)
• Not appropriate on stub AS (static route instead)

Peering

Peers
[Figure: routers A to E in AS 100, AS 101 and AS 102]

BGP message types
• OPEN
• UPDATE
• NOTIFICATION
• KEEPALIVE
• size: 19 to 4096 octets

Open message
Fields (drawn 4 bytes wide on the slide):
    Version
    My autonomous system
    Hold Time
    BGP identifier
    Opt param Len
    Optional parameters
Hold time = max time (in seconds) that may elapse between the receipt of successive UPDATE or KEEPALIVE packets. Negotiated when the session starts.

Notification message
Fields: Error code, Error subcode, Data
    Error code                      Error subcode
    1 - Message header error        1: Connection not synchronized
                                    2: Bad message length
                                    3: Bad message type
    2 - OPEN message error          1: Unsupported version number
                                    2: Bad peer AS
                                    3: Bad BGP identifier
                                    4: Unsupported optional parameter
                                    5: Authentication error
                                    6: Unacceptable hold time
    3 - UPDATE message error        1: Malformed attribute list
                                    2: Unrecognised well-known attribute
                                    3: Missing well-known attribute
                                    (…)
    4 - Hold timer expired          NA
    5 - Finite state machine error  NA
    6 - Cease                       NA

Update message
Fields (drawn 4 bytes wide on the slide):
    Unfeasible Routes Length
    Withdrawn routes (variable length): the unreachable routes
    Total Path Attribute Length
    Path Attributes (variable length)
    NLRI: a list of Length, Prefix (variable) pairs (…)

Path Attributes
• 4 categories:
    Well-known mandatory (ex: AS_PATH, next-hop, origin)
    Well-known discretionary (ex: local pref)
    Optional transitive: should be passed along even if not supported (ex: community, aggregator)
    Optional nontransitive (ex: MED)

Keepalive message
• 19-byte BGP header with no data
• Periodically exchanged
• Hold time = max time between successive Keepalive and Update messages

Neighbor negotiation’s finite state machine
States: Idle (START), Connect, Active, OpenSent, OpenConfirm, Established (GOAL)

Neighbor negotiation’s finite state machine: Idle
• Start event (including a reset): Idle -> Connect

Neighbor negotiation’s finite state machine: Connect
• BGP is waiting for the transport session to start
• TCP session successful: Connect -> OpenSent
• TCP session not OK: Connect -> Active
• Connect retry timer expires -> new TCP session

Neighbor negotiation’s finite state machine: Active
• BGP tries to establish a TCP session and listens for other potential peers
• TCP session successfully established: Active -> OpenSent
• Connect retry timer expires: Active -> Connect
• Troubleshooting tip: a neighbor state flip-flopping between Connect and Active indicates a problem with the TCP session. Use extended ping to check.

Neighbor negotiation’s finite state machine: OpenSent
• Open message sent. BGP waits for the neighbor’s Open message
• If the Open message is OK, send a Keepalive: OpenSent -> OpenConfirm
• In case of error (ex: bad version) -> Notification message sent: OpenSent -> Idle
• If TCP disconnect received: OpenSent -> Active

Neighbor negotiation’s finite state machine: OpenConfirm
• BGP waits for a Keepalive
• Keepalive received: OpenConfirm -> Established
• Notification message received: OpenConfirm -> Idle

Neighbor negotiation’s finite state machine: Established
• Sends periodic Keepalives
• If a Notification message is received or sent: Established -> Idle

eBGP Peering
• BGP speakers in different AS
• Should be directly connected
• Configuration:
    Router B
    router bgp 110
     network 150.10.0.0
     neighbor 131.108.10.1 remote-as 109

    Router A
    router bgp 109
     network 131.108.0.0
     neighbor 131.108.10.2 remote-as 110
[Figure: A (AS 109, 131.108.0.0/16) at .1 and B (AS 110, 150.10.0.0/16) at .2 on 131.108.10.0/24]

eBGP Peering
• Non directly connected neighbors -> ebgp-multihop
• Configuration:
    Router B
    router bgp 110
     neighbor 131.108.10.1 remote-as 109
     neighbor 131.108.10.1 update-source ethernet 0

    Router A
    router bgp 109
     neighbor 150.10.0.1 remote-as 110
     neighbor 150.10.0.1 ebgp-multihop
    !
    ip route 150.10.0.1 255.255.255.255 131.108.10.2
[Figure: A (AS 109, 131.108.0.0/16) at .1 and B (AS 110) at .2 on 131.108.10.0/24; B is .1 on 150.10.0.0/16]

iBGP Peering
• BGP speakers in same AS
• Use loopback interfaces -> update-source loopback 0
• Configuration:
    Router B
    router bgp 123
     neighbor 131.108.10.1 remote-as 123
     neighbor 131.108.10.1 update-source loopback 0

    Router A
    router bgp 123
     neighbor 10.0.0.2 remote-as 123
[Figure: AS 123 with A at .1 and B at .2 on 131.108.10.0/24; B has loopback 10.0.0.2/32]

Load Balancing across parallel links
• Use of <ebgp-multihop>
• Use the loopback on both routers
• Define IGP between the loopback interfaces in DMZ
• Configuration:
    router bgp 201
     neighbor x.x.x.x remote-as ISP-AS
     neighbor x.x.x.x update-source loopback0
     neighbor x.x.x.x ebgp-multihop
    !
    ip route x.x.x.x 255.255.255.255 next-hop0/1
    ip route x.x.x.x 255.255.255.255 next-hop0/2
[Figure: AS 201 connected to the ISP over parallel links]

Typical issue with eBGP multihop
• Use specific static routes
    ex: ip route x.x.x.x 255.255.255.255 nexthop0/1
• Without a specific static route, you could end up learning via BGP a better prefix (longer match) for reaching the neighbor
    -> Session restarts continuously
[Figure: AS 201 connected to the ISP]

MultiPath Support
• Router peering with multiple routers in neighboring AS
• Install multiple routes in IP routing table
• Routes should be identical
• Next-hop is set to self (use loopback interface)
[Figure: router A in AS 201 peering with ISP routers D and F]

MultiPath Support (Cont.)
• Configuration:
    router bgp 201
     neighbor 141.153.12.1 remote-as 2
     neighbor 141.153.17.2 remote-as 2
     maximum-paths 2
• <sh ip route>
    B 144.10.0.0/16 [20/0] via 141.153.12.1, 00:03:29
                    [20/0] via 141.153.17.2, 00:03:29
[Figure: router A in AS 201 peering with ISP routers D and F]

Summary: Typical Peering issues
• Extended ping fails -> IGP issue
• Update source missing
• No directly connected route to neighbor (eBGP) + forgot ebgp-multihop
• ebgp-multihop but wrong (or not specific enough) static route to neighbor

Attributes and Route Selection Algorithm

BGP Attributes
Legend: WKM = well-known mandatory, WKD = well-known discretionary, OT = optional transitive, ONT = optional nontransitive
• AS-path (WKM)
• Next-hop (WKM)
• Origin (WKM)
• Local preference (WKD)
• Atomic aggregate (WKD)
• Aggregator (OT)
• Community (OT)
• Multi Exit Discriminator (MED) (ONT)

Synchronization
“In a transit network, a route learned from an external peer should not be advertised to other eBGP peers until all the routers in the local AS have learned about it.”

Synchronization
[Figure: AS 690, AS 1880 (routers A and B) and AS 209]
• Rtr A won’t advertise the prefixes from AS209 until the IGP converges.
• Turn synchronization off!
    next-hop has to be known via IGP
    router bgp 1880
     no sync

Synchronization
[Figure: AS 690, AS 1880 (routers A, B and C) and AS 209]
• Rtr A won’t advertise the prefixes from AS209 until the IGP converges.
• Solutions:
    redistribute into IGP (NOT!)
    run BGP in rtr B

no synchronization
• Why?
    not a transit network
    all routers in transit path run BGP
• Advantages
    carry fewer routes in IGP
    BGP converges faster

NEXT_HOP
• The next hop to reach a network
    eBGP: IP address of the peer
    iBGP: NEXT_HOP advertised by eBGP
• IGP should carry route to NEXT_HOPs
• Recursive route lookup
    Unlinks BGP from the physical topology
    Allows IGP to make intelligent forwarding decision
• Unreachable next-hop -> route not used
[Figure: A (AS 109, 131.108.0.0/16) at .1 and B (AS 110, 150.10.0.0/16) at .2 on 131.108.10.0/24]

Third-Party NEXT_HOP
• Example:
    A and B are in the same AS
    Router A will advertise 192.68.1.0/24 with a NEXT_HOP of 150.1.1.3
• More efficient!
[Figure: AS 200 and AS 201; C (150.1.1.1), A (150.1.1.2) and B (150.1.1.3) on a shared segment, with 192.68.1.0/24 behind B]

Third-Party NEXT_HOP
• Use of <next-hop-self>
• Example:
    A and B are in the same AS
    Router A will advertise 150.10.0.0 with a NEXT_HOP of 131.108.10.1, but router C can’t reach the next-hop!!
• Configuration (rtr A):
    router bgp 109
     network 150.10.0.0
     neighbor 131.108.10.3 next-hop-self
[Figure: routers A (.2), B and C (.1, .3) on the Frame Relay network 131.108.10.0, with 150.10.0.0 behind it]

Override Third-Party Next-Hop
• Alternative to configuring a specific IP address to be the next-hop for BGP routes
• Syntax (route-map command):
    set ip next-hop peer-address

Override Third-Party Next-Hop (Cont.)
• set ip next-hop: best used on an outbound route-map
• Be careful when manipulating next-hop and default routes. Routing loops can occur!
    Solution: good network design
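
The NEXT_HOP slides say that a BGP route is usable only when its next hop can be resolved recursively through the IGP. The sketch below illustrates that idea in Python under simplifying assumptions: the table layout, the function names and the interface name `Serial0` are all hypothetical, and a real router's lookup is far more involved.

```python
import ipaddress


def longest_match(table, address):
    """Return the most specific (network, entry) covering address, or None."""
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, entry in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, entry)
    return best


def resolve_next_hop(table, bgp_next_hop):
    """Follow 'via' entries until a connected route is found.

    Returns the outgoing interface name, or None when the next hop is
    unreachable (in which case the BGP route would not be used).
    """
    seen = set()
    match = longest_match(table, bgp_next_hop)
    while match is not None:
        net, entry = match
        if entry["type"] == "connected":
            return entry["interface"]
        if net in seen:  # guard against a recursive loop in the table
            return None
        seen.add(net)
        match = longest_match(table, entry["via"])
    return None


# Hypothetical non-BGP routing table, loosely modelled on the multihop slide.
TABLE = {
    "131.108.10.0/24": {"type": "connected", "interface": "Serial0"},
    "150.10.0.1/32": {"type": "static", "via": "131.108.10.2"},
}

if __name__ == "__main__":
    print(resolve_next_hop(TABLE, "150.10.0.1"))
    print(resolve_next_hop(TABLE, "192.0.2.1"))
```

The second lookup finds no covering route, which corresponds to the "unreachable next-hop -> route not used" case.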

WEIGHT
• Cisco specific (sort of a router-internal local preference)
• Local to the router
    Not propagated
• Value: 0 - 65535
• Default:
    originated locally = 32768
    other = 0

LOCAL_PREF
• Indication of preferred path to exit the local AS
• Global to the local AS
• Paths with highest LOCAL_PREF are most desirable (default = 100)
    bgp default local-preference value

LOCAL_PREF (Cont.)
• Configuration (rtr A):
    router bgp 109
     neighbor x.x.x.x remote-as 1880
     neighbor x.x.x.x route-map foo in
    !
    route-map foo permit 10
     match as-path 2
     set local-preference 120
    !
    ip as-path access-list 2 permit ^1880_
[Figure: AS 690, AS 666, AS 1755 and AS 1880 around router A, which needs to go to 690]

AS_PATH
• AS_PATH contains the list of ASs the update had to traverse.
• AS_PATH is updated by the sending router with its own AS number.
• BGP uses the AS_PATH to detect routing loops.

AS_PATH
• Each time the router receives an eBGP update it checks the AS_PATH.
• If it finds its own AS number in the AS_PATH, the update is discarded.

AS_PATH
[Figure: router A in AS 1880 (141.253.10.0/24), router B in AS 690, router C in AS 200]
1. Router A sends update for 141.253.10.0/24 with AS_PATH: 1880
2. Router B sends update for 141.253.10.0/24 with AS_PATH: 690 1880
3. Router C sends update for 141.253.10.0/24 with AS_PATH: 200 690 1880
4. Router A will detect its own AS number and will discard the update

AS_PATH manipulation
AS-PATH prepending
[Figure: "You" connected to the Internet through ISP 1 and ISP 2]
Problem: 80% of the incoming traffic comes from ISP 1
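
The loop-detection rule from the AS_PATH slides can be sketched in a few lines of Python, replaying the A, B, C example. The function names `advertise` and `accept` are assumptions made for illustration; the AS numbers are the ones used on the slide.

```python
def advertise(as_path, local_as):
    """Prepend the sender's own AS number before sending to an eBGP peer."""
    return [local_as] + list(as_path)


def accept(as_path, local_as):
    """Loop check: discard an eBGP update whose AS_PATH already holds our AS."""
    return local_as not in as_path


if __name__ == "__main__":
    path = advertise([], 1880)   # Router A (AS 1880) originates the prefix
    path = advertise(path, 690)  # Router B (AS 690) passes it on
    path = advertise(path, 200)  # Router C (AS 200) passes it on
    print(path)                  # the AS_PATH Router A would receive from C
    print(accept(path, 1880))    # Router A's loop check on that update
```

The same `advertise` helper, called several times with one AS number, models AS-PATH prepending: the path grows longer, so other ASs find it less attractive.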

AS_PATH manipulation
AS-PATH prepending
Solution:
    route-map prepend permit 10
     match as-path 2
     set as-path prepend 250 250
[Figure: AS 250 announces AS_PATH 250 250 250 toward ISP 1 and AS_PATH 250 toward ISP 2]

AS_PATH manipulation
Private-AS Removal
• neighbor x.x.x.x remove-private-AS
    available for eBGP neighbors only
    Update must have an AS_PATH made up exclusively of private AS numbers.
    Confederations: a private AS will be removed only if it comes after the confederation’s set of ASs
    remove-private-AS will not work if the private ASN you want to remove is the neighboring one!

Private-AS - Application
• Applications include:
    ISP with single-homed customers
    Scaling big corporate networks
[Figure: customer ASs 65001 (193.0.32.0/24), 65002 (193.0.33.0/24) and 65003 (193.2.35.0/24) attached to AS 1880 (193.1.34.0/24); router A advertises 193.1.32.0/22 with AS_PATH 1880]

Misc issue with AS_PATH
• Error message:
    %BGP-3-INSUFCHUNKS: Insufficient chunk pools for aspath
• Router keeps working fine!!!
• Appears when the router gets an update with AS_PATH > 50 ASs
• Since 12.0(11) and 12.1(2), only appears when AS_PATH > 125

ORIGIN
• Origin of the prefix
• Values:
    IGP (i) = via network command
    EGP (e) = learned from EGP
    incomplete (?) = redistribution

Multi-Exit Discriminator (MED)
• Indication (to external peers) of the preferred path into an AS
    used in multiple entry AS
    non-transitive
• Compared only for routes from the same AS
• Lower MED value is preferred

MED
[Figure: AS 690, AS 1755 (routers A and B), AS 1880 and AS 209]
• Configuration (rtr B):
    router bgp 1755
     neighbor x.x.x.x remote-as 1880
     neighbor x.x.x.x route-map set_MED out
    !
    route-map set_MED permit 10
     match as-path 2
     set metric 2
    !
    ip as-path access-list 2 permit _690$

MED & IGP Metric
• set metric-type internal
    enables BGP to advertise a MED which corresponds to the IGP metric values
    changes are monitored (and readvertised if needed) every 600s
    bgp dynamic-med-interval <secs>

MED Comparison
• MED is compared ONLY for prefixes received from the same AS (unless bgp always-compare-med is enabled)
• If the AS_PATH is made up of only confederation sub-ASs, its length is not considered AND the MED is not compared
• If an update is received with no MED, the router (by default) assigns it a value of 0

Community Attribute (rfc1997)
• Used to group destinations and apply a common policy
• Each prefix can belong to multiple communities
• Not propagated by default
    neighbor ip-address send-community

Community Attribute (Cont.)
• 32 bits long
    use 16 bits to indicate the ASN
    ip bgp-community new-format
    set community AS:community [additive]
    set community none
        erases all the values in the attribute
    set comm-list <number> delete
        erases selected communities

Well-Known Communities
• internet = all routes are members of this community
• no-export = do not advertise to eBGP peers
• no-advertise = do not advertise to any peer
• local-AS = do not advertise outside local AS (used with confederations)

No-Export Community
[Figure: AS 100 (routers A to D) sends 170.10.0.0/16 plus the more specific 170.10.X.X routes tagged No-Export to AS 200 (routers E, F, G); AS 200 passes on only 170.10.0.0/16]

Extended Community Attribute (draft-ramachandra-bgp-ext-communities-01)
• Extended range
    8 bytes (64 bits)
• Structure
    type:value
    Value may be of the form AS:xxx

BGP Path Selection
• 1 Only consider paths with reachable NEXT_HOPs
• 2 Do not consider iBGP path if not synchronized
• 3 Highest WEIGHT
• 4 Highest LOCAL_PREF
• 5 Prefer locally originated route
• 6 Shortest AS_PATH

BGP Path Selection
• 7 Lowest ORIGIN code: IGP < EGP < incomplete
• 8 Lowest Multi-Exit Discriminator (MED)
    8a IF bgp always-compare-med, then compare it for all paths
    8b Considered only if paths are from the same neighbor AS
• 9 Prefer an External path over an Internal one
• 10 Lowest IGP metric to the NEXT_HOP

BGP Path Selection (Cont.)
• 11 IF multipath is enabled, the router may install up to N parallel paths in the routing table
• 12 For eBGP paths, select the “oldest” to minimize route-flap
• 13 Lowest Router-ID
    Originator-ID is considered for reflected routes
• 14 Shortest Cluster-List
    Client must be aware of RR attributes!
• 15 Lowest neighbor IP address

Prefix Generation And Aggregation
Say what?!

<network> Command
• Networks originated by the local router
• Matching IGP route must exist
    dynamic or static entry in routing table
• Example:
    router bgp 109
     network 200.10.10.0
     network 198.10.0.0 mask 255.255.0.0
    !
    ip route 198.10.0.0 255.255.0.0 null 0

Redistribution
• From IGP
    Typically NOT a good thing!
• Static routes pointed to null0
• Example:
    router bgp 109
     redistribute static
    !
    ip route 198.10.0.0 255.255.0.0 null 0

Aggregate Addresses
• Combine different routes into one
• Advertised as coming from the local AS
• A component must exist in the BGP table

Aggregation Attributes
• Aggregator attribute
    Last AS number that formed the aggregate route
    IP address of the BGP speaker that formed the aggregate route
• Atomic Aggregate attribute
    indicates that a more specific route exists
    a BGP speaker receiving this attribute shall not remove it when propagating the route
• Useful for debugging. They don’t affect route selection.
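
The selection steps listed on the BGP Path Selection slides can be approximated as an ordered comparison. The sketch below is a deliberately simplified illustration, not Cisco's implementation: the `Path` fields and function names are assumptions, it covers only steps 1, 3 to 10 and 13, and it ignores synchronization, multipath, route age, cluster-list and neighbor address.

```python
from dataclasses import dataclass
from functools import reduce

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass
class Path:
    as_path: tuple = ()
    next_hop_reachable: bool = True
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    origin: str = "igp"
    med: int = 0            # a missing MED is treated as 0, as on the slides
    external: bool = True
    igp_metric: int = 0
    router_id: str = "0.0.0.0"


def _key(p, use_med):
    """Lower tuple wins; the fields follow the slide's step order."""
    return (
        -p.weight,                                      # 3 highest WEIGHT
        -p.local_pref,                                  # 4 highest LOCAL_PREF
        0 if p.locally_originated else 1,               # 5 locally originated
        len(p.as_path),                                 # 6 shortest AS_PATH
        ORIGIN_RANK[p.origin],                          # 7 lowest ORIGIN
        p.med if use_med else 0,                        # 8 lowest MED
        0 if p.external else 1,                         # 9 eBGP over iBGP
        p.igp_metric,                                   # 10 lowest IGP metric
        tuple(int(x) for x in p.router_id.split(".")),  # 13 lowest Router-ID
    )


def best_path(a, b, always_compare_med=False):
    """Pick the preferred of two paths."""
    # Step 8b: MED counts only when both paths come from the same neighbor AS.
    use_med = always_compare_med or a.as_path[:1] == b.as_path[:1]
    return a if _key(a, use_med) <= _key(b, use_med) else b


def select(paths, always_compare_med=False):
    """Step 1 filter, then pairwise comparison over the remaining paths."""
    usable = [p for p in paths if p.next_hop_reachable]
    if not usable:
        return None
    return reduce(lambda a, b: best_path(a, b, always_compare_med), usable)
```

Because MED is only compared between some pairs, a pairwise reduction like this can depend on the order of the input list; that is a known subtlety of MED handling rather than something this sketch tries to solve.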

Aggregate Attributes
    NEXT_HOP = local
    WEIGHT = 32768
    LOCAL_PREF = best
    AS_PATH = AS_SET or nothing
    ORIGIN = worst
    MED = none

<aggregate-address>
• With no options it propagates the aggregate and all the components
• summary-only
    Advertise ONLY the aggregate (no components)
    Example:
    router bgp 109
     aggregate-address 198.10.0.0 255.255.0.0 summary-only

as-set
• AS_SET
    unordered set of all ASs traversed
    helps avoid loops
• advertise the prefix and the components AND include AS_SET information in the path

as-set (Cont.)
• Example:
    router bgp 1880
     network 193.1.34.0
     aggregate-address 193.0.32.0 255.255.254.0 as-set
[Figure: router A in AS 1880 (193.1.34/24) with neighbors AS 1883 (193.0.32/24) and AS 1881 (193.0.33/24)]
    Prefix advertised by A    AS_PATH
    193.1.34/24               1880
    193.0.33/24               1880 1881
    193.0.32/24               1880 1883
    193.0.32/23               1880 {1881,1883}

Options (Cont.)
suppress-map | advertise-map | attribute-map
    suppress-map = suppress specific components
    advertise-map = create an aggregate from specific components
    attribute-map = set attributes for the aggregate route

Conditional Advertisement
• Conditionally advertise prefixes; useful for dual homing
• Syntax:
    neighbor <address> advertise-map <route-map> non-exist-map <route-map>
    non-exist-map is periodically checked; if satisfied (i.e. routes are not in the BGP table), the prefixes matched by the advertise-map are advertised to the neighbor

Soft Reconfiguration

BGP Soft-Reconfiguration
• Allows policies to be changed without clearing the neighbor
• Both inbound and outbound
    Inbound requires additional memory
    Outbound is more efficient

Soft-Reconfiguration
• Outbound does not require any configuration
• Inbound configuration:
    router bgp 30
     neighbor 141.153.12.2 remote-as 32
     neighbor 141.153.12.2 soft-reconfiguration inbound
     neighbor 141.153.12.2 route-map filter in
     neighbor 141.153.30.2 remote-as 31
• <clear ip bgp x.x.x.x soft [in|out]>

Managing Policy Changes
clear ip bgp <addr> [soft] [in|out]
• <addr> may be any of the following
    x.x.x.x              IP address of a peer
    *                    all peers
    ASN                  all peers in an AS
    external             all external peers
    peer-group <name>    all peers in a peer-group

Route Refresh Capability
• Facilitates non-disruptive policy changes
• No configuration is needed
• No additional memory is used
• clear ip bgp x.x.x.x in

Internal mesh reduction

IBGP Mesh
• An IBGP speaker does not advertise IBGP-learned info to a third IBGP speaker!!!
• Avoids routing information loops
• Does not scale
• The following solutions do not change the current behaviour
    Route reflectors
    Confederation

Normal IBGP
[Figure: routers A, B and C in AS 100, fully meshed]

Route Reflector: Principle
[Figure: AS 100 with router A as Route Reflector for B and C]

Route-reflector
• Multiple levels of RR
[Figure: AS 1 with two levels of RRs and router B; AS 2 attached]

Loop Avoidance
• Originator_ID attribute
    carries the RID of the originator of the route in the local AS
• Cluster_list attribute
    The local cluster-id (RR router-ID) is added when the update is reflected (added by the RR)

Loop Avoidance
• When an RR receives an update:
    Check if its cluster-id is on the cluster-list
    If the cluster-id is on the cluster-list, the update is silently discarded
    If the BGP update is OK, the RR updates the cluster-list with its cluster-id and reflects the update (according to the rules)
    With multiple RRs in the same cluster, a single shared cluster-id should be set by configuration
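
The cluster-list check described on the Loop Avoidance slides can be sketched as follows. This is a minimal illustration under stated assumptions: updates are modelled as plain dictionaries, and the names `reflect`, `cluster_list` and `originator_id` are hypothetical stand-ins for the real attributes.

```python
def reflect(update, cluster_id, sender_rid):
    """Route reflector handling of an update received over iBGP.

    Returns the update to reflect, or None if it must be discarded.
    """
    # Our cluster-id already on the cluster-list means the update has looped.
    if cluster_id in update["cluster_list"]:
        return None
    reflected = dict(update)
    # Record this cluster before reflecting the update onwards.
    reflected["cluster_list"] = [cluster_id] + list(update["cluster_list"])
    # Originator_ID is set once, by the first reflector, to the sender's RID.
    if reflected.get("originator_id") is None:
        reflected["originator_id"] = sender_rid
    return reflected


if __name__ == "__main__":
    update = {"prefix": "10.0.0.0/8", "cluster_list": [], "originator_id": None}
    first = reflect(update, "1.1.1.1", sender_rid="9.9.9.9")
    second = reflect(first, "2.2.2.2", sender_rid="1.1.1.1")
    print(second["cluster_list"], second["originator_id"])
    # The update arriving back at the first reflector is dropped.
    print(reflect(second, "1.1.1.1", sender_rid="2.2.2.2"))
```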

Confederations
• Collection of ASs (sub-ASs)
• Visible to the outside world as a single AS
• Uses reserved AS numbers for the internal sub-ASs
• Routers within a sub-AS are fully meshed
• EBGP between sub-ASs

Confederation
[Figure: Confederation 100 made of Sub-AS 65001, Sub-AS 65002 and Sub-AS 65003, with routers A, B and C]

Confederation: Principle
• Mini-ASs have eBGP-like connections to other mini-ASs
• However they do carry all the usual IBGP information: MED, local-pref, next-hop.

Confederation: AS-path
[Figure: Confederation 100 with Sub-AS 65001, 65002, 65003 and 65004 and routers A to H; AS 200 originates 180.10.0.0/16]
    Where seen                     AS_PATH for 180.10.0.0/16
    Received from AS 200           200
    After leaving Sub-AS 65002     {65002} 200
    After leaving Sub-AS 65004     {65004 65002} 200
    Outside Confederation 100      100 200

RR vs Confederations
• Route-Reflectors
    – Easy to configure (clients are unchanged)
    – RR configuration does not require any downtime
    – RR will scale easily

RR vs Confederations
• Confederations
    – Maintenance is complex due to reconfiguration of ALL routers in the AS
    – Sub-confederations may have different BGP policies

Route Dampening

Route Flap Dampening
• Route flaps ripple through the entire Internet
    up and down of path
    change in attributes
• Wastes CPU
• Objective: reduce the scope of route flap propagation

Route Flap Dampening
[Figure: penalty (0 to 4) plotted against time (0 to 25), with the Suppress-Limit at 3 and the Reuse-Limit at 2]

Flap Dampening: Operation
• Add a fixed penalty for each flap
    flap = withdraw or attribute change
• Exponentially decay the penalty
    half-life determines the rate
• Penalty above suppress-limit = do not advertise up route
• Penalty decayed below reuse-limit = advertise route

MP-BGP
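
The penalty behaviour described on the Flap Dampening: Operation slide can be sketched numerically. The values below are assumptions chosen to resemble the slide's graph (suppress-limit 3, reuse-limit 2, one penalty unit per flap, a half-life of 5 time units); they are not Cisco defaults, and the function name `simulate` is hypothetical.

```python
def simulate(flap_times, end, per_flap=1.0, half_life=5.0,
             suppress_limit=3.0, reuse_limit=2.0):
    """Step through time 0..end and track penalty and suppression state.

    Returns a list of (time, penalty, suppressed) tuples.
    """
    flaps = set(flap_times)
    # Exponential decay: the penalty halves every half_life time units.
    factor = 0.5 ** (1.0 / half_life)
    penalty, suppressed, history = 0.0, False, []
    for t in range(end + 1):
        if t > 0:
            penalty *= factor
        if t in flaps:
            penalty += per_flap      # fixed penalty for each flap
        if penalty > suppress_limit:
            suppressed = True        # above suppress-limit: stop advertising
        elif penalty < reuse_limit:
            suppressed = False       # decayed below reuse-limit: advertise again
        history.append((t, round(penalty, 3), suppressed))
    return history


if __name__ == "__main__":
    for t, penalty, suppressed in simulate([0, 1, 2, 3], end=10):
        print(t, penalty, "suppressed" if suppressed else "advertised")
```

Note the hysteresis: between the two limits the route keeps whatever state it already had, which is what stops a borderline route from toggling on every decay step.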

Multi-Protocol BGP
• Extension to the BGP protocol in order to carry routing information about other protocols
    ex: Multicast, MPLS-VPN, IPv6, CLNS, ...
• Exchange of Multi-Protocol NLRI must be negotiated at session setup
    BGP Capabilities negotiation

Multi-Protocol BGP - RFC2283
• New non-transitive and optional BGP attributes
    MP_REACH_NLRI
        “Carry the set of reachable destinations together with the next-hop information to be used for forwarding to these destinations” (RFC2283)
    MP_UNREACH_NLRI
        Carry the set of unreachable destinations

Multi-Protocol BGP - RFC2283
• Attribute contains one or more triples
    1) Address Family Information (AFI) with Sub-AFI
        Identifies the protocol information carried in the NLRI field
    2) Next-Hop Information
        Next-hop address must be of the same family
    3) NLRI

BGP Capabilities Negotiation
• BGP routers establish BGP sessions through the OPEN message
• OPEN message contains optional parameters
• BGP session is terminated if OPEN parameters are not recognised
• A new optional parameter: CAPABILITIES

BGP Capabilities Negotiation
• A BGP router sends an OPEN message with a CAPABILITIES parameter containing its capabilities:
    Multiprotocol extension
    Route-refresh
    ...

BGP Capabilities Negotiation
• BGP routers determine the capabilities of their neighbors by looking at the capabilities parameters in the OPEN message
• Unknown or unsupported capabilities may trigger the transmission of a NOTIFICATION message

MBGP
• MBGP: Multiprotocol BGP for Multicast NLRIs
    Multicast-BGP
• Unicast and Multicast routes are carried through the same BGP session
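
To make the CAPABILITIES parameter from the Capabilities Negotiation slides more concrete, here is a small encoding sketch. The byte layout is my reading of the capabilities and multiprotocol RFCs (optional parameter type 2, capability code 1, then AFI, a reserved byte and the Sub-AFI); it matches the "type: 2, len: 6" and "code: 1, length 4" values in the debug trace later in the deck. The function names are hypothetical.

```python
import struct


def mp_ext_capability(afi, safi):
    """Encode one multiprotocol capability as an OPEN optional parameter."""
    value = struct.pack("!HBB", afi, 0, safi)                   # AFI, reserved, Sub-AFI
    capability = struct.pack("!BB", 1, len(value)) + value      # code 1, length 4
    return struct.pack("!BB", 2, len(capability)) + capability  # type 2, length 6


def parse_mp_ext(param):
    """Return the (afi, safi) pairs found in one CAPABILITIES parameter."""
    ptype, plen = param[0], param[1]
    if ptype != 2:
        raise ValueError("not a CAPABILITIES optional parameter")
    body = param[2:2 + plen]
    found, i = [], 0
    while i + 2 <= len(body):
        code, length = body[i], body[i + 1]
        value = body[i + 2:i + 2 + length]
        if code == 1 and length == 4:
            afi, _reserved, safi = struct.unpack("!HBB", value)
            found.append((afi, safi))
        i += 2 + length  # skip to the next capability, known or not
    return found


if __name__ == "__main__":
    unicast = mp_ext_capability(1, 1)    # IPv4 unicast
    multicast = mp_ext_capability(1, 2)  # IPv4 multicast
    print(unicast.hex(), parse_mp_ext(unicast))
    print(multicast.hex(), parse_mp_ext(multicast))
```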

MBGP
• AFI, Sub-AFI are part of MP_REACH_NLRI and MP_UNREACH_NLRI
    AFI = 1 (IPv4)
        Sub-AFI = 1 (NLRI is used for unicast)
        Sub-AFI = 2 (NLRI is used for multicast)
        Sub-AFI = 3 (NLRI is used for both unicast and multicast)
• Separate BGP tables

MBGP
• MBGP is used to match RPF
• MBGP does NOT propagate any multicast state
• Same rules apply to path selection and validation
    BGP attributes (AS-Path, LocalPref, MED, …)
• Recursive RPF lookup is done in the unicast routing table

MBGP
• BGP/MBGP configuration allows you to
    define which NLRI types are exchanged (unicast, multicast, both)
    set NLRI type through route-maps (redistribution)
    define policies through standard BGP attributes (for unicast and/or multicast NLRI)
• Translation between multicast and unicast NLRIs

MBGP
BGP session for unicast and multicast NLRI
[Figure: AS 321 (RP, sender) and AS 123 (RP, receiver) connected over 192.168.100.0/24]
    BGP: 192.168.100.2 open active, local address 192.168.100.1
    BGP: 192.168.100.2 went from Active to OpenSent
    BGP: 192.168.100.2 sending OPEN, version 4
    BGP: 192.168.100.2 OPEN rcvd, version 4
    BGP: 192.168.100.2 rcv OPEN w/ option parameter type: 2, len: 6
    BGP: 192.168.100.2 OPEN has CAPABILITY code: 1, length 4
    BGP: 192.168.100.2 OPEN has MP_EXT CAP for afi/safi: 1/1
    BGP: 192.168.100.2 rcv OPEN w/ option parameter type: 2, len: 6
    BGP: 192.168.100.2 OPEN has CAPABILITY code: 1, length 4
    BGP: 192.168.100.2 OPEN has MP_EXT CAP for afi/safi: 1/2
    BGP: 192.168.100.2 went from OpenSent to OpenConfirm
    BGP: 192.168.100.2 went from OpenConfirm to Established

MBGP and non-congruent topologies
Single BGP session across loopback interfaces
[Figure: AS 321 (sender on 192.168.25.0/24) and AS 123; unicast traffic uses 192.168.100.0/24, multicast traffic uses 192.168.200.0/24]
    router bgp 321
     network 192.168.100.0 nlri unicast
     network 192.168.200.0 nlri multicast
     network 192.168.25.0 nlri unicast multicast
     neighbor 192.168.1.1 remote-as 123 nlri unicast multicast
     neighbor 192.168.1.1 ebgp-multihop 255
     neighbor 192.168.1.1 update-source Loopback0
     neighbor 192.168.1.1 route-map setNH out
    !
    route-map setNH permit 10
     match nlri multicast
     set ip next-hop 192.168.200.2
    !
    route-map setNH permit 15
     match nlri unicast
     set ip next-hop 192.168.100.2

MPLS-VPN
What is an IP VPN?
• An IP network infrastructure delivering private network services over a public infrastructure
    Use a layer 3 backbone
    Scalability, easy provisioning
    Global as well as non-unique private address space

VPN Models - The Overlay model
• Private trunks over a TELCO/SP shared infrastructure
    Leased/Dialup lines
    FR/ATM circuits
    IP (GRE) tunnelling
• Transparency between provider and customer networks
• Optimal routing requires full mesh over backbone

VPN Models - The Peer model
• Both provider and customer network use the same network protocol
• CE and PE routers have a routing adjacency at each site
• All provider routers hold the full routing information about all customer networks
• Private addresses are not allowed

VPN Models - MPLS-VPN: The True Peer model
• Same as Peer model BUT !!!
• Provider Edge routers receive and hold routing information only about directly connected VPNs
• Reduces the amount of routing information a PE router will store
• Routing information is proportional to the number of VPNs a router is attached to
• MPLS is used within the backbone to switch packets (no need for full routing)

MPLS VPN Connection Model
[Figure: CE routers for VPN_A sites (10.1.0.0, 10.2.0.0, 11.5.0.0, 11.6.0.0) and VPN_B sites (10.1.0.0, 10.2.0.0, 10.3.0.0) attached to PE routers around a core of P routers, with MP-iBGP sessions between the PEs]
• P routers (LSRs) are in the core of the MPLS cloud
• PE routers use MPLS with the core and plain IP with CE routers
• P and PE routers share a common IGP
• PE routers are MP-iBGP fully meshed

MPLS VPN Connection Model
[Figure: PE and P routers on a VPN backbone running an IGP, with an MP-iBGP session between the PEs]
• Multiple routing tables (VRFs) are used on PEs
    Each VRF contains customer routes
    Customer addresses can overlap
    VPNs are isolated
• MP-BGP is used to propagate these addresses between PE routers

MPLS VPN Connection Model
Addresses overlap
[Figure: PE and P routers on a VPN backbone running an IGP, with an MP-iBGP session between the PEs]
• BGP always propagates ONE route per destination
• What if two customers are using the same address?
    BGP will propagate only one route - PROBLEM !!!
• Therefore MP-BGP will distinguish between customer addresses

MPLS VPN Connection Model
Route propagation through MP-BGP
[Figure: CE-1 at Site-1 advertises Net1 for VPN-A and for VPN-B to PE-1; PE-1 sends VPN-IPv4 updates across the VPN backbone to PE-2, which passes an update for Net1 to Site-2 of VPN-A and to Site-2 of VPN-B]
    VPN-IPv4 update: RD1:Net1, Nexthop=PE-1, SOO=Site1, RT=Yellow, Label=10
    VPN-IPv4 update: RD2:Net1, Nexthop=PE-1, SOO=Site1, RT=Green, Label=12
• VPN-IPv4 updates are translated into IPv4 addresses and inserted into the VRF corresponding to the RT value
• MP-BGP assigns an RD to each route in order to make them unique, so that they can all be propagated
• MP-BGP assigns a Route-Target so that remote PEs can insert the route into the corresponding routing table (VRF)
• Route-Target is the colour of the route

VPN Connection Model: Route propagation through MP-BGP
[Figure: same topology as the previous slide, with PE-1, PE-2, CE-1 and the Site-2 routers of VPN-A and VPN-B]
    VPN-IPv4 update: RD1:Net1, Nexthop=PE-1, SOO=Site1, RT=Yellow, Label=10
    VPN-IPv4 update: RD2:Net1, Nexthop=PE-1, SOO=Site1, RT=Green, Label=12
• VPN-IPv4 updates are translated into IPv4 addresses and inserted into the VRF corresponding to the RT value
• When a PE router receives an MP-BGP route it checks the route-target value
• If that value equals the one configured for a particular routing table, the route is inserted into it
• The label associated with the route is stored and used to send packets towards the destination

MPLS VPN Connection Model
MP-BGP Update
• VPN-IPv4 address
    Route Distinguisher
        64 bits
        Makes the IPv4 route globally unique
        RD is configured in the PE for each VRF
    IPv4 address (32 bits)
• Extended Community attribute (64 bits)
    Site of Origin (SOO): identifies the originating site
    Route-target (RT): identifies the set of sites the route has to be advertised to

MPLS VPN Connection Model
MP-BGP Update
• Any other standard BGP attribute
    Local Preference
    MED
    Next-hop
    AS_PATH
    Standard Community
    ...
• A Label identifying:
    The outgoing interface
    The VRF where a lookup has to be done (aggregate label)
• The BGP label will be the second label in the label stack of packets travelling in the core

Scaling
• Existing BGP techniques can be used to scale the route distribution: route reflectors
• Each edge router needs only the information for the VPNs it supports
    Directly connected VPNs
• RRs are used to distribute VPN routing information
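
The route-target check described on the route propagation and MP-BGP Update slides can be sketched as a simple import filter. Everything concrete here is an assumption made for illustration: the VRF names, the prefix used for Net1, the "Red" update and the dictionary layout. Only the RD, RT and label values of the first two updates come from the slides.

```python
def import_routes(updates, vrfs):
    """Place each VPN-IPv4 update into every VRF that imports one of its RTs.

    vrfs maps a VRF name to the set of route-targets it imports. Updates
    whose route-targets match no VRF are simply not stored.
    """
    tables = {name: {} for name in vrfs}
    for update in updates:
        for name, import_rts in vrfs.items():
            if import_rts & set(update["rt"]):
                # The RD is dropped on import: the VRF holds a plain IPv4
                # prefix plus the next hop and label needed for forwarding.
                tables[name][update["prefix"]] = (update["next_hop"], update["label"])
    return tables


# Hypothetical updates received by PE-2; Net1 is shown as 10.1.0.0/16.
UPDATES = [
    {"rd": "RD1", "prefix": "10.1.0.0/16", "next_hop": "PE-1", "rt": ["Yellow"], "label": 10},
    {"rd": "RD2", "prefix": "10.1.0.0/16", "next_hop": "PE-1", "rt": ["Green"], "label": 12},
    {"rd": "RD3", "prefix": "10.9.0.0/16", "next_hop": "PE-1", "rt": ["Red"], "label": 14},
]
# Hypothetical VRFs configured on PE-2.
VRFS = {"vpn_a": {"Yellow"}, "vpn_b": {"Green"}}

if __name__ == "__main__":
    for name, table in import_routes(UPDATES, VRFS).items():
        print(name, table)
```

Both VRFs end up with the same prefix but different labels, which is how overlapping customer addresses stay separate; the "Red" update is stored nowhere because no attached VRF imports it.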

Scaling
• Very highly scalable:
    Initial VPN release: 1000 VPNs x 1000 sites/VPN = 1,000,000 sites
    Architecture supports 100,000+ VPNs, 10,000,000+ sites
    BGP “segmentation” through RRs is essential !!!!
• Easy to add new sites
    configure the site on the PE connected to it
    the network automagically does the rest

MPLS-VPN Scaling BGP
[Figure: VPN_A and VPN_B sites (10.1.0.0, 10.2.0.0, 10.3.0.0, 11.5.0.0, 11.6.0.0) attached through CE routers to PE1, PE2 and other PEs around a P core, with two Route Reflectors]
• Route Reflectors may be partitioned
    Each RR stores routes for a set of VPNs
• Thus, no BGP router needs to store ALL VPN information
• PEs will peer with RRs according to the VPNs they directly connect

MPLS-VPN Scaling
BGP updates filtering
• iBGP full mesh between PEs results in flooding all VPN routes to all PEs
• Scaling problems when there is a large amount of routes. In addition, PEs need only the routes for attached VRFs
• Therefore each PE will discard any VPN-IPv4 route that doesn’t carry a route-target configured to be imported into any of the attached VRFs
• This significantly reduces the amount of information each PE has to store
• Volume of the BGP table is equivalent to the volume of attached VRFs (nothing more)

Conclusion

Summary
• BGP represents a viable solution today for Service Providers to:
    Offer new world IP-VPN services.
    Interconnect transit and non-transit ASs to the Internet
• And for Enterprise customers to
    Scale big networks and dual-home their AS.

Thanks to
• Stefano Previdi for his slides!!!
• You for your attention!!!!!