Introduction to The Internet
IXP Training Workshops
Contact: training@apnic.net
WROU03_v1.0

Introduction to the Internet
  Topologies and Definitions
  IP Addressing
  Internet Hierarchy
  Gluing it all together

Topologies and Definitions
  What does all the jargon mean?

Some Icons…
  Router (layer 3, IP datagram forwarding)
  Ethernet switch (layer 2, frame forwarding)
  Network Cloud

Routed Backbone
  ISPs build networks covering regions
    Regions can cover a country, a sub-continent, or even the globe
    Each region has points of presence built by the ISP
  Routers are the infrastructure
  Physical circuits run between routers
  Easy routing configuration, operation and troubleshooting
  The dominant topology used in the Internet today

MPLS Backbones
  Some ISPs & Telcos use Multi Protocol Label Switching (MPLS)
  MPLS is built on top of router infrastructure
    Used to replace old ATM technology
    Tunnelling technology
  Main purpose is to provide VPN services
    Although these can be done just as easily with other tunnelling technologies such as GRE

Points of Presence
  PoP – Point of Presence
    Physical location of ISP’s equipment
    Sometimes called a “node”
  vPoP – virtual PoP
    To the end user, it looks like an ISP location
    In reality a back hauled access point
    Used mainly for consumer access networks
  Hub/SuperPoP – large central PoP
    Links to many PoPs

PoP Topologies
  Core routers – high speed trunk connections
  Distribution routers – higher port density, aggregating network edge to the network core
  Access routers – high port density, connecting the end users to the network
  Border routers – connections to other providers
  Service routers – hosting and servers
  Some functions might be handled by a single router

Typical PoP Design
  [Figure labels: Other ISPs, Border, Backbone links to other PoPs, Network Core, Service, Network Operation Centre, ISP Services (DNS, Mail, News, FTP, WWW), Hosted Services, Access, Business Customer Aggregation, Consumer Aggregation]

More Definitions
  Transit
    Carrying traffic across a network
    Usually for a fee
  Peering
    Exchanging routing information and traffic
    Usually for no fee
    Sometimes called settlement free peering
  Default
    Where to send traffic when there is no explicit match in the routing table

Peering and Transit example
  [Figure labels: provider A, provider B, IXP-West, Backbone Provider D, IXP-East, provider C]
  A and B peer for free, but need transit arrangements with D to get packets to/from C

Private Interconnect
  [Figure: ISP A (Autonomous System 99) and ISP B (Autonomous System 334) connected border router to border router]

Public Interconnect
  A location or facility where several ISPs are present and connect to each other over a common shared media
  Why?
    To save money, reduce latency, improve performance
  IXP – Internet eXchange Point
  NAP – Network Access Point

Public Interconnect
  Centralised (in one facility)
  Distributed (connected via WAN links)
  Switched interconnect
    Ethernet (Layer 2)
    Technologies such as SRP, FDDI, ATM, Frame Relay, SMDS and even routers have been used in the past
  Each provider establishes peering relationship with other providers at IXP
    ISP border router peers with all other provider border routers

Public Interconnect
  [Figure: ISP 1 to ISP 6 connected around the IXP]
  Each of these represents a border router in a different autonomous system

ISPs participating in Internet
  Bringing all pieces together, ISPs:
    Build multiple PoPs in a distributed network
    Build redundant backbones
    Have redundant external connectivity
    Obtain transit from upstream providers
    Get free peering from local providers at IXPs

Example ISP Backbone Design
  [Figure labels: PoP 1 to PoP 4, Network Core, Backbone Links, Upstream 1, Upstream 2, IXP, ISP Peers]

IP Addressing
  Where to get address space and who from

IP Addressing
  Internet uses classless routing
  Concept of IPv4 class A, class B or class C is no more
  Engineers talk in terms of prefix length, for example the class B 158.43 is now called 158.43/16.
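
To make the prefix-length notation concrete, here is a minimal sketch using Python's standard `ipaddress` module on the 158.43/16 example. The sample host address and the split into /18s are illustrative assumptions, not taken from the slides.

```python
import ipaddress

# The former "class B" 158.43 written in classless (CIDR) notation.
net = ipaddress.ip_network("158.43.0.0/16")

print(net.prefixlen)       # prefix length of the block
print(net.netmask)         # equivalent dotted-quad mask
print(net.num_addresses)   # number of addresses covered

# Hypothetical sample address, used only to illustrate a membership test.
host = ipaddress.ip_address("158.43.10.1")
print(host in net)

# Classless routing allows any prefix length, e.g. splitting the /16 into /18s.
subnets = [str(s) for s in net.subnets(new_prefix=18)]
print(subnets)
```

The same block can be described at any prefix length, which is what makes classless allocation according to demonstrated need possible.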
  All routers must be CIDR capable
    Classless InterDomain Routing
    RFC1812 – Router Requirements

IP Addressing
  Pre-CIDR (before 1994)
    Big networks got a class A
    Medium networks got a class B
    Small networks got a class C
  The CIDR IPv4 years (1994 to 2010)
    Sizes of IPv4 allocations/assignments made according to demonstrated need – CLASSLESS
  IPv6 adoption (from 2011)
    The size of IPv4 address allocations and assignments is now very limited as IANA’s free pool has run out

IP Addressing
  IP Address space is a resource shared amongst all Internet users
    Regional Internet Registries delegated allocation responsibility by the IANA
    AfriNIC, APNIC, ARIN, LACNIC & RIPE NCC are the five RIRs
    RIRs allocate address space to ISPs and Local Internet Registries
    ISPs/LIRs assign address space to end customers or other ISPs
  All usable IPv4 address space has been allocated to the RIRs by the IANA (February 2011)
    The time for IPv6 is now

Non-portable Address Space
  “Provider Aggregatable” or “PA Space”
    Customer uses RIR member’s address space while connected to Internet
    Customer has to renumber to change ISP
    Aids control of size of Internet routing table
    Need to fragment provider block when multihoming
  PA space is allocated to the RIR member
    All assignments made by the RIR member to end sites are announced as an aggregate to the rest of the Internet

Portable Address Space
  “Provider Independent” or “PI Space”
    Customer gets or has address space independent of ISP
    Customer keeps addresses when changing ISP
    Is very bad for size of Internet routing table
    Is very bad for scalability of the routing system
    PI space is rarely distributed by the RIRs

Internet Hierarchy
  The pecking order

High Level View of the Global Internet
  [Figure labels: Global Providers, Regional Provider 1, Regional Provider 2, Content Provider 1, Content Provider 2, Access Provider 1, Access Provider 2, Internet Exchange Point, Customer Networks]

Detailed View of the Global Internet
  Global Transit Providers
    Connect to each other
    Provide connectivity to Regional Transit Providers
  Regional Transit Providers
    Connect to each other
    Provide connectivity to Content Providers
    Provide connectivity to Access Providers
  Access Providers
    Connect to each other across IXPs (free peering)
    Provide access to the end user

Categorising ISPs
  [Figure: four Tier 1 ISPs, four Tier 2 ISPs, six Tier 3 ISPs and two IXPs, with “$” symbols on the links]

Inter-provider relationships
  Peering between equivalent sizes of service providers (e.g. Tier 2 to Tier 2)
    Shared cost private interconnection, equal traffic flows
    No cost peering
  Peering across exchange points
    If convenient, of mutual benefit, technically feasible
  Fee based peering
    Unequal traffic flows, “market position”

Default Free Zone
  The default free zone is made up of Internet routers which have explicit routing information about the rest of the Internet, and therefore do not need to use a default route
  NB: is not related to where an ISP is in the hierarchy

Gluing it together

Gluing it together
  Who runs the Internet?
    No one
    (Definitely not ICANN, nor the RIRs, nor the US,…)
  How does it keep working?
    Inter-provider business relationships and the need for customer reachability ensure that the Internet by and large functions for the common good
  Any facilities to help keep it working?
    Not really. But…
    Engineers keep working together!

Engineers keep talking to each other...
  North America
    NANOG (North American Network Operators Group)
    NANOG meetings and mailing list
    www.nanog.org
  Latin America
    Foro de Redes
    NAPLA
    LACNOG – supported by LACNIC
  Middle East
    MENOG (Middle East Network Operators Group)
    www.menog.net

Engineers keep talking to each other...
  Asia & Pacific
    APRICOT annual conference
      www.apricot.net
    APOPS & APNIC-TALK mailing lists
      mailman.apnic.net/mailman/listinfo/apops
      mailman.apnic.net/mailman/listinfo/apnic-talk
    PacNOG (Pacific NOG)
      mailman.apnic.net/mailman/listinfo/pacnog
    SANOG (South Asia NOG)
      E-mail to sanog-request@sanog.org

Engineers keep talking to each other...
  Europe
    RIPE meetings, working groups and mailing lists
    e.g. Routing WG: www.ripe.net/mailman/listinfo/routing-wg
  Africa
    AfNOG meetings and mailing list
  And many in-country ISP associations and NOGs
  IETF meetings and mailing lists
    www.ietf.org

Summary
  Topologies and Definitions
  IP Addressing
    PA versus PI address space
  Internet Hierarchy
    Local, Regional, Global Transit Providers
    IXPs
  Gluing it all together
    Engineers cooperate, common business interests

The Value of Peering
ISP Training Workshops

The Internet
  Internet is made up of ISPs of all shapes and sizes
    Some have local coverage (access providers)
    Others can provide regional or per country coverage
    And others are global in scale
  These ISPs interconnect their businesses
    They don’t interconnect with every other ISP (over 41000 distinct autonomous networks) – won’t scale
    They interconnect according to practical and business needs
  Some ISPs provide transit to others
    They interconnect other ISP networks

Categorising ISPs
  [Figure: four Global ISPs, four Regional ISPs, six Access ISPs and two IXPs, with “$” symbols on the links]

Peering and Transit
  Transit
    Carrying traffic across a network
    Usually for a fee
    Example: Access provider connects to a regional provider
  Peering
    Exchanging routing information and traffic
    Usually for no fee
    Sometimes called settlement free peering
    Example: Regional provider connects to another regional provider

Private Interconnect
  Two ISPs connect their networks over a private link
    Can be peering arrangement
      No charge for traffic
      Share cost of the link
    Can be transit arrangement
      One ISP charges the other for traffic
      One ISP (the customer) pays for the link
  [Figure: ISP 1 linked directly to ISP 2]

Public Interconnect
  Several ISPs meet in a common neutral location and interconnect their networks
    Usually is a peering arrangement between their networks
  [Figure: ISP 1 to ISP 6 connected around the IXP]

ISP Goals
  Minimise the cost of operating the business
  Transit
    ISP has to pay for circuit (international or domestic)
    ISP has to pay for data (usually per Mbps)
    Repeat for each transit provider
    Significant cost of being a service provider
  Peering
    ISP shares circuit cost with peer (private) or runs circuit to public peering point (one off cost)
    No need to pay for data
    Reduces transit data volume, therefore reducing cost

Transit – How it works
  Small access provider provides Internet access for a city’s population
    Mixture of dial up, wireless and fixed broadband
    Possibly some business customers
    Possibly also some Internet cafes
  How do their customers get access to the rest of the Internet?
  ISP buys access from one, two or more larger ISPs who already have visibility of the rest of the Internet
    This is transit – they pay for the physical connection to the upstream and for the traffic volume on the link

Peering – How it works
  If two ISPs are of equivalent sizes, they have:
    Equivalent network infrastructure coverage
    Equivalent customer size
    Similar content volumes to be shared with the Internet
    Potentially similar traffic flows to each other’s networks
  This makes them good peering partners
  If they don’t peer
    They both have to pay an upstream provider for access to each other’s network/customers/content
    Upstream benefits from this arrangement, the two ISPs both have to fund the transit costs

The IXP’s role
  Private peering makes sense when there are very few equivalent players
    Connecting to one other ISP costs X
    Connecting to two other ISPs costs 2 times X
    Connecting to three other ISPs costs 3 times X
    Etc… (where X is half the circuit cost plus a port cost)
  The more private peers, the greater the cost
  IXP is a more scalable solution to this problem

The IXP’s role
  Connecting to an IXP
    ISP costs: one router port, one circuit, and one router to locate at the IXP
  Some IXPs charge annual “maintenance fees”
    The maintenance fee has potential to significantly influence the cost balance for an ISP
  Generally connecting to an IXP and peering there becomes cost effective when there are at least three other peers
    The real $ amount varies from region to region, IXP to IXP

Who peers at an IXP?
  Access Providers
    Don’t have to pay their regional provider transit fees for local traffic
    Keeps latency for local traffic low
    ‘Unlimited’ bandwidth through the IXP (compared with costly and limited bandwidth through transit provider)
  Regional Providers
    Don’t have to pay their global provider transit for local and regional traffic
    Keeps latency for local and regional traffic low
    ‘Unlimited’ bandwidth through the IXP (compared with costly and limited bandwidth through global provider)

The IXP’s role
  Global Providers can be located close to IXPs
    Attracted by the potential transit business available
  Advantageous for access & regional providers
    They can peer with other similar providers at the IXP
    And in the same facility pay for transit to their regional or global provider
    (Not across the IXP fabric, but a separate connection)
  [Figure labels: IXP, Transit, Access]

Connectivity Decisions
  Transit
    Almost every ISP needs transit to reach rest of Internet
    One provider = no redundancy
    Two providers: ideal for traffic engineering as well as redundancy
    Three providers = better redundancy, traffic engineering gets harder
    More than three = diminishing returns, rapidly escalating costs and complexity
  Peering
    Means low (or zero) cost access to another network
    Private or Public Peering (or both)

Transit Goals
  1. Minimise number of transit providers
    But maintain redundancy
    2 is ideal, 4 or more is bad
  2. Aggregate capacity to transit providers
    More aggregated capacity means better value
    Lower cost per Mbps
    4x 45Mbps circuits to 4 different ISPs will almost always cost more than 2x 155Mbps circuits to 2 different ISPs
    Yet bandwidth of latter (310Mbps) is greater than that of former (180Mbps) and is much easier to operate

Peering or Transit?
  How to choose?
  Or do both?
  It comes down to cost of going to an IXP
    Free peering
    Paying for transit from an ISP co-located in same facility, or perhaps close by
  Or not going to an IXP and paying for the cost of transit directly to an upstream provider
  There is no right or wrong answer, someone has to do the arithmetic

Private or Public Peering
  Private peering
    Scaling issue, with costs, number of providers, and infrastructure provisioning
  Public peering
    Makes sense the more potential peers there are (more is usually greater than “two”)
  Which public peering point?
    Local Internet Exchange Point: great for local traffic and local peers
    Regional Internet Exchange Point: great for meeting peers outside the locality, might be cheaper than paying transit to reach the same consumer base

Local Internet Exchange Point
  Defined as a public peering point serving the local Internet industry
  Local means where it becomes cheaper to interconnect with other ISPs at a common location than it is to pay transit to another ISP to reach the same consumer base
    Local can mean different things in different regions!

Regional Internet Exchange Point
  These are also “local” Internet Exchange Points
  But also attract regional ISPs and ISPs from outside the locality
    Regional ISPs peer with each other
    And show up at several of these Regional IXPs
  Local ISPs peer with ISPs from outside the locality
    They don’t compete in each other’s markets
    Local ISPs don’t have to pay transit costs
    ISPs from outside the locality don’t have to pay transit costs
    Quite often ISPs of disparate sizes and influences will happily peer – to defray transit costs

Which IXP?
  How many routes are available?
    What is traffic to & from these destinations, and by how much will it reduce cost of transit?
  What is the cost of co-lo space?
    If prohibitive or space not available, pointless choosing this IXP
  What is the cost of running a circuit to the location?
    If prohibitive or competitive with transit costs, pointless choosing this IXP
  What is the cost of remote hands/assistance?
    If no remote hands, doing maintenance is challenging and potentially costly with a serious outage

Example: South Asian ISP @ LINX
  Date: October 2011
  Facts:
    Route Server plus bilateral peering offers 81k prefixes
    IXP traffic averages 55Mbps/15Mbps
    Transit traffic averages 35Mbps/3Mbps
  Analysis:
    61% of inbound traffic comes from 81k prefixes available by peering
    39% of inbound traffic comes from remaining 287k prefixes from transit provider

Example: South Asian ISP @ HKIX
  Date: October 2011
  Facts:
    Route Server plus bilateral peering offers 34k prefixes
    IXP traffic is 130Mbps/30Mbps
    Transit traffic is 125Mbps/40Mbps
  Analysis:
    51% of inbound traffic comes from 42k prefixes available by peering
    49% of inbound traffic comes from remaining 326k prefixes from transit provider

Example: South Asian ISP
  Summary:
    Traffic by Peering: 185Mbps/45Mbps
    Traffic by Transit: 160Mbps/43Mbps
    54% of incoming traffic is by peering
    52% of outbound traffic is by peering

Example: South Asian ISP
  Router at remote co-lo
    Benefits: can select peers, easy to swap transit providers
    Costs: co-lo space and remote hands
  Servers at remote co-lo
    Benefits: mail filtering, content caching, etc
    Costs: co-lo space and remote hands
  Overall advantage:
    Can control what goes on the expensive connectivity “back to home”

Value propositions
  Peering at a local IXP
    Reduces latency & transit costs for local traffic
    Improves Internet quality perception
  Participating at a Regional IXP
    A means of offsetting transit costs
    Managing connection back to home network
    Improving Internet Quality perception for customers

Summary
  Benefits of peering
    Private
    Internet Exchange Points
  Local versus Regional IXPs
    Local services local traffic
    Regional helps defray transit costs

Worked Example
  Single International Transit Versus Local IXP + Regional IXP + Transit

Worked Example
  ISP A is a local access provider
    Some business customers (around 200 fixed links)
    Some co-located content provision (datacentre with 100 servers)
    Some consumers on broadband (5000 DSL/Cable/Wireless)
    Some consumers on dial (1000 on V.34 type speeds)
  They have a single transit provider
    Connect with a 16Mbps international leased link to their transit’s PoP
    Transit link is highly congested

Worked Example (2)
  There are two other ISPs serving the same locality
    There is no interconnection between any of the three ISPs
    Local traffic (between all 3 ISPs) is traversing International connections
  Course of action for our ISP:
    Work to establish local IXP
    Establish presence at overseas co-location
  First Step
    Assess local versus international traffic ratio
    Use NetFlow on border router connecting to transit provider

Worked Example (3)
  Local/Non-local traffic ratio
    Local = traffic going to other two ISPs
    Non-local = traffic going elsewhere
  Example: balance is 30:70
    Of 16Mbps, that means 5Mbps could stay in country and not congest International circuit
    16Mbps transit costs $50 per Mbps per month
    Traffic charges = $250 per month, or $3000 per year for local traffic
    Circuit costs $100k per year: $30k is spent on local traffic
  Total is $33k per year for local traffic

Worked Example (4)
  IXP cost:
    Simple 8 port 10/100 managed switch plus co-lo space over 3 years could be around US$30k total; or $3k per year per ISP
    One router to handle 5Mbps (e.g. 2801) would be around $3k (good for 3 years)
    One local 10Mbps circuit from ISP location to IXP location would be around $5k per year, no traffic charges
    Per ISP total: $9k
    Somewhat cheaper than $33k
    Business case for local peering is straightforward: $24k saving per annum

Worked Example (5)
  After IXP establishment
    5Mbps removed from International link
    Leaving 5Mbps for more International traffic – and that fills the link within weeks of the local traffic being removed
  Next step is to assess transit charges and optimise costs
    ISP visits several major regional IXPs
    Assesses routes available
    Compares routes available with traffic generated by those routes from its NetFlow data
    Discovers that 30% of traffic would transfer to one IXP via peering

Worked Example (6)
  Costs:
    Router for Regional IXP (e.g. 2801) at $3k over three years
    Co-lo space at Regional IXP venue at $3k per year
    Best price for transit at the Regional IXP venue by competitive tender is $30 per Mbps per month, plus $1k port charge
    30% of traffic offloads to IXP, leaving 70% of 16Mbps to transit provider = $330 per month, or $5k per annum
    Total with this model is $9k per year, plus the cost of the circuit (still $100k)
    Compare this with paying $50 per Mbps per month to the transit provider = $10k per annum (plus cost of the circuit)

Worked Example (7)
  Result:
    ISP co-locates at Regional IXP
    Pays reduced transit charges to transit provider (competitive tender)
    Pays no charges for traffic across Regional IXP
  Bonuses:
    Rate limits on router at Regional IXP co-lo
      Can prioritise congestion dependent on customer demands
    Install servers at Regional IXP co-lo facility
      Filters e-mail (spam and viruses) – relieves some capacity on link
      Caches content – relieves a little more capacity on link

Conclusion
  Within the original costs of having one international transit provider:
    ISP has turned up at the local IXP and offloaded local traffic for free
    ISP has turned up at a major regional IXP and offloaded traffic, avoiding paying transit charges to transit provider
    ISP has reduced remaining transit charges by competitive tender at the regional IXP co-location facility
  Caveat
    These numbers are typical of the Internet today
    As ever, your mileage may vary – but do the financial calculations first and in the context of potential technical advantages too

Introduction to OSPF
ISP Training Workshops

OSPF
  Open Shortest Path First
  Link state or SPF technology
  Developed by OSPF working group of IETF (RFC 1247)
  OSPFv2 standard described in RFC2328
  Designed for:
    TCP/IP environment
    Fast convergence
    Variable-length subnet masks
    Discontiguous subnets
    Incremental updates
    Route authentication
  Runs on IP, Protocol 89

Link State
  [Figure: routers X, Y, Z and Q exchange their link states (X’s, Z’s and Q’s Link State); the resulting database lists neighbours A, B, C, Q, Z, X with link costs]
  Topology Information is kept in a Database separate from the Routing Table

Link State Routing
  Neighbour discovery
  Constructing a Link State Packet (LSP)
  Distribute the LSP
    (Link State Announcement – LSA)
  Compute routes
  On network failure
    New LSPs flooded
    All routers recompute routing table

Low Bandwidth Utilisation
  Only changes propagated
  Uses multicast on multi-access broadcast networks
  [Figure: a link failure (X) at R1 triggers an LSA]

Fast Convergence
  Detection Plus LSA/SPF
    Known as the Dijkstra Algorithm
  [Figure: primary and alternate paths between networks N1 and N2 through R1, R2 and R3, with a failure (X) on the primary path]

Fast Convergence
  Finding a new route
    LSA flooded throughout area
    Acknowledgement based
    Topology database synchronised
    Each router derives routing table to destination network
  [Figure: R1 floods an LSA after the failure (X) towards N1]

OSPF Areas
  Area is a group of contiguous hosts and networks
    Reduces routing traffic
  Per area topology database
    Invisible outside the area
  Backbone area MUST be contiguous
    All other areas must be connected to the backbone
  [Figure: Area 0 (Backbone Area) with routers Ra, Rb, Rc, Rd, connected to Areas 1, 2, 3 and 4 containing routers R1 to R8]

Virtual Links between OSPF Areas
  Virtual Link is used when it is not possible to physically connect the area to the backbone
  ISPs avoid designs which require virtual links
    Increases complexity
    Decreases reliability and scalability
  [Figure: Area 0 (Backbone Area) with routers Ra, Rb, Rc, Rd; Area 1 (R3, R4, R5, R6) and Area 4 (R7, R8)]

Classification of Routers
  Internal Router (IR)
  Area Border Router (ABR)
  Backbone Router (BR)
  Autonomous System Border Router (ASBR)
  [Figure: Areas 0, 1, 2 and 3; R1 and R2 are IRs, Rb and Rc are ABR/BR, Rd is the ASBR connecting to another AS, and an IR/BR sits in Area 0]

OSPF Route Types
  Intra-area Route
    all routes inside an area
  Inter-area Route
    routes advertised from one area to another by an Area Border Router
  External Route
    routes imported into OSPF from other protocol or static routes
  [Figure: the same topology as the Classification of Routers slide]

External Routes
  Prefixes which are redistributed into OSPF from other protocols
    Flooded unaltered throughout the AS
    Recommendation: Avoid redistribution!!
  OSPF supports two types of external metrics
    Type 1 external metrics
    Type 2 external metrics (Cisco IOS default)
  [Figure: R2 redistributes RIP, EIGRP, BGP, Static, Connected etc. into OSPF]

External Routes
  Type 1 external metric: metrics are added to the summarised internal link cost
  [Figure: R1 reaches R2 at cost 10 and R3 at cost 8; R2 to N1 external cost = 1, R3 to N1 external cost = 2]
    Network  Type 1  Next Hop
    N1       11      R2
    N1       10      R3  (selected route)

External Routes
  Type 2 external metric: metrics are compared without adding to the internal link cost
  [Figure: R1 reaches R2 at cost 10 and R3 at cost 8; R2 to N1 external cost = 1, R3 to N1 external cost = 2]
    Network  Type 2  Next Hop
    N1       1       R2  (selected route)
    N1       2       R3

Topology/Link State Database
  A router has a separate LS database for each area to which it belongs
  All routers belonging to the same area have identical database
  SPF calculation is performed separately for each area
  LSA flooding is bounded by area
  Recommendation: Limit the number of areas a router participates in!!
    1 to 3 is fine (typical ISP design)
    >3 can overload the CPU depending on the area topology complexity

The Hello Protocol
  Responsible for establishing and maintaining neighbour relationships
  Elects designated router on multi-access networks
  [Figure: routers exchanging Hello packets]

The Hello Packet
  Contains:
    Router priority
    Hello interval
    Router dead interval
    Network mask
    List of neighbours
    DR and BDR
    Options: E-bit, MC-bit,… (see A.2 of RFC2328)
  [Figure: routers exchanging Hello packets]

Designated Router
  There is ONE designated router per multiaccess network
    Generates network link advertisements
    Assists in database synchronization
  [Figure: two multiaccess networks, each with a Designated Router and a Backup Designated Router]

Designated Router by Priority
  Configured priority (per interface)
    ISPs configure high priority on the routers they want as DR/BDR
  Else determined by highest router ID
    Router ID is 32 bit integer
    Derived from the loopback interface address, if configured, otherwise the highest IP address
  [Figure: R1 and R2 on a shared network with addresses 131.108.3.2 and 131.108.3.3; R1 Router ID = 144.254.3.5, R2 Router ID = 131.108.3.3; R1 is marked DR]

Neighbouring States
  Full
    Routers are fully adjacent
    Databases synchronised
    Relationship to DR and BDR
  [Figure: routers in state Full with the DR and BDR]

Neighbouring States
  2-way
    Router sees itself in other Hello packets
    DR selected from neighbours in state 2-way or greater
  [Figure: routers in state 2-way alongside the DR and BDR]

When to Become Adjacent
  Underlying network is point to point
  Underlying network type is virtual link
  The router itself is the designated router or the backup designated router
  The neighbouring router is the designated router or the backup designated router

LSAs Propagate Along Adjacencies
  LSAs acknowledged along adjacencies
  [Figure: LSAs flowing via the DR and BDR]

Broadcast Networks
  IP Multicast used for Sending and Receiving Updates
    All routers must accept packets sent to AllSPFRouters (224.0.0.5)
    All DR and BDR routers must accept packets sent to AllDRouters (224.0.0.6)
  Hello packets sent to AllSPFRouters (Unicast on point-to-point and virtual links)

Routing Protocol Packets
  Share a common protocol header
  Routing protocol packets are sent with type of service (TOS) of 0
  Five types of OSPF routing protocol packets
    Hello – packet type 1
    Database description – packet type 2
    Link-state request – packet type 3
    Link-state update – packet type 4
    Link-state acknowledgement – packet type 5

Different Types of LSAs
  Six distinct types of LSAs
    Type 1: Router LSA
    Type 2: Network LSA
    Type 3 & 4: Summary LSA
    Type 5 & 7: External LSA (Type 7 is for NSSA)
    Type 6: Group membership LSA
    Type 9, 10 & 11: Opaque LSA (9: Link-Local, 10: Area)

Router LSA (Type 1)
  Describes the state and cost of the router’s links to the area
  All of the router’s links in an area must be described in a single LSA
  Flooded throughout the particular area and no more
  Router indicates whether it is an ASBR, ABR, or end point of virtual link

Network LSA (Type 2)
  Generated for every transit broadcast and NBMA network
  Describes all the routers attached to the network
  Only the designated router originates this LSA
  Flooded throughout the area and no more

Summary LSA (Type 3 and 4)
  Describes the destination outside the area but still in the AS
  Flooded throughout a single area
  Originated by an ABR
  Only inter-area routes are advertised into the backbone
  Type 4 is the information about the ASBR

External LSA (Type 5 and 7)
  Defines routes to destination external to the AS
  Default route is also sent as external
  Two types of external LSA:
    E1: Considers the total cost up to the external destination
    E2: Considers only the cost of the outgoing interface to the external destination
  (Type 7 LSAs used to describe external LSA for one specific OSPF area type)

Inter-Area Route Summarisation
  Prefix or all subnets
  Prefix or all networks
  ‘Area range’ command
  [Figure: R1 is the ABR between Backbone Area 0 (containing R2) and Area 1 (networks 1.A, 1.B, 1.C)]
    With network summarisation:     1              Next Hop R1
    Without network summarisation:  1.A, 1.B, 1.C  Next Hop R1 for each

No Summarisation
  Specific Link LSA advertised out of each area
  Link state changes propagated out of each area
  [Figure: Areas 1, 2 and 3 each advertise all their specific prefixes (1.A–1.D, 2.A–2.D, 3.A–3.D) into Area 0]

With Summarisation
  Only summary LSA advertised out of each area
  Link state changes do not propagate out of the area
  [Figure: Areas 1, 2 and 3 each advertise a single summary (1, 2, 3) into Area 0]

No Summarisation
  Specific Link LSA advertised in to each area
  Link state changes propagated in to each area
  [Figure: each area receives all the specific prefixes of the other two areas from Area 0]

With Summarisation
  Only summary link LSA advertised in to each area
  Link state changes do not propagate in to each area
  [Figure: each area receives only the summaries of the other two areas from Area 0]

Types of Areas
  Regular
  Stub
  Totally Stubby
  Not-So-Stubby
  Only “regular” areas are useful for ISPs
    Other area types handle redistribution of other routing protocols into OSPF – ISPs don’t redistribute anything into OSPF
  The next slides describing the different area types are provided for information only

Regular Area (Not a Stub)
  From Area 1’s point of view, summary networks from other areas are injected, as are external networks such as X.1
  [Figure: an ASBR injects external network X.1 into Area 0; Areas 1, 2 and 3 each receive the summaries of the other areas plus X.1]

Normal Stub Area
  Summary networks, default route injected
  Command is area x stub
  [Figure: the stub area receives the summaries of the other areas plus a default route instead of external network X.1]

Totally Stubby Area
  Only a default route injected
    Default path to closest area border router
  Command is area x stub no-summary
  [Figure: the totally stubby area receives only a default route; the other areas still receive summaries and external network X.1]

Not-So-Stubby Area
  Capable of importing routes in a limited fashion
  Type-7 LSAs carry external information within an NSSA
  NSSA Border routers translate selected type-7 LSAs into type-5 external network LSAs
  [Figure: the Not-So-Stubby Area imports external network X.2 and receives a default route; X.2 is advertised to the other areas alongside external network X.1 from the ASBR]

ISP Use of Areas
  ISP networks use:
    Backbone area
    Regular area
  Backbone area
    No partitioning
  Regular area
    Summarisation of point to point link addresses used within areas
    Loopback addresses allowed out of regular areas without summarisation (otherwise iBGP won’t work)

Addressing for Areas
    Area 0: network 192.168.1.0    range 255.255.255.192
    Area 1: network 192.168.1.64   range 255.255.255.192
    Area 2: network 192.168.1.128  range 255.255.255.192
    Area 3: network 192.168.1.192  range 255.255.255.192
  Assign contiguous ranges of subnets per area to facilitate summarisation

Summary
  Fundamentals of Scalable OSPF Network Design
    Area hierarchy
    DR/BDR selection
    Contiguous intra-area addressing
    Route summarisation
    Infrastructure prefixes only

Deploying OSPF for ISPs
ISP Training Workshops

Agenda
  OSPF Design in SP Networks
  Adding Networks in OSPF
  OSPF in Cisco’s IOS

OSPF Design
  As applicable to Service Provider Networks

Service Providers
  SP networks are divided into PoPs
  PoPs are linked by the backbone
  Transit routing information is carried via iBGP
  IGP is only used to carry the next hop for BGP
  Optimal path to the next hop is critical

SP Architecture
  Major routing information is ~430K prefixes via BGP
  Largest known IGP routing table is ~9–10K
  Total of 440K
    IGP routes are therefore roughly 2.3% (10K/440K) of the routes in an ISP network
    A very small factor but has a huge impact on network convergence!
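
The IGP share quoted on the SP Architecture slide can be recomputed from the slide's own approximate figures. The short sketch below does only that arithmetic; the variable names are illustrative.

```python
# Figures quoted on the slide (approximate, as stated there).
bgp_prefixes = 430_000   # "~430K prefixes via BGP"
igp_prefixes = 10_000    # upper end of "~9-10K" IGP routes

total = bgp_prefixes + igp_prefixes
igp_share = igp_prefixes / total

print(f"total prefixes: {total}")
print(f"IGP share: {igp_share:.1%}")   # prints "IGP share: 2.3%"
```

The IGP carries only a few percent of the routes, yet every BGP next hop depends on it, which is why its size matters so much for convergence.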
Area 6/L1 BGP 1 POP POP Area 1/L1 BGP 1 Area 2/L1 BGP 1 IP Backbone Area0/L2 BGP 1 POP Area 5/L1 BGP 1 POP Area 3/L1 BGP 1 POP Area 4/L1 BGP 1 POP 122 SP Architecture Regional Core You can reduce the IGP size from 10K to approx the number of routers in your network This will bring really fast convergence Optimise where you must and summarise where you can Stops unnecessary flapping RR IGP Access customer customer customer 123 OSPF Design: Addressing OSPF Design and Addressing go together Objective is to keep the Link State Database lean Create an address hierarchy to match the topology Use separate Address Blocks for loopbacks, network infrastructure, customer interfaces & customers Customer Address Space PtP LinksInfrastructure Loopbacks 124 OSPF Design: Addressing Minimising the number of prefixes in OSPF: Number loopbacks out of a contiguous address block Use contiguous address blocks per area for infrastructure point-to-point links But do not summarise these across area boundaries: iBGP peer addresses need to be in the IGP Use area range command on ABR to summarise With these guidelines: Number of prefixes in area 0 will then be very close to the number of routers in the network It is critically important that the number of prefixes and LSAs in area 0 is kept to the absolute minimum 125 OSPF Design: Areas Examine physical topology Use areas and summarisation This reduces overhead and LSA counts (but watch next-hop for iBGP when summarising) Don’t bother with the various stub areas Is it meshed or hub-and-spoke? 
No benefits for ISPs, causes problems for iBGP Push the creation of a backbone Reduces mesh and promotes hierarchy 126 OSPF Design: Areas One SPF per area, flooding done per area: watch out for overloading ABRs. Avoid externals in OSPF: DO NOT REDISTRIBUTE into OSPF, because external LSAs flood through the entire network. Different types of areas do different flooding: Normal areas; Stub areas; Totally stubby (stub no-summary); Not so stubby areas (NSSA) 127 OSPF Design: Areas Area 0 must be contiguous. Do NOT use virtual links to join two Area 0 islands. Traffic between two non-zero areas always goes via Area 0. There is no benefit in joining two non-zero areas together. Avoid designs which have two non-zero areas touching each other (Typical design is an area per PoP, with core routers being ABR to the backbone area 0) 128 OSPF Design: Summary Think Redundancy: Dual links out of each area – using metrics (cost) for traffic engineering. Too much redundancy… Dual links to the backbone in stub areas must be the same cost – otherwise sub-optimal routing will result. Too much redundancy in the backbone area without good summarisation will affect convergence in Area 0 129 OSPF Areas: Migration Where to place OSPF Areas? Follow the physical topology! Remember the earlier design advice. Configure one area at a time!
Start at the outermost edge of the network Log into routers at either end of a link and change the link from Area 0 to the chosen Area Wait for OSPF to re-establish adjacencies And then move onto the next link, etc Important to ensure that there is never an Area 0 island anywhere in the migrating network 130 OSPF Areas: Migration A B C Area 0 D Area 10 E G Migrate small parts of the network, one area at a time F Remember to introduce summarisation where feasible With careful planning, the migration can be done with minimal network downtime 131 OSPF for Service Providers Configuring OSPF & Adding Networks 132 OSPF: Configuration Starting OSPF in Cisco’s IOS router ospf 100 Where “100” is the process ID OSPF process ID is unique to the router Gives possibility of running multiple instances of OSPF on one router Process ID is not passed between routers in an AS Many ISPs configure the process ID to be the same as their BGP Autonomous System Number 133 OSPF: Establishing Adjacencies Cisco IOS OSPFv2 automatically tries to establish adjacencies on all defined interfaces (or subnets) Best practice is to disable this Potential security risk: sending OSPF Hellos outside of the autonomous system, and risking forming adjacencies with external networks Example: Only POS4/0 interface will attempt to form an OSPF adjacency router ospf 100 passive-interface default no passive-interface POS4/0 134 OSPF: Adding Networks Option One Redistribution: Applies to all connected interfaces on the router but sends networks as external type-2s – which are not summarised router ospf 100 redistribute connected subnets Do NOT do this! 
Because: the resulting external LSAs (Type-5 LSAs carrying external type-2 routes) flood through the entire network. These LSAs are not all useful for determining paths through the backbone; they simply take up valuable space 135 OSPF: Adding Networks Option Two Per link configuration – from IOS 12.4 onwards. OSPF is configured on each interface (same as ISIS). Useful for multiple subnets per interface: interface POS 4/0 ip address 192.168.1.1 255.255.255.0 ip address 172.16.1.1 255.255.255.224 secondary ip ospf 100 area 0 ! router ospf 100 passive-interface default no passive-interface POS 4/0 136 OSPF: Adding Networks Option Three Specific network statements. Every active interface with a configured IP address needs an OSPF network statement. Interfaces that will have no OSPF neighbours need passive-interface to disable OSPF Hellos. That is: all interfaces connecting to devices outside the ISP backbone (i.e. customers, peers, etc) router ospf 100 network 192.168.1.0 0.0.0.3 area 51 network 192.168.1.4 0.0.0.3 area 51 passive-interface Serial 1/0 137 OSPF: Adding Networks Option Four Network statements – wildcard mask. Every active interface with a configured IP address covered by the wildcard mask used in the OSPF network statement is included. Interfaces covered by the wildcard mask but having no OSPF neighbours need passive-interface (or use passive-interface default and then activate the interfaces which will have OSPF neighbours) router ospf 100 network 192.168.1.0 0.0.0.255 area 51 passive-interface default no passive-interface POS 4/0 138 OSPF: Adding Networks Recommendations Don’t ever use Option 1. Use Option 2 if supported; otherwise: Option 3 is fine for core/infrastructure routers. It doesn’t scale too well when a router has a large number of interfaces but only a few with OSPF neighbours; the solution is to use Option 3 with “no passive” on interfaces with OSPF neighbours. Option 4 is preferred for aggregation routers. Or use iBGP next-hop-self. Or even ip unnumbered on external point-to-point links 139 OSPF: Adding Networks Example One (Cisco IOS ≥ 12.4) Aggregation
router with large number of leased line customers and just two links to the core network: interface loopback 0 ip address 192.168.255.1 255.255.255.255 ip ospf 100 area 0 interface POS 0/0 ip address 192.168.10.1 255.255.255.252 ip ospf 100 area 0 interface POS 1/0 ip address 192.168.10.5 255.255.255.252 ip ospf 100 area 0 interface serial 2/0:0 ... ip unnumbered loopback 0 ! Customers connect here ^^^^^^^ router ospf 100 passive-interface default no passive-interface POS 0/0 no passive-interface POS 1/0 140 OSPF: Adding Networks Example One (Cisco IOS < 12.4) Aggregation router with large number of leased line customers and just two links to the core network: interface loopback 0 ip address 192.168.255.1 255.255.255.255 interface POS 0/0 ip address 192.168.10.1 255.255.255.252 interface POS 1/0 ip address 192.168.10.5 255.255.255.252 interface serial 2/0:0 ... ip unnumbered loopback 0 ! Customers connect here ^^^^^^^ router ospf 100 network 192.168.255.1 0.0.0.0 area 51 network 192.168.10.0 0.0.0.3 area 51 network 192.168.10.4 0.0.0.3 area 51 passive-interface default no passive-interface POS 0/0 no passive-interface POS 1/0 141 OSPF: Adding Networks Example Two (Cisco IOS ≥ 12.4) Core router with only links to other core routers: interface loopback 0 ip address 192.168.255.1 255.255.255.255 ip ospf 100 area 0 interface POS 0/0 ip address 192.168.10.129 255.255.255.252 ip ospf 100 area 0 interface POS 1/0 ip address 192.168.10.133 255.255.255.252 ip ospf 100 area 0 interface POS 2/0 ip address 192.168.10.137 255.255.255.252 ip ospf 100 area 0 interface POS 2/1 ip address 192.168.10.141 255.255.255.252 ip ospf 100 area 0 router ospf 100 passive-interface loopback 0 142 OSPF: Adding Networks Example Two (Cisco IOS < 12.4) Core router with only links to other core routers: interface loopback 0 ip address 192.168.255.1 255.255.255.255 interface POS 0/0 ip address 192.168.10.129 255.255.255.252 interface POS 1/0 ip address 192.168.10.133 255.255.255.252 interface POS
2/0 ip address 192.168.10.137 255.255.255.252 interface POS 2/1 ip address 192.168.10.141 255.255.255.252 router ospf 100 network 192.168.255.1 0.0.0.0 area 0 network 192.168.10.128 0.0.0.3 area 0 network 192.168.10.132 0.0.0.3 area 0 network 192.168.10.136 0.0.0.3 area 0 network 192.168.10.140 0.0.0.3 area 0 passive-interface loopback 0 143 OSPF: Adding Networks Summary Key theme when selecting a technique: Keep the Link State Database lean. Increases stability. Reduces the amount of information in the Link State Advertisements (LSAs). Speeds convergence time 144 OSPF in Cisco IOS Useful features for ISPs 145 Areas An area is stored as a 32-bit field: Defined in IPv4 address format (e.g. Area 0.0.0.0). Can also be defined using a single decimal value (e.g. Area 0). 0.0.0.0 is reserved for the backbone area [Diagram: Area 0 connecting Area 1, Area 2 and Area 3] 146 Logging Adjacency Changes The router will generate a log message whenever an OSPF neighbour changes state. Syntax: [no] [ospf] log-adjacency-changes (OSPF keyword is optional, depending on IOS version) Example of a typical log message: %OSPF-5-ADJCHG: Process 1, Nbr 223.127.255.223 on Ethernet0 from LOADING to FULL, Loading Done 147 Number of State Changes The number of state transitions is available via SNMP (ospfNbrEvents) and the CLI: show ip ospf neighbor [type number] [neighbor-id] [detail] detail: (Optional) Displays all neighbours given in detail (list all neighbours). When specified, neighbour state transition counters are displayed per interface or neighbour ID 148 State Changes (Continued) To reset OSPF-related statistics, use the clear ip ospf counters command. This will reset neighbour state transition counters per interface or neighbour ID: clear ip ospf counters [neighbor [<type number>] [neighbor-id]] 149 Router ID If the loopback interface exists and has an IP address, that is used as the router ID in routing protocols – stability!
If the loopback interface does not exist, or has no IP address, the router ID is the highest IP address configured – danger! OSPF sub command to manually set the Router ID: router-id <ip address> 150 Cost & Reference Bandwidth Bandwidth is used in the metric calculation: Cost = 10^8/bandwidth (bandwidth in bps). Not useful for interface bandwidths > 100 Mbps, which all end up with a cost of 1. Syntax: ospf auto-cost reference-bandwidth <reference-bw> Default reference bandwidth is still 100 Mbps for backward compatibility. Most ISPs simply choose to develop their own cost strategy and apply it to each interface type 151 Cost: Example Strategy
100GE (100Gbps): cost = 1
40GE/OC768 (40Gbps): cost = 2
10GE/OC192 (10Gbps): cost = 5
OC48 (2.5Gbps): cost = 10
GigEthernet (1Gbps): cost = 20
OC12 (622Mbps): cost = 50
OC3 (155Mbps): cost = 100
FastEthernet (100Mbps): cost = 200
Ethernet (10Mbps): cost = 500
E1 (2Mbps): cost = 1000
152 Default routes Originating a default route into OSPF: default-information originate metric <n> Will originate a default route into OSPF if there is a matching default route in the Routing Table (RIB). The optional always keyword will always originate a default route, even if there is no existing entry in the RIB 153 Clear/Restart OSPF clear commands. If no process ID is given, all OSPF processes on the router are assumed. clear ip ospf [pid] redistribution: This command clears redistribution based on OSPF routing process ID. clear ip ospf [pid] counters: This command clears counters based on OSPF routing process ID. clear ip ospf [pid] process: This command will restart the specified OSPF process. It attempts to keep the old router-id, except in cases where a new router-id was configured or an old user configured router-id was removed.
Since this command can potentially cause network churn, user confirmation is required before performing any action 154 Use OSPF Authentication Use authentication. Too many operators overlook this basic requirement. When using authentication, use the MD5 feature. Under the global OSPF configuration, specify: area <area-id> authentication message-digest Under the interface configuration, specify: ip ospf message-digest-key 1 md5 <key> Authentication can be selectively disabled per interface with: ip ospf authentication null 155 Point to Point Ethernet Links For any broadcast media (like Ethernet), OSPF will attempt to elect a designated and backup designated router when it forms an adjacency. If the interface is running as a point-to-point WAN link, with only 2 routers on the wire, configuring OSPF to operate in "point-to-point mode" scales the protocol by reducing the link failure detection times. Point-to-point mode improves convergence times on Ethernet networks because it: Prevents the election of a DR/BDR on the link; Simplifies the SPF computations and reduces the router's memory footprint due to a smaller topology database.
interface fastethernet0/2 ip ospf network point-to-point 156 Tuning OSPF (1) DR/BDR Selection: ip ospf priority 100 (default 1) This feature should be in use in your OSPF network. Forcibly set your DR and BDR per segment so that they are known. Choose your most powerful, or most idle routers, so that OSPF converges as fast as possible under maximum network load conditions. Try to keep the DR/BDR limited to one segment each 157 Tuning OSPF (2) OSPF startup: max-metric router-lsa on-startup wait-for-bgp Avoids blackholing traffic on router restart. Causes OSPF to announce its prefixes with the highest possible metric until iBGP is up and running. When iBGP is running, OSPF metrics return to normal, making the path valid. ISIS equivalent: set-overload-bit on-startup wait-for-bgp 158 Tuning OSPF (3) Hello/Dead Timers: ip ospf hello-interval 3 (default 10) ip ospf dead-interval 15 (default is 4x hello) This allows for faster network awareness of a failure, and can result in faster reconvergence, but requires more router CPU and generates more overhead. LSA Pacing: timers lsa-group-pacing 300 (default 240) Allows grouping and pacing of LSA updates at the configured interval. Reduces overall network and router impact 159 Tuning OSPF (4) OSPF Internal Timers: timers spf 2 8 (default is 5 and 10) Allows you to adjust SPF characteristics. The first number sets the wait time from topology change to SPF run. The second is the hold-down between SPF runs. BE CAREFUL WITH THIS COMMAND; if you’re not sure when to use it, it means you don’t need it; default is sufficient 95% of the time 160 Tuning OSPF (5) LSA filtering/interface blocking. An OSPF router will flood an LSA out of all interfaces except the receiving one; LSA filtering can be useful in cases where such flooding is unnecessary (e.g. NBMA networks), where the DR/BDR can handle flooding chores. At ABRs: area <area-id> filter-list <acl> Filters out specific Type 3 LSAs at ABRs. Per neighbor: neighbor 1.1.1.1 database-filter all out (no options) Per interface: ip ospf
database-filter all out (no options) Improper use can result in routing loops and black-holes that can be very difficult to troubleshoot 161 Summary OSPF has a bewildering number of features and options Observe ISP best practices Keep design and configuration simple Investigate tuning options and suitability for your own network Don’t just turn them on! 162 Deploying OSPF for ISPs ISP Training Workshops 163 Introduction to BGP ISP Training Workshops 164 Border Gateway Protocol A Routing Protocol used to exchange routing information between different networks Described in RFC4271 Exterior gateway protocol RFC4276 gives an implementation report on BGP RFC4277 describes operational experiences using BGP The Autonomous System is the cornerstone of BGP It is used to uniquely identify networks with a common routing policy 165 BGP Path Vector Protocol Incremental Updates Many options for policy enforcement Classless Inter Domain Routing (CIDR) Widely used for Internet backbone Autonomous systems 166 Path Vector Protocol BGP is classified as a path vector routing protocol (see RFC 1322) A path vector protocol defines a route as a pairing between a destination and the attributes of the path to that destination. 
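As a rough illustration of that pairing, the example route from this slide (12.6.126.0/24 via next hop 207.126.96.43, AS path 6461 7018 6337 11268) can be modelled as a small record. This is a sketch only: the Route class, accept and advertise are hypothetical names, and the local AS numbers and the 192.0.2.1 address are made-up example values, not part of the slides.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Route:
    prefix: str                 # the destination
    next_hop: str               # one of the path attributes
    as_path: Tuple[int, ...]    # leftmost = most recently added AS, rightmost = origin AS


def accept(route, local_as):
    """Path-vector loop check: reject a route whose AS path already contains our AS."""
    return local_as not in route.as_path


def advertise(route, local_as, my_address):
    """Sketch of sending a route to an external peer: prepend our AS, set next hop."""
    return Route(route.prefix, my_address, (local_as,) + route.as_path)


# The example route shown on the slide.
SLIDE_ROUTE = Route("12.6.126.0/24", "207.126.96.43", (6461, 7018, 6337, 11268))

if __name__ == "__main__":
    print("origin AS:", SLIDE_ROUTE.as_path[-1])
    print("accepted by AS100:", accept(SLIDE_ROUTE, 100))
    print("accepted by AS7018:", accept(SLIDE_ROUTE, 7018))
    print(advertise(SLIDE_ROUTE, 100, "192.0.2.1"))
```

Carrying the whole path, rather than just a distance, is what lets each AS detect loops and apply policy based on which networks a route has passed through.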
12.6.126.0/24 207.126.96.43 1021 0 6461 7018 6337 11268 i (the sequence 6461 7018 6337 11268 is the AS path) 167 Path Vector Protocol [Diagram: AS6337, AS11268, AS7018, AS500, AS6461 and AS600] 168 Definitions Transit – carrying traffic across a network, usually for a fee. Peering – exchanging routing information and traffic. Default – where to send traffic when there is no explicit match in the routing table 169 Default Free Zone The default free zone is made up of Internet routers which have explicit routing information about the rest of the Internet, and therefore do not need to use a default route. NB: it is not related to where an ISP is in the hierarchy 170 Peering and Transit example [Diagram: provider A, provider B, provider C, Backbone Provider D, IXP-West, IXP-East] A and B can peer, but need transit arrangements with D to get packets to/from C 171 Autonomous System (AS) [Diagram: AS 100] Collection of networks with the same routing policy. Single routing protocol. Usually under single ownership, trust and administrative control. Identified by a unique 32-bit integer (ASN) 172 Autonomous System Number (ASN) Two ranges:
0-65535 (original 16-bit range)
65536-4294967295 (32-bit range – RFC4893)
Usage:
0 and 65535 (reserved)
1-64495 (public Internet)
64496-64511 (documentation – RFC5398)
64512-65534 (private use only)
23456 (represent 32-bit range in 16-bit world)
65536-65551 (documentation – RFC5398)
65552-4294967295 (public Internet)
32-bit range representation specified in RFC5396, which defines “asplain” (traditional format) as standard notation 173 Autonomous System Number (ASN) ASNs are distributed by the Regional Internet Registries. They are also available from upstream ISPs who are members of one of the RIRs. Current 16-bit ASN allocations up to 61439 have been made to the RIRs; around 42000 are visible on the Internet. Each RIR has also received a block of 32-bit ASNs; out of 3100 assignments, around 2800 are visible on the Internet. See www.iana.org/assignments/as-numbers 174 Configuring BGP in Cisco IOS This command enables BGP in Cisco IOS: router bgp 100 For
ASNs > 65535, the AS number can be entered in either plain or dot notation: router bgp 131076 or router bgp 2.4 IOS will display ASNs in plain notation by default. Dot notation is optional: router bgp 2.4 bgp asnotation dot 175 BGP Basics [Diagram: routers A and B in AS 100, C and D in AS 101, E in AS 102, with peering between the ASes] Runs over TCP – port 179. Path vector protocol. Incremental updates. “Internal” & “External” BGP 176 Demarcation Zone (DMZ) [Diagram: routers A and B in AS 100, C and D in AS 101, E in AS 102, joined by the DMZ Network] DMZ is the link or network shared between ASes 177 BGP General Operation Learns multiple paths via internal and external BGP speakers. Picks the best path and installs it in the routing table (RIB). Best path is sent to external BGP neighbours. Policies are applied by influencing the best path selection 178 Constructing the Forwarding Table BGP “in” process: receives path information from peers; results of BGP path selection placed in the BGP table; “best path” flagged. BGP “out” process: announces “best path” information to peers. Best path stored in Routing Table (RIB). Best paths in the RIB are installed in the forwarding table (FIB) if: prefix and prefix length are unique; lowest “protocol distance” 179 Constructing the Forwarding Table [Diagram: BGP peer, BGP in process (accepted/discarded), BGP table (everything), best paths, routing table, forwarding table, BGP out process] 180 eBGP & iBGP BGP is used internally (iBGP) and externally (eBGP). iBGP is used to carry: some/all Internet prefixes across the ISP backbone; the ISP’s customer prefixes. eBGP is used to: exchange prefixes with other ASes; implement routing policy 181 BGP/IGP model used in ISP networks Model representation [Diagram: AS1, AS2, AS3 and AS4, each running its own IGP and iBGP, linked to their neighbours by eBGP] 182 External BGP Peering (eBGP) [Diagram: routers A and B in AS 100, router C in AS 101] Between BGP speakers in different ASes. Should be directly connected. Never run an IGP between eBGP peers 183 Configuring External BGP Router A in AS100 (ip address on ethernet interface): interface ethernet 5/0 ip address 102.102.10.2 255.255.255.240 !
router bgp 100 network 100.100.8.0 mask 255.255.252.0 neighbor 102.102.10.1 remote-as 101 neighbor 102.102.10.1 prefix-list RouterC in neighbor 102.102.10.1 prefix-list RouterC out ! (Callouts: 100 is the local ASN; 101 is the remote ASN; 102.102.10.1 is the ip address of Router C's ethernet interface; the prefix-list lines are the inbound and outbound filters) 184 Configuring External BGP Router C in AS101 (ip address on ethernet interface): interface ethernet 1/0/0 ip address 102.102.10.1 255.255.255.240 ! router bgp 101 network 100.100.64.0 mask 255.255.248.0 neighbor 102.102.10.2 remote-as 100 neighbor 102.102.10.2 prefix-list RouterA in neighbor 102.102.10.2 prefix-list RouterA out ! (Callouts: 101 is the local ASN; 100 is the remote ASN; 102.102.10.2 is the ip address of Router A's ethernet interface; the prefix-list lines are the inbound and outbound filters) 185 Internal BGP (iBGP) BGP peer within the same AS. Not required to be directly connected: IGP takes care of inter-BGP speaker connectivity. iBGP speakers must be fully meshed: They originate connected networks. They pass on prefixes learned from outside the ASN. They do not pass on prefixes learned from other iBGP speakers 186 Internal BGP Peering (iBGP) [Diagram: routers A, B, C and D in AS 100] Topology independent. Each iBGP speaker must peer with every other iBGP speaker in the AS 187 Peering between Loopback Interfaces [Diagram: routers A, B and C in AS 100] Peer with loop-back interface. Loop-back interface does not go down – ever! Do not want iBGP session to depend on state of a single interface or the physical topology 188 Configuring Internal BGP Router A in AS100 (ip address on loopback interface): interface loopback 0 ip address 105.3.7.1 255.255.255.255 ! router bgp 100 network 100.100.1.0 neighbor 105.3.7.2 remote-as 100 neighbor 105.3.7.2 update-source loopback0 neighbor 105.3.7.3 remote-as 100 neighbor 105.3.7.3 update-source loopback0 ! (Callouts: 100 is the local ASN in both router bgp and remote-as; 105.3.7.2 is the ip address of Router B's loopback interface) 189 Configuring Internal BGP Router B in AS100 (ip address on loopback interface): interface loopback 0 ip address 105.3.7.2 255.255.255.255 !
router bgp 100 network 100.100.1.0 neighbor 105.3.7.1 remote-as 100 neighbor 105.3.7.1 update-source loopback0 neighbor 105.3.7.3 remote-as 100 neighbor 105.3.7.3 update-source loopback0 ! (Callouts: 100 is the local ASN in both router bgp and remote-as; 105.3.7.1 is the ip address of Router A's loopback interface) 190 Inserting prefixes into BGP Two ways to insert prefixes into BGP: redistribute static; network command 191 Inserting prefixes into BGP – redistribute static Configuration Example: router bgp 100 redistribute static ip route 102.10.32.0 255.255.254.0 serial0 Static route must exist before the redistribute command will work. Forces origin to be “incomplete”. Care required! 192 Inserting prefixes into BGP – redistribute static Care required with redistribute! redistribute <routing-protocol> means everything in the <routing-protocol> will be transferred into the current routing protocol. Will not scale if uncontrolled. Best avoided if at all possible. redistribute is normally used with “route-maps” and under tight administrative control 193 Inserting prefixes into BGP – network command Configuration Example: router bgp 100 network 102.10.32.0 mask 255.255.254.0 ip route 102.10.32.0 255.255.254.0 serial0 A matching route must exist in the routing table before the network is announced. Forces origin to be “IGP” 194 Configuring Aggregation Three ways to configure route aggregation: redistribute static; aggregate-address; network command 195 Configuring Aggregation Configuration Example: router bgp 100 redistribute static ip route 102.10.0.0 255.255.0.0 null0 250 A static route to “null0” is called a pull up route. Packets are only sent here if there is no more specific match in the routing table. Distance of 250 ensures this is the last resort static. Care required – see previously!
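The behaviour of the pull up route can be sketched with a toy longest-prefix-match lookup. This is an illustrative sketch using the standard ipaddress module, not how a router implements forwarding; the function name longest_match and the test destinations are assumptions, the two routes reuse prefixes from the slides above, and the administrative distance of 250 is deliberately not modelled.

```python
import ipaddress


def longest_match(destination, routes):
    """Return (prefix, target) of the most specific route covering destination, or None."""
    addr = ipaddress.ip_address(destination)
    candidates = []
    for prefix, target in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net:
            candidates.append((net, target))
    if not candidates:
        return None
    net, target = max(candidates, key=lambda c: c[0].prefixlen)
    return str(net), target


ROUTES = [
    ("102.10.0.0/16", "null0"),     # pull up route for the aggregate
    ("102.10.32.0/23", "serial0"),  # more specific static from the earlier slide
]

if __name__ == "__main__":
    # Hypothetical destinations, chosen only to exercise both routes.
    for dest in ("102.10.33.7", "102.10.200.1", "8.8.8.8"):
        print(dest, "->", longest_match(dest, ROUTES))
```

Traffic for an address covered by a more specific route follows that route; only traffic for the unused remainder of the aggregate falls through to null0 and is discarded.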
196 Configuring Aggregation – Network Command Configuration Example: router bgp 100 network 102.10.0.0 mask 255.255.0.0 ip route 102.10.0.0 255.255.0.0 null0 250 A matching route must exist in the routing table before the network is announced. Easiest and best way of generating an aggregate 197 Configuring Aggregation – aggregate-address command Configuration Example: router bgp 100 network 102.10.32.0 mask 255.255.252.0 aggregate-address 102.10.0.0 255.255.0.0 [summary-only] Requires a more specific prefix in the BGP table before the aggregate is announced. summary-only keyword: Optional keyword which ensures that only the summary is announced if a more specific prefix exists in the routing table 198 Summary BGP neighbour status
Router6>sh ip bgp sum
BGP router identifier 10.0.15.246, local AS number 10
BGP table version is 16, main routing table version 16
7 network entries using 819 bytes of memory
14 path entries using 728 bytes of memory
2/1 BGP path/bestpath attribute entries using 248 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 1795 total bytes of memory
BGP activity 7/0 prefixes, 14/0 paths, scan interval 60 secs
Neighbor      V   AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd
10.0.15.241   4   10  9        8        16      0    0     00:04:47  2
10.0.15.242   4   10  6        5        16      0    0     00:01:43  2
10.0.15.243   4   10  9        8        16      0    0     00:04:49  2
...
(Callouts: V is the BGP version; MsgRcvd/MsgSent are updates sent and received; InQ/OutQ are updates waiting)
199 Summary BGP Table
Router6>sh ip bgp
BGP table version is 16, local router ID is 10.0.15.246
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter, x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
     Network         Next Hop      Metric  LocPrf  Weight  Path
*>i  10.0.0.0/26     10.0.15.241   0       100     0       i
*>i  10.0.0.64/26    10.0.15.242   0       100     0       i
*>i  10.0.0.128/26   10.0.15.243   0       100     0       i
*>i  10.0.0.192/26   10.0.15.244   0       100     0       i
*>i  10.0.1.0/26     10.0.15.245   0       100     0       i
*>   10.0.1.64/26    0.0.0.0       0               32768   i
*>i  10.0.1.128/26   10.0.15.247   0       100     0       i
*>i  10.0.1.192/26   10.0.15.248   0       100     0       i
*>i  10.0.2.0/26     10.0.15.249   0       100     0       i
*>i  10.0.2.64/26    10.0.15.250   0       100     0       i
...
200 Summary BGP4 – path vector protocol; iBGP versus eBGP; stable iBGP – peer with loopbacks; announcing prefixes & aggregates 201 Introduction to BGP ISP Training Workshops 202 BGP Policy Control ISP Training Workshops 203 Applying Policy with BGP Policy based on AS path, community or the prefix. Rejecting/accepting selected routes. Set attributes to influence path selection. Tools: Prefix-list (filters prefixes); Filter-list (filters ASes); Route-maps and communities 204 Policy Control – Prefix List Per neighbour prefix filter: incremental configuration. Inbound or Outbound. Based upon network numbers (using familiar IPv4 address/mask format). Using access-lists in Cisco IOS for filtering prefixes was deprecated long ago. Strongly discouraged!
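To make the matching behaviour of a prefix filter concrete, here is a hedged Python sketch of how a prefix-list entry with the optional ge and le keywords (covered on the slides that follow) can be thought of. It is a model for reasoning about filters, not IOS code: entry_matches and evaluate are hypothetical helper names, it handles IPv4 only, and the AS110_IN list reuses the entries from the AS110-IN example configuration later in this section.

```python
import ipaddress


def entry_matches(route, network, ge=None, le=None):
    """True if route (e.g. '192.168.1.0/24') matches a 'network/len [ge] [le]' entry."""
    route = ipaddress.ip_network(route)
    base = ipaddress.ip_network(network)
    if not route.subnet_of(base):       # the leading 'len' bits must agree
        return False
    if ge is None and le is None:       # no range given: the length must match exactly
        return route.prefixlen == base.prefixlen
    low = ge if ge is not None else base.prefixlen
    high = le if le is not None else 32
    return low <= route.prefixlen <= high


def evaluate(route, prefix_list):
    """First matching entry decides; no match at all means an implicit deny."""
    for action, network, ge, le in prefix_list:
        if entry_matches(route, network, ge, le):
            return action == "permit"
    return False


# Entries of the AS110-IN prefix-list from the example configuration.
AS110_IN = [
    ("deny", "218.10.0.0/16", None, None),
    ("permit", "0.0.0.0/0", None, 32),
]

if __name__ == "__main__":
    for r in ("218.10.0.0/16", "218.10.1.0/24", "10.0.0.0/8"):
        print(r, "permitted" if evaluate(r, AS110_IN) else "denied")
```

Note that the deny entry has no ge or le, so in this model it matches only the /16 itself; a more specific such as 218.10.1.0/24 falls through to the permit-all entry.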
205 Prefix-list Command Syntax Syntax: [no] ip prefix-list list-name [seq seq-value] permit|deny network/len [ge ge-value] [le le-value] network/len: The prefix and its length. ge ge-value: “greater than or equal to”. le le-value: “less than or equal to”. Both “ge” and “le” are optional; they are used to specify the range of the prefix length to be matched for prefixes that are more specific than network/len. Sequence number is also optional; use no ip prefix-list sequence-number to disable display of sequence numbers 206 Prefix Lists – Examples Deny default route: ip prefix-list EG deny 0.0.0.0/0 Permit the prefix 35.0.0.0/8: ip prefix-list EG permit 35.0.0.0/8 Deny the prefix 172.16.0.0/12: ip prefix-list EG deny 172.16.0.0/12 In 192/8 allow up to /24: ip prefix-list EG permit 192.0.0.0/8 le 24 This allows all prefix sizes in the 192.0.0.0/8 address block, apart from /25, /26, /27, /28, /29, /30, /31 and /32. 207 Prefix Lists – Examples In 192/8 deny /25 and above: ip prefix-list EG deny 192.0.0.0/8 ge 25 This denies all prefix sizes /25, /26, /27, /28, /29, /30, /31 and /32 in the address block 192.0.0.0/8. When the shorter prefixes in the block are permitted by another entry, it has the same effect as the previous example. In 193/8 permit prefixes between /12 and /20: ip prefix-list EG permit 193.0.0.0/8 ge 12 le 20 Prefix sizes /8, /9, /10, /11, and /21, /22, … and longer in the address block 193.0.0.0/8 do not match this entry, so they are denied unless another entry permits them. Permit all prefixes: ip prefix-list EG permit 0.0.0.0/0 le 32 0.0.0.0 matches all possible addresses, “0 le 32” matches all possible prefix lengths 208 Policy Control – Prefix List Example Configuration router bgp 100 network 105.7.0.0 mask 255.255.0.0 neighbor 102.10.1.1 remote-as 110 neighbor 102.10.1.1 prefix-list AS110-IN in neighbor 102.10.1.1 prefix-list AS110-OUT out !
ip prefix-list AS110-IN deny 218.10.0.0/16 ip prefix-list AS110-IN permit 0.0.0.0/0 le 32 ip prefix-list AS110-OUT permit 105.7.0.0/16 ip prefix-list AS110-OUT deny 0.0.0.0/0 le 32 209 Policy Control – Filter List Filter routes based on AS path. Inbound or Outbound. Example Configuration: router bgp 100 network 105.7.0.0 mask 255.255.0.0 neighbor 102.10.1.1 filter-list 5 out neighbor 102.10.1.1 filter-list 6 in ! ip as-path access-list 5 permit ^200$ ip as-path access-list 6 permit ^150$ 210 Policy Control – Regular Expressions Like Unix regular expressions:
.    Match one character
*    Match any number of preceding expression
+    Match at least one of preceding expression
^    Beginning of line
$    End of line
\    Escape a regular expression character
_    Beginning, end, white-space, brace
|    Or
()   brackets to contain expression
[]   brackets to contain number ranges
211 Policy Control – Regular Expressions Simple Examples:
.*            match anything
.+            match at least one character
^$            match routes local to this AS
_1800$        originated by AS1800
^1800_        received from AS1800
_1800_        via AS1800
_790_1800_    via AS1800 and AS790
_(1800_)+     multiple AS1800 in sequence (used to match AS-PATH prepends)
_\(65530\)_   via AS65530 (confederations)
212 Policy Control – Regular Expressions Not so simple Examples:
^[0-9]+$                  Match AS_PATH length of one
^[0-9]+_[0-9]+$           Match AS_PATH length of two
^[0-9]*_[0-9]+$           Match AS_PATH length of one or two
^[0-9]*_[0-9]*$           Match AS_PATH length of one or two (will also match zero)
^[0-9]+_[0-9]+_[0-9]+$    Match AS_PATH length of three
_(701|1800)_              Match anything which has gone through AS701 or AS1800
_1849(_.+_)12163$         Match anything of origin AS12163 and passed through AS1849
213 Policy Control – Route Maps A route-map is like a “programme” for IOS. Has “line” numbers, like programmes. Each line is a separate condition/action. Concept is basically: if match then do expression and exit, else if match then do expression and exit, else etc. Route-map “continue” lets ISPs apply multiple
conditions and actions in one route-map 214 Route Maps – Caveats Lines can have multiple set statements. Lines can have multiple match statements. Line with only a match statement: only prefixes matching go through, the rest are dropped. Line with only a set statement: all prefixes are matched and set; any following lines are ignored. Line with a match/set statement and no following lines: only prefixes matching are set, the rest are dropped 215 Route Maps – Caveats Example Omitting the third line below means that prefixes not matching list-one or list-two are dropped: route-map sample permit 10 match ip address prefix-list list-one set local-preference 120 ! route-map sample permit 20 match ip address prefix-list list-two set local-preference 80 ! route-map sample permit 30 ! Don’t forget this 216 Route Maps – Matching prefixes Example Configuration router bgp 100 neighbor 1.1.1.1 route-map infilter in ! route-map infilter permit 10 match ip address prefix-list HIGH-PREF set local-preference 120 ! route-map infilter permit 20 match ip address prefix-list LOW-PREF set local-preference 80 ! ip prefix-list HIGH-PREF permit 10.0.0.0/8 ip prefix-list LOW-PREF permit 20.0.0.0/8 217 Route Maps – AS-PATH filtering Example Configuration router bgp 100 neighbor 102.10.1.2 remote-as 200 neighbor 102.10.1.2 route-map filter-on-as-path in ! route-map filter-on-as-path permit 10 match as-path 1 set local-preference 80 ! route-map filter-on-as-path permit 20 match as-path 2 set local-preference 200 ! ip as-path access-list 1 permit _150$ ip as-path access-list 2 permit _210_ 218 Route Maps – AS-PATH prepends Example configuration of AS-PATH prepend router bgp 300 network 105.7.0.0 mask 255.255.0.0 neighbor 2.2.2.2 remote-as 100 neighbor 2.2.2.2 route-map SETPATH out !
    route-map SETPATH permit 10
     set as-path prepend 300 300
Use your own AS number when prepending
  Otherwise BGP loop detection may cause disconnects
219

Route Maps – Matching Communities
Example Configuration
    router bgp 100
     neighbor 102.10.1.2 remote-as 200
     neighbor 102.10.1.2 route-map filter-on-community in
    !
    route-map filter-on-community permit 10
     match community 1
     set local-preference 50
    !
    route-map filter-on-community permit 20
     match community 2 exact-match
     set local-preference 200
    !
    ip community-list 1 permit 150:3 200:5
    ip community-list 2 permit 88:6
220

Community-List Processing
Note:
When multiple values are configured in the same community list statement, a logical AND condition is created. All community values must match to satisfy an AND condition
    ip community-list 1 permit 150:3 200:5
When multiple values are configured in separate community list statements, a logical OR condition is created. The first list that matches a condition is processed
    ip community-list 1 permit 150:3
    ip community-list 1 permit 200:5
221

Route Maps – Setting Communities
Example Configuration
    router bgp 100
     network 105.7.0.0 mask 255.255.0.0
     neighbor 102.10.1.1 remote-as 200
     neighbor 102.10.1.1 send-community
     neighbor 102.10.1.1 route-map set-community out
    !
    route-map set-community permit 10
     match ip address prefix-list NO-ANNOUNCE
     set community no-export
    !
    route-map set-community permit 20
     match ip address prefix-list AGGREGATE
    !
    ip prefix-list NO-ANNOUNCE permit 105.7.0.0/16 ge 17
    ip prefix-list AGGREGATE permit 105.7.0.0/16
222

Route Map Continue
Handling multiple conditions and actions in one route-map (for BGP neighbour relationships only)
    route-map peer-filter permit 10
     match ip address prefix-list group-one
     continue 30
     set metric 2000
    !
    route-map peer-filter permit 20
     match ip address prefix-list group-two
     set community no-export
    !
    route-map peer-filter permit 30
     match ip address prefix-list group-three
     set as-path prepend 100 100
    !
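The peer-filter route-map above can be traced with a small Python model. This is a simplified sketch, not IOS behaviour in full: the Entry class and evaluate function are hypothetical names, only permit entries are modelled, and it assumes that a prefix which matched an entry with "continue" stays permitted even if the continue target does not match.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    """One route-map line: sequence number, prefix-list to match, set actions."""
    seq: int
    match_list: str
    sets: dict = field(default_factory=dict)
    continue_to: Optional[int] = None  # sequence number named by "continue"


def evaluate(entries, memberships):
    """Return the accumulated set actions, or None if the prefix is dropped.

    memberships is the set of prefix-list names that the prefix matches.
    """
    entries = sorted(entries, key=lambda e: e.seq)
    actions = {}
    matched = False
    i = 0
    while i < len(entries):
        entry = entries[i]
        if entry.match_list not in memberships:
            i += 1  # no match: fall through to the next line
            continue
        matched = True
        actions.update(entry.sets)
        if entry.continue_to is None:
            break  # ordinary behaviour: first match exits the route-map
        # "continue": jump forward to the target line (or the next one after it)
        i = next(
            (j for j, e in enumerate(entries) if j > i and e.seq >= entry.continue_to),
            len(entries),
        )
    return actions if matched else None  # no line matched: implicit deny


peer_filter = [
    Entry(10, "group-one", {"metric": 2000}, continue_to=30),
    Entry(20, "group-two", {"community": "no-export"}),
    Entry(30, "group-three", {"as_path_prepend": [100, 100]}),
]

both = evaluate(peer_filter, {"group-one", "group-three"})
only_two = evaluate(peer_filter, {"group-two"})
neither = evaluate(peer_filter, set())
```

In this model a prefix in group-one and group-three receives both the metric and the prepend, a prefix only in group-two receives no-export, and a prefix in none of the lists is dropped. Line 20 is never applied to a group-one prefix because the continue jumps over it.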
223

Order of processing BGP policy
For policies applied to a specific BGP neighbour, the following sequence is applied:
For inbound updates, the order is:
  Route-map
  Filter-list
  Prefix-list
For outbound updates, the order is:
  Prefix-list
  Filter-list
  Route-map
224

Managing Policy Changes
New policies only apply to the updates going through the router AFTER the policy has been introduced or changed
To apply a policy change to the entire BGP table the router handles, the BGP peerings need to be “refreshed”
  This is done by clearing the BGP session either in or out, for example:
    clear ip bgp <neighbour-addr> in|out
Do NOT forget in or out: omitting it results in a hard reset of the BGP session
225

Managing Policy Changes
Ability to clear the BGP sessions of groups of neighbours configured according to several criteria
    clear ip bgp <addr> [in|out]
<addr> may be any of the following
    x.x.x.x              IP address of a peer
    *                    all peers
    ASN                  all peers in an AS
    external             all external peers
    peer-group <name>    all peers in a peer-group
226

BGP Policy Control
ISP Training Workshops
227

Internet Exchange Point Design
ISP Training Workshops
228

IXP Design
Background
Why set up an IXP?
Layer 2 Exchange Point
Layer 3 “Exchange Point”
Design Considerations
Route Collectors & Servers
What can go wrong?
229

A bit of history
In a time long gone…
230

A Bit of History…
End of NSFnet – one major backbone
Move towards commercial Internet
  Private companies selling their bandwidth
Need for coordination of routing exchange between providers
  Traffic from ISP A needs to get to ISP B
Routing Arbiter project created to facilitate this
231

What is an Exchange Point
Network Access Points (NAPs) established at end of NSFnet
  The original “exchange points”
Major providers connect their networks and exchange traffic
High-speed network or ethernet switch
Simple concept – any place where providers come together to exchange traffic
232

Internet Exchange Points
Layer 2 exchange point
  Ethernet (100Gbps/10Gbps/1Gbps/100Mbps)
  Older technologies include ATM, Frame Relay, SRP, FDDI and SMDS
Layer 3 exchange point
  Router based
  Has historical status now
233

Why an Internet Exchange Point?
Saving money, improving QoS, Generating a local Internet economy
234

Internet Exchange Point
Why peer?
Consider a region with one ISP
  They provide internet connectivity to their customers
  They have one or two international connections
Internet grows, another ISP sets up in competition
  They provide internet connectivity to their customers
  They have one or two international connections
How does traffic from customer of one ISP get to customer of the other ISP?
  Via the international connections
235

Internet Exchange Point
Why peer?
Yes, International Connections…
  If satellite, RTT is around 550ms per hop
  So local traffic takes over 1s round trip
International bandwidth
  Costs significantly more than domestic bandwidth
  Congested with local traffic
  Wastes money, harms performance
236

Internet Exchange Point
Why peer?
Solution:
  Two competing ISPs peer with each other
Result:
  Both save money
  Local traffic stays local
  Better network performance, better QoS,…
  More international bandwidth for expensive international traffic
  Everyone is happy
237

Internet Exchange Point
Why peer?
A third ISP enters the equation
  Becomes a significant player in the region
  Local and international traffic goes over their international connections
They agree to peer with the two other ISPs
  To save money
  To keep local traffic local
  To improve network performance, QoS,…
238

Internet Exchange Point
Why peer?
Private peering means that the three ISPs have to buy circuits between each other
  Works for three ISPs, but adding a fourth or a fifth means this does not scale
Solution:
  Internet Exchange Point
239

Internet Exchange Point
Every participant has to buy just one whole circuit
  From their premises to the IXP
Rather than N-1 half circuits to connect to the N-1 other ISPs
  5 ISPs have to buy 4 half circuits = 2 whole circuits, already twice the cost of the IXP connection
240

Internet Exchange Point
Solution
  Every ISP participates in the IXP
  Cost is minimal – one local circuit covers all domestic traffic
  International circuits are used for just international traffic – and backing up domestic links in case the IXP fails
Result:
  Local traffic stays local
  QoS considerations for local traffic are not an issue
  RTTs are typically sub 10ms
  Customers enjoy the Internet experience
  Local Internet economy grows rapidly
241

Layer 2 Exchange
The traditional IXP
242

IXP Design
Very simple concept:
  Ethernet switch is the interconnection media
    IXP is one LAN
  Each ISP brings a router, connects it to the ethernet switch provided at the IXP
  Each ISP peers with other participants at the IXP using BGP
Scaling this simple concept is the challenge for the larger IXPs
243

Layer 2 Exchange
[Diagram: ISP 1 to ISP 6 connect to a single Ethernet Switch. The IXP Management Network and the IXP Services (Root & TLD DNS, Routing Registry, Looking Glass, etc) attach to the same switch.]
244

Layer 2 Exchange
[Diagram: as above, but with two Ethernet Switches. ISP 1 to ISP 6, the IXP Management Network and the IXP Services (Root & TLD DNS, Routing Registry, Looking Glass, etc) attach to the switches.]
245

Layer 2 Exchange
Two switches for redundancy
ISPs use dual routers for redundancy or loadsharing
Offer
services for the “common good”
  Internet portals and search engines
  DNS Root & TLDs, NTP servers
  Routing Registry and Looking Glass
246

Layer 2 Exchange
Requires neutral IXP management
  Usually funded equally by IXP participants
  24x7 cover, support, value add services
Secure and neutral location
Configuration
  Private address space if non-transit and no value add services
  Otherwise public IPv4 (/24) and IPv6 (/64)
  ISPs require AS, basic IXP does not
247

Layer 2 Exchange
Network Security Considerations
  LAN switch needs to be securely configured
  Management routers require TACACS+ authentication, vty security
  IXP services must be behind router(s) with strong filters
248

“Layer 3 IXP”
Layer 3 IXP is marketing concept used by Transit ISPs
Real Internet Exchange Points are only Layer 2
249

IXP Design Considerations
250

Exchange Point Design
The IXP Core is an Ethernet switch
  It must be a managed switch
Has superseded all other types of network devices for an IXP
  From the cheapest and smallest managed 12 or 24 port 10/100 switch
  To the largest switches now handling high densities of 10GE and 100GE interfaces
251

Exchange Point Design
Each ISP participating in the IXP brings a router to the IXP location
Router needs:
  One Ethernet port to connect to IXP switch
  One WAN port to connect to the WAN media leading back to the ISP backbone
  To be able to run BGP
252

Exchange Point Design
IXP switch located in one equipment rack dedicated to IXP
  Also includes other IXP operational equipment
Routers from participant ISPs located in neighbouring/adjacent rack(s)
Copper (UTP) connections made for 10Mbps, 100Mbps or 1Gbps connections
Fibre used for 1Gbps, 10Gbps, 40Gbps or 100Gbps connections
253

Peering
Each participant needs to run BGP
  They need their own AS number
  Public ASN, NOT private ASN
Each participant configures external BGP directly with the other participants in the IXP
  Peering with all participants
  or
  Peering with a subset of participants
254

Peering (more)
Mandatory
Multi-Lateral Peering (MMLP)
  Each participant is forced to peer with every other participant as part of their IXP membership
  Has no history of success – the practice is strongly discouraged
Multi-Lateral Peering (MLP)
  Each participant peers with every other participant (usually via a Route Server)
Bi-Lateral Peering
  Participants set up peering with each other according to their own requirements and business relationships
  This is the most common situation at IXPs today
255

Routing
ISP border routers at the IXP must NOT be configured with a default route or carry the full Internet routing table
  Carrying default or full table means that this router and the ISP network is open to abuse by non-peering IXP members
  Correct configuration is only to carry routes offered to IXP peers on the IXP peering router
Note: Some ISPs offer transit across IX fabrics
  They do so at their own risk – see above
256

Routing (more)
ISP border routers at the IXP should not be configured to carry the IXP LAN network within the IGP or iBGP
  Use next-hop-self BGP concept
Don’t generate ISP prefix aggregates on IXP peering router
  If connection from backbone to IXP router goes down, normal BGP failover will then be successful
257

Address Space
Some IXPs use private addresses for the IX LAN
  Public address space means IXP network could be leaked to Internet which may be undesirable
  Because most ISPs filter RFC1918 address space, this avoids the problem
Some IXPs use public addresses for the IX LAN
  Address space available from the RIRs
  IXP terms of participation often forbid the IX LAN to be carried in the ISP member backbone
258

Hardware
Try not to mix port speeds
  if 10Mbps and 100Mbps connections available, terminate on different switches (L2 IXP)
Don’t mix transports
  if terminating ATM PVCs and G/F/Ethernet, terminate on different devices
Insist that IXP participants bring their own router
  moves buffering problem off the IXP
  security is responsibility of the ISP, not the IXP
259

Charging
IXPs
should be run at minimal cost to participants
Examples:
  Datacentre hosts IX for free
    Because ISP participants then use data centre for co-lo services, and the datacentre benefits long term
  IX operates cost recovery
    Each member pays a flat fee towards the cost of the switch, hosting, power & management
  Different pricing for different ports
    One slot may handle 24 10GE ports
    Or one slot may handle 96 1GE ports
    96 port 1GE card is tenth price of 24 port 10GE card
    Relative port cost is passed on to participants
260

Services Offered
Services offered should not compete with member ISPs (basic IXP)
  e.g. web hosting at an IXP is a bad idea unless all members agree to it
IXP operations should make performance and throughput statistics available to members
  Use tools such as MRTG/Cacti to produce IX throughput graphs for member (or public) information
261

Services to Offer
ccTLD DNS
  the country IXP could host the country’s top level DNS
  e.g. “SE.” TLD is hosted at Netnod IXes in Sweden
  Offer back up of other country ccTLD DNS
Root server
  Anycast instances of I.root-servers.net, F.root-servers.net etc are present at many IXes
Usenet News
  Usenet News is high volume
  could save bandwidth to all IXP members
262

Services to Offer
Route Collector
  Route collector shows the reachability information available at the exchange
  Technical detail covered later on
Looking Glass
  One way of making the Route Collector routes available for global view (e.g. www.traceroute.org)
  Public or members only access
263

Services to Offer
Content Redistribution/Caching
  For example, Akamised update distribution service
Network Time Protocol
  Locate a stratum 1 time source (GPS receiver, atomic clock, etc) at IXP
Routing Registry
  Used to register the routing policy of the IXP membership (more later)
264

Introduction to Route Collectors
What routes are available at the IXP?
265

What is a Route Collector?
Usually a router or Unix system running BGP
Gathers routing information from service provider routers at an IXP
  Peers with each ISP using BGP
Does not forward packets
Does not announce any prefixes to ISPs
266

Purpose of a Route Collector
To provide a public view of the Routing Information available at the IXP
  Useful for existing members to check functionality of BGP filters
  Useful for prospective members to check value of joining the IXP
  Useful for the Internet Operations community for troubleshooting purposes
    E.g. www.traceroute.org
267

Route Collector at an IXP
[Diagram: routers R1 to R5 and the Route Collector all connect to the IXP SWITCH.]
268

Route Collector Requirements
Router or Unix system running BGP
  Minimal memory requirements – only holds IXP routes
  Minimal packet forwarding requirements – doesn’t forward any packets
Peers eBGP with every IXP member
  Accepts everything; Gives nothing
  Uses a private ASN
  Connects to IXP Transit LAN
“Back end” connection
  Second Ethernet globally routed
  Connection to IXP Website for public access
269

Route Collector Implementation
Most IXPs now implement some form of Route Collector
Benefits already mentioned
Great public relations tool
Unsophisticated requirements
  Just runs BGP
270

Introduction to Route Servers
How to scale very large IXPs
271

What is a Route Server?
Has all the features of a Route Collector
But also:
  Announces routes to participating IXP members according to their routing policy definitions
Implemented using the same specification as for a Route Collector
272

Features of a Route Server
Helps scale routing for large IXPs
Simplifies Routing Processes on ISP Routers
Optional participation
  Provided as service, is NOT mandatory
Does result in insertion of RS Autonomous System Number in the Routing Path (if using a router rather than a dedicated route-server BGP implementation)
Optionally uses Policy registered in IRR
273

Diagram of N-squared Peering Mesh
[Diagram: full mesh of peering sessions between ISP routers.]
For large IXPs (dozens of participants) maintaining a larger peering mesh becomes cumbersome and often too hard
274

Peering Mesh with Route Servers
[Diagram: ISP routers each peering with two Route Servers (RS).]
ISP routers peer with the Route Servers
  Only need to have two eBGP sessions rather than N
275

RS based Exchange Point Routing Flow
[Diagram: ROUTING INFORMATION FLOW passes through the RS; TRAFFIC FLOW goes directly between ISP routers.]
276

Advantages of Using a Route Server
Advantageous for large IXPs
  Helps scale eBGP mesh
  Helps scale prefix distribution
Separation of Routing and Forwarding
Simplifies BGP Configuration Management on ISP routers
277

Disadvantages of using a Route Server
ISPs can lose direct policy control
  If RS is only peer, ISPs have no control over who their prefixes are distributed to
Completely dependent on 3rd party
  Configuration, troubleshooting, etc…
Insertion of RS ASN into routing path
  (If using a router rather than a dedicated route-server BGP implementation)
  Traffic engineering/multihoming needs more care
278

Typical usage of a Route Server
Route Servers may be provided as an OPTIONAL service
  Most common at large IXPs (>50 participants)
  Examples: LINX, TorIX, AMS-IX, etc
ISPs peer:
  Directly with significant peers
  With Route Server for the rest
279

Things to think about...
Would using a route server benefit you?
  Helpful when BGP knowledge is limited (but is NOT an excuse not to learn BGP)
  Avoids having to maintain a large number of eBGP peers
  But can you afford to lose policy control?
  (An ISP not in control of their routing policy is what?)
280

What can go wrong…
The different ways IXP operators harm their IXP…
281

What can go wrong?
Concept
Some Service Providers attempt to cash in on the reputation of IXPs
Market Internet transit services as “Internet Exchange Point”
  “We are exchanging packets with other ISPs, so we are an Internet Exchange Point!”
  So-called Layer-3 Exchanges – really Internet Transit Providers
  Router used rather than a Switch
  Most famous example: SingTelIX
282

What can go wrong?
Financial
Some IXPs price the IX out of the means of most providers
  IXP is intended to encourage local peering
  Acceptable charging model is minimally cost-recovery only
Some IXPs charge for port traffic
  IXPs are not a transit service, charging for traffic puts the IX in competition with members
  (There is nothing wrong with charging different flat fees for 100Mbps, 1Gbps, 10Gbps etc ports as they all have different hardware costs on
283

What can go wrong?
Competition
Too many exchange points in one locale
  Competing exchanges defeat the purpose
Becomes expensive for ISPs to connect to all of them
An IXP:
  is NOT a competition
  is NOT a profit making business
284

What can go wrong?
Rules and Restrictions
IXPs try to compete with their membership
  Offering services that ISPs would/do offer their customers
IXPs run as a closed privileged club e.g.:
  Restrictive membership criteria
IXPs providing access to end users rather than just Service Providers
IXPs interfering with ISP business decisions e.g. Mandatory Multi-Lateral Peering
285

What can go wrong?
Technical Design Errors
Interconnected IXPs
  IXP in one location believes it should connect directly to the IXP in another location
  Who pays for the interconnect?
  How is traffic metered?
  Competes with the ISPs who already provide transit between the two locations (who then refuse to join IX, harming the viability of the IX)
  Metro interconnections work ok (e.g. LINX, AMS-IX, DE-CIX etc)
286

What can go wrong?
Technical Design Errors
ISPs bridge the IXP LAN back to their offices
  “We are poor, we can’t afford a router”
  Financial benefits of connecting to an IXP far outweigh the cost of a router
  In reality it allows the ISP to connect any devices to the IXP LAN, with disastrous consequences for the security, integrity and reliability of the IXP
287

What can go wrong?
Routing Design Errors
Route Server implemented from Day One
  ISPs have no incentive to learn BGP
  Therefore have no incentive to understand peering relationships, peering policies, &c
  Entirely dependent on operator of RS for troubleshooting, configuration, reliability
    RS can’t be run by committee!
Route Server is to help scale peering at LARGE IXPs
288

What can go wrong?
Routing Design Errors
iBGP Route Reflector used to distribute prefixes between IXP participants
Claimed Advantage (1):
  Participants don’t need to know about or run BGP
Actually a Disadvantage
  IXP Operator has to know BGP
  ISP not knowing BGP is big commercial disadvantage
  ISPs who would like to have a growing successful business need to be able to multi-home, peer with other ISPs, etc – these activities require BGP
289

What can go wrong?
Routing Design Errors (cont)
Route Reflector Claimed Advantage (2):
  Allows an IXP to be started very quickly
Fact:
  IXP is only an Ethernet switch – setting up an iBGP mesh with participants is no quicker than setting up an eBGP mesh
290

What can go wrong?
Routing Design Errors (cont)
Route Reflector Claimed Advantage (3):
  IXP operator has full control over IXP activities
Actually a Disadvantage
  ISP participants surrender control of:
    Their border router; it is located in IXP’s AS
    Their routing and peering policy
  IXP operator is single point of failure
    If they aren’t available 24x7, then neither is the IXP
    BGP configuration errors by IXP operator have real impacts on ISP operations
291

What can go wrong?
Routing Design Errors (cont)
Route Reflector Disadvantage (4):
  Migration from Route Reflector to “correct” routing configuration is highly non-trivial
  ISP router is in IXP’s ASN
    Need to move ISP router from IXP’s ASN to the ISP’s ASN
    Need to reconfigure BGP on ISP router, add to ISP’s IGP and iBGP mesh, and set up eBGP with IXP participants and/or the IXP Route Server
292

More Information
293

Exchange Point Policies & Politics
AUPs
  Acceptable Use Policy
  Minimal rules for connection
Fees?
  Some IXPs charge no fee
  Other IXPs charge cost recovery
  A few IXPs are commercial
Nobody is obliged to peer
  Agreements left to ISPs, not mandated by IXP
294

Exchange Point etiquette
Don’t point default route at another IXP participant
Be aware of third-party next-hop
Only announce your aggregate routes
  Read RIPE-399 first
  www.ripe.net/docs/ripe-399.html
Filter! Filter! Filter!
295

Exchange Point Examples
LINX in London, UK
TorIX in Toronto, Canada
AMS-IX in Amsterdam, Netherlands
SIX in Seattle, Washington, US
PA-IX in Palo Alto, California, US
JPNAP in Tokyo, Japan
DE-CIX in Frankfurt, Germany
HK-IX in Hong Kong
…
All use Ethernet Switches
296

Features of IXPs (1)
Redundancy & Reliability
  Multiple switches, UPS
Support
  NOC to provide 24x7 support for problems at the exchange
DNS, Route Collector, Content & NTP servers
  ccTLD & root servers
  Content redistribution systems such as Akamai
  Route Collector – Routing Table view
297

Features of IXPs (2)
Location
  neutral co-location facilities
Address space
  Peering LAN
AS Number
  If using Route Collector/Server
Route servers (optional, for larger IXPs)
Statistics
  Traffic data – for membership
298

More info about IXPs
http://www.pch.net/documents
  Another excellent resource of IXP locations, papers, IXP statistics, etc
http://www.telegeography.com/ee/ix/index.php
  A collection of IXPs and interconnect points for ISPs
299

Summary
L2 IXP – most commonly deployed
  The core is an ethernet switch
  ATM and other old technologies are obsolete
L3 IXP –
nowadays is a marketing concept used by wholesale ISPs
  Does not offer the same flexibility as L2
  Not recommended unless there are overriding regulatory or political reasons to do so
  Avoid!
300

Internet Exchange Point Design
ISP Training Workshops
301

BGP Configuration for IXPs
ISP Training Workshops
302

Background
This presentation covers the BGP configurations required for a participant at an Internet Exchange Point
  It does not cover the technical design of an IXP
  Nor does it cover the financial and operational benefits of participating in an IXP
  See the IXP Design Presentation that is part of this Workshop Material set for financial, technical and operational details
303

Recap: Definitions
Transit – carrying traffic across a network, usually for a fee
  Traffic and prefixes originating from one AS are carried across an intermediate AS to reach their destination AS
Peering – private interconnect between two ASNs, usually for no fee
Internet Exchange Point – common interconnect location where several ASNs exchange routing information and traffic
304

IXP Peering Issues
Only announce your aggregates and your customer aggregates at IXPs
Only accept the aggregates which your peer is entitled to originate
Never carry a default route on an IXP (or private) peering router
305

ISP Transit Issues
Many mistakes are made on the Internet today due to incomplete understanding of how to configure BGP for peering at Internet Exchange Points
306

Simple BGP Configuration example
Exchange Point Configuration
307

Exchange Point Example
Exchange point with 6 ASes present
  Layer 2 – ethernet switch
Each ISP peers with the other
  NO transit across the IXP is allowed
308

Exchange Point
[Diagram: border routers A to F, one each for AS100, AS110, AS120, AS130, AS140 and AS150, connected to the exchange point.]
Each of these represents a border router in a different autonomous system
309

Router configuration
IXP router is usually located at the Exchange Point premises
Create a peer-group for IXP peers
Configuration needs to be such that disconnecting it from the
backbone does not cause routing loops or traffic blackholes
All outbound policy to each peer will be the same
Ensure the router is not carrying the default route
  Or the full routing table (for that matter)
310

Creating a peer-group & route-map
    router bgp 100
     neighbor ixp-peers peer-group
     neighbor ixp-peers send-community
     neighbor ixp-peers prefix-list my-prefixes out
     neighbor ixp-peers route-map set-local-pref in
    !
    ! Only allow AS100 address block to IXP peers
    ip prefix-list my-prefixes permit 121.10.0.0/19
    !
    ! Prefixes heard from IXP peers have highest preference
    route-map set-local-pref permit 10
     set local-preference 150
311

Interface and BGP configuration (1)
    ! IXP LAN BCP configuration
    interface fastethernet 0/0
     description Exchange Point LAN
     ip address 120.5.10.1 255.255.255.224
     no ip directed-broadcast
     no ip proxy-arp
     no ip redirects
    !
    router bgp 100
     neighbor 120.5.10.2 remote-as 110
     neighbor 120.5.10.2 peer-group ixp-peers
     neighbor 120.5.10.2 prefix-list peer110 in
     neighbor 120.5.10.3 remote-as 120
     neighbor 120.5.10.3 peer-group ixp-peers
     neighbor 120.5.10.3 prefix-list peer120 in
312

Interface and BGP Configuration (2)
    ! Peer-group applied to each peer
    ! Each peer has own inbound filter
     neighbor 120.5.10.4 remote-as 130
     neighbor 120.5.10.4 peer-group ixp-peers
     neighbor 120.5.10.4 prefix-list peer130 in
     neighbor 120.5.10.5 remote-as 140
     neighbor 120.5.10.5 peer-group ixp-peers
     neighbor 120.5.10.5 prefix-list peer140 in
     neighbor 120.5.10.6 remote-as 150
     neighbor 120.5.10.6 peer-group ixp-peers
     neighbor 120.5.10.6 prefix-list peer150 in
    !
    ip route 121.10.0.0 255.255.224.0 null0
    !
    ip prefix-list peer110 permit 122.0.0.0/19
    ip prefix-list peer120 permit 122.30.0.0/19
    ip prefix-list peer130 permit 122.12.0.0/19
    ip prefix-list peer140 permit 122.18.128.0/19
    ip prefix-list peer150 permit 122.1.32.0/19
313

Exchange Point
Configuration of the other routers in the AS is similar in concept
Notice inbound and outbound prefix filters
  outbound announces my-prefixes only
  inbound accepts peer prefixes only
Notice inbound route-map
  Set local preference higher than default
  ensures that if the same prefix is heard via AS100 upstream, the best path for traffic is via the IXP
314

Exchange Point
Ethernet port configuration
  Be aware of LAN configuration best practices
  Switch off proxy arp, redirects and broadcasts (if not already default)
IXP border router must NOT carry prefixes with origin outside local AS and IXP participant ASes
  Helps prevent “stealing of bandwidth”
315

Exchange Point
Issues:
  AS100 needs to know all the prefixes its peers are announcing
  New prefixes require the prefix-lists to be updated
Alternative solutions
  Use the Internet Routing Registry to build prefix list
  Use AS Path filters (could be risky)
316

More Complex BGP example
Exchange Point Configuration
317

Exchange Point Example
Exchange point with 6 ASes present
  Layer 2 – ethernet switch
Each ISP peers with the other
  NO transit across the IXP allowed
ISPs at exchange points provide transit to their BGP customers
318

Exchange Point
[Diagram: border routers A to F, one each for AS100, AS110, AS120, AS130, AS140 and AS150, connected to the exchange point; AS200 and AS201 are BGP customers of AS100.]
Each of these represents a border router in a different autonomous system
319

Exchange Point
Router A configuration
    interface fastethernet 0/0
     description Exchange Point LAN
     ip address 120.5.10.1 255.255.255.224
     no ip directed-broadcast
     no ip proxy-arp
     no ip redirects
    !
    ! Filter by ASN rather than by prefix – and block bogons too
    router bgp 100
     neighbor ixp-peers peer-group
     neighbor ixp-peers send-community
     neighbor ixp-peers prefix-list bogons out
     neighbor ixp-peers filter-list 10 out
     neighbor ixp-peers route-map set-local-pref in
    ...next slide
320

Exchange Point
     neighbor 120.5.10.2 remote-as 110
     neighbor 120.5.10.2 peer-group ixp-peers
     neighbor 120.5.10.2 prefix-list peer110 in
     neighbor 120.5.10.3 remote-as 120
     neighbor 120.5.10.3 peer-group ixp-peers
     neighbor 120.5.10.3 prefix-list peer120 in
     neighbor 120.5.10.4 remote-as 130
     neighbor 120.5.10.4 peer-group ixp-peers
     neighbor 120.5.10.4 prefix-list peer130 in
     neighbor 120.5.10.5 remote-as 140
     neighbor 120.5.10.5 peer-group ixp-peers
     neighbor 120.5.10.5 prefix-list peer140 in
     neighbor 120.5.10.6 remote-as 150
     neighbor 120.5.10.6 peer-group ixp-peers
     neighbor 120.5.10.6 prefix-list peer150 in
321

Exchange Point
    ip route 121.10.0.0 255.255.224.0 null0
    !
    ip as-path access-list 10 permit ^$
    ip as-path access-list 10 permit ^200$
    ip as-path access-list 10 permit ^201$
    !
    ip prefix-list peer110 permit 122.0.0.0/19
    ip prefix-list peer120 permit 122.30.0.0/19
    ip prefix-list peer130 permit 122.12.0.0/19
    ip prefix-list peer140 permit 122.18.128.0/19
    ip prefix-list peer150 permit 122.1.32.0/19
    !
    route-map set-local-pref permit 10
     set local-preference 150
322

Exchange Point
Notice the change in router A’s configuration
  Filter-list instead of prefix-list permits local and customer ASes out to exchange
  Prefix-list blocks Special Use Address prefixes – rest get out, could be risky
Other issues as previously
This configuration will not scale as more and more BGP customers are added to AS100
  As-path filter has to be updated each time
Solution:
  BGP communities
323

More scalable BGP example
Exchange Point Configuration
324

Exchange Point Example (Scalable)
Exchange point with 6 ASes present
  Layer 2 – ethernet switch
Each ISP peers with the other
  NO transit across the IXP allowed
ISPs at exchange points provide transit to their BGP customers
  (Scalable solution is presented here)
325

Exchange Point
[Diagram: border routers A to F, one each for AS100, AS110, AS120, AS130, AS140 and AS150, connected to the exchange point.]
Each of these represents a border router in a different autonomous system – each ASN has BGP customers of their own
326

Router configuration
Take AS100 as an example
  Has 15 BGP customers, in AS501 to AS515
  AS-path filters will not scale well
Create a peer-group for IXP peers
  All outbound policy to each peer will be the same
  Communities will be used
Community Policy
  AS100 aggregate put into 100:1000
  All BGP customer aggregates go into 100:1100
327

Creating a peer-group & route-map
    router bgp 100
     neighbor ixp-peers peer-group
     neighbor ixp-peers send-community
     neighbor ixp-peers route-map ixp-peers-out out
     neighbor ixp-peers route-map set-local-pref in
    !
    ! AS100 aggregate
    ip community-list 10 permit 100:1000
    ! AS100 BGP customers
    ip community-list 11 permit 100:1100
    !
    route-map ixp-peers-out permit 10
     match community 10 11
    !
    ! Prefixes heard from IXP peers have highest preference
    route-map set-local-pref permit 10
     set local-preference 150
    !
328
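The effect of the ixp-peers-out route-map can be sketched in Python. This is an illustration only: the function name and the two sample customer and upstream prefixes are hypothetical (documentation address ranges), and it assumes the two community-lists named on the single match line are alternatives, so a route carrying either community is announced.

```python
# Communities that mark routes AS100 is willing to announce at the IXP:
# 100:1000 = AS100 aggregate, 100:1100 = AS100 BGP customer aggregates.
ANNOUNCE_COMMUNITIES = {"100:1000", "100:1100"}


def announce_to_ixp(routes):
    """Return the prefixes that pass the outbound community filter.

    routes is a list of (prefix, communities) pairs. A route passes when it
    carries at least one of the announce communities; everything else is
    dropped, mirroring the implicit deny at the end of a route-map.
    """
    return [
        prefix
        for prefix, communities in routes
        if ANNOUNCE_COMMUNITIES & set(communities)
    ]


bgp_table = [
    ("121.10.0.0/19", ["100:1000"]),    # AS100 aggregate
    ("203.0.113.0/24", ["100:1100"]),   # hypothetical BGP customer aggregate
    ("198.51.100.0/24", []),            # hypothetical route learned from an upstream
]

announced = announce_to_ixp(bgp_table)
```

Only the first two prefixes are announced to the IXP peers. Adding a new BGP customer changes nothing in this filter, provided the customer's routes are tagged with 100:1100 elsewhere in the network.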

BGP configuration for IXP router
    router bgp 100
     neighbor 120.5.10.2 remote-as 110
     neighbor 120.5.10.2 peer-group ixp-peers
     neighbor 120.5.10.2 prefix-list peer110 in
     neighbor 120.5.10.3 remote-as 120
     neighbor 120.5.10.3 peer-group ixp-peers
     neighbor 120.5.10.3 prefix-list peer120 in
    ...etc
Remaining configuration is the same as earlier
Note the reliance again on inbound prefix-lists for peers
  Peers need to update the ISP if filters need to be changed
  And that’s what the IRR is for (otherwise use email)
329

BGP configuration for AS100’s customer aggregation router
    router bgp 100
     network 121.10.0.0 mask 255.255.192.0 route-map set-comm
     neighbor 121.10.4.2 remote-as 501
     neighbor 121.10.4.2 prefix-list as501-in in
     neighbor 121.10.4.2 prefix-list default out
     neighbor 121.10.4.2 route-map set-cust-policy in
    ...etc
    !
    ! Set community on AS100 aggregate
    route-map set-comm permit 10
     set community 100:1000
    !
    ! Set community on BGP customer routes
    route-map set-cust-policy permit 10
     set community 100:1100
    !
330

Scalable IXP policy
ISP Community policy is set on ingress
ISP now relies on communities to determine what is announced at the IXP
If BGP customer announces more prefixes, only the filters at the aggregation edge need to be updated
  No need to update any as-path filters, prefix-lists, &c
  And those new prefixes will automatically be tagged with the community to allow them through to AS100’s IXP peers
Consult the BGP community presentation for more extensive examples
331

Route Servers
IXP operators quite often provide a Route Server to assist with scaling the BGP mesh
  All prefixes sent to a Route Server are usually distributed to all ASNs that peer with the Route Server (although some IXPs offer ISPs the facility to configure specific policies on their Route Server)
BGP configuration to peer with a Route Server is the same as for any other ordinary peer
  But note that the route server will offer prefixes from several ASNs (the IXP membership who choose to participate)
  Inbound
filter should be constructed appropriately
332

Route Servers
Route Server software suppresses the ASN of the RS so that it doesn’t appear in the AS-path
IOS by default will not accept prefixes from a neighbouring AS unless that AS is first in the AS-path
    ! Needed so that IOS can receive prefixes without AS65534 being first in path
    router bgp 100
     no bgp enforce-first-as
     neighbor x.x.x.a remote-as 65534
     neighbor x.x.x.a route-map IXP-RS-in in
     neighbor x.x.x.a route-map ixp-peers-out out
333

Summary
Exchange Point Configuration
334

Summary
Ensure that BGP is scalable on your IXP peering router
  Manually updating filters every time a new customer connects is tiresome and has potential to cause errors
Only carry local ASN prefixes and customer routes on the IXP peering router
  Anything else (e.g. default or full BGP table) has the potential to result in bandwidth theft
Filter IXP peer announcements
  Inbound – use the IRR if maintaining prefix-lists is difficult
  Outbound – use communities for scalability
335

BGP Configuration for IXPs
ISP Training Workshops
336
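The first-AS check described in the Route Servers slides above can be illustrated with a few lines of Python. This is a minimal sketch of that one check only, under the assumption that the route server at AS65534 passes AS-paths on without adding its own ASN; the function name is hypothetical and no other BGP validation is modelled.

```python
def accept_update(as_path, neighbor_as, enforce_first_as=True):
    """Model of the first-AS check on an update received from an eBGP neighbour.

    as_path is the received AS-path as a list of ASNs, leftmost first.
    With enforcement on, the update is accepted only if the neighbour's
    own ASN is the first entry in the path.
    """
    if not enforce_first_as:
        return True
    return bool(as_path) and as_path[0] == neighbor_as


# Ordinary bilateral peer in AS110: its ASN is first in the path.
direct_peer = accept_update([110], neighbor_as=110)

# Route server in AS65534 relaying AS110's prefix with the path unchanged.
via_rs_default = accept_update([110], neighbor_as=65534)
via_rs_relaxed = accept_update([110], neighbor_as=65534, enforce_first_as=False)
```

With the default check the update relayed by the route server is rejected, because AS65534 is not first in the path. Disabling the check, as "no bgp enforce-first-as" does in the configuration above, lets it through.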