Designing Data Centers with Nexus: Deploying OTV in the Data Center

© 2006 Cisco Systems, Inc. All rights reserved. Cisco Confidential

Agenda
• OTV Introduction
• Typical OTV Deployment Models
• Path Optimization

Layer 2 Extension Requirements Between Data Centers

Business requirements
• Disaster Avoidance
• Business Continuance
• Workload mobility

Multi-site data centers
• Disaster-recovery centers, for example three centers across two locations
• Existing data centers are limited by the floor space, power, cooling, and performance capacity of their original design, so new data centers must be added to scale flexibly
• Building several physically dispersed data centers provides higher reliability and lets user traffic be shared better between data centers, giving better access performance

Traditional Layer 2 Extension
• EoMPLS
• Dark Fiber
• VPLS

Overlay Transport Virtualization (OTV)

OTV is a "MAC in IP" technique to extend Layer 2 domains OVER ANY TRANSPORT.
• Overlay: a solution that is independent of the infrastructure technology and services, flexible over various inter-connect facilities
• Transport: transporting services for Layer 2 Ethernet and IP traffic
• Virtualization: provides virtual stateless multi-access connections

OTV Control Plane
MAC Address Advertisements (Multicast-Enabled Transport)

• Every time an Edge Device learns a new MAC address, the OTV control plane advertises it together with its associated VLAN IDs and IP next hop.
• The IP next hops are the addresses of the Edge Devices through which these MAC addresses are reachable in the core.
• A single OTV update can contain multiple MAC addresses for different VLANs.
• A single update reaches all neighbors, as it is encapsulated in the same ASM multicast group used for the neighbor discovery.
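The advertisement behaviour described above can be illustrated with a small Python model. This is only a sketch under stated assumptions, not OTV's actual implementation: the class names, the `replicate` helper, the port name "Eth 1", and the addresses "IP B" and "IP C" for the receiving sites are hypothetical; only "IP A", VLAN 100, and MAC A/B/C come from the slide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MacRoute:
    """One advertised entry: VLAN, MAC, and the IP of the advertising Edge Device."""
    vlan: int
    mac: str
    next_hop: str


@dataclass
class EdgeDevice:
    name: str
    ip: str
    # (vlan, mac) -> local port name, or the IP of a remote Edge Device
    mac_table: dict = field(default_factory=dict)

    def learn_local(self, vlan, mac, port):
        """Learn a MAC on an internal interface and return the route to advertise."""
        self.mac_table[(vlan, mac)] = port
        return MacRoute(vlan, mac, self.ip)

    def receive_update(self, routes):
        """Install every route carried in one update, pointing at its IP next hop."""
        for route in routes:
            if route.next_hop != self.ip:
                self.mac_table[(route.vlan, route.mac)] = route.next_hop


def replicate(update, neighbors):
    """Stand-in for the core replicating a single multicast update to all neighbors."""
    for neighbor in neighbors:
        neighbor.receive_update(update)


west = EdgeDevice("West", "IP A")
east = EdgeDevice("East", "IP B")              # address is an assumption
south_east = EdgeDevice("South-East", "IP C")  # address is an assumption

# One update carries all three newly learned MACs.
update = [west.learn_local(100, mac, "Eth 1") for mac in ("MAC A", "MAC B", "MAC C")]
replicate(update, [east, south_east])
```

After the call, both remote devices hold three entries for VLAN 100 whose interface is "IP A", which mirrors the MAC tables shown on the slide.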
[Figure: new MACs (MAC A, MAC B, MAC C) are learned on VLAN 100 at site West (IP A); the OTV update is replicated by the core; the Edge Devices at sites East and South-East hold the table below.]

VLAN  MAC    IF
100   MAC A  IP A
100   MAC B  IP A
100   MAC C  IP A

OTV Data Plane: Inter-Site Packet Flow

1. Layer 2 lookup on the destination MAC. MAC 3 is reachable through IP B.
2. The Edge Device encapsulates the frame.
3. The transport delivers the packet to the Edge Device on site East.
4. The Edge Device on site East receives and decapsulates the packet.
5. Layer 2 lookup on the original frame. MAC 3 is a local MAC.
6. The frame is delivered to the destination.

MAC table, West Site (Edge Device IP A)
VLAN  MAC    IF
100   MAC 1  Eth 2
100   MAC 2  Eth 1
100   MAC 3  IP B
100   MAC 4  IP B

MAC table, East Site (Edge Device IP B)
VLAN  MAC    IF
100   MAC 1  IP A
100   MAC 2  IP A
100   MAC 3  Eth 3
100   MAC 4  Eth 4

[Figure: a frame from MAC 1 to MAC 3 is encapsulated at the West Edge Device in a packet from IP A to IP B, crosses the transport infrastructure, and is decapsulated at the East Edge Device.]

OTV Data Plane: Multicast Data
Multicast State Creation

1. The multicast receivers for the multicast group "Gs" on the East site send IGMP reports to join the multicast group.
2. The Edge Device (ED) snoops these IGMP reports, but it doesn't forward them.
3. Upon snooping the IGMP reports, the ED does two things:
   3.1 Announces the receivers in a Group-Membership Update (GM-Update) to all EDs.
   3.2 Sends an IGMPv3 report to join the (IP A, Gd) group in the core.
4. On reception of the GM-Update, the source ED adds the overlay interface to the appropriate multicast Outbound Interface List (OIL).
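The four state-creation steps can be traced with a toy Python model. It is a sketch under assumptions rather than a description of the real control plane: the class, its attributes, and the way the receiving ED is handed the source ED address and core group ("IP A", "Gd") as arguments are all hypothetical simplifications. "IP C" for a third site is also an assumption.

```python
class EdgeDevice:
    """Toy model of the multicast state an OTV Edge Device keeps (hypothetical)."""

    def __init__(self, name, ip, local_sources=()):
        self.name = name
        self.ip = ip
        self.local_sources = set(local_sources)  # site groups with a source in this site
        self.oil = {}                # site group -> set of outgoing interfaces
        self.core_joins = set()      # (source ED IP, core group) joined as a host
        self.peers = []              # other Edge Devices on the overlay

    def snoop_igmp_report(self, site_group, source_ed_ip, core_group):
        """Steps 2 and 3: snoop the report without forwarding it, then announce and join."""
        # Step 3.1: GM-Update to all other Edge Devices.
        for peer in self.peers:
            peer.receive_gm_update(site_group)
        # Step 3.2: IGMPv3 join for (source ED, core group), sent as a host.
        self.core_joins.add((source_ed_ip, core_group))

    def receive_gm_update(self, site_group):
        """Step 4: only the ED that has the source adds the overlay to its OIL."""
        if site_group in self.local_sources:
            self.oil.setdefault(site_group, set()).add("Overlay")


west = EdgeDevice("West", "IP A", local_sources={"Gs"})
east = EdgeDevice("East", "IP B")
south = EdgeDevice("South", "IP C")
east.peers = [west, south]

# Step 1 happens in the site: a receiver behind East sends an IGMP report for Gs.
east.snoop_igmp_report("Gs", source_ed_ip="IP A", core_group="Gd")
```

In this run the West device ends up with the overlay interface in its OIL for Gs, East has joined (IP A, Gd), and South, which has neither source nor receiver, keeps no state.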
[Figure, read from right to left: (1) a client on site East sends an IGMP report to join Gs; (2) the East ED (IP B) snoops the client IGMP report; (3.1) the East ED sends a GM-Update; (3.2) the East ED sends an IGMPv3 report to join (IP A, Gd), the SSM group in the core; (4) the West ED (IP A), where the source sits, receives the GM-Update and updates its OIL. OIL entry: group Gs maps to Gd, interface Overlay. The multicast-enabled transport builds an SSM tree for Gd.]

It is important to clarify that the edge devices join the core multicast groups as hosts, not as routers!

OTV Data Plane: Multicast Data
Multicast Packet Flow

OIF-List
Group     IF
Gs -> Gd  Overlay

[Figure: (1) lookup at the West ED (IP A) for a packet from source IPs to group Gs; (2) the West ED encapsulates it in a packet from IP A to Gd; (3) the multicast-enabled transport replicates the packet; (4) copies reach the East ED (IP B) and the South ED (IP C); (5) each ED decapsulates and delivers the original packet to its receiver.]

OTV Control Plane
Neighbor Discovery (Unicast-Only Transport)

1. One of the OTV Edge Devices (ED) is configured as an Adjacency Server (AS)*.
2. All EDs are configured to register to the AS: they send their site-id and IP address.
3. The AS builds a list of neighbor IP addresses: the overlay Neighbor List (oNL).
4. The AS unicasts the oNL to every neighbor.
5. Each node unicasts hellos and updates to every neighbor in the oNL.

* A redundant pair may be configured

oNL
Site 1, IP A
Site 2, IP B
Site 3, IP C
Site 4, IP D
Site 5, IP E

[Figure: five sites (IP A to IP E) attached to a unicast-only transport, with one Edge Device in Adjacency Server mode.]

OTV Encapsulation Consideration

• OTV adds a 42 Byte IP encapsulation
• The OTV shim header contains VLAN ID, Overlay number and CoS
• The OTV Edge Devices do NOT perform packet fragmenting and reassembling.
• A packet failing the MTU is dropped by the Forwarding Engine
• Make sure that [xB + 42B] < DCI MTU, where x = size of original packet

[Figure: encapsulated frame layout. Outer DMAC (6B), outer SMAC (6B), Ether Type (2B), IP Header (20B), OTV Shim (8B), then the original frame (DMAC, SMAC, 802.1Q, Ether Type, Payload) and CRC (4B). Annotations mark the VLAN and CoS of the original 802.1Q tag and the ToS of the IP header. The outer headers add up to the 42 Byte encapsulation.]

OTV Automated Multi-homing
Per-VLAN Load Balancing

• The detection of the multi-homing is fully automated and does not require additional protocols or configuration
• The Edge Devices within a site discover each other over the "otv site vlan"
• In each site OTV elects one of the Edge Devices to be the Authoritative Edge Device (AED) for a subset of the extended VLANs
• In a dual-homed site the VLANs are split into odd and even VLANs
• The AED:
  • forwards traffic to and from the overlay
  • advertises MAC addresses for any given site/VLAN

MAC table
VLAN  MAC    IF
100   MAC 1  IP A
101   MAC 2  IP B

[Figure: two dual-homed sites connected over the transport; in the site shown with addresses, the Edge Devices IP A and IP B are each AED for part of the VLANs.]

OTV Layer 2 Fault Isolation

STP isolation: no configuration required
• No BPDUs forwarded across the overlay
• STP remains local to each site
• Edge device internal interfaces behave as any other switchport

Unknown unicast isolation: no configuration required
• No unknown unicast frames flooded onto the overlay
• Assumption is that end stations are not silent
• Option for selective unknown unicast flooding (for certain applications)

Proxy ARP cache for remote-site hosts: on by default
• On an ARP request for a remote host, the request is forwarded through OTV and the initial ARP reply is generated by that host
• The OTV edge device snoops ARP replies and caches the data
• Subsequent ARP replies are proxied by the local OTV edge device using the ARP cache
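The proxy ARP behaviour in the last three bullets can be sketched in a few lines of Python. This is an illustrative model only: the class and method names are hypothetical, the addresses are made up for the example, and cache aging is not modelled.

```python
class ArpProxyCache:
    """Toy ARP cache for remote-site hosts on an OTV edge device (hypothetical)."""

    def __init__(self):
        self.cache = {}  # IP address -> MAC address, learned from snooped ARP replies

    def snoop_reply(self, ip, mac):
        """Record the binding seen in an ARP reply that crossed the overlay."""
        self.cache[ip] = mac

    def handle_request(self, target_ip):
        """Answer locally on a cache hit; otherwise forward the request over OTV."""
        mac = self.cache.get(target_ip)
        if mac is not None:
            return ("proxy", mac)
        return ("forward", None)


edge = ArpProxyCache()

# First request for a remote host: nothing cached, so it goes across the overlay.
first = edge.handle_request("10.0.0.3")

# The remote host's own reply comes back and is snooped on the way.
edge.snoop_reply("10.0.0.3", "0000.1111.0003")

# Later requests for the same host are answered by the local edge device.
second = edge.handle_request("10.0.0.3")
```

Only the first request in this run crosses the overlay, which is the point of the cache: repeated ARP traffic for remote hosts stays inside the site.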
MAC Mobility

Legend in the figure: Local MAC = Blue, Remote MAC = Red

1. The server moves: MAC X leaves site West and appears in site East.
2. The AED in site East detects that MAC X is now local.
3. The AED advertises MAC X with a metric of zero.
4. The EDs in site West see the MAC X advertisement with a better metric from site East and change their entry for MAC X to a remote MAC address.

[Figure: three stages of sites West and East, each with OTV Edge Devices and an AED, showing MAC X changing from local to remote in West and from remote to local in East.]

OTV VDC
OTV VDC Models

Two different deployment models are considered for the OTV VDC:
• OTV Appliance on a Stick: common uplinks to the transport for Layer 3 and DCI
• Inline OTV Appliance: dedicated uplink for DCI, separate uplinks to the Layer 3 transport

• No difference in OTV functionality between the two models
• The Inline OTV Appliance requires availability of Core downstream links

[Figure: both models side by side; labels: Join Interface, Internal Interface, OTV VDC, SVIs, L3/L2 boundary.]

OTV Edge Device at the Aggregation
OTV at the Aggregation with L2-L3 Boundary

• DC Core performs only the Layer 3 role
• ARP, STP and unknown unicast domains isolated between PODs
• Inter- or intra-DC LAN extension provided by OTV
• Ideal for single aggregation block topology
• Recommended for Greenfield

[Figure: Core, Aggregation and Access layers; each aggregation pair carries SVIs, an OTV VDC and vPC toward the access. Labels: Join Interface, Internal Interface, Virtual Overlay Interface.]
OTV Edge Device at the Core
OTV at the DC Core with L2-L3 boundary at the Aggregation

Option 1: Dedicated devices to perform OTV
• Physical devices or VDCs carved out from the Nexus 7000 deployed in the core
• Easy deployment for Brownfield
• Dedicated uplinks for DCI and dedicated uplinks for Layer 3
• Separated infrastructure to provide Layer 2 extension and Layer 3 connectivity services
• VLANs extended from the Agg Layer
• Recommended to use separate physical links for L2 and L3 traffic
• Loop-free hub-and-spoke Layer 2 topology

[Figure: OTV devices in the core above VSS/vPC aggregation pairs and vPC access; L3/L2 boundary at the aggregation.]

OTV Edge Device at the Core
OTV at the DC Core with L2-L3 boundary at the Aggregation

Option 2: Common devices for DCI and Layer 3
• DC Core devices perform Layer 3 and OTV functionalities
• Easy deployment for Brownfield
• Common uplinks for DCI and Layer 3
• HSRP localization at each POD
• VLANs extended from the Agg Layer
• Recommended to use separate physical links for L2 and L3 traffic; the L2 links carry only the OTV extended VLANs
• Loop-free hub-and-spoke Layer 2 topology
• STP and L2 broadcast domains not isolated between PODs

[Figure: OTV on the core devices above VSS/vPC aggregation pairs and vPC access; L3/L2 boundary at the aggregation.]

Deploy OTV at the Core
OTV at the DC Core with L2-L3 boundary at the Core

• Easy deployment for Brownfield
• L2-L3 boundary in the DC core
• DC Core devices perform L2, L3 and OTV functionalities
• Requires a dedicated OTV VDC in the core Nexus
• OTV deployed in the DC core to provide LAN extension services to remote sites
• Intra-DC LAN extension provided by bridging through the Core
• VSS/vPC recommended to create a loop-free STP topology
• Storm-control between PODs
OTV VDC
Two possible approaches

Approach 1: separate DCI Edge Layer (N7K1-VDCB, N7K2-VDCB) above the Aggregation Layer (N7K1-VDCA, N7K2-VDCA)
• Warning: only the AED forwards traffic to and from the OTV overlay
• DCI traffic hashed to the non-AED OTV Edge device has to traverse the vPC peer link between the two DCI Edge switches

Approach 2: single vPC layer at the Aggregation
• Provides a good level of resiliency with the minimum number of ports
• DCI traffic is always forwarded directly to the OTV AED device (mac-address-table)

Path Optimization
Egress Routing Localization: OTV Solution

• The approach is to use the same HSRP group in all sites and therefore provide the same default gateway MAC address.
• Each site pretends that it is the sole existing one, and provides optimal egress routing of traffic locally.
• OTV achieves Edge Routing Localization by filtering the HSRP hello messages between the sites, therefore limiting the "view" of what other routers are present within the VLAN.
• ARP requests are intercepted at the OTV edge to ensure the replies are from the local active GWY.

[Figure: Site 1 and Site 2 each have their own active GWY at the L3/L2 boundary; FHRP hellos and ARP traffic are kept local to each site.]
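What the localization has to filter can be expressed as two small predicates. The sketch below is an illustration, not device behaviour: the function names and the packet representation are hypothetical. The constants are the standard HSRP values (UDP port 1985, groups 224.0.0.2 and 224.0.0.102, virtual MAC prefixes 0000.0c07.acXX and 0000.0c9f.fXXX), and the mask semantics (a set bit means "must match") are an assumption of this sketch.

```python
import ipaddress

HSRP_UDP_PORT = 1985
HSRP_GROUPS = {
    ipaddress.ip_address("224.0.0.2"),    # HSRPv1 hellos
    ipaddress.ip_address("224.0.0.102"),  # HSRPv2 hellos
}


def is_hsrp_hello(protocol, dst_ip, dst_port):
    """True for a packet that should not leave the site (an HSRP hello)."""
    return (
        protocol == "udp"
        and dst_port == HSRP_UDP_PORT
        and ipaddress.ip_address(dst_ip) in HSRP_GROUPS
    )


def _mac_to_int(mac):
    """Convert Cisco dotted notation such as 0000.0c07.ac00 to an integer."""
    return int(mac.replace(".", ""), 16)


# (base, mask) pairs; a set mask bit means that bit must equal the base.
HSRP_VMAC_RANGES = [
    (_mac_to_int("0000.0c07.ac00"), _mac_to_int("ffff.ffff.ff00")),  # HSRPv1
    (_mac_to_int("0000.0c9f.f000"), _mac_to_int("ffff.ffff.f000")),  # HSRPv2
]


def is_hsrp_vmac(mac):
    """True for an HSRP virtual MAC, which should not be advertised over OTV."""
    value = _mac_to_int(mac)
    return any(value & mask == base for base, mask in HSRP_VMAC_RANGES)
```

Both checks are needed: dropping the hellos keeps each site's routers unaware of each other, while suppressing the virtual MAC keeps remote sites from learning the gateway MAC as reachable across the overlay.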
Filtering Configuration for HSRP Localization
To be applied in the OTV VDC

Step 1: Filter HSRP packets in the OTV VDC (VACL option or Port ACL option)

VACL option:

    ip access-list hsrp
      10 permit udp any 224.0.0.2/32 eq 1985
      ! HSRPv2
      20 permit udp any 224.0.0.102/32 eq 1985
    ip access-list all-ips
      10 permit ip any any
    vlan access-map hsrp-localize 10
      match ip address hsrp
      action drop
    vlan access-map hsrp-localize 20
      match ip address all-ips
      action forward
    vlan filter hsrp-localize vlan-list <OTV-VLANs>

Port ACL option:

    ip access-list otv-hsrp-filter
      10 deny udp any 224.0.0.2/32 eq 1985
      ! HSRPv2
      20 deny udp any 224.0.0.102/32 eq 1985
      30 permit ip any any
    interface x/y
      description [OTV internal interface]
      ip port access-group otv-hsrp-filter in

Step 2: Filter VIP MAC advertisements in OTV

    mac-list hsrp-vmac seq 10 deny 0000.0c07.ac00 ffff.ffff.ff00
    ! HSRPv2
    mac-list hsrp-vmac seq 20 deny 0000.0c9f.f000 ffff.ffff.f000
    mac-list hsrp-vmac seq 30 permit 0000.0000.0000 0000.0000.0000
    route-map hsrp-filter permit 10
      match mac-list hsrp-vmac
    otv-isis default
      vpn overlay<#>
        redistribute filter route-map hsrp-filter

Distributed Workload Mobility
Outbound Traffic with Services

• State is created on the service devices before vMotion
• After vMotion, DCI traffic incurs DCI latency
• Source NAT (SNAT) for symmetric flow
• FHRP localization is not possible, because request and reply need to pass through the same service device pair

[Figure: N7K1-VDCA to N7K4-VDCA with a firewall and load balancer (LB) pair, shown before and after a long-distance (LD) vMotion.]

Distributed Workload Mobility
Inbound Traffic using RHI

• Route Health Injection makes use of the ACE Load Balancer to inject a /32 host route once the Virtual Machine moves

[Figure: N7K1-VDCA to N7K4-VDCA with a load balancer in each data center and DCI between them, shown before and after the vMotion; the load balancer at the new location injects the RHI /32 route.]
Path Optimization
Ingress Routing Optimization with LISP

Terms: End-point host ID (EID), Route Locator (RLOC), Ingress Tunnel Router (ITR), Egress Tunnel Router (ETR)

Mapping directory
Prefix      Route Locator (RLOC)
10.10.10.1  A, B
10.10.10.2  A, B
…           …
10.10.10.5  C, D
10.10.10.6  C, D

1) The ITR consults the directory to get the Route Locator (RLOC) for the destination End-point ID (EID)
2) The ITR IPinIP-encapsulates the traffic to send it to the RLOC address
3) The ETRs receive and decapsulate the traffic

• The core holds RLOC routes only
• Granular reachability information for hosts in the extended subnet
• If a host moves, its mapping is updated
• No end-host state in routing tables

[Figure: a packet with IP_DA = 10.10.10.1 is encapsulated at the ITR with IP_DA = A, crosses the core, and is decapsulated at the ETR. RLOCs A and B belong to Pod A, RLOCs C and D to Pod N. EIDs 10.10.10.1 to .8 sit in the extended subnet 10.10.10.0/24; labels: OTV, Encap, Decap.]

OTV Applications in Enterprise Networks

• Departments are in scattered locations and VLANs must be assigned per department
• Mobile working across the campus
• Network migration
• A group's backbone network provides Layer 2 channels to its subsidiary units
• And so on

Challenges with LAN Extensions
Real Problems Solved by OTV

• Extensions over any transport (IP, MPLS)
• Failure boundary preservation
• Site independence / isolation
• Optimal BW utilization (no head-end replication)
• Resiliency / multihoming
• Built-in end-to-end loop prevention
• Multisite connectivity (inter and intra DC)
• Scalability: VLANs, sites, MACs; ARP, broadcasts/floods
• Operations simplicity: only 5 CLI commands

[Figure: North Data Center and South Data Center joined by a LAN extension, each with its own fault domains.]

Current Limitations of OTV

• IETF draft, not yet a formal standard
• Convergence time (3s-30s)
• The number of sites currently supported is small, so it is not suited to aggregation-layer deployment
• SVI limitation
• Per-VLAN AED traffic load balancing is currently an issue
• The backbone currently must support multicast