VMware® Network Virtualization Design Guide
Technical White Paper
January 2013

Table of Contents
Intended Audience
Overview
Components of the VMware Network Virtualization Solution
  vSphere Distributed Switch
  Logical Network (VXLAN)
  vCloud Networking and Security Edge
  vCloud Networking and Security Manager
  vCloud Director
VXLAN Technology Overview
  Standardization Effort
  Encapsulation
VXLAN Packet Flow
  Intra-VXLAN Packet Flow
  Inter-VXLAN Packet Flow
Network Virtualization Design Considerations
  Physical Network
    Network Topologies with L2 Configuration in the Access Layer
    Network Topologies with L3 Configuration in the Access Layer
  Logical Network
    Scenario 1 – Greenfield Deployment: Logical Network with a Single Physical L2 Domain
    Scenario 2 – Logical Network: Multiple Physical L2 Domains
    Scenario 3 – Logical Network: Multiple Physical L2 Domains with vMotion
    Scenario 4 – Logical Network: Stretched Clusters Across Two Datacenters
  Managing IP Addresses in Logical Networks
  Scaling Network Virtualization
Consumption Models
  In vCloud Director
  In vCloud Networking and Security Manager
  Using API
Troubleshooting and Monitoring
  Network Health Check
  VXLAN Connectivity Check – Unicast and Broadcast Tests
  Monitoring Logical Flows – IPFIX
  Port Mirroring
Conclusion

Intended Audience
This document is targeted toward virtualization and network architects interested in deploying VMware® network virtualization solutions.

Overview
The IT industry has gained significant efficiency and flexibility as a direct result of virtualization. Organizations are moving toward a virtual datacenter (VDC) model, and flexibility, speed, scale and automation are central to their success. Although compute and memory resources are pooled and automated, networks and network services, such as security, have not kept pace. Traditional network and security operations not only reduce efficiency but also limit the ability of businesses to rapidly deploy, scale and protect applications. VMware vCloud® Networking and Security™ offers a network virtualization solution to overcome these challenges.

Figure 1. Server and Network Virtualization Analogy (a server hypervisor decouples virtual machines from x86 physical compute and memory; a network virtualization platform decouples virtual networks and L2, L3 and L4-7 network services from the physical network, requiring only IP transport)

Figure 1 draws an analogy between compute and network virtualization.
Just as VMware vSphere® abstracts compute capacity from the server hardware to create virtual pools of resources, network virtualization abstracts the network into a generalized pool of network capacity. The unified pool of network capacity can then be optimally segmented into logical networks directly attached to specific applications. Customers can create logical networks that span physical boundaries, optimizing compute resource utilization across clusters and pods. Unlike legacy architectures, logical networks can be scaled without reconfiguring the underlying physical hardware. Customers can also integrate network services, such as firewalls, VPNs and load balancers, and deliver them exactly where they are needed. “Single pane of glass” management for all these services further reduces the cost and complexity of datacenter operations.

The VMware network virtualization solution addresses the following key needs in today’s datacenter:
• Increasing compute utilization by pooling compute clusters
• Enabling noncontiguous cluster expansion
• Leveraging capacity across multiple racks in the datacenter
• Overcoming IP-addressing challenges when moving workloads
• Avoiding VLAN sprawl in large environments
• Enabling multitenancy at scale without encountering VLAN scale limitations

By adopting network virtualization, customers can effectively address these issues as well as realize the following business benefits:
• Drive faster provisioning of network and services, enabling business agility
• Improve infrastructure utilization, leading to significant CapEx savings
  – Increase compute utilization by 30 percent by efficiently pooling compute resources
  – Increase network utilization by 40 percent due to compute pooling and improved traffic management
• Decouple logical networks from physical networks, providing complete flexibility
• Isolate and segment network traffic at scale
• Provide multitenancy without increasing the administrative burden
• Automate repeatable network and service provisioning workflows, translating to 30 percent or more in OpEx savings on network operations alone

Components of the VMware Network Virtualization Solution
There are several components bundled in the vCloud Networking and Security suite, plus several components of the core vSphere layer, used to deploy VMware network virtualization:
1. VMware vSphere Distributed Switch™ 5.1 (VDS)
2. VMware vSphere logical network (VXLAN)
3. VMware vCloud Networking and Security Edge™ 5.1
4. VMware vCloud Networking and Security Manager™ 5.1
5. VMware vCloud Director® 5.1 (not part of the vCloud Networking and Security suite)
6. VMware vCenter Server™ 5.1 (not part of the vCloud Networking and Security suite; shown as part of item 4 in Figure 2)

Figure 2. VMware VXLAN Solution Components (the numbered callouts in the figure correspond to the components listed above)

vSphere Distributed Switch
VDS abstracts the physical network and provides access-level switching in the vSphere hypervisor. It is central to network virtualization because it enables logical networks that are independent of physical constructs such as VLAN. Keep in mind the following key points:
• VDS facilitates massive scale, with support for up to 500 physical hosts.
• Multiple features such as Port Mirroring, NetFlow/IPFIX, Configuration Backup and Restore, Network Health Check, QoS, LACP, and so on, provide a comprehensive toolkit for traffic management, monitoring and troubleshooting within a virtual network.

For specific feature details, refer to the What’s New in VMware vSphere 5.1 – Networking white paper at http://www.vmware.com/files/pdf/techpaper/Whats-New-VMware-vSphere-51-Network-Technical-Whitepaper.pdf.

Logical Network (VXLAN)
VMware network virtualization is built using Virtual eXtensible Local Area Network (VXLAN) overlay networking technology, an industry standard that VMware developed jointly with major networking vendors. The logical network enables the following capabilities:
• Creation of a flexible logical layer 2 (L2) overlay network over existing IP networks, using the existing physical network infrastructure without the need to rearchitect any of the datacenter networks
• Communication (east–west and north–south) while maintaining isolation between tenants
• Application workloads that are agnostic of the overlay network, with all L2-to-VXLAN translations performed transparently in the host

See the following sections for more details on VXLAN technology, architecture components and packet flows.

vCloud Networking and Security Edge
vCloud Networking and Security Edge serves as a VXLAN gateway, translating traffic between the logical network and a physical VLAN- or IP-based network. In addition, it provides services to the logical network such as DHCP, NAT, routing (static routing), firewall, VPN and load balancing. It is deployed in a virtual appliance form factor, supports full active–standby HA functionality and can support up to 9Gbps of traffic. The following are key points to consider for the vCloud Networking and Security Edge VXLAN gateway and the network services it offers in network virtualization:
• It acts as an L3 gateway to translate between VXLAN and physical networks and is primarily used for north–south traffic.
• It provides inter-VXLAN routing.
• Each VXLAN segment requires a separate vCloud Networking and Security Edge interface to ensure isolation.
• It is available in three sizes: compact, full and x-large; it offers options to scale up for higher performance or scale out using multiple virtual appliances.
• vCloud Networking and Security Edge firewall services can be applied on a per–VXLAN segment basis.
• In multitenant deployments, individual pools of IP addresses per tenant can be provided using vCloud Networking and Security Edge DHCP services.

vCloud Networking and Security Manager
vCloud Networking and Security Manager is the centralized network and security management component of the vCloud Networking and Security product suite. It is installed from an open virtualization appliance (OVA) file as a virtual machine by using VMware vSphere Client™. Keep in mind the following important points about vCloud Networking and Security Manager:
• Using the vCloud Networking and Security Manager user interface, administrators can install, configure and maintain network and network services components.
• vCloud Networking and Security Manager exposes APIs that can be used to integrate with existing cloud management systems or for scripts. These are also called northbound APIs.
• vCloud Director requires vCloud Networking and Security Manager to offer simple workflows for consumption of virtual networks and services.
• The VMware vCenter Server™ plug-in for vCloud Networking and Security Manager enables customers to perform VXLAN configuration from vCenter Server as part of the Network Virtualization tab.

vCloud Director
The vCloud Director virtual datacenter container is a highly automatable abstraction of the pooled virtual infrastructure. Network virtualization is fully integrated in vCloud Director workflows, enabling rapid self-service provisioning within the context of the application workload. vCloud Director uses vCloud Networking and Security Manager in the backend to provision network virtualization elements. vCloud Director is not part of vCloud Networking and Security; it is a separately purchased component.
It is not mandatory for deploying a network virtualization solution, but it is highly recommended to achieve the complete operational flexibility and agility discussed previously. See the “Consumption Models” section for all available consumption choices for VMware network virtualization.

VXLAN Technology Overview

Standardization Effort
VXLAN is an Internet Engineering Task Force (IETF) Internet draft formulated in collaboration with leading networking vendors including Cisco, Arista and Broadcom. It provides a framework for creating L2 overlay networks over L3 networks. Each L2 overlay network is called a VXLAN segment (or “virtual wire”) and is uniquely identified by a 24-bit segment ID. This enables customers to create up to 16 million unique VXLAN segments, each of which is an isolated logical network.

Encapsulation
VXLAN makes use of an encapsulation or tunneling method to carry the L2 overlay network traffic on top of L3 networks. A special kernel module running on the vSphere hypervisor host, along with a vmknic, acts as the virtual tunnel endpoint (VTEP). Each VTEP is assigned a unique IP address that is configured on the vmknic virtual adapter associated with the VTEP. The VTEP on the vSphere host handles all encapsulation and deencapsulation of traffic for all virtual machines running on that host. A VTEP encapsulates the MAC and IP packets from the virtual machines with a VXLAN+UDP+IP header and sends the packet out as an IP unicast or multicast packet. The latter mode is used for broadcast and unknown destination MAC frames originated by the virtual machines that must be sent across the physical IP network.

Figure 3 shows the VXLAN frame format. The original packet between the virtual machines communicating on the same VXLAN segment is encapsulated with an outer Ethernet header, an outer IP header, an outer UDP header and a VXLAN header.
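To make the 24-bit segment ID concrete, the following is a minimal sketch that packs and parses an 8-byte VXLAN header. It assumes the field layout of the published VXLAN specification (8 flag bits, 24 reserved bits, the 24-bit segment ID, then 8 reserved bits); the function names are hypothetical and are not part of any VMware API.

```python
import struct

VNI_BITS = 24
MAX_SEGMENTS = 2 ** VNI_BITS      # 16,777,216, the "16 million" segments
VXLAN_FLAG_VNI_VALID = 0x08       # flag bit marking the segment ID as valid


def pack_vxlan_header(segment_id: int) -> bytes:
    """Build the 8-byte VXLAN header for a segment ID (illustrative only)."""
    if not 0 <= segment_id < MAX_SEGMENTS:
        raise ValueError("segment ID must fit in 24 bits")
    # Word 0: flags in the top 8 bits, 24 reserved bits set to zero.
    # Word 1: segment ID in the top 24 bits, 8 reserved bits set to zero.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, segment_id << 8)


def unpack_segment_id(header: bytes) -> int:
    """Return the segment ID, rejecting headers without the valid flag."""
    if len(header) != 8:
        raise ValueError("a VXLAN header is exactly 8 bytes")
    word0, word1 = struct.unpack("!II", header)
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("segment ID flag is not set")
    return word1 >> 8


if __name__ == "__main__":
    header = pack_vxlan_header(5001)
    print(header.hex(), unpack_segment_id(header))
```

The range check is where the 16 million limit comes from: 2^24 = 16,777,216 distinct segment IDs.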
The source VTEP performs the encapsulation and sends the packet to the destination VTEP. At the destination VTEP, the packet is stripped of its outer headers and is passed on to the destination virtual machine if the segment ID in the packet is valid.

Figure 3. VXLAN Frame Format (field order: Outer MAC DA, Outer MAC SA, Outer 802.1Q, Outer IP DA, Outer IP SA, Outer UDP, VXLAN Header (8 bytes), followed by the original Ethernet frame: Inner MAC DA, Inner MAC SA, Optional Inner 802.1Q, Original Ethernet Payload; then CRC)

The destination MAC address in the outer Ethernet header can be the MAC address of the destination VTEP or that of an intermediate L3 router. The outer IP header carries the source and destination VTEP IP addresses. The association of the virtual machine’s MAC to the VTEP’s IP is discovered via source learning. More details on the forwarding table are provided in the “VXLAN Packet Flow” section. The outer UDP header contains source port, destination port and checksum information. The source port of the UDP header is a hash of the inner Ethernet frame’s header. This provides a level of entropy for ECMP/load balancing of the virtual machine–to–virtual machine traffic across the VXLAN overlay. The VXLAN header is an 8-byte field that has 8 bits of flags to indicate whether the VXLAN Network Identifier (VNI) is valid, 24 bits for the VXLAN Segment ID/VXLAN VNI and the remaining 32 bits reserved.

VXLAN Packet Flow
The following flow pattern describes the handling of ARP on a VXLAN segment (for the purposes of discussion, it is a typical ARP packet from a virtual machine (MAC1) connected to a logical L2 network VXLAN 5001):
• Figure 4 shows two virtual machines connected to a logical L2 network. The virtual machines don’t detect any difference in communicating to the external world. They continue to use standard IP protocol to communicate with the destination.
  The traffic flows through the VTEP interface defined on the host.
• Each logical L2 network is associated with an IP multicast group. In this example, VXLAN 5001 is associated with IP multicast group address (239.1.1.1), and both vSphere hosts (VTEPs) have joined that multicast group.
• The ARP broadcast frame from the virtual machine is encapsulated within an IP multicast frame by the VTEP on which the virtual machine is running.
• The multicast frame is then sent to the multicast group associated with a logical L2 network segment ID.
• The multicast frame is received by the target VTEPs. The destination VTEPs then validate the logical L2 network segment ID, deencapsulate the packet, and forward it if there are virtual machines on that host that are connected to this L2 network.
• The destination virtual machine then responds to the ARP request with a unicast packet. The VTEP on the host on which this destination virtual machine is running establishes a point-to-point tunnel with the VTEP where the virtual machine MAC1 is hosted.

NOTE:
• The number of multicast groups supported in the physical infrastructure dictates whether there can be a one-to-one mapping to logical L2 network segment IDs. However, in the scenario where there are more logical networks than multicast groups, mapping of multiple logical networks to one multicast group is supported.
• Multicast frames are generated only when a broadcast packet is detected on the logical L2 network or if VTEP’s forwarding table does not have the mapping of a virtual machine MAC-to-VTEP IP for that MAC address, also called an unknown unicast packet. This is similar to the transparent bridging operation of L2 switches or bridges where the packets are broadcast if there is no entry in the MAC forwarding table that matches the destination MAC address of a frame.
  After the virtual machine MAC address–to–VTEP IP address entry has been discovered and added to the forwarding table, any future requests for communication to that particular virtual machine are handled by the source host VTEP, which establishes a point-to-point (stateless) tunnel to the destination VTEP where the virtual machine is hosted.
• The IP multicast protocol acts as a control plane that helps build the forwarding table with virtual machine MAC address and VTEP IP address mapping.

Figure 4 shows the packet encapsulation and a forwarding table entry in one of the VTEPs. VTEP MAC addresses are detected during the multicast packet exchange that occurs when a virtual machine is connected to a virtual wire. No standard ARP request is sent out from the VXLAN kernel module to detect the VTEP MAC address, so there is no proxy ARP configuration requirement on the first hop router.

Figure 4. VXLAN Encapsulation and Forwarding Table Example (VTEP IPs 10.20.10.10 and 10.20.10.11 on VXLAN 5001; forwarding table entry: VM MAC = MAC1, VTEP IP = 10.20.10.10, Segment ID = 5001)

The next part of this section describes packet flow in the following VXLAN deployments:
1) Intra-VXLAN packet flow; that is, two virtual machines on the same logical L2 network
2) Inter-VXLAN packet flow; that is, two virtual machines on two different logical L2 networks

Intra-VXLAN Packet Flow
Figure 5 shows two traffic flows:
• A virtual machine is communicating with another virtual machine on the same logical L2 network (red dotted line).
• A virtual machine is communicating with an external device on the Internet (green dotted line).
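For traffic that stays on one logical L2 network, the source-learning and forwarding behavior described in the “VXLAN Packet Flow” section can be sketched as follows. This is an illustrative model only, not the VXLAN kernel module’s implementation: the `Vtep` class, its method names and the string MAC labels are hypothetical, while the addresses are taken from the Figure 4 example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


@dataclass
class Vtep:
    """Toy model of one VTEP's forwarding state (hypothetical names)."""
    ip: str
    # Segment ID -> MAC addresses of virtual machines running on this host.
    local_macs: Dict[int, Set[str]] = field(default_factory=dict)
    # (segment ID, VM MAC) -> IP address of the VTEP hosting that VM.
    forwarding_table: Dict[Tuple[int, str], str] = field(default_factory=dict)

    def learn(self, segment_id: int, inner_src_mac: str, outer_src_ip: str) -> None:
        """Source learning: record which VTEP an inner source MAC sits behind."""
        self.forwarding_table[(segment_id, inner_src_mac)] = outer_src_ip

    def next_hop(self, segment_id: int, dst_mac: str,
                 multicast_group: str) -> Tuple[str, Optional[str]]:
        """Decide how a frame from a local virtual machine is delivered."""
        if dst_mac in self.local_macs.get(segment_id, set()):
            return ("local", None)          # same host: no encapsulation
        remote_vtep = self.forwarding_table.get((segment_id, dst_mac))
        if dst_mac == BROADCAST_MAC or remote_vtep is None:
            # Broadcast or unknown unicast: use the segment's multicast group.
            return ("multicast", multicast_group)
        return ("unicast", remote_vtep)     # known MAC: point-to-point tunnel


if __name__ == "__main__":
    vtep = Vtep(ip="10.20.10.11", local_macs={5001: {"MAC2"}})
    print(vtep.next_hop(5001, "MAC1", "239.1.1.1"))   # before learning
    vtep.learn(5001, "MAC1", "10.20.10.10")
    print(vtep.next_hop(5001, "MAC1", "239.1.1.1"))   # after learning
```

Before the entry for MAC1 is learned, the frame goes to the multicast group; afterward it is sent as unicast to the VTEP that hosts MAC1.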

Figure 5. VXLAN Traffic Flow – Same Logical L2 and External Traffic (virtual machines 192.168.1.10 and 192.168.1.11 on VXLAN Blue, 192.168.1.0/24; vCloud Networking and Security Edge gateway with internal address 192.168.1.1 and external address 172.26.10.10 on the external network 172.26.10.0/24)

In the case of virtual machine–to–virtual machine communication on the same logical L2 network, the following two traffic flow examples illustrate possibilities that depend on where the virtual machines are deployed:
1) Both virtual machines are on the same vSphere host.
2) The virtual machines are on two different vSphere hosts.

In the first case, traffic remains on one vSphere host. In the second case, the virtual machine packet is encapsulated with new outer headers by the source VTEP on one vSphere host and is sent through the external IP network infrastructure to the destination VTEP on another vSphere host. In this process, the external switches and routers do not detect anything about the virtual machine’s IP (192.168.1.10/192.168.1.11) and MAC address because they are carried inside the payload of the new UDP packet.

In the scenario where the virtual machine is communicating with the external world, as shown by the green dotted line, it first sends the traffic to gateway IP address 192.168.1.1; the vCloud Networking and Security Edge gateway then sends unencapsulated traffic over its external-facing interface to the Internet.

Inter-VXLAN Packet Flow
In the example shown in Figure 6, there are two logical L2 networks, VXLAN Blue and VXLAN Orange. The virtual machines connected to these networks are isolated from each other. The two networks are assigned two different IP subnets, 192.168.1.0/24 and 192.168.2.0/24. The vCloud Networking and Security Edge gateway acts as the router/gateway between these two isolated logical L2 networks.
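As a rough illustration of the gateway role, the sketch below picks the Edge interface on which a packet would leave, based only on the destination address. The interface names and the choice of the external interface as the default path are assumptions made for the example; the addresses come from Figure 6, and firewall rules are deliberately left out.

```python
import ipaddress

# Edge gateway interfaces, using the addresses shown in Figure 6.
# The dictionary keys are hypothetical labels, not product identifiers.
EDGE_INTERFACES = {
    "VXLAN Blue": ipaddress.ip_interface("192.168.1.1/24"),
    "VXLAN Orange": ipaddress.ip_interface("192.168.2.1/24"),
    "External": ipaddress.ip_interface("172.26.10.10/24"),
}


def egress_interface(dst_ip: str) -> str:
    """Return the interface whose subnet contains the destination address."""
    dst = ipaddress.ip_address(dst_ip)
    for name, iface in EDGE_INTERFACES.items():
        if dst in iface.network:
            return name
    # Assumption: anything not directly connected leaves via the external side.
    return "External"


if __name__ == "__main__":
    # A virtual machine on VXLAN Blue sends to one on VXLAN Orange.
    print(egress_interface("192.168.2.10"))
```

A packet from 192.168.1.10 to 192.168.2.10 arrives on the VXLAN Blue interface and is routed out of the VXLAN Orange interface, which is why each segment needs its own Edge interface.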
The traffic flow between the two virtual machines on different logical networks depends on where the virtual machines and vCloud Networking and Security Edge gateway appliance are deployed. The following are possible scenarios:
1) All the virtual machines and the vCloud Networking and Security Edge gateway are on the same vSphere host.
2) The virtual machines are on different vSphere hosts, and the vCloud Networking and Security Edge gateway appliance is deployed on one of the vSphere hosts.
3) All the virtual machines and the vCloud Networking and Security Edge gateway appliance are on different vSphere hosts.

The first case is simple to describe because the traffic remains on the same host. The virtual machines direct the traffic to the respective gateway IP address of the logical network subnets, 192.168.1.1 and 192.168.2.1. The vCloud Networking and Security Edge gateway receives the traffic on the different interfaces and, based on the firewall rule, makes the routing decision between the two different interfaces. The second and third cases of traffic flow involve the encapsulated packets that traverse the physical network infrastructure before they reach the vCloud Networking and Security Edge gateway, which then routes the packet to the appropriate destination.

Figure 6. VXLAN Traffic Flow – Different Logical L2 (virtual machines 192.168.1.10 and 192.168.1.11 on VXLAN Blue, 192.168.1.0/24; virtual machine 192.168.2.10 on VXLAN Orange, 192.168.2.0/24; vCloud Networking and Security Edge gateway with interfaces 192.168.1.1 and 192.168.2.1 and external address 172.26.10.10 on the external network 172.26.10.0/24)

Network Virtualization Design Considerations
VMware network virtualization can be deployed on top of existing datacenter networks. In this section, we discuss how the logical networks using VXLANs can be deployed over common datacenter network topologies.
We first discuss requirements for the physical network, followed by logical network deployment options.

Physical Network
The physical datacenter network topology varies across customer environments. Hierarchical network design provides the required high availability and scalability to the datacenter network. This section assumes that the reader has some background in various network topologies utilizing traditional L3 and L2 network configurations. Readers are encouraged to look at the design guides from the physical network vendor of choice. We will examine some common physical network topologies and how to enable network virtualization in them.

Network Topologies with L2 Configuration in the Access Layer
In this topology, access layer switches connect to the aggregation layer over an L2 network. Aggregation switches are the VLAN termination points, as shown in Figure 7. Spanning Tree Protocol (STP) is traditionally used to avoid loops. Routing protocols run between aggregation and core layers.

Figure 7. Datacenter Design – L2 Configuration in Access Layer with STP (a VDS spans Rack 1 through Rack 10 on a single subnet, VLAN 100; IGMP snooping is enabled in the access layer, which connects over L2 trunks with STP to the aggregation layer, where the IGMP querier is enabled; L3 links with routing connect to the core layer)

In such deployments with a single subnet (VLAN 100) configured on different racks, enabling network virtualization based on VXLAN requires the following:
• Enable IGMP snooping on the L2 switches.
• Enable the IGMP querier feature on one of the L2/L3 switches in the aggregation layer.
• Increase the end-to-end MTU by a minimum of 50 bytes to accommodate a VXLAN header. The recommended size is 1,550 or jumbo frames.
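The 50-byte figure can be reproduced with simple arithmetic. The sketch below assumes an outer IPv4 header without options (20 bytes) and the standard UDP, VXLAN and Ethernet header sizes; the function name and the 1,500-byte virtual machine MTU default are illustrative assumptions.

```python
# Bytes added in front of the virtual machine's IP packet once it is
# carried inside the outer IP packet on the transport network.
INNER_ETHERNET_HEADER = 14   # inner MAC DA + MAC SA + EtherType
INNER_DOT1Q_TAG = 4          # optional inner 802.1Q tag (see Figure 3)
VXLAN_HEADER = 8
OUTER_UDP_HEADER = 8
OUTER_IPV4_HEADER = 20       # assumption: IPv4 transport, no IP options


def required_transport_mtu(vm_mtu: int = 1500, inner_vlan_tag: bool = False) -> int:
    """Smallest transport MTU that carries a full-size VM packet unfragmented."""
    overhead = (INNER_ETHERNET_HEADER + VXLAN_HEADER
                + OUTER_UDP_HEADER + OUTER_IPV4_HEADER)
    if inner_vlan_tag:
        overhead += INNER_DOT1Q_TAG
    return vm_mtu + overhead


if __name__ == "__main__":
    print(required_transport_mtu())                     # untagged inner frame
    print(required_transport_mtu(inner_vlan_tag=True))  # tagged inner frame
```

With a 1,500-byte virtual machine MTU and no inner VLAN tag, the result is 1,550 bytes, matching the recommended size. An inner 802.1Q tag adds 4 more bytes, which is one reason the 50 bytes is described as a minimum.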

To overcome the slow convergence and low link utilization of STP, most datacenter networks today use technologies such as Cisco vPC/VSS (or MLAG, MCE, SMLT, and so on). From the VXLAN design perspective, there is no change to the previously stated requirements.

When the physical topology has an access layer with multiple subnets configured (for example, VLAN 100 in Rack 1 and VLAN 200 in Rack 10 in Figure 8), the aggregation layer must have Protocol-Independent Multicast (PIM) enabled to ensure that multicast routes across multiple subnets are exchanged. All the VXLAN requirements previously discussed apply to leaf and spine datacenter architectures as well.

Network Topologies with L3 Configuration in the Access Layer
In this topology, access layer switches connect to the aggregation layer over an L3 network. Access switches are the VLAN termination points, as shown in Figure 8. Key advantages of this design are better utilization of all the links using Equal-Cost Multipathing (ECMP) and elimination of STP. From the VXLAN deployment perspective, the following requirements must be met:
• Enable PIM on access switches.
• Ensure that during the VXLAN preparation process, no VLAN is configured. This ensures that a VDS doesn’t perform VLAN tagging, also called virtual switch tagging (VST) mode.
• Increase end-to-end MTU by a minimum of 50 bytes to accommodate a VXLAN header. The recommended size is 1,550 or jumbo frames.

Figure 8. Datacenter Design – L3 Configuration in Access Layer with ECMP (a VDS spans Rack 1 through Rack 10; PIM is enabled in the L3 access layer, which connects over routed L3 links with ECMP to the aggregation and core layers)

Logical Network
After the physical network has been prepared, logical networks are deployed with VXLAN, with no ongoing changes to the physical network. The logical network design differs based on the customer’s needs and the type of compute, network and storage components they have in the datacenter. The following aspects of the virtual infrastructure should be taken into account before deploying logical networks:
• A cluster is a collection of vSphere hosts and associated virtual machines with shared resources. One cluster can have a maximum of 32 vSphere hosts.
• A VDS is the datacenter-wide virtual switch that can span up to 500 hosts in the datacenter. Best practice is to use one VDS across all clusters to enable simplified design and cluster-wide VMware vSphere vMotion® migration.
• With VXLAN, a new traffic type is added to the vSphere host: VXLAN transport traffic. As a best practice, the new VXLAN traffic type should be isolated from other virtual infrastructure traffic types. This can be achieved by assigning a separate VLAN during the VXLAN preparation process.
• A VMware vSphere ESXi™ host’s infrastructure traffic, including vMotion migration, VMware vSphere Fault Tolerance, management, and so on, is not encapsulated and is independent of the VXLAN-based logical network. These traffic types should be isolated from each other, and enough bandwidth should be allocated to them. As of this release, VMware does not support placing infrastructure traffic such as vMotion migration on VXLAN-based virtual networks. Only virtual machine traffic is supported on logical networks.
• To support vMotion migrations of workloads between clusters, all clusters should have access to all storage resources.
•The link aggregation method configured on the vSphere hosts also impacts how VXLAN transport traffic traverses the host NICs. The VDS VXLAN port group’s teaming can be configured as failover, LACP active mode, LACP passive mode or static EtherChannel. a. When LACP or static EtherChannel is configured, the upstream physical switch must have an equivalent port channel or EtherChannel configured. b. Also, if LACP is used, the physical switch must have 5-tuple hash distribution enabled. c. Virtual port ID and load-based teaming are not supported with VXLAN. Next, the design in the following three scenarios is discussed. •Greenfield deployment – A datacenter built from scratch. •Brownfield deployment – An existing operational datacenter with virtualization. •Stretched cluster – Two datacenters separated by a short distance. Scenario 1 – Greenfield Deployment: Logical Network with a Single Physical L2 Domain In a greenfield deployment, the recommended design is to have a single VDS stretching across all the compute clusters within the same vCenter Server. All hosts in the VDS are placed on the same L2 subnet (single VLAN on all uplinks). In Figure 9, the VLAN 10 spanning the racks is switched—not routed—creating a single L2 subnet. This single subnet serves as the VXLAN transport subnet, and each host receives an IP address from this subnet, used in VXLAN encapsulation. Multicast and other requirements are met based on the physical network topology. Refer to the L2 configuration in the access layer shown in Figure 9 for details on multicast-related configuration. T ECHNICAL W HI T E P A P E R / 1 4 VMware Network Virtualization Design Guide VM VM VM VM VM VM VM Logical L2 Network VM VXLAN 5002 VXLAN 5001 VXLAN Fabric Rack 1 Cluster 1 VLAN 10 vSphere Distributed Switch vSphere vSphere vSphere vSphere Rack 10 Cluster 2 VLAN 10 Legend: VTEP vwire5001 portgroup vwire5002 portgroup Switch Figure 9. 
Greenfield Deployment – One VDS

Keep in mind the following key points while deploying:

• The VDS VXLAN port group must be in the same VLAN across all hosts in all clusters. This configuration is handled through the vCloud Networking and Security Manager plug-in in vCenter Server.
• VDS, VLAN, teaming and MTU settings must be provided as part of the VXLAN configuration process.
• A VTEP IP address is assigned either via DHCP or statically via vCenter Server.
• Virtual machines communicating outside the logical network (to the Internet or to nonlogical networks within the datacenter) require a VXLAN gateway.

vMotion Boundary

The vMotion boundary, or the workload migration limit, in a VXLAN deployment is dictated by the following two criteria:

1) vMotion migration is limited to hosts managed by a single vCenter Server instance.
2) vMotion migration is not possible across two VDS.

In this scenario, where all the hosts are part of the same VDS, vMotion migration will work across all hosts as long as the shared storage requirement is satisfied across the two clusters.

Scenario 2 – Logical Network: Multiple Physical L2 Domains

In brownfield deployments, clusters are typically deployed with multiple VDS, one per cluster. Each VDS is on a different subnet, terminated on an aggregation router. Logical L2 networks can span across these subnet boundaries. The main difference as compared to scenario 1 is that VXLAN transport traffic is routed instead of being switched in the same subnet.

Multicast and ECMP requirements are dependent on the physical topology. Refer to the L3 configuration in the access layer shown in Figure 10 for details on multicast-related configuration.
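The switched-versus-routed distinction between scenario 1 and scenario 2 can be illustrated with a minimal sketch. It uses only Python’s standard ipaddress module; the function name and the VTEP addresses are hypothetical and chosen for illustration, not taken from this guide or from any VMware tool.

```python
import ipaddress

def transport_path(vtep_a: str, vtep_b: str) -> str:
    """Classify VXLAN transport traffic between two VTEP interfaces.

    Each argument is an interface address in CIDR form, for example
    "10.20.10.11/24" (hypothetical values). The result is "switched" when
    both VTEPs are in the same transport subnet (as in scenario 1) and
    "routed" when they are in different subnets (as in scenario 2).
    """
    a = ipaddress.ip_interface(vtep_a)
    b = ipaddress.ip_interface(vtep_b)
    return "switched" if a.network == b.network else "routed"

# Hypothetical addressing: two hosts on one transport VLAN, then two hosts
# whose transport VLANs terminate on an aggregation router.
print(transport_path("10.20.10.11/24", "10.20.10.12/24"))  # switched
print(transport_path("10.20.10.11/24", "10.20.20.11/24"))  # routed
```

In the routed case the virtual machines still see a single logical L2 network; only the encapsulated transport traffic crosses the router.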
[Figure: logical L2 networks VXLAN 5001 and VXLAN 5002 over a VXLAN fabric; cluster 1 (rack 1, VLAN 10) and cluster 2 (rack 10, VLAN 20) each have their own vSphere Distributed Switch and are connected through a router. Legend: VTEP, vwire5001 port group, vwire5002 port group, switch, router.]

Figure 10. Brownfield Deployment – Two VDS

Keep in mind the following key points while deploying:

• VTEPs in different subnets can route traffic to each other.
• A VTEP IP address is assigned either via DHCP or statically via vCenter.
• Applications running in virtual machines cannot detect the physical topology and are in the same subnet.
• Virtual machines communicating outside the logical network (to the Internet or to nonlogical networks within the datacenter) require a VXLAN gateway. (See appendix 2 for packet flows.)

vMotion Boundary

In this two-VDS VXLAN deployment, the vMotion boundary is limited to one VDS. The workloads deployed on a logical L2 network cannot be moved to a host connected to a different VDS. However, if workload placement alone is the goal, this design enables the choice of any cluster for the deployment of a workload, even if the clusters are on different physical VLANs.

Scenario 3 – Logical Network: Multiple Physical L2 Domains with vMotion

If vMotion migration across clusters is an important requirement, the following modified design should be used. Here, a single VDS spans across multiple clusters, enabling vMotion migration across clusters. The following are some of the key differences in this design:

• No VLAN ID is configured during the VXLAN preparation. The VDS will not perform VLAN tagging for the VXLAN traffic going out on the uplinks (no VST).
• Dedicated uplinks are required on the hosts to carry untagged VXLAN traffic.
• The physical-switch ports, where the host uplinks are connected, are configured as access ports with appropriate VLAN.
For example, as shown in Figure 11, access switch ports of cluster 1 are configured with VLAN 10; those of cluster 2 are configured with VLAN 20.

[Figure: logical L2 networks VXLAN 5001 and VXLAN 5002 over a VXLAN fabric; a single vSphere Distributed Switch spans cluster 1 (rack 1) and cluster 2 (rack 10) with no VST on either cluster; the access switches use VLAN 10 and VLAN 20 and are connected through a router. Legend: VTEP, vwire5001 port group, vwire5002 port group, switch, router.]

Figure 11. Brownfield Deployment – Single VDS to Enable vMotion Migration

Because the storage network is parallel to and independent of the logical network, it is assumed that both clusters can reach the shared storage. Standard vMotion migration distance limitations and single vCenter requirements still apply. Because the moved virtual machine is still in the same logical L2 network, no IP readdressing is necessary, even though the physical hosts might be on different subnets.

Scenario 4 – Logical Network: Stretched Clusters Across Two Datacenters

Stretched clusters offer the ability to balance workloads between two datacenters. This nondisruptive workload mobility enables migration of services between geographically adjacent sites. A stretched cluster design helps pool resources in two datacenters and enables workload mobility. Virtual machine–to–virtual machine traffic is within the same logical L2 network, enabling L2 adjacency across datacenters. The virtual machine–to–virtual machine traffic dynamics are the same as those previously cited. This section discusses the impact of this design on north–south traffic (a virtual machine communicating outside the logical L2 network) because that is the main difference as compared to previous scenarios.

Figure 12 shows two sites, site A and site B, with two hosts deployed in each site along with the storage and the replication setup. Here all hosts are managed by a single vCenter Server and are part of the same VDS.
In general, for stretched cluster design, the following requirements must be met:

• The two datacenters must be managed by one vCenter Server because the VXLAN scope is limited to a single vCenter Server.
• vMotion support requires that the datacenters have a common stretched VDS (as in scenario 3). A multiple VDS design, discussed in scenario 2, can also be used, but vMotion migration will not work.
• Storage must support a “campus cluster” configuration.

[Figure: a stretched cluster on one vSphere Distributed Switch spanning site A and site B over a WAN, with VXLAN 5002 and a virtual machine shown after vMotion migration; each site has its own IP network and Internet connection; storage A and storage B are linked over FC/IP, with LUN (R/W) and LUN (R/O) labels.]

Figure 12. Stretched Cluster

In this design, the vCloud Networking and Security Edge gateway is pinned to one of the datacenters (site A in this example). In the vCloud Networking and Security 5.1 release, each VXLAN segment can have only one vCloud Networking and Security Edge gateway. This has the following implications:

• All north–south traffic from the second datacenter (site B) in the same VXLAN (5002) must transit the vCloud Networking and Security Edge gateway in the first datacenter (site A).
• When a virtual machine is moved from site A to site B, all north–south traffic returns to site A before reaching the Internet or other physical networks in the datacenter.

These implications raise obvious concerns regarding bandwidth consumption and latency, so an active–active multidatacenter design is not recommended. This design is mainly targeted toward the following scenarios:

• Datacenter migrations that require no IP address changes on the virtual machines. After the migration has been completed, the vCloud Networking and Security Edge gateway can be moved to the new datacenter, requiring a change in external IP addresses on the vCloud Networking and Security Edge only.
If all virtual machines have public IP addresses and are not behind vCloud Networking and Security Edge gateway network address translation (NAT), more changes are needed.
• Deployments that require limited north–south traffic. Because virtual machine–to–virtual machine traffic does not require crossing the vCloud Networking and Security Edge gateway, the stretched cluster limitation does not apply.

These scenarios also benefit from elastic pooling of resources and initial workload placement flexibility. If virtual machines are in different VXLANs, the limitations do not apply.

Managing IP Addresses in Logical Networks

In a large cloud environment with multiple tenants, IP address management is a critical task. This section focuses on IP address management of the virtual machines deployed on the VXLAN logical L2 network.

Each logical L2 network created with VXLAN is a separate L2 broadcast domain. This L2 broadcast domain can be associated with a separate subnet using a private IP space or publicly routable IP space. Depending on whether private IP space or publicly routable IP space is assigned to the logical networks, customers must choose either the NAT or the non-NAT option on the vCloud Networking and Security Edge gateway. The IP address assignment therefore depends on whether the virtual machine is connected to a logical L2 network through a NAT or non-NAT configuration. Consider the following two deployments:

1) Using the NAT and DHCP services of the vCloud Networking and Security Edge gateway
2) Not using the NAT and DHCP services of the vCloud Networking and Security Edge gateway

With Network Address Translation

In deployments where customers have limited IP address space, NAT is used to provide address translation from private IP space to the limited public IP addresses.
By utilizing vCloud Networking and Security Edge gateway services, customers can provide individual tenants with the ability to create their own pool of private IP addresses, which ultimately get mapped to the publicly routable external IP address of the external vCloud Networking and Security Edge gateway interface.

Figure 13 shows a three-tenant deployment, with each tenant virtual machine connected to separate logical L2 networks. The blue, green and purple virtual wires (VXLAN segments) are connected to the three internal interfaces of the vCloud Networking and Security Edge gateway; the external interface of the vCloud Networking and Security Edge is connected to the Internet via a datacenter router.

[Figure: vCloud Networking and Security Edge gateway with standard NAT configuration and DHCP service. VXLAN 5000: subnet 192.168.1.0/24, Edge interface 192.168.1.1, virtual machines 192.168.1.10 and 192.168.1.11. VXLAN 5001: subnet 192.168.2.0/24, Edge interface 192.168.2.1, virtual machine 192.168.2.10. VXLAN 5002: subnet 192.168.3.0/24, Edge interface 192.168.3.1, virtual machine 192.168.3.10. External interface 172.26.10.1 on external network 172.26.10.0/24, connected to the Internet.]

Figure 13. NAT and DHCP Configuration on vCloud Networking and Security Edge Gateway

The following are some configuration details of the vCloud Networking and Security Edge gateway:

• Blue, green and purple virtual wires (VXLAN segments) are associated with separate port groups on a VDS. Internal interfaces of the vCloud Networking and Security Edge gateway connect to these port groups.
• The vCloud Networking and Security Edge gateway interface connected to the blue virtual wire is configured with IP 192.168.1.1.
• DHCP service is enabled on this internal interface of vCloud Networking and Security Edge by providing a pool of IP addresses, for example, 192.168.1.10 to 192.168.1.50.
• All the virtual machines connected to the blue virtual wire receive an IP address from the DHCP service configured on Edge or on the same subnet.
• The NAT configuration on the external interface of the vCloud Networking and Security Edge gateway allows virtual machines on a virtual wire to communicate with devices on the external network. This communication is allowed only when the requests are initiated by the virtual machines connected to the internal interface of the vCloud Networking and Security Edge.

In situations where overlapping IP and MAC address support is required, one vCloud Networking and Security Edge gateway per tenant is recommended. Figure 14 shows an overlapping IP address deployment with two tenants and two separate vCloud Networking and Security Edge gateways.

[Figure: tenant 1 on VXLAN 5000 (subnet 10.10.1.0/24, Edge interface 10.10.1.1, virtual machines 10.10.1.10 and 10.10.1.11) and tenant 2 on VXLAN 5001 (subnet 10.10.1.0/24, Edge interface 10.10.1.1, virtual machine 10.10.1.10), each behind its own vCloud Networking and Security Edge gateway; the gateways’ external interfaces, 10.10.10.1 and 10.10.20.1, connect to external network 10.10.0.0/16 and the IP core.]

Figure 14. Overlapping IP and MAC Addresses

Without Network Address Translation

Customers who are not limited by routable IP addresses, have virtual machines with public IP addresses or do not want to deploy NAT can use static routing on vCloud Networking and Security Edge.

[Figure: vCloud Networking and Security Edge gateway without NAT. VXLAN 5000: subnet 172.26.1.0/24, Edge interface 172.26.1.1, virtual machines 172.26.1.10 and 172.26.1.11. VXLAN 5001: subnet 172.26.2.0/24, Edge interface 172.26.2.1, virtual machine 172.26.2.10. VXLAN 5002: subnet 172.26.3.0/24, Edge interface 172.26.3.1, virtual machine 172.26.3.10. External interface 172.26.10.1 on external network 172.26.10.0/24, connected to the Internet.]

Figure 15. Routable IP Assignments to the Logical Networks

In the deployment shown in Figure 15, the vCloud Networking and Security Edge gateway is not configured with the DHCP and NAT services. However, static routes are set up between different interfaces of the vCloud Networking and Security Edge gateway.
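The addressing rules in this section lend themselves to a quick sanity check before logical networks are created. The following is a minimal sketch under stated assumptions: it uses only Python’s standard ipaddress module, the function names are hypothetical, and it does not call any vCloud Networking and Security API. It checks that a DHCP pool fits inside a logical network’s subnet without colliding with the Edge interface address, and it flags logical networks whose subnets overlap, which is the case where one Edge gateway per tenant is recommended.

```python
import ipaddress
from itertools import combinations

def pool_in_subnet(subnet: str, gateway: str, first: str, last: str) -> bool:
    """Return True if the DHCP pool [first, last] lies inside the subnet and
    does not include the Edge interface (gateway) address."""
    net = ipaddress.ip_network(subnet)
    gw = ipaddress.ip_address(gateway)
    lo, hi = ipaddress.ip_address(first), ipaddress.ip_address(last)
    inside = lo <= hi and lo in net and hi in net and gw in net
    return inside and not (lo <= gw <= hi)

def overlapping_pairs(segments: dict[str, str]) -> list[tuple[str, str]]:
    """Return the pairs of logical networks whose subnets overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in segments.items()}
    return [(a, b) for a, b in combinations(sorted(nets), 2)
            if nets[a].overlaps(nets[b])]

# Values from Figure 13: the blue virtual wire and its example DHCP pool.
print(pool_in_subnet("192.168.1.0/24", "192.168.1.1",
                     "192.168.1.10", "192.168.1.50"))  # True

# Values from Figure 14: both tenants use 10.10.1.0/24, so the pair is
# reported and each tenant would need its own Edge gateway.
print(overlapping_pairs({"tenant 1": "10.10.1.0/24",
                         "tenant 2": "10.10.1.0/24"}))
```

When the second call returns an empty list, as it would for the three subnets in Figure 13 or Figure 15, the logical networks can share a single Edge gateway as far as addressing is concerned.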
Other Network Services

• In a multitenant environment, vCloud Networking and Security Edge firewall can also be used to segment intertenant and intratenant traffic.
• vCloud Networking and Security Edge load balancer can be used for load balancing external to internal Web traffic, for example, when multiple Web servers are deployed on the logical network. Static routes must be configured on the upstream router to properly route inbound traffic to the vCloud Networking and Security Edge external interface.
• vCloud Networking and Security Edge also provides DNS relay functionality to resolve domain names. DNS relay configuration should point to an existing DNS in the physical network. Alternatively, a DNS server can be deployed in the logical network itself.

Scaling Network Virtualization

This section presents the design considerations for the different components when planning the scaling of VXLAN networks and associated network services. The following key components and parameters should be taken into account:

1) VDS:
• One vCenter Server can have 128 VDS.
• One VDS can span across 500 hosts.
• One VDS can support 10,000 port groups. Because a new port group is created for every logical L2 network, this number dictates the number of L2 logical networks that can be created.

2) vCloud Networking and Security Edge gateway:
• Each vCloud Networking and Security Edge gateway can have a maximum of 10 interfaces, each of which can be configured to connect to an internal or an external network. The number of logical networks requiring gateway services determines the number of gateway instances that must be deployed based on the 10-interfaces-per-gateway maximum.
For example, if one interface per gateway is connected to an external network (leaving 9 for internal networks), the number of gateway instances required for 90 logical L2 networks would be 90/9, that is, 10 vCloud Networking and Security Edge gateway devices.
• The vCloud Networking and Security Edge gateway is available in three different sizes, based on capacity.

3) VXLAN Traffic:
• The planned virtual machine consolidation ratio should take into consideration the amount of virtual machine traffic that each VTEP must handle.
• Meet the bandwidth requirements for the VXLAN traffic by assigning sufficient NICs to it. To optimally utilize the uplinks, use link aggregation methods on the physical switches.

4) Multicast:
• Each VXLAN logical network is uniquely identified by a combination of a number called segment ID (determined from a range defined by the user) and the configured multicast group. The multicast group–to–VXLAN segment ID mapping is handled by the vCloud Networking and Security Manager. There is no need to have one-to-one mapping between the segment ID and the multicast group. If the number of multicast groups is limited, vCloud Networking and Security Manager maps multiple logical networks (segment IDs) to one multicast group.

Consumption Models

After the VXLAN configuration has been completed, customers can create and consume logical L2 networks on demand. Depending on the type of vCloud Networking and Security bundle purchased, they have the following three options:

1) Use the vCloud Director interface.
2) Use the vCloud Networking and Security Manager interface.
3) Use REST APIs offered by vCloud Networking and Security products.

In vCloud Director

vCloud Director creates a VXLAN network pool implicitly for each provider VDC backed by VXLAN-prepared clusters. The total number of logical networks that can be created using a VXLAN network pool is determined by the configuration at the time of VXLAN fabric preparation.
A cloud administrator can in turn distribute this total number to the various organization VDCs backed by the provider VDC. The quota allocated to an organization VDC determines the number of logical networks (organization VDC/VMware vSphere vApp™ networks) backed by VXLAN that can be created in that organization VDC.

In vCloud Networking and Security Manager

Customers who do not have a vCloud Director deployment can consume the logical L2 networks through the vCloud Networking and Security Manager Web interface or through the vSphere Client network virtualization plug-in.

Using API

In addition to vCloud Director and vCloud Networking and Security Manager, vCloud Networking and Security components can be managed using APIs provided by VMware. For detailed information on how to use the APIs, refer to the vCloud Networking and Security 5.1 API Programming Guide at https://www.vmware.com/pdf/vshield_51_api.pdf.

Troubleshooting and Monitoring

The following are some of the important tools that customers should use to troubleshoot and monitor the VXLAN network. These tools provide the required visibility into the encapsulated VXLAN traffic and also help manage the overall logical network infrastructure.

Network Health Check

Network Health Check enables proactive reports on virtual and physical network configuration inconsistencies, reducing operational costs involved in troubleshooting and fixing errors. It checks for the following three parameters:

• VLAN IDs
• MTU settings
• Teaming configuration

VXLAN Connectivity Check – Unicast and Broadcast Tests

The unicast and broadcast tests available through the vCloud Networking and Security Manager enable customers to test the configuration across the virtual and physical infrastructure. They also enable verification that all VTEP configurations are correct and that each VTEP can reach other VTEPs.
A gateway address on VTEP is required for this functionality to work. A VTEP IP address must be assigned using DHCP to configure the gateway, because static IP configuration on VTEP via vCenter Server does not enable gateways to be configured. Proxy ARP on upstream gateway/router is not a requirement.

Monitoring Logical Flows – IPFIX

NetFlow v10/IPFIX on VDS enables vendors to predefine custom NetFlow records. A new VXLAN template has been predefined to monitor traffic flows in logical networks. With this template, customers can monitor VXLAN flows at virtual machine–level granularity.

Port Mirroring

VDS provides multiple standard port mirroring features such as SPAN, RSPAN and ERSPAN that help in detailed traffic analysis.

Conclusion

The VMware network virtualization solution addresses the current challenges with the physical network infrastructure and brings flexibility, agility and scale through VXLAN-based logical networks. Along with the ability to create on-demand logical networks using VXLAN, the vCloud Networking and Security Edge gateway helps customers deploy various logical network services such as firewall, DHCP, NAT and load balancing on these networks. The operational tools provided as part of the solution help in the troubleshooting and monitoring of these overlay networks.

VMware, Inc. 3401 Hillview Avenue Palo Alto CA 94304 USA Tel 877-486-9273 Fax 650-427-5001 www.vmware.com
Copyright © 2013 VMware, Inc. All rights reserved. This product is protected by U.S. and international copyright and intellectual property laws. VMware products are covered by one or more patents listed at http://www.vmware.com/go/patents. VMware is a registered trademark or trademark of VMware, Inc. in the United States and/or other jurisdictions. All other marks and names mentioned herein may be trademarks of their respective companies.
Item No: VMW-WP-NETWORK-VIRT-GUIDE-USLET-101 Docsource: OIC - 12VM008.07