ARMD: Address Resolution for Massive numbers of hosts in the Data center

Description of Working Group:

Server virtualization introduces a massive number of hosts into the data center

As server virtualization is introduced to data centers, the number of hosts in a data center can grow dramatically. Each physical server, which used to host one end-station, can now host hundreds of end-stations, or virtual machines (VMs). VMs can easily be added, deleted, and migrated among physical servers. This flexibility makes it possible to combine a pool of servers into one large resource entity for ease of management and better resource utilization, and it provides the foundation for the virtual hosts and virtual desktops offered by cloud computing services.

This rapid growth in the number of virtual hosts can have a tremendous impact on networks and servers. One major issue is frequent ARP (IPv4) or Neighbor Discovery (IPv6) requests from hosts. All hosts send these requests frequently because host ARP caches age out within minutes. With tens of thousands of hosts (each with a distinct MAC address) in one data center, the aggregate rate of ARP (or ND) packets from all hosts could reach 1,000 to 10,000 per second, which imposes a tremendous computational burden on servers.

Next-generation and cloud data centers have to handle a massive number of subnets (or Closed User Groups)

Cloud data centers offer multi-tenant services, which may require each tenant (and its VMs) to have its own VPN environment, equivalent to a Closed User Group. The number of such tenants/groups may be much larger than the VLAN space (4095).

The topology of a subnet changes as virtual machines migrate from one location to another

One key characteristic of a next-generation data center is that a single subnet may span multiple sites and its topology may change: when VMs move from one rack to another, the topology of their associated subnets changes accordingly. VM migration in a Layer 2 environment requires updating the Layer 2 (MAC) tables in the individual switches of the data center to ensure accurate forwarding. Consider the case where a VM migrates across racks. The migrated VM often sends out a gratuitous ARP broadcast when it comes up at the new location. The top-of-rack switch at the new rack forwards this broadcast across the entire network, and the individual switches learn the new location of the migrated VM from the source address of the broadcast frame. The top-of-rack switch at the old rack, however, is not aware of the migration until it receives this gratuitous ARP; until then it continues to forward frames to the port where it previously learned the VM's MAC address, black-holing the traffic. The duration of this black-holing period depends on the topology and may be longer if the VM has moved to a rack in a different data center connected to this one over Layer 2.
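To make the mechanism above concrete, the following is a minimal sketch of such a gratuitous ARP broadcast, written with the Scapy packet library. Scapy, the addresses, and the interface name are illustrative assumptions for this sketch, not anything mandated by this document; hypervisors emit the equivalent frame natively.

    #!/usr/bin/env python3
    # Illustrative sketch only: build and send the kind of gratuitous ARP
    # a migrated VM broadcasts when it comes up at its new location.
    # Requires the Scapy library and root privileges; the IP, MAC, and
    # interface name below are hypothetical placeholders.
    from scapy.all import ARP, Ether, sendp

    VM_IP = "10.0.0.42"            # hypothetical IP of the migrated VM
    VM_MAC = "52:54:00:ab:cd:ef"   # hypothetical MAC of the migrated VM
    BCAST = "ff:ff:ff:ff:ff:ff"

    # In a gratuitous ARP, the sender and target protocol addresses are
    # both the VM's own IP, and the frame is broadcast so that every
    # switch on the flooding path relearns which port leads to VM_MAC
    # from the frame's source address.
    garp = Ether(src=VM_MAC, dst=BCAST) / ARP(
        op=2,                      # "is-at" reply; some stacks use op=1
        hwsrc=VM_MAC, psrc=VM_IP,
        hwdst=BCAST, pdst=VM_IP)

    sendp(garp, iface="eth0")      # interface name is a placeholder

Until this broadcast has flooded through the network, the top-of-rack switch at the old rack keeps forwarding frames for the VM's MAC out the stale port; that interval is the black-holing window described above.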
Traditional subnet (VLAN) partitioning no longer works well when servers are virtualized

When each physical server hosts a single end-station, only one VLAN needs to be enabled on the switch port toward the server, and the switch can block broadcast messages from all other subnets (VLANs) from reaching that server. When one physical server supports more than 100 virtual machines, i.e., more than 100 hosts, those hosts are most likely on different subnets (VLANs). If 50 subnets (VLANs) are enabled on the switch port to the server, the server has to handle the ARP broadcast messages on all 50 subnets (VLANs), which is still too much ARP traffic for each server to process. Another issue with VLAN partitioning is that a data center could have more subnets than available VLANs (4095).

The goal of this working group is to develop interoperable solutions to the problems above, within both single data centers and multi-site data center environments. The multiple sites could be connected by any type of network, such as L2VPN, L3VPN, Ethernet, MAC-in-MAC, IEEE 802.1aq, or plain IP networks, and the environment could be multi-domain (multi-company), multi-site, and multi-vendor. Efficient address resolution across multiple sites should be considered by this working group. Since VM allocation and migration within a data center are managed by the data center's central resource management, directory-based approaches should be considered as well.

The design should consider the following properties:

o Solutions developed by the ARMD WG should not require any behavior changes on hosts, applications, or virtual machines already deployed in the market.
o Solutions developed should not break DHCP or any other broadcast/multicast mechanism used by applications.
o Evaluate the impact on IPv6 ND, and develop solutions accordingly if needed.
o Consider a variety of solutions, including directory-based, proxy-based, and cache-based approaches.
o Include an analysis of the security concerns raised by IPv4 ARP requests from malicious users; evaluate potential security solutions and conclude whether the security threat justifies them.
o ARMD assumes that the direct links to individual hosts and virtual machines are IEEE 802.3 Ethernet links.
o Consider scenarios in which one Ethernet network is interconnected by another network, which can be L2VPN, plain IP, Ethernet, or others.
o Consider a performance analysis of proposed solutions.

The following items are out of the scope of the working group:

o Redefining DHCP behavior.
o Redefining the IPv6 ND security model.
o Solutions that expect behavior changes in the guest OS or applications running on the VMs.
o Direct links from hosts and virtual hosts that are non-Ethernet links.

Goals and Milestones:

Problem Statement

Gap Analysis within the Data Center
o Gaps within ARP
o Gaps within IPv6 ND or autoconfiguration

Survey of Existing Solutions
o Survey of NHRP (RFC 2332) and SCSP (RFC 2334), and their applicability to Ethernet networks
o Survey of TRILL work as a potential solution
o Survey of other potential solutions, such as MOOSE or SEATTLE
o Other proposals

Security Analysis of Existing Solutions
o Study existing mechanisms, such as Dynamic ARP Inspection and Egress ARP Inspection, and evaluate whether any of them (or others) should be included in the solutions specified by this working group.

Architectural Design for the Next-Generation Data Center

Protocol Documents

Time line for BOF presentations (August - October 2011):
o Initial Problem Statement
o Gap Analysis
o Survey of Existing Solutions
o Security Analysis of Existing Solutions
o Architectural Design for the Next-Generation Data Center
o Final Problem Statement for BOF
o Protocol Documents