ARMD: Address Resolution for Massive Numbers of Hosts in Data Centers
Description of Working Group:

Server virtualization introduces a massive number of hosts in a data center
As server virtualization is introduced to data centers, the number of hosts in a data center can grow dramatically. Each physical server, which used to host one end-station, can now host hundreds of end-stations, or virtual machines (VMs). VMs can easily be added, deleted, and moved among physical servers. This flexible addition, deletion, and migration of VMs makes it possible to combine a pool of servers into one huge resource entity for ease of management and better resource utilization. It also provides the foundation for the virtual hosts and virtual desktops offered by Cloud Computing services.
This rapid growth in the number of virtual hosts can have a tremendous impact on networks and servers. One major issue is frequent ARP (IPv4) or Neighbor Discovery (IPv6) requests from hosts. Hosts send out these requests frequently because their ARP caches age out within minutes. With tens of thousands of hosts (each with a distinct MAC address) in one Data Center, the aggregate rate of ARP (or ND) packets from all hosts could reach 1,000 to 10,000 per second. This rate imposes a tremendous computational burden on servers.
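As a rough, purely illustrative estimate (the parameters below are assumptions, not measurements from any data center), the aggregate request rate can be approximated in a few lines of Python:

    # Back-of-envelope estimate of the aggregate ARP/ND request rate,
    # assuming each host re-resolves each of its active peers once per
    # cache lifetime. All numbers are illustrative assumptions.
    def arp_request_rate(num_hosts, peers_per_host, cache_lifetime_s):
        # Requests per second generated network-wide.
        return num_hosts * peers_per_host / cache_lifetime_s

    # 50,000 hosts, each talking to ~5 peers, 4-minute ARP cache timeout:
    print(arp_request_rate(50_000, 5, 240))  # ~1042 requests/second

With these assumed inputs the estimate lands at the low end of the 1,000 to 10,000 per second range cited above; shorter cache lifetimes or more peers per host push it toward the high end.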

Next-generation and Cloud Data Centers have to handle a massive number of subnets (or Closed User Groups)
Cloud Data Centers offer multi-tenant services, which may require each tenant (and its VMs) to be in its own VPN environment, equivalent to a Closed User Group. The number of such tenants/groups might be much larger than the VLAN space (4095).

The topology of a subnet changes as virtual machines migrate from one location to another
One key characteristic of a next-generation Data Center is that a single subnet may span multiple sites and its topology may change over time. For example, when VMs move from one rack to another, the topology of their associated subnets changes accordingly. VM migration in Layer 2 environments requires updating the Layer 2 (MAC) tables in the individual switches in the data center to ensure accurate forwarding. Consider a case where a VM migrates across racks. The migrated VM typically sends out a gratuitous ARP broadcast when it comes up at the new location. This broadcast is forwarded by the top-of-rack switch at the new rack across the entire network, and the individual switches learn the new location of the migrated VM from the source address of the broadcast frame. The top-of-rack switch at the old rack is not aware of the migration until it receives this gratuitous ARP, so it continues to forward frames to the port where it previously learnt the VM's MAC address, leading to black-holing of traffic. The duration of this black-holing period depends on the topology; it may be longer if the VM has moved to a rack in a different data center connected to this data center over Layer 2.
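For concreteness, the sketch below builds the kind of gratuitous ARP frame a migrated VM would broadcast, following the standard ARP-over-Ethernet encoding (broadcast destination, EtherType 0x0806, sender and target addresses both set to the VM itself); the MAC and IP values are invented for illustration:

    import struct

    def gratuitous_arp(vm_mac: bytes, vm_ip: bytes) -> bytes:
        # Ethernet header: dst = broadcast, src = VM, EtherType = ARP
        eth = b"\xff" * 6 + vm_mac + b"\x08\x06"
        arp = struct.pack(
            "!HHBBH6s4s6s4s",
            1,              # hardware type: Ethernet
            0x0800,         # protocol type: IPv4
            6, 4,           # MAC / IPv4 address lengths
            2,              # opcode: reply (gratuitous ARPs are commonly replies)
            vm_mac, vm_ip,  # sender = the VM itself
            vm_mac, vm_ip,  # target = the VM itself (hallmark of gratuitous ARP)
        )
        return eth + arp

    frame = gratuitous_arp(bytes.fromhex("02005e000001"), bytes([10, 0, 0, 42]))
    print(len(frame))  # 14-byte Ethernet header + 28-byte ARP payload = 42

Note that it is the Ethernet source address of this broadcast, not the ARP payload, that the intermediate switches use to re-learn the VM's port.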
Traditional subnet (VLAN) partitioning no longer works well when servers are virtualized
When each server hosts only one end-station, only one VLAN is enabled on the switch port towards the server, and the switch can block broadcast messages from all other subnets (VLANs) from reaching the server. When one physical server supports more than 100 Virtual Machines, i.e. more than 100 hosts, the virtual hosts on that server are most likely on different subnets (VLANs). If 50 subnets (VLANs) are enabled on the switch port to the server, the server has to handle the ARP broadcast messages of all 50 subnets (VLANs). The volume of ARP traffic to be processed by each server therefore remains too high. Another issue with VLAN partitioning is that a Data Center could have more subnets than available VLANs (4095).
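A minimal sketch of this multiplication effect, with illustrative numbers only (none of these parameters come from the charter):

    # ARP broadcasts per second delivered to one server's switch port.
    # With VLAN trunking, the port receives the broadcasts of every
    # enabled VLAN, not just one.
    def arp_load_on_port(vlans_on_port, hosts_per_vlan, req_per_host_s):
        return vlans_on_port * hosts_per_vlan * req_per_host_s

    single  = arp_load_on_port(1, 200, 0.02)   # pre-virtualization: one VLAN
    trunked = arp_load_on_port(50, 200, 0.02)  # 50 VLANs trunked to the server
    print(single, trunked)  # 4.0 vs 200.0 broadcasts/second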
The goal of this working group is to develop interoperable solutions to the above problems within single-site and multi-site Data Center environments. The multiple sites could be connected by any type of network, such as L2VPN, L3VPN, Ethernet, MAC-in-MAC, IEEE 802.1aq, or simple IP networks. This could involve multi-domain (multi-company), multi-site, and multi-vendor environments. Efficient address resolution across multiple sites should be considered by this working group. Since VM allocation and migration within a data center are managed by the data center's central resource management, directory-based approaches should also be considered.
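As a hedged illustration of that direction (the names below are hypothetical, not taken from any ARMD document), an edge device could answer ARP queries from a table that the VM manager pushes on create/delete/migrate events, so that no broadcast learning is needed:

    # Hypothetical directory for a broadcast-free ARP proxy.
    class Directory:
        def __init__(self):
            self._table = {}  # IP address -> MAC address

        def update(self, ip, mac):
            # Called by the central VM manager on VM events.
            self._table[ip] = mac

        def resolve(self, ip):
            # Called by an edge switch intercepting an ARP request;
            # it replies directly instead of flooding the request.
            return self._table.get(ip)

    d = Directory()
    d.update("10.0.0.42", "02:00:5e:00:00:01")
    print(d.resolve("10.0.0.42"))  # 02:00:5e:00:00:01

The design point is that the mapping is authoritative because it is fed by the entity that already orchestrates VM placement, rather than learned from broadcasts that may be stale after a migration.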
The design should consider the following properties:
 Solutions developed by the ARMD WG should not require any behavior changes on hosts, applications, or Virtual Machines already deployed in the market.
 Solutions developed should not break DHCP or any other broadcast/multicast mechanism used by applications.
 Evaluate the impact on IPv6 ND, and develop solutions accordingly if needed.
 Should consider a variety of solutions, including directory-based, proxy-based, and cache-based solutions.
 Include an analysis of the security concerns raised by IPv4 ARP requests from malicious users; evaluate potential security solutions and conclude whether the security threat justifies them.
 ARMD assumes the direct links to individual hosts and virtual machines are IEEE 802.3 Ethernet links.
 Should consider scenarios where one Ethernet network is interconnected by another network, which can be L2VPN, pure IP, Ethernet, or others.
 Should consider a performance analysis of proposed solutions.
The following items are out of the scope of the working group:
 Redefining DHCP behavior
 Redefining the IPv6 ND security model
 Behavior changes to the guest OS or applications running on the VMs (solutions must not expect any such changes)
 Non-Ethernet direct links from hosts and virtual hosts
Goals and Milestones:
 Problem statement
 Gap Analysis Statement within Data Center
   o Gaps within ARP
   o Gaps within IPv6 ND or autoconfiguration
 Survey of Existing Solutions
   o Survey of NHRP (RFC 2332) & SCSP (RFC 2334), and their applicability to Ethernet networks;
   o Survey of TRILL work as a potential solution;
   o Survey of other potential solutions, like MOOSE or SEATTLE; and
   o Other proposals
 Security Analysis of Existing Solutions
   o Study existing solutions, like Dynamic ARP Inspection, Egress ARP Inspection, etc., and evaluate whether any of them (or others) should be included in the solutions specified by this working group.
 Architectural Design for Next-Generation Data Center
 Protocol Documents
Time line for BOF presentations:
 Initial Problem Statement: August 2011
 GAP analysis: October 2011
 Survey of Existing Solutions: October 2011
 Security Analysis of Existing Solutions: October 2011
 Architectural Design for NG: October 2011
 Final Problem statement for BOF
 Protocol Documents