Data Center Solutions Guide
A SOLUTION WHITE PAPER

Abstract: The following document provides a Virtualized Data Center Solution Guide covering the architectural components that tie the network, compute, storage, and management together.

Table of Contents
Introduction
1.1 DATA CENTER CHALLENGES
1.2 EXTREME SOLUTION
1.3 ARCHITECTURAL COMPONENTS
2. Network Management and Service Orchestration
2.1 OVERVIEW
2.2 NETSIGHT
2.3 ONEFABRIC CONNECT APIS
2.4 ORCHESTRATION
2.5 DEVOPS
2.6 CLI SCRIPTING
3 Network Abstraction
3.1 ONECONTROLLER
3.2 NETWORK ACCESS CONTROL
3.3 ANALYTICS-AS-A-SERVICE
4 Network Infrastructure
4.1 HIGH AVAILABILITY
4.2 MULTIPATH
4.3 REDUNDANCY
4.4 LOGICAL SEPARATION
4.5 QUALITY OF SERVICE (QOS)
4.6 ELASTICITY
4.7 SECURITY
4.8 TOR AND EOR DESIGNS
4.9 DATA CENTER INTERCONNECT (DCI)
4.10 MANAGEMENT
5 Data Center Infrastructure Elements
5.1 SERVER VIRTUALIZATION
5.2 STORAGE
5.3 FIREWALLS
5.4 SERVICE CHAINING

Introduction

1.1 DATA CENTER CHALLENGES
Online delivery and consumption models for business and consumer services are evolving, for both cloud services and traditional IT services. Demand for these services and for application availability has changed the requirements for data centers. Early motivations for data center changes were associated with massive cost reduction and redundancy, but the focus is now on agility and the ability to meet requirements in the different cloud models that deliver the new business and consumer services.

Today's highly distributed wired and wireless networks are designed for increased flexibility, scale, and reliability. Additionally, enterprises of all sizes increasingly turn to outsourced services of all types – SaaS, PaaS, IaaS, and more. Securely delivering new end-to-end services and applications across these environments often results in increased complexity, compromise, and costs. Customers face challenges that require a data center to be:

Simpler:
1. Managing a virtualized environment is complicated. Both provisioning network devices and gaining visibility into the traffic on those devices is difficult, time-consuming, and requires too many tools.
2. The data center needs to become more dynamic and automated via technologies like SDN, but that is also perceived as complex.
3. Traffic isn't optimized, either within the data center or between interconnected data centers.

Faster:
1. Rolling out new applications, services, and other network changes is inefficient, ineffective, and takes too long.
2. The data center can't scale to accommodate the speed and performance demanded by the explosive growth of new applications and devices.
3. The data center is expensive in terms of OPEX; operators are spending too much time on basic maintenance tasks and not enough on leveraging this valuable business asset.

Smarter:
1. The data center needs improved availability, redundancy, and reliability in order to eliminate expensive downtime. This also means improved security in order to eliminate external disruptions.
2. Operators need better analytics into network usage so they can leverage this business intelligence to assure Service Level Agreements (SLAs) and to improve the business overall.

Data center LANs are constantly evolving. Business pressures are forcing IT organizations to adopt new application delivery models. Edge computing models are transitioning from applications at the edge to virtualized desktops in the data center. The evolution of the data center from centralized servers to a private cloud is well underway and will be augmented by hybrid and public cloud computing services. With data center traffic becoming less client-server centric and more server-server centric, new data center topologies are emerging. Yesterday's heavily segmented data center is becoming less physically segmented and more virtually segmented. Virtual segmentation allows the same physical infrastructure to be shared in the most efficient manner, leading to both capital and operational expense (CAPEX/OPEX) savings, and it accelerates the time to spin up new business applications.

1.2 EXTREME SOLUTION
With businesses demanding a broader variety of IT-driven services, overcoming these constraints has become a priority for IT leadership. Leveraging Extreme Networks OneFabric Connect and Software-Defined Architecture (SDA), organizations overcome these challenges with a unified platform for security, virtualization, manageability, mobility, and convergence that enables more reliable provisioning and delivery of new services and applications on a more dynamic IT infrastructure.

With Extreme Networks OneFabric Connect and SDN architecture, the network tier becomes as dynamic, automated, and modifiable as the storage and compute tiers, providing a simple, fast, and smart networking solution that delivers the benefits of:
• Simplified end-to-end automation that makes network deployment, management, and ongoing operations more cost effective
• Faster provisioning that supports any application while providing flexibility for deploying the operator's choice of best-of-breed applications, solutions, and vendors
• Intelligent orchestration compatible with existing systems to take advantage of present network infrastructures and protect an organization's existing investments

Extreme Networks provides the foundation for open, standards-based, and comprehensive SDN platforms and integrated ecosystems. OneFabric Connect provides an open, programmable, and centrally managed foundation for implementing SDN on any network, as our open, standards-based Software-Defined Architecture provides a number of key innovations and capabilities, including fully integrated management, access control, and application analytics for flexibly deploying new SDN solutions. These solutions operate across heterogeneous network infrastructures to enable seamless migrations to new applications and services without compromise.

Figure 1: Extreme Networks Software Defined Architecture

1.3 ARCHITECTURAL COMPONENTS
The architectural components of the data center as described in this document are key to any data center design. The Extreme Data Center solution provides secure, scalable infrastructure with the ability to expand or shrink resources based on business needs. The ability to quickly provision resources to meet specific business needs enables application agility, which in turn brings advantages such as faster time to market and reduced TCO.
This document should be used as a set of general guidelines that can readily be adapted to the target customer's data center requirements, from K-12 school data centers to large enterprise data centers to Hadoop clusters to Infrastructure-as-a-Service (IaaS) platforms. This document will walk through the layers and architectural components and describe how Extreme Networks addresses each of those requirements.

Derived from the business objectives and the requirements of the applications hosted in the data center (see the Business Applications in Figure 1), the common design goals include:
• Application availability
• Performance, scale, and responsiveness
• Security against attacks/hacks
• Visibility, workload manageability, upgradability
• Resource utilization

2. Network Management and Service Orchestration

2.1 OVERVIEW
IT organizations need simplified data center management: a single-pane-of-glass management system that is intelligent, highly automated, and integrated with the entire data center ecosystem. Simplicity in configuring and deploying the infrastructure and environment, provisioning, and centralized management is critical for all types of data centers. It reduces time to deployment and reduces operational expenditures.

Customers want the single pane of glass to interface with the network elements, and they want different ways of doing so through open APIs without vendor lock-in. They also need to be able to develop on top of the platform and integrate with 3rd party solutions, on top of a data center that is brownfield (has existing infrastructure). Organizations need to dynamically optimize network resources and investments to fit changing business strategies, and create an open foundation for delivering new services, increased efficiencies, reduced costs, and sustained advantage without compromise.

Extreme achieves a simplified and consistent user experience through the integrated infrastructure management solutions described below, which enable collaboration and automation to ultimately provide optimized data center IT administration and operations. The value adds of a single-pane-of-glass management system are:

Business alignment
• Transform complex network data into business-centric, actionable information
• Centralize and simplify the definition, management, and enforcement of policies such as guest access or personal devices
• Easily integrate with business applications with Software Defined Networking for operational efficiency

Operational efficiency
• Reduce data center IT administrative effort with the automation of routine tasks and a web-based dashboard
• Streamline management with the integration of wired and wireless networks
• Easily enforce policies network-wide for QoS, bandwidth, etc.
• Troubleshoot with the convenience of a smartphone or tablet
• Integrate with enterprise management platforms
• Integrate with other network services and service chaining

Security
• Protect corporate data with centralized monitoring, control, and real-time response
• Enhance existing investments in network security
• Preserve LAN/WLAN network integrity with unified policies

2.2 NETSIGHT
A Network Management System (NMS) is essential to provide centralized visibility and granular control of enterprise network resources end to end.
Next-generation data centers place higher requirements on the FCAPS capabilities that a solid NMS provides, as the NMS also goes beyond traditional configuration and management responsibilities. To meet SDN platform requirements, the NMS integrates communication between the infrastructure elements and OneController, providing a seamless migration path for existing ecosystem partners. It can collect topology, host, and statistical information from OneController, provide visualization for provisioning and monitoring, and manage security applications and services.

Today, Extreme's centralized network management application, NetSight, provides capabilities common to many NMS solutions and is distinctive for granularity that reaches beyond ports and VLANs down to individual users, applications, and protocols. No matter how many moves, adds, or changes occur in your environment, NetSight keeps everything in view and under control through role-based access controls. One click can equal a thousand actions when you manage your network with Extreme Networks. NetSight can even manage hardware beyond Extreme Networks switching, routing, and wireless products, enabling standards-based control of other vendors' network equipment.

NetSight OneView: This screen shows devices and MLAG-specific information.

2.3 ONEFABRIC CONNECT APIS
With the OneFabric Connect API, business applications are directly controlled from OneFabric Control Center and the Extreme Networks NetSight Advanced management application.

Figure X – OneFabric Solution Architecture

The result is a complete solution that provides innovative features, including:

Increased Agility and Flexibility
• Control managed and unmanaged BYOD devices within the same infrastructure, with unified single-pane-of-glass visibility
• Easily deploy and manage new applications, devices, users, and services
• Leverage pre-defined integrations with other IT systems to enable features like user- and location-based URL filtering

Lower Operational Costs
• Automate provisioning and control of network services by IT systems inside as well as outside the network management domain
• Discover, track, and document all network-connected assets in real time
• Automate onboarding and provisioning of network services for any device

Improved Visibility, Control, and Security
• Enforce policies based on context at the network layer for more comprehensive control
• View application usage and threat detection information to quarantine users and devices
• Gain insights into asset information for increased visibility, as well as search and location capabilities for any user and device on the network
• Ensure mobile device compliance for more accurate policy enforcement decisions at the network layer

With Extreme Networks OneFabric Connect, organizations can integrate a variety of systems and applications, using either predefined integrations that allow programmatic control of VM, MDM, CMDB, analytics, web filtering, and firewall systems, or by simply and easily adding customer-defined integrations via existing APIs.
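To make the idea of a customer-defined integration more concrete, the sketch below shows the general shape of a small script that pushes an end-system event from an external IT system into the management platform over a web-service API. It is a minimal illustration only: the host name, endpoint path, payload fields, and credentials are hypothetical placeholders rather than the documented OneFabric Connect interface, so the actual calls should be taken from the OneFabric Connect API documentation.

    # Hypothetical sketch of a customer-defined integration: an external system
    # notifies the management platform that a device has joined the network so
    # that the appropriate policy can be applied. The URL, path, payload, and
    # credentials below are illustrative placeholders only, not the documented
    # OneFabric Connect API.
    import requests

    NETSIGHT_HOST = "https://netsight.example.com"   # hypothetical server
    API_PATH = "/integration/v1/end-system-event"    # hypothetical endpoint

    def notify_end_system(mac, ip, location, user="", role="Guest"):
        """Send one end-system event to the management platform."""
        payload = {
            "macAddress": mac,
            "ipAddress": ip,
            "location": location,
            "userName": user,
            "role": role,
        }
        response = requests.post(
            NETSIGHT_HOST + API_PATH,
            json=payload,
            auth=("api-user", "api-password"),  # placeholder credentials
            timeout=10,
        )
        response.raise_for_status()
        return response.json()

    if __name__ == "__main__":
        notify_end_system("00:11:22:33:44:55", "10.1.20.15", "DC1 rack 12")

In practice the same pattern is used in the other direction as well: the management platform calls out to MDM, CMDB, or firewall systems through their own APIs, keeping the network, security, and IT operations views synchronized.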
Pre-defined integrations and Technology Solution Integration Partners include:

CONVERGENCE: Microsoft Lync, Polycom CMA, Avaya Easy Management
DATACENTER AND CLOUD: VMware vSphere (vCenter and/or ESX), VMware View, Microsoft Hyper-V, Microsoft SCVMM, Citrix XenServer with XenCenter, Citrix XenDesktop
MANAGEMENT AND IT OPERATIONS: FNT Command, Microsoft SCCM, CA ITSM
MOBILITY: AirWatch, MobileIron, JAMF Casper, Fiberlink MaaS360
SECURITY: Palo Alto, iBoss, Client IF-MAP, Lightspeed Systems, McAfee ePO

Extreme Networks Data Center Manager (DCM), part of OneFabric Control Center, provides IT administrators with a transparent, cross-functional service provisioning and orchestration tool that bridges the divide between the server, networking, and storage teams and provides a single integrated view of virtual server and network environments. By unifying and automating physical and virtual network provisioning, Data Center Manager enables networks to deliver the high availability required for mission-critical application and data performance.

DCM delivers numerous benefits to IT teams, including the ability to:
• Automate physical and virtual switching environments to streamline data center network provisioning
• Create consistent configurations throughout the network fabric for predictable behaviors and simplified troubleshooting
• Increase coordination and improve workflow between network, server, and storage teams within IT
• Gain granular visibility into traffic flows and real-time and historical data to simplify incorporation of VMs into the network, improve visibility and control, and enable simplified auditing of the network via policy-based management
• Unify management through an easily extensible architecture that supports a variety of hypervisor technologies and vSwitches, including VMware, Citrix, and Microsoft

Specific DCM Integrations
• VMware vSphere (vCenter and/or ESX)
• VMware View
• Microsoft Windows Server 2008 R2 with Hyper-V support
• Microsoft SCVMM 2008 R2
• Citrix XenServer with XenCenter
• Citrix XenDesktop

For more information on these and other pre-defined integrations and Technology Solution Integration Partners, please go to: http://www.extremenetworks.com/partners/tsp/

2.4 ORCHESTRATION
Customers use a myriad of data center orchestration solutions that need to integrate seamlessly into the rest of the ecosystem, without vendor lock-in. They want to rapidly automate service delivery and application provisioning, and to simplify data center operations by managing the infrastructure elements together. Data center customers want a best-of-breed, multi-vendor environment, and vendors that embrace integration with other vendors are the most appealing.

There are proprietary orchestration solutions, like VMware's vCenter Orchestrator, and open-source cloud computing solutions, like OpenStack. The solutions geared towards cloud data centers can be modular and address cloud ecosystem requirements for things like provisioning network, compute, and storage, as well as centralized directory, billing, templates, etc.

OpenStack presents features in an abstract view across many physical devices, and some of these features require dynamic reconfiguration of the devices involved. In addition, OpenStack may use multiple network configurations at the same time. This dynamic nature poses one of the greater challenges when it is connected to a physical network.
OpenStack provides an internal, virtual node-to-node network and can also provide physical break-out points into the LAN, along with tenant separation within the internal virtual network. These different networks typically overlap and interweave with each other dynamically, yet they have to be established across static, physical network equipment. Extreme Networks recognizes these challenges and offers solutions that enable automation and dynamic configuration of network equipment through various means of configuration. Extreme advocates open, standards-based solutions like OpenStack, which can be flexibly supported through Extreme's open APIs in OneFabric Connect or through our open, standards-based controller, OneController. OpenStack leverages OneController for overlay and underlay management. Thus, Extreme ensures that business applications developed on these open platforms will be deployable with Extreme.

2.5 DEVOPS
To manage the large numbers of data center servers and VMs that typically run identical applications and services, the DevOps community uses tools like Puppet, Chef, Salt, and Ansible. These tools provide a programmatic way to perform configuration tasks. Although traditionally under the compute admin domain, these tools are useful for management of network infrastructure as well, so under the same umbrella these administrative tools can manage compute, network, and even storage. They can maintain switches and verify their configuration by making them check in with a centralized server; to the tool, the switch looks like just another device. Extreme Networks readily supports the DevOps tools, which are based on open source code with vendor-specific interfaces.

2.6 CLI SCRIPTING
Extreme Networks platforms support CLI scripting that can be used to generate automated sequences, embedded as part of an overall automated workflow in the data center. To streamline deployment and administration of the data center network, data center IT administrators can leverage ExtremeXOS automated switch management capabilities. The CLI-based scripting, with Tcl and Python support, allows users to significantly automate switch management through support of variables and functions that users customize for handling special events.

ExtremeXOS has a flexible framework that can enable selected trigger events, directly tied to the Event Monitoring System (EMS), to activate dynamic profiles, such as when a user or device connects to a switch port. These profiles contain script commands that cause dynamic changes to the switch configuration, and they can be used for general manageability of the network or to enforce policies. For example, scripts can be triggered based on movement of virtual machines or MAC addresses and can then adapt a port's configuration to match that of a VM. Another example is a script triggered when a storage array is detected: the script can configure the switch and network for storage traffic, including enabling jumbo frame support, assigning the storage traffic to a certain traffic class, and setting up bandwidth guarantees for that traffic class.

Caption: Data Center Automation and Customization
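The sketch below illustrates what such an event-triggered script might look like for the storage example above, written as an on-switch Python script in the style ExtremeXOS supports. It is a sketch only: it assumes the on-switch Python environment exposes the exsh.clicmd() helper for issuing CLI commands, the port, VLAN name, and QoS profile are hypothetical, and exact command syntax varies by ExtremeXOS release.

    # Illustrative sketch only: prepare a switch port for storage traffic
    # (jumbo frames, storage VLAN, reserved QoS profile) when a storage array
    # is detected on that port. Names and numbers are hypothetical examples.
    import exsh  # on-switch ExtremeXOS Python module (assumed available)

    STORAGE_PORT = "1:10"        # hypothetical port where the array was detected
    STORAGE_VLAN = "iscsi"       # hypothetical storage VLAN name
    STORAGE_QOSPROFILE = "QP3"   # hypothetical QoS profile reserved for storage

    def provision_storage_port(port, vlan, qosprofile):
        """Apply a storage-oriented configuration to a single port."""
        commands = [
            # Allow jumbo frames for iSCSI/NFS efficiency
            "enable jumbo-frame ports {p}".format(p=port),
            # Place the port in the storage VLAN
            "configure vlan {v} add ports {p} tagged".format(v=vlan, p=port),
            # Steer the storage VLAN's traffic into the reserved QoS profile
            "configure vlan {v} qosprofile {q}".format(v=vlan, q=qosprofile),
        ]
        for cmd in commands:
            exsh.clicmd(cmd)

    if __name__ == "__main__":
        provision_storage_port(STORAGE_PORT, STORAGE_VLAN, STORAGE_QOSPROFILE)

Because the script is tied to a trigger event rather than run by hand, the same logic is applied identically every time a storage array appears, which is exactly the kind of repeatable, low-touch operation the DevOps and CLI-scripting approaches above are meant to enable.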
3 Network Abstraction
Data centers also have challenges with sub-optimal traffic flows and workloads that fail to meet application requirements for low latency or resource isolation. Elastic resourcing and the dynamic nature of traffic flows also make traffic patterns unpredictable. To optimize the network infrastructure so that the underlay performs optimally, minimizes delay, and provides flexible movement of workloads, there needs to be intelligent route selection (i.e., traffic engineering) and intelligent VM placement.

Abstraction is key to achieving agility, manageability, and elasticity in the data center. Abstraction of the network removes it as a bottleneck, enables VM-to-VM reachability regardless of location, and provides the ability to rapidly react to business application needs. Networks can be abstracted into network overlays with a corresponding network underlay. Overlay technologies like VXLAN enable VMs to communicate with one another while maintaining isolation. Ultimately, they allow data centers to meet larger scalability requirements for logical network domains and reduce the time and cost of deploying new services through network virtualization. Underlay technologies are the traditional network pieces that comprise the network fabric. Underlay solutions must be aware of the network elements, perform intelligent traffic engineering and service insertion, provide tenant-based QoS, manage malicious behavior, etc. Overlays cannot be agnostic of the underlays; there should be feedback mechanisms between the overlay and underlay for optimized performance. Networks are often underutilized as they are, and abstraction helps use the resources better. In addition, the network abstraction, overlays, and underlays must be managed by a centralized system that has a view of the entire domain.

Analogous to the role of the hypervisor on the server, which abstracts compute resources and carves them up for the host VMs, the network needs a controller that abstracts the network fabric resources and carves them up for tenant VMs. This means the network needs a centralized controller that will enable new services; Extreme Networks OneController is described in the next section.

3.1 ONECONTROLLER
Data centers need a single platform to tie together network management, network access control, network optimization, and advanced application analytics. The single platform can also tie together a heterogeneous, brownfield data center that has deployed multiple vendors, white box and black box, plus enable emerging Network Function Virtualization (NFV) solutions. A single platform promotes community-led innovation on top of that platform when it is standards-based and comprehensive. When it can be deployed ready to integrate with existing and multi-vendor hardware and software network environments, it preserves customer investments and avoids vendor lock-in.

Extreme Networks' OneController is based on a hardened OpenDaylight (ODL) controller, preserving the integrity of the open API provided by ODL while extending data center orchestration, automation, and provisioning to the entire network under a single pane of glass. OneController has multiple APIs for interfacing with the infrastructure elements and can provide overlay and underlay functionality. The architecture is highly available and redundant, and provides scalability: horizontal scalability to support additional devices, and vertical scalability by being lightweight. For redundancy, if multiple OneControllers are deployed in Active/Active or Active/Standby mode, a network management system can provision and manage the multiple instances and perform life-cycle management.
The software development kit (SDK) and developer community will enable customers to evolve the network to keep pace with emerging security, wireless, and converged SDN infrastructure. The result is a simpler development platform for the data center.

3.2 NETWORK ACCESS CONTROL
Data center security needs to be ingrained in every device and also abstracted out to manage security from a holistic network perspective. Data center IT administrators need to ensure that only the right users have access to the right information from the right place at the right time, taking into account time of day, location, authentication types, device and OS type, and end-system and user groups. They need to perform multi-user, multi-method authentication, vulnerability assessment, and assisted remediation, and to choose whether or not to restrict access for guests/contractors to public Internet services only, and how to handle authenticated internal users/devices that do not pass the security posture assessment. Businesses need the flexibility to balance user productivity and security.

Extreme's solution is a centrally deployed and managed appliance called Network Access Control (NAC). NAC has unique capabilities to take action on the data center. It has visibility into end hosts and the network elements and can react dynamically to changes. For example, when a vMotion moves a VM from one rack to another, NetSight can authenticate the VM and provision the VLAN and policy associated with that VM on the new top-of-rack switch and remove them from the old top-of-rack switch. This VM tracking capability is a result of integration between NetSight, NAC, and the network elements; DCM on top of that allows for automation and orchestration with the hypervisor elements.

NAC rules to authenticate VMs and provision VLANs and policies associated with the VM

3.3 ANALYTICS-AS-A-SERVICE
Applications hosted in the data center are critical to customers and can be business-impacting. They vary from commercial off-the-shelf applications to customized applications to unique homegrown applications, and they can be extremely complex and demand a certain SLA level. As a result, data center IT administrators need visibility into these applications and how they are being used, and they need to be able to analyze the application impact on the data center infrastructure and vice versa, without deploying new or intrusive infrastructure elements to do this analysis.

From a business perspective, an in-depth view into real-time and historical network and application behavior also provides valuable information for up-front budget planning when implementing new applications for the business, while also ensuring security compliance for approved applications. This saves both time and money for the business when the critical applications are running at the best possible performance. From a data center IT administrator perspective, visibility into network and application performance allows IT to pinpoint and resolve performance issues in the infrastructure, whether they are caused by the network, the application, or the server. It is useful to get total application visibility of inter-VM traffic within the data center, even between VMs on the same server. By eliminating unnecessary application delay, users become more efficient and can focus on the important aspects of their jobs. Analytics go beyond just application visibility.
Analytics-as-a-Service is a more encompassing concept: it provides pervasive and actionable visibility, coalesces data from multiple sources, and then serves that data so it can be consumed by other applications such as security, performance, and multi-tenancy. Analytics-as-a-Service is also an enabler for intelligent VM placement via topology awareness. Intelligent VM placement goes beyond evaluating server workload and resource availability (CPU, memory) and beyond placing services to co-exist with users within the same geographical data center. Services need to be placed within the same rack, or even the same server, as the clients to meet application SLAs that are becoming more and more stringent. Analytics are also becoming a more critical component of Big Data environments.

Extreme's Purview platform delivers a network-powered application analytics and optimization solution that captures and analyzes context-based application traffic to deliver meaningful intelligence about applications, users, locations, and devices. Purview uses deep packet inspection (DPI) technology with a rich set of application fingerprinting techniques to detect internally hosted applications (SAP, SOA traffic, Exchange, SQL, etc.), public cloud applications (Salesforce, Google, email, YouTube, P2P, file sharing, etc.), and social media applications (Facebook, Twitter, etc.) at Layer 7 of the OSI model, enabling guarantees of a quality user experience for business-critical applications.

Purview includes over 14,000 application fingerprints, and new fingerprints are continually added. Application fingerprints are XML files that are developed by Extreme, or they can be developed by users themselves to provide visibility into custom applications that may be used by an organization. Application detection does not stop with signature-based fingerprints, though. To detect applications that try to obscure themselves (like P2P and others), Purview also includes heuristics-based (behavioral detection) fingerprints to ensure the applications are detected appropriately. Through its robust fingerprinting technology, Purview is able to identify an application regardless of whether it runs on well-known ports or uses non-standard ports.

Purview is enabled by the CoreFlow2 ASIC in Extreme S-Series and/or K-Series switches, which identifies new flows and sends a few packets from every new flow to the Purview engine. Application fingerprinting takes place in the Purview engine and is then combined with non-sampled NetFlow data collected from the CoreFlow2-powered switch for the duration of the flow, which allows the Purview engine to process traffic at unprecedented scale with no performance degradation to the network itself. The Purview engine determines the application and extracts application context information such as URL, certificate information, browser version, device hardware, and OS. It measures response times, aggregates the data, adds additional context derived from identity and access control (optional) such as user, role, device type and identity, and location, and then sends it to NetSight for storage. OneFabric Control Center, which is part of NetSight, provides complete application management and reporting through dashboards and detailed reporting for Purview. For example, when Purview is deployed with Extreme, information such as user, role, device type, and location is integrated with the application flows.
Further, taking advantage of the OneFabric Connect API and SDN architecture allows simple integration with other IT applications, such as analytics or Big Data processing engines, via XML/SOAP, as well as real-time notifications using syslog. Purview can also be integrated with technologies that provide VM-to-VM traffic for VMs residing on the same hypervisor. For example, Purview integrates with Ixia's Phantom vTap to extend application visibility from physical to virtual networking across the entire data center. Administrators can mirror traffic from the VMs, sending traffic of interest to Purview, and then they have a complete view of the data center for total visibility, security, and control.

4 Network Infrastructure
The network fabric provides interconnectivity between servers, storage, security devices, and the rest of the IT infrastructure. This section describes the requirements of that data center network. The hardware and software should perform well and address these requirements regardless of whether the data center is a small deployment or a high-scale deployment, considering typical data center modeling parameters like the number of physical servers, virtual machines, VLANs, racks, tenants, etc.

4.1 HIGH AVAILABILITY
High Availability (HA) is crucial to data center networks. Data center failure costs include both lost revenue and business credibility. System availability is simply calculated as "system uptime" divided by "total time":

Availability = MTBF / (MTBF + MTTR)

where MTBF is Mean Time Between Failures and MTTR is Mean Time To Repair.

AVAILABILITY     DOWNTIME PER YEAR
99%              3 days 15 hours 36 minutes
99.5%            1 day 19 hours 48 minutes
99.9%            8 hours 46 minutes
99.95%           4 hours 23 minutes
99.99%           53 minutes
99.999%          5 minutes
99.9999%         30 seconds

The table above shows availability percentages and the corresponding downtime per year. Typically, network architects expect to see 4 or 5 "nines" of system availability. Each additional "9" can raise deployment costs significantly.
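As a quick illustration of the arithmetic behind the "nines", the short sketch below converts an availability percentage into the allowed downtime per year (using a 365-day year); the results round to the figures shown in the table above.

    # Convert an availability percentage into the allowed downtime per year.
    SECONDS_PER_YEAR = 365 * 24 * 3600  # non-leap year

    def downtime_per_year(availability_percent):
        """Return (days, hours, minutes, seconds) of allowed downtime per year."""
        downtime = SECONDS_PER_YEAR * (1 - availability_percent / 100.0)
        days, rem = divmod(int(downtime), 86400)
        hours, rem = divmod(rem, 3600)
        minutes, seconds = divmod(rem, 60)
        return days, hours, minutes, seconds

    for pct in (99.0, 99.5, 99.9, 99.95, 99.99, 99.999, 99.9999):
        d, h, m, s = downtime_per_year(pct)
        print("{:>8}% -> {}d {}h {}m {}s".format(pct, d, h, m, s))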
To achieve a data center with near zero downtime, data center IT administrators need to consider both system/application resiliency and network resiliency. For connectivity itself, there are two aspects to consider.

System-level resiliency: increasing availability by using reliable and robust hardware and software designed specifically for HA, and minimizing the MTTR by using resilient hardware. One must also consider data center site redundancy through warm standby or hot standby, and Disaster Recovery (DR) scenarios. In warm standby, the primary data center is active and provides services while a secondary data center is in standby. The advantage of warm standby is simplicity of design, configuration, and maintenance. The disadvantages are no load sharing between the two sites, which leads to under-utilization of resources; the difficulty of verifying that failover to the secondary site is fully functional when it is not exercised during normal operation; and an unacceptable delay in the event that a manual cutover is required. In hot standby, both the primary and secondary data centers provide services in a load-sharing manner, optimizing resource utilization. The disadvantage of this scenario is that it is significantly more complex, requiring the active management of two active data centers and the implementation of bi-directional data mirroring, or synchronous replication, which results in additional overhead and more bandwidth between the two sites.

4.2 MULTIPATH
Extreme Networks connectivity solutions provide the ability to compress the traditional 3-tier network into a physical 2-tier network by virtualizing the routing and switching functions into a single tier (the middle tier). Virtualized routing provides greater resiliency and fewer switches dedicated to just connecting switches. Reducing the number of uplinks (switch hops) in the data center improves application performance as it reduces latency throughout the fabric. The aggregation and core are merged into a single layer by virtualizing the router function in the data center LAN switch.

Two-tier Data Center Design

Switches are typically deployed in pairs with redundant links interconnecting them for resiliency. While this satisfies the desired high availability, it does introduce the possibility of loops within the environment. In an effort to avoid these loops, traditional Layer 2 loop prevention protocols like Spanning Tree Protocol (STP) were developed. However, STP has many limitations, such as inefficient utilization of links and high convergence times. Modern network fabric designs steer away from STP. Depending on the size of the deployment and other requirements, customers can consider several options as described below.

4.2.1 SPINE-LEAF WITH DEVICE-LEVEL AGGREGATION
One can address both the performance and the resiliency requirements of small to medium virtualized data centers by extending the link-level redundancy capabilities of link aggregation and adding support for device-level redundancy. This can be accomplished by allowing one end of the link-aggregated port group to be dual-homed into two different devices to provide device-level redundancy.

With device-level aggregation, the aggregated devices present themselves as a single entity, and the remote device uses regular link aggregation. The devices leverage a Layer 2 meshed network fabric for interconnectivity. The upstream switches work together to create the perception of a common link-aggregated group so that the downstream switch doesn't see anything different from a link aggregation perspective, even though the link-aggregated ports are now distributed across multiple switches. This enables all links to be utilized; no links are blocked as they would be in STP.

The design below shows device-level aggregation used at the data center LAN access layer providing connectivity for both applications and IP storage (iSCSI or NFS attached).

STP versus MLAG/VSB
• With STP, spanning tree is configured per VLAN, so links are put into a blocking state based on the STP algorithm; effective bandwidth for the switch is only 40G.
• With MLAG, all links are active; MLAG uses special blocking logic that prevents Layer 2 loops, so the effective bandwidth is 160G.

This two-tier leaf-spine architecture provides high performance and resiliency in an easily deployed architecture, by extending the link-level redundancy capabilities of link aggregation and adding support for device-level redundancy. With an active-active model, it can load share for full utilization of network bandwidth. It also delivers fast failover convergence.
The spine is composed of high-performance, high-port-density switches. The spine switches can be modular and support any combination of interface modules. They will have connections to:
• An upstream WAN device or another data center's spine switch
• The peer spine switch
• Every leaf switch, for a full mesh
• Storage devices (if not at the leaf layer)
• Firewalls

The leaf is composed of highly resilient switches. It provides intra-rack connectivity, functioning as the top-of-rack switch. The leaf switches will have connections to:
• The upstream spine switches
• The peer leaf switch via the ISC
• Storage devices (if not at the spine layer)

The device-level redundancy on the BDX8 and Summit X670 is provided via the Multi-Switch Link Aggregation (MLAG) feature, and on the S-Series and 7100 series it is provided via the Virtual Switch Bonding (VSB) feature.

The Extreme Networks MLAG feature allows devices to see a pair of physical switches as a single logical switch. A device can connect via link aggregation to two MLAG switches, and to the connecting device they look like a single switch. This functionality provides redundancy at any layer, whether at the server access layer or at the aggregation layer. It dynamically provisions trunked server connectivity using IEEE 802.1AX/802.3ad link aggregation protocols. Dynamic trunk provisioning can lower OPEX overhead in comparison to static server NIC teaming. In virtualized configurations, assigning virtual hosts to an aggregated link provides better application performance and reduces the need for hypervisor network configuration.

MLAG peers have an Inter-Switch Connection (ISC) with a dedicated control VLAN that is used exclusively for inter-MLAG-peer control traffic and should not be provisioned to carry any user data traffic. Data traffic, however, can traverse the ISC port using other user-defined data VLANs.

This diagram shows the MLAG configuration from the leaf layer down to the server.

This diagram shows the MLAG configuration between the spine and leaf layers (SPINE01 LAG 1:1 + SPINE02 LAG 1:1 = MLAG ID 1; LEAF01 LAG 49 + LEAF02 LAG 49 = MLAG ID 1).

Extreme also supports MLAG switches with one or two MLAG peers. The design in this document focuses on any given switch having just one MLAG peer, but it is possible for one switch to have two MLAG peers, as in a linear daisy chain of ISCs. Customers can split the downlink hosts or switches between the peers such that if one of the switches has a failure, only a subset of hosts or switches would lose half their bandwidth; the remainder, connected to the other two MLAG peers, would not be impacted. All of the basic MLAG functionality and traffic forwarding rules apply to one or two MLAG peers.
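To make the MLAG and ISC concepts concrete, the sketch below builds the ExtremeXOS-style commands that establish an MLAG peering between two leaf switches over an ISC VLAN and then bind a server-facing port to an MLAG ID. It is an illustrative sketch only: the VLAN name, IP addresses, port numbers, and MLAG ID are hypothetical, and exact syntax should be verified against the ExtremeXOS release in use.

    # Illustrative sketch: build the ExtremeXOS-style commands that establish an
    # MLAG peering between two leaf switches over an ISC VLAN, then bind a
    # server-facing port to an MLAG ID. All names, ports, and addresses are
    # hypothetical examples; verify syntax against your ExtremeXOS release.

    def mlag_peer_commands(peer_name, isc_vlan, local_ip, peer_ip, isc_port):
        """Commands run on one MLAG peer to reach the other over the ISC."""
        return [
            "create vlan {v}".format(v=isc_vlan),
            "configure vlan {v} add ports {p} tagged".format(v=isc_vlan, p=isc_port),
            "configure vlan {v} ipaddress {ip}/30".format(v=isc_vlan, ip=local_ip),
            "create mlag peer {n}".format(n=peer_name),
            "configure mlag peer {n} ipaddress {ip}".format(n=peer_name, ip=peer_ip),
        ]

    def mlag_port_commands(peer_name, server_port, mlag_id):
        """Attach a server-facing port (or LAG master port) to an MLAG ID."""
        return ["enable mlag port {p} peer {n} id {i}".format(
            p=server_port, n=peer_name, i=mlag_id)]

    # Leaf01's view of the peering (Leaf02 would mirror this with the
    # addresses swapped and the same MLAG ID on its own server-facing port).
    for cmd in mlag_peer_commands("leaf02", "isc", "10.0.0.1", "10.0.0.2", "49") + \
               mlag_port_commands("leaf02", "1", 10):
        print(cmd)

The key design point the sketch reflects is that the MLAG ID, not the physical port, is what the two peers share: as long as both peers use the same ID for the links toward a given downstream device, that device sees a single aggregated link.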
4.2.1.2 VIRTUAL SWITCH BONDING
Extreme Networks Virtual Switch Bonding (VSB) is a feature similar to MLAG in that other switches see the bonded switches as one, but VSB additionally enables the chassis to be managed as a single entity. Instead of managing two devices each with N ports, VSB allows administrators to manage one device with 2xN ports. All features are seamlessly available across both VSB switches: it is a single-router, single-switch architecture with a single configuration, and all features are distributed. The VSB switches may be connected via dedicated hardware ports.

S-Series VSB allows two chassis to be fully virtualized to form a single entity via dedicated hardware ports or normal 10G ports. The S-Series, depending on the model, can use either multiple ordinary 10G ports or multiple dedicated VSB ports to form the high-speed link between chassis. 7100-Series virtual switch bonding allows up to eight switches to form a single entity.

4.2.2 SPB
For larger data centers with more servers, customers can deploy IEEE Shortest Path Bridging (SPB), a standards-based protocol. SPB is plug-and-play, leveraging the IS-IS link-state protocol to build a global view of the switch topology and to control the Layer 2 data plane. SPB builds shortest path trees from each node to every other node within the domain. These unique shortest path trees ensure efficient usage of available links within the mesh by always using the shortest path between any two nodes in the domain. Where multiple equal-cost paths exist, the protocol provides Equal Cost Multipath (ECMP) algorithms to further distribute the load and efficiently utilize equal-path links through the network.

Fully meshed data center designs leveraging SPB provide load sharing through the efficient use of multiple paths through the network. They improve the resiliency of the network because they:
• Have the ability to use all available physical connectivity
• Enable fast restoration of connectivity after failure
• Restrict failures so only directly affected traffic is impacted during restoration; all surrounding traffic continues unaffected
• Enable rapid restoration of broadcast and multicast connectivity simultaneously

Shortest Path Bridging (SPB) IEEE 802.1aq was developed as an evolution of the various Spanning Tree protocols. SPB's IEEE 802.1 heritage ensures full interoperability with existing RSTP/MSTP topologies; in fact, SPB leverages the spanning tree state machine for controlling forwarding on a per-shortest-path-tree basis. Shortest Path Bridging comes in two versions:
• SPBV, using 802.1Q VLAN translation for data plane forwarding
• SPBM, using 802.1ah MAC-in-MAC encapsulation for data plane forwarding

SPB can be used in conjunction with Fabric Routing and/or Routing-as-a-Service to push routing to the edge and to optimize east/west and north/south traffic flows.

4.3 REDUNDANCY

4.3.1 LAG AND LACP
The link aggregation (LAG) feature allows customers to increase bandwidth and availability by using a group of ports to share the traffic load between parallel links to the same peer device. It provides redundancy through multiple connections, which should be load sharing (active-active). This applies both to connectivity between switches and to connectivity between a switch and other devices such as hypervisors. As the consolidation of servers increases, so does the need for resiliency.

ExtremeXOS software supports dynamic load sharing, which includes the Link Aggregation Control Protocol (LACP) and Health Check Link Aggregation. The Link Aggregation Control Protocol is used to dynamically determine if link aggregation is possible and then to automatically configure the aggregation. LACP is part of the IEEE 802.3ad standard and allows the switch to dynamically reconfigure the link aggregation groups (LAGs). The LAG is enabled only when LACP detects that the remote device is also using LACP and is able to join the LAG.
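As a minimal sketch of this point, the snippet below builds the ExtremeXOS-style command that creates an LACP-enabled LAG with an L3/L4 hashing algorithm (load-sharing algorithms are discussed in the next subsection). The port numbers are hypothetical, and the syntax should be confirmed for the ExtremeXOS release in use.

    # Illustrative sketch: ExtremeXOS-style command for an LACP-based LAG from
    # a leaf switch toward a server or peer switch. Port numbers and the
    # hashing algorithm are hypothetical examples.

    def lacp_lag_command(master_port, member_ports, algorithm="address-based L3_L4"):
        """Build the 'enable sharing' command that creates an LACP LAG."""
        return "enable sharing {m} grouping {g} algorithm {a} lacp".format(
            m=master_port, g=member_ports, a=algorithm)

    # A two-port LAG on ports 1:1-1:2, hashed on L3/L4 fields for good flow spread.
    print(lacp_lag_command("1:1", "1:1-1:2"))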
Health Check Link Aggregation is used to create a link aggregation group that monitors a particular TCP/IP address and TCP port. Static load sharing is also supported but is susceptible to configuration error.

4.3.1.1 LOAD SHARING ALGORITHMS
Depending on the traffic characteristics within the fabric, the appropriate load-sharing algorithm should be selected to ensure proper distribution of traffic across the fabric. These load-sharing algorithms need to be explicitly configured by the customer based on the nature of the traffic patterns that exist within the infrastructure. Administrators can configure the egress-link selection algorithm to factor in different traffic components such as Layer 2 source and destination MAC addresses, Layer 3 IPv4 or IPv6 source and destination IP addresses, Layer 4 TCP or UDP source and destination port numbers, MPLS labels, etc. LACP should be used when configuring the LAGs between the spine and leaf switches and for the ISC LAG. Where supported by the virtualization platform, customers should also configure LACP on the switch edge ports and enable it on the servers.

4.3.2 VRRP

4.3.2.1 OVERVIEW
Virtual Router Redundancy Protocol (VRRP) allows multiple switches to provide redundant routing services to users. VRRP is used to eliminate the single point of failure associated with manually configuring a default gateway address on each host in a network. Without VRRP, if the configured default gateway fails, you must reconfigure each host on the network to use a different router as the default gateway. VRRP provides a redundant path for the hosts: if the default gateway fails, the backup router assumes forwarding responsibilities.

When a VRRP router instance becomes active, the master router issues a gratuitous ARP response that contains the VRRP router MAC address for each VRRP router IP address. The VRRP MAC address for a VRRP router instance is an IEEE 802 MAC address in the following hexadecimal format: 00-00-5E-00-01-<vrid>. The master also always responds to ARP requests for VRRP router IP addresses with an ARP response containing the VRRP MAC address. Hosts on the network use the VRRP router MAC address when they send traffic to the default gateway.

4.3.2.2 ACTIVE/ACTIVE VRRP
VRRP in Active/Active mode allows both switches of an MLAG pair to simultaneously act as the default gateway for the subnet.

4.3.2.3 FABRIC ROUTING
Central to all data center designs is the need for optimized traffic routing within the data center as well as between data centers. Extreme Networks leverages standards-based VRRP to provide a single virtual router gateway shared across multiple physical devices for redundancy and Layer 3 resiliency. Normally, traffic that needs to be Layer 3-routed is sent to the aggregation switches running VRRP, which results in suboptimal east/west data center traffic flows.

Traffic flows without Fabric Routing

Extreme Networks Fabric Routing is an enhancement to VRRP that optimizes the flow of east/west traffic within the data center by allowing the closest router to forward the data regardless of VRRP mastership. In a Fabric Routing-enabled domain, the traffic is routed by the first switch/router directly to the destination regardless of the VRRP state that switch is in.
This creates distributed, nearest-hop routing within the fabric that optimizes throughput, latency, and traffic flows (minimizing traffic load through the fabric). No new protocols are required for Fabric Routing; Fabric Routing-enabled devices are fully compatible with standards-based devices.

Traffic flows with Fabric Routing

4.3.2.4 VRRP TRACKING
There are certain cases wherein a VRRP master may have to relinquish its VRRP master status due to the failure of an uplink connection. This uplink connection may be the data center's link to the Internet. In order to add such intelligence to these switches, VRRP tracking should be implemented. EXOS supports three VRRP tracking modes, using ANY or ALL logic:
• VLAN tracking: track active VLANs, e.g. VLANs that go to the core
• Route table tracking: track specified routes in the routing table, e.g. the route to the core next hop
• Ping tracking: track connectivity using a simple ping to any outside responder, e.g. the IP address of the core next hop
If the tracking condition fails, then VRRP behaves as though it is locally disabled and relinquishes master status.

4.3.3 ROUTING-AS-A-SERVICE
As described above, Fabric Routing optimizes east/west data center traffic by pushing routing functionality to the edge of the network so that inter-VLAN traffic can be switched at the edge. In an SPB deployment, similar edge routing value can be achieved while eliminating Layer 3 routing protocols, including VRRP, using a new Extreme Networks feature called Routing-as-a-Service. Eliminating VRRP may be desirable since VRRP has its own drawbacks, independent of SPB: it is a chatty protocol that sends advertisements once per second, so it can be fairly resource intensive, and scaling becomes an issue if there are many interfaces that require it.

Routing-as-a-Service preserves the VRRP properties of virtual IP addressing (anycast addressing) and router redundancy without actually using VRRP, while utilizing the best-path attributes of SPB. It interoperates with any traditional switch and can be positioned in various Layer 2 configurations, including Virtual Switch Bonding (VSB) and redundant attachment to Rapid and Multiple Spanning Trees (RSTP/MSTP).

Traditional routing

Routing-as-a-Service requires only knowledge of the VLANs within the SPB domain, and traffic can take the optimal path to the destination using the MAC resolutions inherent in SPB. An SPB device that has the routing service enabled on one or more interfaces will forward traffic to any destination IP addresses that match locally connected subnets. It will respond to ARP requests using a virtual MAC derived from the interface configuration. When it cannot resolve the IP destination, it redirects the packet towards the SPB device capable of doing full routing.

Routing-as-a-Service

With SPB, it is possible to know the topology of participating devices, as SPB uses IS-IS to compute all paths to all devices. It is also possible to know the whereabouts of all hosts, i.e. the precise access devices where the hosts attach to the network. If every device has all VLANs configured, then all subnets in the domain are already locally attached and traffic can be routed directly to its destination via SPB. Hosts on different VLANs (and thus different IP subnets) within the SPB domain can communicate with one another through single-hop routing.
Routing-as-a-Service also avoids asymmetrical routing. It offers a value proposition of multiple best-path connections while retaining the essential qualities inherent in routing: segmentation and access control. This means hosts within the SPB domain can communicate in the most direct way regardless of VLAN association.

4.3.4 SERVER LOAD BALANCING
Using the unique capabilities of Extreme Networks S-Series switches, a load balancing solution can be implemented without requiring any additional hardware. Load Sharing Network Address Translation (LSNAT, as defined in RFC 2391) allows an IP address and port number to be transformed into a virtual IP address and port number (VIP) mapped to many physical devices. The Extreme Networks S-Series provides LSNAT support on a per-VRF basis, allowing multiple tenants to each utilize the virtualization and load balancing capabilities separately on the same device.

When traffic destined to the VIP is seen by the LSNAT device, the device translates it into a real IP address and port combination using a selected algorithm such as Round Robin, Weighted Round Robin, Least Load, or Fastest Response. This allows the device to choose from a group of real server addresses and replace the VIP with the selected IP address and port number. The LSNAT device then makes the appropriate changes to packet and header checksums before passing the packet along. On the return path, the device sees the source and destination pair with the real IP address and port number and knows that it needs to replace this source address and source port number with the VIP, with the appropriate checksum recalculations, before sending the packet along.

Persistence is a critical aspect of LSNAT, ensuring that all service requests from a particular client are directed to the same real server. Sticky persistence functionality provides less security but increased flexibility, allowing users to load balance all services through a virtual IP address. In addition, this functionality provides better resource utilization and thus increased performance. An essential benefit of using LSNAT is that it can be combined with routing policies. By configuring different costs for OSPF links, a second, redundant server farm can be made reachable via other metrics. In this way, load balancing is achieved in a much more cost-effective manner.

4.4 LOGICAL SEPARATION
In a virtualized environment there is often a requirement to support multiple tenants. In an effort to protect multiple tenants from each other, logical separation is established at the Layer 2 and Layer 3 levels. There are different levels of logical separation:
• VLAN
• VRF
• VR

In the given design, all the Layer 3 interfaces will be configured on the spine switches, with the leaf switches acting as pure Layer 2 transport. Thus no VR/VRF instances will be configured on the ToR switches, but all the VLANs need to be configured on both leaf and spine. Note: VRF in this context is not to be confused with Layer 3 VPN VRFs.

4.4.1 VLAN
At a basic level, the term VLAN refers to a collection of devices that communicate as if they were on the same physical LAN. LAN segments are not restricted by the hardware that physically connects them, hence the "V" for virtual. The default VLAN is untagged on all ports, and there can be a maximum of 4094 VLANs on any Extreme platform.

In order to provision customers with VLANs, it is recommended that ranges be allocated for:
• Internal infrastructure VLANs
• Management VLANs
• Tenant VLANs
• Public VLANs
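The sketch below illustrates one way such VLAN ranges could be tracked and turned into ExtremeXOS-style provisioning commands. The ranges, tenant name, and tag values are hypothetical examples rather than a recommended numbering plan, and the command syntax should be verified for the ExtremeXOS release in use.

    # Illustrative sketch: allocate tenant VLAN tags out of reserved ranges and
    # emit ExtremeXOS-style commands to create them on the leaf and spine
    # switches. Ranges and names below are hypothetical examples only.

    VLAN_RANGES = {
        "infrastructure": range(10, 100),     # internal infrastructure VLANs
        "management":     range(100, 200),    # management VLANs
        "tenant":         range(1000, 3000),  # tenant VLANs
        "public":         range(3000, 3500),  # public-facing VLANs
    }

    def next_free_tag(category, in_use):
        """Return the first unused tag in the category's reserved range."""
        for tag in VLAN_RANGES[category]:
            if tag not in in_use:
                return tag
        raise RuntimeError("VLAN range for %s exhausted" % category)

    def create_vlan_commands(name, tag):
        """EXOS-style commands to create a VLAN; port membership is added later."""
        return [
            "create vlan {n}".format(n=name),
            "configure vlan {n} tag {t}".format(n=name, t=tag),
        ]

    in_use = {1000, 1001}                  # tags already assigned to tenants
    tag = next_free_tag("tenant", in_use)  # -> 1002
    for cmd in create_vlan_commands("tenant-acme", tag):
        print(cmd)

Keeping the allocation logic in one place, outside of any individual switch, is what makes it practical to configure the same VLANs consistently on both leaf and spine, as the design above requires.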
To provision customers with VLANs, it is recommended that ranges be allocated for:
• Internal infrastructure VLANs
• Management VLANs
• Tenant VLANs
• Public VLANs

4.4.2 VRS
The ExtremeXOS software supports virtual routers (VRs). This capability allows a single physical switch to be split into multiple virtual routers, isolating traffic forwarded by one VR from traffic forwarded by another. Each virtual router maintains a separate logical forwarding table, which allows the virtual routers to have overlapping IP addressing. Because each virtual router maintains its own separate routing information, packets arriving on one virtual router are never switched to another. Ports on the switch can either be used exclusively by one virtual router or be shared among two or more virtual routers. Each VLAN can belong to only one virtual router. There are system VRs and user-defined VRs. The system VRs are used for management, and one default VR, VR-Default, is pre-allocated. User VRs can be created for tenants, and VRs support any of the switch routing protocols, including BGP, OSPF, IS-IS, RIP, etc. If a customer deploys a multi-tenant environment where only the Gold service tier uses dedicated VRs, the VR scale limit can dictate the number of Gold tenants that can be supported. Examples where a tenant may require a dedicated VR include higher bandwidth or routing needs that extend beyond static routing or BGP.

4.4.3 VRF
Virtual Routing and Forwarding instances (VRFs) are similar to VRs in that they maintain isolation. The routing tables for each VRF are separate from the tables for other VRs and VRFs, so VRFs can support overlapping address space. VRFs are created as children of user VRs or VR-Default, and each VRF supports Layer 3 routing and forwarding. VRFs can only run static routing and BGP. VRFs tend to scale better than VRs as they require fewer resources, so VRFs are preferable for tenant isolation.

4.5 QUALITY OF SERVICE (QOS)
4.5.1 OVERVIEW
The Quality of Service (QoS) concept is that the requirements of some applications and users are more critical than others, which means that some traffic should receive preferential treatment. By using QoS mechanisms, network administrators can use existing resources efficiently and ensure the required level of service without reactively expanding or over-provisioning their networks. Using QoS in a data center allows you to:
• Give some traffic groups higher priority access to network resources
• Reserve bandwidth for special traffic groups
• Restrict some traffic groups to bandwidth or data rates defined in a Service Level Agreement (SLA)
• Count frames and packets that exceed specified limits and optionally discard them (rate limiting)
• Queue or buffer frames and packets that exceed specified limits and forward them later (rate shaping)
• Modify QoS-related fields in forwarded frames and packets (remarking)
To ensure end-to-end QoS adherence, customers should predetermine the QoS requirements for the different tiers of service (not forgetting the management traffic!), and each CoS should be mapped to a certain bandwidth reservation. All switches should then be configured with consistent CoS-to-qosprofile mappings across the environment, all qosprofiles should be configured with the correct bandwidth reservations as requirements dictate, and the servers should match this configuration as well.
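As a sketch of how such a CoS-to-qosprofile plan might be captured and sanity-checked before it is pushed consistently to every switch, consider the following Python fragment. The tier names, dot1p values and bandwidth reservations are hypothetical examples only, not recommended values, and the script is not part of any Extreme tooling.

```python
# Hypothetical end-to-end QoS plan: service tier -> (802.1p value, qosprofile,
# minimum bandwidth reservation in percent). Values are examples only.
QOS_PLAN = {
    "management": {"dot1p": 7, "qosprofile": "QP8", "min_bw_pct": 10},
    "gold":       {"dot1p": 5, "qosprofile": "QP6", "min_bw_pct": 40},
    "silver":     {"dot1p": 3, "qosprofile": "QP4", "min_bw_pct": 30},
    "bronze":     {"dot1p": 0, "qosprofile": "QP1", "min_bw_pct": 20},
}

def validate_plan(plan) -> None:
    """Basic sanity checks before the same plan is applied to every switch."""
    total = sum(t["min_bw_pct"] for t in plan.values())
    if total > 100:
        raise ValueError(f"reservations add up to {total}%, exceeding line rate")
    dot1p_values = [t["dot1p"] for t in plan.values()]
    if len(dot1p_values) != len(set(dot1p_values)):
        raise ValueError("two tiers share the same 802.1p value")

if __name__ == "__main__":
    validate_plan(QOS_PLAN)
    for tier, params in QOS_PLAN.items():
        print(f"{tier}: dot1p {params['dot1p']} -> {params['qosprofile']} "
              f"({params['min_bw_pct']}% reserved)")
```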
4.5.2 CLASSIFICATION
In the given IaaS platform, traffic can be classified as:
• Infrastructure traffic
• User traffic, for example Gold, Silver, Bronze
ACL-based traffic classification provides the most control over QoS features and can be used to apply ingress and egress rate limiting. An ACL can be used to add traffic to a traffic group based on the following frame or packet components:
• MAC source or destination address
• Ethertype
• IP source or destination address
• IP protocol
• TCP flag
• TCP, UDP, or other Layer 4 protocol
• TCP or UDP port information
• IP fragmentation
Depending on the platform you are using, traffic classified into an ACL traffic group can have one of these actions:
• Assigned to an ingress meter for rate limiting
• Marked for an egress QoS profile for rate shaping
• Marked for an egress traffic queue for rate shaping
• Marked for DSCP replacement on egress
• Marked for 802.1p priority replacement on egress
• Assigned to an egress meter for rate limiting
Non-ACL-based traffic classification (CoS 802.1p or DiffServ) specifies an ingress or egress QoS profile for rate limiting and rate shaping. These groups cannot use ingress or egress software traffic queues. However, non-ACL-based traffic groups can use the packet-marking feature to change the dot1p or DiffServ values in egress frames or packets. In addition, port-based traffic classification forwards traffic to egress QoS profiles based on the incoming port number, and VLAN-based traffic classification forwards traffic to egress QoS profiles based on the VLAN membership of the ingress port.
Extreme also provides an easy-to-configure flood control mechanism which automatically classifies different types of flooded traffic (broadcast, multicast, unknown destination MAC) and then rate limits it. Traffic may be flooded when a VM or applications on the VM are misbehaving; by enforcing a maximum bandwidth on the flooded traffic, the switch can minimize the data center impact of ingress flooding traffic. Ports can be configured to accept a specified rate of flooded packets per second, and if that rate is exceeded, the port blocks traffic and drops subsequent packets until the traffic again falls below the configured rate. This can prevent degraded throughput performance and even network outages.

4.5.3 EGRESS QOS PROFILES
QoS profiles are queues that provide ingress or egress rate limiting and rate shaping. Egress QoS profiles are supported on all ExtremeXOS switches and allow you to provide dual-rate egress rate shaping for all traffic groups on all egress ports. When you are configuring ACL-based traffic groups, you can use the qosprofile action modifier to select an egress QoS profile. For DiffServ-, port-, and VLAN-based traffic groups, the traffic group configuration selects the egress QoS profile. For CoS dot1p traffic groups on all platforms, the dot1p value selects the egress QoS profile. BlackDiamond X8 series switches and Summit family switches have two default egress QoS profiles, named QP1 and QP8; by default, ingress 802.1p priority values 0-6 are serviced by QP1 and priority value 7 by QP8. Up to six additional QoS profiles (QP2 through QP7) can be configured on the switch. The default settings for egress QoS profiles are summarized in the following table.

Table 1 - Default ExtremeXOS QoS profiles

| Egress QoS Profile Name | Queue Service Priority Value | Buffer | Weight | Notes |
|---|---|---|---|---|
| QP1 | 1 (Low) | 100% | 1 | Part of the default configuration; cannot be deleted. |
| QP2 | 2 (LowHi) | 100% | 1 | You must create this QoS profile before using it. |
| QP3 | 3 (Normal) | 100% | 1 | You must create this QoS profile before using it. |
| QP4 | 4 (NormalHi) | 100% | 1 | You must create this QoS profile before using it. |
| QP5 | 5 (Medium) | 100% | 1 | You must create this QoS profile before using it. |
| QP6 | 6 (MediumHi) | 100% | 1 | You must create this QoS profile before using it. |
| QP7 | 7 (High) | 100% | 1 | You must create this QoS profile before using it. You cannot create this QoS profile on SummitStack. |
| QP8 | 8 (HighHi) | 100% | 1 | Part of the default configuration; cannot be deleted. |

4.5.4 BUFFER MANAGEMENT
Regardless of how the network oversubscription is designed, one has to be aware that storage technologies create a completely different traffic pattern on the network than a typical user or VDI (Virtual Desktop Infrastructure) session. Storage traffic typically bursts to very high bandwidth in the presence of parallelization (especially within storage clusters which serve a distributed database). New standards like parallel Network File System (pNFS) increase that level of parallelization towards the database servers. This parallelization often leads to the condition in which packets must be transmitted at exactly the same time (which is obviously not possible on a single interface); this is the definition of an "incast" problem. The switch needs to be able to buffer these micro-bursts so that none of the packets in the transaction get lost, otherwise the whole database transaction will fail. As interface speeds increase, large network packet buffers are required.
Extreme offers different types of switches depending on the data center requirements. On one hand, applications with longer-lived TCP sessions, such as storage, iSCSI, FCoE, backup applications, data replication, NFS and streaming, require larger network packet buffers. For such cases, the Extreme Networks S-Series is perfectly positioned, with a packet buffer that exceeds 2 Gigabytes per I/O module to address this problem. On the other hand, applications with shorter-lived TCP sessions or transactions, such as high-frequency trading, database transactions, character-oriented applications and many web applications, do not have such large buffer requirements. For such cases, the Extreme BDX8 and X670 leverage Smart Buffer technology, which provides a dynamic and adaptive on-chip buffer allocation scheme that is superior to static per-port allocation schemes and avoids the latency incurred by off-chip buffers. Ports have dedicated buffers and in addition can get extra buffer allocation from a shared pool as needed, providing effective management of and tolerance for microbursts. In contrast, arbitrarily large off-chip buffers can exacerbate congestion or can increase latency and jitter, which leads to less deterministic Big Data job performance, especially when chaining jobs. While the Extreme hardware maximizes burst absorption capability and addresses temporary congestion, it also maintains fairness. Since Extreme's Smart Buffer technology is adaptive in its shared buffering allocations, uncongested ports do not get starved of access to the shared buffer pool and are not throttled by congestion on other ports, while congested ports can still get more of the buffers to address the traffic burst.
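The dedicated-plus-shared buffering behavior described above can be pictured with a deliberately simplified model: each port owns a small dedicated buffer and may borrow from a shared pool while it lasts. The numbers and allocation policy below are hypothetical and do not describe the actual Smart Buffer implementation.

```python
class SharedBufferModel:
    """Toy model of dedicated-plus-shared packet buffering on a switch chip."""

    def __init__(self, ports: int, dedicated_kb: int, shared_pool_kb: int):
        self.dedicated = {p: dedicated_kb for p in range(ports)}
        self.shared_pool = shared_pool_kb

    def absorb_burst(self, port: int, burst_kb: int) -> int:
        """Buffer a burst on a port; return the amount that had to be dropped."""
        used_dedicated = min(burst_kb, self.dedicated[port])
        remaining = burst_kb - used_dedicated
        # Only the overflow of this port draws on the shared pool, and only
        # while the pool has space, so uncongested ports are unaffected.
        used_shared = min(remaining, self.shared_pool)
        self.shared_pool -= used_shared
        return remaining - used_shared  # dropped once dedicated and shared are exhausted

if __name__ == "__main__":
    chip = SharedBufferModel(ports=48, dedicated_kb=64, shared_pool_kb=4096)
    print("dropped:", chip.absorb_burst(port=7, burst_kb=1500))   # microburst absorbed via the shared pool
    print("dropped:", chip.absorb_burst(port=7, burst_kb=8000))   # pool partly drained, some loss
```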
4.5.5 OVERSUBSCRIPTION
The acceptable oversubscription in a data center network is highly dependent on the applications in use and is radically different from that of a typical access network. Today's design of presentation/web server, application server and database server "layers", combined with the new dynamics introduced through virtualization, makes it hard to predict traffic patterns and load between given systems in the data center network. Servers that use a hypervisor to virtualize applications achieve higher utilization, so the resulting average demand on the interfaces belonging to these systems will be higher than on a typical server. Also, if virtual desktops are deployed, one has to carefully engineer the oversubscription and the quality of service architecture at the LAN access as well. Typically 0.5 to 1 Mbit/s per client must be reserved, without considering future streaming requirements.
Challenges with oversubscription include:
• Potential for congestion collapse
• Slow application performance
• Potential loss of control plane traffic
In general, oversubscription is calculated as the ratio of downstream-facing bandwidth (or interface count) to the uplink bandwidth facing the data center core. In the case of MLAG and VSB, all the links between switches are active and allow traffic to flow through. In the case of the X670, the oversubscription ratio at the leaf switch is 3:1 (480G down and 160G up). In the case of a single 40G link failure between a leaf switch and a spine switch, the oversubscription ratio at the edge switch changes to 4:1. If traffic utilization between the leaf and spine switch is high, the 4:1 ratio could cause serious congestion and packet drops even in a single link failure scenario. So if it is necessary to maintain the desired oversubscription ratio in the event of a single link failure, additional interfaces may be required in the design.

4.5.6 DATA CENTER BRIDGING
Data Center Bridging (DCB) is used to enhance LANs to support I/O convergence in the data center, so that Ethernet LAN traffic and Fibre Channel (FC) storage area network (SAN) traffic can be transported on the same Ethernet-based network infrastructure. Standard Ethernet does not support lossless transport, but DCB extends Ethernet to provide the level of CoS necessary to transport FC frames encapsulated in Ethernet over an Ethernet network. Essentially, DCB enables different treatment of traffic based on a set of priorities. The benefits of a converged (bridged) data center network include:
• Simpler management, with only one fabric to deploy, maintain and upgrade.
• Fewer failure points where networks connect.
• Lower costs, because fewer cables, switches and other pieces of equipment are needed and less power is consumed.
Extreme's data center solutions use the following specifications from the IEEE 802.1 DCB Task Group:
• Priority-based Flow Control (PFC) - Provides a link-level, flow-control mechanism that can be independently controlled for each priority to ensure zero loss due to converged-network congestion.
• Enhanced Transmission Selection (ETS) - Provides a common management framework for bandwidth assignment to traffic classes.
• Data Center Bridging Exchange Protocol (DCBX) - A discovery and capability exchange protocol used to convey capabilities and configurations of the other DCB features between neighbors to ensure consistent configuration across the network.
iSCSI can accomplish the same thing via TCP. However, DCB has aspects that make an iSCSI environment more reliable and customizable: it can improve performance and make that performance more consistent. In a traditional IP network any lost frames need to be retransmitted. Removing the potential for loss means that no retransmissions need to occur, and fewer retransmissions mean a gain in performance. While retransmissions are rare in well-designed, traditional Ethernet deployments, DCB comes close to removing them completely. The second capability of DCB that is important to iSCSI implementations is the allocation of bandwidth on specific links to specific functions.
iSCSI over DCB ideally needs an end-to-end DCB-aware connection to take full advantage of DCB's lossless nature and bandwidth allocation capabilities, but it can be added only where it is needed. If the traffic from the storage to the switch is the issue, then a DCB switch and DCB-aware storage are all that is needed. Alternatively, if the servers need DCB, then a DCB-aware card and switch are all that is needed.

4.6 ELASTICITY
A data center model that is truly elastic focuses on agility and modularity with simplified operations. Elasticity means that all the layers of the data center respond rapidly to new resource demands, adding and removing resources based on customer needs, with the focus on the whole network and end-to-end provisioning, not just on a single switch. Elasticity is more than just an automation challenge; it is about how synchronously the data center reacts to end-customer business applications.

4.6.1 VM TRACKING
Data traffic from VM to VM traverses the network as tagged traffic to maintain VLAN and tenant isolation. The VLANs need to be configured in the network fabric, associated with the appropriate edge ports on the leaf switches, and matched to the hypervisor configuration. Extreme Networks switches support multi-user, multi-method authentication on every port, which is absolutely essential when you have virtual machines as well as devices such as IP phones, computers, printers, copiers, security cameras and badge readers connected to the data center network. These multiple devices (or virtual machines) can connect to the same port, and each device can have an independent policy configuration associated with it.
With a manual workflow, every time an administrator creates a new port group for VMs and allocates a certain VLAN tag on the hypervisor, the administrator has to manually configure the corresponding VLAN on the leaf node, tag the downlink ports to the server and the uplink ports to the spine, and create the VLAN on the spine and tag the appropriate ports. And when the port group is deleted, or the VM vMotions to another host, the administrator needs to repeat these tasks manually. Fortunately, Extreme Networks provides capabilities that turn this traditionally manual workflow into a dynamic workflow. Extreme Networks Data Center Manager (DCM) integrates with the hypervisor (e.g. VMware) to learn VM MAC addresses and then feeds this into the switch's dynamic VLAN feature set, which is called "VM Tracking" on the BDX8 and X670, and "MAC Authentication" on the S-Series and 7100. These feature combinations allow automation and orchestration through the hypervisor elements and dynamic VLAN and policy assignment on the network elements.
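The dynamic workflow can be sketched as follows. This is a conceptual model of the sequence only (a VM appears and its VLAN is provisioned on the port; the VM leaves and the VLAN is pruned); the MAC-to-VLAN mapping and the function names are hypothetical and are not the DCM, NAC or VM Tracking APIs.

```python
# Hypothetical mapping learned from the hypervisor manager: VM MAC -> VLAN tag.
VM_VLAN_MAP = {
    "00:50:56:aa:bb:01": 110,   # tenant A port group
    "00:50:56:aa:bb:02": 120,   # tenant B port group
}

# State of dynamically provisioned VLANs per leaf-switch port.
port_vlans: dict[str, set[int]] = {}

def vm_detected(port: str, mac: str) -> None:
    """A VM MAC is seen on an access port: add its VLAN to that port."""
    vlan = VM_VLAN_MAP.get(mac)
    if vlan is None:
        print(f"{mac}: unknown VM, no VLAN provisioned")
        return
    port_vlans.setdefault(port, set()).add(vlan)
    print(f"{mac} on {port}: VLAN {vlan} added dynamically")

def vm_removed(port: str, mac: str) -> None:
    """The VM shut down or moved away: prune its VLAN from the old port."""
    vlan = VM_VLAN_MAP.get(mac)
    if vlan is not None and vlan in port_vlans.get(port, set()):
        port_vlans[port].discard(vlan)
        print(f"{mac} left {port}: VLAN {vlan} pruned to preserve bandwidth")

if __name__ == "__main__":
    vm_detected("leaf1:port5", "00:50:56:aa:bb:01")
    vm_removed("leaf1:port5", "00:50:56:aa:bb:01")   # e.g. after a vMotion to another host
```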
When a virtual machine is detected on a port, the ExtremeXOS VM Tracking feature uses NAC, or optionally a local policy, to determine the VR configuration and the VLAN configuration for the VM, and it dynamically configures the VLAN on the access ports along with the related policies. If a virtual machine shuts down or is moved, its VLAN is pruned to preserve bandwidth. This feature creates an elastic infrastructure in which the network responds dynamically to changes in the virtual machine network.
This works with VMs configured to send tagged or untagged traffic. For untagged traffic, the switch authenticates against the MAC address. Where the VM sends tagged traffic, the VLAN tag of the received frame is also used to determine the VLAN classification for the VM's traffic. If VLAN configuration exists for the VM and it conflicts with the actual tag present in received traffic, the VM Tracking feature reports an EMS message and does not trigger VLAN creation or port addition. However, if no configuration is present for the VM, the VM Tracking feature assumes that there are no restrictions on classifying traffic for the VM to the received VLAN. The uplink ports can have either static VLAN configuration or VLANs configured dynamically as needed.

4.6.2 AUTO-CONFIGURATION
Extreme Networks provides a flexible and simple switch configuration solution which allows organizations to quickly build networks or replace faulty switches for business continuity. The Extreme Networks Auto Configuration feature is aimed at plug-and-play deployment. The ability to drop-ship Extreme switches to the customer premises helps reduce or eliminate the operational expenditure (OPEX) and costs involved in staging and any initial switch configuration. It also reduces the costs incurred for configuration customization, with the ability to classify switches according to function, hardware type, or location. Standards-based classification (using DHCP) helps administrators create flexible and easy-to-manage configurations. Extreme Networks Auto Configuration provides:
• Simple configuration which is easily enabled or disabled, and the ability to drop-ship Extreme switches to customer premises with the feature enabled in advance by channel partners, system integrators or Value-Added Resellers (VARs).
• A standards-based solution making the most effective use of protocols such as DHCP and TFTP. DHCP is used for dynamic configuration of network parameters; TFTP is used for the configuration download.
• The ability to download a configuration file in the standard configuration format (.cfg), as well as script files (.xsf).
• Operation with the existing DHCP and TFTP infrastructure in the network, with minimal customization.
• The ability to classify switches based on hardware/platform type, which gives greater deployment flexibility.

4.7 SECURITY
4.7.1 IDENTITY MANAGEMENT AND VDI
To maintain controlled access to the data center, data center IT administrators need to learn more about users and devices as they connect to and disconnect from switches, and take appropriate action based on their level of authorization. Administrators need to collect captured information, query LDAP servers for additional information, and then enable appropriate policies for traffic filtering and metering, generate EMS messages, and whitelist or blacklist identities.
Extreme has a rich identity management (IDM) platform that works seamlessly with NAC to manage an identity database and respond to all identity event triggers. IDM works with a variety of software components such as LLDP, Kerberos, NetLogin, FDB and IP Security.
Extreme's IDM platform also serves as the foundation for Virtual Desktop Infrastructure (VDI). With VDI, a user's desktop is hosted in the data center as a virtual machine. This means that traditional network access control, as well as the Role Based Access Control (RBAC) that was tied to the user's identity at the campus network edge, now needs to move into the data center server network edge where the user's desktop is hosted. Extreme Networks' Identity Management solution, which transparently detects a user's identity based on the user's Kerberos authentication exchange, can be used in the data center at the server network edge for this purpose. When the VDI connection broker assigns a VM to a user, the user's Kerberos authentication request passes through the network access switch in the data center. The Extreme Networks Identity Management solution detects the user's identity based on passive Kerberos snooping and provisions the network port with the right privileges for the user's desktop, which is now a virtual machine. Based on the user's identity, the appropriate role for the user can be configured and enforced dynamically, directly at the VM level on the network access port. When a virtual desktop VM moves, the VM Tracking capabilities can detect the VM movement and inform the Identity Management solution so that the user's role can be enforced at the target server where the virtual desktop VM has moved.

4.7.2 TRAFFIC MANAGEMENT WITH THE OS
An operating system designed from the ground up for the data center, ExtremeXOS builds a traffic monitoring capability called CLEAR-Flow into the network itself. CLEAR-Flow represents a new paradigm for network traffic management. For the first time, CLEAR-Flow brings together network monitoring, analysis, and response in a single process inside the Ethernet switching fabric. This creates a powerful toolbox for solving diverse network challenges that were previously difficult or impossible to solve, such as threat detection in high-speed networks.
CLEAR-Flow is a broad framework for implementing security, monitoring, and anomaly detection in ExtremeXOS software. CLEAR-Flow allows data center IT administrators to specify certain types of traffic that deserve more attention. Once certain criteria for this traffic are met, the switch can then either take an immediate, pre-determined action or send a copy of the traffic for off-switch analysis. This analysis can, in turn, result in the appropriate response to the particular traffic, for example blocking a DoS attack or rate-limiting a user in violation of his service level agreement.

Figure: CLEAR-Flow Processing Architecture

4.7.3 PROTECT THE NETWORK ELEMENTS
Security in the data center has to happen at all levels. The network infrastructure layer needs to ensure that only authorized users are accessing the data center, and it needs to protect its network elements from deliberate attacks or unintentional vulnerabilities that can cause data center outages or bring down critical business applications.
4.7.3.1 DOS PROTECTION
Intentional or unintentional traffic loads may overwhelm CPU processes on the switches, causing the switch to be too busy to service other functions, and switch performance will suffer. Even with very fast CPUs, there will always be ways to overwhelm the CPU with packets that require costly processing. DoS Protection is designed to help prevent degraded switch performance by attempting to characterize the problem and filter out the offending traffic so that other functions can continue. When a flood of CPU-bound packets reaches the switch, DoS Protection counts these packets. When the packet count nears the alert threshold, packet headers are saved. If the threshold is reached, these headers are analyzed and a hardware access control list (ACL) is created to limit the flow of these packets to the CPU. This ACL remains in place to provide relief to the CPU until it expires and the threat goes away.

4.7.3.2 GRATUITOUS ARP PROTECTION
When a host sends an ARP request to resolve its own IP address, it is called a gratuitous ARP. While there are some valid times when users would issue a gratuitous ARP, data center IT administrators may not be able to prevent malicious users from mounting a man-in-the-middle attack using gratuitous ARP. To protect against this type of attack, the switch can enable Gratuitous ARP protection: in response to receiving an unexpected gratuitous ARP, it sends out its own gratuitous ARP request to override the attacker.

4.7.3.3 IP DUPLICATE ADDRESS DETECTION
IP address management in the data center can be challenging depending on the approach used to manage the IP addresses. IP address conflicts, where the same IP address is configured on more than one machine, will cause interruption to services, so data center IT administrators need to detect and manage these to resolve any address conflicts as quickly as possible. Extreme's Duplicate Address Detection (DAD) feature checks networks attached to a switch to see whether IP addresses configured on the switch are already in use on an attached network.

4.7.3.4 DHCP SNOOPING AND ARP PROTECTION
Data center IT administrators may want to strictly manage and allocate client IP addresses and prevent duplicate IP addresses from interrupting network operation. By using DHCP snooping and DHCP secured ARP, the switch will not build its ARP table through the normal ARP learning process of tracking ARP requests and replies. Instead, the switch builds its ARP table from manually configured ARP entries or from the secure ARP entries created by DHCP assignments or reassignments.

4.8 TOR AND EOR DESIGNS
Top of Rack (ToR) designs are often deployed in data centers today. Their modular design makes staging and deployment of racks easy to incorporate with equipment life-cycle management. Cabling is also often perceived to be easier compared to an End of Row (EoR) design, especially when a large number of Gigabit Ethernet-attached servers are deployed.
But ToR also has some disadvantages, such as:
• ToR can introduce additional scalability concerns, specifically congestion over uplinks and shallow packet buffers, which may prevent predictable Class of Service (CoS) behavior. In an EoR scenario, additional capacity can typically be added with new line cards in a modular chassis.
• Upgrades in technology (e.g.
1G to 10G, or 40G uplinks) often result in the complete replacement of a typical 1 Rack Unit (RU) ToR switch.
• The number of servers in a rack varies over time, thus varying the number of switch ports that must be provided. Unused CAPEX sitting in the server racks is not efficient.
• The number of unused ports (aggregated) will be higher than in an End of Row (EoR) scenario. This can also result in higher power consumption and greater cooling requirements compared to an EoR scenario.
These caveats may result in an overall higher Total Cost of Ownership (TCO) for a ToR deployment compared to an EoR deployment. Additionally, cabling, cooling, rack space, power and services costs must also be carefully evaluated when choosing an architecture. Lastly, a ToR design results in a higher oversubscription ratio towards the core and potentially a higher degree of congestion. A fabric-wide quality of service (QoS) deployment (with the emerging adoption of DCB) cannot fully address this concern today.

Figure: Top of Rack design

Another data center topology option is an End of Row chassis-based switch for server connectivity. This design places chassis-based switches at the end of a row or in the middle of a row to allow all the servers in a rack row to connect back to the switches. Compared to a ToR design, the servers can be placed anywhere in the racks, so hot areas due to high server concentration can be avoided. The usage of the EoR equipment is also optimized compared to a ToR deployment, with rack space, power consumption, cooling and CAPEX decreased as well. The number of switches that must be managed is reduced, with the added advantages of a highly available and scalable design. Chassis switches also typically provide more features and scale in an EoR scenario compared to the smaller platforms typical of ToR designs. On the other hand, cabling can become more complex as the density in the EoR rack increases.

4.9 DATA CENTER INTERCONNECT (DCI)
4.9.1 OVERVIEW
The evolving traffic patterns of clusters, servers and storage virtualization solutions are demanding new redundancy schemes. These schemes provide the transport technology used for inter-data center connectivity and cover the geographical distances between data centers. They are critical as the network design evolves to provide ever higher levels of stability, resiliency and performance.
DCI solutions must provide for:
• Cloud bursting: create an elastic private cloud infrastructure that allows for optimized application delivery based on current and varying business demands
• Disaster recovery and business continuity: effective and automated recovery from a catastrophic failure, with no manual intervention, to ensure business continuity
• Workload and data mobility: a private cloud infrastructure that is optimized at runtime for resource utilization, application performance and product cost requires a borderless, single compute, network and storage infrastructure pool that can be dynamically allocated
The transport technology of choice between data centers depends upon several requirements:
• Synchronous or asynchronous data replication
• Jitter and delay acceptance for virtualized applications and their storage
• Jitter and delay acceptance for cluster solutions
• Available bandwidth per traffic class
• Layer 2 or Layer 3 interconnect

4.9.2 LOAD BALANCING REQUIREMENT
An important issue when operating a load-balanced service across data centers, and within a data center, is how to handle information that must be kept across the multiple requests in a user's session. If this information is stored locally on one back-end server, then subsequent requests going to different back-end servers would not be able to find it. This might be cached information that can be recomputed, in which case load balancing a request to a different back-end server just introduces a performance issue.
One solution to the session data issue is to send all requests in a user session consistently to the same back-end server. This is known as "persistence" or "stickiness". A downside to this technique is its lack of automatic failover: if a back-end server goes down, its per-session information becomes inaccessible, and sessions depending upon it are lost. So a seamless failover cannot be guaranteed. In most cases dedicated hardware load balancers are required.
The discussion about load balancing and persistence has a great impact on node separation. The figure below shows a typical situation for cluster node separation across two redundant data centers. In this example, the node separation of different cluster types, with shared-nothing and shared-database designs, is shown. In many cases, the same subnet is used across both of the data centers and is then route-summarized. The "cluster" subnet will be advertised as an external route using "redistribute connected" and by filtering all subnets except the cluster subnet. While redistributing, the primary data center will be preferred over the remote data center by a lower path cost, until such time as the primary data center disappears completely.

Figure: Cluster node separation across two data centers

The clients placed within the public campus network access the data center services across redundant switches. In active/standby redundant data center designs, the same subnet is used across both of the data centers, and the primary data center is specified with a lower path cost. The redundant switches are grouped together in one VRRP group, and the same VRRP IP and MAC are used in both locations to allow common gateway redundancy and seamless mobility. In this scenario, the primary data center will be preferred over the remote data center by a lower path cost, and the switch for the active data center will have the higher VRRP priority.
However, this might cause problems in the event of failover, when the traffic must be re-routed from the primary data center to the backup data center. This is especially true when traffic traverses stateful firewalls, where one has to make sure that traffic in both directions passes through the same firewall system. Techniques for VRRP interface or next-hop tracking can make sure that this is covered appropriately. To provide database access across both data centers at any time, connectivity between access switches and storage systems must be duplicated. Replication of databases must be achieved through Layer 2 techniques, such as VPLS, GRE, SPB, or 802.1Q and RSTP/MSTP along with 802.3ad Link Aggregation, or possibly through switch clustering/bonding techniques. In all cases one will face a huge demand for bandwidth and performance that can be quite expensive for WAN links and must be properly sized. But the benefit will be improved data center availability, and data center users will be able to load balance across the data centers.

4.9.3 TYPES OF DCI
Typically there can be Layer 3 IP interconnects between data centers, and many data centers may need just that. However, some data center services need a Layer 2 interconnect to stretch a subnet across multiple data centers, such as VM mobility, some data storage replication, server clustering, or other high availability and disaster recovery requirements.
The simplest method to connect multiple data centers together is to extend the common VLAN(s) across the backbone, extending the Layer 2 domain. This solution is a viable option for many smaller deployments, but the impact of this extension on larger networks should be considered; a more scalable method for Layer 2 data center interconnect is desirable. SPB can extend the Layer 2 domain between data centers. An alternative is to extend a Layer 2 tunnel between the data center sites to allow Layer 2 traffic to be transported transparently across the Layer 3 infrastructure, with the added benefit of not extending the size of the spanning tree domain. Extreme switches leverage standard IP/GRE tunneling or VPLS to interconnect the data centers. In these scenarios, the multiple data center sites see each other as part of a common Layer 2 domain. Devices or virtual machines can easily be moved between data centers in a hot or cold manner. The networks can leverage Extreme Networks functionality including:
• Fabric Routing, which optimizes routing for east/west traffic by providing distributed routing in SPB- and VSB-based designs
• IP Host Mobility, which optimizes routing for north/south traffic, allowing IP host mobility by distributing a specific host route into the respective routing protocols to allow efficient, symmetric traffic flow
After moving to the new location, the VM will be reachable via its new location as a result of the VM host route advertisement by the local fabric router in the new data center location. Fabric routing and host routing optimize the flow of traffic into and between data centers by providing direct access to and from each data center symmetrically. This traffic optimization limits the amount of traffic that traverses the interconnect links to traffic that needs to go between data centers, providing the added benefit of conserving bandwidth on potentially expensive data center interconnect links.
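A small worked example shows why advertising a specific host route keeps a moved VM reachable along the optimal path: in a longest-prefix-match lookup, the /32 host route advertised from the new data center wins over the subnet route anchored in the old one. The prefixes and next hops below are illustrative only.

```python
import ipaddress

# Hypothetical routes seen by an upstream router: the subnet is anchored in
# data center 1, while a /32 host route for a moved VM is advertised from
# data center 2 by the local fabric router.
ROUTES = [
    (ipaddress.ip_network("10.20.30.0/24"), "DC1 border"),
    (ipaddress.ip_network("10.20.30.45/32"), "DC2 border"),  # VM moved to DC2
]

def lookup(dst_ip: str) -> str:
    """Longest-prefix match: the most specific matching route wins."""
    dst = ipaddress.ip_address(dst_ip)
    matches = [(net, nh) for net, nh in ROUTES if dst in net]
    net, next_hop = max(matches, key=lambda m: m[0].prefixlen)
    return f"{dst} -> {next_hop} (via {net})"

if __name__ == "__main__":
    print(lookup("10.20.30.45"))   # moved VM: the /32 host route steers traffic to DC2
    print(lookup("10.20.30.10"))   # other hosts in the subnet still go to DC1
```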
For redundant data center designs where the same subnet is used across the primary and backup data center, a standard routed Layer 3 interconnect may be suitable. This environment is suitable where traffic does not need direct Layer 2 connectivity between the respective data centers, such as when physical movement of server connectivity to a new data center is desired.

4.10 MANAGEMENT
4.10.1 DEVICE DISCOVERY
Data center IT administrators need to verify physical connectivity to ensure proper physical wiring between devices. Through the Extreme Discovery Protocol (EDP), which is enabled by default, they can validate that the ports on the local Extreme switch are connected to the expected ports on the remote Extreme switch, looking at device names, port numbers, neighbor IDs, and the number of VLANs those ports have been added to. The centralized data center management platform can also discover devices and provide topology information, seamlessly integrated for the whole data center. The NetSight Discovery feature can automatically discover new switches in the data center, and the NetSight Topology Map then provides an easy way to visualize the data center. It is an automatically generated visual representation of network connectivity. Topology Maps provide network administrators with in-depth graphical views of device groupings, device links, VLANs, and Spanning Tree status.

4.10.2 OUT OF BAND NETWORK
Out of Band (OOB) network management is an integral portion of the network infrastructure. A best practice for the OOB management network is that it should rely on a network infrastructure that is completely isolated from, and independent of, the core data network. The advantages of separating the data and management planes are:
• The ability to reach the infrastructure in the event of loss of the data infrastructure.
• The ability for administrators to troubleshoot and resuscitate the infrastructure in the event of a complete network outage.
• A dedicated path for management traffic which, while not bandwidth intensive, is critical in providing key services to the infrastructure.
As an alternative, it is possible to have a management network that is inline with the existing infrastructure. In this case, precautions must be taken so that access to the switches is still available even when the core network infrastructure is unavailable. Using a separate console network, for example, can achieve this. Apart from having a separate infrastructure to support the management of the infrastructure, the following network services should be provisioned to support the IaaS platform:
• SNMP
• NTP
• Syslog
• Authentication
• Network Management
If switch management redundancy is a concern, then two management switches should be deployed per rack and can be configured as MLAG peers with VRRP. If there is only a single management switch, VRRP is not required in the management network. The hypervisor service consoles can have one VLAN and the leaf/spine switches another VLAN, to maintain separation. The IP address ranges used for these management VLANs should be completely different from the data VLANs. All default gateways should be directed towards the Layer 3 IP address on these VLANs, and the switch will route between the VLANs on the management infrastructure.
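Because the management address ranges should be completely separate from the data VLAN ranges, an addressing plan can be checked for overlaps before deployment with a short script such as the sketch below. The prefixes are hypothetical examples.

```python
import ipaddress
from itertools import combinations

# Hypothetical addressing plan: name -> prefix.
ADDRESS_PLAN = {
    "oob-management":      "172.16.0.0/22",
    "hypervisor-consoles": "172.16.4.0/24",
    "leaf-spine-mgmt":     "172.16.5.0/24",
    "tenant-data":         "10.0.0.0/16",
}

def find_overlaps(plan: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of named prefixes in the plan that overlap."""
    nets = {name: ipaddress.ip_network(prefix) for name, prefix in plan.items()}
    return [(a, b) for (a, na), (b, nb) in combinations(nets.items(), 2)
            if na.overlaps(nb)]

if __name__ == "__main__":
    overlaps = find_overlaps(ADDRESS_PLAN)
    print("overlapping ranges:", overlaps) if overlaps else print("no overlaps found")
```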
5 Data Center Infrastructure Elements
The integration of the infrastructure elements is key to building a synchronous data center ecosystem. There are many infrastructure elements, and because integrating them can be so challenging, many companies provide pure system integration services. Reference architectures that embody multiple elements to demonstrate interoperability are useful; for example, the VSPEX integrated reference architecture from EMC provides a modular reference architecture that is proven with best-in-class technologies, including Extreme Networks. A wide variety of infrastructure elements are needed in the ecosystem, and a plethora of vendors are available. With so many players, it is crucial to innovate and push technology boundaries synchronously. Extreme has many technology solution partners that help enrich the ecosystem of third-party infrastructure elements. Please see http://www.extremenetworks.com/partners/tsp/ for more details on Extreme's Technology Solution Partner program.
The third-party infrastructure elements, what they are and how they fit into the whole data center ecosystem, are also changing rapidly. Traditional centralized models for compute, storage and services are disaggregating, and physical resourcing is evolving into virtualized resourcing. What embodies a data center "rack" is also being rearchitected: with the Rack Scale Architecture (RSA), applications are no longer constrained to resources in a single server. The application drives allocation of compute, memory, and I/O from pools of resources that are aggregated in the rack itself. Similarly, at the application layer, current models of custom hardware appliances for different services are being replaced in favor of Network Functions Virtualization (NFV), which will change the way firewalls, intrusion detection devices, load balancers, and other virtualized network functions (VNFs) are deployed in the data center. NFV can address the challenges of legacy data centers, and VNFs can be linked through "service chaining", enabling new services to be applied more quickly via the orchestration mechanisms.

5.1 SERVER VIRTUALIZATION
5.1.1 OVERVIEW
Virtualization has introduced the ability to create dynamic data centers, with the added benefit of "green IT." Server virtualization can provide better reliability and higher availability in the event of hardware failure. Server virtualization also allows higher utilization of hardware resources while improving administration by providing a single management interface for all virtual servers.
The server virtualization layer transforms the physical resources of a server by virtualizing the CPU, RAM, hard disk, and network controller. This transformation creates fully functional virtual machines that run isolated and encapsulated operating systems and applications just like physical computers. Different vendors, such as VMware, Citrix, Microsoft, and others, provide centralized management platforms, e.g. vSphere, Xen, and Hyper-V respectively, that give administrators a single interface for all aspects of monitoring, managing, and maintaining the virtual infrastructure, and which can be accessed from multiple devices.
The high-availability features of these hypervisors, such as VMware's vMotion, Citrix's XenMotion and Microsoft's Live Migration, enable seamless migration of virtual machines and stored files from one server to another, or from one data storage area to another, with minimal or no performance impact. Coupled with other hypervisor features for intelligent resource scheduling, the virtual machines have access to the appropriate resources at any point in time through load balancing of compute and storage resources.
The servers should be located on an out-of-band management network that has a completely separate network path from the data path. This ensures that, if the data path is lost, administrators can still access the infrastructure via the management path and bring it back as quickly as possible.

5.1.2 SWITCHING FUNCTIONS IN HYPERVISORS
In a virtualized world where virtual machines share common resources, traffic arriving on one physical adapter of a physical server will be destined to different VMs, so the physical adapters need to be virtualized for the VMs. The software-based approach is to have a switching component reside in the hypervisor on the server. This virtual switch (i.e. vSwitch) is often proprietary to the hypervisor, and development is also underway on an open, standards-based vSwitch. For performance enhancement, some hardware manufacturers are developing technologies that provide hardware acceleration for software-based switches, for example Intel's DPDK technology for packet framework optimizations. There is also I/O virtualization, i.e. SR-IOV virtual functions, where the adapter presents multiple logical adapters to the hypervisor.
Deployed with the hypervisors, distributed switches function as a single switch across all associated hosts. This enables data center administrators to set network configurations that span all member hosts, and allows virtual machines to maintain a consistent network configuration as they migrate across multiple hosts, maintaining their uplink properties.

5.2 STORAGE
Storage is a critical piece of a virtualized infrastructure. Data centers are moving towards converged infrastructures that will result in fewer adapters, cables, and nodes, and ultimately in more efficient network operations. This is driving more requirements around storage being supported by the Ethernet fabric. The BDX8 or S-Series modular switches deliver an easy and effective way to optimize communications through automatic discovery, classification, and prioritization of SANs.
Storage requirements vary by server type; application servers require much less storage than database servers. There are several storage options: Direct Attached Storage (DAS), Network Attached Storage (NAS), or Storage Area Network (SAN). In the past, Fibre Channel (FC) offered better reliability and performance but needed highly skilled SAN administrators. Dynamic data centers, leveraging server virtualization with Fibre Channel-attached storage, will require the introduction of a new standard, Fibre Channel over Ethernet (FCoE). FCoE requires LAN switch upgrades due to the nature of the underlying requirements, as well as the Data Center Bridging Ethernet standards. FCoE is also non-routable, so it may cause issues when it comes to the implementation of disaster recovery or large geographical redundancy that Layer 2 connectivity cannot yet achieve.
On the other hand, iSCSI, which is a standard Ethernet block-based storage protocol that allows SCSI commands to be encapsulated in TCP/IP, gives servers access to storage devices over common Ethernet IP networks. It provides support for faster speeds and improved reliability, making it more attractive. iSCSI offers increased flexibility and a more cost-effective solution by leveraging existing network components (NICs, switches, etc.); in addition, Fibre Channel switches typically cost 50% more than Ethernet switches. Overall, iSCSI is easier to manage than Fibre Channel, considering most IT personnel's familiarity with managing IP networks.
Storage solutions will also continue to evolve with Software Defined Storage (SDS). Traditionally the storage management software resides on the storage controller; with SDS the software is decoupled from the hardware and moves off to a server. This decoupling enables more efficient resource management and offers more flexibility in hardware selection, perhaps even commodity storage hardware, and easier data movement between multiple storage vendors. SDS also provides the ability to integrate storage more smoothly into data center orchestration solutions and analytics tools that need access to large pools of data.

5.3 FIREWALLS
Data centers need protection against DoS attacks and other threats that target the users in the data center, and firewalls mitigate threats by inspecting data traffic and following user-created rules to take action on the traffic. This firewall is responsible for protecting the entire data center against any malicious attacks from the Internet or even from within itself. It is also responsible for controlling the traffic flow that originates from the tenants and heads towards the Internet. It is imperative to couple this type of firewall with a larger security infrastructure to protect the data center against intelligent attacks like DDoS.
The aggregation layer is the best-suited enforcement point for firewall security, or for additional services like VPN and Intrusion Prevention System (IPS). All servers can access these services with short but predictable latency and bandwidth in an equal fashion. High-performance, intelligent Layer 4-7 application switches provide always-on, highly scalable and secure business-critical applications, or can be part of that layer itself.
Network architects typically configure service modules and appliances to be in transparent (pass-through) mode, since these modules need to be able to be removed without requiring a reconfiguration of the entire system. When these modules are put in-line (all traffic passes through them), module throughput must be calculated so that the service modules do not introduce significant congestion into the system. One must avoid adding additional points of oversubscription whenever possible. For example, while traffic from clients to servers must pass through an IPS, traffic between servers may not need to. In addition to raw bandwidth, the number of concurrent sessions and the rate of connections per second that a security device supports can introduce additional performance issues. The number of concurrent sessions or connections per second can be estimated from the total number of servers and end users. While there is no general rule for this calculation, vendors will typically supply a recommendation based upon the use model and configuration.
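As noted, there is no general rule, but the sizing exercise usually looks something like the sketch below, where the per-server and per-user session figures and the new-connection ratio are purely hypothetical placeholders that would normally come from the firewall vendor's recommendation.

```python
def estimate_firewall_load(servers: int, users: int,
                           sessions_per_server: int, sessions_per_user: int,
                           new_conn_ratio: float = 0.10) -> dict:
    """Rough, illustrative estimate of concurrent sessions and connection rate.

    new_conn_ratio is the assumed fraction of concurrent sessions that are
    set up again every second; all inputs are assumptions, not vendor data.
    """
    concurrent = servers * sessions_per_server + users * sessions_per_user
    return {
        "concurrent_sessions": concurrent,
        "connections_per_second": int(concurrent * new_conn_ratio),
    }

if __name__ == "__main__":
    # Hypothetical environment: 400 servers and 5,000 end users.
    print(estimate_firewall_load(servers=400, users=5000,
                                 sessions_per_server=500, sessions_per_user=20))
```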
In a typical data center deployment, the firewall access layer consists of a cluster of multiple firewalls. These firewalls have the ability to exchange traffic flow state between them, thereby allowing for an active/active access topology. Such a topology allows for a redundant access network and also provides the resources required to serve the entire data center with bandwidth for Internet access. The firewall physical infrastructure can be connected to the data center spine via links on high-speed interfaces off the BDX8 or S-Series. The VMs can then use the spine switch as the default gateway and let the switch handle the routing. For traffic destined within the data center, the spine switch routes the traffic back down to the right leaf switches. For traffic destined outside the data center, the spine switch may have a default route to transmit the traffic to the firewall for processing.
Some hypervisors also provide virtual firewall services that place firewall filters on the virtual network adapters, providing stateful inspection of data traffic and allowing or preventing transmission based on user-defined rules. This happens transparently to the network elements. With the evolution towards NFV, much of the firewalling is becoming more virtualized and distributed, because it is less expensive than doing it in hardware and enables service chaining.

5.4 SERVICE CHAINING
Service chaining is an integral part of an automated data center, as it manages how the services provided in the data center (switching, routing, firewall, load balancing, IDS/IPS, DLP, antivirus, antispam, QoS, etc.) are delivered to the tenants of the data center. Traditional data centers relied heavily on manual configuration to set the precedence of services and create the traffic path taking traffic from the server to the different services. Services that were traditionally provided by dedicated hardware, and for the whole network, are being virtualized and applied on a per-tenant basis in cloud environments. Given OneController's visibility into the entire data center, the controller can rapidly deploy those new functionalities and provide service registration, service insertion, and service chaining. These evolutions will challenge the way IT administrators build their data centers but will ultimately improve their agility and ability to offer new services.
The advent of automation and orchestration tools that allow dynamic workload creation, like OpenStack, together with SDN technologies, which Extreme Networks embraces with its OneController platform, makes it possible to orchestrate the network path as another resource in the data center. The target of this concept is to provide a generalized policy object containing all the services and relations assigned to a server, customer or traffic definition. This generalized policy can contain traditional policy concepts like:
• VLAN membership
• Traffic QoS
• Simple traffic filters
Or it can relate to more elaborate concepts like:
• Firewall rules
• IDS/IDP rules
• Load balancing group membership in a LB configuration
• Traffic engineering/resiliency configurations
• Traffic mirroring
Together with the ordering of these actions, the generalized policy allows the traffic path in the network to be defined, along with how traffic is handed from one service to another, increasing the value chain in an automated manner without tedious individual configuration processes for each server.
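The generalized policy object described above can be sketched as a simple data structure. The following Python fragment is a conceptual illustration only; the field names, service types and values are hypothetical and do not reflect the actual Group Policy schema in OneController.

```python
from dataclasses import dataclass, field

@dataclass
class ServiceAction:
    """One ordered step in the service chain applied to matching traffic."""
    service: str              # e.g. "rate-limit", "firewall", "load-balancer", "mirror"
    parameters: dict = field(default_factory=dict)

@dataclass
class GeneralizedPolicy:
    """A policy object bundling simple traffic handling with chained services."""
    name: str
    vlan: int
    qos_profile: str
    actions: list[ServiceAction] = field(default_factory=list)

if __name__ == "__main__":
    gold_policy = GeneralizedPolicy(
        name="Gold_Tenant_Policy",
        vlan=100,
        qos_profile="premium",
        actions=[
            ServiceAction("rate-limit", {"tcp_port": 8080, "limit_mbps": 10}),
            ServiceAction("firewall", {"policy": "Tenant_FW_Rules"}),
            ServiceAction("mirror", {"destination": "ids-sensor"}),
        ],
    )
    # The ordered action list defines how traffic is handed from one service
    # to the next, i.e. the service chain for endpoints mapped to this policy.
    print(gold_policy)
```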
Some of the concepts expressed here are already present in the Group Policy concept in OneController; others will be developed as enhancements to the Group Policy framework in future releases of OneController. Some examples:
1. A firewall is added to the network and several firewalling rules are created. These rules can be registered as extensions to the policy framework so that a policy can be mapped to an end system, for example:
Endpoint_A MAPSTO Policy_A
with the following policy definition:
Policy_A:
• Apply bandwidth shaping of 10 Mbps to traffic on TCP port 29
• Apply premium QoS to TCP port 5006
• Set VLAN 3, egress tagged
• Apply firewall policy Firewall_A
The firewall's registration in the framework must include how the traffic is handled by the controller and how flows must be created to chain the service in such a way that traffic from Endpoint_A is forwarded through the firewall and the firewall rule is applied.
2. A VoIP recording device is added to the network, so a new service definition can include "recorder processing", indicating that all or some traffic from that device must be mirrored to the recorder for recording.
3. A guest management system is added to the network so it can register with the policy API and add the service "captive portal processing" to the list of services that a user can receive.

http://www.extremenetworks.com/contact
Phone +1-408-579-2800
©2014 Extreme Networks, Inc. All rights reserved. Extreme Networks and the Extreme Networks logo are trademarks or registered trademarks of Extreme Networks, Inc. in the United States and/or other countries. All other names are the property of their respective owners. For additional information on Extreme Networks Trademarks please see http://www.extremenetworks.com/company/legal/trademarks/. Specifications and product availability are subject to change without notice. 8916-1114
WWW.EXTREMENETWORKS.COM