Federation Enterprise Hybrid Cloud 3.5 - Concepts and Architecture Guide

ABSTRACT
This Solution Guide provides an introduction to the concepts and architectural
options available within the Federation Enterprise Hybrid Cloud solution. It
should be used as an aid to deciding on the most suitable configuration for the
initial deployment of a Federation Enterprise Hybrid Cloud solution.
February 2016
Copyright © 2016 EMC Corporation. All rights reserved. Published in the USA.
Published February 2016
EMC believes the information in this publication is accurate as of its publication date. The
information is subject to change without notice.
The information in this publication is provided as is. EMC Corporation makes no
representations or warranties of any kind with respect to the information in this publication,
and specifically disclaims implied warranties of merchantability or fitness for a particular
purpose. Use, copying, and distribution of any EMC software described in this publication
requires an applicable software license.
EMC2, EMC, Avamar, Data Domain, Data Protection Advisor, Enginuity, GeoSynchrony,
Hybrid Cloud, PowerPath/VE, RecoverPoint, SMI-S Provider, Solutions Enabler, VMAX,
Syncplicity, Unisphere, ViPR, EMC ViPR Storage Resource Management, Virtual Storage
Integrator, VNX, VPLEX, VPLEX Geo, VPLEX Metro, and the EMC logo are registered
trademarks or trademarks of EMC Corporation in the United States and other countries. All
other trademarks used herein are the property of their respective owners.
For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on
EMC.com.
Federation Enterprise Hybrid Cloud 3.5
Concepts and Architecture Guide
Solution Guide
Part Number H14719
Contents
Chapter 1
Executive Summary ............................................................. 5
Federation solutions ............................................................................................ 6
Document purpose .............................................................................................. 6
Audience ............................................................................................................ 6
Essential reading ................................................................................................ 6
Solution purpose ................................................................................................. 6
Business challenge .............................................................................................. 7
Technology solution............................................................................................. 7
Terminology ....................................................................................................... 8
We value your feedback! ...................................................................................... 9
Chapter 2
Cloud Management Platform Options .................................... 10
Overview ..........................................................................................................11
Cloud management platform components .............................................................11
Cloud management platform models ....................................................................14
Chapter 3
Network Topologies ............................................................ 21
Overview ..........................................................................................................22
Implications of virtual networking technology options .............................................22
Logical network topologies ..................................................................................24
Chapter 4
Single-Site/Single vCenter Topology..................................... 31
Overview ..........................................................................................................32
Single-site networking considerations ...................................................................32
Single-site storage considerations ........................................................................33
Recovery of cloud management platform ..............................................................36
Backup of single-site/single vCenter enterprise hybrid cloud ....................................36
Chapter 5
Dual-Site/Single vCenter Topology ....................................... 37
Overview ..........................................................................................................38
Standard dual-site/single vCenter topology ...........................................................38
Continuous availability dual-site/single vCenter topology ........................................39
Continuous availability network considerations ......................................................40
VPLEX Witness ...................................................................................................45
VPLEX topologies ...............................................................................................46
Continuous availability storage considerations .......................................................53
Recovery of cloud management platform ..............................................................56
Backup in dual-site/single vCenter enterprise hybrid cloud ......................................56
CA dual-site/single vCenter ecosystem .................................................................57
Chapter 6
Dual-Site/Dual vCenter Topology ......................................... 58
Overview ..........................................................................................................59
Standard dual-site/dual vCenter topology .............................................................59
Disaster recovery dual-site/dual vCenter topology ..................................................60
Disaster recovery network considerations .............................................................61
vCenter Site Recovery Manager considerations ......................................................69
vRealize Automation considerations ......................................................................72
Disaster recovery storage considerations ..............................................................73
Recovery of cloud management platform ..............................................................74
Best practices ....................................................................................................75
Backup in dual-site/dual vCenter topology ............................................................75
DR dual-site/dual vCenter ecosystem ...................................................................76
Chapter 7
Data Protection.................................................................. 77
Overview ..........................................................................................................78
Concepts...........................................................................................................79
Standard Avamar configuration............................................................................84
Redundant Avamar/single vCenter configuration ....................................................86
Redundant Avamar/dual vCenter configuration ......................................................90
Chapter 8
Solution Rules and Permitted Configurations ......................... 95
Overview ..........................................................................................................96
Architectural assumptions ...................................................................................96
VMware Platform Services Controller ....................................................................96
VMware vRealize tenants and business groups .......................................................98
EMC ViPR tenants and projects ............................................................................99
General storage considerations .......................................................................... 100
VMware vCenter endpoints ................................................................................ 100
Permitted topology configurations ...................................................................... 101
Permitted topology upgrade paths ...................................................................... 102
Bulk import of virtual machines ......................................................................... 103
DR dual-site/dual vCenter topology restrictions.................................................... 104
Resource sharing ............................................................................................. 106
Data protection considerations ........................................................................... 106
Software resources .......................................................................................... 106
Sizing guidance ............................................................................................... 106
Chapter 9
Conclusion ....................................................................... 107
Conclusion ...................................................................................................... 108
Chapter 10
References ....................................................................... 109
Federation documentation ................................................................................. 110
Chapter 1: Executive Summary
This chapter presents the following topics:
Federation solutions ............................................................................................ 6
Document purpose .............................................................................................. 6
Audience ............................................................................................................ 6
Essential reading ................................................................................................ 6
Solution purpose ................................................................................................. 6
Business challenge .............................................................................................. 7
Technology solution............................................................................................. 7
Terminology ....................................................................................................... 8
We value your feedback! ...................................................................................... 9
EMC II, Pivotal, RSA, VCE, Virtustream, and VMware form a unique Federation of
strategically aligned businesses that are free to execute individually or together. The
Federation businesses collaborate to research, develop, and validate superior, integrated
solutions and deliver a seamless experience to their collective customers. The Federation
provides customer solutions and choice for the software-defined enterprise and the
emerging third platform of mobile, cloud, big data, and social networking.
The Federation Enterprise Hybrid Cloud 3.5 solution is a completely virtualized data center,
fully automated by software. The solution starts with a foundation that delivers IT as a
service (ITaaS), with options for high availability, backup and recovery, and disaster
recovery (DR). It also provides a framework and foundation for add-on modules, such as
database as a service (DaaS), platform as a service (PaaS), and cloud brokering.
This Solution Guide provides an introduction to the concepts and architectural options
available within the Federation Enterprise Hybrid Cloud solution. It should be used as an aid
to deciding on the most suitable configuration for the initial deployment of a Federation
Enterprise Hybrid Cloud solution.
This Solution Guide is intended for executives, managers, architects, cloud administrators,
and technical administrators of IT environments who want to implement a hybrid cloud IaaS
platform. Readers should be familiar with the VMware® vRealize® Suite, storage
technologies, general IT functions and requirements, and how a hybrid cloud infrastructure
accommodates these technologies and requirements.
The Federation Enterprise Hybrid Cloud 3.5: Reference Architecture Guide describes the
reference architecture of a Federation Enterprise Hybrid Cloud solution. The guide introduces
the features and functionality of the solution, the solution architecture and key components,
and the validated hardware and software environments.
The following guides provide further information about various aspects of the Federation Enterprise Hybrid Cloud solution:
- Federation Enterprise Hybrid Cloud 3.5: Reference Architecture Guide
- Federation Enterprise Hybrid Cloud 3.5: Administration Guide
- Federation Enterprise Hybrid Cloud 3.5: Infrastructure and Operations Management Guide
- Federation Enterprise Hybrid Cloud 3.5: Security Management Guide
The Federation Enterprise Hybrid Cloud solution enables customers to build an enterprise-class, scalable, multitenant infrastructure that provides:
- Complete management of the infrastructure service lifecycle
- On-demand access to and control of network bandwidth, servers, storage, and security
- Provisioning, monitoring, protection, and management of the infrastructure services by line-of-business users, without IT administrator involvement
- Provisioning from application blueprints with associated infrastructure resources by line-of-business application owners, without IT administrator involvement
- Provisioning of backup, continuous availability (CA), and DR services as part of the cloud service provisioning process
- Maximum asset use
While many organizations have successfully introduced virtualization as a core technology
within their data center, the benefits of virtualization have largely been restricted to the IT
infrastructure owners. End users and business units within customer organizations have not
experienced many of the benefits of virtualization, such as increased agility, mobility, and
control.
Transforming from the traditional IT model to a cloud operating model involves overcoming the challenges of legacy infrastructure and processes, such as:
- Inefficiency and inflexibility
- Slow, reactive responses to customer requests
- Inadequate visibility into the cost of the requested infrastructure
- Limited choice of availability and protection services
The difficulty in overcoming these challenges has given rise to public cloud providers who
have built technology and business models catering to the requirements of end-user agility
and control. Many organizations are under pressure to provide similar service levels within
the secure and compliant confines of the on-premises data center. As a result, IT
departments need to create cost-effective alternatives to public cloud services, alternatives
that do not compromise enterprise features such as data protection, DR, and guaranteed
service levels.
This Federation Enterprise Hybrid Cloud solution integrates the best of EMC and VMware
products and services, and empowers IT organizations to accelerate implementation and
adoption of a hybrid cloud infrastructure, while still enabling customer choice for the
compute and networking infrastructure within the data center. The solution caters to
customers who want to preserve their investment and make better use of their existing
infrastructure and to those who want to build out new infrastructures dedicated to a hybrid
cloud.
This solution takes advantage of the strong integration between EMC technologies and the VMware vRealize Suite. The solution, developed by EMC and VMware product and services teams, includes scalable EMC storage arrays and integrated EMC and VMware monitoring and data protection suites to provide the foundation for enabling cloud services within the customer environment.
The Federation Enterprise Hybrid Cloud solution offers several key benefits to customers:
- Rapid implementation: The solution can be designed and implemented in as little as 28 days, in a validated, tested, and repeatable way. This accelerates time to value while simultaneously reducing risk.
- Supported solution: Implementing Federation Enterprise Hybrid Cloud through EMC also results in a solution that is supported by EMC, further reducing the risk associated with the ongoing operation of your hybrid cloud.
- Defined upgrade path: Customers implementing the Federation Enterprise Hybrid Cloud receive upgrade guidance based on the testing and validation completed by the Federation engineering teams. This upgrade guidance enables customers, partners, and EMC services teams to perform upgrades faster and with reduced risk.
- Validated and tested integration: Extensive testing and validation conducted by the solutions engineering teams results in simplified use, management, and operation.
The EMC Federation
EMC II, Pivotal, RSA, VCE, Virtustream, and VMware form a unique Federation of
strategically aligned businesses; each can operate individually or together. The Federation
provides customer solutions and choice for the software-defined enterprise and the
emerging “3rd platform” of mobile, cloud, big data and social, transformed by billions of
users and millions of apps.
Table 1 lists the terminology used in this guide.
Table 1. Terminology

ACL: Access control list
AIA: Authority Information Access
API: Application programming interface
Blueprint: A specification for a virtual, cloud, or physical machine that is published as a catalog item in the common service catalog
Business group: A managed object that associates users with a specific set of catalog services and infrastructure resources
CBT: Changed Block Tracking
CDP: CRL Distribution Point
CRL: Certificate Revocation List
CSR: Certificate Signing Request
DHCP: Dynamic Host Configuration Protocol
Fabric group: A collection of virtualization compute resources and cloud endpoints managed by one or more fabric administrators
FQDN: Fully qualified domain name
HSM: Hardware security module
IaaS: Infrastructure as a service
IIS: Internet Information Services
LAG: Link aggregation group; bundles multiple physical Ethernet links between two or more devices into a single logical link and, depending on the protocol used, can also aggregate available bandwidth
LDAP: Lightweight Directory Access Protocol
LDAPS: LDAP over SSL
MCCLI: Management Console Command Line Interface
PEM: Privacy Enhanced Mail
PKI: Public key infrastructure
PVLAN: Private virtual LAN
SSL: Secure Sockets Layer
TACACS: Terminal Access Controller Access Control System
vRealize Automation blueprint: A specification for a virtual, cloud, or physical machine that is published as a catalog item in the vRealize Automation service catalog
VDC: Virtual device context
vDS: Virtual distributed switch
VLAN: Virtual local area network
VMDK: Virtual machine disk
VRF: Virtual routing and forwarding
VSI: Virtual Storage Integrator
VXLAN: Virtual Extensible LAN
EMC and the authors of this document welcome your feedback on the solution and the
solution documentation. Please contact us at EMC.Solution.Feedback@emc.com with your
comments.
Authors: Ken Gould, Fiona O’Neill
Chapter 2: Cloud Management Platform Options
This chapter presents the following topics:
Overview ..........................................................................................................11
Cloud management platform components .............................................................11
Cloud management platform models ....................................................................14
The cloud management platform supports the entire management infrastructure for this solution. This management infrastructure is divided into three pods (functional areas), which consist of one or more VMware vSphere® ESXi™ clusters and/or vSphere resource pools, depending on the model deployed. Each pod performs a solution-specific function.
This chapter describes the components of the management platform and the models
available for use. After reading it, you should be able to decide on the model that suits your
environment.
Management terminology and hierarchy
To understand how the management platform is constructed, it is important to know how a
number of terms are used throughout this guide. Figure 1 shows the relationship between
platform, pod, and cluster and their relative scopes as used in the Federation Enterprise
Hybrid Cloud.
Figure 1. Cloud management terminology and hierarchy
The following distinctions exist in terms of the scope of each term:
- Platform (Cloud Management Platform): An umbrella term intended to represent the entire management environment.
- Pod (Management Pod): Each management pod is a subset of the overall management platform and represents a distinct area of functionality. What each area constitutes in terms of compute resources differs depending on the management models discussed in Cloud management platform models.
- Cluster (Technology Cluster): Used in the context of the individual technologies. While it may refer to vSphere clusters, it can also refer to VPLEX clusters, EMC RecoverPoint® clusters, and so on.
- Resource pools: Non-default resource pools are used only when two or more management pods are collapsed onto the same vSphere cluster and are used to control and guarantee resources to each affected pod.
Management platform components
Federation Enterprise Hybrid Cloud subdivides the management platform into distinct functional areas called management pods. These management pods are:
- Core Pod
- Network Edge Infrastructure (NEI) Pod
- Automation Pod
Figure 2 shows how the components of the management stack are distributed among the
management pods.
Figure 2. Cloud management platform component layout
Core Pod
The Core Pod provides the base set of resources to establish the Federation Enterprise Hybrid Cloud solution services. It consists of:
- External VMware vCenter Server™ (optional): This vCenter instance hosts only the Core Pod components and hardware. It is required when using the Distributed management model and may already exist, depending on customer resources.
- Cloud VMware vCenter Server: This vCenter instance is used to manage the NEI and Automation components and compute resources. If using a Collapsed management model or Hybrid management model, it also hosts the Core Pod components and hardware. vRealize Automation uses this vCenter Server as its endpoint, from which the appropriate vSphere clusters are reserved for use by vRealize Automation business groups.

Note: While Figure 2 depicts vCenter, Update Manager, and PSC as one cell of related components, they are deployed as separate virtual machines.

- Microsoft SQL Server: Hosts the SQL Server databases used by the Cloud vCenter Server and VMware Update Manager™. It also hosts the VMware vCenter Site Recovery Manager™ database in a DR dual-site/dual vCenter topology.

Note: Figure 2 includes separate SQL Server virtual machines for the External and Cloud vCenter SQL Server databases. This provides maximum resilience. Placing both vCenter databases on the same SQL Server virtual machine in the Core Pod is also supported. The vRealize IaaS SQL Server database must be on its own SQL Server instance in the Automation Pod.

- VMware NSX™: Used to deploy and manage the virtual networks for the management infrastructure and Workload Pods.
- EMC SMI-S Provider: Management infrastructure required by EMC ViPR.

Note: When using VCE platforms, the SMI-S Provider in the AMP cluster can be used, but it must be kept in line with the versions specified by the Federation Enterprise Hybrid Cloud Simple Support Matrix. This may require a VCE Impact Assessment (IA).
The hardware hosting the Core Pod is not enabled as a vRealize Automation compute
resource, but the virtual machines it hosts provide the critical services required to
instantiate the cloud.
All of the virtual machines in the Core Pod are deployed on non-ViPR-provisioned storage. The virtual machines can use existing SAN-connected storage or any highly available storage in the customer environment.
The Federation Enterprise Hybrid Cloud supports Fibre Channel (FC), iSCSI, and NFS storage
from EMC VNX® storage systems for the Core Pod storage. Though not mandatory, FC
connectivity is strongly recommended.
Note: For continuous availability topologies, this must be SAN-based block storage.
All storage should be RAID protected and all vSphere ESXi servers should be configured with
EMC PowerPath/VE for automatic path management and load balancing.
Network Edge Infrastructure (NEI) Pod
The NEI Pod is only required where VMware NSX is deployed, and is used to host NSX
controllers, north-south NSX Edge Services Gateway (ESG) devices, and NSX Distributed
Logical Router (DLR) control virtual machines. Use vSphere DRS rules to ensure that NSX
Controllers are separated from each other and also to ensure that primary ESGs are
separated from primary DLRs so that a host failure does not affect network availability. This
pod provides the convergence point for the physical and virtual networks.
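The DRS separation intent described above can be sanity-checked against an inventory export before or after rules are created. The sketch below is illustrative only: the host and virtual machine names are invented, and the naming conventions used to recognize controllers, ESGs, and DLR control VMs are assumptions rather than part of the solution.

# Illustrative check of NEI Pod placement against the anti-affinity intent above.
placement = {
    "nei-esxi-01": ["nsx-controller-1", "esg-tenant-a-primary"],
    "nei-esxi-02": ["nsx-controller-2", "dlr-tenant-a-primary"],
    "nei-esxi-03": ["nsx-controller-3", "esg-tenant-a-secondary"],
    "nei-esxi-04": ["dlr-tenant-a-secondary"],
}

def check_anti_affinity(placement):
    """Return a list of hosts that violate the intended separation rules."""
    violations = []
    for host, vms in placement.items():
        controllers = [vm for vm in vms if vm.startswith("nsx-controller")]
        if len(controllers) > 1:
            violations.append(f"{host}: multiple NSX Controllers {controllers}")
        has_primary_esg = any(vm.startswith("esg-") and vm.endswith("-primary") for vm in vms)
        has_primary_dlr = any(vm.startswith("dlr-") and vm.endswith("-primary") for vm in vms)
        if has_primary_esg and has_primary_dlr:
            violations.append(f"{host}: primary ESG and primary DLR share this host")
    return violations

print(check_anti_affinity(placement) or "NEI Pod placement satisfies the separation rules")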
Like the Core Pod, storage for this pod should be RAID protected and the Federation
recommends Fibre Channel connections. vSphere ESXi hosts should run EMC PowerPath®/VE
for automatic path management and load balancing.
Automation Pod
The Automation Pod hosts the remaining virtual machines used for automating and
managing the cloud infrastructure. The Automation Pod supports the services responsible for
functions such as the user portal, automated provisioning, monitoring, and metering.
The Automation Pod is managed by the Cloud vCenter Server instance; however, it is dedicated to automation and management services. Therefore, the resources from this pod are not exposed to vRealize Automation business groups.
The Automation Pod cannot share networks or storage resources with the workload clusters, and should be on a distinctly different Layer 3 network from both the Core and NEI management pods, even when using a collapsed management model. Storage provisioning for the Automation Pod follows the same guidelines as for the NEI Pod. Automation Pod networks may be VXLANs managed by NSX.
Note: While Figure 2 depicts vRealize IaaS as one cell of related components, the individual
vRealize Automation roles are deployed as separate virtual machines.
Workload Pods
Workload Pods are configured and assigned to fabric groups in VMware vRealize™
Automation. Available resources are used to host virtual machines deployed by business
groups in the Federation Enterprise Hybrid Cloud environment. All business groups can
share the available vSphere ESXi cluster resources.
EMC ViPR® service requests are initiated from the vRealize Automation catalog to provision
Workload Pod storage.
Note: Workload Pods were previously termed resource pods in Enterprise Hybrid Cloud 2.5.1
and earlier.
The Federation Enterprise Hybrid Cloud supports three management models, as shown in
Table 2.
Note: Minimum host count depends on the compute resources specification being sufficient to
support the relevant management virtual machine requirements. The Federation Enterprise
Hybrid Cloud sizing tool may recommend a higher number based on the server specification
chosen.
Table 2. Federation Enterprise Hybrid Cloud management models

| Management model | Pod | vCenter used | Cluster used | Minimum no. of hosts | Resource pool used |
|---|---|---|---|---|---|
| Distributed | Core | External | Core Cluster | 2 | N/A |
| Distributed | NEI (NSX only) | Cloud | NEI Cluster | 4 | N/A |
| Distributed | Automation | Cloud | Automation Cluster | 2 | N/A |
| Collapsed | Core | Cloud | Collapsed Cluster | 2 (w/o NSX), 4 (w/ NSX) | Core |
| Collapsed | NEI (NSX only) | Cloud | Collapsed Cluster | 2 (w/o NSX), 4 (w/ NSX) | NEI |
| Collapsed | Automation | Cloud | Collapsed Cluster | 2 (w/o NSX), 4 (w/ NSX) | Automation |
| Hybrid | Core | Cloud | Core Cluster (AMP*) | 3* | N/A |
| Hybrid | NEI (NSX only) | Cloud | NEI Cluster | 4 | N/A |
| Hybrid | Automation | Cloud | Automation Cluster | 2 | N/A |

* Based on the VCE AMP2-HAP configuration.
For ultimate resilience and ease of use during maintenance windows, sizing vSphere clusters based on N+2 may be appropriate, according to customer preference, where N is the number of hosts needed to satisfy the calculated CPU and RAM requirements of the hosted virtual machines plus host system overhead. The Federation Enterprise Hybrid Cloud sizing tool sizes vSphere clusters based on an N+1 algorithm.
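As a rough illustration of the N+1/N+2 arithmetic, the following sketch derives a host count from example capacity figures. All of the numbers are invented for illustration; this is not a substitute for the Federation Enterprise Hybrid Cloud sizing tool.

# Minimal cluster-sizing sketch: N hosts to cover demand, plus 1 or 2 for resilience.
import math

def hosts_required(vm_cpu_ghz, vm_ram_gb, host_cpu_ghz, host_ram_gb,
                   overhead_pct=0.10, redundancy=1):
    """N + redundancy hosts, where N covers the larger of the CPU or RAM demand."""
    usable_cpu = host_cpu_ghz * (1 - overhead_pct)   # per-host capacity after hypervisor overhead
    usable_ram = host_ram_gb * (1 - overhead_pct)
    n = max(math.ceil(vm_cpu_ghz / usable_cpu), math.ceil(vm_ram_gb / usable_ram))
    return n + redundancy

# Example: 120 GHz / 640 GB of management VM demand on dual 14-core, 2.6 GHz, 512 GB hosts
host_cpu_ghz = 2 * 14 * 2.6
print("N+1 cluster size:", hosts_required(120, 640, host_cpu_ghz, 512, redundancy=1))
print("N+2 cluster size:", hosts_required(120, 640, host_cpu_ghz, 512, redundancy=2))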
Table 3 indicates the Federation Enterprise Hybrid Cloud topologies supported by each
management model. The topologies themselves are described later in this document.
Table 3. Federation Enterprise Hybrid Cloud topologies supported by each management model

| Management model | Single-site | Standard dual-site/single vCenter | CA dual-site/single vCenter | Standard dual-site/dual vCenter | Disaster recovery dual-site/dual vCenter |
|---|---|---|---|---|---|
| Distributed | Supported | Supported | Supported | Supported | Supported |
| Collapsed | Supported | Supported | Supported | Supported | Supported |
| Hybrid | Supported | Supported | Supported | Supported | Supported |
The following sections describe each of these models and provide guidance on how to choose the appropriate model.
Distributed management model
The distributed management model uses two separate vCenter instances and each
management pod has its own distinct vSphere cluster. It requires a minimum of six hosts
when used without NSX, or eight hosts when used with NSX.
The first External vCenter Server instance manages all vSphere host and virtual machine
components for the Core Pod. While the virtual machine running this vCenter instance can
also be located within the Core Pod itself, it may be located on a separate system for further
levels of high availability.
The second Cloud vCenter Server instance located on the cloud management platform
manages the NEI, Automation, and Workload Pods supporting the various business groups
within the enterprise. This vCenter server acts as the vSphere end-point for vRealize
Automation.
Figure 3 shows the distributed management model configuration with two vCenters where
the first vCenter supports the Core Pod and the second vCenter supports the remaining
cloud management pods and tenant resources.
Figure 3. Distributed Federation Enterprise Hybrid Cloud management model – vSphere view
The distributed management model:
- Enables Core Pod functionality and resources to be provided by a pre-existing vSphere instance within your environment.
- Provides the highest level of resource separation (that is, host level) between the Core, Automation, and NEI Pods.
- Places the NEI Pod ESXi cluster as the single intersection point between the physical and virtual networks configured within the solution, which eliminates the need to have critical networking components compete for resources as the solution scales and the demands of other areas of the cloud management platform increase.
- Enhances the resilience of the solution because a separate vCenter server and SQL Server instance host the core cloud components.

Collapsed management model
The collapsed management model uses a single vCenter server to host all Core, Automation,
and NEI Pod components as well as the Workload Pods.
Each management pod is implemented as an individual vSphere resource pool on a single
(shared) vSphere cluster, which ensures that each pod receives the correct proportion of
compute and network resources. It requires a minimum of two physical hosts when used
without NSX, or four hosts when used with NSX.
Figure 4 shows an example of how the vSphere configuration might look with a collapsed
management model.
Figure 4. Collapsed Federation Enterprise Hybrid Cloud management model – vSphere view
The collapsed management model:
- Provides the smallest overall management footprint for any given cloud size to be deployed.
- Allows resource allocation between pods to be reconfigured with minimal effort.
- Allows the high-availability overhead to be reduced by using a single cluster, but does not alter the CPU, RAM, or storage required to manage the solution.
Resource pool considerations
Given that a single vSphere cluster is used in the collapsed management model, a vSphere resource pool is required for each management pod to ensure that sufficient resources are reserved for each function. Use the guidelines in Table 4 as the starting point for balancing these resources appropriately.
Table 4. Collapsed management model: resource pool configuration

| Resource | Core | NEI | Automation |
|---|---|---|---|
| CPU | 20% | 20% | 60% |
| RAM | 20% | 5% | 75% |
Note: These figures are initial guidelines and should be monitored in each environment and fine-tuned accordingly. The percentages can be implemented as shares at whatever scale is required, as long as the proportion of shares assigned to each resource pool corresponds to the ratio of percentages in Table 4.
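The following sketch illustrates how the Table 4 percentages can be expressed as resource pool share values; the 10,000-share total is an arbitrary choice for readability, not a required setting.

# Convert the Table 4 starting percentages into share values with the same ratio.
TABLE_4 = {
    "CPU": {"Core": 20, "NEI": 20, "Automation": 60},
    "RAM": {"Core": 20, "NEI": 5, "Automation": 75},
}

def shares_from_percentages(percentages, total_shares=10_000):
    """Scale a percentage split into share values that keep the same ratio."""
    assert sum(percentages.values()) == 100, "percentages should sum to 100"
    return {pod: total_shares * pct // 100 for pod, pct in percentages.items()}

for resource, split in TABLE_4.items():
    print(resource, shares_from_percentages(split))
# CPU {'Core': 2000, 'NEI': 2000, 'Automation': 6000}
# RAM {'Core': 2000, 'NEI': 500, 'Automation': 7500}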
Hybrid management model
The hybrid management model uses a single vCenter server to host all Core, Automation,
and NEI Pod components as well as the Workload Pods. Each management pod has its own
vSphere cluster. Therefore, it requires a minimum of six hosts when used without NSX, or
eight hosts when used with NSX.
Figure 5 shows an example of how the vSphere configuration might look with a hybrid
management model.
Figure 5. Hybrid Federation Enterprise Hybrid Cloud management model – vSphere view
The hybrid management model:
- Provides the highest level of resource separation (that is, host level) between the Core, Automation, and NEI Pods.
- Places the NEI Pod ESXi cluster as the single intersection point between the physical and virtual networks configured within the solution, which eliminates the need to have critical networking components compete for resources as the solution scales and the demands of other areas of the cloud management platform increase.
- Is compatible with VxBlock NSX factory deployments, and may use the VCE AMP vCenter as the Cloud vCenter.

Deciding on the management model
Use the following key criteria to decide which management model is most suited for your environment; a simple decision sketch follows the criteria below:
Reasons to select the distributed management model
Use these criteria to decide if this model is suitable for your environment. Reasons for selecting the distributed management model are:
- To use existing infrastructure to provide the resources that will host the Core Pod.
- To achieve the highest level of resource separation (that is, host level) between the Core, Automation, and NEI Pods.
- To minimize the intersection points for north/south traffic to just the hosts that provide compute resources to the NEI Pod.
- To maximize the resilience of the solution by using a separate vCenter server and SQL Server instance to host the Core Pod components.
Reasons to select the collapsed management model
Use these criteria to decide if this model is suitable for your environment. Reasons for selecting the collapsed management model are:
- To deploy the smallest management footprint for any given cloud size.
- To allow resource allocation between pods to be reconfigured with minimal effort.
Reasons to select the hybrid management model
Use these criteria to decide if this model is suitable for your environment. Reasons for selecting the hybrid management model are:
- To achieve the highest level of resource separation (that is, host level) between the Core, Automation, and NEI Pods.
- To minimize the intersection points for north/south traffic to just the hosts that provide compute resources to the NEI Pod.
- To have a management model that overlays easily with VCE VxBlock, and to use the VCE AMP vCenter as the Cloud vCenter.
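The sketch below is illustrative only: it simply encodes the headline reasons listed in this section as a lookup and is not part of the solution tooling; real deployments should weigh these criteria with the Federation services teams.

# Informal decision helper summarizing the criteria above (illustrative only).
def recommend_management_model(reuse_existing_core_vsphere: bool,
                               smallest_footprint: bool,
                               align_with_vxblock_amp: bool) -> str:
    """Map the headline selection criteria to a management model name."""
    if reuse_existing_core_vsphere:
        return "Distributed"  # Core Pod hosted on a pre-existing vSphere instance
    if align_with_vxblock_amp:
        return "Hybrid"       # overlays easily onto VCE VxBlock / AMP vCenter
    if smallest_footprint:
        return "Collapsed"    # smallest management footprint, resource pools per pod
    return "Distributed"      # default to the highest level of resource separation

print(recommend_management_model(False, True, False))  # -> Collapsed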
Network quality of service considerations

When using a management model that involves collapsed clusters, it may be necessary to configure network quality of service (QoS) to ensure that each function has a guaranteed minimum level of bandwidth available. Table 5 shows the suggested initial QoS settings. These may be fine-tuned as appropriate to the environment.

Note: These values are suggestions based on the logical network Layout 1 in Chapter 3. As this layout is only a sample, you should collapse or divide these allocations according to the network topology you want to implement.
Table 5. Suggested network QoS settings

| Name | VLAN | DVS shares | DVS % min | QoS COS |
|---|---|---|---|---|
| vmk_ESXi_MGMT | 100 | 500 | 5% | 2 |
| vmk_NFS | 200 | 750 | 7.5% | 4 |
| vmk_iSCSI | 300 | 750 | 7.5% | 4 |
| vmk_vMOTION | 400 | 1400 | 14% | 1 |
| DPG_Core | 500 | 500 | 5% | 2 |
| DPG_NEI | 600 | 500 | 5% | 2 |
| DPG_Automation | 700 | 500 | 5% | 2 |
| DPG_Tenant_Uplink | 800 | 2000 | 20% | 0 |
| VXLAN_Transport | 900 | * | * | * |
| Avamar_Target (optional) | 1000 | ** | ** | ** |
| DPG_AV_Proxies (optional) | 1100 | 600 | 6% | 0 |
| ESG_DLR_Transit | Virtual wire | 1250 | 12.5% | 0 |
| Workload | Virtual wire | 1250 | 12.5% | 0 |

* This is the VXLAN transport VLAN. The shares are associated with the virtual wire networks that use the transport VLAN.
** Physical network only. No shares required.
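The "DVS % min" column in Table 5 is simply each entry's proportion of the 10,000 shares allocated across this sample layout. The sketch below reproduces that calculation; the VXLAN_Transport and Avamar_Target rows are omitted because, per the table footnotes, they carry no shares of their own.

# Derive the minimum-bandwidth percentages in Table 5 from the share values.
DVS_SHARES = {
    "vmk_ESXi_MGMT": 500, "vmk_NFS": 750, "vmk_iSCSI": 750, "vmk_vMOTION": 1400,
    "DPG_Core": 500, "DPG_NEI": 500, "DPG_Automation": 500, "DPG_Tenant_Uplink": 2000,
    "DPG_AV_Proxies": 600, "ESG_DLR_Transit": 1250, "Workload": 1250,
}

total = sum(DVS_SHARES.values())  # 10,000 shares in this sample layout
for name, shares in DVS_SHARES.items():
    print(f"{name:<20} {shares:>5} shares -> {100 * shares / total:>5.1f}% minimum bandwidth")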
Component high availability
Using vSphere ESXi clusters with VMware vSphere High Availability (vSphere HA) provides
general virtual machine protection across the management platform. Additional levels of
availability can be provided by using nested clustering between the component virtual
machines themselves, such as Windows Failover Clustering, PostgreSQL clustering, load
balancer clustering, or farms of machines that work together natively in an N+1
architecture, to provide a resilient architecture.
Distributed vRealize Automation
The Federation Enterprise Hybrid Cloud requires the use of distributed vRealize Automation
installations. In this model, multiple instances of each vRealize Automation role are deployed behind a load balancer to ensure scalability and fault tolerance. All-in-one vRealize Automation installations are not supported for production use.
VMware NSX Load Balancing technology is fully supported, tested, and validated by
Federation Enterprise Hybrid Cloud. Other load balancer technologies supported by VMware
for use in vRealize Automation deployments are also permitted, but configuration assistance
for those technologies should be provided by VMware or the vendor. Use of a load balancer
not officially supported by Federation Enterprise Hybrid Cloud or VMware for use with
vRealize Automation requires a Federation Enterprise Hybrid Cloud request for product
qualification (RPQ).
Clustered vRealize Orchestrator
Both clustered and stand-alone vRealize Orchestrator installations are supported by
Federation Enterprise Hybrid Cloud.
Table 6 details the specific component high-availability options, as supported by each
management model.
Table 6. vRealize Automation and vRealize Orchestrator high availability options

| Management model | Distributed vRealize Automation | Minimal vRealize Automation (AIO) | Clustered vRealize Orchestrator (active/active) | Stand-alone vRealize Orchestrator |
|---|---|---|---|---|
| Distributed | Supported | Not supported | Supported | Supported |
| Collapsed | Supported | Not supported | Supported | Supported |
| Hybrid | Supported | Not supported | Supported | Supported |
Highly available VMware Platform Services Controller
Highly available configurations for VMware Platform Services Controller are not supported in
Federation Enterprise Hybrid Cloud 3.5.
Chapter 3: Network Topologies
This chapter presents the following topics:
Overview ..........................................................................................................22
Implications of virtual networking technology options .............................................22
Logical network topologies ..................................................................................24
This solution provides a network architecture that is resilient to failure and enables optimal throughput, multitenancy, and secure separation.
This section presents a number of generic logical network topologies. Further network considerations specific to each topology are presented in the relevant chapters.
Physical connectivity
In designing the physical architecture, the main considerations are high availability,
performance, and scalability. Each layer in the architecture should be fault tolerant with
physically redundant connectivity throughout. The loss of any one infrastructure component
or link should not result in loss of service to the tenant; if scaled appropriately, there is no
impact on service performance.
Physical network and FC connectivity to the compute layer may be provided over a
converged network to converged network adapters on each compute blade, or over any
network and FC adapters that are supported by the hardware platform and vSphere.
Supported virtual networking technologies
The Federation Enterprise Hybrid Cloud supports different virtual networking technologies, as follows:
- VMware NSX for vSphere
- VMware vSphere Distributed Switch
The dynamic network services with vRealize Automation showcased in this solution require NSX. vSphere Distributed Switch supports static networking configurations only, precluding the use of VXLANs.
The following section describes the implications, and the features available, when VMware NSX is used with the Federation Enterprise Hybrid Cloud solution, compared to non-NSX-based alternatives.
Solution attributes with and without VMware NSX
Table 7 compares the attributes, support, and responsibility for various aspects of the
Federation Enterprise Hybrid Cloud solution, under its various topologies when used with
and without VMware NSX.
Table 7. Comparing solution attributes with and without VMware NSX

Single-site

With NSX:
- Provides the fully tested and validated load balancer component for vRealize Automation and other Automation Pod components.
- vRealize Automation multi-machine blueprints may use networking components provisioned dynamically by NSX.
- Supports the full range of NSX functionality supported by VMware vRealize.

Without NSX:
- Requires a non-NSX load balancer for vRealize Automation and other Automation Pod components. Load balancers listed as supported by VMware are permitted, but the support burden falls to VMware or the relevant vendor.
- vRealize Automation blueprints must use pre-defined vSphere networks only (no dynamic provisioning of networking components is possible).
- Possesses fewer security features due to the absence of NSX.
- Reduces network routing efficiency due to the lack of the east-west kernel-level routing options provided by NSX.

Standard dual-site/single vCenter

With NSX: as for the single-site topology.

Without NSX: as for the single-site topology.

CA dual-site/single vCenter

With NSX: as for the single-site topology, plus:
- Enables automatic path failover when the 'preferred' site fails.
- Enables VXLAN over Layer 2 or Layer 3 DCI to support tenant workload network availability in both physical locations.

Without NSX: as for the single-site topology, plus:
- Requires Layer 2 VLANs present at both sites to back tenant virtual machine vSphere port groups.

Standard dual-site/dual vCenter

With NSX: as for the single-site topology.

Without NSX: as for the single-site topology.

DR dual-site/dual vCenter

With NSX:
- Provides the fully tested and validated load balancer component for vRealize Automation and other Automation Pod components.
- Does not support inter-site protection of dynamically provisioned VMware NSX networking artifacts.
- Supports consistent NSX security group membership by ensuring, via Federation workflows, that virtual machines are placed in corresponding predefined security groups across sites.
- Allows fully automated network re-convergence for tenant resource pod networks on the recovery site via Federation workflows, the redistribution capability of BGP/OSPF, and the use of NSX redistribution policies.
- Does not honor NSX security tags applied to a virtual machine on the protected site prior to failover.

Without NSX:
- Requires a non-NSX load balancer for vRealize Automation and other Automation Pod components. Load balancers listed as supported by VMware are permitted, but the support burden falls to VMware or the relevant vendor.
- vRealize Automation blueprints must use pre-defined vSphere networks only (no dynamic provisioning of networking components is possible).
- Possesses fewer security features due to the absence of NSX.
- Reduces network routing efficiency due to the lack of the east-west kernel-level routing options provided by NSX.
- Requires customer-supplied IP mobility technology.
- Requires a manual or customer-provided re-convergence process for tenant resource pods on the recovery site.
Each logical topology is designed to address the requirements of multitenancy and secure
separation of the tenant resources. It is also designed to align with security best practices
for segmenting networks according to the purpose or traffic type.
In the distributed management platform option, a minimum of one distributed vSwitch is
required for each of the External and Cloud vCenters, unless you run the Core Pod
components on standard vSwitches. In that case, a minimum of one distributed vSwitch is
required for the Cloud vCenter to support NSX networks. Multiple distributed vSwitches are
supported in both cases.
Note: While the minimum is one distributed vSwitch per vCenter, the Federation recommends
two distributed vSwitches in the Cloud vCenter. The first distributed switch should be used for
cloud management networks and the second distributed switch for tenant workload networks.
The sample layouts provided later in this chapter use this model and indicate which networks
are on each distributed switch by indicating vDS1 or vDS2. Additional distributed switches can
be created for additional tenants if required.
In the collapsed management platform option, there must be at least one distributed
vSwitch in the Cloud vCenter to support NSX. Multiple distributed vSwitches are supported.
Network layouts
The following network layouts are sample configurations intended to assist in understanding
the elements that need to be catered for in a Federation Enterprise Hybrid Cloud network
design. They do not represent a prescriptive list of the permitted configurations for logical
networks in Federation Enterprise Hybrid Cloud. The network layout should be designed
based on individual requirements.
Layout 1
Figure 6 shows one possible logical-to-physical network layout where standard vSphere
switches are used for the basic infrastructural networks.
This layout may be preferable where:
- Additional NIC cards are available in the hosts to be used.
- Increased protection against errors in configuration at a distributed vSwitch level is required. The layout achieves this by placing the NFS, iSCSI, and vSphere vMotion networks on standard vSwitches.
- Dynamic networking technology is required through the use of NSX or vCloud Networking and Security.

Note: All VLAN suggestions are samples only and should be determined by the network team in each particular environment.
Figure 6. Network layout 1
Descriptions of each network are provided in Table 8.
Table 8. Network layout 1 descriptions

vmk_ESXi_MGMT (VMkernel, standard vSwitch, vSphere ESXi hosts, VLAN 100): VMkernel on each vSphere ESXi host that hosts the management interface for the vSphere ESXi host itself. The DPG_Core network should be able to reach this network.

vmk_NFS (VMkernel, standard vSwitch, External vCenter and Cloud vCenter, VLAN 200): Optional VMkernel used to mount NFS datastores to the vSphere ESXi hosts. NFS file storage should be connected to the same VLAN/subnet or be routable from this subnet.

vmk_iSCSI (VMkernel, standard vSwitch, External vCenter and Cloud vCenter, VLAN 300): Optional VMkernel used to mount iSCSI datastores to the vSphere ESXi hosts. iSCSI network portals should be configured to use the same VLAN/subnet or be routable from this subnet.

vmk_vMOTION (VMkernel, standard vSwitch, External vCenter and Cloud vCenter, VLAN 400): VMkernel used for vSphere vMotion between vSphere ESXi hosts.

DPG_Core (vSphere distributed port group, Distributed vSwitch 1, External vCenter, VLAN 500): Port group to which the management interfaces of all the core management components connect.

DPG_NEI (vSphere distributed port group, Distributed vSwitch 1, Cloud vCenter, VLAN 600): Port group to which the NSX controllers on the NEI Pod connect. The DPG_Core network should be able to reach this network.

DPG_Automation (vSphere distributed port group, Distributed vSwitch 1, Cloud vCenter, VLAN 700): Port group to which the management interfaces of all the Automation Pod components connect.

DPG_Tenant_Uplink (vSphere distributed port group, Distributed vSwitch 2, Cloud vCenter, VLAN 800): Port group used for all tenant traffic to egress from the cloud. Multiples may exist.

VXLAN_Transport (NSX distributed port group, Distributed vSwitch 2, Cloud vCenter, VLAN 900): Port group used for VTEP endpoints between vSphere ESXi hosts to allow VXLAN traffic.

ESG_DLR_Transit (NSX logical switch, Distributed vSwitch 2, Cloud vCenter, virtual wire): VXLAN segments connecting Tenant Edges and Tenant DLRs. Multiples may exist.

Workload (NSX logical switch, Distributed vSwitch 2, Cloud vCenter, virtual wire): Workload VXLAN segments. Multiples may exist.

Avamar_Target (primary PVLAN, physical switches, VLAN 1000, optional): Promiscuous primary PVLAN to which the physical Avamar grids are connected. This PVLAN has an associated secondary isolated PVLAN (1100) in which the Avamar proxies are placed.

DPG_AV_Proxies (secondary PVLAN/vSphere distributed port group, physical switches and Distributed vSwitch 2 on the Cloud vCenter, VLAN 1100, optional): Isolated secondary PVLAN to which the Avamar proxy virtual machines are connected. This PVLAN enables the proxies to communicate with Avamar grids on the Avamar_Target network but prevents the proxies from communicating with each other.
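A sample VLAN plan such as layout 1 can be captured as data and checked for internal consistency before implementation. The following sketch is illustrative only; the VLAN IDs are the sample values from Table 8 and should be replaced with the values chosen by the network team.

# Capture the sample layout 1 VLAN plan and verify it is internally consistent.
LAYOUT_1_VLANS = {
    "vmk_ESXi_MGMT": 100, "vmk_NFS": 200, "vmk_iSCSI": 300, "vmk_vMOTION": 400,
    "DPG_Core": 500, "DPG_NEI": 600, "DPG_Automation": 700, "DPG_Tenant_Uplink": 800,
    "VXLAN_Transport": 900, "Avamar_Target": 1000, "DPG_AV_Proxies": 1100,
}

def validate(vlans):
    ids = list(vlans.values())
    assert len(ids) == len(set(ids)), "duplicate VLAN IDs in the plan"
    assert all(1 <= v <= 4094 for v in ids), "VLAN IDs must fall in the 1-4094 range"
    return "VLAN plan is internally consistent"

print(validate(LAYOUT_1_VLANS))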
Layout 2
Figure 7 shows a second possible logical to physical network layout where distributed
vSphere switches are used for all basic infrastructural networks other than the vSphere ESXi
management network.
This layout may be preferable where:
- Fewer NIC cards are available in the hosts to be used.
- Increased consolidation of networks is required. The layout achieves this by placing all but the ESXi management interfaces on distributed vSwitches.
- Dynamic networking technology is required through the use of NSX or vCloud Networking and Security.

Note: All VLAN suggestions are samples only and should be determined by the network team in each particular environment.
Figure 7. Network layout 2
Descriptions of each network are provided in Table 9.
Table 9. Network layout 2 descriptions

vmk_ESXi_MGMT (VMkernel, standard vSwitch, ESXi hosts, VLAN 100): VMkernel on each vSphere ESXi host that hosts the management interface for the vSphere ESXi host itself. The DPG_Core network should be able to reach this network.

vmk_NFS (VMkernel, Distributed vSwitch 1, External vCenter and Cloud vCenter, VLAN 200): Optional VMkernel used to mount NFS datastores to the vSphere ESXi hosts. NFS file storage should be connected to the same VLAN/subnet or be routable from this subnet.

vmk_iSCSI (VMkernel, Distributed vSwitch 1, External vCenter and Cloud vCenter, VLAN 300): Optional VMkernel used to mount iSCSI datastores to the vSphere ESXi hosts. iSCSI network portals should be configured to use the same VLAN/subnet or be routable from this subnet.

vmk_vMOTION (VMkernel, Distributed vSwitch 1, External vCenter and Cloud vCenter, VLAN 400): VMkernel used for vSphere vMotion between vSphere ESXi hosts.

DPG_Core (vSphere distributed port group, Distributed vSwitch 1, External vCenter, VLAN 500): Port group to which the management interfaces of all the core management components connect.

DPG_NEI (vSphere distributed port group, Distributed vSwitch 1, Cloud vCenter, VLAN 600): Port group to which the NSX controllers on the NEI Pod connect. The DPG_Core network should be able to reach this network.

DPG_Automation (vSphere distributed port group, Distributed vSwitch 1, Cloud vCenter, VLAN 700): Port group to which the management interfaces of all the Automation Pod components connect.

DPG_Tenant_Uplink (vSphere distributed port group, Distributed vSwitch 2, Cloud vCenter, VLAN 800): Port group used for all tenant traffic to egress from the cloud. Multiples may exist.

VXLAN_Transport (NSX distributed port group, Distributed vSwitch 2, Cloud vCenter, VLAN 900): Port group used for VTEP endpoints between vSphere ESXi hosts to allow VXLAN traffic.

ESG_DLR_Transit (NSX logical switch, Distributed vSwitch 2, Cloud vCenter, virtual wire): VXLAN segments connecting Tenant Edges and Tenant DLRs. Multiples may exist.

Workload (NSX logical switch, Distributed vSwitch 2, Cloud vCenter, virtual wire): Workload VXLAN segments. Multiples may exist.

Avamar_Target (primary PVLAN, physical switches, VLAN 1000): Promiscuous primary PVLAN to which the physical Avamar grids are connected. This PVLAN has an associated secondary isolated PVLAN (1100) in which the Avamar proxies are placed.

DPG_AV_Proxies (secondary PVLAN/vSphere distributed port group, physical switches and distributed vSwitch on the Cloud vCenter, VLAN 1100, optional): Isolated secondary PVLAN to which the Avamar proxy virtual machines are connected. This PVLAN enables the proxies to communicate with Avamar grids on the Avamar_Target network but prevents the proxies from communicating with each other.
Layout 3
Figure 8 shows a third possible logical-to-physical network layout where distributed vSphere
switches are used for all networks other than the management network.
This layout may be preferable where:
- There is no requirement for dynamic networking.
- Reduction of the management host count is paramount (as it removes the need for the NEI Pod).

Note: All VLAN suggestions are samples only and should be determined by the network team in each particular environment.
Figure 8. Network layout 3
Descriptions of each network are provided in Table 10.
Table 10. Network layout 3 descriptions

vmk_ESXi_MGMT (VMkernel, standard vSwitch, ESXi hosts, VLAN 100): VMkernel on each vSphere ESXi host that hosts the management interface for the ESXi host itself. The DPG_Core network should be able to reach this network.

vmk_NFS (VMkernel, standard vSwitch, External vCenter and Cloud vCenter, VLAN 200): Optional VMkernel used to mount NFS datastores to the vSphere ESXi hosts. NFS file storage should be connected to the same VLAN/subnet or be routable from this subnet.

vmk_iSCSI (VMkernel, standard vSwitch, External vCenter and Cloud vCenter, VLAN 300): Optional VMkernel used to mount iSCSI datastores to the vSphere ESXi hosts. iSCSI network portals should be configured to use the same VLAN/subnet or be routable from this subnet.

vmk_vMOTION (VMkernel, standard vSwitch, External vCenter and Cloud vCenter, VLAN 400): VMkernel used for vSphere vMotion between vSphere ESXi hosts.

DPG_Core (vSphere distributed port group, Distributed vSwitch 1, External vCenter, VLAN 500): Port group to which the management interfaces of all the core management components connect.

DPG_Automation (vSphere distributed port group, Distributed vSwitch 1, Cloud vCenter, VLAN 600): Port group to which the management interfaces of all the Automation Pod components connect.

DPG_Tenant_Uplink (vSphere distributed port group, Distributed vSwitch 2, Cloud vCenter, VLAN 700): Port group used for all tenant traffic to egress from the cloud. Multiples may exist.

DPG_Workload_1 (vSphere distributed port group, Distributed vSwitch 2, Cloud vCenter, VLAN 800): Port group used for workload traffic.

DPG_Workload_2 (vSphere distributed port group, Distributed vSwitch 2, Cloud vCenter, VLAN 900): Port group used for workload traffic.

Avamar_Target (primary PVLAN, physical switches, VLAN 1000): Promiscuous primary PVLAN to which the physical Avamar grids are connected. This PVLAN has an associated secondary isolated PVLAN (1100) in which the Avamar proxies are placed.

DPG_AV_Proxies (secondary PVLAN/vSphere distributed port group, physical switches and Distributed vSwitch 2 on the Cloud vCenter, VLAN 1100, optional): Isolated secondary PVLAN to which the Avamar proxy virtual machines are connected. This PVLAN enables the proxies to communicate with Avamar grids on the Avamar_Target network but prevents the proxies from communicating with each other.
Chapter 4: Single-Site/Single vCenter Topology
This chapter presents the following topics:
Overview ..........................................................................................................32
Single-site networking considerations ...................................................................32
Single-site storage considerations ........................................................................33
Recovery of cloud management platform ..............................................................36
Backup of single-site/single vCenter enterprise hybrid cloud ....................................36
This chapter describes networking and storage considerations for a single-site/single
vCenter topology in the Federation Enterprise Hybrid Cloud solution.
When to use the single-site topology
The single-site/single vCenter Federation Enterprise Hybrid Cloud topology should be used
when restart or recovery of the cloud to another data center is not required. It can also be
used as the base deployment on top of which you may layer the dual-site/single vCenter or
dual-site/dual vCenter topology at a later time.
Architecture
Figure 9 shows the single-site/single vCenter architecture for the Federation Enterprise
Hybrid Cloud solution including the required sets of resources separated by pod.
Figure 9. Federation Enterprise Hybrid Cloud single-site architecture

Supported virtual networking technologies

The Federation Enterprise Hybrid Cloud supports the following virtual networking technologies in the single-site topology:
• VMware NSX (recommended)
• VMware vSphere Distributed Switch

Supported VMware NSX features
When using VMware NSX in a single-site architecture, the Federation Enterprise Hybrid
Cloud solution supports the full range of NSX functionality supported by VMware vRealize
Automation. The integration between these components provides all the required
functionality, including but not limited to:
• Micro-segmentation
• Dynamic provisioning of VMware NSX constructs via vRealize blueprints
• Use of NSX security policies, groups, and tags
• Integration with the VMware NSX Partner ecosystem for enhanced security
NSX best practices
In a single-site topology, when NSX is used, all NSX Controller components reside in the
same site and within the NEI Pod. NSX best practice recommends that each NSX Controller
is placed on a separate physical host.
NSX creates Edge Services Gateways (ESGs) and Distributed Logical Routers (DLRs). Best
practice for ESGs and DLRs recommends that they are deployed in HA pairs, and that the
ESGs and DLRs are separated from each other onto different physical hosts.
Combining these best practices means that a minimum of four physical hosts are required to
support the NEI Pod function when NSX is used.
VMware anti-affinity rules should be used to ensure that the following conditions are true during optimum conditions:
• NSX Controllers reside on different hosts.
• NSX ESGs configured for high availability reside on different hosts.
• NSX DLR Control virtual machines reside on different hosts.
• NSX ESG and DLR Control virtual machines reside on different hosts.
When using the Federation Enterprise Hybrid Cloud Sizing tool, consider the choice of server
specification for the NEI Pod to ensure efficient use of hardware resources, as the tool will
enforce the four-server minimum when NSX is chosen.
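These placement rules can be treated as a set of separation checks. The sketch below is illustrative Python only; the host and virtual machine names are hypothetical and this is not the product's placement logic:

    # Hypothetical placement of NSX components across a four-host NEI Pod.
    placements = {
        "nsx-controller-1": "nei-esx01",
        "nsx-controller-2": "nei-esx02",
        "nsx-controller-3": "nei-esx03",
        "esg-01a": "nei-esx01", "esg-01b": "nei-esx02",            # ESG HA pair
        "dlr-ctrl-01a": "nei-esx03", "dlr-ctrl-01b": "nei-esx04",  # DLR control HA pair
    }

    def separated(vms):
        """True if every listed VM runs on a different host."""
        hosts = [placements[vm] for vm in vms]
        return len(hosts) == len(set(hosts))

    checks = {
        "Controllers on different hosts": separated(
            ["nsx-controller-1", "nsx-controller-2", "nsx-controller-3"]),
        "ESG HA pair separated": separated(["esg-01a", "esg-01b"]),
        "DLR control pair separated": separated(["dlr-ctrl-01a", "dlr-ctrl-01b"]),
        "ESG and DLR control separated": separated(["esg-01a", "dlr-ctrl-01a"]),
    }
    for rule, ok in checks.items():
        print(rule, "->", "OK" if ok else "violates best practice")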
Storage design
This Federation Enterprise Hybrid Cloud solution presents storage in the form of storage
service offerings that greatly simplify virtual storage provisioning.
The storage service offerings are based on ViPR virtual pools, which are tailored to meet the
performance requirements of general IT systems and applications. Multiple storage system
virtual pools, consisting of different disk types, are configured and brought under ViPR
management.
ViPR presents the storage to the enterprise hybrid cloud as virtual storage pools, abstracting
the underlying storage details and enabling provisioning tasks to be aligned with the
application’s class of service. In Federation Enterprise Hybrid Cloud, each ViPR virtual pool represents a storage service offering that can be supported or backed by multiple storage pools
of identical performance and capacity. This storage service offering concept is summarized
in Figure 10.
Figure 10. Storage service offerings for the hybrid cloud
Note: The storage service offerings in Figure 10 are suggestions only. Storage service offerings
can be configured and named as appropriate to reflect their functional use.
The storage service examples in Figure 10 suggest the following configurations:
• All Flash: Can be provided by EMC XtremIO™, VNX as all-flash storage, or VMAX FAST VP where only the flash tier is used.
• Tiered: Provides VNX or VMAX block- or file-based VMFS or NFS storage devices and is supported by multiple storage pools using EMC Fully Automated Storage Tiering for Virtual Pools (FAST® VP) and EMC Fully Automated Storage Tiering (FAST®) Cache.
• Single Tier: Provides EMC VNX block- or file-based VMFS or NFS storage and is supported by multiple storage pools using a single storage type (NL-SAS in this example).
We suggest these storage service offerings only to highlight what is possible in a Federation Enterprise Hybrid Cloud environment. The full list of supported platforms includes:
• EMC VMAX
• EMC VNX
• EMC XtremIO
• EMC ScaleIO®
• EMC VPLEX
• EMC RecoverPoint
• Isilon® (Workload use only)
As a result, many other storage service offerings can be configured to suit business and application needs, as appropriate.
Note: The Federation recommends that you follow the best practice guidelines when deploying
any of the supported platform technologies. The Federation Enterprise Hybrid Cloud does not
require any variation from these best practices.
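As a simple illustration, the example offerings in Figure 10 can be expressed as queryable records. This sketch is illustrative only; the record layout is an assumption and does not represent a ViPR data model:

    # The three sample storage service offerings from Figure 10.
    STORAGE_SERVICES = {
        "All Flash":   {"platforms": ["XtremIO", "VNX (all-flash)", "VMAX FAST VP (flash tier)"],
                        "drive_types": ["Flash"]},
        "Tiered":      {"platforms": ["VNX", "VMAX"],
                        "drive_types": ["Flash", "SAS", "NL-SAS"]},   # FAST VP / FAST Cache managed
        "Single Tier": {"platforms": ["VNX"],
                        "drive_types": ["NL-SAS"]},
    }

    def offerings_with_drive(drive_type):
        """Return the service offerings that include a given drive type."""
        return [name for name, svc in STORAGE_SERVICES.items()
                if drive_type in svc["drive_types"]]

    print(offerings_with_drive("NL-SAS"))   # ['Tiered', 'Single Tier']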
Storage consumption
vRealize Automation provides the framework to build relationships between vSphere storage
profiles and Business Groups so that they can be consumed through the service catalog.
Initially, physical storage pools are configured on the storage system and made available to
ViPR where they are configured into their respective virtual pools. At provisioning time, LUNs
or file systems are configured from these virtual pools and presented to vSphere as VMFS or
NFS datastores. The storage is then discovered by vRealize Automation and made available
for assignment to business groups within the enterprise.
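The flow from physical pool to business-group consumption can be summarized in a few lines. The sketch below is a hypothetical illustration only; the function and the datastore and SRP naming are assumptions, not the actual ViPR or vRealize Automation APIs:

    def provision_datastore(virtual_pool, size_gb, cluster):
        """Hypothetical helper mirroring the flow: virtual pool -> datastore -> SRP."""
        datastore = {
            "name": f"{cluster}_{virtual_pool}_01",   # LUN/file system surfaced as VMFS/NFS
            "size_gb": size_gb,
            "backing_virtual_pool": virtual_pool,
        }
        srp = f"SRP_{virtual_pool}"   # grouped by service level for business-group consumption
        return datastore, srp

    ds, srp = provision_datastore("Prod-2", size_gb=2048, cluster="Workload-CL1")
    print(ds["name"], "->", srp)   # Workload-CL1_Prod-2_01 -> SRP_Prod-2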
This storage service offering approach greatly simplifies the process of storage
administration. Instead of users having to configure the placement of individual virtual
machine disks (VMDKs) on different disk types such as serial-attached storage (SAS) and
FC, they simply select the appropriate storage service level required for their business need.
Virtual disks provisioned on FAST VP storage benefit from the intelligent data placement.
While frequently accessed data is placed on disks with the highest level of service, less
frequently used data is migrated to disks reflecting that service level.
When configuring virtual machine storage, a business group administrator can configure
blueprints to deploy virtual machines onto any of the available storage service levels. In the
example in Figure 11, a virtual machine can be deployed with a blueprint including a SQL
Server database, to a storage service offering named Prod-2, which was designed with the
performance requirements of such an application in mind.
Figure 11. Blueprint storage configuration in vRealize Automation
The devices for this SQL Server database machine have different performance requirements,
but rather than assigning different disk types to each individual drive, each virtual disk can
be configured on the Prod-2 storage service offering. This allows the underlying FAST
technology to handle the best location for each individual block of data across the tiers. The
vRealize Automation storage reservation policy ensures that the VMDKs are deployed to the
appropriate storage.
The storage presented to vRealize Automation can be shared and consumed across the
various business groups using the capacity and reservation policy framework in vRealize
Automation.
Storage provisioning
Storage is provisioned to the Workload vSphere clusters in the environment using the
Provision Cloud Storage catalog item that can provision VNX, VMAX, XtremIO, ScaleIO, and
VPLEX Local storage to single-site topology workload clusters.
The workflow interacts with both ViPR and vRealize Automation to create the storage,
present it to the chosen vSphere cluster and add the new volume to the relevant vRealize
Storage Reservation Policy.
vSphere clusters are made eligible for storage provisioning by tagging them with vRealize
Automation custom properties that define them as Unprotected clusters, that is, they are not involved in any form of inter-site replication relationship. This tagging is done during
the installation and preparation of vSphere clusters for use by the Federation Enterprise
Hybrid Cloud using the Unprotected Cluster Onboarding workflows provided as part of the
Federation Enterprise Hybrid Cloud self-service catalog.
Note: Virtual machines on the cluster may still be configured to use backup as a service, as
shown in Chapter 7.
As local-only (unprotected) vSphere clusters can also exist in continuous availability and DR
topologies, this process ensures that only the correct type of storage is presented to the
single-site vSphere clusters and no misplacement of virtual machines intended for inter-site
protection occurs.
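Conceptually, the Unprotected tag acts as a filter on the storage that may be offered to a cluster. The following sketch is illustrative only; the property values, pool names, and filtering helper are assumptions rather than the shipped custom properties or workflows:

    # Hypothetical cluster tags and candidate storage pools.
    clusters = {
        "Workload-CL1": {"protection": "Unprotected"},
        "Workload-CL2": {"protection": "CA Enabled"},   # vMSC cluster, handled elsewhere
    }
    storage_pools = [
        {"name": "VNX_Tiered_01",        "replicated": False},
        {"name": "VPLEX_Distributed_01", "replicated": True},
    ]

    def eligible_pools(cluster_name):
        """Only non-replicated storage is presented to an Unprotected cluster."""
        if clusters[cluster_name]["protection"] != "Unprotected":
            return []
        return [p["name"] for p in storage_pools if not p["replicated"]]

    print(eligible_pools("Workload-CL1"))   # ['VNX_Tiered_01']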
ViPR virtual pools
For block-based provisioning, ViPR virtual arrays should not contain more than one protocol.
For Federation Enterprise Hybrid Cloud this means that ScaleIO storage and FC block
storage must be provided via separate virtual arrays.
Note: Combining multiple physical arrays into fewer virtual arrays to provide storage to virtual
pools is supported.
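A minimal sketch of this rule, using placeholder virtual array names (illustrative only, not ViPR objects or API calls):

    # Each block virtual array should carry a single protocol.
    virtual_arrays = {
        "varray_fc":      {"protocols": {"FC"}},
        "varray_scaleio": {"protocols": {"ScaleIO"}},
        "varray_mixed":   {"protocols": {"FC", "ScaleIO"}},   # violates the guideline
    }

    invalid = [name for name, va in virtual_arrays.items() if len(va["protocols"]) > 1]
    print("Virtual arrays to split:", invalid)   # ['varray_mixed']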
Single-site topology
Recovery of the management platform does not apply to a single-site topology, because
there is no target site to recover to.
Single-site/single vCenter topology backup
The recommended option for backup in a single-site/single vCenter topology is the Standard
Avamar configuration, though the Redundant Avamar/single vCenter configuration may also
be used to provide additional resilience. Both options are described in Chapter 7.
Chapter 5: Dual-Site/Single vCenter Topology
This chapter presents the following topics:
Overview ..........................................................................................................38
Standard dual-site/single vCenter topology ...........................................................38
Continuous availability dual-site/single vCenter topology ........................................39
Continuous availability network considerations ......................................................40
VPLEX Witness ...................................................................................................45
VPLEX topologies ...............................................................................................46
Continuous availability storage considerations .......................................................53
Recovery of cloud management platform ..............................................................56
Backup in dual-site/single vCenter enterprise hybrid cloud ......................................56
CA dual-site/single vCenter ecosystem .................................................................57
This chapter describes the networking and storage considerations for a dual-site/single
vCenter topology in the Federation Enterprise Hybrid Cloud solution.
When to use the dual-site/single vCenter topology
The dual-site/single vCenter Federation Enterprise Hybrid Cloud topology may be used when
restart of the cloud to another data center is required. It should only be used in either of the
following two scenarios:
Standard dual-site/single vCenter topology
• Two sites are present that require management via a single vCenter instance and a single Federation Enterprise Hybrid Cloud management platform/portal.

Note: In this case, the scope of the term ‘site’ is at the user’s discretion. It could be taken to mean separate individual geographical locations, or independent islands of infrastructure in the same geographical location, such as independent VCE VxBlock platforms.
• This topology has no additional storage considerations beyond the single-site/single vCenter topology because each site has completely independent storage.
• When used with VMware NSX, this topology employs an additional NEI Pod on the second site to ensure north/south network traffic egresses the second site in the most efficient manner. The local NEI Pod will host the Edge gateway services for its respective site.

Note: There is a 1:1 relationship between NSX Manager and vCenter Server, so although there is a second NEI Pod, there is still only one NSX Manager instance.
Continuous availability dual-site/single vCenter topology
Continuous availability is required. This topology also requires that:
• EMC VPLEX storage is available.
• Stretched Layer 2 VLANs are permitted or the networking technology chosen supports VXLANs.
• The latency between the two physical data center locations is less than 10 ms.
The standard dual-site/single vCenter Federation Enterprise Hybrid Cloud topology controls
two sites, each with independent islands of infrastructure using a single vCenter instance
and Federation Enterprise Hybrid Cloud management stack/portal.
This architecture provides a mechanism to extend an existing Federation Enterprise Hybrid
Cloud by adding additional independent infrastructure resources to an existing cloud, when
resilience of the management platform itself is not required. Figure 12 shows the
architecture used for this topology option.
Figure 12. Federation Enterprise Hybrid Cloud standard dual-site/single vCenter architecture
The continuous availability (CA) dual-site/single vCenter Federation Enterprise Hybrid Cloud
topology is an extension of the standard dual-site/single vCenter model that stretches the
infrastructure across sites, using VMware vSphere Metro Storage Cluster (vMSC), vSphere
HA, and VPLEX in Metro configuration.
This topology enables multi-site resilience across two sites with automatic restart of both the
management platform and workload virtual machines on the surviving site. Figure 13 shows
the architecture used for this topology option.
Figure 13. Federation Enterprise Hybrid Cloud CA dual-site/single vCenter architecture

Supported virtual networking technologies
The Federation Enterprise Hybrid Cloud supports the following virtual networking
technologies in the dual-site/single vCenter topology:
• VMware NSX (recommended)
• VMware vSphere Distributed Switch
If vSphere Distributed Switch is used in the CA dual-site/single vCenter topology, then all
networks must be backed by a Layer 2 VLAN that is present in both locations. VMware NSX
enables you to use VXLANs backed by a Layer 3 DCI.
Supported VMware NSX features
When using VMware NSX in a dual-site/single vCenter architecture, the Federation Enterprise Hybrid Cloud solution supports the full range of NSX functionality supported by VMware vRealize Automation. The integration between these components provides all required functionality, including but not limited to:
• Micro-segmentation
• Dynamic provisioning of VMware NSX constructs via vRealize blueprints
• Use of NSX security policies, groups, and tags
• Integration with the VMware NSX Partner ecosystem for enhanced security

NSX best practices
In the CA dual-site, single vCenter topology, when NSX is used, all NSX Controller
components reside in the NEI Pod, but the NEI Pod is supported by a vSphere Metro Storage
(stretched) cluster. NSX best practice recommends that each controller is placed on a
separate physical host.
NSX creates Edge Services Gateways (ESGs) and Distributed Logical Routers (DLRs). Best
practice for ESGs and DLRs recommends that they are deployed in HA pairs, and that the
ESGs and DLRs are separated from each other onto different physical hosts.
Combining these best practices, and ensuring that each site is fully capable of running the
NSX infrastructure optimally means that a minimum of four physical hosts per site (eight in
total) are required to support the NEI Pod function when NSX is used.
VMware affinity and anti-affinity rules should be used to ensure that the following conditions are true during optimum conditions:
• NSX Controllers reside on different hosts.
• NSX Edge Services Gateways reside on different hosts.
• NSX Distributed Logical Router Control virtual machines reside on different hosts.
• NSX ESG and DLR Control virtual machines do not reside on the same physical hosts.
• All NSX Controllers reside on a given site, and move laterally within that site before moving to the alternate site.
When using the Federation Enterprise Hybrid Cloud Sizing tool, give appropriate consideration to the choice of server specification for the NEI Pod to ensure efficient use of hardware resources, as the tool will enforce the four-server minimum when NSX is chosen.
Data Center Interconnect
Data centers that are connected together over a metro link can use either Layer 2 bridged
VLAN connectivity or Layer 3 routed IP connectivity.
Both Data Center Interconnect (DCI) options have advantages and disadvantages. However,
new standards and technologies, such as Virtual Extensible LAN (VXLAN), address most of
the disadvantages.
Layer 2 DCI
A Layer 2 DCI should be used in continuous availability scenarios where VMware NSX is not
available.
Traditional disadvantages of Layer 2 DCI
The risks related to Layer 2 extensions between data centers mirror some of the limitations
faced in traditional Ethernet broadcast domains.
The limiting factor is the scalability of a single broadcast domain. A large number of hosts
and virtual machines within a broadcast domain, all of which contend for shared network
resources, can result in broadcast storms. The results of broadcast storms are always to the detriment of network availability, adversely affecting application delivery and ultimately leading to a poor user experience. This can affect productivity.
As the CA architecture is stretched across both data centers, a broadcast storm could cause
disruption in both the primary and secondary data centers.
Multiple Layer 2 interconnects create additional challenges for stretched networks. If
unknown broadcast frames are not controlled, loops in the Layer 2 extension can form. This
can also cause potential disruption across both data centers, resulting in network downtime
and loss of productivity.
If used, the Spanning Tree Protocol (STP) needs to be run and carefully managed to control
loops across the primary and secondary site interconnecting links.
Loop avoidance and broadcast suppression mechanisms are available to the IT professional,
but must be carefully configured and managed.
Traditional advantages of Layer 2 DCI
The greatest advantage of Layer 2 DCI is the IP address mobility of physical and virtual
machines across both data centers. This simplifies recovery in the event of a failure in the
primary data center.
Note: Layer 2 connectivity is often necessary for applications where heartbeats and clustering
techniques are used across multiple hosts. In some cases, technologies might not be able to
span Layer 3 boundaries.
Layer 3 DCI
A Layer 3 DCI may be used in continuous availability scenarios where VMware NSX is available.
Traditional disadvantages of Layer 3 DCI
If an infrastructure failure occurs at the primary site, a machine migrated to the secondary
data center must be reconfigured to use an alternate IP addressing scheme. This can be
more time consuming and error prone than having a high-availability deployment across a
single Layer 2 domain.
Inter-site machine clustering, which can be either multicast or broadcast based, may not be supported over a Layer 3 boundary.
Traditional advantages of Layer 3 DCI
Layer 3 DCI does not use extended broadcast domains or require the use of STP. Therefore,
there is greater stability of the production and services networks across both primary and
secondary data centers.
Note: The data center interconnect physical link is subject to the availability of the local
telecommunications service provider and the business requirement of the enterprise.
Optimal continuous availability DCI networking solution
The network topology used in the CA for Federation Enterprise Hybrid Cloud solution can use
the advantages of both Layer 2 and Layer 3 DCI topologies when used with VMware NSX.
Layer 2 requirements such as resource and management traffic are handled by the VXLAN
implementation enabled by NSX. This offers the advantage of IP mobility across both sites
by placing the resource and management traffic on spanned VXLAN segments. It also
eliminates the complexity of STP and performance degradation that large broadcast domains
can introduce.
VXLANs can expand the number of Layer 2 domains or segments beyond the 802.1q limit of
4,096 VLANs to a theoretical limit of 16 million. VXLANs can also extend the Layer 2
environment over Layer 3 boundaries.
An underlying Layer 3 data center interconnect runs a dynamic route distribution protocol
with rapid convergence characteristics such as Open Shortest Path First (OSPF). OSPF
routing metrics route the ingress traffic to the primary data center. If the primary data
center is unavailable, the OSPF algorithm automatically converges routes to the secondary
data center. This is an important advantage compared to using a traditional Layer 2 DCI and
Layer 3 DCI solution in isolation.
Note: NSX also supports Border Gateway Protocol (BGP) and Intermediate System to
Intermediate System (IS-IS) route distribution protocols. The Federation Enterprise Hybrid
Cloud supports OSPF and BGP, but not IS-IS.
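The ingress behavior described above can be illustrated with a toy route-selection example. This is not an OSPF implementation; the prefix, next hops, and costs are arbitrary sample values used only to show metric-based convergence to the secondary site:

    # Advertised paths for a tenant prefix, lowest cost preferred (sample values only).
    routes = {
        "10.10.0.0/16": [
            {"next_hop": "SiteA-ESG", "cost": 10},
            {"next_hop": "SiteB-ESG", "cost": 20},
        ]
    }

    def best_path(prefix, failed_next_hops=()):
        candidates = [r for r in routes[prefix] if r["next_hop"] not in failed_next_hops]
        return min(candidates, key=lambda r: r["cost"])["next_hop"] if candidates else None

    print(best_path("10.10.0.0/16"))                                   # SiteA-ESG
    print(best_path("10.10.0.0/16", failed_next_hops={"SiteA-ESG"}))   # SiteB-ESG (converged)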
In a collapsed management model, all clusters are part of the same vCenter instance and therefore can all be configured to use the security and protection features offered by the same NSX Manager instance. If these features are not required for the Core Pod, then a stretched Layer 2 network may also be used.
In a distributed management model, two vCenter instances are used. Given the 1:1 relationship between a vCenter instance and NSX Manager, a second NSX Manager instance would be required if the Core Pod is to use NSX-provisioned networks with these security and protection features. Given the small number of virtual machines present in the external vCenter, it may be appropriate to consider a stretched Layer 2 VLAN for this network if the second NSX Manager instance is deemed unnecessary.
Figure 14 shows one possible scenario where two data centers are connected using both a
Layer 2 and a routed Layer 3 IP link and how the Core, NEI, Automation, and Workload
segments could be provisioned.
Figure 14. Continuous availability data center interconnect example using VMware NSX
In this scenario, the following properties are true:
• vSphere ESXi stretched clusters are utilized to host the Core, Automation, NEI, and Workload virtual machines. This, with vSphere HA, enables virtual machines to be automatically restarted on the secondary site if the primary site fails.
• The Core Pod virtual machines are connected to a stretched VLAN. This prevents the need for a second NSX Manager machine.
• The NSX controllers (NEI Pod) are connected to the same stretched VLAN as the Core Pod virtual machines.
• The Automation Pod virtual machines are connected to an NSX logical network, backed by VXLAN and available across both sites.
• The Workload Pod virtual machines are connected to an NSX logical network, backed by VXLAN and available across both sites.
• VXLAN encapsulated traffic must be able to travel between vSphere ESXi hosts at both sites.
One or more NSX Edge Services Gateways (ESGs) are deployed at each site to control traffic
flow between the virtual and physical network environments.
Note: NSX supports three modes of replication for VXLAN traffic: unicast, multicast, and hybrid. Unicast mode enables VXLAN traffic to be carried across Layer 3 boundaries without assistance from the underlying physical network, but requires availability of the NSX Controllers.
vSphere HA, with VPLEX and VPLEX Witness, enables the cloud-management platform
virtual machines to restore the cloud-management service on the secondary site in the
event of a total loss of the primary data center. In this scenario, the virtual machines
automatically move to and operate from vSphere ESXi nodes residing in the secondary data
center.
Edge Services Gateway considerations
• All workload virtual machines should use NSX logical switches connected to a Distributed Logical Router (DLR). The DLR can provide the same default gateway to a virtual machine, whether it is running at the primary or secondary site.
• DLRs should be connected to at least one ESG at each site, and a dynamic route distribution protocol (such as OSPF and others supported by NSX) should be used to direct traffic flow. We recommend that you use both NSX High Availability and vSphere High Availability in conjunction with host DRS groups, virtual machine DRS groups, and virtual machine DRS affinity rules to ensure that DLR virtual machines run on the correct site in optimum conditions.
This solution has all the advantages of traditional Layer 2 and Layer 3 solutions. It provides
increased flexibility and scalability by implementing VXLANs, and benefits from increased
stability by not extending large broadcast domains across the VPLEX Metro.
VPLEX Witness is an optional component deployed in customer environments where the
regular preference rule sets are insufficient to provide seamless zero or near-zero recovery
time objective (RTO) storage availability in the event of site disasters or VPLEX cluster and
inter-cluster failures.
Without VPLEX Witness, all distributed volumes rely on configured rule sets to identify the
preferred cluster in the event of a cluster partition or cluster/site failure. However, if the
preferred cluster fails (for example, as a result of a disaster event), VPLEX is unable to
automatically enable the surviving cluster to continue I/O operations to the affected
distributed volumes. VPLEX Witness is designed to overcome this.
The VPLEX Witness server is deployed as a virtual appliance running on a customer’s
vSphere ESXi host that is deployed in a failure domain separate from both of the VPLEX
clusters. The third fault domain must have power and IP isolation from both the Site A and
Site B fault domains, which host the VPLEX Metro Clusters.
This eliminates the possibility of a single fault affecting both the cluster and VPLEX Witness.
VPLEX Witness connects to both VPLEX clusters over the management IP network. By
reconciling its own observations with the information reported periodically by the clusters,
VPLEX Witness enables the clusters to distinguish between inter-cluster network partition
failures and cluster failures, and to automatically resume I/O operations in these situations.
Figure 15 shows an example of a high-level deployment of VPLEX Witness and how it can
augment an existing static preference solution. The VPLEX Witness server resides in a fault
domain separate from the VPLEX clusters on Site A and Site B.
Figure 15. High-level deployment of EMC VPLEX Witness
Deciding on VPLEX topology

VMware classifies the stretched VPLEX Metro cluster configuration with VPLEX into the following categories:
• Uniform host access configuration with VPLEX host Cross-Connect—vSphere ESXi hosts in a distributed vSphere cluster have a connection to the local VPLEX system and paths to the remote VPLEX system. The remote paths presented to the vSphere ESXi hosts are stretched across distance.
• Non-uniform host access configuration without VPLEX host Cross-Connect—vSphere ESXi hosts in a distributed vSphere cluster have a connection only to the local VPLEX system.

Use the following guidelines to help you decide which topology suits your environment:
• Uniform (Cross-Connect) is typically used where:
  – Inter-site latency is less than 5 ms.
  – Stretched SAN configurations are possible.
• Non-Uniform (without Cross-Connect) is typically used where:
  – Inter-site latency is between 5 ms and 10 ms.
  – Stretched SAN configurations are not possible.

Uniform host access configuration with VPLEX host Cross-Connect

EMC GeoSynchrony® supports the concept of a VPLEX Metro cluster with Cross-Connect. This configuration provides a perfect platform for a uniform vSphere stretched-cluster deployment. VPLEX with host Cross-Connect is designed for deployment in a metropolitan-type topology with latency that does not exceed 5 ms round-trip time (RTT).
vSphere ESXi hosts can access a distributed volume on the local VPLEX cluster and on the
remote cluster in the event of a failure. When this configuration is used with VPLEX Witness,
vSphere ESXi hosts are able to survive through multiple types of failure scenarios. For
example, in the event of a VPLEX cluster or back-end storage array failure, the vSphere
ESXi hosts can still access the second VPLEX cluster with no disruption in service.
In the unlikely event that the preferred site fails, VPLEX Witness intervenes and ensures that
access to the surviving cluster is automatically maintained. In this case, vSphere HA
automatically restarts all affected virtual machines.
Figure 16 shows that all ESXi hosts are connected to the VPLEX clusters at both sites. This
can be achieved in a number of ways:
• Merge switch fabrics by using Inter-Switch Link (ISL) technology to connect local and remote SANs.
• Connect directly to the remote data center fabric without merging the SANs.

Figure 16. Deployment model with VPLEX host Cross-Connect
This type of deployment is designed to provide the highest possible availability for a
Federation Enterprise Hybrid Cloud environment. It can withstand multiple failure scenarios
including switch, VPLEX, and back-end storage at a single site with no disruption in service.
For reasons of performance and availability, the Federation recommends that separate host
bus adapters be used for connecting to local and remote switch fabrics.
Note: VPLEX host Cross-Connect is configured at the host layer only and does not imply any
cross connection of the back-end storage. The back-end storage arrays remain locally connected
to their respective VPLEX clusters.
From the host perspective, in the uniform deployment model with VPLEX host Cross-Connect, the vSphere ESXi hosts are zoned to both the local and the remote VPLEX clusters.
Figure 17 displays the VPLEX storage views for a host named DRM-ESXi088, physically
located in Site A of our environment.
Here the initiators for the host are registered and added to both storage views with the
distributed device being presented from both VPLEX clusters.
Figure 17. VPLEX storage views with host Cross-Connect
This configuration is transparent to the vSphere ESXi host. The remote distributed volume is
presented as an additional set of paths.
Figure 18 shows the eight available paths that are presented to host DRM-ESXi088, for
access to the VPLEX distributed volume hosting the datastore named CC-Shared-M3. The
serial numbers of the arrays are different because four of the paths are presented from the
first VPLEX cluster and the remaining four are presented from the second.
Figure 18. Datastore paths in a VPLEX with host Cross-Connect configuration
PowerPath/VE autostandby mode
Neither the host nor the native multipath software can by themselves distinguish between
local and remote paths. This poses a potential impact on performance if remote paths are
used for I/O in normal operations because of the cross-connect latency penalty.
PowerPath/VE provides the concept of autostandby mode, which automatically identifies all
remote paths and sets them to standby (asb:prox is the proximity-based autostandby
algorithm). This feature ensures that only the most efficient paths are used at any given
time.
PowerPath/VE groups paths internally by VPLEX cluster. The VPLEX cluster with the lowest
minimum path latency is designated as the local/preferred VPLEX cluster, while the other
VPLEX cluster within the VPLEX Metro system is designated as the remote/non-preferred
cluster.
A path associated with the local/preferred VPLEX cluster is put in active mode, while a path
associated with the remote/non-preferred VPLEX cluster is put in autostandby mode. This
forces all I/O during normal operations to be directed towards the local VPLEX cluster. If a
failure occurs where the paths to the local VPLEX cluster are lost, PowerPath/VE activates
the standby paths and the host remains up and running on the local site, while accessing
storage on the remote site.
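The proximity-based behavior can be sketched as a grouping-and-selection step. This is a conceptual illustration only, not PowerPath/VE code; the path names and latency values are hypothetical:

    # Paths to a VPLEX distributed volume, grouped by owning VPLEX cluster.
    paths = [
        {"name": "vmhba1:C0:T0:L1", "vplex_cluster": "cluster-1", "latency_ms": 0.4},
        {"name": "vmhba1:C0:T1:L1", "vplex_cluster": "cluster-1", "latency_ms": 0.5},
        {"name": "vmhba2:C0:T2:L1", "vplex_cluster": "cluster-2", "latency_ms": 3.9},
        {"name": "vmhba2:C0:T3:L1", "vplex_cluster": "cluster-2", "latency_ms": 4.1},
    ]

    def assign_modes(paths):
        # The cluster with the lowest minimum path latency is treated as local/preferred.
        clusters = {p["vplex_cluster"] for p in paths}
        local = min(clusters, key=lambda c: min(p["latency_ms"] for p in paths
                                                if p["vplex_cluster"] == c))
        return {p["name"]: "active" if p["vplex_cluster"] == local else "autostandby"
                for p in paths}

    for name, mode in assign_modes(paths).items():
        print(name, mode)   # cluster-1 paths active, cluster-2 paths autostandby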
Non-uniform host access configuration without VPLEX Cross-Connect
The non-uniform host configuration can be used for a Federation Enterprise Hybrid Cloud deployment if greater distances are required. This configuration requires that the round-trip time be within 10 ms to comply with VMware HA requirements. Without the cross-connect deployment, vSphere ESXi hosts at each site have connectivity only to that site’s VPLEX cluster.
Figure 19 shows that hosts located at each site have connections to only their respective
VPLEX cluster. The VPLEX clusters have a link between them to support the VPLEX Metro
configuration, and the VPLEX Witness is located in a third failure domain.
Figure 19. VPLEX architecture without VPLEX Cross-Connect
The major benefit of this deployment option is that greater distances can be achieved in order to protect the infrastructure. With the EMC VPLEX AccessAnywhere™ feature, the non-uniform deployment offers the business another highly resilient option that can withstand various types of failures including front-end and back-end single path failure, single switch failure, and single back-end array failure.
Figure 20 shows the storage views from VPLEX cluster 1 and cluster 2. In the example non-uniform deployment, hosts DRM-ESXi077 and DRM-ESXi099 represent hosts located in different data centers. They are visible in their site-specific VPLEX cluster’s storage view.
With AccessAnywhere, the hosts have simultaneous write access to the same distributed
device, but only via the VPLEX cluster on the same site.
Figure 20. VPLEX Storage Views without VPLEX Cross-Connect
Figure 21 shows the path details for one of the hosts in a stretched cluster that has access
to the datastores hosted on the VPLEX distributed device. The World Wide Name (WWN) on
the Target column shows that all paths to that distributed device belong to the same VPLEX
cluster. PowerPath/VE has also been installed on all of the hosts in the cluster, and it has
automatically set the VPLEX volume to the adaptive failover mode. The autostandby feature
is not used in this case because all the paths to the device are local.
Figure 21. vSphere Datastore Storage paths without VPLEX Cross-Connect
With vSphere HA, the virtual machines are also protected against major outages, such as
network partitioning of the VPLEX WAN link or an entire site failure. In order to prevent any
unnecessary down time, the Federation recommends that the virtual machines reside on the
site that would win ownership of the VPLEX distributed volume in the event of such a
partitioning occurring.
Site affinity for management platform machines
When using the CA dual-site/single vCenter topology, the Federation recommends that all
platform components are bound to a given site using VMware affinity ‘should’ rules. This
ensures minimum latencies between components while still allowing them to move to the
surviving site in the case of a site failure.
Site affinity for tenant virtual machines
The solution uses VMware Host Distributed Resource Scheduler (DRS) groups to subdivide
the vSphere ESXi hosts in each workload and management cluster into groupings of hosts
corresponding to their respective sites. It does this by defining two VMware host DRS groups
in the format SiteName_Hosts where the site names of both sites are defined during the
installation of the Federation Enterprise Hybrid Cloud foundation package.
VMware virtual machine DRS groups are also created in the format Sitename_VMs during
the preparation of the ESXi cluster for continuous availability.
Storage reservation policies (SRPs) created by the Federation Enterprise Hybrid Cloud storage-as-a-service workflows are automatically named to indicate the preferred site in which that storage type is run.
Note: In this case, the preferred site setting means that, in the event of a failure that results in the VPLEX units being unable to communicate, this site will be the one that continues to provide read/write access to the storage.
During deployment of a virtual machine through the vRealize portal, the user is asked to choose from a list of storage reservation policies. Federation Enterprise Hybrid Cloud custom workflows use this information to place the virtual machine on a vSphere ESXi cluster with access to the required storage type and to place the virtual machine into the appropriate virtual machine DRS group.
Virtual machines to host DRS rules are then used to bind virtual machines to the preferred
site by configuring the SiteName_VMs virtual machine DRS group with a setting of “should
run” on the respective SiteName_Hosts host DRS group. This ensures virtual machines run
on the required site, while allowing them the flexibility of failing over if the infrastructure on
that site becomes unavailable.
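The naming conventions above make the placement logic easy to express. The sketch below is illustrative only; the SRP name format and the rule record are assumptions based on the examples in this chapter, not the actual Federation Enterprise Hybrid Cloud workflow code:

    def site_from_srp(srp_name):
        """e.g. 'SiteA_Preferred_CA_Enabled' -> 'SiteA'."""
        return srp_name.split("_")[0]

    def build_affinity_rule(vm_name, srp_name):
        site = site_from_srp(srp_name)
        return {
            "vm": vm_name,
            "vm_drs_group": f"{site}_VMs",
            "host_drs_group": f"{site}_Hosts",
            "rule": "should run on hosts in group",   # soft rule, allows failover
        }

    print(build_affinity_rule("VM1", "SiteA_Preferred_CA_Enabled"))

Running the same logic with SiteB_Preferred_CA_Enabled yields the SiteB_VMs and SiteB_Hosts pairing, as illustrated in the scenarios that follow.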
Figure 22 shows how the virtual machine DRS groups and affinity rules might look in a
sample configuration.
Figure 22. Sample view of site affinity DRS group and rule configuration
Note: The values “SiteA” and “SiteB” shown in both Figure 22 and Figure 23 can and should be
replaced with meaningful site names in a production environment. They must correlate with the
site name values entered during the Federation Enterprise Hybrid Cloud Foundation package
initialization for site affinity to work correctly.
Figure 23 shows a simple example of two scenarios where virtual machines are deployed to
a vMSC and how the logic operates to place those virtual machines on their preferred sites.
Figure 23. Deploying virtual machines with site affinity
Scenario 1: Deploy VM1 with affinity to Site A
This scenario describes deploying a virtual machine (VM1) with affinity to Site A:
1. During virtual machine deployment, the user chooses a storage reservation policy named SiteA_Preferred_CA_Enabled.
2. This storage reservation policy choice filters the cluster choice to only those clusters with that reservation policy, in this case Cluster 1.
3. Based on the selected storage reservation policy, Federation Enterprise Hybrid Cloud workflows programmatically determine that Site A is the preferred location, and therefore locate the virtual machine DRS affinity group corresponding with Site A, namely SiteA_VMs.
4. The expected result is:
   a. VM1 is deployed into SiteA_VMs, residing on host CL1-H1 or CL1-H2.
   b. VM1 is deployed onto a datastore from the SiteA_Preferred_CA_Enabled storage reservation policy, for example: VPLEX_Distributed_LUN_SiteA_Preferred_01 or VPLEX_Distributed_LUN_SiteA_Preferred_02.
Scenario 2: Deploy VM2 with affinity to Site B
This scenario describes deploying a virtual machine (VM2) with affinity to Site B:
1. During virtual machine deployment, the user chooses a storage reservation policy named SiteB_Preferred_CA_Enabled.
2. This storage reservation policy choice filters the cluster choice to only those clusters with that reservation policy, in this case Cluster 1.
3. Based on the selected storage reservation policy, Federation Enterprise Hybrid Cloud workflows programmatically determine that Site B is the preferred location, and therefore locate the virtual machine DRS affinity group corresponding with Site B, namely SiteB_VMs.
4. The expected result is:
   a. VM2 is deployed into SiteB_VMs, meaning it resides on host CL1-H3 or CL1-H4.
   b. VM2 is deployed onto a datastore from the SiteB_Preferred_CA_Enabled storage reservation policy, for example: VPLEX_Distributed_LUN_SiteB_Preferred_01 or VPLEX_Distributed_LUN_SiteB_Preferred_02.
ViPR virtual arrays
There must be at least one virtual array for each site. By configuring the virtual arrays in
this way, ViPR can discover the VPLEX and storage topology. You should carefully plan and
perform this step because it is not possible to change the configuration after resources have
been provisioned, without first disruptively removing the provisioned volumes.
ViPR virtual pools
ViPR virtual pools for block storage offer two options under High Availability: VPLEX local
and VPLEX distributed. When you specify local high availability for a virtual pool, the ViPR
storage provisioning services create VPLEX local virtual volumes. If you specify VPLEX
distributed high availability for a virtual pool, the ViPR storage provisioning services create
VPLEX distributed virtual volumes.
To configure a VPLEX distributed virtual storage pool through ViPR:
1. Ensure a virtual array exists for both sites, with the relevant physical arrays associated with those virtual arrays. Each VPLEX cluster must be a member of the virtual array at its own site only.
2. Before creating a VPLEX high-availability virtual pool at the primary site, create a local pool at the secondary site. This is used as the target virtual pool when creating VPLEX distributed virtual volumes.
3. When creating the VPLEX high-availability virtual pool on the source site, select the source storage pool from the primary site, the remote virtual array, and the remote pool created in Step 2. This pool is used to create the remote mirror volume that makes up the remote leg of the VPLEX virtual volume.
Note: This pool is considered remote when creating the high availability pool because it
belongs to VPLEX cluster 2 and we are creating the high availability pool from VPLEX
cluster 1.
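The ordering constraint in these steps can be made explicit in a short sketch. The function and pool names below are placeholders for illustration only and do not correspond to ViPR API calls:

    config = {"virtual_pools": {}}

    def create_local_pool(name, varray):
        config["virtual_pools"][name] = {"varray": varray, "ha": "VPLEX local"}

    def create_distributed_pool(name, varray, remote_varray, remote_pool):
        # The secondary-site local pool must already exist (Step 2) before Step 3.
        if remote_pool not in config["virtual_pools"]:
            raise ValueError("Create the secondary-site local pool first (Step 2).")
        config["virtual_pools"][name] = {"varray": varray, "ha": "VPLEX distributed",
                                         "remote_varray": remote_varray,
                                         "remote_pool": remote_pool}

    create_local_pool("SiteB_Local_Target", "varray_siteB")                  # Step 2
    create_distributed_pool("SiteA_Preferred_CA_Enabled", "varray_siteA",    # Step 3
                            "varray_siteB", "SiteB_Local_Target")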
Figure 24 shows this configuration, where VPLEX High Availability Virtual Pool represents
the VPLEX high-availability pool being created.
Figure 24. Interactions between local and VPLEX distributed pools
As described in Site affinity for tenant virtual machines, Federation Enterprise Hybrid Cloud
workflows leverage the ‘winning’ site in a VPLEX configuration to determine which site to
map virtual machines to. To enable active/active clusters, it is therefore necessary to create
two sets of datastores – one set that will win on Site A and another set that will win on Site
B. To enable this, you need to configure an environment similar to Figure 24 for Site A, and
the inverse of it for Site B (where the local pool is on Site A, and the high availability pool is
configured from Site B).
ViPR and VPLEX consistency groups interaction
VPLEX uses consistency groups to maintain common settings on multiple LUNs. To create a
VPLEX consistency group using ViPR, a ViPR consistency group must be specified when
creating a new volume. ViPR consistency groups are used to control multi-LUN consistent
snapshots and have a number of important rules associated with them when creating VPLEX
distributed devices:
• All volumes in any given ViPR consistency group must contain only LUNs from the same physical array. As a result of these considerations, the Federation Enterprise Hybrid Cloud STaaS workflows create a new consistency group per physical array, per vSphere cluster, per site.
• All VPLEX distributed devices in a given ViPR consistency group must have source and target backing LUNs from the same pair of arrays.
As a result of these two rules, it is a requirement of the Federation Enterprise Hybrid Cloud
that an individual ViPR virtual pool is created for every physical array that provides physical
pools for use in a VPLEX distributed configuration.
Virtual Pool Collapser function
Federation Enterprise Hybrid Cloud STaaS workflows use the name of the ViPR virtual pool
chosen as part of the naming for the vRealize Storage Reservation Policy (SRP) that the new
datastore is added to. The Virtual Pool Collapser (VPC) function of Federation Enterprise
Hybrid Cloud collapses the LUNs from multiple virtual pools into a single SRP.
The VPC function can be used in the scenario where multiple physical arrays provide physical storage pools of the same configuration or service level to VPLEX, but through different virtual pools, and where it is required to ensure that all LUNs provisioned across those physical pools are collapsed into the same SRP.
VPC can be enabled or disabled at a global Federation Enterprise Hybrid Cloud level. When
enabled, the Federation Enterprise Hybrid Cloud STaaS workflows examine the naming
convention of the virtual pool selected to determine which SRP it should add the datastore
to. If the virtual pool has the string ‘_VPC-‘ in it, then Federation Enterprise Hybrid Cloud
knows that it should invoke VPC logic.
Virtual Pool Collapser example
Figure 25 shows an example of VPC in use. In this scenario, the administrator has enabled the VPC function and created two ViPR virtual pools:
• GOLD_VPC-000001, which has physical pools from Array 1
• GOLD_VPC-000002, which has physical pools from Array 2
When determining how to construct the SRP name to be used, the VPC function will only use
that part of the virtual pool name that exists before ‘_VPC-’. In this example that results in the term ‘GOLD’, which then contributes to the common SRP name of
SITEA_GOLD_CA_Enabled. This makes it possible to conform to the rules of ViPR
consistency groups as well as providing a single SRP for all datastores of the same type,
which maintains abstraction and balanced datastore usage at the vRealize layer.
Figure 25. Virtual Pool Collapser example
In the example shown in Figure 25, all storage is configured to win on a single site (Site A).
To enable true active/active vSphere Metro Storage clusters, additional pools should be
configured in the opposite direction, as mentioned in Continuous availability storage
considerations.
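The VPC naming rule can be captured in a few lines. This sketch is illustrative only; the exact SRP name format is an assumption based on the GOLD example above:

    def srp_name(virtual_pool, site="SITEA", suffix="CA_Enabled"):
        if "_VPC-" in virtual_pool:
            service_level = virtual_pool.split("_VPC-")[0]   # 'GOLD_VPC-000001' -> 'GOLD'
        else:
            service_level = virtual_pool                     # non-VPC pools keep their own name
        return f"{site}_{service_level}_{suffix}"

    print(srp_name("GOLD_VPC-000001"))   # SITEA_GOLD_CA_Enabled
    print(srp_name("GOLD_VPC-000002"))   # SITEA_GOLD_CA_Enabled (same SRP, different array)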
Storage provisioning
VPLEX distributed storage is provisioned to the Workload vSphere clusters in the
environment using the Federation Enterprise Hybrid Cloud catalog item named Provision
Cloud Storage.
As shown in Figure 24, these VPLEX volumes can be backed by VMAX, VNX, or XtremIO
arrays.
Note: The Federation recommends that you follow the best practice guidelines when deploying
any of the supported platform technologies. The Federation Enterprise Hybrid Cloud does not
require any variation from these best practices.
The workflow interacts with both ViPR and vRealize Automation to create the storage,
presents it to the chosen vSphere cluster, and adds the new volume to the relevant vRealize
storage reservation policy.
As with the single-site topology, vSphere clusters are made eligible for storage provisioning
by ‘tagging’ them with vRealize Automation custom properties. However, in this case they
are defined as CA Enabled clusters, that is, they are part of a vMSC that spans both sites in
the environment. This tagging is done during the installation and preparation of vSphere
clusters for use by the Federation Enterprise Hybrid Cloud using the CA Cluster Onboarding
workflow provided as part of the Federation Enterprise Hybrid Cloud self-service catalog.
As local-only vSphere clusters can also be present in CA topology, the Provision Cloud
Storage catalog item will automatically present only ViPR VPLEX distributed virtual storage
pools to provision from when you attempt to provision to a CA-enabled vSphere cluster.
Standard dual-site/single vCenter topology
This model provides no resilience/recovery for the cloud management platform. To enable
this you should use the CA dual-site/single vCenter variant.
CA dual-site/single vCenter topology
As all of the management pods reside on vMSC, management components are recovered
through vSphere HA mechanisms. Assuming the VPLEX Witness has been deployed in a third
fault domain, this should happen automatically.
Dual-site/single vCenter topology backup
The primary option for backup in a dual-site/single vCenter topology is the Redundant Avamar/single vCenter configuration, though the Standard Avamar configuration may also be
used if backup is only required on one of the two sites. Both options are described in
Chapter 7.
Ecosystem interactions
Figure 26 shows how the concepts in this chapter interact in a CA dual-site/single vCenter
configuration. Data protection concepts from Chapter 7 are also included.
Figure 26. CA dual-site/single vCenter ecosystem
Chapter 6: Dual-Site/Dual vCenter Topology
This chapter presents the following topics:
Overview ..........................................................................................................59
Standard dual-site/dual vCenter topology .............................................................59
Disaster recovery dual-site/dual vCenter topology ..................................................60
Disaster recovery network considerations .............................................................61
vCenter Site Recovery Manager considerations ......................................................69
vRealize Automation considerations ......................................................................72
Disaster recovery storage considerations ..............................................................73
Recovery of cloud management platform ..............................................................74
Best practices ....................................................................................................75
Backup in dual-site/dual vCenter topology ............................................................75
DR dual-site/dual vCenter ecosystem ...................................................................76
This chapter describes networking and storage considerations for a dual-site/dual vCenter
topology in the Federation Enterprise Hybrid Cloud solution.
When to use the dual-site/dual vCenter topology

The dual-site/dual vCenter Federation Enterprise Hybrid Cloud topology may be used in either of the following scenarios.
Standard dual-site/dual vCenter topology
Two sites are present that require management via independent vCenter instances and a
single Federation Enterprise Hybrid Cloud management stack/portal.
Each site must have its own storage and networking resources; otherwise, this model has no additional considerations per site beyond those listed in the single-site/single vCenter model. This is because each site has totally independent infrastructure resources with independent vCenters, but is managed by the same Federation Enterprise Hybrid Cloud management platform/portal.
Note: In this case, the scope of the term ‘site’ is at the users’ discretion. This can be separate
individual geographical locations, or independent islands of infrastructure in the same
geographical location, such as independent VxBlock platforms.
Disaster recovery dual-site/dual vCenter topology
Disaster recovery (restart of virtual machines on another site through the use of VMware
Site Recovery Manager) is required. This topology also requires that EMC RecoverPoint is
available.
Note: Typically this model is used when the latency between the two physical data center
locations exceeds the required latency for the use of vSphere metro storage clusters using
VPLEX distributed storage (10 ms).
The standard dual-site/dual vCenter Federation Enterprise Hybrid Cloud architecture controls
two sites, each with independent islands of infrastructure, each using its own vCenter
instance but controlled by a single Federation Enterprise Hybrid Cloud management
platform/portal.
This architecture provides a mechanism to extend an existing Federation Enterprise Hybrid
Cloud by adding additional independent infrastructure resources to an existing cloud, when
resilience of the management platform itself is not required, but where the resources being
added either already belong to an existing vCenter or it is desirable for them to do so.
Figure 27 shows the architecture used for this topology option.
Figure 27. Federation Enterprise Hybrid Cloud standard dual-site/dual vCenter architecture
The DR dual-site/dual vCenter topology for the Federation Enterprise Hybrid Cloud solution
provides protection and restart capability for workloads deployed to the cloud. Management
and workload virtual machines are placed on storage protected by RecoverPoint and are
managed from VMware vCenter Site Recovery Manager™.
This topology allows for multi-site resilience across two sites with DR protection for both the
management platform and workload virtual machines on the surviving site. Figure 28 shows
the overall architecture of the solution.
Figure 28. Federation Enterprise Hybrid Cloud DR dual-site/dual vCenter architecture

Physical network design
The Federation Enterprise Hybrid Cloud solution deploys a highly resilient and fault-tolerant
network architecture for intra-site network, compute, and storage networking. To achieve
this, it uses features such as redundant hardware components, multiple link aggregation
technologies, dynamic routing protocols, and high availability deployment of logical
networking components. The DR dual-site/dual vCenter topology of the Federation
Enterprise Hybrid Cloud solution requires network connectivity across two sites using WAN
technologies. It maintains the resiliency of the Federation Enterprise Hybrid Cloud by
implementing a similarly high-availability and fault tolerant network design with redundant
links and dynamic routing protocols. The high-availability features of the solution, which can
minimize downtime and service interruption, address any component-level failure within the
site.
Throughput and latency requirements are other important aspects of physical network
design. To determine these requirements, consider carefully both the size of the workload
and data that must be replicated between sites and the requisite RPOs and RTOs for your
applications. Traffic engineering and QoS capabilities can be used to guarantee the
throughput and latency requirements of data replication.
Requirements based on the management model
The DR dual-site/dual vCenter topology is supported on all Federation Enterprise Hybrid
Cloud management models. The Automation Pod components must be on a different Layer 3
network to the Core and NEI Pod components so that they can be failed over using VMware
Site Recovery Manager, and the Automation network re-converged without affecting the
Core and NEI Pod components on the source site.
Supported virtual networking technologies
The Federation Enterprise Hybrid Cloud supports the following virtual networking
technologies in the dual-site/dual vCenter topology:
 VMware NSX (recommended)
 VMware vSphere Distributed Switch backed by non-NSX technologies for network re-convergence
Supported VMware NSX features
When using VMware NSX in a dual-site/dual vCenter architecture, the Federation Enterprise
Hybrid Cloud supports many NSX features, including but not limited to the following:
 Micro-segmentation
 Use of NSX security policies and groups
Unsupported VMware NSX features
The following VMware NSX features are not supported in the dual-site/dual vCenter topology:
 Inter-site protection of dynamically provisioned VMware NSX networking components
 NSX security tags
Note: NSX security tags are not honored on failover as part of the out-of-the-box solution, but
may be implemented as a professional services engagement.
NSX best practices
In a DR dual-site/dual vCenter topology, when NSX is used, NSX Controllers reside on each
site’s corresponding NEI Pod. NSX best practice recommends that each controller is placed
on a separate physical host.
NSX creates Edge Services Gateways (ESGs) and Distributed Logical Routers (DLRs). Best
practice for ESGs and DLRs recommends that they are deployed in HA pairs, and that the
ESGs and DLRs are separated onto different physical hosts.
Combining the above best practices means that a minimum of four physical hosts per site
(eight hosts in total) are required to support the NEI pod function when NSX is used.
VMware anti-affinity rules should be used to ensure that the following conditions are true
during normal operating conditions (a scripted sketch follows this list):
 NSX controllers reside on different hosts.
 NSX ESGs reside on different hosts.
 NSX DLR Control virtual machines reside on different hosts.
 NSX ESG and DLR Control virtual machines do not reside on the same physical hosts.
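The following is a minimal sketch of how one such VM-VM anti-affinity rule could be created programmatically with pyVmomi. It is illustrative only: the cluster and controller names are hypothetical, the find_obj() helper is assumed, and in practice these rules are typically configured through the vSphere Web Client.

# Minimal sketch (assumptions: pyVmomi installed, an authenticated service
# instance 'si', and a helper find_obj() that resolves inventory objects by name).
from pyVmomi import vim

def create_anti_affinity_rule(cluster, vms, rule_name):
    """Add a VM-VM anti-affinity rule that keeps the given VMs on separate hosts."""
    rule = vim.cluster.AntiAffinityRuleSpec(name=rule_name, enabled=True, vm=vms)
    rule_spec = vim.cluster.RuleSpec(info=rule, operation="add")
    cluster_spec = vim.cluster.ConfigSpecEx(rulesSpec=[rule_spec])
    # modify=True merges this rule into the existing cluster configuration.
    return cluster.ReconfigureComputeResource_Task(spec=cluster_spec, modify=True)

# Hypothetical usage: keep the three NSX Controllers on the Site A NEI Pod apart.
# nei_cluster = find_obj(si, vim.ClusterComputeResource, "NEI-Pod-SiteA")
# controllers = [find_obj(si, vim.VirtualMachine, name) for name in
#                ("NSX_Controller_1", "NSX_Controller_2", "NSX_Controller_3")]
# create_anti_affinity_rule(nei_cluster, controllers, "nsx-controller-anti-affinity")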
When using the Federation Enterprise Hybrid Cloud Sizing tool, appropriate consideration
should be given to the choice of server specification for the NEI Pod to ensure efficient use
of hardware resources, as the tool will enforce the four-server-per-site minimum when NSX
is chosen. Figure 29 shows how the various NSX components are deployed independently on
both sites within the topology.
Figure 29. NEI Pods from the cloud vCenter Server instances on Site A and Site B
Perimeter NSX Edge
When used with VMware NSX, the Federation Enterprise Hybrid Cloud solution provides
multitier security support and security policy enforcement by deploying NSX Edges as
perimeter firewalls. An NSX Edge can be deployed at different tiers to support tiered security
policy control. Each site's NSX Manager deploys corresponding NSX Edge Services Gateways
(ESGs) configured for services such as firewall, DHCP, NAT, VPN, and SSL-VPN.
Logical switches
When used with VMware NSX, the Federation Enterprise Hybrid Cloud solution provides
logical networking support through NSX logical switches that correspond to VXLAN
segments. These logical switches support the extension of Layer 2 connections between
various virtual machines and other networking components such as NSX Edges and logical
routers. The use of VXLAN also increases the scalability of the solution.
For the DR dual-site/dual vCenter topology of the Federation Enterprise Hybrid Cloud
solution, transit logical switches are required on both sites to provide connections between
the DLRs and NSX Edges, as shown in Figure 30 and Figure 31. Duplicate logical switches are
also needed on both sites for use by the workload virtual machines.
Figure 30. Logical switches on Site A
Figure 31. Logical switches on Site B
Distributed logical router
When VMware NSX is used with the DR dual-site/dual vCenter topology, the NSX network
elements of logical switches, DLRs, and ESGs must be in place before configuring DR-protected blueprints to enable DR-protected workload provisioning.
The DLRs perform east-west routing between the NSX logical switches. Additionally, the DLR
can provide gateway services such as NAT for the virtual machines connected to the
pre-provisioned application, with the ESG performing north-south routing between the DLR
and the physical core network.
Note: These network elements must be created either using the NSX UI or by direct API calls.
When these elements are in place, vRealize Automation blueprints can be configured to
connect a machine's network adapter to their respective logical switch.
The DLR control virtual machine is deployed on the NEI Pod in high-availability mode. In this
mode, two virtual machines are deployed on separate hosts as an active/passive pair. The
active/passive pair maintains state tables and verifies each other's availability through
heartbeats. When a failure of the active DLR is detected, the passive DLR immediately takes
over and maintains the connection state and workload availability.
A DLR kernel module is deployed to each NSX-enabled Workload Pod host to provide
east/west traffic capability and broadcast reduction.
To provide default gateway services on both sites, a corresponding DLR must be deployed
on both sites, as shown in Figure 32.
Figure 32. DLR interfaces on Site A and Site B
IP mobility between the primary and recovery sites
The Federation Enterprise Hybrid Cloud solution supports migration of virtual machines to a
recovery site without the need to change the IP addresses of the virtual machines. It does
this by fully automating network re-convergence of tenant resource pods during disaster
recovery failover when using VMware NSX only.
Non-NSX network technology requirements
The use of vSphere Distributed Switch backed by other non-NSX networking technologies is
permitted, but requires that the chosen technology supports IP mobility.
Additionally, it requires that network re-convergence for both the Automation Pod and
tenant resource pods is carried out manually in accordance with the chosen network
technology, or that automation of that network re-convergence is developed as a
professional services engagement.
Maintenance of the alternative network convergence strategy is outside the scope of
Federation Enterprise Hybrid Cloud support.
VMware NSX-based IP mobility
Default gateways on each site are created using DLRs. By configuring the DLRs on both sites
identically, the same IP addresses and IP subnets are assigned to their corresponding
network interfaces, as shown in Figure 32. In this way, there is no need to reconfigure
workload default gateway settings in a recovery scenario.
A dynamic routing protocol is configured for the logical networking and is integrated with the
physical networking to support dynamic network convergence and IP mobility for the
networks (subnets) supported for DR. This approach simplifies the solution and eliminates
the need to deploy additional services to support IP address changes.
As shown in Figure 33, route distribution requires that an IP prefix be created for each
logical switch that is connected to the DLR.
Note: For DR protection the IP Prefix must be configured with the same name and IP network
values on both primary and recovery DLRs.
Figure 33. Route redistribution policy on Site A and Site B
A route redistribution policy is configured by adding one or more prefixes to the Route
Distribution table, so that logical switch networks defined in the prefix list can be
redistributed to the dynamic routing protocol on the primary site DLR where the virtual
machines are deployed and running. The route redistribution policy on the recovery site DLR
is configured to deny redistribution of networks connected to the recovery site, as shown in
Figure 33.
In the event of a disaster or a planned migration, you should execute a recovery plan in
VMware Site Recovery Manager. After the virtual machines are powered off, the Federation
Enterprise Hybrid Cloud network convergence scripts automatically (when using VMware
NSX) determine the networks relevant to the cluster being failed over, and modify the action
settings of those networks on the primary site DLR to deny redistribution of the networks
associated with the cluster being failed over.
Note: Only the protected networks contained in the specific Site Recovery Manager protection
plan being executed will be set to ‘Deny’.
A subsequent recovery step uses the same network convergence scripts to modify the route
redistribution policy on the recovery site DLR to permit redistribution of the corresponding
recovery site networks before powering on the virtual machines. This dynamic network
convergence ensures that the virtual machines can reach infrastructure services, such as
Domain Name System (DNS) and Microsoft Active Directory on the recovery site, and
reduces the recovery time.
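To illustrate the type of change those network convergence scripts make, the following is a minimal sketch that flips the action of a redistribution rule on a DLR through the NSX for vSphere REST API. It is a sketch only: the NSX Manager address, credentials, DLR edge ID, prefix name, and the exact endpoint and XML layout are assumptions based on typical NSX-v routing configuration calls, not the solution's actual scripts.

# Minimal sketch, assuming NSX-v Manager REST access with basic authentication.
# Endpoint path, XML element names, edge ID, and prefix name are illustrative assumptions.
import requests
import xml.etree.ElementTree as ET

NSX_MGR = "https://nsx-manager.example.local"   # hypothetical NSX Manager
AUTH = ("admin", "password")                    # placeholder credentials
EDGE_ID = "edge-10"                             # hypothetical DLR edge ID

def set_redistribution_action(prefix_name, action):
    """Set the redistribution rule tied to prefix_name to 'permit' or 'deny'."""
    url = f"{NSX_MGR}/api/4.0/edges/{EDGE_ID}/routing/config"
    resp = requests.get(url, auth=AUTH, verify=False)
    resp.raise_for_status()
    config = ET.fromstring(resp.content)
    # Walk every redistribution rule and flip the one that references our IP prefix.
    for rule in config.iter("rule"):
        prefix = rule.find("prefixName")
        action_el = rule.find("action")
        if prefix is not None and action_el is not None and prefix.text == prefix_name:
            action_el.text = action
    # Push the modified routing configuration back to the DLR.
    put = requests.put(url, data=ET.tostring(config), auth=AUTH,
                       headers={"Content-Type": "application/xml"}, verify=False)
    put.raise_for_status()

# Example: deny the failed-over network on the primary site DLR, then permit the
# same prefix on the recovery site DLR before the virtual machines are powered on.
# set_redistribution_action("LS-App01-Prefix", "deny")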
You can implement an additional level of routing control from a site to the WAN peering
point to ensure that only appropriate networks are advertised. To enable network failover
with the same IP subnet on both sites, a network can be active only on the primary site or
the recovery site. To support this, the unit of failover for a network is restricted to a single
compute cluster. All virtual machines on a compute cluster can fail over to the recovery site
without affecting virtual machines running on other compute clusters.
If the network spans multiple clusters, the administrator must configure the recovery plan to
ensure that all virtual machines on the same network are active only on one site.
Supported routing protocols
The Federation Enterprise Hybrid Cloud has validated network designs using both OSPF and
BGP in disaster recovery environments; BGP is recommended over OSPF.
VMware NSX-based security design
This section describes the additional multitier security services available to virtual machines
deployed in the Federation Enterprise Hybrid Cloud solution when used with VMware NSX.
NSX security policies
NSX security policies use security groups to simplify security policy management. A security
group is a collection of objects, such as virtual machines, to which a security policy can be
applied. To enable this capability, the machines contained in the multi-machine blueprint
must be configured with one or more security groups. A network security administrator or
application security administrator configures the security policies to secure application traffic
according to business requirements.
To ensure consistent security policy enforcement for virtual machines on the recovery site,
you must configure the security policies on both the primary and recovery sites.
NSX perimeter Edge security
Perimeter edges are deployed using NSX Edges on both the primary and recovery sites. The
perimeter NSX Edge provides security features, such as stateful firewalls, and other services
such as DHCP, NAT, VPN, and load balancer.
The configuration of various services must be manually maintained on both the primary and
recovery site perimeter edges. This ensures consistent security policy enforcement in case
of DR or planned migration of virtual machines to the recovery site.
NSX distributed firewall
The Federation Enterprise Hybrid Cloud solution supports the distributed firewall capability
of NSX to protect virtual machine communication and optimize traffic flow.
The distributed firewall is configured through the Networking and Security -> Service
Composer -> Security Groups section of the vSphere Web Client. Figure 34 shows various
security groups that may be pre-created in the NSX security configuration.
Figure 34. Security groups on the primary and recovery sites
The Federation Enterprise Hybrid Cloud solution provides an option to associate security
group information with a machine blueprint. When a business user deploys the blueprint, the
virtual machine is included in the security group configuration. This ensures enforcement of
the applicable security policy as soon as the virtual machine is deployed.
As shown in Figure 35, a corresponding security group of the same name must be created
on the recovery site. To ensure that workloads are consistently protected after failover, both
primary and recovery site security policies must be identically configured.
Figure 35. Security group on the recovery site
Overview
The DR dual-site/dual vCenter topology of the Federation Enterprise Hybrid Cloud solution incorporates storage replication
using RecoverPoint, storage provisioning using ViPR, and integration with Site Recovery
Manager to support DR services for applications and virtual machines deployed in the hybrid
cloud. Site Recovery Manager natively integrates with vCenter and NSX to support DR,
planned migration, and recovery plan testing.
RecoverPoint and ViPR Storage Replication Adapters
Site Recovery Manager integrates with EMC RecoverPoint storage replication and ViPR
automated storage services via EMC Storage Replication Adapters (SRAs). The SRAs control
the EMC RecoverPoint replication process. The EMC RecoverPoint SRA controls the
Automation Pod datastores. The ViPR SRA controls protected Workload Pod datastores.
Site mappings
To support DR services, the Site Recovery Manager configuration must include resource
mappings between the vCenter Server instance on the protected site and the vCenter Server
instance on the recovery site. The mappings enable the administrator to define automated
recovery plans for failing over application workloads between the sites according to defined
RTOs and RPOs. The resources you need to map include resource pools, virtual machine
folders, networks, and the placeholder datastore. The settings must be configured on both
the protected and recovery sites to support application workload recovery between the two
sites.
Resource pool mappings
A Site Recovery Manager resource pool specifies the compute cluster, host, or resource pool
that is running a protected application. Resource pools must be mapped between the
protected site and the recovery site in both directions so that, when an application fails
over, the application can then run on the mapped compute resources on the recovery site.
Folder mappings
When virtual machines are deployed using the Federation Enterprise Hybrid Cloud solution,
the virtual machines are placed in particular folders in the vCenter Server inventory to
simplify administration. By default, virtual machines are deployed in a folder named VRM.
This folder must be mapped between the protected and recovery sites in both directions.
When used with Federation Enterprise Hybrid Cloud backup services, the folders used by
backup as a service are automatically created in both vCenters and mapped in Site Recovery
Manager.
Network mappings
Virtual machines may be configured to connect to different networks when deployed.
Applications deployed with DR support must be deployed on networks that have been
configured as defined in the Disaster recovery network considerations section. The networks
must be mapped in Site Recovery Manager between the protected and recovery sites in both
directions. For testing recovery plans, you should deploy a test network and use test
network mappings when you create the recovery plan.
Note: A Layer 3 network must be failed over entirely. Active machines in a given Layer 3
network must reside only in the site with the "permit" route redistribution policy.
Placeholder datastore
For every protected virtual machine, Site Recovery Manager creates a placeholder virtual
machine on the recovery site. The placeholder virtual machine retains the virtual machine
properties specified by the global inventory mappings or specified during protection of the
individual virtual machine.
A placeholder datastore must be accessible to the compute clusters that support the DR
services. The placeholder datastore must be configured in Site Recovery Manager and must
be associated with the compute clusters.
Disaster recovery support for Automation Pod vApps
The Federation Enterprise Hybrid Cloud uses several components that are deployed as
vSphere vApps. Currently, this list includes:
 EMC ViPR Controller
 EMC ViPR SRM
Site Recovery Manager protects virtual machines, but does not preserve the vApp structure
required for EMC ViPR Controller and EMC ViPR SRM virtual machines to function.
The high-level steps to achieve recovery of vApps are:
1. Deploy the vApp identically in both sites.
2. Vacate the vApp on the recovery site (delete the virtual machines, but retain the virtual machine container).
3. Protect the vApp on the protected site through Site Recovery Manager, mapping the vApp containers from both sites.
4. Reapply virtual machine vApp settings on placeholder virtual machines.
For additional details on the process, or if other vApps in the environment require
protection, see the VMware Knowledge Base topic vCenter Operations Manager 5.0.x: Using
Site Recovery Manager to Protect a vApp Deployment.
Protection groups
A protection group is the unit of failover in Site Recovery Manager. The Federation
Enterprise Hybrid Cloud solution supports failover at the granularity of the Workload Pod.
In the context of the DR dual-site/dual vCenter topology, two Workload Pods are assigned to
a DR pair, where one pod is the primary and is considered the protected cluster, and the
second pod is the alternate site and is considered the recovery cluster. All protection groups
associated with a DR pair and all the virtual machines running on a particular pod must fail
over together.
For the DR dual-site/dual vCenter topology there is a 1:1 mapping between a DR pair and a
recovery plan, and each recovery plan will contain one or more protection groups.
Each protection group contains a single replicated vSphere datastore, and all the virtual
machines that are running on that datastore. When you deploy new virtual machines on a
Workload Pod using vRealize Automation, those virtual machines are automatically added to
the corresponding protection group and fail over with that protection group.
Recovery plans
Recovery plans enable administrators to automate the steps required for recovery between
the primary and recovery sites. A recovery plan may include one or more protection groups.
You can test recovery plans to ensure that protected virtual machines recover correctly to
the recovery site.
Tenant Pod recovery plans
The automated network re-convergence capabilities of this DR topology for the Federation
Enterprise Hybrid Cloud solution eliminate the need to change the IP addresses of workload
virtual machines when they fail over from one site to the other. Instead, the tenant networks
move with the virtual machines and support virtual machine communication outside the
network when on the recovery site.
When using VMware NSX, the Federation Enterprise Hybrid Cloud can automate network
re-convergence of the tenant Workload Pods via a custom step in the Site Recovery Manager
recovery plan, ensuring security policy compliance on the recovery site during a real failover.
However, running a test Site Recovery Manager recovery plan with VMware NSX does not
affect the production virtual machines, because the network convergence automation step
has the built-in intelligence to know that the networks should not be re-converged in that
scenario.
If non-NSX alternatives are used, then this network re-convergence is not automated, and
therefore needs to be done manually during a pause in the Site Recovery Manager recovery
plan, or via an automated Site Recovery Manager task created as part of a professional
services engagement.
Note: A recovery plan must be manually created for each DR-enabled cluster before any
STaaS operations are executed: two per DR pair, to enable both failover and failback.
Automation Pod recovery plans
Network re-convergence of the network supporting the Federation Enterprise Hybrid Cloud
Automation Pod is a manual task irrespective of the presence of VMware NSX.
Note: This reflects the out-of-the-box solution experience. Automated network re-convergence
for the Automation Pod can be achieved via a professional services engagement.
Collapsed management model
When configuring the protection group and recovery plans for the Automation Pod
components under a collapsed management model, you must exclude all Core and NEI Pod
components from the configurations. This is to ensure that the system does not attempt to fail
over the Core and NEI components from one site to the other.
Configuring primary and recovery site endpoints
The Federation Enterprise Hybrid Cloud solution uses vRealize Automation to provide
automated provisioning and management of cloud resources such as storage and virtual
machines. To support DR services for cloud resources, you must configure vRealize
Automation with two vCenter endpoints.
The first endpoint is configured to support IaaS services for the first site; this endpoint uses
the vCenter Server instance where the storage and virtual machines for the first site are
deployed. The second endpoint is configured to serve as the recovery site for the resources
of the first site.
If required, workloads can also be configured to run in the secondary site, with recovery in
the first site by configuring multiple DR cluster pairs with a protected cluster in each site,
and a corresponding recovery cluster on the other site.
To configure each endpoint, a separate vCenter agent must be installed on the IaaS server
that is running vRealize Automation.
Configuring the infrastructure for disaster recovery services
The vRealize Automation IaaS administrator must assign the compute resources for the
Workload Pods, on both the protected and recovery sites, to the fabric administrator for
allocation to business groups.
In the dual-site/dual vCenter DR configuration, you must designate Workload Pods (clusters)
as DR-enabled; all workloads deployed to those clusters will be DR-protected. If you have
additional workloads that do not require DR support, then additional local (unprotected)
Workload Pods should be provisioned to accommodate them.
When replicated storage is provisioned to a protected Workload Pod, the fabric administrator
must update the reservation policies for the relevant business groups to allocate the newly
provisioned storage.
Federation Enterprise Hybrid Cloud STaaS workflows automatically add newly provisioned
storage to the appropriate protection group. This ensures that the virtual machines deployed
on the storage are automatically protected and are included in the recovery plans defined
for the Workload Pod.
Configuring application blueprints for disaster recovery
Storage reservation policies are used to deploy virtual machine disks to a datastore that
provides the required RPO. The vRealize Automation IaaS administrator must create storage
reservation policies to reflect the RPOs of different datastores. The fabric administrator must
then assign the policies to the appropriate datastores of the compute clusters.
Business Group administrators can configure the blueprints for virtual machines so that
business users can select an appropriate storage reservation policy when deploying an
application. The business user requests a catalog item in the Federation Enterprise Hybrid
Cloud tenant portal, selects storage for the virtual machines, and assigns an appropriate
storage reservation policy for the virtual machine disks based on the required RPO. The
choice made at this point also dictates whether the virtual machine will be DR protected or
not.
The virtual machine disks are then placed on datastores that support the required RPO. The
virtual machines are automatically deployed with the selected DR protection service and
associated security policy for both the primary and recovery sites.
ViPR-managed Workload Pod storage
For the Workload Pods, ViPR SRA manages the protection of ViPR-provisioned storage.
ViPR SRA provides an interface between Site Recovery Manager and ViPR Controller. ViPR
Controller, which is part of the Automation Pod, must be running and accessible before the
ViPR SRA can instruct ViPR to control the EMC RecoverPoint replication functions.
This means that the Automation Pod and ViPR vApp must be functioning before Site
Recovery Manager can execute a recovery of the Workload Pods.
Storage at each site
The Core and NEI clusters on each site require site-specific storage that does not need to be
protected by EMC RecoverPoint. Site Recovery Manager also requires site-specific datastores
on each site to contain the placeholder virtual machines for the tenant and automation pods.
The Automation Pod storage must be distinct from the Core and NEI storage and protected
by EMC RecoverPoint.
ViPR virtual arrays
There must be at least one virtual array for each site. By configuring the virtual arrays in
this way, ViPR can discover the EMC RecoverPoint and storage topology. You should
carefully plan and perform this step because it is not possible to change the configuration
after resources have been provisioned, without first disruptively removing the provisioned
volumes.
ViPR virtual pools
When you specify EMC RecoverPoint as the protection option for a virtual pool, the ViPR
storage provisioning services create the source and target volumes and the source and
target journal volumes, as shown in Figure 36.
Figure 36. ViPR/EMC RecoverPoint protected virtual pool
Each DR-protected/recovery cluster pair has storage that replicates (under normal
conditions) in a given direction, for example, from Site A to Site B. To allow active/active
site configuration, additional DR cluster pairs should be configured whose storage replicates
in the opposite direction. You must create two sets of datastores – one set that will replicate
from Site A and another set that will replicate from Site B. To enable this, you need to
configure an environment similar to Figure 36 for Site A, and the inverse of it for Site B
(where the protected source pool is Site B, and local target pool is on Site A).
RecoverPoint journal considerations
Every RecoverPoint-protected LUN requires access to a journal LUN to maintain the history
of disk writes to the LUN. The performance of the journal LUN is critical to the overall
performance of the system attached to the RecoverPoint-protected LUN, and therefore its
performance capability should be in line with the expected performance needs of that
system.
By default, ViPR uses the same virtual pool for both the target and the journal LUN for a
RecoverPoint copy, but it does allow you to specify a separate or dedicated pool. In both
cases, the virtual pool and its supporting physical pools should be sized to provide adequate
performance.
Storage provisioning
EMC RecoverPoint protected storage is provisioned to the Workload vSphere clusters in the
environment using the catalog item named Provision Cloud Storage.
Note: The Federation recommends that you follow the best practice guidelines when deploying
any of the supported platform technologies. The Federation Enterprise Hybrid Cloud does not
require any variation from these best practices.
The workflow interacts with both ViPR and vRealize Automation to create the storage,
present it to the chosen vSphere cluster and add the new volume to the relevant vRealize
storage reservation policy.
As with the single-site topology, vSphere clusters are made eligible for storage provisioning
by tagging them with vRealize Automation custom properties. However, in this case they are
defined as DR-enabled clusters, that is, they are part of a Site Recovery Manager
configuration that maps protected clusters to recovery clusters. This tagging is done during
the installation and preparation of vSphere clusters for use by the Federation Enterprise
Hybrid Cloud using the DR Cluster Onboarding workflow provided as part of the Federation
Enterprise Hybrid Cloud self-service catalog.
As local-only vSphere clusters can also be present in a DR dual-site/dual vCenter topology,
when you attempt to provision to a DR-enabled vSphere cluster, the Provision Cloud
Storage catalog item will automatically present only EMC RecoverPoint-protected virtual
storage pools to provision from.
Standard dual-site/dual vCenter topology
This model provides no resilience or recovery for the cloud management platform. To enable
this capability, use the DR dual-site/dual vCenter variant.
DR dual-site/dual vCenter topology
In the DR dual-site/dual vCenter topology, EMC RecoverPoint and Site Recovery Manager
protect the Automation Pod. This allows for recovery between Site A and Site B in planned
and unplanned recovery scenarios. EMC RecoverPoint SRA for Site Recovery Manager is
used to interact with EMC RecoverPoint during a failover of the Automation Pod’s resources.
The Core and NEI Pods (when NSX is used) are created manually on both sites to mirror
functionality such as NSX dynamic routing, NSX security groups, NSX security policies
(firewall rules) and to host the Site Recovery Manager servers. As a result, there is no need
to protect them using EMC RecoverPoint or Site Recovery Manager.
In a distributed management model, this is accomplished by excluding the Core and NEI
Pods from the process of creating associated datastore replications, protection groups, and
recovery plans for the vSphere ESXi clusters hosting those functions.
In a collapsed management model, all components are on the same vSphere ESXi cluster,
so the Core and NEI components must be excluded from Site Recovery Manager recovery
plans and protection groups for that cluster. Despite residing on the same vSphere cluster,
the Automation Pod components should be on a distinct network and a distinct set of
datastores, so that they can be failed over between sites without affecting Core or NEI
components.
Tenant workload networks are automatically re-converged to the recovery site by the
Federation Enterprise Hybrid Cloud solution when used with VMware NSX. When non-NSX
alternatives are used, tenant network re-convergence is not automated by the Federation
Enterprise Hybrid Cloud. Automation Pod network re-convergence is a manual step with or
without the presence of VMware NSX.
The vCenter Server instances on each site manage the NEI, Automation, and Workload Pods
on their respective sites and act as the vSphere end-points for vRealize Automation. The
vCenter Server instances are integrated using Site Recovery Manager, which maintains
failover mappings for the networks, clusters, and folders between the two sites.
Naming conventions
VMware vCenter Site Recovery Manager protection groups
Protection group names must match the Workload Pod names—for example, if
SAComputePod2 is the name of Workload Pod 2 on Site A, then the Site Recovery Manager
protection group must also be named SAComputePod2. The solution relies on this
correspondence when performing several of the automation tasks necessary for successful
failover and subsequent virtual machine management through vRealize Automation.
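As a lightweight illustration of this naming convention, the following sketch simply checks a list of Site Recovery Manager protection group names against the Workload Pod names; the pod and group names shown are hypothetical.

# Minimal sketch: confirm every Workload Pod has an identically named
# Site Recovery Manager protection group (names below are hypothetical).
workload_pods = ["SAComputePod1", "SAComputePod2"]
protection_groups = ["SAComputePod1", "SAComputePod2"]

missing = [pod for pod in workload_pods if pod not in protection_groups]
if missing:
    print("Protection groups missing or misnamed for: " + ", ".join(missing))
else:
    print("All Workload Pods have matching protection group names.")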
VMware NSX security groups
Security group names must be the same on both sites.
VMware NSX security policies
Security policy names must be the same on both sites.
EMC ViPR virtual pools
ViPR virtual pool names must be meaningful because they are the default names for storage
reservation policies. For example, when creating Tier 1 DR protected storage with an RPO of
10 minutes, Tier 1 – DR Enabled – 10 Minute RPO is an appropriate name.
NSX logical networks
Each Workload Pod (compute cluster) must have its own transport zone. The NEI Pod must
be a member of each transport zone. If a transport zone spans multiple compute clusters,
the corresponding Site Recovery Manager protection groups must be associated with the
same Site Recovery Manager recovery plan.
The reason for this is that, when a transport zone spans multiple compute clusters, network
mobility from Site A to Site B affects the virtual machines deployed across these clusters;
therefore, the clusters must be failed over as a set.
DR dual-site/dual vCenter topology backup
The recommended option for backup in a DR dual-site/dual vCenter topology is the
Redundant Avamar/dual vCenter configuration. This option is described in detail in Chapter
7.
Ecosystem interactions
Figure 37 shows how the concepts in this chapter interact in a DR dual-site/dual vCenter
configuration. Data protection concepts from Chapter 7 are also included.
Figure 37. DR ecosystem
Chapter 7: Data Protection
This chapter presents the following topics:
Overview ..........................................................................................................78
Concepts...........................................................................................................79
Standard Avamar configuration............................................................................84
Redundant Avamar/single vCenter configuration ....................................................86
Redundant Avamar/dual vCenter configuration ......................................................90
This chapter discusses the considerations for implementing data protection, also known as
backup as a service (BaaS), in the context of the Federation Enterprise Hybrid Cloud.
Backup and recovery of a hybrid cloud is a complicated undertaking in which many factors
must be considered, including:
 Backup type and frequency
 Impact and interaction with replication
 Recoverability methods and requirements
 Retention periods
 Automation workflows
 Interface methods (workflows, APIs, GUI, CLI, scripts, and so on)
 Implementation in a CA or DR-enabled environment
VMware vRealize Orchestrator™, which is central to all of the customizations and operations
used in this solution, manages operations across several EMC and VMware products,
including:
 VMware vRealize Automation
 VMware vCenter
 EMC Avamar and EMC Data Protection Advisor™
This solution uses Avamar as the technology to protect your datasets. Using Avamar, this
backup solution includes the following characteristics:
 Abstracts and simplifies backup and restore operations for cloud users
 Uses VMware Storage APIs for Data Protection, which provides Changed Block Tracking for faster backup and restore operations
 Provides full image backups for running virtual machines
 Eliminates the need to manage backup agents for each virtual machine in most cases
 Minimizes network traffic by deduplicating and compressing data
Note: The Federation recommends that you engage an Avamar product specialist to design,
size, and implement a solution specific to your environment and business needs.
Scalable backup architecture
The Federation Enterprise Hybrid Cloud backup configurations provide scalable backup
through the ability to configure an array of Avamar instances. Federation Enterprise Hybrid
Cloud BaaS workflows automatically distribute the workload in a round-robin way across the
available Avamar instances, and provide a catalog item to enable additional Avamar
instances (up to a maximum of 15 Avamar replication pairs) to be added to the
configuration.
When new Avamar instances are added, new virtual machine workloads are automatically
assigned to those new instances until an equal number of virtual machines are assigned to
all Avamar instances in the environment. Once that target has been reached, virtual
machines are assigned in a round-robin way again.
The configuration of the Avamar instances is stored by the Federation Enterprise Hybrid
Cloud workflows for later reference when reconfiguring or adding instances.
Avamar replication pairs
An Avamar replication pair is defined as a relationship configured between two Avamar
instances, and is used by the Federation Enterprise Hybrid Cloud workflows to ensure
backup data is protected against the loss of a physical Avamar instance. Normally this is
used to ensure that data backed up on one site is available to restore on a secondary site,
but it could also be used to provide extra resilience on a single site if required.
The Federation Enterprise Hybrid Cloud provides two different redundant Avamar
configurations that use an array of Avamar replication pairs to achieve the same scalability
as the standard Avamar configuration but with the added resilience that every instance of
Avamar has a replication partner, to which it can replicate any backup sets that it receives.
Note: In the standard Avamar configuration, each instance is technically configured as the first
member of an Avamar replication pair. In this case, no redundancy exists, but it can be added
later by adding a second member to each replication pair.
To achieve this, the Federation Enterprise Hybrid Cloud uses the concepts of primary and
secondary Avamar instances within each replication pair, and the ability to reverse these
personalities so that, in the event of a failure, backup and restore operations can continue.
The primary Avamar instance is where all scheduled backups are executed. It is also the
instance that Federation Enterprise Hybrid Cloud on-demand backup and restore features
communicate with in response to dynamic user requests. The primary Avamar instance also
has all the currently active replication groups, making it responsible for replication of new
backup sets to the secondary Avamar instance.
The secondary Avamar instance has the same configurations for backup and replication
policies, except that BaaS workflows initially configure these policies in a disabled state. If
the primary Avamar instance becomes unavailable, the policies on the secondary Avamar
instance can be enabled via the Toggle Single Avamar Pair Designations catalog item to
enable backup and replication operations to continue.
Note: Replication operations do not catch up until the original primary Avamar instance (now
designated as secondary) becomes available again, at which time replication automatically
transmits newer backup sets to the secondary system.
In this solution, after a redundant Avamar configuration is enabled, the Federation
Enterprise Hybrid Cloud workflows configure all subsequent backups with replication
enabled. If one member of the Avamar replication pair is offline, backups taken to the
surviving member of the pair will automatically be replicated once the offline member is
brought back online.
How each Avamar instance in a replication pair operates varies based on which backup
topology is configured, and is described in the context of each individual topology later in
this chapter.
VMware vCenter folder structure and backup service level relationship
When a backup service level is created via the Create Backup Service Level vRealize
Automation catalog item, it creates an associated set of folders in the cloud vCenter (or both
cloud vCenters if done in a dual-site/dual vCenter environment). The number of folders
created depends on how many Avamar pairs are present, and these folders become part of
the mechanism for distributing the backup load.
Note: In a DR dual-site/dual vCenter environment, the Create a Backup Service Level
catalog item automatically creates Site Recovery Manager folder mappings between the new
folders created in the first cloud vCenter and their corresponding folders in the second vCenter.
Example
If you create a backup service level named Daily-7yr in your environment, and four Avamar
replication pairs (numbered 0 through 3) are present, then the following folders are created
in the relevant cloud vCenter servers:
 Daily-7yr-Pair0
 Daily-7yr-Pair1
 Daily-7yr-Pair2
 Daily-7yr-Pair3
When you assign a virtual machine to the Daily-7yr backup policy, the workflows use a
selection algorithm to determine the Avamar pair with least load, and then assign the virtual
machine to the associated folder. So if Avamar-Pair2 is determined to be the best target,
then the virtual machine is moved to the Daily-7yr-Pair2 vCenter folder and automatically
backed up by Avamar-Pair2 as a result.
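The following is a minimal sketch of that kind of least-loaded selection logic, including the Administrative Full exclusion described later in this chapter. It is illustrative only; the pair descriptions and field names are assumptions and this is not the solution's actual BaaS workflow code.

# Minimal sketch of least-loaded Avamar pair selection (illustrative only).
def select_backup_folder(service_level, pairs, cluster):
    """Return the vCenter folder name for the least-loaded eligible Avamar pair.

    Each entry in 'pairs' is assumed to look like:
    {"number": 2, "clusters": ["Cluster1"], "admin_full": False, "vm_count": 12}
    """
    eligible = [p for p in pairs
                if cluster in p["clusters"] and not p["admin_full"]]
    if not eligible:
        raise ValueError("No eligible Avamar pair for cluster " + cluster)
    # Fewest assigned VMs wins; ties break on the lowest pair number.
    chosen = min(eligible, key=lambda p: (p["vm_count"], p["number"]))
    return "{0}-Pair{1}".format(service_level, chosen["number"])

# Example: a VM on Cluster1 assigned to Daily-7yr might land in Daily-7yr-Pair2,
# and is then backed up by Avamar-Pair2 because that pair monitors the folder.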
How the Avamar instances are assigned to monitor and back up these folders differs
depending on which backup topology is deployed, and is described in the context of each
individual topology later in this chapter.
Avamar pair to vSphere cluster association
Avamar image-level backups work by mounting snapshots of VMDKs to Avamar proxy virtual
machines and then backing up the data to the Avamar instance that the Avamar proxy is
registered with.
In a fully deployed Federation Enterprise Hybrid Cloud with up to 10,000 user virtual
machines and hundreds of vSphere clusters, this could lead to Avamar proxy sprawl if not
properly configured and controlled.
To prevent this, the Federation Enterprise Hybrid Cloud associates vSphere clusters with a
subset of Avamar replication pairs. This means that a reduced number of Avamar proxy
virtual machines are required to service the cloud. Associations between a vSphere cluster
and an Avamar pair are created via the Federation Enterprise Hybrid Cloud BaaS Associate
Avamar Pairs with vSphere Cluster catalog item.
Note: In a DR dual-site/dual vCenter topology, when a protected cluster is associated with an
Avamar pair, the associated recovery cluster is automatically associated with the same Avamar
pair to ensure continuity of service on failover.
Avamar designations
In the redundant Avamar/single vCenter configuration, there are two Avamar instances in
each pair, and both are assigned to monitor the same vCenter folder and to back up any
virtual machines that folder contains.
To ensure that this does not result in both instances backing up the same virtual machine
and then replicating each backup (four copies in total), the Federation Enterprise Hybrid
Cloud uses primary and secondary Avamar instances within each replication pair, and the
ability to reverse these personalities so that, in the event of a failure, backup and restore
operations can continue.
The primary Avamar instance is where all scheduled backups are executed. It is also the
instance that Federation Enterprise Hybrid Cloud on-demand backup and restore features
communicate with in response to dynamic user requests. The primary Avamar instance also
has all the currently active replication groups, making it responsible for replication of new
backup sets to the secondary Avamar instance.
The secondary Avamar instance has the same configurations for backup and replication
policies, except that BaaS workflows initially configure these policies in a disabled state. If
the primary Avamar instance becomes unavailable, the policies on the secondary Avamar
instance can be enabled via the Toggle Single Avamar Pair Designations catalog item to
enable backup and replication operations to continue.
Note: Avamar designations are only relevant in the redundant Avamar/single vCenter topology,
because the standard Avamar configuration does not have replication, and in the redundant
Avamar/dual vCenter configuration each member of a pair is configured to monitor a folder from
only one of the two vCenters.
Avamar proxy server configuration
To associate an Avamar pair with a vSphere cluster, an Avamar proxy virtual machine needs
to be deployed to that cluster.
Standard Avamar configuration
In single-site topologies, all proxies are on the same site. Therefore, the minimum number
of proxy virtual machines required per Avamar pair for each cluster is one. Two is
recommended for high availability, if there is scope within the overall number of proxies that
can be deployed to the environment. Ideally, this number should be in the region of 60 to
80 proxies.
Redundant Avamar/single vCenter configuration
As the virtual machines on every vSphere cluster could be backed up by either of the
members of an Avamar replication pair at different points in time, proxies for both the
primary and secondary Avamar instances of every associated Avamar replicated pair should
be deployed to every vSphere cluster. This means a minimum of two proxies is required.
Four proxies would provide additional resilience, if the scope exists within the overall
number of proxies that can be deployed to the environment.
If the environment also includes CA, then the proxies for the Site A Avamar instances should
be bound to Site A by using DRS virtual machine and host groups with a
virtual-machines-to-hosts rule that requires those virtual machines to run on a host DRS
group containing the Site A hosts. Similarly, proxies for the Site B Avamar instances should
be bound to Site B hosts.
This ensures that no unnecessary cross-WAN backups occur, as Avamar can use vStorage
APIs for Data Protection to add VMDKs (from the local leg of the VPLEX volume) to proxy
virtual machines bound to physical hosts on the same site as the primary Avamar instance.
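As an illustration of that site binding, the following is a minimal pyVmomi sketch that creates a VM group, a host group, and a mandatory virtual-machines-to-hosts rule on a cluster. The group and rule names are hypothetical, and in practice these rules are typically created through the vSphere Web Client.

# Minimal sketch (assumptions: an existing pyVmomi connection, and 'cluster',
# 'site_a_proxies', and 'site_a_hosts' already resolved as inventory objects).
from pyVmomi import vim

def bind_proxies_to_site_a(cluster, site_a_proxies, site_a_hosts):
    """Pin the Site A Avamar proxy VMs to Site A hosts with a mandatory DRS rule."""
    vm_group = vim.cluster.VmGroup(name="SiteA-Avamar-Proxies", vm=site_a_proxies)
    host_group = vim.cluster.HostGroup(name="SiteA-Hosts", host=site_a_hosts)
    rule = vim.cluster.VmHostRuleInfo(
        name="avamar-proxies-must-run-on-site-a",
        enabled=True,
        mandatory=True,                      # "must run on hosts in group" semantics
        vmGroupName="SiteA-Avamar-Proxies",
        affineHostGroupName="SiteA-Hosts",
    )
    spec = vim.cluster.ConfigSpecEx(
        groupSpec=[vim.cluster.GroupSpec(info=vm_group, operation="add"),
                   vim.cluster.GroupSpec(info=host_group, operation="add")],
        rulesSpec=[vim.cluster.RuleSpec(info=rule, operation="add")],
    )
    return cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)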
Redundant Avamar/dual vCenter configuration
In a dual-site/dual vCenter configuration, each vSphere cluster must have an Avamar proxy
virtual machine for the local Avamar instance of every Avamar replicated pair associated
with it. This ensures backups are taken locally and replicated to the other member of the
Avamar pair.
In a dual-site/dual vCenter configuration with DR, when a failover occurs, virtual machines
will be moved from the vCenter folders on Site A to their corresponding vCenter folders on
Site B, at which point the other member of the Avamar replication pair will assume
responsibility for backing up and restoring those virtual machines.
Therefore, each vSphere cluster still only requires a minimum of one Avamar proxy for
every Avamar instance that is associated with it. Two will provide extra resilience.
Note: In this configuration, if a failure of a single Avamar instance occurs without the failure of
the vCenter infrastructure on the same site, then the second member of the Avamar replication
pair will not automatically assume responsibility to back up virtual machines. To further protect
against this scenario, additional resilience can be added on each site by using an Avamar RAIN
grid.
Avamar administratively full
Determining that a backup target, in this case an Avamar instance, has reached capacity
can be based on a number of metrics of the virtual machines it is responsible for protecting,
including:
 The number of virtual machines assigned to the instance
 The total capacity of those virtual machines
 The rate of change of the data of those virtual machines
 The effective deduplication ratio that can be achieved while backing up those virtual machines
 The available network bandwidth and backup window size
Because using these metrics can be somewhat subjective, the Federation Enterprise Hybrid
Cloud provides the ability for an administrator to preclude an Avamar instance or Avamar
replication pair from being assigned further workload by setting a binary Administrative
Full flag via the Set Avamar to Administrative Full vRealize Automation catalog item.
When a virtual machine is enabled for data protection via Federation Enterprise Hybrid Cloud
BaaS workflows, the available Avamar instances are assessed to determine the most
suitable target. If an Avamar instance or Avamar replication pair has had the
Administrative Full flag set, then that instance/pair is excluded from the selection algorithm
but continues to back up its existing workloads through on-demand or scheduled backups.
If workloads are retired and an Avamar instance or pair is determined to have free capacity
again, the Administrative Full flag can be toggled back, returning that instance or pair to
the selection algorithm.
Policy-based replication
Policy-based replication provides granular control of the replication process. With
policy-based replication, you create replication groups in Avamar Administrator to define the
following replication settings:
 Members of the replication group, which are either entire domains or individual clients
 Priority for the order in which backup data replicates
 Types of backups to replicate based on the retention setting for the backup or the date on which the backup occurred
 Maximum number of backups to replicate for each client
 Destination server for the replicated backups
 Schedule for replication
 Retention period of replicated backups on the destination server
The redundant Avamar configurations automatically create a replication group associated
with each backup policy and configure it with a 60-minute stagger relative to the backup
policy's schedule. This enables the backups to complete before the replication starts.
Note: This schedule can be manually altered within the Avamar GUI, but it is important that you
make changes to both the primary and secondary versions of the replication group schedule so
that replication operates as required if the Avamar personalities are reversed.
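As a simple worked example of that stagger, assuming a hypothetical backup policy scheduled to start at 20:00, the associated replication group would be scheduled to start at 21:00:

# Trivial worked example of the 60-minute replication stagger (times are hypothetical).
from datetime import datetime, timedelta

backup_start = datetime(2016, 2, 1, 20, 0)            # backup policy schedule
replication_start = backup_start + timedelta(minutes=60)
print(replication_start.strftime("%H:%M"))            # prints 21:00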
Replication control
If Data Domain is used as a backup target, Avamar is responsible for replication of Avamar
data from the source Data Domain system to the destination Data Domain system. As a
result, all configuration and monitoring of replication is done via the Avamar server. This
includes the schedule on which Avamar data is replicated between Data Domain units.
You cannot schedule replication of data on the Data Domain system separately from the
replication of data on the Avamar server. There is no way to track replication by using Data
Domain administration tools.
Note: Do not configure Data Domain replication to replicate data to another Data Domain
system that is configured for use with Avamar. When you use Data Domain replication, the
replicated data does not refer to the associated remote Avamar server.
Architecture
This section describes the features of the standard Avamar configuration shown in Figure 38
and the environments where it may be used.
Figure 38. Standard Avamar configuration architecture
Scenarios for use
Best use
The most logical fit for a standard Avamar configuration is a single-site Federation
Enterprise Hybrid Cloud deployment.
Alternate uses
The standard Avamar configuration can be used in topologies such as CA dual-site and DR
dual-site topologies with the following caveats:
 The architecture provides no resilience on the secondary site in either of the dual-site topologies. If the site that hosts the Avamar instances is lost, then there is no ability to restore from backup.
 In the CA dual-site/single vCenter topology, any virtual machines that reside on the site with no Avamar instances present will back up across the WAN connection.
 In the DR dual-site/dual vCenter topology, any virtual machines that reside on the recovery site (and therefore are registered with a different vCenter) have no ability to back up.
In the standard Avamar configuration, if the Create Backup Service Level workflow creates
a folder named Daily-7yr, and there are four Avamar replication pairs present, then it will
configure the following backup policies with the Avamar replication pairs:
 Avamar-Pair0: Assigned to monitor vCenter folder Daily-7yr-Pair0
 Avamar-Pair1: Assigned to monitor vCenter folder Daily-7yr-Pair1
 Avamar-Pair2: Assigned to monitor vCenter folder Daily-7yr-Pair2
 Avamar-Pair3: Assigned to monitor vCenter folder Daily-7yr-Pair3
In this case, each pair has only one member, and therefore only one Avamar instance is
monitoring each folder.
Characteristics
The characteristics of the standard Avamar configuration are:
 All Avamar instances are standalone, that is, backup sets are not replicated to a secondary Avamar system.
 It works in the context of a single cloud vCenter only.
 All Avamar instances contain active backup policies.
Note: An Avamar instance can be set to administratively full and still have active backup policies.
 All Avamar instances are considered to be on the same site, and therefore the round-robin distribution of virtual machines to vCenter folders includes all Avamar instances that:
   Are assigned to the vSphere cluster that the virtual machine is on.
   Are not set to Administratively Full.
Distribution examples
The following scenarios convey how virtual machines are assigned to vCenter folders to
distribute load evenly across Avamar instances, assuming the following configuration, as
shown in Figure 38:
 Four Avamar instances and two vSphere clusters exist
 AV_REP_PAIR0 and AV_REP_PAIR1 are assigned to Cluster 1
 AV_REP_PAIR2 and AV_REP_PAIR3 are assigned to Cluster 2
Note: In this example all virtual machines are deployed to the backup policy named Daily-7yr.
Scenario 1: VM1 is deployed to Cluster 1 - No other workload virtual machines
exist
 AV_REP_PAIR2 and AV_REP_PAIR3 are ruled out because they are not assigned to Cluster 1.
 AV_REP_PAIR0 and AV_REP_PAIR1 are identified as potential targets.
 The expected results are:
   The virtual machine is deployed to Cluster 1.
   It is placed in a folder named Daily-7yr-Pair0, indicating assignment to AV_REP_PAIR0. AV_REP_PAIR1 is an equally viable candidate as both grids are empty, but AV_REP_PAIR0 is selected based on numerical order.
Scenario 2: VM2 is deployed to Cluster 1 - VM1 exists
 AV_REP_PAIR2 and AV_REP_PAIR3 are ruled out because they are not assigned to Cluster 1.
 AV_REP_PAIR0 and AV_REP_PAIR1 are identified as potential targets.
 The expected results are:
   The virtual machine is deployed to Cluster 1.
   It is placed in a folder named Daily-7yr-Pair1 indicating assignment to AV_REP_PAIR1 because the round-robin algorithm determined that AV_REP_PAIR1 had fewer virtual machines than the other candidate AV_REP_PAIR0.
VM3 and VM4, deployed to Cluster 2, follow similar logic: VM3 ends up being managed by
AV_REP_PAIR2, while VM4 is managed by AV_REP_PAIR3.
Architecture
This section describes the features of the redundant Avamar/single vCenter configuration
shown in Figure 39 and the environments where it can be used.
Figure 39. Redundant Avamar/single vCenter configuration
Scenarios for use
Best use
The most logical fit for a redundant Avamar/single vCenter configuration is a dual-site/single
vCenter Federation Enterprise Hybrid Cloud deployment.
Alternate uses
The redundant Avamar/single vCenter configuration can be used in the single-site topology
with no caveats to provide a backup infrastructure that can tolerate the loss of a physical
Avamar instance.
Note: The redundant Avamar/single vCenter should not be used in a DR dual-site topology
because doing so imposes caveats that can be overcome using the redundant Avamar/dual
vCenter configuration without the need for any extra components.
vCenter folder assignments
In the redundant Avamar/single vCenter configuration, if the Create Backup Service level workflow creates a folder called Daily-7yr and there are four Avamar replication pairs present, then it configures the following backup policies with the Avamar replication pairs:

Avamar-Pair0: Assigned to monitor vCenter folder Daily-7yr-Pair0

Avamar-Pair1: Assigned to monitor vCenter folder Daily-7yr-Pair1

Avamar-Pair2: Assigned to monitor vCenter folder Daily-7yr-Pair2

Avamar-Pair3: Assigned to monitor vCenter folder Daily-7yr-Pair3
As there is only one vCenter, and therefore only one vCenter folder per Avamar replication pair, each Avamar instance in the pair is configured to monitor the same vCenter folder. At this point, the concept of primary and secondary Avamar members is employed to ensure that only one member of the pair is actively backing up and replicating the virtual machines at any given point in time.
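The mapping from a backup service level to per-pair vCenter folders and backup policies can be illustrated with a minimal sketch. This is not the actual Create Backup Service level workflow code; the function name and structure are hypothetical and follow only the naming pattern described above.

# Illustrative sketch: given a service-level name and the number of Avamar
# replication pairs, derive the vCenter folder that each pair's policy monitors.

def folder_assignments(service_level, pair_count):
    """Return {policy_name: folder_name} for each Avamar replication pair."""
    assignments = {}
    for pair in range(pair_count):
        policy = f"Avamar-Pair{pair}"            # one backup policy per pair
        folder = f"{service_level}-Pair{pair}"   # vCenter folder that the policy monitors
        assignments[policy] = folder
    return assignments

# Example: four replication pairs and a service level named Daily-7yr
print(folder_assignments("Daily-7yr", 4))
# {'Avamar-Pair0': 'Daily-7yr-Pair0', ..., 'Avamar-Pair3': 'Daily-7yr-Pair3'}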
Characteristics
The characteristics of the redundant Avamar/single vCenter configuration are:

All Avamar instances are configured in pairs and all backups are replicated.

It works in the context of a single cloud vCenter only.

Fifty percent of the Avamar instances have active backup and replication policies at any given point in time (50 percent of the Avamar instances are primary, 50 percent are secondary).
Note: Primary means that the backup policies on that instance are enabled. An Avamar
instance can be set to administratively full and still be considered primary.

Avamar replication pairs are defined as split across sites, and therefore the round-robin distribution of virtual machines to vCenter folders includes all Avamar pairs that:

Are assigned to the vSphere cluster that the virtual machine is on.

Have their primary member on the same site as the virtual machine DRS Affinity
group that the virtual machine is a member of.

Are not set to Administratively Full.
Distribution
examples
The following scenarios convey how virtual machines are assigned to vCenter folders in
order to distribute load evenly across Avamar instances, assuming the following
configuration (as shown in Figure 39):

Six primary Avamar instances, six secondary instances, and two vSphere clusters exist

AV_REP_PAIR0 through AV_REP_PAIR3 are assigned to Cluster 1

AV_REP_PAIR4 and AV_REP_PAIR5 are assigned to Cluster 2
Note: In this example, all virtual machines are deployed to a backup policy named Daily-7yr.
Scenario 1: VM1 is deployed to Cluster 1, Site A - No other workload virtual
machines exist

AV_REP_PAIR4 and AV_REP_PAIR5 are ruled out because they are not assigned to
Cluster 1.

AV_REP_PAIR1 and AV_REP_PAIR3 are ruled out for being primary on Site B.

AV_REP_PAIR0 and AV_REP_PAIR2 are identified as potential targets.

The expected results are:

The virtual machine is deployed to Cluster 1 – Host CL1-H1.

It is placed in a folder named Daily-7yr-Pair0 indicating assignment to
AV_REP_PAIR0. AV_REP_PAIR2 is an equally viable candidate as both grids are
empty, but AV_REP_PAIR0 is chosen based on numerical order.
Scenario 2: VM2 is deployed to Cluster 1, Site A - VM1 exists

AV_REP_PAIR4 and AV_REP_PAIR5 are ruled out because they are not assigned to
Cluster 1.

AV_REP_PAIR1 and AV_REP_PAIR3 are ruled out because their primary instances
are on Site B.

AV_REP_PAIR0 and AV_REP_PAIR2 are identified as potential targets.

The expected results are:

The virtual machine is deployed to Cluster 1 – Host CL1-H1.

It is placed in a folder named Daily-7yr-Pair2 indicating assignment to
AV_REP_PAIR2 because the round-robin algorithm determined that
AV_REP_PAIR2 had fewer virtual machines than the other candidate
AV_REP_PAIR0.
Scenario 3: VM3 is deployed to Cluster 1, Site B - VM1 and VM2 exist

AV_REP_PAIR4 and AV_REP_PAIR5 are ruled out because they are not assigned to
Cluster 1.

AV_REP_PAIR0 and AV_REP_PAIR2 are ruled out because their primary instances
are on Site A.

AV_REP_PAIR1 and AV_REP_PAIR3 are identified as potential targets.

The expected results are:

The virtual machine is deployed to Cluster 1 – Host CL1-H2.

It is placed in a folder named Daily-7yr-Pair1 indicating assignment to
AV_REP_PAIR1. AV_REP_PAIR3 is an equally viable candidate as both grids are
empty, but AV_REP_PAIR1 is selected based on numerical order.
Scenario 4: VM4 is deployed to Cluster 1, Site B - VM1, VM2, and VM3 exist

AV_REP_PAIR4 and AV_REP_PAIR5 are ruled out because they are not assigned to
Cluster 1.

AV_REP_PAIR0 and AV_REP_PAIR2 are ruled out because their primary instances
are on Site A.

AV_REP_PAIR1 and AV_REP_PAIR3 are identified as potential targets.

The expected results are:

The virtual machine is deployed to Cluster 1 – Host CL1-H2.

It is placed in a folder named Daily-7yr-Pair3 indicating assignment to
AV_REP_PAIR3 because the round-robin algorithm determined that
AV_REP_PAIR3 had fewer virtual machines than the other candidate
AV_REP_PAIR1.
When VM5 and VM6 are deployed to Cluster 2, the same logic dictates that VM5 is managed
by AV_REP_PAIR4 while VM6 is managed by AV_REP_PAIR5 based on the Cluster to
Avamar Pair mappings.
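In the redundant Avamar/single vCenter case, the only difference from the earlier selection sketch is an additional filter on the site of each pair's primary member. The following is a minimal, hypothetical illustration of that extra rule, not the product's implementation.

# Hypothetical sketch of the extra site-affinity rule for redundant Avamar pairs:
# only pairs whose primary member is on the same site as the virtual machine's
# DRS affinity group remain candidates for round-robin placement.

def eligible_pairs(pairs, cluster, vm_site):
    return [p for p in pairs
            if p["cluster"] == cluster           # assigned to the VM's vSphere cluster
            and p["primary_site"] == vm_site     # primary member on the VM's site
            and not p["admin_full"]]             # not set to Administratively Full

pairs = [
    {"name": "AV_REP_PAIR0", "cluster": "Cluster 1", "primary_site": "A", "admin_full": False},
    {"name": "AV_REP_PAIR1", "cluster": "Cluster 1", "primary_site": "B", "admin_full": False},
    {"name": "AV_REP_PAIR2", "cluster": "Cluster 1", "primary_site": "A", "admin_full": False},
    {"name": "AV_REP_PAIR3", "cluster": "Cluster 1", "primary_site": "B", "admin_full": False},
]
print([p["name"] for p in eligible_pairs(pairs, "Cluster 1", "A")])
# ['AV_REP_PAIR0', 'AV_REP_PAIR2'] -- matching Scenario 1 above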
Architecture
This section describes the features of the redundant Avamar/dual vCenter configuration
shown in Figure 40 and the environments where it may be used.
Figure 40. Redundant Avamar/dual vCenter configuration
Scenarios for use
Best use
The most logical fit for a redundant Avamar/dual vCenter configuration is a dual-site/dual
vCenter Federation Enterprise Hybrid Cloud deployment.
Alternate uses
There are no valid alternate uses for this configuration as no other topology uses dual-cloud
vCenters.
vCenter folder
assignments
In the redundant Avamar/dual vCenter configuration, if the Create Backup Service level workflow creates a folder named Daily-7yr, and there are four Avamar replication pairs present, then it configures the following backup policies with the Avamar replication pairs:

Avamar-Pair0: Assigned to monitor vCenter folder Daily-7yr-Pair0

Avamar-Pair1: Assigned to monitor vCenter folder Daily-7yr-Pair1

Avamar-Pair2: Assigned to monitor vCenter folder Daily-7yr-Pair2

Avamar-Pair3: Assigned to monitor vCenter folder Daily-7yr-Pair3
Because there are two vCenters, each Avamar instance in a pair is configured to monitor one of the two corresponding vCenter folders, that is, the instance on Site A monitors the folder from the Site A vCenter, and the instance from Site B monitors the folder from the Site B vCenter. As a virtual machine can only be in one of the two folders (even in a DR dual-site/dual vCenter topology), there is no duplication of backups.
Note: When VMware Site Recovery Manager is used, placeholder virtual machines are created as part of the Site Recovery Manager protection process. To ensure that Avamar does not detect these placeholder virtual machines, additional folders are created in each vCenter with a ‘_PH’ suffix, and placeholder virtual machines are located in these folders via Site Recovery Manager folder mappings. Before failing over a DR cluster, run the Prepare for DP Failover catalog item. This moves the production virtual machines out of their service level folders on the protected site, so that their placeholders are not created in Avamar-monitored folders when Site Recovery Manager re-protects the virtual machines after failover.
Characteristics
The characteristics of the redundant Avamar/dual vCenter configuration are:

All Avamar instances are configured in pairs and all backups are replicated.

It works in the context of a dual-cloud vCenter only.
Note: An Avamar instance can be set to Administratively Full and still have active backup and replication policies.

Avamar replication pairs are defined as being split across sites, and therefore the round-robin distribution of virtual machines to vCenter folders includes all Avamar pairs that are:

Assigned to the vSphere cluster that the virtual machine is on.

Not set to Administratively Full.
Distribution examples
The following scenarios convey how virtual machines are assigned to vCenter folders in
order to distribute load evenly across Avamar instances, assuming the following
configuration (as shown in Figure 40):

Six active Avamar instances (in three Avamar replication pairs), three protected
vSphere clusters, three recovery vSphere clusters, and two local clusters exist

AV_REP_PAIR0 and AV_REP_PAIR1 are assigned to Clusters 1 through 6

AV_REP_PAIR2 is assigned to Clusters 7 and 8
Note: All virtual machines are deployed to the backup policy named Daily-7yr for the example.
Scenario 1: VM1 is deployed to Cluster 1 - No other workload virtual machines
exist

AV_REP_PAIR2 is ruled out because it is not assigned to Cluster 1.

AV_REP_PAIR0 and AV_REP_PAIR1 are identified as potential targets.

The expected results are:

The virtual machine is deployed to Cluster 1 – Host CL1-H1.

It is placed in a folder named Daily-7yr-Pair0 indicating assignment to
AV_REP_PAIR0. AV_REP_PAIR1 is an equally viable candidate as both grids are
empty, but AV_REP_PAIR0 is selected based on numerical order.

Because Cluster 1 is on Site A, AV_INSTANCE_00 will back up the virtual machine
and replicate the backups to AV_INSTANCE_01.
Scenario 2: VM2 is deployed to Cluster 1 - VM1 exists

AV_REP_PAIR2 is ruled out as it is not assigned to Cluster 1.

AV_REP_PAIR0 and AV_REP_PAIR1 are identified as potential targets

The expected results are:

The virtual machine is deployed to Cluster 1 – Host CL1-H1.

It is placed in a folder named Daily-7yr-Pair1 indicating assignment to
AV_REP_PAIR1 because the round-robin algorithm determined that
AV_REP_PAIR1 had fewer virtual machines than the other candidate,
AV_REP_PAIR0.

Because Cluster 1 is on Site A, AV_INSTANCE_02 will back up the virtual machine
and replicate the backups to AV_INSTANCE_03.
Scenario 3: VM3 is deployed to Cluster 3 - VM1 and VM2 exist

AV_REP_PAIR2 is ruled out as it is not assigned to Cluster 3.

AV_REP_PAIR0 and AV_REP_PAIR1 are identified as potential targets.

The expected results are:

The virtual machine is deployed to Cluster 3 – Host CL3-H1.

It is placed in a folder named Daily-7yr-Pair0 indicating assignment to
AV_REP_PAIR0. AV_REP_PAIR1 is an equally viable candidate as both have equal
virtual machines (one assigned to each pair globally, but none to the instances on
Site B), but AV_REP_PAIR0 is selected based on numerical order.
Scenario 4: VM4 is deployed to Cluster 3 - VM1, VM2, and VM3 exist

AV_REP_PAIR2 is ruled out, because it is not assigned to Cluster 3.

AV_REP_PAIR0 and AV_REP_PAIR1 are identified as potential targets.

The expected results are:

The virtual machine is deployed to Cluster 3 – Host CL3-H1.

It is placed in a folder named Daily-7yr-Pair1 indicating assignment to
AV_REP_PAIR1 because the round-robin algorithm determined that
AV_REP_PAIR1 had fewer virtual machines than the other candidate
AV_REP_PAIR0.

Because Cluster 3 is on Site B, AV_INSTANCE_03 will back up the virtual machine
and replicate the backups to AV_INSTANCE_02.
Scenario 5: VM5 is deployed to Cluster 5 - VM1, VM2, VM3 and VM4 exist

AV_REP_PAIR2 is ruled out, because it is not assigned to Cluster 5.

AV_REP_PAIR0 and AV_REP_PAIR1 are identified as potential targets.

The expected results are:

The virtual machine is deployed to Cluster 5 – Host CL5-H1.

It is placed in a folder named Daily-7yr-Pair0 indicating assignment to
AV_REP_PAIR0. AV_REP_PAIR1 is an equally viable candidate because both have
equal virtual machines (two assigned to each pair globally), but AV_REP_PAIR0 is
selected based on numerical order.

Because Cluster 5 is on Site A, AV_INSTANCE_00 will back up the virtual machine
and replicate the backups to AV_INSTANCE_01.
Scenario 6: VM6 is deployed to Cluster 6 - VM1, VM2, VM3, VM4 and VM5 exist

AV_REP_PAIR2 is ruled out, because it is not assigned to Cluster 6.

AV_REP_PAIR0 and AV_REP_PAIR1 are identified as potential targets.

The expected results are:

The virtual machine is deployed to Cluster 6 – Host CL6-H1.

It is placed in a folder named Daily-7yr-Pair1 indicating assignment to
AV_REP_PAIR1 because the round-robin algorithm determined that
AV_REP_PAIR1 had fewer virtual machines than the other candidate
AV_REP_PAIR0.

Because Cluster 6 is on Site B, AV_INSTANCE_03 will back up the virtual machine
and replicate the backups to AV_INSTANCE_02.
Scenario 7: VM7 is deployed to Cluster 7 - VM1, VM2, VM3, VM4, VM5 and VM6
exist

AV_REP_PAIR0 and AV_REP_PAIR1 are ruled out, because they are not assigned to
Cluster 7.

AV_REP_PAIR2 is identified as the only potential target.

The expected results are:

The virtual machine is deployed to Cluster 7 – Host CL7-H1.

It is placed in a folder named Daily-7yr-Pair2 indicating assignment to
AV_REP_PAIR2, the only viable candidate.

Because Cluster 7 is on Site A, AV_INSTANCE_04 will back up the virtual machine
and replicate the backups to AV_INSTANCE_05.
Redundant
Avamar/dual
vCenter proxy
example
Figure 41 shows an example of how proxies might be configured in a redundant Avamar/dual vCenter environment.
Figure 41. Redundant Avamar/dual vCenter proxy example
Chapter 8: Solution Rules and Permitted Configurations
This chapter presents the following topics:
Overview ..........................................................................................................96
Architectural assumptions ...................................................................................96
VMware Platform Services Controller ....................................................................96
VMware vRealize tenants and business groups .......................................................98
EMC ViPR tenants and projects ............................................................................99
General storage considerations .......................................................................... 100
VMware vCenter endpoints ................................................................................ 100
Permitted topology configurations ...................................................................... 101
Permitted topology upgrade paths ...................................................................... 102
Bulk import of virtual machines ......................................................................... 103
DR dual-site/dual vCenter topology restrictions.................................................... 104
Resource sharing ............................................................................................. 106
Data protection considerations ........................................................................... 106
Software resources .......................................................................................... 106
Sizing guidance ............................................................................................... 106
This chapter looks at the rules, configurations, and dependencies between the Federation
Enterprise Hybrid Cloud components and their constructs, outlining how this influences the
supported configurations within the cloud.
Assumptions and justifications
The following assumptions and justifications apply to the Federation Enterprise Hybrid Cloud
architecture:

The appliance-based version of vCenter is not supported. The vCenter Server full installation is used because it:

Provides support for an external Microsoft SQL Server database

Resides on a Windows System that also supports the VMware Update Manager™
service, enabling minimal resource requirements in smaller configurations

VMware Platform Services Controller is used instead of the vRealize Automation
Identity Appliance because it supports the multisite, single sign-on requirements of the
solution

Windows-based Platform Services Controllers are used as they are the natural upgrade
path from previous versions of Federation Enterprise Hybrid Cloud

The appliance-based versions of Platform Services Controllers are not supported
This solution uses VMware Platform Services Controller in place of the vRealize Automation
Identity Appliance. VMware Platform Services Controller is deployed on the dedicated virtual
machine server in each Core Pod (multiple Core Pods exist in the DR dual-site/dual vCenter
topology) and an additional Platform Services Controller (Auto-Platform Services Controller)
is deployed on a server in the Automation Pod.
The Auto-Platform Services Controller server provides authentication services to all the
Automation Pod management components requiring Platform Services Controller integration.
This configuration enables authentication services to fail over with the other automation
components and enables a seamless transition between Site A and Site B. There is no need
to change IP addresses, DNS, or management component settings.
Platform Services
Controller
domains
The Federation Enterprise Hybrid Cloud uses one or more Platform Services Controller
domains depending on the management platform deployed. Platform Services Controller
instances are configured within those domains according to the following model:


External Platform Services Controller domain (distributed management model only)

First external Platform Services Controller

Second external Platform Services Controller (DR dual-site/dual vCenter topology
only)
Cloud SSO domain (All topologies)

First Cloud Platform Services Controller

Automation Pod Platform Services Controller

Second Cloud Platform Services Controller (DR dual-site/dual vCenter topology
only)
Figure 42 shows the Platform Services Controller domains and how each required Platform Services Controller instance and domain is configured.
Figure 42. SSO domain and vCenter SSO instance relationships
First Platform Services Controller instance in each single sign-on domain
The first VMware Platform Services Controller in each single sign-on domain is deployed by creating a new vCenter Single Sign-On domain, enabling it to participate in the default vCenter Single Sign-On namespace (vsphere.local). This primary Platform Services Controller server supports identity sources such as Active Directory, OpenLDAP, local operating system users, and SSO embedded users and groups.
This is the default deployment mode when you install VMware Platform Services Controller.
Subsequent
vCenter Single
Sign-On
instances in each
single sign-on
domain
Additional VMware Platform Services Controller instances are installed by joining the new Platform Services Controller to an existing single sign-on domain, making them part of the existing domain but in a new SSO site. When you create Platform Services Controller servers in this fashion, the deployed Platform Services Controller instances become members of the same authentication namespace as the first Platform Services Controller instance. This deployment mode should only be used after you have deployed the first Platform Services Controller instance in each single sign-on domain.
In vSphere 6.0, VMware Platform Services Controller single sign-on data (such as policies,
solution users, application users, and identity sources) is automatically replicated between
each Platform Services Controller instance in the same authentication namespace every 30
seconds.
vRealize tenant
design
The Federation Enterprise Hybrid Cloud can operate using single or multiple vRealize
Automation tenants.
STaaS operations rely on the tenant URL value configured as part of the vRealize
Automation tenant, and therefore require an individual vRealize Orchestrator server per
additional tenant, if the ability for multiple tenants to execute STaaS operations is required.
The Federation Enterprise Hybrid Cloud foundation package needs to be installed on each of
the vRealize Orchestrator servers, entering the relevant tenant URL during installation. This
is also required in order to populate the vRealize Automation catalog in each tenant with the
relevant STaaS catalog items.
vRealize tenant
best practice
As the vRealize Automation IaaS administrator is a system-wide role, having multiple
tenants configure endpoints and execute STaaS operations may not provide any additional
value over and above the use of a single tenant with multiple business groups. Therefore,
while multiple tenants are supported, Federation Enterprise Hybrid Cloud is normally
deployed with a single tenant with respect to STaaS and Data Protection operations.
vRealize business
group design
Federation Enterprise Hybrid Cloud uses two system business groups in each non-default
tenant. The first, EHCSystem, is used as the target for installation of the vRealize
Automation advanced services STaaS catalog items. It does not require any compute
resources. The second system business group, EHCOperations, is used as the group where
Federation Enterprise Hybrid Cloud storage administrators are configured. It is given
entitlements to the STaaS and Cluster Onboarding catalog items. It has no compute
resource requirements.
vRealize business
group best
practice
The Federation recommends that applications provisioned using vRealize Automation
Application Services each have a separate business group per application type to enable
administrative separation of blueprint creation and manipulation.
Figure 43 shows an example where the EHCSystem and EHCOperations system business
groups are configured alongside three tenant business groups (IT, HR, and Manufacturing)
and three application business groups used by vRealize Automation Application Services for
Microsoft SharePoint, Oracle, and Microsoft Exchange.
Figure 43.
Software-defined data center tenant design and endpoints
ViPR tenants
The Federation Enterprise Hybrid Cloud uses a single ViPR tenant. The default provider
tenant or an additional non-default tenant may be used.
ViPR projects
Federation Enterprise Hybrid Cloud STaaS operations rely on a correlation between the
tenant URL value of the user executing the request and a ViPR project name. Therefore, to
enable STaaS for an additional vRealize tenant, you must create a corresponding ViPR
project whose name and case match that of the vRealize tenant URL.
As each project can have a maximum total storage capacity (quota) associated with it that
cannot be exceeded, the use of multiple ViPR projects enables multiple vRealize Automation
tenants within the Federation Enterprise Hybrid Cloud to provision storage from the same
storage endpoints in a controlled or limited fashion.
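Because the correlation is on the project name and the match is case sensitive, a simple pre-check can catch mismatches before a STaaS request fails. The following is a hypothetical validation sketch for illustration only; it is not part of the product and the names are assumptions.

# Hypothetical pre-check: verify that every vRealize tenant URL that should be
# STaaS-enabled has a ViPR project whose name matches exactly, including case.

def missing_projects(tenant_urls, vipr_project_names):
    """Return tenant URLs that have no case-exact matching ViPR project."""
    projects = set(vipr_project_names)           # exact, case-sensitive comparison
    return [url for url in tenant_urls if url not in projects]

tenants = ["tenant-it", "Tenant-HR"]
projects = ["tenant-it", "tenant-hr"]            # note the case difference for HR
print(missing_projects(tenants, projects))       # ['Tenant-HR'] -> project name must match case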
ViPR consistency
groups
ViPR consistency groups are an important component of the CA and DR topologies for the
Federation Enterprise Hybrid Cloud solution. Consistency groups logically group volumes
within a project to ensure that a set of common properties is applied to an entire group of
volumes during a fault event. This ensures host-to-cluster or application-level consistency
when a failover occurs.
Consistency groups are created by Federation Enterprise Hybrid Cloud STaaS operations and
are specified when CA or DR-protected volumes are provisioned. Consistency group names
must be unique within the ViPR environment.
When used with VPLEX in the CA dual-site/single vCenter topology, these consistency groups are created per physical array, per vSphere cluster, and per site.
When used with RecoverPoint in a DR dual-site/dual vCenter configuration, these consistency groups are created in a 1:1 relationship with the vSphere datastore/LUN.
vSphere datastore clusters
The Federation Enterprise Hybrid Cloud does not support datastore clusters for the following reasons:
1. Linked clones do not work with datastore clusters, causing multi-machine blueprints to fail unless configured with explicitly different reservations for edge devices.
2. vRealize Automation already performs capacity analysis during initial placement.
3. Datastore clusters can result in inconsistent behavior when virtual machines report their location to vRealize Automation. Misalignment with vRealize reservations can make the virtual machine un-editable.
4. Federation Enterprise Hybrid Cloud STaaS operations do not place new LUNs into datastore clusters, so all datastore clusters would have to be maintained manually.
5. Specific to DR:
a. Day 2 Storage DRS migrations between datastores would break the Site Recovery Manager protection for the virtual machines moved.
b. Day 2 Storage DRS migrations between datastores would result in re-replicating the entire virtual machine to the secondary site.
VMware Raw Device Mappings (RDMs)
VMware Raw Device Mappings are not created by or supported by Federation Enterprise Hybrid Cloud STaaS services. If RDMs are created outside of STaaS services, any issues arising from their use are not supported by Federation Enterprise Hybrid Cloud customer support. If changes are required in the environment to make them operate correctly, a Federation Enterprise Hybrid Cloud RPQ should be submitted first for approval. Federation Enterprise Hybrid Cloud support teams may request that you back out any change made to the environment to facilitate RDMs.
VMware vCenter endpoints
Multiple vCenter endpoints are supported within Federation Enterprise Hybrid Cloud. However, depending on the topology chosen, there are certain considerations, as outlined in this section.
Single-site/single vCenter and dual-site/single vCenter topologies
These topologies can support:

Only one vCenter per tenant with the ability to execute STaaS services.

More than one vCenter per tenant as long as the second and subsequent vCenter
endpoints do not require STaaS or BaaS services.
Enabling additional STaaS-enabled vCenter endpoints
To enable additional STaaS-enabled vCenter endpoints, these topologies require a separate
vRealize tenant per vCenter endpoint for the following reasons:

Federation Enterprise Hybrid Cloud STaaS catalog items use vRealize Orchestrator
through the vRealize Automation advanced server configuration, which only allows one
vRealize Orchestrator to be configured per tenant.

This vRealize Orchestrator stores important vCenter configuration details gathered
during the Federation Enterprise Hybrid Cloud Foundation installation process.

To store additional vCenter configuration details, an additional vRealize Orchestrator is
required.

Specifying the additional vRealize Orchestrator as an advanced server configuration
requires an additional tenant.
Each vCenter endpoint requires its own independent vRealize Orchestrator server and NSX
Manager instance.

The vRealize Orchestrator consideration is based on the additional tenant
consideration above.

The NSX Manager requirement is based on the VMware requirement for a 1:1
relationship between vCenter and NSX.
Enabling additional BaaS-enabled vCenter endpoints
These topologies require independent Avamar instances for each vCenter endpoint to enable
BaaS services.
Note: As backup service level names use a common vRealize dictionary, backup service levels
created by each tenant will be visible to all tenants. Therefore, it is advisable to name backup
service levels to indicate which tenant they were created for. This enables an operator to
identify the backup service levels relevant to them.
Dual-site/dual
vCenter
topologies
This topology can support:

Two vCenters per tenant with the ability to execute STaaS and BaaS services.

More than two vCenters per tenant as long as the third and subsequent vCenter
endpoints do not require STaaS, BaaS or DRaaS services.
Enabling additional STaaS and BaaS-enabled vCenter endpoints
Additional STaaS and BaaS enabled vCenter endpoints require additional tenants and
Avamar instances similar to the single vCenter topologies.
Note: As backup service level names use a common vRealize dictionary, backup service levels
created by each tenant will be visible to all tenants. Therefore it is advisable to name backup
service levels to indicate which tenant they were created for. This enables an operator to
identify the backup service levels relevant to them.
Combining
topologies
The following configurations are permitted for each Federation Enterprise Hybrid Cloud
instance:

Local only (single-site/single vCenter)

Local plus CA combined


Uses the CA dual-site/single vCenter topology and provides local-only and CA
functionality via distinct Workload Pods with separate networks and storage
Local plus DR combined

Uses the DR dual-site/dual vCenter topology and provides local-only and DR
functionality via distinct Workload Pods with separate networks and storage
Note: Federation Enterprise Hybrid Cloud 3.5 does not support both DR and CA functionality on
the same Federation Enterprise Hybrid Cloud instance.
Single site to continuous availability upgrade
Single-site Federation Enterprise Hybrid Cloud deployments can be upgraded to the CA dual-site/single vCenter topology by adopting either an online or offline upgrade approach, with the following considerations.
Considerations

The topology upgrade is an EMC professional services engagement and provides three basic methods of conversion based on the original storage design:

NFS to VPLEX Distributed VMFS (Online via Storage vMotion)

Standard VMFS to VPLEX Distributed VMFS (Offline via VPLEX encapsulation)

VPLEX Local VMFS to VPLEX Distributed VMFS (Online via VPLEX Local to VPLEX
Metro conversion)

If NFS is in use for management platform storage, then new VPLEX storage is
required.

In non-BaaS environments, local workloads can be migrated to new CA clusters using
storage vMotion if required.
Note: Federation Enterprise Hybrid Cloud 3.5 does not currently provide an automated
mechanism to achieve this. Contact EMC Professional Services to assist in this process.

Existing local-only workload clusters may remain as local-only clusters or be converted
to CA-enabled clusters after the topology upgrade.
Note: EMC Professional Services should execute this process

In BaaS environments, virtual machines requiring CA protection should remain on the
original cluster and the cluster should be converted to a CA-enabled cluster.
This is due to the need to carefully manage the relationships of vSphere clusters,
Avamar grids, Avamar proxies, and vCenter folder structure to preserve the ability to
restore backups taken prior to the topology upgrade.

After the topology upgrade, new clusters can be provisioned to provide CA or local-only functionality for new tenant virtual machines.
Single-site to disaster recovery upgrade
Single-site Federation Enterprise Hybrid Cloud deployments can be upgraded to the DR dual-site/dual vCenter topology, with the following considerations:
Considerations

Additional Core and NEI Pod infrastructure and components need to be deployed on
the second site.

Additional Automation Pod infrastructure needs to be deployed on the second site to
become the target for the Automation Pod failover.

EMC RecoverPoint needs to be installed and configured and all Automation Pod LUNs
replicated to the second site.
Note: If NFS volumes were used for Automation Pod storage then new FC-based block
datastores should be provided, and the Automation Pod components migrated to the new
storage using Storage vMotion.

Prior to the upgrade, the Automation Pod components must be deployed on a distinct
network segment from the Core and NEI Pods.

A Microsoft SQL Server instance and a vCenter Single Sign-On role must be deployed
to a server in the Automation Pod during the initial deployment.

Migration of previously existing virtual machines from local to DR clusters is not
currently supported with default functionality.
Note: If there is a requirement to DR-enable pre-existing tenant workloads, contact EMC Services teams to provide this as custom functionality.
Importing virtual machines and adding Federation Enterprise Hybrid Cloud services
For environments that require existing virtual machines to be imported into the Federation Enterprise Hybrid Cloud, the bulk import feature of vRealize Automation enables the import of one or more virtual machines.
This functionality is available only to vRealize Automation users who have Fabric
Administrator and Business Group Manager privileges. The Bulk Import feature imports
virtual machines intact with defining data such as reservation, storage path, blueprint,
owner, and any custom properties.
Federation Enterprise Hybrid Cloud offers the ability to layer Federation Enterprise Hybrid
Cloud services onto pre-existing virtual machines by using and extending the bulk import
process. Before beginning the bulk import process, the following conditions must be true:

Target virtual machines are located in a Federation Enterprise Hybrid Cloud vCenter endpoint.
Note: This does not include additional IaaS-only vCenter endpoints, if any are present.


Target virtual machines must be located on the correct vRealize Automation managed
compute resource cluster and that cluster must already be on-boarded as a Federation
Enterprise Hybrid Cloud cluster.

In cases where DR services are required for the target virtual machines, then they
must be on a DR-enabled cluster.

In cases where data protection services are required for the target virtual
machines, then they must be on a cluster that is associated with an Avamar pair.
Target virtual machines must be located on the correct vRealize Automation managed
datastore.

In cases where DR services are required for the target virtual machines, then they
must be on a datastore protected by EMC RecoverPoint.

In cases where data protection services are required for the target virtual
machines, then they must be on a datastore that is registered with an EMC Avamar
grid.
Note: The process for importing these virtual machines and adding Federation Enterprise Hybrid
Cloud services is documented in the Federation Enterprise Hybrid Cloud 3.5: Administration
Guide.
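The preconditions above lend themselves to a simple pre-flight check before running the bulk import. The sketch below is purely illustrative: the field names and the inventory source are hypothetical, and the authoritative procedure is the one documented in the Administration Guide.

# Hypothetical pre-flight check for layering services onto imported virtual machines.
# 'vm' is a dict describing where the machine currently resides; the real checks are
# performed against vCenter, vRealize Automation, and Avamar/RecoverPoint inventory.

def import_ready(vm, wants_dr=False, wants_backup=False):
    problems = []
    if not vm["on_ehc_vcenter"]:
        problems.append("not in a Federation Enterprise Hybrid Cloud vCenter endpoint")
    if not vm["cluster_onboarded"]:
        problems.append("cluster not on-boarded as a Federation Enterprise Hybrid Cloud cluster")
    if wants_dr and not (vm["cluster_dr_enabled"] and vm["datastore_rp_protected"]):
        problems.append("DR requires a DR-enabled cluster and a RecoverPoint-protected datastore")
    if wants_backup and not (vm["cluster_has_avamar_pair"] and vm["datastore_registered_with_avamar"]):
        problems.append("backup requires an Avamar-associated cluster and a registered datastore")
    return problems

vm = {"on_ehc_vcenter": True, "cluster_onboarded": True, "cluster_dr_enabled": False,
      "datastore_rp_protected": False, "cluster_has_avamar_pair": True,
      "datastore_registered_with_avamar": True}
print(import_ready(vm, wants_dr=True, wants_backup=True))
# ['DR requires a DR-enabled cluster and a RecoverPoint-protected datastore']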
Multimachine
blueprints
Load balancers cannot be deployed as part of a protected multimachine blueprint. However,
you can manually edit the upstream Edge to include load-balancing features for a newly
deployed multimachine blueprint.
vRealize
Automation
Failover state operations
Provisioning of virtual machines to a protected DR cluster is permitted at any time, as long
as that site is operational. If you provision a virtual machine while the recovery site is
unavailable due to vCenter Site Recovery Manager disaster recovery failover, you need to
run the DR Remediation catalog item to bring it into protected status when the recovery
site is back online.
During STaaS provisioning of a protected datastore, Federation Enterprise Hybrid Cloud
workflows issue a DR auto-protect attempt for the new datastore with vCenter Site Recovery
Manager. If both sites are operational when the request is issued, this should be successful.
If, however, one site is offline (vCenter Site Recovery Manager Disaster Recovery Failover)
when the request is made, the datastore will be provisioned, but you must run the DR
Remediation catalog item to bring it into a protected status.
Note: The DR Remediation catalog item can be run at any time to ensure that all DR items
are protected correctly.
Failover
granularity
While replication is at the datastore level, the unit of failover in a DR configuration is a DR-enabled cluster. It is not possible to fail over a subset of virtual machines on a single DR-protected cluster. This is because all networks supporting these virtual machines are converged to the recovery site during a failover.
RecoverPoint
cluster
limitations
There is a limit of 64 consistency groups per RecoverPoint appliance and 128 consistency groups per RecoverPoint cluster. Therefore, the number of nodes deployed in the RecoverPoint cluster should be sized to allow appropriate headroom for the surviving appliances to take over the workload of a failed appliance.
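One way to reason about this headroom, assuming the per-appliance and per-cluster limits quoted above, is sketched below. The sizing rule itself is an illustration of the trade-off, not an official formula.

# Illustrative headroom check (not an official sizing formula):
# with 64 consistency groups per appliance and 128 per cluster, plan so that the
# surviving appliances can absorb the consistency groups of a failed appliance.

CG_PER_APPLIANCE = 64
CG_PER_CLUSTER = 128

def max_cgs_with_headroom(appliances, failures_tolerated=1):
    """Consistency groups the cluster can carry while tolerating appliance failures."""
    survivors = appliances - failures_tolerated
    if survivors < 1:
        return 0
    return min(CG_PER_CLUSTER, survivors * CG_PER_APPLIANCE)

for n in (2, 3, 4):
    print(n, "appliances ->", max_cgs_with_headroom(n), "consistency groups with one-failure headroom")
# 2 appliances -> 64, 3 -> 128, 4 -> 128 (capped by the per-cluster limit)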
RecoverPoint
licensing
The Federation Enterprise Hybrid Cloud supports RecoverPoint CL-based licensing only. It
does not support RecoverPoint SE or RecoverPoint EX, as these versions are not currently
supported by EMC ViPR.
VMware Site
Recovery
Manager
limitations
Protection maximums
Table 11 shows the maximums that apply for Site Recovery Manager-protected resources.
Table 11. Site Recovery Manager protection maximums
Virtual machines configured for protection using array-based replication: 5,000
Virtual machines per protection group: 500
Protection groups: 250
Recovery plans: 250
Protection groups per recovery plan: 250
Virtual machines per recovery plan: 2,000
Replicated datastores (using array-based replication) and >1 RecoverPoint cluster: 255
Recovery maximums
Table 12 shows the maximums that apply for Site Recovery Manager recovery plans.
Table 12. Site Recovery Manager recovery maximums
Concurrently executing recovery plans: 10
Concurrently recovering virtual machines using array-based replication: 2,000
Implied Federation Enterprise Hybrid Cloud storage maximums
Table 13 indicates the storage maximums in a Federation Enterprise Hybrid Cloud DR environment, when all other maximums are taken into account.
Table 13. Implied Federation Enterprise Hybrid Cloud storage maximums
DR-enabled datastores per RecoverPoint consistency group: 1
DR-enabled datastores per RecoverPoint cluster: 128
DR-enabled datastores per Federation Enterprise Hybrid Cloud environment: 250
To ensure maximum protection for DR-enabled vSphere clusters, the Federation Enterprise
Hybrid Cloud STaaS workflows create each LUN in its own RecoverPoint consistency group.
This ensures that ongoing STaaS provisioning operations have no effect on either the
synchronized state of existing LUNs or the history of restore points for those LUNs
maintained by EMC RecoverPoint.
Because there is a limit of 128 consistency groups per EMC RecoverPoint cluster, there is
therefore a limit of 128 Federation Enterprise Hybrid Cloud STaaS provisioned LUNs per
RecoverPoint cluster. To extend the scalability further, additional EMC RecoverPoint clusters
are required.
Each new datastore is added to its own Site Recovery Manager protection group. As there is
a limit of 250 protection groups per Site Recovery Manager installation, this limits the total
number of datastores in a DR environment to 250, irrespective of the number of
RecoverPoint clusters deployed.
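The implied maximums in Table 13 follow from simple arithmetic on the limits above. A short worked sketch is shown below; the helper function is hypothetical and only restates that arithmetic.

# Worked sketch of the implied datastore maximums: one DR-enabled datastore per
# RecoverPoint consistency group, 128 consistency groups per RecoverPoint cluster,
# and 250 Site Recovery Manager protection groups per installation.

CGS_PER_RP_CLUSTER = 128
SRM_PROTECTION_GROUP_LIMIT = 250

def max_dr_datastores(rp_clusters):
    """DR-enabled datastores supported for a given number of RecoverPoint clusters."""
    per_rp_limit = rp_clusters * CGS_PER_RP_CLUSTER   # 1 datastore : 1 consistency group
    return min(per_rp_limit, SRM_PROTECTION_GROUP_LIMIT)

print(max_dr_datastores(1))  # 128 -- limited by the RecoverPoint cluster
print(max_dr_datastores(2))  # 250 -- limited by Site Recovery Manager protection groups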
Storage support
Supports VMAX, VNX, XtremIO, and VMAX3 (behind VPLEX) only.
Network support
The Federation Enterprise Hybrid Cloud provides fully automated network re-convergence
during disaster recovery failover when using VMware NSX only. The use of vSphere
Distributed Switch backed by other networking technologies is also permitted, but requires
that network re-convergence is carried out manually in accordance with the chosen network
technology, or that automation of network re-convergence is developed as a professional
services engagement.
NSX security
support
Only supports the assignment of blueprint virtual machines to a security group. Does not
support the assignment of blueprints to security policies or security tags.
Resource
isolation
As vRealize Automation endpoints are visible to all vRealize Automation IaaS administrators, resource isolation in the truest sense is not possible. However, locked blueprints and storage reservation policies can be used to ensure that certain types of workload (such as those whose licensing is based on CPU count) are restricted to only a subset of the Workload Pods available in the environment. This includes the ability to control those licensing requirements across tenants by ensuring that all relevant deployments are on the same set of compute resources.
Resource sharing
All endpoints configured across the vRealize Automation instance by an IaaS administrator
are available to be added to fabric groups, and therefore consumed by any business group
across any of the vRealize Automation tenants.
Provisioning to vCenter endpoints, however, can still only be done through the tenant
configured as part of the Federation Enterprise Hybrid Cloud foundation installation in that
tenant and its vRealize Orchestrator server.
Application
tenant
integration
The Federation recommends that applications provisioned using vRealize Automation
Application Services each have their own business group by application type to enable
administrative separation of blueprint creation and manipulation.
Supported
Avamar
platforms
The Federation Enterprise Hybrid Cloud supports physical Avamar infrastructure only. It does not support Avamar Virtual Edition.
Scale out limits
Federation Enterprise Hybrid Cloud 3.5 supports a maximum of 15 Avamar replication pairs
(30 individual physical instances).
Federation
Enterprise Hybrid
Cloud software
resources
For information about qualified components and versions required for the initial release of
the Federation Enterprise Hybrid Cloud 3.5 solution, refer to the Federation Enterprise
Hybrid Cloud 3.5: Reference Architecture Guide. For up-to-date supported version
information, refer to the EMC Simple Support Matrix: EMC Hybrid Cloud 3.5:
elabnavigator.emc.com.
Federation
Enterprise Hybrid
Cloud sizing
For all Federation Enterprise Hybrid Cloud sizing operations, refer to the EMC Mainstay
Sizing tool: mainstayadvisor.com/go/emc.
Chapter 9: Conclusion
This chapter presents the following topic:
Conclusion ...................................................................................................... 108
The Federation Enterprise Hybrid Cloud solution provides on-demand access and control of
infrastructure resources and security while enabling customers to maximize asset use.
Specifically, the solution integrates all the key functionality that customers demand of a
hybrid cloud and provides a framework and foundation for adding other services.
This solution provides the following features and functionality:

Continuous availability

Disaster recovery

Data protection

Automation and self-service provisioning

Multitenancy and secure separation

Workload-optimized storage

Elasticity and service assurance

Monitoring

Metering and chargeback
The solution uses the best of EMC and VMware products and services to empower customers
to accelerate the implementation and adoption of hybrid cloud while still enabling customer
choice for the compute and networking infrastructure within the data center.
Chapter 10: References
This chapter presents the following topic:
Federation documentation ................................................................................. 110
These documents are available on EMC.com. Access to Online Support depends on your
login credentials. If you do not have access to a document, contact your Federation
representative.

Federation Enterprise Hybrid Cloud 3.5: Reference Architecture Guide

Federation Enterprise Hybrid Cloud 3.5: Infrastructure and Operations Management
Guide

Federation Enterprise Hybrid Cloud 3.5: Security Management Guide

Federation Enterprise Hybrid Cloud 3.5: Administration Guide

VCE Foundation for Federation Enterprise Hybrid Cloud Addendum 3.5

VCE Foundation Upgrade from 3.1 to 3.5 Process