EMC RecoverPoint/Cluster Enabler for Microsoft Failover Clusters

© Copyright 2010 EMC Corporation. All rights reserved.
Disaster Restart
Business challenges and requirements
• Meeting recovery point objective (RPO) and recovery time objective (RTO) requirements with the current plan
– Business and/or regulatory needs
– Need to reduce RPO and RTO
  • Application benchmarks move from days or hours to minutes or seconds
– Need for continuous operations with no data loss
• Cost of assets and maintenance at the disaster recovery site
– Need to maintain software versions (updates and patches)
• Reliability of the disaster recovery plan
– Need to address applications requiring dependent-write consistency between and across operating systems
– Need to test the plan periodically to ensure it will work when required
Business Impact of Application and Data Inaccessibility
[Figure: downtime cost plotted against time, with recovery options labeled on the chart: hot site/cold site, electronic vaulting, database replication, remote replication, dedicated hot standby, and geographical clusters.]
Cost of downtime escalates quickly over time
Microsoft Failover Cluster
A high-availability restart solution
When a node or resource fails, the affected resource groups are automatically restarted on another node where resources are available
[Figure: failover cluster in which one node fails; resource groups shown: Microsoft SQL, Microsoft Exchange, and Oracle.]
Microsoft failover cluster provides high availability;
shared-nothing cluster model
Cluster Enabler 4.0
Features and capabilities overview
• Integrates RecoverPoint and RecoverPoint/SE with Microsoft failover cluster
– Automatic site failover for remote replication operations
– Supports majority node set quorum options: Majority Node Set (MNS), and MNS with File Share Witness
• Supports RecoverPoint continuous remote replication (CRR)
– Using Fibre Channel or Gigabit Ethernet for remote replication
– Maximum latency of 400 milliseconds for asynchronous replication
– Maximum latency of 4 milliseconds for synchronous replication
• Supports Windows Server and Server Core for Windows Server 2008
– Up to two nodes per site with Windows Server 2003
– Up to eight nodes per site with Windows Server 2008 and Windows Server 2008 R2
– Supports clustering of up to eight child partitions with Hyper-V
Note: Cluster Enabler 4.0 supports RecoverPoint 3.1.1 or later and any array supported by RecoverPoint and RecoverPoint/SE.
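The limits listed on this slide can be collected into a simple pre-deployment check. The following is a minimal sketch, not part of Cluster Enabler: the function name `check_deployment`, its parameters, and the table layout are hypothetical, and only the numeric limits come from the slide.

```python
# Limits as stated on the slide; everything else here is illustrative.
MAX_LATENCY_MS = {"asynchronous": 400.0, "synchronous": 4.0}
MAX_NODES_PER_SITE = {
    "Windows Server 2003": 2,
    "Windows Server 2008": 8,
    "Windows Server 2008 R2": 8,
}


def check_deployment(os_name, nodes_per_site, replication_mode, latency_ms):
    """Return a list of problems; an empty list means the plan fits the stated limits."""
    problems = []

    # Node count per site depends on the Windows Server release.
    if os_name not in MAX_NODES_PER_SITE:
        problems.append(f"unknown operating system: {os_name}")
    elif nodes_per_site > MAX_NODES_PER_SITE[os_name]:
        problems.append(
            f"{nodes_per_site} nodes per site exceeds the limit of "
            f"{MAX_NODES_PER_SITE[os_name]} for {os_name}"
        )

    # Latency ceiling depends on the replication mode.
    if replication_mode not in MAX_LATENCY_MS:
        problems.append(f"unknown replication mode: {replication_mode}")
    elif latency_ms > MAX_LATENCY_MS[replication_mode]:
        problems.append(
            f"{latency_ms} ms exceeds the {MAX_LATENCY_MS[replication_mode]} ms "
            f"limit for {replication_mode} replication"
        )

    return problems


if __name__ == "__main__":
    for problem in check_deployment("Windows Server 2003", 3, "synchronous", 6.0):
        print(problem)
```

A plan that returns an empty list is only consistent with the figures above; it does not replace the product support matrix.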
Majority Node Set Support
Majority Node Set
• Used as a tie-breaker to avoid split-brain scenarios
• From a cluster-node perspective, each node sees the quorum as a local resource
– Each cluster node stores the configuration information on a local disk
  • Each node has access to its local disk when it starts up
– Cluster service ensures cluster configuration is consistent on each cluster node
  • Changes are replicated across the Majority Node Set
File Share Witness
• External to the cluster, providing an additional quorum vote
– 2- to 4-node cluster can survive up to N-1 node failures
– 4- to 8-node cluster can survive up to N-2 node failures
• Acts as a witness to the Majority Node Set
– Enhances geographically dispersed failover clusters
• Recommended that the File Share Witness be configured in a third site
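The effect of the extra witness vote can be illustrated with generic majority arithmetic. This is a minimal sketch under stated assumptions: each node and the File Share Witness carries exactly one vote, the cluster stays up only while a strict majority of votes is reachable, and the witness itself remains available. The function name `quorum_summary` is hypothetical and nothing here is taken from the Microsoft or EMC implementations.

```python
def quorum_summary(nodes: int, file_share_witness: bool) -> dict:
    """Compute vote counts for a majority-based quorum (illustrative only)."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")

    # Assumption: one vote per node, plus one for the witness if present.
    votes = nodes + (1 if file_share_witness else 0)

    # A strict majority is more than half of all votes.
    needed = votes // 2 + 1

    # Votes that can be lost while a majority remains; capped at the node
    # count because the witness is assumed to stay reachable.
    tolerated = min(nodes, votes - needed)

    return {"votes": votes, "needed": needed, "tolerated_node_failures": tolerated}


if __name__ == "__main__":
    for nodes in (2, 4):
        for witness in (False, True):
            print(nodes, witness, quorum_summary(nodes, witness))
```

Under these assumptions a 2-node cluster tolerates no node failure without a witness and one with it, and a 4-node cluster goes from one tolerated failure to two, which is why an evenly split two-site cluster benefits from a witness at a third site.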
Cluster Enabler for Microsoft Failover Cluster
[Figure: cluster nodes with RecoverPoint/CE installed at Site A and Site B, linked by a LAN/WAN and a private interconnect, with RecoverPoint between the sites and a File Share Witness with RecoverPoint/CE installed.]
Failover cluster supports up to 8 nodes with Windows Server 2003/2008
using Majority Node Set with and without File Share Witness
Cluster Enabler for Microsoft Failover Cluster
Node failure
Role of Major Software Components
• Microsoft failover cluster software
– Protects against server hardware or network connection failures
– Initiates failover actions to a clustered node for resource group restart
• Cluster Enabler 4.0 software
– Installed on all cluster nodes and on File Share Witness (if File Share Witness is used)
– Responds to queries from the cluster service that determine cluster behavior
– Determines RecoverPoint state and initiates appropriate RecoverPoint actions using the RecoverPoint API
• RecoverPoint software
– CRR provides remote mirroring of production data
– CRR journal retained, allowing for point-in-time recovery outside of cluster operations
Cluster Enabler and Node Failure Event
Failover steps
• Site A node fails, resulting in heartbeat response timeout
• Cluster reforms between the Site B node and the File Share Witness node
• The Site B node brings resource groups from the Site A node online
• The latest image of the RecoverPoint volumes listed in the resource group is automatically recovered, read/write enabled, and mounted to the Site B node
• The application listed as part of the failed Site A node resource group is restarted
• The Site A node network address is added to the network interface of the Site B node and client traffic is routed to the Site B node
[Figure: Site A and Site B linked by RecoverPoint, using Majority Node Set with File Share Witness.]
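The failover sequence for a node failure can be traced with a small simulation. This is a minimal sketch for illustration only: the `Node` class and `fail_over` function are hypothetical, the two-node-plus-witness layout is assumed, and no real cluster service or RecoverPoint API call is made.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node."""
    name: str
    site: str
    alive: bool = True
    resource_groups: list = field(default_factory=list)
    addresses: list = field(default_factory=list)


def fail_over(failed: Node, survivor: Node, witness_alive: bool) -> list:
    """Walk through the failover steps and return a log of the actions taken."""
    log = []

    # Step 1: the node fails and its heartbeat times out.
    failed.alive = False
    log.append(f"heartbeat timeout for {failed.name} ({failed.site})")

    # Step 2: the survivor needs the witness vote to hold a majority (2 of 3).
    if not (survivor.alive and witness_alive):
        log.append("no majority available; resource groups stay offline")
        return log
    log.append(f"cluster reforms between {survivor.name} and the File Share Witness")

    # Steps 3 to 5: bring each resource group online on the survivor.
    for group in list(failed.resource_groups):
        log.append(f"{group}: recover latest image, enable read/write, mount on {survivor.name}")
        log.append(f"{group}: restart application on {survivor.name}")
        survivor.resource_groups.append(group)
    failed.resource_groups.clear()

    # Step 6: the failed node's network address moves to the survivor.
    survivor.addresses.extend(failed.addresses)
    failed.addresses.clear()
    log.append(f"client traffic routed to {survivor.name} ({survivor.site})")
    return log


if __name__ == "__main__":
    node_a = Node("node-a", "Site A", resource_groups=["SQL"], addresses=["10.0.0.5"])
    node_b = Node("node-b", "Site B", addresses=["10.0.1.5"])
    for line in fail_over(node_a, node_b, witness_alive=True):
        print(line)
```

The sketch also shows why the witness matters: if it is unreachable when the Site A node fails, the Site B node alone holds one of three votes and the resource groups are not moved.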
Disaster Recovery for Hyper-V
Automated failover operations for Hyper-V environments
[Figure: Majority Node Set with File Share Witness; cluster nodes with RecoverPoint/CE installed at Site A and Site B, linked by a LAN/WAN and a private interconnect; RecoverPoint replication with volumes labeled Prod 1, Target 1, Prod 2, and Target 2.]
Hyper-V with failover clusters supports up to 8 nodes with Windows Server 2008 R2
Hyper-V Overview
Cluster Enabler 4.0 supports Hyper-V with failover clusters
• Failover of the virtual machine (VM) resource
– RecoverPoint/CE is deployed in the Hyper-V parent partition
– Cluster relocation is at the VM level
• Hyper-V Live Migration and Quick Migration between nodes at the same or different sites
– Live Migration is supported with RecoverPoint CRR synchronous replication
– Quick Migration is supported with synchronous and asynchronous replication
– Use for planned maintenance, such as VM relocation for hardware and software upgrades
– Use for VM workload redistribution: move VMs from one physical host to another
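The relationship between replication mode and migration type stated above reduces to a small lookup. This is a minimal sketch; the function name `supported_migrations` is hypothetical and only encodes the two rules from the slide.

```python
def supported_migrations(replication_mode: str) -> list:
    """Return the Hyper-V migration types the slide lists for a replication mode."""
    mode = replication_mode.strip().lower()
    if mode == "synchronous":
        # Live Migration is listed only for CRR synchronous replication.
        return ["Live Migration", "Quick Migration"]
    if mode == "asynchronous":
        # Quick Migration is listed for both modes.
        return ["Quick Migration"]
    raise ValueError(f"unknown replication mode: {replication_mode!r}")


if __name__ == "__main__":
    for mode in ("synchronous", "asynchronous"):
        print(mode, "->", ", ".join(supported_migrations(mode)))
```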
Hyper-V Virtual Machine Failure Event
Failover steps with Cluster Enabler 4.0
• Site A Hyper-V physical node fails, resulting in heartbeat response timeout
• Cluster reforms between the Site B node and the File Share Witness node
• The Site B node brings Hyper-V virtual machine resource groups from the Site A node online
• RecoverPoint target volumes for consistency groups listed in affected resource groups are recovered and mounted to the Site B node
• Virtual machines listed as part of the failed Site A node resource group are restarted
• The Site A node network address is added to the network interface of the Site B node and client traffic is routed to the Site B node
[Figure: Site A and Site B linked by RecoverPoint, using Majority Node Set with File Share Witness.]
Virtual machines can fail over within and between failover cluster nodes
Hyper-V Live Migration
Planned hardware maintenance on a physical server requires moving its VMs to another physical server
[Figure: Site A and Site B using Majority Node Set with File Share Witness; volumes labeled R1 and R2 at each site, linked by RecoverPoint CRR synchronous replication.]
Live migration can be within the same site or between sites
Multi-Array Support
Each named cluster group’s associated devices reside in a single RecoverPoint consistency group of the same name.
[Figure: cluster nodes with RecoverPoint/CE installed, RecoverPoint at each end of a WAN link, a File Share Witness with RecoverPoint/CE installed, and separate devices for Cluster Group 1 and Cluster Group 2.]
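The naming rule for multi-array configurations, that every device of a cluster group sits in one consistency group with the same name, can be checked mechanically. This is a minimal sketch: the function `check_group_layout` and the dictionary inputs are hypothetical and would have to be filled from whatever inventory is available; nothing is read from RecoverPoint or the cluster.

```python
def check_group_layout(cluster_groups: dict, device_to_cg: dict) -> dict:
    """Return {cluster group name: problem description} for groups that break the rule.

    cluster_groups maps a cluster group name to its list of devices.
    device_to_cg maps a device to the name of its consistency group.
    """
    problems = {}
    for group, devices in cluster_groups.items():
        # Collect the consistency groups that hold this cluster group's devices.
        found = {device_to_cg.get(device) for device in devices}

        # The rule holds only if all devices sit in one group named like the cluster group.
        if found != {group}:
            names = sorted(str(name) for name in found)
            problems[group] = f"devices are in consistency groups {names}, expected only '{group}'"
    return problems


if __name__ == "__main__":
    cluster_groups = {
        "Cluster Group 1": ["dev1", "dev2"],
        "Cluster Group 2": ["dev3", "dev4"],
    }
    device_to_cg = {
        "dev1": "Cluster Group 1",
        "dev2": "Cluster Group 1",
        "dev3": "Cluster Group 2",
        "dev4": "Other Group",
    }
    for group, problem in check_group_layout(cluster_groups, device_to_cg).items():
        print(f"{group}: {problem}")
```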
Microsoft Failover Clusters Deployed with Oracle on Windows
[Figure: four nodes labeled Oracle on a network, using Majority Node Set with File Share Witness; RecoverPoint replication with volumes labeled Prod 1, Target 1, Prod 2, and Target 2.]
Failover clusters configured with Oracle Fail Safe
Benefits of Cluster Enabler
• Provides rapid site restart with RecoverPoint
– Automatic site failover for common disruptions, including complete site disasters and server, storage, or network-related failures
• Minimizes site failback time with RecoverPoint
– Only changes are copied by RecoverPoint or RecoverPoint/SE to resynchronize the primary cluster storage system
• Provides multi-array support
– One cluster can span multiple storage arrays at the same or different sites
– Different clusters can share storage arrays
• Supports heterogeneous storage arrays
– A mix of arrays can be used
– Storage arrays do not have to be identical between sites