Datacenter Switchover

Workflow Steps
Perform a datacenter switchover for a
database availability group
Version 1.2 (Updated 12/2012)
Exchange 2010 - Datacenter Switchover
Stop-DatabaseAvailabilityGroup
Restore-DatabaseAvailabilityGroup
Exchange 2010 - Datacenter Switchback
Start-DatabaseAvailabilityGroup
Stop-DatabaseAvailabilityGroup
Has the datacenter switchover been approved?
YES
NO
Stop-DatabaseAvailabilityGroup
Is the primary datacenter online or physically
accessible?
YES
NO
Stop-DatabaseAvailabilityGroup
Do the remote and primary datacenters have network
connectivity?
YES
NO
Stop-DatabaseAvailabilityGroup
Are the Exchange servers in the primary datacenter
online?
YES
NO
Stop-DatabaseAvailabilityGroup
Is your DAG extended to multiple Active Directory sites?
YES
NO
Stop-DatabaseAvailabilityGroup
COMMANDS:
Using the Exchange Management Shell on a server in the recovery datacenter, run:
Stop-DatabaseAvailabilityGroup –Identity <DAGName> -ActiveDirectorySite <primary site>
Repeat the above command for all Active Directory sites containing DAG members that are not the recovery datacenter AD site.
EXPECTED OUTCOMES:
1)
Verify the servers on the StartedMailboxServers and StoppedMailboxServers lists for the DAG:
Get-DatabaseAvailabilityGroup –Identity <DAGName> | FL
StoppedMailboxServers should list all mailbox servers in the primary datacenter, and StartedMailboxServers should list all
mailbox servers in the recovery datacenter.
2)
Exchange servers that were accessible in the primary datacenter should have their Cluster services forcibly cleaned up and the
Cluster service should be configured with a startup type of DISABLED. You can verify this using Services.msc.
3)
A double write to both a domain controller in the recovery datacenter and a domain controller in the primary datacenter of the
StoppedMailboxServers attribute is performed. This is done to bypass Active Directory site replication latency.
COMMON ERRORS:
If a domain controller in the primary datacenter is not available, the command may return an Active Directory provider error. This
error can be safely ignored.
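The per-site repetition described in this step could be scripted roughly as follows. This is a sketch, not part of the original workflow: the function name Stop-DagInSites and its parameters are hypothetical, and it assumes it runs in the Exchange Management Shell on a server in the recovery datacenter. Confirmation prompts are left at their defaults so that each stop is acknowledged by the operator.

```powershell
# Sketch only: Stop-DagInSites is a hypothetical helper, not an Exchange cmdlet.
function Stop-DagInSites {
    param(
        [string]$DagName,          # the DAG name, as in <DAGName>
        [string[]]$PrimarySites    # every AD site that is NOT the recovery site
    )
    foreach ($site in $PrimarySites) {
        Write-Host "Stopping DAG members in Active Directory site $site"
        # Same command as in the step above, run once per site.
        Stop-DatabaseAvailabilityGroup -Identity $DagName -ActiveDirectorySite $site
    }
}

# Example call (placeholder names):
# Stop-DagInSites -DagName 'DAG1' -PrimarySites 'PrimarySiteA', 'PrimarySiteB'
```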
Command Completed?
Stop-DatabaseAvailabilityGroup
COMMANDS:
Using the Exchange Management Shell on a server in the recovery datacenter, run:
Stop-DatabaseAvailabilityGroup –Identity <DAGName> -MailboxServer <DAG member in primary site>
Repeat the above command for all DAG members that are not in the recovery datacenter.
EXPECTED OUTCOMES:
1)
Verify the servers on the StartedMailboxServers and StoppedMailboxServers lists for the DAG:
Get-DatabaseAvailabilityGroup –Identity <DAGName> | FL
StoppedMailboxServers should list all mailbox servers in the primary datacenter, and StartedMailboxServers should list all
mailbox servers in the recovery datacenter.
2)
Exchange servers that were accessible in the primary datacenter should have their Cluster services forcibly cleaned up and the
Cluster service should be configured with a startup type of DISABLED. You can verify this using Services.msc.
3)
A double write to both a domain controller in the recovery datacenter and a domain controller in the primary datacenter of the
StoppedMailboxServers attribute is performed. This is done to bypass Active Directory site replication latency.
COMMON ERRORS:
If a domain controller in the primary datacenter is not available, the command may return an Active Directory provider error. This
error can be safely ignored.
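The list check in expected outcome 1 can be done by eye from the FL output, or compared programmatically. The helper below is a hedged sketch: Compare-DagServerLists and its parameters are hypothetical names, and it assumes the expected server names are written in the same form the cmdlet returns (often fully qualified names).

```powershell
# Sketch only: a pure comparison helper; it does not call Exchange itself.
function Compare-DagServerLists {
    param(
        [string[]]$Stopped,           # values read from StoppedMailboxServers
        [string[]]$Started,           # values read from StartedMailboxServers
        [string[]]$PrimaryServers,    # servers expected to be stopped
        [string[]]$RecoveryServers    # servers expected to be started
    )
    $problems = @()
    foreach ($server in $PrimaryServers) {
        if ($Stopped -notcontains $server) {
            $problems += "$server is missing from StoppedMailboxServers"
        }
    }
    foreach ($server in $RecoveryServers) {
        if ($Started -notcontains $server) {
            $problems += "$server is missing from StartedMailboxServers"
        }
    }
    return $problems   # nothing returned means both lists look as expected
}

# Example use in the Exchange Management Shell (placeholder names):
# $dag = Get-DatabaseAvailabilityGroup -Identity 'DAG1'
# $stopped = $dag.StoppedMailboxServers | ForEach-Object { $_.ToString() }
# $started = $dag.StartedMailboxServers | ForEach-Object { $_.ToString() }
# Compare-DagServerLists -Stopped $stopped -Started $started `
#     -PrimaryServers 'MBX1.contoso.com' -RecoveryServers 'MBX3.contoso.com'
```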
Command Completed?
Stop-DatabaseAvailabilityGroup
Is your DAG extended to multiple Active Directory sites?
YES
NO
Stop-DatabaseAvailabilityGroup
COMMANDS:
Using the Exchange Management Shell on a server in the recovery datacenter, run:
Stop-DatabaseAvailabilityGroup –Identity <DAGName> -ActiveDirectorySite <primary site> -ConfigurationOnly:$True
Repeat for any additional Active Directory sites that are not the recovery datacenter.
EXPECTED OUTCOMES:
1)
Verify the servers on the StartedMailboxServers and StoppedMailboxServers lists for the DAG:
Get-DatabaseAvailabilityGroup –Identity <DAGName> | FL
StoppedMailboxServers should list all mailbox servers in the primary datacenter, and StartedMailboxServers should list all
mailbox servers in the recovery datacenter.
2)
A double write to both a domain controller in the recovery datacenter and a domain controller in the primary datacenter of the
StoppedMailboxServers attribute is performed. This is done to bypass Active Directory site replication latency.
COMMON ERRORS:
If a domain controller in the primary datacenter is not available, the command may return an Active Directory provider error. This
error can be safely ignored.
Command Completed?
Stop-DatabaseAvailabilityGroup
COMMANDS:
Using the Exchange Management Shell on a server in the recovery datacenter, run:
Stop-DatabaseAvailabilityGroup –Identity <DAGName> -MailboxServer <DAG member in primary site> -ConfigurationOnly:$True
Repeat command for all DAG members that are not in the recovery datacenter.
EXPECTED OUTCOMES:
1)
Verify the servers on the StartedMailboxServers and StoppedMailboxServers lists for the DAG:
Get-DatabaseAvailabilityGroup –Identity <DAGName> | FL
StoppedMailboxServers should list all mailbox servers in the primary datacenter, and StartedMailboxServers should list all
mailbox servers in the recovery datacenter.
2)
A double write to both a domain controller in the recovery datacenter and a domain controller in the primary datacenter of the
StoppedMailboxServers attribute is performed. This is done to bypass Active Directory site replication latency.
COMMON ERRORS:
If a domain controller in the primary datacenter is not available, the command may return an Active Directory provider error. This
error can be safely ignored.
Command Completed?
Stop-DatabaseAvailabilityGroup
Are the Exchange servers in primary datacenter online?
YES
NO
Stop-DatabaseAvailabilityGroup
Is your DAG extended to multiple Active Directory sites?
YES
NO
Stop-DatabaseAvailabilityGroup
COMMANDS:
Using the Exchange Management Shell on a server in the recovery datacenter, run:
Stop-DatabaseAvailabilityGroup –Identity <DAGName> -ActiveDirectorySite <primary site> -ConfigurationOnly:$True
Repeat for any additional Active Directory sites that are not the recovery datacenter Active Directory site.
EXPECTED OUTCOMES:
1)
Verify the servers on the StartedMailboxServers and StoppedMailboxServers lists for the DAG:
Get-DatabaseAvailabilityGroup –Identity <DAGName> | FL
StoppedMailboxServers should list all mailbox servers in the primary datacenter, and StartedMailboxServers should list all
mailbox servers in the recovery datacenter.
2)
A double write to both a domain controller in the recovery datacenter and a domain controller in the primary datacenter of the
StoppedMailboxServers attribute is performed. This is done to bypass Active Directory site replication latency.
COMMON ERRORS:
If a domain controller in the primary datacenter is not available, the command may return an Active Directory provider error. This
error can be safely ignored.
Command Completed?
Stop-DatabaseAvailabilityGroup
COMMANDS:
Using the Exchange Management Shell on a server in the recovery datacenter, run:
Stop-DatabaseAvailabilityGroup –Identity <DAGName> -MailboxServer <DAG member in primary site> -ConfigurationOnly:$True
Repeat for any additional DAG members that are not in the recovery datacenter Active Directory site.
EXPECTED OUTCOMES:
1)
Verify the servers on the StartedMailboxServers and StoppedMailboxServers lists for the DAG:
Get-DatabaseAvailabilityGroup –Identity <DAGName> | FL
StoppedMailboxServers should list all mailbox servers in the primary datacenter, and StartedMailboxServers should list all
mailbox servers in the recovery datacenter.
2)
A double write to both a domain controller in the recovery datacenter and a domain controller in the primary datacenter of the
StoppedMailboxServers attribute is performed. This is done to bypass Active Directory site replication latency.
COMMON ERRORS:
If a domain controller in the primary datacenter is not available, the command may return an Active Directory provider error. This
error can be safely ignored.
Command Completed?
Stop-DatabaseAvailabilityGroup
COMMANDS:
Optional: If Exchange Management Shell access to the primary datacenter is available, run:
Stop-DatabaseAvailabilityGroup –Identity <DAGName> -ActiveDirectorySite <primary site>
Repeat for any additional Active Directory sites that are not the recovery datacenter Active Directory site.
EXPECTED OUTCOMES:
1) Verify the servers on the StartedMailboxServers and StoppedMailboxServers lists for the DAG:
Get-DatabaseAvailabilityGroup –Identity <DAGName> | FL
StoppedMailboxServers should list all mailbox servers in the primary datacenter, and StartedMailboxServers should list all
mailbox servers in the recovery datacenter.
2)
Exchange servers that were accessible in the primary datacenter should have their Cluster services forcibly cleaned up and the
Cluster service should be configured with a startup type of DISABLED. You can verify this using Services.msc.
3)
A double write to both a domain controller in the recovery datacenter and a domain controller in the primary datacenter of the
StoppedMailboxServers attribute is performed. This is done to bypass Active Directory site replication latency.
COMMON ERRORS:
If a domain controller in the primary datacenter is not available, the command may return an Active Directory provider error. This
error can be safely ignored.
If no Exchange server instance is functional to service the Exchange Management Shell, this step can be skipped.
Command Completed?
Stop-DatabaseAvailabilityGroup
COMMANDS:
Using the Exchange Management Shell on a server in the recovery datacenter, run:
Stop-DatabaseAvailabilityGroup –Identity <DAGName> -MailboxServer <DAG member in primary site>
Repeat for any additional DAG members that are not in the recovery datacenter Active Directory site.
EXPECTED OUTCOMES:
1)
Verify the servers on the StartedMailboxServers and StoppedMailboxServers lists for the DAG:
Get-DatabaseAvailabilityGroup –Identity <DAGName> | FL
StoppedMailboxServers should list all mailbox servers in the primary datacenter, and StartedMailboxServers should list all
mailbox servers in the recovery datacenter.
2)
A double write to both a domain controller in the recovery datacenter and a domain controller in the primary datacenter of the
StoppedMailboxServers attribute is performed. This is done to bypass Active Directory site replication latency.
COMMON ERRORS:
If a domain controller in the primary datacenter is not available, the command may return an Active Directory provider error. This
error can be safely ignored.
Command Completed?
Stop-DatabaseAvailabilityGroup
Is your DAG extended to multiple Active Directory sites?
YES
NO
Stop-DatabaseAvailabilityGroup
COMMANDS:
Using the Exchange Management Shell on a server in the recovery datacenter, run:
Stop-DatabaseAvailabilityGroup –Identity <DAGName> -ActiveDirectorySite <primary site> -ConfigurationOnly:$True
Repeat for any additional Active Directory sites that are not the recovery datacenter Active Directory site.
EXPECTED OUTCOMES:
1)
Verify the servers on the StartedMailboxServers and StoppedMailboxServers lists for the DAG:
Get-DatabaseAvailabilityGroup –Identity <DAGName> | FL
StoppedMailboxServers should list all mailbox servers in the primary datacenter, and StartedMailboxServers should list all
mailbox servers in the recovery datacenter.
2)
A double write to both a domain controller in the recovery datacenter and a domain controller in the primary datacenter of the
StoppedMailboxServers attribute is performed. This is done to bypass Active Directory site replication latency.
COMMON ERRORS:
If a domain controller in the primary datacenter is not available, the command may return an Active Directory provider error. This
error can be safely ignored.
Command Completed?
Stop-DatabaseAvailabilityGroup
COMMANDS:
Using the Exchange Management Shell on a server in the recovery datacenter, run:
Stop-DatabaseAvailabilityGroup –Identity <DAGName> -ActiveDirectorySite <primary site> -ConfigurationOnly:$True
Repeat for any additional Active Directory sites that are not the recovery datacenter Active Directory site.
EXPECTED OUTCOMES:
1)
Verify the servers on the StartedMailboxServers and StoppedMailboxServers lists for the DAG:
Get-DatabaseAvailabilityGroup –Identity <DAGName> | FL
StoppedMailboxServers should list all mailbox servers in the primary datacenter, and StartedMailboxServers should list all
mailbox servers in the recovery datacenter.
2)
A double write to both a domain controller in the recovery datacenter and a domain controller in the primary datacenter of the
StoppedMailboxServers attribute is performed. This is done to bypass Active Directory site replication latency.
COMMON ERRORS:
If a domain controller in the primary datacenter is not available, the command may return an Active Directory provider error. This
error can be safely ignored.
Command Completed?
Stop-DatabaseAvailabilityGroup
COMMANDS:
OPTIONAL: Using the Exchange Management Shell on a server in the recovery datacenter, run:
Stop-DatabaseAvailabilityGroup –Identity <DAGName> -ActiveDirectorySite <primary site> -ConfigurationOnly:$True
Repeat for any additional Active Directory sites that are not the recovery datacenter Active Directory site.
EXPECTED OUTCOMES:
1)
Verify the servers on the StartedMailboxServers and StoppedMailboxServers lists for the DAG (this assumes at least one Exchange
server exists):
Get-DatabaseAvailabilityGroup –Identity <DAGName> | FL
StoppedMailboxServers should list all mailbox servers in the primary datacenter, and StartedMailboxServers should list all
mailbox servers in the recovery datacenter.
2)
A double write to both a domain controller in the recovery datacenter and a domain controller in the primary datacenter of the
StoppedMailboxServers attribute is performed. This is done to bypass Active Directory site replication latency.
COMMON ERRORS:
If a domain controller in the primary datacenter is not available, the command may return an Active Directory provider error. This
error can be safely ignored.
Command Completed?
Stop-DatabaseAvailabilityGroup
COMMANDS:
Optional: If Exchange Management Shell access to the primary datacenter is available, run:
Stop-DatabaseAvailabilityGroup –Identity <DAGName> -MailboxServer <DAG member in primary site> -ConfigurationOnly:$True
Repeat for any additional DAG members that are not in the recovery datacenter Active Directory site.
EXPECTED OUTCOMES:
1)
Verify the servers on the StartedMailboxServers and StoppedMailboxServers lists for the DAG (this assumes at least one Exchange server
exists):
Get-DatabaseAvailabilityGroup –Identity <DAGName> | FL
StoppedMailboxServers should list all mailbox servers in the primary datacenter, and StartedMailboxServers should list all mailbox servers in
the recovery datacenter.
2)
A double write to both a domain controller in the recovery datacenter and a domain controller in the primary datacenter of the
StoppedMailboxServers attribute is performed. This is done to bypass Active Directory site replication latency.
COMMON ERRORS:
If a domain controller in the primary datacenter is not available, the command may return an Active Directory provider error. This error can be
safely ignored.
Command Completed?
Stop-DatabaseAvailabilityGroup
Is your DAG extended to multiple Active Directory sites?
YES
NO
Stop-DatabaseAvailabilityGroup
COMMANDS:
Optional: If Exchange Management Shell access to the primary datacenter is available, run:
Stop-DatabaseAvailabilityGroup –Identity <DAGName> -ActiveDirectorySite <primary site> -ConfigurationOnly:$True
Repeat for any additional Active Directory sites that are not the recovery datacenter Active Directory site.
EXPECTED OUTCOMES:
1) Verify the servers on the StartedMailboxServers and StoppedMailboxServers lists for the DAG:
Get-DatabaseAvailabilityGroup –Identity <DAGName> | FL
StoppedMailboxServers should list all mailbox servers in the primary datacenter, and StartedMailboxServers should list all
mailbox servers in the recovery datacenter.
2)
A double write to both a domain controller in the recovery datacenter and a domain controller in the primary datacenter of the
StoppedMailboxServers attribute is performed. This is done to bypass Active Directory site replication latency.
COMMON ERRORS:
If a domain controller in the primary datacenter is not available, the command may return an Active Directory provider error. This
error can be safely ignored.
Command Completed?
Stop-DatabaseAvailabilityGroup
COMMANDS:
Using the Exchange Management Shell on a server in the recovery datacenter, run:
Stop-DatabaseAvailabilityGroup –Identity <DAGName> -MailboxServer <DAG member in primary site> -ConfigurationOnly:$True
Repeat command for all DAG members that are not in the recovery datacenter.
EXPECTED OUTCOMES:
1) Verify the servers on the StartedMailboxServers and StoppedMailboxServers lists for the DAG:
Get-DatabaseAvailabilityGroup –Identity <DAGName> | FL
StoppedMailboxServers should list all mailbox servers in the primary datacenter, and StartedMailboxServers should list all
mailbox servers in the recovery datacenter.
Command Completed?
Restore-DatabaseAvailabilityGroup
Did Stop-DatabaseAvailabilityGroup complete
successfully?
YES
NO
Restore-DatabaseAvailabilityGroup
COMMANDS:
Stop the Cluster service on each DAG member in the recovery datacenter. To do this, run the appropriate command for your DAG
member’s operating system:
• Windows Server 2008 R2: Stop-Service Clussvc
• Windows Server 2008 SP2: Net Stop Clussvc
EXPECTED OUTCOMES:
Cluster services are stopped on remaining nodes.
COMMON ERRORS:
Access denied: if you are not using the default administrator account, run the command from an elevated command prompt (Run as administrator).
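Before answering the next decision, the Cluster service state on each recovery-datacenter member can be listed from one console. The sketch below is an assumption-laden illustration: Get-ClusterServiceState is a hypothetical helper, and it relies on the -ComputerName parameter of Get-Service, which is available in Windows PowerShell but not in newer cross-platform PowerShell releases.

```powershell
# Sketch only: Get-ClusterServiceState is a hypothetical helper name.
function Get-ClusterServiceState {
    param(
        [string[]]$Servers   # DAG members in the recovery datacenter
    )
    foreach ($server in $Servers) {
        # ClusSvc is the service name of the Windows Cluster service.
        $service = Get-Service -ComputerName $server -Name 'ClusSvc'
        New-Object PSObject -Property @{
            Server = $server
            Status = "$($service.Status)"   # expected to read 'Stopped'
        }
    }
}

# Example call (placeholder names):
# Get-ClusterServiceState -Servers 'MBX3', 'MBX4'
```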
Command Completed?
Restore-DatabaseAvailabilityGroup
Is the Cluster service stopped on all DAG members in
your recovery datacenter?
YES
NO
Restore-DatabaseAvailabilityGroup
COMMANDS:
From the Exchange Management Shell on an Exchange server in the recovery datacenter, run:
Restore-DatabaseAvailabilityGroup –Identity <DAGName> -ActiveDirectorySite <recovery site> -AlternateWitnessDirectory:<AWSPath> -AlternateWitnessServer:<AWSName>
EXPECTED OUTCOMES:
1) A DAG member in the recovery datacenter is randomly selected and its Cluster service is started in /forcequorum mode
2) DAG members on the StoppedMailboxServers list are evicted from the DAG’s cluster thereby adjusting the membership count
a) If the resulting membership count is EVEN or results in a SINGLE node, the Cluster is configured with a Node and File Share Majority
quorum and it begins using the Alternate Witness Server and Alternate Witness Directory
3) Cluster services are started on the remaining DAG members and they successfully join the DAG’s cluster
VERIFICATION:
Run the following commands to verify that the DAG members are up and the Cluster Group is online:
Windows Server 2008 R2
1) Import-Module FailoverClusters
2) Get-ClusterNode –Cluster <DAGName>
3) Get-ClusterGroup –Cluster <DAGName>
Windows Server 2008 SP2
1) Cluster <DAGName> node
2) Cluster <DAGName> group
COMMON ERRORS:
Nodes fail to evict with error 0x46. See http://aka.ms/0x46
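The quorum rule in expected outcome 2a can be written out as a small function for sanity-checking what to expect after eviction. This is an illustrative sketch: Get-ExpectedQuorumModel is a hypothetical name, and the Node Majority result for an odd count greater than one is inferred by implication rather than stated in this step.

```powershell
# Sketch only: encodes the membership-count rule from expected outcome 2a.
function Get-ExpectedQuorumModel {
    param(
        [int]$MemberCount   # DAG members remaining after eviction
    )
    if ($MemberCount -lt 1) {
        throw 'MemberCount must be at least 1'
    }
    # An even count, or a single remaining node, uses the witness.
    if (($MemberCount -eq 1) -or (($MemberCount % 2) -eq 0)) {
        return 'Node and File Share Majority'
    }
    # Assumption: an odd count greater than one uses Node Majority.
    return 'Node Majority'
}

# Example: two members remain in the recovery datacenter.
# Get-ExpectedQuorumModel -MemberCount 2
```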
Command Completed?
Restore-DatabaseAvailabilityGroup
Assuming all pre-requisites have been met,
any activation blocks can now be removed
and databases can be mounted
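One possible way to lift activation blocks and find databases that still need mounting is sketched below. It assumes the blocks were set at the server level through DatabaseCopyAutoActivationPolicy; if copies were blocked another way (for example, suspended for activation only), different commands apply. Clear-ActivationBlock is a hypothetical helper name.

```powershell
# Sketch only: assumes server-level activation blocks were used.
function Clear-ActivationBlock {
    param(
        [string[]]$RecoveryServers   # DAG members in the recovery datacenter
    )
    foreach ($server in $RecoveryServers) {
        # Allow database copies on this server to be activated again.
        Set-MailboxServer -Identity $server -DatabaseCopyAutoActivationPolicy Unrestricted
    }
    # Return databases that are still dismounted; these are candidates
    # for Mount-Database once prerequisites are confirmed.
    Get-MailboxDatabase -Status | Where-Object { -not $_.Mounted }
}

# Example call (placeholder names):
# Clear-ActivationBlock -RecoveryServers 'MBX3', 'MBX4'
```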
Command Completed?
Start-DatabaseAvailabilityGroup
Is your primary datacenter online?
YES
NO
Start-DatabaseAvailabilityGroup
Ensure that supporting services are available including but not limited to:
1) Active Directory / domain controllers / global catalog / FSMO role holders
2) Domain Name Services (DNS)
3) Witness Server
4) Supporting Exchange roles: Client Access and Hub Transport
OPTIONAL:
Dynamic Host Configuration Protocol servers (DHCP), if DHCP addresses are used for DAG networks
Edge Transport server
Unified Messaging server
Continue…
Start-DatabaseAvailabilityGroup
Are the necessary services established and functioning?
YES
NO
Start-DatabaseAvailabilityGroup
COMMANDS:
Verify network connectivity between all DAG members.
Suggested methods:
1) Ping test between DAG members
2) Map administrative shares between DAG members
EXPECTED OUTCOMES:
Connectivity between datacenters is functioning and all cluster inter-node communications are
operating normally
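The ping test suggested above could be run against every DAG member in one pass. The sketch below uses Test-Connection and only proves reachability from the machine it runs on, so it would need to be repeated from a member in each datacenter. Test-DagMemberReachability is a hypothetical helper name.

```powershell
# Sketch only: pings each server from the local machine.
function Test-DagMemberReachability {
    param(
        [string[]]$Servers   # all DAG members, in both datacenters
    )
    foreach ($server in $Servers) {
        # -Quiet returns $true or $false instead of detailed ping results.
        $reachable = Test-Connection -ComputerName $server -Count 2 -Quiet
        New-Object PSObject -Property @{
            Server    = $server
            Reachable = $reachable
        }
    }
}

# Example call (placeholder names):
# Test-DagMemberReachability -Servers 'MBX1', 'MBX2', 'MBX3', 'MBX4'
```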
Command Completed?
Start-DatabaseAvailabilityGroup
Have datacenter communications been verified?
YES
NO
Start-DatabaseAvailabilityGroup
Verify that the Cluster service on the DAG members in the primary datacenter has a startup type of
DISABLED. If it does not, either the Stop-DatabaseAvailabilityGroup command was not successful or
the DAG members in the primary datacenter failed to receive eviction notification after network
connectivity between datacenters was restored.
Do not proceed until Cluster service cleanup has occurred and Cluster service has a startup type of DISABLED.
You can optionally run the following command on the DAG members in the primary datacenter to
forcibly clean up the outdated cluster information:
Cluster node /forcecleanup
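The startup type can also be read remotely instead of opening Services.msc on each server. The sketch below queries the Win32_Service WMI class; Get-ClusterServiceStartMode is a hypothetical helper name, and it assumes WMI access to the primary datacenter servers is permitted from where it runs.

```powershell
# Sketch only: reads the Cluster service start mode through WMI.
function Get-ClusterServiceStartMode {
    param(
        [string[]]$Servers   # DAG members in the primary datacenter
    )
    foreach ($server in $Servers) {
        $service = Get-WmiObject -Class Win32_Service -ComputerName $server -Filter "Name='ClusSvc'"
        New-Object PSObject -Property @{
            Server     = $server
            StartMode  = $service.StartMode
            IsDisabled = ($service.StartMode -eq 'Disabled')
        }
    }
}

# Example call (placeholder names); proceed only if every IsDisabled is True:
# Get-ClusterServiceStartMode -Servers 'MBX1', 'MBX2'
```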
Continue…
Start-DatabaseAvailabilityGroup
Does the Cluster service show a startup type of
disabled?
YES
NO
Start-DatabaseAvailabilityGroup
Is your DAG extended to multiple Active Directory sites?
YES
NO
Start-DatabaseAvailabilityGroup
COMMAND:
Using the Exchange Management Shell, run the following command:
Start-DatabaseAvailabilityGroup –Identity <DAGName> -ActiveDirectorySite <primary site>
Repeat for all other Active Directory sites that were stopped during the datacenter switchover process.
EXPECTED OUTCOMES:
1) DAG members in the primary datacenter are added to the DAG’s cluster
2) If the resulting membership count is EVEN, the cluster is configured to use the Node and File Share Majority quorum
VERIFICATION:
Run the following commands to verify that the DAG members are up and the Cluster Group is online:
Windows Server 2008 R2
1) Import-Module FailoverClusters
2) Get-ClusterNode –Cluster <DAGName>
3) Get-ClusterGroup –Cluster <DAGName>
Windows Server 2008 SP2
1) Cluster <DAGName> node
2) Cluster <DAGName> group
The following command shows the StartedMailboxServers list with all DAG members and an empty StoppedMailboxServers list:
Get-DatabaseAvailabilityGroup –Identity <DAGName> | FL
COMMON ERRORS:
Nodes may fail to join the cluster with an invalid node error. If this occurs, retry the command.
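The retry advice above could be wrapped in a bounded loop. This is a sketch under a stated assumption: that the join failure surfaces as an error that -ErrorAction Stop turns into a catchable exception, which has not been verified here. Start-DagSiteWithRetry and its parameters are hypothetical names.

```powershell
# Sketch only: retries Start-DatabaseAvailabilityGroup a limited number of times.
function Start-DagSiteWithRetry {
    param(
        [string]$DagName,
        [string]$Site,              # an AD site stopped during the switchover
        [int]$MaxAttempts = 3
    )
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            Start-DatabaseAvailabilityGroup -Identity $DagName -ActiveDirectorySite $Site -ErrorAction Stop | Out-Null
            return $attempt         # number of attempts that were needed
        } catch {
            Write-Warning "Attempt $attempt failed: $($_.Exception.Message)"
        }
    }
    throw "Start-DatabaseAvailabilityGroup did not succeed after $MaxAttempts attempts"
}

# Example call (placeholder names):
# Start-DagSiteWithRetry -DagName 'DAG1' -Site 'PrimarySiteA'
```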
Continue…
Start-DatabaseAvailabilityGroup
COMMAND:
Using the Exchange Management Shell, run the following command:
Start-DatabaseAvailabilityGroup –Identity <DAGName> -MailboxServer <DAG member in primary site>
Repeat for all other Mailbox servers that were stopped during the datacenter switchover process.
EXPECTED OUTCOMES:
1) DAG members in the primary datacenter are added to the DAG’s cluster
2) If the resulting membership count is EVEN, the cluster is configured to use the Node and File Share Majority quorum
VERIFICATION:
Run the following commands to verify that the DAG members are up and the Cluster Group is online:
Windows Server 2008 R2
1) Import-Module FailoverClusters
2) Get-ClusterNode –Cluster <DAGName>
3) Get-ClusterGroup –Cluster <DAGName>
Windows Server 2008 SP2
1) Cluster <DAGName> node
2) Cluster <DAGName> group
The following command shows the StartedMailboxServers list with all DAG members and an empty StoppedMailboxServers list:
Get-DatabaseAvailabilityGroup –Identity <DAGName> | FL
COMMON ERRORS:
Nodes may fail to join the cluster with an invalid node error. If this occurs, retry the command.
Continue…
Start-DatabaseAvailabilityGroup
Were the DAG members added to the cluster
successfully?
YES
NO
Start-DatabaseAvailabilityGroup
Were the DAG members added to the cluster
successfully?
YES
NO
Start-DatabaseAvailabilityGroup
COMMANDS:
Reset the DAG’s Witness Server and Alternate Witness Server properties by running the following command:
Set-DatabaseAvailabilityGroup –Identity <DAGName> -WitnessServer <WSName> -AlternateWitnessServer <AWSName>
EXPECTED OUTCOMES:
Witness Server and Alternate Witness Server properties are configured to ensure the appropriate witness server is in use
If the Cluster configuration does not match the DAG configuration, the Cluster is updated with the proper configuration
COMMON ERRORS:
Administrators incorrectly verify which file share witness is currently in use. See http://aka.ms/E14FSW.
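To reduce the risk of misreading which witness is active, the configured and in-use values can be displayed side by side. The sketch below assumes the WitnessShareInUse property is returned when the -Status switch is used, which is believed to apply to Exchange 2010 SP1 and later; treat that as an assumption to confirm in your environment. Get-DagWitnessInfo is a hypothetical helper name.

```powershell
# Sketch only: shows configured witness settings next to the one in use.
function Get-DagWitnessInfo {
    param(
        [string]$DagName
    )
    # -Status asks for live values in addition to the stored configuration.
    Get-DatabaseAvailabilityGroup -Identity $DagName -Status |
        Select-Object Name, WitnessServer, AlternateWitnessServer, WitnessShareInUse
}

# Example call (placeholder name):
# Get-DagWitnessInfo -DagName 'DAG1' | Format-List
```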
Continue…
Start-DatabaseAvailabilityGroup
After any activation blocks have been
removed, active database copies can be
moved to servers in the primary datacenter
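Moving active copies back could be driven from a simple database-to-server map, as in the sketch below. Move-ActiveCopiesToPrimary and the map it takes are hypothetical; the database and server names are placeholders, and the default confirmation prompt of Move-ActiveMailboxDatabase is left in place.

```powershell
# Sketch only: activates each listed database on its chosen primary server.
function Move-ActiveCopiesToPrimary {
    param(
        [hashtable]$TargetByDatabase   # database name -> primary datacenter server
    )
    foreach ($database in $TargetByDatabase.Keys) {
        $target = $TargetByDatabase[$database]
        Write-Host "Activating $database on $target"
        Move-ActiveMailboxDatabase -Identity $database -ActivateOnServer $target
    }
}

# Example call (placeholder names):
# Move-ActiveCopiesToPrimary -TargetByDatabase @{ 'DB1' = 'MBX1'; 'DB2' = 'MBX2' }
```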
Continue…