Exchange Server 2010: High Availability Concepts

SESSION CODE: EXL303
Scott Schnoll
Principal Technical Writer
Microsoft Corporation
(c) 2011 Microsoft. All rights reserved.
Agenda
► Exchange Server 2010 High Availability Concepts
– Terminology
– Quorum
– Witness, Witness Server and Alternate Witness Server
– Active Manager
– AutoDatabaseMountDial
– Activation Preference
– Circular Logging and Continuous Replication
– Replay Lag and Truncation Lag
– Continuous Replication Modes
Exchange Server 2010 High Availability
Concept: Terminology
Exchange Server High Availability
Terminology
► High Availability must meet three criteria
– Service Availability
– Data Availability
– Automatic recovery from most failures
► Site Resilience
– Manual switchover process (datacenter switchover)
used to activate service and data access in an
alternate datacenter when the primary datacenter can
no longer provide the required level of service
Exchange Server High Availability
Terminology
► *overs (pronounced “star overs”)
– Short for Failovers and Switchovers
• Failover is automatic, performed by the system
• Switchover is manual, performed by an administrator
– Database *overs
– Server *overs
– Datacenter switchover
Exchange Server 2010 High Availability
Concept: Quorum
Quorum
► A consensus of voters used to ensure that only one
subset of members is functioning at one time
► A majority of members must be active and have
communications with each other
► Represents a shared view of members
– Voters and resources
► Dual Usage
– Data shared between the voters representing
configuration, etc.
– Number of voters required for the solution to stay running
(majority)
Quorum
► Quorum is necessary for cluster functions and for DAG
functions
► Exchange 2010 uses two cluster quorum models
– Node Majority (DAGs with an odd number of members)
– Node and File Share Majority (DAGs with an even number of
members)
► Quorum = (N/2) + 1 (whole numbers only)
– 6 members: (6/2) + 1 = 4 votes for quorum (can lose 3 voters)
– 9 members: (9/2) + 1 = 5 votes for quorum (can lose 4 voters)
– 13 members: (13/2) + 1 = 7 votes for quorum (can lose 6 voters)
– 15 members: (15/2) + 1 = 8 votes for quorum (can lose 7 voters)
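The quorum arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not an Exchange or cluster API: the function names are hypothetical, and the sketch assumes that a DAG with an even number of members gets one extra voter from the witness (Node and File Share Majority).

```python
def quorum_model(members: int) -> str:
    """Cluster quorum model used for a DAG of the given size."""
    return "Node Majority" if members % 2 else "Node and File Share Majority"


def quorum_votes(members: int) -> int:
    """Votes required for quorum: (N/2) + 1, whole numbers only."""
    if members < 1:
        raise ValueError("a DAG needs at least one member")
    return members // 2 + 1


def voters_that_can_be_lost(members: int) -> int:
    """Voters that can fail while quorum is kept.

    Assumption: an even-sized DAG has one additional voter (the witness).
    """
    total_voters = members + (1 if members % 2 == 0 else 0)
    return total_voters - quorum_votes(members)


for n in (6, 9, 13, 15):
    print(n, quorum_model(n), quorum_votes(n), voters_that_can_be_lost(n))
```

For the four DAG sizes listed on the slide, the loop reproduces the same vote counts and tolerated losses.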
Exchange Server 2010 High Availability
Concept: Witness, Witness Server and Alternate Witness Server
Witness
► A witness is a share with a file on a server that is
external to the DAG that participates in determining
quorum by providing a weighted vote for the DAG
member that has a lock on the witness.log file
– Configured for all DAGs
– Used only by DAGs that have an even number of members
► Witness server does not maintain a copy of quorum
data, does not vote, and is not a member of the DAG
or cluster
Witness
► Part of cluster core resource group (Cluster Group)
– Represented by File Share Witness resource (UNC path)
• File share witness cluster resource, directory, and share
automatically created and removed as needed
– Uses Cluster IsAlive check for monitoring and availability
• If witness IsAlive fails, Cluster Group is failed and moved to
another DAG member
• If other DAG member cannot bring witness resource online, the
resource will remain in a Failed state, with restart attempts every
60 minutes
• See http://support.microsoft.com/kb/978790 for details
Witness
► If Failed and needed for quorum, the cluster will try
once to bring the File Share Witness resource online
– If witness is Failed and cannot be brought Online,
quorum is lost
► If witness can be restarted or is already Online:
– An SMB lock is placed on witness.log typically by
node that owns Cluster Group (“locking node”)
– The Locking Node increments PAXOS and writes the
updated PAXOS tag to the witness.log file
Witness
► When locked, the Locking Node retains a
weighted vote
– Members in contact with locking node are in majority
and maintain quorum
– Members not in contact with locking node are in
minority and lose quorum
► When witness is no longer needed to maintain
quorum, SMB lock on witness.log is released
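The effect of the weighted vote can be illustrated with a simplified model. This is a sketch under stated assumptions, not the cluster's actual algorithm: the function name, the server names, and the set-based view of a network partition are all hypothetical, and the witness vote is only counted for DAGs with an even number of members.

```python
def partition_has_quorum(partition, dag_members, locking_node=None):
    """Return True if the group of members in `partition` keeps quorum.

    partition    -- set of DAG members that can still talk to each other
    dag_members  -- set of all DAG members
    locking_node -- member holding the SMB lock on witness.log, if any
    """
    needed = len(dag_members) // 2 + 1          # (N/2) + 1, whole numbers only
    votes = len(partition & dag_members)        # one vote per reachable member
    even_sized = len(dag_members) % 2 == 0
    if even_sized and locking_node in partition:
        votes += 1                              # weighted vote from the witness
    return votes >= needed


dag = {"MBX1", "MBX2", "MBX3", "MBX4"}
print(partition_has_quorum({"MBX1", "MBX2"}, dag, locking_node="MBX1"))
print(partition_has_quorum({"MBX3", "MBX4"}, dag, locking_node="MBX1"))
```

In a four-member DAG split evenly, only the half that contains the locking node reaches the three votes needed.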
Witness
► Witness used only when needed for quorum
Witness Server
► No pre-configuration typically necessary
– Exchange Trusted Subsystem must be member of
local Administrators group on Witness Server if
Witness Server is not running Exchange 2010
► Cannot be a member of the DAG (present or
future)
► Must be in the same Active Directory forest as
DAG
Witness Server
► Can be Windows Server 2003 or later
– File and Printer Sharing for Microsoft Networks must be
enabled
► Replicating witness directory/share with DFS not
supported
► Not necessary to cluster Witness Server
– If you do cluster the witness server, you must use Windows
Server 2008
► Single witness server can be used for multiple DAGs
– Each DAG requires its own unique Witness Directory/Share
Witness Server
► Using a non-Exchange 2010 witness server:
– You will receive a spurious warning when running
New-DatabaseAvailabilityGroup, Set-DatabaseAvailabilityGroup
or Restore-DatabaseAvailabilityGroup
The Exchange Trusted Subsystem is not a member of the
local Administrators group on specified witness server
<ServerName>.
– Code bug described at http://aka.ms/xh5rah
Alternate Witness Server
► Witness server used by a DAG after a datacenter
switchover
► DAG is configured to use alternate witness server
when you run Restore-DatabaseAvailabilityGroup or ahead of time
by using Set-DatabaseAvailabilityGroup
► DAGs do not dynamically switch witness servers
– Alternate witness server does not provide redundancy
for witness server or FSW resource
Exchange Server 2010 High Availability
Concept: Active Manager
Active Manager
► Exchange component that manages high
availability platform
– Runs inside the Microsoft Exchange Replication
service on every Mailbox server
– Is the definitive source of information on where a
database is active
• Stores this information in cluster database
• Provides this information to Active Manager client running
on other server roles (Client Access and Hub Transport)
Active Manager Roles
► Standalone Active Manager
► Primary Active Manager (PAM)
► Standby Active Manager (SAM)
► Active Manager Client
– Runs in RPC Client Access service on CAS and
Transport service on Hub
Active Manager
► Primary Active Manager (PAM)
– Runs on the node that owns the cluster core resources
(cluster group)
– Gets topology change notifications
– Reacts to server failures
– Selects the best database copy on failovers and
targetless switchovers
– Detects failures of local Information Store and local
databases
Active Manager
► Standby Active Manager (SAM)
– Runs on every other node in the DAG
– Detects failures of local Information Store and local
databases
• Reacts to failures by asking PAM to initiate a failover
– Responds to queries from CAS/Hub about which server
hosts the active copy
► Both roles are necessary for automatic recovery
– If the Microsoft Exchange Replication service is stopped,
automatic recovery will not happen
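The role assignment described on the preceding slides can be summarized in a small sketch. The function and server names are hypothetical; the rule that a Mailbox server outside any DAG runs the Standalone Active Manager is stated here as an assumption about how the roles map to servers.

```python
def active_manager_role(server, dag_members, cluster_group_owner):
    """Return the Active Manager role a Mailbox server would hold.

    server              -- name of the Mailbox server
    dag_members         -- set of servers in the DAG (empty if no DAG)
    cluster_group_owner -- DAG member that owns the cluster core resources
    """
    if server not in dag_members:
        return "Standalone Active Manager"   # assumption: not a DAG member
    if server == cluster_group_owner:
        return "Primary Active Manager (PAM)"
    return "Standby Active Manager (SAM)"


dag = {"MBX1", "MBX2", "MBX3"}
for name in ("MBX1", "MBX2", "MBX9"):
    print(name, "->", active_manager_role(name, dag, cluster_group_owner="MBX1"))
```

If the Cluster Group moves to another member, calling the function with the new owner shows the PAM role moving with it.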
Exchange Server 2010 High Availability
Concept: AutoDatabaseMountDial
AutoDatabaseMountDial
► When a replicated mailbox database is affected by a
failure (e.g., disk, network, service, server), what two
types of recovery are possible?
► Answer:
– Manual – the Administrator performs recovery (switchover)
– Automatic – Exchange 2010 performs recovery (failover)
► Exchange 2010 uses AutoDatabaseMountDial setting
as part of its automatic recovery logic
► AutoDatabaseMountDial is configured using Set-MailboxServer
AutoDatabaseMountDial
► Configured on a per-Mailbox server basis
► Used to determine whether a passive copy being
activated can be mounted, based on the
number of missing log files
► Can be overridden by an
administrator performing a
switchover
AutoDatabaseMountDial
► Best Availability – 12 or fewer missing log files
► Good Availability – 6 or fewer missing log files
(Default)
► Lossless – 0 missing log files
► Best Effort – Boundless; available for switchovers
only
► Custom value – Configured using ADSIEdit
– msExchDataLossForAutoDatabaseMount attribute of
Mailbox server object
– If a custom value >12 is used, increase transport dumpster
size accordingly
AutoDatabaseMountDial
► Dial setting consulted as part of a failover or targetless
switchover during Best Copy Selection (BCS)
► During a failover, the passive copy being activated
attempts to copy any missing log files from previous
active
– If successful, then the database will mount with zero data loss
– If unsuccessful (lossy failure), then the database will mount based
on the AutoDatabaseMountDial setting
► If data loss is outside of AutoDatabaseMountDial setting,
another copy (if available) will be tried
– If another copy is not available, administrator must intervene
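The mount decision described above can be sketched as follows. This is a simplified model, not Active Manager's implementation: the function names and the tuple layout are hypothetical, and the sketch assumes the missing-log count is the number still missing after the attempt to copy logs from the previous active copy.

```python
# Missing-log limits for the named dial settings on the slides.
DIAL_LIMITS = {
    "Lossless": 0,
    "GoodAvailability": 6,   # default
    "BestAvailability": 12,
}


def can_auto_mount(missing_logs, dial="GoodAvailability"):
    """True if a copy missing `missing_logs` log files may mount automatically."""
    return missing_logs <= DIAL_LIMITS[dial]


def first_mountable_copy(candidates):
    """Try candidate copies in order and return the first that may mount.

    candidates -- list of (copy_name, missing_logs, dial_setting_of_host)
    Returns None when no copy qualifies and an administrator must intervene.
    """
    for name, missing_logs, dial in candidates:
        if can_auto_mount(missing_logs, dial):
            return name
    return None


print(first_mountable_copy([
    ("DB1 on MBX2", 9, "GoodAvailability"),   # outside the dial setting
    ("DB1 on MBX3", 4, "GoodAvailability"),   # within the dial setting
]))
```

An administrator performing a switchover can override the dial, which this sketch does not model.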
Exchange Server 2010 High Availability
Concept: Activation Preference
Activation Preference
► A whole number value assigned to each copy of a
given database (e.g., 1, 2, 3, 4, etc.), where 1 is at the
top of the preference order
► Different use in Exchange 2010 RTM vs. SP1 and later
– RTM uses it as tie-breaking mechanism when multiple
activation targets have the same copy queue length
– SP1 uses it for sorting initial list of potential activation
targets when AutoDatabaseMountDial is set to Lossless on
all servers that host a copy of the database
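The difference between the RTM and SP1 behavior can be sketched as two sort orders. This is only an illustration of the ordering step: real Best Copy Selection also evaluates the health and status of each copy, the dictionary keys are hypothetical, and sorting by copy queue length in SP1 when the dial is not Lossless everywhere is an assumption not stated on the slide.

```python
def initial_activation_order(copies, version="SP1", all_lossless=False):
    """Order potential activation targets before health checks are applied.

    copies       -- list of dicts with keys "name", "copy_queue_length",
                    "activation_preference" (hypothetical field names)
    version      -- "RTM" or "SP1"
    all_lossless -- True if AutoDatabaseMountDial is Lossless on every
                    server hosting a copy of the database
    """
    if version == "RTM":
        # Activation preference only breaks ties on copy queue length.
        key = lambda c: (c["copy_queue_length"], c["activation_preference"])
    elif all_lossless:
        # SP1 with Lossless everywhere: sort by activation preference.
        key = lambda c: c["activation_preference"]
    else:
        # Assumption: otherwise SP1 sorts by copy queue length.
        key = lambda c: c["copy_queue_length"]
    return [c["name"] for c in sorted(copies, key=key)]


copies = [
    {"name": "MBX2", "copy_queue_length": 3, "activation_preference": 2},
    {"name": "MBX3", "copy_queue_length": 0, "activation_preference": 4},
    {"name": "MBX4", "copy_queue_length": 0, "activation_preference": 3},
]
print(initial_activation_order(copies, version="RTM"))
print(initial_activation_order(copies, version="SP1", all_lossless=True))
```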
Activation Preference
► Simple example
– Multiple copies of a database in multiple datacenters
Activation Preference
► Activation preference is not a guarantee
► Based on nature of the failure and health and
status of passive copies
► Can be reconfigured on the fly
– Use Set-MailboxDatabaseCopy
– Does not require any restarts
– The number cannot be larger than the number of
copies of the mailbox database
► Also used as part of DAG re-balancing
Exchange Server 2010 High Availability
Concept: Circular Logging and Continuous Replication
Circular Logging and Continuous Replication
► Exchange 2010 includes two forms of circular
logging
– ESE (aka JET) circular logging (CL)
– Continuous replication circular logging (CRCL)
► Over time, each Exchange database generates a set
of log files known as the log stream
► When enabled, circular logging allows Exchange to
overwrite transaction log files after the data
contained in the log files is committed to the
database
Circular Logging and Continuous Replication
► Exchange 2007 continuous replication also
included CRCL, which is different from CL
– CL is a function of JET and is performed by the
Information Store service (store.exe)
– CRCL is a function of continuous replication and is
performed by the Exchange Replication service
(msexchangerepl.exe)
Circular Logging and Continuous Replication
► For truncation to occur on highly available (non-lagged)
mailbox database copies, the answer must be "Yes" to the
following questions:
– Has the log file been backed up, or is CRCL enabled?
– Is the log file below the checkpoint?
– Do the other non-lagged copies of the database agree with
deletion?
– Has the log file been inspected by all lagged copies of the database?
► For truncation to occur on lagged database copies, the answer
must be "Yes" to the following questions:
– Is the log file below the checkpoint?
– Is the log file older than ReplayLagTime + TruncationLagTime?
– Is the log file deleted on the active copy of the database?
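The two truncation checklists can be expressed as boolean checks. This is a sketch of the questions on the slide only: the function and parameter names are hypothetical, and the inputs would come from the replication service in a real system.

```python
from datetime import timedelta


def can_truncate_on_ha_copy(backed_up, crcl_enabled, below_checkpoint,
                            other_copies_agree, inspected_by_lagged_copies):
    """All four questions for a highly available (non-lagged) copy must be Yes."""
    return ((backed_up or crcl_enabled)
            and below_checkpoint
            and other_copies_agree
            and inspected_by_lagged_copies)


def can_truncate_on_lagged_copy(below_checkpoint, log_age, replay_lag,
                                truncation_lag, deleted_on_active):
    """All three questions for a lagged copy must be Yes."""
    return (below_checkpoint
            and log_age > replay_lag + truncation_lag
            and deleted_on_active)


# A log that is 9 days old on a copy with 7 days replay lag and 1 day truncation lag.
print(can_truncate_on_lagged_copy(
    below_checkpoint=True,
    log_age=timedelta(days=9),
    replay_lag=timedelta(days=7),
    truncation_lag=timedelta(days=1),
    deleted_on_active=True,
))
```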
Circular Logging and Continuous Replication
► MSExchangeRepl manages CRCL so that log
continuity is maintained and logs are not deleted
if they are still needed for replication
► MSExchangeRepl and Store communicate via
RPCs regarding which log files can be deleted
Circular Logging and Continuous Replication
► Both features enabled and disabled same way
– By using the checkbox on the Maintenance tab of the
database Properties dialog that says Enable circular
logging
– By using the Set-MailboxDatabase cmdlet with the
-CircularLoggingEnabled parameter set to $true
Circular Logging and Continuous Replication
► Once enabled, the feature you get depends on
whether or not database is replicated
– If the mailbox database is not replicated, it will use JET
circular logging. In this case, enabling or disabling JET
circular logging will require a dismount and mount of
the database
– If the mailbox database is replicated, it will use CRCL.
In this case, enabling or disabling CRCL takes effect
dynamically; there is no need to dismount and remount the database
Circular Logging and Continuous Replication
► No way to transition a database from CL to CRCL or
vice versa
– If you have circular logging enabled for a non-replicated
database and you add your first passive copy, Exchange
will block that action to prevent switching from JET CL to
CRCL
• This is to prevent logs from being truncated while the new copy
seed starts (the newly added passive copy would not seed
correctly if required logs were removed)
– Removing the last passive copy would result in a
transition from CRCL to CL, so it is also blocked,
because that transition would require dismounting and
re-mounting the active copy of the database
Circular Logging and Continuous Replication
► You must therefore disable CRCL/CL when
– You add the second copy of a database (e.g., add the
first passive copy)
– You are trying to remove the last passive copy of a
database
Database "{0}" has circular logging enabled. It is not
possible to add or remove database copies while circular
logging is enabled. Please disable circular logging before
adding or removing mailbox database copies.
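The rules from the last few slides can be combined into a small sketch: which form of circular logging a database gets, and when adding or removing a copy is blocked. The function names and the copy-count model are hypothetical; this illustrates the stated rules and is not Exchange's validation code.

```python
def circular_logging_behavior(replicated):
    """Which circular logging form applies once the setting is enabled."""
    if replicated:
        return {"type": "CRCL",
                "performed_by": "msexchangerepl.exe",
                "remount_required": False}
    return {"type": "JET CL",
            "performed_by": "store.exe",
            "remount_required": True}


def copy_change_allowed(circular_logging_enabled, copy_count, adding):
    """Check whether a database copy may be added or removed.

    copy_count -- current number of copies, including the active copy
    adding     -- True to add a copy, False to remove one
    """
    adds_first_passive = adding and copy_count == 1
    removes_last_passive = (not adding) and copy_count == 2
    if circular_logging_enabled and (adds_first_passive or removes_last_passive):
        return False   # would switch between JET CL and CRCL
    return True


print(circular_logging_behavior(replicated=True)["type"])
print(copy_change_allowed(circular_logging_enabled=True, copy_count=1, adding=True))
```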
Exchange Server 2010 High Availability
Concept: Replay Lag and Truncation Lag
Replay Lag Time
► Enables you to take a database copy back to a specific
point in time
► Configure a Replay Lag Time >0
– Configure using Set-MailboxDatabaseCopy
– Any database copy configured with a replay lag time >0 is
considered a “lagged copy”
► Lagged copies are only for point-in-time protection, but
they are not a replacement for point-in-time backups
– Logical corruption and/or mailbox deletion prevention scenarios
– Provide a maximum of 14 days of protection
Replay Lag Time
► When should you deploy a lagged copy?
– Useful only to mitigate a risk
– Might not be needed if deploying a backup solution
► Lagged copies are not HA database copies
– Lagged copies should never be automatically activated!
– Steps for manual activation documented at
http://technet.microsoft.com/en-us/library/dd979786.aspx
► ESE single page restore feature does not support lagged
copies
– If a lagged copy has database page corruption, it will have to be
reseeded (which will lose the lagged aspect of the copy)
Truncation Lag Time
► Enables you to use the logs on a passive
database copy to recover from the loss of log
files on the active database copy
► Configure a Truncation Lag Time >0
– Configure using Set-MailboxDatabaseCopy
Using Replay Lag or Truncation Lag
► Using Replay Lag or Truncation Lag affects your
storage design
► By design, both cause logs to build up, even
when continuous replication circular logging
(CRCL) is enabled
► Plan your storage design accordingly
► See http://technet.microsoft.com/en-us/library/dd335158.aspx for more info
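A rough way to see the storage impact is to estimate how much log data a lagged copy must hold for the combined lag window. This is a back-of-the-envelope sketch, not a sizing tool: the function name and the logs-per-day figure are hypothetical, the 1 MB log file size is assumed for Exchange 2010, and real designs should follow the storage guidance linked above.

```python
from datetime import timedelta

MAX_REPLAY_LAG = timedelta(days=14)   # maximum protection stated for lagged copies
LOG_FILE_MB = 1                       # assumption: 1 MB transaction log files


def retained_log_mb(logs_per_day, replay_lag, truncation_lag=timedelta(0)):
    """Approximate minimum log volume (MB) kept on a copy due to lag settings."""
    if replay_lag > MAX_REPLAY_LAG:
        raise ValueError("replay lag cannot exceed 14 days")
    window_days = (replay_lag + truncation_lag) / timedelta(days=1)
    return logs_per_day * window_days * LOG_FILE_MB


# Hypothetical database generating 2,000 logs per day, 7 days replay lag,
# 1 day truncation lag.
print(retained_log_mb(2000, timedelta(days=7), timedelta(days=1)))
```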
Exchange Server 2010 High Availability
Concept: Continuous Replication Modes
Continuous Replication Modes
► Two Modes of Continuous Replication in SP1
– File Mode (traditional log shipping)
– Block Mode (ESE log buffer data shipping)
► Operates on a per-database level
► Initial Mode is File Mode
► Block Mode triggered when copy queue length
is 0
► Reverts to File Mode if copy queue length grows
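The mode transitions can be sketched as a simple per-database rule driven by copy queue length. This is a simplified model of the behavior listed above, with hypothetical function names; it treats any non-zero copy queue length as the queue having grown.

```python
def next_mode(copy_queue_length):
    """Block mode when the copy is fully caught up, file mode otherwise."""
    return "block" if copy_queue_length == 0 else "file"


def replay_modes(copy_queue_samples):
    """Trace the replication mode across a series of copy queue length samples.

    The initial mode is file mode; each sample then decides the next mode.
    """
    mode = "file"
    trace = []
    for cql in copy_queue_samples:
        mode = next_mode(cql)
        trace.append(mode)
    return trace


print(replay_modes([3, 1, 0, 0, 2, 0]))
```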
Continuous Replication Modes
[Diagram: Continuous Replication – File Mode and Continuous Replication – Block Mode. Labels: ESE Log Buffer, Database, Replication Log Buffer, Log File 1 through Log File 7. Callouts: "Send me the latest log files … I have log 2"; "Log fragment detected and converted to complete log".]
Continuous Replication Modes
► Is database copy in block mode or file mode?
Get-Counter -ComputerName <DAGMemberName> -Counter "\MSExchange Replication(*)\Continuous replication - block mode Active"

Timestamp                 CounterSamples
---------                 --------------
8/21/2011 2:15:29 PM      \\ex1\msexchange replication(db9)\continuous replication - block mode active : 1
                          \\ex1\msexchange replication(db8)\continuous replication - block mode active : 1
                          \\ex1\msexchange replication(db6)\continuous replication - block mode active : 1
                          \\ex1\msexchange replication(db5)\continuous replication - block mode active : 1
                          \\ex1\msexchange replication(db3)\continuous replication - block mode active : 1
                          \\ex1\msexchange replication(db2)\continuous replication - block mode active : 1
                          \\ex1\msexchange replication(db7)\continuous replication - block mode active : 0
                          \\ex1\msexchange replication(db4)\continuous replication - block mode active : 0
                          \\ex1\msexchange replication(db1)\continuous replication - block mode active : 0
                          \\ex1\msexchange replication(_total)\continuous replication - block mode active : 6
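If the counter output is saved as text, a short script can list which databases are in block mode. This is a sketch that assumes each sample appears on one line in the format shown above (instance name in parentheses, value after the colon); the function name is hypothetical.

```python
import re

# Matches "...replication(<db>)\continuous replication - block mode active : <n>"
SAMPLE = re.compile(
    r"replication\((?P<db>[^)]+)\)\\continuous replication - block mode active"
    r"\s*:\s*(?P<value>\d+)",
    re.IGNORECASE,
)


def block_mode_by_database(counter_text):
    """Map each database instance to True (block mode) or False (file mode)."""
    result = {}
    for match in SAMPLE.finditer(counter_text):
        db = match.group("db")
        if db.lower() == "_total":
            continue  # the _total instance is a sum, not a database
        result[db] = match.group("value") == "1"
    return result


text = r"""
\\ex1\msexchange replication(db9)\continuous replication - block mode active : 1
\\ex1\msexchange replication(db7)\continuous replication - block mode active : 0
\\ex1\msexchange replication(_total)\continuous replication - block mode active : 6
"""
print(block_mode_by_database(text))
```

A value of 1 means the copy is in block mode and 0 means file mode; the _total instance is skipped because it only sums the per-database values.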
Resources
Exchange Team Blog - http://aka.ms/ehlo
Exchange 2010 Documentation - http://aka.ms/ex2010docs
My Blog – http://aka.ms/schnoll
Twitter: @schnoll
Resources
Sessions On-Demand & Community - www.msteched.com/Australia
Microsoft Certification & Training Resources - www.microsoft.com/australia/learning
Resources for IT Professionals - http://technet.microsoft.com/en-au
Resources for Developers - http://msdn.microsoft.com/en-au