Failover Clustering

Microsoft Cluster Service (MSCS) is available for installation on Windows 2000 Advanced Server, Windows
2000 Datacenter Server, and Windows NT 4.0 Enterprise Edition with Service Pack 5 (although the NT option is not
recommended). MSCS allows applications such as SQL Server 2000 Enterprise Edition to take advantage of
high-availability failover functionality. In a failover, if one server (or node) fails, the applications it was running
start up on another server in the cluster, which takes over for the failed node. There are two types of failover:
planned failovers, and those that occur as a result of a server hardware or software problem. Only cluster-aware
applications can use the MSCS failover capabilities. Cluster-aware applications such as SQL Server
remain available even if one node fails. Windows 2000 Advanced Server supports two-node
clustering, and Windows 2000 Datacenter Server supports up to four-node clustering.
MSCS achieves high availability by using a shared disk array. For SQL Server clustering, this shared disk
array contains the database files (.mdf, .ndf, .ldf), along with backups, and other optional components.
The shared disk can be a regular external SCSI disk or disk array, or a Storage Area Network (SAN) array.
SAN arrays are sold by hardware and software vendors, and allow high-speed data transfer to external disk
arrays over a Fibre Channel connection (through a host bus adapter). Software RAID is not supported; only hardware
RAID configurations are allowed. File encryption is also unsupported on the shared disk(s).
Binary installation files and executables are installed locally on both nodes of a two-node cluster, not on the
shared disk. The SQL Server and SQL Server Agent services are also installed on each node of the cluster. For a
SQL Server 2000 virtual server, the same service is installed on both nodes, but only one side is active at any
time. "Virtual server" is another way of saying that the SQL Server instance is on a failover cluster. Clients and
applications connect to the virtual server as they would to a nonclustered default or named instance of SQL
Server. The SQL Server 2000 virtual server can be hosted on, and controlled by, either node within the cluster,
but only by one node at a time.
Chapter 10
If a failure occurs on the active node, the second node will 'go live': its services start up and take over
control of the files on the shared disk array for the SQL Server 2000 virtual server.
In a two-node cluster, two servers are connected using a private network (heartbeat) connection. The private
network gives the two nodes a fast path for checking each other's status. Network connectivity for end users is
established through a public IP address. End users connect to the SQL Server 2000 virtual server using the virtual
server NetBIOS name. This name and its TCP/IP address are reserved before SQL Server installation, and are
specified during installation.
A SQL Server 2000 virtual server is defined by a cluster group, which in turn is made up of separate
resources. A SQL Server 2000 virtual server group includes one or more shared disk drives,
the virtual server name, the virtual server IP address, the SQL Server and SQL Server Agent services and,
optionally, the SQL Server Fulltext service. The group is treated as a single unit: in the event of a node
failure, all of its resources must be brought online on the failover node. Resources and groups are managed in
Cluster Administrator, which is reviewed later in this chapter.
A two-node cluster needs the following names and IP addresses:
| Type | Description |
|---|---|
| Physical server name and TCP/IP address for first node | Just like a normal server requirement, you have a name and associated IP address. |
| Physical server name and TCP/IP address for second node | … |
| Cluster name and TCP/IP address | The cluster name is a virtual name and IP address used when installing MSCS, and is referenced in Cluster Administrator (more on this later). |
| Heartbeat name and TCP/IP address for Node 1 | The heartbeat, or private network connection between the two nodes, used for checking status and availability. The heartbeat addresses of both nodes should be on the same subnet. |
| Heartbeat name and TCP/IP address for Node 2 | … |
| SQL Server 2000 virtual server name and TCP/IP address | The name that SQL Server users will use to access the clustered instance. This instance can exist on either Node 1 or Node 2. Users should treat it like any regular SQL Server instance. If you are installing just one SQL Server instance on the cluster, this is an active/passive setup. |
| Second SQL Server 2000 virtual server name and TCP/IP address | If you use two instances of SQL Server on one cluster, this is called an active/active setup. |
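A two-node cluster needs many names and addresses, so it can help to write the plan down and check it before installation. The following Python sketch uses a hypothetical plan in which every name and IP address is made up. It checks three things drawn from the table: no address is used twice, each name fits the 15-character NetBIOS limit, and both heartbeat addresses share a subnet. The /24 heartbeat prefix is an assumption; substitute your real subnet mask.

```python
import ipaddress

# Hypothetical plan for a two-node active/active cluster.
# All names and addresses are illustrative placeholders.
PLAN = {
    "NODE1": ("public", "10.0.0.11"),
    "NODE2": ("public", "10.0.0.12"),
    "CLUSTER1": ("public", "10.0.0.20"),
    "NODE1-HB": ("heartbeat", "192.168.100.1"),
    "NODE2-HB": ("heartbeat", "192.168.100.2"),
    "SQLVS1": ("public", "10.0.0.21"),
    "SQLVS2": ("public", "10.0.0.22"),
}


def check_plan(plan, heartbeat_prefix=24):
    """Return a list of problems in a name/IP plan; an empty list means none found."""
    problems = []
    owners = {}    # IP address -> first name that used it
    names = set()  # upper-cased names, since NetBIOS names are case-insensitive
    for name, (role, addr) in plan.items():
        if len(name) > 15:
            problems.append(f"{name}: longer than the 15-character NetBIOS limit")
        if name.upper() in names:
            problems.append(f"{name}: duplicate name")
        names.add(name.upper())
        ip = ipaddress.ip_address(addr)
        if ip in owners:
            problems.append(f"{name}: {addr} is already assigned to {owners[ip]}")
        else:
            owners[ip] = name
    # Both heartbeat addresses should sit on the same subnet.
    heartbeat_nets = {
        ipaddress.ip_interface(f"{addr}/{heartbeat_prefix}").network
        for role, addr in plan.values()
        if role == "heartbeat"
    }
    if len(heartbeat_nets) > 1:
        problems.append("heartbeat addresses are not on the same subnet")
    return problems


if __name__ == "__main__":
    print(check_plan(PLAN) or "plan looks consistent")
```

An empty result only means the plan is internally consistent. Pinging each name, as described in the pre-installation checklist, is still required.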
Failover clustering with SQL Server 2000 Enterprise Edition allows either an active/passive or an active/active setup.
Active/passive, also called single instance, means the failover cluster has one node that remains unused until
a failover occurs. For example, you can have a SQL Server named instance called BookRepo\INS1 that is managed by
Node1 by default, but can fail over to Node2.
Active/active, also called multi-instance, makes use of both nodes in a cluster by running two
instances of SQL Server. For example, Node1 can run BookRepo\INS1 and Node2 can run
BookRepo\INS2. The two instances are separate installations of SQL Server, and each can fail over to the other's
node if its own node fails. The advantage of active/active is that you have two SQL Server instances to work
with, and the second node's hardware does not go to waste.
In the event of a failure in Node2, services for both instances of SQL Server would be running on Node1,
pointing to the associated shared disk drives in order to keep both instances running.
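The ownership change described above can be modeled in a few lines. This Python sketch is a toy model only. It ignores the Possible Owners and Preferred Owners settings that real MSCS groups use when choosing a failover target, and simply moves every group on the failed node to a surviving node.

```python
class ToyCluster:
    """Toy model of which node controls each SQL Server group (illustration only)."""

    def __init__(self, nodes, owners):
        self.up = set(nodes)        # nodes currently running
        self.owners = dict(owners)  # group name -> node currently controlling it

    def fail_node(self, failed):
        """Simulate a node failure: every group on it moves to a surviving node."""
        self.up.discard(failed)
        if not self.up:
            raise RuntimeError("no surviving node can take over")
        target = sorted(self.up)[0]
        for group, node in self.owners.items():
            if node == failed:
                self.owners[group] = target


cluster = ToyCluster({"Node1", "Node2"},
                     {r"BookRepo\INS1": "Node1", r"BookRepo\INS2": "Node2"})
cluster.fail_node("Node2")
print(cluster.owners)  # both instances are now controlled by Node1
```

In the active/passive case, the same call moves the single group to the idle node.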
It is important to understand that MSCS provides high availability, but not total fault tolerance. If the node
hosting a SQL Server 2000 virtual server fails, that virtual server restarts under the control of
the second node. Depending on the size of your instances and the recovery time for your databases,
failover can take anywhere from 15 seconds to several minutes. During a failover, any currently
connected users are disconnected, as if the server had been rebooted.
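Because clients are disconnected for the duration of a failover, client code usually has to reconnect to the virtual server name on its own. The following Python sketch shows one common approach, retrying with exponential backoff. It is an illustration, not a documented SQL Server client feature. `connect` stands for whatever call your data-access library uses to open a connection (hypothetical here), and treating `OSError` as "server temporarily unavailable" is an assumption that you should adjust to your driver's exception types.

```python
import time


def connect_with_retry(connect, attempts=6, first_delay=1.0, max_delay=30.0,
                       retry_on=(OSError,), sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait between attempts.

    connect:  zero-argument callable that opens a connection to the virtual
              server name (hypothetical; supply your own driver call).
    retry_on: exception types treated as "server unavailable, try again".
    sleep:    injectable so tests can skip real waiting.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on:
            if attempt == attempts:
                raise  # give up: the failover took longer than we were willing to wait
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

With the defaults, the function waits 1 + 2 + 4 + 8 + 16 = 31 seconds in total before giving up. That covers a short failover. For instances whose recovery takes several minutes, raise `attempts`. Reconnecting does not resume interrupted work: transactions that were uncommitted at the time of the failure are rolled back, so the application must resubmit them.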
Also note that failover clustering is not the same as network load balancing (NLB): the workload for a single
SQL Server instance is not distributed between the two nodes. NLB is a Microsoft Windows product that
balances incoming Internet Protocol traffic across multiple nodes acting as one logical application,
allowing the application to scale out to as many as 32 nodes. NLB is used primarily for Web and media servers, as well
as Terminal Services. Failover clustering (the cluster service), on the other hand, is used primarily for
database, file, print, and messaging servers.
Nor is a failover cluster a substitute for a disaster recovery plan. You must still produce a disaster recovery
plan in order to get mission-critical applications back online as quickly as possible (see Chapter 6).
Cluster Meta Data
The Cluster Group manages failover and keeps track of groups and resources. This group uses a shared drive
called the quorum drive, which stores the cluster's configuration data and recovery log.
Do not install files for your SQL Server instances on the quorum drive. The Cluster Group usually consists
of the cluster IP address resource, cluster name resource, quorum drive resource, time service resource, and
MS DTC service resource. Just like a SQL Server group, the Cluster Group is managed by only one node at a
time, but can fail over to the second node in the event of a problem.
Cluster Administrator
Throughout this chapter, you will be using the Cluster Administrator. Cluster Administrator can be launched
on either of the cluster nodes, or from your client.
If you are familiar with failover clustering in previous versions of SQL Server, note that
from SQL Server 2000 onwards you can use SQL Server Service Manager or SQL Server
Enterprise Manager to start and stop SQL Server; you no longer have to use Cluster
Administrator to start and stop the SQL Server services.
If you have not installed the Administrative Tools from the Windows 2000 CD, you must do so in order to run
Cluster Administrator from your client. To install these utilities, run AdminPak.msi from the I386 folder on
the Windows 2000 Advanced Server CD.
To launch Cluster Administrator, go to Start | Programs | Administrative Tools | Cluster Administrator. A
faster method is to go to Start | Run, type cluadmin, and click OK.
Cluster Administrator allows you to:
• Add new groups (groups are logical groupings of resources).
• Add resources to groups (resources being disk drives, IP addresses, network names, services, and
  more).
• Fail a group over to another node. This means that the group and the resources it contains are taken offline
  on the current node and brought online on the other node.
• Take a group offline (if you need to add new resources or configure group properties).
• Configure resource properties and dependencies (a dependency ensures that the resources a given
  resource needs are online before that resource is brought online).
• Monitor the status of the nodes and resources (red X marks and yellow caution flags indicate problems with
  items in Cluster Administrator).
This chapter will explain how to use Cluster Administrator in the context of installing and configuring a SQL
Server 2000 Virtual Server. This chapter also assumes that MSCS has already been installed and configured.
Pre-Installation Checklist for SQL Server Failover Clustering
• SQL Server 2000 Enterprise Edition is required.
• For active/passive, you install SQL Server once. For active/active, you install SQL Server
  twice, once for each virtual server name. Make sure the names and TCP/IP addresses for your
  virtual server name(s) are reserved.
• Verify that you can ping each of the cluster's names: the physical node names, the heartbeat names,
  and the cluster name. If a ping fails, make sure that the nodes are running and that their
  IP addresses have been configured properly.
• Make sure that the shared disk drive letter mappings are the same on both nodes of the cluster.
• Make sure that the shared disk drive(s) are accessible.
• Make sure your hardware is listed on Microsoft's Hardware Compatibility List (HCL). The fact that you
  can get it to work does not mean that Microsoft will support you if you run into problems. Treat this
  warning seriously for your production environments.
• Make sure all operating system service packs and patches are installed first.
• Make sure NetBIOS is disabled for your private (heartbeat) network connection. Go to Start | Settings |
  Network and Dial-up Connections, double-click the private network connection, and click Properties. Select
  Internet Protocol (TCP/IP) and click Properties, then click the Advanced button. On the WINS tab, make sure
  NetBIOS over TCP/IP is disabled.
• Make sure the network cards used for both the public and private connections are not set to auto
  detect. Go to Start | Settings | Network and Dial-up Connections and double-click the network connection
  (do this for each network connection used). Click Properties, then click Configure. On the
  Advanced tab, select the Link Speed & Duplex property and set it to the speed capabilities of the card
  rather than Auto detect.
• Determine ahead of time the naming convention for your virtual SQL Server name(s) and cluster
  name. With two physical names, two private heartbeat names, one cluster name, and one or more
  virtual SQL Server names, keeping track can get a little confusing!
• Do not install a failover cluster on a domain controller. Putting SQL Server instances on a domain
  controller is generally not recommended even for nonclustered implementations. You may
  encounter resource contention if the domain controller is particularly busy, or security issues if
  the SQL Server and SQL Server Agent services run under a domain account.
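Two of the checklist items concern the shared disks: the drive letters must match on both nodes, and the disks must be accessible. The quorum drive must also stay free of SQL Server files. The following Python sketch checks those rules against a drive inventory that you gather yourself, for example by looking in My Computer on each node. The inventory format and the drive letters are hypothetical.

```python
def check_shared_disks(node_drives, quorum, sql_drives):
    """Check shared-disk letters gathered from each node (hypothetical inventory).

    node_drives: node name -> set of shared-disk drive letters as seen on that node
    quorum:      drive letter of the quorum disk
    sql_drives:  drive letters you plan to use for SQL Server data files
    """
    problems = []
    views = list(node_drives.values())
    if any(view != views[0] for view in views[1:]):
        problems.append("shared disk drive letters differ between nodes")
    missing = set(sql_drives) - set.intersection(*views)
    if missing:
        problems.append(f"not visible on every node: {', '.join(sorted(missing))}")
    if quorum in sql_drives:
        problems.append(f"{quorum} is the quorum disk; do not use it for SQL Server files")
    return problems


inventory = {"NODE1": {"Q:", "F:", "G:"}, "NODE2": {"Q:", "F:", "G:"}}
print(check_shared_disks(inventory, quorum="Q:", sql_drives={"F:", "G:"}))  # prints []
```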
10.1 How to… Install a SQL Server 2000 Virtual Server
Before installing the SQL Server 2000 instance, you must install a cluster-aware version of the Microsoft
Distributed Transaction Coordinator (MS DTC) service.
1. Open a command-line window, type comclust, and press Enter. The MS DTC service will be
   installed as cluster-aware.
2. Repeat step 1 on the second node.
3. Check that the MS DTC service was added to the Cluster Group on your cluster. Go to
   Start | Run and type cluadmin.
4. Expand the Groups folder and click the Cluster Group. Your administrator may have renamed it after the
   MSCS installation, but it is the group containing the quorum disk resource, the cluster name, and the
   cluster IP address. Make sure MS DTC is in the group.
5. While in Cluster Administrator, verify that the node from which you are installing SQL Server
   currently controls the shared disk(s) you plan to use for the SQL Server instance. If not, right-click the
   group containing the disk resources and select Move Group; this moves control of the group to the
   other node.
6. Insert the SQL Server 2000 CD-ROM on the installation node. Installing a clustered SQL Server instance
   from a network drive can cause problems and does not always work. If the main splash
   screen doesn't start automatically within a few seconds, double-click autorun.exe in the root
   directory of the CD.
7. Select SQL Server 2000 Components.
8. Select Install Database Server.
9. Select Next.
10. Type in the SQL Server 2000 virtual server name.
11. Enter your name and company.
12. Read the software license agreement and, if you agree to the terms, select Yes.
13. Enter the SQL Server 2000 Enterprise Edition CD key.
14. Enter the IP address you reserved for the SQL Server 2000 virtual server prior to installation. Select
    the network connection (public or private); your SQL Server instance should use the public
    connection.
15. Select the shared disk that the data files should occupy. Do not use the quorum disk for your SQL
    Server data files. The quorum disk belongs to the Cluster Group (or whatever you named the group
    that controls the failover cluster), which manages the cluster's groups and resources.
    A resource and every resource it depends on must belong to the same group. If SQL Server depended on
    the quorum disk, the SQL Server resources would therefore have to live in the Cluster Group, and a
    SQL Server failure could then take the cluster down with it. Sharing the quorum disk with
    your virtual SQL Server instance may also cause I/O contention or performance issues.
16. Select the nodes in the cluster on which SQL Server will be installed. For a two-node cluster, both
    physical nodes should already be selected by default.
17. Select an administrator account with administrator permissions on both nodes in the cluster. This
    must be a domain account.
18. Decide whether this will be a default instance or a named instance. For a named instance,
    uncheck Default and enter an instance name. For active/passive, keeping Default checked is the norm,
    but for active/active, your server names may make more sense if you use instance names.
19. Select the installation type and directory locations. Program files are placed on a local partition (on
    both nodes during installation), and the data files on a shared disk.
20. Select which components to install.
21. Select the domain account to be used for the SQL Server and SQL Server Agent services. This must be
    a domain account with administrator permissions on both nodes in the cluster.
22. Select the SQL Server instance collation.
23. Select the TCP/IP port. If the port number is 0, a port is chosen dynamically for you during
    installation. If this is an externally facing SQL Server instance (accessed through a firewall), you may
    wish to predefine a TCP/IP port that will be opened on the firewall for such instances. If
    you are configuring a multi-instance cluster, you may need two such defined ports. Defining standard
    ports for single-instance or multi-instance clusters minimizes the number of ports that must be
    opened on your firewall for each externally facing failover cluster. The alternative is a unique,
    dynamically selected port for each virtual server or named instance.
24. Select Next.
25. Choose a licensing mode.
26. As the installation begins, dialog boxes will indicate that operations are being performed on the
    cluster nodes and that virtual server resources are being created.
27. Select Finish.
28. Although you may not be prompted to do so, reboot the first node on which you installed the SQL
    Server 2000 virtual server, and then the second node.
29. To create an active/active configuration, repeat these steps to add a second SQL Server instance.
10.2 How to… Install a Service Pack for a SQL Server 2000 Virtual Server
As new service packs are released, make sure to read the instructions thoroughly to see whether any changes in
procedure have been added for clustered installs. The following example shows how to install Service Pack 2
for SQL Server 2000 on a SQL Server 2000 virtual server (see Chapter 1 for a review of downloading and
installing a service pack):
1. First, make sure you are installing the service pack from the node that currently controls the SQL Server
   instance group you wish to upgrade.
2. Double-click the setup.bat file in the installation directory of the service pack files.
3. Select Next.
4. Select the virtual server name that you wish to upgrade.
5. Select the authentication mode for the service pack setup. A status update will then be displayed.
6. Enter the user name and password of an account with administrator permissions on all nodes in the cluster.
7. Installation will begin. The system databases will be updated, and the binary files on both nodes will be
   updated.
8. You will be prompted to back up the master and msdb databases.
9. Restart both nodes in the cluster. If you have an active/active topology, restart the first node, then
   fail over the non-upgraded instance to the restarted node and restart the second node. To fail over or
   fail back a group, right-click the group in Cluster Administrator and select Move Group. Finally, fail
   back each SQL Server group to its preferred node (the node by which you want that SQL Server group to
   be controlled).
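The active/active restart sequence in the last step is easy to get out of order. The following Python sketch is a toy simulation that only tracks which node owns each group. In practice, the moves happen through the cluster service and Cluster Administrator's Move Group. The model assumes that groups on a node being restarted fail over to the other node first, and it then fails each group back to its preferred node. Node and group names follow the chapter's examples.

```python
def rolling_restart(owners, preferred, nodes=("Node1", "Node2")):
    """Simulate restarting two nodes one at a time, then failing groups back.

    owners:    group -> node that controls it now
    preferred: group -> node that should control it when both nodes are up
    Returns the final ownership and a log of the moves (toy model only).
    """
    owners = dict(owners)
    log = []

    def restart(node):
        other = nodes[1] if node == nodes[0] else nodes[0]
        # Groups on the node being restarted fail over to the other node first.
        for group, host in owners.items():
            if host == node:
                owners[group] = other
                log.append(f"{group} fails over to {other}")
        log.append(f"restart {node}")

    restart(nodes[0])
    restart(nodes[1])
    # Move Group: fail each group back to its preferred node.
    for group, node in preferred.items():
        if owners[group] != node:
            owners[group] = node
            log.append(f"Move Group: {group} back to {node}")
    return owners, log


start = {r"BookRepo\INS1": "Node1", r"BookRepo\INS2": "Node2"}
final, steps = rolling_restart(start, preferred=start)
for step in steps:
    print(step)
```

The property worth checking is that `final` equals the preferred ownership. Each instance ends up on its own node again after both nodes have been restarted.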
10.3 How to… Implement Post-Installation Steps
After installing a SQL Server virtual server, there are some remaining steps that must be performed:
1. Start Cluster Administrator by selecting Start | Run and typing cluadmin.
2. In Cluster Administrator, type in the cluster or server name. If you have used Cluster
   Administrator before, any cluster connections you had open will be brought up immediately instead. To
   access a different cluster, use File | Open Connection.
3. Give the SQL Server 2000 virtual server group a more descriptive name (instead of Disk Group 1 or another
   name that doesn't indicate SQL Server). To rename the group, expand Groups, click the group
   name once, and type the new name. Press Enter when complete.
Adding Additional Disk Resources
After installing SQL Server, the only disk resource added to your SQL Server 2000 Virtual Server group will
be the data file drive selected during installation. If you wish to use additional shared disk drives for SQL
Server, they must be added to the SQL Server group.
1. Open the group that currently holds the disk drives you wish to add to the SQL Server group,
   right-click each disk resource, select Change Group, and select the group to which you want to move it.
2. You will be prompted twice to confirm that you wish to move this resource to a new group; select Yes
   each time.
3. If the original group is now empty, right-click the group in Cluster Administrator and select Delete.
4. Next, you must make this physical disk a dependency of SQL Server so that SQL Server can use it.
   Dependencies ensure that the resources a given resource needs are online before that resource is
   brought online within a cluster group. First, take the SQL Server 2000 virtual server group offline by
   right-clicking the group in Cluster Administrator and selecting Take Offline.
5. With the virtual SQL Server instance group selected in the left pane, right-click the SQL Server
   resource (the service) in the right pane and select Properties.
6. On the Dependencies tab, click the Modify button.
7. Add the disk resource you want the virtual SQL Server instance to use by selecting it in the list of
   available resources and clicking the right arrow to move it to the Dependencies pane.
8. Select OK in the Modify Dependencies dialog box, and OK again in the Properties dialog box for the
   service.
9. Right-click the SQL Server group in the left pane of Cluster Administrator, and select Bring Online.
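Dependencies define an ordering. The cluster brings a resource online only after everything it depends on is online, so a valid startup order is a topological sort of the dependency graph. The following Python sketch shows this using the standard library's `graphlib` (Python 3.9 or later). The resource names and the exact dependency set are hypothetical illustrations; check the Dependencies tab for your real ones. The second function flags any disk in the group that SQL Server does not depend on. As the steps above explain, such a disk cannot be used by SQL Server.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Illustrative SQL Server virtual server group: resource -> resources it depends on.
# Resource names are hypothetical; check the Dependencies tab for your real ones.
DEPENDS_ON = {
    "SQL IP Address": set(),
    "SQL Network Name": {"SQL IP Address"},
    "Disk F:": set(),
    "Disk G:": set(),  # e.g. a disk moved into the group with Change Group
    "SQL Server": {"Disk F:", "Disk G:", "SQL Network Name"},
    "SQL Server Agent": {"SQL Server"},
}


def online_order(depends_on):
    """Order in which resources can be brought online (dependencies first)."""
    return list(TopologicalSorter(depends_on).static_order())


def disks_sql_cannot_use(depends_on, sql_resource="SQL Server"):
    """Disk resources in the group that SQL Server does not depend on."""
    disks = {name for name in depends_on if name.startswith("Disk ")}
    return disks - depends_on[sql_resource]


print(online_order(DEPENDS_ON))
print(disks_sql_cannot_use(DEPENDS_ON))  # set(): every disk is a dependency
```

Taking the group offline reverses this order: dependent resources such as SQL Server Agent go offline before the resources they rely on.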
Other Post-Installation Configurations to Monitor
• Watch out for fixed memory sizes defined on your SQL Server instances. In an active/active cluster, make
  sure the total fixed memory of all instances does not exceed the memory of one node. For example, if
  you set a fixed 2.5GB size for each instance but each server has 4GB of memory, you will have
  problems when both instances must share a node.
• Keep the BUILTIN\Administrators login in SQL Server, as it is used by the cluster service
  account.
• If you plan on using replication on a clustered SQL Server instance, use a share name on the shared
  disk. Do not use a local disk on either of the physical server nodes.
• Always look for instructions specific to clustering. For example, if you are downloading the latest security
  patch, follow the instructions for a cluster and not the regular installation instructions. If explicit
  clustered installation instructions do not exist, do not assume that your changes will work, or that
  your new components will be cluster-aware.
• If you ever need to change the IP address of the virtual SQL Server, you must do so using the SQL
  Server Enterprise Edition installation CD. See Microsoft Knowledge Base article Q244980, HOWTO:
  Change the Network IP Addresses on a Virtual SQL Server.
• To change the domain of a SQL Server failover cluster, see Microsoft Knowledge Base
  article 319016, HOW TO: Change Domains for a SQL Server 2000 Failover Cluster.
• Unlike a nonclustered instance of SQL Server 2000, a virtual server on a failover cluster should not be
  renamed in place; instead, you must uninstall and reinstall the virtual server with the new name. For more
  details, see Microsoft Knowledge Base article 307336, INF: How to Change a Clustered SQL Server Network
  Name.
• Microsoft recommends that on symmetric multiprocessing (SMP) systems, one processor be reserved
  for the operating system and cluster service by using processor affinity. Weigh this against how
  CPU-intensive your SQL Server 2000 virtual server (or servers, for active/active) will be.
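The memory and processor items above both come down to simple arithmetic. The following Python sketch shows both calculations. The first checks that all fixed-memory instances still fit on one node after a failover. The amount to leave for the operating system is an assumption that you choose. The second builds a processor bitmask in which bit n stands for CPU n, with the reserved CPUs cleared; that is the form SQL Server 2000's 'affinity mask' configuration option (set with sp_configure) expects. Reserving CPU 0 specifically is an illustrative assumption.

```python
def fixed_memory_fits(fixed_mb, node_memory_mb, os_reserve_mb=0):
    """True if every instance's fixed memory fits on ONE node after a failover.

    fixed_mb:      fixed memory size per SQL Server instance, in MB
    os_reserve_mb: memory to leave for the OS and cluster service (your assumption)
    """
    return sum(fixed_mb) + os_reserve_mb <= node_memory_mb


def affinity_mask(cpu_count, reserved=(0,)):
    """Bitmask of CPUs SQL Server may use (bit n = CPU n), minus the reserved CPUs."""
    mask = (1 << cpu_count) - 1
    for cpu in reserved:
        mask &= ~(1 << cpu)
    return mask


# The example from the text: two instances fixed at 2.5GB each, 4GB per node.
print(fixed_memory_fits([2560, 2560], node_memory_mb=4096))  # False: 5120 MB > 4096 MB
# Four CPUs with CPU 0 left for the OS and cluster service.
print(affinity_mask(4))  # 14 (binary 1110)
```

In an active/active cluster, both instances may end up on the same node, so each one would normally get the same mask.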
10.4 How to… Troubleshoot a Failed SQL Server Virtual Server
Installing a new SQL Server 2000 virtual server can take anywhere from 15 to 30
minutes, depending on the hardware used. If your installation takes longer than that, make sure it has not
hung: press Alt+Tab to check for error messages hidden behind the setup window. For example, if you are
installing SQL Server from a network drive, a 'files not found' error may be waiting in the background
(which is why installing from a CD-ROM drive is preferred).
For failed or interrupted installations, you can usually restart installation by removing the resources created
so far via Cluster Administrator; these include the resources for SQL Server Name and SQL Server IP address.
Do not delete the resource for the shared disk drive(s) you may be using. You may also need to delete binary
files that were installed on the local disk on both nodes, before trying to reinstall the SQL Server clustered
instance.
Before attempting to do a manual cleanup of the files and resources, attempt a regular uninstall first (see the
next section). For more details on manual uninstalls of Virtual Servers, see the Microsoft Knowledge Base
article 290991, Manually Remove SQL Server 2000 Default, Named, or Virtual Instance.
Some other troubleshooting tips to keep in mind:
• SQL Server 2000 clustering does not support SQLMail and SQLAgentMail.
• Examine the Sqlstp.log file in the C:\WINNT directory for more detail about which errors occurred and
  how far the installation proceeded before failing.
• See Microsoft Knowledge Base article Q321063, HOW TO: Troubleshoot the 'Setup Failed to
  Perform Required Operations on the Cluster Nodes' Error, for more in-depth information on
  troubleshooting failed installations. Other useful troubleshooting articles include article Q279642,
  PRB: SQL Server 2000 Virtual Server Setup Error: 'The Drive Chosen for the Program Files Installation
  Path <C:>, Is Not a Valid Path on All the Nodes of the Cluster', Not Valid; article Q235529, MSCS
  Virtual Server Limitations in a Windows 2000 Domain Environment; and article Q283794, Problems
  Using Certificate with Virtual Name in Clustered SQL Servers.
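Sqlstp.log can be long, so a quick first pass is to pull out the lines that mention errors or failures and note the last line written, which shows how far setup got. The following Python sketch is a plain text search, not a parser for the log's format. The search terms are guesses, and the default path follows the C:\WINNT location given above.

```python
import re
from pathlib import Path

# Assumption: lines mentioning "error(s)", "fail", "failed", or "failure" are worth
# reading first. This is a plain text search, not a parser for Sqlstp.log's format.
SUSPICIOUS = re.compile(r"\b(errors?|fail(?:ed|ure)?)\b", re.IGNORECASE)


def scan_lines(lines):
    """Return (line number, text) for suspicious lines, plus the last line logged."""
    hits = [(n, line) for n, line in enumerate(lines, 1) if SUSPICIOUS.search(line)]
    last = lines[-1] if lines else ""
    return hits, last


def scan_setup_log(path=r"C:\WINNT\Sqlstp.log"):
    # latin-1 decodes any byte sequence, so unexpected characters will not stop the scan
    text = Path(path).read_text(encoding="latin-1")
    return scan_lines(text.splitlines())
```

On the node where setup failed, call `scan_setup_log()` and read the flagged lines along with the Knowledge Base articles listed above. The last line logged shows roughly where setup stopped.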
10.5 How to… Uninstall a SQL Server 2000 Virtual Server
To uninstall a SQL Server 2000 Virtual Server:
1. From the SQL Server 2000 Enterprise Edition installation CD, select Setup.
2. Select Next from the Welcome screen.
3. Select the SQL Server 2000 virtual server you wish to uninstall.
4. Select Upgrade, remove, or add components to an existing instance of SQL Server.
5. Select the named instance to remove.
6. Select Next to begin the uninstall.
7. Select a valid service account with administrator permissions on all nodes (setup removes the
   binary files from both nodes in the cluster).
8. A dialog box will show the progress of the operation.
9. You will then receive a message once the instance has been removed successfully.
10. Select Finish in the final dialog box, and reboot each node in the cluster.