Failover Clustering

Microsoft Cluster Service (MSCS) is available for installation on Windows 2000 Advanced Server, Windows 2000 Datacenter Server, and Windows NT 4.0 Enterprise Edition with Service Pack 5 (although this is not recommended). MSCS allows applications, such as SQL Server 2000 Enterprise Edition, to take advantage of high-availability failover functionality. A failover means that if one server, or node, fails, the applications start up on another server in the cluster, which takes over for the failed node. There are two types of failover: planned failovers, and those that occur as a result of a server hardware or software problem. Only applications that are cluster-aware can utilize the MSCS failover capabilities. Cluster-aware applications, such as SQL Server, will remain available even if one node encounters a failure. Windows 2000 Advanced Server supports two-node clustering, and Windows 2000 Datacenter Server supports four-node clustering.

MSCS achieves high availability by using a shared disk array. For SQL Server clustering, this shared disk array contains the database files (.mdf, .ndf, .ldf), along with backups and other optional components. The shared disk can be a regular external SCSI disk or disk array, or a Storage Area Network (SAN) array. SAN arrays are sold by hardware and software vendors, and allow high-speed data transfer to external disk arrays via a Fibre Channel connection (and network card). Software RAID is not supported; only hardware RAID configurations are allowed. File encryption is also unsupported for the shared disk(s).

Binary installation files and executables are installed locally on both nodes of a two-node cluster, and are not placed on the shared disk. The SQL Server and SQL Server Agent services are also installed on each node of the cluster. For a SQL Server 2000 virtual server, the same service is installed on both nodes, but only one side is active at any time.
Virtual server is another way of saying that the SQL Server instance is on a failover cluster. Clients and applications connect to the virtual server as they would to a nonclustered default or named instance of SQL Server. The SQL Server 2000 virtual server can exist on, or be controlled by, either node within a cluster (but only one at a time).

If a failure occurs on the active node, the second node will 'go live'; its services will start up and take over control of the files on the shared disk array for the SQL Server 2000 virtual server.

In a two-node cluster, the two servers are connected using a private network (heartbeat) connection. The private network gives the two nodes a fast path for checking each other's status. Network connectivity for end users is established via a public IP address. End users connect to the SQL Server 2000 virtual server using the virtual server NetBIOS name. This name and its TCP/IP address are defined prior to SQL Server installation, and are specified during installation.

A SQL Server 2000 virtual server is defined by a cluster group, which in turn is made up of separate resources. Resources included in a SQL Server 2000 virtual server group include a shared disk drive or drives, the virtual server name, the virtual server IP address, the SQL Server and SQL Server Agent services and, optionally, the SQL Server Fulltext service. This group is treated as a single unit; in the event of a node failure, all of its resources must be started or engaged on the failover node. Resources and groups are managed in Cluster Administrator, which will be reviewed later in this chapter.

A two-node cluster needs the following names and IP addresses:

| Type | Description |
| --- | --- |
| Physical server name and TCP/IP address for first node | Just like a normal server requirement, you have a name and associated IP address. |
| Physical server name and TCP/IP address for second node | … |
| Cluster name and TCP/IP address | The cluster name is a virtual name and IP address used for adding MSCS, and is referenced in Cluster Administrator (more on this later on). |
| Heartbeat name and TCP/IP address for Node 1 | The heartbeat or private network connection between the two nodes for checking status and availability. These should be on the same subnet. |
| Heartbeat name and TCP/IP address for Node 2 | … |
| SQL Server 2000 virtual server name and TCP/IP address | This is the name that SQL Server users will use to access the clustered instance. This instance can exist on either Node 1 or Node 2. Users should treat this instance like any regular SQL Server instance. If you are installing just one SQL Server instance on the cluster, this is an active/passive setup. |
| Second SQL Server 2000 virtual server name and TCP/IP address | If using two instances of SQL Server on one cluster, this is called an active/active setup. |

Failover clustering with SQL Server 2000 Enterprise Edition allows an active/passive or an active/active setup. Active/passive, also called single instance, means the failover cluster has one node that remains unused until a failover occurs. You can have a SQL Server named instance called BookRepo\INS1 that is managed by Node1 by default, but can fail over to Node2.

Active/active configuration, also called multi-instance, makes use of both nodes in a cluster. You can run two instances of SQL Server on the cluster. For example, Node1 can run BookRepo\INS1 and Node2 can run BookRepo\INS2. Both instances are separate installations of SQL Server and can fail over to each other's node if one node fails.
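The active/active behavior described above can be sketched with a small model. The following Python example only illustrates the ownership rules and is not an interface to MSCS: the `Cluster` class and its `fail_node` method are hypothetical names, and the sketch assumes that each virtual server is owned by exactly one node at a time and that every group on a failed node moves to the surviving node.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model of a two-node failover cluster (hypothetical, not an MSCS API)."""
    nodes: set
    # Maps each virtual server (cluster group) to the node that currently owns it.
    owner: dict = field(default_factory=dict)

    def fail_node(self, failed: str) -> None:
        """Simulate a node failure: move its groups to the surviving node."""
        survivors = sorted(self.nodes - {failed})
        if not survivors:
            raise RuntimeError("no surviving node to fail over to")
        target = survivors[0]
        # Iterate over a copy so the mapping can be updated safely.
        for group, node in list(self.owner.items()):
            if node == failed:
                self.owner[group] = target
        self.nodes.discard(failed)


# Active/active: each node owns one instance by default.
cluster = Cluster(
    nodes={"Node1", "Node2"},
    owner={r"BookRepo\INS1": "Node1", r"BookRepo\INS2": "Node2"},
)
cluster.fail_node("Node2")
print(cluster.owner)
```

After the simulated failure of Node2, both instances are owned by Node1, which is the situation in which a single node has to carry the workload of both virtual servers.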
The advantage of active/active is that you have twice the SQL Server instances to work with, and the hardware does not go to waste. In the event of a failure in Node2, services for both instances of SQL Server would be running on Node1, pointing to the associated shared disk drives in order to keep both instances running.

It is important to understand that MSCS allows high availability, but not total fault-tolerance. If the node hosting a SQL Server 2000 virtual server fails, that virtual server will restart and be controlled from the second node. Depending on how large your instances are, and the recovery time for your databases, failover can take anywhere from 15 seconds to several minutes. During a failover, any users currently connected will be disconnected as if the server were rebooted.

Also note that failover clustering is not the same as network load balancing (NLB). Workload is not distributed between the two nodes for one SQL Server instance. NLB is a Microsoft Windows product that balances incoming Internet Protocol traffic across multiple nodes, acting as one logical application, and allowing the application to scale out up to 32 nodes. NLB is used primarily for Web and media servers, as well as Terminal Services. Failover clustering (the cluster service), on the other hand, is used primarily for database, file, print, and messaging servers.

Nor is a failover cluster a substitute for a disaster recovery plan. You must still produce a disaster recovery plan in order to get mission-critical applications back online as quickly as possible (see Chapter 6).

Cluster Meta Data

The Cluster Group manages failover and keeps track of groups and resources. This group uses a shared drive called the quorum drive, which is used for cluster management and logging. Do not use this quorum drive to install files for your SQL Server instances.
The cluster group usually consists of the cluster IP address resource, cluster name resource, quorum drive resource, time service resource, and MS DTC service resource. Just like a SQL Server group, the Cluster Group is only managed by one node at a time, but can fail over to the second node in the event of a problem.

Cluster Administrator

Throughout this chapter, you will be using the Cluster Administrator. Cluster Administrator can be launched on either of the cluster nodes, or from your client. If you are familiar with previous versions of failover clustering for SQL Server, please note that, as an enhancement from SQL Server 2000 onwards, you can use SQL Server Service Manager or SQL Server Enterprise Manager to start and stop SQL Server, without having to use Cluster Administrator to do so.

If you have not installed the Administrative Tools from the Windows 2000 CD, you must do so in order to run Cluster Administrator from your client. To install these utilities, run AdminPak.msi from the I386 folder on the Windows 2000 Advanced Server CD.

To launch Cluster Administrator, go to Start | Programs | Administrative Tools | Cluster Administrator. A faster method is to go to Start | Run, type in cluadmin, and click OK.

Cluster Administrator allows you to:

Add new groups (groups are logical groupings of resources).
Add resources to groups (resources being disk drives, IP addresses, network names, services, and more).
Fail a group over to another node. This means that the group and the resources it contains are taken offline on the existing node, and brought online on the other node.
Take a group offline (if you need to add new resources or configure group properties).
Configure resource properties and dependencies (dependencies ensure that the resources a given resource relies on are available before it runs).
Monitor the status of the nodes and resources (red X marks and yellow caution flags show issues with entities in Cluster Administrator).

This chapter will explain how to use Cluster Administrator in the context of installing and configuring a SQL Server 2000 virtual server. This chapter also assumes that MSCS has already been installed and configured.

Pre-Installation Checklist for SQL Server Failover Clustering

SQL Server 2000 Enterprise Edition is required. For active/passive, you need to install SQL Server once. For active/active, you install SQL Server twice, once for each virtual server name.
Make sure your name and TCP/IP addresses are reserved for your virtual server name(s).
Verify that you can ping each of the DNS names for the cluster: the heartbeat names, the physical node names, and the cluster name. If you get a response problem, make sure that the nodes are running and IP addresses have been configured properly.
Make sure that the shared disk drive letter mappings are the same on both nodes of the cluster.
Make sure that the shared disk drive(s) are accessible.
Make sure your hardware is listed on Microsoft's Hardware Compatibility List (HCL). The fact that you can get it to work does not mean that Microsoft will support you if you run into problems. Treat this warning seriously for your production environments.
Make sure all operating system service packs and patches are installed first.
Make sure NetBIOS is disabled for your private (heartbeat) network properties. Go to Start | Settings | Network and Dial-up Connections. Double-click the private network connection. Click Properties. Select TCP/IP and select Properties. Click the Advanced button. On the WINS tab, make sure NetBIOS is disabled.
Make sure the network cards used for both public and private connections are not set to auto detect. Go to Start | Settings | Network and Dial-up Connections. Double-click the network connection (make sure to do this for each network connection used). Click Properties. Click Configure. On the Advanced tab, select Link Speed & Duplex. Make sure this is not set to Auto detect, but rather to the speed capabilities for the card.
Determine ahead of time the naming convention for your virtual SQL Server name(s) and cluster name. With two physical names, two private heartbeat names, one cluster name, and one or more virtual SQL Server names, keeping track can get a little confusing!
Do not install a failover cluster on a domain controller. Putting SQL Server instances on a domain controller is not recommended even for nonclustered implementations. You may encounter performance contention if the domain controller is particularly busy, or security issues if you require a domain account for running the SQL Server and SQL Server Agent service accounts.

10.1 How to… Install a SQL Server 2000 Virtual Server

Prior to installing the SQL Server 2000 instance, you must install a cluster-aware version of the Microsoft Distributed Transaction Coordinator (MS DTC) service:

Open a command line window and type comclust. Press Enter. The MS DTC service will now be installed as 'cluster aware'.
Repeat step 1 on the second node.
Check to make sure that the MS DTC service was added to the Cluster Group on your cluster. Go to Start | Run and type cluadmin. Expand the Groups folder and click the Cluster Group (this may have been named something else by your administrator after the MSCS installation, but this is the group with the quorum disk resources, cluster name, and IP address). Make sure MS DTC is in the group.
While in Cluster Administrator, verify that the node from which you are installing SQL Server currently controls the shared disk(s) you plan to use for the SQL Server instance. If not, right-click the group containing the disk resources and select Move Group; this will move control of the group to the opposite node.
Insert the SQL Server 2000 CD-ROM on the installation node. Installing a SQL Server instance for clustering from a network drive can cause difficulties and does not always work. If the main splash screen doesn't automatically start up in a few seconds, double-click autorun.exe in the root directory of the CD.
Select SQL Server 2000 Components.
Select Install Database Server.
Select Next.
Type in the SQL Server 2000 virtual server name.
Enter your name and company.
Read the software license agreement and, if you agree to the terms, select Yes.
Enter the SQL Server 2000 Enterprise Edition CD key.
Enter the IP address for the SQL Server 2000 virtual server that you reserved prior to installation. Select the network connection (public or private); your SQL Server instance should use the public connection.
Select the shared disk that the data files should occupy. Do not use the quorum disk for your SQL Server data files; the quorum disk belongs to the Cluster Group (or whatever you named the group that controls the failover cluster), which is the group that manages the cluster's groups and resources. If the SQL Server instance depends on the quorum drive and the SQL Server group fails, this may also take the cluster down. This is because a resource such as a physical disk and all the resources that depend on it (here, both SQL Server and the cluster service) must exist in the same group. Sharing the quorum disk with your virtual SQL Server instance may also cause I/O contention or performance issues.
Select which nodes in the cluster will have SQL Server installed. The default for a two-node cluster should already have the two physical nodes selected.
Select an administrator account with administrator permissions for both nodes in the cluster. This must be a domain account.
Decide whether this will be a default instance or a named instance. If a named instance, uncheck Default and select an instance name. For active/passive, keeping Default checked is the norm, but for active/active, your server names may make more sense if you use an instance name.
Select the installation type and directory location. Program files are placed on a local partition (during installation on both nodes) and the data files on a shared disk.
Select which components to install.
Select the domain account to be used for the SQL Server and SQL Server Agent services. This must be an account that exists on both nodes in the cluster, with administrator permissions.
Select the SQL Server instance collation.
Select the TCP/IP port. If the port number is 0, a port will be dynamically chosen for you upon installation. If this is an externally facing SQL Server instance (accessed through a firewall), you may wish to pre-define a TCP/IP port that will be opened on the firewall for that instance. If you are configuring a multi-instance cluster, you may need two such defined ports. Defining standard ports for single or multi-instance clusters will allow you to minimize the number of ports that must be opened on your firewall for each externally facing failover cluster. The alternative is unique ports, selected randomly during the install, for each instance belonging to a virtual server or named instance. Select Next.
Choose a licensing mode.
As the installation begins, you will receive update dialog boxes indicating that operations are being performed on the cluster nodes, and that virtual server resources are being created.
Select Finish. Although you may not be prompted to do so, reboot the first node on which you installed the SQL Server 2000 virtual server, and then the second node too.

To create an active/active configuration, repeat the steps to add a new SQL Server instance.
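If you pre-define a TCP/IP port for an externally facing instance, it can be useful to confirm from a client machine that the port is reachable once the firewall has been configured. The following Python sketch shows one way to do this under stated assumptions: the function name `port_is_open` is hypothetical, and the virtual server name VSQL1 and port 1433 used in the note below are illustrative values, not values taken from the installation steps above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `port_is_open("VSQL1", 1433)` would report whether a TCP connection to that hypothetical virtual server name succeeds. A True result only shows that something is accepting connections on that port; it does not confirm that the SQL Server instance itself is healthy.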
10.2 How to… Install a Service Pack for a SQL Server 2000 Virtual Server

As new service packs are released, make sure to read the instructions thoroughly to see if any changes in procedure have been added for clustered installs. The following example shows how to install Service Pack 2 for SQL Server 2000 on a SQL Server 2000 virtual server (see Chapter 1 for a review of downloading and installing a service pack):

First, make sure you are installing the service pack from the node currently controlling the SQL Server instance group that you wish to upgrade.
Double-click the setup.bat file in the installation path directory of the service pack files.
Select Next.
Select the virtual server name that you wish to upgrade.
Select the authentication mode for the service pack setup. You will receive a status update.
Select the user name and password with administrator permissions for all nodes in the cluster.
Installation will begin. System databases will be updated, and binary files on both nodes will be updated.
You will be prompted to back up the master and msdb databases.
Restart both nodes in the cluster. If you have an active/active topology, restart the first node, and then fail over the non-upgraded instance to the restarted node and restart the second node. To fail over or fail back a group, simply right-click the group in Cluster Administrator and select Move Group.
Fail back each SQL Server group to its default node (the node by which you want each SQL Server group to be controlled).

10.3 How to… Implement Post-Installation Steps

After installing a SQL Server virtual server, there are some remaining steps that must be performed:

Start Cluster Administrator by selecting Start | Run and typing cluadmin.
In Cluster Administrator, type in the cluster or server name. If you have already been in Cluster Administrator, any cluster connections you had open will immediately be brought up instead. If you want to access a new cluster, go to the File | Open Connection window.
Give the SQL Server 2000 virtual server group a more appropriate name (instead of Disk Group 1 or something not indicating SQL Server). To change the group name, expand Groups and click the group name once to enter a new name. Press Enter when complete.

Adding Additional Disk Resources

After installing SQL Server, the only disk resource added to your SQL Server 2000 virtual server group will be the data file drive selected during installation. If you wish to use additional shared disk drives for SQL Server, they must be added to the SQL Server group.

Open the group that currently holds the disk drives you wish to add to the SQL Server group, right-click each disk resource, select Change Group, and select the group to which to move it. You will be prompted to confirm that you wish to move this resource to a new group, and then prompted a second time; select Yes each time.
If the original group is now empty, right-click the group in Cluster Administrator and select Delete.
Next, you must make this physical disk a dependency of SQL Server so that SQL Server can use it. Dependencies ensure that resources are available before a specific resource can be taken online within a cluster group. First, the SQL Server 2000 virtual server group must be taken offline; do this by right-clicking the group in Cluster Administrator, and selecting Take Offline.
In the right pane, with the virtual SQL Server instance group selected in the left pane, right-click the SQL Server resource (the service) and select Properties.
In the Dependencies tab, click the Modify button.
Add the disk resource you wish the virtual SQL Server instance to use, by clicking the available resources and selecting the right arrow to add to the Dependencies pane.
Select OK in the Modify Dependencies dialog box, and OK again in the Properties dialog box for the service.
Right-click the SQL Server group in the left pane of Cluster Administrator, and select Bring Online.

Other Post-Installation Configurations to Monitor

Watch out for fixed memory sizes defined on your SQL Server instances; make sure the total size used for an active/active cluster does not exceed the total memory resources of one node. For example, if you set a fixed 2.5GB size for each instance but the total memory per server is 4GB, you will have problems if two instances need to share a node (2 × 2.5GB = 5GB, which exceeds 4GB).
Keep the BUILTIN\Administrators account in SQL Server, as it is used by the cluster service account.
If you plan on using replication on a clustered SQL Server instance, use a share name on the shared disk. Do not use a local disk on either of the physical server nodes.
Always look for instructions specific to clustering. For example, if downloading the latest security patch, follow the instructions for a cluster and not the regular installation instructions. If explicit clustered installation instructions do not exist, do not assume that your changes will work, or that your new components will be cluster-aware.
If you ever need to change an IP address for the virtual SQL Server, you must do so using the SQL Server Enterprise Edition installation CD. See Microsoft Knowledge Base article Q244980, HOWTO: Change the Network IP Addresses on a Virtual SQL Server.
To change the domain of a SQL Server failover cluster, reference the Microsoft Knowledge Base article 319016, HOW TO: Change Domains for a SQL Server 2000 Failover Cluster.
Unlike a nonclustered instance of SQL Server 2000, renaming a virtual server on a failover cluster is not recommended. You must uninstall and reinstall the clustered instance with the new name. For more details, see the Microsoft Knowledge Base article 307336, INF: How to Change a Clustered SQL Server Network Name.
Microsoft recommends that symmetric multiprocessing (SMP) systems should have one processor reserved for the operating system and cluster service, using processor affinity. You should consider this within the context of how intense CPU usage will be for your SQL Server 2000 virtual server (or servers for active/active).

10.4 How to… Troubleshoot a Failed SQL Server Virtual Server

When installing a new SQL Server 2000 virtual server, installation could take anywhere from 15 to 30 minutes, depending on the hardware used. If your installation takes longer than that, make sure the installation has not hung. Press the Alt-Tab key combination to make sure that there are no error messages in the background. For example, if attempting to install SQL Server from a network drive, you might receive a files not found error in the background (which is why installing from a CD-ROM drive is preferred).

For failed or interrupted installations, you can usually restart installation by removing the resources created so far via Cluster Administrator; these include the resources for SQL Server Name and SQL Server IP address. Do not delete the resource for the shared disk drive(s) you may be using. You may also need to delete binary files that were installed on the local disk on both nodes, before trying to reinstall the SQL Server clustered instance. Before attempting to do a manual cleanup of the files and resources, attempt a regular uninstall first (see the next section). For more details on manual uninstalls of virtual servers, see the Microsoft Knowledge Base article 290991, Manually Remove SQL Server 2000 Default, Named, or Virtual Instance.

Some other troubleshooting tips to keep in mind:

SQL Server 2000 clustering does not support SQLMail and SQLAgentMail.
Examine the Sqlstp.log file in the C:\WINNT directory for more detail about which errors occurred and how far the installation proceeded before failing.
Check out the Microsoft Knowledge Base article Q321063, HOW TO: Troubleshoot the 'Setup Failed to Perform Required Operations on the Cluster Nodes' Error, for more in-depth information on troubleshooting failed installations.
Other useful troubleshooting articles include article Q279642, PRB: SQL Server 2000 Virtual Server Setup Error: 'The Drive Chosen for the Program Files Installation Path <C:>, Is Not a Valid Path on All the Nodes of the Cluster', Not Valid; article Q235529, MSCS Virtual Server Limitations in a Windows 2000 Domain Environment; and article Q283794, Problems Using Certificate with Virtual Name in Clustered SQL Servers.

10.5 How to… Uninstall a SQL Server 2000 Virtual Server

To uninstall a SQL Server 2000 virtual server:

From the SQL Server 2000 Enterprise Edition installation CD, select Setup.
Select Next from the Welcome screen.
Select the SQL Server 2000 virtual server you wish to uninstall.
Select Upgrade, remove, or add components to an existing instance of SQL Server.
Select the named instance to remove.
Select Next to proceed with uninstalling the installation.
Select a valid service account with administrator permissions to all nodes (installation will remove binary files from both nodes in the cluster).
You will receive a prompt updating the progress of the operation. You will then receive a prompt after the instance has been removed successfully.
Select Finish in the final dialog box.
Reboot each node in the cluster.