Microsoft® SQL Server™ with Failover Clustering Solution Guide

Enterprise Systems Group (ESG)
Storage Systems Group (SSG)

Dell White Paper
By Kim L Murphy
April 2001

Contents

Executive Summary
Introduction
Methodology, Products and Tools
    Windows Clustering
    SQL Server 2000
    SQL Server 2000 Failover Clustering
    Assumptions/Pre-Requisites
Planning Considerations for SQL Server
    Planning Steps
    Choosing the Platform
        SCSI vs. Fibre Channel
        SCSI Storage Characteristics
    Sizing the system
    Planning the I/O subsystem
        PERC 2 Controller Disk Array Limitation
    Planning the Network
    Limitations
    Description of Environment
        Detailed Description of System
        Hardware
        Software
Building the Base Cluster
    Hardware Installation and Configuration
    Windows 2000 Installation
    Install Windows 2000 Service Pack (SP)1 (or later)
    Creating Windows 2000 Disk Partitions
    Setting up a Domain User Account for the Cluster
    Cluster Configuration Wizard
    Verifying the Cluster Setup
    Managing Disks in Cluster Administrator
    Configuring MS DTC for the Cluster
Adding SQL Server 2000 to the Cluster
Best Practices, Key Findings and Summary
Contacts
Appendix 1: Reference Documents
Appendix 2: Vendor Information
    Dell Computer Corporation
    Microsoft Corporation
    Intel Corporation
Appendix 3: Glossary of Terms and Acronyms

Figure 1: Example of network cabling connections
Figure 2: Local area connections renamed "public" and "private"
Figure 3: Cluster Service Configuration Wizard: Create or join a cluster
Figure 4: Naming the cluster
Figure 5: Select an Account
Figure 6: The Add or Remove Managed Disks window
Figure 7: The cluster file storage screen
Figure 8: Assigning network role for public network connection
Figure 9: Assigning network role for private network connection
Figure 10: Setting network failover order
Figure 11: Cluster IP address screen
Figure 12: Cluster administrator
Figure 13: Create or join the second node in the cluster
Figure 14: Cluster name
Figure 15: Select an account for the cluster service
Figure 16: Cluster Administrator showing two nodes online
Figure 17: Cluster Administrator showing Active Groups for Node A
Figure 18: Cluster Administrator showing the moved group
Figure 19: Cluster Administrator showing contents of Disk 2 group
Figure 20: Changing the group name
Figure 21: Screen for naming the SQL Server virtual server
Figure 22: Entering IP address for the virtual server
Figure 23: After clicking "Add"

Table 1: Characteristics of Various RAID Levels
Table 2: SQL Server Database System Requirements for Shared Volumes
Table 3: Sample IP Addresses for Cluster Nodes and for WINS, DNS, and Gateway Servers
Table 4: Installation Requirements for various versions of Windows 2000

Section 1
Executive Summary

Microsoft® SQL Server™ failover clustering is designed to provide high server availability for customer databases. It does not provide non-stop operation; rather, it provides quick recovery from system failure. This quick recovery offers sufficient availability for most mission-critical applications.

The goal of this paper is to serve as a guide for Dell field technicians, sales persons, and customers on the steps necessary to design and configure a two-node SQL Server failover cluster system. The solution proposed in this paper involves the configuration of two equivalent Dell™ PowerEdge™ servers that are clustered using Microsoft® Cluster Server (MSCS) and that run SQL Server 2000 for failover clustering.

This paper explains the procedures necessary to plan, configure, install, and test a SQL Server 2000 failover cluster on Dell hardware. The following steps are included:

- Planning the system
- Elementary sizing
- Configuring the hardware for clustering
- Installing and configuring Windows® 2000
- Installing MSCS
- Installing SQL Server 2000 for failover

This paper provides specific instructions that are designed to lead to a successful installation. It is easy to miss small details that can cause the installation to fail. This paper highlights those details and explains why they are important to a successful installation.

Section 2
Introduction

Installing and configuring a cluster can be complicated and time consuming if not performed correctly. This document is designed to simplify the process and reduce errors by providing a systematic procedure for installing, configuring, and testing a SQL Server failover cluster. The instructions provided within this paper help to successfully design, configure, and install a SQL Server cluster.
Installing a SQL Server cluster is not a trivial task, and this paper is not intended to minimize the complexity of the task, but rather to assist in the execution of the installation. Attention to detail is very important, since small details can mean the difference between a successful and an unsuccessful installation. Where possible, the most important details are highlighted with a note.

Dell has performed extensive tests and developed specific solutions using a variety of different server systems and storage solutions. These have resulted in several Dell PowerEdge Cluster Solutions, which consist of the following models:

- SE100, SE200, SL200 SCSI solutions. These cluster solutions use Dell PERC RAID controllers. Platform Guides for these solutions may be downloaded from http://support.dell.com/us/en/docs/index.asp?cc=12&ct=89.
- FE100 or FL100 Fibre Channel solutions using SAN 3.0. These cluster solutions use PowerVault™ SAN 3.0 and the PowerVault 650F storage system. Platform Guides for these solutions may be downloaded from the same URL.
- FE100/FE200 or FL100/FL200 Fibre Channel solutions using SAN 4.0. These cluster solutions use PowerVault SAN 4.0 and the PowerVault 660F or 650F storage systems. Platform Guides for these solutions may be downloaded from the same URL.

The following software is installed:

- Microsoft® Windows® 2000 Advanced Server
- Service Pack 1 for Windows 2000 (may be installed by default with the Windows 2000 installation)
- Microsoft Windows 2000 Cluster Service
- SQL Server 2000 Enterprise Edition (automatically includes failover clustering setup)

The paper explains Windows Clustering and SQL Server failover clustering and why failover is used. It also covers system planning and system setup from the ground up, from hardware setup and configuration to software installation, for a SQL Server failover cluster system.

Section 3
Methodology, Products and Tools

This paper is designed to assist in the planning, configuration, and installation of Microsoft SQL Server 2000 with Microsoft Windows 2000 using Windows Clustering. The objective is to provide a straightforward, step-by-step tutorial that leads to a successful configuration. The configuration of this solution consists of several components and tools, which are listed in this section along with a description of what they are and how they work.

Windows Clustering

Windows Clustering is a service that is included with Microsoft Windows 2000 Advanced Server and Windows 2000 Datacenter Server. A primary purpose of clustering is to provide high server availability through failover between server nodes (if one server fails, the other takes over its functionality). This only works for cluster-aware applications, which provide the functionality that allows them to take advantage of the failover features provided by MSCS. In this paper, the cluster-aware application is SQL Server 2000. (Other types of cluster-aware applications include knowledge management software, ERP, and file and print services.) SQL Server 2000 works together with MSCS to provide failover clustering capabilities. See the next section, SQL Server 2000 Failover Clustering, for more details on how SQL Server failover works.
MSCS requires a shared disk storage system between the cluster nodes. It also requires disk space for a resource called the quorum disk. The quorum disk is used to store cluster configuration database checkpoints and log files that help manage the cluster. This is discussed further in the Planning the I/O subsystem section.

SQL Server 2000

SQL Server 2000 is a scalable Relational Database Management System (RDBMS). It is designed to store and retrieve large amounts of data quickly via the Structured Query Language (SQL). SQL Server 2000 supports both large and small databases, as well as large numbers of concurrent online users.

SQL Server 2000 includes features such as indexes, views, and stored procedures, and it also includes a replication option. New with SQL Server 2000 is the Federated Server option, which allows multiple SQL Server systems to form a single distributed database. Other options include Data Transformation Services (DTS) for moving data in and out of the SQL Server database, XML support, and a full-featured Online Analytical Processing (OLAP) option.

SQL Server 2000 Failover Clustering

SQL Server failover clustering provides high server availability: if one system fails or is purposely taken offline, SQL Server processing switches to a second, clustered system. This switch is known as a failover. Failover is designed to minimize system downtime. Once a failover occurs, the failed system may be restored and brought back online, and processing can then be switched back to the restored system; this is called failback.

Each system that is part of a cluster is called a node. With Windows 2000 Advanced Server a cluster can have up to two nodes, and with Windows 2000 Datacenter Server a cluster can have up to four nodes. The nodes in a cluster share a common set of cluster resources, such as disk drives. Each node is connected to a network and can communicate with the other nodes. Nodes send each other network messages called heartbeat messages. If the MSCS software detects the loss of a heartbeat from one of the nodes in the cluster, failover will occur for that node.

Note: Microsoft Windows 2000 Advanced Server supports up to two nodes in a failover cluster, while Windows 2000 Datacenter Server supports up to four nodes in a failover cluster.

When a failover occurs, the user connections to the failed node are lost, so users must log in again once the second clustered node takes over. The failover process takes a very short period of time, during which users will be waiting. Therefore, failover does not provide 100 percent uptime, but it comes close. The main use for failover is to provide high server availability in cases such as the following:

1. System failure: this includes SQL Server failures, power losses, and hardware failures that cause the system to hang or shut down.
2. Planned system downtime: for upgrades, system maintenance, and similar tasks. It is possible to manually take the first cluster node offline, causing failover to the second node. The second node continues processing user requests while the first node is upgraded, without causing downtime for users.

MSCS allows two main methods for failover clustering: active/active and active/passive.
(A one-node cluster is also an option, but it is not discussed here; see SQL Server Books Online, the online documentation that comes with the SQL Server installation, which can be started by clicking Start -> Programs -> Microsoft SQL Server -> Books Online.)

Active/Active Clustering

In active/active clustering, the server nodes in the cluster run their own workloads simultaneously. Each server in the cluster is available to do real work, or is "active," yet each server is also available to recover the resources and workload of any other server in the cluster. This configuration eliminates the need for an idle server that waits for a failure. However, one drawback to active/active clustering is the risk of overloading the node that takes over for the failed one. The possibility of overload exists because a single server must now perform its own work and that of the failed node.

Active/Passive Clustering

In active/passive clustering, one node in the cluster remains idle while the other node (or nodes, if running Windows 2000 Datacenter Server) is active. If an active node fails, the processing of cluster-aware applications (SQL Server) is switched to the passive node. The failed node can then be restored and the application can fail back to it, so that it becomes an active node again, leaving the passive node available for the next time failover is needed.

Note: This paper focuses on active/passive SQL Server failover clustering on a two-node cluster running Windows 2000 Advanced Server and MSCS.

Assumptions/Pre-Requisites

This document assumes that users:

- Are familiar with SQL Server and can install a regular (non-clustered, non-failover) SQL Server 2000 instance.
- Are familiar with the basics of network IP addressing and configuration.
- Can install Windows 2000 Advanced Server or Datacenter Server.
- Are familiar with setting up and configuring Dell PowerEdge servers and Dell PERC 2 RAID controllers. (For use with PERC 3 RAID controllers, please contact a local Dell representative.)

Section 4
Planning Considerations for SQL Server

Before the installation process begins, it is important to thoroughly plan the configuration, as this can help to avoid costly reconfiguration time. It is necessary to both size and plan the system from the beginning, because it can be very difficult to add resources to a cluster later on.

Planning Steps

Planning a SQL Server failover cluster involves the following steps:

- Choosing the platform
- Sizing the system
- Laying out the I/O subsystem
- Designing the network

Sizing the system is not the focus of this paper; however, a few sizing tips are provided. The I/O subsystem is critical since the shared disk is one of the key components of the failover cluster. Another key component is the set of TCP/IP network addresses the cluster requires. Planning these components ahead of time is critical to a successful cluster installation.

Choosing the Platform

The first determination is which PowerEdge solution (platform) is appropriate for the customer. The choice must come from the list of Dell certified configurations.
To find the certified solutions, see the following Dell documents:

- Dell PowerEdge Cluster SE100, SE200, and SL200 Platform Guide, for SCSI systems
- Dell PowerEdge Cluster FE100/FL100 Platform Guide, for Fibre Channel systems
- Dell PowerEdge Cluster FE100/FE200 and FL100/FL200 Platform Guide, for Fibre Channel systems using SAN 4.0

These documents can be downloaded from the Dell support web site at http://support.dell.com/us/en/docs/. They provide specifics on the minimum hardware requirements necessary to build a cluster for each solution. However, they do not contain information on how to configure disk drives, nor on how to install MSCS and SQL Server. Those topics are covered in this paper.

SCSI vs. Fibre Channel

Whether customers need a SCSI solution or a Fibre Channel solution depends on their current and future disk storage needs and preferences. Below are some characteristics of each type of solution that will help determine which is best.

Fibre Channel Characteristics

Fibre Channel provides high availability through redundant Fibre Channel host bus adapters (HBAs) in the Dell PowerEdge server and dual storage processors in the Dell PowerVault storage system. Fibre Channel also requires dual standby power supplies (SPS) in the Dell PowerVault storage system, which preserves the integrity of the storage processor write cache if one power supply fails. Write-caching may therefore be enabled, which improves disk write performance. With SCSI, write-caching may not be used with a failover cluster.

Fibre Channel allows for easy storage subsystem growth. It is relatively easy to add disks to the PowerVault 650F storage system without having to shut down the system. (For use with the PowerVault 660F, please contact Dell.) A directly connected Fibre Channel solution supports up to 120 disks on one PowerVault 650F. For a Storage Area Network (SAN) solution, it is possible to have up to 480 disks on one cluster connecting through a Fibre Channel switch. The same switch can also support additional stand-alone systems and clusters. Although Fibre Channel systems are typically more expensive than SCSI, their expandability and flexibility are much greater.

SCSI Storage Characteristics

The SCSI solutions (SE100, SE200, SL200) allow at most two PERC 2 RAID controllers for the clustered disks. This means that a maximum of 24 disk drives for the SE100, and a maximum of 48 disk drives for the SE200 and SL200, can be configured as the cluster's shared disks. If that amount of storage is not sufficient for the customer's current and future growth needs, then Fibre Channel solutions should be considered.

Fibre Channel systems are more expensive than SCSI, so if the customer only needs a small number of disks and does not require the high availability provided by Fibre Channel, then a SCSI solution may be appropriate and will be less expensive. Again, consider the customer's expected growth.

Sizing the system

With either a SCSI solution or a Fibre Channel solution, it is important to properly size the system before beginning the configuration and installation. It is very difficult to add storage to a cluster system once MSCS and SQL Server have been installed and configured, because clustered storage cannot be easily modified while the cluster is up and running.
With the SCSI solution, several steps require both nodes in the cluster to be shut down or rebooted when adding storage. This requires significant downtime in a system that is designed for maximum uptime.

When sizing the system, it is important to keep in mind not only the storage capacity that is needed, but also the performance levels that must be maintained. A single disk drive can handle approximately 85 I/Os per second before latencies increase and performance begins to degrade. If a SCSI disk subsystem is chosen and a shared volume is configured as RAID 5, the RAID 5 parity will lead to a significant number of extra I/Os during writes as well as increased write latencies. Table 1 provides a review of RAID levels and the additional physical I/Os they incur.

Table 1: Characteristics of Various RAID Levels

  RAID Level     Fault Tolerance   Logical Reads   Physical I/Os per Read   Logical Writes   Physical I/Os per Write
  RAID 0         None              1               1                        1                1
  RAID 1 or 10   Best              1               1                        1                2 (writes)
  RAID 5         Moderate          1               1                        1                4 (2 reads, 2 writes)

Note: Because of the additional overhead incurred by RAID 5 writes, Dell never recommends that RAID 5 be used in a SQL Server database configuration. This is especially true of the volume(s) containing the transaction log files. It is acceptable to use RAID 5 on volumes used exclusively for read-only data.

The I/O subsystem should be sized accordingly in order to keep the number of physical I/Os within acceptable limits. An I/O subsystem that is pushed beyond the physical limitations of its disk drives will experience high latencies that in turn severely affect system performance.
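To make this sizing arithmetic concrete, here is a worked example. The workload figures are illustrative only, not measurements from the test configuration: assume a database that must sustain 1,200 logical reads and 400 logical writes per second on its data volume. Using the physical I/O counts from Table 1 and the 85 I/Os-per-second-per-disk guideline above:

  RAID 10:  (1,200 reads x 1) + (400 writes x 2) = 2,000 physical I/Os per second
            2,000 / 85 = approximately 24 disks
  RAID 5:   (1,200 reads x 1) + (400 writes x 4) = 2,800 physical I/Os per second
            2,800 / 85 = approximately 33 disks

The same logical workload requires roughly nine more spindles on RAID 5 than on RAID 10, which is one reason the note above steers SQL Server data and transaction log volumes away from RAID 5.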
Note: Write-caching (write back) should not be enabled on the PERC 2 controller used for the shared cluster disks, or the data in the cache will be lost during a failover. In addition, users should not use software RAID with failover clustering; hardware RAID must be used.

Planning the I/O subsystem

In addition to carefully sizing the I/O subsystem, it must be planned with the cluster functionality in mind. Creating the volumes required by the cluster as well as the volumes for a well-designed SQL Server database system means that several shared volumes must be created. Table 2 describes some of the guidelines.

Table 2: SQL Server Database System Requirements for Shared Volumes

  Quorum drive (RAID 1): Used for the cluster quorum; it should not be used for any other purpose.
  SQL Server transaction log files (RAID 1): The transaction log files should be on their own mirrored volume, for both performance and database protection.
  Database data files (RAID 10): The database data files should be placed on a RAID 10 volume. The number of drives will vary depending on performance requirements.

It is important to plan this out in advance in order to avoid costly reconfiguration time later.

PERC 2 Controller Disk Array Limitation

The PERC 2/SC and PERC 2/DC have some limitations in the RAID configuration utility. A maximum of eight disk arrays can be configured per controller (not per channel). When using RAID 10, each set of two disks must first be configured as its own RAID 1 disk array, which makes it possible to span these arrays to create RAID 10. Here lies the problem: to create a RAID 10 array out of a total of 20 disks (10 disks mirrored and striped with 10 other disks), it would be necessary to create ten RAID 1 disk arrays and span them, but this cannot be done because of the eight-array maximum. Using eight mirrored pairs instead would consume all eight arrays and leave four disks unconfigured. Since all available disks should be configured, there is a workaround. To start, create RAID 10 volume(s) out of seven sets of mirrored pairs (14 disks), which uses up seven of the available disk arrays. Then use the eighth disk array to create a RAID 5 volume out of the remaining six disks. The RAID 5 volume can be used for the quorum disk or for storing other files.

Note: In addition, it is very important that the PERC 2/SC or PERC 2/DC controllers in both nodes be running the latest version of the device driver and the latest version of the firmware.

Planning the Network

At least two network cards are required in each of the cluster nodes, that is, four NICs for the two-node cluster. One NIC is for the public LAN and one is for the private, node-to-node cluster interconnect. The public network is for communication with clients and domain controllers. The private network is for node-to-node communication, cluster status signals, and cluster management. Having two cards provides fault tolerance (redundancy) for cluster node-to-node communications in case of a failure on one of the networks. The private network can be a crossover cable. See Figure 1 for a diagram of the network.

Figure 1: Example of two-node cluster network cabling connections

It is necessary to provide static IP addresses for each of the four NICs, one IP address for the Cluster Service (discussed later), and another for the SQL Server virtual server, for a total of six static IP addresses for the two-node cluster. The public NIC must be on a different subnet from the private NIC. Table 3 shows an example of IP addresses for the cluster nodes and for the WINS, DNS, and gateway servers.

Note: With the exception of the private network IP addresses (10.10.10.1 and 10.10.10.2), the IP addresses given below are examples only and may not be valid on another network. Actual IP addresses will depend on the individual network.

Table 3: Sample IP Addresses for Cluster Nodes and for WINS, DNS, and Gateway Servers

  Public network (to clients):        Node A 192.168.1.1, Node B 192.168.1.2
  Public network subnet mask:         255.255.255.0
  Private network (node-to-node):     Node A 10.10.10.1, Node B 10.10.10.2 (users may use these values)
  Private network subnet mask:        255.255.255.0
  Cluster Service:                    192.168.1.3
  WINS servers:                       Primary 192.168.1.11, Secondary 192.168.1.12
  DNS servers:                        Primary 192.168.1.21, Secondary 192.168.1.22
  Default gateway:                    192.168.1.31
  SQL Server virtual server:          192.168.1.4
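Once the NICs have been assigned these addresses (during the Windows 2000 installation described later), a quick command-prompt check from each node can confirm that the address plan was entered correctly. This is an optional verification sketch using standard Windows tools and the sample addresses from Table 3; substitute the real addresses for the target network:

  >ipconfig /all        (confirm each adapter's static address and subnet mask, and that public and private are on different subnets)
  >ping 10.10.10.2      (from Node A, test the private interconnect to Node B)
  >ping 192.168.1.2     (from Node A, test the public connection to Node B)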
Limitations

SQL Server 2000 and Microsoft Cluster Services provide a quick resumption of service in the event of a system hardware or software failure. MSCS does not provide continuous uptime. In the event of a failure, the node (and SQL Server) is essentially shut down and the database is started on the standby node. All users are disconnected and must reconnect, which constitutes a loss of service. What SQL Server and MSCS provide is a quick way to get running again. In addition, since the same data files are used on both nodes, SQL Server and MSCS will not survive the loss of data that occurs because of an I/O failure or because of data corruption due to software or user error.

There are several cases where SQL Server and MSCS will not be able to recover the system to a functional state, as described below:

- Disk subsystem failure. SQL Server and MSCS will restart the SQL Server instance on another node, but using the same shared disk subsystem. A failure of the disk subsystem will cause all nodes in the cluster to fail.
- Data corruption. If an OS or SQL Server program failure corrupts the database, clustering does not help, since there is only one database. It will still be corrupt even if a second cluster node starts SQL Server.
- User error. A user error that drops a table or database cannot be recovered by using clustering.

These cases can only be recovered from by using a database backup, which highlights the importance of backups whether using a clustered server or not.
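Because backups are the only protection in these cases, it is worth noting how they fit into the clustered configuration. Once the clustered SQL Server instance described later in this guide is running, backups are taken exactly as on a non-clustered server, for example with the osql utility that ships with SQL Server 2000. The following sketch is illustrative only: the virtual server name SQLVS1, the database name salesdb, and the backup path are placeholders, not values from the test configuration:

  >osql -S SQLVS1 -E -Q "BACKUP DATABASE salesdb TO DISK = 'D:\Backup\salesdb_full.bak'"
  >osql -S SQLVS1 -E -Q "BACKUP LOG salesdb TO DISK = 'D:\Backup\salesdb_log.trn'"

The -S option names the SQL Server virtual server, -E uses Windows authentication, and -Q runs the statement and exits. Schedule full database and transaction log backups regularly, as the Best Practices section reiterates.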
Description of Environment

The testing environment used for this guide was the SE200 Dell hardware platform cluster solution. This guide describes the back-end database system setup only; it does not discuss middle-tier or front-end client systems.

Detailed Description of System

The following hardware and software was used for the Dell SE200 solution.

Hardware:

- 2 Dell PowerEdge 6450 servers (the two cluster nodes)
- 4 700 MHz Pentium® III Xeon™ processors per node
- 4 GB RAM per node
- 3 PERC 2/DC controllers in each node (a total of 6 controllers); one PERC 2/DC was used for the internal disks and the other two were used for the external cluster disks
- 2 PowerVault 210S storage units (SCSI storage)
- 20 disk drives (10 in each PowerVault 210S)
- 2 Ethernet NICs (one added per node, in addition to the embedded NIC, for a total of four NICs); the added NIC is the Intel® Pro/100+ Server Adapter and the embedded NIC is the Intel 8255x-based PCI Ethernet Adapter (10/100)
- Domain name server connected to the public network
- A network switch connecting the nodes with the domain name server

Software:

- Windows 2000 Advanced Server with Service Pack 1 (5.0.2195 SP1 build 2195)
- PERC 2/DC device driver version 2.62.0.0
- SQL Server 2000 Enterprise Edition (build 8.00.194)

Section 5
Building the Base Cluster

In order to configure SQL Server failover clustering on Windows 2000, several installation phases must be completed. These phases consist of the hardware installation and configuration (discussed earlier), the OS and MSCS installation and configuration, and the SQL Server with failover clustering installation. Each phase must be completed correctly in order for failover to function properly.

Hardware Installation and Configuration

Dell has developed several documents that provide a step-by-step tutorial on configuring the cluster hardware. The detailed steps for installing and configuring the cluster hardware are provided in one or more of these documents, which can be downloaded from the Dell support web site at http://support.dell.com/us/en/docs/:

- Dell™ PowerEdge™ Cluster FE100/FL100 Installation and Troubleshooting Guide
- Dell™ PowerEdge™ Cluster FE100/FE200 and FL100/FL200 Systems Using SAN 4.0 Installation and Troubleshooting Guide
- Dell™ PowerEdge™ Cluster SE100, SE200, and SL200 Systems Installation and Troubleshooting Guide

For completeness, a summary of those steps is given here.

1. Choose and size the platform that is appropriate for the system. The platform involves two servers and a shared storage subsystem of your choice (SCSI or Fibre Channel).

2. Once a platform is chosen, follow the information in the previous section, Planning Considerations for SQL Server, to ensure that all the necessary hardware is on hand.

3. If the PowerVault 2xxS does not have two ESEMs or two SEMMs, add them now.

4. Verify that the PowerVault 2xxS is set to forced join mode.

5. Cable the system hardware for clustering.

Note: Start up the systems and perform step 6 one node at a time.

6. Enable cluster mode in the BIOS configuration utility for each controller attached to a shared storage system (first on Node A, then on Node B).

Note: To perform step 6, once in the BIOS configuration utility (press <Ctrl+M> during boot), select Objects -> Adapter -> Adapter 0 or 1 (the one that has shared storage). Then highlight "Cluster Mode=DISABLED" and press <Enter>. Select ENABLE and press <Enter>, then select YES to change the mode. "Cluster Mode=ENABLED" should appear. Turn off Node B now.

7. On Node A only, change the SCSI IDs on the controller(s) attached to the shared storage through the BIOS configuration utility.

Note: To do this, in the same screen as above, highlight "INITIATOR ID=X" (where X represents some number) and change this ID to one that is not being used, such as 6. Do this for both adapters in Node A if two are being used for shared storage.

8. Reboot Node A and press <Ctrl+M> to enter the BIOS configuration utility again; when the warning appears, continue, and configure and initialize the RAID disk arrays. Start Node B and also configure and initialize the disk arrays in the same way it was done for Node A. Do not perform steps 7 and 8 on Node B.

Once the chosen hardware is configured for clustering as directed in the Dell documents mentioned above, continue with the installation of Windows 2000.

Windows 2000 Installation

It is important that the correct version of Windows 2000 is installed in order to support clustering. Use Table 4 to help determine which version of Windows 2000 is required.

Table 4: Installation Requirements for various versions of Windows 2000

  Windows 2000 Version   Hardware Supported        Clustering Supported
  Professional           1-2 CPUs, 4 GB of RAM     No
  Server                 1-4 CPUs, 4 GB of RAM     No
  Advanced Server        1-8 CPUs, 8 GB of RAM     2-node failover clustering
  Datacenter Server      1-32 CPUs, 64 GB of RAM   4-node failover clustering

Along with Windows 2000 Advanced Server (or Datacenter Server), the MSCS Cluster Service must be installed and in operation before SQL Server 2000 with failover clustering can be installed. For the two-node cluster solution described in this guide, install Windows 2000 Advanced Server on each node. Two things to know before installing the OS:

1. Both of the cluster nodes must belong to a common domain. Make sure that the IP address chosen for the public network can communicate with the chosen domain, and note the domain name for the installation.

2. When the Windows 2000 Components applet appears during the installation process, check the box next to "Cluster Service" as a component to be installed. This copies the files MSCS requires to the system. The service can also be installed later by running the Add/Remove Components Control Panel applet.

If the Cluster Service is installed as part of the installation, a window called "Finish Setup" will appear once the installation is complete and the machine reboots. Clicking "Finish Setup" starts the Cluster Configuration Wizard. However, it is necessary to create disk partitions before this wizard is run.
Cancel out of the Finish Setup window and run the Cluster Configuration Wizard later. See the section Cluster Configuration Wizard in this paper for details.

If network IP addresses were not configured during installation, do so now. Go to the desktop and right-click the My Network Places icon. Select Properties and rename the "Local Area Connection" entries to more meaningful labels, such as "public" and "private," referring to the network for public connections and the network for the private cluster interconnect. An example of renamed network connections is shown in Figure 2.

Figure 2: Local area connections renamed "public" and "private"

In addition, it is necessary to disable NetBIOS for all private networks on both nodes. To do this:

1. Right-click on the private network connection shown in Figure 2 and select Properties.
2. Highlight Internet Protocol (TCP/IP) and click the Properties button.
3. Click the Advanced button.
4. Go to the WINS tab.
5. Select Disable NetBIOS over TCP/IP.
6. Click OK, OK, and OK to finish.

Install Windows 2000 Service Pack (SP)1 (or later)

Once Windows 2000 has been installed, it is necessary to install Windows 2000 SP1 on each node in the cluster. The SP1 (or later) installation is very simple and is not covered here.

Creating Windows 2000 Disk Partitions

Once the operating system is installed and all network IP addresses are configured, create Windows 2000 disk partitions on the RAID disks that were configured earlier. Go to the Computer Management/Disk Management window. Here are some rules about disk partitions for a cluster:

- Disks must be configured as Basic, not Dynamic.
- Disks must be formatted using NTFS.
- Disk partitions and drive letters for the shared cluster disks must be exactly the same on Node A and Node B.

Following Dell's recommendation, create the three disk partitions discussed earlier (one for database data, one for database logs, and one for the quorum disk) and assign drive letters such as F:, G:, and H: to them. Create the partitions and assign drive letters on Node A first, then make the exact same configuration on Node B. The partitions may already appear on Node B before they are created there; this is okay and is an effect of cluster mode. See pages 5-15 of the Dell Installation and Troubleshooting Guide for the SE100, SE200, and SL200 platforms for step-by-step details of creating disk partitions.

Setting up a Domain User Account for the Cluster

The Cluster Service requires a domain user account under which it will run. It is necessary to create this user before configuring the cluster in the next section. This account should not belong to another user on the domain; it should exist specifically for the Cluster Service. To create the user on the domain controller, follow these steps:

1. Click Start -> Programs -> Administrative Tools -> Active Directory Users and Computers.
2. Click the + sign to expand the domain.
3. Click Users.
4. Right-click Users, point to New, and click User.
5. Type in the cluster user name information (in this example the logon name is "cluster").
6. Click Next.
7. Set the password settings to User Cannot Change Password and Password Never Expires.
8. Click Next and Finish to create the user.
9. Right-click the user name (cluster) in the left pane and select Properties.
10. Click Add Members to a Group.
11. Click Administrators and click OK.

Now there is a domain user account with administrator privileges for the Cluster Service.
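For administrators who prefer the command line, roughly the same account can be created from a command prompt on the domain controller. This is an optional sketch, not the procedure used in the lab: the domain name SOLENG matches the example in Figure 5, the password-policy options from step 7 still have to be set in Active Directory Users and Computers, and the second command grants local Administrators membership on each cluster node rather than using the domain Administrators group as in step 11, so adjust it to the site's security policy:

  >net user cluster * /add /domain                       (run on the domain controller; prompts for a password)
  >net localgroup Administrators SOLENG\cluster /add     (run on each cluster node)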
Cluster Configuration Wizard

Once the disk partitions have been created for both nodes, it is possible to run the Cluster Configuration Wizard to set up MSCS. Perform the following steps to run the Cluster Configuration Wizard on Node A, which will be the first node in the cluster. (Follow these same steps to start the wizard on Node B later.) There are two ways to run the Cluster Configuration Wizard:

1. If the Cluster Service was installed when Windows 2000 was installed, go to Start -> Programs -> Administrative Tools -> Configure Your Server and click Advanced, then Cluster Service.

2. If the Cluster Service has not been installed yet, follow these steps to install the service and run the wizard:
   a. Go to Start -> Settings -> Control Panel.
   b. Double-click Add/Remove Programs.
   c. Double-click Add/Remove Windows Components.
   d. Check the box next to Cluster Service and click Next.
   e. Insert the Windows 2000 installation CD-ROM (the Cluster Service files are on it) and type Z:\i386 for the path (where Z: is the CD-ROM drive letter).
   f. Click OK.
   g. Click Next to begin the Cluster Configuration Wizard.

The wizard starts with either a Welcome window or a Hardware Configuration window, depending on how it was started. After clicking Next in those windows, the "Create or Join a Cluster" window appears, as shown in Figure 3. Since this is the first node being configured, select "The first node in the cluster".

Figure 3: Cluster Service Configuration Wizard: Create or join a cluster

Next comes the "Cluster Name" window. Enter a name for the cluster. In this instance the cluster name is CLUSTER6450, as shown in Figure 4.

Figure 4: Naming the cluster

Next the Select an Account window appears. Enter the user name and password created earlier on the domain controller for the Cluster Service to use. In Figure 5, the user "cluster" does not have a password on the domain SOLENG.

Figure 5: Select an Account

Next the "Add or Remove Managed Disks" window appears. By default, all the disks on the shared SCSI bus appear in the Managed Disks column. In this case that is expected, because all shared disks are to be managed by the cluster. See Figure 6. If there are shared disks that should not be managed by the cluster, highlight each one and click Remove.

Figure 6: The Add or Remove Managed Disks window

Next the Cluster File Storage window appears. Enter the disk that specifies where the quorum files will be located. In this case it is the H: drive, as shown in Figure 7.

Figure 7: The cluster file storage screen

After this comes the "Configuring Cluster Networks" window. Simply click Next to go to the "Network Connections" window. Here users are presented with their networks, one at a time, in no particular order. The public network is first in the example shown in Figure 8. Select the option "All Communications (mixed network)" for the public network. (This example shows "public" and "private" for the network names because they were labeled as such earlier.)

Figure 8: Assigning network role for public network connection

Next, the private network should appear for configuration.
Choose the option "Internal cluster communications only (private network)" for this network, as shown in Figure 9.

Figure 9: Assigning network role for private network connection

Next the "Internal Cluster Communications" window appears. Here, set the priority, or order, in which the networks should be used for node-to-node cluster communications, as shown in Figure 10. Always put the private network first and the public network second. If the private network were to fail, cluster communications would then switch to the public network.

Figure 10: Setting network failover order

Now the "Cluster IP Address" window appears. Enter the virtual IP address for the cluster to use and the subnet mask, as shown in Figure 11. The actual IP address will not be the same as in this example. This address must be different from the physical private and public network card addresses, and it should be on the same subnet as the public network. The wizard should automatically detect the network as public.

Figure 11: Cluster IP address screen

Now click Finish to complete the cluster configuration on the first node. The Cluster Service will be started on the first node. To verify that the cluster is installed and configured properly, go to Start -> Programs -> Administrative Tools -> Cluster Administrator. A window similar to Figure 12 should appear.

Figure 12: Cluster administrator

Now it is possible to configure the second node in the cluster. This configuration is much shorter because the network settings are based on the configuration of the first node. Both nodes should still be running at this point. Follow the steps at the beginning of this section to start the Cluster Configuration Wizard on the second node. Once at the "Create or Join a Cluster" window of the wizard, choose "The second or next node in the cluster", as shown in Figure 13.

Figure 13: Create or join the second node in the cluster

Now the Cluster Name window appears. Enter the name created for the cluster during the first node's configuration, as shown in Figure 14. It is not necessary to enter the user name at this point; in the next step, the wizard automatically supplies the account selected during configuration of the first node.

Figure 14: Cluster name

Next the "Select an Account" window appears with the account name automatically entered. Enter the password if there is one. Figure 15 shows the account "cluster" with no password.

Figure 15: Select an account for the cluster service

Click Next and Finish to complete the configuration. The Cluster Service will now start.

Verifying the Cluster Setup

To verify that the cluster is functioning correctly, open Cluster Administrator on either node by going to Start -> Programs -> Administrative Tools -> Cluster Administrator. Both nodes should appear as shown in Figure 16 (MSFT6450A and MSFT6450B).

Figure 16: Cluster Administrator showing two nodes online

Now select the "Active Groups" folder under the first node, as shown in Figure 17, to see the currently active groups for that node. If they do not appear under Node A, check Node B.

Figure 17: Cluster Administrator showing Active Groups for Node A

To test the cluster, right-click on the Cluster Group name in the right pane and select Move Group from the menu. This causes the group to fail over to the second node. The group should disappear from here and reappear under the "Active Groups" folder of the second node, as shown in Figure 18.

Figure 18: Cluster Administrator showing the moved group

If this worked on the cluster, then the cluster is fully operational.
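The same checks can also be made from a command prompt with the cluster.exe utility that is installed with the Cluster Service. The commands below are a sketch using the node names from Figure 16 and the default group name; if the syntax differs on a given service pack level, cluster /? lists the supported options:

  >cluster node /status                                  (both nodes should be reported as Up)
  >cluster group /status                                 (shows each group and the node that currently owns it)
  >cluster group "Cluster Group" /moveto:MSFT6450B       (command-line equivalent of Move Group)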
The next topic to address is how to create the Log and Data group that will contain both disks F: and G:. For more information about cluster installation, see the Microsoft online document Step by Step Guide to Installing Cluster Service at www.microsoft.com/windows2000/library/planning/server/clustersteps.asp.

Managing Disks in Cluster Administrator

For the purposes of this SQL Server installation, it is necessary to group the data and log disks, F: and G:, together so they can be selected as one group for SQL Server failover. If the system is installed as shown here, with one quorum disk, one data disk, and one log disk, then there will be three groups in the Groups folder of Cluster Administrator:

  H:  Cluster Group
  F:  Disk 1
  G:  Disk 2

When installing SQL Server, one cluster group must be selected for the SQL Server database files (data and log) and another group selected as the quorum disk. So instead of having Disk 1 and Disk 2 hold F: and G: separately, F: and G: will be joined into one group called Log and Data. (It is possible to group these disks after the SQL Server installation as well, but it is clearer to do it now for this demonstration.) To do this:

1. Click on the group Disk 2. Its disk G: appears in the right pane, as shown in Figure 19.

Figure 19: Cluster Administrator showing contents of Disk 2 group

2. Right-click on the disk (G:) in the right pane and select Change Group.
3. Choose Disk 1 as the group into which this disk should be moved.
4. Click Yes and Yes again to move the disk to group Disk 1.
5. Right-click on Disk 2 in the left pane and select Delete to remove this now-empty group.
6. Both disks F: and G: should now appear under Disk 1.
7. Right-click on Disk 1 and select Rename to change the group name to Log and Data (or whatever name the user chooses). Now there should be two groups, as shown in Figure 20.

Figure 20: Changing the group name

Configuring MS DTC for the Cluster

MS DTC (Microsoft Distributed Transaction Coordinator) is required in a cluster for distributed queries and two-phase commit transactions. The comclust.exe program must be run on each node to configure MS DTC to run in clustered mode. See the topic "Failover Clustering Dependencies" in SQL Server Books Online for more details on the use of MS DTC with clusters.

In the Cluster Administrator window, click on the Cluster Group in the left pane. If MS DTC does not show up as a resource in the right pane (most likely it will not), configure MS DTC for clustering by running the following command at a command prompt, first on Node A, then on Node B:

  >Comclust.exe

Make sure to do this on both nodes, one at a time. Now MS DTC should appear as a resource in the Cluster Group.
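If it is more convenient, the same check can be made from a command prompt. This is an optional sketch using the cluster.exe utility mentioned earlier; the exact display name of the MS DTC resource can vary, so look for it in the output rather than querying it by name:

  >cluster resource /status      (the list should now include an MS DTC resource in the Cluster Group)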
Installation of SQL Server 2000 can now begin.

Section 6
Adding SQL Server 2000 to the Cluster

After installing MSCS, running the Cluster Configuration Wizard, and testing cluster functionality, it is time to run the SQL Server 2000 Setup program on the primary node. It is not necessary to install SQL Server manually on the other node; this is done automatically when Setup is run on the primary node, as long as the other node is powered on and available in the cluster. So have both cluster nodes turned on during the SQL Server Setup program. (Remember that it is possible to have a three-node or four-node failover cluster with Windows 2000 Datacenter Server, in which case the rules are the same: SQL Server binaries are automatically installed locally on all nodes in the cluster.)

The Setup program detects that the system is a cluster and asks several questions concerning cluster and virtual server information at the beginning of the install process. Then the normal SQL Server installation process continues. Once all the information is entered, the Setup program automatically installs a new, separate instance of the SQL Server binaries on the local disk of each server in the cluster. The binaries are installed in exactly the same path on each cluster node, so it is important to ensure that each node has a local drive letter in common with all the other nodes in the cluster, so that the same path (such as C:\SQL2000) exists on every node. The Setup program also installs the system databases on the specified cluster (shared) disk. System databases must be on a clustered disk so that they can be shared between the nodes (and failed over when necessary), because these databases contain user login and database object information that must be the same for each node.

Note: In SQL Server 2000, only the databases fail over during a failover. In SQL Server version 6.5 and SQL Server version 7.0, both the SQL Server databases and the binaries fail over.

What follows are instructions for the cluster-related screens that appear when installing SQL Server on the primary node in a cluster (with the other node online too). These screens appear near the beginning of the installation.

The first cluster screen is the Computer Name screen, as shown in Figure 21. Enter a name for the SQL Server virtual server. This will be the name of the SQL Server instance and the server name that users log in to, so that whether Node A or Node B is actually online, the virtual server name will give users access to the online node.

Figure 21: Screen for naming the SQL Server virtual server

Next the Failover Clustering screen appears. Enter the virtual server IP address here. This is a unique address, different from the Cluster Service address. Make sure that the Network to Use is the public network. See Figure 22.

Figure 22: Entering IP address for the virtual server

Click Add to add this address to the list box, as shown in Figure 23, then click Next to go to the next screen.

Figure 23: After clicking "Add"

After clicking Next at this screen, continue with the regular SQL Server installation screens. When the program has copied all of the files, a message appears stating that the installation completed successfully.
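At this point it is worth confirming that clients can reach the new clustered instance through its virtual server name. A simple check is to run the osql utility installed with SQL Server 2000 from any machine on the public network. The virtual server name SQLVS1 below is a placeholder; use the name that was entered on the Computer Name screen (Figure 21):

  >osql -S SQLVS1 -E -Q "SELECT @@SERVERNAME"

With a Windows-authenticated connection (-E), the query should return the virtual server name regardless of which node currently owns the SQL Server group; the same command can be rerun after a test failover to confirm that clients reconnect through the same name.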
Section 7
Best Practices, Key Findings and Summary

SQL Server 2000 in conjunction with Windows Clustering forms a Microsoft SQL Server failover cluster. A failover cluster allows SQL Server (a cluster-aware application) to migrate from one system, or node, to a standby node. This increases the availability of the total system by allowing quick recovery in the case of a SQL Server failure on one node. In addition, if a hardware problem is at fault, the standby node takes over and continues functioning until the primary node can be repaired.

Even when using SQL Server, Windows Clustering, and RAID I/O subsystems, it is still critical to perform effective SQL Server database backups. If there is data corruption because of a disk failure, or if a user inadvertently deletes a table or database that must be retrieved, only database backups will provide recovery. Always perform regular database and transaction log backups; this is an important part of database maintenance.

Attention to detail is important in configuring a SQL Server cluster. Some of the important details are summarized here:

- Install the latest version of the Windows 2000 Service Pack before clustering the system. It is important that both nodes of the cluster are brought up to the latest revision before clustering occurs. Use Windows 2000 SP1 or the latest service pack.
- Properly size and plan the system before beginning to configure it. Once the system is configured and running, it is more difficult to add storage and reconfigure.
- Plan the network carefully. Since a SQL Server cluster requires multiple IP addresses, these must be planned in advance.
- Take notes during the installation and configuration. Careful note taking can help in reproducing the configuration as well as in determining what went wrong in the event of a failure.
- Document recovery procedures. It may be a long time from when the initial configuration is built to when it needs to be modified, and documentation is very valuable at that point.

This paper has described the functionality and usefulness of SQL Server 2000 and Microsoft Cluster Server, which together form a failover cluster, and has explained how to plan, configure, and install a SQL Server failover cluster. Using a SQL Server failover cluster can help reduce system downtime by restoring the system to a running state as soon as possible.

Section 8
Contacts

For questions about this paper or the implementation of SQL Server with Dell PowerEdge servers, please contact your Dell sales representative. For comments and feedback, please mail us_solutions_engineering@dell.com.

Solution Engineering
Enterprise Systems Group
Dell Computer Corporation
One Dell Way
Round Rock, Texas USA 78682
+1-(800) WWW-DELL (999-3355) or +1-(512) 338-4400
www.dell.com

Dell, PowerEdge, and PowerVault are trademarks of Dell Computer Corporation. Microsoft, Windows, and SQL Server are registered trademarks or trademarks of Microsoft Corporation. Intel, Pentium, and Xeon are registered trademarks or trademarks of Intel Corporation. Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Dell disclaims proprietary interest in the marks and names of others.

© Copyright 2001 Dell Computer Corporation. All rights reserved. Reproduction in any manner whatsoever without the express written permission of Dell Computer Corporation is strictly forbidden. For more information, contact Dell. Dell cannot be responsible for errors in typography or photography.
Information in this document is subject to change without notice.

Section 9
Appendix 1: Reference Documents

The following documents can be found online at http://support.dell.com/us/en/docs/index.asp?cc=12&ct=89:

- "Dell PowerEdge Cluster SE100, SE200, and SL200 Platform Guide" (for SCSI systems)
- "Dell PowerEdge Cluster FE100/FL100 Platform Guide" (for Fibre Channel systems)
- "Dell PowerEdge Cluster SE100, SE200, and SL200 Installation and Troubleshooting Guide"
- "Dell PowerEdge Cluster FE100/FL100 Installation and Troubleshooting Guide"

The following document can be found at www.microsoft.com/windows2000/library/planning/server/clustersteps.asp:

- "Step by Step Guide to Installing Cluster Service"

Section 10
Appendix 2: Vendor Information

Dell Computer Corporation

Dell Computer Corporation (NASDAQ: DELL) is the world's leading direct computer systems company, based on revenues of $32 billion for the past four quarters, and is a premier provider of products and services required for customers to build their Internet infrastructures. The company ranks No. 48 on the Fortune 500, No. 154 on the Fortune Global 500, and No. 7 on the Fortune "global most admired" lists of companies. Dell designs, manufactures, and customizes products and services to customer requirements, and offers an extensive selection of software and peripherals. Dell is also the No. 2 provider of Intel-based servers in the world (source: IDC). Information on Dell and its products can be obtained at www.dell.com.

Microsoft Corporation

Founded in 1975, Microsoft (NASDAQ: MSFT) is a worldwide leader in software, services, and Internet technologies for personal and business computing. The company offers a wide range of products and services, including Microsoft Windows 2000, IIS, and SQL Server 2000, which were used in the lab testing performed for this paper. For more information about Microsoft, see www.microsoft.com.

Intel Corporation

Intel (NASDAQ: INTC) is the world's largest chipmaker, a leading manufacturer of computer, networking, and communications products, and a premier supplier of building blocks for the worldwide Internet economy. Servers based on Intel processors deliver cost-effective performance and scalability for business applications. All Dell machines used for the testing described in this paper used Intel processors. Additional information about Intel is available at www.intel.com/pressroom.

Section 11
Appendix 3: Glossary of Terms and Acronyms

Cluster-aware: A cluster-aware application provides functionality that allows it to take advantage of the failover features provided by clustering.
Failback: Once a failover occurs, the failed system is restored and brought back online, and processing is resumed by switching back to the restored system.
Failover: A way to attain high server availability. For example, if one system fails or is purposely taken offline, an application can switch to a second, clustered system; this switch is known as a failover. Failover minimizes system downtime.
HBA: Host bus adapter.
MSCS: Microsoft Cluster Server.
NIC: Network Interface Card.
Quorum disk: Disk space used to store cluster configuration database checkpoints and log files that help manage the cluster.
RAID: Redundant Array of Inexpensive Disks.
Subnet: The part of the IP address that includes the network prefix and subnet number, but not the host number.