High Availability and Scalability Technologies
An Oracle9i RAC Solution
Presented by: Arquimedes Smith

Oracle9i RAC Architecture

Real Application Clusters (RAC) is a powerful new feature of the Oracle9i Database that can greatly enhance an application's scalability and availability. Oracle9i RAC is an Oracle database that has two or more instances accessing a shared database via cluster technology. A cluster is a group of machines (or nodes) that work together to perform the same task. To support this architecture, two or more machines that host the database instances are linked by a high-speed interconnect to form the cluster. The interconnect is a physical network used as a means of communication between the nodes of the cluster.

Objectives:
Identify the components of hardware clusters and the Oracle9i Real Application Clusters architecture.
Identify the functions of the Global Cache Service, the Global Enqueue Service, and the Global Resource Directory.
Describe the concepts and architecture of Cache Fusion.
Define when and how dynamic resource remastering occurs.
Explain database recovery for Cache Fusion.

Real Application Clusters Concepts

Oracle9i Real Application Clusters allow multiple instances to execute against the same database. The typical installation involves a cluster of servers with access to the same disks. The nodes that actually run instances form a subset of the cluster. The cluster nodes are connected by an interconnect that allows them to share disks and run the Cluster Group Services (CGS), the Global Cache Service, and the Global Enqueue Service.

A node is defined as the collection of processors, shared memory, and disks that runs an instance. A node may have more than one CPU, in either an SMP or a NUMA configuration. The node monitor is part of the vendor-provided cluster management software (CMS) that monitors the health of processes running in the cluster; it is used by CGS to control the membership of instances in Real Application Clusters. A node-to-instance mapping defines which instances run on which nodes. For example, it can specify that instance RACA runs on host 1 and instance RACB runs on host 2. This mapping is stored in text configuration files on UNIX and in the registry on Windows.

Benefits of Real Application Clusters

By using multiple instances running on their own nodes against the same database, Real Application Clusters provide the following advantages over single-instance databases:
Applications have higher availability because they can be accessed from any instance: an instance failure on one node does not prevent work from continuing on one or more surviving instances.
More database users can be supported when a single node reaches its capacity to support additional sessions.
Some processing, particularly operations that can be executed with parallel components, is completed faster when the work is spread across multiple nodes.
Work can scale (more work can be completed in the same amount of time) when each instance can be optimized to support a maximum workload.

Block Transfers in Real Application Clusters

In Oracle8i, Consistent Read Cache Fusion was introduced in Oracle Parallel Server. This removed the need to force disk writes to satisfy queries and greatly simplified application design for Oracle Parallel Server. Consistent Read Cache Fusion allowed a read consistent block image to be transferred from the buffer cache of the writing instance to the cache of the reading instance using the cluster interconnect.
This contrasted with previous releases, which required the current block to be forced to disk for the reading instance, followed, in most cases, by forced writes of rollback segment blocks to produce a read consistent copy of the block in the reading instance. Consistent Read Cache Fusion was also known as Cache Fusion Phase 1, or Write-Read Cache Fusion. The latter name refers to the fact that blocks changed in one instance (write activity) have a read consistent image sent across the interconnect to satisfy a query (read activity).

In Oracle9i, Real Application Clusters replace Oracle Parallel Server and provide full Cache Fusion. With Cache Fusion, modified blocks currently in the buffer cache of one instance are transferred to another instance across the cluster interconnect rather than by forcing disk writes of the blocks. This is true for blocks required for changes by the second instance (write-write transfers) as well as for queries on the second instance (write-read transfers). The mechanism also allows read-read and read-write transfers, which reduces the need to read blocks from disk.

Cache Fusion Model

The Global Resource Directory contains the current status of resources shared by the instances. Its contents are maintained by the Global Cache and Global Enqueue Services using messages that contain sufficient information to ensure that the current block image can be located. These messages also identify block copies being retained by an instance for use by the recovery mechanisms and include sequence information to identify the order of changes made to that block since it was read from disk. The mode in which a block resource is held (NULL, Shared, or Exclusive), as well as the resource role (local or global) and the past image history, are maintained by the Global Resource Directory. Details of resource roles and past images are provided later in this lesson.

The Global Cache Service is responsible for assigning resource modes and roles, updating their status, locating the most current block image when necessary, and informing holders of past images when those images are no longer needed. The information managed and maintained by the Global Resource Directory allows Real Application Clusters to minimize the time taken to return to normal processing following an instance failure or cluster reconfiguration. Additionally, this information allows the LMS process to migrate a block resource master from its original location. The Global Cache Service migrates the block resource record to the requesting instance when a single instance appears to make exclusive, but repeated, use of the block.

Global Cache Service Resource Modes

A block resource can be held by an instance in one of three modes:
NULL (N): The NULL mode is the default status for each instance. It indicates that the instance does not currently hold the resource in any mode.
Shared (S): The shared resource mode is required for an instance to read the contents of a block, typically to satisfy a query. Multiple instances can hold a shared mode resource on the same block concurrently.
Exclusive (X): The exclusive resource mode allows an instance to change the contents of a block covered by the resource. When an instance holds a resource in exclusive mode, all other instances must hold it in NULL mode.
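The buffer states exposed through the V$BH dynamic performance view loosely reflect these modes. As a hedged illustration (the correspondence shown here is a simplification, and the available status values should be checked on your release), a query such as the following summarizes the buffer states held by the local instance, where xcur and scur roughly correspond to exclusive and shared current copies, cr to consistent read copies, and pi to the past images discussed later in this lesson:

SELECT status, COUNT(*) AS buffers
FROM v$bh
GROUP BY status
ORDER BY status;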
Global Cache Service Resource Roles

Global Cache Service block resources are held in either a local or a global role. When a resource is held with a local role, it behaves very similarly to a PCM lock in Oracle Parallel Server. That is, an exclusive mode resource can be held by only one instance at a time, during which no other instance can hold the resource in shared or exclusive mode, whereas multiple instances can hold the resource in shared mode concurrently. Also, the Global Resource Directory does not have to retain any information about a resource being held in NULL mode by an instance.

Global roles allow these restrictions to be broken. Specifically, an instance can use a global resource for a consistent read while it is concurrently held in exclusive mode by another instance. Also, two instances can hold dirty copies of a block concurrently, although only one of them can have the resource in exclusive mode. Clean blocks held only in shared mode on multiple instances do not need a global role resource. Also, global block resource information can be stored in the Global Resource Directory to manage the history of block transfers even if the resource mode is NULL. With local resources, the Global Cache Service discards resource allocation information for instances that downgrade a resource to NULL mode.

Fast Real Application Clusters Reconfiguration

Many cluster hardware vendors use a disk-based quorum system that allows each node to determine which other nodes are currently active members of the cluster. These systems also allow a node to remove itself from the cluster or to remove other nodes from the cluster. The latter is accomplished through a type of voting system, managed through the shared quorum disk, that allows the nodes to determine which of them will remain active if one or more of them become disconnected from the cluster interconnect.

Real Application Clusters implement a similar disk-based system to determine which instances are currently active and which are not. The system uses a heartbeat mechanism to perform frequent checks against the disk to determine the status of each instance, and also uses a voting system to exclude instances that can no longer be reached by one or more active instances. At each heartbeat, every member instance gives its impression of the other members' availability. If they all agree, nothing further is done until the next heartbeat. If two or more instances report a different instance configuration from each other (for example, because the cluster interconnect is broken between a pair of nodes), then one member arbitrates among the different membership configurations. Once this configuration is tested, the arbitrating instance uses the shared disk to publish the proposed configuration to the other instances. All active instances then examine the published configuration and, if necessary, terminate themselves.

Background Processes

Although some of the background processes used by Real Application Clusters have the same names as those in Oracle Parallel Server, their functions and activities are different from those in previous Oracle cluster-enabled software releases. These differences are discussed below.
BSP: The Block Server Process was introduced in Oracle8i Parallel Server to support Consistent Read Cache Fusion. It does not exist in a Real Application Clusters database, where these activities are performed by the LMS process.
LMS: This process was known as the Lock Manager Server process in Oracle Parallel Server. The new LMS process in Real Application Clusters executes as the Global Cache Service process.
LMON: This process was known as the Lock Manager Monitor in Oracle Parallel Server. While Real Application Clusters databases use the same process name, the process is defined as the Global Enqueue Service Monitor.
LMD: There was a Lock Manager process, called LMD0, in Oracle Parallel Server, with a zero at the end of the name implying that multiple copies of the process might be allowed. This process is not used by Real Application Clusters; a new process, called LMD, executes as the Global Enqueue Service.
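As a hedged illustration, you can confirm which background processes are actually running on an instance with a query like the following against V$BGPROCESS (filtering on PADDR is the conventional way to show only started processes); on a Real Application Clusters instance the output should include the LMS, LMON, and LMD processes described above in addition to the usual single-instance processes:

SELECT name, description
FROM v$bgprocess
WHERE paddr <> '00'
ORDER BY name;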
[Figure: Background processes in RAC. The diagram shows two instances, RAC1 and RAC2, each with its own SGA (Global Resource Directory, Database Buffer Cache, Shared Pool with Data Dictionary and Library Caches, Java Pool, Large Pool, Streams Pool, Redo Log Buffer), PGA with private SQL areas and cursors, background processes (LMS, LMON, LMD, LCK, DBWR, LGWR, CKPT, and others), and per-instance redo log files, all sharing common data files and control files.]

In a RAC configuration, each instance has its own set of the background processes found in a single-instance database. The following characteristics are unique to a RAC implementation as opposed to a single-instance configuration:
RAC is a configuration with multiple instances of Oracle running on many nodes.
Multiple instances of Oracle share a single physical database.
Multiple instances reside on different nodes and communicate with each other via a cluster interconnect.
Instances may join and leave the cluster dynamically, provided the number of instances stays within the MAX_INSTANCES value defined in the parameter file.
Instances share a common database that comprises common data files and control files.
Each instance participating in the clustered configuration has its own redo log files, rollback segments, and undo tablespaces.
All instances participating in the clustered configuration can simultaneously execute transactions against the common shared database.
Instances participating in the clustered configuration communicate via the cluster interconnect using a new technology called Cache Fusion.

Shared Initialization Parameter Files

In previous Oracle cluster-enabled software releases, you had to have a separate initialization file for each instance in order to assign different parameter values to the instances. However, certain parameters had to have the same value for every instance in a clustered database. To simplify the management of the instance-specific and common database parameters, the IFILE parameter was commonly used. This parameter would point to a file containing the common parameter values and was included in each of the individual instance parameter files.

In Oracle9i, you can store the parameters for all the instances belonging to an Oracle Real Application Clusters database in a single file. This simplifies the management of the instances because you only have one file to maintain. It is also easier to avoid making mistakes, such as changing a value in one instance's file but not in another's, if all the parameters are listed in one place. To allow values for different instances to share the same file, parameter entries that are specific to a particular instance are prefixed with the instance name using a dot notation. For example, to assign different sort area sizes to two instances, PROD1 and PROD2, you could include the following entries in your parameter file:
prod1.sort_area_size = 1048576
prod2.sort_area_size = 524288
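As a hedged sketch of how such a shared file might be organized (the parameter names and values here are illustrative only, not a recommended configuration), entries intended for every instance can usually be written with an asterisk in place of an instance name, while instance-specific entries carry the instance name prefix:

*.db_name = prod
*.cluster_database = true
prod1.instance_number = 1
prod1.thread = 1
prod1.sort_area_size = 1048576
prod2.instance_number = 2
prod2.thread = 2
prod2.sort_area_size = 524288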
Shared Initialization Parameter File

When you put the parameters for all your instances in a single initialization file, you need this file to be available to the process that starts up each instance. If your instances are started automatically as part of the system startup routines, you need a copy of the file on each node in the cluster. However, in Oracle9i, you can store the parameters for all the instances in a special binary file known as a server parameter file (SPFILE). By storing your SPFILE on a shared cluster disk partition, the parameter file becomes available to every node in the cluster. Therefore, you only need to keep and maintain one copy of this file.

The cluster in this example consists of two nodes. Node 1 defines /hdisk1 as its $ORACLE_HOME directory and uses the name ORAC1 for its instance. Node 2 is using /hdisk1 for its $ORACLE_HOME and has ORAC2 for its instance's name. A partition on the raw device, called /dev/rdisk1/spfile, holds the binary file, SPFILE. Each node has an instance-specific initialization file configured, containing just one entry:
SPFILE = /dev/rdisk1/spfile
This entry simply points to the raw partition holding the server parameter file, which contains the entries that are common to both instances as well as the instance-specific entries. The example shows two of the entries in the SPFILE. These entries contain parameters that follow the naming standard suggested for instances: using the SID, defined at the operating system level, as the INSTANCE_NAME. Because instance names are unique to each instance, they use dot notation to combine the instance name with the parameter name:
orac1.instance_name = orac1
orac2.instance_name = orac2

Instance Naming

In addition to the instance name assigned at the operating system level, using the ORACLE_SID environment variable, Oracle Parallel Server instances could be named with the INSTANCE_NAME parameter. However, there were limitations with this identification method:
The INSTANCE_NAME values did not have to be unique in different instances of the same database.
On some platforms, the ORACLE_SID value could be the same for all instances of the same database.
This meant that the instance names could not be used reliably by management tools to identify a particular instance. In Oracle9i, each instance of a Real Application Clusters database is required to have a unique name assigned with the SID. Unique instance names enable system-management tools to identify instances to the user with the instance names, with the assurance that these names are unique. Unique names also allow the instances associated with the same database to share an initialization file through the use of the SID as a parameter prefix, as described earlier.
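As a hedged illustration of how these unique names can be used by administrators and management tools (the column list is illustrative and should be verified against your release), the following query, run from any instance, reports the name and number of every active instance in the cluster:

SELECT inst_id, instance_number, instance_name, host_name
FROM gv$instance
ORDER BY instance_number;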
Unique Instance Numbers

Oracle Parallel Server instances chose an unused instance number if a multi-instance database instance started up without specifying a value for the INSTANCE_NUMBER parameter. For example, if neither of the parameter files in a two-instance cluster database included an INSTANCE_NUMBER parameter, then the first instance started would become instance one and the second to start would become instance two. The instance numbers depended solely on the startup order, and either instance could be identified as one or as two. Further, if the THREAD parameter were given a value for each of these instances, the instance numbers and thread numbers for the instances could be different. This could be confusing when querying the dynamic performance tables and trying to manage the two instances.

In Real Application Clusters, the default value for the INSTANCE_NUMBER parameter is 1. If you try to start two instances without specifying non-default values for one of them, the second instance will not start and you will receive an error message. To start successfully, each instance is required to specify a unique number in its INSTANCE_NUMBER parameter. Although this requirement for unique instance numbers will not prevent you from using different values for the INSTANCE_NUMBER and THREAD parameters for an instance, you are more likely to change them both together if you are editing the parameter file.

Instance Names and Numbers

There are three different databases shown in the example below: PROD, DEV, and TEST. PROD has four instances, PROD1, PROD2, PROD3, and PROD4, each running on one of the four available nodes. DEV has two instances, DEV1 and DEV2, running on node A and node C respectively. Similarly, TEST has two instances, TEST1 on node B and TEST2 on node D. The recommended naming and numbering for these instances would be as follows:
1. Number the redo threads for each instance 1, 2, 3, 4, and so on. That is, your redo thread numbers should start at 1 and increment by 1. In the example, thread numbers are assigned this way to each of the eight instances.
2. Set the ORACLE_SID to be the database name plus its redo thread number as a suffix. For example, on node A, you would use ORACLE_SID = PROD1 for the PROD database instance and ORACLE_SID = DEV1 for the DEV database instance.
3. Set the THREAD parameter to match the thread number you chose for the instance, which should also be reflected in the instance's ORACLE_SID value. For example, the THREAD value for the PROD database instance on node C should be 3.
4. Set the INSTANCE_NUMBER parameter value to be the same as the THREAD parameter value for each instance. For example, the INSTANCE_NUMBER for the PROD database instance on node C should also be set to 3, the value assigned to THREAD in item 3.
5. Set the INSTANCE_NAME parameter for each instance to match the SID name used in the parameter file. For example, the INSTANCE_NAME value for the TEST database instance on node D should be TEST2.
Following these recommendations, the parameter file for the DEV database would contain the following entries:
dev1.thread = 1
dev2.thread = 2
dev1.instance_number = 1
dev2.instance_number = 2
dev1.instance_name = dev1
dev2.instance_name = dev2

SRVCTL Commands

SRVCTL is a tool that lets you manage your Real Application Clusters environment from the command line. It replaces and extends the capabilities of the Oracle Parallel Server tool, called OPSCTL, which supported only the START and STOP subcommands. In comparison to OPSCTL, the START and STOP subcommands have been extended with additional options, and new subcommands have been added to support more functionality.
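As a hedged illustration of typical SRVCTL usage (the exact option flags vary between Oracle9i releases, so confirm the syntax with the srvctl help output on your system), commands of the following form start, stop, and report on a clustered database and its instances; the database name PROD and instance name PROD1 are taken from the earlier example:

srvctl start database -d PROD
srvctl status database -d PROD
srvctl stop instance -d PROD -i PROD1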