Leveraging EMC CLARiiON Storage Replication to Offload Oracle Recovery Manager (RMAN) Backup
Applied Technology

Abstract

Oracle Recovery Manager (RMAN) incremental backup allows very large databases to be backed up online very efficiently. However, running RMAN alongside production activities can impact production service levels. One method to offload RMAN backups, including fast incremental backups with Block Change Tracking (BCT), is to leverage a storage system’s rapid point-in-time replication technology. This white paper covers the procedure tested by EMC® CLARiiON® engineering to achieve this.

August 2008

Copyright © 2008 EMC Corporation. All rights reserved.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.

All other trademarks used herein are the property of their respective owners.

Part Number H5681

Table of Contents

Executive summary
Introduction
  Audience
  Terminology
Overview
The RMAN offload process
Offload procedure testing
  Production host environment
    Logical Oracle data storage layout on the production host
  RMAN backup offload server host environment
    Logical Oracle data storage layout on the backup host
  Testing focus
  Test workload
  Test procedure
    Initial production database setup
    Initial backup database setup
    Enable BCT tracking
  Test Phase 1: Establish baseline
  Test Phase 2: Exercise process to perform offloaded RMAN incremental backup
    Begin hot backup
    Perform a clone split for database files
    End hot backup
    Switch logs
    Execute Block Change Tracking Switch
    Create control file copies
    Perform a clone split for the redo and archive clone group
    Resynchronize the RMAN catalog
    Leverage split clones to perform offloaded incremental backup
    Start the ASM instance
    Mount the database instance
    Back up the database instance
  Test Phase 3: Validating correct restore on the production host from offloaded incremental backups
    Restore procedure
    Verify correct restore/recovery
  Test Phase 4: Analysis of offloading process effectiveness
Testing observations/findings
  Verifying that BCT driven incremental backup is offloaded
  Verification procedure
Conclusion
References

Executive summary

Oracle Recovery Manager (RMAN) is a comprehensive and powerful server-managed tool for backing up Oracle databases online, and managing database recovery when needed. As databases continue to grow larger in size, incremental backups, where only database changes since the last successfully completed backup are actually captured, are becoming increasingly popular. The ability to perform incremental backups has been steadily enhanced beginning with Oracle 9i through Oracle 11g.

However, even with incremental backups, a backup operation can still be quite time consuming, with production performance impacted for the duration that RMAN backup is running. Hence, it would still be advantageous to be able to offload the RMAN backup overhead from the production database host.
The offloading of the backup process can be achieved by leveraging a split mirror of the database, created by the underlying storage-based replication technology of EMC® CLARiiON® systems. The offloaded backups can be cataloged and maintained by RMAN to achieve the most effective database recovery.

This white paper covers the general approach of how to utilize EMC CLARiiON replication technology to offload the RMAN backup process, and the associated benefits that can be expected. In particular, the paper covers how the process can be extended to incorporate the Block Change Tracking feature, introduced in Oracle Database 10g, to significantly speed up RMAN incremental backups.

Introduction

Since its introduction in Oracle 8, the Recovery Manager (RMAN) utility has been continually enhanced through Oracle Database 9i, 10g, and 11g. Generally, RMAN runs alongside the production database service, competing with other workloads on the same database. This can both prolong the time required to complete the backup task and degrade performance for other database users.

By utilizing CLARiiON storage-based replication technology with RMAN online backups, a point-in-time snapshot of the production database content can be quickly captured and mounted on a different server, leveraging a secondary Oracle database instance where the actual RMAN backup operations can be performed.

The process of offloading the RMAN backup operation to a split mirror created by storage-based replication has been covered in past joint EMC/Oracle white papers. Those papers are listed in the “References” section. This paper specifically focuses on offloading incremental backups, which is a refinement of the general offloading process.

In Oracle Database 10g, a new Block Change Tracking (BCT) mechanism was added as an administrator-enabled database operational mode. With BCT enabled, all database blocks changed since the last RMAN backup are tracked.
When a new incremental backup is performed, RMAN is able to selectively back up only those database blocks marked as changed since the last backup. This can significantly reduce the time needed to scan a large database looking for changed blocks.

While the focus is on offloading the incremental RMAN backup in conjunction with the ability to leverage BCT, the general methodology of offloading applies to all types of RMAN backups, including the subsequent registering of the backup files in the RMAN catalog, so that they can be used on the production instance should an RMAN-managed database restore/recovery become necessary.

Audience

This white paper covers the procedural steps that can be followed to implement the operational process as stated. EMC field personnel supporting customers with Oracle deployments, as well as database administrators and storage administrators responsible for supporting their own operations involving Oracle database application deployments, can evaluate whether the procedure discussed in this paper would be suitable as an extension or enhancement to their current operational practice.

Terminology

ASM: Automatic Storage Management, a logical volume management mechanism for holding and managing Oracle data.

ASM diskgroup: A logical volume group consisting of OS-defined disk devices. This diskgroup is used to house database files.

BCT: Block Change Tracking, an enhancement first introduced in Oracle Database 10g that can be optionally enabled to improve the efficiency of incremental backups using RMAN.

Clones: See SV clone below.

RMAN: Recovery Manager, Oracle’s primary facility for managing database backups and recoveries.

SV: SnapView™, CLARiiON array application software that allows users to quickly create, manage, and manipulate bitwise replicas of LUNs managed by the storage system, irrespective of the actual size of the source LUN data involved.

SV clone: A LUN maintained by the SnapView software running inside the CLARiiON system as a bitwise mirror of another LUN, known as the source LUN, which may be actively used and changed by applications on the servers attached to that source LUN.

Overview

This white paper assumes that readers are already familiar with Oracle Recovery Manager (RMAN), including the concept of enabling and using Block Change Tracking (BCT) to improve the efficiency of RMAN incremental backups. Readers needing more details on the specifics of RMAN should refer to the Oracle Recovery Manager Concept and Recovery Manager Administration Guide, as well as other related material that can be found at: http://www.oracle.com/technology/documentation/index.html.

RMAN does not require the database to be explicitly opened in order to back it up; the database can be in a mounted state. RMAN backups can also be performed directly against an open database that is being actively accessed and modified by concurrent application usage. However, running RMAN against a database with heavy concurrent user activity can impact performance, because RMAN competes with the foreground user application processes for server CPU, I/O, and memory resources. This both lengthens the RMAN task, so that the backup takes longer to complete, and extends the period during which the normal production service level is compromised.
By leveraging storage-managed “fast” replication techniques, such as CLARiiON SnapView, it becomes viable to take a point-in-time storage replica of the database and redirect it to a separate server (and database instance), where the RMAN backup can be done as an isolated activity. This avoids the impact of the RMAN backup activity on production user transaction service levels, and shortens the time needed to perform the backup, because the offloaded backup process can take advantage of additional server and storage resources.

As database sizes continue to grow, disk-based backups, and more frequent use of incremental backups as opposed to full backups, become key trends. Block Change Tracking (BCT), which improves the efficiency of RMAN incremental backup, was first introduced in Oracle Database 10g. With the growing acceptance of disk-based backup through the Oracle-managed Flash Recovery Area, the use of the BCT feature to facilitate incremental backup (especially to the disk-based backup areas) is also becoming increasingly popular.

If RMAN is run directly against the production database, at the point the RMAN activity is invoked, the database engine freezes the current BCT map and switches to a new BCT map to track further changes that occur while the current RMAN incremental backup is running. The frozen map is then used to drive RMAN to pick up the correct set of database pages for the incremental backup set, that is, the pages changed since the last completed RMAN backup of the database.
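The freeze-and-switch behavior described above can be pictured with a small toy model. This is only a conceptual sketch: the class and method names are hypothetical, and it says nothing about how Oracle actually stores or manages its tracking maps.

```python
class ChangeTracker:
    """Toy model of a block change tracking map (not Oracle's real format)."""

    def __init__(self):
        self.current = set()   # blocks changed since the last switch
        self.frozen = None     # map handed to the most recent backup

    def record_write(self, block_id):
        """Note that a block was modified."""
        self.current.add(block_id)

    def switch(self):
        """Freeze the current map, start a fresh one, return the frozen map."""
        self.frozen = frozenset(self.current)
        self.current = set()
        return self.frozen


tracker = ChangeTracker()
for block in (10, 11, 42):
    tracker.record_write(block)

to_back_up = tracker.switch()   # an incremental backup would start here
tracker.record_write(42)        # changes arriving while the backup runs
tracker.record_write(99)

print(sorted(to_back_up))       # [10, 11, 42]
print(sorted(tracker.current))  # [42, 99]
```

The point of the model is that writes arriving after the switch land only in the new map, so the frozen map stays a stable description of what the current incremental backup must pick up.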
To leverage storage-based replication support to offload the RMAN process, it is necessary, as part of the offloading procedure, to ensure that the BCT maps are switched appropriately in the production instance before the storage replica is used for offloading, since the RMAN backup will not be run directly against the production instance. Oracle provides an explicit administration function that can be called to achieve this. The offload procedure has to include this BCT map switch call against the production system prior to the storage split for offloading purposes.

For example, by explicitly switching the tracking map prior to splitting the storage images and offloading the RMAN backup at midnight on Monday, all new changes made against production during Monday are reflected in the new map. So, at midnight on Tuesday, when the offload process is ready to be repeated, the split-off storage replica will correctly include the tracking map segment reflecting the block changes between Monday midnight and Tuesday midnight.

This paper outlines the procedural steps that have been tested and verified as functioning correctly in the CLARiiON engineering labs. Specifically, the process reported in the following sections includes the steps to perform the proper BCT map switching, and the use of the BCT map to drive the offloaded RMAN process to perform an efficient incremental backup.

As the main purpose of running Oracle Managed Backup using RMAN is the relative ease and efficiency of restoring and recovering the production database in the event of a failure, the testing conducted to support this paper included verification of database restore/recovery from the backups generated by the offload process.

This paper also compares the relative effectiveness of having the RMAN task offloaded as opposed to running the same backup directly against the production database while it is actively supporting database user activity.
A relative comparison of the offloaded incremental backup generation with and without relying on the BCT option is also included.

The RMAN offload process

Figure 1 is a logical flow diagram of the procedural steps involved in the offload process. The process flow assumes that the production database is never shut down and therefore remains available throughout the process. The procedural steps are exercised in the order as numbered. The actual details of the DBA SQL and OS or storage commands used are described in the following sections.

[Figure 1. Logical flow of RMAN incremental backup offload process. Labels recoverable from the diagram: production Oracle database instance; begin DB hot backup; CLARiiON DB clone split; end DB hot backup; archive current log; switch BCT; two copies of backup controlfiles; split FRA clone; ASM instance start, mount DB clone, FRA, ARCHIVE; mount cloned database with FRA, cloned archived logs; run RMAN incremental backup to BACKUP diskgroup; catalog backup in RMAN catalog; offload backup Oracle database.]

Offload procedure testing

The following sections detail the environment under which the engineering testing of the procedure was conducted, and the specific samples of Oracle database administration commands (SQL commands), CLARiiON storage management CLI commands, RMAN commands, ASM commands, and other OS commands used to implement and exercise the process. Most of the commands are generic across OS platforms. The OS platform-specific commands can be adapted for the actual deployment platform as appropriate.
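As an orientation aid, the ordering of the production-side steps exercised later in this paper can be sketched as a dry run that only prints the commands. The helper names are hypothetical, the array name, clone group names, and clone ID are the ones used in this paper's test setup, and nothing here contacts a database or a storage system.

```python
def fracture_cmd(array, clone_groups, clone_id="0000000001"):
    """Build (but do not run) a naviseccli consistent clone fracture command."""
    args = [
        "naviseccli", "-h", array,
        "-user", "{admin_user}", "-password", "{admin_password}",
        "snapview", "-consistentfractureclones", "-CloneGroupNameCloneId",
    ]
    for group in clone_groups:
        args += [group, clone_id]
    return " ".join(args)


def production_side_steps(array="array1"):
    """Return the production-side steps as (tool, command) pairs, in order."""
    return [
        ("sqlplus", "alter database begin backup;"),
        ("shell", fracture_cmd(array, ["CL_RMDATA1", "CL_RMDATA2", "CL_RMDATA3"])),
        ("sqlplus", "alter database end backup;"),
        ("sqlplus", "alter system archive log current;"),
        ("sqlplus", "execute dbms_backup_restore.bctSwitch();"),
        ("rman", "copy current controlfile to '+RMFLASH/control_start';"),
        ("rman", "copy current controlfile to '+RMFLASH/control_backup';"),
        ("shell", fracture_cmd(array, ["CL_RMREDO1", "CL_RMREDO2", "CL_RMFLASH"])),
        ("rman", "resync catalog;"),
    ]


steps = production_side_steps()
for number, (tool, command) in enumerate(steps, start=1):
    print(f"{number:2d}. [{tool}] {command}")
```

The ordering constraint worth noticing is that the BCT switch and the control file copies precede the second fracture, so that the split-off Flash Recovery Area clone already contains the switched tracking map and both control file copies.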
Production host environment

• Dell 2650 servers with:
    2 x 3.2 GHz Xeon CPUs with hyper-threading
    1 GB L1 cache
    8 GB RAM
• RHEL 4.0 Update 4 (2.6.9-42) kernel for IA32 architecture
• Dual port QLA2462 4 Gb FC HBA (both ports used for HA support)
• Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
• EMC PowerPath® 5.0 b157 for RHEL 4.0 Linux for IA32 systems
• EMC CLARiiON CX3-80 with:
    FLARE® pre-release builds of 26, build bundle 03.26.080.5.005
    146 GB FC drives at 15k rpm

Logical Oracle data storage layout on the production host

ASM diskgroup name   CX3 LUN name   RAID type and size   Attributes
RMDATA               RMDATA1        4+1R5, 50 GB         DB data source
RMDATA               RMDATA2        4+1R5, 50 GB         DB data source
RMDATA               RMDATA3        4+1R5, 50 GB         DB data source
RMBACKUP             RMBACK1        4+1R5, 150 GB        Backup files store
RMFLASH              RMFLASH1       4+1R5, 50 GB         Flash_recovery_area
RMREDO               RMREDO1        4+1R5, 10 GB         Redo log area
RMREDO               RMREDO2        4+1R5, 10 GB         Redo log area
RMCAT                RMCAT1         4+1R5, 150 GB        RMAN catalog database

Note that a separate disk group, RMBACKUP, is dedicated to holding the backup files created, as opposed to having the backup file sets created by default in the Flash Recovery Area. This is necessary because the Oracle instance on the backup server, to which the RMAN backup task is offloaded, is a distinct instance and not part of a clustered service with the production instance. ASM only allows disk groups to be mounted concurrently (shared) by database instances that cooperate as part of a cluster. When the two servers and ASM instances are not configured as part of a cluster, the ASM instances on the different servers prevent simultaneous mounting of the same set of storage as an ASM disk group.
So, if the production instance is using the RMFLASH group for its activities, such as creating archived logs, flashback logs, and so on, the backup instance cannot mount the same set of storage as the Flash_Recovery_Area disk group (RMFLASH) and write backup file sets into it as part of the RMAN activity. By isolating the backup file sets in a separate ASM disk group, this group can be mounted alternately on either the backup server (when the RMAN backup offloading is performed) or the production server (when it becomes necessary to perform a restore/recovery action there). The backup file sets are stored within this disk group, and the file path names are recorded in the RMAN catalog.

RMAN backup offload server host environment

• Dell 2650 servers with:
    2 x 3.2 GHz Xeon CPUs with hyper-threading
    1 GB L1 cache
    8 GB RAM
• RHEL 4.0 Update 4 (2.6.9-42) kernel for IA32 architecture
• Dual port QLA2462 4 Gb FC HBA (both ports used for HA support)
• Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
• EMC PowerPath 5.0 b157 for RHEL 4.0 Linux for IA32 systems
• Shared (with production) EMC CLARiiON CX3-80 with:
    FLARE pre-release builds of 26, build bundle 03.26.080.5.005
    146 GB FC drives at 15k rpm

Logical Oracle data storage layout on the backup host

ASM diskgroup name   CX3 LUN name     RAID type and size   Attributes
RMDATA               RMDATA1_clone    4+1R5, 50 GB         DB data clone
RMDATA               RMDATA2_clone    4+1R5, 50 GB         DB data clone
RMDATA               RMDATA3_clone    4+1R5, 50 GB         DB data clone
RMBACKUP             (RMBACK1)        (4+1R5, 150 GB)      Backup file set store
RMFLASH              RMFLASH1_clone   4+1R5, 50 GB         Flash Recovery Area clone
RMREDO               RMREDO1_clone    4+1R5, 10 GB         Redo log area clone
RMREDO               RMREDO2_clone    4+1R5, 10 GB         Redo log area clone

Note that the backup instance uses the same ASM diskgroup RMBACKUP, from the same physical storage, as the production instance. Theoretically, an alternative approach is to create a separate storage clone for the RMBACKUP group, mount the clone on the backup instance, let the RMAN backup write the backup file set into the storage clone, and then leverage storage clone reverse synchronization to update the RMBACKUP production disk group content. But to do this reliably and safely, the production source for RMBACKUP and the storage clone on the backup server would likely still need to be closed and unmounted from both servers. In that case, it is easier just to close and unmount the RMBACKUP group from the host (production or backup) that is not currently using it, and mount it on the server that actually needs to read from or write into the RMBACKUP group when the need arises.

Testing focus

The primary testing focus was to validate the correctness of the offload procedure. In particular, it was key to confirm that BCT can still be leveraged to drive the most efficient incremental backup when the RMAN task is run against the offloaded storage replica of the database.

Correctness was deemed satisfied when a subsequent RMAN database recovery task could be run directly against the production database, using the backup file sets created through the offload process, and the content of the database as restored and recovered was verified to contain the expected data. The relevant database views were also examined to ensure that incremental backups were actually occurring.

Since the main purpose of the offloading process is to minimize the impact on ongoing production service of performing the necessary operational backups, an auxiliary goal was to compare the performance impact on the production test workload of running the RMAN backup task directly against the production database with that of the offloading process.
Also, the times to execute the same types of RMAN backups, including BCT-enabled incremental backups, directly on the production system versus on the offloaded system were collected and compared.

Test workload

A TPC-C-like OLTP workload was used as the test workload to drive continual database changes into the “production” database while the RMAN backup was run, either directly from the production host or with the RMAN task offloaded to the backup host.

Test procedure

The following are the step-by-step details of the tests conducted to validate the offloading process.

Initial production database setup

Five distinct ASM diskgroups were created from the LUNs provisioned from the CX3-80 system.

In this test scenario, the RMDATA group holds the production database with about 25 GB of operational data.

The RMBACKUP group is configured to keep all the database RMAN backup sets created. Normally, this group is not mounted on the production instance. It is only mounted manually when there is a need to access backup file sets from this disk group in order to perform restore/recovery of lost database files.

The RMREDO group holds the redo logs.

The RMFLASH group is used as the FLASH_RECOVERY_AREA. Archived logs are kept there, in addition to control file backups and the BCT map file.

The RMCAT group is used to hold the RMAN catalog, which records what has been backed up.

After the database was populated with the test workload data, an initial full database backup was captured by running an RMAN full database backup. The full backup was cataloged in the RMAN catalog.

Clones for the LUNs making up the RMDATA, RMREDO, and RMFLASH groups were initially synchronized in the CX3-80 in preparation for conducting the rest of the test steps.
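The storage layout above can also be written down as a small data structure, which makes it easy to see which LUNs need SnapView clones and how much clone capacity that implies. This is an illustrative sketch only: the `backup_host` labels ("clone", "shared", `None`) are hypothetical annotations, while the group names, LUN names, sizes, and the `_clone` suffix come from the layout tables in this paper.

```python
# Disk group -> LUN sizes in GB, plus how the group reaches the backup host.
LAYOUT = {
    "RMDATA":   {"luns": {"RMDATA1": 50, "RMDATA2": 50, "RMDATA3": 50},
                 "backup_host": "clone"},
    "RMBACKUP": {"luns": {"RMBACK1": 150}, "backup_host": "shared"},
    "RMFLASH":  {"luns": {"RMFLASH1": 50}, "backup_host": "clone"},
    "RMREDO":   {"luns": {"RMREDO1": 10, "RMREDO2": 10},
                 "backup_host": "clone"},
    "RMCAT":    {"luns": {"RMCAT1": 150}, "backup_host": None},
}


def clone_luns(layout):
    """Names of the clone LUNs presented to the backup host."""
    return [f"{lun}_clone"
            for group in layout.values() if group["backup_host"] == "clone"
            for lun in group["luns"]]


def clone_capacity_gb(layout):
    """Total capacity, in GB, consumed by the clone LUNs."""
    return sum(size
               for group in layout.values() if group["backup_host"] == "clone"
               for size in group["luns"].values())


print(clone_luns(LAYOUT))
print(clone_capacity_gb(LAYOUT))  # 220
```

RMBACKUP is marked as shared rather than cloned because the same physical LUN is mounted alternately on the two hosts, and RMCAT is not presented to the backup host at all since the catalog is reached over the network.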
Initial backup database setup

The three clones of the LUNs that form the RMDATA production data group were exposed to the server used to perform the backup offloading. The clones of the RMREDO group, the RMFLASH group, and the source LUN forming the RMBACKUP group from the production database were also visible on the backup server. This set of clones provides a point-in-time storage image of the production LUNs that is used to support the offloading of the backup functions.

After the clones had been fully synchronized with the content of the production source LUNs, the clones were storage split from their source LUNs.

Once the clones were split off from their production source LUNs, they became usable as storage LUNs on the backup server. When the ASM instance on the backup server was started up, ASM correctly identified and mounted the following diskgroups: RMDATA (from the data clones), RMREDO (from the redo LUN clones), and RMFLASH (the clone of the production Flash_Recovery_Area, which also holds the archived logs and other required files, such as copies of backup controlfiles and the BCT file itself).

In our ASM instance configuration, the ASM instance tries to automatically mount the RMDATA, RMREDO, and RMFLASH groups upon startup. However, RMBACKUP is excluded from the list of ASM groups to be automatically mounted, because the use of this group has to be manually coordinated between the production and backup servers, each of which needs to mount the group exclusively from the same set of storage when it requires the group.

The backup server uses the RMAN catalog on the production server directly. This is done by adding the catalog address to the backup server’s tnsnames.ora file:
    CATDB =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = ProductionhostAddress)(PORT = 1521))
        (CONNECT_DATA =
          (SERVER = DEDICATED)
          (SERVICE_NAME = catdb.ProductionhostAddress)
        )
      )

The clone of the production RMFLASH group is leveraged to provide access to the needed backup control file copies, the BCT map files, and the archived logs. Flashback logging may have been enabled on production, in which case flashback logs will be present in the RMFLASH clones. The backup instance is activated mainly to support running the RMAN task, so flashback logging is neither enabled nor used on it. Flashback database should not be performed on the backup instance after the database has been mounted there.

When necessary (such as when a direct restore has to be performed on the production server), the RMBACKUP group is manually remounted on the production server (after unmounting it from the backup server). Since the ASM path and file names are correctly cataloged by RMAN from the backup, the needed files can be restored directly from the RMBACKUP group once it is remounted on the production server.

Enable BCT tracking

Enable BCT for the production database by executing the following commands:

    # export ORACLE_SID=oastoltp
    # sqlplus /nolog
    SQL> connect / as sysdba
    SQL> ALTER DATABASE ENABLE BLOCK CHANGE TRACKING USING FILE '+RMFLASH/rman_change_track.f' REUSE;

The REUSE option tells Oracle to overwrite any existing file with the specified name. To determine whether change tracking is enabled, you can query V$BLOCK_CHANGE_TRACKING.STATUS. As the above command shows, the tracking file is created in the RMFLASH disk group.
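When the enable step is scripted, the statement and the status check can be generated rather than typed by hand. The following is a minimal sketch that only builds SQL text and interprets a STATUS value; the helper names are hypothetical, it does not connect to a database, and the query assumes the STATUS and FILENAME columns of V$BLOCK_CHANGE_TRACKING.

```python
def bct_enable_sql(tracking_file, reuse=True):
    """Return the ALTER DATABASE statement that enables block change tracking."""
    stmt = ("ALTER DATABASE ENABLE BLOCK CHANGE TRACKING "
            f"USING FILE '{tracking_file}'")
    if reuse:
        stmt += " REUSE"  # overwrite an existing file of the same name
    return stmt + ";"


# Query to run afterwards to confirm the tracking state and file location.
BCT_STATUS_SQL = "SELECT status, filename FROM v$block_change_tracking;"


def is_bct_enabled(status_value):
    """Interpret the STATUS column value returned by the query above."""
    return status_value.strip().upper() == "ENABLED"


print(bct_enable_sql("+RMFLASH/rman_change_track.f"))
print(BCT_STATUS_SQL)
print(is_bct_enabled("ENABLED"))  # True
```

Keeping the status check next to the enable statement is useful in the offload procedure, because Test Phase 2 requires confirming that BCT is still enabled after the database has been restored and restarted.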
With BCT tracking enabled, RMAN incremental backup leverages the map to optimize the work required, accessing and backing up only the database pages that have been changed by committed transactions since the last successful database backup point.

Note that every time Block Change Tracking is disabled and then re-enabled, a completely new set of maps is established. To reap the full benefit of minimizing the number of pages scanned by ongoing RMAN incremental backup runs, BCT should be left enabled for the production database unless the added overhead impacts the production service level so adversely that it is unacceptable to do so.

Test Phase 1: Establish baseline

The test phases began with establishing a baseline for performing a direct incremental backup against the test database while it was being updated at a steady rate by an OLTP workload. The key metrics collected in these phases included:

• The observed workload transaction throughput while the incremental backup was executing
• The time taken to complete that particular incremental backup

For our testing, an OLTP test workload developed by the Oracle internal stress testing QA group, called the Oracle Automated Stress Test (OAST), was used. An OAST test session was started to run for 60 minutes, with the database already running with BCT enabled. Fifteen minutes into the OAST run, a direct RMAN incremental backup was performed on the production instance. The test steps executed were as follows:

Go to the $OAST_HOME (/opt/app/oracle/db_1/oast/home/) location.
    $ ./nrunoastoltp50.sh -n OLTP -u 150 -t 3600

where the options are:

    -n = name of the directory where you want to save results
    -u = number of users
    -t = time duration of operation

After 15 minutes, start the RMAN backup using the following commands:

    $ rman TARGET / CATALOG rman/rman@catdb
    Recovery Manager: Release 11.1.0.6.0 - Production on Tue Sep 18 15:38:12 2007
    Copyright (c) 1982, 2005, Oracle. All rights reserved.
    connected to target database: TPCC (DBID=3136004487)
    connected to recovery catalog database
    RMAN> RUN {
      RECOVER COPY OF DATABASE WITH TAG 'incr_update';
      BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG 'incr_update' DATABASE;
    }

Record the elapsed time required to complete the incremental backup action. Capture the OLTP transaction rate per minute for the 60 minutes of run duration, noting specifically the time region when the RMAN backup was running. Mark the transaction throughput degradation observed during the period when the RMAN backup was executing. This establishes the operational effectiveness baseline against which the offloading process will eventually be compared.

Shut down the production database. Perform a database RESTORE to tag 'incr_update'. This causes the database to be restored first to the level 0 base copy of the database. Then the incremental level 1 backup is automatically applied as part of the RMAN RESTORE DATABASE process. Verify that the database content is in fact restored as expected to the point where the incremental backup was generated. The SCN to which the database is restored and recovered should be consistent with the last SCN of the level 1 backup captured and registered in the RMAN catalog.
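The throughput degradation called for above is simple arithmetic over the per-minute transaction rates. The following is a minimal sketch, assuming the rates are available as a list with one value per minute; the function name and the sample numbers are hypothetical and are not measurements from this paper.

```python
from statistics import mean


def degradation_pct(tpm, backup_start, backup_end):
    """Percent drop in mean transactions per minute inside the backup window
    [backup_start, backup_end) relative to the minutes outside it."""
    inside = tpm[backup_start:backup_end]
    outside = tpm[:backup_start] + tpm[backup_end:]
    if not inside or not outside:
        raise ValueError("backup window must cover part, but not all, of the run")
    baseline = mean(outside)
    return 100.0 * (baseline - mean(inside)) / baseline


# Hypothetical 60-minute run: the backup occupies minutes 15 to 24.
samples = [1000] * 15 + [800] * 10 + [1000] * 35
print(degradation_pct(samples, 15, 25))  # 20.0
```

Using the minutes outside the backup window as the baseline keeps the comparison within a single run, so the same calculation can be applied unchanged to the direct and offloaded test phases.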
Test Phase 2: Exercise process to perform offloaded RMAN incremental backup

The procedure for offloading the backup to the backup server, so as to avoid impact to the production instance, was exercised as follows.

Shut down the production database. Delete the RMAN incremental backup set from RMBACKUP and remove its record from the RMAN catalog.

Restore the database to the initial setup state with the level 0 full backup. Restart the database. Make sure that BCT is still enabled. (If not, re-enable BCT using the procedure described in the previous section "Enable BCT tracking.")

Run the same OLTP workload against the restored database for 60 minutes again. Then, about 15 minutes into the run, invoke the following sequence of operational steps on the production server:

Begin hot backup

    SQL> alter database begin backup;

Perform a clone split for database files

    # naviseccli -h array1 -user {admin_user} -password {admin_password} snapview -consistentfractureclones -CloneGroupNameCloneId CL_RMDATA1 0000000001 CL_RMDATA2 0000000001 CL_RMDATA3 0000000001

where CL_RMDATA1, CL_RMDATA2, and CL_RMDATA3 are the clone relationship sets associating RMDATA1 to RMDATA1_clone, RMDATA2 to RMDATA2_clone, and RMDATA3 to RMDATA3_clone. This naviseccli command ensures that the synchronized clones for each of the ASM disks making up the DATA ASM group are fractured from their respective production source LUNs at exactly the same point in time, so that the clones represent a dependent-write-order-consistent state of the three ASM disks.

End hot backup

    SQL> alter database end backup;

Switch logs

    SQL> alter system archive log current;

Execute Block Change Tracking switch

    SQL> execute dbms_backup_restore.bctSwitch ();

This closes the current BCT map and begins a new one.

Create control file copies

Create two copies of the control file.
One copy (control_start) will be used to mount the database on the backup host. The second copy (control_backup) will be used as part of the incremental backup set used by RMAN.

    $ rman TARGET / CATALOG rman/rman@catdb
    Recovery Manager: Release 11.1.0.6.0 - Production on Tue Sep 18 15:45:30 2007
    Copyright (c) 1982, 2007, Oracle. All rights reserved.
    connected to target database: TPCC (DBID=3136004487, not open)
    connected to recovery catalog database
    RMAN> run {
    Copy current controlfile to '+RMFLASH/control_start';
    Copy current controlfile to '+RMFLASH/control_backup';
    }

Perform a clone split for the redo and archive clone group

    # naviseccli -h array1 -user {admin_user} -password {admin_password} snapview -consistentfractureclones -CloneGroupNameCloneId CL_RMREDO1 0000000001 CL_RMREDO2 0000000001 CL_RMFLASH 0000000001

where CL_RMREDO1, CL_RMREDO2, and CL_RMFLASH are the clone relationship sets associating RMREDO1, RMREDO2, and RMFLASH with their corresponding clones.

Resynchronize the RMAN catalog

    RMAN> resync catalog;

Record the elapsed time required. This should include:
• Time to get the running database into hot backup mode
• Time to execute the storage clone split
• Time to take the database out of hot backup state

Also record the transactional performance impact to production for the period during which the database has to be in hot backup state.

Leverage split clones to perform offloaded incremental backup

Enable host access to the clones of the RMDATA and RMARCH LUNs for the backup instance. Start up ASM in exclusive mode on the backup server. Mount the RMDATA and RMARCH groups from the storage clones activated for access.

Start the ASM instance

    # export ORACLE_SID=+ASM
    # sqlplus /nolog
    SQL> connect / as sysdba
    SQL> startup mount

Mount the RMBACKUP ASM group to the backup server.
If RMBACKUP is still mounted on production, it will be necessary to go back to the production instance and unmount the group first.

The remounted ASM groups (based on the clones of RMDATA1, RMDATA2, and RMDATA3, as well as RMFLASH, plus the RMBACKUP group) now allow the backup database instance to be started for performing the RMAN incremental backup task.

It is extremely important to perform the RMAN backup action without opening the database under the backup instance. Otherwise, the backup instance would attempt to perform log and crash recovery on the database as it is reopened. Backup file sets created after that would not have consistent SCN sequencing relative to the production database. They would therefore not be correctly registered in the RMAN catalog, and would be unusable for subsequent production database recovery.

Mount the database instance

Before the database is mounted, change the CONTROL_FILES parameter in the backup database instance init.ora to point to the copied control file. For example, set the parameter control_files = +RMFLASH/control_start in the p_run.ora configuration file of the database instance on the backup server. After changing the parameter, mount the database:

    # export ORACLE_SID=oastoltp
    # sqlplus /nolog
    SQL> connect / as sysdba
    SQL> startup mount

The Oracle instance is then started with that particular control file.

Back up the database instance

The incremental backup on the backup host uses the control_backup control file copy, because this copy is SCN consistent with the production database and has not been touched since it was created. A separate copy is needed because once the database is mounted with control_start, the SCN information in that control file changes and it no longer reflects its initial state:

    $ rman TARGET / CATALOG rman/rman@catdb
    RMAN> run {
    allocate channel dev1 type disk;
    allocate channel dev2 type disk;
    backup format '+RMFLASH/ctl%d%s%p%t' controlfilecopy '+RMFLASH/control_backup';
    recover copy of database with tag 'incr_update';
    backup incremental level 1 for recover of copy with tag 'incr_update' database;
    release channel dev1;
    release channel dev2;
    }

It is worthwhile going through the details of the RMAN script for clarification. The script takes an incremental backup at level 1 using the copied backup control file.

The backup incremental level 1 for recover of copy ... database command does not always create an incremental backup. If no level 0 backup is available, the command instead creates an image copy backup of the database with the specified tag. The recover copy of database with tag ... command enables RMAN to apply any available incremental level 1 backup to the set of datafile copies with that tag.

On the first run, the recover command has no effect, because neither a datafile copy nor an incremental level 1 backup exists yet, and the backup command creates the level 0 image copy. On the second run, a datafile copy exists but there is still no incremental level 1 backup to apply to it, so the recover command again has no effect, and the backup command creates the first level 1 incremental backup. On the third and all subsequent runs, both a datafile copy and an incremental level 1 backup exist. The level 1 incremental backup is then applied to the existing datafile copy, bringing the datafile copy up to the checkpoint SCN of that level 1 incremental.

Record the time to perform the RMAN incremental backup task above.

The test steps, starting from BEGIN BACKUP on the production instance through the RMAN incremental backup execution on the backup server, can be repeated as part of a recurring backup offloading process.
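The run-by-run behavior of the two-command script can be traced with a toy state machine. This is a simplified sketch under stated assumptions, not RMAN itself: the dictionary keys, the function name, and the SCN values 100, 200, and 300 are hypothetical, and the model treats a level 1 backup as "consumed" once applied purely to keep the trace short.

```python
def run_script(state, db_scn):
    """Simulate one pass of the two-command script on a toy state dict."""
    actions = []
    # RECOVER COPY OF DATABASE WITH TAG: needs both a datafile copy
    # and a level 1 incremental to apply to it.
    if state["copy_scn"] is not None and state["level1_scn"] is not None:
        state["copy_scn"] = state["level1_scn"]
        state["level1_scn"] = None  # treated as consumed in this toy model
        actions.append("recover: level 1 applied to copy")
    else:
        actions.append("recover: no effect")
    # BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG ... DATABASE:
    # creates the base image copy if none exists, else a level 1 incremental.
    if state["copy_scn"] is None:
        state["copy_scn"] = db_scn
        actions.append("backup: level 0 image copy created")
    else:
        state["level1_scn"] = db_scn
        actions.append("backup: level 1 incremental created")
    return actions


state = {"copy_scn": None, "level1_scn": None}
history = [run_script(state, scn) for scn in (100, 200, 300)]
for run_number, actions in enumerate(history, start=1):
    print(run_number, actions)
```

In this trace the datafile copy always lags the newest level 1 backup by one run, which is why a restore needs both the copy and the latest incremental.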
If BCT is occasionally turned off on the production server, the map is reset, and a subsequent incremental backup may not be able to leverage the full performance advantage of BCT. However, the procedural steps will still function correctly.

Test Phase 3: Validating correct restore on the production host from offloaded incremental backups

This test phase covers the process of restoring and recovering the database on the production server using incremental backups created from the backup node, as well as validating that the correct database content is in fact restored and recovered.

Shut down the backup database instance and unmount the RMFLASH ASM group.

Go back to the production server. Before shutting down the database instance on the production server, remove some data from the table as follows:

    $ sqlplus /nolog
    SQL*Plus: Release 11.1.0.6.0 - Production on Wed Dec 5 13:47:45 2007
    Copyright (c) 1982, 2007, Oracle. All rights reserved.
    SQL> connect system/manager as sysdba
    Connected.
    SQL> select count(*) from oastoltp.cust;
    COUNT(*)
    1500000
    SQL> delete from oastoltp.cust where C_ID < 100;
    49500 rows deleted.
    SQL> select count(*) from oastoltp.cust;
    COUNT(*)
    1450500

Shut down the running database instance.

Perform a direct recovery of the production database, first by restoring with the level 0 full backup. RMAN then applies the incremental backup that was generated on the backup server through the offloading process, working directly through the production instance.

Restore procedure

    $ rman CATALOG rman/rman@catdb TARGET system/manager
    Recovery Manager: Release 11.1.0.6.0 - Production on Wed Dec 5 14:02:42 2007
    Copyright (c) 1982, 2007, Oracle. All rights reserved.
    connected to target database: OASTDB (DBID=4002616050, not open)
    connected to recovery catalog database
    RMAN> resync catalog;
    starting full resync of recovery catalog
    full resync complete
    RMAN> run {
    2> restore database;
    3> recover database;
    4> alter database open;
    5> }

The incremental restore should work against the level 0 backup, just as if the incremental backup had been created by running RMAN directly against the production instance, as in Test Phase 1.

Upon completion of the database restore and recovery process leveraging the level 0 and level 1 backups, key tables were examined to ensure that the content was in fact correctly restored to the database state at the time the level 1 incremental backup was taken. Alternatively, the restore can be verified as follows.

Verify correct restore/recovery

Verify the correctness by counting the total number of records in the oastoltp.cust table again.

    SQL> select count(*) from oastoltp.cust;
    COUNT(*)
    1500000

This confirms successful execution of the database restore and recovery process.

Test Phase 4: Analysis of offloading process effectiveness

Compare the duration of production service impact observed when performing an RMAN backup directly against the production database with the duration observed when only putting the database into hot backup mode, creating a storage replica, and then taking the database back out of hot backup mode, which is the procedure that enables offloading of the RMAN backup action.

Also, compare the relative transactional impact to production by estimating the total number of "transactions that were pre-empted" from production in order to accommodate the operational backup action required by the business.

The testing findings are summarized as follows.
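One way to estimate the "pre-empted" transactions is to sum, minute by minute, the shortfall of the observed TPM against a backup-free baseline. The following is a sketch under stated assumptions, not the calculation used in the testing: the function name and all sample values are hypothetical, chosen only to contrast a long direct backup with a short hot backup window.

```python
def preempted_transactions(baseline_tpm, observed_tpm):
    """Sum the per-minute shortfall of observed TPM against the baseline.
    Minutes where the observed rate exceeds the baseline count as zero."""
    if len(baseline_tpm) != len(observed_tpm):
        raise ValueError("series must cover the same minutes")
    return sum(max(b - o, 0) for b, o in zip(baseline_tpm, observed_tpm))


# Hypothetical 60-minute series.
baseline = [26000] * 60
direct_backup = [26000] * 15 + [20000] * 45             # backup runs to the end
offloaded = [26000] * 15 + [20000] * 3 + [26000] * 42   # short hot backup window

lost_direct = preempted_transactions(baseline, direct_backup)
lost_offloaded = preempted_transactions(baseline, offloaded)
print(lost_direct, lost_offloaded)
```

With these made-up inputs the direct backup pre-empts 270000 transactions and the offloaded procedure 18000; real figures depend entirely on the measured series.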
Testing observations/findings

Test case 1: Only a database OLTP workload running on the production host

The observation graph in Figure 2 shows the performance when only the database workload was running on the production server.

Figure 2. TPM performance graph while only the OLTP workload was running on the production host (y-axis: transactions per minute, 0 to 60000; x-axis: minutes, 0 to 62)

The transactions per minute (TPM) rate ranged between 19850 and 26570. Fewer transactions occurred during the initial 10 minutes because the database needed some time to load all drivers and start up the measurement interval. A closer look shows that the TPM rate then increased and its variation narrowed, to between 22174 and 26570.

Test case 2: RMAN incremental backup using BCT running in parallel with a database workload on the production host

The observation graph in Figure 3 shows the performance impact on the transaction execution rate when the RMAN backup operation is running in parallel with the database workload on the production host.

Figure 3. TPM performance graph while the backup and OLTP workload run in parallel (y-axis: transactions per minute, 0 to 60000; x-axis: minutes, 0 to 62)

The collected test performance data indicated that, with the RMAN backup started and running in parallel, the ongoing OLTP test workload took a 23 percent throughput hit. Also, the RMAN task itself took longer to complete, and was still executing when the OLTP test workload was shut down after running for about an hour.
Test case 3: Offload data for incremental backup using the bitmap change tracking file

The observation graph in Figure 4 shows little performance impact on the production host during the hot backup period.

Figure 4. TPM performance graph when offloading data with hot backup (y-axis: transactions per minute, 0 to 60000; x-axis: minutes, 0 to 62; hot backup period marked)

In Figure 4, the OLTP workload took a performance hit for about 3 minutes, during the time the database was put into hot backup mode as part of the offloading process. Once the hot backup state was exited and all the offloading procedural steps were completed on the production server, the production service level returned to normal (comparable to what was reported in Figure 2).

Combined graph

Figure 5. Performance impact difference (series: OLTP-TPM, RMAN-OLTP-TPM, BCT-TPM; y-axis: transactions per minute, 0 to 60000; x-axis: minutes, 0 to 62)

The graph in Figure 5 supports the recommendation to offload the actual RMAN backup task by leveraging storage-based replication techniques. When the actual RMAN task was run in the midst of active production work (illustrated by the yellow graph line), the OLTP workload throughput rate dropped. At the same time, the RMAN incremental backup task itself was prolonged, extending beyond the time when the OLTP test workload was terminated.

When Oracle hot backup and storage-based point-in-time replication were used instead, the foreground user transaction processing was only momentarily impacted.
The performance impact was limited to the time that the database had to be in the hot backup state while all the needed procedural steps were conducted. The actual storage-based point-in-time replication took, for all practical purposes, zero time within the database hot backup window. Once the database exited the hot backup state, user processing returned to its normal performance level. The offloaded RMAN backup task can be scheduled with more latitude, in a convenient time window.

Because the offloaded RMAN backup was run against a database state that was no longer subject to a high rate of transactional changes, the actual time taken to run the RMAN backup task was also shortened on the backup server. This was because the RMAN task was no longer contending with other activities against the database content being backed up.

Verifying that BCT driven incremental backup is offloaded

The obvious reason to leverage BCT is to reduce the amount of work, and therefore the time required, to perform an incremental backup. Without the BCT maps, RMAN performs an incremental backup by scanning through all the database files involved, looking for data pages that have been changed since the last successfully completed backup. For large database files, this can be a very time-consuming process.

Oracle maintains information within the BCT map, tying the map to a particular checkpoint number. As RMAN reads the tracked database blocks, if for whatever reason a database page is inconsistent with the checkpoint established according to the map, RMAN will forgo the use of the map and revert to scanning the database blocks to ensure that the backup is created correctly.
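The difference between a map-driven and a scan-driven incremental backup can be illustrated with a toy model. This sketch is an assumption-laden simplification and not Oracle's implementation: it tracks changes per block rather than in whatever granularity the real map uses, and the function name, block count, and SCN values are hypothetical.

```python
def incremental_backup(block_scns, last_backup_scn, change_map=None):
    """Return (blocks_read, blocks_backed_up) for a toy datafile.

    block_scns: SCN of the last change for each block (index = block number).
    change_map: set of block numbers flagged as changed, or None to model
                the fallback where every block must be scanned."""
    if change_map is None:
        candidates = range(len(block_scns))
    else:
        candidates = sorted(change_map)
    blocks_read = 0
    backed_up = []
    for block_no in candidates:
        blocks_read += 1
        if block_scns[block_no] > last_backup_scn:
            backed_up.append(block_no)
    return blocks_read, backed_up


# Hypothetical 1000-block file; three blocks changed after SCN 200.
scns = [100] * 1000
for changed in (5, 42, 900):
    scns[changed] = 250

full_scan = incremental_backup(scns, 200)
with_map = incremental_backup(scns, 200, change_map={5, 42, 900})
print(full_scan[0], with_map[0])
```

Both calls back up the same three blocks; only the number of blocks read differs, which is the effect the BLOCKS_READ and DATAFILE_BLOCKS comparison is meant to expose.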
As part of the offload testing, we specifically verified that the BCT map was correctly used to drive the offloaded incremental backup efficiently, without requiring full scans of the database files.

Verification procedure

Before taking an incremental backup on the backup host, execute the following SQL query:

    SQL> select checkpoint_time, checkpoint_change#, blocks_read, datafile_blocks, used_change_tracking, file#
         from v$backup_datafile order by file# asc;

    CHECKPOIN CHECKPOINT_CHANGE# BLOCKS_READ DATAFILE_BLOCKS USE FILE#
    --------- ------------------ ----------- --------------- --- -----
    4-Dec-07              781054         690             690 NO      0

The USE column (USED_CHANGE_TRACKING) shows NO, which means the datafiles have not yet been incrementally backed up using the BCT map.

After taking the incremental backup, execute the same SQL query to determine how many datafiles were backed up using the BCT map:

    CHECKPOIN CHECKPOINT_CHANGE# BLOCKS_READ DATAFILE_BLOCKS USE FILE#
    --------- ------------------ ----------- --------------- --- -----
    4-Dec-07              781054         690             690 NO      0
    4-Dec-07            10008076         690             690 NO      0
    4-Dec-07             9314394         471          204800 YES     1
    4-Dec-07             9314394        3995           38400 YES     2
    4-Dec-07             9314394           1          102400 YES     3
    4-Dec-07             9314394          75            3328 YES     4
    4-Dec-07             9314394           1            3584 YES     5
    4-Dec-07             9314394        1051            6144 YES     6
    4-Dec-07             9314394       13367           16128 YES     7
    4-Dec-07             9314394       12175           22528 YES     8
    4-Dec-07             9314394           1           39808 YES     9
    4-Dec-07             9314394       41967          115328 YES    10
    4-Dec-07             9314394       50031          128256 YES    11
    4-Dec-07             9314394         287            3584 YES    12
    4-Dec-07             9314394           1            3328 YES    13
    4-Dec-07             9314394           1            3328 YES    14
    4-Dec-07             9314394           1           15360 YES    15
    4-Dec-07             9314394       14899           27136 YES    16
    4-Dec-07             9314394           1           30208 YES    17
    4-Dec-07             9314394       12819           41728 YES    18
    4-Dec-07             9314394           1            3328 YES    19
    4-Dec-07             9314394       28675          115456 YES    20
    4-Dec-07             9314394      392832          392832 YES    21
    4-Dec-07             9314394       42047          113408 YES    22
    4-Dec-07             9314394       42151          112768 YES    23
    4-Dec-07             9314394       40467          114048 YES    24
    4-Dec-07             9314394       51323          126976 YES    25
    4-Dec-07             9314394       40907          114688 YES    26
    4-Dec-07             9314394       52043          126336 YES    27
    4-Dec-07             9314394       51343          127616 YES    28
    4-Dec-07             9314394       28679          113536 YES    29
    4-Dec-07             9314394       32247          114176 YES    30
    4-Dec-07             9314394       31739          114816 YES    31
    4-Dec-07             9314394        3115           25600 YES    32

Thirty-four rows were selected.

The V$BACKUP_DATAFILE view reflects the different database files involved in different past backup actions. The USE column indicates whether the backup for that particular file was in fact done leveraging the BCT tracking information.

With BCT enabled and correctly leveraged by RMAN, the offloaded RMAN incremental backup took about 1.5 minutes in our test. After disabling the BCT map with the following:

    SQL> ALTER DATABASE DISABLE BLOCK CHANGE TRACKING;

and re-executing the same incremental backup, the operation took close to 5 minutes to complete.

Host-level OS I/O monitoring also confirmed that significantly more database file pages were read when the incremental backup was performed without using the BCT map. For our testing, the BCT-enabled incremental backup took about 30 percent of the time (1.5 minutes versus close to 5 minutes) that was needed when RMAN had to scan through all the files to ascertain the pages changed since the last backup.

Conclusion

Our testing and observations confirmed that by properly combining Oracle backup technologies and tools with the underlying EMC CLARiiON storage replication capabilities to offload the actual backup task, the Oracle database backup process can be made significantly more effective:
• The impact on ongoing database service of performing the backup using RMAN is minimized.
• The time to complete the RMAN backup is shorter and more predictable (running the RMAN backup in the midst of heavy foreground production activity imposes more work, and more time variability, on the RMAN task).
• The more efficient incremental backup capability enabled by the BCT feature in Oracle Database 11g is not affected by leveraging storage replication techniques to enable the RMAN backup task to be offloaded.