Virtualization Destructive Test Plan
For Vendor Compatibility Testing with Oracle Database

Document Version: 1.2

  Version  Date      Change
  V1.0     10/18/13  Initial version
  V1.1     11/08/13  Added stress test details.
  V1.2     02/02/14  Revised for 12cR1 (12.1.0.1)

Requestor:
Certification ID:
Vendor Compatibility Type: Oracle 11g Release 2, Oracle 12c Release 1
Vendor Technology Stack:
  Platform(s): Oracle VM Server Version 6 Update 2
  Virtualization Technology: Live Migration, OVM 6.2

Table of Contents

1. Background
   Objective
   Scope
   Stakeholders
   Vendor Technical Skills
   Tasks and Schedule
2. Test Environment Specifications
   Before You Begin
   Workload Driver
   Hardware and System Components
   Host and Storage Topologies
   Software Reference Configuration
3. Test Evaluation Criteria
   Preconditions
   PASS/FAIL Criteria
   Test Results Collection Process
   Defect Tracking and Result Logging
4. Pre-test check list
5. Virtualization Compatibility Software level Test Details
   Virtualization - Software level Test Categories
   List of Vendor-Covered Internal Software level Tests
   Oracle Virtualization - Software level Tests
Appendix A: Sample Collection Logs
Appendix B: Repeat tests
Appendix C: Checking for physical and logical corruption
Appendix D: Inject SQL rows
Appendix E: Session failure
Appendix F: Table and datafile deletion

Virtualization Destructive Test Plan For Vendor Compatibility With Oracle Database
ORACLE and {Vendor} CONFIDENTIAL

1. Background

Objective

This Virtualization Destructive Test Plan (VDTP) defines a set of destructive test scenarios as defined by Oracle Server Technologies and supplemented by the technology vendor. The main objective is to certify the compatibility of a vendor-supplied virtualization server stack with the Oracle Database, so that customers and system integrators can be confident in deploying both vendor and Oracle technologies.

OCE has traditionally maintained a consistent set of automated, regression-style tests. When these tests PASS on a vendor's server technology stack, Oracle will certify or validate the virtualization compatibility.

Scope

Virtualization Compatibility: The virtualization destructive test scenarios (under high system load) consist of a number of software and, if applicable, hardware failures against the Oracle database that Oracle and/or the vendor is to detect and recover from.

The following CMS IDs are covered within this document.
  E.g. - OCE Support ID : OEL 6.2 with Oracle Single Instance v12.1.0.1

Stakeholders

  Name:          Put your name and phone number/email here.
  Organization:  (company/team name)
  Role:          (Approver, Owner, Reviewer)

  Name:          Vendor email address
  Organization:  Vendor
  Role:          e.g. Contractor

  Name:          CertSupp_ww@oracle.com
  Organization:  OCE Development & Support
  Role:          Reviewer/Approver

Vendor Technical Skills

- Oracle Database administration
- General virtualization system and network administration
- System testing / Quality assurance

Tasks and Schedule

  Seq  Task Name                                             Completion Date
  0    Virtualization Destructive Test Plan (VDTP)           (Kick-off meeting: {Date})
       reviewed and customized
  1    Virtualization hardware ready                         MM/DD/YYYY
  2    Vendor virtualization software installed              MM/DD/YYYY
  3    Oracle software installed                             MM/DD/YYYY
  4    Basic infrastructure validation tests completed       MM/DD/YYYY
  5    Database workload schema and software ready           MM/DD/YYYY
  6    Complete test coverage                                MM/DD/YYYY
  7    Final deliverables:                                   MM/DD/YYYY
       - Document Destructive Test Results
       - Publish/update Best Practices sheet (on Oracle
         Metalink and/or vendor site), based on results
         of destructive tests

2. Test Environment Specifications

Before You Begin

- Review all the test cases. If particular tests cannot be accomplished, discuss them before starting. Pre-approvals can be obtained to waive certain conditions.
- Acquire the necessary tools to stress your CPU and IO loads.
- The tests require a system statistics collection utility (such as sar, vmstat, iostat, top) running in the background while tests are executed.
- Review the log collection requirements to ensure that there are sufficient scripts to collect the results.
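The background statistics collection described above could be scripted roughly as follows. This is a minimal sketch, not part of the Oracle-supplied tooling: the function names, the log file naming scheme, and the PID-file layout are all assumptions made for illustration. It only starts vmstat and iostat if they are installed on the host, and uses a 30-second default interval to match the sampling interval requested in the PASS/FAIL criteria.

```shell
#!/bin/bash
# Sketch only: start and stop background samplers for one test run.
# Function names, file names and the PID-file layout are illustrative assumptions.

# Build the log file name for a given tool and test code.
stats_log_name() {
    local tool="$1" test_code="$2"
    printf '%s_%s.log\n' "$test_code" "$tool"
}

# Start vmstat and iostat (when installed) in the background.
# Arguments: <test code> [interval in seconds, default 30] [output dir, default .]
start_collectors() {
    local test_code="$1" interval="${2:-30}" out_dir="${3:-.}"
    local tool
    mkdir -p "$out_dir" || return 1
    for tool in vmstat iostat; do
        if command -v "$tool" >/dev/null 2>&1; then
            "$tool" "$interval" > "$out_dir/$(stats_log_name "$tool" "$test_code")" 2>&1 &
            # Remember the sampler PID so it can be stopped after the test.
            echo "$!" >> "$out_dir/${test_code}_collectors.pid"
        fi
    done
}

# Stop every sampler recorded by start_collectors for this test code.
stop_collectors() {
    local test_code="$1" out_dir="${2:-.}"
    local pid_file="$out_dir/${test_code}_collectors.pid"
    local pid
    [ -f "$pid_file" ] || return 0
    while read -r pid; do
        kill "$pid" 2>/dev/null || true
    done < "$pid_file"
    rm -f "$pid_file"
}
```

A typical use would be to call `start_collectors SW-VDST-1 30 /tmp/vdtp_stats` just before starting the workload and `stop_collectors SW-VDST-1 /tmp/vdtp_stats` once the test ends; the directory shown is a placeholder.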
Workload Driver

For the purpose of covering the virtual destructive tests, Oracle requires the use of OAST. The following table lists all concurrent workloads available to vendors, to be executed against all RAC nodes at the start of every hardware fault injection test:

  #  Workload             Workload Description / Behavior
  1  OAST                 OLTP TPC-C workload. Available from OTN website via
                          CertSupp_ww@oracle.com.
  2  CPU and memory hog   This is the external/non-database workload and should
                          not be used alone. It should only be used as a
                          supplement if the database workload cannot achieve
                          the desired constraints.

Hardware and System Components

Oracle and the vendor will work together to define the minimum hardware specifications required to validate the virtual destructive test results. The following table can be used to customize the system requirements:

  Component: Host
    Description:    Minimum of 2 CPUs
    Specifications: 8 CPU entitlement, what are the virtual/logical CPUs?

  Component: Memory and Swap
    Description:    Minimum of 4 GB RAM, plus 4 GB memory swap.
    Specifications: 32 GB memory, swap? Detail the Guest Memory size.

  Component: Storage hardware and topology
    Description:    Storage disk pool/enclosure layout and volume management
                    software -- vendor's choice.
    Specifications: SAN / NAS Technology?

Host and Storage Topologies

This section should be used to graphically lay out the virtualization topologies to be tested.

NOTE: Outline the Test Environment layout.

Software Reference Configuration

Oracle and the vendor jointly define the minimum software versions and configuration schemes required for the RAC hosts. The following table can be used to customize the software requirements, and is based on the Oracle 11g RAC HA (high availability) Cluster Reference Configuration.
  Software Component: Operating system
    Description:    Indicate 32 or 64 bit, platform, version number, update number
    Specifications: Oracle Enterprise Linux 6.2, OVM 3.0

  Software Component: Oracle Database
    Description:    Indicate the following configuration choices:
                    - RDBMS Homes and trace files (on local file systems, shared
                      NAS mount points or CFS directories). Indicate whether
                      these file systems are on local disk, NAS, or SAN.
                    - Database, control and log files (on ASM and raw disks,
                      dedicated volume groups with striping and mirroring
                      provided externally, NAS mount points or CFS directories)
                    - FAST_START_MTTR_TARGET=60 to expedite crash recovery
    Specifications: RDBMS 12.1.0.1
                    FAST_START_MTTR_TARGET is left at default setting, per
                    prior agreement.

  Software Component: Software patches
    Description:    List any mandatory patch numbers (operating system, Oracle, vendor)
    Specifications: Oracle RDBMS 12.1.0.1

3. Test Evaluation Criteria

Preconditions

1. All hardware and host-to-storage interconnects (interface cards, cables, switches) are factory tested and provisioned for shared cluster deployments.
2. All operating system components are installed with the most recent pre-release or production releases, kernel packages, patch levels, etc.
3. Software installation should succeed 100% and adhere to standard methods used by joint customers, i.e. no incomplete or undocumented steps will be allowed. The vendor should work closely with Oracle to ensure all install-related problems are thoroughly rectified.
4. No special Oracle or vendor software configuration settings will be permitted to bypass behavioral problems. The only exceptions shall be those where such configuration changes improve overall system stability and/or high availability. The Software Reference Configuration section should reflect such changes and the resulting benefits.

*************** PRE TEST REQUIREMENT *******************
Create the DB and instance resources.
Oracle's recommendation is to use Oracle Universal Installer (OUI), the Database Creation Assistant (DBCA), and Grid Control to create the database and database resources.

PASS/FAIL Criteria

The following is NOT an exhaustive list of test PASS/FAIL criteria, but provides an idea of what Oracle usually looks for (e.g. in system trace files or descriptions of test outcomes):

1. Tests are marked FAIL when any of the following Oracle or vendor product functionality losses or unexpected outcomes is recorded after any hardware fault injection:

   General:
   - Internal Oracle errors reported by Oracle RDBMS
   - Process hangs or memory leaks
   - OS kernel crashes
   - Entire host deaths

   Storage:
   - File (logical or physical) corruptions
   - Host I/O hangs

2. Tests are marked PASS when none of the outcomes above is observed, or when any test outcome references high-priority (e.g. priority 2 or higher) bugs logged against Oracle or vendor software, and such bugs are fixed and subsequently verified.

3. Run a tool such as sar or vmstat (which collect, report, or save system activity information), top, or an equivalent in the background during the destructive test to show that CPU load is >= 90%. The sar/top output is collected at 30-second intervals. If programs are needed for statistics collection, please contact OCE Support.

Test Results Collection Process

This section documents the REQUIRED method to collect test results for subsequent certification audits and to verify test executions. When the log collection process and log structure are followed, most of the tests can be verified more efficiently (analyzed by scripts). YOU WILL GET YOUR RESULTS FASTER.

  #  Description
  1  Clean up the following files:
     - RDBMS background, core and user dump directories (in sqlplus, run
       SQL> show parameters dump_dest
       to see the directory destinations)
     - <DBHome>/log/diag/rdbms/<hostname>/*/*
     - /var/log/messages (or similar files)
  2  Run each destructive test, taking note of the test start time, test
     stop time and fault injection time.
  3  At the end of the test run, please tar up and compress the following
     traces as <VendorName>_<TestCode>.tar.gz
     (e.g. OracleCorp_VDTP-STOR-01.tar.gz):
     - System message files (e.g. /var/log/messages on Linux)
     - All files under the RDBMS background_dump_dest, core_dump_dest and
       user_dump_dest directories. Please see Appendix A for the log files
       layout.
  4  [If running into Oracle-related problems] Run the diagcollection.pl
     script (as root):
       # cd /tmp
       # script <VendorName>_<TestCode>_diagcollection.out
       # echo "Ensure that ORACLE_HOME and ORACLE_BASE"
       # echo "are set to the right locations!"
       # $ORACLE_HOME/bin/diagcollection.pl --collect --all
       # exit
  5  Tar up <VendorName>_<TestCode>.tar.gz and ftp it to the following
     location (as anonymous user):
       ftp://sftp.oracle.com/support/incoming/<YOUR_CERTID>/

Defect Tracking and Result Logging

Oracle product defects may be documented and tracked through Support TARs, or directly with Oracle Server Technologies. In either case, the defect becomes an entry in the Oracle Bug Database. Test traces and reproducible evidence will be uploaded to a corporate repository such as bugftp.

{Vendor to provide product defect tracking and reporting tool to be used.}

4. Pre-test check list

[Pre-0] Check the test spec version and obtain permission to waive the N/A tests in advance.
  Steps:
  - Send email to CertSupp_ww@oracle.com to ask for the latest version of the test specification.
  - Briefly state the purpose of certification.
  - List the test cases that are not applicable to your certification.
  Notes:
  - Purpose is to certify that the db can be reliably deployed in the virtualization environment.

[Pre-1] Processes are running in real time and should be memory resident.
  Steps:
  - Use 'ps -eacf | egrep '…..'' to make sure the desired processes are running in RT mode.
  - Use a system dependent command to check if processes are locked in memory.
  Notes:
  - Attach your output.
  - Explain your system dependent command.
  - This check may not apply if the RDBMS is running with default scheduling and priority.

[Pre-2] Check the active software version and others.
  Steps:
  - sqlplus / as sysdba and check the DB version, e.g. 11.2.0.4, 12.1.0.1
  Notes:
  - Attach your output.
  - Make sure the active version is the one you tested.

5. Virtualization Compatibility Software level Test Details

This section provides a starting template for the definition of all software level Virtualization destructive test scenarios. To be considered for validation of the vendor's virtual technology with Oracle Database, the vendor team must work jointly with the Oracle OCE Team to enhance and customize the test cases defined herein. New test cases may be driven by rationales such as:

- To augment test coverage of specific Virtualization or critical areas
- To address known virtualization-related issues in the field

The vendor test team must fill out the shaded columns for Oracle evaluation, i.e. Actual Test Outcome and PASS/FAIL checklist. Please provide as much documentation as needed, e.g. test exceptions, best practices, product defect numbers, patches or workarounds applied.

Virtualization - Software level Test Categories

Here are the proposed virtualization software level test categories defined for vendor validation with Oracle database:

[A] Vendor covered internal Software level Tests
[B] Oracle Virtualization - Software level Tests

List of Vendor-Covered Internal Software level Tests

- E.g. stress tests with various workloads including database (Oracle, DB2), web services (websphere), file services (NFS, ftp)
- E.g. administrative tests including Tivoli, DLGR (prior to and after the virtualization technology being tested, such as Live Migration)
- Functional tests including regressions of Oracle VM, VIOS, and Hypervisor components, memory update stress tests.

Oracle Virtualization - Software level Tests

[SW-VDST-1] Run "IO and CPU ERP" workload for 24 hours continuously with DEDICATED LGR CPU.

  Preconditions:
  - Run this test with parameters:
      db_block_checking=full
      db_block_checksum=full
  - Virtualization technology, looping for duration of workload.
  - Enterprise Manager running prior to and during workload.

  Steps:
  1. Run OAST workload for 24 hours.
  2. Run artificial memory hog, non-database workload to supplement the database workload.

  Expected outcome:
  The LGR and Database should stay up and running. Verify:
  1. No database corruption occurred. See Appendix C.
  2. No host reboot
  3. No workload hung
  4. No ORA-600 or data corruption
  5. No ORA-3113 or instance death.
  6. Attach the db logs. See Appendix A.

  OUTCOME:
  - Run OAST in a dedicated processor LGR with a minimum of 2 physical CPU entitlement and 4 logical CPUs.
  - To include back-to-back active virtualization technology (e.g. Live Migration), and Enterprise Manager running.
  - Memory+CPU hog used to keep the LGR continuously paging and CPU consumption high.
  - After test completion, test the database for any physical or logical corruption.

[SW-VDST-2] Run "IO and CPU ERP" workload for 24 hours continuously with SHARED (virtual) LGR CPU.

  Preconditions:
  - Run this test with parameters:
      db_block_checking=full
      db_block_checksum=full
  - Virtualization technology, looping for duration of workload.
  - Enterprise Manager running prior to and during workload.

  Steps:
  1. Run OAST workload for 24 hours.
  2. Run artificial memory hog, non-database workload to supplement the database workload.

  Expected outcome:
  The LGR and Database should stay up and running. Verify:
  1. No database corruption occurred. See Appendix C.
  2. No host reboot
  3. No workload hung
  4. No ORA-600 or data corruption
  5. No ORA-3113 or instance death.
  6. Attach the db logs. See Appendix A.

  OUTCOME:
  - Can be same remarks as SW-VDST-1.
  - Run OAST in a shared processor LGR with a minimum of 2 physical CPU entitlement, 4 logical CPUs and 4 virtual CPUs.

[SW-VDST-3] Instance Failure.

  Preconditions:
  - Run this test with parameters:
      db_block_checking=full
      db_block_checksum=full
  - Virtualization technology, using synchronization hooks rather than asynchronous back-to-back Virtualization technology loops, to ensure the recovery process overlaps with a suspend/resume.
  - Enterprise Manager running prior to and during workload.

  Steps:
  1. Run OAST workload
  2. Run artificial memory hog, non-database workload to supplement the database workload
  3. Let the workloads run stable for 20 minutes
  4. Inject SQL rows. See Appendix D.
  5. Find pid of PMON process for this instance; 'kill -9 <pid of PMON process>'.
  6. Startup database.

  Expected outcome:
  Verify automatic database recovery at instance startup.
  1. No database corruption occurred. See Appendix C.
  2. Attach the db logs. See Appendix A.

  OUTCOME:
  - Kill PMON to cause the database to fail.
  - Examine logs to verify that a Virtualization technology occurred during database recovery.
  - Upon completion of recovery, the database should be verified to not contain the 1 million rows of uncommitted data. The database should also be verified to have no corruption.

[SW-VDST-4] Instance Failure.

  Preconditions:
  - As above for [SW-VDST-3], with 'Shutdown Abort' as opposed to kill -9.

  Expected outcome:
  Verify automatic database recovery at instance startup.
  1. No database corruption occurred. See Appendix C.
  2. Attach the db logs. See Appendix A.

  OUTCOME:
  - Could be the same remarks as SW-VDST-3.
  - Shutdown abort should be used in place of killing PMON.
  - The database should be restarted, in sync with a Virtualization technology, to perform automatic recovery.

[SW-VDST-5] Session Failure.

  Preconditions:
  - Run this test with parameters:
      db_block_checking=full
      db_block_checksum=full
  - Virtualization technology, using synchronization hooks rather than asynchronous back-to-back Virtualization technology loops, to ensure the recovery process overlaps with a suspend/resume.
  - Enterprise Manager running prior to and during workload.

  Steps:
  1. Run OAST workload
  2. Run artificial memory hog, non-database workload to supplement the database workload.
  3. Let the workloads run stable for 20 minutes
  4. Inject SQL rows with multiple connections. See Appendix E.
  5. Find pid of session processes; 'kill -9 <pid for session process>'.

  Expected outcome:
  Verify automatic SMON transaction rollback.
  1. Query v$session to see that the killed sessions are removed.
  2. No database corruption occurred. See Appendix C.
  3. Attach the db logs. See Appendix A.

  OUTCOME:
  - Killing of the connections should be synchronized with a Virtualization technology.
  - After all the row-injecting connections were killed, the sessions and locks should be verified to be cleaned up automatically.
  - Rows in the table should be counted to ensure each connection inserted a multiple of 100 rows.

[SW-VDST-6] Data loss.

  Preconditions:
  - Run this test with parameters:
      db_block_checking=full
      db_block_checksum=full
  - Virtualization technology, using synchronization hooks rather than asynchronous back-to-back Virtualization technology loops, to ensure the recovery process overlaps with a suspend/resume.
  - Enterprise Manager running prior to and during workload.

  Steps:
  1. Using the existing OAST database from the previous test, drop the recently created table. See Appendix F.

  Expected outcome:
  Use database flashback recovery to restore the dropped table.
  1. No database corruption occurred. See Appendix C.
  2. Attach the db logs. See Appendix A.

  OUTCOME:
  - In Appendix F, "flashback table to before drop" can be changed to "flashback table to scn".
  - Insert 333333 rows with id=1. Commit and record the current SCN. Insert 444444 rows with id=2. Delete all rows with id=1. Commit, and then flashback table to the previously recorded SCN.
  - The flashback operation was synchronized to overlap with a virtualization technology.
  - After the flashback, the table should be checked to contain only the 333333 rows with id=1.

[SW-VDST-7] Datafile loss.

  Preconditions:
  - Run this test with parameters:
      db_block_checking=full
      db_block_checksum=full
  - Virtualization technology, using synchronization hooks rather than asynchronous back-to-back Virtualization technology loops, to ensure the recovery process overlaps with a suspend/resume.
  - Enterprise Manager running prior to and during workload.

  Steps:
  1. Using the existing OAST database from the previous test, create a cold backup of the datafile holding the recently created table of inserted rows.
  2. Physically remove the datafile holding the table. See Appendix F.

  Expected outcome:
  Recover the lost datafile and verify:
  1. No database corruption occurred. See Appendix C.
  2. Attach the db logs. See Appendix A.

  OUTCOME:
  - Shut down the database, back up a datafile, start up, create a table in the tablespace that the datafile is a part of, and insert 1000000 rows into the table. Shut down the database, remove the datafile and put the backup file in place, then start the database to do media recovery.
  - The media recovery should be synchronized to overlap with a Virtualization technology.
  - After recovery, the table should be verified to contain the 1000000 rows.
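Several of the expected outcomes above require that no ORA-600 or ORA-3113 errors appear in the database logs. A quick first-pass scan of a collected alert or trace file could look like the sketch below. The function name and the PASS/FAIL message wording are assumptions made for this example, and the scan is not a substitute for Oracle's own review of the attached logs. Oracle writes these error codes zero-padded (ORA-00600, ORA-03113), so the pattern accepts both the padded and unpadded forms.

```shell
#!/bin/bash
# Sketch only: flag ORA-600 / ORA-3113 lines in one log file.
# The function name and message format are illustrative assumptions.

check_alert_log() {
    local log_file="$1" hits
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 2
    fi
    # Match ORA-600/ORA-3113 with optional zero padding, and make sure the
    # number is not just a prefix of a longer error code.
    hits=$(grep -E -c 'ORA-0*(600|3113)([^0-9]|$)' "$log_file" || true)
    if [ "$hits" -gt 0 ]; then
        echo "FAIL: $hits fatal error line(s) in $log_file"
        return 1
    fi
    echo "PASS: no ORA-600/ORA-3113 in $log_file"
}
```

The function returns 0 when the file is clean, 1 when a matching line is found, and 2 when the file cannot be read, so it can be called in a loop over every file in the dump directories before the archive is built.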
Appendix A: Sample Collection Logs

The various trace data to be collected for all timing and test outcome evaluation can be found in the following directory locations:

  Trace File: Kernel syslog file
    11gR1 Trace Location: Vendor-specific, e.g. /var/log/syslog/*
    Description:          OS events, including node reboot times

  Trace File: RDBMS logs
    11gR1 Trace Location: Depends on your setting; run "show parameters dump_dest"
                          at the sqlplus prompt
    Description:          RDBMS trace files, bdump/ cdump/ & udump/

Appendix B: Repeat tests

Please ensure the test is repeated many times. Because of the timing relationship with the suspensions (e.g. Live Migration Virtualization technology), one cannot guarantee that the kills occur at the right time to reveal problems.

Appendix C: Checking for physical and logical corruption

Checking for physical and logical corruption in the datafiles requires the use of RMAN (BACKUP VALIDATE CHECK LOGICAL DATABASE) and the Database Verify Utility (dbv).

To check for Oracle block corruptions with the Database Verify utility (dbv), you can use a quick script similar to:

  #!/bin/bash
  # Run dbv against every .dbf datafile in a directory.
  # Usage: ./dbv.sh <block size> <datafile directory>
  BLOCKSIZE=$1
  DATADIR=$2
  cd "$DATADIR" || exit 1
  for FILE in *.dbf
  do
      dbv file="$FILE" blocksize="$BLOCKSIZE"
  done

Which can be invoked as:

  ./dbv.sh 8192 $ORACLE_HOME/oradata/$ORACLE_SID >> dbv.log 2>&1

To check the database for block and logical corruptions with RMAN, use BACKUP VALIDATE CHECK LOGICAL DATABASE. Use 4 channels to speed up the process with the following command file:

  run {
  allocate channel c1 device type disk ;
  allocate channel c2 device type disk ;
  allocate channel c3 device type disk ;
  allocate channel c4 device type disk ;
  backup validate check logical database;
  }

Invoke the RMAN command file and have the rman output go to a logfile, as:

  rman target / cmdfile rman_validate.cmd log rman_validate.log 2>&1 &

As RMAN proceeds, check for any corruptions via:

  select count(*) from v$database_block_corruption;

Appendix D: Inject SQL rows

Synchronize Instance and Session failures with the Virtualization technology being tested, e.g. Live Migration, using the "hook" mechanism, if possible, prior to the LGR suspend/quiesce phase.

For Instance failure, insert 1 million rows uncommitted, then, when informed by the hook, kill the instance.

The scenario should look similar to the following:

1. Prep time (set to 13 seconds) prior to the LGR suspending
2. LGR notification via a hook to note that the Virtualization technology is ready
3. Inject SQL rows uncommitted.
4. Crash DB, via pmon kill
5. Initiate the Virtualization technology once trigger is confirmed
6. Restart DB and verify automatic recovery

With the above method we can at least guarantee that the Virtualization technology, e.g. Live Migration, is in progress when invoking each of the tests.

Appendix E: Session failure

Synchronize Session failures with a Virtualization technology using the "hook" mechanism, if possible, prior to the LGR suspend/quiesce phase.

  # Enable multiple database writer processes
  db_writer_processes=10

Create multiple worker sessions, similar to the following:

  Every worker loops forever doing
      insert 20 records
      sleep 1 second
      commit once for each 100 records
  Start 1000 workers at 0.1 second intervals
  At start of preptime, start killing the workers 1 by 1 with no wait time.

Preptime should be set in a way that the guest suspend/quiesce phase happens when roughly 500 workers have been killed.

Loop until all entries in v$session and v$lock owned by the workers are cleaned up.

Count the number of records inserted by each worker to confirm that each worker inserted a multiple of 100 records and that the sequence numbers start from 1 with no skips.

Appendix F: Table and datafile deletion

Synchronize table and datafile deletions with a Virtualization technology using the "hook" mechanism prior to the LGR suspend phase.

Using the existing OAST database from the previous test, drop the newly created table. Note that additional data may need to be added to the existing table if recovery is too brief to overlap with a suspend phase. Use database flashback recovery to restore the dropped table.

In the case of datafile recovery, recover the datafile prior to the LGR suspend. Note that additional data may need to be added to the existing datafile if recovery is too brief to overlap with a suspend phase.
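The final check in Appendix E (each worker's row count is a multiple of 100, and its sequence numbers start from 1 with no skips) can be automated once the per-worker totals have been spooled to a text file. The sketch below assumes a hypothetical input format that is not defined by this plan: one line per worker with four whitespace-separated columns (worker id, row count, minimum sequence number, maximum sequence number), with headings switched off. It also assumes sequence numbers are unique within a worker, so that min = 1 and max = count together imply no gaps.

```shell
#!/bin/bash
# Sketch only: validate per-worker insert totals from a spooled query result.
# Assumed input columns (hypothetical): worker_id row_count min_seq max_seq

verify_worker_counts() {
    awk '
        NF < 4 { next }    # skip blank or incomplete lines
        {
            id = $1; cnt = $2; lo = $3; hi = $4
            # A worker passes only if it committed whole batches of 100
            # and its sequence runs from 1 to row_count.
            if (cnt % 100 != 0 || lo != 1 || hi != cnt) {
                printf "worker %s: count=%s min=%s max=%s\n", id, cnt, lo, hi
                bad++
            }
        }
        END { exit (bad > 0) }
    ' "$1"
}
```

The function prints one line per offending worker and returns non-zero if any worker fails the check, so `verify_worker_counts worker_totals.txt && echo PASS` (file name is a placeholder) gives a simple gate for SW-VDST-5.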