White Paper: DB2 to Oracle Migration
MacDB2O 2006: Version – February 1, 2005

1. Overview

This document reviews Macrosoft’s process methodology for migrating IBM mainframe DB2 databases to Oracle. Much of our work in this area has been done in conjunction with projects that migrate mainframe applications to server environments (Unix, Linux, Windows). We have our own migration tools (MMK – Mainframe Migration Toolkit), as well as expertise in using industry-standard tools such as Oracle Migration Workbench, Micro Focus Revolve, etc. We are development partners of IBM and Oracle, and are also partners of Micro Focus, the Migration Transformation Consortium (MTC) for legacy migrations, and the Mainframe Migration Alliance (MMA).

The methodology we use requires minimal interaction with the client’s mainframe production environment, thereby reducing costs and minimizing interruptions of the production system. This is achieved through a phased approach in which most of the analysis, tool creation, mock conversion, and testing work is done on a PC (back office). Our global delivery model (offshore, near-shore, and onsite) further reduces costs for the client.

2. Process Phases

The migration process generally includes the following steps:

1. Creation of overall Project Plan
2. Creation of overall Migration Plan
3. Database Schema Migration
4. Data Migration

These are explained below.

2.1 Creation of Overall Project Plan

This involves the following activities:

- Determine the migration scenario
- Identify all migration tasks
- Develop infrastructure sustaining processes
- Determine the training requirements
- Determine the resources that will be required (hardware, software, and people)
- Determine QA and support processes and teams
- Identify customer responsibilities
- Determine the MVS change management process
- Determine the environments: Test, Development, Production
- Determine the network connectivity (VPN) to both source and target systems
- Determine the amount of data to migrate
- Determine the amount of customization for online and batch
- Archive historical data to reduce the amount of data to migrate
- Determine the parallel testing strategy
- Determine details of integration with other systems
- Determine naming conventions and process standards (e.g., Oracle database object names can be up to 30 characters long, whereas DB2 names are limited to 18 characters)
- Prepare a prototype target environment
- Test the migration scenario

2.2 Creation of Overall Migration Plan

This involves the following activities:

- Analyze customization and platform differences
- Develop an infrastructure plan
- Design the layout of the databases, tablespaces, and tables
- Estimate CPU and storage (disk and memory) sizes
- Analyze and choose migration tools
- Analyze the fixes for present and future environments
- Develop plans for:
  - Print migration
  - Batch migration
  - Testing (test criteria and strategy)
- Determine the customization of source applications required
- Determine the conversion of JCLs etc. to shell scripts required
- Determine the conversion/customization of third-party tools
- Determine possible issues with EBCDIC-ASCII conversion
- Develop plans for:
  - Security
  - Backup and disaster recovery
  - Administration
  - Database monitoring
  - Performance monitoring
  - Change management
  - Stress test
  - Incident tracking
  - Vulnerability assessment

2.3 Database Schema Migration

The phases involved in database schema migration are shown in Fig. 1 and explained below.

2.3.1 Extract Database Metadata

This phase involves extracting and analyzing the source database structure from the source DDL statements. Modeling tools and reverse engineering can help in capturing all details of the schema.

2.3.2 Convert Database Objects

This is the major step of the schema migration process. All database objects in the source database need to be converted to the equivalent objects in the target system. Typically, objects such as data types, tables, columns, views, indexes, stored procedures, triggers, packages, sequences, authorities, and functions need to be converted. Factors such as data type, scale, precision, length, default values, and null handling for table columns, functions, and stored procedures can cause issues. Refer to Section 4 for examples of terminology differences between DB2 and Oracle.

2.3.3 Convert Queries

This is the next major phase in the database schema migration process. Even though the basic SQL commands are the same, SQL dialects differ from engine to engine (refer to Section 4 for examples of differences between Oracle PL/SQL and DB2 SQL). SQL translation requires expertise in both the source and target systems in order to avoid performance issues.

2.3.4 Implement Converted Objects

This phase involves building the database structure on the target platform through scripts or the facilities provided in the target system.
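
As an illustration of the object conversion and script-based implementation described in 2.3.2 and 2.3.4, the sketch below translates a small DB2 table definition into an Oracle CREATE TABLE statement. It is a minimal example under stated assumptions, not part of MMK or Oracle Migration Workbench: the type mapping, the helper names convert_type and oracle_create_table, and the sample table are hypothetical, and a real project would settle the mapping during the conversion study.

```python
import re

# Illustrative DB2 -> Oracle type mapping. This table is an assumption made
# for the sketch, not a mapping prescribed by the white paper or by any tool.
TYPE_MAP = {
    "SMALLINT": "NUMBER(5)",
    "INTEGER": "NUMBER(10)",
    "BIGINT": "NUMBER(19)",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP",
}

ORACLE_MAX_NAME_LEN = 30  # object-name limit quoted in section 2.1


def convert_type(db2_type: str) -> str:
    """Translate one DB2 column type into an Oracle column type."""
    t = db2_type.strip().upper()
    # DECIMAL(p) or DECIMAL(p,s) becomes NUMBER(p,s); scale defaults to 0.
    m = re.fullmatch(r"(DECIMAL|DEC|NUMERIC)\s*\(\s*(\d+)\s*(?:,\s*(\d+)\s*)?\)", t)
    if m:
        return f"NUMBER({m.group(2)},{m.group(3) or '0'})"
    m = re.fullmatch(r"VARCHAR\s*\(\s*(\d+)\s*\)", t)
    if m:
        return f"VARCHAR2({m.group(1)})"
    m = re.fullmatch(r"CHAR(?:ACTER)?\s*\(\s*(\d+)\s*\)", t)
    if m:
        return f"CHAR({m.group(1)})"
    if t in TYPE_MAP:
        return TYPE_MAP[t]
    # Anything unmapped is flagged for manual review rather than guessed.
    raise ValueError(f"no mapping defined for DB2 type {db2_type!r}")


def oracle_create_table(table: str, columns: list) -> str:
    """Build Oracle DDL from a list of (name, db2_type, nullable) tuples."""
    for name in [table] + [col[0] for col in columns]:
        if len(name) > ORACLE_MAX_NAME_LEN:
            raise ValueError(f"name {name!r} exceeds {ORACLE_MAX_NAME_LEN} characters")
    lines = []
    for name, db2_type, nullable in columns:
        null_clause = "" if nullable else " NOT NULL"
        lines.append(f"    {name} {convert_type(db2_type)}{null_clause}")
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n)"


if __name__ == "__main__":
    # Hypothetical sample table used only to show the generated script.
    print(oracle_create_table(
        "CUSTOMER",
        [("CUST_ID", "INTEGER", False),
         ("NAME", "VARCHAR(40)", True),
         ("BALANCE", "DECIMAL(9, 2)", True)],
    ))
```

Raising an error for unmapped types, instead of falling back to a default, keeps unusual columns visible so they can be resolved by hand before the script is run on the target system.
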
Enhancements related to the schema or performance can also be considered in this phase, utilizing the special features of the target system.

2.4 Data Migration

The phases involved in data migration are shown in Fig. 2 and explained below.

2.4.1 Data Analysis

This phase involves a walkthrough of the data presently in the database (or in use). Some data that is well accommodated in the source system may not be accommodated in the target system. Usually the volume of data is large and a full walkthrough may not be possible. In such cases, random samples are taken to identify data items that can cause problems during movement.

2.4.2 Data Cleanup / Enrichment

Data cleanup and enrichment prior to migration can help in effective movement of data. There could be obsolete or unused items, as well as items that will not affect the source or target system if modified. If this step is performed well in advance, the subsequent phases in this process will gain a significant advantage.

2.4.3 Conversion Study

This phase involves assimilating the outputs of the two phases above and carrying out a detailed study to finalize a conversion strategy. It can be divided into the following steps:

- Fitment / Conversion Study: the output of this step is a study report detailing the changes required in the data items for the movement.
- Formation of Migration Strategy: the output of this step is the “Migration Strategy Document” detailing the planned process of migration, the tools planned to be used, etc.
- Finalization of Scope of Migration: in this step, the scope of migration is defined. Items such as scope, limitations, and performance and maintenance issues need to be well defined.
- Finalization of Acceptance Criteria: this step defines the acceptance / test criteria, test process, and test procedures to ensure that the data movement is fault free.

2.4.3.1 Conversion Strategy Signoff

In this phase, the user (client) approves all the documents mentioned above.
This phase is crucial when handling critical data.

2.4.4 Conversion Tool Preparation

In this phase, the tools required for the data movement are developed (or customized). In production systems, tools are crucial since the final data movement is done in one shot (usually in 1 or 2 days during off hours or holidays). Tool preparation is a full project activity of its own, involving all phases of the SDLC.

2.4.5 Mock Conversion

In this phase, a mock conversion is performed using the existing data in the source system. This may involve several rounds, as below:

- Mock Conversion Round 1
- Fixing of mismatches observed in round 1
- Mock Conversion Round 2
- Fixing of mismatches observed in round 2
- …

It is very important to document the change records during this phase.

2.4.6 Conversion for Parallel Run

Usually a pre-production system is set up for a parallel run, to which the data can be migrated to ensure that the migration is problem free. In this phase, a one-shot data migration from the source system to the pre-production system is performed. Detailed testing is carried out to ensure that the data migration is fault free. Detailed performance testing and monitoring are also done in this phase.

2.4.7 Conversion for Live System

This is the final step: the actual data movement from the source system to the target system. In production systems, this should be done in one shot when the system is not active (usually off hours or holidays). In 24x7 systems, the system may have to be brought down to off-line mode for the data movement.

3. General Milestones and Deliverables

Generally we envisage documents that include the following:

Planning
- Overall Project Plan Document
- Overall Migration Plan Document
- System Overview document
- Plan – Sign Off

Analysis & Design
- Gap Analysis document
- SRS - Migration Specification document
- Data Migration Strategy document
- Migration Tools (Design/Test/Usage) documents
- User Acceptance Test Plan (UAT) document
- Test Criteria, Plan & Procedures
- Migration Strategy & Tools – Sign Off
- Design – Sign Off

Schema Migration
- Schema Migration Reports
- Unit Test Reports (Schema Validation)
- Schema Migration – Sign Off

Data Migration
- Mockup Data Migration Reports (including cleanup details)
- Unit Test Reports
- Data Migration – Sign Off

User Acceptance Test
- Test Reports
- UAT – Sign Off

Installation/Live Run
- Live Migration Test Reports
- Live Cutover – Sign Off

Delivery & Post-implementation Support
- Parallel Run Reports
- Tools & Application Software developed
- All other documents
- Project – Sign Off

4. Examples of Differences between DB2 and Oracle

Each entry below gives a term, followed by its meaning in DB2 and in Oracle.

Database
  DB2: A subsystem can have more than one database. Databases are used to logically group application data. All databases share the same system catalogs, system parameters, and processes in the subsystem. DBADM authority is granted at the database level. SYSADM authority is granted at the subsystem level.
  Oracle: Each instance has one database and one set of system catalog tables.

Tablespace
  DB2: A database is logically divided into tablespaces. There are several tablespace types: simple, segmented, partitioned, and large partitioned (for 16 TB tables). A non-partitioned tablespace points to one physical VSAM file on DASD. A partitioned tablespace points to one VSAM file per partition on DASD. A segmented or simple tablespace can contain one or more tables.
  Oracle: A database is logically divided into tablespaces. A tablespace can point to one or more physical database files on disk. One or more tables can reside in a tablespace.

Blocks
  DB2: Equivalent to pages; 4 K, 8 K, 16 K, 32 K.
  Oracle: The smallest unit of database storage. Database files are formatted into blocks, which can be from 2 K to 16 K.

Extents
  DB2: The unit by which storage is allocated for a VSAM file. The size of the primary and secondary extents is specified in the CREATE TABLESPACE statement. A VSAM file can grow up to a maximum of 119 secondary extents. Extents are made up of contiguous pages.
  Oracle: The unit by which storage is allocated in a database file. The size of the primary and secondary extents is specified in the STORAGE clause of the CREATE TABLE or CREATE INDEX statements, or defaults to the sizes specified in the CREATE TABLESPACE statement. Extents are allocated until there is no more free space in the files that make up the tablespace, or the maximum number of extents has been reached. The size of the file is specified in the CREATE TABLESPACE statement. Extents are made up of contiguous blocks of storage.

Stogroups
  DB2: A series of DASD volumes assigned a unique name and used to allocate VSAM datasets for DB2 tablespaces and indexes.
  Oracle: No equivalent.

Stored Procedures
  DB2: Stored procedures are written in C, C++, COBOL, Assembler, PL/I, or the new DB2 SQL Stored Procedure language. The compiled host language is stored on the DB2 server and the compiled SQL is stored on the database.
  Oracle: Written in PL/SQL, Java, etc. Stored procedures are stored in an Oracle table and executed from within the database.

Plan
  DB2: A plan is an executable module of SQL that is composed of one or more packages and was created from a DBRM. A DBRM is a module of uncompiled SQL statements that were extracted from the source program by pre-compilation. A DBRM is bound into a plan or a package.
  Oracle: No equivalent.

Clusters
  DB2: No equivalent.
  Oracle: Clusters are an optional method of storing data. This approach creates an indexed cluster for groups of tables that are frequently joined. Each value for the cluster index is stored only once. The rows of a table that contain the clustered key value are physically stored together on disk.

Clustering Index
  DB2: An index created on a column of a table where the data values are stored in the same physical sequence as the index. Allows for fast sequential access.
  Oracle: No equivalent.

Secondary Authid
  DB2: Secondary authid or RACF group. Privileges can be granted to a secondary authid. Primary authids are assigned to the secondary authid group. Primary authids inherit all privileges granted to the secondary authid (group) they are in.
  Oracle: No direct equivalent in Oracle. Groups of privileges known as roles can be granted to a user ID.

Package
  DB2: A package consists of a single program of executable SQL and the access paths to that SQL. The package is stored on the database and invoked by the host language executable. A package is created by doing a BIND. A package may be part of a plan.
  Oracle: No equivalent to the DB2 package. A “package” in Oracle has another meaning: it is written in PL/SQL and allows you to group all related programming, such as stored procedures, functions, and variables, in one database object that can be shared by applications.

Other examples of differences
  DB2: PRIQTY, SECQTY, SMALLINT/DECIMAL, FREEPAGE, etc.
  Oracle: INITIAL, NEXT, NUMBER, FREELIST, etc.

5. Database Migration – General Questionnaire (Example)

A. Customer Data

Customer Name: ___________________________________  Phone no.: ____________________
Contact Person: ___________________________________  Fax no.: ______________________

B. Technical Data

B.1 Source System

Hardware Model ___________________
Operating System Name _____________________  Operating System Version ____________
Database: DB2  Database Version ___________
Size of Production Database ____________
No. of Concurrent Users in Production ____________
Avg. No. of On-line Transactions in Production per hour __________________________________
No. of Batch Processes _____________

(a) Production         No. of Databases ________  No. of Tablespaces ________  Size _______
(b) Test               No. of Databases ________  No. of Tablespaces ________  Size _______
(c) Development        No. of Databases ________  No. of Tablespaces ________  Size _______
(d) Other ___________  No. of Databases ________  No. of Tablespaces ________  Size _______

List names and number of rows for the 10 largest tables in Production:
__________________ ____________
__________________ ____________
__________________ ____________
__________________ ____________
__________________ ____________
__________________ ____________

Have any stored procedures been written to access the database? ( ) Yes ( ) No  How many? _________
What is the average number of SQL calls per stored procedure? ____________
How long are the stored procedures (total number of statements)? ____________
Have any triggers been written to access the database? ( ) Yes ( ) No  How many? ________
What is the average number of SQL statements per trigger? _____________
How long are the triggers (total number of statements)? _____________
Is an archival process in place? ( ) Yes ( ) No
Brief description of hardware configuration _____________________________________________
Brief description of application, third-party tools, and host languages __________________________
__________________________________________________________________

B.2 Target System

Hardware __________________________
Operating System Name ____________________  Operating System Version ____________
Database: Oracle  Database Version _________
Do migration tools such as “Oracle Migration Workbench” exist? ( ) Yes ( ) No
Brief description of hardware configuration _____________________________________________
Brief description of application, third-party tools, and host languages __________________________
________________________________________________________________________