Data Warehouse Testing – Practical Approach
By Sharath R Bhat

Table of Contents
Introduction
1 Data Warehouse Definition
2 Challenges of Data Warehouse Testing
3 Testing Methodology
4 Testing Types
4.1 Unit Testing
4.2 Integration Testing
4.3 Technical Shakedown Test
4.4 System Testing
4.5 User Acceptance Testing
4.6 Operational Readiness Testing (ORT)
5 Test Data
6 Conclusion

Introduction
The characteristics of data warehouse applications pose a challenge for any organization planning a data warehouse initiative: how can we be sure our data is accurate and reliable when there are enormous amounts of it, and when it comes from multiple systems with different data structures?
The testing approach described here draws on practical experience gained while testing data warehouse projects.

1 Data Warehouse Definition
"A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision making process".
There are three types of data warehouses:
1. Enterprise Data Warehouse - An enterprise data warehouse provides a central database for decision support throughout the enterprise.
2. ODS (Operational Data Store) - This has a broad, enterprise-wide scope, but unlike the real enterprise data warehouse, data is refreshed in near real time and used for routine business activity.
3. Data Mart - A data mart is a subset of a data warehouse that supports a particular region, business unit or business function.

2 Challenges of Data Warehouse Testing
- Selecting data from multiple source systems, and the analysis that follows, poses a great challenge.
- Volume and complexity of the data.
- Inconsistent and redundant data in the data warehouse.
- Inconsistent and inaccurate reports.
- Non-availability of historical data.

3 Testing Methodology
- Use of traceability to enable full test coverage of business requirements
- In-depth review of test cases
- Manipulation of test data to ensure full test coverage
- Provision of appropriate tools to speed the process of test execution and evaluation
- Regression testing
Fig 1 Testing Methodology (V-Model)

4 Testing Types
The following types of testing are performed for data warehousing projects:
1. Unit Testing
2. Integration Testing
3. Technical Shakedown Testing
4. System Testing
5. User Acceptance Testing
6. Operational Readiness Testing

4.1 Unit Testing
The objective of unit testing is to test business transformation rules, error conditions, and the mapping of fields at the staging and core levels. Unit testing involves the following:
1. Check the mapping of fields present at the staging level.
2. Check for duplicate values generated by the Sequence Generator.
3. Check the correctness of surrogate keys, which uniquely identify rows in the database.
4. Check the data type constraints of the fields present at the staging and core levels.
5. Check that status and error messages are populated into the target table.
6. Check that string columns are left- and right-trimmed.
7. Check that every mapping implements the process-abort mapplet, which is invoked if the number of records read from the source is not equal to the trailer count.
8. Check that every object, transformation, source and target has proper metadata. Check visually in the data warehouse designer tool that every transformation has a meaningful description.

4.2 Integration Testing
The objective of integration testing is to ensure that workflows are executed as scheduled with the correct dependencies. Integration testing involves the following:
1. Check the execution of workflows at the following stages: Source to Staging A, Staging A to Staging B, and Staging B to Core.
2. Check that target tables are populated with the correct number of records.
3. Record the performance of the schedule and analyse the results.
4. Verify that the dependencies among workflows (source to staging, staging to staging, and staging to core) have been properly defined.
5. Check for error log messages in the appropriate file.
6. Verify that the start job starts at the pre-defined start time. For example, if the start time of the first job has been configured as 10:00 AM and the Control-M group has been ordered at 7:00 AM, the first job will not start in Control-M until 10:00 AM.
7. Check that jobs can be restarted in case of failure.

4.3 Technical Shakedown Test
Because of the complexity of integrating the various source systems and tools, several teething problems are expected with the environments.
A Technical Shakedown Test will be conducted before commencing System Testing, Stress and Performance Testing, User Acceptance Testing and Operational Readiness Testing to ensure that the following points are proven:
- Hardware is in place and has been configured correctly (including Informatica architecture, source system connectivity and Business Objects).
- All software has been migrated to the testing environments correctly.
- All required connectivity between systems is in place.
- End-to-end transactions (both online and batch) have been executed and do not fall over.

4.4 System Testing
The objective of system testing is to ensure that the required business functions are implemented correctly. This phase includes data verification, which tests the quality of the data populated into target tables. System testing involves the following:
1. Check that the functionality of the system meets the business specifications.
2. Compare the count of records in the source table with the number of records in the target table, then analyse the rejected records.
3. Check the end-to-end integration of systems and the connectivity of the infrastructure (e.g. that hardware and network configurations are correct).
4. Check all transactions, database updates and data flow functions for accuracy.
5. Validate business report functionality. The areas covered are:

Reporting functionality: Ability to report data as required by the business using Business Objects.
Report Structure: Since the universe and reports have been migrated from a previous version of Business Objects, it is necessary to ensure that the upgraded reports replicate the structure/format and data requirements (unless a change or enhancement has been documented in the Requirement Traceability Matrix or Functional Design Document).
Enhancements: Enhancements within the scope of the upgrade project, such as report structure and prompt ordering, will be tested.
Data Accuracy: The data displayed in the reports and prompts matches the actual data in the data mart.
Performance: The system performs its functions within the prescribed time and meets the stated performance criteria according to agreed SLAs or specific non-functional requirements.
Security: The required level of security access is controlled and works properly, including domain security, profile security, data security, user ID and password control, and access procedures. The security system cannot be bypassed.
Usability: The system is usable as per the specified requirements.
User Accessibility: The specified type of access to data is provided to users.
Connection Parameters: Test the connection.
Data provider: Check for the right universe and for duplicate data.
Conditions/Selection: Test the selection criteria for correct logic.
Object testing: Test the object definitions.
Context testing: Ensure each formula has the correct input or output context.
Variable testing: Test each variable for its syntax and data type compatibility.
Formulas or calculations: Test each formula for its syntax and validate the data it returns.
Filters: Test that the data is filtered correctly.
Alerts: Check extreme limits and report alerts.
Sorting: Test the sorting order of section headers, fields and blocks.
Totals and subtotals: Validate the data results.
Universe Structure: The integrity of the universe is maintained and there are no divergences in terms of joins, objects or prompts.

4.5 User Acceptance Testing
The objective of this testing is to ensure that the system meets the expectations of the business users. It aims to prove that the entire system operates effectively in a production environment and that the system successfully supports the business processes from a user's perspective.
Essentially, these tests will run through “a day in the life of” business users. The tests will also include functions that involve source system connectivity, job scheduling and business report functionality.

4.6 Operational Readiness Testing (ORT)
This is the final phase of testing, which focuses on verifying the deployment of software and the operational readiness of the application. The main areas of testing in this phase include:

Deployment Test
1. Tests the deployment of the solution.
2. Tests the overall technical deployment “checklist” and timeframes.
3. Tests the security aspects of the system, including user authentication and authorization, and user access levels.

Operational and Business Acceptance Testing
1. Tests the operability of the system, including job control and scheduling.
2. Tests include normal, abnormal and fatal scenarios.

5 Test Data
Given the complexity of data warehouse projects, preparing test data is a daunting task. The volume of data required for each level of testing is given below.
Unit Testing - This phase of testing will be performed with a small subset (20%) of production data for each source system.
Integration Testing - This phase of testing will be performed with a small subset of production data for each source system.
System Testing - In this phase a subset of live data will be used that is large enough to contain all required test conditions, including normal, abnormal and fatal scenarios, but small enough that workflow execution time does not unduly impact the test schedule.

6 Conclusion
Data warehouse solutions are becoming almost ubiquitous as a supporting technology for the operational and strategic functions at most companies. Data warehouses play an integral role in business functions as diverse as enterprise process management and monitoring, and the production of financial statements.
The approach described here combines an understanding of the business rules applied to the data with the ability to develop and use testing procedures that check the accuracy of entire data sets. This level of testing rigor requires additional effort and more skilled resources. However, by employing this methodology, the team can be more confident, from day one of the implementation of the DW, in the quality of the data. This will build the confidence of the end-user community, and it will ultimately lead to a more effective implementation.
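
As a closing illustration of "testing procedures that check the accuracy of entire data sets", the following is a minimal sketch of three checks named earlier: the source-to-target record count comparison with rejected records (System Testing), the duplicate surrogate key check, and the trimmed-string check (both Unit Testing). Everything in it is an assumption made for illustration: the tables src_customer, core_customer and rej_customer, their columns, the helper function names, and the use of an in-memory SQLite database in place of a real staging or core schema. It is not the tooling used on the projects described here.

```python
import sqlite3


def reconcile_counts(conn, source, target, reject):
    """Return per-table row counts and the number of source rows
    that are neither loaded into the target nor rejected."""
    counts = {}
    for name in (source, target, reject):
        # Table names are trusted test fixtures here, not user input.
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]
    unaccounted = counts[source] - counts[target] - counts[reject]
    return counts, unaccounted


def duplicate_keys(conn, table, key):
    """Return key values that occur more than once in the table."""
    rows = conn.execute(
        f"SELECT {key} FROM {table} GROUP BY {key} "
        f"HAVING COUNT(*) > 1 ORDER BY {key}"
    ).fetchall()
    return [row[0] for row in rows]


def untrimmed_values(conn, table, key, column):
    """Return keys of rows whose string column has leading or trailing spaces."""
    rows = conn.execute(
        f"SELECT {key} FROM {table} "
        f"WHERE {column} <> TRIM({column}) ORDER BY {key}"
    ).fetchall()
    return [row[0] for row in rows]


def build_demo():
    """Create hypothetical source, core and reject tables with seeded defects."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE src_customer (cust_id INTEGER, name TEXT);
        CREATE TABLE core_customer (cust_sk INTEGER, cust_id INTEGER, name TEXT);
        CREATE TABLE rej_customer (cust_id INTEGER, error_msg TEXT);
        INSERT INTO src_customer VALUES
            (1, 'Ann'), (2, ' Bob '), (3, 'Cy'), (4, NULL);
        -- Seeded defects: surrogate key 102 is reused, ' Bob ' is not trimmed.
        INSERT INTO core_customer VALUES
            (101, 1, 'Ann'), (102, 2, ' Bob '), (102, 3, 'Cy');
        INSERT INTO rej_customer VALUES (4, 'name is null');
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_demo()
    counts, unaccounted = reconcile_counts(
        conn, "src_customer", "core_customer", "rej_customer"
    )
    print("row counts:", counts)
    print("unaccounted source rows:", unaccounted)
    print("duplicate surrogate keys:",
          duplicate_keys(conn, "core_customer", "cust_sk"))
    print("untrimmed names (cust_id):",
          untrimmed_values(conn, "core_customer", "cust_id", "name"))
```

In this sketch the count reconciliation passes (every source row is either loaded or rejected), while the other two checks flag the seeded defects. On a real project the same queries would run against the staging and core schemas, and the reject table would be whatever error or status table the mappings populate.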