Data Manager Best Practices
Business Intelligence Solutions

ETL Catalog and Data Mart should be stored in different schemas

A Data Manager catalog provides a central repository for the information that defines how Data Manager extracts, transforms and delivers data. The catalog stores Data Manager builds, connection specifications, JobStreams, user-defined functions and the dimensional framework.

[Diagram: Development/Test/Production catalog and catalog tables, each with its source connection and target connection]

Development, Test, Production schemas, migration and version control strategy

We recommend that you create a dedicated schema for each of your environments: development, test and production.

1. Create all builds in the development catalog first. Back up the catalog at the end of each day and check the backup in to Subversion/CVS.
2. When development is done, back up the development catalog and restore it to the test catalog. Make sure to modify the target database connection in the test catalog. QA/testers validate the data in the test schema and open a ticket if any problem is found. Developers fix problems in the development catalog and push to test for the next release. This process may be repeated over several iterations. Check the latest test catalog in to Subversion/CVS.
3. When testing is done, back up the test catalog, restore it to the production schema and change the target database connection to production. Check the latest production catalog in to Subversion/CVS.

Automate the ETL, Deploy and Schedule the job

-- A JobStream can multi-task events and allows commands to be executed in parallel or in series.
-- The developed JobStream can be published as Data Movement tasks into the IBM Cognos BI production environment, where they can be added to jobs and scheduled for execution.

Create builds

1. Dimension Builds

• Insert a reference dimension
• Create a hierarchy for the reference dimension
• Insert level(s) for the hierarchy
• Insert lookups in the reference dimension
• Create a dimension build using the reference dimension

You can create complicated hierarchies during the “Create hierarchy” step.
Slowly changing dimensions (SCDs) can be easily defined at the “Create dimension build” step.
Lookups can be created after the reference dimension is completed. A lookup is used in a fact build to load surrogate keys (SKEYs) from dimension tables based on the business key.
The reference dimension is also used to handle unmatched members in a fact build.

2. Fact Builds

A fact build can be easily created using the wizard.

• Lookups can be added in the Reference tab of the Transformation Model. A lookup can automatically replace the business key with the surrogate key from the dimension. Make sure to check the “Use surrogates when available” checkbox to enable this function.
• Late arriving facts can be handled in the fact build. There are three ways to handle unmatched members:
  -- Accept unmatched member identifiers and save unmatched member details via the reference structure.
  -- Accept unmatched member identifiers. These identifiers are stored in the catalog and are loaded the next time the corresponding dimension build runs.
  -- Reject unmatched member identifiers.
• Customized refresh strategies can be defined in the fact build.

Debugging Steps

The following is an example of debugging an ETL issue and solving the problem in Data Manager.

JIRA issue: ETL is not pulling DIM_ALLOCATION.cfae_purpose_code correctly
Description: According to OARD_source_to_target_maps.xls, DIM_ALLOCATION.cfae_purpose_code should be pulled from ALLOCATION.cfae_purpose_code. I found that the cfae_purpose_code values in our target table do not match the ones in the AIMS source table.

1. Check the mapping file to verify exactly where cfae_purpose_code is pulled from and whether any transformation is applied to this column.
2. Run a query (or spot check) to verify the problem.
3. If the problem is confirmed, check the query used in Data Manager to pull this column. Run “retrieve 1 row” to verify that the data in this column is correct. If it is wrong, copy the query to Toad, debug it and fix the problem in the query.
4. If the data retrieved in step 3 is correct, then the query used by Data Manager in this build is correct. Check the DataStream to see whether this data source is mapped correctly. If the mapping is incorrect, fix it. If the fix in the DataStream affects the hierarchy, its level(s) or templates, modify them accordingly.
5. If the DataStream is mapped correctly, check the mapping in the hierarchy. Fix it there if any problem is found.
6. If you cannot find any problem in steps 3-5, there is no problem in the reference dimension. Move on to the dimension build.
7. In the dimension build, check the template to see whether everything is defined properly. If any problem is found, fix it there.
8. If nothing is wrong in step 7, check the mapping in the Dimension Table Properties. If any problem is found, fix it.
9. Hooray! You fixed the problem!

Useful Sources and References

– Kimball, Ralph; et al. The Data Warehouse Lifecycle Toolkit. Wiley.
– Kimball, Ralph; Margy Ross. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling. Wiley.
– Kimball, Ralph; Joe Caserta. The Data Warehouse ETL Toolkit. Wiley.
– www.cognoise.com
– www.ittoolbox.com
– www.tdwi.org
– www.kimballgroup.com

Question & Answer
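
Appendix: spot-check sketch for step 2 of the debugging steps

The spot check in step 2 of the debugging steps can be scripted rather than run by hand. The sketch below is only an illustration, not part of Data Manager: it assumes that the source table ALLOCATION and the target table DIM_ALLOCATION are reachable through one connection and share a business key column, here given the hypothetical name allocation_id. An in-memory SQLite database with made-up rows stands in for the real source and target databases.

```python
import sqlite3


def find_mismatches(conn, business_key="allocation_id"):
    """Return (key, source_value, target_value) rows where the values differ.

    business_key is a hypothetical column name; replace it with the real
    business key shared by ALLOCATION and DIM_ALLOCATION.
    """
    # IS NOT is SQLite's NULL-safe inequality; other databases need a
    # different expression (for example explicit IS NULL checks).
    sql = f"""
        SELECT s.{business_key}, s.cfae_purpose_code, d.cfae_purpose_code
        FROM ALLOCATION AS s
        LEFT JOIN DIM_ALLOCATION AS d
               ON d.{business_key} = s.{business_key}
        WHERE d.cfae_purpose_code IS NOT s.cfae_purpose_code
        ORDER BY s.{business_key}
    """
    return conn.execute(sql).fetchall()


def demo():
    """Build a tiny in-memory example with one deliberately wrong target row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ALLOCATION (
            allocation_id INTEGER PRIMARY KEY,
            cfae_purpose_code TEXT
        );
        CREATE TABLE DIM_ALLOCATION (
            allocation_id INTEGER,
            cfae_purpose_code TEXT
        );
        INSERT INTO ALLOCATION VALUES (1, 'A'), (2, 'B'), (3, 'C');
        INSERT INTO DIM_ALLOCATION VALUES (1, 'A'), (2, 'X'), (3, 'C');
        """
    )
    try:
        return find_mismatches(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    for row in demo():
        print(row)
```

An empty result would suggest that the column is loaded correctly for the rows compared; any returned row gives a concrete business key to follow through steps 3 to 8.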