Data Manager Best Practices

Business Intelligence Solutions

ETL Catalog and Data Mart should be stored in different schemas
A Data Manager catalog provides a central repository for the information that defines how Data Manager extracts, transforms and delivers data. The catalog stores Data Manager builds, connection specifications, JobStreams, user-defined functions and the dimensional framework.
[Diagram: Development/Test/Production catalog tables, each with source connections and target connections]

Development, Test, Production schemas, migration and version control strategy
We recommend that you create a dedicated schema for each of your environments: development, test and production.
1. Create all builds in the development catalog first. Back up the catalog at the end of each day and check it in to Subversion/CVS.
2. When development is done, back up the dev catalog and restore it to the test catalog. Make sure to modify the target database connection in the test catalog. QA/testers validate the data in the test schema and open a ticket if any problem is found. Developers fix problems in the dev catalog and push to test for the next release. This process may be repeated over several iterations. Check the latest test catalog in to Subversion/CVS.
3. When testing is done, back up the test catalog, restore it to the production schema, and modify the target database connection to point to production. Check the latest production catalog in to Subversion/CVS.

Automate the ETL, Deploy and Schedule the job
-- A JobStream can multitask and allows commands to be executed in parallel or in series.
-- A developed JobStream can be published as a Data Movement task to the IBM Cognos BI production environment, where it can be added to jobs and scheduled for execution.
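The parallel-versus-serial idea can be sketched in plain Python. This is not Data Manager syntax: the node names and the `run_node` helper are hypothetical stand-ins. Independent dimension builds run in parallel, and the fact build starts only after all of them have finished.

```python
from concurrent.futures import ThreadPoolExecutor


def run_node(name, log):
    """Stand-in for executing one JobStream node (hypothetical helper)."""
    log.append(name)
    return name


def run_jobstream():
    log = []
    # Parallel step: dimension builds that do not depend on each other.
    dimension_builds = ["dim_customer", "dim_product", "dim_date"]
    with ThreadPoolExecutor(max_workers=3) as pool:
        finished = list(pool.map(lambda n: run_node(n, log), dimension_builds))
    # Serial step: the pool has been shut down, so every dimension build
    # is complete before the fact build starts.
    run_node("fact_sales", log)
    return finished, log


finished, log = run_jobstream()
print(finished)
print(log[-1])
```

The `with` block acts as the join point: nothing after it runs until every parallel node has returned.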

Create builds
1. Dimension Builds
Steps: Insert reference dimension -> Create hierarchy for the reference dimension -> Insert level(s) for the hierarchy -> Insert lookups in the reference dimension -> Create dimension build using the reference dimension
• You can create a complicated hierarchy during the “Create hierarchy” step.
• SCDs can be easily defined at the “Create dimension build” step.
• Lookups can be created after the reference dimension is completed. A lookup is used in the fact build to load SKEYs from dimension tables based on the business key.
• The reference dimension is also used to handle unmatched members in the fact build.
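For readers new to SCDs, here is a minimal Python sketch of what Type 2 tracking does to a dimension table. It illustrates the concept only, not what Data Manager generates. The column names (`skey`, `business_key`, `start_date`, `end_date`) and the `apply_scd2` function are assumptions made for this example.

```python
from datetime import date


def apply_scd2(dimension, business_key, new_attrs, today):
    """Apply a Type 2 change: expire the current row and append a new version."""
    current = next(
        (r for r in dimension
         if r["business_key"] == business_key and r["end_date"] is None),
        None,
    )
    if current is not None and current["attrs"] == new_attrs:
        return dimension  # nothing changed, keep the current row
    if current is not None:
        current["end_date"] = today  # close the old version
    next_skey = max((r["skey"] for r in dimension), default=0) + 1
    dimension.append({
        "skey": next_skey,            # new surrogate key for the new version
        "business_key": business_key,
        "attrs": new_attrs,
        "start_date": today,
        "end_date": None,             # None marks the current version
    })
    return dimension


dim = []
apply_scd2(dim, "C001", {"city": "Boston"}, date(2020, 1, 1))
apply_scd2(dim, "C001", {"city": "Denver"}, date(2021, 6, 1))
print(len(dim))
```

Each change to a tracked attribute produces a new row with a new surrogate key, so facts loaded earlier keep pointing at the version that was current at the time.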
2. Fact Builds
A fact build can be easily created using the wizard.
• Lookups can be added in the Reference tab of the Transformation Model. A lookup automatically replaces the business key with the surrogate key from the dimension. Make sure to check the “Use surrogates when available” checkbox to enable this function.
• Late-arriving facts can be handled in the fact build.
• There are three ways to handle unmatched members:
-- Accept unmatched member identifiers and save unmatched member details via the reference structure.
-- Accept unmatched member identifiers. These identifiers are stored in the catalog and loaded the next time the corresponding dimension build runs.
-- Reject the unmatched member identifiers.
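The surrogate key lookup and the three unmatched-member options can be illustrated with a simplified Python sketch. This is a conceptual model only and does not reflect how Data Manager stores unmatched members internally. The policy names, the `customer_code` column and the `load_facts` function are hypothetical.

```python
def load_facts(fact_rows, lookup, policy):
    """Swap business keys for surrogate keys under one unmatched-member policy.

    policy is one of "save_details", "accept" or "reject" (hypothetical names).
    """
    if policy not in ("save_details", "accept", "reject"):
        raise ValueError(f"unknown policy: {policy}")
    loaded, rejected, pending = [], [], []
    for row in fact_rows:
        code = row["customer_code"]
        skey = lookup.get(code)  # business key -> surrogate key
        if skey is None:
            if policy == "reject":
                rejected.append(row)
                continue
            # Both accept policies remember the unmatched identifier.
            member = {"customer_code": code}
            if policy == "save_details":
                member["details"] = dict(row)  # keep the member details too
            pending.append(member)
        # Simplification: accepted unmatched rows are loaded with no surrogate key.
        loaded.append({"customer_code": code,
                       "customer_skey": skey,
                       "amount": row["amount"]})
    return loaded, rejected, pending


lookup = {"C001": 1}
rows = [{"customer_code": "C001", "amount": 10},
        {"customer_code": "C999", "amount": 5}]
print(load_facts(rows, lookup, "reject"))
```

With `"reject"` the unmatched row is set aside, while the two accept policies load it and record the identifier so that a later dimension load can resolve it.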

Customized Refresh strategies in the Fact build

Debugging Steps
The following is an example of debugging an ETL issue and solving the problem in Data Manager.
JIRA issue: ETL is not pulling DIM_ALLOCATION.cfae_purpose_code correctly
Description: According to OARD_source_to_target_maps.xls, DIM_ALLOCATION.cfae_purpose_code should be pulled from ALLOCATION.cfae_purpose_code. I found that the cfae_purpose_code values in our target table do not match the ones in the AIMS source table.
1. Check the mapping file to verify exactly where cfae_purpose_code is pulled from and whether any transformation is applied to this column.
2. Run a query (or spot check) to verify the problem.
3. If the problem is confirmed, check the query used in DM to pull this column. Run “retrieve 1 row” to verify whether the data in this column is correct. If it is wrong, copy the query to Toad, debug it and fix the problem in the query.
4. If the data retrieved in step 3 is correct, then the query used by DM in this build is correct. Check the DataStream to see if this Data Source is mapped correctly. If the mapping is incorrect, fix it. If the fix in the DataStream affects the hierarchy, its level(s) or templates, modify them accordingly.
5. If the DataStream is correctly mapped, check the mapping in the hierarchy. Fix it here if any problem is found.
6. If you can’t find any problem in steps 3-5, there is no problem in the reference dimension. Go on to check the dimension build.
7. In the dimension build, check the template to see if everything is defined properly. If any problem is found, fix it here.
8. If nothing is wrong in step 7, check the mapping in the Dimension Table Properties. If any problem is found, fix it.
9. Hooray! You fixed the problem!
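The verification query in step 2 might look like the sketch below. It uses an in-memory SQLite database so that it is self-contained. The join column `allocation_id` and the sample values are assumptions made for illustration, since the real key columns of ALLOCATION and DIM_ALLOCATION are not given here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy copies of the source and target tables (sample data is made up).
conn.executescript("""
    CREATE TABLE ALLOCATION (allocation_id INTEGER, cfae_purpose_code TEXT);
    CREATE TABLE DIM_ALLOCATION (allocation_id INTEGER, cfae_purpose_code TEXT);
    INSERT INTO ALLOCATION VALUES (1, 'A10'), (2, 'B20'), (3, 'C30');
    INSERT INTO DIM_ALLOCATION VALUES (1, 'A10'), (2, 'XXX'), (3, 'C30');
""")

# Rows where the target column disagrees with the source column.
# Note: <> does not flag rows where either side is NULL.
mismatches = conn.execute("""
    SELECT s.allocation_id, s.cfae_purpose_code, t.cfae_purpose_code
    FROM ALLOCATION s
    JOIN DIM_ALLOCATION t ON t.allocation_id = s.allocation_id
    WHERE s.cfae_purpose_code <> t.cfae_purpose_code
    ORDER BY s.allocation_id
""").fetchall()
print(mismatches)
```

An empty result would suggest the column is loaded correctly for the joined rows; any rows returned are the ones to trace through steps 3 to 8.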

Useful Sources and References
– Kimball, Ralph; et al. The Data Warehouse Lifecycle Toolkit. Wiley.
– Kimball, Ralph; Margy Ross. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling. Wiley.
– Kimball, Ralph; Joe Caserta. The Data Warehouse ETL Toolkit. Wiley.
– www.cognoise.com
– www.ittoolbox.com
– www.tdwi.org
– www.kimballgroup.com
Question & Answer