Data modeling.

Presentation by: Anupama Vudaru, Phani Kondapalli
Content by: Prathibha Madineni, Subrahmanyam Kolluri
October 2010

Preface
• Agenda: Basics of Data Modeling, Insurance industry and Erwin
• Duration and timings: 4 days x 2 hrs
• Expectations: In-class, hands-on and post-session work
• Course contents: Divided into slides, videos and printouts
• Legends used: [legend icons]
• Post-session work: Attendees are expected to do the hands-on homework assigned for the day

Contents
Day 1
  A. Data Modeling overview
  B. Data Modeling development life cycle
  C. Components of Data Modeling
  D. Data Modeling notations and design standards
  E. Case study: CDM overview
Day 2
  A. Conceptual data model
  B. Types of Data modeling
  C. Various tools available
  D. Developing CDM using Erwin
  E. Case study: LDM overview
Day 3
  A. Logical data model
  B. Developing LDM using Erwin
  C. Meta Data preservation for Design Considerations
  D. Dimensional Data Modeling
  E. Case study: PDM overview
Day 4
  A. Physical data model
  B. Logical Data Model vs Physical Data Model
  C. Developing PDM using Erwin
  D. Advanced Features of Erwin

Day 3

A. Logical Data Modeling
1. The LDM is a more formal representation of the CDM.
2. Relational / dimensional theory is applied as per design decisions.
3. Normalization / de-normalization of data is taken care of.
4. Like objects may be grouped into super and sub types.
5. Many-to-many relationships are resolved using associative entities.
6. Greater complexity is usually added through decisions about history maintenance, logically unique keys, etc.
7. The LDM can, and should, be 'proven' by playing business transactions against it.
8. Meta data preservation and documentation need attention too.

[Figure: Example of Logical Data Model]

B. Developing LDM using Erwin
1. Case study
   1. Industry knowledge
   2. Business requirements
   3. Convert the subjects to entities
   4. Convert the business verbs into relations
   5. Identify the type of design: relational or dimensional
   6. Identify the acceptable redundancy: normalization or de-normalization
   7. Identify the techniques: star or snow flake
   8. Identify the entity types: master:detail or fact:dimension
   9. Identify the type of relationship
   10. Identify the attributes
   11. Identify the keys

C. Meta data preservation around design consideration
Why?
1. As a repository to revisit the design considerations.
2. As documentation to store and preserve the knowledge for future generations.
3. To ship it in the meta integration package to / from other applications as part of meta data management and lineage.

How?
Entity level
• Definition: Properly define the entity using full English.
• Examples:
  - Give examples in full English, just the way a business analyst would talk.
  - Examples should contain information about a record.
• Excludes: Make a note of any excludes in terms of business.
• Business Purpose: Business purpose explanation for the existence and usage of the entity.
• Notes: Additional notes and descriptions.

Attribute level
• Definition: Properly define the attribute using full English.
• Atomic attribute: Whether the values for this attribute will be a single word or multi word.
• Examples: Examples of the kind of data that is stored in this attribute.
• Excludes: Situations where exceptions are allowed.
• Business Purpose: Business purpose explanation for the existence and usage of the attribute.
• Allowable Values: Type or count of values allowed for this attribute.
• Range: Range of values allowed for this attribute.
• Other static rules: Any rules governing the data and its consistency for the attribute.
• Notes: Additional notes and descriptions.

D. Dimensional Data Modeling
Dimensional Data Modeling Components

Star Design
1. De-normalized, with one fact table and multiple dimensions
2. Great performance with fewer joins
3. Can be used for analysis (OLAP)

Snow Flake
1. Partially normalized tables
2. Not optimized for performance due to increased joins
3. Not meant for OLAP; instead works as a source for Data Marts
4. We may build Data Marts

Dimension
1. Details (Ex. City)
2. Levels
3. Hierarchical Relations
4. Each row may have multiple lines from the fact table
5. More Columns (50 to 100)
6. Granular Design
7. Critical Column
8. Non-Transactional
9. Surrogate Key

Fact
1. Fewer Columns
2. More Rows (Millions)
3. Numbers (No Text)
4. Added up (Summations)
5. Every row has a corresponding dimension table relation
6. Measures (Ex: Qty_Sold / Amt_Sold)

FACTS
• Additive Facts: Facts that can be summed up through all of the dimensions in the fact table.
• Semi-Additive Facts: Facts that can be summed up for some of the dimensions in the fact table, but not the others.
• Non-Additive Facts: Facts that cannot be summed up for any of the dimensions present in the fact table.
• Conformed Facts: A shared fact that is designed to be used in the same way across multiple data marts.

DIMENSIONS
• Slowly changing dimensions: Dimensions with data that changes slowly.
  - SCD Type 1
  - SCD Type 2
  - SCD Type 3
• Rapidly changing dimensions: Dimensions with one or more attributes changing frequently.
• Degenerate Dimensions: Derived from a fact; do not have their own dimension table.
• Conformed Dimensions: Dimensions that are exactly the same as, or a perfect subset of, one another.
• Role playing Dimensions: A dimension which is expressed differently in a fact table using views.

4. Slowly changing dimensions
• Dimensions that change over time
• Categorized into three types: Type 1, Type 2 and Type 3
SCD Type 1
• Overwriting the old values
SCD Type 2
• Creating an additional record
• Very useful for reporting purposes
SCD Type 3
• Creating new fields

B. Developing LDM using Erwin
1. Case study
   05. LDM
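The fact categories listed under FACTS in section D can be made concrete with a small Python sketch. The fact rows, the column layout (date, account, deposit amount, balance, interest rate) and the variable names below are hypothetical, chosen only for illustration; they do not come from the course case study.

```python
from collections import defaultdict

# Hypothetical fact rows: (date, account, deposit_amount, balance, interest_rate)
fact_rows = [
    ("2010-10-01", "A1", 100.0, 100.0, 0.05),
    ("2010-10-01", "A2", 200.0, 200.0, 0.03),
    ("2010-10-02", "A1", 50.0, 150.0, 0.05),
    ("2010-10-02", "A2", 0.0, 200.0, 0.03),
]

# Additive fact: deposit_amount can be summed across every dimension
# (all dates and all accounts).
total_deposits = sum(row[2] for row in fact_rows)

# Semi-additive fact: balance can be summed across accounts for a single
# date, but summing it across dates would double count money.
balance_by_date = defaultdict(float)
for date, _account, _deposit, balance, _rate in fact_rows:
    balance_by_date[date] += balance

# Non-additive fact: interest_rate cannot be meaningfully summed over any
# dimension; an average is used here as one possible alternative.
average_rate = sum(row[4] for row in fact_rows) / len(fact_rows)

print(total_deposits)
print(dict(balance_by_date))
print(average_rate)
```

In this sketch the balance is aggregated per date on purpose: a report that summed it over both dates would show a figure that never existed in any account.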
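The three slowly changing dimension types described in section D can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the customer dimension, its column names (`surrogate_key`, `customer_id`, `city`, `prev_city`, `is_current`) and the helper functions are hypothetical and are not part of the course material or of Erwin.

```python
from copy import deepcopy


def scd_type1(rows, customer_id, new_city):
    """Type 1: overwrite the old value, so no history is kept."""
    rows = deepcopy(rows)
    for row in rows:
        if row["customer_id"] == customer_id:
            row["city"] = new_city
    return rows


def scd_type2(rows, customer_id, new_city):
    """Type 2: expire the current record and add a new record
    with a new surrogate key, so full history is kept."""
    rows = deepcopy(rows)
    next_key = max((row["surrogate_key"] for row in rows), default=0) + 1
    for row in rows:
        if row["customer_id"] == customer_id and row["is_current"]:
            row["is_current"] = False
    rows.append({
        "surrogate_key": next_key,
        "customer_id": customer_id,
        "city": new_city,
        "is_current": True,
    })
    return rows


def scd_type3(rows, customer_id, new_city):
    """Type 3: keep the previous value in an additional field."""
    rows = deepcopy(rows)
    for row in rows:
        if row["customer_id"] == customer_id:
            row["prev_city"] = row["city"]
            row["city"] = new_city
    return rows


# Hypothetical starting dimension with a single customer record.
dimension = [
    {"surrogate_key": 1, "customer_id": "C100", "city": "Hyderabad", "is_current": True},
]

type1 = scd_type1(dimension, "C100", "Chennai")
type2 = scd_type2(dimension, "C100", "Chennai")
type3 = scd_type3(dimension, "C100", "Chennai")
```

Type 2 is the only variant here that grows the table, which is why it pairs naturally with a surrogate key: the business key `customer_id` is no longer unique once history rows exist.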