Data Modeling and Erwin

Data modeling.
Presentation by – Anupama Vudaru, Phani Kondapalli
Content by – Prathibha Madineni, Subrahmanyam Kolluri
October 2010
Preface
• Agenda – Basics of Data Modeling, Insurance industry and Erwin
• Duration and timings – 4 days x 2 hrs
• Expectations – In-class, hands on and post session work
• Course contents – Divided into slides, videos and print outs
• Legends used –
• Post-session work – Attendees are expected to do hands-on home work assigned for the day
Contents
Day 1
A. Data Modeling overview
B. Data Modeling development life cycle
C. Components of Data Modeling
D. Data Modeling notations and design standards
E. Case study – CDM overview
Day 2
A. Conceptual data model
B. Types of Data modeling
C. Various tools available
D. Developing CDM using Erwin
E. Case study – LDM overview
Day 3
A. Logical data model
B. Developing LDM using Erwin
C. Meta Data preservation for Design Considerations
D. Dimensional Data Modeling
E. Case study – PDM overview
Day 4
A. Physical data model
B. Logical Data Model vs Physical Data Model
C. Developing PDM using Erwin
D. Advanced Features of Erwin
Day 3
A. Logical data model
B. Developing LDM using Erwin
C. Meta Data preservation
D. Dimensional Data Modeling
E. Case study – PDM overview
A. Logical Data Modeling
1. The LDM is a more formal representation of the CDM.
2. Relational or dimensional theory is applied as per design decisions.
3. Normalization or de-normalization of data is taken care of.
4. Like objects may be grouped into super types and sub types.
5. Many-to-many relationships are resolved using associative entities.
6. Greater complexity is usually added through decisions about how history is maintained, logically unique keys, etc.
7. The LDM can, and should, be ‘proven’ by playing business transactions against it.
8. Meta data preservation and documentation need attention too.
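Point 5 (many-to-many relationships resolved using associative entities) can be sketched as follows. This is a minimal illustration, not part of the case study: it assumes a hypothetical insurance scenario in which a policy can cover many drivers and a driver can appear on many policies. The table and column names (policy, driver, policy_driver) are invented for the example, and SQLite stands in for whatever database the model eventually targets. Running a query against the tables is also a small instance of point 7, playing a business transaction against the model.

```python
import sqlite3


def build_policy_driver_model():
    """Create two entities and the associative entity that links them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE policy (
            policy_id     INTEGER PRIMARY KEY,
            policy_number TEXT NOT NULL UNIQUE
        );
        CREATE TABLE driver (
            driver_id   INTEGER PRIMARY KEY,
            driver_name TEXT NOT NULL
        );
        -- Associative entity (hypothetical name): one row per policy/driver pair.
        CREATE TABLE policy_driver (
            policy_id INTEGER NOT NULL REFERENCES policy(policy_id),
            driver_id INTEGER NOT NULL REFERENCES driver(driver_id),
            PRIMARY KEY (policy_id, driver_id)
        );
    """)
    return conn


def drivers_on_policy(conn, policy_number):
    """Return the driver names attached to one policy, sorted by name."""
    rows = conn.execute(
        """SELECT d.driver_name
           FROM policy p
           JOIN policy_driver pd ON pd.policy_id = p.policy_id
           JOIN driver d ON d.driver_id = pd.driver_id
           WHERE p.policy_number = ?
           ORDER BY d.driver_name""",
        (policy_number,),
    ).fetchall()
    return [name for (name,) in rows]


# Sample data, invented for the sketch.
conn = build_policy_driver_model()
conn.executemany("INSERT INTO policy VALUES (?, ?)", [(1, "P-100"), (2, "P-200")])
conn.executemany("INSERT INTO driver VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
conn.executemany("INSERT INTO policy_driver VALUES (?, ?)", [(1, 1), (1, 2), (2, 2)])
```

The composite primary key on policy_driver prevents the same driver from being attached to the same policy twice, which is the usual reason for keying an associative entity on both parent keys.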
Example of Logical Data Model
B. Developing LDM using Erwin
1. Case study
1. Industry knowledge
2. Business requirements
3. Convert the subjects to entities
4. Convert the business verbs into relations
5. Identify the type of design – relational or dimensional
6. Identify the acceptable redundancy – normalization or de-normalization
7. Identify the techniques – star or snow flake
8. Identify the entity types – master:detail or fact:dimension
9. Identify the type of relationship
10. Identify the attributes
11. Identify the keys
C. Meta data preservation around design consideration
Why?
1. As a repository to revisit the design considerations.
2. As documentation to store and preserve the knowledge for future generations.
3. To ship it in the meta data integration package exchanged with other applications, as part of meta data management and lineage.
How?
Entity level
• Definition: Properly define the entity using full English.
• Examples:
— Give examples in full English, just like the way a business analyst would talk.
— Examples should contain information about a record.
• Excludes: Make a note of any exclusions in business terms.
• Business Purpose: Business purpose explanation for the existence and usage of the entity.
• Notes: Additional notes and descriptions.
Attribute level
• Definition: Properly define the attribute using full English.
• Atomic attribute: Whether the values for this attribute will be a single word or multi word.
• Examples: Examples of the kind of data that is stored in this attribute.
• Excludes: Situations where exceptions are allowed.
• Business Purpose: Business purpose explanation for the existence and usage of the attribute.
• Allowable Values: Type or count of values allowed for this attribute.
• Range: Range of values allowed for this attribute.
• Other static rules: Any rules governing the data or its consistency for the attribute.
• Notes: Additional notes and descriptions.
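The entity-level and attribute-level properties listed above can be captured in a simple structure. The sketch below is one possible way to hold them outside the modeling tool; it does not describe how Erwin stores them. The class names, field names, and the accepts helper are all assumptions made for illustration, as are the sample attributes (policy_status, premium_amount).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class AttributeMetadata:
    """Attribute-level properties, mirroring the list above (hypothetical names)."""
    name: str
    definition: str
    atomic: bool
    examples: List[str] = field(default_factory=list)
    excludes: str = ""
    business_purpose: str = ""
    allowable_values: Optional[List[str]] = None
    value_range: Optional[Tuple[float, float]] = None
    static_rules: str = ""
    notes: str = ""

    def accepts(self, value) -> bool:
        """Check a value against the allowable values and range, when declared."""
        if self.allowable_values is not None and value not in self.allowable_values:
            return False
        if self.value_range is not None:
            low, high = self.value_range
            if not (low <= value <= high):
                return False
        return True


@dataclass
class EntityMetadata:
    """Entity-level properties, mirroring the list above (hypothetical names)."""
    name: str
    definition: str
    examples: List[str] = field(default_factory=list)
    excludes: str = ""
    business_purpose: str = ""
    notes: str = ""
    attributes: List[AttributeMetadata] = field(default_factory=list)


# Invented sample entries for the sketch.
status = AttributeMetadata(
    name="policy_status",
    definition="The current state of the policy in its life cycle.",
    atomic=True,
    examples=["ACTIVE"],
    allowable_values=["ACTIVE", "LAPSED", "CANCELLED"],
)
premium = AttributeMetadata(
    name="premium_amount",
    definition="The amount charged for one policy term.",
    atomic=True,
    value_range=(0, 1_000_000),
)
policy = EntityMetadata(
    name="Policy",
    definition="A contract of insurance between the insurer and a policyholder.",
    examples=["Policy P-100 covers one vehicle for twelve months."],
    attributes=[status, premium],
)
```

Keeping allowable values and ranges as data rather than prose means the same record can serve as documentation and as a basic validation rule.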
D. Dimensional Data Modeling
Dimensional Data Modeling Components
Star Design
1. De-Normalized with one fact table and multiple dimensions
2. Great performance with less joins
3. Can be used for analysis OLAP
Snow Flake
1. Partially normalized tables
2. Not optimized for performance due to increased joins
3. Not meant for OLAP, Instead works as source for Data Marts
4. We may build Data Marts
Dimension
1. Details (Ex. City)
2. Levels
3. Hierarchical Relations
4. Each row may have multiple lines from fact table
5. More Columns (50 – 100)
6. Granular Design
7. Critical Column
8. Non-Transactional
9. Surrogate Key
Fact
1. Less Columns
2. More Rows (Millions)
3. Numbers (No Text)
4. Added up (Summations)
5. Every row has corresponding dimension table relation
6. Measures (Ex: Qty_Sold / Amt_Sold)
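A star design with one fact table and multiple dimensions can be sketched as below. This is an illustrative assumption, not the deck's case study: the table names (dim_city, dim_product, fact_sales), the sample rows, and the use of SQLite are all invented for the example. Only the measure names qty_sold and amt_sold echo the Qty_Sold / Amt_Sold measures mentioned in the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimensions: descriptive text, a surrogate key, hierarchy levels (city -> state).
    CREATE TABLE dim_city (
        city_key   INTEGER PRIMARY KEY,
        city_name  TEXT,
        state_name TEXT
    );
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT
    );
    -- Fact: few columns, numeric measures, one key per dimension.
    CREATE TABLE fact_sales (
        city_key    INTEGER REFERENCES dim_city(city_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        qty_sold    INTEGER,
        amt_sold    REAL
    );
""")
conn.executemany(
    "INSERT INTO dim_city VALUES (?, ?, ?)",
    [(1, "Austin", "Texas"), (2, "Dallas", "Texas"), (3, "Boston", "Massachusetts")],
)
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?)",
    [(1, "Widget"), (2, "Gadget")],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(1, 1, 2, 20.0), (1, 2, 1, 15.0), (2, 1, 3, 30.0), (3, 2, 4, 60.0)],
)


def sales_by_state(connection):
    """Sum the measures at the state level with a single join to the dimension."""
    return connection.execute(
        """SELECT c.state_name, SUM(f.qty_sold), SUM(f.amt_sold)
           FROM fact_sales f
           JOIN dim_city c ON c.city_key = f.city_key
           GROUP BY c.state_name
           ORDER BY c.state_name"""
    ).fetchall()
```

Because the city dimension is de-normalized (state is a column, not a separate table), rolling up to the state level needs only one join. A snowflake variant would move state into its own table and add a second join.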
D. Dimensional Data Modeling
FACTS
• Additive Facts: Facts that can be summed up across all of the dimensions in the fact table
• Semi-Additive Facts: Facts that can be summed up across some of the dimensions in the fact table, but not the others
• Non-Additive Facts: Facts that cannot be summed up across any of the dimensions present in the fact table
• Conformed Facts: A shared fact that is designed to be used in the same way across multiple data marts
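The difference between additive and semi-additive facts is easier to see with numbers. The sketch below assumes a hypothetical daily account snapshot; the rows and function names are invented. The deposit amount is additive across every dimension, while the end-of-day balance can be summed across accounts but not across days.

```python
from collections import defaultdict

# (day, account, deposits_that_day, end_of_day_balance) -- invented sample rows
snapshots = [
    ("2010-10-01", "A", 100, 1100),
    ("2010-10-01", "B", 50, 550),
    ("2010-10-02", "A", 20, 1120),
    ("2010-10-02", "B", 0, 550),
]


def total_deposits(rows):
    """Additive fact: summing over all accounts and all days is meaningful."""
    return sum(deposit for _, _, deposit, _ in rows)


def balance_by_day(rows):
    """Semi-additive fact: balances may be summed across accounts within one day."""
    totals = defaultdict(int)
    for day, _, _, balance in rows:
        totals[day] += balance
    return dict(totals)


def closing_balance(rows):
    """Across days, take the latest day's total instead of summing the days."""
    per_day = balance_by_day(rows)
    return per_day[max(per_day)]


# Summing balances over days as well double counts the same money.
naive_balance_sum = sum(balance for _, _, _, balance in snapshots)
```

A non-additive fact, such as a percentage or ratio, cannot be summed along any dimension; it is usually recomputed from its additive components at the level being reported.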
DIMENSIONS
• Slowly changing dimensions: Dimensions with data that changes slowly
— SCD Type 1
— SCD Type 2
— SCD Type 3
• Rapidly changing dimensions: Dimensions with one or more attributes changing frequently
• Degenerate Dimensions:
— Derived from a fact
— Does not have its own dimension table
• Conformed Dimensions: Dimensions that are exactly the same as, or a perfect subset of, one another, so they mean the same thing in every fact table or data mart that uses them
• Role playing Dimensions: A single dimension that is referenced more than once in a fact table, each time in a different role, typically exposed through views
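Role playing and degenerate dimensions can both be shown in one small sketch. It assumes a hypothetical insurance claim fact that references a single date dimension twice, once as the loss date and once as the report date, with a view per role. All table, view, and column names are invented for illustration, and SQLite is used only for convenience.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (
        date_key      INTEGER PRIMARY KEY,
        calendar_date TEXT,
        month_name    TEXT
    );
    CREATE TABLE fact_claim (
        claim_number    TEXT,     -- degenerate dimension: no table of its own
        loss_date_key   INTEGER REFERENCES dim_date(date_key),
        report_date_key INTEGER REFERENCES dim_date(date_key),
        claim_amount    REAL
    );
    -- One view per role played by the same physical date dimension.
    CREATE VIEW dim_loss_date AS
        SELECT date_key AS loss_date_key, month_name AS loss_month
        FROM dim_date;
    CREATE VIEW dim_report_date AS
        SELECT date_key AS report_date_key, month_name AS report_month
        FROM dim_date;
""")
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    [(20100930, "2010-09-30", "September"), (20101002, "2010-10-02", "October")],
)
conn.executemany(
    "INSERT INTO fact_claim VALUES (?, ?, ?, ?)",
    [("C-1", 20100930, 20101002, 500.0), ("C-2", 20101002, 20101002, 250.0)],
)


def claim_months(connection):
    """Join the fact to the date dimension twice, once per role."""
    return connection.execute(
        """SELECT f.claim_number, l.loss_month, r.report_month
           FROM fact_claim f
           JOIN dim_loss_date l ON l.loss_date_key = f.loss_date_key
           JOIN dim_report_date r ON r.report_date_key = f.report_date_key
           ORDER BY f.claim_number"""
    ).fetchall()
```

The views give each role its own column names, so a report can show the loss month and the report month side by side without ambiguity, while only one date table is maintained.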
D. Dimensional Data Modeling
4. Slowly changing dimensions
• Dimensions that change over time
• Categorized into three types: Type 1, Type 2 and Type 3
• SCD Type 1: Overwriting the old values
• SCD Type 2: Creating an additional record; very useful for reporting purposes
• SCD Type 3: Creating new fields
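The three SCD types can be compared on a single change, a customer moving from one city to another. The sketch below works on plain Python dictionaries rather than database tables. The column names (surrogate_key, start_date, end_date, is_current, previous_city) are common conventions assumed for the example, not names taken from the deck.

```python
from datetime import date


def apply_type1(rows, customer_id, new_city):
    """Type 1: overwrite the old value; no history is kept."""
    for row in rows:
        if row["customer_id"] == customer_id:
            row["city"] = new_city


def apply_type2(rows, customer_id, new_city, change_date):
    """Type 2: close the current record and add an additional record."""
    for row in rows:
        if row["customer_id"] == customer_id and row["is_current"]:
            row["is_current"] = False
            row["end_date"] = change_date
    next_key = max((r["surrogate_key"] for r in rows), default=0) + 1
    rows.append({
        "surrogate_key": next_key,
        "customer_id": customer_id,
        "city": new_city,
        "start_date": change_date,
        "end_date": None,
        "is_current": True,
    })


def apply_type3(rows, customer_id, new_city):
    """Type 3: keep the prior value in a new field on the same record."""
    for row in rows:
        if row["customer_id"] == customer_id:
            row["previous_city"] = row["city"]
            row["city"] = new_city


# The same change applied three ways (invented sample data).
type1_rows = [{"customer_id": 7, "city": "Austin"}]
apply_type1(type1_rows, 7, "Dallas")

type2_rows = [{
    "surrogate_key": 1, "customer_id": 7, "city": "Austin",
    "start_date": date(2009, 1, 1), "end_date": None, "is_current": True,
}]
apply_type2(type2_rows, 7, "Dallas", date(2010, 10, 1))

type3_rows = [{"customer_id": 7, "city": "Austin"}]
apply_type3(type3_rows, 7, "Dallas")
```

Type 2 is the only variant that keeps full history, which is why it suits reporting: facts loaded before the change still point at the surrogate key of the old record.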
B. Developing LDM using Erwin
1. Case study
05. LDM