Introduction to Dimensional Modeling

advertisement
Dimensional Modeling
Chapter 2
The Dimensional Data Model
An alternative to the normalized data
model
 Present information as simply as
possible (easier to understand)
 Return queries as quickly as possible
(efficient for queries)
 Track the underlying business processes
(process focused)
The Dimensional Data Model





Contains the same information as the
normalized model
Has far fewer tables
Grouped in coherent business
categories
Pre-joins hierarchies and lookup tables
resulting in fewer join paths and fewer
intermediate tables
Normalized fact table with denormalized
dimension tables.
GB Video E-R Diagram
Customer
#Cust No
F Name
L Name
Ads1
Ads2
City
State
Zip
Tel No
CC No
Expire
Requestor
of
Rental
#Rental No
Date
Clerk No
Pay Type
CC No
Expire
CC Approval
Owner of
Line
#Line No
Due Date
Return Date
OD charge
Pay type
Holder of
Name for
Title
#Title No
Name
Vendor No
Cost
Video
#Video No
One-day fee
Extra days
Weekend
GB Video Data Mart
Customer
CustID
Cust No
F Name
L Name
Address
AddressID
Adddress1
Address2
City
State
Zip
AreaCode
Phone
Rental
RentalID
Rental No
Clerk No
Store
Pay Type
Line
LineID
OD Charge
OneDayCharge
ExtraDaysCharge
WeekendCharge
DaysReserved
DaysOverdue
CustID
AddressID
RentalId
VideoID
TitleID
RentalDateID
DueDateID
ReturnDateID
Rental Date
RentalDateIDDue Date
Return Date
SQLDate
DueDateID
Day
SQLDateReturnDateID
SQLDate
Week
Day
Quarter
Week Day
Holiday
Quarter Week
Holiday Quarter
Holiday
Video
VideoID
Video No
Title
TitleID
TitleNo
Name
Cost
Vendor Name
Fact Table
Measurements associated with a specific business
process
 Grain: level of detail of the table
 Process events produce fact records
 Facts (attributes) are usually
•
•



Numeric
Additive
Derived facts included
Foreign (surrogate) keys refer to dimension tables
(entities)
Classification values help define subsets
Dimension Tables
Entities describing the objects of the process
 Conformed dimensions cross processes
 Attributes are descriptive
•
•





Text
Numeric
Surrogate keys
Less volatile than facts (1:m with the fact table)
Null entries
Date dimensions
Produce “by” questions
Bus Architecture




An architecture that permits aggregating
data across multiple marts
Conformed dimensions and attributes
Drill Down vs. Drill Across
Bus matrix
Keys and Surrogate Keys





A surrogate key is a unique identifier for data
warehouse records that replaces source
primary keys (business/natural keys)
Protect against changes in source systems
Allow integration from multiple sources
Enable rows that do not exist in source data
Track changes over time (e.g. new customer
instances when addresses change)
Replace text keys with integers for efficiency
Slowly Changing Dimensions
Attributes in a dimension that change more
slowly than the fact granularity
 Type 1: Current only
 Type 2: All history
 Type 3: Most recent few (rare)
Note: rapidly changing dimensions usually
indicate the presence of a business process
that should be tracked as a separate
dimension or as a fact table
CustKey
BKCustID
CustName
CommDist
Gender
HomOwn?
1552
31421
Jane Rider
3
F
N
Fact Table
Date
CustKey
ProdKey
Item Count
Amount
1/7/2004
1552
95
1
1,798.00
3/2/2004
1552
37
1
27.95
5/7/2005
1552
87
2
320.26
2/21/2006
1552 2387
42
1
19.95
Dimension with a slowly changing attribute
Cust
Key
BKCust
ID
Cust
Name
Comm
Dist
Gender
Hom
Own?
Eff
End
1552
31421
Jane Rider
3
F
N
1/7/2004
1/1/2006
2387
31421
Jane Rider
31
F
N
1/2/2006
12/31/9999
Date Dimensions



One row for every day for which you expect to
have data for the fact table (perhaps generated
in a spreadsheet and imported)
Usually use a meaningful integer surrogate key
(such as yyyymmdd 20060926 for Sep. 26,
2006). Note: this order sorts correctly.
Include rows for missing or future dates to be
added later.
Degenerate Dimensions


Dimensions without attributes. (Such as
a transaction number or order number.)
Put the attribute value into the fact table
even though it is not an additive fact.
Snowflaking
(Outrigger Dimensions or Reference Dimensions)




Connects entities to dimension tables
rather than the fact table
Complicates coding and requires
additional processing for retrievals
Makes type 2 slowly changing
dimensions harder to maintain
Useful for seldom used lookups
M:N Multivalued Dimensions



Fact to Dimension
Dimension to Dimension
Try to avoid these. Solutions can be
very misleading.
Multivalued Dimensions
SALESREP
SalesRepKey
Name
Address
SALESREP-ORDER-BRIDGE
SalesRepKey
SalesrepGroupKey
Weight= (1/NumReps)
ORDERS (FACT)
SalesRepKey
ProductKey
SalesRepGrpKey
CustomerKey
OrderQty
Hierarchies
Group data within dimensions: SalesRep
• Region
• State
• County
• Neighborhood
Problem structures
• Variable depth
• Frequently changing
Heterogeneous Products

Several different kinds of entry with
different attributes for each
• (The sub-class problem)
Aggregate Dimensions

Dimensions that represent data at
different levels of granularity
• Remove a dimension
• Roll up the hierarchy (provide a new shrunken
dimension with new surr-key that represents
rolled up data)
Junk Dimensions

Miscellaneous attributes that don’t
belong to another entity, usually
representing processing levels
• Flags
• Categories
• Types
Fact Tables

Transaction
• Track processes at discrete points in time
when they occur

Periodic snapshot
• Cumulative performance over specific time
intervals

Accumulating snapshot
• Constantly updated over time.
May include
multiple dates representing stages.
Aggregates

Precalculated summary tables
• Improve performance
• Record data an coarser granularity
Download