Dimensional modeling II

ACCTG 6910
Building Enterprise &
Business Intelligence Systems
(e.bis)
Dimensional Modeling II
Olivia R. Liu Sheng, Ph.D.
Emma Eccles Jones Presidential Chair of Business
The Business Dimensional Lifecycle
Figure: lifecycle stages: Project Planning; Business Requirement Definition; Technical Architecture Design; Product Selection & Installation; Dimensional Modeling; Physical Design; Data Staging Design & Development; End-User Application Specification; End-User Application Development; Deployment; Maintenance and Growth; Project Management spans all stages
Outline
• Table structure, types, characteristics
and terminology
• Design steps
• Dimensional models with varying types
of fact and dimension tables
Types of Facts
• Transactional facts (transactions or line
items in transactions)
• Snapshots
• Factless facts
Types of Dimensions
• Role playing dimensions
• Heterogeneous dimensions
• Slowly changing dimensions
• Large dimensions
• Many-to-many dimensions
Keys and Attributes
• Primary key - a column whose value
uniquely identifies each row (record) in
the table.
• Attributes – columns in a table that are
not designated as the primary key.
• Foreign key – an attribute in a table whose values correspond to the primary key of another table.
Attributes in DW tables
• Dimension Table
– One Primary Key
– Dimension Attributes
• Fact table
– Primary key: composed of the primary keys of all its associated dimension tables
• All warehouse keys in the fact table are foreign keys referring to its associated dimension tables
• All or some of the warehouse keys in the fact table form the primary key of the fact table
– Fact Attributes
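The key rules above can be tried out in a small database. The sketch below is a minimal illustration, not the course's reference design. It assumes SQLite through Python's standard `sqlite3` module, and the table names (`time_dim`, `customer_dim`, `product_dim`, `sales_fact`) and sample rows are hypothetical. Each dimension has one surrogate primary key, and the fact table's primary key is the combination of its three foreign keys.

```python
import sqlite3

# Hypothetical star schema modeled on the SALES/TIME/CUSTOMER/PRODUCT example.
# Warehouse keys are INTEGER surrogate keys; the fact table's primary key is
# the combination of the three foreign keys.
SCHEMA = """
CREATE TABLE time_dim (
    time_key   INTEGER PRIMARY KEY,
    order_date TEXT NOT NULL
);
CREATE TABLE customer_dim (
    customer_key INTEGER PRIMARY KEY,
    cid          TEXT NOT NULL,   -- natural key from the source system
    cname        TEXT,
    city         TEXT,
    state        TEXT
);
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY,
    pid         TEXT NOT NULL,    -- natural key from the source system
    pname       TEXT,
    pcname      TEXT
);
CREATE TABLE sales_fact (
    time_key     INTEGER NOT NULL REFERENCES time_dim(time_key),
    customer_key INTEGER NOT NULL REFERENCES customer_dim(customer_key),
    product_key  INTEGER NOT NULL REFERENCES product_dim(product_key),
    price        REAL,
    quantity     INTEGER,
    sales_amount REAL,
    PRIMARY KEY (time_key, customer_key, product_key)
);
"""

def build_star(conn: sqlite3.Connection) -> None:
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)

conn = sqlite3.connect(":memory:")
build_star(conn)
conn.execute("INSERT INTO time_dim VALUES (1, '2024-01-15')")
conn.execute("INSERT INTO customer_dim VALUES (1, 'C001', 'Acme', 'Salt Lake City', 'UT')")
conn.execute("INSERT INTO product_dim VALUES (1, 'P01', 'Widget', 'Hardware')")
conn.execute("INSERT INTO sales_fact VALUES (1, 1, 1, 2.5, 4, 10.0)")

# A second row with the same key combination violates the composite primary key.
try:
    conn.execute("INSERT INTO sales_fact VALUES (1, 1, 1, 2.5, 1, 2.5)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A product key that does not exist in its dimension table violates a foreign key.
try:
    conn.execute("INSERT INTO sales_fact VALUES (1, 1, 99, 2.5, 1, 2.5)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Both bad inserts are rejected, and only the first fact row remains.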
Attributes in DW tables
Figure: star schema for SALES (# = key column, * = attribute; data warehouse keys are generated by the system)
SALES (fact): # TIME_KEY, # PRODUCT_KEY, # CUSTOMER_KEY, * PRICE, * QUANTITY, * SALES_AMOUNT; references TIME, PRODUCT and CUSTOMER
TIME: # TIME_KEY, * ORDERDATE, * DAY_OF_WEEK, * DAY_NUMBER_IN_MONTH, * DAY_NUMBER_IN_YEAR, * WEEK_NUMBER, * MONTH, * QUARTER, * HOLIDAY_FLAG, * FISCAL_YEAR, * FISCAL_QUARTER
CUSTOMER: # CUSTOMER_KEY, * CID, * CNAME, * STATE, * CITY
PRODUCT: # PRODUCT_KEY, * PID, * PNAME, * PCNAME
Keys and Grain
• Keys
– Primary or natural keys (from source
systems)
– Warehouse or synthetic keys (generated by
a data warehouse tool)
• Grain
– The level of detail of fact measures
described in the DW, e.g., sales
transactions from order line items by order
date, product and customer
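Warehouse (synthetic) keys are generated independently of the natural keys from the source systems. The sketch below is one minimal way to do that, assuming a simple in-memory counter. `SurrogateKeyMap` and `key_for` are hypothetical names, not features of any particular data warehouse tool. A new natural key receives the next integer, and a natural key that has already been seen keeps the warehouse key it was given before.

```python
from dataclasses import dataclass, field

@dataclass
class SurrogateKeyMap:
    """Hypothetical helper that hands out warehouse (surrogate) keys.

    The first time a natural key is seen it gets the next integer;
    later lookups of the same natural key return the same warehouse key.
    """
    next_key: int = 1
    mapping: dict[str, int] = field(default_factory=dict)

    def key_for(self, natural_key: str) -> int:
        if natural_key not in self.mapping:
            self.mapping[natural_key] = self.next_key
            self.next_key += 1
        return self.mapping[natural_key]

products = SurrogateKeyMap()
keys = [products.key_for(pid) for pid in ["P01", "P02", "P01"]]
```

In this run `keys` is `[1, 2, 1]`: product P01 keeps warehouse key 1 on its second appearance.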
Single-Fact-Table Data
Warehouse Design Decisions
1. The business questions in focus and source
information systems*
2. The grain of the fact table
3. The dimension tables and keys
4. The fact attributes and dimension attributes
*All DW attributes must be mapped to or derived from
source attributes
Sample Business Questions
• Report Sales in terms of (total) amount, (total) quantity and (average) price
• Report Sales by PRODUCT name and/or category name
• Report Sales by CUSTOMER name, city and/or state
• Report Sales by ORDER date, month, year,
holiday, special event or other time
constraints
• Report using a combination of the measures
and constraints
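A question such as "Report Sales by category name and customer state" becomes a query that groups by dimension attributes and aggregates the fact measures. The sketch below assumes SQLite through Python's `sqlite3`. The table names and sample rows are hypothetical, and average price is computed as total amount divided by total quantity (a quantity-weighted average). That is one reasonable reading of "(average) price", not a definition taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, cname TEXT, city TEXT, state TEXT);
CREATE TABLE product_dim  (product_key  INTEGER PRIMARY KEY, pname TEXT, pcname TEXT);
CREATE TABLE sales_fact   (time_key INTEGER, customer_key INTEGER, product_key INTEGER,
                           price REAL, quantity INTEGER, sales_amount REAL);
-- Hypothetical sample rows for illustration only.
INSERT INTO customer_dim VALUES (1, 'Acme', 'Salt Lake City', 'UT'), (2, 'Bolt', 'Denver', 'CO');
INSERT INTO product_dim  VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
INSERT INTO sales_fact   VALUES (1, 1, 1, 2.0, 10, 20.0), (1, 2, 1, 2.0, 5, 10.0),
                                (2, 1, 2, 4.0, 5, 20.0);
""")

# Sales by category name and customer state: constrain and group by
# dimension attributes, aggregate the fact measures.
QUERY = """
SELECT p.pcname, c.state,
       SUM(f.sales_amount)                   AS total_amount,
       SUM(f.quantity)                       AS total_qty,
       SUM(f.sales_amount) / SUM(f.quantity) AS avg_price
FROM sales_fact f
JOIN product_dim  p ON p.product_key  = f.product_key
JOIN customer_dim c ON c.customer_key = f.customer_key
GROUP BY p.pcname, c.state
ORDER BY p.pcname, c.state
"""
rows = conn.execute(QUERY).fetchall()
```

Other combinations of measures and constraints only change the `SELECT`, `JOIN` and `GROUP BY` clauses. The fact table itself stays the same.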
Relational Schema of B.com B2B System
Orders (Order_No, SID, BID, CID, Order_date)
OrderLine (Order_No, Line_ID, PID, Actual_Del_Date, Target_Del_Date, Arrival_Date, Shipping_Fee, Tax, Quantity, Unit_Price, Defect_on_arrival)
Delivery (SID, CID, Unit_shipping_fee, UNIT_DEL_TIME)
Contract (CID, Contract_Name, Payment_term, Payment_num)
Payment (PaymentID, OrderNO, Pay_Amount, Date)
Category (CAT_ID, CAT_Name)
Product (PID, CAT_ID, P_Weight, P_Life, P_Name)
Supplier (SID, S_Name, S_City, S_State, S_Country)
Product_Supply (PID, SID, Unit_Price, Quantity_in_Stock, Production_in_Week)
Buyer (BID, B_Name, CityID, B_Type)
Buyer_City (CityID, C_Name, C_State, C_Country, C_Tax)
Grain of the Fact Table
Type of fact table: transactional facts
Potential grains: order or orderline
Constraints: order date, product,
customer
Grain: sales from orderline (by order
date, product, and customer)
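With the orderline grain, each OrderLine row yields exactly one fact row. The order date and the buyer come from the parent Orders row. The sketch below only illustrates that mapping. It assumes the buyer (BID) plays the customer role and that SALES_AMOUNT = Unit_Price × Quantity (ignoring Shipping_Fee and Tax). It also assumes the surrogate-key lookup dictionaries were filled when the dimensions were loaded. `build_sales_facts` and all sample values are hypothetical.

```python
from datetime import date

# Hypothetical source rows shaped like the B.com Orders and OrderLine relations
# (only the columns used here).
orders = {
    1001: {"BID": "B7", "Order_date": date(2024, 3, 1)},
}
order_lines = [
    {"Order_No": 1001, "Line_ID": 1, "PID": "P01", "Quantity": 10, "Unit_Price": 2.0},
    {"Order_No": 1001, "Line_ID": 2, "PID": "P02", "Quantity": 3, "Unit_Price": 5.0},
]

# Hypothetical surrogate-key lookups, assumed to be filled when the dimensions were loaded.
time_keys = {date(2024, 3, 1): 1}
customer_keys = {"B7": 1}       # assumption: the buyer (BID) plays the customer role
product_keys = {"P01": 1, "P02": 2}

def build_sales_facts(orders, order_lines):
    """Emit one SALES fact row per order line (the orderline grain)."""
    facts = []
    for line in order_lines:
        header = orders[line["Order_No"]]  # order date and buyer live on Orders
        facts.append({
            "TIME_KEY": time_keys[header["Order_date"]],
            "CUSTOMER_KEY": customer_keys[header["BID"]],
            "PRODUCT_KEY": product_keys[line["PID"]],
            "PRICE": line["Unit_Price"],
            "QUANTITY": line["Quantity"],
            # assumption: amount excludes Shipping_Fee and Tax
            "SALES_AMOUNT": line["Unit_Price"] * line["Quantity"],
        })
    return facts

facts = build_sales_facts(orders, order_lines)
```

An order-level grain would collapse these two lines into one row and lose the product dimension, which is why the finer orderline grain is chosen.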
Dimension Tables and Keys
The keys of the dimension tables jointly make up the primary key of the fact table
Figure: SALES star schema (# = key column, * = attribute)
SALES: # TIME_KEY, # CUSTOMER_KEY, # PRODUCT_KEY, * PRICE, * QUANTITY, * SALES_AMOUNT; references CUSTOMER, TIME and PRODUCT
CUSTOMER: # CUSTOMER_KEY, * CID, * CNAME, * CITY, * STATE
TIME: # TIME_KEY, * ORDER_DATE, * DAY_OF_WEEK, * DAY_NUMBER_IN_MONTH, * DAY_NUMBER_IN_YEAR, * WEEK_NUMBER, * MONTH, * QUARTER, * HOLIDAY_FLAG, ...
PRODUCT: # PRODUCT_KEY, * PID, * PNAME, * PCNAME
Determine Fact Attributes
SALES: # TIME_KEY, # CUSTOMER_KEY, # PRODUCT_KEY, * PRICE, * QUANTITY, * SALES_AMOUNT
Types of Fact Attributes
• Additive fact attributes can be added
along any dimension.
Figure: SALES star schema (TIME, CUSTOMER and PRODUCT dimensions; fact attributes PRICE, QUANTITY, SALES_AMOUNT)
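One way to see additivity is that an additive measure rolled up along any single dimension always gives the same grand total. The sketch below checks this for SALES_AMOUNT using plain Python. The fact rows are hypothetical tuples (time_key, customer_key, product_key, price, quantity, sales_amount), and `total_by` is an illustrative helper.

```python
from collections import defaultdict

# Hypothetical SALES fact rows:
# (time_key, customer_key, product_key, price, quantity, sales_amount)
facts = [
    (1, 1, 1, 2.0, 10, 20.0),
    (1, 2, 1, 2.0, 5, 10.0),
    (2, 1, 2, 4.0, 5, 20.0),
]

def total_by(dim_index, measure_index):
    """Roll a measure up to one dimension key (0 = time, 1 = customer, 2 = product)."""
    totals = defaultdict(float)
    for row in facts:
        totals[row[dim_index]] += row[measure_index]
    return dict(totals)

AMOUNT = 5  # position of sales_amount in each row
by_time = total_by(0, AMOUNT)
by_customer = total_by(1, AMOUNT)
by_product = total_by(2, AMOUNT)

# An additive measure yields one grand total whichever dimension it is rolled up along.
grand_totals = {sum(by_time.values()), sum(by_customer.values()), sum(by_product.values())}
```

All three roll-ups add up to 50.0, so `grand_totals` holds a single value. QUANTITY behaves the same way.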
Types of Fact Attributes
• Non-additive fact attributes cannot be meaningfully added along any dimension.
Figure: SALES star schema (TIME, CUSTOMER and PRODUCT dimensions; fact attributes PRICE, QUANTITY, SALES_AMOUNT)
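A unit price is a typical non-additive attribute: adding the prices of several sales does not produce a price. The sketch below uses hypothetical rows (price, quantity, sales_amount). It shows the usual workaround, which is to store or derive additive measures and compute the ratio only after aggregating them.

```python
# Hypothetical SALES fact rows: (price, quantity, sales_amount)
facts = [
    (2.0, 10, 20.0),
    (2.0, 5, 10.0),
    (4.0, 5, 20.0),
]

summed_price = sum(price for price, _, _ in facts)    # 8.0: not a meaningful price

# Derive a price from the additive measures instead of adding prices together.
total_amount = sum(amount for _, _, amount in facts)  # 50.0
total_qty = sum(qty for _, qty, _ in facts)           # 20
avg_price = total_amount / total_qty                  # 2.5
```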
Types of Fact Attributes
• Semi-additive fact attributes can be
added along some dimensions.
Figure: inventory star schema (# = key column, * = attribute)
INVENTORYFACT: # TIME_KEY, # PRODUCT_KEY, # WAREHOUSE_KEY, * QUANTITY_ON_HAND; references INVENTORY_TIME, INVENTORY_PRODUCT and WAREHOUSE
INVENTORY_TIME: # TIME_KEY
INVENTORY_PRODUCT: # PRODUCT_KEY
WAREHOUSE: # WAREHOUSE_KEY
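QUANTITY_ON_HAND is a snapshot measure. You can add it across warehouses or products for a single day, but adding it across days counts the same units again. The sketch below uses hypothetical INVENTORYFACT rows. Along the TIME dimension it takes an average or the latest snapshot, which are common choices but not the only ones.

```python
from collections import defaultdict

# Hypothetical INVENTORYFACT snapshot rows:
# (time_key, product_key, warehouse_key, quantity_on_hand)
snapshots = [
    (1, 1, 1, 100), (1, 1, 2, 50),   # day 1
    (2, 1, 1, 80),  (2, 1, 2, 40),   # day 2
]

# Adding across warehouses (same day, same product) is valid: total stock on that day.
on_hand_by_day = defaultdict(int)
for time_key, product_key, warehouse_key, qoh in snapshots:
    on_hand_by_day[time_key] += qoh
# {1: 150, 2: 120}

# Adding across time would double-count units (150 + 120 = 270 is not a stock level),
# so along TIME use an average or the latest snapshot instead.
average_on_hand = sum(on_hand_by_day.values()) / len(on_hand_by_day)  # 135.0
latest_on_hand = on_hand_by_day[max(on_hand_by_day)]                 # 120
```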
Time Dimension
• A data warehouse needs an explicit time dimension table instead of just a time attribute (e.g., ORDERDATE).
• Precomputed calendar attributes save computation effort and improve query performance.
• Complex queries involving calendar calculations are hidden from end users of the data warehouse.
Time Dimension
Besides the time attribute, the time dimension table includes the following additional attributes:
– Day_of_week (1-7); Day_number_in_month (1-31)
– Day_number_in_year (1-366)
– Week_number (1-53); Month (1-12); Quarter (1-4)
– Holiday_flag (y/n)
– Fiscal_quarter, Fiscal_year
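These attributes are computed once per date when the time dimension is loaded, so queries never have to calculate them. The sketch below derives them with Python's `datetime` module. Several choices here are assumptions, not part of the original design. Day of week and week number follow the ISO convention (Monday = 1, ISO weeks 1-53). The holiday set is hypothetical. The fiscal year is assumed to start in July and to be named after the calendar year in which it ends.

```python
from datetime import date

# Hypothetical business-specific inputs (assumptions, not part of the original design).
HOLIDAYS = {date(2024, 1, 1), date(2024, 7, 4), date(2024, 12, 25)}
FISCAL_YEAR_START_MONTH = 7  # assumption: fiscal year runs July-June

def time_dim_row(time_key: int, d: date) -> dict:
    """Precompute the calendar attributes of one TIME dimension row."""
    # Months elapsed since the fiscal year began (0-11).
    fiscal_month_index = (d.month - FISCAL_YEAR_START_MONTH) % 12
    # Fiscal year named after the calendar year in which it ends.
    fiscal_year = d.year + 1 if d.month >= FISCAL_YEAR_START_MONTH else d.year
    return {
        "TIME_KEY": time_key,
        "ORDERDATE": d.isoformat(),
        "DAY_OF_WEEK": d.isoweekday(),                # ISO: 1 = Monday ... 7 = Sunday
        "DAY_NUMBER_IN_MONTH": d.day,
        "DAY_NUMBER_IN_YEAR": d.timetuple().tm_yday,  # 1-366
        "WEEK_NUMBER": d.isocalendar()[1],            # ISO week, 1-53
        "MONTH": d.month,
        "QUARTER": (d.month - 1) // 3 + 1,
        "HOLIDAY_FLAG": "y" if d in HOLIDAYS else "n",
        "FISCAL_YEAR": fiscal_year,
        "FISCAL_QUARTER": fiscal_month_index // 3 + 1,
    }

row = time_dim_row(1, date(2024, 7, 4))
```

For 4 July 2024 (a Thursday) this gives day of week 4, day 186 of the year, ISO week 27, quarter 3, holiday flag "y", and fiscal quarter 1 of fiscal year 2025 under the July-start assumption.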
Determine Dimension Attributes
Figure: SALES star schema with its dimension attributes
TIME: ORDERDATE, DAY_OF_WEEK, DAY_NUMBER_IN_MONTH, DAY_NUMBER_IN_YEAR, WEEK_NUMBER, MONTH, QUARTER, HOLIDAY_FLAG, FISCAL_YEAR, FISCAL_QUARTER
CUSTOMER: CID, CNAME, STATE, CITY
PRODUCT: PID, PNAME, PCNAME
Avoid Snowflake Designs
Figure: SALES star schema in which PRODUCT (# PRODUCT_KEY, * PID, * PNAME, * PCNAME) stores the product category name directly
Avoid Snowflake Design
Figure: snowflake structure
SALES references TIME, CUSTOMER and PRODUCT; PRODUCT references PRODUCT_CATEGORY
PRODUCT: # PRODUCT_KEY, * PID, * PNAME, * PRODUCT_CATEGORY_KEY
PRODUCT_CATEGORY: # PRODUCT_CATEGORY_KEY, * PCID, * PCNAME
Avoid Snowflake Schemas
• Tradeoffs of avoiding a snowflake design
– Advantage: improves query performance (fewer joins) and ease of understanding
– Disadvantage: requires more storage space (category attributes are repeated in every product row)
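The tradeoff above shows up directly in the queries. The sketch below assumes SQLite through Python's `sqlite3`, with hypothetical tables and rows. In the star version, the category name sits in every product row. In the snowflake version it sits in a separate PRODUCT_CATEGORY table. Both queries return the same sales totals by category, but the snowflake query needs one more join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: the category name is stored (repeated) in every PRODUCT row.
CREATE TABLE product_star (product_key INTEGER PRIMARY KEY, pid TEXT, pname TEXT, pcname TEXT);
-- Snowflake: the category is normalized into its own table.
CREATE TABLE product_category (product_category_key INTEGER PRIMARY KEY, pcid TEXT, pcname TEXT);
CREATE TABLE product_snow (product_key INTEGER PRIMARY KEY, pid TEXT, pname TEXT,
                           product_category_key INTEGER REFERENCES product_category);
CREATE TABLE sales_fact (product_key INTEGER, quantity INTEGER, sales_amount REAL);

-- Hypothetical sample rows.
INSERT INTO product_star VALUES (1, 'P01', 'Widget', 'Hardware'), (2, 'P02', 'Gadget', 'Hardware'),
                                (3, 'P03', 'Manual', 'Books');
INSERT INTO product_category VALUES (1, 'C1', 'Hardware'), (2, 'C2', 'Books');
INSERT INTO product_snow VALUES (1, 'P01', 'Widget', 1), (2, 'P02', 'Gadget', 1),
                                (3, 'P03', 'Manual', 2);
INSERT INTO sales_fact VALUES (1, 10, 20.0), (2, 5, 20.0), (3, 2, 30.0);
""")

STAR_QUERY = """
SELECT p.pcname, SUM(f.sales_amount) FROM sales_fact f
JOIN product_star p ON p.product_key = f.product_key
GROUP BY p.pcname ORDER BY p.pcname
"""
# The snowflake needs one more join to reach the category name.
SNOWFLAKE_QUERY = """
SELECT c.pcname, SUM(f.sales_amount) FROM sales_fact f
JOIN product_snow p     ON p.product_key = f.product_key
JOIN product_category c ON c.product_category_key = p.product_category_key
GROUP BY c.pcname ORDER BY c.pcname
"""
star_rows = conn.execute(STAR_QUERY).fetchall()
snow_rows = conn.execute(SNOWFLAKE_QUERY).fetchall()
```

The star version stores "Hardware" twice, which is the extra storage cost. In exchange, it avoids the second join.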