Chapter 1 Adamson & Venerable Dimensional Modeling

3/5/2015
Class 06.1
Chapter 1
Adamson & Venerable
Spring 2015
Dimensional Modeling
• Dimensional Model Basics
• Fact & Dimension Tables
• Star Schema
• Granularity
• Facts and Measures
• Multiple Processes
• Aggregation
Dimensional Modeling
• Dimensional Model Basics
– OLTP Model for Transactions
– Master Files vs. Transaction files?
– Relational Model vs. Dimensional Model
• ERDs vs. Star Schema
• Normalization issues
• Reports & Queries in Relational Models can be complex:
– Especially when seeking historical comparisons
– Business Process Measurements (Not in RDBMS)
• Gross Margins
• Average Aged A/R balances
• Inventory levels compared to Sales by Warehouse Location
• ROI, etc.
Fact & Dimension Tables
Order Entry Process – Table 1.1
Facts or Measures    | Dimensions
Gross sales dollars  | Customer Name
Total Cost           | Customer state, zip, etc.
Margin dollars       | Date of order
Quantity Sold        | Quarter of order
                     | Month of order
                     | Year of order
                     | Salesperson name
                     | Sales Territory
                     | Corporate region
                     | Product name
                     | Product brand
                     | Product category
Dimensional Modeling
• Fact & Dimension Tables
– Use a normalization approach to determine appropriate dimension tables
– Functional dependencies
• Customer_key → Name, address, city, state, zip
• Salesperson_key → Name, code, territory, region
• Date_key → date, day_week, day_of_month, etc.
• Order_key → product_key, order_date_key, salesperson_key, customer_key, order_dollars, extended_cost, margin_dollars, quantity_ordered, order_number, order_line
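The functional dependencies above map fairly directly onto a fact table surrounded by dimension tables. The following is a minimal sketch only, assuming SQLite through Python's standard sqlite3 module. The table names (customer_dim, product_dim, order_facts), the reduced column set, and the sample rows are hypothetical and are not taken from Adamson & Venerable.

```python
import sqlite3

# In-memory database; all table names and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim (
    customer_key INTEGER PRIMARY KEY,   -- warehouse-assigned key
    name TEXT, state TEXT, zip TEXT
);
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY,
    name TEXT, brand TEXT, category TEXT
);
CREATE TABLE order_facts (
    product_key  INTEGER REFERENCES product_dim(product_key),
    customer_key INTEGER REFERENCES customer_dim(customer_key),
    order_dollars REAL, extended_cost REAL,
    margin_dollars REAL, quantity_ordered INTEGER
);
""")

conn.executemany("INSERT INTO customer_dim VALUES (?, ?, ?, ?)",
                 [(1, "Acme", "OH", "44101"), (2, "Bolt", "PA", "15201")])
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?, ?)",
                 [(10, "Widget", "WidgetCo", "Hardware"),
                  (11, "Gadget", "GadgetCo", "Hardware")])
conn.executemany("INSERT INTO order_facts VALUES (?, ?, ?, ?, ?, ?)",
                 [(10, 1, 100.0, 60.0, 40.0, 5),
                  (11, 1, 50.0, 30.0, 20.0, 2),
                  (10, 2, 200.0, 120.0, 80.0, 10)])

# A typical star query: join the fact table to one dimension and
# sum additive facts grouped by a dimension attribute.
sales_by_state = conn.execute("""
    SELECT c.state, SUM(f.order_dollars), SUM(f.margin_dollars)
    FROM order_facts f
    JOIN customer_dim c ON c.customer_key = f.customer_key
    GROUP BY c.state
    ORDER BY c.state
""").fetchall()
print(sales_by_state)
```

Each dimension joins to the fact table through a single key, so grouping by any dimension attribute (state, brand, category) follows the same query shape.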
Dimensions (continued)
• Dimensions represent key characteristics of each fact
or transaction
• For example, a sale “fact” would consist of an Item sold,
quantity, and price;
• A required Dimension will thus be: PART, which will
contain other data about the Item, such as description,
weight, size, etc.
• However, we may want to know about other
attributes/properties or Dimensions of the Sale, such
as: Customer, Rep, Location, Time represented as Day
of Week, etc.
Star Diagram for Orders
Star Schema: Fact Tables
• Primary Keys on Fact Table
– Foreign Keys
• Orders [Order_Date_key, Product_key, salesperson_key,
customer_key, ….]
– Surrogate Key
• Orders [Order_ID, Order_Date_key, Product_key,
Salesperson_key, Customer_key, etc….]
– Avoid Keys from other Systems
• Granularity
– What level of detail?
• Individual Transactions (lowest “grain”)
• Daily summaries, weekly summaries, monthly summaries, etc.
• Must be uniform and consistent throughout data warehouse for
computational purposes
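One way to follow the "Surrogate Key" and "Avoid Keys from other Systems" points is to let the warehouse hand out its own integer keys and remember which source-system identifier each one stands for. This is a sketch under assumptions: the class name SurrogateKeyMap, the system names, and the customer codes are all hypothetical, and the choice to treat the same code from two systems as two different customers is an illustrative simplification.

```python
from itertools import count


class SurrogateKeyMap:
    """Assigns warehouse-owned integer keys to source-system (natural) keys."""

    def __init__(self):
        self._next_key = count(1)
        self._keys = {}

    def key_for(self, source_system, natural_key):
        # The same (system, natural key) pair always maps to the same
        # surrogate key; an unseen pair gets the next integer.
        ident = (source_system, natural_key)
        if ident not in self._keys:
            self._keys[ident] = next(self._next_key)
        return self._keys[ident]


customers = SurrogateKeyMap()
k1 = customers.key_for("order_entry", "C-1001")
k2 = customers.key_for("billing", "C-1001")      # same code, different system
k3 = customers.key_for("order_entry", "C-1001")  # repeat lookup
print(k1, k2, k3)
```

Because the fact table stores only the surrogate key, a change to the coding scheme in a source system does not force a rewrite of the fact rows.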
Dimensional Modeling: The Star Schema
• RECALL: Data Warehouses are relatively nonvolatile;
– However, some changes may be necessary
• Most common changes are updates to the
FACT tables.
– However, these are generally done in “batches”
• Even the Dimension tables require some
changes: Type 1; Type 2; and Type 3 …
Dimensional Modeling
• Changing dimension tables:
– Type 1 Change
• Correct an error or a value that has no significance in future
analysis
– E.g., phone number change in Customer dimension
– Type 2 Change
• Add new row in dimension table
– Significant change, such as a new location for customer
– Type 3 Change
• Add new dimension or modify schema (add columns)
– New organization structure for firm; want to be able to compare
new and old. E.g., International Currency Conversion
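The three change types can be compared side by side on a single dimension row. The sketch below uses plain Python dictionaries and the tax bracket change (Medium to High) that the example slides use. The column names, the "current" row indicator, and the "previous_" column prefix are assumptions made for illustration, not conventions prescribed by the source.

```python
def type1_overwrite(rows, customer_id, attr, new_value):
    """Type 1: overwrite the value in place; the old value is lost."""
    for row in rows:
        if row["customer_id"] == customer_id:
            row[attr] = new_value


def type2_add_row(rows, customer_id, attr, new_value):
    """Type 2: keep the old row and add a new row with a new surrogate key."""
    current = [r for r in rows
               if r["customer_id"] == customer_id and r["current"]][-1]
    new_row = dict(current)                       # copy, still flagged current
    new_row["customer_key"] = max(r["customer_key"] for r in rows) + 1
    new_row[attr] = new_value
    current["current"] = False                    # retire the old version
    rows.append(new_row)


def type3_add_column(rows, customer_id, attr, new_value):
    """Type 3: add a 'previous' column so old and new values sit side by side."""
    for row in rows:
        row.setdefault("previous_" + attr, None)  # schema change for all rows
        if row["customer_id"] == customer_id:
            row["previous_" + attr] = row[attr]
            row[attr] = new_value


def sample_dimension():
    return [{"customer_key": 1, "customer_id": "C1", "name": "Susan",
             "tax_bracket": "Medium", "current": True}]


t1 = sample_dimension()
type1_overwrite(t1, "C1", "tax_bracket", "High")
t2 = sample_dimension()
type2_add_row(t2, "C1", "tax_bracket", "High")
t3 = sample_dimension()
type3_add_column(t3, "C1", "tax_bracket", "High")
```

After these calls, t1 still has one row, t2 has two rows with different surrogate keys for the same customer, and t3 has one row that carries both the old and the new bracket.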
Type 1 Example
Susan's Tax Bracket attribute value changes from Medium to High
Source: Jukić, Vrbsky, Nestorov – Database Systems, Chapter 8
Type 2 Example
Susan's Tax Bracket attribute value changes from Medium to High (SIMPLE CASE)
Source: Jukić, Vrbsky, Nestorov – Database Systems, Chapter 8 – Slide 15
Type 3 Example A
Susan's Tax Bracket attribute value changes from Medium to High
Source: Jukić, Vrbsky, Nestorov – Database Systems, Chapter 8 – Slide 16
Type 3 Example B (with timestamps)
Susan's Tax Bracket attribute value changes from Medium to High
Source: Jukić, Vrbsky, Nestorov – Database Systems, Chapter 8 – Slide 17
Facts and Measures
• 3 Kinds of Measures
– Fully additive
• Facts may be added equally well across any dimension
• E.g., gross sales dollars, extended cost, margin dollars,
quantity sold
– Non-additive
• Facts that may not be added. Generally ratios.
– E.g., Margin rate
• Need to break components into individual facts:
– Margin dollars, sales dollars
– Semi-additive
• Facts that can be summed across some dimensions but not others
– Units sold of the same product can be totaled; units of different products may not be comparable
• E.g., Banking account balances can be totaled for a point in time, but not across time periods.
More Examples of Kinds of Measures
• Fully Additive
– Most dollar amounts are fully additive.
• Dollars of all products sold in a time period.
• Non-Additive
– Percentages and ratios are not additive without
appropriate weightings
• A 10% gain in Investment #1 cannot be added to a 20% gain in Investment #2
• Semi-Additive
– A bank can add all customer balances together at the end
of each month; but not across time.
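The difference between the three kinds of measures shows up quickly in arithmetic. The sketch below uses made-up numbers: sales and margin dollars are summed directly, margin rate is recomputed from its summed components rather than added, and account balances are totaled across accounts at one month end but averaged (not summed) across months. Averaging over time is one common choice, assumed here for illustration.

```python
# Fully additive and non-additive facts (illustrative values).
order_facts = [
    {"product": "Widget", "sales_dollars": 100.0, "margin_dollars": 40.0},
    {"product": "Gadget", "sales_dollars": 300.0, "margin_dollars": 60.0},
]

total_sales = sum(f["sales_dollars"] for f in order_facts)
total_margin = sum(f["margin_dollars"] for f in order_facts)

# Correct: derive the ratio from the summed components.
margin_rate = total_margin / total_sales

# Incorrect: adding the per-row ratios gives a meaningless figure.
naive_rate_sum = sum(f["margin_dollars"] / f["sales_dollars"]
                     for f in order_facts)

# Semi-additive fact: month-end account balances (illustrative values).
balances = [
    {"account": "A", "month": "2015-01", "balance": 100.0},
    {"account": "B", "month": "2015-01", "balance": 50.0},
    {"account": "A", "month": "2015-02", "balance": 120.0},
    {"account": "B", "month": "2015-02", "balance": 70.0},
]

# Valid: total across accounts for each point in time.
month_end_totals = {}
for b in balances:
    month_end_totals[b["month"]] = (
        month_end_totals.get(b["month"], 0.0) + b["balance"])

# Across time, use an average for one account instead of a sum.
account_a = [b["balance"] for b in balances if b["account"] == "A"]
average_balance_a = sum(account_a) / len(account_a)

print(margin_rate, month_end_totals, average_balance_a)
```

Storing margin dollars and sales dollars as separate facts is what makes the correct rate recoverable at any level of summarization.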
Dimensional Modeling
• Multiple Processes
– E.g., Orders vs. Shipments
– Different processes
• Payroll vs. Purchasing
– Different grains or granularity
• E.g., Time dimension … Very Important!!
– Hour (time of day?)
» Scalar function performed on computer clock
– Day
– Week
– Month
– Year
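Because the time dimension carries several grains at once, its rows are usually generated rather than typed in. The sketch below builds day-level rows with week, month, quarter, and year attributes using Python's standard datetime module. The column names and the YYYYMMDD-style date_key are assumptions chosen for readability; ISO week numbering is likewise an assumption, and a fiscal calendar would need different logic.

```python
from datetime import date, timedelta

# Fixed English names avoid any dependence on the machine's locale.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def date_dimension_row(d):
    """Build one day-grain row carrying the coarser grains as attributes."""
    _, iso_week, _ = d.isocalendar()
    return {
        "date_key": d.year * 10000 + d.month * 100 + d.day,  # hypothetical key
        "date": d.isoformat(),
        "day_of_week": DAY_NAMES[d.weekday()],
        "day_of_month": d.day,
        "week": iso_week,
        "month": d.month,
        "quarter": (d.month - 1) // 3 + 1,
        "year": d.year,
    }


def build_date_dimension(start, days):
    """Generate consecutive day rows starting at 'start'."""
    return [date_dimension_row(start + timedelta(days=i)) for i in range(days)]


rows = build_date_dimension(date(2015, 3, 5), 3)
for row in rows:
    print(row)
```

With the coarser grains stored as attributes of each day, the same fact table can be rolled up by week, month, quarter, or year through a simple GROUP BY on the dimension.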
Multiple Processes: Orders & Shipments
Data Marts are subsets of the Data Warehouse: e.g., Orders and Shipments are separate data marts
Dimensional Modeling
• Aggregation (Σ)
– As size of Data Warehouse grows we may
need to aggregate data in order to improve
query performance
– Reduces granularity!
• E.g., Change from Daily to Monthly Summaries
– Keep same Dimension Tables
– Time table may be modified
– Fact table keeps same types of attributes as before
» BUT quantities now represent monthly
summarizations of data.
Aggregate Table
• Month_date_key replaces the day-level date key (de-normalize from Day to Month)
• Adjust primary key accordingly
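The daily-to-monthly change can be sketched as a roll-up that replaces the day-level date key with a month key and sums the additive facts. This is an illustration under assumptions: the date keys are taken to be YYYYMMDD integers so that integer division by 100 yields a YYYYMM month_date_key, and the rows and the function name aggregate_to_month are hypothetical.

```python
from collections import defaultdict

# Day-grain fact rows (illustrative values).
daily_facts = [
    {"date_key": 20150304, "product_key": 10, "order_dollars": 100.0, "quantity_ordered": 5},
    {"date_key": 20150305, "product_key": 10, "order_dollars": 50.0, "quantity_ordered": 2},
    {"date_key": 20150305, "product_key": 11, "order_dollars": 30.0, "quantity_ordered": 1},
    {"date_key": 20150402, "product_key": 10, "order_dollars": 80.0, "quantity_ordered": 4},
]


def aggregate_to_month(facts):
    """Roll day-grain facts up to month grain, keeping the same fact columns."""
    totals = defaultdict(lambda: {"order_dollars": 0.0, "quantity_ordered": 0})
    for f in facts:
        month_date_key = f["date_key"] // 100        # YYYYMMDD -> YYYYMM
        key = (month_date_key, f["product_key"])     # new, coarser primary key
        totals[key]["order_dollars"] += f["order_dollars"]
        totals[key]["quantity_ordered"] += f["quantity_ordered"]
    return [{"month_date_key": m, "product_key": p, **v}
            for (m, p), v in sorted(totals.items())]


monthly_facts = aggregate_to_month(daily_facts)
for row in monthly_facts:
    print(row)
```

Only additive facts are summed here; a ratio such as margin rate would have to be recomputed from its summed components at the monthly grain.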
Becker’s Rule #2: Materializing Object Views (MOV)
• 1. Objects
• 2. Views
• 3. Materialize Views
[Figure labels: Tables; Reports, Forms (layouts, formats, etc.)]
Gartner Magic Quadrant: BI & Analytics Platforms
Summary
• Dimensional Model Basics
• Fact & Dimension Tables
• Star Schema
• Granularity
• Data Changes: Type 1, Type 2, Type 3
• Facts and Measures: Additive, Nonadditive, Semi-additive
• Multiple Processes
• Aggregation