Bran-ID - ITU old blogs

Book of Lars Frank, Chapter 10,
SCD (Slowly Changing Dimensions):
The hidden slides of this slideshow may be important.
However, I will focus on learning by exercises, and therefore new
concepts are often rattled off in hidden slides.
Introduction to Slowly Changing Dimensions (SCD)
If the attributes of a dimension are dynamic
(i.e., they may be updated), we say that they are slowly changing.
May the Branch-size of a Branch-office change after e.g. a renovation?
May the Branch-name of a Branch-office change?
Fact table: Bank accounts
- Account#
- Interest-last-year
- Cost-last-year
- Branch#
Dimension: Branch-offices
- Branch#
- Branch-name
- Branch-size
Exercise in SCD:
Suppose the attribute Branch-size is dynamic and aggregations are
made to the level (Branch-size, Year) or (Branch-size, Month).
Does this aggregation make sense, and how would you solve
possible problems?
Exercise in SCD:
Suppose the attribute Branch-name is dynamic and aggregations
are made to the level (Branch-name, Year).
Does this aggregation make sense, and how would you solve
possible problems?
Problems with slowly changing dimensions:
• If you do not update a dynamic attribute, the datawarehouse becomes stale.
• If you update a dynamic attribute, the old measures may be aggregated to a
wrong attribute level value, e.g., the wrong Branch office size!
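To make the two problems concrete, here is a minimal Python sketch with hypothetical branch data (not from the slides): branch 001 is "small" in 2019 and is renovated to "large" in 2020. With a stale dimension, the 2020 interest is still counted as small; after overwriting the size, the 2019 interest is moved to large.

```python
from collections import defaultdict

# Hypothetical bank account facts: (branch, year, interest).
facts = [
    {"branch": "001", "year": 2019, "interest": 100},
    {"branch": "001", "year": 2020, "interest": 150},
    {"branch": "002", "year": 2019, "interest": 80},
]

def aggregate_by_size_year(facts, branch_size):
    """Sum interest per (Branch-size, Year) using the size the dimension holds now."""
    totals = defaultdict(int)
    for f in facts:
        totals[(branch_size[f["branch"]], f["year"])] += f["interest"]
    return dict(totals)

# Problem 1: the dimension is never updated, so 2020 interest is counted as "small".
stale_sizes = {"001": "small", "002": "small"}
# Problem 2: the size is overwritten, so 2019 interest is counted as "large".
updated_sizes = {"001": "large", "002": "small"}

stale = aggregate_by_size_year(facts, stale_sizes)
updated = aggregate_by_size_year(facts, updated_sizes)
```

Neither result attributes each year's interest to the size the branch actually had in that year, which is what the responses below try to fix.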
[Figure: star schema with a fact table (TimeID, Branch Office, ProductID, Amount, …) and dimension attributes for time (Dayname, Day no, Working day, Week, Month, Quarter, Year), branch office (Address, City, District, Size group, Value group) and product (Product name, Price, Product group, Price category)]
Which dimension attributes and relationships
may be slowly changing and which of these give aggregation problems?
Evaluation of the response types (criteria: is historical information preserved, aggregation performance, storage consumption):
Response 1, where dimension records are overwritten
- Historical information preserved: No.
- Aggregation performance: In the evaluation, we define this solution to have average performance.
- Storage consumption: Only the current dimension record version is stored. No redundant data is stored.
Response 2, where new versions are created
- Historical information preserved: Yes.
- Aggregation performance: Version records make performance slower in proportion to the number of changes.
- Storage consumption: All old versions of dimension records are stored, often with redundant attributes.
Response 3, where only one historical version is saved
- Historical information preserved: The current version and a single historical version are saved; older history is destroyed.
- Aggregation performance: No performance degradation occurs whether the current or the historical version is used in a query.
- Storage consumption: Normally, only a single extra attribute version is stored.
Response 4, which uses the top of a dynamic dimension hierarchy as a new static dimension
- Historical information preserved: Yes.
- Aggregation performance: Better or worse depending on whether both dimension tables are used in a query.
- Storage consumption: The relatively large fact table must have an extra foreign key attribute.
Response 5, with dimension data as fact data
- Historical information preserved: Yes.
- Aggregation performance: Better or worse depending on whether the new fact data are used in a query.
- Storage consumption: The relatively large fact table must have an extra attribute for each dynamic dimension attribute.
Response 6, which uses fine granularity in combination with response 1 or 3
- Historical information preserved: The finer the granularity, the more historical state information is preserved.
- Aggregation performance: The finer the granularity, the slower the performance.
- Storage consumption: The finer the granularity, the more storage is consumed.
Response 7, which stores dynamic dimension data as static facts in another data mart
- Historical information preserved: Yes.
- Aggregation performance: Better or worse depending on whether both fact tables are used in a drill-across query.
- Storage consumption: This is the most storage-consuming solution, as at least a new fact and a foreign key are stored in the new fact table.
Kimball’s type 1 response:
Overwrite the old value:
Bank account Fact
- Account-ID
- Time-ID
- Branch-ID
- Interest-last-month
- Cost-last-month
Figure 3.2
Branch-office Dimension
- Branch-ID
- Branchname
Time Dimension
- Time-ID
- Monthname
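Response 1 used with a dimension attribute change:
1. Initially, the Sales fact table (Bran-ID, …, Quantity) holds (001, …, 2000), and the Branch office dimension (Bran-ID, …, Br-Name) holds (001, …, Centre).
2. The branch is renamed: the dimension record is overwritten to (001, …, West), while the fact table still holds (001, …, 2000).
3. A new sale is loaded: the fact table holds (001, …, 2000) and (001, …, 3500), and the dimension record is still (001, …, West).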
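The three steps above can be replayed in a minimal sketch using Python's built-in sqlite3 module. The table and column names are hypothetical adaptations of the slide labels. After the overwrite, all 5500 units, including the 2000 sold while the branch was called Centre, are aggregated under the name West.

```python
import sqlite3

def type1_rename_demo():
    """Replay the slide's steps with a type 1 (overwrite) branch office dimension."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE branch_office (bran_id TEXT PRIMARY KEY, br_name TEXT)")
    con.execute("CREATE TABLE sales_fact (bran_id TEXT REFERENCES branch_office, quantity INTEGER)")
    # Step 1: initial state.
    con.execute("INSERT INTO branch_office VALUES ('001', 'Centre')")
    con.execute("INSERT INTO sales_fact VALUES ('001', 2000)")
    # Step 2: type 1 response, the old name is simply overwritten.
    con.execute("UPDATE branch_office SET br_name = 'West' WHERE bran_id = '001'")
    # Step 3: a new sale is loaded.
    con.execute("INSERT INTO sales_fact VALUES ('001', 3500)")
    rows = con.execute(
        "SELECT b.br_name, SUM(f.quantity) "
        "FROM sales_fact f JOIN branch_office b ON f.bran_id = b.bran_id "
        "GROUP BY b.br_name"
    ).fetchall()
    con.close()
    return rows
```

Calling `type1_rename_demo()` returns a single row for West; the name Centre is gone from every aggregation.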
In response 2, you create a new version of the changed record:
1. Initially, the Sales fact table (Bran-ID, …, Quantity) holds (001, …, 2000), and the Branch office dimension (Bran-ID, …, Bran-Size) holds (001, …, 250).
2. The branch size changes: a new dimension record (002, …, 450) is created next to (001, …, 250), while the fact table still holds (001, …, 2000).
3. A new sale is loaded: the fact table holds (001, …, 2000) and (002, …, 3500).
How is it possible to aggregate to the physical Branch office level?
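One common answer, sketched here under an assumption not stated on the slide: each version row also carries a durable key for the physical branch office (the hypothetical `branch_no` below), while `bran_id` remains the version key referenced by the fact table. Aggregating by size then keeps each sale at its historical size, and aggregating by `branch_no` collapses the versions into one physical branch.

```python
from collections import defaultdict

# Type 2 branch office dimension: one row per version.
# "bran_id" is the version key used by the fact table, as on the slide.
# "branch_no" is a HYPOTHETICAL durable key identifying the physical branch office.
branch_dim = [
    {"bran_id": "001", "branch_no": "B1", "bran_size": 250},
    {"bran_id": "002", "branch_no": "B1", "bran_size": 450},  # new version after the size change
]
sales_fact = [
    {"bran_id": "001", "quantity": 2000},  # sold while the branch had size 250
    {"bran_id": "002", "quantity": 3500},  # sold after the change to size 450
]

def aggregate(facts, dim, attribute):
    """Sum quantities to the level given by a dimension attribute."""
    by_version = {row["bran_id"]: row for row in dim}
    totals = defaultdict(int)
    for f in facts:
        totals[by_version[f["bran_id"]][attribute]] += f["quantity"]
    return dict(totals)

by_size = aggregate(sales_fact, branch_dim, "bran_size")    # each sale keeps its historical size
by_branch = aggregate(sales_fact, branch_dim, "branch_no")  # versions collapse to the physical branch
```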
Exercise in SCD:
Suppose the attributes Branch-name and Branch-size use response
types 1 and 2, respectively, and are changed at the same time.
How is it possible in this situation to avoid preserving the historic
Branch-name information, given that preserving it gives wrong name-level
aggregations?
Fact table: Bank accounts
- Account#
- Interest-last-year
- Cost-last-year
- Branch#
Dimension: Branch-offices
- Branch#
- Branch-name
- Branch-size
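One possible approach, sketched with the same hypothetical durable key `branch_no` for the physical branch: the size change appends a new version row (response 2), and the name change is then overwritten on every version of that physical branch (response 1), so no old name survives in any version. The function name and row layout are illustrative only.

```python
def change_branch(dim, branch_no, new_bran_id, new_name, new_size):
    """Apply a simultaneous name and size change to a list of dimension rows.

    Branch-size uses response 2: a new version row is appended.
    Branch-name uses response 1: the name is overwritten on every version of
    the same physical branch, so name-level aggregations use only the current name.
    """
    dim.append({"bran_id": new_bran_id, "branch_no": branch_no,
                "branch_name": new_name, "bran_size": new_size})
    for row in dim:
        if row["branch_no"] == branch_no:
            row["branch_name"] = new_name
    return dim

branch_dim = [{"bran_id": "001", "branch_no": "B1", "branch_name": "Centre", "bran_size": 250}]
change_branch(branch_dim, "B1", "002", "West", 450)
```

After the call, both version rows are named West, while their sizes (250 and 450) still differ.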
Exercise:
What SCD responses will you recommend for the datawarehouses designed in the car rental case of slideshow 1?
[Figure: car rental case with Customers, Branch offices, Orders, Contracts, Pick up, Reservations, Car return, Cars, Car types, Garage services and Garages]
Kimball’s three responses to
slowly changing dimensions:
1. Overwrite the old value.
2. Create a new dimension record with the new
value.
3. Create an extra attribute for the changed
dimension value.
Kimball’s type 3 response:
Create an extra attribute for the changed dimension relationship.
Suppose the product group of a product may be changed.
Does this solution make meaningful aggregations to the two group levels?
In response 3, you create a new version attribute:
1. Initially, the Order-line fact table (Bran-ID, …, Quantity) holds (001, …, 2000), and the Branch office dimension (Bran-ID, …, Old-Size, New-Size) holds (001, …, 250, 250).
2. The branch size changes: the dimension record becomes (001, …, 250, 450), while the fact table still holds (001, …, 2000).
3. A new sale is loaded: the fact table holds (001, …, 2000) and (001, …, 3500).
Does this solution make meaningful aggregations to the two Size levels?
Response 3 should only be used for a new grouping criterion:
1. Initially, the Order-line fact table (Prod-ID, …, Quantity) holds (001, …, 2000), and the Product dimension (Prod-ID, …, Old-group, New-group) holds (001, …, A) with New-group still empty.
2. A new grouping is introduced: the dimension record becomes (001, …, A, B), while the fact table still holds (001, …, 2000).
3. A new sale is loaded: the fact table holds (001, …, 2000) and (001, …, 3500).
What is the difference between the grouping update and the previous Branch-size update, given that the
grouping aggregations work well while the Branch-size aggregations are
meaningless?
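A minimal sketch of the product grouping example with response 3, using the slide's values (product 001 with Old-group A and New-group B, and two order lines of 2000 and 3500). The dictionary layout is a hypothetical stand-in for the tables.

```python
from collections import defaultdict

# Type 3 product dimension from the slide: one row per product, two group attributes.
product_dim = {"001": {"old_group": "A", "new_group": "B"}}
order_line_fact = [
    {"prod_id": "001", "quantity": 2000},
    {"prod_id": "001", "quantity": 3500},
]

def aggregate_to(column, facts=order_line_fact, dim=product_dim):
    """Sum quantities to one of the two grouping levels (old_group or new_group)."""
    totals = defaultdict(int)
    for f in facts:
        totals[dim[f["prod_id"]][column]] += f["quantity"]
    return dict(totals)

under_old_grouping = aggregate_to("old_group")
under_new_grouping = aggregate_to("new_group")
```

Both results are meaningful when B is simply a new way of grouping the same products over all periods. Applied to Old-Size and New-Size, the same arithmetic would report all 5500 units under each size, although the first 2000 were sold before the size changed.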
Suppose the product group of a product may be changed.
How would you implement SCD response 2 in this example?
Orderdetail fact
- Order-ID
- Product-ID
- Qty
- Price
Product dimension
- Product-ID
- Group-ID
- Product-name
Productgroup dimension
- Group-ID
- Group-name
Will SCD response 2 make meaningful aggregations if you want
to compare product group sales over time?
Will SCD response 3 make meaningful aggregations?
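A minimal sketch of one way to implement response 2 for the product group relationship, assuming (not shown on the slide) that each product version gets a surrogate key and a validity interval. The names `product_key`, `valid_from`, `valid_to` and the sample data are hypothetical. When an order line is loaded, it is linked to the product version valid on the order date, so each sale keeps the group it had at that time.

```python
from datetime import date

# HYPOTHETICAL type 2 product dimension: one row per version, with a validity interval.
product_dim = [
    {"product_key": 1, "product_id": "P1", "group_id": "G1",
     "valid_from": date(2020, 1, 1), "valid_to": date(2020, 6, 30)},
    {"product_key": 2, "product_id": "P1", "group_id": "G2",
     "valid_from": date(2020, 7, 1), "valid_to": date(9999, 12, 31)},
]

def lookup_version(product_id, order_date):
    """Return the surrogate key of the product version valid on order_date."""
    for row in product_dim:
        if row["product_id"] == product_id and row["valid_from"] <= order_date <= row["valid_to"]:
            return row["product_key"]
    raise KeyError((product_id, order_date))

def load_orderdetail(product_id, order_date, qty):
    """Build an Orderdetail fact row that references the valid product version."""
    return {"product_key": lookup_version(product_id, order_date), "qty": qty}

def sales_per_group(facts):
    """Sum quantities per product group through the versioned dimension."""
    group_of = {row["product_key"]: row["group_id"] for row in product_dim}
    totals = {}
    for f in facts:
        g = group_of[f["product_key"]]
        totals[g] = totals.get(g, 0) + f["qty"]
    return totals

facts = [load_orderdetail("P1", date(2020, 3, 15), 10),
         load_orderdetail("P1", date(2020, 9, 1), 4)]
```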
Exercise on when to preserve historic information.
Exchange the Product dimension with a Branch office dimension
and the Productgroup dimension with a Branch-Size dimension in
the following example!
Orderdetail fact
- Order-ID
- Product-ID
- Qty
- Price
Product dimension
- Product-ID
- Group-ID
- Product-name
Productgroup dimension
- Group-ID
- Group-name
Notice!
It may be both
attribute and business
dependent whether
you want to preserve
historic information
or not.
Will SCD response 2 make meaningful aggregations if you want
to compare sales per Branch-Size over time?
Will SCD response 3 make meaningful aggregations?
Suppose the product group of a product may be updated.
Will response type 1 give correct aggregations to the group
level if you want to compare product group sales over time?
Orderdetail fact
- Order-ID
- Product-ID
- Qty
- Price
Product dimension
- Product-ID
- Product-name
- Group-ID
- Group-name
Suppose the product group of a product may be changed.
Will the solution below give correct aggregations to the group
level if you want to compare product group sales over time?
Orderdetail fact
- Order-ID
- Product-ID
- Group-ID
- Qty
- Price
Product dimension
- Product-ID
- Product-name
Productgroup dimension
- Group-ID
- Group-name
SCD Type 4 may be used in dynamic dimension hierarchies:
Suppose both the
salary group and the
product group are
dynamic.
Does this cause SCD
problems?
Product Dimension
- Product-ID
- Product-name
- Product-group-name
Figure 2.1
Order Dimension
- Order-ID
- Ordertype
- …
Dimension hierarchy
OrderdetailsFact
- Product-ID
- Order-ID
- Date-ID
- Salesman-ID
- Qty
- Price
Time Dimension
- Date-ID
- Date
- Month
- Year
- Holiday indication
Salesman
- Salesman-ID
- Salesman-name
- Salary-group-ID
Salary-Group
- Salary-group-ID
- Salary-name
- Salary
- …
The Type 4 Response:
Dynamic relationships in a dimension hierarchy may be related directly to the fact table.
Order Dimension
- Order-ID
- Ordertype
- …
Product Dimension
- Product-ID
- Product-name
OrderdetailsFact
- Product-ID
- Order-ID
- Date-ID
- Salesman-ID
- Salary-group-ID
- Product-group-ID
- Qty
- Price
Product-group Dimension
- Product-group-ID
- Product-group-name
Figure 3.1
Time Dimension
- Date-ID
- Date
- Month
- Year
- Holiday indication
Salesman Dimension
- Salesman-ID
- Salesman-name
- …
Salary-Group Dimension
- Salary-group-ID
- Salary-name
- Salary
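A minimal sketch of the idea in Figure 3.1, with hypothetical data: at load time, the current Salary-group-ID and Product-group-ID are copied into the OrderdetailsFact row as extra foreign keys. Later changes to the relationships then affect only new fact rows, so aggregations to the group levels keep each sale in the group it belonged to when it was loaded.

```python
from collections import defaultdict

# Current (dynamic) relationships, HYPOTHETICAL values; they are read only at load time.
current_salary_group = {"S1": 1}
current_product_group = {"P1": "G1"}

def load_fact(product_id, salesman_id, qty):
    """Type 4: copy the current group IDs into the fact row as extra foreign keys."""
    return {"product_id": product_id, "salesman_id": salesman_id,
            "salary_group_id": current_salary_group[salesman_id],
            "product_group_id": current_product_group[product_id],
            "qty": qty}

facts = [load_fact("P1", "S1", 5)]
current_salary_group["S1"] = 2      # the salesman is moved to another salary group
current_product_group["P1"] = "G2"  # the product is moved to another product group
facts.append(load_fact("P1", "S1", 7))

def total_by(facts, key):
    """Aggregate quantities directly on a foreign key stored in the fact table."""
    totals = defaultdict(int)
    for f in facts:
        totals[f[key]] += f["qty"]
    return dict(totals)
```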
SCD Type 5 stores dynamic attributes in the fact table:
Dimension: Orders
- Order#
- Ordertype
Dimension: Products
- Product#
- Product-name
- Price
Fact table: Orderdetails
- Product#
- Order#
- Qty
- Date#
- Salesman#
Dimension: Salesmen
- Salesman#
- Salesman-name
Dimension: Time
- Date#
- Date-Name
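A minimal sketch of type 5, assuming for illustration that the product Price is the dynamic attribute (the slide does not say which attribute is copied). At load time, the current price is stored as fact data in each Orderdetails row, so revenue can be computed with the price that was valid when each order was placed.

```python
# HYPOTHETICAL product dimension; Price is treated as the dynamic attribute.
products = {"P1": {"product_name": "Widget", "price": 10.0}}

def load_orderdetail(product_id, order_id, qty):
    """Type 5: store the dynamic dimension attribute (here Price) as fact data."""
    return {"product_id": product_id, "order_id": order_id, "qty": qty,
            "price": products[product_id]["price"]}

facts = [load_orderdetail("P1", "O1", 3)]
products["P1"]["price"] = 12.0  # the price changes in the dimension
facts.append(load_orderdetail("P1", "O2", 2))

# Uses the price stored in each fact row: 3 * 10.0 + 2 * 12.0
revenue_with_fact_price = sum(f["qty"] * f["price"] for f in facts)
# Uses the current dimension price for all rows: 5 * 12.0
revenue_with_current_price = sum(
    f["qty"] * products[f["product_id"]]["price"] for f in facts)
```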
SCD Type 6 Response:
Use fine granularity:
Bank account Fact
- Account-ID
- Time-ID
- Branch-ID
- Interest-last-month
- Cost-last-month
Figure 3.2
Branch-office Dimension
- Branch-ID
- Branchname
Time Dimension
- Time-ID
- Monthname
The Type 7 Response:
Store the Dynamic Dimension Data as Static Facts in another Mart.
Example:
Let us suppose a fact table stores the sale of products in a department store. In this example, the department records may have an attribute with the number of salesmen as well as an attribute with the monthly costs of the departments. These attributes are dynamic!
Fact table: Orderdetails
- Product#
- Order#
- Qty
Products
- Product#
- Product-name
- Price
- Group#
Salesmen
- Salesman#
- Salesman-name
Product groups
- Group#
- Group-name
- Department#
Departments
- Department#
- Department name
- No. of employees
- Department costs
Fact table: Time sheets per day per salesman per department
Which response type would you recommend?
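A minimal sketch of a drill-across query for type 7, with hypothetical rows: the sales mart is summed per department and month, and the dynamic department data (costs and number of employees) are stored as ordinary monthly facts in a second mart. The two results are joined on their shared dimensions (department and month) instead of reading the department record's current values.

```python
from collections import defaultdict

# Mart 1 (HYPOTHETICAL rows): order lines already resolved to department and month.
sales = [
    {"department": "D1", "month": "2020-01", "revenue": 500},
    {"department": "D1", "month": "2020-01", "revenue": 300},
    {"department": "D1", "month": "2020-02", "revenue": 400},
]
# Mart 2: the dynamic department attributes stored as static facts per month.
department_facts = [
    {"department": "D1", "month": "2020-01", "costs": 600, "employees": 4},
    {"department": "D1", "month": "2020-02", "costs": 650, "employees": 5},
]

def drill_across(sales, department_facts):
    """Aggregate each mart to (department, month) and combine the results."""
    revenue = defaultdict(int)
    for s in sales:
        revenue[(s["department"], s["month"])] += s["revenue"]
    result = []
    for d in department_facts:
        key = (d["department"], d["month"])
        result.append({"department": d["department"], "month": d["month"],
                       "revenue": revenue.get(key, 0), "costs": d["costs"],
                       "revenue_per_employee": revenue.get(key, 0) / d["employees"]})
    return result

report = drill_across(sales, department_facts)
```

Because each month has its own department fact row, a change in the number of employees or costs never rewrites the figures for earlier months.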
Exercise:
Select responses to SCD for the Airline DW.
[Figure: Airline DW with Airline companies, Flight routes, Airports, Subroutes, Departures, Tickets, Customers and Travel arrangement]
Exercise:
Select responses to SCD for the Hotel DW.
[Figure: Hotel DW with Hotel chains, Hotels, Rooms, Room reservations, Services/tours/car rentals, Customer groups, Check-in periods and Customers]
Exercise.
Select responses to SCD for the travel agency.
[Figure: travel agency with Customers, Buyer, Orders, Traveler, Bookings, Reservations, Flight routes/Room types/Car types/service types, Departures/Hotel rooms/Car rentals/etc. and Product owners]
Exercise.
Design a datawarehouse for a promotion company.
[Figure: promotion company with Customers, Orders, Logical promotions, Order lines, Promotion media, Physical promotions and Presentation blocks/types]
How is it possible to measure the results of promotions and where
should these measures be stored in the data warehouse?
Exercise:
Design a DW for a commercial TV channel
HRM exercise:
Formulate some requirements for an HRM system and try to group them into
OLTP and OLAP requirements.
Make an ER diagram for an OLTP database and one or more OLAP
datamarts that can fulfill the requirements.
Design a datawarehouse for a bank:
It should be possible to analyze both costs and
revenue for customers, households, branch
offices, regions, account managers, etc.
Exercise:
Design a datawarehouse for a housing association that lets
out flats, shops and office areas.
It is possible to sign up on waiting lists for these.
Exercise:
Design a datawarehouse for DSB in order to reduce train delays.
Exercise:
Design a datawarehouse for stock exchange dealers in a bank.
Kimball’s type 2 response:
Suppose an account shifts its Branch relationship in
the middle of the month.
Will the aggregations be correct, and how will
you solve possible problems?
Bank account Fact
- Account-ID
- Date-ID
- Branch-ID
- Interest-last-month
- Cost-last-month
Branch-office Dimension
- Branch-ID
- Branchname
Time Dimension
- Date-ID
- Monthname
Figure 3.2
Can you find more solutions?
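One possible solution, sketched under the assumption that interest and cost accrue evenly over the days of the month: instead of one monthly fact row, store two rows (one per Branch-ID) and split each monthly measure in proportion to the number of days the account belonged to each branch. The function and parameter names are hypothetical.

```python
def prorate(monthly_amount, days_in_month, shift_day, old_branch, new_branch):
    """Split one monthly measure between two branches in proportion to days.

    shift_day is the first day of the month on which the account belongs to
    new_branch. Assumes the measure accrues evenly over the month.
    """
    if not 1 < shift_day <= days_in_month:
        raise ValueError("shift_day must fall inside the month after day 1")
    days_old = shift_day - 1
    days_new = days_in_month - days_old
    return {
        old_branch: monthly_amount * days_old / days_in_month,
        new_branch: monthly_amount * days_new / days_in_month,
    }

# The account moves from branch B1 to branch B2 on day 16 of a 30-day month.
split = prorate(300.0, 30, 16, "B1", "B2")
```

If the even-accrual assumption does not hold, a finer granularity (e.g., daily facts, as in response 6) avoids the approximation at the cost of more storage.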
Kimball’s type 2 response:
Suppose both the Branch relationship and the
Branch-size are dynamic.
How can aggregations be correct?
Bank account Fact
- Account-ID
- Date-ID
- Branch-ID
- Interest-last-month
- Cost-last-month
Branch-office Dimension
- Branch-ID
- Branchname
Time Dimension
- Date-ID
- Monthname
End of session
Thank you !!!