Chapter 3. Conventional Data Warehouses

Copyright © 2008 Elzbieta Malinowski & Esteban Zimányi
Level name
Child level
name
Parent level
name
Key attributes
Other attributes
Key attributes
Other attributes
Key attributes
Other attributes
(a) Level
(0,1)
(1,1)
(0,n)
(1,n)
(c) Cardinalities
(b) Hierarchy
Level name1
Key attributes
Other attributes
role name1
Fact
relationship
name
role name2
Level name2
Key attributes
Other attributes
Measures
(d) Fact relationship with measures and associated levels
Additive
Semiadditive +!
Nonadditive
/Derived
(e) Types of
measures
x
Criterion
(f) Analysis
criterion
(g) Distributing
factor
(h) Exclusive
relationships
Fig. 3.1. Notation of the MultiDim model
Product number
Product name
Description
Size
...
Product
groups
Product
Category name
Description
...
Distribution
Category
Distributor
Department
Department name
Description
...
Distributor id
Name
...
Store
Sales
Quantity
Price +
/Amount
Payment date Date
Event
Order date Weekday flag
Weekend flag
Season
...
Sector
Customer
Customer id
Customer name
Customer address
...
Customer type
Store number
Store name
Store address
Manager name
Sales group district
Sales group region
City name
City population
City area
State name
State population
State area
State major activity
...
Time
x
Sector name
Description
...
Profession
Profession name
Description
...
Fig. 3.2. A conceptual multidimensional schema of a sales data warehouse
Independent
Parallel
Dependent
Specialization
Aggregation
0..*
Association
1..*
Criterion
1
1
Individual
Strict
Alternative
Balanced
0..*
1..*
Simple
Nonstrict
Unbalanced
Generalized
Recursive
Noncovering
Fig. 3.3. Hierarchy classification
Product number
Product name
Description
Size
Distributor
...
Product groups
Product
Category
Department
Category name
Description
...
Department name
Description
...
(a) Schema
department A
category 1
product A
product B
category 2
product C
product D
(b) Examples of instances
Fig. 3.4. A balanced hierarchy
ATM Name
Address
Model
Money capacity
...
Agency
Bank structure
ATM
Branch
Agency name
Address
Area
No. employees
...
Bank
Branch name
Address
Min. capital
Max. capital
...
Bank name
Address
Manager
Headquarters
...
(a) Schema
bank X
branch 1
agency 11
ATM 111
branch 2
agency 12
branch 3
agency 31
agency 32
ATM 112
(b) Examples of instances
Fig. 3.5. An unbalanced hierarchy
Employee
subordinate
supervisor
(a)
Entity name
Entity type
Address
Model
Money capacity
Area
No. employees
Min. capital
Max. capital
Manager
Headquarters
...
Bank structure
Supervision
Employee id
Name
Address
City
State
Title
Position
Salary
Gender
Marital status
No. children
...
Financial entity
subsidiary
parent
(b)
Fig. 3.6. Examples of recursive hierarchies
Customer
Customer id
Customer name
Address
Branch name
Area name
...
Person
Profession name
Class name
...
Company
Type name
Sector name
...
Fig. 3.7. Entity-relationship representation of customer types
Customer
Customer id
Customer name
Address
...
Customer type
Type
Sector
Type name
Description
...
Sector name
Description
...
Branch
x
x
Profession
Class
Prof name
Description
...
Branch name
Description
...
Area
Area name
Description
...
Class name
Description
...
(a) Schema
area A
branch 1
branch 2
...
class 1
profession A
sector 1
profession B
type A
type B
...
customer X
customer Y
...
customer Z
customer K
(b) Examples of instances
Fig. 3.8. A generalized hierarchy
Journal
Publication
Publication id
Title
Abstract
No. pages
Publication type
Journal name
Volume
Number
Year
Book
x
Book name
Publisher
Year
Proceedings
Conference
Proceedings name
Acceptance rate
Year
Conference id
Conference name
Description
Fig. 3.9. A generalized hierarchy without a joining level
County
Warehouse number
Name
Address
...
Warehouse location
Warehouse
City
City name
City population
City area
...
x
County name
County population
County area
...
State
x
State name
State population
State area
State major activity
...
(a) Schema
state 1
county 1
city A
city B
city C
city D
...
warehouse X
warehouse Y
...
warehouse Z
warehouse K
(b) Examples of instances
Fig. 3.10. A noncovering hierarchy
Sales
Amount
Model
Dimensions
Weight
Display
Memory
...
Product groups
Product
Category
Category name
Description
...
Department
Depart. name
Description
...
(a) Schema
electronics
phone
PDA
MP3 player
mobile phone
(b) Examples of instances
Fig. 3.11. A balanced nonstrict hierarchy
(a) Strict hierarchy: aggregation by category gives Phone 170, PDA 100, MP3 100; aggregation by department gives Electronics 370
(b) Nonstrict hierarchy: aggregation by category gives Phone 170, PDA 200, MP3 200; aggregation by department gives Electronics 570
Fig. 3.12. Example of aggregation for a sales amount measure
Salary
Employee id
Employee name
Position
...
Org. structure
Employee
Payroll
..
Section
Section name
Description
Activity
...
Division
Division name
Type
Responsible
...
Fig. 3.13. A nonstrict hierarchy with a distributing factor
Section name
Description
Activity
...
Org. structure
Section
Division
Division name
Type
Responsible
...
Employee
Payroll
Salary
Employee id
Employee name
Position
...
Fig. 3.14. Transforming a nonstrict hierarchy into a strict hierarchy with an additional dimension
(a) Employees E1, E2, E3, E4, E5 assigned to sections S1, S2, S3
(b) No. employees by section: S1 = 3, S2 = 2, S3 = 2; No. employees by division: D1 = 7
Fig. 3.15. Double-counting problem for a nonstrict hierarchy
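The double-counting problem of Fig. 3.15 can be reproduced with a small sketch. Only the counts (3, 2, 2 by section and 7 by division, for five employees) come from the figure; the individual employee-to-section memberships below are an assumption chosen to match those counts, and the function names are hypothetical.

```python
# Assumed memberships: only the per-section counts (3, 2, 2) are from Fig. 3.15.
section_members = {
    "S1": {"E1", "E2", "E3"},
    "S2": {"E3", "E4"},
    "S3": {"E4", "E5"},
}
division_sections = {"D1": ["S1", "S2", "S3"]}


def count_by_section(members):
    """Number of employees in each section."""
    return {section: len(employees) for section, employees in members.items()}


def naive_count_by_division(members, divisions):
    """Roll up by summing section counts; shared employees are counted repeatedly."""
    by_section = count_by_section(members)
    return {d: sum(by_section[s] for s in secs) for d, secs in divisions.items()}


def distinct_count_by_division(members, divisions):
    """Roll up by counting each employee once per division."""
    return {
        d: len(set().union(*(members[s] for s in secs)))
        for d, secs in divisions.items()
    }


print(naive_count_by_division(section_members, division_sections))
print(distinct_count_by_division(section_members, division_sections))
```

Under these assumed memberships the naive roll-up reports 7 employees for D1 while the distinct count is 5, which is the discrepancy the figure illustrates.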
Time
Date
...
Calendar
Quarter
Month
Quarter number
...
Month name
...
Year
Year
...
Bimester
Bimester number
...
(a) Schema
Year 2001
Year
Quarter
Month
Q1-2001
Q2-2001
Jan 2001 Feb 2001 Mar 2001 Apr 2001 May 2001 Jun 2001
Month
Bimester
Year
B1-2001
B2-2001
B3-2001
...
...
...
Year 2001
(b) Examples of instances
Fig. 3.16. Alternative hierarchies
Product number
Product name
Description
Size
Distributor name
...
Distributor
location
Product
Product
groups
Category
Department
Category name
Description
...
Department name
Description
...
Distributor division
x
Division name
Responsible
...
x
Distributor region
Region name
Area
...
Fig. 3.17. Parallel independent hierarchies, composed of a balanced and a noncovering hierarchy
City name
City population
City area
...
Sales district
State
State name
State population
State area
...
District name
Representative
Contact info
...
Store location
Country
Sales organization
Store number
Store name
Store address
Manager name
...
Sales organization
Store
Store location
City
Country name
Capital
Country area
GDP Growth
...
Sales region
Region name
Responsible
Region extent
...
Fig. 3.18. Parallel dependent hierarchies, composed of two balanced hierarchies
Employee id
Employee name
Address
Function
Education
...
Work
Employee
Live
City
City name
City population
City area
...
Sales district
State
State name
State population
State area
...
District name
Representative
Contact info
...
Fig. 3.19. Parallel dependent hierarchies leading to different parent members of the shared level
Area
Sales
Category
Customer
Customer
location
Customer
type
Department
Type
Sector
x
x
Profession
Branch
Class
Quantity
Amount
Product
groups
Time
Sales district
Store
location
Customer
location
City
Store
location
Customer
location
Store
Sales
organization
Store
location
Product
State
Sales
organization
Country
Sales region
Works
Sales time
Sales
time
Payroll
Payroll
time
Bimester
Supervision
Quarter
Sales
time
Sales time
Employee
Base salary
Working hours
Extra payment
supervisor
Section
Affiliated
Month
..
Division
Program type
subordinate
Payroll
time
Year
Sales
incentive
programs
Amount
Fig. 3.20. A multidimensional schema containing several kinds of hierarchies
Store number
Store name
...
Store location
Store
Province
State
Province id
Province name
...
State name
State population
...
(a) Schema
all
state 1
province A
province B
province C
province D
...
store X
store Y
...
store Z
store K
(b) Examples of instances
Fig. 3.21. A generalized hierarchy with different root levels
Product
Product number
Product name
...
Product groups
Brand
Brand name
Description
...
x
Category
Category name
Description
...
Department
Department name
Description
...
Fig. 3.22. A generalized hierarchy with three aggregation paths and different root levels
Product
Product number
Product name
Description
Size
...
Time
Date
Day of week
Weekday flag
Weekend flag
...
Payment date
Customer
Order date
Due date
Ship to
Order line
Shipping date
Quantity
Amount
Bill to
Customer id
Customer name
Customer address
...
Fig. 3.23. Example of a role-playing dimension
Store
Store number
Store name
Store address
Manager name
Sales group district
Sales group region
City name
City population
City area
State name
State population
State area
State major activity
...
Product
Product number
Product name
Description
Size
...
Sales
Time
Date
Event
Weekday flag
Weekend flag
Season
...
Transaction
Transaction no
Quantity
Amount
Fig. 3.24. A schema containing a fact dimension
Client
Time
Date
Day of week
Weekday flag
Weekend flag
Season
...
Financial
info
Balance
Account
Account no
Type
Description
Opening date
...
Bank structure
Client id
Client name
Client address
...
Agency
Agency name
Address
Area
No. employees
...
Fig. 3.25. Multidimensional schema for analysis of bank accounts
Time  Account  Client  Balance
T1    A1       C1      100
T1    A1       C2      100
T1    A1       C3      100
T1    A2       C1      500
T1    A2       C2      500
Fig. 3.26. Example of double-counting problem for a multivalued dimension
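The rows of Fig. 3.26 are enough to show the double-counting effect in a short sketch. The data values are taken from the figure; the function names are hypothetical, and taking one balance per (time, account) pair is just one possible way to avoid the problem, used here for illustration.

```python
# Rows from Fig. 3.26: (time, account, client, balance).
rows = [
    ("T1", "A1", "C1", 100),
    ("T1", "A1", "C2", 100),
    ("T1", "A1", "C3", 100),
    ("T1", "A2", "C1", 500),
    ("T1", "A2", "C2", 500),
]


def naive_total(rows):
    """Sum every row; a balance is repeated once per account holder."""
    return sum(balance for _time, _account, _client, balance in rows)


def total_per_account(rows):
    """Keep a single balance per (time, account) pair before summing."""
    balances = {}
    for time, account, _client, balance in rows:
        balances[(time, account)] = balance
    return sum(balances.values())


print(naive_total(rows))        # every holder's copy of the balance is added
print(total_per_account(rows))  # each account contributes once
```

With these rows the naive sum is 1300, whereas the two accounts together hold 600.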
Client
Client id
Client name
Client address
...
Time
Date
Event
Weekday flag
Weekend flag
Season
...
Account
Financial
info
Balance
Account no
Type
Description
Opening date
...
Bank structure
Account
holders
Agency
Agency name
Address
Area
No. employees
...
(a) Creating two fact relationships
Fig. 3.27. Decomposition of the fact relationship in Fig. 3.25
Client
Client id
Client name
Client address
...
Time
Date
Event
Weekday flag
Weekend flag
Season
...
Financial
info
Balance
Account
Account no
Type
Description
Opening date
...
Bank structure
Holder
Agency
Agency name
Address
Area
No. employees
...
(b) Including a nonstrict hierarchy
Fig. 3.27. Decomposition of the fact relationship in Fig. 3.25
Time
Date
Event
Weekday flag
Weekend flag
Season
...
Financial
info
Balance
Client
Client id
Client name
Client address
...
Account
Account no
Type
Description
Opening date
...
Bank structure
Group id
...
Holder
Client group
Agency
Agency name
Address
Area
No. employees
...
Fig. 3.28. Alternative representation of the schema shown in Fig. 3.25
Dimension
1
/Name: string
1
xor
/
DimHierAgg
Generalization
Aggregation
Composition
Association
Derived attribute
1..*
Hierarchy
Related
Criterion: string
RoleName: string
1..*
1
Level
Name: string
1..*
HierLevAgg
2..*
Fact relationship
MeasAgg
0..*
2..*
1
1
1..*
Attribute
Name: string
Name: string
LevAttrAgg Type: DataType
Derived: Boolean
1
0..*
0..*
Identified
1..*
KeyAttrAgg
child parent
1..*
0..*
Connects
Key
MinChildCard: int
MaxChildCard: int
MinParentCard: int
MaxParentCard: int
DistrFactor: Boolean
Measure
Additivity: AddType
« enumeration »
DataType
integer
real
string
...
« enumeration »
AddType
additive
semiadditive
nonadditive
Fig. 3.29. Metamodel of the MultiDim model
Product
Product key
Product number
Product name
Description
Size
Category fkey
...
ProdDist
Product key
Distributor key
Distr. factor
Store
Store key
Store number
Store name
Store address
Manager name
Sales group
district
Sales group region
City name
City population
City area
State name
State population
State area
Major activity
...
Category
Category key
Category name
Description
Department fkey
...
Department
Department key
Department name
Description
...
Time
Distributor
Distributor key
Distributor id
Name
...
Time key
Date
Event
Weekday flag
Weekend flag
Season
...
Sales
Product fkey
Order date fkey
Payment date fkey
Store fkey
Customer fkey
Quantity
Amount
Sector
Sector key
Sector name
Description
...
Customer
Customer key
Customer id
Customer name
Customer address
Sector fkey
Profession fkey
...
Profession
Profession key
Profession name
Description
...
Fig. 3.30. Relational representation of the example in Fig. 3.2
Store
Product
Product key
Product number
Product name
Description
Size
Distributor
Category fkey
...
Category
Category key
Category name
Description
Department fkey
...
Department
Department key
Department name
Description
...
(a) Snowflake structure
Store key
Store number
Store name
Store address
Manager name
Sales group district
Sales group region
City name
City population
City area
State name
State population
State area
Major activity
...
(b) Flat table
Fig. 3.31. Relations for a balanced hierarchy
bank X
branch 1
agency 11
ATM 111
ATM 121
branch 2
branch 3
agency 12
PH
agency 31
agency 32
PH
PH
PH
PH
Fig. 3.32. Transformation of the unbalanced hierarchy shown in Fig. 3.5b into a balanced one using placeholders
Employee
Employee key
Employee id
Name
Address
City
State
Title
Position
Salary
Gender
Marital status
No. children
...
Supervisor fkey
(a)
Financial entity
Entity key
Entity name
Entity type
Address
Model
Money capacity
Area
No. employees
Min. capital
Max. capital
Manager
Headquarters
...
Parent fkey
(b)
Fig. 3.33. Relational implementation of the recursive hierarchies shown in Fig. 3.6
Product
Product key
Product number
Product name
Description
Size
Distributor
Category
Department
...
Time
Time key
Date
Event
Weekday flag
Weekend flag
Season
...
Sales
Product fkey
Time fkey
Employee fkey
Quantity
Amount
Employee
Employee key
Employee id
Name
Address
City
State
Title
Position
Salary
Gender
Marital status
No. children
...
Supervisor fkey
Fig. 3.34. A schema with a recursive hierarchy in the Employee dimension
Financial entity
Entity key  Entity name  Entity type  Addr.  Model  Money capacity  Area  No. emp.  Min. capital  Max. capital  ...  Parent fkey
1           ATM111       ATM          …      T1     100000          null  null      null          null          …    3
2           ATM112       ATM          …      T2     150000          null  null      null          null          …    3
3           agency11     Agency       …      null   null            150   12        null          null          …    5
4           agency12     Agency       …      null   null            135   9         null          null          …    5
5           branch1      Branch       …      null   null            null  null      10000         99000         …    10
6           branch2      Branch       …      null   null            null  null      1500          55000         …    10
7           agency31     Agency       …      null   null            230   15        null          null          …    9
8           agency32     Agency       …      null   null            185   12        null          null          …    9
9           branch3      Branch       …      null   null            null  null      7500          98500         …    10
10          bankX        Bank         …      null   null            null  null      null          null          …    null
Fig. 3.35. Parent-child relational schema of an unbalanced hierarchy with instances from Fig. 3.5b
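A parent-child table such as the one in Fig. 3.35 is navigated by following the Parent fkey column until it is null. The sketch below uses the Entity key, Entity name, and Parent fkey values from the figure; the function names and the in-memory dictionaries are hypothetical stand-ins for queries against the table.

```python
# Entity key -> Parent fkey, as listed in Fig. 3.35 (None stands for null).
parent_fkey = {1: 3, 2: 3, 3: 5, 4: 5, 5: 10, 6: 10, 7: 9, 8: 9, 9: 10, 10: None}

# Entity key -> Entity name, as listed in Fig. 3.35.
entity_name = {
    1: "ATM111", 2: "ATM112", 3: "agency11", 4: "agency12", 5: "branch1",
    6: "branch2", 7: "agency31", 8: "agency32", 9: "branch3", 10: "bankX",
}


def path_to_root(key):
    """Names of the members met when climbing from `key` up to the root."""
    path = []
    while key is not None:
        path.append(entity_name[key])
        key = parent_fkey[key]
    return path


def depth(key):
    """Number of parent links between `key` and the root."""
    return len(path_to_root(key)) - 1


def leaf_keys():
    """Keys that never appear as a parent of another row."""
    parents = {p for p in parent_fkey.values() if p is not None}
    return sorted(k for k in parent_fkey if k not in parents)


print(path_to_root(1))
print({entity_name[k]: depth(k) for k in leaf_keys()})
```

The leaves end up at different depths (for example, ATM111 is three links below bankX while branch2 is one), which is what makes the hierarchy unbalanced.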
Type
Customer
Customer key
Customer Id
Customer name
Address
Type fkey
Profession fkey
...
Type key
Type name
Description
Sector fkey
...
Profession
Profession key
Profession name
Description
Class fkey
...
Sector
Sector key
Sector name
Description
Branch fkey
...
Class
Class key
Class name
Description
Branch fkey
...
Area
Area key
Area name
Description
...
Branch
Branch key
Branch name
Description
Area fkey
...
Fig. 3.36. Relations for the generalized hierarchy shown in Fig. 3.8
Type
Customer
Customer key
Customer Id
Customer name
Address
Type fkey
Profession fkey
Branch fkey
Customer type
...
Type key
Type name
Description
Sector fkey
...
Profession
Profession key
Profession name
Description
Class fkey
...
Sector
Sector key
Sector name
Description
Branch fkey
...
Class
Class key
Class name
Description
Branch fkey
...
Area
Area key
Area name
Description
...
Branch
Branch key
Branch name
Description
Area fkey
...
Fig. 3.37. Improved relational representation for the generalized hierarchy shown in Fig. 3.8
EmplSection
Employee fkey
Section fkey
Distributing factor
Payroll
Employee fkey
...
Salary
Employee
Employee key
Employee id
Employee name
Position
...
Section
Section key
Section name
Description
Division fkey
...
Division
Division key
Division name
Type
...
Fig. 3.38. Relational tables for the nonstrict hierarchy of Fig. 3.13
Payroll
Employee ref
...
Salary
Employee
Employee key
Employee id
Employee name
Position
Sections (1,n)
Section ref
Distrib. factor
...
Section
Section key
Section name
Description
Division ref
...
Division
Division key
Division name
Type
...
Fig. 3.39. Object-relational representation of the nonstrict hierarchy of Fig. 3.13
Quarter
Time
Time key
Date
Month fkey
...
Month
Month key
Month name
Quarter fkey
Bimester fkey
...
Quarter key
Quarter number
Year fkey
...
Year
Year key
Year
...
Bimester
Bimester key
Bimester number
Year fkey
...
Fig. 3.40. Relations for the alternative hierarchies in Fig. 3.16
City
Store
Store key
Store number
Store name
Store address
City fkey
Sales district fkey
...
City key
City name
City population
City area
State fkey
...
Sales district
District key
District name
Representative
Contact info
State fkey
...
Country
State
State key
State name
State population
State area
Country fkey
Sales region fkey
...
Country key
Country name
Capital
Country area
GDP Growth
...
Sales region
Region key
Region name
Responsible
Region extent
...
Fig. 3.41. Relational schema of a set of parallel dependent hierarchies composed of two balanced hierarchies