ch9mod

University of Manitoba
Asper School of Business
3500 DBMS
Bob Travica
Business Analytics and
Decision Making
OLTP, OLAP & SAP
Chapter 9 & SAP Materials
Updated 2018
DB SYSTEMS

OLTP vs. OLAP
Online Transaction Processing (OLTP) = relational database systems
Online Analytical Processing (OLAP)

Category         OLTP                             OLAP
Data storage     3NF tables                       Multidimensional cubes
Indexes          Few                              Many
Joins            Many                             Minimal
Duplicated data  Normalized, limited duplication  De-normalized DB
Updates          Continuous, small data sets      At intervals, large data sets
Queries          Specific                         Ad hoc

OLAP via Data Warehousing
• Online Transaction Processing (OLTP): querying databases with 3NF tables.
• Inputs: operations' data; flat files.
• Periodical transfers.
• Online Analytical Processing (OLAP); data warehousing; data mining. Usually de-normalized data.
• Outputs: predefined reports; interactive data analysis.

OLTP & OLAP in Enterprise Systems
• Enterprise Systems (Enterprise Resource Planning Systems) support both.
• Example: An SAP-based system can be a TPS, MIS and DSS for the entire organization. DSS capability draws on data warehousing & cubing.
• Process approach to organization, with data flowing smoothly end-to-end. Processes link up horizontally (department-to-department) and vertically (process-sub-process).
• Business process* is for the most part the system process.

Data Warehousing Goals
Data warehouse (DW):
• Integrates data from different sources to get a larger picture of the business
• Yields a multidimensional view of data by creating data cubes
• Allows for statistical analysis on large data sets (testing hypotheses on relationships between pieces of data)
• Allows for discovering new relationships by querying cubes or by applying data mining software

Extraction, Transformation, and Loading
• Preparations performed on data: the ETL process
1. Extract: transaction data from diverse systems (e.g., Customers).
2. Transform/Transport:
   - Convert “Client” to “Customer”
   - Apply standard product numbers
   - Convert currencies
   - Fix region codes
3. Load: into the data warehouse, where all data must be consistent.
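
The ETL steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source systems, their field names, the exchange rate, and the product-number format are all hypothetical and do not come from the slides.

```python
# Hypothetical rows from two operational systems (illustrative data only).
crm_rows = [{"Client": "Acme", "region": "mb", "product": "p-001",
             "amount": 100.0, "currency": "USD"}]
erp_rows = [{"Customer": "Bolt", "region": "MB ", "product": "P001",
             "amount": 50.0, "currency": "CAD"}]

USD_TO_CAD = 1.30  # assumed fixed rate, for illustration only


def extract():
    """Extract: gather rows from all source systems."""
    return crm_rows + erp_rows


def transform(row):
    """Transform: make one row consistent with the warehouse conventions."""
    customer = row.get("Customer", row.get("Client"))   # convert "Client" to "Customer"
    product = row["product"].upper().replace("-", "")   # apply standard product numbers
    region = row["region"].strip().upper()              # fix region codes
    rate = USD_TO_CAD if row["currency"] == "USD" else 1.0
    amount = round(row["amount"] * rate, 2)             # convert currencies
    return {"Customer": customer, "Region": region,
            "Product": product, "AmountCAD": amount}


def load(rows, warehouse):
    """Load: append the cleaned rows to the warehouse table."""
    warehouse.extend(rows)


warehouse = []
load([transform(r) for r in extract()], warehouse)
for row in warehouse:
    print(row)
```

After the run, both rows use the same column name, region code, product number format, and currency, which is the consistency requirement stated for the warehouse.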

Three-Dimensional View of Data: Cube
• Created in a data warehouse
• Figure labels: products P1 to P5; months in year; sales at location.
• Logic similar to a crosstab query and a pivot table.
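
A minimal sketch of the cube idea, assuming three dimensions (product, month, location) and a sales measure. The records are made up for illustration. Collapsing one dimension gives the crosstab/pivot view the slide compares the cube to.

```python
from collections import defaultdict

# Hypothetical sales records: (product, month, location, sales amount).
sales = [
    ("P1", 1, "North", 100.0),
    ("P1", 1, "South", 40.0),
    ("P1", 2, "North", 60.0),
    ("P2", 1, "North", 25.0),
    ("P2", 2, "South", 75.0),
]

# The cube: one cell per (product, month, location) combination.
cube = defaultdict(float)
for product, month, location, amount in sales:
    cube[(product, month, location)] += amount


def crosstab(cube):
    """Collapse the location dimension into a product-by-month crosstab."""
    table = defaultdict(float)
    for (product, month, _location), amount in cube.items():
        table[(product, month)] += amount
    return dict(table)


by_product_month = crosstab(cube)
print(by_product_month[("P1", 1)])  # sales of P1 in month 1 across all locations
```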

Data Hierarchy
Levels: Year > Quarter > Month > Week > Day
• Roll-up: to get higher-level totals
• Drill-down: to get lower-level details
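
Roll-up and drill-down over the time hierarchy can be sketched as re-aggregating the same daily data at a different level. The daily figures below are hypothetical, and the week level is left out to keep the sketch short.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales (the lowest level of the hierarchy).
daily_sales = {
    date(2018, 1, 5): 100.0,
    date(2018, 1, 20): 50.0,
    date(2018, 2, 3): 70.0,
    date(2018, 4, 10): 30.0,
}


def level_key(day, level):
    """Map a day to its ancestor at the requested hierarchy level."""
    if level == "year":
        return (day.year,)
    if level == "quarter":
        return (day.year, (day.month - 1) // 3 + 1)
    if level == "month":
        return (day.year, day.month)
    if level == "day":
        return (day.year, day.month, day.day)
    raise ValueError(f"unknown level: {level}")


def roll_up(daily, level):
    """Total the daily amounts at the given level."""
    totals = defaultdict(float)
    for day, amount in daily.items():
        totals[level_key(day, level)] += amount
    return dict(totals)


by_quarter = roll_up(daily_sales, "quarter")  # roll-up: higher-level totals
by_month = roll_up(daily_sales, "month")      # drill-down: one level of detail lower
print(by_quarter)
print(by_month)
```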

Data Warehouse Tables: Star Design
Tables:
• Product (dimension): ProductID, Price
• Customer (dimension): CustomerID, Detail
• Location (dimension): LocationID, Detail
• Sale: SaleDate, Quantity, Discount, StoreID
• Fact table*: calculated fact Revenue = Price * Quantity, per customer, product, period
Notes on the fact table:
• Inputted from Product, Sale & Customer; most dimensions replicated in the fact table.
• Revenue is broken down by product, sales location, and desired time period (time column/s: day of year, or even smaller; basis for roll-up).
• New keys are usually used in the fact table (e.g., SaleTbl#-Row#).
Design is:
- Hierarchical (dimension tables have no direct association)
- De-normalized (fact table): Price & Quantity inputted to the fact table
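
A small star schema can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the fact table name SalesFact, the sample rows, and the city names are hypothetical. Only the dimension names and the Revenue = Price * Quantity fact follow the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product  (ProductID  INTEGER PRIMARY KEY, Price  REAL);
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Detail TEXT);
CREATE TABLE Location (LocationID INTEGER PRIMARY KEY, Detail TEXT);
-- Fact table (hypothetical name): one key per dimension, plus Price and
-- Quantity copied in so Revenue needs no join back to Product.
CREATE TABLE SalesFact (
    ProductID  INTEGER REFERENCES Product,
    CustomerID INTEGER REFERENCES Customer,
    LocationID INTEGER REFERENCES Location,
    SaleDate   TEXT,
    Price      REAL,
    Quantity   INTEGER
);
INSERT INTO Product  VALUES (1, 10.0), (2, 4.0);
INSERT INTO Customer VALUES (1, 'Retail'), (2, 'Wholesale');
INSERT INTO Location VALUES (1, 'Winnipeg'), (2, 'Brandon');
INSERT INTO SalesFact VALUES
    (1, 1, 1, '2018-01-15', 10.0, 3),
    (2, 1, 1, '2018-01-20',  4.0, 5),
    (1, 2, 2, '2018-02-02', 10.0, 2);
""")

# Revenue by sales location and month: the fact table joins to one dimension.
rows = conn.execute("""
    SELECT l.Detail,
           strftime('%Y-%m', f.SaleDate) AS Period,
           SUM(f.Price * f.Quantity)     AS Revenue
    FROM SalesFact f
    JOIN Location l ON l.LocationID = f.LocationID
    GROUP BY l.Detail, Period
    ORDER BY l.Detail, Period
""").fetchall()

for row in rows:
    print(row)
```

The dimension tables never join to each other here; every query goes through the fact table, which is what gives the design its star shape.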

Data Warehouse for Tyson Foods
TYSON FROZEN FOODS: CENTRAL FACT AND DIMENSIONS
Dimension tables (truncated) provide inputs for “facts” (calculated attributes) in the Fact table.

Central Fact
  PK DIMID_PRODUCT
  PK DIMID_CUSTOMER
  PK DIMID_TIME
  PK DIMID_COPATIME
  PK DIMID_ORGANIZATION
  PK DIMID_DISTRIBUTION
  VV100-CASES
  VV101-UNIT OF MEASURE
  VV102-GROSS SALES
  VV105-PRODUCT MFG
  VV301-GEN DIV ADMIN COST
  VV403-ADVERTISING
  VV407-GUARANTEED LOANS
  VV409-INTERCO COGS
  VV412-INTERCO COST ADJ
  VV424-PRODUCT RELAT EXP
  VV426-SALES RELATED EXP
  VV433-PRICING PROMO
  VVA01-SURCHARGE
  VVA02-OFF INVOICE DISC
  VVA04-BILLBACK
  VVA06-SALES ADJUST
  VVA07-PRIMARY BROKERAGE
  VVA10-SGML EXPENSE
  VVA11-SHUTTLE FRT
  VVA13-OUTSIDE FREEZR COST
  VVA14-TYSN FREEZR CHARGE
  VVA15-DIRECT MAILING
  VVA16-SALES & MKT EXP
  VVA18-GEN AND ADMIN
  VVA19-G AND A
  VVB01--ACCRUED ADVERTISING
  VVB02-ACCRUED MKT
  VVB09-INTERCO SALES
  VVB10-STEVEDORING
  VVB12--SALES ACCRUAL
  VVB19--FREIGHT FWD
  VVB20-INCENTIVE BROKERAGE
  VVB32-GENERAL
  VVB34-R AND D ACCRUAL
  VVB35-REGULAR COOP
  VVB39-SAMS UPCHARGE
  VVB41-SPECIAL COOP CHARGES
  VVB42-SLOTTING
  VVB47-AD HOC PROGRAMS
  VVJ13-INVENTORY ADJUSTMENT
  VVJ17-CONTRACT COST DISCOUNT

CUSTOMER DIMENSION
  PK DIMID_CUSTOMER
  SID_CUSTOMER
  **NOTE-- SEE DRAWING B

COPATIME
  PK DIMID_COPATIME
  SID_COPATIME
  WWFWK-FISC WK
  WWFWE-FISC WK END

ORGANIZATION DIMENSION
  PK DIMID_ORGANIZATION
  SID_ORGANIZATION
  PRCTR- PROFIT CENTER
  WWPRS-PRICING SEGMENT
  WWMLG-BUSINESS DIVISION

PRODUCT DIMENSION
  PK DIMID_PRODUCT
  SID_PRODUCT
  WWML3-MINOR PRODUCT LINE
  *NOTE-- SEE DRAWING A

DISTRIBUTION DIMENSION
  PK DIMID_DISTRIBUTION
  SID_DISTRIBUTION
  VKBUR-SALES PERSON
  ZZSBK- SECONDARY BROKER
  ZZPBK- PRIMARY BROKER
  BZIRK- SALES DISTRICT

TIME DIMENSION
  PK DIMID_TIME
  SID_TIME
  0CALMONTH
  0CALQUART1
  0CALYEAR
  0FISCPER
  0FISCVARNT
  0CALWEEK

Data Warehouse Tables: Snowflake Design
Fact table:
• OLAPItems: MerchTblRow, SaleTblRow, Price, Quantity
Dimension tables:
• Product: ItemID, Description, Price, Category
• Sale: SaleID, SaleDate, CustomerID, Discount, SalesTax
• Customer: CustomerID, Phone, FirstName, LastName, Address, ZipCode, CityID
• City: CityID, ZipCode, City, State
Design is:
- Network-like (dimension tables can connect directly)
- Still partly normalized (Sale-Customer-City)
Advantage: Design of the fact table is simpler (Customer, City out); faster processing in a narrower scope.

SAP Data Warehouse
• Figure labels: data warehouse cube; details (can also be dimensions).

Multidimensional View of Data: Precursors to DW (Excel Pivot Table)
• Row dimension: Last Name
• Column dimensions: Quarter, Month
• Facts (key figures, measures): Sum of Animal, Sum of Merchandise

LastName   EmployeeID  Data                Quarter 1  Quarter 2  Quarter 3  Quarter 4  Grand Total
Carpenter  8           Sum of Animal        1,668.91     606.97     426.39       7.20     2,709.47
                       Sum of Merchandise     324.90      78.30      99.00     128.70       630.90
Eaton      6           Sum of Animal          522.37          -     341.85     562.50     1,426.72
                       Sum of Merchandise      30.60          -      54.90     107.10       192.60
Farris     7           Sum of Animal        5,043.36   1,059.70          -     796.47     6,899.53
                       Sum of Merchandise     826.92     188.10          -     306.00     1,321.02
Gibson     2           Sum of Animal        4,983.51   1,549.83          -   2,556.10     9,089.44
                       Sum of Merchandise     668.25     238.50          -     450.90     1,357.65
Hopkins    4           Sum of Animal        3,747.96   1,194.88     372.65     128.41     5,443.90
                       Sum of Merchandise     476.91     252.90     121.50       7.20       858.51
James      5           Sum of Animal        3,282.77   2,373.08     437.88     150.11     6,243.84
                       Sum of Merchandise     505.89     693.45      99.00      99.00     1,397.34
O'Connor   9           Sum of Animal        2,643.69     180.91     510.12          -     3,334.72
                       Sum of Merchandise     263.70      83.70      55.80          -       403.20
Reasoner   3           Sum of Animal        4,577.43     625.74     589.68   2,500.24     8,293.09
                       Sum of Merchandise     762.30      89.10     116.80     396.90     1,365.10
Reeves     1           Sum of Animal        1,120.93          -          -          -     1,120.93
                       Sum of Merchandise     263.88          -          -          -       263.88
Shields    10          Sum of Animal        1,008.76          -     162.15          -     1,170.91
                       Sum of Merchandise      62.10          -      22.50          -        84.60
Total                  Sum of Animal       28,599.69   7,591.11   2,840.72   6,701.03    45,732.55
                       Sum of Merchandise   4,185.45   1,624.05     569.50   1,495.80     7,874.80

• Can place data in rows or columns.
• By grouping months, can instantly get quarterly or monthly totals.

Multidimensional View of Data: CUBE
Option in SQL:1999

SELECT Category, Month, Sum, GROUPING (Category) AS Gc,
       GROUPING (Month) AS Gm
FROM …
GROUP BY CUBE (Category, Month...)

Category  Month   Amount   Gc  Gm
Bird      1       135.00   0   0
Bird      2       45.00    0   0
…
Bird      (null)  32.00    0   0
Bird      (null)  607.50   0   1
Cat       1       396.00   0   0
Cat       2       113.85   0   0
…
Cat       (null)  1293.30  0   1
(null)    1       1358.8   1   0
(null)    2       1508.94  1   0
(null)    3       2362.68  1   0
…
(null)    (null)  8451.79  1   1

GROUPING returns 1 when the (null) in that column marks a subtotal row produced by CUBE, and 0 when the (null) is a missing value in the data (compare the two Bird rows with a null Month).
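
What GROUP BY CUBE computes can be sketched in plain Python: every row contributes to each combination of kept and aggregated-away columns, and the flags play the role of Gc and Gm. The four input rows are a hypothetical subset (the slide's source rows are elided), so the totals differ from the table above.

```python
from collections import defaultdict
from itertools import product

# Hypothetical sale rows: (category, month, amount).
# A month of None is a genuinely missing value in the data.
sales = [
    ("Bird", 1, 135.0),
    ("Bird", 2, 45.0),
    ("Bird", None, 32.0),
    ("Cat", 1, 396.0),
]


def group_by_cube(rows):
    """Return {(category, month, gc, gm): total} for every CUBE grouping."""
    totals = defaultdict(float)
    for category, month, amount in rows:
        # Each column is either kept (flag 0) or aggregated away (flag 1).
        for gc, gm in product((0, 1), repeat=2):
            key = (None if gc else category, None if gm else month, gc, gm)
            totals[key] += amount
    return dict(totals)


cube = group_by_cube(sales)
print(cube[("Bird", None, 0, 0)])  # null month present in the data
print(cube[("Bird", None, 0, 1)])  # subtotal for Bird over all months
print(cube[(None, None, 1, 1)])    # grand total
```

The two Bird keys with a None month differ only in the last flag, which is the distinction the GROUPING function exposes in SQL.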

GROUPING SETS: Hiding Details
SELECT Category, Month, Sum
FROM …
GROUP BY GROUPING SETS
  ( ROLLUP (Category),
    ROLLUP (Month),
    () )

Category  Month   Amount
Bird      (null)  607.50
Cat       (null)  1293.30
…
(null)    1       1358.8
(null)    2       1508.94
(null)    3       2362.68
…
(null)    (null)  8451.79
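
The effect of GROUPING SETS (only the requested totals, no detail rows) can be sketched as one aggregation pass per grouping set. This sketch uses the plain sets (Category), (Month) and (), which yield the kind of output shown above. It does not model the nested ROLLUP syntax, and the input rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical sale rows: (category, month, amount).
sales = [("Bird", 1, 135.0), ("Bird", 2, 45.0),
         ("Cat", 1, 396.0), ("Cat", 2, 113.85)]
COLUMNS = ("Category", "Month")


def group_by_grouping_sets(rows, grouping_sets):
    """Aggregate once per grouping set; columns not in the set become None."""
    result = []
    for kept in grouping_sets:
        totals = defaultdict(float)
        for row in rows:
            key = tuple(row[i] if name in kept else None
                        for i, name in enumerate(COLUMNS))
            totals[key] += row[2]
        result.extend((cat, month, round(total, 2))
                      for (cat, month), total in totals.items())
    return result


summary = group_by_grouping_sets(sales, [("Category",), ("Month",), ()])
for line in summary:
    print(line)
```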

SQL RANK Functions
SELECT Employee, SalesValue,
       RANK() OVER (ORDER BY SalesValue DESC) AS rank,
       DENSE_RANK() OVER (ORDER BY SalesValue DESC) AS dense
FROM Sales
ORDER BY SalesValue DESC, Employee;

Employee  SalesValue  rank  dense
Jones     18,000      1     1
Black     16,000      2     2
Smith     16,000      2     2
White     14,000      4     3

DENSE_RANK does not skip numbers after a tie; RANK does.
• Advances in SQL such as these motivate DBMS vendors to support OLAP and data warehousing.
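
The ranking query can be run as-is against an in-memory SQLite database from Python, assuming the bundled SQLite is version 3.25 or newer (the first release with window functions). The alias rnk is used instead of rank only to stay clear of keyword clashes in other DBMSs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Employee TEXT, SalesValue INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?)",
    [("Jones", 18000), ("Smith", 16000), ("Black", 16000), ("White", 14000)],
)

rows = conn.execute("""
    SELECT Employee, SalesValue,
           RANK()       OVER (ORDER BY SalesValue DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY SalesValue DESC) AS dense
    FROM Sales
    ORDER BY SalesValue DESC, Employee
""").fetchall()

for row in rows:
    print(row)
```

Black and Smith tie at 16,000 and share rank 2; White then gets rank 4 under RANK but 3 under DENSE_RANK.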

Broader Data Analysis with Data Mining
• Goal: To discover unknown relationships in the data that can be used to make better decisions.
• Exploratory analysis.
• A bottom-up approach that scans the data to find relationships.
• Some statistical routines, but they are not sufficient:
   - Statistics relies on averages
   - Sometimes the important data lies in more detailed pairs
• Supervised by developer vs. unsupervised (self-organizing artificial neural networks)

Common Techniques for Data Mining
1. Classification/Prediction
2. Association Rules/Market Basket Analysis
3. Clustering
• Some are based heavily on classical statistics (1), others use specialized mining software (2), yet others combine statistics with specialized mining software (3).
• Currently, these techniques are considered part of data analytics.

1. Classification (Prediction)
• Purpose: “Classify” things that are causes and those that are effects.
• Examples:
   - Which borrowers/loans are most likely to be successful?
   - Which customers are most likely to want a new item?
   - Which companies are likely to file bankruptcy?
   - Which workers are likely to quit in the next six months?
   - Which startup companies are likely to succeed?
   - Which tax returns are fraudulent?

Classification Process
• Clearly identify the outcome/dependent variable.
• Identify potential variables that might affect the outcome.
• Use sample data to test and validate the model.
• Run the data through the model.
• Regression/correlation analysis, decision trees & tables (below).

Income  Credit History  Job Stability  Credit Success
50000   Good            Good           Yes
75000   Mixed           Bad            No
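
The process can be sketched with the two sample rows above. The outcome variable is Credit Success, and the "model" is a single hand-written decision rule. That rule is a hypothetical stand-in for a decision table, not something fitted to real data, and two rows are far too few to validate anything in practice.

```python
from typing import NamedTuple


class Applicant(NamedTuple):
    income: int
    credit_history: str
    job_stability: str
    credit_success: str  # known outcome (the dependent variable)


# Sample data taken from the table above.
sample = [
    Applicant(50000, "Good", "Good", "Yes"),
    Applicant(75000, "Mixed", "Bad", "No"),
]


def predict(applicant):
    """Hypothetical one-rule decision table, for illustration only."""
    good = (applicant.credit_history == "Good"
            and applicant.job_stability == "Good")
    return "Yes" if good else "No"


def accuracy(rows):
    """Validate the model: share of rows where the prediction matches."""
    hits = sum(predict(r) == r.credit_success for r in rows)
    return hits / len(rows)


print(accuracy(sample))
```

Note that the rule ignores income: in this tiny sample the higher-income applicant is the unsuccessful one, so income alone would misclassify both rows.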

2. Association/Market Basket
• Purpose: Determine what events or items go together/co-occur.
• Examples:
   - What items are customers likely to buy together? (Business use: Consider putting the two together to increase cross-selling.)

Association Challenges
• If an item is rarely purchased, any other item bought with it seems important. So combine items into categories.
• Some relationships are obvious.
   - Burger and fries.
• Some relationships are puzzling/meaningless.
   - A hardware store found that toilet rings sell well only when a new store first opens. But what does it mean?
• Caution applies to data analytics: mere relationships without background knowledge can be misleading.
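
Two standard measures for association rules, support and confidence, make the rare-item problem concrete. The baskets below are invented for illustration. The function names are not from any particular mining package.

```python
# Hypothetical market baskets (one set of items per transaction).
baskets = [
    {"burger", "fries"},
    {"burger", "fries", "soda"},
    {"burger", "soda"},
    {"fries"},
    {"caviar", "soda"},
]


def support(items, baskets):
    """Share of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets with the antecedent, the share that also has the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


print(support({"burger", "fries"}, baskets))        # how often the pair co-occurs
print(confidence({"burger"}, {"fries"}, baskets))   # burger -> fries
print(confidence({"caviar"}, {"soda"}, baskets))    # rare item: a single basket
```

The caviar rule has perfect confidence but rests on one basket, which is why a rarely purchased item makes anything bought with it look important.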

3. Cluster Analysis
• Purpose: Determine groups of people or other entities.
• Examples:
   - Are there groups of customers? (If so, we could target them; market segmentation.)
   - Do the locations for our stores have elements in common? (If so, we can search for similar clusters for new locations.)
   - Do employees have common characteristics? (If so, we can hire similar, or dissimilar, people.)
• Figure labels: large intercluster distance; small intracluster distance.
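
The two distances in the figure can be computed for a toy example. The points and their cluster assignments are made up; the sketch only measures how good a given grouping is, it does not find the clusters. It assumes Python 3.8 or newer for math.dist.

```python
from itertools import combinations
from math import dist

# Hypothetical 2-D points, already assigned to two clusters.
clusters = {
    "A": [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)],
    "B": [(8.0, 8.0), (9.0, 8.0), (8.0, 9.0)],
}


def centroid(points):
    """Coordinate-wise mean of the points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def mean_intracluster_distance(points):
    """Average distance between all pairs of points in one cluster."""
    pairs = list(combinations(points, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


intra = {name: mean_intracluster_distance(pts) for name, pts in clusters.items()}
inter = dist(centroid(clusters["A"]), centroid(clusters["B"]))

print(intra)  # small values: members of a cluster sit close together
print(inter)  # large value: the two clusters are far apart
```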