Slides - Zhangxi Lin - Texas Tech University

ISQS 6339, Business Intelligence
Creating Data Marts
Zhangxi Lin
Texas Tech University
Outline
Illustrative Example : Adventure Works Cycles (AWC)
 Hands-on Case: Maximum Miniature Manufacturing
 Data Warehousing with Microsoft SQL Server 2008

 Exercise 2

Types of Dimension
 Exercise 3
ILLUSTRATIVE EXAMPLE :
ADVENTURE WORKS CYCLES
(AWC)
Adventure Works Cycles (AWC)
A fictitious multinational manufacturer
and seller of bicycles and accessories
 Based in Bothell, Washington, USA and
has regional sales offices in several
countries


http://www.msftdwtoolkit.com/
ISQS 6339, Data Mgmt & BI, Zhangxi
Lin
Basic Business Information
Product orders by category
 Product Orders by Country/Region
 Product Orders by Sales Channel
 Customers by Sales Channel Snapshot

AWC Business Requirements - Interview summary
Interviewee: Brian Welker, VP of Sales
Sales to resellers: $37 million last year
17 people report to him, including 3 regional sales managers
Previous problem: Hard to get information out of the company’s system
Major analytic areas:
 Sales planning
 Growth analysis
 Customer analysis
 Territory analysis
Sales performance
Basic sales reporting
Price lists
Special offers
Customer satisfaction
International support
Success criteria
 Easy data access, flexible reporting and analyzing, all data in one place

What’s missing? – A lot – No indication of business value
Business Processes
Purchase Orders
 Distribution Center Deliveries
 Distribution Center Inventory
 Store Deliveries
 Store Inventory
 Store Sales

Analytic Themes

See the Excel file
\\TechShare\coba\d\isqs3358\Repository\AWC\
AW_Analytic_Themes_List.xls

SQL Server 2008 R2 – Data Warehousing
Scaling and Performance 41’28”
AWC’s Bus Matrix
[Table: bus matrix marking with an X which dimensions apply to each business process.
Business processes: Sales Forecasting, Orders, Call Tracking, Returns.
Dimensions: Date, Product, Employee, Customer (Reseller), Customer (Internet), Sales Territory, Currency, Channel, Promotion, Call Reason, Facility.]
Prioritization Grid
[Figure: grid placing candidate business processes by Business Value / Impact (Low to High) against Feasibility (Low to High).
Processes shown: Customer Profitability, Orders, Orders Forecast, Product Profitability, Call Tracking, Manufacturing Costs, Exchange Rates, Returns.]
HANDS-ON CASE:
MAXIMUM MINIATURE
MANUFACTURING
Maximum Miniatures Manufacturing –
Designing Data Mart


General business needs
◦ To analyze the statistics available from the
manufacturing automation systems. The VP would like an
interactive analysis tool, rather than printed reports, for
the analysis.
The manufacturing automation system controls all the
machines to create figurines
◦ Filling a mold with the raw material
◦ Aiding the hardening of this material
◦ Removal from the mold when hardening is complete
◦ Computerized painting of the figurines
◦ Curing the paint if necessary
Maximum Miniatures Manufacturing –
Creating Data Mart

Specific Business Needs
◦ Analyzing the following numbers
 Dollar value of products sold
 Number of products sold
 Sales tax charged on products sold
 Shipping charged on products sold
◦ These numbers should be viewable by:
 Store
 Sales Promotion
 Product
 Day, Month, Quarter, and Year
 Customer
 Sales Person
Data Requirements








Number of accepted products by batch by product by machines
by day
Number of rejected products by batch by product by machines
by day
Elapsed time for molding and hardening by product by machine
by day
Elapsed time for painting and curing by curing type by product
by machine by day
Product rolls up into product subtype, which rolls up into
product type
Machine rolls up into machine type, which rolls up into country
Day rolls up into month, which rolls up into quarter, which rolls
up into year
The information should be able to be filtered by machine
manufacturer and purchase date of the machine
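The time roll-up described above (day into month, quarter, and year) can be sketched in a few lines of Python. This is only an illustration of the idea: the sample rows, product names, and the helper names `time_key` and `roll_up` are assumptions, not part of the Maximum Miniatures case.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily counts of accepted products: (day, product, accepted).
daily_accepted = [
    (date(2008, 1, 15), "Product A", 120),
    (date(2008, 2, 3), "Product A", 80),
    (date(2008, 4, 9), "Product A", 50),
    (date(2008, 4, 9), "Product B", 30),
]


def time_key(day, level):
    """Return the key of `day` at the requested level of the time hierarchy."""
    quarter = (day.month - 1) // 3 + 1
    keys = {
        "day": (day.year, quarter, day.month, day.day),
        "month": (day.year, quarter, day.month),
        "quarter": (day.year, quarter),
        "year": (day.year,),
    }
    return keys[level]


def roll_up(rows, level):
    """Sum accepted products at one level of the day > month > quarter > year hierarchy."""
    totals = defaultdict(int)
    for day, _product, accepted in rows:
        totals[time_key(day, level)] += accepted
    return dict(totals)


by_month = roll_up(daily_accepted, "month")
by_quarter = roll_up(daily_accepted, "quarter")
by_year = roll_up(daily_accepted, "year")
print(by_quarter)
```

The same pattern would apply to the product and machine hierarchies; only the key function changes.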
Business Need of Sales


The VP of sales for Max Min, Inc. would like to analyze
sales information. This information is collected by three
OLTP systems: the Order Processing System, the Point
of Sale (POS) system, and the MaxMin.com Online
system.
To analyze the following numbers
◦ Dollar value of products sold
◦ Number of products sold
◦ Sales tax charged on products sold
◦ Shipping charged on products sold
These numbers should be viewable by: store, sales
promotion, product, time, customer, sales person
Snowflake Schema of the Data Mart
DimBatch
ManufacturingFact
DimMachine
DimProduct
DimMachineType
DimPlant
DimMaterial
DimCountry
DimProductSubType
DimProductType
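As a rough illustration of what "snowflake" means here, the sketch below builds only the product branch of the schema (DimProduct, DimProductSubType, DimProductType) in an in-memory SQLite database and rolls a measure up to product type. The table names come from the slide. Every column name except ProductCode and DateOfManufacture, the sample rows, and the use of SQLite in place of SQL Server are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Outer snowflake tables: each level of the hierarchy gets its own table.
CREATE TABLE DimProductType (
    ProductTypeCode INTEGER PRIMARY KEY,
    ProductTypeName TEXT);
CREATE TABLE DimProductSubType (
    ProductSubTypeCode INTEGER PRIMARY KEY,
    ProductSubTypeName TEXT,
    ProductTypeCode INTEGER REFERENCES DimProductType (ProductTypeCode));
CREATE TABLE DimProduct (
    ProductCode INTEGER PRIMARY KEY,
    ProductName TEXT,
    ProductSubTypeCode INTEGER REFERENCES DimProductSubType (ProductSubTypeCode));
-- Simplified fact table (assumed columns; the real one has more keys).
CREATE TABLE ManufacturingFact (
    ProductCode INTEGER REFERENCES DimProduct (ProductCode),
    DateOfManufacture TEXT,
    AcceptedProducts INTEGER);

INSERT INTO DimProductType VALUES (1, 'Type X'), (2, 'Type Y');
INSERT INTO DimProductSubType VALUES
    (10, 'Subtype X1', 1), (11, 'Subtype X2', 1), (20, 'Subtype Y1', 2);
INSERT INTO DimProduct VALUES
    (100, 'Product A', 10), (101, 'Product B', 11), (102, 'Product C', 20);
INSERT INTO ManufacturingFact VALUES
    (100, '2008-01-15', 120), (101, '2008-01-15', 40), (102, '2008-01-16', 30);
""")

# Rolling up to product type walks the snowflake: fact -> product -> subtype -> type.
accepted_by_type = conn.execute("""
    SELECT t.ProductTypeName, SUM(f.AcceptedProducts)
    FROM ManufacturingFact AS f
    JOIN DimProduct AS p ON p.ProductCode = f.ProductCode
    JOIN DimProductSubType AS s ON s.ProductSubTypeCode = p.ProductSubTypeCode
    JOIN DimProductType AS t ON t.ProductTypeCode = s.ProductTypeCode
    GROUP BY t.ProductTypeName
    ORDER BY t.ProductTypeName
""").fetchall()
print(accepted_by_type)
```

In a star schema the subtype and type names would instead be columns of DimProduct, which removes two of the three joins at the cost of repeating values.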
DATA WAREHOUSING
WITH MICROSOFT SQL
SERVER 2008
Unified Dimensional Model (UDM)
A SQL Server 2008 technology
A UDM is a structure that sits over the top of a data mart and
looks exactly like an OLAP system to the end user.
Advantages
◦ No need for a data mart.
◦ Can be built over one or more OLTP systems.
◦ Can mix data mart and OLTP system data
◦ Can include data from other vendors’ databases and XML-formatted data
◦ Allows OLAP cubes to be built directly on top of transactional data
◦ Low latency
◦ Ease of creation and maintenance

Features
◦ Data sources
◦ Data views
◦ Proactive caching for preprocessed aggregates
Microsoft BI Toolset

Relational engine (RDBMS)
◦ T-SQL
◦ .NET Framework Common Language Runtime (CLR)

SQL Server Integration Services (SSIS) – ETL
◦ Data Transformation Pipeline (DTP)
◦ Data Transformation Runtime (DTR)

SQL Server Analysis Services (SSAS) – queries, ad hoc use, OLAP, data
mining
◦ Multidimensional Expressions (MDX) – a scripting language for data retrieval
from dimensional databases
◦ Dimension design
◦ Cube design
◦ Data mining


SQL Server Reporting Services (SSRS) – ad hoc query, report building
Microsoft Visual Studio .NET is the fundamental tool for application
development

Design Facts, Dimensions and Transformation/Load Processes 3’46”
Structure and Components of Business Intelligence
[Figure: MS SQL Server 2008 (SSMS, SSIS, SSAS, SSRS, BIDS) shown alongside SAS EG and SAS EM]
OLAP Architecture
◦ Relational OLAP (ROLAP)
 Cube structure is stored in a multidimensional database.
 Measures are left in the relational data mart.
 The preprocessed aggregates are also stored in a relational
database table.
◦ Multidimensional OLAP (MOLAP)
 Cube structure and preprocessed aggregates are stored in a multidimensional
database
 A copy of the measures is also stored in the multidimensional database
 Faster, but copying the measures takes extra time, which adds latency
◦ Hybrid OLAP (HOLAP)
 Measures are stored in the relational database
Disadvantages of OLAP
 Complexity to administer
 Requires data mart
 Latency
 Read-only

Understanding the Cube Designer Tabs









Cube Structure: Use this tab to modify the architecture of a cube.
Dimension Usage: Use this tab to define the relationships between dimensions and
measure groups, and the granularity of each dimension within each measure group.
Calculations: Use this tab to examine calculations that are defined for the cube, to
define new calculations for the whole cube or for a subcube, to reorder existing
calculations, and to debug calculations step by step by using breakpoints.
KPIs: Use this tab to create, edit, and modify the Key Performance Indicators (KPIs) in a
cube.
Actions: Use this tab to create or modify drillthrough, reporting, and other actions for
the selected cube.
Partitions: Use this tab to create and manage the partitions for a cube. Partitions let you
store sections of a cube in different locations with different properties, such as
aggregation definitions.
Perspectives: Use this tab to create and manage the perspectives in a cube. A
perspective is a defined subset of a cube, and is used to reduce the perceived complexity
of a cube to the business user.
Translations: Use this tab to create and manage translated names for cube objects, such
as month or product names.
Browser: Use this tab to view data in the cube.
TYPES OF DIMENSION
Types of dimension

There are seven frequently referenced types of dimension
◦ Conformed dimensions
◦ Junk dimensions
◦ Role-playing dimensions
◦ Slowly changing dimensions (SCD)
◦ Aggregate dimensions
◦ Degenerate dimensions
◦ Many-to-many or multivalued dimensions
For more information about types of dimension, check The
Microsoft Data Warehouse Toolkit, Joy Mundy and
Warren Thornthwaite, Wiley, 2006
Conformed Dimensions




A set of data attributes that have been physically
implemented in multiple database tables using the same
structure, attributes, domain values, definitions and
concepts in each implementation.
Dimension tables are not conformed if the attributes
are labeled differently or contain different values.
Conformed dimensions come in several different flavors.
At the most basic level, conformed dimensions mean
exactly the same thing with every possible fact table to
which they are joined.
E.g. The date dimension table connected to the sales
facts and the one connected to the inventory facts.
Conformed Dimensions
Dimensions are conformed when they are either
exactly the same (including keys) or one is a perfect
subset of the other. Most important, the row headers
produced in the answer sets from two different
conformed dimensions must be able to match perfectly.
 Conformed dimensions are either identical or strict
mathematical subsets of the most granular, detailed
dimension.
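The "identical or strict subset" rule can be checked mechanically. The sketch below is one possible reading of that rule, not a standard API: the function name `is_conformed_subset`, the attribute names, and the sample rows are all assumptions. It treats a shrunken dimension as conformed only if its attributes carry the same labels as the detailed dimension and every one of its rows matches a combination of values found there.

```python
# Hypothetical detailed date dimension (one row per day).
date_dim = [
    {"date_key": 20080115, "month": "2008-01", "quarter": "2008-Q1", "year": 2008},
    {"date_key": 20080203, "month": "2008-02", "quarter": "2008-Q1", "year": 2008},
    {"date_key": 20080409, "month": "2008-04", "quarter": "2008-Q2", "year": 2008},
]

# A shrunken month-level dimension that reuses the same labels and values.
month_dim = [
    {"month": "2008-01", "quarter": "2008-Q1", "year": 2008},
    {"month": "2008-04", "quarter": "2008-Q2", "year": 2008},
]

# Same attribute names, but the quarter value is written differently.
different_values_dim = [{"month": "2008-01", "quarter": "Q1 2008", "year": 2008}]

# Same values, but one attribute is labeled differently.
different_labels_dim = [{"mon": "2008-01", "quarter": "2008-Q1", "year": 2008}]


def is_conformed_subset(detailed, shrunken):
    """True if `shrunken` uses only attributes and value combinations found in `detailed`."""
    attrs = sorted(shrunken[0])
    if not set(attrs) <= set(detailed[0]):
        return False  # an attribute is labeled differently
    detailed_values = {tuple(row[a] for a in attrs) for row in detailed}
    return all(tuple(row[a] for a in attrs) in detailed_values for row in shrunken)


print(is_conformed_subset(date_dim, month_dim))
print(is_conformed_subset(date_dim, different_values_dim))
print(is_conformed_subset(date_dim, different_labels_dim))
```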

Junk Dimensions





Also called miscellaneous or mystery dimensions
They are miscellaneous attributes that don’t belong to any
existing dimension.
Typically flags or indicators that describe or categorize the
transaction in some way.
Contents are often important
Four alternatives for dealing with them
◦ Leave them in the fact table
◦ Create a separate dimension for each attribute
◦ Omit them
◦ Group them into a single junk dimension
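The last alternative, grouping the flags into a single junk dimension, might look like the sketch below. The three flags, their values, and the helper `junk_key` are hypothetical. The idea is that one row is generated per combination of flag values, and the fact table then stores a single junk key instead of three separate columns.

```python
from itertools import product

# Hypothetical low-cardinality flags that belong to no existing dimension.
flags = {
    "payment_type": ["Cash", "Credit"],
    "is_gift": [False, True],
    "order_channel": ["Store", "Web", "Phone"],
}
flag_names = list(flags)

# One junk-dimension row per combination of flag values, keyed 1..N.
junk_dim = {
    combo: key for key, combo in enumerate(product(*flags.values()), start=1)
}


def junk_key(**attrs):
    """Look up the junk-dimension key for one transaction's flag values."""
    return junk_dim[tuple(attrs[name] for name in flag_names)]


print(len(junk_dim))
print(junk_key(payment_type="Credit", is_gift=False, order_channel="Store"))
```

Pre-building every combination is only reasonable while the product of the flag cardinalities stays small; otherwise rows can be added as combinations are first seen.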
Degenerate Dimensions


A degenerate dimension is a dimension key in the
fact table that does not have its own dimension
table, because all the interesting attributes have
been placed in analytic dimensions.
Features
◦ No description of its own
◦ No joining to an actual dimension table
◦ No attributes

Example: transaction ID
Junk Dimension Example
Role-playing dimensions


A table with multiple valid relationships between itself
and a fact table is known as a role-playing dimension.
For instance, a “Time” dimension can be used for
“Order Day”, as well as “Ship Date”, or “Close Day”.
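In SQL, a role-playing dimension typically shows up as the same dimension table joined more than once under different aliases. The following is a minimal sketch using an in-memory SQLite database. The table and column names (DimDate, FactOrder, OrderDateKey, ShipDateKey) are assumed for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimDate (DateKey INTEGER PRIMARY KEY, FullDate TEXT);
-- The fact table references the same date dimension twice, once per role.
CREATE TABLE FactOrder (
    OrderID INTEGER PRIMARY KEY,
    OrderDateKey INTEGER REFERENCES DimDate (DateKey),
    ShipDateKey INTEGER REFERENCES DimDate (DateKey));
INSERT INTO DimDate VALUES (1, '2008-03-01'), (2, '2008-03-04');
INSERT INTO FactOrder VALUES (100, 1, 2);
""")

# Each alias of DimDate plays one role: order date or ship date.
rows = conn.execute("""
    SELECT f.OrderID, od.FullDate AS OrderDate, sd.FullDate AS ShipDate
    FROM FactOrder AS f
    JOIN DimDate AS od ON od.DateKey = f.OrderDateKey
    JOIN DimDate AS sd ON sd.DateKey = f.ShipDateKey
""").fetchall()
print(rows)
```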
Slowly Changing Dimensions
The dimensions that have changeable attribute
values are slowly changing dimensions (SCDs)
 The attribute values of an SCD may change over
time, and these changes are critical to understanding the
dynamics of the business. The ability to track
the changes of facts over time is critical to a
DW/BI system.
 Examples

◦ Employees change their departments
◦ Home moving (16.8% of Americans move per year) –
zip code changes possible.
Three Types of SCD
Type 1 SCD overwrites the existing attribute
value with a new value. You don’t care about
keeping track of historical values
 Type 2 SCD tracks changes – the ETL process
creates a new row in the dimension table to
capture the new values of the changed item
 Type 3 SCD – Similar to Type 2 SCD but only
tracks the current state and the original state; two
additional attributes: SCD Start Date, SCD Initial
Value
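The difference between Type 1 and Type 2 handling can be sketched as two small functions over an in-memory employee dimension. This is an illustration only. The column names (`employee_key` as surrogate key, `employee_id` as natural key, `start_date`, `end_date`, `is_current`) are common conventions assumed for the example, not the layout an SSIS Slowly Changing Dimension transformation would necessarily produce.

```python
from datetime import date


def apply_type1(dim_rows, employee_id, attr, new_value):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    for row in dim_rows:
        if row["employee_id"] == employee_id:
            row[attr] = new_value


def apply_type2(dim_rows, employee_id, attr, new_value, change_date):
    """Type 2: expire the current row and append a new row with a new surrogate key."""
    current = next(
        r for r in dim_rows if r["employee_id"] == employee_id and r["is_current"]
    )
    if current[attr] == new_value:
        return current  # nothing changed, keep the current row
    new_row = dict(current)
    new_row.update({
        attr: new_value,
        "employee_key": max(r["employee_key"] for r in dim_rows) + 1,
        "start_date": change_date,
        "end_date": None,
    })
    current["is_current"] = False
    current["end_date"] = change_date
    dim_rows.append(new_row)
    return new_row


employees = [{
    "employee_key": 1, "employee_id": "E42",
    "department": "Marketing", "phone": "555-0100",
    "start_date": date(2007, 1, 1), "end_date": None, "is_current": True,
}]

apply_type1(employees, "E42", "phone", "555-0199")  # history not needed
apply_type2(employees, "E42", "department", "Sales", date(2008, 6, 1))  # history kept
print(len(employees))
```

After the two calls the dimension holds two rows for the same employee: the expired Marketing row and the current Sales row, each with its own surrogate key, so older facts keep pointing at the department that was valid when they occurred.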

Aggregate Dimensions



Situation: data at different levels of granularities
Two resolutions
◦ Removing a dimension
◦ Rolling up a dimension’s hierarchy and providing a new, shrunken
dimension at the aggregate level
In the following case, the number of possible aggregates is the
number of levels in each hierarchy of each dimension multiplied
together.
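As a worked instance of that multiplication rule, the sketch below uses the three hierarchies from the Maximum Miniatures data requirements (product, machine, and time). It counts only the named levels, including the base level of each hierarchy, and does not add an "All" level; that counting convention is an assumption of this example.

```python
from itertools import product
from math import prod

# Levels of each hierarchy, from most detailed to most aggregated.
hierarchies = {
    "Product": ["product", "product subtype", "product type"],
    "Machine": ["machine", "machine type", "country"],
    "Time": ["day", "month", "quarter", "year"],
}

# Number of possible aggregates: 3 levels x 3 levels x 4 levels.
possible_aggregates = prod(len(levels) for levels in hierarchies.values())

# Each aggregate corresponds to one choice of level per dimension.
combinations = list(product(*hierarchies.values()))

print(possible_aggregates)
print(combinations[0], combinations[-1])
```

The first combination is the base grain of the fact table (product, machine, day) and the last is the most summarized one (product type, country, year).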
Many-to-many or Multivalued Dimensions



The relationship between a dimension table and a fact table is
normally one-to-many: one row in the dimension table may
join to many rows in the fact table.
In a many-to-many or multivalued dimension, more than one row
in the dimension table may join to a single row in the fact table,
while each dimension row may still join to many fact rows
A bridge table supports the many-to-many relationship:
◦ fact-dimension
◦ dimension-dimension.
Many-to-many or Multivalued Dimensions

A dimensional model for a sales fact that
captures multiple sales reasons
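A bridge table for the sales-reason case could be sketched as follows, again with an in-memory SQLite database. The table and column names (FactSales, DimSalesReason, SalesReasonBridge) and the sample rows are assumptions for illustration. The example also shows the usual caveat: because one order can have several reasons, totals by reason should not be summed into a grand total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimSalesReason (SalesReasonKey INTEGER PRIMARY KEY, ReasonName TEXT);
CREATE TABLE FactSales (OrderID INTEGER PRIMARY KEY, SalesAmount REAL);
-- Bridge table: one row per (order, reason) pair.
CREATE TABLE SalesReasonBridge (
    OrderID INTEGER REFERENCES FactSales (OrderID),
    SalesReasonKey INTEGER REFERENCES DimSalesReason (SalesReasonKey),
    PRIMARY KEY (OrderID, SalesReasonKey));
INSERT INTO DimSalesReason VALUES (1, 'Price'), (2, 'Promotion'), (3, 'Review');
INSERT INTO FactSales VALUES (100, 50.0), (101, 20.0);
-- Order 100 has two reasons, order 101 has one.
INSERT INTO SalesReasonBridge VALUES (100, 1), (100, 2), (101, 1);
""")

sales_by_reason = conn.execute("""
    SELECT r.ReasonName, SUM(f.SalesAmount)
    FROM FactSales AS f
    JOIN SalesReasonBridge AS b ON b.OrderID = f.OrderID
    JOIN DimSalesReason AS r ON r.SalesReasonKey = b.SalesReasonKey
    GROUP BY r.ReasonName
    ORDER BY r.ReasonName
""").fetchall()
total_sales = conn.execute("SELECT SUM(SalesAmount) FROM FactSales").fetchone()[0]

print(sales_by_reason)
print(total_sales)
```

Here the per-reason amounts add up to more than the true total because order 100 is counted under both of its reasons.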
EXERCISE 2
Exercise 2 – Creating a data mart with
SSMS

Learning Objectives
◦ How to design a dimensional model
◦ How to create a data mart with SSMS
◦ How to create a cube for a data mart.

Tasks
◦ Manually create the fact table and DimProduct table using SSMS (see the
detailed information from file DW_MMM.PDF in the shared directory
under \Docs)
◦ Import remaining tables from oredb.lin.mmm.empty
◦ Define the primary keys of tables and the relationships among them
◦ Create a cube
The primary key of the fact table is composed of three foreign keys
plus one time dimension key: ProductCode, BatchNumber,
MachineNumber, and DateOfManufacture.
 Deliverable:

◦ The printout of the screenshot of the cube structure and the success of the
deployment
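For orientation before working in SSMS, the composite primary key described above can be tried out in a throwaway SQLite database. This is only a sketch: the exercise itself uses SQL Server, the data types are simplified, and the measure column AcceptedProducts is an assumed name. Follow DW_MMM.PDF for the real table definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ManufacturingFact (
        ProductCode INTEGER NOT NULL,
        BatchNumber INTEGER NOT NULL,
        MachineNumber INTEGER NOT NULL,
        DateOfManufacture TEXT NOT NULL,
        AcceptedProducts INTEGER,  -- assumed measure column
        PRIMARY KEY (ProductCode, BatchNumber, MachineNumber, DateOfManufacture))
""")
conn.execute("INSERT INTO ManufacturingFact VALUES (100, 1, 7, '2008-01-15', 120)")

# A second row with the same four key values violates the composite key.
try:
    conn.execute("INSERT INTO ManufacturingFact VALUES (100, 1, 7, '2008-01-15', 5)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM ManufacturingFact").fetchone()[0]
print(duplicate_rejected, row_count)
```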
Hints for Deploying the OLAP Cube

Due to the security restrictions, you need to:
◦ Double click the entry in Data Source
◦ Type in your eRaider login information in the Impersonation Information panel
◦ Change the server to OREDB
The screenshot of impersonation
information
The properties of the project
After this step you can proceed to deploy the cube
Surrogate Key
A natural key is a value that has meaning to the
user and ought to be unique for every row. A
good example of a natural key would be a
license plate number for a car.
 A surrogate key is an artificial value that has no
meaning to the user, but is guaranteed to be
unique by the database itself.

 Surrogate keys are created when doing data
warehousing. They are new keys, distinct from the keys in the original
database
 They are also called meaningless keys, substitute keys,
non-natural keys, artificial keys
 Specifically, surrogate keys are used in slowly changing
dimension (SCD) management
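A minimal sketch of surrogate key assignment during ETL is shown below. The class name `SurrogateKeyMap` and the customer numbers are hypothetical; the source system names echo the Max Min sales systems. In SQL Server the key would more likely come from an IDENTITY column, so treat this as an illustration of the idea rather than the recommended mechanism.

```python
from itertools import count


class SurrogateKeyMap:
    """Hand out warehouse-generated integer keys for (source system, natural key) pairs."""

    def __init__(self):
        self._next_key = count(1)
        self._keys = {}

    def key_for(self, source_system, natural_key):
        pair = (source_system, natural_key)
        if pair not in self._keys:
            # First time this source row is seen: assign a new meaningless key.
            self._keys[pair] = next(self._next_key)
        return self._keys[pair]


customer_keys = SurrogateKeyMap()

# The same natural key 1001 appears in two source systems without clashing.
k1 = customer_keys.key_for("OrderProcessing", 1001)
k2 = customer_keys.key_for("POS", 1001)
k3 = customer_keys.key_for("OrderProcessing", 1001)  # repeat lookup, same key
print(k1, k2, k3)
```

Deciding that two source rows describe the same real-world customer is a separate matching step that this sketch does not attempt.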

Benefits of surrogate keys





Protect the DW/BI system from changes in the
source system
Allow the DW/BI system to integrate data from
multiple source systems
Enable developers to add rows to dimensions
that do not exist in the source system
Provide the means for tracking changes in
dimensions
Are efficient in the relational database and
Analysis Services
Heterogeneous Products
Several products with differentiated
attributes
 Problem: share one dimension or use
different dimensions?
 Resolutions

◦ One family-oriented dimension with core fact
and product tables plus specific information
for each line of product
EXERCISE 3
Exercise 3 – Creating a data mart with
SSAS
◦ Learning Objectives
 Learn an alternative way to create a data mart
 How to deploy a data mart
◦ Tasks:
 Create data mart MaxMinSales2014.<lastname> with BIDS from a cube template
 Deploy the data mart
◦ Deliverable:
 The printout of the screenshot of the cube structure and the success of the deployment
◦ Reference: DW pp. 119-130. Also check file MaxMinSalesDM.pdf in \Docs of the
shared network drive.
◦ View videos:
 Create data mart with BIDS 8’34”, Add in more attributes to the database 5’42”, Cube Deployment 3’11”
Cube structure
Data Mart Tables