Datawarehouse & Datamart

Welcome to MIS420!
Be ready: you are entering a zone where Murphy’s law works like gravity.
Module II: Designing Datamarts
Datawarehouse & Datamart
OLAPs vs. OLTPs
Dimensional Modeling
Creating Physical Design Using SQL Mgt. Studio
BI System Components
Transaction Data Source
• Flat Files
• Transactions DB (OLTP)
• XML Files
• Excel Files
• Etc.
Data Modeling Capability (AKA Datawarehousing Capability)
Data Repository (OLAP System)
• Datamart
• Data Warehouse
• Multidimensional Database - Cubes
Data Analysis and Visualization
• Cube Browsing
• Reporting
• Dashboards
• Data Mining
Course modules
• Module 1: Delivering BI (Chapters 1, 2, 10, 18, Larson book)
  - Creating KPI
  - Creating Reports
  - Excel and Tableau
• Module 2: Design a Datamart (Chapters 3 & 6, Larson book)
  - Requirement Analysis
  - Creating a Schema
  - SS DB Engine
• Module 3: Business Analytics (Chapters 4, 9, 10, Larson book)
  - Build an OLAP/Cube
  - SSA Services
• Module 4: Populate a DataMart (Chapters 7 & 8, Larson book)
  - ETL Process
  - SSI Services
Data Modeling Capability: Datamart component
Transaction Data Source
• Flat Files
• Transactions DB (OLTP)
• XML Files
• Excel Files
• Etc.
Data Repository (OLAP System)
• Datamart (the KJ##NID account has the datamart tables and schemas)
• Data Warehouse
• Multidimensional Database - Cubes
Data Analysis and Visualization
• Cube Browsing
• Reporting
• Dashboards
• Data Mining
Course modules
• Module 1: Delivering BI (Chapters 1, 2, 10, 18, Larson book)
  - Creating KPI
  - Creating Reports
  - Excel and Tableau
• Module 2: Design a Datamart (Chapters 3 & 6, Larson book)
  - Requirement Analysis
  - Creating a Schema
  - SS DB Engine: our own DBE account, KJ##NID. You will create all the datamarts for ICA#2 and HW#2 in this account, so use the naming convention wisely!
• Module 3: Business Analytics (Chapters 4, 9, 10, Larson book)
  - Build an OLAP/Cube
  - SSA Services
• Module 4: Populate a DataMart (Chapters 7 & 8, Larson book)
  - ETL Process
  - SSI Services
Outline
• Data Warehouse Concept
• OLAPs vs. OLTPs (fundamental differences that suggest the need for different design approaches)
• Dimensional Modeling
• Creating Physical Design Using SQL Mgt. Studio
Datawarehouse & Datamart
Concept and Characteristics
Data Warehouse
• A data warehouse is a “central” repository for all or significant parts of the data that an enterprise's various business systems collect.
• A warehouse is a collection of data that is
  1. subject-oriented,
  2. integrated,
  3. time-variant, and
  4. non-volatile.
• Provides a consolidated view of enterprise data, optimized for reporting and analysis.
• A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format.
• Data marts are smaller versions of warehouses.
OLAP vs OLTP
OLAP vs. OLTP
• Online Transaction Processing Systems (OLTP): Systems that process transactions (e.g., order processing) by inserting, updating, or deleting the appropriate records in a database at the end of each transaction.
• Online Analytical Processing Systems (OLAP): Systems that summarize and analyze a collection of transaction data.
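To make the contrast concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in database. The `orders` table, the `record_order` helper, and the sample values are all hypothetical and only illustrate the two workload styles: many small writes versus one summarizing read.

```python
import sqlite3

# Hypothetical order table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)

def record_order(customer, amount):
    """OLTP-style work: one small write per business transaction."""
    with conn:  # commits when the block exits without an error
        conn.execute(
            "INSERT INTO orders (customer, amount) VALUES (?, ?)",
            (customer, amount),
        )

record_order("Ann", 120.0)
record_order("Bob", 80.0)
record_order("Ann", 50.0)

# OLAP-style work: a read-only query that summarizes many transactions.
sales_by_customer = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(sales_by_customer)
```

In a real BI system the analytical query would typically run against a separate data mart rather than the transaction database itself.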
OLTP vs OLAP
• Relationship between OLTP and OLAP?
• Structural/design differences?
• Purpose/function differences?
• Difference in the type of data or information stored
• Size
  - Users
  - Data stored
• Performance metric?
OLTP vs OLAP
• Relationship between OLTP and OLAP? OLTP is a data source for OLAP.
• Structural/design differences?
  - ER modeling vs. dimensional modeling
  - ER design vs. star or snowflake design
  - ER design follows well-structured steps that have been used and tested for decades vs. star and snowflake design, which has been widely used for only a decade, is still unstructured, and has “rules” that are not well established
  - Application oriented vs. subject oriented
OLTP vs OLAP
• Purpose/function differences?
  - OLTP processes transactions vs. OLAP conducts analysis (performance, gaining insight)
  - OLTP focuses on transaction-processing efficiency vs. OLAP eases data retrieval in a way that is cognitively less overloading (allows “chunks” or “cubes” of data to be viewed)
  - OLTP processes repetitive transactions (insert, delete) and conducts simple manipulations (select, update) vs. OLAP examines (mostly read-only) many data items and complex relationships, and focuses on aggregates
  - OLTP views detailed, flat transactions vs. OLAP views multidimensional data and aggregates
OLTP vs OLAP
• Difference in the type of data or information stored
  - OLTP: current and isolated vs. OLAP: historic and consolidated
  - OLTP stores data specific to a transaction vs. OLAP stores data specific to performance
• Size
  - Users: OLTP has thousands of users vs. OLAP has hundreds or fewer users
  - Data stored: OLTP stores 100s of MB to GB vs. OLAP stores 100s of GB to TB
• Performance metric?
  - OLTP: transaction throughput vs. OLAP: query throughput
  - Data quality: “dirty” data is a major issue for OLAP
Dimensional Modeling
Modeling Technique used to design data warehouses and data marts
ER Modeling vs. Dimensional Modeling
ER Modeling
• Transaction capture
• Reduce data redundancy – highly normalized tables
• Hard for end users to understand and remember
• Not query friendly
• All the attributes for an entity, including categorical as well as numeric, belong to the entity table.
• Well-defined, theory-driven process
Dimensional Modeling
• Data retrieval
• Intuitive and high query performance
• Categorical data is in a 'dimension' entity, and the 'fact' entity has mostly numeric attributes.
• The only categorical (non-fact) fields in the fact table are the keys to dimension tables
• Process ill-defined…more of an art
Dimensional Modeling – Benefits
1. Produce database structures that are easy for end users to understand and write queries against.
2. Optimize query performance (as opposed to update performance).
3. Scalability - Dimensional models are scalable and “easily” accommodate unexpected new data.
Designing a Data Mart
• Conceptualize: Identify the information that the decision makers need - measures, dimensions, hierarchies, and attributes. (Group Deliverable I)
• Design & build the database structure for the data mart using either a star or snowflake schema. (Group Deliverable II)
Requirement Analysis – Decision Makers' Needs (GD#1)
• Business intelligence design must start with the decision makers
• What foundational and feedback information do they need?
• How do they need that information sliced and diced for proper analysis?
• More specifically:
  - What facts, figures, statistics, and so forth do you need for effective decision making? (measures)
  - How should this information be sliced and diced for analysis? (dimensions)
  - What additional information can aid in decision making? (attributes)
Data Mart – Structure
• A data mart’s structure consists of the following two types of data objects:
  - Performance measures (also referred to as facts)
  - Dimensions, which are further described by
    - Hierarchies
    - Attributes
Data Mart – Structure
• Performance Measures: A measure is a numeric quantity expressing some aspect of the organization's performance. The information represented by this quantity is used to support or evaluate the decision making and performance of the organization. A measure can also be called a fact. Example – Total Sales.
• Information needed during the design process
  1. Name of the measure
  2. What fields should be used to supply the data (source)
  3. Data type (money, integer, decimal)
  4. Formula used to calculate the measure (if there is one)
• Measures define what the decision makers want to see
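One way to keep the four pieces of design information for each measure together is a small record type. The sketch below is only an illustration: the `MeasureSpec` class, the `describe` helper, and the source field names (`OrderLine.Quantity`, `OrderLine.UnitPrice`) are hypothetical and do not come from the course material.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class MeasureSpec:
    name: str                       # 1. name of the measure
    source_fields: Tuple[str, ...]  # 2. source fields that supply the data
    data_type: str                  # 3. data type (money, integer, decimal)
    formula: Optional[str] = None   # 4. formula, if the measure is calculated

# Hypothetical example; the source field names are assumptions.
total_sales = MeasureSpec(
    name="Total Sales",
    source_fields=("OrderLine.Quantity", "OrderLine.UnitPrice"),
    data_type="money",
    formula="Quantity * UnitPrice",
)

def describe(spec):
    """Render one measure as a single line for a design document."""
    formula = spec.formula or "none (stored as-is)"
    sources = ", ".join(spec.source_fields)
    return f"{spec.name} [{spec.data_type}] from {sources}; formula: {formula}"

print(describe(total_sales))
```

Filling in one such record per measure during requirement analysis makes gaps (for example, a measure with no identified source) easy to spot.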
Data Mart – Structure
• Dimensions (Slicers): A dimension is a categorization used to spread out an aggregate measure to reveal its constituent parts. Example: “total sales by sales person by year”
• Dimension key words: "by," "for each," or "for every"
• Information needed during the design process
  - Name of the dimension
  - What fields should be used to supply the data (source)
  - Data type of the dimension's key (the code that uniquely identifies each member of the dimension)
  - Name of the parent dimension (if there is one)
• The dimensions and hierarchies define how the decision maker wants to view the data.
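The “total sales by sales person by year” request can be sketched as a query in which each "by" becomes a grouping column. This is a minimal illustration using Python's built-in `sqlite3`; the flat `sales` table and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sales_person TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Ann", 2022, 100.0),
        ("Ann", 2023, 150.0),
        ("Ann", 2023, 25.0),
        ("Bob", 2022, 200.0),
        ("Bob", 2023, 50.0),
    ],
)

# The aggregate measure with no dimension applied: a single number.
total_sales = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]

# Each "by" in the request becomes a GROUP BY column (a dimension).
by_person_year = conn.execute(
    "SELECT sales_person, year, SUM(amount) FROM sales "
    "GROUP BY sales_person, year ORDER BY sales_person, year"
).fetchall()

print(total_sales)
print(by_person_year)
```

The grouped rows are the constituent parts of the aggregate: adding them back together gives the same total.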
Data Mart – Structure
• Hierarchy (Slicers; Drill Down): A hierarchy is a structure made up of two or more levels of related dimensions. A dimension at an upper level of the hierarchy completely contains one or more dimensions from the next lower level of the hierarchy. Example: Time Dimension – Month, Quarter, Year.
• Hierarchies are used to organize dimensions into various levels
• Hierarchies – “roll up cities into sales regions” or “drill down from year into quarter”
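Roll-up and drill-down over the Month, Quarter, Year hierarchy can be sketched in plain Python. The helper names (`quarter_of`, `roll_up`, `drill_down`) and the sales figures are hypothetical; the point is only that each upper level completely contains the members below it.

```python
from collections import defaultdict

# Hypothetical monthly sales keyed by (year, month).
monthly_sales = {(2023, 1): 10.0, (2023, 2): 20.0, (2023, 4): 30.0, (2024, 1): 5.0}

def quarter_of(month):
    """Map a month number (1-12) to its quarter (1-4)."""
    return (month - 1) // 3 + 1

def roll_up(values, to_parent):
    """Aggregate child-level totals into their parent level."""
    totals = defaultdict(float)
    for key, amount in values.items():
        totals[to_parent(key)] += amount
    return dict(totals)

def drill_down(values, parent, to_parent):
    """Return the child members contained in one parent member."""
    return {key: amount for key, amount in values.items() if to_parent(key) == parent}

# Month -> Quarter -> Year
quarterly = roll_up(monthly_sales, lambda key: (key[0], quarter_of(key[1])))
yearly = roll_up(quarterly, lambda key: key[0])

# Drill down from the year 2023 into its quarters.
quarters_in_2023 = drill_down(quarterly, 2023, lambda key: key[0])

print(quarterly)
print(yearly)
print(quarters_in_2023)
```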
Data Mart – Structure
• Attributes: An attribute is an additional piece of information pertaining to a dimension member that is not the unique identifier or the description of the member. Examples: a regional manager’s information, customers’ gender and age.
• Provides more contextual information about a dimension
• Information needed during the design process
  - Name of the attribute
  - What fields should be used to supply the data (source)
  - Data type
  - Name of the dimension to which it applies
• Allows decision makers to filter data
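As a rough illustration of filtering by an attribute, the sketch below stores gender and age on a customer dimension and restricts a sales total to customers under 40. It uses Python's built-in `sqlite3`; the table names (`dim_customer`, `fact_sales`), columns, and rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY,  -- unique identifier of the member
    customer_name TEXT,                 -- description of the member
    gender        TEXT,                 -- attribute
    age           INTEGER               -- attribute
);
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    amount       REAL
);
INSERT INTO dim_customer VALUES (1, 'Ann', 'F', 34), (2, 'Bob', 'M', 52), (3, 'Cy', 'M', 29);
INSERT INTO fact_sales VALUES (1, 100.0), (2, 40.0), (3, 60.0), (3, 15.0);
""")

# The attribute (age) is used only to filter; the measure is still SUM(amount).
sales_under_40 = conn.execute("""
    SELECT SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_customer AS c ON c.customer_key = f.customer_key
    WHERE c.age < 40
""").fetchone()[0]

print(sales_under_40)
```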
Dimensional Design – The Schema
• Key Principle - A dimensional schema physically separates the measures that quantify a subject’s performance (e.g., student, business, team, process) from the descriptive elements (a.k.a. dimensions) that summarize and categorize the performance.
• Two types of schema
  - A star schema
  - A snowflake schema
Data Mart’s Data Objects – Various Measures and Dimensions – How to Configure?
[Figure: measures, dimensions, and hierarchies to be arranged into a schema]
The main idea underlying this design
[Figure: a central Measure Group (Facts) surrounded by Dim 1 through Dim 6]
The Star Schema
The Snow Flake Schema
The Tables
• Measures – All the measures are placed in a single table, called the fact table, in the schema
• Each dimension is placed in its own table
  - In the star schema, all the information for a hierarchy is stored in the same table. The information for the parent (or grandparent or great-grandparent, and so forth) dimension is added to the table containing the dimension at the lowest level of the hierarchy.
  - The snowflake schema works a bit differently. In the snowflake schema, each level in the dimensional hierarchy has its own table. The dimension tables are linked together with foreign key relationships to form the hierarchy.
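The difference between the two layouts can be sketched with a Store, City, Region hierarchy. The example below uses Python's built-in `sqlite3` as a stand-in for SQL Server; every table name, column, and row is hypothetical. The same fact table is joined once to a single star dimension table and once through a chain of snowflake tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Fact table shared by both designs.
CREATE TABLE fact_sales (store_key INTEGER, amount REAL);
INSERT INTO fact_sales VALUES (1, 100.0), (2, 50.0), (3, 70.0);

-- Star: one table holds every level of the store hierarchy.
CREATE TABLE star_store (
    store_key INTEGER PRIMARY KEY, store_name TEXT, city TEXT, region TEXT
);
INSERT INTO star_store VALUES
    (1, 'S1', 'Reno', 'West'), (2, 'S2', 'Reno', 'West'), (3, 'S3', 'Boston', 'East');

-- Snowflake: one table per level, linked by foreign keys.
CREATE TABLE snow_region (region_key INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE snow_city (
    city_key INTEGER PRIMARY KEY, city TEXT,
    region_key INTEGER REFERENCES snow_region(region_key)
);
CREATE TABLE snow_store (
    store_key INTEGER PRIMARY KEY, store_name TEXT,
    city_key INTEGER REFERENCES snow_city(city_key)
);
INSERT INTO snow_region VALUES (1, 'West'), (2, 'East');
INSERT INTO snow_city VALUES (1, 'Reno', 1), (2, 'Boston', 2);
INSERT INTO snow_store VALUES (1, 'S1', 1), (2, 'S2', 1), (3, 'S3', 2);
""")

# Star: a single join reaches the region level.
star_by_region = conn.execute("""
    SELECT s.region, SUM(f.amount) FROM fact_sales AS f
    JOIN star_store AS s ON s.store_key = f.store_key
    GROUP BY s.region ORDER BY s.region
""").fetchall()

# Snowflake: one join per hierarchy level.
snow_by_region = conn.execute("""
    SELECT r.region, SUM(f.amount) FROM fact_sales AS f
    JOIN snow_store AS s ON s.store_key = f.store_key
    JOIN snow_city AS c ON c.city_key = s.city_key
    JOIN snow_region AS r ON r.region_key = c.region_key
    GROUP BY r.region ORDER BY r.region
""").fetchall()

print(star_by_region)
print(snow_by_region)
```

Both queries return the same totals. The star layout repeats the city and region names on every store row but needs fewer joins, while the snowflake layout stores each name once at the cost of a longer join chain.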
A Four Step Dimensional Modeling Process http://www.kimballgroup.com/ (Not in the book)
• Step 1: Describe the business process that the datamart supports and identify the sources of measurement
  - Key concept – Measurement events
• Step 2: Declare the fact table grain
  - Key concept – Fact table data views
• Step 3: Choose the dimensions
  - Key concept – Cardinalities and hierarchies
• Step 4: Choose the facts
  - Key concept – Their relationships with the measurement events and the grain
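Step 2 is often the least intuitive, so here is a small sketch of what declaring a grain means in practice, assuming the grain is "one fact row per order line". The rows and the helper names (`violates_grain`, `regrain`) are hypothetical and are not part of the Kimball material.

```python
from collections import defaultdict

# Illustrative order lines; every name and value here is hypothetical.
order_lines = [
    {"order_id": 1, "line_no": 1, "date": "2024-01-05", "product": "A", "amount": 10.0},
    {"order_id": 1, "line_no": 2, "date": "2024-01-05", "product": "B", "amount": 4.0},
    {"order_id": 2, "line_no": 1, "date": "2024-01-05", "product": "A", "amount": 6.0},
    {"order_id": 3, "line_no": 1, "date": "2024-01-06", "product": "A", "amount": 8.0},
]

# Declared grain: one fact row per order line.
GRAIN = ("order_id", "line_no")

def violates_grain(rows, grain):
    """Return True if two rows share the same grain key."""
    keys = [tuple(row[col] for col in grain) for row in rows]
    return len(keys) != len(set(keys))

def regrain(rows, grain, measure):
    """Sum a measure up to a coarser grain."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[col] for col in grain)] += row[measure]
    return dict(totals)

grain_ok = not violates_grain(order_lines, GRAIN)

# A coarser grain (one row per date per product) can be derived from the finer one.
daily_product = regrain(order_lines, ("date", "product"), "amount")

print(grain_ok)
print(daily_product)
```

A fine grain can always be summed up to a coarser one, but the reverse is not possible, which is why the grain is declared before the dimensions and facts are chosen.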
Dimension Modeling Details - Steps and Examples
Refer to the Class Handout and LBD#1 for this section
Converting Logical Design to Physical Design Using SQL Mgt. Studio
Refer to LBD#2 for this Section
Summary
• Overview of Data Warehouse concept – A data source for OLAPs
• OLTP vs OLAP – Compare and Contrast
• Dimensional Modeling
  - Benefits
  - Data Objects
  - Data Structures
  - Schemas – Logical and Physical
Process of Designing these schemas
Logical Dimensional Schema
1. In a logical dimensional schema, the facts, measures, and dimensions are represented as entities and attributes that are independent of a database vendor and can be transformed to a physical dimensional schema for any database vendor (such as SQL Server 2012).
2. A logical schema conceptually separates the facts/measures and the dimensions surrounding measurement events; i.e., it illustrates the performance (in the form of measures and dimensions) about which an individual or organization wants to collect data, and depicts relationships among the measures and dimensions.
3. A logical schema contains representations of facts, dimensions, hierarchies, attributes, relationships, unique identifiers, and constraints between relationships.
Physical Dimensional Schema
After the logical objects and relationships are defined in a logical data model, you can use SQL Server 2012 Management Studio to transform (i.e., digitize) the logical model into a database-specific physical representation in the form of a physical schema. In a physical dimensional schema, the objects in the star or snowflake schema are actual database tables.
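The logical-to-physical step can be sketched as a small translation from a vendor-independent description into CREATE TABLE statements. This is only an illustration: it targets SQLite through Python's built-in `sqlite3` so that it runs anywhere, whereas the course uses SQL Server Management Studio. The `logical_schema` layout, the `SQLITE_TYPES` mapping, and the `to_ddl` helper are all assumptions made for the example.

```python
import sqlite3

# Vendor-independent logical schema: entity -> {attribute: logical type}.
# "ref:<Entity>" marks a relationship to another entity's key.
logical_schema = {
    "DimProduct": {"ProductKey": "key", "ProductName": "text"},
    "FactSales": {"ProductKey": "ref:DimProduct", "SalesAmount": "money"},
}

# Assumed logical-to-SQLite type mapping; another vendor needs its own map.
SQLITE_TYPES = {"key": "INTEGER PRIMARY KEY", "text": "TEXT", "money": "NUMERIC"}

def to_ddl(schema):
    """Translate the logical schema into one CREATE TABLE statement per entity."""
    statements = []
    for table, columns in schema.items():
        parts = []
        for column, logical_type in columns.items():
            if logical_type.startswith("ref:"):
                target = logical_type[len("ref:"):]
                parts.append(f"{column} INTEGER REFERENCES {target}({column})")
            else:
                parts.append(f"{column} {SQLITE_TYPES[logical_type]}")
        statements.append(f"CREATE TABLE {table} ({', '.join(parts)})")
    return statements

# Physical schema: the logical objects become actual database tables.
conn = sqlite3.connect(":memory:")
for statement in to_ddl(logical_schema):
    conn.execute(statement)

tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```

Swapping the type mapping is the only vendor-specific part of this sketch, which mirrors the idea that one logical schema can be transformed into a physical schema for any database vendor.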