Data Warehouse

Data Warehouse Systems
Gérard HAUDIQUERT
07/03/16
i
List of Modules
Module 1
Business Intelligence Architecture: Introduction
Module 2
Data supplying Service Operational Systems 
Decision-Making System
Module 3
Operational data and decision-making
Data/Information Merge Service (Data Warehouse)
Module 4
Data Warehouse Design, Development and Deployment
Module 5
Information typology Service based on Business Needs (Data marts)
Module 6
ODS Service
Module 7
Access Services
Module 8
Enterprise Metadata Service
Course Description
Overview:
During this session, the student will become acquainted with data warehouse architecture and with
the processes required to implement a data warehouse (DW), e.g. DW design, the DW data load
phase, and the data-to-information translation phase. Both the enterprise data warehouse and the
almost real-time data warehouse approaches are discussed. Access service concepts such as OLAP,
data mining, and XML architecture are also covered.
Duration:
14 Hours
Objectives:
At the completion of this course you will be able to:
• Understand data warehouse concepts
• Describe the main features of the data loading, data replication, and data cleaning phases
• Describe what an enterprise data warehouse, an independent data mart, a dependent data mart, a logical data mart, a physical data mart, and an ODS are
• List the evolution steps of a data warehouse life cycle
• List the differences between a decision-making environment (DW) and an OLTP environment
• List the main selection criteria for a data warehouse solution
• List the main constraints/differences between a “traditional” and an “almost real-time” data warehouse
• List the three phases of a data warehouse implementation project
• List the main features of normalized and denormalized model schemas
• List the main features of a primary key (PK) and a foreign key (FK)
• List the precautions to take when implementing many data marts and ODSs in a BI infrastructure
• Through a web case study, describe how a data warehouse environment can meet business requirements
• Understand OLAP terminology, including data modelling, ROLAP, MOLAP, and HOLAP
• List the data characteristics for data mining
• Position data mining and Knowledge Discovery in Databases
• Understand the concepts of data mining (e.g. basic steps to implement, predictive models, targets)
• Describe the advantages of using a metadata solution as well as the main characteristics of the different metadata types
Session Type:
All day lecture
Table of Contents
Module 1: Business Intelligence Architecture: Introduction
Objectives
“BI” Architecture and Oil Refinery Structure
Multiple Data Sources and Data Islands
Business Infrastructure Constraints
BI Framework: Gartner Research
ROI? BI Architecture and Data Warehouse Advantages
IT Schema – Principle
Decision-Making Infrastructure Main Services
Module 2: Data supplying Service
Objectives
Data supplying Service
Chapter 2-1: ETT - Extract Transform Transport
The Complex Data Acquisition Process
By Hand or with a Tool?
Data Warehouse: Data Preparation Steps
How Current is the Data?
Data/Process Latency Perspective
Enterprise Data Flow
ETL Data Supply – Definition
Transformation Engine: Architecture
ETL Tool Architectures
ETL Approaches
ETL Custom Coding
ETL Magic Quadrant (Gartner)
ETL ROI ‘Levers’
Chapter 2-2: EAI - Enterprise Application Integration
Data Supply: Enterprise Application Integration (EAI)
The Role of EAI
EAI Reference Architecture
Hurwitz EAI: Market Segmentation
EAI: Market Segmentation
Chapter 2-3: MOM - ETL vs EAI - Data Quality
Message Oriented Middleware: MOM
Example: Teradata DBMS with MQSeries Feed into Tpump
Active Data Warehouse with Enterprise Application Integration
Integration Brokers vs. ETT Tools
When Do You Choose between ETL and EAI Technologies?
Definition - Data Quality
PMP Research
Module 3: Operational Data and Decision-Making
Data/Information Merge Service (Data Warehouse)
Objectives
Data Warehouse
Chapter 3-1: Data Warehouse Concepts
Enterprise Data Warehouse
Definition: Data Warehouse
Definition: Independent Data Mart
Multi Data Mart Evolution -- Decision Point!
Centralized DW - The Solution
Definition: Enterprise Data Warehouse
Definition: Dependent Data Mart
Typologies Data Warehouse: Gartner Definition
Data Warehouse vs Data Mart (Gartner)
Data Warehouse Evolution
Application Evolution: Stage 1
Application Evolution: Stage 2
Application Evolution: Stage 3
Application Evolution: Impact on the Database
“Tip of the Iceberg” The Business & IT View
Information Evolution - The Starting Point
Build on the Foundation!
Realize Exponential Growth in ROI!
Know Your Customers!
Know What Your Customers Buy!
Ready to Mine!
Data Warehousing is a Continually Evolving Process
Data Warehouse: A Solution
Be Careful!
Chapter 3-2: Data Warehouse vs OLTP Specifics
The Game is Different
Contrasting Environments
DBMS Workloads are Different
Contrasting OLTP, Traditional, and Active Data Warehousing
What a Data Warehouse Activity Cycle Looks Like
Query: Simple vs. Complex Processing
Data Warehouse: Not Only a Problem due to the Volume!
Data Warehouse: Solution Selection Criteria
Data Warehouse Central Issues
Data Warehouse Main Actors (Source: IDC)
Chapter 3-3: Almost Real-Time Data Warehouse
The Data Warehouse Evolves to be “Almost Real-Time”?
Information Evolution in a Data Warehouse Environment
What’s Driving this Evolution?
How “Real” is the Real-Time Trend? (Gartner Group)
Traditional DW Work Flow vs Almost Real-Time DW “Diverse” Work Flow
What an Almost Real-Time Data Warehouse Activity Cycle Looks Like
Timeframe: Point-in-Time vs. Historical
Business Question
The Only Constant is Change
Module 4: Data Warehouse Design, Development and Deployment
Objectives
Chapter 4-1: Data Warehouse Implementation Concepts
Data Warehouse Framework
Data Warehouse Methodology
Building the Active Warehouse is an Iterative Process
Data Architecture Issues
The Data Warehouse Becomes the Enterprise Information Integration Point
Chapter 4-2: Data Warehouse Modeling
Business-Centric Consulting: Model the Business
Business Information MODELING
Database DESIGN
Normalized Data Models
Denormalized Data Models
Database Views - The Best of Both Worlds
Modeling Step 1: Logical Data Model (LDM)
One Primary Key (PK) per Table
Foreign Key (FK)
Normalization
Third Normal Form: Definition
Database DESIGN
Database Design Components
Modeling Step 2: Extended Logical Data Model (ELDM)
Modeling Step 3: Physical Data Model
Multidimensional/Normalized
Environment Physical Organization (Example)
Chapter 4-3: Data Warehouse ROI/Budget Notions
Investment Diagram/ROI: Case Study
Two Synchronized Processes!
Data Warehouse: Budget Example (Gartner)
Data Warehouse Cost
Data Preparation: ETL (Gartner)
Project Plan Example
Module 5: Information typology Service based on Business Needs (Data marts)
Objectives
Data Marts
Chapter 5-1: Data Marts Concepts
Data Warehouse vs Data Mart (Gartner)
Typologies Data Warehouse: Gartner Definition
Enterprise Data Warehouse
Independent Data Marts
Data Warehouse Definitions Summary
When Do You Build a Dependent Data Mart?
“Federated” Warehouse
Logical or Physical Data Marts
Chapter 5-2: Data Marts Specifics
Multiple Images/Data Marts
The Problem with Data Marts
Consider the Problem of Data Consistency
The Cost of Alternative
Guidelines to Implement Data Marts (Gartner)
PMP Research Survey
Module 6: ODS Service
Objectives
Operational Data Store
Information Evolution in a Data Warehouse Environment
Growing Number of Tactical Users
Expanded Reliance on ODSs
The Problem with the ODS Architecture
Almost Real-Time Warehouse Architecture
Consider the Problem of Data Consistency
An Integrated Solution Example
Business Events
Web Sites and DW
Web and DW: Crucial Observations
Web and DW: Case Study
Module 7: Access Services
Objectives
Access Services
Chapter 7-1: OnLine Analytical Processing
What is OLAP
Multidimensional Data
OLAP Terminology
Dimension and Levels
Aggregation / Summarization
Normalized Data Models
Normalized Tables
Data Modelling
Complete Denormalization
Star Schema
Snowflake Schema
Snowflake vs. Star Schema
Designing an OLAP Data Model
OLAP Design - Basic Steps
Statistics and OLAP
ROLAP-MOLAP-HOLAP
MOLAP
ROLAP
Basic OLAP Approaches
Chapter 7-2: Data Mining
Data Mining: Defining Characteristics
Data Mining: Data Deluge
Data Mining: The Data
Data Mining: Business Decision Support
Data Mining: Steps in Data Mining / Analysis
Data Mining vs. OLAP vs. Standard Query Tools
Predictive Modelling
Predictive Modelling: Types of Targets
Chapter 7-3: XML Solution Concepts
Web Browsing from the Data Warehouse
Create an XML Solution: Logical View
Create an XML Solution: Physical View
ISV Solutions Examples
Module 8: Enterprise Metadata Service
Objectives
Decision Support Infrastructure
Understand Metadata: Carrier’s Example
Heterogeneous Architecture: The Headache
The Answer is in the Metadata
Business Metadata
Technical Metadata
Advantages of Using Metadata
Example of Multi-Source Problems
Where is the Metadata? Everywhere!
Federated Metadata Architecture
Metadata Models for Data Warehousing
Data Warehouse Example: Case Study