Data Warehouse Systems
Gérard HAUDIQUERT 07/03/16

List of Modules
Module 1: Business Intelligence Architecture: Introduction
Module 2: Data Supplying Service (Operational Systems / Decision-Making System)
Module 3: Operational Data and Decision-Making Data/Information Merge Service (Data Warehouse)
Module 4: Data Warehouse Design, Development and Deployment
Module 5: Information Typology Service Based on Business Needs (Data Marts)
Module 6: ODS Service
Module 7: Access Services
Module 8: Enterprise Metadata Service

Course Description
Overview: During this session, students will become acquainted with data warehouse architecture and with the processes required to implement a data warehouse (DW), e.g. DW design, the DW data load phase, and the data-to-information translation phase. Both the enterprise data warehouse approach and the almost real-time data warehouse approach are discussed. The course also covers access services concepts such as OLAP, data mining, and XML architecture.
Duration: 14 hours
Objectives: On completion of this course, you will be able to:
•Understand data warehouse concepts
•Describe the main features of the data loading, data replication, and data cleaning phases
•Define an Enterprise Data Warehouse, an independent data mart, a dependent data mart, a logical data mart, a physical data mart, and an ODS
•List the different evolution steps of a data warehouse life cycle
•List the differences between a decision-making environment (DW) and an OLTP environment
•List the main selection criteria for a data warehouse solution
•List the main constraints/differences between a "traditional" and an "almost real-time" data warehouse
•List the three phases of a data warehouse implementation project
•List the main features of normalized and denormalized model schemas
•List the main features of a primary key (PK) and a foreign key (FK)
•List the precautions to take when implementing multiple data marts and ODSs in a BI infrastructure
•Describe, through a Web case study, how a data warehouse environment can meet business requirements
•Understand OLAP terminology, including data modelling, ROLAP, MOLAP, and HOLAP
•List the data characteristics required for data mining
•Position data mining and Knowledge Discovery in Databases (KDD)
•Understand data mining concepts (e.g. basic implementation steps, predictive models, targets)
•Describe the advantages of using a metadata solution, as well as the main characteristics of the different metadata types
Session Type: All-day lecture

Table of Contents

Module 1: Business Intelligence Architecture: Introduction
  Objectives 1-2
  "BI" Architecture and Oil Refinery Structure 1-3
  Multiple Data Sources and Data Islands 1-4
  Business Infrastructure Constraints 1-5
  BI Framework: Gartner Research 1-6
  ROI? BI Architecture and Data Warehouse Advantages 1-7
  IT Schema: Principle 1-8
  Decision-Making Infrastructure Main Services 1-9

Module 2: Data Supplying Service
  Objectives 2-2
  Data Supplying Service 2-3
  Chapter 2-1: ETT - Extract Transform Transport 2-4
    The Complex Data Acquisition Process 2-5
    By Hand or with a Tool? 2-6
    Data Warehouse: Data Preparation Steps 2-7
    How Current is the Data? 2-8
    Data/Process Latency Perspective 2-9
    Enterprise Data Flow 2-10
    ETL Data Feed: Definition 2-11
    Transformation Engine: Architecture 2-12
    ETL Tool Architectures 2-13
    ETL Approaches 2-14
    ETL Custom Coding 2-15
    ETL Magic Quadrant (Gartner) 2-16
    ETL ROI "Levers" 2-17
  Chapter 2-2: EAI - Enterprise Application Integration 2-18
    Data Feed: Enterprise Application Integration (EAI) 2-19
    The Role of EAI 2-20
    EAI Reference Architecture 2-22
    Hurwitz EAI: Market Segmentation 2-23
    EAI: Market Segmentation 2-24
  Chapter 2-3: MOM - ETL vs EAI - Data Quality 2-26
    Message-Oriented Middleware (MOM) 2-27
    Example: Teradata DBMS with MQSeries Feed into TPump 2-28
    Active Data Warehouse with Enterprise Application Integration 2-29
    Integration Brokers vs. ETT Tools 2-30
    When Do You Choose between ETL and EAI Technologies? 2-31
    Definition: Data Quality 2-33
    PMP Research 2-34

Module 3: Operational Data and Decision-Making Data/Information Merge Service (Data Warehouse)
  Objectives 3-2
  Data Warehouse 3-3
  Chapter 3-1: Data Warehouse Concepts 3-4
    Enterprise Data Warehouse 3-5
    Definition: Data Warehouse 3-6
    Definition: Independent Data Mart 3-7
    Multi-Data Mart Evolution: Decision Point! 3-8
    Centralized DW: The Solution 3-9
    Definition: Enterprise Data Warehouse 3-10
    Definition: Dependent Data Mart 3-11
    Data Warehouse Typologies: Gartner Definition 3-12
    Data Warehouse vs Data Mart (Gartner) 3-13
    Data Warehouse Evolution 3-14
    Application Evolution: Stage 1 3-15
    Application Evolution: Stage 2 3-18
    Application Evolution: Stage 3 3-19
    Application Evolution: Impact on the Database 3-20
    "Tip of the Iceberg": The Business & IT View 3-21
    Information Evolution: The Starting Point 3-22
    Build on the Foundation! 3-23
    Realize Exponential Growth in ROI! 3-24
    Know Your Customers! 3-25
    Know What Your Customers Buy! 3-26
    Ready to Mine! 3-27
    Data Warehousing is a Continually Evolving Process 3-28
    Data Warehouse: A Solution 3-29
    Be Careful! 3-30
  Chapter 3-2: Data Warehouse vs OLTP Specifics 3-31
    The Game is Different 3-32
    Contrasting Environments 3-33
    DBMS Workloads are Different 3-34
    Contrasting OLTP, Traditional, and Active Data Warehousing 3-35
    What a Data Warehouse Activity Cycle Looks Like 3-36
    Query: Simple vs. Complex Processing 3-37
    Data Warehouse: Not Only a Volume Problem! 3-38
    Data Warehouse: Solution Selection Criteria 3-39
    Data Warehouse Central Issues 3-40
    Data Warehouse Main Actors (Source: IDC) 3-41
  Chapter 3-3: Almost Real-Time Data Warehouse 3-42
    The Data Warehouse Evolves to be "Almost Real-Time"? 3-43
    Information Evolution in a Data Warehouse Environment 3-44
    What's Driving this Evolution? 3-45
    How "Real" is the Real-Time Trend? (Gartner Group) 3-46
    Traditional DW Work Flow vs Almost Real-Time DW "Diverse" Work Flow 3-47
    What an Almost Real-Time Data Warehouse Activity Cycle Looks Like 3-48
    Timeframe: Point-in-Time vs. Historical 3-49
    Business Question 3-50
    The Only Constant is Change 3-51

Module 4: Data Warehouse Design, Development and Deployment
  Objectives 4-2
  Chapter 4-1: Data Warehouse Implementation Concepts 4-3
    Data Warehouse Framework 4-4
    Data Warehouse Methodology 4-5
    Building the Active Warehouse is an Iterative Process 4-6
    Data Architecture Issues 4-7
    The Data Warehouse Becomes the Enterprise Information Integration Point 4-8
  Chapter 4-2: Data Warehouse Modeling 4-9
    Business-Centric Consulting: Model the Business 4-10
    Business Information Modeling 4-11
    Database Design 4-12
    Normalized Data Models 4-13
    Denormalized Data Models 4-14
    Database Views: The Best of Both Worlds 4-15
    Modeling Step 1: Logical Data Model (LDM) 4-16
    One Primary Key (PK) per Table 4-17
    Foreign Key (FK) 4-18
    Normalization 4-19
    Third Normal Form: Definition 4-20
    Database Design 4-21
    Database Design Components 4-22
    Modeling Step 2: Extended Logical Data Model (ELDM) 4-23
    Modeling Step 3: Physical Data Model 4-24
    Multidimensional/Normalized 4-25
    Environment Physical Organization (Example) 4-26
  Chapter 4-3: Data Warehouse ROI/Budget Notions 4-27
    Investment Diagram/ROI: Case Study 4-28
    Two Synchronized Processes! 4-29
    Data Warehouse: Budget Example (Gartner) 4-30
    Data Warehouse Cost 4-31
    Data Preparation: ETL (Gartner) 4-32
    Project Plan Example 4-33

Module 5: Information Typology Service Based on Business Needs (Data Marts)
  Objectives 5-2
  Data Marts 5-3
  Chapter 5-1: Data Marts Concepts 5-4
    Data Warehouse vs Data Mart (Gartner) 5-5
    Data Warehouse Typologies: Gartner Definition 5-6
    Enterprise Data Warehouse 5-7
    Independent Data Marts 5-8
    Data Warehouse Definitions Summary 5-9
    When Do You Build a Dependent Data Mart? 5-10
    "Federated" Warehouse 5-11
    Logical or Physical Data Marts 5-12
  Chapter 5-2: Data Marts Specifics 5-13
    Multiple Images/Data Marts 5-14
    The Problem with Data Marts 5-15
    Consider the Problem of Data Consistency 5-16
    The Cost of the Alternative 5-17
    Guidelines for Implementing Data Marts (Gartner) 5-18
    PMP Research Survey 5-19

Module 6: ODS Service
  Objectives 6-2
  Operational Data Store 6-3
  Information Evolution in a Data Warehouse Environment 6-4
  Growing Number of Tactical Users 6-5
  Expanded Reliance on ODSs 6-6
  The Problem with the ODS Architecture 6-7
  Almost Real-Time Warehouse Architecture 6-8
  Consider the Problem of Data Consistency 6-9
  An Integrated Solution Example 6-10
  Business Events 6-11
  Web Sites and DW 6-12
  Web and DW: Crucial Observations 6-13
  Web and DW: Case Study 6-14

Module 7: Access Services
  Objectives 7-2
  Access Services 7-3
  Chapter 7-1: Online Analytical Processing 7-4
    What is OLAP? 7-5
    Multidimensional Data 7-6
    OLAP Terminology 7-7
    Dimensions and Levels 7-8
    Aggregation/Summarization 7-9
    Normalized Data Models 7-10
    Normalized Tables 7-11
    Data Modelling 7-12
    Complete Denormalization 7-13
    Star Schema 7-14
    Snowflake Schema 7-16
    Snowflake vs. Star Schema 7-18
    Designing an OLAP Data Model 7-19
    OLAP Design: Basic Steps 7-20
    Statistics and OLAP 7-22
    ROLAP-MOLAP-HOLAP 7-23
    MOLAP 7-24
    ROLAP 7-25
    Basic OLAP Approaches 7-26
  Chapter 7-2: Data Mining 7-28
    Data Mining: Defining Characteristics 7-29
    Data Mining: Data Deluge 7-30
    Data Mining: The Data 7-31
    Data Mining: Business Decision Support 7-32
    Data Mining: Steps in Data Mining/Analysis 7-33
    Data Mining vs. OLAP vs. Standard Query Tools 7-34
    Predictive Modelling 7-35
    Predictive Modelling: Types of Targets 7-36
  Chapter 7-3: XML Solution Concepts 7-37
    Web Browsing from the Data Warehouse 7-38
    Create an XML Solution: Logical View 7-39
    Create an XML Solution: Physical View 7-40
    ISV Solution Examples 7-41

Module 8: Enterprise Metadata Service
  Objectives 8-2
  Decision Support Infrastructure 8-3
  Understand Metadata: Carrier's Example 8-4
  Heterogeneous Architecture: The Headache 8-5
  The Answer is in the Metadata 8-6
  Business Metadata 8-7
  Technical Metadata 8-8
  Advantages of Using Metadata 8-9
  Example of Multi-Source Problems 8-10
  Where is the Metadata? Everywhere! 8-11
  Federated Metadata Architecture 8-12
  Metadata Models for Data Warehousing 8-13
  Data Warehouse Example: Case Study 8-14