BI Architectures
Module 3 of Business Intelligence Series

Business Intelligence Series modules: Intro to BI, BI Vendor Landscape, BI Roles & Responsibilities, Data Governance and Quality, DW Architectures, BI Capabilities & Maturity, ETL Processes, Multi-dimensional Analysis

Copyright © 2011 Radiant Advisors, All Rights Reserved

Objectives
•  Understand the components of DW architectures
•  Understand the purpose of each data store
•  Understand the reasoning of Inmon’s architecture
•  Understand the reasoning of Kimball’s architecture
•  Understand the maturity of architectures

Outline
•  Components of DW Architectures
•  Inmon “Hub and Spoke” Architecture
•  Kimball “Architected Data Marts” Approach
•  Other Architectures

Components of DW Architectures
Building blocks and organizing data stores

Building Blocks of Data Warehouse
•  Data Flow / Data Migration
•  Transforming Data
   •  Integrating Data
   •  Cleansing Data
   •  Deriving Data
•  Data Sources:
   •  Operational Systems
   •  DW Staging, EDW
•  Data Targets:
   •  DW Staging, EDW, Data Marts
[Diagram: Data Source -> ETL -> Data Target]

Data Stores and their Labels
•  Data Store is the most generic label for a data set
   •  Database or Files
•  Purpose-specific data stores receive classic names
•  An architecture has different data stores based on its approach
•  Some data stores are for end users and others are for internal use
[Diagram: a generic Data Store alongside the named stores ODS, Staging, Data Warehouse, Data Mart]

Common Data Flows
•  Independent Data Mart: Operational databases -> ETL -> Data Mart
•  Kimball Data Marts: Operational databases -> ETL -> Staging Area -> ETL -> Data Mart
•  Traditional Data Warehouse: Operational databases -> ETL -> Data Warehouse -> ETL -> Data Mart
•  Inmon Data Warehouse: Operational databases -> ETL -> Staging Area -> ETL -> Data Warehouse -> ETL -> Data Mart

Operational Source Systems
[Data flow: Operational databases -> ETL -> Staging Area -> ETL -> Data Warehouse -> ETL -> Data Mart]
•  The applications and operational systems used by the company capture all the activity within the company
•  ERP Systems
   •  SAP, Oracle Apps, JD Edwards, PeopleSoft
•  Order Mgmt System, Customer Care System, SalesForce, Inventory Mgmt
•  Mainframes, Proprietary or Std Databases, Files

Staging Area
•  Purpose: Location to store data before DW processing
•  Focus on the “Extract” of data from Operational Source Systems into the DW environment
•  Synchronizes the arrival of new data before DW transformations can begin
   •  System A: 8pm / System B: 1am / System C: 2am
•  Typically light transformations are done to make DW transformations easier
•  Staging Area is considered inside the DW (landing zone)

Types of Staging Areas
Transient Staging:
•  Data is loaded into a temporary area before DW processing
•  Data is deleted or archived after DW processing
•  Purpose:
   •  Synchronize processing
Persistent Staging:
•  Data is kept in Staging over multiple DW refreshes
•  Data Marts need a central place for “autonomous” data to be consistent
•  Purpose:
   •  Data Mart consistency
   •  Audit Trail History

Types of Staging Data
Autonomous Data:
•  Business events
•  Events typically are not updated
•  Autonomous data flows through Staging and does not need to be kept
•  Example: Sales Event
Conformed Data:
•  Reference or Dimensional data
•  References are updated and maintained over time
•  Reference data is kept in staging if relating it to event data is needed in the future
•  Examples: Customer who made the Sale, Products purchased, Store location of the sale, Date of Sale

Inmon Data Warehouse
•  Purpose: Centralized hub of consistent enterprise data which all data marts are dependent on
•  Subject Oriented
   •  Business subject areas are business entities and relationships
   •  Customer, Order, Locations, Inventory, Product, Accounting, etc.
•  Integrated
   •  Operational data is standardized into a single data model and made consistent in meaning and codes
   •  Data is typically at the atomic detail level and summarized as needed
•  Nonvolatile
   •  Once loaded into the data warehouse, the data is not updated or changed
•  Time Variant
   •  Stores near-current data (since the last acquisition process) and all historical data and changes for analysis
Implies Normalized:
•  3rd Normal Form
•  Business Modeling
•  Business subjects and relationships
•  Subject areas modeled for data integrity

Kimball’s Data Warehouse
•  Purpose: The union of all departmental data marts with enterprise consistency maintained by the bus architecture
•  Subject Oriented
   •  Business subject areas are functional data marts
   •  Sales, Finance, Order Mgmt, CRM, Supply Chain
•  Integrated
   •  Operational data is standardized into a single data model and made consistent in meaning and codes
   •  Data is typically at the atomic detail level and summarized as needed
•  Nonvolatile
   •  Once loaded into the data warehouse, the data is not updated or changed
•  Time Variant
   •  Stores near-current data (since the last acquisition process) and all historical data and changes for analysis
Implies Dimensional:
•  Business subjects are functional areas
•  Metrics based with quantity and units
•  Metrics qualified by dimensional data (date, location, product, customer, etc.)

Inmon’s Data Mart
•  Purpose: Departmental information needs
•  Dependent on the Data Warehouse for its data
•  Supports departmental focus or business discipline
•  Can be modeled as dimensional, normalized or denormalized
•  Can be implemented as databases, cubes or files
•  Dimensions are denormalized subject area entities

Kimball’s Data Marts
•  Purpose: Departmental information needs
•  Dependent on the Bus Architecture
   •  Combination of ETL code and Persistent Staging
•  Data consistency across data marts ensured by the bus architecture and conformed dimensions
•  Predominantly dimensional models with fact tables and dimension tables
•  Can be implemented as star schemas in databases but typically in cube databases

Operational Data Store
•  Purpose: Integrated current data refreshed throughout the day to support operational needs
•  Shares the same integrated characteristics as a DW
•  Current-only data
   •  No history or very limited history
•  Refreshed more often than the DW
   •  Typically a few times per day
•  Data Flow Challenge: Do acquisitions go to the ODS or to Staging first?
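The difference between an ODS (current-only data, little or no history) and a data warehouse (nonvolatile and time variant) can be illustrated with a small sketch. This is not part of the original module: the class names `OperationalDataStore` and `DataWarehouse`, the record fields and the sample values are hypothetical, and for simplicity both stores receive every update, even though a DW is normally refreshed less often than an ODS.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class OperationalDataStore:
    """Hypothetical current-only store: each refresh overwrites prior state."""
    current: dict = field(default_factory=dict)

    def refresh(self, key, record):
        # No history is kept; the latest record replaces the old one.
        self.current[key] = record


@dataclass
class DataWarehouse:
    """Hypothetical nonvolatile, time-variant store: rows are only appended."""
    rows: list = field(default_factory=list)

    def load(self, key, record, loaded_at):
        # Each acquisition adds a new dated row; existing rows are not changed.
        self.rows.append({"key": key, "loaded_at": loaded_at, **record})

    def history(self, key):
        # All versions ever loaded for this key, in load order.
        return [row for row in self.rows if row["key"] == key]


ods = OperationalDataStore()
dw = DataWarehouse()

# Made-up customer record arriving in two successive refreshes.
updates = [
    (datetime(2011, 1, 1, 8), {"city": "Denver"}),
    (datetime(2011, 1, 1, 14), {"city": "Boulder"}),
]
for loaded_at, record in updates:
    ods.refresh("customer-1", record)
    dw.load("customer-1", record, loaded_at)

print(ods.current["customer-1"])      # only the latest state survives
print(len(dw.history("customer-1")))  # every loaded version is retained
```

The ODS answers "what is true right now", while the warehouse keeps each version so that changes can be analyzed over time.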
Exploratory Data Store
•  Purpose: Typically supports the need to extract data sets for data mining exploration of data
•  Data analysis with data mining by statisticians
•  Extracts from available data stores are put into large denormalized single tables
•  Data miners look for patterns and correlations hidden in the data
•  Different analysis and workload than other data stores

Sand Box
•  Purpose: Data store that allows for prototyping, understanding data and temporary information needs
•  Considered the non-production, quick and dirty environment for playing with data
•  Business analysts, end users and BI developers utilize this non-production environment for quick results

All the Data Flow Options
[Diagram labels: Operational Systems (ERP database, CRM database, Operational database, Flat Files); Data Acquisitions (Extract Hourly, Extract Once a Day, Extract As Needed); Staging; Operational Data Store; Data Warehouse; Data Distributions; Data Marts (Operations Data Mart, Sales Data Mart, Financial Data Mart, Financial Cube); Data Mining Mart; Sand Box; Meta Data]

Hub and Spoke Architecture
Bill Inmon “Father of Data Warehousing”

Hub and Spoke (Inmon)
[Diagram: ERP database, Operational database, CRM database and Flat Files -> Extract data -> Transform Code / Business Rules / Cleansing -> Data Warehouse (Centralized Hub, Consistent Enterprise Data) -> Distribute data -> Data Marts (Customer Data Mart, Sales Data Mart, Financial Cube)]
Consistency across data marts enforced by data warehouse persistence

Normalized Data Modeling
Entities:
•  Employee
•  Department
•  Skill
•  Expert In
•  Office
Attributes of Employee:
•  Employee ID (unique Id)
•  First name
•  Last name
•  Home address
Relationships:
•  Employee manages zero, one or many Departments
•  Department is headed by one and only one Employee
Modeling data to be stored based on Business Rules through the use of Normalization Rules for Entities, Relationships and Attributes

[Diagram: Source 1 to Source 5 feed the Data Warehouse layers (ODS, Staging Layer, Normalized Subject Areas, Conformed Dim & Metrics); outputs labeled Operations & Supply, Finance, Order to Cash, CRM]

Top Down Approach
[Diagram, from Enterprise Scope to Department Scope: Enterprise Modeling & Architecture; Incremental Development Planning; Data Warehouse Design & Development; Data Mart Design & Development; Incremental Deployment]

Pros and Cons of Hub and Spoke
Pros:
•  Data consistency ensured through dependence
•  Scalable architecture
•  Supports many types of data mart needs
Cons:
•  Focus on enterprise model can jeopardize business needs
•  Adds additional work to data mart development from dependence
•  Extra work to design and code ETL for DW

Deliverables for a DW
[Diagram labels. Deliverables: Source Data Analysis; Data Flow; Job Scheduler scripts; Data Modeling & Design; Acquisition & Loading jobs; Integration & Cleansing jobs; Transform to Dimensional Model jobs; Localizing Metrics and Dimensions jobs; Presentation development. Data stores: Source, Stage, ODS, DW, EDL, DDM. Side labels: Data, Information, Activity]

Architected Data Marts
Ralph Kimball’s Dimensional Bus Architecture

Bus Architecture (Kimball)
[Diagram: ERP database, Operational database, CRM database and Flat Files -> Extract Data -> Stage Area -> ETL Bus (Transform Code, Business Rules, Conformed Dimensions) -> Load Data -> Dimensional “Data Warehouse” (Customer Data Mart, Sales Data Mart, Financial Cube)]
Based upon Ralph Kimball Architected Data Marts Approach
Consistency across data marts enforced by ETL Business in code

Dimensional Modeling
Answers Business Questions:
1.  How much were Sales last month by Sales Person and Product Category?
2.  Are Sales Quantities of Product Category A increasing each month for the past year?
3.  What Products do Customers in City A buy most of this month?
4.  Who are our repeat buy customers?
Modeling data to be stored based on Business Defined Metrics or Facts through the use of Dimensional Rules (Facts and Dimensions)

Bottom Up Approach
[Diagram, from Department Scope to Enterprise Scope: Identify Business Area Scope; Data Mart Design & Development; Data Mart Deployment; Operations & Support]

Pros and Cons of Data Marts
Pros:
•  Quicker Deliveries
•  Focused on Department Information Needs
•  Dimensional Models easily understood by business
•  Designed for analytics
Cons:
•  Departmental focus risks enterprise consistency
•  Definitions driven by department users over enterprise
•  Poor communications can lead teams to build silos of data
[Diagram: Source -> ETL -> DM. Deliverables: Source Data Analysis; Data Flow; Job Scheduler scripts; Data Modeling & Design; Presentation development. Side labels: Data, Information, Decision, Activity]
•  Need a quick win to prove ourselves
•  90 day delivery: leverage existing reports…
•  Department 1 gets value (others have to wait)
•  New infrastructure, tools, training take time
•  Success! Let’s do it again, and again…

[Diagram: the same deliverables and side labels, now with five Source boxes, ETL and three DM boxes]

Other Aspects of Architecture

Federation of Global Companies
[Diagram: Americas Operations (Operational Systems & Departments in Americas) and European Operations (Operational Systems & Departments in Europe) each pass through ETL -> Staging Area -> ETL into the Federated DW (Americas DW, Europe DW). Localized Data Marts: Americas DW -> ETL -> Am Data Mart; Europe DW -> ETL -> EU Data Mart. Enterprise Data -> ETL -> Global Data Mart]

Federation of Conglomerate Lines of Business
[Diagram: GE Medical Business and GE Capital Business each pass through ETL -> Staging Area -> ETL into the Federated DW (Medical DW, Financial DW). Localized Data Marts: Medical DW -> ETL -> Medical Data Marts; Financial DW -> ETL -> Financial Data Marts. Enterprise Data -> ETL -> Global Data Mart]
Architecture Maturity
[Figure: TDWI Maturity Model Survey, % of companies by stage. Stages and architecture labels in order: 1. Prenatal (Operational Reporting), Infant (Spreadmarts), GULF, 2. Child (Data Marts), 3. Teenager (Data Warehouses), CHASM, 4. Adult (Enterprise DW), 5. Sage (BI Services). Other labels: Business Value, Data Consolidation, Semantic Integration]

TDWI BI Maturity Model
[Figure: characteristics by maturity stage (Prenatal, Infant, Child, Teenager, Adult, Sage).
Executive Perception, in stage order: Cost Center, Inform Executives, Empower Workers, Monitor Processes, Drive the Business, Drive the Market.
Architecture, in stage order: Management Reporting, Spreadmart, Data Marts, Data Warehouses, Enterprise Data Warehousing, Analytical Services.
Remaining labels, placement not recoverable: Information Culture, Analytics Culture; IT Backlog, Self Service, Customized Delivery, The BI Utility; Awareness, Understanding, Actionable Information, Decision Automation; Cost, ROI, Value]
Adapted from The BI Maturity Model, Wayne Eckerson, TDWI Director of Research
Summary
•  DW architectures are about data flows, data stores and data transformations
•  Inmon’s Hub and Spoke architecture has a persisted centralized hub of data and ensures consistency through data mart dependence
•  Kimball’s Data Mart architecture is focused on timely business information needs and ensures consistency through bus architecture and conformed dimensions

Further Research
Write up a description of your main client’s data warehouse environment utilizing the terms in this module. Be sure to include the data stores used and how they are used, refresh rates, subject areas and user groups.