BizTalk - Atlanta.mdf

Atlanta Microsoft Database Forum
Introduction to Data Warehousing Concepts
Presented by
Brian Thomas
Solution Builders, Inc.
March 8, 2004
Brian.Thomas@SolutionBuilders.com
What is a Data Warehouse?
A Data Warehouse is a collection of data from one or many systems
that exist within and outside the organization. The data is structured
to reduce the amount of time it takes to produce reliable information.
Why Build a Data Warehouse?
• To Provide a Consistent Common Source for Corporate
Information
• To Store Large Volumes of Historical Detail Data from
Mission Critical Applications
• To Improve the Ability to Access, Report Against, and
Analyze Information
• To Solve or Improve Upon Business Processes
Turning Data into Information
Functional Data Warehouse
Diagram: Sales System, System Generated Reports
Sales Analysis is extrapolated from the System Reports.
Turning Data into Information
Functional Data Warehouse
Diagram: Sales System, Functional Data Warehouse of Sales Information
Sales Information is available to a wider audience of decision makers.
Turning Data into Information
Cross Organizational Functional Data Warehouse
Diagram: Division A, Division B, Division C (each with a Sales System), Centralized Data Warehouse of Sales Data from across the Organization
Analysis is performed and Decisions are drawn from the Cross Organizational Sales Data.
Turning Data into Information
Cross Functional Data Warehouse
Diagram: Marketing System, Sales System, Production Systems, System Generated Reports
Corporate Performance Analysis is extrapolated from the System Reports.
Turning Data into Information
Cross Functional Data Warehouse
Diagram: Marketing System, Sales System, Production Systems, Cross Functional Data Warehouse of Information
Corporate Performance Analysis is available to a wider audience.
Turning Data into Information
Cross Organizational & Cross Functional Data Warehouse
Diagram: Division A, Division B, Division C, Centralized Cross Functional Data Warehouse of Information
Analysis is performed and Decisions are made from the Cross Functional Organizational Performance Data.
Data Warehouse Architecture
Data Warehouse Components
• Source Systems: Division A, Division B, Division C, External Data
• Extraction Transformation Load (ETL)
• Enterprise Data Warehouse: Business Group Level, Divisional Level, and Corporate Level, each shown as DW / DM, DM, DM
• Direction labels: Increased Level of Standardization; Increased Local Specifications
• Data Access & Query Management Services
• Management Systems: Planning & Forecasting, Analytics & Modeling, Performance Management, Query & Reporting
• Access Methods: Portal / Web Interface, Desktop Applications, Printed Reports, Scorecards & Dashboards, Email, Mobile Devices
Data Warehouse Architecture
Diagram: Source Systems (Division A, Division B, Division C, External Data); Extract, Transformation and Load (ETL); Data Staging Area; Data Warehouse Repository
Data Warehouse Architecture
Data Staging Area
• Subject Area Oriented
• Data Structure more closely mirrors
Operational System Data Layouts
• Supports Identification of Changed Data
• Acts as a Working Area to Support the
Transformation Process
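One way a staging area can support identification of changed data is to compare the current extract against the previous one. The sketch below is only an illustration under stated assumptions: the snapshot layout (a dictionary keyed by a business key), the function names, and the sample customer rows are all hypothetical and do not come from the presentation.

```python
import hashlib

def row_fingerprint(row):
    """Hash a row's attributes into a short, comparable fingerprint."""
    payload = "|".join(f"{key}={row[key]}" for key in sorted(row))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def detect_changes(previous, current):
    """Compare two staged snapshots keyed by business key.

    Returns three sorted lists of keys: inserted, updated, deleted.
    """
    inserted = sorted(k for k in current if k not in previous)
    deleted = sorted(k for k in previous if k not in current)
    updated = sorted(
        k for k in current
        if k in previous
        and row_fingerprint(current[k]) != row_fingerprint(previous[k])
    )
    return inserted, updated, deleted

# Hypothetical staged extracts from two consecutive loads.
previous_snapshot = {
    "C001": {"name": "Acme", "city": "Atlanta"},
    "C002": {"name": "Globex", "city": "Chicago"},
}
current_snapshot = {
    "C001": {"name": "Acme", "city": "Denver"},
    "C003": {"name": "Initech", "city": "Dallas"},
}

changes = detect_changes(previous_snapshot, current_snapshot)
print(changes)
```

Only the keys reported as inserted or updated would then need to pass through the transformation process.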
Data Warehouse Architecture
Extraction, Transformation & Load (ETL)
• Perform Attribute Standardization and Cleansing
• Apply Business Rules and Calculations
• Consolidate using Matching and Merge / Purge Logic
• Ensure Proper Linking and Tracking of History
Data Warehouse Architecture
Extraction, Transformation & Load (ETL)
Lookup Function
• App. A: Male, Female
• App. B: 1, 0
• App. C: x, y
• App. D: m, f
• Result: Male, Female
Conversion Function
• App. A: pipeline (cm)
• App. B: pipeline (inches)
• App. C: pipeline (mcf)
• App. D: pipeline (yds)
• Result: pipeline (cm)
Formatting Function
• App. A: Date (julian)
• App. B: Date (yyyymmdd)
• App. C: Date (mm/dd/yyyy)
• App. D: Date (absolute)
• Result: Date (julian)
Merging Function
• App. A: Description
• App. B: Description
• App. C: Description
• App. D: Description
• Result: Description
Mapping Function
• App. A: balance on hand
• App. B: current balance
• App. C: cash in house
• App. D: balance
• Result: Balance
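The transformation examples above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation. Several details are assumptions: the pairing of source codes to Male and Female follows the order shown on the slide, "julian" is taken to mean an ordinal date of the form YYYYDDD, and mcf is left out of the conversion table because it is a volume unit with no direct length conversion. All function and table names are hypothetical.

```python
from datetime import datetime

# Lookup: per-application codes mapped to one standard value (assumed pairing).
GENDER_LOOKUP = {
    "A": {"Male": "Male", "Female": "Female"},
    "B": {"1": "Male", "0": "Female"},
    "C": {"x": "Male", "y": "Female"},
    "D": {"m": "Male", "f": "Female"},
}

# Conversion: length units expressed in centimeters.
CM_PER_UNIT = {"cm": 1.0, "inches": 2.54, "yds": 91.44}

# Formatting: source date layouts for the two text-based applications.
DATE_FORMATS = {"B": "%Y%m%d", "C": "%m/%d/%Y"}

# Mapping: each application's field name for the same business concept.
BALANCE_FIELD = {
    "A": "balance on hand",
    "B": "current balance",
    "C": "cash in house",
    "D": "balance",
}

def standardize_gender(app, code):
    """Lookup function: translate a source code to the warehouse value."""
    return GENDER_LOOKUP[app][code]

def to_centimeters(value, unit):
    """Conversion function: express a length in centimeters."""
    return value * CM_PER_UNIT[unit]

def to_ordinal_date(app, text):
    """Formatting function: parse a source date and emit YYYYDDD."""
    return datetime.strptime(text, DATE_FORMATS[app]).strftime("%Y%j")

def map_balance(app, record):
    """Mapping function: rename the source field to the warehouse field."""
    return {"Balance": record[BALANCE_FIELD[app]]}

print(standardize_gender("B", "1"))
print(to_centimeters(10, "inches"))
print(to_ordinal_date("C", "03/08/2004"))
print(map_balance("C", {"cash in house": 125.0}))
```

Keeping the rules in tables rather than in branching code means a new source application can be added by extending the dictionaries.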
Data Warehouse Architecture
Data Warehouse Repository
• Organized around Conformed Dimensions and
Facts
• Promotes Usability and Intuitiveness
• Consolidated and Cross-Functional
• Historical and Atomic Representation of Data
• Insulated from Source System Modifications
and Additions
Data Warehouse Repository
Star Schema Concepts
Fact Table
This table is the core of the Star
Schema Structure and contains
the Facts or Measures available
through the Data Warehouse.
These Facts answer the questions
of “What”, “How Much”, or
“How Many”.
Some Examples:
Sales Dollars, Units Sold, Gross Profit,
Expense Amount, Net Income, Unit Cost,
Number of Employees, Turnover, Salary,
Tenure, etc.
Data Warehouse Repository
Star Schema Concepts
Dimension Tables
These tables describe the Facts
or Measures. These tables
contain the Attributes and may
also be Hierarchical.
These Dimensions answer the
questions of “Who”, “What”,
“When”, or “Where”.
Some Examples:
• Day, Week, Month, Quarter, Year
• Sales Person, Sales Manager, VP of Sales
• Product, Product Category, Product Line
• Cost Center, Unit, Segment, Business, Company
Data Warehouse Repository
Star Schema Concepts
Sales_Fact: TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey, Required Data (Business Metrics or Measures), ...
Time_Dim: TimeKey, TheDate, ...
Employee_Dim: EmployeeKey, EmployeeID, ...
Product_Dim: ProductKey, ProductID, ...
Customer_Dim: CustomerKey, CustomerID, ...
Shipper_Dim: ShipperKey, ShipperID, ...
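To make the star schema concrete, the sketch below builds a cut-down version in an in-memory SQLite database using Python's standard sqlite3 module. Only two of the five dimensions are included to keep it short; the others would follow the same pattern. The measure columns SalesDollars and UnitsSold and all sample rows are assumptions for illustration, since the slide elides the actual measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Time_Dim (TimeKey INTEGER PRIMARY KEY, TheDate TEXT);
CREATE TABLE Product_Dim (ProductKey INTEGER PRIMARY KEY, ProductID TEXT);
CREATE TABLE Sales_Fact (
    TimeKey INTEGER REFERENCES Time_Dim(TimeKey),
    ProductKey INTEGER REFERENCES Product_Dim(ProductKey),
    SalesDollars REAL,    -- hypothetical measure
    UnitsSold INTEGER     -- hypothetical measure
);
""")

# Illustrative rows only.
conn.executemany("INSERT INTO Time_Dim VALUES (?, ?)",
                 [(1, "2004-03-08"), (2, "2004-03-09")])
conn.executemany("INSERT INTO Product_Dim VALUES (?, ?)",
                 [(10, "APPLES"), (20, "GRAPES")])
conn.executemany("INSERT INTO Sales_Fact VALUES (?, ?, ?, ?)",
                 [(1, 10, 100.0, 5), (1, 20, 50.0, 2), (2, 10, 25.0, 1)])

# The fact table answers "how much"; the dimension says "of what".
rows = conn.execute("""
    SELECT p.ProductID, SUM(f.SalesDollars), SUM(f.UnitsSold)
    FROM Sales_Fact AS f
    JOIN Product_Dim AS p ON p.ProductKey = f.ProductKey
    GROUP BY p.ProductID
    ORDER BY p.ProductID
""").fetchall()
print(rows)
```

The fact table holds only keys and measures, so the same query shape works for any dimension by swapping the joined table.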
Data Warehouse Repository
Cube Concepts
Diagram: a cube with a Markets Dimension (Atlanta, Chicago, Denver, Dallas), a third axis of products (Grapes, Cherries, Melons, Apples), and a Time Dimension (Q1, Q2, Q3, Q4)
Data Warehouse Repository
Cube Concepts
Diagram: the same cube with a Sales Fact in its cells; Markets Dimension (Atlanta, Chicago, Denver, Dallas), a third axis of products (Grapes, Cherries, Melons, Apples), and Time Dimension (Q1, Q2, Q3, Q4)
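A cube like the one pictured can be thought of as a set of cells, each addressed by one member from every dimension and holding a Sales fact. The sketch below models that idea with a plain dictionary. The cell values are invented for illustration and the function name is hypothetical.

```python
# Illustrative cells only: (market, product, quarter) -> sales.
cube = {
    ("Atlanta", "Apples", "Q1"): 10,
    ("Atlanta", "Grapes", "Q1"): 4,
    ("Chicago", "Apples", "Q1"): 7,
    ("Atlanta", "Apples", "Q2"): 12,
}

def slice_cube(cells, market=None, product=None, quarter=None):
    """Sum the Sales fact over every cell matching the fixed coordinates.

    A coordinate left as None means "all members of that dimension".
    """
    wanted = (market, product, quarter)
    return sum(
        value
        for coords, value in cells.items()
        if all(w is None or w == c for w, c in zip(wanted, coords))
    )

print(slice_cube(cube, market="Atlanta"))                 # 26
print(slice_cube(cube, product="Apples", quarter="Q1"))   # 17
print(slice_cube(cube))                                   # 33
```

Fixing one coordinate gives a slice of the cube; fixing none gives the grand total.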
Data Warehouse Repository
Storage Concepts
• Relational On-Line Analytical Processing (ROLAP): The information stored in
the Data Warehouse is held in a relational structure. Aggregations are
performed on the fly, either by the database or in the analysis tool.
• Multidimensional On-Line Analytical Processing (MOLAP): The information
is aggregated in a predefined manner based on the characteristics of the
Measures and the defined hierarchy of the Dimensions. Since the data is
pre-aggregated, navigating through the hierarchies is nearly instantaneous.
The user is simply navigating to a point within the Multidimensional Cube
and not performing any on-the-fly aggregations.
• Hybrid On-Line Analytical Processing (HOLAP): This is a combination of
MOLAP and ROLAP. A portion of the data is pre-aggregated, typically the
set of information that is accessed most frequently. Additional detail is
held in a ROLAP structure, allowing a user to drill through the MOLAP
structure into the ROLAP structure.
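The difference between aggregating on the fly and navigating pre-aggregated data can be shown with a small sketch. It is a simplified illustration, not how any particular OLAP engine works. The detail rows, the month-to-quarter hierarchy, and the function names are assumptions.

```python
from collections import defaultdict

# Illustrative detail rows: (month, quarter, sales).
detail_rows = [
    ("Jan", "Q1", 100),
    ("Feb", "Q1", 150),
    ("Mar", "Q1", 50),
    ("Apr", "Q2", 200),
]

def rolap_quarter_total(rows, quarter):
    """ROLAP style: scan the detail rows and aggregate at query time."""
    return sum(sales for _, q, sales in rows if q == quarter)

def build_molap_aggregates(rows):
    """MOLAP style: compute totals once for every level of the hierarchy."""
    totals = defaultdict(int)
    for month, quarter, sales in rows:
        totals[("month", month)] += sales
        totals[("quarter", quarter)] += sales
        totals[("year", "all")] += sales
    return dict(totals)

aggregates = build_molap_aggregates(detail_rows)

print(rolap_quarter_total(detail_rows, "Q1"))   # 300, computed per query
print(aggregates[("quarter", "Q1")])            # 300, a single lookup
print(aggregates[("year", "all")])              # 500
```

A HOLAP arrangement would keep the aggregate dictionary for the frequently used levels and fall back to scanning the detail rows when a user drills below them. The trade-off is the extra storage needed to hold the pre-computed totals.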
Data Warehouse Repository
Cube Concepts
Client perspective     MOLAP     HOLAP     ROLAP
Query performance      Fastest   Faster    Fast
Storage consumption    High      Medium    Low
Where does Microsoft fit in?
Data Warehouse Components
• Source Systems: Division A, Division B, Division C, External Data
• Extraction Transformation Load (ETL)
• Enterprise Data Warehouse: Business Group Level, Divisional Level, and Corporate Level, each shown as DW / DM, DM, DM
• Direction labels: Increased Level of Standardization; Increased Local Specifications
• Data Access & Query Management Services
• Management Systems: Planning & Forecasting, Analytics & Modeling, Performance Management, Query & Reporting
• Access Methods: Portal / Web Interface, Desktop Applications, Printed Reports, Scorecards & Dashboards, Email, Mobile Devices
Microsoft products annotated on the diagram:
• SQL Server DTS
• SQL Server Relational Database and Analysis Services
• SQL Stored Procedures, SQL Views, MDX, and .NET Web Services
• Microsoft Office, Reporting Services and .NET Framework
• SharePoint Portal, Exchange, and .NET Framework
Q & A