
Ch12 The-Nature-of-Data-The-Data-Warehouse

DATA WAREHOUSE AND
INTEGRATION
Lecturer: Nguyễn Văn Hồ, M.A.
Data Warehouse and Integration:
The Nature of Data & The Data Warehouse
honv@uel.edu.vn
Agenda
Introduction
Definitions of Data Warehouse, Business Intelligence, and ETL
Data Warehouse vs Operational System
Evolving to modern Data Warehousing
Q&A
Why care about data?
Data helps us make decisions; a decision made without considering
data is simply a guess.
Image source: http://blog.popcornmetrics.com/content/images/2015/03/blind-text2-min.png
Insights from data
[Figure: the eight levels of analytics, plotted as competitive advantage against degree of intelligence]
Traditional analytics (descriptive/analysis: what is happening; 75% of usage):
• Standard reports: What just happened?
• Ad-hoc reports: How many, how often, who and where?
• Query drilldown: What exactly is the problem?
• Alerts: What actions are needed?
Advanced analytics (predictive; 25% of usage):
• Statistical analysis: Why is this happening?
• Forecasting: What if these trends continue?
• Predictive modeling: What will happen next?
• Optimization: What’s the best that can happen?
Source: Eight levels of analytics/SAS
Trusted Information
Consumer Banking Division
• Card center
• ATM
• Telesales center
• Personal credit center
• Sales management department
• E-banking department
• Market research sub-department
The problems are:
- It takes about 10 to 15 days to gather all the reports.
- The reports contain conflicts and mistakes.
- The reports come in many formats (PDF, Word, TIFF, Excel, ...).
- Some reports highlight the good points and hide the negative points.
- etc.
Image source: http://www.moneywalks.com/wp-content/uploads/2008/10/credit-report.jpg
Data is everywhere, information nowhere
• I can't find the data I need: data is scattered over many versions with subtle differences
• I can't understand the data I found: data is not well documented
• I can't use the data I found: data needs to be transformed from one form to another
What are the users saying…
• Data should be integrated across the enterprise
• Summary data has a real value to the organization
• Historical data holds the key to understanding data over time
• What-if capabilities are required
How can I answer the questions above?
Is data warehousing the solution?
Can I improve my business using data warehousing?
Yes. How?
What is a Data Warehouse?
• It is a central location where consolidated data from multiple locations is stored.
• It is not loaded every time new data is generated.
• The business determines the schedule on which a data warehouse is loaded: daily, monthly, quarterly, etc.
[Figure: Source 1, Source 2, ..., Source n feed the Data Warehouse, which serves User 1, User 2, ..., User n.]
What is a Data Warehouse?
"A data warehouse is a subject-oriented,
integrated, time-variant, and nonvolatile
collection of data in support of management's
decision-making process."
[Figure: the four characteristics of a data warehouse: subject-oriented, integrated, time-variant, and non-volatile. Image source: google.com]
Subject-oriented
Data is categorized and stored by business subject rather than by
application.
[Figure: the OLTP applications (Equity Plans, Shares, Insurance, Savings, Loans) map to one data warehouse subject: customer financial information.]
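A minimal sketch of subject orientation, under stated assumptions: the application names come from the slide, while the record layout, the customer IDs, and the `group_by_subject` helper are hypothetical. It regroups rows that arrive per application into one entry per customer.

```python
from collections import defaultdict

# Hypothetical application-oriented records: each OLTP application
# keeps its own rows for the customers it serves.
oltp_records = {
    "Savings": [{"customer_id": 1, "balance": 500.0}],
    "Loans": [
        {"customer_id": 1, "balance": -2000.0},
        {"customer_id": 2, "balance": -750.0},
    ],
    "Insurance": [{"customer_id": 2, "balance": 0.0}],
}


def group_by_subject(records_by_app):
    """Regroup rows by the business subject (the customer)."""
    by_customer = defaultdict(dict)
    for app, rows in records_by_app.items():
        for row in rows:
            by_customer[row["customer_id"]][app] = row["balance"]
    return dict(by_customer)


customer_financials = group_by_subject(oltp_records)
```

After the regrouping, a question about one customer is answered from a single entry instead of from each application in turn.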
Integrated
Data on a given subject is defined and stored once.
[Figure: the OLTP applications (Savings, Current Account, Loans) feed a single Data Warehouse.]
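One way to picture "defined and stored once" is the sketch below. It assumes each application holds its own copy of the customer name with its own formatting; the row layout, the sample names, and the `integrate_customers` helper are hypothetical and not part of the lecture.

```python
def integrate_customers(sources):
    """Merge per-application customer rows into one row per customer.

    `sources` maps an application name to a list of rows. Each row has a
    `customer_id` and a `name` that may differ in case or spacing.
    """
    customers = {}
    for app, rows in sources.items():
        for row in rows:
            # One agreed definition of a name: single spaces, title case.
            name = " ".join(row["name"].split()).title()
            entry = customers.setdefault(
                row["customer_id"], {"name": name, "sources": []}
            )
            entry["sources"].append(app)
    return customers


sources = {
    "Savings": [{"customer_id": 7, "name": "an  NGUYEN"}],
    "Current Account": [{"customer_id": 7, "name": "An Nguyen"}],
    "Loans": [
        {"customer_id": 7, "name": "AN NGUYEN"},
        {"customer_id": 8, "name": "binh tran"},
    ],
}
warehouse_customers = integrate_customers(sources)
```

Customer 7 appears in three applications but ends up as one warehouse row that records which sources contributed to it.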
Time-variant
Data is stored as a series of snapshots, each representing a period
of time.
TIME       DATA
Jan-2016   January
Feb-2016   February
Mar-2016   March
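The snapshot idea can be sketched as a store keyed by period, so that loading February does not overwrite January. The `SnapshotStore` class and the `total_deposits` figures are hypothetical illustrations.

```python
from datetime import date


class SnapshotStore:
    """Keeps one snapshot per month instead of overwriting the value."""

    def __init__(self):
        self._snapshots = {}  # (year, month) -> snapshot data

    def add_snapshot(self, period_end: date, data: dict) -> None:
        self._snapshots[(period_end.year, period_end.month)] = dict(data)

    def as_of(self, year: int, month: int) -> dict:
        return self._snapshots[(year, month)]

    def periods(self):
        return sorted(self._snapshots)


store = SnapshotStore()
store.add_snapshot(date(2016, 1, 31), {"total_deposits": 100})
store.add_snapshot(date(2016, 2, 29), {"total_deposits": 120})
store.add_snapshot(date(2016, 3, 31), {"total_deposits": 90})
```

Because every period is retained, a trend across January, February, and March can be read back at any later time with `as_of`.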
Non-Volatile
Typically, data in the data warehouse is not updated or deleted.
Operational: Insert, Update, Delete, Read
Warehouse: Load, Read
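The contrast between the two sets of operations can be sketched as an append-only table. The `NonVolatileTable` class is hypothetical, and it is stricter than the slide's "typically": a real warehouse may still allow controlled corrections.

```python
class NonVolatileTable:
    """Append-only table: rows can be loaded and read, never changed."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # Store copies so later changes to the input do not leak in.
        self._rows.extend(dict(r) for r in rows)

    def read(self):
        # Return copies so callers cannot modify the stored rows.
        return [dict(r) for r in self._rows]

    def update(self, *args, **kwargs):
        raise PermissionError("warehouse rows are not updated")

    def delete(self, *args, **kwargs):
        raise PermissionError("warehouse rows are not deleted")
```

An operational table would implement `update` and `delete`; here they exist only to make the restriction explicit.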
Changing data
[Figure: the operational databases populate the warehouse database with a first-time load, followed by periodic refreshes.]
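A rough sketch of a first-time load followed by refreshes, assuming each operational row carries a `modified_at` value that only increases (here a plain integer). That column, the watermark technique, and both function names are assumptions for illustration; the slide does not say how changes are detected.

```python
def first_time_load(operational_rows):
    """Copy every operational row and remember the newest change seen."""
    warehouse = [dict(r) for r in operational_rows]
    watermark = max((r["modified_at"] for r in operational_rows), default=0)
    return warehouse, watermark


def refresh(warehouse, operational_rows, watermark):
    """Append only rows changed after the watermark; return the new one."""
    changed = [
        dict(r) for r in operational_rows if r["modified_at"] > watermark
    ]
    warehouse.extend(changed)
    return max((r["modified_at"] for r in changed), default=watermark)
```

Running `refresh` twice in a row with no new operational changes adds nothing the second time, because the watermark has already moved past every row.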
The Inmon Warehouse
What is a Data Warehouse?
"Data warehouse is the conglomerate of
all data marts within the enterprise.
Information is always stored in the
dimensional model."
Image source: google.com
The Kimball Warehouse
Getting started with Choices
Kimball
- Will start with data marts
- Focus on quick delivery to users
Inmon
- Will focus on the enterprise
- Organizational focus
How to Choose?
What is Business Intelligence (BI)?
BI is a set of tools and techniques that enable the analysis of information
in order to improve business decisions.
Image source: http://www.vedamsoft.com/images/bi.jpg
Complete Spectrum of BI Technologies
[Figure: BI technologies plotted by complexity (low to high) against business value (low to high), from lowest to highest:]
• REPORTING: What happens? Queries, reporting and search tools
• ANALYSIS: Why did it happen? Cubes, visualization utilities
• MONITORING: What’s happening now? Dashboards, scorecards
• PREDICTION: What may happen? Predictive analytics
Data warehouse and BI Landscape
Image source: http://4.bp.blogspot.com/
What is ETL?
• The process of gathering data from the production systems, cleansing it, validating it, and moving it into the data warehouse.
• This process can be considered part of the data warehouse infrastructure.
• ETL stands for:
  - Data Extraction: get data from multiple, heterogeneous, and external sources
  - Data Transformation: convert data from legacy or host format to warehouse format
  - Data Loading: sort, summarize, consolidate, compute views, check integrity, and build indexes and partitions
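The three stages can be sketched with the Python standard library. This is an illustration only: the CSV source, the `sales` table, its columns, and the validation rule (skip rows with no amount) are assumptions, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3


def extract(csv_text):
    """Extraction: read rows from a source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transformation: cleanse, validate, convert to the target format."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:  # validation: skip rows with no amount
            continue
        cleaned.append((row["customer"].strip().title(), float(amount)))
    return cleaned


def load(conn, rows):
    """Loading: insert the rows, then build an index."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_sales_customer ON sales (customer)"
    )
    conn.commit()


source = "customer,amount\n alice ,10.5\nbob,\nCAROL,4\n"
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(source)))
```

Keeping the stages as separate functions means a new source only needs a new `extract`, while the cleansing rules and the load step stay unchanged.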
What is ETL?
• Determine the data you need and the data you don't need in your target.
[Figure: of all the data held in the OLTP system, only the data of interest is carried over to the OLAP system.]
• Determine the source from which that data will come.
• Determine the data extraction methods, cleansing rules, and transformation rules.
[Figure: the source columns FirstName char(20) and LastName char(20) are joined into the target column Name varchar(50).]
• Load the dimension data and the fact data.
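The name transformation and the final load step might look like the sketch below. The column names `FirstName`, `LastName`, and `Name` and their lengths come from the slide; the `amount` field, the surrogate keys, and both function names are hypothetical.

```python
def to_name(first_name: str, last_name: str) -> str:
    """Join two padded char(20) fields into one varchar(50) value."""
    name = f"{first_name.strip()} {last_name.strip()}".strip()
    if len(name) > 50:
        raise ValueError("Name exceeds varchar(50)")
    return name


def load_dimension_and_facts(source_rows):
    """Build a customer dimension and the facts that reference it."""
    customer_dim = {}  # Name -> surrogate key
    sales_fact = []    # (customer_key, amount)
    for row in source_rows:
        name = to_name(row["FirstName"], row["LastName"])
        # Reuse the key if the customer is already in the dimension.
        key = customer_dim.setdefault(name, len(customer_dim) + 1)
        sales_fact.append((key, row["amount"]))
    return customer_dim, sales_fact
```

For example, two source rows for "An Tran" with different padding resolve to the same dimension key, so both fact rows point at one customer.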
Data Warehouses vs Operational Systems?
• Goals
• Structure
• Size
• Performance optimization
• Technologies used