Data Warehousing

Instructor
Mr H.S. Mluba
Email:mluba.h@gmail.com
Course Description
This course covers the main concepts of the data warehouse, a store
that contains historical data. Data warehouses are created for
analytical purposes (including the use of data mining and knowledge
discovery tools) and for storage.
Course Objectives
A. To enable students to understand the basic concepts of
data warehouse theory, design and implementation;
B. To understand the various applications of data warehouses in
comparison with transactional databases;
C. To enable students to understand how to use a data
warehouse for knowledge extraction.
COURSE CONTENTS
▪ Introduction to data warehouse
▪ Trends in data warehousing
▪ Data warehouse: The building blocks
▪ Data pre-processing: Introduction
▪ Data pre-processing: Data cleaning
▪ Data pre-processing: Data transformation
▪ Data pre-processing: Dimensionality reduction
▪ Data warehouse environment: Structure of the data warehouse
▪ Data warehouse environment: Granularity
▪ Data warehouse environment: Structuring data in data warehouse
▪ Data Extraction, Transformation and Loading: Introduction
▪ Data Extraction, Transformation and Loading: Transformation
▪ Data Extraction, Transformation and Loading: Data loading
▪ Online Analytical Processing (OLAP)
▪ Multidimensional OLAP (MOLAP)
▪ Relational OLAP (ROLAP)
▪ Dimensional modeling
▪ Data cube and OLAP technology in multidimensional database
▪ Efficient processing of OLAP queries
▪ Quality factors of data warehouse and its evaluation. Supporting decision making
▪ Data quality management in data warehouse
▪ Introduction to web warehousing
▪ OLAP and the web, building a web-enabled data warehouse
▪ Data warehouse deployment: Deployment activities
▪ Data warehouse deployment: Security issues, backup and recovery
▪ Introduction to data mining
▪ Selected concepts on using data warehouse for knowledge discovery (Mining of association rules)
▪ Selected concepts on using data warehouse for knowledge discovery (Classification: Introduction)
▪ Selected concepts on using data warehouse for knowledge discovery (Classification: Bayesian classification and rule-based classification)
▪ Selected concepts on using data warehouse for knowledge discovery (Classification: Lazy learner, prediction, accuracy and error measures)
Reference/Text Books:
1. Inmon, W. H. Building the Data Warehouse. Wiley & Sons, 2005.
2. Ponniah, Paulraj. Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals. John Wiley & Sons, 2001.
3. Han, Jiawei and Kamber, Micheline. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2001.
4. Kimball, Ralph and Ross, Margy. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Second Edition). John Wiley & Sons, 2002. ISBN: 0-471-20024-7.
5. Humphries, Hawkins, and Dy. Data Warehousing: Architecture and Implementation (First Edition). Prentice Hall PTR.
Prerequisite
▪ Introduction to Database Systems

MODULE 1 - INTRODUCTION
Module 1. Introduction to Data Warehouse
▪ Evolution of data warehouse
▪ Concept of data warehouse
▪ Goals of data warehouse
▪ Data warehouse application
Evolution of Data Warehouse
▪ Since the 1970s, organizations have gained competitive advantage through automation of business processes to offer more efficient and cost-effective services to customers.
▪ This resulted in accumulation of growing amounts of data in operational databases.
▪ Organizations now focus on ways to use operational data to support decision-making, as a means of gaining competitive advantage.
▪ However, operational systems were never designed to support such business activities.
What is a Data Warehouse?
▪ “A copy of transaction data, specifically structured for query and analysis” - Ralph Kimball
▪ “A data warehouse is a simple, complete and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context” - IBM
▪ Defined in many different ways, but not rigorously.
▪ A decision support database that is maintained separately from the organization’s operational database.
▪ Supports information processing by providing a solid platform of consolidated, historical data for analysis.
▪ “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.” - W. H. Inmon
▪ Data warehousing: the process of constructing and using data warehouses.
Data Warehouse - Subject-Oriented
▪ Organized around major subjects, such as customer, product, sales.
▪ Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.
▪ Provides a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.
Data Warehouse - Subject-Oriented (cont…)
Data is categorised and stored in the DW by type rather than by application.
[Figure: Operational systems (Manufacturing, Accounting, Order entry): operational data is organised by specific processes or tasks. Data warehouse (Customer, Vendor, Product): warehoused data is organised by subject area and draws from data residing in many operational systems.]
Data Warehouse - Integrated
▪ Constructed by integrating multiple, heterogeneous data sources
  – relational databases, flat files, on-line transaction records
▪ Data cleaning and data integration techniques are applied.
  – Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources
  – E.g., hotel price: currency, tax, breakfast covered, etc.
  – When data is moved to the warehouse, it is converted.
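The hotel price example can be made concrete with a small sketch. It is only an illustration: the two source layouts, the field names, the exchange rate, and the tax rate are all assumptions invented for this example, not part of any real system. Each source is converted to one agreed convention (price in USD with tax included, breakfast as a boolean) when it is moved to the warehouse.

```python
from dataclasses import dataclass

# Hypothetical conversion constants, chosen only for illustration.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.25}
TAX_RATE = 0.10


@dataclass
class HotelPrice:
    """Warehouse convention: USD, tax included, boolean breakfast flag."""
    hotel: str
    price_usd: float
    breakfast_included: bool


def from_source_a(row: dict) -> HotelPrice:
    # Assumed source A: any currency, tax already included, flag "Y"/"N".
    return HotelPrice(
        hotel=row["name"].strip().title(),
        price_usd=round(row["price"] * RATES_TO_USD[row["currency"]], 2),
        breakfast_included=(row["breakfast"] == "Y"),
    )


def from_source_b(row: dict) -> HotelPrice:
    # Assumed source B: net price in USD (tax excluded), flag 1/0.
    return HotelPrice(
        hotel=row["hotel_name"].strip().title(),
        price_usd=round(row["net_price"] * (1 + TAX_RATE), 2),
        breakfast_included=bool(row["bf"]),
    )


a = from_source_a({"name": "grand hotel", "price": 80.0,
                   "currency": "EUR", "breakfast": "Y"})
b = from_source_b({"hotel_name": "GRAND HOTEL ", "net_price": 100.0, "bf": 0})
print(a)
print(b)
```

After conversion, both rows use the same hotel name, the same currency, and the same tax convention, so they can be compared directly.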
Data Warehouse - Integrated (cont…)
Example: Banking Institution
[Figure: Operational environment: Savings, Current Accounts, and Personal Loans applications, each with its own database, so customer data is stored in several databases. Data warehouse: a single database with no application flavour, Subject = Customer. Diagram labels: built separately, built over time, integrated from start, built at same time.]
Data Warehouse - Time Variant
▪ The time horizon for the data warehouse is significantly longer than that of operational systems
  – Operational database: current value data
  – Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years)
▪ Every key structure in the data warehouse
  – Contains an element of time, explicitly or implicitly
  – But the key of operational data may or may not contain “time element”
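A minimal sketch of the difference in key structure, assuming a hypothetical account-balance record (the account ID, dates, and amounts are made up). The operational store is keyed by account only and keeps the current value; the warehouse key also carries a time element, so earlier values are retained.

```python
from datetime import date

# Operational store: keyed by account only, holds the current value.
operational = {}
# Warehouse store: the key includes a time element (account, snapshot date).
warehouse = {}


def record_balance(account: str, balance: float, day: date) -> None:
    operational[account] = balance        # overwrites the previous value
    warehouse[(account, day)] = balance   # keeps one entry per snapshot date


record_balance("ACC-1", 500.0, date(2024, 1, 31))
record_balance("ACC-1", 650.0, date(2024, 2, 29))

# History is recoverable only from the warehouse.
history = sorted(
    (day, bal) for (acct, day), bal in warehouse.items() if acct == "ACC-1"
)
print(operational)
print(history)
```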
Data Warehouse - Nonvolatile
▪ A physically separate store of data transformed from the operational environment
Data Warehouse - Non-Volatile (cont…)
[Figure: Operational applications insert, read, update, and delete data in the operational databases. Data is loaded into the data warehouse, which is read-only for end users.]
▪ Operational update of data does not occur in the data warehouse environment
  – Does not require transaction processing, recovery, and concurrency control mechanisms
  – Requires only two operations in data accessing: initial loading of data and access of data
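The two operations can be sketched as a small append-only store. The class name `Warehouse` and its methods are hypothetical, used only to illustrate the idea; a real warehouse is a database, not an in-memory list.

```python
class Warehouse:
    """Append-only store: the only operations are load and read."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # Loading appends copies; existing rows are never modified.
        self._rows.extend(dict(r) for r in rows)

    def read(self, predicate=lambda row: True):
        # Return copies so callers cannot change the stored data.
        return [dict(r) for r in self._rows if predicate(r)]


wh = Warehouse()
wh.load([{"product": "tea", "qty": 3}, {"product": "coffee", "qty": 5}])

result = wh.read(lambda r: r["qty"] > 4)
result[0]["qty"] = 0  # changes only the caller's copy, not the warehouse
print(wh.read())
```

There is deliberately no update or delete method: changes happen in the operational systems and reach the warehouse only through a later load.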
Why a Separate Data Warehouse?
▪ High performance for both systems
  • DBMS - tuned for OLTP: access methods, indexing, concurrency control, recovery
  • Warehouse - tuned for OLAP: complex OLAP queries, multidimensional view, consolidation
▪ Different functions and different data:
  • missing data: Decision support requires historical data which operational DBs do not typically maintain
  • data consolidation: Decision support requires consolidation (aggregation, summarization) of data from heterogeneous sources.
  • data quality: different sources typically use inconsistent data representations, codes and formats which have to be reconciled.
The Traditional Approach
▪ Query-driven (lazy, on-demand)
[Figure: Clients query an integration system (with metadata), which reaches each source through its own wrapper.]
Disadvantages of Query-Driven Approach
▪ Delay in query processing
  – Slow or unavailable information sources
  – Complex filtering and integration
▪ Inefficient and potentially expensive for frequent queries
▪ Competes with local processing at sources
▪ Hasn’t caught on in industry
The Warehousing Approach
▪ Information integrated in advance
▪ Stored in warehouse for direct querying and analysis
[Figure: An extractor/monitor at each source feeds an integration system (with metadata), which loads the data warehouse.]
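The contrast between the query-driven and warehousing approaches can be sketched as follows. The two sources, their field names, and the wrapper functions are hypothetical; the counter only shows how often the sources are contacted.

```python
# Two hypothetical sources with different field names.
SOURCE_A = [{"cust": "Asha", "amount": 120}, {"cust": "Juma", "amount": 80}]
SOURCE_B = [{"customer_name": "Asha", "total": 50}]

calls = {"sources_read": 0}


def wrap_a():
    # Wrapper/extractor for source A: translate to a common layout.
    calls["sources_read"] += 1
    return [{"customer": r["cust"], "amount": r["amount"]} for r in SOURCE_A]


def wrap_b():
    # Wrapper/extractor for source B.
    calls["sources_read"] += 1
    return [{"customer": r["customer_name"], "amount": r["total"]}
            for r in SOURCE_B]


def total_for(rows, customer):
    return sum(r["amount"] for r in rows if r["customer"] == customer)


def query_driven(customer):
    # Lazy: contact every source each time a query arrives.
    return total_for(wrap_a() + wrap_b(), customer)


# Warehousing: extract and integrate once, in advance.
warehouse = wrap_a() + wrap_b()


def warehouse_query(customer):
    # Queries run against the local copy; sources are not contacted.
    return total_for(warehouse, customer)


print(query_driven("Asha"), warehouse_query("Asha"))
```

Both approaches return the same answer here. The warehouse answer can become stale if the sources change after the load, which is the trade-off for not contacting the sources at query time.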
Advantages of Warehousing Approach
▪ High query performance
  – But not necessarily most current information
▪ Doesn’t interfere with local processing at sources
  – Complex queries at warehouse
  – OLTP at information sources
▪ Information copied at warehouse
  – Can modify, annotate, summarize, restructure, etc.
  – Can store historical information
  – Security, no auditing
Difference between Data Warehouse and Operational Database

Operational Database (OLTP) | Data Warehouse (OLAP)
It involves day-to-day processing. | It involves historical processing of information.
OLTP systems are used by clerks, DBAs, or database professionals. | OLAP systems are used by knowledge workers such as executives, managers, and analysts.
It is used to run the business. | It is used to analyze the business.
It focuses on data in. | It focuses on information out.
It is based on the Entity Relationship Model. | It is based on Star Schema, Snowflake Schema, and Fact Constellation Schema.
It is application oriented. | It is subject oriented.
It contains current data. | It contains historical data.
It provides primitive and highly detailed data. | It provides summarized and consolidated data.
It provides a detailed and flat relational view of data. | It provides a summarized and multidimensional view of data.
The number of users is in thousands. | The number of users is in hundreds.
The number of records accessed is in tens. | The number of records accessed is in millions.
The database size is from 100 MB to 100 GB. | The database size is from 100 GB to 100 TB.
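A small sketch of the two access patterns, using Python's built-in sqlite3 module with an in-memory database. The `sales` table and its rows are hypothetical, and a single SQLite table stands in for both kinds of system only to show the shape of the queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (region, year, amount) VALUES (?, ?, ?)",
    [("North", 2022, 100.0), ("North", 2023, 150.0),
     ("South", 2022, 80.0), ("South", 2023, 120.0)],
)

# OLTP-style access: read or update one detailed record by its key.
conn.execute("UPDATE sales SET amount = 160.0 WHERE id = 2")
one_row = conn.execute(
    "SELECT region, year, amount FROM sales WHERE id = 2"
).fetchone()

# OLAP-style access: scan many records and return a summarized view.
summary = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(one_row)
print(summary)
```

The first query touches a single row, while the second reads every row to produce one total per region, which matches the "tens" versus "millions" of records accessed in the comparison above.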
Goals of Data Warehouse
▪ Serving as the foundation for improved decision making.
  – It must have the right data in it to support decision making.
    • Decisions are the true output of the facts provided by the data warehouse.
When Is a Data Warehouse Not Appropriate?
▪ When the operational systems are not ready.
  – The data warehouse is populated with information primarily from the operational systems of the enterprise. A good indicator of operational system readiness is the amount of IT effort focused on operational systems.
  – A number of telltale signs indicate a lack of readiness:
    • Many new operational systems are planned for development or are in the process of being deployed.
    • Many of the operational systems are legacy applications that require much firefighting.
    • Many of the operational systems require major enhancements and must be overhauled.
▪ When the need is operational integration.
  – Despite its ability to provide integrated data for decisional information needs, a data warehouse does not in any way contribute to meeting the operational information needs of the enterprise. Data warehouses do not integrate data quickly enough or often enough for operational management purposes.
  – If the enterprise needs operational integration, then the typical data warehouse deployment is insufficient.
Data Warehouse Application
▪ The successful implementation of data warehousing technologies creates new possibilities for enterprises.
▪ Applications that previously were not feasible due to the lack of integrated data are now possible.
▪ Different types of enterprises implement data warehouses, and they deploy different types of applications.
▪ Warehousing applications are categorized into the following types and tasks.
Types of Warehousing Applications
▪ Sales and Marketing
  – Performance trend analysis: Since a data warehouse is designed to store historical data, it is an ideal technology for analyzing performance trends within an organization.
  – Cross-selling: By obtaining a clearer picture of customers and the services that they avail themselves of, the enterprise can identify opportunities for cross-selling additional products and services to existing customers.
  – Customer profiling and target marketing: Internal enterprise data can be integrated with census and demographic data to analyze and derive customer profiles.
  – Promotions and product bundling: The data warehouse allows enterprises to analyze their customers' purchasing histories as an input to promotions and product bundling.
▪ Financial analysis and Management
  – Risk analysis and management: Integrated warehouse data allow enterprises to analyze their risk exposure.
  – Profitability analysis: If operating costs and revenues are tracked or allocated at a sufficiently detailed level in operational systems, a data warehouse can be used for profitability analysis.
▪ Customer care and services
  – Customer relationship management: Warehouse data can also be used as the basis for managing the enterprise's relationships with its many customers. Customers will be far from pleased if different groups in the same enterprise ask them for the same information more than once. Customers appreciate enterprises that never forget special instructions, preferences, or requests.