Lecture2 - Rabieramadan.org

advertisement
Data Warehouse Fundamentals
Rabie A. Ramadan, PhD
2
Your Assignment

For an airlines company, how can strategic information increase
the number of frequent flyers? Discuss giving specific details.

You are a Senior Analyst in the IT department of a company
manufacturing automobile parts. The marketing heads are
complaining about the poor response by IT in providing strategic
information. Draft a proposal to them explaining the reasons for
the problems and why a data warehouse would be the only
viable solution.
2
Lecture Objectives

Review formal definitions of a data warehouse




Discuss the defining features
Distinguish between data warehouses and data marts
Study each component or building block that makes up a data
warehouse
3
What is a Data Warehouse?
(a practitioner’s viewpoint)

“A data warehouse is simply a single, complete, and
consistent store of data obtained from a variety of
sources and made available to end users in a way they
can understand and use it in a business context” –
Barry Devlin, IBM Consultant

“A data warehouse is a database of data gathered from
many systems and intended to support management
reporting and decision making” – Michael Corey et al,
CTO of OneWarranty.com
4
What is a Data Warehouse?
(a Classical viewpoint)

According to W. H. Inmon
(Building a Data
Warehouse, 1992) “A DW is
a subject oriented, integrated,
time varying, non-volatile
collection of data that is used
primarily in organizational
decision making.”
5
WHAT IS DATA WAREHOUSING

A data warehouse is typically a dedicated database
system for decision making that is separate from the
production database(s) used operationally. It differs from
production system in that:
• it covers a much longer time horizon than transaction
•
•
systems
it includes multiple databases that have been processed so
that the warehouse’s data are defined uniformly (i.e., ‘clean’
data)
it is optimized for answering complex queries from
managers and analysts.
6
Standard DB v. DW
7
CHARACTERISTICS
8
CHARACTERISTICS
9
Characteristics of a Data
Warehouse
10
Characteristics of a Data
Warehouse
11
SUBJECT ORIENTATION

Data is organized around major subjects of the enterprise.
12
Subject Oriented

Data warehouses are designed to help you analyze data.
For example, to learn more about your company's sales data, you can
build a warehouse that concentrates on sales.

Using this warehouse, you can answer questions like "Who was our
best customer for this item last year?" This ability to define a data
warehouse by subject matter, sales in this case, makes the data
warehouse subject oriented.

E.g. claims data are organized around the subject of claims and not by
individual applications of Auto Insurance and Workers’ Comp
13
Class Activity

•
A data warehouse is a subject oriented. What would be
the major critical business subject for :
A local community bank as a business unit
Customer
Profit
Loans
14
Integrated

Integration is closely related to subject
orientation.

Data warehouses must put data from
disparate sources into a consistent format.
They must resolve such problems as naming
conflicts and inconsistencies among units of
measure.


When they achieve this, they are said to be
integrated.
15
Non volatile

Non-volatile means that, once entered into the
warehouse, data are not changed/updated.

This is logical because the purpose of a warehouse is
to enable you to analyze what has occurred.
16
Time Variant




In order to discover trends in business, analysts
need large amounts of data.
This is very much in contrast to online
transaction processing (OLTP) systems, where
performance requirements demand that
historical data be moved to an archive.
The data are kept for many years so they can be
used for trends, forecasting, and comparisons
over time.
A data warehouse's focus on change over
time is what is meant by the term time
variant.
17
Data Granularity
18
DATA MARTS

Data Mart: A scaled-down version of the data
warehouse


A data mart is a small warehouse designed for
the Small Business Unit (SBU) or department
level.

It is often a way to gain entry and provide an
opportunity to learn

Major problem: if they differ from department
to department, they can be difficult to integrate
enterprise-wide
19
Data Warehouse and Data Mart
20
Data Mart and Data Warehouse
21
Data Warehouse COST

Data warehouses are not cheap
•
Median cost to create (does not include operating
cost) = $2.2M

Multimillion dollar costs are common

Their design and implementation is still an art and they
require considerable time to create.
22
Data Warehouse SIZE

Being designed for the enterprise so that
everyone has a common data set, they are
large and increase in size with time.

Typical storage sizes run from 50 Gigabytes
to several Terabytes
23
APPLICATION - DATA MINING


Also known as Knowledge Data Discovery
(KDD)
Mining terminology refers to finding answers
about a business from the data warehouse that
the executive or analyst had not thought to ask
24
Data Warehouse Architectures
25
Data Warehouse Architectures:
Basic
26
Data Warehouse Architectures:
with a Staging Area
27
Data Warehouse Architectures:
with a Staging Area and Data Marts
28
A General Architecture for
Data Warehousing
29
30
Problems and Issues
31
Data Systems Supporting DW
32
Class Activity
33
Class Activity

What are the main components of a data warehouse
for your school system?
34
Project

Egypt Election System
•
Governorates’ database system
•
Summarization System
•
Data Warehouse Server
•
Web page with query based system
• Multiple databases on Multiple Servers
• Meta data
35
Download