The Data Warehouse Environment

The Structure of the Data Warehouse
 There are different levels of detail in the data warehouse:
   An older level of detail (usually on alternate, bulk storage)
   A current level of detail
   A level of lightly summarized data (the data mart level)
   A level of highly summarized data
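
The relationship between the levels can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the row layout `(sale_date, customer_id, amount)`, the sample values, and the function names `lightly_summarize` and `highly_summarize` are hypothetical and do not come from the slides. The choice of month and year as the summary grains is likewise an assumption.

```python
from collections import defaultdict
from datetime import date

# Hypothetical current-detail rows: (sale_date, customer_id, amount).
current_detail = [
    (date(2024, 1, 3), "C1", 40.0),
    (date(2024, 1, 17), "C1", 60.0),
    (date(2024, 1, 9), "C2", 25.0),
    (date(2024, 2, 2), "C1", 10.0),
]


def lightly_summarize(rows):
    """Roll detail up to one total per (year, month, customer)."""
    totals = defaultdict(float)
    for sale_date, customer_id, amount in rows:
        totals[(sale_date.year, sale_date.month, customer_id)] += amount
    return dict(totals)


def highly_summarize(light):
    """Roll the lightly summarized level up to one total per year."""
    totals = defaultdict(float)
    for (year, _month, _customer), amount in light.items():
        totals[year] += amount
    return dict(totals)


light = lightly_summarize(current_detail)  # assumed data mart level
high = highly_summarize(light)             # assumed highly summarized level
```

Each level is derived from the one below it, so the detail rows remain the only place where individual transactions can still be seen.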
Subject Orientation

 The data warehouse is oriented to the major subject areas of the corporation that have been defined in the high-level corporate data model.
 Typical subject areas include the following:
   Customer
   Product
   Transaction or activity
   Policy
   Claim
   Account
Day 1-Day n Phenomenon

 On day 1, there is a polyglot of legacy systems essentially doing operational, transactional processing.
 On day 2, the first few tables of the first subject area of the data warehouse are populated. At this point, a certain amount of curiosity is raised, and the users start to discover data warehouses and analytical processing.
 On day 3, more of the data warehouse is populated, and with the population of more data come more users.
 On day 4, as more of the warehouse becomes populated, some of the data that had resided in the operational environment becomes properly placed in the data warehouse, and the data warehouse is now discovered as a source for doing analytical processing.
 On day 5, departmental databases (data marts or OLAP) start to blossom. Departments find that it is cheaper and easier to get their processing done by bringing data from the data warehouse into their own departmental processing environment.
 On day 6, the land rush to departmental systems takes place. It is cheaper, faster, and easier to get departmental data than it is to get data from the data warehouse. Soon end users are weaned from the detail of the data warehouse to departmental processing.
 On day n, the architecture is fully developed.
Granularity
 What is granularity?
 The benefit of granularity
 Granularity example
 Dual levels of granularity
Exploration and Data Mining
 Granular data found in the data warehouse supports more than data marts. It also supports the processes of exploration and data mining.
 What is data mining?
Living Sample Database
 The other way of changing the granularity of data
 How?
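
One common reading of a living sample database is a periodically refreshed subset of the warehouse, drawn by sampling, that is small enough for fast exploratory analysis. The sketch below follows that reading. The function name `living_sample`, the `fraction` parameter, and the fixed seed are assumptions made for illustration; the slides do not say how the sample is drawn.

```python
import random


def living_sample(records, fraction, seed=0):
    """Return a random subset holding round(len(records) * fraction) records.

    At least one record is returned whenever `records` is non-empty.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not records:
        return []
    size = max(1, round(len(records) * fraction))
    # A seeded generator makes the draw reproducible between refreshes.
    return random.Random(seed).sample(records, size)


# Hypothetical warehouse rows, represented here by their row ids.
warehouse_rows = list(range(1000))
sample = living_sample(warehouse_rows, 0.01)
```

Results computed on such a sample are estimates, so this approach suits trend analysis better than questions that need an exact answer.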
Partitioning as a Design Approach
 What is partitioning?
 How is partitioning done?
 The benefits:
   Loading data
   Accessing data
   Archiving data
   Deleting data
   Monitoring data
   Storing data
 Problems with partitioning
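
As a rough illustration of why partitioning helps with the operations listed above, the sketch below splits rows into one partition per year and then deletes old data a whole partition at a time. Partitioning by year is an assumption chosen for the example (the slides do not name a partitioning key), and the row layout and the names `partition_by_year` and `purge_before` are hypothetical.

```python
from collections import defaultdict
from datetime import date


def partition_by_year(rows):
    """Group rows into one list per year of their 'activity_date'."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["activity_date"].year].append(row)
    return dict(partitions)


def purge_before(partitions, cutoff_year):
    """Drop whole partitions older than cutoff_year; return the years removed."""
    old_years = sorted(year for year in partitions if year < cutoff_year)
    for year in old_years:
        del partitions[year]
    return old_years


# Hypothetical activity rows.
rows = [
    {"activity_date": date(2021, 5, 1), "amount": 10},
    {"activity_date": date(2022, 7, 9), "amount": 20},
    {"activity_date": date(2023, 1, 4), "amount": 30},
    {"activity_date": date(2023, 8, 2), "amount": 40},
]

partitions = partition_by_year(rows)
removed = purge_before(partitions, 2023)
```

Because each partition is a separate unit, deleting or archiving a year never touches the rows of any other year.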
Structuring Data in the Data Warehouse
 The most common ways to structure data within the data warehouse:
   Simple cumulative
   Rolling summary
   Simple direct
   Continuous
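
Of the structures listed, the rolling summary is perhaps the easiest to picture in code. The sketch below assumes the usual interpretation: daily totals fill a fixed number of slots, and when the slots are full they are summed into a single weekly entry and cleared. The class name `RollingSummary`, the two-level design, and the seven-day default are assumptions for illustration; a fuller version would roll weeks into months and months into years in the same way.

```python
class RollingSummary:
    """Keep recent daily totals and roll each full week into one weekly total."""

    def __init__(self, days_per_week=7):
        self.days_per_week = days_per_week
        self.daily = []   # totals for the current, incomplete week
        self.weekly = []  # one total per completed week

    def add_day(self, total):
        """Record one day's total, rolling up when the week is complete."""
        self.daily.append(total)
        if len(self.daily) == self.days_per_week:
            self.weekly.append(sum(self.daily))
            self.daily.clear()


summary = RollingSummary()
for day_total in range(1, 10):  # nine hypothetical daily totals: 1..9
    summary.add_day(day_total)
```

The trade-off is visible in the data kept: older periods take far less space, but their day-level detail is gone once the roll-up has happened.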
Data Warehouse: The Standard Manual
 The kinds of things the publication should contain are the following:
   A description of what a data warehouse is
   A description of source systems feeding the warehouse
   How to use the data warehouse
   How to get help if there is a problem
   Who is responsible for what
   The migration plan for the warehouse
   How warehouse data relates to operational data
   How to use warehouse data for DSS
   When not to add data to the warehouse
   What kind of data is not in the warehouse
   A guide to the metadata that is available
   What the system of record is
Auditing and the Data Warehouse
 The primary reasons for not doing auditing from the data warehouse:
   Data that otherwise would not find its way into the warehouse suddenly has to be there.
   The timing of data entry into the warehouse changes dramatically when auditing capability is required.
   The backup and recovery restrictions for the data warehouse change drastically when auditing capability is required.
   Auditing data at the warehouse forces the granularity of data in the warehouse to be at the very lowest level.
 In short, it is possible to audit from the data warehouse environment, but due to the complications involved, it makes much more sense to audit elsewhere.
Cost Justification
 Why not use ROI?
 Justifying your data warehouse
 Cost of running reports
 Cost of building the data warehouse
Data Homogeneity/Heterogeneity
 Data homogeneity?
 Data heterogeneity?
 [Figure: "Heterogeneities are everywhere": personal databases, World Wide Web, scientific databases, digital libraries]
Purging Warehouse Data
 How is data purged, or how is the detail of data transformed?
Reporting and the Architected Environment
 The differences between the two types of reporting
 Operational reporting
   The line item is of the essence; the summary is of little or no importance once used
   Of interest to the clerical community
 Informational reporting
   The line item is of little or no use once used; the summary or other calculation is of primary importance
   Of interest to the managerial community
The Operational Window of Opportunity
 Sample of opportunity
Incorrect Data in the Data Warehouse
 How should the architect handle incorrect data in the data warehouse?