Uploaded by Syed Ijlal Haider

turban ch02 DataWarehouse

Business Intelligence:
A Managerial Approach
Chapter 2:
Data Warehousing—DW
Learning Objectives
Understand basic definitions & concepts of DW
Learn different types of DW architectures; their comparative advantages and disadvantages
Describe processes used in developing & managing DW
Explain DW operations
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Learning Objectives
Explain the role of DW in decision support
Explain ETL processes
Describe real-time (a.k.a. right-time and/or active) DW
Understand DW administration & security issues
What is a Data Warehouse?

A data warehouse is a single, complete, and consistent store of data obtained from a variety of different sources and made available to end users in a form they can understand and use in a business context.
(Barry Devlin's definition)
What is a Data Warehouse?


A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format
“The DW is a collection of integrated, subject-oriented databases designed to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time”
Why Data Warehouse?
Advantages of DW?
Characteristics (properties) of DW
Subject oriented: data is categorized and stored by business subject rather than by application
Integrated: data on a given subject is collected from disparate sources and stored in a single place
Time-variant: data is stored as a series of snapshots, each representing a period of time
Non-volatile: typically, data in a DW is not updated or deleted
Subject oriented
Data are organized by detailed subject, such as sales, products, or customers, containing only information relevant for decision support. Subject orientation enables users to determine not only how their business is performing, but why.
Integrated
A DW must place data from different sources into a consistent format. To do so, it must deal with naming conflicts and discrepancies among units of measure. A DW is presumed to be totally integrated.
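As a minimal sketch of what resolving naming conflicts and unit discrepancies can look like, the Python below maps rows from two hypothetical sources into one format. The source layouts, field names, and the choice of kilograms as the warehouse unit are assumptions made for illustration only.

```python
# Hypothetical source records: two systems describe the same kind of fact
# with different field names and different units of measure.
SOURCE_A = [{"cust_id": 1, "weight_lb": 10.0}]
SOURCE_B = [{"customerNumber": 2, "weight_kg": 3.0}]

LB_PER_KG = 2.20462  # conversion factor: pounds per kilogram


def from_source_a(row):
    """Map source A's names and units to the assumed warehouse format."""
    return {"customer_id": row["cust_id"],
            "weight_kg": round(row["weight_lb"] / LB_PER_KG, 3)}


def from_source_b(row):
    """Source B already uses kilograms; only the key name changes."""
    return {"customer_id": row["customerNumber"],
            "weight_kg": row["weight_kg"]}


def integrate(source_a, source_b):
    """Return all rows in one consistent format."""
    return ([from_source_a(r) for r in source_a]
            + [from_source_b(r) for r in source_b])


integrated = integrate(SOURCE_A, SOURCE_B)
```

After integration every row carries the same keys (`customer_id`, `weight_kg`), regardless of which source it came from.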
Time variant—temporal
time series
A DW maintains historical data. Historical data make it possible to detect trends, deviations, and long-term relationships for forecasting and comparisons, leading to business decision making.
Every DW has a temporal quality. The time granularity of the views may be daily, weekly, monthly, or even yearly.
Nonvolatile
After data are entered into a DW, users cannot change or update the data. Obsolete data are discarded, and changes are recorded as new data.
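The time-variant and nonvolatile properties can be sketched together as an append-only store in which a change becomes a new dated row. The class and method names below are hypothetical, and the sketch leaves out the discarding of obsolete data.

```python
from datetime import date


class SnapshotStore:
    """Append-only store: rows are added, never updated in place."""

    def __init__(self):
        self._rows = []  # list of (as_of_date, key, value) tuples

    def record(self, as_of, key, value):
        """Record a change as a new dated row instead of overwriting."""
        self._rows.append((as_of, key, value))

    def value_as_of(self, key, when):
        """Return the latest value recorded for key on or before `when`."""
        matches = [(d, v) for d, k, v in self._rows if k == key and d <= when]
        return max(matches)[1] if matches else None

    def history(self, key):
        """Return every dated value for key, oldest first."""
        return sorted((d, v) for d, k, v in self._rows if k == key)


store = SnapshotStore()
store.record(date(2011, 1, 31), "price:widget", 9.50)
store.record(date(2011, 2, 28), "price:widget", 9.95)  # change = new row
```

Because earlier rows are never overwritten, `value_as_of` can answer what the value was at any past date, which is what trend and comparison queries rely on.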
Characteristics of DW—Metadata
data about data
A data warehouse contains metadata about how the data are organized and how to effectively use them.
The author, date created, file extension, date modified, and file size are examples of very basic document metadata
Characteristics of DW—Relational
The relational database model uses a two-dimensional structure of rows and columns to store data, in tables of records corresponding to real-world entities. Tables can be linked by common key values.
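A small sketch of two tables linked by a common key, using Python's built-in sqlite3 module. The table names, column names, and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount      REAL
    );
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
""")

# Join the two tables on the shared key and total the orders per customer.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
```

Each row of `orders` points at one row of `customers` through `customer_id`; the join follows that link.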
Terminologies in Relational databases
Data items
Records—Rows
Fields—Columns
Tables
Database
DBMS
Relational Database
RDBMS
Relational Database model
Characteristics of DW—Multidimensional
In a multi-dimensional database, the data are presented to the user in such a way as to represent a hypercube, or multi-dimensional array, where each individual data value is contained within a cell accessible by multiple indexes.
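One simple way to picture a cell that is accessible by multiple indexes is a Python dictionary keyed by a tuple with one entry per dimension. The dimensions (product, region, month) and the values are assumptions for illustration; real multidimensional stores use specialized structures.

```python
# Each cell of the cube is addressed by one index per dimension:
# (product, region, month). All values are illustrative.
cube = {
    ("bike", "east", "jan"): 120,
    ("bike", "west", "jan"): 80,
    ("bike", "east", "feb"): 130,
    ("helmet", "east", "jan"): 40,
}


def cell(product, region, month):
    """Look up one cell; combinations with no data count as zero."""
    return cube.get((product, region, month), 0)


def total(product=None, region=None, month=None):
    """Sum every cell that matches the indexes that were given."""
    wanted = (product, region, month)
    return sum(value for key, value in cube.items()
               if all(w is None or w == k for w, k in zip(wanted, key)))
```

For example, `cell("bike", "east", "jan")` reads a single cell, while `total(product="bike")` sums across the region and month dimensions.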
A multi-dimensional database view
[Figure: A 3-dimensional OLAP cube with slicing operations. Axes: Time, Product, Geography. Cells are filled with numbers representing sales volumes. The three slices show sales volumes of a specific Product on variable Time and Region, of a specific Region on variable Time and Products, and of a specific Time on variable Region and Products.]
Characteristics of DW—Client/server
A data warehouse uses the client/server architecture to provide easy access for end users.
Characteristics of DW—Active
Real-time and/or right-time
Newer data warehouses provide real-time, or active, data access and analysis capabilities
Data Mart
Whereas a DW combines databases across an entire enterprise, a data mart is usually smaller and focuses on a particular subject (e.g., marketing or operations) or department (e.g., finance).
Types of Data Mart
DW Framework
[Figure: DW framework. Data sources (ERP, Legacy, POS, Other OLTP/Web, External data) feed an ETL process (Select, Extract, Transform, Integrate, Load) that populates the enterprise data warehouse, shown together with metadata. Replication links the warehouse to data marts (Marketing, Engineering, Finance, ...); a "no data marts" option is also shown. Access goes through API/middleware to applications (visualization): routine business reporting, data/text mining, OLAP, dashboard, Web, and custom built applications.]
Legacy Data
The National Register of Citizens and Electoral Rolls are collectively called Legacy Data.
More generally, legacy data is information stored in an old or obsolete format or computer system that is, therefore, difficult to access or process.
DW Architecture
3 Tier architecture
Data acquisition software (back-end) which extracts data from legacy systems and external sources, consolidates and summarizes them, and loads them into DW
DW itself that contains the data & software
Client (front-end) software that allows users to access and analyze data from DW
DW Architectures
Tier 1: Client workstation (BI engine)
Tier 2: Application server (DW itself)
Tier 3: Database server (data and the software for data acquisition)
DW Architectures
Tier 1: Client workstation (BI engine)
Tier 2: Application & database server (DW itself, and data and the software for data acquisition)
A Web-based DW Architecture
[Figure: Client (Web browser), Internet/Intranet/Extranet, Web server (Web pages), Application server, Data warehouse.]
An extranet is a website that allows controlled access to partners, vendors and suppliers or an authorized set of customers
Data Integration and the Extraction, Transformation, and Load (ETL) Process
Extraction, transformation, and load (ETL)
[Figure: The ETL process. Elements shown: packaged application, legacy system, other internal applications, transient data source; Extract, Transform, Cleanse, Load; data warehouse, data mart.]
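The extract, transform, cleanse, and load steps can be sketched as three small Python functions. The raw rows, the cleansing rule (skip rows whose amount cannot be parsed), and the `sales` table are assumptions for illustration; production ETL tools do far more.

```python
import sqlite3

# Hypothetical raw rows, as they might arrive from a legacy system.
RAW_ROWS = [
    {"id": "1", "region": " East ", "sales": "100.50"},
    {"id": "2", "region": "west", "sales": "n/a"},   # unparseable amount
    {"id": "3", "region": "WEST", "sales": "20"},
]


def extract():
    """Extract: read rows from the source (an in-memory list here)."""
    return list(RAW_ROWS)


def transform(rows):
    """Transform and cleanse: fix types and casing, drop unparseable rows."""
    clean = []
    for row in rows:
        try:
            sales = float(row["sales"])
        except ValueError:
            continue  # cleanse: skip rows whose amount cannot be parsed
        clean.append((int(row["id"]), row["region"].strip().lower(), sales))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(id INTEGER PRIMARY KEY, region TEXT, sales REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
loaded = warehouse.execute("SELECT * FROM sales ORDER BY id").fetchall()
```

Keeping the three stages as separate functions mirrors the separation in the ETL process: each stage can be replaced or tested without touching the others.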
DW Schema
Star Schema
Snowflake Schema
Galaxy Schema or Fact constellation Schema
Schema
Types of Schemas: Star Schema
Types of Schemas: Snowflake Schema
Types of Schemas: Galaxy Schema
Representation of Data in DW
Dimensional Modeling – a retrieval-based system that supports high-volume query access
Star schema – the most commonly used and the simplest style of dimensional modeling
Contains a fact table surrounded by and connected to several dimension tables
The fact table contains the numeric attributes (measures) needed to perform decision analysis and query reporting
Dimension tables contain classification and aggregation information about the values in the fact table
Snowflake schema – an extension of star schema where the diagram resembles a snowflake in shape
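A star schema can be sketched with Python's built-in sqlite3 module: one fact table holding numeric measures and foreign keys, surrounded by dimension tables. The table names (`fact_sales`, `dim_product`, `dim_date`), columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive, classifying attributes.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);

    -- Fact table: numeric measures plus one foreign key per dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        units      INTEGER,
        revenue    REAL
    );

    INSERT INTO dim_product VALUES (1, 'Road bike', 'Bikes'), (2, 'Helmet', 'Accessories');
    INSERT INTO dim_date VALUES (1, 2011, 'Q1'), (2, 2011, 'Q2');
    INSERT INTO fact_sales VALUES (1, 1, 3, 1500.0), (1, 2, 2, 1000.0), (2, 1, 10, 300.0);
""")

# A typical star-schema query: join facts to dimensions, then aggregate.
revenue_by_category = conn.execute("""
    SELECT p.category, d.quarter, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.quarter
    ORDER BY p.category, d.quarter
""").fetchall()
```

In a snowflake variant, `dim_product` would itself be split further, for example into a separate category table referenced by key.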
Multidimensionality
The ability to organize, present, and analyze data by several dimensions, such as sales by region, by product, by salesperson, and by time (four dimensions)
Multidimensional presentation
Dimensions: products, salespeople, market segments, business units, geographical locations, distribution channels, country, or industry
Measures: money, sales volume, head count, inventory profit, actual versus forecast
Time: daily, weekly, monthly, quarterly, or yearly
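To make the idea of analyzing one measure by several dimensions concrete, the sketch below totals a `sales` measure over any chosen combination of dimensions. The records and the `summarize` helper are hypothetical.

```python
from collections import defaultdict

# Illustrative records: four dimensions and one measure (sales).
RECORDS = [
    {"region": "east", "product": "bike", "salesperson": "ana", "month": "jan", "sales": 100},
    {"region": "east", "product": "helmet", "salesperson": "ana", "month": "jan", "sales": 30},
    {"region": "west", "product": "bike", "salesperson": "raj", "month": "feb", "sales": 70},
]


def summarize(records, dimensions, measure="sales"):
    """Total the measure for each combination of the chosen dimensions."""
    totals = defaultdict(int)
    for rec in records:
        key = tuple(rec[d] for d in dimensions)
        totals[key] += rec[measure]
    return dict(totals)


by_region = summarize(RECORDS, ["region"])
by_region_product = summarize(RECORDS, ["region", "product"])
```

The same records answer "sales by region" and "sales by region and product" simply by changing the list of dimensions passed in.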
Analysis of Data in DW
Online analytical processing (OLAP)
Data driven activities performed by end users to query the online system and to conduct analyses
Data cubes, drill-down / rollup, slice & dice, …
OLAP Activities
Generating queries (query tools)
Requesting ad hoc reports
Conducting statistical and other analyses
Developing multimedia-based applications
Analysis of Data Stored in DW
OLTP vs. OLAP
OLTP (online transaction processing)
A system that is primarily responsible for capturing and storing data related to day-to-day business functions such as ERP, CRM, SCM, and POS
The main focus is on efficiency of routine tasks
OLAP (online analytic processing)
A system that is designed to address the need for information extraction by providing effective and efficient ad hoc analysis of organizational data
The main focus is on effectiveness
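The contrast between the two workloads can be sketched with sqlite3: an OLTP-style function that captures one transaction, and an OLAP-style function that summarizes many rows. The `pos_sales` table and both function names are hypothetical, and in practice the two workloads would usually run on separate systems rather than one database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pos_sales (sale_id INTEGER PRIMARY KEY, "
             "store TEXT, sale_date TEXT, amount REAL)")


def record_sale(store, sale_date, amount):
    """OLTP-style work: capture one transaction quickly and commit it."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO pos_sales (store, sale_date, amount) "
                     "VALUES (?, ?, ?)", (store, sale_date, amount))


def monthly_sales_by_store():
    """OLAP-style work: an ad hoc summary that scans many rows."""
    return conn.execute(
        "SELECT store, substr(sale_date, 1, 7) AS month, SUM(amount) "
        "FROM pos_sales GROUP BY store, month ORDER BY store, month"
    ).fetchall()


record_sale("north", "2011-01-05", 20.0)
record_sale("north", "2011-01-19", 35.0)
record_sale("south", "2011-02-02", 12.5)
summary = monthly_sales_by_store()
```

The insert touches a single row and must be fast and reliable; the summary reads every row and is judged by how useful the answer is.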
OLAP Operations
Slice – a subset of a multidimensional array
Dice – a slice on more than two dimensions
Drill Down/Up – navigating among levels of data ranging from the most summarized (up) to the most detailed (down)
Roll Up – computing all of the data relationships for one or more dimensions
Pivot – used to change the dimensional orientation of a report or an ad hoc query-page display
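The operations listed above can be sketched on a tiny cube held in a Python dictionary. This is an illustrative simplification: the cube contents and function names are invented, `dice` is written here as a general sub-cube selection on any number of dimensions, and drill-down is obtained by calling `roll_up` with more dimensions kept.

```python
from collections import defaultdict

# Illustrative cube cells: (product, region, quarter) -> sales volume.
CUBE = {
    ("bike", "east", "Q1"): 10, ("bike", "east", "Q2"): 12,
    ("bike", "west", "Q1"): 7,  ("helmet", "east", "Q1"): 4,
    ("helmet", "west", "Q2"): 5,
}
DIMS = ("product", "region", "quarter")


def dice(cube, **allowed):
    """Keep cells whose value on each named dimension is in the allowed set."""
    return {k: v for k, v in cube.items()
            if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())}


def slice_(cube, dim, value):
    """Slice: fix a single dimension to a single value."""
    return dice(cube, **{dim: {value}})


def roll_up(cube, keep):
    """Roll up: aggregate away every dimension not listed in `keep`."""
    totals = defaultdict(int)
    for key, v in cube.items():
        totals[tuple(key[DIMS.index(d)] for d in keep)] += v
    return dict(totals)


def pivot(cube, row_dim, col_dim):
    """Pivot: lay out totals as a nested {row: {column: total}} table."""
    table = defaultdict(dict)
    for (r, c), v in roll_up(cube, (row_dim, col_dim)).items():
        table[r][c] = v
    return dict(table)
```

For example, `roll_up(CUBE, ("product",))` gives totals per product, `roll_up(CUBE, ("product", "region"))` drills down one level, and `pivot(CUBE, "region", "product")` swaps which dimension is shown as rows.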
Slicing Operations on a Simple Three-Dimensional Data Cube
[Figure: A 3-dimensional OLAP cube with slicing operations. Axes: Time, Product, Geography. Cells are filled with numbers representing sales volumes. The three slices show sales volumes of a specific Product on variable Time and Region, of a specific Region on variable Time and Products, and of a specific Time on variable Region and Products.]
OLAP operations—Roll-up
OLAP operations—Drill Down
OLAP operations—Slice
OLAP operations—Dice
OLAP operations—Pivot
Types of OLAP cubes
Variations of OLAP
Multidimensional OLAP (MOLAP)
OLAP implemented via a specialized multidimensional database (or data store) that summarizes transactions into multidimensional views ahead of time
Relational OLAP (ROLAP)
The implementation of an OLAP database on top of an existing relational database
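The difference between the two variations can be hinted at in a few lines of Python: a ROLAP-style query aggregates relational rows at query time, while a MOLAP-style store summarizes ahead of time so that a query becomes a lookup. This is a toy sketch under invented names and data, not how any particular OLAP product works.

```python
import sqlite3
from collections import defaultdict

ROWS = [("bike", "east", 10), ("bike", "west", 7), ("helmet", "east", 4)]

# ROLAP-style: keep the detail rows in a relational table and
# aggregate with SQL at query time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, units INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", ROWS)


def rolap_units(product):
    row = conn.execute("SELECT COALESCE(SUM(units), 0) FROM sales "
                       "WHERE product = ?", (product,)).fetchone()
    return row[0]


# MOLAP-style: summarize ahead of time into a multidimensional store,
# so that a query becomes a lookup.
precomputed = defaultdict(int)
for product, region, units in ROWS:
    precomputed[(product, region)] += units   # detail cell
    precomputed[(product, "ALL")] += units    # pre-aggregated over region


def molap_units(product):
    return precomputed.get((product, "ALL"), 0)
```

Both functions return the same totals; the trade-off is between doing the aggregation work at load time (MOLAP-style) or at query time (ROLAP-style).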
Key concepts
Dimension tables
Fact tables
Measures
Dimensions
Facts and measure
End of the Chapter

Questions, comments