09. Data Warehouse (DW) &
On-line Analytic Processing (OLAP)
Rev: Feb, 2013
Euiho (David) Suh, Ph.D.
POSTECH Strategic Management of Information and Technology Laboratory
(POSMIT: http://posmit.postech.ac.kr)
Dept. of Industrial & Management Engineering
POSTECH
Contents
1. Data Warehouse
   1) Introduction of Data Warehouse
   2) Concepts for Data Warehouse
   3) Difficulties and Trends
2. On-line Analytic Processing (OLAP)
   1) Introduction of OLAP
   2) Concepts for OLAP
3. Case Study
1. Data Warehouse
Definition of Data Warehouse
1) Introduction of Data Warehouse
■ Data Warehouse
– A data warehouse is an integrated, non-volatile, time-variant collection of data in support of management’s decisions
– Stores static data that has been extracted from other databases in an organization
– Central source of data that has been cleaned, transformed, and cataloged
– Data is used for data mining, analytical processing, analysis, research, and decision support
[Figure: Scattered Information (Customer, Cost, Bond, Sales, HR, Finance) → Cleaned Data Warehouse → Query & Distribute to End User]
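The flow in the figure (scattered information, a cleaned warehouse, queries by end users) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real ETL tool: the two source systems, the field names, and the cleaning rule are all hypothetical.

```python
from datetime import date

# Hypothetical extracts from two operational systems (names and fields are assumptions).
sales_rows = [
    {"customer": " alice ", "amount": "100", "day": "2013-02-01"},
    {"customer": "BOB", "amount": "50", "day": "2013-02-01"},
]
finance_rows = [
    {"customer": "Alice", "amount": "25.5", "day": "2013-02-02"},
]


def clean(row, source):
    """Transform one raw row into the warehouse's common format."""
    return {
        "source": source,                             # lineage: which system produced the row
        "customer": row["customer"].strip().title(),  # integrated: one naming convention
        "amount": float(row["amount"]),
        "day": date.fromisoformat(row["day"]),        # time variant: every row carries a date
    }


def load(warehouse, rows, source):
    """Append cleaned rows; rows already loaded are never updated (non-volatile)."""
    warehouse.extend(clean(r, source) for r in rows)


def total_by_customer(warehouse):
    """A typical end-user query: total amount per customer across all sources."""
    totals = {}
    for row in warehouse:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals


warehouse = []
load(warehouse, sales_rows, "sales")
load(warehouse, finance_rows, "finance")
print(total_by_customer(warehouse))
```

Because cleaning maps " alice " and "Alice" to the same customer name, the query sees one customer even though the two source systems spelled it differently.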
Data Warehouse Architecture
■ Data Warehouse architecture
[Figure: Data warehouse architecture in two stages.
 * Building the Data Warehouse (infra, data integration and administration): source data (OLTP system, RDB, external file, back-up file) is loaded via SQL into the Data Warehouse (enterprise server) and Data Marts (workgroup server); an MDB is also shown.
 * Use of Data Warehouse (application development, data access & use): query/reporting tools, OLAP tools (slice/dice), EIS/DSS applications, data mining applications, and web browsers access the data via SQL.]
■ Technical architecture for a data warehousing system
[Figure: Components: Data Acquisition, Data Manager, Information Directory, Design, Data Delivery, Middleware, Data Access, and Management. Data stores: source data, external data, warehouse data, warehouse metadata, external metadata.]
Introduction of Database
2) Concepts for Data Warehouse
■ Definition of database
– Integrated collection of logically related data elements
■ Common Database Structures (Types)
– Hierarchical
  • Early DBMS structure
  • Records arranged in tree-like structure
  • Relationships are one-to-many
– Network
  • Used in some mainframe DBMS packages
  • Many-to-many relationships
– Relational
  • Most widely used structure
  • Data elements are stored in tables
  • Row represents a record; column is a field
  • Can relate data in one file with data in another, if both files share a common data element
– Multidimensional
  • Variation of relational model
  • Uses multidimensional structures to organize data
  • Data elements are viewed as being in cubes
  • Popular for analytical databases that support Online Analytical Processing (OLAP)
– Object-Oriented
  • Store data together with the appropriate methods for accessing it, i.e. encapsulation
  • Information is represented in the form of objects as used in object-oriented programming
[Figures: Relational Structure; Object-Oriented Structure]
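The relational idea of relating data in one file with data in another through a shared data element can be sketched with Python's built-in `sqlite3` module. The `customer` and `orders` tables and their columns are hypothetical; `customer_id` plays the role of the common data element.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders   (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Kim'), (2, 'Lee');
    INSERT INTO orders   VALUES (10, 1, 30.0), (11, 1, 20.0), (12, 2, 5.0);
""")

# Each row is a record and each column is a field; the join relates the two
# tables through the data element they share (customer_id).
rows = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customer AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)
```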
Metadata and Data Marts
■ Metadata
– Data about data (similar to a catalog card in a library)
– Defines the data in the data warehouse
– Makes it easier and faster to find data in the data warehouse
■ Data Marts
– Collection of databases
– Compared with a data warehouse, data marts are usually smaller and focus on a particular subject or department.
– Data marts are subsets of a larger data warehouse
■ Data Warehouse vs. Data Mart
– Data in Data Warehouse
  • The data needs to be gathered from all the relevant transactional systems that produce it, cleansed and validated, and made available from a system-of-record that ensures the referential integrity of the data
– Data in Data Mart
  • The data needs to be presented in a structure that is intuitive to the users and facilitates their ability to query the data that is relevant to their needs
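As a rough sketch of these two concepts, the snippet below treats metadata as a small catalog describing each warehouse column, and builds a data mart as a department-specific subset of the warehouse. The rows, column names, and the `build_data_mart` helper are hypothetical and only illustrate the relationship.

```python
# Hypothetical warehouse rows (assumed schema: dept, region, amount).
warehouse = [
    {"dept": "Sales", "region": "East", "amount": 120.0},
    {"dept": "Sales", "region": "West", "amount": 80.0},
    {"dept": "HR", "region": "East", "amount": 40.0},
]

# Metadata: data about the data, used here to look up and validate columns.
metadata = {
    "dept": "Department that produced the record",
    "region": "Region the record belongs to",
    "amount": "Monetary amount of the record",
}


def build_data_mart(rows, dept, columns):
    """Return the subset of the warehouse for one department, keeping only `columns`."""
    unknown = [c for c in columns if c not in metadata]
    if unknown:
        raise KeyError(f"columns not described in metadata: {unknown}")
    return [{c: row[c] for c in columns} for row in rows if row["dept"] == dept]


sales_mart = build_data_mart(warehouse, "Sales", ["region", "amount"])
print(sales_mart)
```

The mart holds fewer rows and fewer columns than the warehouse, which matches the point that a data mart is smaller and shaped for one group of users.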
Information Flow
■ Data Warehouse built on top of DB
[Figure: information flow from the databases through the data warehouse to the data marts (Finance, Management Reporting, Accounting, Sales, Marketing)]
■ Data Warehouse Components
■ Applications and Data Marts
Difficulties in implementing DW
3) Difficulties and Trends
■ Complete Alignment
– Make sure you have full involvement and buy-in from those who represent your users, the consumers of your data warehouse.
■ Iterative & Frequent Update
– Consider all aspects of the process: researching your data sources, capturing and transmitting that data to the data warehouse, transforming and loading it into the data warehouse, and accounting for its lineage.
■ Risk
– Make sure you develop a proper risk management plan.
Future Trends
■ Enterprise Data Warehouse
– The enterprise data warehouse, whether a single store or integrated data marts across a variety of platforms, yields a view of the operation previously unattainable
  by Don Hatcher, SAS
■ Real-time
– Organizations move to more real-time data transformation and seek to better leverage common metadata across applications
  by Allan Houpt, CA
■ Capacity
– The future of data warehousing is all about ever larger data warehouses - in fact I just read about a U.S. Government effort to create petabyte repositories
  by Roman Bukary, SAP Director of Market Strategy
2. OLAP
Definition of OLAP
1) Introduction of OLAP
■ OLAP (On-Line Analytical Processing)
– The dynamic enterprise analysis required to create, manipulate, animate and synthesize information from Enterprise Data Models
  * "Providing OLAP: An IT Mandate", E.F. Codd (1993)
– FASMI (Fast Analysis of Shared Multidimensional Information)
  • This definition was first used in early 1995, and has not needed revision since
  * Pendse & Creeth (1995)
  • FAST, ANALYSIS, SHARED, MULTIDIMENSIONAL, INFORMATION
■ OLAP Architecture
From OLTP to OLAP
2) Concepts for OLAP
■ Data used in OLAP
– Sales data of June? (OLTP)
– Multi-dimensional data (having many features) (OLAP)
■ Direct Access: EUC (End-User Computing) Environment
[Figure: Information Source, Information Broker, Information Consumer]
■ From What to Why
– OLTP: Storing primitive data, supporting routine business operations (What)
– OLAP: Storing cumulative data, supporting business goals (Why)
■ OLTP vs. OLAP
                | OLTP                           | OLAP
Definition      | On-Line Transaction Processing | On-Line Analytical Processing
Objective       | Operational                    | Analytical
Focus           | Daily repetitious work         | Decision support in organization
Developer       | Computer expert                | End-user
User            | Simple operator                | Special analyst
Storing         | Current value                  | Summarized and Consolidated data
Use             | Repetitive                     | Unstructured
Response        | Immediate                      | Delayed
Data            | Updated                        | Summarized
Update          | Field                          | Recomputation
Amount of Data  | Small                          | Much
Data Structure  | Complex                        | Simple
Database        | RDB                            | MDB
Data period     | Past, Current                  | Past, Current, Future
Query type      | Regular                        | Irregular, Analytical
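Two rows of this comparison (field-level updates of current values versus summarized, analytical queries) can be illustrated with `sqlite3`. The `sales` table and its contents are hypothetical, and a single in-memory database stands in for both kinds of system purely for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (id INTEGER PRIMARY KEY, month TEXT, product TEXT, qty INTEGER);
    INSERT INTO sales (month, product, qty) VALUES
        ('2013-05', 'A', 3),
        ('2013-06', 'A', 4),
        ('2013-06', 'B', 2),
        ('2013-06', 'B', 1);
""")

# OLTP-style work: a routine, repetitive operation that updates one field of
# one record and reads back its current value.
conn.execute("UPDATE sales SET qty = 5 WHERE id = 2")
current_qty = conn.execute("SELECT qty FROM sales WHERE id = 2").fetchone()[0]

# OLAP-style work: an analytical query that summarizes many records along
# two dimensions (month and product).
summary = conn.execute("""
    SELECT month, product, SUM(qty)
    FROM sales
    GROUP BY month, product
    ORDER BY month, product
""").fetchall()

print(current_qty)
print(summary)
```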
■ OLTP/OLAP Enterprise IT Architecture
■ Data Warehouse vs. OLAP Server
                   | Data Warehouse                  | OLAP Server
Objective          | Ready for all kinds of retrieval | Specialized retrieval
Characteristics    | Data Storage                    | Computation Engine
Query Type         | Read only                       | Read/Write
Response           | Flexible                        | Consistent, rapid
Content            | Historical, present             | Historical, present, future
Data Structure     | Plain                           | Multi-dimensional
Amount of Data     | Huge, much detail               | Much, detail
Development period | A few months to years           | A few weeks to months
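Two types of OLAP
■ MOLAP (Multidimensional OLAP)
[Figure: clients send queries to, and receive responses from, an MD processing layer backed by an MDBMS]
■ ROLAP (Relational OLAP)
[Figure: clients send queries to, and receive responses from, an MD processing layer that accesses an RDBMS via SQL]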
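The difference between the two approaches can be sketched as follows. In the ROLAP-style function, a multidimensional request is translated into SQL against a relational fact table at query time; in the MOLAP-style function, every aggregate is precomputed into a cube and the query is a lookup. The fact data, the names `rolap_query` and `molap_query`, and the use of "*" as an "all values" marker are assumptions made for illustration, not the design of any particular product.

```python
import sqlite3
from collections import defaultdict

DIMS = ("month", "region", "product")
FACTS = [  # hypothetical fact rows: (month, region, product, qty)
    ("2013-01", "East", "A", 10),
    ("2013-01", "West", "A", 5),
    ("2013-02", "East", "B", 7),
    ("2013-02", "East", "A", 3),
]


def ordered(dims):
    """Validate the requested dimensions and return them in DIMS order."""
    if not dims or set(dims) - set(DIMS):
        raise ValueError(f"expected a non-empty subset of {DIMS}")
    return [d for d in DIMS if d in dims]


# ROLAP-style: facts live in a relational table; aggregation happens in SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact (month TEXT, region TEXT, product TEXT, qty INTEGER)")
conn.executemany("INSERT INTO fact VALUES (?, ?, ?, ?)", FACTS)


def rolap_query(dims):
    cols = ", ".join(ordered(dims))  # safe to interpolate: names were validated above
    sql = f"SELECT {cols}, SUM(qty) FROM fact GROUP BY {cols}"
    return {tuple(row[:-1]): row[-1] for row in conn.execute(sql)}


# MOLAP-style: precompute every aggregate once; "*" means "all values"
# (assumes no real dimension value is the string "*").
def build_cube(facts):
    cube = defaultdict(int)
    for *coords, qty in facts:
        for mask in range(2 ** len(DIMS)):
            key = tuple(c if (mask >> i) & 1 else "*" for i, c in enumerate(coords))
            cube[key] += qty
    return cube


CUBE = build_cube(FACTS)


def molap_query(dims):
    wanted = ordered(dims)
    keep = [d in wanted for d in DIMS]
    return {
        tuple(k for k in key if k != "*"): total
        for key, total in CUBE.items()
        if all((k != "*") == flag for k, flag in zip(key, keep))
    }


print(rolap_query(["region"]))
print(molap_query(["region"]))
```

Both functions return the same totals; the trade-off this sketch hints at is that the cube answers by lookup but must be rebuilt when the facts change, while the SQL version recomputes on every request.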
From RDB to MDB
■ Basic Data Structure of MDB & RDB
– RDB (Table: field, record, row, column): OLTP, Data Warehouse
– MDB (Cube: dimension, hierarchy): OLAP
■ RDB as OLAP Server
– Cannot handle and represent multi-dimensional relationships well
– Cannot summarize data well
■ MDB as OLAP Server
– Gives many managerial viewpoints
– EUC
– Supports analysis functionality
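To make "cube", "dimension", and "hierarchy" concrete, here is a small sketch of a cube stored as a dictionary keyed by dimension values, with slice, dice, and roll-up operations that each give a different viewpoint on the same data. The cell values, the function names, and the month-to-quarter hierarchy are hypothetical choices for illustration.

```python
from collections import defaultdict

DIMS = ("month", "region", "product")

# Hypothetical cube: one cell per (month, region, product) combination.
cube = {
    ("2013-01", "East", "A"): 10,
    ("2013-02", "East", "A"): 4,
    ("2013-04", "East", "B"): 6,
    ("2013-04", "West", "A"): 8,
}


def slice_cube(cells, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cells.items() if k[i] == value}


def dice(cells, **allowed):
    """Dice: keep cells whose values fall in the allowed set for each named dimension."""
    wanted = {DIMS.index(d): set(values) for d, values in allowed.items()}
    return {k: v for k, v in cells.items()
            if all(k[i] in values for i, values in wanted.items())}


def month_to_quarter(month):
    """One level of a time hierarchy: 'YYYY-MM' -> 'YYYY-Qn'."""
    year, mm = month.split("-")
    return f"{year}-Q{(int(mm) - 1) // 3 + 1}"


def roll_up_to_quarter(cells):
    """Roll-up: aggregate along the time hierarchy from month to quarter."""
    out = defaultdict(int)
    for (month, region, product), value in cells.items():
        out[(month_to_quarter(month), region, product)] += value
    return dict(out)


print(slice_cube(cube, "region", "West"))
print(dice(cube, region=["East"], product=["A"]))
print(roll_up_to_quarter(cube))
```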
Reference
■ Euiho Suh, “EIS_DSS_OLAP_DW (PPT Slide)”, POSMIT Lab. (POSTECH Strategic Management of Information and Technology Laboratory)
■ O’Brien & Marakas, “Introduction to Information Systems – Sixteenth Edition”, McGraw-Hill, Chapter 5