Introduction to Data Warehousing
Data Warehousing and OLAP
Prof. 楊立偉
Department of Business Administration, National Taiwan University
Agenda
1. Introduction
2. Data Warehouse Theory
3. System Features
4. Demo
5. Discussions
1. Introduction
1.1 Introduction
• A data warehouse is a subject-oriented,
integrated, time-variant, nonvolatile
collection of data in support of
management decisions
1.1 Introduction (cont’d)
How are organizations using data warehouses?
1. Increasing customer focus, which includes the analysis of
customer buying patterns.
2. Repositioning products and managing product portfolios
by comparing the performance of sales by time or
regions, in order to fine-tune production strategies
3. Analyzing operations and looking for sources of profit
4. Managing the customer relationship, making
environmental corrections, and managing the cost of
corporate assets
1.2 Data Warehouse Characteristics
• It is a database designed for analytical tasks,
using data from multiple applications
• It supports a relatively small number of users
with relatively long interactions
• Its usage is read-intensive
• Its content is periodically updated
1.2 Data Warehouse Characteristics (cont’d)
• It contains current and historical data to
provide a historical perspective of information
• It contains a few large tables
• Each query frequently returns a large result
set and involves frequent full table scans and
multi-table joins
1.3 Data Warehousing
• The process of constructing and using data
warehouses
– Constructing the data warehouse: heterogeneous data sources → data cleaning → data integration and consolidation
– Using the data warehouse: interactive analysis → making strategic decisions
1.4 Three-tier System Architecture
• Tiers (top to bottom): OLAP tools, data warehouse server, operational DBMS
• Users: executives or decision-making staff; IT or data warehouse administrators
2. Data Warehouse Theory
2.1 Data Warehouse Theory
• Why not use the database directly?
– The query-driven approach is inefficient.
– It is potentially expensive for frequent queries.
• Use a data warehouse instead
– The update-driven approach is enough for
making strategic decisions.
– It keeps the operational DBMS separate for daily and
critical operations.
2.2 Data Cube
• A multidimensional, logical view of the data
• Concept hierarchy
– Multiple data granularity
– Data summarization
– Data generalization
• A 3-dimensional data cube
• Drill-down on time data for Q1
• Roll-up on address
• Adding a supplier dimension
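Roll-up and drill-down can be illustrated with a minimal sketch. The fact rows, field names, and the `aggregate` helper below are hypothetical and only serve to show how the same data is summarized at different levels of a concept hierarchy (city → country, month → quarter).

```python
from collections import defaultdict

# Hypothetical fact rows: (city, country, month, quarter, item, units_sold)
FIELDS = ("city", "country", "month", "quarter", "item", "units")
SALES = [
    ("Taipei", "Taiwan", "Jan", "Q1", "TV", 10),
    ("Taipei", "Taiwan", "Feb", "Q1", "TV", 20),
    ("Taipei", "Taiwan", "Apr", "Q2", "PC", 5),
    ("Tokyo", "Japan", "Jan", "Q1", "PC", 7),
    ("Osaka", "Japan", "Mar", "Q1", "TV", 3),
]


def aggregate(rows, group_by):
    """Sum units sold over the given grouping fields."""
    idx = [FIELDS.index(f) for f in group_by]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


# Starting view at (city, quarter, item) granularity.
by_city = aggregate(SALES, ("city", "quarter", "item"))

# Roll-up on address: climb the hierarchy from city to country.
by_country = aggregate(SALES, ("country", "quarter", "item"))

# Drill-down on time for Q1: descend from quarter to month.
q1_rows = [r for r in SALES if r[FIELDS.index("quarter")] == "Q1"]
q1_by_month = aggregate(q1_rows, ("city", "month", "item"))
```

Roll-up replaces a fine-grained field with a coarser one, and drill-down does the reverse, so the grand total stays the same in both views.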
2.3 Efficient Data Cube Computation
• The challenge: 2^N combinations (one group-by per subset of N dimensions)
– Concept hierarchies and aggregations
make it more complicated!
• Materialization of the data cube (how to implement it)
– Materialize every, none, or some?
– Algorithms for selection
• Based on size
• Based on sharing
• Based on access frequency
• Lattice of group-bys for the dimensions Address, Time, Item
– ALL
– Address; Time; Item
– Address, Time; Address, Item; Time, Item
– Address, Time, Item
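The lattice above can be enumerated directly: N dimensions give 2^N subsets, each of which is one group-by. The following is a minimal sketch that materializes every group-by of a tiny, hypothetical fact table. The data and the helper names (`all_group_bys`, `materialize`) are assumptions for illustration, not part of the system described in these slides.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("address", "time", "item")

# Hypothetical fact rows: (address, time, item, units_sold)
FACTS = [
    ("Taipei", "Q1", "TV", 30),
    ("Taipei", "Q2", "PC", 5),
    ("Tokyo", "Q1", "PC", 7),
    ("Tokyo", "Q1", "TV", 3),
]


def all_group_bys(dimensions):
    """Yield every subset of the dimensions, from ALL (empty) to the full set."""
    for k in range(len(dimensions) + 1):
        yield from combinations(dimensions, k)


def materialize(facts, group_by):
    """Compute one group-by: total units per combination of dimension values."""
    idx = [DIMENSIONS.index(d) for d in group_by]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


# Full materialization: one aggregated table per node of the lattice.
cube = {gb: materialize(FACTS, gb) for gb in all_group_bys(DIMENSIONS)}

# Number of cells per group-by, the kind of figure a size-based
# selection of what to materialize could start from.
sizes = {gb: len(cells) for gb, cells in cube.items()}
```

Materializing everything is only feasible for small N. With partial materialization, a group-by that is not stored can still be computed from a stored one that contains all of its dimensions.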
2.4 On-Line Analytical Processing (OLAP)
• Fast on-line processing of data cubes or
multi-dimensional databases
• OLAP operations:
– Drilling
– Pivoting
– Slicing and dicing
– Filtering, etc.
2.4 On-Line Analytical Processing (Cont’d)
• A multidimensional, logical view of the data.
• Interactive analysis of the data (drill, pivot, slice and dice,
filter) and quick response to OLAP queries.
• Summarization and aggregations at every dimension
intersection.
• Retrieval and display of data in 2-D or 3-D cross-tabs,
charts, and graphs, with easy pivoting of the axes.
• Analytical modeling: deriving ratios, variance, etc. and
involving data across many dimensions.
• Forecasting, trend analysis, and statistical analysis.
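As a minimal sketch of these operations, the code below applies slicing, dicing, and pivoting to a small set of cube cells. The cell values and the function names (`slice_cube`, `dice_cube`, `crosstab`) are hypothetical and do not come from any particular OLAP product.

```python
from collections import defaultdict

DIMS = ("address", "time", "item")

# Hypothetical cube cells: (address, time, item) -> units sold
CELLS = {
    ("Taipei", "Q1", "TV"): 30,
    ("Taipei", "Q2", "PC"): 5,
    ("Tokyo", "Q1", "PC"): 7,
    ("Tokyo", "Q2", "TV"): 4,
}


def slice_cube(cells, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cells.items() if k[i] == value}


def dice_cube(cells, allowed):
    """Dice: keep cells whose coordinates lie in the allowed set per dimension."""
    return {
        k: v
        for k, v in cells.items()
        if all(k[DIMS.index(d)] in values for d, values in allowed.items())
    }


def crosstab(cells, row_dim, col_dim):
    """Build a 2-D cross-tab, summing over the dimension(s) not shown."""
    r, c = DIMS.index(row_dim), DIMS.index(col_dim)
    table = defaultdict(lambda: defaultdict(int))
    for k, v in cells.items():
        table[k[r]][k[c]] += v
    return {row: dict(cols) for row, cols in table.items()}


q1 = slice_cube(CELLS, "time", "Q1")
taipei_cells = dice_cube(CELLS, {"address": {"Taipei"}, "item": {"TV", "PC"}})
by_addr_time = crosstab(CELLS, "address", "time")
by_time_addr = crosstab(CELLS, "time", "address")  # pivot: swap the two axes
```

Pivoting does not change any value. It only exchanges which dimension is shown as rows and which as columns.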
3. System Features
3.1 Data sources supported
• ODBC-compatible DBMS
– Oracle, Microsoft SQL Server, MySQL, IBM DB2, etc.
• Files
– MS Access, MS Excel, etc.
– Text files (CSV-format)
3.2 Data Cleansing
• Database schema translation
– Field selection and mapping
– Field re-naming
– Field aggregating and deriving
• Data filtering
• Data value conversion
– Data value mapping
– Data value function
– Date value conversion and decomposition
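The cleansing steps listed above can be sketched as one small transformation. All field names, mappings, and sample rows below are hypothetical. The sketch only shows one possible order of the steps: field selection and renaming, value mapping, value conversion, date decomposition, and filtering.

```python
from datetime import datetime

# Hypothetical rows as exported from an operational system.
RAW_ROWS = [
    {"cust_nm": "Alice", "sex": "F", "amt": "120.50",
     "order_dt": "2004-03-15", "memo": "x"},
    {"cust_nm": "Bob", "sex": "M", "amt": "-5",
     "order_dt": "2004-07-01", "memo": "y"},
]

# Field selection, mapping, and renaming: unmapped fields ("memo") are dropped.
FIELD_MAP = {
    "cust_nm": "customer",
    "sex": "gender",
    "amt": "amount",
    "order_dt": "order_date",
}

# Data value mapping: source codes -> warehouse labels.
GENDER_MAP = {"F": "Female", "M": "Male"}


def clean(row):
    """Translate one source row into the (assumed) warehouse schema."""
    out = {new: row[old] for old, new in FIELD_MAP.items()}
    out["gender"] = GENDER_MAP.get(out["gender"], "Unknown")
    out["amount"] = float(out["amount"])  # data value function: text -> number
    # Date conversion and decomposition into year / quarter / month.
    date = datetime.strptime(out.pop("order_date"), "%Y-%m-%d")
    out["year"] = date.year
    out["quarter"] = (date.month - 1) // 3 + 1
    out["month"] = date.month
    return out


# Data filtering: keep only rows with a positive amount.
cleaned = [row for row in map(clean, RAW_ROWS) if row["amount"] > 0]
```

Decomposing the date into year, quarter, and month at load time is what later allows a time dimension with a concept hierarchy to be built without reparsing dates.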
3.3 Building of Data Cube
• Support for multidimensional data
• Support for concept hierarchies
3.5 Interactive Front-end Tools
• User-defined multiple dimensions
• User-defined dimension hierarchy
• User-defined data granularity
• Real-time graph capabilities
– Bar chart
– Pie chart
– Line chart
3.6 Other features
• Web-based OLAP GUI
– Easy to access from Internet
• Easy to integrate with other systems
– Import / Export capability
4. Demo
5. Discussions
5.1 Roadmap
• Integration with data mining
– Major group / sales analysis
– Prospect analysis and forecast
– Association of customers and sales
– Market segment recommendation
– Other business intelligence applications
• Integration with e-marketing
– 1-to-1 personalization and recommendation
– Target marketing
– Loyalty program