Chapter 12
Databases for Online Analytical Processing
Class 09: Chapter 12
OLTP vs OLAP
• Operational Database: a database designed to support the day-to-day transactions of an organization
• Data Warehouse: a database specifically designed for analysis; historical data is periodically trimmed from the operational database and moved into it
– Term coined by Bill Inmon in the early 1980s
– Significant contributions by Ralph Kimball and others
OLTP vs OLAP
• Online Transaction Processing (OLTP):
– High transaction volume
– Each transaction uses relatively little data
– Day-to-day activities; current data
• Online Analytical Processing (OLAP):
– Relatively few transactions
– Each transaction uses large amounts of data
– Historical data; analysis and decision-making
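The contrast between the two workloads can be seen in a small sketch. It assumes Python's built-in `sqlite3` module and a hypothetical `orders` table; it is not tied to any particular OLTP or OLAP product. The OLTP-style transaction touches a single row located by its key, while the OLAP-style query scans the whole history and aggregates it.

```python
import sqlite3

# Hypothetical orders table, used only to contrast the two workloads.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, "
    "order_year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (region, order_year, amount) VALUES (?, ?, ?)",
    [("East", 2022, 100.0), ("West", 2022, 250.0),
     ("East", 2023, 175.0), ("West", 2023, 300.0)],
)

# OLTP-style transaction: small, touches one current row found by its key.
with conn:
    conn.execute("UPDATE orders SET amount = amount + 25 WHERE order_id = ?", (1,))

# OLAP-style query: reads every row and aggregates historical data by region.
totals = dict(conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall())
print(totals)  # {'East': 300.0, 'West': 550.0}
```

In a real system the OLTP side would run many such short transactions concurrently, while the OLAP side would run relatively few, much larger queries.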
Data Warehouses
• Data Warehouse: A subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision-making process
– Subject-oriented: organized around the major subjects of the enterprise
– Integrated: combined from multiple operational sources
– Time-variant: accurate only across a known time period
– Non-volatile: not updated in real time; new data is added periodically (as often as needed)
Benefits of Data Warehousing
• Potential high returns on investment
• Competitive advantage
• Increased productivity of corporate decision-makers
Challenges of Data Warehousing
• Underestimation of resources for data loading
• Hidden integrity problems in source data
• Required data not captured
• Ever-increasing end-user demands
• Consolidating data from disparate sources
• High demand for resources
• Data ownership
• Difficulty in determining requirements
• “Big Bang” projects (complex, large scope)
DW DBMS Requirements
• Load performance
• Load processing
• Data quality management
• Query performance
• Terabyte/Petabyte scalability
• Networked or cloud data warehouse
• Warehouse administration
• Integrated dimensional analysis
• Advanced query and analytics capability
Data Warehouse Metadata
• Its primary purpose is to show the pathway back to where the data began
• However, it has other functions that relate to data transformation, loading, DW management and query generation
• A major integration issue is how to synchronize the various types of metadata across multiple products:
– Passing metadata from tool to tool
– Using a metadata repository
Administration and Management Tools
• Monitoring data loading
• Data quality and integrity checks
• Managing and updating metadata
• Monitoring database performance
• Auditing the data warehouse
• Replicating, subsetting and distributing data
• Maintaining efficient data storage management
• Archiving and backing up data
• Purging data
• Implementing recovery after failure
• Security management
Comparison of OLTP Systems and Data Warehouses
DW Architecture: Summary Tables
Star Schema Architecture
Star Schema Variants
• Snowflake Schema: a variant of the star schema where each dimension can have its own dimensions
• Starflake (Hybrid) Schema: a hybrid structure that contains a mixture of (denormalized) star and (normalized) snowflake schemas
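The difference between a snowflaked and a collapsed (star) dimension can be sketched with Python's built-in `sqlite3` module. The `category_dim`, `product_dim`, `product_dim_star` and `sales_fact` tables are hypothetical. In the snowflake form the product dimension has its own category dimension, so a query by category needs an extra join; the star form denormalizes the category name into the product dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflake: the product dimension has its own (normalized) category dimension.
CREATE TABLE category_dim (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE product_dim  (product_key INTEGER PRIMARY KEY, product_name TEXT,
                           category_key INTEGER REFERENCES category_dim);
CREATE TABLE sales_fact   (product_key INTEGER REFERENCES product_dim, amount REAL);

INSERT INTO category_dim VALUES (1, 'Hardware'), (2, 'Software');
INSERT INTO product_dim  VALUES (10, 'Laptop', 1), (11, 'Monitor', 1), (12, 'Editor', 2);
INSERT INTO sales_fact   VALUES (10, 1200), (11, 300), (12, 80), (10, 1100);

-- Star: the same dimension collapsed (denormalized) into a single table.
CREATE TABLE product_dim_star AS
SELECT p.product_key, p.product_name, c.category_name
FROM product_dim p JOIN category_dim c USING (category_key);
""")

# Snowflake query: fact -> product -> category (two joins).
snowflake = conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM sales_fact f
    JOIN product_dim p  ON f.product_key = p.product_key
    JOIN category_dim c ON p.category_key = c.category_key
    GROUP BY c.category_name ORDER BY c.category_name
""").fetchall()

# Star query: fact -> product (one join); category is already in the dimension.
star = conn.execute("""
    SELECT d.category_name, SUM(f.amount)
    FROM sales_fact f JOIN product_dim_star d ON f.product_key = d.product_key
    GROUP BY d.category_name ORDER BY d.category_name
""").fetchall()

print(snowflake)  # [('Hardware', 2600.0), ('Software', 80.0)]
print(star)       # same result with one fewer join
```

Both forms return the same answer; the trade-off is storage and update anomalies in the star form versus extra joins in the snowflake form. A starflake design would mix the two choices dimension by dimension.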
Multi-Dimensional OLAP
• Uses multidimensional structures to store data and the relationships between data
• Best visualized as cubes of data, with cubes within cubes
• Each side of the cube is a dimension
• Support for analytical operations:
– Consolidation (aggregation of data)
– Drill-down (reverse of aggregation)
– Slicing and dicing (pivoting): look at data from different viewpoints
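The cube operations above can be illustrated with a toy in-memory cube. This is a minimal pure-Python sketch under stated assumptions: the dimensions (`product`, `region`, `quarter`) and sales figures are hypothetical, and a real MOLAP server stores cells in dedicated multidimensional arrays rather than a dictionary. `consolidate` rolls the cube up onto chosen dimensions, consolidating at a finer grain plays the role of drill-down, and `slice_cube` fixes one dimension to a single member.

```python
from collections import defaultdict

# Hypothetical dimensions and cells: (product, region, quarter) -> units sold.
DIMS = ("product", "region", "quarter")
cube = {
    ("Laptop", "East", "Q1"): 10, ("Laptop", "East", "Q2"): 12,
    ("Laptop", "West", "Q1"): 7,  ("Laptop", "West", "Q2"): 9,
    ("Phone",  "East", "Q1"): 20, ("Phone",  "East", "Q2"): 18,
    ("Phone",  "West", "Q1"): 15, ("Phone",  "West", "Q2"): 16,
}

def consolidate(cells, keep):
    """Aggregate (roll up) the cube onto the dimensions named in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for coords, value in cells.items():
        out[tuple(coords[i] for i in idx)] += value
    return dict(out)

def slice_cube(cells, dim, member):
    """Fix one dimension to a single member, giving a slice of the cube."""
    i = DIMS.index(dim)
    return {c: v for c, v in cells.items() if c[i] == member}

by_region = consolidate(cube, ["region"])                     # consolidation
by_region_quarter = consolidate(cube, ["region", "quarter"])  # drill-down to finer grain
east = slice_cube(cube, "region", "East")                     # slice on region = East
east_by_product = consolidate(east, ["product"])              # the slice from another viewpoint

print(by_region)        # {('East',): 60, ('West',): 47}
print(east_by_product)  # {('Laptop',): 22, ('Phone',): 38}
```

Every consolidation preserves the grand total, which is a useful sanity check when moving up and down a hierarchy.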
Data Marts
• Data Mart: a subset of a data warehouse that supports the requirements of a particular department or business function
– Limited scope
– Not intended for operational reporting
– Much less information than a data warehouse
Reasons for Creating a Data Mart
• Data tailored to department or function
• Lower cost than a full DW
• Lower-risk project than a full DW
• Limited number of end-user analysis tools (usually one)
• Database placed physically near the department, reducing network delays
Data Mart Issues
• Functionality
• Size
• Load performance
• User access to multiple data marts
• Administration
• Expansion and growth (may require reloads)
Data Mart Approaches
• Build enterprise DW to populate data marts
– Data marts won’t be done if the DW project stalls
• Build several data marts and integrate later
– Generally lower risk
– Data marts may produce inconsistent results
– Overall cost may be higher due to integration
• Build DW and data marts simultaneously
– Practically guarantees a never-ending project from hell
Designing Data Warehouses
• Must understand how data will be used
• Star Schema: a logical structure that has a fact table in the center surrounded by dimension tables (reference data)
– Must identify the core transactions in the business
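A star schema can be sketched in a few lines using Python's built-in `sqlite3` module. The table and column names (`sales_fact`, `date_dim`, `store_dim`, `product_dim`) are hypothetical. The fact table records the core business transaction (a sale) as foreign keys plus numeric measures, and the surrounding dimension tables hold the descriptive reference data used to constrain and group queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive reference data.
CREATE TABLE date_dim    (date_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, name TEXT);

-- Fact table at the center: one row per sale, with a foreign key to
-- each dimension plus the numeric measures being analyzed.
CREATE TABLE sales_fact (
    date_key    INTEGER REFERENCES date_dim,
    store_key   INTEGER REFERENCES store_dim,
    product_key INTEGER REFERENCES product_dim,
    quantity    INTEGER,
    revenue     REAL
);

INSERT INTO date_dim    VALUES (1, 2023, 'Q1'), (2, 2023, 'Q2');
INSERT INTO store_dim   VALUES (1, 'Boston'), (2, 'Denver');
INSERT INTO product_dim VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO sales_fact  VALUES
    (1, 1, 1, 3, 30.0), (1, 2, 2, 1, 25.0),
    (2, 1, 2, 2, 50.0), (2, 2, 1, 5, 50.0);
""")

# Typical star join: group by dimension attributes, sum the fact measures.
rows = conn.execute("""
    SELECT d.quarter, s.city, SUM(f.revenue) AS revenue
    FROM sales_fact f
    JOIN date_dim d  ON f.date_key = d.date_key
    JOIN store_dim s ON f.store_key = s.store_key
    GROUP BY d.quarter, s.city
    ORDER BY d.quarter, s.city
""").fetchall()
for row in rows:
    print(row)  # e.g. ('Q1', 'Boston', 30.0)
```

Every query follows the same shape: join the fact table to whichever dimensions the question mentions, then aggregate the measures.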
Factors Influencing Fact Table Design
• Required time period
• Statistical samples vs. detailed data
• Columns to omit
• Column size reduction
• Intelligent vs. dumb keys
• Optimal approach to account for time
• Partitioning of fact table
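Two of these factors, intelligent vs. dumb keys and accounting for time, often meet in a date dimension. The sketch below is one common approach, not the only one: it assumes a hypothetical `build_date_dimension` helper that emits one row per calendar day, with a meaningless sequential surrogate ("dumb") key alongside an "intelligent" `YYYYMMDD` key that encodes the date itself. Fact rows would then carry the date key instead of raw dates.

```python
from datetime import date, timedelta

def build_date_dimension(start, end):
    """Return one row per calendar day from start to end (inclusive)."""
    rows = []
    surrogate = 1                                     # "dumb" key: meaningless sequence number
    day = start
    while day <= end:
        rows.append({
            "date_key": surrogate,
            "smart_key": int(day.strftime("%Y%m%d")),  # "intelligent" key encodes the date
            "full_date": day.isoformat(),
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,
            "month": day.month,
            "day_of_week": day.strftime("%A"),         # locale-dependent day name
            "is_weekend": day.weekday() >= 5,          # Saturday = 5, Sunday = 6
        })
        surrogate += 1
        day += timedelta(days=1)
    return rows

dim = build_date_dimension(date(2023, 12, 30), date(2024, 1, 2))
for row in dim:
    print(row["date_key"], row["smart_key"], row["quarter"], row["is_weekend"])
```

Precomputing attributes such as quarter and weekend flags lets analytical queries filter on them directly instead of calling date functions on every fact row.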
Designing Dimension Tables
• Identify shared (conformed) dimensions
• Star schema vs. snowflake
• Collapse vs. split hierarchies
Typical MOLAP Architecture
SQL Extensions
• Augments SQL with operations appropriate to data analysis and decision-support applications, such as:
– Ranking
– Moving averages
– Comparisons (e.g., time period over time period)
– Market share
– Statistical functions (correlation, regression, etc.)
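Several of these operations map onto standard SQL window functions. The sketch below runs them through Python's built-in `sqlite3` module; it assumes the bundled SQLite is version 3.25 or newer (the first release with window functions) and uses a hypothetical `monthly_sales` table. It shows ranking (`RANK`), a two-month moving average (`AVG ... ROWS BETWEEN`), a period-over-period comparison (`LAG`) and market share (a windowed `SUM`). Correlation and regression functions are not built into SQLite, so they are omitted here.

```python
import sqlite3

# Window functions require SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_sales (month INTEGER, brand TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?, ?)",
    [(1, "A", 100), (1, "B", 300), (2, "A", 200), (2, "B", 200),
     (3, "A", 300), (3, "B", 100)],
)

rows = conn.execute("""
    SELECT month, brand, revenue,
           -- Ranking of brands within each month
           RANK() OVER (PARTITION BY month ORDER BY revenue DESC) AS rnk,
           -- Two-month moving average per brand
           AVG(revenue) OVER (PARTITION BY brand ORDER BY month
                              ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS moving_avg,
           -- Change versus the previous month (NULL for the first month)
           revenue - LAG(revenue) OVER (PARTITION BY brand ORDER BY month) AS change,
           -- Market share of the month's total revenue
           revenue / SUM(revenue) OVER (PARTITION BY month) AS share
    FROM monthly_sales
    ORDER BY month, brand
""").fetchall()
for row in rows:
    print(row)  # first row: (1, 'A', 100.0, 2, 100.0, None, 0.25)
```

Without these extensions, each of these results would need a self-join or a correlated subquery, which is both harder to write and usually slower.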
Data Mining
• Data Mining: the process of extracting valid, previously unknown, comprehensible and actionable information from large databases and using it to make crucial business decisions
– Tends to work from the data up
– Normally requires large data volumes for accurate results
Data Mining Techniques
• Predictive Modeling
– Classification: put records in predetermined classes
– Value prediction: regression
• Database Segmentation
– Demographic clustering
– Neural clustering
• Link Analysis
– Association discovery
– Sequential pattern discovery
– Similar time sequence discovery
• Deviation Detection: identify outliers from the norm
– Statistics
– Visualization
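Association discovery, listed under link analysis, is often explained with market-basket data. The following is a minimal pure-Python sketch, assuming hypothetical baskets and thresholds and restricting rules to single items (X → Y). It computes support (the fraction of baskets containing both items) and confidence (the fraction of baskets containing X that also contain Y). Production algorithms such as Apriori generalize this to larger itemsets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]

def association_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) rules between single items."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), together in pair_counts.items():
        support = together / n                        # fraction of baskets with both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = together / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

rules = association_rules(baskets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With these baskets, `butter -> milk` has confidence 1.0 (every basket with butter also has milk), while `bread -> butter` is dropped because its confidence is only 0.5.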
Data Integration Methods
• ETL: Extract, Transform and Load
– Periodic (scheduled) bulk process
– Good for loading/refreshing data warehouses and data marts
– Commercial packages (e.g., IBM [Ascential] DataStage) or custom-developed code
– Common transformations are summarization, categorization and recoding
– The target for the data is a centralized database
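A toy ETL pipeline makes the three stages concrete. This is a sketch only, not how a commercial ETL package works internally: it assumes a hypothetical CSV export as the source, a hypothetical region-code lookup, an arbitrary 100.0 threshold for "large" orders, and an in-memory SQLite database (Python's built-in `sqlite3`) standing in for the centralized warehouse. The transform step applies the three common transformations named above: recoding, categorization and summarization.

```python
import csv
import io
import sqlite3

# Extract source: a hypothetical CSV export from an operational system.
SOURCE = """order_id,region_code,amount
1,E,120.50
2,W,80.00
3,E,15.25
4,W,410.00
"""

REGION_NAMES = {"E": "East", "W": "West"}  # recoding lookup (assumed codes)

def extract(text):
    """Read raw rows from the source export."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Recode region codes, categorize order size, and summarize by region/size."""
    summary = {}
    for r in rows:
        region = REGION_NAMES[r["region_code"]]           # recoding
        amount = float(r["amount"])
        size = "large" if amount >= 100 else "small"      # categorization (assumed threshold)
        key = (region, size)
        summary[key] = summary.get(key, 0.0) + amount     # summarization
    return [(region, size, total) for (region, size), total in sorted(summary.items())]

def load(conn, rows):
    """Bulk-load the transformed rows into the warehouse table in one transaction."""
    with conn:
        conn.execute("CREATE TABLE IF NOT EXISTS sales_summary "
                      "(region TEXT, order_size TEXT, total REAL)")
        conn.executemany("INSERT INTO sales_summary VALUES (?, ?, ?)", rows)

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(SOURCE)))
print(warehouse.execute(
    "SELECT region, order_size, total FROM sales_summary ORDER BY region, order_size"
).fetchall())
```

In practice the whole pipeline would run on a schedule (for example, nightly), which is what makes ETL a periodic bulk process rather than a real-time one.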
ETL
Data Integration Methods
• EAI: Enterprise Application Integration
– Framework for integrating data among disparate applications
– Usually accomplished with event-driven push technology
– Message queues are a common implementation method
– The target for the data is an application
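The event-driven, push-based style of EAI can be sketched with an in-process queue. The `MessageBroker` class, the `order_created` event and the two subscriber applications are all hypothetical stand-ins; real deployments use a dedicated message-queue product, whose API is not shown here. The source application publishes an event when something happens, and the broker pushes it to every application subscribed to that event type.

```python
import queue

class MessageBroker:
    """Toy in-process stand-in for a message queue (hypothetical, not a product API)."""

    def __init__(self):
        self._queue = queue.Queue()
        self._subscribers = {}  # event type -> list of handler callables

    def subscribe(self, event_type, handler):
        self._subscribers.setdefault(event_type, []).append(handler)

    def publish(self, event_type, payload):
        # The source application pushes an event as soon as something happens.
        self._queue.put((event_type, payload))

    def dispatch(self):
        # Deliver every queued message to each application subscribed to its type.
        while not self._queue.empty():
            event_type, payload = self._queue.get()
            for handler in self._subscribers.get(event_type, []):
                handler(payload)

# Two hypothetical target applications that must stay in sync with new orders.
billing_invoices, shipping_labels = [], []
broker = MessageBroker()
broker.subscribe("order_created", lambda o: billing_invoices.append(o["order_id"]))
broker.subscribe("order_created", lambda o: shipping_labels.append(o["address"]))

# The order-entry application publishes one event; both targets receive it.
broker.publish("order_created", {"order_id": 42, "address": "1 Main St"})
broker.dispatch()
print(billing_invoices, shipping_labels)  # [42] ['1 Main St']
```

The publisher does not know which applications consume its events, which is what lets new applications be integrated without changing the source system.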
EAI
Data Integration Methods
• EII: Enterprise Information Integration
– Real-time integration of disparate data sources
– As queries are run, data is gathered from the various sources to satisfy the request
– The target for the data is a person
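In EII, nothing is copied ahead of time; each request is answered by querying the sources when it arrives. The following minimal sketch assumes a hypothetical CRM database (an in-memory SQLite table via Python's built-in `sqlite3`) and a plain dictionary standing in for a live billing service. The hypothetical `customer_view` function assembles a combined answer for a person at request time, so a change in either source shows up on the very next request.

```python
import sqlite3

# Two hypothetical, independent sources that stay where they are.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

billing_api = {1: 250.0, 2: 0.0}  # stand-in for a live service returning balances

def customer_view(customer_id):
    """Answer a request by querying each source at run time; nothing is stored."""
    row = crm.execute(
        "SELECT name FROM customers WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    if row is None:
        return None
    return {"customer_id": customer_id, "name": row[0],
            "balance": billing_api.get(customer_id)}

# A change in a source is visible on the next request (real-time integration).
billing_api[2] = 75.0
print(customer_view(2))  # {'customer_id': 2, 'name': 'Grace', 'balance': 75.0}
```

The cost of this freshness is that every request pays the latency of all the sources it touches.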
EII
Data Integration Methods
• ODS: Operational Data Store
– Similar to a data warehouse, but operational instead of historical
– Assembles “one version of the truth” from multiple disparate data sources
– Often loaded using ETL-like processes, but can also serve as a clearing house for updates
– Often used as the source for OLAP databases
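Assembling "one version of the truth" means reconciling records about the same entity that arrive from different systems. The sketch below assumes one simple survivorship rule, "the most recent non-null value wins", and hypothetical customer records from a CRM and a web system. Real ODS implementations typically use richer, source-specific rules. The hypothetical `merge_customer_records` function processes records oldest first, so newer values overwrite older ones while nulls never erase known data.

```python
# Hypothetical customer records arriving from two operational systems.
crm_records = [
    {"customer_id": 7, "email": "ann@old.example", "phone": None, "updated": "2024-01-05"},
]
web_records = [
    {"customer_id": 7, "email": "ann@new.example", "phone": "555-0100", "updated": "2024-02-10"},
    {"customer_id": 8, "email": "bo@example.com", "phone": None, "updated": "2024-01-20"},
]

def merge_customer_records(*sources):
    """Merge by customer_id: newer non-null values overwrite older ones (assumed rule)."""
    merged = {}
    all_records = [r for source in sources for r in source]
    # ISO-8601 date strings sort chronologically, so this processes oldest first.
    for record in sorted(all_records, key=lambda r: r["updated"]):
        current = merged.setdefault(record["customer_id"], {})
        for field, value in record.items():
            if value is not None:  # a null never erases a known value
                current[field] = value
    return merged

ods = merge_customer_records(crm_records, web_records)
print(ods[7])  # email and phone come from the newer web record
```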
ODS
Next
• Assignment 3 Walkthrough