Data Mining and Data Warehousing – a connected view

Introduction
• Data mining describes a collection of
techniques that aim to find useful but
undiscovered patterns in collected data
• The goal of data mining is to create models
for decision-making that predict future
behavior based on analysis of past activity
Introduction
• Data warehousing is a blend of technologies
aimed at the effective integration of
operational databases into an environment
that enables the strategic use of data. These
technologies include relational and
multidimensional database management
systems, client/server architecture, metadata
modeling and repositories, graphical user
interfaces, and much more.
Operational vs Informational
Databases
Table 2-1 Operational Versus Informational Databases
Operational vs Informational
Databases
Table 2-2 Comparison of Data Stores, and Data Warehouses
Definition and characteristics of a
data warehouse
• It’s a database designed for analytical tasks
• It supports a relatively small number of users
• Its usage is read-intensive
• Its content is periodically updated (mostly additions)
• It contains current and historical data
• It contains a few large tables
• Each query frequently results in a large result set and
involves frequent full table scans and multi-table joins
• A formal definition of the data warehouse is offered by
W.H. Inmon
– A data warehouse is a subject-oriented, integrated, time-variant,
non-volatile collection of data in support of management decisions
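The characteristics above (a few large tables, read-intensive use, append-mostly updates, multi-table joins) can be sketched with Python's built-in sqlite3 module. This is only an illustrative sketch: the star-style layout and the table and column names (fact_sales, dim_product, dim_date) are assumptions made for the example, not something given in the slides.

```python
import sqlite3

# Hypothetical layout (an assumption): one fact table plus two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE fact_sales (product_id INTEGER, date_id INTEGER, amount REAL);
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "books"), (2, "music")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                 [(1, 2022), (2, 2023)])
# Time-variant and non-volatile: history is appended, not updated in place.
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(1, 1, 10.0), (1, 2, 15.0), (2, 1, 7.0), (2, 2, 3.0)])


def sales_by_category_and_year(connection):
    """Read-only analytical query: scans the fact table and joins both dimensions."""
    query = """
        SELECT p.category, d.year, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY p.category, d.year
        ORDER BY p.category, d.year
    """
    return connection.execute(query).fetchall()


rows = sales_by_category_and_year(conn)
```

The query is subject-oriented (sales by category) and spans current and historical periods, which is the kind of workload the definition has in mind.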
Data warehouse architecture
Figure 2-1 Data Warehouse Environment
Data warehouse architecture
Figure 2-2 Data Warehouse and Operational Data Store
Data warehouse architecture
Figure 2-3 Two-tiered Data Warehouse Architecture
Data warehouse architecture
Figure 2-4 Multi-tiered Data Warehouse Architecture
Data mining defined
• Data mining is the process of discovering
meaningful new correlations, patterns, and
trends by digging into (mining) large
amounts of data stored in a warehouse.
• The major attraction of data mining is its
capability to build predictive rather than
retrospective models
Predictive versus Retrospective
Models
Table 2-3 Predictive Versus Retrospective Models
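The contrast between the two kinds of model can be illustrated with a small sketch. The monthly sales figures are made-up illustrative data, and the least-squares line is just one simple stand-in for a predictive model; the slides do not prescribe a particular technique.

```python
from statistics import mean

# Hypothetical monthly sales figures (illustrative data only).
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]


def retrospective_summary(values):
    """Retrospective view: describe what already happened."""
    return {"total": sum(values), "average": mean(values)}


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict(model, x):
    """Predictive view: apply the fitted model to a period not yet observed."""
    slope, intercept = model
    return slope * x + intercept


summary = retrospective_summary(sales)
model = fit_line(months, sales)
forecast = predict(model, 6)
```

The summary only restates past activity, while the fitted line is used to estimate a value for month 6, which lies outside the collected data.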
Data Mining Application Domains
• Customer retention
• Sales and customer service
• Marketing
• Risk assessment and fraud detection
Data Mining Categories and
Research Focus
• Data mining techniques deal with discovery and
learning, and as such fall into three major learning
modes: supervised, unsupervised, and
reinforcement learning
• Data mining techniques can be categorized by:
– Representation of models and results
– The type of data the techniques operate on
– Application type
– Pattern attributes
Data Mining Categories and
Research Focus
• Data mining categorized by business problems
– Retrospective Analysis
– Predictive Analysis
• These two classes of business problems can be
further classified by
– Classification
– Clustering/Segmentation
– Associations
– Sequencing
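As one example of the categories above, associations are commonly measured with support and confidence. The sketch below assumes a market-basket setting with made-up transactions; the function names are chosen for illustration only.

```python
# Hypothetical shopping baskets (illustrative data only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


bread_milk_support = support({"bread", "milk"}, transactions)
bread_to_milk_confidence = confidence({"bread"}, {"milk"}, transactions)
```

A rule such as "bread implies milk" is usually considered interesting only when both numbers exceed thresholds chosen by the analyst.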
Data Mining Categories and
Research Focus
• Approaches that underlie most
contemporary research in data mining:
– The induction approach
– The database querying approach
– The compression approach
– The approach of approximation and searching
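To make the first of these approaches concrete, induction is read here as inferring a general rule from specific labeled examples. The sketch below learns a single-threshold rule on one numeric feature; the rule form, the data, and the function name are assumptions for illustration and are not taken from the slides.

```python
def induce_threshold_rule(examples):
    """Choose the threshold that misclassifies the fewest examples.

    Rule form (an assumption): predict True when value >= threshold.
    Returns (threshold, number_of_errors).
    """
    candidates = sorted({value for value, _ in examples})
    best_threshold, best_errors = None, len(examples) + 1
    for threshold in candidates:
        errors = sum(1 for value, label in examples
                     if (value >= threshold) != label)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors


# Hypothetical labeled examples: (feature value, class label).
examples = [(1, False), (2, False), (3, False), (7, True), (8, True), (9, True)]
rule = induce_threshold_rule(examples)
```

The learned threshold generalizes beyond the six examples: any new value can be classified by the same rule, which is the step from specific cases to a general statement.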