Chapter 1 Summary - Computing and Information Studies

CHAPTER 1 (HAN and KAMBER)
This chapter is a good introduction to the field of knowledge discovery in databases/data
mining (KDD/DM). Many terms are introduced, the key data mining functionalities are
briefly described, and many application-oriented concepts are discussed. The emphasis in
this book is on the database perspective of data mining.
On a first reading, it is enough to focus on Sections 1.1, 1.2, 1.4, 1.7, and 1.9.
Section 1.1 Motivation and Importance
1. Large volumes of data are available, and this data is expected to contain
useful information.
2. DM is a natural evolution of database technology, which has evolved through
data collection and management, advanced data analysis, on-line transaction
processing (OLTP), the World Wide Web, on-line analytical processing (OLAP), and
now data mining.
Section 1.2 What is Data Mining
1. Extracting or mining knowledge from large databases.
2. Main steps: data cleaning; data integration; data selection; data mining; pattern
evaluation; knowledge presentation.
3. Data mining system components: database, data warehouse, World Wide Web, or other
data repository; database or data warehouse server; knowledge base; data mining
engine; pattern evaluation module; user interface.
4. Data mining involves an integration of techniques from database/data warehouse
technology, statistics, machine learning, high-performance computing, pattern
recognition, and neural networks.
5. The emphasis is on efficient (in terms of time and storage) and scalable (running
time that grows roughly linearly with data size) data mining techniques.
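The main steps in item 2 can be pictured as a chain of functions, each feeding its output to the next. The sketch below is a toy illustration only: the records, the function names, and the specific policies (dropping incomplete records during cleaning, counting purchased items as the "mining" step, a minimum-count threshold during evaluation) are hypothetical choices, not methods prescribed by the chapter.

```python
from collections import Counter

# Hypothetical toy sources: sales records and customer attributes keyed by customer id.
sales_db = [
    {"cust": 1, "item": "computer"},
    {"cust": 2, "item": "printer"},
    {"cust": 3, "item": None},  # incomplete record
    {"cust": 4, "item": "computer"},
]
customer_db = {1: {"age": 34}, 2: {"age": 51}, 3: {"age": 28}, 4: {"age": 40}}

def clean(records):
    """Data cleaning: drop records with missing values (one simple policy)."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(records, customers):
    """Data integration: join sales records with customer attributes by id."""
    return [{**r, **customers[r["cust"]]} for r in records if r["cust"] in customers]

def select(records, attributes):
    """Data selection: keep only the task-relevant attributes."""
    return [{a: r[a] for a in attributes} for r in records]

def mine(records):
    """Data mining (placeholder technique): count how often each item is bought."""
    return Counter(r["item"] for r in records)

def evaluate(patterns, min_count):
    """Pattern evaluation: keep patterns that meet a user-given threshold."""
    return {p: c for p, c in patterns.items() if c >= min_count}

def present(patterns):
    """Knowledge presentation: render the surviving patterns as text."""
    return [f"{item}: bought {count} times" for item, count in sorted(patterns.items())]

data = select(integrate(clean(sales_db), customer_db), ["item", "age"])
report = present(evaluate(mine(data), min_count=2))
print(report)  # ['computer: bought 2 times']
```

Keeping each step as a separate function mirrors the idea that the process is iterative: any one step (for example, the cleaning policy or the evaluation threshold) can be changed and the chain rerun.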
Section 1.3 Different Types of Databases
1. Relational databases
2. Data warehouses
3. Transactional databases
4. Advanced data and information systems:
   - Object-oriented and object-relational databases
   - Temporal, sequence, and time-series databases
   - Spatial and spatiotemporal databases
   - Text and multimedia databases
   - Heterogeneous and legacy databases
   - Data stream databases
   - The World Wide Web
Section 1.4 Data Mining Functionalities
1. Concept description: characterization and discrimination
2. Mining frequent patterns, associations, and correlations
3. Classification
4. Prediction
5. Cluster analysis
6. Outlier analysis
7. Evolution analysis
Section 1.5 Interestingness of Discovered Patterns
Techniques for evaluating the interestingness of discovered patterns or knowledge.
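For association rules of the form A => B, two widely used objective interestingness measures are support (the fraction of transactions that contain every item in A and B) and confidence (the fraction of transactions containing A that also contain B). A rule is typically kept only if both meet user-specified minimum thresholds. The sketch below assumes those definitions; the transactions and function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical transactions: each is the set of items bought together.
transactions = [
    {"computer", "software"},
    {"computer", "printer"},
    {"computer", "software", "printer"},
    {"printer"},
]

def support(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in db if itemset <= t)
    return Fraction(hits, len(db))

def confidence(antecedent, consequent, db):
    """support(A and B together) / support(A): how often B holds when A holds."""
    base = support(antecedent, db)
    if base == 0:
        return Fraction(0)  # the rule never applies; treated as 0 by convention here
    return support(antecedent | consequent, db) / base

def is_interesting(antecedent, consequent, db, min_sup, min_conf):
    """Keep a rule only if it meets both user-specified thresholds."""
    return (support(antecedent | consequent, db) >= min_sup
            and confidence(antecedent, consequent, db) >= min_conf)

a, b = {"computer"}, {"software"}
print(support(a | b, transactions))    # 1/2
print(confidence(a, b, transactions))  # 2/3
print(is_interesting(a, b, transactions, Fraction(1, 4), Fraction(3, 5)))  # True
```

Exact fractions are used so the printed measures are easy to check by hand: "computer" and "software" appear together in 2 of 4 transactions, and "computer" appears in 3, giving a confidence of 2/3.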
Section 1.6 Classification of Data Mining Systems
1. Kinds of databases mined
2. Kinds of knowledge discovered/mined
3. Kinds of techniques employed
4. Application domain
Section 1.7 Data Mining Task Primitives
1. Task-relevant data
2. Kind of knowledge to be mined
3. Background knowledge
4. Interestingness measures
5. Expected representation for visualizing the discovered patterns
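The five primitives can be thought of as the fields of a single task specification that a user submits to a data mining system. The dataclass below is a hypothetical sketch of that idea: its field names, the example values, and the set of allowed knowledge kinds (taken loosely from the functionalities in Section 1.4) are assumptions for illustration, not the syntax of any real data mining query language.

```python
from dataclasses import dataclass, field

# Hypothetical vocabulary, loosely based on the functionalities listed in Section 1.4.
KNOWLEDGE_KINDS = {
    "characterization", "discrimination", "association", "classification",
    "prediction", "clustering", "outlier", "evolution",
}

@dataclass
class MiningTask:
    """Hypothetical bundle of the five data mining task primitives."""
    task_relevant_data: str  # 1. which data to mine (tables, attributes, conditions)
    knowledge_kind: str      # 2. kind of knowledge to be mined
    background_knowledge: dict = field(default_factory=dict)  # 3. e.g. concept hierarchies
    interestingness: dict = field(default_factory=dict)       # 4. e.g. thresholds
    visualization: str = "rules"  # 5. how discovered patterns are presented

    def __post_init__(self):
        # Reject kinds of knowledge outside the assumed vocabulary.
        if self.knowledge_kind not in KNOWLEDGE_KINDS:
            raise ValueError(f"unknown kind of knowledge: {self.knowledge_kind!r}")

task = MiningTask(
    task_relevant_data="item and age of customers in the sales database",
    knowledge_kind="association",
    background_knowledge={"age": ["young", "middle_aged", "senior"]},
    interestingness={"min_support": 0.05, "min_confidence": 0.7},
    visualization="table",
)
print(task.knowledge_kind)  # association
```

Only the first two primitives are required in this sketch; the others fall back to defaults, reflecting that a user may leave background knowledge, thresholds, or presentation to the system.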
Section 1.8 Integration of Data Mining and Database Systems
No coupling, loose coupling, semi-tight coupling, tight coupling.
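The difference between loose and semi-tight coupling can be illustrated with an in-memory SQLite database. Under loose coupling, the data mining code uses the database only to fetch data and does all processing itself; under semi-tight coupling, some data mining primitives (such as aggregation) are implemented efficiently inside the database system. This sketch shows only those two styles on a hypothetical sales table; no coupling (mining flat files without a database) and tight coupling (mining fully integrated into the database system) are not shown.

```python
import sqlite3
from collections import Counter

# Hypothetical sales table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (cust INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, "computer"), (2, "printer"), (3, "computer"), (4, "software")],
)

def item_counts_loose(conn):
    """Loose coupling: the database only serves rows; counting happens in Python."""
    rows = conn.execute("SELECT item FROM sales").fetchall()
    return dict(Counter(item for (item,) in rows))

def item_counts_semi_tight(conn):
    """Semi-tight style: the aggregation primitive is pushed into the database engine."""
    rows = conn.execute("SELECT item, COUNT(*) FROM sales GROUP BY item").fetchall()
    return dict(rows)

print(item_counts_loose(conn) == item_counts_semi_tight(conn))  # True
```

Both functions produce the same counts; the difference is where the work is done. Pushing the aggregation into the database avoids transferring every row to the mining code, which matters more as the data grows.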
Section 1.9 Major Issues in Data Mining
1. Mining methodology and user interaction issues: mining different kinds of
knowledge in databases, interactive mining at multiple levels of abstraction,
incorporation of background knowledge, data mining query languages and ad hoc
data mining, presentation and visualization of results, handling of noisy and
incomplete data, and pattern evaluation (the interestingness problem).
2. Performance issues: efficiency and scalability of data mining algorithms, and
parallel, distributed, and incremental mining algorithms.
3. Database diversity issues: handling of relational and complex data,
heterogeneous databases and global information systems.