Introduction to Data Mining
1. What is data mining?
2. Motivating Challenges
3. The Origins of Data Mining
4. Data Mining Tasks
Data Mining
Prepared by Phan Huy Tam – Finance & Banking Dept - UEL
1. What is data mining?
Data mining is the process of automatically discovering useful information in large data repositories.
Data mining techniques are deployed to scour large data sets in order to find novel and useful patterns that
might otherwise remain unknown.
Knowledge Discovery in Databases (KDD)
Data mining is an integral part of knowledge discovery in databases (KDD), the overall process of
converting raw data into useful information.
Figure: The KDD process
Input Data -> Data Preprocessing -> Data Mining -> Postprocessing -> Information
Data Preprocessing: Feature Selection, Dimensionality Reduction, Normalization, Data Subsetting
Postprocessing: Filtering Patterns, Visualization, Pattern Interpretation
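The KDD flow (input data, preprocessing, data mining, postprocessing, information) can be sketched in a few lines of Python. This is only an illustrative sketch: the customer records, the column names, the 0.4 threshold, and every function name below are hypothetical and do not come from the slides, and the "mining" step is a deliberately trivial filter standing in for a real algorithm.

```python
# Hypothetical raw input data (made up for illustration).
raw = [
    {"customer": "A", "balance": 200.0, "visits": 2, "comment": "n/a"},
    {"customer": "B", "balance": 1000.0, "visits": 10, "comment": "n/a"},
    {"customer": "C", "balance": 600.0, "visits": 6, "comment": "n/a"},
]


def select_features(rows, keep):
    """Preprocessing: keep only the columns relevant to the analysis."""
    return [{k: r[k] for k in keep} for r in rows]


def min_max_normalize(rows, columns):
    """Preprocessing: rescale each named column to the range [0, 1]."""
    out = [dict(r) for r in rows]
    for col in columns:
        values = [r[col] for r in rows]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # avoid dividing by zero on constant columns
        for r in out:
            r[col] = (r[col] - lo) / span
    return out


def mine(rows, column, threshold):
    """Toy mining step: keep records whose normalized value exceeds a threshold."""
    return [r for r in rows if r[column] > threshold]


def postprocess(patterns, key):
    """Postprocessing: order the results so they are easier to interpret."""
    return sorted(patterns, key=lambda r: r[key], reverse=True)


# Input Data -> Preprocessing -> Data Mining -> Postprocessing -> Information
selected = select_features(raw, ["customer", "balance", "visits"])
normalized = min_max_normalize(selected, ["balance", "visits"])
patterns = mine(normalized, "balance", threshold=0.4)
information = postprocess(patterns, key="balance")

for row in information:
    print(row["customer"], row["balance"])
```

In practice each stage would be far richer (for example, dimensionality reduction in preprocessing or visualization in postprocessing), but the shape of the pipeline stays the same.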
2. Motivating Challenges
1. Scalability
Massive data sets may require out-of-core algorithms when the data cannot fit into main memory,
as well as parallel and distributed algorithms.
2. High Dimensionality
Traditional data analysis techniques developed for low-dimensional data often do not work well
for high-dimensional data because of issues such as the curse of dimensionality.
3. Heterogeneous and Complex Data
Non-traditional types of data include web and social media data containing text, hyperlinks,
images, audio, and video, as well as DNA sequences, climate data…
4. Data Ownership and Distribution
Data is often geographically distributed among resources belonging to multiple entities, which
calls for distributed data mining techniques.
5. Non-traditional Analysis
Traditional analysis is extremely labor-intensive (trial and error), which motivates automating
the process of hypothesis generation and evaluation.
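To make the scalability point concrete, the sketch below computes a mean and variance in a single pass over fixed-size chunks, so the full data set never has to sit in main memory at once. It is a minimal illustration, not a method taken from the slides: the names `iter_chunks` and `RunningStats`, the chunk size, and the synthetic number stream (standing in for a large file) are all assumptions, and the update rule is Welford's online algorithm.

```python
def iter_chunks(values, chunk_size):
    """Yield successive lists of at most chunk_size items from any iterable."""
    chunk = []
    for v in values:
        chunk.append(v)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


class RunningStats:
    """Single-pass mean and population variance (Welford's online algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, chunk):
        for x in chunk:
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / self.n if self.n else 0.0


# Hypothetical stand-in for a data source too large to load at once.
stream = (float(i) for i in range(1, 1001))

stats = RunningStats()
for chunk in iter_chunks(stream, chunk_size=100):
    stats.update(chunk)  # only one chunk is held in memory at a time

print(stats.n, stats.mean, stats.variance)
```

Because each chunk is summarized and then discarded, memory use depends on the chunk size rather than on the size of the data set. The same idea extends to parallel settings, where partial statistics computed on separate machines are merged afterward.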
3. The Origins of Data Mining
The field can be traced back to the late 1980s.
The challenges and opportunities of applying computational techniques to extract actionable knowledge from large
databases fueled its tremendous growth.
information retrieval
sampling
estimation
evolutionary computing
information theory
statistics
optimization
search algorithms
signal processing
modeling techniques
artificial intelligence
machine learning
big data
visualization
pattern recognition
hypothesis testing
Prepared by Phan Huy Tam – Finance & Banking Dept - UEL
Email: tamph@uel.edu.vn
Phone: 0798109293