Penelitian Data Mining

advertisement
Data Mining:
5. Penelitian Data Mining
Romi Satria Wahono
romi@romisatriawahono.net
http://romisatriawahono.net/dm
WA/SMS: +6281586220090
1
Romi Satria Wahono
•
•
•
•
•
•
•
•
SD Sompok Semarang (1987)
SMPN 8 Semarang (1990)
SMA Taruna Nusantara Magelang (1993)
B.Eng, M.Eng and Ph.D in Software Engineering from
Saitama University Japan (1994-2004)
Universiti Teknikal Malaysia Melaka (2014)
Research Interests: Software Engineering,
Machine Learning
Founder dan Koordinator IlmuKomputer.Com
Peneliti LIPI (2004-2007)
Founder dan CEO PT Brainmatics Cipta Informatika
2
Course Outline
1.
2.
3.
4.
5.
Pengenalan Data Mining
Proses Data Mining
Evaluasi dan Validasi pada Data Mining
Metode dan Algoritma Data Mining
Penelitian Data Mining
3
5. Penelitian Data Mining
1. Standard Proses Penelitian pada Data
Mining
2. Masalah Umum Penelitian Data Mining
3. Journal Publications on Data Mining
4
1. Standard Proses Penelitian
pada Data Mining
5
Data Mining Standard Process (CRISP–
DM)
• A cross-industry standard was clearly
required that is industry neutral, toolneutral, and application-neutral
• The Cross-Industry Standard Process for Data
Mining (CRISP–DM) was developed in 1996
(Chapman, 2000)
• CRISP-DM provides a nonproprietary and
freely available standard process for fitting
data mining into the general problem-solving
strategy of a business or research unit
6
CRISP-DM
7
1. Business Understanding Phase
• Enunciate the project objectives and requirements
clearly in terms of the business or research unit as
a whole
• Translate these goals and restrictions into the
formulation of a data mining problem definition
• Prepare a preliminary strategy for achieving these
objectives
8
2. Data Understanding Phase
• Collect the data
• Use exploratory data analysis to familiarize yourself
with the data and discover initial insights
• Evaluate the quality of the data
• If desired, select interesting subsets that may
contain actionable patterns
9
3. Data Preparation Phase
• Prepare from the initial raw data the final data set
that is to be used for all subsequent phases. This
phase is very labor intensive
• Select the cases and variables you want to analyze
and that are appropriate for your analysis
• Perform transformations on certain variables, if
needed
• Clean the raw data so that it is ready for the
modeling tools
10
4. Modeling phase
• Select and apply appropriate modeling techniques
• Calibrate model settings to optimize results
• Remember that often, several different techniques
may be used for the same data mining problem
• If necessary, loop back to the data preparation
phase to bring the form of the data into line with
the specific requirements of a particular data
mining technique
11
5. Evaluation phase
• Evaluate the one or more models delivered in the
modeling phase for quality and effectiveness before
deploying them for use in the field
• Determine whether the model in fact achieves the
objectives set for it in the first phase
• Establish whether some important facet of the
business or research problem has not been
accounted for sufficiently
• Come to a decision regarding use of the data
mining results
12
6. Deployment phase
• Make use of the models created: Model creation
does not signify the completion of a project
• Example of a simple deployment: Generate a report
• Example of a more complex deployment:
Implement a parallel data mining process in
another department
• For businesses, the customer often carries out the
deployment based on your model
13
Latihan
• Pelajari dan pahami Case Study 1-5 dari buku
Larose (2005) Chapter 1
• Pelajari dan pahami bagaimana menerapkan
CRISP-DM pada tesis Firmansyah (2011)
tentang penerapan algoritma C4.5 untuk
penentuan kelayakan kredit
14
2. Masalah Umum Penelitian
Data Mining
15
Masalah Utama Penelitian Data Mining
• Mining Methodology
•
•
•
•
•
•
Mining various and new kinds of knowledge
Mining knowledge in multi-dimensional space
Data mining: An interdisciplinary effort
Boosting the power of discovery in a networked environment
Handling noise, uncertainty, and incompleteness of data
Pattern evaluation and pattern- or constraint-guided mining
• User Interaction
• Interactive mining
• Incorporation of background knowledge
• Presentation and visualization of data mining results
16
Masalah Utama Penelitian Data Mining
• Efficiency and Scalability
• Efficiency and scalability of data mining algorithms
• Parallel, distributed, stream, and incremental mining
methods
• Diversity of data types
• Handling complex types of data
• Mining dynamic, networked, and global data
repositories
• Data Mining and Society
• Social impacts of data mining
• Privacy-preserving data mining
• Invisible data mining
17
3. Journal Publications on Data
Mining
18
Transactions and Journals
• Review Paper (survey and state-of-the-art):
• ACM Computing Surveys (CSUR)
• Research Paper (technical):
• ACM Transactions on Knowledge Discovery from Data
(TKDD)
• ACM Transactions on Information Systems (TOIS)
• IEEE Transactions on Knowledge and Data Engineering
• Springer Data Mining and Knowledge Discovery
• International Journal of Business Intelligence and Data
Mining (IJBIDM)
19
Cognitive Assignment III
1. Baca paper yang ada di
http://romisatriawahono.net/lecture/dm/paper/
2. Rangkumkan masing-masing dalam bentuk slide dengan
struktur:
1.
2.
3.
4.
5.
6.
7.
8.
Latar Belakang Masalah (Research Background)
Pernyataan Masalah (Problem Statements)
Pertanyaan Penelitian (Research Questions)
Tujuan Penelitian (Research Objective)
Metode-Metode yang Sudah Ada (Existing Methods)
Metode yang Diusulkan (Proposed Method)
Hasil (Results)
Kesimpulan (Conclusion)
3. Presentasikan di depan kelas pada mata kuliah berikutnya
20
Referensi
1. Ian H. Witten, Frank Eibe, Mark A. Hall, Data mining:
Practical Machine Learning Tools and Techniques 3rd
Edition, Elsevier, 2011
2. Daniel T. Larose, Discovering Knowledge in Data: an
Introduction to Data Mining, John Wiley & Sons, 2005
3. Florin Gorunescu, Data Mining: Concepts, Models and
Techniques, Springer, 2011
4. Jiawei Han and Micheline Kamber, Data Mining: Concepts
and Techniques Third Edition, Elsevier, 2012
5. Oded Maimon and Lior Rokach, Data Mining and Knowledge
Discovery Handbook Second Edition, Springer, 2010
6. Warren Liao and Evangelos Triantaphyllou (eds.), Recent
Advances in Data Mining of Enterprise Data: Algorithms and
Applications, World Scientific, 2007
21
Download