W4 data exploration

Data exploration
Lukman Heryawan, PhD
10 March 2022
Lectures (before the midterm exam, UTS)
• eLOK link: https://elok.ugm.ac.id/course/view.php?id=9358 (Data mining - CS IUP)
• 7 sessions:
• Process model of data mining
• Data types and attributes
• Data distance and data collection
• Data exploration
• Data preparation
• Supervised model development and evaluation
• Supervised model improvement
• Assessment:
• Midterm exam (UTS): 5 questions (25%)
• Individual assignments: 5 (5%)
• Group assignments: 2, plus class participation (20%)
Lectures (after the midterm exam, UTS)
• eLOK link: https://elok.ugm.ac.id/course/view.php?id=9358 (Data mining - CS IUP)
• 7 sessions:
• Clustering methods
• Clustering evaluation
• Frequent itemset
• Association rule
• Sequential pattern
• Case study: solving issue using supervised methods
• Case study: solving issue using unsupervised methods
• Assessment:
• Final exam (UAS): 5 questions (25%)
• Individual assignments: 5 (5%)
• Group assignments: 2, plus class participation (20%)
Material
Data mining (DM) definition
• Data mining is a process of extracting and discovering patterns in large
data sets involving methods at the intersection of machine learning,
statistics, and database systems (source: Wikipedia).
DM definition (cont)
Process model of CRISP-DM
• Cross-industry standard process for data mining, known as CRISP-DM
is an open standard process model that describes common
approaches used by data mining experts.
• It is the most widely used analytics model.
https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining
CRISP-DM Process
Four stages of data mining
CRISP-DM and data exploration
Data exploration
https://www.heavy.ai/learn/data-exploration
VOSviewer
• VOSviewer is a software tool for constructing and visualizing
bibliometric networks.
• These networks may for instance include journals, researchers, or
individual publications.
• VOSviewer also offers text mining functionality that can be used to
construct and visualize co-occurrence networks of important terms
extracted from a body of scientific literature.
• https://www.vosviewer.com/
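The idea behind a term co-occurrence network can be sketched in a few lines of Python. This is only an illustration of the concept, not VOSviewer's own algorithm: the function name `cooccurrence_counts` and the keyword lists are hypothetical. Each pair of terms that appears in the same publication becomes an edge, and the edge weight is the number of publications the two terms share.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs_terms):
    """Count, for each unordered pair of terms, how many documents contain both."""
    counts = Counter()
    for terms in docs_terms:
        # Deduplicate and sort so every pair has a single canonical ordering.
        for a, b in combinations(sorted(set(terms)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical keyword lists, one per publication (illustrative data only).
docs = [
    ["covid-19", "vaccine", "immunity"],
    ["covid-19", "vaccine"],
    ["covid-19", "immunity", "antibody"],
]

edges = cooccurrence_counts(docs)
for (a, b), weight in edges.most_common():
    print(f"{a} -- {b}: {weight}")
```

The resulting weighted pairs are the kind of edge list a network visualization tool would take as input.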
VOSviewer example
Case study
MeSH on Demand Tool:
An Easy Way to Identify Relevant MeSH Terms
Text mining of PubMed database
https://www.nlm.nih.gov/pubs/techbull/mj14/mj14_mesh_on_demand.html
Abstract example
Related MeSH
https://pubmed.ncbi.nlm.nih.gov/?term=covid+19
Similar-article measurement using data distance
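One way to measure article similarity with a data distance is to compare the sets of MeSH terms assigned to each article. The sketch below assumes the Jaccard distance, which is only one possible choice; the slides do not say which distance is used. The article identifiers "A", "B", "C" and their term sets are hypothetical.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two term collections: 1 - |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty sets are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


def rank_similar(query_terms, articles):
    """Return article ids ordered from most to least similar to the query."""
    return sorted(
        articles,
        key=lambda article_id: jaccard_distance(query_terms, articles[article_id]),
    )


# Hypothetical MeSH term sets (illustrative data only).
query = {"COVID-19", "Vaccines", "Humans"}
articles = {
    "A": {"COVID-19", "Vaccines", "Humans", "Adult"},
    "B": {"Neoplasms", "Humans"},
    "C": {"COVID-19", "Humans"},
}

ranking = rank_similar(query, articles)
print(ranking)
```

A smaller distance means more shared terms, so the first identifier in the ranking is the closest article to the query.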
Data visualization for MeSH on demand application
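A minimal sketch of such a visualization, assuming the terms for each article have already been extracted: count how often each term occurs and draw a text bar chart. The helper names `term_frequencies` and `text_bar_chart` and the sample term lists are hypothetical; a plotting library could replace the text output.

```python
from collections import Counter


def term_frequencies(term_lists):
    """Count how often each term appears across all articles."""
    return Counter(term for terms in term_lists for term in terms)


def text_bar_chart(freqs, width=20):
    """Render the frequencies as a text bar chart, most frequent term first."""
    if not freqs:
        return ""
    top = max(freqs.values())
    label_width = max(len(term) for term in freqs)
    lines = []
    for term, n in freqs.most_common():
        # Scale each bar relative to the most frequent term.
        bar = "#" * max(1, round(n / top * width))
        lines.append(f"{term.ljust(label_width)} {bar} {n}")
    return "\n".join(lines)


# Hypothetical terms per article (illustrative data only).
term_lists = [
    ["COVID-19", "Humans"],
    ["COVID-19", "Vaccines", "Humans"],
    ["COVID-19"],
]

freqs = term_frequencies(term_lists)
chart = text_bar_chart(freqs)
print(chart)
```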
Weekly assignment
(due date: 16 March 2022, 23:59)
• Create a GitHub account for this assignment
• Develop a MeSH on Demand application and store the
development progress report in your GitHub account
• You may use this reference:
https://meshb.nlm.nih.gov/MeSHonDemand
• Write a detailed explanation of your development progress
in the GitHub report
• For example, you can explain a visualization script that can be
used to visualize the terms of PubMed articles
• Share your progress and GitHub link by email to
lukmanh@ugm.ac.id
• Send the email with the subject:
Name of student_assignment4_DM_CSIUP
Example email format for assignment 4:
Q&A
• Email: lukmanh@ugm.ac.id
• Scholar profile: https://scholar.google.co.id/citations?user=V_iMAWYAAAAJ&hl=en