Data exploration
Lukman Heryawan, PhD
10 March 2022

Lectures (before the midterm exam, UTS)
• eLOK link: https://elok.ugm.ac.id/course/view.php?id=9358 (Data mining - CS IUP)
• 7 sessions:
  • Process model of data mining
  • Data types and attributes
  • Data distance and data collection
  • Data exploration
  • Data preparation
  • Supervised model development and evaluation
  • Supervised model improvement
• Assessment:
  • Midterm exam (UTS): 5 questions (25%)
  • Individual assignments: 5 (5%)
  • Group assignments: 2, plus class participation (20%)

Lectures (after the midterm exam, UTS)
• eLOK link: https://elok.ugm.ac.id/course/view.php?id=9358 (Data mining - CS IUP)
• 7 sessions:
  • Clustering methods
  • Clustering evaluation
  • Frequent itemset
  • Association rule
  • Sequential pattern
  • Case study: solving issue using supervised methods
  • Case study: solving issue using unsupervised methods
• Assessment:
  • Final exam (UAS): 5 questions (25%)
  • Individual assignments: 5 (5%)
  • Group assignments: 2, plus class participation (20%)

Course material

Data mining (DM) definition
• Data mining is a process of extracting and discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. (Source: Wikipedia)

DM definition (cont)

Process model of CRISP-DM
• The Cross-industry standard process for data mining, known as CRISP-DM, is an open standard process model that describes common approaches used by data mining experts.
• It is the most widely used analytics model.
https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining

CRISP-DM Process

Four stages of data mining

CRISP-DM and data exploration

Data exploration
https://www.heavy.ai/learn/data-exploration

VOSviewer
• VOSviewer is a software tool for constructing and visualizing bibliometric networks.
• These networks may for instance include journals, researchers, or individual publications.
• VOSviewer also offers text mining functionality that can be used to construct and visualize co-occurrence networks of important terms extracted from a body of scientific literature.
• https://www.vosviewer.com/

VOSviewer example

Case study
MeSH on Demand Tool: An Easy Way to Identify Relevant MeSH Terms

Text mining of PubMed database
https://www.nlm.nih.gov/pubs/techbull/mj14/mj14_mesh_on_demand.html

Abstract example

Related MeSH
https://pubmed.ncbi.nlm.nih.gov/?term=covid+19

Similar articles measurement using data distance

Data visualization for MeSH on demand application

Weekly assignment (due date: 16 March 2022, 23:59)
• Create a GitHub account for this assignment
• Develop a MeSH on Demand application and store the development progress report in your GitHub account
• You may use this reference: https://meshb.nlm.nih.gov/MeSHonDemand
• Write a detailed explanation of your development progress report in GitHub
• For example, you can explain a visualization script that is used to visualize terms of PubMed articles
• Email your progress and GitHub link to lukmanh@ugm.ac.id
• Email subject: Name of student_assignment4_DM_CSIUP

Example of email format of assignment 4:

Q&A
• Email: lukmanh@ugm.ac.id
• Scholar profile: https://scholar.google.co.id/citations?user=V_iMAWYAAAAJ&hl=en
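
Sketch for "Similar articles measurement using data distance"
The slides do not say which distance measure is intended, so the following is only a minimal sketch that assumes each article is represented by its set of MeSH terms and uses the Jaccard distance between those sets. The function names (jaccard_distance, rank_similar, term_counts) and the toy article IDs and term assignments are hypothetical and made up for illustration; they are not taken from PubMed or from the MeSH on Demand tool.

```python
from collections import Counter


def jaccard_distance(terms_a, terms_b):
    """Return 1 minus (size of intersection / size of union) of two term sets."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0  # assumption: two empty term sets count as identical
    return 1.0 - len(a & b) / len(a | b)


def rank_similar(query_id, articles):
    """Sort all other articles by increasing Jaccard distance to the query."""
    query_terms = articles[query_id]
    scored = [
        (jaccard_distance(query_terms, terms), art_id)
        for art_id, terms in articles.items()
        if art_id != query_id
    ]
    return [(art_id, dist) for dist, art_id in sorted(scored)]


def term_counts(articles):
    """Count in how many articles each term occurs (e.g. as input for a bar chart)."""
    return Counter(term for terms in articles.values() for term in set(terms))


# Made-up toy data: IDs and term assignments are illustrative only.
ARTICLES = {
    "article_1": ["COVID-19", "SARS-CoV-2", "Vaccination"],
    "article_2": ["COVID-19", "SARS-CoV-2", "Pandemics"],
    "article_3": ["Data Mining", "Machine Learning"],
}

if __name__ == "__main__":
    for art_id, dist in rank_similar("article_1", ARTICLES):
        print(f"{art_id}: distance {dist:.2f}")
    print(term_counts(ARTICLES).most_common(2))
```

A distance of 0 means two articles share exactly the same terms and 1 means they share none. The counts returned by term_counts could be passed to any plotting library for the term visualization mentioned in the weekly assignment.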