A Data Mining Course for Computer Science and Non-Computer Science Students

Jamil Saquer
Computer Science Department
Missouri State University
Springfield, MO

Outline
• Introduction
• Motivation
• Challenges
• Design of the Course
• Topics Covered
• Assignments
• Examination Format
• Conclusion

Introduction
• What is data mining (DM)?
  • The non-trivial process of identifying valid, novel, useful, and ultimately understandable patterns in large volumes of data
• DM is an interdisciplinary topic
• It has much in common with machine learning and pattern recognition

Motivation for the Course
• Introducing more electives
• Introducing graduate-level CS courses
• Informatics program
• Of interest to faculty members and students from other departments
• Author’s main area of research

Challenges in Designing the Course
• Diverse student population
  • CS vs. non-CS
  • Undergraduate vs. graduate

Solution
• Informatics program is in the design stages
• MNAS CS option is new
  • Therefore, emphasis on undergraduate CS students
• Accommodating other students
  • Minimize prerequisites
    • CS 2 (or even CS 1)
    • Capable of using DM software
    • Scientific background/mentality
    • One from business, another from GGP
  • For graduate CS students:
    • Project requires more research
    • Tests could be a little different
• Emphasize understanding basic DM concepts and using software for mining data

Design of the Course
• Used the book by Dunham
• Book divided into 3 parts
• About 1 week spent on definitions, applications, motivations, challenges, …
• Core of the course spent on the core DM subjects: classification, clustering, mining association rules
• Last week for project presentations

Classification
• Assigning objects to classes
• Supervised learning
• Example: classify a military vehicle as a friendly or an enemy vehicle
• Methods covered include: decision trees, Naïve Bayesian, k-nearest neighbor, backpropagation

Clustering
• Grouping objects into different classes
• Unsupervised learning
• Example: cluster Weblog data to discover groups of similar access patterns
• Techniques covered include: link algorithms, nearest neighbor, k-means, PAM, BIRCH, DBSCAN, CURE, ROCK

Association Rules
• Finding patterns that occur together
• Example: diapers and beer are usually bought together
• Techniques covered: Apriori, sampling, partitioning, FP-growth

Assignments
• Students need to learn how to mine data
• One assignment on each core DM topic
  • Apply two different algorithms to at least two data sets, one of which has to be relatively large
  • Can use any DM package (e.g., Weka)
• Students write a report
• Students learn how to run an experiment

Term Project
• Group projects
• Either provide a non-trivial implementation of a DM algorithm
• Or learn about a DM topic not discussed in class
• Graduate students are required to read at least three research papers and to write a report
• All students present their project in class

Examination Format
• Open book
• Two types of questions
  • First type requires basic knowledge of the material: definitions, T/F, short answers
  • Second type asks students to apply certain algorithms to small data sets

Conclusion
• DM is an interesting course for CS and non-CS students
• DM can be taught to non-CS students
• A DM course can be taught to students with minimal CS background

Questions
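
Supplementary example

The second exam question type asks students to apply an algorithm to a small data set, and the association-rule slide uses the diapers-and-beer example. The following is a minimal sketch of that kind of exercise: an Apriori-style search for frequent itemsets, followed by a rule confidence. It is not taken from the course materials. The basket data, the support threshold, and the names `frequent_itemsets` and `confidence` are all made up for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style search. Returns {itemset: count} for every itemset
    contained in at least `min_support` transactions (an absolute count)."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every distinct single item.
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: combine frequent k-itemsets into (k+1)-itemsets.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent: count(A and C) / count(A)."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical toy baskets, invented purely for this illustration.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"diapers", "beer", "bread"},
]

frequent = frequent_itemsets(baskets, min_support=2)
rule_conf = confidence(frequent, {"diapers"}, {"beer"})
print(sorted((sorted(s), n) for s, n in frequent.items()))
print("confidence(diapers -> beer) =", rule_conf)
```

With a threshold of 2, the pair {beer, bread} appears in only one basket, so the prune step discards the three-item candidate {diapers, beer, bread} without counting it. That pruning is the idea that distinguishes Apriori from brute-force enumeration.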