A Data Mining Course for Computer Science and Non-Computer Science Students
Jamil Saquer
Computer Science Department
Missouri State University
Springfield, MO
• Introduction
• Design of the Course
  • Topics Covered
  • Examination Format
• Conclusion
• What is data mining (DM)?
  • The non-trivial process of identifying valid, novel, useful, and ultimately understandable patterns in large volumes of data
  • DM is an interdisciplinary topic
  • Has many things in common with machine learning and pattern recognition
Motivation for the Course
• Introducing more electives
• Introducing graduate-level CS courses
• Informatics Program
• Interest to faculty members and students from other departments
• Author's main area of research
Challenges in Designing the Course
• Diverse student population
  • CS vs. non-CS
  • Undergrad vs. grad
• Solution
  • Informatics program in design stages
  • MNAS CS option is new
    • Therefore, emphasis on undergrad CS students
  • Accommodating other students
  • Minimize prerequisites
  • CS 2 (or even CS 1)
  • Capable of using DM software
  • Scientific background/mentality
    • One from business, another from GGP
  • For grad CS students:
    • Project requires more research
    • Tests could be a little different
  • Emphasize understanding basic DM concepts and using software for mining data
Design of the Course
• Used book by Dunham
  • Book divided into 3 parts
• About 1 week spent on definitions, applications, motivations, challenges, …
• Core of the course spent on core DM subjects: classification, clustering, mining association rules
• Last week for project presentations
Classification
• Assigning objects to classes
  • Supervised learning
• Example: classify a military vehicle as a friendly or an enemy vehicle
• Methods covered include: decision trees, Naïve Bayesian, k-nearest neighbor,
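As an illustration of one of the methods listed above, here is a minimal sketch of k-nearest neighbor classification in plain Python. It is not taken from the course materials: the function name `knn_classify`, the two numeric features, and the toy training points are all hypothetical and chosen only to echo the friendly/enemy example.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Sort the labeled examples by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Count the labels of the k closest examples and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two made-up numeric features per vehicle.
train = [
    ((1.0, 1.0), "friendly"),
    ((1.2, 0.8), "friendly"),
    ((0.9, 1.1), "friendly"),
    ((5.0, 5.0), "enemy"),
    ((5.2, 4.8), "enemy"),
    ((4.9, 5.1), "enemy"),
]

print(knn_classify(train, (1.1, 1.0)))  # friendly
print(knn_classify(train, (5.1, 5.0)))  # enemy
```

Because the labels of the training points are known in advance, this is supervised learning: the query is assigned to one of the existing classes.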
Clustering
• Grouping objects into clusters (classes not known in advance)
  • Unsupervised learning
• Example: cluster Weblog data to discover groups of similar access patterns
• Techniques covered include: link algorithms, nearest neighbor, k-means,
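To make the k-means technique listed above concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the course's implementation: the function name `kmeans`, the choice of the first k points as initial centroids, the fixed iteration count, and the toy two-feature "session" data are all hypothetical.

```python
import math


def kmeans(points, k, iterations=10):
    """Return (centroids, clusters) after a fixed number of k-means passes."""
    # Assumption: take the first k points as initial centroids (deterministic).
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters


# Hypothetical toy data: two made-up numeric features per Web session.
sessions = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = kmeans(sessions, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

No labels are supplied here; the two groups emerge from the distances alone, which is what makes the method unsupervised.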
Association Rules
• Finding patterns that occur together
• Example: diapers and beer are usually bought together
• Techniques covered: Apriori, sampling, partitioning, FP-growth
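The Apriori technique listed above can be sketched in a few lines of plain Python. This is a simplified illustration, not the course's code: the function name `apriori`, the use of an absolute support count as the threshold, and the toy baskets are all hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for itemsets in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: the candidates are all single items.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: combine frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if every (k-1)-subset is frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical toy baskets echoing the diapers-and-beer example.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"diapers", "beer", "bread"},
]
frequent = apriori(baskets, min_support=3)
pair = frozenset({"diapers", "beer"})
# Confidence of the rule diapers -> beer: support(pair) / support(diapers).
confidence = frequent[pair] / frequent[frozenset({"diapers"})]
print(frequent[pair], confidence)  # 3 0.75
```

The prune step is what gives Apriori its name: an itemset can only be frequent if all of its subsets are frequent, so most candidates are discarded before any counting is done.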
• Students need to learn how to mine data
• One assignment on each core DM topic
  • Apply two different algorithms to at least two data sets, one of which has to be relatively large
  • Can use any DM package (e.g., Weka)
• Students write a report
• Students learn how to run an experiment
Term Project
• Group
• Either provide a non-trivial implementation of a DM algorithm
• Or, learn about a DM topic not discussed in class
• Graduate students required to read at least three research papers and to write a
• All students present their project in class
Examination Format
• Open
• Two types of questions
  • First type: requires basic knowledge of the definitions (T/F, short answers)
  • Second type: apply certain algorithms to small data sets
Conclusion
• DM is an interesting course for CS and non-CS students
• DM can be taught to non-CS students
• A DM course can be taught to students with minimal CS background