A Data Mining Course for Computer Science and Non-Computer Science Students
Jamil Saquer
Computer Science Department
Missouri State University
Springfield, MO
Outline
• Introduction
  • Motivation
  • Challenges
• Design of the Course
  • Topics Covered
  • Assignments
  • Examination Format
• Conclusion
Introduction
• What is data mining (DM)?
  • The non-trivial process of identifying valid, novel, useful, and ultimately understandable patterns in large volumes of data
  • DM is an interdisciplinary topic
  • Has many things in common with machine learning and pattern recognition
Motivation for the Course
• Introducing more electives
• Introducing graduate-level CS courses
• Informatics Program
• Of interest to faculty members and students from other departments
• Author's main area of research
Challenges in Designing the Course
• Diverse student population
  • CS vs. non-CS
  • Undergrad vs. grad
• Solution
  • Informatics program in design stages
  • MNAS CS option is new
    • Therefore, emphasis on undergrad CS students
Accommodating other students
• Minimize prerequisites
  • CS 2 (or even CS 1)
  • Able to use DM software
  • Scientific background/mentality
    • One from business, another from GGP
• For grad CS students:
  • Project requires more research
  • Tests could be a little different
• Emphasize understanding basic DM concepts and using software for mining data
Design of the Course
• Used the book by Dunham
  • Book divided into 3 parts
• About 1 week spent on definitions, applications, motivations, challenges, …
• Core of the course spent on core DM subjects: classification, clustering, mining association rules
• Last week for project presentations
Classification
• Assigning objects to classes
  • Supervised learning
• Example: classify a military vehicle as a friendly or an enemy vehicle
• Methods covered include: decision trees, naïve Bayesian, k-nearest neighbor, backpropagation
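
As an illustration of one of the methods listed above, here is a minimal sketch of k-nearest neighbor classification in plain Python. The two numeric features and all data values are hypothetical and made up for this example; the function name `knn_classify` is likewise an assumption, not something taken from the course materials.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    train: list of (feature_tuple, label) pairs.
    """
    # Sort the training points by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training data: two made-up numeric features per vehicle.
training = [
    ((1.0, 1.0), "friendly"),
    ((1.2, 0.8), "friendly"),
    ((0.9, 1.1), "friendly"),
    ((5.0, 5.0), "enemy"),
    ((5.2, 4.8), "enemy"),
    ((4.9, 5.1), "enemy"),
]

prediction = knn_classify(training, (1.1, 0.9), k=3)
print(prediction)
```

Because the labels of the training points are known in advance, this is supervised learning: the classifier only generalizes from labeled examples.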
Clustering
• Grouping objects into different clusters
  • Unsupervised learning
• Example: cluster Web log data to discover groups of similar access patterns
• Techniques covered include: link algorithms, nearest neighbor, k-means, PAM, BIRCH, DBSCAN, CURE, ROCK
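
To make the idea concrete, the following is a minimal sketch of k-means, one of the techniques listed above. It assumes a naive initialization (the first k points serve as starting centroids) and uses made-up two-dimensional points; a real DM package would typically use a better initialization and real data.

```python
import math

def k_means(points, k, iterations=100):
    """Cluster `points` (tuples of floats) into k groups.

    Returns (centroids, clusters). Assumption: the first k points are
    used as the initial centroids, which keeps the sketch deterministic.
    """
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean = tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two well-separated groups of points.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

No labels are supplied here; the groups emerge from the distances alone, which is what makes clustering unsupervised.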
Association Rules
• Finding patterns that occur together
• Example: diapers and beer are usually bought together
• Techniques covered: Apriori, sampling, partitioning, FP-growth
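
The sketch below shows the core of the Apriori idea in plain Python: count candidate itemsets level by level and extend only those that meet a minimum support. The baskets are made up for illustration, and `min_support` is expressed here as an absolute transaction count, which is an assumption of this sketch rather than a convention from the course.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every individual item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Hypothetical market baskets.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "beer", "bread"},
    {"milk", "bread"},
    {"diapers", "milk"},
]
frequent = apriori(baskets, min_support=3)

# Confidence of the rule beer -> diapers: support(both) / support(beer).
pair = frozenset({"diapers", "beer"})
confidence = frequent[pair] / frequent[frozenset({"beer"})]
print(frequent)
print(confidence)
```

The prune step is what keeps the search tractable: an itemset cannot be frequent unless every one of its subsets is frequent.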
Assignments
• Students need to learn how to mine data
• One assignment on each core DM topic
  • Apply two different algorithms to at least two data sets, one of which has to be relatively large
  • Can use any DM package (e.g., Weka)
• Students write a report
• Students learn how to run an experiment
Term Project
• Group projects
• Either provide a non-trivial implementation of a DM algorithm
• Or, learn about a DM topic not discussed in class
• Graduate students required to read at least three research papers and to write a report
• All students present their project in class
Examination Format
• Open book
• Two types of questions
• First type requires basic knowledge of the material
  • Definitions, T/F, short answers
• Second type: apply certain algorithms to small data sets
Conclusion
• DM is an interesting course for CS and non-CS students
• DM can be taught to non-CS students
• A DM course can be taught to students with minimal CS background
Questions