Title: Topics on Data Mining Instructor: Abdullah Mueen Time: MWF

CS 591.003
Title: Topics on Data Mining
Instructor: Abdullah Mueen
Time: MWF 10:00 - 10:50 AM
Room: Science Math Learning Center 352
Office Hours: Wed & Thu, 11:00AM-1:00PM
Description: This course covers a range of topics in data mining. Introductory
topics: clustering, classification, outlier detection and association-rule discovery.
Advanced topics: technologies for data mining (Data-Cube, MapReduce), algorithms
for mining rich data types (time series, graph, trajectory) and applications of mining
algorithms (search result ranking, recommender systems). The course will have
lectures on the introductory topics and assigned reading on the advanced topics.
What you will learn:
- Basic data mining algorithms and their applications.
- Hands-on experience in cleaning, managing and processing large data.
- Some advanced data mining applications in specific domains and the
  challenges that need to be solved.
- How to write papers for data mining workshops and conferences when you
  have good results.
Book: Data Mining: Concepts and Techniques, 3rd ed. By Jiawei Han, Micheline
Kamber and Jian Pei
Grading: Grading will be based on the project (60%) and the presentation of a
paper (40%), with heavy emphasis on the project. There will be no exam.
Lecture Schedule:
Week 1: Classification: Chapter 8
Week 2: Classification: Chapter 8+9
Week 3: Frequent Pattern Mining: Chapter 6 (Labor Day)
Week 4: Clustering: Chapter 10
Week 5: Clustering: Chapter 10+11
Week 6: Outlier Detection: Chapter 12
Week 7: Time Series Mining: Slides
Week 8: Data Mining Tools: Matlab, Weka, VW (Fall Break)
Week 9: Mining other data types: Slides
Week 10: Paper Presentation
Week 11: Paper Presentation
Week 12: Paper Presentation
Week 13: Paper Presentation
Week 14-16: Project (Thanksgiving)
Papers:
The complete list of papers is on the course page. If you want to present a paper of
your own choice, feel free to send it to me for approval before the 5th.
Presentation:
Each student presents one paper selected from the pool. The presentation will last
30 minutes and the remaining time will be for discussion. Every student should pick a
paper and a day by the 5th week. The schedule will be maintained on the course page.
Project:
Each group will do one project. A group can have at most two students. Of course,
a two-student group is expected to do twice as much work as one student would do alone.
I prefer individual projects.
A project consists of three phases.
1. Proposal (20%): Pick a dataset from below. Define a problem/pattern/structure
you want to solve/find/utilize in the data. Discuss the expected results if you
succeed. If you don't want to define your own, you can propose to reproduce
the original paper of the respective dataset, which would be much harder.
2. Implementation (20%): Solve/find/utilize the problem/pattern/structure in
the data automatically. You can use any language and platform.
3. Report (20%): Write up the method you applied and discuss the
findings/results.