CS 591.003
Title: Topics on Data Mining
Instructor: Abdullah Mueen
Time: MWF 10:00 - 10:50 AM
Room: Science Math Learning Center 352
Office Hours: Wed & Thu, 11:00 AM - 1:00 PM

Description: This course covers a range of topics in data mining. Introductory topics: clustering, classification, outlier detection and association-rule discovery. Advanced topics: technologies for data mining (Data-Cube, MapReduce), algorithms for mining rich data types (time series, graph, trajectory) and applications of mining algorithms (search result ranking, recommender systems). The course will have lectures on the introductory topics and assigned reading on the advanced topics.

What you will learn:
  Basic data mining algorithms and their applications.
  Hands-on experience in cleaning, managing and processing large data.
  Some advanced data mining applications in specific domains and the challenges that need to be solved.
  How to write papers for data mining workshops and conferences when you have good results.

Book: Data Mining: Concepts and Techniques, 3rd ed., by Jiawei Han, Micheline Kamber and Jian Pei

Grading: Grading is based on the project (60%) and the paper presentation (40%), with heavy emphasis on the project. There will be no exam.

Lecture Schedule:
  Week 1:      Classification: Chapter 8
  Week 2:      Classification: Chapter 8+9
  Week 3:      Frequent Pattern Mining: Chapter 6, Labor Day
  Week 4:      Clustering: Chapter 10
  Week 5:      Clustering: Chapter 10+11
  Week 6:      Outlier Detection: Chapter 12
  Week 7:      Time Series Mining: Slides
  Week 8:      Data Mining Tools: Matlab, Weka, VW, Fall Break
  Week 9:      Mining other data types: Slides
  Week 10:     Paper Presentation
  Week 11:     Paper Presentation
  Week 12:     Paper Presentation
  Week 13:     Paper Presentation
  Week 14-16:  Project, Thanksgiving

Papers: The complete list of papers is on the course page. If you want to present a paper of your own choice, feel free to send it to me for approval before the 5th week.

Presentation: Each student presents one paper selected from the pool.
The presentation will run for 30 minutes, and the remaining class time will be used for discussion. Every student should pick a paper and a day by the 5th week. The schedule will be maintained on the course page.

Project: Each group will do one project. A group can have at most two students. The expected work for a two-student group is twice that of an individual project. I prefer individual projects. A project consists of three phases.
1. Proposal (20%): Pick a dataset from the list below. Define a problem/pattern/structure you want to solve/find/utilize in the data. Discuss the expected results if you succeed. If you do not want to define your own problem, you can propose to reproduce the original paper for the respective dataset, which would be much harder.
2. Implementation (20%): Solve/find/utilize the problem/pattern/structure in the data automatically. You can use any language and platform.
3. Report (20%): Write up the method you applied and discuss the findings/results.