CS 7301 Data Mining (Graduate Level)
Spring 2006

People:
Instructor: Dr. Latifur Khan
Office: ECSS (ES) 3.228
Phone: (972) 883 4137
E-mail: lkhan@utdallas.edu
Office Hours: Friday 3:30-4:30 p.m. or by appointment.
URL: http://www.utdallas.edu/~lkhan/Spring2006/cs7301.htm
Class Time & Location: Friday, 9:30 a.m.-12:15 p.m., ECSS 2.311
CS 7301 002 10635 RECENT ADVANCES IN COMPUTING: DATA MINING
Teaching Assistant (TA): Lei Wang (leiwang@utdallas.edu), Database Laboratory at UTD (DBL@UTD)

Course Summary
This course covers the essential concepts, principles, techniques, and mechanisms for the design, analysis, use, and implementation of computerized data mining. Key knowledge discovery and data mining concepts and techniques are examined: Mining Association Rules in Large Databases, Classification and Prediction, Clustering Analysis, and Mining Complex Types of Data. The data mining systems examined in this course represent the state of the art, including traditional approaches as well as recent research developments. By providing a balanced view of "theory" and "practice," the course should allow the student to understand, use, and build practical data warehouses and data mining techniques in various domains.

The course is intended to provide a basic understanding of the issues and problems involved in data mining systems, a knowledge of currently practical techniques for satisfying the needs of such a system, and an indication of the current research approaches that are likely to provide a basis for tomorrow's solutions.

Required Materials
The following textbooks will be used this semester to augment the material presented in lectures:
B1. Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, The Morgan Kaufmann Series in Data Management Systems, Jim Gray, Series Editor, Morgan Kaufmann Publishers, August 2000. 550 pages. ISBN 1-55860-489-8
B2. Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Introduction to Data Mining, Addison-Wesley, April 2005.
Optional Material:
B3. Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification, Wiley-Interscience, ISBN: 0-471-05669-3

Grading
Homework I
Homework II
Homework III
Homework IV
Homework V 7%
Project Presentation and Report Submission 10%
Exam: 35%

Requirements
Your course grade will be based 35% on the exam and 65% on the assignments. Please note that you must take the exam and do all the assignments to pass the course. The exam will be closed book.

The first homework will be related to the Weka system (a data mining tool). The second, third, and fourth homework assignments will cover the analysis of association rules, clustering, and classification, respectively, using the Weka system on a given data set. The fifth homework will study various data reduction techniques using MATLAB.

The project will be a group programming assignment. Each group will implement a particular data mining application (e.g., intrusion detection, image annotation, etc.). In April or at the end of the semester, each group will give a 1-hour presentation and submit a short report on its project.

Violations of academic honesty and integrity in this course will not be tolerated. The instructor will deal strictly with any violations. The "Academic Integrity Policy" provides details.

Grades will be changed only when a grading error has been made; negotiation is not appropriate. If you think an error has been made, you should submit a written statement. You must submit an item for regrading within 10 days from when grading of that item is completed. If you want a homework assignment regraded, you must consult the TA.

Students are encouraged to discuss class topics among themselves. However, collaboration during the implementation of programming assignments, homework, and tests is strictly forbidden. Please be aware that your programs, homework, and tests will be AUTOMATICALLY compared with each other during the evaluation.
Lectures

Topic                                               Chapters/Papers                     Handout + Homework/Lecture Notes
Introductory Concepts                               Chapter 1 [B1]; Chapter 1, 2 [B2]   Lecture Note; Weka System
Mining Association Rules in Large Database          Chapter 6 [B1]; Chapter 6 [B2]      Association Rules
Association Analysis: Advanced Concepts             Chapter 7 [B2]
Clustering Analysis                                 Chapter 8 [B1]; Chapter 8 [B2]      Lecture Note
Cluster Analysis: Additional Issues and Algorithms  Chapter 9 [B2]
Classification and Prediction                       Chapter 7 [B1]                      Lecture Note
Data Reduction Techniques                           Chapter 3.8 [B3]
Mining Complex Data
Anomaly Detection                                   Chapter 10 [B2]
Presentation                                        April 7/14
Exam                                                April 21

Handout: How to Read a Technical Paper Related to Data Mining: Lecture Note

Software and DataSet:
1. Weka System: http://www.cs.waikato.ac.nz/ml/weka/
2. UCI Dataset: http://www.ics.uci.edu/~mlearn/MLRepository.html
3. Some additional documents related to WEKA: http://maya.cs.depaul.edu/~classes/ect584/WEKA/index.html