CS 6V81 Data Mining (Graduate Level)

CS 7301 Data Mining (Graduate Level)
Spring 2006
People:
Instructor: Dr. Latifur Khan
Office: ECSS (ES) 3.228
Phone: (972) 883 4137
E-mail: lkhan@utdallas.edu
Office Hours: Friday 3:30 – 4:30 p.m. or by appointment.
URL: http://www.utdallas.edu/~lkhan/Spring2006/cs7301.htm
Class Time & Location:
Friday: 9:30 a.m. – 12:15 p.m.
ECSS 2.311
CS 7301 002 10635 RECENT ADVANCES IN COMPUTING: DATA MINING
Teaching Assistants (TA):
Lei Wang (leiwang@utdallas.edu)
Database Laboratory at UTD (DBL@UTD).
Course Summary
This course covers the essential concepts, principles, techniques, and mechanisms for the
design, analysis, use, and implementation of computerized data mining. Key knowledge
discovery and data mining concepts and techniques are examined: Mining Association
Rules in Large Databases, Classification and Prediction, Clustering Analysis, and Mining
Complex Types of Data. The data mining systems examined in this course represent the
state of the art, including traditional approaches as well as recent research developments.
By providing a balanced view of "theory" and "practice," the course should allow the
student to understand, use, and build practical data warehouses and data mining
techniques in various domains. The course is intended to provide a basic understanding of
the issues and problems involved in data mining systems, a knowledge of currently
practical techniques for satisfying the needs of such a system, and an indication of the
current research approaches that are likely to provide a basis for tomorrow's solutions.
Required Materials
The following textbooks will be used this semester to augment the material
presented in lectures:
B1. Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques,
The Morgan Kaufmann Series in Data Management Systems, Jim Gray, Series
Editor. Morgan Kaufmann Publishers, August 2000. 550 pages. ISBN 1-55860-489-8
B2. Pang-Ning Tan, Michael Steinbach, and Vipin
Kumar, Introduction to Data Mining, Addison-Wesley, April 2005.
Optional Material:
B3. Richard O. Duda, Peter E. Hart, and David G. Stork,
Pattern Classification, Wiley-Interscience, ISBN 0-471-05669-3
Grading
Homework I
Homework II
Homework III
Homework IV
Homework V 7%
Project
Presentation and Report Submission 10%
Exam: 35%
Requirements
Your course grade will be based 35% on the exam and 65% on the assignments. Please
note that you must take the exam and complete all the assignments to pass the course. The
exam will be closed book. The first homework will be related to the Weka system (a data
mining tool). The second, third, and fourth homeworks will be related to the analysis of
association rules, clustering, and classification, respectively, using the Weka system on a
given data set. The fifth homework will study various data reduction techniques using
MATLAB. The project will be a group programming assignment. Each group will
implement a particular data mining application (e.g., intrusion detection or image
annotation). In April or at the end of the semester, each group will give a one-hour
presentation and submit a short report on its project. Violations of academic
honesty and integrity in this course will not be tolerated. The instructor will deal strictly
with any violations. The "Academic Integrity Policy" provides details.
Grades will be changed only when a grading error has been made; negotiation is not
appropriate. If you think an error has been made, you should submit a written
statement. You must submit an item for regrading within 10 days from when grading
of that item is completed.
Students are encouraged to discuss class topics among themselves. However,
collaboration during the implementation of programming assignments, homeworks,
and tests is strictly forbidden. Please be aware that your programs, homeworks, and tests
will be AUTOMATICALLY compared with each other during the evaluation.
If you want a homework regraded, you must consult the TA.
Lectures
Topic                                               Chapters/Papers                    Homework/Lecture Notes
Introductory Concepts                               Chapter 1 [B1]; Chapter 1, 2 [B2]  Lecture Note; Weka System
Mining Association Rules in Large Databases         Chapter 6 [B1]; Chapter 6 [B2]     Association Rules
Association Analysis: Advanced Concepts             Chapter 7 [B2]
Clustering Analysis                                 Chapter 8 [B1]; Chapter 8 [B2]     Lecture Note
Cluster Analysis: Additional Issues and Algorithms  Chapter 9 [B2]
Classification and Prediction                       Chapter 7 [B1]                     Lecture Note
Data Reduction Techniques                           Chapter 3.8 [B3]
Mining Complex Data
Anomaly Detection                                   Chapter 10 [B2]
Presentation (April 7/14)
Exam (April 21)
Handout + Download
How to Read a Technical Paper Related to Data Mining: Lecture Note
Software and DataSet:
1. Weka System: http://www.cs.waikato.ac.nz/ml/weka/
2. UCI Dataset: http://www.ics.uci.edu/~mlearn/MLRepository.html
3. Some additional documents related to WEKA:
http://maya.cs.depaul.edu/~classes/ect584/WEKA/index.html