CS729: Advanced Data Mining - Jordan University of Science and Technology

Jordan University of Science and Technology
Faculty of Computer & Information Technology
Department of Computer Information Systems
Year: 2007/2008
Semester: 1
Course Information
Course Title: Advanced methods in Data Mining
Course Number: CS 729
Instructor: Dr. Hassan Najadat
Office Location: Ph4 L0
Office Phone: 7201000 Ext. 23405
Office Hours: Sunday, Tuesday, Thursday 10:15-11:15 or by appointment
Email: najadat@just.edu.jo
Catalog Description
Advanced techniques in data mining. Topics may include: Association Rule Mining, Classification,
Clustering, Text Mining, Knowledge Extraction, Web Information Retrieval, Mediators, Wrappers and
Data Warehousing, Web Mining and Crawling, Decision Trees, Statistical Methods, Pattern Recognition,
and Machine Learning Techniques.
Text Book
Title: Data Mining: Concepts and Techniques
Authors: Han, J. and Kamber, M.
Publisher: Morgan Kaufmann
Year: 2006
Edition: 2nd
Book Website: http://www.cs.sfu.ca/~han/dmbook

References
- Hand, D., Mannila, H. and Smyth, P. Principles of Data Mining. The MIT Press, 2001.
- Giudici, P. Applied Data Mining: Statistical Methods for Business and Industry. Wiley, 2003.
- Witten, I. H. and Frank, E. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2005.
Assessment Policy
Assessment Type | Expected Due Date | Weight
Midterm | TBA | 25%
Term Paper | TBA | 15%
Readings and Assignments | TBA | 10%
Final Exam | TBA | 50%
Course Objectives
The main objective of this course is to provide students with the theoretical background and practical
experience necessary to apply data mining techniques to real-world problems. Specifically, the course
aims to:
1. Define data mining (15%).
2. Differentiate between data mining and databases (10%).
3. Introduce data mining models and tasks (10%).
4. Study data mining constraints (10%).
5. Study classification (15%).
6. Study clustering (15%).
7. Study association rules (15%).
8. Study data mining applications (10%).
Teaching & Learning Methods

Class lectures, lecture notes, homework assignments, and projects are designed to achieve the course
objectives.
You should read the assigned chapters before class, complete assignments on time, and participate in class
discussions to understand the material. Ask questions, whether in class or during office hours.
You are responsible for all material covered in class.
If you have any concerns, please communicate them to the instructor in class, during office hours, or by email.



Course Content
Week | Topics | Chapter | Objectives
1 | Introduction and Related Concepts | 1 | 1, 2, 4
2, 3 | Data Preprocessing | 2 | 3
4, 5 | Mining Frequent Patterns, Associations, and Correlations | 5 | 7
6, 7 | Classification and Prediction | 6 | 5
8 | Cluster Analysis I | 7 | 6
9 | National Holiday, 14-11-07 (see the academic calendar on the JUST website) | - | -
10 | Midterm | - | -
11 | Cluster Analysis II | 7 | 6
12, 15 | Mining Complex Types of Data (Spatial Databases, Multimedia Databases, Time-Series and Sequence Data, Text Mining, Web Mining) | Different chapters | 8
16 | Paper Presentations | - | -
Learning Outcomes
Upon successful completion of this course, students will have the skills to apply data mining techniques to
real-world problems. In particular, students should be able to:
- Identify the basic tasks associated with data mining, such as classification and clustering.
- Implement data mining tasks.
- Choose a suitable data mining task for the problem at hand.
- Interpret the results produced by data mining.
- Explain web mining and how it differs from general data mining.
- Understand the characteristics of spatial and temporal data mining.
Additional Notes

Assignments
- Assignments are due at the beginning of class.
- Late assignments will not be accepted.
- All work must be done independently.
- Students who hand in similar homework will receive a grade of 0 (ZERO) and face possible disciplinary action.

Readings
- Everyone must present at least one paper, with high-quality PPT/PDF slides.
- Everyone must bring a one-page summary of the papers to be presented in class and hand it in right after
  the class presentations.
- You are strongly encouraged to select excellent-quality papers published or appearing in 2005, 2006, or 2007.
- Discuss your choice with me before you finalize your paper selection.
- Recommended conference proceedings: SIGKDD, SIGIR, ICML, SIGMOD, ICDM, SDM, etc.
- Recommended journals: DMKD (Data Mining and Knowledge Discovery), SIGKDD Explorations, Machine Learning,
  Journal of Machine Learning Research, Knowledge and Information Systems (KAIS), IEEE TKDE, etc.

Term Paper
Everyone will conduct a research project during the course. The project can be a comparative study of existing
data mining algorithms for a specific application, the development of new data mining algorithms that improve
on existing methods to some extent, or a novel application of existing methods to practical problems.
The final stage of the project is a high-quality data mining research paper, publishable in a substantial venue
(e.g., a journal such as IEEE TKDE), to be submitted by each student.

Drop Date
The last day to drop the course is 6-12-2007.

Makeup Exams
In accordance with university regulations, students must bring a valid excuse authenticated through the
proper channels at JUST.

Attendance
Students are expected to attend all classes. If a student misses 10% of the classes without an acceptable
reason, the student will be assigned a grade of 35, according to the rules of JUST.

Code of Conduct
Assignments, quizzes, and exams must be done individually. Copying another student's work or code, even if
changes are subsequently made, is inappropriate, and such work or code will not be accepted. The University
has very clear guidelines on academic misconduct, and they will be enforced in this class.