Jordan University of Science and Technology
Faculty of Computer & Information Technology
Department of Computer Information Systems
Year: 2007/2008    Semester: 1

Course Information
Course Title: Advanced Methods in Data Mining
Course Number: CS 729
Instructor: Dr. Hassan Najadat
Office Location: Ph4 L0
Office Phone: 7201000 Ext. 23405
Office Hours: Sunday, Tuesday, Thursday 10:15-11:15 or by appointment
Email: najadat@just.edu.jo

Catalog Description
Advanced techniques in data mining. Topics may include: Association Rule Mining, Classification, Clustering, Text Mining, Knowledge Extraction, Web Information Retrieval, Mediators, Wrappers and Data Warehousing, Web Mining and Crawling, Decision Trees, Statistical Methods, Pattern Recognition, and Machine Learning Techniques.

Text Book
Title: Data Mining: Concepts and Techniques
Author: Han, J. and Kamber, M.
Publisher: Morgan Kaufmann
Year: 2006
Edition: 2
Book Website: http://www.cs.sfu.ca/~han/dmbook

References
Principles of Data Mining. David Hand, Heikki Mannila and Padhraic Smyth. The MIT Press, 2001.
Applied Data Mining: Statistical Methods for Business and Industry. Paolo Giudici. Wiley, 2003.
Data Mining: Practical Machine Learning Tools and Techniques. Ian H. Witten and Eibe Frank. Morgan Kaufmann, 2005.

Assessment Policy
Assessment Type            Expected Due Date   Weight
Midterm                    TBA                 25%
Term Paper                 TBA                 15%
Readings and Assignments   TBA                 10%
Final Exam                 TBA                 50%

Course Objectives
The main objective of this course is to provide students with the theoretical background and practical experience necessary to apply data mining techniques to real-world problems. The course objectives are:
1. Define data mining. (15%)
2. Differentiate between data mining and databases. (10%)
3. Introduce the data mining models and tasks. (10%)
4. Study data mining constraints. (10%)
5. Study classification. (15%)
6. Study clustering. (15%)
7. Study association rules. (15%)
8. Study data mining applications. (10%)

Teaching & Learning Methods
Class lectures, lecture notes, homework assignments, and projects are designed to achieve the course objectives. To understand the material, you should read the assigned chapters before class, complete assignments on time, and participate in class discussions. You should ask questions, whether in class or during office hours. You are responsible for all material covered in class. If you have any concerns, please communicate them to the instructor in class, in the office, or by email.

Course Content
Week     Topics                                                    Chapter              Objective
1        Introduction and Related Concepts                         1                    1, 2, 4
2, 3     Data Preprocessing                                        2                    3
4, 5     Mining Frequent Patterns, Associations, and Correlations  5                    7
6, 7     Classification and Prediction                             6                    5
8        Cluster Analysis I                                        7                    6
9        National Holiday, 14-11-07
         (see the academic calendar on the JUST website)
10       Midterm
11       Cluster Analysis II                                       7                    6
12, 15   Mining Complex Types of Data                              Different chapters   8
         (Spatial Databases, Multimedia Databases, Time-Series
         and Sequence Data, Text Mining, Web Mining)
16       Paper Presentations

Learning Outcomes
Upon successful completion of this course, students will have the skills to apply data mining techniques to real-world problems. In particular:
Students should know the basic tasks associated with data mining, such as classification and clustering.
Students should be able to implement data mining tasks.
Students should be able to choose a data mining task suitable for the problem at hand.
Students should be able to interpret the results produced by data mining.
Students should know what web mining is and how it differs from data mining.
Students should understand the characteristics of spatial and temporal data mining.

Additional Notes

Assignments
Assignments are due at the beginning of class. Late assignments will not be accepted. All work must be done independently. Students handing in similar homework will receive a grade of 0 (ZERO) and face possible disciplinary action.

Readings
Everyone needs to present at least one paper (with high-quality PPT/PDF slides). Everyone needs to bring a one-page summary of the papers to be presented in class and hand it in right after the class presentations. You are strongly encouraged to select high-quality papers published in 2005, 2006, or 2007. Discuss your choice with me before you finalize your paper selection.
Recommended conference proceedings: SIGKDD, SIGIR, ICML, SIGMOD, ICDM, SDM, etc.
Recommended journals: DMKD (Data Mining and Knowledge Discovery), SIGKDD Explorations, Machine Learning, Journal of Machine Learning Research, Knowledge and Information Systems (KAIS), IEEE TKDE, etc.

Term Paper
Everyone will conduct a research project during the course. The project can be a comparative study of existing data mining algorithms for a specific application, the development of new data mining algorithms that to some extent improve on existing methods, or a novel application of existing methods to practical problems. As the final stage of the project, each student will submit a high-quality data mining research paper (publishable in a substantial venue, e.g., a journal such as IEEE TKDE).

Drop Date
The last day to drop the course is 6-12-2007.

Makeup Exams
Makeup exams are given in accordance with university regulations, i.e., students should bring a valid excuse authenticated through valid channels at JUST.

Attendance
Students are expected to attend all classes. If a student misses 10% of the classes without an acceptable reason, the student will be assigned a grade of 35, according to the rules of JUST.

Code of Conduct
The assignments, quizzes, and exams must be done individually.
Copying of another student's work or code, even if changes are subsequently made, is inappropriate, and such work or code will not be accepted. The University has very clear guidelines for academic misconduct, and they will be enforced in this class.