Course Policy
CSCI 4370/5370 Data Mining
Spring 2014

Catalog description
This course introduces the basic concepts, principles, and state-of-the-art technologies of Data Mining, including an introduction to Data Mining, Data Preprocessing, Data Warehouse, Association Rules, Classification, and Clustering. Specific applications in financial data and Bioinformatics are included.

Prerequisite: CSCI 3360 Database Systems

Course goal: Introduce concepts, principles, technologies, and practice of knowledge discovery and data mining.

Objectives: Upon completion of this course, the student will be able to
   o Master the key concepts and principles employed in Data Mining.
   o Specify the relations between Data Mining, Data Warehouse, and Database Systems.
   o Mine frequent patterns, associations, and correlations from different types of data.
   o Apply classification techniques for rule extraction and prediction.
   o Form meaningful clusters and explain the clustering results.
   o For graduate students (CSCI 5370), apply advanced Data Mining techniques to their research topic and generate quality results.
   o For graduate students (CSCI 5370), demonstrate an ability to work with undergraduate students as a team leader and guide the team to complete the final project.

Textbook
Data Mining: Concepts and Techniques, Jiawei Han and Micheline Kamber, 2nd ed., Morgan Kaufmann, 2005

Reference
   o Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Prentice Hall, 2006
   o Data Mining: Practical Machine Learning Tools and Techniques, Ian H. Witten and Eibe Frank, Morgan Kaufmann, 2005
   o Other handouts and selected papers

Course Description
This course includes the following major topics:
Introduction: The basic architecture of a data mining system is described, and a brief introduction to the concepts of database systems and data warehouses is provided.
Data Preprocessing: Techniques for preprocessing data prior to mining are described, including methods of data cleaning, data integration, and data transformation.
Data Warehouse: An introduction to data warehouses and OLAP (Online Analytical Processing) is provided. Topics include the concept of data warehouses and multidimensional databases, the construction of data cubes, and the relationship between data warehousing and data mining.
Association Rules: An introduction to this topic is provided, including a classification of association rules, a presentation of the basic Apriori algorithm and its variations, and techniques for mining multi-level association rules, multi-dimensional association rules, and correlation rules.
Classification: A description of methods for data classification and prediction is provided, including decision tree, Bayesian classification, Neural Networks, K-nearest neighbor, genetic algorithm, and fuzzy set approaches.
Clustering: A description of methods of cluster analysis is given. This topic first introduces the concept of data clustering and then presents several major data clustering approaches.

Course Grade

Undergraduate (CSCI 4370)
Midterm exam                                   20%
Final exam                                     20%
Homework Assignment and Semester project       55%
   o Research proposal --- 5%
   o Research proposal presentation --- 5%
   o Research final presentation --- 10%
   o Final Research paper --- 35%
Class Participation                             5%

Graduate (CSCI 5370)
Midterm exam                                   15%
Final exam                                     15%
Homework Assignment and Semester project       55%
   o Research proposal --- 5%
   o Research proposal presentation --- 5%
   o Research final presentation --- 10%
   o Final Research paper --- 35%
Related Research paper presentation            10%
Class Participation                             5%

Students are expected to read and present several papers on various topics involving current techniques. The semester project involves researching and writing a 6-8 page IEEE-format paper for each team.
Graduate students are responsible for leading and guiding the team to complete the final project. The objective is to get acquainted with reading scientific papers in the area of Data Mining, to practice scientific writing, and to do state-of-the-art research on one particular topic. The research papers initiated in the class are expected to be good enough for submission to professional Data Mining conferences after appropriate further modification.

Your numeric score will be translated to a letter grade at the end of the semester according to the table below.

Numeric Score     Letter Grade
90 – 100          A
80 – 89           B
70 – 79           C
60 – 69           D
0 – 59            F

Attendance and Drop Policy:
Attendance at every class is mandatory. Class attendance contributes 5% of your final grade. The last day to withdraw from this course with a W grade is 3/21 (double-check this date yourself).

Statement on Academic dishonesty/plagiarism:
Academic misconduct is defined in the Academic Policies section of your Student Handbook. Students who engage in such misconduct will be penalized. You are encouraged to familiarize yourself with all policies listed in the Student Handbook.

The University of Central Arkansas affirms its commitment to academic integrity and expects all members of the university community to accept shared responsibility for maintaining academic integrity. Students in this course are subject to the provisions of the university's Academic Integrity Policy, approved by the Board of Trustees as Board Policy No. 709 on February 10, 2010, and published in the Student Handbook. Penalties for academic misconduct in this course may include a failing grade on an assignment, a failing grade in the course, or any other course-related sanction the instructor determines to be appropriate. Continued enrollment in this course affirms a student's acceptance of this university policy.

The University of Central Arkansas adheres to the requirements of the Americans with Disabilities Act.
If you need an accommodation under this Act due to a disability, please contact the UCA Office of Disability Services, 450-3613.

Dr. Bernard Chen, Ph.D.
Assistant Professor
Computer Science Department
University of Central Arkansas

CSCI 4370/5370 Data Mining (Spring 2014)
Class Policy

Instructor: Dr. Bernard Chen
Office: MCST 304
Email: bchen@uca.edu (subject MUST include CSCI2320)
Class Schedule: 4:05 pm - 5:20 pm T/TH @MCST 339
Office Hours:
   Tuesday, Thursday: 12:05 ~ 4:05 pm or by appointment
   Wednesday: 11:00 am ~ 1:00 pm or by appointment

Extra instruction is available and encouraged when your own attempts to understand the subject matter are unsuccessful. Come prepared with specific questions or areas to be discussed.

Attendance and Drop Policy
1. Attendance is mandatory. Attendance will be taken in the form of a short answer related to the class. If you are absent on a day when homework, lab assignments, or programming projects are due, you will automatically forfeit any points assigned; the course assignment late-policy shall not apply. In addition, missed in-class daily work, quizzes, and exams cannot be made up. If you do not attend class, you automatically forfeit any points given that day.
   Only Exceptions:
   a. School related functions such as band, orchestra, sports events, etc. A note from the coach, instructor, supervisor, etc. must be provided. Any homework, lab assignments, or programming projects due during the planned absence must be turned in to the instructor prior to the missed class, unless prior approval is obtained from your instructor (via written request) to submit the work after you return. Any missed exams must be made up by the first class day following the return from such an excused event.
   b. Medically related absence. For all medically related absences, proper documentation from a physician, including the physician’s name and phone number, must be provided.
   c. Family related emergency.
      Such emergencies must involve an immediate family member (father, mother, brother, sister) or other member identified in advance to the professor.
2. It is the student’s responsibility to find out any information they missed due to an absence.
3. Students are required to attend all classes during the semester. If a student misses more than 3 classes, each class missed results in a one-point reduction from the final score for the course (the total reduction is not bounded by the 5% that attendance contributes overall).
4. If a student is absent from the class for two consecutive weeks (4 classes for a T/TH class; 6 classes for an M/W/F class), the student will receive a “W” without notice.
5. All computers and cell phones must be shut down during class unless the instructor asks the students to open their computers. If a computer or cell phone is turned on when it is not necessary, the student will be considered absent for that class.

Homework Policy
1. Homework shall be submitted on the date due. NO LATE ASSIGNMENTS SHALL BE ACCEPTED.
2. Unless specifically stated otherwise, you may collaborate on homework; however, the work submitted must reflect the individual effort of the person presenting the work.
3. If it is necessary for a student to be absent, it is still their responsibility to determine if there are any changes in assignment due dates, schedule changes, etc. and to submit all assignments when due.
4. Save all work on a floppy diskette or USB flash memory device for back-up purposes. (The computers on campus are reloaded periodically and anything you leave on them will be erased.)
5. It is suggested that each student keep a portfolio of his/her graded work in case of a discrepancy in recorded grades.

Exam Policy
1. Missing an exam is a very serious matter. There are only 3 valid reasons for missing an exam (see Attendance and Drop Policy above):
   a. School related functions such as band, orchestra, sports events, etc.
      A note from the coach, instructor, supervisor, etc. must be provided.
   b. An illness which requires a doctor's care (you must provide documentation from your physician for the absence, which includes the physician’s name and phone number).
   c. A documented family emergency such as a death or surgery.
2. Make-up tests will be conducted at the instructor’s discretion.

Classroom / Lab Conduct
1. No food/drink in the classroom or lab.
2. No cell phone use in the classroom or lab (talking, texting, calculating, etc.).
3. No music/pornography in the classroom or lab.
4. Students must be provided with an environment conducive to learning. Disturbance of class by inappropriate talking, laughing, being loud, inappropriate images on your computer screen, etc. shall result in the student’s dismissal from the class.
5. Class and lab time are to be devoted to learning the material outlined in the course policy and syllabus. This time shall not be utilized for checking email, visiting FaceBook or MySpace sites, or engaging in chat or any other non-course related activities. Violation of this policy shall result in the student’s dismissal from the class.

Academic Misconduct
1. The conduct of students in this course is expected to be in compliance with the ethical standards detailed on pages 40-41 of the UCA 2006-2007 Student Handbook in the section entitled “Definition of Academic Misconduct”.
2. Dishonesty in any form, including plagiarism, turning in assignments prepared by others, unauthorized possession of exams, copying assignments from other students’ work/storage media, or allowing other students to copy or view your work, shall result in the student being penalized for the violation; such penalty may result in that student being dismissed from the course and assigned an “F” at the end of the semester. If assignments are copied, both students involved will be penalized equally.

University Policies
It is important that you familiarize yourself with the university policies described in the UCA 2006-2007 Student Handbook.
   a. Computer Use Policy: Refer to the section starting on page 31 of the UCA 2006-2007 Student Handbook.
   b. Sexual Harassment Policy: Refer to the section starting on page 117 of the UCA 2006-2007 Student Handbook.
   c. Academic Policies: Refer to the section starting on page 38 of the UCA 2006-2007 Student Handbook.

Building Emergency Plan statement
An Emergency Procedures Summary (EPS) for the building in which this class is held will be discussed during the first week of this course. EPS documents for most buildings on campus are available at http://uca.edu/mysafety/bep/. Every student should be familiar with emergency procedures for any campus building in which he/she spends time for classes or other purposes.

Disabilities
The University of Central Arkansas adheres to the requirements of the Americans with Disabilities Act. If you need an accommodation under this Act due to a disability, please contact the UCA Office of Disability Services at 450-3135.

Dr. Bernard Chen, Ph.D.
Assistant Professor
Computer Science Department
University of Central Arkansas

CSCI 4370/5370 Data Mining
Spring 2014

Week      Topic
1         Introduction to Data Mining
2         Data Preprocessing
             o Data Cleaning
             o Data Integration and Transformation
             o Data Reduction
3 – 4     Data Warehouse
             o Multidimensional Data Model
             o Data Warehouse Architecture
             o Data Warehouse Implementation
             o From Data Warehouse to Data Mining
5         Semester Project Discussion
6 – 8     Mining Frequent Patterns, Associations, and Correlations
             o Basic Concepts and a Road Map
             o Efficient and Scalable Frequent Itemset Mining Method
             o Mining Various Kinds of Association Rules
             o From Association Mining to Correlation Analysis
             o Constraint-Based Association Mining
9 – 11    Classification and Prediction
             o Decision Tree
             o Bayesian Classification
             o Rule-Based Classification
             o Classification by Back-propagation
             o SVM
             o Associative Classification
             o Lazy Learners
12 – 14   Clustering
             o Cluster Analysis
             o Partition Methods
             o Hierarchical Methods
             o Density-Based Methods
             o Grid-Based Method
             o Model-Based Methods
             o Clustering High-Dimensional Data
15        Semester Project Presentation

NOTE: This syllabus represents a general plan for the course, and deviations from this plan may be necessary during the course.
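
Worked example (illustrative only): the weights listed under Course Grade and the numeric-to-letter table can be combined as in the short Python sketch below. It is not an official grade calculator. The names WEIGHTS, LETTER_CUTOFFS, final_score, and letter_grade are hypothetical. The sketch assumes that each component is scored on a 0-100 scale and that fractional totals are not rounded before the letter lookup (the policy does not say how a score such as 89.5 is handled). Attendance deductions are not modeled.

```python
# Percent weights copied from the Course Grade section.
WEIGHTS = {
    "undergraduate": {
        "midterm": 20,
        "final": 20,
        "project": 55,        # homework assignment and semester project
        "participation": 5,
    },
    "graduate": {
        "midterm": 15,
        "final": 15,
        "project": 55,
        "paper_presentation": 10,  # related research paper presentation
        "participation": 5,
    },
}

# Lower bounds taken from the letter-grade table; anything below 60 is an F.
# Assumption: no rounding is applied before comparing against these bounds.
LETTER_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def final_score(level, component_scores):
    """Weighted average of component scores, each assumed to be on a 0-100 scale."""
    weights = WEIGHTS[level]
    return sum(weights[name] * component_scores[name] for name in weights) / 100.0


def letter_grade(score):
    """Map a numeric score to a letter using the cutoffs above."""
    for cutoff, letter in LETTER_CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    # Made-up component scores for a hypothetical undergraduate.
    scores = {"midterm": 80, "final": 90, "project": 85, "participation": 100}
    total = final_score("undergraduate", scores)
    print(total, letter_grade(total))
```

With the made-up scores above, the weighted total is 0.20*80 + 0.20*90 + 0.55*85 + 0.05*100 = 85.75, which falls in the 80 – 89 band.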