Course Policy
CSCI 4370/5370 Data Mining
Spring 2014
Catalog description
This course introduces the basic concepts, principles, and the state-of-the-art technologies for Data Mining including
Introduction of Data Mining, Data Preprocessing, Data Warehouse, Association Rules, Classification, and Clustering.
Specific applications in financial data and Bioinformatics are included.
Prerequisite: CSCI 3360 Database Systems
Course goal: Introduce concepts, principles, technologies and practice of knowledge discovery and data mining.
Objectives: Upon completion of this course, the student will be able to
• Master the key concepts and principles employed in Data Mining.
• Specify the relations between Data Mining, Data Warehouse, and Database Systems.
• Mine frequent patterns, associations, and correlations from different types of data.
• Apply classification techniques for rule extraction and prediction.
• Form meaningful clusters and explain the clustering results.
• For graduate students (CSCI 5370), apply advanced Data Mining techniques to their research topic and generate
  quality results.
• For graduate students (CSCI 5370), demonstrate an ability to work with undergraduate students as a team leader
  and guide the team to complete the final project.
Textbook
• Data Mining: Concepts and Techniques, Jiawei Han and Micheline Kamber, 2nd ed., Morgan Kaufmann, 2005
Reference
• Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Prentice Hall, 2006
• Data Mining: Practical Machine Learning Tools and Techniques, Ian H. Witten and Eibe Frank, Morgan
  Kaufmann, 2005
• Other handouts and selected papers
Course Description
This course includes the following major topics:
• Introduction: The basic architecture of a data mining system is described, and a brief introduction to
  the concepts of database systems and data warehouses is provided.
• Data Preprocessing: Techniques for preprocessing data prior to mining are described, including
  methods of data cleaning, data integration, and data transformation.
• Data Warehouse: An introduction to data warehouses and OLAP (Online Analytical Processing) is
  provided. Topics include the concept of data warehouses and multidimensional databases, the
  construction of data cubes, and the relationship between data warehousing and data mining.
• Association Rules: An introduction to this topic is provided, including a classification of association
  rules, a presentation of the basic Apriori algorithm and its variations, and techniques for mining
  multi-level association rules, multi-dimensional association rules, and correlation rules.
• Classification: A description of methods for data classification and prediction is provided, including
  decision trees, Bayesian classification, neural networks, k-nearest neighbor, genetic algorithms, and
  fuzzy set approaches.
• Clustering: A description of methods of cluster analysis is given. This topic first introduces the
  concept of data clustering and then presents several major data clustering approaches.
Course Grade
Undergraduate (CSCI 4370)
• Midterm exam                                  20%
• Final exam                                    20%
• Homework Assignment and Semester project      55%
    o Research proposal --- 5%
    o Research proposal presentation --- 5%
    o Research final presentation --- 10%
    o Final Research paper --- 35%
• Class Participation                            5%
Graduate (CSCI 5370)
• Midterm exam                                  15%
• Final exam                                    15%
• Homework Assignment and Semester project      55%
    o Research proposal --- 5%
    o Research proposal presentation --- 5%
    o Research final presentation --- 10%
    o Final Research paper --- 35%
• Related Research paper presentation           10%
• Class Participation                            5%
Students are expected to read and present several papers on various topics involving current techniques. The
semester project involves researching and writing a 6-8 page IEEE-format paper for each team. Graduate
students are responsible for leading and guiding the team to complete the final project. The objective is to get
acquainted with reading scientific papers in the area of Data Mining, to practice scientific writing, and to do
state-of-the-art research on one particular topic. The research papers initiated in this class are expected to be
good enough for submission to professional Data Mining conferences after appropriate further modification.
Your numeric score will be translated to a letter grade at the end of the semester according to the table below.
Numeric Score     Letter Grade
90 – 100          A
80 – 89           B
70 – 79           C
60 – 69           D
0 – 59            F
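To make the grade arithmetic concrete, the following is a small Python sketch that combines the category weights listed under Course Grade with the letter-grade table. It is an illustration only, not an official calculator: the names `WEIGHTS`, `numeric_score`, and `letter_grade` are hypothetical, each category score is assumed to be a percentage from 0 to 100, and fractional totals are assumed to map to the lower band (for example, 89.6 is treated as a B), which the syllabus does not specify.

```python
# Category weights (in percent) copied from the Course Grade section.
WEIGHTS = {
    "undergraduate": {
        "midterm": 20,
        "final": 20,
        "project": 55,        # homework assignment and semester project
        "participation": 5,
    },
    "graduate": {
        "midterm": 15,
        "final": 15,
        "project": 55,        # homework assignment and semester project
        "paper_presentation": 10,
        "participation": 5,
    },
}


def numeric_score(level, scores):
    """Weighted total on a 0-100 scale; scores are percentages per category."""
    weights = WEIGHTS[level]
    return sum(weights[c] * scores[c] for c in weights) / 100


def letter_grade(score):
    """Map a numeric score to a letter using the thresholds in the table.

    Assumption: no rounding is applied, so 89.6 falls in the B band.
    """
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"


# Hypothetical undergraduate scores, for illustration only.
example = {"midterm": 85, "final": 90, "project": 92, "participation": 100}
total = numeric_score("undergraduate", example)   # 90.6
grade = letter_grade(total)                       # "A"
```

The additional attendance deductions described in the Class Policy are not modeled here.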
Attendance and Drop Policy:
Attendance at every class is mandatory. Class attendance contributes 5% of your final grade. The last
day to withdraw from this course with a W grade is 3/21 (double-check the date yourself).
Statement on Academic dishonesty/plagiarism:
Academic misconduct is defined in the Academic Policies section of your Student Handbook.
Students who engage in such misconduct will be penalized. You are encouraged to familiarize yourself with all
policies listed in the Student Handbook.
The University of Central Arkansas affirms its commitment to academic integrity and expects all members of the university community to
accept shared responsibility for maintaining academic integrity. Students in this course are subject to the provisions of the university's
Academic Integrity Policy, approved by the Board of Trustees as Board Policy No. 709 on February 10, 2010, and published in the Student
Handbook. Penalties for academic misconduct in this course may include a failing grade on an assignment, a failing grade in the course, or
any other course-related sanction the instructor determines to be appropriate. Continued enrollment in this course affirms a student's
acceptance of this university policy.
The University of Central Arkansas adheres to the requirements of the Americans with Disabilities Act. If you need an accommodation
under this Act due to a disability, please contact the UCA Office of Disability Services, 450-3613.
Dr. Bernard Chen, Ph.D.
Assistant Professor
Computer Science Department
University of Central Arkansas
CSCI 4370/5370 Data Mining (Spring 2014)
Class Policy
Instructor: Dr. Bernard Chen
Office: MCST 304, Email: bchen@uca.edu (subject MUST include CSCI2320)
Class Schedule: 4:05 pm - 5:20 pm T/TH @MCST 339
Office Hours:
Tuesday, Thursday: 12:05 ~ 4:05pm or by appointment
Wednesday: 11:00 am ~ 1:00pm or by appointment
Extra instruction is available and encouraged when your own attempts to understand the subject matter
are unsuccessful. Come prepared with specific questions or areas to be discussed.
Attendance and Drop Policy
1. Attendance is mandatory. Attendance will be taken in the form of a short answer related to the class.
If you are absent on a day when homework, lab assignments or programming projects are due, you will
automatically forfeit any points assigned; the course assignment late-policy shall not apply. In addition,
missed in-class daily work, quizzes and exams cannot be made up. If you do not attend class, you
automatically forfeit any points given that day. The only exceptions are:
a. School related functions such as band, orchestra, sports events, etc. A note from the coach, instructor,
supervisor, etc. must be provided. Any homework, lab assignments, or programming projects due during
the planned absence must be turned in to the instructor prior to the missed class, unless prior approval is
obtained from your instructor (via written request) to submit the work after you return. Any missed
exams must be made-up by the first class-day following the return from such an excused event.
b. Medically related absence. For all medically related absences, proper documentation from a physician,
including the physician’s name and phone number, must be provided.
c. Family related emergency. Such emergencies must involve an immediate family member (father, mother,
brother, sister) or other member identified in advance to the professor.
2. It is the student’s responsibility to find out any information they missed due to an absence.
3. Students are required to attend all classes during the semester. If a student misses more than 3 classes, each
missed class results in a one-point reduction from the final course score (the total reduction is not capped by
the 5% that attendance contributes to the grade).
4. If a student is absent for two consecutive weeks (4 classes for a T/TH class; 6 classes for an M/W/F
class), the student will receive a “W” without notice.
5. All computers and cell phones need to be shut down during the class unless the instructor asks the students to
open the computer. If the computer or cell phone is turned on when it is not necessary, students will be
considered absent for the class.
Homework Policy
1. Homework shall be submitted on the date due. NO LATE ASSIGNMENTS SHALL BE ACCEPTED.
2. Unless specifically stated otherwise, you may collaborate on homework; however, the work submitted must
reflect the individual effort of the person presenting the work.
3. If it is necessary for a student to be absent, it is still their responsibility to determine if there are any changes
in assignment due dates, schedule changes, etc. and to submit all assignments when due.
4. Save all work on a floppy diskette or USB flash memory device for back-up purposes. (The computers on
campus are reloaded periodically and anything you leave on them will be erased.)
5. Each student should keep a portfolio of his/her graded work in case of a discrepancy in recorded
grades.
Exam Policy
1. Missing an exam is a very serious matter. There are only 3 valid reasons for missing an exam (see Attendance
and Drop Policy above):
a. School related functions such as band, orchestra, sports events, etc. A note from the coach, instructor,
supervisor, etc. must be provided.
b. An illness which requires a doctor's care (you must provide documentation from your physician for the
absence, which includes the physician’s name and phone number.)
c. A documented family emergency such as a death or surgery.
2. Make-up tests will be conducted at the instructor’s discretion.
Classroom / Lab Conduct
1. No food/drink in the classroom or lab.
2. No cell phone use in the classroom or lab (talking, texting, calculating, etc.).
3. No music/pornography in the classroom or lab.
4. Students must be provided with an environment conducive to learning. Disturbance of class by inappropriate
talking, laughing, being loud, inappropriate images on your computer screen, etc. shall result in the student’s
dismissal from the class.
5. Class and lab time are to be devoted to learning the material outlined in the course policy and syllabus. This
time shall not be utilized for checking email, visiting Facebook or MySpace sites, or engaging in chat or any
other non-course related activities. Violation of this policy shall result in the student’s dismissal from the
class.
Academic Misconduct
1. The conduct of students in this course is expected to be in compliance with the ethical standards detailed on
pages 40-41 of the UCA 2006-2007 Student Handbook in the section entitled “Definition of Academic
Misconduct”.
2. Dishonesty in any form – including plagiarism, turning in assignments prepared by others, unauthorized
possession of exams, copying assignments from other students’ work/storage media, allowing other students
to copy or view your work – shall result in the student being penalized for the violation; such penalty may
result in that student being dismissed from the course and assigned an “F” at the end of the semester. If
assignments are copied, both students involved will be penalized equally.
University Policies
It is important that you familiarize yourself with the university policies described in the UCA 2006-2007 Student
Handbook.
a. Computer Use Policy: Refer to the section starting on page 31 of the UCA 2006-2007 Student
Handbook.
b. Sexual Harassment Policy: Refer to the section starting on page 117 of the UCA 2006-2007 Student
Handbook.
c. Academic Policies: Refer to the section starting on page 38 of the UCA 2006-2007 Student Handbook.
Building Emergency Plan statement
An Emergency Procedures Summary (EPS) for the building in which this class is held will be discussed during the first week
of this course. EPS documents for most buildings on campus are available at http://uca.edu/mysafety/bep/. Every student
should be familiar with emergency procedures for any campus building in which he/she spends time for classes or other
purposes.
Disabilities
The University of Central Arkansas adheres to the requirements of the Americans with Disabilities Act. If you
need an accommodation under this Act due to a disability, please contact the UCA Office of Disability Services
at 450-3135.
CSCI 4370/5370 Data Mining Spring 2014
Week        Topic
1           Introduction to Data Mining
2           Data Preprocessing
              • Data Cleaning
              • Data Integration and Transformation
              • Data Reduction
3 – 4       Data Warehouse
              • Multidimensional Data Model
              • Data Warehouse Architecture
              • Data Warehouse Implementation
              • From Data Warehouse to Data Mining
5           Semester Project Discussion
6 – 8       Mining Frequent Patterns, Associations, and Correlations
              • Basic Concepts and a Road Map
              • Efficient and Scalable Frequent Itemset Mining Methods
              • Mining Various Kinds of Association Rules
              • From Association Mining to Correlation Analysis
              • Constraint-Based Association Mining
9 – 11      Classification and Prediction
              • Decision Tree
              • Bayesian Classification
              • Rule-Based Classification
              • Classification by Back-propagation
              • SVM
              • Associative Classification
              • Lazy Learners
12 – 14     Clustering
              • Cluster Analysis
              • Partition Methods
              • Hierarchical Methods
              • Density-Based Methods
              • Grid-Based Methods
              • Model-Based Methods
              • Clustering High-Dimensional Data
15          Semester Project Presentation
NOTE: This syllabus represents a general plan for the course; deviations from this plan may
be necessary during the semester.