Cleveland State University
Department of Electrical and Computer Engineering
CIS 660/760 Data Mining
Catalog Data: CIS 660/CIS 760 Data Mining (4-0-4)
Prerequisites: CIS 505 and CIS 530. The study of data mining methods, technologies, and algorithms. Topics include data cleaning issues, data preprocessing, data transformation, model creation, feature selection, and model validation and evaluation. Students will learn data mining processes, issues, and techniques, and will apply data mining algorithms to practical problems through hands-on experience processing sample data sets or data streams obtained from well-known social network sites. The course extends to advanced data mining techniques for anomaly detection and to big data processing techniques for web mining and text mining, covering both algorithms and implementations. Finally, students will be exposed to the latest research advances in data analytics.
Textbooks: “Data Mining: Concepts and Techniques,” Jiawei Han and Micheline Kamber, Morgan Kaufmann Publishers, 3rd ed., 2011; ISBN-13: 978-0123814791, ISBN-10: 0123814790
“Introduction to Data Mining,” Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Addison Wesley; ISBN-13: 978-0321321367, ISBN-10: 0321321367
Lecture notes drawn from selected database research papers on data mining and big data processing
Coordinator: Dr. Sunnie S. Chung
Outcomes : A student who successfully completes this course will be able to:
• Develop a basic understanding of data mining theory and algorithms;
• Implement the mathematical and logical ideas behind data mining algorithms;
• Develop computer programs for data mining algorithms;
• Apply data mining algorithms to solve practical data analytics problems;
• Develop data analytics applications using data analytics systems and software tools.
Topics and Lecture Hours
1. Introduction to Data Mining and Big Data (6 hours)
   Data Exploration
   Data Cleaning
   Data Preprocessing
   Data Transformation
2. Data Warehouse and Online Analytical Processing (OLAP) (6 hours)
   Online Analytical Processing
   OLAP Aggregation Operators: Cube, Roll Up, Drill Down
   Multi-Dimensional Data Warehouse Design
   Data Mining Process with Enterprise Data Warehouse
   MultiDimensional eXpressions (MDX), Data Mining eXpressions (DMX)
3. Association Rule Mining, Frequent Pattern Mining (3 hours)
4. Association Rule Mining Algorithms (3 hours)
   Apriori Algorithm and Optimization
   Frequent Pattern Tree, Hash Tree
5. Rule Generation and Optimization (3 hours)
   Correlation Measures
6. Classification and Prediction (6 hours)
   Decision Tree
7. PEBLS: Parallel Exemplar-Based Learning (3 hours)
8. Alternative Classification (6 hours)
   Naive Bayesian and Bayesian Network
   Neural Network
   Support Vector Machine
9. Backpropagation (3 hours)
   Ensemble Learning: Boosting, Bagging
10. Error Estimation, Model Evaluation (3 hours)
11. Clustering Analysis (3 hours)
   K-means
   Hierarchical Clustering
   Density-Based Clustering
   Evaluation of Clustering Techniques
12. Information Retrieval (3 hours)
   Text Mining
   Web Mining
13. Data Mining Anomaly Detection (3 hours)
   Outlier Detection
   Statistical Approaches
   Distance-Based Approaches
14. Advanced Research Literature Review and Presentations on Data Analytics and Projects (6 hours)
15. Exams and Reviews (3 hours)
Total: 60 hours
Grading: The course grade is based on a student's overall performance throughout the semester. The final grade is distributed among the following components:
1. Exams (Midterm and Final): 45% (20% Midterm, 25% Final)
2. Computer Labs: 25% (about 3-4 assignments)
3. Project on Data Mining or Data Analytics (two-person group project): 20%
4. Research Topic Presentation: 10%
Additional Requirements for CIS760 Students:
• Doctoral students who take CIS760 must select a project to work on
• Doctoral students who take CIS760 must work on the project individually (instead of in a two-person group)
• The list of projects and research papers for doctoral students will be given separately in class. A tentative example of the selected research projects and the paper list is given at the end of the course schedule
• In each exam, one additional problem is designed to be completed by doctoral students only
Computer Software Required :
• R Programming
• WEKA: http://www.cs.waikato.ac.nz/~ml/weka/
• Microsoft SQL Server 2014
• Microsoft Visual Studio 2013 or later
• Microsoft SQL Server Business Intelligence Data Analytic Tool 2014 or later
• Adventure Works 2014 Data Warehouse Database for SQL Server 2014 or later
– Directions will be given in class
Tentative List of Research Papers and Projects for CIS 760 Doctoral Students:
CIS 760 doctoral students should choose a research topic (topics will be given in class), give a 30-minute presentation on papers related to that topic, and complete a project related to the subject. A list of selected papers and a project specification for each research topic will be given in class.