Cleveland State University Department of Electrical and Computer Engineering

CIS 660/760 Data Mining

Catalog Data: CIS 660/CIS 760 Data Mining (4-0-4)

Prerequisites: CIS 505 and CIS 530. The study of data mining methods, technologies, and algorithms. Topics include data cleaning issues, data preprocessing, data transformation, model creation, feature selection, and model validation and evaluation. Students will learn data mining processes, issues, and techniques, and will be able to apply data mining techniques and algorithms to practical data mining problems through hands-on experience processing sample data sets or data streams obtained from well-known social network sites. The course extends to advanced data mining techniques for anomaly detection and to big data processing techniques for web mining and text mining, covering the data mining algorithms and their implementations. Finally, students will be exposed to the latest research advances in the data analytics field.

Textbooks:

“Data Mining: Concepts and Techniques”, Jiawei Han and Micheline Kamber, Morgan Kaufmann Publishers, 3rd Ed., 2011. ISBN-13: 978-0123814791, ISBN-10: 0123814790

“Introduction to Data Mining”, Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Addison-Wesley. ISBN-13: 978-0321321367, ISBN-10: 0321321367

Lecture notes drawn from selected database research papers on data mining and big data processing

Coordinator: Dr. Sunnie S. Chung

Outcomes: A student who successfully completes this course will be able to:

• Develop a basic understanding of data mining theory and algorithms;

• Implement the mathematical and logical ideas behind data mining algorithms;

• Develop computer programs for data mining algorithms;

• Apply data mining algorithms to solve practical data analytics problems;

• Develop data analytics applications using data analytics systems and software tools.

Topics (lecture hours in parentheses)

1. Introduction to Data Mining and Big Data (6 hours)
• Data Exploration
• Data Cleaning
• Data Preprocessing
• Data Transformation

2. Data Warehouse and Online Analytical Processing (OLAP) (6 hours)
• Online Analytical Processing
• OLAP Aggregation Operators: Cube, Roll Up, Drill Down
• Multi-Dimensional Data Warehouse Design

3. Data Mining Process with Enterprise Data Warehouse (3 hours)
• Multidimensional Expressions (MDX), Data Mining Extensions (DMX)

4. Association Rule Mining, Frequent Pattern Mining (3 hours)

5. Association Rule Mining Algorithms (3 hours)
• Apriori Algorithm and Optimization
• Frequent Pattern Tree, Hash Tree
• Rule Generation and Optimization
• Correlation Measures

6. Classification and Prediction (6 hours)
• Decision Tree

7. PEBLS: Parallel Exemplar-Based Learning (3 hours)

8. Alternative Classification (6 hours)
• Naive Bayesian and Bayesian Networks
• Neural Networks
• Support Vector Machines
• Backpropagation

9. Ensemble Learning: Boosting, Bagging (3 hours)

10. Error Estimation, Model Evaluation (3 hours)

11. Clustering Analysis (3 hours)
• K-means
• Hierarchical Clustering
• Density-based Clustering
• Evaluation of Clustering Techniques

12. Information Retrieval (3 hours)
• Text Mining
• Web Mining

13. Data Mining Anomaly Detection (3 hours)
• Outlier Detection
• Statistical Approaches
• Distance-based Approaches

14. Advanced Research Literature Review and Presentations on Data Analytics and Projects (6 hours)

15. Exams and Reviews (3 hours)

Total: 60 lecture hours

Grading: The course grade is based on a student's overall performance throughout the entire semester. The final grade is distributed among the following components:

1. Exams (Midterm and Final): 45% (Midterm 20%, Final 25%)

2. Computer Labs: 25% (about 3-4 assignments)

3. Project on Data Mining or Data Analytics (two-person group project): 20%

4. Research Topic Presentation: 10%

Additional Requirements for CIS 760 Students:

• Doctoral students who take CIS 760 must select a project to work on.

• Doctoral students who take CIS 760 must work on the project individually (instead of in a two-person group).

• The list of projects and research papers for doctoral students will be given separately in class. A tentative example of the research project topics and the paper list is given at the end of this course schedule.

• Each exam includes one additional problem to be completed by doctoral students only.

Required Computer Software:

• R programming language

• WEKA: http://www.cs.waikato.ac.nz/~ml/weka/

• Microsoft SQL Server 2014 or higher

• Microsoft Visual Studio 2013 or higher

• Microsoft SQL Server Business Intelligence Data Analytic Tool 2014 or higher

• Adventure Works 2014 Data Warehouse database for SQL Server 2014 or higher

Further directions on the software will be given in class.

Tentative List of Research Papers and Projects for CIS 760 Doctoral Students:

CIS 760 doctoral students should choose a research topic (topics will be given in class), give a 30-minute presentation on the related papers, and complete a project related to that topic. A list of selected papers and a project specification for each research topic will be given in class.
