CISC 4327 syllabus Summer 2015

GENERAL INFORMATION

UNIVERSITY OF MARY HARDIN-BAYLOR

COMPUTER SCIENCE CLASS SYLLABUS

Summer, 2015

Course Number: CISC 4327

Course Title: Data Mining Algorithms and Applications

Number of Credits: 3

Location of Class: Davidson Building, Room 122

Meeting Time: 2:30 - 3:50 pm Tuesday & Thursday

Professor: Dr. William G. Tanner, Jr.

Office: Room 119 Davidson Building

Office Hours: posted in Davidson and on-line

Office Phone: (254) 295-4645

Email: btanner@umhb.edu

Class web-page: http://mars.umhb.edu/

COURSE DESCRIPTION

With the current and continuing increase in information sources (e.g. Google and Yahoo), we have truly entered the “Information Age”. We are also in information overload. For that reason alone, we need efficient and effective algorithms to understand and benefit from the many sources of information.

Data mining is an increasingly important branch of computer science that examines data in order to find and describe patterns. Because we live in a world where we can be overwhelmed with information, it is imperative that we find ways to classify this input, to find the information we need, to illuminate structures, and to draw conclusions. Data mining is a very practical discipline with many applications in science and government, such as web analysis, disease diagnosis and outcome prediction, weather forecasting, fraud detection, and terrorism threat detection. It is based on methods from several fields, mainly machine learning, statistics, and information visualization.

This course examines the design and efficiency of data mining algorithms for the classification, association, and non-trivial discovery of insights and knowledge within data. The data mining applications linked to data or databases will use both traditional and new data mining methods. A course in algorithms presents an opportunity to expose students to some of the fundamentals of data mining, in the form of decision trees.

These trees are a basic structure for representing data. Decision tree induction algorithms are used to classify data, perhaps the most common data mining task.

This course will require a lot of out-of-class time. The average student should spend between 3 and 15 hours per week working on programs and projects (keep up, and if you start falling behind, ask for help early). Assignments will be given out in class and posted on the CISC 4327 web-page, along with help. Web-link: http://mars.umhb.edu/wgt/cisc4327/
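As a small taste of the decision tree induction mentioned in the course description, the sketch below shows one common way such an algorithm can choose its first split: it picks the attribute with the highest information gain (the reduction in entropy of the class labels). This is an illustration only, not course material. The course itself works in C# and C++; Python is used here for brevity, and the function names and the four toy records are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the records on one attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the partitions produced by the split.
    remainder = sum(
        len(part) / len(labels) * entropy(part) for part in partitions.values()
    )
    return entropy(labels) - remainder


def best_attribute(rows, labels):
    """Attribute with the largest information gain (the root split of a tree)."""
    return max(rows[0].keys(), key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data: four records, two attributes, two classes.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

print(best_attribute(rows, labels))  # outlook
```

In this toy data, "outlook" separates the two classes perfectly while "windy" tells us nothing, so a tree built this way would test "outlook" first. A full induction algorithm would repeat the same choice recursively on each partition.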

COURSE OBJECTIVES

This is not a beginning programming course; a previous structured-language course in Java, C#, or C++ is required. Some of the topics covered will be:

1. Classification and parsing methods.

2. Methods for discovery of the association and clustering of data.

3. Data processing, file I/O and data access methods.

4. Data mining algorithms, implemented in C#.

5. Incremental learning, Bayesian networks and other classifiers.

6. Use of standard data mining programs and implementation of our own.

COURSE MATERIALS:

Textbook:

Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Introduction to Data Mining, 2005, Addison Wesley, ISBN #: 978-0321321367

Textbook Resources: http://www-users.cs.umn.edu/~kumar/dmbook/index.php

http://www-users.cs.umn.edu/~kumar/dmbook/resources.htm

Other items:

A flash drive is required for this class (a 16 GB or larger pen-drive is recommended).

COMPUTING LABORATORY

Our computer lab will have appropriate software installed to allow you to program in C++. Either Dev-C++ or Visual C++ is recommended. You are responsible for maintaining backup copies of all your programs. Our web-page at http://mars.umhb.edu/ will be used to provide software and a BBS for class interaction.

COURSE POLICY AND PROCEDURES

1. Grading: The final grade calculation will be reached according to the distribution described in the UMHB Catalog. The final course grade will be computed by the following percentages:

Class participation & Daily Assignments: 10%

Laboratory Projects: 10%

Tests (3) + FINAL: 80%

2. Attendance: The student is expected to attend ALL scheduled classes and will be held responsible for all class work and assignments. Continued absences will result in an unsatisfactory grade report for the course, and exceeding 80% of scheduled classes will automatically result in a failing grade (for a TTH course that will be no more than 9).

3. Tests: All students are required to be present for a test. If an extreme emergency occurs and you cannot make the test time, make every effort to contact the professor by email, telephone, or in person to receive permission to miss the test. Permission will be granted only in the case of extenuating circumstances.

4. Makeup Tests: Students desiring a Makeup Test must make arrangements with the professor to take the test. A Makeup Test must be scheduled during office hours BEFORE the next scheduled test. If a student fails to take a Makeup Test before the next scheduled test, that student will receive a ZERO for the missed test.

5. Assignments: All assignments will be due on the DUE-DATE (normally Tuesdays). They are due at the beginning of the class period.

6. Final Exam: The final exam will be comprehensive. NO MAKEUP WILL BE GIVEN FOR THE FINAL.
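The grading percentages in item 1 can be applied as in the short Python sketch below. It assumes the weights are 10% for class participation and daily assignments, 10% for laboratory projects, and 80% for the three tests plus the final. The syllabus does not say how the 80% is divided among the tests and the final, so the sketch simply averages all four scores; that averaging, the function name, and the sample scores are assumptions made for illustration.

```python
# Assumed weights, taken from the grading percentages in item 1.
WEIGHTS = {"participation": 0.10, "laboratory": 0.10, "tests": 0.80}


def final_grade(participation, laboratory, test_scores):
    """Weighted course grade on a 0-100 scale.

    test_scores holds the three tests plus the final; they are averaged
    equally here, which is an assumption, not a stated course policy.
    """
    tests_average = sum(test_scores) / len(test_scores)
    return (
        WEIGHTS["participation"] * participation
        + WEIGHTS["laboratory"] * laboratory
        + WEIGHTS["tests"] * tests_average
    )


# Hypothetical student: 90 participation, 80 laboratory, tests 70/80/90, final 80.
grade = final_grade(90, 80, [70, 80, 90, 80])
print(round(grade, 1))
```

For this hypothetical student the test average is 80, so the grade works out to 9 + 8 + 64 = 81.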

Day  Date     Topic                                                             Lectures / Exams
0    June 9   Introduction                                                      Chapter 1
     June 9   What is Data Mining?                                              Chapter 1
     June 10  Data: Types, Quality & Preprocessing                              Chapter 2
     June 10  Measures of Similarity and Dissimilarity                          Chapter 2
     June 11  Exploring Data: Summary Statistics                                Chapter 3
     June 11  Visualization & OLAP                                              Chapter 3
     June 15  DUE Take Home Examination 1 (Chapters 1 – 3)                      Exam #1
     June 15  Classification: Basic Concepts & Decision Trees                   Chapter 4
     June 15  Classification: Model Evaluation                                  Chapter 4
     June 16  Classification: Evaluating Classifier                             Chapter 4
     June 16  Classification: Alternative Techniques                            Chapter 5
     June 17  Classification: Rule-Based, Bayesian                              Chapter 5
     June 17  Classification: ANN, SVM, Ensemble Methods                        Chapter 5
     June 22  DUE Take Home Examination 2 (Chapters 4 – 5)                      Exam #2
     June 18  Association Analysis: Basic Concepts - Item Generation            Chapter 6
     June 18  Association Analysis: Rule Generation, FP-Growth Algorithm        Chapter 6
10   June 22  Association Analysis: Evaluation of Association Patterns          Chapter 7
     June 22  Association Analysis: Advanced Concepts, Categorical Attributes   Chapter 7
11   June 23  Association Analysis: Sequential, Subgraph & Infrequent Patterns  Chapter 7
     June 29  DUE Take Home Examination 3 (Chapters 6 – 7)                      Exam #3
12   June 29  Cluster Analysis: Basic Concepts & Algorithms                     Chapter 8
     June 29  Cluster Analysis: DBSCAN & Cluster Evaluation                     Chapter 8
13   June 30  Cluster Analysis: Prototype-Based & Density-Based Clustering      Chapter 9
     June 30  Cluster Analysis: Graph-Based & Scalable Clustering Algorithms    Chapter 9
14   July 01  Review for Final Examination
16   July 02  Final Examination: Chapters (1 – 9)
