New Course Proposal - STAT 8250

Cover Sheet (10/02/2002)
Course Number/Program Name STAT 8250 Data Mining II
Department Mathematics and Statistics
Degree Title (if applicable) Ph.D. in Analytics and Data Science
Proposed Effective Date Fall 2014
Check one or more of the following and complete the appropriate sections:
X New Course Proposal
Course Title Change
Course Number Change
Course Credit Change
Course Prerequisite Change
Course Description Change
Sections to be Completed
If proposed changes to an existing course are substantial (credit hours, title, and description), a
new course with a new number should be proposed.
A new Course Proposal (Sections II, III, IV, V, VII) is required for each new course proposed as
part of a new program. Current catalog information (Section I) is required for each
existing course incorporated into the program.
Minor changes to a course can use the simplified E-Z Course Change Form.
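Submitted by:
Faculty Member ____________________ Date ________
Department Curriculum Committee ____________________ Date ________ Not Approved
Department Chair ____________________ Date ________ Not Approved
College Curriculum Committee ____________________ Date ________ Not Approved
College Dean ____________________ Date ________ Not Approved
GPCC Chair ____________________ Date ________ Not Approved
Dean, Graduate College ____________________ Date ________ Not Approved
Vice President for Academic Affairs ____________________ Date ________ Not Approved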
Current Information (Fill in for changes)
Page Number in Current Catalog
Course Prefix and Number
Course Title
Class Hours ________ Laboratory Hours ________ Credit Hours ________
Description (or Current Degree Requirements)
Proposed Information (Fill in for changes and new courses)
Course Prefix and Number STAT 8250
Course Title Data Mining II
Class Hours 3   Laboratory Hours 0   Credit Hours 3
Prerequisites STAT 8240
Description (or Proposed Degree Requirements):
This course is a continuation of STAT 8240: Data Mining. Data mining is an information
extraction activity whose goals are to discover hidden facts contained in databases, to perform
prediction and forecasting, and, more generally, to improve performance through interaction
with data. The process includes data selection, cleaning, and coding; the application of
statistical, pattern recognition, and machine learning techniques; and the reporting and
visualization of the resulting structures. The course will introduce additional modeling tools
for pattern recognition and prediction, including sequential pattern analysis, neural networks,
support vector machines, nearest-neighbor classifiers, and others. These tools will be taught
through examples of practical applications, and students will be encouraged to try different
data mining software packages.
This new course will serve as an elective in both the MSAS program and the Ph.D. program in
Analytics and Data Science. Its intended audience is students with a particular emphasis in
business.
Additional Information (for New Courses only)
Instructor: TBD
Text: Introduction to Data Mining by Pang-Ning Tan, Michael Steinbach, and
Vipin Kumar AND SAS Course Notes: Applied Analytics Using SAS
Enterprise Miner by SAS®
Prerequisites: STAT 8240
Objectives: Enable students to clean, explore, and mine large data sets with additional tools
such as sequential pattern discovery, neural network modeling, bagging and boosting techniques,
and support vector machines.
Instructional Method:
Traditional in-class instruction.
Method of Evaluation:
Homework and Final Project.
Resources and Funding Required (New Courses only)
Other Personnel
New Books
New Journals
Other (Specify)
Funding Required Beyond Normal Departmental Growth
This form will be completed by the requesting department and will be sent to the Office of the
Registrar once the course has been approved by the Office of the President.
The form is required for all new courses.
Course Prefix and Number STAT 8250
Course Title Data Mining II (Note: Limit 30 spaces)
Approval, Effective Term Fall 2014
Grades Allowed (Regular or S/U)
If course used to satisfy CPC, what areas?
Learning Support Programs courses which are required as prerequisites
Vice President for Academic Affairs or Designee ____________________
VII. Attach Syllabus (Attached)
STAT 8250 – Data Mining II
Course Prerequisite: STAT 8240
Office Hours:
Course Text:
• [Required] Pang-Ning Tan, Michael Steinbach, and Vipin Kumar (2005). Introduction to
Data Mining. Addison Wesley. ISBN: 0-321-32136-7. Website:
• [Required] SAS Course Notes, Applied Analytics Using SAS Enterprise Miner. (Will be
provided by the instructor.)
• [Recommended; based on an earlier version of the software] Randall Matignon.
Data Mining Using SAS Enterprise Miner. Website:
• [Recommended; more theoretical and advanced] Trevor Hastie, Robert Tibshirani, and
Jerome Friedman (2001). The Elements of Statistical Learning: Data Mining, Inference,
and Prediction. Springer-Verlag. ISBN: 0-387-95284-5.
Meeting Schedule:
Course Software:
SAS Enterprise Miner will be the primary data mining software. JMP or other free data
mining software may be demonstrated during lectures. To be successful in this course,
students are expected to have a working knowledge of SAS and Enterprise Miner.
Course Description:
This course is a continuation of STAT 8240: Data Mining. Data mining is an information
extraction activity whose goals are to discover hidden facts contained in databases, to perform
prediction and forecasting, and, more generally, to improve performance through interaction
with data. The process includes data selection, cleaning, and coding; the application of
statistical, pattern recognition, and machine learning techniques; and the reporting and
visualization of the resulting structures. The course will introduce additional modeling tools
for pattern recognition and prediction, including sequential pattern analysis, neural networks,
support vector machines, nearest-neighbor classifiers, and others. These tools will be taught
through examples of practical applications, and students will be encouraged to try different
data mining software packages.
Course Content:
• Classification Methods
o Nearest-neighbor classifiers
o Artificial neural networks
o Support vector machines
o Bagging, boosting, and random forests
o Application with Enterprise Miner (data scoring, report generation, etc.)
• Association Analysis
o Sequential patterns
o Application with Enterprise Miner
• Other Software
o Weka
o KNIME
o Orange
• Case Studies
o Predicting default on a home equity line of credit
o Identifying heavy-traffic paths in website browsing
Learning Outcomes:
By the end of the course, students should be able to:
1. Describe the process of building a nearest-neighbor classifier;
2. Use software to build nearest-neighbor classifiers, and explain the results;
3. Describe the process of building an artificial neural network;
4. Use software to build artificial neural networks, and explain the results;
5. Describe the fundamental idea of support vector machine modeling;
6. Use software to build support vector machines, and explain the results;
7. Describe the process of bagging and boosting;
8. Use software to perform bagging and boosting;
9. Compare different predictive models;
10. Demonstrate knowledge of basic sequential pattern discovery concepts, and some of
the algorithms used to generate sequences;
11. Clean, explore, and mine a large data set from real applications, and present the
results clearly in writing and orally.
Grading process (350 points in total):
o A (315 or above)
o B (280-314)
o C (245-279)
o D (210-244)
o F (209 or below)
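As an arithmetic check, the following Python sketch (illustrative only; the names `COMPONENTS`, `CUTOFFS`, `TOTAL_POINTS`, and `letter_grade` are hypothetical) totals the maximum points of the four grade components listed below and maps a course total to a letter using the cutoffs above, which are 90%, 80%, 70%, and 60% of 350 points. Placing a non-integer total such as 314.5 in the lower band is an assumption, because the scale lists whole-point ranges only.

```python
# Maximum points for each grade component, as listed under "Grade Components".
COMPONENTS = {
    "attendance": 24,
    "homework": 6 * 40,  # six assignments at 40 points each
    "project_report": 44,
    "project_presentation": 42,  # seven rubric items at 6 points each
}

TOTAL_POINTS = sum(COMPONENTS.values())  # 24 + 240 + 44 + 42 = 350

# (minimum points, letter), checked from the highest cutoff down.
# ASSUMPTION: a fractional total between two listed ranges falls into the lower band.
CUTOFFS = [(315, "A"), (280, "B"), (245, "C"), (210, "D")]


def letter_grade(points: float) -> str:
    """Map a course total out of 350 to a letter grade using the published cutoffs."""
    for minimum, letter in CUTOFFS:
        if points >= minimum:
            return letter
    return "F"


print(TOTAL_POINTS, letter_grade(300))
```

Checking cutoffs from the top down keeps each band's lower bound in one place, so the ranges cannot overlap or leave gaps.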
Grade Components:
1. Attendance (24 points in total).
Attendance will be checked at each meeting. If you miss a class because of an
uncontrollable event (illness, accident, emergency, etc.), please bring the
appropriate document (doctor’s note, ticket, supervisor’s note, etc.) to get the points
back. An email notice from the student is not considered an “appropriate document.”
2. Homework Assignments (6 total at 40 points each; 240 points overall)
Location and due date: Each homework assignment will be posted online two weeks before
its due date. Please check the calendar file for where to find it and when it is due.
Late submission and email submission: Homework assignments are collected at
the beginning of the face-to-face meeting on the due date, and I prefer hard copy.
However, if you have to miss a class, please email me your solution before class
time; otherwise, it will be considered a late submission. Late submissions will not
be accepted or graded under any circumstances. Please send the email to the email
address in the header above. I will confirm receipt within 24 hours. If I
do not confirm, something has gone wrong, so please re-send it.
Please do not send emails to the Vista email box, because doing so will delay my response.
Discussion policy: You are allowed and encouraged to work with other
students on homework problems, as long as you write up and turn in your own
solution. You may also ask me questions, although you should think about the
problems before asking. Questions such as “please tell me how to
answer problem #2” or “my program doesn’t run, please help” will not be
answered. Please show me your work when you ask questions. Some
good examples: “I am using the formula on page XXX to solve the problem; is it
appropriate?” “I found several nodes in Enterprise Miner that deal with missing values;
does it matter whether I use node AAA rather than node BBB?” “I want to
… (describe your objective in detail). Here is my program flow chart (include a
screenshot), but I always get the following message: … Is there any obvious
problem in it?”
Regrade policy: If you feel that I have made an error in grading your homework
or test, please turn in your assignment again with a written explanation, and
I will consider your request. Please note that regrading may cause your grade to
go up or down.
3. Final project report (44 total possible points)
Students will work in teams of up to three people. Please choose
your own team members, and do so as early as possible. Team
information is due on MMDDYY, together with assignment 5. Each team will
choose its own area of interest and data set, select appropriate methods (at
least two techniques from this class) to mine the data, and organize the analysis
and results in a written report.
Each team should submit a team project report. The project report should
include roughly the following:
1) Motivation of the project;
2) Introduction to the data set and the target variable(s);
3) Data cleaning, preparation, and mining process;
4) Model validation;
5) Model evaluation and/or comparison;
6) Other findings.
Grading will be based on accuracy, organization, overall clarity, detail, and quality
of both form and content. Effective communication also requires proper
grammar, punctuation, and spelling, but the grade will not be lowered for a few
minor errors. Project report grades will be designated as follows:
1) Excellent – Exceptional, one of the very best in the class. (The best report: 44
points; others: 41 points)
2) Good – Above average for the class. (37 points)
3) Average – About as good as most teams in the class. (33 points)
4) Insufficient – Not quite up to the level of most teams. (29 points)
5) Unsatisfactory – For example, far too short, missing major
components such as the introduction to the data, or exhibiting little to no
understanding. (25 points and below)
A single report grade will be given to the whole team. If a team has more than one
member, team members will also evaluate one another at the end of the course on
a 0-100% scale, and each student’s individual project report grade will be weighted
according to the average percentage that student receives from his or her teammates.
4. Final project presentation (42 points)
There is no final exam for this course. Instead, at the scheduled final exam time, you will
present your project findings. Please organize your major findings in a brief
presentation. The length of the presentation depends on how many groups there
are in the class, so it will be announced after the team information is collected.
However short the presentation is, every group member
should present at least one slide. Each individual will be graded on
the following aspects:
1) Accurate and complete explanation of key concepts. (6 total possible pts.)
2) Smooth and clever transitions to connect key points. (6 total possible pts.)
3) Presents information in a logical, interesting sequence that the audience can
follow. (6 total possible pts.)
4) Poised, clear articulation; proper volume; steady rate; enthusiasm;
confidence. Speaker is clearly comfortable in front of the group. (6 total
possible pts.)
5) Maintains eye contact, seldom returning to notes; presentation is like a
planned conversation. (6 total possible pts.)
6) Demonstrates knowledge of the topic by responding confidently, precisely,
and appropriately to all audience questions. (6 total possible pts.; if N/A, 6
pts. will be granted automatically)
7) Finishes the presentation within the allotted time. (6 total possible pts.; 1 point
is deducted for each minute over the allotted time)
For items 1) to 6) above, the grade will be designated as: excellent – 6 points; good
– 5 points; average – 4 points; insufficient – 3 points; unsatisfactory – 2 points
or below. Your final presentation grade will be (my assessment) × 40% + (the
average of your classmates’ assessments) × 60%.
Withdrawal Policy…The last day to withdraw from the course and possibly receive a "W"
Students who find that they cannot continue in college for the entire semester after being
enrolled, because of illness or any other reason, need to complete an online form. To completely
or partially withdraw from classes at KSU, a student must withdraw online through Owl Express, under Student Services.
The date the withdrawal is submitted online will be considered the official KSU withdrawal date
which will be used in the calculation of any tuition refund or refund to Federal student aid and/or
HOPE scholarship programs. It is advisable to print the final page of the withdrawal for your
records. Withdrawals submitted online prior to midnight on the last day to withdraw without
academic penalty will receive a “W” grade. Withdrawals after midnight will receive a “WF”.
Failure to complete the online withdrawal process will produce no withdrawal from classes. Call
the Registrar’s Office at 770-423-6200 during business hours if assistance is needed.
Students may, by means of the same online withdrawal and with the approval of the university
Dean, withdraw from individual courses while retaining other courses on their schedules. This
option may be exercised up until MMDDYYYY.
This is the date to withdraw without academic penalty for SEMESTER YYYY classes. Failure to
withdraw by the date above will mean that the student has elected to receive the final grade(s)
earned in the course(s). The only exception to those withdrawal regulations will be for those
instances that involve unusual and fully documented circumstances.
Academic Integrity: Every KSU student is responsible for upholding the provisions of the
Student Code of Conduct, as published in the Undergraduate and Graduate Catalogs. Section II
of the Student Code of Conduct addresses the University’s policy on academic honesty,
including provisions regarding plagiarism and cheating, unauthorized access to University
materials, misrepresentation/falsification of University records or academic work, malicious
removal, retention, or destruction of library materials, malicious/intentional misuse of computer
facilities and/or services, and misuse of student identification cards. Incidents of alleged
academic misconduct will be handled through the established procedures of the University
Judiciary Program, which includes either an “informal” resolution by a faculty member, resulting
in a grade adjustment, or a formal hearing procedure, which may subject a student to the Code of
Conduct’s minimum one-semester suspension requirement.
Unauthorized Collaboration: Submission for academic credit of a work product, or a part
thereof, represented as its being one's own effort, which has been developed in substantial
collaboration with or without assistance from another person or source, is a violation of
academic honesty. It is also a violation of academic honesty knowingly to provide such
assistance. Collaborative work specifically authorized by a faculty member is allowed.