KENNESAW STATE UNIVERSITY
GRADUATE COURSE PROPOSAL OR REVISION, Cover Sheet (10/02/2002)

Course Number/Program Name: STAT 8250 Data Mining II
Department: Mathematics and Statistics
Degree Title (if applicable): Ph.D. in Analytics and Data Science
Proposed Effective Date: Fall 2014

Check one or more of the following and complete the appropriate sections (Sections to be Completed shown in parentheses):
X New Course Proposal (II, III, IV, V, VII)
__ Course Title Change (I, II, III)
__ Course Number Change (I, II, III)
__ Course Credit Change (I, II, III)
__ Course Prerequisite Change (I, II, III)
__ Course Description Change (I, II, III)

Notes: If proposed changes to an existing course are substantial (credit hours, title, and description), a new course with a new number should be proposed. A new Course Proposal (Sections II, III, IV, V, VII) is required for each new course proposed as part of a new program. Current catalog information (Section I) is required for each existing course incorporated into the program. Minor changes to a course can use the simplified E-Z Course Change Form.

Submitted by: _____ Faculty Member, Date _____
Approved / Not Approved: _____ Department Curriculum Committee, Date _____
Approved / Not Approved: _____ Department Chair, Date _____
Approved / Not Approved: _____ College Curriculum Committee, Date _____
Approved / Not Approved: _____ College Dean, Date _____
Approved / Not Approved: _____ GPCC Chair, Date _____
Approved / Not Approved: _____ Dean, Graduate College, Date _____
Approved / Not Approved: _____ Vice President for Academic Affairs, Date _____
Approved / Not Approved: _____ President, Date _____

KENNESAW STATE UNIVERSITY
GRADUATE COURSE/CONCENTRATION/PROGRAM CHANGE

I. Current Information (Fill in for changes)
Page Number in Current Catalog: ___
Course Prefix and Number: ___
Course Title: ___
Class Hours: ___ Laboratory Hours: ___ Credit Hours: ___
Prerequisites: ___
Description (or Current Degree Requirements):

II.
Proposed Information (Fill in for changes and new courses)
Course Prefix and Number: STAT 8250
Course Title: Data Mining II
Class Hours: 3 Laboratory Hours: 0 Credit Hours: 3
Prerequisites: STAT 8240
Description (or Proposed Degree Requirements): This course is a continuation of STAT 8240: Data Mining. Data Mining is an information extraction activity whose goal is to discover hidden facts contained in databases, perform prediction and forecasting, and generally improve their performance through interaction with data. The process includes data selection, cleaning, coding, using different statistical, pattern recognition and machine learning techniques, and reporting and visualization of the generated structures. The course will introduce additional modeling tools for pattern recognition and prediction, including Sequential Pattern Analysis, Neural Networks, Support Vector Machines, Nearest-neighbor classifiers, and many others. These tools will be taught through examples of practical applications. Students will be encouraged to try different Data Mining software.

III. Justification
This new course will serve as an elective in both the MSAS program and the Ph.D. in Analytics and Data Science. Students with a particular emphasis in business are the intended audience.

IV. Additional Information (for New Courses only)
Instructor: TBD
Text: Introduction to Data Mining by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar AND SAS Course Notes: Applied Analytics Using SAS Enterprise Miner by SAS®
Prerequisites: STAT 8240
Objectives: Enable students to clean, explore, and mine a large data set with additional tools such as sequential pattern discovery, neural network modeling, bagging and boosting techniques, support vector machines, etc.
Instructional Method: Traditional in-class instruction.
Method of Evaluation: Homework and Final Project.

V.
Resources and Funding Required (New Courses only)
Resource: Amount
Faculty: NA
Other Personnel: NA
Equipment: NA
Supplies: NA
Travel: NA
New Books: NA
New Journals: NA
Other (Specify): NA
TOTAL: NA
Funding Required Beyond Normal Departmental Growth: NA

VI. COURSE MASTER FORM
This form will be completed by the requesting department and will be sent to the Office of the Registrar once the course has been approved by the Office of the President. The form is required for all new courses.
DISCIPLINE: Statistics
COURSE NUMBER: STAT 8250
COURSE TITLE FOR LABEL (Note: Limit 30 spaces): Data Mining II
CLASS-LAB-CREDIT HOURS: 3–0–3
Approval, Effective Term: Fall 2014
Grades Allowed (Regular or S/U): Regular
If course used to satisfy CPC, what areas?
Learning Support Programs courses which are required as prerequisites:
APPROVED: ________________________________________________
Vice President for Academic Affairs or Designee __

VII. Attach Syllabus (Attached)

Course: STAT8250 – Data Mining II
Instructor: TBD
Office: TBD
Office Hours: TBD
Email: TBD

Course Pre-requisite: STAT8240

Course Text:
[Required] Pang-Ning Tan, Michael Steinbach, and Vipin Kumar (2005). Introduction to Data Mining. Addison Wesley. ISBN: 0-321-32136-7. Website: http://wwwusers.cs.umn.edu/~kumar/dmbook/index.php
[Required] SAS Course Notes, Applied Analytics Using SAS Enterprise Miner. (Will be provided by the instructor)
[Recommended, but based on an earlier version of the software] Randall Matignon, Data Mining Using SAS Enterprise Miner. Website: http://www.sasenterpriseminer.com/
[Recommended. This one is more theoretical and advanced] Trevor Hastie, Robert Tibshirani, and Jerome Friedman (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag. ISBN 0-387-95284-5

Meeting Schedule: TBD

Course Software: SAS Enterprise Miner will be used as the major data mining software. JMP or some free data mining software may be shown during the lecture.
Students are expected to have a working knowledge of SAS and Enterprise Miner to be successful in this course.

Course Description: This course is a continuation of STAT 8240: Data Mining. Data Mining is an information extraction activity whose goal is to discover hidden facts contained in databases, perform prediction and forecasting, and generally improve their performance through interaction with data. The process includes data selection, cleaning, coding, using different statistical, pattern recognition and machine learning techniques, and reporting and visualization of the generated structures. The course will introduce additional modeling tools for pattern recognition and prediction, including Sequential Pattern Analysis, Neural Networks, Support Vector Machines, Nearest-neighbor classifiers, and many others. These tools will be taught through examples of practical applications. Students will be encouraged to try different Data Mining software.

Course Content:
Classification Methods
o Nearest-neighbor classifiers
o Artificial neural network
o Support Vector Machine
o Bagging, Boosting, and Forest
o Application with Enterprise Miner (data scoring, report generator, etc.)
Association Analysis
o Sequential Patterns
o Application with Enterprise Miner
Other software
o JMP
o Weka
o Knime
o Orange
Case Studies
o Predict defaulting on home equity line of credit
o Heavy traffic path on website browsing

Learning Outcomes: By the end of the course, students should be able to:
1. Describe the process of building a nearest-neighbor classifier;
2. Use software to build nearest-neighbor classifiers, and explain the results;
3. Describe the process of building an artificial neural network;
4. Use software to build an artificial neural network, and explain the results;
5. Describe the fundamental idea of support vector machine modeling;
6. Use software to build support vector machines, and explain the results;
7. Describe the process of bagging and boosting;
8.
Use software to do bagging and boosting;
9. Compare different predictive models;
10. Demonstrate knowledge of basic sequential pattern discovery concepts, and some of the algorithms used to generate sequences;
11. Clean, explore, and mine a large data set from real applications, and present the results clearly in writing and orally.

Grading:
Grading process (350 points in total):
o A (315 or above)
o B (280~314)
o C (245~279)
o D (210~244)
o F (209 or below)

Grade Components:
1. Attendance (24 points in total). Attendance will be checked at each meeting. If you miss a class because of an uncontrollable event (illness, accident, emergency, etc.), please bring the appropriate document (doctor’s note, ticket, supervisor’s note, etc.) to get the points back. An email notice from the student is not considered an “appropriate document”.
2. Homework Assignments (6 total at 40 points each; 240 overall points)
Location and due date: Homework will be posted online two weeks before the due date. Please check the calendar file to see where to find it, and the due date.
Late submission and email submission: Homework assignments are collected at the beginning of the face-to-face meeting on the due date. I prefer hard copy. However, if you have to miss a class, please email me the solution before the class time. Otherwise, it will be considered a late submission. Late submissions will not be accepted or graded under any circumstances. Please send the email to the email address in the header above. I will confirm receipt within 24 hours. If I do not confirm, something has gone wrong, so please re-send it to be sure. Please do not send emails to the Vista email box because it will delay my response.
Discussion policy: You are allowed and encouraged to work together with other students on homework problems, as long as you write up and turn in your own solution. You are also allowed to ask me questions, although you should try to think about the problems before asking.
Questions such as “please tell me how to answer problem #2” or “my program doesn’t run, please help” won’t be answered. You should try to show me your work when you ask questions. Some good examples:
“I am using the formula on page XXX to solve the problem, is it appropriate?”
“I find several nodes to deal with missing values in Enterprise Miner, does it matter if I use node AAA rather than node BBB?”
“I want to …….(describe your objective in detail). Here is my program flow chart (give me the screen shot), but I always get the following message: ……, Is there any obvious problem in it?”
Regrade policy: If you feel that I have made an error in grading your homework or test, please turn in your assignment again with a written explanation, and I will consider your request. Please note that regrading may cause your grade to go up or down.
3. Final project report (44 total possible points)
Students will work in teams of up to 3 persons. Please choose your own team members, and please do so as early as possible. Team information is due on MMDDYY, together with assignment 5. Each team will choose its own area of interest, find a data set, select appropriate methods (at least two techniques from this class) to mine the data, and organize the analysis and results in a written report. Each group should submit one team project report. The project report should roughly include:
1) Motivation of the project;
2) Introduction to the dataset and the target variable(s);
3) Data cleaning, preparing, and mining process;
4) Model validation;
5) Model evaluation and/or comparison;
6) Other findings.
Grading will be based on accuracy, organization, overall clarity, detail, and quality of both form and content. Effective communication also requires proper grammar, punctuation, and spelling, but the grade will not be lowered for a few minor errors. The project report grades will be designated as follows:
1) Excellent – Exceptional, one of the very best in the class.
(The best: 44 points; Others: 41 points)
2) Good – Above average for the class. (37 points)
3) Average – About as good as most teams in the class. (33 points)
4) Insufficient – Not quite up to the level of most teams. (29 points)
5) Unsatisfactory – Unsatisfactory (e.g., far too short, lacking major components such as the introduction to the data, exhibiting little to no understanding) (25 points and below)
A report grade will be given to the whole team. If a team contains more than one member, team members will also evaluate one another on a 0~100% grading scale at the end of the course. Each student's individual grade on the project report will be weighted according to the average percentage that student earns from team members.
4. Final project presentation (42 points)
There is no final exam for this course. However, at the final exam time, you will present your project findings. Please organize your major findings in a brief presentation. The length of the presentation depends on how many groups there are in the class, so it will be announced after the team information is collected. However, no matter how short the presentation is, every group member should present some slide(s). The grade will be given to every individual based on the following aspects:
1) Accurate and complete explanation of key concepts. (6 total possible pts.)
2) Smooth and clever transitions to connect key points. (6 total possible pts.)
3) Presents information in a logical, interesting sequence which the audience can follow. (6 total possible pts.)
4) Poised, clear articulation; proper volume; steady rate; enthusiasm; confidence. Speaker is clearly comfortable in front of the group. (6 total possible pts.)
5) Maintains eye contact, seldom returning to notes; presentation is like a planned conversation. (6 total possible pts.)
6) Demonstrates knowledge of the topic by responding confidently, precisely and appropriately to all audience questions.
(6 total possible pts; if N/A, 6 pts will be granted automatically)
7) Finished the presentation within the allotted time. (6 total possible pts; minus 1 point for each minute over)
For the above 1) to 6), the grade will be designated as: excellent – 6 points; good – 5 points; average – 4 points; insufficient – 3 points; unsatisfactory – 2 points and below. Your final presentation grade will be (my assessment)*40% + (the average of your classmates’ assessments)*60%.

Withdrawal Policy: The last day to withdraw from the course and possibly receive a "W" is MMDDYYYY. Students who find that they cannot continue in college for the entire semester after being enrolled, because of illness or any other reason, need to complete an online form. To completely or partially withdraw from classes at KSU, a student must withdraw online at www.kennesaw.edu, under Owl Express, Student Services. The date the withdrawal is submitted online will be considered the official KSU withdrawal date, which will be used in the calculation of any tuition refund or refund to Federal student aid and/or HOPE scholarship programs. It is advisable to print the final page of the withdrawal for your records. Withdrawals submitted online prior to midnight on the last day to withdraw without academic penalty will receive a “W” grade. Withdrawals after midnight will receive a “WF”. Failure to complete the online withdrawal process will produce no withdrawal from classes. Call the Registrar’s Office at 770-423-6200 during business hours if assistance is needed. Students may, by means of the same online withdrawal and with the approval of the university Dean, withdraw from individual courses while retaining other courses on their schedules. This option may be exercised up until MMDDYYYY. This is the date to withdraw without academic penalty for SEMESTER YYYY classes. Failure to withdraw by the date above will mean that the student has elected to receive the final grade(s) earned in the course(s).
The only exception to those withdrawal regulations will be for those instances that involve unusual and fully documented circumstances.

Academic Integrity: Every KSU student is responsible for upholding the provisions of the Student Code of Conduct, as published in the Undergraduate and Graduate Catalogs. Section II of the Student Code of Conduct addresses the University’s policy on academic honesty, including provisions regarding plagiarism and cheating, unauthorized access to University materials, misrepresentation/falsification of University records or academic work, malicious removal, retention, or destruction of library materials, malicious/intentional misuse of computer facilities and/or services, and misuse of student identification cards. Incidents of alleged academic misconduct will be handled through the established procedures of the University Judiciary Program, which includes either an “informal” resolution by a faculty member, resulting in a grade adjustment, or a formal hearing procedure, which may subject a student to the Code of Conduct’s minimum one-semester suspension requirement.

Unauthorized Collaboration: Submission for academic credit of a work product, or a part thereof, represented as being one's own effort, which has been developed in substantial collaboration with, or with assistance from, another person or source, is a violation of academic honesty. It is also a violation of academic honesty to knowingly provide such assistance. Collaborative work specifically authorized by a faculty member is allowed.
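
Illustration of the grading arithmetic: The point scheme in the Grading section (24 + 240 + 44 + 42 = 350 points, the 40%/60% presentation formula, and the peer-weighted project report grade) can be sketched in a few lines of Python. This is only an illustrative sketch, not course policy: the function names are hypothetical, the sample scores are invented, and two readings are assumed here, namely that the individual report grade is the team grade multiplied by the average peer percentage, and that a fractional total below a cutoff falls into the lower letter grade.

```python
# Hypothetical helpers illustrating the syllabus point arithmetic.
# Function names and the treatment of fractional totals are assumptions.

MAX_POINTS = {
    "attendance": 24,
    "homework": 6 * 40,      # 6 assignments at 40 points each
    "project_report": 44,
    "presentation": 42,
}


def presentation_score(instructor_pts, classmate_pts):
    """(instructor assessment) * 40% + (average classmate assessment) * 60%."""
    peer_avg = sum(classmate_pts) / len(classmate_pts)
    return 0.4 * instructor_pts + 0.6 * peer_avg


def individual_report_score(team_pts, peer_percentages):
    """Assumed reading: team grade scaled by the average peer rating (0-100%)."""
    if not peer_percentages:  # one-person team: no peer evaluation applies
        return float(team_pts)
    avg_pct = sum(peer_percentages) / len(peer_percentages)
    return team_pts * avg_pct / 100.0


def letter_grade(total_pts):
    """Map a point total (out of 350) to a letter using the syllabus cutoffs."""
    for cutoff, letter in ((315, "A"), (280, "B"), (245, "C"), (210, "D")):
        if total_pts >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    # Invented sample scores for one student.
    total = (
        22                                          # attendance
        + 225                                       # homework
        + individual_report_score(41, [100, 95])    # project report
        + presentation_score(38, [36, 40, 37])      # presentation
    )
    print(round(total, 1), letter_grade(total))
```

The cutoffs are checked from the top down, so each letter covers the range from its cutoff up to just below the next one, matching the ranges listed under Grading process for whole-number totals.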