Course Title: Data Mining I
Quarter/Year: Spring 2012
Course Number, Section and CRN: STAT 3880/4880 Sect. 1, CRN 2176 / Sect. 1, CRN 2177
Prerequisites: STAT 1400 – Statistics II or STAT 4610 – Quantitative Methods
Meeting Place and Time: DCB 130, 4:00pm-5:50pm
Name of Professor: Dr. Kellie Keeling
Office Hours: TT 9:00-10:00, 2:00-4:00, Virtual Office Hours as Posted, and by Appointment
Discussion Board: The General Questions area should be used to ask questions that may be relevant to all the students in the course. The instructor will log on to the discussion board nearly every day.
Office Location: DCB 590
E-Mail Address: DM@statsdairy.com
Phone Number: 303-871-2296
Class Web Presence: http://statsdairy.com and http://blackboard.du.edu/

Introduction:
This is a blended course. That means we will meet face-to-face on most Tuesdays and Thursdays, but a substantial portion of the course material will be delivered and completed "online." Attendance at the face-to-face meetings and participation in all online activities is required. You will find that the online and face-to-face elements of this course are interdependent and integrated.

Online participation is required every week: you will be expected to go online to continue discussion or complete other activities. You will be assigned to a group, and completion of some activities will require group interaction. Some activities will be face-to-face and some will be online. If you miss a face-to-face class for a legitimate reason, you may complete the in-class group assignment on your own and submit it at the following class.

ALL STUDENTS NEED TO FOLLOW THESE EXPECTATIONS:

University of Denver Honor Code
All students are expected to abide by the University of Denver Honor Code. These expectations include the application of academic integrity and honesty in your class participation and assignments.
The Honor Code can be viewed in its entirety at this link: http://www.du.edu/ccs/honorcode.html

All members of the University of Denver are expected to uphold the values of Integrity, Respect, and Responsibility. These values embody the standards of conduct for students, faculty, staff, and administrators as members of the University community. In order to foster an environment of ethical conduct in the University community, all community members are expected to take “constructive action,” that is, any effort to discuss or report any behavior contrary to the Honor Code with a neutral party. Failure to do so constitutes a violation of the DU Honor Code. Specifically, plagiarism and cheating constitute academic misconduct and can result in both a grade penalty imposed by the instructor and disciplinary action, including suspension or expulsion. As part of their responsibility to uphold the Honor Code, instructors reserve the right to have papers submitted through SafeAssign to check for plagiarism against a database of papers previously submitted at DU, a national database of papers, and the Internet.

Official Communications
The standard method of communicating official information from the Daniels College of Business to its students is email. Students are provided a DU account using the protocol firstname.lastname@du.edu, but must set up a "preferred" off-campus email address. Emails sent to the DU account will be forwarded to the preferred email account; DU accounts do not store messages. More information is available at: http://www.du.edu/studentemail/.

Students with Disabilities
A student who qualifies for academic accommodations because of a disability must submit a Faculty Letter from the DU Disability Services Program (DSP) to the instructor in a timely manner, so that the needs of the student can be addressed. Accommodations will not be provided retroactively, e.g., following an exam or after the due date of a project.
DSP determines eligibility for accommodations based on documented disabilities. DSP is located in Ruffatto Hall, 1999 E. Evans Ave. (303-871-2278). http://www.du.edu/studentlife/disability/dsp/index.html

Performance Assessment
The Daniels College of Business may use assessment tools in this course and other courses for evaluation. Educational Assessment is defined as the systematic collection, interpretation, and use of information about student characteristics, educational environments, learning outcomes and client satisfaction to improve program effectiveness, student performance and professional success.

Gifts from Students
Because of possible perceptions of undue influence, it is not appropriate for a student to give a gift to a faculty member while the student is still enrolled in that faculty member’s class, including through the grading period. As a general rule, Daniels discourages the giving of gifts between students and faculty.

Emergency Procedures
The College places great emphasis on the safety of its students. Please respect emergency instructions, including fire alarms. For more information, go to http://www.du.edu/emergency/whattodowhen/index.html

REQUIRED COURSE MATERIALS:

Course Description
This course is designed to prepare you for managerial data analysis and data mining. More specifically, the course addresses the how, when, why, and where of data mining. The emphasis is on understanding the application of a wide range of modern techniques to specific decision-making situations, rather than on mastering the theoretical underpinnings of the techniques. Upon successful completion of the course, you should be able to perform the computational processes necessary to extract information from multidimensional data and transform it into knowledge that can lead to improved business performance. The course covers methods that are aimed at prediction, forecasting, classification, clustering, and association.
Students will gain hands-on experience in using computer software to mine business data sets.

Beyond Grey Pinstripes:
In this course we will discuss issues around ethics in data mining as related to Business and Society. Specific concerns are misleading statistics and graphs, and personally identifiable information that could lead to discrimination, privacy abuse, and telemarketing abuse.

Required Materials
• Clicker response system device – These are available new & used from the bookstore.
• Textbooks
  Free Online Book: Discovering Knowledge in Data: An Introduction to Data Mining. Larose, John Wiley & Sons. 2005. http://0-library.books24x7.com.bianca.penlib.du.edu/toc.asp?bookid=12378 (I will post pdf files from this book for you on Blackboard.)
  Supplemental Book (Not Required): Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, 2nd Ed., Berry and Linoff, Wiley Publishing, 2004
• Software
  Microsoft Excel (2007/2010)
  JMP 9.0 (disks will be passed around the first day)
  Camtasia Relay (I will send you information about downloading it)
  Other freely available products we discover during the course, such as Weka and WebCrawler

Course Assessment
Performance will be evaluated on the items below. For this class, all assignments and exams assume you are trainees for Stats Dairy. Your training score is only a measure of your performance in this class and does not reflect my opinion of you as an individual or your worth as a person.

  Module Exam 1                15%
  Module Exam 2                20%
  Module Exam 3                20%
  Discussion participation     20%
  Mini Assignments              5%
  Reading Quizzes               5%
  Group Project                10%
  Real World Summary            5%
  Total                       100%

Grades:
Stats Dairy regularly hires more data mining trainees than it needs. By means of this course we determine where to place the graduates of the program:

  90% - 100%  A  Trainees who receive an A are considered on the "fast track" and will start out as data mining analysts. Our studies show that most trainees who fall in this group reach an executive position within 10 years.
  80% - 89%   B  Trainees who receive a B will start out as assistant data mining analysts. This does not mean that they cannot reach the executive level, but it will be more difficult since they will not regularly be put into career-enhancing positions such as overseas consulting assignments.
  70% - 79%   C  Trainees who receive a C will be put into staff positions for further development.
  60% - 69%   D  Trainees who receive a D will be offered non-management positions.
  0% - 59%    F  Trainees who receive an F will be separated from Stats Dairy.

NOTE: + and – grades are given according to the DU scale.

Course Assignments
Overall Description: This course is set up as a blended course. Therefore, in addition to meeting face-to-face, we will meet online in Blackboard for a variety of activities. Here is a general outline of what to expect each week in the course:
1. Read/View the assigned reading material for Tuesday (posted on Blackboard).
2. Complete the “check for understanding” multiple-choice reading quiz for Tuesday’s reading before class.
3. Attend class on Tuesday. Complete the in-class group mini assignment.
4. Post comments about the Discussion Board Analysis for the week (due Wednesday night).
5. Read/View the assigned reading material for Thursday (posted on Blackboard).
6. Complete the “check for understanding” multiple-choice reading quiz for Thursday’s reading before class.
7. Post a comment about the week’s real world summaries.
8. If you are the lead group, post your response to the Discussion Board Analysis by Monday night.

Details about each graded component
Exams: Exams will be completed in two parts: on paper without notes, and on your computer with your notes. Calculators will be permitted. Cell phones cannot be used as calculators.
If you are going to miss an exam for a legitimate conflict, you must receive permission from me BEFORE the exam in order to reschedule. Otherwise you will receive a zero on the exam. No make-up exams will be given. For the portion of the exam taken on your personal computer, you will be required to record what you are doing using Camtasia Relay.

Reading Quizzes: For all class days except exam days, a reading quiz will be due before class. These reading quizzes are multiple-choice and cover the assigned reading materials. The lowest three scores will be dropped.

Discussion Participation:
• Discussion Board Analysis: Each week a group will be assigned to be the “lead group” to post an initial analysis for a given problem. These must be posted by Monday night. The remaining groups should then respond to the analysis by Wednesday night. See the grading rubric on Blackboard under “Discussion Board Analysis Schedule.” Depending on class size, we may have two lead groups in a given week.
• Real World Discussion: Each person should comment on each real world summary presented by their fellow students. Students can comment on either one thing they liked or learned from the presentation or one suggestion to make the presentation better.
• Group Project Comments: Each person should comment on one good thing about each presentation and one suggestion for improvement or further analysis that could be completed.

Mini Assignments: During class time an assignment will be given that supplements the topics learned by watching recorded lectures and reading supplied materials. These can be completed in small groups or individually. They are due at the end of class or at the beginning of the following class. The lowest three mini assignment scores will be dropped. In addition, participation in clicker quizzes will be part of this grade.

Group Project: There will be a group project assigned during the final module of the class.
Executive summaries (5-7 minutes) of the groups' projects will be presented on the final day of class.

Real World Summaries: Each student will sign up for a day to post a real world example of the use of data mining. These summaries will be presented as a 2-minute PowerPoint presentation with audio recorded using Camtasia Relay. These can be accessed under Blackboard "Real World Summaries Schedule". The presentations will begin the second week of classes.

Communication
If you are having difficulty with the course material, please see me at your earliest convenience. Do not wait until the first exam to see me about difficulties you are experiencing in comprehending the course material. Do not allow yourself to fall behind in covering the assigned material, as this will almost certainly result in a poor course grade. Keep up with your assignments and the readings in the text!

Honor Code
You are expected to abide by the University's Honor Code on all assignments and exams. The Honor Code is meant to foster and advance an environment of ethical conduct in our academic community. The Code of Student Conduct contains information on the behavioral standards expected of all students at the University of Denver, including the areas of civility, community, integrity, and responsibility. Details can be found at: http://www.du.edu/ccs/

Classroom Environment
The optimal learning environment may be impaired significantly when the class as a whole is distracted from its intended focus by the actions of a few. Accordingly, classroom computers should be used only as directed by the instructor. Also, in-class use of cell phones, beepers, and other devices that may create classroom distractions is prohibited (e.g., cell phones must be set on silent). Further, the behavior of each member of the class must be conducive to the learning of the class, both as individuals and as a whole.
Students with Special Needs
If you need adaptations or accommodations because of a disability and are registered with the Disability Services Program (DSP), if you have emergency medical information to share with me, if you need special arrangements in case the building must be evacuated, or if you require rescheduling of an exam due to a religious holiday, please make an appointment to see me during the first week of class.

Learning Outcomes Classifications
The following 6 levels of Bloom’s Taxonomy are used to classify the learning outcomes [Bloom, B. S. (1956). Taxonomy of Educational Objectives, Handbook I: The Cognitive Domain. New York: David McKay Co Inc.]
1. Knowledge: Recall data or information. Ex: defines, describes, identifies, knows, labels, lists, matches, names, outlines, recalls, recognizes, reproduces, selects, states
2. Comprehension: Understand the meaning and interpretation of instructions and problems. State a problem in one's own words. Ex: comprehends, converts, defends, distinguishes, estimates, explains, extends, generalizes, summarizes, translates
3. Application: Use a concept in a new situation. Ex: applies, changes, computes, constructs, demonstrates, discovers, modifies, operates, predicts, relates, shows, solves, uses
4. Analysis: Separate material or concepts into component parts so that their organizational structure may be understood. Distinguish between facts and inferences. Ex: analyzes, breaks down, compares, contrasts, diagrams, differentiates, identifies, illustrates, infers, outlines, relates, selects, separates
5. Synthesis: Build a structure or pattern from diverse elements. Ex: categorizes, combines, compiles, composes, creates, devises, designs, explains, generates, modifies, organizes, plans, rearranges, reconstructs, relates, reorganizes, revises, rewrites, summarizes, tells, writes
6. Evaluation: Make judgments about the value of ideas or materials.
   Ex: appraises, compares, concludes, contrasts, criticizes, critiques, defends, describes, discriminates, evaluates, explains, interprets, justifies, relates, summarizes, supports

Course Schedule – Details here: http://www.statsdairy.com/dm/dmsch.html

MODULE 1: PREDICTION
  Tue Mar 27  Introduction to Data Mining
  Thu Mar 29  Introduction to JMP; Data Preprocessing
  Tue Apr 3   Exploratory Data Analysis; Ethics in Data Mining
  Thu Apr 5   Multiple Regression and Model Evaluation Techniques
  Tue Apr 10  General Linear Models
  Thu Apr 12  Module 1 Exam
Learning outcomes (Bloom levels 1-6):
• Define Data Mining and its basic terminology.
• Describe and perform data cleaning and preparation methods.
• Identify and perform the steps in the data mining process.
• Describe, summarize, and display information in a data set.
• Discuss the ethics that are involved with data mining.
• Create and interpret multiple regression models.
• Explain and use performance metrics to evaluate data mining models.

MODULE 2: PREDICTION/CLASSIFICATION
  Tue Apr 17  Time series forecasting
  Thu Apr 19  Principal Components Analysis
  Tue Apr 24  Classification Methods: k-nearest neighbor
  Thu Apr 26  Logistic Regression
  Tue May 1   Classification and Regression Trees (CART) / Decision Trees
  Thu May 3   Module 2 Exam
Learning outcomes (Bloom levels 2, 3, 4):
• Explain, apply, and interpret forecasting models.
• Explain, apply, and interpret principal components analysis.
• Explain, apply, and interpret three simple classification methods.
• Explain, apply, and interpret logistic regression models.
• Explain, apply, and interpret classification and regression trees (CART).

MODULE 3: CLASSIFICATION, CLUSTERING, and ASSOCIATION
  Tue May 8   Discriminant Analysis
  Thu May 10  Cluster Analysis
  Tue May 15  Neural Networks
  Thu May 17  Association Rules
  Tue May 22  Text Mining
  Thu May 24  Module 3 Exam
  Tue May 29  Project Work Days
  Thu May 31  Project Work Days
  Tue Jun 5   Project Presentations
Learning outcomes (Bloom levels 2, 3, 4):
• Explain, apply, and interpret discriminant analysis models.
• Explain, apply, and interpret cluster analysis models.
• Explain, apply, and interpret neural network models.
• Explain, apply, and interpret association rules.