Course Title: Data Mining / Predictive Analytics
Quarter/Year: Spring Quarter, 2013
Course Number, Section, CRN: STAT 3880 CRN 2016 Sect. 1 / INFO 4300 CRN 4865 Sect. 1
Prerequisites: STAT 1400 Statistics II, INFO 1010 Business Statistics & Analytics or STAT 4610 – Business Statistics
Meeting Place and Time: DCB 240, MW 4:00pm-5:50pm
Name of Professor: Dr. Kellie Keeling
Office Hours: MW 1:30p-3:45p, Virtual Office Hours as Posted, and by Appointment
Discussion Board: The General Questions area should be used to ask questions that may be relevant to all the students in the course. The instructor will log on to the discussion board nearly every day.
Office Location: DCB 590
E-Mail Address: Moo@statsdairy.com
Phone Number: 303-871-2296 (but please e-mail to leave me a message, as my phone eats messages)
Class Web Presence: http://statsdairy.com and http://blackboard.du.edu/

Introduction:
This is a blended course. That means we will meet face-to-face on Mondays and Wednesdays, but a substantial portion of the course material will be delivered and completed "online." Attendance at the face-to-face meetings and participation in all online activities are required. You will find that the online and face-to-face elements of this course are interdependent and integrated. Online participation is required every week: you will be expected to go online to complete reading quizzes, watch lectures, and participate in discussion. You will be assigned to a group, and completion of some activities will require group interaction. Some activities will be face-to-face and some will be online. If you miss a face-to-face class for a legitimate reason, you may complete the in-class group assignment on your own and submit it at the following class.

ALL STUDENTS NEED TO FOLLOW THESE EXPECTATIONS:

University of Denver Honor Code
All students are expected to abide by the University of Denver Honor Code.
These expectations include the application of academic integrity and honesty in your class participation and assignments. The Honor Code can be viewed in its entirety at this link: http://www.du.edu/studentlife/studentconduct/index.html

All members of the University of Denver are expected to uphold the values of Integrity, Respect, and Responsibility. These values embody the standards of conduct for students, faculty, staff, and administrators as members of the University community. In order to foster an environment of ethical conduct in the University community, all community members are expected to take “constructive action,” that is, any effort to discuss or report any behavior contrary to the Honor Code with a neutral party. Failure to do so constitutes a violation of the DU Honor Code.

Specifically, plagiarism and cheating constitute academic misconduct and can result in both a grade penalty imposed by the instructor and disciplinary action including suspension or expulsion. As part of their responsibility to uphold the Honor Code, instructors reserve the right to have papers submitted through SafeAssign to check for plagiarism against a database of papers submitted previously at DU, a national database of papers, and the Internet.

Official Communications
The standard method of communicating official information from the Daniels College of Business to its students is through email. Students are provided a DU account using the protocol of firstname.lastname@du.edu, but must set up a "preferred" off-campus email address. Emails sent to the DU account will be forwarded to the preferred email account. DU accounts do not store messages. More information is available at: http://www.du.edu/studentemail/.

Students with Disabilities
A student who qualifies for academic accommodations because of a disability must submit a Faculty Letter to the instructor from the DU Disability Services Program (DSP) in a timely manner, so that the needs of the student can be addressed.
Accommodations will not be provided retroactively, e.g., following an exam or after the due date of a project. DSP determines eligibility for accommodations based on documented disabilities. DSP is located in Ruffatto Hall, 1999 E. Evans Ave. (303-871-2278). http://www.du.edu/studentlife/disability/dsp/index.html

Performance Assessment
The Daniels College of Business may use assessment tools in this course and other courses for evaluation. Educational Assessment is defined as the systematic collection, interpretation, and use of information about student characteristics, educational environments, learning outcomes and client satisfaction to improve program effectiveness, student performance and professional success.

Conflicts of Interest, including Gifts from Students
The University of Denver requires all employees to avoid real or perceived conflicts of interest. Because of possible perceptions of undue influence, it is not appropriate for a student to give a gift to a faculty member while the student is still enrolled in that faculty member’s class, including through the grading period.

Emergency Procedures
The College places great emphasis on the safety of its students. Please respect emergency instructions, including fire alarms. For more information, go to http://www.du.edu/emergency/whattodowhen/index.html .

Daniels Areas of Interdisciplinary Strength: This course will cover statistical ethics.

REQUIRED COURSE MATERIALS:

Course Description
This course is designed to prepare you for managerial data analysis and data mining. More specifically, the course addresses the how, when, why, and where of data mining. The emphasis is on understanding the application of a wide range of modern techniques to specific decision-making situations, rather than on mastering the theoretical underpinnings of the techniques.
Upon successful completion of the course, you should be able to perform the computational processes necessary to extract information from multi-dimensional data and transform it into knowledge that can lead to improved business performance. The course covers methods that are aimed at prediction, forecasting, classification, clustering, and association. Students will gain hands-on experience in using computer software to mine business data sets.

Required Materials
• Clicker
  o Turning Point response system device. These are available new & used from the bookstore.
• Textbooks
  o Free Online Book: Discovering Knowledge in Data: An Introduction to Data Mining. Larose, John Wiley & Sons. 2005. http://0library.books24x7.com.bianca.penlib.du.edu/toc.asp?bookid=12378 (I will post pdf files from this book for you on Blackboard)
  o Supplemental Book (Not Required): Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, 2nd Ed., Berry and Linoff, Wiley Publishing, 2004
• Software
  o Microsoft Excel (2007/2010/2013)
  o JMP 9.0 (disks will be passed around the first day)
  o Camtasia Relay - I will send you information about downloading
  o Other freely available products including R, Weka, and WebCrawler

Course Assessment
Performance will be evaluated on the items below. For this class, all assignments and exams assume you are trainees for Stats Dairy. Your training score is only a measure of your performance in this class and does not reflect my opinion of you as an individual or your worth as a person.

Module Exam 1                              15%
Module Exam 2                              20%
Module Exam 3                              20%
Group Mini-Example/Peer Feedback           10%
Reading/Video Quizzes                      15%
Group Project                              10%
Mini Assignments/Attendance (Clickers)     10%
Total                                     100%

Grades:
Stats Dairy regularly hires more data mining trainees than it needs. By means of this course we determine where to place the graduates of the program:

90% - 100% A
Trainees who receive an A are considered on the "fast track" and will start out as data mining analysts.
Our studies show that most trainees who fall in this group reach an executive position within 10 years.

80% - 89% B
Trainees who receive a B will start out as assistant data mining analysts. This does not mean that they cannot reach the executive level, but it will be more difficult since they will not regularly be put into career-enhancing positions such as overseas consulting assignments.

70% - 79% C
Trainees who receive a C will be put into staff positions for further development.

60% - 69% D
Trainees who receive a D will be offered non-management positions.

0% - 59% F
Trainees who receive an F will be separated from Stats Dairy.

Grading scale: A: 93-100%; A-: 90-92.9%; B+: 87-89.9%; B: 83-86.9%; B-: 80-82.9%; etc.

Course Assignments
Overall Description: This course is set up as a blended course. Therefore, in addition to meeting face-to-face, we will conduct course activities through Blackboard. Here is a general outline of what to expect each class for the course:
1. Read/View the assigned reading material for class (posted on Blackboard).
   • Complete the “check for understanding” multiple-choice reading quiz for Monday’s reading before class.
2. Watch the lecture video for class (posted on Blackboard).
   • Complete the quiz within the video.
3. Attend class.
   • Complete the in-class group mini assignment.
   • Participate using clickers during in-class quizzing.
   • Watch Group Mini-Example Presentation. (starts week 2)
     o Post your evaluation comments.
4. If you are a group presenting a mini-example, prepare your presentation before class and present it during your assigned class.

Details about each graded component

Exams: Exams will be completed in two parts: on paper without notes and on your computer with your notes. Calculators will be permitted. Cell phones cannot be used as calculators. If you are going to miss an exam for a legitimate conflict, you must receive permission from me BEFORE the exam in order to reschedule.
Otherwise you will receive a zero on the exam. No make-up exams will be given. For the portion of the exam taken on your personal computer, you will be required to record what you are doing using Camtasia Relay.

Reading Quizzes: For all class days except exam days, a reading quiz will be due before class. These reading quizzes are multiple-choice and cover assigned reading materials. The lowest three scores will be dropped. Late work will be accepted with a 31% penalty up to one week late. Note: the due dates for the first week will be extended.

Video Quizzes: For all class days except exam days, short lecture videos will be posted for you to watch before class. The videos will be posted 2 days before class (starting week 2). While watching the video, you need to complete a quiz. Note: the due dates for the first week will be extended.

Group Mini-Examples: Each group is responsible for one mini-example presentation during class. You will sign up for a date on the first day of class. These will start during the second week of class.

Group Peer Feedback: Each person should comment on each group Mini-Example presentation through the link on Blackboard.

Mini Assignments: During class time an assignment will be given that supplements the topics learned by watching recorded lectures and reading supplied materials. These can be completed in small groups or individually. These are due at the end of class or at the beginning of the following class. The lowest three scores of these mini assignments will be dropped. In addition, participation in clicker quizzes will be a part of this grade.

Group Project: There will be a group project assigned during the final module of the class. The projects will be shared with the class on the final day of class.

Communication
If you are having difficulty with the course material, please see me at your earliest convenience. Do not wait until the first exam to see me about difficulties you are experiencing in comprehending the course material.
Do not allow yourself to fall behind in covering the assigned material as this will most certainly result in a poor course grade. Keep up with your assignments and the readings in the text!

Classroom Environment
The optimal learning environment may be impaired significantly when the class as a whole is distracted from its intended focus by the actions of a few. Accordingly, classroom computers should be used only as directed by the instructor. Also, in-class use of cell phones, beepers and other devices that potentially may create classroom distractions is prohibited (e.g., cell phones must be set on silent). Further, the behavior of each member in the class must be conducive to the learning of the class as individuals and as a whole.

Course Schedule – Details here: http://statsdairy.com/INFO4300/schedule.html

Schedule columns: Date; Principal content elements; Learning outcomes. The numbers after each learning outcome refer to the Learning Outcomes Classifications at the end of the schedule.

Class dates: M Mar 25, W Mar 27, M Apr 1, W Apr 3, M Apr 8, W Apr 10, M Apr 15, W Apr 17, M Apr 22, W Apr 24, M Apr 29, W May 1, M May 6, W May 8, M May 13, W May 15, M May 20, W May 22, M May 27, W May 29, M Jun 3

MODULE 1: PREDICTION
Principal content elements:
• Introduction to JMP
• Data Preprocessing
• Exploratory Data Analysis
• Ethics in Data Mining
• Multiple Regression and Model Evaluation Techniques
• General Linear Models
• Module 1 Exam
Learning outcomes:
• Define Data Mining and its basic terminology. (1)
• Describe and perform data cleaning and preparation methods. (2, 3)
• Identify and perform the steps in the data mining process. (1, 3)
• Describe, summarize, and display information in a data set. (3, 4, 5)
• Discuss the ethics that are involved with data mining. (2, 6)
• Explain and use performance metrics to evaluate data mining models. (2, 6)
• Create and interpret multiple regression models. (3, 4)

MODULE 2: PREDICTION/CLASSIFICATION
Principal content elements:
• Time series forecasting
• Principal Components Analysis
• Introduction to Data Mining Classification Methods
• k-nearest neighbor
• Logistic Regression
• Classification and Regression Trees (CART) / Decision Trees
• Module 2 Exam (W May 1)
Learning outcomes:
• Explain, apply, and interpret forecasting models. (2, 3, 4)
• Explain, apply, and interpret principal components analysis. (2, 3, 4)
• Explain, apply, and interpret three simple classification methods. (2, 3, 4)
• Explain, apply, and interpret logistic regression models. (2, 3, 4)
• Explain, apply, and interpret classification and regression trees (CART). (2, 3, 4)

MODULE 3: CLASSIFICATION, CLUSTERING, and ASSOCIATION
Principal content elements:
• Discriminant Analysis
• Cluster Analysis
• Neural Networks
• Association Rules
• Text Mining
• Module 3 Exam (W May 22)
Learning outcomes:
• Explain, apply, and interpret discriminant analysis models. (2, 3, 4)
• Explain, apply, and interpret cluster analysis models. (2, 3, 4)
• Explain, apply, and interpret association rules. (2, 3, 4)
• Explain, apply, and interpret neural network models. (2, 3, 4)

End of quarter:
• M May 27: Memorial Day
• W May 29: Project Work Day
• M Jun 3: Project Presentations

Learning Outcomes Classifications: 1. Knowledge, 2. Comprehension, 3. Application, 4. Analysis, 5. Synthesis, 6. Evaluation
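
Worked example of the Course Assessment arithmetic: the short Python sketch below shows one way the category weights listed under Course Assessment and the "lowest three scores will be dropped" rule could be combined into a course percentage. It is an illustration only, not the instructor's actual gradebook calculation. The function names are hypothetical, every score is assumed to be on a 0-100 scale, and quizzes within a category are assumed to count equally (the syllabus does not say how reading and video quizzes are combined).

```python
# Weights copied from the Course Assessment list (percent of the final grade).
WEIGHTS = {
    "Module Exam 1": 15,
    "Module Exam 2": 20,
    "Module Exam 3": 20,
    "Group Mini-Example/Peer Feedback": 10,
    "Reading/Video Quizzes": 15,
    "Group Project": 10,
    "Mini Assignments/Attendance (Clickers)": 10,
}


def drop_lowest_mean(scores, n_drop=3):
    """Average the scores after discarding the n_drop lowest ones.

    Assumption: all scores in the category carry equal weight.
    """
    kept = sorted(scores)[n_drop:]
    if not kept:
        raise ValueError("need more scores than the number dropped")
    return sum(kept) / len(kept)


def course_percentage(component_scores):
    """Weighted course total on a 0-100 scale.

    component_scores maps each category name in WEIGHTS to a 0-100 score.
    """
    if sum(WEIGHTS.values()) != 100:
        raise ValueError("weights must sum to 100")
    total = sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)
    return total / 100


# Hypothetical trainee: exams of 90, 80, 70 and full marks everywhere else.
example = {name: 100 for name in WEIGHTS}
example.update({"Module Exam 1": 90, "Module Exam 2": 80, "Module Exam 3": 70})
print(course_percentage(example))  # 88.5
```

With these made-up scores the total is 88.5%, which falls in the B+ band (87-89.9%) of the grading scale. The late-work penalty is left out because the syllabus does not say whether it is taken from the earned score or from the maximum score.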