MIST.606 Section 201 – Fall 2015
Business Intelligence and Data Mining
Wednesday 3:30 PM – 6:20 PM

Instructor: Xiaobai (Bob) Li, PhD
Office: Southwick 201B
Phone: 978-934-2707
E-mail: xiaobai_li@uml.edu
Web address: http://faculty.uml.edu/xli/Courses/MIST.606(63.755)/
Office hours: Tue/Thu 3:00 PM – 3:30 PM and 4:45 PM – 6:00 PM; Wed 3:00 PM – 3:30 PM and 6:20 PM – 7:00 PM; and by appointment

Texts:
Required: Ian Witten, Eibe Frank, Mark Hall (WFH), Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2011, ISBN: 978-0-12-374856-0.
Required: Viktor Mayer-Schönberger, Kenneth Cukier (MSC), Big Data: A Revolution That Will Transform How We Live, Work, and Think. Paperback (2014), ISBN: 978-0544227750; or hardcover (2013), ISBN: 978-0544-00269-2.
Recommended: Foster Provost, Tom Fawcett (PF), Data Science for Business. O'Reilly Media, 2013, ISBN: 978-1-449-36132-7.

Catalog Description: This course introduces the concepts and technologies of business intelligence and data mining. It studies how organizations can use data-oriented business intelligence techniques to gain competitive advantage, as well as how to design and develop these techniques. Topics include classification, clustering, association analysis, prediction, and text and web mining. Ethical issues related to data mining will also be discussed.

Place in Curriculum and Course Prerequisite: This course is required for all students in the Master of Science in Business Analytics (MSBA) program at the Manning School of Business (MSB). It is also one of the courses that satisfy the “Managing Systems and Technology” area requirement in the MBA curriculum at MSB. The prerequisite is 49.211 (Statistics) or an equivalent course, which is a prerequisite for all MSBA and MBA courses. Completion of 49.211 (or an equivalent statistics course) is strictly required.

Course Overview: Data mining is the systematic process of discovering hidden patterns and knowledge in data.
In this course, we study how to analyze, design, and develop data-driven business intelligence technologies. Importantly, we study how organizations can employ these technologies to leverage their data assets, better understand and serve their customers, and thereby gain competitive advantage. Applications to both strategic and operational problems will be discussed. Topics include Big Data, data transformation, classification, prediction, association, clustering, text mining, model and performance evaluation, and privacy and ethical issues related to data mining.

Course Objectives: Upon completion of the course, students should demonstrate a good understanding of the concepts of data mining and business intelligence, develop analytical thinking skills and an educated instinct for dissecting and exploring data, gain solid skills in using data-mining techniques and tools, and be able to solve real-world data-driven decision problems at the strategic, tactical, and operational levels.

Teaching Methods: Lectures, discussion, and case analysis.

Attendance and Participation: Students are expected to attend class regularly and to read the assigned materials before each class. Each student is required to sign up for one class topic and present an introduction to that topic in the class scheduled for it. The length and format of the presentation are flexible (between 10 and 20 minutes, preferably without PowerPoint slides).

Assignments and Exams: Assignments include problem solving and case analysis. All assignments are due at the start of class on the due date. Assignments are demanding, so students should start each one well before the due date. No make-up exams will be given unless a written request, with a university-approved excuse, is submitted one week before the exam.

Project: The course project requires students to apply data-mining techniques learned in class to real-life problems. Students will work in groups on the project.
Detailed information about the project will be provided a few weeks after the start of the semester.

Labs and Software: We will use Weka (http://www.cs.waikato.ac.nz/ml/weka/), an open-source data-mining package written in Java that accompanies the WFH book. We may also use a commercial package, IBM SPSS Modeler. A few lab classes (meeting in PA 205) may be held to give students hands-on experience with the data-mining packages.

Grading:
Participation and Attendance   10%
Assignments                    25%
Project                        20%
Midterm Exam                   20%
Final Exam                     25%

Letter grades:
A+  97 – 100      B+  87 – 89.9     C+  70 – 79.9
A   93 – 96.9     B   83 – 86.9     C   60 – 69.9
A-  90 – 92.9     B-  80 – 82.9     F   below 60

Class Schedule (Subject to Change):
Date    Topics (Text Chapters)
09/02   Course Introduction
09/09   Overview of Data Mining; Data Input; Big Data: Now, More, Messy
        Text: WFH Ch. 1-2; PF Ch. 1-2; MSC Ch. 1-3
09/16   Decision Trees
        Text: WFH Ch. 3.3, 3.4 (pp. 67-72), 4.3, 4.4; PF Ch. 3
09/23   Training, Testing, and Cross-Validation; Decision Trees
        Text: WFH Ch. 5.1, 5.2, 5.3, 6.1; PF Ch. 5 (pp. 111-117, 126-129, 133-135)
09/30   Naïve Bayes
        Text: WFH Ch. 4.2; PF Ch. 9
10/07   Nearest Neighbors
        Text: WFH Ch. 3.5, 4.7 (pp. 131-132); PF Ch. 6 (pp. 141-159)
10/14   Big Data: Correlation; Model and Performance Evaluation; Data Transformations
        Text: MSC Ch. 4; PF Ch. 7; WFH Ch. 7.1, 7.2 (pp. 314-316), 7.6 (pp. 338-339)
10/21   Midterm Exam; Association Rules
        Text: WFH Ch. 3.4 (pp. 72-73), 4.5; PF Ch. 12 (pp. 291-304)
10/28   Clustering
        Text: WFH Ch. 3.6, 4.8, 6.8 (pp. 273-279); PF Ch. 6 (pp. 163-175)
11/04   Regression; Project Proposal Due
        Text: WFH Ch. 3.2 (p. 62), 4.6 (pp. 124-125); PF Ch. 4 (pp. 95-97)
11/11   No Class (Veterans Day)
11/18   Regression; Big Data: Datafication, Value, Implications
        Text: Handout; MSC Ch. 5-7
11/25   Text Mining; Data Privacy
        Text: WFH Ch. 7.3 (pp. 328-329), 9.5; PF Ch. 10; MSC Ch. 8-10
12/02   Project Presentation
12/09   Final Exam
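As a worked illustration of the grading weights and letter-grade scale listed earlier, the short Python sketch below combines component scores into a weighted total and maps that total to a letter. It assumes every component is scored on a 0-100 scale and that a total falling between two listed ranges (for example, 96.95) receives the letter whose lower bound it meets. The names WEIGHTS, CUTOFFS, weighted_total, and letter_grade, and the sample scores, are hypothetical and not part of the course materials.

```python
# Hypothetical helper illustrating the syllabus grading scheme.
# Weights copied from the grading table; they sum to 100%.
WEIGHTS = {
    "participation": 0.10,
    "assignments": 0.25,
    "project": 0.20,
    "midterm": 0.20,
    "final": 0.25,
}

# Lower bound of each letter range, highest first.
# Assumption: a total between two listed ranges (e.g. 96.95) gets the
# letter whose lower bound it meets (here, A).
CUTOFFS = [
    (97.0, "A+"),
    (93.0, "A"),
    (90.0, "A-"),
    (87.0, "B+"),
    (83.0, "B"),
    (80.0, "B-"),
    (70.0, "C+"),
    (60.0, "C"),
]


def weighted_total(scores: dict) -> float:
    """Combine component scores, each assumed to be on a 0-100 scale."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(total: float) -> str:
    """Map a 0-100 total to a letter using the cutoffs above."""
    for lower_bound, letter in CUTOFFS:
        if total >= lower_bound:
            return letter
    return "F"  # anything below 60


# Hypothetical sample scores for one student.
example = {
    "participation": 100,
    "assignments": 90,
    "project": 88,
    "midterm": 80,
    "final": 85,
}
total = weighted_total(example)
print(f"{total:.2f} {letter_grade(total)}")  # 87.35 B+
```

Here 10 + 22.5 + 17.6 + 16 + 21.25 = 87.35, which falls in the B+ range (87 – 89.9). Scanning the cutoffs from highest to lowest keeps the mapping to a single comparison per letter.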