Syllabus

MIST.606 Section 201 – Fall 2015
Business Intelligence and Data Mining
Wednesday 3:30 PM – 6:20 PM
Instructor:
Xiaobai (Bob) Li, PhD
Office: Southwick 201B
Phone: 978-934-2707
E-mail: xiaobai_li@uml.edu
Web address: http://faculty.uml.edu/xli/Courses/MIST.606(63.755)/
Office hours: Tue/Thu 3:00 PM – 3:30 PM; 4:45 PM – 6:00 PM;
Wed. 3:00 PM – 3:30 PM; 6:20 PM – 7:00 PM; and by appointment
Text:
Required: Ian Witten, Eibe Frank, Mark Hall (WFH): Data Mining: Practical Machine
Learning Tools and Techniques. Morgan Kaufmann, 2011, ISBN: 978-0-12-374856-0.
Required: Viktor Mayer-Schönberger, Kenneth Cukier (MSC): Big Data: A Revolution That
Will Transform How We Live, Work, and Think. Paperback (2014), ISBN: 978-0-544-22775-0;
or Hardcover (2013), ISBN: 978-0-544-00269-2.
Recommended: Foster Provost, Tom Fawcett (PF): Data Science for Business. O'Reilly
Media, 2013, ISBN: 978-1-449-36132-7.
Catalog Description:
This course introduces the concepts and technologies of business intelligence and data
mining. The course studies how data-oriented business intelligence techniques can be used
by organizations to gain competitive advantages, as well as how to design and develop these
techniques. Topics include classification, clustering, association analysis, prediction, and
text and web mining. Data-mining related ethical issues will also be discussed.
Place in Curriculum and Course Prerequisite:
This course is required for all students in the Master of Science in Business Analytics
(MSBA) program at Manning School of Business (MSB). It also satisfies the
“Managing Systems and Technology” area requirement in the MBA curriculum
at MSB. The prerequisite is 49.211 (Statistics) or equivalent, which is a prerequisite
for all MSBA and MBA courses. Completion of 49.211 (or an equivalent
statistics course) is strictly required.
Course Overview:
Data mining is the systematic process of discovering hidden patterns and knowledge in data.
In this course, we study how to analyze, design and develop data-driven business
intelligence technologies. Importantly, we study how the technologies can be employed by
organizations to leverage their data assets in order to better understand and serve their
customers, and thus gain competitive advantages. Applications to both strategic and
operational problems will be discussed. Topics include Big Data, data transformation,
classification, prediction, association, clustering, text mining, model and performance
evaluation, and data-mining related privacy and ethical issues.
Course Objectives:
Upon completion of the course, students should
• demonstrate a good understanding of the concepts of data mining and business intelligence,
• develop analytical thinking ability and educated instinct for dissecting and exploring data,
• gain solid skills in using data-mining techniques and tools, and
• be able to solve real-world data-driven decision problems at strategic, tactical, and
operational levels.
Teaching Methods:
Lecture, discussion, and case analysis.
Attendance and Participation:
Students are expected to attend class regularly and to read the class materials for each class
in advance. Each student is required to sign up for one class topic and present an
introduction to that topic in the class scheduled for it. The length and format of the
presentation are flexible (between 10 and 20 minutes, preferably without PowerPoint slides).
Assignments and Exams:
Assignments include problem solving and case analysis. All assignments are due at the start
of the class on the due date. Assignments are time-consuming, so students should start each
assignment well before the due date. No make-up exams will be given unless a written
request, with a university-approved excuse, is submitted one week before the exam.
Project:
The course project requires students to apply data-mining techniques learned in class to
real-life problems. Students will work in groups on the project. Detailed information about the
project will be provided a few weeks after the start of the semester.
Labs and Software:
We will use an open-source package associated with the WFH book, called Weka
(http://www.cs.waikato.ac.nz/ml/weka/), which is written in Java. We may also use a
commercial package called IBM SPSS Modeler. There may be a few lab classes (meeting in PA
205) to give students hands-on experience with the data-mining packages.
Grading:
Participation and Attendance   10%
Assignments                    25%
Project                        20%
Midterm Exam                   20%
Final Exam                     25%

Letter Grade Scale:
A+   97 – 100
A    93 – 96.9
A-   90 – 92.9
B+   87 – 89.9
B    83 – 86.9
B-   80 – 82.9
C+   70 – 79.9
C    60 – 69.9
F    below 60
Class Schedule (Subject to Change):
Date    Topics and Text Chapters

09/02   Course Introduction

09/09   Overview of Data Mining; Data Input; Big Data: Now, More, Messy
        Text: WFH Ch. 1-2; PF Ch. 1-2; MSC Ch. 1-3

09/16   Decision Trees
        Text: WFH Ch. 3.3, 3.4 (pp.67-72), 4.3, 4.4; PF Ch. 3

09/23   Training, Testing, and Cross-Validation; Decision Trees
        Text: WFH Ch. 5.1, 5.2, 5.3, 6.1; PF Ch. 5 (pp.111-117, 126-129, 133-135)

09/30   Naïve Bayes
        Text: WFH Ch. 4.2; PF Ch. 9

10/07   Nearest Neighbors
        Text: WFH Ch. 3.5, 4.7 (pp.131-132); PF Ch. 6 (pp.141-159)

10/14   Big Data: Correlation; Model and Performance Evaluation; Data Transformations
        Text: MSC Ch. 4; PF Ch. 7; WFH Ch. 7.1, 7.2 (pp.314-316), 7.6 (pp.338-339)

10/21   Midterm Exam; Association Rules
        Text: WFH Ch. 3.4 (pp.72-73), 4.5; PF Ch. 12 (pp.291-304)

10/28   Clustering
        Text: WFH Ch. 3.6, 4.8, 6.8 (pp.273-279); PF Ch. 6 (pp.163-175)

11/04   Regression; Project Proposal Due
        Text: WFH Ch. 3.2 (p.62), 4.6 (pp.124-125); PF Ch. 4 (pp.95-97)

11/11   No Class (Veterans Day)

11/18   Regression; Big Data: Datafication, Value, Implications
        Text: Handout; MSC Ch. 5-7

11/25   Text Mining; Data Privacy
        Text: WFH Ch. 7.3 (pp.328-329), 9.5; PF Ch. 10; MSC Ch. 8-10

12/02   Project Presentation

12/09   Final Exam