Syllabus - Stevens Institute of Technology

Stevens Institute of Technology
Howe School of Technology Management
Syllabus
MGT787 Statistical Learning and Analytics
Semester:
Instructor name and contact information
Germán Creamer, Babbio 637
gcreamer@stevens.edu
Day of Week/Time
Monday, 6:15-8:45 PM, P120
Office Hours: Monday, 1:30-3:00 PM
Class Website: Canvas
Overview
The large amount of corporate information now available calls for a systematic and
analytical approach to selecting the most relevant information and anticipating major events.
Machine learning algorithms support this process by helping to understand, model, and
forecast the behavior of key corporate variables.
This course introduces statistical and graphical (machine learning) models used for
inference and prediction. The emphasis of the course is on the learning capability of the
algorithms and their application to several business areas.
Prerequisites: A basic course in probability and statistics at the level of MGT 620, or BIA
652 Multivariate Data Analytics.
Course Objectives
Students will:
1. Learn the fundamental concepts of statistical learning algorithms.
2. Explore existing and new applications of statistical learning methods to business
problems and to generic classification problems.
3. Learn to solve analytical problems in groups and communicate the results
effectively.
Relationship of Course to Rest of Curriculum
Students will have the opportunity to explore the main concepts of statistical learning that
will be used in the applied modules of this program.
Learning Goals
By the end of this course, students will be able to:
1. Understand the foundations of statistical learning algorithms.
2. Apply statistical models and analytical methods to several business domains using
a statistical language.
3. Recognize the value and the limits of statistical learning algorithms for solving
business problems.
Additional learning objectives include the development of:
1. Written and oral communication skills: students are required to communicate
effectively during class discussions and project presentations. Homework assignments
and the project report should be written as if they were being submitted to a senior
manager of a major corporation.
2. The ability to solve a major analytical problem using large and heterogeneous datasets
in a group project and to communicate the results professionally.
Pedagogy
The class will combine class presentations, discussions, exercises and case analysis to
motivate students and train them in the appropriate use of statistical and econometric
techniques.
Required Texts
Foster Provost and Tom Fawcett, Data Science for Business, O’Reilly, 2013.
Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
(a 2013 paperback edition is available from Amazon.com)
Case Pilgrim Bank A (602104), Harvard Business School
You must register on the following website, purchase the case, and download the related
documents: https://cb.hbsp.harvard.edu/cbmp/access/28615189
Optional Texts
Trevor Hastie, Robert Tibshirani and Jerome Friedman, The Elements of Statistical
Learning. Springer-Verlag, New York, 2010 (selected sections)
(downloadable at http://www-stat.stanford.edu/~tibs/ElemStatLearn/).
Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to
Information Retrieval, Cambridge University Press, 2008 (downloadable at
http://nlp.stanford.edu/IR-book).
R.O. Duda, P.E. Hart and D.G. Stork, Pattern Classification, John Wiley & Sons, 2001.
Tom M. Mitchell, Machine Learning, McGraw-Hill Series in Computer Science, 1997.
Vasant Dhar and Roger Stein, Seven Methods for Transforming Corporate Data into
Business Intelligence, Prentice Hall, Upper Saddle River, 1997.
Additional Free Texts
A. Rajaraman and J. Ullman, Mining of Massive Datasets (very useful for big data
problems).
Mohammed Zaki and Wagner Meira Jr., Data Mining and Analysis: Fundamental Concepts
and Algorithms (draft).
StatSoft, Electronic Statistics Textbook (statistics and data mining).
Roberto Battiti and Mauro Brunato, LIONbook: Learning and Intelligent Optimization
(introductory).
Assignments
The course has one main project and 4 data analysis assignments/cases. Assignments
must be submitted electronically through the course website before the beginning of class
on the due date. Each student must submit his/her own report. If you used a script or
wrote a program, you should also include the Readme, log, and code files. E-mail
submissions will not be accepted. Each assignment is worth 5 points.
Project
The project requires participants to build a decision support system (DSS) based on one
of the methods explored in this course. Each project must be developed by a group of
three students, and each group must present a project proposal at mid-semester.
PhD students should prepare an academic paper that counts as their final project for this
course. The above paper is a good example of a potential academic paper; however, the
paper should be oriented toward conferences such as the “Innovative Applications of
Artificial Intelligence Conference” or the “International Conference on Information
Systems.” The paper should be based on a theoretical or applied exploration of one of the
methods studied in this course, or of another data analysis method approved by the
instructor.
Grades
Component              Grade %
Assignments/cases        10%
Team project             35%
Participation            10%
Final exam               45%
Total grade             100%
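To make the weighting concrete, the short Python sketch below computes a total course grade as a weighted average of component scores. It assumes each component is scored on a 0-100 scale and uses the percentages listed in the Grades table; the function `total_grade` and the component keys are hypothetical names that only illustrate the arithmetic, not an official grading tool.

```python
# Component weights (percent), taken from the Grades table of this syllabus.
WEIGHTS = {
    "assignments_cases": 10,
    "team_project": 35,
    "participation": 10,
    "final_exam": 45,
}


def total_grade(scores: dict[str, float]) -> float:
    """Hypothetical helper: weighted course grade (0-100) from 0-100 component scores."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must provide exactly one value per component")
    if sum(WEIGHTS.values()) != 100:
        raise ValueError("weights must add up to 100%")
    # Weighted sum of scores, divided by 100 to convert percent weights to fractions.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


example = {
    "assignments_cases": 90,
    "team_project": 85,
    "participation": 100,
    "final_exam": 80,
}
print(total_grade(example))
```

With these example scores the total is 0.10 × 90 + 0.35 × 85 + 0.10 × 100 + 0.45 × 80 = 84.75.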
Software
Python is the main software package used in this course. You should attend the Python
bootcamp offered by the school at the beginning of the semester.
Class policy
Late Policy: 1 point is deducted for each day an assignment is late. No assignments are accepted more than 3 days late.
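As an illustration of the late policy, the Python sketch below applies a 1-point deduction per day late to a 5-point assignment and rejects submissions more than 3 days late. It assumes lateness is counted in whole days and that a score cannot drop below zero; the helper `late_adjusted_score` is hypothetical.

```python
MAX_POINTS = 5       # each assignment is worth 5 points
PENALTY_PER_DAY = 1  # 1 point deducted for each day late
MAX_DAYS_LATE = 3    # no assignments accepted more than 3 days late


def late_adjusted_score(raw_points: float, days_late: int) -> float:
    """Hypothetical helper: apply the late policy to an assignment score.

    Assumes days_late is a whole number of days and scores are floored at zero.
    """
    if not 0 <= raw_points <= MAX_POINTS:
        raise ValueError("raw_points must be between 0 and MAX_POINTS")
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > MAX_DAYS_LATE:
        return 0.0  # submission not accepted
    return max(raw_points - PENALTY_PER_DAY * days_late, 0.0)


for days in range(5):
    print(days, late_adjusted_score(4.5, days))
```

For example, an assignment that would have earned 4.5 points earns 2.5 points if submitted two days late and 0 points if submitted four days late.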
Cooperation: You may discuss lecture and textbook material, and how to approach the
assignments.
You may not share ideas in any written form, including code, pseudocode, or solutions. You
may not submit someone else's work found on the Internet or through any other source, or
a modification of that work, with or without that person's knowledge, regardless of the
circumstances under which it was obtained, copied, or modified. Of course, no
cooperation is allowed during exams.
Re-grades: If you dispute the grade received for an assignment, you must submit, in
writing, a detailed and clearly stated argument explaining what you believe is incorrect
and why. This must be submitted by the beginning of the next class after the assignment
is returned; re-grade requests submitted after the beginning of that class will not be
accepted. A written response indicating your final score will be provided by the following
class. Be aware that a request to re-grade a specific problem may result in a re-grade of
the entire assignment. The re-grade and written response are final; there will be no
additional re-grades or debate for that assignment.
Syllabus
Week
1
2
3
Topic(s)
Introduction to data science and data analytic thinking
Predictive modeling
From correlation to supervised segmentation
Reading(s)
PF, ch. 1 and 2
PF, ch. 3
B, 1.3, 1.4, 1.5
B, 1.6, 14.4
Optional reference:
HTF, ch. 9.2
4
Linear models
PF, ch. 4
B, 3.1, 4.1.1-4.1.3, 4.3.2
5
Support vector machines
B, 6.1, 6.2, 7.1
Optional references:
HTF, ch. 12
MRS, ch. 15
6
7
Model performance analysis
PF, ch. 5, 7 and 8
Bias-variance decomposition
B, 3.2, 14.2-14.3
PF, ch. 12
Optional references:
ADTrees, Bagging, Random Forests
HTF, 8.7, 10.1, 15.1-15.3, 16
Combining models: Ensemble methods
Hwks
Hwk 1: Python
Project proposal
B, 14.1, 14.4, 14.5
8
Graphical models
9
10
Graphical models
Relational learning: Bayesian models
11
Application to marketing: Targeting consumers
Case Pilgrim Bank 1st part
Sequential data (time series):
Markov decision processes:
-Reinforcement learning
-Time series
-Application to trading
Sequential data (time series):
Hidden Markov models
B, 13.1
http://www1.icsi.berkeley.edu/~moody/MoodySaffellTNN01.pdf
13
Sequential data (time series):
Hidden Markov models
Case Pilgrim Bank 2nd part
14
Application to finance:
12
Mixed trading strategies
PF, ch. 9
B, 1.2
Optional references:
HTF, ch. 8.3-8.4
MRS, ch. 11, 13
B, Ch. 8
PF, ch. 11
B, 11.1, 11.2, 11.3
B, 13.2
Hwk 2: classification
Hwk 3: Case I discussion
Hwk 4: Case II discussion
Final project report
Creamer, Model calibration…,
Quantitative.
algorithmic trading
- Project presentations
PF: Provost and Fawcett, Data Science for Business
B: C. Bishop, Pattern Recognition and Machine Learning
Optional readings:
HTF: Hastie, Tibshirani and Friedman, The Elements of Statistical Learning. 2010
MRS: Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction
to Information Retrieval, 2008.