Programme

advertisement
Government of Russian Federation
Federal State Autonomous Educational Institution
of High Professional Education
National Research University «Higher School of Economics»
Faculty of Computer Science
School of Data Analysis and Artificial Intelligence
Syllabus for the course
«Modern Methods in Decision Making»
for Master degree specialisation
010402.68 «Applied Mathematics and Informatics»
for «Data Sciences» Master program
Author:
Geoffrey G. Decrouez, assistant professor,
ggdecrouez@hse.ru
APPROVED BY
Approved on the meeting of the School of
Data Analysis and Artificial Intelligence
Head of the School Sergei O. Kuznetsov
Academic supervisor of the
Master program «Data
Sciences» in specialisastion
010402 «Applied Mathematics
and Information Science»
Sergei O. Kuznetsov
____________________
____________________
«___» _________ 2015 г.
«___» _________ 2015 г.
Recommended by Academic Council of the Programme
«Applied Mathematics and Information Science»
«___» _________ 2015 г.
Manager of the School of Data Analysis and
Artificial Intelligence Larisa I. Antropova
____________________
Moscow, 2015
The syllabus must not be used by other departments of the university and other educational
institution without permission of the department of the syllabus author
National Research University Higher School of Economics
Syllabus for the course «Modern methods in decision making» for 010402.68 «Data Science»,
Master program
1. Teachers
Author, assistant professor: Geoffrey G. Decrouez, National Research University Higher School
of Economics, Faculty of Computer Science, Department of Data Analysis and artificial
Intelligence.
2. Scope of Use
The present program establishes minimum demands of students’ knowledge and skills, and
determines content of the course.
The present syllabus is aimed at department teaching the course, their teaching assistants, and
students of the Master of Science 010402.68 «Applied Mathematics and Informatics».
This syllabus meets the standards required by:
• Educational standards of National Research University Higher School of Economics;
• Educational program «Data Sciences» of Federal Master’s Degree Program 010402.68,
2015;
• University curriculum of the Master’s program in «Data Science» (010402.68) for 2015.
Summary
The first section investigates modern tools in statistical learning for regression and classification
problems. We start by reviewing linear regression, before moving to regression techniques beyond
the linear model, including shrinkage methods, splines, smoothing splines, neural networks and
deep learning. We then introduce validation techniques, such as the bootstrap and cross-validation.
Next, we discuss tree-based methods, bagging, boosting, and random forests. In the second part of
this course, we present time-series models for sequential data, including Markov models, hidden
Markov models, Wiener filtering, Kalman filtering. We also investigate sampling algorithms, such
as rejection sampling, importance sampling, Markov chain Monte Carlo, Gibbs sampling. The focus
of this course is on the mathematical derivations of the algorithms, and R language will be used to
explain implementation of the techniques.
3. Learning Objectives
The learning objective of the core course «Modern Methods in Decision Making» is to provide
students with essential theoretical and practical knowledge in modern statistical learning techniques,
such as
• Linear model, regularization techniques, splines;
• Classification techniques, logistic regression, tree-based methods;
• Validation techniques;
• Neural networks, deep learning;
• Time-series modeling;
• Sampling algorithms;
• Elements of Bayesian learning;
• Use of R environment;
National Research University Higher School of Economics
Syllabus for the course «Modern methods in decision making» for 010402.68 «Data Science»,
Master program
4. Learning outcomes
After completing the study of the discipline « Modern Methods in Decision Making » the
student should:
• Know modern regression and classification techniques in supervised learning.
• Know how to implement these models using a programming language such as R.
• Understand the theory behind the most widely used statistical learning models.
• Use validation techniques to select a candidate model for the purpose of prediction.
• Think critically with real data.
• Learn to develop complex mathematical reasoning.
5. Place of the discipline in the Master’s program structure
The course «Modern Methods in Decision Making» is a course taught in the first year of the
Master’s program «Data Science». It is compulsory for all students of the Master’s program.
Prerequisites
The course is in the continuation of the core course «Modern methods of Data Analysis» proposed
in Modules 1 and 2 in the Master`s program «Data Science». Students are expected to be already
familiar with some statistical learning techniques, and have skills in analysis, linear algebra and
probability theory. Students must have completed the course «Probability Theory and Mathematical
Statistics».
The following knowledge and competences are needed to study the discipline:
• A good command of the English language, both orally and written.
• A sound knowledge in probability theory and linear algebra.
Main competences developed after completing the study this discipline can be used to learn the
following disciplines:
• Theoretical aspects of statistical learning techniques.
• Implement and test the techniques on real data.
National Research University Higher School of Economics
Syllabus for the course «Modern methods in decision making» for 010402.68 «Data Science»,
Master program
After completing the study of the discipline « Modern Methods in Decision Making » the student
should have the following competences:
Descriptors
Competence
Code
Educative forms and methods
Code (UC) (indicators of
The ability
to
reflect developed C-1
methods
of
activity.
The ability to
C-2
propose a model
to invent and test
methods
and
tools
of
professional
activity
SC-М1
Capability
of C-3
development
of
new
research
methods, change
of scientific and
industrial profile
of self-activities
SC-М3
SC-М2
aimed at generation and
achievement of the
development of the
result)
The student is able to
reflect developed
mathematical models
in statistical learning.
The student is able to
select a model using
validation
techniques
and to test it on dataset
from coming from reallife examples.
competence
Lectures and tutorials.
Examples covered
lectures
and
Assignments.
during the
tutorials.
Students
obtain Assignments,
additional
necessary
knowledge material/reading provided.
in statistical learning,
sufficient to develop
and understand new
methods in
closely
related disciplines such
as
in
Machine
Learning.
6. Schedule
Two pairs consist of 2 academic hours for lecture followed by 2 academic hour for tutorial after lecture.
Module 3:
№
1.
2.
3.
4.
5.
6.
7.
Topic
Decision Theory. Linear Regression.
Shrinkage methods.
Polynomial regression and splines.
Validation techniques.
Logistic regression, tree-based methods.
Neural networks, deep learning.
SVM, kernels.
Total
Contact hours
hours Lectures Seminars
9
2
2
11
2
2
11
2
2
11
2
2
15
4
4
15
4
4
15
4
4
Self-study
5
7
7
7
7
7
7
National Research University Higher School of Economics
Syllabus for the course «Modern methods in decision making» for 010402.68 «Data Science»,
Master program
Module 4:
№
8.
9.
10.
11.
12.
Topic
Introduction to Bayesian learning.
Hidden Markov Models.
Wiener filtering.
Kalman filtering.
Sampling algorithms.
Total:
Total
Contact hours
hours Lectures Seminars
11
2
2
11
2
2
17
5
5
17
5
5
19
6
6
162
40
40
Self-study
7
7
7
7
7
82
Requirements and Grading
Type of
grading
Type of
work
Mid-Term
Exam
Homework
Exam
1 Mid-semester test.
Solving 2 homework tasks and
2 examples.
Written exam. Preparation time –
1 180 min.
Final
9. Assessment
The assessment consists of two homeworks and one mid-semester exam at the end of Module 3.
The homework problems are based on each lecture topics and are handed out to the students
throughout the semester.
Final assessment is the final exam. Students have to demonstrate knowledge of the material
covered during both module 3 and 4.
The grade formula:
The exam is worth 60% of the final mark.
Final course mark is obtained from the following formula: Final=0.2*(Homeworks)+
0.2*(Mid-term exam)+0.6*(Exam).
The grades are rounded in favour of examiner/lecturer with respect to regularity of class and home
works. All grades, having a fractional part greater than 0.5, are rounded up.
National Research University Higher School of Economics
Syllabus for the course «Modern methods in decision making» for 010402.68 «Data Science»,
Master program
Table of Grade Accordance
Ten-point
Grading Scale
1 - very bad
2 – bad
3 – no pass
4 – pass
5 – highly pass
6 – good
7 – very good
8 – almost excellent
9 – excellent
10 – perfect
Five-point
Grading Scale
Unsatisfactory - 2
FAIL
Satisfactory – 3
Good – 4
PASS
Excellent – 5
10. Course Description
The following list describes main mathematical definitions which will be considered in the
course in correspondence with lecture order.
Topic 1. Decision Theory and Linear regression.
Loss function, Conditional expectation, Bayesian vs frequentist, Expected loss, Linear
regression, Gram-Schmidt orthogonalization, confidence and predictive intervals. Introduction to
Bayesian linear regression.
Topic 2. Shrinkage methods
Ridge regression, lasso, elastic-net, singular value decomposition, effective degrees of freedom.
Topic 3. Polynomial regression and splines
Polynomial regression, splines, natural spline, smoothing splines, singular value decomposition.
Topic 4. Validation techniques
AIC, BIC, cross-validation, bootstrap.
Topic 5. Classification
Logistic regression, tree-based methods, bagging, boosting, AdaBoost, support vector
machine, elements of convex optimization.
Topic 6. Deep learning
Neural networks, back-propagation, deep learning.
Topic 7. Time-series modeling
Hidden-Markov models, Wiener filtering, Kalman filtering.
National Research University Higher School of Economics
Syllabus for the course «Modern methods in decision making» for 010402.68 «Data Science»,
Master program
Topic 8. Sampling algorithms
Rejection sampling, importance sampling, MCMC, Gibbs sampling.
11. Term Educational Technology
The following educational technologies are used in the study process:
•
discussion and analysis of the results during the tutorials;
•
regular assignments to test the progress of the student;
•
consultation time on Monday mornings.
12. Recommendations for course lecturer
Course lecturer is advised to use interactive learning methods, which allow participation of the
majority of students, such as slide presentations, combined with writing materials on board, and
usage of interdisciplinary papers to present connections between probability theory and statistics.
The course is intended to be adaptive, but it is normal to differentiate tasks in a group if necessary,
and direct fast learners to solve more complicated tasks.
13. Recommendations for students
The course is interactive. Lectures are combined with classes. Students are invited to ask questions
and actively participate in group discussions. There will be special office hours for students, which
would like to get more precise understanding of each topic. The lecturer is ready to answer your
questions online by official e-mails that you can find in the “contacts” section. References found in
section 15.1 are suggested to help students in their understanding of the material. This course is
taught in English, and students can ask teaching assistants to help them with the language.
14. Final exam questions
The final exam will consist of a selection of problems equally weighted. No material is allowed
for the exam. Each question will focus on a particular topic presented during the lectures. The first
question of the exam will ask the students to prove a result or a theorem proved during the class.
The questions consist in exercises on any topic seen during the lectures. To be prepared for the final
exam, students must be able to solve questions from the problem sheets, and questions from the two
assignments.
15. Reading and Materials
15.1.
Recommended Reading
We will be following the following three textbooks for this subject:
T. Hastie, R. Tibshirani, J.Friedman. The Elements of Statistical Learning (2009). Springer
G. James, D. Witten, T. Hastie, R. Tibshirani. An introduction to Statistical Learning
(2013). Springer
National Research University Higher School of Economics
Syllabus for the course «Modern methods in decision making» for 010402.68 «Data Science»,
Master program
C. Bishop. Patter recognition and machine learning (2006). Springer.
15.2.
Course webpage
All material of the discipline are posted on the course webpage.
Students are provided with links to the lecture notes, problem and additional readings.
16.
Equipment
The course requires a laptop and projector.
Lecture materials, course structure and the syllabus are prepared by Geoffrey Decrouez.
Download