Government of Russian Federation Federal State Autonomous Educational Institution of High Professional Education National Research University «Higher School of Economics» Faculty of Computer Science School of Data Analysis and Artificial Intelligence Syllabus for the course «Modern Methods in Decision Making» for Master degree specialisation 010402.68 «Applied Mathematics and Informatics» for «Data Sciences» Master program Author: Geoffrey G. Decrouez, assistant professor, ggdecrouez@hse.ru APPROVED BY Approved on the meeting of the School of Data Analysis and Artificial Intelligence Head of the School Sergei O. Kuznetsov Academic supervisor of the Master program «Data Sciences» in specialisastion 010402 «Applied Mathematics and Information Science» Sergei O. Kuznetsov ____________________ ____________________ «___» _________ 2015 г. «___» _________ 2015 г. Recommended by Academic Council of the Programme «Applied Mathematics and Information Science» «___» _________ 2015 г. Manager of the School of Data Analysis and Artificial Intelligence Larisa I. Antropova ____________________ Moscow, 2015 The syllabus must not be used by other departments of the university and other educational institution without permission of the department of the syllabus author National Research University Higher School of Economics Syllabus for the course «Modern methods in decision making» for 010402.68 «Data Science», Master program 1. Teachers Author, assistant professor: Geoffrey G. Decrouez, National Research University Higher School of Economics, Faculty of Computer Science, Department of Data Analysis and artificial Intelligence. 2. Scope of Use The present program establishes minimum demands of students’ knowledge and skills, and determines content of the course. The present syllabus is aimed at department teaching the course, their teaching assistants, and students of the Master of Science 010402.68 «Applied Mathematics and Informatics». This syllabus meets the standards required by: • Educational standards of National Research University Higher School of Economics; • Educational program «Data Sciences» of Federal Master’s Degree Program 010402.68, 2015; • University curriculum of the Master’s program in «Data Science» (010402.68) for 2015. Summary The first section investigates modern tools in statistical learning for regression and classification problems. We start by reviewing linear regression, before moving to regression techniques beyond the linear model, including shrinkage methods, splines, smoothing splines, neural networks and deep learning. We then introduce validation techniques, such as the bootstrap and cross-validation. Next, we discuss tree-based methods, bagging, boosting, and random forests. In the second part of this course, we present time-series models for sequential data, including Markov models, hidden Markov models, Wiener filtering, Kalman filtering. We also investigate sampling algorithms, such as rejection sampling, importance sampling, Markov chain Monte Carlo, Gibbs sampling. The focus of this course is on the mathematical derivations of the algorithms, and R language will be used to explain implementation of the techniques. 3. Learning Objectives The learning objective of the core course «Modern Methods in Decision Making» is to provide students with essential theoretical and practical knowledge in modern statistical learning techniques, such as • Linear model, regularization techniques, splines; • Classification techniques, logistic regression, tree-based methods; • Validation techniques; • Neural networks, deep learning; • Time-series modeling; • Sampling algorithms; • Elements of Bayesian learning; • Use of R environment; National Research University Higher School of Economics Syllabus for the course «Modern methods in decision making» for 010402.68 «Data Science», Master program 4. Learning outcomes After completing the study of the discipline « Modern Methods in Decision Making » the student should: • Know modern regression and classification techniques in supervised learning. • Know how to implement these models using a programming language such as R. • Understand the theory behind the most widely used statistical learning models. • Use validation techniques to select a candidate model for the purpose of prediction. • Think critically with real data. • Learn to develop complex mathematical reasoning. 5. Place of the discipline in the Master’s program structure The course «Modern Methods in Decision Making» is a course taught in the first year of the Master’s program «Data Science». It is compulsory for all students of the Master’s program. Prerequisites The course is in the continuation of the core course «Modern methods of Data Analysis» proposed in Modules 1 and 2 in the Master`s program «Data Science». Students are expected to be already familiar with some statistical learning techniques, and have skills in analysis, linear algebra and probability theory. Students must have completed the course «Probability Theory and Mathematical Statistics». The following knowledge and competences are needed to study the discipline: • A good command of the English language, both orally and written. • A sound knowledge in probability theory and linear algebra. Main competences developed after completing the study this discipline can be used to learn the following disciplines: • Theoretical aspects of statistical learning techniques. • Implement and test the techniques on real data. National Research University Higher School of Economics Syllabus for the course «Modern methods in decision making» for 010402.68 «Data Science», Master program After completing the study of the discipline « Modern Methods in Decision Making » the student should have the following competences: Descriptors Competence Code Educative forms and methods Code (UC) (indicators of The ability to reflect developed C-1 methods of activity. The ability to C-2 propose a model to invent and test methods and tools of professional activity SC-М1 Capability of C-3 development of new research methods, change of scientific and industrial profile of self-activities SC-М3 SC-М2 aimed at generation and achievement of the development of the result) The student is able to reflect developed mathematical models in statistical learning. The student is able to select a model using validation techniques and to test it on dataset from coming from reallife examples. competence Lectures and tutorials. Examples covered lectures and Assignments. during the tutorials. Students obtain Assignments, additional necessary knowledge material/reading provided. in statistical learning, sufficient to develop and understand new methods in closely related disciplines such as in Machine Learning. 6. Schedule Two pairs consist of 2 academic hours for lecture followed by 2 academic hour for tutorial after lecture. Module 3: № 1. 2. 3. 4. 5. 6. 7. Topic Decision Theory. Linear Regression. Shrinkage methods. Polynomial regression and splines. Validation techniques. Logistic regression, tree-based methods. Neural networks, deep learning. SVM, kernels. Total Contact hours hours Lectures Seminars 9 2 2 11 2 2 11 2 2 11 2 2 15 4 4 15 4 4 15 4 4 Self-study 5 7 7 7 7 7 7 National Research University Higher School of Economics Syllabus for the course «Modern methods in decision making» for 010402.68 «Data Science», Master program Module 4: № 8. 9. 10. 11. 12. Topic Introduction to Bayesian learning. Hidden Markov Models. Wiener filtering. Kalman filtering. Sampling algorithms. Total: Total Contact hours hours Lectures Seminars 11 2 2 11 2 2 17 5 5 17 5 5 19 6 6 162 40 40 Self-study 7 7 7 7 7 82 Requirements and Grading Type of grading Type of work Mid-Term Exam Homework Exam 1 Mid-semester test. Solving 2 homework tasks and 2 examples. Written exam. Preparation time – 1 180 min. Final 9. Assessment The assessment consists of two homeworks and one mid-semester exam at the end of Module 3. The homework problems are based on each lecture topics and are handed out to the students throughout the semester. Final assessment is the final exam. Students have to demonstrate knowledge of the material covered during both module 3 and 4. The grade formula: The exam is worth 60% of the final mark. Final course mark is obtained from the following formula: Final=0.2*(Homeworks)+ 0.2*(Mid-term exam)+0.6*(Exam). The grades are rounded in favour of examiner/lecturer with respect to regularity of class and home works. All grades, having a fractional part greater than 0.5, are rounded up. National Research University Higher School of Economics Syllabus for the course «Modern methods in decision making» for 010402.68 «Data Science», Master program Table of Grade Accordance Ten-point Grading Scale 1 - very bad 2 – bad 3 – no pass 4 – pass 5 – highly pass 6 – good 7 – very good 8 – almost excellent 9 – excellent 10 – perfect Five-point Grading Scale Unsatisfactory - 2 FAIL Satisfactory – 3 Good – 4 PASS Excellent – 5 10. Course Description The following list describes main mathematical definitions which will be considered in the course in correspondence with lecture order. Topic 1. Decision Theory and Linear regression. Loss function, Conditional expectation, Bayesian vs frequentist, Expected loss, Linear regression, Gram-Schmidt orthogonalization, confidence and predictive intervals. Introduction to Bayesian linear regression. Topic 2. Shrinkage methods Ridge regression, lasso, elastic-net, singular value decomposition, effective degrees of freedom. Topic 3. Polynomial regression and splines Polynomial regression, splines, natural spline, smoothing splines, singular value decomposition. Topic 4. Validation techniques AIC, BIC, cross-validation, bootstrap. Topic 5. Classification Logistic regression, tree-based methods, bagging, boosting, AdaBoost, support vector machine, elements of convex optimization. Topic 6. Deep learning Neural networks, back-propagation, deep learning. Topic 7. Time-series modeling Hidden-Markov models, Wiener filtering, Kalman filtering. National Research University Higher School of Economics Syllabus for the course «Modern methods in decision making» for 010402.68 «Data Science», Master program Topic 8. Sampling algorithms Rejection sampling, importance sampling, MCMC, Gibbs sampling. 11. Term Educational Technology The following educational technologies are used in the study process: • discussion and analysis of the results during the tutorials; • regular assignments to test the progress of the student; • consultation time on Monday mornings. 12. Recommendations for course lecturer Course lecturer is advised to use interactive learning methods, which allow participation of the majority of students, such as slide presentations, combined with writing materials on board, and usage of interdisciplinary papers to present connections between probability theory and statistics. The course is intended to be adaptive, but it is normal to differentiate tasks in a group if necessary, and direct fast learners to solve more complicated tasks. 13. Recommendations for students The course is interactive. Lectures are combined with classes. Students are invited to ask questions and actively participate in group discussions. There will be special office hours for students, which would like to get more precise understanding of each topic. The lecturer is ready to answer your questions online by official e-mails that you can find in the “contacts” section. References found in section 15.1 are suggested to help students in their understanding of the material. This course is taught in English, and students can ask teaching assistants to help them with the language. 14. Final exam questions The final exam will consist of a selection of problems equally weighted. No material is allowed for the exam. Each question will focus on a particular topic presented during the lectures. The first question of the exam will ask the students to prove a result or a theorem proved during the class. The questions consist in exercises on any topic seen during the lectures. To be prepared for the final exam, students must be able to solve questions from the problem sheets, and questions from the two assignments. 15. Reading and Materials 15.1. Recommended Reading We will be following the following three textbooks for this subject: T. Hastie, R. Tibshirani, J.Friedman. The Elements of Statistical Learning (2009). Springer G. James, D. Witten, T. Hastie, R. Tibshirani. An introduction to Statistical Learning (2013). Springer National Research University Higher School of Economics Syllabus for the course «Modern methods in decision making» for 010402.68 «Data Science», Master program C. Bishop. Patter recognition and machine learning (2006). Springer. 15.2. Course webpage All material of the discipline are posted on the course webpage. Students are provided with links to the lecture notes, problem and additional readings. 16. Equipment The course requires a laptop and projector. Lecture materials, course structure and the syllabus are prepared by Geoffrey Decrouez.