Lecture 1
Introduction to the Master's Programme "Statistics and Data Mining"
Practical questions

Introductory course "Statistics and Data Mining"

Personnel at the Statistics and Machine Learning department
  Oleg Sysoev: Senior lecturer, responsible for "Statistics and Data Mining"
  Mattias Villani: Professor, division chief
  Anders Nordgaard: Senior lecturer
  Ann-Charlotte Hallberg (Lotta): Lecturer, director of studies
  Bertil Wegmann: Postdoc
  Per Sidén: PhD student
  Annelie Almquist: Study counsellor
  Lilian Alarik: Administrator (registration, course reporting, …)
  Linda Wänström: Senior lecturer
  Måns Magnusson: PhD student
  Karl Wahlin: Senior lecturer, responsible for the bachelor program
  Josef Wilzén: PhD student

Personnel at ADIT
  Patrick Lambrix: Professor
  Jose Pena: Senior lecturer

This course
  Lectures: attendance is obligatory
  Reading one statistical paper and writing a summary
  Reading one more statistical paper and writing a critical review
  URKUND is used
  Plagiarism is forbidden!
  (Discovered plagiarism is reported to the disciplinary board.)
  Course end: January 2016
  Grading for this course: pass/fail
  Several teachers from IDA are involved
  You meet other IDA master students

Statistics and Data Mining program
  Aims:
    To build advanced models for explaining complex real-life systems and predicting new events
    To extract, organize and explore large volumes of data
    To learn how to discover important (hidden) information (trends, patterns) in large and complex data sets
    To get in-depth knowledge of models and methods
  Competences: data mining, machine learning, statistical modeling, visualization methods, databases, programming, etc.

Job opportunities
  Plenty of jobs are waiting in the USA and Europe
  The master program gives an excellent background for seeking jobs as an analyst, engineer, manager or consultant in:
    Business Intelligence (BI)
    Customer Relationship Management (CRM)
    Bioinformatics
    Economics
    IT industry
    …and many other areas where large or complex datasets are involved
  Example jobs:
    Predictive Modeling and Data Mining Scientists/Analysts, USA
    Statistical Modeller/Software Developer, London
    Analyst (Analytiker), Försäkringskassan

Master program overview
  Master program = 120 ECTS credits
  Obligatory courses (42 ECTS)
  Introductory courses (at least 6 ECTS)
  These are courses in statistics that you need to take in order to get a degree in Statistics.
  Complementary courses
    Advanced R programming: recommended for all students lacking a solid programming background
    Statistical methods: recommended for people with little statistics in their background, e.g. computer scientists or engineers (check the syllabus if you are not sure!)
  Profile courses (at least 12 ECTS)
  You must take and finish these courses to get a degree
  If you have found an interesting course that is not in the schedule, we may count it as a profile course; contact Oleg S.
  Master thesis (30 ECTS)
  To make sufficient progress in your studies, you need to gain 30 ECTS credits per semester

Semester admission rules
  At least 6 ECTS credits from the first semester to be admitted to the second semester
  At least 40 ECTS credits from the first year to be admitted to the third semester
  65 ECTS credits of the programme, including all obligatory courses, to be admitted to the master thesis course

Master program overview: Year 1 (Semester 1: Periods 1 and 2; Semester 2: Periods 3 and 4)
  Data mining - clustering and association analysis (732A31, 15 credits)
  Advanced Academic Studies (732A42, 3 credits)
  Time series analysis (732A34, 6 credits)
  Introduction to Machine Learning (732A52, 9 credits)
  Advanced R programming (732A50, 6 credits)
  Statistical methods (732A49, 6 credits)
  Computational statistics (732A38, 6 credits)
  Neural Networks and Learning Systems (TBMI26, 6 credits)
  Multivariate statistical methods (732A37, 6 credits)
  Web programming and interactivity (TDDD24, 4 credits)
  Philosophy of science (720A04, 3 credits)
  Bayesian learning (732A46, 6 credits)

Master program overview: Year 2 (Semester 3: Periods 1 and 2; Semester 4: Periods 3 and 4)
  Visualization (732A39, 6 credits)
  Advanced Machine learning (732A37, 6 credits)
  Optimization (TAOP23, 6 credits)
  Text Mining (732A47, 6 credits)
  Probability theory (732A40, 6 credits)
  Database Technology (TDDD37, 6 credits)
  Data mining project (732A32, 6 credits)
  Statistical evidence evaluation (732A45, 6 credits)
  Master thesis (732A30, 30 credits)
  Exchange studies

Obligatory courses
  Academic studies: several sessions, ends before January
  Introduction to Machine Learning: predictive modelling (ridge regression, generalized additive models, neural networks, support vector machines, etc.)
  Data Mining - Clustering and Association Analysis: unsupervised learning, focus on algorithms
  Philosophy of science: laws of nature and scientific models, theories and observations
  Computational statistics: random number generation, MCMC
  Bayesian learning: using prior knowledge to make better decisions and inferences

Introductory courses
  Statistical Methods
    Probability, conventional distributions: Normal, Poisson, Gamma, …
    Point and interval estimation
    Hypothesis testing
    Basics of Bayesian statistics
  Advanced programming in R
    Basic programming (loops, data types)
    Advanced topics (debugging, performance enhancement, etc.)

Profile courses
  Visualization: static, interactive and dynamic graphics for data analysis
  Time Series Analysis: autocorrelation, forecasting, ARIMA models
  Probability theory: multivariate random variables, transforms, order statistics, convergence. Necessary for PhD studies.
  Multivariate statistical methods: principal components, factor analysis, canonical correlation
  Statistical evidence evaluation: methods to secure, analyze and interpret (technical) evidence to be used in the legal process of particular cases
  Neural networks and learning systems: given by the Department of Biomedical Engineering; advanced neural networks, kernel methods, reinforcement learning, genetic algorithms
  Web programming course: HTML, XML, PHP
  Data mining project: specify, implement and evaluate a data mining algorithm
  Text Mining: extracting text data from different sources and analyzing it linguistically and with statistical tools
  Optimization: linear, nonlinear and network optimization
  Database technology: relational databases, relational algebra, SQL, query optimization

Other information
  Master program's homepage (schedule, courses, news, …): http://www.liu.se/en/education/master/programmes/F7MSM/student?l=en
  Facebook page: https://www.facebook.com/liustatisticsmaster
  Email to staff: name.lastname@liu.se (example: oleg.sysoev@liu.se)
  Webpages of courses: www.ida.liu.se/~course_code/
  This course: www.ida.liu.se/~732A42/

Course registration
  To get credits for a course, you must register for it.
  International students: register for at most 120 ECTS; you pay for anything beyond that!
  (Swedish language courses are not included.)
  Registration is done through the Student Portal: https://www3.student.liu.se/portal/eng
  If you have problems with registration, contact our administrator Annelie Almquist (annelie.almquist@liu.se)
  Here you can choose between registering for a single-subject course or for a study program

LiU account and personal number
  It is necessary to get a LiU account as soon as possible (house Zenit, student office)
    Access to Student Portal
    Course registration
    Access to course materials
    Access to department computers
  If you come from outside Sweden, it is very important to get a personal number at the Tax Office
    Address: Kungsgatan 37, Linköping
    Needed for medical help

Lectures, labs, seminars
  Lectures: normally presented in PowerPoint, later available either on the course page or in LISAM. Attendance is typically not obligatory.
  Labs: typically computer exercises done individually or in groups of two. Attendance is typically not obligatory. A written report should normally be submitted.
  Seminars: discussions of theory and labs, student presentations. Attendance is typically obligatory.

Exams and points
  Exams
    Each course has 1 exam and 2 re-exams
    You must register for the written or computer exam at least 10 days in advance
    Exam results cannot be improved afterwards: if you aim for a high grade and feel that you have written badly, cross out every non-empty page of your solutions
    Exam results should normally be available after 2 weeks
  Credits
    Some courses have separate credits for labs (or a project) and for the exam
    Credits for some courses can be obtained only after you have completed the whole course

Course evaluation
  KURT: course evaluation system at LiU
  You evaluate the courses you have taken
  Sent via email
  The surveys are anonymous!
  Very important for improving the courses: please answer these surveys!
  You will periodically be invited to a meeting with the study counsellor to discuss your current studies and plan the coming ones.

Schedules of the courses
  Some schedules are on the course homepages
  Some schedules are accessed via TimeEdit: https://se.timeedit.net/web/liu/db1/schema
  Type the course name and run the search

How to find a room
  Room at LiU: http://www.liu.se/karta/?l=en
  Room at the department (IDA): go to www.ida.liu.se and choose "Find IDA room" from the drop-down list.

Useful links
  Homepage "Statistics and Data Mining"
  Information from the faculty
  Practical Guide
  Welcome activities for masters

Questions
  Questions related to the program? Contact Oleg Sysoev: http://www.ida.liu.se/department/contact/contactsearch.en.shtml?NAME=Oleg%20Sysoev
  Questions about master studies in general? Contact Darja Utgof: http://www.student.liu.se/masters/master-scoordinators?l=en