Lecture 1: Introduction to the Master’s Programme "Statistics and Data Mining". Practical questions

Personnel at Statistics and Machine Learning department
- Oleg Sysoev: Senior lecturer, responsible for “Statistics and Data Mining”
- Mattias Villani: Professor, division chief
- Anders Nordgaard: Senior lecturer
- Ann-Charlotte Hallberg (Lotta): Lecturer, director of studies

Introductory course "Statistics and Data Mining"
Personnel at Statistics and Machine Learning department
- Bertil Wegmann: Postdoc
- Per Sidén: PhD student
- Annelie Almquist: Administrator (registration, course reporting, …)
- Lilian Alarik: Study counsellor

Personnel at Statistics and Machine Learning department
- Linda Wänström: Senior lecturer
- Måns Magnusson: PhD student
- Karl Wahlin: Senior lecturer, responsible for the bachelor program
- Josef Wilzén: PhD student

Personnel at ADIT
- Patrick Lambrix: Professor
- Jose Pena: Senior lecturer

This course
- Lectures: attendance is obligatory
- Reading one statistical paper and writing a summary
- Reading one more statistical paper and writing a critical review
- URKUND is used to detect plagiarism. Plagiarism is forbidden! (Discovered plagiarism is reported to the disciplinary board.)
- Course end: January 2016
- Grading for this course: pass/fail
- Several teachers from IDA are involved
- You meet other IDA master’s students

Statistics and Data Mining program
Aims:
- To build advanced models for explaining complex real-life systems and predicting new events
- To extract, organize and explore large volumes of data
- To learn how to discover important (hidden) information (trends, patterns) in large and complex data sets
- To gain in-depth knowledge of models and methods
Competences:
- Data mining, machine learning, statistical modeling, visualization methods, databases, programming etc.

Job opportunities
- Plenty of jobs are waiting in the USA and Europe
- The master’s program gives an excellent background for jobs as analyst, engineer, manager or consultant in
  - Business Intelligence (BI)
  - Customer Relationship Management (CRM)
  - Bioinformatics
  - Economics
  - IT industry
  - …and many other areas where large or complex datasets are involved
- Example jobs:
  - Predictive Modeling and Data Mining Scientists/Analysts, USA
  - Statistical Modeller/Software Developer, London
  - Analyst (Analytiker), Försäkringskassan

Master program overview
Master program = 120 ECTS credits
- Obligatory courses (42 ECTS)
  - You must take and finish these courses to get a degree
- Introductory courses (at least 6 ECTS)
  - Advanced R programming: recommended for all students who lack a solid programming background
  - Statistical methods: recommended for people with little statistics in their background, e.g. computer scientists or engineers (check the syllabus if you are not sure)!
- Profile courses (at least 12 ECTS)
  - These are courses in statistics that you need to take in order to get a degree in Statistics.
  - If you have found an interesting course that is not in the schedule, we may count it as a profile course; contact Oleg S.
- Complementary courses
- Master thesis (30 ECTS)
To make sufficient progress in your studies, you need to earn 30 ECTS credits per semester.

Semester admission rules
- At least 6 ECTS credits from the first semester, in order to be admitted to the second semester
- At least 40 ECTS credits from the first year, in order to be admitted to the third semester
- 65 ECTS credits of the programme, including all obligatory courses, in order to be admitted to the master thesis course

Master program overview: Year 1
Semester 1 (Periods 1 and 2)
- Advanced Academic Studies (732A42, 3 credits)
- Time series analysis (732A34, 6 credits)
- Introduction to Machine Learning (732A52, 9 credits)
- Advanced R programming (732A50, 6 credits)
- Statistical methods (732A49, 6 credits)
Semester 2 (Periods 3 and 4)
- Data mining - clustering and association analysis (732A31, 15 credits)
- Computational statistics (732A38, 6 credits)
- Neural Networks and Learning Systems (TBMI26, 6 credits)
- Multivariate statistical methods (732A37, 6 credits)
- Web programming and interactivity (TDDD24, 4 credits)
- Philosophy of science (720A04, 3 credits)
- Bayesian learning (732A46, 6 credits)

Master program overview: Year 2
Semester 3 (Periods 1 and 2)
- Visualization (732A39, 6 credits)
- Advanced Machine learning (732A37, 6 credits)
- Optimization (TAOP23, 6 credits)
- Text Mining (732A47, 6 credits)
- Probability theory (732A40, 6 credits)
- Database Technology (TDDD37, 6 credits)
- Data mining project (732A32, 6 credits)
- Statistical evidence evaluation (732A45, 6 credits)
Semester 4 (Periods 3 and 4)
- Master thesis (732A30, 30 credits)
Exchange studies

Obligatory courses
- Academic studies (several sessions, ends before January)
- Introduction to Machine Learning
  - Predictive modelling: ridge regression, generalized additive models, neural networks, support vector machines etc.
- Data Mining: Clustering and Association analysis
  - Unsupervised learning, focus on algorithms
- Philosophy of science
  - Laws of nature and scientific models, theories and observations
- Computational statistics
  - Random number generation, MCMC
- Bayesian learning
  - Using prior knowledge to make better decisions and inferences

Introductory courses
- Statistical Methods
  - Probability, conventional distributions: Normal, Poisson, Gamma…
  - Point and interval estimation
  - Hypothesis testing
  - Basics of Bayesian statistics
- Advanced programming in R
  - Basic programming (loops, data types)
  - Advanced topics (debugging, performance enhancement etc.)

Profile courses
- Visualization
  - Static, interactive and dynamic graphics for data analysis
- Time Series Analysis
  - Autocorrelation, forecasting, ARIMA models
- Probability theory
  - Multivariate random variables, transforms, order statistics, convergence. Necessary for PhD studies.
- Multivariate statistical methods
  - Principal components, factor analysis, canonical correlation
- Statistical evidence evaluation
  - Methods to secure, analyze and interpret (technical) evidence to be used in the legal process of particular cases

Profile courses
- Neural networks and learning systems
  - Given by the Department of Biomedical Engineering. Advanced neural networks, kernel methods, reinforcement learning, genetic algorithms
- Web programming course
  - HTML, XML, PHP
- Data mining project
  - Specify, implement and evaluate a data mining algorithm
- Text Mining
  - Extracting text data from different sources and analysing it linguistically and with statistical tools
- Optimization
  - Linear, nonlinear, network optimization
- Database technology
  - Relational databases, relational algebra, SQL, query optimization

Other information
Master program’s homepage (schedule, courses, news…):
http://www.liu.se/en/education/master/programmes/F7MSM/student?l=en
Facebook page:
https://www.facebook.com/liustatisticsmaster
Email to staff: name.lastname@liu.se
- Example: oleg.sysoev@liu.se
Webpages of courses: www.ida.liu.se/~course_code/
- This course: www.ida.liu.se/~732A42/

Course registration
- To get credits for a course, you must register for it.
- International students: register for at most 120 ECTS; you pay for anything beyond that! (Swedish language courses not included.)
- Registration is done through the Student Portal: https://www3.student.liu.se/portal/eng
  - In the portal you can choose between registration for a single-subject course or a study program.
- If you have problems with registration, contact our administrator Annelie Almquist (annelie.almquist@liu.se)

LiU-Account and personal number
- You need to get a LiU-account as soon as possible (house Zenit, student office). It gives you:
  - Access to Student Portal
  - Course registration
  - Access to course materials
  - Access to department computers
- If you come from outside Sweden, it is very important to get a Personal Number at the Tax office:
  - Address: Kungsgatan 37, Linköping
  - Needed for medical help

Lectures, Labs, Seminars
- Lectures: normally presented in PowerPoint and later made available either on the course page or in LISAM. Attendance is typically not obligatory.
- Labs: typically computer exercises done individually or in groups of two. Attendance is typically not obligatory. A written report should normally be submitted.
- Seminars: discussions of theory and labs, student presentations. Attendance is typically obligatory.

Exams and points
- Exams
  - Each course has 1 exam and 2 re-exams
  - You must register for the written or computer exam at least 10 days in advance.
  - Exam results may not be improved afterwards, so if you aim for a high grade and feel that you have written badly, cross out every non-empty page in your solutions
  - Exam results should normally be available after 2 weeks
- Credits
  - Some courses have separate credits for labs (or project) and for the exam
  - Credits for some courses can be obtained only after you are completely done with the course

Course evaluation
- KURT: course evaluation system at LiU
  - You evaluate the courses you have taken
  - Surveys are sent via email
  - The surveys are anonymous!
  - Very important for improving the courses: please answer these surveys!
- You will periodically be invited to a meeting with the study counsellor to discuss your current studies and plan your coming studies.

Schedules of the courses
- Some schedules are on the course homepages
- Some schedules are accessed via TimeEdit:
  - https://se.timeedit.net/web/liu/db1/schema
  - Type the course name and run the search

How to find a room
- Room at LiU: http://www.liu.se/karta/?l=en
- Room at the department (IDA): go to www.ida.liu.se and choose “Find IDA room” from the drop-down list.

Useful links
- Homepage "Statistics and Data Mining"
- Information from the faculty
- Practical Guide
- Welcome activities for masters

Questions
- Questions related to the program?
  - Contact Oleg Sysoev
  - http://www.ida.liu.se/department/contact/contactsearch.en.shtml?NAME=Oleg%20Sysoev
- Questions about master studies in general?
  - Contact Darja Utgof
  - http://www.student.liu.se/masters/master-scoordinators?l=en