Statistical Modeling for Social Scientists (Course Code: SSPS10023)
School of Social & Political Science,
University of Edinburgh
SESSION 2015-2016
Semester 1
The Course
The main aim of this course is to provide a broad perspective on the use of
statistical modeling to reach conclusions from data. It covers generalized linear
models, some major statistical learning tools, and models for complex causal
relationships, mainly in the context of social sciences. Lectures are combined
with practical computer lab tutorials in order to illustrate the applications of the
theoretical tools.
The course employs a hands-on approach through analysis using the statistical
software R. The applications are mostly chosen from real social science research
questions but examples from other disciplines like biology, medicine and
engineering are also given.
Although the course will cover the technical aspects of the models introduced,
the emphasis will be on application, coding and interpretation.
In addition to the theoretical tools introduced, the course aims to equip students
with two other computational skills: data management and data visualization. The R
packages dplyr and ggplot2 will be introduced and used for these purposes.
Learning Outcomes
By the end of the course students will:
1. Have a unified conceptual and mathematical understanding of
generalized linear models.
2. Be able to use the statistical software R for data management, data
analysis and data visualization.
3. Be able to analyze multidimensional data through dimension reduction
and clustering.
4. Appreciate the uses and limits of maximum likelihood estimation.
5. Be able to deal with a particular causality problem using the instrumental
variable regression.
Course Organisation
The course convener is Ugur Ozdemir (Room 3.02 CMB); office hours by
appointment; email: Ugur.Ozdemir@ed.ac.uk
Course Secretary - Daniel Jackson, email: daniel.jackson@ed.ac.uk
Contact number 0131 511 337
Lectures:
Tuesday, 11:10-13:00
Tutorial:
Wednesday, 10:00-10:50
Course Outline
1. Principles of Statistical Modeling
2. Introduction to R
3. Data Manipulation and Visualization with R
4. GLM: Basics
5. GLM Estimation: Maximum Likelihood Principle
6. GLM: Binary Variables and Logistic Regression
7. GLM: Nominal and Ordinal Logistic Regression
8. GLM: Poisson Regression and Log-Linear Models
9. Unsupervised Learning: PCA / Clustering
10. Instrumental Variable Regression
11. Revision
Statistical Software
R will be used throughout the course. R is open-source software and is freely
available online. We will also use RStudio, a graphical user interface for R, which
is also freely available.
Course Reading
The only required text for the course is:
Dobson, Annette J., and Adrian Barnett. An Introduction to Generalized Linear
Models. CRC Press, 2008. (DA)
The book has been ordered for the Blackwell Bookshop.
All other weekly readings will be provided through Learn.
Other References
Madsen, Henrik, and Poul Thyregod. Introduction to General and Generalized
Linear Models. CRC Press, 2010.
Matloff, Norman. The Art of R Programming: A tour of statistical software design.
No Starch Press, 2011.
Crawley, Michael J. The R book. John Wiley & Sons, 2012.
Chang, Winston. R Graphics Cookbook. O'Reilly Media, Inc., 2012.
Agresti, Alan. Foundations of Linear and Generalized Linear Models. John Wiley &
Sons, 2015.
Dunteman, George H., and Moon-Ho R. Ho. An Introduction to Generalized Linear
Models. Sage, 2006.
Assessment
Course assessment is based on:
Tutorial Assessment (40%)
Tutorial assessment will be based on the best eight out of nine weekly tutorial
quizzes. There will be no quiz in the first and the last weeks. Each of the eight
selected quizzes will be worth 5 percentage points of the 40 percentage points
allocated to tutorial assessment. The quizzes will be no longer than 15 minutes and
will typically include questions on the previous two weeks' material.
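The best-eight-of-nine rule above can be illustrated with a short sketch. It is written in Python purely for illustration and is not the official marking procedure. The function name `tutorial_mark`, the assumption that each quiz is scored on a 0-100 scale, and the treatment of a missed quiz as a score of zero are all hypothetical choices made for this example.

```python
def tutorial_mark(quiz_scores):
    """Return the tutorial contribution (0-40 percentage points).

    quiz_scores: up to nine quiz scores, each assumed to be on a 0-100 scale.
    Assumption (not stated in the handbook): a missed quiz counts as 0.
    """
    if len(quiz_scores) > 9:
        raise ValueError("at most nine quizzes are held")
    # Pad missed quizzes with zeros so there are always nine entries.
    padded = list(quiz_scores) + [0] * (9 - len(quiz_scores))
    # Keep the best eight scores; the lowest one is dropped.
    best_eight = sorted(padded, reverse=True)[:8]
    # Each selected quiz is worth 5 of the 40 percentage points.
    return sum(best_eight) * 5 / 100


# Example: one poor quiz is dropped, the other eight count.
print(tutorial_mark([80, 80, 80, 80, 80, 80, 80, 80, 20]))
```

Under these assumptions, a student who scores 80 on eight quizzes and 20 on one earns the same tutorial mark as a student who scores 80 on all nine.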
Timed Assignment (60%)
Students will have 72 hours to complete a timed assignment. There will be some
constrained choice on the assignment and it will include both problem solving
and data analysis sections.
The assignment will be made available after the lecture on 1 December and will be
due back on 4 December at 12 noon.
Further details about the tutorial assessment and the timed assignment will be
provided in class.
Course Programme
Week 1 - 22/09/15: Principles of Statistical Modeling
- Exploratory data analysis
- Model formulation
- Parameter estimation
- Model diagnostics
- Inference and interpretation
Readings:
- Cox, D. R. and E. J. Snell (1981). Applied Statistics: Principles and Examples.
London: Chapman & Hall (p: 1-19)
- DA (Chapter 3)
Week 2 - 29/09/15: Introduction to R
- Installing R, RStudio and R packages
- Data structures in R (vectors, matrices, lists, data frames)
- Simple programming structures
- Data input and output
Readings:
https://cran.r-project.org/doc/contrib/Torfs+Brauer-Short-R-Intro.pdf
https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf
Week 3 - 06/10/15: Data Manipulation and Visualization with R
- Data manipulation in R using the dplyr and tidyr packages
- Data visualization in R using the ggplot2 package
Readings:
https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html
https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf
Week 4 - 13/10/15: GLM: Basics
- Exponential family of distributions
- Error structures
- Properties of distributions in the exponential family
- Generalized linear models
Readings:
- DA (Chapter 3)
Week 5 - 20/10/15: GLM Estimation: Maximum Likelihood Principle
- Point estimation theory
- The likelihood function
- The maximum likelihood estimate
- Distribution of the ML estimator
- Generalized loss-function and deviance
- Likelihood ratio tests
Readings:
- Madsen, Henrik, and Poul Thyregod. Introduction to General and Generalized
Linear Models. CRC Press, 2010 (Chapter 3)
Week 6 - 27/10/15: GLM: Binary Variables and Logistic Regression
- Dose response models
- General logistic regression model
- Goodness of fit statistics
- Residuals
- Other diagnostics
Readings:
- DA (Chapter 7)
Week 7 - 03/11/15: GLM: Nominal and Ordinal Logistic Regression
- Introduction
- Multinomial distribution
- Nominal logistic regression
- Ordinal logistic regression
Readings:
- DA (Chapter 8)
Week 8 - 10/11/15: GLM: Poisson Regression and Log-Linear Models
- Poisson regression
- Examples of contingency tables
- Probability models for contingency tables
- Log-linear models
- Inference for log-linear models
Readings:
- DA (Chapter 9)
Week 9 - 17/11/15: Unsupervised Learning: PCA / Clustering
- Principal component analysis
- Clustering methods
  - K-means clustering
  - Hierarchical clustering
Readings:
- James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to
Statistical Learning. New York: Springer. (Chapter 10)
Week 10 - 24/11/15: Instrumental Variable (IV) Regression
- Causal effect estimation with a binary IV
- Traditional IV estimators
- Recognized pitfalls of traditional IV estimation
- Instrumental variable estimators of average causal effects
Readings:
- Morgan, S. L., & Winship, C. (2014). Counterfactuals and Causal Inference.
Cambridge University Press. (Chapter 7)
Week 11 - 01/12/15: Revision
Guide to Using LEARN for Online Tutorial Sign-Up:
The following is a guide to using LEARN to sign up for your tutorial. If you have any
problems using the LEARN sign up, please contact the course secretary by email
edwin.cruden@ed.ac.uk
Tutorial sign up will open on 13:30 15.09.14 after the first lecture has taken place, and
will close at 12 noon on the Friday of Week 1 19.09.14.
Step 1 – Accessing LEARN course pages
Access to LEARN is through the MyEd Portal. You will be given a log-in and password
during Freshers’ Week. Once you are logged into MyEd, you should see a tab called
‘Courses’ which will list the active LEARN pages for your courses under ‘myLEARN’.
Step 2 – Welcome to LEARN
Once you have clicked on the relevant course from the list, you will see the Course
Content page. There will be icons for the different resources available, including one
called ‘Tutorial Sign Up’. Please take note of any instructions there.
Step 3 – Signing up for your tutorial
Clicking on Tutorial Sign Up will take you to the sign up page where all the available
tutorial groups are listed along with the running time and location.
Once you have selected the group you would like to attend, click on the ‘Sign up’ button.
A confirmation screen will display.
IMPORTANT: If you change your mind after having chosen a tutorial you cannot go back
and change it and you will need to email the course secretary. Reassignments once
tutorials are full or after the sign-up period has closed will only be made in exceptional
circumstances.
Tutorials have restricted numbers and it is important to sign up as soon as possible. The
tutorial sign up will only be available until 12 noon on the Friday of Week 1 19.09.14 so
that everyone is registered to a group ahead of tutorials commencing in Week 2. If you
have not yet signed up for a tutorial by this time you will be automatically assigned to
a group which you will be expected to attend.
The Operation of Lateness Penalties (1st/2nd years):
Management of deadlines and timely submission of all assessed items (coursework,
essays, project reports, etc.) is a vitally important responsibility in your university career.
Unexcused lateness will mean your work is subject to penalties and will therefore have an
adverse effect on your final grade.
If you miss the submission deadline for any piece of assessed work, 5 marks will be
deducted for each calendar day that the work is late, up to a maximum of five calendar
days (25 marks). Work that is submitted more than five days late will not be accepted and will
receive a mark of zero. There is no grace period for lateness and penalties begin to apply
immediately following the deadline. For example, if the deadline is Tuesday at 12 noon,
work submitted on Tuesday at 12.01pm will be marked as one day late, work submitted
at 12.01pm on Wednesday will be marked as two days late, and so on.
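The penalty rule above can be worked through with a short sketch, written in Python purely for illustration. It is not the School's official calculation. The function names `late_days` and `penalised_mark`, the 0-100 mark scale, the floor at zero, and the treatment of a submission exactly 24 hours late as one day late are assumptions made for this example.

```python
import math
from datetime import datetime, timedelta


def late_days(deadline, submitted):
    """Number of calendar days late, with no grace period.

    Any delay, even one minute, starts a new late day.
    """
    delay = submitted - deadline
    if delay <= timedelta(0):
        return 0
    # Dividing two timedeltas gives a float number of days; round up.
    return math.ceil(delay / timedelta(days=1))


def penalised_mark(raw_mark, deadline, submitted):
    """Apply 5 marks per late day; more than five days late scores zero."""
    days = late_days(deadline, submitted)
    if days > 5:
        return 0
    # Assumption: the penalised mark cannot fall below zero.
    return max(0, raw_mark - 5 * days)


# Deadline on a Tuesday at 12 noon, as in the example above.
deadline = datetime(2015, 12, 1, 12, 0)
print(penalised_mark(65, deadline, datetime(2015, 12, 1, 12, 1)))  # one day late
print(penalised_mark(65, deadline, datetime(2015, 12, 2, 12, 1)))  # two days late
```

With a raw mark of 65, the two submissions in the example would receive 60 and 55 respectively under these assumptions.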
Extension Policy (1st/2nd years):
If you have good reason for not meeting a coursework deadline, you may request an
extension from either your tutor (for extensions of up to five calendar days) or the course
organiser (for extensions of six or more calendar days), normally before the deadline.
Any requests submitted after the deadline may still be considered by the course organiser
if there have been extenuating circumstances. A good reason is illness, or serious
personal circumstances, but not pressure of work or poor time management. Your
tutor/course organiser must inform the course secretary in writing about the extension,
for which supporting evidence may be requested. Work which is submitted late without
your tutor's or course organiser's permission (or without a medical certificate or other
supportive evidence) will be subject to lateness penalties.
Procedure for Viewing Marked Exam Scripts:
If you would like to see your exam script after the final marks have been published then
you should contact the course secretary by email to arrange a time to do this. Please note
that there will be no feedback comments written on the scripts, but you may find it useful
to look at what you wrote, and see the marks achieved for each individual question. You
will not be permitted to keep the exam script but you are welcome to take it away to read
over or make photocopies. If you wish to do this please bring a form of ID that can be left
at the office until you return the script. Please note that scripts cannot be taken away
overnight.
Return of Feedback:
Feedback for coursework will be returned online via ELMA.
Monitoring Attendance and Engagement
It is the policy of the University as well as good educational practice to monitor
the engagement and attendance of all our students on all our programmes. This
provides a positive opportunity for us to identify and help those of you who
might be having problems of one kind or another, or who might need additional
support. Monitoring attendance is particularly important for our Tier 4 students,
as the University is the sponsor of your UK visa. Both the School and the
individual student have particular responsibilities to ensure that the terms of
your visa are met fully so that you can continue your studies with us. Tier 4
students should read carefully the advice set out in the Appendix to this
Handbook. This can also be found at
www.sps.ed.ac.uk/undergrad/current_students/student_support/students_on_a_tier_4_visa
You can also consult: www.ed.ac.uk/immigration
Collaboration, Cheating and Plagiarism
Plagiarism Guidance for Students:
Avoiding Plagiarism:
Material you submit for assessment, such as your essays, must be your own work. You
can, and should, draw upon published work, ideas from lectures and class discussions,
and (if appropriate) even upon discussions with other students, but you must always
make clear that you are doing so. Passing off anyone else’s work (including another
student’s work or material from the Web or a published author) as your own is plagiarism
and will be punished severely. When you upload your work to ELMA you will be asked to
check a box to confirm the work is your own. ELMA automatically runs all submissions
through ‘Turnitin’, our plagiarism detection software, and compares every essay against
a constantly-updated database, which highlights all plagiarised work. Assessed work that
contains plagiarised material will be awarded a mark of zero, and serious cases of
plagiarism will also be reported to the College Academic Misconduct officer. In either
case, the actions taken will be noted permanently on the student's record. For further
details on plagiarism see the Academic Services’ website:
http://www.ed.ac.uk/schools-departments/academic-services/students/undergraduate/discipline/plagiarism
Discussing Sensitive Topics:
You should read this handbook carefully and, if there are any topics that you feel
may distress you, seek advice from the course convener and/or your Personal
Tutor.
For more general issues you may consider seeking the advice of the Student Counselling
Service, http://www.ed.ac.uk/schools-departments/student-counselling
Learning Resources for Undergraduates:
The Study Development Team at the Institute for Academic Development (IAD) provides
resources and workshops aimed at helping all students to enhance their learning skills
and develop effective study techniques. Resources and workshops cover a range of
topics, such as managing your own learning, reading, note making, essay and report
writing, exam preparation and exam techniques.
The study development resources are housed on 'LearnBetter' (undergraduate), part of
Learn, the University's virtual learning environment. Follow the link from the IAD Study
Development web page to enrol: www.ed.ac.uk/iad/undergraduates
Workshops are interactive: they will give you the chance to take part in activities, have
discussions, exchange strategies, share ideas and ask questions. They are 90 minutes
long and held on Wednesday afternoons at 1.30pm or 3.30pm. The schedule is available
from the IAD Undergraduate web page (see above).
Workshops are open to all undergraduates but you need to book in advance, using the
MyEd booking system. Each workshop opens for booking 2 weeks before the date of the
workshop itself. If you book and then cannot attend, please cancel in advance through
MyEd so that another student can have your place. (To be fair to all students, anyone
who persistently books on workshops and fails to attend may be barred from signing up
for future events).
Study Development Advisors are also available for an individual consultation if you have
specific questions about your own approach to studying, working more effectively,
strategies for improving your learning and your academic work. Please note, however,
that Study Development Advisors are not subject specialists so they cannot comment on
the content of your work. They also do not check or proof read students' work.
To make an appointment with a Study Development Advisor, email iad.study@ed.ac.uk
(For support with English Language, you should contact the English Language Teaching
Centre).
Appendix
STUDENTS ON A TIER 4 VISA
As a Tier 4 student, the University of Edinburgh is the sponsor of your UK visa.
The University has a number of legal responsibilities, including monitoring your
attendance on your programme and reporting to the Home Office where:
- you suspend your studies, transfer or withdraw from a course, or complete
  your studies significantly early;
- you fail to register/enrol at the start of your course or at the two additional
  registration sessions each year and there is no explanation;
- you are repeatedly absent or are absent for an extended period and are
  excluded from the programme due to non-attendance. This includes missing
  Tier 4 census points without due reason. The University must maintain a
  record of your attendance and the Home Office can ask to see this or request
  information about it at any time.
As a student with a Tier 4 visa sponsored by the University of Edinburgh, the
terms of your visa require you to (amongst other things):
- Ensure you have a correct and valid visa for studying at the University of
  Edinburgh, which, if a Tier 4 visa, requires that it is a visa sponsored by the
  University of Edinburgh.
- Attend all of your University classes, lectures, tutorials, etc. where required. This
  includes participating in the requirements of your course, including submitting
  assignments, attending meetings with tutors and attending examinations. If you
  cannot attend due to illness, for example, you must inform your School. This
  includes attending Tier 4 Census sessions when required throughout the academic
  session.
- Make sure that your contact details, including your address and contact
  numbers, are up to date in your student record.
- Make satisfactory progress on your chosen programme of studies.
- Observe the general conditions of a Tier 4 General student visa in the UK,
  including studying on the programme for which your visa was issued, not
  overstaying the validity of your visa and complying with the work restrictions
  of the visa.
Please note that any email relating to your Tier 4 sponsorship, including census
dates and times will be sent to your University email address - you should
therefore check this regularly.
Further details on the terms and conditions of your Tier 4 visa can be found in
the “Downloads” section at www.ed.ac.uk/immigration
Information or advice about your Tier 4 immigration status can be obtained by
contacting the International Student Advisory Service, located at the
International Office, 33 Buccleuch Place, Edinburgh EH8 9JS
Email: immigration@ed.ac.uk