Statistical Modeling for Social Scientists (Course Code: SSPS10023)
School of Social & Political Science, University of Edinburgh
Session 2015-2016, Semester 1

The Course

The main aim of this course is to provide a broad perspective on the use of statistical modeling to reach conclusions from data. It covers generalized linear models, some major statistical learning tools, and models for complex causal relationships, mainly in the context of the social sciences. Lectures are combined with practical computer lab tutorials that illustrate the applications of the theoretical tools. The course takes a hands-on approach, with analysis carried out in the statistical software R. The applications are mostly drawn from real social science research questions, but examples from other disciplines such as biology, medicine and engineering are also given. Although the course covers the technical aspects of the models introduced, the emphasis is on application, coding and interpretation. In addition to the theoretical tools, the course aims to equip students with two further computational skills: data management and data visualization. The R packages dplyr and ggplot2 will be introduced and used for these purposes.

Learning Outcomes

By the end of the course students will:
1. Have a unified conceptual and mathematical understanding of generalized linear models.
2. Be able to use the statistical software R for data management, data analysis and data visualization.
3. Be able to analyze multidimensional data through dimension reduction and clustering.
4. Appreciate the uses and limits of maximum likelihood estimation.
5. Be able to deal with a particular causality problem using instrumental variable regression.
Course Organisation

The course convener is Ugur Ozdemir (Room 3.02 CMB); office hours by appointment; email: Ugur.Ozdemir@ed.ac.uk
Course Secretary: Daniel Jackson, email: daniel.jackson@ed.ac.uk, contact number 0131 511 337
Lectures: Tuesday, 11:10-13:00
Tutorial: Wednesday, 10:00-10:50

Course Outline
1. Principles of Statistical Modeling
2. Introduction to R
3. Data Manipulation and Visualization with R
4. GLM: Basics
5. GLM Estimation: Maximum Likelihood Principle
6. GLM: Binary Variables and Logistic Regression
7. GLM: Nominal and Ordinal Logistic Regression
8. GLM: Poisson Regression and Log-Linear Models
9. Unsupervised Learning: PCA / Clustering
10. Instrumental Variable Regression
11. Revision

The statistical software R will be used throughout the course. R is open-source software and is freely available online. We will also use R-Studio, a graphical user interface for R, which is also freely available.

Course Reading

The only required text for the course is:
Dobson, Annette J., and Adrian Barnett. An Introduction to Generalized Linear Models. CRC Press, 2008. (DA)
Copies of the book have been ordered for Blackwell Bookshop. All other weekly readings will be provided through Learn.

Other References
Madsen, Henrik, and Poul Thyregod. Introduction to General and Generalized Linear Models. CRC Press, 2010.
Matloff, Norman. The Art of R Programming: A Tour of Statistical Software Design. No Starch Press, 2011.
Crawley, Michael J. The R Book. John Wiley & Sons, 2012.
Chang, Winston. R Graphics Cookbook. O'Reilly Media, Inc., 2012.
Agresti, Alan. Foundations of Linear and Generalized Linear Models. John Wiley & Sons, 2015.
Dunteman, George H., and Moon-Ho R. Ho. An Introduction to Generalized Linear Models. Sage, 2006.

Assessment

Course assessment is based on:

Tutorial Assessment (40%)
Tutorial assessment will be based on the best eight out of nine weekly tutorial quizzes.
There will be no quiz in the first and last weeks. Each of the eight selected quizzes will be worth 5 of the 40 percentage points allocated to the tutorial assessment. The quizzes will last no longer than 15 minutes and will typically include questions on the previous two weeks' material.

Timed Assignment (60%)
Students will have 72 hours to complete a timed assignment. The assignment will offer some constrained choice and will include both problem-solving and data analysis sections. It will be made available after the lecture on 1 December and will be due back on 4 December at 12 noon.
Further details about the tutorial assessment and the timed assignment will be provided in class.

Course Programme

Week 1 - 22/09/15: Principles of Statistical Modeling
- Exploratory data analysis
- Model formulation
- Parameter estimation
- Model diagnostics
- Inference and interpretation
Readings:
- Cox, D. R. and E. J. Snell (1981). Applied Statistics: Principles and Examples. London: Chapman & Hall (pp. 1-19)
- DA (Chapter 3)

Week 2 - 29/09/15: Introduction to R
- Installing R, R-Studio and R packages
- Data structures in R (vectors, matrices, lists, data frames)
- Simple programming structures
- Data input and output
Readings:
- https://cran.r-project.org/doc/contrib/Torfs+Brauer-Short-R-Intro.pdf
- https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf

Week 3 - 06/10/15: Data Manipulation and Visualization with R
- Data manipulation in R using the dplyr and tidyr packages
- Data visualization in R using the ggplot2 package
Readings:
- https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html
- https://www.rstudio.com/wp-content/uploads/2015/02/data-wranglingcheatsheet.pdf
- https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf

Week 4 - 13/10/15: GLM: Basics
- Exponential family of distributions
- Error structures
- Properties of distributions in the exponential family
- Generalized linear models
Readings:
- DA (Chapter 3)

Week 5 - 20/10/15: GLM Estimation: Maximum Likelihood Principle
- Point estimation theory
- The likelihood function
- The maximum likelihood estimate
- Distribution of the ML estimator
- Generalized loss function and deviance
- Likelihood ratio tests
Readings:
- Madsen, Henrik, and Poul Thyregod. Introduction to General and Generalized Linear Models. CRC Press, 2010 (Chapter 3)

Week 6 - 27/10/15: GLM: Binary Variables and Logistic Regression
- Dose response models
- General logistic regression model
- Goodness of fit statistics
- Residuals
- Other diagnostics
Readings:
- DA (Chapter 7)

Week 7 - 03/11/15: GLM: Nominal and Ordinal Logistic Regression
- Introduction
- Multinomial distribution
- Nominal logistic regression
- Ordinal logistic regression
Readings:
- DA (Chapter 8)

Week 8 - 10/11/15: GLM: Poisson Regression and Log-Linear Models
- Poisson regression
- Examples of contingency tables
- Probability models for contingency tables
- Log-linear models
- Inference for log-linear models
Readings:
- DA (Chapter 9)

Week 9 - 17/11/15: Unsupervised Learning: PCA / Clustering
- Principal component analysis
- Clustering methods
  - K-means clustering
  - Hierarchical clustering
Readings:
- James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning. New York: Springer. (Chapter 10)

Week 10 - 24/11/15: Instrumental Variable (IV) Regression
- Causal effect estimation with a binary IV
- Traditional IV estimators
- Recognized pitfalls of traditional IV estimation
- Instrumental variable estimators of average causal effects
Readings:
- Morgan, S. L., & Winship, C. (2014). Counterfactuals and Causal Inference. Cambridge University Press. (Chapter 7)

Week 11 - 01/12/15: Revision

Guide to Using LEARN for Online Tutorial Sign-Up

The following is a guide to using LEARN to sign up for your tutorial. If you have any problems using the LEARN sign-up, please contact the course secretary by email: edwin.cruden@ed.ac.uk
Tutorial sign-up will open at 13:30 on 15.09.14, after the first lecture has taken place, and will close at 12 noon on the Friday of Week 1 (19.09.14).

Step 1 – Accessing LEARN course pages
Access to LEARN is through the MyEd portal. You will be given a login and password during Freshers' Week.
Once you are logged into MyEd, you should see a tab called 'Courses', which lists the active LEARN pages for your courses under 'myLEARN'.

Step 2 – Welcome to LEARN
Once you have clicked on the relevant course in the list, you will see the Course Content page. There will be icons for the different resources available, including one called 'Tutorial Sign Up'. Please take note of any instructions there.

Step 3 – Signing up for your tutorial
Clicking on Tutorial Sign Up will take you to the sign-up page, where all the available tutorial groups are listed along with their times and locations. Once you have selected the group you would like to attend, click the 'Sign up' button. A confirmation screen will be displayed.

IMPORTANT: If you change your mind after choosing a tutorial, you cannot go back and change it yourself; you will need to email the course secretary. Reassignments once tutorials are full, or after the sign-up period has closed, will only be made in exceptional circumstances. Tutorials have restricted numbers, so it is important to sign up as soon as possible. Tutorial sign-up will only be available until 12 noon on the Friday of Week 1 (19.09.14), so that everyone is registered to a group before tutorials begin in Week 2. If you have not signed up for a tutorial by this time, you will be automatically assigned to a group, which you will be expected to attend.

The Operation of Lateness Penalties (1st/2nd years):
Management of deadlines and timely submission of all assessed items (coursework, essays, project reports, etc.) is a vitally important responsibility in your university career. Unexcused lateness will mean your work is subject to penalties, which will have an adverse effect on your final grade. If you miss the submission deadline for any piece of assessed work, 5 marks will be deducted for each calendar day that the work is late, up to a maximum of five calendar days (25 marks).
Work that is submitted more than five days late will not be accepted and will receive a mark of zero. There is no grace period for lateness, and penalties begin to apply immediately after the deadline. For example, if the deadline is Tuesday at 12 noon, work submitted on Tuesday at 12.01pm will be marked as one day late, work submitted at 12.01pm on Wednesday will be marked as two days late, and so on.

Extension Policy (1st/2nd years):
If you have a good reason for not meeting a coursework deadline, you may request an extension from either your tutor (for extensions of up to five calendar days) or the course organiser (for extensions of six or more calendar days), normally before the deadline. Requests submitted after the deadline may still be considered by the course organiser if there have been extenuating circumstances. Good reasons include illness or serious personal circumstances, but not pressure of work or poor time management. Your tutor or course organiser must inform the course secretary in writing about the extension, and supporting evidence may be requested. Work that is submitted late without your tutor's or course organiser's permission (or without a medical certificate or other supporting evidence) will be subject to lateness penalties.

Procedure for Viewing Marked Exam Scripts:
If you would like to see your exam script after the final marks have been published, contact the course secretary by email to arrange a time to do this. Please note that there will be no feedback comments written on the scripts, but you may find it useful to look at what you wrote and see the marks achieved for each individual question. You will not be permitted to keep the exam script, but you are welcome to take it away to read over or make photocopies. If you wish to do this, please bring a form of ID that can be left at the office until you return the script. Please note that scripts cannot be taken away overnight.
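The lateness penalty rule described earlier is simple enough to state as a calculation. The sketch below is purely illustrative and is not the School's official marking procedure. It is written in Python rather than the course's R. It rests on three assumptions:
- marks are on a 0-100 scale;
- each started 24-hour period after the deadline counts as one calendar day late, which matches the Tuesday 12 noon example;
- a penalised mark cannot fall below zero.

The function names `days_late` and `penalised_mark` are hypothetical.

```python
import math
from datetime import datetime, timedelta

PENALTY_PER_DAY = 5  # marks deducted for each calendar day late
MAX_LATE_DAYS = 5    # work later than this is not accepted


def days_late(deadline: datetime, submitted: datetime) -> int:
    """Count each started 24-hour period after the deadline as one day late.

    Assumption: "calendar day" is read as a 24-hour period from the deadline,
    so Tuesday 12.01pm is one day late and Wednesday 12.01pm is two days late.
    """
    if submitted <= deadline:
        return 0  # on time: no grace period is needed because nothing is owed
    overdue = submitted - deadline
    # timedelta / timedelta gives a float number of days; round up.
    return math.ceil(overdue / timedelta(days=1))


def penalised_mark(raw_mark: int, deadline: datetime, submitted: datetime) -> int:
    """Apply the lateness penalty to a raw mark (assumed 0-100 scale)."""
    late = days_late(deadline, submitted)
    if late > MAX_LATE_DAYS:
        return 0  # more than five days late: not accepted, mark of zero
    # 5 marks per day, so at most 25 marks; assumption: never below zero.
    return max(raw_mark - PENALTY_PER_DAY * late, 0)


if __name__ == "__main__":
    deadline = datetime(2015, 12, 1, 12, 0)  # a Tuesday, 12 noon
    print(days_late(deadline, datetime(2015, 12, 1, 12, 1)))  # 1
    print(days_late(deadline, datetime(2015, 12, 2, 12, 1)))  # 2
    print(penalised_mark(65, deadline, datetime(2015, 12, 2, 12, 1)))  # 55
    print(penalised_mark(65, deadline, datetime(2015, 12, 6, 12, 1)))  # 0
```

The whole rule turns on how "one day late" is counted. Reading it as a 24-hour period measured from the deadline reproduces both worked cases in the handbook. A strict midnight-to-midnight reading would give different results for submissions on Wednesday morning.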
Return of Feedback:
Feedback for coursework will be returned online via ELMA.

Monitoring Attendance and Engagement
It is the policy of the University, as well as good educational practice, to monitor the engagement and attendance of all our students on all our programmes. This provides a positive opportunity for us to identify and help those of you who might be having problems of one kind or another, or who might need additional support.
Monitoring attendance is particularly important for our Tier 4 students, as the University is the sponsor of your UK visa. Both the School and the individual student have particular responsibilities to ensure that the terms of your visa are met fully so that you can continue your studies with us. Tier 4 students should read carefully the advice set out in the Appendix to this Handbook. This can also be found at www.sps.ed.ac.uk/undergrad/current_students/student_support/students_on_a_tier_4_visa. You can also consult www.ed.ac.uk/immigration

Collaboration, Cheating and Plagiarism
Plagiarism Guidance for Students: Avoiding Plagiarism:
Material you submit for assessment, such as your essays, must be your own work. You can, and should, draw upon published work, ideas from lectures and class discussions, and (if appropriate) even discussions with other students, but you must always make clear that you are doing so. Passing off anyone else's work (including another student's work or material from the Web or a published author) as your own is plagiarism and will be punished severely.
When you upload your work to ELMA, you will be asked to check a box to confirm that the work is your own. ELMA automatically runs all submissions through 'Turnitin', our plagiarism detection software, which compares every essay against a constantly updated database and highlights all plagiarised work.
Assessed work that contains plagiarised material will be awarded a mark of zero, and serious cases of plagiarism will also be reported to the College Academic Misconduct Officer. In either case, the action taken will be noted permanently on the student's record. For further details on plagiarism, see the Academic Services website: http://www.ed.ac.uk/schools-departments/academicservices/students/undergraduate/discipline/plagiarism

Discussing Sensitive Topics:
Please read this handbook carefully. If there are any topics that you may find distressing, you should seek advice from the course convener and/or your Personal Tutor. For more general issues, you may consider seeking the advice of the Student Counselling Service: http://www.ed.ac.uk/schools-departments/student-counselling

Learning Resources for Undergraduates:
The Study Development Team at the Institute for Academic Development (IAD) provides resources and workshops aimed at helping all students to enhance their learning skills and develop effective study techniques. Resources and workshops cover a range of topics, such as managing your own learning, reading, note making, essay and report writing, exam preparation and exam techniques.
The study development resources are housed on 'LearnBetter' (undergraduate), part of Learn, the University's virtual learning environment. Follow the link from the IAD Study Development web page to enrol: www.ed.ac.uk/iad/undergraduates
Workshops are interactive: they give you the chance to take part in activities, have discussions, exchange strategies, share ideas and ask questions. They are 90 minutes long and are held on Wednesday afternoons at 1.30pm or 3.30pm. The schedule is available from the IAD Undergraduate web page (see above). Workshops are open to all undergraduates, but you need to book in advance using the MyEd booking system. Each workshop opens for booking two weeks before the date of the workshop itself.
If you book and then cannot attend, please cancel in advance through MyEd so that another student can have your place. (To be fair to all students, anyone who persistently books on workshops and fails to attend may be barred from signing up for future events.)
Study Development Advisors are also available for individual consultations if you have specific questions about your own approach to studying, working more effectively, or strategies for improving your learning and your academic work. Please note, however, that Study Development Advisors are not subject specialists, so they cannot comment on the content of your work. They also do not check or proofread students' work. To make an appointment with a Study Development Advisor, email iad.study@ed.ac.uk (for support with English language, you should contact the English Language Teaching Centre).

Appendix
STUDENTS ON A TIER 4 VISA
If you are a Tier 4 student, the University of Edinburgh is the sponsor of your UK visa. The University has a number of legal responsibilities, including monitoring your attendance on your programme and reporting to the Home Office where:
- you suspend your studies, transfer or withdraw from a course, or complete your studies significantly early;
- you fail to register/enrol at the start of your course or at the two additional registration sessions each year and there is no explanation;
- you are repeatedly absent or are absent for an extended period and are excluded from the programme due to non-attendance. This includes missing Tier 4 census points without due reason.
The University must maintain a record of your attendance, and the Home Office can ask to see this record or request information about it at any time.

As a student with a Tier 4 visa sponsored by the University of Edinburgh, the terms of your visa require you to (amongst other things):
- Ensure you have a correct and valid visa for studying at the University of Edinburgh, which, if it is a Tier 4 visa, must be sponsored by the University of Edinburgh;
- Attend all of your University classes, lectures, tutorials, etc. where required. This includes meeting the requirements of your course, including submitting assignments, attending meetings with tutors and attending examinations. If you cannot attend due to illness, for example, you must inform your School. This also includes attending Tier 4 census sessions when required throughout the academic session;
- Make sure that your contact details, including your address and contact numbers, are up to date in your student record;
- Make satisfactory progress on your chosen programme of studies;
- Observe the general conditions of a Tier 4 (General) student visa in the UK, including studying on the programme for which your visa was issued, not overstaying the validity of your visa and complying with the work restrictions of the visa.

Please note that any email relating to your Tier 4 sponsorship, including census dates and times, will be sent to your University email address, so you should check it regularly. Further details on the terms and conditions of your Tier 4 visa can be found in the "Downloads" section at www.ed.ac.uk/immigration

Information or advice about your Tier 4 immigration status can be obtained by contacting the International Student Advisory Service, located at the International Office, 33 Buccleuch Place, Edinburgh EH8 9JS. Email: immigration@ed.ac.uk