SOC 210: GRADUATE STATISTICS I
______________________________________________________________________________
Lecture
Section Number: 14618
Room: SSM 100
Day: Monday & Wednesday
Time: 1:30-2:45 PM
______________________________________________________________________________
Instructor: Kyle Dodson
Office: COB 337
Office Hours: Wednesday, 3:00 PM-4:30 PM (or by appointment)
E-Mail: kdodson2@ucmerced.edu

Course Description

This is the first semester of the two-course sequence in social statistics required of graduate students in Sociology. This course takes a systematic approach to the general linear model for continuous dependent variables; the second semester course covers nonlinear regression models for categorical and limited dependent variables. In addition to laying the theoretical foundations for future social science research, this course introduces students to computerized statistical analysis using the software program Stata. Students are encouraged to think creatively about how to use statistical methods in their own research.

Students meet twice each week for a 75-minute lecture on statistical fundamentals, theory, applications, and topics. There are no mathematics prerequisites. Students are not expected to have a background in calculus, but facility with algebra and exposure to the rudiments of statistical distribution theory and hypothesis testing are expected.

The course is organized into four sections. The first section of the course covers the fundamental mathematical and statistical concepts that are the building blocks for regression analysis. The purpose of this section is both to refresh your memory and to provide a deeper, more formal presentation of familiar concepts. The second section focuses on the assumptions and mechanics of the classical linear regression model. At the end of the second section you will have a good mechanical knowledge of regression analysis.
The third section includes a practical exposition of the general linear model as we begin to relax the assumptions of the classical linear regression model. At the end of the third section you will have a deeper theoretical and applied understanding of the flexibility and limitations of the general linear regression model for social science data. The final section presents an overview of topics in estimation for common problems in social science research, including an introduction to structural equation models. The purpose of this brief section is to give you some exposure to more complex models for continuous dependent variables rather than to ask you to develop sophistication with these techniques.

Details

The SOC 210 course notes are required reading and the primary source for lectures. Course notes are published on the web. These notes were authored by Professor Patricia McManus at Indiana University. They may not be cited, reproduced, or distributed without the express written consent of Professor McManus.

There is no required textbook. An accessible optional text is Jeffrey Wooldridge’s Introductory Econometrics, 4th edition. The textbook expands on topics covered in the course notes, covers statistical theory in more detail, and presents material that is beyond the scope of a one-semester course or is covered only briefly in class.

Students looking for an optional introductory text may want Paul Allison’s Multiple Regression: A Primer. This text covers background material that is typically covered in an undergraduate course in social statistics, and introduces many of the course topics. Students interested in obtaining a more advanced graduate-level econometrics textbook for reference are encouraged to choose Damodar N. Gujarati’s Basic Econometrics as an introductory text or the intermediate text Econometric Analysis by William H. Greene.

We use Stata version 13 in this class.
Academic Technology Services at UCLA maintains an unparalleled set of online statistical resources, including an excellent site for learning and using Stata. Students who want more in-depth coverage of Stata can find copies of the Stata manuals on the computers in the graduate student lab.

Primary texts:
S554 Lecture Notes. Available on the course website.

Recommended texts:
Allison, Paul D. 1998. Multiple Regression: A Primer. Pine Forge Press.
Greene, William H. 2002. Econometric Analysis 5th Edition. Prentice-Hall.
Gujarati, Damodar N. 2003. Basic Econometrics 4th Edition. New York: McGraw-Hill.
Wooldridge, Jeffrey. 2009. Introductory Econometrics: A Modern Approach 4th Edition. Mason, Ohio: South-Western College Publishing Co.

Software:

All the necessary software is available in the graduate student lab. Class assignments will require the use of Stata Version 13, a statistical software package. Stata includes a text editor to write and edit batch files containing multiple Stata commands. These batch files, called “do-files,” can also be edited with any text-based file editor such as Notepad or Textpad. Stata’s text editor has become increasingly sophisticated, but my personal preference is to use Textpad (http://www.textpad.com). A syntax file that allows for code highlighting can be downloaded from Scott Long’s website (http://www.indiana.edu/~jslsoc/stata/wf/misc/textpad_syntax_file/stata.syn). Students who wish to buy Stata for home use can inquire about student discounts through UC Merced IT.

Students will also need a flexible word-processing package such as Microsoft Word to complete assignments. The word processor and printer must have the capacity to (a) choose a fixed font (such as Courier) for tables and (b) choose a Greek or mathematical font for common characters such as β and σ. Microsoft Word also offers useful matrix and graphics capabilities.
Supplies:

- Your network folder (“U Drive” or “L Drive”) will prove useful for storing data and programs.
- You will likely need portable storage. Your best bet is a USB-compatible memory key.
- You’ll need three large Campus Mail envelopes for handing in assignments. Get them from the SSHA office or mailroom, and write your name on the outside, along with your campus address if you are not in the Sociology department. I use these envelopes to return assignments.
- Lots of paper and toner.

Requirements and Grading:

Students are required to complete 8 homework assignments during the course of the semester. Each assignment is worth between 50 and 175 points for a total of 1000 points over the course of the semester. The final grade for the course is determined by the sum of individual assignment grades. Grading for this course is as follows:

Points        Grade      Points        Grade
981-1000      A+         781-800       C+
931-980       A          731-780       C
901-930       A-         701-730       C-
881-900       B+         601-700       D
831-880       B          0-599         F
801-830       B-

Assignments

Each of the assignments includes data analysis exercises using Stata and the course data extracts provided. The earlier assignments will be very structured, but towards the end of the semester you will be asked to choose or construct your own variables for analysis. You can use the course data extracts for these assignments, or you may discuss alternative data sources with me in advance. Note that you should design these analyses so that the results reveal interesting relationships. If you choose variables that are not significant and have small effects, your grade will necessarily be lower.

Deadlines and Late Penalties

It is critical that you keep up with assignments. Assignments should be handed to the professor or placed in the professor’s box in the Classroom and Office Building on the due date. Be sure to confirm with me if you need to make alternative arrangements or if you are handing in a late assignment; don’t expect the box to be checked regularly.
Assignments due in class are due at the start of class on the due date. Late assignments will be penalized as follows:

- 5 points if received within 24 hours of the time due
- 10 points if received within 48 hours of the time due
- 25 points if received within 72 hours of the time due
- 50 points if received within 96 hours of the time due
- 75 points if received after 96 hours

In no case will assignments be accepted on or after the sixth calendar day after the due date.

Working Together

Students are encouraged to discuss homework assignments and data preparation with each other. In particular, when cleaning data and constructing new variables for the early assignments it is a good idea to compare your data with one or two other students before beginning your write-up. The final product, however, must reflect your own work. On computer assignments that require that you choose variables for analysis, everyone is expected to use different variables. If you are aware that someone else is using the same variables that you are using, one or both of you need to change variables.

Computer Problems

If you are having problems analyzing your data, be sure to bring a hard copy listing of the command file and the output, along with a disk with the command file and the output file. It is impossible to diagnose error messages without these. If you send a question electronically, include the Stata log file.

Revisions of Assignments

You may submit revisions of any assignment scoring below 90% of the total point count. Sometimes I note on the assignment that a revision should be done, but you can always request a revision. If you submit a revised assignment, your grade for that assignment will be the average of the original and the revised work, less any late penalties imposed on the original submission. Revisions must follow this format:

- Include a brief memo summarizing the problems you have addressed.
- Resubmit the original, graded, marked-up copy of the assignment in its entirety along with the revised assignment.
- Include a clean copy of the entire assignment, including questions that have not been changed from the original.
- Use highlighting and/or marginal notes on the revised assignment to indicate all sections that have been changed.
- If the revision includes any changes in the data analysis, a new copy of the entire data log should be submitted.
- The entire package should be given to me directly (not to the lab instructor) within three weeks of the date the original was returned.

Disability Statement

I am committed to providing assistance to help you be successful in this course. Reasonable accommodations are available for students with a documented disability. If you have a disability and may need accommodations to fully participate in this class, please visit the Disability Services Center. All accommodations must be approved through Disability Services (Kolligian Library, West Wing Suite 109). Please stop by or call 209-228-6996 to make an appointment with a disability specialist.

Honor Code

If you plagiarize, or otherwise cheat, on any assignment, you will fail this course and your transcript will note your violation of the academic honesty policy. Plagiarism involves intentionally representing someone else’s words or ideas as your own. If you use outside sources, whether in the form of quotes or ideas, you must cite them to indicate where they come from. Please see or email me if you need help with citations. When in doubt, ask!

If you cheat, or let someone else represent your work as their own, you are in violation of the student code of conduct. You will fail this course and your failing grade will be identified on your student transcript as resulting from academic dishonesty.
Please consult the Office of Student Life website if you require further information: http://studentlife.ucmerced.edu/ (then go to “Student Judicial Affairs” and look at the “academic honesty policy”).

Your enrollment in this course indicates your willingness to comply with all requirements and policies.

Course Outline & Readings

This outline provides a good but imperfect estimate of the time we will spend on each topic. The actual schedule is subject to change. Please read the course notes before class. Readings from the course notes are identified as “Lectures 1 & 2.”

Assignment Due Dates and Point Count

Assignment                              Day   Due Date   Points
Assignment 1. Introduction to Stata     Mon   Feb 2      130
Assignment 2. Data Management           Wed   Feb 18     130
Assignment 3. Regression Fit            Mon   Mar 2      150
Assignment 4. Regression Inference      Mon   Mar 16     150
Assignment 5. Functional Form           Mon   Apr 13     125
Assignment 6. Diags / Heterosked.       Mon   Apr 27     150
Assignment 7. Multiple Regression       Fri   May 8      165
TOTAL POINTS                                             1000

I. Preliminary Toolkit. Mathematics, Probability, Statistics

Week #1    January 21
Organizational Meeting

Week #2    January 26-28
Introductory Concepts: Why, What, How
  Course Notes: Lecture 1
  Course Notes: Appendix A

Week #3    February 2-4
Recapitulation. Mathematics, Probability, Statistics
  Course Notes: Lecture 2
  Course Notes: Lecture 3

II. Presentation of the Classical Normal Linear Regression Model

Week #4    February 9-11
Simple Regression: Estimation, Interpretation, Fit
  Course Notes: Lecture 4

Week #5    February 18
Multiple Regression: Estimation, Interpretation, Fit
  Course Notes: Lecture 5

Week #6    February 23-25
Statistical Inference I: Interval Estimation and Hypothesis Testing
  Course Notes: Lecture 6, 6.1 through 6.9

Week #7    March 2-4
Statistical Inference II: Interval Estimation and Hypothesis Testing
  Course Notes: Lecture 6, 6.10 to end
  Course Notes: Appendix D

Week #8    March 9-11
Specification Tests & Functional Form
  Course Notes: Lecture 7

Week #9    March 16-18
Group Differences: Dummy Variables and Interaction Terms
  Course Notes: Lecture 8

Week #10   March 30-April 1
Omitted Variables, Measurement Error, Multicollinearity
  Course Notes: Lecture 9
  Course Notes: Lecture 10

III. The General Linear Model: Relaxing the Assumptions

Week #11   April 6-8
Heteroskedastic Errors, Loglinear Transformation, WLS
  Course Notes: Lecture 11

Week #12   April 13-15
Correlated Errors
  Course Notes: Lecture 12

IV. Topics in Estimation

Week #13   April 20-22
Introduction to Systems of Equations
  Course Notes: Lecture 13

Week #14   April 27-29
Jackknife and Bootstrap Methods for Interval Estimation
  Course Notes: Lecture 14

Week #15   May 4-6
Maximum Likelihood Estimation
  Course Notes: Lecture 15