University of Miami

College of Arts and Sciences

Department of Sociology

SOC 611 – Intermediate Sociological Statistics

Spring 2011

Professor: Michael T. French, Ph.D.

Time: Wednesdays, 9:00-11:30

Classroom: Sociology conference room

Office: Merrick, Room 121F

Office Hours: After class or by appointment

Telephone: 305-284-6039

Fax: 305-284-5310

A: Zuzer Calero (305-284-8288)

E-mail: mfrench@miami.edu

Prerequisites

The prerequisites for this course are SOC 211, SOC 511, or another introductory statistics class; some experience using a statistical package such as Stata, SAS, or SPSS on a personal computer; and/or permission of the instructor. At a minimum, students should be familiar and comfortable with all topics covered in Appendixes A, B, and C (pages 695-783) of Wooldridge (2009) and Chapters 1-4 of Hamilton (2009).

Course Description

Analysis of sociological, economic, and demographic data requires a broad understanding and application of statistical techniques. Although the application of statistical techniques to actual data has a specific name in most disciplines (e.g., econometrics, psychometrics, biostatistics), many of the core methods are practiced by all disciplines. Thus, this course will introduce a wide range of data analysis methods that are commonly used by social scientists. Statistics can be taught from various perspectives with a number of different approaches. Some professors use a completely nonmathematical approach with an emphasis on memorizing techniques and formulas. At the other extreme, instructors develop the material within a rigorous and complex mathematical framework.

This course will use a more balanced approach with a focus on developing and understanding key statistical concepts, and then applying these concepts with numerous “real world” examples and actual data.

It is assumed that all students have a solid foundation of basic statistical concepts that were taught in some of the prerequisite courses. These concepts include descriptive statistics, probability, probability distributions, basic estimators, confidence intervals, and hypothesis testing. The course will begin with a brief overview of statistical concepts and then continue with a detailed presentation of specific topics including simple linear regression, multiple regression, regression diagnostics, techniques for dichotomous dependent variables (logit, probit, and linear probability models), techniques for count dependent variables (Poisson regression, negative binomial regression), panel data methods, simultaneous equation models, and (time permitting) miscellaneous topics. All of these methods will be demonstrated through the use of examples in the textbook and “hands on” use of actual data sets.

Since the course will be structured as an applied statistical learning experience rather than a theoretical statistics course, it is imperative that students test their comprehension and skills through analysis of primary and secondary data sets. Many statistical packages are available for these purposes (e.g., SAS, SPSS, Minitab), but we will use Stata in class examples because this program is used in most of the required textbook problems, and it is very easy to learn, powerful, and flexible.

Multiple copies of Stata 11 will be available in the student computer lab and the graduate student lounge.

Course Website

All students officially registered for the course or auditing with my permission can access the course’s Blackboard website at www.miami.edu/blackboard. Your Blackboard User Name is your UM Email Alias (you can find your UM Email Alias on MyUM at www.miami.edu/myum) and your initial password is your birth date. The Blackboard site includes a course syllabus, important announcements, assignments, PowerPoint slides, and any other relevant course information.

Copies of PowerPoint slides will not be distributed in class (please download these prior to class).

Academic Integrity

Academic dishonesty in any form is not tolerated. This policy is required to encourage consistent ethical behavior among students and to foster a climate of fair competition. Personal integrity is a quality that is expected and respected at the University of Miami. Consequently, the Student Honor Code is in force at all times. Students are responsible for reading, understanding, and upholding the Honor Code, which is available from the Office of the Dean of Students.

All assignments submitted in this class must be original work and cannot be submitted to more than one class. Violations of this policy may affect your grade in the class and your student status.

Learning Objectives

Students who successfully complete this course will possess the following skills.

1. Students will understand when it is appropriate to use descriptive, correlational, and/or multivariate statistical analysis.

2. When multivariate statistical analysis is appropriate, students will be familiar with various techniques for analyzing continuous, count, categorical, skewed, and dichotomous dependent variables.

3. Students will know when it is appropriate and how to test for various statistical problems in multivariate analyses such as multicollinearity, heteroskedasticity, clustering, endogeneity, omitted variables bias, and sample selection bias.

4. Since various measures in sociological, economic, medical, and epidemiological data sets are not normally distributed, students will be familiar with techniques that can be used to minimize the bias that may result from standard regression models, such as data transformations, two-part models, bivariate probit, and instrumental variables.

5. Students will become knowledgeable with Stata so that they can complete homework assignments, a group project, and job-specific data analysis tasks that they may encounter after this course.

Grading

The course grade will be based on a short (group) empirical paper (25%), three problem sets (45%), and a final exam (30%). The short empirical paper and problem sets will be completed outside of class. Students can consult textbooks and other reference materials, but no assistance is permitted from classmates, instructors, former students, or colleagues. All students will attest to this condition by signing a written pledge on their assignments. The final exam will be taken during the last day of class.

The empirical paper (no fewer than 8 and no more than 10 double-spaced pages) will test a compelling hypothesis or research question(s) in sociology, economics, epidemiology, public health, etc. (see Chapter 19 in Wooldridge (2009)). Students will complete this assignment in groups of 3-4 people. Seventy-five percent of the group project grade will be determined by the instructor and assigned equally to all group members. The other group members will anonymously assess and record the remaining 25% of each student’s grade. The data for this group project can be obtained from a variety of existing data sets or from some data that may be available from your faculty mentor. All supporting statistical output must be included in an appendix to the paper. Each group will deliver a 15-minute presentation of their paper during class time.

The final exam will be comprehensive and consist of problems similar to the ones included in the problem sets. Every student must take the final exam during the assigned day and time.
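As a rough illustration of how the weights above combine, the following Python sketch computes a course grade from its components. It assumes all scores are on a 0-100 scale and that the three problem sets count equally (15% each); neither assumption is stated in this syllabus, and the function names are hypothetical.

```python
def paper_grade(instructor_score, peer_score):
    """Group paper grade: 75% from the instructor, 25% from peer assessment."""
    return 0.75 * instructor_score + 0.25 * peer_score


def course_grade(instructor_score, peer_score, problem_set_scores, final_exam_score):
    """Weighted course grade: paper 25%, problem sets 45%, final exam 30%.

    Assumption (not stated in the syllabus): scores are on a 0-100 scale
    and the three problem sets are weighted equally.
    """
    if len(problem_set_scores) != 3:
        raise ValueError("expected exactly three problem set scores")
    problem_set_average = sum(problem_set_scores) / 3
    paper = paper_grade(instructor_score, peer_score)
    return 0.25 * paper + 0.45 * problem_set_average + 0.30 * final_exam_score


# Example: instructor gives the paper 80, peers give 100,
# problem sets score 90 each, and the final exam score is 70.
example = round(course_grade(80, 100, [90, 90, 90], 70), 2)
print(example)
```

In this example the paper component is 85 (60 from the instructor share plus 25 from the peer share), so the course grade works out to about 82.75.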

Group Project

The objective of the group project is to provide you with some experience in applying the concepts and methods of research to a real problem. The purpose is to expose you to the real-world research environment, where skills such as dividing workload, accepting responsibilities, coordinating individual efforts, communicating effectively, resolving conflicts, and presenting work in writing and orally are immensely valuable. In addition, the group project will provide you with an opportunity to integrate and apply the material learned in this and other classes in the development of a research strategy. The specific requirements of the group project will be announced at the beginning of the semester and posted on the Blackboard website.

You will work on this assignment in groups of 3-4 people. Like group projects you will encounter later in your professional career, group projects in school can sometimes be a frustrating experience. It is often difficult to pick convenient times for everyone to meet. Group members sometimes feel that the division of labor is not equitable. As frustrating as this may be, it is important for each of you to learn how to manage group work. You will be placed into groups by the third week of classes. You will elect one person in the group as a contact person. Make sure that you exchange email addresses and phone numbers with your group members so that you can contact them and they can contact you outside class.

All problems arising within the group related to relative contributions of the group members are to be handled internally by the group. This is an essential part of the group project experience. You will, however, have an opportunity to evaluate your group members at the end of the semester based on the quality and quantity of their contributions. Your grade can go up or down based on these evaluations. Do not allow others to do most of the project. Be involved from the beginning of the project until the end.

This project must include the following statement signed by all group members: I have actively participated in the preparation of this assignment and I attest to its integrity.

Course Materials and Required Readings

The required textbooks for the course are:

Wooldridge, Jeffrey M. 2009. Introductory Econometrics: A Modern Approach. (Fourth Edition). Mason, OH: South-Western Cengage Learning (ISBN-10: 0-324-66054-5).

Acock, Alan C. 2010. A Gentle Introduction to Stata. (Third Edition). College Station, TX: Stata Press (ISBN-10: 1597180750).

Hamilton, Lawrence C. 2009. Statistics with Stata (Updated for Version 10). Brooks/Cole Publishers (ISBN: 0-495-557862).

The optional textbooks for the course are:

Ashenfelter, O., P.B. Levine, and D.J. Zimmerman. 2003. Statistics and Econometrics: Methods and Applications. Hoboken, NJ: John Wiley and Sons.

Greene, W.H. 2008. Econometric Analysis. (Sixth Edition). New Jersey: Pearson Prentice-Hall.

Montgomery, D.C., E.A. Peck, and G.G. Vining. 2006. Introduction to Linear Regression Analysis. (Fourth Edition). Wiley Series in Probability and Statistics. New York, NY: John Wiley and Sons.

Long, J.S. 1997. Regression Models for Categorical and Limited Dependent Variables (Advanced Quantitative Techniques in the Social Sciences). Sage Publications (ISBN: 0803973748).

Kennedy, P. 2008. A Guide to Econometrics. (Sixth Edition). Malden, MA: Blackwell Publishing.

Whenever necessary and helpful, class handouts, research manuscripts, and recent journal articles will be provided throughout the semester to further explain and demonstrate various statistical methods.

Class Format

Your presence in class is essential to your ability to understand and apply the material covered in this course. Students are expected to be punctual and attend every class. Treat this class as you would any other professional obligation. By accepting a job you are making an implicit commitment to attend work regularly. By registering for this class you make a similar commitment. Missing classes will have a negative effect on your class grade.

This class requires collaborating with others to complete group assignments. Missing class or arriving late/leaving early only frustrates your classmates and hinders the learning process.

Absence from class does not exempt you from being responsible for all the material covered in class and being aware of any announcements made in class. If you miss class it is your responsibility to obtain the lecture or discussion notes and handouts, if any, from your classmates. The class format will be designed to stimulate participation from all students. I encourage questions, comments, and debate. The quality of the class is a direct function of your preparation and discussion. All pagers, cell phones, and other electronic devices must be turned off during class time.


COURSE SCHEDULE

I. Overview

A. Empirical analysis
   a. Theoretical model
   b. Statistical model

B. Types of data
   a. Cross-sectional data
   b. Time-series data
   c. Pooled cross-sectional data
   d. Panel data

C. Types of variables (measures)
   a. Continuous
   b. Binary (dichotomous or dummy)
   c. Categorical (ordered and unordered)
   d. Count
   e. Censored

D. Hypothesis testing

E. Types of error

F. Interpretation and causality

II. Simple Linear Regression

A. Regression analysis

B. Linear regression model

C. Ordinary Least Squares (OLS) estimation

D. Interpretation of regression coefficients

E. Inference for regression coefficients

F. Inference for predicted values

G. Confidence intervals

H. Hypothesis testing

I. Goodness of fit (R²)

J. Nonlinearities in simple linear regression

III. Multiple Regression

A. Multiple regression model

B. Interpretation of regression coefficients

C. OLS estimation

D. Inference
   a. Confidence intervals
   b. Hypothesis testing

E. Testing multiple linear restrictions

F. Adjusted R²

G. Dummy explanatory variables

H. Interaction variables

I. Joint hypothesis tests

J. F-test of all variables

PROBLEM SET 1 (Tentative due date: February 23)

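Readings for Sections I-III:

I. Overview: Chapter 1 (Wooldridge); Chapters 5-7 (Acock)

II. Simple Linear Regression: Chapter 2 (Wooldridge); Chapter 6 (Hamilton); Chapter 8 (Acock)

III. Multiple Regression: Chapters 3-7 (Wooldridge); Chapter 6 (Hamilton); Chapter 10 (Acock)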

IV. Regression Diagnostics

A. Including an irrelevant variable

B. Omitted variable bias

C. Nonlinearities

D. Outliers

E. Residual-versus-fitted plots

F. Multicollinearity

G. Measurement error

H. Instrumental variables

I. Heteroskedasticity

J. Serial correlation (autocorrelation)

V. Limited Dependent Variable Models

A. Linear probability model

B. Logit model

C. Probit model

D. Censored and truncated regression models
   a. Tobit

E. Count data models
   a. Poisson regression
   b. Overdispersion
   c. Zero inflation
   d. Negative binomial technique

F. Sample selection corrections

PROBLEM SET 2 (Tentative due date: March 23)

VI. Panel Data Methods

A. Pooling independent cross sections across time

B. Difference-in-difference models

C. Unobservable heterogeneity

D. Fixed effects models

E. Random effects models

VII. Simultaneous Equations Models

A. Structural and reduced-form equations

B. Simultaneous equations bias
   a. Specification test
   b. Structural endogeneity
   c. Statistical endogeneity

C. Identification

D. Estimation
   a. Indirect least squares
   b. Instrumental variables (IV) estimation
   c. Two-stage least-squares (2SLS) estimation
   d. Bivariate probit
   e. Two-stage residual inclusion (2SRI) estimation

E. Testing for endogeneity

F. Testing overidentification restrictions

Readings for Sections IV-VII:

IV. Regression Diagnostics: Chapters 8-9 (Wooldridge); Chapter 7 (Hamilton)

V. Limited Dependent Variable Models: Chapter 17 (Wooldridge); Chapters 10-11 (Hamilton); Chapter 11 (Acock)

VI. Panel Data Methods: Chapters 13-14 (Wooldridge)

VII. Simultaneous Equations Models: Chapters 15-16 (Wooldridge)

PROBLEM SET 3 (Tentative due date: April 13)

VIII. Miscellaneous Topics

A. Categorical dependent variables
   a. Ordered logit or probit
   b. Multinomial logit or probit

B. General Linear Model (GLM) estimation

C. Imputation for missing data

D. Outliers

E. Bootstrapping

F. Propensity Score Matching (PSM)

G. Treatment Regression (treatreg) technique

H. Nonparametric regression
   a. Local linear regression (kernel regression)
   b. Partially linear or semi-linear regression

I. Robust regression

J. Median regression

K. Generalized Method of Moments (GMM) estimation

L. Quantile regression

GROUP PROJECT (Tentative due date: April 27)

FINAL EXAM (Scheduled date: May 4)

Readings for Section VIII: Handouts; Chapters 10 & 14 (Hamilton)