5811 Lab 13

Agenda
Soc 5811 Lab #13
12.05.05
I. Welcome
1. Review last lab.
2. Lab handouts, datasets, and other information can be found at:
http://www.tc.umn.edu/~long0324/
II. Objectives
1. More multiple regression analysis.
2. More dummy variables and interaction terms.
3. Short lab assignment.
4. Final projects.
5. Next week we will cover more advanced diagnostics for multiple regression.
III. More on dummy variables
1. Recall that dummy variables allow you to compare groups within the
regression equation. Coefficients for dummy variables are not slopes, but rather
differences in constants between a group and the reference group. Dummy
variables can be constructed from any nominal category. Remember to always
exclude one dummy category from your equation; otherwise the dummies are
perfectly collinear with the constant and the model will “blow up.”
2. When first learning how to construct dummy variables, it is best to construct a
separate variable for every category or group of categories (e.g., DWHITE,
DBLACK, and DOTHER). Once you get the hang of it, you can
sometimes make only a single dummy for the group you are most interested in.
a. What is a reference group? What things should we keep in mind when
selecting a reference group?
3. Construct a dummy variable for female (using sex). Regress occupational
prestige (prestg80) on education (educ), income (rincom98), and the female
dummy variable. What did you find? Compare your results with regressions for
males and females separately (i.e., select each group and run a regression for each
without the dummy variable).
a. How do we interpret the dummy variable?
b. Do the separate regressions yield different slopes?
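The points in this section can also be illustrated outside SPSS. The following is a minimal Python sketch, not part of the lab's SPSS workflow. It uses toy data invented for illustration (not GSS values) and assumes sex is coded 1 = male, 2 = female; check the codebook before relying on that. The helper names make_dummy and simple_ols are hypothetical.

```python
from statistics import mean

# Toy data invented for illustration; these are NOT GSS values.
sex = [1, 2, 1, 2, 2, 1]             # assumed coding: 1 = male, 2 = female
prestige = [40, 50, 44, 46, 48, 42]  # hypothetical prestige scores
race = ["white", "black", "other", "white", "black", "white"]


def make_dummy(values, category):
    """Return 1 for cases in the category and 0 for all other cases."""
    return [1 if v == category else 0 for v in values]


def simple_ols(x, y):
    """Return (constant, coefficient) for y = a + b*x by least squares."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


# Female dummy; males are the omitted (reference) category.
female = make_dummy(sex, 2)
constant, b_female = simple_ols(female, prestige)

male_mean = mean(p for p, f in zip(prestige, female) if f == 0)
female_mean = mean(p for p, f in zip(prestige, female) if f == 1)

# One dummy per category: DWHITE + DBLACK + DOTHER equals 1 for every case,
# which duplicates the constant. That is why one category must be left out.
dwhite = make_dummy(race, "white")
dblack = make_dummy(race, "black")
dother = make_dummy(race, "other")
dummies_sum_to_one = all(w + b + o == 1 for w, b, o in zip(dwhite, dblack, dother))

print("constant:", constant, "reference (male) mean:", male_mean)
print("female coefficient:", b_female, "difference in means:", female_mean - male_mean)
print("dummies sum to 1 for every case:", dummies_sum_to_one)
```

With the dummy as the only predictor, the constant equals the mean for the reference group and the dummy coefficient equals the difference between the two group means. When other predictors such as educ are added, the coefficient becomes the difference in constants holding those predictors fixed.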
IV. Multiple regression with interaction terms
1. Interaction effects show how the slope between two variables differs across
the categories of a third variable. Interaction terms are constructed by multiplying the
two variables you are interested in. Today we will talk about interaction terms
between a dummy variable and an interval variable, although interaction terms can also be
created using two interval variables.
2. Create an interaction term using the female dummy variable from above and
education (educ). Hint: use the Compute command to multiply the two variables.
3. Conduct a multiple regression with the interaction term and its components,
using income (rincom98) as your dependent variable. What will the interaction
term tell us? What did you find?
4. Create dummy variables for Protestant, Catholic, and Jewish (using relig). The
omitted category is all other religious preferences. Construct three interaction
terms with age (age) and each of the dummy variables.
5. First, conduct a regression using religious attendance (attend) as your
dependent variable and using age and each of the dummy variables as your
independent variables. Also, include a dummy variable for female. What did you
find?
6. Next, conduct a regression including your three interaction terms. You should
usually include each of the component variables in the model, as well. What did
you find? How do we interpret the interaction terms?
7. Write the complete regression equation, then write a second equation predicting
the religious attendance of a 47-year-old Catholic male. Write a third equation
predicting the religious attendance of a 27-year-old Hindu female.
8. When an interaction term is built from two continuous variables, its coefficient
is interpreted as the change in the slope of one component variable on the
dependent variable for each one-unit increase in the other component
variable. See lecture slides for an example.
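To make the mechanics of prediction concrete, here is a minimal Python sketch of plugging a case into an equation with dummy and interaction terms. Every coefficient below is hypothetical and chosen only for illustration; none is an estimate from the GSS, so substitute the values from your own output. The names COEF, predict_attend, and age_slope are likewise invented for this sketch.

```python
# Hypothetical coefficients for illustration only; NOT estimates from the GSS.
COEF = {
    "const": 2.0,
    "age": 0.02,
    "female": 0.5,
    "prot": 1.0,
    "cath": 0.8,
    "jew": -0.2,
    "prot_age": 0.01,
    "cath_age": 0.005,
    "jew_age": 0.0,
}

# Maps each religion that has a dummy to the name of its interaction term.
INTERACTION = {"protestant": "prot_age", "catholic": "cath_age", "jewish": "jew_age"}


def predict_attend(age, female, relig):
    """Plug one case into the full equation, including interaction terms."""
    prot = 1 if relig == "protestant" else 0
    cath = 1 if relig == "catholic" else 0
    jew = 1 if relig == "jewish" else 0
    case = {
        "const": 1, "age": age, "female": female,
        "prot": prot, "cath": cath, "jew": jew,
        # Each interaction term is the product of a dummy and age.
        "prot_age": prot * age, "cath_age": cath * age, "jew_age": jew * age,
    }
    return sum(COEF[name] * case[name] for name in COEF)


def age_slope(relig):
    """Age slope for a group: the age coefficient plus the group's interaction."""
    if relig in INTERACTION:
        return COEF["age"] + COEF[INTERACTION[relig]]
    return COEF["age"]  # omitted category: all dummies and interactions are 0


catholic_male_47 = predict_attend(47, 0, "catholic")
hindu_female_27 = predict_attend(27, 1, "hindu")

print("47-year-old Catholic male:", round(catholic_male_47, 3))
print("27-year-old Hindu female:", round(hindu_female_27, 3))
print("age slope, Catholic:", round(age_slope("catholic"), 3))
print("age slope, omitted category:", round(age_slope("hindu"), 3))
```

Note that for a case in the omitted category every religion dummy and every interaction term equals zero, so only the constant, age, and female terms contribute to the prediction.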
V. Short assignment on multiple regression
1. This assignment is primarily meant to give you feedback before you wrap up
your final paper. In order to get feedback sooner, you must turn in your
assignment via e-mail or in my box by Wednesday, 5:00 pm. Assignments
turned in after that will not be penalized, but you might not get feedback in time
for your final project.
2. Use the 2002 GSS to create a set with the following independent variables (see
lab from last week for instructions on creating sets):
sex
rincom98
relig
racecen1
marital
health
tvhours
age
hrs1
educ
And: a dependent variable of your choosing. It must have at least four
ordinal categories. Feel free to use one of the variables above.
3. Using your dependent variable, use the variable list above to construct a
multiple regression model and interpret your findings. You must satisfy each of
the following requirements:
1. You must have at least four independent variables.
2. One variable should be an interaction term.
3. One variable should be a dummy variable.
4. You must test and show the bivariate assumptions for at least one
independent variable (i.e., conditional normality, homoskedasticity).
BONUS: Show the assumptions for multiple regression for at least one
independent variable.
5. In your final write-up, you must state the hypotheses for each of your
independent variables, interpret the R-square, and determine if your
hypotheses were supported. Interpret the slope for at least one of your
independent variables. You must also provide a substantive interpretation
of both your interaction term and your dummy variable.
4. Keep in mind that building models is more than just plugging a bunch of
independent variables into the regression. Variables that have no significant
effect, do not change the effects of other variables, and increase the R-square
by only a fraction can sometimes be omitted from the model. In your write-up,
provide a few sentences stating how you constructed the model, whether you
omitted any variables because of insignificant effects, and what possible biases
may affect your results. These are important things to think about for any project.
5. Checklist for the assignment:
__ Dependent variable
__ Four independent variables
__ Dummy
__ Interaction
__ Bivariate assumptions tested for one independent variable
__ Interpretations of results
__ Brief discussion of model building and bias
__ Output (only the regression output and the scatterplots/histograms for
assumptions are necessary)
__ Syntax
6. Keep in mind that this assignment is more for getting feedback than for getting
things “right.” See me if you have questions.
GENERAL SPSS INSTRUCTIONS
I. Scatterplots
1. Click on Graphs, Scatter.
2. Choose a Simple, Matrix, Overlay, or 3-D scatterplot. For today, we will
only be looking at simple scatterplots.
3. Place your independent variable into the x-axis box and your dependent
variable into the y-axis box.
4. If your cases have labels (such as country names), put the label variable into
the Label cases by box.
5. To add a title to your scatterplot, click on Title.
6. Double-click on the scatterplot in the Output window to open the Chart
Editor.
II. Correlation
1. Click on Analyze, Correlate, Bivariate.
2. Place the variables into the box.
3. Check the Pearson correlation coefficient box.
4. Paste and Run.
III. Multiple Regression
1. Click on Analyze, Regression, Linear.
2. Place the dependent and independent variables into the appropriate boxes.
3. As a default, SPSS provides the model summary statistics, ANOVA statistics,
and coefficients. For information on additional options, consult the Norusis text,
pp. 451-461. We will explore some of these options later in the semester.
4. To test for conditional normality, select a value, or range of values, for x, and
check the resulting histogram for the y variable. The assumption is met if the
distribution of the y variable appears to be normal.
5. To check for homoskedasticity, check the bivariate scatterplot of the x and y
variables.
6. For checking the assumptions for multiple regression, save the unstandardized
and standardized predicted values and residuals in the Save window.
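The logic behind steps 4 and 5 can be sketched in a few lines of Python. This is a minimal illustration with toy (x, y) pairs invented for the purpose, not GSS data, and it is a rough numerical stand-in for the visual checks you will do in SPSS. The function names y_given_x and bin_counts are hypothetical.

```python
from collections import Counter
from statistics import pstdev

# Toy (x, y) pairs invented for illustration; NOT GSS values.
data = [
    (12, 30), (12, 35), (12, 40), (12, 35), (12, 35),
    (16, 45), (16, 50), (16, 55), (16, 50), (16, 50),
]


def y_given_x(pairs, lo, hi):
    """Return the y values for cases whose x falls between lo and hi, inclusive."""
    return [y for x, y in pairs if lo <= x <= hi]


def bin_counts(values, width):
    """Count values in bins of the given width, keyed by each bin's lower edge."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


# Conditional normality: look at the distribution of y at one value of x.
ys_at_12 = y_given_x(data, 12, 12)
hist = bin_counts(ys_at_12, 5)
for lower, n in hist.items():
    print(f"{lower}-{lower + 4}: {'*' * n}")

# Homoskedasticity: the spread of y should be similar at different values of x.
spread_at_12 = pstdev(y_given_x(data, 12, 12))
spread_at_16 = pstdev(y_given_x(data, 16, 16))
print("spread of y at x = 12:", round(spread_at_12, 2))
print("spread of y at x = 16:", round(spread_at_16, 2))
```

In real data the selected slice should contain enough cases for the histogram to be readable, which is why selecting a range of x values is often more practical than selecting a single value.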
IV. Computing new variables
1. Click on Transform, Compute.
2. Type the name of the new variable being created in the Target Variable box.
3. Drag variables from the left into the computation window, and use
mathematical symbols or the embedded functions to construct the equation for
the new variable.
4. If the computation only applies to certain cases, use the If… option to set up
selection criteria.
5. Paste and Run.
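The steps above amount to a row-by-row calculation, which the following minimal Python sketch mirrors. The cases are toy values invented for illustration, None stands in for a missing value, and the new variable names female and fem_educ are hypothetical. The sketch assumes sex is coded 1 = male, 2 = female; check the codebook.

```python
# Toy cases invented for illustration; None stands in for a missing value.
cases = [
    {"sex": 1, "educ": 12},
    {"sex": 2, "educ": 16},
    {"sex": 2, "educ": None},
    {"sex": 1, "educ": 14},
]

for case in cases:
    # Target variable 1: a dummy for female.
    # Assumed coding: sex 1 = male, 2 = female (check the codebook).
    case["female"] = 1 if case["sex"] == 2 else 0

    # Target variable 2: the interaction term, female multiplied by educ.
    # The condition plays the role of the If... option: the product is
    # computed only for cases where educ is not missing.
    if case["educ"] is not None:
        case["fem_educ"] = case["female"] * case["educ"]
    else:
        case["fem_educ"] = None

fem_educ = [case["fem_educ"] for case in cases]
print(fem_educ)
```

The interaction term equals educ for women and 0 for men, which is what lets its coefficient capture the difference in the education slope between the two groups.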