Agenda
Soc 5811 Lab #13
12.05.05

I. Welcome
   1. Review last lab.
   2. Lab handouts, datasets, and other information can be found at: http://www.tc.umn.edu/~long0324/

II. Objectives
   1. More multiple regression analysis.
   2. More dummy variables and interaction terms.
   3. Short lab assignment.
   4. Final projects.
   5. Next week we will cover more advanced diagnostics for multiple regression.

III. More on dummy variables
   1. Recall that dummy variables allow you to compare groups within the regression equation. Coefficients for dummy variables are not slopes, but rather differences in constants. Dummy variables can be constructed from any nominal category. Remember to always exclude one dummy category from your equation, or the model will “blow up” (the full set of dummies is perfectly collinear with the constant).
   2. When first learning how to construct dummy variables, it is best to construct a separate variable for every category or group of categories (i.e., DWHITE, DBLACK, and DOTHER). Once you get into the swing of things, you can sometimes create only a single dummy for the group you are most interested in.
      a. What is a reference group? What things should we keep in mind when selecting a reference group?
   3. Construct a dummy variable for female (using sex). Regress occupational prestige (prestg80) on education (educ), income (rincom98), and the female dummy variable. What did you find? Compare your results with regressions for males and females separately (i.e., select each group and run a regression for each without the dummy variable).
      a. How do we interpret the dummy variable?
      b. Do the separate regressions yield different slopes?

IV. Multiple regression with interaction terms
   1. Interaction effects show the difference in slopes between two variables for each category of a third variable. Interaction terms are constructed by multiplying the two variables you are interested in. Today we will talk about interaction terms between a dummy variable and an interval variable, although interaction terms can also be created using two interval variables.
   2.
      Create an interaction term using the female dummy variable from above and education (educ). Hint: use the Compute command to multiply the two variables.
   3. Conduct a multiple regression with the interaction term and its components, using income (rincom98) as your dependent variable. What will the interaction term tell us? What did you find?
   4. Create dummy variables for Protestant, Catholic, and Jewish (using relig). The omitted category is all other religious preferences. Construct three interaction terms, one between age (age) and each of the dummy variables.
   5. First, conduct a regression using religious attendance (attend) as your dependent variable and using age and each of the dummy variables as your independent variables. Also, include a dummy variable for female. What did you find?
   6. Next, conduct a regression including your three interaction terms. You should usually include each of the component variables in the model as well. What did you find? How do we interpret the interaction terms?
   7. Write the complete regression equation, then predict the religious attendance for a 47-year-old Catholic male. Write a third equation predicting the religious attendance of a 27-year-old Hindu female.
   8. When using interaction terms with two continuous variables, the coefficient is interpreted as the effect of a one-unit change in one variable in the interaction term on the slope of the line relating the other variable in the interaction term to the dependent variable. See lecture slides for an example.

V. Short assignment on multiple regression
   1. This assignment is primarily meant to give you feedback before you wrap up your final paper. In order to get feedback sooner, you must turn in your assignment via e-mail or in my box by Wednesday, 5:00 pm. Assignments turned in after that will not be penalized, but you might not get feedback in time for your final project.
   2.
      Use the 2002 GSS to create a set with the following independent variables (see lab from last week for instructions on creating sets):
         sex rincom98 relig racecen1 marital health tvhours age hrs1 educ
      And: a dependent variable of your choosing. It must have at least four ordinal categories. Feel free to use one of the variables above.
   3. Using your dependent variable, use the variable list above to construct a multiple regression model and interpret your findings. You must satisfy each of the following requirements:
      1. You must have at least four independent variables.
      2. One variable should be an interaction term.
      3. One variable should be a dummy variable.
      4. You must test and show the bivariate assumptions for at least one independent variable (i.e., conditional normality, homoskedasticity). BONUS: Show the assumptions for multiple regression for at least one independent variable.
      5. In your final write-up, you must state the hypotheses for each of your independent variables, interpret the R-square, and determine whether your hypotheses were supported. Interpret the slope for at least one of your independent variables. You must also provide a substantive interpretation of both your interaction term and your dummy variable.
   4. Keep in mind that building models is more than just plugging a bunch of independent variables into the regression. Sometimes variables that do not have any significant effects, do not affect other variables, and only increase the R-square by a fraction can be omitted from the model. In your write-up, provide a few sentences stating how you constructed the model, whether you omitted any variables because of insignificant effects, and what possible biases you think may affect your results. These are important things to think about for any project.
   5.
      Checklist for the assignment:
         __ Dependent variable
         __ Four independent variables
            __ Dummy
            __ Interaction
         __ Bivariate assumptions tested for one independent variable
         __ Interpretations of results
         __ Brief discussion of model building and bias
         __ Output (only the regression output and the scatterplots/histograms for assumptions are necessary)
         __ Syntax
   6. Keep in mind that this assignment is more for getting feedback than for getting things “right.” See me if you have questions.

GENERAL SPSS INSTRUCTIONS

I. Scatterplots
   1. Click on Graphs, Scatter.
   2. Choose a Simple, Matrix, Overlay, or 3-D scatterplot. For today, we will only be looking at simple scatterplots.
   3. Place your independent variable into the x-axis box and your dependent variable into the y-axis box.
   4. If your cases have labels (such as country names), put the label variable into the Label cases by box.
   5. To add a title to your scatterplot, click on Title.
   6. Double-click on the scatterplot in the Output window to open the Chart Editor.

II. Correlation
   1. Click on Analyze, Correlate, Bivariate.
   2. Place the variables into the box.
   3. Check the Pearson correlation coefficient box.
   4. Paste and Run.

III. Multiple Regression
   1. Click on Analyze, Regression, Linear.
   2. Place the dependent and independent variables into the appropriate boxes.
   3. As a default, SPSS provides the model summary statistics, ANOVA statistics, and coefficients. For information on additional options, consult the Norusis text, pp. 451-461. We will explore some of these options later in the semester.
   4. To test for conditional normality, select a value, or range of values, for x, and check the resulting histogram for the y variable. The assumption is met if the distribution of the y variable appears to be normal.
   5. To check for homoskedasticity, check the bivariate scatterplot of the x and y variables.
   6. For checking the assumptions for multiple regression, save the unstandardized and standardized values in the Save window.

IV.
Computing new variables
   1. Click on Transform, Compute.
   2. Type the name of the new variable being created in the Target Variable box.
   3. Drag variables from the left into the computation window, and use mathematical symbols or the embedded functions to construct the equation for the new variable.
   4. If the computation only applies to certain cases, use the If… option to set up selection criteria.
   5. Paste, and Run.
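
For anyone who wants to see the arithmetic behind the Compute step outside SPSS, here is a minimal Python sketch of building a female dummy and a female-by-education interaction term. It is only an illustration, not the SPSS procedure itself. The function names and the example cases are hypothetical, and it assumes sex is coded 1 = male, 2 = female, so check your codebook before relying on that.

```python
# Illustrative sketch only: mirrors what Transform > Compute does, outside SPSS.
# Assumption: sex is coded 1 = male, 2 = female (verify against your codebook).


def make_female_dummy(sex):
    """Return 1 for female, 0 for male, None if sex is missing or invalid."""
    if sex == 2:
        return 1
    if sex == 1:
        return 0
    return None


def make_interaction(dummy, x):
    """Multiply a dummy by an interval variable; None if either is missing."""
    if dummy is None or x is None:
        return None
    return dummy * x


# Hypothetical cases, not real GSS records: (sex, educ)
cases = [(1, 12), (2, 16), (2, None), (1, 20)]

rows = []
for sex, educ in cases:
    female = make_female_dummy(sex)
    rows.append((sex, educ, female, make_interaction(female, educ)))

# Each row is (sex, educ, female dummy, female x educ interaction)
for row in rows:
    print(row)
```

Notice that the interaction term is zero for every male case, so it can only shift the education slope for females. That is why its coefficient is read as the difference between the female and male slopes for education.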