Agenda
Soc 5811 Lab #10
11.14.05

I. Welcome
  1. Proposals due tomorrow!
  2. Review last lab.
  3. Lab handouts, datasets, and other information can be found at: http://www.tc.umn.edu/~long0324/

II. Objectives
  1. Learn how to calculate correlation coefficients in SPSS.
  2. Learn how to conduct bivariate regression analyses in SPSS.
  3. Learn one way to construct an index for the final paper.
  4. If time remains, we can address remaining questions about your proposals.

III. Review scatterplots.
  1. Last week we looked at the various options for constructing simple scatterplots. SPSS has three other options. Overlay scatterplots are for more than one x-y pair, e.g., you can plot two y-variables against the same x-variable. Matrix scatterplots present multiple bivariate scatterplots in the same diagram, with one row and one column for each variable chosen. In matrix scatterplots, all possible pairs of variables are represented. 3-D scatterplots plot three variables in three dimensions.
  2. Construct an overlay scatterplot with GDP per capita (gdppou00), illiteracy rates (illrat00), and foreign aid per capita (aidpc00). To construct an overlay scatterplot, you must select your variables as x-y pairs. What can we interpret from the scatterplot? When are overlay plots useful?
  3. Construct a matrix scatterplot with the same variables. How do we interpret a matrix scatterplot? What do the rows and columns mean?
  4. Finally, construct a 3-D scatterplot with the same variables. Double-click on the scatterplot in the Output window to open the Chart Editor; here you can rotate the dimensions of the graph.
  5. In addition to adding a regression line, we can also add a Mean of Y Reference Line. Construct a simple scatterplot with illiteracy (illrat00) and infant mortality (morrai00). Add a regression line and a mean of Y reference line. In the scatterplot, what is the explained variance? What is the error variance? How can we use this information to calculate the R²?

IV. Review correlation.
  1. The Pearson correlation coefficient measures linear association. In a bivariate regression model it equals, in absolute value, the R reported in the model summary. When squared, the correlation coefficient becomes the coefficient of determination, R².
  2. We can calculate multiple correlation coefficients simultaneously in SPSS. Calculate correlation coefficients for infant mortality (morrai00), illiteracy (illrat00), and GDP per capita (gdppou00). Interpret the results.
  3. We have not discussed how we can conduct hypothesis tests with correlation coefficients, but it can be done. Any ideas? What would be the null hypothesis?

V. Bivariate regression.
  1. Bivariate regression describes the linear relationship between two variables, including how strongly the two variables are associated and the slope of the line that best fits the data. Before conducting a regression analysis, it is important to look at the scatterplot to see whether a linear relationship exists, and to clean the data (e.g., take care of missing values).
  2. Also, before running a regression analysis it is important to check whether the data meet the assumptions of regression. What are the assumptions of bivariate regression? First, we must be using a random sample with a sufficient N. Second, the relationship between the independent and dependent variable must be linear. The third assumption is conditional normality, or a normal distribution of Y for different values of X. Finally, the variance of Y must be homoskedastic, or equal across values of X.
  3. Construct a scatterplot with illiteracy and infant mortality. Check to make sure the assumptions of regression are met. Is it a random sample? Is there a linear relationship? Is Y normally distributed for every value of X? How can we determine this? Are the variances of the error equal for every value of X? How can we check this?
  4. Conduct a linear regression analysis with the same two variables, and interpret the output.
    a. First, how well are the two variables correlated? What statistic tells us this?
    b. What is the R²? Is the line a good fit for the data? This depends…
    c. The ANOVA table partitions the total variation in Y around Y-bar into a regression (explained) sum of squares and a residual (error) sum of squares. Recall that the R² is calculated by dividing the regression sum of squares by the total sum of squares around Y-bar. Which statistics in the ANOVA output can we use to calculate R²?
    d. The most important statistics in the output are the coefficients. What is the coefficient for illiteracy? How can we interpret this? How do we interpret the constant? How do we interpret the significance levels of each (i.e., how do we conduct hypothesis tests for the slope)? Although we haven’t discussed it in lecture yet, what do you think the standard error for the coefficient is?
  5. For now, it is important that you understand the meaning of R² and the slope and constant estimates. We will discuss what standardized coefficients are later.

VI. One way to construct an index…
  1. It is important that you have interval variables for a regression analysis, although ordinal variables with many categories can also be used. Unfortunately, many surveys do not have ordinal variables with many categories. So, we have to construct an index.
  2. If multiple variables measure the same general phenomenon you are interested in, and they are all measured on the same scale, the variables can simply be added to construct one variable.
  3. Open the 2002 GSS subset, which includes three 3-category variables (conlegis, conjudge, and confed) measuring confidence in different branches of the government. Run frequencies on each to check the scales and missing values. Are they each coded on the same scale? What do you notice about the scale (i.e., what do big values mean)? Recode the variables so that higher values represent more confidence in the government.
  4. To add the three variables into one index, simply construct a new variable in the Compute window. The Compute command allows you to use mathematical functions with variables to construct new ones. Construct a new variable, congov, by adding the three variables together. If each item is scored 1 to 3, the sum runs from 3 to 9.
  5. Check the frequency of your new variable. How many categories does it have? (Three items scored 1 to 3 can produce seven distinct sums, 3 through 9, not nine.)

VII. Any questions about the proposals?
  Hint: See p. 256 for the assumptions of multiple regression (BLUE).

SPSS INSTRUCTIONS

I. Scatterplots
  1. Click on Graphs, Scatter.
  2. Choose a Simple, Matrix, Overlay, or 3-D scatterplot. Simple scatterplots were covered last week; today we also use the other three types.
  3. For a simple scatterplot, place your independent variable into the x-axis box and your dependent variable into the y-axis box. For an overlay scatterplot, select your variables as x-y pairs.
  4. If your cases have labels (such as country names), put the label variable into the Label cases by box.
  5. To add a title to your scatterplot, click on Title.
  6. Double-click on the scatterplot in the Output window to open the Chart Editor.

II. Correlation
  1. Click on Analyze, Correlate, Bivariate.
  2. Place the variables into the box.
  3. Check the Pearson correlation coefficient box.
  4. Paste and Run.

III. Bivariate Regression
  1. Click on Analyze, Regression, Linear.
  2. Place the dependent and independent variables into the appropriate boxes.
  3. As a default, SPSS provides the model summary statistics, ANOVA statistics, and coefficients. For information on additional options, consult the Norusis text, pp. 451-461. We will explore some of these options later in the semester.
  4. To test for conditional normality, select a value, or range of values, for x, and check the resulting histogram for the y variable. The assumption is met if the distribution of the y variable appears to be normal.
  5. To check for homoskedasticity, check the bivariate scatterplot of the x and y variables. The vertical spread of the points should be roughly the same across values of x.

IV. Computing new variables
  1. Click on Transform, Compute.
  2. Type the name of the new variable being created in the Target Variable box.
  3. Drag variables from the left into the computation window, and use mathematical symbols or the embedded functions to construct the equation for the new variable.
  4. If the computation only applies to certain cases, use the If… option to set up selection criteria.
  5. Paste and Run.
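
For anyone who wants to check the index logic outside SPSS, the recode-and-add steps can be sketched in a few lines of Python. This is only an illustration, not the SPSS procedure: the four respondents are made up (not GSS data), the helper names reverse_code and additive_index are hypothetical, and the sketch assumes each item is coded 1 to 3 with 1 meaning the most confidence. A case is treated as missing on the index if any item is missing.

```python
# Hypothetical responses to three 3-category items. ASSUMPTION: 1 = most
# confidence, 3 = least; None marks a missing answer. Not real GSS data.
respondents = [
    {"conlegis": 1, "conjudge": 2, "confed": 3},
    {"conlegis": 3, "conjudge": 3, "confed": 3},
    {"conlegis": 1, "conjudge": 1, "confed": 1},
    {"conlegis": 2, "conjudge": None, "confed": 2},
]
ITEMS = ("conlegis", "conjudge", "confed")


def reverse_code(value, low=1, high=3):
    """Flip a score so higher means more confidence (1 <-> 3, 2 stays 2)."""
    return None if value is None else low + high - value


def additive_index(row, items=ITEMS):
    """Add the reverse-coded items; the index is missing if any item is."""
    scores = [reverse_code(row[name]) for name in items]
    if any(score is None for score in scores):
        return None
    return sum(scores)


# One index value per respondent (the new variable, congov).
congov = [additive_index(row) for row in respondents]

# Every sum that three items scored 1-3 can produce.
possible_sums = sorted({a + b + c
                        for a in (1, 2, 3)
                        for b in (1, 2, 3)
                        for c in (1, 2, 3)})

print(congov)
print(possible_sums, len(possible_sums))
```

The list of possible sums shows why the frequency table for the index has at most seven rows: the values run from 3 to 9.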
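
The link between the ANOVA table and R² from the bivariate regression section can also be verified by hand. The sketch below uses five made-up (x, y) pairs, not the lab dataset, and plain Python rather than SPSS; all variable names are illustrative. It computes the slope and constant by least squares, splits the total sum of squares around Y-bar into regression and residual parts, and confirms that the regression share equals the squared Pearson correlation.

```python
from math import sqrt

# Made-up data for illustration only; not the lab dataset.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products around the means.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
ss_total = sum((yi - y_bar) ** 2 for yi in y)  # total SS around Y-bar

# Least-squares slope and constant.
slope = sxy / sxx
constant = y_bar - slope * x_bar

# Predicted values, then the two pieces of the ANOVA partition.
y_hat = [constant + slope * xi for xi in x]
ss_regression = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained
ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # error

# Pearson correlation and R-squared.
r = sxy / sqrt(sxx * ss_total)
r_squared = ss_regression / ss_total

print(slope, constant)
print(ss_regression, ss_residual, ss_total)
print(r, r_squared)
```

For these five points the slope is 0.6, the constant is 2.2, and R² is 0.6; the regression and residual sums of squares (3.6 and 2.4) add up to the total of 6, up to floating-point rounding.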