Review for Final Examination
COMM 550X, May 12, 11 am-1 pm

Final Examination Practice for the Mid-Term

• Multiple choice portion of the test: There will be 50 multiple choice questions chosen at random from this pool of possible test questions. Each item will be worth 1 point.
• SPSS data analysis: You will be tested in SPSS on bivariate correlation, multiple regression, and MANOVA/discriminant analysis. The questions will use the data sets statelevel.sav and NationsoftheWorldModified.sav. The questions will have point values as follows: bivariate correlation, 10 points; multiple regression, 18 points; MANOVA/discriminant analysis, 22 points.

Sample Test Question for Bivariate Correlation (8 points)

Using the NationsoftheWorldModified.sav data set, test the hypothesis that there is a significant positive association between a country’s civil liberties score and the annual number of peaceful political demonstrations in that country. Set your significance level (alpha) at .05. Report the obtained value of the test statistic, the N, the df and probability level, and whether or not you can reject the null hypothesis of no association between the two variables.

Testing the Hypothesis

You have been asked to see if there is a significant association between two variables. For tests where both variables are interval level or better and no causal relationship between the two is implied, the appropriate test statistic to compute is the bivariate correlation.
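The significance test that goes with Pearson’s r can be sketched as a t test with N - 2 degrees of freedom. The values r = .077 and N = 112 used below are the ones reported in the SPSS output for this sample question. The function names and the Simpson’s-rule integration are illustrative assumptions, not SPSS’s own routine.

```python
import math

def t_from_r(r, n):
    """t statistic for testing H0: rho = 0; df = n - 2."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r), df

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2.0 * math.log(1.0 + x * x / df))

def upper_tail_p(t, df, steps=2000):
    """One-tailed p = P(T >= t), using Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3.0          # P(0 <= T <= |t|)
    p = 0.5 - area
    return p if t >= 0 else 1.0 - p

# Values from the SPSS output in this review (rounded to three decimals).
t_stat, df = t_from_r(0.077, 112)
p_one_tailed = upper_tail_p(t_stat, df)

print(f"t({df}) = {t_stat:.3f}, one-tailed p = {p_one_tailed:.3f}")
print("reject H0" if p_one_tailed < 0.05 else "cannot reject H0")
```

Because r is rounded to three decimals in the output, the p value from this sketch agrees with the SPSS figure only approximately. The one-tailed p is appropriate here only because the hypothesis predicts a positive sign.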
You are looking for a significant level of Pearson’s r, the correlation coefficient.

• In SPSS Data Editor, open the NationsoftheWorldModified.sav data file.
• Go to Analyze/Correlate/Bivariate and put the two variables, civil liberties score and number of peaceful political demonstrations, into the Variables window.
• Select a one-tailed test (you do this because you have made a prediction about the direction of the relationship, that it will be positive) and flag significant correlations.
• Under Correlation Coefficients select Pearson and click OK.
• Compare your output to the output shown next.

SPSS Output for Bivariate Correlation

You only get a small amount of output for bivariate correlation. Note the correlation coefficient (.077), the sample size (N = 112) and the significance level (.208). DF is equal to N-2 for Pearson’s r, so here df = 110. Before you did the test, you set your significance level to .05, so p (the probability level) needed to be smaller than .05 for you to reject the null hypothesis. But your obtained value of Pearson’s r has a significance level of .208. Consequently, you cannot reject the null hypothesis, and you are not able to confirm your research hypothesis that there is a significant positive association between a country’s civil liberties score and the number of its peaceful political demonstrations.

Correlations

                                                        Civil liberties    Number of peaceful
                                                        score              political demonstrations
Civil liberties score          Pearson Correlation      1                  .077
                               Sig. (1-tailed)          .                  .208
                               N                        112                112
Number of peaceful political   Pearson Correlation      .077               1
demonstrations                 Sig. (1-tailed)          .208               .
                               N                        112                112

Writing up your Result

“Bivariate correlation analysis was performed to test the hypothesis that a country’s civil liberties score was positively associated with its number of peaceful political demonstrations. The obtained value of Pearson’s r was .077 (N = 112, df = 110, p = .208, one-tailed test), which was not significant.
Consequently, we cannot reject the null hypothesis that there is no association between a country’s civil liberties score and its number of peaceful political demonstrations, and our research hypothesis was not confirmed.”

(Note: if the significance level had fallen below .05, then you would have confirmed your research hypothesis only if the sign of the association between the two variables was positive, as predicted, that is, if the obtained correlation coefficient was positive.)

Sample Test Question for Multiple Regression

You are asked to test the hypothesis that a country’s score on the civil liberties index is a function of a linear combination of three variables: (1) percentage of seats in the lower legislative house held by the largest party, (2) percentage of the work force who are women, and (3) percentage of the voting age population who voted in the last election. You believe that these variables are of importance in the order listed above. Further, you expect that the sign of the first predictor, percentage of seats, will be negative, and the signs of the other two predictors will be positive. Test the hypothesis and then write an equation for predicting the score of a new case on the civil liberties index based on the three variables. Set your significance level (alpha) to .05. Report the test statistic, N, df, and obtained probability level, and all other statistics appropriate to determining whether or not you have used the procedure correctly. State whether or not your data support rejecting the null hypothesis that civil liberties is unrelated to the three variables, and confirming your research hypothesis.

Testing the Hypothesis

To test this hypothesis, you need a procedure which looks at the relationship between a single, interval or better level variable on the one hand and multiple interval level or better predictors on the other. This is multiple regression.
Since your theory has given you a reason to order the importance of your predictors ahead of time, you choose a hierarchical regression analysis where you enter the variables into the regression equation in the order of their presumed importance.

SPSS Procedure for Multiple Regression

• Download the NationsoftheWorldModified.sav data file.
• Go to Analyze/Regression/Linear.
• Move civil liberties score into the Dependent box.
• Now we are going to enter variables one at a time, in the order predicted by our theory. Move your first-to-enter variable, percentage of seats in the lower legislative house held by the largest party, into the Independent box and click Next.
• Move your second-to-enter variable, percentage of the work force who are women, into the Independent box and click Next.
• Finally, move your third-to-enter variable, percentage of the voting age population who voted in the last election, into the Independent box. DON’T click Next again.
• Make sure the Enter option is selected under Method.
• Under Statistics, select Estimates, Confidence Intervals, Model Fit, R squared change, Descriptives, Part and Partial Correlation, and Collinearity Statistics, and click Continue.
• Under Options, check Include Constant in the Equation, click Continue and then OK. You are doing this so you will be able to write the equation for predicting new cases’ civil liberties scores from raw scores on the predictor variables.
• Compare your results to the output shown next.

SPSS Output: The Variables and their Order of Entry

Look for this box to make sure you have done the hierarchical regression form of multiple regression and that your variables have been entered in the order predicted by your theory.

Variables Entered/Removed (b)

Model   Variables Entered                                               Variables Removed   Method
1       Percent of seats in lower legis hse held by largest party (a)   .                   Enter
2       Percent of labor force who are women (a)                        .                   Enter
3       Percent of voting age pop who voted in last election (a)        .                   Enter

a. All requested variables entered.
b. Dependent Variable: Civil liberties score

The Regression Model Summary Table

Next, look for your model summary. Note that there are three models examined, and the notes a, b, and c tell which of your predictors are in each model. Note that Model 1, with only the percent of seats in the lower legislative house variable entered, was significant (F (1, 82) = 52.544, p < .001), and when the percentage of labor force who are women variable was added in Model 2, the increase in R square, the proportion of variance accounted for, was significant (F change (1, 81) = 6.346, p = .014). Thus the two-variable model is significantly correlated with Y. Note that Model 3 didn’t change R square significantly (F change (1, 80) = .525, p = .471), that is, it didn’t improve prediction significantly, so you really don’t need the third predictor, percent of voting age population who voted in last election. You choose Model 2.

Model Summary

                                                                 Change Statistics
Model   R       R Square   Adjusted    Std. Error of   R Square   F Change   df1   df2   Sig. F
                           R Square    the Estimate    Change                            Change
1       .625a   .391       .383        1.277           .391       52.544     1     82    .000
2       .659b   .435       .421        1.237           .044       6.346      1     81    .014
3       .662c   .438       .417        1.240           .004       .525       1     80    .471

a. Predictors: (Constant), Percent of seats in lower legis hse held by largest party
b. Predictors: (Constant), Percent of seats in lower legis hse held by largest party, Percent of labor force who are women
c. Predictors: (Constant), Percent of seats in lower legis hse held by largest party, Percent of labor force who are women, Percent of voting age pop who voted in last election

Regression Statistics; R and R Square

Note the statistics in the Model Summary table for the model you have chosen, Model 2. The multiple correlation R between civil liberties score and the two predictors is .659. The proportion of variance in the civil liberties score accounted for by the combination of the two variables is .435 (43.5%).

Overall Significance of the Regression Equation

Look in the ANOVA table to get the overall F value for the model you have chosen. The F (2, 81) value for the two-variable combination of percent of seats held by largest party and percent of labor force who are women is 31.158, p < .001.

ANOVA (d)

Model              Sum of Squares   df   Mean Square   F        Sig.
1   Regression     85.620           1    85.620        52.544   .000a
    Residual       133.618          82   1.629
    Total          219.238          83
2   Regression     95.327           2    47.664        31.158   .000b
    Residual       123.911          81   1.530
    Total          219.238          83
3   Regression     96.135           3    32.045        20.825   .000c
    Residual       123.103          80   1.539
    Total          219.238          83

a. Predictors: (Constant), Percent of seats in lower legis hse held by largest party
b. Predictors: (Constant), Percent of seats in lower legis hse held by largest party, Percent of labor force who are women
c. Predictors: (Constant), Percent of seats in lower legis hse held by largest party, Percent of labor force who are women, Percent of voting age pop who voted in last election
d. Dependent Variable: Civil liberties score

Standardized and Unstandardized Coefficients; Multicollinearity

Continue to examine your output. Note the standardized and unstandardized coefficients. You can use the unstandardized coefficients to write the regression equation: Y = 6.194 - .039 (percent of seats held by largest party) + .034 (percent of labor force who are women).
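As a rough check on the numbers above, the regression equation, the F change for adding the second predictor, and the overall F can be recomputed from the rounded values in the SPSS tables. This is a minimal sketch: the function names and the example country (50% of seats, 40% women in the labor force) are hypothetical, and small discrepancies from the SPSS figures come from rounding.

```python
def predict_civil_liberties(pct_seats_largest_party, pct_labor_force_women):
    """Model 2 prediction from the unstandardized coefficients (B)."""
    return 6.194 - 0.039 * pct_seats_largest_party + 0.034 * pct_labor_force_women

def f_change(r2_reduced, r2_full, df1, df2):
    """F test for the increase in R square when predictors are added.

    df1 = number of predictors added, df2 = residual df of the fuller model.
    """
    return ((r2_full - r2_reduced) / df1) / ((1.0 - r2_full) / df2)

def overall_f(ss_regression, df_regression, ss_residual, df_residual):
    """Overall F = mean square regression / mean square residual."""
    return (ss_regression / df_regression) / (ss_residual / df_residual)

# Hypothetical new case: 50% of seats held by largest party, 40% women in labor force.
example_score = predict_civil_liberties(50.0, 40.0)

# Model 1 -> Model 2, using R square values from the Model Summary table.
f_change_model2 = f_change(0.391, 0.435, df1=1, df2=81)

# Model 2 row of the ANOVA table.
f_model2 = overall_f(95.327, 2, 123.911, 81)

print(f"predicted civil liberties score: {example_score:.3f}")
print(f"F change for Model 2: {f_change_model2:.2f} (SPSS reports 6.346)")
print(f"overall F for Model 2: {f_model2:.3f} (SPSS reports 31.158)")
```

The F change computed from three-decimal R square values lands a little below the SPSS figure because SPSS works from unrounded sums of squares.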
You can use the standardized coefficients to compare the relative contributions of percent of seats and percent of women (-.620 and .210, respectively), and note that both standardized coefficients were significantly different from zero. Note also that the sign of the standardized coefficient for percentage of seats was a minus sign, as predicted by your theory, and that the sign of the other variable was positive, as predicted. You can also report your tolerance and VIF statistics, which suggest that multicollinearity was not a problem (tolerance is 1.0, and VIF is 1.0, nowhere near 10).

Coefficients (a): coefficients, t tests, and confidence intervals

Model                                                                 B       Std.    Beta    t        Sig.   95% CI for B
                                                                              Error                           Lower    Upper
1   (Constant)                                                        7.298   .358            20.367   .000   6.585    8.011
    Percent of seats in lower legis hse held by largest party         -.040   .005    -.625   -7.249   .000   -.051    -.029
2   (Constant)                                                        6.194   .559            11.078   .000   5.081    7.306
    Percent of seats in lower legis hse held by largest party         -.039   .005    -.620   -7.425   .000   -.050    -.029
    Percent of labor force who are women                              .034    .014    .210    2.519    .014   .007     .061
3   (Constant)                                                        5.776   .805            7.179    .000   4.175    7.377
    Percent of seats in lower legis hse held by largest party         -.038   .006    -.594   -6.497   .000   -.049    -.026
    Percent of labor force who are women                              .032    .014    .195    2.263    .026   .004     .060
    Percent of voting age pop who voted in last election              .006    .009    .068    .725     .471   -.011    .023

Coefficients (a): correlations and collinearity statistics

Model                                                                 Zero-order   Partial   Part    Tolerance   VIF
1   Percent of seats in lower legis hse held by largest party         -.625        -.625     -.625   1.000       1.000
2   Percent of seats in lower legis hse held by largest party         -.625        -.636     -.620   1.000       1.000
    Percent of labor force who are women                              .224         .270      .210    1.000       1.000
3   Percent of seats in lower legis hse held by largest party         -.625        -.588     -.544   .840        1.190
    Percent of labor force who are women                              .224         .245      .190    .942        1.062
    Percent of voting age pop who voted in last election              .347         .081      .061    .796        1.256

a. Dependent Variable: Civil liberties score

Writing up Your Multiple Regression Results

“To test the hypothesis that a country’s civil liberties score was significantly related to a linear combination of the percentage of seats in the lower legislative house held by the largest party, the percentage of women in the labor force, and the percentage of the voting age population who voted in the last election, a multiple regression analysis was conducted. It was expected that the variable ‘percentage of seats held by the largest party’ would be negatively correlated with civil liberties score and the other two variables positively related. Results of the regression analysis indicated that a two-variable model which included percentage of seats in the lower legislative house held by the largest party and percentage of women in the labor force was significantly correlated with civil liberties scores (F (2, 81) = 31.158, p < .001). Addition of the third variable to the predictive model did not significantly increase the amount of variance accounted for in civil liberties score (F change (1, 80) = .525, p = .471). The two-variable combination accounted for approximately 43.5% of the variance in civil liberties score.

The best fitting regression equation for predicting civil liberties score from the two variables was civil liberties score = 6.194 - .039 (percent of seats held by largest party) + .034 (percent of labor force who are women). Significant standardized coefficients (βs) were obtained for the two variables (-.620 for percent of seats held by the largest party and .210 for percentage of women in the labor force), indicating that countries with higher scores on civil liberties would be likely to have a smaller percentage of seats in the lower legislative house held by the largest party and a larger percentage of women in the labor force, as predicted.
Tolerance and VIF for the two-variable model were both equal to 1.0, indicating that multicollinearity was not an issue. Thus we can say that partial support for the hypothesis was obtained.”

Sample Test Question for Discriminant Analysis

Now we are going to test the following hypothesis: Southern and non-Southern states differ significantly on a combination of two types of traffic fatality, restrained and unrestrained motor vehicle accidents, such that Southern states will have a significantly higher value on the combined indicators than non-Southern states.

Testing the Hypothesis

Both discriminant analysis and MANOVA can be used in the case where you have two or more interval or better level predictors (DVs in the usage of MANOVA) and a nominal level grouping variable (IV in the usage of MANOVA). In this case we have a nominal level grouping variable (Southern/non-Southern) and interval level (actually ratio level) DVs or discriminating variables (traffic fatality variables). We are going to use discriminant analysis to do the MANOVA, which (1) will give the identical result in the case where there are only two groups (two levels of the grouping variable) and (2) will let us practice doing discriminant analysis and evaluating the efficacy of the discriminant function. We are going to be looking for a significant level of Wilks’ lambda as an indicator of significant differences and support for the hypothesis. It is also necessary for the signs of the discriminant function coefficients to be in the same direction as that predicted for the two variables (a positive relationship with “Southernness”).

SPSS Procedure for Discriminant Analysis

• Download the file statelevel.sav.
• In SPSS Data Editor, open the data file statelevel.sav.
• Go to Analyze/Classify/Discriminant.
• In the Group box put South (dummy) and set the maximum and minimum values to 1 and 0, respectively.
• In the Independents box, put restrained motor vehicle deaths per 100k and unrestrained motor vehicle deaths per 100k.
• Make sure that the Enter Independents Together button is checked.
• Under Statistics, check Means, Univariate ANOVAs, Box’s M, and Unstandardized function coefficients, and click Continue.
• Under Classify, select Summary Table and Territorial Map, click Continue, and then OK.
• Compare your output to the output shown next.

Examining Your SPSS Output: Group Means

First, look at the group means. Note that the means are in the expected direction, with levels of the two vehicle death variables higher in the South than in the non-South. Univariate F tests show that the differences are significant for both of the variables. So you have significant differences in the expected direction on both of your variables considered separately.

Group Statistics

South dummy   Variable                                   Mean       Std. Deviation   Valid N (listwise)
                                                                                     Unweighted   Weighted
Non-south     Restrained motor veh deaths per 100k       8.62437    2.785566         34           34.000
              Unrestrained motor veh deaths per 100k     7.54033    4.380484         34           34.000
South         Restrained motor veh deaths per 100k       10.65369   2.182391         16           16.000
              Unrestrained motor veh deaths per 100k     10.69006   4.332577         16           16.000
Total         Restrained motor veh deaths per 100k       9.27376    2.756466         50           50.000
              Unrestrained motor veh deaths per 100k     8.54824    4.568596         50           50.000

Tests of Equality of Group Means

Variable                                   Wilks' Lambda   F       df1   df2   Sig.
Restrained motor veh deaths per 100k       .880            6.567   1     48    .014
Unrestrained motor veh deaths per 100k     .894            5.664   1     48    .021

Box’s M Test for Equality of Group Covariances, and Significance of Wilks’ Lambda Overall Test

Next, look at your Box’s M test for the equality of group covariances.
Box’s M is not significant, which means you have met one of the assumptions of MANOVA, that the group covariances for the levels of the grouping variable are equal. Now look at the value of Wilks’ lambda, and assess it for significance. Wilks’ lambda equals .783 and is significant by the chi-square test. If we interpret this significant value of Wilks’ lambda in a MANOVA-like way, we have confirmed the hypothesis that Southern and non-Southern states differ significantly on the combination of the two motor vehicle predictors. (If we were interpreting this in a discriminant analysis type of way, we would say that the combination of two types of traffic related fatalities left .783 of the variance in Southern state-ness “unexplained”.) Wilks’ lambda is one of those measures you want to be close to zero, so this result is statistically significant, but not all that impressive.

Test Results

Box's M               1.912
F        Approx.      .602
         df1          3
         df2          19070.702
         Sig.         .613

Tests null hypothesis of equal population covariance matrices.

Wilks' Lambda

Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1                     .783            11.478       2    .003

The Canonical Correlation

From your printout you will also want to report the canonical correlation between the combination of the two traffic fatality variables and South/non-South, which is .465. This represents the correlation of the grouping variable (South/non-South) with the new canonical variable formed by weighting the two original predictors (traffic fatalities belted and unbelted) by the weights from the discriminant function.
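With a single discriminant function, Wilks’ lambda, the canonical correlation, and the chi-square test are all tied to the function’s eigenvalue. The sketch below recomputes them from the rounded values in the SPSS output for this example (eigenvalue = .277, N = 50 states, 2 predictors, 2 groups). The function names are illustrative, and the chi-square uses Bartlett’s approximation, which is assumed here to be the form behind the printed value.

```python
import math

def wilks_from_eigenvalue(eigenvalue):
    """Wilks' lambda for a single discriminant function."""
    return 1.0 / (1.0 + eigenvalue)

def canonical_correlation(eigenvalue):
    """Canonical correlation for a single discriminant function."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))

def bartlett_chi_square(wilks_lambda, n_cases, n_predictors, n_groups):
    """Bartlett's chi-square approximation for testing Wilks' lambda."""
    multiplier = n_cases - 1 - (n_predictors + n_groups) / 2.0
    chi_square = -multiplier * math.log(wilks_lambda)
    df = n_predictors * (n_groups - 1)
    return chi_square, df

eigenvalue = 0.277  # rounded value from the Eigenvalues table
wilks = wilks_from_eigenvalue(eigenvalue)
r_canonical = canonical_correlation(eigenvalue)
chi_square, chi_df = bartlett_chi_square(0.783, n_cases=50, n_predictors=2, n_groups=2)

print(f"Wilks' lambda = {wilks:.3f} (SPSS reports .783)")
print(f"canonical correlation = {r_canonical:.3f} (SPSS reports .465)")
print(f"chi-square({chi_df}) = {chi_square:.2f} (SPSS reports 11.478)")
```

The squared canonical correlation and Wilks’ lambda sum to 1 in the two-group case, which is why lambda can be read as the proportion of variance left unexplained.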
You don’t usually report the equation for classifying new cases in the write-up when you are using MANOVA or discriminant analysis to test a hypothesis about group differences. You would use these weights to classify new cases as South or non-South.

Eigenvalues

Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          .277a        100.0           100.0          .465

a. First 1 canonical discriminant functions were used in the analysis.

Canonical Discriminant Function Coefficients (unstandardized coefficients)

                                           Function 1
Restrained motor veh deaths per 100k       .291
Unrestrained motor veh deaths per 100k     .163
(Constant)                                 -4.093

Discriminant Function Coefficients, Group Means on Functions

You would report the standardized discriminant function coefficients to show the relative contribution of each of the two predictors, which in this case are about equal, and both positively associated with the discriminant function, as required for support of your hypothesis. Then you would report the group means (centroids) on the discriminant function, which show that the Southern group has a positive mean on the function (.751), that is, being a Southern state goes with higher vehicle deaths, and the non-Southern group has a negative mean (-.354).

Standardized Canonical Discriminant Function Coefficients

                                           Function 1
Restrained motor veh deaths per 100k       .760
Unrestrained motor veh deaths per 100k     .713

Functions at Group Centroids

South dummy   Function 1
Non-south     -.354
South         .751

Unstandardized canonical discriminant functions evaluated at group means

Classification Results

Finally, you would report the re-classification results (that is, the results of using the discriminant function coefficients to create a new, canonical variable out of the old predictors and using this new variable to re-classify cases as South or non-South) and the most frequently occurring misclassifications; e.g., 78% of the cases were correctly re-classified based on the discriminant function.
Slightly more errors proportionally were made re-classifying the Southern than the non-Southern cases.

Classification Results (a)

                                   Predicted Group Membership
             South dummy           Non-south    South          Total
Original     Count   Non-south     27           7              34
                     South         4            12             16
             %       Non-south     79.4         20.6           100.0
                     South         25.0         75.0           100.0

a. 78.0% of original grouped cases correctly classified.

Writing up your Discriminant Analysis Result

“A discriminant analysis was conducted to perform a multivariate analysis of variance test of the hypothesis that Southern states differ from non-Southern states on a linear combination of two types of traffic fatality, restrained motor vehicle accidents and unrestrained motor vehicle accidents, such that Southern states will have a significantly higher value on the combined indicators than non-Southern states. The obtained value of Wilks’ lambda, .783, was significant at p = .003 (Chi-square = 11.478, df = 2; Box’s M = 1.912, n.s.). The canonical correlation between the grouping variable and the new canonical variable composed of the two predictors was .465. Significant univariate differences of means between Southern and non-Southern states were also obtained for restrained motor vehicle accidents (F (1, 48) = 6.567, p = .014) and unrestrained motor vehicle accidents (F (1, 48) = 5.664, p = .021). Mean differences were in the expected direction: means for restrained motor vehicle accidents were 10.65 for Southern states and 8.62 for non-Southern states; means for unrestrained motor vehicle accidents were 10.69 for Southern states and 7.54 for non-Southern states.

Table 1 presents the standardized discriminant function coefficients. Higher scores on the discriminant function corresponded to higher traffic fatality rates for both of the discriminating variables.
Table 2 presents the group centroids on the discriminant function; the Southern states group had a high, positive centroid with respect to the function, corresponding to higher rates of traffic fatalities. Table 3 presents the results of the re-classification analysis, which shows that the discriminant function was successful in reclassifying 78% of the cases.”
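How the unstandardized weights produce discriminant scores, centroids, and the re-classification hit rate can be sketched as follows, using only numbers from the SPSS output in this review. The function names are illustrative, and the cutoff halfway between the two centroids is an assumption: it corresponds to treating the two groups as equally likely beforehand, which may not match the settings used on the exam.

```python
def discriminant_score(restrained_deaths, unrestrained_deaths):
    """Score on Function 1 from the unstandardized coefficients."""
    return -4.093 + 0.291 * restrained_deaths + 0.163 * unrestrained_deaths

NON_SOUTH_CENTROID = -0.354
SOUTH_CENTROID = 0.751
# Assumption: equal prior probabilities, so the cut point is the midpoint.
CUTOFF = (NON_SOUTH_CENTROID + SOUTH_CENTROID) / 2.0

def classify(restrained_deaths, unrestrained_deaths):
    """Assign a case to the group whose centroid its score is closer to."""
    score = discriminant_score(restrained_deaths, unrestrained_deaths)
    return "South" if score > CUTOFF else "Non-south"

def hit_rate(confusion):
    """Proportion of cases whose predicted group equals the actual group."""
    correct = sum(n for (actual, predicted), n in confusion.items()
                  if actual == predicted)
    return correct / sum(confusion.values())

# Scores at the group means reproduce the centroids (up to rounding).
score_non_south = discriminant_score(8.62437, 7.54033)
score_south = discriminant_score(10.65369, 10.69006)

# Counts from the Classification Results table: (actual, predicted) -> count.
confusion = {
    ("Non-south", "Non-south"): 27,
    ("Non-south", "South"): 7,
    ("South", "Non-south"): 4,
    ("South", "South"): 12,
}
overall_hit_rate = hit_rate(confusion)

print(f"score at non-South means: {score_non_south:.3f} (centroid -.354)")
print(f"score at South means: {score_south:.3f} (centroid .751)")
print(f"overall hit rate: {overall_hit_rate:.0%}")
```

A state sitting exactly at the Southern group means is classified as South, and one at the non-Southern means as non-South, which is the minimum the function should manage.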