EDF 6486 Advanced Analysis in Educational Research
Homework Due March 5, 2012
Solutions

Hinkle, et al.

1.

Client   Communication (X)   Satisfaction (Y)
1        37                  29
2        27                  32
3        30                  25
4        52                  19
5        30                  30
6        42                  21
7        35                  24
8        32                  33
9        29                  27
10       33                  31
11       46                  23
12       43                  27
13       55                  33
14       39                  31
15       43                  25
16       37                  24
17       48                  32
18       30                  35
19       42                  22
20       38                  30
21       42                  25
22       25                  33
23       36                  31
24       49                  23
25       42                  27

n = 25   ΣX = 962   ΣX² = 38536   ΣY = 692   ΣY² = 19622   ΣXY = 26267

a. & c.
[Figure: scatterplot of Satisfaction (vertical axis) against Communication (horizontal axis), with the regression line from part c.]

b. The slope of the regression line (b) is found:

$$b = \frac{n\sum XY - \sum X\sum Y}{n\sum X^2 - \left(\sum X\right)^2} = \frac{25(26267) - (962)(692)}{25(38536) - (962)^2} = \frac{656675 - 665704}{963400 - 925444} = \frac{-9029}{37956} = -.238$$

The Y-intercept (a) is found:

$$a = \frac{\sum Y - b\sum X}{n} = \frac{692 - (-.238)(962)}{25} = \frac{692 - (-228.96)}{25} = \frac{920.96}{25} = 36.84$$

So, the regression equation for predicting marital satisfaction scores (Y) from communication scores (X) is:

$$\hat{Y} = -.238X + 36.84$$

c. We can determine two points that are on the regression line. The first is (0, 36.84), since the value of $\hat{Y}$ when X is zero (the Y-intercept) is equal to a. When X = 10, the value of $\hat{Y}$ is $-.238(10) + 36.84 = -2.38 + 36.84 = 34.46$. We plot these two points on the graph above and connect them. This gives us the regression line.

d. Using the regression equation, we would predict that a client who has a communication score of 43 has a marital satisfaction score of

$$\hat{Y} = -.238X + 36.84 = -.238(43) + 36.84 = -10.234 + 36.84 = 26.61$$

e. The standard error of estimate can be found using

$$s_{Y\cdot X} = s_Y\sqrt{(1 - r^2)\left(\frac{n-1}{n-2}\right)} \qquad \text{(Formula 6.11)}$$

Here $s_Y$ is the standard deviation of the criterion variable. In this case it is the marital satisfaction variable.
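Every quantity in this problem can be cross-checked from the six summary sums alone. The Python sketch below is a hypothetical helper, not part of the assignment. It carries full precision rather than rounding at each step. It takes the critical value 2.069 (23 df) from a printed t table instead of computing it, and it uses the standard library's `statistics.NormalDist` for the normal-curve area in part f. Following the hand solution, it uses Formula 6.12 in parts f and g. For the test of β it uses Formula 6.11.

```python
from math import sqrt
from statistics import NormalDist

# Summary sums for the 25 clients (X = communication, Y = satisfaction)
n, sum_x, sum_y = 25, 962, 692
sum_x2, sum_y2, sum_xy = 38536, 19622, 26267

# b. Least-squares slope and Y-intercept from the raw-score formulas
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n


def predict(x):
    """Predicted satisfaction score for a communication score x."""
    return b * x + a


# e. Standard deviations, correlation, and standard error of estimate
s_y = sqrt((sum_y2 - sum_y ** 2 / n) / (n - 1))
s_x = sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))
r = b * s_x / s_y
se_611 = s_y * sqrt((1 - r ** 2) * (n - 1) / (n - 2))  # Formula 6.11
se_612 = s_y * sqrt(1 - r ** 2)                        # Formula 6.12

# f. Share of clients with X = 28 expected to score above Y = 25
z = (25 - predict(28)) / se_612
p_above = 1 - NormalDist().cdf(z)

# g. 95% interval for the predicted score at X = 33
t_cv = 2.069  # two-tailed .05 critical t for 23 df, taken from a t table
mean_x = sum_x / n
s_yhat = se_612 * sqrt(1 + 1 / n + (33 - mean_x) ** 2 / ((n - 1) * s_x ** 2))
ci = (predict(33) - t_cv * s_yhat, predict(33) + t_cv * s_yhat)

# h. and i. t tests of rho = 0 and beta = 0, each with n - 2 df
t_r = r * sqrt((n - 2) / (1 - r ** 2))
ss_x = (n - 1) * s_x ** 2
t_b = b / (se_611 / sqrt(ss_x))  # with Formula 6.11, t_b equals t_r exactly

print(f"b = {b:.3f}, a = {a:.2f}, r = {r:.3f}")
print(f"s_Y.X: {se_611:.2f} (6.11), {se_612:.2f} (6.12)")
print(f"P(Y > 25 | X = 28) = {p_above:.4f}")
print(f"95% CI at X = 33: {ci[0]:.2f} to {ci[1]:.2f}")
print(f"t for rho = {t_r:.3f}, t for beta = {t_b:.3f}")
```

Nothing is rounded along the way, so some printed values differ from the hand results in the second decimal place. For example, the Formula 6.12 standard error comes out near 3.99. Formula 6.11 equals $\sqrt{SSE/(n-2)}$. When it is used as the standard error, the t for β is algebraically identical to the t for ρ.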
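We know that

$$s_Y = \sqrt{\frac{\sum Y^2 - \frac{\left(\sum Y\right)^2}{n}}{n-1}}$$

so

$$s_Y = \sqrt{\frac{19622 - \frac{(692)^2}{25}}{25-1}} = \sqrt{\frac{19622 - \frac{478864}{25}}{24}} = \sqrt{\frac{19622 - 19154.56}{24}} = \sqrt{\frac{467.44}{24}} = \sqrt{19.48} = 4.41$$

Now, we can find the standard deviation of the communication variable (X) using the formula

$$s_X = \sqrt{\frac{\sum X^2 - \frac{\left(\sum X\right)^2}{n}}{n-1}}$$

so

$$s_X = \sqrt{\frac{38536 - \frac{(962)^2}{25}}{25-1}} = \sqrt{\frac{38536 - \frac{925444}{25}}{24}} = \sqrt{\frac{38536 - 37017.76}{24}} = \sqrt{\frac{1518.24}{24}} = \sqrt{63.26} = 7.95$$

We can find the correlation between communication and satisfaction in marriage by using the formula

$$r = b_{Y\cdot X}\frac{s_X}{s_Y} = -.238\left(\frac{7.95}{4.41}\right) = -.238(1.80) = -.428$$

Getting back to Formula 6.11, we can calculate

$$s_{Y\cdot X} = s_Y\sqrt{(1 - r^2)\left(\frac{n-1}{n-2}\right)} = 4.41\sqrt{\left(1 - (-.428)^2\right)\left(\frac{24}{23}\right)} = 4.41\sqrt{(1 - .18)(1.04)} = 4.41\sqrt{.82}\sqrt{1.04} = 4.41(.91)(1.02) = 4.09$$

Using Formula 6.12 we obtain

$$s_{Y\cdot X} = s_Y\sqrt{1 - r^2} = 4.41\sqrt{1 - (-.428)^2} = 4.41\sqrt{1 - .18} = 4.41\sqrt{.82} = 4.41(.91) = 4.01$$

f. If clients have communication scores of 28, we would predict that their marital satisfaction scores would be

$$\hat{Y} = bX + a = -.238(28) + 36.84 = -6.66 + 36.84 = 30.18$$

If the mean of the conditional distribution is 30.18 and the standard error of estimate is 4.01, the z-score that is equivalent to a score of 25 is found by

$$z = \frac{Y - \hat{Y}}{s_{Y\cdot X}} = \frac{25 - 30.18}{4.01} = \frac{-5.18}{4.01} = -1.29$$

[Figure: normal curve of predicted Y. Y = 25 lies at z = -1.29 and the predicted value 30.18 at z = 0. The area between z = -1.29 and 0 is .4015, the area above 0 is .5000, and the total area above Y = 25 is .9015.]

The figure above shows us that the percentage of clients with communication scores of 28 who will have marital satisfaction scores greater than 25 is .4015 + .5000 = .9015, or 90.15%.

g. If the communication score is 33, the predicted marital satisfaction score is

$$\hat{Y} = bX + a = -.238(33) + 36.84 = -7.85 + 36.84 = 28.99$$

The 95% confidence interval for the predicted value of Y is $\hat{Y} \pm t_{CV}s_{\hat{Y}}$. The number of degrees of freedom is n - 2 = 23. Remember that $s_{\hat{Y}}$ is the standard error of predicted scores. It can be calculated as

$$s_{\hat{Y}} = s_{Y\cdot X}\sqrt{1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{(n-1)s_X^2}} = 4.01\sqrt{1 + \frac{1}{25} + \frac{(33 - 38.48)^2}{24(63.26)}} = 4.01\sqrt{1.04 + \frac{30.03}{1518.24}} = 4.01\sqrt{1.04 + .02} = 4.01\sqrt{1.06} = 4.01(1.03) = 4.13$$

With $t_{CV} = 2.069$ for 23 degrees of freedom, the 95% confidence interval for the predicted marital satisfaction score, given a communication score of 33, is

$$\hat{Y} \pm t_{CV}s_{\hat{Y}} = 28.99 \pm 2.069(4.13) = 28.99 \pm 8.54$$

So, the upper limit of $CI_{95}$ is 28.99 + 8.54 = 37.53 and the lower limit of $CI_{95}$ is 28.99 - 8.54 = 20.45.

h.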
From part e., we know that r = -.428. We can test the null hypothesis that ρ = 0 using the formula

$$t = r\sqrt{\frac{n-2}{1-r^2}}$$

with n - 2 degrees of freedom. For our null hypothesis,

$$t = -.428\sqrt{\frac{25-2}{1-(-.428)^2}} = -.428\sqrt{\frac{23}{1-.183}} = -.428\sqrt{\frac{23}{.817}} = -.428\sqrt{28.15} = -.428(5.31) = -2.273$$

At α = .05 for a two-tailed test at 23 degrees of freedom, the critical value of t is ±2.069. Since our value of t is beyond this value, we will reject the null hypothesis and conclude that the correlation between communication and marital satisfaction is not zero. That is, there is a significant correlation between these two variables.

i. From part b., we know that b = -.238. We can test the null hypothesis that β = 0 using the formula

$$t = \frac{b}{s_{Y\cdot X}/\sqrt{SS_X}}$$

with n - 2 degrees of freedom. We know that $SS_X = (n-1)s_X^2 = 24(7.95)^2 = 24(63.20) = 1516.80$. So,

$$t = \frac{-.238 - 0}{4.01/\sqrt{1516.80}} = \frac{-.238}{4.01/38.95} = \frac{-.238}{.1030} = -2.31$$

At α = .05 for a two-tailed test at 23 degrees of freedom, the critical value of t is ±2.069. Since our value of t is beyond this value, we will reject the null hypothesis. We conclude that the regression coefficient (β) of the regression line for predicting marital satisfaction from communication skills is not zero. Knowledge of communication skills will therefore enhance prediction of marital satisfaction scores. Note that the value of t obtained here is the same (allowing for rounding error) as the one obtained in testing the null hypothesis ρ = 0 in part h of this question.

3.

a.

$$b = r\frac{s_Y}{s_X} = .67\left(\frac{4.2}{3.6}\right) = .67(1.17) = .784$$

$$a = \frac{\sum Y - b\sum X}{n} = \frac{1854 - .784(1782)}{120} = \frac{1854 - 1397.09}{120} = \frac{456.91}{120} = 3.81$$

So, the regression equation for predicting income from level of education is

$$\hat{Y} = .784X + 3.81$$

b. The income of a person with 13.5 years of education is found using

$$\hat{Y} = .784X + 3.81 = .784(13.5) + 3.81 = 10.58 + 3.81 = 14.39$$

or \$14,390.

c.
$$s_{Y\cdot X} = s_Y\sqrt{(1 - r^2)\left(\frac{n-1}{n-2}\right)} \qquad \text{(Formula 6.11)}$$

so

$$s_{Y\cdot X} = 4.2\sqrt{\left(1 - .67^2\right)\left(\frac{120-1}{120-2}\right)} = 4.2\sqrt{(1 - .449)\left(\frac{119}{118}\right)} = 4.2\sqrt{.551}\sqrt{1.01} = 4.2(.742)(1.00) = 3.12$$

Alternatively, using Formula 6.12,

$$s_{Y\cdot X} = s_Y\sqrt{1 - r^2} = 4.2\sqrt{1 - .67^2} = 4.2\sqrt{1 - .449} = 4.2\sqrt{.551} = 4.2(.742) = 3.12$$

d. We can predict the income of those with 15 years of education as follows:

$$\hat{Y} = .784X + 3.81 = .784(15) + 3.81 = 11.76 + 3.81 = 15.57$$

The z-score corresponding to a value of 18.5 thousand dollars is

$$z = \frac{Y - \hat{Y}}{s_{Y\cdot X}} = \frac{18.5 - 15.57}{3.12} = \frac{2.93}{3.12} = .94$$

[Figure: normal curve of predicted income. The predicted value 15.57 lies at z = 0 and the income 18.50 at z = .94. The area above z = .94 is .1736.]

We can conclude that 17.36% of the members of the population with 15 years of education have annual incomes greater than 18.5 thousand dollars.

e. If the shopper indicates he has 16 years of education, the predicted income is

$$\hat{Y} = bX + a = .784(16) + 3.81 = 12.54 + 3.81 = 16.35$$

The 95% confidence interval for the predicted value of Y is $\hat{Y} \pm t_{CV}s_{\hat{Y}}$. The number of degrees of freedom is n - 2 = 118. Remember that $s_{\hat{Y}}$ is the standard error of predicted scores. With $\bar{X} = 1782/120 = 14.85$, it can be calculated as

$$s_{\hat{Y}} = s_{Y\cdot X}\sqrt{1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{(n-1)s_X^2}} = 3.12\sqrt{1 + \frac{1}{120} + \frac{(16 - 14.85)^2}{119(3.6)^2}} = 3.12\sqrt{1.0083 + \frac{1.32}{1542.24}} = 3.12\sqrt{1.0083 + .0009} = 3.12\sqrt{1.0092} = 3.12(1.0046) = 3.13$$

Therefore, the 95% confidence interval for the predicted income, given a level of education of 16 years, is

$$\hat{Y} \pm t_{CV}s_{\hat{Y}} = 16.35 \pm 1.980(3.13) = 16.35 \pm 6.20$$

So, the upper limit of $CI_{95}$ is 16.35 + 6.20 = 22.55 and the lower limit of $CI_{95}$ is 16.35 - 6.20 = 10.15.

f. We will test the null hypothesis that ρ = 0 using the formula $t = r\sqrt{\frac{n-2}{1-r^2}}$ with n - 2 degrees of freedom. So,

$$t = .67\sqrt{\frac{120-2}{1-.67^2}} = .67\sqrt{\frac{118}{1-.45}} = .67\sqrt{\frac{118}{.55}} = .67\sqrt{214.55} = .67(14.65) = 9.82$$

At α = .05 for a two-tailed test at 118 degrees of freedom, the critical value of t is ±1.980. Since our value of t is beyond this value, we will reject the null hypothesis and conclude that the correlation between education and income is not zero. That is, there is a significant correlation between these two variables.

g. From part a., we know that b = .784. We can test the null hypothesis that β = 0 using the formula

$$t = \frac{b}{s_{Y\cdot X}/\sqrt{SS_X}}$$

with n - 2 degrees of freedom.
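Problem 3 supplies only summary statistics, so its arithmetic can be cross-checked the same way. The sketch below is hypothetical and not part of the assignment. It assumes the values used in this problem: n = 120, r = .67, $s_Y$ = 4.2, $s_X$ = 3.6, ΣX = 1782, ΣY = 1854, and income in thousands of dollars. It takes the critical value 1.980 for 118 df from a t table. Because it carries full precision, the slope $b = r\,s_Y/s_X$ comes out as .782 rather than the .784 obtained after rounding 4.2/3.6 to 1.17.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics (X = years of education, Y = income in $1,000s)
n, r, s_y, s_x = 120, 0.67, 4.2, 3.6
sum_x, sum_y = 1782, 1854

# a. Slope from r and the standard deviations; intercept from the sums
b = r * s_y / s_x
a = (sum_y - b * sum_x) / n


def predict(x):
    """Predicted income (thousands of dollars) for x years of education."""
    return b * x + a


# c. Standard error of estimate, Formulas 6.11 and 6.12
se_611 = s_y * sqrt((1 - r ** 2) * (n - 1) / (n - 2))
se_612 = s_y * sqrt(1 - r ** 2)

# d. Proportion with 15 years of education earning more than 18.5
z = (18.5 - predict(15)) / se_612
p_above = 1 - NormalDist().cdf(z)

# e. 95% interval for the predicted income at X = 16
t_cv = 1.980  # two-tailed .05 critical t for 118 df, taken from a t table
mean_x = sum_x / n
s_yhat = se_612 * sqrt(1 + 1 / n + (16 - mean_x) ** 2 / ((n - 1) * s_x ** 2))
ci = (predict(16) - t_cv * s_yhat, predict(16) + t_cv * s_yhat)

# f. and g. t tests of rho = 0 and beta = 0, each with n - 2 df
t_r = r * sqrt((n - 2) / (1 - r ** 2))
ss_x = (n - 1) * s_x ** 2
t_b = b / (se_611 / sqrt(ss_x))  # Formula 6.11 makes t_b equal t_r

print(f"Y-hat = {b:.3f}X + {a:.2f}; at 13.5 years: {predict(13.5):.2f}")
print(f"P(income > 18.5 | 15 years) = {p_above:.4f}")
print(f"95% CI at 16 years: {ci[0]:.2f} to {ci[1]:.2f}")
print(f"t for rho = {t_r:.2f}, t for beta = {t_b:.2f}")
```

The sketch gives an interval of roughly 10.15 to 22.55 at 16 years of education. It also gives a t near 9.80 for both tests, because Formula 6.11 makes the test of β algebraically identical to the test of ρ.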
We know that $SS_X = (n-1)s_X^2 = 119(3.6)^2 = 119(12.96) = 1542.24$. So,

$$t = \frac{.784 - 0}{3.12/\sqrt{1542.24}} = \frac{.784}{3.12/39.27} = \frac{.784}{.08} = 9.8$$

At α = .05 for a two-tailed test at 118 degrees of freedom, the critical value of t is ±1.980. Since our value of t is beyond this value, we will reject the null hypothesis. We conclude that the regression coefficient (β) of the regression line for predicting income from education is not zero. Knowledge of a person's level of education will therefore enhance prediction of income. Note that the value of t obtained here is the same (allowing for rounding error) as the one obtained in testing the null hypothesis ρ = 0 in part f of this question.

Green, et al. Lesson 32, Questions 5-6

We first load the database for this lesson (Lesson 32 Exercise File 2) into the Data View window of SPSS. The first portion of the file looks like this.

5. To carry out a bivariate linear regression, click on the Analyze menu at the top of the Data View screen and pull down the menu. Click on the Regression submenu and on the first choice (Linear...) in that submenu. The choice should look like the one shown below.

This should give you the Linear Regression dialog box shown below.

Since we wish to predict the number of publications for each professor, the variable num_pubs is the dependent variable. Since we wish to predict this value from the work ethic of the professors, that score (work_eth) is the independent variable. Place these variables in the appropriate windows of the dialog box. The box should now look like the one shown below.

Since we will want to plot the scatterplot for predicted and residual scores in Question 6, click on the Plots button to obtain the dialog box shown below. To obtain the desired scatterplot, move ZRESID into the Y window. ZRESID is the z-score of the residual, that is, of the difference between the predicted number of publications and the actual number for each professor.
Then, move the z-score of the predicted number of publications for each professor (ZPRED) into the X window as shown below.

Now, click the Continue button in the Linear Regression: Plots dialog box to return to the original Linear Regression dialog box. Click the OK button in this dialog box to obtain your output.

a. The output table below is the result of a significance test that assesses the predictability of the number of publications from work ethic.

ANOVA(b)
Model 1       Sum of Squares   df   Mean Square   F        Sig.
Regression    1922.444          1   1922.444      21.032   .000(a)
Residual      4387.556         48     91.407
Total         6310.000         49
a. Predictors: (Constant), Work Ethic
b. Dependent Variable: Number of publications

This ANOVA tests the null hypothesis H0: R = 0, that is, that the number of publications cannot be predicted from work ethic. The alternative hypothesis is Ha: R ≠ 0, that is, that there is some correlation between the two variables and that one can be predicted from the other. The significance level (p value) is less than .001. If the null hypothesis were true, an F at least this large would occur less than 0.1% of the time. Therefore, we reject the null hypothesis. We conclude that, for these professors, the number of publications can be predicted to some degree from work ethic.

b. The table below gives us the values of the regression equation.

Coefficients(a)
Model 1       B        Std. Error   Beta   t        Sig.
(Constant)    -2.963   2.823               -1.050   .299
Work Ethic      .450    .098        .552    4.586   .000
(B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.)
a. Dependent Variable: Number of publications

The regression equation will be in the form $\hat{Y} = bX + a$. In this case, the value of b, the regression coefficient and the slope of the regression line, is .450. The value of a, the Y-intercept of the regression line, is -2.963. Note, however, that the significance level of the t test for a (testing the null hypothesis that a = 0) is .299. This is above .05, so we cannot reject the null hypothesis. Therefore, we have to assume that the value of a is zero.
So, the regression equation is $\hat{Y} = .450X$.

c. The output table below shows the correlation between number of publications and work ethic.

Model Summary(b)
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .552(a)   .305       .290                9.561
a. Predictors: (Constant), Work Ethic
b. Dependent Variable: Number of publications

We see that this correlation is .552.

We can see the scatterplot of the predicted and residual scores below.

[Figure: Scatterplot. Dependent Variable: Number of publications. Horizontal axis: Regression Standardized Predicted Value; vertical axis: Regression Standardized Residual.]

Note that for predicted values with standardized scores (z-scores) less than zero (the mean score), the residuals are about the same and clustered very closely together. Above z = 0, however, the residuals are spread over many values. This scatterplot lacks homoscedasticity. The standard errors of the predictions are therefore lower for lower predicted values and higher for higher predicted values.
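The SPSS tables above are internally consistent, and the quantities in the residual scatterplot have simple definitions. The sketch below is an illustration, not SPSS's own code. It recomputes F, R, R Square, Adjusted R Square, the standard error of the estimate, and the slope's t ratio from the reported sums of squares and coefficients. The helper `zpred_zresid` is hypothetical. It assumes the usual SPSS definitions: ZPRED rescales the predicted values to mean 0 and standard deviation 1, and ZRESID divides each residual by the standard error of the estimate, $\sqrt{SSE/(n-2)}$. The raw data file is not reproduced here, so the function is shown without being applied to it.

```python
from math import sqrt
from statistics import fmean, stdev

# Values copied from the SPSS ANOVA and Coefficients tables
ss_reg, ss_res = 1922.444, 4387.556
df_reg, df_res = 1, 48
b, se_b = 0.450, 0.098  # Work Ethic slope and its standard error

n = df_reg + df_res + 1            # 50 professors (total df = n - 1 = 49)
ms_reg = ss_reg / df_reg
ms_res = ss_res / df_res
f_stat = ms_reg / ms_res           # ANOVA F
r_sq = ss_reg / (ss_reg + ss_res)  # R Square
adj_r_sq = 1 - (1 - r_sq) * (n - 1) / (n - 2)
see = sqrt(ms_res)                 # Std. Error of the Estimate
t_b = b / se_b                     # t for the slope

print(f"F = {f_stat:.3f}, R = {sqrt(r_sq):.3f}, R^2 = {r_sq:.3f}")
print(f"adjusted R^2 = {adj_r_sq:.3f}, SEE = {see:.3f}, t = {t_b:.2f}")


def zpred_zresid(x, y):
    """Standardized predicted values and residuals for a bivariate fit.

    Hypothetical helper assuming SPSS-style definitions: ZPRED rescales the
    predicted values to mean 0 and SD 1; ZRESID divides each residual by
    the standard error of the estimate, sqrt(SSE / (n - 2)).
    """
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    pred = [intercept + slope * xi for xi in x]
    resid = [yi - pi for yi, pi in zip(y, pred)]
    s_est = sqrt(sum(e ** 2 for e in resid) / (len(x) - 2))
    mp, sp = fmean(pred), stdev(pred)
    zpred = [(p - mp) / sp for p in pred]
    zresid = [e / s_est for e in resid]
    return zpred, zresid
```

The table rounds B and its standard error, so `b / se_b` gives about 4.59 rather than the 4.586 that SPSS reports.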