Fundamental Statistics in Applied Linguistics Research
Spring 2010 Weekend MA Program on Applied English
Dr. Da-Fu Huang

6. Looking for groups of explanatory variables through multiple regression

Explanatory variables vs. response variables
Multiple regression (MR) examines whether the explanatory variables (EV) we have posited explain much of what is going on in the response variable (RV).
$Y_i = \alpha + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \mathrm{error}_i$
TOEFL score = some constant number (the intercept, α) + β1 * time spent on English per week + β2 * aptitude score + a number that fluctuates for each individual (the error)
MR can also predict how people in the future will score on the response variable.

[Figure: Venn diagram of regression variables. Circles: Hours of study per week, MLAT score, Personality, TOEFL score.]

The mathematical formula of a line
Line equation: y = 2 + 0.5x (intercept = 2, slope = 0.5)
[Figure: scatterplot with the regression line. For a data point, the actual value is Y1 and the predicted value on the line is Y'1; the vertical distance between them is the error, or residual.]

The regression line
The least squares regression line is the line that minimizes the sum of the squared errors about the line; that is, $\sum (Y - Y')^2$ is minimized.
It is the best-fitting line (the one closest to the data points).

6.1 Standard multiple regression (SMR)
In SMR, the importance of an EV depends on how much it uniquely overlaps with the RV.
SMR answers two questions:
What are the nature and size of the relationship between the RV and the set of EVs?
How much of the relationship is contributed uniquely by each EV?

[Figure: Venn diagram of standard regression design. Circles: Hours of study per week, MLAT score, Personality, TOEFL score; overlap regions labeled a to e.]
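The least squares idea described for the regression line can be tried out in a few lines of Python. This is only a minimal sketch: the five data points are hypothetical, invented so that the fitted line comes out as the slide's example y = 2 + 0.5x, and the function names are illustrative rather than taken from SPSS or any library.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_errors(intercept, slope, xs, ys):
    """Sum of squared residuals, i.e. the sum of (Y - Y')^2."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data points, invented for illustration only.
xs = [0, 2, 4, 6, 8]
ys = [3, 1, 5, 5, 6]

intercept, slope = least_squares_line(xs, ys)
best = sum_squared_errors(intercept, slope, xs, ys)
# Any other line, here one with a slightly steeper slope, has a larger sum.
steeper = sum_squared_errors(intercept, slope + 0.1, xs, ys)

print("intercept:", intercept, "slope:", slope)
print("sum of squared errors (fitted line):", best)
print("sum of squared errors (steeper line):", steeper)
```

With these points the fit recovers intercept 2 and slope 0.5, and the steeper line gives a larger sum of squared errors, which is what "best fitting" means here. Multiple regression applies the same criterion with more than one explanatory variable.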
6.2 Sequential (hierarchical) multiple regression (HMR)
In HMR, all of the areas of the EVs that overlap with the RV are counted, but how they are counted depends on the order in which the researcher enters the variables into the equation.
The importance of any variable can be emphasized in HMR, depending on the order in which it is entered. If two variables overlap to a large degree, entering one of them first leaves little for the second variable to explain.
HMR answers the question: Do the variables entered at each subsequent step add to the prediction of the RV after differences in the variables from the previous step have been eliminated?

[Figure: Venn diagram of sequential regression design (HMR). Circles: Hours of study per week, MLAT score, Personality, TOEFL score; overlap regions labeled a to e.]

Assumptions for MR: Table 7.1 (p. 184)
Normal distribution
Homogeneity of variances
Linearity
Multicollinearity (the EVs involved in the regression should not be highly intercorrelated)

6.4 Starting the MR (pp. 187-188)
Analyze > Regression > Linear
Put the RV in the "Dependent" box.
For standard regression: put all EVs into the "Independent" box with the Method set at "Enter".
For sequential regression: put the EVs into the "Independent" box one at a time, in the order you want them to enter the regression equation, with the Method set at "Enter". Push the Next button after entering each one.
Open the buttons: Statistics, Plots, and Options.

6.5 Regression output in SPSS
Analyze > Regression > Linear

Regression output: Descriptive Statistics
                                                Mean      Std. Deviation   N
Final score                                     74.46     10.386           54
Student English proficiency                     2.185     .7024            54
results of the motivation scale                 3.0370    .97057           54
results of the course evaluation by teachers    3.0741    .98770           54
LangAnxiety                                     2.7315    .77163           54

Regression output: Correlations, Pearson Correlation (N = 54 for every pair)
                                                  1        2        3        4        5
1 Final score                                     1.000
2 Student English proficiency                     .565     1.000
3 results of the motivation scale                 .616     .211     1.000
4 results of the course evaluation by teachers    .374     .170     .115     1.000
5 LangAnxiety                                     .032     -.088    .031     -.077    1.000

Regression output: Correlations, Sig. (1-tailed)
                                                  1        2        3
1 Final score                                     .        .000     .000
2 Student English proficiency                     .000     .        .063
3 results of the motivation scale                 .000     .063     .
4 results of the course evaluation by teachers    .003     .109     .203
5 LangAnxiety                                     .410     .265     .411

Regression output (standard): Variables Entered/Removed
Model   Variables Entered                                Variables Removed   Method
1       results of the motivation scale, LangAnxiety,    .                   Enter
        results of the course evaluation by teachers,
        Student English proficiency, Midterm score
a. All requested variables entered.

Regression output (sequential): Variables Entered/Removed
Model   Variables Entered                                Variables Removed   Method
1       Student English proficiency                      .                   Enter
2       results of the motivation scale                  .                   Enter
3       results of the course evaluation by teachers     .                   Enter
4       LangAnxiety                                      .                   Enter
a. All requested variables entered.
b. Dependent Variable: Final score

Regression output (standard): Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .854   .730       .701                5.675
Predictors: (Constant), results of the motivation scale, LangAnxiety, results of the course evaluation by teachers, Student English proficiency, Midterm score
Dependent Variable: Final score

Regression output (sequential): Model Summary
Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .565 a   .319       .306                8.653
2       .760 b   .577       .561                6.885
3       .797 c   .635       .613                6.460
4       .800 d   .640       .611                6.479
a. Predictors: (Constant), Student English proficiency
b. Predictors: (Constant), Student English proficiency, results of the motivation scale
c. Predictors: (Constant), Student English proficiency, results of the motivation scale, results of the course evaluation by teachers
d. Predictors: (Constant), Student English proficiency, results of the motivation scale, results of the course evaluation by teachers, LangAnxiety
e. Dependent Variable: Final score

Regression output (sequential): Model Summary, Change Statistics
Model   R Square Change   F Change   df1   df2   Sig. F Change
1       0.32              24.355     1     52    .000
2       0.26              31.141     1     51    .000
3       0.06              7.933      1     50    .007
4       0.01              .707       1     49    .404
e. Dependent Variable: Final score

Regression output (standard): Coefficients
                                                Unstandardized Coefficients   Standardized Coefficients
Model 1                                         B        Std. Error           Beta
(Constant)                                      8.620    7.960
Student English proficiency                     4.137    1.271                .280
LangAnxiety                                     1.214    1.020                .090
Midterm score                                   .464     .116                 .382
results of the course evaluation by teachers    2.500    .806                 .238
results of the motivation scale                 3.631    .926                 .339
a. Dependent Variable: Final score

Y = 8.62 + 4.14*EngProf + 1.21*Anx + .46*Mid + 2.50*EvaTch + 3.63*Motiv

Regression output (standard): Coefficients, t-tests and confidence intervals
                                                                  95.0% Confidence Interval for B
Model 1                                         t        Sig.     Lower Bound   Upper Bound
(Constant)                                      1.083    .284     -7.386        24.625
Student English proficiency                     3.255    .002     1.581         6.692
LangAnxiety                                     1.191    .239     -.835         3.264
Midterm score                                   3.984    .000     .230          .698
results of the course evaluation by teachers    3.101    .003     .879          4.121
results of the motivation scale                 3.921    .000     1.769         5.493
a. Dependent Variable: Final score
Slide callout: T-test (the t and Sig. columns)

Regression output (standard): Coefficients, correlations and collinearity statistics
                                                Correlations                     Collinearity Statistics
Model 1                                         Zero-order   Partial   Part      Tolerance   VIF
Student English proficiency                     .565         .425      .244      .763        1.311
LangAnxiety                                     .032         .169      .089      .982        1.019
Midterm score                                   .710         .498      .299      .613        1.632
results of the course evaluation by teachers    .374         .409      .233      .958        1.043
Slide callout: Under 5 (apparently the guideline for the VIF values)

Regression output (sequential): Coefficients
                                                          Unstandardized Coefficients   Standardized Coefficients
Model   Variable                                          B         Std. Error          Beta
1       (Constant)                                        56.214    3.881
        Student English proficiency                       8.351     1.692               .565
2       (Constant)                                        42.866    3.906
        Student English proficiency                       6.728     1.377               .455
        results of the motivation scale                   5.563     .997                .520
3       (Constant)                                        36.815    4.248
        Student English proficiency                       6.175     1.307               .418
        results of the motivation scale                   5.346     .939                .500
        results of the course evaluation by teachers      2.577     .915                .245
4       (Constant)                                        33.913    5.482
        Student English proficiency                       6.269     1.316               .424
        results of the motivation scale                   5.301     .943                .495
        results of the course evaluation by teachers      2.629     .920                .250
        LangAnxiety                                       .977      1.162               .073
a. Dependent Variable: Final score

Model 2: Y = 42.87 + (6.73)*EngProf + (5.56)*Motiv

Regression output (sequential): Coefficients, t-tests and confidence intervals (rows shown on the slide)
                                                                      95.0% Confidence Interval for B
Model   Variable                             t         Sig.     Lower Bound   Upper Bound
1       (Constant)                           14.485    .000     48.426        64.001
        Student English proficiency          4.935     .000     4.956         11.747
2       (Constant)                           10.975    .000     35.024        50.707
        Student English proficiency          4.884     .000     3.963         9.493
        results of the motivation scale      5.580     .000     3.562         7.564
3       (Constant)                           8.666     .000     28.282        45.347

Regression output: Residuals Statistics
                                      Minimum    Maximum   Mean     Std. Deviation   N
Predicted Value                       58.93      94.72     74.46    8.311            54
Std. Predicted Value                  -1.869     2.437     .000     1.000            54
Standard Error of Predicted Value     1.000      2.685     1.942    .344             54
Adjusted Predicted Value              59.85      94.85     74.49    8.305            54
Residual                              -9.760     21.291    .000     6.230            54
Std. Residual                         -1.506     3.286     .000     .962             54
Stud. Residual                        -1.600     3.417     -.002    1.006            54
Deleted Residual                      -11.029    23.017    -.025    6.816            54
Stud. Deleted Residual                -1.627     3.875     .010     1.050            54
Mahal. Distance                       .282       8.120     3.926    1.624            54
Cook's Distance                       .000       .189      .019     .034             54
Centered Leverage Value               .005       .153      .074     .031             54
a. Dependent Variable: Final score
Slide callout: Check outliers

Regression output: P-P plot for diagnosing normal distribution of data
Check the normality assumption.
Look at the distribution of the residuals, not of the individual variables.

Regression output: Plot of studentized residuals crossed with fitted values
Check homogeneity of variances.
The shape should show a cloud of data scattered randomly.

6.6 Reporting the results of regression analysis
Correlations between the explanatory variables and the response variable
Correlations among the explanatory variables
Correlation matrix with r-value, p-value, and N
Standard or sequential regression?
R square, or R square change for each step of the model
Regression coefficients for all regression models (especially the unstandardized coefficients, labeled B, and the coefficient for the intercept, labeled "Constant" in SPSS output)
For standard regression, report the t-tests for the contribution of each variable to the model.
The squared multiple correlation coefficient, R², expresses how much of the variance in scores on the response variable can be explained by the variance in the explanatory variables.
The squared semipartial correlations (sr²) provide a way of assessing the unique contribution of each variable to the overall R². These numbers are already a percentage variance effect size (of the r family).
Example reporting on Lafrance & Gottardo (2005): p. 198
Application activities 7.4.5 (Q1-Q6): pp. 199-200
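As a rough check on two of the quantities listed for reporting, the sketch below recomputes the F Change values of the sequential model from its R square values, and squares the "Part" correlations of the standard model to get sr². The inputs are the rounded figures from the SPSS output in section 6.5, so the results agree with the F Change column (24.355, 31.141, 7.933, .707) only approximately. The function names are illustrative, not SPSS terms. The F change formula is the usual one for comparing nested regression models, and taking sr² as the square of SPSS's Part correlation is the standard reading of that column.

```python
def f_change(r2_full, r2_reduced, df1, df2):
    """F statistic for the increase in R square between two nested models.

    df1: number of predictors added at this step
    df2: N - (number of predictors in the fuller model) - 1
    """
    return ((r2_full - r2_reduced) / df1) / ((1.0 - r2_full) / df2)


def squared_semipartial(part_correlation):
    """sr^2: share of the RV's variance uniquely explained by one EV."""
    return part_correlation ** 2


# R square at each step of the sequential model (rounded, from Model Summary).
r2_by_step = [0.319, 0.577, 0.635, 0.640]
df2_by_step = [52, 51, 50, 49]  # N = 54; one predictor is added per step

f_changes = []
previous_r2 = 0.0  # before step 1 no variance is explained
for r2, df2 in zip(r2_by_step, df2_by_step):
    f_changes.append(f_change(r2, previous_r2, df1=1, df2=df2))
    previous_r2 = r2

# "Part" correlations from the standard-regression Coefficients output.
part = {
    "Student English proficiency": 0.244,
    "LangAnxiety": 0.089,
    "Midterm score": 0.299,
    "results of the course evaluation by teachers": 0.233,
}
sr2 = {name: squared_semipartial(r) for name, r in part.items()}

for step, f in enumerate(f_changes, start=1):
    print(f"Step {step}: F change = {f:.2f}")
for name, value in sr2.items():
    print(f"{name}: sr^2 = {value:.3f}")
```

Read this way, Midterm score uniquely accounts for roughly 9% of the variance in Final score and Student English proficiency for roughly 6%, while LangAnxiety adds under 1%.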