Fundamental Statistics in Applied Linguistics Research Spring 2010

Weekend MA Program on Applied English
Dr. Da-Fu Huang
6. Looking for groups of explanatory
variables through multiple regression
Explanatory variables vs. response variables
MR examines whether the explanatory variables
(EV) we’ve posited explain very much of what is
going on in response variables (RV)
Y_i = α + β_1 x_{i1} + … + β_k x_{ik} + error_i
TOEFL score = some constant number (the
intercept, α) + β_1 × time spent on English per week
+ β_2 × aptitude score + a number which
fluctuates for each individual (the error)
MR can also predict how people in the future will
score on the response variable
[Figure: Venn diagram of regression variables. Circles: Hours of study per week, MLAT score, TOEFL score, Personality]
The mathematical formula of a line
Line equation: y = 2 + 0.5 x
[Figure: scatterplot with the regression line; intercept = 2, slope = 0.5. The error or residual is the distance between an actual value Y1 and its predicted value Y’1 on the line]
The regression line
The least squares regression line is the line that
minimizes the sum of the squared errors about the line, i.e. Σ(Y − Y’)²
is minimized
The best fitting line
(closest to the data points)
[Figure: data points scattered about the regression line, with the error or residual marked for individual points]
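As a minimal sketch of what "least squares" means, the Python snippet below fits a line to a few made-up data points using the closed-form formulas for slope and intercept, then compares its sum of squared errors with that of the line y = 2 + 0.5x. The data values and function names are hypothetical and are not taken from the course data.

```python
# Hypothetical data points, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.4, 3.1, 3.4, 4.2, 4.4, 5.1]


def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_errors(intercept, slope, xs, ys):
    """Sum of (Y - Y')^2, where Y' = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


intercept, slope = least_squares_line(xs, ys)
sse_best = sum_squared_errors(intercept, slope, xs, ys)
# The line y = 2 + 0.5x, used here only as a comparison line.
sse_other = sum_squared_errors(2.0, 0.5, xs, ys)

print(f"fitted line: y = {intercept:.3f} + {slope:.3f}x, SSE = {sse_best:.4f}")
print(f"y = 2 + 0.5x: SSE = {sse_other:.4f}")
```

Any line other than the fitted one, including small shifts of its intercept or slope, should give a larger sum of squared errors on the same data.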
6.1 Standard multiple regression (SMR)
In SMR, the importance of an EV depends
on how much it uniquely overlaps with the RV.
SMR answers two questions:
• What are the nature and size of the relationship between
the RV and the set of EVs?
• How much of the relationship is contributed uniquely by
each EV?
[Figure: Venn diagram of standard regression design. Circles: Hours of study per week, MLAT score, TOEFL score, Personality; regions labeled a-e]
6.2 Sequential (Hierarchical) multiple regression (HMR)
• In HMR, all of the areas of the EVs that overlap with the
RV will be counted, but the way that they will be included
depends on the order in which the researcher enters the
variables into the equation
• The importance of any variable can be emphasized in HMR,
depending on the order in which it is entered. If two
variables overlap to a large degree, then entering one of
them first will leave little room for explanation for the
second variable
• HMR answers the question:
  • Do the subsequent variables entered in each step add to the
  prediction of the RV after differences in the variables from the
  previous step have been eliminated?
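The effect of entry order can be illustrated with a short Python sketch. It uses the three Pearson correlations reported in the SPSS Correlations output of section 6.5 (.565, .616, .211) and the standard formula for R² with two explanatory variables; the variable and function names are illustrative assumptions, not SPSS terms.

```python
# Pearson correlations taken from the SPSS Correlations output in section 6.5.
r_final_engprof = 0.565   # Final score with Student English proficiency
r_final_motiv = 0.616     # Final score with results of the motivation scale
r_engprof_motiv = 0.211   # the two explanatory variables with each other


def r_squared_two_predictors(r_y1, r_y2, r_12):
    """R^2 for a regression of Y on two EVs, computed from correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


r2_both = r_squared_two_predictors(r_final_engprof, r_final_motiv,
                                   r_engprof_motiv)

# Order A: proficiency entered first, motivation second.
step1_a = r_final_engprof ** 2
step2_a = r2_both - step1_a
# Order B: motivation entered first, proficiency second.
step1_b = r_final_motiv ** 2
step2_b = r2_both - step1_b

print(f"R^2 with both EVs: {r2_both:.3f}")
print(f"Order A: proficiency {step1_a:.3f}, then motivation adds {step2_a:.3f}")
print(f"Order B: motivation {step1_b:.3f}, then proficiency adds {step2_b:.3f}")
```

Whichever variable is entered first is credited with the overlapping area, so its share is larger than when it is entered second. The second-step increments are the unique contributions that a standard regression would credit to each variable.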
[Figure: Venn diagram of sequential regression design. Circles: Hours of study per week, MLAT score, TOEFL score, Personality; regions labeled a-e]
Assumptions for MR
Table 7.1 (p. 184)
Normal distribution
Homogeneity of variances
Linearity
No multicollinearity (the EVs involved in the
regression should not be highly intercorrelated)
6.4 Starting the MR (pp. 187-188)
Analyze > Regression > Linear
Put the RV in the box “Dependent”
For standard regression: put all EVs into the
“Independent” box with the Method set at “Enter”
For sequential regression: put the EVs into the
“Independent” box one at a time with the Method set at
“Enter”, pushing the Next button after entering each
one. Enter the EVs in the order you want them
to go into the regression equation.
Open the buttons: Statistics, Plots, and Options
6.5 Regression output in SPSS
Analyze > Regression > Linear
Regression Output
Descriptive Statistics
                                               Mean     Std. Deviation   N
Final score                                    74.46    10.386           54
Student English proficiency                    2.185    .7024            54
results of the motivation scale                3.0370   .97057           54
results of the course evaluation by teachers   3.0741   .98770           54
LangAnxiety                                    2.7315   .77163           54
Regression Output
Correlations
Pearson Correlation                                (1)     (2)     (3)     (4)     (5)
(1) Final score                                    1.000   .565    .616    .374    .032
(2) Student English proficiency                    .565    1.000   .211    .170    -.088
(3) results of the motivation scale                .616    .211    1.000   .115    .031
(4) results of the course evaluation by teachers   .374    .170    .115    1.000   -.077
(5) LangAnxiety                                    .032    -.088   .031    -.077   1.000

Sig. (1-tailed)                                    (1)     (2)     (3)
(1) Final score                                    .       .000    .000
(2) Student English proficiency                    .000    .       .063
(3) results of the motivation scale                .000    .063    .
(4) results of the course evaluation by teachers   .003    .109    .203
(5) LangAnxiety                                    .410    .265    .411

N = 54 for every pair of variables
Regression Output (Standard)
Variables Entered/Removed
Model   Variables Entered                                   Variables Removed   Method
1       results of the motivation scale, LangAnxiety,       .                   Enter
        results of the course evaluation by teachers,
        Student English proficiency, Midterm score (a)
a. All requested variables entered.
Regression Output (Sequential)
Variables Entered/Removed (b)
Model   Variables Entered                                  Variables Removed   Method
1       Student English proficiency (a)                    .                   Enter
2       results of the motivation scale (a)                .                   Enter
3       results of the course evaluation by teachers (a)   .                   Enter
4       LangAnxiety (a)                                    .                   Enter
a. All requested variables entered.
b. Dependent Variable: Final score
Regression output (Standard)
Model Summary (b)
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .854 (a)   .730       .701                5.675
a. Predictors: (Constant), results of the motivation scale, LangAnxiety,
results of the course evaluation by teachers, Student English
proficiency, Midterm score
b. Dependent Variable: Final score
Regression Output (Sequential)
Model Summary (e)
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .565 (a)   .319       .306                8.653
2       .760 (b)   .577       .561                6.885
3       .797 (c)   .635       .613                6.460
4       .800 (d)   .640       .611                6.479
a. Predictors: (Constant), Student English proficiency
b. Predictors: (Constant), Student English proficiency, results of the
motivation scale
c. Predictors: (Constant), Student English proficiency, results of the
motivation scale, results of the course evaluation by teachers
d. Predictors: (Constant), Student English proficiency, results of the
motivation scale, results of the course evaluation by teachers,
LangAnxiety
e. Dependent Variable: Final score
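The Adjusted R Square values in the Model Summary output can be approximately reproduced from R Square, the sample size (N = 54), and the number of predictors, using the usual adjustment formula. The sketch below is a minimal illustration; the function name is an assumption, and small differences from the SPSS figures come from working with rounded R Square values.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for n cases and k predictors (intercept not counted)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


N = 54  # sample size reported in the Descriptive Statistics output

# (number of predictors, R Square) for the four sequential models.
sequential_models = [(1, 0.319), (2, 0.577), (3, 0.635), (4, 0.640)]
sequential_adjusted = [adjusted_r_squared(r2, N, k)
                       for k, r2 in sequential_models]

# Standard regression model: five predictors, R Square = .730.
standard_adjusted = adjusted_r_squared(0.730, N, 5)

for (k, r2), adj in zip(sequential_models, sequential_adjusted):
    print(f"{k} predictor(s): R^2 = {r2:.3f}, adjusted R^2 = {adj:.3f}")
print(f"standard model: adjusted R^2 = {standard_adjusted:.3f}")
```

The adjustment grows with the number of predictors, which is why Model 4 has a slightly lower Adjusted R Square than Model 3 even though its R Square is higher.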
Model Summary (e): Change Statistics
Model   R Square Change   F Change   df1   df2   Sig. F Change
1       0.32              24.355     1     52    .000
2       0.26              31.141     1     51    .000
3       0.06              7.933      1     50    .007
4       0.01              .707       1     49    .404
e. Dependent Variable: Final score
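The F Change column can be approximately recovered from the R Square values of successive models. The sketch below applies the standard F-change formula to the rounded R Square values from the sequential Model Summary; the function name is an assumption, and the results differ slightly from SPSS because SPSS works with unrounded values.

```python
def f_change(r2_previous, r2_current, df1, df2):
    """F statistic for the increase in R^2 when df1 predictors are added."""
    return ((r2_current - r2_previous) / df1) / ((1 - r2_current) / df2)


# (previous R^2, current R^2, df1, df2) for sequential Models 1 to 4.
steps = [
    (0.000, 0.319, 1, 52),
    (0.319, 0.577, 1, 51),
    (0.577, 0.635, 1, 50),
    (0.635, 0.640, 1, 49),
]
f_values = [f_change(*step) for step in steps]

for model, f in enumerate(f_values, start=1):
    print(f"Model {model}: F change is roughly {f:.2f}")
```

Only the last step (adding LangAnxiety) yields a small F, in line with its non-significant Sig. F Change of .404.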
Regression output (Standard)
Coefficients (a)
                                               Unstandardized Coefficients   Standardized Coefficients
Model 1                                        B        Std. Error           Beta
(Constant)                                     8.620    7.960
Student English proficiency                    4.137    1.271                .280
LangAnxiety                                    1.214    1.020                .090
Midterm score                                  .464     .116                 .382
results of the course evaluation by teachers   2.500    .806                 .238
results of the motivation scale                3.631    .926                 .339
a. Dependent Variable: Final score
Y=8.62 + 4.14*EngProf + 1.21*Anx +.46*Mid + 2.50*EvaTch + 3.63*Motiv
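To show how the fitted equation is used for prediction, the sketch below plugs one student's scores into it. The coefficients are the rounded values from the equation above; the student's scores, the actual final score, and the function name are hypothetical.

```python
def predict_final_score(eng_prof, anx, mid, eva_tch, motiv):
    """Predicted Final score from the standard regression equation."""
    return (8.62 + 4.14 * eng_prof + 1.21 * anx + 0.46 * mid
            + 2.50 * eva_tch + 3.63 * motiv)


# Hypothetical student (values invented for illustration).
predicted = predict_final_score(eng_prof=3, anx=2, mid=70, eva_tch=3, motiv=3)

# If this student actually scored 80, the residual (error) is
# actual value minus predicted value.
actual = 80
residual = actual - predicted

print(f"predicted = {predicted:.2f}, residual = {residual:.2f}")
```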
Coefficients (a) (continued)
                                                                95.0% Confidence Interval for B
Model 1                                        t       Sig.    Lower Bound   Upper Bound
(Constant)                                     1.083   .284    -7.386        24.625
Student English proficiency                    3.255   .002    1.581         6.692
LangAnxiety                                    1.191   .239    -.835         3.264
Midterm score                                  3.984   .000    .230          .698
results of the course evaluation by teachers   3.101   .003    .879          4.121
results of the motivation scale                3.921   .000    1.769         5.493
Under 5
a. Dependent Variable: Final score
T-test
Coefficients (a) (continued)
                                               Correlations                  Collinearity Statistics
Model 1                                        Zero-order   Partial   Part   Tolerance   VIF
Student English proficiency                    .565         .425      .244   .763        1.311
LangAnxiety                                    .032         .169      .089   .982        1.019
Midterm score                                  .710         .498      .299   .613        1.632
results of the course evaluation by teachers   .374         .409      .233   .958        1.043
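The two collinearity statistics carry the same information: VIF is the reciprocal of Tolerance. The short sketch below checks this against the Tolerance values reported above; the dictionary and its short labels are illustrative, and tiny discrepancies from the SPSS VIF column are due to rounding.

```python
# Tolerance values from the Collinearity Statistics columns above.
tolerance = {
    "Student English proficiency": 0.763,
    "LangAnxiety": 0.982,
    "Midterm score": 0.613,
    "Course evaluation by teachers": 0.958,
}

# VIF (variance inflation factor) is 1 / Tolerance.
vif = {name: 1 / tol for name, tol in tolerance.items()}

for name, value in vif.items():
    print(f"{name}: Tolerance = {tolerance[name]:.3f}, VIF = {value:.3f}")
```

Tolerance close to 1 (VIF close to 1) means that an explanatory variable shares little variance with the other explanatory variables.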
Regression Output (Sequential)
Coefficients (a)
                                                       Unstandardized Coefficients   Standardized Coefficients
Model                                                  B        Std. Error           Beta
1   (Constant)                                         56.214   3.881
    Student English proficiency                        8.351    1.692                .565
2   (Constant)                                         42.866   3.906
    Student English proficiency                        6.728    1.377                .455
    results of the motivation scale                    5.563    .997                 .520
3   (Constant)                                         36.815   4.248
    Student English proficiency                        6.175    1.307                .418
    results of the motivation scale                    5.346    .939                 .500
    results of the course evaluation by teachers       2.577    .915                 .245
4   (Constant)                                         33.913   5.482
    Student English proficiency                        6.269    1.316                .424
    results of the motivation scale                    5.301    .943                 .495
    results of the course evaluation by teachers       2.629    .920                 .250
    LangAnxiety                                        .977     1.162                .073
Y= 42.87 + (6.73)*EngProf + (5.56)*Motiv
a. Dependent Variable: Final score
Coefficients (a) (continued)
                                                            95.0% Confidence Interval for B
Model                                     t        Sig.    Lower Bound   Upper Bound
1   (Constant)                            14.485   .000    48.426        64.001
    Student English proficiency           4.935    .000    4.956         11.747
2   (Constant)                            10.975   .000    35.024        50.707
    Student English proficiency           4.884    .000    3.963         9.493
    results of the motivation scale       5.580    .000    3.562         7.564
3   (Constant)                            8.666    .000    28.282        45.347
Regression Output
Residuals Statistics (a)
                                    Minimum   Maximum   Mean    Std. Deviation   N
Predicted Value                     58.93     94.72     74.46   8.311            54
Std. Predicted Value                -1.869    2.437     .000    1.000            54
Standard Error of Predicted Value   1.000     2.685     1.942   .344             54
Adjusted Predicted Value            59.85     94.85     74.49   8.305            54
Residual                            -9.760    21.291    .000    6.230            54
Std. Residual                       -1.506    3.286     .000    .962             54
Stud. Residual                      -1.600    3.417     -.002   1.006            54
Deleted Residual                    -11.029   23.017    -.025   6.816            54
Stud. Deleted Residual              -1.627    3.875     .010    1.050            54
Mahal. Distance                     .282      8.120     3.926   1.624            54
Cook's Distance                     .000      .189      .019    .034             54
Centered Leverage Value             .005      .153      .074    .031             54
a. Dependent Variable: Final score
Check outliers
Regression Output: P-P plot for diagnosing normal
distribution of data
Check normality assumption
Look at distribution of residuals, not individual variables
Regression Output: Plot of studentized residuals
crossed with fitted values
The shape should show a cloud of data scattered randomly
Check homogeneity of variances
6.6 Reporting the results of regression analysis
• Correlations between the explanatory variables and the
response variable
• Correlations among the explanatory variables
• Correlation matrix with r-value, p-value, and N
• Standard or sequential regression?
• R square or R square change for each step of the model
• Regression coefficients for all regression models (esp.
unstandardized coefficients, labeled B, and the coefficient
for the intercept, labeled “constant” in SPSS output)
• For standard regression, report the t-tests for the
contribution of each variable to the model
• The squared multiple correlation coefficient, R², expresses how
much of the variance in scores of the response variable
can be explained by the variance in the
explanatory variables
• The squared semipartial correlations (sr²) provide
a way of assessing the unique contribution of each
variable to the overall R².
• These numbers are already a percentage variance effect
size (of the r family)
• Example reporting on Lafrance & Gottardo (2005):
p. 198
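As a brief illustration of sr², the sketch below squares the Part (semipartial) correlations reported in the Coefficients output of the standard regression in section 6.5. The dictionary and its short labels are illustrative assumptions; only the four Part values that appear in that output are used.

```python
# "Part" correlations from the standard regression Coefficients output.
part = {
    "Student English proficiency": 0.244,
    "LangAnxiety": 0.089,
    "Midterm score": 0.299,
    "Course evaluation by teachers": 0.233,
}

# Squared semipartial correlation: the share of variance in the RV
# that is uniquely explained by each EV.
sr2 = {name: r ** 2 for name, r in part.items()}

for name, value in sr2.items():
    print(f"{name}: sr^2 = {value:.3f} ({value * 100:.1f}% of variance)")
```

The sr² values sum to less than the overall R² of .730 because variance that the explanatory variables explain jointly is not credited to any single one of them.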
Application activities 7.4.5 (Q1-Q6):
pp. 199-200