Effect Size Statistics for Bivariate Linear Regression

SIMPLE LINEAR REGRESSION: PREDICTION
For the bivariate linear regression problem, data are collected on an independent or predictor
variable (X) and a dependent or criterion variable (Y) for each individual. Bivariate linear
regression computes an equation that relates predicted Y scores (Ŷ) to X scores. The regression
equation includes a slope weight for the independent variable, B_slope (b), and an additive
constant, B_constant (a):
Ŷ = B_slope·X + B_constant
(or)
Ŷ = bX + a
Indices are computed to assess how accurately the Y scores are predicted by the linear equation.
We will focus on applications in which both the predictor and the criterion are quantitative
(continuous – interval/ratio data) variables. However, bivariate regression analysis may be used
in other applications. For example, a predictor could have two levels like gender and be scored 0
for females and 1 for males. A criterion may also have two levels like pass-fail performance,
scored 0 for fail and 1 for pass.
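As a sketch of the fitting itself (the small data set below is invented purely for illustration; SPSS carries out this computation for you), the least-squares slope weight b and additive constant a can be computed as:

```python
# Minimal sketch of least-squares fitting for the equation Y-hat = bX + a.
# The data are made up for demonstration only.

def fit_bivariate(x, y):
    """Return (b, a), the slope weight and additive constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope = sum of cross-products of deviations / sum of squared X deviations
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x   # the fitted line passes through the point of means
    return b, a

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
b, a = fit_bivariate(x, y)             # b = 0.8, a = 1.8 for these data
predicted = [b * xi + a for xi in x]   # Y-hat for each observed X
```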
Linear regression can be used to analyze data from experimental or non-experimental designs. If
the data are collected using experimental methods (e.g., a tightly controlled study in which
participants have been randomly assigned to different treatment groups), the X and Y variables
may be referred to appropriately as the independent and the dependent variables, respectively.
SPSS uses these terms. However, if the data are collected using non-experimental methods (e.g.,
a study in which subjects are measured on a variety of variables), the X and Y variables are more
appropriately referred to as the predictor and the criterion, respectively.
UNDERSTANDING BIVARIATE LINEAR REGRESSION
A significance test can be conducted to evaluate whether X is useful in predicting Y. This test can
be conceptualized as evaluating either of the following null hypotheses: the population slope
weight is equal to zero or the population correlation coefficient is equal to zero.
The significance test can be derived under two alternative sets of assumptions, assumptions for a
fixed-effects model and those for a random-effects model. The fixed-effects model is probably
more appropriate for experimental studies, while the random-effects model seems more
appropriate for non-experimental studies. If the fixed-effects assumptions hold, linear or nonlinear relationships can exist between the predictor and criterion. On the other hand, if the
random-effects assumptions hold, the only type of statistical relationship that can exist between
two variables is a linear one.
Regardless of the choice of assumptions, it is important to examine a bivariate scatterplot of the
predictor and the criterion variables prior to conducting a regression analysis to assess if a nonlinear relationship exists between X and Y and to detect outliers. If the relationship appears to be
non-linear based on the scatterplot, you should not conduct a simple bivariate regression analysis
but should evaluate the inclusion of higher-order terms (variables that are squared, cubed, and so
on) in your regression equation. Outliers should be checked to ensure that they were not
incorrectly entered in the data set and, if correctly entered, to determine their effect on the results
of the regression analysis.
FIXED-EFFECTS MODEL ASSUMPTIONS FOR BIVARIATE LINEAR REGRESSION
Assumption 1: The Dependent Variable is Normally Distributed in the Population for Each
Level of the Independent Variable
In many applications with a moderate or larger sample size, the test of the slope may
yield reasonably accurate p values even when the normality assumption is violated. To
the extent that population distributions are not normal and sample sizes are small, the p
values may be invalid. In addition, the power of this test may be reduced if the population
distributions are non-normal.
Assumption 2: The Population Variances of the Dependent Variable are the same for All
Levels of the Independent Variable
To the extent that this assumption is violated and the sample sizes differ among the levels
of the independent variable, the resulting p value for the overall F test is not trustworthy.
Assumption 3: The Cases Represent a Random Sample from the Population, and the Scores
are Independent of Each Other from One Individual to the Next
The significance test for regression analysis will yield inaccurate p values if the
independence assumption is violated.
RANDOM-EFFECTS MODEL ASSUMPTIONS FOR BIVARIATE LINEAR REGRESSION
Assumption 1: The X and Y Variables are Bivariately Normally Distributed in the Population
If the variables are bivariately normally distributed, each variable is normally distributed
ignoring the other variable and each variable is normally distributed at every level of the
other variable. The significance test for bivariate regression yields, in most cases,
relatively valid results in terms of Type I errors when the sample is moderate to large in
size. If X and Y are bivariately normally distributed, the only type of relationship that
exists between these variables is linear.
Assumption 2: The Cases Represent a Random Sample from the Population, and the Scores
on Each Variable are Independent of Other Scores on the Same Variable
The significance test for regression analysis will yield inaccurate p values if the
independence assumption is violated.
REGRESSION
PAGE - 2
EFFECT SIZE STATISTICS FOR BIVARIATE LINEAR REGRESSION
Linear regression is a more general procedure that assesses how well one or more independent
variables predict a dependent variable. Consequently, SPSS reports strength-of-relationship
statistics that are useful for regression analyses with multiple predictors. Four correlational
indices are presented in the output for the Linear Regression procedure: the Pearson
product-moment correlation coefficient (r), the multiple correlation coefficient (R), its squared
value (R²), and the adjusted R². However, there is considerable redundancy among these statistics
for the single-predictor case: R = |r|, R² = r², and the adjusted R² is approximately equal to R².
Accordingly, the only correlational indices we need to report in our manuscript for a bivariate
regression are r and r².
The Pearson product-moment correlation coefficient ranges in values from -1.00 to +1.00. A
positive value suggests that as the independent variable X increases, the dependent variable Y
increases. A zero value indicates that as X increases, Y neither increases nor decreases. A
negative value indicates that as X increases, Y decreases. Values close to -1.00 or +1.00 indicate
stronger linear relationships. The interpretation of strength of relationship should depend on the
research context.
By squaring r, we obtain an index that directly tells us how well we can predict Y from X: r²
indicates the proportion of Y variance that is accounted for by its linear relationship with X.
Alternatively, r² (the coefficient of determination) can be conceptualized as the proportional
reduction in error achieved by including X in the regression equation, in comparison with not
including X in the regression equation.
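Both readings of r² can be sketched with made-up data (for illustration only; SPSS reports these values for you): r² computed from the correlation equals the proportion of squared prediction error eliminated by using the regression line instead of the mean of Y.

```python
# Sketch of r, r-squared, and the proportional-reduction-in-error reading
# of r-squared. Data are invented for illustration.
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)

r = sxy / math.sqrt(sxx * syy)   # Pearson product-moment correlation

# Prediction error without X: squared deviations of Y from its own mean.
ss_total = syy
# Prediction error with X: squared residuals around the regression line.
b = sxy / sxx                    # least-squares slope
a = my - b * mx                  # additive constant
ss_resid = sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))

# The proportion of error eliminated by using X equals r squared:
reduction = (ss_total - ss_resid) / ss_total
```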
Other strength-of-relationship indices may be reported for bivariate regression problems. For
example, SPSS gives the Standard Error of the Estimate on the output. The standard error of
estimate indicates how large the typical error is in predicting Y from X. It is useful over and
above correlational indices because it expresses the size of the prediction errors in the metric of
the dependent variable scores. In comparison, correlational statistics are unit-free indices and,
therefore, are more abstract and difficult to interpret.
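The standard error of estimate can be checked by hand as the square root of the residual sum of squares divided by its degrees of freedom, SEE = sqrt(SS_residual / (n - 2)). A sketch using the sums of squares from the example output in this handout:

```python
# Verifying the Std. Error of the Estimate from the ANOVA sums of squares.
import math

ss_residual = 17582.267   # Residual Sum of Squares from the ANOVA table
n = 500                   # sample size in the example

see = math.sqrt(ss_residual / (n - 2))
# see is approximately 5.9419, matching the Model Summary value of 5.941865
```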
CONDUCTING A BIVARIATE LINEAR REGRESSION ANALYSIS
1. Open the data file
2. Click Analyze, then Regression, then click Linear
You will see the Linear Regression dialog box.
3. Select your dependent variable
then click ► to move it to the Dependent box.
(for this example – MATHACH was chosen)
4. Select your independent variable
then click ► to move it to the Independent box.
(for this example – VISUAL was chosen)
5. Click Statistics
You will see the Linear Regression: Statistics dialog box
6. Click Confidence intervals and Descriptives
Make sure that Estimates and Model fit are also selected.
7. Click Continue
8a. (For total sample information) Click OK
For this example, we will look at the total sample information.
8b. (For group information) Click Paste
Make the necessary adjustments to your syntax (i.e., temporary/select if command), then
run the analysis.
SELECTED SPSS OUTPUT FOR BIVARIATE LINEAR REGRESSION
The results of the bivariate linear regression analysis example are shown below. The B’s, as
labeled on the output in the Unstandardized Coefficients box, are the additive constant, a
(8.853) and the slope weight, b (.745) of the regression equation used to predict the
dependent variable from the independent variable.
The regression or prediction equation is as follows:
Ŷ = B_slope·X + B_constant
(or)
Ŷ = bX + a
Predicted Mathematics Test Score = .745 Visualization Test Score + 8.853
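Applying the equation is then straightforward. A small sketch using the slope and constant from the Coefficients output (the function name is our own):

```python
# Applying the reported prediction equation: Y-hat = .745 X + 8.853
b, a = 0.745, 8.853   # slope weight and additive constant from the output

def predict_math(visual_score):
    """Predicted Mathematics Test Score from a Visualization Test score."""
    return b * visual_score + a

# Because the regression line passes through the point of means, a
# visualization score at the sample mean (5.697) predicts approximately
# the sample mean mathematics score (13.098):
predict_math(5.697)   # approximately 13.097
```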
Syntax:
REGRESSION
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/MISSING LISTWISE
/STATISTICS COEFF OUTS CI R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT mathach
/METHOD=ENTER visual .
Descriptive Statistics

                                    Mean      Std. Deviation     N
MATHACH Mathematics Test Score   13.09803        6.604590       500
VISUAL Visualization Test         5.69700        3.886535       500
Correlations

                                   MATHACH      VISUAL
Pearson Correlation   MATHACH       1.000        .438
                      VISUAL         .438       1.000
Sig. (1-tailed)       MATHACH         .          .000
                      VISUAL         .000         .
N                     MATHACH        500         500
                      VISUAL         500         500

(MATHACH = Mathematics Test Score; VISUAL = Visualization Test)
Variables Entered/Removed(b)

Model   Variables Entered               Variables Removed   Method
1       VISUAL Visualization Test(a)           .            Enter

a. All requested variables entered.
b. Dependent Variable: MATHACH Mathematics Test Score
Model Summary

Model      R       R Square   Adjusted R Square   Std. Error of the Estimate
1        .438(a)     .192           .191                  5.941865

a. Predictors: (Constant), VISUAL Visualization Test
ANOVA(b)

Model            Sum of Squares    df    Mean Square      F        Sig.
1  Regression        4184.416       1      4184.416    118.519    .000(a)
   Residual         17582.267     498        35.306
   Total            21766.682     499

a. Predictors: (Constant), VISUAL Visualization Test
b. Dependent Variable: MATHACH Mathematics Test Score
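The entries in the ANOVA table are internally consistent, and checking them by hand is a useful sanity check. A sketch, using the values from the table:

```python
# Internal consistency of the ANOVA table:
#   SS_total = SS_regression + SS_residual
#   R-squared = SS_regression / SS_total
#   Mean Square = SS / df, and F = MS_regression / MS_residual
ss_reg, df_reg = 4184.416, 1
ss_res, df_res = 17582.267, 498

ss_total = ss_reg + ss_res        # 21766.682... (the "Total" row)
r_squared = ss_reg / ss_total     # approximately .192, matching Model Summary
ms_reg = ss_reg / df_reg
ms_res = ss_res / df_res          # approximately 35.306 (Residual Mean Square)
f = ms_reg / ms_res               # approximately 118.519, the reported F
```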
Coefficients(a)

                                Unstandardized    Standardized
                                Coefficients      Coefficients
Model                           B     Std. Error     Beta         t      Sig.   95% CI for B
1  (Constant)                  8.853    .472                    18.763   .000   [7.926, 9.780]
   VISUAL Visualization Test    .745    .068          .438      10.887   .000   [.611, .880]

a. Dependent Variable: MATHACH Mathematics Test Score
Based on the magnitude of the correlation coefficient, we can conclude that the visualization
test is moderately related to the mathematics test (r = .438). Approximately 19% (r2 = .192)
of the variance of the mathematics test is associated with the visualization test.
The hypothesis test of interest evaluates whether the independent variable predicts the
dependent variable in the population. More specifically, it assesses whether the population
correlation coefficient is equal to zero or, alternatively, whether the population slope is equal
to zero. This significance test appears in two places for a bivariate regression analysis: the F
test reported as part of the ANOVA table and the t test associated with the independent
variable in the Coefficient table. They yield the same p value because they are identical tests:
F(1, 498) = 118.519, p < .001 and t(498) = 10.887, p < .001. In addition, the fact that the
95% confidence interval for the slope does not contain zero indicates that the null
hypothesis should be rejected at the .05 level.
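The identity of the two tests can be sketched numerically: for a single predictor, F = t², and the t for the slope can be recovered from r alone via t = r·sqrt(n - 2) / sqrt(1 - r²). Using the rounded r = .438 gives a value slightly below the 10.887 that SPSS computes from the unrounded correlation:

```python
# For one predictor, the slope t test and the overall F test are identical:
# t = r * sqrt(n - 2) / sqrt(1 - r**2), and F = t**2.
import math

r, n = 0.438, 500
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
# t is approximately 10.873 (10.887 from the unrounded r)
f = t ** 2   # approximately the F reported in the ANOVA table
```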
USING SPSS GRAPHS TO DISPLAY THE RESULTS
A variety of graphs have been suggested for interpreting linear regression results. The results of
the bivariate regression analysis can be summarized using a bivariate scatterplot. Conduct the
following steps to create a simple bivariate scatterplot for our example:
1. Click Graphs (on the menu bar)
then click Scatter
2. Click Simple
then click Define
3. Click the Dependent (Criterion) Variable and click ► to move it to the Y axis box.
4. Click the Independent (Predictor) Variable and click ► to move it to the X axis box.
5. Click OK
Once you have created a scatterplot showing the relationship between the two variables, you can
add a regression line by following these steps:
1. Double-click on the chart to select it for editing, and maximize the chart editor.
2. Click Chart from the menu at the top of the Chart Editor window,
then click Options.
3. Click Total in the Fit Line box.
4. Click OK then close the Chart1 – SPSS Chart Editor
For example, your scatterplot would look like the one below:

[Scatterplot: Mathematics Test Score (Y axis, roughly -10 to 30) plotted against Visualization
Test score (X axis, roughly -10 to 20), with the fitted regression line overlaid.]
An examination of the plot allows us to assess how accurately the regression equation predicts
the dependent variable scores. In this case, the equation offers some predictability, but many
points fall far off the line, indicating poor prediction for those points.