Chapter 10 - Bakersfield College

Chapter 10
DESCRIBING RELATIONSHIPS USING
CORRELATION AND REGRESSION
Going Forward
Your goals in this chapter are to learn:
• How to create and interpret a scatterplot
• What a regression line is
• When and how to compute the Pearson r
• How to perform significance testing of the
Pearson r
• The logic of predicting scores using linear
regression and $r^2$
Understanding Correlations
Correlation Coefficient
• A correlation coefficient is a statistic that
describes the important characteristics of a
relationship
• It simplifies a complex relationship involving
many scores into one number that is easily
interpreted
Distinguishing Characteristics
• A scatterplot is a graph of the individual data
points from a set of X-Y pairs
• When a relationship exists, as the X scores
increase, the Y scores change such that
different values of Y tend to be paired with
different values of X
A Scatterplot Showing the Existence of a
Relationship Between the Two Variables
Linear Relationships
• A linear relationship forms a pattern following
one straight line
• The linear regression line is the straight line
that summarizes a relationship by passing
through the center of the scatterplot
Positive and Negative Relationships
• In a positive linear relationship, as the X
scores increase, the Y scores also tend to
increase
• In a negative linear relationship, as the scores
on the X variable increase, the Y scores tend to
decrease
Scatterplot of a Positive
Linear Relationship
Scatterplot of a Negative
Linear Relationship
Nonlinear Relationships
In a nonlinear relationship, as the X scores
increase, the Y scores do not only increase or
only decrease: at some point, the Y scores alter
their direction of change.
Scatterplot of a Nonlinear Relationship
Strength of a Relationship
The strength of a relationship is the extent to
which one value of Y is consistently paired with
one and only one value of X
• The larger the absolute value of the
correlation coefficient, the stronger the
relationship
• The sign of the correlation coefficient
indicates the direction of a linear relationship
Correlation Coefficients
• Correlation coefficients may range between
–1 and +1. The closer to ±1 the coefficient is,
the stronger the relationship; the closer to 0
the coefficient is, the weaker the relationship.
• As the variability in the Y scores at each X
becomes larger, the relationship becomes
weaker
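To see how the spread of Y at each X affects the coefficient, here is a minimal Python sketch. The two data sets are made up for illustration (they are not from the chapter), and `pearson_r` is a hypothetical helper that uses the deviation-score form of the Pearson r, which is algebraically equivalent to the computing formula given later in this chapter. Both data sets have the same average Y at each X; only the variability around that average differs.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviation scores (definitional form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations.
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

# Illustrative data (assumed, not from the chapter): two Y scores at each X.
xs = [1, 1, 2, 2, 3, 3]
ys_tight = [1.9, 2.1, 3.9, 4.1, 5.9, 6.1]  # little variability at each X
ys_loose = [1, 3, 3, 5, 5, 7]              # more variability at each X

r_tight = pearson_r(xs, ys_tight)
r_loose = pearson_r(xs, ys_loose)
print(f"tight: r = {r_tight:.2f}, loose: r = {r_loose:.2f}")
```

The data set with more variability in Y at each X yields the smaller coefficient, which is the pattern the bullet above describes.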
Correlation Coefficient
A correlation coefficient tells you
• The relative degree of consistency with which
Ys are paired with Xs
• The variability in the group of Y scores paired
with each X
• How closely the scatterplot fits the regression
line
• The relative accuracy of prediction
A Perfect Correlation (±1)
Intermediate Strength Correlation
No Relationship
The Pearson
Correlation Coefficient
Pearson Correlation Coefficient
Describes the linear relationship between two
interval variables, two ratio variables, or one
interval and one ratio variable.
The computing formula is
$$r = \frac{N(\Sigma XY) - (\Sigma X)(\Sigma Y)}{\sqrt{[N(\Sigma X^2) - (\Sigma X)^2][N(\Sigma Y^2) - (\Sigma Y)^2]}}$$
Step-by-Step
Step 1. Compute the necessary components:
• $\Sigma X$
• $\Sigma Y$
• $(\Sigma X)^2$
• $(\Sigma Y)^2$
• $\Sigma X^2$
• $\Sigma Y^2$
• $\Sigma XY$
• $N$
Step-by-Step
• Step 2. Use these values to compute the
numerator
$$N(\Sigma XY) - (\Sigma X)(\Sigma Y)$$
• Step 3. Use these values to compute the
denominator and then divide to find r
$$\sqrt{[N(\Sigma X^2) - (\Sigma X)^2][N(\Sigma Y^2) - (\Sigma Y)^2]}$$
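The three steps can be sketched in Python as follows. This is a minimal illustration, not a library routine: the function name `pearson_r_steps` and the toy data are assumptions made for the example.

```python
from math import sqrt

def pearson_r_steps(xs, ys):
    """Follow Steps 1-3 of the computing formula and return r."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must contain the same number of scores")

    # Step 1: compute the necessary components.
    n = len(xs)                                   # N, the number of X-Y pairs
    sum_x = sum(xs)                               # sum of X
    sum_y = sum(ys)                               # sum of Y
    sum_x_sq = sum(x ** 2 for x in xs)            # sum of squared X scores
    sum_y_sq = sum(y ** 2 for y in ys)            # sum of squared Y scores
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of cross-products

    # Step 2: the numerator.
    numerator = n * sum_xy - sum_x * sum_y

    # Step 3: the denominator, then divide to find r.
    denominator = sqrt((n * sum_x_sq - sum_x ** 2) * (n * sum_y_sq - sum_y ** 2))
    return numerator / denominator

# Toy data (assumed): a perfect positive linear relationship.
print(pearson_r_steps([1, 2, 3], [2, 4, 6]))  # prints 1.0
```

If every X score (or every Y score) is identical, the denominator is 0 and the division fails; r is undefined in that case.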
Significance Testing of the Pearson r
Two-Tailed Test of the Pearson r
• Statistical hypotheses for a two-tailed test
$H_0: \rho = 0$; $H_a: \rho \neq 0$
• This H0 states that the population correlation is 0, so the r
value we obtained from our sample is due to sampling error
• The sampling distribution of r shows all
possible values of r that occur when samples
are drawn from a population in which $\rho = 0$
Two-Tailed Test of the Pearson r
• Find the appropriate $r_{crit}$ from the table based on
– Whether you are using a two-tailed or one-tailed
test
– Your chosen α
– The degrees of freedom (df), where df = N – 2
and N is the number of X-Y pairs in the data
• If $r_{obt}$ is beyond $r_{crit}$, reject H0 and accept Ha
• Otherwise, fail to reject H0
One-Tailed Test of the Pearson r
• One-tailed, predicting positive correlation
$H_0: \rho \leq 0$; $H_a: \rho > 0$
• One-tailed, predicting negative correlation
$H_0: \rho \geq 0$; $H_a: \rho < 0$
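The decision rule for the two-tailed and one-tailed tests can be sketched as a small Python function. The function name `reject_h0`, the `prediction` labels, and the numbers in the usage lines are hypothetical; the critical value must still be looked up in a table of critical values of r. Treating "beyond" as strictly more extreme than the critical value is an assumption of this sketch.

```python
def reject_h0(r_obt, r_crit, prediction="two-tailed"):
    """Return True if r_obt falls beyond r_crit in the region of rejection.

    r_crit is the positive table value for the chosen alpha, df, and
    number of tails. prediction is "two-tailed", "positive", or "negative".
    """
    if r_crit <= 0:
        raise ValueError("pass the critical value as a positive number")
    if prediction == "two-tailed":
        # Either tail counts: compare the absolute value.
        return abs(r_obt) > r_crit
    if prediction == "positive":
        # Region of rejection is only in the positive tail.
        return r_obt > r_crit
    if prediction == "negative":
        # Region of rejection is only in the negative tail.
        return r_obt < -r_crit
    raise ValueError("prediction must be 'two-tailed', 'positive', or 'negative'")

# Hypothetical values for illustration only.
print(reject_h0(-0.50, 0.40))               # True: beyond -0.40 in a two-tailed test
print(reject_h0(-0.50, 0.40, "positive"))   # False: wrong tail for this prediction
```

A one-tailed test rejects H0 only when r has the predicted sign, which is why the second call fails to reject even though the coefficient is large in absolute value.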
An Introduction to Linear Regression
Linear Regression
Linear regression is the procedure for predicting
unknown Y scores based on known correlated X
scores.
• X is the predictor variable
• Y is the criterion variable
• The symbol for the predicted Y score is Y′
(pronounced Y prime)
Linear Regression
The equation that produces the value of Y′ at
each X and defines the straight line that
summarizes the relationship is called the linear
regression equation.
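The slides name the linear regression equation without showing it. A common textbook form is Y′ = bX + a, where b is the slope and a is the Y intercept of the least-squares line. The sketch below assumes that form; the symbols b and a, the function names, and the toy data are not from the chapter.

```python
def fit_regression_line(xs, ys):
    """Return (slope, intercept) of the least-squares line Y' = slope * X + intercept."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x_sq = sum(x ** 2 for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Slope: same numerator as the Pearson r, divided by the X term only.
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x_sq - sum_x ** 2)
    # Intercept: the line passes through (mean of X, mean of Y).
    intercept = sum_y / n - slope * (sum_x / n)
    return slope, intercept

def predict(x, slope, intercept):
    """Predicted Y score (Y prime) for a known X score."""
    return slope * x + intercept

# Toy data (assumed): Y is exactly twice X.
slope, intercept = fit_regression_line([1, 2, 3], [2, 4, 6])
print(predict(4, slope, intercept))  # prints 8.0
```

Here X plays the role of the predictor variable and the returned value is the predicted score on the criterion variable.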
Proportion of Variance
Accounted For
• The proportion of variance accounted for
describes the proportion of all differences in Y
scores that are associated with changes in the
X variable
• The proportion of variance accounted for
equals $r^2$
Example 1
For the following data set of interval/ratio scores,
calculate the Pearson correlation coefficient.
X    Y
1    8
2    6
3    6
4    5
5    1
6    3
Example 1
Pearson Correlation Coefficient
• Determine N
• Calculate $\Sigma X$, $(\Sigma X)^2$, $\Sigma X^2$, $\Sigma Y$, $(\Sigma Y)^2$,
$\Sigma Y^2$, and $\Sigma XY$
• Insert each value into the following formula
$$r = \frac{N(\Sigma XY) - (\Sigma X)(\Sigma Y)}{\sqrt{[N(\Sigma X^2) - (\Sigma X)^2][N(\Sigma Y^2) - (\Sigma Y)^2]}}$$
Example 1
Pearson Correlation Coefficient
N = 6
X     X^2   Y     Y^2   XY
1     1     8     64    8
2     4     6     36    12
3     9     6     36    18
4     16    5     25    20
5     25    1     1     5
6     36    3     9     18
$\Sigma X = 21$   $\Sigma X^2 = 91$   $\Sigma Y = 29$   $\Sigma Y^2 = 171$   $\Sigma XY = 81$
Example 1
Pearson Correlation Coefficient
$$r = \frac{N(\Sigma XY) - (\Sigma X)(\Sigma Y)}{\sqrt{[N(\Sigma X^2) - (\Sigma X)^2][N(\Sigma Y^2) - (\Sigma Y)^2]}}$$
$$= \frac{6(81) - (21)(29)}{\sqrt{[6(91) - (21)^2][6(171) - (29)^2]}} = \frac{486 - 609}{\sqrt{[105][185]}}$$
$$= \frac{-123}{139.374} = -0.88$$
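The arithmetic in this example can be checked with a few lines of Python that start from the column totals in the table (N = 6, ΣX = 21, ΣX² = 91, ΣY = 29, ΣY² = 171, ΣXY = 81). The variable names are chosen for this sketch only.

```python
from math import sqrt

# Totals taken from the Example 1 table.
n, sum_x, sum_x_sq, sum_y, sum_y_sq, sum_xy = 6, 21, 91, 29, 171, 81

numerator = n * sum_xy - sum_x * sum_y   # 486 - 609
x_term = n * sum_x_sq - sum_x ** 2       # 546 - 441
y_term = n * sum_y_sq - sum_y ** 2       # 1026 - 841
denominator = sqrt(x_term * y_term)
r = numerator / denominator

print(numerator, x_term, y_term)   # -123 105 185
print(round(denominator, 3))       # 139.374
print(round(r, 2))                 # -0.88
```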
Example 2
Significance Test of the Pearson r
Conduct a two-tailed significance test of the
Pearson r just calculated. Use α = .05.
• df = N – 2 = 6 – 2 = 4
• $r_{crit}$ = ±0.811
• Since $r_{obt}$ of –0.88 falls beyond the critical
value of –0.811, reject H0 and accept Ha.
• The correlation in the population is
significantly different from 0
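The chapter reads the critical value from a table. As a cross-check that goes beyond the chapter, the tabled value can be recovered from the t distribution with df = N – 2, using $r_{crit} = t_{crit} / \sqrt{t_{crit}^2 + df}$. In the sketch below, the value 2.776 is the standard two-tailed critical t for α = .05 and df = 4 taken from a t table (an assumption here, not a value given in the chapter), and the function names are hypothetical.

```python
from math import sqrt

def r_crit_from_t(t_crit, df):
    """Convert a critical t value into the matching critical r value."""
    return t_crit / sqrt(t_crit ** 2 + df)

def t_from_r(r, df):
    """Convert an obtained r into the equivalent t statistic."""
    return r * sqrt(df) / sqrt(1 - r ** 2)

df = 4           # N - 2 = 6 - 2
t_crit = 2.776   # assumed: two-tailed critical t for alpha = .05, df = 4

r_crit = r_crit_from_t(t_crit, df)
t_obt = t_from_r(-0.88, df)

print(round(r_crit, 3))       # 0.811
print(abs(t_obt) > t_crit)    # True: same decision as comparing r to the r table
```

Both routes lead to the same decision, rejecting H0, because the table of critical r values is built from this conversion.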
Example 3
Proportion of Variance Accounted For
Calculate the proportion of variance accounted
for, using the given data.
Proportion of variance accounted for is
$$r^2 = (-0.88)^2 = 0.7744$$
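Squaring the rounded coefficient gives a slightly different answer than squaring the unrounded one. The short Python check below uses the numerator (–123) and the two bracketed terms (105 and 185) from Example 1; the variable names are chosen for this sketch only.

```python
# r rounded to two decimals, as in the worked example.
r_rounded = -0.88
r_sq_rounded = r_rounded ** 2

# Unrounded r squared: numerator squared over the product of the bracketed terms.
r_sq_unrounded = (-123) ** 2 / (105 * 185)

print(round(r_sq_rounded, 4))     # 0.7744
print(round(r_sq_unrounded, 4))   # 0.7788
```

Either way, roughly 77% to 78% of the differences in Y scores are associated with changes in X.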