11.7 CORRELATION: HOW STRONG IS THE LINEAR RELATIONSHIP?
DEFINITION:
The sample correlation coefficient r measures the strength of the
linear relationship between two quantitative variables. It describes
the direction of the linear association and indicates how closely the
points in a scatter-plot cluster around the least squares regression line.
Features of the correlation coefficient:
1. Range: -1 ≤ r ≤ 1.
2. Sign: The sign of the correlation coefficient indicates the direction
   of the association: negative, r in [-1, 0), or positive, r in (0, +1].
3. Magnitude: The magnitude of the correlation coefficient indicates the
   strength of the linear association. If the data fall exactly on a
   straight line, r = 1 (if the slope is positive) or r = -1 (if the slope
   is negative), indicating a perfect linear association. If r = 0, there
   is no linear association.
4. Measures Linear Strength: The correlation measures only the strength
   of the linear association.
5. Unit-less: The correlation is computed using standard scores of the
   two variables. It has no unit of measure, and the absolute value of r
   does not change if the units of measurement for x or y are changed.
   The correlation between x and y is the same as the correlation
   between y and x.
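Features 5 and the symmetry property can be checked numerically. The sketch below assumes the sample-standard-deviation version of standard scores, r = (1/(n-1)) Σ z_x z_y, and uses the Test 1 / Test 2 scores from the by-hand example in this section. The function name `correlation_from_z_scores` and the 2.54x + 10 rescaling are illustrative choices, not from the source.

```python
import math
import statistics


def correlation_from_z_scores(x, y):
    """Sample correlation r = (1/(n-1)) * sum(z_x * z_y), using sample SDs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    # Multiply each pair of standard scores and average with divisor n - 1
    return sum(((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y)) / (n - 1)


test1 = [8, 10, 12, 14, 16]  # Test 1 scores (x)
test2 = [9, 13, 14, 15, 19]  # Test 2 scores (y)

r = correlation_from_z_scores(test1, test2)
r_swapped = correlation_from_z_scores(test2, test1)  # symmetry: r(y, x)
r_rescaled = correlation_from_z_scores([2.54 * v + 10 for v in test1], test2)  # change of units
r_flipped = correlation_from_z_scores([-v for v in test1], test2)  # negative rescaling

print(round(r, 3), round(r_swapped, 3), round(r_rescaled, 3), round(r_flipped, 3))
```

A change of units with a positive multiplier leaves r unchanged; multiplying x by a negative number flips only the sign, which is why the slide speaks of the absolute value of r.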
Some Pictures
[Scatter-plot of y versus x] r ≈ 0.8: positive, moderate to strong linear association.
[Scatter-plot of y versus x] r ≈ -0.2: negative, weak linear association.
[Scatter-plot of y versus x] r ≈ 0: a strong association, just not a linear one.
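The third picture is the one most often misread. As a small sketch with hypothetical points (not the data in the pictures), a perfect but curved relationship y = x² on x values symmetric about 0 gives r = 0, while perfect straight lines give r = 1 or r = -1. The helper name `pearson_r` is our own; it implements the computational formula for r given in this section.

```python
import math


def pearson_r(x, y):
    """Correlation via r = [n*Sxy - Sx*Sy] / sqrt([n*Sxx - Sx^2][n*Syy - Sy^2])."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(a * b for a, b in zip(x, y))
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


x = [-3, -2, -1, 0, 1, 2, 3]       # hypothetical x values, symmetric about 0
y_curve = [v ** 2 for v in x]      # perfect, but curved, relationship
y_line = [2 * v + 1 for v in x]    # perfect straight line, positive slope
y_down = [5 - v for v in x]        # perfect straight line, negative slope

print(pearson_r(x, y_curve))  # 0.0
print(pearson_r(x, y_line))   # 1.0
print(pearson_r(x, y_down))   # -1.0
```

So r = 0 rules out a linear association, not every association; always look at the scatter-plot as well.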
Let's Do It! 1 Matching Graphs
Scatter-plot #1 to the right yields the regression line y = -2.6 + 1.1x
and a correlation of r = 0.84.
Using this information as a base, match each of the four scatter-plots
below to the correct description of its regression line and correlation
coefficient. The scales on the axes of all the scatter-plots are the same.
How to Calculate the Correlation Coefficient r
Dep var: PERCENT   N: 18   Multiple R: .840   Squared multiple R: .706
Adjusted squared multiple R: .688   Standard error of estimate: 10.547

Variable    Coefficient   Std error   Std coef   Tolerance        T   P(2 tail)
CONSTANT         96.681       6.289      0.000   .           15.373       0.000
LENGTH           -5.970       0.963     -0.840   .100E+01    -6.201       0.000

Analysis of Variance
Source       Sum-of-squares   DF   Mean-square   F-ratio       P
Regression         4276.908    1      4276.908    38.448   0.000
Residual           1779.815   16       111.238
“Multiple R: .840” is the absolute value of the correlation coefficient r.
The sign of r is determined by the sign of the slope, which here is
-5.970. So the correlation between length of putt and percentage of
putts made is r = -0.840.
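Reading r from the output takes two steps: take Multiple R as |r| and attach the sign of the slope. As a hedged cross-check, Multiple R can also be recovered from the ANOVA table, since in simple linear regression R² = SS(Regression) / [SS(Regression) + SS(Residual)]. All numbers below are copied from the putting output; the variable names are our own.

```python
import math

# Values copied from the putting regression output
multiple_r = 0.840
slope = -5.970
ss_regression = 4276.908
ss_residual = 1779.815

# In simple linear regression Multiple R = |r|; r takes the sign of the slope
r = math.copysign(multiple_r, slope)

# Cross-check: R^2 = SS(regression) / SS(total), and Multiple R = sqrt(R^2)
r_squared = ss_regression / (ss_regression + ss_residual)

print(r, round(r_squared, 3), round(math.sqrt(r_squared), 3))
```

The recovered values match “Squared multiple R: .706” and “Multiple R: .840” in the output.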
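The formula:
$$
r = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}
         {\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}}
$$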
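This computational formula is the same quantity as the standard-score description in feature 5. The short derivation below is standard algebra, assuming sample standard deviations $s_x = \sqrt{S_{xx}/(n-1)}$ and $s_y = \sqrt{S_{yy}/(n-1)}$:

$$
r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar x}{s_x}\right)\left(\frac{y_i-\bar y}{s_y}\right)
  = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}},
$$
$$
S_{xy} = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n},\quad
S_{xx} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n},\quad
S_{yy} = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}.
$$

Multiplying the numerator $S_{xy}$ and each factor $S_{xx}$, $S_{yy}$ under the square root by $n$ gives exactly the formula above.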
Example: Test 1 versus Test 2, Obtaining the Correlation Coefficient
“By Hand”
We have already computed the summation quantities needed to find r;
they are shown in the calculation table.
Completed Calculation Table

          xi     yi    xi²    xi yi    yi²
           8      9     64       72     81
          10     13    100      130    169
          12     14    144      168    196
          14     15    196      210    225
          16     19    256      304    361
Total:    60     70    760      884   1032

$$
r = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}
         {\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}}
  = \frac{5(884) - (60)(70)}{\sqrt{\left[5(760) - (60)^2\right]\left[5(1032) - (70)^2\right]}}
  \approx 0.965
$$
The large positive correlation coefficient and the scatter-plot indicate
a strong, positive, linear association between Test 1 and Test 2
scores.
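The by-hand steps can be reproduced in a few lines: build each row of the calculation table, total the columns, and substitute the totals into the formula. This is a plain-Python sketch (no statistics library assumed); the variable names are our own.

```python
import math

x = [8, 10, 12, 14, 16]  # Test 1 scores
y = [9, 13, 14, 15, 19]  # Test 2 scores
n = len(x)

# One row of the calculation table per student: x_i, y_i, x_i^2, x_i*y_i, y_i^2
rows = [(xi, yi, xi * xi, xi * yi, yi * yi) for xi, yi in zip(x, y)]
for row in rows:
    print(*row)

# Column totals: sum x, sum y, sum x^2, sum xy, sum y^2
sum_x, sum_y, sum_x2, sum_xy, sum_y2 = (sum(col) for col in zip(*rows))
print("Total:", sum_x, sum_y, sum_x2, sum_xy, sum_y2)

# Substitute the totals into the computational formula for r
numerator = n * sum_xy - sum_x * sum_y  # 5(884) - (60)(70)
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(round(r, 3))
```

The intermediate values are 220 in the numerator and the square root of 200 × 260 in the denominator, which gives r ≈ 0.965 as on the slide.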
Let’s Do It! 2 Birth Rates
We gathered data from 1970 for twelve nations on the percentage of
women aged 14 or older who were economically active and the crude
birth rate. (We define the crude birth rate as the number of births in
a year per 1,000 population.) We are interested in the relationship
between the crude birth rate (y) and the percentage of women who were
economically active (x).
a. Create the scatter-plot. Determine whether there is a positive,
negative, or no association between x and y.
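x = percentage of women aged 14 or older who were economically active
y = crude birth rate (births per 1,000 population)

Nation           x     y
Algeria          2    48
Argentina       19    21
Denmark         34    14
E. Germany      40    11
Guatemala        8    41
India           12    37
Ireland         20    22
Jamaica         20    31
Japan           37    19
Philippines     19    42
USA             30    15
Soviet Union    46    18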
b. Find the equation of the regression line. Interpret the slope.
c. Find the correlation coefficient r.
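One way to check parts b and c without a calculator is to apply the same column sums used in the Test 1 / Test 2 example. The sketch below uses the table's data and the standard least squares formulas b = [nΣxy - ΣxΣy] / [nΣx² - (Σx)²] and a = ȳ - b x̄; the variable names are our own, and it is meant only as a check on your own work.

```python
import math

# (nation, x = % of women economically active, y = crude birth rate per 1,000), 1970
data = [
    ("Algeria", 2, 48), ("Argentina", 19, 21), ("Denmark", 34, 14),
    ("E. Germany", 40, 11), ("Guatemala", 8, 41), ("India", 12, 37),
    ("Ireland", 20, 22), ("Jamaica", 20, 31), ("Japan", 37, 19),
    ("Philippines", 19, 42), ("USA", 30, 15), ("Soviet Union", 46, 18),
]
x = [row[1] for row in data]
y = [row[2] for row in data]
n = len(data)

# Column sums, as in a calculation table
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

cross = n * sum_xy - sum_x * sum_y   # n*Sum(xy) - Sum(x)*Sum(y)
spread_x = n * sum_x2 - sum_x ** 2   # n*Sum(x^2) - (Sum x)^2
spread_y = n * sum_y2 - sum_y ** 2   # n*Sum(y^2) - (Sum y)^2

b = cross / spread_x                 # least squares slope
a = (sum_y - b * sum_x) / n          # intercept: a = y_bar - b * x_bar
r = cross / math.sqrt(spread_x * spread_y)

print(f"y-hat = {a:.2f} + ({b:.3f})x,  r = {r:.3f}")
```

The sign of the printed slope and of r should agree with each other and with the direction you saw in the scatter-plot in part a.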
Obtaining the Correlation Using the TI
To get the regression line and the correlation coefficient on the TI
calculator, we first need to turn on the diagnostic option. If the x data
are in L1 and the y data are in L2, the steps are as follows:
Let’s Do It! 3 Birth Rates
Check the value of r you obtained in Let’s Do It! 2 above using the TI
calculator.
Let’s Do It! 4 Data on Milk Production
Milk samples were obtained from 14 Holstein-Friesian cows, and each
was analyzed to determine uric acid concentration (Y), measured in
µmol/L. In addition to acid concentration, the total milk production
(X), measured in kg/day, was recorded for each cow. The data were
entered into a computer, and the following regression output was
obtained.
(a) What is the equation of the least squares regression line?
(b) What is the correlation between x and y?
r = __________.
(c) We want to determine whether the linear relationship is significant
by testing the following hypothesis.
Main hypothesis (H0): The slope of the regression line equals 0.
Circle the p-value in the output that is used to test this hypothesis. At
the level of significance of 0.05, we would (circle one):
Accept H0
Reject H0
Can’t Tell
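The milk output itself is not reproduced in these notes, so as a hedged illustration the sketch below uses the putting output from earlier in this section. It shows that the t statistic for H0: slope = 0 can be written in terms of r as t = r·√(n-2) / √(1-r²) with n - 2 degrees of freedom, and that it matches the T value printed for LENGTH. The decision rule is then: reject H0 when the two-tailed p-value is below 0.05.

```python
import math

# Values copied from the putting regression output
n = 18
ss_regression = 4276.908
ss_residual = 1779.815
slope = -5.970

# r from the ANOVA table, with the sign of the slope
r_squared = ss_regression / (ss_regression + ss_residual)
r = math.copysign(math.sqrt(r_squared), slope)

# t statistic for H0: slope = 0, expressed through r (df = n - 2)
t = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)

print(round(r, 3), round(t, 3))
```

For the putting data the output reports P(2 tail) = 0.000 for LENGTH, which is below 0.05, so there H0 would be rejected; the same reasoning applies to the p-value you circle in the milk output.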
THE SQUARED CORRELATION r²: WHAT DOES IT TELL US?
r = the correlation coefficient; it gives the strength and the direction
of the linear relationship between two quantitative variables x and y,
with -1 ≤ r ≤ 1.
Squaring r gives 0 ≤ r² ≤ 1. The value of r² is the proportion of the
variation in the dependent variable y that is explained by its linear
relationship with the independent variable x.
The quantity r² is generally denoted in computer output as R², and is
often reported as a percent.
For example, r² = 0.75 means that about 75% of the variation in the
response variable y can be explained by the linear relationship between
x and y.
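The “variation explained” reading of r² can be verified on the Test 1 / Test 2 data. This sketch fits the least squares line, measures the unexplained variation as the sum of squared residuals (SSE), and compares 1 - SSE/SST with r²; the names `sse` and `syy` (total variation, SST) are our own.

```python
import math

x = [8, 10, 12, 14, 16]  # Test 1
y = [9, 13, 14, 15, 19]  # Test 2
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)  # total variation in y (SST)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = sxy / sxx          # least squares slope
a = y_bar - b * x_bar  # least squares intercept
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation

r = sxy / math.sqrt(sxx * syy)
print(round(r ** 2, 4), round(1 - sse / syy, 4))
```

Both numbers come out equal (about 0.93), so roughly 93% of the variation in Test 2 scores is explained by the linear relationship with Test 1 scores.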
Homework, page 546: 2, 3, 4, 14, 15, 22, 36, 37, 39
(For 2, 14, 22, 37, and 39, create SPSS data files for the tables and use
the regression procedure to answer the questions; include a copy of your
output. For the other questions about the same tables, you may use the
output only to check your answers; your work is still required.)