Stat 203 Additional (FOR INTEREST) material.

You are not responsible for knowing this.
We’ve been looking at the Pearson correlation r without
looking at how it’s calculated.
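To correlate the response variable with several explanatory variables at once, the easiest way is to use the error and total sums of squares (SSE and SST): $R^2 = 1 - \mathrm{SSE}/\mathrm{SST}$, and the multiple correlation $R$ is the square root of $R^2$.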
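For only one y variable and one x variable, there is a more direct way:

$$r = \frac{1}{n-1}\sum \left(\frac{x - \bar{x}}{s_x}\right)\left(\frac{y - \bar{y}}{s_y}\right)$$

r is the Pearson correlation coefficient.
n is the sample size.
$\bar{x}$ and $s_x$ are the mean and standard deviation of x ($\bar{y}$ and $s_y$ for y).
The parts in the brackets are “how many standard deviations above the x mean and above the y mean”, respectively (a negative value means below the mean).
The following notation isn’t exactly right, but it will serve our purposes:

$$r = \frac{1}{n-1}\sum (z_x)(z_y)$$

$z_x$ and $z_y$ are the standardized scores of x and y (the raw scores).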
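The z-score formula can be sketched directly in Python. This is a minimal illustration, not the textbook’s method: `pearson_r` is a hypothetical helper name, and it assumes the sample standard deviation (`statistics.stdev`, which divides by n − 1), matching the 1/(n − 1) in front of the sum.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Hypothetical helper: r = (sum of z_x * z_y) / (n - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample SDs (n - 1 in the denominator)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

# Dragon data: lengths in cm, weights in grams
print(round(pearson_r([34.3, 24.8, 30.0, 28.7, 30.9], [670, 373, 557, 480, 567]), 3))  # prints 0.993
```

If every x (or every y) is the same, its SD is 0 and r is undefined; this sketch would then raise `ZeroDivisionError`.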
For a set of 5 dragons, we might have a dataset like this:
Length in cm (x)    Weight in grams (y)
34.3                670
24.8                373
30.0                557
28.7                480
30.9                567
Which produces this scatterplot:
[Figure: scatterplot of weight in grams (y) against length in cm (x) for the 5 dragons]
If y (weight) increases with x (length), then above-average x
values will occur for the same cases as above-average y values.
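So $z_x > 0$ usually when $z_y > 0$, and likewise $z_x < 0$ usually when $z_y < 0$.
That means, for most values, $(z_x)(z_y) > 0$: a positive times a positive, or a negative times a negative.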
In the correlation formula you’re adding mostly positive
numbers, and your correlation will end up positive.
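To see this sign argument on the dragon data, the sketch below lists each pair of z-scores and its product. It uses only `statistics.mean` and `statistics.stdev`; `z_scores` is an illustrative helper name. Because weight rises with length here, every product comes out positive.

```python
from statistics import mean, stdev

lengths = [34.3, 24.8, 30.0, 28.7, 30.9]  # cm
weights = [670, 373, 557, 480, 567]       # grams

def z_scores(values):
    """Illustrative helper: (value - mean) / sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

z_x, z_y = z_scores(lengths), z_scores(weights)
products = [a * b for a, b in zip(z_x, z_y)]
for a, b, p in zip(z_x, z_y, products):
    # Above-average x pairs with above-average y, below-average with below-average
    print(f"z_x = {a:+.2f}  z_y = {b:+.2f}  product = {p:+.2f}")
print(all(p > 0 for p in products))  # prints True
```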
If y decreases as x increases, below-average x occurs with
above-average y.
So $z_x < 0$ usually when $z_y > 0$, and $z_x > 0$ usually when $z_y < 0$.
That means, for most values, $(z_x)(z_y) < 0$.
In the correlation formula you’re adding mostly negative
numbers, and your correlation will end up negative.
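For the decreasing case, here is a sketch on a hypothetical dataset (not real dragon data): each real weight is mirrored around 500 g as `1000 - w`, so y falls as length rises. Mirroring flips the sign of every $z_y$, so every product becomes negative and r changes sign. `z_scores` and `falling` are illustrative names.

```python
from statistics import mean, stdev

def z_scores(values):
    """Illustrative helper: (value - mean) / sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

lengths = [34.3, 24.8, 30.0, 28.7, 30.9]
# Hypothetical y that decreases as length increases (mirror of the real weights)
falling = [1000 - w for w in [670, 373, 557, 480, 567]]

products = [a * b for a, b in zip(z_scores(lengths), z_scores(falling))]
print(all(p < 0 for p in products))                   # prints True
print(round(sum(products) / (len(products) - 1), 3))  # prints -0.993
```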
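First, standardize the scores: $z = (\text{value} - \text{mean}) / \text{SD}$. For these data, $\bar{x} = 29.74$ cm, $s_x \approx 3.45$ cm, $\bar{y} = 529.4$ g and $s_y \approx 110.5$ g.

Length in cm (x)   z_x     Weight in grams (y)   z_y
34.3                1.32   670                    1.27
24.8               -1.43   373                   -1.42
30.0                0.08   557                    0.25
28.7               -0.30   480                   -0.45
30.9                0.34   567                    0.34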
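The standardizing step as a short Python sketch. It assumes the sample SD (`statistics.stdev`, n − 1 in the denominator); `standardize` is an illustrative helper name.

```python
from statistics import mean, stdev

lengths = [34.3, 24.8, 30.0, 28.7, 30.9]  # cm
weights = [670, 373, 557, 480, 567]       # grams

def standardize(values):
    """Illustrative helper: (value - mean) / SD, using the sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

z_x = standardize(lengths)
z_y = standardize(weights)
print([round(z, 2) for z in z_x])  # prints [1.32, -1.43, 0.08, -0.3, 0.34]
print([round(z, 2) for z in z_y])  # prints [1.27, -1.42, 0.25, -0.45, 0.34]
```

If you used `statistics.pstdev` (which divides by n) instead, the final step would divide the sum of products by n rather than n − 1.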
Then multiply each dragon’s two z-scores together:

Length in cm (x)   z_x     Weight in grams (y)   z_y     (z_x)(z_y)
34.3                1.32   670                    1.27   1.68
24.8               -1.43   373                   -1.42   2.02
30.0                0.08   557                    0.25   0.02
28.7               -0.30   480                   -0.45   0.13
30.9                0.34   567                    0.34   0.11

The products are computed from the unrounded z-scores, which is why the last row gives 0.11 rather than 0.34 × 0.34 ≈ 0.12.
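A sketch of the multiplication step, keeping full precision until display. `standardize` is an illustrative helper name, and the sample SD is assumed as before.

```python
from statistics import mean, stdev

lengths = [34.3, 24.8, 30.0, 28.7, 30.9]
weights = [670, 373, 557, 480, 567]

def standardize(values):
    """Illustrative helper: (value - mean) / sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Multiply each dragon's z_x by its z_y without rounding first
products = [a * b for a, b in zip(standardize(lengths), standardize(weights))]
print([round(p, 2) for p in products])  # prints [1.68, 2.02, 0.02, 0.13, 0.11]

# Multiplying the already-rounded z-scores drifts slightly
print(round(0.34 * 0.34, 2))  # prints 0.12
```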
Then add the multiplied values:

$$\sum (z_x)(z_y) \approx 3.97$$

(Adding the rounded products 1.68 + 2.02 + 0.02 + 0.13 + 0.11 gives 3.96; the small difference is rounding.)
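A sketch of the summing step, contrasting the full-precision total with the total of the rounded products. `standardize` is an illustrative helper name.

```python
from statistics import mean, stdev

lengths = [34.3, 24.8, 30.0, 28.7, 30.9]
weights = [670, 373, 557, 480, 567]

def standardize(values):
    """Illustrative helper: (value - mean) / sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

total = sum(a * b for a, b in zip(standardize(lengths), standardize(weights)))
print(round(total, 2))                              # prints 3.97 (full-precision products)
print(round(1.68 + 2.02 + 0.02 + 0.13 + 0.11, 2))  # prints 3.96 (rounded products)
```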
This pretty much does the whole formula for us. The only step left is dividing by $n - 1 = 4$:

$$r = \frac{3.97}{4} \approx 0.993$$

a very strong positive correlation.
Final note: the correlation formula doesn’t appear in your textbook in this form, but in a longer, equivalent one.
For the equivalence and more information, I recommend
http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
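One common equivalent form (not necessarily the exact one your textbook prints) skips the z-scores: $r = \sum (x - \bar{x})(y - \bar{y}) \big/ \sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}$. Writing out $s_x = \sqrt{\sum (x - \bar{x})^2 / (n-1)}$ and likewise $s_y$, the $(n - 1)$ factors cancel. A minimal sketch; `pearson_r_deviations` is an illustrative helper name.

```python
from math import sqrt
from statistics import mean

lengths = [34.3, 24.8, 30.0, 28.7, 30.9]
weights = [670, 373, 557, 480, 567]

def pearson_r_deviations(xs, ys):
    """Illustrative helper: r from deviations about the means, no z-scores."""
    x_bar, y_bar = mean(xs), mean(ys)
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    s_xy = sum(a * b for a, b in zip(dx, dy))  # sum of cross-products
    return s_xy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

print(round(pearson_r_deviations(lengths, weights), 3))  # prints 0.993, same as the z-score route
```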