SD/Correlation Computations in Easy Steps

Math 1070-1, Spring 2003 (Univ. of Utah)
Example (SL-County unemployment data):

Year (x)    Rate in % (y)
1997        2.7
1998        3.4
1999        3.4
2000        3.0
2001        4.3
Sample size: n = 5.
Means:
$$\bar{x} = \frac{1997 + \cdots + 2001}{5} = 1999, \qquad \bar{y} = \frac{2.7 + \cdots + 4.3}{5} = 3.36.$$
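The two means can be checked with a few lines of Python. This is a minimal sketch, not part of the original handout; the helper name `mean` and the variable names are my own, chosen for illustration.

```python
# Data from the SL-County example: year (x) and unemployment rate in % (y).
years = [1997, 1998, 1999, 2000, 2001]
rates = [2.7, 3.4, 3.4, 3.0, 4.3]

def mean(values):
    """Arithmetic mean: the sum of the values divided by the sample size n."""
    return sum(values) / len(values)

n = len(years)        # sample size, 5
x_bar = mean(years)   # (1997 + ... + 2001) / 5
y_bar = mean(rates)   # (2.7 + ... + 4.3) / 5
print(n, x_bar, round(y_bar, 2))  # prints: 5 1999.0 3.36
```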
The standard deviation of x (first, its square):
$$S_x^2 = \frac{1}{n-1}\sum (x - \bar{x})^2.$$

x − x̄    (x − x̄)²
 −2        4
 −1        1
  0        0
  1        1
  2        4

So
$$S_x^2 = \tfrac{1}{4}(4 + 1 + 0 + 1 + 4) = 2.5.$$
Therefore, $S_x = \sqrt{2.5} \approx 1.581139$ years (without rounding).

What does this mean?
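The recipe for Sx (deviations, squares, sum, divide by n − 1, square root) can be sketched in Python as follows. The function name `sample_variance` is a hypothetical helper introduced here for illustration; Python's standard `statistics.stdev` uses the same n − 1 divisor and could replace it.

```python
import math

years = [1997, 1998, 1999, 2000, 2001]  # x values from the example

def sample_variance(values):
    """S^2 = (1/(n-1)) * (sum of squared deviations from the mean)."""
    n = len(values)
    m = sum(values) / n
    deviations = [v - m for v in values]   # the x - x̄ column
    squares = [d * d for d in deviations]  # the (x - x̄)^2 column
    return sum(squares) / (n - 1)

s2_x = sample_variance(years)  # (4 + 1 + 0 + 1 + 4) / 4
s_x = math.sqrt(s2_x)          # square root of the variance
print(s2_x, round(s_x, 6))     # prints: 2.5 1.581139
```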
Also,

y − ȳ     (y − ȳ)²
−0.66     0.44
 0.04     0
 0.04     0
−0.36     0.13
 0.94     0.88
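The same steps applied to the y column give Sy. In this sketch (variable names are my own) the squares are kept unrounded; the two-decimal entries in the table are rounded versions of them, and the unrounded squares sum to exactly 1.452, which is where 0.363 comes from.

```python
import math

rates = [2.7, 3.4, 3.4, 3.0, 4.3]  # y values (unemployment rate, %)

n = len(rates)
y_bar = sum(rates) / n                   # 3.36
deviations = [y - y_bar for y in rates]  # the y - ȳ column
squares = [d * d for d in deviations]    # the (y - ȳ)^2 column, unrounded

s2_y = sum(squares) / (n - 1)            # sample variance of y
s_y = math.sqrt(s2_y)                    # sample standard deviation of y

# Show the table rounded to two decimals, as in the handout.
for d, sq in zip(deviations, squares):
    print(f"{d:6.2f} {sq:5.2f}")
print(round(s2_y, 3), round(s_y, 7))
```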
So
$$S_y^2 \approx \tfrac{1}{4}(0.44 + 0 + 0 + 0.13 + 0.88) \approx 0.363.$$
Therefore, $S_y \approx \sqrt{0.363} \approx 0.6024948\%$ (without rounding).

For correlation, let me start by reminding you of the formula:
$$r = \frac{1}{n-1}\sum \left(\frac{x - \bar{x}}{S_x}\right)\left(\frac{y - \bar{y}}{S_y}\right) = \frac{1}{n-1}\sum \mathrm{SU}_x\,\mathrm{SU}_y,$$
where SU means “in standard units.” In other words, the formula says: first compute a column of x in standard units and one for y; then cross-multiply and add; finally, divide by n − 1. Now we are off to work out the details, which I will take pains to do very meticulously so as to avoid those silly (and unacceptable) errors.
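The three-step recipe (convert to standard units, cross-multiply and add, divide by n − 1) translates almost line for line into code. This is a minimal sketch; the helper names `mean`, `sample_sd`, `standard_units`, and `correlation` are hypothetical, introduced only for illustration.

```python
import math

years = [1997, 1998, 1999, 2000, 2001]  # x
rates = [2.7, 3.4, 3.4, 3.0, 4.3]       # y (%)

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation with the n - 1 divisor."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def standard_units(values):
    """Step 1: convert each value to standard units, (v - mean) / SD."""
    m, s = mean(values), sample_sd(values)
    return [(v - m) / s for v in values]

def correlation(xs, ys):
    """Steps 2 and 3: cross-multiply SUx * SUy, add, divide by n - 1."""
    products = [a * b for a, b in zip(standard_units(xs), standard_units(ys))]
    return sum(products) / (len(xs) - 1)

r = correlation(years, rates)
print(round(r, 7))  # about 0.7348094
```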
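Recall that SUx = (x − x̄)/Sx. So:

x       x − x̄    SUx
1997    −2       −1.3
1998    −1       −0.6
1999     0        0
2000     1        0.6
2001     2        1.3

Ditto for the y’s:

y       y − ȳ    SUy
2.7     −0.66    −1.1
3.4      0.04     0.07
3.4      0.04     0.07
3.0     −0.36    −0.6
4.3      0.94     1.56

So, cross-multiplying (each product uses the unrounded standard units, which is why the first entry is 1.39 rather than (−1.3)(−1.1) = 1.43):

SUx     SUy     SUx·SUy
−1.3    −1.1     1.39
−0.6     0.07   −0.04
 0       0.07    0
 0.6    −0.6    −0.38
 1.3     1.56    1.97

$$r = \frac{1}{4}\bigl[1.39 + (-0.04) + 0 + (-0.38) + 1.97\bigr] \approx 0.7348094 \quad\text{(no rounding).}$$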
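To see where the product column and the unrounded value of r come from, this sketch recomputes the standard units without intermediate rounding and prints them to two decimals. It also checks r against the algebraically equivalent form r = Σ(x − x̄)(y − ȳ) / ((n − 1) Sx Sy), which follows by pulling Sx and Sy out of the sum. Variable names are my own.

```python
import math

years = [1997, 1998, 1999, 2000, 2001]
rates = [2.7, 3.4, 3.4, 3.0, 4.3]
n = len(years)

x_bar = sum(years) / n
y_bar = sum(rates) / n
dx = [x - x_bar for x in years]  # x - x̄
dy = [y - y_bar for y in rates]  # y - ȳ
s_x = math.sqrt(sum(d * d for d in dx) / (n - 1))
s_y = math.sqrt(sum(d * d for d in dy) / (n - 1))

# Unrounded standard units and their products (the handout's table rounds these).
su_x = [d / s_x for d in dx]
su_y = [d / s_y for d in dy]
products = [a * b for a, b in zip(su_x, su_y)]
for a, b, p in zip(su_x, su_y, products):
    print(f"{a:6.2f} {b:6.2f} {p:6.2f}")

r_su = sum(products) / (n - 1)

# Equivalent form: sum of deviation products over (n - 1) * Sx * Sy.
s_xy = sum(a * b for a, b in zip(dx, dy))
r_dev = s_xy / ((n - 1) * s_x * s_y)
print(round(r_su, 7), round(r_dev, 7))
```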
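Regression
The equation of the regression line is SUy = r SUx, i.e.,
$$\frac{y - \bar{y}}{S_y} = r\,\frac{x - \bar{x}}{S_x}.$$
Solve for y (DO IT!) to obtain:
$$y = rS_y\,\frac{x - \bar{x}}{S_x} + \bar{y} = \underbrace{\frac{rS_y}{S_x}}_{\text{slope}}\,x + \underbrace{\left(\bar{y} - \frac{rS_y}{S_x}\,\bar{x}\right)}_{\text{intercept}}.$$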
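One way to carry out the “DO IT!” step is the following short derivation, which starts from the standard-units form of the regression line and ends at the slope/intercept form:

```latex
\begin{aligned}
\frac{y - \bar{y}}{S_y} &= r\,\frac{x - \bar{x}}{S_x} \\
y - \bar{y} &= \frac{rS_y}{S_x}\,(x - \bar{x}) && \text{(multiply both sides by } S_y\text{)} \\
y &= \frac{rS_y}{S_x}\,x - \frac{rS_y}{S_x}\,\bar{x} + \bar{y} && \text{(expand and add } \bar{y}\text{)} \\
  &= \frac{rS_y}{S_x}\,x + \left(\bar{y} - \frac{rS_y}{S_x}\,\bar{x}\right).
\end{aligned}
```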
In our Example above, we had
Sx ≈ 1.58
Sy ≈ 0.6
x̄ = 1999
ȳ = 3.36
r ≈ 0.73.
So slope = rSy/Sx ≈ 0.73 × 0.6/1.58 ≈ 0.28 (also 0.28 without rounding). Similarly, intercept = ȳ − (rSy/Sx)x̄ = −556.36 (without rounding; check this!). So the regression line in the previous Example is:
y = 0.28x − 556.36.
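A sketch that computes the slope and intercept directly from the data, using slope = rSy/Sx and intercept = ȳ − slope × x̄. The function name `regression_line` is a hypothetical helper for illustration.

```python
import math

years = [1997, 1998, 1999, 2000, 2001]
rates = [2.7, 3.4, 3.4, 3.0, 4.3]

def regression_line(xs, ys):
    """Return (slope, intercept) with slope = r*Sy/Sx, intercept = ȳ - slope*x̄."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    s_x = math.sqrt(sum(d * d for d in dx) / (n - 1))
    s_y = math.sqrt(sum(d * d for d in dy) / (n - 1))
    r = sum(a * b for a, b in zip(dx, dy)) / ((n - 1) * s_x * s_y)
    slope = r * s_y / s_x
    intercept = y_bar - slope * x_bar
    return slope, intercept

slope, intercept = regression_line(years, rates)
print(round(slope, 2), round(intercept, 2))  # prints: 0.28 -556.36
```

The slope comes out as 0.28 (up to floating-point error) because rSy/Sx simplifies to Σ(x − x̄)(y − ȳ)/Σ(x − x̄)² = 2.8/10; rounding r and Sy first, as in 0.73 × 0.6/1.58, gives about 0.277 instead.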
The regression prediction for the unemployment rate in SL-County in the year x = 2001 (based on the above data) is:
y ≈ 0.28 × 2001 − 556.36 = 3.92%.
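Plugging a year into the fitted line is a one-line function. In this sketch the default coefficients are the slope 0.28 and intercept −556.36 from the regression line above; the function name `predict` is my own.

```python
def predict(x, slope=0.28, intercept=-556.36):
    """Regression prediction y = slope * x + intercept (coefficients from the example)."""
    return slope * x + intercept

y_2001 = predict(2001)
print(round(y_2001, 2))  # prints: 3.92
```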