SD/Correlation Computations in Easy Steps
Math 1070-1, Spring 2003 (Univ. of Utah)

Example: (SL-County unemployment data)

    Year (x)    Rate in % (y)
    1997        2.7
    1998        3.4
    1999        3.4
    2000        3
    2001        4.3

Sample size: $n = 5$.

Means:

$$\bar{x} = (1997 + \cdots + 2001)/5 = 1999, \qquad \bar{y} = (2.7 + \cdots + 4.3)/5 = 3.36.$$

The standard deviation of x: (First the square)

$$S_x^2 = \frac{1}{n-1} \sum (x - \bar{x})^2$$

    x − x̄    (x − x̄)²
    −2        4
    −1        1
     0        0
     1        1
     2        4

So:

$$S_x^2 = \frac{1}{4}(4 + 1 + 0 + 1 + 4) = 2.5.$$

Therefore, $S_x = \sqrt{2.5} \approx 1.581139$ years (without rounding). What does this mean?

Also,

    y − ȳ     (y − ȳ)²
    −0.66     0.44
     0.04     0
     0.04     0
    −0.36     0.13
     0.94     0.88

So:

$$S_y^2 \approx \frac{1}{4}(0.44 + 0 + 0 + 0.13 + 0.88) \approx 0.363.$$

Therefore, $S_y \approx \sqrt{0.363} \approx 0.6024948\%$ (without rounding).

For correlation, let me start by reminding you of the formula:

$$r = \frac{1}{n-1} \sum \left( \frac{x - \bar{x}}{S_x} \right) \left( \frac{y - \bar{y}}{S_y} \right) = \frac{1}{n-1} \sum \mathrm{SU}_x \, \mathrm{SU}_y,$$

where SU means "in standard units." In other words, the above says, "first compute a column of x in standard units and one for y. Then cross-multiply and add. Finally, divide by n − 1."

Now we are off to work out the details, which I will take pains to do very meticulously so as to avoid those silly (and unacceptable) errors.

Recall that $\mathrm{SU}_x = (x - \bar{x})/S_x$. So:

    x       x − x̄    SUx
    1997    −2       −1.3
    1998    −1       −0.6
    1999     0        0
    2000     1        0.6
    2001     2        1.3

Ditto for the y's:

    y      y − ȳ     SUy
    2.7    −0.66     −1.1
    3.4     0.04      0.07
    3.4     0.04      0.07
    3      −0.36     −0.6
    4.3     0.94      1.56

So:

    SUx     SUy      SUx·SUy
    −1.3    −1.1      1.39
    −0.6     0.07    −0.04
     0       0.07     0
     0.6    −0.6     −0.38
     1.3     1.56     1.97

$$r = \frac{1}{4}\left[ 1.39 + (-0.04) + 0 + (-0.38) + 1.97 \right] \approx 0.7348094 \quad \text{(no rounding)}.$$

Regression

The equation of the regression line is: $\mathrm{SU}_y = r \, \mathrm{SU}_x$. I.e.,

$$\frac{y - \bar{y}}{S_y} = r \, \frac{x - \bar{x}}{S_x}.$$

Solve for y (DO IT!) to obtain:

$$y = r S_y \, \frac{x - \bar{x}}{S_x} + \bar{y} = \underbrace{\frac{r S_y}{S_x}}_{\text{(slope)}} \, x + \underbrace{\left( \bar{y} - \frac{r S_y}{S_x} \, \bar{x} \right)}_{\text{(intercept)}}.$$

In our Example above, we had

$$S_x \approx 1.58, \qquad S_y \approx 0.6, \qquad \bar{x} = 1999, \qquad \bar{y} = 3.36, \qquad r \approx 0.73.$$

So slope $= (r S_y / S_x) \approx (0.73 \times 0.6 / 1.58) = 0.28$ (without rounding). Similarly, intercept $= -556.36$ (without rounding; check this!)

So, the regression line (in the previous Example) is:

$$y = 0.28x - 556.36.$$
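The hand computations above can be double-checked with a short Python sketch. This is only an illustration and is not part of the original handout: the helper names `mean`, `sample_sd`, `standard_units`, and `correlation` are hypothetical, and the code assumes the same n − 1 divisor used throughout the example.

```python
import math

# SL-County data from the example above.
years = [1997, 1998, 1999, 2000, 2001]
rates = [2.7, 3.4, 3.4, 3.0, 4.3]


def mean(values):
    """Arithmetic mean."""
    return sum(values) / len(values)


def sample_sd(values):
    """Standard deviation with the n - 1 divisor used in the handout."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def standard_units(values):
    """Convert each value to standard units: (value - mean) / SD."""
    m, s = mean(values), sample_sd(values)
    return [(v - m) / s for v in values]


def correlation(xs, ys):
    """Cross-multiply the standard units, add, and divide by n - 1."""
    su_x, su_y = standard_units(xs), standard_units(ys)
    return sum(a * b for a, b in zip(su_x, su_y)) / (len(xs) - 1)


s_x, s_y = sample_sd(years), sample_sd(rates)
r = correlation(years, rates)

# Regression line: slope = r * S_y / S_x, intercept = y_bar - slope * x_bar.
slope = r * s_y / s_x
intercept = mean(rates) - slope * mean(years)

print(f"S_x = {s_x:.6f}, S_y = {s_y:.6f}")
print(f"r = {r:.7f}")
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Because the script carries full precision through every step, it should reproduce the "without rounding" values quoted in the text, rather than the rounded entries shown in the tables.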
The regression prediction for the unemployment rate in SL County in the year $x = 2001$ (based on the above data) is $y \approx 0.28 \times 2001 - 556.36 = 3.92\%$.
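The prediction can also be reproduced in code. The sketch below is a minimal, hypothetical illustration (the name `predict_rate` is not from the handout). It relies on the fact that the slope $r S_y / S_x$ simplifies to $\sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$, since the $n - 1$ divisors cancel.

```python
# SL-County data from the example above.
years = [1997, 1998, 1999, 2000, 2001]
rates = [2.7, 3.4, 3.4, 3.0, 4.3]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(rates) / n

# r * S_y / S_x equals sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2),
# because the n - 1 divisors cancel.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, rates))
sxx = sum((x - x_bar) ** 2 for x in years)
slope = sxy / sxx
intercept = y_bar - slope * x_bar


def predict_rate(year):
    """Regression prediction of the unemployment rate (in %) for a given year."""
    return slope * year + intercept


print(f"Predicted rate for 2001: {predict_rate(2001):.2f}%")
```

For comparison, the observed 2001 rate in the data is 4.3%, so the line underpredicts that year by about 0.38 percentage points.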