Vector Variance, Covariance, & Correlation

advertisement
1
Computing the Variance
We can employ vector operations to compute the variance of an array of numbers. Consider data
on a variable X to be expressed as a vector x (hypothetical scores on the first quiz)
x = (15, 11, 9, 14, 13, 12, 6, 13, 8, 10)
variance is the sum of squared deviations divided by the number of observations:
N
 ( X i  X )2
S 
i 1
2
N
N
, where X 
X
i 1
i
N
Use vector notation and operations at each step along the way to compute the variance of this
data set.
1. We need the sum of the observations on X
N
X
i 1
i
= 15+11+9+14+13+12+6+13+8+10
1
1

1

1
1
using vector notation, x1 = (15, 11, 9, 14, 13, 12, 6, 13, 8, 10)  
1
1

1
1

1
= (15)(1) + (11)(1) + … + (10)(1) = 111
2. We next need to compute the mean
N
X 
X
i 1
i
N
Michael C. Rodriguez
=
111
= 11.1
10
Using vector notation, 1/n (x1) = 1/10 (111) = 11.1
EPSY 8269: Matrix Algebra for Statistical Modeling
2
3. Now we need deviation scores, obtained by subtracting the mean from each observation.
X i  X  di
using vector notation, we can subtract a vector of means from the vector of observations
15 11.1
11 11.1
  

 9  11.1
  

14 11.1
13 11.1
d = x – m =  
=
12 11.1
 6  11.1
  

13 11.1
 8  11.1
  

10 11.1
 3 .9 
  0.1


 2.1


 2.9 
 1. 9 


 0 .9 
  5.1


 1. 9 
  3.1


  1.1
4. Next we compute the sum of squared deviations, using vector notation:
 3 .9 
  0.1


 2.1


 2.9 
 1.9 
ss = dd = (3.9, -0.1, -2.1, 2.9, 1.9, 0.9, -5.1, 1.9, -3.1, -1.1) 
 = 72.9
 0 .9 
  5.1


 1.9 
  3.1


  1.1
5. Finally, we take the average squared deviation to get the variance.
(1/10)ss = (1/10)(72.9) = 7.29
Michael C. Rodriguez
EPSY 8269: Matrix Algebra for Statistical Modeling
3
Computing the Covariance and Correlation
Covariance is found by taking the average cross product of deviation scores.
To do this we need to compute deviation scores for both X and Y and compute the average cross
product of the deviation scores.
r
Cov xy
sx s y
 xy

d x d y
n
x y
2
n
2
n

n
d x d x
n
d y d y

d x d y
d x d x d y d y
n
Matrix.
compute
compute
compute
compute
compute
compute
compute
compute
compute
compute
x = {15;11;9;14;13;12;6;13;8;10}.
y = {152;145;111;132;143;128;89;121;99;105}.
ones = Make(10,1,1).
meanx = (1/10)*T(x)*ones.
mx = Make(10,1,meanx).
dx = x - mx.
meany = (1/10)*T(y)*ones.
my = Make(10,1,meany).
dy = y - my.
cp = T(dx)*(dy).
print cp.
compute
compute
compute
compute
sx = sqrt(T(dx)*dx).
sy = sqrt(T(dy)*dy).
sxsy = sx*sy.
r = inv(sxsy)*cp.
print sx.
print sy.
print r.
End Matrix.
Run MATRIX procedure:
CP
SX
SY
R
468.5000000
8.538149682
63.50196847
.8640893313
------ END MATRIX ----Note: to create vectors, rows are separated by ; and columns by ,
Michael C. Rodriguez
EPSY 8269: Matrix Algebra for Statistical Modeling
Download