A Demonstration that the Sample Variance is Biased

Estimating the Population Mean (μ) and Variance (σ²)¹
George H Olson, Ph. D.
Leadership and Educational Studies
Appalachian State University
August 2009
Unbiased estimates of population parameters
A statistic, w, computed on a sample, is an unbiased estimate of a population parameter,
θ, if its expected value [ℰ (w)] is the parameter, θ.
(Note: the symbol ℰ is used here to denote the expectation operator. Other sources might use a
different symbol. In simple terms, the expectation operator is an instruction to compute the
expected value of the variable in parentheses following it.)
Unbiased estimate of the population mean
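It is easy to show, for example, that the sample mean, X̄, provides an unbiased estimate
of the population mean, μ.
\[
\mathcal{E}(\bar{X})
= \mathcal{E}\!\left(\frac{X_1 + X_2 + X_3 + \cdots + X_N}{N}\right)
= \frac{\mathcal{E}(X_1) + \mathcal{E}(X_2) + \mathcal{E}(X_3) + \cdots + \mathcal{E}(X_N)}{N}. \tag{1}
\]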
However, any ℰ(X_i) is, by definition, µ, for all observations taken from the same population.
Therefore,
\[
\mathcal{E}(\bar{X}) = \frac{N\,\mathcal{E}(X)}{N} = \frac{N\mu}{N} = \mu. \tag{2}
\]
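The result in (2) can be checked by brute force. The sketch below is illustrative only: the three-value population and the function name `expected_sample_mean` are assumptions, not part of the original material. It enumerates every equally likely ordered sample of size N drawn with replacement and averages the sample means exactly, using rational arithmetic.

```python
from fractions import Fraction
from itertools import product


def expected_sample_mean(population, n):
    """Exact average of the sample mean over all ordered samples of size n.

    Sampling is with replacement, so every observation has the same
    distribution (each population value is equally likely).
    """
    samples = list(product(population, repeat=n))
    total = sum(Fraction(sum(sample), n) for sample in samples)
    return total / len(samples)


population = [2, 4, 9]  # hypothetical population, chosen only for illustration
mu = Fraction(sum(population), len(population))

for n in (1, 2, 3, 4):
    # The expected sample mean should equal mu for every sample size.
    print(n, expected_sample_mean(population, n), mu)
```

For this population the expected sample mean matches µ at every sample size tried, which is what (2) predicts.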
Unbiased estimate of the population variance
In contrast, it can be shown that the sample variance, s², is a biased estimate of the
population variance, σ², i.e., that ℰ(s²) ≠ σ²:
\[
\mathcal{E}(s^2)
= \mathcal{E}\!\left(\frac{\sum_{i=1}^{N} X_i^2}{N} - \bar{X}^2\right)
= \mathcal{E}\!\left(\frac{\sum_{i=1}^{N} X_i^2}{N}\right) - \mathcal{E}(\bar{X}^2). \tag{3}
\]
¹ The material presented here was derived from Hays (1973), 2nd ed., pp. 272-274.
Consider, first, the first term to the right of the equal sign in (3), above.
\[
\mathcal{E}\!\left(\frac{\sum_{i=1}^{N} X_i^2}{N}\right)
= \frac{\sum_{i=1}^{N} \mathcal{E}(X_i^2)}{N}. \tag{4}
\]
By definition, the population variance is
\[
\sigma^2 = \mathcal{E}(X^2) - \mu^2; \tag{5}
\]
so that, for any observation, i,
\[
\mathcal{E}(X_i^2) = \sigma^2 + \mu^2. \tag{6}
\]
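As a concrete check of (5) and (6), the following sketch computes the three quantities for a fair six-sided die. The die population and the helper name `moments` are assumptions introduced here for illustration.

```python
from fractions import Fraction


def moments(values):
    """Return (mu, sigma2, e_x2) for a population of equally likely values."""
    n = len(values)
    mu = Fraction(sum(values), n)                    # population mean
    sigma2 = sum((x - mu) ** 2 for x in values) / n  # population variance
    e_x2 = Fraction(sum(x * x for x in values), n)   # expected value of X^2
    return mu, sigma2, e_x2


die = [1, 2, 3, 4, 5, 6]  # hypothetical population: faces of a fair die
mu, sigma2, e_x2 = moments(die)

print(mu, sigma2, e_x2)          # 7/2 35/12 91/6
print(e_x2 == sigma2 + mu ** 2)  # True
```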
Substituting (6) into (4) yields,
\[
\mathcal{E}\!\left(\frac{\sum_{i=1}^{N} X_i^2}{N}\right)
= \frac{\sum_{i=1}^{N} \mathcal{E}(X_i^2)}{N}
= \frac{\sum_{i=1}^{N} (\sigma^2 + \mu^2)}{N}
= \frac{N(\sigma^2 + \mu^2)}{N}
= \sigma^2 + \mu^2. \tag{7}
\]
Now, consider the second term to the right of the equal sign in (3), ℰ(X̄²). The variance of the
sampling distribution of means is:
\[
\sigma_{\bar{X}}^2
= \mathcal{E}(\bar{X}^2) - [\mathcal{E}(\bar{X})]^2
= \mathcal{E}(\bar{X}^2) - \mu^2. \tag{8}
\]
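Hence, rearranging the variance of the sampling distribution of means given in (8),
\[
\mathcal{E}(\bar{X}^2) = \sigma_{\bar{X}}^2 + \mu^2. \tag{9}
\]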
Substituting the expressions (7) and (9) into (3) yields:
\[
\mathcal{E}(s^2)
= \mathcal{E}\!\left(\frac{\sum_{i=1}^{N} X_i^2}{N}\right) - \mathcal{E}(\bar{X}^2)
= (\sigma^2 + \mu^2) - (\sigma_{\bar{X}}^2 + \mu^2)
= \sigma^2 - \sigma_{\bar{X}}^2. \tag{10}
\]
In words, the expected value of the sample variance is the difference between the population
variance, σ², and the variance of the distribution of sample means, σ_X̄². Since the variance of the
distribution of sample means typically is not zero, the sample variance underestimates the
population variance. Hence, the sample variance is a biased estimator of the population variance.
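The relationship in (10) can be verified exactly on a small example. The sketch below is a minimal check under stated assumptions: the population values and the helper names (`mean`, `variance_n`, `check_equation_10`) are hypothetical, and samples are drawn with replacement so that observations are independent.

```python
from fractions import Fraction
from itertools import product


def mean(values):
    """Exact arithmetic mean as a Fraction."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


def variance_n(values):
    """Variance with divisor N: mean of squares minus squared mean."""
    values = list(values)
    return mean(x * x for x in values) - mean(values) ** 2


def check_equation_10(population, n):
    """Return (E[s^2], sigma^2, variance of the sample means), all exact."""
    samples = list(product(population, repeat=n))  # every ordered sample
    expected_s2 = mean(variance_n(s) for s in samples)
    sigma2 = variance_n(population)
    var_of_means = variance_n([mean(s) for s in samples])
    return expected_s2, sigma2, var_of_means


population = [2, 4, 9]  # hypothetical population, for illustration only
expected_s2, sigma2, var_of_means = check_equation_10(population, 2)

print(expected_s2, sigma2, var_of_means)     # 13/3 26/3 13/3
print(expected_s2 == sigma2 - var_of_means)  # True
```

With N = 2 the average sample variance is only half the population variance here, which makes the size of the bias easy to see.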
It has already been demonstrated, in (2), that the sample mean, X̄, is an unbiased estimate of the
population mean, µ. Now we need an unbiased estimate, ŝ² (note the use of the caret, ^, to
denote an estimate), of the population variance, σ². In (10), it was shown that
\[
\mathcal{E}(s^2) = \sigma^2 - \sigma_{\bar{X}}^2,
\]
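which, after substituting σ_X̄² = σ²/N (the variance of the mean of N independent observations), yields
\[
\mathcal{E}(s^2)
= \sigma^2 - \frac{\sigma^2}{N}
= \frac{N\sigma^2}{N} - \frac{\sigma^2}{N}
= \left(\frac{N-1}{N}\right)\sigma^2. \tag{11}
\]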
Equation (11) shows that the average of the sample variances, ℰ(s²), is too small by a factor of
N/(N − 1); that is, ℰ(s²) must be multiplied by N/(N − 1) to recover σ².
Hence, an unbiased estimate of σ² is given by
\[
\hat{s}^2 = \frac{N}{N-1}\, s^2.
\]
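The effect of the N/(N − 1) correction can be illustrated by exact enumeration. This is a sketch under stated assumptions: the population is hypothetical, `average_variances` is a name introduced here, and samples are drawn with replacement. It relies on Python's standard `statistics` module, in which `pvariance` uses the divisor N and `variance` uses the divisor N − 1.

```python
from fractions import Fraction
from itertools import product
import statistics


def average_variances(population, n):
    """Exact averages of s^2 (divisor N) and of N/(N-1) * s^2 over all samples."""
    samples = [[Fraction(x) for x in s] for s in product(population, repeat=n)]
    s2_values = [statistics.pvariance(s) for s in samples]  # divisor N
    corrected = [Fraction(n, n - 1) * s2 for s2 in s2_values]
    return sum(s2_values) / len(samples), sum(corrected) / len(samples)


population = [2, 4, 9]  # hypothetical population, for illustration only
sigma2 = statistics.pvariance([Fraction(x) for x in population])

avg_s2, avg_corrected = average_variances(population, 3)
print(avg_s2, avg_corrected, sigma2)

# For a single sample, statistics.variance already applies the N - 1 divisor,
# so it equals N/(N-1) times the divisor-N variance.
sample = [Fraction(2), Fraction(4), Fraction(9)]
print(statistics.variance(sample) == Fraction(3, 2) * statistics.pvariance(sample))
```

The uncorrected average falls short of σ², while the corrected average matches it.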
It is easy to show that ŝ² is an unbiased estimate of σ²:
\[
\mathcal{E}(\hat{s}^2)
= \left(\frac{N}{N-1}\right)\mathcal{E}(s^2)
= \left(\frac{N}{N-1}\right)\left(\frac{N-1}{N}\right)\sigma^2
= \sigma^2.
\]
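Unbiased estimate of the standard error of the mean, σ_X̄
The unbiased estimate of σ_X̄ is given by s_X̄, where
\[
s_{\bar{X}}
= \sqrt{\frac{\hat{s}^2}{N}}
= \sqrt{\frac{1}{N}\cdot\frac{N}{N-1}\, s^2}
= \frac{\sqrt{\hat{s}^2}}{\sqrt{N}}
= \frac{\hat{s}}{\sqrt{N}}. \tag{12}
\]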
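In words, the estimate of the standard error of the mean is ŝ, the square root of the unbiased
estimate of the population variance, divided by the square root of the sample size. Strictly,
ŝ²/N is an unbiased estimate of σ_X̄²; its square root, s_X̄, is only approximately unbiased for σ_X̄.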
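The computation in (12) amounts to a few lines of code. The sketch below uses made-up data and a hypothetical helper name, `standard_error`; it follows the steps of (12) and then compares the result with the equivalent route through the standard library's N − 1 standard deviation.

```python
import math
import statistics


def standard_error(sample):
    """Estimated standard error of the mean: sqrt(s_hat^2 / N)."""
    n = len(sample)
    s2 = statistics.pvariance(sample)  # sample variance with divisor N
    s_hat2 = n / (n - 1) * s2          # corrected variance estimate
    return math.sqrt(s_hat2 / n)


sample = [4, 8, 6, 5, 7]  # hypothetical data, for illustration only
se = standard_error(sample)

# Same value via the library's N - 1 standard deviation: s_hat / sqrt(N).
se_alt = statistics.stdev(sample) / math.sqrt(len(sample))
print(se, se_alt)
```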