INSY 7300 Notes on Sampling distributions (SMDs) from Underlying
Infinite Populations
F2009
By S. Maghsoodloo
As an example, consider the population {2, 5, 8, 8′, 11}, where the prime is used to
indicate that two elements of this size N = 5 population have identical values of 8.
Simple calculations show that the population mean E(Y) = μY = 6.80, the variance σY² =
V(Y) = E[(Y − 6.8)²] = 9.360 ⇒ the standard deviation σ = 3.0594, and the coefficient of
variation of Y is given by CVY = σ/E(Y) = 44.99%.
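As a quick check (a sketch, not part of the original notes), these population parameters can be computed directly:

```python
# Verify the population parameters of {2, 5, 8, 8', 11}:
# mean mu = E(Y), variance sigma^2 = E[(Y - mu)^2] (divisor N, a population),
# standard deviation sigma, and coefficient of variation CV = sigma / mu.
pop = [2, 5, 8, 8, 11]
N = len(pop)

mu = sum(pop) / N
var = sum((y - mu) ** 2 for y in pop) / N
sigma = var ** 0.5
cv = sigma / mu

print(round(mu, 2), round(var, 2), round(sigma, 4), round(100 * cv, 2))
# 6.8 9.36 3.0594 44.99
```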
Now consider a random sample of size n = 2 with replacement from the above
population. First, from the standpoint of sampling, the population is considered infinite
because it can never be exhausted, but if we do the sampling without replacement, the
population will be finite of size N = 5; in fact it will be exhausted after two random
samples of size n = 2. Because all the processes that we consider in this course are
hypothetically infinite, we use the sampling with replacement to illustrate the concept of
sampling distributions. Accordingly, let ȳ represent the mean of the random sample of
size n = 2 with replacement. Then the SMD (SaMpling Distribution), or the probability
mass function, of ȳ in the case of with replacement is given by

pmf(ȳ) = { 1/25,  ȳ = 2, 11
           2/25,  ȳ = 3.5
           5/25,  ȳ = 5
           6/25,  ȳ = 6.5, 8
           4/25,  ȳ = 9.5

⇒ E(ȳ), the weighted average of the sample mean over all possible values in the range
space Rȳ of ȳ, is E(ȳ) = Σ ȳ·pmf(ȳ) = 13/25 + 7/25 + 25/25 + 87/25 + 38/25 = 6.80 = μ
⇒ ȳ is an unbiased estimator of μ, i.e., the amount of bias in ȳ as an estimator of μ
is given by B(ȳ) = E(ȳ) − μ = 0.
Variance by definition is the weighted (or long-term) average of squared deviations
from the mean, i.e., V(ȳ) = E[(ȳ − μ)²] = E[(ȳ − 6.8)²] = [(−4.8)² + (4.2)²]/25 +
(−3.3)²·2/25 + (−1.8)²·5/25 + [(−0.3)² + 1.2²]·6/25 + 2.7²·4/25 = 4.680.
It can easily be proven that for all infinite populations in the universe, the
variance of the sample mean is equal to V(individuals)/(sample size), i.e., V(ȳ) =
V(Y)/n = σ²/n. For our example, n = 2 and V(Y) = 9.360, and thus V(ȳ) = 9.360/2
= 4.680, as before!
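The above can be verified by brute force (a sketch, not from the notes): enumerate all 25 equally likely ordered samples of size n = 2 with replacement and compute the mean and variance of ȳ directly.

```python
from itertools import product

pop = [2, 5, 8, 8, 11]
mu = sum(pop) / len(pop)                              # 6.8
var = sum((y - mu) ** 2 for y in pop) / len(pop)      # 9.36

# All 25 equally likely ordered samples of size n = 2 with replacement.
samples = list(product(pop, repeat=2))
means = [(a + b) / 2 for a, b in samples]

E_ybar = sum(means) / 25
V_ybar = sum((m - E_ybar) ** 2 for m in means) / 25

print(round(E_ybar, 2), round(V_ybar, 3))   # 6.8 4.68
```

Note that V_ybar equals var/2, illustrating V(ȳ) = σ²/n for n = 2.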
Further, the variance operator can be expressed in terms of the expected-value
operator as follows:
V(Y) = E[(Y − EY)²] = E[(Y − μ)²] = E[Y² − 2μY + μ²] = E(Y²) − 2μE(Y) + μ²
     = E(Y²) − 2μ² + μ² = E(Y²) − μ² = E(Y²) − [E(Y)]².
To illustrate the use of the above formula, we re-compute V(ȳ):
E(ȳ²) = (2² + 11²)·1/25 + 3.5²·2/25 + 5²·5/25 + 6.5²·6/25 + 8²·6/25 + 9.5²·4/25 = 50.92
⇒ V(ȳ) = E(ȳ²) − [E(ȳ)]² = 50.92 − 6.80² = 4.680, as before!
One measure of variability is the sample variance

S² = Σᵢ (yᵢ − ȳ)²/(n − 1) = CSS/(n − 1) = (USS − CF)/(n − 1),

where the sums extend over i = 1 to n, the USS = Σᵢ yᵢ² (the uncorrected sum of
squares), and the CF = (Σᵢ yᵢ)²/n (the correction factor). The pmf (Pr. Mass
function, or SMD) of S² for our example with n = 2 (with replacement) is given
by
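The equivalence of the definitional form and the USS − CF shortcut can be checked on a hypothetical sample (the sample {2, 11} is chosen here for illustration):

```python
# Two equivalent formulas for S^2: the definition CSS/(n-1) and (USS - CF)/(n-1).
y = [2, 11]                     # one possible sample of size n = 2
n = len(y)
ybar = sum(y) / n

s2_def = sum((yi - ybar) ** 2 for yi in y) / (n - 1)   # CSS / (n - 1)
USS = sum(yi ** 2 for yi in y)                         # uncorrected sum of squares
CF = sum(y) ** 2 / n                                   # correction factor
s2_short = (USS - CF) / (n - 1)

print(s2_def, s2_short)   # 40.5 40.5
```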
pmf(S²) = {  7/25,  S² = 0
            10/25,  S² = 4.5
             6/25,  S² = 18
             2/25,  S² = 40.5

⇒ E(S²) = Σ S²·p(S²) (summing over the range space of S²) = 0 + 45/25 + 108/25 +
81/25 = 9.360 = σ² ⇒ S² is an unbiased estimator of σ² for all infinite populations; if
the population were finite, this would not be the case. In fact, for all finite
populations, E(S²) = Nσ²/(N − 1).
Just because S² is an unbiased estimator of σ² for all infinite populations, it does
not at all imply that S is an unbiased estimator of σ, as illustrated below for our
example.

pmf(S) = {  7/25,  S = 0
           10/25,  S = √4.5
            6/25,  S = √18
            2/25,  S = √40.5

⇒ E(S) = Σ S·p(S) (summing over the range space of S) = 2.375878785 < σ =
3.0594 ⇒ the amount of bias in S for this example is B(S) = 2.375878785 −
3.059412 = −0.6835329234. In fact, for all infinite populations, S is a biased
estimator of σ. It will be shown below that σ perforce must always exceed E(S),
i.e., the amount of bias in S as an estimator of σ is always negative. By definition:

V(S) = E(S²) − [E(S)]² = σ² − [E(S)]² > 0 ⇒ σ² > [E(S)]² ⇒ σ > E(S), or E(S) < σ ⇒
B(S) = E(S) − σ < 0.
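The numerical values of E(S) and B(S) quoted above can be reproduced by the same enumeration (a sketch, not from the notes):

```python
from itertools import product

pop = [2, 5, 8, 8, 11]
mu = sum(pop) / 5
sigma = (sum((y - mu) ** 2 for y in pop) / 5) ** 0.5   # 3.0594...

# For n = 2, S = sqrt(S^2) = |y1 - y2| / sqrt(2).
s_values = [((a - b) ** 2 / 2) ** 0.5 for a, b in product(pop, repeat=2)]
E_S = sum(s_values) / 25

bias = E_S - sigma
print(round(E_S, 6), round(bias, 6))   # 2.375879 -0.683533
```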
Further, note that for all rvs (random variables) in the universe, V(√Y) = E(Y) −
[E(√Y)]² > 0 ⇒ E(Y) > [E(√Y)]² ⇒ √E(Y) > E(√Y).
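The inequality √E(Y) > E(√Y) (an instance of Jensen's inequality for the concave square-root function) is easy to check on a small hypothetical pmf, chosen here purely for illustration:

```python
from math import sqrt

# Hypothetical pmf (not from the notes) with perfect-square support values.
pmf = {1: 0.25, 4: 0.50, 9: 0.25}

E_Y = sum(y * p for y, p in pmf.items())              # E(Y) = 4.5
E_sqrtY = sum(sqrt(y) * p for y, p in pmf.items())    # E(sqrt(Y)) = 2.0

print(round(sqrt(E_Y), 4), round(E_sqrtY, 4))   # 2.1213 2.0
```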
If the underlying distribution is the Laplace-Gaussian N(μ, σ²), then it can be proven
that E(S) = c4σ, where

c4 = √[2/(n − 1)] · Γ(n/2)/Γ[(n − 1)/2]

lies within the interval [0.797884561, 1), where c4 at n = 2 is equal to 0.797884561,
and the limit of c4 (as n → ∞) → 1. Thus, for a N(μ, σ²), an unbiased estimator of σ is
given by S/c4.
Exercise for Friday 08/19/2011. Use the following two independent pmfs

p(y1) = { 4/9,  y1 = 1
          2/9,  y1 = 2.5
          3/9,  y1 = 3

and

p(y2) = { 3/5,  y2 = 1.5
          2/5,  y2 = 2.0

in order to illustrate the following properties of the expected-value and variance
operators.
(1) E(Y1 +Y2) = E(Y1) + E(Y2)
(2) V(3Y1) = 9V(Y1)
(3) V(Y1 +Y2) = V(Y1) + V(Y2) only because Y1 and Y2 are considered independent.
(4) For the example at the beginning of these notes, compute the amount of bias for a
random sample of size 2 (with replacement) for the sample range R̂ and the
sample median ỹ. Further, compute the variance of ỹ and compare it against
V(ȳ).
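Properties (1)-(3) above can be illustrated numerically by enumerating the joint pmf of (Y1, Y2) under independence (a sketch, not a substitute for working the exercise by hand):

```python
from itertools import product

# The two independent pmfs from the exercise.
p1 = {1: 4/9, 2.5: 2/9, 3: 3/9}
p2 = {1.5: 3/5, 2.0: 2/5}

def E(pmf):                       # expected value of a discrete pmf
    return sum(y * p for y, p in pmf.items())

def V(pmf):                       # variance via E(Y^2) - [E(Y)]^2
    return sum(y * y * p for y, p in pmf.items()) - E(pmf) ** 2

# Joint pmf of the sum Y1 + Y2; independence gives p(y1, y2) = p(y1)*p(y2).
sum_pmf = {}
for (y1, q1), (y2, q2) in product(p1.items(), p2.items()):
    s = y1 + y2
    sum_pmf[s] = sum_pmf.get(s, 0) + q1 * q2

# (1) E(Y1 + Y2) = E(Y1) + E(Y2)
print(round(E(sum_pmf), 4), round(E(p1) + E(p2), 4))

# (2) V(3*Y1) = 9*V(Y1): scaling Y1 by 3 scales the variance by 9.
p3 = {3 * y: p for y, p in p1.items()}
print(round(V(p3), 4), round(9 * V(p1), 4))

# (3) V(Y1 + Y2) = V(Y1) + V(Y2), valid here only because of independence.
print(round(V(sum_pmf), 4), round(V(p1) + V(p2), 4))
```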
It has been shown in the statistical literature that the asymptotic SMD of the p-th
sample quantile, ŷp, is approximately Laplace-Gaussian with mean yp and
variance given by Var(ŷp) ≅ pq/[nf²(yp)], where f(y) is the underlying density
function and q = 1 − p. This implies that the SMD of the median from a normal
universe is approximately normal (for n > 10) with E(ŷ0.50) = μ, due to symmetry,
and variance V(ŷ0.50) = 0.5·0.5/[nf²(μ)] = 0.25/[n/(2πσ²)] = πσ²/(2n). Thus, the
SE(ŷ0.50) = σ·√[π/(2n)] = √(π/2)·SE(ȳ) = 1.25331414·SE(ȳ), which is larger
than the SE(ȳ) by roughly 25%. It has also been shown in the statistical literature that
when the sample size n from a N(μ, σ²) is small, then V(ŷ0.50)/V(ȳ) = 1, 1.35,
1.19, and 1.44 for n = 2, 3, 4, 5, respectively. Kendall & Stuart (1967), Vol. 2, p. 7
give these results but do not provide information for n = 6, 7, 8, 9, and 10.
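The factor √(π/2) ≈ 1.2533 can be seen in a Monte Carlo sketch (the sample size n = 25 and replication count are arbitrary choices, not from the notes): draw many normal samples and compare the spread of the sample median to that of the sample mean.

```python
import random
import statistics

# Monte Carlo check of SE(median)/SE(mean) for samples from N(0, 1);
# the asymptotic ratio is sqrt(pi/2) ~ 1.2533.
random.seed(1)
n, reps = 25, 20000
means, medians = [], []
for _ in range(reps):
    sample = [random.gauss(0, 1) for _ in range(n)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

ratio = statistics.pstdev(medians) / statistics.pstdev(means)
print(round(ratio, 3))   # close to 1.25 for moderate n
```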