Sampling Distributions

Sampling Distributions
Introduction
Distribution of Mean

Finite Population
Chi-Square Distribution
T Distribution
F Distribution
Order Statistics
Introduction

Statistics concerns itself mainly with conclusions and predictions resulting
from chance outcomes that occur in carefully planned experiments or
investigations.
o These chance outcomes constitute a subset, or sample, of
measurements or observations from a larger set of values called the
population.
o In the continuous case they are usually values of identically distributed
random variables, whose distribution is referred to as the population
distribution, or the infinite population sampled.

If X1, X2... Xn are independent and identically distributed random variables, we
say that they constitute a random sample from the infinite population given by
their common distribution.
o If $f(x_1, x_2, \ldots, x_n)$ is the value of the joint distribution of such a set of
random variables at $(x_1, x_2, \ldots, x_n)$, then
$$f(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} f(x_i).$$

Statistical inferences are usually based on statistics, that is, on random
variables that are functions of a set of random variables X1, X2... Xn
constituting a random sample. Typical of these “statistics” are:
o If X1, X2... Xn constitute a random sample, then
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
is called the sample mean and
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$$
is called the sample variance.
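As a concrete check of these two definitions, here is a minimal Python sketch (the data values are arbitrary illustrations) that computes the sample mean and the sample variance with the n - 1 divisor:

```python
def sample_mean(xs):
    # X-bar: the arithmetic average of the observations.
    return sum(xs) / len(xs)

def sample_variance(xs):
    # S^2: squared deviations from the mean, divided by n - 1.
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_mean(data), sample_variance(data))
```

The standard library's statistics.mean and statistics.variance implement the same two formulas and can be used instead.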
Distribution of Mean

Since statistics are themselves random variables, their values will vary from
sample to sample, and it is customary to refer to their distributions as
sampling distributions.

Sampling distribution of the mean: If X1, X2... Xn constitute a random sample
from an infinite population with the mean $\mu$ and the variance $\sigma^2$, then
$$E(\bar{X}) = \mu \quad\text{and}\quad \mathrm{var}(\bar{X}) = \frac{\sigma^2}{n}.$$
o $E(\bar{X}) = E\left(\sum_{i=1}^{n} \frac{X_i}{n}\right) = \sum_{i=1}^{n} \frac{1}{n} E(X_i) = n \cdot \frac{1}{n} \cdot \mu = \mu$
o $\mathrm{var}(\bar{X}) = E([\bar{X} - E(\bar{X})]^2) = E\left[\left(\sum_{i=1}^{n} \frac{X_i}{n} - \sum_{i=1}^{n} \frac{1}{n}\mu\right)^2\right]$
$= E\left[\left(\sum_{i=1}^{n} \frac{1}{n}(X_i - \mu)\right)^2\right] = \sum_{i=1}^{n} \frac{1}{n^2} E[(X_i - \mu)^2]$
(the cross-product terms vanish because the $X_i$ are independent)
$= n \cdot \frac{1}{n^2} \cdot \sigma^2 = \frac{\sigma^2}{n}$
o It is customary to write $E(\bar{X})$ as $\mu_{\bar{X}}$ and $\mathrm{var}(\bar{X})$ as $\sigma_{\bar{X}}^2$, and to refer to
$\sigma_{\bar{X}}$ as the standard error of the mean.

Central limit theorem: If X1, X2... Xn constitute a random sample from an
infinite population with the mean $\mu$, the variance $\sigma^2$, and the moment-
generating function $M_X(t)$, then the limiting distribution of
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
as $n \to \infty$ is the standard normal distribution.
o Sometimes, the central limit theorem is interpreted incorrectly as
implying that the distribution of $\bar{X}$ approaches a normal distribution
as $n \to \infty$. This is incorrect because $\mathrm{var}(\bar{X}) \to 0$ as $n \to \infty$.
o On the other hand, the central limit theorem does justify approximating
the distribution of $\bar{X}$ with a normal distribution having the mean $\mu$
and the variance $\sigma^2/n$ when n is large (in practice, $n \geq 30$), regardless
of the actual shape of the population sampled.
o If the population is normal, the distribution of $\bar{X}$ is a normal
distribution regardless of the size of n: If $\bar{X}$ is the mean of a random
sample of size n from a normal population with the mean $\mu$ and the
variance $\sigma^2$, its sampling distribution is a normal distribution with
the mean $\mu$ and the variance $\sigma^2/n$.
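These facts about the sample mean can be checked by simulation. The sketch below uses only the Python standard library and an arbitrary uniform population; it verifies that the average of the sample means is near $\mu$, that their variance is near $\sigma^2/n$, and that the standardized mean behaves like a standard normal variable:

```python
import math
import random
import statistics

random.seed(0)

# Population: uniform on [0, 1], so mu = 0.5 and sigma^2 = 1/12.
mu, var = 0.5, 1.0 / 12.0
n, trials = 30, 20000

# Draw many samples of size n and record each sample mean.
means = [statistics.fmean(random.random() for _ in range(n))
         for _ in range(trials)]

# E(X-bar) should be close to mu, var(X-bar) close to sigma^2 / n = 1/360.
print(statistics.fmean(means), statistics.pvariance(means))

# Standardize: Z = (X-bar - mu) / (sigma / sqrt(n)) is approximately N(0, 1),
# so about 95% of the Z values should fall inside (-1.96, 1.96).
se = math.sqrt(var / n)
inside = sum(abs((m - mu) / se) < 1.96 for m in means) / trials
print(inside)
```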
Distribution of Mean: Finite Population (without replacement)

If an experiment consists of selecting one or more values from a finite set of
numbers $\{c_1, c_2, \ldots, c_N\}$, this set is referred to as a finite population of size N.

If X1 is the first value drawn from a finite population of size N, X2 is the
second value drawn... Xn is the nth value drawn, and the joint probability
distribution of these n random variables is given by
$$f(x_1, x_2, \ldots, x_n) = \frac{1}{N(N-1)\cdots(N-n+1)}$$
for each ordered n-tuple of values of these random variables, then X1, X2... Xn
are said to constitute a random sample from the given finite population.
o The probability for each subset of n of the N elements of the finite
population (regardless of the order) is
$$n! \cdot \frac{1}{N(N-1)\cdots(N-n+1)} = \frac{1}{\binom{N}{n}}$$
This is often given as an alternative definition or as a criterion for the
selection of a random sample of size n from a finite population of size
N: each of the $\binom{N}{n}$ possible samples must have the same probability.

The marginal distribution of Xr is given by
$$f(x_r) = \frac{1}{N} \quad\text{for } x_r = c_1, c_2, \ldots, c_N,$$
for $r = 1, 2, \ldots, n$.

The mean and the variance of the finite population $\{c_1, c_2, \ldots, c_N\}$ are
$$\mu = \frac{1}{N}\sum_{i=1}^{N} c_i \quad\text{and}\quad \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (c_i - \mu)^2.$$
The joint marginal distribution of any two of the random variables X1, X2, ...,
Xn is given by
$$g(x_r, x_s) = \frac{1}{N(N-1)}$$
for each ordered pair of elements of the finite population.

If Xr and Xs are the rth and sth random variables of a random sample of size n
drawn from the finite population $\{c_1, c_2, \ldots, c_N\}$, then
$$\mathrm{cov}(X_r, X_s) = -\frac{\sigma^2}{N-1}.$$
If $\bar{X}$ is the mean of a random sample of size n from a finite population of size
N with the mean $\mu$ and the variance $\sigma^2$, then $E(\bar{X}) = \mu$ and
$$\mathrm{var}(\bar{X}) = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}.$$
o Note that $\mathrm{var}(\bar{X})$ differs from that for an infinite population only by the
finite population correction factor $\frac{N-n}{N-1}$. When N is large compared
to n, the difference between the two formulas for $\mathrm{var}(\bar{X})$ is usually
negligible.
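On a tiny population the correction factor can be verified exactly by brute force (the population values below are arbitrary): enumerate every equally likely ordered sample drawn without replacement and compare the resulting variance of the sample means with the formula.

```python
import itertools
import statistics

# Finite population of N = 5 values; mu and sigma^2 use the 1/N convention.
pop = [1, 2, 3, 4, 5]
N = len(pop)
mu = statistics.fmean(pop)
sigma2 = sum((c - mu) ** 2 for c in pop) / N

n = 2
# Every ordered sample of size n without replacement is equally likely,
# so the exact var(X-bar) is a plain average over all of them.
means = [statistics.fmean(s) for s in itertools.permutations(pop, n)]
exact_var = sum((m - mu) ** 2 for m in means) / len(means)

# Compare with (sigma^2 / n) * (N - n) / (N - 1).
formula = (sigma2 / n) * (N - n) / (N - 1)
print(exact_var, formula)
```

Here $\sigma^2 = 2$, so the formula gives $(2/2)\cdot(3/4) = 0.75$, and the enumeration agrees exactly.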
The Chi-Square Distribution
If X has the standard normal distribution, then $X^2$ has the chi-square
distribution with $\nu = 1$ degree of freedom. Thus, the chi-square distribution plays an
important role in problems of sampling from normal populations.

A random variable X has a chi-square distribution, and it is referred to as a
chi-square random variable, if and only if its probability density is given by
$$f(x) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2 - 1} e^{-x/2}$$
for x > 0, and f(x) = 0 elsewhere.
o The mean and variance of the chi-square distribution are given by
$\mu = \nu$ and $\sigma^2 = 2\nu$, and its moment-generating function is given by
$$M_X(t) = (1 - 2t)^{-\nu/2}.$$

If X1, X2... Xn are independent random variables having standard normal
distributions, then
$$Y = \sum_{i=1}^{n} X_i^2$$
has the chi-square distribution with $\nu = n$ degrees of freedom.

If X1, X2... Xn are independent random variables having chi-square
distributions with $\nu_1, \nu_2, \ldots, \nu_n$ degrees of freedom, then
$$Y = \sum_{i=1}^{n} X_i$$
has the chi-square distribution with $\nu_1 + \nu_2 + \cdots + \nu_n$ degrees of freedom.

If X1 and X2 are independent random variables, X1 has a chi-square distribution
with $\nu_1$ degrees of freedom, and X1 + X2 has a chi-square distribution with
$\nu > \nu_1$ degrees of freedom, then X2 has a chi-square distribution with
$\nu - \nu_1$ degrees of freedom.

If $\bar{X}$ and $S^2$ are the mean and the variance of a random sample of size n from
a normal population with the mean $\mu$ and the variance $\sigma^2$, then
1. $\bar{X}$ and $S^2$ are independent;
2. the random variable
$$\frac{(n-1)S^2}{\sigma^2}$$
has a chi-square distribution with n - 1 degrees of freedom.
o Proof for part 2: Begin with the identity
$$\sum_{i=1}^{n} (X_i - \mu)^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 + n(\bar{X} - \mu)^2,$$
then divide each term by $\sigma^2$ and substitute $(n-1)S^2$ for $\sum_{i=1}^{n} (X_i - \bar{X})^2$:
$$\sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 = \frac{(n-1)S^2}{\sigma^2} + \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2$$
The term on the left-hand side is a random variable having a chi-square
distribution with n degrees of freedom, the second term on the right-hand
side is a random variable having a chi-square distribution with 1
degree of freedom, and since $\bar{X}$ and $S^2$ are independent (without proof
here), it follows that the first term on the right-hand side is a random
variable having a chi-square distribution with n - 1 degrees of freedom.
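A quick standard-library simulation (with arbitrary population parameters) supports the theorem: the empirical mean and variance of $(n-1)S^2/\sigma^2$ should match the chi-square moments $\nu$ and $2\nu$ given above.

```python
import random
import statistics

random.seed(1)

mu, sigma = 10.0, 2.0
n, trials = 5, 40000

# For each trial, compute (n - 1) S^2 / sigma^2 from a normal sample of size n.
vals = []
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    s2 = statistics.variance(xs)          # sample variance with the n - 1 divisor
    vals.append((n - 1) * s2 / sigma ** 2)

# A chi-square variable with nu = n - 1 = 4 degrees of freedom has
# mean nu and variance 2 nu, so the empirical moments should be near 4 and 8.
print(statistics.fmean(vals), statistics.pvariance(vals))
```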

When the number of degrees of freedom is greater than 30, probabilities related to chi-square
distributions are usually approximated with normal distributions: If X
is a random variable having a chi-square distribution with $\nu$ degrees of
freedom and $\nu$ is large, the distribution of
$$\frac{X - \nu}{\sqrt{2\nu}}, \quad\text{and alternatively}\quad \sqrt{2X} - \sqrt{2\nu},$$
can be approximated with the standard normal distribution.
The t Distribution
In realistic applications the population standard deviation $\sigma$ is unknown. This
makes it necessary to replace $\sigma$ with an estimate, usually the sample
standard deviation S.

If Y and Z are independent random variables, Y has a chi-square distribution
with $\nu$ degrees of freedom, and Z has the standard normal distribution, then
the distribution of
$$T = \frac{Z}{\sqrt{Y/\nu}}$$
is given by
$$f(t) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi\nu}\,\Gamma\!\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$$
for $-\infty < t < \infty$, and is called the t distribution with $\nu$ degrees of freedom.

If $\bar{X}$ and $S^2$ are the mean and the variance of a random sample of size n from
a normal population with the mean $\mu$ and the variance $\sigma^2$, then
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has the t distribution with n - 1 degrees of freedom.
o The random variables
$$Y = \frac{(n-1)S^2}{\sigma^2} \quad\text{and}\quad Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
have, respectively, a chi-square distribution with n - 1 degrees of freedom
and the standard normal distribution. Since they are also independent,
substitution into the formula for $T = Z/\sqrt{Y/\nu}$ with $\nu = n - 1$ yields the result.
When the number of degrees of freedom is 30 or more, probabilities related to the t
distribution are usually approximated with the use of normal distributions.
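The heavier tails of the t distribution for small n can be seen by simulating $T = (\bar{X} - \mu)/(S/\sqrt{n})$; the critical value 2.776 for 4 degrees of freedom is taken from standard t tables, versus 1.96 for the normal distribution.

```python
import math
import random
import statistics

random.seed(2)

mu, sigma = 0.0, 1.0
n, trials = 5, 40000

# T = (X-bar - mu) / (S / sqrt(n)) for each normal sample of size n.
ts = []
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    t = (statistics.fmean(xs) - mu) / (statistics.stdev(xs) / math.sqrt(n))
    ts.append(t)

# With n - 1 = 4 degrees of freedom, P(|T| > 2.776) = 0.05, while
# |T| exceeds the normal cutoff 1.96 noticeably more than 5% of the time.
frac_t = sum(abs(t) > 2.776 for t in ts) / trials
frac_z = sum(abs(t) > 1.960 for t in ts) / trials
print(frac_t, frac_z)
```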
The F Distribution

If U and V are independent random variables having chi-square distributions
with $\nu_1$ and $\nu_2$ degrees of freedom, then
$$F = \frac{U/\nu_1}{V/\nu_2}$$
is a random variable having an F distribution, namely, a random variable
whose probability density is given by
$$g(f) = \frac{\Gamma\!\left(\frac{\nu_1 + \nu_2}{2}\right)}{\Gamma\!\left(\frac{\nu_1}{2}\right)\Gamma\!\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\frac{\nu_1}{2}} f^{\frac{\nu_1}{2} - 1} \left(1 + \frac{\nu_1}{\nu_2} f\right)^{-\frac{1}{2}(\nu_1 + \nu_2)}$$
for $f > 0$, and $g(f) = 0$ elsewhere.

If $S_1^2$ and $S_2^2$ are the variances of independent random samples of sizes n1 and
n2 from normal distributions with the variances $\sigma_1^2$ and $\sigma_2^2$, then
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$
is a random variable having an F distribution with $n_1 - 1$ and $n_2 - 1$ degrees
of freedom.
o $\chi_1^2 = \frac{(n_1 - 1)S_1^2}{\sigma_1^2}$ and $\chi_2^2 = \frac{(n_2 - 1)S_2^2}{\sigma_2^2}$ are independent
random variables having chi-square distributions with $n_1 - 1$ and
$n_2 - 1$ degrees of freedom. Substituting them for U and V
yields the result.

Applications of the F distribution arise in problems in which the variances of
two normal populations are compared.
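A simulation sketch of such a comparison (the variances and sample sizes are arbitrary): even though $\sigma_1^2 \neq \sigma_2^2$, the ratio $F = (S_1^2/\sigma_1^2)/(S_2^2/\sigma_2^2)$ follows the F distribution, whose mean $\nu_2/(\nu_2 - 2)$ depends only on the denominator degrees of freedom.

```python
import random
import statistics

random.seed(3)

# Two normal populations with different variances.
sigma1, sigma2 = 2.0, 3.0
n1, n2, trials = 6, 11, 30000

fs = []
for _ in range(trials):
    s1 = statistics.variance([random.gauss(0.0, sigma1) for _ in range(n1)])
    s2 = statistics.variance([random.gauss(0.0, sigma2) for _ in range(n2)])
    # F = (S1^2 / sigma1^2) / (S2^2 / sigma2^2) is free of the unknown variances.
    fs.append((s1 / sigma1 ** 2) / (s2 / sigma2 ** 2))

# An F variable with nu2 = n2 - 1 = 10 denominator degrees of freedom has
# mean nu2 / (nu2 - 2) = 1.25, so the empirical mean should be near that.
print(statistics.fmean(fs))
```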
Order Statistics

Consider a random sample of size n from an infinite population with a
continuous density and suppose we arrange the values of X1, X2... Xn according
to size. If we look upon the smallest of the x's as a value of the random
variable Y1, the next largest as a value of the random variable Y2, the next
largest after that as a value of the random variable Y3, ..., and the largest as a
value of the random variable Yn, we refer to these Y's as order statistics.
o In particular, Y1 is the first order statistic, Y2 is the second order
statistic, and so on. (As the population is infinite and continuous, the
probability that any two of the x's will be alike is zero.)

For random samples of size n from an infinite population which has the value
f(x) at x, the probability density of the rth order statistic Yr is given by
$$g_r(y_r) = \frac{n!}{(r-1)!\,(n-r)!} \left[\int_{-\infty}^{y_r} f(x)\,dx\right]^{r-1} f(y_r) \left[\int_{y_r}^{\infty} f(x)\,dx\right]^{n-r}$$
for $-\infty < y_r < \infty$.

For large n, the sampling distribution of the median for random samples of
size 2n + 1 is approximately normal with the mean $\tilde{\mu}$ and the variance
$$\frac{1}{8[f(\tilde{\mu})]^2 n}.$$