Notes 21 - Wharton Statistics Department

Statistics 512 Notes 21: Sufficiency and Its Implications
Sufficiency Definition: A statistic $Y = u(X_1, \ldots, X_n)$ is said to be sufficient for $\theta$ if the conditional distribution of $X_1, \ldots, X_n$ given $Y = y$ does not depend on $\theta$ for any value of $y$.
Example 1: Let $X_1, \ldots, X_n$ be a sequence of independent Bernoulli random variables with $P(X_i = 1) = \theta$. Then $Y = \sum_{i=1}^n X_i$ is sufficient for $\theta$.
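Numerical aside (not part of the derivation): the following minimal Python sketch, with arbitrary illustrative choices of $n = 5$, the target sequence $(1,1,0,1,0)$, and two values of $\theta$, estimates by simulation the conditional probability of observing that particular sequence given that the sum equals $s$. Under both values of $\theta$ the estimate is close to $1/\binom{n}{s}$, consistent with the conditional distribution not depending on $\theta$.

```python
import numpy as np
from math import comb

# Illustrative Monte Carlo check (hypothetical values: n = 5, theta in {0.3, 0.7},
# target sequence (1, 1, 0, 1, 0)): given Y = sum of the X_i, every sequence with
# that sum should be equally likely, no matter what theta is.
rng = np.random.default_rng(0)
n, reps = 5, 200_000
target = np.array([1, 1, 0, 1, 0])
s = target.sum()

for theta in (0.3, 0.7):
    X = rng.binomial(1, theta, size=(reps, n))
    given_sum = X[X.sum(axis=1) == s]                  # keep only draws with Y = s
    prob = np.all(given_sum == target, axis=1).mean()  # estimate of P(sequence | Y = s)
    print(f"theta={theta}: estimate {prob:.4f} vs 1/C({n},{s}) = {1/comb(n, s):.4f}")
```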
Example 2: Let $X_1, \ldots, X_n$ be iid Uniform$(0, \theta)$. Consider the statistic $Y = \max_{1 \le i \le n} X_i$. We have shown before (see Notes 1) that
$$f_Y(y) = \begin{cases} \dfrac{n y^{n-1}}{\theta^n}, & 0 \le y \le \theta \\ 0, & \text{elsewhere.} \end{cases}$$
For $y \le \theta$, we have (interpreting the ratio below as a ratio of densities)
$$P(X_1 \le x_1, \ldots, X_n \le x_n \mid Y = y) = \frac{P(X_1 \le x_1, \ldots, X_n \le x_n,\ Y = y)}{P(Y = y)} = \frac{\dfrac{1}{\theta^n}\, I_{\{y \le \theta\}}}{\dfrac{n y^{n-1}}{\theta^n}\, I_{\{y \le \theta\}}} = \frac{1}{n y^{n-1}},$$
which does not depend on $\theta$. For $y > \theta$, $P(X_1 \le x_1, \ldots, X_n \le x_n \mid Y = y) = 0$.
Thus, the conditional distribution does not depend on $\theta$, and $Y = \max_{1 \le i \le n} X_i$ is a sufficient statistic.
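Numerical aside: the density of $Y$ quoted above can be checked by simulation. The sketch below, with arbitrary illustrative values of $n$ and $\theta$, compares the empirical CDF of the sample maximum with the CDF implied by the formula, $F_Y(y) = (y/\theta)^n$ for $0 \le y \le \theta$.

```python
import numpy as np

# Minimal simulation sketch (arbitrary choices: n = 4, theta = 2.0) comparing the
# empirical CDF of Y = max(X_1, ..., X_n) with the CDF implied by the density above,
# F_Y(y) = (y / theta)^n for 0 <= y <= theta.
rng = np.random.default_rng(1)
n, theta, reps = 4, 2.0, 100_000

Y = rng.uniform(0, theta, size=(reps, n)).max(axis=1)
for y in (0.5, 1.0, 1.5, 1.9):
    print(f"y={y}: empirical {np.mean(Y <= y):.4f} vs exact {(y / theta) ** n:.4f}")
```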
It is often hard to verify or disprove the sufficiency of a statistic directly from the definition, because doing so requires finding the distribution of the statistic being checked. The following theorem is often helpful.
Factorization Theorem: Let $X_1, \ldots, X_n$ denote a random sample of size $n$ from a distribution that has pdf or pmf $f(x; \theta)$, $\theta \in \Omega$. A statistic $Y = u(X_1, \ldots, X_n)$ is sufficient for $\theta$ if and only if we can find two nonnegative functions, $k_1$ and $k_2$, such that for all sample points $(x_1, \ldots, x_n)$,
$$f(x_1; \theta) \cdots f(x_n; \theta) = k_1[u(x_1, \ldots, x_n); \theta]\, k_2[(x_1, \ldots, x_n)],$$
where $k_2[(x_1, \ldots, x_n)]$ does not depend upon $\theta$.
Proof: We prove this for the discrete case. (The proof for the general case is more subtle and requires regularity conditions, but the basic ideas are the same.) First, suppose that the probability mass function factors as given in the theorem. We have
$$P(u(X_1, \ldots, X_n) = u; \theta) = \sum_{u(x_1, \ldots, x_n) = u} P(X_1 = x_1, \ldots, X_n = x_n; \theta) = k_1[u; \theta] \sum_{u(x_1, \ldots, x_n) = u} k_2[(x_1, \ldots, x_n)].$$
We then have, for any $(x_1, \ldots, x_n)$ with $u(x_1, \ldots, x_n) = u$,
$$P[(X_1 = x_1, \ldots, X_n = x_n) \mid u(X_1, \ldots, X_n) = u; \theta] = \frac{P((X_1 = x_1, \ldots, X_n = x_n),\ u(X_1, \ldots, X_n) = u; \theta)}{P(u(X_1, \ldots, X_n) = u; \theta)} = \frac{k_2[(x_1, \ldots, x_n)]}{\sum_{u(x_1, \ldots, x_n) = u} k_2[(x_1, \ldots, x_n)]},$$
where the last equality holds because the common factor $k_1[u; \theta]$ cancels from the numerator and the denominator. Thus, the conditional distribution does not depend on $\theta$, so that $Y = u(X_1, \ldots, X_n)$ is sufficient for $\theta$.
Now suppose that $Y = u(X_1, \ldots, X_n)$ is sufficient for $\theta$, meaning that the distribution of $(X_1, \ldots, X_n)$ given $u(X_1, \ldots, X_n)$ does not depend on $\theta$. Let
$$k_1[u, \theta] = P[u(X_1, \ldots, X_n) = u; \theta], \qquad k_2[(x_1, \ldots, x_n)] = P[(X_1 = x_1, \ldots, X_n = x_n) \mid u(X_1, \ldots, X_n) = u(x_1, \ldots, x_n)],$$
where, by sufficiency, $k_2$ does not involve $\theta$. We then have
$$P[(X_1 = x_1, \ldots, X_n = x_n); \theta] = P[u(X_1, \ldots, X_n) = u(x_1, \ldots, x_n); \theta]\, P[(X_1 = x_1, \ldots, X_n = x_n) \mid u(X_1, \ldots, X_n) = u(x_1, \ldots, x_n)] = k_1[u(x_1, \ldots, x_n), \theta]\, k_2[(x_1, \ldots, x_n)],$$
as was to be shown. $\square$

Example 1 Continued: $X_1, \ldots, X_n$ is a sequence of independent Bernoulli random variables with $P(X_i = 1) = \theta$. To show that $Y = \sum_{i=1}^n X_i$ is sufficient for $\theta$, we factor the probability mass function as follows:
$$P(X_1 = x_1, \ldots, X_n = x_n; \theta) = \prod_{i=1}^n \theta^{x_i}(1 - \theta)^{1 - x_i} = \theta^{\sum_{i=1}^n x_i}(1 - \theta)^{\,n - \sum_{i=1}^n x_i} = \left( \frac{\theta}{1 - \theta} \right)^{\sum_{i=1}^n x_i} (1 - \theta)^n.$$
The pmf is of the form $k_1\!\left( \sum_{i=1}^n x_i, \theta \right) k_2(x_1, \ldots, x_n)$ where $k_2(x_1, \ldots, x_n) = 1$.
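Numerical aside: as a quick sanity check on this algebra, the sketch below evaluates both sides of the factorization for a few arbitrary 0/1 vectors and values of $\theta$ (all values are illustrative) and confirms they agree.

```python
import numpy as np

# Sanity check of the factorization above: both sides agree for arbitrary 0/1 data
# and arbitrary theta (the values below are illustrative, not from the notes).
rng = np.random.default_rng(2)
n = 8
for _ in range(3):
    x = rng.integers(0, 2, size=n)
    theta = rng.uniform(0.05, 0.95)
    lhs = np.prod(theta ** x * (1 - theta) ** (1 - x))         # product form of the pmf
    rhs = (theta / (1 - theta)) ** x.sum() * (1 - theta) ** n  # factored form k1 * k2
    print(f"theta={theta:.3f}, sum={x.sum()}: lhs={lhs:.6e}, rhs={rhs:.6e}")
```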
Example 2 Continued: Let $X_1, \ldots, X_n$ be iid Uniform$(0, \theta)$. To show that $Y = \max_{1 \le i \le n} X_i$ is sufficient, we factor the pdf as follows:
$$f(x_1, \ldots, x_n; \theta) = \prod_{i=1}^n \frac{1}{\theta}\, I_{\{0 \le x_i \le \theta\}} = \frac{1}{\theta^n}\, I_{\{\max_{1 \le i \le n} x_i \le \theta\}}\, I_{\{\min_{1 \le i \le n} x_i \ge 0\}}.$$
The pdf is of the form $k_1\!\left( \max_{1 \le i \le n} x_i, \theta \right) k_2(x_1, \ldots, x_n)$ where
$$k_1\!\left( \max_{1 \le i \le n} x_i, \theta \right) = \frac{1}{\theta^n}\, I_{\{\max_{1 \le i \le n} x_i \le \theta\}}, \qquad k_2(x_1, \ldots, x_n) = I_{\{\min_{1 \le i \le n} x_i \ge 0\}}.$$
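Numerical aside: as in Example 1, the sketch below simply evaluates both sides of this factorization for arbitrary illustrative data and values of $\theta$ (chosen so that the indicators are sometimes zero) and confirms they agree.

```python
import numpy as np

# Sanity check of the uniform factorization above, with arbitrary illustrative
# values of theta and of the sample (some x_i may exceed theta on purpose so that
# the indicator functions actually matter).
rng = np.random.default_rng(3)
for theta in (0.8, 1.5):
    x = rng.uniform(0, 2.0, size=5)
    n = x.size
    lhs = np.prod((1 / theta) * ((x >= 0) & (x <= theta)))        # product of (1/theta) * indicator
    rhs = (1 / theta ** n) * (x.max() <= theta) * (x.min() >= 0)  # k1 * k2
    print(f"theta={theta}: lhs={lhs:.6f}, rhs={rhs:.6f}")
```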
Example 3: Let $X_1, \ldots, X_n$ be iid Normal$(\mu, \sigma^2)$. Although the factorization theorem was stated explicitly for a one-dimensional sufficient statistic, it also applies to multidimensional sufficient statistics. The pdf factors as
$$f(x_1, \ldots, x_n; \mu, \sigma^2) = \prod_{i=1}^n \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left\{ -\frac{1}{2\sigma^2}(x_i - \mu)^2 \right\} = \frac{1}{\sigma^n (2\pi)^{n/2}} \exp\!\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2 \right\} = \frac{1}{\sigma^n (2\pi)^{n/2}} \exp\!\left\{ -\frac{1}{2\sigma^2} \left( \sum_{i=1}^n x_i^2 - 2\mu \sum_{i=1}^n x_i + n\mu^2 \right) \right\}.$$
The pdf is thus of the form $k_1\!\left( \sum_{i=1}^n x_i, \sum_{i=1}^n x_i^2, \mu, \sigma^2 \right) k_2(x_1, \ldots, x_n)$ where $k_2(x_1, \ldots, x_n) = 1$. Thus, $\left( \sum_{i=1}^n x_i, \sum_{i=1}^n x_i^2 \right)$ is a two-dimensional sufficient statistic for $(\mu, \sigma^2)$, i.e., the distribution of $X_1, \ldots, X_n$ given $\left( \sum_{i=1}^n x_i, \sum_{i=1}^n x_i^2 \right)$ does not depend on $(\mu, \sigma^2)$.
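Numerical aside: to see concretely that the likelihood depends on the data only through $\left(\sum x_i, \sum x_i^2\right)$, the sketch below uses two hand-picked datasets, $\{1, 5, 6\}$ and $\{2, 3, 7\}$, which share the same sum (12) and sum of squares (62), and shows that their normal likelihoods agree at several arbitrary $(\mu, \sigma^2)$ values.

```python
import numpy as np

def normal_likelihood(x, mu, sigma2):
    """Joint N(mu, sigma2) density of the sample x."""
    x = np.asarray(x, dtype=float)
    return np.prod(np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2))

# Two different datasets sharing the same sufficient statistic:
# sum = 12 and sum of squares = 62 in both cases.
data1 = [1.0, 5.0, 6.0]
data2 = [2.0, 3.0, 7.0]

for mu, sigma2 in [(0.0, 1.0), (4.0, 2.5), (-1.0, 10.0)]:   # arbitrary parameter values
    L1 = normal_likelihood(data1, mu, sigma2)
    L2 = normal_likelihood(data2, mu, sigma2)
    print(f"mu={mu}, sigma2={sigma2}: L(data1)={L1:.6e}, L(data2)={L2:.6e}")
```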
Rao-Blackwell Theorem:
Theorem 7.3.1 (stated a little differently): Let $X_1, \ldots, X_n$ be an iid sample from the pdf or pmf $f(x; \theta)$, $\theta \in \Omega$. Let $u(X_1, \ldots, X_n)$ be a sufficient statistic for $\theta$, and let $\hat{\theta} = W(X_1, \ldots, X_n)$ be an estimator of $\theta$. Because $u(X_1, \ldots, X_n)$ is a sufficient statistic for $\theta$, $\tilde{\theta} = E(\hat{\theta} \mid u(X_1, \ldots, X_n) = u; \theta)$ is a function of $X_1, \ldots, X_n$ that does not depend on $\theta$. The theorem is that for all $\theta$,
$$MSE(\tilde{\theta}) \le MSE(\hat{\theta}).$$
The inequality is strict unless $\tilde{\theta} = \hat{\theta}$.
Proof: By the property of iterated conditional expectations (Theorem 2.3.1(a)),
$$E(\tilde{\theta}) = E\!\left[ E(\hat{\theta} \mid u(X_1, \ldots, X_n)) \right] = E(\hat{\theta}).$$
Since the two estimators have the same expectation, and hence the same bias, to compare their mean square errors we need only compare their variances. From the proof of Theorem 2.3.1(b), we have
$$Var(\hat{\theta}) = Var\!\left[ E(\hat{\theta} \mid u(X_1, \ldots, X_n)) \right] + E\!\left[ Var(\hat{\theta} \mid u(X_1, \ldots, X_n)) \right],$$
or
$$Var(\hat{\theta}) = Var(\tilde{\theta}) + E\!\left[ Var(\hat{\theta} \mid u(X_1, \ldots, X_n)) \right].$$
Thus, $Var(\hat{\theta}) > Var(\tilde{\theta})$ unless $Var(\hat{\theta} \mid u(X_1, \ldots, X_n)) = 0$ for all $u(X_1, \ldots, X_n)$, which is the case only if $\hat{\theta}$ is a function of $u(X_1, \ldots, X_n)$, which would imply $\tilde{\theta} = \hat{\theta}$. $\square$
Since $\tilde{\theta} = E(\hat{\theta} \mid u(X_1, \ldots, X_n) = u; \theta)$ is a function of the sufficient statistic $u(X_1, \ldots, X_n)$, the Rao-Blackwell theorem gives a strong rationale for basing estimators on sufficient statistics if they exist. If an estimator is not a function of a sufficient statistic, it can be improved.
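Numerical aside: a concrete instance of Rao-Blackwellization for Bernoulli data (the choices $n = 10$ and $\theta = 0.3$ below are arbitrary). Start from the crude unbiased estimator $\hat{\theta} = X_1$; conditioning on the sufficient statistic $\sum X_i$ gives $E(X_1 \mid \sum X_i) = \sum X_i / n$, the sample mean. The simulation sketch below compares the variances of the two estimators.

```python
import numpy as np

# Illustrative Rao-Blackwellization for Bernoulli(theta) data (n = 10 and
# theta = 0.3 are arbitrary choices):
#   crude estimator:            theta_hat = X_1
#   Rao-Blackwellized version:  E(X_1 | sum of X_i) = (sum of X_i) / n, the sample mean.
rng = np.random.default_rng(4)
n, theta, reps = 10, 0.3, 100_000

X = rng.binomial(1, theta, size=(reps, n))
crude = X[:, 0]                       # theta_hat = X_1
rao_blackwellized = X.mean(axis=1)    # conditional expectation given the sufficient statistic

print("means:    ", crude.mean(), rao_blackwellized.mean())  # both approx. theta (same bias)
print("variances:", crude.var(), rao_blackwellized.var())    # approx. theta(1-theta) vs theta(1-theta)/n
```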
Further theoretical support for the maximum likelihood
estimator is provided by the fact that it is a function of any
sufficient statistic:
Theorem 7.3.2: Let $(X_1, \ldots, X_n)$ be an iid sample from a distribution that has pdf or pmf $f(x; \theta)$, $\theta \in \Omega$. If a sufficient statistic $Y = u(X_1, \ldots, X_n)$ for $\theta$ exists and if a maximum likelihood estimator $\hat{\theta}_{MLE}$ of $\theta$ also exists uniquely, then $\hat{\theta}_{MLE}$ is a function of the sufficient statistic $Y = u(X_1, \ldots, X_n)$.
Proof: Using the factorization theorem, the likelihood can be written as
$$L(\theta) = f(x_1, \ldots, x_n; \theta) = k_1[u(x_1, \ldots, x_n), \theta]\, k_2[(x_1, \ldots, x_n)].$$
To maximize this quantity over $\theta$, we need only maximize $k_1[u(x_1, \ldots, x_n), \theta]$; thus $\hat{\theta}_{MLE}$ is a function of $u(X_1, \ldots, X_n)$. $\square$
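Numerical aside: for the Bernoulli case, the sketch below maximizes the likelihood by a crude grid search (an illustrative device, not the method of the notes) for two different 0/1 samples that share the same value of $\sum x_i$; the two maximizers coincide, as the theorem requires, because the likelihood depends on the data only through $\sum x_i$.

```python
import numpy as np

def bernoulli_mle_grid(x, grid=np.linspace(0.001, 0.999, 999)):
    """Crude grid-search MLE of theta for 0/1 data x (illustrative only)."""
    x = np.asarray(x)
    loglik = x.sum() * np.log(grid) + (x.size - x.sum()) * np.log(1 - grid)
    return grid[np.argmax(loglik)]

# Two different samples with the same value of the sufficient statistic, sum(x) = 3:
x1 = [1, 1, 1, 0, 0, 0, 0, 0]
x2 = [0, 0, 0, 1, 0, 1, 1, 0]

print("MLE from x1:", bernoulli_mle_grid(x1))   # both approx. 3/8 = 0.375 (up to grid resolution)
print("MLE from x2:", bernoulli_mle_grid(x2))
```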