Statistics 512 Notes 21: Sufficiency and Its Implications

Sufficiency

Definition: A statistic $Y = u(X_1, \ldots, X_n)$ is said to be sufficient for $\theta$ if the conditional distribution of $X_1, \ldots, X_n$ given $Y = y$ does not depend on $\theta$ for any value of $y$.

Example 1: Let $X_1, \ldots, X_n$ be a sequence of independent Bernoulli random variables with $P(X_i = 1) = \theta$. Then $Y = \sum_{i=1}^n X_i$ is sufficient for $\theta$.

Example 2: Let $X_1, \ldots, X_n$ be iid Uniform$(0, \theta)$. Consider the statistic $Y = \max_{1 \le i \le n} X_i$. We have shown before (see Notes 1) that
$$f_Y(y) = \begin{cases} n y^{n-1} / \theta^n, & 0 < y \le \theta \\ 0, & \text{elsewhere.} \end{cases}$$
For $y \le \theta$, we have
$$f(x_1, \ldots, x_n \mid Y = y) = \frac{f(x_1, \ldots, x_n; \theta)}{f_Y(y)} = \frac{\frac{1}{\theta^n} \, I\!\left(\max_{1 \le i \le n} x_i = y\right)}{\frac{n y^{n-1}}{\theta^n}} = \frac{I\!\left(\max_{1 \le i \le n} x_i = y\right)}{n y^{n-1}},$$
which does not depend on $\theta$. For $y > \theta$, $P(X_1 = x_1, \ldots, X_n = x_n \mid Y = y) = 0$. Thus, the conditional distribution does not depend on $\theta$, and $Y = \max_{1 \le i \le n} X_i$ is a sufficient statistic.

It is often hard to verify or disprove sufficiency of a statistic directly because we need to find the distribution of the statistic. The following theorem is often helpful.

Factorization Theorem: Let $X_1, \ldots, X_n$ denote a random sample of size $n$ from a distribution that has pdf or pmf $f(x; \theta)$, $\theta \in \Omega$. A statistic $Y = u(X_1, \ldots, X_n)$ is sufficient for $\theta$ if and only if we can find two nonnegative functions, $k_1$ and $k_2$, such that for all sample points $(x_1, \ldots, x_n)$,
$$f(x_1; \theta) \cdots f(x_n; \theta) = k_1[u(x_1, \ldots, x_n); \theta] \, k_2[(x_1, \ldots, x_n)],$$
where $k_2[(x_1, \ldots, x_n)]$ does not depend upon $\theta$.

Proof: We prove this for the discrete case. (The proof for the general case is more subtle and requires regularity conditions, but the basic ideas are the same.) First, suppose that the probability mass function factors as given in the theorem. We have
$$P(u(X_1, \ldots, X_n) = u; \theta) = \sum_{(x_1, \ldots, x_n) \,:\, u(x_1, \ldots, x_n) = u} P(X_1 = x_1, \ldots, X_n = x_n; \theta) = k_1[u; \theta] \sum_{(x_1, \ldots, x_n) \,:\, u(x_1, \ldots, x_n) = u} k_2[(x_1, \ldots, x_n)].$$
We then have
$$P[(X_1 = x_1, \ldots, X_n = x_n) \mid u(X_1, \ldots, X_n) = u; \theta] = \frac{P((X_1 = x_1, \ldots, X_n = x_n),\, u(X_1, \ldots, X_n) = u; \theta)}{P(u(X_1, \ldots, X_n) = u; \theta)} = \frac{k_2[(x_1, \ldots, x_n)]}{\sum_{(x_1, \ldots, x_n) \,:\, u(x_1, \ldots, x_n) = u} k_2[(x_1, \ldots, x_n)]}.$$
Thus, the conditional distribution does not depend on $\theta$, so that $Y = u(X_1, \ldots, X_n)$ is sufficient for $\theta$.

Now suppose that $Y = u(X_1, \ldots, X_n)$ is sufficient for $\theta$, meaning that the distribution of $(X_1, \ldots, X_n)$ given $u(X_1, \ldots, X_n)$ is independent of $\theta$. Let
$$k_1[u; \theta] = P[u(X_1, \ldots, X_n) = u; \theta], \qquad k_2[(x_1, \ldots, x_n)] = P[(X_1 = x_1, \ldots, X_n = x_n) \mid u(X_1, \ldots, X_n) = u(x_1, \ldots, x_n)].$$
We then have
$$P[(X_1 = x_1, \ldots, X_n = x_n); \theta] = P[u(X_1, \ldots, X_n) = u(x_1, \ldots, x_n); \theta] \, P[(X_1 = x_1, \ldots, X_n = x_n) \mid u(X_1, \ldots, X_n) = u(x_1, \ldots, x_n)] = k_1[u(x_1, \ldots, x_n); \theta] \, k_2[(x_1, \ldots, x_n)],$$
as was to be shown.

Example 1 continued: Let $X_1, \ldots, X_n$ be a sequence of independent Bernoulli random variables with $P(X_i = 1) = \theta$. To show that $Y = \sum_{i=1}^n X_i$ is sufficient for $\theta$, we factor the probability mass function as follows:
$$P(X_1 = x_1, \ldots, X_n = x_n; \theta) = \prod_{i=1}^n \theta^{x_i} (1 - \theta)^{1 - x_i} = \theta^{\sum_{i=1}^n x_i} (1 - \theta)^{n - \sum_{i=1}^n x_i}.$$
The pmf is of the form $k_1\!\left(\sum_{i=1}^n x_i, \theta\right) k_2(x_1, \ldots, x_n)$ where $k_2(x_1, \ldots, x_n) = 1$.

Example 2 continued: Let $X_1, \ldots, X_n$ be iid Uniform$(0, \theta)$. To show that $Y = \max_{1 \le i \le n} X_i$ is sufficient, we factor the pdf as follows:
$$f(x_1, \ldots, x_n; \theta) = \prod_{i=1}^n \frac{1}{\theta} \, I(0 \le x_i \le \theta) = \frac{1}{\theta^n} \, I\!\left(\max_{1 \le i \le n} x_i \le \theta\right) I\!\left(\min_{1 \le i \le n} x_i \ge 0\right).$$
The pdf is of the form $k_1\!\left(\max_{1 \le i \le n} x_i, \theta\right) k_2(x_1, \ldots, x_n)$ where
$$k_1\!\left(\max_{1 \le i \le n} x_i, \theta\right) = \frac{1}{\theta^n} \, I\!\left(\max_{1 \le i \le n} x_i \le \theta\right), \qquad k_2(x_1, \ldots, x_n) = I\!\left(\min_{1 \le i \le n} x_i \ge 0\right).$$
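To see the definition of sufficiency in action, here is a small simulation sketch (an illustration added to these notes, with illustrative function and variable names) for the Bernoulli model of Example 1. Conditional on $Y = \sum_{i=1}^n X_i = y$, every arrangement of $y$ ones among $n$ trials should be equally likely, whatever the value of $\theta$:

```python
import numpy as np

rng = np.random.default_rng(0)

def conditional_dist(theta, n=5, y=2, reps=200_000):
    """Empirical conditional distribution of (X1, ..., Xn) given sum(X) = y,
    for iid Bernoulli(theta) draws, estimated by rejection sampling."""
    x = rng.binomial(1, theta, size=(reps, n))
    kept = x[x.sum(axis=1) == y]                  # keep only samples with Y = y
    patterns, counts = np.unique(kept, axis=0, return_counts=True)
    return {tuple(p): c / counts.sum() for p, c in zip(patterns, counts)}

# The two conditional distributions agree up to simulation noise: each of the
# C(5, 2) = 10 arrangements has probability about 0.1, regardless of theta,
# consistent with Y = sum(X_i) being sufficient for theta.
for theta in (0.3, 0.7):
    dist = conditional_dist(theta)
    print(theta, {k: round(p, 3) for k, p in sorted(dist.items())})
```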
Example 3: Let $X_1, \ldots, X_n$ be iid Normal$(\mu, \sigma^2)$. Although the factorization theorem was stated explicitly for a one-dimensional sufficient statistic, it also applies to multidimensional sufficient statistics. The pdf factors as
$$f(x_1, \ldots, x_n; \mu, \sigma^2) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{1}{2\sigma^2} (x_i - \mu)^2\right) = \frac{1}{(2\pi)^{n/2} \sigma^n} \exp\!\left(-\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2\right) = \frac{1}{(2\pi)^{n/2} \sigma^n} \exp\!\left(-\frac{1}{2\sigma^2} \left(\sum_{i=1}^n x_i^2 - 2\mu \sum_{i=1}^n x_i + n\mu^2\right)\right).$$
The pdf is thus of the form $k_1\!\left(\sum_{i=1}^n x_i, \sum_{i=1}^n x_i^2, \mu, \sigma^2\right) k_2(x_1, \ldots, x_n)$ where $k_2(x_1, \ldots, x_n) = 1$. Thus, $\left(\sum_{i=1}^n x_i, \sum_{i=1}^n x_i^2\right)$ is a two-dimensional sufficient statistic for $(\mu, \sigma^2)$, i.e., the distribution of $X_1, \ldots, X_n$ given $\left(\sum_{i=1}^n x_i, \sum_{i=1}^n x_i^2\right)$ is independent of $(\mu, \sigma^2)$.

Rao-Blackwell Theorem (Theorem 7.3.1, stated a little differently): Let $X_1, \ldots, X_n$ be an iid sample from the pdf or pmf $f(x; \theta)$, $\theta \in \Omega$. Let $u(X_1, \ldots, X_n)$ be a sufficient statistic for $\theta$, and let $\hat{\theta} = W(X_1, \ldots, X_n)$ be an estimator of $\theta$. Because $u(X_1, \ldots, X_n)$ is a sufficient statistic for $\theta$, $\tilde{\theta} = E(\hat{\theta} \mid u(X_1, \ldots, X_n) = u)$ is a function of $X_1, \ldots, X_n$ that does not involve $\theta$, and hence is itself an estimator. The theorem is that for all $\theta$,
$$MSE(\tilde{\theta}) \le MSE(\hat{\theta}).$$
The inequality is strict unless $\hat{\theta} = \tilde{\theta}$.

Proof: By the property of iterated conditional expectations (Theorem 2.3.1(a)),
$$E(\tilde{\theta}) = E\!\left[E(\hat{\theta} \mid u(X_1, \ldots, X_n))\right] = E(\hat{\theta}).$$
Therefore, the two estimators have the same bias, and to compare their mean square errors we need only compare their variances. From the proof of Theorem 2.3.1(b), we have
$$Var(\hat{\theta}) = Var\!\left[E(\hat{\theta} \mid u(X_1, \ldots, X_n))\right] + E\!\left[Var(\hat{\theta} \mid u(X_1, \ldots, X_n))\right],$$
or
$$Var(\hat{\theta}) = Var(\tilde{\theta}) + E\!\left[Var(\hat{\theta} \mid u(X_1, \ldots, X_n))\right].$$
Thus, $Var(\hat{\theta}) > Var(\tilde{\theta})$ unless $Var(\hat{\theta} \mid u(X_1, \ldots, X_n)) = 0$ for all $u(X_1, \ldots, X_n)$, which is the case only if $\hat{\theta}$ is a function of $u(X_1, \ldots, X_n)$, which would imply $\hat{\theta} = \tilde{\theta}$.

Since $\tilde{\theta} = E(\hat{\theta} \mid u(X_1, \ldots, X_n) = u)$ is a function of the sufficient statistic $u(X_1, \ldots, X_n)$, the Rao-Blackwell theorem gives a strong rationale for basing estimators on sufficient statistics if they exist. If an estimator is not a function of a sufficient statistic, it can be improved.

Further theoretical support for the maximum likelihood estimator is provided by the fact that it is a function of any sufficient statistic:

Theorem 7.3.2: Let $(X_1, \ldots, X_n)$ be an iid sample from a distribution that has pdf or pmf $f(x; \theta)$, $\theta \in \Omega$. If a sufficient statistic $Y = u(X_1, \ldots, X_n)$ for $\theta$ exists and if a maximum likelihood estimator $\hat{\theta}_{MLE}$ of $\theta$ also exists uniquely, then $\hat{\theta}_{MLE}$ is a function of the sufficient statistic $Y = u(X_1, \ldots, X_n)$.

Proof: Using the factorization theorem, the likelihood can be written as
$$L(\theta) = f(x_1, \ldots, x_n; \theta) = k_1[u(x_1, \ldots, x_n); \theta] \, k_2[(x_1, \ldots, x_n)].$$
To maximize this quantity over $\theta$, we need only maximize $k_1[u(x_1, \ldots, x_n); \theta]$; thus $\hat{\theta}_{MLE}$ depends on the data only through $u(X_1, \ldots, X_n)$.
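To make the Rao-Blackwell improvement concrete, here is a small simulation sketch (again an illustration added to these notes, not part of the original derivation) using the Bernoulli model of Example 1. The crude unbiased estimator $\hat{\theta} = X_1$ is Rao-Blackwellized by conditioning on the sufficient statistic $Y = \sum_{i=1}^n X_i$; by symmetry, $E(X_1 \mid Y = y) = y/n$, so $\tilde{\theta} = \bar{X}$, and the simulated mean square errors show the variance reduction:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 0.3, 10, 100_000

# reps independent Bernoulli(theta) samples of size n
x = rng.binomial(1, theta, size=(reps, n))

# Crude unbiased estimator: theta_hat = X_1 (ignores most of the data).
theta_hat = x[:, 0]

# Rao-Blackwellized estimator: E(X_1 | sum(X_i) = y) = y / n, the sample mean.
theta_tilde = x.mean(axis=1)

print("MSE of theta_hat   =", np.mean((theta_hat - theta) ** 2))    # about theta(1-theta)
print("MSE of theta_tilde =", np.mean((theta_tilde - theta) ** 2))  # about theta(1-theta)/n
```

Both estimators are unbiased, but conditioning on the sufficient statistic shrinks the variance from roughly $\theta(1-\theta)$ to roughly $\theta(1-\theta)/n$, exactly the strict improvement the theorem predicts when $\hat{\theta}$ is not already a function of $u(X_1, \ldots, X_n)$.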