THE FACTORIZATION THEOREM FOR SUFFICIENCY

It would be very difficult to identify sufficient statistics based on the definition.
Fortunately, sufficient statistics can be easily identified through use of the factorization
theorem. In utilizing the factorization theorem, the objective is to break the likelihood
into three factors.
Suppose that X is random with likelihood f(x). Almost always x is a vector, and it often happens that θ is a vector as well. We factor the likelihood as

$$ L = f(x) = \underbrace{(\,\cdots\,)}_{\substack{\text{factor involving} \\ x \text{ but not } \theta}} \times \underbrace{(\,\cdots\,)}_{\substack{\text{factor involving} \\ \theta \text{ but not } x}} \times \underbrace{(\,\cdots\,)}_{\substack{\text{factor involving} \\ \text{both } x \text{ and } \theta}} $$
Any of the factors could be the number 1, of course. Also, pure numeric factors such as $\frac{1}{2}$ can be placed in any one of the factors. The sufficient statistic consists of whatever statistics in x are in the rightmost factor. We will get the best answer if we do a good job of pulling lots of the x information into the first factor, away from θ.
EXAMPLE 1: Here’s a simple example. Suppose that X1, X2, …, Xn is a sample from the Poisson(λ) distribution. The likelihood is
$$ L = \prod_{i=1}^{n} \frac{e^{-\lambda}\,\lambda^{x_i}}{x_i!} = \left( \frac{1}{\prod_{i=1}^{n} x_i!} \right) e^{-n\lambda}\; \lambda^{x_1 + x_2 + \cdots + x_n} $$
The first factor contains x’s but no λ, the second factor contains λ but no x’s, and the final factor contains inseparable x’s with λ. The sufficient statistic is the function of x that appears in the final factor; this is t = x1 + x2 + … + xn. In random variable form, we’d write this as T(X) = X1 + X2 + … + Xn.
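As a quick numerical check (a hypothetical sketch in Python, not part of the original notes): for two Poisson samples with the same n and the same total t = x1 + … + xn, the log-likelihoods differ only by a constant that does not involve λ, so the two samples carry identical information about λ.

```python
import math

def poisson_loglik(lam, xs):
    # log L(lambda) = -n*lambda + (sum x_i)*log(lambda) - sum log(x_i!)
    n = len(xs)
    return (-n * lam + sum(xs) * math.log(lam)
            - sum(math.log(math.factorial(x)) for x in xs))

a = [2, 5, 1, 4]   # n = 4, total t = 12
b = [3, 3, 3, 3]   # same n, same total t = 12
diffs = [poisson_loglik(lam, a) - poisson_loglik(lam, b)
         for lam in (0.5, 1.0, 2.0, 7.3)]
# Because both samples share t = sum(x_i), the lambda-dependent part of the
# log-likelihood is identical; the difference is the constant prod(x_i!) term.
print(all(abs(d - diffs[0]) < 1e-9 for d in diffs))  # True
```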
EXAMPLE 2: As another simple example, suppose that X1, X2, …, Xn is a sample from N(μ, σ²), where σ is known. (It’s critical to indicate whether σ is a known number; the contrast will be made clear by the next example.) The only unknown parameter is μ.
The likelihood is
$$ L = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(x_i - \mu)^2} = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2} $$

$$ = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{1}{2\sigma^2}\left( \sum_{i=1}^{n} x_i^2 \;-\; 2\mu \sum_{i=1}^{n} x_i \;+\; n\mu^2 \right)} $$

$$ = \left[ \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n} x_i^2} \right] \times \left[ e^{\frac{\mu}{\sigma^2}\sum_{i=1}^{n} x_i} \right] \times \left[ e^{-\frac{n\mu^2}{2\sigma^2}} \right] $$
The first factor involves the x’s (and known quantities) but no μ. The third factor involves the unknown parameter μ but no x’s. The middle factor involves both x’s and the unknown μ. The sufficient statistic is the function of x that appears in this middle factor. This is T(X) = X1 + X2 + … + Xn. As a side note, we should indicate that any one-to-one function of T(X) would also qualify as a sufficient statistic; for this problem, many people would say that X̄ is the sufficient statistic.
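The same numerical check works here (again a hypothetical Python sketch, not from the original notes): with σ known, two samples sharing the same total have log-likelihoods that differ only by a constant in μ, even if their sums of squares differ.

```python
import math

SIGMA = 2.0  # the known sigma (illustrative value)

def normal_loglik(mu, xs, sigma=SIGMA):
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * sigma**2)
            - sum((x - mu)**2 for x in xs) / (2 * sigma**2))

a = [1.0, 2.0, 6.0]   # total = 9, sum of squares = 41
b = [3.0, 3.0, 3.0]   # total = 9, sum of squares = 27
diffs = [normal_loglik(mu, a) - normal_loglik(mu, b)
         for mu in (-1.0, 0.0, 2.5, 10.0)]
# Same t = sum(x_i), so the mu-dependent factors agree; the difference is the
# constant -(1/(2 sigma^2)) * (sum a_i^2 - sum b_i^2), free of mu.
print(all(abs(d - diffs[0]) < 1e-9 for d in diffs))  # True
```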
EXAMPLE 3: Suppose that X1, X2, …, Xn is a sample from N(μ, σ²), where neither μ nor σ is known. The likelihood is the same as above. However, with both parameters unknown, we write L as
$$ L = \bigl[\, 1 \,\bigr] \times \left[ \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{n\mu^2}{2\sigma^2}} \right] \times \left[ e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n} x_i^2 \;+\; \frac{\mu}{\sigma^2}\sum_{i=1}^{n} x_i} \right] $$
The first factor contains x’s (in an empty sense) but no unknown parameters. The second
factor contains unknown parameters but no x’s. The final factor has both unknown
parameters and x’s. We get the sufficient statistic from this final factor. Here the
sufficient statistic has two coordinates. We could say that T(X) = (T1(X), T2(X)) = $\left( \sum_{i=1}^{n} X_i,\; \sum_{i=1}^{n} X_i^2 \right)$. Three important notes about this example:

(1) Since one-to-one functions of the sufficient statistic also qualify as sufficient, many people give the sufficient statistic as (X̄, s²).

(2) The dimension of the sufficient statistic has to be at least that of the unknown parameter. With two unknown parameters, we should expect at least two coordinates in the sufficient statistic.

(3) There is some temptation to match up the coordinates. We’d like to think that X̄ is sufficient for μ and s² is sufficient for σ². This should be resisted, because sufficiency is a little too complicated for this. After all, we could also make a one-to-one transformation of the parameters.
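Note (1) can be illustrated concretely (a hypothetical Python sketch, not from the original notes): with n fixed and known, the map from (Σxi, Σxi²) to (x̄, s²) is one-to-one, so either pair carries exactly the same information.

```python
def to_mean_var(t1, t2, n):
    # one-to-one map (sum x_i, sum x_i^2) -> (xbar, s^2), with n fixed and known
    xbar = t1 / n
    s2 = (t2 - t1**2 / n) / (n - 1)
    return xbar, s2

def from_mean_var(xbar, s2, n):
    # inverse map: recover (sum x_i, sum x_i^2) from (xbar, s^2)
    return n * xbar, (n - 1) * s2 + n * xbar**2

xs = [4.0, 7.0, 1.0, 8.0]
t1, t2 = sum(xs), sum(x**2 for x in xs)       # (20.0, 130.0)
xbar, s2 = to_mean_var(t1, t2, len(xs))       # (5.0, 10.0)
u1, u2 = from_mean_var(xbar, s2, len(xs))     # round trip back to (20.0, 130.0)
print(abs(u1 - t1) < 1e-9 and abs(u2 - t2) < 1e-9)  # True
```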

EXAMPLE 4: Suppose that X1, X2, …, Xn is a sample from the density $f(x \mid \theta) = \frac{1}{2}\, e^{-|x - \theta|}$. The likelihood is

$$ L = \prod_{i=1}^{n} \frac{1}{2}\, e^{-|x_i - \theta|} = \frac{1}{2^n}\, e^{-\sum_{i=1}^{n} |x_i - \theta|} $$
In this case, we are unable to break apart the sum in the exponent. Thus, we can’t factor any of the x’s away from θ. Accordingly, we must take the whole data X as the sufficient statistic. The sufficiency principle fails to help us for this problem.
It should be noted that some people like to say “the order statistic is sufficient.”
By this, they simply mean that you only need the set of values which came up as xi’s without reference to the ordering. Specifically, L can be calculated from θ and the set of x-values (without knowing which one actually came up as x1, which came up as x2, and so on). This point is unnecessarily tendentious.
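The order-statistic remark is easy to verify numerically (a hypothetical Python sketch, not from the original notes): the likelihood is unchanged by any reordering of the sample, so only the multiset of x-values matters.

```python
import math
from itertools import permutations

def laplace_loglik(theta, xs):
    # log L(theta) = -n*log(2) - sum |x_i - theta|
    return -len(xs) * math.log(2) - sum(abs(x - theta) for x in xs)

xs = [0.3, -1.2, 2.5, 0.9]
thetas = (-0.5, 0.0, 1.0)
# Every permutation of the sample gives the same likelihood at every theta:
# only the set of observed values (the order statistic) enters L.
invariant = all(
    abs(laplace_loglik(t, list(p)) - laplace_loglik(t, xs)) < 1e-12
    for p in permutations(xs) for t in thetas
)
print(invariant)  # True
```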
EXAMPLE 5: Suppose that Y1, Y2, …, Yn are independent random variables in which the distribution of Yi is normal N(β0 + β1 xi, σ²). This is of course the simple linear regression model. There are three parameters (β0, β1, and σ). The xi values are regarded as non-random. The likelihood is
$$ L = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(y_i - \beta_0 - \beta_1 x_i)^2} = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2} $$
Restructure the sum in the exponent:

$$ \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2 = \frac{1}{2\sigma^2}\left[ \sum_{i=1}^{n} y_i^2 + n\beta_0^2 + \beta_1^2 \sum_{i=1}^{n} x_i^2 - 2\beta_0 \sum_{i=1}^{n} y_i - 2\beta_1 \sum_{i=1}^{n} x_i y_i + 2\beta_0 \beta_1 \sum_{i=1}^{n} x_i \right] $$
Only three functions of the yi values appear. These are $\sum_{i=1}^{n} y_i^2$, $\sum_{i=1}^{n} y_i$, and $\sum_{i=1}^{n} x_i y_i$. The sufficient statistics are the random variable versions $\sum_{i=1}^{n} Y_i^2$, $\sum_{i=1}^{n} Y_i$, and $\sum_{i=1}^{n} x_i Y_i$.

n
In some discussions of this model, the sums  xi and
i 1
n
x
i 1
2
i
are also included as
sufficient statistics. These would sufficient only in the non-technical sense that all the
regression calculations can be made from the five sums
n
 xi
i 1

n
 xi2
i 1
n
 yi
i 1
n
 yi2
i 1
n
x
i 1
4
i
yi
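That non-technical sense is easy to demonstrate (a hypothetical Python sketch, not from the original notes): the ordinary least squares estimates can be computed from the five sums alone, without ever revisiting the individual (xi, yi) pairs.

```python
def ols_from_sums(n, sx, sxx, sy, syy, sxy):
    # OLS slope and intercept computed purely from the five sums
    b1 = (sxy - sx * sy / n) / (sxx - sx**2 / n)
    b0 = sy / n - b1 * sx / n
    return b0, b1

xs = [1.0, 2.0, 3.0, 4.0]          # non-random x values (illustrative)
ys = [2.1, 3.9, 6.2, 7.8]          # illustrative responses
n = len(xs)
sums = (sum(xs), sum(x * x for x in xs), sum(ys),
        sum(y * y for y in ys), sum(x * y for x, y in zip(xs, ys)))
b0, b1 = ols_from_sums(n, *sums)
print(round(b1, 2), round(b0, 2))  # 1.94 0.15
```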
gs2011