THE FACTORIZATION THEOREM FOR SUFFICIENCY

It would be very difficult to identify sufficient statistics based on the definition alone. Fortunately, sufficient statistics can be easily identified through use of the factorization theorem. In utilizing the factorization theorem, the objective is to break the likelihood into three factors. Suppose that X is random with likelihood $f(x \mid \theta)$. Almost always $x$ is a vector, and it often happens that $\theta$ is a vector as well. We factor the likelihood as

$$ L = f(x \mid \theta) = \big[\,\text{factor involving } x \text{ but not } \theta\,\big] \times \big[\,\text{factor involving } \theta \text{ but not } x\,\big] \times \big[\,\text{factor involving both } x \text{ and } \theta\,\big] $$

Any of the factors could be the number 1, of course. Also, pure numeric factors can be placed in any one of the factors. The sufficient statistic consists of whatever statistics of $x$ appear in the rightmost factor. We will get the best answer if we do a good job of pulling lots of the $x$ information into the first factor, away from $\theta$.

EXAMPLE 1: Here's a simple example. Suppose that $X_1, X_2, \ldots, X_n$ is a sample from the Poisson($\lambda$) distribution. The likelihood is

$$ L = \prod_{i=1}^n \frac{e^{-\lambda}\,\lambda^{x_i}}{x_i!} = \frac{e^{-n\lambda}\,\lambda^{x_1+x_2+\cdots+x_n}}{\prod_{i=1}^n x_i!} = \frac{1}{\prod_{i=1}^n x_i!} \times e^{-n\lambda} \times \lambda^{x_1+x_2+\cdots+x_n} $$

The first factor contains $x$'s but no $\lambda$, the second factor contains $\lambda$ but no $x$'s, and the final factor contains inseparable $x$'s with $\lambda$. The sufficient statistic is the function of $x$ that appears in the final factor; this is $t = x_1 + x_2 + \cdots + x_n$. In random variable form, we'd write this as $T(\mathbf{X}) = \mathbf{X}'\mathbf{1} = X_1 + X_2 + \cdots + X_n$.

EXAMPLE 2: As another simple example, suppose that $X_1, X_2, \ldots, X_n$ is a sample from N($\mu$, $\sigma^2$), where $\sigma$ is known. (It's critical to indicate whether $\sigma$ is a known number; the contrast will be made clear by the next example.) The only unknown parameter is $\mu$. The likelihood is

$$ L = \prod_{i=1}^n \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(x_i-\mu)^2} = \frac{1}{(2\pi)^{n/2}\,\sigma^n}\, e^{-\frac{1}{2\sigma^2}\left(\sum_{i=1}^n x_i^2 \,-\, 2\mu\sum_{i=1}^n x_i \,+\, n\mu^2\right)} $$

$$ = \left[\frac{1}{(2\pi)^{n/2}\,\sigma^n}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^n x_i^2}\right] \times \left[e^{\frac{\mu}{\sigma^2}\sum_{i=1}^n x_i}\right] \times \left[e^{-\frac{n\mu^2}{2\sigma^2}}\right] $$

The first factor involves the $x$'s (and known quantities) but no $\mu$. The third factor involves the unknown parameter $\mu$ but no $x$'s. The middle factor involves both $x$'s and the unknown $\mu$. The sufficient statistic is the function of $x$ that appears in this middle factor. This is $T(\mathbf{X}) = X_1 + X_2 + \cdots + X_n$. As a side note, we should indicate that any one-to-one function of $T(\mathbf{X})$ would also qualify as a sufficient statistic; for this problem, many people would say that $\bar{X}$ is the sufficient statistic.

EXAMPLE 3: Suppose that $X_1, X_2, \ldots, X_n$ is a sample from N($\mu$, $\sigma^2$), where neither $\mu$ nor $\sigma$ is known. The likelihood is the same as above. However, with both parameters unknown we write $L$ as

$$ L = 1 \times \left[\frac{1}{(2\pi)^{n/2}\,\sigma^n}\, e^{-\frac{n\mu^2}{2\sigma^2}}\right] \times \left[e^{-\frac{1}{2\sigma^2}\sum_{i=1}^n x_i^2 \,+\, \frac{\mu}{\sigma^2}\sum_{i=1}^n x_i}\right] $$

The first factor contains $x$'s (in an empty sense) but no unknown parameters. The second factor contains unknown parameters but no $x$'s. The final factor has both unknown parameters and $x$'s. We get the sufficient statistic from this final factor. Here the sufficient statistic has two coordinates. We could say that $T(\mathbf{X}) = (T_1(\mathbf{X}), T_2(\mathbf{X})) = \left(\sum_{i=1}^n X_i,\ \sum_{i=1}^n X_i^2\right)$. Three important notes about this example:

(1) Since one-to-one functions of the sufficient statistic also qualify as sufficient, many people give the sufficient statistic as $(\bar{X}, s^2)$.

(2) The dimension of the sufficient statistic has to be at least that of the unknown parameter. With two unknown parameters, we should expect at least two coordinates in the sufficient statistic.

(3) There is some temptation to match up the coordinates. We'd like to think that $\bar{X}$ is sufficient for $\mu$ and $s^2$ is sufficient for $\sigma^2$. This should be resisted, because sufficiency is a little too complicated for this. After all, we could also make a one-to-one transformation of the parameters.
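To make the factorization idea concrete, here is a small numerical check (not part of the original handout; a sketch using NumPy/SciPy with made-up data). In the Poisson setup of EXAMPLE 1, two samples of the same size with the same total $\sum x_i$ must have likelihoods whose ratio is constant in $\lambda$: the $\lambda$-dependent factors $e^{-n\lambda}\lambda^{\sum x_i}$ coincide, and only the $x$-only factor $1/\prod x_i!$ differs.

```python
import numpy as np
from scipy.stats import poisson

# Two made-up samples of size n = 4 with the same total t = sum(x) = 10.
x_a = np.array([1, 2, 3, 4])
x_b = np.array([4, 4, 1, 1])

lams = np.linspace(0.5, 5.0, 10)  # a grid of candidate lambda values
L_a = np.array([poisson.pmf(x_a, lam).prod() for lam in lams])
L_b = np.array([poisson.pmf(x_b, lam).prod() for lam in lams])

# The ratio is the same at every lambda: the factor e^{-n*lam} * lam**t is
# identical for both samples, so only 1/prod(x_i!) distinguishes them.
print(L_a / L_b)   # constant vector: 2.0 = 576/288, the ratio of factorials
```

Because the ratio does not involve $\lambda$, the two samples carry exactly the same information about $\lambda$; that is the content of the sufficiency of $\sum X_i$.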
EXAMPLE 4: Suppose that $X_1, X_2, \ldots, X_n$ is a sample from the density $f(x) = \frac{1}{2}\, e^{-|x-\theta|}$. The likelihood is

$$ L = \prod_{i=1}^n \frac{1}{2}\, e^{-|x_i-\theta|} = \frac{1}{2^n}\, e^{-\sum_{i=1}^n |x_i-\theta|} $$

In this case, we are unable to break apart the sum in the exponent. Thus, we can't factor any of the $x$'s away from $\theta$. Accordingly, we must take the whole data $\mathbf{X}$ as the sufficient statistic. The sufficiency principle fails to help us for this problem. It should be noted that some people like to say "the order statistic is sufficient." By this, they simply mean that you only need the set of values which came up as $x_i$'s, without reference to the ordering. Specifically, $L$ can be calculated from $\theta$ and the set of $x$-values (without knowing which one actually came up as $x_1$, which came up as $x_2$, and so on). This point is unnecessarily tendentious.

EXAMPLE 5: Suppose that $Y_1, Y_2, \ldots, Y_n$ are independent random variables in which the distribution of $Y_i$ is normal, N($\beta_0 + \beta_1 x_i$, $\sigma^2$). This is of course the simple linear regression model. There are three parameters ($\beta_0$, $\beta_1$, and $\sigma$). The $x_i$ values are regarded as non-random. The likelihood is

$$ L = \prod_{i=1}^n \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(y_i-\beta_0-\beta_1 x_i)^2} = \frac{1}{(2\pi)^{n/2}\,\sigma^n}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^n (y_i-\beta_0-\beta_1 x_i)^2} $$

Restructure the sum in the exponent:

$$ \sum_{i=1}^n (y_i-\beta_0-\beta_1 x_i)^2 = \sum_{i=1}^n y_i^2 + n\beta_0^2 + \beta_1^2 \sum_{i=1}^n x_i^2 - 2\beta_0 \sum_{i=1}^n y_i - 2\beta_1 \sum_{i=1}^n x_i y_i + 2\beta_0\beta_1 \sum_{i=1}^n x_i $$

Only three functions of the $y_i$ values appear. These are $\sum_{i=1}^n y_i$, $\sum_{i=1}^n y_i^2$, and $\sum_{i=1}^n x_i y_i$. The sufficient statistics are the random variable versions $\sum_{i=1}^n Y_i$, $\sum_{i=1}^n Y_i^2$, and $\sum_{i=1}^n x_i Y_i$.

In some discussions of this model, the sums $\sum_{i=1}^n x_i$ and $\sum_{i=1}^n x_i^2$ are also included as sufficient statistics. These would be sufficient only in the non-technical sense that all the regression calculations can be made from the five sums

$$ \sum_{i=1}^n x_i \qquad \sum_{i=1}^n x_i^2 \qquad \sum_{i=1}^n y_i \qquad \sum_{i=1}^n y_i^2 \qquad \sum_{i=1}^n x_i y_i $$
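As an illustration of that last point, here is a short sketch (not from the handout; the data, seed, and variable names are made up for illustration) showing that the least-squares/maximum-likelihood estimates in simple linear regression can be computed from the five sums alone, with no further access to the raw $(x_i, y_i)$ pairs.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
x = np.linspace(0.0, 1.0, n)                          # fixed, non-random design points
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)     # illustrative responses

# The five sums: Sy, Syy, Sxy involve the data; Sx, Sxx are known design constants.
Sx, Sxx = x.sum(), (x ** 2).sum()
Sy, Syy, Sxy = y.sum(), (y ** 2).sum(), (x * y).sum()

# Maximum-likelihood (= least-squares) estimates recovered from the sums alone:
b1 = (n * Sxy - Sx * Sy) / (n * Sxx - Sx ** 2)   # slope
b0 = (Sy - b1 * Sx) / n                          # intercept
rss = Syy - b0 * Sy - b1 * Sxy                   # residual sum of squares, via the
                                                 # normal equations identity
s2_ml = rss / n                                  # ML estimate of sigma^2

print(b0, b1, s2_ml)
```

Note that only $\sum Y_i$, $\sum Y_i^2$, and $\sum x_i Y_i$ are random here; $\sum x_i$ and $\sum x_i^2$ are constants of the design, which is why the text calls them sufficient only in a non-technical sense.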