Notes 8 - Wharton Statistics Department

Statistics 550 Notes 8
Reading: Section 1.6.1-1.6.4
I. Correction on Minimal Sufficiency
The statement of Theorem 2 in Notes 7 was wrong. The
correct statement is
Theorem 2 (Lehmann and Scheffe, 1950): Suppose $S(X)$ is
a sufficient statistic for $\theta$. Also suppose that if for two
sample points $x$ and $y$, the ratio $f(x|\theta)/f(y|\theta)$ is
constant as a function of $\theta$, then $S(x) = S(y)$. Then
$S(X)$ is a minimal sufficient statistic for $\theta$.
Proof: Let $T(X)$ be any statistic that is sufficient for $\theta$. By
the factorization theorem, there exist functions $g$ and $h$
such that $f(x|\theta) = g(T(x)|\theta) h(x)$. Let $x$ and $y$ be any
two sample points with $T(x) = T(y)$. Then
$$\frac{f(x|\theta)}{f(y|\theta)} = \frac{g(T(x)|\theta) h(x)}{g(T(y)|\theta) h(y)} = \frac{h(x)}{h(y)}.$$
Since this ratio does not depend on $\theta$, the assumptions of
the theorem imply that $S(x) = S(y)$. Thus, $S(X)$ is at
least as coarse a partition of the sample space as $T(X)$, and
consequently $S(X)$ is minimal sufficient.
Example 1: Consider the ratio
$$\frac{f(x|\theta)}{f(y|\theta)} = \frac{\theta^{\sum_{i=1}^n x_i} (1-\theta)^{n - \sum_{i=1}^n x_i}}{\theta^{\sum_{i=1}^n y_i} (1-\theta)^{n - \sum_{i=1}^n y_i}}.$$
If this ratio is constant as a function of $\theta$, then we must
have $\sum_{i=1}^n x_i = \sum_{i=1}^n y_i$. Since we have shown that
$T(X) = \sum_{i=1}^n X_i$ is a sufficient statistic, it follows from the
above sentence and Theorem 2 that $T(X) = \sum_{i=1}^n X_i$ is a
minimal sufficient statistic.
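The criterion in Theorem 2 can be checked by brute force on a small case. The sketch below assumes the ratio in Example 1 comes from n iid Bernoulli observations (the form of the likelihood suggests this); the function names and the grid of theta values are illustrative choices, not part of the notes. It enumerates all pairs of binary samples of length 4 and confirms that the likelihood ratio is constant in theta exactly when the two samples have the same sum.

```python
from itertools import product

def bernoulli_likelihood(x, theta):
    """Joint pmf of iid Bernoulli(theta) observations x (a tuple of 0/1 values)."""
    s = sum(x)
    return theta ** s * (1 - theta) ** (len(x) - s)

def ratio_is_constant(x, y, thetas=(0.2, 0.5, 0.8), tol=1e-12):
    """True if f(x|theta)/f(y|theta) takes the same value at every theta in `thetas`."""
    ratios = [bernoulli_likelihood(x, t) / bernoulli_likelihood(y, t) for t in thetas]
    return max(ratios) - min(ratios) < tol

n = 4
points = list(product((0, 1), repeat=n))
# The ratio is constant in theta exactly when the two samples have equal sums.
agrees = all(ratio_is_constant(x, y) == (sum(x) == sum(y)) for x in points for y in points)
print(agrees)
```

Checking only three values of theta is enough here because the ratio equals $(\theta/(1-\theta))^{d}$ with $d$ the difference of the sums, which is constant only when $d = 0$.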
II. Exponential Families
The binomial and normal models exhibited the interesting
feature that there is a natural minimal sufficient statistic
whose dimension is independent of the sample size. The
exponential family models are a general class of models
that exhibit this feature.
The class of exponential family models includes many of
the most widely used statistical models (e.g., binomial,
normal, gamma, Poisson, multinomial). Exponential
family models have an underlying structure with elegant
properties that we will discuss.
One-parameter exponential families: The family of
distributions of a model $\{P_\theta : \theta \in \Theta\}$ is said to be a
one-parameter exponential family if there exist real-valued
functions $\eta(\theta), B(\theta), T(x), h(x)$ such that the pdf or pmf
may be written as
$$p(x|\theta) = h(x) \exp\{\eta(\theta) T(x) - B(\theta)\} \qquad (0.1)$$
Comments:
(1) For an exponential family, the support of the
distribution (i.e., $\{x : p(x|\theta) > 0\}$) cannot depend on $\theta$.
Thus, $X_1, \ldots, X_n$ iid Uniform$(0, \theta)$ is not an exponential
family model.
(2) For an exponential family model, $T(x)$ is a sufficient
statistic by the factorization theorem.
(3) $\eta, B, T$ are not unique. For example, $\eta$ can be
multiplied by a nonzero constant $c$ and $T$ can be divided by the same
constant $c$.
Examples of one-parameter exponential family models:
(1) Poisson family.
Let $X \sim$ Poisson$(\theta)$, $0 < \theta < \infty$. Then for $x \in \{0, 1, 2, \ldots\}$,
$$p(x|\theta) = \frac{\theta^x e^{-\theta}}{x!} = \frac{1}{x!} \exp\{x \log\theta - \theta\}.$$
This is a one-parameter exponential family with
$\eta(\theta) = \log\theta$, $B(\theta) = \theta$, $T(x) = x$, $h(x) = 1/x!$.
(2) Binomial family.
Let $X \sim$ Binomial$(n, \theta)$, $0 < \theta < 1$. Then for
$x \in \{0, 1, 2, \ldots, n\}$,
$$p(x|\theta) = \binom{n}{x} \theta^x (1-\theta)^{n-x} = \binom{n}{x} \exp\left[ x \log\left(\frac{\theta}{1-\theta}\right) + n \log(1-\theta) \right].$$
This is a one-parameter exponential family with
$\eta(\theta) = \log\left(\frac{\theta}{1-\theta}\right)$, $B(\theta) = -n \log(1-\theta)$, $T(x) = x$, $h(x) = \binom{n}{x}$.
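As a quick numerical sanity check of the two decompositions above, the following sketch evaluates $h(x)\exp\{\eta(\theta)T(x) - B(\theta)\}$ and compares it with the usual Poisson and binomial pmf formulas. The helper name `expfam_pmf` and the values of `theta` and `n` are illustrative assumptions, not part of the notes.

```python
import math

def expfam_pmf(x, h, eta, T, B):
    """Evaluate h(x) * exp{eta * T(x) - B} for given scalar eta and B."""
    return h(x) * math.exp(eta * T(x) - B)

theta = 0.3
n = 6

# Poisson: eta = log(theta), B = theta, T(x) = x, h(x) = 1/x!
poisson_gap = max(
    abs(theta ** x * math.exp(-theta) / math.factorial(x)
        - expfam_pmf(x, lambda v: 1 / math.factorial(v), math.log(theta), lambda v: v, theta))
    for x in range(20)
)

# Binomial: eta = log(theta/(1-theta)), B = -n log(1-theta), T(x) = x, h(x) = C(n, x)
binomial_gap = max(
    abs(math.comb(n, x) * theta ** x * (1 - theta) ** (n - x)
        - expfam_pmf(x, lambda v: math.comb(n, v), math.log(theta / (1 - theta)),
                     lambda v: v, -n * math.log(1 - theta)))
    for x in range(n + 1)
)
print(poisson_gap, binomial_gap)
```

Both gaps should be at the level of floating-point rounding, which is consistent with the sign convention $-B(\theta)$ in (0.1).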
The families of distributions obtained by taking iid samples
from one-parameter exponential families are themselves
one-parameter exponential families.
Specifically, suppose $X \sim P_\theta$ and $\{P_\theta : \theta \in \Theta\}$ is an
exponential family. Then for $X_1, \ldots, X_n$ iid with common
distribution $P_\theta$,
$$p(x_1, \ldots, x_n | \theta) = \left[ \prod_{i=1}^n h(x_i) \right] \exp\left\{ \eta(\theta) \sum_{i=1}^n T(x_i) - n B(\theta) \right\}.$$
A sufficient statistic is $\sum_{i=1}^n T(x_i)$, and it is one-dimensional
whatever the sample size $n$ is.
For $X_1, \ldots, X_n$ iid Poisson$(\theta)$, the sufficient statistic
$\sum_{i=1}^n T(x_i)$ has a Poisson$(n\theta)$ distribution and hence has an
exponential family model. It is generally true that the
sufficient statistic of an exponential family model follows
an exponential family.
Theorem 1.6.1: Let $\{P_\theta : \theta \in \Theta\}$ be a one-parameter
exponential family of discrete distributions:
$$p(x|\theta) = h(x) \exp\{\eta(\theta) T(x) - B(\theta)\}.$$
Then the family of the distributions of the statistic $T(X)$ is
a one-parameter exponential family of discrete distributions
whose pmf may be written
$$h^*(t) \exp\{\eta(\theta) t - B(\theta)\}$$
for suitable $h^*$.
Proof: By definition,
$$\begin{aligned}
P_\theta[T(X) = t] &= \sum_{\{x : T(x) = t\}} p(x|\theta) \\
&= \sum_{\{x : T(x) = t\}} h(x) \exp[\eta(\theta) T(x) - B(\theta)] \\
&= \exp[\eta(\theta) t - B(\theta)] \left\{ \sum_{\{x : T(x) = t\}} h(x) \right\}.
\end{aligned}$$
If we let $h^*(t) = \sum_{\{x : T(x) = t\}} h(x)$, the result follows.
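The construction of $h^*$ in the proof can be carried out explicitly in a small case. The sketch below assumes the joint model of $n$ iid Poisson$(\theta)$ observations, for which $h(x) = \prod_i 1/x_i!$, $T(x) = \sum_i x_i$ and the normalizing term is $n\theta$; the function name `h_star` and the values of `n` and `theta` are illustrative. It sums $h(x)$ over the set $\{x : T(x) = t\}$ by enumeration and compares the resulting pmf with the Poisson$(n\theta)$ pmf.

```python
import math
from itertools import product

def h_star(t, n):
    """Sum h(x) = prod 1/x_i! over all x in {0,1,...}^n with x_1 + ... + x_n = t."""
    total = 0.0
    for x in product(range(t + 1), repeat=n):
        if sum(x) == t:
            total += math.prod(1 / math.factorial(xi) for xi in x)
    return total

n, theta = 3, 0.7
for t in range(6):
    # Theorem 1.6.1 form: h*(t) exp{eta(theta) t - n B(theta)} with eta = log(theta), B = theta
    from_theorem = h_star(t, n) * math.exp(t * math.log(theta) - n * theta)
    # Poisson(n * theta) pmf evaluated directly
    direct = (n * theta) ** t * math.exp(-n * theta) / math.factorial(t)
    print(t, from_theorem, direct)
```

In this case the enumeration reproduces $h^*(t) = n^t/t!$, which is the multinomial theorem in disguise.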
A similar theorem holds for continuous exponential
families.
A useful reparameterization of the exponential family
model is to index $\eta = \eta(\theta)$ as the parameter to yield
$$p(x|\eta) = h(x) \exp[\eta T(x) - A(\eta)], \qquad (0.2)$$
where $A(\eta) = \log \int h(x) \exp[\eta T(x)] dx$ in the continuous
case and the integral is replaced by a sum in the discrete
case. If $\theta \in \Theta$, then $A(\eta)$ must be finite for $\eta = \eta(\theta)$. Let
$\mathcal{E} = \{\eta : |A(\eta)| < \infty\}$. The model given by (0.2) with
$\eta$ ranging over $\mathcal{E}$ is called the canonical one-parameter
exponential family generated by $T$ and $h$. $\mathcal{E}$ is called the
natural parameter space and $T$ is called the natural
sufficient statistic. The canonical one-parameter
exponential family contains the one-parameter exponential
family (0.1) with parameter space $\Theta$, and $\mathcal{E}$ can be thought
of as the “biggest” possible parameter space for the
exponential family.
Example 1: Let $X \sim$ Poisson$(\theta)$, $0 < \theta < \infty$. Then for
$x \in \{0, 1, 2, \ldots\}$,
$$p(x|\theta) = \frac{\theta^x e^{-\theta}}{x!} = \frac{1}{x!} \exp\{x \log\theta - \theta\}. \qquad (0.3)$$
Letting $\eta = \log\theta$, we have
$$p(x|\eta) = \frac{1}{x!} \exp\{\eta x - \exp(\eta)\}, \quad x \in \{0, 1, 2, \ldots\}.$$
We have
$$A(\eta) = \log \sum_{x=0}^{\infty} \frac{1}{x!} e^{\eta x} = \log \sum_{x=0}^{\infty} \frac{(e^{\eta})^x}{x!} = \log \exp(e^{\eta}) = e^{\eta}.$$
Thus, $\mathcal{E} = \{\eta : |A(\eta)| < \infty\} = \mathbb{R}$.
Note that if $1 < \theta < \infty$, then (0.3) would still be a one-parameter
exponential family but it would be a strict subset
of the canonical one-parameter exponential family
generated by $T$ and $h$ with natural parameter space
$\mathcal{E} = \{\eta : |A(\eta)| < \infty\} = \mathbb{R}$.
A useful result about exponential families is the following
computational shortcut for moments of the natural
sufficient statistic:
Theorem 1.6.2: If $X$ is distributed according to (0.2) and
$\eta$ is an interior point of $\mathcal{E}$, then the moment-generating
function of $T(X)$ exists and is given by
$$M(s) = E[\exp(sT(X))] = \exp[A(s + \eta) - A(\eta)]$$
for $s$ in some neighborhood of 0.
Moreover,
$$E_\eta[T(X)] = A'(\eta), \quad \mathrm{Var}_\eta[T(X)] = A''(\eta).$$
Proof: This is the proof for the continuous case.
$$\begin{aligned}
M(s) &= E(\exp(sT(X))) = \int h(x) \exp[(s + \eta) T(x) - A(\eta)] dx \\
&= \{\exp[A(s + \eta) - A(\eta)]\} \int h(x) \exp[(s + \eta) T(x) - A(s + \eta)] dx \\
&= \exp[A(s + \eta) - A(\eta)]
\end{aligned}$$
because the last factor, being the integral of a density, is
one. The rest of the theorem follows from the moment
generating property of $M(s)$ (see Section A.12 of Bickel
and Doksum).
Comment on proof: In order for the moment generating
function (MGF) properties to hold, the MGF must exist (be
less than infinity) for $s$ in some neighborhood of 0. The
proof that the MGF exists for $s$ in some neighborhood of 0
relies on the fact that $\mathcal{E}$ is an interval or $\mathbb{R}$, which is
established in Section 1.6.4.
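Example 1 continued: Let $X \sim$ Poisson$(\theta)$, $0 < \theta < \infty$.
The natural sufficient statistic is $T(X) = X$ and $\eta = \log\theta$,
$A(\eta) = e^{\eta}$. Thus, using Theorem 1.6.2,
$$E_\theta[X] = \frac{d}{d\eta} e^{\eta} \Big|_{\eta = \log\theta} = e^{\eta} \Big|_{\eta = \log\theta} = \theta,$$
$$\mathrm{Var}_\theta[X] = \frac{d^2}{d\eta^2} e^{\eta} \Big|_{\eta = \log\theta} = e^{\eta} \Big|_{\eta = \log\theta} = \theta.$$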
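Theorem 1.6.2 can be illustrated numerically for the Poisson family. The sketch below assumes $A(\eta) = e^{\eta}$ as in Example 1 and uses central finite differences in place of exact derivatives; the helper names, the step size, and the values of `theta` and `s` are illustrative choices. It also compares the MGF formula $\exp[A(s+\eta) - A(\eta)]$ with a direct, truncated evaluation of $E[\exp(sX)]$.

```python
import math

def A(eta):
    """Log-partition function of the canonical Poisson family."""
    return math.exp(eta)

def derivative(f, x, step=1e-4):
    """Central finite-difference approximation to f'(x)."""
    return (f(x + step) - f(x - step)) / (2 * step)

def second_derivative(f, x, step=1e-4):
    """Central finite-difference approximation to f''(x)."""
    return (f(x + step) - 2 * f(x) + f(x - step)) / step ** 2

theta = 2.5
eta = math.log(theta)
mean_from_A = derivative(A, eta)
var_from_A = second_derivative(A, eta)

# Compare the MGF formula exp[A(s + eta) - A(eta)] with a direct (truncated) sum.
s = 0.3
mgf_formula = math.exp(A(s + eta) - A(eta))
mgf_direct = sum(math.exp(s * x) * theta ** x * math.exp(-theta) / math.factorial(x)
                 for x in range(100))
print(mean_from_A, var_from_A, mgf_formula, mgf_direct)
```

Both approximate derivatives should be close to theta, matching the mean and variance of the Poisson distribution.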
Example 2: Suppose $X_1, \ldots, X_n$ is a sample from a
population with pdf
$$p(x|\theta) = \frac{x}{\theta^2} \exp\left(-\frac{x^2}{2\theta^2}\right), \quad x > 0, \ \theta > 0.$$
This is known as the Rayleigh distribution. It is used to
model the density of time until failure for certain types of
equipment. The data comes from an exponential family:
$$p(x_1, \ldots, x_n | \theta) = \prod_{i=1}^n \frac{x_i}{\theta^2} \exp\left(-\frac{x_i^2}{2\theta^2}\right) = \left[ \prod_{i=1}^n x_i \right] \exp\left( -\frac{1}{2\theta^2} \sum_{i=1}^n x_i^2 - n \log\theta^2 \right).$$
Here
$\eta = -\frac{1}{2\theta^2}$, $T(x) = \sum_{i=1}^n x_i^2$, $B(\theta) = n \log\theta^2$, $A(\eta) = -n \log(-2\eta)$.
Therefore, the natural sufficient statistic $\sum_{i=1}^n X_i^2$ has mean
$A'(\eta) = -n/\eta = 2n\theta^2$ and variance $A''(\eta) = n/\eta^2 = 4n\theta^4$.
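A small simulation gives an independent check on the mean and variance obtained from $A'(\eta)$ and $A''(\eta)$. The sketch below assumes the Rayleigh pdf of Example 2 and draws from it by inverting its CDF $F(x) = 1 - \exp(-x^2/(2\theta^2))$; the seed, sample size, number of replications and function name are arbitrary illustrative choices.

```python
import math
import random

def rayleigh_draw(theta, rng):
    """Inverse-CDF draw: F(x) = 1 - exp(-x^2 / (2 theta^2)) for x > 0."""
    u = rng.random()
    return theta * math.sqrt(-2.0 * math.log(1.0 - u))

rng = random.Random(550)  # fixed seed so the run is reproducible
n, theta, reps = 5, 1.5, 100_000

# Simulate the natural sufficient statistic T = sum of X_i^2 many times.
draws = [sum(rayleigh_draw(theta, rng) ** 2 for _ in range(n)) for _ in range(reps)]
sim_mean = sum(draws) / reps
sim_var = sum((t - sim_mean) ** 2 for t in draws) / (reps - 1)

eta = -1.0 / (2.0 * theta ** 2)
print(sim_mean, -n / eta)     # theory: 2 n theta^2
print(sim_var, n / eta ** 2)  # theory: 4 n theta^4
```

The simulated moments should agree with the theoretical values up to Monte Carlo error, which shrinks as `reps` grows.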
Proving that a one parameter family is not an exponential
family
A one parameter exponential family is a family
$p(x|\theta) = h(x) \exp\{\eta(\theta) T(x) - B(\theta)\}$, $\theta \in \Theta$.
Consider a one parameter family $\{p(x|\theta), \theta \in \Theta\}$. If the
support of $p(x|\theta)$ is different for different $\theta$, then the
family is not an exponential family because $p(x|\theta) > 0$ if
and only if $h(x) > 0$.
Suppose that the support of $p(x|\theta)$ is the same for all
$\theta \in \Theta$. We can write the pdf or pmf of the family as
$p(x|\theta) = h(x) \exp\{g(x, \theta)\}$.
In order for this to be an exponential family, we need to be
able to write
$$g(x, \theta) = \eta(\theta) T(x) - B(\theta) \qquad (0.4)$$
for some functions $\eta$, $B$ and $T$.
Suppose (0.4) holds. Then for any two sample points $x_1$
and $x_2$,
$$g(x_1, \theta) - g(x_2, \theta) = \eta(\theta) [T(x_1) - T(x_2)],$$
and for any four sample points $x_1, x_2, x_3, x_4$ with $T(x_3) \neq T(x_4)$ and $\eta(\theta) \neq 0$,
$$\frac{g(x_1, \theta) - g(x_2, \theta)}{g(x_3, \theta) - g(x_4, \theta)} = \frac{T(x_1) - T(x_2)}{T(x_3) - T(x_4)}$$
is constant as a function of $\theta$.
Thus, a necessary condition for a one-parameter
exponential family is that for any four sample points
$x_1, x_2, x_3, x_4$, the ratio
$$\frac{g(x_1, \theta) - g(x_2, \theta)}{g(x_3, \theta) - g(x_4, \theta)}$$
must be constant as a function of $\theta$ wherever it is defined.
Proof that the Cauchy family is not an exponential family:
The Cauchy family is
$$p(x|\theta) = \frac{1}{\pi (1 + (x - \theta)^2)} = \exp\left\{ \log \frac{1}{\pi (1 + (x - \theta)^2)} \right\} = \exp\{ -\log\pi - \log[1 + (x - \theta)^2] \},$$
$-\infty < \theta < \infty$, $-\infty < x < \infty$.
Thus, for the Cauchy family,
$$g(x, \theta) = -\log\pi - \log[1 + (x - \theta)^2].$$
For any four sample points $x_1, x_2, x_3, x_4$,
$$\frac{g(x_1, \theta) - g(x_2, \theta)}{g(x_3, \theta) - g(x_4, \theta)} = \frac{-\log[1 + (x_1 - \theta)^2] + \log[1 + (x_2 - \theta)^2]}{-\log[1 + (x_3 - \theta)^2] + \log[1 + (x_4 - \theta)^2]}.$$
This is not constant as a function of $\theta$, so the Cauchy
family is not an exponential family.
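The four-point criterion is easy to evaluate numerically. The sketch below computes the ratio for the Cauchy family and, for contrast, for the Poisson family with $g(x, \theta) = x\log\theta - \theta$. The sample points and theta values are hypothetical choices made here so that no denominator vanishes; the function names are illustrative.

```python
import math

def ratio(g, points, theta):
    """[g(x1,theta) - g(x2,theta)] / [g(x3,theta) - g(x4,theta)] for points = (x1,x2,x3,x4)."""
    x1, x2, x3, x4 = points
    return (g(x1, theta) - g(x2, theta)) / (g(x3, theta) - g(x4, theta))

def g_cauchy(x, theta):
    return -math.log(math.pi) - math.log(1 + (x - theta) ** 2)

def g_poisson(x, theta):
    # From p(x|theta) = (1/x!) exp{x log(theta) - theta}, with h(x) = 1/x!
    return x * math.log(theta) - theta

points = (0, 1, 2, 5)      # hypothetical sample points chosen for illustration
thetas = (0.25, 2.0, 3.0)  # chosen so that no denominator is zero
cauchy_ratios = [ratio(g_cauchy, points, t) for t in thetas]
poisson_ratios = [ratio(g_poisson, points, t) for t in thetas]
print(cauchy_ratios)   # varies with theta
print(poisson_ratios)  # the same value at every theta
```

A single set of four points with a non-constant ratio is enough to rule out the exponential family form, whereas a constant ratio at a few points, as for the Poisson, is only consistent with it.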
III. Multiparameter exponential families
One-parameter exponential families have a natural one-dimensional
sufficient statistic regardless of the sample
size. A k-parameter exponential family has a k-dimensional
sufficient statistic regardless of the sample
size.
The family of distributions of a model $\{P_\theta : \theta \in \Theta\}$ is said to
be a k-parameter exponential family if there exist real-valued
functions $\eta_1, \ldots, \eta_k, B$ of $\theta$ and real-valued
functions $T_1, \ldots, T_k, h$ of $x$ such that the pdf or pmf
may be written as
$$p(x|\theta) = h(x) \exp\left\{ \sum_{j=1}^k \eta_j(\theta) T_j(x) - B(\theta) \right\} \qquad (0.5)$$
By the factorization theorem, $T(X) = (T_1(X), \ldots, T_k(X))$ is
a sufficient statistic.
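Example 1: Suppose $X_1, \ldots, X_n$ is iid $N(\mu, \sigma^2)$. Then
$$\begin{aligned}
p(x|\mu, \sigma^2) &= \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(x_i - \mu)^2}{2\sigma^2} \right\} \\
&= \left( \frac{1}{\sqrt{2\pi\sigma^2}} \right)^n \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^n x_i^2 + \frac{\mu}{\sigma^2} \sum_{i=1}^n x_i - \frac{n\mu^2}{2\sigma^2} \right\}
\end{aligned}$$
which corresponds to a two-parameter exponential family
with $T(X) = (\sum_{i=1}^n X_i, \sum_{i=1}^n X_i^2)$.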
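The two-parameter form of the normal likelihood can be verified numerically. The sketch below takes $h(x) = 1$, $\eta = (\mu/\sigma^2, -1/(2\sigma^2))$ and absorbs the normalizing constant into $B$; this particular split between $h$ and $B$, the function names, and the data values are assumptions made for illustration (the decomposition is not unique).

```python
import math

def normal_joint_direct(xs, mu, sigma2):
    """Product of N(mu, sigma2) densities evaluated at the sample xs."""
    return math.prod(
        math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2) for x in xs
    )

def normal_joint_expfam(xs, mu, sigma2):
    """Same density written through T(x) = (sum x_i, sum x_i^2)."""
    n = len(xs)
    t1, t2 = sum(xs), sum(x * x for x in xs)
    eta1, eta2 = mu / sigma2, -1.0 / (2 * sigma2)
    # B(theta) = n mu^2/(2 sigma2) + (n/2) log(2 pi sigma2) absorbs the normalizing constant
    B = n * mu ** 2 / (2 * sigma2) + 0.5 * n * math.log(2 * math.pi * sigma2)
    return math.exp(eta1 * t1 + eta2 * t2 - B)

xs = [0.3, -1.2, 2.5, 0.9]  # hypothetical data for illustration
direct = normal_joint_direct(xs, mu=0.5, sigma2=2.0)
expfam = normal_joint_expfam(xs, mu=0.5, sigma2=2.0)
print(direct, expfam)
```

Because the second function touches the data only through the two sums, any two samples with the same sums of $x_i$ and $x_i^2$ receive the same likelihood.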
Example 2: Multinomial. Suppose we observe $n$
independent trials where each trial can end up in one of $k$
possible categories $\{1, \ldots, k\}$ with probabilities
$\theta = \{p_1, \ldots, p_{k-1}, p_k = 1 - p_1 - \cdots - p_{k-1}\}$. Let
$y_1(x), \ldots, y_k(x)$ be the number of outcomes in categories
$1, \ldots, k$ in the $n$ trials. Then,
$$\begin{aligned}
p(x|\theta) &= \frac{n!}{y_1(x)! \cdots y_k(x)!} p_1^{y_1(x)} \cdots p_k^{y_k(x)} \\
&= \frac{n!}{y_1(x)! \cdots y_k(x)!} \left( \frac{p_1}{p_k} \right)^{y_1(x)} \cdots \left( \frac{p_{k-1}}{p_k} \right)^{y_{k-1}(x)} p_k^n \\
&= \frac{n!}{y_1(x)! \cdots y_k(x)!} \exp[ y_1(x) \log(p_1/p_k) + \cdots + y_{k-1}(x) \log(p_{k-1}/p_k) + n \log p_k ] \\
&= \frac{n!}{y_1(x)! \cdots y_k(x)!} \exp\left[ y_1(x) \log(p_1/p_k) + \cdots + y_{k-1}(x) \log(p_{k-1}/p_k) - n \log\left( 1 + \sum_{i=1}^{k-1} \exp\left( \log \frac{p_i}{p_k} \right) \right) \right]
\end{aligned}$$
The multinomial is a $(k-1)$-parameter exponential family
with $\eta = (\log(p_1/p_k), \ldots, \log(p_{k-1}/p_k))$,
$T(x) = (y_1(x), \ldots, y_{k-1}(x))$ and $A(\eta) = n \log(1 + \sum_{i=1}^{k-1} \exp(\eta_i))$.
Moments of Sufficient Statistics: As with the one-parameter
exponential family, it is convenient to index the
family by $\eta = (\eta_1, \ldots, \eta_k)$. The analogue of Theorem 1.6.2
that calculates the moments of the sufficient statistics is
Corollary 1.6.1:
$$E_{\eta_0} T(X) = \left( \frac{\partial A}{\partial \eta_1}(\eta_0), \ldots, \frac{\partial A}{\partial \eta_k}(\eta_0) \right)^T,$$
$$\mathrm{Var}_{\eta_0} T(X) = \left[ \frac{\partial^2 A}{\partial \eta_a \partial \eta_b}(\eta_0) \right]_{a,b = 1}^{k}.$$
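Example 2 continued: For the multinomial distribution,
$$E[y_j(x)] = \frac{\partial}{\partial \eta_j} \left\{ n \log\left( 1 + \sum_{l=1}^{k-1} e^{\eta_l} \right) \right\} = \frac{n e^{\eta_j}}{1 + \sum_{l=1}^{k-1} e^{\eta_l}} = \frac{n \, p_j / p_k}{1 + \sum_{l=1}^{k-1} p_l / p_k} = n p_j,$$
$$\mathrm{Cov}_{\eta_0}[y_i(x), y_j(x)] = \frac{\partial^2}{\partial \eta_i \partial \eta_j} \left\{ n \log\left( 1 + \sum_{l=1}^{k-1} e^{\eta_l} \right) \right\} = -\frac{n e^{\eta_i} e^{\eta_j}}{(1 + \sum_{l=1}^{k-1} e^{\eta_l})^2} = -n p_i p_j, \quad i \neq j,$$
$$\mathrm{Var}_{\eta_0}[y_i(x)] = \frac{\partial^2}{\partial \eta_i^2} \left\{ n \log\left( 1 + \sum_{l=1}^{k-1} e^{\eta_l} \right) \right\} = n p_i (1 - p_i).$$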
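The multinomial moment formulas can be checked by differentiating $A(\eta)$ numerically. The sketch below assumes $A(\eta) = n\log(1 + \sum_i e^{\eta_i})$ as in Example 2 and uses nested central differences for the second derivatives; the cell probabilities, the step size and the helper names are illustrative choices.

```python
import math

def A(eta, n):
    """Multinomial log-partition A(eta) = n log(1 + sum_i exp(eta_i))."""
    return n * math.log(1 + sum(math.exp(e) for e in eta))

def partial(f, eta, i, step=1e-4):
    """Central-difference approximation to the partial derivative of f in coordinate i."""
    up, down = list(eta), list(eta)
    up[i] += step
    down[i] -= step
    return (f(up) - f(down)) / (2 * step)

n = 10
p = [0.2, 0.3, 0.5]                        # hypothetical cell probabilities, k = 3
eta = [math.log(pi / p[-1]) for pi in p[:-1]]
f = lambda e: A(e, n)

means = [partial(f, eta, j) for j in range(len(eta))]
# Second derivatives: differentiate the first partial derivative once more.
hessian = [[partial(lambda e, a=a: partial(f, e, a), eta, b) for b in range(len(eta))]
           for a in range(len(eta))]
print(means)    # compare with n * p_j
print(hessian)  # compare with n p_i (1 - p_i) on the diagonal, -n p_i p_j off it
```

With these inputs the gradient should be close to $(2, 3)$ and the Hessian close to the matrix with diagonal $(1.6, 2.1)$ and off-diagonal entries $-0.6$, up to finite-difference error.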
Curved Exponential Families:
A curved exponential family is a family
$$p(x|\theta) = h(x) \exp\left\{ \sum_{j=1}^k \eta_j(\theta) T_j(x) - B(\theta) \right\}$$
for which $\dim(\theta) < k$.
An exponential family for which $\dim(\theta) = k$ is a full
exponential family.
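Example of a curved exponential family:
$X_1, \ldots, X_n \sim N(\mu, \mu^2)$.
$$\begin{aligned}
p(x|\mu) &= \prod_{i=1}^n \frac{1}{\sqrt{2\pi\mu^2}} \exp\left\{ -\frac{(x_i - \mu)^2}{2\mu^2} \right\} \\
&= \left( \frac{1}{\sqrt{2\pi\mu^2}} \right)^n \exp\left\{ -\frac{1}{2\mu^2} \sum_{i=1}^n x_i^2 + \frac{\mu}{\mu^2} \sum_{i=1}^n x_i - \frac{n\mu^2}{2\mu^2} \right\} \\
&= \left( \frac{1}{\sqrt{2\pi\mu^2}} \right)^n \exp\left\{ -\frac{1}{2\mu^2} \sum_{i=1}^n x_i^2 + \frac{1}{\mu} \sum_{i=1}^n x_i - \frac{n}{2} \right\}
\end{aligned}$$
This is an exponential family with $\eta = (-1/(2\mu^2), 1/\mu)$.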
The parameter space is a curve:
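As a small numerical sketch, assuming the $N(\mu, \mu^2)$ example with $\eta(\mu) = (-1/(2\mu^2), 1/\mu)$, the snippet below evaluates $\eta$ at a few illustrative values of $\mu$ and compares the first coordinate with $-\eta_2^2/2$. The function name `natural_parameter` and the chosen values of `mu` are hypothetical.

```python
def natural_parameter(mu):
    """eta(mu) = (-1/(2 mu^2), 1/mu) for the N(mu, mu^2) family."""
    return (-1.0 / (2.0 * mu ** 2), 1.0 / mu)

for mu in (0.5, 1.0, 2.0, 4.0):
    eta1, eta2 = natural_parameter(mu)
    # Every point satisfies eta1 = -eta2^2 / 2, a parabola in the (eta1, eta2) plane.
    print(mu, eta1, eta2, -eta2 ** 2 / 2)
```

The one-dimensional parameter $\mu$ therefore traces out a parabola inside the two-dimensional natural parameter space rather than filling an open subset of it.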