Notes 7 - Wharton Statistics Department

Stat 550 Notes 7
Notes:
1. I would like to change my office hours today from the usual
time of 4:45-5:45 to 1-2 because of a statistics seminar this
afternoon. I also have office hours Wed., 9-10 and Thurs., 4:45-5:45 this week, or by appointment.
2. I am going to go over the material from Bickel and Doksum,
Section 1.6.4 in the Appendum to Notes 6 later when we cover
Section 2.3 as I think the motivation will be clearer then.
I. Methods of Estimation: Basic Heuristics of Estimation
Basic Setup: Family of possible distributions for data X,
$\{ p(x \mid \theta), \theta \in \Theta \}$. Observe data X.
Point estimation: Best estimate of $\theta$ based on data X.
We discussed the decision theoretic approach to evaluating point
estimates focusing particularly on squared error as a loss
function which results in mean squared error as the risk
function. But how do we come up with possible estimators of
$\theta$?
Example Estimation Problems:
(1) Bernoulli data: We observe $X_1, \ldots, X_n$ iid Bernoulli($\theta$) (e.g.,
Yao Ming’s free throws). How do we estimate $\theta$?
(2) Regression. We are interested in the mean of a response Y
given covariates $X_1, \ldots, X_p$ and assume a model
$E(Y \mid X_1, \ldots, X_p) = g(\beta, X_1, \ldots, X_p)$, where g is a known
function and $\beta$ is the unknown parameter vector.
Example: Life insurance companies are keenly interested in
predicting how long their customers will live because their
premiums and profitability depend on such numbers. An
actuary for one insurance company gathered data from 100
recently deceased male customers. She recorded $Y$ = the age at
death of the customer, $X_1$ = the age at death of his mother, $X_2$
= the age at death of his father, $X_3$ = the mean age at death of his
grandmothers and $X_4$ = the mean age at death of his grandfathers.
Multiple linear regression model:
$$E(Y \mid X_1, X_2, X_3, X_4) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4$$
(3) Parameter estimation in an iid model. As part of a study to
estimate the population size of the bowhead whale, Raftery and
Zeh wanted to understand the distribution of whale swimming
speeds. They randomly sampled the time to swim 1km of 210
whales and believe that the gamma model is a reasonable model
for this data:
$$p(x \mid p, \lambda) = \frac{\lambda^p x^{p-1} e^{-\lambda x}}{\Gamma(p)}, \quad x > 0.$$
How do we estimate $p$ and $\lambda$?
(4) Hardy-Weinberg equilibrium. If gene frequencies are in
equilibrium, then for a gene with two alleles, the genotypes AA,
Aa and aa occur in a population with frequencies
$(1-\theta)^2$, $2\theta(1-\theta)$, $\theta^2$ respectively, according to the
Hardy-Weinberg law. In a sample from the Chinese population of
Hong Kong in 1937, blood types occurred with the following
frequencies, where M and N are erythrocyte antigens:
Blood Type    M      MN     N      Total
Frequency     342    500    187    1029
We can model the observed blood types as an iid sample from a
multinomial distribution with probabilities $(1-\theta)^2$, $2\theta(1-\theta)$, $\theta^2$.
How do we estimate $\theta$?
Minimum contrast heuristic: Choose a contrast function
$\rho(X, \theta)$ that measures the “discrepancy” between the data X and
the parameter vector $\theta$. The range of the contrast function is
typically taken to be the real numbers greater than or equal to
zero and the smaller the value of the contrast function, the more
“plausible” $\theta$ is based on the data X.
Let $\theta_0$ denote the true parameter. Define the population
discrepancy $D(\theta_0, \theta)$ as the expected value of the discrepancy
$\rho(X, \theta)$:
$$D(\theta_0, \theta) = E_{\theta_0} \rho(X, \theta) \qquad (0.1)$$
In order for $\rho(X, \theta)$ to be a valid contrast function, we require
that $D(\theta_0, \theta)$ is uniquely minimized for $\theta = \theta_0$, i.e.,
$D(\theta_0, \theta) > D(\theta_0, \theta_0)$ if $\theta \neq \theta_0$.
That is, $\theta = \theta_0$ is the minimizer of $D(\theta_0, \theta)$. Although we don’t know
$D(\theta_0, \theta)$, the contrast function $\rho(X, \theta)$ is an unbiased estimate
of $D(\theta_0, \theta)$ (see (0.1)). The minimum contrast heuristic is to
estimate $\theta$ by minimizing $\rho(X, \theta)$, i.e.,
$$\hat{\theta} = \arg\min_{\theta \in \Theta} \rho(X, \theta).$$
Example 1: Suppose $X_1, \ldots, X_n$ iid Bernoulli($p$), $0 \le p \le 1$. The
following is an example of a contrast function and an
associated estimate:
“Least Squares”: $\rho(X, p) = \sum_{i=1}^n (X_i - p)^2$.
$$D(p_0, p) = E_{p_0}\left[\sum_{i=1}^n (X_i - p)^2\right] = n p_0 - 2 n p p_0 + n p^2,$$
using $X_i^2 = X_i$ for Bernoulli data. We have
$$\frac{\partial D(p_0, p)}{\partial p} = -2 n p_0 + 2 n p$$
and it can be verified by the second derivative test that
$\arg\min_p D(p_0, p) = p_0$.
Thus, $\rho(X, p) = \sum_{i=1}^n (X_i - p)^2$ is a valid contrast function.
The associated estimate is
$$\hat{p} = \arg\min_p \rho(X, p) = \arg\min_p \sum_{i=1}^n (X_i - p)^2 = \arg\min_p \left[ n p^2 - 2 p \sum_{i=1}^n X_i \right] = \frac{\sum_{i=1}^n X_i}{n}.$$
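As a quick numerical check of the least squares estimate, the sketch below minimizes the contrast over a grid of candidate values of p and compares the result with the sample mean. The data vector and the grid resolution are hypothetical choices made for illustration; they are not from the notes.

```python
# Hypothetical Bernoulli sample (1 = success, 0 = failure); not from the notes.
x = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

def contrast(p, data):
    """Least squares contrast rho(X, p) = sum_i (X_i - p)^2."""
    return sum((xi - p) ** 2 for xi in data)

# Minimize the contrast over a fine grid of candidate p values in [0, 1].
grid = [k / 1000 for k in range(1001)]
p_hat_grid = min(grid, key=lambda p: contrast(p, x))

# Closed form from the derivation: the sample mean.
p_hat_closed = sum(x) / len(x)

print(p_hat_grid, p_hat_closed)
```

The grid search is only a check; in practice the closed form (the sample mean) is what one would use.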
The following is an example of a function that is not a contrast
function:
$$\rho(X, p) = \sum_{i=1}^n (X_i - p)^4$$
$$D(p_0, p) = E_{p_0}\left[\sum_{i=1}^n (X_i - p)^4\right] = E_{p_0}\left[\sum_{i=1}^n \left(X_i^4 - 4 X_i^3 p + 6 X_i^2 p^2 - 4 X_i p^3 + p^4\right)\right] = n\left(p_0 - 4 p_0 p + 6 p_0 p^2 - 4 p_0 p^3 + p^4\right)$$
For $p_0 = 0.7$, we find that $D(p_0, p)$ is minimized at about
$p = 0.57$, not at $p = p_0$.
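The location of the minimizer can be checked numerically. The sketch below evaluates the per-observation population discrepancy for the fourth-power function on a grid and reports where it is smallest; the grid resolution and the function name are assumptions made for illustration.

```python
def population_discrepancy(p0, p):
    """Per-observation D(p0, p) = E_{p0}(X - p)^4 for X ~ Bernoulli(p0).

    X equals 1 with probability p0 and 0 with probability 1 - p0, so the
    expectation is a two-term sum. Multiplying by n does not move the minimizer.
    """
    return p0 * (1 - p) ** 4 + (1 - p0) * p ** 4

p0 = 0.7
grid = [k / 10000 for k in range(10001)]
p_min = min(grid, key=lambda p: population_discrepancy(p0, p))
print(p_min)  # close to 0.57, well away from p0 = 0.7
```

Because the minimizer differs from the true value 0.7, minimizing this function would target the wrong parameter value even with unlimited data.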
Least squares methods for estimating regression can be viewed
as minimum contrast estimates (Example 2.1.1).
Estimating Equation Heuristic:
Suppose $\theta$ is d-dimensional. Consider a d-dimensional function
$\psi(X, \theta)$ and define
$$V(\theta_0, \theta) = E_{\theta_0} \psi(X, \theta).$$
Suppose $V(\theta_0, \theta) = 0$ has $\theta = \theta_0$ as its unique solution for all $\theta_0 \in \Theta$.
We do not know $V(\theta_0, \theta)$ but $\psi(X, \theta)$ is an unbiased estimate of
$V(\theta_0, \theta)$. The estimating equation heuristic is to estimate $\theta$ by
solving $\psi(X, \theta) = 0$, i.e.,
$$\psi(X, \hat{\theta}) = 0.$$
$\psi(X, \theta)$ is called an estimating equation.
II. Method of Moments
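Suppose $X_1, \ldots, X_n$ iid from $\{p(x \mid \theta), \theta \in \Theta\}$ where $\theta$ is d-dimensional.
Let $\mu_1(\theta), \ldots, \mu_d(\theta)$ denote the first d moments of the
population we are sampling from (assuming that they exist),
$$\mu_j(\theta) = E_\theta(X^j), \quad 1 \le j \le d.$$
Define the jth sample moment $\hat{\mu}_j$ by
$$\hat{\mu}_j = \frac{1}{n} \sum_{i=1}^n X_i^j, \quad 1 \le j \le d.$$
The function
$$\psi(X, \theta) = (\hat{\mu}_1 - \mu_1(\theta), \ldots, \hat{\mu}_d - \mu_d(\theta))$$
is an estimating equation for which
$$V(\theta_0, \theta_0) = E_{\theta_0} \psi(X, \theta_0) = (E_{\theta_0}\hat{\mu}_1 - \mu_1(\theta_0), \ldots, E_{\theta_0}\hat{\mu}_d - \mu_d(\theta_0)) = 0.$$
For many models, $V(\theta_0, \theta) \neq 0$ for all $\theta \neq \theta_0$ so that
$\psi(X, \hat{\theta}) = 0$ is a valid estimating equation.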
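Suppose $\theta \mapsto (\mu_1(\theta), \ldots, \mu_d(\theta))$ is a 1-1 continuous function
from $\mathbb{R}^d$ to $\mathbb{R}^d$. Then the estimating equation estimate of
$\theta$ based on $\psi(X, \theta)$ is the $\hat{\theta}$ that solves $\psi(X, \hat{\theta}) = 0$, i.e.,
$$\hat{\mu}_j - \mu_j(\hat{\theta}) = 0, \quad j = 1, \ldots, d.$$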
Example 1: $X_1, \ldots, X_n$ iid Uniform$(0, \theta)$.
$$\mu_1(\theta) = \frac{\theta}{2}$$
The method of moments estimator solves
$$\bar{X} - \frac{\hat{\theta}}{2} = 0,$$
i.e., $\hat{\theta} = 2\bar{X}$.
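Example 2: $X_1, \ldots, X_n$ iid $N(\mu, \sigma^2)$.
$$\mu_1(\theta) = \mu, \qquad \mu_2(\theta) = \mu^2 + \sigma^2$$
The method of moments estimator solves
$$\bar{X} - \hat{\mu} = 0$$
$$\frac{\sum_{i=1}^n X_i^2}{n} - (\hat{\mu}^2 + \hat{\sigma}^2) = 0.$$
Thus, $\hat{\mu} = \bar{X}$ and
$$\hat{\sigma}^2 = \frac{\sum_{i=1}^n X_i^2}{n} - \bar{X}^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n}.$$
Example 3: $X_1, \ldots, X_n$ iid Gamma$(p, \lambda)$.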
$$f(x \mid p, \lambda) = \frac{\lambda^p x^{p-1} e^{-\lambda x}}{\Gamma(p)}$$
for $x > 0$ (see Section B.2.2 of Bickel and Doksum).
The first two moments of the gamma distribution are
$$E_{p,\lambda}(X) = \frac{p}{\lambda}, \qquad E_{p,\lambda}(X^2) = \frac{p(p+1)}{\lambda^2}$$
(see Exercise B.2.3, page 526).
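The method of moments estimator solves
$$\bar{X} - \frac{\hat{p}}{\hat{\lambda}} = 0$$
$$\frac{\sum_{i=1}^n X_i^2}{n} - \frac{\hat{p}(\hat{p}+1)}{\hat{\lambda}^2} = 0$$
which yields
$$\hat{\lambda} = \frac{\bar{X}}{\frac{\sum_{i=1}^n X_i^2}{n} - \bar{X}^2} \quad \text{and} \quad \hat{p} = \bar{X}\hat{\lambda} = \frac{\bar{X}^2}{\frac{\sum_{i=1}^n X_i^2}{n} - \bar{X}^2}.$$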
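The algebra behind the solution of the two gamma moment equations can be filled in as follows. This is a worked sketch that uses only those two equations: the first gives $\hat{p}$ in terms of $\hat{\lambda}$, and substituting it into the second isolates $\hat{\lambda}$.

```latex
\begin{align*}
\hat{p} &= \bar{X}\hat{\lambda}
  && \text{(first moment equation)} \\
\frac{1}{n}\sum_{i=1}^n X_i^2
  &= \frac{\hat{p}^2}{\hat{\lambda}^2} + \frac{\hat{p}}{\hat{\lambda}^2}
   = \bar{X}^2 + \frac{\bar{X}}{\hat{\lambda}}
  && \text{(second moment equation, with } \hat{p} = \bar{X}\hat{\lambda}\text{)} \\
\hat{\lambda} &= \frac{\bar{X}}{\frac{1}{n}\sum_{i=1}^n X_i^2 - \bar{X}^2},
\qquad
\hat{p} = \bar{X}\hat{\lambda}
        = \frac{\bar{X}^2}{\frac{1}{n}\sum_{i=1}^n X_i^2 - \bar{X}^2}
\end{align*}
```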
Empirical Example: The gamma model is frequently used for
describing precipitation levels. In a study of the natural
variability of rainfall, the rainfall of summer storms was
measured by a network of rain gauges in southern Illinois for the
years 1960-1964. 227 measurements were taken.
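For these data, $\bar{X} = .224$ and $\frac{\sum_{i=1}^n X_i^2}{n} = 0.184$, so that the method
of moments estimates are
$$\hat{\lambda} = \frac{\bar{X}}{\frac{\sum_{i=1}^n X_i^2}{n} - \bar{X}^2} = \frac{.224}{.184 - .224^2} = 1.674$$
$$\hat{p} = \bar{X}\hat{\lambda} = \frac{\bar{X}^2}{\frac{\sum_{i=1}^n X_i^2}{n} - \bar{X}^2} = \frac{0.224^2}{0.184 - 0.224^2} = 0.375$$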
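The rainfall estimates can be reproduced from the two summary statistics alone. The sketch below is one way to do this; the function name `gamma_method_of_moments` is a hypothetical helper introduced for illustration, and the inputs are the sample mean and second sample moment quoted in the notes.

```python
def gamma_method_of_moments(mean, mean_sq):
    """Return (p_hat, lambda_hat) from the first two sample moments.

    mean    : sample mean, (1/n) * sum X_i
    mean_sq : second sample moment, (1/n) * sum X_i^2
    """
    variance = mean_sq - mean ** 2   # second moment minus squared mean
    lam_hat = mean / variance
    p_hat = mean * lam_hat
    return p_hat, lam_hat

p_hat, lam_hat = gamma_method_of_moments(0.224, 0.184)
print(round(p_hat, 3), round(lam_hat, 3))  # 0.375 1.674
```

The denominator is the (divide-by-n) sample variance, so the estimates are undefined when all observations are equal.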
The following plot shows the Gamma($p = .375$, $\lambda = 1.674$)
density plotted on the histogram. In order to make the visual
comparison easy, the density was normalized to have a total area
equal to the total area under the histogram, which is the number
of observations times the bin width of the histogram, or
227*.2=45.4.
Qualitatively, the fit of the gamma model to the data looks
reasonable; we will examine methods for assessing the goodness
of fit of a model to data in Chapter 4.
Large sample motivation for method of moments:
A reasonable requirement for a point estimator is that it should
converge to the true parameter value as we collect more and
more information.
Suppose $X_1, \ldots, X_n$ iid.
A point estimator $h(X_1, \ldots, X_n)$ of a parameter $q(\theta)$ is consistent
if $h(X_1, \ldots, X_n) \xrightarrow{P} q(\theta)$ as $n \to \infty$ for all $\theta \in \Theta$.
Definition of convergence in probability (A.14.1, page 466):
$h(X_1, \ldots, X_n) \xrightarrow{P} q(\theta)$ means that for all $\epsilon > 0$,
$$\lim_{n \to \infty} P[\,|h(X_1, \ldots, X_n) - q(\theta)| \ge \epsilon\,] = 0.$$
Under certain regularity conditions, the method of moments
estimator is consistent. We give a proof for a special case.
Let $g(\theta) = (\mu_1(\theta), \ldots, \mu_d(\theta))$. By the assumptions in
formulating the method of moments, g is a 1-1 continuous
function from $\mathbb{R}^d$ to $\mathbb{R}^d$. The method of moments estimator
solves
$$g(\hat{\theta}) - (\hat{\mu}_1, \ldots, \hat{\mu}_d) = 0.$$
When g’s range is $\mathbb{R}^d$, then
$\hat{\theta} = g^{-1}(\hat{\mu}_1, \ldots, \hat{\mu}_d)$. We prove the method of moments
estimator is consistent when $\hat{\theta} = g^{-1}(\hat{\mu}_1, \ldots, \hat{\mu}_d)$ and $g^{-1}$ is
continuous.
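Sketch of Proof: The method of moments estimator solves
$$\hat{\mu}_j - \mu_j(\hat{\theta}) = 0, \quad j = 1, \ldots, d.$$
By the law of large numbers,
$$(\hat{\mu}_1, \ldots, \hat{\mu}_d) \xrightarrow{P} (\mu_1(\theta), \ldots, \mu_d(\theta)).$$
By the continuous mapping theorem (A.14.8, page 467), since
$g^{-1}$ is assumed to be continuous,
$$\hat{\theta} = g^{-1}(\hat{\mu}_1, \ldots, \hat{\mu}_d) \xrightarrow{P} g^{-1}(\mu_1(\theta), \ldots, \mu_d(\theta)) = \theta.$$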
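Consistency can also be illustrated by simulation. The sketch below uses the Uniform(0, θ) model, where the method of moments estimator is twice the sample mean, and prints the absolute estimation error for increasing sample sizes. The true value 3.0, the seed, and the sample sizes are hypothetical choices; a single simulated run illustrates the limit but does not prove it.

```python
import random

def mom_uniform(sample):
    """Method of moments estimate of theta for Uniform(0, theta): 2 * sample mean."""
    return 2 * sum(sample) / len(sample)

random.seed(0)   # fixed seed so the run is reproducible
theta = 3.0      # hypothetical true parameter
errors = {}
for n in (10, 1000, 100000):
    sample = [random.uniform(0, theta) for _ in range(n)]
    errors[n] = abs(mom_uniform(sample) - theta)
    print(n, errors[n])
```

The errors need not decrease monotonically in any one run; consistency only says that large errors become increasingly unlikely as n grows.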
Comments on method of moments:
(1) Instead of using the first d moments, we could use higher
order moments (or other functions of the data; see Problem
2.1.13), leading to different estimating equations. The resulting
method of moments estimator may depend on which
moments we choose.
Example: $X_1, \ldots, X_n$ iid Poisson($\lambda$). The first moment is
$\mu_1(\lambda) = E_\lambda(X) = \lambda$. Thus, the method of moments estimator
based on the first moment is $\hat{\lambda} = \bar{X}$.
We could also consider using the second moment to form a
method of moments estimator.
$$\mu_2(\lambda) = E_\lambda(X^2) = \lambda + \lambda^2.$$
The method of moments estimator based on the second moment
solves
$$\frac{1}{n} \sum_{i=1}^n X_i^2 = \hat{\lambda} + \hat{\lambda}^2.$$
Solving this equation (by taking the positive root), we find that
$$\hat{\lambda} = -\frac{1}{2} + \left(\frac{1}{4} + \frac{1}{n} \sum_{i=1}^n X_i^2\right)^{1/2}.$$
The two method of moments estimators are different.
For example, for the data
> rpois(10,1)
[1] 2 3 0 1 2 1 3 1 2 1,
the method of moments estimator based on the first moment is
1.6 and the method of moments estimator based on the second
moment is about 1.4105.
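Both estimators can be recomputed directly from the ten counts listed above. The sketch below does so in Python rather than R; the variable names are illustrative.

```python
import math

x = [2, 3, 0, 1, 2, 1, 3, 1, 2, 1]  # the ten counts listed above
n = len(x)

# First-moment estimator: the sample mean.
lam_first = sum(x) / n

# Second-moment estimator: positive root of lam^2 + lam - (1/n) * sum X_i^2 = 0.
m2 = sum(xi ** 2 for xi in x) / n
lam_second = -0.5 + math.sqrt(0.25 + m2)

print(lam_first, round(lam_second, 4))
```

For these counts the sum is 16 and the sum of squares is 34, so the two moment equations give different answers from the same data.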
(2) The method of moments does not use all the information that
is available.
$X_1, \ldots, X_n$ iid Uniform$(0, \theta)$.
The method of moments estimator based on the first moment is
$\hat{\theta} = 2\bar{X}$. If $2\bar{X} < \max X_i$, we know that $\theta \ge \max X_i > \hat{\theta}$.
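A small numerical illustration of this point, using a hypothetical sample (not from the notes) in which one observation is much larger than the rest:

```python
x = [0.1, 0.2, 0.3, 2.0]  # hypothetical Uniform(0, theta) sample

theta_hat = 2 * sum(x) / len(x)   # method of moments estimate, 2 * sample mean
largest = max(x)                  # theta must be at least this large

print(theta_hat, largest, theta_hat < largest)
```

Here the estimate falls below the largest observation, so it is a value of θ under which the observed data could not have occurred.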