Notes 18 - Wharton Statistics Department

Stat 550 Notes 18
Reading: Chapter 4.4-4.5, 4.7
I. Confidence Bounds, Intervals and Regions (Section 4.4)
A set estimate of $\theta$ is a set $S(X)$ that $\theta$ is estimated to belong to based on the sample $X$.

Example 1: For a sample $X_1, X_2, X_3, X_4$ from a $N(\theta, 1)$ distribution, a set estimator of $\theta$ is $S(X) = [\bar{X} - 1, \bar{X} + 1]$. This means that we assert that $\theta$ is in this interval.
What is gained by using the interval rather than the point estimate, given that the interval is less precise?
By giving up some precision in our estimate (or assertion about $\theta$), we have gained some confidence, or assurance, that our assertion is correct.
Example 1 continued: When we estimate $\theta$ by $\bar{X}$, the probability that we are exactly correct is zero, that is, $P(\theta = \bar{X}) = 0$. However, with a set estimator, we have a positive probability of being correct. The probability that $\theta$ is covered by the interval $[\bar{X} - 1, \bar{X} + 1]$ can be calculated as
$$P(\theta \in [\bar{X} - 1, \bar{X} + 1]) = P(\bar{X} - 1 \le \theta \le \bar{X} + 1)$$
$$= P(-1 \le \bar{X} - \theta \le 1)$$
$$= P\left( \frac{-1}{\sqrt{1/4}} \le \frac{\bar{X} - \theta}{\sqrt{1/4}} \le \frac{1}{\sqrt{1/4}} \right)$$
$$= P(-2 \le Z \le 2)$$
$$= 0.9544,$$
where $Z$ has a standard normal distribution (here $\bar{X} \sim N(\theta, 1/4)$ since $n = 4$).
Thus, we have over a 95% chance of covering the unknown
parameter with our interval estimator. Sacrificing some
precision in our estimate, in moving from a point to an interval,
has resulted in increased confidence that our assertion is correct.
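The coverage calculation above can be checked by simulation. The sketch below (illustrative, not part of the original notes) draws many samples of size 4 from $N(\theta, 1)$ and counts how often $[\bar{X} - 1, \bar{X} + 1]$ covers $\theta$; the particular value $\theta = 2.5$ is arbitrary, since the coverage does not depend on it.

```python
# Monte Carlo check of the coverage probability of [Xbar - 1, Xbar + 1]
# for a sample of size 4 from N(theta, 1); the exact value is
# P(-2 <= Z <= 2) = 0.9544.
import numpy as np

rng = np.random.default_rng(0)
theta = 2.5            # arbitrary true value; coverage does not depend on it
n_reps = 200_000

x = rng.normal(theta, 1.0, size=(n_reps, 4))
xbar = x.mean(axis=1)
covered = (xbar - 1 <= theta) & (theta <= xbar + 1)
print(covered.mean())  # close to 0.9544
```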
The purpose of using a set estimator rather than a point
estimator is to have some guarantee of capturing the parameter
of interest. The certainty of this guarantee is quantified in the
following definitions:
For a set estimator $S(X)$ of $\theta$, the coverage probability of $S(X)$ for $\theta$ is $P_\theta(\theta \in S(X))$, i.e., the coverage probability for $\theta$ is the probability that the random set $S(X)$ contains $\theta$ when the true parameter is $\theta$.

For a set estimator $S(X)$, the confidence coefficient of $S(X)$ is the infimum of the coverage probabilities,
$$\inf_\theta P_\theta(\theta \in S(X)).$$
Typically, we seek to use set estimators that have confidence
coefficients of at least 95%.
A set estimator that has confidence coefficient at least $1 - \alpha$ is called a $1 - \alpha$ confidence set (confidence region).
For a one-dimensional parameter  , we typically seek set
estimators that are intervals,
S ( X )  ( L( X ), U ( X )), L( X )  U ( X ) ;
this type of set estimator is called a confidence interval.
If we are only interested in a lower bound for  , we can
consider a set estimator of the form
S ( X )  ( L( X ), )
and if we are only interested in an upper bound for  , we can
consider a set estimator of the form
S ( X )  (, U ( X )) ;
these types of set estimators are called one-sided confidence
intervals or confidence bounds.
In general, a set estimator along with a confidence coefficient is
called a confidence set.
Example:
CI for mean of normal distribution with known variance:
$X_1, \ldots, X_n$ iid $N(\mu, \sigma^2)$ where $\sigma^2$ is known. Then
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$
Let $z_\alpha = \Phi^{-1}(\alpha)$, where $\Phi$ is the CDF of a standard normal random variable, e.g., $z_{.975} = 1.96$. We have
$$1 - \alpha = P\left( -z_{1-\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{1-\alpha/2} \right)$$
$$= P\left( -z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} \le \bar{X} - \mu \le z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} \right)$$
$$= P\left( \bar{X} - z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} \right).$$
Thus, $\bar{X} \pm z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}}$ is a $(1 - \alpha)$ CI for $\mu$.
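This interval is easy to compute in code. A minimal sketch, assuming scipy is available for the normal quantile $\Phi^{-1}$; the function name `z_interval` is our own, not from the notes.

```python
# (1 - alpha) CI for the mean of N(mu, sigma^2) when sigma is known:
# Xbar +/- z_{1-alpha/2} * sigma / sqrt(n)
import numpy as np
from scipy.stats import norm

def z_interval(x, sigma, alpha=0.05):
    """Known-variance z interval for the mean."""
    xbar = np.mean(x)
    half = norm.ppf(1 - alpha / 2) * sigma / np.sqrt(len(x))
    return xbar - half, xbar + half

rng = np.random.default_rng(1)
x = rng.normal(10.0, 2.0, size=50)
lo, hi = z_interval(x, sigma=2.0)
print(lo, hi)  # half-width is 1.96 * 2 / sqrt(50), about 0.554
```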
Why are confidence sets useful?
(1) Confidence sets are typically used along with a point
estimator to give a sense of the accuracy of the point estimator,
e.g., “It is estimated that the mean volume of the left
hippocampus is 0.20 cm3 smaller for people with schizophrenia
than for people without schizophrenia. A 95% confidence
interval for the difference is 0.07 to 0.33 cm3”
(2) By behaving as if the parameter belongs to the set estimated
by the set estimator when the confidence coefficient is high
(e.g., 95%) we will usually behave reasonably.
Interpretation of confidence intervals
Common textbook interpretation of a 95% confidence interval:
If we repeat the experiment over and over, a 95% confidence
interval will contain the parameter 95% of the time. This is
correct but not particularly useful since we rarely repeat the
same experiment over and over.
More useful interpretation (Wasserman, All of Statistics): On day 1, you collect data and construct a 95 percent confidence interval for a parameter $\theta_1$. On day 2, you collect new data and construct a 95 percent confidence interval for an unrelated parameter $\theta_2$. On day 3, you collect new data and construct a 95 percent confidence interval for an unrelated parameter $\theta_3$. You continue this way, constructing 95 percent confidence intervals for a sequence of unrelated parameters $\theta_1, \theta_2, \ldots$ Then 95 percent of your intervals will trap the true parameter value.
Example: The following are two recent Gallup-CBS polls.
“Are you confident or not confident in Barack Obama’s ability
to be a good president?”:
95% confidence interval for proportion of U.S. adults who are
confident: (0.62, 0.68)
“Do you believe in extrasensory perception, or ESP?”
95% confidence interval for proportion of U.S. adults who
believe in ESP: (0.38, 0.44)
If you form confidence intervals for proportions from polls
every day for the rest of your life, 95 percent of your intervals
will contain the true proportion. This is true even though you
are estimating a different quantity (a different poll question)
every day.
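Wasserman's day-by-day interpretation can be illustrated by simulation (an illustrative sketch, not from the notes): each "day" gets its own, unrelated parameter (drawn here uniformly, an arbitrary modeling choice), yet the long-run fraction of intervals that trap that day's parameter is still about 95%.

```python
# Each day: a different parameter mu_d, a fresh sample of size 25 from
# N(mu_d, 1), and a 95% z interval. Count how often the interval traps
# that day's parameter.
import numpy as np

rng = np.random.default_rng(2)
days, n = 100_000, 25
z = 1.96

mus = rng.uniform(-50, 50, size=days)        # unrelated parameters, one per day
x = rng.normal(mus[:, None], 1.0, size=(days, n))
xbar = x.mean(axis=1)
half = z / np.sqrt(n)
trapped = (xbar - half <= mus) & (mus <= xbar + half)
print(trapped.mean())  # close to 0.95
```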
A confidence interval is not a probability statement about $\theta$:

In a confidence interval, the interval is the random quantity, not the parameter. The fact that a confidence interval is not a probability statement about $\theta$ can be confusing. Let $\theta$ be a fixed, known real number and let $X_1, X_2$ be iid random variables such that $P(X_i = 1) = P(X_i = -1) = 1/2$. Now define $Y_i = \theta + X_i$ and suppose we only observe $Y_1, Y_2$. Define the following "confidence interval," which actually contains only one point:
$$C = \begin{cases} \{Y_1 - 1\} & \text{if } Y_1 = Y_2 \\ \{(Y_1 + Y_2)/2\} & \text{if } Y_1 \ne Y_2 \end{cases}$$
No matter what $\theta$ is, we have $P(\theta \in C) = 3/4$, so this is a 75 percent confidence interval. Suppose we now do the experiment and we get $Y_1 = 15$ and $Y_2 = 17$. Then our 75 percent confidence interval is $\{16\}$. However, we are certain that $\theta$ is 16.
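A quick simulation (illustrative sketch) confirms the 3/4 coverage of this one-point "interval": the estimator hits $\theta$ exactly whenever $Y_1 \ne Y_2$ (probability 1/2) and also when both $X_i = 1$ (probability 1/4).

```python
# Simulate the one-point "75 percent confidence interval" C:
#   C = {Y1 - 1}        if Y1 == Y2
#   C = {(Y1 + Y2)/2}   if Y1 != Y2
import numpy as np

rng = np.random.default_rng(3)
theta = 16.0
n_reps = 200_000

x = rng.choice([-1.0, 1.0], size=(n_reps, 2))
y = theta + x
same = y[:, 0] == y[:, 1]
c = np.where(same, y[:, 0] - 1.0, y.mean(axis=1))  # the single point in C
print((c == theta).mean())  # close to 0.75
```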
II. The Duality Between Confidence Sets and Tests (Section
4.5)
Confidence sets and hypothesis tests ask the same question, but
from a slightly different perspective. Both procedures look for
consistency between sample statistics and population
parameters. The hypothesis test fixes the parameter and asks
what sample values (the acceptance region) are consistent with
that fixed value of the parameter. The confidence set fixes the sample value and asks what parameter values (the confidence set) make this sample value most plausible.
We show that a $1 - \alpha$ confidence set can be obtained by "inverting" a family of level $\alpha$ tests.

Duality Theorem: For each $\theta_0 \in \Theta$, let $A(\theta_0)$ be the acceptance region of a level $\alpha$ test of $H_0: \theta = \theta_0$. For each possible sample data $x$, define a set $S(x)$ in the parameter space $\Theta$ by
$$S(x) = \{\theta_0 : x \in A(\theta_0)\} \quad (0.1)$$
Then the random set $S(X)$ is a $1 - \alpha$ confidence set. Conversely, let $S(X)$ be a $1 - \alpha$ confidence set. For any $\theta_0 \in \Theta$, define
$$A(\theta_0) = \{x : \theta_0 \in S(x)\}.$$
Then $A(\theta_0)$ is the acceptance region of a level $\alpha$ test of $H_0: \theta = \theta_0$.
Proof: For the first part, since $A(\theta_0)$ is the acceptance region of a level $\alpha$ test,
$$P_{\theta_0}(X \notin A(\theta_0)) \le \alpha \text{ and hence } P_{\theta_0}(X \in A(\theta_0)) \ge 1 - \alpha.$$
Since $\theta_0$ is arbitrary, we can write
$$P_\theta(X \in A(\theta)) \ge 1 - \alpha \text{ for all } \theta \in \Theta.$$
The above inequality together with (0.1) shows that the coverage probability of the set $S(X)$ is given by
$$P_\theta(\theta \in S(X)) = P_\theta(X \in A(\theta)) \ge 1 - \alpha \text{ for all } \theta \in \Theta,$$
showing that $S(X)$ is a $1 - \alpha$ confidence set.

For the converse, the Type I error probability for the test of $H_0: \theta = \theta_0$ with acceptance region $A(\theta_0)$ is
$$P_{\theta_0}(X \notin A(\theta_0)) = P_{\theta_0}(\theta_0 \notin S(X)) \le \alpha,$$
so this is a level $\alpha$ test.
Note: The shape of the confidence interval is determined by the type of test that is inverted. To find two-sided confidence intervals, invert two-sided tests of $H_0: \theta = \theta_0$ vs. $H_1: \theta \ne \theta_0$. To find one-sided confidence intervals, invert one-sided tests.
Example: Suppose $X_1, \ldots, X_n$ iid $N(\mu, 1)$ and we want to find a 95% one-sided confidence interval that gives an upper bound for $\mu$. To do this, we can invert a test of $H_0: \mu = \mu_0$ vs. $H_1: \mu < \mu_0$. The acceptance region of the UMP level 0.05 test is
$$\frac{\sum_{i=1}^n (X_i - \mu_0)}{\sqrt{n}} \ge \Phi^{-1}(0.05);$$
see Notes 17 (note: in Notes 17, I mistakenly wrote $\Phi(0.05)$ instead of $\Phi^{-1}(0.05)$). Thus a 95% confidence interval is
$$\left\{ \mu_0 : \frac{\sum_{i=1}^n (X_i - \mu_0)}{\sqrt{n}} \ge \Phi^{-1}(0.05) \right\} = \left\{ \mu_0 : \mu_0 \le \frac{\sum_{i=1}^n X_i}{n} - \frac{\Phi^{-1}(0.05)}{\sqrt{n}} \right\}.$$
Since $\Phi^{-1}(0.05) = -1.645$, this is the interval $(-\infty, \bar{X} + 1.645/\sqrt{n}]$.
If we wanted a 95% one-sided confidence interval for $\mu$ that gives a lower bound, we can invert the UMP test of $H_0: \mu = \mu_0$ vs. $H_1: \mu > \mu_0$. Thus, a 95% confidence interval is
$$\left\{ \mu_0 : \frac{\sum_{i=1}^n (X_i - \mu_0)}{\sqrt{n}} \le \Phi^{-1}(0.95) \right\} = \left\{ \mu_0 : \mu_0 \ge \frac{\sum_{i=1}^n X_i}{n} - \frac{\Phi^{-1}(0.95)}{\sqrt{n}} \right\},$$
i.e., the interval $[\bar{X} - 1.645/\sqrt{n}, \infty)$.
Suppose we wanted a two-sided 95% confidence interval for $\mu$. We want to invert a two-sided test of $H_0: \mu = \mu_0$ vs. $H_1: \mu \ne \mu_0$. The z test (Moore and McCabe, Introduction to the Practice of Statistics) has rejection region
$$C(x) = \left\{ x : \frac{\sum_{i=1}^n (X_i - \mu_0)}{\sqrt{n}} \ge \Phi^{-1}(0.975) \text{ or } \frac{\sum_{i=1}^n (X_i - \mu_0)}{\sqrt{n}} \le \Phi^{-1}(0.025) \right\}.$$
This is a size 0.05 test. Inverting this test gives a 95% confidence interval of
$$\left\{ \mu_0 : \Phi^{-1}(0.025) \le \frac{\sum_{i=1}^n (X_i - \mu_0)}{\sqrt{n}} \le \Phi^{-1}(0.975) \right\} = \left\{ \mu_0 : \frac{\sum_{i=1}^n X_i}{n} - \frac{\Phi^{-1}(0.975)}{\sqrt{n}} \le \mu_0 \le \frac{\sum_{i=1}^n X_i}{n} - \frac{\Phi^{-1}(0.025)}{\sqrt{n}} \right\}.$$
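The duality can be checked numerically (illustrative sketch, unit variance as in the example): accept or reject the size-0.05 two-sided z test over a grid of $\mu_0$ values, and compare the set of accepted $\mu_0$'s with the direct interval $\bar{X} \pm \Phi^{-1}(0.975)/\sqrt{n}$. The grid resolution is an arbitrary choice.

```python
# Invert the two-sided z test: the accepted mu0 values should form
# (up to grid resolution) the interval Xbar +/- Phi^{-1}(0.975)/sqrt(n).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
n = 30
x = rng.normal(1.0, 1.0, size=n)
xbar = x.mean()

z_hi = norm.ppf(0.975)
grid = np.linspace(xbar - 2.0, xbar + 2.0, 100_001)
stat = np.sqrt(n) * (xbar - grid)   # equals sum(x_i - mu0)/sqrt(n)
accepted = grid[np.abs(stat) <= z_hi]

print(accepted.min(), accepted.max())                      # inverted-test interval
print(xbar - z_hi / np.sqrt(n), xbar + z_hi / np.sqrt(n))  # direct z interval
```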
III. Bayesian Formulation of Set Estimators (Chapter 4.7)
In the frequentist approach to set estimation, we seek to find a
random set that has a high guaranteed probability (confidence
coefficient) of containing the true parameter in repeated samples
from the true distribution.
In the Bayesian approach to set estimation, we seek to find a set
that has a high posterior probability of containing the true
parameter. In the Bayesian approach, it is meaningful to talk
about the probability that the true parameter belongs to the set.
Example 1: The sponsors of television shows targeted at the
children’s market wanted to know the amount of time children
spend watching television, since the types and numbers of
programs and commercials are greatly influenced by this
information. As a result, it was decided to survey 100 American
children and ask them to keep track of the number of hours of
television they watch each week. From past experience, it is
known that the population standard deviation of the weekly
amount of television watched is $\sigma = 8.0$ hours and that the
weekly amount of television watched is normally distributed.
The television sponsors want a 95% confidence interval for the
mean amount of television watched by children.
[JMP output: Distribution of Time]
Moments: Mean 27.191, Std Dev 8.373, Std Err Mean 0.837, upper 95% Mean 28.852, lower 95% Mean 25.530, N 100.
95% Confidence Interval for $\mu$:
$$\bar{X} \pm z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} = 27.191 \pm 1.96 \frac{8}{\sqrt{100}} = [25.62, 28.76]$$
It is tempting to say (and many experimenters do) that "the probability is 95% that $\mu$ is in the interval [25.62, 28.76]." However, within the frequentist paradigm, such a statement is invalid since the parameter is assumed fixed. Formally, the interval [25.62, 28.76] is one of the possible realized values of the random interval $\bar{X} \pm z_{1-\alpha/2} \sigma/\sqrt{n}$, and since the parameter $\mu$ does not move, $\mu$ is in the realized interval [25.62, 28.76] with probability either 0 or 1. When we say that the interval [25.62, 28.76] has a 95% chance of coverage, we only mean that we know that 95% of the time the random interval $\bar{X} \pm z_{1-\alpha/2} \sigma/\sqrt{n}$ will contain the true $\mu$ in repeated trials of the experiment.
In contrast, the Bayesian setup allows us to say that $\mu$ is inside [25.62, 28.76] with some (subjective) probability, not 0 or 1. This is because, under the Bayesian setup, $\mu$ is a random variable with a probability distribution.
A Bayesian set estimator is called a credible set (region), and the credible probability of the set estimator is the posterior probability that the set contains the true parameter:
$$P(\theta \in C(x) \mid X = x).$$
A set estimator $C(x)$ is said to be a $(1 - \alpha)$ credible set if the credible probability of the set is $\ge (1 - \alpha)$, i.e.,
$$P(\theta \in C(x) \mid X = x) \ge 1 - \alpha.$$

A natural approach to constructing a $(1 - \alpha)$ credible set is to consider the collection of $\theta$ that are most likely under the posterior distribution $p(\theta \mid x)$: $C_k = \{\theta : p(\theta \mid x) \ge k\}$, where $k$ is chosen so that $P(\theta \in C_k \mid X = x) = 1 - \alpha$. Such a credible set is called the highest posterior density (HPD) set.

The shape of an HPD region is determined by the shape of the posterior distribution. When $\theta$ is 1-dimensional and the posterior distribution is unimodal, an HPD set will be an interval.
Example 1 continued: Suppose $X_1, \ldots, X_n$ iid $N(\mu, \sigma_0^2)$ with $\mu$ unknown and $\sigma_0^2$ known. Suppose we take a Bayesian approach and have a prior distribution $\mu \sim N(\mu_0, \tau_0^2)$ with $\mu_0, \tau_0^2$ known. From Notes 3, the posterior distribution $p(\mu \mid X_1, \ldots, X_n)$ is
$$N\left( \frac{\frac{n\bar{X}}{\sigma_0^2} + \frac{\mu_0}{\tau_0^2}}{\frac{n}{\sigma_0^2} + \frac{1}{\tau_0^2}}, \; \frac{1}{\frac{n}{\sigma_0^2} + \frac{1}{\tau_0^2}} \right).$$
A HPD set that has credible probability $(1 - \alpha)$ is
$$\left[ \frac{\frac{n\bar{X}}{\sigma_0^2} + \frac{\mu_0}{\tau_0^2}}{\frac{n}{\sigma_0^2} + \frac{1}{\tau_0^2}} - z_{1-\alpha/2} \sqrt{\frac{1}{\frac{n}{\sigma_0^2} + \frac{1}{\tau_0^2}}}, \;\; \frac{\frac{n\bar{X}}{\sigma_0^2} + \frac{\mu_0}{\tau_0^2}}{\frac{n}{\sigma_0^2} + \frac{1}{\tau_0^2}} + z_{1-\alpha/2} \sqrt{\frac{1}{\frac{n}{\sigma_0^2} + \frac{1}{\tau_0^2}}} \right].$$
For the data in Example 1, if we have a prior for $\mu$ of $N(30, 100)$, then the HPD set with credible probability 95% is
$$\frac{\frac{100 \cdot 27.19}{64} + \frac{30}{100}}{\frac{100}{64} + \frac{1}{100}} \pm 1.96 \sqrt{\frac{1}{\frac{100}{64} + \frac{1}{100}}} = [25.645, 28.771].$$
This is slightly different than the 95% confidence interval of [25.623, 28.759].
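A short sketch reproducing both intervals in Example 1 from the formulas above (the values $n = 100$, $\bar{X} = 27.191$, $\sigma_0^2 = 64$, and prior $N(30, 100)$ are taken from the example):

```python
# Frequentist z interval vs. 95% HPD credible interval from the
# normal-normal posterior, for the television-watching data.
import numpy as np

n, xbar, sigma2 = 100, 27.191, 64.0
mu0, tau2 = 30.0, 100.0
z = 1.96

# frequentist interval: Xbar +/- z * sigma / sqrt(n)
half = z * np.sqrt(sigma2 / n)
print(xbar - half, xbar + half)           # about [25.62, 28.76]

# posterior is N(post_mean, post_var)
prec = n / sigma2 + 1 / tau2              # posterior precision
post_mean = (n * xbar / sigma2 + mu0 / tau2) / prec
post_var = 1 / prec
print(post_mean - z * np.sqrt(post_var),
      post_mean + z * np.sqrt(post_var))  # about [25.645, 28.771]
```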
We now examine how the 95% credible set performs as a
confidence interval and how the 95% confidence interval
performs as a credible set.
We first calculate the coverage probability of the 95% HPD credible set. Let
$$E(\mu \mid X) = \frac{\frac{n\bar{X}}{\sigma_0^2} + \frac{\mu_0}{\tau_0^2}}{\frac{n}{\sigma_0^2} + \frac{1}{\tau_0^2}} = \frac{\frac{n}{\sigma_0^2}}{\frac{n}{\sigma_0^2} + \frac{1}{\tau_0^2}} \bar{X} + \frac{\frac{1}{\tau_0^2}}{\frac{n}{\sigma_0^2} + \frac{1}{\tau_0^2}} \mu_0,$$
$$Var(\mu \mid X) = \frac{1}{\frac{n}{\sigma_0^2} + \frac{1}{\tau_0^2}} = \frac{\sigma_0^2 \tau_0^2}{\sigma_0^2 + n\tau_0^2}.$$
The posterior distribution for $\mu \mid X$ is $N(E(\mu \mid X), Var(\mu \mid X))$. Let $\gamma = \sigma_0^2 / (n\tau_0^2)$.
The coverage probability of the 95% HPD credible set is
$$P_\mu\left( |\mu - E(\mu \mid X)| \le 1.96\sqrt{Var(\mu \mid X)} \right)$$
$$= P_\mu\left( E(\mu \mid X) - 1.96\sqrt{Var(\mu \mid X)} \le \mu \le E(\mu \mid X) + 1.96\sqrt{Var(\mu \mid X)} \right).$$
Writing $E(\mu \mid X) = \frac{\bar{X} + \gamma \mu_0}{1 + \gamma}$ and solving the inequalities for $\bar{X}$, this equals
$$P_\mu\left( (1+\gamma)\mu - \gamma\mu_0 - 1.96(1+\gamma)\sqrt{Var(\mu \mid X)} \le \bar{X} \le (1+\gamma)\mu - \gamma\mu_0 + 1.96(1+\gamma)\sqrt{Var(\mu \mid X)} \right).$$
Under the true mean $\mu$, $\bar{X} \sim N(\mu, \sigma_0^2/n)$, and since $Var(\mu \mid X) = \frac{\sigma_0^2/n}{1+\gamma}$ so that $(1+\gamma)\sqrt{Var(\mu \mid X)} = \frac{\sigma_0}{\sqrt{n}}\sqrt{1+\gamma}$, standardizing with $Z = \frac{\bar{X} - \mu}{\sigma_0/\sqrt{n}}$ gives the coverage probability
$$\Phi\left( \frac{\gamma(\mu - \mu_0)\sqrt{n}}{\sigma_0} + 1.96\sqrt{1+\gamma} \right) - \Phi\left( \frac{\gamma(\mu - \mu_0)\sqrt{n}}{\sigma_0} - 1.96\sqrt{1+\gamma} \right).$$
We can see that if the true mean $\mu$ is sufficiently greater than the prior mean $\mu_0$ so that
$$(\mu - \mu_0) \frac{\frac{1}{\tau_0^2}}{\frac{n}{\sigma_0^2} + \frac{1}{\tau_0^2}} \ge 1.96\sqrt{Var(\mu \mid X)},$$
then the coverage probability will be less than 0.5, and in fact we can find $\mu$ large enough such that the coverage probability is arbitrarily close to zero. Thus, the confidence coefficient of the credible set is zero.
Similar calculations show that the 95% confidence interval has credible probability tending to zero for some true $\mu$.

This example points out that credible intervals do not in general have good confidence coefficients, and confidence intervals have poor credible probabilities for certain true values of the parameter. That being said, the confidence interval and credible interval are often fairly similar, as in Example 1, and as $n \to \infty$, the ratios of the upper and lower bounds of the confidence interval and credible interval in Example 1 converge to 1 (i.e., in some sense, the credible interval and confidence interval are asymptotically equivalent).
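The collapse of the coverage probability can be seen numerically. The sketch below evaluates the closed-form coverage $\Phi(a + c) - \Phi(a - c)$ derived above at increasingly distant true means, using the Example 1 values ($n = 100$, $\sigma_0 = 8$, $\tau_0^2 = 100$, $\mu_0 = 30$; the probed $\mu$ values are arbitrary illustrations):

```python
# Coverage probability of the 95% HPD credible set as a function of the
# true mean mu:
#   Phi(a + c) - Phi(a - c),
#   a = gamma * (mu - mu0) * sqrt(n) / sigma0,
#   c = 1.96 * sqrt(1 + gamma),  gamma = sigma0^2 / (n * tau0^2).
# Coverage tends to 0 as |mu - mu0| grows, so the confidence coefficient
# of the credible set is 0.
import numpy as np
from scipy.stats import norm

n, sigma0, tau0sq, mu0 = 100, 8.0, 100.0, 30.0
gamma = sigma0**2 / (n * tau0sq)
c = 1.96 * np.sqrt(1 + gamma)

def coverage(mu):
    a = gamma * (mu - mu0) * np.sqrt(n) / sigma0
    return norm.cdf(a + c) - norm.cdf(a - c)

for mu in [30, 60, 300, 3000]:
    print(mu, coverage(mu))   # near 0.95 at mu = mu0, then decays toward 0
```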