C22.0015 / B90.3302
NOTES for Wednesday 2011.APR.06
Here’s a fun maximum likelihood estimation problem. Suppose that $X_1, X_2, \ldots, X_n$ is a sample from the density $f(x \mid \theta) = \frac{1}{2} e^{-|x - \theta|}$. What’s the maximum likelihood estimate?
The likelihood is

$$L = \frac{1}{2^n} \, e^{-\sum_{i=1}^{n} |x_i - \theta|}$$
The log likelihood is

$$\log L = -n \log 2 - \sum_{i=1}^{n} |x_i - \theta|$$
Can we ever succeed in differentiating this?
Maybe, but perhaps we don’t need to.
Let’s suppose that the xi’s have been sorted into increasing order x1 < x2 < … < xn and
that there are no ties.
The function $\sum_{i=1}^{n} |x_i - \theta|$ is a polygonal line in $\theta$. If $x_j < \theta < x_{j+1}$, then

$$\sum_{i=1}^{n} |x_i - \theta| = \sum_{i=1}^{j} |x_i - \theta| + \sum_{i=j+1}^{n} |x_i - \theta| = \sum_{i=1}^{j} (\theta - x_i) + \sum_{i=j+1}^{n} (x_i - \theta)$$

The derivative $\frac{d}{d\theta}$ is $j - (n - j) = 2j - n$. If n is even, this can be made zero by using $j = \frac{n}{2}$. This would correspond to using $\theta$ in the interval which defines the median value. If n is odd, you can minimize the sum by selecting $\theta$ as the median.
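The median argument can be checked numerically. The following is a minimal sketch, not part of the handout: the data values and the helper names `sum_abs_dev` and `slope_between` are made up for illustration. It compares the sum of absolute deviations over a grid of candidate values of θ and lists the slopes 2j - n on each interval between sorted observations.

```python
import statistics


def sum_abs_dev(xs, theta):
    """Sum of |x_i - theta|; log L equals -n*log(2) minus this sum."""
    return sum(abs(x - theta) for x in xs)


def slope_between(xs, j):
    """Slope of the sum on the interval (x_j, x_{j+1}) of sorted data: 2j - n."""
    return 2 * j - len(xs)


# Hypothetical data, chosen only for illustration (n = 5, so n is odd).
xs = sorted([2.0, 3.5, 7.0, 8.0, 15.0])
med = statistics.median(xs)

# Candidate values 0.0, 0.1, ..., 20.0; pick the one with the smallest sum.
grid = [k / 10 for k in range(0, 201)]
best = min(grid, key=lambda t: sum_abs_dev(xs, t))

print("median:", med, "grid minimizer:", best)
print("slopes 2j - n:", [slope_between(xs, j) for j in range(len(xs) + 1)])
```

With n odd the slope is never zero; it changes sign at the middle observation, which is where the grid search lands.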
We need to talk about the notion of confidence intervals. Please see the detailed handout on this topic. The information presented here gives numeric examples related to that handout.
Example 1 from this handout, with actual numbers. (It helps to lay these out on a number line.) Suppose that you have a sample of 28 values from a normal population with mean μ and standard deviation σ. The sample has an average of 2,840 and a standard deviation of 418. The 95% confidence interval for μ is $\bar{x} \pm t_{0.025;\,27} \frac{s}{\sqrt{n}}$. This is 2,840 ± 2.0518 × $\frac{418}{\sqrt{28}}$, or about 2,840 ± 162.08. This interval is (2,677.92 , 3,002.08).
The 99% confidence interval uses $t_{0.005;\,27}$ = 2.7707. It is 2,840 ± 2.7707 × $\frac{418}{\sqrt{28}}$, which is 2,840 ± 218.87. It’s quite a bit longer, but that’s the cost of greater confidence.
The one-sided 95% interval to infinity starts at $\bar{x} - t_{0.05;\,27} \frac{s}{\sqrt{n}}$. This is 2,840 - 1.7032 × $\frac{418}{\sqrt{28}}$ ≈ 2,840 - 134.54 = 2,705.46. The interval is (2,705.46 , ∞). Observe that it starts below the value for $\bar{x}$.
The one-sided 95% interval from -∞ ends at $\bar{x} + t_{0.05;\,27} \frac{s}{\sqrt{n}}$. This is 2,840 + 1.7032 × $\frac{418}{\sqrt{28}}$ ≈ 2,840 + 134.54 = 2,974.54. The interval is (-∞ , 2,974.54). This interval ends above the value for $\bar{x}$.
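The arithmetic for the intervals in Example 1 can be reproduced with a few lines of Python. This is a sketch only: the function name `t_margin` is hypothetical, and the critical values are typed in from the notes (27 degrees of freedom) rather than computed from a t distribution.

```python
import math


def t_margin(s, n, t_crit):
    """Margin t_crit * s / sqrt(n) used in every interval of Example 1."""
    return t_crit * s / math.sqrt(n)


xbar, s, n = 2840, 418, 28

# Table values quoted in the notes for 27 degrees of freedom.
m95 = t_margin(s, n, 2.0518)   # two-sided 95%
m99 = t_margin(s, n, 2.7707)   # two-sided 99%
m1s = t_margin(s, n, 1.7032)   # one-sided 95%

print("95% two-sided:", round(xbar - m95, 2), round(xbar + m95, 2))
print("99% two-sided:", round(xbar - m99, 2), round(xbar + m99, 2))
print("95% lower bound:", round(xbar - m1s, 2))
print("95% upper bound:", round(xbar + m1s, 2))
```

Only the critical value changes from one interval to the next; the factor s/√n is common to all of them.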
You get to pick your confidence interval strategy once. You don’t get to play around with lots of possibilities, as we’ve done here for illustrative purposes.
Example 2 from this handout, with actual numbers. Suppose that, with the data above, you wanted a 95% confidence interval for σ², and you wanted this interval one-sided from zero to an upper bound. This upper bound is $\frac{(n-1)s^2}{\chi^{2;\,1-\alpha}_{n-1}} = \frac{27 \times 418^2}{16.1514}$ ≈ 292,082.9154. We are 95% confident that σ² is at most 292,082.9154. A simple square root says that we’re 95% confident that σ is at most 540.45. Note that this upper limit is above the data value s = 418.
The one-sided interval to infinity has lower limit $\frac{(n-1)s^2}{\chi^{2;\,\alpha}_{n-1}} = \frac{27 \times 418^2}{40.1133}$ ≈ 117,605.5822. We’re 95% confident that σ² exceeds this. We’re 95% confident that σ exceeds 342.94. This lower limit is below the data value s = 418.

The two-sided 95% interval would probably be done as $\left( \frac{(n-1)s^2}{\chi^{2;\,\alpha/2}_{n-1}} \, , \, \frac{(n-1)s^2}{\chi^{2;\,1-\alpha/2}_{n-1}} \right)$. This interval is $\left( \frac{27 \times 418^2}{43.1945} \, , \, \frac{27 \times 418^2}{14.5734} \right)$. This is (109,216.4049 , 323,709.4981). By taking square roots, we get the interval for σ as (330.48, 568.95). This is not going to be the shortest interval.
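The limits in Example 2 all come from the same quotient, (n - 1)s² divided by a chi-squared percent point. The sketch below, with a hypothetical helper name `variance_limit`, redoes the arithmetic using the chi-squared values quoted in the notes for 27 degrees of freedom; it does not compute those percent points itself.

```python
import math


def variance_limit(s, n, chi2_point):
    """Return ((n-1)s^2 / chi2_point, its square root)."""
    var_limit = (n - 1) * s ** 2 / chi2_point
    return var_limit, math.sqrt(var_limit)


s, n = 418, 28

# Chi-squared percent points as quoted in the notes (27 degrees of freedom).
upper_one_sided = variance_limit(s, n, 16.1514)
lower_one_sided = variance_limit(s, n, 40.1133)
two_sided_low = variance_limit(s, n, 43.1945)
two_sided_high = variance_limit(s, n, 14.5734)

for label, (v, sd) in [
    ("one-sided upper", upper_one_sided),
    ("one-sided lower", lower_one_sided),
    ("two-sided low", two_sided_low),
    ("two-sided high", two_sided_high),
]:
    print(f"{label}: variance {v:,.4f}, sd {sd:.2f}")
```

Note that a larger percent point in the denominator gives a smaller limit, which is why the upper 5% point (40.1133) produces the lower bound.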
Example 3 from this handout with actual numbers. Suppose that you have two samples of values with these summaries:

Sample    Size    Average    Standard deviation
A         18      310        47
B         22      355        51
We want the 95% confidence interval for the difference between population means. Begin by finding the pooled standard deviation. Its value is 49.2507. The confidence interval is $(\bar{X} - \bar{Y}) \pm t^{\alpha/2}_{m+n-2} \, s_p \sqrt{\frac{m+n}{mn}}$, and this works out to (-76.7, -13.3). Note that this excludes zero, corresponding to a significant difference between the sample means. The other interesting thing here is $t^{\alpha/2}_{m+n-2} = t^{0.025}_{38}$ = 2.0244.
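The pooled standard deviation and the interval for the difference in means can be sketched as follows. The function names `pooled_sd` and `diff_interval` are hypothetical, the usual pooled-variance formula is assumed (weights m - 1 and n - 1), and the critical value 2.0244 is the one quoted in the notes for 38 degrees of freedom.

```python
import math


def pooled_sd(s_a, m, s_b, n):
    """Pooled standard deviation with m + n - 2 degrees of freedom."""
    return math.sqrt(((m - 1) * s_a ** 2 + (n - 1) * s_b ** 2) / (m + n - 2))


def diff_interval(xbar, ybar, s_a, m, s_b, n, t_crit):
    """Two-sided interval for the difference of population means."""
    half = t_crit * pooled_sd(s_a, m, s_b, n) * math.sqrt((m + n) / (m * n))
    diff = xbar - ybar
    return diff - half, diff + half


sp = pooled_sd(47, 18, 51, 22)
low, high = diff_interval(310, 355, 47, 18, 51, 22, 2.0244)

print("pooled sd:", round(sp, 4))
print("95% interval:", round(low, 1), round(high, 1))
```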
Example 4 from this handout with actual numbers. Using the values above, let’s say you wanted a 95% one-sided interval, zero to an upper limit, for $\sigma_A^2 / \sigma_B^2$. Let’s agree that m = 18 (sample size for $s_A$) and n = 22 (sample size for $s_B$). This upper limit is $\frac{s_A^2}{s_B^2} F^{\alpha}_{n-1,\,m-1}$, which here uses $F^{\alpha}_{n-1,\,m-1} = F^{0.05}_{21,17}$ = 2.2189. The upper limit is $\left( \frac{47}{51} \right)^2 \times 2.2189 \approx 1.8845$.
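The upper limit for the variance ratio is a one-line computation. In this sketch the function name `variance_ratio_upper` is hypothetical and the F percent point 2.2189 is taken from the notes rather than computed.

```python
def variance_ratio_upper(s_a, s_b, f_point):
    """Upper limit (s_A^2 / s_B^2) * F for the ratio sigma_A^2 / sigma_B^2."""
    return (s_a / s_b) ** 2 * f_point


# F percent point with (21, 17) degrees of freedom, as quoted in the notes.
upper = variance_ratio_upper(47, 51, 2.2189)
print("upper limit for the variance ratio:", round(upper, 4))
```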
Now . . . suppose that you had phrased this as getting a lower limit for $\sigma_B^2 / \sigma_A^2$. This lower limit is $\frac{s_B^2}{s_A^2} F^{1-\alpha}_{m-1,\,n-1}$, here using $F^{1-\alpha}_{m-1,\,n-1} = F^{0.95}_{17,21}$ = 0.4507. The lower limit is $\left( \frac{51}{47} \right)^2 \times 0.4507 \approx 0.5306$.

Two observations:

(1) $\frac{1}{0.5306} \approx 1.8845$, apart from rounding, so these are really the same interval viewed in two different ways.

(2) $F^{0.95}_{17,21} = \frac{1}{F^{0.05}_{21,17}} = \frac{1}{2.2189} \approx 0.4507$. If you lack a table of lower percent points, you can still get the values from upper percent point tables.
All the usual notes apply again. We don’t get to play around with one-sided versus two-sided, and we don’t get to play games with the confidence level.
Example 5 from this handout. There’s a separate handout on Fieller’s result.
Example 6 from this handout. It’s the simple binomial confidence interval. You’ve seen
this many times.
Example 7 from this handout. This has accompanying numbers.
Our final issue here is that of the relevance of hypothesis testing. Is it really just a game?
There’s a nice handout on this. This was a Science News article, March 27, 2010. The
article is by Tom Siegfried, pp 26-29.