LINKÖPINGS UNIVERSITET
Institutionen för datavetenskap
Statistik
732A36 THEORY OF STATISTICS, 6 CDTS
Master’s program in Statistics and Data Mining
Spring semester 2011
Suggested solutions to examples of tasks that may appear in the exam
Task 1
(a) The Beta distribution is part of the exponential family and the density function may be written

f(x; a, b) = e^{(a-1)\log x + (b-1)\log(1-x) - \log B(a,b)}

Thus the likelihood function is

L(x; a, b) = e^{(a-1)\sum \log x_i + (b-1)\sum \log(1-x_i) - n\log B(a,b)}

\Rightarrow T = \left(\sum \log x_i, \sum \log(1-x_i)\right) is minimal sufficient for (a, b).
(b) The log-likelihood function is

l(a; x) = (a-1)\sum \log x_i + \sum \log(1-x_i) - n\log B(a,2)

The first derivative w.r.t. a is

\frac{dl}{da} = \sum \log x_i - n\cdot\frac{\frac{d}{da}\{B(a,2)\}}{B(a,2)}

Since

B(a,2) = \int_0^1 x^{a-1}(1-x)\,dx = \dots = \frac{1}{a(a+1)}

we get

\frac{d}{da}\{B(a,2)\} = \dots = \frac{-2a-1}{a^2(a+1)^2}

which gives us

\frac{dl}{da} = \sum \log x_i + \frac{(2a+1)n}{a(a+1)}

Further we get

\frac{d^2 l}{da^2} = \frac{2na(a+1) - (2a+1)n(2a+1)}{a^2(a+1)^2} = \dots = -\frac{n(2a^2+2a+1)}{a^2(a+1)^2}

which is always negative since a > 0. Thus the equation that gives the ML estimator is

\frac{(2a+1)n}{a(a+1)} = -\sum \log x_i
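The ML equation has no closed form in a, but its left-hand side is strictly decreasing for a > 0 (the second derivative of l is negative), so a one-dimensional bisection finds the root. A minimal Python sketch, using a hypothetical data summary (n = 20, \sum \log x_i = -10, which are illustrative values and not from the exam):

```python
def score_gap(a, n, neg_sum_log_x):
    # LHS minus RHS of the ML equation  n(2a+1)/(a(a+1)) = -sum(log x_i)
    return n * (2 * a + 1) / (a * (a + 1)) - neg_sum_log_x

def solve_a(n, neg_sum_log_x, lo=1e-6, hi=1e6):
    # the LHS is strictly decreasing for a > 0 (d^2 l/da^2 < 0), so bisection is safe
    for _ in range(200):
        mid = (lo + hi) / 2
        if score_gap(mid, n, neg_sum_log_x) > 0:
            lo = mid  # LHS still too large: root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

# hypothetical data summary: n = 20 and sum(log x_i) = -10,
# for which the equation reduces to a^2 - 3a - 2 = 0
a_hat = solve_a(n=20, neg_sum_log_x=10.0)
print(a_hat)  # (3 + sqrt(17))/2 ≈ 3.5616
```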
(c)

I_a = E\left(-\frac{d^2 l}{da^2}\right) = \frac{n(2a^2+2a+1)}{a^2(a+1)^2}

Thus the asymptotic distribution of \hat{a}_{ML} is N\left(\mu = a,\ \sigma = \frac{a(a+1)}{\sqrt{n(2a^2+2a+1)}}\right). Substituting \hat{a}_{ML} = 2.5 for a gives the numerical 95% confidence interval as

2.5 \pm 1.96\cdot\frac{2.5\cdot 3.5}{\sqrt{20\cdot(2\cdot 2.5^2 + 2\cdot 2.5 + 1)}} \Rightarrow 2.5 \pm 0.9
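The numerical interval can be reproduced directly from the information formula. A short sketch (n = 20, as in the solution):

```python
import math

def wald_ci(a_hat, n, z=1.96):
    # asymptotic sd of a_ML from the Fisher information I_a = n(2a^2+2a+1)/(a^2(a+1)^2)
    sd = a_hat * (a_hat + 1) / math.sqrt(n * (2 * a_hat**2 + 2 * a_hat + 1))
    return a_hat - z * sd, a_hat + z * sd

lo, hi = wald_ci(2.5, 20)
print(round(lo, 2), round(hi, 2))  # 1.61 3.39, i.e. 2.5 ± 0.89
```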
(d) The likelihood function can be written

L(a; x) = \left(\prod_{i=1}^{7} f(x_i; a, 2)\right)\cdot\left(P(X > 0.9)\right)^3

Now,

P(X > 0.9) = \frac{\int_{0.9}^{1} x^{a-1}(1-x)\,dx}{B(a,2)}

We also have that B(a, 2) = \int_0^1 x^{a-1}(1-x)\,dx = 1/a - 1/(a+1) = 1/(a(a+1)). Thus, upon simplification,

P(X > 0.9) = \frac{1 - 0.9^a(0.1a + 1)}{a^2(a+1)^2}

and the likelihood function simplifies to

L(a; x) = \frac{1}{a^7(a+1)^7}\cdot\left(\prod_{i=1}^{7} x_i\right)^{a-1}\cdot\left(\prod_{i=1}^{7}(1 - x_i)\right)\cdot\left(\frac{1 - 0.9^a(0.1a + 1)}{a^2(a+1)^2}\right)^3

The log-likelihood function becomes (upon simplification)

l(a; x) = -13\log a - 13\log(a+1) + (a-1)\sum_{1}^{7}\log x_i + \sum_{1}^{7}\log(1 - x_i) + 3\log\left(1 - 0.9^a(0.1a + 1)\right)

with first derivative

\frac{dl}{da} = -\frac{13}{a} - \frac{13}{a+1} + \sum_{1}^{7}\log x_i - 3\cdot\frac{0.9^{a-1}(0.1a^2 + a - 0.09)}{1 - 0.9^a(0.1a + 1)}

Assuming the stationary point defines a maximum, the equation that gives the ML estimate becomes

\frac{13}{a} + \frac{13}{a+1} + 3\cdot\frac{0.9^{a-1}(0.1a^2 + a - 0.09)}{1 - 0.9^a(0.1a + 1)} = \sum_{1}^{7}\log x_i \simeq -3.527
(e) b = 1 \Rightarrow

f(x; a, 1) = \frac{1}{B(a,1)}\cdot x^{a-1} = a x^{a-1}

Thus E(X) = \frac{a}{a+1}, which is why the equation for finding the MM-estimator is

\frac{a}{a+1} = \bar{x}

This gives us

\hat{a}_{MM} = \frac{\bar{x}}{1 - \bar{x}} = \frac{0.6143}{1 - 0.6143} = 1.59
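The arithmetic is a one-liner and easy to check. A sketch:

```python
def a_mm(xbar):
    # method of moments for Beta(a, 1): E(X) = a/(a+1)  =>  a = xbar/(1 - xbar)
    return xbar / (1 - xbar)

print(round(a_mm(0.6143), 2))  # 1.59
```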
Task 2
(a) Use Lemma 2.2 on p. 14 in the textbook. The log-likelihood function with normally distributed data where \mu = 2 is

l(\sigma^2; x) = -\frac{n}{2}\log\sigma^2 - \frac{n}{2}\log(2\pi) - \frac{1}{2\sigma^2}\sum_{1}^{n}(x_i - 2)^2

with first derivative

\frac{dl}{d\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}\sum_{1}^{n}(x_i - 2)^2

and second derivative

\frac{d^2 l}{d(\sigma^2)^2} = \frac{n}{2(\sigma^2)^2} - \frac{1}{(\sigma^2)^3}\sum_{1}^{n}(x_i - 2)^2

The Fisher information is

I_{\sigma^2} = E\left(-\frac{d^2 l}{d(\sigma^2)^2}\right) = -\frac{n}{2(\sigma^2)^2} + \frac{1}{(\sigma^2)^3}\sum_{1}^{n}E(x_i - 2)^2 = -\frac{n}{2(\sigma^2)^2} + \frac{1}{(\sigma^2)^3}\cdot n\sigma^2 = \frac{n}{2(\sigma^2)^2}

and thus the first derivative may be written

\frac{dl}{d\sigma^2} = \frac{n}{2(\sigma^2)^2}\cdot\left(\frac{1}{n}\sum_{1}^{n}(x_i - 2)^2 - \sigma^2\right)

Since \hat{\sigma}^2 = \frac{1}{n}\sum_{1}^{n}(x_i - 2)^2 is unbiased, this shows that this estimator is the one that attains the CR bound, and thus s^2 does not attain it. However, since Var(s^2) = \frac{2\sigma^4}{n-1} and Var(\hat{\sigma}^2) = \frac{2\sigma^4}{n} both vanish when n \to \infty, we may conclude that s^2 will attain the CR bound when n \to \infty. (The variances are easily obtained by using the fact that for normally distributed data (n-1)s^2/\sigma^2 is \chi^2_{n-1}-distributed and n\hat{\sigma}^2/\sigma^2 is \chi^2_n-distributed.)
(b)

MSE(\hat{\sigma}^2) = Var(\hat{\sigma}^2) + Bias(\hat{\sigma}^2)^2 = Var(c(n-1)s^2) + Bias(c(n-1)s^2)^2 = c^2(n-1)\cdot 2\sigma^4 + \left(\sigma^2(c(n-1) - 1)\right)^2 = \dots = c^2\left(2\sigma^4(n-1) + \sigma^4(n-1)^2\right) - c\cdot 2\sigma^4(n-1) + \sigma^4 = \varphi(c)

Minimise:

\frac{d\varphi}{dc} = 2c\sigma^4\left(2(n-1) + (n-1)^2\right) - 2\sigma^4(n-1)

\frac{d\varphi}{dc} = 0 \Rightarrow c = \frac{1}{n+1}

\frac{d^2\varphi}{dc^2} = 2\sigma^4\left(2(n-1) + (n-1)^2\right) > 0

Thus the smallest MSE is obtained when c = 1/(n + 1).
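Since \varphi is an upward parabola in c, the vertex is the global minimum, which is easy to sanity-check numerically. A sketch with \sigma^2 = 1 and a hypothetical n = 10:

```python
def phi(c, n, sigma2=1.0):
    # MSE of c*(n-1)*s^2 as a function of c (sigma^2 enters only as a scale)
    s4 = sigma2 ** 2
    return c * c * s4 * (2 * (n - 1) + (n - 1) ** 2) - 2 * c * s4 * (n - 1) + s4

n = 10  # hypothetical sample size
c_star = 1 / (n + 1)
eps = 1e-4
# phi is an upward parabola in c, so c* = 1/(n+1) beats any nearby c
print(phi(c_star, n) < phi(c_star - eps, n) and phi(c_star, n) < phi(c_star + eps, n))  # True
```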
Task 3
(a) The likelihood function is

L(\lambda; x) = \frac{\lambda^{\sum x_i}}{\prod x_i!}\cdot e^{-n\lambda}

Thus,

\frac{L(\lambda_1; x)}{L(\lambda_0; x)} = \left(\frac{\lambda_1}{\lambda_0}\right)^{\sum x_i}\cdot e^{-n(\lambda_1 - \lambda_0)} \geq A

Taking logarithms:

(\log\lambda_1 - \log\lambda_0)\sum x_i - n(\lambda_1 - \lambda_0) \geq B

With \lambda_1 < \lambda_0 we get the form of the best test as

\sum x_i \leq C = \frac{B + n(\lambda_1 - \lambda_0)}{\log\lambda_1 - \log\lambda_0}
(b) The critical region is \sum x_i \leq C. With size 5% we get the equation P(\sum x_i \leq C \mid \lambda = 2) = 0.05. Now we can use that \sum x_i is Po(n\lambda) (since it is a sum of independent Poisson variates, all with mean \lambda). This gives the following equation for finding the critical region:

\sum_{i=0}^{C}\frac{(2n)^i}{i!}\cdot e^{-2n} = 0.05
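Because \sum x_i is discrete, a size of exactly 5% is generally not attainable; in practice one takes the largest C whose cumulative probability does not exceed 0.05 (or randomises). A sketch that searches for this C, illustrated with a hypothetical n = 3:

```python
import math

def poisson_cdf(c, mu):
    # P(S <= c) for S ~ Po(mu); c = -1 gives an empty sum, i.e. probability 0
    return sum(math.exp(-mu) * mu ** i / math.factorial(i) for i in range(c + 1))

def critical_value(n, lam0=2.0, size=0.05):
    # largest C with P(sum x_i <= C | lambda = lam0) <= size, sum x_i ~ Po(n*lam0)
    mu = n * lam0
    c = -1
    while poisson_cdf(c + 1, mu) <= size:
        c += 1
    return c  # -1 means even {sum x_i <= 0} exceeds the size

print(critical_value(3))  # 1: P(S<=1) ≈ 0.017 <= 0.05 < P(S<=2) ≈ 0.062
```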
(c) No, since the form of the best test is \sum x_i \leq C for \lambda_1 < \lambda_0 but is \sum x_i \geq C for \lambda_0 < \lambda_1.
(d) The test statistic is

\Lambda = \frac{\max_{\lambda\leq 1}\{L(\lambda; x)\}}{\max_{\lambda}\{L(\lambda; x)\}}

We find the maximum values by investigating the log-likelihood: l(\lambda; x) = (\sum x_i)\log\lambda - \log\prod x_i! - n\lambda. This function is maximised for \lambda = 3 and is thus increasing when \lambda approaches 3 from the left. \Rightarrow

\max_{\lambda\leq 1}\{L(\lambda; x)\} = L(1; x) = \frac{1^{\sum x_i}}{\prod x_i!}\cdot e^{-n\cdot 1} = \frac{e^{-3}}{2!\cdot 4!\cdot 3!}

\max_{\lambda}\{L(\lambda; x)\} = L(3; x) = \frac{3^{\sum x_i}}{\prod x_i!}\cdot e^{-n\cdot 3} = \frac{3^9\cdot e^{-9}}{2!\cdot 4!\cdot 3!}

\Rightarrow \Lambda = \frac{e^{-3}}{3^9\cdot e^{-9}} \simeq 0.02

Thus -2\log\Lambda \simeq 7.8, which is compared with the \chi^2_1-distribution to get the P-value, approx. 0.0052.
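The numbers can be verified directly: the factorials cancel in the ratio, and the \chi^2_1 tail probability follows from the normal CDF. A sketch:

```python
import math

def phi_cdf(z):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# data behind the solution: x = (2, 4, 3), so n = 3 and sum x_i = 9;
# the factorials 2!*4!*3! cancel in the likelihood ratio
lam = math.exp(-3) / (3 ** 9 * math.exp(-9))
stat = -2 * math.log(lam)
p_value = 2 * (1 - phi_cdf(math.sqrt(stat)))  # chi2_1 upper-tail probability
print(lam, stat, p_value)
```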
(e) I_\lambda = E\left(-\frac{d^2 l}{d\lambda^2}\right) = \frac{1}{\lambda^2}E\left(\sum x_i\right) = \frac{n}{\lambda} \Rightarrow I_{\hat\lambda_{ML}} = \frac{n}{\hat\lambda_{ML}}. But \hat\lambda_{ML} = \bar{x}, and thus the Wald test is (\bar{x} - 1)\cdot\frac{n}{\bar{x}}\cdot(\bar{x} - 1) \geq C, or equivalently \frac{n(\bar{x} - 1)^2}{\bar{x}} \geq C.
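Assuming the same data as in (d) (\bar{x} = 3, n = 3; this is an assumption, since (e) does not restate the data), the Wald statistic evaluates as follows:

```python
def wald_stat(xbar, n, lam0=1.0):
    # (xbar - lam0) * (n / xbar) * (xbar - lam0) = n * (xbar - lam0)^2 / xbar
    return n * (xbar - lam0) ** 2 / xbar

print(wald_stat(3.0, 3))  # 4.0, to be compared with the chi2_1 critical value 3.84
```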
Task 4
(a) Prior: N(3, 2) \Rightarrow \varphi = 3, \tau = 2. Data: N(\mu, 2) \Rightarrow \sigma = 2. See the textbook on p. 125. The posterior is

N\left(\frac{\varphi\sigma^2 + n\bar{x}\tau^2}{\sigma^2 + n\tau^2},\ \sqrt{\frac{\sigma^2\tau^2}{\sigma^2 + n\tau^2}}\right) = N\left(\frac{3\cdot 4 + n\bar{x}\cdot 4}{4 + n\cdot 4},\ \sqrt{\frac{4\cdot 4}{4 + n\cdot 4}}\right)

Here n = 3 and \bar{x} = 3.4 \Rightarrow the posterior is N(3.3, 1). Under absolute error loss, \hat{\mu}_B is the median of the posterior distribution, which here coincides with the mean, i.e. 3.3.
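The conjugate update is easy to check numerically. A sketch of the posterior computation:

```python
import math

def normal_posterior(phi, tau, sigma, n, xbar):
    # conjugate update: N(phi, tau) prior, N(mu, sigma) data with known sigma
    mean = (phi * sigma ** 2 + n * xbar * tau ** 2) / (sigma ** 2 + n * tau ** 2)
    sd = math.sqrt(sigma ** 2 * tau ** 2 / (sigma ** 2 + n * tau ** 2))
    return mean, sd

print(normal_posterior(phi=3, tau=2, sigma=2, n=3, xbar=3.4))  # mean 3.3, sd 1.0 (up to rounding)
```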
(b) L_S(\mu, \delta_1) = 0 if x < 3 and L_S(\mu, \delta_1) = 6 if x \geq 3, while L_S(\mu, \delta_2) \equiv 0. Thus the risk functions are

R(\mu, \delta_1) = \int_3^{\infty} 6\cdot\frac{1}{2\sqrt{2\pi}}e^{-\frac{1}{8}(x-\mu)^2}\,dx = 6\cdot P(X \geq 3) = 6\cdot\left(1 - \Phi\left(\frac{3-\mu}{2}\right)\right)

R(\mu, \delta_2) \equiv 0

Thus \delta_2 is minimax since it always minimises the risk.

(c) For \delta_1, R_B = \int_{-\infty}^{\infty} 6\cdot\left(1 - \Phi\left(\frac{3-\mu}{2}\right)\right)\cdot\frac{1}{2\sqrt{2\pi}}e^{-\frac{1}{8}(\mu-3)^2}\,d\mu, and for \delta_2, R_B \equiv 0.
Task 5
(a) The posterior q(\pi; x) is proportional to

\binom{50}{20}\pi^{20}(1-\pi)^{30}\cdot\frac{1}{B(2,5)}\pi(1-\pi)^4 \propto \pi^{21}(1-\pi)^{34}

and is thus Beta(22, 35). To find the limits a and b of the credible interval (a, b) we need to solve the equations

\int_a^1 \frac{1}{B(22,35)}\pi^{21}(1-\pi)^{34}\,d\pi = 0.95

\int_b^1 \frac{1}{B(22,35)}\pi^{21}(1-\pi)^{34}\,d\pi = 0.05

By using the result (1-\pi)^{34} = \sum_{k=0}^{34}\binom{34}{k}1^{34-k}(-\pi)^k we come (upon some algebra) to the following defining equations for the two limits:

\sum_{k=0}^{34}\frac{(-1)^k}{22+k}\binom{34}{k}\left(1 - a^{22+k}\right) = 0.95\cdot B(22,35)

\sum_{k=0}^{34}\frac{(-1)^k}{22+k}\binom{34}{k}\left(1 - b^{22+k}\right) = 0.05\cdot B(22,35)
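The alternating binomial sums suffer catastrophic cancellation in floating point (terms of size around 10^8 summing to around 10^{-17}), so for a numerical check it is safer to bisect on the upper-tail probability of the Beta(22, 35) posterior, computed here with Simpson's rule. A sketch:

```python
import math

def beta_kernel(p):
    # unnormalised Beta(22, 35) density: p^21 (1-p)^34
    return p ** 21 * (1 - p) ** 34

B = math.exp(math.lgamma(22) + math.lgamma(35) - math.lgamma(57))  # B(22, 35)

def upper_tail(a, m=2000):
    # P(pi > a) under Beta(22, 35), via Simpson's rule on [a, 1] (m even)
    h = (1 - a) / m
    s = beta_kernel(a) + beta_kernel(1.0)
    for i in range(1, m):
        s += beta_kernel(a + i * h) * (4 if i % 2 else 2)
    return s * h / 3 / B

def solve(target):
    # upper_tail is decreasing in its argument, so bisection applies
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_tail(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

a, b = solve(0.95), solve(0.05)
print(a, b)  # roughly 0.28 and 0.49
```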
(b) Use Theorem 7.3 in the textbook \Rightarrow

B = \lim_{\pi\to 0.4}\frac{q(\pi \mid x, H_1)}{p(\pi \mid H_1)} = \lim_{\pi\to 0.4}\frac{\frac{1}{B(22,35)}\pi^{21}(1-\pi)^{34}}{\frac{1}{B(2,5)}\pi(1-\pi)^4} = \frac{B(2,5)}{B(22,35)}\cdot 0.4^{20}\cdot 0.6^{30}

Now B(2,5) \simeq 0.0333 and B(22,35) \simeq 2.12\cdot 10^{-17} (can be tedious to calculate, though, and not the best illustration of what you're supposed to cope with during the exam), and this gives us B \simeq 3.8. Thus the posterior odds become 3.8\cdot 0.1 = 0.38, or approx. 1 against 2.63.
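The Bayes factor is best computed on the log scale via log-gamma, since the Beta functions involve very large factorials. A sketch:

```python
import math

def log_beta(a, b):
    # log B(a, b) via log-gamma, keeping the huge factorials on the log scale
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

# B = B(2,5)/B(22,35) * 0.4^20 * 0.6^30, computed in logs
log_bf = log_beta(2, 5) - log_beta(22, 35) + 20 * math.log(0.4) + 30 * math.log(0.6)
bayes_factor = math.exp(log_bf)
posterior_odds = bayes_factor * 0.1  # prior odds 0.1, as in the solution
print(round(bayes_factor, 1))  # 3.8
```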
(c)

g(x_{n+1} \mid x) = \int_0^1 f(x_{n+1}; \pi)q(\pi \mid x)\,d\pi = \int_0^1 \pi^{x_{n+1}}(1-\pi)^{1-x_{n+1}}\cdot\frac{1}{B(22,35)}\pi^{21}(1-\pi)^{34}\,d\pi = \frac{B(22 + x_{n+1},\ 36 - x_{n+1})}{B(22,35)}
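With the predictive mass function B(22 + x_{n+1}, 36 - x_{n+1})/B(22, 35), the two possible outcomes reduce to 22/57 (for x_{n+1} = 1) and 35/57 (for x_{n+1} = 0), which sum to one. A sketch:

```python
import math

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def predictive(x_next):
    # g(x_{n+1} | x) = B(22 + x_{n+1}, 36 - x_{n+1}) / B(22, 35), x_next in {0, 1}
    return math.exp(log_beta(22 + x_next, 36 - x_next) - log_beta(22, 35))

print(predictive(1), predictive(0))  # 22/57 and 35/57 (they sum to 1)
```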
Task 6
(a) Ties: in jar 7, Brand 2 has exactly 5 hours longer growth time, so this observation is discarded. This leaves us with 14 jars at our disposal. For x = 9 of these jars, Brand 2 has less than 5 hours longer growth time. H_0 says that the median growth-time difference is less than or equal to 5 hours. Under this hypothesis x should follow a Bi(14, 0.5) distribution. Since 9 is larger than 14/2 = 7, we have no evidence for rejecting H_0.
(b) Compute the differences (Brand 2 − (Brand 1 + 5)). Analogously with (a), this gives a zero difference for jar 7, and that one is discarded. The rank sum for the negative differences is 74, and although there are many ties we compute the z-score for this value to be 1.35. Thus we have no evidence for rejecting H_0 here either.
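The z-score in (b) comes from the normal approximation to the signed-rank sum, with mean n(n+1)/4 and variance n(n+1)(2n+1)/24 (no tie correction, as in the solution). A sketch:

```python
import math

def signed_rank_z(w, n):
    # normal approximation to the Wilcoxon signed-rank sum W over n pairs,
    # without a tie correction
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w - mean) / sd

print(round(signed_rank_z(74, 14), 2))  # 1.35
```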
(c) Well, this is up to you to discuss.