LINKÖPINGS UNIVERSITET
Institutionen för datavetenskap, Statistik, ANd
732A36 THEORY OF STATISTICS, 6 ECTS
Master's program in Statistics and Data Mining
Spring semester 2011

Suggested solutions to examples of tasks that may appear in the exam

Task 1

(a) The Beta distribution belongs to the exponential family, and its density function may be written

$$f(x; a, b) = e^{(a-1)\log x + (b-1)\log(1-x) - \log B(a,b)}$$

Thus the likelihood function is

$$L(\mathbf{x}; a, b) = e^{(a-1)\sum \log x_i + (b-1)\sum \log(1-x_i) - n\log B(a,b)}$$

so $T = \left(\sum \log x_i,\ \sum \log(1-x_i)\right)$ is minimal sufficient for $(a, b)$.

(b) With $b = 2$ the log-likelihood function is

$$l(a; \mathbf{x}) = (a-1)\sum \log x_i + \sum \log(1-x_i) - n\log B(a,2)$$

Its first derivative with respect to $a$ is

$$\frac{dl}{da} = \sum \log x_i - n\cdot\frac{\frac{d}{da}B(a,2)}{B(a,2)}$$

Since

$$B(a,2) = \int_0^1 x^{a-1}(1-x)\,dx = \frac{1}{a} - \frac{1}{a+1} = \frac{1}{a(a+1)}$$

we get

$$\frac{d}{da}B(a,2) = -\frac{2a+1}{a^2(a+1)^2}$$

which gives

$$\frac{dl}{da} = \sum \log x_i + \frac{(2a+1)n}{a(a+1)}$$

Further,

$$\frac{d^2l}{da^2} = n\cdot\frac{2a(a+1) - (2a+1)^2}{a^2(a+1)^2} = -\frac{n(2a^2+2a+1)}{a^2(a+1)^2}$$

which is always negative since $a > 0$. Thus the equation that gives the ML estimator is

$$\frac{(2a+1)n}{a(a+1)} = -\sum \log x_i$$

(c) The Fisher information is

$$I_a = E\left(-\frac{d^2l}{da^2}\right) = \frac{n(2a^2+2a+1)}{a^2(a+1)^2}$$

Thus the asymptotic distribution of $\hat{a}_{ML}$ is $N\!\left(\mu = a,\ \sigma = \dfrac{a(a+1)}{\sqrt{n(2a^2+2a+1)}}\right)$. Substituting $\hat{a}_{ML} = 2.5$ for $a$ (with $n = 20$) gives the numerical 95% confidence interval

$$2.5 \pm 1.96\cdot\frac{2.5\cdot 3.5}{\sqrt{20\cdot(2\cdot 2.5^2 + 2\cdot 2.5 + 1)}} \;\Rightarrow\; 2.5 \pm 0.9$$

(d) With three observations censored at 0.9, the likelihood function can be written

$$L(a; \mathbf{x}) = \left(\prod_{i=1}^{7} f(x_i; a, 2)\right)\cdot\left(P(X > 0.9)\right)^3$$

Now,

$$P(X > 0.9) = \int_{0.9}^{1}\frac{1}{B(a,2)}\,x^{a-1}(1-x)\,dx$$

We also have that $B(a,2) = \int_0^1 x^{a-1}(1-x)\,dx = 1/a - 1/(a+1) = 1/(a(a+1))$. Thus, upon simplification,

$$P(X > 0.9) = 1 - 0.9^a(0.1a+1)$$

and the likelihood function simplifies to

$$L(a; \mathbf{x}) = a^7(a+1)^7\left(\prod_{i=1}^{7} x_i\right)^{a-1}\left(\prod_{i=1}^{7}(1-x_i)\right)\cdot\left(1 - 0.9^a(0.1a+1)\right)^3$$
The log-likelihood function becomes (upon simplification)

$$l(a; \mathbf{x}) = 7\log a + 7\log(a+1) + (a-1)\sum_{1}^{7}\log x_i + \sum_{1}^{7}\log(1-x_i) + 3\log\!\left(1 - 0.9^a(0.1a+1)\right)$$

with first derivative

$$\frac{dl}{da} = \frac{7}{a} + \frac{7}{a+1} + \sum_{1}^{7}\log x_i - 3\cdot\frac{0.9^a\left((0.1a+1)\log 0.9 + 0.1\right)}{1 - 0.9^a(0.1a+1)}$$

(note that $\frac{d}{da}0.9^a = 0.9^a\log 0.9$). Assuming the stationary point defines a maximum, the equation that gives the ML estimate becomes

$$\frac{7(2a+1)}{a(a+1)} - 3\cdot\frac{0.9^a\left((0.1a+1)\log 0.9 + 0.1\right)}{1 - 0.9^a(0.1a+1)} = -\sum_{1}^{7}\log x_i \simeq 3.527$$

(e) For $b = 1$,

$$f(x; a, 1) = \frac{1}{B(a,1)}\,x^{a-1} = ax^{a-1}$$

Thus $E(X) = \dfrac{a}{a+1}$, so the equation for finding the MM estimator is

$$\frac{a}{a+1} = \bar{x}$$

This gives us

$$\hat{a}_{MM} = \frac{\bar{x}}{1-\bar{x}} = \frac{0.6143}{1-0.6143} = 1.59$$

Task 2

(a) Use Lemma 2.2 on p. 14 in the textbook. The log-likelihood function for normally distributed data with $\mu = 2$ is

$$l(\sigma^2; \mathbf{x}) = -\frac{n}{2}\log\sigma^2 - \frac{n}{2}\log(2\pi) - \frac{1}{2\sigma^2}\sum_{1}^{n}(x_i-2)^2$$

with first derivative

$$\frac{dl}{d\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}\sum_{1}^{n}(x_i-2)^2$$

and second derivative

$$\frac{d^2l}{d(\sigma^2)^2} = \frac{n}{2(\sigma^2)^2} - \frac{1}{(\sigma^2)^3}\sum_{1}^{n}(x_i-2)^2$$

The Fisher information is

$$I_{\sigma^2} = E\left(-\frac{d^2l}{d(\sigma^2)^2}\right) = -\frac{n}{2(\sigma^2)^2} + \frac{1}{(\sigma^2)^3}\sum_{1}^{n}E(x_i-2)^2 = -\frac{n}{2(\sigma^2)^2} + \frac{n\sigma^2}{(\sigma^2)^3} = \frac{n}{2(\sigma^2)^2}$$

and thus the first derivative may be written

$$\frac{dl}{d\sigma^2} = \frac{n}{2(\sigma^2)^2}\left(\frac{1}{n}\sum_{1}^{n}(x_i-2)^2 - \sigma^2\right)$$

Since $\hat{\sigma}^2 = \frac{1}{n}\sum_{1}^{n}(x_i-2)^2$ is unbiased, this shows that $\hat{\sigma}^2$ is the estimator that attains the CR bound, and thus $s^2$ does not attain it. However, since $\mathrm{Var}(s^2) = \frac{2\sigma^4}{n-1}$ and $\mathrm{Var}(\hat{\sigma}^2) = \frac{2\sigma^4}{n}$ both vanish as $n \to \infty$, we may conclude that $s^2$ attains the CR bound asymptotically. (The variances are easily obtained by using the fact that, for normally distributed data, $(n-1)s^2/\sigma^2$ is $\chi^2_{n-1}$-distributed and $n\hat{\sigma}^2/\sigma^2$ is $\chi^2_n$-distributed.)
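The variance comparison in (a) can be checked with a short Monte Carlo simulation. This is a sketch, not part of the original solution; the sample size $n = 10$ and the number of replications are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma2, n, reps = 2.0, 4.0, 10, 200_000

# reps independent samples of size n from N(mu, sigma^2)
x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))

sigma2_hat = ((x - mu) ** 2).mean(axis=1)  # uses the known mu = 2
s2 = x.var(axis=1, ddof=1)                 # uses the sample mean instead

cr_bound = 2 * sigma2**2 / n               # 1/I_{sigma^2} = 2(sigma^2)^2/n = 3.2
print(np.var(sigma2_hat), "should be close to the CR bound", cr_bound)
print(np.var(s2), "should be close to 2*sigma^4/(n-1) =", 2 * sigma2**2 / (n - 1))
```

The simulated variance of $\hat{\sigma}^2$ matches the Cramér–Rao bound $2\sigma^4/n$, while $s^2$ shows the slightly larger variance $2\sigma^4/(n-1)$; the gap disappears as $n$ grows.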
(b) With $\tilde{\sigma}^2 = c(n-1)s^2$ we get

$$\mathrm{MSE}(\tilde{\sigma}^2) = \mathrm{Var}\!\left(c(n-1)s^2\right) + \mathrm{Bias}\!\left(c(n-1)s^2\right)^2 = c^2(n-1)^2\cdot\frac{2\sigma^4}{n-1} + \sigma^4\left(c(n-1)-1\right)^2$$
$$= c^2\left(2\sigma^4(n-1) + \sigma^4(n-1)^2\right) - c\cdot 2\sigma^4(n-1) + \sigma^4 = \varphi(c)$$

Minimise:

$$\frac{d\varphi}{dc} = 2c\sigma^4\left(2(n-1) + (n-1)^2\right) - 2\sigma^4(n-1)$$

$$\frac{d\varphi}{dc} = 0 \;\Rightarrow\; c = \frac{1}{n+1}$$

$$\frac{d^2\varphi}{dc^2} = 2\sigma^4\left(2(n-1) + (n-1)^2\right) > 0$$

Thus the smallest MSE is obtained when $c = 1/(n+1)$.

Task 3

(a) The likelihood function is

$$L(\lambda; \mathbf{x}) = \frac{\lambda^{\sum x_i}}{\prod x_i!}\cdot e^{-n\lambda}$$

Thus

$$\frac{L(\lambda_1; \mathbf{x})}{L(\lambda_0; \mathbf{x})} = \left(\frac{\lambda_1}{\lambda_0}\right)^{\sum x_i}\cdot e^{-n(\lambda_1-\lambda_0)} \geq A$$

Taking logarithms:

$$(\log\lambda_1 - \log\lambda_0)\sum x_i - n(\lambda_1-\lambda_0) \geq B$$

With $\lambda_1 < \lambda_0$ we have $\log\lambda_1 - \log\lambda_0 < 0$, so the best test takes the form

$$\sum x_i \leq C = \frac{B + n(\lambda_1-\lambda_0)}{\log\lambda_1 - \log\lambda_0}$$

(b) The critical region is $\sum x_i \leq C$. With size 5% we get the equation $P\left(\sum x_i \leq C \mid \lambda = 2\right) = 0.05$. Now we can use that $\sum x_i$ is $Po(n\lambda)$ (since it is a sum of independent Poisson variates, all with mean $\lambda$). This gives the following equation for finding the critical region:

$$\sum_{i=0}^{C}\frac{(2n)^i}{i!}\cdot e^{-2n} = 0.05$$

(c) No, since the form of the best test is $\sum x_i \leq C$ for $\lambda_1 < \lambda_0$ but $\sum x_i \geq C$ for $\lambda_0 < \lambda_1$.

(d) The test statistic is $\Lambda = \dfrac{\max_{\lambda\leq 1}\{L(\lambda;\mathbf{x})\}}{\max\{L(\lambda;\mathbf{x})\}}$. We find the maximum values by investigating the log-likelihood $l(\lambda;\mathbf{x}) = \left(\sum x_i\right)\log\lambda - \log\prod x_i! - n\lambda$. This function is maximised for $\lambda = 3$ and is thus increasing as $\lambda$ approaches 3 from the left, so over $\lambda \leq 1$ the maximum is attained at $\lambda = 1$:

$$\max_{\lambda\leq 1}\{L(\lambda;\mathbf{x})\} = L(1;\mathbf{x}) = \frac{1}{2!\cdot 4!\cdot 3!}\cdot e^{-n\cdot 1} = \frac{e^{-3}}{2!\cdot 4!\cdot 3!}$$

$$\max\{L(\lambda;\mathbf{x})\} = L(3;\mathbf{x}) = \frac{3^9}{2!\cdot 4!\cdot 3!}\cdot e^{-n\cdot 3} = \frac{3^9\cdot e^{-9}}{2!\cdot 4!\cdot 3!}$$

$$\Rightarrow \Lambda = \frac{e^{-3}}{3^9\cdot e^{-9}} = \frac{e^6}{3^9} \simeq 0.0205$$

Thus $-2\log\Lambda \simeq 7.8$, which is compared with the $\chi^2_1$-distribution to get the $P$-value, approx. 0.0052.

(e) $I_\lambda = E\left(-\dfrac{d^2l}{d\lambda^2}\right) = \dfrac{1}{\lambda^2}E\left(\sum x_i\right) = \dfrac{n}{\lambda} \;\Rightarrow\; I_{\hat\lambda_{ML}} = \dfrac{n}{\hat\lambda_{ML}}$. But $\hat\lambda_{ML} = \bar{x}$, and thus the Wald test is $(\bar{x}-1)\cdot\dfrac{n}{\bar{x}}\cdot(\bar{x}-1) \geq C$, or equivalently $\dfrac{n(\bar{x}-1)^2}{\bar{x}} \geq C$.

Task 4

(a) Prior: $N(3,2) \Rightarrow \varphi = 3,\ \tau = 2$. Data: $N(\mu, 2) \Rightarrow \sigma = 2$. See the textbook on p. 125. The posterior is
$$N\!\left(\frac{\varphi\sigma^2 + n\bar{x}\tau^2}{\sigma^2 + n\tau^2},\ \sqrt{\frac{\sigma^2\tau^2}{\sigma^2 + n\tau^2}}\right) = N\!\left(\frac{3\cdot 4 + n\bar{x}\cdot 4}{4 + n\cdot 4},\ \sqrt{\frac{4\cdot 4}{4 + n\cdot 4}}\right)$$

Here $n = 3$ and $\bar{x} = 3.4$, so the posterior is $N(3.3, 1)$. Under absolute error loss $\hat\mu_B$ is the median of the posterior distribution, which here coincides with the mean, i.e. 3.3.

(b) $L_S(\mu, \delta_1) = \begin{cases}0 & \text{if } x < 3\\ 6 & \text{if } x \geq 3\end{cases}$ and $L_S(\mu, \delta_2) \equiv 0$. Thus the risk functions are

$$R(\mu, \delta_1) = \int_3^\infty 6\cdot\frac{1}{2\sqrt{2\pi}}\,e^{-\frac{1}{8}(x-\mu)^2}\,dx = 6\cdot P(X\geq 3) = 6\cdot\left(1 - \Phi\!\left(\frac{3-\mu}{2}\right)\right)$$

$$R(\mu, \delta_2) \equiv 0$$

Thus $\delta_2$ is minimax since it always minimises the risk.

(c) For $\delta_1$, $R_B = \int_{-\infty}^{\infty} 6\cdot\left(1 - \Phi\!\left(\frac{3-\mu}{2}\right)\right)\cdot\frac{1}{2\sqrt{2\pi}}\,e^{-\frac{1}{8}(\mu-3)^2}\,d\mu$, and for $\delta_2$, $R_B \equiv 0$.

Task 5

(a) The posterior $q(\pi; \mathbf{x})$ is proportional to

$$\binom{50}{20}\pi^{20}(1-\pi)^{30}\cdot\frac{1}{B(2,5)}\,\pi(1-\pi)^4 \propto \pi^{21}(1-\pi)^{34}$$

and is thus Beta(22, 35). To find the limits $a$ and $b$ of the credible interval $(a, b)$ we need to solve the equations

$$\int_a^1\frac{1}{B(22,35)}\,\pi^{21}(1-\pi)^{34}\,d\pi = 0.95$$

$$\int_b^1\frac{1}{B(22,35)}\,\pi^{21}(1-\pi)^{34}\,d\pi = 0.05$$

By using the result $(1-\pi)^{34} = \sum_{k=0}^{34}\binom{34}{k}1^{34-k}(-\pi)^k$ we come (upon some algebra) to the following defining equations for the two limits:

$$\sum_{k=0}^{34}\frac{(-1)^k}{22+k}\binom{34}{k}\left(1 - a^{22+k}\right) = 0.95\cdot B(22,35)$$

$$\sum_{k=0}^{34}\frac{(-1)^k}{22+k}\binom{34}{k}\left(1 - b^{22+k}\right) = 0.05\cdot B(22,35)$$

(b) Use Theorem 7.3 in the textbook $\Rightarrow$

$$B = \lim_{\pi\to 0.4}\frac{q(\pi\mid\mathbf{x}, H_1)}{p(\pi\mid H_1)} = \frac{\frac{1}{B(22,35)}\cdot 0.4^{21}\cdot 0.6^{34}}{\frac{1}{B(2,5)}\cdot 0.4\cdot 0.6^{4}} = \frac{B(2,5)}{B(22,35)}\cdot 0.4^{20}\cdot 0.6^{30}$$

Now $B(2,5) \simeq 0.0333$ and $B(22,35) \simeq 2.12\cdot 10^{-17}$ (which can be tedious to calculate by hand, and is not the best illustration of what you are expected to cope with during the exam), and this gives us $B \simeq 3.8$. Thus the posterior odds become $3.8\cdot 0.1 = 0.38$, or approximately 1 against 2.63.

(c)

$$g(x_{n+1}\mid\mathbf{x}) = \int_0^1 f(x_{n+1}; \pi)\,q(\pi\mid\mathbf{x})\,d\pi = \int_0^1 \pi^{x_{n+1}}(1-\pi)^{1-x_{n+1}}\cdot\frac{1}{B(22,35)}\,\pi^{21}(1-\pi)^{34}\,d\pi = \frac{B(22 + x_{n+1},\ 36 - x_{n+1})}{B(22,35)}$$

Task 6

(a) Ties: in jar 7, Brand 2 has exactly 5 hours longer growth time, so that observation is discarded. This leaves us with 14 usable jars.
For $x = 9$ of these jars Brand 2 has less than 5 hours longer growth time. $H_0$ says that the median difference in growth time is less than or equal to 5 hours. Under this hypothesis $x$ should follow a Bin(14, 0.5) distribution. Since 9 is larger than $14/2 = 7$ we have no evidence for rejecting $H_0$.

(b) Compute the differences Brand 2 − (Brand 1 + 5). Analogously with (a), this gives a zero difference for jar 7, and that one is discarded. The rank sum for the negative differences is 74, and although there are many ties we compute the z-score for this value to be 1.35. Thus we have no evidence for rejecting $H_0$ here either.

(c) Well, this is up to you to discuss.
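The calculations in (a) and (b) can be reproduced with a few lines of Python. This is a sketch using only the summary statistics quoted in the solution ($n = 14$, $x = 9$, rank sum 74); the raw growth times are not restated here, and the signed-rank z-score ignores the tie correction, as the solution does.

```python
from math import comb, sqrt

# --- Task 6(a): sign test ---
n = 14   # jars left after the tie in jar 7 is discarded
x = 9    # jars where Brand 2's extra growth time is below 5 hours

# Under H0 (median difference <= 5 h), x ~ Bin(14, 0.5); small x would
# favour the alternative, so the one-sided p-value is P(X <= x).
p_sign = sum(comb(n, k) for k in range(x + 1)) / 2**n
print(f"sign test p-value: {p_sign:.3f}")    # large p-value -> keep H0

# --- Task 6(b): normal approximation for the signed-rank sum ---
w_neg = 74                                    # rank sum of negative differences
mean_w = n * (n + 1) / 4                      # 52.5
sd_w = sqrt(n * (n + 1) * (2 * n + 1) / 24)   # about 15.93, no tie correction
z = (w_neg - mean_w) / sd_w
print(f"z-score: {z:.2f}")                    # about 1.35, as in the solution
```

The z-score of roughly 1.35 is below the usual one-sided 5% cut-off of 1.645, which matches the conclusion in (b) that $H_0$ is not rejected.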