Stat 330 Homework 12 Solution, Spring 2011

1. The Weibull distribution has two parameters $\alpha$ and $\beta$; its density function is

$$f_{\alpha,\beta}(x) = \alpha \beta^{-\alpha} x^{\alpha-1} e^{-(x/\beta)^{\alpha}} \quad \text{for } x \ge 0.$$

Assume that the response times of the Google search engine follow a Weibull distribution with $\alpha = 2$ (and are independent).

(a) Find an ML estimator (maximum likelihood estimator) for $\beta$.

Solution: For i.i.d. data $x_1, \dots, x_n$ (the $n$ realizations of i.i.d. random variables $X_1, \dots, X_n$), finding an ML estimator $\hat\beta$ for $\beta$ involves five steps:

(a) Find the likelihood function:

$$L(\beta) = \prod_{i=1}^{n} f_{2,\beta}(x_i) = \prod_{i=1}^{n} 2\beta^{-2} x_i \, e^{-(x_i/\beta)^2}$$

(b) Find the log-likelihood function (natural log!):

$$\log L(\beta) = \sum_{i=1}^{n} \log f_{2,\beta}(x_i) = \sum_{i=1}^{n} \left( \log 2 - 2\log\beta + \log x_i - (x_i/\beta)^2 \right) = n\log 2 - 2n\log\beta + \sum_{i=1}^{n} \log x_i - \beta^{-2} \sum_{i=1}^{n} x_i^2$$

(c) Differentiate the log-likelihood with respect to $\beta$:

$$\frac{d}{d\beta} \log L(\beta) = 0 - 2n\,\frac{1}{\beta} + 0 + 2\beta^{-3} \sum_{i=1}^{n} x_i^2 = 2\beta^{-3} \sum_{i=1}^{n} x_i^2 - 2n\,\frac{1}{\beta}$$

(d) Set the derivative to 0 and solve for $\beta$:

$$\frac{d}{d\beta} \log L(\beta) = 0 \iff 2\beta^{-3} \sum_{i=1}^{n} x_i^2 - 2n\,\frac{1}{\beta} = 0 \iff \sum_{i=1}^{n} x_i^2 - n\beta^2 = 0$$

(e) The solution is $\hat\beta$:

$$\hat\beta = \left( \frac{1}{n} \sum_{i=1}^{n} x_i^2 \right)^{1/2}$$

Writing $l(\beta) = \log L(\beta)$, one can easily check that

$$\left. \frac{\partial^2 l(\beta)}{\partial \beta^2} \right|_{\beta = \hat\beta} = -\frac{4n^2}{\sum_{i=1}^{n} x_i^2} < 0,$$

which implies that $l(\beta)$ is concave at $\hat\beta$ and hence attains its maximum there.

(b) The following fifteen numbers are Google response times (in seconds). Using your results, find $\hat\beta$.

0.2, 0.01, 0.02, 0.06, 0.1, 0.05, 0.24, 0.17, 0.03, 0.12, 0.06, 0.08, 0.13, 0.22, 0.14

Solution: Using the data provided above, with $n = 15$,

$$\sum_{i=1}^{n} x_i^2 = 0.2533,$$

and by the formula derived in part (a), the MLE is

$$\hat\beta = (0.2533/15)^{1/2} = 0.1299.$$

2. There is concern about the speed of automobiles traveling over a particular stretch of highway. For a random sample of thirty automobiles, radar indicated the following speeds, in miles per hour:

82 88 64 78 90 57 74 70 81 60
75 78 85 77 78 65 73 79 73 66
71 70 61 77 66 69 72 67 64 74

Let the mean speed of all automobiles traveling over this stretch of highway be $\mu$ mph.

(a) Find the sample mean and variance, $\bar{x}$ and $s^2$.
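As a hedged sketch of how the two quantities asked for in part (a) can be obtained with software, the snippet below computes the sample mean and the sample variance (with the $n - 1$ divisor) from the thirty radar speeds listed above. The use of Python's standard `statistics` module and the variable names `speeds`, `x_bar` and `s2` are assumptions of this sketch; any statistical software would do.

```python
from statistics import mean, variance

# Radar speeds (mph) of the thirty sampled automobiles, copied from the problem.
speeds = [
    82, 88, 64, 78, 90, 57, 74, 70, 81, 60,
    75, 78, 85, 77, 78, 65, 73, 79, 73, 66,
    71, 70, 61, 77, 66, 69, 72, 67, 64, 74,
]

x_bar = mean(speeds)   # sample mean
s2 = variance(speeds)  # sample variance, uses the n - 1 divisor

print(f"n = {len(speeds)}, x_bar = {x_bar:.1f}, s^2 = {s2:.5f}")
```

Note that `statistics.variance` is the sample variance; `statistics.pvariance` would divide by $n$ instead and give a slightly smaller value.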
Solution: Using the data provided we have

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = 72.8, \qquad s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = 66.16552$$

(you can use R or other software like Matlab to get these too).

(b) Find a 95% confidence interval for the mean speed of all automobiles traveling over this stretch of highway.

Solution: We have a large sample ($n = 30$), so we may assume that the sample mean is approximately normally distributed and use $z$ (the standard normal) for the quantiles. The 95% confidence interval for the mean speed of all automobiles traveling over this stretch of highway is

$$\left( \bar{x} - z_{\alpha/2} \frac{s}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \frac{s}{\sqrt{n}} \right)$$

Here $1 - \alpha = 0.95$, $z_{\alpha/2} = \Phi^{-1}(0.975) = 1.96$ and $s = \sqrt{s^2} = 8.134219$. Hence the 95% confidence interval for $\mu$ is given by $72.8 \pm 2.91$, or $(69.89, 75.71)$.

(c) Test the hypothesis that people are speeding, if the legal speed on this highway is 65 mph. That is, test

$$H_0: \mu = 65 \quad \text{vs.} \quad H_a: \mu > 65$$

Solution: We need to test $H_0: \mu = 65$ vs. $H_a: \mu > 65$, where the test statistic is given as

$$z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{72.8 - 65}{8.134219/\sqrt{30}} = 5.252$$

The p-value is $P(Z > 5.252) = 1 - P(Z \le 5.252) \approx 0$. Since the p-value is extremely small, we reject the null hypothesis and conclude that the mean speed of all the automobiles is greater than 65 mph. This agrees with part (b): since 65 is not contained in the confidence interval, we can conclude that the mean speed of the cars is not equal to 65 mph.

3. To assess whether a laboratory scale is accurate we can take a standard weight known to weigh exactly 100 grams and weigh it repeatedly. Suppose that the scale readings are normally distributed with variance $\sigma^2 = 0.0625$ grams$^2$. If the scale is accurate, then the population mean $\mu$ (the mean obtained in many repeated weighings) would be 100 grams, but if the scale is inaccurate the population mean could be higher or lower.

(a) The weight is weighed 36 times and the sample mean is $\bar{X} = 100.10$. Construct a 99% confidence interval for $\mu$. Do you believe the scale is accurate based on this interval?
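Because the population variance is known in this problem, the interval asked for in part (a) is a standard two-sided $z$ interval, $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$. The following is a minimal sketch of that formula, assuming Python's standard `statistics.NormalDist` for the normal quantile; the helper name `z_interval` and its parameter names are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence):
    """Two-sided z confidence interval for a mean when sigma is known."""
    alpha = 1.0 - confidence
    # Upper alpha/2 quantile of the standard normal, e.g. about 2.576 for 99%.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Values from the problem: sigma^2 = 0.0625, so sigma = 0.25; n = 36 weighings.
lo99, hi99 = z_interval(x_bar=100.10, sigma=0.25, n=36, confidence=0.99)
print(f"99% CI: ({lo99:.4f}, {hi99:.4f})")
```

Passing a different `confidence` value reuses the same formula, which makes it easy to see how the width of the interval changes with the confidence level.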
Solution: The weight is weighed 36 times and the sample mean is $\bar{x} = 100.10$. Here the population variance is known, so we use $z$ in this case. The 99% confidence interval for $\mu$ is given as

$$\left( \bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right) = 100.10 \pm 0.1073 = (99.9927, 100.2073)$$

since $1 - \alpha = 0.99$, $z_{\alpha/2} = \Phi^{-1}(0.995) = 2.576$, $\sigma = 0.25$ and $n = 36$. Based on this interval we can conclude that the scale is accurate, since 100 is contained in the interval.

(b) Construct a 90% confidence interval for $\mu$. Do you believe the scale is accurate based on this interval?

Solution: The 90% confidence interval for $\mu$ is given as

$$\left( \bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right) = 100.10 \pm 0.0685 = (100.0315, 100.1685)$$

since $1 - \alpha = 0.9$, $z_{\alpha/2} = \Phi^{-1}(0.95) = 1.645$, $\sigma = 0.25$ and $n = 36$. Based on this interval we can conclude that the scale is inaccurate, since 100 is not contained in this interval.

(c) Explain why one confidence interval finds the scale inaccurate while the other finds the scale accurate.

Solution: As we increase the confidence level, the value $z_{\alpha/2}$ increases and hence the confidence interval becomes wider. The 99% confidence interval is wider and thus contains the value 100, whereas the narrower 90% confidence interval does not. The wider interval is, however, less precise and has less "power" to detect whether $\mu$ is different from 100.

(d) Test the hypothesis

$$H_0: \mu = 100 \quad \text{vs.} \quad H_a: \mu \ne 100$$

Solution: The test statistic is given by

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{100.10 - 100}{0.25/\sqrt{36}} = 2.4$$

The p-value is $P(Z > 2.4) + P(Z < -2.4) = 2 \cdot (1 - P(Z \le 2.4)) = 2 \times (1 - 0.9918) = 0.0164$. If we test at the 5% level of significance, the p-value is smaller than $\alpha = 0.05$ and we reject the null hypothesis, concluding that the mean weight obtained in many repeated weighings is not equal to 100.
If the level of significance is 1%, the p-value is greater than $\alpha = 0.01$ and we fail to reject the null hypothesis: at this level the data do not provide sufficient evidence that the mean weight differs from 100.

4. A manager evaluates the effectiveness of a major hardware upgrade by running a certain process 50 times before the upgrade and 50 times after it. Based on these data, the average running time is 8.8 minutes before the upgrade and 7.5 minutes after it. Historically the standard deviation has been 1.2 minutes, and presumably it has not changed.

(a) Construct a 90% confidence interval for the difference in the mean running times, $\mu_{Before} - \mu_{After}$.

Solution: The 90% confidence interval for $\mu_{Before} - \mu_{After}$, the difference in the mean running times, is

$$\left( (\bar{x}_B - \bar{x}_A) - z_{\alpha/2} \cdot \sigma \sqrt{\frac{2}{n}},\ (\bar{x}_B - \bar{x}_A) + z_{\alpha/2} \cdot \sigma \sqrt{\frac{2}{n}} \right) = (0.904, 1.696)$$

where $\bar{x}_B$ and $\bar{x}_A$ are the average running times before and after the upgrade respectively, so that $\bar{x}_B - \bar{x}_A = 8.8 - 7.5 = 1.3$, with $n = 50$, $\sigma = 1.2$ and $z_{\alpha/2} = \Phi^{-1}(0.95) = 1.65$ for $1 - \alpha = 0.9$.

(b) Using this interval, can you conclude that the upgrade was effective? Why?

Solution: Yes. The confidence interval contains only positive values, so we can conclude that the mean running time before the upgrade is greater than the mean running time after the upgrade. The running time has decreased, and hence the upgrade was effective.

5. In a study of the relationship between the temperature change of a certain type of chip and its working duration, the following data were obtained from a sample of size $n = 75$. The working duration, $X$, has mean 32.2 minutes and variance 6.4 minutes$^2$; the temperature increase from room temperature, $Y$, has mean 8.4 temperature units and variance 2.8 (temperature units)$^2$. The sample covariance between $X$ and $Y$ is 3.6.

(a) Estimate the linear regression equation predicting $Y$ based on $X$.
Solution: Summarizing the data we have $n = 75$, $\bar{x} = 32.2$, $\bar{y} = 8.4$, so that

$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{3.6}{6.4} = 0.5625$$

(the factors $n - 1$ in the sample covariance and the sample variance cancel) and

$$b_0 = \bar{y} - \bar{x} b_1 = 8.4 - 32.2 \times 0.5625 = -9.7125,$$

so that the linear regression equation predicting $Y$ based on $X$ is $y = -9.7125 + 0.5625x$.

(b) Use $R^2$ to evaluate the goodness of fit. Is the linear model a good model to describe the relationship between temperature change and running duration?

Solution: First, we compute

$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 = (n - 1) \cdot s_y^2 = (75 - 1) \cdot 2.8 = 207.2$$

and

$$SSR = b_1^2 \cdot (n - 1) \cdot s_x^2 = 0.5625^2 \cdot (75 - 1) \cdot 6.4 = 149.85,$$

so that

$$R^2 = \frac{SSR}{SST} = 0.7232 = 72.32\%.$$

So the model explains 72.32% of the total variation, which means the linear model is a reasonably good fit to the data.

(c) Predict the temperature change of the chip if it continues working for one hour.

Solution: Using the equation from part (a), one hour of running means $x = 60$, so $y = -9.7125 + 0.5625 \cdot 60 = 24.0375$ temperature units.
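The calculations in Problem 5 use only summary statistics, so they can be checked without the raw data. The sketch below is one way to do that, assuming the identities $b_1 = s_{xy}/s_x^2$, $b_0 = \bar{y} - b_1 \bar{x}$ and $R^2 = s_{xy}^2 / (s_x^2 s_y^2)$ (the last one follows from $SSR/SST$ after the $n - 1$ factors cancel). The function name `fit_from_summary` and its parameter names are hypothetical, introduced only for this illustration.

```python
def fit_from_summary(x_mean, y_mean, var_x, var_y, cov_xy):
    """Least-squares line and R^2 from sample means, variances and covariance."""
    slope = cov_xy / var_x                    # b1 = s_xy / s_x^2
    intercept = y_mean - slope * x_mean       # b0 = y_bar - b1 * x_bar
    r_squared = cov_xy ** 2 / (var_x * var_y) # equals SSR / SST
    return slope, intercept, r_squared


# Summary statistics stated in Problem 5.
b1, b0, r2 = fit_from_summary(
    x_mean=32.2, y_mean=8.4, var_x=6.4, var_y=2.8, cov_xy=3.6
)

# Predicted temperature change after one hour (x = 60 minutes).
y_hat_60 = b0 + b1 * 60

print(f"b1 = {b1:.4f}, b0 = {b0:.4f}, R^2 = {r2:.4f}, y_hat(60) = {y_hat_60:.4f}")
```

Note that $x = 60$ lies well outside the observed durations (mean 32.2 minutes, standard deviation about 2.5 minutes), so the prediction is an extrapolation of the fitted line.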