Stat 330 Homework 12 Solution Spring 2011

1. The Weibull distribution has two parameters, α and β. Its density function f(x) is given as
$$f_{\alpha,\beta}(x) = \alpha \beta^{-\alpha} x^{\alpha-1} e^{-(x/\beta)^{\alpha}}$$
for x ≥ 0. Assume that the response times of the Google search engine have a Weibull distribution with α = 2 (and are independent).
(a) Find an ML Estimator (Maximum likelihood estimator) for β.
Solution: For i.i.d. data x1, ..., xn (the n realizations of i.i.d. random variables X1, ..., Xn), finding an ML estimator β̂ for β involves five steps:
(a) find the likelihood function:
$$L(\beta) = \prod_{i=1}^{n} f_{2,\beta}(x_i) = \prod_{i=1}^{n} 2\beta^{-2} x_i e^{-(x_i/\beta)^2}$$
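(b) find the log-likelihood function (natural log!):
$$\begin{aligned}
\log L(\beta) &= \sum_{i=1}^{n} \log f_{2,\beta}(x_i) \\
&= \sum_{i=1}^{n} \left( \log 2 - 2\log\beta + \log x_i - (x_i/\beta)^2 \right) \\
&= n\log 2 - 2n\log\beta + \sum_{i=1}^{n} \log x_i - \beta^{-2}\sum_{i=1}^{n} x_i^2
\end{aligned}$$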
(c) differentiate the log-likelihood w.r.t. β:
$$\frac{d}{d\beta}\log L(\beta) = 0 - 2n\frac{1}{\beta} + 0 + 2\beta^{-3}\sum_{i=1}^{n} x_i^2 = 2\beta^{-3}\sum_{i=1}^{n} x_i^2 - 2n\frac{1}{\beta}$$
(d) set the derivative to 0 and solve for β:
$$\frac{d}{d\beta}\log L(\beta) = 0 \iff 2\beta^{-3}\sum_{i=1}^{n} x_i^2 - 2n\frac{1}{\beta} = 0 \iff \sum_{i=1}^{n} x_i^2 - n\beta^2 = 0$$
(e) the solution is β̂:
$$\hat{\beta} = \left( \frac{1}{n}\sum_{i=1}^{n} x_i^2 \right)^{1/2}$$
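You can easily check that, with l(β) = log L(β),
$$\left.\frac{\partial^2 l(\beta)}{\partial\beta^2}\right|_{\beta=\hat{\beta}} = -\frac{4n^2}{\sum_{i=1}^{n} x_i^2} < 0$$
so l(β) is concave at β̂. Since β̂ is the only critical point for β > 0, l(β) attains its maximum at β̂.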
(b) The following fifteen numbers are Google response times (in seconds). Using your results, find β̂.
0.2, 0.01, 0.02, 0.06, 0.1, 0.05, 0.24, 0.17, 0.03, 0.12, 0.06, 0.08, 0.13, 0.22, 0.14.
Solution: Using the data provided above (n = 15),
$$\sum_{i=1}^{n} x_i^2 = 0.2533$$
and by the formula derived in part (a), the MLE β̂ is
$$\hat{\beta} = (0.2533/15)^{1/2} = 0.1299.$$
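The arithmetic above can be checked numerically. The following is a minimal sketch in Python (the choice of language and the names `times`, `log_likelihood`, and `beta_hat` are assumptions made for illustration, not part of the original solution). It evaluates the closed-form estimator from part (a) and compares the log-likelihood at β̂ with its value slightly to either side.

```python
import math

# Response times from part (b), in seconds.
times = [0.2, 0.01, 0.02, 0.06, 0.1, 0.05, 0.24, 0.17,
         0.03, 0.12, 0.06, 0.08, 0.13, 0.22, 0.14]


def log_likelihood(beta, xs):
    """Weibull log-likelihood with shape alpha = 2 and scale beta."""
    n = len(xs)
    return (n * math.log(2) - 2 * n * math.log(beta)
            + sum(math.log(x) for x in xs)
            - sum(x * x for x in xs) / beta ** 2)


n = len(times)
sum_sq = sum(x * x for x in times)
beta_hat = math.sqrt(sum_sq / n)  # closed-form MLE from part (a)

print(f"n = {n}, sum of squares = {sum_sq:.4f}, beta_hat = {beta_hat:.4f}")

# Sanity check: the log-likelihood should be largest at beta_hat.
for beta in (0.9 * beta_hat, beta_hat, 1.1 * beta_hat):
    print(f"beta = {beta:.4f}, log L = {log_likelihood(beta, times):.4f}")
```

Comparing three points is only a spot check, not a proof of maximality; the second-derivative argument in part (a) is what establishes the maximum.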
2. There is concern about the speed of automobiles traveling over a particular stretch of
highway. For a random sample of thirty automobiles, radar indicated the following
speeds, in miles per hour:
82 88 64 78 90 57 74 70 81 60 75 78 85 77 78 65 73 79 73 66 71 70 61 77 66
69 72 67 64 74
Let the mean speed of all automobiles traveling over this stretch of highway be µ mph.
(a) Find the sample mean and variance, x̄ and s².
Solution: Using the data provided we have
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = 72.8, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = 66.16552$$
(you can use R or other software like Matlab to get these too).
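As one possible way to do this in software, here is a short sketch using Python's standard `statistics` module (Python is an assumption here; the solution itself only mentions R and Matlab). The variable names `speeds`, `x_bar`, and `s2` are illustrative.

```python
import statistics

# Radar speeds of the thirty sampled automobiles, in mph.
speeds = [82, 88, 64, 78, 90, 57, 74, 70, 81, 60, 75, 78, 85, 77, 78,
          65, 73, 79, 73, 66, 71, 70, 61, 77, 66, 69, 72, 67, 64, 74]

x_bar = statistics.mean(speeds)
s2 = statistics.variance(speeds)  # sample variance, divisor n - 1

print(f"n = {len(speeds)}, mean = {x_bar:.1f}, variance = {s2:.5f}")
```

Note that `statistics.variance` uses the n − 1 divisor, matching the definition of s² above, whereas `statistics.pvariance` would divide by n.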
(b) Find a 95 % confidence interval for the mean speed of all automobiles traveling
over this stretch of highway.
Solution: We have a large sample (n = 30), so we may assume that the sample mean is approximately normally distributed and use z (the standard normal) for the quantiles.
The 95% confidence interval for the mean speed of all automobiles traveling over this stretch of highway is
$$\left( \bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}} \right)$$
Here 1 − α = 0.95, $z_{\alpha/2} = \Phi^{-1}(0.975) = 1.96$ and $s = \sqrt{s^2} = 8.134219$. Hence the 95% confidence interval for µ is given by 72.8 ± 2.91 or (69.89, 75.71).
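The interval can be reproduced with a few lines of Python. This is a sketch under the same large-sample normal approximation; it takes the summary values from part (a) as given and uses `statistics.NormalDist` from the standard library for the quantile. The variable names are illustrative.

```python
import math
from statistics import NormalDist

# Summary statistics from part (a).
n, x_bar, s2 = 30, 72.8, 66.16552

confidence = 0.95
alpha = 1 - confidence
z = NormalDist().inv_cdf(1 - alpha / 2)  # standard normal quantile
half_width = z * math.sqrt(s2 / n)       # z * s / sqrt(n)

lower, upper = x_bar - half_width, x_bar + half_width
print(f"z = {z:.3f}, half-width = {half_width:.2f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```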
(c) Test the hypothesis that people are speeding, if the legal speed on this highway
is 65 mph. That is, test H0 : µ = 65 vs. Ha : µ > 65
Solution: We need to test H0 : µ = 65 vs. Ha : µ > 65, where the test statistic is given as
$$z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{72.8 - 65}{8.134219/\sqrt{30}} = 5.252$$
The p-value is P(z > 5.252) = 1 − P(z ≤ 5.252) ≈ 0. Since the p-value is tiny, we reject the null hypothesis and conclude that the mean speed of all automobiles is greater than 65 mph.
Also, since 65 is not contained in the confidence interval from part (b), we can conclude that the mean speed is not equal to 65 mph.
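To see how small the p-value is, the one-sided test can be sketched in Python as follows. The summary values are taken from parts (a) and (b); the names `z_stat` and `p_value` are illustrative.

```python
import math
from statistics import NormalDist

n, x_bar, s = 30, 72.8, 8.134219  # sample size, mean, standard deviation
mu0 = 65                          # legal speed under H0

z_stat = (x_bar - mu0) / (s / math.sqrt(n))

# Upper-tail p-value for Ha: mu > mu0.
p_value = 1 - NormalDist().cdf(z_stat)

print(f"z = {z_stat:.3f}, p-value = {p_value:.2e}")
```

The printed p-value is far below any conventional significance level, in line with the "≈ 0" stated above.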
3. To assess whether a laboratory scale is accurate, we can take a standard weight known to weigh exactly 100 grams and weigh it repeatedly. Suppose that the scale readings are normally distributed with variance σ² = 0.0625 grams². If the scale is accurate, then the population mean µ (the mean obtained in many repeated weighings) would be 100 grams, but if the scale is inaccurate, the population mean could be higher or lower.
(a) The weight is weighed 36 times and the sample mean is X̄ = 100.10. Construct a
99% confidence interval for µ. Do you believe the scale is accurate based on this
interval?
Solution: The weight is weighed 36 times and the sample mean is x̄ = 100.10. Here the population variance is known, so we use z in this case. The 99% confidence interval for µ is given as
$$\left( \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right) = 100.10 \pm 0.1073 = (99.9927, 100.2073)$$
since 1 − α = 0.99, $z_{\alpha/2} = \Phi^{-1}(0.995) = 2.576$, σ = 0.25 and n = 36. Based on this interval we can conclude that the scale is accurate, since 100 is contained in the interval.
(b) Construct a 90% confidence interval for µ. Do you believe the scale is accurate
based on this interval?
Solution: The 90% confidence interval for µ is given as
$$\left( \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right) = 100.10 \pm 0.0685 = (100.0315, 100.1685)$$
since 1 − α = 0.9, $z_{\alpha/2} = \Phi^{-1}(0.95) = 1.645$, σ = 0.25 and n = 36. Based on this interval we can conclude that the scale is inaccurate, since 100 is not contained in this interval.
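The intervals in parts (a) and (b) differ only in the quantile, which a small helper makes explicit. This is a sketch; the function name `z_interval` and its signature are hypothetical, introduced only for this illustration.

```python
import math
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence):
    """Two-sided z confidence interval for a mean with known sigma."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


x_bar, sigma, n = 100.10, 0.25, 36

ci99 = z_interval(x_bar, sigma, n, 0.99)
ci90 = z_interval(x_bar, sigma, n, 0.90)

for label, (lo, hi) in (("99%", ci99), ("90%", ci90)):
    contains = lo <= 100 <= hi
    print(f"{label} CI: ({lo:.4f}, {hi:.4f}), contains 100: {contains}")
```

Running it shows the 99% interval covering 100 while the 90% interval does not, which is the situation part (c) asks about.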
(c) Explain why one confidence interval finds the scale inaccurate while the other
finds the scale accurate.
Solution: The 99% confidence interval for µ found the scale to be accurate while the 90% confidence interval found the scale to be inaccurate because, as we increase the confidence level, the $z_{\alpha/2}$ value increases and hence the confidence interval becomes wider. The 99% confidence interval is wider and thus contains the value 100, but the 90% confidence interval, which is narrower, does not contain the value 100. The wider interval is less precise, however, and has less "power" to detect whether µ is different from 100.
(d) Test the hypothesis H0 : µ = 100 vs. Ha : µ ≠ 100
Solution: The test statistic is given by
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{100.10 - 100}{0.25/\sqrt{36}} = 2.4$$
The p-value is P(z > 2.4) + P(z < −2.4) = 2 · (1 − P(z ≤ 2.4)) = 2 × (1 − 0.9918) = 0.0164. If we test at the 5% level of significance, the p-value is smaller than α = 0.05 and we reject the null hypothesis, concluding that the mean weight obtained in many repeated weighings is not equal to 100. If the level of significance is 1%, the p-value is greater than α = 0.01 and we fail to reject the null hypothesis: the data do not provide sufficient evidence that the mean weight differs from 100.
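The two-sided p-value and the two decisions can be sketched in Python as follows, using the values given in the problem (the variable names are illustrative).

```python
import math
from statistics import NormalDist

x_bar, mu0, sigma, n = 100.10, 100, 0.25, 36

z_stat = (x_bar - mu0) / (sigma / math.sqrt(n))

# Two-sided p-value: probability in both tails beyond |z|.
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

print(f"z = {z_stat:.2f}, p-value = {p_value:.4f}")
for level in (0.05, 0.01):
    decision = "reject H0" if p_value < level else "fail to reject H0"
    print(f"level {level}: {decision}")
```

The p-value falls between 0.01 and 0.05, which is why the decision depends on the chosen significance level, mirroring the 99% and 90% intervals in parts (a) and (b).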
4. A manager evaluates effectiveness of a major hardware upgrade by running a certain
process 50 times before the upgrade and 50 times after it. Based on these data, the average running time is 8.8 minutes before the upgrade, 7.5 minutes after it. Historically
the standard deviation has been 1.2 minutes and presumably it has not changed.
(a) Construct a 90% confidence interval for the difference in the mean running times $\mu_{\text{Before}} - \mu_{\text{After}}$.
Solution: The 90% confidence interval for $\mu_{\text{Before}} - \mu_{\text{After}}$, the difference in the mean running times, is
$$\left( (\bar{x}_B - \bar{x}_A) - z_{\alpha/2}\cdot\sigma\sqrt{\frac{2}{n}},\ (\bar{x}_B - \bar{x}_A) + z_{\alpha/2}\cdot\sigma\sqrt{\frac{2}{n}} \right) = (0.904, 1.696)$$
where $\bar{x}_B$ and $\bar{x}_A$ are the average running times before and after the upgrade respectively, so that $\bar{x}_B - \bar{x}_A = 8.8 - 7.5 = 1.3$, σ = 1.2, n = 50 and $z_{\alpha/2} = \Phi^{-1}(0.95) = 1.65$ for 1 − α = 0.9.
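A short Python sketch of this computation is given below (variable names are illustrative). It uses the unrounded quantile Φ⁻¹(0.95) ≈ 1.645 rather than 1.65, so the endpoints differ from the ones above in the third decimal place.

```python
import math
from statistics import NormalDist

n = 50                         # runs before and runs after the upgrade
mean_before, mean_after = 8.8, 7.5
sigma = 1.2                    # common, known standard deviation

diff = mean_before - mean_after
z = NormalDist().inv_cdf(0.95)             # 90% two-sided interval
half_width = z * sigma * math.sqrt(2 / n)  # sigma * sqrt(1/n + 1/n)

lower, upper = diff - half_width, diff + half_width
print(f"difference = {diff:.2f}, 90% CI: ({lower:.3f}, {upper:.3f})")
```

The factor sqrt(2/n) comes from the variance of a difference of two independent sample means, σ²/n + σ²/n.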
(b) Using this interval, can you conclude that the upgrade was effective? Why?
Solution: Yes. The confidence interval contains only positive values, so we can conclude that the mean running time before the upgrade is greater than the mean running time after the upgrade, and hence that the upgrade was effective.
5. In a study of the relationship between the temperature change of a certain type of chip and its working duration, the following data were obtained from a sample of size n = 75. The working duration, X, has mean 32.2 minutes and variance 6.4 minutes²; the temperature increase from room temperature, Y, has mean 8.4 temperature units and variance 2.8 (temperature units)². The sample covariance between X and Y is 3.6.
(a) Estimate the linear regression equation predicting Y based on X.
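Solution: Summarizing the data we have n = 75, x̄ = 32.2, ȳ = 8.4, so that
$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{3.6}{6.4} = 0.5625$$
(the factors n − 1 in the sample covariance and sample variance cancel) and
$$b_0 = \bar{y} - \bar{x} b_1 = 8.4 - 32.2 \times 0.5625 = -9.7125$$
so that the linear regression equation predicting Y based on X is
$$y = -9.7125 + 0.5625x.$$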
(b) Use R² to evaluate the goodness of fit. Is the linear model a good model to describe the relationship between temperature change and running duration?
Solution: First, we compute
$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 = (n-1)\cdot s_y^2 = (75-1)\cdot 2.8 = 207.2$$
and
$$SSR = b_1^2 \cdot (n-1) \cdot 6.4 = 149.85$$
so that
$$R^2 = \frac{SSR}{SST} = 0.7232 = 72.32\%.$$
So the model explains 72.32% of the total variation, which means the model fits the data reasonably well.
(c) Predict the temperature change of the chip if it continues working for one hour.
Solution: Using the equation from part (a), one hour of running means x = 60,
so y = −9.7125 + 0.5625 · 60 = 24.0375 temp units.
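All three parts of this problem use only the summary statistics, so they can be reproduced together. The following Python sketch does so; the variable names are illustrative, and nothing beyond the numbers given in the problem is assumed.

```python
# Summary statistics given in the problem.
n = 75
x_bar, y_bar = 32.2, 8.4   # means of duration X and temperature change Y
var_x, var_y = 6.4, 2.8    # sample variances
cov_xy = 3.6               # sample covariance

# (a) Least-squares slope and intercept.
b1 = cov_xy / var_x
b0 = y_bar - b1 * x_bar

# (b) Sums of squares and coefficient of determination.
sst = (n - 1) * var_y
ssr = b1 ** 2 * (n - 1) * var_x
r2 = ssr / sst

# (c) Prediction for one hour (x = 60 minutes).
y_pred = b0 + b1 * 60

print(f"b1 = {b1:.4f}, b0 = {b0:.4f}")
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, R^2 = {r2:.4f}")
print(f"predicted temperature change at x = 60: {y_pred:.4f}")
```

Equivalently, R² equals the squared sample correlation, cov_xy² / (var_x · var_y), which gives the same value without computing the sums of squares.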