AMS572.01 Midterm Exam Fall, 2009

Name: ________________________________ ID: _____________________ Signature: _________________________
Instruction: This is a closed-book exam. Anyone who cheats in the exam shall receive a grade of F. Please provide
complete solutions for full credit. The exam runs from 12:50 to 2:10 pm. Good luck!
1. (for all) To study the effectiveness of wall insulation in saving energy for home heating, the energy consumption (in
MWh) for 5 houses in Bristol, England, was recorded for two winters; the first winter was before insulation and the
second winter was after insulation:
House   Before (MWh)   After (MWh)
1       12.1           12.0
2       10.6           11.0
3       13.4           14.1
4       13.8           11.2
5       15.5           15.3
(a) Please provide a 95% confidence interval for the difference between the mean energy consumption before and after the
wall insulation is installed. What assumptions are necessary for your inference?
(b) Can you conclude that there is a difference in mean energy consumption before and after the wall insulation is
installed at the significance level 0.05? Please test it and evaluate the p-value of your test. What assumptions are necessary
for your inference?
(c) Please write the SAS program to perform the test and examine the necessary assumptions given in (b).
SOLUTION: This is inference on two population means, paired samples.
(a). Let $d$ = before - after for each house. Then $\bar d = 0.36$ and $s_d = 1.30$. With $t_{n-1,\alpha/2} = t_{4,0.025} = 2.776$, the 95% CI for $\mu_d$ is
$\bar d \pm t_{4,0.025}\, s_d/\sqrt{n} = 0.36 \pm 2.776 \times 1.30/\sqrt{5} = (-1.25,\ 1.97)$.
(b). $H_0: \mu_d = 0$ versus $H_a: \mu_d \neq 0$.
(1) $t_0 = \dfrac{\bar d - 0}{s_d/\sqrt{n}} = \dfrac{0.36 - 0}{1.30/\sqrt{5}} = 0.619$.
Since $|t_0| = 0.619$ is smaller than $t_{4,0.025} = 2.776$, we cannot reject $H_0$.
(2) p-value $= 2\,P(T_4 > 0.619) \approx 0.57$. In the exam, since you do not have access to statistical software such as R,
you can only estimate the range of the p-value from your t-table. Because $0.619 < t_{4,0.10} = 1.533$, you estimate
p-value > 2 × 0.1 = 0.2 from the t-table you were given.
Assumptions for (a) and (b): the paired differences follow a normal distribution.
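The hand calculations in (a) and (b) can be checked with a short Python sketch that uses only the standard library. The critical value 2.776 is copied from the t-table. The exact p-value is obtained by integrating the Student t density numerically with Simpson's rule, which is a choice made for this sketch rather than part of the exam solution. The helper names `t_density` and `t_two_sided_p` are hypothetical. Because the sketch uses the unrounded $s_d \approx 1.305$, it gives $t_0 \approx 0.617$ and a CI of about $(-1.26, 1.98)$, which agree with the hand results up to rounding.

```python
import math
import statistics

# Energy consumption (MWh) for the 5 houses
before = [12.1, 10.6, 13.4, 13.8, 15.5]
after = [12.0, 11.0, 14.1, 11.2, 15.3]


def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t, df, steps=2000):
    """2 * P(T > |t|), via Simpson's rule for P(0 < T < |t|); steps must be even."""
    upper = abs(t)
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 1 - 2 * (total * h / 3)


d = [x - y for x, y in zip(before, after)]   # paired differences, before - after
n = len(d)
d_bar = statistics.mean(d)
s_d = statistics.stdev(d)                    # sample SD (divisor n - 1)
se = s_d / math.sqrt(n)

t_crit = 2.776                               # t_{4, 0.025} from the t-table
ci = (d_bar - t_crit * se, d_bar + t_crit * se)
t0 = (d_bar - 0) / se
p_value = t_two_sided_p(t0, n - 1)
print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.3f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"t0 = {t0:.3f}, reject H0: {abs(t0) > t_crit}, p-value = {p_value:.2f}")
```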
(c) The SAS program is:
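Data energy;
Input before after @@;      /* @@ reads several (before, after) pairs per data line */
Diff=before - after;        /* paired difference d = before - after */
Datalines;
12.1 12.0 10.6 11.0 13.4 14.1 13.8 11.2 15.5 15.3
;
Run;
/* NORMAL requests normality tests (e.g., Shapiro-Wilk) for Diff; the
   "Tests for Location: Mu0=0" table gives the paired t-test of H0: mu_d = 0 */
Proc univariate data = energy normal;
Var Diff;
Run;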
2A (for AMS students). Suppose we have two independent random samples from two normal populations:
$X_1, X_2, \ldots, X_{n_1} \sim N(\mu_1, \sigma^2)$ and $Y_1, Y_2, \ldots, Y_{n_2} \sim N(\mu_2, \sigma^2)$. At the significance level α, please construct a test of
whether $\mu_1 = 2\mu_2$ or not. (*Please include the derivation of the pivotal quantity, the proof of its distribution, and the
derivation of the rejection region for full credit.)
SOLUTION: Here is a simple outline of the derivation of the test of $H_0: \mu_1 - 2\mu_2 = 0$ versus $H_a: \mu_1 - 2\mu_2 \neq 0$.
(a) We start with the point estimator for the parameter of interest $\mu_1 - 2\mu_2$, namely $\bar X - 2\bar Y$. Using the mgf for
$N(\mu, \sigma^2)$, which is $M(t) = \exp(\mu t + \sigma^2 t^2/2)$, and the independence properties of the random samples, its distribution is
$\bar X - 2\bar Y \sim N\left(\mu_1 - 2\mu_2,\ \sigma^2(1/n_1 + 4/n_2)\right)$.
From this we have
$Z = \dfrac{\bar X - 2\bar Y - (\mu_1 - 2\mu_2)}{\sigma\sqrt{1/n_1 + 4/n_2}} \sim N(0, 1)$.
Unfortunately, Z cannot serve as the pivotal quantity because σ is unknown.
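A worked version of the mgf step in (a) follows. It is a sketch that uses only the normal mgf quoted above, the fact that $\bar X$ averages $n_1$ i.i.d. terms, and the independence of the two samples:

```latex
\begin{aligned}
M_{\bar X}(t) &= \left[M_{X}(t/n_1)\right]^{n_1} = \exp\!\left(\mu_1 t + \frac{\sigma^2 t^2}{2 n_1}\right),
\qquad
M_{\bar Y}(t) = \exp\!\left(\mu_2 t + \frac{\sigma^2 t^2}{2 n_2}\right),\\
M_{\bar X - 2\bar Y}(t) &= M_{\bar X}(t)\, M_{\bar Y}(-2t)
= \exp\!\left(\mu_1 t + \frac{\sigma^2 t^2}{2 n_1}\right)\exp\!\left(-2\mu_2 t + \frac{4\sigma^2 t^2}{2 n_2}\right)\\
&= \exp\!\left((\mu_1 - 2\mu_2)\, t + \frac{\sigma^2\left(1/n_1 + 4/n_2\right) t^2}{2}\right).
\end{aligned}
```

By the uniqueness of mgfs, this is the $N\left(\mu_1 - 2\mu_2,\ \sigma^2(1/n_1 + 4/n_2)\right)$ distribution.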
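(b) We next look for a way to get rid of the unknown σ, following an approach similar to the construction of the
pooled-variance t-statistic. By the theorem on sampling from a normal population, $(n_1 - 1)S_1^2/\sigma^2 \sim \chi^2_{n_1 - 1}$ and
$(n_2 - 1)S_2^2/\sigma^2 \sim \chi^2_{n_2 - 1}$, and the two are independent because the samples are independent. Using the mgf for
$\chi^2_k$, which is $M(t) = \left(\dfrac{1}{1 - 2t}\right)^{k/2}$, we find that
$W = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{\sigma^2} \sim \chi^2_{n_1 + n_2 - 2}$.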
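Written out, the mgf argument in (b) multiplies the mgfs of the two independent components. By the standard sampling theorem, each component is chi-square with $n_i - 1$ degrees of freedom:

```latex
\begin{aligned}
M_W(t) &= M_{(n_1-1)S_1^2/\sigma^2}(t)\; M_{(n_2-1)S_2^2/\sigma^2}(t)
= (1 - 2t)^{-(n_1 - 1)/2}\,(1 - 2t)^{-(n_2 - 1)/2}\\
&= (1 - 2t)^{-(n_1 + n_2 - 2)/2}, \qquad t < 1/2,
\end{aligned}
```

This is the mgf of $\chi^2_{n_1 + n_2 - 2}$.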
(c) Next, the theorem on sampling from a normal population says that the sample mean and sample variance are independent. Together with
the independence of the two random samples, this makes Z and W independent. Therefore, by the definition of the t-distribution,
$T = Z/\sqrt{W/(n_1 + n_2 - 2)}$, which gives our pivotal quantity:
$T = \dfrac{\bar X - 2\bar Y - (\mu_1 - 2\mu_2)}{S_p\sqrt{1/n_1 + 4/n_2}} \sim t_{n_1 + n_2 - 2}$, where $S_p^2 = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$ is the
pooled sample variance.
(d) The rejection region is derived from $P(|T_0| \ge c \mid H_0) = \alpha$, where
$T_0 = \dfrac{\bar X - 2\bar Y - 0}{S_p\sqrt{1/n_1 + 4/n_2}} \overset{H_0}{\sim} t_{n_1 + n_2 - 2}$.
Thus $c = t_{n_1 + n_2 - 2,\,\alpha/2}$. Therefore, at the significance level α, we reject $H_0$ in favor of $H_a$ iff $|T_0| \ge t_{n_1 + n_2 - 2,\,\alpha/2}$.
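A Monte Carlo simulation, which is a sanity check and not part of the derivation, can confirm that the rule in (d) rejects about α of the time when $H_0$ holds. The sample sizes, σ, $\mu_2$, random seed, number of repetitions, and the helper name `t0_stat` are all hypothetical choices for this sketch. The critical value 2.179 is $t_{12,0.025}$ from a standard t-table, because $n_1 + n_2 - 2 = 12$ here.

```python
import math
import random


def t0_stat(x, y):
    """T0 = (Xbar - 2*Ybar) / (Sp * sqrt(1/n1 + 4/n2)) for H0: mu1 - 2*mu2 = 0."""
    n1, n2 = len(x), len(y)
    xbar, ybar = sum(x) / n1, sum(y) / n2
    ss1 = sum((v - xbar) ** 2 for v in x)        # (n1 - 1) * S1^2
    ss2 = sum((v - ybar) ** 2 for v in y)        # (n2 - 1) * S2^2
    sp = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))  # pooled SD
    return (xbar - 2 * ybar) / (sp * math.sqrt(1 / n1 + 4 / n2))


# Hypothetical settings: H0 is true because mu1 = 2 * mu2.
random.seed(2009)
n1, n2, sigma, mu2 = 8, 6, 2.0, 3.0
mu1 = 2 * mu2
t_crit = 2.179     # t_{12, 0.025}, i.e. alpha = 0.05 two-sided
reps = 20000

rejections = 0
for _ in range(reps):
    x = [random.gauss(mu1, sigma) for _ in range(n1)]
    y = [random.gauss(mu2, sigma) for _ in range(n2)]
    if abs(t0_stat(x, y)) >= t_crit:
        rejections += 1

empirical_size = rejections / reps
print(f"empirical size = {empirical_size:.3f} (nominal alpha = 0.05)")
```

An empirical size close to 0.05 is consistent with $T_0 \sim t_{n_1+n_2-2}$ under $H_0$. It does not replace the proof.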
2B (for all non-AMS students). An experiment was conducted to compare the mean number of tapeworms in the
stomachs of sheep that had been treated for worms against the mean number in those that were untreated. A sample of 14
worm-infected lambs was randomly divided into 2 groups. Seven were injected with the drug and the remaining seven were left
untreated. After a 6-month period, the lambs were slaughtered and the following worm counts were recorded:
Drug-treated sheep:  18 43 28 50 16 32 13
Untreated sheep:     40 54 26 63 21 37 39
(a). Test at α = 0.05 whether the treatment is effective or not.
(b) What assumptions do you need for the inference in part (a)?
(c). Please write up the entire SAS program necessary to answer questions raised in (a) and (b).
SOLUTION: Inference on two population means. Two small and independent samples.
Drug-treated sheep: $\bar X_1 = 28.57$, $s_1^2 = 198.62$, $n_1 = 7$
Untreated sheep: $\bar X_2 = 40.0$, $s_2^2 = 215.33$, $n_2 = 7$
(a) Under the normality assumption, we first test whether the two population variances are equal, that is, $H_0: \sigma_1^2 = \sigma_2^2$ versus
$H_a: \sigma_1^2 \neq \sigma_2^2$. The test statistic is
$F_0 = s_1^2/s_2^2 = 198.62/215.33 = 0.92$, with $F_{6,6,0.05,U} = 4.28$ and $F_{6,6,0.05,L} = 1/4.28 = 0.23$.
Since $F_0$ is between 0.23 and 4.28, we cannot reject $H_0$. Therefore it is reasonable to assume that $\sigma_1^2 = \sigma_2^2$.
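The variance-ratio computation can be reproduced in a few lines of Python. The critical values 4.28 and 1/4.28 are copied from the solution above and are not computed here.

```python
import statistics

drug = [18, 43, 28, 50, 16, 32, 13]
untreated = [40, 54, 26, 63, 21, 37, 39]

s1_sq = statistics.variance(drug)        # sample variance, divisor n - 1
s2_sq = statistics.variance(untreated)
f0 = s1_sq / s2_sq

f_upper = 4.28                           # F_{6,6,0.05,U} from the F-table
f_lower = 1 / f_upper                    # F_{6,6,0.05,L}
reject_equal_var = f0 < f_lower or f0 > f_upper
print(f"s1^2 = {s1_sq:.2f}, s2^2 = {s2_sq:.2f}, F0 = {f0:.2f}, reject H0: {reject_equal_var}")
```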
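Next we perform the pooled-variance t-test with hypotheses $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 < 0$. The pooled variance is
$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \dfrac{6 \times 198.62 + 6 \times 215.33}{12} = 206.98$, so $s_p = 14.39$, and
$t_0 = \dfrac{\bar X_1 - \bar X_2 - 0}{s_p\sqrt{1/n_1 + 1/n_2}} = \dfrac{28.57 - 40.0 - 0}{14.39\sqrt{1/7 + 1/7}} = -1.49$.
Since $t_0 = -1.49$ is greater than $-t_{12,0.05} = -1.782$, we cannot reject $H_0$. We have insufficient evidence to reject
the hypothesis that there is no difference in the mean number of worms in treated and untreated lambs.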
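The pooled-variance t statistic can be checked the same way. The one-sided critical value 1.782 is copied from the t-table. The group order (drug-treated minus untreated) matches $H_a: \mu_1 - \mu_2 < 0$.

```python
import math
import statistics

drug = [18, 43, 28, 50, 16, 32, 13]        # group 1
untreated = [40, 54, 26, 63, 21, 37, 39]   # group 2
n1, n2 = len(drug), len(untreated)

# Pooled SD: S_p^2 = [(n1 - 1)s1^2 + (n2 - 1)s2^2] / (n1 + n2 - 2)
sp = math.sqrt(((n1 - 1) * statistics.variance(drug)
                + (n2 - 1) * statistics.variance(untreated)) / (n1 + n2 - 2))
t0 = (statistics.mean(drug) - statistics.mean(untreated)) / (sp * math.sqrt(1 / n1 + 1 / n2))

t_crit = 1.782          # t_{12, 0.05}, one-sided
reject = t0 < -t_crit   # lower-tailed test: drug lowers the mean worm count
print(f"sp = {sp:.2f}, t0 = {t0:.2f}, reject H0: {reject}")
```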
(b) (1) Both populations are normally distributed.
(2) $\sigma_1^2 = \sigma_2^2$.
(3) The two samples are independent random samples.
(c) /*Problem #2B*/
data sheep;
input group worms;   /* group: 1 = drug-treated, 2 = untreated */
datalines;
1 18
1 43
1 28
1 50
1 16
1 32
1 13
2 40
2 54
2 26
2 63
2 21
2 37
2 39
;
run;
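/* NORMAL: normality tests for the worm counts, separately for each group */
proc univariate data=sheep normal;
class group;
var worms;
title 'Check for normality';
run;
/* Reports the pooled and Satterthwaite t-tests and the folded F test for
   equal variances; p-values are two-sided, so halve them for the one-sided
   alternative when the difference (group 1 - group 2) is negative */
proc ttest data=sheep;
class group;
var worms;
title 'Independent samples t-test';
run;
/* Wilcoxon rank-sum test: a nonparametric alternative if normality fails */
proc npar1way data=sheep wilcoxon;
class group;
var worms;
title 'Nonparametric test for two-mean comparisons';
run;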
3 (for all). A federal agency has decided to investigate the advertised weight printed on cartons of a certain brand of
cereal. The company in question periodically samples cartons of cereal coming off the production line to check their
weight. A summary of 1,500 of the weights made available to the agency indicates a mean weight of 11.80 ounces per
carton and a standard deviation of .75 ounce. Use this information to determine the number of cereal cartons the federal
agency must examine to estimate the average weight of cartons being produced now, using a 99% confidence interval of
width .50.
SOLUTION: The federal agency has specified that the width of the confidence interval is to be .50, so E = 0.25.
Assuming that the weights made available to the agency by the company are accurate, we can take σ = 0.75. The
required sample size with $z_{\alpha/2} = z_{0.005} = 2.58$ is
$n = \left(\dfrac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\dfrac{2.58 \times 0.75}{0.25}\right)^2 = 59.91$.
That is, the federal agency must obtain a random sample of 60 cereal cartons to estimate the mean weight to within 0.25.
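The sample-size formula can be evaluated directly. This sketch uses the table value 2.58 from the solution and, as a comparison, the more precise $z_{0.005}$ from Python's `statistics.NormalDist` (Python 3.8+). Both round up to 60 cartons.

```python
import math
from statistics import NormalDist

sigma = 0.75       # taken from the company's 1,500 weights
width = 0.50
E = width / 2      # margin of error = half the interval width
z_table = 2.58     # z_{0.005} as rounded in the solution

n_table = (z_table * sigma / E) ** 2
z_exact = NormalDist().inv_cdf(1 - 0.01 / 2)   # about 2.5758
n_exact = (z_exact * sigma / E) ** 2
print(f"n = {n_table:.2f} -> {math.ceil(n_table)}; exact z: {n_exact:.2f} -> {math.ceil(n_exact)}")
```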
4 (for all). In order to test the accuracy of speedometers purchased from a subcontractor, the purchasing department of an
automaker orders a test of a sample of speedometers at a controlled speed of 55 mph. At this speed, it is estimated that the
variance of the readings is 1.
(a). How many speedometers need to be tested to have a 95% power to detect a bias of 0.5 mph or greater using a 0.01
level test?
(b). A sample of the size determined in (a) has a mean of 55.2 and standard deviation of 0.8. Can you conclude that the
speedometers have a bias?
(c). Calculate the power of the test if 50 speedometers are tested and the actual bias is 0.5 mph. Assume a population
standard deviation of 0.8.
SOLUTION:
(a) $H_0: \mu = \mu_0 = 55$ versus $H_a: \mu = \mu_a = 55.5 > 55$.
Power $= 0.95 \Rightarrow \beta = 0.05$; $\sigma = 1$, $\alpha = 0.01$.
$n = \dfrac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu_a - \mu_0)^2} = \dfrac{(2.326 + 1.645)^2 \times 1^2}{(55.5 - 55)^2} = 63.1 \to 64$
(*Note: if $\sigma = 0.8$, $n = \dfrac{(2.326 + 1.645)^2 \times 0.8^2}{(55.5 - 55)^2} = 40.4 \to 41$.)
Hence, 64 speedometers need to be tested. (*Note: only 41 speedometers are needed if σ = 0.8.)
(b) $H_0: \mu = \mu_0 = 55$ versus $H_a: \mu > 55$.
$s = 0.8$, $n = 64$, $\alpha = 0.01$, $\bar X = 55.2$.
$z_0 = \dfrac{\bar X - \mu_0}{s/\sqrt{n}} = \dfrac{55.2 - 55}{0.8/\sqrt{64}} = 2$.
(*Note: $Z_0 = \dfrac{\bar X - \mu_0}{s/\sqrt{n}} \overset{H_0}{\sim} N(0, 1)$ approximately. This is the large-sample z-test based on the central limit
theorem, which is suitable even if the population distribution is not normal.)
Since $z_0 = 2 < z_{0.01} = 2.326$, we cannot conclude that the speedometers have a bias.
(**Note: Here you can also use the t-test – but remember to mention that the t-test is suitable if we assume the population
distribution is normal!)
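The z statistic in (b) is simple arithmetic. The sketch below checks it against the table value $z_{0.01} = 2.326$ used above.

```python
import math

x_bar, mu0, s, n = 55.2, 55.0, 0.8, 64
z0 = (x_bar - mu0) / (s / math.sqrt(n))   # large-sample z statistic
z_crit = 2.326                            # z_{0.01}, upper-tailed test
print(f"z0 = {z0:.2f}, reject H0: {z0 > z_crit}")
```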
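(c) $\sigma = 0.8$, $\alpha = 0.01$, $n = 50$.
$H_0: \mu = \mu_0 = 55$ versus $H_a: \mu = \mu_a = 55.5 > 55$.
Power $= P(\text{reject } H_0 \mid H_a)$
$= P(Z_0 > z_{0.01} \mid \mu = 55.5)$
$= P\left(\dfrac{\bar X - \mu_0}{\sigma/\sqrt{n}} > z_{0.01} \,\middle|\, \mu = 55.5\right)$
$= P\left(\dfrac{\bar X - \mu_a}{\sigma/\sqrt{n}} > z_{0.01} - \dfrac{\mu_a - \mu_0}{\sigma/\sqrt{n}} \,\middle|\, \mu = 55.5\right)$
$= P\left(Z > 2.326 - \dfrac{55.5 - 55}{0.8/\sqrt{50}}\right)$
$= P(Z > -2.09) = 0.9817$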
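The power calculation in (c) can be reproduced with `statistics.NormalDist` (Python 3.8+). Using the unrounded shift gives about 0.9818. The 0.9817 above comes from rounding $z_{0.01} - \text{shift}$ to $-2.09$.

```python
import math
from statistics import NormalDist

mu0, mu_a, sigma, n = 55.0, 55.5, 0.8, 50
z_alpha = 2.326                                  # z_{0.01}
shift = (mu_a - mu0) / (sigma / math.sqrt(n))    # true mean shift in standard errors
power = 1 - NormalDist().cdf(z_alpha - shift)    # P(Z > z_alpha - shift)
print(f"z_alpha - shift = {z_alpha - shift:.2f}, power = {power:.4f}")
```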
****** That’s all folks! Have a happy Halloween! *******