Statistics 241.3 - Intersession

Statistics 242.3 – Review Questions for Final Exam - Solutions
I.
An embryologist wished to estimate the mean difference in the time at which eye
pigmentation is first evidenced in an embryo for 2 varieties of birds. A sample of n1 =
33 embryos from variety A revealed that it took an average of 74.32 hours for eye
pigmentation to appear, with a standard deviation of 2.51 hours. A second sample of n2
= 33 embryos from variety B revealed that it took an average of 100.21 hours for eye
pigmentation to appear, with a standard deviation of 19.44 hours.
a) Estimate the difference in the mean time at which eye pigmentation appears for
the two varieties of birds, with a 95% confidence interval
Solution:
\[ (\bar{x}_2 - \bar{x}_1) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]
or
\[ (100.21 - 74.32) \pm 1.960\sqrt{\frac{2.51^2}{33} + \frac{19.44^2}{33}} \]
or 19.20 to 32.58.
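As a quick numerical check of the interval in part a), the following minimal Python sketch recomputes it from the summary statistics in the problem statement. The variable names are illustrative only.

```python
import math

# Summary statistics from the problem statement
n1, mean1, sd1 = 33, 74.32, 2.51     # variety A
n2, mean2, sd2 = 33, 100.21, 19.44   # variety B
z = 1.960                            # z_{0.025} for a 95% interval

diff = mean2 - mean1
se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)   # standard error of the difference
lower, upper = diff - z * se, diff + z * se
print(f"{lower:.2f} to {upper:.2f}")
```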
b) Using a 5% significance level, test
the Null Hypothesis H0: Variety A's pigmentation requires the same period of time
to appear as the time required by variety B
against
the Alternative Hypothesis HA: Variety A's pigmentation requires a shorter period
of time to appear than the time required by variety B.
Solution
The test statistic is
\[ z = \frac{\bar{x}_2 - \bar{x}_1}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{100.21 - 74.32}{\sqrt{\dfrac{2.51^2}{33} + \dfrac{19.44^2}{33}}} = 7.59 \]
Critical Region: Reject H0 if z > z0.05 = 1.645.
Since 7.59 > 1.645, H0 is rejected.
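The test in part b) can be reproduced with the standard library alone. This is a sketch under the same large-sample normal approximation used in the solution; `NormalDist` is used only to look up the one-sided 5% critical value.

```python
import math
from statistics import NormalDist

# Standard error of the difference in sample means (both samples have n = 33)
se = math.sqrt(2.51**2 / 33 + 19.44**2 / 33)
z_stat = (100.21 - 74.32) / se

z_crit = NormalDist().inv_cdf(0.95)   # upper 5% point of the standard normal
reject_h0 = z_stat > z_crit
print(round(z_stat, 2), round(z_crit, 3), reject_h0)
```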
c) Assume that the cost of obtaining an embryo of variety A is $0.25 while the cost
of obtaining an embryo of variety B is $1.50. Determine the optimal number of
embryos of each variety that would yield an estimate of the difference μA - μB having
an error bound of N = 2 hours with a 95% level of confidence.
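Solution
Sample size determination:
\[ n_1 = \frac{z_{\alpha/2}^2}{B^2}\left(\sigma_x^2 + \sqrt{\frac{c_2}{c_1}}\,\sigma_x\sigma_y\right) \quad\text{and}\quad n_2 = \frac{z_{\alpha/2}^2}{B^2}\left(\sigma_y^2 + \sqrt{\frac{c_1}{c_2}}\,\sigma_x\sigma_y\right) \]
where B = Error Bound = N = 2 and z_{α/2} = z0.025 = 1.960.
Thus
\[ n_1 = \frac{1.960^2}{2^2}\left(2.51^2 + \sqrt{\frac{1.50}{0.25}}\,(2.51)(19.44)\right) = 121 \]
and
\[ n_2 = \frac{1.960^2}{2^2}\left(19.44^2 + \sqrt{\frac{0.25}{1.50}}\,(2.51)(19.44)\right) = 382 \]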
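The allocation formulas in part c) are easy to evaluate directly. The sketch below is illustrative: the function name `optimal_sizes` is hypothetical, and the sample standard deviations are used in place of the unknown σx and σy, as in the solution.

```python
import math

def optimal_sizes(sd_x, sd_y, c1, c2, bound, z=1.960):
    """Cost-optimal sample sizes for estimating a difference of two means."""
    k = z**2 / bound**2
    n1 = k * (sd_x**2 + math.sqrt(c2 / c1) * sd_x * sd_y)
    n2 = k * (sd_y**2 + math.sqrt(c1 / c2) * sd_x * sd_y)
    return n1, n2

n1, n2 = optimal_sizes(sd_x=2.51, sd_y=19.44, c1=0.25, c2=1.50, bound=2.0)
print(round(n1), round(n2))

# With the unrounded sizes the error bound is met exactly
achieved = 1.960 * math.sqrt(2.51**2 / n1 + 19.44**2 / n2)
```

Note that the cheaper variety A still gets the smaller sample: variety B's much larger standard deviation outweighs its higher cost per embryo.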
II.
An experiment with two outcomes (Success and Failure) is repeated n times
independently. Let θ denote the probability of success (1 – θ the probability of
failure). Let δ = θ – (1 - θ) = 2θ – 1 denote the difference between the probability of
success and the probability of failure.
a) Find δ̂, the Maximum Likelihood Estimate of δ.
Solution
Let x1, x2, … , xn be such that xi = 1 if the i-th repetition is a "Success" and
xi = 0 if the i-th repetition is a "Failure".
Then
\[ f(x) = P(x_i = x) = \theta^x (1-\theta)^{1-x} = \left(\frac{1+\delta}{2}\right)^{x}\left(\frac{1-\delta}{2}\right)^{1-x}, \quad x = 0, 1. \]
The joint distribution of x1, x2, … , xn is
\[ f(x_1, \dots, x_n) = \prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i} = \left(\frac{1+\delta}{2}\right)^{\sum_{i=1}^{n} x_i}\left(\frac{1-\delta}{2}\right)^{n - \sum_{i=1}^{n} x_i} = L(\delta) \]
\[ l(\delta) = \ln L(\delta) = S \ln\left(\frac{1+\delta}{2}\right) + (n - S)\ln\left(\frac{1-\delta}{2}\right) \quad\text{where } S = \sum_{i=1}^{n} x_i \]
\[ l'(\delta) = \frac{S}{1+\delta} - \frac{n-S}{1-\delta} = 0 \quad\text{if}\quad S(1-\delta) = (1+\delta)(n-S) \]
Thus nδ = 2S - n and
\[ \hat{\delta} = \frac{2S}{n} - 1 = 2\hat{\theta} - 1, \quad\text{where } \hat{\theta} = \frac{S}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i. \]
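To see the closed form agree with a direct maximization, the sketch below evaluates the log-likelihood l(δ) on a fine grid. The 0/1 outcomes are hypothetical data made up for illustration; they are not from the exam.

```python
import math

def log_lik(delta, s, n):
    """l(delta) = S ln((1+delta)/2) + (n-S) ln((1-delta)/2)."""
    return s * math.log((1 + delta) / 2) + (n - s) * math.log((1 - delta) / 2)

x = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]   # hypothetical outcomes (1 = Success)
n, s = len(x), sum(x)

delta_hat = 2 * s / n - 1            # closed-form MLE

# Brute-force check on a grid over the open interval (-1, 1)
grid = [i / 1000 for i in range(-999, 1000)]
delta_grid = max(grid, key=lambda d: log_lik(d, s, n))
print(delta_hat, delta_grid)
```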
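b) Determine the approximate sampling distribution of δ̂.
Solution
The sampling distribution of θ̂ is approximately normal with mean and standard deviation
\[ \mu_{\hat{\theta}} = \theta, \qquad \sigma_{\hat{\theta}} = \sqrt{\frac{\theta(1-\theta)}{n}}. \]
Thus the sampling distribution of δ̂ = 2θ̂ - 1 is also approximately normal with mean
\[ \mu_{\hat{\delta}} = 2\mu_{\hat{\theta}} - 1 = 2\theta - 1 = \delta \]
and standard deviation
\[ \sigma_{\hat{\delta}} = 2\sigma_{\hat{\theta}} = 2\sqrt{\frac{\theta(1-\theta)}{n}} = \sqrt{\frac{(1+\delta)(1-\delta)}{n}}. \]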
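c) Let x1, x2, x3, … , xn denote a random sample of size n from the normal
distribution with mean μ and standard deviation μ.
Find the Maximum Likelihood Estimate of μ.
Solution
The joint density of x1, x2, x3, … , xn is
\[ f(x_1, \dots, x_n) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\mu}\, e^{-\frac{1}{2\mu^2}(x_i - \mu)^2} = \frac{1}{(2\pi)^{n/2}\mu^{n}}\, e^{-\frac{1}{2\mu^2}\sum_{i=1}^{n}(x_i - \mu)^2} = L(\mu) \]
\[ l(\mu) = \ln L(\mu) = -\frac{n}{2}\ln(2\pi) - n\ln\mu - \frac{1}{2\mu^2}\sum_{i=1}^{n}(x_i - \mu)^2 \]
Thus
\[ l'(\mu) = -\frac{n}{\mu} + \frac{1}{\mu^3}\sum_{i=1}^{n}(x_i - \mu)^2 + \frac{1}{\mu^2}\sum_{i=1}^{n}(x_i - \mu) = \frac{-n\mu^2 - \mu\sum_{i=1}^{n} x_i + \sum_{i=1}^{n} x_i^2}{\mu^3} = 0 \]
or
\[ n\mu^2 + \mu\sum_{i=1}^{n} x_i - \sum_{i=1}^{n} x_i^2 = 0 \quad\text{and}\quad \mu^2 + \bar{x}\mu - \left[\left(1 - \tfrac{1}{n}\right)s^2 + \bar{x}^2\right] = 0 \]
This has two solutions using the quadratic formula:
\[ \hat{\mu} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = \frac{-\bar{x} \pm \sqrt{\bar{x}^2 + 4\left[\left(1 - \tfrac{1}{n}\right)s^2 + \bar{x}^2\right]}}{2} = -\frac{\bar{x}}{2} \pm \sqrt{\frac{5}{4}\bar{x}^2 + \left(1 - \tfrac{1}{n}\right)s^2} \]
The second derivative is
\[ l''(\mu) = \frac{n\left[\mu^2 + 2\bar{x}\mu - 3\left(\left(1 - \tfrac{1}{n}\right)s^2 + \bar{x}^2\right)\right]}{\mu^4} \]
and, using \(\hat{\mu}^2 = -\bar{x}\hat{\mu} + \left(1 - \tfrac{1}{n}\right)s^2 + \bar{x}^2\),
\[ l''(\hat{\mu}) = \frac{n\left[\bar{x}\hat{\mu} - 2\left(\left(1 - \tfrac{1}{n}\right)s^2 + \bar{x}^2\right)\right]}{\hat{\mu}^4} = -\frac{n(2\hat{\mu} + \bar{x})}{\hat{\mu}^3} \quad\text{at } \mu = \hat{\mu}. \]
Since μ is a standard deviation it must be positive, so the Maximum Likelihood
Estimate is the root with the plus sign, for which l''(μ̂) < 0:
\[ \hat{\mu} = -\frac{\bar{x}}{2} + \sqrt{\frac{5}{4}\bar{x}^2 + \left(1 - \tfrac{1}{n}\right)s^2}. \]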
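The positive root can be checked numerically by substituting it back into the score function l'(μ). The sample below is hypothetical, chosen only to exercise the formula, and the function names are illustrative.

```python
import math
from statistics import mean, variance

def mle_mu(x):
    """Positive root of mu^2 + xbar*mu - [(1 - 1/n) s^2 + xbar^2] = 0."""
    n = len(x)
    xbar, s2 = mean(x), variance(x)   # variance() uses the n-1 divisor
    return -xbar / 2 + math.sqrt(1.25 * xbar**2 + (1 - 1 / n) * s2)

def score(mu, x):
    """First derivative of the log-likelihood for N(mu, mu^2) data."""
    n = len(x)
    return (-n / mu
            + sum((xi - mu) ** 2 for xi in x) / mu**3
            + sum(xi - mu for xi in x) / mu**2)

x = [4.1, 5.3, 2.2, 7.9, 6.4, 3.8]   # hypothetical sample
mu_hat = mle_mu(x)
print(mu_hat, score(mu_hat, x))      # the score should be essentially zero
```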
III.
In a certain type of test specimen, the normal stress on a specimen is known to be
functionally related to the shear resistance. The following table gives experimental
data on the two variables.
normal stress (X)    shear resistance (Y)
26.8                 26.5
25.4                 27.3
28.9                 24.2
32.6                 27.1
27.7                 23.6
23.9                 25.9

Σx = 165.3, Σx² = 4599.9, Σy = 154.6, Σy² = 3995.4, Σxy = 4259.2
a) Plot a scatter plot of this data and comment.
[Scatter plot of shear resistance (Y) against normal stress (X)]
Comment: There seems to be no relationship between X and Y.
b) Estimate the least squares line for predicting shear resistance (Y) from normal
stress (X). Plot its graph and comment on the values of its parameters.
S xx = 45.855
S yy = 11.8333
S xy = -0.0400
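\[ \hat{\beta} = \frac{S_{xy}}{S_{xx}} = \frac{-0.0400}{45.855} = -0.00087, \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = \frac{154.6}{6} - (-0.00087)\left(\frac{165.3}{6}\right) = 25.79 \]
[Plot of the fitted least squares line on the scatter plot of shear resistance (Y) against normal stress (X)]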
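The fit in part b) can be reproduced from the six data pairs. The pairing used here (values read alternately as X and Y) is inferred from the quoted sums Σx = 165.3 and Σy = 154.6, so treat it as an assumption. The sketch uses plain Python only.

```python
# Data pairs (X = normal stress, Y = shear resistance); pairing inferred from the sums
x = [26.8, 25.4, 28.9, 32.6, 27.7, 23.9]
y = [26.5, 27.3, 24.2, 27.1, 23.6, 25.9]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

beta_hat = sxy / sxx                  # slope
alpha_hat = ybar - beta_hat * xbar    # intercept
print(round(sxx, 3), round(syy, 4), round(sxy, 4))
print(round(beta_hat, 5), round(alpha_hat, 2))
```

The slope is essentially zero, so the fitted line is nearly horizontal at the mean of Y, which agrees with the comment on the scatter plot.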
c) Using a 5% significance level, test whether shear resistance (Y) is not
functionally related to normal stress (X).
The test statistic is
\[ t = \frac{\hat{\beta}}{s / \sqrt{S_{xx}}} \quad\text{where}\quad s = \sqrt{\frac{S_{yy} - \dfrac{S_{xy}^2}{S_{xx}}}{n-2}} = 1.71998 \]
Thus
\[ t = \frac{-0.000872}{1.71998 / \sqrt{45.855}} = -0.00343. \]
Comparing |t| with t0.025 = 2.776 for 4 d.f., we cannot reject H0: β = 0.
d) What would you expect for the shear resistance (Y) of a specimen that was known
to have a normal stress measurement (X) of 25 units? Repeat the calculation for
a normal stress measurement (X) of 30 units. Compute 95% prediction limits
for the shear resistance measurements (Y) in each of these cases.
The predicted value of Y when X = x0 is
\[ \hat{y} = \hat{\alpha} + \hat{\beta}x_0 = 25.7907 - 0.000872\,x_0 = \begin{cases} 25.769 & \text{when } x_0 = 25 \\ 25.765 & \text{when } x_0 = 30 \end{cases} \]
Prediction limits for Y when X = x0 are
\[ \hat{\alpha} + \hat{\beta}x_0 \pm t_{\alpha/2}\, s\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} = \begin{cases} 20.306 \text{ to } 31.231 & \text{when } x_0 = 25 \\ 20.325 \text{ to } 31.204 & \text{when } x_0 = 30 \end{cases} \]
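The t statistic from part c) and the prediction limits from part d) follow from the summary quantities alone. The sketch below takes t0.025 = 2.776 (4 d.f.) as quoted in the solution rather than computing it; the function name is illustrative.

```python
import math

n = 6
xbar, ybar = 165.3 / n, 154.6 / n
sxx, syy, sxy = 45.855, 11.8333, -0.0400
t_crit = 2.776                        # t_{0.025} with n - 2 = 4 d.f. (from the solution)

beta_hat = sxy / sxx
alpha_hat = ybar - beta_hat * xbar
s = math.sqrt((syy - sxy**2 / sxx) / (n - 2))   # residual standard deviation
t_stat = beta_hat / (s / math.sqrt(sxx))

def prediction_limits(x0):
    """95% prediction limits for a new Y observed at X = x0."""
    y_hat = alpha_hat + beta_hat * x0
    half = t_crit * s * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)
    return y_hat - half, y_hat + half

print(round(t_stat, 5))
for x0 in (25, 30):
    lo, hi = prediction_limits(x0)
    print(x0, round(lo, 3), round(hi, 3))
```

Small differences in the third decimal place relative to the quoted limits come from rounding in intermediate steps.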
IV.
An urn contains 10 balls, of which θ balls are blue (the rest being red and white).
We are interested in testing the null hypothesis H0: θ = 3 versus HA: θ = 4. Suppose that
we take a sample of size 3 balls, and reject H0 if all 3 draws yield blue balls. Compute
α, the probability of a type I error. Compute β, the probability of a type II error.
a) Assuming sampling without replacement.
\[ \alpha = P(\text{type I error}) = P(\text{Rejecting } H_0 \text{ when true}) = \frac{\binom{3}{3}}{\binom{10}{3}} = \frac{3 \cdot 2 \cdot 1}{10 \cdot 9 \cdot 8} = \frac{1}{120} \]
\[ \beta = P(\text{type II error}) = P(\text{Accepting } H_0 \text{ when false}) = 1 - \frac{\binom{4}{3}}{\binom{10}{3}} = 1 - \frac{4 \cdot 3 \cdot 2}{10 \cdot 9 \cdot 8} = \frac{29}{30} \]
b) Assuming sampling with replacement.
\[ \alpha = P(\text{type I error}) = P(\text{Rejecting } H_0 \text{ when true}) = \left(\frac{3}{10}\right)^3 = \frac{27}{1000} \]
\[ \beta = P(\text{type II error}) = P(\text{Accepting } H_0 \text{ when false}) = 1 - \left(\frac{4}{10}\right)^3 = \frac{936}{1000} \]
V.
The owner of a sporting goods store was interested in determining if there was any
difference in the tension strength of a newly strung tennis racket due to the technician
who performed the task of stringing the racket. The store had five technicians who
strung tennis rackets. Each was asked to string n = 10 tennis rackets. The data are
summarized in the table below:
Technician               A      B      C      D      E
Mean tension strength    45.3   50.7   40.2   61.8   49.2
Standard deviation       8.6    12.1   10.8   8.6    20.2
Analyze this data.
Solution: Using the Anova F-test
Technician    A     B     C     D     E     Grand Total (G)
Total Ti      451   507   402   618   492   2470
\[ SS_{Between} = \sum_{i=1}^{k} \frac{T_i^2}{n_i} - \frac{G^2}{N} = 124604.2 - \frac{2470^2}{50} = 2586.2 \]
\[ MS_{Within} = \frac{\sum_{i=1}^{k} (n_i - 1)s_i^2}{N - k} = 163.802, \qquad SS_{Within} = (N - k)\,MS_{Within} = 45(163.802) = 7371.09 \]
Thus the Anova Table is:
Source     SS        df   MS        F         p-value
Between    2586.2    4    646.55    3.94714   0.00788
Within     7371.09   45   163.802
Total      9957.29   49
There is a significant difference amongst the strengths.
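The sums of squares above can be rebuilt from the summary statistics alone. This sketch uses the technician totals exactly as listed in the solution and stops at the F ratio, since the standard library has no F distribution for the p-value.

```python
n_per = 10                                   # rackets per technician
totals = [451, 507, 402, 618, 492]           # T_i as listed in the solution
sds = [8.6, 12.1, 10.8, 8.6, 20.2]           # sample standard deviations

k = len(totals)
N = k * n_per
G = sum(totals)

ss_between = sum(t**2 / n_per for t in totals) - G**2 / N
ms_within = sum((n_per - 1) * s**2 for s in sds) / (N - k)   # pooled variance
ss_within = (N - k) * ms_within

ms_between = ss_between / (k - 1)
f_stat = ms_between / ms_within
print(round(ss_between, 1), round(ss_within, 2), round(f_stat, 5))
```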
VI.
Notice that from the table we see that in the sample of n1 = 2108 Catholics,
571 of their fathers reached the High school graduate level of education, while for
the sample of n2 = 1558 Protestants, 446 of their fathers reached the High school
graduate level of education.
a) Determine 95% confidence limits for the proportion of Catholics whose
fathers reach the High School graduate level of education.
\[ \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \quad\text{or 0.252 to 0.290} \]
b) Determine 99% confidence limits for the proportion of Protestants whose
fathers reach the High School graduate level of education.
\[ \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \quad\text{or 0.257 to 0.316} \]
c) Determine 95% confidence limits for the difference in the proportion of
Protestants and Catholics whose fathers reach the High School graduate level of
education.
With p̂1 = 571/2108 for Catholics and p̂2 = 446/1558 for Protestants:
\[ (\hat{p}_2 - \hat{p}_1) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \quad\text{or -0.014 to 0.045} \]
d) Determine 99% confidence limits for the difference in the proportion of
Protestants and Catholics whose fathers reach the High School graduate level
of education.
\[ (\hat{p}_2 - \hat{p}_1) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \quad\text{or -0.023 to 0.054} \]
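e) Is there a significant difference (α = 0.01) between the proportion of
Protestants and Catholics whose fathers reach the High School graduate level
of education?
The test statistic is
\[ z = \frac{\hat{p}_2 - \hat{p}_1}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = 1.029 \quad\text{where}\quad \hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = 0.2774 \]
Here p̂1 = 571/2108 is the Catholic proportion and p̂2 = 446/1558 the Protestant proportion.
Comparing |z| with z0.005 = 2.576 we accept H0: no difference.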
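All five parts of this question use the same two sample proportions, so they can be checked together. This sketch uses the large-sample normal approximation from the solution; the helper names are illustrative, and the difference is taken as Protestant minus Catholic to match the signs of the quoted limits.

```python
import math

n1, x1 = 2108, 571    # Catholics
n2, x2 = 1558, 446    # Protestants
p1, p2 = x1 / n1, x2 / n2

def prop_ci(p, n, z):
    """Large-sample confidence limits for a single proportion."""
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def diff_ci(z):
    """Large-sample confidence limits for p2 - p1 (Protestant minus Catholic)."""
    half = z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p2 - p1) - half, (p2 - p1) + half

pooled = (x1 + x2) / (n1 + n2)
z_stat = (p2 - p1) / math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

print(prop_ci(p1, n1, 1.960))   # part a
print(prop_ci(p2, n2, 2.576))   # part b
print(diff_ci(1.960))           # part c
print(diff_ci(2.576))           # part d
print(pooled, z_stat)           # part e
```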
VII. Suppose that X and Y are independent unbiased measurements of the angles θ
and 3θ respectively. Namely E(X) = θ and E(Y) = 3θ. In addition assume that X
and Y have the same variance σ².
a) Determine the conditions on a and b that would result in T = aX + bY being an
unbiased estimator of θ.
\[ E(T) = E(aX + bY) = aE(X) + bE(Y) = a\theta + b(3\theta) = (a + 3b)\theta = \theta \quad\text{if } a + 3b = 1 \]
b) Determine the values of a and b that would make T = aX + bY the unbiased
estimator of θ with the smallest variance.
\[ V = \operatorname{Var}(T) = \operatorname{Var}(aX + bY) = a^2\operatorname{Var}(X) + b^2\operatorname{Var}(Y) = \sigma^2(a^2 + b^2) = \sigma^2\left[a^2 + \left(\frac{1-a}{3}\right)^2\right] \]
V is minimized when
\[ \frac{dV}{da} = \sigma^2\left[2a + 2\left(\frac{1-a}{3}\right)\left(-\frac{1}{3}\right)\right] = 0. \]
That is a = (1 - a)/9, or 9a = 1 – a, so 10a = 1 and a = 1/10.
Also
\[ b = \frac{1-a}{3} = \frac{1 - \frac{1}{10}}{3} = \frac{3}{10}. \]
Hence T = (1/10)X + (3/10)Y.
c) If X and Y can be assumed to be normally distributed with standard deviation σ =
3.0 degrees, determine the formula for a 95% confidence interval for θ based on
the statistic T. Use this to compute an estimate of θ with the observations X = 14
and Y = 46.
T = (1/10)X + (3/10)Y has a Normal distribution with mean θ and variance
\[ V = \operatorname{Var}(T) = (a^2 + b^2)\sigma^2 = \left[\left(\tfrac{1}{10}\right)^2 + \left(\tfrac{3}{10}\right)^2\right]3^2 = 0.9 \]
Thus
\[ z = \frac{T - \theta}{\sqrt{0.9}} = \frac{T - \theta}{0.948683} \]
has a standard Normal distribution.
Thus
\[ 0.95 = P(-z_{0.025} \le z \le z_{0.025}) = P\left(-1.96 \le \frac{T - \theta}{0.948683} \le 1.96\right) \]
\[ = P(-1.859 \le T - \theta \le 1.859) = P(T - 1.859 \le \theta \le T + 1.859) \]
Hence T ± 1.859 is a 95% confidence interval for θ.
Since X = 14 and Y = 46, T = (1/10)(14) + (3/10)(46) = 15.2.
Then the 95% confidence limits for θ are 15.2 ± 1.859 or 13.341 to 17.059.
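A direct search over a confirms which weights minimize the variance under the unbiasedness constraint a + 3b = 1. This is a sketch: the grid range and step are arbitrary choices, and exact fractions are used so the minimizer is not blurred by rounding.

```python
import math
from fractions import Fraction

def var_factor(a):
    """Var(T) / sigma^2 for T = aX + bY with b fixed by a + 3b = 1."""
    b = (1 - a) / 3
    return a**2 + b**2

# Search a on a grid of exact fractions between -1 and 2 (arbitrary range)
grid = [Fraction(i, 1000) for i in range(-1000, 2001)]
a = min(grid, key=var_factor)
b = (1 - a) / 3

sigma, x_obs, y_obs = 3.0, 14, 46
sd_t = sigma * math.sqrt(float(var_factor(a)))
t_value = float(a) * x_obs + float(b) * y_obs
half_width = 1.960 * sd_t
print(a, b, round(t_value, 3), round(half_width, 3))
```

The search returns a = 1/10 and b = 3/10, that is, weights proportional to (1, 3). The more informative measurement Y, whose mean is 3θ, gets the larger weight.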
VIII. In a study to investigate the effect of regular physical exercise on the
reduction of high blood pressure in males aged 60-65, the researchers selected at
random a sample of n1 =12 males suffering from high blood pressure in the
given age group and for a two-year period placed the subjects on a daily physical
exercise program. A second sample of n2 =12 males suffering from high blood
pressure in the given age group was selected and again for the two-year period
this group of subjects were placed on a physical exercise program for which
they were required to perform only once a week. Finally a third sample of n3
=12 males suffering from high blood pressure in the given age group was
selected. No exercise program was required for individuals in sample three. At
the end of the two-year period the reduction in blood pressure was measured for
each of the subjects in the study and is presented in the table below.
Table V.1: Reduction in Blood pressure for three groups of
males aged 60-65 initially suffering from high blood pressure.
             Daily Exercise   Weekly exercise   No Exercise
             21.2             15.7                5.6
             15.6             18.5              - 5.7
              4.5             29.0              - 14.8
             32.0             17.4                4.9
             20.4              6.7                8.6
             12.2             11.4               18.6
             11.2              4.1              - 7.9
             30.9             23.5               15.2
             10.4             15.1              - 2.3
             23.7              5.8              - 8.9
             22.8             25.7              - 1.2
             11.2             19.7                8.6
Mean         18.00833         16.05000            1.72500
Std. Dev.     8.55904          7.95104           10.20010
Σx          216.1            192.6               20.7
Σx²        4697.43          3786.64            1180.17
Carry out the Analy
Solution
The ANOVA Table (k = 3 groups, N = 36 observations, so the degrees of freedom are k - 1 = 2 and N - k = 33)
Source     SS        df   MS        F         p-value
Between    1896.75   2    948.375   11.8292   0.00013
Within     2645.70   33   80.1727
Total      4542.45   35
Thus we reject the hypothesis of equality between the group means.
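The one-way ANOVA can be recomputed from the raw data in Table V.1 as a check. This is a plain-Python sketch. The p-value line relies on the closed form P(F > f) = (1 + 2f/ν)^(-ν/2), which holds only when the numerator has 2 degrees of freedom, as is the case with three groups.

```python
daily = [21.2, 15.6, 4.5, 32.0, 20.4, 12.2, 11.2, 30.9, 10.4, 23.7, 22.8, 11.2]
weekly = [15.7, 18.5, 29.0, 17.4, 6.7, 11.4, 4.1, 23.5, 15.1, 5.8, 25.7, 19.7]
none = [5.6, -5.7, -14.8, 4.9, 8.6, 18.6, -7.9, 15.2, -2.3, -8.9, -1.2, 8.6]
groups = [daily, weekly, none]

k = len(groups)
N = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / N

ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_within = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups)

df_between, df_within = k - 1, N - k
f_stat = (ss_between / df_between) / (ss_within / df_within)

# Exact upper-tail probability of F(2, df_within); valid only for df_between == 2
assert df_between == 2
p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)
print(round(ss_between, 2), round(ss_within, 2), round(f_stat, 3), round(p_value, 5))
```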
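IX. Suppose that x1, x2, x3, …, xn is a sample from the following distribution:
\[ f(x \mid \theta, \lambda) = \begin{cases} \lambda e^{-\lambda(x - \theta)} & \text{for } \theta \le x < \infty \\ 0 & \text{otherwise} \end{cases} \]
a) Determine method of moments estimators of θ and λ.
\[ \mu_1 = \int_{-\infty}^{\infty} x f(x \mid \theta, \lambda)\,dx = \int_{\theta}^{\infty} x \lambda e^{-\lambda(x - \theta)}\,dx \quad\text{and}\quad \mu_2 = \int_{-\infty}^{\infty} x^2 f(x \mid \theta, \lambda)\,dx = \int_{\theta}^{\infty} x^2 \lambda e^{-\lambda(x - \theta)}\,dx \]
Putting u = x – θ. Hence du = dx and when x = θ, ∞ then u = 0, ∞.
Hence
\[ \mu_1 = \int_{\theta}^{\infty} x \lambda e^{-\lambda(x-\theta)}\,dx = \int_0^{\infty} (u + \theta)\lambda e^{-\lambda u}\,du = \int_0^{\infty} u \lambda e^{-\lambda u}\,du + \theta\int_0^{\infty} \lambda e^{-\lambda u}\,du = \frac{1}{\lambda} + \theta \]
\[ \mu_2 = \int_{\theta}^{\infty} x^2 \lambda e^{-\lambda(x-\theta)}\,dx = \int_0^{\infty} (u + \theta)^2 \lambda e^{-\lambda u}\,du = \int_0^{\infty} u^2 \lambda e^{-\lambda u}\,du + 2\theta\int_0^{\infty} u \lambda e^{-\lambda u}\,du + \theta^2\int_0^{\infty} \lambda e^{-\lambda u}\,du = \frac{2}{\lambda^2} + \frac{2\theta}{\lambda} + \theta^2 \]
The method of moments estimators \(\tilde{\theta}, \tilde{\lambda}\) satisfy
\[ m_1 = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x} = \frac{1}{\tilde{\lambda}} + \tilde{\theta} \quad\text{and}\quad m_2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 = \frac{2}{\tilde{\lambda}^2} + \frac{2\tilde{\theta}}{\tilde{\lambda}} + \tilde{\theta}^2 \]
Thus \(m_1\tilde{\lambda} = 1 + \tilde{\theta}\tilde{\lambda}\) or \(\tilde{\theta}\tilde{\lambda} = m_1\tilde{\lambda} - 1\), and
\[ m_2\tilde{\lambda}^2 = (m_1\tilde{\lambda} - 1)^2 + 2(m_1\tilde{\lambda} - 1) + 2 = m_1^2\tilde{\lambda}^2 + 1 \]
Hence
\[ (m_2 - m_1^2)\tilde{\lambda}^2 = 1 \quad\text{or}\quad \tilde{\lambda} = \frac{1}{\sqrt{m_2 - m_1^2}} \quad\text{since } \lambda \text{ is positive,} \]
and
\[ \tilde{\theta} = \frac{m_1\tilde{\lambda} - 1}{\tilde{\lambda}} = m_1 - \frac{1}{\tilde{\lambda}} = m_1 - \sqrt{m_2 - m_1^2}. \]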
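b) Determine maximum likelihood estimators of θ and λ.
Solution:
The joint density of x1, x2, x3, …, xn is:
\[ L(\theta, \lambda) = \prod_{i=1}^{n} f(x_i \mid \theta, \lambda) = \begin{cases} \lambda^n e^{-\lambda\sum_{i=1}^{n}(x_i - \theta)} & \text{for } \theta \le x_1, \dots, \theta \le x_n \\ 0 & \text{otherwise} \end{cases} = \begin{cases} \lambda^n e^{-\lambda\sum_{i=1}^{n}(x_i - \theta)} & \text{for } \theta \le \min x_i \\ 0 & \text{otherwise} \end{cases} \]
\[ l(\theta, \lambda) = \ln L(\theta, \lambda) = n\ln\lambda - \lambda\sum_{i=1}^{n}(x_i - \theta) \quad\text{for } \theta \le \min x_i \]
\[ \frac{\partial l}{\partial \lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n}(x_i - \theta) = n\left(\frac{1}{\lambda} - (\bar{x} - \theta)\right) \quad\text{and}\quad \frac{\partial l}{\partial \theta} = n\lambda \quad\text{if } \theta \le \min x_i \]
Since ∂l/∂θ is positive, l increases in θ up to the bound θ ≤ min xi. The implication is that
\[ \hat{\theta} = \min x_i \quad\text{and}\quad \frac{1}{\hat{\lambda}} = \bar{x} - \hat{\theta} \quad\text{or}\quad \hat{\lambda} = \frac{1}{\bar{x} - \hat{\theta}} = \frac{1}{\bar{x} - \min x_i}. \]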
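The two sets of estimators from parts a) and b) can be compared side by side. In this sketch θ denotes the location (lower bound) and λ the rate of the shifted exponential; the function names and the sample are hypothetical, for illustration only.

```python
import math

def method_of_moments(x):
    """Return (theta, lam) from the first two sample moments."""
    n = len(x)
    m1 = sum(x) / n
    m2 = sum(v * v for v in x) / n
    spread = math.sqrt(m2 - m1**2)
    return m1 - spread, 1 / spread

def max_likelihood(x):
    """Return (theta, lam): theta is the sample minimum, 1/lam = xbar - theta."""
    theta = min(x)
    return theta, 1 / (sum(x) / len(x) - theta)

x = [2.3, 3.1, 2.05, 4.7, 2.6, 3.9, 2.2, 5.4]   # hypothetical sample
theta_mm, lam_mm = method_of_moments(x)
theta_ml, lam_ml = max_likelihood(x)
print(theta_mm, lam_mm)
print(theta_ml, lam_ml)
```

The method of moments estimate of θ is not guaranteed to lie below the smallest observation, whereas the maximum likelihood estimate equals it by construction.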
Note
X. Let x1, x2, x3, …, xn be a sample of size n from the density function given by:
\[ f(x \mid \theta) = \begin{cases} \theta k x^{k-1} e^{-\theta x^k} & x \ge 0 \\ 0 & \text{elsewhere} \end{cases} \]
where k is a known positive constant and θ is an unknown parameter.
a) Find the Maximum Likelihood estimator of θ.
b) Find the Method of Moments estimator of θ.
XI.
In the following study the investigator was interested in determining if the
Presence of Heart Disease was related to Systolic Blood pressure. The study
consisted of four groups of subjects with differing levels of Systolic Blood
pressure (<127, 127-146, 147-166, 167+). The data is tabulated below:
Coronary Heart    Systolic Blood pressure (mm Hg)
Disease           <127    127-146    147-166    167+
Present             20         28         20      24
Absent             388        527        204     118
Total              408        555        224     142
Carry out the Chi-square test to determine if there are any significant
(α = 0.05 and α = 0.01) differences in the Presence of heart disease
between the four Blood pressure groups.
Table: frequencies, (expected frequencies), standardized residuals
Coronary Heart    Systolic Blood pressure (mm Hg)
Disease           <127        127-146     147-166     167+        Total
Present           20          28          20          24          92
                  (28.24)     (38.42)     (15.51)     (9.83)
                  -1.551      -1.681      1.141       4.52
Absent            388         527         204         118         1237
                  (379.76)    (516.58)    (208.49)    (132.17)
                  0.423       0.458       -0.311      -1.233
Total             408         555         224         142         1329
\[ \chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(x_{ij} - E_{ij})^2}{E_{ij}} = 28.966 \]
Compare with \(\chi^2_{0.05} = 7.815\) for (r - 1)(c - 1) = 3 d.f.: Reject independence.
The statistic also exceeds \(\chi^2_{0.01} = 11.345\), so independence is rejected at the 1% level as well.
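The expected frequencies, standardized residuals and chi-square statistic can be reproduced directly from the observed counts. The sketch below uses plain Python and takes the critical values 7.815 and 11.345 (3 d.f.) as given constants rather than computing them.

```python
import math

# Rows: Present, Absent; columns: <127, 127-146, 147-166, 167+
observed = [[20, 28, 20, 24],
            [388, 527, 204, 118]]

row_tot = [sum(row) for row in observed]
col_tot = [sum(col) for col in zip(*observed)]
total = sum(row_tot)

# Expected frequency under independence: (row total)(column total)/(grand total)
expected = [[r * c / total for c in col_tot] for r in row_tot]
residuals = [[(o - e) / math.sqrt(e) for o, e in zip(o_row, e_row)]
             for o_row, e_row in zip(observed, expected)]

chi2 = sum(res**2 for row in residuals for res in row)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi2, 3), df)
print(chi2 > 7.815, chi2 > 11.345)   # 5% and 1% critical values for 3 d.f.
```

The residual of 4.52 in the 167+ column of the Present row contributes most of the statistic, so the departure from independence is concentrated in the highest blood pressure group.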