252y0323

10/28/03 252y0323
(Page layout view!)
ECO252 QBA2
Name KEY
SECOND HOUR EXAM Hour of Class Registered
October 30, 2003
Circle 11am 12:30pm
I. (53 points) Do all the following (2 points each unless noted otherwise).
Note the following:
1. You will be penalized if you do not compute the sample variance of the d column in
question 20, so you might want to do it now.
2. This test is normed on 50 points, but there are 74 points possible including the take-home.
You may not finish the exam and might want to skip some questions.
3. A table identifying methods for comparing 2 samples is at the end of the exam.
(Note that some equations have been ‘squashed’ by a bug in Minitab. They print out just fine, but you may
have to click on them to see them on your screen.)
1. A manufacturer revises a manufacturing process and finds a fall in the defect rate of 5% ± 4%.
a) *The fall in defects is statistically significant because 5% is larger than 4%.
b) The fall in defects is statistically significant because the confidence interval supports H0.
c) The fall in defects is not statistically significant because 4% is smaller than 5%.
d) The fall in defects is not statistically significant because the confidence interval would
lead us to reject H0.
Explanation: .05 ± .04 as a confidence interval means the interval .01 to .09. Since this does
not include zero, the difference is significant. Formally, we are testing $H_0: D = 0$ with a
confidence interval. If we reject the null hypothesis, the difference is significant.
2.
If we wish to determine whether there is evidence that the proportion of successes is higher in
group 1 than in group 2, the appropriate test to use is
a) *the z test. ($H_1: p_1 > p_2$ is always a test using $z$.)
b) the $\chi^2$ test.
c) both of the above
d) none of the above
TABLE 12-14
Recent studies have found that American children are more obese than in the past. The amount of time children spend watching
television has received much of the blame. A survey of 100 ten-year-olds revealed the following with regards to weights and
average number of hours a day spent watching television. We are interested in testing whether the average number of hours
spent watching TV and weights are independent at 1% level of significance.
                                         TV Hours
Weights                              0-3    3-6    6+    Total
More than 10 lbs. overweight           1      9    20       30
Within 10 lbs. of normal weight       20     15    15       50
More than 10 lbs. underweight         10      5     5       20
Total                                 31     29    40      100

3. Referring to Table 12-14, if there is no connection between weights and average number of hours
spent watching TV, how many children should we expect to be spending 3-6 hours, on average,
watching TV and to be more than 10 lbs. underweight?
a) 5
b) *5.8
c) 6.2
d) 8
Explanation: In the total column 20 out of 100 are more than 10 lbs. underweight. 20% of 29 is
5.8.
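The expected count in question 3 is one cell of the full table of expected frequencies, each computed as (row total)(column total)/n. A minimal Python sketch follows; the variable and function names are illustrative and not part of the exam.

```python
# Observed counts from Table 12-14: rows are weight classes, columns are TV hours.
observed = [
    [1, 9, 20],    # more than 10 lbs. overweight
    [20, 15, 15],  # within 10 lbs. of normal weight
    [10, 5, 5],    # more than 10 lbs. underweight
]

def expected_counts(table):
    """Return E[i][j] = (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
underweight_3_to_6 = expected[2][1]  # 20 * 29 / 100
```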
4.
Turn in your computer output from computer problem 1 only tucked inside this exam paper. (3
points - 2 point penalty for not handing this in.)
MTB > TwoT 90.0 'educ' 'sex';
SUBC> Alternative -1.

Two-Sample T-Test and CI: EDUC, SEX
Two-sample T for EDUC
SEX        N    Mean   StDev   SE Mean
Female   788   13.19    3.03      0.11
Male     651   13.28    2.85      0.11

Difference = mu (Female) - mu (Male )
Estimate for difference: -0.091
90% upper bound for difference: 0.108
T-Test of difference = 0 (vs <): T-Value = -0.58   P-Value = 0.280   DF = 1412
The computer output above refers to a test very much like the Minitab test you ran of two independent
samples. The major difference is that 1439 numbers appear in column 1 (labeled EDUC), which give the
number of years of education completed, and the computer sorted them by gender using the words ‘female’
and ‘male’ in column 5 (labeled SEX). The variable $x_F$ can thus refer to an imaginary column of female
education figures and $x_M$ to an imaginary column of male education figures. Call this the GSSEduc output.
5.
Referring to the GSSEduc output, and using the rules taught in class, the null hypothesis that was
tested is .
a) H0:  F –  M  0
b) *$H_0: \mu_F - \mu_M \ge 0$
c) H0:  F –  M  0
d) H0:  F –  M  0
Explanation: On the last line of the output it says “T-Test of difference = 0 (vs <).” So
we have $H_1: D < 0$ (since $H_0$ can’t be a strict inequality). On the 4th line from the bottom, it
says “Difference = mu (Female) - mu (Male )”. This says $D = \mu_F - \mu_M$. If we put
these together we get $H_1: \mu_F - \mu_M < 0$. The opposite is $H_0: \mu_F - \mu_M \ge 0$.
6.
Referring to the GSSEduc output, we can conclude, (doing no more calculations) that, for the
particular population that was sampled
a) At the $\alpha = .10$ level, there is sufficient evidence that women had fewer years of education
than men.
b) At the $\alpha = .10$ level, there is a difference between the years of education gotten by men
and women.
c) *At the $\alpha = .10$ level, there is insufficient evidence that the average men’s education level is
higher than the women’s.
d) At the $\alpha = .10$ level, there is sufficient evidence to conclude that there is no difference
between men’s and women’s education level.
Explanation: We cannot reject the null hypothesis because the p-value is above the
significance level. Saying ‘we cannot reject’ is equivalent to saying ‘insufficient evidence to
reject.’
7.
Referring to the GSSEduc output, the most commonly used methods to find degrees of freedom
are (i) to calculate $df = n_1 + n_2 - 2 = 788 + 651 - 2 = 1437$, or (ii) to say that since we have large
samples we can use $z$, which is equivalent to saying that the degrees of freedom are infinite. Yet the
computer claims $df = 1412$. Explain, briefly, what the computer probably did (and assumed) to
get that number.
Solution: From the syllabus supplement.
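Interval for: Difference Between Two Means ($\sigma$ unknown, variances assumed unequal)
Confidence Interval: $D = \bar d \pm t_{\alpha/2}\, s_{\bar d}$
Hypotheses: $H_0: D = D_0$, $H_1: D \ne D_0$
Test Ratio: $t = \dfrac{\bar d - D_0}{s_{\bar d}}$
Critical Value: $\bar d_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar d}$
Standard error: $s_{\bar d} = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
Degrees of freedom: $DF = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$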
The computer assumed that it was comparing two independent samples with (possibly) unequal
variances (Method D3). It used the formula shown as DF above.
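As a rough check on the Method D3 degrees of freedom, the sketch below applies the DF formula to the rounded standard deviations printed in the GSSEduc output. The function name is illustrative. Because the printed standard deviations are rounded, the result should land close to, but not necessarily exactly on, the 1412 that Minitab reports.

```python
def welch_df(s1, n1, s2, n2):
    """Degrees of freedom for two independent samples with unequal variances."""
    v1 = s1 ** 2 / n1  # s1^2 / n1
    v2 = s2 ** 2 / n2  # s2^2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Rounded summary statistics from the GSSEduc printout (Female, Male).
df = welch_df(3.03, 788, 2.85, 651)
```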
8.
(Wonnacott and Wonnacott) A small piece of hose in the cooling system of a new engine has a
lifetime that varies normally (following the Normal distribution) around a mean of 18 months with
a standard deviation of 4 months. The first maintenance check occurs at 12 months. What is the
probability that the hose will wear out before the maintenance check? (This is the same as the per
cent of hoses that will wear out before the first maintenance check!) Make a diagram!
Solution: $x \sim N(18, 4)$. $P(x < 12) = P\left(z < \dfrac{12 - 18}{4}\right) = P(z < -1.5) = .5 - .4332 = .0668$.
Make a diagram! Either a diagram for z with zero in the middle and the area below -1.5 shaded
or a diagram for x with 18 in the middle and the area below 12 shaded.
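The table lookup can be cross-checked numerically. This sketch assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Hose lifetime: Normal with mean 18 months and standard deviation 4 months.
hose_life = NormalDist(mu=18, sigma=4)

# Probability that a hose wears out before the 12-month maintenance check.
p_fail_before_check = hose_life.cdf(12)
```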
9.
In problem 8 above, the manufacturer decides that too much money is being spent on maintenance
checks. If the manufacturer is willing to accept having 20% of hoses wear out before the first
maintenance check, how many months (to the nearest 100th of a month) can the manufacturer wait
until the check? (This is the same as finding the 20th percentile of the distribution.)
Solution: $x \sim N(18, 4)$. We want $x_{.80}$, the 20th percentile of this distribution. The easy way is to
note that according to the t table $z_{.20} = 0.842$. This means that $P(z > 0.842) = .20$. So it must be
true that $P(z < -0.842) = .20$, or $z_{.80} = -0.842$. Using the formula $x = \mu + z\sigma$, we get
$x_{.80} = \mu + z_{.80}\sigma = 18 - 0.842(4) = 14.63$. The hard way to do this, which we can only avoid
because .20 is an easy number to find on the t table, is to make a diagram for $z$ with zero in the
middle and an area marked 50% above zero. We know $z_{.80}$ is below zero since 20% is below
$z_{.80}$, and 50% is below zero. There must be 30% between $z_{.80}$ and zero. But we know
$z_{.80} = -z_{.20}$, so $P(0 < z < z_{.20}) = .3000$. If we look at the Normal table, the closest we can come is
$P(0 < z < 0.84) = .2995$. This implies that $z_{.80} = -0.84$ and that
$x_{.80} = \mu + z_{.80}\sigma = 18 - 0.84(4) = 14.64$.
Check: $P(x < 14.63) = P\left(z < \dfrac{14.63 - 18}{4}\right) = P(z < -0.84) = .5 - .2995 = .2005$. Make a diagram!
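The percentile can also be found by inverting the Normal distribution directly. This is a sketch using `statistics.NormalDist` (Python 3.8+), with illustrative variable names; it should agree with the table-based answer to about two decimal places.

```python
from statistics import NormalDist

hose_life = NormalDist(mu=18, sigma=4)

# 20th percentile: the time by which 20% of hoses have worn out.
x_20th = hose_life.inv_cdf(0.20)

# Check: probability of wearing out before the rounded answer.
check = hose_life.cdf(round(x_20th, 2))
```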


10. The t test for the difference between the means of 2 independent populations assumes that the
respective
a) sample sizes are equal.
b) sample variances are equal.
c) *populations are approximately normal.
d) all of the above
TABLE 10-3
The use of preservatives by food processors has become a controversial issue. Suppose 2 preservatives are extensively tested and
determined safe for use in meats. A processor wants to compare the preservatives for their effects on retarding spoilage. Suppose
15 cuts of fresh meat are treated with preservative A and 15 are treated with preservative B, and the number of hours until
spoilage begins is recorded for each of the 30 cuts of meat. The results are summarized in the table below.
Preservative A: $\bar x_A$ = 106.4 hours, $s_A$ = 10.3 hours
Preservative B: $\bar x_B$ = 96.54 hours, $s_B$ = 13.4 hours
11. Referring to Table 10-3, state the test statistic for determining if the population variance for
preservative B is larger than the population variance for preservative A.
a) F = 3.100
b) F = 1.300
c) *F = 1.693
d) F = 0.591
Explanation: We have the alternative hypothesis $H_1: \sigma_B^2 > \sigma_A^2$. Since the rule says to put
the larger variance in the alternate hypothesis on top, we get
$F = \dfrac{s_B^2}{s_A^2} = \dfrac{13.4^2}{10.3^2} = 1.693$.
12. Referring to Table 10-3, what assumptions are necessary for a comparison of the population
variances to be valid?
a) Both sampled populations are normally distributed.
b) Both samples are random and independent.
c) Neither (a) nor (b) is necessary.
d) *Both (a) and (b) are necessary.
TABLE 10-4
A real estate company is interested in testing whether, on average, families in Gotham have been living in their current homes
for less time than the families in Metropolis have. A random sample of 100 families from Gotham and a random sample of 150
families in Metropolis yield the following data on length of residence in current homes:
Gotham: $\bar x_G$ = 35 months, $s_G^2$ = 900
Metropolis: $\bar x_M$ = 50 months, $s_M^2$ = 1050
13. Referring to Table 10-4, which of the following represents the relevant hypotheses tested by the
real estate company?
a) * $H_0: \mu_G - \mu_M \ge 0$ versus $H_1: \mu_G - \mu_M < 0$
b) H 0 :  G –  M  0 versus H 1 :  G –  M  0
c)
H 0 :  G –  M  0 versus H1 :  G –  M  0
d) H 0 : xG – x M  0 versus H 1 : xG – x M  0
Explanation: The problem statement starts out by saying we want to test $\mu_G < \mu_M$. This is
an alternative hypothesis because it is a strict inequality. If $\mu_G$ is below $\mu_M$, $\mu_G - \mu_M$ will
be negative.
14. Referring to Table 10-4, what is the estimated standard error of the difference between the two
sample means?
a) *4.00
b) 4.06
c) 5.61
d) 8.01
e) 16.00
Explanation: These are humongous samples so we use method D1. For this method we have
$s_{\bar d} = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} = \sqrt{\dfrac{900}{100} + \dfrac{1050}{150}} = \sqrt{9 + 7} = \sqrt{16} = 4$.
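The standard error above, and the test ratio it implies for Table 10-4 with $D_0 = 0$, can be sketched as follows. The function name is illustrative, and the test ratio is an extra step not worked in the key.

```python
import math

def se_diff(var1, n1, var2, n2):
    """Estimated standard error of the difference between two sample means."""
    return math.sqrt(var1 / n1 + var2 / n2)

se = se_diff(900, 100, 1050, 150)  # Gotham, Metropolis

# Test ratio for H0: mu_G - mu_M >= 0, using the sample means 35 and 50.
z = (35 - 50 - 0) / se
```

A test ratio this far below the left-tail critical value of – 1.645 would lead to rejecting the null hypothesis.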
15. Referring to Table 10-4, what is (are) the critical value(s) for the test ratio for the relevant
hypothesis test if the level of significance is 0.05?
a) * z = – 1.645
b) z = ± 1.960
c) z = – 1.960
d) z = – 2.080
Explanation: Since the alternative hypothesis is $H_1: D = \mu_G - \mu_M < 0$, this is a left-tail test
and, if we use a test ratio, we will compare it with $-z_\alpha = -z_{.05} = -1.645$.
16. When testing $H_0: \mu_1 - \mu_2 \le 0$ versus $H_1: \mu_1 - \mu_2 > 0$, the observed value of the z-score (test
ratio) was found to be – 2.13. The p-value for this test would be
a) 0.0166.
b) 0.0332.
c) 0.9668.
d) *0.9834.
Explanation: Since the alternative hypothesis is $H_1: D = \mu_1 - \mu_2 > 0$, this is a right-tail
test and, if we use a test ratio, we will compute $z = \dfrac{\bar d - D_0}{s_{\bar d}} = -2.13$. Since this is a right-tail
test, we need $P(z \ge -2.13) = P(z \ge 0) + P(-2.13 \le z \le 0) = .5 + .4834 = .9834$.
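The right-tail p-value can be verified with the standard Normal distribution. This sketch uses `statistics.NormalDist` (Python 3.8+); the variable names are illustrative. The left-tail area is included only to show where the distractor 0.0166 comes from.

```python
from statistics import NormalDist

z = -2.13
standard_normal = NormalDist()  # mean 0, standard deviation 1

p_right_tail = 1 - standard_normal.cdf(z)  # P(z >= -2.13), the p-value here
p_left_tail = standard_normal.cdf(z)       # P(z <= -2.13), for a left-tail test
```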
TABLE 10-9
A buyer for a manufacturing plant suspects that his primary supplier of raw materials is overcharging. In order to determine if
his suspicion is correct, he contacts a second supplier and asks for prices on various materials. He wants to compare these prices
with those of his primary supplier. The data collected is presented in the table below, with some summary statistics presented
(all of these might not be necessary to answer the questions which follow). The buyer believes that the differences are normally
distributed and will use this sample to perform an appropriate test at a level of significance of 0.01.
                   Primary     Secondary
Material           Supplier    Supplier     Difference
1                  $55         $45          $10
2                  $48         $47          $1
3                  $31         $32          – $1
4                  $83         $77          $6
5                  $37         $37          $0
6                  $55         $54          $1
Sum:               $309        $292         $17
Sum of Squares:    $17,573     $15,472      $139
17. Referring to Table 10-9, the hypotheses that the buyer should test are a null hypothesis that (Fill in
blanks) $H_0: D \le 0$ or $H_0: \mu_1 - \mu_2 \le 0$ or $H_0: \mu_1 \le \mu_2$ versus an alternative hypothesis that
$H_1: D > 0$ or $H_1: \mu_1 - \mu_2 > 0$ or $H_1: \mu_1 > \mu_2$. The text says $H_0: \mu_D \le 0$ vs. $H_1: \mu_D > 0$.
18. Referring to Table 10-9, the test to perform is a
a) pooled-variance t test for differences in 2 means (D2).
b) separate-variance t test for differences in 2 means (D3).
c) Wilcoxon signed rank test for differences in 2 medians (D5b).
d) *t test for mean difference in paired data (D4).
e) Wilcoxon-Mann-Whitney test for differences in 2 medians (D5a).
19. Referring to Table 10-9, the number of degrees of freedom is
a) *5.
b) 10.
c) Irrelevant because you are using a rank test.
d) Found by a complicated formula
Explanation: There are 6 pairs of numbers. $df = n - 1 = 5$.
20. Two brands of gasket are being considered for use on a high pressure oil pump. The number of
hours that the gasket worked are as follows.
Pump    Brand 1 ($x_1$)    Brand 2 ($x_2$)    difference ($d = x_1 - x_2$)
1       2982.28            2863.39            118.89
2       3025.86            2906.97            118.89
3       2952.02            2873.52             78.50
4       2954.64            2959.06             -4.42
5       2981.01            2899.98             81.03
Because the data is paired, a test was run using Minitab (method D4) with the following results.
MTB > Paired c7 c8;
SUBC> Alternative 1.
Paired T-Test and CI: brand 1, brand 2
Paired T for brand 1 - brand 2
               N     Mean    StDev   SE Mean
brand 1        5   2979.2     29.7      13.3
brand 2        5   2900.6     37.3      16.7
Difference     5     78.6     ____       ___
95% lower bound for mean difference: 30.6
T-Test of mean difference = 0 (vs > 0): T-Value = 3.49 P-Value = ?
Compute the standard deviation of the d column, showing your work, and fill in the blanks in the
difference row. You should get a t-ratio approximately equal to the T-Value shown above. State the
hypotheses, find an approximate p-value and tell whether you reject the null hypothesis. (7)
Solution: Our results are as follows. $H_0: D \le 0$ or $H_0: \mu_1 - \mu_2 \le 0$ or $H_0: \mu_1 \le \mu_2$ versus an
alternative hypothesis that $H_1: D > 0$ or $H_1: \mu_1 - \mu_2 > 0$ or $H_1: \mu_1 > \mu_2$. The text says $H_0: \mu_D \le 0$ vs.
$H_1: \mu_D > 0$.
Row    brand 1 ($x_1$)    brand 2 ($x_2$)    difference ($d = x_1 - x_2$)    $d^2$
1      2982.28            2863.39            118.89                          14134.8
2      3025.86            2906.97            118.89                          14134.8
3      2952.02            2873.52             78.50                           6162.3
4      2954.64            2959.06             -4.42                             19.5
5      2981.01            2899.98             81.03                           6565.9
Sum                                          392.89                          41017.3
With $n = 5$, $\bar d = \dfrac{\sum d}{n} = \dfrac{392.89}{5} = 78.578$,
$s_d^2 = \dfrac{\sum d^2 - n\bar d^2}{n - 1} = \dfrac{41017.3 - 5(78.578)^2}{4} = 2536.1974$, $s_d = 50.3607$ and
$s_{\bar d} = \sqrt{\dfrac{s_d^2}{n}} = \sqrt{\dfrac{2536.1974}{5}} = \sqrt{507.2394} = 22.5220$. $t = \dfrac{\bar d - 0}{s_{\bar d}} = \dfrac{78.578}{22.5220} = 3.489$.
Since there are 4 degrees of freedom, we compare the t we have computed with values of t on the t-table.
Note that $t^{(4)}_{.025} = 2.776$ and $t^{(4)}_{.01} = 3.747$ are on either side of 3.489, so that $.01 < p\text{-value} < .025$.
If we assume that the significance level is 5%, since our p-value is below our significance level, we reject
the null hypothesis.
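The paired-data arithmetic can be reproduced from the raw gasket data. This is a minimal sketch using only the standard library; the variable names are illustrative. It computes the t-ratio only, so the p-value bracket still comes from the t table.

```python
import math
from statistics import mean, stdev

brand1 = [2982.28, 3025.86, 2952.02, 2954.64, 2981.01]
brand2 = [2863.39, 2906.97, 2873.52, 2959.06, 2899.98]

# Paired differences d = x1 - x2, one per pump.
d = [x1 - x2 for x1, x2 in zip(brand1, brand2)]

d_bar = mean(d)                      # sample mean of the differences
s_d = stdev(d)                       # sample standard deviation (n - 1 divisor)
se_d_bar = s_d / math.sqrt(len(d))   # standard error of the mean difference
t_ratio = (d_bar - 0) / se_d_bar     # test ratio for H0: D <= 0
```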
For comparison, the Minitab printout has the following.
Results for: 2x0323-21.MTW
MTB > describe c7-c9
Descriptive Statistics: brand 1, brand 2, difference
Variable     N     Mean   Median   TrMean   StDev   SE Mean
brand 1      5   2979.2   2981.0   2979.2    29.7      13.3
brand 2      5   2900.6   2900.0   2900.6    37.3      16.7
differen     5     78.6     81.0     78.6    50.4      22.5

Variable   Minimum   Maximum       Q1       Q3
brand 1     2952.0    3025.9   2953.3   3004.1
brand 2     2863.4    2959.1   2868.5   2933.0
differen      -4.4     118.9     37.0    118.9

MTB > Paired c7 c8;
SUBC> Alternative 1.
Paired T-Test and CI: brand 1, brand 2
Paired T for brand 1 - brand 2
               N     Mean   StDev   SE Mean
brand 1        5   2979.2    29.7      13.3
brand 2        5   2900.6    37.3      16.7
Difference     5     78.6    50.4      22.5
95% lower bound for mean difference: 30.6
T-Test of mean difference = 0 (vs > 0): T-Value = 3.49   P-Value = 0.013
21. Using the means and standard deviations in the computer printout above, repeat the test done by
the computer, assuming the brand 1 and brand 2 columns represent independent samples and using
a pooled variance. (Method D2) . Show your work! (5 points) (Note!!! This said D3 on the
original and I’m amazed that no one caught me at it. You were given full credit if you used Method
D3.)
Solution: Given: $n_1 = 5$, $\bar x_1 = 2979.2$, $s_1 = 29.7$ and $n_2 = 5$, $\bar x_2 = 2900.6$, $s_2 = 37.3$.
$\bar d = \bar x_1 - \bar x_2 = 78.6$. $df = (5 - 1) + (5 - 1) = 8$. We are
assuming $\sigma_1^2 = \sigma_2^2$. $H_0: D \le 0$, $H_1: D > 0$.
So $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \dfrac{4(29.7)^2 + 4(37.3)^2}{8} = \dfrac{882.09 + 1391.29}{2} = 1136.69$ or $s_p = 33.715$,
and $s_{\bar d} = \sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)} = \sqrt{1136.69\left(\dfrac{1}{5} + \dfrac{1}{5}\right)} = \sqrt{454.676} = 21.3231$.
So $t = \dfrac{\bar d - 0}{s_{\bar d}} = \dfrac{78.6}{21.3231} = 3.686$. Since there are 8 degrees of freedom, we compare the t we have
computed with values of t on the t-table. Note that $t^{(8)}_{.005} = 3.355$ and $t^{(8)}_{.001} = 4.501$ are on either side of
3.686, so that $.001 < p\text{-value} < .005$. If we assume that the significance level is 5%, since our p-value is
below our significance level, we reject the null hypothesis.
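The pooled-variance (Method D2) computation can be sketched from the summary statistics alone. The function name and return layout are illustrative choices, not part of the exam.

```python
import math

def pooled_t(x1_bar, s1, n1, x2_bar, s2, n2, d0=0.0):
    """Pooled-variance t-ratio for two independent samples (equal variances assumed)."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (x1_bar - x2_bar - d0) / se
    return sp2, se, t

# Means and standard deviations from the Minitab printout for brand 1 and brand 2.
sp2, se, t_ratio = pooled_t(2979.2, 29.7, 5, 2900.6, 37.3, 5)
df = 5 + 5 - 2
```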
This is confirmed by the Minitab output:
MTB > TwoSample c7 c8;
SUBC> Pooled;
SUBC> Alternative 1.
Two-Sample T-Test and CI: brand 1, brand 2
Two-sample T for brand 1 vs brand 2
            N     Mean   StDev   SE Mean
brand 1     5   2979.2    29.7        13
brand 2     5   2900.6    37.3        17
Difference = mu brand 1 - mu brand 2
Estimate for difference: 78.6
95% lower bound for difference: 38.9
T-Test of difference = 0 (vs >): T-Value = 3.68   P-Value = 0.003   DF = 8
Both use Pooled StDev = 33.7
22. (Wonnacott and Wonnacott)A random sample of 7 workers are selected to work under better
conditions for a day while 3 others still work under the old conditions. The Wilcoxon procedure
for independent samples is used. To test a 1-sided hypothesis W is computed.
Output is as follows:
Old: 44 44 49
New: 48 50 51 57 57 61 82
W has the value
a) 6
b) *7.
c) 10
d) 137
Explanation: If we replace the numbers by their ranks, we get
$x_1$ (Old)    $r_1$     $x_2$ (New)    $r_2$
44             1.5       48             3
44             1.5       50             5
49             4         51             6
                         57             7.5
                         57             7.5
                         61             9
                         82             10
Sum            7                        48
Since W is the smaller rank sum, it is 7.
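The ranking step, including the averaged ranks for ties, can be sketched as below. The helper name `midranks` is hypothetical, and the dictionary lookup works here only because tied values share one rank.

```python
def midranks(values):
    """Map each distinct value to its rank, averaging ranks across ties."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Positions i+1 .. j share the average rank (i + 1 + j) / 2.
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return ranks

old = [44, 44, 49]
new = [48, 50, 51, 57, 57, 61, 82]

rank_of = midranks(old + new)
w_old = sum(rank_of[x] for x in old)  # rank sum of the smaller sample
w_new = sum(rank_of[x] for x in new)
```

The two rank sums should add up to $n(n+1)/2 = 55$ for the ten observations, which is a quick check on the ranking.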
                                                        Paired Samples    Independent Samples
Location - Normal distribution. Compare means.          Method D4         Methods D1-D3
Location - Distribution not Normal. Compare medians.    Method D5b        Method D5a
Proportions                                                               Method D6
Variability - Normal distribution. Compare variances.                     Method D7
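From the Formula Table:

Difference between Two Means ($\sigma$ known) (Method D1)
  Confidence Interval: $D = \bar d \pm z_{\alpha/2}\, \sigma_{\bar d}$, where $\sigma_{\bar d} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$ and $\bar d = \bar x_1 - \bar x_2$
  Hypotheses: $H_0: D = D_0$ *, $H_1: D \ne D_0$, with $D = \mu_1 - \mu_2$
  Test Ratio: $z = \dfrac{\bar d - D_0}{\sigma_{\bar d}}$
  Critical Value: $\bar d_{cv} = D_0 \pm z_{\alpha/2}\, \sigma_{\bar d}$

Difference between Two Means ($\sigma$ unknown, variances assumed equal) (Method D2)
  Confidence Interval: $D = \bar d \pm t_{\alpha/2}\, s_{\bar d}$, where $s_{\bar d} = s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$ and $DF = n_1 + n_2 - 2$
  Hypotheses: $H_0: D = D_0$ *, $H_1: D \ne D_0$, with $D = \mu_1 - \mu_2$
  Test Ratio: $t = \dfrac{\bar d - D_0}{s_{\bar d}}$, where $\hat s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
  Critical Value: $\bar d_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar d}$

Difference between Two Means ($\sigma$ unknown, variances assumed unequal) (Method D3)
  Confidence Interval: $D = \bar d \pm t_{\alpha/2}\, s_{\bar d}$, where $s_{\bar d} = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
  and $DF = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$
  Hypotheses: $H_0: D = D_0$ *, $H_1: D \ne D_0$, with $D = \mu_1 - \mu_2$
  Test Ratio: $t = \dfrac{\bar d - D_0}{s_{\bar d}}$
  Critical Value: $\bar d_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar d}$

Ratio of Variances (Method D7)
  Confidence Interval: $\dfrac{s_2^2}{s_1^2} F^{(DF_1, DF_2)}_{1 - \alpha/2} \le \dfrac{\sigma_2^2}{\sigma_1^2} \le \dfrac{s_2^2}{s_1^2} F^{(DF_1, DF_2)}_{\alpha/2}$, where $DF_1 = n_1 - 1$, $DF_2 = n_2 - 1$
  and $F^{(DF_1, DF_2)}_{1 - \alpha/2} = \dfrac{1}{F^{(DF_2, DF_1)}_{\alpha/2}}$
  Hypotheses: $H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 \ne \sigma_2^2$
  Test Ratio: $F^{(DF_1, DF_2)} = \dfrac{s_1^2}{s_2^2}$ and $F^{(DF_2, DF_1)} = \dfrac{s_2^2}{s_1^2}$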
* Same as H 0 : 1   2 , H1 : 1   2 if D0  0. Note that  has been changed to D . For method D4
see page 12.
ECO252 QBA2
SECOND EXAM
October 30, 2003
TAKE HOME SECTION
Name: ________KEY_____________
Social Security Number: _________________________
II. Neatness Counts! Show your work! Note that formulas messed up by Word can
be seen by clicking on them and print just fine.
1) To compare two formulations of gasoline, a company picked 7 automobiles and ran each automobile
for one week with formulation 1 and for one week with formulation 2 .
Miles per gallon appear below.
Row    gas 1    gas 2
1      30.8     30.2
2      34.5     34.7
3      13.2     12.6
4      26.3     25.3
5      26.2     25.7
6      26.2     25.0
7      26.3     25.0
Before you start, replace the 0 in 25.0 in the mpg for car 7 with the last digit of your Social Security
Number. This number will now be between 25.0 and 25.9. Example: Since my SS number is
265398248, I will change the last 25.0 to 25.8.
I got 26.214 for the mean miles per gallon for gas 1. Make sure that you carry a comparable number of
digits in your computations. If gas 1 is the new formulation and gas 2 is the old formulation, the
company will switch from the old to the new formulation only if miles per gallon for the new
formulation are higher. $\alpha = .01$ except when indicated otherwise.
a. In order to make our decision, we must do a hypothesis test. What are the null and alternative
hypotheses that you are testing? (1) Use the 3 ways below to test the hypotheses.
b. Do the appropriate hypothesis test for your hypotheses using a test ratio and find an approximate
p-value for the hypothesis. On the basis of your p-value, would we reject the null hypothesis when the
significance level is 10%? Why? (3)
c. Repeat the test, using a critical value for the difference between the sample means. (2)
d. Do a confidence interval for the difference between the two means appropriate to your
hypotheses. (2)
e. Write a brief report to the product development vice president explaining whether the company
should switch to the new formulation and why? (1)
Solution: In this version I have left 25.0 alone. Since this is paired data (each line refers to one car),
we need the sample mean and variance of the difference between mpg for the two gasoline
formulations. For the data above we have the following.
Row    gas 1 ($x_1$)    gas 2 ($x_2$)    C3 ($d = x_1 - x_2$)    C4 ($d^2$)
1      30.8             30.2              0.6                    0.36
2      34.5             34.7             -0.2                    0.04
3      13.2             12.6              0.6                    0.36
4      26.3             25.3              1.0                    1.00
5      26.2             25.7              0.5                    0.25
6      26.2             25.0              1.2                    1.44
7      26.3             25.0              1.3                    1.69
Sum    183.5            178.5             5.0                    5.14
So $\sum x_1 = 183.5$, $\sum x_2 = 178.5$, $\sum d = 5.0$ and $\sum d^2 = 5.14$.
If you did not know this was paired data: $\sum x_1^2 = 5069.4$ and $\sum x_2^2 = 4825.5$, $s_1^2 = 43.16$, $s_2^2 = 45.56$.
So we have $\bar d = \dfrac{\sum d}{n} = \dfrac{5.0}{7} = 0.7143$ and $s_d^2 = \dfrac{\sum d^2 - n\bar d^2}{n - 1} = \dfrac{5.14 - 7(0.7143)^2}{6} = 0.26140$, so
$s_d = \sqrt{0.26140} = 0.5113$. $\alpha = .01$, $df = n - 1 = 6$ and $t^{(n-1)}_{\alpha} = t^{(6)}_{.01} = 3.143$.
a) The company will switch if $\mu_1 > \mu_2$. This is the alternative hypothesis. So the null hypothesis must be
$\mu_1 \le \mu_2$. If we use $D = \mu_1 - \mu_2$, the hypotheses are as follows:
$H_0: \mu_1 \le \mu_2$, $H_1: \mu_1 > \mu_2$, or
$H_0: \mu_1 - \mu_2 \le 0$, $H_1: \mu_1 - \mu_2 > 0$, or
$H_0: D \le 0$, $H_1: D > 0$.
We can see from this that we have a right-tail test.
As it says in document 252solnD3, if the paired data problem were on the formula table, it would appear as
below.
Interval for: Difference between Two Means (paired data) (Method D4)
  Confidence Interval: $D = \bar d \pm t_{\alpha/2}\, s_{\bar d}$, where $d = x_1 - x_2$ and $s_{\bar d} = \dfrac{s_d}{\sqrt n}$
  Hypotheses: $H_0: D = D_0$ *, $H_1: D \ne D_0$, with $D = \mu_1 - \mu_2$
  Test Ratio: $t = \dfrac{\bar d - D_0}{s_{\bar d}}$
  Critical Value: $\bar d_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar d}$
Here $s_{\bar d} = \dfrac{s_d}{\sqrt n} = \sqrt{\dfrac{.26140}{7}} = \sqrt{0.0373429} = 0.1932$.
b) Test Ratio Method: $t = \dfrac{\bar d - D_0}{s_{\bar d}} = \dfrac{0.7143}{.1932} = 3.6972$. To do a traditional hypothesis test, make a diagram
of a Normal curve with zero in the middle. Show a ‘reject’ zone above $t^{(n-1)}_{\alpha} = t^{(6)}_{.01} = 3.143$. Since our
computed t is in this zone, reject the null hypothesis. To find the p-value, compare 3.6972 with the
$df = n - 1 = 6$ row of the t table. Since 3.6972 is between $t^{(6)}_{.01} = 3.143$ and $t^{(6)}_{.005} = 3.707$, we can say
$.005 < p\text{-value} < .01$. Since these are both below the significance level of 10%, we reject the null
hypothesis.
c) Critical Value Method: Because this is a right-tail test, $\bar d_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar d}$ becomes
$\bar d_{cv} = D_0 + t_{\alpha}\, s_{\bar d} = 0 + 3.143(0.1932) = 0.6072$. Make a diagram of a Normal curve with zero in the middle.
Show a ‘reject’ zone above $\bar d_{cv} = 0.6072$. Since our computed $\bar d = 0.7143$ is in this zone, reject the null
hypothesis.
d) Confidence Interval Method: Because the alternative hypothesis is $H_1: D > 0$, the confidence interval
formula $D = \bar d \pm t_{\alpha/2}\, s_{\bar d}$ becomes $D \ge \bar d - t_{\alpha}\, s_{\bar d} = 0.7143 - 3.143(0.1932) = 0.1071$. Since
$D \ge 0.1071$ contradicts the null hypothesis $H_0: D \le 0$, reject the null hypothesis.
e) Your report, if you got the same results that I did, should say that on the basis of a test of each fuel in
seven vehicles, the new formulation offers a significant improvement in miles per gallon over the old
formulation and thus should be adopted.
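The three approaches in parts b) through d) all rest on the same few numbers, which can be reproduced as follows. This is a sketch with illustrative names; the critical value 3.143 is copied from the t table in the solution rather than computed. Because the code does not round intermediate results, the t-ratio comes out near 3.70 rather than matching 3.6972 in every digit.

```python
import math
from statistics import mean, stdev

gas1 = [30.8, 34.5, 13.2, 26.3, 26.2, 26.2, 26.3]  # new formulation
gas2 = [30.2, 34.7, 12.6, 25.3, 25.7, 25.0, 25.0]  # old formulation
T_CRIT = 3.143  # t table value for df = 6, alpha = .01 (taken from the solution)

d = [x1 - x2 for x1, x2 in zip(gas1, gas2)]
d_bar = mean(d)
se_d_bar = stdev(d) / math.sqrt(len(d))

t_ratio = (d_bar - 0) / se_d_bar        # b) test ratio
d_cv = 0 + T_CRIT * se_d_bar            # c) critical value for d-bar
lower_bound = d_bar - T_CRIT * se_d_bar # d) one-sided lower confidence bound

reject = t_ratio > T_CRIT
```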
Your Results: Solutions to these sections with other numbers are sketched on pages 4 – 13 of 252y023s.
For example, if you substituted 8 for zero, look for results for 2x0323-18. The means, standard deviations
and standard errors ($s_{\bar x}$) can be found in “Descriptive Statistics” where C3 is $d$. The “Paired T-Test”
repeats these numbers and gives the bottom of the confidence interval as “99% lower bound”, the value
of the t ratio as “T-Value”, and the p-value. C4 is the square of C3. K1 is $\sum x_1$, K2 is $\sum x_2$, K3 is
$\sum d$ and K4 is $\sum d^2$. “Sum of Squares of Gas 1” is $\sum x_1^2$ and “Sum of Squares of Gas 2” is $\sum x_2^2$.
b) Test Ratio Method: The value of t that you should have gotten, $t_{calc} = \dfrac{\bar d - D_0}{s_{\bar d}}$, appears as ‘T-Value’ on
the printout. To do a traditional hypothesis test, make a diagram of a Normal curve with zero in the middle.
Show a ‘reject’ zone above $t^{(n-1)}_{\alpha} = t^{(6)}_{.01} = 3.143$. If your computed t is in this zone, reject the null
hypothesis. To find the p-value, compare $t_{calc}$ with the $df = n - 1 = 6$ row of the t table. If $t_{calc}$ is between
$t^{(6)}_{.01} = 3.143$ and $t^{(6)}_{.005} = 3.707$, we can say $.005 < p\text{-value} < .01$. Since these are both below the
significance level of 10%, we reject the null hypothesis.
c) Critical Value Method: Because this is a right-tail test, $\bar d_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar d}$ becomes $\bar d_{cv} = D_0 + t_{\alpha}\, s_{\bar d}$. $\bar d$ is the
‘difference’ mean on the printout. $s_{\bar d}$ is the ‘difference’ SE Mean on the printout. Use $t^{(n-1)}_{\alpha} = t^{(6)}_{.01} = 3.143$.
Make a diagram of a Normal curve with zero in the middle. Show a ‘reject’ zone above $\bar d_{cv}$. If your
computed $\bar d$ is in this zone, reject the null hypothesis.
d) Confidence Interval Method: Because the alternative hypothesis is $H_1: D > 0$, the confidence interval
formula $D = \bar d \pm t_{\alpha/2}\, s_{\bar d}$ becomes $D \ge \bar d - t_{\alpha}\, s_{\bar d}$ (use the numbers given in c)). Since your confidence
interval should contradict the null hypothesis $H_0: D \le 0$, reject the null hypothesis.
2) The following data refers to defects in finishing of samples of automobiles made on the various days
of the week.
Day          No. with Major    No. with Minor    No. with no    Size of
             Defects           Defects           Defects        Sample
Monday        8                22                170            200
Tuesday       2                10                188            200
Wednesday     6                16                178            200
Thursday      2                 8                190            200
Friday       10                34                156            200
Before you start, replace the 0 in 10 in the Major Defects column with the last digit of your Social
Security Number and reduce 156 by the same amount. Example: Since my SS number is 265398248, I
will change the 10 to 18 and then subtract 8 from 156 to get 148. The sum of the row will stay 200.
a) Do a statistical test to show if the proportion of cars in the three categories is the same. (4)
b) Assuming that you reject your null hypothesis, do a Marascuilo procedure to see which days have
proportions of cars with no defects that are significantly different from the others. Note that to do
this, you will have to divide the automobiles between those with no defects and those with some
defects (which will cut down on degrees of freedom) and then do $C^5_2 = 10$ contrasts between
proportions. This seems like too much work. Since Friday has the highest defect rate it should be
enough to compare the defect rate on Friday with the defect rate on the other 4 days or the no defect
rate on Friday with the no defect rate on the other 4 days. (4)
Solution: a) I will leave the 10 major defects on Friday alone. This is a chi-squared test of
homogeneity. $H_0$ is homogeneity. If we sum the columns and take the proportion ($p_r$) in each row
we get
Day          No. with Major    No. with Minor    No. with no    Size of    Proportion
             Defects           Defects           Defects        Sample
Monday        8                22                170             200       .2
Tuesday       2                10                188             200       .2
Wednesday     6                16                178             200       .2
Thursday      2                 8                190             200       .2
Friday       10                34                156             200       .2
Sum          28                90                882            1000       1.0
This is our $O$. To get $E$ take the column totals and multiply them by the proportion in each row. For
example the number for “no defects” and “Monday” is gotten by multiplying the column sum,
882, by the row proportion, .2, to give us $.2(882) = 176.4$. We thus have the following.
          E                           O
 5.60    18.0    176.4          8     22    170
 5.60    18.0    176.4          2     10    188
 5.60    18.0    176.4          6     16    178
 5.60    18.0    176.4          2      8    190
 5.60    18.0    176.4         10     34    156
If we write these out by columns, we get the $O$ and $E$ columns below.
Row      $O$      $E$      $E - O$    $(E - O)^2$    $(E - O)^2/E$    $O^2/E$
1          8      5.6       -2.4         5.76          1.0286          11.429
2          2      5.6        3.6        12.96          2.3143           0.714
3          6      5.6       -0.4         0.16          0.0286           6.429
4          2      5.6        3.6        12.96          2.3143           0.714
5         10      5.6       -4.4        19.36          3.4571          17.857
6         22     18.0       -4.0        16.00          0.8889          26.889
7         10     18.0        8.0        64.00          3.5556           5.556
8         16     18.0        2.0         4.00          0.2222          14.222
9          8     18.0       10.0       100.00          5.5556           3.556
10        34     18.0      -16.0       256.00         14.2222          64.222
11       170    176.4        6.4        40.96          0.2322         163.832
12       188    176.4      -11.6       134.56          0.7628         200.363
13       178    176.4       -1.6         2.56          0.0145         179.615
14       190    176.4      -13.6       184.96          1.0485         204.649
15       156    176.4       20.4       416.16          2.3592         137.959
Sum     1000   1000.0        0.0                      38.0045        1038.004
Compute (Method 1) the $E - O$, $(E - O)^2$ and $\dfrac{(E - O)^2}{E}$ columns. $\chi^2_{computed}$ will be 38.0045, the sum of
the $\dfrac{(E - O)^2}{E}$ column. Or compute (Method 2) the $\dfrac{O^2}{E}$ column. $\chi^2_{computed}$ will be the sum of the $\dfrac{O^2}{E}$
column less $n$, where $n$ is the sum of the $O$ or the $E$ column. 1038.004 – 1000 = 38.004.
Degrees of freedom are $df = (r - 1)(c - 1) = (5 - 1)(3 - 1) = 8$, and, if we assume $\alpha = .05$, $\chi^{2(8)}_{.05} = 15.5073$.
Since $\chi^2_{computed}$ is larger than the table value of $\chi^2$, reject the null hypothesis.
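Both ways of computing the chi-squared statistic can be sketched in a few lines. The function name is illustrative, and the critical value 15.5073 is the table value quoted in the solution, not something the code derives.

```python
# Observed counts by day: major defects, minor defects, no defects.
observed = [
    [8, 22, 170],   # Monday
    [2, 10, 188],   # Tuesday
    [6, 16, 178],   # Wednesday
    [2, 8, 190],    # Thursday
    [10, 34, 156],  # Friday
]

def chi_square_two_ways(table):
    """Return chi-squared by Method 1, sum (E-O)^2/E, and Method 2, sum O^2/E - n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    method1 = 0.0
    method2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count
            method1 += (e - o) ** 2 / e
            method2 += o ** 2 / e
    return method1, method2 - n

chi1, chi2 = chi_square_two_ways(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
reject = chi1 > 15.5073  # table value for df = 8, alpha = .05
```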
Your Results: Solutions to these sections with other numbers are sketched on pages 16 – 23 of 252y023s.
For example, if you substituted 8 for zero, look for results for 2x0323-28. The printout for the version of the
problem just discussed is below.
————— 10/30/2003 5:42:11 PM ————————————————————
Welcome to Minitab, press F1 for help.
MTB > Retrieve "C:\Documents and Settings\RBOVE.WCUPANET\My Documents\Drive
D\MINITAB\2x0323-20.mtw".
Retrieving worksheet from file: C:\Documents and Settings\RBOVE.WCUPANET\My
Documents\Drive D\MINITAB\2x0323-20.mtw
# Worksheet was saved on Wed Oct 29 2003
Results for: 2x0323-20.mtw
MTB > Chisquare c1, c2, c3
Chi-Square Test: C1, C2, C3
Expected counts are printed below observed counts
            C1       C2       C3    Total
    1        8       22      170      200
          5.60    18.00   176.40
    2        2       10      188      200
          5.60    18.00   176.40
    3        6       16      178      200
          5.60    18.00   176.40
    4        2        8      190      200
          5.60    18.00   176.40
    5       10       34      156      200
          5.60    18.00   176.40
Total       28       90      882     1000
Chi-Sq =  1.029 +  0.889 +  0.232 +
          2.314 +  3.556 +  0.763 +
          0.029 +  0.222 +  0.015 +
          2.314 +  5.556 +  1.049 +
          3.457 + 14.222 +  2.359 = 38.005
DF = 8, P-Value = 0.000
You can see that the $O$ and $E$ tables are printed together and that $\chi^2_{computed}$ is done by method 1. Compare
this with $\chi^{2(8)}_{.05} = 15.5073$. Since all values of $\chi^2_{computed}$ are above 35, the null hypothesis is rejected and
the p-value is effectively zero.
b) According to the outline “The Marascuilo procedure says that, for 2 by c tests, if (i) equality is rejected
and (ii) $|p_a - p_b| > \sqrt{\chi^2}\, s_{\Delta p}$, where a and b represent 2 groups, the chi-squared has $c - 1$ degrees of
freedom and the standard deviation is $s_{\Delta p} = \sqrt{\dfrac{p_a q_a}{n_a} + \dfrac{p_b q_b}{n_b}}$, you can say that you have a significant
difference between $p_a$ and $p_b$.” Though there is no need to do this, because we are distinguishing only ‘no
defects’ and ‘defects,’ our table is effectively as below.
Day         No. with   No. with no   Size of   Proportion with   pq/n
            Defects    Defects       Sample    no defects
Monday         30         170          200         .85           .0006375
Tuesday        12         188          200         .94           .0002820
Wednesday      22         178          200         .89           .0004895
Thursday       10         190          200         .95           .0002375
Friday         44         156          200         .78           .0008580
Sum           118         882         1000
The table above also includes the proportion with no defects; for Friday this is $p = \frac{156}{200} = .78$. It also includes $\frac{pq}{n} = \frac{.78(1 - .78)}{200} = \frac{.78(.22)}{200} = .0008580$. Note that, if I had used the defect rate, $\frac{44}{200} = .22$, $\frac{pq}{n}$ would be the same, as would any differences calculated between proportions. If we wish to compare Monday with Friday, we let Friday give us $p_b$ and use Monday’s proportion as $p_a$. This means that the difference between the proportions is $p_a - p_b = .85 - .78 = .07$ and that $s^2_{\Delta p} = \frac{p_a q_a}{n_a} + \frac{p_b q_b}{n_b} = .0006375 + .0008580 = .0014955$. Because the degrees of freedom are now effectively $df = (r-1)(c-1) = (5-1)(2-1) = 4$, we use $\chi^{2\,(4)}_{.05} = 9.4877$. This means $\sqrt{\chi^2}\, s_{\Delta p} = \sqrt{9.4877(.0014955)} = \sqrt{.014189} = .1191$. Since this is larger than the difference between the proportions, we cannot say that there is a significant difference between the proportions.
These computations are repeated in the table below, with ‘n.s.’ indicating an insignificant difference and ‘s’ indicating a significant difference. In each row $p_b$ is Friday’s proportion.

Day         p_a - p_b         s^2_Δp                            χ^2 s^2_Δp                     sqrt(χ^2) s_Δp   Conclusion
Monday      .85 - .78 = .07   .0006375 + .0008580 = .0014955    9.4877(.0014955) = .014189     .1191            n.s.
Tuesday     .94 - .78 = .16   .0002820 + .0008580 = .0011400    9.4877(.0011400) = .010816     .1040            s
Wednesday   .89 - .78 = .11   .0004895 + .0008580 = .0013475    9.4877(.0013475) = .012785     .1131            n.s.
Thursday    .95 - .78 = .17   .0002375 + .0008580 = .0011055    9.4877(.0011055) = .010489     .1024            s
4
Your results should be very similar since  2 .05  9.4877 is used in all versions and the values of s p
are not very different.
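The comparisons with Friday can be reproduced with a few lines of Python. This is a sketch for checking your own version, not part of the outline: the helper names are made up, the counts are the ‘no defects’ counts for this version of the problem, and 9.4877 is copied from the chi-squared table.

```python
import math

# 'No defects' counts out of 200 items per day (this version of the problem).
no_defects = {
    "Monday": 170,
    "Tuesday": 188,
    "Wednesday": 178,
    "Thursday": 190,
    "Friday": 156,
}
N_PER_DAY = 200
CHI2_05_DF4 = 9.4877  # table value, alpha = .05 with 4 df (copied, not computed)


def pq_over_n(count, n):
    """pq/n for one group, where p = count / n and q = 1 - p."""
    p = count / n
    return p * (1 - p) / n


def marascuilo(day_a, day_b):
    """Return (|p_a - p_b|, sqrt(chi2) * s, significant?) for two days."""
    p_a = no_defects[day_a] / N_PER_DAY
    p_b = no_defects[day_b] / N_PER_DAY
    diff = abs(p_a - p_b)
    variance = pq_over_n(no_defects[day_a], N_PER_DAY) + pq_over_n(
        no_defects[day_b], N_PER_DAY
    )
    bound = math.sqrt(CHI2_05_DF4 * variance)
    return diff, bound, diff > bound


results = {day: marascuilo(day, "Friday") for day in no_defects if day != "Friday"}
for day, (diff, bound, significant) in results.items():
    label = "s" if significant else "n.s."
    print(f"{day:<10} diff = {diff:.2f}  bound = {bound:.4f}  {label}")
```

If you had different counts, change only the `no_defects` dictionary; the table value stays the same because the degrees of freedom do not change.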
3) Extra credit. Assume that the data in problem 1 represents two independent samples and that you are not willing to assume that variances are equal. Test your hypothesis all three ways. (6)
Solution: If we use $D = \mu_1 - \mu_2$, the hypotheses are as follows: $\begin{cases} H_0: \mu_1 \le \mu_2 \\ H_1: \mu_1 > \mu_2 \end{cases}$ or $\begin{cases} H_0: \mu_1 - \mu_2 \le 0 \\ H_1: \mu_1 - \mu_2 > 0 \end{cases}$ or $\begin{cases} H_0: D \le 0 \\ H_1: D > 0 \end{cases}$. We can see from this that we have a right-tail test.
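From the Formula Table:
Interval for: Difference between Two Means ($\sigma$ unknown, variances assumed unequal) (Method D3)
Confidence Interval: $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$
Hypotheses: $H_0: D = D_0$*, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$
Test Ratio: $t = \frac{\bar{d} - D_0}{s_{\bar{d}}}$
Critical Value: $\bar{d}_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}}$
Standard error: $s_{\bar{d}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
Degrees of freedom: $DF = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$

From Problem 1, $\sum x_1 = 183.5$, $\sum x_2 = 178.5$, $\bar{x}_1 = 26.21$ and $\bar{x}_2 = 25.50$, but we already have $\bar{d} = 0.7143$. Also $\sum x_1^2 = 5069.4$, $\sum x_2^2 = 4825.5$, $s_1^2 = 43.16$ and $s_2^2 = 45.56$.
$\frac{s_1^2}{n_1} = \frac{43.16}{7} = 6.1657$ and $\frac{s_2^2}{n_2} = \frac{45.56}{7} = 6.5086$.
$s^2_{\bar{d}} = \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} = 6.1657 + 6.5086 = 12.6743$
$s_{\bar{d}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{12.6743} = 3.5601$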
$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} = \frac{12.6743^2}{\frac{6.1657^2}{6} + \frac{6.5086^2}{6}} = \frac{160.6379}{6.335976 + 7.060312} = \frac{160.6379}{13.396288} = 11.9912$. I rounded this down to 11 degrees of freedom. $t^{(11)}_{.01} = 2.718$.
a) Test Ratio Method: $t = \frac{\bar{d} - D_0}{s_{\bar{d}}} = \frac{0.7143}{3.5601} = 0.201$. To do a traditional hypothesis test, make a diagram of a Normal curve with zero in the middle. Show a ‘reject’ zone above $t^{(11)}_{.01} = 2.718$. Since our computed $t$ is not in this zone, do not reject the null hypothesis. To find the p-value, compare 0.201 with the $df = 11$ row of the t table. Since 0.201 is between $t^{(11)}_{.45} = 0.129$ and $t^{(11)}_{.40} = 0.260$, we can say $.40 < \text{p-value} < .45$. Since these are both above the significance level of 1%, we do not reject the null hypothesis.
b) Critical Value Method: Because this is a right-tail test, $\bar{d}_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}}$ becomes $\bar{d}_{cv} = D_0 + t_{\alpha}\, s_{\bar{d}} = 0 + 2.718(3.5601) = 9.6764$. Make a diagram of a Normal curve with zero in the middle. Show a ‘reject’ zone above $\bar{d}_{cv} = 9.6764$. Since our computed $\bar{d} = 0.7143$ is not in this zone, do not reject the null hypothesis.
c) Confidence Interval Method: Because the alternative hypothesis is $H_1: D > 0$, the confidence interval formula $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$ becomes $D \ge \bar{d} - t_{\alpha}\, s_{\bar{d}} = 0.7143 - 2.718(3.5601) = -8.9621$. Since $D \ge -8.9621$ does not contradict the null hypothesis $H_0: D \le 0$, do not reject the null hypothesis.
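The three methods can be checked together with a short Python sketch. It is an illustration only: the variable names are invented here, the summary statistics are the ones quoted from Problem 1, and the t table value 2.718 is typed in rather than computed.

```python
import math

# Summary statistics quoted from Problem 1 (two samples of 7 each).
n1, n2 = 7, 7
d_bar = 0.7143                  # x-bar 1 minus x-bar 2
s1_sq, s2_sq = 43.16, 45.56     # sample variances
D0 = 0.0                        # hypothesized difference under H0
T_CRIT = 2.718                  # t table value, right tail .01, 11 df (copied)

# Standard error of the difference, variances not assumed equal (Method D3).
v1, v2 = s1_sq / n1, s2_sq / n2
s_d = math.sqrt(v1 + v2)

# Approximate degrees of freedom, rounded down as in the text.
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
df_used = math.floor(df)

# a) Test ratio: reject if t is above the critical t.
t_calc = (d_bar - D0) / s_d
reject_by_ratio = t_calc > T_CRIT

# b) Critical value: reject if d-bar is above d_cv.
d_cv = D0 + T_CRIT * s_d
reject_by_cv = d_bar > d_cv

# c) One-sided confidence interval D >= lower bound:
#    reject only if the whole interval lies above D0.
lower_bound = d_bar - T_CRIT * s_d
reject_by_ci = lower_bound > D0

print(f"s_d = {s_d:.4f}, df = {df:.4f} (use {df_used})")
print(f"a) t = {t_calc:.3f}, reject: {reject_by_ratio}")
print(f"b) d_cv = {d_cv:.4f}, reject: {reject_by_cv}")
print(f"c) D >= {lower_bound:.4f}, reject: {reject_by_ci}")
```

All three flags must agree, since the three methods are rearrangements of the same inequality.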
Your Results: Solutions to these sections with other numbers are sketched on pages 25 – 29 of 252y023s. For example, if you substituted 8 for zero, look for results for 2x0323-18. The means, standard deviations and standard errors ($s_{\bar{x}}$) for $x_1$ and $x_2$ appear first. The “Two-Sample T-Test” gives these numbers and gives (i) the bottom of the confidence interval as “99% lower bound”, (ii) the value of the t ratio as “T-Value” and (iii) the p-value. For other values go back to the pages for Problem 1: K1 is $\sum x_1$, K2 is $\sum x_2$, “Sum of Squares of Gas 1” is $\sum x_1^2$ and “Sum of Squares of Gas 2” is $\sum x_2^2$.
a) Test Ratio Method: The value of $t$ that you should have gotten, $t_{calc} = \frac{\bar{d} - D_0}{s_{\bar{d}}}$, appears as ‘T-Value’ on the printout. To do a traditional hypothesis test, make a diagram of a Normal curve with zero in the middle. Show a ‘reject’ zone above $t^{(11)}_{.01} = 2.718$. Since our computed $t$ is not in this zone, do not reject the null hypothesis. To find the p-value, compare $t_{calc}$ with the $df = 11$ row of the t table. If $t_{calc}$ is between $t^{(11)}_{.45} = 0.129$ and $t^{(11)}_{.40} = 0.260$, we can say $.40 < \text{p-value} < .45$. Since these are both above the significance level of 1%, we do not reject the null hypothesis.
b) Critical Value Method: Because this is a right-tail test, $\bar{d}_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}}$ becomes $\bar{d}_{cv} = D_0 + t_{\alpha}\, s_{\bar{d}}$. $\bar{d}$ is ‘Estimate for difference’ on the printout. You can calculate $s_{\bar{d}} = \frac{\bar{d}}{t_{calc}}$. Use $t^{(11)}_{.01} = 2.718$. Make a diagram of a Normal curve with zero in the middle. Show a ‘reject’ zone above $\bar{d}_{cv}$. Since your computed $\bar{d}$ is not in this zone, do not reject the null hypothesis.
c) Confidence Interval Method: Because the alternative hypothesis is $H_1: D > 0$, the confidence interval formula $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$ becomes $D \ge \bar{d} - t_{\alpha}\, s_{\bar{d}}$ (use the numbers given in b)). Since your confidence interval should not contradict the null hypothesis $H_0: D \le 0$, do not reject the null hypothesis.
While I have your attention, the following was just added to the outline for section
D.
Let’s try p-value again! Say we end up with $z = 3.00$.
If $H_1$ is $D > 0$, $\Delta p > 0$, $p > p_0$ or $\mu > \mu_0$, $pval = P(z \ge 3) = .5 - P(0 \le z \le 3)$.
If $H_1$ is $D < 0$, $\Delta p < 0$, $p < p_0$ or $\mu < \mu_0$, $pval = P(z \le 3) = .5 + P(0 \le z \le 3)$.
If $H_1$ is $D \ne 0$, $\Delta p \ne 0$, $p \ne p_0$ or $\mu \ne \mu_0$, $pval = 2P(z \ge 3) = 2[.5 - P(0 \le z \le 3)]$.
Now let’s say that we end up calculating $t_{calc} = 3.00$. We compare this with the appropriate line on the t table and find that $t^{(n-1)}_{.005} < 3.00 < t^{(n-1)}_{.001}$.
If $H_1$ is $D > 0$ or $\mu > \mu_0$, $.001 < pval < .005$.
If $H_1$ is $D < 0$ or $\mu < \mu_0$, $1 - .005 < pval < 1 - .001$ or $.995 < pval < .999$.
If $H_1$ is $D \ne 0$ or $\mu \ne \mu_0$, $2(.001) < pval < 2(.005)$ or $.002 < pval < .01$.
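The three z cases can be sketched in Python using only the standard library. This is an illustration, not something from the outline: the function names and the labels "greater", "less" and "two-sided" are made up for the sketch, and the Normal probabilities come from `math.erf` instead of the Normal table, so the last digit may differ slightly from a table lookup.

```python
import math


def std_normal_cdf(z):
    """P(Z <= z) for a standard Normal variable, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_value(z, alternative):
    """p-value for a computed z under the three kinds of alternative hypothesis."""
    if alternative == "greater":      # H1: D > 0, p > p0 or mu > mu0
        return 1.0 - std_normal_cdf(z)
    if alternative == "less":         # H1: D < 0, p < p0 or mu < mu0
        return std_normal_cdf(z)
    if alternative == "two-sided":    # H1: D != 0, p != p0 or mu != mu0
        return 2.0 * (1.0 - std_normal_cdf(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")


z = 3.00
for alt in ("greater", "less", "two-sided"):
    print(f"{alt:<10} pval = {p_value(z, alt):.4f}")
```

Note that the same computed z gives a very small p-value for a right-tail or two-sided test but a p-value near 1 for a left-tail test, which is the point of the three cases above.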
General Comments: There are some formulas that just don’t go together that some of you insisted on using together.
1) $s_{\bar{d}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ (Method D1 or D3) doesn’t work with $DF = n_1 + n_2 - 2$ (Method D2).
2) $DF = n - 1$, which we use for paired data (Method D4), doesn’t work with $s_{\bar{d}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ (Method D1 or D3) or $s_{\bar{d}} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ (Method D2).