252solngr3-08 3/7/08 (Open this document in 'Page Layout' view!)
Name:
Class days and time:
Please include this on what you hand in!
Graded Assignment 3
Solution (15 pages)
Part 1: In your outline there are 6 methods to compare means or medians, methods D1, D2, D3, D4, D5a
and D5b. Methods D6a and D6b compare proportions and method D7 compares variances or standard
deviations. In the following cases, identify $H_0$ and $H_1$ and identify which method to use. If the hypotheses
involve a mean, state the hypotheses in terms of both $\mu$ and $D = \mu_1 - \mu_2$. If the hypotheses involve a
proportion, state them in terms of both $p$ and $\Delta p = p_1 - p_2$. If the hypotheses involve standard deviations
or variances, state them in terms of both $\sigma^2$ and $\sigma_1^2/\sigma_2^2$ or $\sigma_2^2/\sigma_1^2$. All the questions involve means, medians,
proportions or variances. One of these problems is a chi-squared test.
Note: Look at 252thngs (252thngs) on the syllabus supplement part of the website before you start (and
before you take exams). Neatness and clarity of explanation are expected. Note that from now on
neatness means paper neatly trimmed on the left side if it has been torn, multiple pages stapled and
paper written on only one side. This is very similar to Problem D8.
----------------------------------------------------------------------------------------------------------------------------
Example: This may seem long but it appears on an old graded assignment 3.
A group of supervisors are given the exams on management skills before and after taking a course in
management. Scores are as follows.
Supervisor   Before   After
     1         63      78
     2         93      92
     3         84      91
     4         72      80
     5         65      69
     6         72      85
     7         91      99
     8         84      82
     9         71      81
    10         80      87
    11         68      93
If we assume that the distribution of results is Normal, what method should we use to answer the question
“Has the course improved the scores of the managers?”
Solution: You are comparing means before and after the course. You can get away with using means
because the parent distributions are Normal. If $\mu_2$ is the mean of the second sample, you are hoping that
$\mu_2 > \mu_1$, which, because it contains no equality, is an alternate hypothesis. So your hypotheses are
$H_0: \mu_1 \ge \mu_2$, $H_1: \mu_1 < \mu_2$ or $H_0: \mu_1 - \mu_2 \ge 0$, $H_1: \mu_1 - \mu_2 < 0$.
If $D = \mu_1 - \mu_2$, then $H_0: D \ge 0$, $H_1: D < 0$. The important thing to notice
here is that the data are in before and after pairs, so you use Method D4.
-------------------------------------------------------------------------------------------------------------------------------
General considerations.
1) All methods in section D are methods that can be used only for comparison of 2 samples. This is
because, if $\theta$ (theta) is a parameter like $\mu$ or $p$, $\Delta\theta = \theta_1 - \theta_2$ is easy to define and will be zero if $\theta_1$ and
$\theta_2$ are equal. If we go to more than two samples, say 3, $\theta_1 - \theta_2 - \theta_3$ will not be zero when $\theta_1 = \theta_2 = \theta_3$, so
we need something like $(\theta_1 - \theta_0)^2 + (\theta_2 - \theta_0)^2 + (\theta_3 - \theta_0)^2$, where $\theta_0$ is some sort of average of the
parameters of the samples. This will equal zero if all the parameters are equal and will not allow positive
discrepancies in one sample to cancel out negative discrepancies in another. This is what takes us to
chi-squared and ANOVA methods.
Saying $(\theta_1 - \theta_0)^2 + (\theta_2 - \theta_0)^2 + (\theta_3 - \theta_0)^2 = 0$ is not the same as saying $D_3 = \theta_1 - \theta_2 - \theta_3 = 0$, because
$D_3$ would be negative (for positive parameters) if $\theta_1 = \theta_2 = \theta_3$, but saying $(\theta_1 - \theta_0)^2 + (\theta_2 - \theta_0)^2 = 0$ is the same as saying
$D_2 = \theta_1 - \theta_2 = 0$. (Try proving this – it’s simple algebra.)
2) You can always substitute a method for the median for a method for the mean, but not vice versa.
However, if a Normal distribution applies, a method involving means will be more efficient and powerful.
3) The computer will use Method D3 when it is not told what method to use. This is quite general because
if the sample variances are similar, it gives results like D2 and if the sample sizes are large, it gives results
like D1. However, if variances are equal D2 is easier to use and if the samples are large D1 is easier to use.
4) The K-S and Lilliefors methods only exist because chi-square performs so poorly for small samples. K-S
needs $\mu$, $\sigma$ or other parameters. Lilliefors uses $\bar{x}$ or $s$ and only works to test for a Normal distribution.
5) ‘Significant’ in statistics means that we have rejected a hypothesis like $H_0: \mu = \mu_0$ and ‘significantly
different’ means that we have rejected a hypothesis like $H_0: \mu_1 = \mu_2$. Of course, if two parameters are
significantly different, their difference is significant. Remember, if we are saying that a difference is
significant, we are saying that a difference as large or larger than what we observe is very unlikely under our
null hypotheses and that a p-value tells us the probability of getting a difference as large or larger if the null
hypothesis is true.
6) Be careful of inequalities. If $\mu_1 < \mu_2$ or $\mu_2 > \mu_1$ and $D = \mu_1 - \mu_2$, then $D < 0$. Please remember: a hypothesis
containing $<$, $>$ or $\ne$ is an alternative hypothesis. The null hypotheses will contain $\le$, $\ge$ or $=$.
7) In most problems you are better off trying to figure out what the alternative hypothesis is before
you try to state the null hypothesis.
8) Do not lose sight of the fact that the purpose of samples is to compare populations. We may look at
numbers in methods D6b and chi-squared tests, but our purpose is to deal with proportions of a population.
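The algebra that consideration 1 invites you to try can be sketched as follows. This is only one way to do it, and it assumes that $\theta_0$ is the simple average of the two parameters, which the text does not specify.

```latex
% Assumption (not stated in the text): \theta_0 = (\theta_1 + \theta_2)/2.
\begin{align*}
\theta_1 - \theta_0 &= \tfrac{1}{2}(\theta_1 - \theta_2) = \tfrac{1}{2} D_2, &
\theta_2 - \theta_0 &= -\tfrac{1}{2}(\theta_1 - \theta_2) = -\tfrac{1}{2} D_2, \\
(\theta_1 - \theta_0)^2 + (\theta_2 - \theta_0)^2 &= \tfrac{1}{4} D_2^2 + \tfrac{1}{4} D_2^2 = \tfrac{1}{2} D_2^2 .
\end{align*}
% The sum of squares is zero exactly when D_2 = \theta_1 - \theta_2 = 0.
```

With three parameters no such collapse to a single difference is available, which is why the sum of squared discrepancies is needed there.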
1. Dora Jarr and Daughters is a maker of components for automobile dashboards. When Dora retired, her
company’s stock became publicly traded. A sample of 160 stock analysts were asked whether they rated the
stock as a ‘buy’ in 2007 and again in 2008. 79 analysts rated the stock a ‘buy’ in both 2007 and 2008. 15
analysts recommended it as a ‘buy’ in 2007 but not in 2008. 9 analysts upgraded the stock to a ‘buy’ in
2008. The remaining analysts did not consider the stock a ‘buy’ in either year. Can we say that the
proportion of analysts who favor the stock has fallen?
Solution: This can be called a paired comparison of proportions and the method is D6b, the McNemar Test.
Let $p_1$ represent the proportion of analysts that rated the stock as a ‘buy’ in 2007 and $p_2$ represent the
proportion of analysts that rated the stock as a ‘buy’ in 2008. We want to test to see if $p_1 > p_2$. This is an
alternative hypothesis because it does not contain an equality. The null hypothesis is the opposite, $p_1 \le p_2$.
$x_{11} = 79$ analysts rated the stock a ‘buy’ in both 2007 and 2008. $x_{12} = 15$ analysts recommended it as a
‘buy’ in 2007 but not in 2008. $x_{21} = 9$ analysts upgraded the stock to a ‘buy’ in 2008. The remaining
$x_{22} = 160 - 79 - 15 - 9 = 57$ analysts did not consider the stock a ‘buy’ in either year. Our hypotheses are
$H_0: p_1 \le p_2$, $H_1: p_1 > p_2$ or, if $\Delta p = p_1 - p_2$, $H_0: \Delta p \le 0$, $H_1: \Delta p > 0$. They are given
along with the table to be analyzed.

                       question 2
                       yes        no
question 1    yes    $x_{11}$   $x_{12}$
              no     $x_{21}$   $x_{22}$
2. Of a sample of 200 MBA students, 110 are males. Of a sample of 500 managers, 300 are males. Is there
a significant difference between the fraction of males in the population of MBA students and the population
of managers? (What are H 0 and H 1 and what is the identifier of the method you would use?)
Solution: You are comparing two proportions. If $p_1$ refers to the proportion of males among MBA
students and $p_2$ refers to the proportion of males among managers, the sample proportions are
$\bar{p}_1 = x_1/n_1 = 110/200$ and $\bar{p}_2 = x_2/n_2 = 300/500$. The hypotheses are
$H_0: p_1 = p_2$, $H_1: p_1 \ne p_2$ or $H_0: p_1 - p_2 = 0$, $H_1: p_1 - p_2 \ne 0$.
If $\Delta p = p_1 - p_2$, then $H_0: \Delta p = 0$, $H_1: \Delta p \ne 0$. Since we are comparing
proportions of unrelated samples, use Method D6a.
3. We add a sample of 100 CEOs to the data in 2. 80 of them are males. Can we say that there is a
significant difference between the proportion of males in all three groups?
Solution: You are comparing three proportions. If $p_1$ refers to the proportion of males among MBA
students, $p_2$ refers to the proportion of males among managers and $p_3$ refers to the proportion of males
among CEOs, the sample proportions are $\bar{p}_1 = x_1/n_1 = 110/200$, $\bar{p}_2 = x_2/n_2 = 300/500$ and
$\bar{p}_3 = x_3/n_3 = 80/100$. The hypotheses are $H_0: p_1 = p_2 = p_3$, $H_1$: Not all $p$s equal.
Since we are comparing proportions of more than two unrelated samples, use a chi-squared test of homogeneity.
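As an illustration only, the chi-squared homogeneity statistic for problem 3 could be computed along the following lines. The function name `chi_squared_homogeneity`, the 5% significance level and the table value 5.991 for 2 degrees of freedom are assumptions made for this sketch; Part 1 only asks for the method, not the calculation.

```python
import math

# Counts from problems 2 and 3: males and sample sizes for MBA students, managers, CEOs.
males = [110, 300, 80]
sizes = [200, 500, 100]


def chi_squared_homogeneity(successes, totals):
    """Return (chi-squared statistic, degrees of freedom) for H0: all proportions equal."""
    pooled = sum(successes) / sum(totals)  # overall proportion under H0
    chi2 = 0.0
    for x, n in zip(successes, totals):
        # Two cells per sample: successes and failures.
        for observed, expected in ((x, n * pooled), (n - x, n * (1 - pooled))):
            chi2 += (observed - expected) ** 2 / expected
    return chi2, len(totals) - 1


chi2, df = chi_squared_homogeneity(males, sizes)

CHI2_CRIT_05_2DF = 5.991  # assumed 5% table value for 2 degrees of freedom
reject_h0 = chi2 > CHI2_CRIT_05_2DF

# With exactly 2 degrees of freedom the upper tail of chi-squared is exp(-x/2).
p_value = math.exp(-chi2 / 2)

print(f"chi-squared = {chi2:.3f}, df = {df}, reject H0: {reject_h0}, p-value = {p_value:.5f}")
```

The closed-form p-value only holds for 2 degrees of freedom; with more samples a chi-squared table or a statistics package would be needed.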
4. You have two machines that plop fruit into bottles, a new one and an old one. A sample of weights of 10
bottles from the old machine is taken, the average weight is 971.375 grams with a standard deviation of
15.250 grams. A sample of weights is taken from the new machine and the average weight turns out to be
971.374 grams with a standard deviation of 11.001 grams. If variability is a measure of reliability, can we
say that the new machine is more reliable than the old one?
Solution: The variance (or standard deviation) is a measure of variability, which is the opposite of
consistency or reliability, especially in this case where the difference in means hardly exists. You need to
test the equality of the variances. ‘Less consistent or reliable’ means a larger variance, so the new machine
(machine 2) being more reliable than the old one translates as "$\sigma_1^2 > \sigma_2^2$?" so we have
$H_0: \sigma_1^2 \le \sigma_2^2$, $H_1: \sigma_1^2 > \sigma_2^2$ or $H_0: \sigma_1 \le \sigma_2$, $H_1: \sigma_1 > \sigma_2$.
In terms of the variance ratio you are testing $H_0: \sigma_1^2/\sigma_2^2 \le 1$, $H_1: \sigma_1^2/\sigma_2^2 > 1$
and you will do this by comparing $s_1^2/s_2^2$ against $F^{(n_1-1,\,n_2-1)}$. This is Method D7.
Question for a Later Exam: The F-test assumes that the underlying distribution is Normal. What if you
doubt that the Normal Distribution applies? Solution: Levene Test.
Question for a Later Exam: What if there are 3 machines? Solution: Levene or Bartlett Test.
5. The Wallaby Shock Absorber company takes 6 of its own shock absorbers and tests them for durability
by driving different cars 20000 miles with them. The mean and standard deviation of the strength of the shocks were
recorded, giving a mean of 10.716 and a standard deviation of 3.069. 6 of a competitor’s shocks were tested
the same way, and a mean of 10.300 and a standard deviation of 3.304 were found. The manufacturer
wants to compare the means, and assumes an underlying Normal distribution, but needs to find out first
whether to use method D2 or D3. What should the manufacturer do to decide?
Solution: To make the decision as to which method to use, you need to test the equality of the variances.
The hypotheses are $H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 \ne \sigma_2^2$ or $H_0: \sigma_1 = \sigma_2$, $H_1: \sigma_1 \ne \sigma_2$.
In terms of the variance ratio $\sigma_1^2/\sigma_2^2$ or $\sigma_2^2/\sigma_1^2$, you are testing
$H_0: \sigma_1^2/\sigma_2^2 = 1$, $H_1: \sigma_1^2/\sigma_2^2 \ne 1$ or $H_0: \sigma_2^2/\sigma_1^2 = 1$, $H_1: \sigma_2^2/\sigma_1^2 \ne 1$.
In practice, this is a right-sided test because $s_1 = 3.069$ and $s_2 = 3.304$.
So you will compare $s_2^2/s_1^2$ with $F^{(5,5)}_{\alpha/2}$. This is Method D7.
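A minimal sketch of the variance-ratio comparison in problem 5 follows. It assumes a 5% significance level, so the upper critical value is taken as 7.15, the standard table value for F with (5, 5) degrees of freedom at .025; the variable names are made up for this illustration.

```python
# Sample standard deviations and sizes quoted in problem 5.
s1, n1 = 3.069, 6  # Wallaby's shocks
s2, n2 = 3.304, 6  # competitor's shocks

# Put the larger sample variance on top so that only the right tail matters.
if s2 >= s1:
    f_ratio = s2 ** 2 / s1 ** 2
    df = (n2 - 1, n1 - 1)
else:
    f_ratio = s1 ** 2 / s2 ** 2
    df = (n1 - 1, n2 - 1)

F_CRIT_025_5_5 = 7.15  # assumed table value, alpha/2 = .025 with (5, 5) df
reject_equal_variances = f_ratio > F_CRIT_025_5_5

print(f"F ratio = {f_ratio:.3f} with df = {df}; reject equal variances: {reject_equal_variances}")
```

If the ratio does not exceed the critical value, equal variances are not rejected and Method D2 becomes a reasonable choice for comparing the means.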
6. The manufacturer in the previous example never did decide what to do. Instead Wallaby continued the
experiment by testing 120 of its own shocks and 90 of the competitor’s. For Wallaby’s shocks the mean was
now 10.701 with a standard deviation of 3.051. For the competitor the mean was
now 10.422 with a standard deviation of 3.043. What method can they now use to compare the average
strength of the shocks?
Solution: The word ‘average’ makes you think of the mean. You are comparing two means, with a total
sample size of 210. There is no reason to assume that one mean is larger than the other. So
$H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$ or $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$.
If $D = \mu_1 - \mu_2$, then $H_0: D = 0$, $H_1: D \ne 0$. You can certainly get away with method D1
which works for large sample sizes and this would be preferred if you must work with a calculator. D3
would also work and would be used on a computer, but it’s more effort. Of course, you could still test for
equal variances and use method D2.
Question for a Later Exam: What if we want to compare three or more firms’ shock absorbers? Solution:
One-way ANOVA.
7. Assume that the situation is identical to problem 5 above, but that an analysis of the data indicates that
the distribution of strengths is highly skewed to the right. What method should be used now to compare the
strength of the shocks?
Solution: The skewness of the data should alert you to the fact that you should compare medians rather than
means, since it is unlikely for this small a sample that the sample means are normally distributed. If $\eta_1$ is
the median for Wallaby’s shocks, we have $H_0: \eta_1 = \eta_2$, $H_1: \eta_1 \ne \eta_2$. Since we are comparing
medians and the data are not paired, use Method D5a.
Question for a Later Exam: What if we want to compare three or more firms’ shock absorbers? Solution:
Kruskal-Wallis
8. A sample of ten customers are asked to rate their experience with two service firms on a scale of one to
ten. Scores are as follows.
Customer   Firm A   Firm B
    1         4        7
    2         5        6
    3         9        6
    4         5        7
    5         4        5
    6         9        8
    7         5        4
    8         7        6
    9         3        5
   10         3        6
If we assume that the distribution of results is Normal, what method should we use to see if there is a
difference between average ratings customers give to the firms?
Solution: You are comparing the mean ratings that the same customers give to two firms. You can get away with using means
because the parent distributions are Normal. If $\mu_1$ is the mean rating for Firm A and $\mu_2$ is the mean rating for Firm B,
the question asks only whether there is a difference, $\mu_1 \ne \mu_2$, which, because it contains no equality, is an alternate hypothesis. So your hypotheses are
$H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$ or $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$.
If $D = \mu_1 - \mu_2$, then $H_0: D = 0$, $H_1: D \ne 0$. The important thing to notice
here is that the data are in pairs, one pair per customer, so you use Method D4.
9. Normally in a problem like problem 8, we should not assume an underlying Normal distribution. What
method would we use if we do not assume that the underlying distribution was Normal?
Solution: If $\eta_1$ is the median rating for Firm A and $\eta_2$ is the median rating for Firm B, we have
$H_0: \eta_1 = \eta_2$, $H_1: \eta_1 \ne \eta_2$. Since we are
comparing medians and the data are paired, use Method D5b.
10. We have data for 15 recent oil spills caused by fire and 15 oil spills caused by collision. We want to
show that the spills caused by collision are worse than those caused by fire, and we have evidence that the
variability of the spills caused by collision is far larger than the variability of spills caused by fire. Assume
that the severity of the spill is measured by the number of gallons spilled. What method should we use?
Solution: It is hard to think of anything but average size of spills that would be compared here, though per
cent of environment damages is a possibility. You can get away with using means if the parent distributions
are Normal. If $\mu_2$ is the mean of the second sample, you are expecting that $\mu_2 > \mu_1$, which, because it
contains no equality, is an alternate hypothesis. So your hypotheses are $H_0: \mu_1 \ge \mu_2$, $H_1: \mu_1 < \mu_2$
or $H_0: \mu_1 - \mu_2 \ge 0$, $H_1: \mu_1 - \mu_2 < 0$. If $D = \mu_1 - \mu_2$, then $H_0: D \ge 0$, $H_1: D < 0$.
An important thing to notice here is that there is no
obvious way to pair the data and that the variances seem different, so you use Method D3 unless you are
really suspicious that the data are not Normal, in which case you use method D5a.
Question for a Later Exam: What if we want to compare the scores of these individuals on three or more
exams? Solution: Friedman
11. Students are asked to rate the reality shows produced by ABC and Fox on a scale of 1 to 10. Analysis
of the data set, which can be considered ordinal data, indicates that the distribution is not symmetrical. A
random sample of 30 ratings of Fox shows and 25 ratings of ABC shows is given and the researcher wants to
prove that Fox produces better shows.
Solution: The things that should alert you to the fact that you should compare medians rather than means
are the fact that this is ordinal data and the fact that the distribution is not symmetrical. If $\eta_1$ is the median
for Fox and $\eta_2$ is the median for ABC, we have $H_0: \eta_1 \le \eta_2$, $H_1: \eta_1 > \eta_2$. Since we are comparing medians and the
data are not paired, use Method D5a.
12. Unemployment rates are found for 20 urban communities and 10 university communities in
Pennsylvania. The researcher wants to show that workers in the second group of communities are better off
than workers in the first group. Average unemployment rates are computed and variances of employment
rates between communities of each type seem similar.
Solution: The word ‘average’ makes you think of the mean. You are comparing two means, with a total
sample size of 30, which is relatively small. Workers in the second group of communities are better off if
the unemployment rate is lower – this translates as "$\mu_1 > \mu_2$?". Since it does not contain an equality, it
must be an alternate hypothesis. So $H_0: \mu_1 \le \mu_2$, $H_1: \mu_1 > \mu_2$ or
$H_0: \mu_1 - \mu_2 \le 0$, $H_1: \mu_1 - \mu_2 > 0$. If $D = \mu_1 - \mu_2$, then
$H_0: D \le 0$, $H_1: D > 0$. You can certainly get away with method D3, but it’s more work than D2, which the similar
variance tells us ought to work.
Part 2: Do problems 1 and 2 in part 1 using a 95% confidence level. Find p-values.
1. Dora Jarr and Daughters is a maker of components for automobile dashboards. When Dora retired, her
company’s stock became publicly traded. A sample of 160 stock analysts were asked whether they rated the
stock as a ‘buy’ in 2007 and again in 2008. 79 analysts rated the stock a ‘buy’ in both 2007 and 2008. 15
analysts recommended it as a ‘buy’ in 2007 but not in 2008. 9 analysts upgraded the stock to a ‘buy’ in
2008. The remaining analysts did not consider the stock a ‘buy’ in either year. Can we say that the
proportion of analysts who favor the stock has fallen?
Solution: This can be called a paired comparison of proportions and the method is D6b, the McNemar Test.
Let $p_1$ represent the proportion of analysts that rated the stock as a ‘buy’ in 2007 and $p_2$ represent the
proportion of analysts that rated the stock as a ‘buy’ in 2008. We want to test to see if $p_1 > p_2$. This is an
alternative hypothesis because it does not contain an equality. The null hypothesis is the opposite, $p_1 \le p_2$.
$x_{11} = 79$ analysts rated the stock a ‘buy’ in both 2007 and 2008. $x_{12} = 15$ analysts recommended it as a
‘buy’ in 2007 but not in 2008. $x_{21} = 9$ analysts upgraded the stock to a ‘buy’ in 2008. The remaining
$x_{22} = 160 - 79 - 15 - 9 = 57$ analysts did not consider the stock a ‘buy’ in either year. Our hypotheses are
given along with the table to be analyzed: $H_0: p_1 \le p_2$, $H_1: p_1 > p_2$ or, if $\Delta p = p_1 - p_2$,
$H_0: \Delta p \le 0$, $H_1: \Delta p > 0$. The general design of the table is

                       question 2
                       yes        no
question 1    yes    $x_{11}$   $x_{12}$
              no     $x_{21}$   $x_{22}$

in this case

                  2008
                  buy    don't
2007    buy        79     15
        don't       9     57

We will compare
$z = \frac{x_{12} - x_{21}}{\sqrt{x_{12} + x_{21}}} = \frac{15 - 9}{\sqrt{15 + 9}} = \sqrt{\frac{6^2}{24}} = \sqrt{1.500} = 1.225$
against $z_{\alpha} = 1.645$. Make a diagram of a Normal curve
with 0 in the middle and a ‘reject’ region above 1.645. Since 1.225 is not in the ‘reject’ region, do not reject
the null hypothesis. Since the alternative hypothesis is $H_1: \Delta p > 0$, the p-value is the probability that $\Delta\bar{p}$ is
greater than or equal to the value it actually takes. p-value $= P(z \ge 1.22) = .5 - .3888 = .1112$. Because this
is above .05, we cannot reject the null hypothesis.
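The McNemar calculation above can be checked with a few lines of Python. This is only a sketch using the standard library; the helper names `mcnemar_z` and `normal_upper_tail` are invented here, and the p-value comes from the error function rather than from the Normal table, so it differs slightly from .1112.

```python
import math


def normal_upper_tail(z):
    """P(Z >= z) for a standard Normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def mcnemar_z(x12, x21):
    """McNemar test ratio: only the discordant counts x12 and x21 enter."""
    return (x12 - x21) / math.sqrt(x12 + x21)


# x12 = 'buy' in 2007 only, x21 = 'buy' in 2008 only.
z = mcnemar_z(15, 9)
p_value = normal_upper_tail(z)  # right-sided, because H1 is p1 > p2

Z_CRIT_05 = 1.645  # one-sided 5% critical value used in the text
reject_h0 = z > Z_CRIT_05

print(f"z = {z:.3f}, one-sided p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The computed p-value is about .110, a little below the table-based .1112 because the table lookup rounds z down to 1.22.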
2. Of a sample of 200 MBA students, 110 are males. Of a sample of 500 managers, 300 are males. Is there
a significant difference between the fraction of males in the population of MBA students and the population
of managers? (What are H 0 and H 1 and what is the identifier of the method you would use?)
Solution: You are comparing two proportions. If $p_1$ refers to the proportion of males among MBA
students and $p_2$ refers to the proportion of males among managers, the sample proportions are
$\bar{p}_1 = x_1/n_1 = 110/200 = .550$ and $\bar{p}_2 = x_2/n_2 = 300/500 = .600$. The hypotheses are
$H_0: p_1 = p_2$, $H_1: p_1 \ne p_2$ or $H_0: p_1 - p_2 = 0$, $H_1: p_1 - p_2 \ne 0$.
If $\Delta p = p_1 - p_2$, then $H_0: \Delta p = 0$, $H_1: \Delta p \ne 0$. Since we are comparing
proportions of unrelated samples, use Method D6a. The relevant part of Table 3 follows.
Interval for: Difference between proportions, $q = 1 - p$
Confidence Interval: $\Delta p = \Delta\bar{p} \pm z_{\alpha/2}\, s_{\Delta p}$, where $\Delta\bar{p} = \bar{p}_1 - \bar{p}_2$ and $s_{\Delta p} = \sqrt{\frac{\bar{p}_1\bar{q}_1}{n_1} + \frac{\bar{p}_2\bar{q}_2}{n_2}}$
Hypotheses: $H_0: \Delta p = \Delta p_0$, $H_1: \Delta p \ne \Delta p_0$, where $\Delta p_0 = p_{01} - p_{02}$ or $\Delta p_0 = 0$
Test Ratio: $z = \frac{\Delta\bar{p} - \Delta p_0}{\sigma_{\Delta p}}$
  If $\Delta p_0 \ne 0$: $\sigma_{\Delta p} = \sqrt{\frac{p_{01}q_{01}}{n_1} + \frac{p_{02}q_{02}}{n_2}}$, or use $s_{\Delta p}$
  If $\Delta p_0 = 0$: $\sigma_{\Delta p} = \sqrt{\bar{p}_0\bar{q}_0\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$, where $\bar{p}_0 = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2}$
Critical Value: $\Delta\bar{p}_{cv} = \Delta p_0 \pm z_{\alpha/2}\, \sigma_{\Delta p}$

$\bar{p}_1 = .550$ and $\bar{p}_2 = .600$. This means $\Delta\bar{p} = \bar{p}_1 - \bar{p}_2 = .550 - .600 = -.050$. $n_1 = 200$, $n_2 = 500$ and the
overall proportion of males in the two groups is $\bar{p}_0 = \frac{110 + 300}{200 + 500} = \frac{410}{700} = .58571$. We have
$\bar{q}_1 = 1 - \bar{p}_1 = 1 - .550 = .450$, $\bar{q}_2 = 1 - \bar{p}_2 = 1 - .600 = .400$ and $\bar{q}_0 = 1 - \bar{p}_0 = 1 - .58571 = .41429$. So we
can say $\sigma_{\Delta p} = \sqrt{\bar{p}_0\bar{q}_0\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{.58571(.41429)\left(\frac{1}{200} + \frac{1}{500}\right)} = \sqrt{.24265(.00500 + .00200)}$
$= \sqrt{.001699} = .041214$. If the significance level is 5%, use $z_{.025} = 1.960$.

(i). Confidence Interval: $\Delta p = \Delta\bar{p} \pm z_{\alpha/2}\, s_{\Delta p}$ or $(p_1 - p_2) = (\bar{p}_1 - \bar{p}_2) \pm z_{\alpha/2}\, s_{\Delta p}$, where
$s_{\Delta p} = \sqrt{\frac{\bar{p}_1\bar{q}_1}{n_1} + \frac{\bar{p}_2\bar{q}_2}{n_2}} = \sqrt{\frac{.550(.450)}{200} + \frac{.600(.400)}{500}} = \sqrt{.001238 + .000480}$
$= \sqrt{.001718} = .041443$. The confidence interval for the difference in the two proportions
is $\Delta p = \Delta\bar{p} \pm z_{\alpha/2}\, s_{\Delta p} = -.050 \pm 1.960(.041443) = -.050 \pm 0.081$. We can compare this
interval with $\Delta p_0 = 0$. Since 0.081 is larger in absolute value than -.050, the interval
includes zero and we cannot reject the null hypothesis.

(ii). Test Ratio: $z = \frac{\Delta\bar{p} - \Delta p_0}{\sigma_{\Delta p}} = \frac{(\bar{p}_1 - \bar{p}_2) - (p_{10} - p_{20})}{\sigma_{\Delta p}} = \frac{-.050 - 0}{.041214} = -1.213$, where $p_{10}$
and $p_{20}$ come from the null hypothesis if specified and $\sigma_{\Delta p} = \sqrt{\frac{p_1q_1}{n_1} + \frac{p_2q_2}{n_2}}$, although
$s_{\Delta p}$ may have to be used if $p_1$ and $p_2$ are unknown, and if the null hypothesis is
$p_1 = p_2$ or $\Delta p_0 = 0$, we use $\sigma_{\Delta p} = \sqrt{\bar{p}_0\bar{q}_0\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$, where $\bar{p}_0 = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2} = \frac{x_1 + x_2}{n_1 + n_2}$
and $x_1$ and $x_2$ are the number of successes in sample 1 and sample 2,
respectively. For a 95% confidence level the computed value of z must be below -1.960
or above 1.960 to reject the null hypothesis. Make a diagram with zero in the middle and
shade the two rejection regions. Since -1.213 does not fall in a rejection region we cannot
reject the null hypothesis.
Since this is a two-sided test, p-value $= 2P(\Delta\bar{p} \le -.050) = 2P(z \le -1.21) = 2(.5 - .3869)$
$= .2262$. Because this is above $\alpha = .05$, we cannot reject the null hypothesis.

(iii). Critical Value: $\Delta\bar{p}_{cv} = \Delta p_0 \pm z_{\alpha/2}\, \sigma_{\Delta p}$ or $(\bar{p}_1 - \bar{p}_2)_{cv} = (p_{10} - p_{20}) \pm z_{\alpha/2}\, \sigma_{\Delta p}$
$= 0 \pm 1.960(.041214) = \pm .081$. Make a diagram with zero in the middle and shade the
two rejection regions, one above .081 and the other below -.081. Test this against
$\Delta\bar{p} = \bar{p}_1 - \bar{p}_2 = -.050$. Since -.050 does not fall in a rejection region, we cannot reject
the null hypothesis.
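The three approaches above (confidence interval, test ratio, critical value) can be reproduced with a short Python sketch. The function name `two_proportion_test` and the dictionary keys are hypothetical; as in the text, the pooled standard error is used for the test ratio and critical value, and the unpooled one for the confidence interval.

```python
import math


def two_proportion_test(x1, n1, x2, n2, z_crit=1.960):
    """Two-sided test of H0: p1 = p2 plus a confidence interval for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    p0 = (x1 + x2) / (n1 + n2)  # pooled proportion under H0
    se_pooled = math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
    se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = diff / se_pooled
    # Two-sided p-value: 2 * P(Z >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return {
        "diff": diff,
        "z": z,
        "p_value": p_value,
        "ci": (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled),
        "critical_diff": z_crit * se_pooled,  # reject if |diff| exceeds this
    }


result = two_proportion_test(110, 200, 300, 500)
print(result)
```

All three views lead to the same decision here: the interval contains zero, |z| is below 1.960, and |-.050| is below the critical difference of about .081.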
For completeness, an annotated Minitab printout of the problem above follows.
————— 3/7/2008 4:11:22 PM ————————————————————
Welcome to Minitab, press F1 for help.
MTB > PTwo 200 110 500 300;
SUBC>   Pooled.

(Instruction needs n1, x1, n2, x2. We have used here the std error computed for
the test ratio or critical value above.)

Test and CI for Two Proportions

Sample    X    N  Sample p
1       110  200  0.550000
2       300  500  0.600000

Difference = p (1) - p (2)
Estimate for difference: -0.05
95% CI for difference: (-0.131226, 0.0312263)
Test for difference = 0 (vs not = 0): Z = -1.21 P-Value = 0.225
Fisher's exact test: P-Value = 0.235

(The .225 p-value is the same as above except for rounding error.)

MTB > PTwo 200 110 500 300.

(We have used here the std error computed for the confidence interval above.)

Test and CI for Two Proportions

Sample    X    N  Sample p
1       110  200  0.550000
2       300  500  0.600000

Difference = p (1) - p (2)
Estimate for difference: -0.05
95% CI for difference: (-0.131226, 0.0312263)
Test for difference = 0 (vs not = 0): Z = -1.21 P-Value = 0.228
Fisher's exact test: P-Value = 0.235

(If you really want to know what Fisher’s exact test is, I have documentation.)
Part 3. (Extra Credit) Invent and solve 4 problems. One each for methods D1 thru D4.
Obviously, there is a lot of room for whimsy here. Entries will be judged for originality.
Here is a possibility.
The data below represent samples of cost overruns (in millions of dollars) over a period of two years by two
companies in Federal contracts. Below I have calculated the means, the standard errors and the standard deviations
for the columns. If we are to choose one of these firms for future contracts on the basis of cost overruns, do
we have enough information to make a choice?
Row   Company 1   Company 2   Difference
        $x_1$       $x_2$     $d = x_1 - x_2$
  1      1.49        0.69         0.8
  2      3.69        1.49         2.2
  3      6.79        0.69         6.1
  4      1.09        1.99        -0.9
  5      0.09        0.99        -0.9
  6      0.89       -0.11         1.0
  7      3.99       -1.31         5.3
  8     -1.21       -0.11        -1.1
  9      0.89        1.99        -1.1
 10      0.19       -2.31         2.5
 11      0.59       -0.41         1.0

$n_1 = 11$, $\bar{x}_1 = 1.681$, $s_{\bar{x}_1} = 0.684$, $s_1 = 2.267$
$n_2 = 11$, $\bar{x}_2 = 0.326$, $s_{\bar{x}_2} = 0.406$, $s_2 = 1.347$
$n_d = 11$, $\bar{d} = 1.355$, $s_{\bar{d}} = 0.756$, $s_d = 2.508$
Note: In case you didn’t know, cost overruns are undesirable. We only have a basis for choice if we can
show a significant difference between the mean cost overruns. I am assuming a significance level of 5%.
Problem 1: The data above represent cost overruns, where each line represents jobs of comparable size and
scope.
Problem 2: The data above represents two independent samples of cost overruns by the two firms in
government contracts.
Problem 3: The data above represents two independent samples of cost overruns. Use a test for equality of
variances before deciding on what method to use.
Problem 4: Assume that the means and variances that you got from the columns apply to two independent
random samples, each of size 80.
I will only use the t-ratio in the solutions that follow.
Problem 1: The data above represent cost overruns, where each line represents jobs of comparable size and
scope.
If the lines of the table represent numbers with a unique relationship between them we have paired
data, Method D4.
Interval for: Difference between Two Means (paired data)
Confidence Interval: $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$, where $d = x_1 - x_2$ and $df = n - 1$ with $n_1 = n_2 = n$
Hypotheses: $H_0: D = D_0$ *, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$
Test Ratio: $t = \frac{\bar{d} - D_0}{s_{\bar{d}}}$, where $s_{\bar{d}} = \frac{s_d}{\sqrt{n}}$
Critical Value: $\bar{d}_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}}$

$n_d = 11$, $\bar{d} = 1.355$, $s_{\bar{d}} = 0.756$, $s_d = 2.508$, $df = 11 - 1 = 10$.
$t = \frac{\bar{d} - D_0}{s_{\bar{d}}} = \frac{1.355 - 0}{0.756} = 1.792$. Since this is a two-sided test, the ‘do not reject’ region is between
$\pm t^{(10)}_{.025} = \pm 2.228$. Since our computed t is between these two values, do not reject the null hypothesis.
This agrees with the Minitab solution below.
MTB > Paired c5 c6.

Paired T-Test and CI: xc1, xc2

Paired T for xc1 - xc2
             N     Mean    StDev  SE Mean
xc1         11  1.68091  2.26736  0.68363
xc2         11  0.32636  1.34705  0.40615
Difference  11  1.35455  2.50773  0.75611

95% CI for mean difference: (-0.33017, 3.03926)
T-Test of mean difference = 0 (vs not = 0): T-Value = 1.79  P-Value = 0.103
Note that since the p-value is above the 5% significance level, we cannot reject the null hypothesis and thus
have no basis for choosing between the firms.
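For readers without Minitab, the paired calculation (Method D4) can be sketched in Python with only the standard library. The data are the two columns of cost overruns from the table; the critical value 2.228 is the table value quoted in the solution, and the variable names are made up for this illustration.

```python
import math
import statistics

# Cost overruns (millions of dollars) for the 11 paired jobs.
x1 = [1.49, 3.69, 6.79, 1.09, 0.09, 0.89, 3.99, -1.21, 0.89, 0.19, 0.59]
x2 = [0.69, 1.49, 0.69, 1.99, 0.99, -0.11, -1.31, -0.11, 1.99, -2.31, -0.41]

d = [a - b for a, b in zip(x1, x2)]  # paired differences
n = len(d)
d_bar = statistics.mean(d)
s_d = statistics.stdev(d)            # sample standard deviation of the differences
se_d_bar = s_d / math.sqrt(n)        # standard error of the mean difference
t_ratio = (d_bar - 0) / se_d_bar     # H0: D = 0

T_CRIT_025_10DF = 2.228              # table value for 10 degrees of freedom
reject_h0 = abs(t_ratio) > T_CRIT_025_10DF

print(f"d-bar = {d_bar:.5f}, s_d = {s_d:.5f}, t = {t_ratio:.3f}, reject H0: {reject_h0}")
```

The sketch stops at the comparison with the table value because an exact t-distribution p-value is not available in the standard library.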
Problem 2: The data above represents two independent samples of cost overruns.
If the variances are not equal and we are assuming a Normally distributed population, we can use
method D3, the Satterthwaite approximation. The formula table has the following formulas.
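Interval for: Difference between Two Means ($\sigma$ unknown, variances assumed unequal)
Confidence Interval: $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$
Hypotheses: $H_0: D = D_0$ *, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$
Test Ratio: $t = \frac{\bar{d} - D_0}{s_{\bar{d}}}$, where $s_{\bar{d}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ and
  $DF = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$
Critical Value: $\bar{d}_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}}$

$n_1 = 11$, $\bar{x}_1 = 1.681$, $s_{\bar{x}_1} = 0.684$, $s_1 = 2.267$
$n_2 = 11$, $\bar{x}_2 = 0.326$, $s_{\bar{x}_2} = 0.406$, $s_2 = 1.347$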
This is the worksheet that I recommended for Method D3.
Sample 1: $\frac{s_1^2}{n_1} = 0.684^2 = 0.467856$
Sample 2: $\frac{s_2^2}{n_2} = 0.406^2 = 0.164836$
Sum: $\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} = 0.632692$
$\bar{d} = 1.355$
Thus $s_{\bar{d}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{0.632692} = 0.79542$ and
$DF = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} = \frac{0.632692^2}{\frac{0.467856^2 + 0.164836^2}{10}} = \frac{10(0.40030)}{.246060} = 16.27$,
which we round down to 16 degrees of freedom.
$t = \frac{\bar{d} - D_0}{s_{\bar{d}}} = \frac{1.355 - 0}{0.79542} = 1.703$. Since this is a two-sided test, the ‘do not reject’ region is between
$\pm t^{(16)}_{.025} = \pm 2.120$. Since our computed t is between these two values, do not reject the null hypothesis.
MTB > TwoSample c5 c6.

Two-Sample T-Test and CI: xc1, xc2

Two-sample T for xc1 vs xc2
      N  Mean  StDev  SE Mean
xc1  11  1.68   2.27     0.68
xc2  11  0.33   1.35     0.41

Difference = mu (xc1) - mu (xc2)
Estimate for difference: 1.35455
95% CI for difference: (-0.33116, 3.04026)
T-Test of difference = 0 (vs not =): T-Value = 1.70  P-Value = 0.108  DF = 16
Note that since the p-value is above the 5% significance level, we cannot reject the null hypothesis and thus
have no basis for choosing between the firms.
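The Method D3 worksheet can be mirrored in a few lines of Python. This is a sketch only: the function name `satterthwaite_t` is invented, and the inputs are the more exact means and standard deviations from the Minitab printouts rather than the rounded values in the worksheet, so the last digits may differ slightly.

```python
import math


def satterthwaite_t(mean1, s1, n1, mean2, s2, n2, d0=0.0):
    """Test ratio, standard error and approximate DF for unequal variances."""
    v1 = s1 ** 2 / n1  # s1^2 / n1
    v2 = s2 ** 2 / n2  # s2^2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    t_ratio = (mean1 - mean2 - d0) / se
    return t_ratio, se, df


t_ratio, se, df = satterthwaite_t(1.68091, 2.26736, 11, 0.32636, 1.34705, 11)
df_rounded_down = math.floor(df)  # the degrees of freedom are rounded down

print(f"t = {t_ratio:.3f}, s_d-bar = {se:.5f}, DF = {df:.2f} -> {df_rounded_down}")
```

Rounding the degrees of freedom down is the conservative choice and matches the DF = 16 that Minitab reports.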
Problem 3: The data above represents two independent samples of cost overruns. Use a test for equality of
variances before deciding on what method to use.
All right, I cheated and ran the variance comparison on Minitab.
MTB > VarTest c5 c6;
SUBC>   Unstacked.

Test for Equal Variances: xc1, xc2

95% Bonferroni confidence intervals for standard deviations
      N    Lower    StDev    Upper
xc1  11  1.50962  2.26736  4.35771
xc2  11  0.89687  1.34705  2.58894

F-Test (normal distribution)
Test statistic = 2.83, p-value = 0.116

Levene's Test (any continuous distribution)
Test statistic = 0.57, p-value = 0.458
Since both tests give us a p-value above our significance level, we cannot reject the null hypothesis
of equal variances. The formula table gives us the formulas below. (Method D2 – Comparison of two means
with samples coming from populations with similar variances.)
Interval for: Difference between Two Means ($\sigma$ unknown, variances assumed equal)
Confidence Interval: $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$
Hypotheses: $H_0: D = D_0$ *, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$
Test Ratio: $t = \frac{\bar{d} - D_0}{s_{\bar{d}}}$, where $s_{\bar{d}} = \hat{s}_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$,
  $\hat{s}_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ and $DF = n_1 + n_2 - 2$
Critical Value: $\bar{d}_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}}$

$n_1 = 11$, $\bar{x}_1 = 1.681$, $s_{\bar{x}_1} = 0.684$, $s_1 = 2.267$
$n_2 = 11$, $\bar{x}_2 = 0.326$, $s_{\bar{x}_2} = 0.406$, $s_2 = 1.347$
$\bar{d} = 1.355$
Since our samples are of equal size we can use the simplified formula for the pooled variance.
$\hat{s}_p^2 = \frac{(11 - 1)2.267^2 + (11 - 1)1.347^2}{11 + 11 - 2} = \frac{2.267^2 + 1.347^2}{2} = 3.4768$, so $\hat{s}_p = \sqrt{3.4768} = 1.8646$.
$s_{\bar{d}} = \sqrt{3.4768\left(\frac{1}{11} + \frac{1}{11}\right)} = \sqrt{0.6322} = 0.7951$ and $df = 11 + 11 - 2 = 20$.
Note that if I use the more exact values for the standard deviations given in the Minitab printout above, I get
a pooled standard deviation of 1.8649. Note that $s_{\bar{d}}$ and the t-ratio are the same as in the last version except
for a rounding error.
$t = \frac{\bar{d} - D_0}{s_{\bar{d}}} = \frac{1.355 - 0}{0.7951} = 1.704$. Since this is a two-sided test, the ‘do not reject’ region is between
$\pm t^{(20)}_{.025} = \pm 2.086$. Since our computed t is between these two values, do not reject the null hypothesis.
MTB > TwoSample c5 c6;
SUBC>   Pooled.

Two-Sample T-Test and CI: xc1, xc2

Two-sample T for xc1 vs xc2
      N  Mean  StDev  SE Mean
xc1  11  1.68   2.27     0.68
xc2  11  0.33   1.35     0.41

Difference = mu (xc1) - mu (xc2)
Estimate for difference: 1.35455
95% CI for difference: (-0.30417, 3.01327)
T-Test of difference = 0 (vs not =): T-Value = 1.70  P-Value = 0.104  DF = 20
Both use Pooled StDev = 1.8649
Note that since the p-value is above the 5% significance level, we cannot reject the null hypothesis and thus
have no basis for choosing between the firms.
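The pooled-variance (method D2) arithmetic above can be checked with a short Python sketch. It works from the summary statistics quoted in the text. The function name `pooled_t_test` is hypothetical, and the critical value 2.086 is copied from the text rather than computed.

```python
import math

def pooled_t_test(n1, mean1, s1, n2, mean2, s2, d0=0.0):
    """Pooled-variance t ratio (method D2) from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance: degrees-of-freedom weighted average of the sample variances.
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    # Standard error of the difference between the two sample means.
    s_dbar = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_ratio = (mean1 - mean2 - d0) / s_dbar
    return math.sqrt(pooled_var), s_dbar, t_ratio, df

T_CRIT_20 = 2.086  # t(.025) with 20 df, taken from the text, not computed here

s_p, s_dbar, t_ratio, df = pooled_t_test(11, 1.681, 2.267, 11, 0.326, 1.347)
d_bar = 1.681 - 0.326
ci = (d_bar - T_CRIT_20 * s_dbar, d_bar + T_CRIT_20 * s_dbar)
reject = abs(t_ratio) > T_CRIT_20

print(f"s_p = {s_p:.4f}, s_dbar = {s_dbar:.4f}, t = {t_ratio:.3f}, df = {df}")
print(f"95% CI for D: ({ci[0]:.4f}, {ci[1]:.4f}), reject H0: {reject}")
```

Because the inputs are rounded to three decimals, the interval differs slightly from the Minitab interval, which uses unrounded column data.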
Problem 4: Assume that the means and variances that you got from the columns apply to two independent
random samples, each of size 80.
If the sample size is very large, we can use method D1 with our values of the sample standard
deviation replacing the population standard deviation. The formula table has the following formulas.
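Interval for: Difference between Two Means ($\sigma$ known)
Confidence Interval: $D = \bar d \pm z_{\alpha/2}\, \sigma_{\bar d}$
Hypotheses: $H_0: D = D_0$ *, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$
Test Ratio: $z = \dfrac{\bar d - D_0}{\sigma_{\bar d}}$
Critical Value: $\bar d_{cv} = D_0 \pm z_{\alpha/2}\, \sigma_{\bar d}$
where $\sigma_{\bar d} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$ and $\bar d = \bar x_1 - \bar x_2$.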
$n_1 = 80$, $\bar x_1 = 1.681$, $s_1 = 2.267$
$n_2 = 80$, $\bar x_2 = 0.326$, $s_2 = 1.347$
$\bar d = 1.355$
$s_{\bar d} = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} = \sqrt{\dfrac{2.267^2}{80} + \dfrac{1.347^2}{80}} = \sqrt{0.08692} = 0.2948$
$z = \dfrac{\bar d - D_0}{s_{\bar d}} = \dfrac{1.355 - 0}{0.2948} = 4.596$. Since this is a two-sided test, the ‘do not reject’ region is between
$\pm z_{.025} = \pm 1.960$. Since our computed z is not between these two values, reject the null hypothesis. Of course,
we cannot force the computer to use a z test, but the results below produce an essentially identical p-value,
since for the above, we have p-value $= 2[.5 - P(0 \le z \le 4.59)] = 2[.5 - .5] \approx 0$.
MTB > TwoT 80 1.68071 2.26736 80 0.32636 1.34705.
Two-Sample T-Test and CI
Sample   N  Mean  StDev  SE Mean
1       80  1.68   2.27     0.25
2       80  0.33   1.35     0.15
Difference = mu (1) - mu (2)
Estimate for difference: 1.35435
95% CI for difference: (0.77092, 1.93778)
T-Test of difference = 0 (vs not =): T-Value = 4.59  P-Value = 0.000  DF = 128
Note that since the p-value is not above the 5% significance level, we can reject the null hypothesis and thus
have a basis for choosing between the firms.
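The large-sample z calculation for Problem 4 can be reproduced with the Python standard library alone. This is a minimal sketch: the function name `two_sample_z` is hypothetical, and the two-sided p-value is taken from the Normal distribution through `math.erfc` rather than from the t distribution that Minitab uses.

```python
import math

def two_sample_z(n1, mean1, s1, n2, mean2, s2, d0=0.0):
    """Large-sample z ratio (method D1 with s in place of sigma)."""
    # Standard error of the difference between the two sample means.
    s_dbar = math.sqrt(s1**2 / n1 + s2**2 / n2)
    z = (mean1 - mean2 - d0) / s_dbar
    # Two-sided Normal p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return s_dbar, z, p_value

Z_CRIT = 1.960  # z(.025), taken from the text

s_dbar, z, p_value = two_sample_z(80, 1.681, 2.267, 80, 0.326, 1.347)
reject = abs(z) > Z_CRIT

print(f"s_dbar = {s_dbar:.4f}, z = {z:.3f}, p-value = {p_value:.2e}, reject H0: {reject}")
```

The p-value is not exactly zero, but it is far below any conventional significance level, which agrees with the 0.000 that Minitab prints.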
Notice the effect of the restrictiveness of the assumptions. The most restrictive assumption is
assuming paired data. We got a p-value of .103. The assumption of equal variances is considerably less
restrictive, and the p-value rose to .104. When we relaxed the assumption of equal variances, the p-value
rose to .108.
None of this is very impressive, but it does indicate that generally more restrictive assumptions seem to be
yielding more powerful tests. Of course having 80 pairs of numbers instead of 22 trumps any assumptions
and the last test, whether done using method D1 or, more correctly, using method D4 as the computer does,
is much more powerful than the first three.
A large part of the difference in power comes from the size of the denominators in the t ratio.
There are three important rules governing variances. First, if $x_1, x_2, \ldots, x_n$ are independent, as they are if a
sample is random, $\mathrm{Var}(x_1 + x_2 + \cdots + x_n) = \mathrm{Var}(x_1) + \mathrm{Var}(x_2) + \cdots + \mathrm{Var}(x_n)$. Second, if $x_1$ and $x_2$ are not
independent, $\mathrm{Var}(d) = s_d^2 = \mathrm{Var}(x_1 - x_2) = \mathrm{Var}(x_1) + \mathrm{Var}(x_2) - 2\,\mathrm{Cov}(x_1, x_2)$, where $\mathrm{Cov}(x_1, x_2)$ is a
measure of the relationship between $x_1$ and $x_2$ that is positive if the two variables tend to move together.
Note: this is true because $(x - y)^2 = x^2 + y^2 - 2xy$. Finally, $\mathrm{Var}(ax) = a^2\,\mathrm{Var}(x)$. Since
$\bar d = \frac{1}{n}\sum d_i = \sum \frac{d_i}{n}$, and each individual $d_i$ in the sample has the same expected value (mean) and
variance,
$\mathrm{Var}(\bar d) = \mathrm{Var}\left(\sum \frac{d_i}{n}\right) = \sum \mathrm{Var}\left(\frac{d_i}{n}\right) = \sum \mathrm{Var}\left(\frac{1}{n} d\right) = \sum \frac{1}{n^2}\mathrm{Var}(d) = n\,\frac{1}{n^2}\mathrm{Var}(d) = \frac{1}{n}\mathrm{Var}(d)$,
so that
$s_{\bar d}^2 = \mathrm{Var}(\bar d) = \frac{1}{n}\mathrm{Var}(x_1) + \frac{1}{n}\mathrm{Var}(x_2) - \frac{2}{n}\mathrm{Cov}(x_1, x_2)$.
This indicates that, when the covariance is positive, $s_{\bar d}$ for paired data will be smaller than $s_{\bar d}$ for
independent samples where $\mathrm{Cov}(x_1, x_2) = 0$.
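The second variance rule, and its effect on the paired standard error, can be checked numerically. The sketch below uses made-up paired data (the lists `x1` and `x2` are hypothetical and are not the assignment's columns) and a hand-written sample covariance, so it needs no external libraries.

```python
import math

def sample_cov(xs, ys):
    """Sample covariance with an n - 1 divisor; sample_cov(x, x) is the sample variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical paired observations that tend to move together.
x1 = [5.1, 6.3, 7.8, 9.0, 10.4]
x2 = [4.0, 5.9, 6.1, 8.2, 9.5]
d = [a - b for a, b in zip(x1, x2)]
n = len(d)

var_d = sample_cov(d, d)
var_sum = sample_cov(x1, x1) + sample_cov(x2, x2)
cov12 = sample_cov(x1, x2)

# Rule 2: Var(x1 - x2) = Var(x1) + Var(x2) - 2 Cov(x1, x2)
rule2_rhs = var_sum - 2 * cov12

se_paired = math.sqrt(var_d / n)         # uses the covariance term
se_independent = math.sqrt(var_sum / n)  # treats the covariance as zero

print(f"Var(d) = {var_d:.4f}, Var(x1) + Var(x2) - 2Cov = {rule2_rhs:.4f}")
print(f"paired s_dbar = {se_paired:.4f}, independent s_dbar = {se_independent:.4f}")
```

With a positive covariance the paired standard error comes out smaller, which is the source of the extra power of the paired test.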

It’s hard to say anything about the relative size of $s_{\bar d} = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$ (method D3)
and $s_{\bar d} = \sqrt{\hat s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$ (method D2), where $\hat s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$.
As we have seen in the computer solutions to methods D2 and D3 above, if the sample sizes are equal, then
$\hat s_p^2 = \dfrac{s_1^2 + s_2^2}{2}$ and
$s_{\bar d} = \sqrt{\dfrac{s_1^2 + s_2^2}{2}\left(\dfrac{1}{n} + \dfrac{1}{n}\right)} = \sqrt{\dfrac{s_1^2 + s_2^2}{2}\cdot\dfrac{2}{n}} = \sqrt{\dfrac{s_1^2}{n} + \dfrac{s_2^2}{n}}$.
That is, $s_{\bar d}$ will be the same for D2 and D3. Method D2 seems to gain
power because of the larger number of degrees of freedom. In general for method D2
$s_{\bar d} = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\cdot\dfrac{n_1 + n_2}{n_1 n_2}} = \sqrt{\dfrac{(n_1 - 1)(n_1 + n_2)}{(n_1 + n_2 - 2)\,n_2}\,\dfrac{s_1^2}{n_1} + \dfrac{(n_2 - 1)(n_1 + n_2)}{(n_1 + n_2 - 2)\,n_1}\,\dfrac{s_2^2}{n_2}}$.
This will give a smaller $s_{\bar d}$ if the fractions multiplying $\dfrac{s_1^2}{n_1}$ and $\dfrac{s_2^2}{n_2}$ are both less than one. However, a little
experimentation shows that that is not the case. For example, if $n_1 = 100$ and $n_2 = 50$,
$s_{\bar d} = \sqrt{\dfrac{(100 - 1)(150)}{148(50)}\,\dfrac{s_1^2}{n_1} + \dfrac{(50 - 1)(150)}{148(100)}\,\dfrac{s_2^2}{n_2}} = \sqrt{2.007\,\dfrac{s_1^2}{n_1} + .4966\,\dfrac{s_2^2}{n_2}}$.
So if we say $n_2 = n_1 - a$,
$s_{\bar d} = \sqrt{\dfrac{(n_1 - 1)(2n_1 - a)}{(2n_1 - a - 2)(n_1 - a)}\,\dfrac{s_1^2}{n_1} + \dfrac{(n_1 - a - 1)(2n_1 - a)}{(2n_1 - a - 2)\,n_1}\,\dfrac{s_2^2}{n_2}} = \sqrt{\dfrac{2n_1 - a}{2n_1 - a - 2}\left[\dfrac{n_1 - 1}{n_1 - a}\,\dfrac{s_1^2}{n_1} + \dfrac{n_1 - a - 1}{n_1}\,\dfrac{s_2^2}{n_2}\right]}$.
As $a$ gets larger, $\dfrac{2n_1 - a}{2n_1 - a - 2}$ will change very slowly, but $\dfrac{n_1 - 1}{n_1 - a}$ will grow and $\dfrac{n_1 - a - 1}{n_1}$ will shrink.
The largest $a$ is when $n_2 = n_1 - a = 2$, or $a = n_1 - 2$. Then $\dfrac{2n_1 - a}{2n_1 - a - 2} = \dfrac{n_1 + 2}{n_1}$ and
$\dfrac{n_1 - 1}{n_1 - a}\,\dfrac{s_1^2}{n_1} + \dfrac{n_1 - a - 1}{n_1}\,\dfrac{s_2^2}{n_2} = \dfrac{n_1 - 1}{2}\,\dfrac{s_1^2}{n_1} + \dfrac{1}{n_1}\,\dfrac{s_2^2}{n_2}$,
so that the fraction multiplying $\dfrac{s_1^2}{n_1}$ is extremely large and the fraction multiplying $\dfrac{s_2^2}{n_2}$ is extremely small.
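The "little experimentation" with unequal sample sizes can be repeated with a few lines of Python. This sketch computes the two fractions that multiply $s_1^2/n_1$ and $s_2^2/n_2$ inside the method D2 standard error, written with the general $n_1$, $n_2$ formula; the function name `d2_multipliers` is hypothetical.

```python
def d2_multipliers(n1, n2):
    """Fractions multiplying s1^2/n1 and s2^2/n2 in the pooled (D2) standard error."""
    common = (n1 + n2) / (n1 + n2 - 2)
    mult1 = common * (n1 - 1) / n2  # multiplies s1^2 / n1
    mult2 = common * (n2 - 1) / n1  # multiplies s2^2 / n2
    return mult1, mult2

equal_sizes = d2_multipliers(11, 11)    # equal n: D2 and D3 standard errors agree
text_example = d2_multipliers(100, 50)  # the n1 = 100, n2 = 50 case in the text
extreme_case = d2_multipliers(100, 2)   # smallest possible second sample

for label, (m1, m2) in [("n1=11, n2=11", equal_sizes),
                        ("n1=100, n2=50", text_example),
                        ("n1=100, n2=2", extreme_case)]:
    print(f"{label}: multiplier on s1^2/n1 = {m1:.4f}, on s2^2/n2 = {m2:.4f}")
```

When both multipliers equal one the pooled and unpooled standard errors coincide. As the sample sizes diverge, one multiplier rises above one and the other falls below it, so which method gives the smaller standard error depends on the relative sizes of $s_1^2$ and $s_2^2$.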