252solnD4

252solnD4 10/19/06 (Open this document in 'Page Layout' view!) Re-edited to replace  or  with D .
D. COMPARISON OF TWO SAMPLES
1. Two Means, Two Independent Samples, Large Samples.
Text 10.1-10.3, 10.7 [10.1 – 10.3, 10.5] (10.1 – 10.3, 10.5)
2. Two Means, Two Independent Samples, Populations Normally Distributed, Population Variances Assumed Equal.
Text 10.4, 10.13a, 10.20a, b, e [10.4, 10.15a, 10.13a,b,e] (10.4, 10.14, 10.12a,b,e). For the last
problem: $\bar{x}_1 = 17.5571$, $s_1 = 1.9333$, $\bar{x}_2 = 19.8905$, $s_2 = 4.5767$


3. Two Means, Two independent Samples, Populations Normally Distributed, Population Variances not Assumed Equal.
Optional Text 10.20[10.13c,d] (10.12c,d) See data above. D3, D4
4. Two Means, Paired Samples (If samples are small, populations should be normally distributed).
Text 10.26, 10.29[10.36, 10.37], D1, D2 (10.32*(in 252hwkadd.), [10.34] (different numbers), 10.25[10.35], D1, D2)
5. Rank Tests.
a. The Wilcoxon-Mann-Whitney Test for Two Independent Samples. Text 12.65[10.48] (10.46)
b. Wilcoxon Signed Rank Test for Paired Samples.
Text 12.74-12.76[10.57-59] (10.80-82 on CD), Downing & Clark 18-15, 18-9 (in chapter 17 in D&C 3rd edition), D5
6. Proportions.
Text 10.32, 10.38, 10.39, 12.32** [12.2, 12.7*, 12.8*] (12.2)
7. Variances.
Text 10.40, 10.43-10.48 [10.16, 10.19 - 10.24, 10.25] (10.15, 10.18 - 10.23, 10.24) D6a (below), D6, D7 (A summary problem),
D8 (A summary problem)
Graded assignment 3 will be posted.
This document is a solution to summary problems.
-------------------------------------------------------------------------------------------------------------------------
Problem D7: This is a summary problem and could be an exam in itself. Almost everything you need
to know about comparing two sample means or variances is in here.
In a study of sleep gotten with a sleeping pill and with a placebo the results were as below (Keller, Warren,
Bartel, 2nd ed. p. 354). Test for a difference in means or medians as appropriate.
   x1        x2         d
   Pill      Placebo    difference
   7.3       6.8        .5
   8.5       7.9        .6
   6.4       6.0        .4
   9.0       8.4        .6
   6.9       6.5        .4
$\bar{x}_1 = 7.620$    $\bar{x}_2 = 7.120$    $\bar{d} = 0.500$
$s_1^2 = 1.197$        $s_2^2 = 0.997$        $s_d^2 = 0.010$
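The summary statistics in the table can be checked with a short script. This is only a sketch using Python's standard library; the variable names (`pill`, `placebo`, `diff`) are illustrative and not part of the original handout.

```python
from statistics import mean, variance

# Hours of sleep from the table above.
pill = [7.3, 8.5, 6.4, 9.0, 6.9]
placebo = [6.8, 7.9, 6.0, 8.4, 6.5]

# Pairwise differences, rounded to one decimal to strip floating-point noise.
diff = [round(a - b, 1) for a, b in zip(pill, placebo)]

xbar1, xbar2, dbar = mean(pill), mean(placebo), mean(diff)

# statistics.variance divides by n - 1, i.e. it is the sample variance.
s1_sq, s2_sq, sd_sq = variance(pill), variance(placebo), variance(diff)

print(f"means:     {xbar1:.3f} {xbar2:.3f} {dbar:.3f}")
print(f"variances: {s1_sq:.3f} {s2_sq:.3f} {sd_sq:.3f}")
```

Note how small the variance of the differences is compared with the variances of the two columns; that is what drives the very different result in the paired case.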
a. Assume that these are independent samples from populations with a normal distribution and that
$\sigma_1^2 = \sigma_2^2$ (Test if $\sigma_1^2 = \sigma_2^2$).
b. Assume that these are independent samples and that $\sigma_1^2 \ne \sigma_2^2$.
c. Assume these are paired samples.
In each case do (i) a 99% confidence interval for $\mu_1 - \mu_2$, (ii) test if $\mu_1 = \mu_2$. (iii) In case a test
if $\sigma_1^2 = \sigma_2^2$.
d. Redo part a(ii) assuming that the parent population is not normal.
e. Redo part c(ii) assuming that the parent distribution is not normal.
Solution: Assume $\alpha = .01$.
a) Assume that these are independent samples from a normal distribution and that $\sigma_1^2 = \sigma_2^2$ (Test if
$\sigma_1^2 = \sigma_2^2$).
From the Syllabus supplement:
Interval for: Difference Between Two Means ($\sigma$ Unknown, Variances Assumed Equal)
Confidence Interval: $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$, where $s_{\bar{d}} = \hat{s}_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ and $DF = n_1 + n_2 - 2$
Hypotheses: $H_0: D = D_0$, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$
Test Ratio: $t = \frac{\bar{d} - D_0}{s_{\bar{d}}}$
Critical Value: $\bar{d}_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}}$
Pooled Variance: $\hat{s}_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
(i) Confidence interval: In the case of equal variances we use a pooled variance,
$\hat{s}_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{4(1.197) + 4(0.997)}{8} = 1.097$. This is used to compute
$s_{\bar{d}} = \hat{s}_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = \sqrt{1.097\left(\frac{1}{5} + \frac{1}{5}\right)} = \sqrt{0.439} = 0.662$.
Since we can say that $\bar{d} = \bar{x}_1 - \bar{x}_2$ and $D = \mu_1 - \mu_2$, the equation $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$, where
$t_{\alpha/2} = t^{(n_1 + n_2 - 2)}_{.005} = t^{(8)}_{.005} = 3.355$, becomes $\mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\, s_{\bar{d}}$ or
$\mu_1 - \mu_2 = 0.500 \pm 3.355(0.662) = 0.500 \pm 2.221$.
(ii) $H_0: D = 0$, $H_1: D \ne 0$ or $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$. If we use a test ratio,
$t = \frac{\bar{d} - D_0}{s_{\bar{d}}} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{d}}} = \frac{0.500 - 0}{0.662} = 0.755$.
Since this is between $\pm t^{(8)}_{.005} = \pm 3.355$, we accept $H_0$. If we use a critical value instead,
$\bar{d}_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}} = 0 \pm 3.355(0.662) = \pm 2.221$. Since $\bar{d} = 0.500$ is between these critical
values, we accept $H_0$.
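The pooled-variance interval and test in part a can be reproduced numerically. The following is a minimal sketch, assuming the critical value 3.355 is simply copied from the t table (the standard library has no t distribution); all variable names are illustrative.

```python
import math

n1 = n2 = 5
s1_sq, s2_sq = 1.197, 0.997      # sample variances from the problem
dbar, d0 = 0.500, 0.0            # observed and hypothesized difference in means
t_crit = 3.355                   # t(.005) with 8 DF, copied from the t table

# Pooled variance and standard error of the difference in means.
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
se = math.sqrt(sp_sq * (1 / n1 + 1 / n2))

# (i) 99% confidence interval for mu1 - mu2.
half_width = t_crit * se
ci = (dbar - half_width, dbar + half_width)

# (ii) Two-sided test, by test ratio and by critical values.
t_ratio = (dbar - d0) / se
cv = (d0 - half_width, d0 + half_width)
reject = abs(t_ratio) > t_crit

print(f"pooled variance = {sp_sq:.3f}, standard error = {se:.4f}")
print(f"99% CI: {ci[0]:.3f} to {ci[1]:.3f}")
print(f"t = {t_ratio:.3f}, critical values = {cv[0]:.3f}, {cv[1]:.3f}, reject H0: {reject}")
```

The half-width comes out near 2.222 rather than 2.221 because the script carries the standard error at full precision, while the hand calculation rounds it to 0.662 first. The conclusion is unchanged.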
(iii) We are testing $H_0: \sigma_1^2 = \sigma_2^2$ against $H_1: \sigma_1^2 \ne \sigma_2^2$. According to the syllabus supplement,
we test $F^{(DF_1, DF_2)} = \frac{s_1^2}{s_2^2}$ and $F^{(DF_2, DF_1)} = \frac{s_2^2}{s_1^2}$, where $DF_1 = n_1 - 1$ and $DF_2 = n_2 - 1$.
Here $\frac{s_1^2}{s_2^2} = \frac{1.197}{0.997} = 1.201$ or $\frac{s_2^2}{s_1^2} = \frac{0.997}{1.197} = 0.833$. Since 1.201 is less than the table value
of 9.60, we accept $H_0$. (Actually we should be checking against $F^{(4,4)}_{.005}$, but it's not available on the table.
The value 9.60 is $F^{(4,4)}_{.025}$, and a check of the table shows that critical values rise as the tail probability
falls, so $F^{(4,4)}_{.005}$ must be larger than $F^{(4,4)}_{.01}$, which in turn is larger than 9.60. So if 1.201 is less
than 9.60, it also must be less than $F^{(4,4)}_{.01}$ and $F^{(4,4)}_{.005}$.)
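The variance-ratio check can be sketched in a few lines. This assumes the table value 9.60 quoted in the text is used as a conservative stand-in for the unavailable $F^{(4,4)}_{.005}$; since the true critical value is larger, failing to exceed 9.60 is enough to accept $H_0$. Names are illustrative.

```python
s1_sq, s2_sq = 1.197, 0.997   # sample variances from the problem
f_table = 9.60                # table value quoted in the text (smaller than F(.005; 4, 4))

# Both variance ratios; only the one above 1 can exceed an upper-tail critical value.
f_12 = s1_sq / s2_sq
f_21 = s2_sq / s1_sq
larger_ratio = max(f_12, f_21)

# If the ratio is below a value that is itself below the true critical value,
# it cannot be in the rejection region.
reject = larger_ratio > f_table

print(f"s1^2/s2^2 = {f_12:.3f}, s2^2/s1^2 = {f_21:.3f}, reject H0: {reject}")
```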
b) Assume that these are independent samples and that $\sigma_1^2 \ne \sigma_2^2$.
From the Syllabus supplement:
Interval for: Difference Between Two Means ($\sigma$ Unknown, Variances Assumed Unequal)
Confidence Interval: $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$, where $s_{\bar{d}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ and
$DF = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$
Hypotheses: $H_0: D = D_0$, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$
Test Ratio: $t = \frac{\bar{d} - D_0}{s_{\bar{d}}}$
Critical Value: $\bar{d}_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}}$
(i) (Optional) Confidence interval: In the case of unequal variances we use the Satterthwaite
method. $\frac{s_1^2}{n_1} = \frac{1.197}{5} = 0.2394$, $\frac{s_2^2}{n_2} = \frac{0.997}{5} = 0.1994$, so $\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} = 0.2394 + 0.1994 = 0.4388$.
If we use this in the degrees of freedom formula, we find
$DF = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} = \frac{0.4388^2}{\frac{0.2394^2}{4} + \frac{0.1994^2}{4}} = 7.9341$.
We round this down to get 7 degrees of freedom. This is used with $s_{\bar{d}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{0.4388} = 0.6624$.
Since we can say that $\bar{d} = \bar{x}_1 - \bar{x}_2$ and $D = \mu_1 - \mu_2$, the equation $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$, where
$t_{\alpha/2} = t^{(7)}_{.005} = 3.499$, becomes $\mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\, s_{\bar{d}}$ or
$\mu_1 - \mu_2 = 0.500 \pm 3.499(0.662) = 0.500 \pm 2.318$.
(ii) $H_0: D = 0$, $H_1: D \ne 0$ or $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$. If we use a test ratio,
$t = \frac{\bar{d} - D_0}{s_{\bar{d}}} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{d}}} = \frac{0.500 - 0}{0.6624} = 0.755$.
Since this is between $\pm t^{(7)}_{.005} = \pm 3.499$, we accept $H_0$. If we use a critical value instead,
$\bar{d}_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}} = 0 \pm 3.499(0.6624) = \pm 2.318$. Since $\bar{d} = 0.500$ is between these critical
values, we accept $H_0$.
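The Satterthwaite degrees of freedom, interval and test in part b can be reproduced as follows. This is a sketch only: the critical value 3.499 is copied from the t table for 7 degrees of freedom, and the variable names are illustrative.

```python
import math

n1 = n2 = 5
s1_sq, s2_sq = 1.197, 0.997      # sample variances from the problem
dbar, d0 = 0.500, 0.0            # observed and hypothesized difference in means

# Per-sample variance of the mean.
v1, v2 = s1_sq / n1, s2_sq / n2
se = math.sqrt(v1 + v2)

# Satterthwaite approximation, rounded down as in the text.
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
df_used = math.floor(df)

t_crit = 3.499                   # t(.005) with 7 DF, copied from the t table
half_width = t_crit * se
ci = (dbar - half_width, dbar + half_width)

t_ratio = (dbar - d0) / se
reject = abs(t_ratio) > t_crit

print(f"DF = {df:.4f}, rounded down to {df_used}")
print(f"standard error = {se:.4f}, 99% CI: {ci[0]:.3f} to {ci[1]:.3f}")
print(f"t = {t_ratio:.3f}, reject H0: {reject}")
```

With equal sample sizes the standard error is the same as in the pooled case; only the degrees of freedom, and therefore the critical value, change.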
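c) Assume these are paired samples.
If the paired data problem were on the formula table, it would appear as below.
Interval for: Difference between Two Means (paired data)
Confidence Interval: $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$, where $\bar{d} = \bar{x}_1 - \bar{x}_2$ and $s_{\bar{d}} = \frac{s_d}{\sqrt{n}}$
Hypotheses: $H_0: D = D_0$ *, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$
Test Ratio: $t = \frac{\bar{d} - D_0}{s_{\bar{d}}}$
Critical Value: $\bar{d}_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}}$
(i) Confidence interval: In the case of paired data, we act as if we have only $n = n_1 = n_2$
pairs. $DF = n - 1 = 4$. $t_{\alpha/2} = t^{(4)}_{.005} = 4.604$ and $s_{\bar{d}} = \frac{s_d}{\sqrt{n}} = \sqrt{\frac{0.010}{5}} = \sqrt{.002} = 0.0447$.
Since $\bar{d} = \bar{x}_1 - \bar{x}_2$ and $D = \mu_1 - \mu_2$, the equation $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$ becomes
$\mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\, s_{\bar{d}}$ or $\mu_1 - \mu_2 = 0.500 \pm 4.604(0.0447) = 0.500 \pm 0.206$.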
(ii) $H_0: D = 0$, $H_1: D \ne 0$ or $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$. If we use a test ratio,
$t = \frac{\bar{d} - D_0}{s_{\bar{d}}} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{d}}} = \frac{0.500 - 0}{0.0447} = 11.18$. Since this is not between
$\pm t^{(4)}_{.005} = \pm 4.604$, we reject $H_0$. If we use a critical value instead,
$\bar{d}_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}} = 0 \pm 4.604(0.0447) = \pm 0.206$. Since $\bar{d} = 0.500$ is not between these
critical values, we reject $H_0$.
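The paired calculation works on the column of differences alone. A minimal sketch, assuming the critical value 4.604 is copied from the t table for 4 degrees of freedom; names are illustrative.

```python
import math
from statistics import mean, stdev

diff = [0.5, 0.6, 0.4, 0.6, 0.4]   # pill minus placebo, one value per pair
n = len(diff)

dbar = mean(diff)
se = stdev(diff) / math.sqrt(n)    # standard error of the mean difference
t_crit = 4.604                     # t(.005) with n - 1 = 4 DF, from the t table

half_width = t_crit * se
ci = (dbar - half_width, dbar + half_width)

t_ratio = (dbar - 0.0) / se
reject = abs(t_ratio) > t_crit

print(f"standard error = {se:.4f}")
print(f"99% CI: {ci[0]:.3f} to {ci[1]:.3f}")
print(f"t = {t_ratio:.2f}, reject H0: {reject}")
```

The mean difference is the same 0.500 as in parts a and b, but the standard error drops from about 0.66 to about 0.045, which is why the paired test rejects while the independent-sample tests do not.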
d) Redo part a(ii) assuming that the parent population is not normal.
Since the parent population is not normal and the data represent two independent samples, we do a
Wilcoxon rank sum test. To do this we rank the ten numbers from 1 to 10, starting at the extreme end of
the smaller sample. Since the samples are of the same size, we arbitrarily pick $x_1$ as the smaller sample
and note that 9.0 is the largest number in both samples, so that is where we start our ranking. Since we
are working with non-normal items, our hypotheses are stated in terms of medians as
$H_0: \eta_1 = \eta_2$, $H_1: \eta_1 \ne \eta_2$.
   x1     r1     x2     r2     d
   7.3     5     6.8     7     .5
   8.5     2     7.9     4     .6
   6.4     9     6.0    10     .4
   9.0     1     8.4     3     .6
   6.9     6     6.5     8     .4
   Sum    23            32
From the above $n_1 = n_2 = 5$, and the sums of the ranks are $SR_1 = 23$ and $SR_2 = 32$. $W$ is the smaller of the
two rank sums and is 23. To check our rank sums note that $n_1 + n_2 = n = 10$ and that if the rank sums are
correct, $SR_1 + SR_2 = \frac{n(n+1)}{2}$. In this case $23 + 32 = \frac{10(11)}{2} = 55$, so the ranking seems correct. If we go to
Table 5 in the syllabus supplement, we find that the p-value for $W = 23$ is .210. Since this is a 2-sided test
it should be doubled to .420. In any case, it is above $\alpha = .01$, so accept $H_0$. For a 5% test Table 6 could be
used.
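The ranking and the table lookup can both be checked by brute force. The sketch below ranks from the largest value down, as in the text, and then enumerates all 252 ways of assigning five of the ten ranks to the first sample to get an exact lower-tail probability. The enumeration is a stand-in for Table 5, not a reproduction of it, and the names are illustrative.

```python
from itertools import combinations

x1 = [7.3, 8.5, 6.4, 9.0, 6.9]
x2 = [6.8, 7.9, 6.0, 8.4, 6.5]

# Rank all ten values from the largest (rank 1) down; there are no ties here.
ordered = sorted(x1 + x2, reverse=True)
rank = {value: i + 1 for i, value in enumerate(ordered)}
r1 = [rank[v] for v in x1]
r2 = [rank[v] for v in x2]
sr1, sr2 = sum(r1), sum(r2)

# Check: the two rank sums must add up to n(n + 1)/2.
n = len(x1) + len(x2)
assert sr1 + sr2 == n * (n + 1) // 2

w = min(sr1, sr2)

# Exact null distribution: every 5-rank subset of 1..10 is equally likely.
all_sums = [sum(c) for c in combinations(range(1, n + 1), len(x1))]
p_lower = sum(s <= w for s in all_sums) / len(all_sums)
p_two_sided = min(1.0, 2 * p_lower)

print(f"SR1 = {sr1}, SR2 = {sr2}, W = {w}")
print(f"P(W <= {w}) = {p_lower:.4f}, two-sided p-value = {p_two_sided:.4f}")
```

The lower-tail probability agrees with the tabled .210 to three decimals, and the doubled value is well above $\alpha = .01$.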
e) Redo part c(ii) assuming that the parent distribution is not normal.
Since the parent population is not normal and the data represent paired samples, we would prefer to
do a Wilcoxon signed rank test of the hypotheses $H_0: \eta_1 = \eta_2$, $H_1: \eta_1 \ne \eta_2$. To do this we take the
values of $d = x_1 - x_2$ and replace them with their absolute values $|d|$. We rank the $n$ values from 1 to $n$.
To compute corrected ranks we add + or - according to the sign of $d$ and replace all ties with average ranks.
   x1     x2     d      |d|    rank   corrected rank
   7.3    6.8    .5     .5     3      +3.0
   8.5    7.9    .6     .6     4      +4.5
   6.4    6.0    .4     .4     2      +1.5
   9.0    8.4    .6     .6     5      +4.5
   6.9    6.5    .4     .4     1      +1.5
For example, ranks 4 and 5 are both replaced with 4.5, their average, because they correspond
to identical values (.6) of $|d|$.
We next compute $T^+$ and $T^-$, the sums of the positive and negative ranks. In this case
$T^+ = 3.0 + 4.5 + 1.5 + 4.5 + 1.5 = 15$, while $T^- = 0$. Our check on the ranking is that the sum of the numbers
from 1 to $n$ is $\frac{n(n+1)}{2}$. In this case, since $n = 5$, $\frac{n(n+1)}{2} = \frac{5(6)}{2} = 15$, which, as it should be, is the sum
of $T^+$ and $T^-$. We call the smaller of $T^+$ and $T^-$, in this case 0, $T_L$, and look it up on Table 7 in the
syllabus supplement. Unfortunately for $n = 5$, there are no appropriate values, so we cannot reject $H_0$.
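The corrected ranks and the two rank sums can be generated mechanically. This is a sketch under the assumption that there are no zero differences (which would normally be dropped before ranking); the helper `average_rank` is a hypothetical name introduced for illustration.

```python
diff = [0.5, 0.6, 0.4, 0.6, 0.4]     # d = x1 - x2 for each pair
abs_d = [abs(d) for d in diff]
ordered = sorted(abs_d)


def average_rank(value):
    """Average of the 1-based positions that `value` occupies in the sorted |d| list."""
    first = ordered.index(value) + 1
    count = ordered.count(value)
    return first + (count - 1) / 2


# Attach the sign of d to the tie-corrected rank of |d|.
signed_ranks = [average_rank(a) if d > 0 else -average_rank(a)
                for d, a in zip(diff, abs_d)]

t_plus = sum(r for r in signed_ranks if r > 0)
t_minus = -sum(r for r in signed_ranks if r < 0)
t_l = min(t_plus, t_minus)

n = len(diff)
assert t_plus + t_minus == n * (n + 1) / 2   # ranking check

print(signed_ranks, t_plus, t_minus, t_l)
```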
A second choice test here would be a sign test. We use a binomial table to find out the probability
of getting 5 (or more) positive differences in 5 tries, assuming that the probability is .5. From the binomial
table this probability is .0313, but to make this into a p-value for a 2-sided test, we must double it to .0626.
Since   .01 is less than the p-value, we must accept H 0 , though, if we were working with a higher
significance level we could reject it.
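The binomial probability used in the sign test can be computed directly instead of read from a table. A minimal sketch with illustrative names:

```python
from math import comb

n = 5          # number of nonzero differences
k = 5          # number of positive differences observed
alpha = 0.01

# P(X >= k) when X is binomial with n trials and success probability .5.
p_upper = sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n
p_two_sided = min(1.0, 2 * p_upper)

reject = p_two_sided < alpha

print(f"one-sided p = {p_upper:.4f}, two-sided p = {p_two_sided:.4f}, reject H0: {reject}")
```

The exact two-sided value is .0625; the .0626 in the text comes from doubling the rounded table entry .0313. With only five pairs, even a perfect run of positive differences cannot reach the 1% level.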
Problem D8: (2001 Graded Assignment 3) In your outline there are 6 methods to compare means or
medians, methods D1, D2, D3, D4, D5a and D5b. Method D6 compares proportions and method D7
compares variances or standard deviations. In the following cases, identify $H_0$ and $H_1$ and identify which
method to use. If the hypotheses involve a mean, state the hypotheses in terms of both $\mu$ and $D = \mu_1 - \mu_2$.
If the hypotheses involve a proportion, state them in terms of both $p$ and $\Delta p = p_1 - p_2$. If the hypotheses
involve standard deviations or variances, state them in terms of both $\sigma^2$ and $\frac{\sigma_1^2}{\sigma_2^2}$ or $\frac{\sigma_2^2}{\sigma_1^2}$. All the
questions involve means, medians, proportions or variances.
Note: Look at 252thngs on the syllabus supplement part of the website before you start (and
before you take exams).
a. You have data on income in two villages ($x_1$ in village 1, $x_2$ in village 2). You want to test the hypothesis that village 1 has
higher earnings than village 2. You know that income has an extremely skewed distribution, and you have to decide whether to use
the mean or the median income.
b. You have a sample of earned incomes for 25 couples, both of whom are teachers. ($x_1$ is the women's incomes in a column, $x_2$
is the men's. Each line represents one couple.) Test to see if the women make more than the men.
c. You have interviewed a sample of 80 small businesses in the Northeast and 75 small businesses in the Southeast. Each business
has indicated whether they sell in foreign markets. You want to show that businesses in the Northeast are more likely to export. ($x_1$
is the total number of firms that export in the Northeast sample, $x_2$ in the Southeast.)
d. You have profit rates, $x_1$, for a sample of 20 pharmaceutical firms in Europe and profit rates, $x_2$, for a sample of 17
pharmaceutical firms in the US. You believe that they are normally distributed and you wish to see whether the European firms were
more profitable than the American firms.
e. In order to see which garage to use under contract for automobile repairs, 10 cars are towed first to garage 1 and then to garage 2.
You end up with two data sets, the first data column, x1 , is estimates from the first garage and the second data column, x 2 , is
estimates for the second garage. Each of the 10 lines of data refers to one car. You believe that the estimates are approximately
normally distributed. Compare the estimates in garage 1 and 2.
f. You are having a part produced in two different machines. x1 is 200 randomly selected data points that represent the length of
parts from machine one,
x 2 is 200 randomly selected data points that represent the length of parts from machine two. You want to
test your suspicion that parts from machine 2 are longer than parts from machine 1. In a problem of this type you would assume that
the lengths are normally distributed.
g. You also suspect that parts from machine two are more variable in length than parts from machine one. Test this suspicion.
Solution with problem statements inserted.
It may help to use the following table taken from the outline.
                                                          Paired Samples    Independent Samples
Location - Normal distribution. Compare means.            Method D4         Methods D1-D3
Location - Distribution not Normal. Compare medians.      Method D5b        Method D5a
Proportions                                               Method D6b        Method D6a
Variability - Normal distribution. Compare variances.                       Method D7
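The table can be read as a small decision rule. The sketch below encodes it as a lookup; the function `choose_method` and its argument names are hypothetical and introduced only for illustration, and the choice among D1, D2 and D3 (large samples, equal or unequal variances) is left to the reader as in the table.

```python
def choose_method(compare, paired=False, normal=True):
    """Hypothetical helper mirroring the outline table; not part of the course materials.

    compare: "location", "proportion" or "variance"
    paired:  True if the two columns are matched line by line
    normal:  True if the parent distribution can be treated as Normal
    """
    if compare == "proportion":
        return "D6b" if paired else "D6a"
    if compare == "variance":
        return "D7"
    if compare == "location":
        if normal:
            return "D4" if paired else "D1-D3"
        return "D5b" if paired else "D5a"
    raise ValueError(f"unknown comparison: {compare!r}")


# Skewed incomes in two separate villages: medians, independent samples.
print(choose_method("location", paired=False, normal=False))
# Repair estimates for the same ten cars at two garages: means, paired data.
print(choose_method("location", paired=True, normal=True))
# Variability of part lengths from two machines.
print(choose_method("variance"))
```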
a. You have data on income in two villages ($x_1$ in village 1, $x_2$ in village 2). You want to test the
hypothesis that village 1 has higher earnings than village 2. You know that income has an extremely skewed
distribution, and you have to decide whether to use the mean or the median income.
Solution: Because of the skewed distribution, the median is the preferred statistic. If $\eta$ is the median,
$H_0: \eta_1 \le \eta_2$, $H_1: \eta_1 > \eta_2$. Since we are comparing medians and the data are not paired, use Method D5a.
b. You have a sample of earned incomes for 25 couples, both of whom are teachers. ($x_1$ is the women's
incomes in a column, $x_2$ is the men's. Each line represents one couple.) Test to see if the women make
more than the men.
Solution: If $\eta$ is the median, $H_0: \eta_1 \le \eta_2$, $H_1: \eta_1 > \eta_2$. Since we are comparing medians and the data
are paired, use Method D5b.
c. You have interviewed a sample of 80 small businesses in the Northeast and 75 small businesses in the
Southeast. Each business has indicated whether they sell in foreign markets. You want to show that
businesses in the Northeast are more likely to export. ($x_1$ is the total number of firms that export in the
Northeast sample, $x_2$ in the Southeast.)
Solution: If $p_1$ and $p_2$ are the proportions of exporting firms, estimated from the samples by $\frac{x_1}{n_1}$ and $\frac{x_2}{n_2}$,
then $H_0: p_1 \le p_2$, $H_1: p_1 > p_2$ or $H_0: p_1 - p_2 \le 0$, $H_1: p_1 - p_2 > 0$. If $\Delta p = p_1 - p_2$, then
$H_0: \Delta p \le 0$, $H_1: \Delta p > 0$. Since we are comparing proportions, use Method D6.
d. You have profit rates, $x_1$, for a sample of 20 pharmaceutical firms in Europe and profit rates, $x_2$, for a
sample of 17 pharmaceutical firms in the US. You believe that they are normally distributed and you wish
to see whether the European firms were more profitable than the American firms.
Solution: $H_0: \mu_1 \le \mu_2$, $H_1: \mu_1 > \mu_2$ or $H_0: \mu_1 - \mu_2 \le 0$, $H_1: \mu_1 - \mu_2 > 0$. If $D = \mu_1 - \mu_2$, then
$H_0: D \le 0$, $H_1: D > 0$. Because you believe that the Normal distribution applies, you use a method that
compares means. The total sample size is too small to use Method D1, which means that D2 or D3 should
work. You could test the variances for equality and use D2, or not bother and use D3.
e. In order to see which garage to use under contract for automobile repairs, 10 cars are towed first to
garage 1 and then to garage 2. You end up with two data sets, the first data column, $x_1$, is estimates from
the first garage and the second data column, $x_2$, is estimates for the second garage. Each of the 10 lines of
data refers to one car. You believe that the estimates are approximately normally distributed. Compare the
estimates in garage 1 and 2.
Solution: There is no reason to assume that one garage is cheaper than the other, so $H_0: \mu_1 = \mu_2$,
$H_1: \mu_1 \ne \mu_2$ or $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$. If $D = \mu_1 - \mu_2$, then $H_0: D = 0$, $H_1: D \ne 0$.
Again, you compare means because you are, presumably, interested in the total amount that you will pay
for the repairs, which means that you want the lowest average cost. The important thing to notice here is
that the data are in pairs, so you use Method D4.
f. You are having a part produced in two different machines. $x_1$ is 200 randomly selected data points that
represent the length of parts from machine one, $x_2$ is 200 randomly selected data points that represent the
length of parts from machine two. You want to test your suspicion that parts from machine 2 are longer than
parts from machine 1. In a problem of this type you would assume that the lengths are normally distributed.
Solution: $H_0: \mu_1 \ge \mu_2$, $H_1: \mu_1 < \mu_2$ or $H_0: \mu_1 - \mu_2 \ge 0$, $H_1: \mu_1 - \mu_2 < 0$. If $D = \mu_1 - \mu_2$, then
$H_0: D \ge 0$, $H_1: D < 0$. You could use Method D2 (if you tested the variances for equality) or D3 here,
but, since you have two large samples, it would be far easier to use Method D1.
g. You also suspect that parts from machine two are more variable in length than parts from machine one.
Test this suspicion.
Solution: $H_0: \sigma_1 \ge \sigma_2$, $H_1: \sigma_1 < \sigma_2$ or $H_0: \sigma_1^2 \ge \sigma_2^2$, $H_1: \sigma_1^2 < \sigma_2^2$. In terms of the variance ratio
$\frac{\sigma_1^2}{\sigma_2^2}$ or $\frac{\sigma_2^2}{\sigma_1^2}$, the alternate hypothesis rules, so $H_0: \frac{\sigma_2^2}{\sigma_1^2} \le 1$ and $H_1: \frac{\sigma_2^2}{\sigma_1^2} > 1$.
Since you are comparing variances, use Method D7.
----------------------------------------------------------------------------------------------------------------------------
This is just an excerpt from an old solution to grass3, but it may make it easier to do grass3 and take
the exams. Remember the following:
You have not done a hypothesis test unless you have stated your hypotheses, run the numbers and
stated your conclusion.
The rule on p-value says if the p-value is less than the significance level (alpha =  ) reject the null
hypothesis; if the p-value is greater than or equal to the significance level, do not reject the null
hypothesis.
A table follows.
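From the Formula Table (with Method D4 added):

Difference between Two Means ($\sigma$ known) (Method D1)
  Confidence Interval: $D = \bar{d} \pm z_{\alpha/2}\, \sigma_{\bar{d}}$, where $\sigma_{\bar{d}} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ and $\bar{d} = \bar{x}_1 - \bar{x}_2$
  Hypotheses: $H_0: D = D_0$ *, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$
  Test Ratio: $z = \frac{\bar{d} - D_0}{\sigma_{\bar{d}}}$
  Critical Value: $\bar{d}_{cv} = D_0 \pm z_{\alpha/2}\, \sigma_{\bar{d}}$

Difference between Two Means ($\sigma$ unknown, variances assumed equal) (Method D2)
  Confidence Interval: $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$, where $s_{\bar{d}} = \hat{s}_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ and $DF = n_1 + n_2 - 2$
  Hypotheses: $H_0: D = D_0$ *, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$
  Test Ratio: $t = \frac{\bar{d} - D_0}{s_{\bar{d}}}$
  Critical Value: $\bar{d}_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}}$
  Pooled Variance: $\hat{s}_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

Difference between Two Means ($\sigma$ unknown, variances assumed unequal) (Method D3)
  Confidence Interval: $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$, where $s_{\bar{d}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ and
    $DF = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$
  Hypotheses: $H_0: D = D_0$ *, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$
  Test Ratio: $t = \frac{\bar{d} - D_0}{s_{\bar{d}}}$
  Critical Value: $\bar{d}_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}}$

Difference between Two Means (paired data) (Method D4)
  Confidence Interval: $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$, where $\bar{d} = \bar{x}_1 - \bar{x}_2$ and $s_{\bar{d}} = \frac{s_d}{\sqrt{n}}$
  Hypotheses: $H_0: D = D_0$ *, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$
  Test Ratio: $t = \frac{\bar{d} - D_0}{s_{\bar{d}}}$
  Critical Value: $\bar{d}_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}}$

Ratio of Variances (Method D7)
  Confidence Interval: $\frac{s_2^2}{s_1^2} F^{(DF_1, DF_2)}_{1 - \alpha/2} \le \frac{\sigma_2^2}{\sigma_1^2} \le \frac{s_2^2}{s_1^2} F^{(DF_1, DF_2)}_{\alpha/2}$, where
    $F^{(DF_1, DF_2)}_{1 - \alpha/2} = \frac{1}{F^{(DF_2, DF_1)}_{\alpha/2}}$, $DF_1 = n_1 - 1$ and $DF_2 = n_2 - 1$
  Hypotheses: $H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 \ne \sigma_2^2$
  Test Ratio: $F^{(DF_1, DF_2)} = \frac{s_1^2}{s_2^2}$ and $F^{(DF_2, DF_1)} = \frac{s_2^2}{s_1^2}$

* Same as $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$ if $D_0 = 0$. Note that the symbol formerly used for this difference has been changed to $D$.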
© 2002 Roger Even Bove