252solnD2

252solnD2 10/18/06 (Open this document in 'Page Layout' view!). Re-edited to replace  or  with D .
D. COMPARISON OF TWO SAMPLES
1. Two Means, Two Independent Samples, Large Samples.
Text 10.1-10.3, 10.7 [10.1 – 10.3, 10.5] (10.1 – 10.3, 10.5)
2. Two Means, Two Independent Samples, Populations Normally Distributed, Population Variances Assumed Equal.
Text 10.4, 10.13a, 10.20a, b, e [10.4, 10.15a, 10.13a,b,e] (10.4, 10.14, 10.12a,b,e). For the last
problem: $\bar{x}_1 = 17.5571$, $s_1 = 1.9333$, $\bar{x}_2 = 19.8905$, $s_2 = 4.5767$.


3. Two Means, Two independent Samples, Populations Normally Distributed, Population Variances not Assumed Equal.
Optional Text 10.20[10.13c,d] (10.12c,d) See data above. D3, D4
4. Two Means, Paired Samples (If samples are small, populations should be normally distributed).
Text 10.26, 10.29[10.36, 10.37], D1, D2 (10.32*(in 252hwkadd.), [10.34] (different numbers), 10.25[10.35], D1, D2)
5. Rank Tests.
a. The Wilcoxon-Mann-Whitney Test for Two Independent Samples. Text 12.65[10.48] (10.46)
b. Wilcoxon Signed Rank Test for Paired Samples.
Text 12.74-12.76[10.57-59] (10.80-82 on CD), Downing & Clark 18-15, 18-9 (in chapter 17 in D&C 3rd edition), D5
6. Proportions.
Text 10.32, 10.38, 10.39, 12.32** [12.2, 12.7*, 12.8*] (12.2)
7. Variances.
Text 10.40, 10.43-10.48 [10.16, 10.19 - 10.24, 10.25] (10.15, 10.18 - 10.23, 10.24) D6a (below), D6, D7 (A summary problem), D8
(A summary problem)
Graded assignment 3 will be posted.
Solutions to problems in outline points 4 and 5 are in this document.
---------------------------------------------------------------------------------------------------------------------------------
Problems with 2 means and paired samples
Exercise 10.32 (only in 8th edition): The problem is to compare prices of a website, HomeGrocer.com
with local Seattle supermarkets. Data is below.
a. At the 0.05 level, is there evidence of a difference in the average price for products purchased from the
two vendors?
b. Compute the p-value in (a) and interpret its meaning.
c. Set up a 95% confidence interval estimate of the difference in the average price for products purchased
from the two vendors.
d. Compare the results in (a) and (c).
Solution: $\alpha = .05$. The following results were obtained from Minitab. This is paired data because each
line represents a single product.
————— 10/3/2003 1:38:10 PM ————————————————————
Welcome to Minitab, press F1 for help.
MTB > Retrieve "C:\Berenson\Data Files-8th\Minitab\ONLINE2.mtw".
Retrieving worksheet from file: C:\Berenson\Data Files-8th\Minitab\ONLINE2.mtw
# Worksheet was saved on Mon Apr 09 2001
Results for: ONLINE2.mtw
MTB > Paired c2 c3.
Paired T-Test and CI: HomeGrocer, Supermarkets
Paired T for HomeGrocer - Supermarkets
                N     Mean    StDev   SE Mean
HomeGrocer      8    4.878    2.723     0.963
Supermarkets    8    4.840    2.798     0.989
Difference      8   0.0375   0.2200    0.0778
95% CI for mean difference: (-0.1465, 0.2215)
T-Test of mean difference = 0 (vs. not = 0): T-Value = 0.48 P-Value = 0.644
#Note that since the p-value is above the significance level, we cannot reject
#the null hypothesis.
MTB > let c4=c2-c3
MTB > print c1-c4
Data Display
#This is the original data.
Row  Products                               HomeGrocer  Supermarkets  Difference
1    Tide High Efficiency, 64 oz.                 6.99          6.99         0.0
2    Oreo Cookies, 20 oz.                         3.29          3.49        -0.2
3    Formula 409 Cleaner, 22 oz.                  2.59          2.69        -0.1
4    Pampers Newborn Diapers, 40 count           10.79         10.99        -0.2
5    Coke Classic, dozen 12 oz. Cans              3.99          3.59         0.4
6    Colgate Total Toothpaste, 7.8 oz.            3.49          3.49         0.0
7    Tropicana Orange Juice, 64 oz.               3.59          3.49         0.1
8    Cheerios Whole Grain Cereal, 20 oz.          4.29          3.99         0.3
MTB > ssq c40
Sum of Squares of Difference
Sum of squares (uncorrected) of Difference = 0.35000
MTB > sum c4
Sum of Difference
Sum of Difference = 0.30000
To do this problem we do not need statistics on $x_1$ (HomeGrocer prices) or $x_2$ (Supermarket prices), but
only on the difference, $d = x_1 - x_2$, which is displayed above. You should be able to compute $n = 8$,
$\sum d = 0.30$ and $\sum d^2 = 0.35$.
So we have $\bar{d} = \frac{\sum d}{n} = \frac{0.30}{8} = 0.0375$
and $s_d^2 = \frac{\sum d^2 - n\bar{d}^2}{n-1} = \frac{0.35 - 8(0.0375)^2}{7} = 0.04839$, which gives
$s_d = \sqrt{0.04839} = 0.2200$. We need $s_{\bar{d}} = \frac{s_d}{\sqrt{n}} = \sqrt{\frac{0.04839}{8}} = \sqrt{0.00604875} = 0.07777$.
If the paired data problem were on the formula table, it would appear as below.
Interval for: Difference between Two Means (paired data)
  Confidence Interval: $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$, where $d = x_1 - x_2$ and $s_{\bar{d}} = \frac{s_d}{\sqrt{n}}$
  Hypotheses: $H_0: D = D_0$ *, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$
  Test Ratio: $t = \frac{\bar{d} - D_0}{s_{\bar{d}}}$
  Critical Value: $\bar{d}_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}}$
* Same as $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$ if $D_0 = 0$.
a) From the Formula table, we get $H_0: D = D_0$, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$, but here $D_0 = 0$, so we can write
$H_0: D = 0$, $H_1: D \ne 0$ or $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$ or $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$.
$t_{\alpha/2}^{(n-1)} = t_{.025}^{(7)} = 2.365$.
Test Ratio Method: $t = \frac{\bar{d} - D_0}{s_{\bar{d}}} = \frac{0.0375 - 0}{0.07777} = 0.4822$. Our ‘reject zone’ is the area above 2.365 and the area
below -2.365. Since the computed t-ratio is between these values, we do not reject $H_0$.
Critical value method: $\bar{d}_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}} = 0 \pm 2.365(0.07777) = \pm 0.1839$. Since $\bar{d} = 0.0375$ is between
-0.1839 and +0.1839, we cannot reject $H_0$.
The Instructor’s Solution Manual says
$H_0: \mu_D = 0$ vs. $H_1: \mu_D \ne 0$.
Test statistic: $t = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}} = 0.4821$
Decision: Since the absolute value of t = 0.4821 is less than the upper critical value of 2.365, we accept $H_0$ and
conclude that there is not enough evidence to suggest that there is significant difference in the average price
for products purchased from HomeGrocer.com and Seattle Supermarkets.
b) If we compare $t_{calc} = .4822$ with the 7 degrees of freedom line of the t-table, we find that it is larger than
$0.402 = t_{.35}^{(7)}$ and smaller than $0.549 = t_{.30}^{(7)}$. So $.30 < P(t > .4822) < .35$. Because this is a 2-sided test,
double this to get $.60 < p\text{-value} < .70$. Minitab says that the p-value is .644. The Instructor’s Solution
Manual says, “Using the t-table, p value > 0.5. From Excel the p value is 0.645. The probability of
obtaining a mean difference in average price that gives rise to a test statistic that deviates from 0 by 0.4821
or more in either direction is 0.645.”
c) The confidence interval is $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}} = 0.0375 \pm 2.365(0.07777) = 0.0375 \pm 0.18392$, or -0.1464 to
0.2214.
d) The Instructor’s Solution Manual says “The results in (a) and (d) are the same. The hypothesized value
of 0 for the difference in the average price for items purchased from HomeGrocer.com and Seattle
Supermarkets is inside the 95% confidence interval.”
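The three methods used in this exercise (test ratio, critical value, confidence interval) can be checked with a few lines of Python. This is only a sketch using the standard library: the prices are copied from the data display, the critical value 2.365 is the table value quoted in the solution rather than something the code computes, and the function name `paired_summary` is made up for illustration.

```python
import math

# Prices copied from the data display (HomeGrocer = x1, Supermarkets = x2).
home_grocer = [6.99, 3.29, 2.59, 10.79, 3.99, 3.49, 3.59, 4.29]
supermarkets = [6.99, 3.49, 2.69, 10.99, 3.59, 3.49, 3.49, 3.99]

T_CRIT = 2.365  # t(.025) with 7 df, read from the t-table as in the solution


def paired_summary(x1, x2):
    """Return (n, d_bar, s_d, se) for the paired differences d = x1 - x2."""
    d = [a - b for a, b in zip(x1, x2)]
    n = len(d)
    d_bar = sum(d) / n
    # Computational formula: (sum of d^2 - n * d_bar^2) / (n - 1)
    var_d = (sum(v * v for v in d) - n * d_bar ** 2) / (n - 1)
    s_d = math.sqrt(var_d)
    return n, d_bar, s_d, s_d / math.sqrt(n)


n, d_bar, s_d, se = paired_summary(home_grocer, supermarkets)

t_ratio = (d_bar - 0) / se                        # test ratio with D0 = 0
d_cv = (-T_CRIT * se, T_CRIT * se)                # critical values for d_bar
ci = (d_bar - T_CRIT * se, d_bar + T_CRIT * se)   # 95% confidence interval
reject = abs(t_ratio) > T_CRIT

print(f"d_bar={d_bar:.4f}  s_d={s_d:.4f}  se={se:.5f}  t={t_ratio:.4f}")
print(f"critical values for d_bar: {d_cv[0]:.4f}, {d_cv[1]:.4f}")
print(f"95% CI: {ci[0]:.4f} to {ci[1]:.4f};  reject H0: {reject}")
```

All three methods rest on the same standard error, so they necessarily lead to the same decision.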
Exercise 10.26 [10.36 in 9th] (10.34 in 8th edition): The data below gives a sample of prices of randomly
selected books at a college book store and at Amazon. Only the first two columns are given. a) Is there a
difference between mean prices from the two stores at the 1% significance level? b) What assumptions are
needed? c) Construct and interpret a 99% confidence interval for the mean price difference. d) Compare
results of a) and c)
9th and 10th edition data.
Row  Textbook                                             Book Store x1  Amazon x2        d       d^2
1    Access 2000 Guidebook                                        52.22      57.34    -5.12     26.21
2    HTML 4.0 CD with Java Script                                 52.74      44.47     8.27     68.39
3    Designing the Physical Education Curriculum                  39.04      41.48    -2.44      5.95
4    Service Management: Operations, Strategy and IT             101.28      73.72    27.56    759.55
5    Fundamentals of Real Estate Appraisal                        37.45      42.04    -4.59     21.07
6    Investments                                                 113.41      95.38    18.03    325.08
7    Intermediate Financial Management                           109.72     119.80   -10.08    101.61
8    Real Estate Principles                                      101.28      62.48    38.80   1505.44
9    The Automobile Age                                           29.49      32.43    -2.94      8.64
10   Geographic Information Systems in Ecology                    70.07      74.43    -4.36     19.01
11   Geosystems: An Introduction to Physical Geography            83.87      83.81     0.06      0.00
12   Understanding Contemporary Africa                            23.21      26.48    -3.27     10.69
13   Early Childhood Education Today                              72.80      73.48    -0.68      0.46
14   System of Transcendental Idealism (1800)                     17.41      20.98    -3.57     12.74
15   Principles and Labs for Fitness and Wellness                 37.72      40.43    -2.71      7.34
Total                                                                                 52.96   2872.21
Solution: To do this problem we do not need statistics on $x_1$ or $x_2$, but only on the difference,
$d = x_1 - x_2$, which is displayed above. You should be able to compute $n = 15$, $\sum d = 52.96$ and
$\sum d^2 = 2872.21$.
So we have $\bar{x}_1 - \bar{x}_2 = \bar{d} = \frac{\sum d}{n} = \frac{52.96}{15} = 3.5307$
and $s_d^2 = \frac{\sum d^2 - n\bar{d}^2}{n-1} = \frac{2872.21 - 15(3.5307)^2}{14} = 191.8016$, which gives $s_d = \sqrt{191.8016} = 13.8492$. We need
$s_{\bar{d}} = \frac{s_d}{\sqrt{n}} = \sqrt{\frac{191.8016}{15}} = \sqrt{12.7868} = 3.5759$.
a) From the Formula table, we get $H_0: D = D_0$, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$, but here $D_0 = 0$, so we can write
$H_0: D = 0$, $H_1: D \ne 0$ or $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$ or $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$.
The Instructor’s Solution Manual says $H_0: \mu_D = 0$: There is no difference in the average price of
textbooks between the local bookstore and Amazon.com.
$H_1: \mu_D \ne 0$: There is a difference in the average price of textbooks
between the local bookstore and Amazon.com.
$\alpha = .01$ and $t_{\alpha/2}^{(n-1)} = t_{.005}^{(14)} = 2.977$.
Test Ratio Method: $t = \frac{\bar{d} - D_0}{s_{\bar{d}}} = \frac{3.5307}{3.5759} = 0.987$. Our ‘reject zone’ is the area above 2.977 and the area
below -2.977. Since the computed t-ratio is between these values, we do not reject $H_0$. The Instructor’s
Solution Manual says “There is not enough evidence to conclude that there is a difference in the average
price of textbooks between the local bookstore and Amazon.com.”
Critical value method: $\bar{d}_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}} = 0 \pm 2.977(3.5759) = \pm 10.6455$. Since $\bar{d} = 3.5307$ is between
-10.6455 and 10.6455, we cannot reject $H_0$.
b) The Instructor’s Solution Manual says “One must assume that the distribution of the differences between
the average price of business textbooks between the local bookstore and Amazon.com is approximately
normally distributed.”
c) If we look at the 14 df line of the t-table, our $t_{calc} = 0.987$ falls between $t_{.20}^{(14)} = 0.868$ and $t_{.15}^{(14)} = 1.076$,
so we can say that $.15 < P(t > 0.987) < .20$. Since this is a 2-tailed problem, say $.30 < p\text{-value} < .40$.
The Instructor’s Solution Manual says that the p-value is .3402.
d) The confidence interval is $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}} = 3.5307 \pm 2.977(3.5759) = 3.5307 \pm 10.6455$, or -7.115 to
14.1762.
e) The Instructor’s Solution Manual says “The results in (a) and (d) are the same. The hypothesized value
of 0 for the difference in the average price for textbooks between the local bookstore and Amazon.com is
inside the 99% confidence interval.”
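When only $n$, $\sum d$ and $\sum d^2$ are at hand, the computational formula for $s_d^2$ is easy to script. The sketch below assumes the sums for the 9th and 10th edition data and takes the critical value 2.977 from the t-table as quoted in the solution; the helper name `paired_t_from_sums` is hypothetical.

```python
import math


def paired_t_from_sums(n, sum_d, sum_d2, d0=0.0):
    """Paired-sample t ratio from n, the sum of d and the sum of d squared."""
    d_bar = sum_d / n
    var_d = (sum_d2 - n * d_bar ** 2) / (n - 1)   # computational formula
    se = math.sqrt(var_d / n)                     # standard error of d_bar
    return d_bar, var_d, se, (d_bar - d0) / se


# 9th and 10th edition data: n = 15, sum of d = 52.96, sum of d^2 = 2872.21
d_bar, var_d, se, t_ratio = paired_t_from_sums(15, 52.96, 2872.21)

T_CRIT = 2.977                 # t(.005) with 14 df, from the t-table
half_width = T_CRIT * se       # half-width of the 99% confidence interval

print(f"d_bar={d_bar:.4f}  s_d^2={var_d:.4f}  se={se:.4f}  t={t_ratio:.3f}")
print(f"99% CI: {d_bar - half_width:.3f} to {d_bar + half_width:.3f}")
print("reject H0" if abs(t_ratio) > T_CRIT else "do not reject H0")
```

The same function can be called with the 8th edition sums if those are the numbers in front of you.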
Same problem with 8th edition data.
Book Row  PriceOn x1  PriceOff x2       d       d^2
1              55.00        50.95    4.05   16.4025
2              47.50        45.75    1.75    3.0625
3              50.50        50.95   -0.45    0.2025
4              38.95        38.50    0.45    0.2025
5              58.70        56.25    2.45    6.0025
6              49.90        45.95    3.95   15.6025
7              39.95        40.25   -0.30    0.0900
8              41.50        39.95    1.55    2.4025
9              42.25        43.00   -0.75    0.5625
10             44.95        42.25    2.70    7.2900
11             45.95        44.00    1.95    3.8025
12             56.95        55.60    1.35    1.8225
Total                               18.70   57.4450
You should be able to compute $n = 12$, $\sum d = 18.70$ and $\sum d^2 = 57.4450$.
So we have $\bar{x}_1 - \bar{x}_2 = \bar{d} = \frac{\sum d}{n} = \frac{18.70}{12} = 1.5583$
and $s_d^2 = \frac{\sum d^2 - n\bar{d}^2}{n-1} = \frac{57.4450 - 12(1.5583)^2}{11} = 2.5732$, which gives $s_d = \sqrt{2.5732} = 1.6041$. We need
$s_{\bar{d}} = \frac{s_d}{\sqrt{n}} = \sqrt{\frac{2.5732}{12}} = \sqrt{0.2144} = 0.4631$.
a) From the Formula table, we get $H_0: D = D_0$, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$, but here $D_0 = 0$, so we can write
$H_0: D = 0$, $H_1: D \ne 0$ or $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$ or $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$.
The Instructor’s Solution Manual says $H_0: \mu_D = 0$: There is no difference in the average price of
business textbooks between on-campus and off-campus stores.
$H_1: \mu_D \ne 0$: There is a difference in the average price of
business textbooks between on-campus and off-campus stores.
$\alpha = .01$ and $t_{\alpha/2}^{(n-1)} = t_{.005}^{(11)} = 3.106$.
Solutions below differ somewhat from the Instructor’s Solution Manual.
Test Ratio Method: $t = \frac{\bar{d} - D_0}{s_{\bar{d}}} = \frac{1.5583}{0.4631} = 3.365$. Our ‘reject zone’ is the area above 3.106 and the area
below -3.106. Since the computed t-ratio is not between these values, reject $H_0$. The Instructor’s Solution
Manual says “There is enough evidence to conclude that there is a difference in the average price of
business textbooks between on-campus and off-campus stores.”
Critical value method: $\bar{d}_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}} = 0 \pm 3.106(0.4631) = \pm 1.4384$. Since $\bar{d} = 1.5583$ is not between
-1.4384 and 1.4384, we reject $H_0$.
b) The Instructor’s Solution Manual says “One must assume that the distribution of the differences between
the average price of business textbooks between on-campus and off-campus stores is approximately
normally distributed.”
c) If we look at the 11 df line of the t-table, our $t_{calc} = 3.365$ falls between $t_{.005}^{(11)} = 3.106$ and $t_{.001}^{(11)} = 4.025$,
so we can say that $.001 < P(t > 3.365) < .005$. Since this is a 2-tailed problem, say $.002 < p\text{-value} < .010$.
The Instructor’s Solution Manual says that the p-value is .0067.
d) The confidence interval is $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}} = 1.5583 \pm 3.106(0.4631) = 1.558 \pm 1.438$, or 0.120 to 2.996.
e) The Instructor’s Solution Manual says “The results in (a) and (d) are the same. The hypothesized value
of 0 for the difference in the average price for textbooks between two stores is outside the 99% confidence
interval.”
Exercise 10.29[10.37 in 9th] (10.35 in 8th edition): The book gives a piece of Excel output based on the
data set perform. Look at it! Interpret it! Excel output has been copied from the text.
t-Test: Paired Two Sample for Means
                                 Before      After
Mean                             74.54286    79.8
Variance                         80.90252    37.16471
Observations                     35          35
Pearson Correlation              -0.1342
Hypothesized Mean Difference     0
df                               34
t Stat                           -2.69904
P(T<=t) one-tail                 0.005376
t Critical one-tail              1.690923
P(T<=t) two-tail                 0.010752
t Critical two-tail              2.032243
Solution: From the Instructor’s Solution Manual
10.37 From the descriptive statistics provided in the Microsoft Excel output there does not seem to be
any violation of the assumption of normality. The mean and median are similar and the skewness
value is near 0. Without observing other graphical devices such as a stem-and-leaf display, box-and-whisker plot, or normal probability plot, the fact that the sample size (n = 35) is not very small
enables us to assume that the paired t test is appropriate here. The Microsoft Excel output for the
paired t test indicates that a significant improvement in average performance ratings has occurred.
The calculated t statistic of –2.699 falls far below the one-tailed critical value of –1.6909 using a
.05 level of significance. The p value is 0.005376.
Comment: Did they see the same output I saw? If they had actually presented the median and skewness,
and the skewness was small, and the mean and mode close, I might agree that the method is
appropriate.
Problem D1: A trucking company wishes to compare mileage per gallon on its current air filter with a new
product. Results are as below. See if the new filter actually gives better mileage. Assume that the
underlying distribution is normal. Use a 5% significance level.
Current Filter $x_1$: 8.6 6.1 11.4 7.9 6.6 8.9 6.4 6.7 6.5 6.3
New Filter $x_2$: 8.3 8.2 7.8 11.6 9.8 9.7 6.7 9.7 9.9 8.1
a. Assume each pair of numbers represents experience on a single truck.
b. Assume that these represent two independent random samples, but $\sigma_1 = \sigma_2$.
c. (Optional) Again assume two random samples, but that the variances are not equal.
d. Test that the mean is 8.2 for each filter.
Solution: $\bar{x}_1 = 7.54$, $s_1 = 1.68602$, $\bar{x}_2 = 8.98$, $s_2 = 1.40855$, $n_1 = n_2 = n = 10$.
$\bar{d} = \frac{\sum d}{n} = \frac{-14.4}{10} = -1.44$, $s_d^2 = \frac{\sum d^2 - n\bar{d}^2}{n-1} = \frac{65.92 - 10(-1.44)^2}{9} = 5.02044$, $s_d = 2.24063$.
         x1      x2       d     d^2
        8.6     8.3     0.3    0.09
        6.1     8.2    -2.1    4.41
       11.4     7.8     3.6   12.96
        7.9    11.6    -3.7   13.69
        6.6     9.8    -3.2   10.24
        8.9     9.7    -0.8    0.64
        6.4     6.7    -0.3    0.09
        6.7     9.7    -3.0    9.00
        6.5     9.9    -3.4   11.56
        6.3     8.1    -1.8    3.24
Total  75.4    89.8   -14.4   65.92
$H_0: \mu_1 \ge \mu_2$, $H_1: \mu_1 < \mu_2$ or $H_0: \mu_1 - \mu_2 \ge 0$, $H_1: \mu_1 - \mu_2 < 0$ or $H_0: D \ge 0$, $H_1: D < 0$.
$D = \mu_1 - \mu_2$, $\bar{d} = \bar{x}_1 - \bar{x}_2 = 7.54 - 8.98 = -1.44$.
It’s time to remind you of the four methods we have for comparing two means. We use 3 of them in this
problem. In parts b), c) and d), only the test ratio method will be used. Can you supply the answers for the
other two approaches?
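Interval for: Difference between Two Means (paired data)
  Confidence Interval: $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$, where $d = x_1 - x_2$ and $s_{\bar{d}} = \frac{s_d}{\sqrt{n}}$
  Hypotheses: $H_0: D = D_0$ *, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$
  Test Ratio: $t = \frac{\bar{d} - D_0}{s_{\bar{d}}}$
  Critical Value: $\bar{d}_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}}$
Interval for: Difference between Two Means ($\sigma$ known)
  Confidence Interval: $D = \bar{d} \pm z_{\alpha/2}\, \sigma_{\bar{d}}$, where $\bar{d} = \bar{x}_1 - \bar{x}_2$ and $\sigma_{\bar{d}} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
  Hypotheses: $H_0: D = D_0$ *, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$
  Test Ratio: $z = \frac{\bar{d} - D_0}{\sigma_{\bar{d}}}$
  Critical Value: $\bar{d}_{cv} = D_0 \pm z_{\alpha/2}\, \sigma_{\bar{d}}$
Interval for: Difference between Two Means ($\sigma$ unknown, variances assumed equal)
  Confidence Interval: $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$, where $s_{\bar{d}} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ and $DF = n_1 + n_2 - 2$
  Hypotheses: $H_0: D = D_0$ *, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$
  Test Ratio: $t = \frac{\bar{d} - D_0}{s_{\bar{d}}}$, with $\hat{s}_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
  Critical Value: $\bar{d}_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}}$
Interval for: Difference between Two Means ($\sigma$ unknown, variances assumed unequal)
  Confidence Interval: $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$, where $s_{\bar{d}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ and $DF = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$
  Hypotheses: $H_0: D = D_0$ *, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$
  Test Ratio: $t = \frac{\bar{d} - D_0}{s_{\bar{d}}}$
  Critical Value: $\bar{d}_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}}$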
a) This is paired data. 8.6 represents experience on truck 1 with carburetor 1 and 8.3 represents experience
on the same truck with carburetor 2. $df = n - 1 = 10 - 1 = 9$, $s_{\bar{d}} = \frac{s_d}{\sqrt{n}} = \frac{2.24063}{\sqrt{10}} = 0.70855$.
Do this by one of the following three methods.
(i) Test Ratio: $t = \frac{\bar{d} - D_0}{s_{\bar{d}}} = \frac{-1.44 - 0}{0.70855} = -2.0323$. Make a diagram with zero in the middle showing a
'reject' region below $-t_{.05}^{(9)} = -1.833$. Since -2.0323 falls in the 'reject' region, reject $H_0$.
(ii) Critical Value: $\bar{d}_{cv} = D_0 - t_{\alpha}\, s_{\bar{d}} = 0 - 1.833(0.70855) = -1.299$. Make a diagram with zero in the
middle showing a 'reject' region below -1.299. Since $\bar{d} = -1.44$ falls in the 'reject' region, reject $H_0$.
(iii) Confidence interval: $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$ becomes $D \le \bar{d} + t_{\alpha}\, s_{\bar{d}} = -1.44 + 1.833(0.70855) = -0.141$.
$D \le -0.141$ contradicts the null hypothesis $D \ge 0$, so reject $H_0$.
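To see that the three one-sided methods in part a) are the same test in different clothing, here is a short Python sketch. It assumes the ten pairs shown in the solution table and the table value $t_{.05}^{(9)} = 1.833$; the variable names are illustrative only.

```python
import math

# Ten trucks: mileage with the current filter (x1) and the new filter (x2).
current = [8.6, 6.1, 11.4, 7.9, 6.6, 8.9, 6.4, 6.7, 6.5, 6.3]
new = [8.3, 8.2, 7.8, 11.6, 9.8, 9.7, 6.7, 9.7, 9.9, 8.1]

T_05_9DF = 1.833  # one-tailed critical value from the t-table, 9 df

d = [a - b for a, b in zip(current, new)]
n = len(d)
d_bar = sum(d) / n
s_d = math.sqrt(sum((v - d_bar) ** 2 for v in d) / (n - 1))
se = s_d / math.sqrt(n)

# H0: D >= 0 against H1: D < 0, so the 'reject' region is in the lower tail.
t_ratio = (d_bar - 0) / se            # (i) test ratio
d_cv = 0 - T_05_9DF * se              # (ii) critical value for d_bar
upper_bound = d_bar + T_05_9DF * se   # (iii) one-sided bound: D <= upper_bound

decisions = (t_ratio < -T_05_9DF, d_bar < d_cv, upper_bound < 0)
print(f"t={t_ratio:.4f}  d_cv={d_cv:.3f}  upper bound={upper_bound:.3f}")
print("reject H0 by each method:", decisions)
```

Each comparison is an algebraic rearrangement of the others, so the three entries in `decisions` always agree.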
b) We are assuming that these are two independent samples, but $\sigma_1 = \sigma_2$.
So $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{9(1.68602)^2 + 9(1.40855)^2}{18} = 2.413333$
and $s_{\bar{d}} = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{2.413333\left(\frac{1}{10} + \frac{1}{10}\right)} = \sqrt{2.413333(0.2)} = 0.69474$.
$df = n_1 - 1 + n_2 - 1 = 10 - 1 + 10 - 1 = 18$.
$t_{.05}^{(18)} = 1.734$ and $t = \frac{\bar{d} - D_0}{s_{\bar{d}}} = \frac{-1.44 - 0}{0.69474} = -2.0727$. Make a diagram with zero in the middle showing a
'reject' region below -1.734. Since $t = -2.0727$ falls in the 'reject' region, reject $H_0$.
c) $\frac{s_1^2}{n_1} = \frac{1.68602^2}{10} = 0.28427$, $\frac{s_2^2}{n_2} = \frac{1.40855^2}{10} = 0.19840$, so $\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} = 0.28427 + 0.19840 = 0.48267$,
$s_{\bar{d}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{0.48267} = 0.69474$, $\bar{d} = \bar{x}_1 - \bar{x}_2 = -1.44$ and
$DF = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} = \frac{0.48267^2}{\frac{0.28427^2}{9} + \frac{0.19840^2}{9}} = 17.4$. We round this down to 17 degrees of freedom,
and use $t_{.05}^{(17)} = 1.740$. $t = \frac{\bar{d} - D_0}{s_{\bar{d}}} = \frac{-1.44 - 0}{0.69474} = -2.073$. Make a diagram with zero in the middle showing
a 'reject' region below -1.740. Since $t = -2.073$ falls in the 'reject' region, reject $H_0$.
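The pooled-variance calculation of part b) and the separate-variance calculation of part c) can be placed side by side in code. The following is a sketch using only the Python standard library, with the data taken from the solution table; nothing here comes from a statistics package beyond `statistics.mean` and `statistics.variance`.

```python
import math
from statistics import mean, variance

current = [8.6, 6.1, 11.4, 7.9, 6.6, 8.9, 6.4, 6.7, 6.5, 6.3]
new = [8.3, 8.2, 7.8, 11.6, 9.8, 9.7, 6.7, 9.7, 9.9, 8.1]

n1, n2 = len(current), len(new)
d_bar = mean(current) - mean(new)
v1, v2 = variance(current), variance(new)   # sample variances (n - 1 divisor)

# Part b): variances assumed equal, pooled variance, df = n1 + n2 - 2.
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
se_pooled = math.sqrt(sp2 * (1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

# Part c): variances not assumed equal, with the approximate DF formula.
a, b = v1 / n1, v2 / n2
se_unpooled = math.sqrt(a + b)
df_unpooled = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

print(f"pooled:   sp2={sp2:.6f}  se={se_pooled:.5f}  df={df_pooled}  "
      f"t={d_bar / se_pooled:.4f}")
print(f"unpooled: se={se_unpooled:.5f}  df={df_unpooled:.2f} "
      f"(round down to {math.floor(df_unpooled)})  t={d_bar / se_unpooled:.4f}")
```

Because $n_1 = n_2$ here, the two standard errors coincide and only the degrees of freedom (and hence the table value) differ between the two parts.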
d) We have two samples of ten, so that for each two-tailed test, $t_{.025}^{(9)} = 2.262$. $H_0: \mu = 8.2$ and
$H_1: \mu \ne 8.2$. To use a test ratio, make a diagram with zero in the middle and ‘reject’ zones above 2.262
and below -2.262.
(i) For the first sample, $t = \frac{\bar{x}_1 - \mu_0}{s_{\bar{x}_1}} = \frac{7.54 - 8.2}{1.68602 / \sqrt{10}} = -1.238$. Since this is between -2.262 and
2.262, do not reject the null hypothesis.
(ii) For the second sample, $t = \frac{\bar{x}_2 - \mu_0}{s_{\bar{x}_2}} = \frac{8.98 - 8.2}{1.40855 / \sqrt{10}} = 1.751$. Since this is between -2.262 and
2.262, do not reject the null hypothesis.
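The two one-sample test ratios in part d) can be reproduced as follows. This is a minimal sketch with the same data and the table value $t_{.025}^{(9)} = 2.262$; the function name `one_sample_t` is hypothetical.

```python
import math
from statistics import mean, stdev

MU_0 = 8.2          # hypothesized mean for each filter
T_025_9DF = 2.262   # two-tailed critical value from the t-table, 9 df


def one_sample_t(x, mu0):
    """Test ratio (x_bar - mu0) / (s / sqrt(n)) for a single sample."""
    se = stdev(x) / math.sqrt(len(x))
    return (mean(x) - mu0) / se


samples = {
    "current": [8.6, 6.1, 11.4, 7.9, 6.6, 8.9, 6.4, 6.7, 6.5, 6.3],
    "new": [8.3, 8.2, 7.8, 11.6, 9.8, 9.7, 6.7, 9.7, 9.9, 8.1],
}

results = {name: one_sample_t(x, MU_0) for name, x in samples.items()}
for name, t in results.items():
    verdict = "reject H0" if abs(t) > T_025_9DF else "do not reject H0"
    print(f"{name}: t={t:.3f}  {verdict}")
```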
Problem D2: Do 2-sided and (if appropriate) 1-sided confidence intervals in D.1
Solution: $\alpha = .05$. In parts a)-c), use $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$ and $D \le \bar{d} + t_{\alpha}\, s_{\bar{d}}$.
a) From problem D1, $df = 9$, $t_{.025}^{(9)} = 2.262$ and $t_{.05}^{(9)} = 1.833$.
$D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}} = -1.44 \pm 2.262(0.70855) = -1.44 \pm 1.60$.
$D \le \bar{d} + t_{\alpha}\, s_{\bar{d}} = -1.44 + 1.833(0.70855) = -1.44 + 1.30 = -0.14$.
b) From problem D1, $df = 18$, $t_{.025}^{(18)} = 2.101$ and $t_{.05}^{(18)} = 1.734$.
$D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}} = -1.44 \pm 2.101(0.69474) = -1.44 \pm 1.46$.
$D \le \bar{d} + t_{\alpha}\, s_{\bar{d}} = -1.44 + 1.734(0.69474) = -1.44 + 1.20 = -0.24$.
c) From problem D1, $df = 17$, $t_{.025}^{(17)} = 2.110$ and $t_{.05}^{(17)} = 1.740$.
$D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}} = -1.44 \pm 2.110(0.69474) = -1.44 \pm 1.47$.
$D \le \bar{d} + t_{\alpha}\, s_{\bar{d}} = -1.44 + 1.740(0.69474) = -1.44 + 1.21 = -0.23$.
d) From problem D1, $df = 9$ and $t_{.025}^{(9)} = 2.262$.
$\mu = \bar{x}_1 \pm t_{\alpha/2}\, s_{\bar{x}} = 7.54 \pm 2.262\left(\frac{1.68602}{\sqrt{10}}\right) = 7.54 \pm 1.21$.
$\mu = \bar{x}_2 \pm t_{\alpha/2}\, s_{\bar{x}} = 8.98 \pm 2.262\left(\frac{1.40855}{\sqrt{10}}\right) = 8.98 \pm 1.01$.
Exercise 10.25 [10.35 in 9th but I swear that I have never seen this before]: The problem uses data from
the file on your CD called MEASUREMENT to answer the following. a) At the 5% level of significance,
is there evidence of a difference of the mean measurements (from in-line and analytical lab)? b) What
assumption is necessary to perform this test? c) Use a graphical method to evaluate the assumption. d)
Construct and interpret a 95% confidence interval estimate of the difference between the mean
measurements.
Solution: The original data follows in columns c1-c3. I have added labels to the Minitab printout to clarify
it.
—————
10/20/2005 7:33:28 PM
————————————————————
Welcome to Minitab, press F1 for help.
MTB > WOpen "C:\BBS\10th ed student minitab files\MEASUREMENT.MTW".
Retrieving worksheet from file: 'C:\BBS\10th ed student minitab
files\MEASUREMENT.MTW'
Worksheet was saved on Tue Oct 22 2002
Results for: MEASUREMENT.MTW
MTB > let c4=c2-c3         #I am computing the difference.
MTB > let c5 = c4*c4       #I am squaring the difference.
MTB > print c1-c5
Data Display
Row  Sample  In-Line  Analytical lab        d      dsq
                  x1              x2  x1 - x2      d^2
1         1     8.01            8.01     0.00   0.0000
2         2     7.56            7.29     0.27   0.0729
3         3     7.47            7.54    -0.07   0.0049
4         4     7.40            7.42    -0.02   0.0004
5         5     7.83            7.80     0.03   0.0009
6         6     7.50            7.65    -0.15   0.0225
7         7     6.86            6.93    -0.07   0.0049
8         8     7.31            7.46    -0.15   0.0225
9         9     7.45            7.60    -0.15   0.0225
10       10     7.23            7.40    -0.17   0.0289
11       11     7.37            7.50    -0.13   0.0169
12       12     7.49            7.41     0.08   0.0064
13       13     6.21            6.25    -0.04   0.0016
14       14     6.68            6.54     0.14   0.0196
15       15     5.12            5.20    -0.08   0.0064
16       16     4.84            4.70     0.14   0.0196
17       17     4.84            4.82     0.02   0.0004
18       18     5.21            5.33    -0.12   0.0144
19       19     5.35            5.30     0.05   0.0025
20       20     5.60            5.40     0.20   0.0400
21       21     5.32            5.39    -0.07   0.0049
22       22     5.16            5.17    -0.01   0.0001
23       23     5.66            5.50     0.16   0.0256
24       24     6.31            6.24     0.07   0.0049
MTB > sum c4
Sum of d
Sum of d = -0.07           ($\sum d = -0.07$)
MTB > sum c5
Sum of dsq
Sum of dsq = 0.3437        ($\sum d^2 = 0.3437$)
MTB > describe c4
Descriptive Statistics: d
Variable   N  N*     Mean  SE Mean   StDev  Minimum       Q1   Median      Q3  Maximum
d         24   0  -0.0029   0.0249  0.1222  -0.1700  -0.1100  -0.0150  0.0775   0.2700
MTB > Paired c2 c3.
Paired T-Test and CI: In-Line, Analytical lab
Paired T for In-Line - Analytical lab
                  N       Mean     StDev   SE Mean
In-Line          24    6.49083   1.08624   0.22173
Analytical lab   24    6.49375   1.11711   0.22803
Difference       24  -0.002917  0.122207  0.024945
95% CI for mean difference: (-0.054520, 0.048687)
T-Test of mean difference = 0 (vs not = 0): T-Value = -0.12  P-Value = 0.908
You can verify that $\bar{d} = \frac{\sum d}{n} = \frac{-0.07}{24} = -0.002917$,
$s_d^2 = \frac{\sum d^2 - n\bar{d}^2}{n-1} = \frac{0.3437 - 24(-0.002917)^2}{23} = \frac{0.343496}{23} = 0.0149346$, $s_d = \sqrt{0.0149346} = 0.122207$ and
$s_{\bar{d}} = \frac{s_d}{\sqrt{n}} = \sqrt{\frac{s_d^2}{n}} = \sqrt{\frac{0.0149346}{24}} = \sqrt{0.00062227} = .024945$.
Our hypotheses are $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$ or $H_0: D = 0$, $H_1: D \ne 0$ or $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$. $\alpha = .05$.
The t test gives a p-value of .908, which is certainly above the significance level and the confidence
interval includes zero, so we cannot reject the null hypothesis. The remainder of the solution is copied from
the Instructor’s Solutions Manual (Slightly edited).
(a) $H_0: \mu_D = 0$ vs. $H_1: \mu_D \ne 0$
Excel Output:
t-Test: Paired Two Sample for Means
                                 In-Line     Analytical lab
Mean                             6.490833    6.49375
Variance                         1.179912    1.247928804
Observations                     24          24
Pearson Correlation              0.994239
Hypothesized Mean Difference     0
df                               23
t Stat                           -0.11692
P(T<=t) one-tail                 0.453968
t Critical one-tail              1.71387
P(T<=t) two-tail                 0.907937
t Critical two-tail              2.068655
Test statistic: $t = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}} = -0.1169$. This is the same as $t = \frac{\bar{d} - D_0}{s_{\bar{d}}}$.
Decision: Since t = -0.1169 falls between the lower and upper critical values $\pm 2.0687$,
do not reject $H_0$. There is not enough evidence to conclude that there is a difference in
the mean measurements in-line and from an analytical lab.
(b) What assumption is necessary to perform this test?
You must assume that the distribution of the differences between the mean measurements
is approximately normal.
(c) Use a graphical method to evaluate the assumption.
[Figures: Box-and-whisker Plot of Difference; Normal Probability Plot of Difference against Z Value]
The distributions appear to be right skewed.
(d) Construct and interpret a 95% confidence interval estimate of the difference between the mean
measurements.
Dt
SD
0.1222
 0.0029  2.0687
n
24
0.0545  D  0.0487
This is a problem from a while ago that I left in to give you some more practice.
McClave et al. Exercise 9.33: a) $x_1$ is 1999 inflation rates as forecasted earlier in 1999 by nine
economists. $x_2$ is 2000 inflation rates as forecasted earlier in 2000 by the same economists. The
question is ‘were they more optimistic in 1999?’ What are our hypotheses?
$H_0: \mu_1 \ge \mu_2$, $H_1: \mu_1 < \mu_2$ or $H_0: \mu_1 - \mu_2 \ge 0$, $H_1: \mu_1 - \mu_2 < 0$ or $H_0: D \ge 0$, $H_1: D < 0$, where $D = \mu_1 - \mu_2$.
b) Test the above hypotheses.
$n_1 = n_2 = n = 9$, $\bar{x}_1 = 2.333$, $\bar{x}_2 = 2.544$,
$\bar{d} = \frac{\sum d}{n} = \frac{-1.9}{9} = -0.211 = \bar{x}_1 - \bar{x}_2 = 2.333 - 2.544$,
$s_d^2 = \frac{\sum d^2 - n\bar{d}^2}{n-1} = \frac{0.77 - 9(-0.211)^2}{8} = 0.0461$, $s_d = 0.2147$.
Obs      x1     x2      d    d^2
1       1.8    2.2   -0.4   0.16
2       2.3    2.3    0     0
3       2.3    2.3    0     0
4       2.5    3.0   -0.5   0.25
5       2.3    2.4   -0.1   0.01
6       2.5    3.0   -0.5   0.25
7       2.5    2.5    0     0
8       2.3    2.6   -0.3   0.09
9       2.5    2.6   -0.1   0.01
Total  21.0   22.9   -1.9   0.77
This is paired data. Each line represents the opinion of one economist.
$s_{\bar{d}} = \frac{s_d}{\sqrt{n}} = \sqrt{\frac{0.0461}{9}} = \frac{0.2147}{\sqrt{9}} = 0.07157$, $df = n - 1 = 9 - 1 = 8$.
Do this by one of the following three methods.
(i) Test Ratio: $t = \frac{\bar{d} - D_0}{s_{\bar{d}}} = \frac{-0.211 - 0}{0.07157} = -2.948$. Make a diagram with zero in the middle showing a
'reject' region below $-t_{.05}^{(8)} = -1.860$. Since -2.948 falls in the 'reject' region, reject $H_0$.
(ii) Critical Value: $\bar{d}_{cv} = D_0 - t_{\alpha}\, s_{\bar{d}} = 0 - 1.860(0.07157) = -0.133$. Make a diagram with zero in the
middle showing a 'reject' region below -0.133. Since $\bar{d} = -0.211$ falls in the 'reject' region, reject $H_0$.
(iii) Confidence interval: $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$ becomes $D \le \bar{d} + t_{\alpha}\, s_{\bar{d}} = -0.211 + 1.860(0.07157) = -0.078$.
$D \le -0.078$ contradicts the null hypothesis $D \ge 0$, so reject $H_0$.
Because the difference between the 1999 number and the 2000 number is negative, the 2000 number
is above the 1999 number and the economists were more optimistic in 1999.
Problems with 2 nonnormal samples (medians)
Exercise 12.65 [10.48 in 9th] (10.46 in 8th edition): Da boss ranks 10 individuals each assigned to
traditional training (T) and an experimental method (E) on performance. The 20 employees have been rated
1-20, where 1 is worst. The data is as follows:
T 1 2 3 5 9 10 12 13 14 15
E 4 6 7 8 11 16 17 18 19 20
Using a 5% significance level can you say that there is a difference in the median performance between the two
methods?
Solution: These data are not cross-classified so we use the Mann-Whitney-Wilcoxon method. They also are
ordered for you.
x1 (T)       r1    x2 (E)       r2
Not given     1    Not given     4
              2                  6
              3                  7
              5                  8
              9                 11
             10                 16
             12                 17
             13                 18
             14                 19
             15                 20
Sum          84                126
$H_0: \eta_1 = \eta_2$, $H_1: \eta_1 \ne \eta_2$ (where $\eta$ denotes a median), or the null hypothesis is simply 'similar distributions.'
From the Instructor’s Solution Manual: If population 1 is ‘traditional’ and population 2 is ‘experimental,’ the
null hypothesis says that there is no difference in performance between the traditional and the experimental
training methods, and the alternative hypothesis says that there is a difference in performance between the
traditional and the experimental training methods.
Check this by noting that these two rank sums must add to the sum of the first $n_1 + n_2 = 10 + 10 = 20 = n$
numbers, and that this is $\frac{n(n+1)}{2} = \frac{20(21)}{2} = 210$, and that
$SR_1 + SR_2 = 126 + 84 = 210$. Our test statistic is the sum of the ranks of the smaller sample, but in this case,
since the samples are of equal size, set $SR_1 = W$. Go to table 6a or E8 in the text and find the bounds for
$n_1 = 10$ and $n_2 = 10$. The decision rule is if $SR_1 = W \le 78$ or $W \ge 132$, reject $H_0$. Since the test statistic (84)
is between these bounds, do not reject $H_0$. There is not enough evidence to conclude that there is a
difference in performance between the traditional and the experimental training methods.
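The rank-sum check and decision rule for this exercise can be sketched in a few lines of Python. The bounds 78 and 132 are simply the table values quoted in the solution, not something the code derives, and the variable names are illustrative.

```python
# Ranks 1-20 as given in the exercise (1 is worst).
traditional = [1, 2, 3, 5, 9, 10, 12, 13, 14, 15]
experimental = [4, 6, 7, 8, 11, 16, 17, 18, 19, 20]

# Bounds for n1 = n2 = 10 and a two-sided 5% test, copied from the
# table lookup quoted in the solution.
W_LOWER, W_UPPER = 78, 132

sr1 = sum(traditional)
sr2 = sum(experimental)
n = len(traditional) + len(experimental)

# The two rank sums must add to 1 + 2 + ... + n = n(n + 1)/2.
assert sr1 + sr2 == n * (n + 1) // 2

w = sr1  # equal sample sizes, so the solution takes W = SR1
reject = w <= W_LOWER or w >= W_UPPER
print(f"SR1={sr1}  SR2={sr2}  W={w}  reject H0: {reject}")
```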
Exercise 12.74 [10.57 in 9th] (10.80 in 8th edition): You are given the following 12 differences between
two related samples. Test for a median difference of zero. What is the value of the test statistic?
Data: 3.2, 1.7, 4.5, 0.0, 11.1, -0.8, 2.3, -2.0, 0.0, 14.8, 5.6, 1.7.
Solution: $H_0: \eta = 0$ (where $\eta$ denotes the median). If we are testing for a median of zero, the original data appear in the x column. If
we are testing for the difference between two medians replace x by two columns of data and in the third
column compute the difference, $d = x_1 - x_2$. This method cannot handle ties, so remove the zero
differences. Rank the 10 remaining differences bottom-to-top in the column marked $r$, putting an asterisk
by ties. The last column, $r^*$, resolves the ties by doing things like replacing 2 and 3 by their average, 2.5.
The last column also notes the sign of the differences. We compute $T^+$ and $T^-$, the rank sums of the
numbers signed with + and - .
    x    d = x - 0     |d|    Rank r    Corrected Rank r*
  3.2          3.2     3.2         6    6+
  1.7          1.7     1.7        2*    2.5+
  4.5          4.5     4.5         7    7+
  0.0          0.0     0.0   Discard
 11.1         11.1    11.1         9    9+
 -0.8         -0.8     0.8         1    1-
  2.3          2.3     2.3         5    5+
 -2.0         -2.0     2.0         4    4-
  0.0          0.0     0.0   Discard
 14.8         14.8    14.8        10    10+
  5.6          5.6     5.6         8    8+
  1.7          1.7     1.7        3*    2.5+
We find that $T^+ = 6 + 2.5 + 7 + 9 + 5 + 10 + 8 + 2.5 = 50$ and $T^- = 1 + 4 = 5$.
We should check that for $n = 10$, $T^+ + T^- = \frac{n(n+1)}{2} = \frac{10(11)}{2} = 55$. We find that
$T^+ + T^- = 50 + 5 = 55$. The smaller of our totals is 5. If we check Table 7 for $n = 10$, and a 2-sided
$\alpha = .05$ test we find 8 in the .025 column. This means that we reject the null hypothesis if our rank sum is
less than or equal to 8. Since 5 is below 8, reject the null hypothesis.
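The ranking procedure just described (drop zeros, rank the absolute differences, average tied ranks, then sum by sign) is mechanical enough to script. Here is a sketch in plain Python; the function name `signed_rank_sums` is made up, and the critical value 8 is the Table 7 entry quoted above.

```python
def signed_rank_sums(diffs):
    """Return (n, T_plus, T_minus), dropping zeros and averaging tied ranks."""
    nonzero = [d for d in diffs if d != 0]
    ordered = sorted(nonzero, key=abs)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        # Positions i+1 .. j (1-based) are tied and share their average rank.
        ranks[abs(ordered[i])] = (i + 1 + j) / 2
        i = j
    t_plus = sum(ranks[abs(d)] for d in nonzero if d > 0)
    t_minus = sum(ranks[abs(d)] for d in nonzero if d < 0)
    return len(nonzero), t_plus, t_minus


data = [3.2, 1.7, 4.5, 0.0, 11.1, -0.8, 2.3, -2.0, 0.0, 14.8, 5.6, 1.7]
n, t_plus, t_minus = signed_rank_sums(data)

# Check: the two rank sums must add to n(n + 1)/2.
assert t_plus + t_minus == n * (n + 1) / 2

CRITICAL = 8  # Table 7 value for n = 10, two-sided alpha = .05
reject = min(t_plus, t_minus) <= CRITICAL
print(f"n={n}  T+={t_plus}  T-={t_minus}  reject H0: {reject}")
```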
Exercise 12.75 [10.58 in 9th] (10.81 in 8th edition): In 12.74 find the lower and upper critical values of
W from Table E.9. Use a 5% significance level to test for a zero median.
Solution: These two exercises give the procedure for using the text table E9, which differs somewhat from
Table 7.
$n' = 10$, $\alpha = 0.05$, $W_L = 8$, $W_U = 47$
Exercise 12.76 [10.59 in 9th] (10.82 in 8th edition): In 12.74 , what is your statistical decision?
Solution: Since W = 50 > WU = 47, reject H 0 .
Downing & Clark (Computational Problem) 17-15: Twenty people rank two political candidates (A, B) on a scale of 1-10. Test the null hypothesis that people have no preference between the candidates. Data are shown in columns A and B below. ($\alpha = .10$)
Solution: Because this is preference data, we cannot assume that it has the normal distribution. Because it is paired, use the Wilcoxon Signed Rank Test. The hypotheses are $H_0: \eta_1 = \eta_2$ and $H_1: \eta_1 \ne \eta_2$, or the null hypothesis is simply 'similar distributions.'
Person   A ($x_1$)   B ($x_2$)   $d = x_1 - x_2$   $|d|$   Rank r   Corrected rank r*
   1         2           8            -6             6       20         20-
   2         5           6            -1             1        1          5-
   3         3           5            -2             2       10         11.5-
   4         7           8            -1             1        2          5-
   5         4           5            -1             1        3          5-
   6         8           4             4             4       19         19+
   7         9           8             1             1        4          5+
   8         8           9            -1             1        5          5-
   9         7           8            -1             1        6          5-
  10         5           6            -1             1        7          5-
  11         6           5             1             1        8          5+
  12         7           4             3             3       14         16+
  13         8           5             3             3       15         16+
  14         8           6             2             2       11         11.5+
  15         9           6             3             3       16         16+
  16         9           7             2             2       12         11.5+
  17         8           9            -1             1        9          5-
  18         4           6            -2             2       13         11.5-
  19        10           7             3             3       17         16+
  20         8           5             3             3       18         16+
To explain the calculation of corrected ranks we need the table below. Because of the presence of numbers of equal magnitude, the number in 'Corrected Rank r*' is the average of the numbers in 'Rank r.'

$|d|$    Rank r                    Corrected rank r*
  1      1 2 3 4 5 6 7 8 9              5
  2      10 11 12 13                   11.5
  3      14 15 16 17 18                16
  4      19                            19
  6      20                            20
If we sum the corrected ranks we get $T^+ = 132$ for those with a + sign and $T^- = 78$ for those with a - sign. The smaller of these is designated $T_L = 78$. (Check: $T^+ + T^- = 132 + 78 = 210$. This should be equal to the sum of the first 20 numbers, which is $\frac{20(21)}{2} = 210$.) If we use Table 7, "Critical Values of $T_L$ in the Wilcoxon Signed Rank Sum Test …..," we use the .05 column for a 2-sided test with $\alpha = .10$. For $n = 20$ the critical value is 60. Since 78 is above 60, do not reject $H_0$.
For values of n above 15, $T_L$, the smaller of $T^+$ and $T^-$, is approximately normally distributed, and the approximation may be used here with mean $\mu_T = \frac{1}{4} n(n+1) = \frac{1}{4}(20)(21) = 105$ and variance $\sigma_T^2 = \frac{1}{6}(2n+1)\mu_T = \frac{1}{6}(41)(105) = 717.5$. If the significance level is 10% and the test is two-sided, we reject our null hypothesis if $z = \frac{T_L - \mu_T}{\sigma_T}$ does not lie between $\pm z_{\alpha/2} = \pm z_{.05} = \pm 1.645$. Here $z = \frac{T_L - \mu_T}{\sigma_T} = \frac{78 - 105}{\sqrt{717.5}} = -1.008$. Since this z is between $\pm 1.645$, we do not reject $H_0$.
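The normal approximation for the signed rank statistic can be checked numerically. The sketch below assumes only the mean and variance formulas stated in this solution; the function name `signed_rank_z` is hypothetical and the 1.645 cutoff is the two-sided 10% critical value used above.

```python
import math


def signed_rank_z(t_l, n):
    """Normal approximation for the smaller signed rank sum T_L (hypothetical helper)."""
    mu_t = n * (n + 1) / 4           # mean of T_L under the null hypothesis
    var_t = (2 * n + 1) * mu_t / 6   # variance of T_L under the null hypothesis
    z = (t_l - mu_t) / math.sqrt(var_t)
    return mu_t, var_t, z


mu_t, var_t, z = signed_rank_z(78, 20)
reject = abs(z) > 1.645              # two-sided test at the 10% level
print(mu_t, var_t, round(z, 3), reject)
```

With $T_L = 78$ and $n = 20$ the statistic falls well inside the acceptance region, which agrees with the table-based decision.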
Downing & Clark (Computational Problem) 17-9: Defense budgets are given for two countries over two decades. Check to see that they have the same distribution. Data are given in columns $x_1$ and $x_2$ below. ($\alpha = .10$) Note: It is not totally clear to me that the Wilcoxon-Mann-Whitney test is appropriate in this case, because the data in each row may come from a single year, so a paired data test may be more appropriate, and it is not clear in general why a nonparametric procedure is appropriate. For this reason, Computational problem 11 may be a better example of when to use this method.
Solution: Because the data are assumed to be independent random samples and thus not paired, use the Wilcoxon-Mann-Whitney Rank Sum Test. The hypotheses are $H_0: \eta_1 = \eta_2$ and $H_1: \eta_1 \ne \eta_2$, or the null hypothesis is simply 'similar distributions.' $n_1 = 10$ and $n_2 = 10$. In the table below, $r_1$ and $r_2$ represent bottom to top ranking.
 $x_1$    $r_1$    $x_2$    $r_2$
   10       3         9        2
   12       4         8        1
   15       7        17        9
   16       8        14        6
   18      10        13        5
   22      12        19       11
   26      15        24       13
   28      17        25       14
   30      19        27       16
   29      18        31       20
  Sum     113                 97
So the sums of the ranks are $SR_1 = 113$, $SR_2 = 97$. Check: Note that these two rank sums must add to the sum of the first $n_1 + n_2 = 10 + 10 = 20 = n$ numbers, that this is $\frac{n(n+1)}{2} = \frac{20(21)}{2} = 210$, and that $SR_1 + SR_2 = 97 + 113 = 210$.
The smaller of $SR_1$ and $SR_2$ is called W. This can be compared against the critical values for $T_L$ and $T_U$ in Table 6b. For $n_1 = 10$, $n_2 = 10$ and a 2-tailed test with $\alpha = .10$, $T_L = 82$ and $T_U = 128$. Since $W = 97$, it is between these values and we cannot reject the null hypothesis.
For values of $n_1$ and $n_2$ that are too large for the tables, W is approximately normally distributed with mean $\mu_W = \frac{1}{2} n_1 (n_1 + n_2 + 1) = \frac{1}{2}(10)(10 + 10 + 1) = 105$ and variance $\sigma_W^2 = \frac{1}{6} n_2 \mu_W = \frac{1}{6}(10)(105) = 175$, so $z = \frac{W - \mu_W}{\sigma_W} = \frac{97 - 105}{\sqrt{175}} = -0.605$. If we wish to take a p-value approach, $p\text{-value} = 2P(z \le -0.605) = 2(.5 - .2274) = 2(.2726) = .5452$. Since our p-value is larger than the significance level,
we cannot reject the null hypothesis. Though the text uses this method in this problem, strictly speaking, it
should be limited to cases where n2  20, so that we are better off using the table.
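The rank sums and the normal approximation for this problem can be reproduced with a short Python sketch. It is illustrative only: `rank_sums` and `normal_cdf` are hypothetical helper names, ties (none occur here) would get averaged ranks, and the mean formula takes $n_1$ as the size of the sample whose rank sum is W, which is unambiguous here because both samples have 10 observations.

```python
import math


def rank_sums(sample1, sample2):
    """Rank the pooled data bottom-to-top and return the two rank sums."""
    pooled = sorted(sample1 + sample2)
    rank_of = {}
    for value in set(pooled):
        first = pooled.index(value) + 1           # 1-based rank of first occurrence
        count = pooled.count(value)
        rank_of[value] = first + (count - 1) / 2  # average rank for tied values
    return (sum(rank_of[v] for v in sample1),
            sum(rank_of[v] for v in sample2))


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


x1 = [10, 12, 15, 16, 18, 22, 26, 28, 30, 29]
x2 = [9, 8, 17, 14, 13, 19, 24, 25, 27, 31]
sr1, sr2 = rank_sums(x1, x2)
n1, n2 = len(x1), len(x2)

w = min(sr1, sr2)
mu_w = n1 * (n1 + n2 + 1) / 2
var_w = n2 * mu_w / 6
z = (w - mu_w) / math.sqrt(var_w)
p_two_sided = 2 * normal_cdf(-abs(z))

print(sr1, sr2)        # 113.0 97.0
print(round(z, 3))     # -0.605
print(p_two_sided > 0.10)  # True, so the null hypothesis is not rejected
```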
252solnD2 10/06/03
Some problems from McClave et. al. have been left in here for practice.
McClave et al. Exercise 15-12†: Let $x_1$ be sample B (the smaller sample) and $x_2$ be sample A. Test that the median for sample B is above the median for sample A. The data given are, for B, $x_1 = 65, 35, 47, 52$ and, for A, $x_2 = 37, 40, 33, 29, 42, 33, 35, 26, 34$.
Solution: Our hypotheses become $H_0: \eta_1 \le \eta_2$, $H_1: \eta_1 > \eta_2$ or $H_0: x_1 \le x_2$, $H_1: x_1 > x_2$. $\alpha = .05$.
If we put these data in order we get the table below. We have 13 numbers here that we will rank from 1 to 13. Because the numbers in $x_1$ are generally higher than the numbers in $x_2$, we rank from top to bottom. The ranks and their sums are shown beside the data.

 $x_1$    $r_1$    $x_2$    $r_2$
                     28       1
                     29       2
                     33       3.5
                     33       3.5
                     34       5
   35      6.5       35       6.5
                     37       8
                     40       9
                     42      10
   47     11
   52     12
   65     13
  Sum     42.5               48.5
To check our ranking, note that $n_1 = 4$, $n_2 = 9$, $n = n_1 + n_2 = 9 + 4 = 13$, $SR_1 = 42.5$, $SR_2 = 48.5$, and $SR_1 + SR_2 = 42.5 + 48.5 = 91$. Remember that we should have $SR_1 + SR_2 = \frac{n(n+1)}{2} = \frac{13(14)}{2} = 91$. We do, so our ranking is likely to be correct. We designate the rank sum of the smaller group as $W = 42.5$. The problem says $\alpha = .05$. The sample sizes are too large for Table 5, but we can use Table 6a for a one-sided test with $n_1 = 4$ and $n_2 = 9$. The two critical values it gives are 14 and 42, so we reject the null hypothesis since W is above the upper bound.
McClave et al. Exercise 15-28†: Check the data below to see if they come from similar distributions. Each line represents a couple.
Solution: $H_0: \eta_1 = \eta_2$, $H_1: \eta_1 \ne \eta_2$, or $H_0$: $x_1$ and $x_2$ come from similar distributions, $H_1$: they do not. $\alpha = .05$. Because the data are cross-classified, we use the Wilcoxon signed rank test. The original data appear in the first two columns. In the third column we compute the difference, $d = x_1 - x_2$. This method cannot handle zero differences, so we remove the last pair. We rank the 11 remaining differences bottom-to-top by absolute value in the column marked r, putting an asterisk by ties. The last column, r*, resolves the ties by doing things like replacing 5, 6 and 7 by their average, 6. The last column also notes the sign of the differences. We compute $T^+$ and $T^-$, the rank sums of the numbers signed with + and -.

 $x_1$   $x_2$     d      r      r*
   10      2       8     11     11+
    2      1       1      1*     1.5+
    7      3       4      8      8+
    1      6      -5      9      9-
    2      5      -3      5*     6-
    6      4       2      3*     3.5+
    5      8      -3      6*     6-
    7     10      -3      7*     6-
    9      7       2      4*     3.5+
    2      8      -6     10     10-
   10     11      -1      2*     1.5-
   12     12       0   (removed)

We find that $T^+ = 11 + 1.5 + 8 + 3.5 + 3.5 = 27.5$ and $T^- = 9 + 6 + 6 + 6 + 10 + 1.5 = 38.5$. We should check that for $n = 11$, $T^+ + T^- = \frac{n(n+1)}{2} = \frac{11(12)}{2} = 66$. We find that $T^+ + T^- = 38.5 + 27.5 = 66$. The smaller of our totals is 27.5. If we check Table 7 for $n = 11$ and a 2-sided $\alpha = .05$ test, we find 11 in the .025 column. This means that we reject the null hypothesis if our rank sum is less than or equal to 11. Since 27.5 is above 11, do not reject the null hypothesis.
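For paired data the same bookkeeping can be done directly from the two columns. The sketch below is a hypothetical helper (`paired_signed_rank` is not from the text) that drops pairs with a zero difference, averages the ranks of tied magnitudes, and returns the two signed rank sums for the couples data in this exercise.

```python
from collections import defaultdict


def paired_signed_rank(x1, x2):
    """Return (T_plus, T_minus, n) for paired samples (hypothetical helper)."""
    # Differences, with tied pairs (zero differences) removed.
    d = [a - b for a, b in zip(x1, x2) if a != b]
    # Record the 1-based positions occupied by each magnitude in sorted order.
    positions = defaultdict(list)
    for position, size in enumerate(sorted(abs(v) for v in d), start=1):
        positions[size].append(position)
    # A tied magnitude gets the average of the positions it occupies.
    average_rank = {size: sum(p) / len(p) for size, p in positions.items()}
    t_plus = sum(average_rank[abs(v)] for v in d if v > 0)
    t_minus = sum(average_rank[abs(v)] for v in d if v < 0)
    return t_plus, t_minus, len(d)


x1 = [10, 2, 7, 1, 2, 6, 5, 7, 9, 2, 10, 12]
x2 = [2, 1, 3, 6, 5, 4, 8, 10, 7, 8, 11, 12]
t_plus, t_minus, n = paired_signed_rank(x1, x2)
print(t_plus, t_minus, n)                    # 27.5 38.5 11
print(t_plus + t_minus == n * (n + 1) / 2)   # True
```

The smaller sum, 27.5, is the value compared with the Table 7 critical value.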
Problem D5: a) If our sample consists of the numbers 9, 14, 16, 16, 18, 19, 22, 23, 25, 26, test the hypotheses $H_0: \eta = 15$, $H_1: \eta \ne 15$ by computing $x - \eta_0$ for each value of x and using the magnitude and sign of the results to rank them and perform a Wilcoxon signed rank test. ($\alpha = .05$)
b) For the following paired samples, test the hypotheses $H_0: \eta_1 = \eta_2$, $H_1: \eta_1 \ne \eta_2$ using a Wilcoxon signed rank test. ($\alpha = .05$)
   $x_1$:  09 14 16 16 18 19 22 23 25 78
   $x_2$:  14 10 08 14 13 16 12 40 13 24
Solution: a)
   x    $d = x - \eta_0$    $|d|$    Rank r    Corrected rank r*
   9         -6               6        6           6-
  14         -1               1        1           2-
  16          1               1        2           2+
  16          1               1        3           2+
  18          3               3        4           4+
  19          4               4        5           5+
  22          7               7        7           7+
  23          8               8        8           8+
  25         10              10        9           9+
  26         11              11       10          10+
The hypotheses are $H_0: \eta = 15$, $H_1: \eta \ne 15$, so $d = x - \eta_0 = x - 15$. If we sum the corrected ranks we get $T^+ = 47$ for those with a + sign and $T^- = 8$ for those with a - sign. The smaller of these is designated $T_L = 8$. (Check: $T^+ + T^- = 47 + 8 = 55$. This should be equal to the sum of the first 10 numbers, which is $\frac{10(11)}{2} = 55$.) If we use Table 7, "Critical Values of $T_L$ in the Wilcoxon Signed Rank Sum Test …..," we use the .025 column for a 2-sided test with $\alpha = .05$. For $n = 10$ the critical value is 8. Since this is equal to our value of $T_L$, reject $H_0$.
b)
Observation   $x_1$   $x_2$   $d = x_1 - x_2$   $|d|$   Rank r   Corrected rank r*
     1           9      14          -5            5        4          4.5-
     2          14      10           4            4        3          3+
     3          16       8           8            8        6          6+
     4          16      14           2            2        1          1+
     5          18      13           5            5        5          4.5+
     6          19      16           3            3        2          2+
     7          22      12          10           10        7          7+
     8          23      40         -17           17        9          9-
     9          25      13          12           12        8          8+
    10          78      24          54           54       10         10+

The hypotheses are $H_0: \eta_1 = \eta_2$, $H_1: \eta_1 \ne \eta_2$. If we sum the corrected ranks we get $T^+ = 41.5$ for those with a + sign and $T^- = 13.5$ for those with a - sign. The smaller of these is designated $T_L = 13.5$. (Check: $T^+ + T^- = 41.5 + 13.5 = 55$. This should be equal to the sum of the first 10 numbers, which is $\frac{10(11)}{2} = 55$.) If we use Table 7, "Critical Values of $T_L$ in the Wilcoxon Signed Rank Sum Test …..," we use the .025 column for a 2-sided test with $\alpha = .05$. For $n = 10$ the critical value is 8. Since our value of $T_L$ is above the critical value, do not reject $H_0$.
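The decision rule used in both parts of Problem D5 (reject when the smaller rank sum is at or below the tabled critical value) can be written out explicitly. In this sketch the dictionary `CRITICAL_TL` is a hypothetical excerpt holding only the Table 7 values quoted in this document, keyed by n and the two-sided significance level; it is not the full table, and `signed_rank_decision` is an illustrative name.

```python
# Excerpt of Table 7 critical values as quoted in this document.
# Key: (n, two-sided alpha); value: reject H0 if T_L <= this number.
CRITICAL_TL = {
    (10, 0.05): 8,
    (11, 0.05): 11,
    (20, 0.10): 60,
}


def signed_rank_decision(t_plus, t_minus, n, alpha):
    """Return (T_L, critical value, reject?) for a two-sided signed rank test."""
    if t_plus + t_minus != n * (n + 1) / 2:
        raise ValueError("rank sums do not add to n(n+1)/2; check the ranking")
    t_l = min(t_plus, t_minus)
    critical = CRITICAL_TL[(n, alpha)]
    return t_l, critical, t_l <= critical


print(signed_rank_decision(47, 8, 10, 0.05))       # (8, 8, True): reject H0 in part a
print(signed_rank_decision(41.5, 13.5, 10, 0.05))  # (13.5, 8, False): do not reject in part b
```

Building the n(n+1)/2 check into the function catches ranking mistakes before the table lookup.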
Parts not copied ©2003 Roger Even Bove