252y0821 3/31/08
ECO252 QBA2
SECOND EXAM
March 28 2008
Solution-33 pages
Name
KEY
Class Hour___________________
Show your work! Make Diagrams! Exam is normed on 50 points. Answers without reasons are not
usually acceptable.
I. (8 points) Do all the following. Make diagrams!
x ~ N 11, 13  - If you are not using the supplement table, make sure that I know it.
54  11 
 0  11
z
 P 0.85  z  3.31  P0.85  z  0  P0  z  3.31
1. P0  x  54   P 
13 
 13
 .3023  .4995  .8018 . Values are underlined on the next page.
For z make a diagram. Draw a Normal curve with a mean at 0. Indicate zero by a vertical line! Shade
the area between -0.85 and 3.31. Because this is on both sides of zero, we must add together the area
between -0.85 and zero and the area between zero and 3.31. If you wish, make a completely separate
diagram for x . Draw a Normal curve with a mean at 11. Indicate the mean by a vertical line! Shade the
area between zero and 54. This area includes the mean (11) and areas to either side of it so we add together
these two areas.
 16  11 

 Pz  2.08   Pz  0  P2.08  z  0  .5  .4812  .0188
2. Px  16   P  z 
13 

For z make a diagram. Draw a Normal curve with a mean at 0. Indicate zero by a vertical line! Shade
the area below -2.08. Because this is on one side of zero, we must subtract the area between -2.08 and zero
from the entire (larger) area below zero. If you wish, make a completely separate diagram for x . Draw a
Normal curve with a mean at 11. Indicate the mean by a vertical line! Shade the area below -16. This
area does not include the mean (11) so we subtract the area between -16 and the mean from the larger area
below the mean.
3. P(12 ≤ x ≤ 41) = P[(12 − 11)/13 ≤ z ≤ (41 − 11)/13] = P(0.08 ≤ z ≤ 2.31) = P(0 ≤ z ≤ 2.31) − P(0 ≤ z ≤ 0.08) = .4896 − .0319 = .4577
For z make a diagram. Draw a Normal curve with a mean at 0. Indicate zero by a vertical line! Shade
the area between 0.08 and 2.31. Because this is on one side of zero, we subtract the area between zero and
0.08 from the larger area between zero and 2.31. If you wish, make a completely separate diagram for x .
Draw a Normal curve with a mean at 11. Indicate the mean by a vertical line! Shade the area between 12
and 41. This area does not include the mean (11) so we subtract the area between the mean and 12 from the
larger area between the mean and 41.
4. x_.055 (Do not try to use the t table to get this.) For z make a diagram. Draw a Normal curve with a mean at 0. Indicate zero by a vertical line! z_.055 is the value of z with 5.5% of the distribution above it. Since 100 − 5.5 = 94.5, it is also the 94.5th percentile. Since 50% of the standardized Normal distribution is below zero, your diagram should show that the probability between zero and z_.055 is 94.5% − 50% = 44.5%, or P(0 ≤ z ≤ z_.055) = .4450. The closest we can come to this on the standardized Normal table is P(0 ≤ z ≤ 1.60) = .4452, so z_.055 ≈ 1.60. To get from z_.055 to x_.055, use the formula x = μ + σz, which is the opposite of z = (x − μ)/σ: x_.055 = 11 + 1.60(13) = 31.80. If you wish, make a completely separate diagram for x. Draw a Normal curve with a mean at 11. Show that 50% of the distribution is below the mean (11). If 5.5% of the distribution is above x_.055, it must be above the mean and have 44.5% of the distribution between it and the mean.
Check: P(x > 31.8) = P[z > (31.8 − 11)/13] = P(z > 1.60) = P(z > 0) − P(0 ≤ z ≤ 1.60) = .5 − .4452 = .0548
TABLE 4
The Standard Normal Distribution. Example: P(0 ≤ z ≤ 1.21) = 0.3869

z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0  0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1  0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2  0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3  0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4  0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5  0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6  0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7  0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8  0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9  0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0  0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1  0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2  0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3  0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4  0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5  0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6  0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7  0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8  0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9  0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0  0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
2.1  0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
2.2  0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
2.3  0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
2.4  0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5  0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6  0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7  0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8  0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
2.9  0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
3.0  0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990

For values above 3.09, see below.

If z₀ is between     P(0 ≤ z ≤ z₀) is
3.08 and 3.10        .4990
3.11 and 3.13        .4991
3.14 and 3.17        .4992
3.18 and 3.21        .4993
3.22 and 3.26        .4994
3.27 and 3.32        .4995
3.33 and 3.38        .4996
3.39 and 3.48        .4997
3.49 and 3.61        .4998
3.62 and 3.89        .4999
3.90 and up          .5000
II. (5+ points) Do all the following. Look them over first – There is a section III in the in-class exam and
the computer problem is at the end. Show your work where appropriate. There is a penalty for not
doing Problem 1. Page 11 is left blank if you need more space for calculations.
Note the following:
1. This test is normed on 50 points, but there are more points possible including the take-home. You are unlikely to finish
the exam and might want to skip some questions.
2. A table identifying methods for comparing 2 samples is at the end of the exam.
3. If you answer ‘None of the above’ in any question, you should provide an alternative answer and explain why. You may
receive credit for this even if you are wrong.
4. Use a 5% significance level unless the question says otherwise.
5. Read problems carefully. A problem that looks like a problem on another exam may be quite different.
6. Make sure that you state your null and alternative hypotheses, that I know what method you are using, and what the conclusion is when you do a statistical test. Use a significance level of .05 unless you are told otherwise.
1. You wish to assess the stability of the price of a stock and you find closing prices for the last year.
Rather than computing a variance of the entire population you take a sample of seven randomly picked
closing prices and compute a sample standard deviation. The sample is below – compute the sample
standard deviation. Show your work! (3)
Row x
1 89
2 124
3 56
4 94
5 75
6 82
7 63
For your convenience, the sum of the first six numbers in x is Σx = 520 and the sum of their squares is Σx² = 47618.
Solution: If you took advantage of the numbers that were given, Σx = 520 + 63 = 583 and Σx² = 47618 + 63² = 47618 + 3969 = 51587.
If you wasted time by not using this freebie, the results are as below. Of course you should not bother with the last two columns.
Row     x      x²      x − x̄     (x − x̄)²
1      89    7921     5.7143      32.65
2     124   15376    40.7143    1657.65
3      56    3136   -27.2857     744.51
4      94    8836    10.7143     114.80
5      75    5625    -8.2857      68.65
6      82    6724    -1.2857       1.65
7      63    3969   -20.2857     411.51
Sum   583   51587     0.0000    3031.43
We thus have Σx = 583, Σx² = 51587, Σ(x − x̄) = 0 (a check) and Σ(x − x̄)² = 3031.43. So the mean of x is x̄ = Σx/n = 583/7 = 83.2857.
If you used the computational formula, you got s_x² = (Σx² − n x̄²)/(n − 1) = [51587 − 7(83.2857)²]/6 = 3031.4452/6 = 505.2409 and s_x = √505.2409 = 22.4776. If you wasted more time by using the definitional formula, you got s_x² = Σ(x − x̄)²/(n − 1) = 3031.43/6 = 505.2383 and s_x = √505.2383 = 22.4775. Minitab gets s_x² = 505.238 and s_x = 22.48.
2) You wish to compare this stock against a second stock that your friend recommends. Your friend has taken a random sample of 10 closing prices and assures us that the sample mean price of this stock is 117.699 and the sample standard deviation is 55.2764. You don't like your friend's stock because 1) it has a larger variance, indicating that it is riskier, and 2) it costs more per share. The values you get are in the y column, with z_y being the y values with 117.699 subtracted and the result divided by 55.2764. Compare the variances using a statistical test of the equality of variances. (2)
Row       y     z_y
1     78.48   -0.71
2    130.93    0.24
3     93.17   -0.44
4    105.37   -0.22
5     69.50   -0.87
6     85.43   -0.58
7    102.84   -0.27
8    259.27    2.56
9    151.17    0.61
10   100.83   -0.31
Solution: We have s_x² = 505.238, df_x = 6, s_y² = 55.2764² = 3055.480 and df_y = 9. From Table 3 we have the following for comparison of two variances on the assumption of Normality. Our hypotheses are H₀: σ₁² = σ₂² and H₁: σ₁² ≠ σ₂².
Interval for: Ratio of Variances, σ₂²/σ₁², with DF₁ = n₁ − 1 and DF₂ = n₂ − 1
Confidence Interval: (s₂²/s₁²) F_(1−α/2)(DF₁, DF₂) ≤ σ₂²/σ₁² ≤ (s₂²/s₁²) F_(α/2)(DF₁, DF₂)
Hypotheses: H₀: σ₁² = σ₂², H₁: σ₁² ≠ σ₂²
Test Ratio: F(DF₁, DF₂) = s₁²/s₂² or F(DF₂, DF₁) = s₂²/s₁², using F_(1−α/2)(DF₁, DF₂) = 1/F_(α/2)(DF₂, DF₁)
As explained in class, we should compare the larger of the two ratios F(DF_x, DF_y) = s_x²/s_y² or F(DF_y, DF_x) = s_y²/s_x² against an appropriate value of F_(α/2). The larger of the two ratios is F(9, 6) = s_y²/s_x² = 3055.480/505.238 = 6.0476. The part of the F table with df₂ = 6 is below, so F_.025(9, 6) = 5.52, and since the computed F is larger than the table F, we reject the null hypothesis. Though we really should divide the standard deviation by the mean to get per-dollar risk, the second stock looks riskier.
df 2  6
df1
1
2
*
*
0.100 3.78 3.46
0.050 5.99 5.14
0.025 8.81 7.26
0.010 13.75 10.92
3
4
5
*
3.29
4.76
6.60
9.78
*
3.18
4.53
6.23
9.15
*
3.11
4.39
5.99
8.75
6
*
3.05
4.28
5.82
8.47
7
*
8
*
3.01
4.21
5.70
8.26
*
2.98
4.15
5.60
8.10
9
*
2.96
4.10
5.52
7.98
10
11
12
*
2.94
4.06
5.46
7.87
*
2.92
4.03
5.41
7.79
*
2.90
4.00
5.37
7.72
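A minimal sketch of the same decision in Python (the table value 5.52 is copied from the F table above, not computed):

```python
# Variance-ratio test: put the larger sample variance on top so the ratio
# can be compared with an upper-tail F value.
s2_x, s2_y = 505.238, 3055.480          # sample variances of x and y
F = s2_y / s2_x                          # F(9, 6): y has 9 df, x has 6
F_table = 5.52                           # F_.025 with (9, 6) df, from the table
print(round(F, 4), F > F_table)          # -> 6.0476 True, so reject equal variances
```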
The parts of Table 3 to be used in questions 3) and 5) follow. In all three cases d = x̄₁ − x̄₂ and D = μ₁ − μ₂; the hypotheses are H₀: D = D₀ and H₁: D ≠ D₀ (the same as H₀: μ₁ = μ₂ and H₁: μ₁ ≠ μ₂ if D₀ = 0).

Difference between Two Means (σ known):
Confidence Interval: D = d ± z_(α/2) σ_d, where σ_d = √(σ₁²/n₁ + σ₂²/n₂)
Test Ratio: z = (d − D₀)/σ_d
Critical Value: d_cv = D₀ ± z_(α/2) σ_d

Difference between Two Means (σ unknown, variances assumed equal):
Confidence Interval: D = d ± t_(α/2) s_d, where s_d = ŝ_p √(1/n₁ + 1/n₂), ŝ_p² = [(n₁ − 1)s₁² + (n₂ − 1)s₂²]/(n₁ + n₂ − 2) and DF = n₁ + n₂ − 2
Test Ratio: t = (d − D₀)/s_d
Critical Value: d_cv = D₀ ± t_(α/2) s_d

Difference between Two Means (σ unknown, variances assumed unequal):
Confidence Interval: D = d ± t_(α/2) s_d, where s_d = √(s₁²/n₁ + s₂²/n₂) and
DF = (s₁²/n₁ + s₂²/n₂)² / [(s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1)]  (the Satterthwaite approximation)
Test Ratio: t = (d − D₀)/s_d
Critical Value: d_cv = D₀ ± t_(α/2) s_d
3) Are you sure that stock y has a higher average price than stock x? Using the results of 2) compare the mean prices. If you do not assume equality of variances, assume that you can use 14 degrees of freedom for the test. (3)
Solution: We have x̄ = 83.2857, s_x² = 505.238, n_x = 7, ȳ = 117.699, s_y² = 3055.480 and n_y = 10.
H₀: μ_x ≥ μ_y and H₁: μ_x < μ_y or, if D = μ_x − μ_y, H₀: D ≥ 0 and H₁: D < 0.
As a result of our test in 2), we cannot assume equal variances. So we can compute
s_d = √(s₁²/n₁ + s₂²/n₂) = √(505.238/7 + 3055.480/10) = √(72.1769 + 305.5480) = √377.7249 = 19.43514,
d = 83.286 − 117.699 = −34.413 and t_.05(14) = 1.761. We will use only one of the following three methods.
Test Ratio: t = (d − 0)/s_d = −34.413/19.43514 = −1.771. Since this is a left-sided test, our 'reject' region is all points below −t_.05(14) = −1.761. Since t = −1.771 is below −1.761, we (barely) reject H₀.
Critical value for the difference between sample means: For a two-sided test d_cv = D₀ ± t_(α/2)(14) s_d, but this is a left-sided test and, because the alternative hypothesis is H₁: D < 0, we need one critical value below zero: d_cv = 0 − t_.05(14) s_d = −1.761(19.43514) = −34.225. Our 'reject' region is all points below −34.225. Since d = −34.413 is below our critical value, we reject H₀.
One-sided confidence interval: For a two-sided test the interval is D = d ± t_(α/2)(14) s_d, but this is a left-sided test and, because the alternative hypothesis is H₁: D < 0, we need a one-sided confidence interval or upper limit: D ≤ d + t_.05(14) s_d = −34.413 + 34.225 = −0.188. The interval D ≤ −0.188 does not include zero, so it contradicts H₀: D ≥ 0, and we reject H₀.
4) Test stock y to see if it has a Normal distribution. How do your results from the test of Normality affect your assessment of the results in 2) and 3)? (4)
Solution: We can set this up as a Lilliefors test if we put the numbers in order. All probabilities come from the standardized Normal table. Since each number is considered a group, the O column is all ones. cum O is a running sum of the O column. Fo is the cum O column divided by n = 10.
Row       y     z_y   z_y in order   Fe = P(z ≤ z_y)       O  cum O      Fo   D = |Fo − Fe|
1     78.48   -0.71      -0.87     .5 − .3078 = .1922      1      1   .1000   .0922
2    130.93    0.24      -0.71     .5 − .2611 = .2389      1      2   .2000   .0389
3     93.17   -0.44      -0.58     .5 − .2190 = .2810      1      3   .3000   .0190
4    105.37   -0.22      -0.44     .5 − .1700 = .3300      1      4   .4000   .0700
5     69.50   -0.87      -0.31     .5 − .1217 = .3783      1      5   .5000   .1217
6     85.43   -0.58      -0.27     .5 − .1064 = .3936      1      6   .6000   .2064
7    102.84   -0.27      -0.22     .5 − .0871 = .4129      1      7   .7000   .2871
8    259.27    2.56       0.24     .5 + .0948 = .5948      1      8   .8000   .2052
9    151.17    0.61       0.61     .5 + .2291 = .7291      1      9   .9000   .1709
10   100.83   -0.31       2.56     .5 + .4948 = .9948      1     10  1.0000   .0052
Sum                                                              10 = n
The relevant part of the new Lilliefors table appears below. We reject our null hypothesis of Normality if the largest number in the D = |Fo − Fe| column exceeds the number from the Lilliefors table.
TABLE 11: Critical Values for the Lilliefors Test

n     α = .20   α = .15   α = .10   α = .05   α = .01
4      .3027     .3216     .3456     .3754     .4129
5      .2893     .3027     .3188     .3427     .3959
6      .2694     .2816     .2982     .3245     .3728
7      .2521     .2641     .2802     .3041     .3504
8      .2387     .2502     .2649     .2825     .3331
9      .2273     .2382     .2522     .2744     .3162
10     .2171     .2273     .2410     .2616     .3037
The maximum discrepancy between the Observed and Expected cumulative distributions is .2871. This exceeds the table value of .2410 for α = .10 and the value of .2616 for α = .05, so we reject the null hypothesis of Normality at the 5% level. This casts doubt on the Normality assumption behind the t and F methods used in 2) and 3).
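The Lilliefors statistic above can be reproduced with a short Python sketch; `math.erf` gives the exact Normal CDF instead of the four-place table, so the result differs only in rounding:

```python
import math

# Standardized y values, already sorted.
z = [-0.87, -0.71, -0.58, -0.44, -0.31, -0.27, -0.22, 0.24, 0.61, 2.56]
n = len(z)

def phi(v):
    # Standard Normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

# Largest gap between the empirical CDF Fo = (i+1)/n and the Normal CDF Fe.
D = max(abs((i + 1) / n - phi(v)) for i, v in enumerate(z))
print(round(D, 4))                       # -> 0.2871, above the table value .2410
```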
5) Using the sample means and standard deviations you found in 2) and 3) but assuming that both samples are of size 100 and come from a Normal distribution, do an 11% confidence interval for the difference between the means. (2) [14]
Solution: We have x̄ = 83.2857, s_x² = 505.238, n_x = 100, ȳ = 117.699, s_y² = 3055.480 and n_y = 100.
H₀: μ_x ≥ μ_y and H₁: μ_x < μ_y or, if D = μ_x − μ_y, H₀: D ≥ 0 and H₁: D < 0.
As a result of our test in 2), we cannot assume equal variances. So we can compute
s_d = √(s₁²/n₁ + s₂²/n₂) = √(505.238/100 + 3055.480/100) = √(5.05238 + 30.55480) = √35.60718 = 5.96718,
d = 83.286 − 117.699 = −34.413, and this is the large-sample case, so use z_.05 = 1.645. We will use only one of the following three methods.
Test Ratio: z = (d − 0)/s_d = −34.413/5.96718 = −5.767. Since this is a left-sided test, our 'reject' region is all points below −z_.05 = −1.645. Since z = −5.767 is below −1.645, we (definitely) reject H₀.
Critical value for the difference between sample means: For a two-sided test d_cv = D₀ ± z_(α/2) s_d, but this is a left-sided test and, because the alternative hypothesis is H₁: D < 0, we need one critical value below zero: d_cv = 0 − z_.05 s_d = −1.645(5.96718) = −9.816. Our 'reject' region is all points below −9.816. Since d = −34.413 is below our critical value, we reject H₀.
One-sided confidence interval: For a two-sided test the interval is D = d ± z_(α/2) s_d, but this is a left-sided test and, because the alternative hypothesis is H₁: D < 0, we need a one-sided confidence interval or upper limit: D ≤ d + z_.05 s_d = −34.413 + 9.816 = −24.597. The interval D ≤ −24.597 does not include zero, so it contradicts H₀: D ≥ 0, and we reject H₀.
III. (18+ points) Do as many of the following as you can. (2 points each unless noted otherwise). Look
them over first – the computer problem is at the beginning. Show your work where appropriate.
1. Computer question.
a) Turn in your first computer output. Only do b, c and d if you did. (3)
b) (Meyer and Krueger) A corporation rents apartments within the city of Phoenix and in the
surrounding suburbs. It wishes to verify that the mean rent in the city is lower than in the suburbs. Two
independent random samples are taken. These appear below.
City
401.84
666.95
804.01
611.09
Suburb
458.98
994.09
810.44
764.69
815.86
755.37
715.30
314.14
584.52
650.46
904.77
587.72
870.44
970.26
639.96
657.92
403.64
617.37
506.58
695.45
752.60
735.26
444.47
567.60
574.94
538.26
313.08
752.66
398.33
732.83
667.61
762.35
670.29
458.07
396.20
656.04
676.23
364.37
953.06
728.25
187.23
878.82
720.20
745.79
793.68
764.80
879.91
737.99
566.75
279.74
918.40
654.05
841.70
648.31
1106.17
919.93
(i) What are your null and alternative hypotheses?
Solution: The alternative hypothesis is H₁: μ₁ < μ₂, so the null hypothesis is H₀: μ₁ ≥ μ₂ or, if D = μ₁ − μ₂, H₀: D ≥ 0 and H₁: D < 0. So we have a left-sided test.
(ii) Three tests appear below – which is correct for your null hypothesis? (3) α = .01
Test 1.
MTB > TwoSample c1 c2;
SUBC>   Confidence 99.0.

Two-Sample T-Test and CI: City, Suburb

Two-sample T for City vs Suburb

         N  Mean  StDev  SE Mean
City    30   590    169       31
Suburb  30   743    189       34

Difference = mu (City) - mu (Suburb)
Estimate for difference: -153.0
99% CI for difference: (-276.2, -29.7)
T-Test of difference = 0 (vs not =): T-Value = -3.31  DF = 57  P-Value = 0.002
Test 2.
MTB > TwoSample c1 c2;
SUBC>   Confidence 99.0;
SUBC>   Alternative -1.

Two-Sample T-Test and CI: City, Suburb

Two-sample T for City vs Suburb

         N  Mean  StDev  SE Mean
City    30   590    169       31
Suburb  30   743    189       34

Difference = mu (City) - mu (Suburb)
Estimate for difference: -153.0
99% upper bound for difference: -42.3
T-Test of difference = 0 (vs <): T-Value = -3.31  P-Value = 0.001  DF = 57

Solution: Test 2 says that the alternative hypothesis is H₁: μ₁ < μ₂, so that's what we want. In our language the null hypothesis is H₀: μ₁ ≥ μ₂.
Test 3.
MTB > TwoSample c1 c2;
SUBC>   Confidence 99.0;
SUBC>   Alternative 1.

Two-Sample T-Test and CI: City, Suburb

Two-sample T for City vs Suburb

         N  Mean  StDev  SE Mean
City    30   590    169       31
Suburb  30   743    189       34

Difference = mu (City) - mu (Suburb)
Estimate for difference: -153.0
99% lower bound for difference: -263.7
T-Test of difference = 0 (vs >): T-Value = -3.31  P-Value = 0.999  DF = 57
c) From the output, but using the correct format for a confidence interval, what is an appropriate confidence interval to test your hypotheses? (2) Solution: The alternative hypothesis is H₁: μ₁ < μ₂, so the appropriate confidence interval (or upper limit) given in Test 2 is '99% upper bound for difference: -42.3.' In our format that would be D ≤ −42.3 or μ₁ − μ₂ ≤ −42.3.
d) What is your conclusion? Why? (α = .01) (2) Test 2 says 'P-Value = 0.001.' So we reject the null hypothesis H₀: μ₁ ≥ μ₂ because the p-value is below the significance level.
e) What method was used by the computer? D1, D2, D3, D4, D5a, D5b, D6a, D6b, D7? (1)
Explanation: As explained in class this is always the default method for comparing two means.
f) The following tests were run after the original hypotheses tests. What do they tell us about the
appropriateness of the method? Why? (1)
[12]
MTB > NormTest c1;
SUBC>
KSTest.
Probability Plot of City
MTB > NormTest c2;
SUBC>
KSTest.
Probability Plot of Suburb
Solution: On both the plots generated on the next page the result of the Lilliefors (Kolmogorov-Smirnov)
test for normality is ‘p-value > 0.150.’ So, if we are still using a 1% significance level (or any other
common significance level), we cannot reject the null hypothesis of Normality. The t-tests we have used
are dependent on an assumption of Normality. (Actually, we could use a so-called 'Fat-Pencil Test' on the data and notice that the points do appear close to a straight line, which also attests to Normality.)
2. A ‘robust’ test procedure is one that
a) Can only be done with a computer
b) Requires an underlying Normal distribution
c) Is sensitive to slight violations of its assumptions.
d) *Is insensitive to slight violations of its assumptions.
3. (Ng -219, 18) Assume that you have the following information: s₁² = 4, s₂² = 6, n₁ = 16 and n₂ = 25, and you wish to do a pooled-variance t test. Your ŝ_p and degrees of freedom are (3) [17]
a) 2.45, 41
b) 2.24, 41
c) 2.29, 41
d) 2.00, 41
e) 2.45, 39
f) 2.24, 39
g)* 2.29, 39
h) 2.00, 39
i) 2.45, 16
j) 2.24, 16
k) 2.29, 16
l) 2.00, 16
m) 2.45, 25
n) 2.24, 25
o) 2.29, 25
p) 2.00, 25
e) It's more appropriate to add standard errors and use z
Explanation: The formula for a pooled variance is
ŝ_p² = [(n₁ − 1)s₁² + (n₂ − 1)s₂²]/(n₁ + n₂ − 2) = [15(4) + 24(6)]/(16 + 25 − 2) = (60 + 144)/39 = 204/39 = 5.231.
s_p = √5.231 = 2.29 with 3 significant figures. DF = n₁ + n₂ − 2 = 16 + 25 − 2 = 39.
4. If I want to test to see if the mean of x₁ is smaller than the mean of x₂ my null hypothesis is: (Note: D = μ₁ − μ₂) Only check one answer! (2)
a) 1   2 and D  0
b) 1   2 and D  0
e) * μ₁ ≥ μ₂ and D ≥ 0
f) 1   2 and D  0
c) 1   2 and D  0
d) 1   2 and D  0
g) 1   2 and D  0
h) 1   2 and D  0
Explanation: The alternative hypothesis is H₁: μ₁ < μ₂, which is the same as saying H₁: D < 0. The null hypothesis must contain an equality, so it will read H₀: μ₁ ≥ μ₂ or H₀: D ≥ 0.
5. Consumers are asked to take the Pepsi Challenge. They were asked which cola they preferred and the number that preferred Pepsi was recorded. Sample 1 was males and sample 2 was females. The following was run on Minitab. [19]
MTB > PTwo 109 46 52 13;
SUBC>   Pooled.

Test and CI for Two Proportions

Sample   X    N  Sample p
1       46  109  0.422018
2       13   52  0.250000

Difference = p (1) - p (2)
Estimate for difference: 0.172018
95% CI for difference: (0.0221925, 0.321844)
Test for difference = 0 (vs not = 0): Z = 2.12  P-Value = 0.034
On the basis of the printout above we can say one of the following.
a) At a 99% confidence level we can say that we have enough evidence to state that the proportion
of men that prefer Pepsi differs from the proportion of women that prefer Pepsi
b) *At a 95% confidence level we can say that we have enough evidence to state that the
proportion of men that prefer Pepsi differs from the proportion of women that prefer Pepsi
c) At a 99% confidence level we can say that we have enough evidence to state that the proportion
of men that prefer Pepsi equals the proportion of women that prefer Pepsi.
d) At a 96% confidence level there is insufficient evidence to indicate that the proportion of men
that prefer Pepsi differs from the proportion of women that prefer Pepsi
Explanation: We have a choice of a 99% confidence level with a 1% significance level or a 95% confidence level with a 5% significance level. Since the p-value is .034, we reject the null hypothesis at the 95% confidence level but not at the 99% confidence level. The null hypothesis is equal proportions.
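The Z value in the printout comes from the pooled two-proportion test; a minimal Python sketch of that computation:

```python
import math

# Pepsi Challenge counts: x successes out of n for each sample.
x1, n1 = 46, 109                 # men
x2, n2 = 13, 52                  # women
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)   # pooled proportion under H0: p1 = p2
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
print(round(z, 2))               # -> 2.12, matching the Minitab output
```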
6. (Lenzi) A group of runners run a 100 meter dash before and after running a marathon. Their times are
shown below. Pat – how long did they wait before running the second dash?
Row  Before  After     d
1      12.4   12.6  -0.2
2      11.8   12.2  -0.4
3      12.5   12.4   0.1
4      12.0   12.7  -0.7
5      11.5   12.0  -0.5
6      11.2   11.8  -0.6
7      12.9   12.7   0.2
Minitab printed out the following statistics.

Variable  N  N*    Mean  SE Mean  StDev  Minimum      Q1  Median      Q3  Maximum
Before    7   0  12.043    0.226  0.597   11.200  11.500  12.000  12.500   12.900
After     7   0  12.343    0.134  0.355   11.800  12.000  12.400  12.700   12.700
d         7   0  -0.300    0.131  0.346   -0.700  -0.600  -0.400   0.100    0.200
Can we show that they were slower after the marathon?
a) How many degrees of freedom do we have in this problem? (1) Solution: This is a paired-data problem (D4) and we have 7 before-and-after pairs, so DF = 6. Table 3 says the following.
Interval for: Difference between Two Means (paired data), D = μ₁ − μ₂, with n₁ = n₂ = n and df = n − 1
Confidence Interval: D = d ± t_(α/2) s_d, where d = x̄₁ − x̄₂ is the mean of the differences and s_d = (standard deviation of the differences)/√n
Hypotheses: H₀: D = D₀, H₁: D ≠ D₀ (same as H₀: μ₁ = μ₂, H₁: μ₁ ≠ μ₂ if D₀ = 0)
Test Ratio: t = (d − D₀)/s_d
Critical Value: d_cv = D₀ ± t_(α/2) s_d
b) What are our null and alternative hypotheses? (1) Solution: The alternative hypothesis is H₁: μ₁ < μ₂ or H₁: D < 0. So the null hypothesis, which must contain an equality, is H₀: μ₁ ≥ μ₂ or H₀: D ≥ 0.
c) What is the approximate p-value for our result? (3) Show your work! (2 points if you do not do a p-value)
Solution: The formula for the test ratio is t = (d − 0)/s_d, and the printout says d = −0.300 and s_d = 0.346/√7 = 0.131. So t = (−0.300 − 0)/0.131 = −2.2900. The part of the t table for 6 degrees of freedom is below. Since this is a left-sided test, p-value = P(t < −2.29).

Significance Level
df   .45    .40    .35    .30    .25    .20    .15    .10    .05   .025    .01   .005   .001
6  0.131  0.265  0.404  0.553  0.718  0.906  1.134  1.440  1.943  2.447  3.143  3.707  5.208

Since t_.05(6) = 1.943 < 2.2900 < 2.447 = t_.025(6), we can conclude that .025 < P(t > 2.290) < .05 and, because this is a left-sided test, .025 < p-value = P(t < −2.290) < .05.
d) On the basis of your p-value, what is our conclusion if the confidence level is 95%? Why? (1)
Solution: If the confidence level is 95%, the significance level is 5% and we reject the null hypothesis
because the p-value is below the significance level.
e) What if the confidence level is 99%? Why? (You do not need a p-value to answer this part of the question, though it would help.) (1) [26]
Solution: If the confidence level is 99%, the significance level is 1% and we cannot reject the null hypothesis because the p-value is not below the significance level.
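The paired-data arithmetic in 6. can be checked directly in Python:

```python
import math

# Paired observations: the same runner before and after the marathon.
before = [12.4, 11.8, 12.5, 12.0, 11.5, 11.2, 12.9]
after = [12.6, 12.2, 12.4, 12.7, 12.0, 11.8, 12.7]
d = [b - a for b, a in zip(before, after)]

n = len(d)
d_bar = sum(d) / n                                      # mean difference, -0.300
s_d = math.sqrt(sum((v - d_bar) ** 2 for v in d) / (n - 1))
t = d_bar / (s_d / math.sqrt(n))                        # test ratio with 6 df
print(round(d_bar, 3), round(t, 2))                     # -> -0.3 -2.29
```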
7. A researcher takes independent random samples of the salaries of 18 women (Sample 1) and 18 men (Sample 2) who are fairly recent Business graduates, with the following results.
Row  Women    Men  Difference
1    64709  40824       23885
2    47105  54465       -7360
3    28972  68433      -39461
4    31449  54941      -23492
5    42574  54050      -11476
6    59051  53043        6008
7    26838  45680      -18842
8    56651  40399       16252
9    64929  57584        7345
10   57497  78224      -20727
11   38290  53722      -15432
12   67106  34915       32191
13   67280  59636        7644
14   40826  50499       -9673
15   60826  53502        7324
16   46207  77186      -30979
17   58976  60208       -1232
18   45809  48381       -2572

Minitab gives the following statistics.
Descriptive Statistics: Women, Men, d

Variable   N  N*   Mean  SE Mean  StDev
Women     18   0  50283     3156  13392
Men       18   0  54761     2708  11488
d         18   0  -4478     4462  18929
Test the statement that women have a significantly lower salary than men. The relevant part of Table 3 appears above.
a) What are your null and alternative hypotheses? (1)
Solution: Here we go again with the same old stuff. The alternative hypothesis is H₁: μ₁ < μ₂ or H₁: D < 0. So the null hypothesis, which must contain an equality, is H₀: μ₁ ≥ μ₂ or H₀: D ≥ 0.
b) (Extra Credit) If you do not assume that variances of the two samples are equal
(i) How many degrees of freedom do you have? (4)
Solution: So now we are stuck with the Satterthwaite approximation. Let's start with
s₁²/n₁ = 13392²/18 = 9963648 or, from the SE Mean column, 3156² = 9960336
s₂²/n₂ = 11488²/18 = 7331896.889 or 2708² = 7333264

df = (s₁²/n₁ + s₂²/n₂)² / [(s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1)]

With the first pair of numbers,
df = (17295544.889)² / (9963648²/17 + 7331896.889²/17) = 2.9913587 × 10¹⁴ / (5.8396636 × 10¹² + 3.1621595 × 10¹²) = 33.23059;
with the second pair,
df = (17293600)² / (9960336²/17 + 7333264²/17) = 2.9906860 × 10¹⁴ / (5.8357820 × 10¹² + 3.1633339 × 10¹²) = 33.23309.
It’s a good thing that I had two versions of the standard errors from Minitab, because I got something like
55 on the first try. These both should be rounded down to 33. Incidentally, this calculation was done quite
rapidly by saving the three items in the last ratio in my 25-year-old calculator’s storage.
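The Satterthwaite formula is mechanical enough to check in Python (using the StDev column of the printout):

```python
# Satterthwaite approximation for the degrees of freedom with unequal variances.
s1, n1 = 13392, 18               # women: StDev and sample size
s2, n2 = 11488, 18               # men
a, b = s1 ** 2 / n1, s2 ** 2 / n2
df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
print(round(df, 2))              # -> 33.23, rounded down to 33 df
```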
d  D0
(ii) If you use the formula t 
, what is the value of s d ? (2)
sd
Solution: s d 
s2 s2
s12 s 22
. We had 1  2  17295544.889 or 17293600 so the square root is 4158.79 or

n1 n 2
n1 n 2
4158.56.
(iii) Compute the t ratio and test the hypothesis, clearly stating your conclusions (α = .05). (2)
Solution: t = (d − D₀)/s_d = (−4478 − 0)/4158.7 = −1.077. We were testing H₀: μ₁ ≥ μ₂ or H₀: D ≥ 0 against H₁: μ₁ < μ₂ or H₁: D < 0, so if df = 33, we will reject the null hypothesis if the computed t ratio is below −t_.05(33) = −1.692. Since −1.077 is not below −1.692, we cannot reject the null hypothesis.
c) If you assume that variances of the two samples are equal
(i) How many degrees of freedom do you have? (1)
Solution: Let's repeat our previous data.

Descriptive Statistics: Women, Men, d

Variable   N  N*   Mean  SE Mean  StDev
Women     18   0  50283     3156  13392
Men       18   0  54761     2708  11488
d         18   0  -4478     4462  18929

Degrees of freedom are n₁ + n₂ − 2 = 18 + 18 − 2 = 34. Notice how close this is to the previous calculation.
(ii) If you use the formula t = (d − D₀)/s_d, what is the value of s_d? (3)
Solution: The formula for a pooled variance is
ŝ_p² = [(n₁ − 1)s₁² + (n₂ − 1)s₂²]/(n₁ + n₂ − 2) = [17(13392²) + 17(11488²)]/(18 + 18 − 2) = (13392² + 11488²)/2 = 155659904.
s_p = √155659904 = 12476.374.
s_d = √[ŝ_p²(1/n₁ + 1/n₂)] = √[155659904(1/18 + 1/18)] = √17295544.89 = 4158.7913.
(iii) Compute the t ratio and test the hypothesis, clearly stating your conclusions (α = .05). (2) [32]
Solution: t = (d − D₀)/s_d = (−4478 − 0)/4158.7 = −1.077. We were testing H₀: μ₁ ≥ μ₂ or H₀: D ≥ 0 against H₁: μ₁ < μ₂ or H₁: D < 0, so if df = 34, we will reject the null hypothesis if the computed t ratio is below −t_.05(34) = −1.691. Since −1.077 is not below −1.691, we cannot reject the null hypothesis.
For the curious, here is the computer output for the two versions of the test.

Two-Sample T-Test and CI: Women, Men

Two-sample T for Women vs Men

         N   Mean  StDev  SE Mean
Women   18  50283  13392     3156
Men     18  54761  11488     2708

Difference = mu (Women) - mu (Men)
Estimate for difference: -4478
95% upper bound for difference: 2561
T-Test of difference = 0 (vs <): T-Value = -1.08  DF = 33  P-Value = 0.145

MTB > TwoSample c1 c2;
SUBC>   Pooled;
SUBC>   Alternative -1.

Two-Sample T-Test and CI: Women, Men

Two-sample T for Women vs Men

         N   Mean  StDev  SE Mean
Women   18  50283  13392     3156
Men     18  54761  11488     2708

Difference = mu (Women) - mu (Men)
Estimate for difference: -4478
95% upper bound for difference: 2555
T-Test of difference = 0 (vs <): T-Value = -1.08  DF = 34  P-Value = 0.145
Both use Pooled StDev = 12476.2829
8. (Meyer and Krueger again) Back to the Phoenix problem. The people in the problem on page 3 are still
obsessing over the relationship of rents to whether an apartment is urban or suburban. The computer output
from a Chi-Squared test is below.
Results for: 251x0821-06.MTW
MTB > WSave "C:\Documents and Settings\RBOVE\My Documents\Minitab\251x0821-06.MTW";
SUBC>   Replace.
Saving file as: 'C:\Documents and Settings\RBOVE\My Documents\Minitab\251x0821-06.MTW'
MTB > print c1-c3
Data Display

Row  Rent     City  Suburb
  1  <500       48       2
  2  500-599    51      11
  3  600-699    30      17
  4  700 up     22      19
MTB > ChiSquare c2 c3.
Chi-Square Test: City, Suburb
Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

         City  Suburb  Total
  1        48       2     50
        37.75   12.25
        2.783   8.577

  2        51      11     62
        46.81   15.19
        0.375   1.156

  3        30      17     47
        35.48   11.52
        0.848   2.613

  4        22      19     41
        30.95   10.05
        2.591   7.983

Total     151      49    200

Chi-Sq = 26.925, DF = 3, P-Value = 0.000
a) The above is a Chi-squared test of (1)
i) *independence
ii) homogeneity
iii) goodness-of-fit
iv) none of the above
b) What is the null hypothesis of this test and, assuming a 95% confidence level, what is the
conclusion? (2) [35]
Solution: The null hypothesis is that rents and location are independent. Because the p-value is
extremely low (below 5%), we reject the null hypothesis.
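The Minitab result can be reproduced in Python (assuming scipy is available); `chi2_contingency` computes the same expected counts from the row and column totals of the rent table.

```python
# Verifying the chi-squared test of independence on the City/Suburb rent table.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([
    [48,  2],   # <500
    [51, 11],   # 500-599
    [30, 17],   # 600-699
    [22, 19],   # 700 up
])

# correction=False gives the plain Pearson statistic (no Yates correction).
chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(round(chi2, 3), dof, p)
```

The statistic should match Chi-Sq = 26.925 with DF = 3 and a p-value far below .01, so the null hypothesis of independence is rejected, as in the solution.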
9. Back to Phoenix again. The people in the previous Phoenix problems are now sure that the distribution
of rents is not Normal but skewed to the right. They select a random sample of 10 rents in the city and
another random sample of 10 rents in the suburbs (1-City, 2-Suburb). The researchers now believe that
rentals in the city are lower than rentals in the suburbs. The researchers will do the following test.
a) T-test of paired data
b) Wilcoxon signed rank test
c) T-test of means of independent samples
d) *Wilcoxon-Mann-Whitney test
e) None of the above
10. Assuming that a rank test of some sort is done in Problem 9, what will be our null hypothesis, and,
assuming that the smaller of the two sums of ranks is 44 and that we are working with a 95% confidence
level, what will be our conclusion and why? (3) [40]
Solution: This is a one-sided test (H₀: η₁ = η₂ against H₁: η₁ < η₂), so if we work with the appropriate table, we have the part of Table 6 below. We will reject the hypothesis of equal medians if T_L = 44 is below 69. It is. We do.
TABLE 6: Critical values of the Rank Sum for the Mann-Whitney-Wilcoxon Rank Sum
Test for Independent Samples.
Table 6b: α = .05 for a 1-tailed test or α = .10 for a 2-tailed test.
(Entries are the lower and upper critical values, TL, TU.)

n1\n2      3       4       5       6       7       8       9      10      11      12
  3       --    6,18    7,20    8,22    8,25    9,27    9,30   10,32   11,34   11,37
  4     6,18   11,25   12,28   13,31   14,34   15,37   16,40   17,43   18,46   19,49
  5     7,20   12,28   19,36   20,40   21,44   23,47   24,51   26,54   27,58   28,62
  6     8,22   13,31   20,40   28,50   29,55   31,59   33,63   35,67   37,71   38,76
  7     8,25   14,34   21,44   29,55   39,66   41,71   43,76   45,81   47,86   49,91
  8     9,27   15,37   23,47   31,59   41,71   51,85   54,90   56,96  59,101  62,106
  9     9,30   16,40   24,51   33,63   43,76   54,90  66,105  69,111  72,117  75,123
 10    10,32   17,43   26,54   35,67   45,81   56,96  69,111  82,128  86,134  89,141
 11    11,34   18,46   27,58   37,71   47,86  59,101  72,117  86,134 100,153 104,160
 12    11,37   19,49   28,62   38,76   49,91  62,106  75,123  89,141 104,160 120,180
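The rank sums that Table 6 is checked against can be computed directly. The sketch below uses hypothetical rent samples (not the actual exam data, which is not reproduced here) to show how T₁, T₂ and T_L are obtained from the pooled ranking.

```python
# Rank-sum computation behind the Wilcoxon-Mann-Whitney test,
# on hypothetical city/suburb rent samples of size 10 each.
from scipy.stats import rankdata

city   = [450, 480, 510, 530, 550, 560, 580, 600, 620, 640]   # hypothetical
suburb = [520, 590, 610, 650, 660, 680, 700, 720, 740, 760]   # hypothetical

ranks = rankdata(city + suburb)          # rank the pooled sample (1..20)
T1 = ranks[:len(city)].sum()             # rank sum of the first sample
T2 = ranks[len(city):].sum()             # rank sum of the second sample
TL = min(T1, T2)                         # the smaller rank sum
# One-sided test at alpha = .05: reject equal medians if TL is at or
# below the lower critical value taken from Table 6b.
print(T1, T2, TL)
```

Note that T₁ + T₂ always equals n(n + 1)/2 for the pooled sample size n, which is a handy arithmetic check.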
ECO252 QBA2
SECOND EXAM
March 28, 2008
TAKE HOME SECTION
Name: _________________________
Student Number: _________________________
Class hours registered and attended (if different):_________________________
IV. Neatness Counts! Show your work! Always state your hypotheses and conclusions clearly. (19+ points). In each section state
clearly what number you are using to personalize data. There is a penalty for failing to include your student number on this page, not
clarifying version number in each section and not including class hour somewhere. Please write on only one side of the paper. Be
prepared to turn in your Minitab output for the first computer problem and to answer the questions on the problem sheet
about it or a similar problem.
1. (Moore, McCabe et al.) A large public university took a survey of 865 students to find out if there was a relationship between the chosen major and whether the students had student loans. The students' majors were categorized as Agriculture, Child Development, Engineering, Liberal Arts, Business, Science and Technology. Before you start, personalize the data as follows. Let a be the second-to-last digit of your student number. Change the number of Science majors with loans to 31 + a and the number of Business majors who have loans to 24 − a for every part of this problem. The total number of students in the survey will not change. Put your version of the table below on top of the first page of your solution. Use a 99% confidence level in this problem.
        Ag  Ch  Engg  Lib  Bus  Sci  Tech
Loan    32  37    98   89   24   31    57
None    35  50   137  124   51   29    71
a) Compute the proportion of non-science majors that have loans in order to test the hypothesis that science majors are more likely to
have loans than other majors. Tell which group you consider sample 1. State H 0 and H 1 in terms of the proportions involved and
also in terms of the difference between the proportions, explaining whether this difference is a statistic from sample 1 minus a statistic
from sample 2 or the reverse. (1)
b) Use a test ratio to test your hypotheses from a) (2)
c) Use a critical value for the difference between proportions to test your hypotheses from a) (2)
d) Use an appropriate confidence interval to test your hypotheses from a) (2)
e) Treat each major separately and test the hypothesis that the proportion of students that have loans is independent of major (4)
f) If you did section 1e, follow your analysis with a Marascuilo procedure to compare the proportion of business students that have
loans with the proportions for the other 6 majors. Tell which differences are significant. (3) [14]
g) (Extra credit) Check your results using Minitab.
(i) To do a chi-squared test on an O table that is in columns c22-c28, simply put the row labels in column c21 and print out your data. Then type in
ChiSquare c22 – c28.
The computer will print back the columns with their names, but below each number from the O table you will find the corresponding values of E and (O − E)²/E, the contribution of the value of O to the chi-square total. Use the p-value to find out if we reject the hypothesis of equal proportions at the 1% significance level.
(ii) To do a test of the alternative hypothesis H₁: p₁ > p₂, where p̂₁ = x₁/n₁ and p̂₂ = x₂/n₂, use the command below, substituting your numbers for x₁, n₁, x₂ and n₂.
MTB > PTwo x1 n1 x2 n2;
SUBC>   Confidence 99.0;
SUBC>   Alternative 1;
SUBC>   Pooled.
The computer will print back x₁, n₁, p̂₁ = x₁/n₁, x₂, n₂ and p̂₂ = x₂/n₂, a p-value for a z-test and Fisher's exact test (results should be somewhat similar to the z-test) and a 1-sided 99% confidence interval.
2. (Moore, McCabe et al.) An absolutely tactless psychology professor has divided faculty members into categories the professor labels 'Fat' and 'Fit'. A random sample of scores on a test of 'ego strength' of the 'Fat' faculty is labeled x₁. A sample of 'ego strength' of the 'Fit' faculty is labeled x₂. d = x₁ − x₂. Use a 95% confidence level in this problem.
The professor has computed Σx₁ = Sum of Fat scores = 64.96, Σx₁² = Sum of squares of Fat scores = 307.607, Σx₂ = Sum of scores of Fit = 90.02, Σx₂² = Sum of squares of Fit scores = 581.239, Σd = Sum of diff = −25.06 and Σd² = Sum of squares of diff = 51.8198.
Row   Fat x₁  Fit x₂  Diff d = x₁ − x₂
  1     4.99    6.68      -1.69
  2     4.24    6.42      -2.18
  3     4.74    7.32      -2.58
  4     4.93    6.38      -1.45
  5     4.16    6.16      -2.00
  6     5.53    5.93      -0.40
  7     4.12    7.08      -2.96
  8     5.10    6.37      -1.27
  9     4.47    6.53      -2.06
 10     5.30    6.68      -1.38
 11     3.12    5.71      -2.59
 12     3.77    6.20      -2.43
 13     5.09    6.04      -0.95
 14     5.40    6.52      -1.12
To personalize the data remove row b, where b is the last digit of your student number. Please state clearly what row you removed. At this point you will have n₁ = n₂ = 13 rows of data. You will need the mean and variance of all three columns of data if you do all sections of this problem. You can save yourself considerable effort by using the computational formula for the variance with the sums and sums of squares that the professor computed, with the value or squared value of the numbers you removed subtracted.
The professor got the following results.

Variable   n    Mean  SE Mean  StDev  Median
Fat       14   4.640    0.184  0.690   4.835
Fit       14   6.430    0.115  0.431   6.400
diff      14  -1.790    0.196  0.732  -1.845
Your results should be relatively similar. Credit for computing the sample statistics needed is included in the relevant parts of this
problem. State hypotheses and conclusions clearly in each segment of the problem.
a) Assume that x1 and x 2 are independent random samples and test the hypothesis that the population mean of the ego strength of
the ‘fit’ faculty is above the population mean of the ‘fat’ faculty. Assume that the data comes from the Normal distribution and that the
variances for the ‘fit’ and ‘fat’ populations are similar. (3)
b) (Extra credit) Assume that x1 and x 2 are independent random samples and test the hypothesis that the population mean of the
ego strength of the ‘fit’ faculty is above the population mean of the ‘fat’ faculty. Assume that the data comes from the Normal
distribution and that the variances for the ‘fit’ and ‘fat’ populations are not similar. (3)
c) Assume that x₁ and x₂ are independent random samples. How would we decide whether the method in a) or b) is correct? Do the appropriate test. Assume that the data comes from the Normal distribution. Should we have used a) or b)? (2) [22]
d) Compute the mean and variance of the column of differences and test the column to see if the Normal distribution works for these
data. (4)
e) Assuming that we had rejected the hypothesis that the distributions of the populations that the columns come from are Normal, do a one-sided test to see whether the ego strength of the 'Fat' and 'Fit' people differs. (2)
f) In the remainder of this problem assume that the x1 and x 2 columns are not independent random samples but instead represent
the ego strength of the same 14 or 13 faculty members before and after a fitness program. Assuming that the Normal distribution
applies, can we say that the ego strength of the faculty has increased? (2)
g) Repeat f) under the assumption that the Normal distribution does not apply. (1)
h) Use the Wilcoxon signed rank test to see if the median of the d column is -2. (2) [35]
i) Extra credit. Use Minitab to check your work.
The commands that you might need are as follows – remember that the subcommand ’Alternative -1’ gives a left-sided test
and ’Alternative +1’ gives a right sided test. If this subcommand is not used a 2-sided test will appear.
The basic command to compare two means for data in c2 and c3 is
MTB > TwoSample c2 c3.
This will produce a 2-sided test using Method D3. A semicolon followed by the Alternative subcommand will produce a 1-sided test.
Adding the subcommand ’Pooled’ switches the method to D2. Remember that a semicolon tells Minitab that a subcommand is
coming and a period tells Minitab that the command is complete. To use Method C4 on the same two columns use the command
MTB > Paired c2 c3.
This also can be modified with the Alternative command.
To test a column (here C4) for Normality using a Lilliefors test use
MTB > NormTest c4;
SUBC>   KSTest.
There are two other tests for Normality baked into Minitab. These are the Anderson-Darling test and the Ryan-Joiner test. The graph
produced by any of these can be analyzed by the Fat Pencil Test. To get a basic explanation of these tests use the Stat pull-down menu
hit basic statistics and then Normality Test. Finally hit ‘help’ and investigate the topics available. There will be a small bonus for those
of you who mention Minitab’s problems with English grammar. To use the Anderson-Darling test, use the NormTest command
without a subcommand. To use the Ryan-Joiner test use
19
252y0821 3/31/08
MTB > NormTest c4;
SUBC>   RJTest.
A really impressive paper might compare the results of the 3 tests and then show the results of an internet search on the differences
between them.
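Similar normality checks can be run in Python. The sketch below uses scipy's Anderson-Darling test on the diff column from the data above (statsmodels also offers a Lilliefors test as statsmodels.stats.diagnostic.lilliefors, if it is installed).

```python
# Anderson-Darling normality test on the 14-row diff column.
from scipy import stats

d = [-1.69, -2.18, -2.58, -1.45, -2.00, -0.40, -2.96,
     -1.27, -2.06, -1.38, -2.59, -2.43, -0.95, -1.12]

result = stats.anderson(d, dist='norm')
print(result.statistic)
# Critical values at significance levels 15%, 10%, 5%, 2.5%, 1%:
print(result.critical_values)
```

Reject Normality at a given level if the statistic exceeds the corresponding critical value.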
The other two tests that are relevant here can be accessed by using the Stat pull-down menu and the Nonparametrics
option. The instruction for a left-sided (Wilcoxon)-Mann-Whitney test would be
MTB > Mann-Whitney 95.0 c2 c3;
SUBC>   Alternative -1.
Minitab’s instructions for a 2-sided Wilcoxon signed rank test of a median of -2 from one sample in C4 would be
MTB > WTest -2 c4.
To do a one-sided test comparing samples in two columns, take d = x₁ − x₂ and do a test that the median of d is zero. Again, Alternative can be used to get a 1-sided test.
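The signed-rank test of part h) (median of d equal to −2) has a Python counterpart: shift the data by the hypothesized median and test against zero. This is a sketch on the 14-row diff column; note one value is exactly −2, and scipy's zero-handling rule discards that tie.

```python
# Wilcoxon signed-rank test that the median of d is -2.
from scipy.stats import wilcoxon

d = [-1.69, -2.18, -2.58, -1.45, -2.00, -0.40, -2.96,
     -1.27, -2.06, -1.38, -2.59, -2.43, -0.95, -1.12]

shifted = [x - (-2) for x in d]    # subtract the hypothesized median
stat, p = wilcoxon(shifted)        # two-sided test of a zero median
print(stat, p)
```

Use the `alternative` keyword of `wilcoxon` for a one-sided version, mirroring Minitab's Alternative subcommand.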
Also there is some advice from last term’s Take-home.
To fake computation of a sample variance or standard deviation of the data in column c1, using column c2 for the squares:
MTB > let C2 = C1*C1          # * performs multiplication; ** would do a power,
MTB > name k1 'sum'           #   but multiplication is more accurate.
MTB > name k2 'sumsq'
MTB > let k1 = sum(c1)
MTB > let k2 = sum(c2)        # This is equivalent to let k2 = ssq(c1).
MTB > print k1 k2
Data Display                  # This is a progress report for my data set.
sum      3047.24
sumsq    468657
MTB > name k1 'meanx'
MTB > let k1 = k1/count(c1)   # / means division. Count gives n.
MTB > let k2 = k2 - (count(c1))*k1*k1
MTB > print k1 k2
Data Display
meanx    152.362
sumsq    4372.53
MTB > name k2 'varx'
MTB > let k2 = k2/((count(c1))-1)
MTB > print k1 k2
Data Display
meanx    152.362
varx     230.133
MTB > name k2 'stdevx'
MTB > let k2 = sqrt(k2)       # Sqrt gives a square root.
MTB > print k1 k2
Data Display
meanx    152.362
stdevx   15.1701
To print two columns, use
MTB > Print C1, C2
To check for equal variances for data in C1 and C2, use
MTB > VarTest c1 c2;
SUBC>   Unstacked.
Both an F test and a Levene test will be run. The Levene test is for non-Normal data, so you want the F test results.
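The F test that VarTest runs can be sketched by hand in Python: F = s₁²/s₂² with (n₁ − 1, n₂ − 1) degrees of freedom. The data here is the 13-row version of the problem with b = 1 (row 1 removed), as used in the solution below.

```python
# Two-sided F test for equality of variances, done by hand.
import numpy as np
from scipy import stats

x1 = [4.24, 4.74, 4.93, 4.16, 5.53, 4.12, 5.10, 4.47, 5.30, 3.12, 3.77, 5.09, 5.40]
x2 = [6.42, 7.32, 6.38, 6.16, 5.93, 7.08, 6.37, 6.53, 6.68, 5.71, 6.20, 6.04, 6.52]

s1sq = np.var(x1, ddof=1)          # sample variances
s2sq = np.var(x2, ddof=1)
F = s1sq / s2sq
df1, df2 = len(x1) - 1, len(x2) - 1
# One tail of the F distribution, doubled for a two-sided test.
p_one = 1 - stats.f.cdf(F, df1, df2) if F > 1 else stats.f.cdf(F, df1, df2)
p_two = 2 * p_one
print(round(F, 3), round(p_two, 4))
```

If the p-value is above α, the equal-variance assumption of method D2 is tenable; otherwise use the unequal-variance method D3.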
To check your mean and standard deviation, use
MTB > describe C1
To put the items in column C1 in order in column C2, use
MTB > Sort c1 c2;
SUBC>   By c1.
3. Sorry. This is all I’ve got.
1. (Moore, McCabe et. al.) A large public university took a survey of 865 students to find out if there was a
relationship between the chosen major and whether the students had student loans. The students’ majors
were categorized as Agriculture, Child Development, Engineering, Liberal Arts, Business, Science and
Technology. Before you start, personalize the data as follows. Let a be the second-to-last digit of your
student number. Change the number of Science majors with loans to 31  a and the number of business
majors who have loans to 24  a for every part of this problem. The total number of students in the survey
will not change. Put your version of the table below on top of the first page of your solution. Use a 99%
confidence level in this problem.
        Ag  Ch  Engg  Lib  Bus  Sci  Tech
Loan    32  37    98   89   24   31    57
None    35  50   137  124   51   29    71
a) Compute the proportion of non-science majors that have loans in order to test the hypothesis that science
majors are more likely to have loans than other majors. Tell which group you consider sample 1. State H 0
and H 1 in terms of the proportions involved and also in terms of the difference between the proportions,
explaining whether this difference is a statistic from sample 1 minus a statistic from sample 2 or the
reverse. (1)
Solution: Let's call science group 1. Our alternative hypothesis is now H₁: p₁ > p₂ or, if Δp = p₁ − p₂, H₁: Δp > 0. Accordingly, our null hypothesis is H₀: p₁ ≤ p₂ or H₀: Δp ≤ 0.
It's time to quote Table 3.

Interval for: Difference between proportions, q = 1 − p
  Confidence interval: Δp = Δp̂ ± z(α/2) sΔp̂, where Δp̂ = p̂₁ − p̂₂ and
      sΔp̂ = √(p̂₁q̂₁/n₁ + p̂₂q̂₂/n₂)
  Hypotheses: H₀: Δp = Δp₀ against H₁: Δp ≠ Δp₀
  Test ratio: z = (Δp̂ − Δp₀)/σΔp̂
      If Δp₀ = 0, σΔp̂ = √[p̄₀q̄₀(1/n₁ + 1/n₂)], where p̄₀ = (n₁p̂₁ + n₂p̂₂)/(n₁ + n₂)
        and q̄₀ = 1 − p̄₀.
      If Δp₀ ≠ 0, σΔp̂ = √(p₀₁q₀₁/n₁ + p₀₂q₀₂/n₂), where Δp₀ = p₀₁ − p₀₂.
  Critical value: Δp̂cv = Δp₀ ± z(α/2) σΔp̂ (or use sΔp̂)
Version 0: If a  0 , our table is as below.
Row
1
2
3
C1
Loan
None
Col Total
Ag
32
35
67
Ch
37
50
87
Engg
98
137
235
Lib
89
124
213
Bus
24
51
75
Sci
31
29
60
Tech
57
71
128
Total
368
497
865
Out of 865 − 60 = 805 non-science majors, 368 − 31 = 337 have loans. This is p̂₂ = 337/805 = .418634. For the 60 science majors p̂₁ = 31/60 = .516667.

Row  Labels  Loan  None  Col Total     %Loan
  1  Ag        32    35         67  0.477612
  2  Ch        37    50         87  0.425287
  3  Engg      98   137        235  0.417021
  4  Lib       89   124        213  0.417840
  5  Bus       24    51         75  0.320000
  6  Sci       31    29         60  0.516667
  7  Tech      57    71        128  0.445313
Version 9a: If a  9 , our table is as below.
Row
1
2
3
C1
Loan
None
Col Total
Ag
32
35
67
Ch
37
50
87
Engg
98
137
235
Lib
89
124
213
Bus
15
51
66
Sci
40
29
69
Tech
57
71
128
Total
368
497
865
Out of 865 − 69 = 796 non-science majors, 368 − 40 = 328 have loans. This is p̂₂ = 328/796 = .412060. For the 69 science majors p̂₁ = 40/69 = .579710.

Row  Labels  Loan  None  Col Total     %Loan
  1  Ag        32    35         67  0.477612
  2  Ch        37    50         87  0.425287
  3  Engg      98   137        235  0.417021
  4  Lib       89   124        213  0.417840
  5  Bus       15    51         66  0.227273
  6  Sci       40    29         69  0.579710
  7  Tech      57    71        128  0.445313
Version 9b: If a  9 , and you held the total number in each major constant, you got the table below.
Row
1
2
3
C1
Loan
None
Col Total
Ag
32
35
67
Ch
37
50
87
Engg
98
137
235
Lib
89
124
213
Bus
15
60
75
Sci
40
20
60
Tech
57
71
128
Total
368
497
865
Out of 865 − 60 = 805 non-science majors, 368 − 40 = 328 have loans. This is p̂₂ = 328/805 = .407453. For the 60 science majors p̂₁ = 40/60 = .666667.

Row  Labels  Loan  None  Col Total     %Loan
  1  Ag        32    35         67  0.477612
  2  Ch        37    50         87  0.425287
  3  Engg      98   137        235  0.417021
  4  Lib       89   124        213  0.417840
  5  Bus       15    60         75  0.200000
  6  Sci       40    20         60  0.666667
  7  Tech      57    71        128  0.445313
b) Use a test ratio to test your hypotheses from a) (2)
H₀: p₁ ≤ p₂ or H₀: Δp ≤ 0 and H₁: p₁ > p₂ or H₁: Δp > 0.
Version 0: If a = 0, our table is as below.

Row  C1         Ag  Ch  Engg  Lib  Bus  Sci  Tech  Total
  1  Loan       32  37    98   89   24   31    57    368
  2  None       35  50   137  124   51   29    71    497
  3  Col Total  67  87   235  213   75   60   128    865

Out of 865 − 60 = 805 non-science majors, 368 − 31 = 337 have loans. This is p̂₂ = 337/805 = .418634. For the 60 science majors p̂₁ = 31/60 = .516667. The test ratio is z = (Δp̂ − Δp₀)/σΔp̂, and z.01 = 2.327. Since this is a right-sided test, we reject H₀ if our calculated z-ratio is above 2.327. Δp̂ = .516667 − .418634 = .098033.
For a test ratio or a critical value for Δp̂, p̄₀ = (n₁p̂₁ + n₂p̂₂)/(n₁ + n₂) = 368/865 = .425434.
σΔp̂ = √[p̄₀q̄₀(1/n₁ + 1/n₂)] = √[(.425434)(.574566)(1/60 + 1/805)] = √[(.425434)(.574566)(.017909)] = √.004378 = .066164.
z = (Δp̂ − 0)/σΔp̂ = .098033/.066164 = 1.482, so we do not reject H₀. Alternatively, p-value = P(z > 1.48) = .5 − .4306 = .0694, which is above α = .01 — same conclusion.
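The pooled two-proportion z ratio above is plain arithmetic and can be verified with a short Python sketch (version 0 numbers: 31 of 60 science majors and 337 of 805 non-science majors have loans).

```python
# Pooled two-proportion z test, science (sample 1) vs. non-science (sample 2).
from math import sqrt

x1, n1 = 31, 60      # science majors with loans
x2, n2 = 337, 805    # non-science majors with loans
p1, p2 = x1 / n1, x2 / n2
dp = p1 - p2                          # observed difference in proportions
p0 = (x1 + x2) / (n1 + n2)            # pooled proportion = 368/865
sigma = sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
z = dp / sigma
print(round(dp, 6), round(sigma, 6), round(z, 3))
```

The result should reproduce Δp̂ ≈ .098033, σΔp̂ ≈ .066164 and z ≈ 1.48, which agrees with the Minitab output shown later.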
c) Use a critical value for the difference between proportions to test your hypotheses from a) (2)
For a two-sided test Δp̂cv = Δp₀ ± z(α/2)σΔp̂, but we need one critical value above zero, so Δp̂cv = 0 + z.01 σΔp̂ = 2.327(.066164) = .1540, and we reject H₀ if Δp̂ is above this value. Δp̂ = .098033 is clearly below the critical value, so do not reject H₀.
d) Use an appropriate confidence interval to test your hypotheses from a) (2)
For a confidence interval for Δp,
sΔp̂ = √(p̂₁q̂₁/n₁ + p̂₂q̂₂/n₂) = √[(.516667)(.483333)/60 + (.418634)(.581366)/805] = √(.004162 + .000302) = √.004464 = .066813.
The two-sided formula is Δp = Δp̂ ± z(α/2)sΔp̂. The alternative hypothesis is H₁: Δp > 0, so our one-sided confidence interval is Δp ≥ Δp̂ − z(α)sΔp̂ = .098033 − 2.327(.066813) = −.0574. This interval includes zero, so H₀: Δp ≤ 0 is not contradicted.
e) Treat each major separately and test the hypothesis that the proportion of students that have loans is independent of major (4) See below.
f) If you did section 1e, follow your analysis with a Marascuilo procedure to compare the proportion of business students that have loans with the proportions for the other 6 majors. Tell which differences are significant. (3) [14] We are testing H₀: p₁ = p₂ = p₃ = p₄ = p₅ = p₆ = p₇.
Our data is repeated below.

Row  C1         Ag  Ch  Engg  Lib  Bus  Sci  Tech  Total
  1  Loan       32  37    98   89   24   31    57    368
  2  None       35  50   137  124   51   29    71    497
  3  Col Total  67  87   235  213   75   60   128    865
We can make this into an expanded O table by summing in each direction, computing row proportions, finding the proportion with loans in each major and computing s²p̂ = p̂q̂/n.
O            Ag     Ch    Engg     Lib    Bus    Sci    Tch  Total      pr
Yes          32     37      98      89     24     31     57    368   .4254
No           35     50     137     124     51     29     71    497   .5746
Sum          67     87     235     213     75     60    128    865  1.0000
Proportion .4776  .4253   .4170   .4178  .3200  .5167  .4453
pq/n       .0037  .0028   .0010   .0011  .0029  .0041  .0019

I will use the row proportions and the column sums to create the E table.

E            Ag     Ch    Engg     Lib    Bus    Sci    Tch  Total      pr
Yes       28.50  37.01   99.98   90.62  31.91  25.53  54.46    368   .4254
No        38.50  49.99  135.02  122.38  43.09  34.47  73.54    497   .5746
Sum          67     87     235     213     75     60    128    865  1.0000

Actually, all numbers were carried to more places than reported here.
Below, χ² is computed two ways. This is, of course, unnecessary.
Row     O        E       O-E   (O-E)²/E     O²/E
  1    32   28.504   3.49595    0.42877   35.925
  2    37   37.013  -0.01272    0.00000   36.987
  3    98   99.977  -1.97688    0.03909   96.062
  4    89   90.617  -1.61734    0.02887   87.412
  5    24   31.908  -7.90751    1.95969   18.052
  6    31   25.526   5.47399    1.17388   37.648
  7    57   54.455   2.54451    0.11890   59.663
  8    35   38.496  -3.49595    0.31748   31.822
  9    50   49.987   0.01272    0.00000   50.013
 10   137  135.023   1.97688    0.02894  139.006
 11   124  122.383   1.61734    0.02137  125.639
 12    51   43.092   7.90751    1.45104   60.359
 13    29   34.474  -5.47399    0.86919   24.395
 14    71   73.545  -2.54451    0.08804   68.544
Total 865  865       0.000      6.52526  871.526

The formula for the chi-squared statistic is χ² = Σ[(O − E)²/E] or χ² = Σ(O²/E) − n. Both of these formulas are shown above. There is no reason to do both. DF = (r − 1)(c − 1) = (2 − 1)(7 − 1) = 6. So we have
χ² = Σ[(O − E)²/E] = 6.5253 or χ² = Σ(O²/E) − n = 871.52526 − 865 = 6.5253.
If we compare our results with χ².01(6) = 16.8119, we notice that our computed value is below the table value. Since our computed value of chi-squared is smaller than the table value, we cannot reject our null hypothesis.
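Both chi-squared formulas from the worksheet can be reproduced with numpy; the expected counts come from the row and column totals, exactly as in the E table.

```python
# Chi-squared for the 2x7 loans-by-major table, computed both ways.
import numpy as np

O = np.array([[32, 37,  98,  89, 24, 31, 57],
              [35, 50, 137, 124, 51, 29, 71]], dtype=float)
row = O.sum(axis=1, keepdims=True)     # 368, 497
col = O.sum(axis=0, keepdims=True)     # 67, 87, 235, 213, 75, 60, 128
n = O.sum()                            # 865
E = row @ col / n                      # expected counts, E = (row total)(col total)/n

chi2_a = ((O - E) ** 2 / E).sum()      # sum of (O - E)^2 / E
chi2_b = (O ** 2 / E).sum() - n        # sum of O^2 / E, minus n
print(round(chi2_a, 4), round(chi2_b, 4))
```

The two formulas are algebraically identical, so both should print about 6.5253, below χ².01(6) = 16.8119.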
Note: Marascuilo Procedure.
The Marascuilo procedure says that, for 2 by c tests, if (i) equality is rejected and (ii) |p̂a − p̂b| > √χ² · sΔp̂, where a and b represent 2 groups, the chi-squared has c − 1 degrees of freedom and the standard deviation is sΔp̂ = √(p̂aq̂a/na + p̂bq̂b/nb), you can say that you have a significant difference between pa and pb. This is equivalent to using a confidence interval of
pa − pb = (p̂a − p̂b) ± √[χ²(c − 1)] · √(p̂aq̂a/na + p̂bq̂b/nb).
Version 0

O            Ag     Ch    Engg     Lib    Bus    Sci    Tch  Total      pr
Yes          32     37      98      89     24     31     57    368   .4254
No           35     50     137     124     51     29     71    497   .5746
Sum          67     87     235     213     75     60    128    865  1.0000
Proportion .4776  .4253   .4170   .4178  .3200  .5167  .4453
pq/n       .0037  .0028   .0010   .0011  .0029  .0041  .0019

Note that I should have carried more places in pq/n. But I wanted a concise table for your convenience.
6
The proportion of business students with loans is .3200 and we are using  2 .05  16.8119. The contrast we
p q

will use is thus p a  p 5   p a  .3200   16 .8119  a a  .0029 
 na

Ag: p1  p 5  .4776  .3200   16 .8119 .0037  .0029   .1576  .3331
Chem: p 2  p 5  .4153  .3200   16 .8119 .0028  .0029   .1053  .3096
Engineering: p 3  p 5  .4170  .3200   16 .8119 .0010  .0029   .0970  .2561
Library: p 4  p 5  .4178  .3200   16 .8119 .0011  .0029   .0978  .2224
Science: p 6  p 5  .5167  .3200   16 .8119 .0041  .0029   .1967  .2593
Tech: p 7  p 5  .4453  .3200   16 .8119 .0019  .0029   .1253  .2841
There is no surprise here. Since the chi-squared test said that there was no significant difference we expect
all of our intervals to include zero. They do.
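The full set of Marascuilo comparisons is easy to automate. The sketch below recomputes every proportion from the version 0 counts and flags any major whose loan proportion differs significantly from business.

```python
# Marascuilo procedure: each major vs. business, alpha = .01, 6 df.
from math import sqrt

loans  = [32, 37, 98, 89, 24, 31, 57]
totals = [67, 87, 235, 213, 75, 60, 128]
labels = ["Ag", "Ch", "Engg", "Lib", "Bus", "Sci", "Tech"]
chi2_crit = 16.8119                    # chi-squared table value, 6 df, alpha = .01

p = [x / n for x, n in zip(loans, totals)]
b = labels.index("Bus")
significant = []
for i, lab in enumerate(labels):
    if i == b:
        continue
    diff = p[i] - p[b]
    # critical range: sqrt(chi2) * sqrt(pa*qa/na + pb*qb/nb)
    half = sqrt(chi2_crit) * sqrt(p[i] * (1 - p[i]) / totals[i]
                                  + p[b] * (1 - p[b]) / totals[b])
    if abs(diff) > half:
        significant.append(lab)
print(significant)
```

Since the overall chi-squared test did not reject equality, the list of significant contrasts should come out empty, matching the conclusion above.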
g) (Extra credit) Check your results using Minitab.
(i) To do a chi-squared test on an O table that is in columns c1-c7, simply put the row labels in a column if you want them and print out your data. Then type in
ChiSquare c1 – c7.
The computer will print back the columns with their names, but below each number from the O table you will find the corresponding values of E and (O − E)²/E, the contribution of the value of O to the chi-square total. Use the p-value to find out if we reject the hypothesis of equal proportions at the 1% significance level.
O1
O2
32
37
28.50 37.01
0.429 0.000
O3
98
99.98
0.039
O4
O5
O6
O7
89
24
31
57
90.62 31.91 25.53 54.46
0.029 1.960 1.174 0.119
2
35
38.50
0.317
50
49.99
0.000
137
135.02
0.029
124
122.38
0.021
51
43.09
1.451
29
34.47
0.869
71
73.54
0.088
497
Total
67
87
235
213
75
60
128
865
1
Total
368
Chi-Sq = 6.525, DF = 6, P-Value = 0.367
(ii) To do a test of the alternative hypothesis H₁: p₁ > p₂, where p̂₁ = x₁/n₁ and p̂₂ = x₂/n₂, use the command below, substituting your numbers for x₁, n₁, x₂ and n₂.
MTB > PTwo x1 n1 x2 n2;
SUBC>   Confidence 99.0;
SUBC>   Alternative 1;
SUBC>   Pooled.
The computer will print back x₁, n₁, p̂₁ = x₁/n₁, x₂, n₂ and p̂₂ = x₂/n₂, a p-value for a z-test and Fisher's exact test (results should be somewhat similar to the z-test) and a 1-sided 99% confidence interval. An example of Minitab output follows.
MTB > PTwo 805 337 60 31;
SUBC>   Alternative -1;
SUBC>   Pooled.

Test and CI for Two Proportions

Sample    X    N  Sample p
1       337  805  0.418634
2        31   60  0.516667

Difference = p (1) - p (2)
Estimate for difference: -0.0980331
95% upper bound for difference: 0.0118693
Test for difference = 0 (vs < 0): Z = -1.48  P-Value = 0.069
Fisher's exact test: P-Value = 0.090
MTB > PTwo 805 337 60 40;
SUBC>   Alternative -1;
SUBC>   Pooled.

Test and CI for Two Proportions

Sample    X    N  Sample p
1       337  805  0.418634
2        40   60  0.666667

Difference = p (1) - p (2)
Estimate for difference: -0.248033
95% upper bound for difference: -0.143925
Test for difference = 0 (vs < 0): Z = -3.74  P-Value = 0.000
Fisher's exact test: P-Value = 0.000
2. (Moore, McCabe et al.) An absolutely tactless psychology professor has divided faculty members into categories the professor labels 'Fat' and 'Fit'. A random sample of scores on a test of 'ego strength' of the 'Fat' faculty is labeled x₁. A sample of 'ego strength' of the 'Fit' faculty is labeled x₂. d = x₁ − x₂. Use a 95% confidence level in this problem.
The professor has computed Σx₁ = Sum of Fat scores = 64.96, Σx₁² = Sum of squares of Fat scores = 307.607, Σx₂ = Sum of scores of Fit = 90.02, Σx₂² = Sum of squares of Fit scores = 581.239, Σd = Sum of diff = −25.06 and Σd² = Sum of squares of diff = 51.8198.

Row   Fat x₁  Fit x₂  Diff d = x₁ − x₂
  1     4.99    6.68      -1.69
  2     4.24    6.42      -2.18
  3     4.74    7.32      -2.58
  4     4.93    6.38      -1.45
  5     4.16    6.16      -2.00
  6     5.53    5.93      -0.40
  7     4.12    7.08      -2.96
  8     5.10    6.37      -1.27
  9     4.47    6.53      -2.06
 10     5.30    6.68      -1.38
 11     3.12    5.71      -2.59
 12     3.77    6.20      -2.43
 13     5.09    6.04      -0.95
 14     5.40    6.52      -1.12
To personalize the data remove row b, where b is the last digit of your student number. Please state clearly what row you removed. At this point you will have n₁ = n₂ = 13 rows of data. You will need the mean and variance of all three columns of data if you do all sections of this problem. You can save yourself considerable effort by using the computational formula for the variance with the sums and sums of squares that the professor computed, with the value or squared value of the numbers you removed subtracted.
The professor got the following results. Individualized solutions are in 252y0821a.

Variable   n    Mean  SE Mean  StDev  Median
Fat       14   4.640    0.184  0.690   4.835
Fit       14   6.430    0.115  0.431   6.400
diff      14  -1.790    0.196  0.732  -1.845
Your results should be relatively similar. Credit for computing the sample statistics needed is included in
the relevant parts of this problem. State hypotheses and conclusions clearly in each segment of the problem.
a) Assume that x1 and x 2 are independent random samples and test the hypothesis that the population
mean of the ego strength of the ‘fit’ faculty is above the population mean of the ‘fat’ faculty. Assume that
the data comes from the Normal distribution and that the variances for the ‘fit’ and ‘fat’ populations are
similar. (3)
Row   Fat x₁      x₁²  Fit x₂      x₂²  Diff d       d²
  1     4.24  17.9776    6.42  41.2164   -2.18   4.7524
  2     4.74  22.4676    7.32  53.5824   -2.58   6.6564
  3     4.93  24.3049    6.38  40.7044   -1.45   2.1025
  4     4.16  17.3056    6.16  37.9456   -2.00   4.0000
  5     5.53  30.5809    5.93  35.1649   -0.40   0.1600
  6     4.12  16.9744    7.08  50.1264   -2.96   8.7616
  7     5.10  26.0100    6.37  40.5769   -1.27   1.6129
  8     4.47  19.9809    6.53  42.6409   -2.06   4.2436
  9     5.30  28.0900    6.68  44.6224   -1.38   1.9044
 10     3.12   9.7344    5.71  32.6041   -2.59   6.7081
 11     3.77  14.2129    6.20  38.4400   -2.43   5.9049
 12     5.09  25.9081    6.04  36.4816   -0.95   0.9025
 13     5.40  29.1600    6.52  42.5104   -1.12   1.2544
Sum    59.97  282.707   83.34  536.616  -23.37  48.9637

To summarize: Σx₁ = 59.97, Σx₁² = 282.707, Σx₂ = 83.34, Σx₂² = 536.616, Σd = −23.37, Σd² = 48.9637 and n₁ = n₂ = n = 13.
These will have to be used somewhere in the next problems, so let's get it over with. Of course, you could have saved lots of time using the numbers that I gave you.
x̄₁ = Σx₁/n₁ = 59.97/13 = 4.61308,  s₁² = (Σx₁² − n₁x̄₁²)/(n₁ − 1) = (282.707 − 13(4.61308)²)/12 = 6.06078/12 = 0.50503,  s₁ = 0.71068.
x̄₂ = Σx₂/n₂ = 83.34/13 = 6.4108,  s₂² = (Σx₂² − n₂x̄₂²)/(n₂ − 1) = (536.616 − 13(6.4108)²)/12 = 2.342490/12 = 0.19521,  s₂ = 0.44182.
d̄ = Σd/n = −23.37/13 = −1.7977,  s_d² = (Σd² − n d̄²)/(n − 1) = (48.9637 − 13(1.7977)²)/12 = 6.95163/12 = 0.57930,  s_d = 0.76112.
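The computational formula above (mean = Σx/n and s² = (Σx² − n x̄²)/(n − 1)) is easy to script; the sketch below applies it to the three columns using the sums for the version with b = 1 removed.

```python
# Sample statistics from sums and sums of squares (computational formula).
from math import sqrt

def from_sums(sx, sxx, n):
    """Return (mean, sample variance) given sum, sum of squares and n."""
    mean = sx / n
    var = (sxx - n * mean**2) / (n - 1)
    return mean, var

n = 13
x1bar, s1sq = from_sums(59.97, 282.707, n)    # Fat column
x2bar, s2sq = from_sums(83.34, 536.616, n)    # Fit column
dbar, sdsq  = from_sums(-23.37, 48.9637, n)   # diff column
print(round(x1bar, 5), round(s1sq, 5))
print(round(x2bar, 4), round(s2sq, 5))
print(round(dbar, 4), round(sqrt(sdsq), 5))
```

The printed values should agree with the hand computations (x̄₁ ≈ 4.61308, s₁² ≈ 0.505, d̄ ≈ −1.7977, s_d ≈ 0.7611) up to rounding of the intermediate steps.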
The parts of Table 3 useful in a) and b) follow.

Interval for: Difference between Two Means (σ unknown, variances assumed equal), D = μ₁ − μ₂
  Confidence interval: D = d̄ ± t(α/2) s_d̄, where s_d̄ = sp√(1/n₁ + 1/n₂),
      ŝp² = [(n₁ − 1)s₁² + (n₂ − 1)s₂²]/(n₁ + n₂ − 2) and DF = n₁ + n₂ − 2.
  Hypotheses: H₀: D = D₀ against H₁: D ≠ D₀
  Test ratio: t = (d̄ − D₀)/s_d̄
  Critical value: d̄cv = D₀ ± t(α/2) s_d̄

Interval for: Difference between Two Means (σ unknown, variances assumed unequal)
  Confidence interval: D = d̄ ± t(α/2) s_d̄, where s_d̄ = √(s₁²/n₁ + s₂²/n₂) and
      DF = (s₁²/n₁ + s₂²/n₂)² / [(s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1)].
  Hypotheses: H₀: D = D₀ against H₁: D ≠ D₀
  Test ratio: t = (d̄ − D₀)/s_d̄
  Critical value: d̄cv = D₀ ± t(α/2) s_d̄
The hypothesis that the population mean of the ego strength of the 'fit' faculty is above the population mean of the 'fat' faculty translates as H₁: μ₁ < μ₂ (or, if D = μ₁ − μ₂, H₁: D < 0), which means that the null hypothesis is H₀: μ₁ ≥ μ₂ (or H₀: D ≥ 0).
Solution: If we assume equal variances, we use DF = n₁ + n₂ − 2 = 13 + 13 − 2 = 24, and we have d̄ = −1.7977 and
ŝp² = [(n₁ − 1)s₁² + (n₂ − 1)s₂²]/(n₁ + n₂ − 2) = [12(0.50503) + 12(0.19521)]/(13 + 13 − 2) = 8.40324/24 = 0.35014, so sp = 0.59173.
So s_d̄ = √[ŝp²(1/n₁ + 1/n₂)] = √[0.35014(1/13 + 1/13)] = √[2(0.35014)/13] = √.05387 = 0.23209.
This is a left-sided test, so we will be using −t.05(24) = −1.711.
Test Ratio: t = (d̄ − 0)/s_d̄ = −1.7977/0.23209 = −7.745. Since this is a left-sided test, our 'reject' region is all points below −t.05(24) = −1.711. Since t = −7.745 is below −1.711, we reject H₀. Note that t.001(24) = 3.467, so the p-value is below .001.
Critical value for the difference between sample means: For a two-sided test d̄cv = 0 ± t(α/2, 24)s_d̄, but this is a left-sided test and, because the alternative hypothesis is H₁: D < 0, we need one critical value below zero: d̄cv = 0 − t.05(24)s_d̄ = −1.711(0.23209) = −0.3971. Our 'reject' region is all points below −0.3971. Since d̄ = −1.7977 is below our critical value, we reject H₀.
One-sided confidence interval: For a two-sided test D = d̄ ± t_{α/2}^{24} s_d, but this is a left-sided test and,
because the alternative hypothesis is H1: D < 0, we need a one-sided confidence interval or upper limit
D ≤ d̄ + t_{.05}^{24} s_d = -1.7977 + 0.3971 = -1.401. The interval D ≤ -1.401 does not include zero, so it
contradicts H0: D ≥ 0, and we reject H0.
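As a check on the arithmetic in a), the pooled-variance test can be replicated with scipy. This is a sketch, not part of the original solution; the 'fat' (x1) and 'fit' (x2) columns are the ego-strength data tabulated later in this problem.

```python
# Hedged sketch: replicate the pooled-variance (equal variances) t test of a).
# Data are the x1 ('fat') and x2 ('fit') ego-strength columns from this problem.
from scipy import stats

fat = [4.24, 4.74, 4.93, 4.16, 5.53, 4.12, 5.10, 4.47, 5.30, 3.12, 3.77, 5.09, 5.40]
fit = [6.42, 7.32, 6.38, 6.16, 5.93, 7.08, 6.37, 6.53, 6.68, 5.71, 6.20, 6.04, 6.52]

# Left-sided test of H1: mu1 < mu2 with pooled variances (DF = 24)
t, p = stats.ttest_ind(fat, fit, equal_var=True, alternative='less')
print(t, p)  # t near -7.75, p far below .001, agreeing with the hand calculation
```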
b) (Extra credit) Assume that x1 and x2 are independent random samples and test the hypothesis that the
population mean of the ego strength of the 'fit' faculty is above the population mean of the 'fat' faculty.
Assume that the data come from the Normal distribution and that the variances for the 'fit' and 'fat'
populations are not similar. (3)
Solution: Recall x̄1 = 4.61308, s1² = 0.50503, s1 = 0.71068, x̄2 = 6.4108, s2² = 0.19521, s2 = 0.44182,
d̄ = -1.7977 and n1 = n2 = 13.
Our worksheet includes the following.
s1²/n1 = 0.50503/13 = 0.03885
s2²/n2 = 0.19521/13 = 0.01502
s1²/n1 + s2²/n2 = 0.05387
s_d = √(s1²/n1 + s2²/n2) = √0.05387 = 0.23209
DF = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 - 1) + (s2²/n2)²/(n2 - 1)]
   = (0.05387)² / [(0.03885)²/12 + (0.01502)²/12] = 0.002902/(0.0001258 + 0.0000188) = 20.072,
which we round down to DF = 20.
Test Ratio: t = (d̄ - 0)/s_d = -1.7977/0.23209 = -7.746. Since this is a left-sided test, our 'reject'
region is all points below -t_{.05}^{20} = -1.725. Since t = -7.746 is below -1.725, we reject H0. Note
that t_{.001}^{20} = 3.552, so the p-value is below .001.
Critical value for the difference between sample means: For a two-sided test d̄_cv = 0 ± t_{α/2}^{20} s_d, but this is a
left-sided test and, because the alternative hypothesis is H1: D < 0, we need one critical value below zero:
d̄_cv = 0 - t_{.05}^{20} s_d = -1.725(0.23209) = -0.400. Our 'reject' region is all points below -0.400. Since
d̄ = -1.7977 is below our critical value, we reject H0.
One-sided confidence interval: For a two-sided test D = d̄ ± t_{α/2}^{20} s_d, but this is a left-sided test and,
because the alternative hypothesis is H1: D < 0, we need a one-sided confidence interval or upper limit
D ≤ d̄ + t_{.05}^{20} s_d = -1.7977 + 0.400 = -1.398. The interval D ≤ -1.398 does not include zero, so it
contradicts H0: D ≥ 0, and we reject H0.
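The unequal-variances version in b) can be checked the same way. This is a sketch using the same data columns; with equal sample sizes the test ratio is unchanged, and scipy computes the Satterthwaite degrees of freedom internally.

```python
# Hedged sketch: replicate the unequal-variances (Welch) t test of b).
from scipy import stats

fat = [4.24, 4.74, 4.93, 4.16, 5.53, 4.12, 5.10, 4.47, 5.30, 3.12, 3.77, 5.09, 5.40]
fit = [6.42, 7.32, 6.38, 6.16, 5.93, 7.08, 6.37, 6.53, 6.68, 5.71, 6.20, 6.04, 6.52]

# Welch test; DF comes from the Satterthwaite formula (about 20 here)
t, p = stats.ttest_ind(fat, fit, equal_var=False, alternative='less')
print(t, p)  # t again near -7.75, p far below .001
```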
c) Assume that x1 and x2 are independent random samples. How would we decide whether the method in
a) or b) is correct? Do the appropriate test. Assume that the data come from the Normal distribution.
Should we have used a) or b)? (2) [22]
Solution: Recall s1² = 0.50503, s2² = 0.19521 and n1 = n2 = 13. For a test of H0: σ1² = σ2², we need only
test the larger of s1²/s2² and s2²/s1² against the appropriate F_{α/2}. Because the number of degrees of
freedom for both variances is 12, we test s1²/s2² = 0.50503/0.19521 = 2.587 against F_{.025}^{(12,12)} = 3.28.
The 'reject' zone is above 3.28, so we do not reject the null hypothesis of equal variances, and the
equal-variance method of a) is appropriate.
d) Compute the mean and variance of the column of differences and test the column to see if the Normal
distribution works for these data. (4)
Solution: I have already done this above, and have done one Lilliefors test, so I will use Minitab as my
calculator this time around. The d column is put in order. We calculate z' = (d - d̄)/s_d. We use the Normal
table to find the cumulative expected frequency F_e = P(z ≤ z'). For example, we find P(z ≤ -1.52710)
≈ P(z ≤ -1.52) = .5 - P(-1.52 ≤ z ≤ 0) = .5 - .4357 = .0643. The computer, which does not round the value
of z', gets a slightly more accurate value of 0.063368. The O column is added along to get the cum O
column, which is divided by n = 13 to get the observed cumulative frequency F_o. Finally, the D column
gives the absolute values of the differences between the cumulative observed and the cumulative expected
frequencies. The maximum value in the D column, which is .143266, is compared against the 5% value in
the Lilliefors table, which is .2337. Since the computed value does not exceed the table value, we cannot
reject the null hypothesis of Normality.
Row   d       d in order   z' = (d - d̄)/s_d   F_e = P(z ≤ z')   O   cum O   F_o       D = |F_o - F_e|
1     -2.18   -2.96        -1.52710           0.063368          1   1       0.07692   0.013555
2     -2.58   -2.59        -1.04098           0.148943          1   2       0.15385   0.004903
3     -1.45   -2.58        -1.02784           0.152013          1   3       0.23077   0.078756
4     -2.00   -2.43        -0.83076           0.203055          1   4       0.30769   0.104638
5     -0.40   -2.18        -0.50230           0.307729          1   5       0.38462   0.076886
6     -2.96   -2.06        -0.34463           0.365185          1   6       0.46154   0.096354
7     -1.27   -2.00        -0.26580           0.395196          1   7       0.53846   0.143266
8     -2.06   -1.45         0.45682           0.676099          1   8       0.61538   0.060714
9     -1.38   -1.38         0.54879           0.708424          1   9       0.69231   0.016116
10    -2.59   -1.27         0.69331           0.755943          1   10      0.76923   0.013288
11    -2.43   -1.12         0.89039           0.813372          1   11      0.84615   0.032782
12    -0.95   -0.95         1.11374           0.867306          1   12      0.92308   0.055771
13    -1.12   -0.40         1.83636           0.966848          1   13      1.00000   0.033152
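The D column can be reproduced numerically. This sketch computes the statistic exactly as the table does (the simplified form, max |F_o - F_e| with F_o = i/n), using the d column from this problem:

```python
# Hedged sketch: reproduce the (simplified) Lilliefors statistic from the table in d).
import numpy as np
from scipy import stats

d = np.sort([-2.18, -2.58, -1.45, -2.00, -0.40, -2.96, -1.27,
             -2.06, -1.38, -2.59, -2.43, -0.95, -1.12])
n = len(d)

z = (d - d.mean()) / d.std(ddof=1)   # the z' column
Fe = stats.norm.cdf(z)               # cumulative expected frequency
Fo = np.arange(1, n + 1) / n         # cumulative observed frequency
D = np.abs(Fo - Fe).max()
print(D)  # about 0.1433, below the 5% Lilliefors table value of .2337
```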
e) Assume that we had rejected the hypothesis that the distributions of the populations that the columns
come from are Normal; do a one-sided test to see whether the ego strength of the 'fat' and 'fit' people
differs. (2)
Solution: Because the data are assumed to be independent nonnormal random samples and thus not paired,
use the Wilcoxon-Mann-Whitney Rank Sum Test. H0: μ1 ≥ μ2 and H1: μ1 < μ2, or the null hypothesis is
simply a one-sided version of 'similar distributions.' n1 = n2 = 13. In the table below, r1 and r2 represent
bottom-to-top ranking.
Row   x1     r1   x2     r2
1     4.24    5   6.42   21
2     4.74    7   7.32   26
3     4.93    8   6.38   20
4     4.16    4   6.16   17
5     5.53   13   5.93   15
6     4.12    3   7.08   25
7     5.10   10   6.37   19
8     4.47    6   6.53   23
9     5.30   11   6.68   24
10    3.12    1   5.71   14
11    3.77    2   6.20   18
12    5.09    9   6.04   16
13    5.40   12   6.52   22
Sum         91          260
Recall that n1 = 13 and n2 = 13, and that the total number of numbers that we have ranked is
n = n1 + n2 = 26. Note that the sum of the first 26 numbers is n(n + 1)/2 = 26(27)/2 = 351 and that, to
verify our ranking, we find that the two rank sums add to 91 + 260 = 351. The outline says that the smaller
of SR1 and SR2 is called W and is compared with Table 5 or 6. W = 91. Neither table has critical values for
problems this large. For values of n1 and n2 that are too large for the tables, W has the normal distribution
with mean μ_W = ½n1(n1 + n2 + 1) = ½(13)(27) = 175.5 and variance
σ_W² = (1/6)n2·μ_W = (1/6)(13)(175.5) = 380.25. Note that the outline says n1 ≤ n2, but this does not create
a problem here. If the significance level is 5% and the test is one-sided, we reject our null hypothesis if
z = (W - μ_W)/σ_W lies below -z_{.05} = -1.645. In this case z = (91 - 175.5)/√380.25 = -84.5/19.5 = -4.333.
Since this is below -1.645, we reject H0. To get a p-value for this result, use P(z ≤ -4.33) = .5 - .5000 ≈ 0.
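The rank-sum calculation can be verified directly. This sketch ranks the pooled data and applies the same normal approximation; note that the approximation gives z = -84.5/19.5 ≈ -4.33.

```python
# Hedged sketch: replicate the Wilcoxon-Mann-Whitney normal approximation of e).
import numpy as np
from scipy import stats

fat = [4.24, 4.74, 4.93, 4.16, 5.53, 4.12, 5.10, 4.47, 5.30, 3.12, 3.77, 5.09, 5.40]
fit = [6.42, 7.32, 6.38, 6.16, 5.93, 7.08, 6.37, 6.53, 6.68, 5.71, 6.20, 6.04, 6.52]
n1 = n2 = 13

ranks = stats.rankdata(fat + fit)      # rank all 26 numbers bottom to top
W = ranks[:n1].sum()                   # rank sum of the 'fat' column: 91
mu_W = 0.5 * n1 * (n1 + n2 + 1)        # 175.5
var_W = n1 * n2 * (n1 + n2 + 1) / 12   # 380.25, same as (1/6)*n2*mu_W here
z = (W - mu_W) / np.sqrt(var_W)
print(W, z)  # W = 91, z about -4.33, well below -1.645
```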
f) In the remainder of this problem assume that the x1 and x2 columns are not independent random
samples but instead represent the ego strength of the same 13 faculty members before and after a
fitness program. Assuming that the Normal distribution applies, can we say that the ego strength of the
faculty has increased? (2)
Solution: Table 3 has the following.

Interval for: Difference between Two Means (paired data), with D = μ1 - μ2 and d̄ = x̄1 - x̄2.
Confidence Interval: D = d̄ ± t_{α/2} s_d̄
Hypotheses: H0: D = D0*, H1: D ≠ D0
Test Ratio: t = (d̄ - D0)/s_d̄
Critical Value: d̄_cv = D0 ± t_{α/2} s_d̄
where s_d̄ = s_d/√n and df = n - 1, with n1 = n2 = n.
The hypothesis that the population mean of the ego strength of the 'fit' faculty is above the population
mean of the 'fat' faculty translates as H1: μ1 < μ2 (or, if D = μ1 - μ2, H1: D < 0), which means that the
null hypothesis is H0: μ1 ≥ μ2 (or H0: D ≥ 0).
Recall the following: x̄1 = 4.61308, x̄2 = 6.4108, d̄ = -1.7977, s_d² = 0.57930, s_d = 0.76112 and
n = 13. This means that s_d̄ = s_d/√n = √(0.57930/13) = √0.04456 = 0.21110. This is a left-sided test,
so we will be using -t_{.05}^{12} = -1.782.
Test Ratio: t = (d̄ - 0)/s_d̄ = -1.7977/0.21110 = -8.516. Since this is a left-sided test, our 'reject'
region is all points below -t_{.05}^{12} = -1.782. Since t = -8.516 is below -1.782, we reject H0. Note
that t_{.001}^{12} = 3.930, so the p-value is below .001.
Critical value for the difference between sample means: For a two-sided test d̄_cv = 0 ± t_{α/2}^{12} s_d̄, but this is a
left-sided test and, because the alternative hypothesis is H1: D < 0, we need one critical value below zero:
d̄_cv = 0 - t_{.05}^{12} s_d̄ = -1.782(0.21110) = -0.37618. Our 'reject' region is all points below -0.37618. Since
d̄ = -1.7977 is below our critical value, we reject H0.
One-sided confidence interval: For a two-sided test D = d̄ ± t_{α/2}^{12} s_d̄, but this is a left-sided test and,
because the alternative hypothesis is H1: D < 0, we need a one-sided confidence interval or upper limit
D ≤ d̄ + t_{.05}^{12} s_d̄ = -1.7977 + 0.37618 = -1.422. The interval D ≤ -1.422 does not include zero, so it
contradicts H0: D ≥ 0, and we reject H0.
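The paired test in f) is one line with scipy's related-samples routine (a sketch; same data columns as above, df = n - 1 = 12):

```python
# Hedged sketch: replicate the paired-data t test of f).
from scipy import stats

fat = [4.24, 4.74, 4.93, 4.16, 5.53, 4.12, 5.10, 4.47, 5.30, 3.12, 3.77, 5.09, 5.40]
fit = [6.42, 7.32, 6.38, 6.16, 5.93, 7.08, 6.37, 6.53, 6.68, 5.71, 6.20, 6.04, 6.52]

# Paired (related-samples) left-sided test with df = n - 1 = 12
t, p = stats.ttest_rel(fat, fit, alternative='less')
print(t, p)  # t near -8.52, p far below .001
```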
g) Repeat f) under the assumption that the Normal distribution does not apply. (1)
Solution: This is a Wilcoxon signed rank test. α = .05. H0: μ1 ≥ μ2 and H1: μ1 < μ2. The data are below.
The column |d| is the absolute value of d, the column r ranks the absolute values, and the column r* is
the ranks corrected for ties and marked with the signs of the differences.
Row   x1     x2     d = x1 - x2   |d|    r    r*
1     4.24   6.42   -2.18         2.18    9    9-
2     4.74   7.32   -2.58         2.58   11   11-
3     4.93   6.38   -1.45         1.45    6    6-
4     4.16   6.16   -2.00         2.00    7    7-
5     5.53   5.93   -0.40         0.40    1    1-
6     4.12   7.08   -2.96         2.96   13   13-
7     5.10   6.37   -1.27         1.27    4    4-
8     4.47   6.53   -2.06         2.06    8    8-
9     5.30   6.68   -1.38         1.38    5    5-
10    3.12   5.71   -2.59         2.59   12   12-
11    3.77   6.20   -2.43         2.43   10   10-
12    5.09   6.04   -0.95         0.95    2    2-
13    5.40   6.52   -1.12         1.12    3    3-
If we add together the numbers in r* with a + sign, we get T⁺ = 0. If we do the same for the numbers
with a - sign, we get T⁻ = 91. To check this, note that these two numbers must sum to the sum of the
first n numbers, and that this is n(n + 1)/2 = 13(14)/2 = 91, and that T⁺ + T⁻ = 0 + 91 = 91.
We check 0, the smaller of the two rank sums, against the numbers in Table 7. {wsignedr} For a one-sided
5% test, we use the α = .05 column. For n = 13, the critical value is 21, and we reject the null hypothesis
only if our test statistic is below this critical value. Since our test statistic is 0, we reject the null hypothesis.
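The signed rank test in g) can also be checked with scipy (a sketch; since all 13 differences fat - fit are negative, the rank sum of positive differences is 0):

```python
# Hedged sketch: replicate the Wilcoxon signed rank test of g).
from scipy import stats

fat = [4.24, 4.74, 4.93, 4.16, 5.53, 4.12, 5.10, 4.47, 5.30, 3.12, 3.77, 5.09, 5.40]
fit = [6.42, 7.32, 6.38, 6.16, 5.93, 7.08, 6.37, 6.53, 6.68, 5.71, 6.20, 6.04, 6.52]

# One-sided test; all differences fat - fit are negative, so the statistic is 0
res = stats.wilcoxon(fat, fit, alternative='less')
print(res.statistic, res.pvalue)  # statistic 0; p far below .05, so H0 is rejected
```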
h) Use the Wilcoxon signed rank test to test whether the median of the d column is -2. (2) [35]
Solution: This is a Wilcoxon signed rank test. α = .05. H0: median = -2 and H1: median ≠ -2. The data are
below. We have replaced the x1 column with our old d column and replaced the x2 column with a column of
-2's. We compute a new d' column by subtracting the -2's from the original d column. The column |d'| is
the absolute value of d', the column r ranks the absolute values, and the column r* is the ranks corrected
for ties and marked with the signs of the differences. Because there is a zero in the ranking of |d'|, we have
lowered all the other ranks by 1 and left out the zero.
Row   d       -2's   d' = d - (-2)   |d'|   r    r*
1     -2.18   -2     -0.18           0.18    3    2-
2     -2.58   -2     -0.58           0.58    6    5-
3     -1.45   -2      0.55           0.55    5    4+
4     -2.00   -2      0.00           0.00    1    (dropped)
5     -0.40   -2      1.60           1.60   13   12+
6     -2.96   -2     -0.96           0.96   11   10-
7     -1.27   -2      0.73           0.73    9    8+
8     -2.06   -2     -0.06           0.06    2    1-
9     -1.38   -2      0.62           0.62    8    7+
10    -2.59   -2     -0.59           0.59    7    6-
11    -2.43   -2     -0.43           0.43    4    3-
12    -0.95   -2      1.05           1.05   12   11+
13    -1.12   -2      0.88           0.88   10    9+
If we add together the numbers in r* with a + sign, we get T⁺ = 51. If we do the same for the numbers
with a - sign, we get T⁻ = 27. To check this, note that these two numbers must sum to the sum of the
first n numbers, and that this is n(n + 1)/2 = 12(13)/2 = 78, and that T⁺ + T⁻ = 51 + 27 = 78.
We check 27, the smaller of the two rank sums, against the numbers in Table 7. {wsignedr} For a two-sided
5% test, we use the α = .025 column. For n = 12, the critical value is 14, and we can reject the null
hypothesis only if our test statistic is below this critical value. Since our test statistic is 27, we cannot reject
the null hypothesis. Most of you seemed to be looking for magic here. Almost everyone did the test
appropriate to part g). Common sense says that if you want to test for -2, you have to use -2 somewhere in
the problem.
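The point of h) can be checked by shifting the d column by the hypothesized median before testing. A sketch; note that scipy's `zero_method='wilcox'` drops the single zero difference (row 4), just as the table does.

```python
# Hedged sketch: Wilcoxon signed rank test of H0: median(d) = -2, as in h).
import numpy as np
from scipy import stats

d = np.array([-2.18, -2.58, -1.45, -2.00, -0.40, -2.96, -1.27,
              -2.06, -1.38, -2.59, -2.43, -0.95, -1.12])

# Two-sided test on d - (-2); the zero in row 4 is dropped, leaving n = 12
res = stats.wilcoxon(d + 2, zero_method='wilcox')
print(res.statistic, res.pvalue)  # statistic = min(T+, T-) = 27; p well above .05
```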
i) Extra credit. Use Minitab to check your work.
All versions of this problem appear in the appendix below in excruciating detail.