252y0521 4/07/05 ECO252 QBA2 Name KEY

252y0521 4/07/05 (Page layout view!)
ECO252 QBA2
Name KEY
SECOND HOUR EXAM Hour of Class Registered
April 4, 2005
Circle 9am 10am
Show your work! Make Diagrams! Exam is normed on 50 points. Answers without reasons are not
usually acceptable.
I. (8 points) Do all the following. Make diagrams!
$x \sim N(21, 6)$. If you are not using the supplement table, make sure that I know it.
1. $P(0 \le x \le 21.00) = P\left(\frac{0-21}{6} \le z \le \frac{21-21}{6}\right) = P(-3.50 \le z \le 0) = .4998$
Make a diagram! For z draw a Normal curve with a vertical line at zero in the middle. Shade the area
between -3.50 and 0 and note that it begins or ends at zero, so that you can just look up a single number on
the table.
[Figures: Normal curve with mean 21 and standard deviation 6, where the area between 0 and 21 is 0.4998; Normal curve with mean 0 and standard deviation 1, where the area between -3.5 and 0 is 0.4998. Horizontal axis: Data Axis; vertical axis: Density.]
2. $P(x \le 10.22) = P\left(z \le \frac{10.22-21}{6}\right) = P(z \le -1.80) = P(z \le 0) - P(-1.80 \le z \le 0) = .5 - .4641 = .0359$
Make a diagram! For z draw a Normal curve with a vertical line at zero in the middle. Shade the entire
area below -1.80, and note that it is on one side of the mean, so that you subtract the area between -1.80 and
zero from the entire area below zero.
[Figures: Normal curve with mean 21 and standard deviation 6, where the area to the left of 10.22 is 0.0362; Normal curve with mean 0 and standard deviation 1, where the area to the left of -1.8 is 0.0359. Horizontal axis: Data Axis; vertical axis: Density.]
3. $P(7.00 \le x \le 30.4) = P\left(\frac{7-21}{6} \le z \le \frac{30.4-21}{6}\right) = P(-2.33 \le z \le 1.57)$
$= P(-2.33 \le z \le 0) + P(0 \le z \le 1.57) = .4901 + .4418 = .9319$
Make a diagram! For z draw a Normal curve with a vertical line at zero in the middle. Shade the area
between -2.33 and 1.57 and note that it is on both sides of the mean, so that you add the area between -2.33
and zero to the area between zero and 1.57.
[Figures: Normal curve with mean 21 and standard deviation 6, where the area between 7 and 30.4 is 0.9316; Normal curve with mean 0 and standard deviation 1, where the area between -2.33 and 1.57 is 0.9319. Horizontal axis: Data Axis; vertical axis: Density.]
4. $x_{.035}$. First we must find $z_{.035}$. This is the value of $z$ that has $P(z \ge z_{.035}) = .035$ or
$P(0 \le z \le z_{.035}) = .5 - .035 = .4650$. On the Normal table, the closest we can find to .4650 is
$P(0 \le z \le 1.81) = .4649$. So $z_{.035} = 1.81$ and $x_{.035} = \mu + z_{.035}\sigma = 21 + 1.81(6) = 31.86$.
Check: $P(x \ge 31.86) = P\left(z \ge \frac{31.86-21}{6}\right) = P(z \ge 1.81) = .5 - .4649 = .0351 \approx .035$
Make a diagram! For z draw a Normal curve with zero in the middle. Divide the area above zero into
3.5% above $z_{.035}$ and 50% - 3.5% = 46.5% below $z_{.035}$.
[Figures: Normal curve with mean 0 and standard deviation 1, where the area to the right of 1.81 is 0.0351; Normal curve with mean 21 and standard deviation 6, where the area to the right of 31.86 is 0.0351. Horizontal axis: Data Axis; vertical axis: Density.]
How the graphs were made.
The program (macro) that I wrote for this is called Normarea5A and it is called by using the usual way of
calling a macro in Minitab by using its name with a % in front of it. To enable Minitab to find Normarea5A,
a nonsense worksheet called ‘notmuch’ is placed in the same file as NormArea5A and loaded first. It does
not affect the results. The dialog below creates the first graph.
————— 4/6/2005 10:59:45 PM ————————————————————
Welcome to Minitab, press F1 for help.
MTB > WOpen "C:\Documents and Settings\rbove\My Documents\Minitab\notmuch.MTW".
Retrieving worksheet from file: 'C:\Documents and Settings\rbove\My
Documents\Minitab\notmuch.MTW'
Worksheet was saved on Fri Jan 21 2005
Results for: notmuch.MTW
MTB > %normarea5a
Executing from file: normarea5a.MAC
Graphic display of normal curve areas
Finds and displays areas to the left or right of a given value
or between two values. (This macro uses C100-C116 and K100-K116)
Enter the mean and standard deviation of the normal curve.
DATA> 21
DATA> 6
Do you want the area to the left of a value? (Y or N)
n
Do you want the area to the right of a value? (Y or N)
n
Enter the two values for which you want the area between.
DATA> 0
DATA> 21
...working...
Normal Curve Area
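The four Normal-curve answers in Section I can also be cross-checked without the table or the macro. The following is a minimal Python sketch, not the author's Minitab macro; it assumes only the standard-library `statistics.NormalDist` class, and the variable names are illustrative. Because it does not round z to two decimals, its results agree with the areas printed on the graphs rather than with the table look-ups (for example about .0362 rather than .0359 in part 2).

```python
from statistics import NormalDist

# x ~ N(21, 6): mean 21, standard deviation 6 (from the exam statement).
x_dist = NormalDist(mu=21, sigma=6)

# 1. P(0 <= x <= 21)
p1 = x_dist.cdf(21) - x_dist.cdf(0)

# 2. P(x <= 10.22)
p2 = x_dist.cdf(10.22)

# 3. P(7 <= x <= 30.4)
p3 = x_dist.cdf(30.4) - x_dist.cdf(7)

# 4. x_.035: the value with 3.5% of the distribution above it.
x_035 = x_dist.inv_cdf(1 - 0.035)

print(round(p1, 4), round(p2, 4), round(p3, 4), round(x_035, 2))
```

The small differences from the hand solutions come entirely from rounding z to two decimal places before using the table.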
II. (24+ points) Do all the following. (2 points each unless noted otherwise).
Note the following:
1. This test is normed on 50 points, but there are more points possible including the take-home.
You are unlikely to finish the exam and might want to skip some questions.
2. A table identifying methods for comparing 2 samples is at the end of the exam.
3. If you answer ‘None of the above’ in any question, you should provide an alternative
answer and explain why. You may receive credit for this even if you are wrong.
4. Many of you are still wasting both our time by making statements without statistical
tests to back them up.
1. Computer Problem (Bassett et al.) 400 children were divided into two groups. The
first and larger group were taught Mathematics by traditional methods.
The second group was taught by an experimental method. Test scores were
recorded and are available. The computer analysis of the data is shown in
the three tests below. The First Test was done using method 2. The
researcher was reprimanded by her supervisor for assuming that the
population variances were equal, so she ran the Second Test without
assuming equal variances. Because she was very annoyed at her supervisor
she ran the third test for equal variances. The output is below. Assume a
significance level of 1%. Do not do any unnecessary computations.
#First Test:
MTB > TwoSample c1 c2;
SUBC> Pooled;
SUBC> Alternative -1.
Two-Sample T-Test and CI: x1, x2
Two-sample T for x1 vs x2
      N   Mean  StDev  SE Mean
x1  250  68.45   7.96     0.50
x2  150  70.62   7.06     0.58
Difference = mu (x1) - mu (x2)
Estimate for difference: -2.17521
95% upper bound for difference: -0.87525
T-Test of difference = 0 (vs <): T-Value = -2.76  P-Value = 0.003  DF = 398
#Second Test:
MTB > TwoSample c1 c2;
SUBC> Alternative -1.
Two-Sample T-Test and CI: x1, x2
Two-sample T for x1 vs x2
      N   Mean  StDev  SE Mean
x1  250  68.45   7.96     0.50
x2  150  70.62   7.06     0.58
Difference = mu (x1) - mu (x2)
Estimate for difference: -2.17521
95% upper bound for difference: -0.91322
T-Test of difference = 0 (vs <): T-Value = -2.84  P-Value = 0.002  DF = 343
#Third Test:
MTB > VarTest c1 c2;
SUBC> Unstacked.
Test for Equal Variances: x1, x2
F-Test (normal distribution)
Test statistic = 1.27, p-value = 0.107
a) Turn in your first computer assignment (2)
b) Look only at the first test. (i) What are the null and alternative hypotheses? (1)
(ii) Can we conclude that the experimental method is better? What are the numbers in the
output that bring you to this conclusion? (2)
c) Make a drawing of an (almost) Normal curve. Label the center of the curve with a zero
and show the area under the curve that is the p-value. (1) Easiest question on the exam!
Note: (From the outline) A p-value is a measure of the credibility of the null hypothesis and is defined
as the probability that a test statistic or ratio as extreme as or more extreme than the observed
statistic or ratio could occur, assuming that the null hypothesis is true. ("As extreme as or more extreme
than" means "as low as or lower than" in a left-sided test and "as high as or higher than" in a right-sided
test.) This means that if we have calculated a t-ratio with a value of $t_{calc}$, and we have a left-sided
test, $pvalue = P(t \le t_{calc})$. If we have a right-sided test, $pvalue = P(t \ge t_{calc})$. If we have a
2-sided test, $pvalue = 2P(t \le t_{calc})$ or $pvalue = 2P(t \ge t_{calc})$, whichever is smaller. So, for a
one-sided test make a diagram of the t distribution with a mean of zero, find the value of $t_{calc}$ and
shade the appropriate side of $t_{calc}$. For a 2-sided test, find both $t_{calc}$ and $-t_{calc}$ and shade
the tail above whichever is positive and below whichever is negative.
Solution: The important line here is
T-Test of difference = 0 (vs <): T-Value = -2.76
P-Value = 0.003
DF = 398
This tells us that the alternative hypothesis is $H_1: \mu_1 < \mu_2$, that $t = -2.76$ and that
$pvalue = .003$. So we can say (i) The null and alternative hypotheses are $H_0: \mu_1 \ge \mu_2$ (the
opposite of our alternative hypothesis) and $H_1: \mu_1 < \mu_2$. (ii) We can conclude that the
experimental method is better because we reject the null hypothesis at the 1% significance
level: the p-value is 0.3%, which is below 1%. To make the diagram, draw a curve with
its mean and a vertical line at zero and shade the area under the curve below -2.76. To get this
area using Minitab I used my program tareaA as below. But since the degrees of freedom are
so large, why not substitute $z$ for $t$ and note that $P(z \le -2.76) = .5 - .4971 = .0029$?
MTB > %tareaA
Executing from file: tareaA.MAC
Graphic display of t curve areas
Finds and displays areas to the left or right of a given value
or between two values. (This macro uses C100-C116 and K100-K120)
Enter the degrees of freedom.
DATA> 398
Do you want the area to the left of a value? (Y or N)
y
Enter the value for which you want the area to the left.
DATA> -2.76
...working...
t Curve Area
[Figure: t curve with 398 degrees of freedom and standard deviation 1.00252. The area to the left of -2.76 is 0.0030. Horizontal axis: Data Axis; vertical axis: Density. Data Display: mode 0, median 0.]
d) Look at the results of the second test. Do they look different to you from the results of the
first test? Why? (2)
Solution: If we look at the same line as the first test we see the following:
T-Test of difference = 0 (vs <): T-Value = -2.84
P-Value = 0.002
DF = 343
The value of t hasn’t changed much and we still have a tiny p-value. It doesn’t look different
to me, but if you expressed a good reason to disagree, fine!
e) Look at the results of the third test. What do you think were the null and alternative
hypotheses? Was her supervisor right that she should not have assumed equal variances?
Why? (2)
[8]
Solution: If we look at the output of the third test we find the following:
Test for Equal Variances: x1, x2
F-Test (normal distribution)
Test statistic = 1.27, p-value = 0.107
If it is a test for equal variances and there is no indication of a 1-sided test, it should have the
hypotheses $H_0: \sigma_1^2 = \sigma_2^2$ and $H_1: \sigma_1^2 \ne \sigma_2^2$. The p-value (0.107) is well above the significance level
of 1%, so we cannot reject the assumption of equal variances.
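The three Minitab tests above can be approximately reproduced from the printed summary statistics alone. The sketch below is illustrative Python, not part of the original solution; it uses the rounded means and standard deviations from the output, so the t-ratios differ from Minitab's in the second decimal. It follows the solution's suggestion of substituting z for t when the degrees of freedom are large, so the p-value is a Normal approximation.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics as printed in the Minitab output (rounded).
n1, mean1, sd1 = 250, 68.45, 7.96
n2, mean2, sd2 = 150, 70.62, 7.06
diff = mean1 - mean2

# First test: pooled variance, DF = n1 + n2 - 2.
sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
t_pooled = diff / sqrt(sp2 * (1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

# Second test: variances not assumed equal (Satterthwaite DF).
v1, v2 = sd1**2 / n1, sd2**2 / n2
t_welch = diff / sqrt(v1 + v2)
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Third test: F ratio of the two sample variances.
f_ratio = sd1**2 / sd2**2

# Left-sided p-value, using z in place of t because DF is large.
p_approx = NormalDist().cdf(t_pooled)

print(round(t_pooled, 2), df_pooled, round(t_welch, 2), int(df_welch))
print(round(f_ratio, 2), round(p_approx, 4))
```

Truncating the Satterthwaite degrees of freedom to a whole number gives the DF shown in the second test.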
Questions 2-5 refer to Exhibit 1.
Exhibit 1: (Schiffler, Adams) A new feed is supposedly superior to what
you have used in the past to feed your pigs. You divide your pigs into
two troops of 60 pigs each. After one month the results are as follows.
You want to decide if the new feed is actually better. Assume that the
sample data comes from two Normal populations with equal population
variance.

Old Feed: $\bar x_1 = 175.9$, $s_1 = 12$, $n_1 = 60$
New Feed: $\bar x_2 = 180.2$, $s_2 = 19$, $n_2 = 60$
2. What is the alternative hypothesis? Note: $\bar d = 175.9 - 180.2 = -4.3$
a) $\mu_1 \;\;\; \mu_2$
b) $\mu_1 \;\;\; \mu_2$
c) $\mu_1 \;\;\; \mu_2$
d) * $\mu_1 < \mu_2$
e) None of the above.
3. What is $s_{\bar d}$? (2 – 3 if I have evidence that you got it correctly)
a) 0.5
b) 0.7
c) 1.5
d) *2.9
e) 8.4
Explanation: From Table 5 of the syllabus supplement:
Interval for: Difference between Two Means ($\sigma$ unknown, variances assumed equal)
Confidence Interval: $D = \bar d \pm t_{\alpha/2}\, s_{\bar d}$
Hypotheses: $H_0: D = D_0$ *, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$
Test Ratio: $t = \frac{\bar d - D_0}{s_{\bar d}}$
Critical Value: $\bar d_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar d}$
where $s_{\bar d} = \hat s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, $\hat s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ and $DF = n_1 + n_2 - 2$.
* Same as $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$ if $D_0 = 0$.
$\hat s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{59(12)^2 + 59(19)^2}{118} = \frac{144 + 361}{2} = 252.5$
$s_{\bar d} = \sqrt{\hat s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{252.5\left(\frac{2}{60}\right)} = \sqrt{8.417} = 2.9$
4. If we do not reject the Null hypothesis, do we decide that there is a reason to switch to the new
feed? (1)
Solution: If the alternate hypothesis says $\mu_1 < \mu_2$, it says that the new piggy porridge is better than the
old one. If we do not reject the null hypothesis, we can’t say that the new sow slop is better and we
cannot justify a switch.
5. Change your assumptions to assume that both samples have a population standard deviation of 15.
Find a 93% two-sided confidence interval for the difference between the means. (3) Is there a
significant difference between the means? Why? (1)
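Solution: From Table 5 of the syllabus supplement:
Interval for: Difference between Two Means ($\sigma$ known)
Confidence Interval: $D = \bar d \pm z_{\alpha/2}\, \sigma_{\bar d}$
Hypotheses: $H_0: D = D_0$ *, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$
Test Ratio: $z = \frac{\bar d - D_0}{\sigma_{\bar d}}$
Critical Value: $\bar d_{cv} = D_0 \pm z_{\alpha/2}\, \sigma_{\bar d}$
where $\sigma_{\bar d} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ and $\bar d = \bar x_1 - \bar x_2$.
$\sigma_{\bar d} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{15^2}{60} + \frac{15^2}{60}} = \sqrt{\frac{2(225)}{60}} = \sqrt{7.5} = 2.739$. $\alpha = .07$. From the last page:
$\bar d = 175.9 - 180.2 = -4.3$. From the second page of this solution, $z_{.035} = 1.81$.
So $D = \bar d \pm z_{\alpha/2}\, \sigma_{\bar d} = -4.3 \pm 1.81(2.739) = -4.3 \pm 4.96$ or -9.26 to 0.66.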
Since the interval includes zero (or because the value of the error part of the expression exceeds the
absolute value of the difference between the sample means) the difference is not statistically significant
at the 93% confidence level.
[17]
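The two standard errors used for Exhibit 1 (the pooled $s_{\bar d}$ of question 3 and the known-sigma $\sigma_{\bar d}$ of question 5) and the 93% interval can be checked numerically. This is a minimal Python sketch under the exam's stated assumptions; the variable names are illustrative, and `NormalDist().inv_cdf` is used in place of the Normal table, so z is about 1.812 rather than the tabled 1.81.

```python
from math import sqrt
from statistics import NormalDist

n1 = n2 = 60
xbar1, s1 = 175.9, 12   # old feed
xbar2, s2 = 180.2, 19   # new feed
d_bar = xbar1 - xbar2

# Question 3: pooled variance and standard error of the difference.
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
s_dbar = sqrt(sp2 * (1 / n1 + 1 / n2))

# Question 5: both population standard deviations assumed to be 15.
sigma = 15
sigma_dbar = sqrt(sigma**2 / n1 + sigma**2 / n2)

# 93% two-sided interval, so alpha = .07 and alpha/2 = .035.
alpha = 0.07
z = NormalDist().inv_cdf(1 - alpha / 2)
lower = d_bar - z * sigma_dbar
upper = d_bar + z * sigma_dbar

print(round(s_dbar, 2), round(sigma_dbar, 3), round(lower, 2), round(upper, 2))
```

The interval straddles zero, which is the numerical form of the statement that the difference is not significant.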
Questions 6 and 7 refer to Exhibit 2.
Exhibit 2: (Lees) The net income figures for seven regions in which
Smelly-Welly Dirt Devourer is sold are given before and after a
reorganization.
Region  Before          After           Difference
        Reorganization  Reorganization
1       40              62              -22
2       35              49              -14
3       42              39                3
4       30              28                2
5       55              55                0
6       63              66               -3
7       36              40               -4
The researcher decides that the Wilcoxon Signed Rank test for paired
samples is appropriate. The region with a tie is dropped from
consideration leaving 6 pairs for this test only. The test is one sided.
Minitab gives the following sample statistics for the data for use in
Problem 7. (Use $\alpha = .05$)
Description  Variable  n  Mean   SE Mean  StDev
Before       x1        7  43.00  4.46     11.80
After        x2        7  48.43  5.15     13.62
Difference   d         7  -5.43  3.49      9.24
Comment: Note that you were told that the data was paired.
6. In the Wilcoxon Signed Rank Test, the number that you compare to the values in the Wilcoxon
Signed Rank Test Table is (3)
a. 3
b. *3.5
c. 17
d. 17.5
e. 18
f. 18.5
g. None of the above – write in the correct number and show your work.
[20]
Explanation: The original data is repeated with absolute differences $|D|$, ranks of the absolute differences $r$ and ranks corrected for ties $r^*$, each marked with the sign of the difference.
Region  Before  After  Difference  |D|  r   r*    Sign
1       40      62     -22         22   6   6     -
2       35      49     -14         14   5   5     -
3       42      39       3          3   3   2.5   +
4       30      28       2          2   1   1     +
5       55      55       0         (dropped)
6       63      66      -3          3   2   2.5   -
7       36      40      -4          4   4   4     -
The sum of the positive ranks is 3.5 and the sum of the negative ranks is 17.5. We can check
this by recalling that the sum of the first 6 numbers is $\frac{6(7)}{2} = 21 = 3.5 + 17.5$. The smaller of
the two rank sums is 3.5. (If we compare this number with Table 7, we find that for a one-sided
5% test the critical value is 2. Since 3.5 is above it, we cannot reject the null hypothesis
of equal medians.)
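The ranking in the explanation above can be reproduced mechanically. The following Python sketch is only an illustration of the procedure described (drop zero differences, rank the absolute differences, average tied ranks, then sum by sign); the helper name `average_rank` is hypothetical.

```python
before = [40, 35, 42, 30, 55, 63, 36]
after = [62, 49, 39, 28, 55, 66, 40]

# Differences (before - after), with the tied region dropped.
diffs = [b - a for b, a in zip(before, after) if b != a]
abs_sorted = sorted(abs(d) for d in diffs)

def average_rank(value):
    """Average of the 1-based positions that `value` occupies in abs_sorted."""
    positions = [i + 1 for i, v in enumerate(abs_sorted) if v == value]
    return sum(positions) / len(positions)

ranks = [average_rank(abs(d)) for d in diffs]

# Rank sums by sign of the difference.
t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

print(t_plus, t_minus, min(t_plus, t_minus))
```

The smaller of the two sums is the number compared with the Wilcoxon Signed Rank table.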
7. If we change our assumptions to state that the underlying distribution is Normal, we should not be
using the Wilcoxon Signed Rank Test. If we use a test based on the mean we have all of the
following:
a. Confidence Interval: $D = \bar d \pm t_{\alpha/2}\, s_{\bar d}$.
b. Test Ratio: $t = \frac{\bar d - D_0}{s_{\bar d}}$
c. Critical Value: $\bar d_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar d}$.
On the basis of the information above, find $\bar d$, $s_{\bar d}$ and the number of degrees of freedom. If you
do any calculations make sure that I know what they are. (4)
[24]
Solution: From Table 5 of the syllabus supplement:
Since we are dealing with paired data, the relevant part of the table is stated below.
Interval for: Difference between Two Means (paired data)
Confidence Interval: $D = \bar d \pm t_{\alpha/2}\, s_{\bar d}$
Hypotheses: $H_0: D = D_0$ *, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$
Test Ratio: $t = \frac{\bar d - D_0}{s_{\bar d}}$
Critical Value: $\bar d_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar d}$
where $d = x_1 - x_2$, $s_{\bar d} = \frac{s_d}{\sqrt{n}}$ and $df = n - 1$, where $n_1 = n_2 = n$.
Since we are dealing with paired data, the relevant part of the output is stated below.
Description  Variable  n  Mean   SE Mean  StDev
Difference   d         7  -5.43  3.49     9.24
If we just read this we find $\bar d = 43 - 48.43 = -5.43$, $s_{\bar d} = \frac{s_d}{\sqrt{n}} = \frac{9.24}{\sqrt{7}} = 3.49$ and $df = n - 1 = 6$.
Of course, if you are compulsive enough to finish the problem,
$t = \frac{\bar d - D_0}{s_{\bar d}} = \frac{-5.43 - 0}{3.49} = -1.557 > -1.943 = -t_{.05}^{(6)}$
so, apparently, we do not reject the null
hypothesis, which was never stated.
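The paired-data quantities can also be recomputed directly from the seven differences rather than read off the Minitab line. A minimal Python sketch follows; it assumes nothing beyond the Exhibit 2 data, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

before = [40, 35, 42, 30, 55, 63, 36]
after = [62, 49, 39, 28, 55, 66, 40]

# Paired differences d = x1 - x2 (all seven regions are kept here).
d = [b - a for b, a in zip(before, after)]
n = len(d)

d_bar = mean(d)            # mean difference
s_d = stdev(d)             # sample standard deviation of the differences
s_dbar = s_d / sqrt(n)     # standard error of the mean difference
df = n - 1
t = (d_bar - 0) / s_dbar   # test ratio with D0 = 0

print(round(d_bar, 2), round(s_d, 2), round(s_dbar, 2), df, round(t, 3))
```

The unrounded test ratio differs from -1.557 only in the third decimal, because the hand calculation starts from the rounded values -5.43 and 3.49.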
8. A marble machine is recalibrated, and the owner is afraid that it is producing marbles that are too
small. The standard size is 12mm. The following results pop out after 105 diameters are fed to the
computer.
One-Sample T: x1
Test of mu = 12 vs < 12
                                         95% Upper
Variable    N     Mean   StDev  SE Mean      Bound     T      P
x1        105  12.0150  0.0498   0.0049    12.0231  3.09  0.999
Make a diagram showing the p-value. Suppose that you were doing a 2-sided test with the same
numbers; what would the p-value be? (Credit raised to 3)
[27]
Solution: a) The output tells us two things: 1) Because the alternative hypothesis is $\mu < 12$, this is
a left-sided test, and 2) $t = 3.09$. To make the diagram, draw an almost Normal curve with a center
and vertical line at zero and shade the entire area below 3.09. $pvalue = P(t \le 3.09) = .999$.
[Figure: t curve with 104 degrees of freedom and standard deviation 1.00976. The area to the left of 3.09 is 0.9987. Horizontal axis: Data Axis; vertical axis: Density.]
b) If we want a two-sided test of $H_0: \mu = 12$ vs. $H_1: \mu \ne 12$, the p-value is defined as the
probability of getting a result as extreme as or more extreme than our actual results.
$pvalue = 2P(t \ge 3.09) = 2[1 - P(t \le 3.09)] = 2[1 - .999] = 2(.001) = .002$
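As a rough check on both p-values, the test ratio can be rebuilt from the printed mean, standard deviation and sample size. The sketch below is illustrative Python; it substitutes the Normal distribution for the t distribution with 104 degrees of freedom (an approximation, as the standard library has no t distribution), so the left-sided area comes out slightly above the 0.9987 shown on the graph.

```python
from math import sqrt
from statistics import NormalDist

n, xbar, s, mu0 = 105, 12.0150, 0.0498, 12

se = s / sqrt(n)          # standard error of the mean
t = (xbar - mu0) / se     # test ratio

z = NormalDist()
p_left = z.cdf(t)                     # left-sided test: P(t <= t_calc)
p_two_sided = 2 * (1 - z.cdf(abs(t)))  # two-sided test: double the smaller tail

print(round(se, 4), round(t, 2), round(p_left, 3), round(p_two_sided, 3))
```

The point of the question is visible in the numbers: the sample mean is above 12, so a left-sided test gives a p-value near 1, while a two-sided test gives a very small one.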
9. (Lee) The number of people calling in sick during a certain week is below.
M 42 T 33 W 35 R 25 F 45
The null hypothesis is that people are equally likely to call in sick on each day. This is a chi-square test
of (1)
a. Homogeneity
b. Independence
c. *Uniformity
d. Normal distribution
e. Poisson Distribution
Explanation: This is the simplest of chi-squared goodness-of-fit tests. Uniformity is the assumption
that every class is of equal size.
10. (Lee) The number of people calling in sick during a certain week is below.
M 42 T 33 W 35 R 25 F 45
The null hypothesis is that people are equally likely to call in sick on each day. What is the chi-square
value? (3)
a. 11.9023
b. *6.889
c. 3.296
d. 4.198
e. 7.895
Solution: The sum of the numbers above is $n = 180$. Uniformity means that each of the $r = 5$ classes
is $\frac{1}{r} = \frac{1}{5} = .2$ of the data. We thus have the following:
Row   O     E     E-O   (E-O)^2   (E-O)^2/E   O^2/E
1     42    36    -6    36        1.00000     49.0000
2     33    36     3     9        0.25000     30.2500
3     35    36     1     1        0.02778     34.0278
4     25    36    11   121        3.36111     17.3611
5     45    36    -9    81        2.25000     56.2500
Sum  180   180     0              6.88889    186.8889
So $\chi^2 = \sum \frac{(E - O)^2}{E} = 6.8889$ or $\chi^2 = \sum \frac{O^2}{E} - n = 186.8889 - 180 = 6.8889$.
11. (Lee) The number of people calling in sick during a certain week is below.
M 42 T 33 W 35 R 25 F 45
The null hypothesis is that people are equally likely to call in sick on each day. Do we reject the null
hypothesis at a 5% significance level? No answer will be accepted without a reason.
 
Solution: $Df = r - 1 = 5 - 1 = 4$ and from Table 1 we get $\chi^{2(4)}_{.05} = 9.4877$. Since our computed
$\chi^2 = 6.8889$ is smaller than the table value, we do not reject the null hypothesis.
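The goodness-of-fit arithmetic for the sick-call data is short enough to script. This Python sketch computes the statistic both ways shown in the solution to question 10 and compares it with the critical value 9.4877 quoted from Table 1; the critical value is copied from the text, not computed.

```python
observed = {"M": 42, "T": 33, "W": 35, "R": 25, "F": 45}

n = sum(observed.values())
expected = n / len(observed)   # uniformity: every day expects n/5 calls

# Definition: sum of (E - O)^2 / E.
chi2 = sum((expected - o) ** 2 / expected for o in observed.values())

# Shortcut: sum of O^2 / E, minus n.
chi2_shortcut = sum(o**2 / expected for o in observed.values()) - n

df = len(observed) - 1
critical = 9.4877              # chi-squared table value for 4 DF at 5%
reject = chi2 > critical

print(n, expected, round(chi2, 4), round(chi2_shortcut, 4), df, reject)
```

Both formulas give the same statistic, which is why the worksheet notes that only one of them is needed.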
12. (Lee) The number of people calling in sick during a certain week is below.
M 42 T 33 W 35 R 25 F 45
The null hypothesis is that people are equally likely to call in sick on each day. Can we do this by
another method than Chi Squared? Do it! (4)
[36]
Solution: This is a Kolmogorov-Smirnov test. $H_0$: Uniform, $\alpha = .05$
Day    E     Fe     O    cum O   Fo       D
M      .2    .20    42    42     .2333    .0333
T      .2    .40    33    75     .4167    .0167
W      .2    .60    35   110     .6111    .0111
R      .2    .80    25   135     .7500    .0500
F      .2   1.00    45   180    1.0000    .0000
Sum   1.0          180
$F_e$ comes from adding .2s. $F_o$ comes from dividing the cumulative O by $n = 180$. $\max D$, the
maximum difference, is .0500. The Kolmogorov-Smirnov table of critical values gives us, for $n > 35$:
$\alpha$   .20                      .10                      .05                      .01
CV         $\frac{1.07}{\sqrt{n}}$  $\frac{1.22}{\sqrt{n}}$  $\frac{1.36}{\sqrt{n}}$  $\frac{1.63}{\sqrt{n}}$
If we substitute $n = 180$, we get the table below.
$\alpha$   .20     .10     .05     .01
CV         .0797   .0909   .1014   .1214
Since $\max D$ is smaller than any of the critical values, we conclude that $p\text{-value} > .20$. For $\alpha = .05$, we
cannot reject $H_0$.
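The Kolmogorov-Smirnov worksheet can be reproduced in a few lines. The sketch below is illustrative Python; the large-sample critical value $1.36/\sqrt{n}$ for $\alpha = .05$ is taken from the table quoted in the solution, not derived.

```python
from itertools import accumulate
from math import sqrt

observed = [42, 33, 35, 25, 45]   # M, T, W, R, F
n = sum(observed)
k = len(observed)

# Expected cumulative distribution under uniformity: .2, .4, ..., 1.0
f_expected = [(i + 1) / k for i in range(k)]

# Observed cumulative distribution: cumulative counts divided by n.
f_observed = [c / n for c in accumulate(observed)]

d_values = [abs(fo - fe) for fo, fe in zip(f_observed, f_expected)]
max_d = max(d_values)

critical_05 = 1.36 / sqrt(n)      # table value for n > 35, alpha = .05
reject = max_d > critical_05

print([round(d, 4) for d in d_values], round(max_d, 4), round(critical_05, 4), reject)
```

The largest gap occurs on Thursday, and it is well below the critical value.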
Location - Normal distribution. Compare means. Paired samples: Method D4. Independent samples: Methods D1-D3.
Location - Distribution not Normal. Compare medians. Paired samples: Method D5b. Independent samples: Method D5a.
Proportions: Method D6.
Variability - Normal distribution. Compare variances: Method D7.
ECO252 QBA2
SECOND EXAM
April 4, 2005
TAKE HOME SECTION
Name: _________________________
Student Number: _________________________
III. Neatness Counts! Show your work! Always state your hypotheses and conclusions clearly.
(19+ points)
1) An industrial plant is trying to figure out whether gas or electric fuel is cheaper per delivered quadrillion btus. Random
samples of 11 electricity-using plants and 15 gas-using plants are taken. The results appear below. The columns labeled x1 and
x2 are the data and the rx1 and rx2 columns are the ranks of the numbers within their own column, which you may find useful.
Row  x1     x2     rx1  rx2
1    45.14   9.55  10    2
2    10.11  38.76   2   15
3    29.38  16.65   7    9
4    19.65  19.00   5   12
5    16.25  17.00   4   10
6    29.46  29.01   8   13
7     8.13  12.34   1    6
8    45.63  11.18  11    4
9    24.49  12.15   6    5
10   12.71  14.40   3    7
11   37.04   8.00   9    1
12          16.19        8
13          33.46       14
14          18.37       11
15           9.86        3
Minitab gives the following information, which also may help you.
Descriptive Statistics: x1, x2
Variable   N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
x1        11   0  25.27     4.02  13.33     8.13  12.71   24.49  37.04    45.63
x2        15   0  17.73     2.35   9.11     8.00  11.18   16.19  19.00     38.7
Before you start, personalize the data as follows: If the second to last number of your student number is 0-4 add it to the second
to last digit of the numbers in x1. If the second to last number of your student number is 5-9, divide it by 100 and subtract it
from x1. (If the number is 2, the first two numbers become 45.14 + .20 = 45.34 and 10.21.) (If the number is 6, the first two
numbers become 45.14 - .06 = 45.08 and 10.05.) Use $\alpha = .10$.
a) Compute a (mean and) standard deviation for the electric plants, show your work. Excessive rounding will be penalized
throughout this exam. (1)
b) Test to see if x1 is Normally distributed. (3)
c) Test to see if the standard deviations of the two samples are equal. (1)
d) Test to see if the means of the two samples differ significantly on the assumption that your answers to b) and c) showed equal
variances and Normal distributions. Use a test ratio, critical value or a confidence interval (4) or all three (6). Your answers to
all three should be almost identical. [9]
e) Assume that the tests showed unequal variances but Normal distributions, repeat the test (4 extra credit)
f) Assume that the tests showed that the distributions were not Normal, repeat the test. (4) [13]
Solution: To save time, I will use the original numbers. It won’t change your results much.
a) Compute a (mean and) standard deviation for the electric plants, show your work. Excessive
rounding will be penalized throughout this exam. (1) From the computations on the next
page we have $n_1 = 11$, $\sum x_1 = 277.99$ and $\sum x_1^2 = 8802.55$.
So $\bar x_1 = \frac{\sum x_1}{n_1} = \frac{277.99}{11} = 25.2718$, $s_1^2 = \frac{\sum x_1^2 - n_1 \bar x_1^2}{n_1 - 1} = \frac{8802.55 - 11(25.2718)^2}{10} = \frac{1777.24}{10} = 177.724$ and $s_1 = \sqrt{177.724} = 13.3313$.
b) Test to see if x1 is Normally distributed. (4)
Assume $\alpha = .10$. $H_0$: Normal. The only practical method is the Lilliefors method.
The numbers must be in order before we begin computing cumulative probabilities! From above,
remember that $\bar x = 25.2718$ and $s = 13.3313$. We compute $z = \frac{x - \bar x}{s}$. (This is really a $t$.) $F_e$ is the
cumulative distribution, gotten from the Normal table by adding or subtracting 0.5. $F_o$ comes from
the fact that there are 11 numbers, so that each number is one eleventh of the distribution. $D = |F_o - F_e|$.
For $\alpha = .10$ and $n = 11$ the critical value from the Lilliefors table is 0.230. Since the largest deviation
here is .1179, we do not reject $H_0$.
Row   x        x^2       z      O   Ocum  Fo       Fe        D
1      8.13     66.10   -1.29   1    1    0.09091  0.099251  0.008342
2     10.11    102.21   -1.14   1    2    0.18182  0.127705  0.054114
3     12.71    161.54   -0.94   1    3    0.27273  0.173025  0.099702
4     16.25    264.06   -0.68   1    4    0.36364  0.249286  0.114351
5     19.65    386.12   -0.42   1    5    0.45455  0.336622  0.117924
6     24.49    599.76   -0.06   1    6    0.54545  0.476617  0.068837
7     29.38    863.18    0.31   1    7    0.63636  0.621020  0.015344
8     29.46    867.89    0.31   1    8    0.72727  0.623301  0.103972
9     37.04   1371.96    0.88   1    9    0.81818  0.811314  0.006868
10    45.14   2037.62    1.49   1   10    0.90909  0.931932  0.022842
11    45.63   2082.10    1.53   1   11    1.00000  0.936631  0.063369
Sum  277.99   8802.55          11
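The Lilliefors worksheet above can be regenerated from the raw x1 data. The Python sketch below follows the worksheet's simplification of comparing each fitted Normal probability only with $F_o = i/n$; a full Lilliefors statistic would also compare with $(i-1)/n$. The critical value 0.230 is copied from the text as an assumption, and the variable names are illustrative.

```python
from statistics import NormalDist, mean, stdev

x1 = [45.14, 10.11, 29.38, 19.65, 16.25, 29.46,
      8.13, 45.63, 24.49, 12.71, 37.04]

n = len(x1)
xbar = mean(x1)
s = stdev(x1)                     # sample standard deviation (n - 1 divisor)

# Normal curve fitted with the sample mean and standard deviation.
fitted = NormalDist(mu=xbar, sigma=s)

# Sort first, then compare Fo = i/n with Fe = fitted cumulative probability.
deviations = [abs((i + 1) / n - fitted.cdf(x))
              for i, x in enumerate(sorted(x1))]
max_d = max(deviations)

critical = 0.230                  # Lilliefors table value quoted in the text
reject = max_d > critical

print(round(xbar, 4), round(s, 4), round(max_d, 4), reject)
```

The largest deviation falls at the fifth ordered value, matching the worksheet.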
c) Test to see if the standard deviations of the two samples are equal. (2) The easiest part of the take-home.
Interval for: Ratio of Variances
Confidence Interval: $\frac{\sigma_2^2}{\sigma_1^2} = \frac{s_2^2}{s_1^2} F^{(DF_1, DF_2)}_{.5 \mp (.5 - \alpha/2)}$, where the subscript is $.5 - (.5 - \frac{\alpha}{2}) = \frac{\alpha}{2}$ or $1 - \frac{\alpha}{2}$
Hypotheses: $H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 \ne \sigma_2^2$
Test Ratio: $F^{(DF_1, DF_2)} = \frac{s_1^2}{s_2^2}$ and $F^{(DF_2, DF_1)} = \frac{s_2^2}{s_1^2}$, with $DF_1 = n_1 - 1$ and $DF_2 = n_2 - 1$
Critical Value: $F^{(DF_1, DF_2)}_{1 - \alpha/2}$ and $F^{(DF_1, DF_2)}_{\alpha/2}$
From our computations of the variance $s_1^2 = 177.724$. From the computer printout
$s_2^2 = 9.11^2 = 82.992$, $n_1 = 11$ and $n_2 = 15$.
$F^{(10,14)} = \frac{s_1^2}{s_2^2} = \frac{177.724}{82.992} = 2.141$. For a two-sided test compare this to $F^{(n_1 - 1, n_2 - 1)}_{\alpha/2} = F^{(10,14)}_{.05} = 2.60$. We
should also compare $F^{(14,10)} = \frac{s_2^2}{s_1^2} = \frac{1}{2.141}$ against $F^{(14,10)}_{.05} = 2.86$, but the test ratio is below 1 and
cannot possibly be above the critical value. Since both ratios are below their critical values, we cannot
reject the null hypothesis. We can also do this by a confidence interval. In “Confidence Limits and
Hypothesis testing for Variances,” in the syllabus supplement, the formula given is
$\frac{s_1^2}{s_2^2} \frac{1}{F^{(n_1 - 1, n_2 - 1)}_{\alpha/2}} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{s_1^2}{s_2^2} F^{(n_2 - 1, n_1 - 1)}_{\alpha/2}$, which becomes $\frac{s_1^2}{s_2^2} \frac{1}{F^{(10,14)}_{.05}} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{s_1^2}{s_2^2} F^{(14,10)}_{.05}$ or
$2.141\left(\frac{1}{2.60}\right) \le \frac{\sigma_1^2}{\sigma_2^2} \le 2.141(2.86)$ and, finally, $0.823 \le \frac{\sigma_1^2}{\sigma_2^2} \le 6.123$. This interval includes one, so we
cannot reject the null hypothesis.
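The variance-ratio test and its confidence interval reduce to a handful of multiplications. The following Python sketch is an illustration only; the two F-table values (2.60 for 10 and 14 degrees of freedom, 2.86 for 14 and 10, both at the 5% upper tail) are copied from the text as assumptions rather than computed, and the variable names are hypothetical.

```python
s1_sq = 177.724        # electric plants, n1 = 11, so DF1 = 10
s2_sq = 9.11 ** 2      # gas plants, n2 = 15, so DF2 = 14

# Both test ratios for a two-sided test.
f_10_14 = s1_sq / s2_sq
f_14_10 = s2_sq / s1_sq

# Upper 5% points (alpha/2 = .05 because alpha = .10), from the F table.
crit_10_14 = 2.60
crit_14_10 = 2.86

reject = f_10_14 > crit_10_14 or f_14_10 > crit_14_10

# Confidence interval for sigma1^2 / sigma2^2.
lower = f_10_14 / crit_10_14
upper = f_10_14 * crit_14_10

print(round(f_10_14, 3), round(f_14_10, 3), reject, round(lower, 3), round(upper, 3))
```

An interval for the variance ratio that contains 1 leads to the same decision as the two F comparisons.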
d) Test to see if the means of the two samples differ significantly on the assumption that your answers
to b) and c) showed equal variances and Normal distributions. Use a test ratio, critical value or a
confidence interval (4) or all three (6). Your answers to all three should be almost identical. [9]
From our computations of the variance $n_1 = 11$, $\bar x_1 = 25.2718$, and $s_1^2 = 177.724$.
From the computer printout $n_2 = 15$, $\bar x_2 = 17.73$, and $s_2^2 = 9.11^2 = 82.992$.
$H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$, $\bar d = \bar x_1 - \bar x_2 = 25.27 - 17.73 = 7.54$ and $\alpha = .10$. If we assume that
the variances are equal, $\hat s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{10(177.724) + 14(82.992)}{24} = 122.464$, so that
$s_{\bar d}^2 = \hat s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right) = 122.464\left(\frac{1}{11} + \frac{1}{15}\right) = 122.464\left(\frac{15 + 11}{165}\right) = 122.464(0.157576) = 19.297$ and
$s_{\bar d} = \sqrt{19.297} = 4.3928$. $df = n_1 + n_2 - 2 = 11 + 15 - 2 = 24$ and $t_{.05}^{(24)} = 1.711$.
Confidence Interval: $D = \bar d \pm t_{\alpha/2}\, s_{\bar d} = 7.54 \pm t_{.05}^{(24)} s_{\bar d} = 7.54 \pm 1.711(4.3928) = 7.54 \pm 7.52$. Since
7.52 is smaller than 7.54, the interval does not include zero, so reject the null hypothesis.
Test Ratio: $t = \frac{\bar d - D_0}{s_{\bar d}} = \frac{7.54}{4.3928} = 1.716$. Make a diagram: Show an almost
Normal curve with a center at zero and critical values at $t_{.05}^{(24)} = 1.711$ and $-t_{.05}^{(24)} = -1.711$. Since
the computed value of $t$ is not between these, reject the null hypothesis.
Critical Value: $\bar d_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar d} = \pm 7.52$. Make a diagram: Show an almost Normal curve with
a center at zero and critical values at 7.52 and -7.52. Since the computed value of
$\bar d = \bar x_1 - \bar x_2 = 7.54$ is not between these critical values, reject the null hypothesis.
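Because the decision in part d) is very close (the test ratio barely exceeds the critical value), it is worth checking the arithmetic by machine. The sketch below is illustrative Python using the summary numbers from the text; the critical value 1.711 for 24 degrees of freedom is copied from the t table as an assumption, and the variable names are hypothetical.

```python
from math import sqrt

n1, xbar1, s1_sq = 11, 25.27, 177.724
n2, xbar2, s2_sq = 15, 17.73, 9.11 ** 2

d_bar = xbar1 - xbar2

# Pooled variance and standard error of the difference in means.
sp2 = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
s_dbar = sqrt(sp2 * (1 / n1 + 1 / n2))

df = n1 + n2 - 2
t = d_bar / s_dbar             # test ratio with D0 = 0

t_crit = 1.711                 # t table, 24 DF, upper 5% point
margin = t_crit * s_dbar       # half-width of the 90% interval

reject_by_ratio = abs(t) > t_crit
reject_by_interval = abs(d_bar) > margin   # interval excludes zero

print(round(sp2, 3), round(s_dbar, 4), df, round(t, 3), round(margin, 2))
print(reject_by_ratio, reject_by_interval)
```

All three methods rest on the same comparison, so they have to agree once the same standard error is used in each.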
e) Assume that the tests showed unequal variances but Normal distributions, repeat the test (4 extra
credit)
From our computations of the variance $n_1 = 11$, $\bar x_1 = 25.2718$, and $s_1^2 = 177.724$.
From the computer printout $n_2 = 15$, $\bar x_2 = 17.73$, and $s_2^2 = 9.11^2 = 82.992$.
$H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$, $\bar d = \bar x_1 - \bar x_2 = 25.27 - 17.73 = 7.54$, $\alpha = .10$.
If we do not assume equal variances, use the following worksheet:
$s_{\bar x_1}^2 = \frac{s_1^2}{n_1} = \frac{177.724}{11} = 16.1567$
$s_{\bar x_2}^2 = \frac{s_2^2}{n_2} = \frac{82.992}{15} = 5.5328$
$s_{\bar d}^2 = \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} = 21.6895$
$s_{\bar d} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{21.6895} = 4.6571$
$\frac{\left(s_1^2 / n_1\right)^2}{n_1 - 1} = \frac{(16.1567)^2}{10} = 26.1039$
$\frac{\left(s_2^2 / n_2\right)^2}{n_2 - 1} = \frac{(5.5328)^2}{14} = 2.1866$
$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2 / n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2 / n_2\right)^2}{n_2 - 1}} = \frac{(21.6895)^2}{26.1039 + 2.1866} = \frac{470.4344}{28.2905} = 16.6287$
Round this down and use 16 degrees of freedom. $t = \frac{\bar d - D_0}{s_{\bar d}} = \frac{7.54}{4.6571} = 1.619$. Make a diagram: Show an
almost Normal curve with a center at zero and critical values at $t_{.05}^{(16)} = 1.746$ and $-t_{.05}^{(16)} = -1.746$
(for a two-sided test with $\alpha = .10$, each tail has probability .05). Since
the computed value of $t$ is between these, do not reject the null hypothesis.
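The unequal-variance worksheet is mostly bookkeeping, so a short script is a convenient check. This Python sketch follows the worksheet step by step; the names `v1` and `v2` are illustrative, and the critical value 1.746 (t table, 16 degrees of freedom, upper 5% point, matching a two-sided test at $\alpha = .10$) is an assumption taken from a standard t table.

```python
from math import floor, sqrt

n1, xbar1, s1_sq = 11, 25.27, 177.724
n2, xbar2, s2_sq = 15, 17.73, 9.11 ** 2

d_bar = xbar1 - xbar2

# Variance of each sample mean.
v1 = s1_sq / n1
v2 = s2_sq / n2

# Standard error of the difference without pooling.
s_dbar = sqrt(v1 + v2)

# Satterthwaite approximation for the degrees of freedom.
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
df_used = floor(df)            # round down, as the worksheet does

t = d_bar / s_dbar
t_crit = 1.746                 # t table, 16 DF, upper 5% point
reject = abs(t) > t_crit

print(round(v1, 4), round(v2, 4), round(s_dbar, 4), round(df, 4), df_used)
print(round(t, 3), reject)
```

Giving up the equal-variance assumption costs degrees of freedom and widens the standard error, which is why the borderline rejection in part d) disappears here.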
f) Assume that the tests showed that the distributions were not Normal, repeat the test. (4) [13]
Solution: If the parent distribution is not Normal, we can use a Wilcoxon-Mann-Whitney test of the
equality of two medians.
If we use a Wilcoxon-Mann-Whitney rank test, we get the following. The ranks $r_1$ and $r_2$ are ranks within the combined sample of 26 numbers.
Row  x1     r1   x2     r2
1    45.14  25    9.55   3
2    10.11   5   38.76  24
3    29.38  20   16.65  13
4    19.65  17   19.00  16
5    16.25  12   17.00  14
6    29.46  21   29.01  19
7     8.13   2   12.34   8
8    45.63  26   11.18   6
9    24.49  18   12.15   7
10   12.71   9   14.40  10
11   37.04  23    8.00   1
12               16.19  11
13               33.46  22
14               18.37  15
15                9.86   4
Sum        178          173
$n_1 = 11$ and $n_2 = 15$. Note that $n_1 + n_2 = 11 + 15 = 26$ and the sum of the first 26 numbers is
$\frac{26(27)}{2} = 351$, so that, as a check on my ranking, 178 + 173 = 351.
According to the outline, for values of $n_1$ and $n_2$ that are too large for the tables, $W$, the rank sum
of the smaller sample, has the Normal distribution with mean $\mu_W = \frac{1}{2} n_1 (n_1 + n_2 + 1)$ and variance
$\sigma_W^2 = \frac{1}{6} n_2 \mu_W$. If the significance level is 10% and the test is two-sided, we reject our null
hypothesis if $z = \frac{W - \mu_W}{\sigma_W}$ does not lie between $\pm z_{.05} = \pm 1.645$.
We have $n_1 = 11$ and $n_2 = 15$. The smaller sample is x1, so $W = 178$.
$\mu_W = \frac{1}{2} n_1 (n_1 + n_2 + 1) = \frac{1}{2}(11)(11 + 15 + 1) = 5.5(27) = 148.5$ and $\sigma_W^2 = \frac{1}{6} n_2 \mu_W = \frac{1}{6}(15)(148.5) = 371.25$. Thus $z = \frac{W - \mu_W}{\sigma_W} = \frac{178 - 148.5}{\sqrt{371.25}} = 1.531$.
(The other rank sum, 173, must be compared with its own mean, $\frac{1}{2}(15)(27) = 202.5$, which gives $z = -1.531$.)
Since this is between the critical values, do not reject the null hypothesis of equal medians.
We could also say $pvalue = 2P(z \ge 1.53) = 2(.5 - .4370) = 2(.0630) = .1260 > .10$.
Since the p-value is above the significance level, do not reject the null hypothesis of equal
medians.
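The joint ranking and the large-sample Normal approximation for the rank-sum test can be checked with a short script. The Python sketch below is an illustration, not the author's procedure verbatim: it standardizes the rank sum of the smaller sample (x1, with $n_1 = 11$) against $\mu_W = \frac{1}{2} n_1 (n_1 + n_2 + 1)$, which is the mean that belongs to that sample's rank sum. There are no tied values in the data, so simple positional ranks suffice.

```python
from math import sqrt
from statistics import NormalDist

x1 = [45.14, 10.11, 29.38, 19.65, 16.25, 29.46,
      8.13, 45.63, 24.49, 12.71, 37.04]
x2 = [9.55, 38.76, 16.65, 19.00, 17.00, 29.01, 12.34, 11.18,
      12.15, 14.40, 8.00, 16.19, 33.46, 18.37, 9.86]

# Rank all 26 values together (no ties, so each value has one rank).
pooled = sorted(x1 + x2)
rank = {value: i + 1 for i, value in enumerate(pooled)}

w1 = sum(rank[v] for v in x1)   # rank sum of the smaller sample
w2 = sum(rank[v] for v in x2)

n1, n2 = len(x1), len(x2)
mu_w = n1 * (n1 + n2 + 1) / 2   # mean of w1 under the null hypothesis
var_w = n2 * mu_w / 6           # variance of w1 under the null hypothesis

z = (w1 - mu_w) / sqrt(var_w)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(w1, w2, mu_w, var_w, round(z, 3), round(p_two_sided, 3))
```

With a two-sided 10% test the critical values are plus and minus 1.645, so a z of this size does not lead to rejection.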
2) A national survey categorized 600 responses about federal government regulation according to the income of the respondents.
                Too little   Just enough   Too much
                regulation   regulation    regulation
Low income      125          48            27
Medium income   103          58            39
High income      72          69            59
Personalize the data as follows: Subtract the second digit of your student number from the upper left-hand number and add it to
the lower right-hand number.
Do the following ($\alpha = .05$):
a) Is there a relation between incomes and views of government regulation? (4)
b) Use a method for comparing two proportions to compare the proportion of low income people and medium income people
who feel that there is too little regulation. (3)
c) Use the Marascuilo procedure to compare the proportions of the three income groups that say there is just enough
regulation. (Note that you are now dividing responses into ‘just enough’ and ‘not just enough’ and that this cuts down your
degrees of freedom.) (4)
Solution:
a) Is there a relation between incomes and views of government regulation? (4)
This is a chi-squared test of homogeneity or independence. First we must complete the table by adding
sums and compute the proportion of the n  600 people in each row. The O (observed) table is the
original data set off below by double lines.
O                Too little    Just enough    Too much      Total    p_r
                 regulation    regulation     regulation
Low income          125            48            27          200     1/3
Medium income       103            58            39          200     1/3
High income          72            69            59          200     1/3
Total               300           175           125          600     1.00
$n = 600$. The proportions in rows, $p_r$, are used with column totals to get the items in the E (expected)
table. For example, the second number in each row is $\frac{1}{3}(175) = 58.33$. Note that row and column sums
in E are the same as in O except for a possible small rounding error.
E (TL = too little, JE = just enough, TM = too much; L, M, H = low, medium, high income)
E          TL        JE        TM      Total    p_r
L        100.00     58.33     41.67     200     1/3
M        100.00     58.33     41.67     200     1/3
H        100.00     58.33     41.67     200     1/3
Total    300.00    175.00    125.00     600     1.00
(Note that $\chi^2$ is computed two different ways here - only one way is needed.)
$H_0$: Opinion homogeneous by income class
Row       O        E       E - O    (E - O)^2    (E - O)^2/E     O^2/E
1        125     100.00    -25.00     625.000       6.25000     156.250
2        103     100.00     -3.00       9.000       0.09000     106.090
3         72     100.00     28.00     784.000       7.84000      51.840
4         48      58.33     10.33     106.709       1.82940      39.499
5         58      58.33      0.33       0.109       0.00187      57.672
6         69      58.33    -10.67     113.849       1.95181      81.622
7         27      41.67     14.67     215.209       5.16460      17.495
8         39      41.67      2.67       7.129       0.17108      36.501
9         59      41.69    -17.31     299.636       7.18724      83.497
Total    600     600.02      0.02                  30.48600     630.466
$DF = (r - 1)(c - 1) = 2(2) = 4$ and $\chi^{2(4)}_{.05} = 9.4877$.
$\chi^2 = \sum \frac{(O - E)^2}{E} = \sum \frac{O^2}{E} - n = 630.466 - 600 = 30.466$ or $30.486$.
Since this is more than 9.4877, reject H 0 .
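The expected counts and both versions of the chi-squared sum can be reproduced in a few lines. The following is a sketch in Python using only the standard library; the names are hypothetical, and the p-value line relies on the closed form for the upper tail of a chi-squared variable with exactly 4 degrees of freedom, exp(-x/2)(1 + x/2), so it is not a general routine. Working with unrounded expected counts, the statistic comes out near 30.51; the small gap from the hand table seems to come from rounding the expected counts (for instance the 41.69 entered in row 9).

```python
from math import exp

observed = [
    [125, 48, 27],   # low income:    too little, just enough, too much
    [103, 58, 39],   # medium income
    [72, 69, 59],    # high income
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# E = (row total / n) * column total, i.e. row proportion times column total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

pairs = [(o, e)
         for o_row, e_row in zip(observed, expected)
         for o, e in zip(o_row, e_row)]

chi_sq = sum((o - e) ** 2 / e for o, e in pairs)   # sum of (O - E)^2 / E
shortcut = sum(o * o / e for o, e in pairs) - n    # sum of O^2 / E, minus n

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical = 9.4877                                  # table value for alpha = .05, 4 df

# Upper-tail probability; this closed form is valid only for df = 4.
p_value = exp(-chi_sq / 2) * (1 + chi_sq / 2)

print(round(chi_sq, 3), round(shortcut, 3), df, chi_sq > critical, p_value)
```
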
Make a diagram! Try the one below, which has the rejection region in blue. Show that 30.5 falls in the
rejection region.
[Figure: chi-square curve with 4 degrees of freedom (standard deviation 2.82843), density against the data axis. The area to the right of 9.4877 is 0.0500.]
Even better, how about a p-value? The diagram below tells us the p-value $\approx 0 < .05$, so we reject $H_0$.
[Figure: chi-square curve with 4 degrees of freedom (standard deviation 2.82843), density against the data axis. The area to the right of 30.466 is 0.0000.]
b) Use a method for comparing two proportions to compare the proportion of low income people and
medium income people who feel that there is too little regulation. (3)
                 Too little    Just enough    Too much      Total
                 regulation    regulation     regulation
Low income          125            48            27          200
Medium income       103            58            39          200
High income          72            69            59          200
Total               300           175           125          600
125 out of 200 low income people say that there is too little regulation.
103 out of 200 medium income people say that there is too little regulation.
The observed proportions are $\bar{p}_1 = \frac{125}{200} = .625$ and $\bar{p}_2 = \frac{103}{200} = .515$. $n_1 = 200$, $n_2 = 200$,
$\bar{q}_1 = 1 - \bar{p}_1 = 1 - .625 = .375$, and $\bar{q}_2 = 1 - \bar{p}_2 = 1 - .515 = .485$.
Let $\Delta p = p_1 - p_2$. So $\Delta\bar{p} = \bar{p}_1 - \bar{p}_2 = .625 - .515 = .110$ and our hypotheses become $H_0: p_1 - p_2 = 0$ and
$H_1: p_1 - p_2 \ne 0$, or $H_0: \Delta p = 0$ and $H_1: \Delta p \ne 0$. $\Delta p_0 = 0$ is the value of $\Delta p = p_1 - p_2$ from $H_0$.
$s_{\Delta p} = \sqrt{\frac{\bar{p}_1 \bar{q}_1}{n_1} + \frac{\bar{p}_2 \bar{q}_2}{n_2}} = \sqrt{\frac{.625(.375)}{200} + \frac{.515(.485)}{200}} = \sqrt{.001172 + .001249} = \sqrt{.002421} = .04920$
$\bar{p}_0 = \frac{n_1 \bar{p}_1 + n_2 \bar{p}_2}{n_1 + n_2} = \frac{200(.625) + 200(.515)}{200 + 200} = \frac{125 + 103}{200 + 200} = \frac{228}{400} = .570$, $\bar{q}_0 = 1 - \bar{p}_0 = 1 - .570 = .430$.
$\alpha = .05$, $z_{\alpha/2} = z_{.025} = 1.960$. Note that $q = 1 - p$ and that $q$ and $p$ are between zero and one.
$\sigma_{\Delta p} = \sqrt{\bar{p}_0 \bar{q}_0 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{.570(.430)\left(\frac{1}{200} + \frac{1}{200}\right)} = \sqrt{.2451(.0100)} = \sqrt{.002451} = .04951$
Use one of the following:
Confidence interval: Since the alternate hypothesis is $H_1: \Delta p \ne 0$, the confidence interval will be
$\Delta p = \Delta\bar{p} \pm z_{\alpha/2} s_{\Delta p} = .110 \pm 1.960(.04920)$ or $\Delta p = .110 \pm 0.0964$. Since the error part of the confidence
interval is smaller in absolute value than $\Delta\bar{p} = \bar{p}_1 - \bar{p}_2 = .110$, the interval does not include zero. This
contradicts $H_0: \Delta p = 0$, so reject $H_0$. Make a diagram of a Normal curve with $\Delta\bar{p} = .110$ in the
middle. The area described by the confidence interval is between $.110 - 0.096 = .014$ and
$.110 + 0.096 = .206$. Show zero on the graph. Since zero does not fall in the confidence interval, reject
$H_0$.
Test ratio: $z = \frac{\Delta\bar{p} - \Delta p_0}{\sigma_{\Delta p}} = \frac{.110 - 0}{.04951} = 2.222$. Make a diagram of a Normal curve with zero in the
middle. The ‘reject’ zone is the area below $-z_{\alpha/2} = -z_{.025} = -1.960$ or above $z_{\alpha/2} = z_{.025} = 1.960$. Since the
test ratio is in the upper part of the ‘reject’ zone, reject $H_0$.
Critical value: Because the alternate hypothesis is $H_1: \Delta p \ne 0$, we need two critical values. Use
$\Delta\bar{p}_{cv} = \Delta p_0 \pm z_{\alpha/2} \sigma_{\Delta p} = 0 \pm 1.960(.04951) = \pm .097$, or -0.097 to 0.097. Make a diagram of a Normal curve
with zero in the middle. The ‘reject’ zones are the area below -.097 and the area above .097. Since $\Delta\bar{p} = .110$
is in the upper ‘reject’ zone, reject $H_0$.
The p-value for this problem is $2P(\Delta\bar{p} \ge .110) = 2P(z \ge 2.22) = 2(.5 - .4868) = .0264$. Since this is below
$\alpha = .05$, reject $H_0$.
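The three equivalent approaches in part b) (confidence interval, test ratio, p-value) all start from the same two counts, so they are easy to recompute together. Below is a hedged sketch in Python using only the standard library; the variable names are mine, and `NormalDist().inv_cdf` stands in for the table lookup of z at .025, so results can differ from table-based figures in the last digit.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 125, 200   # low income, "too little regulation"
x2, n2 = 103, 200   # medium income, "too little regulation"
alpha = 0.05

p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2                       # observed difference in proportions

# Confidence interval: unpooled standard error.
s_diff = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
ci = (diff - z_crit * s_diff, diff + z_crit * s_diff)

# Test ratio: pooled proportion under H0 (no difference).
p0 = (x1 + x2) / (n1 + n2)
sigma_diff = sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
z = (diff - 0) / sigma_diff
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject = abs(z) > z_crit
print(round(s_diff, 5), round(sigma_diff, 5))
print([round(v, 3) for v in ci], round(z, 3), round(p_value, 4), reject)
```

The interval uses the unpooled standard error because it makes no assumption that the two proportions are equal, while the test ratio pools the samples because the null hypothesis says they are.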
c) Use the Marascuilo procedure to compare the proportions of the three income groups that say there
is just enough regulation. (Note that you are now dividing responses into ‘just enough’ and ‘not just
enough’ and that this cuts down your degrees of freedom.) (4)
The Marascuilo procedure says that, for 2 by c tests, if (i) equality is rejected and
(ii) $|\bar{p}_a - \bar{p}_b| \ge \sqrt{\chi^2}\, s_{\Delta p}$, where a and b represent 2 groups, the chi-squared has $c - 1$ degrees of
freedom and the standard deviation is $s_{\Delta p} = \sqrt{\frac{\bar{p}_a \bar{q}_a}{n_a} + \frac{\bar{p}_b \bar{q}_b}{n_b}}$, you can say that you have a significant
difference between $p_a$ and $p_b$. This is equivalent to using a confidence interval of
$p_a - p_b = (\bar{p}_a - \bar{p}_b) \pm \sqrt{\chi^{2(c-1)}} \sqrt{\frac{\bar{p}_a \bar{q}_a}{n_a} + \frac{\bar{p}_b \bar{q}_b}{n_b}}$
                 Too little    Just enough    Too much
                 regulation    regulation     regulation
Low income          125            48            27
Medium income       103            58            39
High income          72            69            59
Actually, we should make this into a 2 by 3 table before we start and repeat the chi-squared test.
O                    Low       Medium      High      Total     p_r
                    Income     Income     Income
Just enough            48         58         69        175    .2917
Not just enough       152        142        131        425    .7083
Total                 200        200        200        600   1.0000
Proportion           .240       .290       .345
We create E using the row proportions.
E                    Low       Medium      High      Total     p_r
                    Income     Income     Income
Just enough          58.34      58.34      58.34       175    .2917
Not just enough     141.66     141.66     141.66       425    .7083
Total               200.00     200.00     200.00       600   1.0000
$H_0$: Opinion homogeneous by income class
Row       O        E       E - O    (E - O)^2    (E - O)^2/E     O^2/E
1         48      58.34     10.34     106.916       1.83263      39.493
2        152     141.66    -10.34     106.916       0.75473     163.095
3         58      58.34      0.34       0.116       0.00198      57.662
4        142     141.66     -0.34       0.116       0.00082     142.341
5         69      58.34    -10.66     113.636       1.94782      81.608
6        131     141.66     10.66     113.636       0.80217     121.142
Total    600     600.00      0                      5.34015     605.340
$DF = (r - 1)(c - 1) = 1(2) = 2$ and $\chi^{2(2)}_{.05} = 5.9915$.
$\chi^2 = \sum \frac{(O - E)^2}{E} = \sum \frac{O^2}{E} - n = 605.340 - 600 = 5.340$ or $5.3402$. Since this is less than 5.9915, do not
reject $H_0$. Hey! Much to my total surprise, we really
shouldn’t go on since the hypothesis of homogeneity is not rejected! However, I assume that most of you did and
some of you would have gotten a rejection if you did this, so here we go!
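Collapsing the table to ‘just enough’ versus ‘not just enough’ and rerunning the test takes only a few lines. This is a sketch in Python (standard library only) with names of my own choosing; the p-value line uses the fact that, for exactly 2 degrees of freedom, the upper tail of the chi-squared distribution is exp(-x/2), so it does not generalize to other tables.

```python
from math import exp

just_enough = [48, 58, 69]       # low, medium, high income
group_sizes = [200, 200, 200]

# Everything that is not "just enough" goes into the second row.
not_just_enough = [n - x for x, n in zip(just_enough, group_sizes)]

total = sum(group_sizes)
p_row = sum(just_enough) / total          # overall proportion "just enough"

chi_sq = 0.0
for x, y, n in zip(just_enough, not_just_enough, group_sizes):
    for obs, p in ((x, p_row), (y, 1 - p_row)):
        e = n * p                         # expected count in this cell
        chi_sq += (obs - e) ** 2 / e

df = (2 - 1) * (len(group_sizes) - 1)
critical = 5.9915                         # table value for alpha = .05, 2 df

# Upper-tail probability; this closed form is valid only for df = 2.
p_value = exp(-chi_sq / 2)

print(not_just_enough, round(chi_sq, 4), df, chi_sq > critical, round(p_value, 4))
```
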


If $s_{\Delta p} = \sqrt{\frac{\bar{p}_a \bar{q}_a}{n_a} + \frac{\bar{p}_b \bar{q}_b}{n_b}}$ and we are going to use $\chi^{2(2)}_{.05} = 5.9915$ in a confidence interval of
$p_a - p_b = (\bar{p}_a - \bar{p}_b) \pm \sqrt{\chi^{2(c-1)}} \sqrt{\frac{\bar{p}_a \bar{q}_a}{n_a} + \frac{\bar{p}_b \bar{q}_b}{n_b}}$, we should compute $\sigma_a^2 = \frac{\bar{p}_a \bar{q}_a}{n_a}$ for $a = 1, 2, 3$.
O                       Low Income    Medium Income    High Income
Just enough                  48             58              69
Not just enough             152            142             131
Total n_a                   200            200             200
Proportion p_a             .240           .290            .345
Proportion q_a             .760           .710            .655
sigma_a^2              .0009120       .0010295        .0011299
$p_1 - p_2 = (\bar{p}_1 - \bar{p}_2) \pm \sqrt{\chi^{2(2)}_{.05}} \sqrt{\sigma_1^2 + \sigma_2^2} = (.240 - .290) \pm \sqrt{5.9915} \sqrt{.0009120 + .0010295} = -0.050 \pm 0.108$
$p_1 - p_3 = (\bar{p}_1 - \bar{p}_3) \pm \sqrt{\chi^{2(2)}_{.05}} \sqrt{\sigma_1^2 + \sigma_3^2} = (.240 - .345) \pm \sqrt{5.9915} \sqrt{.0009120 + .0011299} = -0.105 \pm 0.111$
$p_2 - p_3 = (\bar{p}_2 - \bar{p}_3) \pm \sqrt{\chi^{2(2)}_{.05}} \sqrt{\sigma_2^2 + \sigma_3^2} = (.290 - .345) \pm \sqrt{5.9915} \sqrt{.0010295 + .0011299} = -0.055 \pm 0.114$
So, the chi-square test spoke the truth and, because the $\Delta\bar{p}$ part of each confidence interval is smaller in
absolute value than the error part, there is no significant difference between these proportions. Who woulda
guessed it!
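The pairwise Marascuilo comparisons follow one pattern, so a loop over the pairs of groups is a natural way to check them. The sketch below is in Python with the standard library only; the dictionary layout and names are hypothetical, and the critical value 5.9915 is simply copied from the chi-squared table value used in the text.

```python
from itertools import combinations
from math import sqrt

chi_sq_crit = 5.9915     # chi-squared table value, alpha = .05, c - 1 = 2 df

# group -> (number answering "just enough", group size)
groups = {"low": (48, 200), "medium": (58, 200), "high": (69, 200)}

p = {g: x / n for g, (x, n) in groups.items()}
var = {g: p[g] * (1 - p[g]) / n for g, (x, n) in groups.items()}   # p*q/n

results = {}
for a, b in combinations(groups, 2):
    diff = p[a] - p[b]
    margin = sqrt(chi_sq_crit) * sqrt(var[a] + var[b])
    # The difference is significant only if it exceeds the margin in size,
    # i.e. only if the interval diff +/- margin excludes zero.
    results[(a, b)] = (diff, margin, abs(diff) > margin)

for pair, (diff, margin, significant) in results.items():
    print(pair, round(diff, 3), round(margin, 3), significant)
```
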