252x0721 3/19/07 ECO252 QBA2 Name

SECOND HOUR EXAM
March 23 2007
Name
Show your work! Make Diagrams! Exam is normed on 50 points. Answers without reasons are not
usually acceptable.
I. (8 points) Do all the following. Make diagrams!
x ~ N 10, 7 - If you are not using the supplement table, make sure that I know it.
1. P7  x  25 
2. Px  15 
3. P5  x  0
4.
x.085 (Do not try to use the t table to get this.)
II. (22+ points) Do all the following (2 points each unless noted otherwise). Look them over first;
the computer problem is at the end. Show your work where appropriate.
Note the following:
1. This test is normed on 50 points, but there are more points possible including the take-home.
You are unlikely to finish the exam and might want to skip some questions.
2. A table identifying methods for comparing 2 samples is at the end of the exam.
3. If you answer ‘None of the above’ in any question, you should provide an alternative
answer and explain why. You may receive credit for this even if you are wrong.
4. Use a 5% significance level unless the question says otherwise.
5. Read problems carefully. A problem that looks like a problem on another exam may be
quite different.
6. Make sure that you state your null and alternative hypothesis, that I know what method you are
using and what the conclusion is when you do a statistical test.
1. (Anderson, Sweeney, Williams) We wish to compare miles per gallon of two similar automobiles. A
random sample of 8 automobiles is chosen and 8 drivers are asked to drive the cars on identical roads. The
data is as follows.
Row  Driver  Model 1  Model 2  difference
                  x1       x2           d
  1       1       28       26           2
  2       2       23       22           1
  3       3       25       27          -2
  4       4       23       22           1
  5       5       24       23           1
  6       6       26       25           1
  7       7       29       27           2
  8       8       24       26          -2
I have computed $\bar{x}_1 = 25.25$, $s_1 = 2.2520$, $\bar{x}_2 = 24.75$ and $s_2 = 2.1213$.
a. Compute the sample variance for the d column – Show your work! (2)
b. Is there a significant difference between the gas consumption in the two models? State your hypotheses!
(2)
c. Test to see if the variances of the two cars’ gas consumption are similar. (2)
[6]
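The arithmetic behind parts a and b can be sketched in a few lines. This is only a minimal illustration using the d column above; the variable names are made up for the example, and the choice of a paired t ratio is an assumption about the intended method, not a statement of the official solution.

```python
import math
import statistics

# Differences d = x1 - x2, copied from the table above.
d = [2, 1, -2, 1, 1, 1, 2, -2]
n = len(d)

d_bar = sum(d) / n                      # sample mean of the differences
s2_d = statistics.variance(d)           # sample variance, divisor n - 1

# Same variance through the computational formula (sum d^2 - n * d_bar^2) / (n - 1).
s2_check = (sum(v * v for v in d) - n * d_bar ** 2) / (n - 1)

# Test ratio for H0: mean difference = 0, to be compared with t on n - 1 degrees of freedom.
t_ratio = d_bar / math.sqrt(s2_d / n)

print(f"d_bar = {d_bar}, s_d^2 = {s2_d:.4f}, check = {s2_check:.4f}, t = {t_ratio:.4f}")
```

The critical value still has to come from the t table with n - 1 = 7 degrees of freedom.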
Exhibit 1: A quality control engineer is in charge of manufacture of computer disks. Two different
processes can be used to manufacture the disks. The engineer suspects that the Kohler method produces a
greater proportion of defective disks than the Russell method. Out of a sample of 150 Kohler disks, 27 are
defective. Out of a sample of 200 Russell disks, 18 are defective. If Kohler disks are sample 1 and Russell
disks are sample 2, test the engineer’s suspicion at the 1% level.
2. The hypotheses that should be tested in exhibit 1 are
a. H 0 : p1  p 2  0 and H 1 : p1  p 2  0
b. H 0 : p1  p 2  0 and H 1 : p1  p 2  0
c. H 0 : p1  p 2  0 and H 1 : p1  p 2  0
d. H 0 : p1  p 2  0 and H 1 : p1  p 2  0
e. H 0 : p1  p 2  0 and H 1 : p1  p 2  0
f. H 0 : p1  p 2  0 and H 1 : p1  p 2  0
g. None of the above. (Write in correct answer.)
3. For exhibit 1, find the value of the test ratio. (3)
[8]
[11]
4. For Exhibit 1, using the hypotheses in 2 and the test ratio in 3, draw an approximately Normal curve and
show the ‘reject’ region by shading it. (3)
[14]
5. For Exhibit 1 and the hypotheses in 2, find a p-value for the test. (2)
[16]
5a.For exhibit 1, find a 17%   .17  2-sided confidence interval for the difference between the 2
proportions. (4)
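For Exhibit 1, the test ratio and p-value calculations might look like the sketch below. It assumes a pooled-proportion z ratio and an upper-tail alternative (Kohler proportion greater than Russell), which is one reading of the engineer's suspicion; the variable names are illustrative only.

```python
import math

x1, n1 = 27, 150    # Kohler: defective disks, sample size
x2, n2 = 18, 200    # Russell: defective disks, sample size

p1 = x1 / n1
p2 = x2 / n2
p0 = (x1 + x2) / (n1 + n2)          # pooled proportion under H0: p1 = p2

# Standard error of p1 - p2 when the null hypothesis of equal proportions holds.
se_pooled = math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se_pooled

# Upper-tail p-value from the standard Normal: P(Z > z) = 0.5 * erfc(z / sqrt(2)).
p_value = 0.5 * math.erfc(z / math.sqrt(2))

print(f"p1 = {p1:.4f}, p2 = {p2:.4f}, p0 = {p0:.4f}, z = {z:.3f}, p-value = {p_value:.4f}")
```

A confidence interval for the difference would use the unpooled standard error instead, since it does not assume the two proportions are equal.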
Exhibit 2: A data entry operation sends a group of its employees to a typing course. The table below shows
their speed before ($x_1$) and after ($x_2$), with $d = x_1 - x_2$. $r_1$ and $r_2$ represent the ranks of the
numbers when the before and after speeds are ranked between 1 and 16. $|d|$ is the absolute value of the items
in the $d$ column. $r_d$ drops the zero and ranks the numbers in $|d|$ from 1 to 7, and $r_d^*$ is the ranks
with their signs added.
Row  Processor  Before    rB  After    rA    d  abs d  rank  sRank
                    x1    r1     x2    r2    d    |d|   r_d   r_d*
  1          1      59   5.5     57   3.5    2      2   2.5    2.5
  2          2      57   3.5     62   9.0   -5      5   7.0   -7.0
  3          3      60   7.5     60   7.5    0      0     *      *
  4          4      66  12.0     63  10.5    3      3   4.0    4.0
  5          5      68  13.0     69  14.0   -1      1   1.0   -1.0
  6          6      59   5.5     63  10.5   -4      4   5.5   -5.5
  7          7      72  15.0     74  16.0   -2      2   2.5   -2.5
  8          8      52   1.0     56   2.0   -4      4   5.5   -5.5
6. Assume that exhibit 2 represents the scores of one sample of eight employees before and after the
training. Can we say that the median speed has risen? Do an appropriate statistical test. (3)
[19]
7. Assume that instead the before and after columns represent independent samples. Can we say that the
median speed has risen? Do an appropriate statistical test. (3) [22]
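The signed ranks in Exhibit 2 can be reproduced with a short script, which may help when checking a hand computation for question 6. This is a sketch only: it assumes the paired test meant is a Wilcoxon signed rank test, and `average_rank` is a hypothetical helper written for this example.

```python
import math

# Differences d = x1 - x2 from Exhibit 2.
d = [2, -5, 0, 3, -1, -4, -2, -4]

nonzero = [v for v in d if v != 0]               # the zero difference is dropped
abs_sorted = sorted(abs(v) for v in nonzero)

def average_rank(value):
    """Rank of value among abs_sorted, with tied values sharing their average rank."""
    first = abs_sorted.index(value) + 1          # 1-based position of the first tie
    count = abs_sorted.count(value)
    return first + (count - 1) / 2

# Attach the sign of each difference to the rank of its absolute value.
signed_ranks = [math.copysign(average_rank(abs(v)), v) for v in nonzero]

t_plus = sum(r for r in signed_ranks if r > 0)   # sum of positive ranks
t_minus = -sum(r for r in signed_ranks if r < 0) # sum of negative ranks, as a positive number

print(signed_ranks, t_plus, t_minus)
```

The two rank sums should add to n(n + 1)/2 for the n = 7 nonzero differences, which is a quick check on the ranking.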
8. The owner of Mother Truckers (which actually moved me once) wants to prove that her firm is superior
to her arch rival, Wallflower Van Lines, and wants to use the proportion of shipments with claims filed as a
way of doing that. She assembles the following data.
                                                  Mother Truckers   Wallflower
Total Shipments Sampled                                       900          750
Total number of shipments with claims over $50                162           60
Which method would be proper for analyzing the data?
a. $\chi^2$ test for independence.
b. $\chi^2$ test for homogeneity
c. ANOVA
d. z test for comparing 2 proportions.
e. Sign test
f. The McNemar Test
g. None of the above.
9. Which is the closest to the probability that a $\chi^2$ random variable with 4 degrees of freedom will be
greater than 10?
a. .01
b. .05
c. .10
d. .99
e. .95
f. .90
10. During a period of 20 days, 720 patients arrive at a hospital, an average of 1.5 per hour over 480
hours. For example, during 106 of the 480 hours there were no arrivals. See if a Poisson distribution fits
these data. (6)
[32]
Row   x            O     xO
  1   0          106      0
  2   1          140    140
  3   2          125    250
  4   3          106    318
  5   4            3     12
  6   5 or more    0      0
      Total      480    720
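A first step in checking the Poisson fit is to turn the mean of 1.5 into expected frequencies for the same six classes. The sketch below does that and also accumulates a chi-squared statistic; using a chi-squared comparison rather than another goodness-of-fit method is an assumption of this example, and the names are illustrative.

```python
import math

observed = [106, 140, 125, 106, 3, 0]    # hours with x = 0, 1, 2, 3, 4 and 5 or more arrivals
n = sum(observed)                        # 480 hours
mean = 720 / n                           # 1.5 arrivals per hour

# Poisson probabilities for x = 0..4, with the last class taking the remaining upper tail.
probs = [math.exp(-mean) * mean ** k / math.factorial(k) for k in range(5)]
probs.append(1 - sum(probs))

expected = [n * p for p in probs]

# Chi-squared statistic: sum of (O - E)^2 / E over the six classes.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

for k, (o, e) in enumerate(zip(observed, expected)):
    label = f"{k}" if k < 5 else "5 or more"
    print(f"x = {label:9s}  O = {o:3d}  E = {e:7.2f}")
print(f"chi-squared = {chi_sq:.2f}")
```

The degrees of freedom and critical value are left to the table, and they depend on whether the mean is treated as given or as estimated from the data.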
11. Computer question.
a. Turn in your first computer output. Only do b, c and d if you did. (3)
b. A researcher believes that bank CEOs are paid more than utility CEOs. A random sample of
eight salaries (in thousands) is collected for each industry. What were the null and alternative
hypotheses tested? At the 95% confidence level could the researcher state that bank CEOs are paid
more than utility CEOs? Why? How would the results be affected if we insist on a 99%
confidence level? (2)
c. What is the difference between the two hypothesis tests that were done with the salary data? (1)
d. (Lee) A manufacturer is afraid that the company is producing slow egg timers. A sample of 12
timers is chosen and the time in seconds that was needed for the timers to run out was recorded.
What hypotheses were tested? Can the manufacturer conclude that the timers are slow if a 95%
confidence level is used? Why? How would the results be affected if we insist on a 99%
confidence level? (2)
[40 actually 44]
————— 3/19/2007 7:55:12 PM ————————————————————
Welcome to Minitab, press F1 for help.
MTB > print c1 c2

Data Display

Row  Banks  Utilities
  1    755        620
  2    712        395
  3    845        653
  4    985       1050
  5   1300       1030
  6   1143        528
  7    733        610
  8   1189        964

MTB > describe c1 c2

Descriptive Statistics: Banks, Utilities

Variable   N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median      Q3
Banks      8   0  957.8     81.3  230.0    712.0  738.5   915.0  1177.5
Utilities  8   0  731.3     87.9  248.6    395.0  548.5   636.5  1013.5

Variable   Maximum
Banks       1300.0
Utilities   1050.0
MTB > TwoSample c1 c2;
SUBC>   Alternative 1.

Two-Sample T-Test and CI: Banks, Utilities

Two-sample T for Banks vs Utilities

           N  Mean  StDev  SE Mean
Banks      8   958    230       81
Utilities  8   731    249       88

Difference = mu (Banks) - mu (Utilities)
Estimate for difference:  226.500
95% lower bound for difference:  14.437
T-Test of difference = 0 (vs >): T-Value = 1.89  P-Value = 0.041  DF = 13

MTB > TwoSample c1 c2;
SUBC>   Pooled;
SUBC>   Alternative 1.

Two-Sample T-Test and CI: Banks, Utilities

Two-sample T for Banks vs Utilities

           N  Mean  StDev  SE Mean
Banks      8   958    230       81
Utilities  8   731    249       88

Difference = mu (Banks) - mu (Utilities)
Estimate for difference:  226.500
95% lower bound for difference:  15.589
T-Test of difference = 0 (vs >): T-Value = 1.89  P-Value = 0.040  DF = 14
Both use Pooled StDev = 239.4934
MTB > print c6

Data Display

Seconds
   190   199   198   176   180   174   181   183   208   188   198   165

MTB > describe seconds

Descriptive Statistics: Seconds

Variable   N  N*    Mean  SE Mean  StDev  Minimum      Q1  Median      Q3
Seconds   12   0  186.67     3.60  12.47   165.00  177.00  185.50  198.00

Variable  Maximum
Seconds    208.00

MTB > Onet c6;
SUBC>   Test 180;
SUBC>   Alternative 1.

One-Sample T: Seconds

Test of mu = 180 vs > 180

                                             95%
                                           Lower
Variable   N     Mean   StDev  SE Mean     Bound     T      P
Seconds   12  186.667  12.471    3.600   180.202  1.85  0.046

The methods were listed in the outline in the following table.

                                                       Paired Samples   Independent Samples
Location - Normal distribution. Compare means.         Method D4        Methods D1-D3
Location - Distribution not Normal. Compare medians.   Method D5b       Method D5a
Proportions                                            Method D6b       Method D6a
Variability - Normal distribution. Compare variances.                   Method D7
ECO252 QBA2
SECOND EXAM
March 23, 2007
TAKE HOME SECTION
Name: _________________________
Student Number: _________________________
III. Neatness Counts! Show your work! Always state your hypotheses and
conclusions clearly. (19+ points). In each section state clearly what number you are
using to personalize data (Your Version number). There is a penalty for failing to
include your student number on this page and not stating version number in each
section. Please write on only one side of the paper.
1. A bicycle manufacturer wishes to test the proposition that bicycle buyers are older in mountain
biking country than in flatter land. In the course of a few hours in Mountain City and Flatland City two sets
of customer data are collected - 11 ages in Mountain City and 9 in Flatland City. Personalize the data as
follows. The manufacturer’s researcher brings his little brother along. The brother is 10 + x years old,
where x is the second to last digit of your student number. The brother puts his age in as a last item in both
columns. So now the researcher has one column of 12 ages and another of 10 ages. Example: Ima Badrisk
has the number 375290, so the 12th number in the ‘Mtn’ column is 19 as is the 10th number in the ‘Fltlnd’
column.
Row  Mtn  Fltlnd
  1   29      11
  2   38      14
  3   31      15
  4   17      12
  5   36      14
  6   28      25
  7   44      14
  8    9      11
  9   32       8
 10   23
 11   35
a. You are the data analyst and you are fairly clueless. So you compare the ages every way possible. First
you compute means and standard deviations for both columns (Show your work!) (3)
b. With no good reason to do so, you compare the mean ages assuming a Normal distribution with equal
variances (4). You may use a test ratio, a critical value or a confidence interval (2 points extra if you use all
three and get the same result each time).
c. Now you are not sure that was right and repeat the analysis while dropping the assumption of equal
variances. (4 extra credit)
d. But you are not really sure that that was right either, so repeat the analysis by comparing medians. (3)
[10]
e. So now you have three different sets of results and you have to decide which one to present to your boss.
To decide whether you should have used the method in b) or in c) you compare variances. (2)
f. But since, perhaps, you should have compared medians instead, you use a test to see if the data in
Mountain City were Normally distributed. (4)
g. So, on the basis of these tests, which method should you have used? Make a decision and present your
results. (1)
[17]
2. A corporate president is beginning to worry that his customer representatives are dressing too informally.
A sample of 11 representatives is selected and told not to wear a suit the first week and then told to wear a
suit the following week. Customers are asked to rate the representatives according to how professionally
they were treated, and from their questionnaires, each representative is given a rating.
The ratings appear below. Personalize the data as follows. The 10 in the ‘without’ column is an obvious
error. ‘Correct’ it by adding the last digit of your student number to it, and make a corresponding correction
in the difference column. If your student number ends in zero add 10. Example: Ima Badrisk has the
number 375290, so the 11th number in the ‘Without’ column is 20 and the 11th number in the ‘Difference’
column is 2.
a. Test to see if the Reps received significantly higher ratings when wearing suits assuming that the samples
come from the Normal distribution. (3)
b. Test to see if the Reps received significantly higher ratings when wearing suits without assuming a
Normal distribution. (3)
c. So, given the source of the data, which of the two is the correct method to use? Why? (1) [24]
Row  Rep  With  Without  Difference
  1    A    27       22           5
  2    B    23       16           7
  3    C    25       25           0
  4    D    22       19           3
  5    E    25       21           4
  6    F    26       24           2
  7    G    21       20           1
  8    H    25       19           6
  9    I    26       23           3
 10    J    28       26           2
 11    K    22       10          12
For your convenience, the following sums have been calculated for the first 10 numbers in each column.
             Sum   Sum of squares
With         248             6194
Without      215             4709
Difference    33              153
3. The table below is data that were assembled to see if there is a difference in numbers of children among
students of various types of higher education institutions. Samples were taken in community colleges (CC),
large universities (LU) and small colleges (SC). Personalize the data by adding the third to last digit of
your student number to the 25 in the CC column (0 Kids row).
a. Is the number of children independent of the type of institution? (5)
b. Divide each sample into those with children and those without and use a Marascuilo procedure to find
the three possible differences between the proportions with kids and tell which pairs have significant
differences. (3)
Row  Number      CC   LU   SC
  1  0 Kids      25  178   31
  2  1 Kid       49  141   12
  3  2 Kids      31   54    8
  4  More Kids   22   14    6
c. A sample of customers’ purchases at a dollar store appears below. Personalize the data by adding the
third to last digit of your student number to the 7 at the end. If that digit is zero add 10. Calculations of the
mean and variance have been done to remind you how to work with frequencies. f is the frequency of the
class and x is the class midpoint.
Row  Class             f  midpoint    fx     fx2
  1  0 to under 10     6         5    30     150
  2  10 to under 20   14        15   210    3150
  3  20 to under 30   29        25   725   18125
  4  30 to under 40   38        35  1330   46550
  5  40 to under 50   25        45  1125   50625
  6  50 to under 60   10        55   550   30250
  7  60 to under 70    7        65   455   29575
     n               129              4425  178425
$n = \sum f = 129$, $\bar{x} = \frac{\sum fx}{n} = \frac{4425}{129} = 34.3023$
and $s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n - 1} = \frac{178425 - 129(34.3023)^2}{128} \approx 208.10$.
Check to see if the sample follows the Normal distribution. (4)
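The frequency formulas for the mean and variance are easy to mis-key by hand, so a quick check like the one below may be useful. It is a minimal sketch using the unpersonalized frequencies from the table; the variable names are chosen for the example. Note that the mean has to be squared before it is multiplied by n.

```python
# Class frequencies and midpoints from the table (unpersonalized data).
f = [6, 14, 29, 38, 25, 10, 7]
x = [5, 15, 25, 35, 45, 55, 65]

n = sum(f)
sum_fx = sum(fi * xi for fi, xi in zip(f, x))
sum_fx2 = sum(fi * xi ** 2 for fi, xi in zip(f, x))

x_bar = sum_fx / n
# Computational formula: s^2 = (sum f x^2 - n * x_bar^2) / (n - 1).
s2 = (sum_fx2 - n * x_bar ** 2) / (n - 1)
s = s2 ** 0.5

print(f"n = {n}, sum fx = {sum_fx}, sum fx^2 = {sum_fx2}")
print(f"mean = {x_bar:.4f}, variance = {s2:.4f}, std dev = {s:.4f}")
```

With personalized data, only the last entry of `f` changes; the rest of the calculation is the same.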
d. The following data should be personalized by adding the third to last digit of your student number
divided by 100 to .621. Example: Ima Badrisk has the number 375290, so she adds .02 to .621 and gets
.641. The formula for the cumulative function for the continuous uniform distribution between c and d is
$F(x) = \frac{x - c}{d - c}$. Check to see if these data are uniformly distributed between zero and 1. (3) [40]
0.621
0.503
0.203
0.477
0.710
0.581
0.329
0.480
0.554
0.382
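One way to use the cumulative function F(x) = (x - c)/(d - c) is to compare it with the sample's cumulative proportions and take the largest gap. The sketch below does this for the unpersonalized data; treating the check as a Kolmogorov-Smirnov style comparison is an assumption of this example, and the names are illustrative.

```python
data = [0.621, 0.503, 0.203, 0.477, 0.710, 0.581, 0.329, 0.480, 0.554, 0.382]
c, d = 0.0, 1.0      # limits of the hypothesized uniform distribution

def uniform_cdf(x):
    """Cumulative function of the continuous uniform distribution on [c, d]."""
    return min(max((x - c) / (d - c), 0.0), 1.0)

xs = sorted(data)
n = len(xs)

# Largest gap with the empirical cumulative proportion just after each point ...
d_plus = max((i + 1) / n - uniform_cdf(v) for i, v in enumerate(xs))
# ... and just before each point.
d_minus = max(uniform_cdf(v) - i / n for i, v in enumerate(xs))

d_stat = max(d_plus, d_minus)
print(f"D+ = {d_plus:.3f}, D- = {d_minus:.3f}, D = {d_stat:.3f}")
```

The maximum difference would then be compared with a tabled critical value for n = 10.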