Possible Rubric for Statistics Exams.

252y0511 2/25/05 (Open in ‘Print Layout’ format)
I have been hearing a lot about rubrics lately, and it has taken me a while to be assured that they are
not the materials that the third pig built his house out of. My first attempt at one came to me in a recent
assessment meeting.
1. Did the student make a good effort to understand the question? This would include asking the instructor and consulting notes and texts if he/she did not understand what was desired.
2. Was the method used to solve the problem the best and most appropriate for the problem?
3. Was the method used correctly?
4. Did the student present the solution in such a way that the instructor can understand how the student got the answers presented? This should include all formulas, equations and tables used. Is it evident from the way the work is presented that the student understood what he/she was doing? Is it legible?
5. Was the conclusion stated clearly? Was the null hypothesis rejected or not rejected? What were the implications of the conclusion for a relevant goal, for example the decision to buy a new product?
In view of what was said here, it is incredible that, on every exam I give, students give me confidence intervals and tests for means when I ask for confidence intervals and tests for medians, variances and even proportions. Check the wording on the questions that you misunderstood. Though one question on the multiple choice part of the exam takes a bit of thought, can you identify what wording in the question made you think it was about a mean? Can you tell me what it was? It is also remarkable that there are any people out there who do not know that proportions, probabilities and p-values (which are probabilities) must be between zero and one. It is also amazing to me that so many of you cannot express the difference between t and z. In the most practical sense, a value of t comes from the t table and must be used with s, the sample standard deviation, in confidence intervals and tests for the population mean. There are only a few other cases where we use t, and they will be discussed later in the course. On the other hand, z, which appears on the t table only in its bottom line but can also be found from the table of the standardized Normal distribution, must be used with σ, the population standard deviation, in confidence intervals and tests for the mean. z is also used in large-sample tests for the population proportion, population mean, population standard deviation, population median and the means of the Poisson and Binomial distributions if the correct formulas are used, but don't push it. The Normal distribution should not be used if more accurate methods are available. In any case, look at Things You Should Never Do on an Exam or Anywhere Else before you do another assignment and frequently thereafter.
Cheating
Most instructors would consider any collusion on the take-home exam cheating. I am a bit more lenient,
since I believe peer-learning is important. But there are limits. Helping one another with methodology is
acceptable, but when students copy one another’s numbers, it is not acceptable. On this exam, I saw many
mistakes that I doubt the individual would have made if he/she were working alone. Furthermore, some
copying was so blatant that errors appeared because one student could not read another student’s
handwriting. (Several papers copied ‘interval’ as ‘internal’.) This sort of communication, where one person
is doing all the work and a second individual is simply copying, and copying badly, is not cooperation. I
am afraid that more evidence of this sort of cheating will send me and the exams to the dean.
ECO 252 QBA2
FIRST HOUR EXAM
February 28 2005
Name ___KEY___________
Hour of class registered _____
Class attended if different ____
Show your work! Make Diagrams! Exam is normed on 50 points. Answers without reasons are not
usually acceptable.
I. (8 points) Do all the following. $x \sim N(1.5, 6)$. Do not make diagrams of x with zero in the middle. Make up your mind! If you are diagramming x, put the mean in the middle; if you are diagramming z, put zero in the middle. Copies of the diagrams are below, but they need a vertical line at 1.5 or zero to be completely useful. Remember $z = \frac{x - \mu}{\sigma}$, and that this equation implies that if we have z and need a value of x, as in part 4, we use $x = \mu + z\sigma$.

1. $P(0 \le x \le 21) = P\left(\frac{0 - 1.5}{6} \le z \le \frac{21 - 1.5}{6}\right) = P(-0.25 \le z \le 3.25) = P(-0.25 \le z \le 0) + P(0 \le z \le 3.25) = .0987 + .4994 = .5981$
Make a diagram! Your diagram for x should show a Normal curve with a vertical line at 1.5 in the middle and the area shaded from zero to 21. Your diagram for z shows a Normal curve with a vertical line at zero in the middle and the area shaded from -0.25 to 3.25. Because this area is on both sides of zero, add.
[Graph for x: Normal curve with mean 1.5 and standard deviation 6. The area between 0 and 21 is 0.5981.]
[Graph for z: Normal curve with mean 0 and standard deviation 1. The area between -0.25 and 3.25 is 0.5981.]
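A quick way to check a table answer like this one is to let software do the integration. The sketch below assumes Python's standard-library `statistics.NormalDist` as the tool (any Normal cdf would do) and computes part 1 twice, directly on the x scale and after standardizing, so you can see that the two routes agree.

```python
from statistics import NormalDist

# x ~ N(1.5, 6): mean 1.5, standard deviation 6
x_dist = NormalDist(mu=1.5, sigma=6)
z_dist = NormalDist()  # standardized Normal: mean 0, standard deviation 1

# Route 1: area under the x curve between 0 and 21
p_direct = x_dist.cdf(21) - x_dist.cdf(0)

# Route 2: standardize the endpoints with z = (x - mu) / sigma, then use the z curve
z_low = (0 - 1.5) / 6     # -0.25
z_high = (21 - 1.5) / 6   # 3.25
p_standardized = z_dist.cdf(z_high) - z_dist.cdf(z_low)

print(round(p_direct, 4), round(p_standardized, 4))
```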
2. $P(-7.00 \le x \le 1.50) = P\left(\frac{-7.00 - 1.5}{6} \le z \le \frac{1.50 - 1.5}{6}\right) = P(-1.42 \le z \le 0) = .4222$
Make a diagram! Your diagram for z shows a Normal curve with a vertical line at zero in the middle and
the area shaded from -1.42 to zero. Because this area starts at zero, you do not need to add or subtract.
[Figure: Normal curve with mean 1.5 and standard deviation 6. The area between -7 and 1.5 is 0.4217.]
[Figure: Normal curve with mean 0 and standard deviation 1. The area between -1.42 and 0 is 0.4222.]
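The table answer (.4222) and the computer answer (0.4217) differ only because the table forces us to round z = -1.4167 to -1.42. This sketch, again assuming Python's `statistics.NormalDist`, reproduces both numbers.

```python
from statistics import NormalDist

x_dist = NormalDist(mu=1.5, sigma=6)
z_dist = NormalDist()

# Exact area between -7 and 1.5 under the x curve; 1.5 is the mean, so this starts at z = 0
p_exact = x_dist.cdf(1.50) - x_dist.cdf(-7.00)

# Table method: z = (-7 - 1.5) / 6 = -1.4167 is rounded to two places before the lookup
z_rounded = round((-7.00 - 1.5) / 6, 2)   # -1.42
p_table = z_dist.cdf(0) - z_dist.cdf(z_rounded)

print(round(p_exact, 4), round(p_table, 4))
```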
3. $P(x \ge 10.22) = P\left(z \ge \frac{10.22 - 1.5}{6}\right) = P(z \ge 1.45) = P(z \ge 0) - P(0 \le z \le 1.45) = .5 - .4265 = .0735$
[Figure: Normal curve with mean 1.5 and standard deviation 6. The area to the right of 10.22 is 0.0731.]
[Figure: Normal curve with mean 0 and standard deviation 1. The area to the right of 1.45 is 0.0735.]
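For an upper tail, software and the table both work with the complement: the area above 10.22 is one minus the area below it. A minimal sketch under the same `statistics.NormalDist` assumption:

```python
from statistics import NormalDist

x_dist = NormalDist(mu=1.5, sigma=6)

# P(x >= 10.22) = 1 - P(x < 10.22); a single point carries no probability for a continuous variable
p_upper = 1 - x_dist.cdf(10.22)

# The z value the table uses, before it is rounded to 1.45
z_value = (10.22 - 1.5) / 6

print(round(z_value, 4), round(p_upper, 4))
```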
4.
$x_{.08}$: Make a diagram showing a Normal curve centered at zero, with 8% above $z_{.08}$, 42% between zero and $z_{.08}$, and 50% below zero. Recall that $z_{.08}$ is 8% from the top of the distribution and 50% - 8% = 42% from zero. So $P(0 \le z \le z_{.08}) = .4200$. According to the Normal table, $P(0 \le z \le 1.40) = .4192$ and $P(0 \le z \le 1.41) = .4207$. So $z_{.08}$ is between 1.40 and 1.41, and either would be an acceptable answer, but $z_{.08} = 1.405$ would be better. Now, if we use $x = \mu + z\sigma$, we get $x_{.08} = 1.5 + 1.40(6) = 9.90$ or $x_{.08} = 1.5 + 1.405(6) = 9.93$ or $x_{.08} = 1.5 + 1.41(6) = 9.96$.

Check: $P(x \ge 9.93) = P\left(z \ge \frac{9.93 - 1.5}{6}\right) = P(z \ge 1.405) = P(z \ge 0) - P(0 \le z \le 1.41) = .5 - .4207 = .0793 \approx .08$.
[Figures: Normal curves with mean 1.5 and standard deviation 6. The area to the right of 9.9 is 0.0808; the area to the right of 9.93 is 0.0800; the area to the right of 9.96 is 0.0793.]
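Part 4 runs the table lookup backwards. Assuming Python's `statistics.NormalDist` again, `inv_cdf` returns the value with a given area below it, so $z_{.08}$ is the value with 92% below it; $x = \mu + z\sigma$ then moves it to the x scale.

```python
from statistics import NormalDist

mu, sigma = 1.5, 6

# z_.08 has 8% of the area above it, i.e. 92% below it
z_08 = NormalDist().inv_cdf(1 - 0.08)

# Convert to the x scale with x = mu + z * sigma
x_08 = mu + z_08 * sigma

# Check: the area above x_.08 under the x curve should be .08
check = 1 - NormalDist(mu, sigma).cdf(x_08)

print(round(z_08, 3), round(x_08, 2), round(check, 4))
```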
II. (5 points-2 point penalty for not trying part a.)
(Mansfield) A random sample is taken of the length in feet of aluminum foil rolls. The following
data is found. (Recomputing what I’ve done for you is a great way to waste time.)
Obs   x        x²
1     74.88    5607.0144
2     75.86    5754.7396
3     74.81    5596.5361
4     74.28    5517.5184
5     74.35    5527.9225
6     73.41    5389.0281
7     74.66    5574.1156
Sum   522.25   38966.8747
a. Compute the sample standard deviation, s, of the lengths. Show your work! (2)
b. Compute a 99% confidence interval for the mean, μ. (2)
c. Is the population mean significantly different from 75.8 ft? (1)

Solution: a. $\bar{x} = \frac{\sum x}{n} = \frac{522.25}{7} = 74.6071$, and
$$s^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1} = \frac{38966.8747 - 38963.5804}{6} = \frac{3.29434}{6} = 0.54906,$$
where $n\bar{x}^2 = \frac{(\sum x)^2}{n} = \frac{(522.25)^2}{7} = 38963.5804$ avoids rounding $\bar{x}$. So $s = \sqrt{0.54906} = 0.7410$. Note that excessive rounding can throw this answer way off. Using $\bar{x} = 74.6071$, one gets $s = 0.746$; using $\bar{x} = 74.61$, I got $s^2 = 0.05167$ and $s = 0.2273$.

b. Compute a 99% confidence interval for the mean, μ. (2)
Given: $\bar{x} = 74.6071$, $s = 0.7410$, $n = 7$ and $\alpha = .01$. So $s_{\bar{x}} = \frac{s}{\sqrt{n}} = \sqrt{\frac{0.54906}{7}} = \sqrt{0.078437} = 0.2801$, $DF = n - 1 = 6$ and $t^{6}_{.005} = 3.707$.
$$\mu = \bar{x} \pm t_{\alpha/2} s_{\bar{x}} = 74.6071 \pm 3.707(0.2801) = 74.607 \pm 1.038,$$
or 73.569 to 75.645.

c. Since 75.8 is not in the confidence interval, we reject $H_0: \mu = 75.8$; the population mean is significantly different from 75.8 ft.
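The computing formula subtracts two large, nearly equal numbers, which is why rounding hurts so much here. The sketch below is plain Python; the value 3.707 is copied from the t table, since the standard library has no t distribution. It carries full precision and then repeats the variance with $\bar{x}$ rounded to 74.61.

```python
import math

# Lengths (feet) of the seven foil rolls
lengths = [74.88, 75.86, 74.81, 74.28, 74.35, 73.41, 74.66]
n = len(lengths)
sum_x = sum(lengths)                       # 522.25
sum_x2 = sum(v * v for v in lengths)       # 38966.8747
x_bar = sum_x / n

# Computing formula s^2 = (sum x^2 - n * xbar^2) / (n - 1), at full precision
s = math.sqrt((sum_x2 - n * x_bar ** 2) / (n - 1))

# Same formula with xbar rounded to 74.61: the subtraction loses almost everything
s_rounded = math.sqrt((sum_x2 - n * 74.61 ** 2) / (n - 1))

# 99% interval; t_.005 with 6 degrees of freedom copied from the t table
t_005 = 3.707
s_xbar = s / math.sqrt(n)
lower, upper = x_bar - t_005 * s_xbar, x_bar + t_005 * s_xbar

# c) a two-sided test of mu = 75.8 rejects when 75.8 lies outside the interval
rejects_75_8 = not (lower <= 75.8 <= upper)

print(round(s, 4), round(s_rounded, 4), round(lower, 3), round(upper, 3), rejects_75_8)
```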
III. Do all of the following Problems (18+ points) Show your work except in multiple choice questions.
(Actually – it doesn’t hurt there either.) If the answer is ‘None of the above,’ put in the correct answer.
1. When a p-value is smaller than a significance level
a) A type one error has been committed
b) A type two error has been committed
c) *The null hypothesis is rejected
d) The alternative hypothesis is rejected
e) The critical value is correct.
2. The t distribution should be used when the parent (underlying) population
a) *Is Normal, the population standard deviation is unknown and we are testing a mean.
b) Is Normal, the population standard deviation is known and we are testing a mean.
c) Is Normal, the mean of the population is unknown and we are testing a mean.
d) Is binomial and we are testing for a proportion.
e) The t distribution should be used in all of these cases.
3. The Normal distribution can be used in all of the cases below except when:
a) We are testing a mean, the population standard deviation is unknown and the sample is
large.
b) We are testing a proportion and the sample is large.
c) We are testing a variance and the sample is large.
d) We are testing the mean of a Poisson distribution and the sample is large.
e) *All of the above are cases when the Normal distribution can be used.
[6]
4. (Lange) The state wants to estimate the proportion of the labor force that was unemployed in North Hotzeplotz and wants to be 99% confident that their estimate is within 5% (written 0.05) of the population proportion. If the proportion is probably about 15%, how large a sample is needed? (3)
[9]
Solution: The outline says "The usually suggested formula is $n = \frac{pqz^2}{e^2}$, but since p is usually unknown, a conservative choice is to set p = 0.5. This is the formula everyone forgets that we covered." We don't need to use .5, since we have an estimate p = .15, which implies q = 1 - p = .85. So we try
$$n = \frac{pqz^2}{e^2} = \frac{(.15)(.85)(2.576)^2}{(.05)^2} = 338.42.$$
Use at least 339 respondents.
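The sample-size formula is easy to mistype, so here is a minimal sketch in Python. The function name `sample_size` is our own, and z = 2.576 is taken from the Normal table; the result is rounded up because a fractional respondent is not possible.

```python
import math

def sample_size(p, e, z):
    """Hypothetical helper: n = p q z^2 / e^2, rounded up to a whole respondent."""
    q = 1 - p
    return math.ceil(p * q * z ** 2 / e ** 2)

z_005 = 2.576   # z for a 99% two-sided interval, from the Normal table

n_estimate = sample_size(0.15, 0.05, z_005)      # using the prior estimate p = .15
n_conservative = sample_size(0.5, 0.05, z_005)   # the conservative choice p = .5

print(n_estimate, n_conservative)
```

With the conservative p = .5 the same calculation asks for 664 respondents, nearly twice as many, which is why using the estimate p = .15 pays off.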
5. I wish to do a test to see if the average level of satisfaction of my employees is above 75 on a zero to 100 scale. I take a survey of 30 of my 100 employees and get a mean of 76. I assume that the data is Normally distributed with a standard deviation of 8. What are my null and alternate hypotheses?
a) H 0 :   75 and H 1 :   75
b) H 0 :   75 and H 1 :   75
c) * H0: μ ≤ 75 and H1: μ > 75
d) H 0 :   75 and H 1 :   75
e) H 0 :   75 and H 1 :   75
f) H 0 :   75 and H 1 :   75
g) None of the above.
6. I wish to do a test to see if the average level of satisfaction of my employees is above 75 on a zero to 100 scale. I take a survey of 30 of my 100 employees and get a mean of 76. I assume that the data is Normally distributed with a standard deviation of 8. Assume that your null and alternate hypotheses in question 5 are correct and that your significance level is 8%. Find a critical value for the sample mean. Show clearly what formulas you are using. (3)
[14]
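Solution: $H_0: \mu \le 75$ and $H_1: \mu > 75$. $\alpha = .08$, $n = 30$, $N = 100$, $\mu_0 = 75$, $\sigma = 8$, $\bar{x} = 76$.
The formula table says:

Interval for | Confidence Interval | Hypotheses | Test Ratio | Critical Value
Mean (σ known) | $\mu = \bar{x} \pm z_{\alpha/2}\sigma_{\bar{x}}$ | $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$ | $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}}$, with $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ | $\bar{x}_{cv} = \mu_0 \pm z_{\alpha/2}\sigma_{\bar{x}}$
Mean (σ unknown) | $\mu = \bar{x} \pm t_{\alpha/2} s_{\bar{x}}$, $DF = n - 1$ | $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$ | $t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}}$, with $s_{\bar{x}} = \frac{s}{\sqrt{n}}$ | $\bar{x}_{cv} = \mu_0 \pm t_{\alpha/2} s_{\bar{x}}$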
The alternate hypothesis says that the mean is above 75, so we find a critical value above 75. You found $z_{.08}$ in Part I of this exam!!
$$\bar{x}_{cv} = \mu_0 + z_\alpha\sigma_{\bar{x}} = 75 + z_{.08}\sqrt{\frac{\sigma^2}{n}\cdot\frac{N - n}{N - 1}} = 75 + 1.405\sqrt{\frac{64}{30}\cdot\frac{70}{99}} = 75 + 1.405\sqrt{1.5085} = 75 + 1.405(1.228) = 75 + 1.73 = 76.7$$
You would get the same value if you used $z_{.08} = 1.40$ or $z_{.08} = 1.41$. Note: The finite population correction had to appear on the exam, but without it $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{8}{\sqrt{30}} = 1.46$.
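The same critical value can be computed in a few lines. This sketch assumes Python's `statistics.NormalDist` for $z_{.08}$ (it returns 1.4051 rather than a table value). The final comparison with the sample mean of 76 is an extra step that the question does not ask for.

```python
import math
from statistics import NormalDist

mu0, sigma, n, N, alpha = 75, 8, 30, 100, 0.08
x_bar = 76  # sample mean from the question

# One-sided test (H1: mu > 75), so all of alpha goes in the upper tail
z_alpha = NormalDist().inv_cdf(1 - alpha)       # z_.08, about 1.405

# Standard error with the finite population correction sqrt((N - n) / (N - 1))
fpc = math.sqrt((N - n) / (N - 1))
sigma_xbar = sigma / math.sqrt(n) * fpc          # about 1.228
x_cv = mu0 + z_alpha * sigma_xbar                # critical value above 75

# Without the correction the standard error would be larger
sigma_xbar_no_fpc = sigma / math.sqrt(n)         # about 1.46

reject = x_bar > x_cv
print(round(x_cv, 2), reject)
```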
7. (Dummeldinger) A movie rental chain is considering opening a new outlet. The corporation will open an outlet only if more than 5000 out of the 20000 households in the area have DVD players. It randomly selects 300 households and finds that 96 have DVD players. What are our null and alternative hypotheses?
a) H 0 :   5000 and H1 :   5000
b) H 0 : p  5000 and H1 : p  5000
c) H 0 : p  .32 and H1 : p  .32
d) * H0: p ≤ .25 and H1: p > .25
e) H 0 :   .25 and H1 :   .25
f) H 0 :   5000 and H1 :   5000
[16]
Note that $\bar{p} = \frac{96}{300} = .32$ is a statistic, not a parameter, and that we know it is true.
8. We wish to determine if the median income in an area exceeds $40000. A random sample of 250
households was selected. 104 had incomes above $40000. Let p be the proportion with incomes
above $40000. Our null hypotheses include (3):
a) H 0 :   40000
b) H 0 :   40000
c) H 0 : p  0.5
d) * H0: p ≤ 0.5
e) H 0 : p  0.5
f) H 0 :   40000
g) * H0: η ≤ 40000
[19]
Explanation: "The median income in an area exceeds $40000" is written $\eta > 40000$ and must be an alternative hypothesis, since it does not contain an equality. Its opposite is $H_0: \eta \le 40000$. The table in the outline reads as below.

Hypotheses about a median | Hypotheses about a proportion (p is the proportion above η0) | Hypotheses about a proportion (p is the proportion below η0)
$H_0: \eta = \eta_0$, $H_1: \eta \ne \eta_0$ | $H_0: p = .5$, $H_1: p \ne .5$ | $H_0: p = .5$, $H_1: p \ne .5$
$H_0: \eta \le \eta_0$, $H_1: \eta > \eta_0$ | $H_0: p \le .5$, $H_1: p > .5$ | $H_0: p \ge .5$, $H_1: p < .5$
$H_0: \eta \ge \eta_0$, $H_1: \eta < \eta_0$ | $H_0: p \ge .5$, $H_1: p < .5$ | $H_0: p \le .5$, $H_1: p > .5$
9. We wish to determine if the median income in an area exceeds $40000. A random sample of 250
households was selected. 104 had incomes above $40000. Let p be the proportion with incomes
above $40000. Assume that your null hypotheses in 8) are correct and test the hypothesis.
a) Using a test ratio and a p-value. (2).
b) Using a critical value for x , p, s or the median as appropriate. (2).
c) Using a confidence interval. (2, 3 or 5 depending on your level of chutzpah).
d) (Extra Credit) Redo a), b) or c) assuming that the sample of 250 came from a population
of 2000.
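Solution: a) From the formula table we have:

Interval for | Confidence Interval | Hypotheses | Test Ratio | Critical Value
Proportion | $p = \bar{p} \pm z_{\alpha/2} s_{\bar{p}}$, with $s_{\bar{p}} = \sqrt{\frac{\bar{p}\bar{q}}{n}}$, $\bar{q} = 1 - \bar{p}$ | $H_0: p = p_0$, $H_1: p \ne p_0$ | $z = \frac{\bar{p} - p_0}{\sigma_{\bar{p}}}$, with $\sigma_{\bar{p}} = \sqrt{\frac{p_0 q_0}{n}}$, $q_0 = 1 - p_0$ | $\bar{p}_{cv} = p_0 \pm z_{\alpha/2}\sigma_{\bar{p}}$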
The hypotheses were $H_0: \eta \le 40000$, $H_1: \eta > 40000$ and are now $H_0: p \le 0.5$, $H_1: p > 0.5$. So $n = 250$, $p_0 = 0.5$, $\bar{p} = \frac{104}{250} = .416$, and I assumed that $\alpha = .05$.
$\sigma_{\bar{p}} = \sqrt{\frac{p_0 q_0}{n}} = \sqrt{\frac{(0.5)(0.5)}{250}} = \sqrt{.001} = .03162$ and $s_{\bar{p}} = \sqrt{\frac{\bar{p}\bar{q}}{n}} = \sqrt{\frac{(0.416)(0.584)}{250}} = \sqrt{.0009718} = .03117$.

a) Using a test ratio and a p-value. (2). $z = \frac{\bar{p} - p_0}{\sigma_{\bar{p}}} = \frac{.416 - .5}{.03162} = -2.66$. Since this is a right-sided test, $p\text{-value} = P(z \ge -2.66) = .5 + .4961 = .9961$. Since this is above .05, do not reject the null hypothesis.
[Figure: Normal curve with mean 0 and standard deviation 1. The area to the right of -2.66 is 0.9961.]
b) Using a critical value for $\bar{x}$, $\bar{p}$, s or the median as appropriate. (2). The only reasonable possibility is a critical value for $\bar{p}$. Since the alternative hypothesis is $p > .5$, we need a critical value above 0.5. $\bar{p}_{cv} = p_0 + z_{.05}\sigma_{\bar{p}} = .5 + 1.645(.03162) = .552$. Make a diagram of a Normal curve with a mean at 0.5 and a rejection zone above 0.552. Since $\bar{p} = .416$ is not in the rejection zone, do not reject the null hypothesis.

c) Using a confidence interval. (2, 3 or 5 depending on your level of chutzpah). If we do it the easy way, use a one-sided confidence interval in the same direction as the alternative hypothesis: $p \ge \bar{p} - z_\alpha s_{\bar{p}} = .416 - 1.645(.03117) = .365$. Since it is possible for the proportion to be both at least .365 and at most 0.5, the confidence interval does not contradict the null hypothesis, so do not reject it. Make a diagram with an almost Normal curve centered at .416. To represent the confidence interval, show a shaded area above .365. To represent the null hypothesis, show a second shaded area below .5. Note that they overlap. Actually, it is sufficient to show that $p_0 = .5$ is in the first shaded area.
If you have real nerve, try a confidence interval for the median. It says in the outline to use $x_k$, where $k = \frac{n + 1 - z_{\alpha/2}\sqrt{n}}{2}$, as the lower limit in a 2-sided confidence interval. The upper limit would be $x_{n-k+1}$. Since our alternative hypothesis is $H_1: \eta > 40000$, we want a lower limit in a 1-sided confidence interval. So we use $k = \frac{n + 1 - z_{.05}\sqrt{n}}{2} = \frac{250 + 1 - 1.645\sqrt{250}}{2} = 112.495$, which rounds down to 112. Our interval will be $\eta \ge x_{112}$, where $x_{112}$ is the 112th number when the numbers are put in order. We can say $P(\eta \ge x_{112}) \approx .95$, and if $x_{112}$ is above 40000, reject the null hypothesis.
d) (Extra Credit) Redo a), b) or c) assuming that the sample of 250 came from a population of 2000. In Section 8.7, the text says to use
$$s_{\bar{p}} = \sqrt{\frac{\bar{p}\bar{q}}{n}\cdot\frac{N - n}{N - 1}} = \sqrt{\frac{(0.416)(0.584)}{250}\cdot\frac{1750}{1999}} = \sqrt{.0008507} = .02917$$
in a confidence interval. By analogy,
$$\sigma_{\bar{p}} = \sqrt{\frac{p_0 q_0}{n}\cdot\frac{N - n}{N - 1}} = \sqrt{\frac{(.5)(.5)}{250}\cdot\frac{1750}{1999}} = \sqrt{.0008754} = .02959$$
would be used in test ratios or critical values. This comes from the variance of the Hypergeometric distribution.
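All four parts of this problem reduce to a handful of formulas, so they can be checked together. This sketch assumes Python's `statistics.NormalDist`; its $z_{.05} = 1.6449$ differs slightly from the table's 1.645, so the last digit can move. The median rank `k` is rounded down to stay conservative, as in part i of the take-home section.

```python
import math
from statistics import NormalDist

n, above, p0, alpha = 250, 104, 0.5, 0.05
N = 2000                                   # population size, used only in part d)
z_dist = NormalDist()
z_alpha = z_dist.inv_cdf(1 - alpha)        # one-sided z_.05, about 1.645

p_bar = above / n                          # .416
sigma_p = math.sqrt(p0 * (1 - p0) / n)     # uses p0: test ratio and critical value
s_p = math.sqrt(p_bar * (1 - p_bar) / n)   # uses p_bar: confidence interval

# a) test ratio and right-sided p-value
z_ratio = (p_bar - p0) / sigma_p
p_value = 1 - z_dist.cdf(z_ratio)

# b) critical value above p0 = .5
p_cv = p0 + z_alpha * sigma_p

# c) one-sided confidence interval p >= p_lower
p_lower = p_bar - z_alpha * s_p

# c) rank k for a one-sided lower bound on the median, rounded down
k = math.floor((n + 1 - z_alpha * math.sqrt(n)) / 2)

# d) finite population correction applied to both standard errors
fpc = math.sqrt((N - n) / (N - 1))
s_p_fpc = s_p * fpc
sigma_p_fpc = sigma_p * fpc
```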

ECO252 QBA2
FIRST EXAM
February 28 2005
TAKE HOME SECTION
Name: _________________________
Student Number and class: _________________________
IV. Do at least 3 problems (at least 7 each) (or do sections adding to at least 20 points; anything extra you do helps, and grades wrap around). Show your work! State $H_0$ and $H_1$ where appropriate. You have not done a hypothesis test unless you have stated your hypotheses, run the numbers and stated your conclusion. (Use a 95% confidence level unless another level is specified.) Answers without reasons are not usually acceptable.
1. (Dummeldinger) You are an automobile manufacturer and the EPA has just estimated that your 2005 Prejector model gets
35 miles per gallon on the highway. You wish to prove that the Prejector gets more than 35 mpg. 50 of the current model
are tested with the results below. To personalize the data below take the last digit of your student number, divide it by 10
and add it to the numbers below. (For example, Seymour Butz’s student number is 976502, so he will add 0.20 and change
the data to 44.64, 48.04, 37.57 etc. – but see the hint below, you do not need to write down the numbers that you are using,
just your computations.)
Miles per gallon
44.44  47.84  34.59  32.02  35.61  42.56  40.92  33.56  44.26  21.41
42.70  44.70  37.37  36.27  35.80  46.52  24.56  38.24  41.37  33.47
43.45  35.72  43.58  33.07  39.57  23.55  51.16  45.40  37.59  41.49
21.05  44.28  29.14  29.54  34.07  48.02  43.41  41.86  42.31  23.98
24.03  35.20  27.58  41.13  32.18  39.03  44.44  36.78  35.47  33.88

Hint: $n = 50$, $\sum x = 1860.17$, $\sum x^2 = 71904.65$, $\sum (x + a) = \sum x + na$ and
$\sum (x + a)^2 = \sum (x^2 + 2ax + a^2) = \sum x^2 + \sum 2ax + \sum a^2 = \sum x^2 + 2a\sum x + na^2$.
Assume that the Normal distribution applies to the data and use a 99% confidence level.
a. Find the sample mean and sample standard deviation of the mileages in your data, showing your work. (1)
b. State your null and alternative hypotheses (1)
c. Test the hypothesis using a test ratio (1)
d. Test the hypothesis using a critical value for a sample mean. (1)
e. Test the hypothesis using a confidence interval (1)
f. Find an approximate p-value for the null hypothesis. (1)
g. On the basis of your tests, is the EPA right? Why? (1)
h. Assume that the Normal distribution does not apply and, using the data as given above, test that the median is above 35.
(3)
i. (Extra credit) Again, use the data as given and do an approximate 99% 2-sided confidence interval for the median.
2. Once again, assume that the Normal distribution applies, but assume a population standard deviation of 7 and that we are testing whether the mean is below 36 mpg. (99% confidence level)
a. State your null and alternative hypotheses (1)
b. Find a p-value for the null hypothesis using the mean that you found in 1a. (1)
c. Create a power curve for the test. (6)
3.
a. Assume that you are testing the hypothesis 
 36 using the original data. Let p be the proportion of the data above
p  .5. Using a 99% confidence level find a critical
36, so that, according to the outline, your alternate hypothesis is
value for
p , how many items in the sample of 50 would have to be above 36 for you to reject the null hypothesis (This
answer should either say ‘between 0 and ?’ or ‘between ? and 50.’) (2)
b. Using the proportion of numbers above 36 in the original data, find a p-value for the null hypothesis. (1)
c. (Extra credit) Create a power curve for the test by using the alternate hypothesis in b and finding the power for other
values of p1 . (up to 6)
d. Assume that p = .5; how large a sample would you need to estimate the proportion above 36 with an error of .01? How much would you cut down the sample size if you used the proportion that you actually found? Illustrate how much the required sample size would fall if you lowered the confidence level. (3)
e. Use the proportion that you found in 3b) to create a 2-sided confidence interval for the proportion above 36. Does it
differ significantly from .5? Why? (2)
4.
a. Take the standard deviation that you found in 1), add the same quantity that you added in part 1) to it. (For example, Seymour Butz’s student number is 976502 and he found s = 7.12, so he will add 0.20 to it and use 7.32.)
b. Test the hypothesis that the standard deviation is 6. (99% confidence level) Use a test ratio. (2)
Find a p-value for your answer in 4a). (1)
c. Do a 99% confidence interval for the standard deviation (2)
d. (Extra credit) Redo 4a) using an appropriate confidence interval. (2)
e. (Extra credit) Find critical values for s in 4a). (1)
f. A bank's average default rate on loans is supposedly 7 per month. In the first month there are 13 defaults. Test the first
assertion assuming a Poisson distribution. Use a two-sided test with a 1% significance level. (2)
g. In 4f) find what values of x (the number of defaults in the first month) would enable you not to reject the null
hypothesis. (2)
h. (Extra credit) Assume that the bank, in fact, has an average default rate on loans of 9 per month, what is the probability
that you will fail to reject your null hypothesis that the mean is 7, using the ‘accept’ zone that you found in g)?
1. (Dummeldinger) You are an automobile manufacturer and the EPA has just estimated that your
2005 Prejector model gets 35 miles per gallon on the highway. You wish to prove that the
Prejector gets more than 35 mpg. 50 of the current model are tested with the results below. To
personalize the data below take the last digit of your student number, divide it by 10 and add it to
the numbers below. (For example, Seymour Butz’s student number is 976502, so he will add 0.20
and change the data to 44.64, 48.04, 37.57 etc. – but see the hint below, you do not need to write
down the numbers that you are using, just your computations.)
Miles per gallon
44.44  47.84  34.59  32.02  35.61  42.56  40.92  33.56  44.26  21.41
42.70  44.70  37.37  36.27  35.80  46.52  24.56  38.24  41.37  33.47
43.45  35.72  43.58  33.07  39.57  23.55  51.16  45.40  37.59  41.49
21.05  44.28  29.14  29.54  34.07  48.02  43.41  41.86  42.31  23.98
24.03  35.20  27.58  41.13  32.18  39.03  44.44  36.78  35.47  33.88

Hint: $n = 50$, $\sum x = 1860.17$, $\sum x^2 = 71904.65$, $\sum (x + a) = \sum x + na$ and $\sum (x + a)^2 = \sum x^2 + 2a\sum x + na^2$.
Assume that the Normal distribution applies to the data and use a 99% confidence level.
a. Find the sample mean and sample standard deviation of the mileages in your data, showing your work. (1)
Seymour wouldn’t use the formulas above, so he actually added all the numbers and their squares. He got $\sum x = 1861.17$ and $\sum x^2 = 71979.1$. If he had had the sense to use the formulas above, he would have found
$\sum (x + .02) = 1860.17 + 50(.02) = 1861.17$ and
$\sum (x + .02)^2 = 71904.65 + 2(.02)(1860.17) + 50(.02)^2 = 71904.65 + 74.41 + 0.02 = 71979.08$.
So I will use $\sum x = 1861.17$, $\sum x^2 = 71979.08$ and $n = 50$.
$$\bar{x} = \frac{\sum x}{n} = \frac{1861.17}{50} = 37.223$$
$$s^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1} = \frac{71979.08 - 50(37.223)^2}{49} = 55.133, \qquad s = \sqrt{55.133} = 7.425$$
$$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \sqrt{\frac{55.133}{50}} = 1.050$$
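Because the shift rules in the hint are exact algebra, they can be checked in a few lines. The sketch below is plain Python, with a = .02 chosen to match the worked example. It applies the rules and then computes $\bar{x}$, s and $s_{\bar{x}}$. Carrying full precision instead of rounding $\bar{x}$ to 37.223 gives s ≈ 7.423 rather than 7.425, a difference too small to change any conclusion. The last comparison confirms that adding a constant does not change s.

```python
import math

n = 50
sum_x, sum_x2 = 1860.17, 71904.65   # sums for the data as given (from the hint)
a = 0.02                            # constant added in the worked example

# Shift rules from the hint
sum_xa = sum_x + n * a                          # sum of (x + a)
sum_xa2 = sum_x2 + 2 * a * sum_x + n * a ** 2   # sum of (x + a)^2

def mean_and_sd(total, total_sq, count):
    """Sample mean and standard deviation from a sum and a sum of squares."""
    mean = total / count
    variance = (total_sq - count * mean ** 2) / (count - 1)
    return mean, math.sqrt(variance)

x_bar, s = mean_and_sd(sum_xa, sum_xa2, n)
_, s_unshifted = mean_and_sd(sum_x, sum_x2, n)  # adding a constant should not change s
s_xbar = s / math.sqrt(n)

print(round(x_bar, 3), round(s, 3), round(s_xbar, 3))
```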
b. State your null and alternative hypotheses (1)
$H_0: \mu \le 35$ and $H_1: \mu > 35$
The formula table says:

Interval for | Confidence Interval | Hypotheses | Test Ratio | Critical Value
Mean (σ unknown) | $\mu = \bar{x} \pm t_{\alpha/2} s_{\bar{x}}$, $DF = n - 1$ | $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$ | $t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}}$, with $s_{\bar{x}} = \frac{s}{\sqrt{n}}$ | $\bar{x}_{cv} = \mu_0 \pm t_{\alpha/2} s_{\bar{x}}$

Here $\alpha = .01$, $\mu_0 = 35$, $\bar{x} = 37.223$, $s_{\bar{x}} = 1.050$ and $n = 50$. For a one-sided test use $t^{49}_{.01} = 2.405$.
c. Test the hypothesis using a test ratio (1)
$t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}} = \frac{37.223 - 35}{1.050} = 2.117$. Make a diagram showing an almost-Normal curve with a vertical bar at $t = 0$. Show a 1% rejection zone above $t^{49}_{.01} = 2.405$. Since 2.117 is not in the rejection zone, do not reject the null hypothesis.
d. Test the hypothesis using a critical value for a sample mean. (1)
The alternative hypothesis tells us that we need a critical value above 35. The form will be $\bar{x}_{cv} = \mu_0 + t_\alpha s_{\bar{x}} = 35 + 2.405(1.050) = 37.525$. Make a diagram showing an almost-Normal curve with a vertical bar at $\mu_0 = 35$. Show a 1% rejection zone above 37.525. Since $\bar{x} = 37.223$ is not in the rejection zone, do not reject the null hypothesis.
e. Test the hypothesis using a confidence interval (1)
In view of the alternative hypothesis, the one-sided confidence interval will have the form $\mu \ge \bar{x} - t_\alpha s_{\bar{x}} = 37.223 - 2.405(1.050) = 34.70$. Make a diagram showing an almost-Normal curve with a vertical bar at $\bar{x} = 37.223$. Represent the confidence interval by shading the entire area above 34.70. Represent the null hypothesis, $H_0: \mu \le 35$, by shading the entire area below 35. Since these areas overlap, the confidence interval does not contradict the null hypothesis.
f. Find an approximate p-value for the null hypothesis. (1) Recall that $t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}} = \frac{37.223 - 35}{1.050} = 2.117$. Since we are doing a right-sided test, the p-value would be $P(t \ge 2.117)$. Remember that we have $n - 1 = 50 - 1 = 49$ degrees of freedom. If we try to locate 2.117 on the 49 df line of the t table, we find that it is between $t^{49}_{.025} = 2.010$ and $t^{49}_{.01} = 2.405$. Remember that, by the definition of the symbols, $P(t \ge t^{49}_{.025}) = .025$ and $P(t \ge t^{49}_{.01}) = .01$. Since 2.117 lies between the two values of t that we found on the table, $.025 > P(t \ge 2.117) > .01$, or $.025 > p\text{-value} > .01$. This is verified by running tAreaA (see the figure below). If we use $\alpha = .01$, we can see that the p-value is above the significance level, which means we do not reject the null hypothesis.
[Figure: t curve with 49 degrees of freedom and standard deviation 1.02105. The area to the right of 2.117 is 0.0197.]
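The t table only brackets the p-value between .01 and .025. Python's standard library has no t distribution, so this sketch integrates the Student t density numerically with Simpson's rule. The helpers `t_pdf` and `t_upper_tail` are our own, not a library API. The sketch reproduces the test ratio, the critical value, the one-sided interval and the p-value above; 2.405 is still taken from the t table.

```python
import math

def t_pdf(t, df):
    """Student t density; lgamma keeps the normalizing constant from overflowing."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def t_upper_tail(t0, df, upper=60.0, steps=20000):
    """P(T >= t0) by Simpson's rule on [t0, upper]; the area beyond `upper` is negligible here."""
    h = (upper - t0) / steps
    total = t_pdf(t0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(t0 + i * h, df)
    return total * h / 3

mu0, x_bar, s_xbar, df = 35, 37.223, 1.050, 49
t_01 = 2.405                                  # t_.01 with 49 df, from the t table

t_ratio = (x_bar - mu0) / s_xbar              # c) test ratio
x_cv = mu0 + t_01 * s_xbar                    # d) critical value above 35
lower_bound = x_bar - t_01 * s_xbar           # e) one-sided interval: mu >= lower_bound
p_value = t_upper_tail(t_ratio, df)           # f) right-sided p-value

print(round(t_ratio, 3), round(x_cv, 3), round(lower_bound, 2), round(p_value, 4))
```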
g. On the basis of your tests, is the EPA right? Why? (1) Since we have not rejected the null hypothesis, $H_0: \mu \le 35$, we cannot dispute the EPA’s statement that the mean gas mileage is 35 mpg.
h. Assume that the Normal distribution does not apply and, using the data as given above, test that the
median is above 35. (3) The original data is repeated with values above 35 starred.
44.44*  34.59   35.61*  40.92*  44.26*  42.70*  47.84*  32.02   42.56*  33.56
21.41   44.70*  37.37*  36.27*  35.80*  46.52*  24.56   38.24*  41.37*  33.47
43.45*  35.72*  43.58*  33.07   39.57*  23.55   51.16*  45.40*  37.59*  41.49*
21.05   44.28*  29.14   29.54   34.07   48.02*  43.41*  41.86*  42.31*  23.98
24.03   35.20*  27.58   41.13*  32.18   39.03*  44.44*  36.78*  35.47*  33.88

It looks to me as if 33 out of 50 are above 35. Using the outline, ‘The median mpg is above 35’ is written $\eta > 35$ and must be an alternative hypothesis since it does not contain an equality. Its opposite is $H_0: \eta \le 35$. Let us say that p is the proportion above 35. The table in the outline reads as below.
Hypotheses about a median | Hypotheses about a proportion (p is the proportion above η0) | Hypotheses about a proportion (p is the proportion below η0)
$H_0: \eta = \eta_0$, $H_1: \eta \ne \eta_0$ | $H_0: p = .5$, $H_1: p \ne .5$ | $H_0: p = .5$, $H_1: p \ne .5$
$H_0: \eta \le \eta_0$, $H_1: \eta > \eta_0$ | $H_0: p \le .5$, $H_1: p > .5$ | $H_0: p \ge .5$, $H_1: p < .5$
$H_0: \eta \ge \eta_0$, $H_1: \eta < \eta_0$ | $H_0: p \ge .5$, $H_1: p < .5$ | $H_0: p \le .5$, $H_1: p > .5$
So our hypotheses become $H_0: p \le .5$ and $H_1: p > .5$. There are three ways to go from here.
(α) According to the binomial table with $p = .5$ and $n = 50$, $p\text{-value} = P(x \ge 33) = 1 - P(x \le 32) = 1 - .98358 = .0164$. This is below $\alpha = .05$.
(β) If we do a conventional test of a proportion, $\bar{p} = \frac{33}{50} = .66$, $\sigma_{\bar{p}} = \sqrt{\frac{p_0 q_0}{n}} = \sqrt{\frac{(.5)(.5)}{50}} = .0707$, and $z = \frac{\bar{p} - p_0}{\sigma_{\bar{p}}} = \frac{.66 - .5}{.0707} = 2.263$. We can test this against $z_{.05} = 1.645$ or $z_{.01} = 2.327$, or we can say $p\text{-value} = P(z \ge 2.263) = .5 - P(0 \le z \le 2.26) = .5 - .4881 = .0119$.
(γ) From the outline, we could also use $z = \frac{2x - n}{\sqrt{n}} = \frac{2(33) - 50}{\sqrt{50}} = 2.263$. This is identical to (β).
These all lead to a rejection of our null hypothesis if $\alpha = .05$ or to a failure to reject the hypothesis if $\alpha = .01$.
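All three routes can be reproduced from the raw data. The sketch below is plain Python, using the data as given without a personalizing constant. It counts the values above 35, sums the exact binomial tail with `math.comb`, and computes the Normal approximation. It is an illustration, not the outline's method.

```python
import math

# The 50 mileages as given (before any personalizing constant is added)
mpg = [
    44.44, 47.84, 34.59, 32.02, 35.61, 42.56, 40.92, 33.56, 44.26, 21.41,
    42.70, 44.70, 37.37, 36.27, 35.80, 46.52, 24.56, 38.24, 41.37, 33.47,
    43.45, 35.72, 43.58, 33.07, 39.57, 23.55, 51.16, 45.40, 37.59, 41.49,
    21.05, 44.28, 29.14, 29.54, 34.07, 48.02, 43.41, 41.86, 42.31, 23.98,
    24.03, 35.20, 27.58, 41.13, 32.18, 39.03, 44.44, 36.78, 35.47, 33.88,
]
n = len(mpg)
eta0 = 35
x = sum(1 for v in mpg if v > eta0)   # number of values above the hypothesized median

# (alpha) exact p-value P(X >= x) for X ~ Binomial(n, .5); each outcome has probability 1/2^n
p_exact = sum(math.comb(n, k) for k in range(x, n + 1)) / 2 ** n

# (gamma) Normal approximation z = (2x - n) / sqrt(n), the same z as the proportion test in (beta)
z = (2 * x - n) / math.sqrt(n)
p_normal = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail area P(Z >= z)

print(x, round(p_exact, 4), round(z, 3), round(p_normal, 4))
```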
i. (Extra credit) Again, use the data as given and do an approximate 99% 2-sided confidence
interval for the median.
The original data is presented in order.

Rank  x       Rank  x       Rank  x
1     21.05   18    35.20   35    42.31
2     21.41   19    35.47   36    42.56
3     23.55   20    35.61   37    42.70
4     23.98   21    35.72   38    43.41
5     24.03   22    35.80   39    43.45
6     24.56   23    36.27   40    43.58
7     27.58   24    36.78   41    44.26
8     29.14   25    37.37   42    44.28
9     29.54   26    37.59   43    44.44
10    32.02   27    38.24   44    44.44
11    32.18   28    39.03   45    44.70
12    33.07   29    39.57   46    45.40
13    33.47   30    40.92   47    46.52
14    33.56   31    41.13   48    47.84
15    33.88   32    41.37   49    48.02
16    34.07   33    41.49   50    51.16
17    34.59   34    41.86

It says in the outline to use $x_k$, where $k = \frac{n + 1 - z_{\alpha/2}\sqrt{n}}{2}$, as the lower limit in a 2-sided confidence interval. The upper limit would be $x_{n-k+1}$. If we want a 2-sided interval with $\alpha = .01$, use $z_{.005} = 2.576$. So $k = \frac{50 + 1 - 2.576\sqrt{50}}{2} = 16.392$, which rounds down to 16, and $n - k + 1 = 50 - 16 + 1 = 35$. From the data above, $x_{16} = 34.07$ and $x_{35} = 42.31$. So we can say that $P(34.07 \le \eta \le 42.31) \approx .99$. Let us try to verify this by use of the binomial table with $p = .5$ and $n = 50$. If x is the number of observations below the median, the interval from $x_{16}$ to $x_{35}$ contains the median exactly when $16 \le x \le 34$, so its coverage is $P(16 \le x \le 34) = P(x \le 34) - P(x \le 15) = .99670 - .00330 = .9934$. The next smallest interval would be $x_{17}$ to $x_{34}$, and $P(17 \le x \le 33) = P(x \le 33) - P(x \le 16) = .99233 - .00767 = .9847$ is too small if we want to be conservative.
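The coverage check can be done exactly with binomial sums. In this sketch (plain Python), `binom_cdf` is our own helper and W is our name for the number of observations below the median. The interval from $x_{(k)}$ to $x_{(n-k+1)}$ contains the median exactly when $k \le W \le n - k$, which is the standard sign-test argument.

```python
import math

def binom_cdf(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))

n, z_005 = 50, 2.576
k = math.floor((n + 1 - z_005 * math.sqrt(n)) / 2)   # 16.392 rounds down to 16
upper_rank = n - k + 1                                 # 35

# W ~ Binomial(n, .5); x_(k) <= median <= x_(n-k+1) exactly when k <= W <= n - k
coverage = binom_cdf(n - k, n) - binom_cdf(k - 1, n)

# The next narrower interval, x_(k+1) to x_(n-k), needs k + 1 <= W <= n - k - 1
coverage_narrower = binom_cdf(n - k - 1, n) - binom_cdf(k, n)

print(k, upper_rank, round(coverage, 4), round(coverage_narrower, 4))
```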
2. Once again, assume that the Normal distribution applies, but assume a population standard deviation of 7 and that we are testing whether the mean is below 36 mpg. (99% confidence level)
a. State your null and alternative hypotheses (1) $H_0: \mu \ge 36$ and $H_1: \mu < 36$.
b. Find a p-value for the null hypothesis using the mean that you found in a. (1) Remember that $n = 50$ and that Seymour found $\bar{x} = 37.223$. If $\sigma = 7$, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{7}{\sqrt{50}} = \sqrt{\frac{49}{50}} = 0.9899$ and $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{37.223 - 36}{0.9899} = 1.24$. Since this is a left-sided test, $p\text{-value} = P(z \le 1.24) = .5 + .3925 = .8925$.
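A short Python check of the arithmetic in b), assuming $\sigma = 7$, $n = 50$ and Seymour's $\bar{x} = 37.223$ as given above. It uses statistics.NormalDist from the standard library in place of the z table and does not round z before computing the p-value, so the last digit may differ slightly from .8925.

```python
from math import sqrt
from statistics import NormalDist

n, sigma, mu0 = 50, 7.0, 36.0
xbar = 37.223                     # Seymour's sample mean, from the text

se = sigma / sqrt(n)              # standard error of the mean
z = (xbar - mu0) / se
p_value = NormalDist().cdf(z)     # left-sided test: P(Z <= z)
print(f"se = {se:.4f}, z = {z:.3f}, p-value = {p_value:.4f}")
```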
c. Create a power curve for the test. (6) Remember that we had $H_0: \mu \ge 36$ and $H_1: \mu < 36$, so that the critical value must be below 36. The formula table gives us $\bar{x}_{cv} = \mu_0 \pm z_{\alpha/2}\sigma_{\bar{x}}$, which becomes $\bar{x}_{cv} = \mu_0 - z_{\alpha}\sigma_{\bar{x}} = 36 - 2.327(0.9899) = 33.697$.
The diagram for the test is a Normal curve centered on a vertical line at 36. Below 36, the value 33.697 cuts off a 1% rejection zone, so we do not reject the null hypothesis if the sample mean is above 33.697. Let us use the following points for our operating characteristic curve: 36.00, 34.8, 33.697, 32.4 and 31.2. Since the difference between 36 and 33.697 is about 2.3, I used a spacing of roughly half that, 1.2, to pick my points. Recall that $\sigma_{\bar{x}} = 0.9899$. So
$\mu_1 = 36$: $\beta = P(\bar{x} \ge 33.697 \mid \mu = 36) = P\left(z \ge \frac{33.697 - 36}{0.9899}\right) = P(z \ge -2.33) = .5 + .4901 = .9901 = 1 - \alpha$, so Power $= 1 - \beta = 1 - .9901 = .0099 = \alpha$.
$\mu_1 = 34.8$: $\beta = P(\bar{x} \ge 33.697 \mid \mu = 34.8) = P\left(z \ge \frac{33.697 - 34.8}{0.9899}\right) = P(z \ge -1.11) = .5 + .3665 = .8665$, so Power $= 1 - \beta = 1 - .8665 = .1335$.
$\mu_1 = 33.697$: $\beta = P(\bar{x} \ge 33.697 \mid \mu = 33.697) = P\left(z \ge \frac{33.697 - 33.697}{0.9899}\right) = P(z \ge 0) = .5$, so Power $= 1 - \beta = 1 - .5 = .5$.
$\mu_1 = 32.4$: $\beta = P(\bar{x} \ge 33.697 \mid \mu = 32.4) = P\left(z \ge \frac{33.697 - 32.4}{0.9899}\right) = P(z \ge 1.31) = .5 - .4049 = .0951$, so Power $= 1 - \beta = 1 - .0951 = .9049$.
$\mu_1 = 31.2$: $\beta = P(\bar{x} \ge 33.697 \mid \mu = 31.2) = P\left(z \ge \frac{33.697 - 31.2}{0.9899}\right) = P(z \ge 2.52) = .5 - .4941 = .0059$, so Power $= 1 - \beta = 1 - .0059 = .9941$.
The power curve is a simple graph of these points. The x axis goes from about 31 to 36 and the y axis from zero to 1. The curve falls from almost 1 (100%) at 31.2 to .01 (1%) at 36.
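The power curve points can be generated the same way. This sketch assumes the text's values ($\mu_0 = 36$, $\sigma = 7$, $n = 50$, $z_{.01} = 2.327$) and keeps $\sigma_{\bar{x}}$ unrounded; power(mu1) is our own helper name and returns $P(\bar{x} < \bar{x}_{cv})$ when the true mean is mu1. Small differences from the table-based numbers above come from not rounding z to two places.

```python
from math import sqrt
from statistics import NormalDist

n, sigma, mu0 = 50, 7.0, 36.0
z_alpha = 2.327                     # z_.01 as used in the text
se = sigma / sqrt(n)
x_cv = mu0 - z_alpha * se           # reject H0 if the sample mean is below this

def power(mu1: float) -> float:
    """P(reject H0) = P(sample mean < x_cv) when the true mean is mu1."""
    return NormalDist(mu1, se).cdf(x_cv)

for mu1 in (36.0, 34.8, 33.697, 32.4, 31.2):
    print(f"mu1 = {mu1:6.3f}  beta = {1 - power(mu1):.4f}  power = {power(mu1):.4f}")
```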
3.
a. Assume that you are testing the hypothesis $\eta \ge 36$ using the original data. Let p be the proportion of the data above 36, so that, according to the outline, your alternate hypothesis is $p < .5$. Using a 99% confidence level, find a critical value for p: how many items in the sample of 50 would have to be above 36 for you to reject the null hypothesis? (This answer should either say ‘between 0 and ?’ or ‘between ? and 50.’) (2)
We had $H_0: \eta \ge 36$ and $H_1: \eta < 36$, which (according to the table on page 12) became $H_0: p \ge .5$ and $H_1: p < .5$. We need a critical value for a one-sided test that is below .5. We have already seen that $\sigma_p = \sqrt{\frac{p_0 q_0}{n}} = \sqrt{\frac{.5(.5)}{50}} = .0707$, so that $p_{cv} = p_0 - z_{\alpha}\sigma_p = .5 - 2.327(.0707) = .3354$. This is about 16 items out of 50, so that, if there are between 0 and 16 items above 36, we reject the null hypothesis. If you have more sense than I did and use the Binomial table instead, you will find that $P(x \le 16) = .00767$ is the highest probability below 1% on the $n = 50$, $p = .5$ part of the table.
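Both routes to the critical value in a) can be sketched in Python, assuming $n = 50$, $p_0 = .5$ and $z_{.01} = 2.327$ as above. The Normal lines follow the text; the Binomial line searches for the largest count c with $P(x \le c) < .01$, with binom_cdf a hypothetical helper standing in for the table.

```python
from math import comb, sqrt

n, p0, z_alpha = 50, 0.5, 2.327

# Normal approximation, as in the text
sigma_p = sqrt(p0 * (1 - p0) / n)
p_cv = p0 - z_alpha * sigma_p
print(f"sigma_p = {sigma_p:.4f}, p_cv = {p_cv:.4f}, about {p_cv * n:.1f} items")

def binom_cdf(c: int, n: int, p: float) -> float:
    """P(X <= c) for X ~ Binomial(n, p); stands in for the Binomial table."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(c + 1))

# Exact route: the largest count whose cumulative probability is still below 1%
cutoff = max(c for c in range(n + 1) if binom_cdf(c, n, p0) < 0.01)
print(f"reject H0 if 0 to {cutoff} items are above 36 "
      f"(P(X <= {cutoff}) = {binom_cdf(cutoff, n, p0):.5f})")
```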
b. Using the proportion of numbers above 36 in the original data, find a p-value for the null hypothesis. (1)
If you look at the numbers in order on page 14, you will see that 22 are below 36 and 28 are above 36. The proportion is thus $\bar{p} = \frac{28}{50} = .56$. Since this is a left-sided test, $p\text{-value} = P(\bar{p} \le .56) = P\left(z \le \frac{.56 - .5}{.0707}\right) = P(z \le 0.84) = .5 + .2995 = .7995$. The killjoys who used the Binomial table got a much more accurate value of $P(x \le 28) = .83888$.
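A quick comparison of the two p-values in b), assuming 28 of the 50 values are above 36. The exact Binomial tail is computed directly with math.comb; the Normal value keeps $z = .06/.0707$ unrounded, so it comes out near .80 rather than exactly .7995.

```python
from math import comb, sqrt
from statistics import NormalDist

n, x, p0 = 50, 28, 0.5            # 28 of the 50 values are above 36
p_bar = x / n
sigma_p = sqrt(p0 * (1 - p0) / n)

# Left-sided test, Normal approximation with an unrounded z
p_normal = NormalDist().cdf((p_bar - p0) / sigma_p)

# Exact Binomial P(X <= 28) when p = .5: count outcomes, divide by 2**n
p_exact = sum(comb(n, i) for i in range(x + 1)) / 2**n

print(f"Normal: {p_normal:.4f}, Binomial: {p_exact:.5f}")
```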
c. (Extra credit) Create a power curve for the test by using the alternate hypothesis in b and finding the power for other values of $p_1$. (up to 6)
Remember that the hypotheses are $H_0: p \ge .5$ and $H_1: p < .5$, and that $p_{cv} = .3354$. If the proportion is below .5, we will not reject the null hypothesis if $\bar{p}$ is above .3354. The halfway point between .5 and .3354 is .4177, which is about .08 below .5. I used .5, .42, .3353, .26 and .18. Note that I ignored the failure of everyone who did this question to recompute $\sigma_p$.
$p_1 = .5$: $\sigma_p = .0707$ and $\beta = P(\bar{p} \ge .3353 \mid p = .5) = P\left(z \ge \frac{.3353 - .5}{.0707}\right) = P(z \ge -2.33) = .5 + .4901 = .9901 = 1 - \alpha$, so Power $= 1 - \beta = 1 - .9901 = .0099 = \alpha$.
$p_1 = .42$: $\sigma_p = \sqrt{\frac{.42(.58)}{50}} = .0698$ and $\beta = P\left(z \ge \frac{.3353 - .42}{.0698}\right) = P(z \ge -1.21) = .5 + .3869 = .8869$, so Power $= 1 - \beta = 1 - .8869 = .1131$.
$p_1 = .3353$: $\sigma_p = \sqrt{\frac{.3353(.6647)}{50}} = .0668$ and $\beta = P\left(z \ge \frac{.3353 - .3353}{.0668}\right) = P(z \ge 0) = .5$, so Power $= 1 - \beta = 1 - .5 = .5$.
$p_1 = .26$: $\sigma_p = \sqrt{\frac{.26(.74)}{50}} = .0620$ and $\beta = P\left(z \ge \frac{.3353 - .26}{.0620}\right) = P(z \ge 1.21) = .5 - .3869 = .1131$, so Power $= 1 - \beta = 1 - .1131 = .8869$.
$p_1 = .18$: $\sigma_p = \sqrt{\frac{.18(.82)}{50}} = .0543$ and $\beta = P\left(z \ge \frac{.3353 - .18}{.0543}\right) = P(z \ge 2.86) = .5 - .4979 = .0021$, so Power $= 1 - \beta = 1 - .0021 = .9979$.
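The proportion power curve can be sketched the same way, assuming $p_{cv} = .3353$ as used in the calculations above and recomputing $\sigma_p$ at each $p_1$. power(p1) is our own helper name; it returns $P(\bar{p} < p_{cv})$ under a Normal approximation.

```python
from math import sqrt
from statistics import NormalDist

n, p_cv = 50, 0.3353                # critical proportion as used in the text

def power(p1: float) -> float:
    """P(sample proportion < p_cv) when the true proportion is p1,
    recomputing sigma_p = sqrt(p1 * (1 - p1) / n) at each p1."""
    sigma_p = sqrt(p1 * (1 - p1) / n)
    return NormalDist(p1, sigma_p).cdf(p_cv)

for p1 in (0.5, 0.42, 0.3353, 0.26, 0.18):
    print(f"p1 = {p1:.4f}  beta = {1 - power(p1):.4f}  power = {power(p1):.4f}")
```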
d. Assume that $p = .5$. How large a sample would you need to estimate the proportion above 36 with an error of .01? How much would you cut down the sample size if you used the proportion that you actually found? Illustrate how much the required sample size would fall if you lowered the confidence level. (3)
The outline says “The usually suggested formula is $n = \frac{pqz^2}{e^2}$, but since p is usually unknown, a conservative choice is to set $p = 0.5$.” This is the formula everyone forgets that we covered. Assume $\alpha = .01$. So $n = \frac{pqz^2}{e^2} = \frac{.5(.5)(2.576)^2}{.01^2} = 16589.44$, and we use 16590. We actually found $\bar{p} = \frac{28}{50} = .56$. If we use .56 instead, $n = \frac{.56(.44)(2.576)^2}{.01^2} = 16350.55$, and we use 16351. This is about 98.6% of the previous value, but values of p farther from .5 could bring considerable savings.
Now, if we switch from a 99% confidence level to a 95% confidence level, $n = \frac{.56(.44)(1.960)^2}{.01^2} = 9465.70$. We use 9466, which is 58% of our second value and 57% of our original sample size and thus represents a considerable saving.
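The sample size formula is a one-liner. The sketch below assumes $e = .01$ and rounds up with math.ceil, since a fractional observation has to become a whole one; sample_size is a hypothetical helper name.

```python
from math import ceil

def sample_size(p: float, z: float, e: float) -> int:
    """n = p q z^2 / e^2, rounded up to the next whole observation."""
    return ceil(p * (1 - p) * z**2 / e**2)

print(sample_size(0.5, 2.576, 0.01))    # conservative p = .5, 99% level
print(sample_size(0.56, 2.576, 0.01))   # observed p = .56, 99% level
print(sample_size(0.56, 1.960, 0.01))   # observed p = .56, 95% level
```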
e. Use the proportion that you found in 3b) to create a 2-sided confidence interval for the proportion above 36. Does it differ significantly from .5? Why? (2)
$\bar{p} = \frac{28}{50} = .56$, $p = \bar{p} \pm z_{\alpha/2} s_p$ and $s_p = \sqrt{\frac{\bar{p}\bar{q}}{n}} = \sqrt{\frac{.56(.44)}{50}} = .0702$. A 99% confidence interval would be $.56 \pm 2.576(.0702) = .56 \pm .181$, or .379 to .741. If you used a 95% confidence level instead, you would get $.56 \pm 1.960(.0702) = .56 \pm .138$, or .422 to .698. Because the confidence interval includes .5, the difference is not significant at the 95% or 99% level.
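A sketch of the interval in e), assuming $\bar{p} = 28/50$ and the two z values used above; proportion_ci is our own helper name. The tuple[float, float] annotation needs Python 3.9 or later.

```python
from math import sqrt

n, x = 50, 28
p_bar = x / n
s_p = sqrt(p_bar * (1 - p_bar) / n)      # about .0702

def proportion_ci(z: float) -> tuple[float, float]:
    """Two-sided interval p_bar +/- z * s_p."""
    return p_bar - z * s_p, p_bar + z * s_p

for label, z in (("99%", 2.576), ("95%", 1.960)):
    lo, hi = proportion_ci(z)
    print(f"{label}: {lo:.3f} to {hi:.3f}; contains .5: {lo <= 0.5 <= hi}")
```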
4.
a. Take the standard deviation that you found in 1) and add to it the same quantity that you added in part 1). (For example, Seymour Butz’s student number is 976502 and he found $s = 7.12$, so he added 0.20 to it and used 7.32.) (No credit.)
b. Test the hypothesis that the standard deviation is 6. (99% confidence level) Use a test ratio. (2) Find a p-value for your answer in 4a). (1)
Our hypotheses are $H_0: \sigma = 6$ and $H_1: \sigma \ne 6$. Since $n = 50$, we are in a large-sample situation. The outline says $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$ and, for large samples, $z = \sqrt{2\chi^2} - \sqrt{2DF - 1}$. Assume $\alpha = .01$ and $DF = 50 - 1 = 49$. So $\chi^2 = \frac{49(7.425)^2}{6^2} = 75.0389$ and $z = \sqrt{2(75.0389)} - \sqrt{2(49) - 1} = \sqrt{150.0778} - \sqrt{97} = 12.2506 - 9.84886 = 2.40$. For a 2-sided test, make a Normal curve with a vertical line at the center where $z = 0$. Rejection zones will be above $z_{.005} = 2.576$ and below $-z_{.005} = -2.576$. Since 2.40 is between the critical values, do not reject the null hypothesis. Note that at a 5% significance level, where the critical values are $\pm 1.960$, the hypothesis would be rejected. Since this is a 2-sided test, we use $p\text{-value} = 2P(z \ge 2.40) = 2(.5 - .4918) = .0164$.
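The test ratio in b) can be checked with a few lines of Python. This assumes $s = 7.425$, $n = 50$ and $\sigma_0 = 6$ as in the text, and it uses statistics.NormalDist for both the critical value and the p-value instead of the z table.

```python
from math import sqrt
from statistics import NormalDist

n, s, sigma0 = 50, 7.425, 6.0
df = n - 1

chi2 = df * s**2 / sigma0**2                    # about 75.04
z = sqrt(2 * chi2) - sqrt(2 * df - 1)           # large-sample test ratio
p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided p-value

z_crit = NormalDist().inv_cdf(1 - 0.01 / 2)     # about 2.576 for alpha = .01
print(f"chi2 = {chi2:.4f}, z = {z:.3f}, p-value = {p_value:.4f}, "
      f"reject: {abs(z) > z_crit}")
```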
c. Do a 99% confidence interval for the standard deviation. (2) The outline gives $\frac{s\sqrt{2DF}}{z_{\alpha/2} + \sqrt{2DF}} \le \sigma \le \frac{s\sqrt{2DF}}{-z_{\alpha/2} + \sqrt{2DF}}$. If $\alpha = .01$, $z_{\alpha/2} = z_{.005} = 2.576$ and $\sqrt{2DF} = \sqrt{2(49)} = \sqrt{98} = 9.899$. The interval is thus $\frac{7.425(9.899)}{2.576 + 9.899} \le \sigma \le \frac{7.425(9.899)}{-2.576 + 9.899}$, or $\frac{73.500}{12.475} \le \sigma \le \frac{73.500}{7.323}$, or $5.892 \le \sigma \le 10.037$.
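The outline's interval for $\sigma$, written as a small Python function and evaluated at $z = 2.576$ and, for comparison, at $z = 1.960$. sd_interval is our own helper name, and $s = 7.425$, $DF = 49$ are taken from the text; using the exact $\sqrt{98}$ rather than 9.899 changes only the fourth decimal place.

```python
from math import sqrt

def sd_interval(s: float, df: int, z: float) -> tuple[float, float]:
    """s*sqrt(2DF)/(z + sqrt(2DF)) <= sigma <= s*sqrt(2DF)/(-z + sqrt(2DF))."""
    root = sqrt(2 * df)
    return s * root / (z + root), s * root / (-z + root)

lo99, hi99 = sd_interval(7.425, 49, 2.576)
lo95, hi95 = sd_interval(7.425, 49, 1.960)
print(f"99%: {lo99:.3f} to {hi99:.3f}; contains 6: {lo99 <= 6 <= hi99}")
print(f"95%: {lo95:.3f} to {hi95:.3f}; contains 6: {lo95 <= 6 <= hi95}")
```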
d. (Extra credit) Redo 4b) using an appropriate confidence interval. (2) $H_0: \sigma = 6$ and $H_1: \sigma \ne 6$, and $\alpha = .05$. The interval becomes $\frac{7.425(9.899)}{1.960 + 9.899} \le \sigma \le \frac{7.425(9.899)}{-1.960 + 9.899}$, or $\frac{73.500}{11.859} \le \sigma \le \frac{73.500}{7.939}$, or 6.198 to 9.258. Since $\sigma = 6$ is not on this interval, reject the null hypothesis.
e. (Extra credit) Find critical values for s in 4a). (1) The easiest way to do this is to use the formula sheet, which says $s_{cv} = \frac{\sigma_0\sqrt{2DF}}{\pm z_{\alpha/2} + \sqrt{2DF}}$. These become $s_{cv} = \frac{6(9.899)}{1.960 + 9.899} = \frac{59.394}{11.859} = 5.008$ and $s_{cv} = \frac{6(9.899)}{-1.960 + 9.899} = \frac{59.394}{7.939} = 7.481$. Note that $s = 7.425$ is between these two values, so that we cannot reject the null hypothesis.
f. A bank's average default rate on loans is supposedly 7 per month. In the first month there are 13 defaults. Test the first assertion assuming a Poisson distribution. Use a two-sided test with a 1% significance level. (2)
This is essentially problem B4, which was assigned. $\alpha = .01$. If we assume that the distribution is Poisson, our hypotheses are $H_0$: Poisson(7) and $H_1$: not Poisson(7). Though it is possible to put together a rejection region, the easiest way to do this is to use the Poisson(7) table and a p-value approach. Since this is a 2-sided test, we double p-values. If we look up the probability that x is 13 or larger in the Poisson table, we find $p\text{-value} = 2P(x \ge 13) = 2[1 - P(x \le 12)] = 2(1 - .9730) = 2(.0270) = .0540$. Since $p\text{-value} > \alpha$, do not reject $H_0$.
g. In 4f) find what values of x (the number of defaults in the first month) would enable you not to reject the null hypothesis. (2)
Try $x = 14$: $p\text{-value} = 2P(x \ge 14) = 2[1 - P(x \le 13)] = 2(1 - .98719) = 2(.01281) = .0256$.
If $x = 15$: $p\text{-value} = 2P(x \ge 15) = 2[1 - P(x \le 14)] = 2(1 - .99428) = 2(.00572) = .0114$.
If $x = 16$: $p\text{-value} = 2P(x \ge 16) = 2[1 - P(x \le 15)] = 2(1 - .99759) = 2(.00241) = .00482$.
If $x = 2$: $p\text{-value} = 2P(x \le 2) = 2(.02964) = .0593$.
If $x = 1$: $p\text{-value} = 2P(x \le 1) = 2(.00730) = .0146$.
If $x = 0$: $p\text{-value} = 2P(x \le 0) = 2(.00091) = .00182$.
If $\alpha = .01$, the p-value is below the significance level for $x \ge 16$ and $x = 0$. (In a 2-sided test, the trick is to look for probabilities below $\alpha/2 = .005$ and above $1 - \alpha/2 = .995$.) So we do not reject the null hypothesis if $1 \le x \le 15$.
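Parts f) and g) can both be reproduced without the Poisson table. In this sketch poisson_cdf and two_sided_p are hypothetical helpers; two_sided_p doubles the smaller tail (capped at 1), which matches the text's $2P(x \ge k)$ for large x and $2P(x \le k)$ for small x. The search range 0 to 39 is an arbitrary assumption that comfortably covers the Poisson(7) distribution.

```python
from math import exp, factorial

def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam); returns 0 when k < 0."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))

def two_sided_p(x: int, lam: float) -> float:
    """Double the smaller tail probability, capped at 1."""
    lower = poisson_cdf(x, lam)            # P(X <= x)
    upper = 1 - poisson_cdf(x - 1, lam)    # P(X >= x)
    return min(1.0, 2 * min(lower, upper))

lam, alpha = 7.0, 0.01

# f) 13 defaults in the first month
p13 = two_sided_p(13, lam)
print(f"x = 13: p-value = {p13:.4f}, reject H0: {p13 < alpha}")

# g) counts that do not lead to rejection
accept = [x for x in range(40) if two_sided_p(x, lam) >= alpha]
print(f"do not reject H0 for {accept[0]} <= x <= {accept[-1]}")
```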
h. (Extra credit) Assume that the bank, in fact, has an average default rate on loans of 9 per month. What is the probability that you will fail to reject your null hypothesis that the mean is 7, using the ‘accept’ zone that you found in g)?
This is actually the easiest problem on the exam. If the mean is 9, $\beta = P(1 \le x \le 15) = P(x \le 15) - P(x \le 0) = .97796 - .00012 = .97784$. The power is $1 - .97784 = .02216$, or about 2.2%, which is about as bad as it gets. This takes us back to my statement in lecture that a 1-period trial has very little power. If, instead, you let the trial run for 12 months, the mean would be $\lambda = 12(7) = 84$ and the standard deviation would be $\sigma = \sqrt{84}$. The critical values would be $x_{cv} = 84 \pm 2.576\sqrt{84}$. It should be very easy to show that if, in fact, $\lambda = 12(9) = 108$, $\beta$ is much lower.
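The 1-month $\beta$ in h) and the claim about a 12-month trial can be checked with a short sketch. The 12-month part is our own illustration under stated assumptions: it uses a Normal approximation to the Poisson with no continuity correction and the acceptance zone $84 \pm 2.576\sqrt{84}$ from the text. Under those assumptions it gives a $\beta$ of roughly one half, far below .97784.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))

# One month: accept zone 1 <= x <= 15 from g), true mean 9
beta_1 = poisson_cdf(15, 9.0) - poisson_cdf(0, 9.0)
print(f"1 month:   beta = {beta_1:.5f}, power = {1 - beta_1:.4f}")

# Twelve months, Normal approximation: H0 mean 12*7 = 84, true mean 12*9 = 108
z = 2.576
lo, hi = 84 - z * sqrt(84), 84 + z * sqrt(84)
h1_dist = NormalDist(108, sqrt(108))
beta_12 = h1_dist.cdf(hi) - h1_dist.cdf(lo)
print(f"12 months: accept {lo:.2f} to {hi:.2f}, beta = {beta_12:.3f}")
```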