251y0711
2/20/07
ECO251 QBA1
FIRST EXAM
February 21, 2007
Name: ___KEY__________________
Student Number: _________________________
Class Hour: _____________________
Remember – Neatness, or at least legibility, counts. In most non-multiple-choice questions an answer
needs a calculation or short explanation to count.
Part I. (7 points)
The following numbers are a sample and represent the prices of regular gasoline at 11 gas stations:
2.28, 2.38, 2.50, 2.42, 2.34, 2.38, 2.44, 2.48, 2.38, 2.65, 2.66
Compute the following: Show your work!
a) The Median (1)
b) The Standard Deviation (3)
c) The 4th quintile (2)
d) The Coefficient of variation (1)
Solution: The numbers in order, with $n = 11$, are shown below with their squares.

Index     x        x^2
x1       2.28     5.1984
x2       2.34     5.4756
x3       2.38     5.6644
x4       2.38     5.6644
x5       2.38     5.6644
x6       2.42     5.8564
x7       2.44     5.9536
x8       2.48     6.1504
x9       2.50     6.2500
x10      2.65     7.0225
x11      2.66     7.0756
Total   26.91    65.9757

a) $p(n+1) = .5(12) = 6.0$. The middle number is the 6th number, which is 2.42. If you really want to get formal, $x_{.50} = x_6 + 0(x_7 - x_6) = 2.42 + 0(2.44 - 2.42) = 2.42$.

b) $\bar{x} = \frac{\sum x}{n} = \frac{26.91}{11} = 2.44636$,
$s^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1} = \frac{65.9757 - 11(2.44636)^2}{11-1} = \frac{0.144250254}{10} = 0.014425025$. So $s = \sqrt{0.014425025} = 0.12010$.
Calculator got .1200227 without rounding.

c) The 4th quintile has $\frac{4}{5}$ or 80% below it. $p(n+1) = .8(12) = 9.60$. So $a = 9$ and $.b = 0.60$.
$x_{1-p} = x_a + .b(x_{a+1} - x_a)$. So $x_{1-.80} = x_{.20} = x_9 + 0.60(x_{10} - x_9) = 2.50 + 0.60(2.65 - 2.50) = 2.59$.

d) $C = \frac{s}{\bar{x}} = \frac{0.12010}{2.44636} = 0.04909$ or 4.91%.

Note that the mean, median and fourth quintile must be between 2.28 and 2.66. In the variance computation, excessive rounding can give you a negative variance. $s^2$ cannot be negative.
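The Part I arithmetic can be checked with a short script. This is a minimal sketch, not part of the original key: the helper name `fractile` is hypothetical, and it assumes the position rule $p(n+1) = a.b$ with value $x_a + .b(x_{a+1} - x_a)$ used in the solution. Because it carries the unrounded mean, it lands on the calculator's value for $s$ (about .12002) rather than the hand-rounded 0.12010.

```python
import math

# Gas prices from Part I (a sample of n = 11 stations).
prices = [2.28, 2.38, 2.50, 2.42, 2.34, 2.38, 2.44, 2.48, 2.38, 2.65, 2.66]


def fractile(data, p):
    """Value with proportion p below it, using position p(n+1) = a.b.

    Hypothetical helper: interpolates x_a + .b * (x_{a+1} - x_a).
    """
    xs = sorted(data)
    pos = p * (len(xs) + 1)
    a = int(pos)   # whole part of the position
    b = pos - a    # fractional part of the position
    if a < 1:
        return xs[0]
    if a >= len(xs):
        return xs[-1]
    return xs[a - 1] + b * (xs[a] - xs[a - 1])


n = len(prices)
mean = sum(prices) / n
# Computational formula: (sum of x^2 - n * mean^2) / (n - 1).
variance = (sum(x * x for x in prices) - n * mean ** 2) / (n - 1)
std_dev = math.sqrt(variance)

median = fractile(prices, 0.50)
fourth_quintile = fractile(prices, 0.80)
coeff_var = std_dev / mean

print(median, round(std_dev, 5), round(fourth_quintile, 2), round(coeff_var, 4))
```

Keeping the mean unrounded until the last step is the design choice here; it is also the simplest guard against the negative-variance problem mentioned in the note.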
Part II. (At least 35 points – 2 points each unless marked - Parentheses give points on individual
questions. Brackets give cumulative point total.) Exam is normed on 50 points.
1. The difference between cumulative and ordinary frequency distributions is that the cumulative frequency
distribution shows the number of observations which are:
a) greater than particular values, whereas the ordinary frequency distribution shows the number of
observations in each class interval.
b)* less than particular values, whereas the ordinary frequency distribution shows the number
of observations in each class interval.
c) in each class interval less than particular values, whereas the ordinary frequency
distribution shows the number of observations less than particular values.
d) in each class interval less than particular values, whereas the ordinary frequency
distribution shows the number of observations greater than particular values.
e) none of the above.
2. Mark the variables below as qualitative or categorical (A), quantitative and continuous (B1) or
quantitative and discrete (B2) (1 each)
a) atmospheric pressure. B1
b) method of contraception. A
c) expenditure per pupil. B1
d) Fahrenheit temperature. B1
e) Number of murders in Philadelphia over a year. B2
[7]
Exhibit 1: Given below is the stem-and-leaf display representing the amount of oil in gallons (with
leaves in gallons) used by a sample of 25 emergency generators during a power outage.
5 | 3.0 7.2
6 | 2.1 2.4 3.0 5.5 7.3 7.8 8.6 8.8 8.8
7 | 0.2 3.1 3.3 6.2 7.7 8.2 8.8
8 | 1.0 2.5 4.5 6.8
9 | 2.8 7.1 7.5
3. In Exhibit 1, if an ogive showing relative frequency is constructed using 50.0 to under 60.0 as the first
class, what will be the height of the point above 70 on the x axis?
[9]
Answer: The interval 60-70 has 9 items in it and the interval before it has 2, so the cumulative frequency is
$F = 11$ and the relative cumulative frequency is $F_{rel} = \frac{F}{n} = \frac{11}{25} = 0.44$.
4. In Exhibit 1 find the first quartile of amount of oil used.
[11]
Answer: position $= p(n+1) = .25(26) = 6.5$. The 6th number is 65.5 and the 7th number is 67.3. So $a = 6$
and $.b = 0.50$.
$x_{1-p} = x_a + .b(x_{a+1} - x_a)$. So $x_{1-.25} = x_{.75} = x_6 + 0.50(x_7 - x_6) = 65.5 + 0.50(67.3 - 65.5) = 66.4$
or simply $\frac{65.5 + 67.3}{2} = 66.4$.
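Questions 3 and 4 can be checked against the raw values. This is a sketch that assumes the 25 observations read off the stems and leaves of Exhibit 1 (stem 5 with leaf 3.0 is 53.0, and so on); the variable names are illustrative only.

```python
# Exhibit 1 values, read from the stem-and-leaf display (stem*10 + leaf).
oil = [53.0, 57.2,
       62.1, 62.4, 63.0, 65.5, 67.3, 67.8, 68.6, 68.8, 68.8,
       70.2, 73.1, 73.3, 76.2, 77.7, 78.2, 78.8,
       81.0, 82.5, 84.5, 86.8,
       92.8, 97.1, 97.5]
n = len(oil)

# Question 3: height of the relative-frequency ogive above 70.
below_70 = sum(1 for x in oil if x < 70.0)
rel_cum_freq = below_70 / n

# Question 4: first quartile by the position rule p(n+1) = a.b.
xs = sorted(oil)
pos = 0.25 * (n + 1)
a = int(pos)
b = pos - a
q1 = xs[a - 1] + b * (xs[a] - xs[a - 1])

print(below_70, rel_cum_freq, q1)
```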
5. Using the data in Exhibit 1, assume that the data are to be presented in 7 classes. Show how you would
decide what class interval to use and list the classes below with their frequencies. (5)
[16]
Class                    Frequency
A __ to under __            __
B __ to under __            __
C __ to under __            __
D __ to under __            __
E __ to under __            __
F __ to under __            __
G __ to under __            __
Answer: The numbers lie between 53.0 and 97.5. So the width will be something slightly above
$w = \frac{97.5 - 53.0}{7} = 6.3571$. You could use 6.36 or 7 or, perhaps, 8. I will try 8.
Class                    Frequency
A 50 to under 58             2
B 58 to under 66             4
C 66 to under 74             8
D 74 to under 82             5
E 82 to under 90             3
F 90 to under 98             3
G 98 to under 106            0
This didn’t work because the last class was empty. So I will try 7.
Class                    Frequency
A 50 to under 57             1
B 57 to under 64             4
C 64 to under 71             7
D 71 to under 78             4
E 78 to under 85             5
F 85 to under 92             1
G 92 to under 99             3
Total                       25
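The trial-and-error in this answer (try a width, count, check the last class) can be sketched in a few lines. This is an illustration only: `frequency_table` is a hypothetical helper, the starting point of 50 is the one chosen in the answer, and the data are the Exhibit 1 values read from the stem-and-leaf display.

```python
oil = [53.0, 57.2,
       62.1, 62.4, 63.0, 65.5, 67.3, 67.8, 68.6, 68.8, 68.8,
       70.2, 73.1, 73.3, 76.2, 77.7, 78.2, 78.8,
       81.0, 82.5, 84.5, 86.8,
       92.8, 97.1, 97.5]


def frequency_table(data, start, width, classes):
    """Count values in 'start to under start+width' style classes."""
    counts = [0] * classes
    for x in data:
        k = int((x - start) // width)   # class index for this value
        if 0 <= k < classes:
            counts[k] += 1
    return counts


# The width must be slightly above (max - min) / number of classes.
min_width = (max(oil) - min(oil)) / 7

counts_width_8 = frequency_table(oil, start=50, width=8, classes=7)
counts_width_7 = frequency_table(oil, start=50, width=7, classes=7)

print(round(min_width, 4))
print(counts_width_8)   # last class is empty, so 8 is too wide
print(counts_width_7)
```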
6. If a frequency distribution is skewed to the left, which of the following measures is likely to have the
largest value?
[18]
a) mean
b) median
c) *mode
d) All of the above will be almost the same size.
e) the parameter
f) It is impossible to tell unless we know whether we are dealing with a sample or a population.
Explanation: The usual ordering has the median between the mean and the mode. The mean will be pulled
down relative to the mode, so the median would lie below the mode too and the largest value is the mode.
7. A list of the countries that are members of the European Union in order of their GDP per capita is an
example of
[20]
a) *Ordinal data.
b) Nominal data.
c) Interval data.
d) Ratio data.
e) None of the above.
8. A frequency distribution is of unknown shape and consists of 600 observations with a mean of 162 and a
standard deviation of 12.
a) What is the minimum number of observations that must fall between 138 and 186?
Answer: 138 is 24 below the mean. 24 is twice 12, so the range 138 to 186 is two standard deviations above
and below the mean. According to the Tchebyschev inequality, the largest proportion of observations that
could be more than $k = 2$ standard deviations from the mean is $\frac{1}{k^2} = \frac{1}{4}$. So the interval must contain at
least three quarters of the observations or 450 out of 600.
b) What is the maximum number of observations that could be above 210?
[24]
Answer: 210 is 48 above the mean. This is $k = 4$ standard deviations. So the maximum proportion is
$\frac{1}{k^2} = \frac{1}{16}$. This would be (rounding down) 37 out of 600 observations. (Actually there is a 1-tailed version
that says $P(x - \mu \ge k\sigma) \le \frac{1}{1+k^2} = \frac{1}{17}$. This would give us 35 observations.)
c) How would you change your answers to a) and b) if you found that the distribution was symmetrical and
unimodal? (3)
The Empirical rule says 68% within one standard deviation of the mean, 95% within
two and almost all (99.7%) within three. This implies about 570 in 138-186 and at most 2 above 198. There
are unlikely to be any above 210.
[27]
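The counts in question 8 follow directly from the two bounds. The sketch below assumes the two-sided Tchebyschev bound $1/k^2$ and the one-tailed bound $1/(1+k^2)$ quoted in the answer; the function names are made up for the illustration.

```python
import math

n, mean, sd = 600, 162, 12


def two_sided_tail(k):
    """Largest proportion more than k standard deviations from the mean."""
    return 1 / k ** 2


def one_sided_tail(k):
    """Largest proportion more than k standard deviations above the mean."""
    return 1 / (1 + k ** 2)


# a) 138 to 186 is k = 2 standard deviations either side of the mean.
k_a = (186 - mean) / sd
min_inside = n * (1 - two_sided_tail(k_a))

# b) 210 is k = 4 standard deviations above the mean.
k_b = (210 - mean) / sd
max_above_two_sided = math.floor(n * two_sided_tail(k_b))
max_above_one_sided = math.floor(n * one_sided_tail(k_b))

print(min_inside, max_above_two_sided, max_above_one_sided)
```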
9. You have a deck of 52 cards consisting of ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, jack, king and queen of hearts,
clubs, diamonds and spades. Explain how you could divide the deck into 2 classes that are: (1 each)
a) collectively exhaustive but not mutually exclusive; This is pretty open ended. For this one we could try
cards up to and including 6s ($\le 6$) and cards above 5 ($\ge 6$). 6 would be in both groups.
b) mutually exclusive but not collectively exhaustive; Spades as one group, clubs as the other. All red cards
have been forgotten. We could also try cards below 6 and cards above 6 (and forget 6).
c) both mutually exclusive and collectively exhaustive. Red cards as one group, Black as the other. Or
maybe face cards as one, non-face cards as the other. We could also try cards up
to and including 6s ($\le 6$) and cards above 6 ($> 6$). [30]
10. In ECO 252 you will learn to test a null hypothesis. A null hypothesis is a quantitative statement about
a population that can be disproved. The null hypothesis must meet three requirements: First, it must contain
$=$, $\le$ or $\ge$; Second, it must include a parameter or parameters and Third, it must contain reasonable values
for the parameter. Consider the following: (i)   3, (ii)   3, (iii) x  2, (iv) s  5, (v)   3.
The following could be null hypotheses:
a) (iii), (iv) and (v).
b) (i), (ii) and (v).
c)* (i) and (ii).
d) (i) only.
e) all of the above
f) None of the above.
[32]
Explanation: (iii) x  2 and (iv) s  5 involve sample statistics, not population parameters. (v)   3
involves a parameter, but a standard deviation cannot be negative.
11. Which of the following statements about the median is not true?
a) *It is more affected by extreme values than the arithmetic mean.
b) It is a measure of central tendency
c) It is equal to the second quartile.
d) It is equal to the mode in symmetrical, unimodal distributions.
e) All of the above are true.
12. In a set of numerical data the second quartile is always halfway between the first and third quartile.
The above statement is: (1)
a) True
b) *False
Explanation: It’s between them all right but only halfway if the distribution is symmetrical.
ECO251 QBA1
FIRST EXAM
February 21, 2007
TAKE HOME SECTION
Name: _________________________
Student Number: _________________________
Throughout this exam show your work! Please indicate clearly what sections of the problem you are
answering and what formulas you are using. Turn this in with your in-class exam.
Part IV. Do all the Following (12+ Points). These are based on problems by Edward J. Kane. Show your
work!
1. In May 1997 Forbes Magazine provided data on the salaries of 50 CEOs. These were arranged by Allen
L. Webster to give the table below. Amounts are in thousands. Treat these data as a sample. Personalize the
data below by adding the six digits of your student number to the last 6 frequencies. For example,
Seymour Butz’s student number is 876509 so he adds 8 to the second frequency and 7 to the third frequency,
etc., and uses {9, 19, 17, 14, 9, 3, and 14} (adding to 85). You may check your work on the computer, but
what is turned in should look as if it had all been done by hand.
Row   Salary in Thousands     Frequency
1       90 to under  440          9
2      440 to under  790         11
3      790 to under 1140         10
4     1140 to under 1490          8
5     1490 to under 1840          4
6     1840 to under 2190          3
7     2190 to under 2540          5

a. Calculate the Cumulative Frequency (0.5)
b. Calculate the Mean (0.5)
c. Calculate the Median (1)
d. Calculate the Mode (0.5)
e. Calculate the Variance (1.5)
f. Calculate the Standard Deviation (1)
g. Calculate the Interquartile Range (1.5)
h. Calculate a Statistic showing Skewness and interpret it (1.5)
i. Make a frequency polygon of the data (Neatness Counts!)(1)
j. Extra credit: Put a (horizontal) box plot below the frequency chart using the same horizontal scale (1)
Note that unreasonable answers are answers where the mean, median, mode, first quartile and third
quartile do not fall between 90 and 2540.
Solution using the original numbers: If we use the original numbers and either the computational method
(Columns 1-7) or the definitional method (Columns 1-5, 8-11), we get the following for the frequencies.
      (1)                   (2)   (3)    (4)     (5)        (6)            (7)
Row   Class                  f     F      x       fx        fx^2           fx^3
1       90 to under  440     9     9     265     2385      632025     1.67487E+08
2      440 to under  790    11    20     615     6765     4160475     2.55869E+09
3      790 to under 1140    10    30     965     9650     9312250     8.98632E+09
4     1140 to under 1490     8    38    1315    10520    13833800     1.81914E+10
5     1490 to under 1840     4    42    1580     6320     9985600     1.57772E+10
6     1840 to under 2190     3    45    1845     5535    10212075     1.88413E+10
7     2190 to under 2540     5    50    2110    10550    22260500     4.69697E+10
Total                       50                  51725    70396725    111492128375

        (8)              (9)                (10)                  (11)
Row   $x-\bar{x}$    $f(x-\bar{x})$    $f(x-\bar{x})^2$      $f(x-\bar{x})^3$
1       -769.5          -6925.5            5329172            -4100798046
2       -419.5          -4614.5            1935783             -812060864
3        -69.5           -695.0              48303               -3357024
4        280.5           2244.0             629442              176558481
5        545.5           2182.0            1190281              649298286
6        810.5           2431.5            1970731             1597277273
7       1075.5           5377.5            5783501             6220155594
Total                       0.0           16887213             3727073700
I usually tell people that they are wasting their time if they use the definitional method. Because of the
large numbers here that may not be true. Remember that the numbers here are in thousands. Because of the
large numbers we might want to try to work in millions. This would mean changing the x column to 0.265,
0.615, 0.965, 1.315 etc. We would get, for the first row, $fx = 2.385$, $fx^2 = 0.6320$ and $fx^3 = 0.1675$. In
any case the definitional method numbers would be more tractable.
If you used the computational method, you would have computed columns 2, 3, 4, 5, 6, and 7 and gotten
$n = \sum f = 50$ and $\sum fx = 51725$, so that the mean is $\bar{x} = \frac{\sum fx}{n} = \frac{51725}{50} = 1034.5$. You would also
find $\sum fx^2 = 70396725$ and $\sum fx^3 = 111492128375$.
If you used the definitional method, you would have computed columns 2, 3, 4, 5, 8, 9, 10 and 11 and
gotten $n = \sum f = 50$ and $\sum fx = 51725$, so that the mean is $\bar{x} = \frac{\sum fx}{n} = \frac{51725}{50} = 1034.5$.
You would have followed by getting $\sum f(x-\bar{x}) = 0$ (a check), $\sum f(x-\bar{x})^2 = 16887213$ and
$\sum f(x-\bar{x})^3 = 3727073700$.
If you used one of Pearson’s measures of skewness, you would not have bothered with columns 7 or 11. In
any case only an Adrian Munk personality would have computed everything here.
a. Calculate the Cumulative Frequency (0.5): See the F column above.
b. Calculate the Mean (0.5): We have already found $\bar{x} = \frac{\sum fx}{n} = \frac{51725}{50} = 1034.5$.
c. Calculate the Median (1): position $= p(n+1) = .5(51) = 25.5$. This is above $F = 20$ and below $F = 30$, so
the interval is the 3rd, 790 to 1140, which has a frequency of 10. Each interval width is 1140 - 790 = 350.
$x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right]w$ so $x_{1-.5} = x_{.5} = 790 + \left[\frac{.5(50) - 20}{10}\right]350 = 790 + 0.5(350) = 965$.
d. Calculate the Mode (0.5): The largest group is 440 to 790, which has a frequency of 11, so by convention
the mode is its midpoint, which is $mo = 615$. It is possible that you will have two modes. Note that to be
reasonable, $Q1 \le x_{.50} \le Q3$ and that $Q1$, $x_{.50}$, $Q3$, $\bar{x}$ and the mode must be between 90 and 2540.
e. Calculate the Variance (1.5): $s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n-1} = \frac{70396725 - 50(1034.5)^2}{49} = \frac{16887212.5}{49} = 344636.99$
or $s^2 = \frac{\sum f(x-\bar{x})^2}{n-1} = \frac{16887213}{49} = 344637$. The computer got 344637 too.
f. Calculate the Standard Deviation (1): $s = \sqrt{344637} = 587.06$
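The computational and definitional methods can be compared side by side in code. This is a sketch under one loud assumption: the `x` list is copied exactly from the x column of the table in the key. The last three entries there (1580, 1845, 2110) are not the arithmetic class midpoints (1665, 2015, 2365); they are kept only so that the totals match the key's $\sum fx = 51725$ and $\sum fx^2 = 70396725$.

```python
import math

freq = [9, 11, 10, 8, 4, 3, 5]
# x column exactly as listed in the key's table (see the assumption above).
x = [265, 615, 965, 1315, 1580, 1845, 2110]

n = sum(freq)
sum_fx = sum(f * m for f, m in zip(freq, x))
sum_fx2 = sum(f * m * m for f, m in zip(freq, x))
mean = sum_fx / n

# Computational method: (sum fx^2 - n * mean^2) / (n - 1).
var_computational = (sum_fx2 - n * mean ** 2) / (n - 1)

# Definitional method: sum f (x - mean)^2 / (n - 1).
var_definitional = sum(f * (m - mean) ** 2 for f, m in zip(freq, x)) / (n - 1)

std_dev = math.sqrt(var_computational)

print(n, sum_fx, sum_fx2, mean)
print(round(var_computational, 2), round(var_definitional, 2), round(std_dev, 2))
```

Both methods give the same variance, as they must; the only practical difference is the size of the intermediate numbers.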
g. Calculate the Interquartile Range (1.5): First Quartile: position $= p(n+1) = .25(51) = 12.75$. This is
above $F = 9$ and below $F = 20$, so the interval is the 2nd, 440 to 790, which has a frequency of 11.
$x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right]w$ gives us $Q1 = x_{1-.25} = x_{.75} = 440 + \left[\frac{.25(50) - 9}{11}\right]350$
$= 440 + 0.31818(350) = 551.36$.
Third Quartile: position $= p(n+1) = .75(51) = 38.25$. This is above $F = 38$ and below $F = 42$, so the
interval is the 5th, 1490 to 1840, which has a frequency of 4.
$x_{1-.75} = x_{.25} = 1490 + \left[\frac{.75(50) - 38}{4}\right]350 = 1490 + (-0.125)(350) = 1446.25$. Since this ended up in the
wrong group, I tried the earlier group. $x_{1-.75} = x_{.25} = 1140 + \left[\frac{.75(50) - 30}{8}\right]350 = 1468.125$
This illustrates the inaccuracy of the formula, which is only an approximation. We can go back to the
original assumption about the layout of the numbers. The interval 1140 to 1490 has a frequency of 8. This
yields an interval between subsequent pairs of numbers between $x_{31}$ and $x_{38}$ of $\frac{350}{8} = 43.75$. We assume
a half interval of 21.875 between 1140 and $x_{31}$ and another interval of the same size between $x_{38}$ and
1490. The interval 1490 to 1840 contains $x_{39}$ through $x_{42}$ and has a frequency of 4. The interval between
subsequent pairs of numbers between $x_{39}$ and $x_{42}$ is $\frac{350}{4} = 87.5$. The half interval between 1490 and $x_{39}$
or between $x_{42}$ and 1840 is 43.75. If the colon below represents the group boundary at 1490, a diagram of
$x_{37}$ through $x_{40}$ appears below. The intervals between $x_{38}$ and $x_{39}$ add to 21.875 + 43.75 = 65.625. If
position $= p(n+1) = .75(51) = 38.25$, our value should be
$x_{38} + .25(x_{39} - x_{38}) = 1468.125 + .25(65.625) = 1484.53$.

x37 = 1424.375     x38 = 1468.125    :    x39 = 1533.75     x40 = 1621.25
             43.75             21.875 : 43.75           87.5

Obviously, I would accept any of the three answers given here, but 1446.25 is probably the worst. If we use
1468.125, we have $IQR = Q3 - Q1 = 1468.125 - 551.36 = 916.765$.
Note that, no matter how much you may want to believe it, it is not true that the IQR = (.25) (n+1) – (.75)
(n+1) = .50 (n+1).
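The grouped-data fractile formula $x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right]w$ used for the median and quartiles can be sketched as a single function. The name `grouped_fractile` and the 0-based `class_index` argument are assumptions made for this illustration; the class is chosen by the caller, just as it is chosen by inspection of the F column in the solution.

```python
lower_limits = [90, 440, 790, 1140, 1490, 1840, 2190]
freq = [9, 11, 10, 8, 4, 3, 5]
width = 350


def grouped_fractile(p, class_index):
    """L_p + ((p*N - F) / f_p) * w for the class at class_index (0-based).

    F is the cumulative frequency of all classes before the chosen one.
    """
    n = sum(freq)
    cum_before = sum(freq[:class_index])
    fraction = (p * n - cum_before) / freq[class_index]
    return lower_limits[class_index] + fraction * width


median = grouped_fractile(0.50, 2)          # 3rd class, 790 to 1140
q1 = grouped_fractile(0.25, 1)              # 2nd class, 440 to 790
q3_fifth_class = grouped_fractile(0.75, 4)  # falls below 1490: wrong group
q3_fourth_class = grouped_fractile(0.75, 3) # the earlier group
iqr = q3_fourth_class - q1

print(median, round(q1, 2), q3_fifth_class, q3_fourth_class, round(iqr, 2))
```

A result that falls outside the chosen class, as `q3_fifth_class` does, is the signal to try the neighbouring class.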
h. Calculate a Statistic showing Skewness and interpret it (1.5): We had $n = 50$ and $\sum fx = 51725$, so
that the mean is $\bar{x} = 1034.5$. We also found that $\sum fx^2 = 70396725$, $\sum fx^3 = 111492128375$ and
$\sum f(x-\bar{x})^3 = 3727073700$, $s = \sqrt{344637} = 587.06$, $x_{.5} = 965$ and $mo = 615$.

$k_3 = \frac{n}{(n-1)(n-2)}\left[\sum fx^3 - 3\bar{x}\sum fx^2 + 2n\bar{x}^3\right] = \frac{50}{(49)(48)}\left[111492128375 - 3(1034.5)(70396725) + 2(50)(1034.5)^3\right]$
$= 0.0212585\left[111492128375 - 2.1847624 \times 10^{11} + 1.1071118 \times 10^{11}\right] = 0.0212585(3727073700) = 79231996$

or $k_3 = \frac{n}{(n-1)(n-2)}\sum f(x-\bar{x})^3 = \frac{50}{(49)(48)}(3727073700) = 79232008.92$. The computer gets 79232009.

or $g_1 = \frac{k_3}{s^3} = \frac{79232009}{(587.06)^3} = 0.3916$

or Pearson's Measure of Skewness $SK_1 = \frac{mean - mode}{std.\ deviation} = \frac{1034.5 - 615}{587.06} = 0.7146$ or
$SK_2 = \frac{3(mean - median)}{std.\ deviation} = \frac{3(1034.5 - 965)}{587.06} = 0.3552$

Because of the positive sign, the measures all imply skewness to the right.
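The skewness measures can be recomputed from the frequency table in one pass. This is a sketch, not the key's own procedure: it takes the x column exactly as listed in the key's table (the last three entries, 1580, 1845 and 2110, are the key's values rather than the arithmetic class midpoints), uses the median 965 and mode 615 found earlier, and recomputes $s$ from the table instead of typing it in.

```python
import math

freq = [9, 11, 10, 8, 4, 3, 5]
x = [265, 615, 965, 1315, 1580, 1845, 2110]  # x column as listed in the key
median, mode = 965, 615                       # from parts c and d

n = sum(freq)
mean = sum(f * m for f, m in zip(freq, x)) / n
sum_dev2 = sum(f * (m - mean) ** 2 for f, m in zip(freq, x))
sum_dev3 = sum(f * (m - mean) ** 3 for f, m in zip(freq, x))

s = math.sqrt(sum_dev2 / (n - 1))

# Third k-statistic and relative skewness g1 = k3 / s^3.
k3 = n / ((n - 1) * (n - 2)) * sum_dev3
g1 = k3 / s ** 3

# Pearson's two measures of skewness.
sk1 = (mean - mode) / s
sk2 = 3 * (mean - median) / s

print(round(k3, 2), round(g1, 4), round(sk1, 4), round(sk2, 4))
```

All four statistics share the sign of $\sum f(x-\bar{x})^3$ here, which is what the interpretation (skewness to the right) rests on.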
i. Make a frequency polygon of the data (Neatness Counts!)(1)
If we add two dummy groups to our data, we have the following.

Class   Salary in Thousands     Frequency   Midpoint
0       -260 to under   90          0          -85
1         90 to under  440          9          265
2        440 to under  790         11          615
3        790 to under 1140         10          965
4       1140 to under 1490          8         1315
5       1490 to under 1840          4         1580
6       1840 to under 2190          3         1845
7       2190 to under 2540          5         2110
8       2540 to under 2890          0         2460
Normally, a frequency polygon would require that we plot the midpoints on the x axis and the frequencies
on the y axis using straight lines between each point. The graph would begin and end at a zero frequency.
However, the most natural display here would let x run from zero to 2500 or 3000. At zero the value on the
vertical axis would be about 2.19. (This, of course, is very approximate. The equation for a line going
through (-85, 0) and (265, 9) would be $y = 2.18535 + 0.02571x$ and this would be 2.18535 at $x = 0$, but
you don’t need to know this to have an approximately correct intercept.)
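The intercept quoted above is just the two-point form of a line. The sketch below assumes the first point is the dummy class at zero frequency, (-85, 0), and the second is the first real class, (265, 9).

```python
# Two points on the first segment of the frequency polygon.
x0, y0 = -85.0, 0.0   # dummy class: midpoint -85, frequency 0 (assumed)
x1, y1 = 265.0, 9.0   # first real class: midpoint 265, frequency 9

slope = (y1 - y0) / (x1 - x0)
intercept = y0 - slope * x0   # height of the line at x = 0

print(round(slope, 5), round(intercept, 4))
```

The small difference from 2.18535 comes from rounding the slope to 0.02571 before multiplying by 85.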
j. Extra credit: Put a (horizontal) box plot below the frequency chart using the same horizontal scale (1)
The five-number summary is (265, 551.36, 965, 1468.125, 2460).
$IQR = Q3 - Q1 = 1468.125 - 551.36 = 916.765$. $1.5(IQR) = 1375.15$. If you use fences, they should be at
$551.36 - 1375.15 = -823.79$ and $1468.125 + 1375.15 = 2843.28$. But these are beyond the range of the
data, which makes them irrelevant. So the box extends from 551.36 to 1468.125, with the median marked by
a line at 965. The whiskers go from the box to 265 and 2460; dotted lines showing the full
range are unnecessary. A rough picture is below.
[Box plot on a horizontal axis running from 0 to 2500.]
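The fence arithmetic for the box plot can be sketched as follows, assuming the usual 1.5 × IQR rule and the quartiles and end points from the five-number summary above.

```python
# Five-number summary from the solution (salaries in thousands).
low, q1, median, q3, high = 265.0, 551.36, 965.0, 1468.125, 2460.0

iqr = q3 - q1
step = 1.5 * iqr
lower_fence = q1 - step
upper_fence = q3 + step

# The fences matter only if some data lie beyond them.
fences_relevant = low < lower_fence or high > upper_fence

print(round(iqr, 3), round(step, 2), round(lower_fence, 2), round(upper_fence, 2))
print(fences_relevant)
```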
2. Take your student number as a sample of size 6. Each digit will be a separate number. Change all
zeroes to nines. For example, Seymour Butz’s student number is 876509, so his numbers are 8, 7, 6, 5, 9
and 9. Find the following
a) Geometric Mean
b) Harmonic mean
c) Root-mean-square
If you wish, d) Compute the geometric mean from a) using natural and/or base 10 logarithms. (1 point extra
credit each).
Solution: Using Seymour’s numbers, and being incredibly lazy, I ran most of this on Minitab.
————— 2/21/2007 5:44:04 PM ————————————————————
Welcome to Minitab, press F1 for help.
MTB > let c2 = loge(c1)
MTB > let c3 = logten (c1)
MTB > let c4 = 1/c1
MTB > let c5 = c1*c1
MTB > print c1 - c5
Data Display

Row  x     ln_x     log_x       1/x  xsq
  1  8  2.07944  0.903090  0.125000   64
  2  7  1.94591  0.845098  0.142857   49
  3  6  1.79176  0.778151  0.166667   36
  4  5  1.60944  0.698970  0.200000   25
  5  9  2.19722  0.954243  0.111111   81
  6  9  2.19722  0.954243  0.111111   81
MTB > sum c1
Sum of x
Sum of x = 44
MTB > sum c2
Sum of ln_x
Sum of ln_x = 11.8210
MTB > sum c3
Sum of log_x
Sum of log_x = 5.13379
MTB > sum c4
Sum of 1/x
Sum of 1/x = 0.856746
MTB > sum c5
Sum of xsq
Sum of xsq = 336
Solution: Using the original data, before I started, I computed the following table. My computations are thus as below.

Row    x     ln(x)       log(x)       1/x       x^2
1      8    2.07944     0.903090    0.125000     64
2      7    1.94591     0.845098    0.142857     49
3      6    1.79176     0.778151    0.166667     36
4      5    1.60944     0.698970    0.200000     25
5      9    2.19722     0.954243    0.111111     81
6      9    2.19722     0.954243    0.111111     81
Sum   44   11.82099     5.133795    0.856746    336

The arithmetic mean is $\frac{44}{6} = 7.333333$. Note that reasonable answers should all fall in the interval between
the highest and lowest digit. In this case that would mean 5 to 9.
a) The Geometric Mean. The formula table says $x_g = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n} = \sqrt[n]{\prod x}$.
$x_g = \sqrt[6]{(8)(7)(6)(5)(9)(9)} = \sqrt[6]{136080} = 136080^{1/6} = 136080^{0.1666667} = 7.17187$.
b) The Harmonic Mean. The formula table says $\frac{1}{x_h} = \frac{1}{n}\sum\frac{1}{x}$.
$\frac{1}{x_h} = \frac{1}{6}\left(\frac{1}{8} + \frac{1}{7} + \frac{1}{6} + \frac{1}{5} + \frac{1}{9} + \frac{1}{9}\right) = \frac{1}{6}(0.125000 + 0.142857 + 0.166667 + 0.200000 + 0.111111 + 0.111111)$
$= \frac{1}{6}(0.856746) = 0.142791$.
So $x_h = \frac{1}{\frac{1}{n}\sum\frac{1}{x}} = \frac{1}{0.142791} = 7.00324$.
Of course some of you decided that $\frac{1}{x_h} = \frac{1}{n}\sum\frac{1}{x} = \frac{1}{6}\left(\frac{1}{8} + \frac{1}{7} + \frac{1}{6} + \frac{1}{5} + \frac{1}{9} + \frac{1}{9}\right) \;=\,?\; \frac{1}{6}\left(\frac{1}{8 + 7 + 6 + 5 + 9 + 9}\right) = \;???$.
This is, of course, an easier way to do the problem. It is also wrong, and you will get an A for the
course if you can prove to me that it is not wrong!
c) The root-mean-square. The formula table says $x_{rms} = \sqrt{\frac{1}{n}\sum x^2}$ or $x_{rms}^2 = \frac{1}{n}\sum x^2$.
$x_{rms}^2 = \frac{1}{6}\left(8^2 + 7^2 + 6^2 + 5^2 + 9^2 + 9^2\right) = \frac{336}{6} = 56$. So $x_{rms} = \sqrt{56} = 7.48331$.
d) Geometric Mean. The formula table says $\ln(x_g) = \frac{1}{n}\sum\ln(x)$, but I said in class that this could be
either natural logs or logs to the base 10.
Natural Logarithms. $\ln(x_g) = \frac{1}{6}(11.82099) = 1.970165$ and $x_g = e^{1.970165} = 7.171860$. There
must be a substantial rounding error here.
Logarithms to the base 10. $\log(x_g) = \frac{1}{6}(5.133795) = 0.85563$ and $x_g = 10^{0.85563} = 7.17183$.
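All of problem 2 can be checked in a few lines of Python as an alternative to the Minitab session. This is a sketch using Seymour's digits; `math.prod` assumes Python 3.8 or later. Working at full precision, the three routes to the geometric mean agree, which suggests the small differences above come from rounding the logarithms.

```python
import math

digits = [8, 7, 6, 5, 9, 9]   # Seymour's student number with 0 changed to 9
n = len(digits)

arithmetic = sum(digits) / n

# a) Geometric mean: n-th root of the product.
geometric = math.prod(digits) ** (1 / n)

# b) Harmonic mean: reciprocal of the mean of the reciprocals.
harmonic = n / sum(1 / x for x in digits)

# c) Root-mean-square: square root of the mean of the squares.
rms = math.sqrt(sum(x * x for x in digits) / n)

# d) Geometric mean again, via natural and base-10 logarithms.
geometric_ln = math.exp(sum(math.log(x) for x in digits) / n)
geometric_log10 = 10 ** (sum(math.log10(x) for x in digits) / n)

print(round(geometric, 5), round(harmonic, 5), round(rms, 5))
print(round(geometric_ln, 5), round(geometric_log10, 5))
```

For positive data that are not all equal, the results should order as harmonic < geometric < arithmetic < root-mean-square, and all must fall between 5 and 9 here.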