251y0451 10/20/04 Name: _____KEY______________ Student Number : _____________________

ECO251 QBA1
FIRST HOUR EXAM
October 6, 2004
Name: _____KEY______________
Student Number : _____________________
Class Hour: _____________________
Remember – Neatness, or at least legibility, counts. In most non-multiple-choice questions an answer
needs a calculation or short explanation to count.
Part I. (7 points)
Use the eleven numbers that you used in the second problem in the take-home exam. (If you don’t have
them – take your student number plus the numbers (3, 6, 9, 9, 21) . Example: Seymour Butz’s student
number is 876509, so he gets 8, 7, 6, 5, 0, 9, 3, 6, 9, 9, 21. Of course, he has read “Things That You Should
Never Do on an Exam or Anywhere Else” and knows that he can’t use them this way. )
Compute the following:
a) The Median (1)
b) The Standard Deviation (3)
c) The 2nd Quintile (2)
d) The Coefficient of variation (1)
Solution: Seymour used the eleven numbers 1, 3, 12, 15, 22, 7, 7, 5, 2, 10, 1 . The numbers in order are 1,
1, 2, 3, 5, 7, 7, 10, 12, 15, 22.
            $x$    $x^2$
$x_1$         1        1
$x_2$         1        1
$x_3$         2        4
$x_4$         3        9
$x_5$         5       25
$x_6$         7       49
$x_7$         7       49
$x_8$        10      100
$x_9$        12      144
$x_{10}$     15      225
$x_{11}$     22      484
Total        85     1091
a) The middle number is 7.
b) $n = 11$, $\bar{x} = \frac{\sum x}{n} = \frac{85}{11} = 7.72727$, $s^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1} = \frac{1091 - 11(7.72727)^2}{10} = \frac{434.181818}{10} = 43.41818$. So $s = \sqrt{43.41818} = 6.589247$.
c) position $= p(n+1) = .4(12) = 4.8$. So $a = 4$ and $.b = .8$.
$x_{1-p} = x_a + .b(x_{a+1} - x_a)$, so $x_{1-.4} = x_{.6} = x_4 + .8(x_5 - x_4) = 3 + .8(5 - 3) = 4.6$.
d) $C = \frac{s}{\bar{x}} = \frac{6.589247}{7.72727} = 0.8527$
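The four answers above can be checked with a short Python sketch. The helper name `fractile` is hypothetical (it is not from the exam); it assumes the position rule $p(n+1)$ with linear interpolation that the solution to c) uses.

```python
import statistics

# Seymour's eleven numbers from the solution above.
data = [1, 3, 12, 15, 22, 7, 7, 5, 2, 10, 1]


def fractile(values, p):
    """Value at position p(n+1) in the ordered data, interpolated linearly."""
    xs = sorted(values)
    pos = p * (len(xs) + 1)
    a = int(pos)      # whole part of the position (1-based)
    b = pos - a       # fractional part
    if a < 1:
        return xs[0]
    if a >= len(xs):
        return xs[-1]
    return xs[a - 1] + b * (xs[a] - xs[a - 1])


median = statistics.median(data)
s = statistics.stdev(data)             # sample standard deviation (n - 1)
second_quintile = fractile(data, 0.4)
cv = s / statistics.mean(data)         # coefficient of variation

print(median, round(s, 6), round(second_quintile, 1), round(cv, 4))
```

`statistics.stdev` divides by $n - 1$, which matches the formula used in b).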
If you enjoy wasting time, you might want to use the definitional formula.
            $x$    $x - \bar{x}$    $(x - \bar{x})^2$
$x_1$         1      -6.7273           45.256
$x_2$         1      -6.7273           45.256
$x_3$         2      -5.7273           32.802
$x_4$         3      -4.7273           22.347
$x_5$         5      -2.7273            7.438
$x_6$         7      -0.7273            0.529
$x_7$         7      -0.7273            0.529
$x_8$        10       2.2727            5.165
$x_9$        12       4.2727           18.256
$x_{10}$     15       7.2727           52.873
$x_{11}$     22      14.2727          203.711
Total        85       0.0003          434.182
$n = 11$, $\bar{x} = \frac{\sum x}{n} = \frac{85}{11} = 7.72727$, $s^2 = \frac{\sum (x - \bar{x})^2}{n-1} = \frac{434.182}{10} = 43.4182$. The vast majority of people
 x x 
who thought that they were using the definitional formula used
2
, which, I believe, should have
n 1
given them x 2 . Doing a little bit of homework should have prevented this error.
251y0451 10/07/04
Part II.
1. The problem in the textbook that gives the data used in the take home also gives the braking
distance for a sample of domestic made cars. It is presented below. Cumulative frequency (in red)
is needed to get the median and was not given.
Distance (feet)   frequency   Cumulative frequency
210-220               1               1
220-230               1               2
230-240               1               3
240-250               1               4
250-260               4               8
260-270               3              11
270-280               6              17
280-290               4              21
290-300               2              23
300-310               2              25
310-320               0              25
Sum                  25
Minitab was used to calculate statistics from these data. It claims the following: (Note!!!!!) $\bar{x} = 269$,
$s^2 = 525$, $k_3 = -7281.61$. You will not be able to use any of these numbers in b) or c) without some
manipulation. Answers below are not acceptable unless you give some evidence in the
sample statistics.
a) Do American cars have a shorter braking distance? Compare all 3 measures of central
tendency. (2)
b) Are American cars more consistent in braking distance than foreign cars? Use a
dimension-free measurement of variability. (2)
c) Compare the direction and degree of skewness in the two distributions. Use one
dimension-free measure of skewness. (2)
d) Write a 5-number summary of the results from the first take-home problem. (2) 15
Solution: a) Seymour had given us, for the foreign-made cars, $\bar{x} = 260.647$, $x_{1-.5} = x_{.5} = 255.227$ and the
mode is 255.
For the median for the domestic cars position $= p(n+1) = .5(26) = 13$. Since 13 is above 11 and below 17,
the median is in 270-280, which has a frequency of 6. $x_{1-.5} = x_{.5} = 270 + \left[\frac{.5(25) - 11}{6}\right]10 = 272.5$. The
mode is the midpoint of the largest group, which is 275.
          Domestic   Foreign
Mean      269        260.647
Median    272.5      255.227
Mode      275        255
According to all measures, American cars have a longer braking distance.
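As a cross-check of the domestic median and mode, here is a minimal Python sketch of the grouped-data interpolation $L_p + \left[\frac{pN - F}{f_p}\right]w$. The function name `grouped_fractile` is hypothetical. One simplifying assumption: it picks the class using $pN$ as well, whereas the solution locates the class with position $p(n+1)$; the two can disagree when the position falls exactly on a class boundary.

```python
def grouped_fractile(lower_limits, freqs, width, p):
    """Interpolated value below which a proportion p of grouped data lies."""
    n = sum(freqs)
    target = p * n
    cum = 0  # cumulative frequency F below the current class
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cum + f >= target:
            return lower + (target - cum) / f * width
        cum += f
    raise ValueError("p must be between 0 and 1")


lowers = list(range(210, 320, 10))            # lower class limits 210, ..., 310
domestic = [1, 1, 1, 1, 4, 3, 6, 4, 2, 2, 0]  # frequencies from the table above

median_domestic = grouped_fractile(lowers, domestic, 10, 0.5)

# Mode: midpoint of the class with the largest frequency.
modal_lower = lowers[domestic.index(max(domestic))]
mode_domestic = modal_lower + 10 / 2

print(median_domestic, mode_domestic)
```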
b) Seymour says for the foreign cars $s = \sqrt{567.731} = 23.8271$. For the domestic cars $s = \sqrt{525} = 22.913$. If we compute the coefficient of variation $C = \frac{s}{\bar{x}}$:
Domestic $C = \frac{22.913}{269} = .08518$
Foreign $C = \frac{23.8271}{260.647} = .0914$
American cars are more consistent.
c) You can use $g_1$ or $SK$.
          Domestic    Foreign
Mean      269         260.647
Mode      275         255
$k_3$     -7281.61    8689.93
$s$       22.913      23.8271
$g_1 = \frac{k_3}{s^3}$: Domestic $\frac{-7281.61}{22.913^3} = -.6053$, Foreign $\frac{8689.93}{23.8271^3} = .6424$
or
$SK = \frac{3(\text{mean} - \text{mode})}{\text{std. deviation}}$: Domestic $\frac{3(269 - 275)}{22.913} = -.786$, Foreign $\frac{3(260.647 - 255)}{23.8271} = .711$
My answers are not consistent. g1 makes Foreign more skewed, while SK makes Domestic look more
skewed. However, Domestic is skewed to the left and Foreign to the right.
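The dimension-free measures in b) and c) can be recomputed from the quoted summary statistics with a short Python sketch. The dictionary layout and the name `summarize` are illustrative assumptions; the inputs are the values given in the solution, with the domestic $k_3$ taken as negative.

```python
import math

# Summary statistics quoted in the solution (variance, not standard deviation).
cars = {
    "domestic": {"mean": 269.0, "var": 525.0, "mode": 275.0, "k3": -7281.61},
    "foreign": {"mean": 260.647, "var": 567.731, "mode": 255.0, "k3": 8689.93},
}


def summarize(stats):
    """Coefficient of variation, g1 and Pearson's SK for one sample."""
    s = math.sqrt(stats["var"])
    return {
        "cv": s / stats["mean"],
        "g1": stats["k3"] / s ** 3,
        "sk": 3 * (stats["mean"] - stats["mode"]) / s,
    }


results = {name: summarize(stats) for name, stats in cars.items()}
for name, r in results.items():
    print(name, round(r["cv"], 5), round(r["g1"], 4), round(r["sk"], 3))
```

The signs of $g_1$ and $SK$ agree within each sample, while their relative sizes across the two samples do not, which is the inconsistency noted above.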
d)
Lower Limit       210
First Quartile    243.5
Median            255.227
Third Quartile    275.357
Upper Limit       320
2. The following numbers refer to miles-per-gallon of a sample of vehicles (Bowerman and
O’Connell).
Class (mpg)     f      f rel    F      F rel
29.8 - 30.3     ____   ____     ____   .0612
30.4 - 30.9     ____   ____     ____   .2449
31.0 - 31.5     ____   ____     24     ____
31.6 - 32.1     ____   .2653    35     .7551
32.2 - 32.7     11     .2245    46     .9388
32.8 - 33.3     3      .0612    49     1.000
Fill in the missing numbers. (5)
20
Even with corrections made above, this had some errors, but I still could check easily to see if you
knew what you were doing. The completely corrected results were.
Class (mpg)     f      f rel    F      F rel
29.8 - 30.3     3      .0612    3      .0612
30.4 - 30.9     9      .1837    12     .2449
31.0 - 31.5     12     .2449    24     .4898
31.6 - 32.1     13     .2653    37     .7551
32.2 - 32.7     9      .1837    46     .9388
32.8 - 33.3     3      .0612    49     1.000
Total           49     1.0000
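The corrected table can be rebuilt from the class frequencies alone. A minimal Python sketch follows; the variable names are illustrative, and the frequencies are the ones in the corrected table.

```python
from itertools import accumulate

classes = ["29.8-30.3", "30.4-30.9", "31.0-31.5",
           "31.6-32.1", "32.2-32.7", "32.8-33.3"]
f = [3, 9, 12, 13, 9, 3]          # class frequencies

n = sum(f)
F = list(accumulate(f))           # cumulative frequency
f_rel = [x / n for x in f]        # relative frequency
F_rel = [x / n for x in F]        # cumulative relative frequency

for row in zip(classes, f, f_rel, F, F_rel):
    print("{:<10} {:>3} {:.4f} {:>3} {:.4f}".format(*row))
```

Any one column determines the other three once $n$ is known, which is why a few filled-in cells were enough to complete the exam table.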
Part III. (At least 22 points – 2 points each unless marked)
1. Mark the variables below as qualitative (A) or quantitative (B)
a) Number of days a patient stays at a spa   B
b) Preferences for 10 beers on a 1st to 10th scale   A
c) Method of contraception   A
d) Per cent change in population between censuses   B
2. Which of the following is an example of continuous ratio data?
a) Number of days a patient stays at a spa
b) Preferences for beers on a 1 to 10 scale
c) Method of contraception
d) *Per cent change in population between censuses
e) None of the above.
4
3. A summary measure that is computed to describe a characteristic of a population is called
a) *a parameter.
b) a census.
c) a statistic.
d) An inference
e) None of the above
6
4. In general, what are the two types of descriptive statistic most frequently reported?
a) Measures of kurtosis and measures of dispersion
b) Measures of kurtosis and measures of skewness
c) Measures of kurtosis and measures of central tendency
d) Measures of dispersion and measures of skewness
e) *Measures of dispersion and measures of central tendency
f) Measures of skewness and measures of central tendency
g) None of the above.
8
Mark the following formulas (1 each) . Circle a, b or c. b) must be filled in if you
have circled it.
5. Coefficient of Excess: $\gamma_2 = \frac{\mu_4}{\sigma^4} - 3$ or $g_2 = \frac{k_4}{s^4}$
a) This cannot be negative.
b) *If this is negative it means the distribution is Platykurtic (Flat – topped).
c) This can be negative, but it has no special meaning.
6. $k_3 = \frac{n}{(n-1)(n-2)}\left[\sum x^3 - 3\bar{x}\sum x^2 + 2n\bar{x}^3\right]$ (Skewness)
a) This cannot be negative.
b) *If this is negative it means the distribution is Skewed to the left
c) This can be negative, but it has no special meaning.
7. $\bar{x} = \frac{\sum x}{n}$ (Sample mean)
a) This cannot be negative.
b) If this is negative it means the distribution is ______
c) *This can be negative, but it has no special meaning.
8. $s^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1}$ (Variance)
a) *This cannot be negative.
b) If this is negative it means the distribution is ______
c) This can be negative, but it has no special meaning.
12
Does it really mean anything to tell me that if one of these statistics is negative, the distribution is
negative?
Exhibit 1: The following is taken from Problem 3.22 in the text. The data below represent sales tax
receipts submitted to a township government by 50 businesses in one quarter.
Sales Taxes ($000)
10.3   11.1    9.6    9.0   14.5   13.0    6.7   11.0    8.4   10.3
13.0   11.2    7.3    5.3   12.5    8.0   11.8    8.7   10.6    9.5
11.1   10.2   11.1    9.9    9.8   11.6   15.1   12.5    6.5    7.5
10.0   12.9    9.2   10.0   12.8   12.5    9.3   10.4   12.7   10.5
 9.3   11.5   10.7   11.6    7.8   10.5    7.6   10.1    8.9    8.6
The text solution manual offers the following results.
(a) Stem-and-leaf display of Quarterly Sales Tax Receipts
 5 | 3
 6 | 57
 7 | 3568
 8 | 04679
 9 | 02335689
10 | 00123345567
11 | 011125668
12 | 555789
13 | 00
14 | 5
15 | 1
(b) $\mu = 10.28$
(c) $\sigma^2 = 4.1820$, $\sigma = 2.045$
(d) 64% of the receipts are within 1 standard deviation of the mean.
(e) 94% of the receipts are within 2 standard deviations of the mean.
(f) 100% of the receipts are within 3 standard deviations of the mean.
9. According to the stem and leaf display, what percent of the receipts were below $7000? (1)
3/50 = 6%
10. If the researcher was directed to present the data in 6 classes, what should the class interval
be? Show your calculations.
Lowest is 5.3. Highest is 15.1. $\frac{15.1 - 5.3}{6} = 1.63$. Let’s try 2.
15
11. Show the actual intervals you might use.
Class   From   to
A        5     Under 7
B        7     Under 9
C        9     Under 11
D       11     Under 13
E       13     Under 15
F       15     Under 17
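The calculation in questions 10 and 11 can be sketched in a few lines of Python. Rounding the width up to the next whole number and starting at the whole number below the minimum are assumptions that happen to reproduce the intervals shown; other convenient choices would also be acceptable.

```python
import math

low, high, k = 5.3, 15.1, 6         # lowest value, highest value, number of classes

raw_width = (high - low) / k        # about 1.63
width = math.ceil(raw_width)        # round up to a convenient width
start = math.floor(low)             # convenient starting point at or below the minimum

# Each class runs from its lower limit to "under" its upper limit.
limits = [(start + i * width, start + (i + 1) * width) for i in range(k)]

for label, (lo, hi) in zip("ABCDEF", limits):
    print(f"{label}: {lo} to under {hi}")
```

Rounding up rather than down guarantees that the six classes cover the whole range from 5.3 to 15.1.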
Before we start, most of you seem to have no idea what ‘3 standard deviations from the mean’
signifies. Nevertheless, one student paper put it this way.
$\mu \pm \sigma = 10.28 \pm 2.045$ or 8.235 to 12.325
$\mu \pm 2\sigma = 10.28 \pm 2(2.045) = 10.28 \pm 4.090$ or 6.190 to 14.370
$\mu \pm 3\sigma = 10.28 \pm 3(2.045) = 10.28 \pm 6.135$ or 4.145 to 16.415
Two of these should appear in your answer below.
12. The description above says that 64% of the receipts are within 1 standard deviation of the
mean. Between what numbers does this mean? How does this compare with the empirical
rule? Why might there be a discrepancy? (3)
Empirical rule: (For Symmetrical Unimodal distributions only): 68% within one
standard deviation of the mean, 95% within two and almost all (99.7%) within three.
This is lower and could be because the distribution is not quite symmetric.
13. The description above says that 100% of the receipts are within 3 standard deviations of the
mean. Between what numbers does this mean? How does this compare with the Chebyshev
rule? Why might there be a discrepancy? (3)
Chebyshev’s Inequality: $P(|x - \mu| \ge k\sigma) \le \frac{1}{k^2}$ or $P(\mu - k\sigma < x < \mu + k\sigma) \ge 1 - \frac{1}{k^2}$.
This means that at least 8/9 should be within 3 standard deviations of the mean. In the
real world the number is almost always larger.
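The percentages quoted from the solution manual can be compared with both rules using a short Python sketch over the 50 receipts in Exhibit 1. The function name `share_within` is hypothetical. One observation from working the numbers: the quoted 2.045 matches the sample ($n - 1$) standard deviation of these data, so `statistics.stdev` is used here.

```python
import statistics

receipts = [
    10.3, 11.1, 9.6, 9.0, 14.5, 13.0, 6.7, 11.0, 8.4, 10.3,
    13.0, 11.2, 7.3, 5.3, 12.5, 8.0, 11.8, 8.7, 10.6, 9.5,
    11.1, 10.2, 11.1, 9.9, 9.8, 11.6, 15.1, 12.5, 6.5, 7.5,
    10.0, 12.9, 9.2, 10.0, 12.8, 12.5, 9.3, 10.4, 12.7, 10.5,
    9.3, 11.5, 10.7, 11.6, 7.8, 10.5, 7.6, 10.1, 8.9, 8.6,
]

mean = statistics.mean(receipts)
s = statistics.stdev(receipts)      # n - 1 in the denominator


def share_within(k):
    """Fraction of receipts within k standard deviations of the mean."""
    return sum(abs(x - mean) <= k * s for x in receipts) / len(receipts)


empirical = {1: 0.68, 2: 0.95, 3: 0.997}   # empirical rule percentages
for k in (1, 2, 3):
    chebyshev = 1 - 1 / k ** 2             # lower bound, informative for k > 1
    print(k, share_within(k), empirical[k], round(chebyshev, 3))
```

The observed shares sit slightly below the empirical rule for one and two standard deviations and well above the Chebyshev bounds, which hold for any distribution.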


ECO251 QBA1
FIRST EXAM
October 6, 2004
TAKE HOME SECTION
Name: _________________________
Student Number: _________________________
Throughout this exam show your work! Please indicate clearly what sections of the problem you are
answering and what formulas you are using. Turn this in with your in-class exam.
Part IV. Do all the Following (11 Points) Show your work!
1. The frequency distribution below represents the braking distance for a sample of foreign made cars.
Personalize the data as follows. Write down your student number. Take the last two digits of the number.
Add the larger of these two digits to the frequency for 300-310 and the smaller to the frequency
for 310-320. Use the results as your frequencies. For example, Seymour Butz’s student number is 876509,
so he adds 9 to the second-to-last frequency and 0 to the last frequency and uses (1, 3, 12, 15, 22, 7, 7, 5, 2,
10, 1).
Distance (feet)   frequency
210-220               1
220-230               3
230-240              12
240-250              15
250-260              22
260-270               7
270-280               7
280-290               5
290-300               2
300-310               1
310-320               1
a. Calculate the Cumulative Frequency (0.5)
b. Calculate the Mean (0.5)
c. Calculate the Median (1)
d. Calculate the Mode (0.5)
e. Calculate the Variance (1.5)
f. Calculate the Standard Deviation (1)
g. Calculate the Interquartile Range (1.5)
h. Calculate a Statistic showing Skewness and Interpret it (1.5)
i. Make an ogive of the data showing relative or percentage cumulative frequency (Neatness Counts!) (1.5)
j. Extra credit: Put a (horizontal) box plot below the ogive using the same scale. (1)
Solution: x is the midpoint of the class. Our convention is to use the midpoint of 50 to 60, not 50 to
59.999. Note also, that the midpoints have been divided by 10. Most numbers should be multiplied by 10,
the variance should be multiplied by 100 and k 3 by 1000. Calculations follow for both the computational
and definitional formulas. (Don’t do both.) Seymour’s frequencies are used below.
If you used computational formulas, you should have the following.
Row   class      f     F     x      fx      $fx^2$       $fx^3$
 1    210-220     1     1   215     215      46225       9938375
 2    220-230     3     4   225     675     151875      34171875
 3    230-240    12    16   235    2820     662700     155734500
 4    240-250    15    31   245    3675     900375     220591875
 5    250-260    22    53   255    5610    1430550     364790250
 6    260-270     7    60   265    1855     491575     130267375
 7    270-280     7    67   275    1925     529375     145578125
 8    280-290     5    72   285    1425     406125     115745625
 9    290-300     2    74   295     590     174050      51344750
10    300-310    10    84   305    3050     930250     283726250
11    310-320     1    85   315     315      99225      31255875
Total            85               22155    5822325    1543144875
$n = \sum f = 85$, $\sum fx = 22155$, $\sum fx^2 = 5822325$, $\sum fx^3 = 1543144875$.
If you used definitional formulas, you should have the following.
Row   class      f     F     x      fx    $x-\bar{x}$   $f(x-\bar{x})$   $f(x-\bar{x})^2$   $f(x-\bar{x})^3$
 1    210-220     1     1   215     215    -45.6471       -45.647           2083.7            -95113
 2    220-230     3     4   225     675    -35.6471      -106.941           3812.1           -135892
 3    230-240    12    16   235    2820    -25.6471      -307.765           7893.3           -202439
 4    240-250    15    31   245    3675    -15.6471      -234.706           3672.5            -57463
 5    250-260    22    53   255    5610     -5.6471      -124.235            701.6             -3962
 6    260-270     7    60   265    1855      4.3529        30.471            132.6               577
 7    270-280     7    67   275    1925     14.3529       100.471           1442.0             20698
 8    280-290     5    72   285    1425     24.3529       121.765           2965.3             72214
 9    290-300     2    74   295     590     34.3529        68.706           2360.2             81081
10    300-310    10    84   305    3050     44.3529       443.529          19671.8            872504
11    310-320     1    85   315     315     54.3529        54.353           2954.2            160572
Total            85               22155                     0.000          47689.4            712778
$n = \sum f = 85$, $\sum fx = 22155$, $\sum f(x-\bar{x}) = 0$ (except for a possible rounding error),
$\sum f(x-\bar{x})^2 = 47689.4$, and $\sum f(x-\bar{x})^3 = 712778$.
a. Calculate the Cumulative Frequency (1): (See above) The cumulative frequency is the whole F column.
b. Calculate the Mean (1): $\bar{x} = \frac{\sum fx}{n} = \frac{22155}{85} = 260.647$
c. Calculate the Median (2): position $= p(n+1) = .5(86) = 43$. This is above $F = 31$ and below $F = 53$ so
the interval is the 5th one, 250 – 260. $x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right]w$ so
$x_{1-.5} = x_{.5} = 250 + \left[\frac{.5(85) - 31}{22}\right]10 = 250 + 5.2273 = 255.227$
d. Calculate the Mode (1): The mode is the midpoint of the largest group. Since 22 is the largest frequency,
the modal group is 250 to 260 and the mode is 255.
e. Calculate the Variance (3): $s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n-1} = \frac{5822325 - 85(260.647)^2}{84} = \frac{47692.0}{84} = 567.762$ or
$s^2 = \frac{\sum f(x-\bar{x})^2}{n-1} = \frac{47689.4}{84} = 567.731$. The computer got 567.731.
f. Calculate the Standard Deviation (2): $s = \sqrt{567.731} = 23.8271$.
g. Calculate the Interquartile Range (3): First Quartile: position $= p(n+1) = .25(86) = 21.50$. This is above
$F = 16$ and below $F = 31$, so the interval is 240-250. $x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right]w$ gives us
$Q1 = x_{1-.25} = x_{.75} = 240 + \left[\frac{.25(85) - 16}{15}\right]10 = 243.5$.
Third Quartile: position $= p(n+1) = .75(86) = 64.5$. This is above $F = 60$ and below $F = 67$, so the
interval is 270-280. $Q3 = x_{1-.75} = x_{.25} = 270 + \left[\frac{.75(85) - 60}{7}\right]10 = 275.357$.
$IQR = Q3 - Q1 = 275.357 - 243.5 = 31.857$.
Note that an answer for the mean, median, mode, first quartile or third quartile that is not between
the highest and lowest number, in this case 210 and 320, is not reasonable!
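The mean, variance, standard deviation and quartiles for Seymour's frequencies can be recomputed with a minimal Python sketch. The name `grouped_fractile` is hypothetical, and it selects the class with $pN$ rather than $p(n+1)$, which is an assumption that gives the same classes here.

```python
import math

mid = list(range(215, 325, 10))                 # class midpoints 215, ..., 315
f = [1, 3, 12, 15, 22, 7, 7, 5, 2, 10, 1]       # Seymour's frequencies

n = sum(f)
sum_fx = sum(fi * x for fi, x in zip(f, mid))
sum_fx2 = sum(fi * x * x for fi, x in zip(f, mid))

mean = sum_fx / n
var = (sum_fx2 - n * mean ** 2) / (n - 1)       # computational formula
sd = math.sqrt(var)


def grouped_fractile(p, width=10):
    """L_p + [(pN - F) / f_p] * w for the class containing the fractile."""
    target = p * n
    cum = 0
    for x, fi in zip(mid, f):
        lower = x - width / 2
        if fi > 0 and cum + fi >= target:
            return lower + (target - cum) / fi * width
        cum += fi
    raise ValueError("p must be between 0 and 1")


q1 = grouped_fractile(0.25)
q3 = grouped_fractile(0.75)
iqr = q3 - q1

print(round(mean, 3), round(var, 3), round(sd, 4), q1, round(q3, 3), round(iqr, 3))
```

Carrying the unrounded mean through the computational formula gives 567.731, the value the computer reported, rather than the 567.762 obtained with $\bar{x}$ rounded to 260.647.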
h. Calculate a Statistic showing Skewness and interpret it (3):
We had $n = 85$, $\bar{x} = 260.647$, $\sum fx = 22155$, $\sum fx^2 = 5822325$, $\sum fx^3 = 1543144875$, and
$\sum f(x-\bar{x})^3 = 712778$.
$k_3 = \frac{n}{(n-1)(n-2)}\left[\sum fx^3 - 3\bar{x}\sum fx^2 + 2n\bar{x}^3\right] = \frac{85}{(84)(83)}\left[1543144875 - 3(260.647)(5822325) + 2(85)(260.647)^3\right]$
$= 0.0121916\left[1543144875 - 4552714633 + 3010281526\right] = 0.0121916(711768) = 8677.59$
or $k_3 = \frac{n}{(n-1)(n-2)}\sum f(x-\bar{x})^3 = \frac{85}{(84)(83)}(712778) = 8689.92$. The computer gets 8689.93.
or $g_1 = \frac{k_3}{s^3} = \frac{8689.93}{23.8271^3} = 0.6423958$
or Pearson's Measure of Skewness $SK = \frac{3(\text{mean} - \text{mode})}{\text{std. deviation}} = \frac{3(260.647 - 255)}{23.8271} = 0.7110$
Because of the positive sign, the measures imply skewness to the right.
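The three skewness measures can be recomputed from the grouped data with a minimal Python sketch; variable names are illustrative. It carries the mean at full precision, in which case the computational and definitional forms of $k_3$ agree; the gap between 8677.59 and 8689.92 above comes from rounding $\bar{x}$ to 260.647 in the computational form.

```python
import math

mid = list(range(215, 325, 10))                 # class midpoints
f = [1, 3, 12, 15, 22, 7, 7, 5, 2, 10, 1]       # Seymour's frequencies
n = sum(f)

mean = sum(fi * x for fi, x in zip(f, mid)) / n
sum_fx2 = sum(fi * x ** 2 for fi, x in zip(f, mid))
sum_fx3 = sum(fi * x ** 3 for fi, x in zip(f, mid))
scale = n / ((n - 1) * (n - 2))

# Definitional form: scaled sum of cubed deviations.
k3_def = scale * sum(fi * (x - mean) ** 3 for fi, x in zip(f, mid))
# Computational form: same quantity from the raw sums.
k3_comp = scale * (sum_fx3 - 3 * mean * sum_fx2 + 2 * n * mean ** 3)

s = math.sqrt((sum_fx2 - n * mean ** 2) / (n - 1))
g1 = k3_def / s ** 3                 # relative skewness
mode = 255                           # midpoint of the modal class 250-260
sk = 3 * (mean - mode) / s           # Pearson's measure

print(round(k3_def, 2), round(k3_comp, 2), round(g1, 4), round(sk, 4))
```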
i. An ogive is a simple line graph with cumulative frequency between zero and one on the y-axis and the
numbers 200-340 on the x-axis. The data Seymour showed is:
up to     F     Frel
210        0    0
220        1    .012
230        4    .047
240       16    .188
250       31    .365
260       53    .624
270       60    .706
280       67    .788
290       72    .847
300       74    .871
310       84    .988
320       85    1.000
330       85    1.000
Each number in the Frel column is the corresponding number in the F column divided by $n = 85$. The y
axis should be marked from zero to 1.00. In spite of the
fact that the question tells you that an ogive shows cumulative frequency, many of you gave me a frequency
polygon, most of you did not obey the convention that the curve starts at zero and most of you did not
convert to per cent.
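The points to plot for the ogive can be generated with a short Python sketch. The names are illustrative; the extra zero-frequency class at the end is an assumption made only to reproduce the final 330 row of the table.

```python
from itertools import accumulate

upper = list(range(210, 340, 10))               # "up to" values 210, ..., 330
f = [1, 3, 12, 15, 22, 7, 7, 5, 2, 10, 1]       # Seymour's frequencies
n = sum(f)

# Start at zero below 210 and pad one empty class so the curve ends flat.
F = [0] + list(accumulate(f + [0]))
F_rel = [c / n for c in F]

for u, c, r in zip(upper, F, F_rel):
    print(f"{u:>4} {c:>3} {r:.3f}")
```

Plotting `upper` on the x axis against `F_rel` on the y axis and joining the points with straight lines gives the ogive, starting at zero and ending at 1.00.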
j. The box plot should show the median and the quartiles and use the same x axis as the ogive.
2. Use the frequencies you used in problem 1 in this problem as values of x .
For these eleven numbers, compute the a) Geometric Mean b) Harmonic mean, c) Root-mean-square
(1 point each). Label each clearly. If you wish, d) Compute the geometric mean using natural or base 10
logarithms. (1 point extra credit each ). While you’re at it, compute the sample mean and
bring it and the numbers that you used on this take-home exam to the in-class exam
(no credit until you get to the exam – but it won’t hurt).
Solution: Note that Seymour used the eleven numbers 1, 3, 12, 15, 22, 7, 7, 5, 2, 10, 1. He found
$\sum x = 85$ or $\bar{x} = 7.72727$. This is not used in any of the following calculations and there is no reason why
you should have computed it except to use in class! Note that an answer that is not between the highest
and lowest number is not reasonable!
a) The Geometric Mean.
$x_g = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n} = \sqrt[n]{\prod x} = \sqrt[11]{(1)(3)(12)(15)(22)(7)(7)(5)(2)(10)(1)} = \sqrt[11]{58212000}$
$= (58212000)^{1/11} = (58212000)^{0.0909091} = 5.08054$. At least, not many of you tried to get the answer by dividing
58212000 by 11, but a number of you seem to have convinced yourselves that you could take a square
root instead of an 11th root.
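b) The Harmonic Mean.
$\frac{1}{x_h} = \frac{1}{n}\sum\frac{1}{x} = \frac{1}{11}\left(\frac{1}{1} + \frac{1}{3} + \frac{1}{12} + \frac{1}{15} + \frac{1}{22} + \frac{1}{7} + \frac{1}{7} + \frac{1}{5} + \frac{1}{2} + \frac{1}{10} + \frac{1}{1}\right)$
$= \frac{1}{11}(1.00000 + 0.33333 + 0.08333 + 0.06667 + 0.04545 + 0.14286 + 0.14286 + 0.20000 + 0.50000 + 0.10000 + 1.00000)$
$= \frac{1}{11}(3.61450) = 0.328591$.
So $x_h = \frac{1}{\frac{1}{n}\sum\frac{1}{x}} = \frac{1}{0.328591} = 3.0433$.
Of course many of you decided that $\frac{1}{x_h} = \frac{1}{n}\sum\frac{1}{x} = \frac{1}{11}\left(\frac{1}{1} + \frac{1}{3} + \frac{1}{12} + \frac{1}{15} + \frac{1}{22} + \frac{1}{7} + \frac{1}{7} + \frac{1}{5} + \frac{1}{2} + \frac{1}{10} + \frac{1}{1}\right)$
$= ??? = \frac{1}{n}\left(\frac{1}{\sum x}\right) = \frac{1}{11}\left(\frac{1}{85}\right)$. This is, of course, an easier way to do the problem, but I warned you that
it wouldn’t work. It is equivalent to believing that $\frac{1}{2} + \frac{1}{2} = \frac{1}{4}$.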
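A minimal Python sketch of the harmonic mean follows, with the invalid shortcut computed alongside it for contrast. Variable names are illustrative; `statistics.harmonic_mean` from the standard library is used only as a cross-check.

```python
import statistics

data = [1, 3, 12, 15, 22, 7, 7, 5, 2, 10, 1]
n = len(data)

# Correct: average the reciprocals, then invert.
mean_of_reciprocals = sum(1 / x for x in data) / n
harmonic = 1 / mean_of_reciprocals

# Invalid shortcut: replace the sum of reciprocals by the reciprocal of the sum.
shortcut = 1 / ((1 / n) * (1 / sum(data)))

print(round(mean_of_reciprocals, 6), round(harmonic, 4), round(shortcut, 1))
print(round(statistics.harmonic_mean(data), 4))
```

The shortcut returns $n \sum x$, far above the largest observation of 22, so the reasonableness check stated in the solution catches it immediately.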

c) The Root-Mean-Square.
$x_{rms}^2 = \frac{1}{n}\sum x^2 = \frac{1}{11}\left(1^2 + 3^2 + 12^2 + 15^2 + 22^2 + 7^2 + 7^2 + 5^2 + 2^2 + 10^2 + 1^2\right)$
$= \frac{1}{11}(1 + 9 + 144 + 225 + 484 + 49 + 49 + 25 + 4 + 100 + 1)$
$= \frac{1}{11}(1091) = 99.1818$. So $x_{rms} = \sqrt{\frac{1}{n}\sum x^2} = \sqrt{99.1818} = 9.9590$.
Of course many of you decided that $x_{rms}^2 = \frac{1}{n}\sum x^2 = ??? = \frac{1}{n}\left(\sum x\right)^2 = \frac{1}{11}(85)^2$. This is, of course, an
easier way to do the problem, but I warned you that it wouldn’t work. It is equivalent to believing that
$2^2 + 2^2 = 4^2$.
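The root-mean-square and the invalid shortcut can be compared in a few lines of Python; variable names are illustrative.

```python
import math

data = [1, 3, 12, 15, 22, 7, 7, 5, 2, 10, 1]
n = len(data)

# Correct: square each value, average, then take the square root.
mean_square = sum(x * x for x in data) / n
rms = math.sqrt(mean_square)

# Invalid shortcut: square the sum instead of summing the squares.
wrong = math.sqrt(sum(data) ** 2 / n)

print(round(mean_square, 4), round(rms, 4), round(wrong, 2))
```

The shortcut again lands outside the range of the data (above 22), while the correct value lies between the smallest and largest observations.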





 
d) (i) Geometric mean using natural logarithms
$\ln x_g = \frac{1}{n}\sum \ln(x) = \frac{1}{11}\left(\ln 1 + \ln 3 + \ln 12 + \ln 15 + \ln 22 + \ln 7 + \ln 7 + \ln 5 + \ln 2 + \ln 10 + \ln 1\right)$
$= \frac{1}{11}(0 + 1.09861 + 2.48491 + 2.70805 + 3.09104 + 1.94591 + 1.94591 + 1.60944 + 0.69315 + 2.30259 + 0)$
$= \frac{1}{11}(17.8796) = 1.62542$
So $x_g = e^{1.62542} = 5.08054$.
(ii) Geometric mean using logarithms to the base 10
$\log x_g = \frac{1}{n}\sum \log(x) = \frac{1}{11}\left(\log 1 + \log 3 + \log 12 + \log 15 + \log 22 + \log 7 + \log 7 + \log 5 + \log 2 + \log 10 + \log 1\right)$
$= \frac{1}{11}(0 + 0.47712 + 1.07918 + 1.17609 + 1.34242 + 0.84510 + 0.84510 + 0.69897 + 0.30103 + 1.00000 + 0)$
$= \frac{1}{11}(7.76501) = 0.705910$
So $x_g = 10^{0.705910} = 5.08054$.
Notice that the original numbers and all the means are between 1 and 22.
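All three routes to the geometric mean (the 11th root of the product, natural logarithms, and base 10 logarithms) can be checked against each other with a short Python sketch. Variable names are illustrative; `math.prod` requires Python 3.8 or later.

```python
import math

data = [1, 3, 12, 15, 22, 7, 7, 5, 2, 10, 1]
n = len(data)

product = math.prod(data)
g_root = product ** (1 / n)                                  # nth root of the product

mean_ln = sum(math.log(x) for x in data) / n
g_ln = math.exp(mean_ln)                                     # via natural logs

mean_log10 = sum(math.log10(x) for x in data) / n
g_log10 = 10 ** mean_log10                                   # via base 10 logs

print(product, round(mean_ln, 5), round(mean_log10, 6))
print(round(g_root, 5), round(g_ln, 5), round(g_log10, 5))
```

The logarithm routes avoid forming a large product, which matters for longer lists where the product could overflow or lose precision.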
It’s probably more efficient to handle a problem this large in columns. The arithmetic mean is also
computed below.
Row        $x$       $\frac{1}{x}$    $x^2$      $\log(x)$    $\ln(x)$
 1           1       1.00000            1        0.00000      0.00000
 2           3       0.33333            9        0.47712      1.09861
 3          12       0.08333          144        1.07918      2.48491
 4          15       0.06667          225        1.17609      2.70805
 5          22       0.04545          484        1.34242      3.09104
 6           7       0.14286           49        0.84510      1.94591
 7           7       0.14286           49        0.84510      1.94591
 8           5       0.20000           25        0.69897      1.60944
 9           2       0.50000            4        0.30103      0.69315
10          10       0.10000          100        1.00000      2.30259
11           1       1.00000            1        0.00000      0.00000
Total       85       3.61450         1091        7.76501     17.8796
Total/n      7.72727 0.328591          99.1818   0.705910     1.62542
So, as before $\bar{x} = 7.72727$, $x_h = \frac{1}{0.328591} = 3.0433$, $x_{rms} = \sqrt{\frac{1}{n}\sum x^2} = \sqrt{99.1818} = 9.9590$,
$x_g = 10^{0.705910} = 5.08054$ and $x_g = e^{1.62542} = 5.08054$.