251y0112 10/08/01 key

Part I.
ECO251 QBA1
FIRST HOUR EXAM
OCTOBER 2, 2001
Name _______key___________
SECTION MWF 10 11 TR 11 12:30
(10 points)
1. Indicate whether the following are: Nominal Data, Ordinal Data, Interval Data, Continuous Ratio Data or
Discrete Ratio Data. (3)
a. Price to earnings ratio of your stock. Ans: Continuous Ratio.
b. Number of customers who said that service was unsatisfactory in a survey. Ans: Discrete Ratio.
c. The Likert Scale rates customer satisfaction with your firm's service on a one to five scale
where 1 is exceptional and 5 is unsatisfactory. Ans: Ordinal.
(Note: See text p. 13 for most of this - discrete/continuous was defined in class)
2.(D-68) Which of the following explains the shape of a distribution best? (1)
a. Mean
b. Median
c. Box Plot.
*d. Stem-and-leaf plot
e. Mode
(Note: See text pp. 39-44)
3. Make a diagram of a table and show where the field is. (1)
4. The accompanying box plot shows the sale prices of homes (in thousands) in a Pennsylvania town
[Box plot not reproduced. Labels recovered from the plot: 0, 30, 60, 70, 80, 110, 140 (thousands of dollars).]
a. What percent of home prices fall between $60 thousand and $80 thousand - why? (2)
Ans: Since 60 is the first quartile and has 25% below it and 80 is the third quartile with 25%
above it, 50% must be between them
b. If the mean price is $69 thousand, is the data skewed to the left or right? (1)
Ans: For data that are skewed to the left, mean < median < mode. The diagram shows that the
median is 70, and the mean (69) is lower, so the data must be skewed to the left.
5. Which of the following is a graph that shows cumulative frequency? (2)
a. Histogram
*b. Ogive
c. Frequency Polygon
d. Pie chart
e. None of the above
Part II. Compute an appropriate answer, showing your work (15+ Points)
a) A distribution of 89 home sale prices has a mean of $67500, a median of $72500 and a standard
deviation of $10000. What is the maximum number of homes that have prices that could be below $37500?
(2)
Ans: Since 37500 is 3 standard deviations below the mean,
\[ z = \frac{x - \mu}{\sigma} = \frac{37500 - 67500}{10000} = -3, \]
according to Chebyshev, there could only be
\[ \frac{1}{k^2} = \frac{1}{3^2} = \frac{1}{9} \]
of the homes below $37500 or above $97500. Since 89/9 is about 9.9, this is less than 10
homes.
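The Chebyshev bound above can be checked with a short sketch. The function name `chebyshev_max_count` is hypothetical and used only for illustration; it applies the two-tailed 1/k^2 bound to the whole sample and rounds down.

```python
import math

def chebyshev_max_count(n, mean, sd, x):
    """Upper bound on how many of n observations can lie at least
    |x - mean| away from the mean (both tails combined)."""
    k = abs(x - mean) / sd          # distance from the mean in standard deviations
    if k <= 1:
        return n                    # the inequality gives no useful bound for k <= 1
    return math.floor(n / k**2)     # at most n / k^2 observations, rounded down

max_homes = chebyshev_max_count(n=89, mean=67500, sd=10000, x=37500)
print(max_homes)
```

With the figures from the question, the bound is 89/9 rounded down, which agrees with "less than 10 homes".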
b) Assume that the distribution above is symmetrical and unimodal. Give a rough answer to the
question in a) and explain your reasoning. (2)
Ans: Since 37500 is 3 standard deviations below the mean, the Empirical rule says that there will
be almost none below $37500.
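For a rough sense of "almost none", one can assume, beyond what the question states, that the symmetrical unimodal distribution is approximately normal. Under that assumption the expected number of homes more than 3 standard deviations below the mean is a small fraction of one home. The helper `normal_cdf` below is an illustrative name, not part of the exam.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = (37500 - 67500) / 10000          # -3 standard deviations
expected_below = 89 * normal_cdf(z)  # expected count if prices were normal (assumption)
print(round(expected_below, 2))
```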
c) The smallest selling price in the distribution above was $25,000 and the largest was $146,000 (Note
correction!). If these data are to be presented in five classes, what intervals would you use? Explain your
reasoning using an appropriate formula and use it to fill in the table below. (3)
Ans: \( \frac{146000 - 25000}{5} = 24200 \), so use 25000. This is only a suggestion. Any number somewhat
above 24200 will work, as long as you cover the range.
Class   From      to
A        25000     49999
B        50000     74999
C        75000     99999
D       100000    124999
E       125000    149999
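The class-width reasoning can be sketched as follows. The function `class_intervals` is a hypothetical helper; it assumes whole-dollar prices, so each class ends one dollar below the start of the next, as in the table.

```python
def class_intervals(low, high, n_classes, width):
    """Return (from, to) pairs for whole-number data, starting at `low`."""
    min_width = (high - low) / n_classes      # (146000 - 25000) / 5 = 24200
    if width <= min_width:
        raise ValueError(f"width must exceed {min_width} to cover the range")
    return [(low + i * width, low + (i + 1) * width - 1) for i in range(n_classes)]

intervals = class_intervals(low=25000, high=146000, n_classes=5, width=25000)
for label, (start, end) in zip("ABCDE", intervals):
    print(label, start, end)
```

Any width above 24200 passes the check, and a rounder number such as 25000 simply gives cleaner class limits.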
d) WIM technology weighs and measures trucks driving at highway speeds. Trucks are classified
in a report as follows:
A 'WIM gross weight above 70,000 lbs.'
B 'WIM gross weight 70,000 lbs. or less.'
C 'WIM total length above 60 ft.'
D 'WIM total length no more than 60 ft.'
Which of the following classes are mutually exclusive? (Circle) (1.5)
A and C;   *C and D;   A, B, and C
Which of the following classes are collectively exhaustive? (Circle) (1.5)
A and C;   *C and D;   *A, B, and C
(Note: This was graded at 0.5 for each item correctly marked or not marked.)
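The marked answers can be checked by brute force. The sketch below encodes the four class definitions as predicates and tests them on a small grid of hypothetical trucks chosen to fall on both sides of each cutoff; the grid values are illustrative, not data from the report.

```python
from itertools import product

# Class definitions from the question, as predicates on (weight_lbs, length_ft).
classes = {
    "A": lambda weight, length: weight > 70000,
    "B": lambda weight, length: weight <= 70000,
    "C": lambda weight, length: length > 60,
    "D": lambda weight, length: length <= 60,
}

# Hypothetical test trucks on both sides of (and exactly at) each cutoff.
trucks = list(product([50000, 70000, 90000], [40, 60, 80]))

def mutually_exclusive(names):
    """No truck may fall into more than one of the named classes."""
    return all(sum(classes[c](w, l) for c in names) <= 1 for w, l in trucks)

def collectively_exhaustive(names):
    """Every truck must fall into at least one of the named classes."""
    return all(any(classes[c](w, l) for c in names) for w, l in trucks)

for names in (["A", "C"], ["C", "D"], ["A", "B", "C"]):
    print(names, mutually_exclusive(names), collectively_exhaustive(names))
```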
e) For the numbers 3, 103, 203, 303 and 403, compute the i) Root-mean-square ii) Harmonic
mean, iii) Geometric mean (2.5 each)
Solution: Note that \( \sum x = 1015 \). This is not used in any of the following calculations and there is
no reason why you should have computed it!
(i) The Root-Mean-Square.
\[ x_{rms}^2 = \frac{1}{n}\sum x^2 = \frac{1}{5}\left(3^2 + 103^2 + 203^2 + 303^2 + 403^2\right) = \frac{1}{5}(9 + 10609 + 41209 + 91809 + 162409) = \frac{1}{5}(306045) = 61209. \]
So \( x_{rms} = \sqrt{61209} = 247.404 \).
(ii) The Harmonic Mean.
\[ \frac{1}{x_h} = \frac{1}{n}\sum \frac{1}{x} = \frac{1}{5}\left(\frac{1}{3} + \frac{1}{103} + \frac{1}{203} + \frac{1}{303} + \frac{1}{403}\right) \]
\[ = \frac{1}{5}(0.333333333 + 0.009708738 + 0.004926108 + 0.003300330 + 0.002481390) = \frac{1}{5}(0.353749899) = 0.070749979. \]
So \( x_h = \frac{1}{0.070749979} = 14.13427965 \).
(iii) The Geometric Mean.
\[ x_g = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n} = \sqrt[n]{\prod x} = \sqrt[5]{3(103)(203)(303)(403)} = \sqrt[5]{7659531243} = (7659531243)^{0.2} = 94.8070. \]
Or
\[ \ln(x_g) = \frac{1}{n}\sum \ln(x) = \frac{1}{5}\left(\ln 3 + \ln 103 + \ln 203 + \ln 303 + \ln 403\right) \]
\[ = \frac{1}{5}(1.09861 + 4.63473 + 5.31321 + 5.71373 + 5.99894) = \frac{1}{5}(22.75922) = 4.551844. \]
So \( x_g = e^{4.551844} = 94.8070 \).
Or
\[ \log(x_g) = \frac{1}{n}\sum \log(x) = \frac{1}{5}\left(\log 3 + \log 103 + \log 203 + \log 303 + \log 403\right) \]
\[ = \frac{1}{5}(0.47712 + 2.01284 + 2.30750 + 2.48144 + 2.60531) = \frac{1}{5}(9.88420) = 1.97684. \]
So \( x_g = 10^{1.97684} = 94.8070 \).
Notice that the original numbers and all the means are between 3 and 403.
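The three means can be verified numerically. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
import math

x = [3, 103, 203, 303, 403]
n = len(x)

rms = math.sqrt(sum(v**2 for v in x) / n)               # square root of the mean square
harmonic = n / sum(1 / v for v in x)                    # reciprocal of the mean reciprocal
geometric = math.exp(sum(math.log(v) for v in x) / n)   # exponential of the mean log

print(rms, harmonic, geometric)
```

The results should agree with the hand calculations to the precision shown there, and all three fall between 3 and 403.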
Part III. Do the following problems (25 Points)
1. I have the following data for sales clerk work hours at a sample of 8 stores.
310 254 180 170 116 100 96 320
Compute the following:
a) The Median (1)
b) The Standard Deviation (4)
c) The 3rd Decile (2)
Index     x      \( x^2 \)    \( x - \bar{x} \)    \( (x - \bar{x})^2 \)
1         96       9216       -97.25        9457.6
2        100      10000       -93.25        8695.6
3        116      13456       -77.25        5967.6
4        170      28900       -23.25         540.6
5        180      32400       -13.25         175.6
6        254      64516        60.75        3690.6
7        310      96100       116.75       13630.6
8        320     102400       126.75       16065.6
Total   1546     356988         0.00       58223.5
Note that, to be reasonable, the mean, median and 3rd decile must fall between 96 and 320.
Solution: Compute the following:
Note that x is in order. \( n = 8 \), \( \sum x = 1546 \), \( \sum x^2 = 356988 \),
\( \sum (x - \bar{x}) = 0.00 \), \( \sum (x - \bar{x})^2 = 58223.5 \).
a) Just put the numbers in order and average the middle numbers: \( x_{.5} = \frac{x_4 + x_5}{2} = \frac{170 + 180}{2} = 175 \).
Or formally: position \( = p(n + 1) = a.b = .5(9) = 4.5 \).
\( x_{1-p} = x_a + .b(x_{a+1} - x_a) \) so \( x_{1-.5} = x_{.5} = x_4 + 0.5(x_5 - x_4) = 170 + 0.5(180 - 170) = 175 \).
b) \( \bar{x} = \frac{\sum x}{n} = \frac{1546}{8} = 193.25 \)
\[ s^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1} = \frac{356988 - 8(193.25)^2}{7} = 8317.64 \]
or
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{58223.5}{7} = 8317.64, \qquad s = \sqrt{8317.64} = 91.2011 \]
c) The 3rd decile has 30% below it. position \( = p(n + 1) = a.b = 0.3(9) = 2.7 \). \( a = 2 \), \( .b = 0.7 \).
\( x_{1-p} = x_a + .b(x_{a+1} - x_a) \) so \( x_{1-.3} = x_{.7} = x_2 + 0.7(x_3 - x_2) = 100 + 0.7(116 - 100) = 111.2 \)
(New Formula: position \( = 1 + p(n - 1) = a.b = 1 + 0.3(7) = 1 + 2.1 = 3.1 \). \( a = 3 \), \( .b = 0.1 \).
\( x_{1-p} = x_a + .b(x_{a+1} - x_a) \) so \( x_{1-.3} = x_{.7} = x_3 + 0.1(x_4 - x_3) = 116 + 0.1(170 - 116) = 121.4 \).)
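The calculations for this problem can be reproduced with a small sketch. The helper `quantile` is a hypothetical name; it implements the interpolation \( x_a + .b(x_{a+1} - x_a) \) for a given position, so both position formulas from the solution can be passed to it.

```python
import math

def quantile(sorted_x, position):
    """Interpolate at a 1-based, possibly fractional position a.b."""
    a = int(position)                              # whole part a
    b = position - a                               # fractional part .b
    lower = sorted_x[a - 1]
    upper = sorted_x[min(a, len(sorted_x) - 1)]    # guard against running off the end
    return lower + b * (upper - lower)

x = sorted([310, 254, 180, 170, 116, 100, 96, 320])
n = len(x)
mean = sum(x) / n
s = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))   # sample standard deviation

median = quantile(x, 0.5 * (n + 1))            # position p(n + 1)
decile3 = quantile(x, 0.3 * (n + 1))           # position p(n + 1)
decile3_new = quantile(x, 1 + 0.3 * (n - 1))   # position 1 + p(n - 1)
print(median, s, decile3, decile3_new)
```

The two position rules give different third deciles for the same data, which is why the key reports both 111.2 and 121.4.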
2. A bank is investigating the amount of time customers are put on hold when they call. The times are
tabulated below. (Assume that the numbers are a sample.)
amount                    frequency
less than 30 seconds        2200
30 - 59.99 seconds           800
60 - 89.99 seconds           770
90 - 119.99 seconds          200
120 - 149.99 seconds          20
150 - 179.99 seconds          10

a. Calculate the Cumulative Frequency (1)
b. Calculate the Mean (1)
c. Calculate the Median (2)
d. Calculate the Mode (1)
e. Calculate the Variance (3)
f. Calculate the Standard Deviation (2)
g. Calculate the Interquartile Range (3)
h. Calculate a Statistic showing Skewness and Interpret it (3)
i. Make a frequency polygon of the Data (Neatness Counts!) (2)
(Note: It may make things easier to move the decimal point to the left in the midpoint column before you
start calculating, but be careful of the median etc. if you do it. For a printout doing things this way, see
251z0112.)
Solution: x is the midpoint of the class. Our convention is to use the midpoint of 0 to 2, not 1.99999.
class   interval       f      F      x       fx
A         0- 29.99   2200   2200     15    33000
B        30- 59.99    800   3000     45    36000
C        60- 89.99    770   3770     75    57750
D        90-119.99    200   3970    105    21000
E       120-149.99     20   3990    135     2700
F       150-179.99     10   4000    165     1650
Total                4000                 152100

class    \( fx^2 \)    \( fx^3 \)    \( x - \bar{x} \)   \( f(x - \bar{x}) \)   \( f(x - \bar{x})^2 \)   \( f(x - \bar{x})^3 \)
A         495000      7425000    -23.025    -50655.0     1166331    -26854784
B        1620000     72900000      6.975      5580.0       38920       271470
C        4331250    324843744     36.975     28470.7     1052706     38923800
D        2205000    231524992     66.975     13395.0      897130     60085288
E         364500     49207500     96.975      1939.5      188083     18239350
F         272250     44921248    126.975      1269.7      161227     20471734
Total    9288000    730822528                    0.0     3504398    111136864

\( n = \sum f = 4000 \), \( \sum fx = 152100 \), \( \sum fx^2 = 9288000 \), \( \sum fx^3 = 730822528 \),
\( \sum f(x - \bar{x}) = 0 \), \( \sum f(x - \bar{x})^2 = 3504398 \), and \( \sum f(x - \bar{x})^3 = 111136864 \).
Note that, to be reasonable, the mean, median and quartiles must fall between 0 and 180. (If you moved
your decimal point one place to the left before you started, your \( x \) column is now in tens, \( fx \) is in tens,
\( fx^2 \) is in hundreds, \( fx^3 \) is in thousands, \( x - \bar{x} \) is in tens, \( f(x - \bar{x}) \) is in tens,
\( f(x - \bar{x})^2 \) is in hundreds and \( f(x - \bar{x})^3 \) is in thousands.)
a. Calculate the Cumulative Frequency (1): (See above) The cumulative frequency is the whole F column.
b. Calculate the Mean (1): \( \bar{x} = \frac{\sum fx}{n} = \frac{152100}{4000} = 38.025 \)
c. Calculate the Median (2): position \( = p(n + 1) = .5(4001) = 2000.5 \). This is above \( F = 0 \) and below
\( F = 2200 \), so the interval is A, 0-29.99. \( x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right] w \) so
\[ x_{1-.5} = x_{.5} = 0 + \left[\frac{.5(4000) - 0}{2200}\right] 30 = 27.2727 \]
d. Calculate the Mode (1): The mode is the midpoint of the largest group. Since 2200 is the largest
frequency, the modal group is 0 to 29.99 and the mode is 15.000.
e. Calculate the Variance (3):
\[ s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n - 1} = \frac{9288000 - 4000(38.025)^2}{3999} = 876.318 \]
or
\[ s^2 = \frac{\sum f(x - \bar{x})^2}{n - 1} = \frac{3504398}{3999} = 876.318 \]
f. Calculate the Standard Deviation (2): \( s = \sqrt{876.318} = 29.6027 \)
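The grouped mean, variance and standard deviation from parts b, e and f can be checked with the sketch below. It uses the class midpoints and frequencies from the table and compares the computational and definitional variance formulas; the variable names are illustrative.

```python
import math

midpoints = [15, 45, 75, 105, 135, 165]
freqs = [2200, 800, 770, 200, 20, 10]

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
sum_fx2 = sum(f * x**2 for f, x in zip(freqs, midpoints))

mean = sum_fx / n
var_shortcut = (sum_fx2 - n * mean**2) / (n - 1)      # computational formula
var_definitional = sum(
    f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)
) / (n - 1)                                           # definitional formula
sd = math.sqrt(var_shortcut)
print(mean, var_shortcut, var_definitional, sd)
```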
g. Calculate the Interquartile Range (3): First Quartile: position \( = p(n + 1) = .25(4001) = 1000.25 \). This is
above \( F = 0 \) and below \( F = 2200 \), so the interval is A, 0-29.99. \( x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right] w \) gives us
\[ Q_1 = x_{1-.25} = x_{.75} = 0 + \left[\frac{.25(4000) - 0}{2200}\right] 30 = 13.636. \]
Third Quartile: position \( = p(n + 1) = .75(4001) = 3000.75 \). This is above \( F = 3000 \) and below \( F = 3770 \),
so the interval is C, 60-89.99.
\[ Q_3 = x_{1-.75} = x_{.25} = 60 + \left[\frac{.75(4000) - 3000}{770}\right] 30 = 60.000. \]
\( IQR = Q_3 - Q_1 = 60.000 - 13.636 = 46.364 \).
(New Formula:
For the median: position \( = 1 + p(n - 1) = 1 + 0.5(3999) = 2000.5 \). This is the same result as on the previous
page.
For the first quartile: position \( = 1 + p(n - 1) = 1 + 0.25(3999) = 1000.75 \). This leads to interval A and the
same result as above.
For the third quartile: position \( = 1 + p(n - 1) = 1 + 0.75(3999) = 3000.25 \). This leads to interval C and the
same result as above.)
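The grouped-data interpolation used for the median and quartiles can be sketched as a single function. The name `grouped_quantile` is hypothetical; it locates the class with the position \( p(n + 1) \) and then applies \( L_p + \left[\frac{pN - F}{f_p}\right] w \), assuming equal class widths of 30.

```python
lower_limits = [0, 30, 60, 90, 120, 150]
freqs = [2200, 800, 770, 200, 20, 10]
width = 30

def grouped_quantile(p):
    """Interpolated quantile for grouped data: L_p + ((pN - F) / f_p) * w."""
    n = sum(freqs)
    position = p * (n + 1)        # used only to locate the class
    cum_before = 0                # F: cumulative frequency below the class
    for lower, f in zip(lower_limits, freqs):
        if position <= cum_before + f:
            return lower + (p * n - cum_before) / f * width
        cum_before += f
    raise ValueError("position falls outside the table")

median = grouped_quantile(0.5)
q1, q3 = grouped_quantile(0.25), grouped_quantile(0.75)
iqr = q3 - q1
print(median, q1, q3, iqr)
```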
h. Calculate a Statistic showing Skewness and interpret it (3):
\[ k_3 = \frac{n}{(n - 1)(n - 2)}\left[\sum fx^3 - 3\bar{x}\sum fx^2 + 2n\bar{x}^3\right] = \frac{4000}{(3999)(3998)}\left[730822528 - 3(38.025)(9288000) + 2(4000)(38.025)^3\right] \]
\[ = 0.000250188(111136864) = 27805.1 \]
or
\[ k_3 = \frac{n}{(n - 1)(n - 2)}\sum f(x - \bar{x})^3 = \frac{4000}{(3999)(3998)}(111136864) = 27805.1 \]
or
\[ g_1 = \frac{k_3}{s^3} = \frac{27805.1}{(29.6027)^3} = 1.07184 \]
or Pearson's Measure of Skewness
\[ SK = \frac{3(\text{mean} - \text{mode})}{\text{std. deviation}} = \frac{3(38.025 - 15.0)}{29.6027} = 2.333 \]
Because of the positive sign, the measures imply skewness to the right.
i. Make a frequency polygon of the Data (Neatness Counts!) (2): A frequency polygon is a line graph of the
frequency. It should hit zero on the left at \( x = -15 \), but this point will not show if the x-axis starts at zero. The
next point is 2200 at \( x = 15 \), so the height at \( x = 0 \) is \( y = 2200/2 = 1100 \). It falls after that (the next point is
\( f = 800 \) at \( x = 45 \)) and hits zero at \( x = 195 \), which may be hard to show. In general, it is difficult to put a
consistent scale on the y-axis because of the extreme differences in the values of \( f \). Putting the y-axis on a
logarithmic scale with the distances 1 to 10, 10 to 100, 100 to 1000 and 1000 to 10000 equal would help.
This might be a bit hard and messy, however, without some appropriate graph paper.
A copy of the frequency polygon as done by Minitab appears on the next page, but I would prefer
to see the x-axis and the y-axis start at zero and the x-points marked as 15, 45, 75, etc.
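The skewness measures from part h can be reproduced from the midpoints and frequencies. This is a sketch under the same conventions as the solution (sample formulas with \( n - 1 \), mode taken as the midpoint of the largest class); the variable names are illustrative.

```python
midpoints = [15, 45, 75, 105, 135, 165]
freqs = [2200, 800, 770, 200, 20, 10]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
s = (sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / (n - 1)) ** 0.5

# Third-moment statistic k3 and relative skewness g1 = k3 / s^3.
sum_cubed_dev = sum(f * (x - mean) ** 3 for f, x in zip(freqs, midpoints))
k3 = n / ((n - 1) * (n - 2)) * sum_cubed_dev
g1 = k3 / s**3

# Pearson's measure as used in the solution: 3(mean - mode) / s.
mode = midpoints[freqs.index(max(freqs))]
pearson_sk = 3 * (mean - mode) / s
print(k3, g1, pearson_sk)
```

All three statistics come out positive, consistent with skewness to the right.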
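[Figure: Minitab frequency polygon of the hold times. Vertical axis f (ticks 0, 1000, 2000); horizontal axis x (ticks 0, 100, 200).]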