251y0111 10/08/01 key

Part I.
ECO251 QBA1
FIRST HOUR EXAM
OCTOBER 2, 2001
Name ____key______________
SECTION MWF 10 11 TR 11 12:30
(10 points)
1. Indicate whether the following are: Nominal Data, Ordinal Data, Interval Data, Continuous Ratio Data or
Discrete Ratio Data. (3)
a. The number of students taking ECO 251 Ans: Discrete Ratio.
b. Your firm's profits as a percent of sales Ans: Continuous Ratio.
c. The Likert Scale rates customer satisfaction with your firm's service on a one to five scale
where 1 is exceptional and 5 is unsatisfactory. Ans: Ordinal.
(Note: See text p. 13 for most of this - discrete/continuous was defined in class)
2.(D-68) Which of the following explains the shape of a distribution best? (1)
a. Mean
b. Median
c. Box Plot.
*d. Stem-and-leaf plot
e. Mode
(Note: See text pp. 39-44)
3. Make a diagram of a table and show where the stub is. (1)
4. The accompanying box plot shows the sale prices of homes (in thousands) in a Pennsylvania town.
[Box plot not reproduced. Values marked on the price axis: 0, 30, 60, 70, 80, 110, 140.]
a. What percent of home prices fall between $60 thousand and $80 thousand - why? (2)
Ans: Since 60 is the first quartile and has 25% below it and 80 is the third quartile with 25%
above it, 50% must be between them.
b. If the mean price is $71 thousand, is the data skewed to the left or right? (1)
Ans: For data that are skewed to the right, mean > median > mode. The diagram shows that the median is 70, and the mean (71) is higher, so the data must be skewed to the right.
5. Which of the following is a graph that consists of bars, each of which represents the frequency \( f \)? (2)
*a. Histogram
b. Ogive
c. Frequency Polygon
d. Pie chart
e. None of the above
Part II. Compute an appropriate answer, showing your work (15+ Points)
a) A distribution of 89 home sale prices has a mean of $67500, a median of $72500 and a standard deviation of $10000. What is the maximum number of homes that could have prices above $97500? (2)
Ans: Since 97500 is 3 standard deviations above the mean,
\[ z = \frac{x - \bar{x}}{s} = \frac{97500 - 67500}{10000} = 3 , \]
according to Chebyshev's inequality at most
\[ \frac{1}{k^2} = \frac{1}{3^2} = \frac{1}{9} \]
of the prices could be above $97500. One ninth of 89 is about 9.9, so this is fewer than 10 homes.
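The Chebyshev bound in a) can be checked with a few lines of Python. This is only a sketch of the arithmetic in the answer above; the variable names are illustrative and not part of the exam.

```python
import math

# Figures from part a)
mean = 67500.0
sd = 10000.0
n = 89
threshold = 97500.0

# Number of standard deviations between the threshold and the mean
k = (threshold - mean) / sd

# Chebyshev: at most 1/k^2 of the data lie more than k standard
# deviations from the mean (both tails combined, so this is a loose
# upper bound for the upper tail alone).
max_fraction = 1.0 / k**2

# A count of homes must be a whole number, so round the bound down.
max_homes = math.floor(n * max_fraction)

print(k, max_fraction, max_homes)
```

The bound uses both tails, which is why it is far more generous than the Empirical rule answer in b).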
b) Assume that the distribution above is symmetrical and unimodal. Give a rough answer to the
question in a) and explain your reasoning. (2)
Ans: Since 97500 is 3 standard deviations above the mean, the Empirical rule says that there will
be almost none above $97500.
c) The smallest selling price in the distribution above was $25,000 and the largest was $150,000. If these data are to be presented in six classes, what intervals would you use? Explain your reasoning using an appropriate formula and use it to fill in the table below. (3)
Ans: \( \frac{150000 - 25000}{6} \approx 20833 \), so use 22000. This is only a suggestion. Any number somewhat above 20833 will work, as long as you cover the range.

    Class    From      To
    A        24000     45999
    B        46000     67999
    C        68000     89999
    D        90000     111999
    E        112000    133999
    F        134000    155999
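The class table in c) can be generated mechanically. The sketch below assumes the key's own choices of a 22000 class width and a starting point of 24000; any other width above the minimum and any start that covers the range would serve equally well.

```python
# Figures from part c)
low, high, classes = 25000, 150000, 6

# Smallest width that could cover the range in six classes
min_width = (high - low) / classes

# The key's suggested (rounded-up) width and starting point
width = 22000
start = 24000

# Each class runs from its lower limit to one unit below the next lower limit
intervals = [(start + i * width, start + (i + 1) * width - 1)
             for i in range(classes)]

# The classes must cover the whole range of the data
covers_range = intervals[0][0] <= low and intervals[-1][1] >= high

for label, (lo, hi) in zip("ABCDEF", intervals):
    print(label, lo, hi)
```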
d) WIM technology weighs and measures trucks driving at highway speeds. Trucks are classified in a report as follows:
A 'WIM gross weight above 70,000 lbs.'
B 'WIM gross weight 70,000 lbs. or less.'
C 'WIM total length above 60 ft.'
D 'WIM total length no more than 60 ft.'
Which of the following classes are mutually exclusive? (Circle) (1.5)
*A and B;   B and C;   A, B, and C
Which of the following classes are collectively exhaustive? (Circle) (1.5)
*A and B;   B and C;   *A, B, and C
(Note: This was graded at 0.5 for each item correctly marked or not marked.)
e) For the numbers 1, 101, 201, 301 and 401, compute the i) Root-mean-square ii) Harmonic mean, iii) Geometric mean (2 each)
Solution: Note that \( \sum x = 1005 \). This is not used in any of the following calculations and there is no reason why you should have computed it!
(i) The Root-Mean-Square.
\[ x_{rms}^2 = \frac{1}{n}\sum x^2 = \frac{1}{5}\left(1^2 + 101^2 + 201^2 + 301^2 + 401^2\right) = \frac{1}{5}\left(1 + 10201 + 40401 + 90601 + 160801\right) = \frac{1}{5}(302005) = 60401 . \]
So \( x_{rms} = \sqrt{60401} = 245.766 \).
(ii) The Harmonic Mean.
\[ \frac{1}{x_h} = \frac{1}{n}\sum \frac{1}{x} = \frac{1}{5}\left(\frac{1}{1} + \frac{1}{101} + \frac{1}{201} + \frac{1}{301} + \frac{1}{401}\right) \]
\[ = \frac{1}{5}\left(1.000000000 + 0.00990099 + 0.004975124 + 0.003322259 + 0.002493766\right) = \frac{1}{5}(1.020692139) = 0.204138428 . \]
So \( x_h = \frac{1}{0.204138428} = 4.8986 \).
(iii) The Geometric Mean.
\[ x_g = \left(x_1 \cdot x_2 \cdot x_3 \cdots x_n\right)^{1/n} = \sqrt[n]{\prod x} = \sqrt[5]{(1)(101)(201)(301)(401)} = \sqrt[5]{2450351001} = (2450351001)^{0.2} = 75.4824 . \]
Or
\[ \ln(x_g) = \frac{1}{n}\sum \ln(x) = \frac{1}{5}\left(\ln 1 + \ln 101 + \ln 201 + \ln 301 + \ln 401\right) = \frac{1}{5}\left(0 + 4.6151 + 5.3033 + 5.7071 + 5.9939\right) = \frac{1}{5}(21.6194) = 4.32388 . \]
So \( x_g = e^{4.32388} = 75.4824 \).
Or
\[ \log(x_g) = \frac{1}{n}\sum \log(x) = \frac{1}{5}\left(\log 1 + \log 101 + \log 201 + \log 301 + \log 401\right) = \frac{1}{5}\left(0 + 2.00432 + 2.30320 + 2.47857 + 2.60314\right) = \frac{1}{5}(9.38922) = 1.87785 . \]
So \( x_g = 10^{1.87785} = 75.4824 \).
Notice that the original numbers and all the means are between 1 and 401.
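The three means in e) can be verified numerically. The following is a minimal sketch using only the standard `math` module; the variable names are illustrative.

```python
import math

data = [1, 101, 201, 301, 401]
n = len(data)

# Root-mean-square: square root of the mean of the squares
rms = math.sqrt(sum(x**2 for x in data) / n)

# Harmonic mean: reciprocal of the mean of the reciprocals
harmonic = n / sum(1 / x for x in data)

# Geometric mean, computed through natural logs as in the second method above
geometric = math.exp(sum(math.log(x) for x in data) / n)

print(round(rms, 3), round(harmonic, 4), round(geometric, 4))

# Sanity check from the key: every mean lies between the smallest and largest value
all_in_range = all(min(data) <= m <= max(data)
                   for m in (rms, harmonic, geometric))
```

Working through logs avoids forming the large product 2450351001 directly, which matters more for longer data sets than for this one.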
Part III. Do the following problems (25 Points)
1. I have the following data for sales clerk work hours at a sample of 8 stores.
300 254 190 170 116 100 96 320
Compute the following:
a) The Median (1)
b) The Standard Deviation (4)
c) The 3rd Decile (2)
    Index    x       x^2       x - mean    (x - mean)^2
    1        96      9216      -97.25      9457.6
    2        100     10000     -93.25      8695.6
    3        116     13456     -77.25      5967.6
    4        170     28900     -23.25      540.6
    5        190     36100     -3.25       10.6
    6        254     64516     60.75       3690.6
    7        300     90000     106.75      11395.6
    8        320     102400    126.75      16065.6
    Total    1546    354588    0.00        55823.5

Note that, to be reasonable, the mean, median and 3rd decile must fall between 96 and 320.
Solution: Compute the following. Note that x is in order.
\( n = 8 \), \( \sum x = 1546 \), \( \sum x^2 = 354588 \), \( \sum (x - \bar{x}) = 0.00 \), \( \sum (x - \bar{x})^2 = 55823.5 \).
a) Just put the numbers in order and average the middle numbers: \( x_{.5} = \frac{x_4 + x_5}{2} = \frac{170 + 190}{2} = 180 \).
Or formally: \( position = p(n+1) = a.b = .5(9) = 4.5 \).
\( x_{1-p} = x_a + .b(x_{a+1} - x_a) \) so \( x_{1-.5} = x_{.5} = x_4 + 0.5(x_5 - x_4) = 170 + 0.5(190 - 170) = 180 \).
b) \( \bar{x} = \frac{\sum x}{n} = \frac{1546}{8} = 193.25 \)
\[ s^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1} = \frac{354588 - 8(193.25)^2}{7} = 7974.786 \quad \text{or} \quad s^2 = \frac{\sum (x - \bar{x})^2}{n-1} = \frac{55823.5}{7} = 7974.786 \]
\( s = \sqrt{7974.786} = 89.3017 \)
c) The 3rd decile has 30% below it. \( position = p(n+1) = a.b = 0.3(9) = 2.7 \). \( a = 2 \), \( .b = 0.7 \).
\( x_{1-p} = x_a + .b(x_{a+1} - x_a) \) so \( x_{1-.3} = x_{.7} = x_2 + 0.7(x_3 - x_2) = 100 + 0.7(116 - 100) = 111.2 \)
(New Formula: \( position = 1 + p(n-1) = a.b = 1 + 0.3(7) = 1 + 2.1 = 3.1 \). \( a = 3 \), \( .b = 0.1 \).
\( x_{1-p} = x_a + .b(x_{a+1} - x_a) \) so \( x_{1-.3} = x_{.7} = x_3 + 0.1(x_4 - x_3) = 116 + 0.1(170 - 116) = 121.4 \).)
2. A bank is investigating the amount of time customers are put on hold when they call. The times are tabulated below. (Assume that the numbers are a sample.)

    amount                    frequency
    less than 30 seconds      2100
    30 - 59.99 seconds        900
    60 - 89.99 seconds        770
    90 - 119.99 seconds       200
    120 - 149.99 seconds      20
    150 - 179.99 seconds      10

a. Calculate the Cumulative Frequency (1)
b. Calculate the Mean (1)
c. Calculate the Median (2)
d. Calculate the Mode (1)
e. Calculate the Variance (3)
f. Calculate the Standard Deviation (2)
g. Calculate the Interquartile Range (3)
h. Calculate a Statistic showing Skewness and Interpret it (3)
i. Make a frequency polygon of the Data (Neatness Counts!) (2)
(Note - It may make things easier to move the decimal point to the left in the midpoint column, before you start calculating - but be careful of the median etc. if you do it. For a printout doing things this way, see 251z0111)
Solution: x is the midpoint of the class. Our convention is to use the midpoint of 0 to 2, not of 0 to 1.99999, so class A (0-29.99) has midpoint 15.

    class            f       F       x      fx
    A   0- 29.99     2100    2100    15     31500
    B  30- 59.99     900     3000    45     40500
    C  60- 89.99     770     3770    75     57750
    D  90-119.99     200     3970    105    21000
    E 120-149.99     20      3990    135    2700
    F 150-179.99     10      4000    165    1650
    Total            4000                   155100

    class    fx^2       fx^3         x - mean    f(x - mean)    f(x - mean)^2    f(x - mean)^3
    A        472500     7087500      -23.775     -49927.5       1187027          -28221556
    B        1822500    82012496     6.225       5602.5         34876            217100
    C        4331250    324843744    36.225      27893.2        1010433          36602928
    D        2205000    231524992    66.225      13245.0        877150           58089264
    E        364500     49207500     96.225      1924.5         185185           17819428
    F        272250     44921248     126.225     1262.3         159328           20111114
    Total    9468000    739597440                0.0            3453997          104618280

\( n = \sum f = 4000 \), \( \sum fx = 155100 \), \( \sum fx^2 = 9468000 \), \( \sum fx^3 = 739597440 \), \( \sum f(x - \bar{x}) = 0 \), \( \sum f(x - \bar{x})^2 = 3453997 \), and \( \sum f(x - \bar{x})^3 = 104618280 \). Note that, to be reasonable, the mean, median and quartiles must fall between 0 and 180. (If you moved your decimal point one place to the left before you started, your \( x \) column is now in tens, \( fx \) is in tens, \( fx^2 \) is in hundreds, \( fx^3 \) is in thousands, \( x - \bar{x} \) is in tens, \( f(x - \bar{x}) \) is in tens, \( f(x - \bar{x})^2 \) is in hundreds and \( f(x - \bar{x})^3 \) is in thousands.)
a. Calculate the Cumulative Frequency (1): (See above) The cumulative frequency is the whole F column.
b. Calculate the Mean (1): \( \bar{x} = \frac{\sum fx}{n} = \frac{155100}{4000} = 38.775 \)
c. Calculate the Median (2): \( position = p(n+1) = .5(4001) = 2000.5 \). This is above \( F = 0 \) and below \( F = 2100 \), so the interval is A, 0-29.99.
\[ x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right] w \quad \text{so} \quad x_{1-.5} = x_{.5} = 0 + \left[\frac{.5(4000) - 0}{2100}\right] 30 = 28.5714 \]
d. Calculate the Mode (1) The mode is the midpoint of the largest group. Since 2100 is the largest
frequency, the modal group is 0 to 29.99 and the mode is 15.000.
e. Calculate the Variance (3):
\[ s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n-1} = \frac{9468000 - 4000(38.775)^2}{3999} = 863.715 \quad \text{or} \quad s^2 = \frac{\sum f(x - \bar{x})^2}{n-1} = \frac{3453997}{3999} = 863.715 \]
f. Calculate the Standard Deviation (2): \( s = \sqrt{863.715} = 29.3890 \)
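The grouped-data mean, variance and standard deviation in parts b, e and f can be checked as follows. This is a sketch that simply repeats the computational formula with the class midpoints and frequencies from the table; the list names are illustrative.

```python
import math

# Class midpoints and frequencies from the solution table
midpoints = [15, 45, 75, 105, 135, 165]
freqs = [2100, 900, 770, 200, 20, 10]

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))

mean = sum_fx / n

# Computational formula for the sample variance of grouped data
variance = (sum_fx2 - n * mean**2) / (n - 1)
sd = math.sqrt(variance)

print(n, sum_fx, sum_fx2, mean, round(variance, 3), round(sd, 4))
```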
g. Calculate the Interquartile Range (3):
First Quartile: \( position = p(n+1) = .25(4001) = 1000.25 \). This is above \( F = 0 \) and below \( F = 2100 \), so the interval is A, 0-29.99. \( x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right] w \) gives us
\[ Q1 = x_{1-.25} = x_{.75} = 0 + \left[\frac{.25(4000) - 0}{2100}\right] 30 = 14.286 . \]
Third Quartile: \( position = p(n+1) = .75(4001) = 3000.75 \). This is above \( F = 3000 \) and below \( F = 3770 \), so the interval is C, 60-89.99.
\[ Q3 = x_{1-.75} = x_{.25} = 60 + \left[\frac{.75(4000) - 3000}{770}\right] 30 = 60.000 . \]
\( IQR = Q3 - Q1 = 60.000 - 14.286 = 45.714 \).
(New Formula:
For the median - \( position = 1 + p(n-1) = 1 + 0.5(3999) = 2000.5 \). This is the same result as above.
For the first quartile - \( position = 1 + p(n-1) = 1 + 0.25(3999) = 1000.75 \). This leads to interval A and the same result as above.
For the third quartile - \( position = 1 + p(n-1) = 1 + 0.75(3999) = 3000.25 \). This leads to interval C and the same result as above.)
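The grouped median and quartiles can be reproduced with a small function. The sketch below follows the key's two steps: locate the class with position = p(n + 1), then interpolate with L + [(pN - F)/f]w. The function name `grouped_quantile` is hypothetical, and a common class width of 30 is assumed, as in the table.

```python
lower_limits = [0, 30, 60, 90, 120, 150]
freqs = [2100, 900, 770, 200, 20, 10]
width = 30

def grouped_quantile(p):
    """Value with a fraction p of the data below it, by linear interpolation."""
    n = sum(freqs)
    position = p * (n + 1)      # used only to pick the class
    cum_before = 0              # F: cumulative frequency below the class
    for lower, f in zip(lower_limits, freqs):
        if cum_before + f >= position:
            return lower + (p * n - cum_before) / f * width
        cum_before += f
    raise ValueError("p must be between 0 and 1")

q1 = grouped_quantile(0.25)
median = grouped_quantile(0.50)
q3 = grouped_quantile(0.75)
iqr = q3 - q1

print(round(q1, 3), round(median, 4), round(q3, 3), round(iqr, 3))
```

Because the third quartile falls exactly on the boundary between classes B and C, either class gives 60, which matches the key's remark that the new position formula leads to the same result.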
h. Calculate a Statistic showing Skewness and interpret it (3):
\[ k_3 = \frac{n}{(n-1)(n-2)}\left[\sum fx^3 - 3\bar{x}\sum fx^2 + 2n\bar{x}^3\right] = \frac{4000}{(3999)(3998)}\left[739597440 - 3(38.775)(9468000) + 2(4000)(38.775)^3\right] = 0.000250188(104618240) = 26174.2 \]
or
\[ k_3 = \frac{n}{(n-1)(n-2)}\sum f(x - \bar{x})^3 = \frac{4000}{(3999)(3998)}(104618280) = 26174.2 \]
or
\[ g_1 = \frac{k_3}{s^3} = \frac{26174.2}{(29.3890)^3} = 1.03114 \]
or Pearson's Measure of Skewness
\[ SK = \frac{3(\text{mean} - \text{mode})}{\text{std. deviation}} = \frac{3(38.775 - 15.0)}{29.3890} = 2.427 \]
Because of the positive sign, the measures imply skewness to the right.
i. Make a frequency polygon of the Data (Neatness Counts!) (2) A frequency polygon is a line graph of the frequency. It should hit zero on the left at \( x = -15 \), but this point will not show if the x axis starts at zero. The next point is 2100 at \( x = 15 \), so the \( x = 0 \) height is \( y = 2100 / 2 = 1050 \). It falls after that (the next point is \( f = 900 \) at \( x = 45 \)) and hits zero at \( x = 195 \), which may be hard to show. In general, it is difficult to put a consistent scale on the y-axis because of the extreme differences in the values of \( f \). Putting the y-axis on a logarithmic scale with the distances 1 to 10, 10 to 100, 100 to 1000 and 1000 to 10000 equal would help. This might be a bit hard and messy, however, without some appropriate graph paper.
A Minitab version of the frequency polygon appears on the last page of 251y0112.
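The skewness measures in part h can be recomputed from the class midpoints and frequencies. This sketch uses the central-moment form of \( k_3 \) and the key's version of Pearson's measure, which is based on the mode; the variable names are illustrative.

```python
import math

midpoints = [15, 45, 75, 105, 135, 165]
freqs = [2100, 900, 770, 200, 20, 10]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n

# Sample variance and standard deviation for grouped data
variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / (n - 1)
sd = math.sqrt(variance)

# k3 = n / ((n-1)(n-2)) * sum f (x - mean)^3
third_moment_sum = sum(f * (x - mean) ** 3 for f, x in zip(freqs, midpoints))
k3 = n / ((n - 1) * (n - 2)) * third_moment_sum

# Relative skewness g1 = k3 / s^3
g1 = k3 / sd**3

# Pearson's measure as used in the key: 3(mean - mode) / s,
# with the mode taken as the midpoint of the largest class
mode = midpoints[freqs.index(max(freqs))]
pearson_sk = 3 * (mean - mode) / sd

print(round(k3, 1), round(g1, 5), round(pearson_sk, 3))
```

All three statistics come out positive, consistent with the key's conclusion that the data are skewed to the right.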