251y0112 10/08/01

ECO251 QBA1
FIRST HOUR EXAM
OCTOBER 2, 2001

Name _______key___________
SECTION MWF 10 11 TR 11 12:30

Part I. (10 points)

1. Indicate whether the following are: Nominal Data, Ordinal Data, Interval Data, Continuous Ratio Data or Discrete Ratio Data. (3)
a. Price to earnings ratio of your stock. Ans: Continuous Ratio.
b. Number of customers who said that service was unsatisfactory in a survey. Ans: Discrete Ratio.
c. The Likert Scale rates customer satisfaction with your firm's service on a one to five scale where 1 is exceptional and 5 is unsatisfactory. Ans: Ordinal.
(Note: See text p. 13 for most of this - discrete/continuous was defined in class)

2. (D-68) Which of the following explains the shape of a distribution best? (1)
a. Mean
b. Median
c. Box Plot
*d. Stem-and-leaf plot
e. Mode
(Note: See text pp. 39-44)

3. Make a diagram of a table and show where the field is. (1)

4. The accompanying box plot shows the sale prices of homes (in thousands) in a Pennsylvania town.
[Box plot; values marked: 0 30 60 70 80 110 140]
a. What percent of home prices fall between $60 thousand and $80 thousand - why? (2)
Ans: Since 60 is the first quartile and has 25% below it, and 80 is the third quartile with 25% above it, 50% must be between them.
b. If the mean price is $69 thousand, is the data skewed to the left or right? (1)
Ans: Since, for data that is skewed to the left, mean < median < mode, and the diagram shows that the median is 70 and the mean is lower, it must be skewed to the left.

5. Which of the following is a graph that shows cumulative frequency? (2)
a. Histogram
*b. Ogive
c. Frequency Polygon
d. Pie chart
e. None of the above

Part II. Compute an appropriate answer, showing your work (15+ Points)

a) A distribution of 89 home sale prices has a mean of $67500, a median of $72500 and a standard deviation of $10000. What is the maximum number of homes that have prices that could be below $37500?
(2)
Ans: \( z = \frac{37500 - 67500}{10000} = -3 \). Since 37500 is 3 standard deviations below the mean, according to Chebyshev there could only be \( \frac{1}{k^2} = \frac{1}{3^2} = \frac{1}{9} \) of the homes below $37500 or above $97500. Since \( \frac{89}{9} = 9.89 \), this is less than 10 homes.

b) Assume that the distribution above is symmetrical and unimodal. Give a rough answer to the question in a) and explain your reasoning. (2)
Ans: Since 37500 is 3 standard deviations below the mean, the Empirical rule says that there will be almost none below $37500.

c) The smallest selling price in the distribution above was $25,000 and the largest was $146,000 (Note correction!). If these data are to be presented in five classes, what intervals would you use? Explain your reasoning using an appropriate formula and use it to fill in the table below. (3)
Ans: \( \frac{146000 - 25000}{5} = 24200 \), so use 25000. This is only a suggestion. Any number somewhat above 24200 will work, as long as you cover the range.

Class    From      to
A        25000     49999
B        50000     74999
C        75000     99999
D       100000    124999
E       125000    149999

d) WIM technology weighs and measures trucks driving at highway speeds. Trucks are classified in a report as follows:
A 'WIM gross weight above 70,000 lbs.'
B 'WIM gross weight 70,000 lbs. or less.'
C 'WIM total length above 60 ft.'
D 'WIM total length no more than 60 ft.'
Which of the following classes are mutually exclusive? (Circle) (1.5)
A and C, *C and D, A, B, and C
Which of the following classes are collectively exhaustive? (Circle) (1.5)
A and C, *C and D, *A, B, and C
(Note: This was graded at 0.5 for each item correctly marked or not marked)

e) For the numbers 3, 103, 203, 303 and 403, compute the i) Root-mean-square, ii) Harmonic mean, iii) Geometric mean (2.5 each)
Solution: Note that \( \sum x = 1015 \). This is not used in any of the following calculations and there is no reason why you should have computed it!

(i) The Root-Mean-Square.
\( x_{rms}^2 = \frac{1}{n}\sum x^2 = \frac{1}{5}(3^2 + 103^2 + 203^2 + 303^2 + 403^2) = \frac{1}{5}(9 + 10609 + 41209 + 91809 + 162409) = \frac{1}{5}(306045) = 61209 \).
So \( x_{rms} = \sqrt{61209} = 247.404 \).

(ii) The Harmonic Mean.
\( \frac{1}{x_h} = \frac{1}{n}\sum \frac{1}{x} = \frac{1}{5}\left(\frac{1}{3} + \frac{1}{103} + \frac{1}{203} + \frac{1}{303} + \frac{1}{403}\right) = \frac{1}{5}(0.333333333 + 0.009708738 + 0.004926108 + 0.003300330 + 0.002481390) = \frac{1}{5}(0.353749899) = 0.070749979 \).
So \( x_h = \frac{1}{\frac{1}{n}\sum \frac{1}{x}} = \frac{1}{0.070749979} = 14.13427965 \).

(iii) The Geometric Mean.
\( x_g = (x_1 x_2 x_3 \cdots x_n)^{1/n} = \sqrt[5]{(3)(103)(203)(303)(403)} = \sqrt[5]{7659531243} = 7659531243^{1/5} = 7659531243^{0.2} = 94.8070 \).
Or \( \ln x_g = \frac{1}{n}\sum \ln(x) = \frac{1}{5}(\ln 3 + \ln 103 + \ln 203 + \ln 303 + \ln 403) = \frac{1}{5}(1.0986 + 4.6347 + 5.3132 + 5.7137 + 5.9989) = \frac{1}{5}(22.7592) = 4.55184 \). So \( x_g = e^{4.55184} = 94.8070 \).
Or \( \log x_g = \frac{1}{n}\sum \log(x) = \frac{1}{5}(\log 3 + \log 103 + \log 203 + \log 303 + \log 403) = \frac{1}{5}(0.47712 + 2.01284 + 2.30750 + 2.48144 + 2.60531) = \frac{1}{5}(9.88420) = 1.97684 \). So \( x_g = 10^{1.97684} = 94.8070 \).
Notice that the original numbers and all the means are between 3 and 403.

Part III. Do the following problems (25 Points)

1. I have the following data for sales clerk work hours at a sample of 8 stores.
310 254 180 170 116 100 96 320
Compute the following:
a) The Median (1)
b) The Standard Deviation (4)
c) The 3rd Decile (2)

Index      x      \(x^2\)    \(x - \bar{x}\)    \((x - \bar{x})^2\)
1         96       9216        -97.25              9457.6
2        100      10000        -93.25              8695.6
3        116      13456        -77.25              5967.6
4        170      28900        -23.25               540.6
5        180      32400        -13.25               175.6
6        254      64516         60.75              3690.6
7        310      96100        116.75             13630.6
8        320     102400        126.75             16065.6
Total   1546     356988          0.00             58223.5

Note that, to be reasonable, the mean, median and 3rd decile must fall between 96 and 320.

Solution: Note that x is in order. \( n = 8 \), \( \sum x = 1546 \), \( \sum x^2 = 356988 \), \( \sum (x - \bar{x}) = 0.00 \), \( \sum (x - \bar{x})^2 = 58223.5 \).

a) Just put the numbers in order and average the middle numbers: \( x_{.5} = \frac{x_4 + x_5}{2} = \frac{170 + 180}{2} = 175 \).
Or formally: position \( = p(n+1) = a.b = .5(9) = 4.5 \). \( x_{1-p} = x_a + .b(x_{a+1} - x_a) \), so \( x_{1-.5} = x_{.5} = x_4 + 0.5(x_5 - x_4) = 170 + 0.5(180 - 170) = 175 \).

b) \( \bar{x} = \frac{\sum x}{n} = \frac{1546}{8} = 193.25 \).
\( s^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1} = \frac{356988 - 8(193.25)^2}{7} = 8317.64 \) or \( s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{58223.5}{7} = 8317.64 \).
\( s = \sqrt{8317.64} = 91.2011 \).

c) The 3rd decile has 30% below it. Position \( = p(n+1) = a.b = 0.3(9) = 2.7 \), so \( a = 2 \), \( .b = 0.7 \).
\( x_{1-p} = x_a + .b(x_{a+1} - x_a) \), so \( x_{1-.3} = x_{.7} = x_2 + 0.7(x_3 - x_2) = 100 + 0.7(116 - 100) = 111.2 \).
(New Formula: position \( = 1 + p(n-1) = a.b = 1 + 0.3(7) = 1 + 2.1 = 3.1 \), so \( a = 3 \), \( .b = 0.1 \).
\( x_{1-p} = x_a + .b(x_{a+1} - x_a) \), so \( x_{1-.3} = x_{.7} = x_3 + 0.1(x_4 - x_3) = 116 + 0.1(170 - 116) = 121.4 \).)

2. A bank is investigating the amount of time customers are put on hold when they call. The times are tabulated below. (Assume that the numbers are a sample.)
a. Calculate the Cumulative Frequency (1)
b. Calculate the Mean (1)
c. Calculate the Median (2)
d. Calculate the Mode (1)
e. Calculate the Variance (3)
f. Calculate the Standard Deviation (2)
g. Calculate the Interquartile Range (3)
h. Calculate a Statistic showing Skewness and Interpret it (3)
i. Make a frequency polygon of the Data (Neatness Counts!) (2)

amount                   frequency
less than 30 seconds       2200
30 - 59.99 seconds          800
60 - 89.99 seconds          770
90 - 119.99 seconds         200
120 - 149.99 seconds         20
150 - 179.99 seconds         10

(Note - It may make things easier to move the decimal point to the left in the midpoint column before you start calculating - but be careful of the median etc. if you do it. For a printout doing things this way, see 251z0112)

Solution: x is the midpoint of the class. Our convention is to use the midpoint of 0 to 2, not of 0 to 1.99999, so class A (0-29.99) has midpoint 15.

class            f       F      x       fx
A   0- 29.99    2200    2200    15    33000
B  30- 59.99     800    3000    45    36000
C  60- 89.99     770    3770    75    57750
D  90-119.99     200    3970   105    21000
E 120-149.99      20    3990   135     2700
F 150-179.99      10    4000   165     1650
Total           4000                 152100

class   \(fx^2\)    \(fx^3\)     \(x - \bar{x}\)   \(f(x - \bar{x})\)   \(f(x - \bar{x})^2\)   \(f(x - \bar{x})^3\)
A        495000      7425000      -23.025           -50655.0              1166331               -26854784
B       1620000     72900000        6.975             5580.0                38920                  271470
C       4331250    324843744       36.975            28470.7              1052706                38923800
D       2205000    231524992       66.975            13395.0               897130                60085288
E        364500     49207500       96.975             1939.5               188083                18239350
F        272250     44921248      126.975             1269.7               161227                20471734
Total   9288000    730822528                             0.0              3504398               111136864

\( n = \sum f = 4000 \), \( \sum fx = 152100 \), \( \sum fx^2 = 9288000 \), \( \sum fx^3 = 730822528 \), \( \sum f(x - \bar{x}) = 0 \), \( \sum f(x - \bar{x})^2 = 3504398 \), and \( \sum f(x - \bar{x})^3 = 111136864 \).
Note that, to be reasonable, the mean, median and quartiles must fall between 0 and 180.
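The column totals in the table can be checked with a short script. The following is a minimal sketch in Python and is not part of the original key; all variable names are illustrative assumptions. Exact arithmetic gives slightly different last digits in the cubic columns than the printed table, which presumably reflects rounding in the printout.

```python
# Class midpoints x and frequencies f, taken from the table above.
midpoints = [15, 45, 75, 105, 135, 165]
freqs = [2200, 800, 770, 200, 20, 10]

# Cumulative frequency F: running total of f.
cum_freqs = []
running = 0
for f in freqs:
    running += f
    cum_freqs.append(running)

# Raw sums used by the computational formulas.
n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
sum_fx2 = sum(f * x**2 for f, x in zip(freqs, midpoints))
sum_fx3 = sum(f * x**3 for f, x in zip(freqs, midpoints))

mean = sum_fx / n

# Sums of weighted deviations from the mean, used by the definitional formulas.
sum_dev = sum(f * (x - mean) for f, x in zip(freqs, midpoints))
sum_dev2 = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints))
sum_dev3 = sum(f * (x - mean) ** 3 for f, x in zip(freqs, midpoints))

print("F column:", cum_freqs)
print("n, sum fx, sum fx^2, sum fx^3:", n, sum_fx, sum_fx2, sum_fx3)
print("mean:", mean)
print("deviation sums:", round(sum_dev, 4), round(sum_dev2, 1), round(sum_dev3, 1))
```

The deviation sum of order one should come out at zero up to floating-point error, which is a quick check that the mean was computed from the same columns.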
(If you moved your decimal point one place to the left before you started, your \( x \) column is now in tens, \( fx \) is in tens, \( fx^2 \) is in hundreds, \( fx^3 \) is in thousands, \( x - \bar{x} \) is in tens, \( f(x - \bar{x}) \) is in tens, \( f(x - \bar{x})^2 \) is in hundreds and \( f(x - \bar{x})^3 \) is in thousands.)

a. Calculate the Cumulative Frequency (1): (See above) The cumulative frequency is the whole F column.

b. Calculate the Mean (1): \( \bar{x} = \frac{\sum fx}{n} = \frac{152100}{4000} = 38.025 \)

c. Calculate the Median (2): position \( = p(n+1) = .5(4001) = 2000.5 \). This is above \( F = 0 \) and below \( F = 2200 \), so the interval is A, 0-29.99.
\( x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right] w \), so \( x_{1-.5} = x_{.5} = 0 + \left[\frac{.5(4000) - 0}{2200}\right] 30 = 27.2727 \)

d. Calculate the Mode (1): The mode is the midpoint of the largest group. Since 2200 is the largest frequency, the modal group is 0 to 29.99 and the mode is 15.000.

e. Calculate the Variance (3): \( s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n - 1} = \frac{9288000 - 4000(38.025)^2}{3999} = 876.318 \) or \( s^2 = \frac{\sum f(x - \bar{x})^2}{n - 1} = \frac{3504398}{3999} = 876.318 \)

f. Calculate the Standard Deviation (2): \( s = \sqrt{876.318} = 29.6027 \)

g. Calculate the Interquartile Range (3):
First Quartile: position \( = p(n+1) = .25(4001) = 1000.25 \). This is above \( F = 0 \) and below \( F = 2200 \), so the interval is A, 0-29.99. \( x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right] w \) gives us \( Q_1 = x_{1-.25} = x_{.75} = 0 + \left[\frac{.25(4000) - 0}{2200}\right] 30 = 13.636 \).
Third Quartile: position \( = p(n+1) = .75(4001) = 3000.75 \). This is above \( F = 3000 \) and below \( F = 3770 \), so the interval is C, 60-89.99. \( Q_3 = x_{1-.75} = x_{.25} = 60 + \left[\frac{.75(4000) - 3000}{770}\right] 30 = 60.000 \).
\( IQR = Q_3 - Q_1 = 60.000 - 13.636 = 46.364 \).
(New Formula: For the median - position \( = 1 + p(n-1) = 1 + 0.5(3999) = 2000.5 \). This is the same result as on the previous page. For the first quartile - position \( = 1 + p(n-1) = 1 + 0.25(3999) = 1000.75 \). This leads to interval A and the same result as above. For the third quartile - position \( = 1 + p(n-1) = 1 + 0.75(3999) = 3000.25 \). This leads to interval C and the same result as above.)

h. Calculate a Statistic showing Skewness and interpret it (3):
\( k_3 = \frac{n}{(n-1)(n-2)}\left[\sum fx^3 - 3\bar{x}\sum fx^2 + 2n\bar{x}^3\right] = \frac{4000}{(3999)(3998)}\left[730822528 - 3(38.025)(9288000) + 2(4000)(38.025)^3\right] = 0.000250188(111136864) = 27805.1 \).
or \( k_3 = \frac{n}{(n-1)(n-2)}\sum f(x - \bar{x})^3 = \frac{4000}{(3999)(3998)}(111136864) = 27805.1 \)
or \( g_1 = \frac{k_3}{s^3} = \frac{27805.1}{(29.6027)^3} = 1.07184 \)
or Pearson's Measure of Skewness \( SK = \frac{3(\text{mean} - \text{mode})}{\text{std. deviation}} = \frac{3(38.025 - 15.0)}{29.6027} = 2.333 \)
Because of the positive sign, the measures imply skewness to the right.

i. Make a frequency polygon of the Data (Neatness Counts!) (2)
A frequency polygon is a line graph of the frequency. It should hit zero on the left at \( x = -15 \), but this point will not show if the x-axis starts at zero. The next point is 2200 at \( x = 15 \), so the height at \( x = 0 \) is \( y = 2200/2 = 1100 \). It falls after that (the next point is \( f = 800 \) at \( x = 45 \)) and hits zero at \( x = 195 \), which may be hard to show. In general, it is difficult to put a consistent scale on the y-axis because of the extreme differences in the values of \( f \). Putting the y-axis on a logarithmic scale, with the distances 1 to 10, 10 to 100, 100 to 1000 and 1000 to 10000 equal, would help. This might be a bit hard and messy, however, without some appropriate graph paper. A copy of the frequency polygon as done by Minitab appears on the next page, but I would prefer to see the x-axis and the y-axis start at zero and the x-points marked as 15, 45, 75, etc.

[Figure: frequency polygon from Minitab; y-axis f with ticks at 0, 1000, 2000; x-axis x with ticks at 0, 100, 200]
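The grouped-data answers for problem 2 can be reproduced with a short script. This is a minimal sketch in Python under the same conventions as the solution (class midpoints, class chosen by position p(n+1), interpolation with pN); it is not part of the original key, and the function and variable names are hypothetical.

```python
import math

# Class lower limits, common class width and frequencies from problem 2.
lower_limits = [0, 30, 60, 90, 120, 150]
width = 30
freqs = [2200, 800, 770, 200, 20, 10]
midpoints = [lo + width / 2 for lo in lower_limits]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / (n - 1)
std_dev = math.sqrt(variance)

# Mode: midpoint of the class with the largest frequency.
mode = midpoints[freqs.index(max(freqs))]


def grouped_quantile(p):
    """Value with a fraction p below it: L_p + ((p*N - F) / f_p) * w."""
    position = p * (n + 1)   # used only to pick the class
    cum_below = 0            # F: cumulative frequency below the class
    for lo, f in zip(lower_limits, freqs):
        if position <= cum_below + f:
            return lo + (p * n - cum_below) / f * width
        cum_below += f
    raise ValueError("p must be between 0 and 1")


median = grouped_quantile(0.5)
q1 = grouped_quantile(0.25)
q3 = grouped_quantile(0.75)
iqr = q3 - q1

# Skewness measures: k3, relative skewness g1, and Pearson's SK.
sum_dev3 = sum(f * (x - mean) ** 3 for f, x in zip(freqs, midpoints))
k3 = n / ((n - 1) * (n - 2)) * sum_dev3
g1 = k3 / std_dev ** 3
pearson_sk = 3 * (mean - mode) / std_dev

# Frequency polygon vertices, closed at zero one class width beyond each end.
polygon = [(midpoints[0] - width, 0), *zip(midpoints, freqs),
           (midpoints[-1] + width, 0)]

print(round(median, 4), round(q1, 3), round(q3, 3), round(iqr, 3))
print(round(variance, 3), round(std_dev, 4))
print(round(k3, 1), round(g1, 5), round(pearson_sk, 3))
print(polygon)
```

The polygon list starts at x = -15 and ends at x = 195, which are the two zero points discussed in part i.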