251y0111 10/08/01
ECO251 QBA1 FIRST HOUR EXAM, OCTOBER 2, 2001
Name ____key______________
SECTION MWF 10 11 TR 11 12:30

Part I. (10 points)

1. Indicate whether each of the following is Nominal Data, Ordinal Data, Interval Data, Continuous Ratio Data or Discrete Ratio Data. (3)
a. The number of students taking ECO 251. Ans: Discrete Ratio.
b. Your firm's profits as a percent of sales. Ans: Continuous Ratio.
c. A Likert scale that rates customer satisfaction with your firm's service on a one-to-five scale, where 1 is exceptional and 5 is unsatisfactory. Ans: Ordinal.
(Note: See text p. 13 for most of this; discrete/continuous was defined in class.)

2. (D-68) Which of the following best explains the shape of a distribution? (1)
a. Mean   b. Median   c. Box plot   *d. Stem-and-leaf plot   e. Mode
(Note: See text pp. 39-44.)

3. Make a diagram of a table and show where the stub is. (1)

4. The accompanying box plot shows the sale prices of homes (in thousands of dollars) in a Pennsylvania town.
[Figure: box plot of home sale prices; the values 0, 30, 60, 70, 80, 110 and 140 are marked on its axis]
a. What percent of home prices fall between $60 thousand and $80 thousand? Why? (2)
Ans: 60 is the first quartile, with 25% of prices below it, and 80 is the third quartile, with 25% above it, so 50% must lie between them.
b. If the mean price is $71 thousand, is the data skewed to the left or to the right? (1)
Ans: For data skewed to the right, mean > median > mode. The diagram shows a median of 70 and the mean (71) is higher, so the data must be skewed to the right.

5. Which of the following is a graph that consists of bars, each of which represents the frequency f? (2)
*a. Histogram   b. Ogive   c. Frequency polygon   d. Pie chart   e. None of the above

Part II. Compute an appropriate answer, showing your work. (15+ points)

a) A distribution of 89 home sale prices has a mean of $67,500, a median of $72,500 and a standard deviation of $10,000. What is the maximum number of homes that could have prices above $97,500?
(2)
Ans: \(z = \frac{x - \bar{x}}{s} = \frac{97500 - 67500}{10000} = 3\), so $97,500 is 3 standard deviations above the mean. According to Chebyshev's inequality, at most \(\frac{1}{k^2} = \frac{1}{3^2} = \frac{1}{9}\) of the data can lie 3 or more standard deviations from the mean. Since \(\frac{89}{9} \approx 9.9\), fewer than 10 homes (at most 9) could have prices above $97,500.

b) Assume that the distribution above is symmetrical and unimodal. Give a rough answer to the question in a) and explain your reasoning. (2)
Ans: Since $97,500 is 3 standard deviations above the mean, the Empirical Rule says that about 99.7% of the data lie within 3 standard deviations of the mean. That leaves only about 0.15% above $97,500 (about 0.13 of a home out of 89), so there will be almost none.

c) The smallest selling price in the distribution above was $25,000 and the largest was $150,000. If these data are to be presented in six classes, what intervals would you use? Explain your reasoning using an appropriate formula and use it to fill in the table below. (3)
Ans: Class width \(\approx \frac{\text{largest} - \text{smallest}}{\text{number of classes}} = \frac{150000 - 25000}{6} = 20833\), so use 22000. This is only a suggestion: any number somewhat above 20833 will work, as long as the classes cover the range.

Class    A        B        C        D         E         F
From     24000    46000    68000    90000     112000    134000
To       45999    67999    89999    111999    133999    155999

d) WIM technology weighs and measures trucks driving at highway speeds. Trucks are classified in a report as follows:
A: 'WIM gross weight above 70,000 lbs.'
B: 'WIM gross weight 70,000 lbs. or less'
C: 'WIM total length above 60 ft.'
D: 'WIM total length no more than 60 ft.'
Which of the following classes are mutually exclusive? (Circle) (1.5)
*A and B     B and C     A, B and C
Which of the following classes are collectively exhaustive? (Circle) (1.5)
*A and B     B and C     *A, B and C
(Note: This was graded at 0.5 for each item correctly marked or not marked.)

e) For the numbers 1, 101, 201, 301 and 401, compute the (i) root-mean-square, (ii) harmonic mean and (iii) geometric mean. (2 each)
Solution: Note that \(\sum x = 1005\). This is not used in any of the following calculations, and there is no reason why you should have computed it!

(i) The Root-Mean-Square.
\[ x_{rms}^2 = \frac{1}{n}\sum x^2 = \frac{1}{5}\left(1^2 + 101^2 + 201^2 + 301^2 + 401^2\right) = \frac{1}{5}(1 + 10201 + 40401 + 90601 + 160801) = \frac{1}{5}(302005) = 60401 \]
So \(x_{rms} = \sqrt{60401} = 245.766\).

(ii) The Harmonic Mean.
\[ \frac{1}{x_h} = \frac{1}{n}\sum \frac{1}{x} = \frac{1}{5}\left(\frac{1}{1} + \frac{1}{101} + \frac{1}{201} + \frac{1}{301} + \frac{1}{401}\right) = \frac{1}{5}(1.000000000 + 0.009900990 + 0.004975124 + 0.003322259 + 0.002493766) = \frac{1}{5}(1.020692139) = 0.204138428 \]
So \(x_h = \frac{1}{0.204138428} = 4.8986\).

(iii) The Geometric Mean.
\[ x_g = \sqrt[n]{x_1 x_2 x_3 \cdots x_n} = \sqrt[5]{(1)(101)(201)(301)(401)} = \sqrt[5]{2450351001} = (2450351001)^{0.2} = 75.4824 \]
Or
\[ \ln x_g = \frac{1}{n}\sum \ln x = \frac{1}{5}(\ln 1 + \ln 101 + \ln 201 + \ln 301 + \ln 401) = \frac{1}{5}(0 + 4.6151 + 5.3033 + 5.7071 + 5.9939) = \frac{1}{5}(21.6194) = 4.32388 \]
So \(x_g = e^{4.32388} = 75.4824\).
Or
\[ \log x_g = \frac{1}{n}\sum \log x = \frac{1}{5}(\log 1 + \log 101 + \log 201 + \log 301 + \log 401) = \frac{1}{5}(0 + 2.00432 + 2.30320 + 2.47857 + 2.60314) = \frac{1}{5}(9.38923) = 1.87785 \]
So \(x_g = 10^{1.87785} = 75.4824\).
Notice that the original numbers and all the means lie between 1 and 401.

Part III. Do the following problems. (25 points)

1. I have the following data for sales clerk work hours at a sample of 8 stores:
300   254   190   170   116   100   96   320
Compute the following: a) The median (1)   b) The standard deviation (4)   c) The 3rd decile (2)

Index     x        x²       x - x̄     (x - x̄)²
  1       96      9216     -97.25      9457.6
  2      100     10000     -93.25      8695.6
  3      116     13456     -77.25      5967.6
  4      170     28900     -23.25       540.6
  5      190     36100      -3.25        10.6
  6      254     64516      60.75      3690.6
  7      300     90000     106.75     11395.6
  8      320    102400     126.75     16065.6
Sum     1546    354588       0.00     55823.5

Note that, to be reasonable, the mean, median and 3rd decile must fall between 96 and 320.

Solution: Note that x is in order. \(n = 8\), \(\sum x = 1546\), \(\sum x^2 = 354588\), \(\sum (x - \bar{x}) = 0.00\), \(\sum (x - \bar{x})^2 = 55823.5\).
a) Just put the numbers in order and average the two middle numbers: \(x_{.5} = \frac{x_4 + x_5}{2} = \frac{170 + 190}{2} = 180\).
Or formally: position \(= p(n + 1) = .5(9) = 4.5 = a.b\), so \(a = 4\) and \(.b = .5\). Since \(x_{1-p} = x_a + .b(x_{a+1} - x_a)\), \(x_{1-.5} = x_{.5} = x_4 + 0.5(x_5 - x_4) = 170 + 0.5(190 - 170) = 180\).
b) \(\bar{x} = \frac{\sum x}{n} = \frac{1546}{8} = 193.25\).
\[ s^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1} = \frac{354588 - 8(193.25)^2}{7} = 7974.786 \quad \text{or} \quad s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{55823.5}{7} = 7974.786 \]
So \(s = \sqrt{7974.786} = 89.3017\).
c) The 3rd decile has 30% below it. Position \(= p(n + 1) = 0.3(9) = 2.7\), so \(a = 2\) and \(.b = 0.7\). Then \(x_{1-.3} = x_{.7} = x_2 + 0.7(x_3 - x_2) = 100 + 0.7(116 - 100) = 111.2\).
(New formula: position \(= 1 + p(n - 1) = 1 + 0.3(7) = 1 + 2.1 = 3.1\), so \(a = 3\) and \(.b = 0.1\). Then \(x_{1-.3} = x_{.7} = x_3 + 0.1(x_4 - x_3) = 116 + 0.1(170 - 116) = 121.4\).)
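The hand calculations in Part II e) and Part III 1 can be re-checked with a short Python sketch. It is not part of the original key and the function names are illustrative. `fractile` implements the interpolation \(x_{1-p} = x_a + .b(x_{a+1} - x_a)\) with either position rule used above (p(n + 1), or the new formula 1 + p(n - 1)), and the geometric mean is computed through logarithms, as in the ln method.

```python
import math


def rms(xs):
    """Root-mean-square: square root of the mean of the squared values."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))


def harmonic_mean(xs):
    """Reciprocal of the mean of the reciprocals (all values must be positive)."""
    return len(xs) / sum(1 / x for x in xs)


def geometric_mean(xs):
    """n-th root of the product, computed as exp(mean of ln x)."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def sample_sd(xs):
    """Sample standard deviation, dividing by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


def fractile(xs, p, new_formula=False):
    """Value with a fraction p of the data below it: x_a + .b(x_{a+1} - x_a).

    position = p(n + 1) by default, or 1 + p(n - 1) when new_formula is True.
    """
    data = sorted(xs)
    n = len(data)
    pos = 1 + p * (n - 1) if new_formula else p * (n + 1)
    pos = round(pos, 9)          # strip tiny floating-point error before splitting
    a = math.floor(pos)          # whole part a: 1-based index of x_a
    b = pos - a                  # fractional part .b
    if a < 1:                    # position below the smallest value
        return data[0]
    if a >= n:                   # position at or above the largest value
        return data[-1]
    return data[a - 1] + b * (data[a] - data[a - 1])


values = [1, 101, 201, 301, 401]
hours = [300, 254, 190, 170, 116, 100, 96, 320]
print(rms(values), harmonic_mean(values), geometric_mean(values))
print(fractile(hours, 0.5), sample_sd(hours))
print(fractile(hours, 0.3), fractile(hours, 0.3, new_formula=True))
```

The standard library gives the same two 3rd-decile values: element [2] of `statistics.quantiles(hours, n=10)` is 111.2 with the default `method='exclusive'` (the p(n + 1) rule) and 121.4 with `method='inclusive'` (the 1 + p(n - 1) rule).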
2. A bank is investigating the amount of time customers are put on hold when they call. The times are tabulated below. (Assume that the numbers are a sample.)
a. Calculate the cumulative frequency. (1)
b. Calculate the mean. (1)
c. Calculate the median. (2)
d. Calculate the mode. (1)
e. Calculate the variance. (3)
f. Calculate the standard deviation. (2)
g. Calculate the interquartile range. (3)
h. Calculate a statistic showing skewness and interpret it. (3)
i. Make a frequency polygon of the data. (Neatness counts!) (2)

Amount                    Frequency
less than 30 seconds        2100
30 - 59.99 seconds           900
60 - 89.99 seconds           770
90 - 119.99 seconds          200
120 - 149.99 seconds          20
150 - 179.99 seconds          10

(Note: It may make things easier to move the decimal point one place to the left in the midpoint column before you start calculating, but be careful with the median etc. if you do. For a printout doing things this way, see 251z0111.)

Solution: x is the midpoint of the class. Our convention is to use the midpoint of 0 to 30, not of 0 to 29.99999.

Class   Interval        f      F      x      fx        fx²         fx³
A       0 - 29.99     2100   2100    15    31500     472500      7087500
B       30 - 59.99     900   3000    45    40500    1822500     82012500
C       60 - 89.99     770   3770    75    57750    4331250    324843750
D       90 - 119.99    200   3970   105    21000    2205000    231525000
E       120 - 149.99    20   3990   135     2700     364500     49207500
F       150 - 179.99    10   4000   165     1650     272250     44921250
Total                 4000                155100    9468000    739597500

Class    x - x̄      f(x - x̄)    f(x - x̄)²    f(x - x̄)³
A       -23.775    -49927.5     1187026     -28221551
B         6.225      5602.5       34876        217100
C        36.225     27893.2     1010433      36602935
D        66.225     13245.0      877150      58089267
E        96.225      1924.5      185185      17819428
F       126.225      1262.3      159328      20111114
Total                   0.0     3453997.5    104618294

\(n = \sum f = 4000\), \(\sum fx = 155100\), \(\sum fx^2 = 9468000\), \(\sum fx^3 = 739597500\), \(\sum f(x - \bar{x}) = 0\), \(\sum f(x - \bar{x})^2 = 3453997.5\) and \(\sum f(x - \bar{x})^3 = 104618294\).
Note that, to be reasonable, the mean, median and quartiles must fall between 0 and 180.
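A minimal Python sketch of how the columns of the table above can be generated. It assumes the open class "less than 30 seconds" starts at 0 and that each class runs to the next lower limit, which is the midpoint convention stated in the solution; the names `CLASSES`, `FREQS` and `grouped_columns` are hypothetical.

```python
from itertools import accumulate

# Hold-time classes as (lower, upper) limits in seconds, and their frequencies.
# Assumption: the open class "less than 30 seconds" starts at 0, and each class
# runs to the next lower limit (0 to 30, not 0 to 29.99), as in the key.
CLASSES = [(0, 30), (30, 60), (60, 90), (90, 120), (120, 150), (150, 180)]
FREQS = [2100, 900, 770, 200, 20, 10]


def grouped_columns(classes, freqs):
    """Return class midpoints, cumulative frequencies F and the column totals."""
    mids = [(lo + hi) / 2 for lo, hi in classes]
    cum = list(accumulate(freqs))                      # the F column
    n = cum[-1]
    mean = sum(f * x for f, x in zip(freqs, mids)) / n
    totals = {
        "n": n,
        "fx": sum(f * x for f, x in zip(freqs, mids)),
        "fx2": sum(f * x**2 for f, x in zip(freqs, mids)),
        "fx3": sum(f * x**3 for f, x in zip(freqs, mids)),
        "f_dev": sum(f * (x - mean) for f, x in zip(freqs, mids)),
        "f_dev2": sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)),
        "f_dev3": sum(f * (x - mean) ** 3 for f, x in zip(freqs, mids)),
    }
    return mids, cum, totals


mids, cum, totals = grouped_columns(CLASSES, FREQS)
print("midpoints:", mids)
print("F:", cum)
for name, value in totals.items():
    print(f"{name:>7}: {value:,.1f}")
```

Because the midpoints 15, 45, ..., 165 are whole numbers, the \(\sum fx\), \(\sum fx^2\) and \(\sum fx^3\) totals come out exact; the deviation totals carry ordinary floating-point rounding.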
(If you moved your decimal point one place to the left before you started, your x column is now in tens, fx is in tens, fx² is in hundreds, fx³ is in thousands, x - x̄ is in tens, f(x - x̄) is in tens, f(x - x̄)² is in hundreds and f(x - x̄)³ is in thousands.)

a. Calculate the cumulative frequency (1): See above. The cumulative frequency is the whole F column.

b. Calculate the mean (1): \(\bar{x} = \frac{\sum fx}{n} = \frac{155100}{4000} = 38.775\).

c. Calculate the median (2): Position \(= p(n + 1) = .5(4001) = 2000.5\). This is above F = 0 and below F = 2100, so the interval is A, 0 - 29.99.
\[ x_{1-p} = L + \left(\frac{pN - F}{f_p}\right)w, \quad \text{so} \quad x_{1-.5} = x_{.5} = 0 + \left(\frac{.5(4000) - 0}{2100}\right)30 = 28.5714 \]

d. Calculate the mode (1): The mode is the midpoint of the largest group. Since 2100 is the largest frequency, the modal group is 0 to 29.99 and the mode is 15.000.

e. Calculate the variance (3):
\[ s^2 = \frac{\sum f(x - \bar{x})^2}{n - 1} = \frac{3453997.5}{3999} = 863.715 \quad \text{or} \quad s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n - 1} = \frac{9468000 - 4000(38.775)^2}{3999} = 863.715 \]

f. Calculate the standard deviation (2): \(s = \sqrt{863.715} = 29.3890\).

g. Calculate the interquartile range (3):
First quartile: Position \(= p(n + 1) = .25(4001) = 1000.25\). This is above F = 0 and below F = 2100, so the interval is A, 0 - 29.99. \(x_{1-p} = L + \left(\frac{pN - F}{f_p}\right)w\) gives \(Q_1 = x_{1-.25} = x_{.75} = 0 + \left(\frac{.25(4000) - 0}{2100}\right)30 = 14.286\).
Third quartile: Position \(= .75(4001) = 3000.75\). This is above F = 3000 and below F = 3770, so the interval is C, 60 - 89.99. \(Q_3 = x_{1-.75} = x_{.25} = 60 + \left(\frac{.75(4000) - 3000}{770}\right)30 = 60.000\).
\(IQR = Q_3 - Q_1 = 60.000 - 14.286 = 45.714\).
(New formula: For the median, position \(= 1 + p(n - 1) = 1 + 0.5(3999) = 2000.5\), which gives the same result as above. For the first quartile, position \(= 1 + 0.25(3999) = 1000.75\), which leads to interval A and the same result as above. For the third quartile, position \(= 1 + 0.75(3999) = 3000.25\), which leads to interval C and the same result as above.)

h. Calculate a statistic showing skewness and interpret it (3):
\[ k_3 = \frac{n}{(n-1)(n-2)}\left(\sum fx^3 - 3\bar{x}\sum fx^2 + 2n\bar{x}^3\right) = \frac{4000}{(3999)(3998)}\left(739597500 - 3(38.775)(9468000) + 2(4000)(38.775)^3\right) = 0.000250188(104618294) = 26174.2 \]
or
\[ k_3 = \frac{n}{(n-1)(n-2)}\sum f(x - \bar{x})^3 = \frac{4000}{(3999)(3998)}(104618294) = 26174.2 \]
\[ g_1 = \frac{k_3}{s^3} = \frac{26174.2}{29.3890^3} = 1.03114 \]
Or Pearson's measure of skewness:
\[ SK = \frac{3(\text{mean} - \text{median})}{\text{std. deviation}} = \frac{3(38.775 - 28.5714)}{29.3890} = 1.042 \]
Because of the positive sign, the measures imply skewness to the right.

i. Make a frequency polygon of the data (Neatness counts!) (2): A frequency polygon is a line graph of the frequency. It should hit zero on the left at x = -15, but this point will not show if the x-axis starts at zero. The next point is f = 2100 at x = 15, so the height at x = 0 is y = 2100/2 = 1050. The line falls after that (the next point is f = 900 at x = 45) and hits zero at x = 195, which may be hard to show. In general, it is difficult to put a consistent scale on the y-axis because of the extreme differences in the values of f. Putting the y-axis on a logarithmic scale, with the distances 1 to 10, 10 to 100, 100 to 1000 and 1000 to 10000 equal, would help. This might be hard and messy, however, without appropriate graph paper. A Minitab version of the frequency polygon appears on the last page of 251y0112.
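The grouped-data answers for b. through h., and the polygon vertices described in i., can be reproduced with the Python sketch below. Assumptions, all illustrative rather than taken from the key: the open first class is treated as 0 to 30; the class holding a fractile is located from pN (the key locates it from p(n + 1), which selects the same classes for these data); and Pearson's measure is taken as 3(mean - median)/s. The names `grouped_fractile`, `grouped_summary` and `polygon_points` are hypothetical.

```python
import math

# Bank hold-time data: class lower limits (seconds), common width, frequencies.
# Assumption: the open class "less than 30 seconds" is treated as 0 to 30.
LOWERS = [0, 30, 60, 90, 120, 150]
WIDTH = 30
FREQS = [2100, 900, 770, 200, 20, 10]


def grouped_fractile(p, lowers, width, freqs):
    """Fractile with a fraction p of the data below it: L + ((pN - F) / f) * w.

    The class is the first one whose cumulative frequency exceeds pN; F is the
    cumulative frequency of all earlier classes.
    """
    target = p * sum(freqs)
    below = 0
    for lower, f in zip(lowers, freqs):
        if below + f > target:
            return lower + (target - below) / f * width
        below += f
    return lowers[-1] + width  # p = 1: upper limit of the last class


def grouped_summary(lowers, width, freqs):
    """Descriptive statistics for grouped data, using class midpoints for x."""
    mids = [lo + width / 2 for lo in lowers]
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, mids)) / n
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, mids))
    variance = (sum_fx2 - n * mean**2) / (n - 1)      # sample variance
    s = math.sqrt(variance)
    median = grouped_fractile(0.5, lowers, width, freqs)
    q1 = grouped_fractile(0.25, lowers, width, freqs)
    q3 = grouped_fractile(0.75, lowers, width, freqs)
    mode = mids[freqs.index(max(freqs))]               # midpoint of the modal class
    sum_fdev3 = sum(f * (x - mean) ** 3 for f, x in zip(freqs, mids))
    k3 = n / ((n - 1) * (n - 2)) * sum_fdev3
    return {
        "mean": mean, "median": median, "mode": mode,
        "variance": variance, "s": s, "q1": q1, "q3": q3, "iqr": q3 - q1,
        "k3": k3, "g1": k3 / s**3,
        "pearson_sk": 3 * (mean - median) / s,         # 3(mean - median) / s
    }


def polygon_points(lowers, width, freqs):
    """Frequency-polygon vertices: (midpoint, f) for each class, plus a
    zero-frequency point one class width beyond each end."""
    mids = [lo + width / 2 for lo in lowers]
    return [(mids[0] - width, 0)] + list(zip(mids, freqs)) + [(mids[-1] + width, 0)]


if __name__ == "__main__":
    for name, value in grouped_summary(LOWERS, WIDTH, FREQS).items():
        print(f"{name:>10}: {value:.4f}")
    print(polygon_points(LOWERS, WIDTH, FREQS))
```

Plotting the returned points with a logarithmic y-axis follows the scaling suggestion in i., although the two zero-frequency end points cannot be drawn on a log scale.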