251y0451 10/20/04

ECO251 QBA1
FIRST HOUR EXAM
October 6, 2004

Name: _____KEY______________
Student Number: _____________________
Class Hour: _____________________

Remember – Neatness, or at least legibility, counts. In most non-multiple-choice questions an answer needs a calculation or short explanation to count.

Part I. (7 points) Use the eleven numbers that you used in the second problem in the take-home exam. (If you don’t have them, take your student number plus the numbers 3, 6, 9, 9, 21. Example: Seymour Butz’s student number is 876509, so he gets 8, 7, 6, 5, 0, 9, 3, 6, 9, 9, 21. Of course, he has read “Things That You Should Never Do on an Exam or Anywhere Else” and knows that he can’t use them this way.) Compute the following:
a) The Median (1)
b) The Standard Deviation (3)
c) The 2nd Quintile (2)
d) The Coefficient of variation (1)

Solution: Seymour used the eleven numbers 1, 3, 12, 15, 22, 7, 7, 5, 2, 10, 1. The numbers in order are 1, 1, 2, 3, 5, 7, 7, 10, 12, 15, 22.

           x     x^2
x1         1       1
x2         1       1
x3         2       4
x4         3       9
x5         5      25
x6         7      49
x7         7      49
x8        10     100
x9        12     144
x10       15     225
x11       22     484
Total     85    1091

a) The middle number is 7.

b) $n = 11$, $\bar{x} = \frac{\sum x}{n} = \frac{85}{11} = 7.72727$, $s^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1} = \frac{1091 - 11(7.72727)^2}{10} = \frac{434.181818}{10} = 43.41818$. So $s = \sqrt{43.41818} = 6.589247$.

c) position $= p(n+1) = .4(12) = 4.8$. So $a = 4$ and $.b = .8$. $x_{1-p} = x_a + .b(x_{a+1} - x_a)$, so $x_{1-.4} = x_{.6} = x_4 + .8(x_5 - x_4) = 3 + .8(5 - 3) = 4.6$.

d) $C = \frac{s}{\bar{x}} = \frac{6.589247}{7.72727} = 0.8527$

If you enjoy wasting time, you might want to use the definitional formula.

           x     x - xbar    (x - xbar)^2
x1         1     -6.7273        45.256
x2         1     -6.7273        45.256
x3         2     -5.7273        32.802
x4         3     -4.7273        22.347
x5         5     -2.7273         7.438
x6         7     -0.7273         0.529
x7         7     -0.7273         0.529
x8        10      2.2727         5.165
x9        12      4.2727        18.256
x10       15      7.2727        52.893
x11       22     14.2727       203.711
Total     85     -0.0003       434.182

$n = 11$, $\bar{x} = \frac{\sum x}{n} = \frac{85}{11} = 7.72727$, $s^2 = \frac{\sum (x - \bar{x})^2}{n-1} = \frac{434.182}{10} = 43.4182$.

The vast majority of people who thought that they were using the definitional formula used x x 2 / n 1 , which, I believe, should have given them x 2 .
Doing a little bit of homework should have prevented this error.

251y0451 10/07/04

Part II.

1. The problem in the textbook that gives the data used in the take home also gives the braking distance for a sample of domestic made cars. It is presented below. Cumulative frequency (in red) is needed to get the median and was not given.

Distance (feet)   frequency   Cumulative frequency
210 – 220             1               1
220 – 230             1               2
230 – 240             1               3
240 – 250             1               4
250 – 260             4               8
260 – 270             3              11
270 – 280             6              17
280 – 290             4              21
290 – 300             2              23
300 – 310             2              25
310 – 320             0              25
Sum                  25

Minitab was used to calculate statistics from these data. It claims the following: (Note!!!!!) $\bar{x} = 269$, $s^2 = 525$, $k_3 = -7281.61$. You will not be able to use any of these numbers in b) or c) without some manipulation. Answers below are not acceptable unless you give some evidence in the sample statistics.
a) Do American cars have a shorter braking distance? Compare all 3 measures of central tendency. (2)
b) Are American cars more consistent in braking distance than foreign cars? Use a dimension-free measurement of variability. (2)
c) Compare the direction and degree of skewness in the two distributions. Use one dimension-free measure of skewness. (2)
d) Write a 5-number summary of the results from the first take-home problem. (2)

Solution:
a) Seymour had given us, for the foreign-made cars, $\bar{x} = 260.647$, $x_{1-.5} = x_{.5} = 255.227$ and the mode is 255. For the median for the domestic cars, position $= p(n+1) = .5(26) = 13$. Since 13 is above 11 and below 17, the median is in 270–280, which has a frequency of 6. $x_{1-.5} = x_{.5} = 270 + \left[\frac{.5(25) - 11}{6}\right]10 = 272.5$. The mode is the midpoint of the largest group, which is 275.

           Domestic   Foreign
Mean         269      260.647
Median       272.5    255.227
Mode         275      255

According to all measures, American cars have a longer braking distance.

b) Seymour says for the foreign cars $s = \sqrt{567.731} = 23.8271$. For the domestic cars $s = \sqrt{525} = 22.913$. If we compute the coefficient of variation, $C = \frac{s}{\bar{x}}$:
Domestic $C = \frac{22.913}{269} = .08518$
Foreign $C = \frac{23.8271}{260.647} = .0914$
American cars are more consistent.

c) You can use $g_1$ or SK.

                                        Domestic                               Foreign
Mean                                    269                                    260.647
Mode                                    275                                    255
k3                                      -7281.61                               8689.93
s                                       22.913                                 23.8271
g1 = k3 / s^3                           -7281.61 / (22.913)^3 = -.6053         8689.93 / (23.8271)^3 = .6424
SK = 3(mean - mode) / std. deviation    3(269 - 275) / 22.913 = -.786          3(260.647 - 255) / 23.8271 = 0.711

My answers are not consistent. $g_1$ makes Foreign more skewed, while SK makes Domestic look more skewed. However, Domestic is skewed to the left and Foreign to the right.

d)
Lower Limit       210
First Quartile    243.5
Median            255.227
Third Quartile    275.357
Upper Limit       320

2. The following numbers refer to miles-per-gallon of a sample of vehicles (Bowerman and O’Connell).

Class (mpg)       f      f_rel     F      F_rel
29.8 – 30.3      ____    ____     ____    .0612
30.4 – 30.9      ____    ____     ____    .2449
31.0 – 31.5      ____    ____      24     ____
31.6 – 32.1      ____    .2653     35     .7551
32.2 – 32.7       11     .2245     46     .9388
32.8 – 33.3        3     .0612     49     1.000

Fill in the missing numbers. (5)

Even with corrections made above, this had some errors, but I still could check easily to see if you knew what you were doing. The completely corrected results were:

Class (mpg)       f      f_rel     F      F_rel
29.8 – 30.3        3     .0612      3     .0612
30.4 – 30.9        9     .1837     12     .2449
31.0 – 31.5       12     .2449     24     .4898
31.6 – 32.1       13     .2653     37     .7551
32.2 – 32.7        9     .1837     46     .9388
32.8 – 33.3        3     .0612     49     1.000
Total             49     1.0000

Part III. (At least 22 points – 2 points each unless marked)

1. Mark the variables below as qualitative (A) or quantitative (B)
a) Number of days a patient stays at a spa   B
b) Preferences for 10 beers on a 1st to 10th scale   A
c) Method of contraception   A
d) Per cent change in population between censuses   B

2. Which of the following is an example of continuous ratio data?
a) Number of days a patient stays at a spa
b) Preferences for beers on a 1 to 10 scale
c) Method of contraception
d) *Per cent change in population between censuses
e) None of the above.

3. A summary measure that is computed to describe a characteristic of a population is called
a) *a parameter.
b) a census.
c) a statistic.
d) an inference.
e) None of the above.

4. In general, what are the two types of descriptive statistic most frequently reported?
a) Measures of kurtosis and measures of dispersion
b) Measures of kurtosis and measures of skewness
c) Measures of kurtosis and measures of central tendency
d) Measures of dispersion and measures of skewness
e) *Measures of dispersion and measures of central tendency
f) Measures of skewness and measures of central tendency
g) None of the above.

Mark the following formulas (1 each). Circle a, b or c. b) must be filled in if you have circled it.

5. $\gamma_2 = \frac{\mu_4 - 3\sigma^4}{\sigma^4}$ or $g_2 = \frac{k_4}{s^4}$ (Coefficient of Excess)
a) This cannot be negative.
b) *If this is negative it means the distribution is Platykurtic (Flat-topped).
c) This can be negative, but it has no special meaning.

6. $k_3 = \frac{n}{(n-1)(n-2)}\left[\sum x^3 - 3\bar{x}\sum x^2 + 2n\bar{x}^3\right]$ (Skewness)
a) This cannot be negative.
b) *If this is negative it means the distribution is Skewed to the left.
c) This can be negative, but it has no special meaning.

7. $\bar{x} = \frac{\sum x}{n}$ (Sample mean)
a) This cannot be negative.
b) If this is negative it means the distribution is ______
c) *This can be negative, but it has no special meaning.

8. $s^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1}$ (Variance)
a) *This cannot be negative.
b) If this is negative it means the distribution is ______
c) This can be negative, but it has no special meaning.

Does it really mean anything to tell me that if one of these statistics is negative, the distribution is negative?

Exhibit 1: The following is taken from Problem 3.22 in the text. The data below represent sales tax receipts submitted to a township government by 50 businesses in one quarter.
Sales Taxes ($000)
10.3  13.0  11.1  10.0   9.3  11.1  11.2  10.2  12.9  11.5
 9.6   9.0  14.5  13.0   7.3   5.3  12.5   8.0  11.1   9.9
 9.8  11.6   9.2  10.0  12.8  12.5  10.7  11.6   7.8  10.5
 6.7  11.8  15.1   9.3   7.6  11.0   8.7  12.5  10.4  10.1
 8.4  10.6   6.5  12.7   8.9  10.3   9.5   7.5  10.5   8.6

The text solution manual offers the following results.
(a) Stem-and-leaf display of Quarterly Sales Tax Receipts
 5 | 3
 6 | 57
 7 | 3568
 8 | 04679
 9 | 02335689
10 | 00123345567
11 | 011125668
12 | 555789
13 | 00
14 | 5
15 | 1
(b) $\bar{x} = 10.28$
(c) $s^2 = 4.1820$, $s = 2.045$
(d) 64% of the receipts are within 1 standard deviation of the mean.
(e) 94% of the receipts are within 2 standard deviations of the mean.
(f) 100% of the receipts are within 3 standard deviations of the mean.

9. According to the stem and leaf display, what percent of the receipts were below $7000? (1)
3/50 = 6%

10. If the researcher was directed to present the data in 6 classes, what should the class interval be? Show your calculations.
Lowest is 5.3. Highest is 15.1. $\frac{15.1 - 5.3}{6} = 1.63$. Let’s try 2.

11. Show the actual intervals you might use.

Class   From   to
A         5    Under 7
B         7    Under 9
C         9    Under 11
D        11    Under 13
E        13    Under 15
F        15    Under 17

Before we start, most of you seem to have no idea what ‘3 standard deviations from the mean’ signifies. Nevertheless, one student paper put it this way.
$\bar{x} \pm s = 10.28 \pm 2.045$ or 8.235 to 12.325
$\bar{x} \pm 2s = 10.28 \pm 2(2.045) = 10.28 \pm 4.090$ or 6.190 to 14.370
$\bar{x} \pm 3s = 10.28 \pm 3(2.045) = 10.28 \pm 6.135$ or 4.145 to 16.415
Two of these should appear in your answer below.

12. The description above says that 64% of the receipts are within 1 standard deviation of the mean. Between what numbers does this mean? How does this compare with the empirical rule? Why might there be a discrepancy? (3)
Empirical rule (for symmetrical unimodal distributions only): 68% within one standard deviation of the mean, 95% within two and almost all (99.7%) within three. The observed 64% is lower than 68%, which could be because the distribution is not quite symmetric.

13. The description above says that 100% of the receipts are within 3 standard deviations of the mean. Between what numbers does this mean? How does this compare with the Chebyshev rule? Why might there be a discrepancy? (3)
Chebyshev’s Inequality: $P\left(|x - \mu| \ge k\sigma\right) \le \frac{1}{k^2}$ or $P\left(\mu - k\sigma < x < \mu + k\sigma\right) \ge 1 - \frac{1}{k^2}$.
With $k = 3$ this means that at least $1 - \frac{1}{9} = \frac{8}{9}$ should be within 3 standard deviations of the mean. In the real world the number is almost always larger.

ECO251 QBA1
FIRST EXAM
October 6, 2004
TAKE HOME SECTION

Name: _________________________
Student Number: _________________________

Throughout this exam show your work! Please indicate clearly what sections of the problem you are answering and what formulas you are using. Turn this in with your in-class exam.

Part IV. Do all the Following (11 Points) Show your work!

1. The frequency distribution below represents the braking distance for a sample of foreign made cars. Personalize the data as follows. Write down your student number. Take the last two digits of the number. Add the largest of the two last numbers to the frequency for 300-310 and the second largest to the frequency for 310-320. Use the results as your frequencies. For example, Seymour Butz’s student number is 876509, so he adds 0 to the last frequency and 9 to the second to last frequency and uses (1, 3, 12, 15, 22, 7, 7, 5, 2, 10, 1).

Distance (feet)   frequency
210 – 220             1
220 – 230             3
230 – 240            12
240 – 250            15
250 – 260            22
260 – 270             7
270 – 280             7
280 – 290             5
290 – 300             2
300 – 310             1
310 – 320             1

a. Calculate the Cumulative Frequency (0.5)
b. Calculate the Mean (0.5)
c. Calculate the Median (1)
d. Calculate the Mode (0.5)
e. Calculate the Variance (1.5)
f. Calculate the Standard Deviation (1)
g. Calculate the Interquartile Range (1.5)
h. Calculate a Statistic showing Skewness and Interpret it (1.5)
i. Make an ogive of the data showing relative or percentage cumulative frequency (Neatness Counts!) (1.5)
j. Extra credit: Put a (horizontal) box plot below the ogive using the same scale. (1)

Solution: x is the midpoint of the class. Our convention is to use the midpoint of 50 to 60, not 50 to 59.999. Note also, that the midpoints have been divided by 10. Most numbers should be multiplied by 10, the variance should be multiplied by 100 and $k_3$ by 1000. Calculations follow for both the computational and definitional formulas. (Don’t do both.) Seymour’s frequencies are used below.

If you used computational formulas, you should have the following.

Row   class      f    F     x      fx       fx^2         fx^3
 1    210-220    1    1    215     215      46225      9938375
 2    220-230    3    4    225     675     151875     34171875
 3    230-240   12   16    235    2820     662700    155734500
 4    240-250   15   31    245    3675     900375    220591875
 5    250-260   22   53    255    5610    1430550    364790250
 6    260-270    7   60    265    1855     491575    130267375
 7    270-280    7   67    275    1925     529375    145578125
 8    280-290    5   72    285    1425     406125    115745625
 9    290-300    2   74    295     590     174050     51344750
10    300-310   10   84    305    3050     930250    283726250
11    310-320    1   85    315     315      99225     31255875
Total           85               22155    5822325   1543144875

$n = \sum f = 85$, $\sum fx = 22155$, $\sum fx^2 = 5822325$, $\sum fx^3 = 1543144875$.

If you used definitional formulas, you should have the following.

Row   class      f    F     x      fx     x - xbar    f(x - xbar)   f(x - xbar)^2   f(x - xbar)^3
 1    210-220    1    1    215     215    -45.6471      -45.647         2083.7          -95113
 2    220-230    3    4    225     675    -35.6471     -106.941         3812.1         -135892
 3    230-240   12   16    235    2820    -25.6471     -307.765         7893.3         -202439
 4    240-250   15   31    245    3675    -15.6471     -234.706         3672.5          -57463
 5    250-260   22   53    255    5610     -5.6471     -124.235          701.6           -3962
 6    260-270    7   60    265    1855      4.3529       30.471          132.6             577
 7    270-280    7   67    275    1925     14.3529      100.471         1442.0           20698
 8    280-290    5   72    285    1425     24.3529      121.765         2965.3           72214
 9    290-300    2   74    295     590     34.3529       68.706         2360.2           81081
10    300-310   10   84    305    3050     44.3529      443.529        19671.8          872504
11    310-320    1   85    315     315     54.3529       54.353         2954.2          160572
Total           85               22155                    0.000        47689.4          712778

$n = \sum f = 85$, $\sum fx = 22155$, $\sum f(x - \bar{x}) = 0$ (except for a possible rounding error), $\sum f(x - \bar{x})^2 = 47689.4$, and $\sum f(x - \bar{x})^3 = 712778$.

a. Calculate the Cumulative Frequency (1): (See above) The cumulative frequency is the whole F column.
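The column totals in the computational table can be checked mechanically. What follows is a minimal Python sketch, assuming Seymour's frequencies and 10-foot classes starting at 210; the variable names (`lower`, `freq`, `mid`, `cum_freq` and so on) are illustrative and are not part of the exam.

```python
from itertools import accumulate

# Assumed inputs: Seymour's personalized frequencies for the classes
# 210-220, 220-230, ..., 310-320 (class width 10).
lower = list(range(210, 320, 10))        # lower class limits: 210, ..., 310
freq = [1, 3, 12, 15, 22, 7, 7, 5, 2, 10, 1]
mid = [lo + 5 for lo in lower]           # class midpoints x: 215, ..., 315

cum_freq = list(accumulate(freq))        # the F column
n = sum(freq)                            # n = sum of f
sum_fx = sum(f * x for f, x in zip(freq, mid))
sum_fx2 = sum(f * x**2 for f, x in zip(freq, mid))
sum_fx3 = sum(f * x**3 for f, x in zip(freq, mid))

# Grouped mean and the computational formula for the sample variance,
# s^2 = (sum fx^2 - n * xbar^2) / (n - 1).
mean = sum_fx / n
variance = (sum_fx2 - n * mean**2) / (n - 1)

print("F:", cum_freq)
print("n, sum fx, sum fx^2, sum fx^3:", n, sum_fx, sum_fx2, sum_fx3)
print("mean, variance:", round(mean, 3), round(variance, 3))
```

Because the integer totals are exact, any disagreement with the table points to an arithmetic slip in a row, not to rounding.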
b. Calculate the Mean (1): $\bar{x} = \frac{\sum fx}{n} = \frac{22155}{85} = 260.647$

c. Calculate the Median (2): position $= p(n+1) = .5(86) = 43$. This is above $F = 31$ and below $F = 53$, so the interval is the 5th one, 250–260. $x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right]w$, so $x_{1-.5} = x_{.5} = 250 + \left[\frac{.5(85) - 31}{22}\right]10 = 250 + 5.2273 = 255.227$.

d. Calculate the Mode (1): The mode is the midpoint of the largest group. Since 22 is the largest frequency, the modal group is 250 to 260 and the mode is 255.

e. Calculate the Variance (3): $s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n-1} = \frac{5822325 - 85(260.647)^2}{84} = \frac{47692.0}{84} = 567.762$ or $s^2 = \frac{\sum f(x - \bar{x})^2}{n-1} = \frac{47689.4}{84} = 567.731$. The computer got 567.731.

f. Calculate the Standard Deviation (2): $s = \sqrt{567.731} = 23.8271$.

g. Calculate the Interquartile Range (3):
First Quartile: position $= p(n+1) = .25(86) = 21.50$. This is above $F = 16$ and below $F = 31$, so the interval is 240–250. $x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right]w$ gives us $Q_1 = x_{1-.25} = x_{.75} = 240 + \left[\frac{.25(85) - 16}{15}\right]10 = 243.5$.
Third Quartile: position $= p(n+1) = .75(86) = 64.5$. This is above $F = 60$ and below $F = 67$, so the interval is 270–280. $Q_3 = x_{1-.75} = x_{.25} = 270 + \left[\frac{.75(85) - 60}{7}\right]10 = 275.357$.
$IQR = Q_3 - Q_1 = 275.357 - 243.5 = 31.857$.
Note that an answer for the mean, median, mode, first quartile or third quartile that is not between the highest and lowest number, in this case 210 and 320, is not reasonable!

h. Calculate a Statistic showing Skewness and interpret it (3): We had $n = 85$, $\bar{x} = 260.647$, $\sum fx = 22155$, $\sum fx^2 = 5822325$, $\sum fx^3 = 1543144875$, and $\sum f(x - \bar{x})^3 = 712778$.
$k_3 = \frac{n}{(n-1)(n-2)}\left[\sum fx^3 - 3\bar{x}\sum fx^2 + 2n\bar{x}^3\right] = \frac{85}{(84)(83)}\left[1543144875 - 3(260.647)(5822325) + 2(85)(260.647)^3\right]$
$= 0.0121916\left[1543144875 - 4552714633 + 3010281526\right] = 0.0121916(711768) = 8677.59$
or $k_3 = \frac{n}{(n-1)(n-2)}\sum f(x - \bar{x})^3 = \frac{85}{(84)(83)}(712778) = 8689.92$. The computer gets 8689.93.
or $g_1 = \frac{k_3}{s^3} = \frac{8689.93}{(23.8271)^3} = 0.6423958$
or Pearson's Measure of Skewness $SK = \frac{3(\text{mean} - \text{mode})}{\text{std. deviation}} = \frac{3(260.647 - 255)}{23.8271} = 0.7110$
Because of the positive sign, the measures imply skewness to the right.

i. An ogive is a simple line graph with cumulative frequency between zero and one on the y-axis and the numbers 200-340 on the x-axis. The data Seymour showed is:

up to     F     Frel
210       0     0
220       1     .012
230       4     .047
240      16     .188
250      31     .365
260      53     .624
270      60     .706
280      67     .788
290      72     .847
300      74     .871
310      84     .988
320      85     1.000
330      85     1.000

Each number in the Frel column is the corresponding number in the F column divided by $n = 85$. The y axis should be marked from zero to 1.00. In spite of the fact that the question tells you that an ogive shows cumulative frequency, many of you gave me a frequency polygon, most of you did not obey the convention that the curve starts at zero, and most of you did not convert to per cent.

j. The box plot should show the median and the quartiles and use the same x axis as the ogive.

2. Use the frequencies you used in problem 1 in this problem as values of x. For these eleven numbers, compute the a) Geometric Mean, b) Harmonic mean, c) Root-mean-square (1 point each). Label each clearly. If you wish, d) Compute the geometric mean using natural or base 10 logarithms. (1 point extra credit each). While you’re at it, compute the sample mean and bring it and the numbers that you used on this take-home exam to the in-class exam (no credit until you get to the exam, but it won’t hurt).

Solution: Note that Seymour used the eleven numbers 1, 3, 12, 15, 22, 7, 7, 5, 2, 10, 1. He found $\sum x = 85$ or $\bar{x} = 7.72727$. This is not used in any of the following calculations and there is no reason why you should have computed it except to use in class! Note that an answer that is not between the highest and lowest number is not reasonable!

a) The Geometric Mean.
$x_g = \left(x_1 \cdot x_2 \cdot x_3 \cdots x_n\right)^{1/n} = \left(\prod x\right)^{1/n} = \left[(1)(3)(12)(15)(22)(7)(7)(5)(2)(10)(1)\right]^{1/11} = (58212000)^{1/11} = (58212000)^{0.0909091} = 5.08054$.
At least, not many of you tried to get the answer by dividing 58212000 by 11, but a number of you seem to have convinced yourselves that you could take a square root instead of an 11th root.

b) The Harmonic Mean.
$\frac{1}{x_h} = \frac{1}{n}\sum\frac{1}{x} = \frac{1}{11}\left(\frac{1}{1} + \frac{1}{3} + \frac{1}{12} + \frac{1}{15} + \frac{1}{22} + \frac{1}{7} + \frac{1}{7} + \frac{1}{5} + \frac{1}{2} + \frac{1}{10} + \frac{1}{1}\right)$
$= \frac{1}{11}\left(1.00000 + 0.33333 + 0.08333 + 0.06667 + 0.04545 + 0.14286 + 0.14286 + 0.20000 + 0.50000 + 0.10000 + 1.00000\right) = \frac{1}{11}(3.61450) = 0.328591$.
So $x_h = \frac{1}{\frac{1}{n}\sum\frac{1}{x}} = \frac{1}{0.328591} = 3.0433$.
Of course many of you decided that $\frac{1}{x_h} = \frac{1}{n}\sum\frac{1}{x} = \frac{1}{11}\left(\frac{1}{1 + 3 + 12 + 15 + 22 + 7 + 7 + 5 + 2 + 10 + 1}\right) = \frac{1}{11}\left(\frac{1}{85}\right)$ ???. This is, of course, an easier way to do the problem, but I warned you that it wouldn’t work. It is equivalent to believing that $\frac{1}{2} + \frac{1}{2} = \frac{1}{4}$.

c) The Root-Mean-Square.
$x_{rms}^2 = \frac{1}{n}\sum x^2 = \frac{1}{11}\left(1^2 + 3^2 + 12^2 + 15^2 + 22^2 + 7^2 + 7^2 + 5^2 + 2^2 + 10^2 + 1^2\right)$
$= \frac{1}{11}\left(1 + 9 + 144 + 225 + 484 + 49 + 49 + 25 + 4 + 100 + 1\right) = \frac{1}{11}(1091) = 99.1818$.
So $x_{rms} = \sqrt{\frac{1}{n}\sum x^2} = \sqrt{99.1818} = 9.9590$.
Of course many of you decided that $x_{rms}^2 = \frac{1}{n}\sum x^2 = \frac{1}{n}\left(\sum x\right)^2 = \frac{1}{11}(85)^2$ ???. This is, of course, an easier way to do the problem, but I warned you that it wouldn’t work. It is equivalent to believing that $2^2 + 2^2 = 4^2$.

d) (i) Geometric mean using natural logarithms
$\ln(x_g) = \frac{1}{n}\sum\ln(x) = \frac{1}{11}\left(\ln 1 + \ln 3 + \ln 12 + \ln 15 + \ln 22 + \ln 7 + \ln 7 + \ln 5 + \ln 2 + \ln 10 + \ln 1\right)$
$= \frac{1}{11}\left(0 + 1.09861 + 2.48491 + 2.70805 + 3.09104 + 1.94591 + 1.94591 + 1.60944 + 0.69315 + 2.30259 + 0\right) = \frac{1}{11}(17.8796) = 1.62542$.
So $x_g = e^{1.62542} = 5.08054$.

(ii) Geometric mean using logarithms to the base 10
$\log(x_g) = \frac{1}{n}\sum\log(x) = \frac{1}{11}\left(\log 1 + \log 3 + \log 12 + \log 15 + \log 22 + \log 7 + \log 7 + \log 5 + \log 2 + \log 10 + \log 1\right)$
$= \frac{1}{11}\left(0 + 0.47712 + 1.07918 + 1.17609 + 1.34242 + 0.84510 + 0.84510 + 0.69897 + 0.30103 + 1.00000 + 0\right) = \frac{1}{11}(7.76501) = 0.705910$.
So $x_g = 10^{0.705910} = 5.08054$.

Notice that the original numbers and all the means are between 1 and 22.

It’s probably more efficient to handle a problem this large in columns. The arithmetic mean is also computed below.

Row         x         1/x        x^2      log x      ln x
 1          1       1.00000        1     0.00000    0.00000
 2          3       0.33333        9     0.47712    1.09861
 3         12       0.08333      144     1.07918    2.48491
 4         15       0.06667      225     1.17609    2.70805
 5         22       0.04545      484     1.34242    3.09104
 6          7       0.14286       49     0.84510    1.94591
 7          7       0.14286       49     0.84510    1.94591
 8          5       0.20000       25     0.69897    1.60944
 9          2       0.50000        4     0.30103    0.69315
10         10       0.10000      100     1.00000    2.30259
11          1       1.00000        1     0.00000    0.00000
Total      85       3.61450     1091     7.76501   17.8796
Total/n   7.72727   0.328591   99.1818   0.705910   1.62542

So, as before, $\bar{x} = 7.72727$, $x_h = \frac{1}{0.328591} = 3.0433$, $x_{rms} = \sqrt{\frac{1}{n}\sum x^2} = \sqrt{99.1818} = 9.9590$, $x_g = 10^{0.705910} = 5.08054$ and $x_g = e^{1.62542} = 5.08054$.
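The means above can be cross-checked in a few lines. This is a minimal Python sketch using only the standard library, assuming Seymour's eleven numbers; the variable names are illustrative and not part of the exam. It computes the geometric mean three ways (directly as an 11th root, through natural logarithms, and through base 10 logarithms) so that the routes can be compared.

```python
import math

# Assumed input: Seymour's eleven numbers from the take-home problem.
x = [1, 3, 12, 15, 22, 7, 7, 5, 2, 10, 1]
n = len(x)

arithmetic_mean = sum(x) / n

# Harmonic mean: reciprocal of the mean of the reciprocals.
harmonic_mean = n / sum(1 / v for v in x)

# Root-mean-square: square root of the mean of the squares.
rms = math.sqrt(sum(v * v for v in x) / n)

# Geometric mean: nth root of the product, and the two logarithm routes.
geometric_direct = math.prod(x) ** (1 / n)
geometric_ln = math.exp(sum(math.log(v) for v in x) / n)
geometric_log10 = 10 ** (sum(math.log10(v) for v in x) / n)

for name, value in [
    ("arithmetic", arithmetic_mean),
    ("harmonic", harmonic_mean),
    ("rms", rms),
    ("geometric (root)", geometric_direct),
    ("geometric (ln)", geometric_ln),
    ("geometric (log10)", geometric_log10),
]:
    print(f"{name:18s} {value:.5f}")
```

All three geometric-mean routes should agree up to floating-point error, and every mean should fall between the smallest and largest data values, 1 and 22.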