251y0711 2/20/07
ECO251 QBA1
FIRST EXAM
February 21, 2007

Name: ___KEY__________________
Student Number: _________________________
Class Hour: _____________________

Remember: neatness, or at least legibility, counts. In most non-multiple-choice questions an answer needs a calculation or short explanation to count.

Part I. (7 points) The following numbers are a sample and represent the prices of regular gasoline in a sample of 11 gas stations:

2.28, 2.38, 2.50, 2.42, 2.34, 2.38, 2.44, 2.48, 2.38, 2.65, 2.66

Compute the following. Show your work!
a) The Median (1)
b) The Standard Deviation (3)
c) The 4th quintile (2)
d) The Coefficient of variation (1)

Numbers in order, $n = 11$:

          x       x^2
x1       2.28    5.1984
x2       2.34    5.4756
x3       2.38    5.6644
x4       2.38    5.6644
x5       2.38    5.6644
x6       2.42    5.8564
x7       2.44    5.9536
x8       2.48    6.1504
x9       2.50    6.2500
x10      2.65    7.0225
x11      2.66    7.0756
Total   26.91   65.9757

a) position $= p(n+1) = .5(12) = 6.0$. The middle number is the 6th number, which is 2.42. If you really want to get formal, $x_{.50} = x_6 + 0(x_7 - x_6) = 2.42 + 0(2.44 - 2.42) = 2.42$.

b) $\bar{x} = \frac{\sum x}{n} = \frac{26.91}{11} = 2.44636$ and $s^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1} = \frac{65.9757 - 11(2.44636)^2}{11 - 1} = \frac{0.144250254}{10} = 0.014425025$. So $s = \sqrt{0.014425025} = 0.12010$. Calculator got .1200227 without rounding.

c) The 4th quintile has $\frac{4}{5}$ or 80% below it. position $= p(n+1) = .8(12) = 9.60$. So $a = 9$ and $.b = 0.60$. The formula is $x_{1-p} = x_a + .b(x_{a+1} - x_a)$. So $x_{1-.80} = x_{.20} = x_9 + 0.60(x_{10} - x_9) = 2.50 + 0.60(2.65 - 2.50) = 2.59$.

d) $C = \frac{s}{\bar{x}} = \frac{0.12010}{2.44636} = 0.04909$ or 4.91%.

Note that mean, median and fourth quintile must be between 2.28 and 2.66. In the variance, excess rounding will give you a negative variance. $s^2$ cannot be negative.

Part II. (At least 35 points; 2 points each unless marked. Parentheses give points on individual questions. Brackets give cumulative point total.) Exam is normed on 50 points.

1.
The difference between cumulative and ordinary frequency distributions is that the cumulative frequency distribution shows the number of observations which are:
a) greater than particular values, whereas the ordinary frequency distribution shows the number of observations in each class interval.
b)* less than particular values, whereas the ordinary frequency distribution shows the number of observations in each class interval.
c) in each class interval less than particular values, whereas the ordinary frequency distribution shows the number of observations less than particular values.
d) in each class interval less than particular values, whereas the ordinary frequency distribution shows the number of observations greater than particular values.
e) none of the above.

2. Mark the variables below as qualitative or categorical (A), quantitative and continuous (B1) or quantitative and discrete (B2) (1 each)
a) atmospheric pressure. B1
b) method of contraception. A
c) expenditure per pupil. B1
d) Fahrenheit temperature. B1
e) Number of murders in Philadelphia over a year. B2 [7]

Exhibit 1: Given below is the stem-and-leaf display representing the amount of oil in gallons (with leaves in gallons) used by a sample of 25 emergency generators during a power outage.

5 | 3.0 7.2
6 | 2.1 2.4 3.0 5.5 7.3 7.8 8.6 8.8 8.8
7 | 0.2 3.1 3.3 6.2 7.7 8.2 8.8
8 | 1.0 2.5 4.5 6.8
9 | 2.8 7.1 7.5

3. In Exhibit 1, if an ogive showing relative frequency is constructed using 50.0 to under 60.0 as the first class, what will be the height of the point above 70 on the x axis? [9]
Answer: The interval 60-70 has 9 items in it and the interval before it has 2, so the cumulative frequency is $F = 11$ and the relative cumulative frequency is $F_{rel} = \frac{F}{n} = \frac{11}{25} = 0.44$.

4. In Exhibit 1 find the first quartile of amount of oil used. [11]
Answer: position $= p(n+1) = .25(26) = 6.5$. The 6th number is 65.5 and the 7th number is 67.3.
So $a = 6$ and $.b = 0.50$. The formula is $x_{1-p} = x_a + .b(x_{a+1} - x_a)$. So $x_{1-.25} = x_{.75} = x_6 + 0.50(x_7 - x_6) = 65.5 + 0.50(67.3 - 65.5) = 66.4$, or simply $\frac{65.5 + 67.3}{2} = 66.4$.

5. Using the data in Exhibit 1, assume that the data is to be presented in 7 classes. Show how you would decide what class interval to use and list the classes below with their frequencies. (5) [16]

Class                    Frequency
A   __ to under __       __
B   __ to under __       __
C   __ to under __       __
D   __ to under __       __
E   __ to under __       __
F   __ to under __       __
G   __ to under __       __

Answer: The numbers lie between 53.0 and 97.5. So the width will be something slightly above $w = \frac{97.5 - 53.0}{7} = 6.3571$. You could use 6.36 or 7 or, perhaps, 8. I will try 8.

Class                    Frequency
A   50 to under 58        2
B   58 to under 66        4
C   66 to under 74        8
D   74 to under 82        5
E   82 to under 90        3
F   90 to under 98        3
G   98 to under 106       0

This didn't work because the last class was empty. So I will try 7.

Class                    Frequency
A   50 to under 57        1
B   57 to under 64        4
C   64 to under 71        7
D   71 to under 78        4
E   78 to under 85        5
F   85 to under 92        1
G   92 to under 99        3
Total                    25

6. If a frequency distribution is skewed to the left, which of the following measures is likely to have the largest value? [18]
a) mean
b) median
c) *mode
d) All of the above will be almost the same size.
e) the parameter
f) It is impossible to tell unless we know whether we are dealing with a sample or a population.
Explanation: The usual ordering has the median between the mean and the mode. The mean will be pulled down relative to the mode, so the median would lie below the mode too and the largest value is the mode.

7.
A list of the countries that are members of the European Union in order of their GDP per capita is an example of [20]
a) *Ordinal data.
b) Nominal data.
c) Interval data.
d) Ratio data.
e) None of the above.

8. A frequency distribution is of unknown shape and consists of 600 observations with a mean of 162 and a standard deviation of 12.
a) What is the minimum number of observations that must fall between 138 and 186?
Answer: 138 is 24 below the mean. 24 is twice 12, so the range 138 to 186 is two standard deviations above and below the mean. According to the Tchebyschev inequality, the largest proportion of observations that could be more than $k = 2$ standard deviations from the mean is $\frac{1}{k^2} = \frac{1}{4}$. So the interval must contain at least three quarters of the observations, or 450 out of 600.
b) What is the maximum number of observations that could be above 210? [24]
Answer: 210 is 48 above the mean. This is $k = 4$ standard deviations. So the maximum proportion is $\frac{1}{k^2} = \frac{1}{16}$. This would be (rounding down) 37 out of 600 observations. (Actually there is a 1-tailed version that says $P(x \ge \mu + k\sigma) \le \frac{1}{1 + k^2} = \frac{1}{17}$. This would give us 35 observations.)
c) How would you change your answers to a) and b) if you found that the distribution was symmetrical and unimodal? (3)
The Empirical rule says 68% within one standard deviation of the mean, 95% within two and almost all (99.7%) within three. Since 138-186 is two standard deviations either side of the mean, this implies about 570 in 138-186 and at most 2 above 198. There are unlikely to be any above 210. [27]

9. You have a deck of 52 cards consisting of ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, jack, king and queen of hearts, clubs, diamonds and spades. Explain how you could divide the deck into 2 classes that are: (1 each)
a) collectively exhaustive but not mutually exclusive; This is pretty open ended. For this one we could try cards up to and including 6s ($\le 6$) and cards above 5 ($\ge 6$). 6 would be in both groups.
b) mutually exclusive but not collectively exhaustive; Spades as one group, clubs as the other.
All red cards have been forgotten. We could also try cards below 6 and cards above 6 (and forget 6).
c) both mutually exclusive and collectively exhaustive. Red cards as one group, black as the other. Or maybe face cards as one, non-face cards as the other. We could also try cards up to and including 6s ($\le 6$) and cards above 6 ($> 6$). [30]

10. In ECO 252 you will learn to test a null hypothesis. A null hypothesis is a quantitative statement about a population that can be disproved. The null hypothesis must meet three requirements: first, it must contain $=$, $\le$ or $\ge$; second, it must include a parameter or parameters; and third, it must contain reasonable values for the parameter. Consider the following: (i) 3, (ii) 3, (iii) x 2, (iv) s 5, (v) 3. The following could be null hypotheses:
a) (iii), (iv) and (v).
b) (i), (ii) and (v).
c)* (i) and (ii).
d) (i) only.
e) all of the above
f) None of the above. [32]
Explanation: (iii) x 2 and (iv) s 5 involve sample statistics, not population parameters. (v) 3 involves a parameter, but a standard deviation cannot be negative.

11. Which of the following statements about the median is not true?
a) *It is more affected by extreme values than the arithmetic mean.
b) It is a measure of central tendency.
c) It is equal to the second quartile.
d) It is equal to the mode in symmetrical, unimodal distributions.
e) All of the above are true.

12. In a set of numerical data the second quartile is always halfway between the first and third quartile. The above statement is: (1)
a) True
b) *False
Explanation: It's between them all right, but only halfway if the distribution is symmetrical.

ECO251 QBA1
FIRST EXAM
February 21, 2007
TAKE HOME SECTION

Name: _________________________
Student Number: _________________________

Throughout this exam show your work! Please indicate clearly what sections of the problem you are answering and what formulas you are using.
Turn this in with your in-class exam.

Part IV. Do all the Following (12+ Points). These are based on problems by Edward J. Kane. Show your work!

1. In May 1997 Forbes Magazine provided data on the salaries of 50 CEOs. These were arranged by Allen L. Webster to give the table below. Amounts are in thousands. Treat these data as a sample. Personalize the data below by adding the six digits of your student number to the last 6 frequencies. For example, Seymour Butz's student number is 876509, so he adds 8 to the second frequency and 7 to the third frequency, etc., and uses {9, 19, 17, 14, 9, 3, and 14} (adding to 85). You may check your work on the computer, but what is turned in should look as if it had all been done by hand.

Class   Salary in Thousands     Frequency
1         90 to under  440        9
2        440 to under  790       11
3        790 to under 1140       10
4       1140 to under 1490        8
5       1490 to under 1840        4
6       1840 to under 2190        3
7       2190 to under 2540        5

a. Calculate the Cumulative Frequency (0.5)
b. Calculate the Mean (0.5)
c. Calculate the Median (1)
d. Calculate the Mode (0.5)
e. Calculate the Variance (1.5)
f. Calculate the Standard Deviation (1)
g. Calculate the Interquartile Range (1.5)
h. Calculate a Statistic showing Skewness and interpret it (1.5)
i. Make a frequency polygon of the data (Neatness Counts!) (1)
j. Extra credit: Put a (horizontal) box plot below the frequency chart using the same horizontal scale (1)

Note that unreasonable answers are answers where the mean, median, mode, first quartile and third quartile do not fall between 90 and 2540.

Solution using the original numbers: If we use the original numbers and either the computational method (Columns 1-7) or the definitional method (Columns 1-5, 8-11), we get the following for the frequencies.
(1)                        (2)   (3)   (4)     (5)       (6)         (7)
Row  Class                  f     F     x       fx       fx^2        fx^3
1      90 to under  440     9     9    265     2385     632025   1.67487E+08
2     440 to under  790    11    20    615     6765    4160475   2.55869E+09
3     790 to under 1140    10    30    965     9650    9312250   8.98632E+09
4    1140 to under 1490     8    38   1315    10520   13833800   1.81914E+10
5    1490 to under 1840     4    42   1580     6320    9985600   1.57772E+10
6    1840 to under 2190     3    45   1845     5535   10212075   1.88413E+10
7    2190 to under 2540     5    50   2110    10550   22260500   4.69697E+10
Total                      50                 51725   70396725  111492128375

        (8)          (9)            (10)              (11)
Row   x - xbar    f(x - xbar)    f(x - xbar)^2     f(x - xbar)^3
1      -769.5       -6925.5         5329172        -4100798046
2      -419.5       -4614.5         1935783         -812060864
3       -69.5        -695.0           48303           -3357024
4       280.5        2244.0          629442          176558481
5       545.5        2182.0         1190281          649298286
6       810.5        2431.5         1970731         1597277273
7      1075.5        5377.5         5783501         6220155594
Total                   0.0        16887213         3727073700

I usually tell people that they are wasting their time if they use the definitional method. Because of the large numbers here that may not be true. Remember that the numbers here are in thousands. Because of the large numbers we might want to try to work in millions. This would mean changing the x column to 0.265, 0.615, 0.965, 1.315 etc. We would get, for the first row, $fx = 2.385$, $fx^2 = 0.6320$ and $fx^3 = 0.1675$. In any case the definitional method numbers would be more tractable.

If you used the computational method, you would have computed columns 2, 3, 4, 5, 6, and 7 and gotten $n = \sum f = 50$ and $\sum fx = 51725$, so that the mean is $\bar{x} = \frac{\sum fx}{n} = \frac{51725}{50} = 1034.5$. You would also find $\sum fx^2 = 70396725$ and $\sum fx^3 = 111492128375$.

If you used the definitional method, you would have computed columns 2, 3, 4, 5, 8, 9, 10 and 11 and gotten $n = \sum f = 50$ and $\sum fx = 51725$, so that the mean is $\bar{x} = \frac{\sum fx}{n} = \frac{51725}{50} = 1034.5$. You would have followed by getting $\sum f(x - \bar{x}) = 0$ (a check), $\sum f(x - \bar{x})^2 = 16887213$ and $\sum f(x - \bar{x})^3 = 3727073700$.

If you used one of Pearson's measures of skewness, you would not have bothered with columns 7 or 11. In any case only an Adrian Monk personality would have computed everything here.

a.
Calculate the Cumulative Frequency (0.5): See the F column above.

b. Calculate the Mean (0.5): We have already found $\bar{x} = \frac{\sum fx}{n} = \frac{51725}{50} = 1034.5$.

c. Calculate the Median (1): position $= p(n+1) = .5(51) = 25.5$. This is above $F = 20$ and below $F = 30$, so the interval is the 3rd, 790 to 1140, which has a frequency of 10. Each interval width is 1140 - 790 = 350. The formula is $x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right] w$, so $x_{1-.5} = x_{.5} = 790 + \left[\frac{.5(50) - 20}{10}\right] 350 = 790 + 0.5(350) = 965$.

d. Calculate the Mode (0.5): The largest group is 440 to 790, which has a frequency of 11, so by convention the mode is its midpoint, which is $mo = 615$. It is possible that you will have two modes.

Note that to be reasonable, $Q1 \le x_{.50} \le Q3$ and that $Q1$, $x_{.50}$, $Q3$, $\bar{x}$ and the mode must be between 90 and 2540.

e. Calculate the Variance (1.5): $s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n - 1} = \frac{70396725 - 50(1034.5)^2}{49} = \frac{16887212.5}{49} = 344636.99$ or $s^2 = \frac{\sum f(x - \bar{x})^2}{n - 1} = \frac{16887213}{49} = 344637$. The computer got 344637 too.

f. Calculate the Standard Deviation (1): $s = \sqrt{344637} = 587.06$

g. Calculate the Interquartile Range (1.5):
First Quartile: position $= p(n+1) = .25(51) = 12.75$. This is above $F = 9$ and below $F = 20$, so the interval is the 2nd, 440 to 790, which has a frequency of 11. $x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right] w$ gives us $Q1 = x_{1-.25} = x_{.75} = 440 + \left[\frac{.25(50) - 9}{11}\right] 350 = 440 + 0.31818(350) = 551.36$.

Third Quartile: position $= p(n+1) = .75(51) = 38.25$. This is above $F = 38$ and below $F = 42$, so the interval is the 5th, 1490 to 1840, which has a frequency of 4. $x_{1-.75} = x_{.25} = 1490 + \left[\frac{.75(50) - 38}{4}\right] 350 = 1490 - 0.125(350) = 1446.25$. Since this ended up in the wrong group, I tried the earlier group: $x_{1-.75} = x_{.25} = 1140 + \left[\frac{.75(50) - 30}{8}\right] 350 = 1468.125$.

This illustrates the inaccuracy of the formula, which is only an approximation. We can go back to the original assumption about the layout of the numbers. The interval 1140 to 1490 has a frequency of 8. This yields an interval between subsequent pairs of numbers between $x_{31}$ and $x_{38}$ of $\frac{350}{8} = 43.75$. We assume a half interval of 21.875 between 1140 and $x_{31}$ and another interval of the same size between $x_{38}$ and 1490. The interval 1490 to 1840 contains $x_{39}$ through $x_{42}$ and has a frequency of 4. The interval between subsequent pairs of numbers between $x_{39}$ and $x_{42}$ is $\frac{350}{4} = 87.5$. The half interval between 1490 and $x_{39}$ or between $x_{42}$ and 1840 is 43.75. If the colon below represents the group boundary at 1490, a diagram of $x_{37}$ through $x_{40}$ appears below.

x37 = 1424.375     x38 = 1468.125   :   x39 = 1533.75     x40 = 1621.25
            43.75            21.875 : 43.75          87.5

The intervals between $x_{38}$ and $x_{39}$ add to 21.875 + 43.75 = 65.625. If position $= p(n+1) = .75(51) = 38.25$, our value should be $x_{38} + .25(x_{39} - x_{38}) = 1468.125 + .25(65.625) = 1484.53$.

Obviously, I would accept any of the three answers given here, but 1446.25 is probably the worst. If we use 1468.125, we have $IQR = Q3 - Q1 = 1468.125 - 551.36 = 916.765$. Note that, no matter how much you may want to believe it, it is not true that the IQR = (.75)(n+1) - (.25)(n+1) = .50(n+1).

h. Calculate a Statistic showing Skewness and interpret it (1.5): We had $n = 50$ and that the mean is $\bar{x} = 1034.5$. We also found that $\sum fx = 51725$, $\sum fx^2 = 70396725$, $\sum fx^3 = 111492128375$ and $\sum f(x - \bar{x})^3 = 3727073700$, $s = 587.06$, $x_{.5} = 965$ and $mo = 615$.

$k_3 = \frac{n}{(n-1)(n-2)}\left[\sum fx^3 - 3\bar{x}\sum fx^2 + 2n\bar{x}^3\right] = \frac{50}{(49)(48)}\left[111492128375 - 3(1034.5)(70396725) + 2(50)(1034.5)^3\right]$
$= 0.0212585\left[1.11492128375 \times 10^{11} - 2.1847624 \times 10^{11} + 1.1071118 \times 10^{11}\right] = 0.0212585(3727073700) = 79231996$

or $k_3 = \frac{n}{(n-1)(n-2)}\sum f(x - \bar{x})^3 = \frac{50}{(49)(48)}(3727073700) = 79232008.92$. The computer gets 79232009.

or $g_1 = \frac{k_3}{s^3} = \frac{79232009}{(587.06)^3} = 0.391614$

or Pearson's Measure of Skewness $SK_1 = \frac{\text{mean} - \text{mode}}{\text{std. deviation}} = \frac{1034.5 - 615}{587.06} = 0.7146$

or $SK_2 = \frac{3(\text{mean} - \text{median})}{\text{std. deviation}} = \frac{3(1034.5 - 965)}{587.06} = 0.3552$

Because of the positive sign, the measures all imply skewness to the right.

i. Make a frequency polygon of the data (Neatness Counts!) (1)
If we add two dummy groups to our data, we have the following.
Class   Salary in Thousands     Frequency   Midpoint
0       -260 to under   90         0          -85
1         90 to under  440         9          265
2        440 to under  790        11          615
3        790 to under 1140        10          965
4       1140 to under 1490         8         1315
5       1490 to under 1840         4         1580
6       1840 to under 2190         3         1845
7       2190 to under 2540         5         2110
8       2540 to under 2890         0         2460

Normally, a frequency polygon would require that we plot the midpoints on the x axis and the frequencies on the y axis using straight lines between each point. The graph would begin and end at a zero frequency. However, the most natural display here would let x run from zero to 2500 or 3000. At zero the value on the vertical axis would be about 2.19. (This, of course, is very approximate. The equation for a line going through (-85, 0) and (265, 9) would be $y = 2.18535 + 0.02571x$ and this would be 2.18535 at $x = 0$, but you don't need to know this to have an approximately correct intercept.)

j. Extra credit: Put a (horizontal) box plot below the frequency chart using the same horizontal scale (1)
The five-number summary is (265, 551.36, 965, 1468.125, 2460). $IQR = Q3 - Q1 = 1468.125 - 551.36 = 916.765$ and $1.5(IQR) = 1375.15$. If you use fences, they should be at $551.36 - 1375.15 = -823.79$ and $1468.125 + 1375.15 = 2843.28$. But these are beyond the range of the data, which makes them irrelevant. So the box extends from 551.36 to 1468.125, with the median marked by a vertical line at 965. The whiskers go from the box to 265 and 2460, with dotted lines showing the full range unnecessary. A rough picture is below.

(Rough horizontal box plot; horizontal axis marked 0, 500, 1000, 1500, 2000, 2500.)

2. Take your student number as a sample of size 6. Each digit will be a separate number. Change all zeroes to nines. For example, Seymour Butz's student number is 876509, so his numbers are 8, 7, 6, 5, 9 and 9. Find the following:
a) Geometric Mean
b) Harmonic mean
c) Root-mean-square
If you wish, d) Compute the geometric mean from a) using natural and/or base 10 logarithms. (1 point extra credit each).
Solution: Using Seymour's numbers, and being incredibly lazy, I ran most of this on Minitab.

----- 2/21/2007 5:44:04 PM -----
Welcome to Minitab, press F1 for help.
MTB > let c2 = loge(c1)
MTB > let c3 = logten (c1)
MTB > let c4 = 1/c1
MTB > let c5 = c1*c1
MTB > print c1 - c5

Data Display
Row   x   ln_x      log_x      1/x        xsq
1     8   2.07944   0.903090   0.125000   64
2     7   1.94591   0.845098   0.142857   49
3     6   1.79176   0.778151   0.166667   36
4     5   1.60944   0.698970   0.200000   25
5     9   2.19722   0.954243   0.111111   81
6     9   2.19722   0.954243   0.111111   81

MTB > sum c1
Sum of x
Sum of x = 44
MTB > sum c2
Sum of ln_x
Sum of ln_x = 11.8210
MTB > sum c3
Sum of log_x
Sum of log_x = 5.13379
MTB > sum c4
Sum of 1/x
Sum of 1/x = 0.856746
MTB > sum c5
Sum of xsq
Sum of xsq = 336

Solution: Using the original data, before I started, I computed the following table. My computations are thus as below.

Row   x    ln(x)      log(x)     1/x        x^2
1     8    2.07944    0.903090   0.125000    64
2     7    1.94591    0.845098   0.142857    49
3     6    1.79176    0.778151   0.166667    36
4     5    1.60944    0.698970   0.200000    25
5     9    2.19722    0.954243   0.111111    81
6     9    2.19722    0.954243   0.111111    81
Sum   44   11.82099   5.133795   0.856746   336

The arithmetic mean is $\frac{44}{6} = 7.333333$. Note that reasonable answers should all fall in the interval between the highest and lowest digit. In this case that would mean 5 to 9.

a) The Geometric Mean. The formula table says $x_g = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{\frac{1}{n}} = \sqrt[n]{\prod x}$. So $x_g = \sqrt[6]{(8)(7)(6)(5)(9)(9)} = \sqrt[6]{136080} = (136080)^{\frac{1}{6}} = (136080)^{0.1666667} = 7.17187$.

b) The Harmonic Mean. The formula table says $\frac{1}{x_h} = \frac{1}{n}\sum\frac{1}{x}$. So $\frac{1}{x_h} = \frac{1}{6}\left(\frac{1}{8} + \frac{1}{7} + \frac{1}{6} + \frac{1}{5} + \frac{1}{9} + \frac{1}{9}\right) = \frac{1}{6}(0.125000 + 0.142857 + 0.166667 + 0.200000 + 0.111111 + 0.111111) = \frac{1}{6}(0.856746) = 0.142791$. So $x_h = \frac{1}{\frac{1}{n}\sum\frac{1}{x}} = \frac{1}{0.142791} = 7.00324$.

Of course some of you decided that $\frac{1}{x_h} = \frac{1}{n}\sum\frac{1}{x} = \frac{1}{6}\left(\frac{1}{8} + \frac{1}{7} + \frac{1}{6} + \frac{1}{5} + \frac{1}{9} + \frac{1}{9}\right) \overset{?}{=} \frac{1}{6(8 + 7 + 6 + 5 + 9 + 9)}$ ???. This is, of course, an easier way to do the problem. It is also wrong, and you will get an A for the course if you can prove to me that it is not wrong!

c) The root-mean-square. The formula table says $x_{rms}^2 = \frac{1}{n}\sum x^2$ or $x_{rms} = \sqrt{\frac{1}{n}\sum x^2}$. So $x_{rms}^2 = \frac{1}{6}\left(8^2 + 7^2 + 6^2 + 5^2 + 9^2 + 9^2\right) = \frac{336}{6} = 56$. So $x_{rms} = \sqrt{56} = 7.48331$.

d) Geometric Mean. The formula table says $\ln(x_g) = \frac{1}{n}\sum\ln(x)$, but I said in class that this could be either natural logs or logs to the base 10.
Natural Logarithms. $\ln(x_g) = \frac{1}{6}(11.82099) = 1.970165$ and $x_g = e^{1.970165} = 7.171860$. There must be a substantial rounding error here.
Logarithms to the base 10. $\log(x_g) = \frac{1}{6}(5.133795) = 0.85563$ and $x_g = 10^{0.85563} = 7.17183$.
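If you want to check the three means for your own student number by machine, the following is a minimal Python sketch of the formulas used in a) through d). It is not part of the exam key: the function names `geometric_mean`, `harmonic_mean` and `root_mean_square` and the list `digits` are illustrative choices, and the sample is Seymour's numbers from the example above.

```python
import math


def geometric_mean(xs):
    # exp of the average natural log, as in part d)
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    # reciprocal of the average reciprocal, as in part b)
    return len(xs) / sum(1.0 / x for x in xs)


def root_mean_square(xs):
    # square root of the average square, as in part c)
    return math.sqrt(sum(x * x for x in xs) / len(xs))


# Seymour's digits (876509 with the zero changed to nine)
digits = [8, 7, 6, 5, 9, 9]

print("geometric mean   ", round(geometric_mean(digits), 5))
print("harmonic mean    ", round(harmonic_mean(digits), 5))
print("root-mean-square ", round(root_mean_square(digits), 5))
```

All three results should land between the lowest and highest digit (5 and 9 here) and should agree with the hand calculations above to about four decimal places. Recent versions of Python's standard `statistics` module also provide `geometric_mean` and `harmonic_mean`, which can serve as a second check.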