251y0411 2/27/04

ECO251 QBA1
FIRST HOUR EXAM
February 18, 2004

Name: ________KEY___________
Student Number and class: _____________________

Part I. (7 points)

Use the 5 3-digit numbers that you used in the second problem in the take-home exam. (If you don't have them, take the numbers (9, 9, 9, 9, 9, 9, 10, 12, 8, 1300) and replace the nines with your student number, changing any zeros in your student number to ones, then rewrite the resulting string of numbers as five three-digit numbers. Example: Seymour Butz's student number is 976500, so he gets 976511101281300 which, as three-digit numbers, is (976, 511, 101, 281, 300).) Compute the following:
a) The Median (1)
b) The Standard Deviation (3)
c) The 3rd Quintile (2)
d) The Coefficient of Variation (1)

Solution: The numbers in order are 101, 281, 300, 511, 976.

          x        x^2
x_1      101      10201
x_2      281      78961
x_3      300      90000
x_4      511     261121
x_5      976     952576
Total   2169    1392859

a) The middle number is 300.

b) $n = 5$, $\bar{x} = \frac{\sum x}{n} = \frac{2169}{5} = 433.80$,
$s^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1} = \frac{1392859 - 5(433.80)^2}{4} = \frac{451946.8}{4} = 112986.7$.
So $s = \sqrt{112986.7} = 336.1349$.

c) The position is $p(n+1) = .6(6) = 3.6$, so $a = 3$ and $.b = .6$.
$x_{1-p} = x_a + .b(x_{a+1} - x_a)$, so $x_{1-.6} = x_{.4} = x_3 + 0.6(x_4 - x_3) = 300 + 0.6(511 - 300) = 426.6$.

d) $C = \frac{s}{\bar{x}} = \frac{336.1349}{433.8} = 0.7749$

Part II. (At least 35 points – At least 2 points each)

1. Which of the following is a graph of the cumulative distribution?
a) *Ogive
b) Histogram
c) Frequency Polygon
d) Pie Chart
e) None of the above

2. Which of the following is an example of continuous ratio data?
a) The Likert Scale rates consumer impression of a product on a 1 to 5 scale with one best and 5 worst.
b) The Celsius scale for measuring temperature
c) The number of Brittany Spaniels entered in a dog show
d) *The number of dollars you paid in sales tax last year.

3. Cumulative relative frequency cannot be calculated by
a) *Taking the cumulative frequency for each class and dividing by the sum of cumulative frequencies for all classes.
b) The relative frequency for each class plus the sum of the relative frequencies of all previous classes.
c) The relative frequency of each class plus the cumulative relative frequency of the previous class
d) The cumulative frequency of each class divided by the sum of the frequencies for all classes.
e) All of the above can be used to calculate cumulative relative frequencies.

4. Consider the following formulas
(i) $\frac{\sum x^2 - n\bar{x}^2}{n-1}$
(ii) $\frac{k_3}{s^3}$
(iii) $\frac{3(\text{mean} - \text{mode})}{\text{std. deviation}}$
(iv) $\frac{n}{(n-1)(n-2)}\left[\sum x^3 - 3\bar{x}\sum x^2 + 2n\bar{x}^3\right]$
If the sample is skewed to the left, which of these should be positive?
a) *(i)
b) (ii)
c) (iii)
d) (iv)
e) None should be positive
f) All should be positive.
Answer: Any legitimate measure of skewness should be negative if the sample is skewed to the left. From your formula table, the measures of skewness are:
(iv) $k_3 = \frac{n}{(n-1)(n-2)}\sum (x - \bar{x})^3 = \frac{n}{(n-1)(n-2)}\left[\sum x^3 - 3\bar{x}\sum x^2 + 2n\bar{x}^3\right]$ - skewness,
(ii) $g_1 = \frac{k_3}{s^3}$ - relative skewness, and
(iii) $SK = \frac{3(\text{mean} - \text{mode})}{\text{std. deviation}}$ - Pearson's measure of skewness.
The other one, (i), is $s^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1} = \frac{\sum (x - \bar{x})^2}{n-1}$ - the sample variance, which is always positive and measures dispersion.

5. (D-68) From which of the following would it be easiest to calculate the interquartile range?
a) Mean, median and mode
b) Ogive
c) *Box Plot.
d) Stem-and-leaf plot

6. A summary measure that is computed to describe a characteristic of a sample is called
a) a parameter.
b) a census.
c) *a statistic.
d) the scientific method.

7. What is the difference between a field and a cell? Make a diagram of a table and show where one of these is.
Solution: A cell is a location in a field. The table that was handed out to you is below.
[Diagram of a statistical table with its parts labeled: Table Number, Title and Headnote at the top; the Boxhead, containing the Stub Head, Master Caption and Column Labels; the Stub, containing the Row Labels; the Field, containing the Cells; Footnotes and Source Note at the bottom.]

8.
If a distribution is skewed to the right, we would expect
a) mode > mean
b) *mode < median
c) median > mean
d) mode > median
Explanation: If it is skewed to the right the order should be 'mode, median, mean,' so the median is larger than the mode.

9. The estimation of the population average family expenditure on food based on the sample average expenditure of 1,000 families is an example of
a) *inferential statistics.
b) descriptive statistics.
c) a parameter.
d) a statistic.

10. Which of the following is most likely a parameter as opposed to a statistic?
a) the average score of the first five students completing an assignment
b) *the proportion of females registered to vote in a county
c) the average height of people randomly selected from a database
d) the proportion of trucks stopped yesterday that were cited for bad brakes

Duplicate question – ignore!!!
11. Which of the following is most likely a parameter as opposed to a statistic?
a) the average score of the first five students completing an assignment
b) the proportion of females registered to vote in a county
c) the average height of people randomly selected from a database
d) the proportion of trucks stopped yesterday that were cited for bad brakes

TABLE 2-2
At a meeting of information systems officers for regional offices of a national company, a survey was taken to determine the number of employees the officers supervise in the operation of their departments, where X is the number of employees overseen by each information systems officer.

X    f
1    7
2    5
3    11
4    8
5    9

12. Referring to Table 2-2, across all of the regional offices, how many total employees were supervised by those surveyed?
a) 15
b) 40
c) *127
d) 200
Explanation: $\sum fx = 7(1) + 5(2) + 11(3) + 8(4) + 9(5) = 7 + 10 + 33 + 32 + 45 = 127$.

TABLE 2-4
A survey was conducted to determine how people rated the quality of programming available on television. Respondents were asked to rate the overall quality from 0 (no quality at all) to 100 (extremely good quality).
The stem-and-leaf display of the data is shown below.

Stem   Leaves
3      24
4      03478999
5      0112345
6      12566
7      01
8
9      2

13. Referring to Table 2-4, what fraction of the respondents rated overall television quality with a rating of 80 or above?
a) 0.00
b) *0.04
c) 0.96
d) 1.00
Explanation: There are 25 numbers. The highest 3 are 70, 71 and 92. Only 92 is above 80. 1 out of 25 is $\frac{1}{25} = .04$.

14. (6 points) On the basis of 100 observations, Stock A has a mean rate of return of 7% with a standard deviation of 1%. Stock B has a mean rate of return of 9% and a standard deviation of 1.5%.
For stock A the fraction of observations between 4% and 10% must be at least ___88___%.
If returns on stock A have a symmetrical unimodal distribution, the fraction of observations between 4% and 10% must be approximately __99.7%_______.
According to what you have learned in class, which of these two stocks is riskier? You must show why for your answer to count.
Explanation: According to the Bienayme-Chebyshev rule (I called it Chebyshef's Inequality), $\frac{1}{k^2}$ is the largest possible proportion in the tails, where tails are defined as the points below $\mu - k\sigma$ and the points above $\mu + k\sigma$. Since, for stock A, $\mu = 7$ and $\sigma = 1$, 4 is $\mu - 3\sigma$ and 10 is $\mu + 3\sigma$, so $k = 3$. $\frac{1}{k^2} = \frac{1}{9}$ is the maximum proportion in the tails and $1 - \frac{1}{9} = \frac{8}{9} \approx 88.9\%$ is the minimum proportion in the center. According to the empirical rule, almost all or approximately 99.7% must be in the center ('99%,' '99.7%' or 'almost all' were accepted.) For Stock A, the coefficient of variation is $\frac{1}{7} = 0.14$ and for Stock B it is $\frac{1.5}{9} = 0.167$. Stock B has the larger coefficient of variation, so Stock B is riskier.

15. (3 points) A survey of 47 cities shows that the number of new AIDS cases reported last year varied from 135 to 1337. If these data are to be presented in 5 classes, what intervals would you use? Explain your reasoning using an appropriate formula and use it to fill in the table below.

Class   From   To
A
B
C
D
E

Use $\frac{1337 - 135}{5} = 240.4$. A possible interval above 240.4 might be 250 or 300.
We might use one of the following:

Class   From    To                 Class   From    To
A       100     under 350          A       0       under 300
B       350     under 600          B       300     under 600
C       600     under 850          C       600     under 900
D       850     under 1100         D       900     under 1200
E       1100    under 1350         E       1200    under 1500

16. Cities are divided into three classes

Class
A   At least 900 new AIDS cases
B   Less than 900 new AIDS cases
C   More than 30% of new AIDS cases also had a chronic contagious disease.

Which of the following classes are mutually exclusive? (Circle) (1.5)
A and B,   B and C,   A, B, and C
Which of the following classes are collectively exhaustive? (Circle) (1.5)
A and B,   B and C,   A, B, and C

251y0411 2/16/04

ECO251 QBA1
FIRST EXAM
February 18, 2004
TAKE HOME SECTION

Name: _________________________
Student Number: _________________________

Throughout this exam show your work! Please indicate clearly what sections of the problem you are answering and what formulas you are using. Turn this in with your in-class exam.

Part III. Do all the Following (11 Points) Show your work!

1. Look at the frequency distribution below. Replace the 9s with your student number. If any digit of your student number is 0, change it to a 1. For example, Seymour Butz's student number is 976500 so the frequencies he uses are (9, 7, 6, 5, 1, 1, 10, 12, 8).

Class               frequency
$500- 599.99        9
$600- 699.99        9
$700- 799.99        9
$800- 899.99        9
$900- 999.99        9
$1000-1099.99       9
$1100-1199.99       10
$1200-1299.99       12
$1300-1399.99       8

a. Calculate the Cumulative Frequency (0.5)
b. Calculate the Mean (0.5)
c. Calculate the Median (1)
d. Calculate the Mode (0.5)
e. Calculate the Variance (1.5)
f. Calculate the Standard Deviation (1)
g. Calculate the Interquartile Range (1.5)
h. Calculate a Statistic showing Skewness and Interpret it (1.5)
i. Make a histogram of the Data showing relative or percentage frequency (Neatness Counts!) (1)
j. Extra credit: Put a (horizontal) box plot below the histogram using the same scale. (1)

Assume that this data represents a sample of rents paid in Chester County.
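The grouped-data quantities requested in parts a through h can be checked numerically. The following is a minimal Python sketch, not part of the original exam, that uses Seymour Butz's example frequencies. It assumes class limits expressed in tens of dollars (50, 60, ..., 130), the same scaling the key's solution uses. The helper name `grouped_quantile` is hypothetical, and it locates the class from the target count $pn$ rather than from the position $p(n+1)$, which selects the same classes for these data.

```python
import math

# Seymour Butz's example frequencies (student number 976500, zeros changed to ones).
freqs = [9, 7, 6, 5, 1, 1, 10, 12, 8]
# Lower class limits and class width in tens of dollars (assumed scaling).
lowers = [50, 60, 70, 80, 90, 100, 110, 120, 130]
width = 10
mids = [lo + width / 2 for lo in lowers]  # class midpoints: 55, 65, ..., 135

n = sum(freqs)

# Cumulative frequency column F.
cum_freqs = []
running = 0
for f in freqs:
    running += f
    cum_freqs.append(running)

# Mean, sample variance (definitional formula) and standard deviation.
mean = sum(f * x for f, x in zip(freqs, mids)) / n
variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / (n - 1)
std_dev = math.sqrt(variance)


def grouped_quantile(p):
    """Interpolate within the class that contains the p*n-th observation."""
    target = p * n
    below = 0  # cumulative frequency of all earlier classes
    for lo, f in zip(lowers, freqs):
        if below + f >= target:
            return lo + (target - below) / f * width
        below += f
    raise ValueError("p must be between 0 and 1")


median = grouped_quantile(0.5)
iqr = grouped_quantile(0.75) - grouped_quantile(0.25)

# Skewness k3 from the third central moment; a negative value means left skew.
k3 = n / ((n - 1) * (n - 2)) * sum(f * (x - mean) ** 3 for f, x in zip(freqs, mids))

print(cum_freqs, mean, variance, std_dev, median, iqr, k3)
```

Because the midpoints are in tens of dollars, the mean, median, standard deviation and interquartile range would be multiplied by 10 to get dollars, the variance by 100, and $k_3$ by 1000.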
Solution: $x$ is the midpoint of the class. Our convention is to use the midpoint of 50 to 60, not of 50 to 59.999. Note also that the midpoints have been divided by 10. Most numbers should be multiplied by 10, the variance should be multiplied by 100 and $k_3$ by 1000. Calculations follow for both the computational and definitional formulas. (Don't do both.)

Computational columns:

     class          f    F     x     fx      fx^2       fx^3
A    50- 59.999     9    9    55    495     27225    1497375
B    60- 69.999     7   16    65    455     29575    1922375
C    70- 79.999     6   22    75    450     33750    2531250
D    80- 89.999     5   27    85    425     36125    3070625
E    90- 99.999     1   28    95     95      9025     857375
F   100-109.999     1   29   105    105     11025    1157625
G   110-119.999    10   39   115   1150    132250   15208750
H   120-129.999    12   51   125   1500    187500   23437500
I   130-139.999     8   59   135   1080    145800   19683000
Total              59              5755    612275   69365875

Definitional columns ($\bar{x} = 97.5424$):

     class         x - xbar   f(x - xbar)   f(x - xbar)^2   f(x - xbar)^3
A    50- 59.999    -42.5424     -382.881        16288.7         -692959
B    60- 69.999    -32.5424     -227.797         7413.0         -241238
C    70- 79.999    -22.5424     -135.254         3049.0          -68731
D    80- 89.999    -12.5424      -62.712          786.6           -9865
E    90- 99.999     -2.5424       -2.542            6.5             -16
F   100-109.999      7.4576        7.458           55.6             415
G   110-119.999     17.4576      174.576         3047.7           53205
H   120-129.999     27.4576      329.492         9047.1          248411
I   130-139.999     37.4576      299.661        11224.6          420447
Total                              0.001        50918.8         -290331

$n = \sum f = 59$, $\sum fx = 5755$, $\sum fx^2 = 612275$, $\sum fx^3 = 69365875$, $\sum f(x - \bar{x}) = 0$ (except for a rounding error), $\sum f(x - \bar{x})^2 = 50918.8$, and $\sum f(x - \bar{x})^3 = -290331$. Note that, to be reasonable, the mean, median and quartiles must fall between 50 and 140.

a. Calculate the Cumulative Frequency (1): (See above) The cumulative frequency is the whole $F$ column.

b. Calculate the Mean (1): $\bar{x} = \frac{\sum fx}{n} = \frac{5755}{59} = 97.5424$

c. Calculate the Median (2): position $= p(n+1) = .5(60) = 30$. This is above $F = 29$ and below $F = 39$, so the interval is G, 110-119.999 (in tens of dollars).
$x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right] w$, so $x_{1-.5} = x_{.5} = 110 + \left[\frac{.5(59) - 29}{10}\right] 10 = 110 + 0.5 = 110.5$

d. Calculate the Mode (1): The mode is the midpoint of the largest group. Since 12 is the largest frequency, the modal group is 120 to 129.99 and the mode is 125 (in tens of dollars).

e. Calculate the Variance (3):
$s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n-1} = \frac{612275 - 59(97.5424)^2}{58} = \frac{50918.332}{58} = 877.902$
or
$s^2 = \frac{\sum f(x - \bar{x})^2}{n-1} = \frac{50918.8}{58} = 877.910$.
The computer got 877.908.
f. Calculate the Standard Deviation (2): $s = \sqrt{877.9} = 29.629$.

g. Calculate the Interquartile Range (3):
First Quartile: position $= p(n+1) = .25(60) = 15$. This is above $F = 9$ and below $F = 16$, so the interval is B, 60-69.999. $x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right] w$ gives us, in tens of dollars,
$Q_1 = x_{1-.25} = x_{.75} = 60 + \left[\frac{.25(59) - 9}{7}\right] 10 = 68.214$.
Third Quartile: position $= p(n+1) = .75(60) = 45$. This is above $F = 39$ and below $F = 51$, so the interval is H, 120-129.999.
$Q_3 = x_{1-.75} = x_{.25} = 120 + \left[\frac{.75(59) - 39}{12}\right] 10 = 124.375$.
$IQR = Q_3 - Q_1 = 124.375 - 68.214 = 56.161$.

h. Calculate a Statistic showing Skewness and interpret it (3): We had $n = 59$, $\sum fx = 5755$, $\sum fx^2 = 612275$, $\sum fx^3 = 69365875$, and $\sum f(x - \bar{x})^3 = -290331$.
$k_3 = \frac{n}{(n-1)(n-2)}\left[\sum fx^3 - 3\bar{x}\sum fx^2 + 2n\bar{x}^3\right] = \frac{59}{(58)(57)}\left[69365875 - 3(97.5424)(612275) + 2(59)(97.5424)^3\right]$
$= 0.017846\left[69365875 - 179168319 + 109512153\right] = 0.017846(-290291) = -5181$
or
$k_3 = \frac{n}{(n-1)(n-2)}\sum f(x - \bar{x})^3 = \frac{59}{(58)(57)}(-290331) = -5181$. The computer gets -5181.37.
or
$g_1 = \frac{k_3}{s^3} = \frac{-5181}{(29.629)^3} = -0.19912$
or Pearson's Measure of Skewness
$SK = \frac{3(\text{mean} - \text{mode})}{\text{std. deviation}} = \frac{3(97.5424 - 125)}{29.629} = -2.780$
Because of the negative sign, the measures imply skewness to the left.

i. A histogram is a simple bar graph with relative frequency on the y-axis and the class limits ($500 to $1400) on the x-axis. The data Seymour showed is:

     class          f     f_rel
A    50- 59.999     9     .1525
B    60- 69.999     7     .1186
C    70- 79.999     6     .1017
D    80- 89.999     5     .0847
E    90- 99.999     1     .0169
F   100-109.999     1     .0169
G   110-119.999    10     .1695
H   120-129.999    12     .2034
I   130-139.999     8     .1356
Total              59     .9998

Each number in the $f_{rel}$ column is the corresponding number in the $f$ column divided by $n = 59$. The y-axis could be marked from zero to 0.25.

j. The box plot should show the median and the quartiles and use the same x-axis as the histogram.

2. Use the frequencies you used in problem 1 in this problem as values of $x$. Add 1300 at the end. Write the result in clumps of 3 digits. Example: In the last problem, Seymour Butz used (9, 7, 6, 5, 1, 1, 10, 12, 8). If we add 1300 at the end, we have 976511101281300. In 3-digit clumps this gives him (976, 511, 101, 281, 300).
For these five numbers, compute the a) Geometric Mean, b) Harmonic Mean, c) Root-Mean-Square (1 point each). Label each clearly. If you wish, d) compute the geometric mean using natural or base 10 logarithms (1 point extra credit each). While you're at it, compute the sample mean and bring it to the exam (no credit, but it won't hurt).

Solution: Note that Seymour found $\sum x = 2169$. This is not used in any of the following calculations and there is no reason why you should have computed it!

a) The Geometric Mean.
$x_g = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n} = \sqrt[n]{\prod x} = \sqrt[5]{(976)(511)(101)(281)(300)} = \sqrt[5]{4.2463879 \times 10^{12}} = (4.2463879 \times 10^{12})^{0.20} = 335.432$.

b) The Harmonic Mean.
$\frac{1}{x_h} = \frac{1}{n}\sum \frac{1}{x} = \frac{1}{5}\left(\frac{1}{976} + \frac{1}{511} + \frac{1}{101} + \frac{1}{281} + \frac{1}{300}\right)$
$= \frac{1}{5}(0.00102459 + 0.00195695 + 0.00990099 + 0.00355872 + 0.00333333) = \frac{1}{5}(0.019774583) = 0.00395492$.
So $x_h = \frac{1}{\frac{1}{n}\sum \frac{1}{x}} = \frac{1}{0.00395492} = 252.850$.

c) The Root-Mean-Square.
$x_{rms}^2 = \frac{1}{n}\sum x^2 = \frac{1}{5}\left(976^2 + 511^2 + 101^2 + 281^2 + 300^2\right) = \frac{1}{5}(952576 + 261121 + 10201 + 78961 + 90000) = \frac{1}{5}(1392859) = 278571.8$.
So $x_{rms} = \sqrt{\frac{1}{n}\sum x^2} = \sqrt{278571.8} = 527.799$.

d) (i) Geometric mean using natural logarithms
$\ln(x_g) = \frac{1}{n}\sum \ln(x) = \frac{1}{5}(\ln 976 + \ln 511 + \ln 101 + \ln 281 + \ln 300)$
$= \frac{1}{5}(6.88346 + 6.23637 + 4.61512 + 5.63835 + 5.70378) = \frac{1}{5}(29.07709) = 5.81542$.
So $x_g = e^{5.81542} = 335.432$.

(ii) Geometric mean using logarithms to the base 10
$\log(x_g) = \frac{1}{n}\sum \log(x) = \frac{1}{5}(\log 976 + \log 511 + \log 101 + \log 281 + \log 300)$
$= \frac{1}{5}(2.98945 + 2.70842 + 2.00432 + 2.44871 + 2.47712) = \frac{1}{5}(12.62801) = 2.52560$.
So $x_g = 10^{2.52560} = 335.432$.

Notice that the original numbers and all the means are between 101 and 976.
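
The three means above can be reproduced in a few lines. This is a minimal Python sketch, not part of the original key, using Seymour Butz's five numbers; the variable names are illustrative, and `math.prod` assumes Python 3.8 or later.

```python
import math

# Seymour Butz's five three-digit numbers from the example.
xs = [976, 511, 101, 281, 300]
n = len(xs)

# a) Geometric mean: n-th root of the product of the values.
geometric = math.prod(xs) ** (1 / n)

# b) Harmonic mean: reciprocal of the mean of the reciprocals.
harmonic = n / sum(1 / x for x in xs)

# c) Root-mean-square: square root of the mean of the squares.
rms = math.sqrt(sum(x * x for x in xs) / n)

# d) Geometric mean again, via natural and base-10 logarithms.
geometric_ln = math.exp(sum(math.log(x) for x in xs) / n)
geometric_log10 = 10 ** (sum(math.log10(x) for x in xs) / n)

# Ordinary sample mean, for comparison only.
mean = sum(xs) / n

print(harmonic, geometric, mean, rms)
```

For positive data that are not all equal, the four results always come out in the order harmonic < geometric < arithmetic < root-mean-square, which is a quick sanity check on hand calculations like the ones above.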