Solutions Manual to Accompany
Essentials of Modern Business Statistics with Microsoft Excel, Eighth Edition
David R. Anderson, University of Cincinnati
Dennis J. Sweeney, University of Cincinnati
Thomas A. Williams, Rochester Institute of Technology
South-Western, Cincinnati, Ohio

Contents
Preface
Chapter
1. Data and Statistics
2. Descriptive Statistics: Tabular and Graphical Methods
3. Descriptive Statistics: Numerical Methods
4. Introduction to Probability
5. Discrete Probability Distributions
6. Continuous Probability Distributions
7. Sampling and Sampling Distributions
8. Interval Estimation
9. Hypothesis Testing
10. Comparisons Involving Means
11. Comparisons Involving Proportions and A Test of Independence
12. Simple Linear Regression
13. Multiple Regression
14. Statistical Methods for Quality Control

Preface
The purpose of Essentials of Modern Business Statistics with Microsoft Excel is to provide students, primarily in the fields of business administration and economics, with a sound conceptual introduction to the field of statistics and its many applications. The text is applications-oriented and has been written with the needs of the nonmathematical reader in mind.
The solutions manual furnishes assistance by identifying learning objectives and providing detailed solutions for all exercises in the text.
Note: The solutions to the case problems are included in a separate manual.

Acknowledgements
We would like to provide special recognition to Catherine J. Williams for her efforts in preparing the solutions manual.
David R. Anderson
Dennis J. Sweeney
Thomas A. Williams

Chapter 1
Data and Statistics

Learning Objectives
1. Obtain an appreciation for the breadth of statistical applications in business and economics.
2. Understand the meaning of the terms elements, variables, and observations as they are used in statistics.
3.
Understand that data are obtained using one of the following scales of measurement: nominal, ordinal, interval, and ratio.
4. Obtain an understanding of the difference between qualitative, quantitative, cross-sectional, and time series data.
5. Learn about the sources of data for statistical analysis, both internal and external to the firm.
6. Be aware of how errors can arise in data.
7. Know the meaning of descriptive statistics and statistical inference.
8. Be able to distinguish between a population and a sample.
9. Understand the role a sample plays in making statistical inferences about the population.

Solutions:

1. Statistics can be referred to as numerical facts. In a broader sense, statistics is the field of study dealing with the collection, analysis, presentation, and interpretation of data.

2. a. 9
   b. 4
   c. Country and room rate are qualitative variables; number of rooms and the overall score are quantitative variables.
   d. Country is nominal; room rate is ordinal; number of rooms is ratio; and overall score is interval.

3. a. Average number of rooms = 808/9 = 89.78, or approximately 90 rooms
   b. 2 of 9 are located in England; approximately 22%
   c. 4 of 9 have a room rate of $$; approximately 44%

4. a. 10
   b. All brands and models of minisystems manufactured.
   c. Average price = 3140/10 = $314
   d. $314

5. a. 5
   b. Price, CD capacity, and the number of tape decks are quantitative. Sound quality and FM tuning sensitivity and selectivity are qualitative.
   c. Average CD capacity = 30/10 = 3
   d. (7/10)(100) = 70%
   e. (4/10)(100) = 40%

6. Questions a, c, and d are quantitative. Questions b and e are qualitative.

7. a. The variable is qualitative.
   b. Nominal, with four labels or categories.

8. a. 1005
   b. Qualitative
   c. Percentages
   d. .29(1005) = 291.45, or approximately 291

9. a. Qualitative
   b. 30 of 71; 42.3%

10. a. Quantitative; ratio
   b. Qualitative; nominal
   c. Qualitative (Note: Rank is a numeric label that identifies the position of a student in the class.
Rank does not indicate how much or how many and is not quantitative.); ordinal
   d. Qualitative; nominal
   e. Quantitative; ratio

11. a. Quantitative; ratio
   b. Qualitative; ordinal
   c. Qualitative; ordinal (assuming employees can be ranked by classification)
   d. Quantitative; ratio
   e. Qualitative; nominal

12. a. The population is all visitors coming to the state of Hawaii.
   b. Since airline flights carry the vast majority of visitors to the state, the use of questionnaires for passengers during incoming flights is a good way to reach this population. The questionnaire actually appears on the back of a mandatory plants and animals declaration form that passengers must complete during the incoming flight. A large percentage of passengers complete the visitor information questionnaire.
   c. Questions 1 and 4 provide quantitative data indicating the number of visits and the number of days in Hawaii. Questions 2 and 3 provide qualitative data indicating the categories of reason for the trip and where the visitor plans to stay.

13. a. Quantitative: earnings are measured in billions of dollars.
   b. Time series with 6 observations
   c. Volkswagen's annual earnings
   d. The time series shows an increase in earnings. An increase would be expected in 2003, but it appears that the rate of increase is slowing.

14. a. Type of music is a qualitative variable.
   b. The graph, based on time series data, is shown below.
   [Line graph: Percentage of Music Sales by Year, 1995-2001]
   c. The bar graph, based on cross-sectional data, is shown below.
   [Bar graph: % of Music Sales in 1998 by Type of Music]

15. Cross-sectional data. The data were collected at the same, or approximately the same, point in time.

16. a. We would like to see data from product taste tests and test marketing the product.
   b. Such data would be obtained from specially designed statistical studies.

17.
Internal data on salaries of other employees can be obtained from the personnel department. External data might be obtained from the Department of Labor or industry associations.

18. a. (48/120)100% = 40% in the sample died from some form of heart disease. This can be used as an estimate of the percentage of all males 60 or older who die of heart disease.
   b. The data on cause of death are qualitative.

19. a. All subscribers of Business Week at the time the 1996 survey was conducted.
   b. Quantitative
   c. Qualitative (yes or no)
   d. Cross-sectional: 1996 was the time of the survey.
   e. Using the sample results, we could infer or estimate that 59% of the population of subscribers have an annual income of $75,000 or more and 50% of the population of subscribers have an American Express credit card.

20. a. 56% of the market belonged to A.C. Nielsen.
   b. 3.73
   c. $387,325 is the average amount spent per category.

21. a. The two populations are the population of women whose mothers took the drug DES during pregnancy and the population of women whose mothers did not take the drug DES during pregnancy.
   b. It was a survey.
   c. 63/3980 = .0158, so 15.8 women out of each 1000 developed tissue abnormalities.
   d. The article reported "twice" as many abnormalities in the women whose mothers had taken DES during pregnancy. Thus, a rough estimate would be 15.8/2 = 7.9 abnormalities per 1000 women whose mothers had not taken DES during pregnancy.
   e. In many situations, disease occurrences are rare and affect only a small portion of the population. Large samples are needed to collect data on a reasonable number of cases where the disease exists.

22. a. All adult viewers reached by the Denver, Colorado, television station.
   b. The viewers contacted in the telephone survey.
   c. A sample. It would clearly be too costly and time-consuming to try to contact all viewers.

23. a. Percent of television sets that were tuned to a particular television show and/or total viewing audience.
   b.
All television sets in the United States that are available for the viewing audience. Note that this would not include television sets in store displays.
   c. A portion of these television sets. Generally, individual households would be contacted to determine which programs were being viewed.
   d. The cancellation of programs, the scheduling of programs, and advertising cost rates.

24. a. This is a statistically correct descriptive statistic for the sample.
   b. An incorrect generalization, since the data were not collected for the entire population.
   c. An acceptable statistical inference based on the use of the word "estimate."
   d. While this statement is true for the sample, it is not a justifiable conclusion for the entire population.
   e. This statement is not statistically supportable. While it is true for the particular sample observed, it is entirely possible and even very likely that at least some students will be outside the 65 to 90 range of grades.

Chapter 2
Descriptive Statistics: Tabular and Graphical Methods

Learning Objectives
1. Learn how to construct and interpret summarization procedures for qualitative data such as frequency and relative frequency distributions, bar graphs, and pie charts. Be able to use Excel's COUNTIF function to construct a frequency distribution and the Chart Wizard to construct a bar graph and pie chart.
2. Learn how to construct and interpret tabular summarization procedures for quantitative data such as frequency and relative frequency distributions, cumulative frequency, and cumulative relative frequency distributions. Be able to use Excel's FREQUENCY function to construct a frequency distribution and the Chart Wizard to construct a histogram.
3. Learn how to construct a histogram and an ogive as graphical summaries of quantitative data.
4. Be able to use and interpret the exploratory data analysis technique of a stem-and-leaf display.
5.
Learn how to construct and interpret cross tabulations and scatter diagrams of bivariate data. Be able to use Excel's PivotTable report to construct a cross tabulation and the Chart Wizard to construct a scatter diagram.

Solutions:

1.
Class | Frequency | Relative Frequency
A | 60 | 60/120 = 0.50
B | 24 | 24/120 = 0.20
C | 36 | 36/120 = 0.30
Total | 120 | 1.00

2. a. 1 - (.22 + .18 + .40) = .20
   b. .20(200) = 40
   c/d.
Class | Frequency | Percent Frequency
A | .22(200) = 44 | 22
B | .18(200) = 36 | 18
C | .40(200) = 80 | 40
D | .20(200) = 40 | 20
Total | 200 | 100

3. a. 360° × 58/120 = 174°
   b. 360° × 42/120 = 126°
   c. [Pie chart: Yes 48.3%, No 35%, No Opinion 16.7%]
   d. [Bar graph: Frequency by Response (Yes, No, No Opinion)]

4. a. The data are qualitative.
   b.
TV Show | Frequency | Percent Frequency
Millionaire | 24 | 48
Frasier | 15 | 30
Chicago Hope | 7 | 14
Charmed | 4 | 8
Total | 50 | 100
   c. [Bar graph: Frequency by TV Show]
   [Pie chart: Millionaire 48%, Frasier 30%, Chicago Hope 14%, Charmed 8%]
   d. Millionaire has the largest market share. Frasier is second.

5. a.
Name | Frequency | Relative Frequency | Percent Frequency
Brown | 7 | .14 | 14%
Davis | 6 | .12 | 12%
Johnson | 10 | .20 | 20%
Jones | 7 | .14 | 14%
Smith | 12 | .24 | 24%
Williams | 8 | .16 | 16%
Total | 50 | 1.00 | 100%
   b. [Bar graph: Frequency by Name]
   c. Pie chart sector angles:
Brown | .14 × 360 = 50.4°
Davis | .12 × 360 = 43.2°
Johnson | .20 × 360 = 72.0°
Jones | .14 × 360 = 50.4°
Smith | .24 × 360 = 86.4°
Williams | .16 × 360 = 57.6°
   [Pie chart: Brown 14%, Davis 12%, Johnson 20%, Jones 14%, Smith 24%, Williams 16%]
   d. Most common: Smith, Johnson, and Williams.

6. a.
Book | Frequency | Percent Frequency
7 Habits | 10 | 16.67
Millionaire | 16 | 26.67
Motley | 9 | 15.00
Dad | 13 | 21.67
WSJ Guide | 6 | 10.00
Other | 6 | 10.00
Total | 60 | 100.00
The Ernst & Young Tax Guide 2000 with a frequency of 3, Investing for Dummies with a frequency of 2, and What Color Is Your Parachute? 2000 with a frequency of 1 are grouped in the "Other" category.
   b. The rank order from first to fifth is: Millionaire, Dad, 7 Habits, Motley, and WSJ Guide.
   c.
The percent of sales represented by The Millionaire Next Door and Rich Dad, Poor Dad is (16 + 13)/60 = 48.33%.

7.
Rating | Frequency | Relative Frequency
Outstanding | 19 | 0.38
Very Good | 13 | 0.26
Good | 10 | 0.20
Average | 6 | 0.12
Poor | 2 | 0.04
Total | 50 | 1.00
Management should be pleased with these results: 64% of the ratings are very good to outstanding, and 84% of the ratings are good or better. Comparing these ratings with previous results will show whether or not the restaurant is making improvements in its ratings of food quality.

8. a.
Position | Frequency | Relative Frequency
Pitcher | 17 | 0.309
Catcher | 4 | 0.073
1st Base | 5 | 0.091
2nd Base | 4 | 0.073
3rd Base | 2 | 0.036
Shortstop | 5 | 0.091
Left Field | 6 | 0.109
Center Field | 5 | 0.091
Right Field | 7 | 0.127
Total | 55 | 1.000
   b. Pitchers (almost 31%)
   c. 3rd Base (3-4%)
   d. Right Field (almost 13%)
   e. Infielders (16, or 29.1%) to Outfielders (18, or 32.7%)

9. a/b.
Starting Time | Frequency | Percent Frequency
7:00 | 3 | 15
7:30 | 4 | 20
8:00 | 4 | 20
8:30 | 7 | 35
9:00 | 2 | 10
Total | 20 | 100
   c. [Bar graph: Frequency by Starting Time]
   d. [Pie chart: 7:00 15%, 7:30 20%, 8:00 20%, 8:30 35%, 9:00 10%]
   e. The most preferred starting time is 8:30 a.m. Starting times of 7:30 and 8:00 a.m. are next.

10. a. The data refer to quality levels from 1, "Not at all Satisfied," to 7, "Extremely Satisfied."
   b.
Rating | Frequency | Relative Frequency
3 | 2 | 0.03
4 | 4 | 0.07
5 | 12 | 0.20
6 | 24 | 0.40
7 | 18 | 0.30
Total | 60 | 1.00
   c. [Bar graph: Frequency by Rating]
   d. The survey data indicate a high quality of service by the financial consultant. The most common ratings are 6 and 7 (70%), where 7 is "Extremely Satisfied." Only 2 ratings are below the middle scale value of 4. There are no "Not at all Satisfied" ratings.

11.
Class | Frequency | Relative Frequency | Percent Frequency
12-14 | 2 | 0.050 | 5.0
15-17 | 8 | 0.200 | 20.0
18-20 | 11 | 0.275 | 27.5
21-23 | 10 | 0.250 | 25.0
24-26 | 9 | 0.225 | 22.5
Total | 40 | 1.000 | 100.0

12.
Class | Cumulative Frequency | Cumulative Relative Frequency
Less than or equal to 19 | 10 | .20
Less than or equal to 29 | 24 | .48
Less than or equal to 39 | 41 | .82
Less than or equal to 49 | 48 | .96
Less than or equal to 59 | 50 | 1.00

13. [Histogram: Frequency by class 10-19, 20-29, 30-39, 40-49, 50-59]
   [Ogive: Cumulative relative frequency (0 to 1.0)]

14. a/b.
Class | Frequency | Percent Frequency
6.0-7.9 | 4 | 20
8.0-9.9 | 2 | 10
10.0-11.9 | 8 | 40
12.0-13.9 | 3 | 15
14.0-15.9 | 3 | 15
Total | 20 | 100

15. a/b.
Waiting Time | Frequency | Relative Frequency
0-4 | 4 | 0.20
5-9 | 8 | 0.40
10-14 | 5 | 0.25
15-19 | 2 | 0.10
20-24 | 1 | 0.05
Total | 20 | 1.00
   c/d.
Waiting Time | Cumulative Frequency | Cumulative Relative Frequency
Less than or equal to 4 | 4 | 0.20
Less than or equal to 9 | 12 | 0.60
Less than or equal to 14 | 17 | 0.85
Less than or equal to 19 | 19 | 0.95
Less than or equal to 24 | 20 | 1.00
   e. 12/20 = 0.60

16. a.
Stock Price ($) | Frequency | Relative Frequency | Percent Frequency
10.00-19.99 | 10 | 0.40 | 40
20.00-29.99 | 4 | 0.16 | 16
30.00-39.99 | 6 | 0.24 | 24
40.00-49.99 | 2 | 0.08 | 8
50.00-59.99 | 1 | 0.04 | 4
60.00-69.99 | 2 | 0.08 | 8
Total | 25 | 1.00 | 100
   [Histogram: Frequency by Stock Price]
   Many of these are low-priced stocks, with the greatest frequency in the $10.00 to $19.99 range.
   b.
Earnings per Share ($) | Frequency | Relative Frequency | Percent Frequency
-3.00 to -2.01 | 2 | 0.08 | 8
-2.00 to -1.01 | 0 | 0.00 | 0
-1.00 to -0.01 | 2 | 0.08 | 8
0.00 to 0.99 | 9 | 0.36 | 36
1.00 to 1.99 | 9 | 0.36 | 36
2.00 to 2.99 | 3 | 0.12 | 12
Total | 25 | 1.00 | 100
   [Histogram: Frequency by Earnings per Share]
   The majority of companies had earnings in the $0.00 to $2.00 range. Four of the companies lost money.

17. a.
Amount ($) | Frequency | Relative Frequency
0-99 | 5 | .20
100-199 | 5 | .20
200-299 | 8 | .32
300-399 | 4 | .16
400-499 | 3 | .12
Total | 25 | 1.00
   b. [Histogram: Frequency by Amount ($)]
   c. The largest group spends $200-$300 per year on books and magazines. There are more in the $0 to $200 range than in the $300 to $500 range.

18. a. Lowest salary: $93,000; highest salary: $178,000
   b.
Salary ($1000s) | Frequency | Relative Frequency | Percent Frequency
91-105 | 4 | 0.08 | 8
106-120 | 5 | 0.10 | 10
121-135 | 11 | 0.22 | 22
136-150 | 18 | 0.36 | 36
151-165 | 9 | 0.18 | 18
166-180 | 3 | 0.06 | 6
Total | 50 | 1.00 | 100
   c. Proportion $135,000 or less: 20/50 = .40
   d. Percentage more than $150,000: 24%
   e. [Histogram: Frequency by Salary ($1000s)]

19. a/b.
Number | Frequency | Relative Frequency
140-149 | 2 | 0.10
150-159 | 7 | 0.35
160-169 | 3 | 0.15
170-179 | 6 | 0.30
180-189 | 1 | 0.05
190-199 | 1 | 0.05
Total | 20 | 1.00
   c/d.
Number | Cumulative Frequency | Cumulative Relative Frequency
Less than or equal to 149 | 2 | 0.10
Less than or equal to 159 | 9 | 0.45
Less than or equal to 169 | 12 | 0.60
Less than or equal to 179 | 18 | 0.90
Less than or equal to 189 | 19 | 0.95
Less than or equal to 199 | 20 | 1.00
   e. [Ogive: Cumulative frequency (0 to 20) versus Number (140 to 200)]

20. a. The percentage of people 34 or less is 20.0 + 5.7 + 9.6 + 13.6 = 48.9.
   b. The percentage of the population over 34 years old is 16.3 + 13.5 + 8.7 + 12.6 = 51.1.
   c. The percentage of the population that is between 25 and 54 years old, inclusive, is 13.6 + 16.3 + 13.5 = 43.4.
   d. The percentage less than 25 years old is 20.0 + 5.7 + 9.6 = 35.3. So there are (.353)(275) = 97.075 million people less than 25 years old.
   e. An estimate of the number of retired people is (.5)(.087)(275) + (.126)(275) = 46.6125 million.

21. a/b.
Computer Usage (Hours) | Frequency | Relative Frequency
0.0-2.9 | 5 | 0.10
3.0-5.9 | 28 | 0.56
6.0-8.9 | 8 | 0.16
9.0-11.9 | 6 | 0.12
12.0-14.9 | 3 | 0.06
Total | 50 | 1.00
   c. [Histogram: Frequency by Computer Usage (Hours)]
   d. [Ogive: Cumulative frequency versus Computer Usage (Hours), 3 to 15]
   e. The majority of the computer users are in the 3 to 6 hour range.
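The relative and cumulative columns in a table like the one for part (a/b) follow mechanically from the class counts. A minimal Python sketch, assuming only the five computer-usage class counts shown above (the helper name `frequency_table` is hypothetical, not something from the text), rebuilds them:

```python
# Hypothetical helper: rebuild a frequency table from class counts.
# The counts are those in the Exercise 21 computer-usage table (hours).
classes = ["0.0-2.9", "3.0-5.9", "6.0-8.9", "9.0-11.9", "12.0-14.9"]
counts = [5, 28, 8, 6, 3]


def frequency_table(labels, freqs):
    """Return (label, frequency, relative, cumulative, cumulative relative) rows."""
    n = sum(freqs)
    rows = []
    running = 0
    for label, f in zip(labels, freqs):
        running += f  # cumulative frequency up to and including this class
        rows.append((label, f, f / n, running, running / n))
    return rows


rows = frequency_table(classes, counts)
for label, f, rel, cum, cum_rel in rows:
    print(f"{label:>9}  {f:3d}  {rel:4.2f}  {cum:3d}  {cum_rel:4.2f}")
```

The cumulative relative frequency climbs from 0.10 to 0.66 across the 3.0-5.9 class, which is the numerical basis for the statement in part (e).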
Usage is somewhat skewed toward the right, with 3 users in the 12 to 15 hour range.

22.
5 | 7 8
6 | 4 5 8
7 | 0 2 2 5 5 6 8
8 | 0 2 3 5

23. Leaf unit = 0.1
6 | 3
7 | 5 5 7
8 | 1 3 4 8
9 | 3 6
10 | 0 4 5
11 | 3

24. Leaf unit = 10
11 | 6
12 | 0 2
13 | 0 6 7
14 | 2 2 7
15 | 5
16 | 0 2 8
17 | 0 2 3

25.
9 | 8 9
10 | 2 4 6 6
11 | 4 5 7 8 8 9
12 | 2 4 5 7
13 | 1 2
14 | 4
15 | 1

26. Leaf unit = 0.1
0 | 4 7 8 9 9
1 | 1 2 9
2 | 0 0 1 3 5 5 6 8
3 | 4 9
4 | 8
5 |
6 |
7 | 1

27.
4 | 1 3 6 6 7
5 | 0 0 3 8 9
6 | 0 1 1 4 4 5 7 7 9 9
7 | 0 0 0 1 3 4 4 5 5 6 6 6 7 8 8
8 | 0 1 1 3 4 4 5 7 7 8 9
9 | 0 2 2 7
or
4 | 1 3
4 | 6 6 7
5 | 0 0 3
5 | 8 9
6 | 0 1 1 4 4
6 | 5 7 7 9 9
7 | 0 0 0 1 3 4 4
7 | 5 5 6 6 6 7 8 8
8 | 0 1 1 3 4 4
8 | 5 7 7 8 9
9 | 0 2 2
9 | 7

28. a.
0 | 5 8
1 | 1 1 3 3 4 4
1 | 5 6 7 8 9 9
2 | 2 3 3 3 5 5
2 | 6 8
3 |
3 | 6 7 7 9
4 | 0
4 | 7 8
5 |
5 |
6 | 0
   b.
2000 P/E Forecast | Frequency | Percent Frequency
5-9 | 2 | 6.7
10-14 | 6 | 20.0
15-19 | 6 | 20.0
20-24 | 6 | 20.0
25-29 | 2 | 6.7
30-34 | 0 | 0.0
35-39 | 4 | 13.3
40-44 | 1 | 3.3
45-49 | 2 | 6.7
50-54 | 0 | 0.0
55-59 | 0 | 0.0
60-64 | 1 | 3.3
Total | 30 | 100.0

29. a.
x \ y | 1 | 2 | Total
A | 5 | 0 | 5
B | 11 | 2 | 13
C | 2 | 10 | 12
Total | 18 | 12 | 30
   b. Row percentages
x \ y | 1 | 2 | Total
A | 100.0 | 0.0 | 100.0
B | 84.6 | 15.4 | 100.0
C | 16.7 | 83.3 | 100.0
   c. Column percentages
x \ y | 1 | 2
A | 27.8 | 0.0
B | 61.1 | 16.7
C | 11.1 | 83.3
Total | 100.0 | 100.0
   d. Category A values for x are always associated with category 1 values for y. Category B values for x are usually associated with category 1 values for y. Category C values for x are usually associated with category 2 values for y.

30. a. [Scatter diagram: y (-40 to 56) versus x (-40 to 40)]
   b. There is a negative relationship between x and y; y decreases as x increases.

31. Column percentages:
Quality Rating | Meal Price $10-19 | $20-29 | $30-39 | $40-49
Good | 53.8 | 33.9 | 2.7 | 0.0
Very Good | 43.6 | 54.2 | 60.5 | 21.4
Excellent | 2.6 | 11.9 | 36.8 | 78.6
Total | 100.0 | 100.0 | 100.0 | 100.0
As the meal price goes up, the percentage of high quality ratings goes up. A positive relationship between meal price and quality is observed.

32. a.
Sales/Margins/ROE | EPS Rating 0-19 | 20-39 | 40-59 | 60-79 | 80-100 | Total
A | | | | 1 | 8 | 9
B | 1 | | 4 | 5 | 2 | 12
C | 1 | 1 | | 2 | 3 | 7
D | 3 | 1 | | 1 | | 5
E | 1 | 2 | | | | 3
Total | 6 | 4 | 4 | 9 | 13 | 36
   b. Row percentages
Sales/Margins/ROE | EPS Rating 0-19 | 20-39 | 40-59 | 60-79 | 80-100 | Total
A | | | | 11.11 | 88.89 | 100
B | 8.33 | | 33.33 | 41.67 | 16.67 | 100
C | 14.29 | 14.29 | | 28.57 | 42.86 | 100
D | 60.00 | 20.00 | | 20.00 | | 100
E | 33.33 | 66.67 | | | | 100
Higher EPS ratings seem to be associated with higher ratings on Sales/Margins/ROE. Of those companies with an "A" rating on Sales/Margins/ROE, 88.89% had an EPS rating of 80 or higher. Of the 8 companies with a "D" or "E" rating on Sales/Margins/ROE, only 1 had an EPS rating above 60.

33. a. Sales/Margins/ROE A B C D E Total A 1 1 1 1 4 Industry Group Relative Strength B C D 2 2 4 5 2 3 3 2 1 1 1 2 11 7 10 E Total 9 12 7 5 3 36 1 1 2 4
   b/c. The frequency distribution for the Sales/Margins/ROE data is in the rightmost column of the crosstabulation. The frequency distribution for the Industry Group Relative Strength data is in the bottom row of the crosstabulation.
   d. Once the crosstabulation is complete, the individual frequency distributions are available in the margins.

34. a. [Scatter diagram: Relative Price Strength (0 to 80) versus EPS Rating (0 to 120)]
   b. One might expect stocks with higher EPS ratings to show greater relative price strength. However, the scatter diagram using these data does not support such a relationship. The scatter diagram appears similar to the one showing "No Apparent Relationship" in Figure 2.19.

35. a. The crosstabulation is shown below:
Count of Observation | Speed 4-4.5 | 4.5-5 | 5-5.5 | 5.5-6 | Grand Total
Guard | | | 12 | 1 | 13
Offensive tackle | | 2 | 7 | 3 | 12
Wide receiver | 6 | 9 | | | 15
Grand Total | 6 | 11 | 19 | 4 | 40
   b. There appears to be a relationship between Position and Speed; wide receivers had faster speeds than offensive tackles and guards.
   c. [Scatter diagram: Rating (4 to 10) versus Speed (4 to 6)]
   d.
There appears to be a relationship between Speed and Rating; slower speeds appear to be associated with lower ratings. In other words, prospects with faster speeds tend to be rated higher than prospects with slower speeds.

36. a.
Vehicle | Frequency | Percent Frequency
F-Series | 17 | 34
Silverado | 12 | 24
Taurus | 8 | 16
Camry | 7 | 14
Accord | 6 | 12
Total | 50 | 100
   b. [Pie chart: F-Series 34%, Silverado 24%, Taurus 16%, Camry 14%, Accord 12%]
   c. The two top-selling vehicles are the Ford F-Series Pickup and the Chevrolet Silverado.

37. a/b.
Industry | Frequency | Percent Frequency
Beverage | 2 | 10
Chemicals | 3 | 15
Electronics | 6 | 30
Food | 7 | 35
Aerospace | 2 | 10
Total | 20 | 100
   c. [Bar graph: Frequency by Industry]

38. a.
Response | Frequency | Percent Frequency
Accuracy | 16 | 16
Approach Shots | 3 | 3
Mental Approach | 17 | 17
Power | 8 | 8
Practice | 15 | 15
Putting | 10 | 10
Short Game | 24 | 24
Strategic Decisions | 7 | 7
Total | 100 | 100
   b. Poor short game, poor mental approach, lack of accuracy, and limited practice.

39. a-d.
Sales | Frequency | Relative Frequency | Cumulative Frequency | Cumulative Relative Frequency
0-499 | 13 | 0.65 | 13 | 0.65
500-999 | 3 | 0.15 | 16 | 0.80
1000-1499 | 0 | 0.00 | 16 | 0.80
1500-1999 | 3 | 0.15 | 19 | 0.95
2000-2499 | 1 | 0.05 | 20 | 1.00
Total | 20 | 1.00
   e. [Histogram: Frequency by Sales]

40. a.
Closing Price | Frequency | Relative Frequency
0-9.99 | 9 | 0.225
10-19.99 | 10 | 0.250
20-29.99 | 5 | 0.125
30-39.99 | 11 | 0.275
40-49.99 | 2 | 0.050
50-59.99 | 2 | 0.050
60-69.99 | 0 | 0.000
70-79.99 | 1 | 0.025
Total | 40 | 1.000
   b.
Closing Price | Cumulative Frequency | Cumulative Relative Frequency
Less than or equal to 9.99 | 9 | 0.225
Less than or equal to 19.99 | 19 | 0.475
Less than or equal to 29.99 | 24 | 0.600
Less than or equal to 39.99 | 35 | 0.875
Less than or equal to 49.99 | 37 | 0.925
Less than or equal to 59.99 | 39 | 0.975
Less than or equal to 69.99 | 39 | 0.975
Less than or equal to 79.99 | 40 | 1.000
   c. [Histogram: Frequency by Closing Price]
   d.
Over 87% of common stocks trade for less than $40 a share, and 60% trade for less than $30 per share.

41. a.
Exchange | Frequency | Relative Frequency
American | 3 | 0.15
New York | 2 | 0.10
Over the Counter | 15 | 0.75
Total | 20 | 1.00
   b.
Earnings per Share | Frequency | Relative Frequency
0.00-0.19 | 7 | 0.35
0.20-0.39 | 7 | 0.35
0.40-0.59 | 1 | 0.05
0.60-0.79 | 3 | 0.15
0.80-0.99 | 2 | 0.10
Total | 20 | 1.00
   Seventy percent of the shadow stocks have earnings per share less than $0.40. It looks like low EPS should be expected for shadow stocks.
Price-Earnings Ratio | Frequency | Relative Frequency
0.0-9.9 | 3 | 0.15
10.0-19.9 | 7 | 0.35
20.0-29.9 | 4 | 0.20
30.0-39.9 | 3 | 0.15
40.0-49.9 | 2 | 0.10
50.0-59.9 | 1 | 0.05
Total | 20 | 1.00
   P-E ratios vary considerably, but there is a noticeable cluster (35% of the stocks) in the 10.0-19.9 range.

42.
Income ($) | Frequency | Relative Frequency
18,000-21,999 | 13 | 0.255
22,000-25,999 | 20 | 0.392
26,000-29,999 | 12 | 0.235
30,000-33,999 | 4 | 0.078
34,000-37,999 | 2 | 0.039
Total | 51 | 1.000
   [Histogram: Frequency by Per Capita Income]

43. a.
0 | 8 9
1 | 0 2 2 2 3 4 4 4
1 | 5 5 6 6 6 6 7 7 8 8 8 8 9 9 9
2 | 0 1 2 2 2 3 4 4 4
2 | 5 6 8
3 | 0 1 3
   b/c/d.
Number Answered Correctly | Frequency | Relative Frequency | Cumulative Frequency
5-9 | 2 | 0.050 | 2
10-14 | 8 | 0.200 | 10
15-19 | 15 | 0.375 | 25
20-24 | 9 | 0.225 | 34
25-29 | 3 | 0.075 | 37
30-34 | 3 | 0.075 | 40
Total | 40 | 1.000
   e. Relatively few of the students (25%) were able to answer 1/2 or more of the questions correctly. The data seem to support the Joint Council on Economic Education's claim. However, the degree of difficulty of the questions needs to be taken into account before reaching a final conclusion.

44. a/b.
High Temperature
3 |
4 |
5 | 7
6 | 1 4 4 4 4 6 8
7 | 3 5 7 9
8 | 0 1 1 4 6
9 | 0 2 3

Low Temperature
3 | 9
4 | 3 6 8
5 | 0 0 0 2 4 4 5 5 7 9
6 | 1 8
7 | 2 4 5 5
8 |
9 |
   c. It is clear that the range of low temperatures is below the range of high temperatures.
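Because a stem-and-leaf display retains every data value, the two displays for part (a/b) can be expanded back into data and summarized. The following is a minimal Python sketch, assuming the leaves listed in those displays; `expand` and `decade_counts` are hypothetical helper names. The displays do not record which high temperature goes with which low temperature, so only the two separate distributions are compared.

```python
from statistics import mean

# Stem-and-leaf displays for Exercise 44(a/b), written as {stem: [leaves]}.
# Stems are tens digits and leaves are units digits (degrees).
high_display = {5: [7], 6: [1, 4, 4, 4, 4, 6, 8], 7: [3, 5, 7, 9],
                8: [0, 1, 1, 4, 6], 9: [0, 2, 3]}
low_display = {3: [9], 4: [3, 6, 8], 5: [0, 0, 0, 2, 4, 4, 5, 5, 7, 9],
               6: [1, 8], 7: [2, 4, 5, 5]}


def expand(display):
    """Convert a stem-and-leaf mapping back into an ordered list of values."""
    return [10 * stem + leaf for stem in sorted(display) for leaf in display[stem]]


def decade_counts(values, start=30, stop=100):
    """Count values in the classes 30-39, 40-49, ..., 90-99 (by default)."""
    return [sum(1 for v in values if lo <= v < lo + 10) for lo in range(start, stop, 10)]


high = expand(high_display)
low = expand(low_display)

print(min(high), max(high), mean(high))  # 57 93 74.95
print(min(low), max(low), mean(low))     # 39 75 56.85
print(decade_counts(high))  # [0, 0, 1, 7, 4, 5, 3]
print(decade_counts(low))   # [1, 3, 10, 2, 4, 0, 0]
```

Both the minimum and the maximum shift down by 18 degrees, and the means differ by 18.1 degrees, which puts a number on how far the low-temperature distribution sits below the high-temperature distribution; the decade counts match the frequency table for part (e).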
Looking at the stem-and-leaf displays side by side, it appears that the range of low temperatures is about 20 degrees below the range of high temperatures.
   d. There are two stems showing high temperatures of 80 degrees or higher. They show 8 cities with high temperatures of 80 degrees or higher.
   e.
Temperature | High Temp. Frequency | Low Temp. Frequency
30-39 | 0 | 1
40-49 | 0 | 3
50-59 | 1 | 10
60-69 | 7 | 2
70-79 | 4 | 4
80-89 | 5 | 0
90-99 | 3 | 0
Total | 20 | 20

45. a. [Scatter diagram: Low Temperature (30 to 80) versus High Temperature (40 to 100)]
   b. There is clearly a positive relationship between high and low temperature for cities. As one goes up, so does the other.

46. a.
Occupation | Satisfaction Score 30-39 | 40-49 | 50-59 | 60-69 | 70-79 | 80-89 | Total
Cabinetmaker | | | 2 | 4 | 3 | 1 | 10
Lawyer | 1 | 5 | 2 | 1 | 1 | | 10
Physical Therapist | | | 5 | 2 | 1 | 2 | 10
Systems Analyst | | 2 | 1 | 4 | 3 | | 10
Total | 1 | 7 | 10 | 11 | 8 | 3 | 40
   b. Row percentages
Occupation | Satisfaction Score 30-39 | 40-49 | 50-59 | 60-69 | 70-79 | 80-89 | Total
Cabinetmaker | | | 20 | 40 | 30 | 10 | 100
Lawyer | 10 | 50 | 20 | 10 | 10 | | 100
Physical Therapist | | | 50 | 20 | 10 | 20 | 100
Systems Analyst | | 20 | 10 | 40 | 30 | | 100
   c. Each row of the percent crosstabulation shows a percent frequency distribution for an occupation. Cabinetmakers seem to have the highest job satisfaction scores, while lawyers seem to have the lowest. Fifty percent of the physical therapists have mediocre scores, but the rest are rather high.

47. a. [Scatter diagram: Revenue ($ millions, 0 to 40,000) versus Employees (0 to 100,000)]
   b. There appears to be a positive relationship between number of employees and revenue. As the number of employees increases, annual revenue increases.

48. a.
Year Constructed | Elect. | Nat. Gas | Oil | Propane | Other | Total
1973 or before | 40 | 183 | 12 | 5 | 7 | 247
1974-1979 | 24 | 26 | 2 | 2 | 0 | 54
1980-1986 | 37 | 38 | 1 | 0 | 6 | 82
1987-1991 | 48 | 70 | 2 | 0 | 1 | 121
Total | 149 | 317 | 17 | 7 | 14 | 504
   b.
Year Constructed | Frequency
1973 or before | 247
1974-1979 | 54
1980-1986 | 82
1987-1991 | 121
Total | 504

Fuel Type | Frequency
Electricity | 149
Nat. Gas | 317
Oil | 17
Propane | 7
Other | 14
Total | 504
   c. Crosstabulation of column percentages
Year Constructed | Elect. | Nat. Gas | Oil | Propane | Other
1973 or before | 26.9 | 57.7 | 70.5 | 71.4 | 50.0
1974-1979 | 16.1 | 8.2 | 11.8 | 28.6 | 0.0
1980-1986 | 24.8 | 12.0 | 5.9 | 0.0 | 42.9
1987-1991 | 32.2 | 22.1 | 11.8 | 0.0 | 7.1
Total | 100.0 | 100.0 | 100.0 | 100.0 | 100.0
   d. Crosstabulation of row percentages
Year Constructed | Elect. | Nat. Gas | Oil | Propane | Other | Total
1973 or before | 16.2 | 74.1 | 4.9 | 2.0 | 2.8 | 100.0
1974-1979 | 44.5 | 48.1 | 3.7 | 3.7 | 0.0 | 100.0
1980-1986 | 45.1 | 46.4 | 1.2 | 0.0 | 7.3 | 100.0
1987-1991 | 39.7 | 57.8 | 1.7 | 0.0 | 0.8 | 100.0
   e. Observations from the column percentages crosstabulation:
   For those buildings using electricity, the percentages have not changed greatly over the years. For the buildings using natural gas, the majority were constructed in 1973 or before; the second largest percentage was constructed in 1987-1991. Most of the buildings using oil were constructed in 1973 or before. All of the buildings using propane were constructed before 1980.
   Observations from the row percentages crosstabulation:
   Most of the buildings in the CG&E service area use electricity or natural gas. In the period 1973 or before, most used natural gas. From 1974-1986, use is fairly evenly divided between electricity and natural gas. Since 1987, almost all new buildings are using electricity or natural gas, with natural gas being the clear leader.

49. a. Crosstabulation for stockholders' equity and profit
Stockholders' Equity ($1000s) | Profits ($1000s) 0-200 | 200-400 | 400-600 | 600-800 | 800-1000 | 1000-1200 | Total
0-1200 | 10 | 1 | | | | 1 | 12
1200-2400 | 4 | 10 | | | 2 | | 16
2400-3600 | 4 | 3 | 3 | 1 | 1 | 1 | 13
3600-4800 | | | | | 1 | 2 | 3
4800-6000 | | 2 | 3 | 1 | | | 6
Total | 18 | 16 | 6 | 2 | 4 | 4 | 50
   b. Crosstabulation of row percentages
Stockholders' Equity ($1000s) | Profits ($1000s) 0-200 | 200-400 | 400-600 | 600-800 | 800-1000 | 1000-1200 | Total
0-1200 | 83.33 | 8.33 | 0.00 | 0.00 | 0.00 | 8.33 | 100
1200-2400 | 25.00 | 62.50 | 0.00 | 0.00 | 12.50 | 0.00 | 100
2400-3600 | 30.77 | 23.08 | 23.08 | 7.69 | 7.69 | 7.69 | 100
3600-4800 | 0.00 | 0.00 | 0.00 | 0.00 | 33.33 | 66.67 | 100
4800-6000 | 0.00 | 33.33 | 50.00 | 16.67 | 0.00 | 0.00 | 100
   c. Stockholders' equity and profit seem to be related. As profit goes up, stockholders' equity goes up. The relationship, however, is not very strong.

50. a. Crosstabulation of market value and profit
Market Value ($1000s) | Profit ($1000s) 0-300 | 300-600 | 600-900 | 900-1200 | Total
0-8000 | 23 | 4 | | | 27
8000-16000 | 4 | 4 | 2 | 2 | 12
16000-24000 | | 2 | 1 | 1 | 4
24000-32000 | | 1 | 2 | 1 | 4
32000-40000 | | 2 | 1 | | 3
Total | 27 | 13 | 6 | 4 | 50
   b. Crosstabulation of row percentages
Market Value ($1000s) | Profit ($1000s) 0-300 | 300-600 | 600-900 | 900-1200 | Total
0-8000 | 85.19 | 14.81 | 0.00 | 0.00 | 100
8000-16000 | 33.33 | 33.33 | 16.67 | 16.67 | 100
16000-24000 | 0.00 | 50.00 | 25.00 | 25.00 | 100
24000-32000 | 0.00 | 25.00 | 50.00 | 25.00 | 100
32000-40000 | 0.00 | 66.67 | 33.33 | 0.00 | 100
   c. There appears to be a positive relationship between Profit and Market Value. As profit goes up, Market Value goes up.

51. a. [Scatter diagram: Profit ($1000s, 0 to 1400) versus Stockholder's Equity ($1000s, 0 to 7000)]
   b. Profit and Stockholder's Equity appear to be positively related.

52. a. [Scatter diagram: Market Value ($1000s, 0 to 45,000) versus Stockholder's Equity ($1000s, 0 to 7000)]
   b. There is a positive relationship between Market Value and Stockholder's Equity.

Chapter 3
Descriptive Statistics: Numerical Methods

Learning Objectives
1. Understand the purpose of measures of location.
2. Be able to compute the mean, median, mode, quartiles, and various percentiles.
3. Understand the purpose of measures of variability.
4.
Be able to compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
5. Understand how z-scores are computed and how they are used as a measure of relative location of a data value.
6. Know how Chebyshev's theorem and the empirical rule can be used to determine the percentage of the data within a specified number of standard deviations from the mean.
7. Learn how to construct a five-number summary and a box plot.
8. Be able to compute and interpret covariance and correlation as measures of association between two variables.
9. Be able to compute a weighted mean.

Solutions:

1. \(\bar{x} = \frac{\sum x_i}{n} = \frac{75}{5} = 15\)
   Data in order: 10, 12, 16, 17, 20
   Median = 16 (middle value)

2. \(\bar{x} = \frac{\sum x_i}{n} = \frac{96}{6} = 16\)
   Data in order: 10, 12, 16, 17, 20, 21
   Median = (16 + 17)/2 = 16.5

3. Data in order: 15, 20, 25, 25, 27, 28, 30, 32
   20th percentile: \(i = \frac{20}{100}(8) = 1.6\); round up to the 2nd position: 20
   25th percentile: \(i = \frac{25}{100}(8) = 2\); average the 2nd and 3rd positions: (20 + 25)/2 = 22.5
   65th percentile: \(i = \frac{65}{100}(8) = 5.2\); round up to the 6th position: 28
   75th percentile: \(i = \frac{75}{100}(8) = 6\); average the 6th and 7th positions: (28 + 30)/2 = 29

4. Mean = \(\frac{\sum x_i}{n} = \frac{657}{11} = 59.727\)
   Median = 57 (6th item)
   Mode = 53 (it appears 3 times)

5. a. \(\bar{x} = \frac{\sum x_i}{n} = \frac{1106.4}{30} = 36.88\)
   b. There are an even number of items. Thus, the median is the average of the 15th and 16th items after the data have been placed in rank order.
   Median = (36.6 + 36.7)/2 = 36.65
   c. Mode = 36.4 (this value appears 4 times)
   d. First quartile: \(i = \frac{25}{100}(30) = 7.5\). Rounding up, we see that Q1 is at the 8th position. Q1 = 36.2
   e. Third quartile: \(i = \frac{75}{100}(30) = 22.5\). Rounding up, we see that Q3 is at the 23rd position. Q3 = 37.9

6. a. \(\bar{x} = \frac{\sum x_i}{n} = \frac{1845}{20} = 92.25\)
   The median is the average of the 10th and 11th values after arranging in ascending order: Median = (66 + 95)/2 = 80.5
   The data are multimodal.
   b. \(\bar{x} = \frac{\sum x_i}{n} = \frac{1334}{20} = 66.7\)
   Median = (66 + 70)/2 = 68
   Mode = 70 (4 brokers charge $70)
   c. Comparing all three measures of central location (mean, median, and mode), we conclude that it costs more, on average, to trade 500 shares at $50 per share.
   d. Yes. Trading 500 shares at $50 per share is a transaction value of $25,000, whereas trading 1000 shares at $5 per share is a transaction value of $5000.

7. a. \(\bar{x} = \frac{\sum x_i}{n} = \frac{1380}{30} = 46\)
   b.
Yes, the mean here is 46 minutes. The newspaper reported an average of 45 minutes.
c. Median = \((45 + 52.9)/2 = 48.95\)
d. \(Q_1 = 7\) (value of the 8th item in ranked order)
\(Q_3 = 70.4\) (value of the 23rd item in ranked order)
e. Find the position: \(i = (40/100)(30) = 12\); the 40th percentile is the average of the values in the 12th and 13th positions: \((28.8 + 29.1)/2 = 28.95\)

8. a. \(\bar{x} = \sum x_i / n = 695/20 = 34.75\)
Mode = 25 (appears three times)
b. Data in order: 18, 20, 25, 25, 25, 26, 27, 27, 28, 33, 36, 37, 40, 40, 42, 45, 46, 48, 53, 54
Median (10th and 11th positions) = \((33 + 36)/2 = 34.5\)
At-home workers are slightly younger.
c. \(i = (25/100)(20) = 5\); use positions 5 and 6: \(Q_1 = (25 + 26)/2 = 25.5\)
\(i = (75/100)(20) = 15\); use positions 15 and 16: \(Q_3 = (42 + 45)/2 = 43.5\)
d. \(i = (32/100)(20) = 6.4\); round up to position 7: 32nd percentile = 27
At least 32% of the people are 27 or younger.

9. a. \(\bar{x} = \sum x_i / n = 270{,}377/25 = 10{,}815.08\)
Median (position 13) = 8296
b. The median would be better because of the large data values.
c. \(i = (25/100)(25) = 6.25\): \(Q_1\) (position 7) = 5984
\(i = (75/100)(25) = 18.75\): \(Q_3\) (position 19) = 14,330
d. \(i = (85/100)(25) = 21.25\): 85th percentile (position 22) = 15,593. Approximately 85% of the websites have fewer than 15,593 unique visitors.

10. a. \(\sum x_i = 435\), \(\bar{x} = \sum x_i / n = 435/9 = 48.33\)
Data in ascending order: 28, 42, 45, 48, 49, 50, 55, 58, 60
Median = 49
Do not report a mode; each data value occurs once.
The index could be considered good since both the mean and median are less than 50.
b. \(i = (25/100)(9) = 2.25\): \(Q_1\) (3rd position) = 45
\(i = (75/100)(9) = 6.75\): \(Q_3\) (7th position) = 55

11. Using the mean, we get \(\bar{x}_{\text{city}} = 15.58\) and \(\bar{x}_{\text{country}} = 18.92\). For the samples, the mean mileage is better in the country than in the city.
City: 13.2, 14.4, 15.2, 15.3, 15.3, 15.3, 15.9, 16.0, 16.1, 16.2, 16.2, 16.7, 16.8; Median = 15.9 (7th value); Mode = 15.3
Country: 17.2, 17.4, 18.3, 18.5, 18.6, 18.6, 18.7, 19.0, 19.2, 19.4, 19.4, 20.6, 21.1; Median = 18.7 (7th value); Modes = 18.6 and 19.4
The median and modal mileages are also better in the country than in the city.

12. a. \(\bar{x} = \sum x_i / n = 12{,}780/20 = 639\), that is, $639
b. \(\bar{x} = \sum x_i / n = 1976/20 = 98.8\) pictures
c. \(\bar{x} = \sum x_i / n = 2204/20 = 110.2\) minutes
d. This is not an easy choice because it is a multicriteria problem. If price were the only criterion, the lowest-price camera (Fujifilm DX-10) would be preferred. If maximum picture capacity were the only criterion, the camera with the maximum picture capacity (Kodak DC280 Zoom) would be preferred. But if battery life were the only criterion, the camera with the maximum battery life (Fujifilm DX10) would be preferred. Many approaches can be used to select the best choice in a multicriteria situation; they are discussed in more specialized books on decision analysis.

13. Data in order: 10, 12, 16, 17, 20
Range = 20 - 10 = 10
\(i = (25/100)(5) = 1.25\): \(Q_1\) (2nd position) = 12
\(i = (75/100)(5) = 3.75\): \(Q_3\) (4th position) = 17
IQR = \(Q_3 - Q_1 = 17 - 12 = 5\)

14. \(\bar{x} = \sum x_i / n = 75/5 = 15\)
\(s^2 = \sum (x_i - \bar{x})^2 / (n - 1) = 64/4 = 16\), \(s = \sqrt{16} = 4\)

15. Data in order: 15, 20, 25, 25, 27, 28, 30, 34
Range = 34 - 15 = 19
\(i = (25/100)(8) = 2\): \(Q_1 = (20 + 25)/2 = 22.5\)
\(i = (75/100)(8) = 6\): \(Q_3 = (28 + 30)/2 = 29\)
IQR = \(Q_3 - Q_1 = 29 - 22.5 = 6.5\)
\(\bar{x} = \sum x_i / n = 204/8 = 25.5\)
\(s^2 = \sum (x_i - \bar{x})^2 / (n - 1) = 242/7 = 34.57\), \(s = \sqrt{34.57} = 5.88\)

16. a. Range = 190 - 168 = 22
b. \(\sum (x_i - \bar{x})^2 = 376\), \(s^2 = 376/5 = 75.2\)
c. \(s = \sqrt{75.2} = 8.67\)
d. Coefficient of variation = \((8.67/178)(100) = 4.87\)

17. Range = 92 - 67 = 25
IQR = \(Q_3 - Q_1 = 80 - 77 = 3\)
\(\bar{x} = 78.4667\), \(\sum (x_i - \bar{x})^2 = 411.7333\)
\(s^2 = \sum (x_i - \bar{x})^2 / (n - 1) = 411.7333/14 = 29.4095\), \(s = \sqrt{29.4095} = 5.4231\)

18. a. \(\bar{x} = \sum x_i / n = 115.13\) (Mainland); 36.62 (Asia)
Median (7th and 8th positions), Mainland = \((110.87 + 112.25)/2 = 111.56\)
Median (6th and 7th positions), Asia = \((32.98 + 40.41)/2 = 36.695\)
b. Range = High - Low
                           Mainland   Asia
Range                      86.24      42.97
Standard Deviation         26.82      11.40
Coefficient of Variation   23.30      31.13
c. Mainland has the greater mean and standard deviation; Asia has the greater coefficient of variation.

19. a. Range = 60 - 28 = 32
IQR = \(Q_3 - Q_1 = 55 - 45 = 10\)
b. \(\bar{x} = 435/9 = 48.33\), \(\sum (x_i - \bar{x})^2 = 742\)
\(s^2 = \sum (x_i - \bar{x})^2 / (n - 1) = 742/8 = 92.75\), \(s = \sqrt{92.75} = 9.63\)
c. The average air quality is about the same, but the variability is greater in Anaheim.

20. Dawson Supply: Range = 11 - 9 = 2; \(s = \sqrt{4.1/9} = 0.67\)
J.C. Clark: Range = 15 - 7 = 8; \(s = \sqrt{60.1/9} = 2.58\)

21. a. Winter: Range = 21 - 12 = 9; IQR = \(Q_3 - Q_1 = 20 - 16 = 4\)
Summer: Range = 38 - 18 = 20; IQR = \(Q_3 - Q_1 = 29 - 18 = 11\)
b.        Variance   Standard Deviation
Winter    8.2333     2.8694
Summer    44.4889    6.6700
c. Winter coefficient of variation = \((s/\bar{x})(100) = (2.8694/17.7)(100) = 16.21\)
Summer coefficient of variation = \((s/\bar{x})(100) = (6.6700/25.6)(100) = 26.05\)
d. There is more variability in the summer months.

22. a. 500 shares at $50: Min = 34, Max = 195, Range = 195 - 34 = 161
\(Q_1 = (45 + 50)/2 = 47.5\), \(Q_3 = (140 + 140)/2 = 140\)
Interquartile range = 140 - 47.5 = 92.5
1000 shares at $5: Min = 34, Max = 90, Range = 90 - 34 = 56
\(Q_1 = (60 + 60.5)/2 = 60.25\), \(Q_3 = (79.5 + 80)/2 = 79.75\)
Interquartile range = 79.75 - 60.25 = 19.5
b. 500 shares at $50: \(s^2 = \sum (x_i - \bar{x})^2 / (n - 1) = 51{,}402.25/19 = 2705.3816\), \(s = \sqrt{2705.3816} = 52.01\)
1000 shares at $5: \(s^2 = 5526.2/19 = 290.8526\), \(s = \sqrt{290.8526} = 17.05\)
c. 500 shares at $50: coefficient of variation = \((s/\bar{x})(100) = (52.01/92.25)(100) = 56.38\)
1000 shares at $5: coefficient of variation = \((s/\bar{x})(100) = (17.05/66.70)(100) = 25.56\)
d. The variability is greater for the trade of 500 shares at $50 per share. This is true whether we use the standard deviation or the coefficient of variation as the measure.

23. \(s^2 = 0.0021\). Production should not be shut down since the variance is less than .005.

24. Quarter milers: \(s = 0.0564\); coefficient of variation = \((s/\bar{x})(100) = (0.0564/0.966)(100) = 5.8\)
Milers: \(s = 0.1295\); coefficient of variation = \((s/\bar{x})(100) = (0.1295/4.534)(100) = 2.9\)
Yes; the coefficient of variation shows that, as a percentage of the mean, the quarter milers' times show more variability.

25. \(\bar{x} = \sum x_i / n = 75/5 = 15\), \(s^2 = \sum (x_i - \bar{x})^2 / (n - 1) = 64/4 = 16\), \(s = 4\)
With \(z = (x - \bar{x})/s\):
10: \(z = (10 - 15)/4 = -1.25\)
20: \(z = (20 - 15)/4 = 1.25\)
12: \(z = (12 - 15)/4 = -0.75\)
17: \(z = (17 - 15)/4 = 0.50\)
16: \(z = (16 - 15)/4 = 0.25\)

26. \(z = (520 - 500)/100 = 0.20\)
\(z = (650 - 500)/100 = 1.50\)
\(z = (500 - 500)/100 = 0.00\)
\(z = (450 - 500)/100 = -0.50\)
\(z = (280 - 500)/100 = -2.20\)

27. a. \(z = (40 - 30)/5 = 2\); \(1 - 1/2^2 = 0.75\); at least 75%
b. \(z = (45 - 30)/5 = 3\); \(1 - 1/3^2 = 0.89\); at least 89%
c. \(z = (38 - 30)/5 = 1.6\); \(1 - 1/1.6^2 = 0.61\); at least 61%
d. \(z = (42 - 30)/5 = 2.4\); \(1 - 1/2.4^2 = 0.83\); at least 83%
e. \(z = (48 - 30)/5 = 3.6\); \(1 - 1/3.6^2 = 0.92\); at least 92%

28. a. Approximately 95%
b. Almost all
c. Approximately 68%

29. a. This is from 2 standard deviations below the mean to 2 standard deviations above the mean. With z = 2, Chebyshev's theorem gives \(1 - 1/z^2 = 1 - 1/2^2 = 1 - 1/4 = 3/4\). Therefore, at least 75% of adults sleep between 4.5 and 9.3 hours per day.
b. This is from 2.5 standard deviations below the mean to 2.5 standard deviations above the mean. With z = 2.5, Chebyshev's theorem gives \(1 - 1/z^2 = 1 - 1/2.5^2 = 1 - 1/6.25 = .84\). Therefore, at least 84% of adults sleep between 3.9 and 9.9 hours per day.
c. With z = 2, the empirical rule suggests that 95% of adults sleep between 4.5 and 9.3 hours per day. The probability obtained using the empirical rule is greater than the probability obtained using Chebyshev's theorem.

30. a. 2 hours is 1 standard deviation below the mean. Thus, the empirical rule suggests that 68% of the kids watch television between 2 and 4 hours per day. Since a bell-shaped distribution is symmetric, approximately 34% of the kids watch television between 2 and 3 hours per day.
b. 1 hour is 2 standard deviations below the mean. Thus, the empirical rule suggests that 95% of the kids watch television between 1 and 5 hours per day. Since a bell-shaped distribution is symmetric, approximately 47.5% of the kids watch television between 1 and 3 hours per day. In part (a) we concluded that approximately 34% of the kids watch television between 2 and 3 hours per day; by symmetry, approximately 34% of the kids watch television between 3 and 4 hours per day. Hence, approximately 47.5% + 34% = 81.5% of kids watch television between 1 and 4 hours per day.
c. Since 34% of the kids watch television between 3 and 4 hours per day, 50% - 34% = 16% of the kids watch television more than 4 hours per day.

31. a. Approximately 68% of scores are within 1 standard deviation of the mean.
b.
Approximately 95% of scores are within 2 standard deviations of the mean.
c. Approximately (100% - 95%)/2 = 2.5% of scores are over 130.
d. Yes, almost all IQ scores are less than 145.

32. a. \(z = (71.00 - 90.06)/20 = -0.95\)
b. \(z = (168 - 90.06)/20 = 3.90\)
c. The z-score in part (a) indicates that the value is 0.95 standard deviations below the mean. The z-score in part (b) indicates that the value is 3.90 standard deviations above the mean. The labor cost in part (b) is an outlier and should be reviewed for accuracy.

33. a. \(\bar{x}\) is approximately 63, or $63,000, and s is approximately 4, or $4000.
b. This is from 2 standard deviations below the mean to 2 standard deviations above the mean. With z = 2, Chebyshev's theorem gives \(1 - 1/z^2 = 1 - 1/2^2 = 1 - 1/4 = 3/4\). Therefore, at least 75% of benefits managers have an annual salary between $55,000 and $71,000.
c. [Histogram of the salary data: frequency vs. salary ($1000s), classes 56-58 through 72-74]
Although the distribution is not perfectly bell shaped, it does appear reasonable to assume that the distribution of annual salary can be approximated by a bell-shaped distribution.
d. With z = 2, the empirical rule suggests that 95% of benefits managers have an annual salary between $55,000 and $71,000. This percentage is much higher than the one obtained using Chebyshev's theorem, but it requires the assumption that the distribution of annual salary is bell shaped.
e. There are no outliers because all the observations are within 3 standard deviations of the mean.

34. a. \(\bar{x}\) is 100 and s is 13.88, or approximately 14.
b. If the distribution is bell shaped with a mean of 100 points, the percentage of NBA games in which the winning team scores more than 100 points is 50%. A score of 114 points is z = 1 standard deviation above the mean. Thus, the empirical rule suggests that 68% of the winning teams will score between 86 and 114 points. In other words, 32% of the winning teams will score less than 86 points or more than 114 points.
Because a bell-shaped distribution is symmetric, approximately 16% of the winning teams will score more than 114 points.
c. For the winning margin, \(\bar{x}\) is 11.1 and s is 10.77. To see whether there are any outliers, we first compute the z-score for the winning margin that is farthest from the sample mean of 11.1, a winning margin of 32 points:
\(z = (x - \bar{x})/s = (32 - 11.1)/10.77 = 1.94\)
Thus, a winning margin of 32 points is not an outlier (z = 1.94 < 3). Because a winning margin of 32 points is farthest from the mean, none of the other data values can have a z-score less than -3 or greater than 3, and hence we conclude that there are no outliers.

35. a. \(\bar{x} = \sum x_i / n = 79.86/20 = 3.99\)
Median = \((4.17 + 4.20)/2 = 4.185\) (average of the 10th and 11th values)
b. \(Q_1 = 4.00\) (average of the 5th and 6th values)
\(Q_3 = 4.50\) (average of the 15th and 16th values)
c. \(s = \sqrt{\sum (x_i - \bar{x})^2 / (n - 1)} = \sqrt{12.5080/19} = 0.8114\)
d. Allison One: \(z = (4.12 - 3.99)/0.8114 = 0.16\)
Omni Audio SA 12.3: \(z = (2.32 - 3.99)/0.8114 = -2.06\)
e. The lowest rating is for the Bose 501 Series. Its z-score is \(z = (2.14 - 3.99)/0.8114 = -2.28\). This is not an outlier, so there are no outliers.

36. Data in order: 15, 20, 25, 25, 27, 28, 30, 34
Smallest = 15
\(i = (25/100)(8) = 2\): \(Q_1 = (20 + 25)/2 = 22.5\)
Median = \((25 + 27)/2 = 26\)
\(i = (75/100)(8) = 6\): \(Q_3 = (28 + 30)/2 = 29\)
Largest = 34

37. [Box plot of the Exercise 36 data; axis from 15 to 35]

38. Data in order: 5, 6, 8, 10, 10, 12, 15, 16, 18
Smallest = 5
\(i = (25/100)(9) = 2.25\): \(Q_1 = 8\) (3rd position)
Median = 10
\(i = (75/100)(9) = 6.75\): \(Q_3 = 15\) (7th position)
Largest = 18
[Box plot; axis from 5 to 20]

39. IQR = 50 - 42 = 8
Lower limit: \(Q_1 - 1.5(\text{IQR}) = 42 - 12 = 30\)
Upper limit: \(Q_3 + 1.5(\text{IQR}) = 50 + 12 = 62\)
65 is an outlier.

40. a. Five-number summary: 5, 9.6, 14.5, 19.2, 52.7
b. IQR = \(Q_3 - Q_1 = 19.2 - 9.6 = 9.6\)
Lower limit: \(Q_1 - 1.5(\text{IQR}) = 9.6 - 1.5(9.6) = -4.8\)
Upper limit: \(Q_3 + 1.5(\text{IQR}) = 19.2 + 1.5(9.6) = 33.6\)
c. The data value 41.6 is an outlier (larger than the upper limit), and so is the data value 52.7. The financial analyst should first verify that these values are correct. Perhaps a typing error has caused 25.7 to be typed as 52.7 (or 14.6 to be typed as 41.6).
If the outliers are correct, the analyst might consider these companies, with their unusually large returns on equity, as good investment candidates.
d. [Box plot with the two outliers marked by *; axis from -10 to 65]

41. a. Median (11th position) = 4019
\(i = (25/100)(21) = 5.25\): \(Q_1\) (6th position) = 1872
\(i = (75/100)(21) = 15.75\): \(Q_3\) (16th position) = 8305
Five-number summary: 608, 1872, 4019, 8305, 14,138
b. Limits: IQR = \(Q_3 - Q_1 = 8305 - 1872 = 6433\)
Lower limit: \(Q_1 - 1.5(\text{IQR}) = -7777\)
Upper limit: \(Q_3 + 1.5(\text{IQR}) = 17{,}955\)
c. There are no outliers; all data are within the limits.
d. Yes. If the first two digits in Johnson & Johnson's sales had been transposed to 41,138, sales would have shown up as an outlier. A review of the data would have enabled the correction of the data.
e. [Box plot; axis from 0 to 15,000]

42. a. Mean = 105.7933, Median = 52.7
b. \(Q_1 = 15.7\), \(Q_3 = 78.3\)
c. IQR = \(Q_3 - Q_1 = 78.3 - 15.7 = 62.6\)
Lower limit for box plot = \(Q_1 - 1.5(\text{IQR}) = 15.7 - 1.5(62.6) = -78.2\)
Upper limit for box plot = \(Q_3 + 1.5(\text{IQR}) = 78.3 + 1.5(62.6) = 172.2\)
Note: Because the number of shares covered by options grants cannot be negative, the lower limit for the box plot is set at 0. Thus, outliers are values in the data set greater than 172.2.
Outliers: Silicon Graphics (188.8) and ToysRUs (247.6)
d. Mean percentage = 26.73. The current percentage is much greater.

43. a. Five-number summary (midsize): 51, 71.5, 81.5, 96.5, 128
Five-number summary (small): 73, 101, 108.5, 121, 140
b. [Box plots for midsize and small cars; axis from 50 to 150]
c. The midsize cars appear to be safer than the small cars.

44. a. \(\bar{x} = 37.48\), Median = 23.67
b. \(Q_1 = 7.91\), \(Q_3 = 51.92\)
c. IQR = 51.92 - 7.91 = 44.01
Lower limit: \(Q_1 - 1.5(\text{IQR}) = 7.91 - 1.5(44.01) = -58.11\)
Upper limit: \(Q_3 + 1.5(\text{IQR}) = 51.92 + 1.5(44.01) = 117.94\)
Russia, with a percent change of 125.89, is an outlier. Turkey, with a percent change of 254.45, is another outlier.
d. With a percent change of 22.64, the United States is just below the 50th percentile (the median).
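The quartiles, five-number summaries, and box-plot limits in Exercises 36 through 44 follow the percentile rule used throughout these solutions: compute \(i = (p/100)n\); if i is not an integer, round up to the next position; if i is an integer, average the values in positions i and i + 1. The Python sketch below implements that rule together with the \(Q_1 - 1.5(\text{IQR})\) and \(Q_3 + 1.5(\text{IQR})\) limits. It is an illustration under stated assumptions, not code from the text: the function names are our own, p is assumed to be an integer from 1 to 99, and results can differ slightly from spreadsheet percentile functions that interpolate between positions.

```python
import math

def percentile(data, p):
    """pth percentile by the textbook rule; p is assumed to be an int, 0 < p < 100."""
    x = sorted(data)
    n = len(x)
    if (p * n) % 100 == 0:
        # i = (p/100)n is an integer: average positions i and i + 1 (1-based)
        i = p * n // 100
        return (x[i - 1] + x[i]) / 2
    # i is not an integer: round up to the next position (1-based)
    i = math.ceil(p * n / 100)
    return x[i - 1]

def five_number_summary(data):
    """Return (smallest, Q1, median, Q3, largest)."""
    return (min(data), percentile(data, 25), percentile(data, 50),
            percentile(data, 75), max(data))

def box_plot_limits(q1, q3):
    """Lower and upper limits Q1 - 1.5(IQR) and Q3 + 1.5(IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers(data):
    """Values falling outside the box-plot limits."""
    _, q1, _, q3, _ = five_number_summary(data)
    lower, upper = box_plot_limits(q1, q3)
    return [v for v in data if v < lower or v > upper]

# Exercise 36 data
print(five_number_summary([15, 20, 25, 25, 27, 28, 30, 34]))  # (15, 22.5, 26.0, 29.0, 34)
# Exercise 39: Q1 = 42 and Q3 = 50
print(box_plot_limits(42, 50))  # (30.0, 62.0)
```

With the Exercise 39 limits of 30 and 62, any value above 62 (such as 65) is flagged, matching the hand calculation.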
45. a. [Scatter diagram of y vs. x]
b. Negative relationship
c/d. \(\sum x_i = 40\), \(\bar{x} = 40/5 = 8\); \(\sum y_i = 230\), \(\bar{y} = 230/5 = 46\)
\(\sum (x_i - \bar{x})(y_i - \bar{y}) = -240\), \(\sum (x_i - \bar{x})^2 = 118\), \(\sum (y_i - \bar{y})^2 = 520\)
\(s_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) / (n - 1) = -240/(5 - 1) = -60\)
\(s_x = \sqrt{\sum (x_i - \bar{x})^2 / (n - 1)} = \sqrt{118/(5 - 1)} = 5.4314\)
\(s_y = \sqrt{\sum (y_i - \bar{y})^2 / (n - 1)} = \sqrt{520/(5 - 1)} = 11.4018\)
\(r_{xy} = s_{xy} / (s_x s_y) = -60/[(5.4314)(11.4018)] = -0.969\)
There is a strong negative linear relationship.

46. a. [Scatter diagram of y vs. x]
b. Positive relationship
c/d. \(\sum x_i = 80\), \(\bar{x} = 80/5 = 16\); \(\sum y_i = 50\), \(\bar{y} = 50/5 = 10\)
\(\sum (x_i - \bar{x})(y_i - \bar{y}) = 106\), \(\sum (x_i - \bar{x})^2 = 272\), \(\sum (y_i - \bar{y})^2 = 86\)
\(s_{xy} = 106/(5 - 1) = 26.5\)
\(s_x = \sqrt{272/(5 - 1)} = 8.2462\)
\(s_y = \sqrt{86/(5 - 1)} = 4.6368\)
\(r_{xy} = s_{xy} / (s_x s_y) = 26.5/[(8.2462)(4.6368)] = 0.693\)
There is a positive linear relationship.

47. a. [Scatter diagram of y = SAT vs. x = GPA]
b. Positive relationship
c/d. \(\sum x_i = 19.8\), \(\bar{x} = 19.8/6 = 3.3\); \(\sum y_i = 3540\), \(\bar{y} = 3540/6 = 590\)
\(\sum (x_i - \bar{x})(y_i - \bar{y}) = 143\), \(\sum (x_i - \bar{x})^2 = 0.74\), \(\sum (y_i - \bar{y})^2 = 36{,}400\)
\(s_{xy} = 143/(6 - 1) = 28.6\)
\(s_x = \sqrt{0.74/(6 - 1)} = 0.3847\)
\(s_y = \sqrt{36{,}400/(6 - 1)} = 85.3229\)
\(r_{xy} = s_{xy} / (s_x s_y) = 28.6/[(0.3847)(85.3229)] = 0.8713\)
There is a positive linear relationship.

48. Let x = driving speed and y = mileage.
\(\sum x_i = 420\), \(\bar{x} = 420/10 = 42\); \(\sum y_i = 270\), \(\bar{y} = 270/10 = 27\)
\(\sum (x_i - \bar{x})(y_i - \bar{y}) = -475\), \(\sum (x_i - \bar{x})^2 = 1660\), \(\sum (y_i - \bar{y})^2 = 164\)
\(s_{xy} = -475/(10 - 1) = -52.7778\)
\(s_x = \sqrt{1660/(10 - 1)} = 13.5810\)
\(s_y = \sqrt{164/(10 - 1)} = 4.2687\)
\(r_{xy} = s_{xy} / (s_x s_y) = -52.7778/[(13.5810)(4.2687)] = -0.91\)
There is a strong negative linear relationship.

49. a. The sample correlation coefficient is .78.
b. There is a positive linear relationship between the performance score and the overall rating.

50. a. The sample correlation coefficient is .92.
b. There is a strong positive linear relationship between the two variables.

51. The sample correlation coefficient is .88. This indicates a strong positive linear relationship between the daily high and low temperatures.

52. a. \(\bar{x} = \sum w_i x_i / \sum w_i = [6(3.2) + 3(2) + 2(2.5) + 8(5)]/(6 + 3 + 2 + 8) = 70.2/19 = 3.69\)
b. \((3.2 + 2 + 2.5 + 5)/4 = 12.7/4 = 3.175\)
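Exercises 45 through 48 compute \(s_{xy}\), \(s_x\), \(s_y\), and \(r_{xy}\) from deviation sums, while Exercise 66 uses the equivalent computational formula based on raw sums. A minimal Python sketch of both forms follows. The function names are hypothetical, and because the raw data are not reproduced in these solutions, the examples feed in the summary sums reported in the exercises.

```python
import math

def cov_corr_from_deviations(n, sum_dxdy, sum_dx2, sum_dy2):
    """Return (s_xy, s_x, s_y, r_xy) from the deviation sums
    sum (x - xbar)(y - ybar), sum (x - xbar)^2, and sum (y - ybar)^2."""
    s_xy = sum_dxdy / (n - 1)            # sample covariance
    s_x = math.sqrt(sum_dx2 / (n - 1))   # sample standard deviation of x
    s_y = math.sqrt(sum_dy2 / (n - 1))   # sample standard deviation of y
    return s_xy, s_x, s_y, s_xy / (s_x * s_y)

def corr_from_raw_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """r_xy from raw sums, the computational form used in Exercise 66."""
    num = sum_xy - sum_x * sum_y / n
    den = math.sqrt(sum_x2 - sum_x ** 2 / n) * math.sqrt(sum_y2 - sum_y ** 2 / n)
    return num / den

# Exercise 45 (n = 5)
s_xy, s_x, s_y, r = cov_corr_from_deviations(5, -240, 118, 520)
print(s_xy, round(s_x, 4), round(s_y, 4), round(r, 3))  # -60.0 5.4314 11.4018 -0.969

# Exercise 66 (n = 9)
print(round(corr_from_raw_sums(9, 798, 11688, 71306, 16058736, 1058019), 4))  # 0.9856
```

The two forms are algebraically identical. The raw-sum form can lose precision in floating point when the sums are large compared with the deviations, so the deviation form is the safer choice when the raw data are at hand.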
53.
fi   Mi   fiMi   Mi - x̄   (Mi - x̄)²   fi(Mi - x̄)²
4    5    20     -8       64          256
7    10   70     -3       9           63
9    15   135    +2       4           36
5    20   100    +7       49          245
25        325                         600
\(\bar{x} = \sum f_i M_i / n = 325/25 = 13\)
\(s^2 = \sum f_i (M_i - \bar{x})^2 / (n - 1) = 600/24 = 25\), \(s = \sqrt{25} = 5\)

54. a.
Grade xi   Weight wi (credit hours)
4 (A)      9
3 (B)      15
2 (C)      33
1 (D)      3
0 (F)      0
Total      60
\(\bar{x} = \sum w_i x_i / \sum w_i = [9(4) + 15(3) + 33(2) + 3(1)]/(9 + 15 + 33 + 3) = 150/60 = 2.5\)
b. Yes; this satisfies the 2.5 grade point average requirement.

55.
fi   Mi   fiMi   Mi - x̄   (Mi - x̄)²   fi(Mi - x̄)²
4    5    20     -8       64          256
7    10   70     -3       9           63
9    15   135    +2       4           36
5    20   100    +7       49          245
25        325                         600
\(\bar{x} = \sum f_i M_i / n = 325/25 = 13\)
\(s^2 = \sum f_i (M_i - \bar{x})^2 / (n - 1) = 600/24 = 25\), \(s = \sqrt{25} = 5\)

56.
Mi   fi    fiMi    Mi - x̄       (Mi - x̄)²     fi(Mi - x̄)²
2    74    148     -8.742647    76.433877     5,656.1069
7    192   1,344   -3.742647    14.007407     2,689.4221
12   280   3,360   1.257353     1.580937      442.6622
17   105   1,785   6.257353     39.154467     4,111.2190
22   23    506     11.257353    126.728000    2,914.7439
27   6     162     16.257353    264.301530    1,585.8092
     680   7,305                              17,399.9630
\(\bar{x} = 7305/680 = 10.74\)
\(s^2 = 17{,}399.9630/679 = 25.63\), \(s = 5.06\)
Estimate of total gallons sold: (10.74)(120) = 1288.8

57. a.
Class   fi    fiMi
0       15    0
1       10    10
2       40    80
3       85    255
4       350   1400
Totals  500   1745
\(\bar{x} = \sum f_i M_i / n = 1745/500 = 3.49\)
b.
Mi   Mi - x̄   (Mi - x̄)²   fi(Mi - x̄)²
0    -3.49    12.18       182.70
1    -2.49    6.20        62.00
2    -1.49    2.22        88.80
3    -0.49    0.24        20.41
4    +0.51    0.26        91.04
Total                     444.95
\(s^2 = \sum f_i (M_i - \bar{x})^2 / (n - 1) = 444.95/499 = 0.8917\), \(s = \sqrt{0.8917} = 0.9443\)

58. a. \(\bar{x} = \sum x_i / n = 3463/25 = 138.52\)
Median = 129 (13th value)
Mode = 0 (2 times)
b. It appears that this group of young adults eats out much more than the average American. The mean and median are much higher than the average of $65.88 reported in the newspaper.
c. \(Q_1 = 95\) (7th value), \(Q_3 = 169\) (19th value)
d. Min = 0, Max = 467
Range = 467 - 0 = 467
IQR = \(Q_3 - Q_1 = 169 - 95 = 74\)
e. \(s^2 = 9271.01\), \(s = 96.29\)
f. The z-score for the largest value is \(z = (467 - 138.52)/96.29 = 3.41\). It is the only outlier and should be checked for accuracy.

59. a. \(\sum x_i = 760\), \(\bar{x} = \sum x_i / n = 760/20 = 38\)
The median is the average of the 10th and 11th items.
Median = \((36 + 36)/2 = 36\)
The modal cash retainer is 40; it appears 4 times.
b. For \(Q_1\): \(i = (25/100)(20) = 5\). Since i is an integer, \(Q_1 = (28 + 30)/2 = 29\)
For \(Q_3\): \(i = (75/100)(20) = 15\). Since i is an integer, \(Q_3 = (40 + 50)/2 = 45\)
c. Range = 64 - 15 = 49
Interquartile range = 45 - 29 = 16
d. \(s^2 = \sum (x_i - \bar{x})^2 / (n - 1) = 3318/(20 - 1) = 174.6316\), \(s = \sqrt{174.6316} = 13.2148\)
e. Coefficient of variation = \((s/\bar{x})(100) = (13.2148/38)(100) = 34.8\)

60. a. \(\bar{x} = \sum x_i / n = 260/14 = 18.57\)
Median = 16.5 (average of the 7th and 8th values)
b. \(s^2 = 53.49\), \(s = 7.31\)
c. Quantex has the best record: 11 days.
d. \(z = (27 - 18.57)/7.31 = 1.15\). Packard-Bell is 1.15 standard deviations slower than the mean.
e. \(z = (12 - 18.57)/7.31 = -0.90\). IBM is 0.9 standard deviations faster than the mean.
f. Check Toshiba: \(z = (37 - 18.57)/7.31 = 2.52\). On the basis of z-scores, Toshiba is not an outlier, but it is 2.52 standard deviations slower than the mean.

61. Sample mean = 7195.5
Median = 7019 (average of positions 5 and 6)
Sample variance = 7,165,941
Sample standard deviation = 2676.93

62. a. The sample mean is 83.135 and the sample standard deviation is 16.173.
b. With z = 2, Chebyshev's theorem gives \(1 - 1/z^2 = 1 - 1/2^2 = 1 - 1/4 = 3/4\). Therefore, at least 75% of household incomes are within 2 standard deviations of the mean. Using the sample mean and sample standard deviation computed in part (a), the interval is \(83.135 \pm 2(16.173) = 83.135 \pm 32.346\); thus, at least 75% of household incomes fall between 50.789 and 115.481, or $50,789 to $115,481.
c. With z = 2, the empirical rule suggests that approximately 95% of household incomes fall between $50,789 and $115,481. For the same range, the percentage obtained using the empirical rule is greater than the percentage obtained using Chebyshev's theorem.
d. The z-score for Danbury, CT, is 3.04; thus, the Danbury, CT, observation is an outlier.

63. a. Public transportation: \(\bar{x} = 320/10 = 32\)
Automobile: \(\bar{x} = 320/10 = 32\)
b. Public transportation: s = 4.64
Automobile: s = 1.83
c. Prefer the automobile. The mean times are the same, but the automobile has less variability.
d.
Data in ascending order:
Public: 25, 28, 29, 29, 32, 32, 33, 34, 37, 41
Auto: 29, 30, 31, 31, 32, 32, 33, 33, 34, 35
Five-number summaries:
Public: 25, 29, 32, 34, 41
Auto: 29, 31, 32, 33, 35
[Box plots for public transportation and automobile; axis from 24 to 40]
The box plots do show lower variability with automobile transportation and support the conclusion in part (c).

64. a. The sample covariance is 502.67. Because the sample covariance is positive, there is a positive linear relationship between income and home price.
b. The sample correlation coefficient is .933; this indicates a strong positive linear relationship between income and home price.

65. a. Let x = media expenditures ($ millions) and y = shipments in barrels (millions).
\(\sum x_i = 404.1\), \(\bar{x} = 404.1/10 = 40.41\); \(\sum y_i = 119.9\), \(\bar{y} = 119.9/10 = 11.99\)
\(\sum (x_i - \bar{x})(y_i - \bar{y}) = 3763.481\), \(\sum (x_i - \bar{x})^2 = 19{,}248.469\), \(\sum (y_i - \bar{y})^2 = 939.349\)
\(s_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) / (n - 1) = 3763.481/(10 - 1) = 418.1646\)
A positive relationship.
b. \(s_x = \sqrt{\sum (x_i - \bar{x})^2 / (n - 1)} = \sqrt{19{,}248.469/(10 - 1)} = 46.2463\)
\(s_y = \sqrt{\sum (y_i - \bar{y})^2 / (n - 1)} = \sqrt{939.349/(10 - 1)} = 10.2163\)
\(r_{xy} = s_{xy} / (s_x s_y) = 418.1646/[(46.2463)(10.2163)] = 0.885\)
Note: The same value can also be obtained using Excel's CORREL function.

66. a. The scatter diagram indicates a positive relationship.
b. \(\sum x_i = 798\), \(\sum y_i = 11{,}688\), \(\sum x_i^2 = 71{,}306\), \(\sum y_i^2 = 16{,}058{,}736\), \(\sum x_i y_i = 1{,}058{,}019\)
\(r_{xy} = \dfrac{\sum x_i y_i - (\sum x_i)(\sum y_i)/n}{\sqrt{\sum x_i^2 - (\sum x_i)^2/n}\,\sqrt{\sum y_i^2 - (\sum y_i)^2/n}} = \dfrac{1{,}058{,}019 - (798)(11{,}688)/9}{\sqrt{71{,}306 - (798)^2/9}\,\sqrt{16{,}058{,}736 - (11{,}688)^2/9}} = .9856\)
Strong positive relationship.

67. a. [Scatter diagram of Earnings vs. Book Value]
b. The sample correlation coefficient is .75; this indicates a positive linear relationship between book value and earnings.

68. a. \((800 + 750 + 900)/3 = 817\)
b. Weights: January 1, February 2, March 3
\(\bar{x} = \sum w_i x_i / \sum w_i = [1(800) + 2(750) + 3(900)]/(1 + 2 + 3) = 5000/6 = 833\)

69. \(\bar{x} = \sum w_i x_i / \sum w_i = [20(20) + 30(12) + 10(7) + 15(5) + 10(6)]/(20 + 30 + 10 + 15 + 10) = 965/85 = 11.4\) days
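The weighted means in Exercises 52, 54, 68, and 69 and the grouped-data results in Exercises 53, 55 through 57, 70, and 71 rest on one idea: the grouped-data mean is a weighted mean of the class midpoints \(M_i\) with the frequencies \(f_i\) as weights, and the sample variance divides \(\sum f_i (M_i - \bar{x})^2\) by \(n - 1\). The Python sketch below illustrates this; the function names are our own and are not part of Excel or the text.

```python
import math

def weighted_mean(values, weights):
    """xbar = sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

def grouped_stats(midpoints, freqs):
    """Mean, sample variance, and standard deviation for grouped data."""
    n = sum(freqs)
    mean = weighted_mean(midpoints, freqs)  # frequencies act as the weights
    var = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, freqs)) / (n - 1)
    return mean, var, math.sqrt(var)

# Exercise 69: 965/85 days
print(round(weighted_mean([20, 12, 7, 5, 6], [20, 30, 10, 15, 10]), 1))  # 11.4
# Exercise 53: midpoints 5, 10, 15, 20 with frequencies 4, 7, 9, 5
print(grouped_stats([5, 10, 15, 20], [4, 7, 9, 5]))  # (13.0, 25.0, 5.0)
```

Passing all weights equal to 1 reduces `weighted_mean` to the ordinary mean, which is the comparison drawn in Exercise 68, parts (a) and (b).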
70. a.
fi    Mi   fiMi     Mi - x̄   (Mi - x̄)²   fi(Mi - x̄)²
10    47   470      -13.68   187.1424    1871.42
40    52   2080     -8.68    75.3424     3013.70
150   57   8550     -3.68    13.5424     2031.36
175   62   10850    +1.32    1.7424      304.92
75    67   5025     +6.32    39.9424     2995.68
15    72   1080     +11.32   128.1424    1922.14
10    77   770      +16.32   266.3424    2663.42
475        28,825                        14,802.64
\(\bar{x} = \sum f_i M_i / n = 28{,}825/475 = 60.68\)
b. \(s^2 = \sum f_i (M_i - \bar{x})^2 / (n - 1) = 14{,}802.64/474 = 31.23\), \(s = \sqrt{31.23} = 5.59\)

71.
fi   Mi     fiMi      Mi - x̄   (Mi - x̄)²   fi(Mi - x̄)²
2    29.5   59.0      -22      484         968
6    39.5   237.0     -12      144         864
4    49.5   198.0     -2       4           16
4    59.5   238.0     8        64          256
2    69.5   139.0     18       324         648
2    79.5   159.0     28       784         1568
20          1,030.0                        4320
\(\bar{x} = \sum f_i M_i / n = 1030/20 = 51.5\)
\(s^2 = 4320/19 = 227.37\), \(s = 15.08\)

Chapter 4
Introduction to Probability
Learning Objectives
1. Obtain an appreciation of the role probability information plays in the decision-making process.
2. Understand probability as a numerical measure of the likelihood of occurrence.
3. Know the three methods commonly used for assigning probabilities and understand when they should be used.
4. Know how to use the laws that are available for computing the probabilities of events.
5. Understand how new information can be used to revise initial (prior) probability estimates using Bayes' theorem.

Solutions:

1. Number of experimental outcomes = (3)(2)(4) = 24

2. \(\binom{6}{3} = \dfrac{6!}{3!\,3!} = \dfrac{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(3 \cdot 2 \cdot 1)(3 \cdot 2 \cdot 1)} = 20\)
ABC ABD ABE ABF ACD ACE ACF ADE ADF AEF BCD BCE BCF BDE BDF BEF CDE CDF CEF DEF

3. \(P_3^6 = \dfrac{6!}{(6 - 3)!} = (6)(5)(4) = 120\)
BDF BFD DBF DFB FBD FDB (the six orderings of B, D, and F)

4. a. [Tree diagram: 1st toss, 2nd toss, and 3rd toss, each H or T, ending in the eight outcomes listed in part (b)]
b. Let H be head and T be tail:
(H,H,H) (H,H,T) (H,T,H) (H,T,T) (T,H,H) (T,H,T) (T,T,H) (T,T,T)
c. The outcomes are equally likely, so the probability of each outcome is 1/8.

5. \(P(E_i) = 1/5\) for i = 1, 2, 3, 4, 5
\(P(E_i) \ge 0\) for i = 1, 2, 3, 4, 5
\(P(E_1) + P(E_2) + P(E_3) + P(E_4) + P(E_5) = 1/5 + 1/5 + 1/5 + 1/5 + 1/5 = 1\)
The classical method was used.

6. \(P(E_1) = .40\), \(P(E_2) = .26\), \(P(E_3) = .34\)
The relative frequency method was used.
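Exercises 1 through 4 (and Exercises 9 and 12 later in this chapter) rely on the counting rules for multiple-step experiments, combinations, and permutations. As a sketch, Python's standard library can confirm these counts: `math.comb` and `math.perm` (Python 3.8 or later) return the numbers of combinations and permutations, and `itertools` lists the outcomes themselves.

```python
from itertools import combinations, permutations, product
from math import comb, perm

# Exercise 1: a three-step experiment with 3, 2, and 4 outcomes per step
print(3 * 2 * 4)  # 24

# Exercise 2: combinations of the 6 items A-F taken 3 at a time
print(comb(6, 3))  # 20
print(["".join(c) for c in combinations("ABCDEF", 3)][:5])  # ['ABC', 'ABD', 'ABE', 'ABF', 'ACD']

# Exercise 3: permutations of 6 items taken 3 at a time
print(perm(6, 3))  # 120
print(["".join(p) for p in permutations("BDF")])  # ['BDF', 'BFD', 'DBF', 'DFB', 'FBD', 'FDB']

# Exercise 4: sample space for three coin tosses
print(len(list(product("HT", repeat=3))))  # 8

# Exercises 9 and 12
print(comb(50, 4), comb(49, 5), comb(49, 5) * 42)  # 230300 1906884 80089128
```

Each permutation count equals the combination count times the number of orderings of the selected items, for example 120 = 20 × 6.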
7. No. Requirement (4.3) is not satisfied; the probabilities do not sum to 1:
\(P(E_1) + P(E_2) + P(E_3) + P(E_4) = .10 + .15 + .40 + .20 = .85\)

8. a. There are four outcomes possible for this 2-step experiment: planning commission positive, council approves; planning commission positive, council disapproves; planning commission negative, council approves; planning commission negative, council disapproves.
b. Let p = positive, n = negative, a = approves, and d = disapproves.
[Tree diagram: Planning Commission (p or n), then Council (a or d), giving outcomes (p, a), (p, d), (n, a), (n, d)]

9. \(\binom{50}{4} = \dfrac{50!}{4!\,46!} = \dfrac{50 \cdot 49 \cdot 48 \cdot 47}{4 \cdot 3 \cdot 2 \cdot 1} = 230{,}300\)

10. a. Use the relative frequency approach: P(California) = 1,434/2,374 = .60
b. Number not from 4 states = 2,374 - 1,434 - 390 - 217 - 112 = 221
P(Not from 4 states) = 221/2,374 = .09
c. P(Not in early stages) = 1 - .22 = .78
d. Estimate of the number of Massachusetts companies in the early stage of development = (.22)(390) ≈ 86
e. If we assume the size of the awards did not differ by state, we can multiply the probability that an award went to Colorado by the total venture funds disbursed to get an estimate:
Estimate of Colorado funds = (112/2,374)($32.4 billion) = $1.53 billion
Authors' Note: The actual amount going to Colorado was $1.74 billion.

11. a. No, the probabilities do not sum to one. They sum to .85.
b. The owner must revise the probabilities so they sum to 1.00.

12. a. Use the counting rule for combinations:
\(\binom{49}{5} = \dfrac{49!}{5!\,44!} = \dfrac{(49)(48)(47)(46)(45)}{(5)(4)(3)(2)(1)} = 1{,}906{,}884\)
b. Very small: 1/1,906,884 = 0.0000005
c. Multiply the answer to part (a) by 42 to get the number of choices for the six numbers.
Number of choices = (1,906,884)(42) = 80,089,128
Probability of winning = 1/80,089,128 = 0.0000000125

13. Initially a probability of .20 would be assigned if selection is equally likely. The data do not appear to confirm the belief of equal consumer preference.
For example, using the relative frequency method, we would assign a probability of 5/100 = .05 to the design 1 outcome, .15 to design 2, .30 to design 3, .40 to design 4, and .10 to design 5.

14. a. \(P(E_2) = 1/4\)
b. P(any 2 outcomes) = 1/4 + 1/4 = 1/2
c. P(any 3 outcomes) = 1/4 + 1/4 + 1/4 = 3/4

15. a. S = {ace of clubs, ace of diamonds, ace of hearts, ace of spades}
b. S = {2 of clubs, 3 of clubs, ..., 10 of clubs, J of clubs, Q of clubs, K of clubs, A of clubs}
c. There are 12: jack, queen, or king in each of the four suits.
d. For (a): 4/52 = 1/13 = .08
For (b): 13/52 = 1/4 = .25
For (c): 12/52 = .23

16. a. (6)(6) = 36 sample points
b. Total for both dice:
               Die 2
Die 1    1    2    3    4    5    6
1        2    3    4    5    6    7
2        3    4    5    6    7    8
3        4    5    6    7    8    9
4        5    6    7    8    9    10
5        6    7    8    9    10   11
6        7    8    9    10   11   12
c. 6/36 = 1/6
d. 10/36 = 5/18
e. No. P(odd) = 18/36 = P(even) = 18/36, or 1/2 for both.
f. Classical. A probability of 1/36 is assigned to each experimental outcome.

17. a. (4, 6), (4, 7), (4, 8)
b. .05 + .10 + .15 = .30
c. (2, 8), (3, 8), (4, 8)
d. .05 + .05 + .15 = .25
e. .15

18. a. 0; probability is .05
b. 4, 5; probability is .10 + .10 = .20
c. 0, 1, 2; probability is .05 + .15 + .35 = .55

19. a. Yes, the probabilities are all greater than or equal to zero, and they sum to one.
b. P(A) = P(0) + P(1) + P(2) = .08 + .18 + .32 = .58
c. P(B) = P(4) = .12

20. a. P(N) = 56/500 = .112
b. P(T) = 43/500 = .086
c. Total in 6 states = 56 + 53 + 43 + 37 + 28 + 28 = 245
P(B) = 245/500 = .49
Almost half the Fortune 500 companies are headquartered in these states.

21. a. P(A) = P(1) + P(2) + P(3) + P(4) + P(5) = 20/50 + 12/50 + 6/50 + 3/50 + 1/50 = .40 + .24 + .12 + .06 + .02 = .84
b. P(B) = P(3) + P(4) + P(5) = .12 + .06 + .02 = .20
c. P(2) = 12/50 = .24

22. a. P(A) = .40, P(B) = .40, P(C) = .60
b. \(P(A \cup B) = P(E_1, E_2, E_3, E_4) = .80\). Yes, \(P(A \cup B) = P(A) + P(B)\).
c. \(A^c = \{E_3, E_4, E_5\}\), \(C^c = \{E_1, E_4\}\)
\(P(A^c) = .60\), \(P(C^c) = .40\)
d. \(A \cup B^c = \{E_1, E_2, E_5\}\), \(P(A \cup B^c) = .60\)
e.
\(P(B \cup C) = P(E_2, E_3, E_4, E_5) = .80\)

23. a. \(P(A) = P(E_1) + P(E_4) + P(E_6) = .05 + .25 + .10 = .40\)
\(P(B) = P(E_2) + P(E_4) + P(E_7) = .20 + .25 + .05 = .50\)
\(P(C) = P(E_2) + P(E_3) + P(E_5) + P(E_7) = .20 + .20 + .15 + .05 = .60\)
b. \(A \cup B = \{E_1, E_2, E_4, E_6, E_7\}\)
\(P(A \cup B) = P(E_1) + P(E_2) + P(E_4) + P(E_6) + P(E_7) = .05 + .20 + .25 + .10 + .05 = .65\)
c. \(A \cap B = \{E_4\}\), \(P(A \cap B) = P(E_4) = .25\)
d. Yes, they are mutually exclusive.
e. \(B^c = \{E_1, E_3, E_5, E_6\}\); \(P(B^c) = P(E_1) + P(E_3) + P(E_5) + P(E_6) = .05 + .20 + .15 + .10 = .50\)

24. Let E = experience exceeded expectations and M = experience met expectations.
a. Percentage of respondents who said their experience exceeded expectations = 100 - (4 + 26 + 65) = 5%; P(E) = .05
b. \(P(M \cup E) = P(M) + P(E) = .65 + .05 = .70\)

25. Let Y = high one-year return and M = high five-year return.
a. P(Y) = 15/30 = .50, P(M) = 12/30 = .40, \(P(Y \cap M) = 6/30 = .20\)
b. \(P(Y \cup M) = P(Y) + P(M) - P(Y \cap M) = .50 + .40 - .20 = .70\)
c. \(1 - P(Y \cup M) = 1 - .70 = .30\)

26. Let Y = high one-year return and M = high five-year return.
a. P(Y) = 9/30 = .30, P(M) = 7/30 = .23
b. \(P(Y \cap M) = 5/30 = .17\)
c. \(P(Y \cup M) = .30 + .23 - .17 = .36\)
P(Neither) = 1 - .36 = .64

27.
               Pac-10
Big Ten    Yes     No      Total
Yes        849     2112    2,961
No         3645    6823    10,468
Total      4494    8935    13,429
a. P(Neither) = 6823/13,429 = .51
b. P(Either) = 2961/13,429 + 4494/13,429 - 849/13,429 = .49
c. P(Both) = 849/13,429 = .06

28. Let B = rented a car for business reasons and P = rented a car for personal reasons.
a. \(P(B \cup P) = P(B) + P(P) - P(B \cap P) = .54 + .458 - .30 = .698\)
b. P(Neither) = 1 - .698 = .302

29. a. P(E) = 1033/2851 = .36
P(R) = 854/2851 = .30
P(D) = 964/2851 = .34
b. Yes; \(P(E \cap D) = 0\)
c. Probability = 1033/2375 = .43
d. Yes
e. \(P(E \cup A) = P(E) + P(A) = .36 + .18 = .54\)

30. a. \(P(A \mid B) = P(A \cap B)/P(B) = .40/.60 = .6667\)
b. \(P(B \mid A) = P(A \cap B)/P(A) = .40/.50 = .80\)
c. No, because \(P(A \mid B) \ne P(A)\).

31. a. \(P(A \cap B) = 0\)
b. \(P(A \mid B) = P(A \cap B)/P(B) = 0/.4 = 0\)
c. No. \(P(A \mid B) \ne P(A)\); the events, although mutually exclusive, are not independent.
d. Mutually exclusive events are dependent.
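Exercises 22 through 31 repeatedly apply the addition law, the definition of conditional probability, and the independence test \(P(A \cap B) = P(A)P(B)\). The helper functions below are a hypothetical sketch (the names are ours) that reproduce a few of those answers. Because the inputs are rounded probabilities, the independence check compares values within a tolerance instead of testing exact equality.

```python
import math

def union(p_a, p_b, p_ab):
    """Addition law: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_ab

def conditional(p_ab, p_b):
    """P(A | B) = P(A and B) / P(B); requires P(B) > 0."""
    return p_ab / p_b

def independent(p_a, p_b, p_ab, tol=1e-9):
    """True when P(A and B) equals P(A)P(B) within tol."""
    return math.isclose(p_a * p_b, p_ab, abs_tol=tol)

# Exercise 25: P(Y) = .50, P(M) = .40, P(Y and M) = .20
print(round(union(0.50, 0.40, 0.20), 2))  # 0.7

# Exercise 30: P(A) = .50, P(B) = .60, P(A and B) = .40
print(round(conditional(0.40, 0.60), 4))  # 0.6667
print(round(conditional(0.40, 0.50), 2))  # 0.8
print(independent(0.50, 0.60, 0.40))      # False
```

In Exercise 31 the events are mutually exclusive, so `conditional(0.0, 0.4)` returns 0; unless P(A) is itself 0, that differs from P(A), which is why mutually exclusive events are dependent.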
32. a.
              Single   Married   Total
Under 30      .55      .10       .65
30 or over    .20      .15       .35
Total         .75      .25       1.00
b. 65% of the customers are under 30.
c. The majority of customers are single: P(single) = .75.
d. .55
e. Let A = event under 30 and B = event single.
\(P(B \mid A) = P(A \cap B)/P(A) = .55/.65 = .8462\)
f. \(P(A \cap B) = .55\) and \(P(A)P(B) = (.65)(.75) = .49\). Since \(P(A \cap B) \ne P(A)P(B)\), they cannot be independent events; equivalently, since \(P(B \mid A) \ne P(B)\), they cannot be independent.

33. a.
                 Reason for Applying
             Quality   Cost/Convenience   Other   Total
Full Time    .218      .204               .039    .461
Part Time    .208      .307               .024    .539
Total        .426      .511               .063    1.00
b. A student is most likely to cite cost or convenience as the first reason (probability = .511). School quality is the first reason cited by the second largest number of students (probability = .426).
c. P(Quality | full time) = .218/.461 = .473
d. P(Quality | part time) = .208/.539 = .386
e. For independence, we must have \(P(A)P(B) = P(A \cap B)\). From the table, \(P(A \cap B) = .218\), P(A) = .461, and P(B) = .426, so P(A)P(B) = (.461)(.426) = .196. Since \(P(A)P(B) \ne P(A \cap B)\), the events are not independent.

34. a. P(O) = 0.38 + 0.06 = 0.44
b. P(Rh-) = 0.06 + 0.02 + 0.01 + 0.06 = 0.15
c. P(both Rh-) = P(Rh-)P(Rh-) = (0.15)(0.15) = 0.0225
d. P(both AB) = P(AB)P(AB) = (0.05)(0.05) = 0.0025
e. \(P(\text{Rh}^- \mid O) = P(\text{Rh}^- \cap O)/P(O) = 0.06/0.44 = 0.136\)
f. P(Rh+) = 1 - P(Rh-) = 1 - 0.15 = 0.85
\(P(B \mid \text{Rh}^+) = P(B \cap \text{Rh}^+)/P(\text{Rh}^+) = 0.09/0.85 = 0.106\)

35. a. P(Up for January) = 31/48 = 0.646
b. P(Up for Year) = 36/48 = 0.75
c. \(P(\text{Up for Year} \cap \text{Up for January}) = 29/48 = 0.604\)
\(P(\text{Up for Year} \mid \text{Up for January}) = 0.604/0.646 = 0.935\)
d. They are not independent, since P(Up for Year) ≠ P(Up for Year | Up for January): 0.75 ≠ 0.935.

36. a.
                       Satisfaction Score
Occupation           Under 50   50-59   60-69   70-79   80-89   Total
Cabinetmaker         .000       .050    .100    .075    .025    .250
Lawyer               .150       .050    .025    .025    .000    .250
Physical Therapist   .000       .125    .050    .025    .050    .250
Systems Analyst      .050       .025    .100    .075    .000    .250
Total                .200       .250    .275    .200    .075    1.000
b. P(80s) = .075 (a marginal probability)
c. P(80s | PT) = .050/.250 = .20 (a conditional probability)
d. P(L) = .250 (a marginal probability)
e. \(P(L \cap \text{Under 50}) = .150\) (a joint probability)
f. P(Under 50 | L) = .150/.250 = .60 (a conditional probability)
g. P(70 or higher) = .275 (sum of marginal probabilities)

37. a. \(P(A \cap B) = P(A)P(B) = (.55)(.35) = .19\)
b. \(P(A \cup B) = P(A) + P(B) - P(A \cap B) = .55 + .35 - .19 = .71\)
c. P(shutdown) = \(1 - P(A \cup B) = 1 - .71 = .29\)

38. a. P(Telephone) = 52/190 = 0.2737
b. This is an intersection of two events. It seems reasonable to assume the next two messages will be independent, so we use the multiplication rule for independent events:
\(P(\text{E-mail} \cap \text{Fax}) = P(\text{E-mail})P(\text{Fax}) = (30/190)(15/190) = 0.0125\)
c. This is a union of two mutually exclusive events:
\(P(\text{Telephone} \cup \text{Interoffice Mail}) = P(\text{Telephone}) + P(\text{Interoffice Mail}) = 52/190 + 18/190 = 70/190 = 0.3684\)

39. a. Yes, since \(P(A_1 \cap A_2) = 0\)
b. \(P(A_1 \cap B) = P(A_1)P(B \mid A_1) = .40(.20) = .08\)
\(P(A_2 \cap B) = P(A_2)P(B \mid A_2) = .60(.05) = .03\)
c. \(P(B) = P(A_1 \cap B) + P(A_2 \cap B) = .08 + .03 = .11\)
d. \(P(A_1 \mid B) = .08/.11 = .7273\)
\(P(A_2 \mid B) = .03/.11 = .2727\)

40. a. \(P(B \cap A_1) = P(A_1)P(B \mid A_1) = (.20)(.50) = .10\)
\(P(B \cap A_2) = P(A_2)P(B \mid A_2) = (.50)(.40) = .20\)
\(P(B \cap A_3) = P(A_3)P(B \mid A_3) = (.30)(.30) = .09\)
b. \(P(A_2 \mid B) = .20/(.10 + .20 + .09) = .51\)
c.
Events   P(Ai)   P(B | Ai)   P(Ai ∩ B)   P(Ai | B)
A1       .20     .50         .10         .26
A2       .50     .40         .20         .51
A3       .30     .30         .09         .23
         1.00                .39         1.00

41. Let S1 = successful, S2 = not successful, and B = request received for additional information.
a. \(P(S_1) = .50\)
b. \(P(B \mid S_1) = .75\)
c. \(P(S_1 \mid B) = \dfrac{(.50)(.75)}{(.50)(.75) + (.50)(.40)} = .375/.575 = .65\)

42. Let M = missed payment, D1 = customer defaults, and D2 = customer does not default.
\(P(D_1) = .05\), \(P(D_2) = .95\), \(P(M \mid D_2) = .2\), \(P(M \mid D_1) = 1\)
a. \(P(D_1 \mid M) = \dfrac{P(D_1)P(M \mid D_1)}{P(D_1)P(M \mid D_1) + P(D_2)P(M \mid D_2)} = \dfrac{(.05)(1)}{(.05)(1) + (.95)(.2)} = .05/.24 = .21\)
b. Yes, the probability of default is greater than .20.

43. Let S = small car, \(S^c\) = other type of vehicle, and F = accident leads to fatality for vehicle occupant.
We have P(S) = .18, so \(P(S^c) = .82\).
Also P(F | S) = .128 and P(F | Sᶜ) = .05. Using the tabular form of Bayes' Theorem provides:
Events   Prior Probabilities   Conditional Probabilities   Joint Probabilities   Posterior Probabilities
S               .18                     .128                       .023                   .36
Sᶜ              .82                     .050                       .041                   .64
Total          1.00                                                .064                  1.00
From the posterior probability column, we have P(S | F) = .36. So, if an accident leads to a fatality, the probability a small car was involved is .36.
44. Let A1 = story about basketball team, A2 = story about hockey team, and W = "We Win" headline.
P(A1) = .60, P(W | A1) = .641, P(A2) = .40, P(W | A2) = .462
Ai      P(Ai)   P(W | Ai)   P(W ∩ Ai)   P(Ai | W)
A1       .60      .641        .3846      .3846/.5694 = .6754
A2       .40      .462        .1848      .1848/.5694 = .3246
Total   1.00                  .5694                     1.0000
The probability the story is about the basketball team is .6754.
45. a. Let S = person is age 65 or older.
P(S) = 34,991,753/281,421,906 = .12
b. Let D = takes prescription drugs regularly.
P(D) = P(D ∩ S) + P(D ∩ Sᶜ) = P(D | S)P(S) + P(D | Sᶜ)P(Sᶜ) = .82(.12) + .49(.88) = .53
c. Let D5 = takes 5 or more prescriptions.
P(D5 ∩ S) = P(D5 | S)P(S) = .40(.12) = .048
d. P(S | D5) = P(S ∩ D5)/P(D5)
P(D5) = P(S ∩ D5) + P(Sᶜ ∩ D5) = P(D5 | S)P(S) + P(D5 | Sᶜ)P(Sᶜ) = .40(.12) + (.28)(.88) = .048 + .246 = .294
P(S | D5) = .048/.294 = .16
46. a. P(Excellent) = .18 and P(Pretty Good) = .50
P(Pretty Good ∪ Excellent) = .18 + .50 = .68
Note: The events are mutually exclusive since a person may only choose one rating.
b. 1035(.05) = 51.75
We estimate 52 respondents rated US companies poor.
c. 1035(.01) = 10.35
We estimate 10 respondents did not know or did not answer.
47. a. (2)(2) = 4
b. Let s = successful and u = unsuccessful. The tree diagram (oil first, then bonds) yields four experimental outcomes:
E1 = (s, s), E2 = (s, u), E3 = (u, s), E4 = (u, u)
c. O = {E1, E2}
d. M = {E1, E3}
O ∪ M = {E1, E2, E3}
e. O ∩ M = {E1}
f. No; since O ∩ M has a sample point.
48. a. P(satisfied) = 0.61
b. The 18-34 year old group (64% satisfied) and the 65 and over group (70% satisfied).
c. P(not satisfied) = 0.26 + 0.04 = 0.30
49.
Let I = treatment-caused injury, D = death from injury, N = injury caused by negligence, M = malpractice claim filed, and $ = payment made in claim.
We are given P(I) = 0.04, P(N | I) = 0.25, P(D | I) = 1/7, P(M | N) = 1/7.5 = 0.1333, and P($ | M) = 0.50.
a. P(N) = P(N | I)P(I) + P(N | Iᶜ)P(Iᶜ) = (0.25)(0.04) + (0)(0.96) = 0.01
b. P(D) = P(D | I)P(I) + P(D | Iᶜ)P(Iᶜ) = (1/7)(0.04) + (0)(0.96) = 0.006
c. P(M) = P(M | N)P(N) + P(M | Nᶜ)P(Nᶜ) = (0.1333)(0.01) + (0)(0.99) = 0.001333
P($) = P($ | M)P(M) + P($ | Mᶜ)P(Mᶜ) = (0.5)(0.001333) + (0)(0.9987) = 0.00067
50. a. Probability of the event = P(average) + P(above average) + P(excellent) = 11/50 + 14/50 + 13/50 = .22 + .28 + .26 = .76
b. Probability of the event = P(poor) + P(below average) = 4/50 + 8/50 = .24
51. a. P(leases 1) = 168/932 = 0.18
b. P(2 or fewer) = 401/932 + 242/932 + 65/932 = 708/932 = 0.76
c. P(3 or more) = 186/932 + 112/932 = 298/932 = 0.32
d. P(no cars) = 19/932 = 0.02
52. a.
         23 and Under   24-26   27-30   31-35   36 and Over   Total
Yes          .1026      .1482   .0917   .0327      .0253      .4005
No           .0996      .1878   .1328   .0956      .0837      .5995
Total        .2022      .3360   .2245   .1283      .1090     1.0000
b. .2022
c. .2245 + .1283 + .1090 = .4618
d. .4005
53. a. P(24 to 26 | Yes) = .1482/.4005 = .3700
b. P(Yes | 36 and over) = .0253/.1090 = .2321
c. .1026 + .1482 + .1878 + .0917 + .0327 + .0253 = .5883
d. P(31 or more | No) = (.0956 + .0837)/.5995 = .2991
e. No, because the conditional probabilities do not all equal the marginal probabilities. For instance, P(24 to 26 | Yes) = .3700 ≠ P(24 to 26) = .3360.
54. Let I = important or very important, M = male, and F = female.
a. P(I) = .49 (a marginal probability)
b. P(I | M) = .22/.50 = .44 (a conditional probability)
c. P(I | F) = .27/.50 = .54 (a conditional probability)
d. They are not independent: P(I) = .49 ≠ P(I | M) = .44 and P(I) = .49 ≠ P(I | F) = .54.
e. Since level of importance is dependent on gender, we conclude that male and female respondents have different attitudes toward risk.
55. a. P(B | S) = P(B ∩ S)/P(S) = .12/.40 = .30
We have P(B | S) > P(B). Yes, continue the ad since it increases the probability of a purchase.
b. Estimate the company's market share at 20%. Continuing the advertisement should increase the market share since P(B | S) = .30.
c. P(B | S) = P(B ∩ S)/P(S) = .10/.30 = .333
The second ad has a bigger effect.
56. a. P(A) = 200/800 = .25
b. P(B) = 100/800 = .125
c. P(A ∩ B) = 10/800 = .0125
d. P(A | B) = P(A ∩ B)/P(B) = .0125/.125 = .10
e. No, P(A | B) ≠ P(A) = .25
57. Let A = lost time accident in current year and B = lost time accident in previous year.
Given: P(B) = .06, P(A) = .05, P(A | B) = .15
a. P(A ∩ B) = P(A | B)P(B) = .15(.06) = .009
b. P(A ∪ B) = P(A) + P(B) - P(A ∩ B) = .05 + .06 - .009 = .101, or 10.1%
58. Let A = return is fraudulent and B = exceeds IRS standard for deductions.
Given: P(A | B) = .20, P(A | Bᶜ) = .02, and P(B) = .08; find P(A). Note P(Bᶜ) = 1 - P(B) = .92.
P(A) = P(A ∩ B) + P(A ∩ Bᶜ) = P(B)P(A | B) + P(Bᶜ)P(A | Bᶜ) = (.08)(.20) + (.92)(.02) = .0344
We estimate 3.44% will be fraudulent.
59. a. P(Oil) = .50 + .20 = .70
b. Let S = soil test results.
Events                 P(Ai)   P(S | Ai)   P(Ai ∩ S)    P(Ai | S)
High Quality (A1)       .50       .20         .10          .31
Medium Quality (A2)     .20       .80         .16          .50
No Oil (A3)             .30       .20         .06          .19
Total                  1.00                P(S) = .32     1.00
P(Oil) = .81, which is good; however, probabilities now favor medium quality rather than high quality oil.
60. a. Let A1 = field will produce oil, A2 = field will not produce oil, and W = well produces oil.
Events            P(Ai)   P(Wᶜ | Ai)   P(Wᶜ ∩ Ai)   P(Ai | Wᶜ)
Oil in Field       .25       .20          .05         .0625
No Oil in Field    .75      1.00          .75         .9375
Total             1.00                    .80        1.0000
The probability the field will produce oil given a well comes up dry is .0625.
b.
Events            P(Ai)    P(Wᶜ | Ai)   P(Wᶜ ∩ Ai)   P(Ai | Wᶜ)
Oil in Field      .0625       .20         .0125        .0132
No Oil in Field   .9375      1.00         .9375        .9868
Total            1.0000                   .9500       1.0000
The probability the field will produce oil drops further to .0132.
c. Suppose a third well comes up dry. The probabilities are revised as follows:
Events            P(Ai)    P(Wᶜ | Ai)   P(Wᶜ ∩ Ai)   P(Ai | Wᶜ)
Oil in Field      .0132       .20         .0026        .0026
No Oil in Field   .9868      1.00         .9868        .9974
Total            1.0000                   .9894       1.0000
Stop drilling and abandon the field if three consecutive wells come up dry.

Chapter 5
Discrete Probability Distributions

Learning Objectives
1. Understand the concepts of a random variable and a probability distribution.
2. Be able to distinguish between discrete and continuous random variables.
3. Be able to compute and interpret the expected value, variance, and standard deviation for a discrete random variable and understand how an Excel worksheet can be used to ease the burden of the calculations.
4. Be able to compute probabilities using a binomial probability distribution and be able to compute these probabilities using Excel's BINOMDIST function.
5. Be able to compute probabilities using a Poisson probability distribution and be able to compute these probabilities using Excel's POISSON function.
6. Know when and how to use the hypergeometric probability distribution and be able to compute these probabilities using Excel's HYPGEOMDIST function.

Solutions:
1. a. Head, Head (H,H); Head, Tail (H,T); Tail, Head (T,H); Tail, Tail (T,T)
b. x = number of heads on two coin tosses
c.
Outcome   Value of x
(H,H)         2
(H,T)         1
(T,H)         1
(T,T)         0
d. Discrete. It may assume 3 values: 0, 1, and 2.
2. a. Let x = time (in minutes) to assemble the product.
b. It may assume any positive value: x > 0.
c. Continuous
3. Let Y = position is offered and N = position is not offered.
a. S = {(Y,Y,Y), (Y,Y,N), (Y,N,Y), (Y,N,N), (N,Y,Y), (N,Y,N), (N,N,Y), (N,N,N)}
b. Let N = number of offers made; N is a discrete random variable.
c.
Experimental Outcome   Value of N
(Y,Y,Y)                    3
(Y,Y,N)                    2
(Y,N,Y)                    2
(Y,N,N)                    1
(N,Y,Y)                    2
(N,Y,N)                    1
(N,N,Y)                    1
(N,N,N)                    0
4. x = 0, 1, 2, ..., 12.
5. a. S = {(1,1), (1,2), (1,3), (2,1), (2,2), (2,3)}
b.
Experimental Outcome   Number of Steps Required
(1,1)                          2
(1,2)                          3
(1,3)                          4
(2,1)                          3
(2,2)                          4
(2,3)                          5
6. a. values: 0, 1, 2, ..., 20; discrete
b. values: 0, 1, 2, ...; discrete
c. values: 0, 1, 2, ..., 50; discrete
d. values: 0 ≤ x ≤ 8; continuous
e. values: x > 0; continuous
7. a. f(x) ≥ 0 for all values of x, and Σf(x) = 1. Therefore, it is a proper probability distribution.
b. Probability x = 30 is f(30) = .25
c. Probability x ≤ 25 is f(20) + f(25) = .20 + .15 = .35
d. Probability x > 30 is f(35) = .40
8. a.
x       f(x)
1       3/20 = .15
2       5/20 = .25
3       8/20 = .40
4       4/20 = .20
Total   1.00
b. [Graph of f(x) versus x for x = 1, 2, 3, 4]
c. f(x) ≥ 0 for x = 1, 2, 3, 4, and Σf(x) = 1.
9. a.
Age     Number of Children   f(x)
6            37,369          0.018
7            87,436          0.043
8           160,840          0.080
9           239,719          0.119
10          286,719          0.142
11          306,533          0.152
12          310,787          0.154
13          302,604          0.150
14          289,168          0.143
Total     2,021,175          1.001
b. [Graph of f(x) versus age x for x = 6 through 14]
c. f(x) ≥ 0 for every x, and Σf(x) = 1.
Note: Σf(x) = 1.001 in part (a); the difference from 1 is due to rounding the values of f(x).
10. a.
x       1      2      3      4      5     Total
f(x)   0.05   0.09   0.03   0.42   0.41   1.00
b.
x       1      2      3      4      5     Total
f(x)   0.04   0.10   0.12   0.46   0.28   1.00
c. P(4 or 5) = f(4) + f(5) = 0.42 + 0.41 = 0.83
d. Probability of very satisfied: 0.28
e. Senior executives appear to be more satisfied than middle managers. 83% of senior executives have a score of 4 or 5, with 41% reporting a 5. Only 28% of middle managers report being very satisfied.
11. a.
Duration of Call x    1      2      3      4     Total
f(x)                 0.25   0.25   0.25   0.25   1.00
b. [Graph of f(x) = 0.25 for x = 1, 2, 3, 4]
c. f(x) ≥ 0 and f(1) + f(2) + f(3) + f(4) = 0.25 + 0.25 + 0.25 + 0.25 = 1.00
d. f(3) = 0.25
e. P(overtime) = f(3) + f(4) = 0.25 + 0.25 = 0.50
12. a. Yes; f(x) ≥ 0 for all x and Σf(x) = .15 + .20 + .30 + .25 + .10 = 1
b. P(1200 or less) = f(1000) + f(1100) + f(1200) = .15 + .20 + .30 = .65
13. a. Yes, since f(x) ≥ 0 for x = 1, 2, 3 and Σf(x) = f(1) + f(2) + f(3) = 1/6 + 2/6 + 3/6 = 1
b. f(2) = 2/6 = .333
c. f(2) + f(3) = 2/6 + 3/6 = .833
14. a. f(200) = 1 - f(-100) - f(0) - f(50) - f(100) - f(150) = 1 - .95 = .05
This is the probability MRA will have a $200,000 profit.
b. P(profit) = f(50) + f(100) + f(150) + f(200) = .30 + .25 + .10 + .05 = .70
c. P(at least 100) = f(100) + f(150) + f(200) = .25 + .10 + .05 = .40
15. a.
x     f(x)   xf(x)
3     .25     .75
6     .50    3.00
9     .25    2.25
     1.00    6.00
E(x) = μ = 6.00
b.
x    x - μ   (x - μ)^2   f(x)   (x - μ)^2 f(x)
3     -3         9        .25        2.25
6      0         0        .50        0.00
9      3         9        .25        2.25
                                     4.50
Var(x) = σ^2 = 4.50
c. σ = √4.50 = 2.12
16. a.
y     f(y)   yf(y)
2     .20     .40
4     .30    1.20
7     .40    2.80
8     .10     .80
     1.00    5.20
E(y) = μ = 5.20
b.
y    y - μ   (y - μ)^2   f(y)   (y - μ)^2 f(y)
2    -3.20     10.24      .20        2.048
4    -1.20      1.44      .30         .432
7     1.80      3.24      .40        1.296
8     2.80      7.84      .10         .784
                                     4.560
Var(y) = 4.56
σ = √4.56 = 2.14
17. a/b.
x    f(x)   xf(x)   x - μ   (x - μ)^2   (x - μ)^2 f(x)
0    .10     .00    -2.45     6.0025       .600250
1    .15     .15    -1.45     2.1025       .315375
2    .30     .60     -.45      .2025       .060750
3    .20     .60      .55      .3025       .060500
4    .15     .60     1.55     2.4025       .360375
5    .10     .50     2.55     6.5025       .650250
            2.45                          2.047500
E(x) = μ = 2.45
σ^2 = 2.0475
σ = √2.0475 = 1.4309
18. a/b.
x    f(x)   xf(x)   x - μ   (x - μ)^2   (x - μ)^2 f(x)
0    .01     .00     -2.3      5.29         .0529
1    .23     .23     -1.3      1.69         .3887
2    .41     .82     -0.3      0.09         .0369
3    .20     .60      0.7      0.49         .0980
4    .10     .40      1.7      2.89         .2890
5    .05     .25      2.7      7.29         .3645
            2.30                           1.2300
E(x) = 2.30
Var(x) = σ^2 = 1.2300
The expected value, E(x) = 2.3, of the probability distribution is the same as the average reported in the 1997 Statistical Abstract of the United States. The variance of the number of television sets per household is Var(x) = 1.23 television sets squared. The standard deviation is σ = √1.23 = 1.11 television sets.
19. a. E(x) = Σxf(x) = 0(.50) + 2(.50) = 1.00
b. E(x) = Σxf(x) = 0(.61) + 3(.39) = 1.17
c. The expected value of a 3-point shot is higher. So, if these probabilities hold up, the team will make more points in the long run with the 3-point shot.
20. a.
x       f(x)    xf(x)
0       .90      0.00
400     .04     16.00
1000    .03     30.00
2000    .01     20.00
4000    .01     40.00
6000    .01     60.00
       1.00    166.00
E(x) = 166. If the company charged a premium of $166.00, it would break even.
b.
Gain to Policy Holder   f(Gain)   (Gain)f(Gain)
   -260.00                .90       -234.00
    140.00                .04          5.60
    740.00                .03         22.20
  1,740.00                .01         17.40
  3,740.00                .01         37.40
  5,740.00                .01         57.40
                                     -94.00
E(gain) = -94.00. The policy holder is more concerned that the big accident will break him than with the expected annual loss of $94.00.
21. a. E(x) = Σxf(x) = 0.05(1) + 0.09(2) + 0.03(3) + 0.42(4) + 0.41(5) = 4.05
b. E(x) = Σxf(x) = 0.04(1) + 0.10(2) + 0.12(3) + 0.46(4) + 0.28(5) = 3.84
c. Executives: σ^2 = Σ(x - μ)^2 f(x) = 1.2475; σ = 1.1169
d. Middle Managers: σ^2 = Σ(x - μ)^2 f(x) = 1.1344; σ = 1.0651
e. The senior executives have a higher average score: 4.05 vs. 3.84 for the middle managers. The executives also have a slightly higher standard deviation.
22. a. E(x) = Σxf(x) = 300(.20) + 400(.30) + 500(.35) + 600(.15) = 445
The monthly order quantity should be 445 units.
b. Cost: 445 @ $50 = $22,250
Revenue: 300 @ $70 = $21,000
Loss: $1,250
23. a. Laptop: E(x) = .47(0) + .45(1) + .06(2) + .02(3) = .63
Desktop: E(x) = .06(0) + .56(1) + .28(2) + .10(3) = 1.42
b. Laptop: Var(x) = .47(-.63)^2 + .45(.37)^2 + .06(1.37)^2 + .02(2.37)^2 = .4731
Desktop: Var(x) = .06(-1.42)^2 + .56(-.42)^2 + .28(.58)^2 + .10(1.58)^2 = .5636
c. From the expected values in part (a), it is clear that the typical subscriber has more desktop computers than laptops. There is not much difference in the variances for the two types of computers.
24. a. Medium: E(x) = Σxf(x) = 50(.20) + 150(.50) + 200(.30) = 145
Large: E(x) = Σxf(x) = 0(.20) + 100(.50) + 300(.30) = 140
Medium preferred.
b.
Medium
x      f(x)   x - μ   (x - μ)^2   (x - μ)^2 f(x)
50     .20     -95       9025         1805.0
150    .50       5         25           12.5
200    .30      55       3025          907.5
                                 σ^2 = 2725.0
Large
y      f(y)   y - μ   (y - μ)^2   (y - μ)^2 f(y)
0      .20    -140      19600          3920
100    .50     -40       1600           800
300    .30     160      25600          7680
                                 σ^2 = 12,400
Medium preferred due to less variance.
25. a. [Tree diagram for two trials (S = success, F = failure): outcomes (S, S), (S, F), (F, S), (F, F) with x = 2, 1, 1, 0]
b. f(1) = [2!/(1!1!)](.4)^1(.6)^1 = (2)(.4)(.6) = .48
c. f(0) = [2!/(0!2!)](.4)^0(.6)^2 = (1)(.36) = .36
d. f(2) = [2!/(2!0!)](.4)^2(.6)^0 = (.16)(1) = .16
e. P(x ≥ 1) = f(1) + f(2) = .48 + .16 = .64
f. E(x) = np = 2(.4) = .8
Var(x) = np(1 - p) = 2(.4)(.6) = .48
σ = √.48 = .6928
26. a. f(0) = .3487
b. f(2) = .1937
c. P(x ≤ 2) = f(0) + f(1) + f(2) = .3487 + .3874 + .1937 = .9298
d. P(x ≥ 1) = 1 - f(0) = 1 - .3487 = .6513
e. E(x) = np = 10(.1) = 1
f. Var(x) = np(1 - p) = 10(.1)(.9) = .9
σ = √.9 = .9487
27. a. f(12) = .1144
b. f(16) = .1304
c. P(x ≥ 16) = f(16) + f(17) + f(18) + f(19) + f(20) = .1304 + .0716 + .0278 + .0068 + .0008 = .2374
d. P(x ≤ 15) = 1 - P(x ≥ 16) = 1 - .2374 = .7626
e. E(x) = np = 20(.7) = 14
f. Var(x) = np(1 - p) = 20(.7)(.3) = 4.2
σ = √4.2 = 2.0494
28. a. f(2) = C(6,2)(.33)^2(.67)^4 = .3292
b. P(at least 2) = 1 - f(0) - f(1) = 1 - C(6,0)(.33)^0(.67)^6 - C(6,1)(.33)^1(.67)^5 = 1 - .0905 - .2673 = .6422
c. f(0) = C(10,0)(.33)^0(.67)^10 = .0182
29. P(at least 5) = 1 - f(0) - f(1) - f(2) - f(3) - f(4) = 1 - .0000 - .0005 - .0031 - .0123 - .0350 = .9491
30. a. The probability of a defective part being produced must be .03 for each trial, and the trials must be independent.
b. Let D = defective and G = not defective. The tree diagram (1st part, then 2nd part) gives:
Experimental Outcome   Number Defective
(D, D)                        2
(D, G)                        1
(G, D)                        1
(G, G)                        0
c. 2 outcomes result in exactly one defect.
d. P(no defects) = (.97)(.97) = .9409
P(1 defect) = 2(.03)(.97) = .0582
P(2 defects) = (.03)(.03) = .0009
31. Binomial with n = 10 and p = .05:
f(x) = [10!/(x!(10 - x)!)](.05)^x(.95)^(10 - x)
a. Yes. Since they are selected randomly, p is the same from trial to trial and the trials are independent.
b.
f(2) = .0746
c. f(0) = .5987
d. P(at least 1) = 1 - f(0) = 1 - .5987 = .4013
32. a. .90
b. P(at least 1) = f(1) + f(2)
f(1) = [2!/(1!1!)](.9)^1(.1)^1 = 2(.9)(.1) = .18
f(2) = [2!/(2!0!)](.9)^2(.1)^0 = 1(.81)(1) = .81
P(at least 1) = .18 + .81 = .99
Alternatively, P(at least 1) = 1 - f(0), where f(0) = [2!/(0!2!)](.9)^0(.1)^2 = .01. Therefore, P(at least 1) = 1 - .01 = .99.
c. P(at least 1) = 1 - f(0), where f(0) = [3!/(0!3!)](.9)^0(.1)^3 = .001. Therefore, P(at least 1) = 1 - .001 = .999.
d. Yes; P(at least 1) becomes very close to 1 with multiple systems, and the inability to detect an attack would be catastrophic.
33. a. f(12) = [20!/(12!8!)](.5)^12(.5)^8
Using Table 5 in Appendix B (n = 20, p = .50), f(12) = .1201.
b. f(0) + f(1) + f(2) + f(3) + f(4) + f(5) = .0000 + .0000 + .0002 + .0011 + .0046 + .0148 = .0207
c. E(x) = np = 20(.5) = 10
d. Var(x) = σ^2 = np(1 - p) = 20(.5)(.5) = 5
σ = √5 = 2.24
34. a. f(3) = .0634
b. The answer here is the same as part (a). The probability of 12 failures with p = .60 is the same as the probability of 3 successes with p = .40.
c. f(3) + f(4) + ... + f(15) = 1 - f(0) - f(1) - f(2) = 1 - .0005 - .0047 - .0219 = .9729
35. a. f(0) + f(1) + f(2) = .0115 + .0576 + .1369 = .2060
b. f(4) = .2182
c. 1 - [f(0) + f(1) + f(2) + f(3)] = 1 - .2060 - .2054 = .5886
d. μ = np = 20(.20) = 4
36.
x    f(x)    x - μ   (x - μ)^2   (x - μ)^2 f(x)
0    .343     -.9       .81          .27783
1    .441      .1       .01          .00441
2    .189     1.1      1.21          .22869
3    .027     2.1      4.41          .11907
    1.000                      σ^2 = .63000
37. E(x) = np = 30(0.29) = 8.7
σ^2 = np(1 - p) = 30(0.29)(0.71) = 6.177
σ = √6.177 = 2.485
38. a. f(x) = 3^x e^(-3)/x!
b. f(2) = 3^2 e^(-3)/2! = 9(.0498)/2 = .2241
c. f(1) = 3^1 e^(-3)/1! = 3(.0498) = .1494
d. P(x ≥ 2) = 1 - f(0) - f(1) = 1 - .0498 - .1494 = .8008
39. a. f(x) = 2^x e^(-2)/x!
b. μ = 6 for 3 time periods
c. f(x) = 6^x e^(-6)/x!
d. f(2) = 2^2 e^(-2)/2! = 4(.1353)/2 = .2706
e. f(6) = 6^6 e^(-6)/6! = .1606
f. f(5) = 4^5 e^(-4)/5! = .1563
40. a. μ = 48(5/60) = 4
f(3) = 4^3 e^(-4)/3! = (64)(.0183)/6 = .1952
b. μ = 48(15/60) = 12
f(10) = 12^10 e^(-12)/10! = .1048
c.
μ = 48(5/60) = 4
I expect 4 callers to be waiting after 5 minutes.
f(0) = 4^0 e^(-4)/0! = .0183
The probability none will be waiting after 5 minutes is .0183.
d. μ = 48(3/60) = 2.4
f(0) = 2.4^0 e^(-2.4)/0! = .0907
The probability of no interruptions in 3 minutes is .0907.
41. a. 30 per hour
b. μ = 1(5/2) = 5/2
f(3) = (5/2)^3 e^(-5/2)/3! = .2138
c. f(0) = (5/2)^0 e^(-5/2)/0! = e^(-5/2) = .0821
42. a. f(0) = 7^0 e^(-7)/0! = e^(-7) = .0009
b. probability = 1 - [f(0) + f(1)]
f(1) = 7^1 e^(-7)/1! = 7e^(-7) = .0064
probability = 1 - [.0009 + .0064] = .9927
c. μ = 3.5
f(0) = 3.5^0 e^(-3.5)/0! = e^(-3.5) = .0302
probability = 1 - f(0) = 1 - .0302 = .9698
d. probability = 1 - [f(0) + f(1) + f(2) + f(3) + f(4)] = 1 - [.0009 + .0064 + .0223 + .0521 + .0912] = .8271
43. a. f(0) = 10^0 e^(-10)/0! = e^(-10) = .000045
b. f(0) + f(1) + f(2) + f(3)
f(0) = .000045 (part a)
f(1) = 10^1 e^(-10)/1! = .00045
Similarly, f(2) = .00225 and f(3) = .0075, so f(0) + f(1) + f(2) + f(3) = .010245.
c. 2.5 arrivals per 15-second period. Use μ = 2.5.
f(0) = 2.5^0 e^(-2.5)/0! = .0821
d. 1 - f(0) = 1 - .0821 = .9179
44. The Poisson distribution applies.
a. μ = 1.25 per month
b. f(0) = 1.25^0 e^(-1.25)/0! = 0.2865
c. f(1) = 1.25^1 e^(-1.25)/1! = 0.3581
d. P(more than 1) = 1 - f(0) - f(1) = 1 - 0.2865 - 0.3581 = 0.3554
45. a. Average per month: μ = 18/12 = 1.5
b. f(0) = 1.5^0 e^(-1.5)/0! = e^(-1.5) = .2231
c. probability = 1 - [f(0) + f(1)] = 1 - [.2231 + .3347] = .4422
46. a. f(1) = C(3,1)C(7,3)/C(10,4) = [3!/(1!2!)][7!/(3!4!)]/[10!/(4!6!)] = (3)(35)/210 = .50
b. f(2) = C(3,2)C(7,0)/C(10,2) = (3)(1)/45 = .067
c. f(0) = C(3,0)C(7,2)/C(10,2) = (1)(21)/45 = .4667
d. f(2) = C(3,2)C(7,2)/C(10,4) = (3)(21)/210 = .30
47. f(3) = C(4,3)C(11,7)/C(15,10) = (4)(330)/3003 = .4396
48. Hypergeometric with N = 10 and r = 6.
a. f(2) = C(6,2)C(4,1)/C(10,3) = (15)(4)/120 = .50
b. Must be 0 or 1 prefer Coke Classic.
f(1) = C(6,1)C(4,2)/C(10,3) = (6)(6)/120 = .30
f(0) = C(6,0)C(4,3)/C(10,3) = (1)(4)/120 = .0333
P(majority Pepsi) = f(1) + f(0) = .3333
49. Parts a, b, and c involve the hypergeometric distribution with N = 52 and n = 2.
a. r = 20, x = 2
f(2) = C(20,2)C(32,0)/C(52,2) = (190)(1)/1326 = .1433
b.
r = 4, x = 2
f(2) = C(4,2)C(48,0)/C(52,2) = (6)(1)/1326 = .0045
c. r = 16, x = 2
f(2) = C(16,2)C(36,0)/C(52,2) = (120)(1)/1326 = .0905
d. Part (a) provides the probability of blackjack plus the probability of 2 aces plus the probability of two 10s. To find the probability of blackjack, we subtract the probabilities in (b) and (c) from the probability in (a).
P(blackjack) = .1433 - .0045 - .0905 = .0483
50. N = 60, n = 10
a. r = 20, x = 0
f(0) = C(20,0)C(40,10)/C(60,10) = [40!/(10!30!)][10!50!/60!] = (40·39·38·37·36·35·34·33·32·31)/(60·59·58·57·56·55·54·53·52·51) ≈ .01
b. r = 20, x = 1
f(1) = C(20,1)C(40,9)/C(60,10) = 20[40!/(9!31!)][10!50!/60!] ≈ .07
c. 1 - f(0) - f(1) = 1 - .08 = .92
d. Same as the probability one will be from Hawaii. In part (b) that was found to equal approximately .07.
51. a. f(2) = C(11,2)C(14,3)/C(25,5) = (55)(364)/53,130 = .3768
b. f(2) = C(14,2)C(11,3)/C(25,5) = (91)(165)/53,130 = .2826
c. f(5) = C(14,5)C(11,0)/C(25,5) = (2002)(1)/53,130 = .0377
d. f(0) = C(14,0)C(11,5)/C(25,5) = (1)(462)/53,130 = .0087
52. Hypergeometric with N = 10 and r = 2. Focus on the probability of 0 defectives; then the probability of rejecting the shipment is 1 - f(0).
a. n = 3, x = 0
f(0) = C(2,0)C(8,3)/C(10,3) = 56/120 = .4667
P(reject) = 1 - .4667 = .5333
b. n = 4, x = 0
f(0) = C(2,0)C(8,4)/C(10,4) = 70/210 = .3333
P(reject) = 1 - .3333 = .6667
c. n = 5, x = 0
f(0) = C(2,0)C(8,5)/C(10,5) = 56/252 = .2222
P(reject) = 1 - .2222 = .7778
d. Continue the process. n = 7 would be required, with the probability of rejecting = .9333.
53. a., b., and c.
x    f(x)   xf(x)   x - μ   (x - μ)^2   (x - μ)^2 f(x)
1    0.18   0.18    -2.30     5.29         0.9522
2    0.18   0.36    -1.30     1.69         0.3042
3    0.03   0.09    -0.30     0.09         0.0027
4    0.38   1.52     0.70     0.49         0.1862
5    0.23   1.15     1.70     2.89         0.6647
     1.00   3.30                           2.1100
E(x) = μ = 3.30
σ^2 = 2.1100
σ = √2.1100 = 1.4526
54. a. and b.
x    f(x)   xf(x)   x - μ   (x - μ)^2   (x - μ)^2 f(x)
1    0.02   0.02    -2.64    6.9696       0.139392
2    0.06   0.12    -1.64    2.6896       0.161376
3    0.28   0.84    -0.64    0.4096       0.114688
4    0.54   2.16     0.36    0.1296       0.069984
5    0.10   0.50     1.36    1.8496       0.184960
     1.00   3.64                          0.670400
f(x) ≥ 0 and Σf(x) = 1
E(x) = μ = 3.64
Var(x) = σ^2 = 0.6704
c. People do appear to believe the stock market is overvalued. The average response is slightly over halfway between "fairly valued" and "somewhat over valued."
55. a.
x     f(x)
9     .30
10    .20
11    .25
12    .05
13    .20
b. E(x) = Σxf(x) = 9(.30) + 10(.20) + 11(.25) + 12(.05) + 13(.20) = 10.65
Expected value of expenses: $10.65 million
c. Var(x) = Σ(x - μ)^2 f(x) = (9 - 10.65)^2(.30) + (10 - 10.65)^2(.20) + (11 - 10.65)^2(.25) + (12 - 10.65)^2(.05) + (13 - 10.65)^2(.20) = 2.1275
d. Looks good: E(profit) = 12 - 10.65 = 1.35 million
However, there is a .20 probability that expenses will equal $13 million and the college will run a deficit.
56. a. n = 20 and x = 3
f(3) = C(20,3)(0.04)^3(0.96)^17 = 0.0364
b. n = 20 and x = 0
f(0) = C(20,0)(0.04)^0(0.96)^20 = 0.4420
c. E(x) = np = 1200(0.04) = 48
The expected number of appeals is 48.
d. σ^2 = np(1 - p) = 1200(0.04)(0.96) = 46.08
σ = √46.08 = 6.7882
57. a. We must have E(x) = np ≥ 10.
With p = .4, this leads to n(.4) ≥ 10, so n ≥ 25.
b. With p = .12, this leads to n(.12) ≥ 10, so n ≥ 83.33.
So, we must contact 84 people in this age group to have an expected number of internet users of at least 10.
c. σ = √(25(.4)(.6)) = 2.45
d. σ = √(84(.12)(.88)) = 2.98
58. Since the shipment is large, we can assume that the probabilities do not change from trial to trial and use the binomial probability distribution.
a. n = 5
f(0) = C(5,0)(.01)^0(.99)^5 = .9510
b. f(1) = C(5,1)(.01)^1(.99)^4 = .0480
c. 1 - f(0) = 1 - .9510 = .0490
d. No, the probability of finding one or more items in the sample defective when only 1% of the items in the population are defective is small (only .0490). I would consider it likely that more than 1% of the items are defective.
59. a. E(x) = np = 100(.041) = 4.1
b. Var(x) = np(1 - p) = 100(.041)(.959) = 3.9319
σ = √3.9319 = 1.9829
60. a. E(x) = 800(.41) = 328
b. σ = √(np(1 - p)) = √(800(.41)(.59)) = 13.91
c. For this one p = .59 and (1 - p) = .41, but the answer is the same as in part (b). For a binomial probability distribution, the variance for the number of successes is the same as the variance for the number of failures. Of course, this also holds true for the standard deviation.
61. μ = 15
Probability of 20 or more arrivals = f(20) + f(21) + ... = .0418 + .0299 + .0204 + .0133 + .0083 + .0050 + .0029 + .0016 + .0009 + .0004 + .0002 + .0001 + .0001 = .1249
62. μ = 1.5
Probability of 3 or more breakdowns is 1 - [f(0) + f(1) + f(2)] = 1 - [.2231 + .3347 + .2510] = 1 - .8088 = .1912
63. μ = 10
f(4) = .0189
64. a. f(3) = 3^3 e^(-3)/3! = 0.2240
b. f(3) + f(4) + ... = 1 - [f(0) + f(1) + f(2)]
f(0) = 3^0 e^(-3)/0! = e^(-3) = .0498
Similarly, f(1) = .1494 and f(2) = .2240, so 1 - [.0498 + .1494 + .2240] = .5768.
65. Hypergeometric with N = 52, n = 5, and r = 4.
a. f(2) = C(4,2)C(48,3)/C(52,5) = 6(17,296)/2,598,960 = .0399
b. f(1) = C(4,1)C(48,4)/C(52,5) = 4(194,580)/2,598,960 = .2995
c. f(0) = C(4,0)C(48,5)/C(52,5) = 1,712,304/2,598,960 = .6588
d. 1 - f(0) = 1 - .6588 = .3412
66. a. f(1) = C(7,1)C(3,1)/C(10,2) = (7)(3)/45 = .4667
b. f(2) = C(7,2)C(3,0)/C(10,2) = (21)(1)/45 = .4667
c. f(0) = C(7,0)C(3,2)/C(10,2) = (1)(3)/45 = .0667

Chapter 6
Continuous Probability Distributions

Learning Objectives
1. Understand the difference between how probabilities are computed for discrete and continuous random variables.
2. Know how to compute probability values for a continuous uniform probability distribution and be able to compute the expected value and variance for such a distribution.
3. Be able to compute probabilities using a normal probability distribution. Understand the role of the standard normal distribution in this process.
4. Be able to use tables for the standard normal probability distribution to compute both standard normal probabilities and probabilities for any normal distribution.
5. Given a cumulative probability, be able to compute the z-value and x-value that cut off the corresponding area in the left tail of a normal distribution.
6. Be able to use Excel's NORMSDIST and NORMDIST functions to compute probabilities for the standard normal distribution and any normal distribution.
7. Be able to use Excel's NORMSINV and NORMINV functions to find z and x values corresponding to given cumulative probabilities.
8. Be able to compute probabilities using an exponential probability distribution and using Excel's EXPONDIST function.
9. Understand the relationship between the Poisson and exponential probability distributions.

Solutions:
1. a. [Graph of the uniform density f(x) = 2 for 1.0 ≤ x ≤ 1.5]
b. P(x = 1.25) = 0. The probability of any single point is zero since the area under the curve above any single point is zero.
c. P(1.0 ≤ x ≤ 1.25) = 2(.25) = .50
d. P(1.20 < x < 1.5) = 2(.30) = .60
2. a. [Graph of the uniform density f(x) = .10 for 10 ≤ x ≤ 20]
b. P(x < 15) = .10(5) = .50
c. P(12 ≤ x ≤ 18) = .10(6) = .60
d. E(x) = (10 + 20)/2 = 15
e. Var(x) = (20 - 10)^2/12 = 8.33
3. a. [Graph of the uniform density f(x) = 1/20 for 120 ≤ x ≤ 140 minutes]
b. P(x ≤ 130) = (1/20)(130 - 120) = 0.50
c. P(x > 135) = (1/20)(140 - 135) = 0.25
d. E(x) = (120 + 140)/2 = 130 minutes
4. a. [Graph of the uniform density f(x) = 1 for 0 ≤ x ≤ 1]
b. P(.25 < x < .75) = 1(.50) = .50
c. P(x ≤ .30) = 1(.30) = .30
d. P(x > .60) = 1(.40) = .40
5. a. Length of interval = 261.2 - 238.9 = 22.3
f(x) = 1/22.3 for 238.9 ≤ x ≤ 261.2, and 0 elsewhere
b. Note: 1/22.3 = 0.045
P(x < 250) = (0.045)(250 - 238.9) = 0.4995
Almost half drive the ball less than 250 yards.
c. P(x ≥ 255) = (0.045)(261.2 - 255) = 0.279
d. P(245 ≤ x ≤ 260) = (0.045)(260 - 245) = 0.675
e. P(x ≥ 250) = 1 - P(x < 250) = 1 - 0.4995 = 0.5005
The probability of a player driving the ball 250 yards or more is 0.5005. With 60 players, the expected number driving it 250 yards or more is (60)(0.5005) = 30.03. Rounding, I would expect 30 of these women to drive the ball 250 yards or more.
6. a. P(12 ≤ x ≤ 12.05) = .05(8) = .40
b. P(x ≥ 12.02) = .08(8) = .64
c. P(x ≤ 11.98) + P(x ≥ 12.02) = .005(8) + .08(8) = .04 + .64 = .68
Therefore, the probability is .68.
7. a. P(10,000 ≤ x < 12,000) = 2000(1/5000) = .40
The probability your competitor will bid lower than you, and you get the bid, is .40.
b. P(10,000 ≤ x < 14,000) = 4000(1/5000) = .80
c. A bid of $15,000 gives a probability of 1 of getting the property.
d. Yes, the bid that maximizes expected profit is $13,000.
The probability of getting the property with a bid of $13,000 is P(10,000 ≤ x < 13,000) = 3000(1/5000) = .60. The probability of not getting the property with a bid of $13,000 is .40. The profit you will make if you get the property with a bid of $13,000 is $3000 = $16,000 - 13,000. So your expected profit with a bid of $13,000 is EP($13,000) = .6($3000) + .4(0) = $1800.
If you bid $15,000, the probability of getting the bid is 1, but the profit if you do get the bid is only $1000 = $16,000 - 15,000. So your expected profit with a bid of $15,000 is EP($15,000) = 1($1000) + 0(0) = $1,000.
8. [Graph of a normal curve with σ = 10; horizontal axis marked 70 to 130]
9. a. [Graph of a normal curve with μ = 50 and σ = 5; horizontal axis marked 35 to 65]
b. .6826, since 45 and 55 are within plus or minus 1 standard deviation of the mean of 50.
c. .9544, since 40 and 60 are within plus or minus 2 standard deviations of the mean of 50.
10. a. P(z ≤ 1.5) = .9332
b. P(z ≤ 1.0) = .8413
c. P(1.0 ≤ z ≤ 1.5) = .9332 - .8413 = .0919
d. P(0 < z < 2.5) = .9938 - .5000 = .4938
11. a.
P(z ≥ -1) = P(z ≤ 1) = .8413
b. P(z ≤ -1) = 1 - P(z ≤ 1) = 1 - .8413 = .1587
c. P(z ≥ -1.5) = P(z ≤ 1.5) = .9332
d. P(-2.5 ≤ z) = P(z ≤ 2.5) = .9938
e. P(-3 < z ≤ 0) = P(0 < z < 3) = .9986 - .5000 = .4986
12. a. .7967 - .5000 = .2967
b. .9418 - .5000 = .4418
c. 1.0000 - .6700 = .3300
d. .5910
e. .8849
f. 1.0000 - .7611 = .2389
13. a. .6879 - .0239 = .6640
b. .8888 - .6985 = .1903
c. .9599 - .8508 = .1091
14. a. z = 1.96
b. z = 1.96
c. z = .61
d. The area to the left of z is .8686, so z = 1.12.
e. z = .44
f. The area to the left of z is .6700, so z = .44.
15. a. Look in the table for an area of 1.0000 - .2119 = .7881. Now z = .80 cuts off an area of .2119 in the upper tail. Thus, for an area of .2119 in the lower tail, z = -.80.
b. Compute .9030/2 = .4515, so the area to the left of z is .5000 + .4515 = .9515; z = 1.66.
c. Compute .2052/2 = .1026, so the area to the left of z is .5000 + .1026 = .6026; z = .26.
d. Look in the table for an area of .9948; z = 2.56.
e. Look in the table for an area of .6915. Since the value we are seeking is below the mean, the z value must be negative. Thus, z = -.50.
16. a. Look in the table for an area of .9900. The area value in the table closest to .9900 provides the value z = 2.33.
b. Look in the table for an area of .9750. This corresponds to z = 1.96.
c. Look in the table for an area of .9500. Since .9500 is exactly halfway between .9495 (z = 1.64) and .9505 (z = 1.65), we select z = 1.645. However, z = 1.64 or z = 1.65 are also acceptable answers.
d. Look in the table for an area of .9000. The area value in the table closest to .9000 provides the value z = 1.28.
17. Let x = amount spent, with μ = 527 and σ = 160.
a. z = (700 - 527)/160 = 1.08
P(x > 700) = P(z > 1.08) = .5000 - .3599 = .1401
b. z = (100 - 527)/160 = -2.67
P(x < 100) = P(z < -2.67) = .5000 - .4962 = .0038
c. At 700, z = 1.08 from part (a). At 450, z = (450 - 527)/160 = -.48.
P(450 < x < 700) = P(-.48 < z < 1.08) = .8599 - .3156 = .5443
d. z = (300 - 527)/160 = -1.42
P(x ≤ 300) = P(z ≤ -1.42) = .5000 - .4222 = .0778
18. a.
Find P(x ≥ 60).
At x = 60, z = (60 - 49)/16 = 0.69
P(x < 60) = 0.7549
P(x ≥ 60) = 1 - P(x < 60) = 0.2451
b. Find P(x ≤ 30).
At x = 30, z = (30 - 49)/16 = -1.19
P(x ≤ 30) = 1.0000 - 0.8830 = 0.1170
c. Find the z-score so that P(z ≥ z-score) = 0.10. A z-score of 1.28 cuts off 10% in the upper tail. Now, solve for the corresponding value of x:
1.28 = (x - 49)/16
x = 49 + (16)(1.28) = 69.48
So, 10% of subscribers spend 69.48 minutes or more reading The Wall Street Journal.
19. We have μ = 3.5 and σ = .8.
a. z = (5.0 - 3.5)/.8 = 1.88
P(x > 5.0) = P(z > 1.88) = 1 - P(z < 1.88) = 1 - .9699 = .0301
The rainfall exceeds 5 inches in 3.01% of the Aprils.
b. z = (3 - 3.5)/.8 = -.63
P(x < 3.0) = P(z < -.63) = P(z > .63) = 1 - P(z < .63) = 1 - .7357 = .2643
The rainfall is less than 3 inches in 26.43% of the Aprils.
c. z = 1.28 cuts off approximately .10 in the upper tail of a normal distribution.
x = 3.5 + 1.28(.8) = 4.524
If it rains 4.524 inches or more, April will be classified as extremely wet.
20. We use μ = 27 and σ = 8.
a. z = (11 - 27)/8 = -2
P(x ≤ 11) = P(z ≤ -2) = 1.0000 - .9772 = .0228
The probability a randomly selected subscriber spends less than 11 hours on the computer is .0228.
b. z = (40 - 27)/8 = 1.63
P(x > 40) = P(z > 1.63) = 1 - P(z ≤ 1.63) = 1 - .9484 = .0516
5.16% of subscribers spend over 40 hours per week using the computer.
c. A z-value of .84 cuts off an area of .20 in the upper tail.
x = 27 + .84(8) = 33.72
A subscriber who uses the computer 33.72 hours or more would be classified as a heavy user.
21. From the normal probability tables, a z-value of 2.05 cuts off an area of approximately .02 in the upper tail of the distribution.
x = μ + zσ = 100 + 2.05(15) = 130.75
A score of 131 or better should qualify a person for membership in Mensa.
22. Use μ = 441.84 and σ = 90.
a. At 400, z = (400 - 441.84)/90 = -.46
At 500, z = (500 - 441.84)/90 = .65
P(0 ≤ z < .65) = .2422
P(-.46 ≤ z < 0) = .1772
P(400 ≤ x ≤ 500) = .1772 + .2422 = .4194
The probability a worker earns between $400 and $500 is .4194.
b.
Must find the z-value that cuts off an area of .20 in the upper tail. Using the normal tables, we find z = .84 cuts off approximately .20 in the upper tail.
So, \(x = \mu + z\sigma = 441.84 + .84(90) = 517.44\).
Weekly earnings of $517.44 or above will put a production worker in the top 20%.
c. At 250, \(z = \frac{250 - 441.84}{90} = -2.13\)
\(P(x \le 250) = P(z \le -2.13) = 1.0000 - .9834 = .0166\)
The probability a randomly selected production worker earns less than $250 per week is .0166.
23. a. \(z = \frac{60 - 80}{10} = -2\)
Area to left is 1.0000 - .9772 = .0228.
b. At x = 60, \(z = \frac{60 - 80}{10} = -2\); area to left is .0228.
At x = 75, \(z = \frac{75 - 80}{10} = -.5\); area to left is .3085.
\(P(60 \le x \le 75) = .3085 - .0228 = .2857\)
c. \(z = \frac{90 - 80}{10} = 1\); area = 1 - .8413 = .1587.
Therefore 15.87% of students will not complete on time.
(60)(.1587) = 9.522
We would expect 9.522 students to be unable to complete the exam in time.
24. a. \(\bar{x} = \frac{\sum x_i}{n} = 902.75\)
\(s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = 114.185\)
b. We will use \(\bar{x}\) as an estimate of \(\mu\) and s as an estimate of \(\sigma\) in parts (b) through (d) below. Remember the data are in millions of shares.
At 800, \(z = \frac{800 - 902.75}{114.185} = -.90\)
\(P(x \le 800) = P(z \le -.90) = 1 - P(z \le .90) = 1 - .8159 = .1841\)
The probability trading volume will be less than 800 million shares is .1841.
c. At 1000, \(z = \frac{1000 - 902.75}{114.185} = .85\)
\(P(x \ge 1000) = P(z \ge .85) = 1 - P(z \le .85) = 1 - .8023 = .1977\)
The probability trading volume will exceed 1 billion shares is .1977.
d. A z-value of 1.645 cuts off an area of .05 in the upper tail.
\(x = \mu + z\sigma = 902.75 + 1.645(114.185) = 1{,}090.584\)
They should issue a press release any time share volume exceeds 1,091 million.
25. \(\mu = 442.54\), \(\sigma = 65\)
a. \(z = \frac{400 - 442.54}{65} = -.65\)
\(P(x > 400) = P(z > -.65) = .5000 + .2422 = .7422\)
b. \(z = \frac{300 - 442.54}{65} = -2.19\)
\(P(x \le 300) = P(z \le -2.19) = .5000 - .4857 = .0143\)
c. At x = 400, z = -.65 from part (a). At x = 500, \(z = \frac{500 - 442.54}{65} = .88\).
\(P(400 < x < 500) = P(-.65 < z < .88) = .8106 - .2578 = .5528\)
26. a. \(P(x \le 6) = 1 - e^{-6/8} = 1 - .4724 = .5276\)
b. \(P(x \le 4) = 1 - e^{-4/8} = 1 - .6065 = .3935\)
c. \(P(x \ge 6) = 1 - P(x \le 6) = 1 - .5276 = .4724\)
d.
\(P(4 \le x \le 6) = P(x \le 6) - P(x \le 4) = .5276 - .3935 = .1341\)
27. a. \(P(x \le x_0) = 1 - e^{-x_0/3}\)
b. \(P(x \le 2) = 1 - e^{-2/3} = 1 - .5134 = .4866\)
c. \(P(x \ge 3) = 1 - P(x \le 3) = 1 - (1 - e^{-3/3}) = e^{-1} = .3679\)
d. \(P(x \le 5) = 1 - e^{-5/3} = 1 - .1889 = .8111\)
e. \(P(2 \le x \le 5) = P(x \le 5) - P(x \le 2) = .8111 - .4866 = .3245\)
28. a. \(P(x \le 10) = 1 - e^{-10/20} = .3935\)
b. \(P(x \ge 30) = 1 - P(x \le 30) = 1 - (1 - e^{-30/20}) = e^{-30/20} = .2231\)
c. \(P(10 \le x \le 30) = P(x \le 30) - P(x \le 10) = (1 - e^{-30/20}) - (1 - e^{-10/20}) = e^{-10/20} - e^{-30/20} = .6065 - .2231 = .3834\)
29. a. [Graph of the exponential density \(f(x) = \frac{1}{12}e^{-x/12}\): f(x) on the vertical axis from 0 to .09; x on the horizontal axis from 0 to 24.]
b. \(P(x \le 12) = 1 - e^{-12/12} = 1 - .3679 = .6321\)
c. \(P(x \le 6) = 1 - e^{-6/12} = 1 - .6065 = .3935\)
d. \(P(x \ge 30) = 1 - P(x < 30) = 1 - (1 - e^{-30/12}) = .0821\)
30. a. 50 hours
b. \(P(x \le 25) = 1 - e^{-25/50} = 1 - .6065 = .3935\)
c. \(P(x \ge 100) = 1 - (1 - e^{-100/50}) = .1353\)
31. a. \(P(x \le 2) = 1 - e^{-2/2.78} = .5130\)
b. \(P(x \ge 5) = 1 - P(x \le 5) = 1 - (1 - e^{-5/2.78}) = e^{-5/2.78} = .1655\)
c. \(P(x \ge 2.78) = 1 - P(x \le 2.78) = 1 - (1 - e^{-2.78/2.78}) = e^{-1} = .3679\)
This may seem surprising since the mean is 2.78 minutes. But, for the exponential distribution, the probability of a value greater than the mean is significantly less than the probability of a value less than the mean.
32. a. If the number of transactions per year follows the Poisson distribution, the time between transactions follows the exponential distribution. So, \(\mu = \frac{1}{30}\) of a year and \(\frac{1}{\mu} = \frac{1}{1/30} = 30\); then \(f(x) = 30e^{-30x}\).
b. A month is 1/12 of a year, so
\(P(x > \tfrac{1}{12}) = 1 - P(x \le \tfrac{1}{12}) = 1 - (1 - e^{-30/12}) = e^{-30/12} = .0821\)
The probability of no transaction during January is the same as the probability of no transaction during any month: .0821.
c. Since 1/2 month is 1/24 of a year, we compute
\(P(x \le \tfrac{1}{24}) = 1 - e^{-30/24} = 1 - .2865 = .7135\)
33. a. Let x = sales price ($1000s).
\(f(x) = \begin{cases} \frac{1}{25} & \text{for } 200 \le x \le 225 \\ 0 & \text{elsewhere} \end{cases}\)
b. \(P(x \ge 215) = \frac{1}{25}(225 - 215) = 0.40\)
c. \(P(x < 210) = \frac{1}{25}(210 - 200) = 0.40\)
d. \(E(x) = \frac{200 + 225}{2} = 212.5\), or $212,500
If she waits, her expected sale price will be $2,500 higher than if she sells it back to her company now.
However, there is a 0.40 probability that she will get less. It's a close call. But the expected value approach to decision making would suggest she should wait.
34. a. For a normal distribution, the mean and the median are equal: 63,000.
b. Find the z-score that cuts off 10% in the lower tail: z = -1.28. Solving for x,
\(-1.28 = \frac{x - 63{,}000}{15{,}000}\), so \(x = 63{,}000 - 1.28(15{,}000) = 43{,}800\).
The lower 10% of mortgage debt is $43,800 or less.
c. Find \(P(x > 80{,}000)\). At x = 80,000, \(z = \frac{80{,}000 - 63{,}000}{15{,}000} = 1.13\).
\(P(x > 80{,}000) = 1.0000 - .8708 = 0.1292\)
d. Find the z-score that cuts off 5% in the upper tail: z = 1.645. Solve for x:
\(1.645 = \frac{x - 63{,}000}{15{,}000}\), so \(x = 63{,}000 + 1.645(15{,}000) = 87{,}675\).
The upper 5% of mortgage debt is in excess of $87,675.
35. a. \(P(\text{defect}) = 1 - P(9.85 \le x \le 10.15) = 1 - P(-1 \le z \le 1) = 1 - .6826 = .3174\)
Expected number of defects = 1000(.3174) = 317.4
b. \(P(\text{defect}) = 1 - P(9.85 \le x \le 10.15) = 1 - P(-3 \le z \le 3) = 1 - .9972 = .0028\)
Expected number of defects = 1000(.0028) = 2.8
c. Reducing the process standard deviation causes a substantial reduction in the number of defects.
36. \(\mu = 6{,}312\)
a. z = -1.645 cuts off .05 in the lower tail. So,
\(-1.645 = \frac{1000 - 6312}{\sigma}\), giving \(\sigma = \frac{1000 - 6312}{-1.645} = 3229\).
b. At 6000, \(z = \frac{6000 - 6312}{3229} = -.10\). At 4000, \(z = \frac{4000 - 6312}{3229} = -.72\).
\(P(4000 < x < 6000) = P(-.72 < z < -.10) = .4602 - .2358 = .2244\)
c. z = 1.88 cuts off approximately .03 in the upper tail.
\(x = 6312 + 1.88(3229) = 12{,}382.52\)
The households with the highest 3% of expenditures spent more than $12,382.
37. \(\mu = 10{,}000\), \(\sigma = 1500\)
a. At x = 12,000, \(z = \frac{12{,}000 - 10{,}000}{1500} = 1.33\); area to left is .9082.
\(P(x > 12{,}000) = 1.0000 - .9082 = .0918\)
b. For an area of .95 to the left, z = 1.645:
\(1.645 = \frac{x - 10{,}000}{1500}\), therefore \(x = 10{,}000 + 1.645(1500) = 12{,}468\).
[Sketch: normal curve centered at 10,000 with area .95 to the left of 12,468 and .05 to the right.]
12,468 tubes should be produced.
38. a. At x = 200, \(z = \frac{200 - 150}{25} = 2\); area = .9772.
\(P(x > 200) = 1 - .9772 = .0228\)
b. Expected Profit = Expected Revenue - Expected Cost = 200 - 150 = $50
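The percentile questions in Exercises 34 and 37 invert the standardization \(z = (x - \mu)/\sigma\) to get \(x = \mu + z\sigma\). The sketch below does the same with `NormalDist.inv_cdf` from Python's standard library (3.8+). The helper name `percentile_value` is ours, not the text's. It uses the exact z, for example 1.2816 instead of the rounded table value 1.28, so the answers shift slightly. Exercise 34(b), for instance, comes out near 43,777 rather than 43,800.

```python
from statistics import NormalDist

def percentile_value(mu, sigma, area_left):
    """Return x with P(X <= x) = area_left when X ~ Normal(mu, sigma).

    This is x = mu + z * sigma, where z has area_left to its left
    under the standard normal curve.
    """
    z = NormalDist().inv_cdf(area_left)
    return mu + z * sigma

# Exercise 34(b): mortgage debt cutting off the lowest 10%
low_10 = percentile_value(63_000, 15_000, 0.10)
# Exercise 34(d): mortgage debt cutting off the highest 5%
high_5 = percentile_value(63_000, 15_000, 0.95)
# Exercise 37(b): production level that covers demand with probability .95
tubes = percentile_value(10_000, 1_500, 0.95)

print(f"{low_10:,.0f}  {high_5:,.0f}  {tubes:,.0f}")
```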
39. a. Find \(P(80{,}000 \le x \le 150{,}000)\).
At x = 150,000, \(z = \frac{150{,}000 - 126{,}681}{30{,}000} = 0.78\), so \(P(x \le 150{,}000) = 0.7823\).
At x = 80,000, \(z = \frac{80{,}000 - 126{,}681}{30{,}000} = -1.56\), so \(P(x \le 80{,}000) = 1.0000 - .9406 = 0.0594\).
\(P(80{,}000 \le x \le 150{,}000) = 0.7823 - 0.0594 = 0.7229\)
b. Find \(P(x < 50{,}000)\).
At x = 50,000, \(z = \frac{50{,}000 - 126{,}681}{30{,}000} = -2.56\)
\(P(x < 50{,}000) = 1.0000 - .9948 = 0.0052\)
c. Find the z-score cutting off 95% in the left tail: z = 1.645. Solve for x:
\(1.645 = \frac{x - 126{,}681}{30{,}000}\), so \(x = 126{,}681 + 1.645(30{,}000) = 176{,}031\).
The probability is 0.95 that the number of lost jobs will not exceed 176,031.
40. a. At 400, \(z = \frac{400 - 450}{100} = -.50\); area to left is .3085.
At 500, \(z = \frac{500 - 450}{100} = +.50\); area to left is .6915.
\(P(400 \le x \le 500) = .6915 - .3085 = .3830\)
38.3% will score between 400 and 500.
b. At 630, \(z = \frac{630 - 450}{100} = 1.80\); area to left is .9641.
96.41% do worse and 3.59% do better.
c. At 480, \(z = \frac{480 - 450}{100} = .30\); area to left is .6179, and 1 - .6179 = .3821.
38.21% are acceptable.
41. a. At 75,000, \(z = \frac{75{,}000 - 67{,}000}{7{,}000} = 1.14\)
\(P(x > 75{,}000) = P(z > 1.14) = 1 - P(z \le 1.14) = 1 - .8729 = .1271\)
The probability of a woman receiving a salary in excess of $75,000 is .1271.
b. At 75,000, \(z = \frac{75{,}000 - 65{,}500}{7{,}000} = 1.36\)
\(P(x > 75{,}000) = P(z > 1.36) = 1 - P(z \le 1.36) = 1 - .9131 = .0869\)
The probability of a man receiving a salary in excess of $75,000 is .0869.
c. At x = 50,000, \(z = \frac{50{,}000 - 67{,}000}{7{,}000} = -2.43\)
\(P(x < 50{,}000) = P(z < -2.43) = 1 - P(z < 2.43) = 1 - .9925 = .0075\)
The probability of a woman receiving a salary below $50,000 is very small: .0075.
d. The answer to this is the male copywriter salary that cuts off an area of .01 in the upper tail of the distribution for male copywriters. Use z = 2.33:
\(x = 65{,}500 + 2.33(7{,}000) = 81{,}810\)
A woman who makes $81,810 or more will earn more than 99% of her male counterparts.
42. \(\sigma = .6\)
At 2%, z = -2.05. With \(z = \frac{x - \mu}{\sigma}\) and x = 18:
\(-2.05 = \frac{18 - \mu}{.6}\), so \(\mu = 18 + 2.05(.6) = 19.23\) oz.
[Sketch: normal curve with mean 19.23 and area .02 below 18.]
The mean filling weight must be 19.23 oz.
43. a. \(P(x \le 15) = 1 - e^{-15/36} = 1 - .6592 = .3408\)
b.
\(P(x \le 45) = 1 - e^{-45/36} = 1 - .2865 = .7135\)
Therefore \(P(15 \le x \le 45) = .7135 - .3408 = .3727\).
c. \(P(x \ge 60) = 1 - P(x < 60) = 1 - (1 - e^{-60/36}) = .1889\)
44. a. Mean time between arrivals = 1/7 minutes
b. \(f(x) = 7e^{-7x}\)
c. \(P(x > 1) = 1 - P(x < 1) = 1 - [1 - e^{-7(1)}] = e^{-7} = .0009\)
d. 12 seconds is .2 minutes.
\(P(x > .2) = 1 - P(x < .2) = 1 - [1 - e^{-7(.2)}] = e^{-1.4} = .2466\)
45. a. \(f(x) = \frac{1}{36.5}e^{-x/36.5} = .0274e^{-.0274x}\)
b. \(P(x < 40) = 1 - e^{-.0274(40)} = 1 - .3342 = .6658\)
\(P(x < 20) = 1 - e^{-.0274(20)} = 1 - .5781 = .4219\)
\(P(20 < x < 40) = .6658 - .4219 = .2439\)
c. From part (b), \(P(x < 40) = .6658\), so \(P(x \ge 40) = 1 - P(x < 40) = 1 - .6658 = .3342\).
46. a. \(\frac{1}{\mu} = 0.5\); therefore \(\mu = 2\) minutes = mean time between telephone calls.
b. Note: 30 seconds = .5 minutes.
\(P(x \le .5) = 1 - e^{-.5/2} = 1 - .7788 = .2212\)
c. \(P(x \le 1) = 1 - e^{-1/2} = 1 - .6065 = .3935\)
d. \(P(x \ge 5) = 1 - P(x < 5) = 1 - (1 - e^{-5/2}) = .0821\)
Chapter 7
Sampling and Sampling Distributions
Learning Objectives
1. Understand the importance of sampling and how results from samples can be used to provide estimates of population parameters such as the population mean, the population standard deviation and/or the population proportion.
2. Know what simple random sampling is and how simple random samples are selected.
3. Be able to select a simple random sample using Excel.
4. Understand the concept of a sampling distribution.
5. Know the central limit theorem and the important role it plays in sampling.
6. Know the characteristics of the sampling distribution of the sample mean (\(\bar{x}\)) and the sampling distribution of the sample proportion (\(\bar{p}\)).
7. Learn about a variety of sampling methods including stratified random sampling, cluster sampling, systematic sampling, convenience sampling and judgment sampling.
8. Know the definition of the following terms: simple random sampling, sampling with replacement, sampling without replacement, sampling distribution, point estimator, finite population correction factor, standard error.
Solutions:
1. a.
AB, AC, AD, AE, BC, BD, BE, CD, CE, DE
b. With 10 samples, each has a 1/10 probability.
c. E and C. 8 and 0 do not apply; 5 identifies E; 7 does not apply; 5 is skipped since E is already in the sample; 3 identifies C; 2 is not needed since the sample of size 2 is complete.
2. Using the last 3 digits of each 5-digit grouping provides the random numbers:
601, 022, 448, 147, 229, 553, 147, 289, 209
Numbers greater than 350 do not apply and the 147 can only be used once. Thus, the simple random sample of four includes 22, 147, 229, and 289.
3. 459, 147, 385, 113, 340, 401, 215, 2, 33, 348
4. a. We first number the companies from 1 to 10: 1 AT&T, 2 IBM, ..., 10 Pfizer.
Random Number    Company in Sample
6                Microsoft
8                Motorola
5                Cisco
4                Johnson & Johnson
1*               AT&T
*Note that the random numbers 5 and 6 were skipped the second time they appeared because we are sampling without replacement.
b.
Company              Random Number Assigned    Company in Sample
AT&T                 6
IBM                  8
American Online      5                         Yes
Johnson & Johnson    4                         Yes
Cisco Systems        5
Microsoft            6
General Electric     1                         Yes
Motorola             1                         Yes
Intel                3                         Yes
Pfizer               8
Note that both American Online and Cisco were assigned a random number of 5. We broke the tie by including the first to receive a 5 in the sample.
c. Number of samples of size 5 = \(\frac{10!}{5!(10 - 5)!} = \frac{(10)(9)(8)(7)(6)}{(5)(4)(3)(2)(1)} = 252\)
d. Use Excel's RAND() function to assign a random number between 0 and 1 to each of the companies, then proceed as in part (b) above. The five with the smallest random numbers can be found by using Excel's Sort tool.
5. a. 283, 610, 39, 254, 568, 353, 602, 421, 638, 164
b. Generate a random number for each of the 645 students. Include the students associated with the 50 smallest random numbers in the sample.
6. 2782, 493, 825, 1807, 289
7. Use the data disk accompanying the book and the EAI file. Generate a random number using the RAND() function for each of the 2500 managers. Then sort the list of managers with respect to the random numbers. The first 50 managers are the sample.
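Exercises 4(d) and 7 select a simple random sample by attaching a RAND() value to every element, sorting, and keeping the elements with the smallest values. A minimal Python analogue of that Excel procedure is sketched below. A `random.Random` generator stands in for RAND(), and the function name is ours, not the text's. In ordinary Python code, `random.sample` does the same job in one call.

```python
import random

def srs_by_random_keys(population, k, seed=None):
    """Simple random sample of size k, mimicking the RAND()-and-sort method.

    Each element gets an independent uniform(0, 1) key; sorting by the keys
    and keeping the first k gives every size-k subset the same chance.
    """
    rng = random.Random(seed)
    keyed = [(rng.random(), item) for item in population]
    keyed.sort(key=lambda pair: pair[0])  # sort on the random key only
    return [item for _, item in keyed[:k]]

companies = ["AT&T", "IBM", "American Online", "Johnson & Johnson",
             "Cisco Systems", "Microsoft", "General Electric",
             "Motorola", "Intel", "Pfizer"]
print(srs_by_random_keys(companies, 5, seed=1))
```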
8. a. 21 random numbers were needed. The teams selected are Wisconsin, Clemson, Washington, USC, Oklahoma, and Colorado.
b. Use Excel to generate 25 random numbers, one for each team. Then sort the list of teams with respect to the list of random numbers. We can also use the first two digits in column 9 of Table 7.1. Using the random numbers in Table 7.1, the following 6 teams are used in the sample: Nebraska, Florida State, Michigan, Texas, Washington, and TCU. These are the teams with the six smallest random numbers. (There is a tie between TCU and Colorado for 6th smallest.)
9. 511, 791, 99, 671, 152, 584, 45, 783, 301, 568, 754, 750
10. finite, infinite, infinite, infinite, finite
11. a. \(\bar{x} = \frac{\sum x_i}{n} = \frac{54}{6} = 9\)
b. \(s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}\)
\(\sum (x_i - \bar{x})^2 = (-4)^2 + (-1)^2 + 1^2 + (-2)^2 + 1^2 + 5^2 = 48\)
\(s = \sqrt{\frac{48}{6 - 1}} = 3.1\)
12. a. \(\bar{p} = 75/150 = .50\)
b. \(\bar{p} = 55/150 = .3667\)
13. a. \(\bar{x} = \frac{\sum x_i}{n} = \frac{465}{5} = 93\)
\(x_i\)      \(x_i - \bar{x}\)      \((x_i - \bar{x})^2\)
94          +1                    1
100         +7                    49
85          -8                    64
94          +1                    1
92          -1                    1
Totals: 465      0      116
b. \(s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{116}{4}} = 5.39\)
14. a. 149/784 = 0.19
b. 251/784 = 0.32
c. Total receiving cash = 149 + 219 + 251 = 619
619/784 = 0.79
15. a. \(\bar{x} = \frac{\sum x_i}{n} = \frac{70}{10} = 7\) years
b. \(s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{20.2}{10 - 1}} = 1.5\) years
16. \(\bar{p} = 1117/1400 = 0.80\)
17. a. 595/1008 = .59
b. 332/1008 = .33
c. 81/1008 = .08
18. a. Use the data disk accompanying the book and the EAI file. Generate a random number for each manager and select managers associated with the 50 smallest random numbers as the sample.
b. Use Excel's AVERAGE function to compute the mean for the sample.
c. Use Excel's STDEV function to compute the sample standard deviation.
d. Use the sample proportion as a point estimate of the population proportion.
19. a. The sampling distribution is normal with \(E(\bar{x}) = \mu = 200\) and \(\sigma_{\bar{x}} = \sigma/\sqrt{n} = 50/\sqrt{100} = 5\).
For ±5, \(\bar{x} - \mu = 5\) and \(z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{5}{5} = 1\).
Probability of being within ±5 is .6826.
b. For ±10, \(\bar{x} - \mu = 10\) and \(z = \frac{10}{5} = 2\).
Probability of being within ±10 is .9544.
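Exercise 19 finds the probability that \(\bar{x}\) falls within a given distance of \(\mu\). It computes \(z = E/\sigma_{\bar{x}}\) and takes the area between -z and z. A minimal Python sketch of that calculation follows. Like the exercise, it assumes the sampling distribution of \(\bar{x}\) is normal, and the function name `prob_within` is illustrative only. Exact normal areas give .6827 and .9545, which match the table values .6826 and .9544 up to rounding.

```python
from math import sqrt
from statistics import NormalDist

def prob_within(margin, sigma, n):
    """P(|x_bar - mu| <= margin) when x_bar is normal with std error sigma/sqrt(n)."""
    std_error = sigma / sqrt(n)
    z = margin / std_error
    # area between -z and z under the standard normal curve
    return 2 * NormalDist().cdf(z) - 1

# Exercise 19: sigma = 50 and n = 100, so the standard error is 5
for margin in (5, 10):
    print(margin, round(prob_within(margin, 50, 100), 4))
```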
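20. \(\sigma_{\bar{x}} = \sigma/\sqrt{n}\)
\(\sigma_{\bar{x}} = 25/\sqrt{50} = 3.54\)
\(\sigma_{\bar{x}} = 25/\sqrt{100} = 2.50\)
\(\sigma_{\bar{x}} = 25/\sqrt{150} = 2.04\)
\(\sigma_{\bar{x}} = 25/\sqrt{200} = 1.77\)
The standard error of the mean decreases as the sample size increases.
21. a. \(\sigma_{\bar{x}} = \sigma/\sqrt{n} = 10/\sqrt{50} = 1.41\)
b. n/N = 50/50,000 = .001; use \(\sigma_{\bar{x}} = \sigma/\sqrt{n} = 10/\sqrt{50} = 1.41\)
c. n/N = 50/5000 = .01; use \(\sigma_{\bar{x}} = \sigma/\sqrt{n} = 10/\sqrt{50} = 1.41\)
d. n/N = 50/500 = .10; use \(\sigma_{\bar{x}} = \sqrt{\frac{N - n}{N - 1}}\frac{\sigma}{\sqrt{n}} = \sqrt{\frac{500 - 50}{500 - 1}}\frac{10}{\sqrt{50}} = 1.34\)
Note: Only case (d), where n/N = .10, requires the use of the finite population correction factor.
22. a. Using the central limit theorem, we can approximate the sampling distribution of \(\bar{x}\) with a normal probability distribution provided \(n \ge 30\).
n = 30: \(E(\bar{x}) = 400\), \(\sigma_{\bar{x}} = \sigma/\sqrt{n} = 50/\sqrt{30} = 9.13\)
b. n = 40: \(E(\bar{x}) = 400\), \(\sigma_{\bar{x}} = \sigma/\sqrt{n} = 50/\sqrt{40} = 7.91\)
23. a. \(\sigma_{\bar{x}} = \sigma/\sqrt{n} = 16/\sqrt{50} = 2.26\)
For ±2, \(\bar{x} - \mu = 2\) and \(z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{2}{2.26} = 0.88\)
\(P(0 \le z \le 0.88) = .3106\); for ±2, the probability is 2(.3106) = .6212.
b. \(\sigma_{\bar{x}} = 16/\sqrt{100} = 1.60\), \(z = \frac{2}{1.60} = 1.25\)
\(P(0 \le z \le 1.25) = .3944\); for ±2, the probability is 2(.3944) = .7888.
c. \(\sigma_{\bar{x}} = 16/\sqrt{200} = 1.13\), \(z = \frac{2}{1.13} = 1.77\)
\(P(0 \le z \le 1.77) = .4616\); for ±2, the probability is 2(.4616) = .9232.
d. \(\sigma_{\bar{x}} = 16/\sqrt{400} = 0.80\), \(z = \frac{2}{0.80} = 2.50\)
\(P(0 \le z \le 2.50) = .4938\); for ±2, the probability is 2(.4938) = .9876.
e. The larger sample provides a higher probability that the sample mean will be within ±2 of \(\mu\).
24. a. \(E(\bar{x}) = 51{,}800\), \(\sigma_{\bar{x}} = \sigma/\sqrt{n} = 4000/\sqrt{60} = 516.40\)
The normal distribution is based on the central limit theorem.
b. For n = 120, \(E(\bar{x})\) remains $51,800 and the sampling distribution of \(\bar{x}\) can still be approximated by a normal distribution. However, \(\sigma_{\bar{x}}\) is reduced to \(4000/\sqrt{120} = 365.15\).
c. As the sample size is increased, the standard error of the mean, \(\sigma_{\bar{x}}\), is reduced. This appears logical from the point of view that larger samples should tend to provide sample means that are closer to the population mean. Thus, the variability in the sample mean, measured in terms of \(\sigma_{\bar{x}}\), should decrease as the sample size is increased.
25. a. \(\sigma_{\bar{x}} = \sigma/\sqrt{n} = 4000/\sqrt{60} = 516.40\)
[Sketch: sampling distribution of \(\bar{x}\) centered at 51,800, with 51,300 and 52,300 marked.]
\(z = \frac{52{,}300 - 51{,}800}{516.40} = .97\)
\(P(0 \le z \le .97) = .3340\)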
For ±500, the probability is 2(.3340) = .6680.
b. \(\sigma_{\bar{x}} = \sigma/\sqrt{n} = 4000/\sqrt{120} = 365.15\)
\(z = \frac{52{,}300 - 51{,}800}{365.15} = +1.37\)
\(P(0 \le z \le 1.37) = .4147\)
For ±500, the probability is 2(.4147) = .8294.
26. a. A normal distribution with \(E(\bar{x}) = 1.20\) and \(\sigma_{\bar{x}} = \sigma/\sqrt{n} = 0.10/\sqrt{50} = 0.014\)
b. \(z = \frac{1.22 - 1.20}{0.014} = 1.41\) and \(z = \frac{1.18 - 1.20}{0.014} = -1.41\)
\(P(0 \le z \le 1.41) = .4207\); \(P(-1.41 \le z \le 0) = .4207\)
probability = 0.4207 + 0.4207 = 0.8414
c. \(z = \frac{1.21 - 1.20}{0.014} = 0.71\) and \(z = \frac{1.19 - 1.20}{0.014} = -0.71\)
\(P(0 \le z \le .71) = .2612\); \(P(-.71 \le z \le 0) = .2612\)
probability = 0.2612 + 0.2612 = 0.5224
27. a. \(E(\bar{x}) = 1017\), \(\sigma_{\bar{x}} = \sigma/\sqrt{n} = 100/\sqrt{75} = 11.55\)
\(z = \frac{1027 - 1017}{11.55} = 0.87\); \(P(0 \le z \le .87) = .3078\)
\(z = \frac{1007 - 1017}{11.55} = -0.87\); \(P(-.87 \le z \le 0) = .3078\)
probability = 0.3078 + 0.3078 = 0.6156
b. \(z = \frac{1037 - 1017}{11.55} = 1.73\); \(P(0 \le z \le 1.73) = .4582\)
\(z = \frac{997 - 1017}{11.55} = -1.73\); \(P(-1.73 \le z \le 0) = .4582\)
probability = 0.4582 + 0.4582 = 0.9164
28. a. \(z = \frac{\bar{x} - 34{,}000}{\sigma/\sqrt{n}}\), with error \(\bar{x} - 34{,}000 = 250\)
n = 30: \(z = \frac{250}{2000/\sqrt{30}} = .68\); .2518 × 2 = .5036
n = 50: \(z = \frac{250}{2000/\sqrt{50}} = .88\); .3106 × 2 = .6212
n = 100: \(z = \frac{250}{2000/\sqrt{100}} = 1.25\); .3944 × 2 = .7888
n = 200: \(z = \frac{250}{2000/\sqrt{200}} = 1.77\); .4616 × 2 = .9232
n = 400: \(z = \frac{250}{2000/\sqrt{400}} = 2.50\); .4938 × 2 = .9876
b. A larger sample increases the probability that the sample mean will be within a specified distance from the population mean. In the salary example, the probability of being within ±250 of \(\mu\) ranges from .5036 for a sample of size 30 to .9876 for a sample of size 400.
29. a. \(E(\bar{x}) = 982\), \(\sigma_{\bar{x}} = \sigma/\sqrt{n} = 210/\sqrt{40} = 33.2\)
\(z = \frac{100}{210/\sqrt{40}} = 3.01\); .4987 × 2 = .9974
b. \(z = \frac{25}{210/\sqrt{40}} = .75\); .2734 × 2 = .5468
c. The sample with n = 40 has a very high probability (.9974) of providing a sample mean within ±$100. However, the sample with n = 40 only has a .5468 probability of providing a sample mean within ±$25. A larger sample size is desirable if the ±$25 is needed.
30. a. Normal distribution, \(E(\bar{x}) = 166{,}500\), \(\sigma_{\bar{x}} = \sigma/\sqrt{n} = 42{,}000/\sqrt{100} = 4200\)
b. \(z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{10{,}000}{4{,}200} = 2.38\)
\(P(-2.38 \le z \le 2.38) = .9826\)
c.
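For ±$5000, z = 5000/4200 = 1.19 and \(P(-1.19 \le z \le 1.19) = .7660\)
For ±$2500, z = 2500/4200 = .60 and \(P(-.60 \le z \le .60) = .4514\)
For ±$1000, z = 1000/4200 = .24 and \(P(-.24 \le z \le .24) = .1896\)
d. Increase the sample size to improve the precision of the estimate. A sample size of 100 only has a .4514 probability of being within ±$2,500.
31. \(\mu = 1.46\), \(\sigma = .15\)
a. n = 30: \(z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{.03}{.15/\sqrt{30}} = 1.10\)
\(P(1.43 \le \bar{x} \le 1.49) = P(-1.10 \le z \le 1.10) = .3643(2) = .7286\)
b. n = 50: \(z = \frac{.03}{.15/\sqrt{50}} = 1.41\)
\(P(1.43 \le \bar{x} \le 1.49) = P(-1.41 \le z \le 1.41) = .4207(2) = .8414\)
c. n = 100: \(z = \frac{.03}{.15/\sqrt{100}} = 2.00\)
\(P(1.43 \le \bar{x} \le 1.49) = P(-2 \le z \le 2) = .4772(2) = .9544\)
d. A sample size of 100 is necessary.
32. a. n/N = 40/4000 = .01 < .05; therefore, the finite population correction factor is not necessary.
b. With the finite population correction factor:
\(\sigma_{\bar{x}} = \sqrt{\frac{N - n}{N - 1}}\frac{\sigma}{\sqrt{n}} = \sqrt{\frac{4000 - 40}{4000 - 1}}\frac{8.2}{\sqrt{40}} = 1.29\)
Without the finite population correction factor:
\(\sigma_{\bar{x}} = \sigma/\sqrt{n} = 1.30\)
Including the finite population correction factor provides only a slightly different value for \(\sigma_{\bar{x}}\) than when the correction factor is not used.
c. \(z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{2}{1.30} = 1.54\)
\(P(-1.54 \le z \le 1.54) = .8764\)
33. a. \(E(\bar{p}) = p = .40\)
b. \(\sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.40(0.60)}{100}} = 0.0490\)
c. Normal distribution with \(E(\bar{p}) = .40\) and \(\sigma_{\bar{p}} = .0490\)
d. It shows the probability distribution for the sample proportion \(\bar{p}\).
34. a. \(E(\bar{p}) = .40\), \(\sigma_{\bar{p}} = \sqrt{\frac{0.40(0.60)}{200}} = 0.0346\)
\(z = \frac{\bar{p} - p}{\sigma_{\bar{p}}} = \frac{0.03}{0.0346} = 0.87\)
\(P(-.87 \le z \le .87) = .6156\)
b. \(z = \frac{0.05}{0.0346} = 1.45\)
\(P(-1.45 \le z \le 1.45) = .8530\)
35. \(\sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}}\)
\(\sigma_{\bar{p}} = \sqrt{\frac{(0.55)(0.45)}{100}} = 0.0497\)
\(\sigma_{\bar{p}} = \sqrt{\frac{(0.55)(0.45)}{200}} = 0.0352\)
\(\sigma_{\bar{p}} = \sqrt{\frac{(0.55)(0.45)}{500}} = 0.0222\)
\(\sigma_{\bar{p}} = \sqrt{\frac{(0.55)(0.45)}{1000}} = 0.0157\)
\(\sigma_{\bar{p}}\) decreases as n increases.
36. a. \(\sigma_{\bar{p}} = \sqrt{\frac{(0.30)(0.70)}{100}} = 0.0458\)
\(z = \frac{\bar{p} - p}{\sigma_{\bar{p}}} = \frac{0.04}{0.0458} = 0.87\)
\(P(-.87 \le z \le .87) = 2(.3078) = .6156\)
b. \(\sigma_{\bar{p}} = \sqrt{\frac{(0.30)(0.70)}{200}} = 0.0324\); \(z = \frac{0.04}{0.0324} = 1.23\)
Area = 0.3907 × 2 = 0.7814
c. \(\sigma_{\bar{p}} = \sqrt{\frac{(0.30)(0.70)}{500}} = 0.0205\); \(z = \frac{0.04}{0.0205} = 1.95\)
Area = 0.4744 × 2 = 0.9488
d. \(\sigma_{\bar{p}} = \sqrt{\frac{(0.30)(0.70)}{1000}} = 0.0145\); \(z = \frac{0.04}{0.0145} = 2.76\)
Area = 0.4971 × 2 = 0.9942
e.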
With a larger sample, there is a higher probability that \(\bar{p}\) will be within ±.04 of the population proportion p.
37. a. \(E(\bar{p}) = p = 0.30\), \(\sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.30(0.70)}{100}} = 0.0458\)
The normal distribution is appropriate because np = 100(.30) = 30 and n(1 - p) = 100(.70) = 70 are both greater than 5.
b. \(P(.20 \le \bar{p} \le .40) = ?\)
\(z = \frac{.40 - .30}{.0458} = 2.18\)
\(P(0 \le z \le 2.18) = .4854\)
Probability sought is 2(.4854) = .9708.
c. \(P(.25 \le \bar{p} \le .35) = ?\)
\(z = \frac{.35 - .30}{.0458} = 1.09\)
\(P(-1.09 \le z \le 1.09) = .7242\)
38. a. \(E(\bar{p}) = .76\), \(\sigma_{\bar{p}} = \sqrt{\frac{0.76(1 - 0.76)}{400}} = 0.0214\)
The normal distribution is appropriate because np = 400(.76) = 304 and n(1 - p) = 400(.24) = 96 are both greater than 5.
b. \(z = \frac{0.79 - 0.76}{0.0214} = 1.40\) and \(z = \frac{0.73 - 0.76}{0.0214} = -1.40\)
\(P(0 \le z \le 1.40) = .4192\); \(P(-1.40 \le z \le 0) = .4192\)
probability = 0.4192 + 0.4192 = 0.8384
c. \(\sigma_{\bar{p}} = \sqrt{\frac{0.76(1 - 0.76)}{750}} = 0.0156\)
\(z = \frac{0.79 - 0.76}{0.0156} = 1.92\) and \(z = \frac{0.73 - 0.76}{0.0156} = -1.92\)
\(P(0 \le z \le 1.92) = .4726\); \(P(-1.92 \le z \le 0) = .4726\)
probability = 0.4726 + 0.4726 = 0.9452
39. a. Normal distribution with \(E(\bar{p}) = .50\) and \(\sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{(.50)(1 - .50)}{589}} = .0206\)
b. \(z = \frac{\bar{p} - p}{\sigma_{\bar{p}}} = \frac{.04}{.0206} = 1.94\); .4738 × 2 = .9476
c. \(z = \frac{.03}{.0206} = 1.46\); .4279 × 2 = .8558
d. \(z = \frac{.02}{.0206} = .97\); .3340 × 2 = .6680
40. a. Normal distribution with \(E(\bar{p}) = 0.25\) and \(\sigma_{\bar{p}} = \sqrt{\frac{(0.25)(0.75)}{200}} = 0.0306\)
b. \(z = \frac{0.03}{0.0306} = 0.98\); \(P(0 \le z \le .98) = .3365\)
probability = 0.3365 × 2 = 0.6730
c. \(z = \frac{0.05}{0.0306} = 1.63\); \(P(0 \le z \le 1.63) = .4484\)
probability = 0.4484 × 2 = 0.8968
41. a. Normal distribution with \(E(\bar{p}) = p = .25\) and \(\sigma_{\bar{p}} = \sqrt{\frac{.25(1 - .25)}{1000}} = .0137\)
b. \(z = \frac{.03}{.0137} = 2.19\)
\(P(.22 \le \bar{p} \le .28) = P(-2.19 \le z \le 2.19) = .4857(2) = .9714\)
c. \(\sigma_{\bar{p}} = \sqrt{\frac{.25(1 - .25)}{500}} = .0194\); \(z = \frac{.03}{.0194} = 1.55\)
\(P(.22 \le \bar{p} \le .28) = P(-1.55 \le z \le 1.55) = .4394(2) = .8788\)
42. a. \(E(\bar{p}) = 0.15\), \(\sigma_{\bar{p}} = \sqrt{\frac{0.15(0.85)}{50}} = 0.0505\)
b. \(P(.12 \le \bar{p} \le .18) = ?\)
\(z = \frac{.18 - .15}{.0505} = .59\)
\(P(-.59 \le z \le .59) = .4448\)
c. \(P(\bar{p} \ge .10) = ?\)
\(z = \frac{.10 - .15}{.0505} = -.99\)
\(P(z \ge -.99) = .3389 + .5000 = .8389\)
43. a. \(E(\bar{p}) = 0.17\), \(\sigma_{\bar{p}} = \sqrt{\frac{(0.17)(1 - 0.17)}{800}} = 0.01328\)
Normal distribution
b.
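\(z = \frac{0.19 - 0.17}{0.01328} = 1.51\) and \(z = \frac{0.15 - 0.17}{0.01328} = -1.51\)
\(P(0 \le z \le 1.51) = .4345\); \(P(-1.51 \le z \le 0) = .4345\)
probability = 0.4345 + 0.4345 = 0.8690
c. \(\sigma_{\bar{p}} = \sqrt{\frac{(0.17)(1 - 0.17)}{1600}} = 0.0094\)
\(z = \frac{0.19 - 0.17}{0.0094} = 2.13\) and \(z = \frac{0.15 - 0.17}{0.0094} = -2.13\)
\(P(0 \le z \le 2.13) = .4834\); \(P(-2.13 \le z \le 0) = .4834\)
probability = 0.4834 + 0.4834 = 0.9668
44. 112, 145, 73, 324, 293, 875, 318, 618
45. a. Normal distribution with \(E(\bar{x}) = 3\) and \(\sigma_{\bar{x}} = \sigma/\sqrt{n} = 1.2/\sqrt{50} = .17\)
b. \(z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{.25}{1.2/\sqrt{50}} = 1.47\); .4292 × 2 = .8584
46. a. Normal distribution with \(E(\bar{x}) = 31.5\) and \(\sigma_{\bar{x}} = \sigma/\sqrt{n} = 12/\sqrt{50} = 1.70\)
b. \(z = \frac{1}{1.70} = 0.59\); \(P(0 \le z \le .59) = .2224\)
probability = 0.2224 × 2 = 0.4448
c. \(z = \frac{3}{1.70} = 1.77\); \(P(0 \le z \le 1.77) = .4616\)
probability = 0.4616 × 2 = 0.9232
47. a. \(E(\bar{x})\) = $24.07, \(\sigma_{\bar{x}} = \sigma/\sqrt{n} = 4.80/\sqrt{120} = 0.44\)
\(z = \frac{0.50}{0.44} = 1.14\); \(P(0 \le z \le 1.14) = .3729\)
probability = 0.3729 × 2 = 0.7458
b. \(z = \frac{1.00}{0.44} = 2.28\); \(P(0 \le z \le 2.28) = .4887\)
probability = 0.4887 × 2 = 0.9774
48. \(\mu = 41{,}979\), \(\sigma = 5000\)
a. \(\sigma_{\bar{x}} = 5000/\sqrt{50} = 707\)
b. \(z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{0}{707} = 0\)
\(P(\bar{x} > 41{,}979) = P(z > 0) = .50\)
c. \(z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{1000}{707} = 1.41\)
\(P(40{,}979 \le \bar{x} \le 42{,}979) = P(-1.41 \le z \le 1.41) = (.4207)(2) = .8414\)
d. \(\sigma_{\bar{x}} = 5000/\sqrt{100} = 500\); \(z = \frac{1000}{500} = 2.00\)
\(P(40{,}979 \le \bar{x} \le 42{,}979) = P(-2 \le z \le 2) = (.4772)(2) = .9544\)
49. a. \(\sigma_{\bar{x}} = \sqrt{\frac{N - n}{N - 1}}\frac{\sigma}{\sqrt{n}}\)
N = 2000: \(\sigma_{\bar{x}} = \sqrt{\frac{2000 - 50}{2000 - 1}}\frac{144}{\sqrt{50}} = 20.11\)
N = 5000: \(\sigma_{\bar{x}} = \sqrt{\frac{5000 - 50}{5000 - 1}}\frac{144}{\sqrt{50}} = 20.26\)
N = 10,000: \(\sigma_{\bar{x}} = \sqrt{\frac{10{,}000 - 50}{10{,}000 - 1}}\frac{144}{\sqrt{50}} = 20.31\)
Note: With \(n/N \le .05\) for all three cases, common statistical practice would be to ignore the finite population correction factor and use \(\sigma_{\bar{x}} = \frac{144}{\sqrt{50}} = 20.36\) for each case.
b. N = 2000: \(z = \frac{25}{20.11} = 1.24\); \(P(-1.24 \le z \le 1.24) = .7850\)
N = 5000: \(z = \frac{25}{20.26} = 1.23\); \(P(-1.23 \le z \le 1.23) = .7814\)
N = 10,000: \(z = \frac{25}{20.31} = 1.23\); \(P(-1.23 \le z \le 1.23) = .7814\)
All probabilities are approximately .78.
50. a. \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{500}{\sqrt{n}} = 20\), so \(\sqrt{n} = 500/20 = 25\) and \(n = (25)^2 = 625\).
b. For ±25, \(z = \frac{25}{20} = 1.25\); \(P(-1.25 \le z \le 1.25) = .7888\)
51. Sampling distribution of \(\bar{x}\), with \(\sigma_{\bar{x}} = \sigma/\sqrt{30}\):
[Sketch: area .05 below 1.9 and area .05 above 2.1.]
\(\mu = \frac{1.9 + 2.1}{2} = 2\)
The area below \(\bar{x} = 2.1\) must be .95. An area of .95 in the standard normal table shows z = 1.645.
Thus, \(1.645 = \frac{2.1 - 2.0}{\sigma/\sqrt{30}}\). Solve for \(\sigma\):
\(\sigma = \frac{(0.1)\sqrt{30}}{1.645} = 0.33\)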
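These solutions use two standard-error formulas throughout. The first is \(\sigma_{\bar{x}} = \sigma/\sqrt{n}\), with the finite population correction factor applied when needed, as in Exercise 49. The second is \(\sigma_{\bar{p}} = \sqrt{p(1 - p)/n}\). Both can be written as small helpers. This is an illustrative Python sketch with hypothetical function names, not a library API. The example calls reproduce Exercise 49(a) and Exercise 33(b).

```python
from math import sqrt

def se_mean(sigma, n, N=None):
    """Standard error of x_bar; applies the finite population correction when N is given."""
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))  # finite population correction factor
    return se

def se_proportion(p, n):
    """Standard error of p_bar (no finite population correction)."""
    return sqrt(p * (1 - p) / n)

# Exercise 49(a): sigma = 144, n = 50, three population sizes
for N in (2000, 5000, 10_000):
    print(N, round(se_mean(144, 50, N), 2))
print("ignoring the correction:", round(se_mean(144, 50), 2))

# Exercise 33(b): p = .40, n = 100
print(round(se_proportion(0.40, 100), 4))
```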
52. \(p = .305\)
a. Normal distribution with \(E(\bar{p}) = p = .305\) and \(\sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{.305(1 - .305)}{200}} = .0326\)
b. \(z = \frac{\bar{p} - p}{\sigma_{\bar{p}}} = \frac{.04}{.0326} = 1.23\)
\(P(.265 \le \bar{p} \le .345) = P(-1.23 \le z \le 1.23) = .3907(2) = .7814\)
c. \(z = \frac{.02}{.0326} = .61\)
\(P(.285 \le \bar{p} \le .325) = P(-.61 \le z \le .61) = .2291(2) = .4582\)
53. \(\sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{(0.40)(0.60)}{400}} = 0.0245\)
\(P(\bar{p} \ge .375) = ?\)
\(z = \frac{.375 - .40}{.0245} = -1.02\)
\(P(z \ge -1.02) = P(z \le 1.02) = .8461\)
\(P(\bar{p} \ge .375) = .8461\)
54. a. \(\sigma_{\bar{p}} = \sqrt{\frac{(.71)(1 - .71)}{350}} = .0243\)
\(z = \frac{\bar{p} - p}{\sigma_{\bar{p}}} = \frac{.05}{.0243} = 2.06\); .4803 × 2 = .9606
b. \(z = \frac{.75 - .71}{.0243} = 1.65\); area between 0 and 1.65 is .4505.
\(P(\bar{p} \ge .75) = 1.0000 - .9505 = .0495\)
55. a. Normal distribution with \(E(\bar{p}) = .15\) and \(\sigma_{\bar{p}} = \sqrt{\frac{(0.15)(0.85)}{150}} = 0.0292\)
b. \(P(.12 \le \bar{p} \le .18) = ?\)
\(z = \frac{.18 - .15}{.0292} = 1.03\)
\(P(-1.03 \le z \le 1.03) = 2(.3485) = .6970\)
56. a. \(\sigma_{\bar{p}} = \sqrt{\frac{.25(.75)}{n}} = .0625\). Solve for n:
\(n = \frac{.25(.75)}{(.0625)^2} = 48\)
b. Normal distribution with \(E(\bar{p}) = .25\) and \(\sigma_{\bar{p}} = .0625\)
c. \(P(\bar{p} \ge .30) = ?\)
\(z = \frac{.30 - .25}{.0625} = .8\)
\(P(z \ge .8) = 1 - P(z \le .8) = 1 - .7881 = .2119\)
Thus \(P(\bar{p} \ge .30) = .2119\).
Chapter 8
Interval Estimation
Learning Objectives
1. Be able to construct and interpret an interval estimate of a population mean and/or a population proportion.
2. Understand the concept of a sampling error.
3. Be able to use knowledge of a sampling distribution to make probability statements about the sampling error.
4. Understand and be able to compute the margin of error.
5. Learn about the t distribution and when it should be used in constructing an interval estimate for a population mean.
6. Be able to use the worksheets presented in the chapter as templates for constructing interval estimates.
7. Be able to determine the size of a simple random sample necessary to estimate a population mean and a population proportion with a specified level of precision.
8. Know the definition of the following terms: confidence interval, confidence coefficient, confidence level, precision, sampling error, margin of error, degrees of freedom.
Solutions:
1. a. \(\sigma_{\bar{x}} = \sigma/\sqrt{n} = 5/\sqrt{40} = 0.79\)
b. At 95%, \(z\sigma/\sqrt{n} = 1.96(5/\sqrt{40}) = 1.55\)
2. a. \(32 \pm 1.645(6/\sqrt{50})\), or \(32 \pm 1.4\)
(30.6 to 33.4)
b. \(32 \pm 1.96(6/\sqrt{50})\), or \(32 \pm 1.66\) (30.34 to 33.66)
c. \(32 \pm 2.576(6/\sqrt{50})\), or \(32 \pm 2.19\) (29.81 to 34.19)
3. a. \(80 \pm 1.96(15/\sqrt{60})\), or \(80 \pm 3.8\) (76.2 to 83.8)
b. \(80 \pm 1.96(15/\sqrt{120})\), or \(80 \pm 2.68\) (77.32 to 82.68)
c. The larger sample provides a smaller margin of error.
4. The interval is \(126 \pm 1.96(\sigma/\sqrt{n})\), so
\(1.96\frac{16.07}{\sqrt{n}} = 4\), \(\sqrt{n} = \frac{1.96(16.07)}{4} = 7.874\), and n = 62.
5. a. \(\sigma_{\bar{x}} = \sigma/\sqrt{n} = 5.00/\sqrt{49} = .7143\)
b. \(1.96\sigma/\sqrt{n} = 1.96(5.00/\sqrt{49}) = 1.4\)
c. 34.80 ± 1.4, or (33.40 to 36.20)
6. a. \(\bar{x} = 369\)
b. s = 50
c. \(369 \pm 1.96(50/\sqrt{250})\), or \(369 \pm 6.20\) (362.8 to 375.2)
7. \(\bar{x} \pm z_{.025}(\sigma/\sqrt{n})\)
\(3.37 \pm 1.96(.28/\sqrt{120})\), or \(3.37 \pm .05\) (3.32 to 3.42)
8. a. \(\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\)
\(12{,}000 \pm 1.645(2{,}200/\sqrt{245})\), or \(12{,}000 \pm 231\) (11,769 to 12,231)
b. \(12{,}000 \pm 1.96(2{,}200/\sqrt{245})\), or \(12{,}000 \pm 275\) (11,725 to 12,275)
c. \(12{,}000 \pm 2.576(2{,}200/\sqrt{245})\), or \(12{,}000 \pm 362\) (11,638 to 12,362)
d. The interval width must increase since we want to make a statement about \(\mu\) with greater confidence.
9. a. \(\bar{x} = \frac{\sum x_i}{n} = 13.75\)
b. \(s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = 4.8969\)
c. Margin of error \(= 1.96\frac{s}{\sqrt{n}} = 1.96\frac{4.8969}{\sqrt{60}} = 1.24\)
95% confidence interval: 13.75 ± 1.24, or $12.51 to $14.99
10. \(\bar{x} \pm z_{.025}\frac{s}{\sqrt{n}}\)
\(7.75 \pm 1.96\frac{3.45}{\sqrt{180}}\), or \(7.75 \pm .50\) (7.25 to 8.25)
11. a. Using Excel we obtained a sample mean of \(\bar{x} = 6.34\) and a sample standard deviation of 2.163. The confidence interval is shown below:
\(6.34 \pm 1.96(2.163/\sqrt{50})\), or \(6.34 \pm .60\)
The 95% confidence interval estimate is 5.74 to 6.94.
12. a. \(\bar{x} = \frac{\sum x_i}{n} = \frac{114}{30} = 3.8\) minutes
b. \(s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = 2.26\) minutes
Margin of error \(= z_{.025}\frac{s}{\sqrt{n}} = 1.96\frac{2.26}{\sqrt{30}} = .81\) minutes
c. \(\bar{x} \pm z_{.025}\frac{s}{\sqrt{n}}\): 3.8 ± .81 (2.99 to 4.61)
13. a. .95
b. .90
c. .01
d. .05
e. .95
f. .85
14. a. 1.734
b. -1.321
c. 3.365
d. -1.761 and +1.761
e. -2.048 and +2.048
15. a. \(\bar{x} = \frac{\sum x_i}{n} = \frac{80}{8} = 10\)
b. \(s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{84}{8 - 1}} = 3.464\)
c. With 7 degrees of freedom, \(t_{.025} = 2.365\).
\(\bar{x} \pm t_{.025}(s/\sqrt{n})\): \(10 \pm 2.365(3.464/\sqrt{8})\), or \(10 \pm 2.90\) (7.10 to 12.90)
16. a. \(17.25 \pm 1.729(3.3/\sqrt{20})\), or \(17.25 \pm 1.28\) (15.97 to 18.53)
b. \(17.25 \pm 2.09(3.3/\sqrt{20})\), or \(17.25 \pm 1.54\) (15.71 to 18.79)
c. \(17.25 \pm 2.861(3.3/\sqrt{20})\), or \(17.25 \pm 2.11\) (15.14 to 19.36)
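The intervals in Exercises 2 through 16 all have the form point estimate ± (critical value)\((s/\sqrt{n})\). Exercise 4, and the sample-size exercises later in the chapter, invert that margin of error to solve for n. A minimal Python sketch follows, with hypothetical function names. The standard library has no t distribution, so you must supply the t critical value from a t table. `required_n` rounds up, as Exercises 23 through 30 do. Exercise 4, by contrast, reports the rounded n that was actually used.

```python
import math
from statistics import NormalDist

def margin_of_error(critical_value, s, n):
    """Critical value (z or t) times the standard error s / sqrt(n)."""
    return critical_value * s / math.sqrt(n)

def interval(estimate, critical_value, s, n):
    """Return (lower, upper) for estimate +/- margin of error."""
    e = margin_of_error(critical_value, s, n)
    return estimate - e, estimate + e

def required_n(critical_value, sigma, e):
    """Smallest n with critical_value * sigma / sqrt(n) <= e, rounded up."""
    n = (critical_value * sigma / e) ** 2
    # round off floating-point noise (e.g. 96.04000000000001) before the ceiling
    return math.ceil(round(n, 6))

z_025 = NormalDist().inv_cdf(0.975)  # approximately 1.96

# Exercise 8(b): 12,000 +/- z(2,200 / sqrt(245))
print(interval(12_000, z_025, 2_200, 245))
# Exercise 15(c): t = 2.365 for 7 degrees of freedom, taken from a t table
print(interval(10, 2.365, 3.464, 8))
# Exercise 23: sigma = 25 and E = 5
print(required_n(1.96, 25, 5))
```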
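17. At 90%, \(80 \pm t_{.05}(s/\sqrt{n})\) with df = 17 and \(t_{.05} = 1.740\):
\(80 \pm 1.740(10/\sqrt{18})\), or \(80 \pm 4.10\) (75.90 to 84.10)
At 95%, \(80 \pm t_{.025}(s/\sqrt{n})\) with df = 17 and \(t_{.025} = 2.110\):
\(80 \pm 2.110(10/\sqrt{18})\), or \(80 \pm 4.97\) (75.03 to 84.97)
18. a. \(\bar{x} = \frac{\sum x_i}{n} = \frac{18.96}{12} = 1.58\), or $1.58
b. \(s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{.239}{12 - 1}} = .1474\)
c. \(t_{.025} = 2.201\)
\(\bar{x} \pm t_{.025}(s/\sqrt{n})\): \(1.58 \pm 2.201(.1474/\sqrt{12})\), or \(1.58 \pm .09\) (1.49 to 1.67)
19. \(\bar{x} = \frac{\sum x_i}{n} = 6.53\) minutes
\(s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = 0.54\) minutes
\(\bar{x} \pm t_{.025}(s/\sqrt{n})\): \(6.53 \pm 2.093(0.54/\sqrt{20})\), or \(6.53 \pm .25\) (6.28 to 6.78)
20. a. \(22.4 \pm 1.96(5/\sqrt{61})\), or \(22.4 \pm 1.25\) (21.15 to 23.65)
b. With df = 60, \(t_{.025} = 2.000\).
\(22.4 \pm 2(5/\sqrt{61})\), or \(22.4 \pm 1.28\) (21.12 to 23.68)
c. The confidence intervals are essentially the same regardless of whether z or t is used.
21. \(\bar{x} = \frac{\sum x_i}{n} = \frac{864}{8} = 108\), or $108
\(s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{654}{8 - 1}} = 9.6658\)
\(t_{.025} = 2.365\)
\(\bar{x} \pm t_{.025}(s/\sqrt{n})\): \(108 \pm 2.365(9.6658/\sqrt{8})\), or \(108 \pm 8.08\) (99.92 to 116.08)
22. a. Using Excel, \(\bar{x} = 6.86\) and s = 0.78.
b. \(\bar{x} \pm t_{.025}(s/\sqrt{n})\) with df = 24 and \(t_{.025} = 2.064\):
\(6.86 \pm 2.064(0.78/\sqrt{25})\), or \(6.86 \pm 0.32\) (6.54 to 7.18)
23. \(n = \frac{z_{.025}^2\sigma^2}{E^2} = \frac{(1.96)^2(25)^2}{(5)^2} = 96.04\). Use n = 97.
24. a. Planning value of \(\sigma\) = Range/4 = 36/4 = 9
b. \(n = \frac{z_{.025}^2\sigma^2}{E^2} = \frac{(1.96)^2(9)^2}{(3)^2} = 34.57\). Use n = 35.
c. \(n = \frac{(1.96)^2(9)^2}{(2)^2} = 77.79\). Use n = 78.
25. a. \(n = \frac{(1.96)^2(6.82)^2}{(1.5)^2} = 79.41\). Use n = 80.
b. \(n = \frac{(1.645)^2(6.82)^2}{(2)^2} = 31.47\). Use n = 32.
26. a. \(n = \frac{z_{.025}^2\sigma^2}{E^2} = \frac{(1.96)^2(9400)^2}{(1000)^2} = 339.44\). Use 340.
b. \(n = \frac{(1.96)^2(9400)^2}{(500)^2} = 1357.78\). Use 1358.
c. \(n = \frac{(1.96)^2(9400)^2}{(200)^2} = 8486.09\). Use 8487.
27. a. \(n = \frac{(1.96)^2(2{,}000)^2}{(500)^2} = 61.47\). Use n = 62.
b. \(n = \frac{(1.96)^2(2{,}000)^2}{(200)^2} = 384.16\). Use n = 385.
c. \(n = \frac{(1.96)^2(2{,}000)^2}{(100)^2} = 1536.64\). Use n = 1537.
28. a. \(n = \frac{(1.645)^2(220)^2}{(50)^2} = 52.39\). Use 53.
b. \(n = \frac{(1.96)^2(220)^2}{(50)^2} = 74.37\). Use 75.
c. \(n = \frac{(2.576)^2(220)^2}{(50)^2} = 128.47\). Use 129.
d. We must increase the sample size to increase confidence.
29. a. \(n = \frac{(1.96)^2(6.25)^2}{(2)^2} = 37.52\). Use n = 38.
b. \(n = \frac{(1.96)^2(6.25)^2}{(1)^2} = 150.06\). Use n = 151.
30. \(n = \frac{(1.96)^2(7.8)^2}{(2)^2} = 58.43\). Use n = 59.
31. a. \(\bar{p} = 100/400 = 0.25\)
b. \(\sigma_{\bar{p}} = \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = \sqrt{\frac{0.25(0.75)}{400}} = 0.0217\)
c. \(\bar{p} \pm z_{.025}\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}\): \(.25 \pm 1.96(.0217)\), or \(.25 \pm .0424\) (.2076 to .2924)
32. a. \(.70 \pm 1.645\sqrt{\frac{0.70(0.30)}{800}}\), or \(.70 \pm .0267\) (.6733 to .7267)
b. \(.70 \pm 1.96\sqrt{\frac{0.70(0.30)}{800}}\), or \(.70 \pm .0318\)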
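(.6682 to .7318)
33. \(n = \frac{z_{.025}^2 p(1 - p)}{E^2} = \frac{(1.96)^2(0.35)(0.65)}{(0.05)^2} = 349.59\). Use n = 350.
34. Use planning value p = .50.
\(n = \frac{(1.96)^2(0.50)(0.50)}{(0.03)^2} = 1067.11\). Use n = 1068.
35. a. \(\bar{p} = 562/814 = 0.6904\)
b. \(1.645\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = 1.645\sqrt{\frac{0.6904(1 - 0.6904)}{814}} = 0.0267\)
c. 0.6904 ± 0.0267 (0.6637 to 0.7171)
36. a. \(\bar{p} = 152/346 = .4393\)
b. \(\sigma_{\bar{p}} = \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = \sqrt{\frac{.4393(1 - .4393)}{346}} = .0267\)
\(\bar{p} \pm z_{.025}\sigma_{\bar{p}}\): \(.4393 \pm 1.96(.0267)\), or \(.4393 \pm .0523\) (.3870 to .4916)
37. \(\bar{p} \pm 1.96\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}\), with \(\bar{p} = 182/650 = .28\)
\(.28 \pm 1.96\sqrt{\frac{(0.28)(0.72)}{650}}\), or \(0.28 \pm 0.0345\) (0.2455 to 0.3145)
38. a. \(1.96\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = 1.96\sqrt{\frac{(0.26)(0.74)}{400}} = 0.0430\)
b. 0.26 ± 0.0430 (0.2170 to 0.3030)
c. \(n = \frac{(1.96)^2(0.26)(0.74)}{(0.03)^2} = 821.25\). Use n = 822.
39. a. \(n = \frac{z_{.025}^2 p(1 - p)}{E^2} = \frac{(1.96)^2(.33)(1 - .33)}{(.03)^2} = 943.75\). Use 944.
b. \(n = \frac{z_{.005}^2 p(1 - p)}{E^2} = \frac{(2.576)^2(.33)(1 - .33)}{(.03)^2} = 1630.19\). Use 1631.
40. a. \(\bar{p} = 255/1018 = 0.2505\)
b. \(1.96\sqrt{\frac{(0.2505)(1 - 0.2505)}{1018}} = 0.0266\)
41. \(\sigma_{\bar{p}} = \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = \sqrt{\frac{.16(1 - .16)}{1285}} = .0102\)
Margin of error \(= 1.96\sigma_{\bar{p}} = 1.96(.0102) = .02\)
\(.16 \pm 1.96\sigma_{\bar{p}}\): .16 ± .02 (.14 to .18)
42. \(n = \frac{z_{.025}^2 p(1 - p)}{E^2}\)
September: \(n = \frac{1.96^2(.50)(1 - .50)}{(.04)^2} = 600.25\). Use n = 601.
October: \(n = \frac{1.96^2(.50)(1 - .50)}{(.03)^2} = 1067.11\). Use n = 1068.
November: \(n = \frac{1.96^2(.50)(1 - .50)}{(.02)^2} = 2401\)
Pre-Election: \(n = \frac{1.96^2(.50)(1 - .50)}{(.01)^2} = 9604\)
43. a. \(n = \frac{1.96^2(0.5)(1 - 0.5)}{(0.04)^2} = 600.25\). Use n = 601.
b. \(\bar{p} = 445/601 = 0.7404\)
c. \(0.7404 \pm 1.96\sqrt{\frac{(0.7404)(0.2596)}{601}}\), or \(0.7404 \pm 0.0350\) (0.7054 to 0.7755)
44. a. \(z_{.025}\frac{s}{\sqrt{n}} = 1.96\frac{20{,}500}{\sqrt{400}} = 2009\)
b. \(\bar{x} \pm z_{.025}(s/\sqrt{n})\): 50,000 ± 2009 (47,991 to 52,009)
45. a. \(\bar{x} \pm z_{.025}(s/\sqrt{n})\): \(252.45 \pm 1.96(74.50/\sqrt{64})\), or \(252.45 \pm 18.25\), or $234.20 to $270.70
b. Yes. The lower limit for the population mean at Niagara Falls is $234.20, which is greater than $215.60.
46. a. Using Excel, \(\bar{x}\) = 49.8 minutes
b. Using Excel, s = 15.99 minutes
c. \(\bar{x} \pm 1.96(s/\sqrt{n})\): \(49.8 \pm 1.96(15.99/\sqrt{200})\), or \(49.8 \pm 2.22\) (47.58 to 52.02)
47. a. Using Excel, we find \(\bar{x} = 16.8\) and s = 4.25. With 19 degrees of freedom, \(t_{.025} = 2.093\).
\(\bar{x} \pm 2.093(s/\sqrt{n})\): \(16.8 \pm 2.093(4.25/\sqrt{20})\), or \(16.8 \pm 1.99\) (14.81 to 18.79)
b. Using Excel, we find \(\bar{x} = 24.1\) and s = 6.21.
\(24.1 \pm 2.093(6.21/\sqrt{20})\), or \(24.1 \pm 2.90\)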
(21.2 to 27.0)
c. 16.8/24.1 = 0.697, or 69.7%, or approximately 70%
48. a. \(\bar{x} = \frac{\sum x_i}{n} = \frac{132}{10} = 13.2\)
b. \(s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{547.6}{9}} = 7.8\)
c. With df = 9, \(t_{.025} = 2.262\).
\(\bar{x} \pm t_{.025}(s/\sqrt{n})\): \(13.2 \pm 2.262(7.8/\sqrt{10})\), or \(13.2 \pm 5.58\) (7.62 to 18.78)
d. The ±5.58 shows poor precision. A larger sample size is desired.
49. \(n = \frac{1.96^2(45)^2}{(10)^2} = 77.79\). Use n = 78.
50. \(n = \frac{(2.33)^2(2.6)^2}{(1)^2} = 36.7\). Use n = 37.
51. \(n = \frac{(1.96)^2(8)^2}{(2)^2} = 61.47\). Use n = 62.
\(n = \frac{(2.576)^2(8)^2}{(2)^2} = 106.17\). Use n = 107.
52. \(n = \frac{(1.96)^2(675)^2}{(100)^2} = 175.03\). Use n = 176.
53. a. \(\bar{p} \pm 1.96\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}\), with \(\bar{p} = 212/450 = .47\)
\(0.47 \pm 1.96\sqrt{\frac{(0.47)(0.53)}{450}}\), or \(0.47 \pm 0.0461\) (0.4239 to 0.5161)
b. \(0.47 \pm 2.576\sqrt{\frac{(0.47)(0.53)}{450}}\), or \(0.47 \pm 0.06\) (0.41 to 0.53)
c. The margin of error becomes larger.
54. a. \(\bar{p} = 200/369 = 0.5420\)
b. \(1.96\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = 1.96\sqrt{\frac{(0.5420)(0.4580)}{369}} = 0.0508\)
c. 0.5420 ± 0.0508 (0.4912 to 0.5928)
55. a. \(\bar{p} = 504/1400 = .36\)
b. \(1.96\sqrt{\frac{(0.36)(0.64)}{1400}} = 0.0251\)
56. a. \(n = \frac{(2.33)^2(0.70)(0.30)}{(0.03)^2} = 1266.74\). Use n = 1267.
b. \(n = \frac{(2.33)^2(0.50)(0.50)}{(0.03)^2} = 1508.03\). Use n = 1509.
57. a. \(\bar{p} = 110/200 = 0.55\)
\(0.55 \pm 1.96\sqrt{\frac{(0.55)(0.45)}{200}}\), or \(.55 \pm .0689\) (.4811 to .6189)
b. \(n = \frac{(1.96)^2(0.55)(0.45)}{(0.05)^2} = 380.32\). Use n = 381.
58. a. \(\bar{p} = 340/500 = .68\)
b. \(\sigma_{\bar{p}} = \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = \sqrt{\frac{.68(1 - .68)}{500}} = .0209\)
\(\bar{p} \pm z_{.025}\sigma_{\bar{p}}\): \(.68 \pm 1.96(.0209)\), or \(.68 \pm .0409\) (.6391 to .7209)
59. a. \(n = \frac{(1.96)^2(0.3)(0.7)}{(0.02)^2} = 2016.84\). Use n = 2017.
b. \(\bar{p} = 520/2017 = 0.2578\)
c. \(\bar{p} \pm 1.96\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}\): \(0.2578 \pm 1.96\sqrt{\frac{(0.2578)(0.7422)}{2017}}\), or \(0.2578 \pm 0.0191\) (0.2387 to 0.2769)
60. a. \(\bar{p} = 618/1993 = .3101\)
b. \(\bar{p} \pm 1.96\sqrt{\frac{\bar{p}(1 - \bar{p})}{1993}}\): \(0.3101 \pm 1.96\sqrt{\frac{(0.3101)(0.6899)}{1993}}\), or \(.3101 \pm .0203\) (.2898 to .3304)
c. \(n = \frac{z_{.025}^2 p(1 - p)}{E^2} = \frac{(1.96)^2(0.3101)(0.6899)}{(0.01)^2} = 8218.64\). Use n = 8219.
No; the sample appears unnecessarily large. The .02 margin of error reported in part (b) should provide adequate precision.
Chapter 9
Hypothesis Testing
Learning Objectives
1. Learn how to formulate and test hypotheses about a population mean and/or a population proportion.
2. Understand the types of errors possible when conducting a hypothesis test.
3.
   Be able to determine the probability of making various errors in hypothesis tests.
4. Know how to compute and interpret p-values.
5. Be able to use the Excel worksheets presented in the chapter as templates for conducting hypothesis tests about population means and proportions.
6. Know the definition of the following terms:
   null hypothesis
   alternative hypothesis
   type I error
   type II error
   critical value
   level of significance
   one-tailed test
   two-tailed test
   p-value

Solutions:

1. a. H0: μ ≤ 600   Manager's claim
       Ha: μ > 600
    b. We are not able to conclude that the manager's claim is wrong.
    c. The manager's claim can be rejected. We can conclude that μ > 600.

2. a. H0: μ ≤ 14
       Ha: μ > 14   Research hypothesis
    b. There is no statistical evidence that the new bonus plan increases sales volume.
    c. The research hypothesis that μ > 14 is supported. We can conclude that the new bonus plan increases the mean sales volume.

3. a. H0: μ = 32   Specified filling weight
       Ha: μ ≠ 32   Overfilling or underfilling exists
    b. There is no evidence that the production line is not operating properly. Allow the production process to continue.
    c. Conclude μ ≠ 32 and that overfilling or underfilling exists. Shut down and adjust the production line.

4. a. H0: μ ≥ 220
       Ha: μ < 220   Research hypothesis to see if mean cost is less than $220
    b. We are unable to conclude that the new method reduces costs.
    c. Conclude μ < 220. Consider implementing the new method based on the conclusion that it lowers the mean cost per hour.

5. a. The Type I error is rejecting H0 when it is true. In this case, this error occurs if the researcher concludes that the mean newspaper-reading time for individuals in management positions is greater than the national average of 8.6 minutes when in fact it is not.
    b. The Type II error is accepting H0 when it is false.
       In this case, this error occurs if the researcher concludes that the mean newspaper-reading time for individuals in management positions is less than or equal to the national average of 8.6 minutes when in fact it is greater than 8.6 minutes.

6. a. H0: μ ≤ 1   The label claim or assumption
       Ha: μ > 1
    b. Claiming μ > 1 when it is not. This is the error of rejecting the product's claim when the claim is true.
    c. Concluding μ ≤ 1 when it is not. In this case, we miss the fact that the product is not meeting its label specification.

7. a. H0: μ ≤ 8000
       Ha: μ > 8000   Research hypothesis to see if the plan increases average sales
    b. Claiming μ > 8000 when the plan does not increase sales. A mistake could be implementing the plan when it does not help.
    c. Concluding μ ≤ 8000 when the plan really would increase sales. This could lead to not implementing a plan that would increase sales.

8. a. H0: μ ≥ 220
       Ha: μ < 220
    b. Claiming μ < 220 when the new method does not lower costs. A mistake could be implementing the method when it does not help.
    c. Concluding μ ≥ 220 when the method really would lower costs. This could lead to not implementing a method that would lower costs.

9. a. Reject H0 if z < −1.645
    b. z = (x̄ − μ0)/(σ/√n) = (9.46 − 10)/(2/√50) = −1.91
       Reject H0; conclude Ha is true.

10. a. Reject H0 if z > 2.05
    b. z = (x̄ − μ0)/(σ/√n) = (16.5 − 15)/(7/√40) = 1.36
    c. Using the cumulative normal probability table, the area to the right of z = 1.36 is 1 − .9131 = .0869. Thus, the p-value is .0869.
    d. Do not reject H0.

11. Reject H0 if z < −1.645
    a. z = (x̄ − μ0)/(σ/√n) = (22 − 25)/(12/√100) = −2.50   Reject H0
    b. z = (24 − 25)/(12/√100) = −.83   Do not reject H0
    c. z = (23.5 − 25)/(12/√100) = −1.25   Do not reject H0
    d. z = (22.8 − 25)/(12/√100) = −1.83   Reject H0

12. a. p-value = 1 − .9656 = .0344   Reject H0
    b. p-value = 1 − .6736 = .3264   Do not reject H0
    c. p-value = 1 − .9332 = .0668   Do not reject H0
    d. z = 3.09 is the largest table value, with 1 − .999 = .001 area in the tail. For z = 3.30, the p-value is less than .001, or approximately 0. Reject H0.
    e.
       Since z is to the left of the mean and the rejection region is in the upper tail, the p-value is the area to the right of z = −1.00. Because the standard normal distribution is symmetric, the area to the right of z = −1.00 is the same as the area to the left of z = 1.00. Thus, the p-value = .8413. Do not reject H0.

13. a. H0: μ ≥ 1056
       Ha: μ < 1056
    b. Reject H0 if z < −1.645
    c. z = (x̄ − μ0)/(σ/√n) = (910 − 1056)/(1600/√400) = −1.83
    d. Reject H0 and conclude that the mean refund of "last minute" filers is less than $1056.
    e. p-value = 1.0000 − .9664 = .0336

14. a. z.01 = 2.33; reject H0 if z > 2.33
    b. z = (x̄ − μ0)/(σ/√n) = (7.25 − 6.70)/(2.5/√200) = 3.11
    c. Reject H0; conclude the mean television viewing time per day is greater than 6.70.

15. a. A summary of the sample data is shown below:

       Sample Size   Sample Mean   Sample Standard Deviation
       100           $9300         $4500

       H0: μ ≥ 10,192
       Ha: μ < 10,192
       Reject H0 if z < −1.645
       z = (9300 − 10,192)/(4500/√100) = −1.98
    b. The area to the left of z = −1.98 is the same as the area to the right of z = 1.98. Using the cumulative normal probability table, the area to the right of z = 1.98 is 1 − .9761 = .0239. Thus, the p-value = .0239.
    c. The manager can use the sample results to conclude that the mean sales price of used cars at the dealership is less than the national average. The manager may want to explore the possible reasons for the lower prices at the dealership. Perhaps sales personnel are making excessive price concessions to close the sales. Perhaps the dealership is missing out on a portion of the late model used car market that might warrant used cars with higher prices. The manager's judgment and insight might suggest other reasons the dealership is experiencing the lower mean sales prices.

16.
    A summary of the sample data is shown below:

    Sample Size   Sample Mean   Sample Standard Deviation
    30            27,500        1000

    H0: μ ≥ 28,000
    Ha: μ < 28,000
    Reject H0 if z < −1.645
    z = (27,500 − 28,000)/(1000/√30) = −2.74
    Reject H0; the tires are not meeting the at-least-28,000 design specification.
    Because the standard normal distribution is symmetric, the area to the left of z = −2.74 is the same as the area to the right of z = 2.74. Using the cumulative normal probability table, the area to the right of z = 2.74 is 1 − .9969 = .0031. Thus, the p-value = .0031.

17. a. H0: μ ≥ 13
       Ha: μ < 13
    b. z.01 = 2.33; reject H0 if z < −2.33
    c. z = (10.8 − 13)/(9.2/√145) = −2.88
    d. Reject H0; conclude Canadian mean internet usage is less than 13 hours per month.
       Note: p-value = .002

18. a. H0: μ ≤ 5.72
       Ha: μ > 5.72
    b. z = (5.98 − 5.72)/(1.24/√102) = 2.12
    c. p-value = 1.0000 − .9830 = .0170
    d. p-value < α; reject H0. Conclude teens in Chicago have a mean expenditure greater than 5.72.

19. a. H0: μ ≥ 181,900
       Ha: μ < 181,900
    b. z = (166,400 − 181,900)/(33,500/√40) = −2.93
    c. p-value = 1.0000 − .9983 = .0017
    d. p-value < α; reject H0. Conclude the mean selling price in the South is less than the national mean selling price.

20. a. H0: μ ≤ 37,000
       Ha: μ > 37,000
    b. z = (38,100 − 37,000)/(5200/√48) = 1.47
    c. p-value = 1.0000 − .9292 = .0708
    d. p-value > α; do not reject H0. Cannot conclude the population mean salary has increased in June 2001.

21. a. Reject H0 if z < −1.96 or z > 1.96
    b. z = (x̄ − μ0)/(σ/√n) = (10.8 − 10)/(2.5/√36) = 1.92
       Do not reject H0, since −1.96 ≤ 1.92 ≤ 1.96.

22. a. Reject H0 if z < −2.33 or z > 2.33
    b. z = (14.2 − 15)/(5/√50) = −1.13
    c. p-value = 2(1 − .8708) = .2584
    d. Do not reject H0

23. Reject H0 if z < −1.96 or z > 1.96
    a. z = (22 − 25)/(10/√80) = −2.68   Reject H0
    b. z = (27 − 25)/(10/√80) = 1.79   Do not reject H0
    c. z = (23.5 − 25)/(10/√80) = −1.34   Do not reject H0
    d. z = (28 − 25)/(10/√80) = 2.68   Reject H0

24. a. p-value = 2(1 − .9641) = .0718   Do not reject H0
    b. p-value = 2(1 − .6736) = .6528   Do not reject H0
    c.
       p-value = 2(1 − .9798) = .0404   Reject H0
    d. p-value approximately 0   Reject H0
    e. p-value = 2(1 − .8413) = .3174   Do not reject H0

25. a. z.025 = 1.96; reject H0 if z < −1.96 or z > 1.96
    b. z = (38.5 − 39.2)/(4.8/√112) = −1.54
    c. Do not reject H0. Cannot conclude a change in the population mean has occurred.
    d. p-value = 2(1.000 − .9382) = .1236

26. a. H0: μ = 8
       Ha: μ ≠ 8
       Reject H0 if z < −1.96 or z > 1.96
       z = (x̄ − μ0)/(σ/√n) = (7.5 − 8)/(3.2/√120) = −1.71
       Do not reject H0; cannot conclude the mean waiting time differs from eight minutes.
    b. Using the cumulative normal probability table, the area to the left of z = −1.71 is 1 − .9564 = .0436. Thus, the p-value = 2(.0436) = .0872.

27. a. H0: μ = 16   Continue production
       Ha: μ ≠ 16   Shut down
       Reject H0 if z < −1.96 or z > 1.96
    b. z = (16.32 − 16)/(.8/√30) = 2.19
       Reject H0 and shut down for adjustment.
    c. z = (15.82 − 16)/(.8/√30) = −1.23
       Do not reject H0; continue to run.
    d. For x̄ = 16.32, p-value = 2(1 − .9857) = .0286
       For x̄ = 15.82, p-value = 2(1 − .8907) = .2186

28. A summary of the sample data is shown below:

    Sample Size   Sample Mean   Sample Standard Deviation
    45            2.39          .20

    H0: μ = 2.2
    Ha: μ ≠ 2.2
    Reject H0 if z < −2.33 or z > 2.33
    z = (2.39 − 2.20)/(.20/√45) = 6.37
    Reject H0 and conclude that the 2.2-minute standard is not being met.

29. H0: μ = 15.20
    Ha: μ ≠ 15.20
    Reject H0 if z < −1.96 or z > 1.96
    z = (14.30 − 15.20)/(5/√35) = −1.06
    Do not reject H0; the sample does not provide evidence to conclude that there has been a change.
    p-value = 2(1 − .8554) = .2892

30. a. H0: μ = 1075
       Ha: μ ≠ 1075
    b. z = (1160 − 1075)/(840/√200) = 1.43
    c. p-value = 2(1.0000 − .9236) = .1528
    d. Do not reject H0. Cannot conclude a change in the mean amount of charitable giving.

31. a. With 15 degrees of freedom, t.05 = 1.753; reject H0 if t > 1.753
    b. t = (x̄ − μ0)/(s/√n) = (11 − 10)/(3/√16) = 1.33
       Do not reject H0

32. a. x̄ = Σxi/n = 108/6 = 18
    b. s = √(Σ(xi − x̄)²/(n − 1)) = √(10/(6 − 1)) = 1.414
    c. Reject H0 if t < −2.571 or t > 2.571
    d. t = (x̄ − μ0)/(s/√n) = (18 − 20)/(1.414/√6) = −3.46
    e. Reject H0; conclude Ha is true.

33.
    Reject H0 if t < −1.721
    a. t = (13 − 15)/(8/√22) = −1.17   Do not reject H0
    b. t = (11.5 − 15)/(8/√22) = −2.05   Reject H0
    c. t = (15 − 15)/(8/√22) = 0   Do not reject H0
    d. t = (19 − 15)/(8/√22) = 2.35   Do not reject H0

34. Excel's TDIST function with 15 degrees of freedom was used to determine each p-value.
    a. p-value = .01   Reject H0
    b. p-value = .10   Do not reject H0
    c. p-value = .03   Reject H0
    d. p-value = .15   Do not reject H0
    e. p-value = .003   Reject H0

35. a. H0: μ = 3.00
       Ha: μ ≠ 3.00
    b. t.025 = 2.262; reject H0 if t < −2.262 or t > 2.262
    c. x̄ = Σxi/n = 28/10 = 2.80
    d. s = √(Σ(xi − x̄)²/(n − 1)) = .70
    e. t = (x̄ − μ0)/(s/√n) = (2.80 − 3.00)/(.70/√10) = −.90
    f. Do not reject H0; cannot conclude the population mean earnings per share has changed.
    g. t.10 = 1.383, so the p-value is greater than 2(.10) = .20. Actual p-value = .3916

36. a. A summary of the sample data is shown below:

       Sample Size   Sample Mean   Sample Standard Deviation
       25            84.5          14.5

       H0: μ = 90
       Ha: μ ≠ 90
       Degrees of freedom = 24; t.025 = 2.064
       Reject H0 if t < −2.064 or t > 2.064
       t = (84.5 − 90)/(14.5/√25) = −1.90
       Do not reject H0; we cannot conclude the mean household expenditure in Corning differs from the U.S. mean expenditure.
    b. Using Excel's TDIST function, the p-value corresponding to t = −1.90 is approximately .07.

37. a. H0: μ ≤ 55
       Ha: μ > 55
       With 7 degrees of freedom, reject H0 if t > 1.895.
       x̄ = Σxi/n = 475/8 = 59.38
       s = √(Σ(xi − x̄)²/(n − 1)) = √(123.87/7) = 4.21
       t = (x̄ − μ0)/(s/√n) = (59.38 − 55)/(4.21/√8) = 2.94
       Reject H0; the mean number of hours worked per week exceeds 55.
    b. Using Excel's TDIST function, the p-value corresponding to t = 2.94 is approximately .011.

38. a. H0: μ = 4000
       Ha: μ ≠ 4000
    b. With 13 degrees of freedom, t.025 = 2.160; reject H0 if t < −2.160 or t > 2.160
    c. t = (x̄ − μ0)/(s/√n) = (4120 − 4000)/(275/√14) = 1.63
    d. Do not reject H0; cannot conclude that the mean cost in New City differs from $4000.
    e. With 13 degrees of freedom, t.05 = 1.771 and t.10 = 1.350. Because 1.63 is between 1.350 and 1.771, the p-value is between .10 and .20.

39. a. H0: μ ≤ 280
       Ha: μ > 280
    b. 286.9 − 280 = 6.9 yards
    c.
       t.05 = 1.860 with 8 degrees of freedom; reject H0 if t > 1.860
    d. t = (x̄ − μ0)/(s/√n) = (286.9 − 280)/(10/√9) = 2.07
    e. Reject H0; the population mean distance of the new driver is greater than that of the USGA-approved driver.
    f. t.05 = 1.860 and t.025 = 2.306, so the p-value is between .025 and .05. Actual p-value = .0361

40. H0: μ ≤ 2
    Ha: μ > 2
    With 9 degrees of freedom, reject H0 if t > 1.833
    x̄ = 2.4 and s = .5164
    t = (x̄ − μ0)/(s/√n) = (2.4 − 2)/(.5164/√10) = 2.45
    Using Excel's TDIST function, the p-value corresponding to t = 2.45 is approximately .02.
    Reject H0 and conclude μ is greater than 2 hours. For cost estimating purposes, consider using more than 2 hours of labor time.

41. a. Reject H0 if z > 1.645
    b. σp̄ = √(.50(.50)/200) = .0354
       z = (p̄ − p0)/σp̄ = (.57 − .50)/.0354 = 1.98
       Reject H0

42. a. Reject H0 if z < −1.96 or z > 1.96
    b. σp̄ = √(.20(.80)/400) = .02
       z = (p̄ − p0)/σp̄ = (.175 − .20)/.02 = −1.25
    c. p-value = 2(1 − .8944) = .2112
    d. Do not reject H0.

43. Reject H0 if z < −1.645
    σp̄ = √(.75(.25)/300) = .0250
    a. z = (.68 − .75)/.025 = −2.80
       p-value = 1 − .9974 = .0026
       Reject H0.
    b. z = (.72 − .75)/.025 = −1.20
       p-value = 1 − .8849 = .1151
       Do not reject H0.
    c. z = (.70 − .75)/.025 = −2.00
       p-value = 1 − .9772 = .0228
       Reject H0.
    d. z = (.77 − .75)/.025 = .80
       In this case, the p-value is the area to the left of z = .80. Thus, the p-value = .7881.
       Do not reject H0.

44. a. H0: p ≤ .40
       Ha: p > .40
    b. Reject H0 if z > 1.645
    c. p̄ = 188/420 = .4476
       σp̄ = √(p0(1 − p0)/n) = √(.40(1 − .40)/420) = .0239
       z = (p̄ − p0)/σp̄ = (.4476 − .40)/.0239 = 1.99
    d. Reject H0. Conclude that there has been an increase in the proportion of users receiving more than ten e-mails per day.

45. H0: p ≥ .64
    Ha: p < .64
    Reject H0 if z < −1.645
    p̄ = 52/100 = .52
    z = (.52 − .64)/√(.64(.36)/100) = −2.5
    Reject H0; conclude that less than 64% of the shoppers believe that the supermarket ketchup is as good as the national name brand ketchup.

46. a. p̄ = 285/460 = .62
    b. H0: p ≤ .50
       Ha: p > .50
       Reject H0 if z > 2.33
       z = (p̄ − p0)/√(p0(1 − p0)/n) = (.57 − .50)/√(.50(1 − .50)/500) = 3.13
       Reject H0; a Burger King taste preference should be expressed by over 50% of the consumers.
    c. Yes; the statistical evidence shows Burger King fries are preferred. The give-away was a good way to get potential customers to try the new fries.

47. A summary of the sample data is shown below:

    Sample Size   Number of College Students
    200           42

    H0: p = .25
    Ha: p ≠ .25
    Reject H0 if z < −1.645 or z > 1.645
    p̄ = 42/200 = .21
    σp̄ = √(.25(.75)/200) = .0306
    z = (p̄ − p0)/σp̄ = (.21 − .25)/.0306 = −1.31
    Do not reject H0; the magazine's claim of 25% cannot be rejected.
    p-value = 2(1 − .9049) = .1902

48. a. p̄ = 67/105 = .6381 (about 64%)
    b. σp̄ = √(.50(1 − .50)/105) = .0488
       z = (p̄ − p0)/σp̄ = (.6381 − .50)/.0488 = 2.83
    c. p-value = 2(1.0000 − .9977) = .0046
    d. p-value < .01; reject H0. Conclude the preference is for the four ten-hour day schedule.

49. a. H0: p = .44
       Ha: p ≠ .44
    b. p̄ = 205/500 = .41
       σp̄ = √(.44(1 − .44)/500) = .0222
       z = (p̄ − p0)/σp̄ = (.41 − .44)/.0222 = −1.35
       p-value = 2(1.0000 − .9115) = .1770
       Do not reject H0. Cannot conclude that there has been a change in the proportion of repeat customers.
    c. p̄ = 245/500 = .49
       z = (.49 − .44)/.0222 = 2.25
       p-value = 2(1.0000 − .9878) = .0244
       Reject H0. Conclude that the proportion of repeat customers has changed. The point estimate of the percentage of repeat customers is now 49%.

50. a. σp̄ = √(.75(1 − .75)/300) = .025
       z = (p̄ − p0)/σp̄ = (.72 − .75)/.025 = −1.20
    b. p-value = 1.0000 − .8849 = .1151
    c. Do not reject H0. Cannot conclude the manager's claim is wrong based on this sample evidence.

51. a. H0: p ≥ .047
       Ha: p < .047
    b. p̄ = 35/1182 = .0296
    c. σp̄ = √(.047(1 − .047)/1182) = .0062
       z = (p̄ − p0)/σp̄ = (.0296 − .047)/.0062 = −2.82
    d. p-value = 1.0000 − .9976 = .0024
    e. p-value < α; reject H0. The error rate for Brooks Robinson is less than the overall error rate.

52. a. H0: μ ≤ 45,250
       Ha: μ > 45,250
    b. z = (x̄ − μ0)/(σ/√n) = (47,000 − 45,250)/(6300/√95) = 2.71
    c. p-value = 1.0000 − .9966 = .0034
    d. p-value < α; reject H0. New York City school teachers must have a higher mean annual salary.

53. a.
       H0: μ ≥ 30
       Ha: μ < 30
       Reject H0 if z < −2.33
       z = (x̄ − μ0)/(σ/√n) = (29.5 − 30)/(1.8/√50) = −1.96
       Do not reject H0; the sample evidence does not support the conclusion that the Buick LeSabre provides less than 30 miles per gallon.
    b. p-value = 1 − .9750 = .0250
    c. x̄ ± z.025(σ/√n) = 27.6 ± 1.96(1.5/√45) = 27.6 ± .44
       Interval is 27.16 to 28.04

54. H0: μ ≤ 25,000
    Ha: μ > 25,000
    Reject H0 if z > 1.645
    z = (26,000 − 25,000)/(2,500/√32) = 2.26
    p-value = 1.0000 − .9881 = .0119
    Reject H0; the claim should be rejected. The mean cost is greater than $25,000.

55. H0: μ = 120
    Ha: μ ≠ 120
    With n = 10, use a t distribution with 9 degrees of freedom. Reject H0 if t < −2.262 or t > 2.262.
    x̄ = Σxi/n = 118.9
    s = √(Σ(xi − x̄)²/(n − 1)) = 4.93
    t = (x̄ − μ0)/(s/√n) = (118.9 − 120)/(4.93/√10) = −.71
    Do not reject H0; the results do not permit rejection of the assumption that μ = 120.

56. a. H0: μ = 550
       Ha: μ ≠ 550
       Reject H0 if z < −1.96 or z > 1.96
       z = (x̄ − μ0)/(σ/√n) = (562 − 550)/(40/√36) = 1.80
       Do not reject H0; the claim of $550 per month cannot be rejected.
    b. p-value = 2(1 − .9641) = .0718
    c. x̄ ± z.025(σ/√n) = 562 ± 1.96(40/√36) = 562 ± 13
       Interval is 549 to 575. Do not reject H0, since 550 is in this interval.

57. a. A summary of the sample data is shown below:

       Sample Size   Sample Mean   Sample Standard Deviation
       30            80            20

       H0: μ ≤ 72
       Ha: μ > 72
       z = (80 − 72)/(20/√30) = 2.19
       p-value = 1 − .9857 = .0143
    b. Since p-value < .05, reject H0; the mean idle time exceeds 72 minutes per day.

58. H0: p ≥ .79
    Ha: p < .79
    Reject H0 if z < −1.645
    p̄ = 360/500 = .72
    z = (p̄ − p0)/σp̄ = (.72 − .79)/√((.79)(.21)/500) = −3.84
    Reject H0; conclude that the proportion is less than .79 in 1995.

59. A summary of the sample data is shown below:

    Sample Size   Number that Work with Coworkers
    400           304

    H0: p ≤ .72
    Ha: p > .72
    Reject H0 if z > 1.645
    p̄ = 304/400 = .76
    z = (p̄ − p0)/σp̄ = (.76 − .72)/√((.72)(.28)/400) = 1.78
    Reject H0; conclude that the proportion of workers at Trident is greater than .72.

60. a. The research is attempting to see if it can be concluded that less than 50% of the working population hold jobs that they planned to hold.
    b.
       σp̄ = √(.50(.50)/1350) = .0136
       z = (p̄ − p0)/σp̄ = (.41 − .50)/.0136 = −6.62
       Reject H0 if z < −2.33
       Reject H0; it can be concluded that less than 50% of the working population hold jobs that they planned to hold. The majority hold jobs due to chance, lack of choice, or some other unplanned reason.

61. σp̄ = √(.75(.25)/356) = .0229
    p̄ = 313/356 = .88
    z = (p̄ − p0)/σp̄ = (.88 − .75)/.0229 = 5.68
    Reject H0; conclude p > .75. The data suggest that 88% of women wear shoes that are at least one size too small.

62. a. p̄ = 355/546 = .6502
    b. σp̄ = √(.67(1 − .67)/546) = .0201
       z = (p̄ − p0)/σp̄ = (.6502 − .67)/.0201 = −.98
    c. p-value = 2(1.0000 − .8365) = .3270
    d. p-value > α; do not reject H0. The assumption of two-thirds cannot be rejected.

63. a. p̄ = 330/400 = .825
    b. σp̄ = √(.78(1 − .78)/400) = .0207
       z = (p̄ − p0)/σp̄ = (.825 − .78)/.0207 = 2.17
    c. p-value = 2(1.0000 − .9850) = .03
    d. p-value < α; reject H0. The arrival rate has changed from 78%. Service appears to be improving.

64. a. p̄ = 44/125 = .352
    b. σp̄ = √(.47(1 − .47)/125) = .0446
       z = (p̄ − p0)/σp̄ = (.352 − .47)/.0446 = −2.64
    c. p-value = 1.0000 − .9959 = .0041
    d. Reject H0; conclude that the proportion of food containing pesticide residues has been reduced.

Chapter 10
Comparisons Involving Means

Learning Objectives

1. Be able to develop interval estimates and conduct hypothesis tests about the difference between the means of two populations.
2. Know the properties of the sampling distribution of the difference between two means, x̄1 − x̄2.
3. Be able to use the t distribution to conduct statistical inferences about the difference between the means of two normal populations with equal variances.
4. Understand the concept and use of a pooled variance estimate.
5. Learn how to analyze the difference between the means of two populations when the samples are independent and when the samples are matched.
6. Understand how the analysis of variance procedure can be used to determine if the means of more than two populations are equal.
7. Know the assumptions necessary to use the analysis of variance procedure.
8.
   Understand the use of the F distribution in performing the analysis of variance procedure.
9. Know how to set up an ANOVA table and interpret the entries in the table.
10. Be able to use the Excel worksheets and tools presented to conduct comparisons involving means.

Solutions:

1. a. x̄1 − x̄2 = 13.6 − 11.6 = 2
    b. σ(x̄1 − x̄2) = √(σ1²/n1 + σ2²/n2) = √((2.2)²/50 + (3)²/35) = 0.595
       2 ± 1.645(.595) = 2 ± .98, or 1.02 to 2.98
    c. 2 ± 1.96(.595) = 2 ± 1.17, or 0.83 to 3.17

2. a. x̄1 − x̄2 = 22.5 − 20.1 = 2.4
    b. s² = [(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2) = [9(2.5)² + 7(2)²]/(10 + 8 − 2) = 5.27
    c. s(x̄1 − x̄2) = √(s²(1/n1 + 1/n2)) = √(5.27(1/10 + 1/8)) = 1.09
       With 16 degrees of freedom, t.025 = 2.12
       2.4 ± 2.12(1.09) = 2.4 ± 2.31, or .09 to 4.71

3. a. x̄1 = Σxi/n = 54/6 = 9
       x̄2 = Σxi/n = 42/6 = 7
    b. s1 = √(Σ(xi − x̄1)²/(n1 − 1)) = √(18/(6 − 1)) = 1.90
       s2 = √(Σ(xi − x̄2)²/(n2 − 1)) = √(16/(6 − 1)) = 1.79
    c. x̄1 − x̄2 = 9 − 7 = 2
    d. s² = [(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2) = [5(1.90)² + 5(1.79)²]/(6 + 6 − 2) = 3.41
    e. With 10 degrees of freedom, t.025 = 2.228
       s(x̄1 − x̄2) = √(s²(1/n1 + 1/n2)) = √(3.41(1/6 + 1/6)) = 1.07
       2 ± 2.228(1.07) = 2 ± 2.37, or −0.37 to 4.37

4. a. x̄1 − x̄2 = 1.58 − 0.98 = $0.60
    b. σ(x̄1 − x̄2) = √(σ1²/n1 + σ2²/n2) = √(.12²/50 + .08²/42) = .021
       x̄1 − x̄2 ± zα/2 σ(x̄1 − x̄2) = .60 ± 1.96(.021) = .60 ± .04, or .56 to .64

5. a. 22.5 − 18.6 = 3.9 miles per day
    b. σ(x̄1 − x̄2) = √(σ1²/n1 + σ2²/n2) = √((8.4)²/50 + (7.4)²/50) = 1.58
       x̄1 − x̄2 ± zα/2 σ(x̄1 − x̄2) = 3.9 ± 1.96(1.58) = 3.9 ± 3.1, or 0.8 to 7.0

6.        LA      Miami
    x̄    6.72    6.34
    s     2.374   2.163

    σ(x̄1 − x̄2) = √((2.374)²/50 + (2.163)²/50) = 0.454
    x̄1 − x̄2 ± zα/2 σ(x̄1 − x̄2) = 6.72 − 6.34 ± 1.96(.454) = .38 ± .89, or −.51 to 1.27

7. a. x̄1 − x̄2 = 14.9 − 10.3 = 4.6 years
    b. σ(x̄1 − x̄2) = √(5.2²/100 + 3.8²/85) = .66
       z.025 σ(x̄1 − x̄2) = 1.96(.66) = 1.3
    c. x̄1 − x̄2 ± z.025 σ(x̄1 − x̄2) = 4.6 ± 1.3, or 3.3 to 5.9

8. a. x̄1 − x̄2 = 45,700 − 44,500 = 1,200
    b. Pooled variance s² = [7(700)² + 11(850)²]/18 = 632,083
       s(x̄1 − x̄2) = √(632,083(1/8 + 1/12)) = 362.88
       With 18 degrees of freedom, t.025 = 2.101
       1200 ± 2.101(362.88) = 1200 ± 762, or 438 to 1962
    c. The populations are normally distributed with equal variances.

9. a.
       n1 = 10     n2 = 8
       x̄1 = 21.2   x̄2 = 22.8
       s1 = 2.70   s2 = 3.55
       x̄1 − x̄2 = 21.2 − 22.8 = −1.6
       Kitchens are less expensive by $1,600.
    b. x̄1 − x̄2 ± tα/2 s(x̄1 − x̄2)
       Degrees of freedom = n1 + n2 − 2 = 16; t.05 = 1.746
       s² = [9(2.70)² + 7(3.55)²]/(10 + 8 − 2) = 9.61
       s(x̄1 − x̄2) = √(9.61(1/10 + 1/8)) = 1.47
       −1.6 ± 1.746(1.47) = −1.6 ± 2.57, or −4.17 to +.97

10. a. x̄1 = 17.54   x̄2 = 15.36
    b. x̄1 − x̄2 = 17.54 − 15.36 = $2.18 per hour greater for union workers.
    c. s² = [(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2) = [14(2.24)² + 19(1.99)²]/(15 + 20 − 2) = 4.41
       s(x̄1 − x̄2) = √(4.41(1/15 + 1/20)) = 0.72
       x̄1 − x̄2 ± tα/2 s(x̄1 − x̄2) = 2.18 ± tα/2(.72)
       Note: Using Excel's TINV function, t.025 = 2.035.
       2.18 ± 2.035(.72) = 2.18 ± 1.47, or 0.71 to 3.65
    d. There does appear to be a difference in the mean wage rate for these two groups.

11. a. σ(x̄1 − x̄2) = √(σ1²/n1 + σ2²/n2) = √((5.2)²/40 + (6)²/50) = 1.18
       z = (25.2 − 22.8)/1.18 = 2.03
       Reject H0 if z > 1.645
       Reject H0; conclude Ha is true and μ1 − μ2 > 0.
    b. p-value = 1.0000 − .9788 = .0212

12. a. σ(x̄1 − x̄2) = √((8.4)²/80 + (7.6)²/70) = 1.31
       z = [(x̄1 − x̄2) − (μ1 − μ2)]/σ(x̄1 − x̄2) = [(104 − 106) − 0]/1.31 = −1.53
       Reject H0 if z < −1.96 or z > 1.96
       Do not reject H0
    b. p-value = 2(1.0000 − .9370) = .1260

13. a. x̄1 − x̄2 = 1.4 − 1.0 = 0.4
       s² = [(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2) = [7(.4)² + 6(.6)²]/(8 + 7 − 2) = 0.2523
       s(x̄1 − x̄2) = √(.2523(1/8 + 1/7)) = 0.26
       With 13 degrees of freedom, t.025 = 2.16
       Reject H0 if t < −2.16 or t > 2.16
       t = [(x̄1 − x̄2) − (μ1 − μ2)]/s(x̄1 − x̄2) = 0.4/0.26 = 1.54
       Do not reject H0

14. a. H0: μ1 − μ2 = 0
       Ha: μ1 − μ2 ≠ 0
    b. Reject H0 if z < −1.96 or z > 1.96
    c. σ(x̄1 − x̄2) = √((16.8)²/150 + (15.2)²/175) = 1.79
       z = [(x̄1 − x̄2) − 0]/σ(x̄1 − x̄2) = (39.3 − 35.4 − 0)/1.79 = 2.18
    d. Reject H0; conclude the population means differ.
    e. p-value = 2(1.0000 − .9854) = .0292

15. A summary of the sample data is shown below:

    Airport       Sample Size   Sample Mean   Sample Standard Deviation
    Miami         50            6.34          2.163
    Los Angeles   50            6.72          2.374

    We will treat Los Angeles as population 1.
    H0: μ1 − μ2 ≤ 0
    Ha: μ1 − μ2 > 0
    z = [(x̄1 − x̄2) − (μ1 − μ2)]/√(s1²/n1 + s2²/n2) = [(6.72 − 6.34) − 0]/√((2.374)²/50 + (2.163)²/50) = 0.84
    Since 0.84 < z.05 = 1.645, we cannot reject H0.

16.
    H0: μ1 − μ2 = 0
    Ha: μ1 − μ2 ≠ 0
    Reject H0 if z < −1.96 or z > 1.96
    z = [(x̄1 − x̄2) − 0]/√(σ1²/n1 + σ2²/n2) = (40 − 35)/√((9)²/36 + (10)²/49) = 2.41
    Reject H0; customers at the two stores differ in terms of mean ages.
    p-value = 2(1.0000 − .9920) = .0160

17. a. Population 1 is supplier A; population 2 is supplier B.
    b. H0: μ1 − μ2 ≤ 0   Stay with supplier A
       Ha: μ1 − μ2 > 0   Change to supplier B
       Reject H0 if z > 1.645
       z = [(x̄1 − x̄2) − (μ1 − μ2)]/√(σ1²/n1 + σ2²/n2) = [(14 − 12.5) − 0]/√((3)²/50 + (2)²/30) = 2.68
       p-value = 1.0000 − .9963 = .0037
       Reject H0; change to supplier B.

18. A summary of the sample data is shown below:

    Employees   Sample Size   Sample Mean   Sample Standard Deviation
    Male        44            $12.34        $0.92
    Female      32            $11.59        $0.76

    We will treat the male employees as population 1.
    H0: μ1 − μ2 ≤ 0
    Ha: μ1 − μ2 > 0
    Reject H0 if z > 2.33
    z = [(x̄1 − x̄2) − (μ1 − μ2)]/√(s1²/n1 + s2²/n2) = [(12.34 − 11.59) − 0]/√((.92)²/44 + (.76)²/32) = 3.88
    Reject H0; wage discrimination appears to exist.

19. a. H0: μ1 − μ2 = 0
       Ha: μ1 − μ2 ≠ 0
       Degrees of freedom = n1 + n2 − 2 = 24; t.025 = 2.064
       Reject H0 if t < −2.064 or t > 2.064
       x̄1 = 30.6   x̄2 = 27
       s1 = 3.35   s2 = 2.64
       s(x̄1 − x̄2) = √(s1²/n1 + s2²/n2) = √((3.35)²/12 + (2.64)²/14) = 1.20
       t = [(30.6 − 27) − 0]/1.20 = 3.0
       Reject H0; the population means differ.
    b. Public accountants have a higher mean: x̄1 − x̄2 = 30.6 − 27 = 3.6, or $3,600.

20. a. H0: μ1 − μ2 = 0
       Ha: μ1 − μ2 ≠ 0
       σ(x̄1 − x̄2) = √(σ1²/n1 + σ2²/n2) = √(2.5²/112 + 2.5²/84) = .36
       z = [(x̄1 − x̄2) − 0]/σ(x̄1 − x̄2) = (69.95 − 69.56)/.36 = 1.08
    b. p-value = 2(1.0000 − .8599) = .2802
    c. Do not reject H0. Cannot conclude that there is a difference between the population mean scores for the two golfers.

21. a. H0: μ1 − μ2 = 0
       Ha: μ1 − μ2 ≠ 0
    b. t.025 = 2.021 with df = n1 + n2 − 2 = 22 + 20 − 2 = 40
       Reject H0 if t < −2.021 or t > 2.021
    c.
       s² = [(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2) = [(22 − 1)(.8)² + (20 − 1)(1.1)²]/(22 + 20 − 2) = .9108
       s(x̄1 − x̄2) = √(s²(1/n1 + 1/n2)) = √(.9108(1/22 + 1/20)) = .2948
       t = [(x̄1 − x̄2) − 0]/s(x̄1 − x̄2) = (2.5 − 2.1 − 0)/.2948 = 1.36
    d. Do not reject H0. Cannot conclude that a difference between population means exists.
    e. Using Excel's TDIST function, p-value = .18.

22. a. H0: μ1 − μ2 ≤ 0
       Ha: μ1 − μ2 > 0
    b. t.05 = 1.711 with df = n1 + n2 − 2 = 16 + 10 − 2 = 24
       Reject H0 if t > 1.711
    c. s² = [(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2) = [(16 − 1)(.64)² + (10 − 1)(.75)²]/(16 + 10 − 2) = .4669
       s(x̄1 − x̄2) = √(.4669(1/16 + 1/10)) = .2755
       t = [(x̄1 − x̄2) − 0]/s(x̄1 − x̄2) = (6.82 − 6.25 − 0)/.2755 = 2.07
    d. Reject H0. Conclude that the consultant with more experience has the higher population mean rating.
    e. Using Excel's TDIST function, p-value = .025.

23. a. Differences: 1, 2, 0, 0, 2
    b. d̄ = Σdi/n = 5/5 = 1
    c. sd = √(Σ(di − d̄)²/(n − 1)) = √(4/(5 − 1)) = 1
    d. With 4 degrees of freedom, t.05 = 2.132
       Reject H0 if t > 2.132
       t = (d̄ − μd)/(sd/√n) = (1 − 0)/(1/√5) = 2.24
       Using Excel's TDIST function, p-value = .04.
       Reject H0; conclude μd > 0.

24. a. Differences: 3, −1, 3, 5, 3, 0, 1
    b. d̄ = Σdi/n = 14/7 = 2
    c. sd = √(Σ(di − d̄)²/(n − 1)) = √(26/(7 − 1)) = 2.082
    d. d̄ = 2
    e. With 6 degrees of freedom, t.025 = 2.447
       2 ± 2.447(2.082/√7) = 2 ± 1.93, or .07 to 3.93

25. Difference = rating after − rating before
    H0: μd ≤ 0
    Ha: μd > 0
    With 7 degrees of freedom, reject H0 if t > 1.895
    d̄ = .625 and sd = 1.3025
    t = (d̄ − μd)/(sd/√n) = (.625 − 0)/(1.3025/√8) = 1.36
    The p-value is greater than .10.
    Do not reject H0; we cannot conclude that seeing the commercial improves the mean potential to purchase.

26. Differences: .20, .29, .39, .02, .24, .20, .20, .52, .29, .20
    d̄ = Σdi/n = 2.55/10 = .255
    sd = √(Σ(di − d̄)²/(n − 1)) = .1327
    With df = 9, t.025 = 2.262
    d̄ ± t.025(sd/√n) = .255 ± 2.262(.1327/√10) = .255 ± .095, or .16 to .35

27. Differences: 8, 9.5, 6, 10.5, 15, 9, 11, 7.5, 12, 5
    d̄ = 93.5/10 = 9.35 and sd = 2.954
    t.025 = 2.262
    9.35 ± 2.262(2.954/√10) = 9.35 ± 2.11
    Interval estimate is 7.24 to 11.46

28. H0: μd = 0
    Ha: μd ≠ 0
    Reject H0 if t < −2.365 or t > 2.365 (df = 7)
    Differences: −.01, .03, −.06, .16, .21, .17, −.09, .11
    d̄ = Σdi/n = .52/8 = .065
    sd = √(Σ(di − d̄)²/(n − 1)) = .1131
    t = (d̄ − 0)/(sd/√n) = .065/(.1131/√8) = 1.63
    Do not reject H0. Cannot conclude that the population means differ.

29. Using matched samples, the differences are as follows: 4, −2, 8, 8, 5, 6, −4, −2, −3, 0, 11, −5, 5, 9, 5
    H0: μd ≤ 0
    Ha: μd > 0
    d̄ = 3 and sd = 5.21
    t = (d̄ − μd)/(sd/√n) = (3 − 0)/(5.21/√15) = 2.23
    Using Excel's TDIST function, p-value = .02.
    With 14 degrees of freedom, reject H0 if t > 1.761 or if p-value < α = .05.
    Reject H0.
    Conclude that the population of readers spends more time, on average, watching television than reading.

30. a. Difference = price deluxe − price standard
       H0: μd = 10
       Ha: μd ≠ 10
       With 6 degrees of freedom, reject H0 if t < −2.447 or t > 2.447; alternatively, reject H0 if p-value < α = .05.
       d̄ = 8.86 and sd = 2.61
       t = (d̄ − μd)/(sd/√n) = (8.86 − 10)/(2.61/√7) = −1.16
       Using Excel's TDIST function, p-value = .29.
       Do not reject H0; we cannot reject the hypothesis that a $10 price differential exists.
    b. d̄ ± tα/2(sd/√n) = 8.86 ± 2.447(2.61/√7) = 8.86 ± 2.41, or 6.45 to 11.27

31. a. H0: μ1 − μ2 = 0
       Ha: μ1 − μ2 ≠ 0
       With df = 11, t.025 = 2.201
       Reject H0 if t < −2.201 or t > 2.201; alternatively, reject H0 if p-value < α = .05.
       Calculate the difference, di, for each stock.
       d̄ = Σdi/n = 85/12 = 7.08
       sd = √(Σ(di − d̄)²/(n − 1)) = 3.34
       t = (d̄ − 0)/(sd/√n) = 7.08/(3.34/√12) = 7.34
       p-value ≈ 0
       Reject H0; a decrease in P/E ratios is being projected for 1998.
    b. d̄ ± t.025(sd/√n) = 7.08 ± 2.201(3.34/√12) = 7.08 ± 2.12, or 4.96 to 9.21

32. a. x̿ = (30 + 45 + 36)/3 = 37
       SSTR = Σ nj(x̄j − x̿)² = 5(30 − 37)² + 5(45 − 37)² + 5(36 − 37)² = 570
       MSTR = SSTR/(k − 1) = 570/2 = 285
    b. SSE = Σ(nj − 1)sj² = 4(6) + 4(4) + 4(6.5) = 66
       MSE = SSE/(nT − k) = 66/(15 − 3) = 5.5
    c. F = MSTR/MSE = 285/5.5 = 51.82
       F.05 = 3.89 (2 degrees of freedom numerator and 12 denominator)
       Since F = 51.82 > F.05 = 3.89, we reject the null hypothesis that the means of the three populations are equal.
    d.
       Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F
       Treatments            570              2                    285           51.82
       Error                 66               12                   5.5
       Total                 636              14

33. a. x̿ = (153 + 169 + 158)/3 = 160
       SSTR = Σ nj(x̄j − x̿)² = 4(153 − 160)² + 4(169 − 160)² + 4(158 − 160)² = 536
       MSTR = SSTR/(k − 1) = 536/2 = 268
    b. SSE = Σ(nj − 1)sj² = 3(96.67) + 3(97.33) + 3(82.00) = 828.00
       MSE = SSE/(nT − k) = 828.00/(12 − 3) = 92.00
    c. F = MSTR/MSE = 268/92 = 2.91
       F.05 = 4.26 (2 degrees of freedom numerator and 9 denominator)
       Since F = 2.91 < F.05 = 4.26, we cannot reject the null hypothesis.
    d.
       Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F
       Treatments            536              2                    268           2.91
       Error                 828              9                    92
       Total                 1364             11

34. a. x̿ = [4(100) + 6(85) + 5(79)]/15 = 87
       SSTR = Σ nj(x̄j − x̿)² = 4(100 − 87)² + 6(85 − 87)² + 5(79 − 87)² = 1,020
       MSTR = SSTR/(k − 1) = 1,020/2 = 510
    b. SSE = Σ(nj − 1)sj² = 3(35.33) + 5(35.60) + 4(43.50) = 458
       MSE = SSE/(nT − k) = 458/(15 − 3) = 38.17
    c. F = MSTR/MSE = 510/38.17 = 13.36
       F.05 = 3.89 (2 degrees of freedom numerator and 12 denominator)
       Since F = 13.36 > F.05 = 3.89, we reject the null hypothesis that the means of the three populations are equal.
    d.
       Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F
       Treatments            1020             2                    510           13.36
       Error                 458              12                   38.17
       Total                 1478             14

35. a.
       Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F
       Treatments            1200             3                    400           80
       Error                 300              60                   5
       Total                 1500             63
    b. F.05 = 2.76 (3 degrees of freedom numerator and 60 denominator)
       Since F = 80 > F.05 = 2.76, we reject the null hypothesis that the means of the 4 populations are equal.

36. a.
       Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F
       Treatments            120              2                    60            20
       Error                 216              72                   3
       Total                 336              74
    b. F.05 = 3.12 (2 degrees of freedom numerator and 72 denominator)
       Since F = 20 > F.05 = 3.12, we reject the null hypothesis that the 3 population means are equal.

37.                   Manufacturer 1   Manufacturer 2   Manufacturer 3
    Sample Mean       23               28               21
    Sample Variance   6.67             4.67             3.33

    x̿ = (23 + 28 + 21)/3 = 24
    SSTR = Σ nj(x̄j − x̿)² = 4(23 − 24)² + 4(28 − 24)² + 4(21 − 24)² = 104
    MSTR = SSTR/(k − 1) = 104/2 = 52
    SSE = Σ(nj − 1)sj² = 3(6.67) + 3(4.67) + 3(3.33) = 44.01
    MSE = SSE/(nT − k) = 44.01/(12 − 3) = 4.89
    F = MSTR/MSE = 52/4.89 = 10.63
    F.05 = 4.26 (2 degrees of freedom numerator and 9 denominator)
    Since F = 10.63 > F.05 = 4.26, we reject the null hypothesis that the mean time needed to mix a batch of material is the same for each manufacturer.

38.
                      Superior   Peer   Subordinate
    Sample Mean       5.75       5.5    5.25
    Sample Variance   1.64       2.00   1.93

    x̿ = (5.75 + 5.5 + 5.25)/3 = 5.5
    SSTR = Σ nj(x̄j − x̿)² = 8(5.75 − 5.5)² + 8(5.5 − 5.5)² + 8(5.25 − 5.5)² = 1
    MSTR = SSTR/(k − 1) = 1/2 = .5
    SSE = Σ(nj − 1)sj² = 7(1.64) + 7(2.00) + 7(1.93) = 38.99
    MSE = SSE/(nT − k) = 38.99/21 = 1.86
    F = MSTR/MSE = 0.5/1.86 = 0.27
    F.05 = 3.47 (2 degrees of freedom numerator and 21 denominator)
    Since F = 0.27 < F.05 = 3.47, we cannot reject the null hypothesis that the means of the three populations are equal; thus, the source of information does not significantly affect the dissemination of the information.

39.                   Marketing Managers   Marketing Research   Advertising
    Sample Mean       5                    4.5                  6
    Sample Variance   .8                   .3                   .4

    x̿ = (5 + 4.5 + 6)/3 = 5.17
    SSTR = Σ nj(x̄j − x̿)² = 6(5 − 5.17)² + 6(4.5 − 5.17)² + 6(6 − 5.17)² = 7.00
    MSTR = SSTR/(k − 1) = 7.00/2 = 3.5
    SSE = Σ(nj − 1)sj² = 5(.8) + 5(.3) + 5(.4) = 7.50
    MSE = SSE/(nT − k) = 7.50/(18 − 3) = .5
    F = MSTR/MSE = 3.5/.50 = 7.00
    F.05 = 3.68 (2 degrees of freedom numerator and 15 denominator)
    Since F = 7.00 > F.05 = 3.68, we reject the null hypothesis that the mean perception score is the same for the three groups of specialists.

40.                   Real Estate Agent   Architect   Stockbroker
    Sample Mean       67.73               61.13       65.80
    Sample Variance   117.72              180.10      137.12

    x̿ = (67.73 + 61.13 + 65.80)/3 = 64.89
    SSTR = Σ nj(x̄j − x̿)² = 15(67.73 − 64.89)² + 15(61.13 − 64.89)² + 15(65.80 − 64.89)² = 345.47
    MSTR = SSTR/(k − 1) = 345.47/2 = 172.74
    SSE = Σ(nj − 1)sj² = 14(117.72) + 14(180.10) + 14(137.12) = 6089.16
    MSE = SSE/(nT − k) = 6089.16/(45 − 3) = 144.98
    F = MSTR/MSE = 172.74/144.98 = 1.19
    F.05 = 3.22 (2 degrees of freedom numerator and 42 denominator)
    Since F = 1.19 < F.05 = 3.22, we cannot reject the null hypothesis that the job stress ratings are the same for the three occupations.

41.
The Excel output is shown below:
SUMMARY
Groups | Count | Sum | Average | Variance
Banking | 12 | 183 | 15.25 | 29.8409
Financial Services | 7 | 128 | 18.2857 | 16.5714
Insurance | 10 | 163 | 16.3 | 15.1222

ANOVA
Source of Variation | SS | df | MS | F | P-value | F crit
Between Groups | 40.7732 | 2 | 20.3866 | 0.9402 | 0.4034 | 3.3690
Within Groups | 563.7786 | 26 | 21.6838 | | |
Total | 604.5517 | 28 | | | |

Since the p-value = 0.4034 > α = 0.05, we cannot reject the null hypothesis that the mean price/earnings ratio is the same for these three groups of firms.

42. $(\bar{x}_1 - \bar{x}_2) \pm z_{.05}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
$(45{,}000 - 35{,}000) \pm 1.645\sqrt{\frac{(4000)^2}{60} + \frac{(3500)^2}{80}}$
10,000 ± 1066, or 8,934 to 11,066

43. $H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \ne 0$
Reject H0 if z < -1.96 or if z > 1.96
$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{(4.27 - 3.38) - 0}{\sqrt{\frac{(1.85)^2}{120} + \frac{(1.46)^2}{100}}} = 3.99$
Reject H0; a difference exists, with system B having the lower mean checkout time.

44. a. $H_0: \mu_1 - \mu_2 \le 0$
$H_a: \mu_1 - \mu_2 > 0$
Reject H0 if z > 1.645
b. n1 = 30, n2 = 30, $\bar{x}_1$ = 16.23, $\bar{x}_2$ = 15.70, s1 = 3.52, s2 = 3.31
$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{(3.52)^2}{30} + \frac{(3.31)^2}{30}} = 0.88$
$z = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{s_{\bar{x}_1 - \bar{x}_2}} = \frac{16.23 - 15.70}{0.88} = 0.60$
Do not reject H0; we cannot conclude that the mutual funds with a load have a greater mean rate of return. Load funds 16.23%; no-load funds 15.7%.
c. At z = 0.60, Area = 0.2257
p-value = 1.0000 - .7257 = 0.2743

45. Difference = before - after
$H_0: \mu_d \le 0$
$H_a: \mu_d > 0$
With 5 degrees of freedom, reject H0 if t > 2.015
$\bar{d} = 6.167$ and $s_d = 6.585$
$t = \frac{\bar{d} - \mu_d}{s_d/\sqrt{n}} = \frac{6.167 - 0}{6.585/\sqrt{6}} = 2.29$
Using Excel's TDIST function, p-value = .035.
Reject H0; conclude that the program provides weight loss.

46. a. Population 1 - 1996; Population 2 - 1997
$H_0: \mu_1 - \mu_2 \le 0$
$H_a: \mu_1 - \mu_2 > 0$
b. $\bar{d} = \sum d_i / n = 1.74/14 = 0.12$
$s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}} = 0.33$
Degrees of freedom = 13; $t_{.05}$ = 1.771
Reject H0 if t > 1.771 or if p-value < α = .05
$t = \frac{\bar{d} - 0}{s_d/\sqrt{n}} = \frac{0.12}{0.33/\sqrt{14}} = 1.42$
Using Excel's TDIST function, p-value = .09.
Do not reject H0. The sample of 14 companies shows earnings are down in the fourth quarter by a mean of 0.12 per share. However, the data do not support the conclusion that mean earnings for all companies are down in 1997.
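The one-way ANOVA calculations that recur in Exercises 34 through 40 (and again in 47, 49, and 50) need only each sample's size, mean, and variance, and the large-sample z statistics in Exercises 43 and 44 likewise need only summary statistics. The Python sketch below reproduces both from those summaries. The names `GroupSummary`, `one_way_anova`, and `two_sample_z` are illustrative helpers, not part of the text or of Excel, and the sketch computes test statistics only; critical values still come from the F or normal tables (or Excel).

```python
import math
from dataclasses import dataclass


@dataclass
class GroupSummary:
    """Summary statistics for one sample (hypothetical helper type)."""
    n: int           # sample size n_j
    mean: float      # sample mean x-bar_j
    variance: float  # sample variance s_j^2


def one_way_anova(groups):
    """Return (SSTR, SSE, MSTR, MSE, F) for k independent samples."""
    k = len(groups)
    n_total = sum(g.n for g in groups)
    # Overall sample mean, weighted by sample size; with equal n_j this is
    # the simple average of the sample means used in the solutions.
    grand_mean = sum(g.n * g.mean for g in groups) / n_total
    sstr = sum(g.n * (g.mean - grand_mean) ** 2 for g in groups)  # between treatments
    sse = sum((g.n - 1) * g.variance for g in groups)             # within treatments
    mstr = sstr / (k - 1)
    mse = sse / (n_total - k)
    return sstr, sse, mstr, mse, mstr / mse


def two_sample_z(mean1, mean2, s1, s2, n1, n2, hypothesized_diff=0.0):
    """Large-sample z statistic for H0: mu1 - mu2 = hypothesized_diff."""
    std_err = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / std_err


# Exercise 37: three manufacturers, n_j = 4 batches each.
ex37 = [GroupSummary(4, 23, 6.67), GroupSummary(4, 28, 4.67), GroupSummary(4, 21, 3.33)]
sstr, sse, mstr, mse, f_stat = one_way_anova(ex37)
print(f"SSTR = {sstr:.2f}, SSE = {sse:.2f}, MSE = {mse:.2f}, F = {f_stat:.2f}")

# Exercise 43: checkout times for the two systems.
z = two_sample_z(4.27, 3.38, 1.85, 1.46, 120, 100)
print(f"z = {z:.2f}")
```

Running it prints `SSTR = 104.00, SSE = 44.01, MSE = 4.89, F = 10.63` and `z = 3.99`, matching the hand calculations. Weighting the overall mean by sample size is what lets the same function handle unequal sample sizes, as in Exercise 34.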
47. a.
 | Area 1 | Area 2
Sample Mean | 96 | 94
Sample Variance | 50 | 40
Pooled estimate: $s^2 = \frac{s_1^2 + s_2^2}{2} = \frac{50 + 40}{2} = 45$
Estimate of the standard deviation of $\bar{x}_1 - \bar{x}_2$: $\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{45\left(\frac{1}{4} + \frac{1}{4}\right)} = 4.74$
$t = \frac{\bar{x}_1 - \bar{x}_2}{4.74} = \frac{96 - 94}{4.74} = .42$
$t_{.025}$ = 2.447 (6 degrees of freedom)
Since t = .42 < $t_{.025}$ = 2.447, the means are not significantly different.
b. $\bar{\bar{x}} = (96 + 94)/2 = 95$
$\text{SSTR} = \sum_{j=1}^{k} n_j(\bar{x}_j - \bar{\bar{x}})^2 = 4(96 - 95)^2 + 4(94 - 95)^2 = 8$
MSTR = SSTR/(k - 1) = 8/1 = 8
$\text{SSE} = \sum_{j=1}^{k} (n_j - 1)s_j^2 = 3(50) + 3(40) = 270$
MSE = SSE/(n_T - k) = 270/(8 - 2) = 45
F = MSTR/MSE = 8/45 = .18
$F_{.05}$ = 5.99 (1 degree of freedom numerator and 6 denominator)
Since F = .18 < $F_{.05}$ = 5.99, the means are not significantly different.
c.
 | Area 1 | Area 2 | Area 3
Sample Mean | 96 | 94 | 83
Sample Variance | 50 | 40 | 42
$\bar{\bar{x}} = (96 + 94 + 83)/3 = 91$
$\text{SSTR} = \sum_{j=1}^{k} n_j(\bar{x}_j - \bar{\bar{x}})^2 = 4(96 - 91)^2 + 4(94 - 91)^2 + 4(83 - 91)^2 = 392$
MSTR = SSTR/(k - 1) = 392/2 = 196
$\text{SSE} = \sum_{j=1}^{k} (n_j - 1)s_j^2 = 3(50) + 3(40) + 3(42) = 396$
MSE = SSE/(n_T - k) = 396/(12 - 3) = 44
F = MSTR/MSE = 196/44 = 4.45
$F_{.05}$ = 4.26 (2 degrees of freedom numerator and 9 denominator)
Since F = 4.45 > $F_{.05}$ = 4.26, we reject the null hypothesis that the mean asking prices for all three areas are equal.

48. The Excel output for these data is shown below:
SUMMARY
Groups | Count | Sum | Average | Variance
Sport Utility | 10 | 586 | 58.6 | 20.9333
Small Pickup | 10 | 488 | 48.8 | 17.7333
Full-Size Pickup | 10 | 601 | 60.1 | 22.1

ANOVA
Source of Variation | SS | df | MS | F | P-value | F crit
Between Groups | 753.2667 | 2 | 376.6333 | 18.5941 | 8.37E-06 | 3.3541
Within Groups | 546.9 | 27 | 20.2556 | | |
Total | 1300.167 | 29 | | | |

Because the p-value = .000 < α = .05, we can reject the null hypothesis that the mean resale value is the same. It appears that the mean resale value for small pickup trucks is much smaller than the mean resale value for sport utility vehicles or full-size pickup trucks.

49.
 | Food | Personal Care | Retail
Sample Mean | 52.25 | 62.25 | 55.75
Sample Variance | 22.25 | 15.58 | 4.92
$\bar{\bar{x}} = (52.25 + 62.25 + 55.75)/3 = 56.75$
$\text{SSTR} = \sum_{j=1}^{k} n_j(\bar{x}_j - \bar{\bar{x}})^2 = 4(52.25 - 56.75)^2 + 4(62.25 - 56.75)^2 + 4(55.75 - 56.75)^2 = 206$
MSTR = SSTR/(k - 1) = 206/2 = 103
$\text{SSE} = \sum_{j=1}^{k} (n_j - 1)s_j^2 = 3(22.25) + 3(15.58) + 3(4.92) = 128.25$
MSE = SSE/(n_T - k) = 128.25/(12 - 3) = 14.25
F = MSTR/MSE = 103/14.25 = 7.23
$F_{.05}$ = 4.26 (2 degrees of freedom numerator and 9 denominator)
Since F = 7.23 exceeds the critical F value, we reject the null hypothesis that the mean age of executives is the same in the three categories of companies.

50.
 | Lawyer | Physical Therapist | Cabinet Maker | Systems Analyst
Sample Mean | 50.0 | 63.7 | 69.1 | 61.2
Sample Variance | 124.22 | 164.68 | 105.88 | 136.62
$\bar{\bar{x}} = \frac{50.0 + 63.7 + 69.1 + 61.2}{4} = 61$
$\text{SSTR} = \sum_{j=1}^{k} n_j(\bar{x}_j - \bar{\bar{x}})^2 = 10(50.0 - 61)^2 + 10(63.7 - 61)^2 + 10(69.1 - 61)^2 + 10(61.2 - 61)^2 = 1939.4$
MSTR = SSTR/(k - 1) = 1939.4/3 = 646.47
$\text{SSE} = \sum_{j=1}^{k} (n_j - 1)s_j^2 = 9(124.22) + 9(164.68) + 9(105.88) + 9(136.62) = 4{,}782.60$
MSE = SSE/(n_T - k) = 4782.6/(40 - 4) = 132.85
F = MSTR/MSE = 646.47/132.85 = 4.87
$F_{.05}$ = 2.87 (3 degrees of freedom numerator and 36 denominator)
Since F = 4.87 > $F_{.05}$ = 2.87, we reject the null hypothesis that the mean job satisfaction rating is the same for the four professions.

51. The Excel output for these data is shown below:
Anova: Single Factor
SUMMARY
Groups | Count | Sum | Average | Variance
West | 10 | 1080 | 108 | 565.5556
South | 10 | 917 | 91.7 | 384.9
Northeast | 10 | 1211 | 121.1 | 826.3222

ANOVA
Source of Variation | SS | df | MS | F | P-value | F crit
Between Groups | 4338.867 | 2 | 2169.4333 | 3.6630 | 0.0391 | 3.3541
Within Groups | 15991 | 27 | 592.2593 | | |
Total | 20329.87 | 29 | | | |

Because the p-value = .0391 < α = .05, we can reject the null hypothesis that the mean rate for the three regions is the same.

52.
The Excel output is shown below:
SUMMARY
Groups | Count | Sum | Average | Variance
West | 10 | 600 | 60 | 52.0933
South | 10 | 454 | 45.4 | 57.9067
North Central | 10 | 473 | 47.3 | 45.9444
Northeast | 10 | 521 | 52.1 | 37.8511

ANOVA
Source of Variation | SS | df | MS | F | P-value | F crit
Between Groups | 1271 | 3 | 423.6667 | 8.7446 | 0.0002 | 2.8663
Within Groups | 1744.16 | 36 | 48.4489 | | |
Total | 3015.16 | 39 | | | |

Since the p-value = 0.0002 < α = 0.05, we can reject the null hypothesis that the mean base salary for art directors is the same for each of the four regions.

53. The Excel output for these data is shown below:
SUMMARY
Groups | Count | Sum | Average | Variance
Wide Receiver | 15 | 111.2 | 7.4133 | 0.7841
Guard | 13 | 79.4 | 6.1077 | 0.5474
Offensive Tackle | 12 | 84.7 | 7.0583 | 0.6408

ANOVA
Source of Variation | SS | df | MS | F | P-value | F crit
Between Groups | 12.4020 | 2 | 6.2010 | 9.3283 | 0.0005 | 3.2519
Within Groups | 24.5957 | 37 | 0.6647 | | |
Total | 36.9978 | 39 | | | |

Because the p-value = .0005 < α = .05, we can reject the null hypothesis that the mean rating for the three positions is the same. It appears that wide receivers and tackles have a higher mean rating than guards.

54. The output obtained using Excel's Anova: Single Factor tool is shown.
SUMMARY
Groups | Count | Sum | Average | Variance
UK | 22 | 265.14 | 12.0518 | 1.9409
US | 22 | 329.05 | 14.9568 | 3.4100
Europe | 22 | 442.3 | 20.1045 | 6.4308

ANOVA
Source of Variation | SS | df | MS | F | P-value | F crit
Between Groups | 731.7533 | 2 | 365.8766 | 93.1637 | 0.0000 | 3.1428
Within Groups | 247.4164 | 63 | 3.9272 | | |
Total | 979.1696 | 65 | | | |

Since the p-value = 0.0000 is less than α = .05, we can reject the null hypothesis that the mean download time is the same for Websites located in the United Kingdom, United States and Europe.

Chapter 11
Comparisons Involving Proportions and A Test of Independence
Learning Objectives
1. Know the properties of the sampling distribution of the difference between two proportions ($\bar{p}_1 - \bar{p}_2$).
2. Be able to develop interval estimates and conduct hypothesis tests about the difference between the proportions of two populations.
3. Be able to conduct a goodness of fit test when the population is hypothesized to have a multinomial probability distribution.
4. For a test of independence, be able to set up a contingency table, determine the observed and expected frequencies, and determine if the two variables are independent.
5. Understand the role of the chi-square distribution in conducting tests of goodness of fit and independence.
6. Be able to use the Excel worksheets presented as templates for interval estimates and hypothesis tests involving proportions.

Solutions:
1. a. $\bar{p}_1 - \bar{p}_2$ = .48 - .36 = .12
b. $s_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{\bar{p}_1(1 - \bar{p}_1)}{n_1} + \frac{\bar{p}_2(1 - \bar{p}_2)}{n_2}} = \sqrt{\frac{0.48(0.52)}{400} + \frac{0.36(0.64)}{300}} = 0.0373$
0.12 ± 1.645(0.0373) = 0.12 ± 0.0614, or 0.0586 to 0.1814
c. 0.12 ± 1.96(0.0373) = 0.12 ± 0.0731, or 0.0469 to 0.1931

2. a. $\bar{p} = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2} = \frac{200(0.22) + 300(0.16)}{200 + 300} = 0.184$
$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{(0.184)(0.816)\left(\frac{1}{200} + \frac{1}{300}\right)} = 0.0354$
Reject H0 if z > 1.645
$z = \frac{(.22 - .16) - 0}{.0354} = 1.69$
Reject H0.
b. p-value = (1.0000 - .9545) = .0455

3. $\bar{p}_1$ = 220/400 = 0.55, $\bar{p}_2$ = 192/400 = 0.48
$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{0.55(0.45)}{400} + \frac{0.48(0.52)}{400}} = 0.0353$
$\bar{p}_1 - \bar{p}_2 \pm 1.96\, s_{\bar{p}_1 - \bar{p}_2}$
0.55 - 0.48 ± 1.96(0.0353) = 0.07 ± 0.0691, or 0.0009 to 0.1391
7% more executives are predicting an increase in full-time jobs. The confidence interval shows the difference may be from 0% to 14%.

4. a. $\bar{p}_1$ = 682/1082 = .6303 (63%)
$\bar{p}_2$ = 413/1008 = .4097 (41%)
$\bar{p}_1 - \bar{p}_2$ = .6303 - .4097 = .2206 (22%)
b. $s_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{\bar{p}_1(1 - \bar{p}_1)}{n_1} + \frac{\bar{p}_2(1 - \bar{p}_2)}{n_2}} = \sqrt{\frac{.6303(1 - .6303)}{1082} + \frac{.4097(1 - .4097)}{1008}} = .0213$
$\bar{p}_1 - \bar{p}_2 \pm 1.96\, s_{\bar{p}_1 - \bar{p}_2}$
.2206 ± 1.96(.0213) = .2206 ± .0418, or .1788 to .2624

5. $\bar{p}_1 - \bar{p}_2 \pm z_{\alpha/2}\, s_{\bar{p}_1 - \bar{p}_2}$, where $s_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{\bar{p}_1(1 - \bar{p}_1)}{n_1} + \frac{\bar{p}_2(1 - \bar{p}_2)}{n_2}}$
n1 = 0.57(1710) = 975
n2 = 0.08(1710) = 137
$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{(0.58)(0.42)}{975} + \frac{(0.43)(0.57)}{137}} = 0.045$
0.58 - 0.43 ± 1.96(0.045) = 0.15 ± 0.09, or 0.06 to 0.24

6. a. $\bar{p}_1$ = 279/300 = 0.93
$\bar{p}_2$ = 255/300 = 0.85
b.
$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 \ne 0$
Reject H0 if z < -1.96 or if z > 1.96
$\bar{p} = \frac{279 + 255}{300 + 300} = 0.89$
$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{(0.89)(0.11)\left(\frac{1}{300} + \frac{1}{300}\right)} = 0.0255$
$z = \frac{\bar{p}_1 - \bar{p}_2 - 0}{s_{\bar{p}_1 - \bar{p}_2}} = \frac{0.93 - 0.85}{0.0255} = 3.13$
Using Excel's NORMSDIST function, p-value = .002.
Reject H0; women and men differ on this question.
c. $\bar{p}_1 - \bar{p}_2 \pm 1.96\, s_{\bar{p}_1 - \bar{p}_2}$
$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{(0.93)(0.07)}{300} + \frac{(0.85)(0.15)}{300}} = 0.0253$
0.93 - 0.85 ± 1.96(0.0253) = 0.08 ± 0.05, or 0.03 to 0.13
We are 95% confident that 3% to 13% more women than men agree with this statement.

7. $H_0: p_1 \le p_2$
$H_a: p_1 > p_2$
$z = \frac{(\bar{p}_1 - \bar{p}_2) - (p_1 - p_2)}{s_{\bar{p}_1 - \bar{p}_2}}$
$\bar{p} = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2} = \frac{1545(0.675) + 1691(0.608)}{1545 + 1691} = 0.64$
$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{(0.64)(0.36)\left(\frac{1}{1545} + \frac{1}{1691}\right)} = 0.017$
$z = \frac{(0.675 - 0.608) - 0}{0.017} = 3.94$
Since 3.94 > $z_{.05}$ = 1.645, we reject H0; p-value ≈ 0.
Conclusion: The proportion of men that feel that the division of housework is fair is greater than the proportion of women that feel that the division of housework is fair.

8. a. A summary of the sample data is shown below:
Respondents | Sample Size | Number Cooperating
Men | 200 | 110
Women | 300 | 210
$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 \ne 0$
Reject H0 if z < -1.96 or if z > 1.96
$\bar{p} = \frac{110 + 210}{200 + 300} = 0.64$
$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{(0.64)(0.36)\left(\frac{1}{200} + \frac{1}{300}\right)} = 0.0438$
$\bar{p}_1$ = 110/200 = 0.55, $\bar{p}_2$ = 210/300 = 0.70
$z = \frac{(\bar{p}_1 - \bar{p}_2) - (p_1 - p_2)}{s_{\bar{p}_1 - \bar{p}_2}} = \frac{(0.55 - 0.70) - 0}{0.0438} = -3.42$
Reject H0; there is a difference between response rates for men and women.
b. $-0.15 \pm 1.96\sqrt{\frac{0.55(0.45)}{200} + \frac{0.70(0.30)}{300}}$
-.15 ± .0863, or -.2363 to -.0637
Greater response rate for women.

9. a. $H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 \ne 0$
Reject H0 if z < -1.96 or if z > 1.96
$\bar{p} = \frac{63 + 60}{150 + 200} = 0.3514$
$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{(0.3514)(0.6486)\left(\frac{1}{150} + \frac{1}{200}\right)} = 0.0516$
$\bar{p}_1$ = 63/150 = 0.42, $\bar{p}_2$ = 60/200 = 0.30
$z = \frac{(\bar{p}_1 - \bar{p}_2) - (p_1 - p_2)}{s_{\bar{p}_1 - \bar{p}_2}} = \frac{(0.42 - 0.30) - 0}{0.0516} = 2.33$
p-value = 2(1.0000 - .9901) = .0198
Reject H0; there is a difference between the recall rates for the two commercials.
b. $(0.42 - 0.30) \pm 1.96\sqrt{\frac{0.42(0.58)}{150} + \frac{0.30(0.70)}{200}}$
.12 ± .10, or .02 to .22

10.
$\bar{p} = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2} = \frac{232(.815) + 210(.724)}{232 + 210} = .7718$
$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{(.7718)(.2282)\left(\frac{1}{232} + \frac{1}{210}\right)} = .04$
$z = \frac{\bar{p}_1 - \bar{p}_2 - 0}{s_{\bar{p}_1 - \bar{p}_2}} = \frac{.815 - .724}{.04} = 2.28$
p-value = 2(1.0000 - .9887) = .0226
p-value < .05, reject H0. The population proportions differ. NYSE is showing a greater proportion of stocks below their 1997 highs.

11. $H_0: p_1 - p_2 \le 0$
$H_a: p_1 - p_2 > 0$
$\bar{p} = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2} = \frac{240(.40) + 250(.32)}{240 + 250} = .3592$
$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{(.3592)(.6408)\left(\frac{1}{240} + \frac{1}{250}\right)} = .0434$
$z = \frac{\bar{p}_1 - \bar{p}_2 - 0}{s_{\bar{p}_1 - \bar{p}_2}} = \frac{.40 - .32}{.0434} = 1.85$
p-value = 1.0000 - .9678 = .0322
p-value < .05, reject H0. The proportion of users at work is greater in Washington, D.C.

12. Expected frequencies: e1 = 200(.40) = 80, e2 = 200(.40) = 80, e3 = 200(.20) = 40
Actual frequencies: f1 = 60, f2 = 120, f3 = 20
$\chi^2 = \frac{(60 - 80)^2}{80} + \frac{(120 - 80)^2}{80} + \frac{(20 - 40)^2}{40} = \frac{400}{80} + \frac{1600}{80} + \frac{400}{40} = 5 + 20 + 10 = 35$
$\chi^2_{.01}$ = 9.21034 with k - 1 = 3 - 1 = 2 degrees of freedom
Since $\chi^2$ = 35 > 9.21034, reject the null hypothesis. The population proportions are not as stated in the null hypothesis.

13. Expected frequencies: e1 = 300(.25) = 75, e2 = 300(.25) = 75, e3 = 300(.25) = 75, e4 = 300(.25) = 75
Actual frequencies: f1 = 85, f2 = 95, f3 = 50, f4 = 70
$\chi^2 = \frac{(85 - 75)^2}{75} + \frac{(95 - 75)^2}{75} + \frac{(50 - 75)^2}{75} + \frac{(70 - 75)^2}{75} = \frac{100}{75} + \frac{400}{75} + \frac{625}{75} + \frac{25}{75} = \frac{1150}{75} = 15.33$
$\chi^2_{.05}$ = 7.81473 with k - 1 = 4 - 1 = 3 degrees of freedom
Since $\chi^2$ = 15.33 > 7.81473, reject H0. We conclude that the proportions are not all equal.

14. $H_0: p_{ABC} = .29,\ p_{CBS} = .28,\ p_{NBC} = .25,\ p_{Other} = .18$
$H_a$: The proportions are not $p_{ABC} = .29,\ p_{CBS} = .28,\ p_{NBC} = .25,\ p_{Other} = .18$
Expected frequencies: 300(.29) = 87, 300(.28) = 84, 300(.25) = 75, 300(.18) = 54
e1 = 87, e2 = 84, e3 = 75, e4 = 54
Actual frequencies: f1 = 95, f2 = 70, f3 = 89, f4 = 46
$\chi^2_{.05}$ = 7.81 (3 degrees of freedom)
$\chi^2 = \frac{(95 - 87)^2}{87} + \frac{(70 - 84)^2}{84} + \frac{(89 - 75)^2}{75} + \frac{(46 - 54)^2}{54} = 6.87$
Do not reject H0; there is no significant change in the viewing audience proportions.

15.
Category | Hypothesized Proportion | Observed Frequency (fi) | Expected Frequency (ei) | (fi - ei)²/ei
Brown | 0.30 | 177 | 151.8 | 4.18
Yellow | 0.20 | 135 | 101.2 | 11.29
Red | 0.20 | 79 | 101.2 | 4.87
Orange | 0.10 | 41 | 50.6 | 1.82
Green | 0.10 | 36 | 50.6 | 4.21
Blue | 0.10 | 38 | 50.6 | 3.14
Totals | | 506 | | 29.51
$\chi^2_{.05}$ = 11.07 (5 degrees of freedom)
Since 29.51 > 11.07, we conclude that the percentage figures reported by the company have changed.

16.
Category | Hypothesized Proportion | Observed Frequency (fi) | Expected Frequency (ei) | (fi - ei)²/ei
Full Service | 1/3 | 264 | 249.33 | 0.86
Discount | 1/3 | 255 | 249.33 | 0.13
Both | 1/3 | 229 | 249.33 | 1.66
Totals | | 748 | | 2.65
$\chi^2_{.10}$ = 4.61 (2 degrees of freedom)
Since 2.65 < 4.61, there is no significant difference in preference among the three service choices.

17.
Category | Hypothesized Proportion | Observed Frequency (fi) | Expected Frequency (ei) | (fi - ei)²/ei
News and Opinion | 1/6 | 20 | 19.17 | .04
General Editorial | 1/6 | 15 | 19.17 | .91
Family Oriented | 1/6 | 30 | 19.17 | 6.12
Business/Financial | 1/6 | 22 | 19.17 | .42
Female Oriented | 1/6 | 16 | 19.17 | .52
African-American | 1/6 | 12 | 19.17 | 2.68
Totals | | 115 | | 10.69
$\chi^2_{.10}$ = 9.24 (5 degrees of freedom)
Since 10.69 > 9.24, we conclude that there is a difference in the proportion of ads with guilt appeals among the six types of magazines.

18. Expected frequencies: ei = (1/3)(135) = 45
$\chi^2 = \frac{(43 - 45)^2}{45} + \frac{(53 - 45)^2}{45} + \frac{(39 - 45)^2}{45} = 2.31$
With 2 degrees of freedom, $\chi^2_{.05}$ = 5.99
Do not reject H0; there is no justification for concluding a difference in preference exists.

19. $H_0: p_1 = .03,\ p_2 = .28,\ p_3 = .45,\ p_4 = .24$
df = 3; $\chi^2_{.01}$ = 11.34
Reject H0 if $\chi^2$ > 11.34
Rating | Observed | Expected | (fi - ei)²/ei
Excellent | 24 | .03(400) = 12 | 12.00
Good | 124 | .28(400) = 112 | 1.29
Fair | 172 | .45(400) = 180 | .36
Poor | 80 | .24(400) = 96 | 2.67
Totals | 400 | 400 | $\chi^2$ = 16.31
Reject H0; conclude that the ratings differ. A comparison of observed and expected frequencies shows telephone service is slightly better, with more excellent and good ratings.

20.
H0: The column variable is independent of the row variable
Ha: The column variable is not independent of the row variable
Expected Frequencies:
 | A | B | C
P | 28.5 | 39.9 | 45.6
Q | 21.5 | 30.1 | 34.4
$\chi^2 = \frac{(20 - 28.5)^2}{28.5} + \frac{(44 - 39.9)^2}{39.9} + \frac{(50 - 45.6)^2}{45.6} + \frac{(30 - 21.5)^2}{21.5} + \frac{(26 - 30.1)^2}{30.1} + \frac{(30 - 34.4)^2}{34.4} = 7.86$
$\chi^2_{.025}$ = 7.37776 with (2 - 1)(3 - 1) = 2 degrees of freedom
Since $\chi^2$ = 7.86 > 7.37776, reject H0.
Conclude that the column variable is not independent of the row variable.

21. H0: The column variable is independent of the row variable
Ha: The column variable is not independent of the row variable
Expected Frequencies:
 | A | B | C
P | 17.5000 | 30.6250 | 21.8750
Q | 28.7500 | 50.3125 | 35.9375
R | 13.7500 | 24.0625 | 17.1875
$\chi^2 = \frac{(20 - 17.5000)^2}{17.5000} + \frac{(30 - 30.6250)^2}{30.6250} + \cdots + \frac{(30 - 17.1875)^2}{17.1875} = 19.78$
$\chi^2_{.05}$ = 9.48773 with (3 - 1)(3 - 1) = 4 degrees of freedom
Since $\chi^2$ = 19.78 > 9.48773, reject H0.
Conclude that the column variable is not independent of the row variable.

22. H0: Type of ticket purchased is independent of the type of flight
Ha: Type of ticket purchased is not independent of the type of flight
Expected Frequencies:
e11 = 35.59, e12 = 15.41
e21 = 150.73, e22 = 65.27
e31 = 455.68, e32 = 197.32
Ticket | Flight | Observed Frequency (fi) | Expected Frequency (ei) | (fi - ei)²/ei
First | Domestic | 29 | 35.59 | 1.22
First | International | 22 | 15.41 | 2.82
Business | Domestic | 95 | 150.73 | 20.61
Business | International | 121 | 65.27 | 47.59
Full Fare | Domestic | 518 | 455.68 | 8.52
Full Fare | International | 135 | 197.32 | 19.68
Totals | | 920 | | 100.43
$\chi^2_{.05}$ = 5.99 with (3 - 1)(2 - 1) = 2 degrees of freedom
Since 100.43 > 5.99, we conclude that the type of ticket purchased is not independent of the type of flight.

23. a.
Observed Frequency (fij)
 | Domestic | European | Asian | Total
Same | 125 | 55 | 68 | 248
Different | 140 | 105 | 107 | 352
Total | 265 | 160 | 175 | 600

Expected Frequency (eij)
 | Domestic | European | Asian | Total
Same | 109.53 | 66.13 | 72.33 | 248
Different | 155.47 | 93.87 | 102.67 | 352
Total | 265 | 160 | 175 | 600

Chi Square (fij - eij)²/eij
 | Domestic | European | Asian | Total
Same | 2.18 | 1.87 | 0.26 | 4.32
Different | 1.54 | 1.32 | 0.18 | 3.04
 | | | | $\chi^2$ = 7.36

Degrees of freedom = 2; $\chi^2_{.05}$ = 5.99
Reject H0; conclude brand loyalty is not independent of manufacturer.
b. Brand Loyalty
Domestic: 125/265 = .472 (47.2%), highest
European: 55/160 = .344 (34.4%)
Asian: 68/175 = .389 (38.9%)

24.
Major | Oil | Chemical | Electrical | Computer
Business | 30 | 22.5 | 17.5 | 30
Engineering | 30 | 22.5 | 17.5 | 30
Note: Values shown above are the expected frequencies.
$\chi^2_{.01}$ = 11.3449 (3 degrees of freedom: 1 × 3 = 3)
$\chi^2$ = 12.39
Reject H0; conclude that major and industry are not independent.

25. Expected Frequencies:
e11 = 31.0, e21 = 29.5, e31 = 13.0, e41 = 5.5, e51 = 7.0, e61 = 14.0
e12 = 31.0, e22 = 29.5, e32 = 13.0, e42 = 5.5, e52 = 7.0, e62 = 14.0
Most Difficult | Gender | Observed Frequency (fi) | Expected Frequency (ei) | (fi - ei)²/ei
Spouse | Men | 37 | 31.0 | 1.16
Spouse | Women | 25 | 31.0 | 1.16
Parents | Men | 28 | 29.5 | 0.08
Parents | Women | 31 | 29.5 | 0.08
Children | Men | 7 | 13.0 | 2.77
Children | Women | 19 | 13.0 | 2.77
Siblings | Men | 8 | 5.5 | 1.14
Siblings | Women | 3 | 5.5 | 1.14
In-Laws | Men | 4 | 7.0 | 1.29
In-Laws | Women | 10 | 7.0 | 1.29
Other Relatives | Men | 16 | 14.0 | 0.29
Other Relatives | Women | 12 | 14.0 | 0.29
Totals | | 200 | | 13.43
$\chi^2_{.05}$ = 11.0705 with (6 - 1)(2 - 1) = 5 degrees of freedom
Since 13.43 > 11.0705, we conclude that gender is not independent of the most difficult person to buy for.

26.
Expected Frequencies:
e11 = 17.16, e21 = 14.88, e31 = 28.03, e41 = 22.31, e51 = 17.16, e61 = 15.45
e12 = 12.84, e22 = 11.12, e32 = 20.97, e42 = 16.69, e52 = 12.84, e62 = 11.55
Magazine | Appeal | Observed Frequency (fi) | Expected Frequency (ei) | (fi - ei)²/ei
News | Guilt | 20 | 17.16 | 0.47
News | Fear | 10 | 12.84 | 0.63
General | Guilt | 15 | 14.88 | 0.00
General | Fear | 11 | 11.12 | 0.00
Family | Guilt | 30 | 28.03 | 0.14
Family | Fear | 19 | 20.97 | 0.18
Business | Guilt | 22 | 22.31 | 0.00
Business | Fear | 17 | 16.69 | 0.01
Female | Guilt | 16 | 17.16 | 0.08
Female | Fear | 14 | 12.84 | 0.11
African-American | Guilt | 12 | 15.45 | 0.77
African-American | Fear | 15 | 11.55 | 1.03
Totals | | 201 | | 3.41
$\chi^2_{.01}$ = 15.09 with (6 - 1)(2 - 1) = 5 degrees of freedom
Since 3.41 < 15.09, the hypothesis of independence cannot be rejected.

27. a.
Observed Frequency (fij)
 | Pharm | Consumer | Computer | Telecom | Total
Correct | 207 | 136 | 151 | 178 | 672
Incorrect | 3 | 4 | 9 | 12 | 28
Total | 210 | 140 | 160 | 190 | 700

Expected Frequency (eij)
 | Pharm | Consumer | Computer | Telecom | Total
Correct | 201.6 | 134.4 | 153.6 | 182.4 | 672
Incorrect | 8.4 | 5.6 | 6.4 | 7.6 | 28
Total | 210 | 140 | 160 | 190 | 700

Chi Square (fij - eij)²/eij
 | Pharm | Consumer | Computer | Telecom | Total
Correct | .14 | .02 | .04 | .11 | .31
Incorrect | 3.47 | .46 | 1.06 | 2.55 | 7.53
 | | | | | $\chi^2$ = 7.85

Degrees of freedom = 3; $\chi^2_{.05}$ = 7.81473
Reject H0; conclude that order fulfillment is not independent of industry.
b. The pharmaceutical industry is doing the best, with 207 of 210 (98.6%) correctly filled orders.

28. Expected Frequencies:
Supplier | Good | Minor Defect | Major Defect
A | 88.76 | 6.07 | 5.14
B | 173.09 | 11.83 | 10.08
C | 133.15 | 9.10 | 7.75
$\chi^2$ = 7.96
$\chi^2_{.05}$ = 9.48773 (4 degrees of freedom: 2 × 2 = 4)
Do not reject H0; the assumption of independence cannot be rejected.

29. Expected Frequencies:
Education Level | Democratic | Republican | Independent
Did not complete high school | 28 | 28 | 14
High school degree | 32 | 32 | 16
College degree | 40 | 40 | 20
$\chi^2$ = 13.42
$\chi^2_{.01}$ = 13.2767 (4 degrees of freedom: 2 × 2 = 4)
Reject H0; conclude that party affiliation is not independent of education level.

30.
Expected Frequencies:
e11 = 11.81, e12 = 8.44, e13 = 24.75
e21 = 8.40, e22 = 6.00, e23 = 17.60
e31 = 21.79, e32 = 15.56, e33 = 45.65
Siskel | Ebert | Observed Frequency (fi) | Expected Frequency (ei) | (fi - ei)²/ei
Con | Con | 24 | 11.81 | 12.57
Con | Mixed | 8 | 8.44 | 0.02
Con | Pro | 13 | 24.75 | 5.58
Mixed | Con | 8 | 8.40 | 0.02
Mixed | Mixed | 13 | 6.00 | 8.17
Mixed | Pro | 11 | 17.60 | 2.48
Pro | Con | 10 | 21.79 | 6.38
Pro | Mixed | 9 | 15.56 | 2.77
Pro | Pro | 64 | 45.65 | 7.38
Totals | | 160 | | 45.36
$\chi^2_{.01}$ = 13.28 with (3 - 1)(3 - 1) = 4 degrees of freedom
Since 45.36 > 13.28, we conclude that the ratings are not independent.

31. A summary of the sample data is shown below:
Region | Sample Size | Number Indicating an Intent to Purchase
I | 500 | 175
II | 800 | 360
$\bar{p}_1$ = 175/500 = .35
$\bar{p}_2$ = 360/800 = .45
$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{0.35(0.65)}{500} + \frac{0.45(0.55)}{800}} = 0.0276$
$\bar{p}_2 - \bar{p}_1 \pm 2.575\, s_{\bar{p}_1 - \bar{p}_2}$: .10 ± 2.575(.0276) = .10 ± .071, or .029 to .171

32. a. $H_0: p_1 - p_2 \le 0$
$H_a: p_1 - p_2 > 0$
b. $\bar{p}_1$ = 704/1035 = .6802 (68%)
$\bar{p}_2$ = 582/1004 = .5797 (58%)
$\bar{p}_1 - \bar{p}_2$ = .6802 - .5797 = .1005
$\bar{p} = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2} = \frac{1035(0.6802) + 1004(0.5797)}{1035 + 1004} = .6307$
$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = .0214$
$z = \frac{(\bar{p}_1 - \bar{p}_2) - 0}{s_{\bar{p}_1 - \bar{p}_2}} = \frac{.6802 - .5797}{.0214} = 4.70$
p-value ≈ 0
c. Reject H0; the proportion indicating good/excellent increased.

33. a. $H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 \ne 0$
Reject H0 if z < -1.96 or if z > 1.96
$\bar{p} = \frac{76 + 90}{400 + 900} = 0.1277$
$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{(0.1277)(0.8723)\left(\frac{1}{400} + \frac{1}{900}\right)} = 0.02$
$\bar{p}_1$ = 76/400 = 0.19, $\bar{p}_2$ = 90/900 = 0.10
$z = \frac{(\bar{p}_1 - \bar{p}_2) - (p_1 - p_2)}{s_{\bar{p}_1 - \bar{p}_2}} = \frac{(0.19 - 0.10) - 0}{0.02} = 4.50$
p-value ≈ 0
Reject H0; there is a difference between claim rates.
b. $0.09 \pm 1.96\sqrt{\frac{0.19(0.81)}{400} + \frac{0.10(0.90)}{900}}$
.09 ± .0432, or .0468 to .1332

34. $\bar{p} = \frac{9 + 5}{142 + 268} = \frac{14}{410} = 0.0341$
$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{(0.0341)(0.9659)\left(\frac{1}{142} + \frac{1}{268}\right)} = 0.0188$
$\bar{p}_1$ = 9/142 = 0.0634, $\bar{p}_2$ = 5/268 = 0.0187
$\bar{p}_1 - \bar{p}_2$ = 0.0634 - 0.0187 = 0.0447
$z = \frac{0.0447 - 0}{0.0188} = 2.38$
p-value = 2(1.0000 - .9913) = 0.0174
Reject H0; there is a significant difference in drug resistance between the two states. New Jersey has the higher drug resistance rate.

35. a. .38(430) = 163.4; estimate: 163
.23(285) = 65.55; estimate: 66
b. $\bar{p}_1 - \bar{p}_2$ = .38 - .23 = .15
$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{.38(1 - .38)}{430} + \frac{.23(1 - .23)}{285}} = .0342$
Confidence interval: .15 ± 1.96(.0342), or .15 ± .067 (.083 to .217)
c. Yes, since the confidence interval in part (b) does not include 0, we would conclude that the Kodak campaign is more effective than most.

36. a. $\bar{p}_1$ = .38, $\bar{p}_2$ = .22
Point estimate = $\bar{p}_1 - \bar{p}_2$ = .38 - .22 = .16
b. $H_0: p_1 - p_2 \le 0$
$H_a: p_1 - p_2 > 0$
c. $\bar{p} = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2} = \frac{(200)(.38) + (200)(.22)}{200 + 200} = .30$
$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{(.30)(.70)\left(\frac{1}{200} + \frac{1}{200}\right)} = .0458$
$z = \frac{.38 - .22}{.0458} = 3.49$
$z_{.01}$ = 2.33
With z = 3.49 > 2.33, we reject H0 and conclude that expectations for future inflation have diminished.

37.
Observed | 60 | 45 | 59 | 36
Expected | 50 | 50 | 50 | 50
$\chi^2$ = 8.04
$\chi^2_{.05}$ = 7.81473 (3 degrees of freedom)
Reject H0; conclude that the order potentials are not the same in each sales territory.

38.
Observed | 48 | 323 | 79 | 16 | 63
Expected | 37.03 | 306.82 | 126.96 | 21.16 | 37.03
$\chi^2 = \frac{(48 - 37.03)^2}{37.03} + \frac{(323 - 306.82)^2}{306.82} + \cdots + \frac{(63 - 37.03)^2}{37.03} = 41.69$
$\chi^2_{.01}$ = 13.2767 (4 degrees of freedom)
Since 41.69 > 13.2767, reject H0. Mutual fund investors' attitudes toward corporate bonds differ from their attitudes toward corporate stock.

39.
Observed | 20 | 20 | 40 | 60
Expected | 35 | 35 | 35 | 35
$\chi^2 = \frac{(20 - 35)^2}{35} + \frac{(20 - 35)^2}{35} + \frac{(40 - 35)^2}{35} + \frac{(60 - 35)^2}{35} = 31.43$
$\chi^2_{.05}$ = 7.81473 (3 degrees of freedom)
Since 31.43 > 7.81473, reject H0. The park manager should not plan on the same number attending each day. Plan on a larger staff for Sundays and holidays.

40.
Observed | 13 | 16 | 28 | 17 | 16
Expected | 18 | 18 | 18 | 18 | 18
$\chi^2$ = 7.44
$\chi^2_{.05}$ = 9.48773
Do not reject H0; the assumption that the number of riders is uniformly distributed cannot be rejected.

41.
Category | Hypothesized Proportion | Observed Frequency (fi) | Expected Frequency (ei) | (fi - ei)²/ei
Very Satisfied | 0.28 | 105 | 140 | 8.75
Somewhat Satisfied | 0.46 | 235 | 230 | 0.11
Neither | 0.12 | 55 | 60 | 0.42
Somewhat Dissatisfied | 0.10 | 90 | 50 | 32.00
Very Dissatisfied | 0.04 | 15 | 20 | 1.25
Totals | | 500 | | 42.53
$\chi^2_{.05}$ = 9.49 (4 degrees of freedom)
Since 42.53 > 9.49, we conclude that the job satisfaction for computer programmers is different than the …

42. Expected Frequencies:
Shift | Good | Defective
1st | 368.44 | 31.56
2nd | 276.33 | 23.67
3rd | 184.22 | 15.78
$\chi^2$ = 8.11
$\chi^2_{.05}$ = 5.99147 (2 degrees of freedom)
Reject H0; conclude that shift and quality are not independent.

43. Expected Frequencies:
e11 = 1046.19, e12 = 632.81
e21 = 28.66, e22 = 17.34
e31 = 258.59, e32 = 156.41
e41 = 516.55, e42 = 312.45
Employment | Region | Observed Frequency (fi) | Expected Frequency (ei) | (fi - ei)²/ei
Full-Time | Eastern | 1105 | 1046.19 | 3.31
Full-Time | Western | 574 | 632.81 | 5.46
Part-Time | Eastern | 31 | 28.66 | 0.19
Part-Time | Western | 15 | 17.34 | 0.32
Self-Employed | Eastern | 229 | 258.59 | 3.39
Self-Employed | Western | 186 | 156.41 | 5.60
Not Employed | Eastern | 485 | 516.55 | 1.93
Not Employed | Western | 344 | 312.45 | 3.19
Totals | | 2969 | | 23.37
$\chi^2_{.05}$ = 7.81 with (4 - 1)(2 - 1) = 3 degrees of freedom
Since 23.37 > 7.81, we conclude that employment status is not independent of region.

44. Expected frequencies:
Loan Officer | Approved | Rejected
Miller | 24.86 | 15.14
McMahon | 18.64 | 11.36
Games | 31.07 | 18.93
Runk | 12.43 | 7.57
$\chi^2$ = 2.21
$\chi^2_{.05}$ = 7.81473 (3 degrees of freedom)
Do not reject H0; the loan decision does not appear to be dependent on the officer.

45. a.
Observed Frequency (fij)
 | Never Married | Married | Divorced | Total
Men | 234 | 106 | 10 | 350
Women | 216 | 168 | 16 | 400
Total | 450 | 274 | 26 | 750

Expected Frequency (eij)
 | Never Married | Married | Divorced | Total
Men | 210 | 127.87 | 12.13 | 350
Women | 240 | 146.13 | 13.87 | 400
Total | 450 | 274 | 26 | 750

Chi Square (fij - eij)²/eij
 | Never Married | Married | Divorced | Total
Men | 2.74 | 3.74 | .38 | 6.86
Women | 2.40 | 3.27 | .33 | 6.00
 | | | | $\chi^2$ = 12.86

$\chi^2_{.01}$ = 9.21; degrees of freedom = 2
Reject H0; conclude marital status is not independent of gender.
b. Marital Status
 | Never Married | Married | Divorced
Men | 66.9% | 30.3% | 2.9%
Women | 54.0% | 42.0% | 4.0%
Men: 100 - 66.9 = 33.1% have been married
Women: 100 - 54.0 = 46.0% have been married

46. Expected Frequencies:
$e_{11} = \frac{(50)(18)}{100} = 9,\quad e_{12} = \frac{(50)(24)}{100} = 12,\quad \ldots,\quad e_{25} = \frac{(50)(12)}{100} = 6$
$\chi^2 = \frac{(4 - 9)^2}{9} + \frac{(10 - 12)^2}{12} + \cdots + \frac{(4 - 6)^2}{6} = 9.76$
$\chi^2_{.05}$ = 9.48773 (4 degrees of freedom)
Since 9.76 > 9.48773, reject H0. Banking tends to have lower P/E ratios. We can conclude that industry type and P/E ratio are related.

47. Expected Frequencies:
County | Sun | Mon | Tues | Wed | Thur | Fri | Sat | Total
Urban | 56.7 | 47.6 | 55.1 | 56.7 | 60.1 | 72.6 | 44.2 | 393
Rural | 11.3 | 9.4 | 10.9 | 11.3 | 11.9 | 14.4 | 8.8 | 78
Total | 68 | 57 | 66 | 68 | 72 | 87 | 53 | 471
$\chi^2$ = 6.20
$\chi^2_{.05}$ = 12.5916 (6 degrees of freedom)
Do not reject H0; the assumption of independence cannot be rejected.

48. Expected Frequencies:
 | Los Angeles | San Diego | San Francisco | San Jose | Total
Occupied | 165.7 | 124.3 | 186.4 | 165.7 | 642
Vacant | 34.3 | 25.7 | 38.6 | 34.3 | 133
Total | 200.0 | 150.0 | 225.0 | 200.0 | 775
$\chi^2 = \frac{(160 - 165.7)^2}{165.7} + \frac{(116 - 124.3)^2}{124.3} + \cdots + \frac{(26 - 34.3)^2}{34.3} = 7.78$
$\chi^2_{.05}$ = 7.81473 with 3 degrees of freedom
Since $\chi^2$ = 7.78 < 7.81473, do not reject H0. We cannot conclude that office vacancies are dependent on metropolitan area, but it is close: the p-value is slightly larger than .05.

Chapter 12
Simple Linear Regression
Learning Objectives
1. Understand how regression analysis can be used to develop an equation that estimates mathematically how two variables are related.
2. Understand the differences between the regression model, the regression equation, and the estimated regression equation.
3. Know how to fit an estimated regression equation to a set of sample data based upon the least-squares method.
4. Be able to determine how good a fit is provided by the estimated regression equation and compute the sample correlation coefficient from the regression analysis output.
5. Understand the assumptions necessary for statistical inference and be able to test for a significant relationship.
6. Learn how to use a residual plot to make a judgement as to the validity of the regression assumptions, recognize outliers, and identify influential observations.
7. Know how to develop confidence interval estimates of y given a specific value of x in both the case of a mean value of y and an individual value of y.
8. Be able to compute the sample correlation coefficient from the regression analysis output.
9. Know the definition of the following terms: independent and dependent variable, simple linear regression, regression model, regression equation, estimated regression equation, scatter diagram, coefficient of determination, standard error of the estimate, confidence interval, prediction interval, residual plot, standardized residual plot, outlier, influential observation, leverage.

Solutions:
1. a. [Scatter diagram of y versus x]
b. There appears to be a linear relationship between x and y.
c. Many different straight lines can be drawn to provide a linear approximation of the relationship between x and y; in part d we will determine the equation of a straight line that "best" represents the relationship according to the least squares criterion.
d. Summations needed to compute the slope and y-intercept are:
$\sum x_i = 15,\quad \sum y_i = 40,\quad \sum (x_i - \bar{x})(y_i - \bar{y}) = 26,\quad \sum (x_i - \bar{x})^2 = 10$
$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{26}{10} = 2.6$
$b_0 = \bar{y} - b_1\bar{x} = 8 - (2.6)(3) = 0.2$
$\hat{y} = 0.2 + 2.6x$
e. $\hat{y} = 0.2 + 2.6(4) = 10.6$

2. a. [Scatter diagram of y versus x]
b.
There appears to be a linear relationship between x and y.
c. Many different straight lines can be drawn to provide a linear approximation of the relationship between x and y; in part d we will determine the equation of a straight line that "best" represents the relationship according to the least squares criterion.
d. Summations needed to compute the slope and y-intercept are:
$\sum x_i = 19,\quad \sum y_i = 116,\quad \sum (x_i - \bar{x})(y_i - \bar{y}) = -57.8,\quad \sum (x_i - \bar{x})^2 = 30.8$
$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{-57.8}{30.8} = -1.8766$
$b_0 = \bar{y} - b_1\bar{x} = 23.2 - (-1.8766)(3.8) = 30.3311$
$\hat{y} = 30.33 - 1.88x$
e. $\hat{y} = 30.33 - 1.88(6) = 19.05$

3. a. [Scatter diagram of y versus x]
b. Summations needed to compute the slope and y-intercept are:
$\sum x_i = 26,\quad \sum y_i = 17,\quad \sum (x_i - \bar{x})(y_i - \bar{y}) = 11.6,\quad \sum (x_i - \bar{x})^2 = 22.8$
$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{11.6}{22.8} = 0.5088$
$b_0 = \bar{y} - b_1\bar{x} = 3.4 - (0.5088)(5.2) = 0.7542$
$\hat{y} = 0.75 + 0.51x$
c. $\hat{y} = 0.75 + 0.51(4) = 2.79$

4. a. [Scatter diagram of y versus x]
b. There appears to be a linear relationship between x and y.
c. Many different straight lines can be drawn to provide a linear approximation of the relationship between x and y; in part d we will determine the equation of a straight line that "best" represents the relationship according to the least squares criterion.
d. Summations needed to compute the slope and y-intercept are:
$\sum x_i = 325,\quad \sum y_i = 585,\quad \sum (x_i - \bar{x})(y_i - \bar{y}) = 110,\quad \sum (x_i - \bar{x})^2 = 20$
$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{110}{20} = 5.5$
$b_0 = \bar{y} - b_1\bar{x} = 117 - (5.5)(65) = -240.5$
$\hat{y} = -240.5 + 5.5x$
e. $\hat{y} = -240.5 + 5.5x = -240.5 + 5.5(63) = 106$ pounds

5. a. [Scatter diagram of y versus x]
b. There appears to be a linear relationship between x = media expenditures (millions of dollars) and y = case sales (millions).
c. Many different straight lines can be drawn to provide a linear approximation of the relationship between x and y; in part d we will determine the equation of a straight line that "best" represents the relationship according to the least squares criterion.
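Every slope and intercept calculation in these exercises follows the same pattern: $b_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $b_0 = \bar{y} - b_1\bar{x}$. As a sketch (the function names are illustrative, not from the text), the Python code below computes the estimated regression equation either from the summary quantities listed in the solutions or from raw paired data. The three-point data set in the last example is made up purely to show the raw-data path.

```python
def least_squares_from_sums(n, sum_x, sum_y, s_xy, s_xx):
    """Return (b0, b1) for y-hat = b0 + b1*x.

    s_xy = sum of (x_i - x_bar)(y_i - y_bar); s_xx = sum of (x_i - x_bar)^2.
    """
    x_bar = sum_x / n
    y_bar = sum_y / n
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


def deviation_sums(xs, ys):
    """Compute (n, sum_x, sum_y, s_xy, s_xx) from raw paired observations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return n, sum(xs), sum(ys), s_xy, s_xx


def predict(b0, b1, x):
    """Point estimate y-hat at a given value of x."""
    return b0 + b1 * x


# Exercise 4: n = 5, sum x = 325, sum y = 585, s_xy = 110, s_xx = 20.
b0, b1 = least_squares_from_sums(5, 325, 585, 110, 20)
print(b0, b1, predict(b0, b1, 63))   # -240.5 5.5 106.0

# Made-up raw data lying exactly on y = 2x.
b0, b1 = least_squares_from_sums(*deviation_sums([1, 2, 3], [2, 4, 6]))
print(b0, b1)   # 0.0 2.0
```

Working from the deviation sums mirrors the hand calculations above; for large raw data sets a statistics library's regression routine would normally be used instead.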
Summations needed to compute the slope and y-intercept are:
$\sum x_i = 420.6,\quad \sum y_i = 5958.7,\quad \sum (x_i - \bar{x})(y_i - \bar{y}) = 142{,}040.3443,\quad \sum (x_i - \bar{x})^2 = 9847.6486$
$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{142{,}040.3443}{9847.6486} = 14.4238$
$b_0 = \bar{y} - b_1\bar{x} = 851.2429 - (14.4238)(60.0857) = -15.42$
$\hat{y} = -15.42 + 14.42x$
d. A one million dollar increase in media expenditures will increase case sales by approximately 14.42 million.
e. $\hat{y} = -15.42 + 14.42x = -15.42 + 14.42(70) = 993.98$

6. a. [Scatter diagram of y versus x]
b. There appears to be a linear relationship between x = percentage of flights arriving on time and y = number of complaints per 100,000 passengers.
c. Summations needed to compute the slope and y-intercept are:
$\sum x_i = 667.2,\quad \sum y_i = 7.18,\quad \sum (x_i - \bar{x})(y_i - \bar{y}) = -9.0623,\quad \sum (x_i - \bar{x})^2 = 128.7$
$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{-9.0623}{128.7} = -0.0704$
$b_0 = \bar{y} - b_1\bar{x} = 0.7978 - (-0.0704)(74.1333) = 6.02$
$\hat{y} = 6.02 - 0.07x$
d. A one percent increase in the percentage of flights arriving on time will decrease the number of complaints per 100,000 passengers by 0.07.
e. $\hat{y} = 6.02 - 0.07x = 6.02 - 0.07(80) = 0.42$

7. a. [Scatter diagram of S&P versus DJIA]
b. Let x = DJIA and y = S&P. Summations needed to compute the slope and y-intercept are:
$\sum x_i = 104{,}850,\quad \sum y_i = 14{,}233,\quad \sum (x_i - \bar{x})(y_i - \bar{y}) = 268{,}921,\quad \sum (x_i - \bar{x})^2 = 1{,}806{,}384$
$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{268{,}921}{1{,}806{,}384} = 0.14887$
$b_0 = \bar{y} - b_1\bar{x} = 1423.3 - (.14887)(10{,}485) = -137.629$
$\hat{y} = -137.63 + 0.1489x$
c. $\hat{y} = -137.63 + 0.1489(11{,}000) = 1500.27$, or approximately 1500

8. a. [Scatter diagram of Price versus Sidetrack Capability]
b. There appears to be a linear relationship between x = sidetrack capability and y = price, with higher priced models having a higher level of handling.
c. Summations needed to compute the slope and y-intercept are:
$\sum x_i = 28,\quad \sum y_i = 10{,}621,\quad \sum (x_i - \bar{x})(y_i - \bar{y}) = 4003.2,\quad \sum (x_i - \bar{x})^2 = 19.6$
$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{4003.2}{19.6} = 204.2449$
$b_0 = \bar{y} - b_1\bar{x} = 1062.1 - (204.2449)(2.8) = 490.21$
$\hat{y} = 490.21 + 204.24x$
d.
$\hat{y} = 490.21 + 204.24(4) = 1307$

9. a. Let x = years of experience and y = annual sales ($1000s). [Scatter diagram of y versus x]
b. Summations needed to compute the slope and y-intercept are:
$\sum x_i = 70$, $\sum y_i = 1080$, $\sum(x_i - \bar{x})(y_i - \bar{y}) = 568$, $\sum(x_i - \bar{x})^2 = 142$
$b_1 = \frac{568}{142} = 4$
$b_0 = \bar{y} - b_1\bar{x} = 108 - (4)(7) = 80$
$\hat{y} = 80 + 4x$
c. $\hat{y} = 80 + 4(9) = 116$

10. a. [Scatter diagram of Overall Rating versus Performance Score]
b. Let x = performance score and y = overall rating. Summations needed to compute the slope and y-intercept are:
$\sum x_i = 2752$, $\sum y_i = 1177$, $\sum(x_i - \bar{x})(y_i - \bar{y}) = 1723.73$, $\sum(x_i - \bar{x})^2 = 11{,}867.73$
$b_1 = \frac{1723.73}{11{,}867.73} = 0.1452$
$b_0 = \bar{y} - b_1\bar{x} = 78.4667 - (0.1452)(183.4667) = 51.82$
$\hat{y} = 51.82 + 0.145x$
c. $\hat{y} = 51.82 + 0.145(225) = 84.4$, or approximately 84

11. a. Let x = hotel revenue and y = gaming revenue. [Scatter diagram of y versus x]
b. There appears to be a linear relationship between the variables.
c. The summations needed to compute the slope and the y-intercept are:
$\sum x_i = 2973.3$, $\sum y_i = 3925.6$, $\sum(x_i - \bar{x})(y_i - \bar{y}) = 453{,}345.042$, $\sum(x_i - \bar{x})^2 = 483{,}507.581$
$b_1 = 0.9385$
$b_0 = \bar{y} - b_1\bar{x} = 392.56 - (0.9385)(297.33) = 113.52$
$\hat{y} = 113.52 + 0.94x$
d. $\hat{y} = 113.52 + 0.94(500) = 583.5$

12. a. [Scatter diagram of Revenue versus Number of Employees]
b. There appears to be a positive linear relationship between the number of employees and the revenue.
c. Let x = number of employees and y = revenue. Summations needed to compute the slope and y-intercept are:
$\sum x_i = 4200$, $\sum y_i = 1669$, $\sum(x_i - \bar{x})(y_i - \bar{y}) = 4{,}658{,}594{,}168$, $\sum(x_i - \bar{x})^2 = 14{,}718{,}343{,}803$
$b_1 = \frac{4{,}658{,}594{,}168}{14{,}718{,}343{,}803} = 0.316516$
$b_0 = \bar{y} - b_1\bar{x} = 14{,}048 - (0.316516)(40{,}299) = 1293$
$\hat{y} = 1293 + 0.3165x$
d. $\hat{y} = 1293 + 0.3165(75{,}000) = 25{,}031$
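Every least squares solution above follows the same two steps: the slope is the ratio of the two sums of deviations, and the intercept forces the line through $(\bar{x}, \bar{y})$. The Python sketch below is offered only as a way to check the hand arithmetic; it assumes the summary quantities have already been computed, and the function name `least_squares_from_sums` is hypothetical (it is not part of Excel or the text).

```python
def least_squares_from_sums(n, sum_x, sum_y, sxy, sxx):
    """Return (b1, b0) for the estimated regression equation y-hat = b0 + b1*x.

    sxy is the sum of (x_i - x_bar)(y_i - y_bar);
    sxx is the sum of (x_i - x_bar) squared.
    """
    x_bar = sum_x / n
    y_bar = sum_y / n
    b1 = sxy / sxx             # slope
    b0 = y_bar - b1 * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return b1, b0


# Exercise 2: n = 5 observations
b1, b0 = least_squares_from_sums(5, 19, 116, -57.8, 30.8)
print(round(b1, 4), round(b0, 2))  # -1.8766 30.33
```

The intercept printed in Exercise 2 (30.3311) is computed from the rounded slope -1.8766; carrying full precision gives 30.3312, which is why the check above rounds the intercept to two decimals.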
13. a. Let x = adjusted gross income ($1000s) and y = total itemized deductions ($1000s). [Scatter diagram of y versus x]
b. The summations needed to compute the slope and the y-intercept are:
$\sum x_i = 399$, $\sum y_i = 97.1$, $\sum(x_i - \bar{x})(y_i - \bar{y}) = 1233.7$, $\sum(x_i - \bar{x})^2 = 7648$
$b_1 = \frac{1233.7}{7648} = 0.16131$
$b_0 = \bar{y} - b_1\bar{x} = 13.87143 - (0.16131)(57) = 4.67675$
$\hat{y} = 4.68 + 0.16x$
c. $\hat{y} = 4.68 + 0.16(52.5) = 13.08$, or approximately $13,080. The agent's request for an audit appears to be justified.

14. a. Let x = average room rate ($) and y = occupancy rate (%). [Scatter diagram of y versus x]
b. The summations needed to compute the slope and the y-intercept are:
$\sum x_i = 1677.25$, $\sum y_i = 1404.3$, $\sum(x_i - \bar{x})(y_i - \bar{y}) = 897.9493$, $\sum(x_i - \bar{x})^2 = 3657.4568$
$b_1 = \frac{897.9493}{3657.4568} = 0.2455$
$b_0 = \bar{y} - b_1\bar{x} = 70.215 - (0.2455)(83.8625) = 49.63$
$\hat{y} = 49.63 + 0.2455x$
c. $\hat{y} = 49.63 + 0.2455(80) = 69.3\%$

15. a. The estimated regression equation and the mean for the dependent variable are $\hat{y}_i = 0.2 + 2.6x_i$ and $\bar{y} = 8$.
The sum of squares due to error and the total sum of squares are $SSE = \sum(y_i - \hat{y}_i)^2 = 12.40$ and $SST = \sum(y_i - \bar{y})^2 = 80$.
Thus, SSR = SST - SSE = 80 - 12.4 = 67.6
b. $r^2$ = SSR/SST = 67.6/80 = .845
The least squares line provided a very good fit; 84.5% of the variability in y has been explained by the least squares line.
c. $r_{xy} = +\sqrt{.845} = .9192$

16. a. The estimated regression equation and the mean for the dependent variable are $\hat{y}_i = 30.33 - 1.88x_i$ and $\bar{y} = 23.2$.
The sum of squares due to error and the total sum of squares are $SSE = \sum(y_i - \hat{y}_i)^2 = 6.33$ and $SST = \sum(y_i - \bar{y})^2 = 114.80$.
Thus, SSR = SST - SSE = 114.80 - 6.33 = 108.47
b. $r^2$ = SSR/SST = 108.47/114.80 = .945
The least squares line provided an excellent fit; 94.5% of the variability in y has been explained by the estimated regression equation.
c. $r_{xy} = -\sqrt{.945} = -.9721$
Note: the sign for $r_{xy}$ is negative because the slope of the estimated regression equation is negative ($b_1 = -1.88$).
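Exercises 15 and 16 use the same three steps: SSR = SST - SSE, $r^2$ = SSR/SST, and $r_{xy}$ equal to the square root of $r^2$ carrying the sign of $b_1$. A minimal Python sketch of those steps follows; the function name `fit_summary` is an illustrative assumption.

```python
import math


def fit_summary(sse, sst, b1):
    """Return (SSR, r^2, r_xy) for a simple linear regression."""
    ssr = sst - sse                           # SST = SSR + SSE
    r2 = ssr / sst                            # coefficient of determination
    r_xy = math.copysign(math.sqrt(r2), b1)   # correlation takes the sign of the slope b1
    return ssr, r2, r_xy


# Exercise 16: the slope is negative, so r_xy is negative
ssr, r2, r_xy = fit_summary(6.33, 114.80, -1.88)
print(round(ssr, 2), round(r2, 3), round(r_xy, 3))  # 108.47 0.945 -0.972
```

The printed correlation (-0.972) differs from -.9721 in the fourth decimal only because the solution takes the square root of the rounded value .945.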
17. The estimated regression equation and the mean for the dependent variable are $\hat{y}_i = .75 + .51x_i$ and $\bar{y} = 3.4$.
The sum of squares due to error and the total sum of squares are $SSE = \sum(y_i - \hat{y}_i)^2 = 5.3$ and $SST = \sum(y_i - \bar{y})^2 = 11.2$.
Thus, SSR = SST - SSE = 11.2 - 5.3 = 5.9
$r^2$ = SSR/SST = 5.9/11.2 = .527
We see that 52.7% of the variability in y has been explained by the least squares line.
$r_{xy} = +\sqrt{.527} = .7259$

18. a. The estimated regression equation and the mean for the dependent variable are $\hat{y} = 1790.5 + 581.1x$ and $\bar{y} = 3650$.
The sum of squares due to error and the total sum of squares are $SSE = \sum(y_i - \hat{y}_i)^2 = 85{,}135.14$ and $SST = \sum(y_i - \bar{y})^2 = 335{,}000$.
Thus, SSR = SST - SSE = 335,000 - 85,135.14 = 249,864.86
b. $r^2$ = SSR/SST = 249,864.86/335,000 = .746
We see that 74.6% of the variability in y has been explained by the least squares line.
c. $r_{xy} = +\sqrt{.746} = .8637$

19. a. The estimated regression equation and the mean for the dependent variable are $\hat{y} = -137.63 + .1489x$ and $\bar{y} = 1423.3$.
The sum of squares due to error and the total sum of squares are $SSE = \sum(y_i - \hat{y}_i)^2 = 7547.14$ and $SST = \sum(y_i - \bar{y})^2 = 47{,}582.10$.
Thus, SSR = SST - SSE = 47,582.10 - 7547.14 = 40,034.96
b. $r^2$ = SSR/SST = 40,034.96/47,582.10 = .84
We see that 84% of the variability in y has been explained by the least squares line.
c. $r_{xy} = +\sqrt{.84} = .92$

20. a. Let x = income and y = home price. Summations needed to compute the slope and y-intercept are:
$\sum x_i = 1424$, $\sum y_i = 2455.5$, $\sum(x_i - \bar{x})(y_i - \bar{y}) = 4011$, $\sum(x_i - \bar{x})^2 = 1719.618$
$b_1 = \frac{4011}{1719.618} = 2.3325$
$b_0 = \bar{y} - b_1\bar{x} = 136.4167 - (2.3325)(79.1111) = -48.11$
$\hat{y} = -48.11 + 2.3325x$
b. The sum of squares due to error and the total sum of squares are $SSE = \sum(y_i - \hat{y}_i)^2 = 2017.37$ and $SST = \sum(y_i - \bar{y})^2 = 11{,}373.09$.
Thus, SSR = SST - SSE = 11,373.09 - 2017.37 = 9355.72
$r^2$ = SSR/SST = 9355.72/11,373.09 = .82
We see that 82% of the variability in y has been explained by the least squares line.
$r_{xy} = +\sqrt{.82} = .91$
c.
$\hat{y} = -48.11 + 2.3325(95) = 173.5$, or approximately $173,500

21. a. The summations needed in this problem are:
$\sum x_i = 3450$, $\sum y_i = 33{,}700$, $\sum(x_i - \bar{x})(y_i - \bar{y}) = 712{,}500$, $\sum(x_i - \bar{x})^2 = 93{,}750$
$b_1 = \frac{712{,}500}{93{,}750} = 7.6$
$b_0 = \bar{y} - b_1\bar{x} = 5616.67 - (7.6)(575) = 1246.67$
$\hat{y} = 1246.67 + 7.6x$
b. $7.60
c. The sum of squares due to error and the total sum of squares are $SSE = \sum(y_i - \hat{y}_i)^2 = 233{,}333.33$ and $SST = \sum(y_i - \bar{y})^2 = 5{,}648{,}333.33$.
Thus, SSR = SST - SSE = 5,648,333.33 - 233,333.33 = 5,415,000
$r^2$ = SSR/SST = 5,415,000/5,648,333.33 = .9587
We see that 95.87% of the variability in y has been explained by the estimated regression equation.
d. $\hat{y} = 1246.67 + 7.6(500) = \$5046.67$

22. a. The summations needed in this problem are:
$\sum x_i = 613.1$, $\sum y_i = 70$, $\sum(x_i - \bar{x})(y_i - \bar{y}) = 5766.7$, $\sum(x_i - \bar{x})^2 = 45{,}833.9286$
$b_1 = \frac{5766.7}{45{,}833.9286} = 0.1258$
$b_0 = \bar{y} - b_1\bar{x} = 10 - (0.1258)(87.5857) = -1.0183$
$\hat{y} = -1.0183 + 0.1258x$
b. The sum of squares due to error and the total sum of squares are $SSE = \sum(y_i - \hat{y}_i)^2 = 1272.4495$ and $SST = \sum(y_i - \bar{y})^2 = 1998$.
Thus, SSR = SST - SSE = 1998 - 1272.4495 = 725.5505
$r^2$ = SSR/SST = 725.5505/1998 = 0.3631
Approximately 36% of the variability in the change in executive compensation is explained by the two-year change in the return on equity.
c. $r_{xy} = +\sqrt{0.3631} = 0.6026$
It reflects a linear relationship that is between weak and strong.

23. a. $s^2 = MSE = SSE/(n - 2) = 12.4/3 = 4.133$
b. $s = \sqrt{MSE} = \sqrt{4.133} = 2.033$
c. $\sum(x_i - \bar{x})^2 = 10$
$s_{b_1} = \frac{s}{\sqrt{\sum(x_i - \bar{x})^2}} = \frac{2.033}{\sqrt{10}} = 0.643$
d. $t = \frac{b_1}{s_{b_1}} = \frac{2.6}{.643} = 4.04$
$t_{.025} = 3.182$ (3 degrees of freedom)
Since $t = 4.04 > t_{.025} = 3.182$, we reject $H_0: \beta_1 = 0$.
e. MSR = SSR/1 = 67.6
F = MSR/MSE = 67.6/4.133 = 16.36
$F_{.05} = 10.13$ (1 degree of freedom numerator and 3 denominator)
Since $F = 16.36 > F_{.05} = 10.13$, we reject $H_0: \beta_1 = 0$.

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F
Regression            67.6             1                    67.6          16.36
Error                 12.4             3                    4.133
Total                 80.0             4

24. a. $s^2 = MSE = SSE/(n - 2) = 6.33/3 = 2.11$
b. $s = \sqrt{MSE} = \sqrt{2.11} = 1.453$
c. $\sum(x_i - \bar{x})^2 = 30.8$
$s_{b_1} = \frac{s}{\sqrt{\sum(x_i - \bar{x})^2}} = \frac{1.453}{\sqrt{30.8}} = 0.262$
d.
$t = \frac{b_1}{s_{b_1}} = \frac{-1.88}{.262} = -7.18$
$t_{.025} = 3.182$ (3 degrees of freedom)
Since $t = -7.18 < -t_{.025} = -3.182$, we reject $H_0: \beta_1 = 0$.
e. MSR = SSR/1 = 108.47
F = MSR/MSE = 108.47/2.11 = 51.41
$F_{.05} = 10.13$ (1 degree of freedom numerator and 3 denominator)
Since $F = 51.41 > F_{.05} = 10.13$, we reject $H_0: \beta_1 = 0$.

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F
Regression            108.47           1                    108.47        51.41
Error                 6.33             3                    2.11
Total                 114.80           4

25. a. $s^2 = MSE = SSE/(n - 2) = 5.30/3 = 1.77$
$s = \sqrt{MSE} = \sqrt{1.77} = 1.33$
b. $\sum(x_i - \bar{x})^2 = 22.8$
$s_{b_1} = \frac{s}{\sqrt{\sum(x_i - \bar{x})^2}} = \frac{1.33}{\sqrt{22.8}} = 0.28$
$t = \frac{b_1}{s_{b_1}} = \frac{.51}{.28} = 1.82$
$t_{.025} = 3.182$ (3 degrees of freedom)
Since $t = 1.82 < t_{.025} = 3.182$, we cannot reject $H_0: \beta_1 = 0$; x and y do not appear to be related.
c. MSR = SSR/1 = 5.90/1 = 5.90
F = MSR/MSE = 5.90/1.77 = 3.33
$F_{.05} = 10.13$ (1 degree of freedom numerator and 3 denominator)
Since $F = 3.33 < F_{.05} = 10.13$, we cannot reject $H_0: \beta_1 = 0$; x and y do not appear to be related.

26. a. $s^2 = MSE = SSE/(n - 2) = 85{,}135.14/4 = 21{,}283.79$
$s = \sqrt{MSE} = \sqrt{21{,}283.79} = 145.89$
$\sum(x_i - \bar{x})^2 = 0.74$
$s_{b_1} = \frac{s}{\sqrt{\sum(x_i - \bar{x})^2}} = \frac{145.89}{\sqrt{0.74}} = 169.59$
$t = \frac{b_1}{s_{b_1}} = \frac{581.08}{169.59} = 3.43$
$t_{.025} = 2.776$ (4 degrees of freedom)
Since $t = 3.43 > t_{.025} = 2.776$, we reject $H_0: \beta_1 = 0$.
b. MSR = SSR/1 = 249,864.86/1 = 249,864.86
F = MSR/MSE = 249,864.86/21,283.79 = 11.74
$F_{.05} = 7.71$ (1 degree of freedom numerator and 4 denominator)
Since $F = 11.74 > F_{.05} = 7.71$, we reject $H_0: \beta_1 = 0$.
c.
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F
Regression            249,864.86       1                    249,864.86    11.74
Error                 85,135.14        4                    21,283.79
Total                 335,000          5

27. a. Summations needed to compute the slope and y-intercept are:
$\sum x_i = 37$, $\sum y_i = 1654$, $\sum(x_i - \bar{x})(y_i - \bar{y}) = 315.2$, $\sum(x_i - \bar{x})^2 = 10.1$
$b_1 = \frac{315.2}{10.1} = 31.2079$
$b_0 = \bar{y} - b_1\bar{x} = 165.4 - (31.2079)(3.7) = 19.93$
$\hat{y} = 19.93 + 31.21x$
b.
$SSE = \sum(y_i - \hat{y}_i)^2 = 2487.66$ and $SST = \sum(y_i - \bar{y})^2 = 12{,}324.4$
Thus, SSR = SST - SSE = 12,324.4 - 2487.66 = 9836.74
MSR = SSR/1 = 9836.74
MSE = SSE/(n - 2) = 2487.66/8 = 310.96
F = MSR/MSE = 9836.74/310.96 = 31.63
$F_{.05} = 5.32$ (1 degree of freedom numerator and 8 denominator)
Since $F = 31.63 > F_{.05} = 5.32$, we reject $H_0: \beta_1 = 0$. Upper support and price are related.
c. $r^2$ = SSR/SST = 9836.74/12,324.4 = .80
The estimated regression equation provided a good fit; we should feel comfortable using the estimated regression equation to estimate the price given the upper support rating.
d. $\hat{y} = 19.93 + 31.21(4) = 144.77$

28. SST = 411.73, SSE = 161.37, SSR = 250.36
MSR = SSR/1 = 250.36
MSE = SSE/(n - 2) = 161.37/13 = 12.413
F = MSR/MSE = 250.36/12.413 = 20.17
$F_{.05} = 4.67$ (1 degree of freedom numerator and 13 denominator)
Since $F = 20.17 > F_{.05} = 4.67$, we reject $H_0: \beta_1 = 0$.

29. SSE = 233,333.33, SST = 5,648,333.33, SSR = 5,415,000
MSE = SSE/(n - 2) = 233,333.33/(6 - 2) = 58,333.33
MSR = SSR/1 = 5,415,000
F = MSR/MSE = 5,415,000/58,333.33 = 92.83

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F
Regression            5,415,000.00     1                    5,415,000     92.83
Error                 233,333.33       4                    58,333.33
Total                 5,648,333.33     5

$F_{.05} = 7.71$ (1 degree of freedom numerator and 4 denominator)
Since $F = 92.83 > 7.71$, we reject $H_0: \beta_1 = 0$. Production volume and total cost are related.

30. Using the computations from Exercise 22: SSE = 1272.4495, SST = 1998, SSR = 725.5505
$s = \sqrt{MSE} = \sqrt{1272.4495/5} = \sqrt{254.4899} = 15.95$
$\sum(x_i - \bar{x})^2 = 45{,}833.9286$
$s_{b_1} = \frac{s}{\sqrt{\sum(x_i - \bar{x})^2}} = \frac{15.95}{\sqrt{45{,}833.9286}} = 0.0745$
$t = \frac{b_1}{s_{b_1}} = \frac{0.1258}{0.0745} = 1.69$
$t_{.025} = 2.571$ (5 degrees of freedom)
Since $t = 1.69 < 2.571$, we cannot reject $H_0: \beta_1 = 0$. There is no evidence of a significant relationship between x and y.

31. SST = 11,373.09, SSE = 2017.37, SSR = 9355.72
MSR = SSR/1 = 9355.72
MSE = SSE/(n - 2) = 2017.37/16 = 126.0856
F = MSR/MSE = 9355.72/126.0856 = 74.20
$F_{.01} = 8.53$ (1 degree of freedom numerator and 16 denominator)
Since $F = 74.20 > F_{.01} = 8.53$, we reject $H_0: \beta_1 = 0$.
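The t test and the F test in Exercises 23 through 31 reduce to a few arithmetic steps once SSE, SST, n and $\sum(x_i - \bar{x})^2$ are known. The Python sketch below mirrors those steps; the function names are hypothetical, and the critical values are passed in because the solutions read them from t and F tables (computing them would require a statistics library, which this sketch deliberately avoids).

```python
import math


def slope_t_test(b1, sse, n, sxx, t_crit):
    """Two-tailed t test of H0: beta1 = 0; returns (s, s_b1, t, reject)."""
    s = math.sqrt(sse / (n - 2))   # s = sqrt(MSE); MSE has n - 2 degrees of freedom
    s_b1 = s / math.sqrt(sxx)      # estimated standard deviation of b1
    t = b1 / s_b1
    return s, s_b1, t, abs(t) > t_crit


def regression_f_test(sse, sst, n, f_crit, p=1):
    """ANOVA F test with p independent variables; returns (MSR, MSE, F, reject)."""
    msr = (sst - sse) / p          # SSR / p
    mse = sse / (n - p - 1)
    f = msr / mse
    return msr, mse, f, f > f_crit


# Exercise 23 (t test) and Exercise 29 (F test)
print(slope_t_test(2.6, 12.4, 5, 10, 3.182)[3])                           # True
print(round(regression_f_test(233_333.33, 5_648_333.33, 6, 7.71)[2], 2))  # 92.83
```

The parameter `p` in `regression_f_test` anticipates multiple regression, where MSR = SSR/p and MSE = SSE/(n - p - 1); with p = 1 it matches the simple regression calculations above.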
32. a. $\hat{y} = 6.1092 + 0.8951x$
b. $t = \frac{b_1}{s_{b_1}} = \frac{0.8951}{0.149} = 6.01$
$t_{.025} = 2.306$ (8 degrees of freedom)
Since $t = 6.01 > t_{.025} = 2.306$, we reject $H_0: \beta_1 = 0$; monthly maintenance expense is related to usage.
c. $r^2$ = SSR/SST = 1575.76/1924.90 = 0.82. A good fit.

33. a. 9
b. $\hat{y} = 20.0 + 7.21x$
c. Since $t = 5.29 > t_{.025} = 2.365$ (7 degrees of freedom), we reject $H_0: \beta_1 = 0$.
d. SSE = SST - SSR = 51,984.1 - 41,587.3 = 10,396.8
MSE = 10,396.8/7 = 1485.3
F = MSR/MSE = 41,587.3/1485.3 = 28.00
$F_{.05} = 5.59$ (1 degree of freedom numerator and 7 denominator)
Since $F = 28.00 > F_{.05} = 5.59$, we reject $H_0: \beta_1 = 0$.
e. $\hat{y} = 20.0 + 7.21(50) = 380.5$, or $380,500

34. a. $\hat{y} = 80.0 + 50.0x$
b. F = MSR/MSE = 6828.6/82.1 = 83.17
$F_{.05} = 4.20$ (1 degree of freedom numerator and 28 denominator)
Since $F = 83.17 > F_{.05} = 4.20$, we reject $H_0: \beta_1 = 0$. Branch office sales are related to the salespersons.
c. $t = \frac{50}{5.482} = 9.12$
$t_{.025} = 2.048$ (28 degrees of freedom)
Since $t = 9.12 > t_{.025} = 2.048$, we reject $H_0: \beta_1 = 0$.
d. p-value = .000

35. A portion of the Excel Regression tool output for this problem follows:

Regression Statistics: Multiple R 0.7379; R Square 0.5444; Adjusted R Square 0.5094; Standard Error 4.1535; Observations 15

ANOVA
             df   SS         MS         F         Significance F
Regression   1    268.0118   268.0118   15.5357   0.0017
Residual     13   224.2682   17.2514
Total        14   492.28

                          Coefficients   Standard Error   t Stat   P-value
Intercept                 11.3332        2.7700           4.0914   0.0013
Gross Profit Margin (%)   0.6361         0.1614           3.9415   0.0017

a. $\hat{y} = 11.3332 + .6361x$, where x = Gross Profit Margin (%)
b. Significant relationship: Significance F = .0017 < $\alpha$ = .05
c. Significant relationship: p-value = .0017 < $\alpha$ = .05
d.
$r^2$ = 0.5444; not a good fit

36. A portion of the Excel Regression tool output for this problem follows:

Regression Statistics: Multiple R 0.6502; R Square 0.4228; Adjusted R Square 0.3907; Standard Error 11.5925; Observations 20

ANOVA
             df   SS            MS         F         Significance F
Regression   1    1771.982016   1771.982   13.1857   0.0019
Residual     18   2418.967984   134.3871
Total        19   4190.95

             Coefficients   Standard Error   t Stat    P-value
Intercept    -42.7965       19.3816          -2.2081   0.0405
Age          1.0043         0.2766           3.6312    0.0019

a. $\hat{y} = -42.7965 + 1.0043x$, where x = Age
b. Significant relationship: Significance F = .0019 < $\alpha$ = .05
c. $r^2$ = 0.4228; not a good fit

37. A portion of the Excel Regression tool output for this problem follows:

Regression Statistics: Multiple R 0.9277; R Square 0.8606; Adjusted R Square 0.8467; Standard Error 6.6343; Observations 12

ANOVA
             df   SS          MS          F         Significance F
Regression   1    2717.8625   2717.8625   61.7503   1.3768E-05
Residual     10   440.1375    44.0137
Total        11   3158

                   Coefficients   Standard Error   t Stat    P-value
Intercept          -11.8020       12.8441          -0.9189   0.3798
Income ($1000s)    2.1843         0.2780           7.8581    1.3768E-05

a. $\hat{y} = -11.802 + 2.1843x$, where x = Income ($1000s)
b. Significant relationship: p-value = .000 < $\alpha$ = .05
c. $r^2$ = 0.86; a very good fit

38. a. [Scatter diagram of Average Rental Rate ($) versus Vacancy Rate (%)]
b. There appears to be a linear relationship between the two variables.
A portion of the Excel Regression tool output for this problem follows:

Regression Statistics: Multiple R 0.6589; R Square 0.4341; Adjusted R Square 0.3988; Standard Error 4.8847; Observations 18

ANOVA
             df   SS         MS         F         Significance F
Regression   1    292.9137   292.9137   12.2760   0.0029
Residual     16   381.7712   23.8607
Total        17   674.6849

                   Coefficients   Standard Error   t Stat    P-value
Intercept          37.0747        3.5277           10.5097   1.36938E-08
Vacancy Rate (%)   -0.7792        0.2224           -3.5037   0.0029

c. $\hat{y} = 37.0747 - 0.7792x$, where x = Vacancy Rate (%)
d. Significant relationship: Significance F (or p-value) = .0029 < $\alpha$ = .05
e.
$r^2$ = 0.43; not a very good fit

39. a. s = 2.033, $\bar{x} = 3$, $\sum(x_i - \bar{x})^2 = 10$
$s_{\hat{y}_p} = s\sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum(x_i - \bar{x})^2}} = 2.033\sqrt{\frac{1}{5} + \frac{(4 - 3)^2}{10}} = 1.11$
b. $\hat{y} = 0.2 + 2.6(4) = 10.6$
$\hat{y}_p \pm t_{\alpha/2}\, s_{\hat{y}_p}$: 10.6 ± 3.182(1.11) = 10.6 ± 3.53, or 7.07 to 14.13
c. $s_{ind} = s\sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum(x_i - \bar{x})^2}} = 2.033\sqrt{1 + \frac{1}{5} + \frac{(4 - 3)^2}{10}} = 2.32$
d. $\hat{y}_p \pm t_{\alpha/2}\, s_{ind}$: 10.6 ± 3.182(2.32) = 10.6 ± 7.38, or 3.22 to 17.98

40. a. s = 1.453, $\bar{x} = 3.8$, $\sum(x_i - \bar{x})^2 = 30.8$
$s_{\hat{y}_p} = 1.453\sqrt{\frac{1}{5} + \frac{(3 - 3.8)^2}{30.8}} = .68$
b. $\hat{y} = 30.33 - 1.88(3) = 24.69$
$\hat{y}_p \pm t_{\alpha/2}\, s_{\hat{y}_p}$: 24.69 ± 3.182(.68) = 24.69 ± 2.16, or 22.53 to 26.85
c. $s_{ind} = 1.453\sqrt{1 + \frac{1}{5} + \frac{(3 - 3.8)^2}{30.8}} = 1.61$
d. $\hat{y}_p \pm t_{\alpha/2}\, s_{ind}$: 24.69 ± 3.182(1.61) = 24.69 ± 5.12, or 19.57 to 29.81

41. s = 1.33, $\bar{x} = 5.2$, $\sum(x_i - \bar{x})^2 = 22.8$
$s_{\hat{y}_p} = 1.33\sqrt{\frac{1}{5} + \frac{(3 - 5.2)^2}{22.8}} = 0.85$
$\hat{y} = 0.75 + 0.51(3) = 2.28$
$\hat{y}_p \pm t_{\alpha/2}\, s_{\hat{y}_p}$: 2.28 ± 3.182(.85) = 2.28 ± 2.70, or -.42 to 4.98
$s_{ind} = 1.33\sqrt{1 + \frac{1}{5} + \frac{(3 - 5.2)^2}{22.8}} = 1.58$
$\hat{y}_p \pm t_{\alpha/2}\, s_{ind}$: 2.28 ± 3.182(1.58) = 2.28 ± 5.03, or -2.75 to 7.31

42. a. s = 145.89, $\bar{x} = 3.2$, $\sum(x_i - \bar{x})^2 = 0.74$
$s_{\hat{y}_p} = 145.89\sqrt{\frac{1}{6} + \frac{(3 - 3.2)^2}{0.74}} = 68.54$
$\hat{y} = 1790.5 + 581.1(3) = 3533.8$
$\hat{y}_p \pm t_{\alpha/2}\, s_{\hat{y}_p}$: 3533.8 ± 2.776(68.54) = 3533.8 ± 190.27, or $3343.53 to $3724.07
b. $s_{ind} = 145.89\sqrt{1 + \frac{1}{6} + \frac{(3 - 3.2)^2}{0.74}} = 161.19$
$\hat{y}_p \pm t_{\alpha/2}\, s_{ind}$: 3533.8 ± 2.776(161.19) = 3533.8 ± 447.46, or $3086.34 to $3981.26

43. a. $\hat{y} = 51.82 + .1452(200) = 80.86$
b. s = 3.5232, $\bar{x} = 183.4667$, $\sum(x_i - \bar{x})^2 = 11{,}867.73$
$s_{\hat{y}_p} = 3.5232\sqrt{\frac{1}{15} + \frac{(200 - 183.4667)^2}{11{,}867.73}} = 1.055$
$\hat{y}_p \pm t_{\alpha/2}\, s_{\hat{y}_p}$: 80.86 ± 2.160(1.055) = 80.86 ± 2.279, or 78.58 to 83.14
c. $s_{ind} = 3.5232\sqrt{1 + \frac{1}{15} + \frac{(200 - 183.4667)^2}{11{,}867.73}} = 3.678$
$\hat{y}_p \pm t_{\alpha/2}\, s_{ind}$: 80.86 ± 2.160(3.678) = 80.86 ± 7.944, or 72.92 to 88.80

44. a. $\bar{x} = 57$, $\sum(x_i - \bar{x})^2 = 7648$, $s^2 = 1.88$, s = 1.37
$s_{\hat{y}_p} = 1.37\sqrt{\frac{1}{7} + \frac{(52.5 - 57)^2}{7648}} = 0.52$
$\hat{y}_p \pm t_{\alpha/2}\, s_{\hat{y}_p}$: 13.08 ± 2.571(.52) = 13.08 ± 1.34, or 11.74 to 14.42, or $11,740 to $14,420
b.
$s_{ind} = 1.47$
$\hat{y}_p \pm t_{\alpha/2}\, s_{ind}$: 13.08 ± 2.571(1.47) = 13.08 ± 3.78, or 9.30 to 16.86, or $9,300 to $16,860
c. Yes, $20,400 is much larger than anticipated.
d. Any deductions exceeding the $16,860 upper limit could suggest an audit.

45. a. $\hat{y} = 1246.67 + 7.6(500) = \$5046.67$
b. $\bar{x} = 575$, $\sum(x_i - \bar{x})^2 = 93{,}750$, $s^2$ = MSE = 58,333.33, s = 241.52
$s_{ind} = 241.52\sqrt{1 + \frac{1}{6} + \frac{(500 - 575)^2}{93{,}750}} = 267.50$
With $t_{.005} = 4.604$ (4 degrees of freedom): 5046.67 ± 4.604(267.50) = 5046.67 ± 1231.57, or $3815.10 to $6278.24
c. Based on one month, $6000 is not out of line since it lies within the prediction interval of $3815.10 to $6278.24. However, a sequence of five to seven months with consistently high costs should cause concern.

46. a. Summations needed to compute the slope and y-intercept are:
$\sum x_i = 277$, $\sum y_i = 2281.7$, $\sum(x_i - \bar{x})(y_i - \bar{y}) = 6003.41$, $\sum(x_i - \bar{x})^2 = 1032.1$
$b_1 = \frac{6003.41}{1032.1} = 5.816694$
$b_0 = \bar{y} - b_1\bar{x} = 228.17 - (5.816694)(27.7) = 67.047576$
$\hat{y} = 67.0476 + 5.8167x$
b. SST = 39,065.14, SSE = 4145.141, SSR = 34,920.000
$r^2$ = SSR/SST = 34,920.000/39,065.141 = 0.894
The estimated regression equation explained 89.4% of the variability in y; a very good fit.
c. $s^2$ = MSE = 4145.141/8 = 518.143
$s = \sqrt{518.143} = 22.76$
$s_{\hat{y}_p} = 22.76\sqrt{\frac{1}{10} + \frac{(35 - 27.7)^2}{1032.1}} = 8.86$
$\hat{y} = 67.0476 + 5.8167(35) = 270.63$
With $t_{.025} = 2.306$ (8 degrees of freedom): 270.63 ± 2.306(8.86) = 270.63 ± 20.43, or 250.20 to 291.06
d. $s_{ind} = 22.76\sqrt{1 + \frac{1}{10} + \frac{(35 - 27.7)^2}{1032.1}} = 24.42$
270.63 ± 2.306(24.42) = 270.63 ± 56.31, or 214.32 to 326.94

47. a. Using Excel's Regression tool, the estimated regression equation is $\hat{y} = 7.0222 + 1.5873x$, or $\hat{y} = 7.02 + 1.59x$.
b. The residuals calculated using $\hat{y} = 7.02 + 1.59x$ are 3.48, -2.47, -4.83, -1.60, and 5.22.
c. [Residual plot: residuals versus x]
With only 5 observations it is difficult to determine if the assumptions are satisfied; however, the plot does suggest curvature in the residuals, which would indicate that the error term assumptions are not satisfied.
The scatter diagram for these data also indicates that the underlying relationship between x and y may be curvilinear.
d. $\bar{x} = 14$, s = 4.8765, with $h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum(x_i - \bar{x})^2}$ and $s_{y_i - \hat{y}_i} = s\sqrt{1 - h_i}$

x_i   x_i - x̄   (x_i - x̄)²   (x_i - x̄)²/Σ(x_i - x̄)²   h_i     s_{y_i - ŷ_i}   y_i - ŷ_i   Standardized Residual
6     -8        64           .5079                    .7079   2.6356          3.48        1.32
11    -3        9            .0714                    .2714   4.1625          -2.47       -.59
15    1         1            .0079                    .2079   4.3401          -4.83       -1.11
18    4         16           .1270                    .3270   4.0005          -1.60       -.40
20    6         36           .2857                    .4857   3.4972          5.22        1.49
                Σ = 126

e. [Standardized residual plot: standardized residuals versus x]
The shape of the standardized residual plot is the same as the shape of the residual plot. The conclusions reached in part (c) are also appropriate here.

48. a. Using Excel's Regression tool, the estimated regression equation is $\hat{y} = 2.322 + 0.6366x$, or $\hat{y} = 2.32 + 0.64x$.
b. [Residual plot: residuals versus x]
The assumption that the variance is the same for all values of x is questionable. The variance appears to increase for larger values of x.

49. a. Using Excel's Regression tool, the estimated regression equation is $\hat{y} = 29.3991 + 1.5475x$, or $\hat{y} = 29.40 + 1.55x$.
b. Significant relationship: Significance F (or p-value) < $\alpha$ = .05
c. [Residual plot: residuals versus x]
d. The residual plot here leads us to question the assumption of a linear relationship between x and y. Even though the relationship is significant at the $\alpha$ = .05 level, it would be extremely dangerous to extrapolate beyond the range of the data (e.g., x > 20).

50. a. From the solution to Exercise 9 we know that $\hat{y} = 80 + 4x$. [Residual plot: residuals versus x]
b. The assumptions concerning the error terms appear reasonable.

51. a. Let x = return on equity (ROE) and y = price/earnings (P/E) ratio.
$\hat{y} = 32.13 + 3.22x$
b. [Standardized residual plot: standardized residuals versus x]
c. There is an unusual trend in the residuals. The assumptions concerning the error term appear questionable.

52. No. Regression or correlation analysis can never prove that two variables are causally related.
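The leverage and standardized residual columns in the table for Exercise 47(d) come from two formulas, $h_i = 1/n + (x_i - \bar{x})^2/\sum(x_i - \bar{x})^2$ and $s_{y_i - \hat{y}_i} = s\sqrt{1 - h_i}$. A minimal Python sketch that rebuilds those columns follows; the function name is hypothetical, and the residuals and s are taken from the solution rather than recomputed from the raw y values, which are not shown here.

```python
import math


def leverage_and_standardized_residuals(xs, residuals, s):
    """Return a list of (h_i, standardized residual) pairs for simple regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    results = []
    for x, e in zip(xs, residuals):
        h = 1 / n + (x - x_bar) ** 2 / sxx   # leverage of observation i
        s_resid = s * math.sqrt(1 - h)       # standard deviation of residual i
        results.append((h, e / s_resid))
    return results


# Exercise 47(d): x values, residuals from y-hat = 7.02 + 1.59x, and s = 4.8765
for h, z in leverage_and_standardized_residuals(
        [6, 11, 15, 18, 20], [3.48, -2.47, -4.83, -1.60, 5.22], 4.8765):
    print(f"{h:.4f} {z:.2f}")
```

Each printed line pairs an $h_i$ value with its standardized residual, matching the corresponding row of the table.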
53. The estimate of a mean value is an estimate of the average of all y values associated with the same x. The estimate of an individual y value is an estimate of only one of the y values associated with a particular x.

54. To determine whether or not there is a significant relationship between x and y. However, if we reject $H_0: \beta_1 = 0$, it does not imply a good fit.

55. a. The Excel output is shown below:

Regression Statistics: Multiple R 0.8624; R Square 0.7438; Adjusted R Square 0.7118; Standard Error 1.4193; Observations 10

ANOVA
             df   SS        MS        F         Significance F
Regression   1    46.7838   46.7838   23.2233   0.0013
Residual     8    16.1162   2.0145
Total        9    62.9

             Coefficients   Standard Error   t Stat   P-value
Intercept    9.2649         1.0991           8.4293   2.99E-05
Shares       0.7105         0.1474           4.8191   0.0013

b. Since the p-value corresponding to F = 23.223 is .001 < $\alpha$ = .05, the relationship is significant.
c. $r^2$ = .744; a good fit. The least squares line explained 74.4% of the variability in Price.
d. $\hat{y} = 9.26 + .711(6) = 13.53$

56. a. The Excel output is shown below:

Regression Statistics: Multiple R 0.9586; R Square 0.9189; Adjusted R Square 0.9116; Standard Error 11.0428; Observations 13

ANOVA
             df   SS           MS         F          Significance F
Regression   1    15208.3982   15208.4    124.7162   2.42673E-07
Residual     11   1341.3849    121.9441
Total        12   16549.7831

                                       Coefficients   Standard Error   t Stat    P-value
Intercept                              -3.8338        5.9031           -0.6495   0.5294
Common Shares Outstanding (millions)   0.2957         0.0265           11.1676   2.43E-07

b. $\hat{y} = -3.83 + .296(150) = 40.57$; approximately 40.6 million shares of options grants outstanding.
c. $r^2$ = .919; a very good fit. The least squares line explained 91.9% of the variability in Options.
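Two identities give a quick consistency check on Excel Regression tool output such as that in Exercises 35 and 55: in simple linear regression the ANOVA F statistic equals the square of the slope's t statistic, and Adjusted R Square equals $1 - (1 - R^2)\frac{n - 1}{n - p - 1}$. The Python sketch below recomputes both from the printed values; the helper name is hypothetical, and small differences from Excel come from the rounding of the printed coefficient and standard error.

```python
def check_simple_regression_output(r2, n, slope, slope_se, p=1):
    """Recompute derived quantities from Excel Regression tool output.

    Returns (t, implied_f, adjusted_r2). The identity F = t**2 holds only
    for simple linear regression (p = 1); the adjusted R^2 formula holds for any p.
    """
    t = slope / slope_se
    implied_f = t ** 2
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return t, implied_f, adjusted_r2


# Exercise 55: R Square 0.7438, 10 observations, Shares 0.7105 (standard error 0.1474)
t, f, adj = check_simple_regression_output(0.7438, 10, 0.7105, 0.1474)
print(round(t, 2), round(f, 2), round(adj, 4))  # 4.82 23.23 0.7118
```

Excel reports F = 23.2233 and Adjusted R Square = 0.7118 for Exercise 55, so the recomputed values agree to the precision of the printed output.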
57. a. The Excel output is shown below:

Regression Statistics: Multiple R 0.6852; R Square 0.4695; Adjusted R Square 0.4032; Standard Error 2.6641; Observations 10

ANOVA
             df   SS        MS        F        Significance F
Regression   1    50.2554   50.2554   7.0807   0.0288
Residual     8    56.7806   7.0976
Total        9    107.036

             Coefficients   Standard Error   t Stat   P-value
Intercept    0.2747         0.9004           0.3051   0.7681
S&P 500      0.9498         0.3569           2.6609   0.0288

b. Since the p-value = 0.029 is less than $\alpha$ = .05, the relationship is significant.
c. $r^2$ = .470. The least squares line does not provide a very good fit.
d. Woolworth has higher risk with a market beta of 1.25.

58. a. [Scatter diagram of High Temperature versus Low Temperature]
b. It appears that there is a positive linear relationship between the two variables.
c. The Excel output is shown below:

Regression Statistics: Multiple R 0.8837; R Square 0.7810; Adjusted R Square 0.7688; Standard Error 5.2846; Observations 20

ANOVA
             df   SS          MS         F         Significance F
Regression   1    1792.2734   1792.273   64.1783   2.40264E-07
Residual     18   502.6766    27.9265
Total        19   2294.95

             Coefficients   Standard Error   t Stat   P-value
Intercept    23.8987        6.4812           3.6874   0.0017
Low          0.8980         0.1121           8.0111   2.4E-07

d. Since the p-value corresponding to F = 64.18 is .000 < $\alpha$ = .05, the relationship is significant.
e. $r^2$ = .781; a good fit. The least squares line explained 78.1% of the variability in high temperature.
f. $r_{xy} = +\sqrt{.781} = .88$

59. The Excel output is shown below:

Regression Statistics: Multiple R 0.9253; R Square 0.8562; Adjusted R Square 0.8382; Standard Error 4.2496; Observations 10

ANOVA
             df   SS            MS         F         Significance F
Regression   1    860.0509486   860.0509   47.6238   0.0001
Residual     8    144.4740514   18.0593
Total        9    1004.525

               Coefficients   Standard Error   t Stat   P-value
Intercept      10.5280        3.7449           2.8113   0.0228
Weekly Usage   0.9534         0.1382           6.9010   0.0001

a. $\hat{y} = 10.528 + .9534x$
b. Since the p-value corresponding to F = 47.62 is .0001 < $\alpha$ = .05, we reject $H_0: \beta_1 = 0$.
c.
Using the PredInt macro, the 95% prediction interval is 28.74 to 49.52, or $2874 to $4952.
d. Yes, since the expected expense is $3913.

60. a. The Excel output is shown below:

Regression Statistics: Multiple R 0.8597; R Square 0.7391; Adjusted R Square 0.6739; Standard Error 1.4891; Observations 6

ANOVA
             df   SS        MS        F         Significance F
Regression   1    25.1304   25.1304   11.3333   0.0281
Residual     4    8.8696    2.2174
Total        5    34

             Coefficients   Standard Error   t Stat    P-value
Intercept    22.1739        1.6527           13.4164   0.0002
Line Speed   -0.1478        0.0439           -3.3665   0.0281

b. Since the p-value corresponding to F = 11.33 is .0281 < $\alpha$ = .05, the relationship is significant.
c. $r^2$ = .739; a good fit. The least squares line explained 73.9% of the variability in the number of defects.
d. Using the PredInt macro, the 95% confidence interval is 12.294 to 17.271.

61. a. [Scatter diagram of Days Absent versus Distance to Work]
A negative linear relationship appears to be reasonable.
b. The Excel output is shown below:

Regression Statistics: Multiple R 0.8431; R Square 0.7109; Adjusted R Square 0.6747; Standard Error 1.2894; Observations 10

ANOVA
             df   SS        MS        F         Significance F
Regression   1    32.6993   32.6993   19.6677   0.0022
Residual     8    13.3007   1.6626
Total        9    46

                   Coefficients   Standard Error   t Stat    P-value
Intercept          8.0978         0.8088           10.0119   8.41E-06
Distance to Work   -0.3442        0.0776           -4.4348   0.0022

c. Since the p-value corresponding to F = 19.67 is .0022 < $\alpha$ = .05, we reject $H_0: \beta_1 = 0$.
d. $r^2$ = .711. The estimated regression equation explained 71.1% of the variability in y; this is a reasonably good fit.
e. Using the PredInt macro, the 95% confidence interval is 5.195 to 7.559, or approximately 5.2 to 7.6 days.
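The PredInt macro is not reproduced here. As a hedged alternative, the interval formulas used by hand in Exercises 39 through 46 can be coded directly; the sketch below assumes the estimated regression coefficients, s, n, $\bar{x}$ and $\sum(x_i - \bar{x})^2$ are already known, and the function name `interval_estimates` is hypothetical. The critical value $t_{\alpha/2}$ is passed in, as it is read from a t table in the solutions.

```python
import math


def interval_estimates(b0, b1, x_p, s, n, x_bar, sxx, t_crit):
    """Return (y_hat, confidence interval for E(y_p), prediction interval for y_p)."""
    y_hat = b0 + b1 * x_p
    core = 1 / n + (x_p - x_bar) ** 2 / sxx
    s_yhat = s * math.sqrt(core)       # for the mean value of y at x_p
    s_ind = s * math.sqrt(1 + core)    # for an individual value of y at x_p
    ci = (y_hat - t_crit * s_yhat, y_hat + t_crit * s_yhat)
    pi = (y_hat - t_crit * s_ind, y_hat + t_crit * s_ind)
    return y_hat, ci, pi


# Exercise 39: x_p = 4, n = 5, t.025 = 3.182 with 3 degrees of freedom
y_hat, ci, pi = interval_estimates(0.2, 2.6, 4, 2.033, 5, 3, 10, 3.182)
print(round(y_hat, 2), [round(v, 2) for v in ci], [round(v, 2) for v in pi])
# 10.6 [7.06, 14.14] [3.22, 17.98]
```

Because the code does not round $s_{\hat{y}_p}$ to 1.11 before multiplying, its confidence limits differ from the hand-computed 7.07 and 14.13 in the second decimal place.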
62. a. The Excel output is shown below:

Regression Statistics: Multiple R 0.9341; R Square 0.8725; Adjusted R Square 0.8566; Standard Error 75.4983; Observations 10

ANOVA
             df   SS       MS       F         Significance F
Regression   1    312050   312050   54.7456   7.62662E-05
Residual     8    45600    5700
Total        9    357650

             Coefficients   Standard Error   t Stat   P-value
Intercept    220            58.4808          3.7619   0.0055
Age          131.6667       17.7951          7.3990   7.63E-05

b. Since the p-value corresponding to F = 54.75 is .000 < $\alpha$ = .05, we reject $H_0: \beta_1 = 0$.
c. $r^2$ = .873. The least squares line provided a very good fit.
d. Using the PredInt macro, the 95% prediction interval is 559.5 to 933.9, or $559.50 to $933.90.

63. a. The Excel output is shown below:

Regression Statistics: Multiple R 0.9369; R Square 0.8777; Adjusted R Square 0.8624; Standard Error 7.5231; Observations 10

ANOVA
             df   SS            MS         F         Significance F
Regression   1    3249.720752   3249.721   57.4182   6.43959E-05
Residual     8    452.7792483   56.5974
Total        9    3702.5

                 Coefficients   Standard Error   t Stat   P-value
Intercept        5.8470         7.9717           0.7335   0.4842
Hours Studying   0.8295         0.1095           7.5775   6.44E-05

b. Since the p-value corresponding to F = 57.42 is .000 < $\alpha$ = .05, we reject $H_0: \beta_1 = 0$.
c. 84.65 points
d. Using the PredInt macro, the 95% prediction interval is 65.35 to 103.96.

64. a. The Excel output is shown below:

Regression Statistics: Multiple R 0.4659; R Square 0.2171; Adjusted R Square 0.1736; Standard Error 0.2088; Observations 20

ANOVA
             df   SS       MS       F        Significance F
Regression   1    0.2175   0.2175   4.9901   0.0384
Residual     18   0.7845   0.0436
Total        19   1.002

                        Coefficients   Standard Error   t Stat    P-value
Intercept               -0.4710        0.5842           -0.8061   0.4307
Adjusted Gross Income   3.86778E-05    1.73143E-05      2.2339    0.0384

b. Since the p-value = 0.0384 is less than $\alpha$ = .05, the relationship is significant.
c. $r^2$ = .217. The least squares line does not provide a very good fit.
d. Using the PredInt macro, the 95% confidence interval is .7729 to .9927.

Chapter 13
Multiple Regression

Learning Objectives
1. Understand how multiple regression analysis can be used to develop relationships involving one dependent variable and several independent variables.
2. Be able to interpret the coefficients in a multiple regression analysis.
3. Know the assumptions necessary to conduct statistical tests involving the hypothesized regression model.
4. Understand the role of Excel in performing multiple regression analysis.
5. Be able to interpret and use Excel's Regression tool output to develop the estimated regression equation.
6. Be able to determine how good a fit is provided by the estimated regression equation.
7. Be able to test for the significance of the regression equation.
8. Understand how multicollinearity affects multiple regression analysis.
9. Know how residual analysis can be used to make a judgment as to the appropriateness of the model, identify outliers, and determine which observations are influential.

Solutions:

1. a. $b_1 = .5906$ is an estimate of the change in y corresponding to a 1 unit change in $x_1$ when $x_2$ is held constant.
$b_2 = .4980$ is an estimate of the change in y corresponding to a 1 unit change in $x_2$ when $x_1$ is held constant.

2. a. The Excel output is shown below:

Regression Statistics: Multiple R 0.8124; R Square 0.6600; Adjusted R Square 0.6175; Standard Error 25.4009; Observations 10

ANOVA
             df   SS            MS         F         Significance F
Regression   1    10021.24739   10021.25   15.5318   0.0043
Residual     8    5161.652607   645.2066
Total        9    15182.9

             Coefficients   Standard Error   t Stat   P-value
Intercept    45.0594        25.4181          1.7727   0.1142
X1           1.9436         0.4932           3.9410   0.0043

An estimate of y when $x_1 = 45$ is $\hat{y} = 45.0594 + 1.9436(45) = 132.52$
b. The Excel output is shown below:

Regression Statistics: Multiple R 0.4707; R Square 0.2215; Adjusted R Square 0.1242; Standard Error 38.4374; Observations 10

ANOVA
             df   SS           MS         F        Significance F
Regression   1    3363.4142    3363.414   2.2765   0.1698
Residual     8    11819.4858   1477.436
Total        9    15182.9

             Coefficients   Standard Error   t Stat   P-value
Intercept    85.2171        38.3520          2.2220   0.0570
X2           4.3215         2.8642           1.5088   0.1698

An estimate of y when $x_2 = 15$ is $\hat{y} = 85.2171 + 4.3215(15) = 150.04$
c. The Excel output is shown below:

Regression Statistics: Multiple R 0.9620; R Square 0.9255; Adjusted R Square 0.9042; Standard Error 12.7096; Observations 10

ANOVA
             df   SS            MS         F         Significance F
Regression   2    14052.15497   7026.077   43.4957   0.0001
Residual     7    1130.745026   161.535
Total        9    15182.9

             Coefficients   Standard Error   t Stat    P-value
Intercept    -18.3683       17.97150328      -1.0221   0.3408
X1           2.0102         0.2471           8.1345    8.19E-05
X2           4.7378         0.9484           4.9954    0.0016

An estimate of y when $x_1 = 45$ and $x_2 = 15$ is $\hat{y} = -18.3683 + 2.0102(45) + 4.7378(15) = 143.16$

3. a. $b_1 = 3.8$ is an estimate of the change in y corresponding to a 1 unit change in $x_1$ when $x_2$, $x_3$, and $x_4$ are held constant.
$b_2 = -2.3$ is an estimate of the change in y corresponding to a 1 unit change in $x_2$ when $x_1$, $x_3$, and $x_4$ are held constant.
$b_3 = 7.6$ is an estimate of the change in y corresponding to a 1 unit change in $x_3$ when $x_1$, $x_2$, and $x_4$ are held constant.
$b_4 = 2.7$ is an estimate of the change in y corresponding to a 1 unit change in $x_4$ when $x_1$, $x_2$, and $x_3$ are held constant.

4. a. $\hat{y} = 25 + 10(15) + 8(10) = 255$; sales estimate: $255,000
b. Sales can be expected to increase by $10 for every dollar increase in inventory investment when advertising expenditure is held constant. Sales can be expected to increase by $8 for every dollar increase in advertising expenditure when inventory investment is held constant.
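Each prediction in Exercises 2 and 4 substitutes the given x values into $\hat{y} = b_0 + b_1x_1 + \cdots + b_px_p$. A minimal Python sketch of that substitution follows; the function name `predict` is hypothetical, and the coefficients are copied from the solutions rather than estimated here.

```python
def predict(intercept, coefficients, x_values):
    """Evaluate y-hat = b0 + b1*x1 + ... + bp*xp for one set of x values."""
    if len(coefficients) != len(x_values):
        raise ValueError("need exactly one x value per coefficient")
    return intercept + sum(b * x for b, x in zip(coefficients, x_values))


# Exercise 2(c): b0 = -18.3683, b1 = 2.0102, b2 = 4.7378, at x1 = 45 and x2 = 15
print(round(predict(-18.3683, [2.0102, 4.7378], [45, 15]), 2))  # 143.16
# Exercise 4(a): sales in $1000s
print(predict(25, [10, 8], [15, 10]))  # 255
```

The length check guards against the easy mistake of supplying fewer x values than there are coefficients, which `zip` would otherwise silently truncate.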
5. a. The Excel output is shown below:

Regression Statistics: Multiple R 0.8078; R Square 0.6526; Adjusted R Square 0.5946; Standard Error 1.2152; Observations 8

ANOVA
             df   SS        MS        F         Significance F
Regression   1    16.6401   16.6401   11.2688   0.0153
Residual     6    8.8599    1.4767
Total        7    25.5

                                  Coefficients   Standard Error   t Stat    P-value
Intercept                         88.6377        1.5824           56.0159   2.174E-09
Television Advertising ($1000s)   1.6039         0.4778           3.3569    0.0153

b. The Excel output is shown below:

Regression Statistics: Multiple R 0.9587; R Square 0.9190; Adjusted R Square 0.8866; Standard Error 0.6426; Observations 8

ANOVA
             df   SS        MS        F         Significance F
Regression   2    23.4354   11.7177   28.3778   0.0019
Residual     5    2.0646    0.4129
Total        7    25.5

                                  Coefficients   Standard Error   t Stat    P-value
Intercept                         83.2301        1.5739           52.8825   4.57E-08
Television Advertising ($1000s)   2.2902         0.3041           7.5319    0.0007
Newspaper Advertising ($1000s)    1.3010         0.3207           4.0567    0.0098

c. No, it is 1.6039 in part (a) and 2.2902 above. In this exercise it represents the marginal change in revenue due to an increase in television advertising with newspaper advertising held constant.
d. Revenue = 83.2301 + 2.2902(3.5) + 1.3010(1.8) = 93.59, or $93,590

6. a. The Excel output is shown below:

Regression Statistics: Multiple R 0.5579; R Square 0.3112; Adjusted R Square 0.2620; Standard Error 7.0000; Observations 16

ANOVA
             df   SS         MS         F        Significance F
Regression   1    309.9516   309.9516   6.3255   0.0247
Residual     14   686.0028   49.0002
Total        15   995.9544

                    Coefficients   Standard Error   t Stat   P-value
Intercept           49.7800        19.1062          2.6054   0.0208
Curb Weight (lb.)   0.0151         0.0060           2.5151   0.0247

b. The Excel output is shown below:

Regression Statistics: Multiple R 0.9383; R Square 0.8804; Adjusted R Square 0.8620; Standard Error 3.0274; Observations 16

ANOVA
             df   SS            MS         F         Significance F
Regression   2    876.8049822   438.4025   47.8327   1.01401E-06
Residual     13   119.1493928   9.1653
Total        15   995.954375
-0.0031 0.0035 -0.8968 7.69E-07 0.3861 Horsepower 0.1047 0.0133 7.8643 2.7E-06 yˆ = 80.4873 - 0.0031(2910) + 0.1047(296) = 102 Note to instructor: The Excel output shows that Curb Weight is not very significant (p-value = .3861) given the effect of Horsepower. In Section 15.5, students will learn how to test for the significance of the individual parameters. 7. a. The Excel output is shown below: Regression STATISTICS 0.9121 Multiple R R Square 0.8318 Adjusted R Square 0.7838 Standard Error 51.1363 Observations 10 ANOVA df Regression Residual Total Intercept Capacity Comfort 8. 2 7 9 SS MS 90548.0554 45274.03 18304.4446 2614.921 108852.5 Coefficients Standard Error 356.1208 197.1740 -0.0987 0.0459 122.8672 21.7998 t Stat 1.8061 -2.1524 5.6362 F Significance F 17.3137 0.0019 P-value 0.1139 0.0684 0.0008 b. b1 = -.0987 is an estimate of the change in the price with respect to a 1 cubic inch change in capacity with the comfort rating held constant. b2 = 122.8672 is an estimate of the change in the price with respect to a 1 unit change in the comfort rating with the capacity held constant. c. yˆ = 356.1208 - .0987(4500) + 122.8672(4) = $403 a. The Excel output is shown below: 13 - 277 Regression STATISTICS Multiple R 0.7629 R Square 0.5820 Adjusted R Square 0.5329 Standard Error 16.9770 Observations 20 ANOVA df 9. SS MS Regression 2 6823.2072 3411.604 11.8368 F Residual 17 4899.7428 288.2202 Total 19 11722.95 Coefficients Standard Error Intercept 247.3579 110.4462 2.2396 0.0388 Safety Rating -32.8445 13.9504 -2.3544 0.0308 Annual Expense Ratio (%) 34.5887 14.1294 2.4480 0.0255 b. yˆ 247.3579 32.8445(7.5) 34.5887(2) 70.2 a. The Excel output is shown below: t Stat Significance F 0.0006 P-value Regression STATISTICS Multiple R 0.6182 R Square 0.3821 Adjusted R Square 0.2998 Standard Error 12.4169 Observations 18 ANOVA df Regression Residual Total Intercept Average Class Size Combined SAT Score b. 
2 15 17 SS 1430.4194 2312.6917 3743.111111 Coefficients Standard Error 26.7067 51.6689 MS 715.2097 154.1794 F Significance F 4.6388 0.0270 t Stat P-value 0.5169 0.6128 -1.4298 0.9931 -1.4397 0.1705 0.0757 0.0391 1.9392 0.0715 yˆ = 26.7067 - 1.4298(20) + 0.0757(1000) = 73.8 or 73.8% 13 - 278 Note to instructor: the Excel output shows that Average Class Size is not very significant (p-value = .1705) given the effect of Combined SAT Score. In Section 15.5, students will learn how to test for the significance of the individual parameters. 10. a. The Excel output is shown below: Regression STATISTICS Multiple R 0.9616 R Square 0.9246 Adjusted R Square Standard Error 0.9188 226.6709 Observations 15 ANOVA df SS MS F Regression 1 8192067.3605 8192067.3605 Residual 13 667935.9155 51379.6858 Total 14 8860003.2760 Coefficients Standard Error t Stat Significance F 159.4418 1.13179E-08 P-value Intercept 33.3352 83.0767 0.4013 Cars 7.9840 0.6323 12.6270 1.13179E-08 0.6947 b. An increase of 1000 cars in service will result in an increase in revenue of $7.984 million. c. The Excel output is shown below: Regression STATISTICS Multiple R 0.9703 R Square 0.9416 Adjusted R Square Standard Error 0.9318 207.7292 Observations 15 ANOVA df SS MS Regression 2 8342186.4020 4171093.2010 Residual 12 517816.8740 43151.4062 Total 14 8860003.2760 Coefficients Standard Error Intercept 105.9727 85.5166 13 - 279 t Stat 1.2392 F 96.6618 P-value 0.2390 Significance F 3.98523E-08 11. a. Cars 8.9427 0.7746 11.5451 7.42955E-08 Locations -0.1914 0.1026 -1.8652 SSE = SST - SSR = 6,724.125 - 6,216.375 = 507.75 SSR 6, 216.375 .924 SST 6, 724.125 b. R2 c. R2 1 (1 R2 ) n 1 1 (1.924) 10 1 .902 a n p 1 10 2 1 d. The estimated regression equation provided an excellent fit. 12. a. b. c. 13. a. 
R2 SSR 14, 052.2 .926 SST 15,182.9 R2 1 (1 R2 ) n 1 1 (1.926) 10 1 .905 a n p 1 10 2 1 Yes; after adjusting for the number of independent variables in the model, we see that 90.5% of the variability in y has been accounted for. R2 SSR SST 1760 .975 1805 b. R2 1 (1 R2 ) n 1 1 (1.975) 30 1 .971 a n p 1 30 4 1 c. The estimated regression equation provided an excellent fit. 14. a. b. R2 SSR 12, 000 .75 SST 16, 000 R2 1 (1 R2 ) a c. 15. a. n 1 1.25 9 .68 n p 1 7 The adjusted coefficient of determination shows that 68% of the variability has been explained by the two independent variables; thus, we conclude that the model does not explain a large amount of variability. R2 SSR 23.435 .919 SST 25.5 R2 1 (1 R2 ) a b. 0.0868 n 1 1 (1.919) 8 1 .887 n p 1 8 2 1 Multiple regression analysis is preferred since both R2 and R2a show an increased percentage of the variability of y explained when both independent variables are used. 13 - 280 16. Note: the Excel output is shown with the solution to Exercise 6. a. No; R Square = .3112 b. Multiple regression analysis is preferred since both R Square and Adjusted R Square show an increased percentage of the variability of y explained when both independent variables are used. 17. a. b. 18. R Square = .3821 Adjusted R Square = .2998 The fit is not very good Note: The Excel output is shown with the solution to Exercise 10. a. R Square = .9416 b. The fit is very good. 19. a. Adjusted R Square = .9318 MSR = SSR/p = 6,216.375/2 = 3,108.188 MSE SSE 507.75 72.536 n p 1 10 2 1 b. F = MSR/MSE = 3,108.188/72.536 = 42.85 F.05 = 4.74 (2 degrees of freedom numerator and 7 denominator) Since F = 42.85 > F.05 = 4.74 the overall model is significant. c. t = .5906/.0813 = 7.26 t.025 = 2.365 (7 degrees of freedom) Since t = 2.365 > t.025 = 2.365, is significant. d. t = .4980/.0567 = 8.78 Since t = 8.78 > t.025 = 2.365, is significant. 20. A portion of the Excel output is shown below. 
Regression Statistics: Multiple R 0.9620; R Square 0.9255; Adjusted R Square 0.9042; Standard Error 12.7096; Observations 10

ANOVA         df   SS            MS         F         Significance F
Regression     2   14052.15497   7026.077   43.4957   0.0001
Residual       7   1130.745026   161.535
Total          9   15182.9

              Coefficients   Standard Error   t Stat    P-value
Intercept     -18.36826758   17.97150328      -1.0221   0.3408
X1            2.0102         0.2471           8.1345    8.19E-05
X2            4.7378         0.9484           4.9954    0.0016

a. Since the p-value corresponding to F = 43.4957 is .0001 < α = .05, we reject H0: β1 = β2 = 0; there is a significant relationship.
b. Since the p-value corresponding to t = 8.1345 is .000 < α = .05, we reject H0: β1 = 0; β1 is significant.
c. Since the p-value corresponding to t = 4.9954 is .0016 < α = .05, we reject H0: β2 = 0; β2 is significant.

21. a. In the two independent variable case, the coefficient of x1 represents the expected change in y corresponding to a one unit increase in x1 when x2 is held constant. In the single independent variable case, the coefficient of x1 represents the expected change in y corresponding to a one unit increase in x1.
b. Yes. If x1 and x2 are correlated, one would expect a change in the coefficient of x1 when x2 is dropped from the model.

22. a. SSE = SST - SSR = 16000 - 12000 = 4000
s² = SSE/(n - p - 1) = 4000/7 = 571.43
MSR = SSR/p = 12000/2 = 6000
b. F = MSR/MSE = 6000/571.43 = 10.50
F.05 = 4.74 (2 degrees of freedom numerator and 7 denominator)
Since F = 10.50 > F.05 = 4.74, we reject H0. There is a significant relationship among the variables.

23. a. F = 28.38
F.01 = 13.27 (2 degrees of freedom numerator and 5 denominator)
Since F > F.01 = 13.27, reject H0. Alternatively, the p-value of .002 leads to the same conclusion.
b. t = 7.53
t.025 = 2.571 (5 degrees of freedom)
Since t > t.025 = 2.571, β1 is significant and x1 should not be dropped from the model.
c. t = 4.06
t.025 = 2.571
Since t > t.025 = 2.571, β2 is significant and x2 should not be dropped from the model.

24.
Note: The Excel output is shown below:

Regression Statistics: Multiple R 0.9383; R Square 0.8804; Adjusted R Square 0.8620; Standard Error 3.0274; Observations 16

ANOVA         df   SS            MS         F         Significance F
Regression     2   876.8049822   438.4025   47.8327   1.01401E-06
Residual      13   119.1493928   9.1653
Total         15   995.954375

                    Coefficients   Standard Error   t Stat    P-value
Intercept           80.4873        9.1393           8.8067    7.69E-07
Curb Weight (lb.)   -0.0031        0.0035           -0.8968   0.3861
Horsepower          0.1047         0.0133           7.8643    2.7E-06

a. F = 47.8327
F.05 = 3.81 (2 degrees of freedom numerator and 13 denominator)
Since F = 47.8327 > F.05 = 3.81, we reject H0: β1 = β2 = 0. Alternatively, since the p-value = .000 < α = .05, we can reject H0.
b. For Curb Weight:
H0: β1 = 0
Ha: β1 ≠ 0
Since the p-value = 0.3861 > α = 0.05, we cannot reject H0.
For Horsepower:
H0: β2 = 0
Ha: β2 ≠ 0
Since the p-value = 0.000 < α = 0.05, we can reject H0.

25. a. The Excel output is shown below:

Regression Statistics: Multiple R 0.6867; R Square 0.4715; Adjusted R Square 0.3902; Standard Error 5.4561; Observations 16

ANOVA         df   SS            MS         F        Significance F
Regression     2   345.2765787   172.6383   5.7992   0.0158
Residual      13   387.0034213   29.7695
Total         15   732.28

                          Coefficients   Standard Error   t Stat   P-value
Intercept                 6.0382         4.5893           1.3157   0.2110
Gross Profit Margin (%)   0.6916         0.2133           3.2421   0.0064
Sales Growth (%)          0.2648         0.1871           1.4154   0.1805

b. Since the p-value = 0.0158 < α = 0.05, there is a significant relationship among the variables.
c. For Gross Profit Margin (%): Since the p-value = 0.0064 < α = 0.05, Gross Profit Margin (%) is significant.
For Sales Growth (%): Since the p-value = 0.1805 > α = 0.05, Sales Growth (%) is not significant.

26.
Note: The Excel output is shown below:

Regression Statistics: Multiple R 0.9703; R Square 0.9416; Adjusted R Square 0.9318; Standard Error 207.7292; Observations 15

ANOVA         df   SS             MS             F         Significance F
Regression     2   8342186.4020   4171093.2010   96.6618   3.98523E-08
Residual      12   517816.8740    43151.4062
Total         14   8860003.2760

              Coefficients   Standard Error   t Stat    P-value
Intercept     105.9727       85.5166          1.2392    0.2390
Cars          8.9427         0.7746           11.5451   7.42955E-08
Locations     -0.1914        0.1026           -1.8652   0.0868

a. Since the p-value corresponding to F = 96.6618 is 0.000 < α = .05, there is a significant relationship among the variables.
b. For Cars: Since the p-value = 0.000 < α = 0.05, Cars is significant.
c. For Locations: Since the p-value = 0.0868 > α = 0.05, Locations is not significant.

27. a. ŷ = 29.1270 + .5906(180) + .4980(310) = 289.8150
b. The point estimate for an individual value is ŷ = 289.8150, the same as the point estimate of the mean value.

28. a. Using the PredInt macro, the 95% confidence interval is 132.16 to 154.16.
b. Using the PredInt macro, the 95% prediction interval is 111.13 to 175.18.

29. a. ŷ = 83.2 + 2.29(3.5) + 1.30(1.8) = 93.555, or $93,555
Note: In Exercise 5b, the Excel output also shows that b0 = 83.2301, b1 = 2.2902, and b2 = 1.3010; hence, ŷ = 83.2301 + 2.2902x1 + 1.3010x2. Using this estimated regression equation, we obtain
ŷ = 83.2301 + 2.2902(3.5) + 1.3010(1.8) = 93.588, or $93,588
The difference, $93,588 - $93,555 = $33, is simply due to the additional significant digits used in the computations.
b. Using the PredInt macro, the confidence interval estimate is 92.840 to 94.335, or $92,840 to $94,335.
c. Using the PredInt macro, the prediction interval estimate is 91.774 to 95.401, or $91,774 to $95,401.

30. a. Since Curb Weight is not statistically significant (see Exercise 24), we will use an estimated regression equation that uses only Horsepower to predict the speed at 1/4 mile.
The Excel output is shown below:

Regression Statistics: Multiple R 0.9343; R Square 0.8730; Adjusted R Square 0.8639; Standard Error 3.0062; Observations 16

ANOVA         df   SS         MS        F         Significance F
Regression     1   869.4340   869.434   96.2064   1.18632E-07
Residual      14   126.5204   9.0372
Total         15   995.9544

              Coefficients   Standard Error   t Stat    P-value
Intercept     72.6500        2.6555           27.3586   1.49E-13
Horsepower    0.0968         0.0099           9.8085    1.19E-07

Using the PredInt macro, the point estimate is a speed of 101.29 miles per hour.
b. Using the PredInt macro, the 95% confidence interval is 99.490 to 103.089 miles per hour.
c. Using the PredInt macro, the 95% prediction interval is 94.596 to 107.984 miles per hour.

31. a. Using the PredInt macro, the 95% confidence interval is 58.37% to 75.03%.
b. Using the PredInt macro, the 95% prediction interval is 35.24% to 90.59%.

32. a. E(y) = β0 + β1x1 + β2x2, where x2 = 0 if level 1 and x2 = 1 if level 2
b. E(y | level 1) = β0 + β1x1 + β2(0) = β0 + β1x1
c. E(y | level 2) = β0 + β1x1 + β2(1) = β0 + β1x1 + β2
d. β2 = E(y | level 2) - E(y | level 1); β1 is the change in E(y) for a 1 unit change in x1 holding x2 constant.

33. a. Two dummy variables are needed.
b. E(y) = β0 + β1x1 + β2x2 + β3x3, where

x2   x3   Level
0    0    1
1    0    2
0    1    3

c. E(y | level 1) = β0 + β1x1 + β2(0) + β3(0) = β0 + β1x1
E(y | level 2) = β0 + β1x1 + β2(1) + β3(0) = β0 + β1x1 + β2
E(y | level 3) = β0 + β1x1 + β2(0) + β3(1) = β0 + β1x1 + β3
β2 = E(y | level 2) - E(y | level 1)
β3 = E(y | level 3) - E(y | level 1)
β1 is the change in E(y) for a 1 unit change in x1 holding x2 and x3 constant.

34. a. $15,300
b. Estimate of sales = 10.1 - 4.2(2) + 6.8(8) + 15.3(0) = 56.1, or $56,100
c. Estimate of sales = 10.1 - 4.2(1) + 6.8(3) + 15.3(1) = 41.6, or $41,600

35. a. Let Type = 0 if a mechanical repair and Type = 1 if an electrical repair. The Excel output is shown below:

Regression Statistics: Multiple R 0.2952; R Square 0.0871; Adjusted R Square -0.0270; Standard Error 1.0934; Observations 10

ANOVA         df   SS       MS       F        Significance F
Regression     1   0.9127   0.9127   0.7635   0.4077
Residual       8   9.5633   1.1954
Total          9   10.476

              Coefficients   Standard Error   t Stat   P-value
Intercept     3.45           0.5467           6.3109   0.0002
Type          0.6167         0.7058           0.8738   0.4077

b. The estimated regression equation did not provide a good fit. In fact, the p-value of .4077 shows that the relationship is not significant for any reasonable value of α.

c. Let Person = 0 if Bob Jones performed the service and Person = 1 if Dave Newton performed the service. The Excel output is shown below:

Regression Statistics: Multiple R 0.7816; R Square 0.6109; Adjusted R Square 0.5623; Standard Error 0.7138; Observations 10

ANOVA         df   SS      MS       F         Significance F
Regression     1   6.4     6.4      12.5613   0.0076
Residual       8   4.076   0.5095
Total          9   10.476

              Coefficients   Standard Error   t Stat    P-value
Intercept     4.62           0.3192           14.4729   5.08E-07
Person        -1.6           0.4514           -3.5442   0.0076

d. We see that 61.1% of the variability in repair time has been explained by the repair person who performed the service; this is an acceptable, but not good, fit.

36. a. The Excel output is shown below:

Regression Statistics: Multiple R 0.9488; R Square 0.900199692; Adjusted R Square 0.850299539; Standard Error 0.4174; Observations 10

ANOVA         df   SS       MS       F         Significance F
Regression     3   9.4305   3.1435   18.0400   0.0021
Residual       6   1.0455   0.1743
Total          9   10.476

                            Coefficients   Standard Error   t Stat    P-value
Intercept                   1.8602         0.7286           2.5529    0.0433
Months Since Last Service   0.2914         0.0836           3.4862    0.0130
Type                        1.1024         0.3033           3.6342    0.0109
Person                      -0.6091        0.3879           -1.5701   0.1674

b. Since the p-value corresponding to F = 18.04 is .0021 < α = .05, the overall model is statistically significant.
c. The p-value corresponding to t = -1.57 is .1674 > α = .05; thus, the addition of Person is not statistically significant. Person is highly correlated with Months (the sample correlation coefficient is -.691); thus, once the effect of Months has been accounted for, Person will not add much to the model.

37. a. Let Position = 0 if a guard and Position = 1 if an offensive tackle.
b. The Excel output is shown below:

Regression Statistics: Multiple R 0.6895; R Square 0.4755; Adjusted R Square 0.4005; Standard Error 0.6936; Observations 25

ANOVA         df   SS        MS       F        Significance F
Regression     3   9.1562    3.0521   6.3451   0.0031
Residual      21   10.1014   0.4810
Total         24   19.2576

              Coefficients   Standard Error   t Stat    P-value
Intercept     11.2233        4.5226           2.4816    0.0216
Position      0.7324         0.2893           2.5311    0.0194
Weight        0.0222         0.0104           2.1352    0.0447
Speed         -2.2775        0.9290           -2.4517   0.0231

c. Since the p-value corresponding to F = 6.3451 is .0031 < α = .05, there is a significant relationship between rating and the independent variables.
d. The value of Adjusted R Square is .4005; the estimated regression equation did not provide a very good fit.
e. Since the p-value for Position is .0194 < α = .05, position is a significant factor in the player's rating.
f. ŷ = 11.2233 + .7324(1) + .0222(300) - 2.2775(5.1) = 7.0

38. a. The Excel output is shown below:

Regression Statistics: Multiple R 0.9346; R Square 0.8735; Adjusted R Square 0.8498; Standard Error 5.7566; Observations 20

ANOVA         df   SS          MS         F         Significance F
Regression     3   3660.7396   1220.247   36.8230   2.06404E-07
Residual      16   530.2104    33.1382
Total         19   4190.95

              Coefficients   Standard Error   t Stat    P-value
Intercept     -91.7595       15.2228          -6.0278   1.76E-05
Age           1.0767         0.1660           6.4878    7.49E-06
Pressure      0.2518         0.0452           5.5680    4.24E-05
Smoker        8.7399         3.0008           2.9125    0.0102

b. Since the p-value corresponding to t = 2.9125 is .0102 < α = .05, smoking is a significant factor.
c. Using the PredInt macro, the point estimate is 34.27; the 95% prediction interval is 21.35 to 47.18. Thus, the probability of a stroke (.2135 to .4718 at the 95% confidence level) appears to be quite high. The physician would probably recommend that Art quit smoking and begin some type of treatment designed to reduce his blood pressure.

39. a. Job satisfaction can be expected to decrease by 8.69 units with a one unit increase in length of service if the wage rate does not change.
A dollar increase in the wage rate is associated with a 13.5 point increase in the job satisfaction score when the length of service does not change.
b. ŷ = 14.4 - 8.69(4) + 13.5(6.5) = 67.39

40. a. The expected increase in final college grade point average corresponding to a one point increase in high school grade point average is .0235 when the SAT mathematics score does not change. Similarly, the expected increase in final college grade point average corresponding to a one point increase in the SAT mathematics score is .00486 when the high school grade point average does not change.
b. ŷ = -1.41 + .0235(84) + .00486(540) = 3.19

41. a. The Excel output is shown below:

Regression Statistics: Multiple R 0.9681; R Square 0.9373; Adjusted R Square 0.9194; Standard Error 0.1298; Observations 10

ANOVA         df   SS       MS       F         Significance F
Regression     2   1.7621   0.8810   52.3053   6.17838E-05
Residual       7   0.1179   0.0168
Total          9   1.88

              Coefficients   Standard Error   t Stat    P-value
Intercept     -1.4053        0.4848           -2.8987   0.0230
X1            0.0235         0.0087           2.7078    0.0303
X2            0.0049         0.0011           4.5125    0.0028

The estimated regression equation is ŷ = -1.4053 + .0235x1 + .0049x2.
b. F.05 = 4.74 (2 degrees of freedom numerator and 7 denominator)
F = 52.31 > F.05; significant relationship.
c. R² = SSR/SST = 1.7621/1.88 = .937
R²a = 1 - (1 - .937)(9/7) = .919; good fit
d. t.025 = 2.365 (7 degrees of freedom)
For β1: t = 2.71 > 2.365; reject H0: β1 = 0
For β2: t = 4.51 > 2.365; reject H0: β2 = 0

42. a. The Excel output is shown below:

Regression Statistics: Multiple R 0.9493; R Square 0.9012; Adjusted R Square 0.8616; Standard Error 3.773; Observations 8

ANOVA         df   SS       MS        F         Significance F
Regression     2   648.83   324.415   22.7916   0.0031
Residual       5   71.17    14.234
Total          7   720

              Coefficients   Standard Error   t Stat    P-value
Intercept     14.4           8.191            1.7580    0.1391
X1            -8.69          1.555            -5.5884   0.0025
X2            13.517         2.085            6.4830    0.0013

b. F.05 = 5.79 (2 degrees of freedom numerator and 5 denominator)
F = 22.79 > F.05; significant relationship.
c. R² = SSR/SST = 648.83/720 = .901
R²a = 1 - (1 - .901)(7/5) = .861; good fit
d.
t.025 = 2.571 (5 degrees of freedom)
For β1: t = -5.59 < -2.571; reject H0: β1 = 0
For β2: t = 6.48 > 2.571; reject H0: β2 = 0

43. a. The Excel output is shown below:

Regression Statistics: Multiple R 0.5423; R Square 0.2941; Adjusted R Square 0.2689; Standard Error 19.4957; Observations 30

ANOVA         df   SS            MS         F         Significance F
Regression     1   4433.856352   4433.856   11.6656   0.0020
Residual      28   10642.25117   380.0804
Total         29   15076.10752

                       Coefficients   Standard Error   t Stat   P-value
Intercept              12.7928        6.6242           1.9312   0.0636
Book Value Per Share   2.2649         0.6631           3.4155   0.0020

b. The value of R Square is .2941; the estimated regression equation does not provide a good fit.
c. The Excel output is shown below:

Regression Statistics: Multiple R 0.7528; R Square 0.5667; Adjusted R Square 0.5346; Standard Error 15.5538; Observations 30

ANOVA         df   SS            MS         F         Significance F
Regression     2   8544.237582   4272.119   17.6591   1.24768E-05
Residual      27   6531.869938   241.9211
Total         29   15076.10752

                                 Coefficients   Standard Error   t Stat   P-value
Intercept                        5.8766         5.5448           1.0598   0.2986
Book Value Per Share             2.5356         0.5331           4.7562   5.87E-05
Return on Equity Per Share (%)   0.4841         0.1174           4.1220   0.0003

Since the p-value corresponding to the F test is 0.000, the relationship is significant.

44. a. The Excel output is shown below:

Regression Statistics: Multiple R 0.9747; R Square 0.9500; Adjusted R Square 0.9319; Standard Error 2.1272; Observations 16

ANOVA         df   SS            MS         F         Significance F
Regression     4   946.1809495   236.5452   52.2768   4.33829E-07
Residual      11   49.7734       4.5249
Total         15   995.954375

                       Coefficients   Standard Error   t Stat    P-value
Intercept              97.5702        11.7926          8.2738    4.74E-06
Price ($1000s)         0.0693         0.0380           1.8210    0.0959
Curb Weight (lb.)      -0.0008        0.0026           -0.3145   0.7590
Horsepower             0.0590         0.0154           3.8235    0.0028
Zero to 60 (Seconds)   -2.4836        0.9601           -2.5869   0.0253

b. Since the p-value corresponding to the F test is 0.000, the relationship is significant.
c.
Since the p-values corresponding to the t test for both Horsepower (p-value = .0028) and Zero to 60 (p-value = .0253) are less than .05, both of these independent variables are significant.
d. The Excel output is shown below:

Regression Statistics: Multiple R 0.9648; R Square 0.9309; Adjusted R Square 0.9203; Standard Error 2.3011; Observations 16

ANOVA         df   SS         MS        F         Significance F
Regression     2   927.1181   463.559   87.5449   2.86588E-08
Residual      13   68.8363    5.2951
Total         15   995.9544

                       Coefficients   Standard Error   t Stat    P-value
Intercept              103.1028       9.4478           10.9129   6.47E-08
Horsepower             0.0558         0.0145           3.8436    0.0020
Zero to 60 (Seconds)   -3.1876        0.9658           -3.3006   0.0057

e. The standardized residual plot is shown below:
[Plot: standard residuals (vertical axis, -2 to 3) versus predicted y (horizontal axis, 80 to 120)]
There is an unusual trend in the plot, and one observation appears to be an outlier.
f. The Excel output indicates that observation 2 is an outlier.

45. a. [Scatter diagram: household exposures (vertical axis, 0 to 700) versus times ad aired (horizontal axis, 0 to 100)]
b. The Excel output is shown below:

Regression Statistics: Multiple R 0.9829; R Square 0.9660; Adjusted R Square 0.9618; Standard Error 31.70350482; Observations 10

ANOVA         df   SS            MS         F          Significance F
Regression     1   228519.8983   228519.9   227.3576   3.70081E-07
Residual       8   8040.897745   1005.112
Total          9   236560.796

                 Coefficients   Standard Error   t Stat    P-value
Intercept        53.2448        16.5334          3.2204    0.0122
Times Ad Aired   6.7427         0.4472           15.0784   3.7E-07

Since the p-value is 0.000, the relationship is significant.
c. The Excel output is shown below:

Regression Statistics: Multiple R 0.9975; R Square 0.9949; Adjusted R Square 0.9935; Standard Error 13.0801; Observations 10

ANOVA         df   SS            MS         F         Significance F
Regression     2   235363.1688   117681.6   687.836   9.23264E-09
Residual       7   1197.62722    171.0896
Total          9   236560.796

                 Coefficients   Standard Error   t Stat    P-value
Intercept        73.0634        7.5067           9.7331    2.56E-05
Times Ad Aired   5.0368         0.3268           15.4131   1.17E-06
BigAds           101.1129       15.9877          6.3244    0.0004

d.
The p-value corresponding to the t test for BigAds is 0.0004; thus, the dummy variable is significant.
e. The dummy variable enables us to fit two different lines to the data; this approach is referred to as piecewise linear approximation.

46. a. The Excel output is shown below:

Regression Statistics: Multiple R 0.6059; R Square 0.3671; Adjusted R Square 0.3445; Standard Error 5.4213; Observations 30

ANOVA         df   SS          MS         F         Significance F
Regression     1   477.2478    477.2478   16.2385   0.0004
Residual      28   822.9189    29.3900
Total         29   1300.1667

                             Coefficients   Standard Error   t Stat   P-value
Intercept                    38.7718        4.3481           8.9170   1.13E-09
Suggested Retail Price ($)   0.0008         0.0002           4.0297   0.0004

Since the p-value corresponding to F = 16.24 is .0004 < α = .05, there is a significant relationship between the resale value (%) and the suggested retail price.
b. R Square = .3671; not a very good fit.
c. Let Type1 = 0 and Type2 = 0 if a small pickup; Type1 = 1 and Type2 = 0 if a full-size pickup; and Type1 = 0 and Type2 = 1 if a sport utility. The Excel output using Type1, Type2, and Price is shown below:

Regression Statistics: Multiple R 0.7940; R Square 0.6305; Adjusted R Square 0.5879; Standard Error 4.2985; Observations 30

ANOVA         df   SS            MS        F         Significance F
Regression     3   819.7710938   273.257   14.7892   8.11183E-06
Residual      26   480.3955729   18.4768
Total         29   1300.166667

                             Coefficients   Standard Error   t Stat    P-value
Intercept                    42.5539        3.5618           11.9472   4.62E-12
Type1                        9.0903         2.2476           4.0444    0.0004
Type2                        7.9172         2.1634           3.6596    0.0011
Suggested Retail Price ($)   0.0003         0.0002           1.8972    0.0690

d. Since the p-value corresponding to F = 14.7892 is .000 < α = .05, there is a significant relationship between the resale value and the independent variables. Note that individually, Suggested Retail Price is not significant at the .05 level of significance. If we rerun the regression using just Type1 and Type2, the value of Adjusted R Square decreases to .5482, a drop of approximately .04.
Thus, it appears that for these data, the type of vehicle is the strongest predictor of the resale value.

Chapter 14 Statistical Methods for Quality Control

Learning Objectives

1. Learn about the importance of quality control and how statistical methods can assist in the quality control process.
2. Learn about acceptance sampling procedures.
3. Know the difference between consumer's risk and producer's risk.
4. Be able to use the binomial probability distribution to develop acceptance sampling plans.
5. Know what is meant by multiple sampling plans.
6. Be able to construct quality control charts and understand how they are used for statistical process control.
7. Know the definitions of the following terms:
   producer's risk                   assignable causes
   consumer's risk                   common causes
   acceptance sampling               control charts
   acceptance criterion              upper control limit
   operating characteristic curve    lower control limit

Solutions:

1. a. For n = 4
UCL = μ + 3(σ/√n) = 12.5 + 3(.8/√4) = 13.7
LCL = μ - 3(σ/√n) = 12.5 - 3(.8/√4) = 11.3
b. For n = 8
UCL = 12.5 + 3(.8/√8) = 13.35
LCL = 12.5 - 3(.8/√8) = 11.65
For n = 16
UCL = 12.5 + 3(.8/√16) = 13.10
LCL = 12.5 - 3(.8/√16) = 11.90
c. UCL and LCL become closer together as n increases. If the process is in control, the larger samples should have less variance and should fall closer to 12.5.

2. a. x̿ = 677.5/(25(5)) = 5.42
b. UCL = μ + 3(σ/√n) = 5.42 + 3(.5/√5) = 6.09
LCL = μ - 3(σ/√n) = 5.42 - 3(.5/√5) = 4.75

3. a. p̄ = 135/(25(100)) = 0.0540
b. σp = √(p̄(1 - p̄)/n) = √(0.0540(0.9460)/100) = 0.0226
UCL = p̄ + 3σp = 0.0540 + 3(0.0226) = 0.1218
c. LCL = p̄ - 3σp = 0.0540 - 3(0.0226) = -0.0138
Use LCL = 0

4. R chart:
UCL = R̄D4 = 1.6(1.864) = 2.98
LCL = R̄D3 = 1.6(0.136) = 0.22
x̄ chart:
UCL = x̿ + A2R̄ = 28.5 + 0.373(1.6) = 29.10
LCL = x̿ - A2R̄ = 28.5 - 0.373(1.6) = 27.90

5. a. UCL = μ + 3(σ/√n) = 128.5 + 3(.4/√6) = 128.99
LCL = μ - 3(σ/√n) = 128.5 - 3(.4/√6) = 128.01
b. x̄ = Σxi/n = 772.4/6 = 128.73; in control
c.
x̄ = Σxi/n = 774.3/6 = 129.05; out of control

6. Process mean = (20.12 + 19.90)/2 = 20.01
UCL = μ + 3(σ/√n) = 20.01 + 3(σ/√5) = 20.12
Solve for σ: σ = (20.12 - 20.01)√5/3 = 0.082

7.
Sample   Observations   x̄i      Ri
1        31  42  28     33.67   14
2        26  18  35     26.33   17
3        25  30  34     29.67    9
4        17  25  21     21.00    8
5        38  29  35     34.00    9
6        41  42  36     39.67    6
7        21  17  29     22.33   12
8        32  26  28     28.67    6
9        41  34  33     36.00    8
10       29  17  30     25.33   13
11       26  31  40     32.33   14
12       23  19  25     22.33    6
13       17  24  32     24.33   15
14       43  35  17     31.67   26
15       18  25  29     24.00   11
16       30  42  31     34.33   12
17       28  36  32     32.00    8
18       40  29  31     33.33   11
19       18  29  28     25.00   11
20       22  34  26     27.33   12

R̄ = 11.4 and x̿ = 29.17
R chart:
UCL = R̄D4 = 11.4(2.575) = 29.35
LCL = R̄D3 = 11.4(0) = 0
x̄ chart:
UCL = x̿ + A2R̄ = 29.17 + 1.023(11.4) = 40.8
LCL = x̿ - A2R̄ = 29.17 - 1.023(11.4) = 17.5
[Chart: R chart of the 20 sample ranges, with UCL = 29.3, center line R̄ = 11.4, and LCL = 0]
[Chart: x̄ chart of the 20 sample means, with UCL = 40.8, center line x̿ = 29.17, and LCL = 17.5]

8. a. p̄ = 141/(20(150)) = 0.0470
b. σp = √(p̄(1 - p̄)/n) = √(0.0470(0.9530)/150) = 0.0173
UCL = p̄ + 3σp = 0.0470 + 3(0.0173) = 0.0989
LCL = p̄ - 3σp = 0.0470 - 3(0.0173) = -0.0049
Use LCL = 0
c. p = 12/150 = 0.08. Process should be considered in control.
d. p̄ = .047, n = 150
UCL = np̄ + 3√(np̄(1 - p̄)) = 150(0.047) + 3√(150(0.047)(0.953)) = 14.826
LCL = np̄ - 3√(np̄(1 - p̄)) = 150(0.047) - 3√(150(0.047)(0.953)) = -0.726
Thus, the process is out of control if more than 14 defective packages are found in a sample of 150.
e. Process should be considered to be in control since 12 defective packages were found.
f. The np chart may be preferred because a decision can be made by simply counting the number of defective packages.

9. a. Total defectives: 165
p̄ = 165/(20(200)) = 0.0413
b. σp = √(p̄(1 - p̄)/n) = √(0.0413(0.9587)/200) = 0.0141
UCL = p̄ + 3σp = 0.0413 + 3(0.0141) = 0.0836
LCL = p̄ - 3σp = 0.0413 - 3(0.0141) = -0.0010
Use LCL = 0
c. p = 20/200 = 0.10; out of control
d. p̄ = .0413, n = 200
UCL = np̄ + 3√(np̄(1 - p̄)) = 200(0.0413) + 3√(200(0.0413)(0.9587)) = 16.702
LCL = np̄ - 3√(np̄(1 - p̄)) = 200(0.0413) - 3√(200(0.0413)(0.9587)) = -0.1821
Use LCL = 0
e. The process is out of control since 20 defective pistons were found.

10. f(x) = [n!/(x!(n - x)!)] p^x (1 - p)^(n - x)
When p = .02, the probability of accepting the lot is
f(0) = [25!/(0!(25 - 0)!)] (0.02)^0 (1 - 0.02)^25 = 0.6035
When p = .06, the probability of accepting the lot is
f(0) = [25!/(0!(25 - 0)!)] (0.06)^0 (1 - 0.06)^25 = 0.2129

11. a. Using binomial probabilities with n = 20 and p0 = .02:
P(Accept lot) = f(0) = .6676
Producer's risk: α = 1 - .6676 = .3324
b. With p0 = .06:
P(Accept lot) = f(0) = .2901
Producer's risk: α = 1 - .2901 = .7099

12. At p0 = .02, the n = 20 and c = 1 plan provides
P(Accept lot) = f(0) + f(1) = .6676 + .2725 = .9401
Producer's risk: α = 1 - .9401 = .0599
At p0 = .06, the n = 20 and c = 1 plan provides
P(Accept lot) = f(0) + f(1) = .2901 + .3703 = .6604
Producer's risk: α = 1 - .6604 = .3396
For a given sample size, the producer's risk decreases as the acceptance number c is increased.

13. a. Using binomial probabilities with n = 20 and p0 = .03:
P(Accept lot) = f(0) + f(1) = .5438 + .3364 = .8802
Producer's risk: α = 1 - .8802 = .1198
b. With n = 20 and p1 = .15:
P(Accept lot) = f(0) + f(1) = .0388 + .1368 = .1756
Consumer's risk: β = .1756
c. The consumer's risk is acceptable; however, the producer's risk associated with the n = 20, c = 1 plan is a little larger than desired.

14.
n    c   P(Accept) at p0 = .05   Producer's Risk   P(Accept) at p1 = .30   Consumer's Risk
10   0   .5987                   .4013             .0282                   .0282
10   1   .9138                   .0862             .1493                   .1493
10   2   .9884                   .0116             .3828                   .3828
15   0   .4633                   .5367             .0047                   .0047
15   1   .8291                   .1709             .0352                   .0352
15   2   .9639                   .0361             .1268                   .1268
15   3   .9946                   .0054             .2968                   .2968
20   0   .3585                   .6415             .0008                   .0008
20   1   .7359                   .2641             .0076                   .0076
20   2   .9246                   .0754             .0354                   .0354
20   3   .9842                   .0158             .1070                   .1070

The plan with n = 15, c = 2 is close, with α = .0361 and β = .1268.
However, the plan with n = 20, c = 3 is necessary to meet both requirements.

15. a. P(Accept) is shown for the p values below:

          c = 0    c = 1    c = 2
p = .01   .8179    .9831    .9990
p = .05   .3585    .7359    .9246
p = .08   .1887    .5169    .7880
p = .10   .1216    .3918    .6770
p = .15   .0388    .1756    .4049

The operating characteristic curves would show P(Accept) versus p for each value of c.
b.
c   P(Accept) at p0 = .01   Producer's Risk   P(Accept) at p1 = .08   Consumer's Risk
0   .8179                   .1821             .1887                   .1887
1   .9831                   .0169             .5169                   .5169
2   .9990                   .0010             .7880                   .7880

16. a. x̿ = 1908/20 = 95.4
b. UCL = μ + 3(σ/√n) = 95.4 + 3(.50/√5) = 96.07
LCL = μ - 3(σ/√n) = 95.4 - 3(.50/√5) = 94.73
c. No; all were in control.

17. a. For n = 10
UCL = μ + 3(σ/√n) = 350 + 3(15/√10) = 364.23
LCL = μ - 3(σ/√n) = 350 - 3(15/√10) = 335.77
For n = 20
UCL = 350 + 3(15/√20) = 360.06
LCL = 350 - 3(15/√20) = 339.94
For n = 30
UCL = 350 + 3(15/√30) = 358.22
LCL = 350 - 3(15/√30) = 341.78
b. Both control limits come closer to the process mean as the sample size is increased.
c. The process will be declared out of control and adjusted when the process is in control.
d. The process will be judged in control and allowed to continue when the process is out of control.
e. All have z = 3, where each tail area = 1 - .9986 = .0014
P(Type I) = 2(.0014) = .0028
f. The Type II error probability is reduced as the sample size is increased.

18. R chart:
UCL = R̄D4 = 2(2.115) = 4.23
LCL = R̄D3 = 2(0) = 0
x̄ chart:
UCL = x̿ + A2R̄ = 5.42 + 0.577(2) = 6.57
LCL = x̿ - A2R̄ = 5.42 - 0.577(2) = 4.27
Estimate of standard deviation: σ̂ = R̄/d2 = 2/2.326 = 0.86

19. R̄ = 0.665, x̿ = 95.398
x̄ chart:
UCL = x̿ + A2R̄ = 95.398 + 0.577(0.665) = 95.782
LCL = x̿ - A2R̄ = 95.398 - 0.577(0.665) = 95.014
R chart:
UCL = R̄D4 = 0.665(2.115) = 1.406
LCL = R̄D3 = 0.665(0) = 0
The R chart indicates that the process variability is in control: all sample ranges are within the control limits. However, the process mean is out of control.
Samples 11 ($\bar{x}$ = 95.80) and 17 ($\bar{x}$ = 94.82) fall outside the control limits.

20. $\bar{R} = .053$, $\bar{\bar{x}} = 3.082$
$\bar{x}$ chart:
$\text{UCL} = \bar{\bar{x}} + A_2\bar{R} = 3.082 + 0.577(0.053) = 3.112$
$\text{LCL} = \bar{\bar{x}} - A_2\bar{R} = 3.082 - 0.577(0.053) = 3.051$
R chart:
$\text{UCL} = \bar{R}D_4 = 0.053(2.115) = 0.1121$
$\text{LCL} = \bar{R}D_3 = 0.053(0) = 0$
All data points are within the control limits for both charts.

21. a. [Control chart with UCL and LCL; vertical axis from 0 to .08]
Warning: the process should be checked. All points are within the control limits; however, all points are also greater than the process proportion defective.

b. [Control chart with UCL and LCL; vertical axis from 22 to 25]
Warning: the process should be checked. All points are within the control limits, yet the trend in the points shows a movement, or shift, toward the UCL and an out-of-control point.

22. a. p = .04
$\sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.04(0.96)}{200}} = 0.0139$
$\text{UCL} = p + 3\sigma_{\bar{p}} = 0.04 + 3(0.0139) = 0.0817$
$\text{LCL} = p - 3\sigma_{\bar{p}} = 0.04 - 3(0.0139) = -0.0017$
Use LCL = 0.

b. [p chart: UCL = .082, center line = .04, LCL = 0]
For month 1, the proportion defective is 10/200 = 0.05. The other monthly values are .075, .03, .065, .04, and .085. Only the last month, with a proportion defective of 0.085, is an out-of-control situation.

23. a. Use binomial probabilities with n = 10.
At p0 = .05:
P(Accept lot) = f(0) + f(1) + f(2) = .5987 + .3151 + .0746 = .9884
Producer’s risk: α = 1 - .9884 = .0116
At p1 = .20:
P(Accept lot) = f(0) + f(1) + f(2) = .1074 + .2684 + .3020 = .6778
Consumer’s risk: β = .6778

b. The consumer’s risk is unacceptably high. Too many bad lots would be accepted.

c. Reducing c would help, but increasing the sample size appears to be the best solution.

24. a. P(Accept) values are shown below (using n = 15):

 p     f(0)    f(1)    P(Accept)   α = 1 - P(Accept)
.01    .8601   .1303   .9904       .0096
.02    .7386   .2261   .9647       .0353
.03    .6333   .2938   .9271       .0729
.04    .5421   .3388   .8809       .1191
.05    .4633   .3658   .8291       .1709

Use p0 = .03, since α is close to .075. Thus, .03 is the fraction defective at which the producer will tolerate a .075 probability of rejecting a good lot (one that is only .03 defective).

b. At p = .25:
f(0) = .0134
f(1) = .0668
β = f(0) + f(1) = .0802

25. a.
P(Accept) when n = 25 and c = 0. Use the binomial probability function
$f(x) = \frac{n!}{x!(n-x)!}p^x(1-p)^{n-x}$
or
$f(0) = \frac{25!}{0!\,25!}p^0(1-p)^{25} = (1-p)^{25}$

 p     f(0)
.01    .7778
.03    .4670
.10    .0718
.20    .0038

b. [Figure: operating characteristic curve, P(Accept) from 0 to 1.0 versus percent defective from .00 to .20]

c. $1 - f(0) = 1 - .778 = .222$
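The acceptance-sampling answers in exercises 10 through 15 and 23 through 25 all reduce to one cumulative binomial sum, P(Accept) = f(0) + f(1) + ... + f(c), with producer’s risk α = 1 - P(Accept) at p0 and consumer’s risk β = P(Accept) at p1. The Python sketch below is an illustration only, not part of the original solutions; it uses the standard library's math.comb (Python 3.8+), and the helper names p_accept and plan_risks are hypothetical.

```python
from math import comb


def p_accept(n: int, c: int, p: float) -> float:
    """P(Accept lot) = P(x <= c) for x ~ Binomial(n, p): sum of f(0) .. f(c)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(c + 1))


def plan_risks(n: int, c: int, p0: float, p1: float) -> tuple:
    """Producer's risk alpha at p0 and consumer's risk beta at p1 for an (n, c) plan."""
    alpha = 1 - p_accept(n, c, p0)  # probability of rejecting a good lot
    beta = p_accept(n, c, p1)       # probability of accepting a bad lot
    return alpha, beta


if __name__ == "__main__":
    # Exercise 13: n = 20, c = 1, p0 = .03, p1 = .15
    alpha, beta = plan_risks(20, 1, 0.03, 0.15)
    print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")

    # Exercise 25a: points on the OC curve for n = 25, c = 0
    for p in (0.01, 0.03, 0.10, 0.20):
        print(f"p = {p:.2f}: f(0) = {p_accept(25, 0, p):.4f}")
```

Run as a script, it prints alpha = 0.1198 and beta = 0.1756 (exercise 13) followed by the exercise 25a values .7778, .4670, .0718, and .0038. Because the code sums exact probabilities, a few table entries built from rounded terms may differ from it in the fourth decimal place.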
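The control limits in exercises 9 and 16 through 22 come from four 3-sigma formulas: the $\bar{x}$ chart with known μ and σ, the $\bar{x}$ and R charts built from $\bar{R}$ and the tabled factors A2, D3, D4, and the p and np charts, where a negative LCL is replaced by 0. The Python sketch below is a minimal illustration under those assumptions; the function names are hypothetical, and the defaults A2 = 0.577, D3 = 0, D4 = 2.115 (with d2 = 2.326) are the standard table factors for samples of size 5 used in exercises 18 through 20.

```python
import math

# Standard control-chart factors for samples of size 5 (as used in exercises 18-20).
A2, D3, D4 = 0.577, 0.0, 2.115
d2 = 2.326  # sigma-hat = R-bar / d2


def xbar_limits_known(mu, sigma, n):
    """3-sigma x-bar chart limits (LCL, UCL) when mu and sigma are known."""
    half_width = 3 * sigma / math.sqrt(n)
    return mu - half_width, mu + half_width


def xbar_r_limits(xbarbar, rbar, a2=A2, d3=D3, d4=D4):
    """x-bar chart and R chart limits, each as (LCL, UCL), estimated from sample data."""
    xbar_chart = (xbarbar - a2 * rbar, xbarbar + a2 * rbar)
    r_chart = (d3 * rbar, d4 * rbar)
    return xbar_chart, r_chart


def p_chart_limits(p, n):
    """3-sigma p chart limits; a negative LCL is replaced by 0."""
    sigma_pbar = math.sqrt(p * (1 - p) / n)
    return max(0.0, p - 3 * sigma_pbar), p + 3 * sigma_pbar


def np_chart_limits(p, n):
    """3-sigma np chart limits; a negative LCL is replaced by 0."""
    center = n * p
    half_width = 3 * math.sqrt(n * p * (1 - p))
    return max(0.0, center - half_width), center + half_width


if __name__ == "__main__":
    print("Exercise 17, n = 30:", xbar_limits_known(350, 15, 30))
    print("Exercise 19:", xbar_r_limits(95.398, 0.665))
    print("Exercise 18 sigma-hat:", 2 / d2)
    print("Exercise 22:", p_chart_limits(0.04, 200))
    print("Exercise 9:", np_chart_limits(0.0413, 200))
```

Because the sketch does not round intermediate results, its exercise 22 UCL comes out near .0816 rather than the .0817 obtained from the rounded standard error 0.0139; both round to the .082 shown on the chart.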