INSTRUCTOR’S SOLUTIONS MANUAL
NANCY S. BOUDREAU
Bowling Green State University

STATISTICS FOR BUSINESS AND ECONOMICS
TWELFTH EDITION

James T. McClave
Info Tech, Inc.
University of Florida

P. George Benson
College of Charleston

Terry Sincich
University of South Florida

Boston Columbus Indianapolis New York San Francisco Upper Saddle River Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montreal Toronto Delhi Mexico City São Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo

The author and publisher of this book have used their best efforts in preparing this book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The author and publisher make no warranty of any kind, expressed or implied, with regard to these programs or the documentation contained in this book. The author and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.

Reproduced by Pearson from electronic files supplied by the author.

Copyright © 2014, 2011, 2008 Pearson Education, Inc. Publishing as Pearson, 75 Arlington Street, Boston, MA 02116.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America.

ISBN-13: 978-0-321-83681-6
ISBN-10: 0-321-83681-2

www.pearsonhighered.com

Chapter 1
Statistics, Data, and Statistical Thinking

1.1 Statistics is a science that deals with the collection, classification, analysis, and interpretation of information or data. It is a meaningful, useful science with a broad, almost limitless scope of applications to business, government, and the physical and social sciences.
1.2 Descriptive statistics utilizes numerical and graphical methods to look for patterns, to summarize, and to present the information in a set of data. Inferential statistics utilizes sample data to make estimates, decisions, predictions, or other generalizations about a larger set of data.

1.3 The four elements of a descriptive statistics problem are:
1. The population or sample of interest. This is the collection of all the units upon which the variable is measured.
2. One or more variables that are to be investigated. These are the types of data that are to be collected.
3. Tables, graphs, or numerical summary tools. These are tools used to display the characteristics of the sample or population.
4. Identification of patterns in the data. These are conclusions drawn from what the summary tools revealed about the population or sample.

1.4 The five elements of an inferential statistical analysis are:
1. The population of interest. The population is a set of existing units.
2. One or more variables that are to be investigated. A variable is a characteristic or property of an individual population unit.
3. The sample of population units. A sample is a subset of the units of a population.
4. The inference about the population based on information contained in the sample. A statistical inference is an estimate, prediction, or generalization about a population based on information contained in a sample.
5. A measure of reliability for the inference. The reliability of an inference is how confident one is that the inference is correct.

1.5 The first major method of collecting data is from a published source. These data have already been collected by someone else and are available in a published source. The second method of collecting data is from a designed experiment. These data are collected by a researcher who exerts strict control over the experimental units in a study. These data are measured directly from the experimental units.
The final method of collecting data is observational. These data are collected directly from experimental units by simply observing the experimental units in their natural environment and recording the values of the desired characteristics. The most common type of observational study is a survey.

1.6 Quantitative data are measurements that are recorded on a meaningful numerical scale. Qualitative data are measurements that are not numerical in nature; they can only be classified into one of a group of categories.

1.7 A population is a set of existing units such as people, objects, transactions, or events. A variable is a characteristic or property of an individual population unit such as height of a person, time of a reflex, amount of a transaction, etc.

1.8 A population is a set of existing units such as people, objects, transactions, or events. A sample is a subset of the units of a population.

1.9 A representative sample is a sample that exhibits characteristics similar to those possessed by the target population. A representative sample is essential if inferential statistics is to be applied. If a sample does not possess the same characteristics as the target population, then any inferences made using the sample will be unreliable.

1.10 An inference without a measure of reliability is nothing more than a guess. A measure of reliability separates statistical inference from fortune telling or guessing. Reliability gives a measure of how confident one is that the inference is correct.

1.11 A population is a set of existing units such as people, objects, transactions, or events. A process is a series of actions or operations that transform inputs to outputs. A process produces or generates output over time. Examples of processes are assembly lines, oil refineries, and stock prices.

1.12 Statistical thinking involves applying rational thought processes to critically assess data and inferences made from the data.
It involves not taking all data and inferences presented at face value, but rather making sure the inferences and data are valid.

1.13 The data consisting of the classifications A, B, C, and D are qualitative. These data are nominal and thus are qualitative. After the data are input as 1, 2, 3, and 4, they are still nominal and thus qualitative. The only differences between the two data sets are the names of the categories. The numbers associated with the four groups are meaningless.

1.14 Answers will vary. First, number the elements of the population from 1 to 200,000. Using MINITAB, generate 10 numbers on the interval from 1 to 200,000, eliminating any duplicates. The 10 numbers selected for the random sample are:
135075 89127 189226 83899 112367
191496 110021 44853 42091 198461
Elements with the above numbers are selected for the sample.

1.15 a. The experimental unit for this study is a single-family residential property in Arlington, Texas.
b. The variables measured are the sale price and the Zillow estimated value. Both of these variables are quantitative.
c. If these 2,045 properties were all the properties sold in Arlington, Texas in the past 6 months, then this would be considered the population.
d. If these 2,045 properties represent a sample, then the population would be all the single-family residential properties sold in the last 6 months in Arlington, Texas.
e. No. The real estate market across the United States varies greatly. The prices of single-family residential properties in this small area are probably not representative of all properties across the United States.

1.16 a. The experimental unit for this study is an NFL quarterback.
b. The variables measured in this study include draft position, NFL winning ratio, and QB production score. Since the draft position was put into 3 categories, it is a qualitative variable. The NFL winning ratio and the QB production score are both quantitative.
c. Since we want to project the performance of future NFL QBs, this would be an application of inferential statistics.

1.17 a. The population of interest is all citizens of the United States.
b. The variable of interest is the view of each citizen as to whether the president is doing a good or bad job. It is qualitative.
c. The sample is the 2000 individuals selected for the poll.
d. The inference of interest is to estimate the proportion of all U.S. citizens who believe the president is doing a good job.
e. The method of data collection is a survey.
f. It is not very likely that the sample will be representative of the population of all citizens of the United States. By selecting phone numbers at random, the sample will be limited to only those people who have telephones. Also, many people share the same phone number, so each person would not have an equal chance of being contacted. Another possible problem is the time of day the calls are made. If the calls are made in the evening, those people who work in the evening would not be represented.

1.18 a. High school GPA is a number usually between 0.0 and 4.0. Therefore, it is quantitative.
b. Honors/awards would have responses that name things. Therefore, it would be qualitative.
c. SAT scores are numbers between 200 and 800. Therefore, it is quantitative.
d. Gender is either male or female. Therefore, it is qualitative.
e. Parent's income is a number: $25,000, $45,000, etc. Therefore, it is quantitative.
f. Age is a number: 17, 18, etc. Therefore, it is quantitative.

1.19 I. Qualitative; the possible responses are "yes" or "no," which are non-numerical.
II. Quantitative; age is measured on a numerical scale, such as 15, 32, etc.
III. Qualitative; the possible responses are "yes" or "no," which are non-numerical.
IV. Qualitative; the possible responses are "laser printer" or "another type of printer," which are non-numerical.
V. Qualitative; the speeds can be classified as "slower," "unchanged," or "faster," which are non-numerical.
VI. Quantitative; the number of people in a household who have used Windows 95 at least once is measured on a numerical scale, such as 0, 1, 2, etc.

1.20 a. For question 1, the data collected would be qualitative. The possible responses would be "yes" or "no." For question 2, the data collected would be quantitative. The responses would be numbers such as 0, 1, 2, etc. For question 3, the data collected would be qualitative. The possible responses would be "yes" or "no."
b. The data collected from the 1,066 adults would be a sample. These adults would only be a part of all adults in the United States.

1.21 a. Whether the data collected on the chief executive officers at the 500 largest U.S. companies are a population or a sample depends on what one is interested in. If one is only interested in the information from the CEOs of the 500 largest U.S. companies, then these data form a population. If one is interested in the information on CEOs from all U.S. firms, then these data would form a sample.
b. 1. The industry type of the CEO’s company is a qualitative variable. The industry type is a name.
2. The CEO’s total compensation is a meaningful number. Thus, it is a quantitative variable.
3. The CEO’s total compensation over the previous five years is a meaningful number. Thus, it is a quantitative variable.
4. The number of company stock shares (millions) held is a meaningful number. Thus, it is a quantitative variable.
5. The CEO’s age is a meaningful number. Thus, it is a quantitative variable.
6. The CEO’s efficiency rating is a meaningful number. Thus, it is a quantitative variable.

1.22 a. The population of interest is the status of computer crime at all United States businesses and government agencies.
b. The method of data collection was a survey. Since not all of those who were sent a survey responded, the sample was self-selected. The results are probably not representative of the population. Usually, those who respond to surveys have very strong opinions, either positive or negative.
c. The variable of interest is whether or not the firm or agency had unauthorized use of its computer systems during the year. Since the response would be either yes or no, the variable would be qualitative.
d. If the sample were representative, we could infer that approximately 41% of all U.S. corporations and government agencies experienced unauthorized use of their computer systems during the year.

1.23 Since the data collected consist of the entire population, this would represent a descriptive study. Flaherty used the data to help describe the condition of the U.S. Treasury in 1861.

1.24 This study would be an example of inferential statistics. The researchers collected data over 2 years. Using this information, the researchers are projecting or making inferences about what will happen in the future.

1.25 a. The population of interest is all individuals who earned MBA degrees since January 1990.
b. The method of data collection was a survey.
c. This is probably not a representative sample. The sample was self-selected. Not all of those who were selected for the study responded to all four surveys. Those who did respond to all 4 surveys probably have very strong opinions, either positive or negative, which may not be representative of all of those in the population.

1.26 a. The population of interest is all CPA firms.
b. A survey was used to collect the data.
c. This sample was probably not representative. Not all of those selected to be in the sample responded. In fact, only 992 of the 23,500 people who were sent the survey responded. Generally, those who do respond to surveys have very strong opinions, either positive or negative. These may not be the opinions of all CPA firms.
d. Since the sample may not be representative, the inferences drawn in the study may not be valid.

1.27 a. Length of maximum span can take on values such as 15 feet, 50 feet, 75 feet, etc. Therefore, it is quantitative.
b. The number of vehicle lanes can take on values such as 2, 4, etc. Therefore, it is quantitative.
c. The answer to this item is "yes" or "no," which is not numeric. Therefore, it is qualitative.
d. Average daily traffic could take on values such as 150 vehicles, 3,579 vehicles, 53,295 vehicles, etc. Therefore, it is quantitative.
e. Condition can take on values "good," "fair," or "poor," which are not numeric. Therefore, it is qualitative.
f. The length of the bypass or detour could take on values such as 1 mile, 4 miles, etc. Therefore, it is quantitative.
g. Route type can take on values "interstate," "U.S.," "state," "county," or "city," which are not numeric. Therefore, it is qualitative.

1.28 a. The variable of interest to the researchers is the rating of highway bridges.
b. Since the rating of a bridge can be categorized as one of three possible values, it is qualitative.
c. The data set analyzed is a population since all highway bridges in the U.S. were categorized.
d. The data were collected observationally. Each bridge was observed in its natural setting.

1.29 a. The process being studied is the distribution of pipes, valves, and fittings to the refining, chemical, and petrochemical industries by the Wallace Company of Houston.
b. The variables of interest are the speed of the deliveries, the accuracy of the invoices, and the quality of the packaging of the products.
c. The sampling plan was to monitor a subset of current customers by sending out a questionnaire twice a year and asking the customers to rate the speed of the deliveries, the accuracy of the invoices, and the quality of the packaging.
The sample is the total number of questionnaires received.
d. The Wallace Company's immediate interest is learning about the delivery process of its distribution of pipes, valves, and fittings. To do this, it is measuring the speed of deliveries, the accuracy of the invoices, and the quality of its packaging from the sample of its customers to make an inference about the delivery process to all customers. In particular, it might use the mean speed of its deliveries to the sampled customers to estimate the mean speed of its deliveries to all its customers. It might use the mean accuracy of its invoices from the sampled customers to estimate the mean accuracy of its invoices to all its customers. It might use the mean rating of the quality of its packaging from the sampled customers to estimate the mean rating of the quality of its packaging for all its customers.
e. Several factors might affect the reliability of the inferences. One factor is the set of customers selected to receive the survey. If this set is not representative of all the customers, the wrong inferences could be made. Also, the set of customers returning the surveys may not be representative of all its customers. Again, this could influence the reliability of the inferences made.

1.30 a. The population of interest would be the set of all students. The sample of interest would be the students participating in the experiment. The variable measured in this study is whether or not the student would spend money on repairing a very old car.
b. The data-collection method used was a designed experiment. The students participating in the experiment were randomly assigned to one of three emotional states and then asked a question.
c. The researcher could estimate the proportion of all students in each of the three emotional states who would spend money to repair a very old car.
d. One factor that might affect the reliability of the inference drawn is whether the students in the experiment were representative of all students. It is stated that the sample was made up of volunteer students. Chances are that these volunteer students were not representative of all students. In addition, if these students were all from the same school, they probably would not be representative of the population of students either.

1.31 a. The population of interest would be all accounting alumni of a large southwestern university.
b. Age would produce quantitative data; the responses would be numbers.
Gender would produce qualitative data; the responses would be "male" or "female."
Level of education would produce qualitative data; the responses could be categories such as college degree, master's degree, or PhD degree.
Income would produce quantitative data; the responses would be numbers.
Job satisfaction score would produce quantitative data. We would assume that a satisfaction score would be a number, where the higher the number, the higher the job satisfaction.
Machiavellian rating score would produce quantitative data. We would assume that a rating score would be a number, where the higher the score, the higher the Machiavellian traits.
c. The sample is the 198 people who returned the usable questionnaires.
d. The data collection method used was a survey.
e. The inference made by the researcher is that Machiavellian behavior is not required to achieve success in the accounting profession.
f. Generally, those who respond to surveys are those with strong feelings (in either direction) toward the subject matter. Those who do not have strong feelings for the subject matter tend not to answer surveys. Those who did not respond might be those who are neither very happy nor very unhappy with their jobs.
Thus, we might have no idea what type of scores these people would have on the Machiavellian rating score.

1.32 a. Give each stock in the NYSE-Composite Transactions table of the Wall Street Journal a number (1 to m). Using a random number table or a computer program, select n different numbers on the interval from 1 to m. The stocks with the same numbers as the n chosen numbers will be selected for the sample.

1.33 a. The experimental units for this study are engaged couples who used a particular website.
b. There are two variables of interest: the price of the engagement ring and the level of appreciation. Price of the engagement ring is a quantitative variable because it is measured on a numerical scale. Level of appreciation is a qualitative variable. There are 7 different categories for this variable that are then assigned numbers.
c. The population of interest would be all engaged couples.
d. No, the sample is probably not representative. Only engaged couples who used a particular website were eligible to be in the sample. Then, only those with "average" American names were invited to be in the sample.
e. Answers will vary. First, we will number the individuals from 1 to 50. Using MINITAB, 25 random numbers were generated on the interval from 1 to 50. The random numbers are:
1, 4, 5, 8, 12, 13, 17, 18, 19, 20, 22, 26, 27, 30, 31, 33, 34, 35, 38, 39, 40, 42, 43, 46, 49
The individuals who were assigned the numbers corresponding to the above numbers would be assigned to one role and the remaining individuals would be assigned to the other role.

1.34 Answers will vary. Using MINITAB, the 5 seven-digit phone numbers generated with area code 373 were:
373-639-0598
373-411-9164
373-502-7699
373-782-2719
373-930-3231

1.35 a. Some possible questions are:
1. In your opinion, why has the banking industry consolidated in the past few years? Check all that apply.
a. Too many small banks with not enough capital.
b. A result of the Savings and Loan scandals.
c. To eliminate duplicated resources in the upper management positions.
d. To provide more efficient service to the customers.
e. To provide a more complete list of financial opportunities for the customers.
f. Other. Please list.
2. Using a scale from 1 to 5, where 1 means strongly disagree and 5 means strongly agree, indicate your agreement with the following statement: "The trend of consolidation in the banking industry will continue in the next five years."
1 strongly disagree   2 disagree   3 no opinion   4 agree   5 strongly agree
b. The population of interest is the set of all bank presidents in the United States.
c. It would be extremely difficult and costly to obtain information from all bank presidents. Thus, it would be more efficient to sample just 200 bank presidents. However, by sending the questionnaires to only 200 bank presidents, one risks getting the results from a sample which is not representative of the population. The sample must be chosen in such a way that the results will be representative of the entire population of bank presidents in order to be of any use.

1.36 a. The process being studied is the process of filling beverage cans with soft drink at CCSB's Wakefield plant.
b. The variable of interest is the amount of carbon dioxide added to each can of beverage.
c. The sampling plan was to monitor five filled cans every 15 minutes. The sample is the total number of cans selected.
d. The company's immediate interest is learning about the process of filling beverage cans with soft drink at CCSB's Wakefield plant. To do this, they are measuring the amount of carbon dioxide added to a can of beverage to make an inference about the process of filling beverage cans. In particular, they might use the mean amount of carbon dioxide added to the sampled cans of beverage to estimate the mean amount of carbon dioxide added to all the cans on the process line.
e. The technician would then be dealing with a population. The cans of beverage have already been processed. He/she is now interested in the outputs.

1.37 a. The population of interest is the set of all people in the United States over 14 years of age.
b. The variable being measured is the employment status of each person. This variable is qualitative. Each person is either employed or not.
c. The problem of interest to the Census Bureau is inferential. Based on the information contained in the sample, the Census Bureau wants to estimate the percentage of all people in the labor force who are unemployed.

1.38 Suppose we want to select 900 intersections by numbering the intersections from 1 to 500,000. We would then use a random number table or a random number generator from a software program to select 900 distinct intersection points. These would then be the sampled markets.
Now, suppose we want to select the 900 intersections by selecting a row from the 500 and a column from the 1,000. We would first number the rows from 1 to 500 and number the columns from 1 to 1,000. Using a random number generator, we would generate a sample of 900 from the 500 rows. Obviously, many rows will be selected more than once. At the same time, we use a random number generator to select 900 columns from the 1,000 columns. Again, some of the columns could be selected more than once. Placing these two sets of random numbers side-by-side, we would use the row-column combinations to select the intersections. For example, suppose the first row selected was 453 and the first column selected was 731. The first intersection selected would be row 453, column 731. This process would be continued until 900 unique intersections were selected.

1.39 Answers will vary.
a. The results as stated indicate that by eating oat bran, one can improve his/her health.
However, the only way to get the stated benefit is to eat only oat bran, and even then the results are limited. People may change their eating habits expecting an outcome that is almost impossible.
b. To investigate the impact of domestic violence on birth defects, one would need to collect data on all kinds of birth defects and whether or not the mother suffered any domestic violence during her pregnancy. One could use an observational study, such as a survey, to collect the data.
c. Very few people are always happy with the way they are. However, many people are happy with themselves most of the time. One might want to ask a series of questions to measure self-esteem rather than just one. One question might ask what percent of the time the high school girl is happy with the way she is.
d. The results of the study are probably misleading, because relying on a limited number of foods to feed one's children does not imply that the children are hungry. In addition, one might cut the size of a meal because the children were overweight, not because there was not enough food. One might get better information about the proportion of hungry American children by actually recording what a large, representative sample of children eat in a week.
e. A leading question gives information that seems to be true, but may not be complete. Based on the incomplete information, the respondent may come to a different decision than if the information had not been provided.

Chapter 2
Methods for Describing Sets of Data

2.1 First, we find the frequency of grade A. The sum of the frequencies for all five grades must be 200. Therefore, subtract the sum of the frequencies of the other four grades from 200. The frequency for grade A is:
200 − (36 + 90 + 30 + 28) = 200 − 184 = 16
To find the relative frequency for each grade, divide the frequency by the total sample size, 200. The relative frequency for grade B is 36/200 = .18.
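As a quick check on the Exercise 2.1 arithmetic, the Python sketch below (not part of the original solution, which works by hand) recovers the grade-A frequency by subtraction and then divides every frequency by n = 200. The names `known`, `freq`, and `rel_freq` are illustrative only.

```python
# Frequencies for grades B, C, D, and F from Exercise 2.1; grade A is unknown.
known = {"B": 36, "C": 90, "D": 30, "F": 28}
n = 200  # total number of exam grades

# All five frequencies must sum to n, so grade A receives whatever is left over.
freq = {"A": n - sum(known.values()), **known}

# Relative frequency = class frequency / total sample size.
rel_freq = {grade: count / n for grade, count in freq.items()}

for grade, count in freq.items():
    print(f"{grade}: frequency {count:3d}, relative frequency {rel_freq[grade]:.2f}")
```

The relative frequencies printed here sum to 1, up to floating-point rounding, as a relative frequency distribution must.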
The rest of the relative frequencies are found in a similar manner and appear in the table:

Grade on Statistics Exam   Frequency   Relative Frequency
A: 90-100                      16            .08
B: 80-89                       36            .18
C: 65-79                       90            .45
D: 50-64                       30            .15
F: Below 50                    28            .14
Total                         200           1.00

2.2 a. To find the frequency for each class, count the number of times each letter occurs. The frequencies for the three classes are:

Class   Frequency
X           8
Y           9
Z           3
Total      20

b. The relative frequency for each class is found by dividing the frequency by the total sample size. The relative frequency for class X is 8/20 = .40. The relative frequency for class Y is 9/20 = .45. The relative frequency for class Z is 3/20 = .15.

Class   Frequency   Relative Frequency
X           8             .40
Y           9             .45
Z           3             .15
Total      20            1.00

c. The frequency bar chart is:
[Bar chart: frequency (vertical axis) by class X, Y, Z]

d. The pie chart for the frequency distribution is:
[Pie Chart of Class: X 40.0%, Y 45.0%, Z 15.0%]

2.3 a. The type of graph is a bar graph.
b. The variable measured for each of the robots is type of robotic limbs.
c. From the graph, the design used the most is the "legs only" design.
d. The relative frequencies are computed by dividing the frequencies by the total sample size. The total sample size is n = 106. The relative frequencies for each of the categories are:

Type of Limbs   Frequency   Relative Frequency
None               15        15/106 = .142
Both                8         8/106 = .075
Legs ONLY          63        63/106 = .594
Wheels ONLY        20        20/106 = .189
Total             106              1.000

e. Using MINITAB, the Pareto diagram is:
[Pareto diagram: relative frequency (vertical axis) by type of limbs, in the order Legs, Wheels, None, Both]

2.4 a. From the pie chart, 50.4% or .504 of the sampled adults living in the U.S. use the internet and pay to download music. From the data, 506 out of 1,003 adults, or 506/1,003 = .504, of sampled adults in the U.S. use the internet and pay to download music. These two results agree.
b. Using MINITAB, a pie chart of the data is:
[Pie Chart of Download-Music: Pay 67.0%, No Pay 33.0%]

2.5 Using MINITAB, the Pareto diagram for the data is:
[Chart of Tenants: percent (vertical axis) by tenant type (Small, Small Standard, Large, Major, Anchor)]
Most of the tenants in UK shopping malls are small or small standard. They account for approximately 84% of all tenants ([711 + 819]/1,821 = .84). Very few (less than 1%) of the tenants are anchors.

2.6 a. The relative frequency for each response category is found by dividing the frequency by the total sample size. The relative frequency for the category "Insurance Companies" is 869/2,119 = .410. The rest of the relative frequencies are found in a similar manner and are reported in the table.

Most responsible for rising health-care costs   Number responding   Relative Frequency
Insurance companies                                    869           869/2119 = .410
Pharmaceutical companies                               339           339/2119 = .160
Government                                             338           338/2119 = .160
Hospitals                                              127           127/2119 = .060
Physicians                                              85            85/2119 = .040
Other                                                  128           128/2119 = .060
Not at all sure                                        233           233/2119 = .110
TOTAL                                                2,119                     1.000

b. Using MINITAB, the relative frequency bar chart is:
[Chart of Category: bars for Insurance Co., Pharm, Government, Hospitals, Physicians, Other, Not sure]
c. Using MINITAB, the Pareto diagram is:
[Pareto diagram of Category: relative frequency (vertical axis) by response category, bars in decreasing order]
The largest group of American adults in the sample (41%) believe that insurance companies are the most responsible for the rising costs of health care. The next highest categories are Government and Pharmaceutical companies, with about 16% each. Only 4% of American adults in the sample believe physicians are the most responsible for the rising health-care costs.

2.7 a. Since the variable measured is manufacturer, the data type is qualitative.
b. Using MINITAB, a frequency bar chart for the data is:
[Bar chart of Number Shipped: frequency (vertical axis) by manufacturer (Bitel, CyberNet, EIntelligent, Fujian Landi, Glintt, KwangWoo, Omron, Pax Tech, Provenco, SZZT, Toshiba, Urmet)]
Using MINITAB, the Pareto diagram is:
[Pareto diagram of Number Shipped: frequency (vertical axis) by manufacturer, bars in decreasing order]
c. Most PIN pads shipped in 2007 were manufactured by either Fujian Landi or SZZT Electronics. These two manufacturers account for (119,000 + 67,300)/334,039 = 186,300/334,039 = .558 of all PIN pads shipped in 2007. Urmet shipped the fewest PIN pads among these 12 manufacturers.

2.8 Using MINITAB, the bar graphs of the 2 waves are:
[Chart of Job Status, one panel per Wave (1, 2): percent (vertical axis) by job status (WorkNoMBA, WorkMBA, NoWorkBusSch, NoWorkGradSch)]
In wave 1, most of those taking the GMAT were working (2657/3244 = .819) and none had MBAs. About 18% were not working but were in either a 4-year institution or other graduate school ([36 + 551]/3244 = .181). In wave 2, almost all were now working ([1787 + 1372]/3244 = .974). Of those working, more than half had MBAs (1787/[1787 + 1372] = .566). Of those not working, most were in another graduate school.
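The Exercise 2.8 proportions can be reproduced with the short Python sketch below. It is an illustrative check, not part of the original MINITAB solution; the variable names are hypothetical, and the only inputs are the counts quoted in the solution. Note that the last proportion is conditional: its denominator is the number working in wave 2, not all 3244 test takers.

```python
n = 3244  # GMAT takers tracked in both waves (Exercise 2.8)

# Wave 1 counts quoted in the solution
wave1_working = 2657
wave1_in_school = 36 + 551  # not working, but in a 4-year institution or other graduate school

# Wave 2 counts quoted in the solution
wave2_work_mba = 1787
wave2_work_no_mba = 1372
wave2_working = wave2_work_mba + wave2_work_no_mba

# Each proportion is a count divided by the relevant total.
p_wave1_working = wave1_working / n
p_wave1_in_school = wave1_in_school / n
p_wave2_working = wave2_working / n
p_mba_among_working = wave2_work_mba / wave2_working  # conditional on working in wave 2

print(f"Wave 1 working:                {p_wave1_working:.3f}")
print(f"Wave 1 not working, in school: {p_wave1_in_school:.3f}")
print(f"Wave 2 working:                {p_wave2_working:.3f}")
print(f"Wave 2 MBA among working:      {p_mba_among_working:.3f}")
```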
2.9 Using MINITAB, the pie chart is:
[Figure: MINITAB pie chart of Percent vs Blog/Forum; Company 38.5%, Employees 34.6%, Not Identified 15.4%, Third Party 11.5%]
Companies and employees together represent slightly more than 73% (38.5 + 34.6 = 73.1) of the entities creating blogs/forums. Third parties are the least common entity.

2.10 Using MINITAB, a bar chart of the data is:
[Figure: MINITAB bar chart, Chart of INDUSTRY (count of companies in each industry)]
Industries with the highest frequencies include Oil & Gas Operations, Retailing, Drugs & Biotechnology, and Health Care Equipment. Industries with the smallest frequencies include Business Services, Construction, Banking, and Consumer Durables.

2.11 a. Using MINITAB, a pie chart of the data is:
[Figure: MINITAB pie chart of PREVUSE; NEVER 71.2%, USED 28.8%]
From the chart, 71.2% (.712) of the sampled physicians have never used ethics consultation.
b. Using MINITAB, a pie chart of the data is:
[Figure: MINITAB pie chart of FUTUREUSE; YES 80.5%, NO 19.5%]
From the chart, 19.5% (.195) of the sampled physicians state that they will not use the services in the future.
c. Using MINITAB, the side-by-side pie charts are:
[Figure: MINITAB pie charts of PREVUSE; panel variable: SPEC; MED: NEVER 70.7%, USED 29.3%; SURG: NEVER 72.1%, USED 27.9%]
The proportion of medical practitioners who have never used ethics consultation is .707.
The proportion of surgical practitioners who have never used ethics consultation is .721. These two proportions are almost the same.
d. Using MINITAB, the side-by-side pie charts are:
[Figure: MINITAB pie charts of FUTUREUSE; panel variable: SPEC; MED: YES 82.7%, NO 17.3%; SURG: YES 76.7%, NO 23.3%]
The proportion of medical practitioners who will not use ethics consultation in the future is .173. The proportion of surgical practitioners who will not use ethics consultation in the future is .233. Thus, the proportion of surgical practitioners who will not use ethics consultation in the future is greater than that of the medical practitioners.

2.12 Using MINITAB, the side-by-side bar graphs are:
[Figure: MINITAB bar charts, Chart of Acquisitions (No, Yes); panel variable: Year (1980, 1990, 2000); percent within all data]
In 1980, very few firms had acquisitions (18/1,963 ≈ .009). By 1990, the proportion of firms having acquisitions had increased to 350/2,197 ≈ .159. By 2000, it had increased to 748/2,778 ≈ .269.

2.13 Using MINITAB, the side-by-side bar graphs are:
[Figure: MINITAB bar charts, Chart of Dive (Left, Middle, Right); panel variable: Situation (Tied, Ahead, Behind); percent within all data]
From the graphs, it appears that if the team is either tied or ahead, goalkeepers dive right or left about equally often, with very few diving in the middle. However, if the team is behind, the majority of goalkeepers dive right (71%).
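Each proportion in Exercise 2.12 is the number of firms with at least one acquisition divided by the number of firms in that year. A minimal sketch with the counts copied from the solution (the dictionary layout is only an illustrative choice):

```python
# (firms with at least one acquisition, total firms) for each year in Exercise 2.12
acquisitions = {1980: (18, 1963), 1990: (350, 2197), 2000: (748, 2778)}

proportions = {year: with_acq / total for year, (with_acq, total) in acquisitions.items()}

for year, p in proportions.items():
    print(f"{year}: {p:.3f}")
# 1980: 0.009
# 1990: 0.159
# 2000: 0.269
```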
2.14 Using MINITAB, a pie chart of the data is:
[Figure: MINITAB pie chart of Measure; Total visitors 26.7%, Funds Raised 23.3%, Big Shows 20.0%, Paying visitors 16.7%, Members 13.3%]
Since the slices are close to each other in size, it appears that the researcher is correct: there is a large amount of variation within the museum community with regard to performance measurement and evaluation.

2.15 a. The variable measured by Performark is the length of time it took each advertiser to respond.
b. The pie chart is:
[Figure: MINITAB pie chart of Response Time; 60-120 days 34.0%, 13-59 days 33.0%, Never responded 21.0%, > 120 days 12.0%]
c. Twenty-one percent (.21) of the advertisers never responded to the sales lead, that is, .21(17,000) = 3,570 advertisers.
d. The information from the pie chart does not indicate how effective the "bingo cards" are. It only indicates how long it takes advertisers to respond, if at all.

2.16 a. Using MINITAB, the side-by-side graphs are:
[Figure: MINITAB bar charts, Chart of Frequency vs Stars (1 to 5); panel variable: Criteria (Content, Exposure, Faculty, Opportunity)]
From these graphs, one can see that very few of the top 30 MBA programs received 5 stars on any criterion. In addition, about the same number of programs received 4 stars on each of the 4 criteria. The biggest difference in ratings among the 4 criteria was in the number of programs receiving 3 stars: more programs received 3 stars in Course Content than in any of the other criteria. Consequently, fewer programs received 2 stars in Course Content than in any of the other criteria.
b. Since this chart lists the rankings of only the top 30 MBA programs in the world, it is reasonable that none of these programs would be rated 1 star on any criterion.
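Exercise 2.15c turns a pie-chart percentage back into a count by multiplying by the total. Assuming, as that solution does, that the 17,000 total applies to every slice, the same step gives an approximate count for each response-time category in part b:

```python
# Pie-chart percentages from Exercise 2.15b; the 17,000 total is taken from part c
TOTAL_ADVERTISERS = 17_000
response_pct = {
    "13-59 days": 33.0,
    "60-120 days": 34.0,
    "> 120 days": 12.0,
    "Never responded": 21.0,
}

# count = total x (percent / 100), rounded to a whole number of advertisers
counts = {category: round(TOTAL_ADVERTISERS * pct / 100) for category, pct in response_pct.items()}
print(counts)
```

Because the slice percentages are themselves rounded, counts recovered this way are approximate in general.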
2.17 a. Using MINITAB, bar charts for the 3 variables are:
[Figure: MINITAB bar charts (counts): Chart of Well Class (Private, Public); Chart of Aquifer (Bedrock, Unconsolidated); Chart of Detection (Below Limit, Detect)]
b. Using MINITAB, the side-by-side bar chart is:
[Figure: MINITAB bar charts, Chart of Detection (Below Limit, Detect); panel variable: Well Class (Private, Public); percent within all data]
c. Using MINITAB, the side-by-side bar chart is:
[Figure: MINITAB bar charts, Chart of Detection (Below Limit, Detect); panel variable: Aquifer (Bedrock, Unconsolidated); percent within all data]
d. From the bar charts in parts a–c, one can infer that most of the wells are in bedrock aquifers and that most MTBE levels were below the limit (about 2/3). Also, the percentages of public and private wells are relatively close. Approximately 80% of private wells are not contaminated, while only about 60% of public wells are not contaminated. The percentage of contaminated wells is about the same for both types of aquifers (about 30%).

2.18 Using MINITAB, the relative frequency histogram is:
[Figure: MINITAB relative frequency histogram; classes from .5 to 16.5 in widths of 2]

2.19 To find the number of measurements in each measurement class, multiply the relative frequency by the total number of observations, n = 500.
The frequency table is:

Measurement class   Relative frequency   Frequency
.5–2.5                    .10            500(.10) = 50
2.5–4.5                   .15            500(.15) = 75
4.5–6.5                   .25            500(.25) = 125
6.5–8.5                   .20            500(.20) = 100
8.5–10.5                  .05            500(.05) = 25
10.5–12.5                 .10            500(.10) = 50
12.5–14.5                 .10            500(.10) = 50
14.5–16.5                 .05            500(.05) = 25
Total                    1.00            500

Using MINITAB, the frequency histogram is:
[Figure: MINITAB frequency histogram; classes from .5 to 16.5 in widths of 2]

2.20 a. The original data set has 1 + 3 + 5 + 7 + 4 + 3 = 23 observations.
b. For the bottom row of the stem-and-leaf display, the stem is 0 and the leaves are 0, 1, and 2. Assuming that the data are up to two digits, rounded off to the nearest whole number, the numbers in the original data set are 0, 1, and 2.
c. Again assuming that the data are up to two digits, rounded off to the nearest whole number, the dot plot corresponding to all the data points is:
[Figure: dot plot of the 23 data points]

2.21 a. This is a frequency histogram because the number of observations, rather than the relative frequency, is graphed for each interval.
b. There are 14 measurement classes.
c. There are 49 measurements in the data set.

2.22 a. The measurement class 10–20% has the highest proportion of respondents.
b. The approximate proportion of the 144 organizations that reported a percentage monetary loss from malicious insider actions of less than 20% is .30 + .38 = .68.
c. The approximate proportion of the 144 organizations that reported a percentage monetary loss from malicious insider actions greater than 60% is .07 + .03 + .04 + .05 = .19.
d. The approximate proportion of the 144 organizations that reported a percentage monetary loss from malicious insider actions between 20% and 30% is .11. Therefore, about .11(144) = 15.84, or 16, of the 144 organizations reported a percentage monetary loss from malicious insider actions between 20% and 30%.
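The frequency column in the Exercise 2.19 table is n times the relative frequency for each class. A short sketch of that conversion using the class limits and relative frequencies from the table; rounding to the nearest integer is an added assumption that only matters when n times the relative frequency is not already a whole number:

```python
# Class limits and relative frequencies from the Exercise 2.18/2.19 table
classes = [(0.5, 2.5), (2.5, 4.5), (4.5, 6.5), (6.5, 8.5),
           (8.5, 10.5), (10.5, 12.5), (12.5, 14.5), (14.5, 16.5)]
rel_freq = [0.10, 0.15, 0.25, 0.20, 0.05, 0.10, 0.10, 0.05]
n = 500

# frequency = n x relative frequency, rounded to absorb floating-point noise
frequencies = [round(n * rf) for rf in rel_freq]

for (lower, upper), rf, f in zip(classes, rel_freq, frequencies):
    print(f"{lower:>4}-{upper:<5} {rf:.2f}  {f:>4}")
print("Total", sum(frequencies))
```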
2.23 a. Since the label on the vertical axis is Percent, this is a relative frequency histogram with the relative frequencies expressed as percents. We can divide the percents by 100% to get the relative frequencies.
b. Summing the percents represented by all of the bars above 100, we get approximately 12%.

2.24 a. Using MINITAB, the stem-and-leaf display and histogram are:

Stem-and-Leaf Display: Score
Stem-and-leaf of Score  N = 186
Leaf Unit = 1.0
    1    6  9
    1    7
    2    7  3
    3    7  4
    3    7
    4    7  8
    4    8
    5    8  3
    7    8  44
   11    8  6667
   17    8  888999
   24    9  0001111
   41    9  22222222222333333
   62    9  444444555555555555555
  (37)   9  6666666666666667777777777777777777777
   87    9  888888888888888888888888888899999999999999999999999999999999
   27   10  000000000000000000000000000

[Figure: MINITAB histogram of Score]
b. From the stem-and-leaf display, there are only 7 observations with sanitation scores less than 86. The proportion of ships with accepted sanitation standards is (186 − 7)/186 = 179/186 = .962.
c. The score of 69 corresponds to stem 6 and leaf 9, the first row of the stem-and-leaf display.

2.25 a. Using MINITAB, a dot plot of the data is:
[Figure: MINITAB dot plot of Acquisitions]
b. By looking at the dot plot, one can conclude that the years 1996–2000 had the highest numbers of firms with at least one acquisition. The lowest number of acquisitions in that time frame (748) is almost 100 higher than the highest value from the remaining years.

2.26 a. Using MINITAB, a histogram of the current values of the 32 NFL teams is:
[Figure: MINITAB histogram of Value ($mil)]
b. Using MINITAB, a histogram of the 1-year change in current value for the 32 NFL teams is:
[Figure: MINITAB histogram of Chang1Yr (%)]
c. Using MINITAB, a histogram of the debt-to-value ratios for the 32 NFL teams is:
[Figure: MINITAB histogram of Debt/Value (%)]
d. Using MINITAB, a histogram of the annual revenues for the 32 NFL teams is:
[Figure: MINITAB histogram of Revenue ($mil)]
e. Using MINITAB, a histogram of the operating incomes for the 32 NFL teams is:
[Figure: MINITAB histogram of Income ($mil)]
f. For all of the histograms, there is 1 team with a very high value. The Dallas Cowboys have the largest values for current value, annual revenue, and operating income. However, the New York Giants have the highest 1-year change, while the New York Jets have the highest debt-to-value ratio. All of the graphs except the one showing the 1-year value changes are skewed to the right.

2.27 a. Using MINITAB, the frequency histograms for the 2011 and 2010 SAT mathematics scores are:
[Figure: MINITAB histograms of MATH2011, MATH2010]
It appears that the scores have changed very little; the two graphs are very similar.
b. Using MINITAB, the frequency histograms for the 2011 and 2001 SAT mathematics scores are:
[Figure: MINITAB histograms of MATH2011, MATH2001]
It appears that the scores have shifted to the right: the scores in 2011 appear to be somewhat better than the scores in 2001.
c. Using MINITAB, the frequency histogram of the differences is:
[Figure: MINITAB histogram of DiffMath]
From this graph of the differences, we can see that there are more observations to the right of 0 than to the left of 0. This indicates that, in general, the scores have improved since 2001.
d. From the graph, the largest improvement score is in the neighborhood of 32. The actual largest score is 32, and it is associated with Michigan.

2.28 Using MINITAB, the two dot plots are:
[Figure: MINITAB dot plots of Arrive, Depart]
Yes. Most of the numbers of items arriving at the work center per hour are in the 135 to 165 range, while most of the numbers of items departing the work center per hour are in the 110 to 140 range. Because more items arrive than depart, a bottleneck will probably develop at the work center.

2.29 Using MINITAB, the stem-and-leaf display is:

Stem-and-Leaf Display: Dioxide
Stem-and-leaf of Dioxide  N = 16
Leaf Unit = 0.10
    5   0  12234
    7   0  55
   (2)  1  34
    7   1
    7   2  44
    5   2
    5   3  3
    4   3
    4   4  0000

In the original display, the highlighted values correspond to water specimens that contain oil. There is a tendency for crude oil to be present in water with lower levels of dioxide: 6 of the 8 specimens with the lowest levels of dioxide contain oil.

2.30 Yes, we would agree with the statement that honey may be the preferable treatment for the cough and sleep difficulty associated with childhood upper respiratory tract infection. For those receiving the honey dosage, 14 of the 35 children (40%) had improvement scores of 12 or higher. For those receiving the DM dosage, only 8 of the 33 children (about 24%) had improvement scores of 12 or higher. For those receiving no dosage, only 2 of the 37 children (about 5%) had improvement scores of 12 or higher.
In addition, the median improvement score was 11 for those receiving the honey dosage, 9 for those receiving the DM dosage, and 7 for those receiving no dosage.

2.31 Using MINITAB, the relative frequency histograms of the years in practice for the two groups of doctors are:
[Figure: MINITAB histograms (percent) of YRSPRAC; panel variable: FUTUREUSE (NO, YES)]
The researchers hypothesized that older, more experienced physicians will be less likely to use ethics consultation in the future. From the histograms, approximately 38% of the doctors who said “no” have more than 20 years of experience, while only about 19% of the doctors who said “yes” have more than 20 years of experience. This supports the researchers’ assertion.

2.32 a. Using MINITAB, the stem-and-leaf display is as follows, where the stems are the units place and the leaves are the tenths place:

Stem-and-Leaf Display: Time
Stem-and-leaf of Time  N = 49
Leaf Unit = 0.10
  (26)   1  00001122222344444445555679
   23    2  11446799
   15    3  002899
    9    4  11125
    4    5  24
    2    6
    2    7  8
    1    8
    1    9
    1   10  1

b. A little more than half (26/49 = .53) of all companies spent less than 2 months in bankruptcy. Only two of the 49 companies spent more than 6 months in bankruptcy. It appears that, in general, the length of time in bankruptcy for firms using "prepacks" is less than that of firms not using prepacks.
c. A dot diagram will be used to compare the time in bankruptcy for the three types of "prepack" firms:
[Figure: MINITAB dot plot of Time vs Votes (Joint, None, Prepack)]
d. The highlighted times in part a correspond to companies that were reorganized through a leveraged buyout. There does not appear to be any pattern to these points; they appear to be scattered about evenly throughout the distribution of all times.
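The Exercise 2.29 display uses two lines per stem (leaves 0–4 on the first line, 5–9 on the second) with a leaf unit of 0.10. The sketch below rebuilds those rows from the 16 dioxide values listed in Exercise 2.48b. It is a simplified illustration, not MINITAB's algorithm: the helper name `split_stem_leaf` is hypothetical, it assumes nonnegative data, it omits the depth column, and it trims empty lines only before the first and after the last nonempty line.

```python
from collections import defaultdict


def split_stem_leaf(values, leaf_unit=0.1):
    """Return (stem, leaves) rows, two lines per stem: leaves 0-4, then leaves 5-9.

    Assumes nonnegative values; each value is rounded to a whole number of leaf units.
    """
    scaled = sorted(round(v / leaf_unit) for v in values)
    lines = defaultdict(str)
    for s in scaled:
        stem, leaf = divmod(s, 10)
        lines[2 * stem + (leaf >= 5)] += str(leaf)  # lower (0-4) or upper (5-9) line of the stem
    first, last = min(lines), max(lines)
    return [(idx // 2, lines.get(idx, "")) for idx in range(first, last + 1)]


# Dioxide levels as listed (in order) in Exercise 2.48b
dioxide = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.5, 1.3, 1.4,
           2.4, 2.4, 3.3, 4.0, 4.0, 4.0, 4.0]

for stem, leaves in split_stem_leaf(dioxide):
    print(f"{stem:>2} | {leaves}")
```

Rounding `v / leaf_unit` before splitting matters: in floating point, 0.3 / 0.1 is slightly below 3, so truncating instead of rounding would misplace that leaf.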
2.33 Using MINITAB, the histogram of the data is:
[Figure: MINITAB histogram of INTTIME]
This histogram looks very similar to the one shown in the problem. Thus, it appears that there was minimal or no collaboration or collusion from within the company, and we could conclude that the phishing attack against the organization was not an inside job.

2.34 Using MINITAB, the stem-and-leaf display for the data is:

Stem-and-Leaf Display: Time
Stem-and-leaf of Time  N = 25
Leaf Unit = 1.0
    3    3  239
    7    4  3499
   (7)   5  0011469
   11    6  34458
    6    7  13
    4    8  26
    2    9  5
    1   10  2

In the original display, the leaves in bold represent delivery times associated with customers who subsequently did not place additional orders with the firm. Since only 2 customers with delivery times of 68 days or longer placed additional orders, I would say the maximum tolerable delivery time is about 65 to 67 days. Everyone with a delivery time of less than 67 days placed additional orders.

2.35 Assume the data are a sample. The sample mean is:
$\bar{x} = \frac{\sum x}{n} = \frac{3.2 + 2.5 + 2.1 + 3.7 + 2.8 + 2.0}{6} = \frac{16.3}{6} = 2.717$
The median is the average of the middle two numbers when the data are arranged in order (since n = 6 is even). The data arranged in order are: 2.0, 2.1, 2.5, 2.8, 3.2, 3.7. The middle two numbers are 2.5 and 2.8. The median is:
$\frac{2.5 + 2.8}{2} = \frac{5.3}{2} = 2.65$

2.36 a. $\bar{x} = \frac{\sum x}{n} = \frac{85}{10} = 8.5$
b. $\bar{x} = \frac{\sum x}{n} = \frac{400}{16} = 25$
c. $\bar{x} = \frac{\sum x}{n} = \frac{35}{45} = .778$
d. $\bar{x} = \frac{\sum x}{n} = \frac{242}{18} = 13.44$

2.37 The mean and median of a symmetric data set are equal to each other. The mean is larger than the median when the data set is skewed to the right, and less than the median when the data set is skewed to the left. Thus, by comparing the mean and median, one can determine whether the data set is symmetric, skewed right, or skewed left.

2.38 The median is the middle number once the data have been arranged in order. If n is even, there is not a single middle number.
Thus, to compute the median, we take the average of the middle two numbers. If n is odd, there is a single middle number, and the median is this middle number.
A data set with five measurements arranged in order is 1, 3, 5, 6, 8. The median is the middle number, which is 5.
A data set with six measurements arranged in order is 1, 3, 5, 5, 6, 8. The median is the average of the middle two numbers, which is $\frac{5 + 5}{2} = \frac{10}{2} = 5$.

2.39 Assume the data are a sample. The mode is the observation that occurs most frequently. For this sample, the mode is 15, which occurs three times.
The sample mean is:
$\bar{x} = \frac{\sum x}{n} = \frac{18 + 10 + 15 + 13 + 17 + 15 + 12 + 15 + 18 + 16 + 11}{11} = \frac{160}{11} = 14.545$
The median is the middle number when the data are arranged in order. The data arranged in order are: 10, 11, 12, 13, 15, 15, 15, 16, 17, 18, 18. The middle number is the 6th number, which is 15.

2.40 a. $\bar{x} = \frac{\sum x}{n} = \frac{7 + \cdots + 4}{6} = \frac{15}{6} = 2.5$
Median = $\frac{3 + 3}{2} = 3$ (mean of the 3rd and 4th numbers, after ordering)
Mode = 3
b. $\bar{x} = \frac{\sum x}{n} = \frac{2 + \cdots + 4}{13} = \frac{40}{13} = 3.08$
Median = 3 (7th number, after ordering)
Mode = 3
c. $\bar{x} = \frac{\sum x}{n} = \frac{51 + \cdots + 37}{10} = \frac{496}{10} = 49.6$
Median = $\frac{48 + 50}{2} = 49$ (mean of the 5th and 6th numbers, after ordering)
Mode = 50

2.41 a. For a distribution that is skewed to the left, the mean is less than the median.
b. For a distribution that is skewed to the right, the mean is greater than the median.
c. For a symmetric distribution, the mean and median are equal.

2.42 a. The mean is
$\bar{x} = \frac{\sum x}{n} = \frac{9 + (-.1) + (-1.6) + 14.6 + 16.0 + 7.7 + 19.9 + 9.8 + 3.2 + 24.8 + 17.6 + 10.7 + 9.1}{13} = \frac{140.7}{13} = 10.82$
The average annualized percentage return on investment for the 13 randomly selected stock screeners is 10.82%.
b. Since the number of observations is odd, the median is the middle number once the data have been arranged in order. The data arranged in order are: −1.6, −.1, 3.2, 7.7, 9.0, 9.1, 9.8, 10.7, 14.6, 16.0, 17.6, 19.9, 24.8. The middle number is 9.8, which is the median. Half of the annualized percentage returns on investment are below 9.8% and half are above 9.8%.
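The median rule of Exercise 2.38 (middle value for odd n, average of the two middle values for even n) translates directly into code. The sketch below implements that rule by hand as a hypothetical helper `median` and cross-checks it, along with the mean and mode, against Python's standard `statistics` module on the data of Exercises 2.35, 2.39, and 2.42.

```python
import statistics


def median(values):
    """Median by the textbook rule: sort, then take the middle value (odd n)
    or the average of the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


ex_2_35 = [3.2, 2.5, 2.1, 3.7, 2.8, 2.0]                  # n = 6 (even)
ex_2_39 = [18, 10, 15, 13, 17, 15, 12, 15, 18, 16, 11]     # n = 11 (odd)
ex_2_42 = [9.0, -0.1, -1.6, 14.6, 16.0, 7.7, 19.9, 9.8, 3.2,
           24.8, 17.6, 10.7, 9.1]                          # n = 13 (odd)

for name, data in [("2.35", ex_2_35), ("2.39", ex_2_39), ("2.42", ex_2_42)]:
    assert median(data) == statistics.median(data)  # hand-rolled rule agrees with the library
    print(name, "mean =", round(statistics.mean(data), 3), "median =", median(data))

print("2.39 mode(s):", statistics.multimode(ex_2_39))
```

`statistics.multimode` returns every most-frequent value, which also covers data sets with several modes, such as the three modes reported for YRSPRAC in Exercise 2.52.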
2.43 a. The mean amount exported on the printout is 653. This means that the average amount of money per market from exporting sparkling wine was 653,000 dollars.
b. The median amount exported on the printout is 231. Since the median is the middle value, this means that half of the 30 sparkling wine export values were above 231,000 dollars and half were below 231,000 dollars.
c. The mean 3-year percentage change on the printout is 481. This means that over the last three years, the average change was 481%, which indicates a large increase.
d. The median 3-year percentage change on the printout is 156. Since the median is the middle value, this means that half, or 15, of the 30 countries’ 3-year percentage change values were above 156% and half, or 15, were below 156%.

2.44 a. The sample mean is:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1.72 + 2.50 + 2.16 + \cdots + 1.95}{20} = \frac{37.62}{20} = 1.881$
The sample average surface roughness of the 20 observations is 1.881.
b. The median is found as the average of the 10th and 11th observations, once the data have been ordered. The ordered data are:
1.06 1.09 1.19 1.26 1.27 1.40 1.51 1.72 1.95 2.03 2.05 2.13 2.13 2.16 2.24 2.31 2.41 2.50 2.57 2.64
The 10th and 11th observations are 2.03 and 2.05. The median is:
$\frac{2.03 + 2.05}{2} = \frac{4.08}{2} = 2.04$
The middle surface roughness measurement is 2.04: half of the sample measurements were less than 2.04 and half were greater than 2.04.
c. The data are somewhat skewed to the left. Thus, the median might be a better measure of central tendency than the mean. The few small values in the data tend to make the mean smaller than the median.

2.45 a. The mean is
$\bar{x} = \frac{\sum x}{n} = \frac{1{,}680{,}927 + 885{,}182 + 881{,}777 + \cdots + 563{,}967}{20} = \frac{15{,}192{,}021}{20} = 759{,}601.05$
The average research expenditure for the top 20 ranked universities is 759,601.05 thousand dollars.
b. Since the number of observations is even, the median is the average of the middle 2 numbers once the data have been arranged in order. Since the data are already arranged in order, the median is
$\frac{702{,}592 + 688{,}225}{2} = 695{,}408.5$
Half of the institutions have research expenditures less than 695,408.5 thousand dollars and half have research expenditures greater than 695,408.5 thousand dollars.
c. No, the mean from part a would not be a good measure of the center of the distribution for all American universities. The data in part a come from only the top 20 universities, which would not be representative of all American universities.

2.46 a. The mean is 67.755. The statement is accurate.
b. The median is 68.000. The statement is accurate.
c. The mode is 64. The statement is not accurate. A better statement would be: “The most common reported level of support for corporate sustainability for the 992 senior managers was 64.”
d. Since the mean and median are almost the same, the distribution of the 992 support levels should be fairly symmetric. The histogram in Exercise 2.23 is almost symmetric.

2.47 a. The median is the middle number (18th) once the data have been arranged in order because n = 35 is odd. The honey dosage data arranged in order are:
4,5,6,8,8,8,8,9,9,9,9,10,10,10,10,10,10,11,11,11,11,12,12,12,12,12,12,13,13,14,15,15,15,15,16
The 18th number is the median = 11.
b. The median is the middle number (17th) once the data have been arranged in order because n = 33 is odd. The DM dosage data arranged in order are:
3,4,4,4,4,4,4,6,6,6,7,7,7,7,7,8,9,9,9,9,9,10,10,10,11,12,12,12,12,12,13,13,15
The 17th number is the median = 9.
c. The median is the middle number (19th) once the data have been arranged in order because n = 37 is odd. The no dosage data arranged in order are:
0,1,1,1,3,3,4,4,5,5,5,6,6,6,6,7,7,7,7,7,7,7,7,8,8,8,8,8,8,9,9,9,9,10,11,12,12
The 19th number is the median = 7.
d. Since the median for the honey dosage is larger than the other two, it appears that the honey dosage leads to more improvement than the other two treatments.

2.48 a. The mean dioxide level is
$\bar{x} = \frac{\sum x}{n} = \frac{3.3 + 0.5 + 1.3 + \cdots + 4.0}{16} = \frac{29}{16} = 1.81$
The average dioxide amount is 1.81.
b. Since the number of observations is even, the median is the average of the middle 2 numbers once the data are arranged in order. The data arranged in order are:
0.1 0.2 0.2 0.3 0.4 0.5 0.5 1.3 1.4 2.4 2.4 3.3 4.0 4.0 4.0 4.0
The median is $\frac{1.3 + 1.4}{2} = \frac{2.7}{2} = 1.35$. Half of the dioxide levels are below 1.35 and half are above 1.35.
c. The mode is the number that occurs most often. For this data set the mode is 4.0; the most frequent level of dioxide is 4.0.
d. For the specimens without crude oil, the number of observations is even, so the median is the average of the middle 2 numbers once the data are arranged in order. The data arranged in order are:
0.1 0.3 1.4 2.4 2.4 3.3 4.0 4.0 4.0 4.0
The median is $\frac{2.4 + 3.3}{2} = \frac{5.7}{2} = 2.85$.
e. For the specimens with crude oil, the number of observations is also even, so the median is the average of the middle 2 numbers once the data are arranged in order. The data arranged in order are:
0.2 0.2 0.4 0.5 0.5 1.3
The median is $\frac{0.4 + 0.5}{2} = \frac{0.9}{2} = 0.45$.
f. The median level of dioxide when crude oil is present is 0.45. The median level of dioxide when crude oil is not present is 2.85. The level of dioxide appears to be much higher when crude oil is not present.

2.49 a. Skewed to the right. There will be a few people with very high salaries, such as the president and the football coach.
b. Skewed to the left. On an easy test, most students will have high scores, with only a few low scores.
c. Skewed to the right. On a difficult test, most students will have low scores, with only a few high scores.
d. Skewed to the right. Most students will spend a moderate amount of time studying, while a few students might study a long time.
e. Skewed to the left.
Most cars will be relatively new, with a few much older.
f. Skewed to the left. Most students will take the entire time to complete the exam, while a few might leave early.

2.50 a. The sample mean is:
$\bar{x} = \frac{\sum x}{n} = \frac{3.58 + 3.48 + 3.27 + \cdots + 1.17}{40} = \frac{77.07}{40} = 1.927$
The median is the average of the 20th and 21st observations, once the data have been ordered. The 20th and 21st observations are 1.75 and 1.76. The median is:
$\frac{1.75 + 1.76}{2} = \frac{3.51}{2} = 1.755$
The mode is the number that occurs most often; it is 1.4, which occurs 3 times.
b. The sample average driving performance index is 1.927. The median driving performance index is 1.755: half of all driving performance indexes are less than 1.755 and half are higher. The most common driving performance index value is 1.4.
c. Since the mean is larger than the median, the data are skewed to the right. Using MINITAB, a histogram of the driving performance index values is:
[Figure: MINITAB histogram of INDEX]

2.51 The mean is 141.31 hours. This means that the average number of semester hours per candidate for the CPA exam is 141.31 hours. The median is 140 hours. This means that 50% of the candidates had more than 140 semester hours of credit and 50% had less than 140 semester hours of credit. Since the mean and median are so close in value, the data are probably not skewed, but close to symmetric.

2.52 a. Using MINITAB, the output is:

Descriptive Statistics: YRSPRAC
Variable   N     N*   Mean     Minimum   Median   Maximum   Mode         N for Mode
YRSPRAC    112   6    14.598   1.000     14.000   40.000    14, 20, 25   9

The mean is 14.598: the average length of time in practice for this sample is 14.598 years. The median is 14: half of the physicians have been in practice less than 14 years and half have been in practice longer than 14 years. There are 3 modes: 14, 20, and 25. The most frequent years in practice are 14, 20, and 25 years.
b.
Using MINITAB, the results are:

Descriptive Statistics: YRSPRAC
Variable   FUTUREUSE   N    N*   Mean     Minimum   Median   Maximum   Mode     N for Mode
YRSPRAC    NO          21   2    16.43    1.00      18.00    35.00     25       5
           YES         91   4    14.176   1.000     14.000   40.000    14, 20   8

The mean for the physicians who would refuse to use ethics consultation in the future is 16.43: the average time in practice for these physicians is 16.43 years. The median is 18: half of the physicians who would refuse ethics consultation in the future have been in practice less than 18 years and half have been in practice more than 18 years. The mode is 25: the most frequent number of years in practice for these physicians is 25.
c. From the results in part b, the mean for the physicians who would use ethics consultation in the future is 14.176: the average time in practice for these physicians is 14.176 years. The median is 14: half of the physicians who would use ethics consultation in the future have been in practice less than 14 years and half have been in practice more than 14 years. There are 2 modes, 14 and 20: the most frequent numbers of years in practice for these physicians are 14 and 20.
d. The results in parts b and c support the researchers’ theory. The mean, median, and mode of years in practice are larger for the physicians who would refuse to use ethics consultation in the future than for those who would use it.

2.53 For the "Joint exchange offer with prepack" firms, the mean time is 2.6545 months, and the median is 1.5 months. Thus, the average time spent in bankruptcy for "Joint" firms is 2.6545 months, while half of the firms spend 1.5 months or less in bankruptcy.
For the "No prefiling vote held" firms, the mean time is 4.2364 months, and the median is 3.2 months. Thus, the average time spent in bankruptcy for "No prefiling vote held" firms is 4.2364 months, while half of the firms spend 3.2 months or less in bankruptcy.
For the "Prepack solicitation only" firms, the mean time is 1.8185 months, and the median is 1.4 months. Thus, the average time spent in bankruptcy for "Prepack solicitation only" firms is 1.8185 months, while half of the firms spend 1.4 months or less in bankruptcy.
Since the means and medians for the three groups of firms differ quite a bit, it would be unreasonable to use a single number to locate the center of the time in bankruptcy. Three different "centers" should be used.

2.54 a. The sample mean is:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{5 + 2 + 4 + \cdots + 3}{20} = \frac{78}{20} = 3.90$
The sample median is found by averaging the 10th and 11th observations once the data are arranged in order. The data arranged in order are:
1 1 1 1 1 2 2 3 3 3 4 4 4 5 5 5 6 7 9 11
The 10th and 11th observations are 3 and 4. The average of these two numbers (the median) is:
$\text{median} = \frac{3 + 4}{2} = \frac{7}{2} = 3.5$
The mode is the observation appearing most often. For this data set, the mode is 1, which appears 5 times.
b. Eliminating the largest number, which is 11, results in the following.
The sample mean is:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{5 + 2 + 4 + \cdots + 3}{19} = \frac{67}{19} = 3.53$
The sample median is the middle observation once the data are arranged in order. The data arranged in order are:
1 1 1 1 1 2 2 3 3 3 4 4 4 5 5 5 6 7 9
The 10th observation is 3, so the median is 3.
The mode is the observation appearing most often. For this data set, the mode is 1, which appears 5 times.
By dropping the largest number, the mean is reduced from 3.90 to 3.53 and the median is reduced from 3.5 to 3. There is no effect on the mode.
c. The data arranged in order are:
1 1 1 1 1 2 2 3 3 3 4 4 4 5 5 5 6 7 9 11
If we drop the lowest 2 and the largest 2 observations (10% of the 20 observations from each end), we are left with:
1 1 1 2 2 3 3 3 4 4 4 5 5 5 6 7
The sample 10% trimmed mean is:
7 56 3.5 16 16 The advantage of the trimmed mean over the regular mean is that very large and very small numbers that could greatly affect the mean have been eliminated. 2.55 2.56 a. Due to the "elite" superstars, the salary distribution is skewed to the right. Since this implies that the median is less than the mean, the players' association would want to use the median. b. The owners, by the logic of part a, would want to use the mean. a. The primary disadvantage of using the range to compare variability of data sets is that the two data sets can have the same range and be vastly different with respect to data variation. Also, the range is greatly affected by extreme measures. Copyright © 2014 Pearson Education, Inc. 42 Chapter 2 b. The sample variance is the sum of the squared deviations of the observations from the sample mean divided by the sample size minus 1. The population variance is the sum of the squared deviations of the values from the population mean divided by the population size. c. The variance of a data set can never be negative. The variance of a sample is the sum of the squared deviations from the mean divided by n 1. The square of any number, positive or negative, is always positive. Thus, the variance will be positive. The variance is usually greater than the standard deviation. However, it is possible for the variance to be smaller than the standard deviation. If the data are between 0 and 1, the variance will be smaller than the standard deviation. For example, suppose the data set is .8, .7, .9, .5, and .3. The sample mean is: x x .8 .7 .9 .5 .3 3.2 .64 .5 n 5 The sample variance is: s 2 x x 2 2 n 1 n 3.22 13 .232 .058 5 1 4 2.28 The standard deviation is s .058 .241 2.57 a. s2 b. a. b. n x s 2.3 1.52 x2 n 1 2 n x x 17 2 7 3.619 7 1 s 3.619 1.9 302 10 7.111 10 1 s 7.111 2.67 63 2 2 n 1 n x 154 Range = 1 (3) = 4 s2 2.58 n 1 82 5 2.3 5 1 22 Range = 8 (2) = 10 s2 d. 2 2 Range = 6 0 = 6 s2 c. 
x x Range = 4 0 = 4 s2 s2 x2 n 1 n x x 2 2 n 1 2 n x x 202 10 4.8889 10 1 84 2 2 n 1 n (6.8) 2 17 1.395 17 1 25.04 1002 40 3.3333 40 1 380 s 1.395 1.18 s 4.8889 2.211 s 3.3333 1.826 Copyright © 2014 Pearson Education, Inc. Methods for Describing Sets of Data c. 2.59 a. s2 2 n 1 17 2 20 .1868 20 1 18 n x 3 1 10 10 4 28 x s2 b. x x 2 x 28 5.6 s2 s .1868 .432 x 3 1 10 10 4 226 2 2 2 2 2 2 5 n x x 2 2 n 1 n 282 5 69.2 17.3 5 1 4 226 x 8 10 32 5 55 x 43 s 17.3 4.1593 x 8 10 32 5 1213 2 x 55 13.75 feet 2 2 2 2 4 n x2 x n 1 2 n 552 4 456.75 152.25 square feet 4 1 3 1213 s 152.25 12.339 feet c. x 1 (4) (3) 1 (4) (4) 15 x (1) (4) (3) 1 (4) (4) 59 2 x s2 d. s2 2 2 2 2 2 x 15 2.5 6 n x x 2 2 n 1 x x 2 n (15) 2 6 21.5 4.3 6 1 5 59 1 1 1 2 1 4 10 2 5 5 5 5 5 5 5 x 2 1 .33 ounce n 6 24 1 1 1 2 1 4 x 2 5 5 5 5 5 5 25 .96 2 2 2 2 2 2 3 x x 2 n 1 s 4.3 2.0736 n 2 24 22 .2933 25 6 .0587 square ounce 6 1 5 s .0587 .2422 ounce Copyright © 2014 Pearson Education, Inc. 44 Chapter 2 2.60 a. s2 b. n 1 n x 1992 5 3.7 5 1 s 3.7 1.92 3032 9 1,949.25 9 1 s 1,949.25 44.15 2952 8 1,307.84 8 1 s 1,307.84 36.16 7935 x2 n 1 2 n x x 25, 795 Range = 100 2 = 98 s2 2.61 2 2 Range = 100 1 = 99 s2 c. x x Range = 42 37 = 5 2 2 n 1 n 20, 033 This is one possibility for the two data sets. Data Set 1: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 Data Set 2: 0, 0, 1, 1, 2, 2, 3, 3, 9, 9 The two sets of data above have the same range = largest measurement smallest measurement = 9 0 = 9. The means for the two data sets are: x1 x2 x 0 1 2 3 4 5 6 7 8 9 45 4.5 n 10 10 n 10 10 x 0 0 1 1 2 2 3 3 9 9 30 3 The dot diagrams for the two data sets are shown below. Dotplot of x1, x2 x1 0 2 4 x 6 8 6 8 x2 0 2 x 4 Copyright © 2014 Pearson Education, Inc. Methods for Describing Sets of Data 2.62 This is one possibility for the two data sets. 
Data Set 1: 1, 1, 2, 2, 3, 3, 4, 4, 5, 5
Data Set 2: 1, 1, 1, 1, 1, 5, 5, 5, 5, 5
\( \bar{x}_1 = \frac{\sum x}{n} = \frac{1 + 1 + 2 + 2 + 3 + 3 + 4 + 4 + 5 + 5}{10} = \frac{30}{10} = 3 \)
\( \bar{x}_2 = \frac{\sum x}{n} = \frac{1 + 1 + 1 + 1 + 1 + 5 + 5 + 5 + 5 + 5}{10} = \frac{30}{10} = 3 \)
Therefore, the two data sets have the same mean. The variances for the two data sets are:
\( s_1^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{110 - \frac{30^2}{10}}{10-1} = \frac{20}{9} = 2.2222 \)
\( s_2^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{130 - \frac{30^2}{10}}{10-1} = \frac{40}{9} = 4.4444 \)
The dot diagrams for the two data sets are shown below.
[Figure: Dotplot of x1, x2]

2.63 a. Range = 3 − 0 = 3
\( s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{15 - \frac{7^2}{5}}{5-1} = 1.3 \), \( s = \sqrt{1.3} = 1.14 \)
b. After adding 3 to each of the data points, Range = 6 − 3 = 3
\( s^2 = \frac{102 - \frac{22^2}{5}}{5-1} = 1.3 \), \( s = \sqrt{1.3} = 1.14 \)
c. After subtracting 4 from each of the data points, Range = −1 − (−4) = 3
\( s^2 = \frac{39 - \frac{(-13)^2}{5}}{5-1} = 1.3 \), \( s = \sqrt{1.3} = 1.14 \)
d. The range, variance, and standard deviation remain the same when any number is added to or subtracted from each measurement in the data set.

2.64 a. The range is the difference between the maximum and minimum values: Range = 24.8 − (−1.6) = 26.4. The units of measurement are percents.
b. The variance is:
\( s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{2236.41 - \frac{140.7^2}{13}}{13-1} = \frac{2236.41 - 1522.8069}{12} = \frac{713.6031}{12} = 59.4669 \)
The units are square percents.
c. The standard deviation is \( s = \sqrt{59.4669} = 7.7115 \). The units are percents.

2.65 a. The range is the difference between the largest observation and the smallest observation. From the printout, the largest observation is $4,852 thousand and the smallest observation is $70 thousand. The range is: R = $4,852 − $70 = $4,782 thousand.
b. From the printout, the standard deviation is s = $1,113 thousand.
c. The variance is the standard deviation squared: \( s^2 = 1{,}113^2 = 1{,}238{,}769 \) squared thousands of dollars.

2.66 a. The sample variance of the honey dosage group is:
\( s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{4295 - \frac{375^2}{35}}{35-1} = \frac{277.142857}{34} = 8.1512605 \)
The standard deviation is: \( s = \sqrt{8.1512605} = 2.855 \)
b. The sample variance of the DM dosage group is:
\( s^2 = \frac{2631 - \frac{275^2}{33}}{33-1} = \frac{339.33333}{32} = 10.604167 \)
The standard deviation is: \( s = \sqrt{10.604167} = 3.256 \)
c. The sample variance of the control group is:
\( s^2 = \frac{1881 - \frac{241^2}{37}}{37-1} = \frac{311.243243}{36} = 8.6456456 \)
The standard deviation is: \( s = \sqrt{8.6456456} = 2.940 \)
d. The group with the most variability is the group with the largest standard deviation, which is the DM group. The group with the least variability is the group with the smallest standard deviation, which is the honey group.

2.67 a. The range is 155. The statement is accurate.
b. The variance is 722.036. The statement is not accurate. A more accurate statement would be: "The variance of the levels of support for corporate sustainability for the 992 senior managers is 722.036."
c. The standard deviation is 26.871. If the units of measure for the two distributions are the same, then the distribution of support levels for the 992 senior managers has less variation than a distribution with a standard deviation of 50. If the units of measure for the second distribution are not known, then we cannot compare the variation in the two distributions by looking at the standard deviations alone.
d. The standard deviation best describes the variation in the distribution. The range can be greatly affected by extreme measures, and the variance is measured in square units, which are hard to interpret. Thus, the standard deviation is the best measure to describe the variation.

2.68 a. Using MINITAB, the results are:

Descriptive Statistics: YRSPRAC

    Variable    N  N*    Mean  StDev  Variance   Range
    YRSPRAC   112   6  14.598  9.161    83.918  39.000

The range is 39: the difference between the largest and smallest numbers of years in practice is 39 years. The variance is 83.918 square years. The standard deviation is 9.161 years.
b. Using MINITAB, the results are:

Descriptive Statistics: YRSPRAC

    Variable  FUTUREUSE   N  N*    Mean  StDev  Variance   Range
    YRSPRAC   NO         21   2   16.43  10.05    100.96   34.00
              YES        91   4  14.176  8.950    80.102  39.000

For the physicians who would refuse to use ethics consultation in the future, the standard deviation is 10.05 years.
c. For the physicians who would use ethics consultation in the future, the standard deviation is 8.95 years.
d. The variation in the length of time in practice for the physicians who would refuse to use ethics consultation in the future is greater than that for the physicians who would use ethics consultation in the future.

2.69 a. The range is the largest observation minus the smallest observation, or 11 − 1 = 10.
The variance is: \( s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1} = \frac{450 - \frac{78^2}{20}}{20-1} = 7.6737 \)
The standard deviation is: \( s = \sqrt{s^2} = \sqrt{7.6737} = 2.77 \)
b. The largest observation, 11, is deleted from the data set. The new range is 9 − 1 = 8.
The variance is: \( s^2 = \frac{329 - \frac{67^2}{19}}{19-1} = 5.1520 \)
The standard deviation is: \( s = \sqrt{5.1520} = 2.27 \)
When the largest observation is deleted, the range, variance, and standard deviation decrease.
c. The largest observation is 11 and the smallest is 1. When these two observations are deleted from the data set, the new range is 9 − 1 = 8.
The variance is: \( s^2 = \frac{328 - \frac{66^2}{18}}{18-1} = 5.0588 \)
The standard deviation is: \( s = \sqrt{5.0588} = 2.25 \)
When the largest and smallest observations are deleted, the range, variance, and standard deviation decrease.

2.70 a. A worker's overall time to complete the operation under study is determined by adding the subtask-time averages.
Worker A
The average for subtask 1 is: \( \bar{x} = \frac{\sum x}{n} = \frac{211}{7} = 30.14 \)
The average for subtask 2 is: \( \bar{x} = \frac{\sum x}{n} = \frac{21}{7} = 3 \)
Worker A's overall time is 30.14 + 3 = 33.14.
Worker B
The average for subtask 1 is: \( \bar{x} = \frac{\sum x}{n} = \frac{213}{7} = 30.43 \)
The average for subtask 2 is: \( \bar{x} = \frac{\sum x}{n} = \frac{29}{7} = 4.14 \)
Worker B's overall time is 30.43 + 4.14 = 34.57.

b. Worker A: \( s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}} = \sqrt{\frac{6455 - \frac{211^2}{7}}{7-1}} = \sqrt{15.8095} = 3.98 \)
Worker B: \( s = \sqrt{\frac{6487 - \frac{213^2}{7}}{7-1}} = \sqrt{.9524} = .98 \)

c. The standard deviations represent the amount of variability in the time it takes the worker to complete subtask 1.

d. Worker A: \( s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}} = \sqrt{\frac{67 - \frac{21^2}{7}}{7-1}} = \sqrt{.6667} = .82 \)
Worker B: \( s = \sqrt{\frac{147 - \frac{29^2}{7}}{7-1}} = \sqrt{4.4762} = 2.12 \)

e. I would choose workers similar to Worker B to perform subtask 1. Worker B has a slightly higher average time on subtask 1 (A: \( \bar{x} = 30.14 \), B: \( \bar{x} = 30.43 \)). However, Worker B has a smaller variability in the time it takes to complete subtask 1 (part b), so he or she is more consistent in the time needed to complete the task.
I would choose workers similar to Worker A to perform subtask 2. Worker A has a smaller average time on subtask 2 (A: \( \bar{x} = 3 \), B: \( \bar{x} = 4.14 \)). Worker A also has a smaller variability in the time needed to complete subtask 2 (part d).

2.71 a. The unit of measurement of the variable of interest is dollars (the same as the mean and standard deviation). Based on this, the data are quantitative.
b. Since no information is given about the shape of the data set, we can only use Chebyshev's Rule.
$900 is 2 standard deviations below the mean, and $2100 is 2 standard deviations above the mean. Using Chebyshev's Rule, at least 3/4 of the measurements (or 3/4 × 200 = 150 measurements) will fall between $900 and $2100.
$600 is 3 standard deviations below the mean and $2400 is 3 standard deviations above the mean. Using Chebyshev's Rule, at least 8/9 of the measurements (or 8/9 × 200 ≈ 178 measurements) will fall between $600 and $2400.
$1200 is 1 standard deviation below the mean and $1800 is 1 standard deviation above the mean.
Using Chebyshev's Rule, nothing can be said about the number of measurements that will fall between $1200 and $1800.
$1500 is equal to the mean and $2100 is 2 standard deviations above the mean. Using Chebyshev's Rule, at least 3/4 of the measurements (or 3/4 × 200 = 150 measurements) will fall between $900 and $2100. It is possible that all of these 150 measurements will be between $900 and $1500. Thus, nothing can be said about the number of measurements between $1500 and $2100.

2.72 Since no information is given about the data set, we can only use Chebyshev's Rule.
a. Nothing can be said about the percentage of measurements which will fall between \( \bar{x} - s \) and \( \bar{x} + s \).
b. At least 3/4 or 75% of the measurements will fall between \( \bar{x} - 2s \) and \( \bar{x} + 2s \).
c. At least 8/9 or 89% of the measurements will fall between \( \bar{x} - 3s \) and \( \bar{x} + 3s \).

2.73 According to the Empirical Rule:
a. Approximately 68% of the measurements will be contained in the interval \( \bar{x} - s \) to \( \bar{x} + s \).
b. Approximately 95% of the measurements will be contained in the interval \( \bar{x} - 2s \) to \( \bar{x} + 2s \).
c. Essentially all the measurements will be contained in the interval \( \bar{x} - 3s \) to \( \bar{x} + 3s \).

2.74 a. \( \bar{x} = \frac{\sum x}{n} = \frac{206}{25} = 8.24 \)
\( s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{1778 - \frac{206^2}{25}}{25-1} = 3.357 \), \( s = \sqrt{3.357} = 1.83 \)
b.
    Interval                              Number of Measurements in Interval   Percentage
    \( \bar{x} \pm s \), or (6.41, 10.07)    18                                   18/25 = .72 or 72%
    \( \bar{x} \pm 2s \), or (4.58, 11.90)   24                                   24/25 = .96 or 96%
    \( \bar{x} \pm 3s \), or (2.75, 13.73)   25                                   25/25 = 1.00 or 100%
c. The percentages in part b are in agreement with Chebyshev's Rule and agree fairly well with the percentages given by the Empirical Rule.
d. Range = 12 − 5 = 7 and \( s \approx \frac{\text{Range}}{4} = \frac{7}{4} = 1.75 \). The range approximation provides a satisfactory estimate of s = 1.83 from part a.

2.75 Using Chebyshev's Rule, at least 8/9 of the measurements will fall within 3 standard deviations of the mean. Thus, the range of the data would be around 6 standard deviations.
Using the Empirical Rule, approximately 95% of the observations are within 2 standard deviations of the mean. Thus, the range of the data would be around 4 standard deviations. We would expect the standard deviation to be somewhere between Range/6 and Range/4.
For our data, the range = 760 − 135 = 625. Then \( \frac{\text{Range}}{6} = \frac{625}{6} = 104.17 \) and \( \frac{\text{Range}}{4} = \frac{625}{4} = 156.25 \).
Therefore, I would estimate that the standard deviation of the data set is between 104.17 and 156.25.
It would not be feasible to have a standard deviation of 25. If the standard deviation were 25, the data would span 625/25 = 25 standard deviations. This would be extremely unlikely.

2.76 a. Using MINITAB, the histogram of the data is:
[Figure: Histogram of Wheels (horizontal axis: Wheels, 1 to 8; vertical axis: Frequency)]
Since the distribution is skewed to the right, it is not mound-shaped and it is not symmetric.
b. Using MINITAB, the results are:

Descriptive Statistics: Wheels

    Variable   N   Mean  StDev  Minimum     Q1  Median     Q3  Maximum
    Wheels    28  3.214  1.371    1.000  2.000   3.000  4.000    8.000

The mean is 3.214 and the standard deviation is 1.371.
c. The interval is: \( \bar{x} \pm 2s \Rightarrow 3.214 \pm 2(1.371) \Rightarrow 3.214 \pm 2.742 \Rightarrow (0.472, 5.956) \).
d. According to Chebyshev's Rule, at least 75% of the observations will fall within 2 standard deviations of the mean.
e. According to the Empirical Rule, approximately 95% of the observations will fall within 2 standard deviations of the mean.
f. Actually, 26 of the 28, or 26/28 = .929, of the observations fall within the interval. This value is close to the 95% that we would expect with the Empirical Rule.

2.77 a. The interval \( \bar{x} \pm 2s \) will contain at least 75% of the observations. This interval is \( \bar{x} \pm 2s \Rightarrow 3.11 \pm 2(.66) \Rightarrow 3.11 \pm 1.32 \Rightarrow (1.79, 4.43) \).
b. No. The value 1.25 does not fall in the interval \( \bar{x} \pm 2s \). We know that at least 75% of all observations will fall within 2 standard deviations of the mean. Since 1.25 falls more than 2 standard deviations from the mean, it would not be a likely value to observe.

2.78 a. Using Chebyshev's Rule, at least 75% of the observations will fall within 2 standard deviations of the mean: \( \bar{x} \pm 2s \Rightarrow 4.25 \pm 2(12.02) \Rightarrow 4.25 \pm 24.04 \Rightarrow (-19.79, 28.29) \), or (0, 28.29) since we cannot have a negative number of blogs.
b. We would expect the distribution to be skewed to the right. We know that we cannot have a negative number of blogs/forums, yet even 1 standard deviation below the mean is a negative number. We would assume that there are a few very large observations because the standard deviation is so big compared to the mean.

2.79 a. The 2 standard deviation interval around the mean is: \( \bar{x} \pm 2s \Rightarrow 141.31 \pm 2(17.77) \Rightarrow 141.31 \pm 35.54 \Rightarrow (105.77, 176.85) \)
b. Using Chebyshev's Theorem, at least ¾ of the observations will fall within 2 standard deviations of the mean. Thus, at least ¾ of first-time candidates for the CPA exam have total credit hours between 105.77 and 176.85.
c. In order for the above statement to be true, nothing needs to be known about the shape of the distribution of total semester hours.

2.80 a. Since the data are mound-shaped and symmetric, we know from the Empirical Rule that approximately 95% of the observations will fall within 2 standard deviations of the mean. This interval is: \( \bar{x} \pm 2s \Rightarrow 39 \pm 2(6) \Rightarrow 39 \pm 12 \Rightarrow (27, 51) \).
b. We know that approximately .05 of the observations will fall outside the range 27 to 51. Since the distribution of scores is symmetric, half of the .05, or .025, will fall above 51.
c. We know from the Empirical Rule that approximately 99.7% (essentially all) of the observations will fall within 3 standard deviations of the mean. This interval is: \( \bar{x} \pm 3s \Rightarrow 39 \pm 3(6) \Rightarrow 39 \pm 18 \Rightarrow (21, 57) \).

2.81 a. The sample mean is: \( \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{17{,}800}{186} = 95.699 \)
The sample variance is:
\( s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1} = \frac{1{,}707{,}998 - \frac{17{,}800^2}{186}}{186-1} = 24.6332 \)
The standard deviation is: \( s = \sqrt{s^2} = \sqrt{24.6332} = 4.9632 \)
b. \( \bar{x} \pm s \Rightarrow 95.699 \pm 4.963 \Rightarrow (90.736, 100.662) \)
\( \bar{x} \pm 2s \Rightarrow 95.699 \pm 2(4.963) \Rightarrow 95.699 \pm 9.926 \Rightarrow (85.773, 105.625) \)
\( \bar{x} \pm 3s \Rightarrow 95.699 \pm 3(4.963) \Rightarrow 95.699 \pm 14.889 \Rightarrow (80.810, 110.588) \)
c. There are 166 out of 186 observations in the first interval. This is (166/186) × 100% = 89.2%. There are 179 out of 186 observations in the second interval. This is (179/186) × 100% = 96.2%. There are 182 out of 186 observations in the third interval. This is (182/186) × 100% = 97.8%.
The percentage for the first interval is much larger than we would expect using the Empirical Rule, the percentage for the second interval is also somewhat larger, and the percentage for the third interval falls short of the 99.7% the Empirical Rule predicts. The Empirical Rule indicates that approximately 68% of the observations will fall within 1 standard deviation of the mean and approximately 95% will fall within 2 standard deviations of the mean. Chebyshev's Theorem says that at least ¾ or 75% of the observations will fall within 2 standard deviations of the mean and at least 8/9 or 88.9% of the observations will fall within 3 standard deviations of the mean. It appears that our observed percentages agree with Chebyshev's Theorem better than with the Empirical Rule.

2.82 a. The interval is: \( \bar{x} \pm 2s \Rightarrow 13.2 \pm 2(19.5) \Rightarrow 13.2 \pm 39 \Rightarrow (-25.8, 52.2) \), or (0, 52.2) since we cannot have a negative number of minutes.
b. Since this interval contains negative numbers, we know that the distribution cannot be symmetric: one cannot have negative values for time spent on a laptop computer.
c. Since we know the data are not symmetric, we must use Chebyshev's Rule. At least ¾ or 75% of the observations will fall between −25.8 and 52.2, that is, between 0 and 52.2 minutes.

2.83 The sample mean is:
\( \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{240.9 + 248.8 + 215.7 + \cdots + 238.0}{10} = \frac{2347.4}{10} = 234.74 \)
The sample variance is:
\( s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1} = \frac{551{,}912.1 - \frac{2347.4^2}{10}}{10-1} = \frac{883.424}{9} = 98.1582 \)
The sample standard deviation is: \( s = \sqrt{98.1582} = 9.91 \)
The data are fairly symmetric, so we can use the Empirical Rule. We know from the Empirical Rule that almost all of the observations will fall within 3 standard deviations of the mean. This interval is:
\( \bar{x} \pm 3s \Rightarrow 234.74 \pm 3(9.91) \Rightarrow 234.74 \pm 29.73 \Rightarrow (205.01, 264.47) \)

2.84 a. Using MINITAB, the frequency histogram for the time in bankruptcy is:
[Figure: Histogram of TIME (horizontal axis: Time in Bankruptcy; vertical axis: Frequency)]
The Empirical Rule is not applicable because the data are not mound-shaped.
b. Using MINITAB, the descriptive measures are:

Descriptive Statistics: TIME

    Variable   N   Mean  StDev  Minimum     Q1  Median     Q3  Maximum
    TIME      49  2.549  1.828    1.000  1.350   1.700  3.500   10.100

From Chebyshev's Theorem, we know that at least 75% of the observations will fall within 2 standard deviations of the mean. This interval is: \( \bar{x} \pm 2s \Rightarrow 2.549 \pm 2(1.828) \Rightarrow 2.549 \pm 3.656 \Rightarrow (-1.107, 6.205) \), or (0, 6.205) since we cannot have negative months.
c. There are 47 of the 49 observations within this interval. The percentage is (47/49) × 100% = 95.9%. This agrees with Chebyshev's Theorem (at least 75%). It also agrees with the Empirical Rule (approximately 95%).
d. From the above interval we know that about 95% of all firms filing for prepackaged bankruptcy will be in bankruptcy between 0 and 6.2 months. Thus, we would estimate that a firm considering filing for bankruptcy will be in bankruptcy up to 6.2 months.

2.85 a. The interval \( \bar{x} \pm 3s \) for the flexed arm group is \( \bar{x} \pm 3s \Rightarrow 59 \pm 3(4) \Rightarrow 59 \pm 12 \Rightarrow (47, 71) \). The interval for the extended arm group is \( \bar{x} \pm 3s \Rightarrow 43 \pm 3(2) \Rightarrow 43 \pm 6 \Rightarrow (37, 49) \). We know from Chebyshev's Rule that at least 8/9 or 88.9% of the observations will fall within 3 standard deviations of the mean. Since these 2 intervals barely overlap, the information supports the researchers' theory: the shoppers from the flexed arm group are more likely to select vice options than those from the extended arm group.
b. The interval \( \bar{x} \pm 2s \) for the flexed arm group is \( \bar{x} \pm 2s \Rightarrow 59 \pm 2(10) \Rightarrow 59 \pm 20 \Rightarrow (39, 79) \). The interval for the extended arm group is \( \bar{x} \pm 2s \Rightarrow 43 \pm 2(15) \Rightarrow 43 \pm 30 \Rightarrow (13, 73) \). Since these two intervals overlap almost completely, the information does not support the researchers' theory. There does not appear to be any difference between the two groups.

2.86 a. Yes. The distribution of the buy-side analysts is fairly flat and skewed to the right. The distribution of the sell-side analysts is more mound-shaped and is not spread out as far as the buy-side distribution. Since the buy-side distribution is more spread out, the variance of the buy-side distribution will be larger than the variance of the sell-side distribution. Because the buy-side distribution is skewed to the right, its mean will be pulled to the right. Thus, the mean of the buy-side distribution will be greater than the mean of the sell-side distribution.
b. Since the sell-side distribution is fairly mound-shaped, we can use the Empirical Rule. The Empirical Rule says that approximately 95% of the observations will fall within 2 standard deviations of the mean. The interval for the sell-side distribution would be:
\( \bar{x} \pm 2s \Rightarrow -.05 \pm 2(.85) \Rightarrow -.05 \pm 1.7 \Rightarrow (-1.75, 1.65) \)
Since the buy-side distribution is skewed to the right, we cannot use the Empirical Rule. Thus, we will use Chebyshev's Rule. We know that at least \( 1 - 1/k^2 \) of the observations will fall within k standard deviations of the mean. If we choose k = 4, then \( 1 - 1/4^2 = .9375 \), or 93.75%. This is very close to the 95% requested in the problem.
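The choice of k above can be checked numerically. The Python sketch below is illustrative only (the function names are not from the text): it evaluates Chebyshev's guaranteed proportion \( 1 - 1/k^2 \), solves for the k that guarantees a target proportion, and reproduces the buy-side interval \( \bar{x} \pm 4s \) computed next, using \( \bar{x} = .85 \) and s = 1.93 from this exercise.

```python
import math


def chebyshev_bound(k: float) -> float:
    """Minimum proportion of any data set within k standard deviations of the mean."""
    if k <= 1:
        return 0.0  # Chebyshev's Rule guarantees nothing for k <= 1
    return 1 - 1 / k**2


def k_for_proportion(p: float) -> float:
    """Smallest k for which Chebyshev's Rule guarantees at least proportion p (0 <= p < 1)."""
    return 1 / math.sqrt(1 - p)


def interval(mean: float, sd: float, k: float) -> tuple[float, float]:
    """The interval mean +/- k*sd, rounded to two decimals for display."""
    return (round(mean - k * sd, 2), round(mean + k * sd, 2))


print(chebyshev_bound(4))                # 0.9375
print(round(k_for_proportion(0.95), 3))  # 4.472
print(interval(0.85, 1.93, 4))           # (-6.87, 8.57)
```

The sketch shows why k = 4 is described as "very close": it guarantees 93.75%, while a guarantee of exactly 95% would require \( k = \sqrt{20} \approx 4.47 \).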
The interval for the buy-side distribution to contain at least 93.75% of the observations would be:
\( \bar{x} \pm 4s \Rightarrow .85 \pm 4(1.93) \Rightarrow .85 \pm 7.72 \Rightarrow (-6.87, 8.57) \)
Note: This interval will contain at least 93.75% of the observations. It may contain more than 93.75% of the observations.

2.87 Since we do not know whether the distribution of the heights of the trees is mound-shaped, we need to apply Chebyshev's Rule. We know \( \mu = 30 \) and \( \sigma = 3 \). Therefore, \( \mu \pm 3\sigma \Rightarrow 30 \pm 3(3) \Rightarrow 30 \pm 9 \Rightarrow (21, 39) \). According to Chebyshev's Rule, at least 8/9 = .89 of the tree heights on this piece of land fall within this interval, so at most 1/9 = .11 of the tree heights fall above it. Since 40 feet lies above the interval, at most .11 of the trees can be at least 40 feet tall. However, the buyer will only purchase the land if at least 1000/5000 = .20 of the tree heights are at least 40 feet tall. Therefore, the buyer should not buy the piece of land.

2.88 a. Since we do not have any idea of the shape of the distribution of SAT-Math score changes, we must use Chebyshev's Theorem. We know that at least 8/9 of the observations will fall within 3 standard deviations of the mean. This interval is:
\( \bar{x} \pm 3s \Rightarrow 19 \pm 3(65) \Rightarrow 19 \pm 195 \Rightarrow (-176, 214) \)
Thus, for a randomly selected student, we could be pretty sure that this student's score would be anywhere from 176 points below his/her previous SAT-Math score to 214 points above his/her previous SAT-Math score.
b. Since we do not have any idea of the shape of the distribution of SAT-Verbal score changes, we must use Chebyshev's Theorem. We know that at least 8/9 of the observations will fall within 3 standard deviations of the mean. This interval is:
\( \bar{x} \pm 3s \Rightarrow 7 \pm 3(49) \Rightarrow 7 \pm 147 \Rightarrow (-140, 154) \)
Thus, for a randomly selected student, we could be pretty sure that this student's score would be anywhere from 140 points below his/her previous SAT-Verbal score to 154 points above his/her previous SAT-Verbal score.
c. A change of 140 points on the SAT-Math would be a little less than 2 standard deviations from the mean: \( z = \frac{140 - 19}{65} = 1.86 \). A change of 140 points on the SAT-Verbal would be a little less than 3 standard deviations from the mean: \( z = \frac{140 - 7}{49} = 2.71 \). Since the 140-point change on the SAT-Math is not as big a change as the 140-point change on the SAT-Verbal, it is more likely that the score was a SAT-Math score.

2.89 We know \( \mu = 25 \) and \( \sigma = .1 \). Therefore, \( \mu \pm 2\sigma \Rightarrow 25 \pm 2(.1) \Rightarrow 25 \pm .2 \Rightarrow (24.8, 25.2) \).
The machine is shut down for adjustment if the contents of two consecutive bags fall more than 2 standard deviations from the mean (i.e., outside the interval (24.8, 25.2)). Therefore, the machine was shut down yesterday at 11:30 (25.23 and 25.25 are outside the interval) and again at 4:00 (24.71 and 25.31 are outside the interval).

2.90 a. \( z = \frac{x - \bar{x}}{s} = \frac{40 - 30}{5} = 2 \) (sample): 2 standard deviations above the mean.
b. \( z = \frac{x - \mu}{\sigma} = \frac{90 - 89}{2} = .5 \) (population): .5 standard deviation above the mean.
c. \( z = \frac{x - \mu}{\sigma} = \frac{50 - 50}{5} = 0 \) (population): 0 standard deviations above the mean.
d. \( z = \frac{x - \bar{x}}{s} = \frac{20 - 30}{4} = -2.5 \) (sample): 2.5 standard deviations below the mean.

2.91 Using the definition of a percentile:

        Percentile   Percentage Above   Percentage Below
    a.  75th         25%                75%
    b.  50th         50%                50%
    c.  20th         80%                20%
    d.  84th         16%                84%

2.92 QL corresponds to the 25th percentile. QM corresponds to the 50th percentile. QU corresponds to the 75th percentile.

2.93 We first compute z-scores for each x value.
a. \( z = \frac{100 - 50}{25} = 2 \)
b. \( z = \frac{1 - 4}{1} = -3 \)
c. \( z = \frac{0 - 200}{100} = -2 \)
d. \( z = \frac{10 - 5}{3} = 1.67 \)
The above z-scores indicate that the x value in part a lies the greatest distance above the mean and the x value of part b lies the greatest distance below the mean.

2.94 Since the element 40 has a z-score of −2 and 90 has a z-score of 3,
\( -2 = \frac{40 - \mu}{\sigma} \) and \( 3 = \frac{90 - \mu}{\sigma} \)
\( \Rightarrow -2\sigma = 40 - \mu \) and \( 3\sigma = 90 - \mu \)
\( \Rightarrow \mu - 2\sigma = 40 \) and \( \mu + 3\sigma = 90 \)
Subtracting the first equation from the second gives \( 5\sigma = 50 \Rightarrow \sigma = 10 \), and then \( \mu = 40 + 2(10) = 60 \).
Therefore, the population mean is 60 and the standard deviation is 10.

2.95 The mean score of U.S. eighth-graders on a mathematics assessment test is 283. This is the average score. The 25th percentile is 259: 25% of the U.S. eighth-graders score below 259 on the test and 75% score higher. The 75th percentile is 308: 75% of the U.S. eighth-graders score below 308 on the test and 25% score higher. The 90th percentile is 329: 90% of the U.S. eighth-graders score below 329 on the test and 10% score higher.

2.96 a. The z-score is \( z = \frac{x - \bar{x}}{s} = \frac{30 - 39}{6} = -1.5 \). A score of 30 is 1.5 standard deviations below the mean.
b. Since the data are mound-shaped and symmetric and 39 is the mean, .5 of the sampled drug dealers will have WR scores below 39.
c. If 5% of the drug dealers have WR scores above 49, then 95% will have WR scores below 49. Thus, 49 is the 95th percentile.

2.97 A median starting salary of $41,100 indicates that half of the University of South Florida graduates had starting salaries less than $41,100 and half had starting salaries greater than $41,100. At mid-career, half of the University of South Florida graduates had a salary less than $71,100 and half had salaries greater than $71,100. At mid-career, 90% of the University of South Florida graduates had salaries under $131,000 and 10% had salaries greater than $131,000.

2.98 a. From Exercise 2.81, \( \bar{x} = 95.699 \) and s = 4.963. The z-score for an observation of 74 is:
\( z = \frac{x - \bar{x}}{s} = \frac{74 - 95.699}{4.963} = -4.37 \)
This z-score indicates that an observation of 74 is 4.37 standard deviations below the mean. Very few observations will be lower than this one.
b. The z-score for an observation of 92 is:
\( z = \frac{x - \bar{x}}{s} = \frac{92 - 95.699}{4.963} = -0.75 \)
This z-score indicates that an observation of 92 is .75 standard deviation below the mean. This score is not an unusual observation in the data set.

2.99 Since the 90th percentile of the study sample in the subdivision was .00372 mg/L, which is less than the USEPA level of .015 mg/L, the water customers in the subdivision are not at risk of drinking water with unhealthy lead levels.

2.100 The z-score associated with a score of 155 is \( z = \frac{x - \bar{x}}{s} = \frac{155 - 67.755}{26.871} = 3.25 \). This score would not be considered a typical level of support. It is 3.25 standard deviations above the mean. Very few observations would be above this value.

2.101 a. The 10th percentile is the score that has at least 10% of the observations less than it. If we arrange the data in order from the smallest to the largest, the 10th percentile score will be the .10(75) = 7.5, or 8th, observation. When the data are arranged in order, the 8th observation is 0. Thus, the 10th percentile is 0.
b. The 95th percentile is the score that has at least 95% of the observations less than it. If we arrange the data in order from the smallest to the largest, the 95th percentile score will be the .95(75) = 71.25, or 72nd, observation. When the data are arranged in order, the 72nd observation is 21. Thus, the 95th percentile is 21.
c. The sample mean is: \( \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{393}{75} = 5.24 \)
The sample variance is: \( s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1} = \frac{5943 - \frac{393^2}{75}}{75-1} = 52.482 \)
The standard deviation is: \( s = \sqrt{s^2} = \sqrt{52.482} = 7.244 \)
The z-score for a county with 48 Superfund sites is: \( z = \frac{x - \bar{x}}{s} = \frac{48 - 5.24}{7.244} = 5.90 \)
d. Yes. A score of 48 is almost 6 standard deviations from the mean. We know that for any data set almost all (at least 8/9 using Chebyshev's Theorem) of the observations are within 3 standard deviations of the mean. To be almost 6 standard deviations from the mean is very unusual.

2.102 a. Since the data are approximately mound-shaped, we can use the Empirical Rule. On the blue exam, the mean is 53% and the standard deviation is 15%.
We know that approximately 68% of all students will score within 1 standard deviation of the mean. This interval is: \( \bar{x} \pm s \Rightarrow 53 \pm 15 \Rightarrow (38, 68) \)
About 95% of all students will score within 2 standard deviations of the mean. This interval is: \( \bar{x} \pm 2s \Rightarrow 53 \pm 2(15) \Rightarrow 53 \pm 30 \Rightarrow (23, 83) \)
About 99.7% of all students will score within 3 standard deviations of the mean. This interval is: \( \bar{x} \pm 3s \Rightarrow 53 \pm 3(15) \Rightarrow 53 \pm 45 \Rightarrow (8, 98) \)
b. Since the data are approximately mound-shaped, we can use the Empirical Rule. On the red exam, the mean is 39% and the standard deviation is 12%.
We know that approximately 68% of all students will score within 1 standard deviation of the mean. This interval is: \( \bar{x} \pm s \Rightarrow 39 \pm 12 \Rightarrow (27, 51) \)
About 95% of all students will score within 2 standard deviations of the mean. This interval is: \( \bar{x} \pm 2s \Rightarrow 39 \pm 2(12) \Rightarrow 39 \pm 24 \Rightarrow (15, 63) \)
About 99.7% of all students will score within 3 standard deviations of the mean. This interval is: \( \bar{x} \pm 3s \Rightarrow 39 \pm 3(12) \Rightarrow 39 \pm 36 \Rightarrow (3, 75) \)
c. The student would have been more likely to have taken the red exam. For the blue exam, we know that approximately 95% of all scores will be from 23% to 83%. The observed 20% score does not fall in this range. For the red exam, we know that approximately 95% of all scores will be from 15% to 63%. The observed 20% score does fall in this range. Thus, it is more likely that the student took the red exam.

2.103 a. The z-score for Harvard is z = 5.08. This means that Harvard's productivity score was 5.08 standard deviations above the mean. This is extremely high and extremely unusual.
b. The z-score for Howard University is z = −.85. This means that Howard University's productivity score was .85 standard deviation below the mean. This is not an unusual z-score.
c. Yes. Other indicators that the distribution is skewed to the right are the values of the highest and lowest z-scores. The lowest z-score is less than 1 standard deviation below the mean, while the highest z-score is 5.08 standard deviations above the mean. Using MINITAB, the histogram of the z-scores is:
[Figure: Histogram of Z-Score (horizontal axis: Z-Score, −1 to 5; vertical axis: Frequency)]
This histogram does imply that the data are skewed to the right.

2.104 a. From the problem, \( \mu = 2.7 \) and \( \sigma = .5 \).
\( z = \frac{x - \mu}{\sigma} \Rightarrow z\sigma = x - \mu \Rightarrow x = \mu + z\sigma \)
For z = 2.0, \( x = 2.7 + 2.0(.5) = 3.7 \)
For z = −1.0, \( x = 2.7 - 1.0(.5) = 2.2 \)
For z = .5, \( x = 2.7 + .5(.5) = 2.95 \)
For z = −2.5, \( x = 2.7 - 2.5(.5) = 1.45 \)
b. For z = −1.6, \( x = 2.7 - 1.6(.5) = 1.9 \)
c. If we assume the distribution of GPAs is approximately mound-shaped, we can use the Empirical Rule. From the Empirical Rule, we know that .025 or 2.5% of the students will have GPAs above 3.7 (with z = 2). Thus, the GPA corresponding to summa cum laude (top 2.5%) will be greater than 3.7 (z > 2). We know that .16 or 16% of the students will have GPAs above 3.2 (z = 1). Thus, the limit on GPAs for cum laude (top 16%) will be greater than 3.2 (z > 1). We must assume the distribution is mound-shaped.

2.105 Not necessarily. Because the distribution is highly skewed to the right, the standard deviation is very large. Remember that the z-score represents the number of standard deviations a score is from the mean. If the standard deviation is very large, then the z-scores for observations somewhat near the mean will appear to be fairly small. If we deleted the schools with the very high productivity scores and recomputed the mean and standard deviation, the standard deviation would be much smaller. Thus, most of the z-scores would be larger because we would be dividing by a much smaller standard deviation. This would imply a bigger spread among the rest of the schools than the original distribution with the few outliers suggests.

2.106 To determine whether the measurements are outliers, compute the z-scores.
a. \( z = \frac{x - \bar{x}}{s} = \frac{65 - 57}{11} = .727 \). Since the z-score is less than 3 in absolute value, this would not be an outlier.
b. \( z = \frac{21 - 57}{11} = -3.273 \). Since the z-score is greater than 3 in absolute value, this would be an outlier.
c. \( z = \frac{72 - 57}{11} = 1.364 \). Since the z-score is less than 3 in absolute value, this would not be an outlier.
d. \( z = \frac{98 - 57}{11} = 3.727 \). Since the z-score is greater than 3 in absolute value, this would be an outlier.

2.107 The interquartile range is IQR = QU − QL = 85 − 60 = 25.
The lower inner fence = QL − 1.5(IQR) = 60 − 1.5(25) = 22.5.
The upper inner fence = QU + 1.5(IQR) = 85 + 1.5(25) = 122.5.
The lower outer fence = QL − 3(IQR) = 60 − 3(25) = −15.
The upper outer fence = QU + 3(IQR) = 85 + 3(25) = 160.
With only this information, the box plot would look something like the following:
[Figure: sketch of the box plot on an axis from 10 to 110: box from 60 to 85 with the median marked inside, lower whisker extending to 22.5, upper whisker extending to 100, and the point 18 marked with *]
The whiskers extend to the inner fences unless the data do not reach that far. The upper inner fence is 122.5. However, the largest data point is 100, so the whisker stops at 100. The lower inner fence is 22.5. The smallest data point is 18, so the whisker extends to 22.5. Since 18 is between the inner and outer fences, it is designated with a *. We do not know whether there is more than one data point below 22.5, so we cannot be sure that the box plot is entirely correct.

2.108 a. The median is approximately 4.
b. QL is approximately 3 (lower quartile). QU is approximately 6 (upper quartile).
c. IQR = QU − QL = 6 − 3 = 3
d. The data set is skewed to the right since the right whisker is longer than the left, there is one outlier, and there are two potential outliers.
e. 50% of the measurements are to the right of the median and 75% are to the left of the upper quartile.
f. The upper inner fence is QU + 1.5(IQR) = 6 + 1.5(3) = 10.5. The upper outer fence is QU + 3(IQR) = 6 + 3(3) = 15. Thus, there are two suspect outliers, 12 and 13, and one highly suspect outlier, 16.

2.109 a. Using MINITAB, the box plot for sample A is given below.
[Figure: Boxplot of Sample A (vertical axis from 100 to 200)]
Using MINITAB, the box plot for sample B is given below.
[Figure: Boxplot of Sample B (vertical axis from 140 to 210)]
b. In sample A, the measurement 84 is an outlier. This measurement falls outside the lower outer fence.
Lower outer fence = Lower hinge \( - 3(IQR) = 150 - 3(172 - 150) = 150 - 3(22) = 84 \)
Lower inner fence = Lower hinge \( - 1.5(IQR) = 150 - 1.5(22) = 117 \)
Upper inner fence = Upper hinge \( + 1.5(IQR) = 172 + 1.5(22) = 205 \)
In addition, 100 may be an outlier, since it lies outside the lower inner fence.
In sample B, 140 and 206 may be outliers. The point 140 lies outside the lower inner fence, while the point 206 lies right at the upper inner fence.
Lower outer fence = Lower hinge \( - 3(IQR) = 168 - 3(184 - 169) = 168 - 3(15) = 123 \)
Lower inner fence = Lower hinge \( - 1.5(IQR) = 168 - 1.5(15) = 145.5 \)
Upper inner fence = Upper hinge \( + 1.5(IQR) = 184 + 1.5(15) = 206.5 \)

2.110 a. The approximate 25th percentile PASI score before treatment is 10. The approximate median before treatment is 15. The approximate 75th percentile PASI score before treatment is 28.
b. The approximate 25th percentile PASI score after treatment is 3. The approximate median after treatment is 5. The approximate 75th percentile PASI score after treatment is 7.5.
c. Since the 75th percentile after treatment is lower than the 25th percentile before treatment, it appears that the ichthyotherapy is effective in treating psoriasis.

2.111 a. The average expenditure per full-time employee is $6,563. The median expenditure per employee is $6,232: half of all expenditures per employee were less than $6,232 and half were greater. The lower quartile is $5,309: twenty-five percent of all expenditures per employee were below $5,309. The upper quartile is $7,216: seventy-five percent of all expenditures per employee were below $7,216.
b. \( IQR = Q_U - Q_L = \$7{,}216 - \$5{,}309 = \$1{,}907 \)
c. The interquartile range goes from the 25th percentile to the 75th percentile. Thus, \( .75 - .25 = .5 \) of the 1,751 army hospitals have expenses between $5,309 and $7,216.

2.112 a. From the printout, \( \bar{x} = 52.334 \) and \( s = 9.224 \). The highest salary is 75 (thousand).
The z-score is \( z = \frac{x - \bar{x}}{s} = \frac{75 - 52.334}{9.224} = 2.46 \). Therefore, the highest salary is 2.46 standard deviations above the mean.
The lowest salary is 35.0 (thousand). The z-score is \( z = \frac{x - \bar{x}}{s} = \frac{35.0 - 52.334}{9.224} = -1.88 \). Therefore, the lowest salary is 1.88 standard deviations below the mean.
The mean salary offer is 52.33 (thousand). The z-score is \( z = \frac{x - \bar{x}}{s} = \frac{52.33 - 52.334}{9.224} \approx 0 \). The mean salary offer is 0 standard deviations from the mean.
No, the highest salary offer is not unusually high. For any distribution, at least 8/9 of the salaries should have z-scores between \( -3 \) and 3. A z-score of 2.46 would not be that unusual.

b. Since no salaries are outside the inner fences, none of them are suspect or highly suspect outliers.

2.113 a. The z-score is: \( z = \frac{x - \bar{x}}{s} = \frac{160 - 141.31}{17.77} = 1.05 \). Since the z-score is not large, 160 is not considered an outlier.
b. Z-scores with values greater than 3 in absolute value are considered outliers. An observation with a z-score of 3 would have the value:
\( 3 = \frac{x - 141.31}{17.77} \Rightarrow 3(17.77) = x - 141.31 \Rightarrow 53.31 = x - 141.31 \Rightarrow x = 194.62 \)
An observation with a z-score of \( -3 \) would have the value:
\( -3 = \frac{x - 141.31}{17.77} \Rightarrow -3(17.77) = x - 141.31 \Rightarrow -53.31 = x - 141.31 \Rightarrow x = 88.00 \)
Thus, any observation of semester hours that is greater than or equal to 194.62 or less than or equal to 88 would be considered an outlier.

2.114 From Exercise 2.100, \( \bar{x} = 67.755 \) and \( s = 26.871 \). Using MINITAB, a box plot of the data is:
[Figure: MINITAB box plot of Support]
From the box plot, the support level of 155 would be an outlier. From Exercise 2.100, we found the z-score associated with a score of 155 as \( z = \frac{x - \bar{x}}{s} = \frac{155 - 67.755}{26.871} = 3.25 \). Since this z-score is greater than 3, the observation 155 is considered an outlier.

2.115 a. Using MINITAB, the box plots for each type of firm are:
[Figure: MINITAB box plots of TIME vs. VOTES for Joint, None, and Prepack firms]
b. The median bankruptcy time for Joint firms is about 1.5. The median bankruptcy time for None firms is about 3.2. The median bankruptcy time for Prepack firms is about 1.4.
c. The range of the "Prepack" firms is less than the other two, while the range of the "None" firms is the largest. The interquartile range of the "Prepack" firms is less than the other two, while the interquartile range of the "Joint" firms is larger than the other two.
d. No. The interquartile range for the "Prepack" firms is the smallest, which corresponds to the smallest standard deviation. However, the second smallest interquartile range corresponds to the "None" firms, while the second smallest standard deviation corresponds to the "Joint" firms.
e. Yes. There is evidence of two outliers in the "Prepack" firms, indicated by the two *'s. There is also evidence of two outliers in the "None" firms, likewise indicated by two *'s.

2.116 a. From Exercise 2.101, \( \bar{x} = 5.24 \), \( s^2 = 52.482 \), and \( s = 7.244 \). We will use 3 standard deviations from the mean as the cutoff for outliers, so z-scores with values greater than 3 in absolute value are considered outliers. An observation with a z-score of 3 would have the value:
\( 3 = \frac{x - 5.24}{7.244} \Rightarrow 3(7.244) = x - 5.24 \Rightarrow 21.732 = x - 5.24 \Rightarrow x = 26.972 \)
An observation with a z-score of \( -3 \) would have the value:
\( -3 = \frac{x - 5.24}{7.244} \Rightarrow -3(7.244) = x - 5.24 \Rightarrow -21.732 = x - 5.24 \Rightarrow x = -16.492 \)
Thus, any observation that is greater than 26.972 or less than \( -16.492 \) would be considered an outlier. In this data set there is 1 outlier: 48.
b. Deleting the observation 48, the sample mean is: \( \bar{x} = \frac{\sum x_i}{n} = \frac{345}{74} = 4.66 \)
The sample variance is: \( s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1} = \frac{3639 - \frac{345^2}{74}}{74 - 1} = 27.8158 \)
The standard deviation is: \( s = \sqrt{s^2} = \sqrt{27.8158} = 5.274 \)
The mean has decreased from 5.24 to 4.66, while the standard deviation has decreased from 7.244 to 5.274.

2.117 a. Using MINITAB, the box plot is:
[Figure: MINITAB box plot of Score]
From the box plot, there appear to be 10 outliers: 69, 73, 74, 78, 83, 84, 84, 86, 86, and 86.
b. From Exercise 2.81, \( \bar{x} = 95.699 \) and \( s = 4.963 \). Since the data are skewed to the left, we will consider observations more than 2 standard deviations from the mean to be outliers. An observation with a z-score of 2 would have the value:
\( 2 = \frac{x - 95.699}{4.963} \Rightarrow 2(4.963) = x - 95.699 \Rightarrow 9.926 = x - 95.699 \Rightarrow x = 105.625 \)
An observation with a z-score of \( -2 \) would have the value:
\( -2 = \frac{x - 95.699}{4.963} \Rightarrow -2(4.963) = x - 95.699 \Rightarrow -9.926 = x - 95.699 \Rightarrow x = 85.773 \)
Observations greater than 105.625 or less than 85.773 would be considered outliers. Using this criterion, the following observations would be outliers: 69, 73, 74, 78, 83, 84, and 84.
c. No, these methods do not agree exactly. Using the box plot, 10 observations were identified as outliers. Using the z-score method, only 7 observations were identified as outliers. However, the 3 additional points that were not identified as outliers using the z-score method (the three 86's) were very close to the cutoff value.

2.118 a. Using MINITAB, the box plot is:
[Figure: MINITAB box plot of TIME]
The median is about 18. The data appear to be skewed to the right since there are 3 suspect outliers to the right and none to the left. The variability of the data is fairly small because the IQR is fairly small, approximately \( 26 - 10 = 16 \).
b. The customers associated with the suspected outliers are customers 268, 269, and 264.
c. In order to find the z-scores, we must first find the mean and standard deviation.
\( \bar{x} = \frac{\sum x}{n} = \frac{815}{40} = 20.375 \)
\( s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{24129 - \frac{815^2}{40}}{40 - 1} = 192.90705 \)
\( s = \sqrt{192.90705} = 13.89 \)
The z-scores associated with the suspected outliers are:
Customer 268: \( z = \frac{49 - 20.375}{13.89} = 2.06 \)
Customer 269: \( z = \frac{50 - 20.375}{13.89} = 2.13 \)
Customer 264: \( z = \frac{64 - 20.375}{13.89} = 3.14 \)
All the z-scores are greater than 2. These are unusual values.
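Exercises 2.101, 2.116, and 2.118 all compute \( \bar{x} \), \( s^2 \), and \( s \) from the totals \( \sum x \) and \( \sum x^2 \) with the shortcut formula, then convert raw values to z-scores or z-score cutoffs back to raw values. The Python sketch below reproduces that arithmetic from the totals quoted in the solutions (not from the raw data files). The function names are illustrative choices, not part of MINITAB or any textbook software.

```python
import math


def summary_from_sums(n, sum_x, sum_x2):
    """Return (mean, variance, standard deviation) from the sample size,
    the sum of the observations, and the sum of their squares, using
    s^2 = (sum x^2 - (sum x)^2 / n) / (n - 1)."""
    mean = sum_x / n
    variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    return mean, variance, math.sqrt(variance)


def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def value_at_z(z, mean, sd):
    """Invert the z-score: the raw value is x = mean + z * sd."""
    return mean + z * sd


# Exercise 2.118: n = 40 times with sum 815 and sum of squares 24,129
mean, var, sd = summary_from_sums(40, 815, 24129)
print(f"mean = {mean:.3f}, s^2 = {var:.5f}, s = {sd:.2f}")
for customer, time in [(268, 49), (269, 50), (264, 64)]:
    print(f"customer {customer}: z = {z_score(time, mean, sd):.2f}")

# Exercise 2.116(a): 3-standard-deviation cutoffs from the rounded
# values x-bar = 5.24 and s = 7.244 reported in Exercise 2.101
print(f"cutoffs: {value_at_z(-3, 5.24, 7.244):.3f} and {value_at_z(3, 5.24, 7.244):.3f}")
```

The code keeps full precision until printing, so it can differ from hand calculations (which round \( s \) first) in the last reported digit.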
2.119 From the stem-and-leaf display in Exercise 2.34, the data are fairly mound-shaped, but skewed somewhat to the right.
The sample mean is \( \bar{x} = \frac{\sum x}{n} = \frac{1493}{25} = 59.72 \).
The sample variance is \( s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{96{,}885 - \frac{1493^2}{25}}{25 - 1} = 321.7933 \).
The sample standard deviation is \( s = \sqrt{321.7933} = 17.9386 \).
The z-score associated with the largest value is \( z = \frac{x - \bar{x}}{s} = \frac{102 - 59.72}{17.9386} = 2.36 \).
Since the data are not extremely skewed to the right, this observation is probably not an outlier.
The observations associated with the one-time customers are 5 of the largest 7 observations. Thus, repeat customers tend to have shorter delivery times than one-time customers.

2.120 For Perturbed Intrinsics, but no Perturbed Projections:
\( \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{8.1}{5} = 1.62 \)
\( s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1} = \frac{15.63 - \frac{8.1^2}{5}}{5 - 1} = \frac{2.508}{4} = .627 \)
\( s = \sqrt{s^2} = \sqrt{.627} = .792 \)
The z-score corresponding to a value of 4.5 is \( z = \frac{x - \bar{x}}{s} = \frac{4.5 - 1.62}{.792} = 3.64 \).
Since this z-score is greater than 3, we would consider 4.5 an outlier for perturbed intrinsics, but no perturbed projections.
For Perturbed Projections, but no Perturbed Intrinsics:
\( \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{125.8}{5} = 25.16 \)
\( s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1} = \frac{3350.1 - \frac{125.8^2}{5}}{5 - 1} = \frac{184.972}{4} = 46.243 \)
\( s = \sqrt{s^2} = \sqrt{46.243} = 6.800 \)
The z-score corresponding to a value of 4.5 is \( z = \frac{x - \bar{x}}{s} = \frac{4.5 - 25.16}{6.800} = -3.038 \).
Since this z-score is less than \( -3 \), we would consider 4.5 an outlier for perturbed projections, but no perturbed intrinsics.
Since the z-score corresponding to 4.5 for perturbed projections, but no perturbed intrinsics, is smaller in absolute value than that for perturbed intrinsics, but no perturbed projections, it is more likely that the type of camera perturbation is perturbed projections, but no perturbed intrinsics.
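The decision rule in Exercise 2.120 (assign the value 4.5 to the group in which its z-score is smaller in absolute value) can be sketched in Python as follows. The group labels and the helper name are illustrative; the totals \( n \), \( \sum x \), and \( \sum x^2 \) are the ones used in the solution. The code does not round \( s \) before dividing, so its z-scores may differ from the hand calculations in the last decimal place.

```python
import math


def z_from_sums(x, n, sum_x, sum_x2):
    """z-score of x relative to a sample summarized by n, sum x, and sum x^2."""
    mean = sum_x / n
    sd = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))
    return (x - mean) / sd


# (n, sum x, sum x^2) for each type of camera perturbation in Exercise 2.120
groups = {
    "perturbed intrinsics only": (5, 8.1, 15.63),
    "perturbed projections only": (5, 125.8, 3350.1),
}

new_value = 4.5
z_scores = {name: z_from_sums(new_value, *sums) for name, sums in groups.items()}
for name, z in z_scores.items():
    print(f"{name}: z = {z:.2f}")

# The more plausible group is the one in which the value is least extreme.
closest = min(z_scores, key=lambda name: abs(z_scores[name]))
print("more likely:", closest)
```

Both z-scores exceed 3 in absolute value, so the comparison only says which group makes 4.5 less extreme, not that 4.5 is typical of either.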
2.121 Using MINITAB, the scatterplot is:
[Figure: MINITAB scatterplot of Var2 vs. Var1]

2.122 Using MINITAB, a scatterplot of the data is:
[Figure: MINITAB scatterplot of Var2 vs. Var1]

2.123 From the scatterplot of the data, it appears that as the number of punishments increases, the average payoff decreases. Thus, there appears to be a negative linear relationship between punishment use and average payoff. This supports the researchers' conclusion that "winners don't punish."

2.124 Using MINITAB, the scatterplot of the data is:
[Figure: MINITAB scatterplot of Catch vs. Search]
There is an apparent negative linear trend between the search frequency and the total catch. As the search frequency increases, the total catch tends to decrease.

2.125 Using MINITAB, a scattergram of the data is:
[Figure: MINITAB scatterplot of SLUGPCT vs. ELEVATION]
If we include the observation from Denver, then we would say there might be a linear relationship between slugging percentage and elevation. If we eliminate the observation from Denver, it appears that there might not be a relationship between slugging percentage and elevation.

2.126 Using MINITAB, the scatterplot of the data is:
[Figure: MINITAB scatterplot of MATH2011 vs. MATH2001]
There appears to be a positive linear trend between the Math SAT scores in 2001 and the Math SAT scores in 2011. As the 2001 Math SAT scores increase, the 2011 Math SAT scores also tend to increase.

2.127 a. Using MINITAB, a scatterplot of JIF and cost is:
[Figure: MINITAB scatterplot of JIF vs. Cost]
There is a slight negative linear trend to the data. As cost increases, JIF tends to decrease.
b. Using MINITAB, a scatterplot of the number of cites and cost is:
[Figure: MINITAB scatterplot of Cites vs. Cost]
There is a moderate positive trend to the data. As cost increases, the number of cites tends to increase.
c. Using MINITAB, a scatterplot of RPI and cost is:
[Figure: MINITAB scatterplot of RPI vs. Cost]
There is a slight positive trend to the data. As cost increases, RPI tends to increase.

2.128 Using MINITAB, the scatterplot of the data is:
[Figure: MINITAB scatterplot of Mass vs. Time]
There is evidence to indicate that the mass of the spill tends to diminish as time increases: as time gets larger, the mass decreases.

2.129 a. Using MINITAB, a scatterplot of the data is:
[Figure: MINITAB scatterplot of Year2 vs. Year1]
There is a moderate positive trend to the data. As the scores for Year1 increase, the scores for Year2 also tend to increase.
b. From the graph, two agencies that had greater than expected PARS evaluation scores for Year2 were USAID and State.

2.130 Using MINITAB, the scattergram of the data is:
[Figure: MINITAB scatterplot of Value ($mil) vs. Income ($mil)]
There is a moderate positive trend to the data. As operating income increases, the 2011 value also tends to increase. Since the trend is moderate, we would recommend that an NFL executive use operating income to predict a team's current value.
2.131 a. Using MINITAB, the scatterplot of the data is:
[Figure: MINITAB scatterplot of YRSPRAC vs. EDHRS]
There does not appear to be much of a relationship between the years of experience and the amount of exposure to ethics in medical school.
b. Using MINITAB, a box plot of the amount of exposure to ethics in medical school is:
[Figure: MINITAB box plot of EDHRS]
The one data point that is an extreme outlier is the value of 1000.
c. After removing this data point, the scatterplot of the data is:
[Figure: MINITAB scatterplot of YRSPRAC vs. EDHRS with the value 1000 removed]
With the data point removed, there now appears to be a negative trend to the data. As the amount of exposure to ethics in medical school increases, the years of experience tend to decrease.

2.132 Using MINITAB, a scatterplot of the data is:
[Figure: MINITAB scatterplot of ACCURACY vs. DISTANCE]
Yes, his concern is a valid one. From the scatterplot, there appears to be a fairly strong negative relationship between accuracy and driving distance. As driving distance increases, driving accuracy tends to decrease.

2.133 One way the bar graph can mislead the viewer is that the vertical axis has been cut off: instead of starting at 0, the vertical axis starts at 12. Another way the bar graph can mislead the viewer is that as the bars get taller, the widths of the bars also increase.

2.134 a. Using MINITAB, the time series plot is:
[Figure: MINITAB time series plot of Deaths, 2003 to 2006]
b. The time series plot is misleading because the information for 2006 is incomplete: it is based on only 2 months, while all of the rest of the years are based on 12 months.
c. In order to construct a plot that accurately reflects the trend in American casualties from the Iraq War, we would want complete data for 2006 and information for the years 2007 through 2011.

2.135 a. The graph might be misleading because the scales on the vertical axes are different. The left vertical axis ranges from 0 to $120 million. The right vertical axis ranges from 0 to $20 billion.
b. Using MINITAB, the redrawn graph is:
[Figure: MINITAB time series plot of Craigslist and NewspaperAds revenue, 2003 to 2009, on a common vertical axis]
Although the amount of revenue produced by Craigslist has increased dramatically from 2003 to 2009, it is still much smaller than the revenue produced by newspaper ad sales.

2.136 a. This graph is misleading because it looks as though, as the days increase, the number of barrels collected per day is also increasing. However, the bars show the cumulative number of barrels collected, and a cumulative value can never decrease.
b. Using MINITAB, the graph of the daily collection of oil is:
[Figure: MINITAB bar chart of Barrels collected per day, May 16 to May 23]
This graph shows that there has not been a steady improvement in the suctioning process. There was an increase for 3 days, then a leveling off for 3 days, then a decrease.

2.137 The relative frequency histogram is:
[Figure: relative frequency histogram of Measurement Class]

2.138 The mean is sensitive to extreme values in a data set. Therefore, the median is preferred to the mean when a data set is skewed in one direction or the other.

2.139 a. \( z = \frac{x - \bar{x}}{s} = \frac{50 - 60}{10} = -1 \); \( z = \frac{70 - 60}{10} = 1 \); \( z = \frac{80 - 60}{10} = 2 \)
b. \( z = \frac{x - \bar{x}}{s} = \frac{50 - 50}{5} = 0 \); \( z = \frac{70 - 50}{5} = 4 \); \( z = \frac{80 - 50}{5} = 6 \)
c. \( z = \frac{x - \bar{x}}{s} = \frac{50 - 40}{10} = 1 \); \( z = \frac{70 - 40}{10} = 3 \); \( z = \frac{80 - 40}{10} = 4 \)
d. \( z = \frac{x - \bar{x}}{s} = \frac{50 - 40}{100} = .1 \); \( z = \frac{70 - 40}{100} = .3 \); \( z = \frac{80 - 40}{100} = .4 \)

2.140 a. If we assume that the data are about mound-shaped, then any observation with a z-score greater than 3 in absolute value would be considered an outlier. From Exercise 2.139, the z-score corresponding to 50 is \( -1 \), the z-score corresponding to 70 is 1, and the z-score corresponding to 80 is 2. Since none of these z-scores is greater than 3 in absolute value, none would be considered outliers.
b. From Exercise 2.139, the z-score corresponding to 50 is \( -2 \), the z-score corresponding to 70 is 2, and the z-score corresponding to 80 is 4. Since the z-score corresponding to 80 is greater than 3, 80 would be considered an outlier.
c. From Exercise 2.139, the z-score corresponding to 50 is 1, the z-score corresponding to 70 is 3, and the z-score corresponding to 80 is 4. Since the z-scores corresponding to 70 and 80 are greater than or equal to 3, 70 and 80 would be considered outliers.
d. From Exercise 2.139, the z-score corresponding to 50 is .1, the z-score corresponding to 70 is .3, and the z-score corresponding to 80 is .4. Since none of these z-scores is greater than 3 in absolute value, none would be considered outliers.

2.141 a. \( \sum x = 13 + 1 + 10 + 3 + 3 = 30 \) and \( \sum x^2 = 13^2 + 1^2 + 10^2 + 3^2 + 3^2 = 288 \)
\( \bar{x} = \frac{\sum x}{n} = \frac{30}{5} = 6 \)
\( s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{288 - \frac{30^2}{5}}{5 - 1} = \frac{108}{4} = 27 \), \( s = \sqrt{27} = 5.20 \)
b. \( \sum x = 13 + 6 + 6 + 0 = 25 \) and \( \sum x^2 = 13^2 + 6^2 + 6^2 + 0^2 = 241 \)
\( \bar{x} = \frac{\sum x}{n} = \frac{25}{4} = 6.25 \)
\( s^2 = \frac{241 - \frac{25^2}{4}}{4 - 1} = \frac{84.75}{3} = 28.25 \), \( s = \sqrt{28.25} = 5.32 \)
c. \( \sum x = 1 + 0 + 1 + 10 + 11 + 11 + 15 = 49 \) and \( \sum x^2 = 1^2 + 0^2 + 1^2 + 10^2 + 11^2 + 11^2 + 15^2 = 569 \)
\( \bar{x} = \frac{\sum x}{n} = \frac{49}{7} = 7 \)
\( s^2 = \frac{569 - \frac{49^2}{7}}{7 - 1} = \frac{226}{6} = 37.67 \), \( s = \sqrt{37.67} = 6.14 \)
d. \( \sum x = 3 + 3 + 3 + 3 = 12 \) and \( \sum x^2 = 3^2 + 3^2 + 3^2 + 3^2 = 36 \)
\( \bar{x} = \frac{\sum x}{n} = \frac{12}{4} = 3 \)
\( s^2 = \frac{36 - \frac{12^2}{4}}{4 - 1} = \frac{0}{3} = 0 \), \( s = 0 \)

2.142 a. \( \sum x = 4 + 6 + 6 + 5 + 6 + 7 = 34 \) and \( \sum x^2 = 4^2 + 6^2 + 6^2 + 5^2 + 6^2 + 7^2 = 198 \)
\( \bar{x} = \frac{\sum x}{n} = \frac{34}{6} = 5.67 \)
\( s^2 = \frac{198 - \frac{34^2}{6}}{6 - 1} = \frac{5.3333}{5} = 1.0667 \), \( s = \sqrt{1.0667} = 1.03 \)
b. \( \sum x = -1 + 4 + (-3) + 0 + (-3) + (-6) = -9 \) and \( \sum x^2 = (-1)^2 + 4^2 + (-3)^2 + 0^2 + (-3)^2 + (-6)^2 = 71 \)
\( \bar{x} = \frac{\sum x}{n} = \frac{-9}{6} = -\$1.5 \)
\( s^2 = \frac{71 - \frac{(-9)^2}{6}}{6 - 1} = \frac{57.5}{5} = 11.5 \) dollars squared, \( s = \sqrt{11.5} = \$3.39 \)
c. \( \sum x = \frac{3}{5} + \frac{4}{5} + \frac{2}{5} + \frac{1}{5} + \frac{1}{16} = 2.0625 \) and \( \sum x^2 = \left(\frac{3}{5}\right)^2 + \left(\frac{4}{5}\right)^2 + \left(\frac{2}{5}\right)^2 + \left(\frac{1}{5}\right)^2 + \left(\frac{1}{16}\right)^2 = 1.2039 \)
\( \bar{x} = \frac{\sum x}{n} = \frac{2.0625}{5} = .4125\% \)
\( s^2 = \frac{1.2039 - \frac{2.0625^2}{5}}{5 - 1} = \frac{.3531}{4} = .0883 \) %-squared, \( s = \sqrt{.0883} = .30\% \)
d. (a) Range \( = 7 - 4 = 3 \)
(b) Range \( = \$4 - (-\$6) = \$10 \)
(c) Range \( = \frac{4}{5}\% - \frac{1}{16}\% = \frac{64}{80}\% - \frac{5}{80}\% = \frac{59}{80}\% = .7375\% \)

2.143 The range is found by taking the largest measurement in the data set and subtracting the smallest measurement. Therefore, it uses only two measurements from the whole data set. The standard deviation uses every measurement in the data set, so it takes every measurement into account, not just two. The range is affected by extreme values more than the standard deviation is.

2.144 \( s \approx \frac{\text{range}}{4} = \frac{20}{4} = 5 \)

2.145 Using MINITAB, the scatterplot is:
[Figure: MINITAB scatterplot of Var2 vs. Var1]

2.146 a. To find relative frequencies, we divide the frequency of each category by the total number of incidents. The relative frequencies of the number of incidents for each of the cause categories are:

Management System Cause Category   Number of Incidents   Relative Frequency
Engineering & Design               27                    27/83 = .325
Procedures & Practices             24                    24/83 = .289
Management & Oversight             22                    22/83 = .265
Training & Communication           10                    10/83 = .120
TOTAL                              83                    1

b. The Pareto diagram is:
[Figure: MINITAB Pareto diagram (Percent) of Management System Cause Category: Eng&Des, Proc&Pract, Mgmt&Over, Trn&Comm]
c. The category with the highest relative frequency of incidents is Engineering and Design. The category with the lowest relative frequency of incidents is Training and Communication.
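The relative-frequency table in Exercise 2.146 and the bar order of its Pareto diagram come from dividing each count by the total and sorting the categories from most to least frequent. A minimal Python sketch of that computation (the helper name is illustrative) is:

```python
def pareto_table(counts):
    """Return (category, count, relative frequency) rows ordered from the most
    to the least frequent category, the order used in a Pareto diagram."""
    total = sum(counts.values())
    rows = [(category, n, n / total) for category, n in counts.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Incident counts by management system cause category (Exercise 2.146),
# deliberately listed out of order to show the sorting step
incidents = {
    "Training & Communication": 10,
    "Engineering & Design": 27,
    "Management & Oversight": 22,
    "Procedures & Practices": 24,
}

for category, n, rel in pareto_table(incidents):
    print(f"{category:<26}{n:>4}{rel:>8.3f}")
```

Sorting by count gives the same order as sorting by relative frequency, because every count is divided by the same total.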
2.147 a. The relative frequency for each response category is found by dividing the frequency by the total sample size. The relative frequency for the category "Global Marketing" is 235/2863 = .082. The rest of the relative frequencies are found in a similar manner and are reported in the table.

Area                      Number   Relative Frequency
Global Marketing          235      235/2863 = .082
Sales Management          494      494/2863 = .173
Buyer Behavior            478      478/2863 = .167
Relationships             498      498/2863 = .174
Innovation                398      398/2863 = .139
Marketing Strategy        280      280/2863 = .098
Channels/Distribution     213      213/2863 = .074
Marketing Research        131      131/2863 = .046
Services                  136      136/2863 = .048
TOTAL                     2,863    1.00

Relationships and sales management had the most articles published, with 17.4% and 17.3%, respectively. Not far behind was Buyer Behavior with 16.7%. Of the rest of the areas, only innovation had more than 10%.
b. Using MINITAB, the pie chart of the data is:
[Figure: MINITAB pie chart of Number vs. Area: Global Marketing 8.2%, Sales Management 17.3%, Buyer Behavior 16.7%, Relationships 17.4%, Innovation 13.9%, Marketing Strategy 9.8%, Channels/Distribution 7.4%, Marketing Research 4.6%, Services 4.8%]
The slice for Marketing Research is smaller than the slice for Sales Management because there were fewer articles on Marketing Research than on Sales Management.

2.148 a. The data are time series data because the numbers of bankruptcies were collected over a period of 10 months.
b. Using MINITAB, the time series plot is:
[Figure: MINITAB time series plot of Bankruptcies, January to October]
c. There is a generally increasing trend in the number of bankruptcies as the months increase.

2.149 Using MINITAB, the pie chart is:
[Figure: MINITAB pie chart of DrivStar: 2 stars (4, 4.1%), 3 stars (17, 17.3%), 4 stars (59, 60.2%), 5 stars (18, 18.4%)]
About 60% of cars have a 4-star rating and only about 4% have 2-star ratings.

2.150 a. The average driver's severity of head injury in head-on collisions is 603.7.
b. Since the mean and median are close in value, the data should be fairly symmetric. Thus, we can use the Empirical Rule. We know that about 95% of all observations will fall within 2 standard deviations of the mean. This interval is \( \bar{x} \pm 2s \Rightarrow 603.7 \pm 2(185.4) \Rightarrow 603.7 \pm 370.8 \Rightarrow (232.9, 974.5) \). Most of the head-injury ratings will fall between 232.9 and 974.5.
c. The z-score would be: \( z = \frac{x - \bar{x}}{s} = \frac{408 - 603.7}{185.4} = -1.06 \). Since its absolute value is not very big, this is not an unusual value to observe.

2.151 a. Using MINITAB, a Pareto diagram for the data is:
[Figure: MINITAB Pareto chart of Defects: Body, Accessories, Electrical, Transmission, Engine]
The most frequently observed defect is a body defect.
b. Using MINITAB, a Pareto diagram for the Body Defect data is:
[Figure: MINITAB Pareto chart of Body Defects: Paint, Dents, Upholstery, Windshield, Chrome]
Most body defects are either paint or dents. These two categories account for \( \frac{30 + 25}{70} = \frac{55}{70} = .786 \) of all body defects. Since these two categories account for so much of the body defects, it would seem appropriate to target these two types of body defects for special attention.

2.152 a. The data collection method was a survey.
b. Since the data fall into 4 different categories, the variable is qualitative.
c. Using MINITAB, a pie chart of the data is:
[Figure: MINITAB pie chart of Made USA: 100% (64, 60.4%), 75-99% (20, 18.9%), 50-74% (18, 17.0%), < 50% (4, 3.8%)]
About 60% of those surveyed believe that "Made in USA" means 100% US labor and materials.

2.153 a. From the information given, we have \( \bar{x} = 375 \) and \( s = 25 \). From Chebyshev's Rule, we know that at least three-fourths of the measurements are within the interval \( \bar{x} \pm 2s \), or (325, 425). Thus, at most one-fourth of the measurements exceed 425. In other words, more than 425 vehicles used the intersection on at most 25% of the days.
b. According to the Empirical Rule, approximately 95% of the measurements are within the interval \( \bar{x} \pm 2s \), or (325, 425). This leaves approximately 5% of the measurements to lie outside the interval. Because of the symmetry of a mound-shaped distribution, approximately 2.5% of the measurements will lie below 325, and the remaining 2.5% will lie above 425. Thus, on approximately 2.5% of the days, more than 425 vehicles used the intersection.

2.154 The percentile ranking of the age of 25 years would be 100% − 75% = 25%. Thus, an age of 25 would correspond to the 25th percentile.

2.155 a. Using MINITAB, the stem-and-leaf display is:

Stem-and-Leaf of PENALTY   N = 38
Leaf Unit = 10

(28)    0   0011111222222223333334444899
 10     1   00239
  5     2
  5     3   0
  4     4   0
  3     5
  3     6
  3     7
  3     8   5
  2     9   3
  1    10   0

b. See the highlighted leaves in part a.
c. Most of the penalties imposed for Clean Air Act violations are relatively small compared to the penalties imposed for other violations. All but two of the penalties for Clean Air Act violations are below the median penalty imposed.

2.156 Using MINITAB, the pie charts are:
[Figure: MINITAB pie chart of Color: D (16, 5.2%), E (44, 14.3%), F (82, 26.6%), G (65, 21.1%), H (61, 19.8%), I (40, 13.0%)]
[Figure: MINITAB pie chart of Clarity: IF (44, 14.3%), VVS1 (52, 16.9%), VVS2 (78, 25.3%), VS1 (81, 26.3%), VS2 (53, 17.2%)]
The F color occurs most often, with 26.6%. The clarity that occurs most often is VS1, with 26.3%. The D color occurs least often, with 5.2%. The clarity that occurs least often is IF, with 14.3%.

2.157 a. Using MINITAB, the relative frequency histogram is:
[Figure: MINITAB relative frequency histogram of CARAT]
b. Using MINITAB, the relative frequency histogram for the GIA group is:
[Figure: MINITAB relative frequency histogram of CARAT for GIA]
c. Using MINITAB, the relative frequency histograms for the HRD and IGI groups are:
[Figure: MINITAB relative frequency histogram of CARAT for HRD]
[Figure: MINITAB relative frequency histogram of CARAT for IGI]
d. The HRD group does not assess any diamonds less than .5 carats, and almost 40% of the diamonds they assess are 1.0 carat or higher. The IGI group does not assess very many diamonds over .5 carats, and more than half are .3 carats or less. More than half of the diamonds assessed by the GIA group are more than .5 carats, but the sizes are less than those of the HRD group.
e. The sample mean is: \( \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{194.32}{308} = .631 \). The average number of carats for the 308 diamonds is .631.
f. The median is the average of the middle two observations once they have been ordered. The 154th and 155th observations are .62 and .62. The average of these two observations is .62. Half of the diamonds weigh less than .62 carats and half weigh more.
g. The mode is 1.0. This observation occurred 32 times.
h. Since the mean and median are close in value, either could be a good descriptor of central tendency.
i. From Chebyshev's Theorem, we know that at least 3/4, or 75%, of all observations will fall within 2 standard deviations of the mean. From part e, \( \bar{x} = .631 \).
The variance is:
$s^2 = \frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n-1} = \frac{146.19 - \frac{194.32^2}{308}}{308 - 1} = .0768$ square carats
The standard deviation is: $s = \sqrt{s^2} = \sqrt{.0768} = .277$ carats
This interval is: $\bar{x} \pm 2s \Rightarrow .631 \pm 2(.277) \Rightarrow .631 \pm .554 \Rightarrow (.077, 1.185)$

2.158 Using MINITAB, the scatterplot is:

[Figure: Scatterplot of PRICE vs CARAT]

As the number of carats increases, the price of the diamond tends to increase. There appears to be an upward trend.

2.159 a. Using MINITAB, a bar graph of the data is:

[Figure: Chart of Cause; counts for Collision, Fire, Grounding, HullFail, Unknown]

Fire and grounding are the two most likely causes of puncture.

b. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Spillage
Variable   N   Mean   StDev  Minimum     Q1  Median     Q3  Maximum
Spillage  42  66.19   56.05    25.00  32.00   43.00  77.50   257.00

The mean spillage amount is 66.19 thousand metric tons, while the median is 43.00. Since the median is so much smaller than the mean, the data are skewed to the right. The standard deviation is 56.05. Again, since this value is close to the value of the mean, this indicates that the data are skewed to the right.

Since the data are skewed to the right, we cannot use the Empirical Rule to describe the data; Chebyshev's Rule can be used instead. Using Chebyshev's Rule, we know that at least 8/9 of the observations will fall within 3 standard deviations of the mean:
$\bar{x} \pm 3s \Rightarrow 66.19 \pm 3(56.05) \Rightarrow 66.19 \pm 168.15 \Rightarrow (-101.96, 234.34)$, or (0, 234.34) since we cannot have negative spillage.
Thus, at least 8/9 of all oil spills will be between 0 and 234.34 thousand metric tons.

2.160 Using MINITAB, a pie chart of the data is:

Defect: False (449, 90.2%), True (49, 9.8%)

A response of 'true' means the software contained defective code.
Thus, only 9.8% of the modules contained defective software code.

2.161 a. Since no information is given about the distribution of the velocities of the Winchester bullets, we can only use Chebyshev's Rule to describe the data. We know that at least 3/4 of the velocities will fall within the interval:
$\bar{x} \pm 2s \Rightarrow 936 \pm 2(10) \Rightarrow 936 \pm 20 \Rightarrow (916, 956)$
Also, at least 8/9 of the velocities will fall within the interval:
$\bar{x} \pm 3s \Rightarrow 936 \pm 3(10) \Rightarrow 936 \pm 30 \Rightarrow (906, 966)$

b. Since a velocity of 1,000 is much larger than the upper limit of the second interval in part a, it is very unlikely that the bullet was manufactured by Winchester.

2.162 a. First, we must compute the total processing times by adding the processing times of the three departments. The total processing times are as follows (* = lost request):

Request  Total   Request  Total   Request  Total
   1     13.3      17     19.4*     33     23.4*
   2      5.7      18      4.7      34     14.2
   3      7.6      19      9.4      35     14.3
   4     20.0*     20     30.2      36     24.0*
   5      6.1      21     14.9      37      6.1
   6      1.8      22     10.7      38      7.4
   7     13.5      23     36.2*     39     17.7*
   8     13.0      24      6.5      40     15.4
   9     15.6      25     10.4      41     16.4
  10     10.9      26      3.3      42      9.5
  11      8.7      27      8.0      43      8.1
  12     14.9      28      6.9      44     18.2*
  13      3.4      29     17.2*     45     15.3
  14     13.6      30     10.2      46     13.9
  15     14.6      31     16.0      47     19.9*
  16     14.4      32     11.5      48     15.4
                                    49     14.3*
                                    50     19.0

The stem-and-leaf displays with the appropriate leaves highlighted are as follows:

Stem-and-leaf of Mkt
Leaf Unit = 0.10
   6   0  112446
   7   1  3
  14   2  0024699
  16   3  25
  22   4  001577
 (10)  5  0344556889
  18   6  0002224799
   8   7  0038
   4   8  07
   2   9
   2  10  0
   1  11  0

Stem-and-leaf of Engr
Leaf Unit = 0.10
   7   0  4466699
  14   1  3333788
  19   2  12246
  23   3  1568
  (5)  4  24688
  22   5  233
  19   6  01239
  14   7  22379
   9   8
   9   9  66
   7  10  0
   6  11  3
   5  12  023
   2  13  0
   1  14  4
Stem-and-leaf of Accnt
Leaf Unit = 0.10
  19   0  1111111111122333444
  (8)  0  55556888
  23   1  00
  21   1  79
  19   2  0023
  15   2
  15   3  23
  13   3  78
  11   4
  11   4
  11   5
  11   5  8
  10   6  2
   9   6
   9   7  0
   8   7
   8   8  4
HI  99, 105, 135, 144, 182, 220, 300

Stem-and-leaf of Total
Leaf Unit = 1.00
   1   0  1
   3   0  33
   5   0  45
  11   0  666677
  17   0  888999
  21   1  0000
  (5)  1  33333
  24   1  4444445555
  14   1  6677
  10   1  8999
   6   2  0
   5   2  3
   4   2  44
HI  30, 36

Of the 50 requests, 10 were lost. For each of the three departments, the processing times for the lost requests are scattered throughout the distributions, so the departmental processing times do not appear to be related to whether or not a request was lost. However, the total processing times for the lost requests appear to be clustered toward the high side of the distribution. It appears that if the total processing time could be kept under 17 days, 76% of the requests would still be processed, while the number of lost requests would be reduced to 1.

b. For the Marketing department, if the maximum processing time were set at 6.5 days, 78% of the requests would be processed, while the number of lost requests would be reduced by 4. For the Engineering department, if the maximum processing time were set at 7.0 days, 72% of the requests would be processed, while the number of lost requests would be reduced by 5. For the Accounting department, if the maximum processing time were set at 8.5 days, 86% of the requests would be processed, while the number of lost requests would be reduced by 5.

c. Using MINITAB, the summary statistics are:

Descriptive Statistics: REQUEST, MARKET, ENGINEER, ACCOUNT
Variable    N    Mean   StDev  Minimum     Q1  Median      Q3  Maximum
MARKET     50   4.766   2.584    0.100  2.825   5.400   6.250   11.000
ENGINEER   50   5.044   3.835    0.400  1.775   4.500   7.225   14.400
ACCOUNT    50   3.652   6.256    0.100  0.200   0.800   3.725   30.000
TOTAL      50  13.462   6.820    1.800  8.075  13.750  16.600   36.200

d.
The z-scores corresponding to the maximum time guidelines developed for each department and the total are as follows:
Marketing: $z = \frac{x - \bar{x}}{s} = \frac{6.5 - 4.77}{2.58} = .67$
Engineering: $z = \frac{x - \bar{x}}{s} = \frac{7.0 - 5.04}{3.84} = .51$
Accounting: $z = \frac{x - \bar{x}}{s} = \frac{8.5 - 3.65}{6.26} = .77$
Total: $z = \frac{x - \bar{x}}{s} = \frac{17 - 13.46}{6.82} = .52$

e. To find the maximum processing time corresponding to a z-score of 3, we substitute the values of z, $\bar{x}$, and s into the z formula and solve for x:
$z = \frac{x - \bar{x}}{s} \Rightarrow x - \bar{x} = zs \Rightarrow x = \bar{x} + zs$
Marketing: $x = 4.77 + 3(2.58) = 4.77 + 7.74 = 12.51$. None of the orders exceed this time.
Engineering: $x = 5.04 + 3(3.84) = 5.04 + 11.52 = 16.56$. None of the orders exceed this time.
Both of these results agree with both the Empirical Rule and Chebyshev's Rule.
Accounting: $x = 3.65 + 3(6.26) = 3.65 + 18.78 = 22.43$. One of the orders exceeds this time, or 1/50 = .02.
Total: $x = 13.46 + 3(6.82) = 13.46 + 20.46 = 33.92$. One of the orders exceeds this time, or 1/50 = .02.
Both of these results agree with Chebyshev's Rule but not the Empirical Rule. Both of these last two distributions are skewed to the right.

f. Marketing: $x = 4.77 + 2(2.58) = 4.77 + 5.16 = 9.93$. Two of the orders exceed this time, or 2/50 = .04.
Engineering: $x = 5.04 + 2(3.84) = 5.04 + 7.68 = 12.72$. Two of the orders exceed this time, or 2/50 = .04.
Accounting: $x = 3.65 + 2(6.26) = 3.65 + 12.52 = 16.17$. Three of the orders exceed this time, or 3/50 = .06.
Total: $x = 13.46 + 2(6.82) = 13.46 + 13.64 = 27.10$. Two of the orders exceed this time, or 2/50 = .04.
All of these agree with Chebyshev's Rule but not the Empirical Rule.

g. No observations exceed the 3-standard-deviation guideline for either Marketing or Engineering. One observation exceeds the 3-standard-deviation guideline for both Accounting (#23, time = 30.0 days) and Total (#23, time = 36.2 days). Therefore, only (1/10)100% = 10% of the "lost" quotes have times exceeding at least one of the 3-standard-deviation guidelines.
Two observations exceed the 2-standard-deviation guideline for both Marketing (#31, time = 11.0 days and #48, time = 10.0 days) and Engineering (#4, time = 13.0 days and #49, time = 14.4 days). Three observations exceed the 2-standard-deviation guideline for Accounting (#20, time = 22.0 days; #23, time = 30.0 days; and #36, time = 18.2 days). Two observations exceed the 2-standard-deviation guideline for Total (#20, time = 30.2 days and #23, time = 36.2 days). Therefore, (7/10)100% = 70% of the "lost" quotes have times exceeding at least one of the 2-standard-deviation guidelines. We would recommend the 2-standard-deviation guideline since it covers 70% of the lost quotes, while very few other quotes exceed the guidelines.

2.163 a. One reason the plot may be interpreted differently is that no scale is given on the vertical axis. Also, since the plot almost reaches the horizontal axis at 3 years, it is clear that the bottom of the plot has been cut off. Another important piece of information that is omitted is who responded to the survey.

b. A scale should be added to the vertical axis, and that scale should start at 0.

2.164 a. Using MINITAB, the time series plot of the data is:

[Figure: Time Series Plot of Acquisitions; Acquisitions vs. Year, 1980 to 2000]

b. To find the percentage of the sampled firms with at least one acquisition, we divide the number with acquisitions by the total number sampled and then multiply by 100%. For 1980, the percentage of firms with at least one acquisition is (18/1,963)(100%) = .92%.
The rest of the percentages are found in the same manner and are listed in the following table:

Year   Number of firms   Number with acquisitions   Percentage with acquisitions
1980        1,963                  18                        .92%
1981        2,044                 115                       5.63%
1982        2,029                 211                      10.40%
1983        2,187                 273                      12.48%
1984        2,248                 317                      14.10%
1985        2,238                 182                       8.13%
1986        2,277                 232                      10.19%
1987        2,344                 258                      11.01%
1988        2,279                 296                      12.99%
1989        2,231                 350                      15.69%
1990        2,197                 350                      15.93%
1991        2,261                 370                      16.36%
1992        2,363                 427                      18.07%
1993        2,582                 532                      20.60%
1994        2,775                 626                      22.56%
1995        2,890                 652                      22.56%
1996        3,070                 751                      24.46%
1997        3,099                 799                      25.78%
1998        2,913                 866                      29.73%
1999        2,799                 750                      26.80%
2000        2,778                 748                      26.93%
TOTAL      51,567               9,123

Using MINITAB, the time series plot is:

[Figure: Time Series Plot of Percent; Percent vs. Year, 1980 to 2000]

c. In this case, both plots are almost the same. In general, the time series plot of the percentages would be more informative: by converting the counts to percentages, one can compare time periods with different sample sizes on the same basis.

2.165 a. Since the mean is greater than the median, the distribution of the radiation levels is skewed to the right.

b. $\bar{x} \pm s \Rightarrow 10 \pm 3 \Rightarrow (7, 13)$; $\bar{x} \pm 2s \Rightarrow 10 \pm 2(3) \Rightarrow (4, 16)$; $\bar{x} \pm 3s \Rightarrow 10 \pm 3(3) \Rightarrow (1, 19)$

Interval    Chebyshev's       Empirical
(7, 13)     At least 0        68%
(4, 16)     At least 75%      95%
(1, 19)     At least 88.9%    100%

Since the data are skewed to the right, Chebyshev's Rule is probably more appropriate in this case.

c. The background level is 4. Using Chebyshev's Rule, at least 75%, or .75(50) = 37.5, which rounds to 38 homes, are above the background level. Using the Empirical Rule, approximately 97.5%, or .975(50) = 48.75, which rounds to 49 homes, are above the background level.

d. $z = \frac{x - \bar{x}}{s} = \frac{20 - 10}{3} = 3.333 > 3$
It is unlikely that this new measurement came from the same distribution as the other 50.
Using either Chebyshev's Rule or the Empirical Rule, it is very unlikely to see any observations more than 3 standard deviations from the mean.

2.166 a. Both the height and the width of the bars (peanuts) change. Thus, some readers may tend to equate the area of the peanuts with the frequency for each year.

b. Using MINITAB, the frequency bar chart is:

[Figure: Chart of Peanut; frequency vs. Year, 1975 to 2010]

2.167 a. Since it is given that the distribution is mound-shaped, we can use the Empirical Rule. We know that 1.84% is 2 standard deviations below the mean. The Empirical Rule states that approximately 95% of the observations will fall within 2 standard deviations of the mean and, consequently, approximately 5% will lie outside that interval. Since a mound-shaped distribution is symmetric, approximately 2.5% of the day's production of batches will fall below 1.84%.

b. If the data are actually mound-shaped, it would be extremely unusual (less than 2.5%) to observe a batch with 1.80% zinc phosphide if the true mean is 2.0%. Thus, if we did observe 1.8%, we would conclude that the mean percent of zinc phosphide in today's production is probably less than 2.0%.

2.168 a. Clinic A claims to have a mean weight loss of 15 pounds during the first month, and Clinic B claims to have a median weight loss of 10 pounds in the first month. With no other information, I would choose Clinic B. It is very likely that the distributions of weight losses will be skewed to the right: most people lose in the neighborhood of 10 pounds, but a few might lose much more. If a few people lost much more than 10 pounds, the mean would be pulled in that direction.

b. For Clinic A, the median is 10 and the standard deviation is 20. For Clinic B, the mean is 10 and the standard deviation is 5.
For Clinic A: The mean is 15 and the median is 10. This would indicate that the data are skewed to the right.
Thus, we will have to use Chebyshev's Rule to describe the distribution of weight losses.
$\bar{x} \pm 2s \Rightarrow 15 \pm 2(20) \Rightarrow 15 \pm 40 \Rightarrow (-25, 55)$
Using Chebyshev's Rule, we know that at least 75% of all weight losses will be between -25 and 55 pounds. This means that at least 75% of the people will have results ranging from a loss of 55 pounds to a gain of 25 pounds. This is a very large range.
For Clinic B: The mean is 10 and the median is 10. This would indicate that the data are symmetric. Thus, the Empirical Rule can be used to describe the distribution of weight losses.
$\bar{x} \pm 2s \Rightarrow 10 \pm 2(5) \Rightarrow 10 \pm 10 \Rightarrow (0, 20)$
Using the Empirical Rule, we know that approximately 95% of all weight losses will be between 0 and 20 pounds. This is a much smaller range than for Clinic A.
I would still recommend Clinic B. With Clinic A, a person has the potential to lose a large amount of weight, but also has the potential to gain a relatively large amount of weight. With Clinic B, a person can be very confident that he or she will lose weight.

c. One would want the clients selected for the samples in each clinic to be representative of all clients in that clinic. One would hope that a clinic would not choose the clients who lost the most weight for its sample just to promote the clinic.

2.169 First, we make some preliminary calculations. Of the 20 engineers at the time of the layoffs, 14 are 40 or older. Thus, the probability that a randomly selected engineer will be 40 or older is 14/20 = .70. A very high proportion of the engineers is 40 or older.
In order to determine whether the company is vulnerable to a disparate impact claim, we will first find the median age of all the engineers. Ordering all the ages, we get:
29, 32, 34, 35, 38, 39, 40, 40, 40, 40, 40, 41, 42, 42, 44, 46, 47, 52, 55, 64
The median of all 20 engineers is $\frac{40 + 40}{2} = \frac{80}{2} = 40$.
Now, we will compute the median age of those engineers who were not laid off. The ages underlined above correspond to the engineers who were not laid off.
The median of these is $\frac{40 + 40}{2} = \frac{80}{2} = 40$.
The median age of all engineers is the same as the median age of those who were not laid off. The median age of those laid off is $\frac{40 + 41}{2} = \frac{81}{2} = 40.5$, which is not much different from the median age of those not laid off. In addition, 70% of all the engineers are 40 or older. Thus, it appears that the company would not be vulnerable to a disparate impact claim.

2.170 Answers will vary. The graph is made to look like the amount of money spent on education rose dramatically from 1980 to 2000, while 4th grade reading scores did not increase at all. The graph does not take into account that the number of school children also increased dramatically over those 20 years. A better portrayal would use per capita spending rather than total spending.

2.171 There is evidence to support this claim. The graph peaks at the interval above 1.002. The heights of the bars decrease in order as the intervals get farther from the peak interval. This is true for all bars except the one above 1.000, which is taller than the bar to its right. This would indicate that there are more observations in this interval than one would expect, suggesting that some inspectors might be passing rods with diameters that were barely below the lower specification limit.

Chapter 3 Probability

3.1 a. Since the probabilities must sum to 1,
$P(E_3) = 1 - [P(E_1) + P(E_2) + P(E_4) + P(E_5)] = 1 - (.1 + .2 + .1 + .1) = .5$
b. $P(E_3) = 1 - P(E_1) - P(E_2) - P(E_4) - P(E_5) = 1 - P(E_3) - P(E_2) - P(E_4) - P(E_5)$
$\Rightarrow 2P(E_3) = 1 - .1 - .2 - .1 = .6 \Rightarrow P(E_3) = .3$
c. $P(E_3) = 1 - P(E_1) - P(E_2) - P(E_4) - P(E_5) = 1 - .1 - .1 - .1 - .1 = .6$

3.2 a. This is a Venn Diagram.
b. If the sample points are equally likely, then $P(1) = P(2) = P(3) = \cdots = P(10) = \frac{1}{10}$. Therefore,
$P(A) = P(4) + P(5) + P(6) = \frac{1}{10} + \frac{1}{10} + \frac{1}{10} = \frac{3}{10} = .3$
$P(B) = P(6) + P(7) = \frac{1}{10} + \frac{1}{10} = \frac{2}{10} = .2$
c.
$P(A) = P(4) + P(5) + P(6) = \frac{1}{20} + \frac{1}{20} + \frac{3}{20} = \frac{5}{20} = .25$
$P(B) = P(6) + P(7) = \frac{3}{20} + \frac{3}{20} = \frac{6}{20} = .3$

3.3 P(A) = P(1) + P(2) + P(3) = .05 + .20 + .30 = .55
P(B) = P(1) + P(3) + P(5) = .05 + .30 + .15 = .50
P(C) = P(1) + P(2) + P(3) + P(5) = .05 + .20 + .30 + .15 = .70

3.4 a. $\binom{9}{4} = \frac{9!}{4!(9-4)!} = \frac{9 \cdot 8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(4 \cdot 3 \cdot 2 \cdot 1)(5 \cdot 4 \cdot 3 \cdot 2 \cdot 1)} = 126$
b. $\binom{7}{2} = \frac{7!}{2!(7-2)!} = \frac{7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(2 \cdot 1)(5 \cdot 4 \cdot 3 \cdot 2 \cdot 1)} = 21$
c. $\binom{4}{4} = \frac{4!}{4!(4-4)!} = \frac{4 \cdot 3 \cdot 2 \cdot 1}{(4 \cdot 3 \cdot 2 \cdot 1)(1)} = 1$
d. $\binom{5}{0} = \frac{5!}{0!(5-0)!} = \frac{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(1)(5 \cdot 4 \cdot 3 \cdot 2 \cdot 1)} = 1$
e. $\binom{6}{5} = \frac{6!}{5!(6-5)!} = \frac{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(5 \cdot 4 \cdot 3 \cdot 2 \cdot 1)(1)} = 6$

3.5 a. $\binom{N}{n} = \binom{5}{2} = \frac{5!}{2!(5-2)!} = \frac{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(2 \cdot 1)(3 \cdot 2 \cdot 1)} = \frac{120}{12} = 10$
b. $\binom{N}{n} = \binom{6}{3} = \frac{6!}{3!(6-3)!} = \frac{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(3 \cdot 2 \cdot 1)(3 \cdot 2 \cdot 1)} = \frac{720}{36} = 20$
c. $\binom{N}{n} = \binom{20}{5} = \frac{20!}{5!(20-5)!} = \frac{20 \cdot 19 \cdot 18 \cdots 3 \cdot 2 \cdot 1}{(5 \cdot 4 \cdot 3 \cdot 2 \cdot 1)(15 \cdot 14 \cdot 13 \cdots 3 \cdot 2 \cdot 1)} = \frac{2.432902008 \times 10^{18}}{1.569209242 \times 10^{14}} = 15{,}504$

3.6 a. The tree diagram of the sample points is:
b. If the dice are fair, then each of the sample points is equally likely. Each would have a probability of 1/36 of occurring.
c. There is one sample point in A: (3,3). Thus, $P(A) = \frac{1}{36}$.
There are 6 sample points in B: (1,6) (2,5) (3,4) (4,3) (5,2) and (6,1). Thus, $P(B) = 6\left(\frac{1}{36}\right) = \frac{6}{36} = \frac{1}{6}$.
There are 18 sample points in C: (1,1) (1,3) (1,5) (2,2) (2,4) (2,6) (3,1) (3,3) (3,5) (4,2) (4,4) (4,6) (5,1) (5,3) (5,5) (6,2) (6,4) and (6,6). Thus, $P(C) = 18\left(\frac{1}{36}\right) = \frac{18}{36} = \frac{1}{2}$.

3.7 a. If we denote the marbles as B1, B2, R1, R2, and R3, then the ten sample points are:
(B1, B2) (B1, R1) (B1, R2) (B1, R3) (B2, R1) (B2, R2) (B2, R3) (R1, R2) (R1, R3) (R2, R3)
b. Each of the sample points would be equally likely. Thus, each would have a probability of 1/10 of occurring.
c. There is one sample point in A: (B1, B2). Thus, $P(A) = \frac{1}{10}$.
There are 6 sample points in B: (B1, R1) (B1, R2) (B1, R3) (B2, R1) (B2, R2) (B2, R3). Thus, $P(B) = 6\left(\frac{1}{10}\right) = \frac{6}{10} = \frac{3}{5}$.
There are 3 sample points in C: (R1, R2) (R1, R3) (R2, R3). Thus, $P(C) = 3\left(\frac{1}{10}\right) = \frac{3}{10}$.

3.8 Each student will obtain slightly different proportions. However, the proportions should be close to P(A) = 1/10, P(B) = 6/10, and P(C) = 3/10.

3.9 a. The sample points of this experiment correspond to each of the 6 possible colors of the M&M's. Let Br = brown, Y = yellow, R = red, Bl = blue, O = orange, G = green. The six sample points are: Br, Y, R, Bl, O, and G
b. From the problem, the probabilities of selecting each color are: P(Br) = 0.13, P(Y) = 0.14, P(R) = 0.13, P(Bl) = 0.24, P(O) = 0.20, P(G) = 0.16
c. The probability that the selected M&M is brown is P(Br) = 0.13.
d. The probability that the selected M&M is red, green, or yellow is:
P(R or G or Y) = P(R) + P(G) + P(Y) = 0.13 + 0.16 + 0.14 = 0.43
e. P(not Bl) = P(R) + P(G) + P(Y) + P(Br) + P(O) = 0.13 + 0.16 + 0.14 + 0.13 + 0.20 = 0.76

3.10 Define the following events: I: {personal illness} F: {family issues} N: {personal needs} E: {entitlement mentality} S: {stress}
a. The 5 sample points are: I, F, N, E, S
b. The probabilities of the sample points are: P(I) = 0.34, P(F) = 0.22, P(N) = 0.18, P(E) = 0.13, P(S) = 0.13
c. The probability that the absence is due to something other than "personal illness" (I) is:
P(not I) = P(F) + P(N) + P(E) + P(S) = 0.22 + 0.18 + 0.13 + 0.13 = 0.66

3.11 Define the following event: M: {Nanny who was placed in a job last year is a male}
$P(M) = \frac{24}{4{,}176} = .0057$

3.12 Define the following events: H5: {Hurricane develops from 5th tropical storm} H12: {Hurricane develops before the 12th or higher tropical storm}
a. $P(H_5) = \frac{11}{67} = .164$
b. $P(H_{12}) = \frac{67 - 5}{67} = .925$

3.13 a. The 5 sample points are the possible responses of a randomly selected person who participated in the Harris Poll: None, 1-2, 3-5, 6-9, 10 or more
b. The probabilities are: P(none) = 0.19, P(1-2) = 0.31, P(3-5) = 0.26, P(6-9) = 0.05, P(10 or more) = 0.19
c. Define the following event: A: {Respondent looks for healthcare information online more than two times per month}
P(A) = P(3-5) + P(6-9) + P(10 or more) = 0.26 + 0.05 + 0.19 = 0.50
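The combination counts in Exercises 3.4 and 3.5, and the equally likely sample-point calculation in Exercise 3.7, can be checked with a short Python sketch using the standard library's math.comb, math.factorial, and itertools.combinations. The exercise's own wording for events A, B, and C in 3.7 is not reproduced here, so the rules below (both blue, one of each color, both red) are an assumption inferred from the sample points listed in the solution; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, factorial


def combinations_count(N, n):
    """N choose n from the factorial formula N! / (n!(N - n)!)."""
    return factorial(N) // (factorial(n) * factorial(N - n))


# Exercises 3.4 and 3.5: the factorial formula agrees with math.comb.
for N, n in [(9, 4), (7, 2), (4, 4), (5, 0), (6, 5), (5, 2), (6, 3), (20, 5)]:
    assert combinations_count(N, n) == comb(N, n)
    print(f"C({N}, {n}) = {comb(N, n)}")

# Exercise 3.7: every way to draw 2 of the 5 marbles (order ignored).
marbles = ["B1", "B2", "R1", "R2", "R3"]
sample_points = list(combinations(marbles, 2))
p_point = Fraction(1, len(sample_points))  # equally likely sample points


def n_blue(point):
    """Number of blue marbles in a sample point."""
    return sum(m.startswith("B") for m in point)


def prob(event):
    """Sum the probabilities of the sample points where event(point) is True."""
    return sum((p_point for pt in sample_points if event(pt)), Fraction(0))


p_A = prob(lambda pt: n_blue(pt) == 2)  # assumed A: both marbles blue
p_B = prob(lambda pt: n_blue(pt) == 1)  # assumed B: one blue, one red
p_C = prob(lambda pt: n_blue(pt) == 0)  # assumed C: both marbles red
print(len(sample_points), p_A, p_B, p_C)  # 10 1/10 3/5 3/10
```

Exact Fraction arithmetic keeps the results in the same form as the hand solution (1/10, 6/10 = 3/5, 3/10) instead of rounded decimals.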
3.14 a. Define the following events:
A: {Respondent works during summer vacation}
B: {Respondent does not work during summer vacation}
C: {Respondent unemployed}
The sample points are A, B, and C.
b. Reasonable probabilities are: P(A) = .46, P(B) = .35, and P(C) = .19
c. P(B or C) = P(B) + P(C) = .35 + .19 = .54

3.15 a. The international consumer is most likely to use the certification mark on a label to identify a green product.
b. Define the following events: A: {Certification mark on label} B: {Packaging} C: {Reading information about the product} D: {Advertisement} E: {Brand website} F: {Other}
P(A or B) = P(A) + P(B) = .45 + .15 = .60
c. P(C or E) = P(C) + P(E) = .12 + .04 = .16
d. P(not D) = P(A) + P(B) + P(C) + P(E) + P(F) = .45 + .15 + .12 + .04 + .18 = .94

3.16 a. Define the following events: A: {Total visitors} B: {Paying visitors} C: {Big shows} D: {Funds raised} E: {Members}
$P(A \text{ or } D) = P(A) + P(D) = \frac{8}{30} + \frac{7}{30} = \frac{15}{30} = .5$
b & c. A tree diagram with the corresponding probabilities for this problem follows. To compute the probabilities, we have to assume that this sample is representative of all such museums. In addition, we have to assume that the two selections of a museum are independent. The probability of selecting a particular type of museum is estimated by the number of museums in that category divided by 30. Each sample point consists of two museums. The probabilities of each type of museum in the pair are then multiplied together to find the probability of the sample point. The probabilities are shown in the tree.
d. P(AA or DD or AD or DA) = P(AA) + P(DD) + P(AD) + P(DA) = .071 + .054 + .062 + .062 = .249

3.17 a. Define the following event: C: {Slaughtered chicken passes inspection with fecal contamination}
$P(C) = \frac{1}{100} = .01$
b. Based on the data, $P(C) = \frac{306}{32{,}075} = .0095 \approx .01$. Yes. The probability of a slaughtered chicken passing inspection with fecal contamination, rounded to 2 decimal places, is .01.

3.18 Define the following events: B: {Bitel} C: {Cybernet} F: {Fujian Landi} G: {Glint (Pava Rede)} I: {Intelligent} K: {Kwang Woo} O: {Omron} PT: {Pax Tech} PC: {Provenco Cadmus} S: {SZZT Electronics} T: {Toshiba TEC} U: {Urmet}
To compute the probability of each event, we first must sum the number of units shipped by all the manufacturers. The sum is 334,039.
P(B) = 13,500/334,039 = .040; P(C) = 16,200/334,039 = .048; P(F) = 119,000/334,039 = .356; P(G) = 5,990/334,039 = .018; P(I) = 4,562/334,039 = .014; P(K) = 42,000/334,039 = .126; P(O) = 20,000/334,039 = .060; P(PT) = 10,072/334,039 = .030; P(PC) = 20,000/334,039 = .060; P(S) = 67,300/334,039 = .201; P(T) = 12,415/334,039 = .037; P(U) = 3,000/334,039 = .009
a. P(F or S) = P(F) + P(S) = 0.356 + 0.201 = 0.557
b. Define the event: D: {PIN pad is defective}
P(D) = 1,000/334,039 = .003

3.19 a. The probability that any network is selected on a particular day is 1/8. Therefore, P(ESPN selected on July 11) = 1/8.
b. The number of ways to select four networks for the weekend days is a combination of 8 networks taken 4 at a time. The number of ways to do this is $\binom{8}{4} = \frac{8!}{4!(8-4)!} = \frac{8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(4 \cdot 3 \cdot 2 \cdot 1)(4 \cdot 3 \cdot 2 \cdot 1)} = 70$.
c. First, we need to find the number of ways one can choose the 4 networks where ESPN is one of the 4. If ESPN has to be chosen, then the number of ways of doing this is a combination of one thing taken one at a time, or $\binom{1}{1} = \frac{1!}{1!(1-1)!} = 1$. The number of ways to select the remaining 3 networks is a combination of 7 things taken 3 at a time, or $\binom{7}{3} = \frac{7!}{3!(7-3)!} = \frac{7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(3 \cdot 2 \cdot 1)(4 \cdot 3 \cdot 2 \cdot 1)} = 35$. Thus, the total number of ways of selecting 4 networks of which one has to be ESPN is 1(35) = 35.
Finally, the probability of selecting ESPN as one of the 4 networks for the weekend analysis is 35/70 = .5.

3.20 a. Since order does not matter, the number of different bets would be a combination of 8 things taken 2 at a time. The number of ways would be $\binom{8}{2} = \frac{8!}{2!(8-2)!} = \frac{8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(2 \cdot 1)(6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1)} = \frac{40{,}320}{1{,}440} = 28$.
b. If all players are of equal ability, then each of the 28 sample points would be equally likely, each with a probability of 1/28 of occurring. There is only one sample point with values 2 and 7. Thus, the probability of winning with a bet of 2-7 would be 1/28, or .0357.

3.21 Since one would be selecting 3 stocks from 15 without replacement, the total number of ways to select the 3 stocks would be a combination of 15 things taken 3 at a time. The number of ways would be
$\binom{15}{3} = \frac{15!}{3!(15-3)!} = \frac{15 \cdot 14 \cdot 13 \cdots 3 \cdot 2 \cdot 1}{(3 \cdot 2 \cdot 1)(12 \cdot 11 \cdot 10 \cdots 3 \cdot 2 \cdot 1)} = \frac{1.307674368 \times 10^{12}}{2{,}874{,}009{,}600} = 455$

3.22 Denote Pu = public, Pr = private, B = bedrock, U = unconsolidated, BL = below limit, D = detect.
a. The 8 sample points for this experiment, in which the well class (public or private), the aquifer (bedrock or unconsolidated), and the detectable MTBE level (below limit or detect) of a well are observed, are as follows:
(Pu, B, BL) (Pu, U, BL) (Pr, B, BL) (Pr, U, BL) (Pu, B, D) (Pu, U, D) (Pr, B, D) (Pr, U, D)
b. P(Pu, B, BL) = 57/223 = 0.256   P(Pr, B, BL) = 81/223 = 0.363
P(Pu, B, D) = 41/223 = 0.184   P(Pr, B, D) = 22/223 = 0.099
P(Pu, U, BL) = 15/223 = 0.067   P(Pr, U, BL) = 0/223 = 0.000
P(Pu, U, D) = 7/223 = 0.031   P(Pr, U, D) = 0/223 = 0.000
c. Define the following event: D = {Well has a detectable level of MTBE}
P(D) = P(Pu, B, D) + P(Pu, U, D) + P(Pr, B, D) + P(Pr, U, D) = 0.184 + 0.031 + 0.099 + 0 = 0.314
This means that if one well is chosen at random, the probability that it has a detectable level of MTBE is .314.

3.23 a. Since we want to maximize the purchase of grill #2, grill #2 must be one of the 3 grills in the display.
Thus, we have to pick 2 more grills from the 4 remaining grills. Since order does not matter, the number of different ways to select 2 grills from 4 would be a combination of 4 things taken 2 at a time. The number of ways is:
$\binom{4}{2} = \frac{4!}{2!(4-2)!} = \frac{4 \cdot 3 \cdot 2 \cdot 1}{(2 \cdot 1)(2 \cdot 1)} = \frac{24}{4} = 6$
Let Gi represent Grill i. The possibilities are: G1G2G3, G1G2G4, G1G2G5, G2G3G4, G2G3G5, G2G4G5
b. To find reasonable probabilities for the 6 possibilities, we divide the frequencies by the total sample size of 124. The probabilities would be:
P(G1G2G3) = 35/124 = .282   P(G1G2G4) = 8/124 = .065   P(G1G2G5) = 42/124 = .339
P(G2G3G4) = 4/124 = .032   P(G2G3G5) = 1/124 = .008   P(G2G4G5) = 34/124 = .274
c. P(display contains Grill #1) = P(G1G2G3) + P(G1G2G4) + P(G1G2G5) = .282 + .065 + .339 = .686

3.24 a. Let H = Hyundai Elantra, T = Toyota Prius, and S = Subaru Forester. All possible rankings are as follows, where the first car listed is ranked first, the second car listed is ranked second, and the third car listed is ranked third:
H,T,S   H,S,T   S,H,T   S,T,H   T,H,S   T,S,H
b. If each set of rankings is equally likely, then each has a probability of 1/6.
The probability that the Toyota Prius is ranked first is P(T,H,S) + P(T,S,H) = 1/6 + 1/6 = 2/6 = 1/3.
The probability that the Hyundai Elantra is ranked third is P(S,T,H) + P(T,S,H) = 1/6 + 1/6 = 2/6 = 1/3.
The probability that the Toyota Prius is ranked first and the Subaru Forester is ranked second is P(T,S,H) = 1/6.

3.25 a. The odds in favor of an Oxford Shoes win are $\frac{1}{3}$ to $1 - \frac{1}{3} = \frac{2}{3}$, or 1 to 2.
b. If the odds in favor of Oxford Shoes are 1 to 1, then the probability that Oxford Shoes wins is $\frac{1}{1+1} = \frac{1}{2}$.
c. If the odds against Oxford Shoes are 3 to 2, then the odds in favor of Oxford Shoes are 2 to 3. Therefore, the probability that Oxford Shoes wins is $\frac{2}{2+3} = \frac{2}{5}$.

3.26 First, we need to compute the total number of ways we can select 2 bullets (a pair) from 1,837 bullets.
This is a combination of 1,837 things taken 2 at a time. The number of pairs is:
$\binom{1{,}837}{2} = \frac{1{,}837!}{2!(1{,}837 - 2)!} = \frac{1837 \cdot 1836 \cdot 1835 \cdots 1}{(2 \cdot 1)(1835 \cdot 1834 \cdots 1)} = \frac{1837 \cdot 1836}{2} = 1{,}686{,}366$
The probability of a false positive is the number of false positives divided by the number of pairs:
P(false positive) = # false positives / # pairs = 693/1,686,366 = .0004
This probability is very small: there would be only about 4 false positives out of every 10,000 pairs. I would have confidence in the FBI's forensic evidence.

3.27 a. The number of ways the 5 commissioners can vote is $2(2)(2)(2)(2) = 2^5 = 32$. (Each of the 5 commissioners has 2 choices for his/her vote: For or Against.)
b. Let F denote a vote 'For' and A denote a vote 'Against'. The 32 sample points would be:
FFFFF FFFFA FFFAF FFAFF FAFFF AFFFF
FFFAA FFAFA FAFFA AFFFA FFAAF FAFAF AFFAF FAAFF AFAFF AAFFF
FFAAA FAFAA FAAFA FAAAF AFFAA AFAFA AFAAF AAFFA AAFAF AAAFF
FAAAA AFAAA AAFAA AAAFA AAAAF AAAAA
Each of the sample points should be equally likely. Thus, each would have a probability of 1/32.
c. The sample points that result in a 2-2 split for the other 4 commissioners are:
FFAAF FAFAF AFFAF FAAFF AFAFF AAFFF FFAAA FAFAA FAAFA AFFAA AFAFA AAFFA
There are 12 sample points.
d. Let V = event that your vote counts. P(V) = 12/32 = 0.375.
e. If there are now only 3 commissioners in the bloc, then the total number of ways the bloc can vote is $2(2)(2) = 2^3 = 8$. The sample points would be:
FFF FFA FAF AFF FAA AFA AAF AAA
The number of sample points where your vote would count is 4: FAF, AFF, FAA, AFA
Let W = event that your vote counts in the bloc. P(W) = 4/8 = 0.5.

3.28 a. $P(B^c) = 1 - P(B) = 1 - .7 = .3$
b. $P(A^c) = 1 - P(A) = 1 - .4 = .6$
c. $P(A \cup B) = P(A) + P(B) - P(A \cap B) = .4 + .7 - .3 = .8$

3.29 a. A: {HHH, HHT, HTH, THH, TTH, THT, HTT}
B: {HHH, TTH, THT, HTT}
$A \cup B$: {HHH, HHT, HTH, THH, TTH, THT, HTT}
$A^c$: {TTT}
$A \cap B$: {HHH, TTH, THT, HTT}
b. $P(A) = \frac{7}{8}$, $P(B) = \frac{4}{8} = \frac{1}{2}$, $P(A \cup B) = \frac{7}{8}$, $P(A^c) = \frac{1}{8}$, $P(A \cap B) = \frac{4}{8} = \frac{1}{2}$
c. $P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{7}{8} + \frac{1}{2} - \frac{1}{2} = \frac{7}{8}$
d. No. $P(A \cap B) = \frac{1}{2}$, which is not 0.

3.30 The experiment consists of rolling a pair of fair dice. The sample points are:
1,1  1,2  1,3  1,4  1,5  1,6
2,1  2,2  2,3  2,4  2,5  2,6
3,1  3,2  3,3  3,4  3,5  3,6
4,1  4,2  4,3  4,4  4,5  4,6
5,1  5,2  5,3  5,4  5,5  5,6
6,1  6,2  6,3  6,4  6,5  6,6
Since each die is fair, each sample point is equally likely. The probability of each sample point is 1/36.
a. A: {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}
B: {(1, 4), (2, 4), (3, 4), (4, 4), (5, 4), (6, 4), (4, 1), (4, 2), (4, 3), (4, 5), (4, 6)}
$A \cap B$: {(3, 4), (4, 3)}
$A \cup B$: {(1, 4), (2, 4), (3, 4), (4, 4), (5, 4), (6, 4), (4, 1), (4, 2), (4, 3), (4, 5), (4, 6), (1, 6), (2, 5), (5, 2), (6, 1)}
$A^c$: {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 1), (2, 2), (2, 3), (2, 4), (2, 6), (3, 1), (3, 2), (3, 3), (3, 5), (3, 6), (4, 1), (4, 2), (4, 4), (4, 5), (4, 6), (5, 1), (5, 3), (5, 4), (5, 5), (5, 6), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}
b. $P(A) = 6\left(\frac{1}{36}\right) = \frac{6}{36} = \frac{1}{6}$
$P(B) = 11\left(\frac{1}{36}\right) = \frac{11}{36}$
$P(A \cap B) = 2\left(\frac{1}{36}\right) = \frac{2}{36} = \frac{1}{18}$
$P(A \cup B) = 15\left(\frac{1}{36}\right) = \frac{15}{36} = \frac{5}{12}$
$P(A^c) = 30\left(\frac{1}{36}\right) = \frac{30}{36} = \frac{5}{6}$
c. $P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{1}{6} + \frac{11}{36} - \frac{1}{18} = \frac{6}{36} + \frac{11}{36} - \frac{2}{36} = \frac{15}{36} = \frac{5}{12}$
d. A and B are not mutually exclusive. To be mutually exclusive, $P(A \cap B)$ must be 0. Here, $P(A \cap B) = \frac{1}{18}$.

3.31 a. $P(A) = P(E_1) + P(E_2) + P(E_3) + P(E_5) + P(E_6) = \frac{1}{5} + \frac{1}{5} + \frac{1}{5} + \frac{1}{20} + \frac{1}{10} = \frac{15}{20} = \frac{3}{4}$
b. $P(B) = P(E_2) + P(E_3) + P(E_4) + P(E_7) = \frac{1}{5} + \frac{1}{5} + \frac{1}{20} + \frac{1}{5} = \frac{13}{20}$
c. $P(A \cup B) = P(E_1) + P(E_2) + P(E_3) + P(E_4) + P(E_5) + P(E_6) + P(E_7) = \frac{1}{5} + \frac{1}{5} + \frac{1}{5} + \frac{1}{20} + \frac{1}{20} + \frac{1}{10} + \frac{1}{5} = 1$
d. $P(A \cap B) = P(E_2) + P(E_3) = \frac{1}{5} + \frac{1}{5} = \frac{2}{5}$
e. $P(A^c) = 1 - P(A) = 1 - \frac{3}{4} = \frac{1}{4}$
f. $P(B^c) = 1 - P(B) = 1 - \frac{13}{20} = \frac{7}{20}$
g. $P(A \cup A^c) = P(E_1) + P(E_2) + P(E_3) + P(E_4) + P(E_5) + P(E_6) + P(E_7) = \frac{1}{5} + \frac{1}{5} + \frac{1}{5} + \frac{1}{20} + \frac{1}{20} + \frac{1}{10} + \frac{1}{5} = 1$
h. $P(A^c \cap B) = P(E_4) + P(E_7) = \frac{1}{20} + \frac{1}{5} = \frac{5}{20} = \frac{1}{4}$
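The event calculations in Exercise 3.30 can be checked by brute-force enumeration of all 36 outcomes. In this Python sketch, A and B are described by rules (the sum of the dice is 7; at least one die shows a 4) that are inferred from the sample points listed in the solution rather than quoted from the exercise; exact Fraction arithmetic avoids rounding.

```python
from fractions import Fraction
from itertools import product

# Exercise 3.30: all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))
p_outcome = Fraction(1, len(outcomes))


def prob(event):
    """Probability of an event given as a set of equally likely outcomes."""
    return len(event) * p_outcome


A = {o for o in outcomes if sum(o) == 7}  # assumed rule: sum of the dice is 7
B = {o for o in outcomes if 4 in o}       # assumed rule: at least one die shows 4

p_union_direct = prob(A | B)                     # count the union directly
p_union_rule = prob(A) + prob(B) - prob(A & B)   # additive rule

print(prob(A), prob(B), prob(A & B), p_union_direct, prob(set(outcomes) - A))
print(p_union_direct == p_union_rule)
```

Counting the union directly and applying the additive rule should give the same 5/12, which is the point of part c; a nonzero intersection (1/18) is what rules out mutual exclusivity in part d.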
3.32
a. $P(A^c) = P(E_3) + P(E_6) = .2 + .3 = .5$
b. $P(B^c) = P(E_1) + P(E_7) = .10 + .06 = .16$
c. $P(A^c \cap B) = P(E_3) + P(E_6) = .2 + .3 = .5$
d. $P(A \cup B) = P(E_1) + P(E_2) + P(E_3) + P(E_4) + P(E_5) + P(E_6) + P(E_7) = .10 + .05 + .20 + .20 + .06 + .30 + .06 = .97$
e. $P(A \cap B) = P(E_2) + P(E_4) + P(E_5) = .05 + .20 + .06 = .31$
f. $P(A^c \cap B^c) = P(E_8) = .03$
g. No. A and B are mutually exclusive if $P(A \cap B) = 0$. Here, $P(A \cap B) = .31$.

3.33
a. $P(A) = .50 + .10 + .05 = .65$
b. $P(B) = .10 + .07 + .50 + .05 = .72$
c. $P(C) = .25$
d. $P(D) = .05 + .03 = .08$
e. $P(A^c) = .25 + .07 + .03 = .35$ (Note: $P(A^c) = 1 - P(A) = 1 - .65 = .35$.)
f. $P(A \cup B) = P(B) = .10 + .07 + .50 + .05 = .72$
g. $P(A \cap C) = 0$
h. Two events are mutually exclusive if they have no sample points in common, that is, if the probability of their intersection is 0.
$P(A \cap B) = P(A) = .50 + .10 + .05 = .65$. Since this is not 0, A and B are not mutually exclusive.
$P(A \cap C) = 0$. Since this is 0, A and C are mutually exclusive.
$P(A \cap D) = .05$. Since this is not 0, A and D are not mutually exclusive.
$P(B \cap C) = 0$. Since this is 0, B and C are mutually exclusive.
$P(B \cap D) = .05$. Since this is not 0, B and D are not mutually exclusive.
$P(C \cap D) = 0$. Since this is 0, C and D are mutually exclusive.

3.34
a. The outcome "On" and "High" is $A \cap D$.
b. The outcome "Low" or "Medium" is $D^c$.

3.35
a. The event that the analyst makes an early forecast and is only concerned with accuracy is $A \cap B$.
b. The event that the analyst is not only concerned with accuracy is $A^c$.
c. The event that the analyst is from a small brokerage firm or makes an early forecast is $C \cup B$.
d. The event that the analyst makes a late forecast and is not only concerned with accuracy is $B^c \cap A^c$.

3.36 Define the following events:
A: {problems with absenteeism}
T: {problems with turnover}
From the problem, $P(A) = .55$, $P(T) = .41$, and $P(A \cap T) = .22$.
$P(\text{problems with either absenteeism or turnover}) = P(A \cup T) = P(A) + P(T) - P(A \cap T) = .55 + .41 - .22 = .74$
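The additive rule in Exercise 3.36 reduces to a one-line computation. This minimal sketch uses only the three probabilities stated in the problem. The function name `union_prob` is a hypothetical helper, not part of any library.

```python
def union_prob(p_a, p_b, p_ab):
    """Additive rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    # The overlap P(A and B) is counted in both P(A) and P(B),
    # so it is subtracted once to avoid double-counting.
    return p_a + p_b - p_ab


# Exercise 3.36: absenteeism (A) and turnover (T)
p_either = union_prob(0.55, 0.41, 0.22)
print(round(p_either, 2))  # 0.74
```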
3.37
a. Define the following events:
L: {Legs only}
W: {Wheels only}
B: {Both legs and wheels}
N: {Neither legs nor wheels}
The sample points are L, W, B, and N.
b. From the given data:
$P(L) = 63/106 = .594$, $P(W) = 20/106 = .189$, $P(B) = 8/106 = .075$, $P(N) = 15/106 = .142$
c. $P(\text{Wheels}) = P(W \text{ or } B) = P(W) + P(B) = .189 + .075 = .264$
d. $P(\text{Legs}) = P(L \text{ or } B) = P(L) + P(B) = .594 + .075 = .669$
e. $P(\text{Either legs or wheels}) = 1 - P(N) = 1 - .142 = .858$

3.38 Define the following event:
A: {Store violates the NIST scanner accuracy standard}
Then $P(A^c) = 1 - P(A) = 1 - 52/60 = 8/60 = .133$.

3.39 Define the following events:
A: {oil structure is active}
I: {oil structure is inactive}
C: {oil structure is caisson}
W: {oil structure is well protector}
F: {oil structure is fixed platform}
a. The simple events are all combinations of structure type and activity status: AC, AW, AF, IC, IW, IF.
b. Reasonable probabilities are the frequencies divided by the sample size of 3,400:
$P(AC) = 503/3{,}400 = .148$
$P(AW) = 225/3{,}400 = .066$
$P(AF) = 1{,}447/3{,}400 = .426$
$P(IC) = 598/3{,}400 = .176$
$P(IW) = 177/3{,}400 = .052$
$P(IF) = 450/3{,}400 = .132$
c. $P(A) = P(AC) + P(AW) + P(AF) = .148 + .066 + .426 = .640$
d. $P(W) = P(AW) + P(IW) = .066 + .052 = .118$
e. $P(IC) = .176$
f. $P(I \cup F) = P(IC) + P(IW) + P(IF) + P(AF) = .176 + .052 + .132 + .426 = .786$
g. $P(C^c) = 1 - P(C) = 1 - [P(AC) + P(IC)] = 1 - (.148 + .176) = 1 - .324 = .676$

3.40 Define the following events:
M: {UK citizen visits MySpace}
B: {UK citizen visits Bebo}
a. The Venn diagram that illustrates the use of social networking sites in the UK is:
[Venn diagram of overlapping events M and B, labeled M: 4%, $M \cap B$: 1%, B: 3%]
b. $P(M \cup B) = P(M) + P(B) - P(M \cap B) = 0.04 + 0.03 - 0.01 = 0.06$
c. $P(M^c \cap B^c) = 1 - P(M \cup B) = 1 - 0.06 = 0.94$

3.41 First, define the following events:
F: {Fully compensated}
P: {Partially compensated}
N: {Non-compensated}
R: {Left because of retirement}
From the text, we know $P(F) = 127/244$, $P(P) = 45/244$, $P(N) = 72/244$, and $P(R) = (7 + 11 + 10)/244 = 28/244$.
a. $P(F) = 127/244$
b. $P(F \cap R) = 7/244$
c. $P(F^c) = 1 - P(F) = 1 - 127/244 = 117/244$
d. $P(F \cup R) = P(F) + P(R) - P(F \cap R) = 127/244 + 28/244 - 7/244 = 148/244$

3.42 Define the following events:
I: {Invests in market}
N: {No investment}
a. $P(I) = 44{,}651/158{,}044 = .283$
b. $P(\text{IQ} \ge 6) = (31{,}943 + 17{,}958 + 12{,}145 + 9{,}531)/158{,}044 = 71{,}577/158{,}044 = .453$
c. $P(I \cap \{\text{IQ} \ge 6\}) = (10{,}270 + 6{,}698 + 5{,}135 + 4{,}464)/158{,}044 = 26{,}567/158{,}044 = .168$
d. $P(I \cup \{\text{IQ} \ge 6\}) = P(I) + P(\text{IQ} \ge 6) - P(I \cap \{\text{IQ} \ge 6\}) = .283 + .453 - .168 = .568$
e. $P(I^c) = 1 - P(I) = 1 - .283 = .717$
f. Two events are mutually exclusive if the probability of their intersection is 0. $P(I \cap \{\text{IQ} = 1\}) = 893/158{,}044 = .006$. Since this value is not 0, these two events are not mutually exclusive.

3.43
a. $P \cap S \cap A$. Products 6 and 7 are contained in this intersection.
b. $P(\text{possess all the desired characteristics}) = P(P \cap S \cap A) = P(6) + P(7) = 1/10 + 1/10 = 1/5$
c. $A \cup S$: $P(A \cup S) = P(2) + P(3) + P(5) + P(6) + P(7) + P(8) + P(9) + P(10) = 8(1/10) = 8/10 = 4/5$
d. $P \cap S$: $P(P \cap S) = P(2) + P(6) + P(7) = 1/10 + 1/10 + 1/10 = 3/10$

3.44 Define the following events:
G: {Student is assigned to the guilty state}
C: {Student chooses the stated option}
a. Then $P(G) = 57/171 = .333$.
b. $P(C) = 60/171 = .351$
c. $P(G \cap C) = 45/171 = .263$
d. $P(G \cup C) = P(G) + P(C) - P(G \cap C) = .333 + .351 - .263 = .421$

3.45 Define the following events:
M1: {Model 1}
M2: {Model 2}
a. $P(5) = 85/160 = .531$
b. $P(5 \cup 0) = P(5) + P(0) - P(5 \cap 0) = .531 + 35/160 - 0 = .531 + .219 = .75$
c. $P(M_2 \cap 0) = 15/160 = .094$

3.46 Define the following events:
A: {Individual tax return is audited by the IRS}
B: {Corporation tax return is audited by the IRS}
a. $P(A) = 1{,}581{,}394/142{,}823{,}105 = .0111$
b. $P(A^c) = 1 - P(A) = 1 - .0111 = .9889$
c. $P(B) = 29{,}803/2{,}143{,}808 = .0139$
d. $P(B^c) = 1 - P(B) = 1 - .0139 = .9861$

3.47 Define the following events:
A: {Air pressure is over-reported by 4 psi or more}
B: {Air pressure is over-reported by 6 psi or more}
C: {Air pressure is over-reported by 8 psi or more}
a. For gas station air pressure gauges that read 35 psi, $P(B) = .09$.
b. For gas station air pressure gauges that read 55 psi, $P(C) = .09$.
c. For gas station air pressure gauges that read 25 psi, $P(A^c) = 1 - P(A) = 1 - .16 = .84$.
d. No. If air pressure is over-reported by 6 psi or more, then it is also over-reported by 4 psi or more. Thus, these 2 events are not mutually exclusive.
e. The columns in the table are not mutually exclusive. All events in the last column (% over-reported by 8 psi or more) are also part of the events in the first and second columns, and all events in the second column are also part of the events in the first column. In addition, there is no column for the event ‘Over-reported by less than 4 psi or not over-reported’.

3.48 There are a total of $6 \cdot 6 \cdot 6 = 216$ possible outcomes from throwing 3 fair dice. To see this, suppose the three dice are different colors: red, blue, and green. When we roll these dice, we record the outcome of the red die first, the blue die second, and the green die third. There are 6 possible outcomes for the first position, 6 for the second, and 6 for the third, which leads to the 216 possible outcomes.
The Grand Duke argued that the chance of getting a sum of 9 and the chance of getting a sum of 10 should be the same, since the number of partitions for 9 and 10 is the same. These partitions are:
9: 126, 135, 144, 225, 234, 333
10: 136, 145, 226, 235, 244, 334
In each case, there are 6 partitions. However, if we take into account the three colors of the dice, then there are various ways to get each partition.
For instance, to get a partition of 126, we could get 126, 162, 216, 261, 612, and 621 (again, think of the red die first, the blue die second, and the green die third). However, to get a partition of 333, there is only 1 way. To get a partition of 144, there are 3 ways: 144, 414, and 441. The numbers of ways to get each of the above partitions are:
Sum of 9:    126  135  144  225  234  333   Total
# ways:        6    6    3    3    6    1      25
Sum of 10:   136  145  226  235  244  334   Total
# ways:        6    6    3    6    3    3      27
Thus, there are a total of 25 ways to get a sum of 9 and 27 ways to get a sum of 10. The chance of throwing a sum of 9 (25 chances out of 216 possibilities) is less than the chance of throwing a 10 (27 chances out of 216 possibilities).

3.49
a. $P(A \mid B) = P(A \cap B)/P(B) = .1/.2 = .5$
b. $P(B \mid A) = P(A \cap B)/P(A) = .1/.4 = .25$
c. Events A and B are said to be independent if $P(A \mid B) = P(A)$. In this case, $P(A \mid B) = .5$ and $P(A) = .4$. Thus, A and B are not independent.

3.50
a. $P(A \cap B) = P(A \mid B)P(B) = .6(.2) = .12$
b. $P(B \mid A) = P(A \cap B)/P(A) = .12/.4 = .3$

3.51
a. If two events are independent, then $P(A \cap B) = P(A)P(B) = .4(.2) = .08$.
b. If two events are independent, then $P(A \mid B) = P(A) = .4$.
c. $P(A \cup B) = P(A) + P(B) - P(A \cap B) = .4 + .2 - .08 = .52$

3.52
a. Since A and B are mutually exclusive events, $P(A \cup B) = P(A) + P(B) = .30 + .55 = .85$.
b. Since A and C are mutually exclusive events, $P(A \cap C) = 0$.
c. $P(A \mid B) = P(A \cap B)/P(B) = 0/.55 = 0$
d. Since B and C are mutually exclusive events, $P(B \cup C) = P(B) + P(C) = .55 + .15 = .70$.
e. No, B and C cannot be independent events because they are mutually exclusive events with nonzero probabilities.

3.53
a. $P(A) = P(E_1) + P(E_2) + P(E_3) = .2 + .3 + .3 = .8$
$P(B) = P(E_2) + P(E_3) + P(E_5) = .3 + .3 + .1 = .7$
$P(A \cap B) = P(E_2) + P(E_3) = .3 + .3 = .6$
b. $P(E_1 \mid A) = \frac{P(E_1 \cap A)}{P(A)} = \frac{P(E_1)}{P(A)} = \frac{.2}{.8} = .25$
$P(E_2 \mid A) = \frac{P(E_2 \cap A)}{P(A)} = \frac{P(E_2)}{P(A)} = \frac{.3}{.8} = .375$
$P(E_3 \mid A) = \frac{P(E_3 \cap A)}{P(A)} = \frac{P(E_3)}{P(A)} = \frac{.3}{.8} = .375$
The original sample point probabilities are in the proportion .2 to .3 to .3, or 2 to 3 to 3. The conditional probabilities for these sample points are in the proportion .25 to .375 to .375, or 2 to 3 to 3.
c. (1) $P(B \mid A) = P(E_2 \mid A) + P(E_3 \mid A) = .375 + .375 = .75$ (from part b)
(2) $P(B \mid A) = P(A \cap B)/P(A) = .6/.8 = .75$ (from part a)
The two methods do yield the same result.
d. If A and B are independent events, $P(B \mid A) = P(B)$. From part c, $P(B \mid A) = .75$. From part a, $P(B) = .7$. Since $.75 \ne .7$, A and B are not independent events.

3.54
a. If two fair coins are tossed, there are 4 possible outcomes or simple events:
E1 = HH, E2 = HT, E3 = TH, E4 = TT
Event A contains the simple events E1, E2, and E3. Event B contains the simple events E2 and E3.
A Venn diagram of this would be:
[Venn diagram: E2 and E3 lie in $A \cap B$, E1 lies in A only, and E4 lies outside both A and B]
Since the coins are fair, each of the sample points is equally likely, so each has probability 1/4.
b. $P(A) = 3(1/4) = 3/4 = .75$
$P(B) = 2(1/4) = 2/4 = .5$
$P(A \cap B) = P(E_2) + P(E_3) = 1/4 + 1/4 = 2/4 = 1/2 = .5$
c. $P(A \mid B) = P(A \cap B)/P(B) = .5/.5 = 1$
$P(B \mid A) = P(A \cap B)/P(A) = .5/.75 = .667$

3.55
a. $P(A) = P(E_1) + P(E_3) = .22 + .15 = .37$
b. $P(B) = P(E_2) + P(E_3) + P(E_4) = .31 + .15 + .22 = .68$
c. $P(A \cap B) = P(E_3) = .15$
d. $P(A \mid B) = P(A \cap B)/P(B) = .15/.68 = .2206$
e. $P(B \cap C) = 0$
f. $P(C \mid B) = P(C \cap B)/P(B) = 0/.68 = 0$
g. For pair A and B: A and B are not independent because $P(A \mid B) \ne P(A)$, that is, $.2206 \ne .37$.
For pair A and C: $P(A \cap C) = P(E_1) = .22$ and $P(C) = P(E_1) + P(E_5) = .22 + .10 = .32$, so $P(A \mid C) = P(A \cap C)/P(C) = .22/.32 = .6875$. A and C are not independent because $P(A \mid C) \ne P(A)$, that is, $.6875 \ne .37$.
For pair B and C: B and C are not independent because $P(C \mid B) \ne P(C)$, that is, $0 \ne .32$.
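The conditional-probability and independence checks in Exercise 3.55 can be sketched in Python. The probabilities for E1 through E5 are taken from the solution's arithmetic: E2 = .31 and E4 = .22 come from part b, and E5 = .10 comes from the computation of P(C). The helper names are hypothetical. Floating-point sums are inexact, so the independence test compares values with `math.isclose` rather than `==`.

```python
from math import isclose

# Assumed sample-point probabilities, read off the worked solution to 3.55.
p = {"E1": 0.22, "E2": 0.31, "E3": 0.15, "E4": 0.22, "E5": 0.10}
A = {"E1", "E3"}
B = {"E2", "E3", "E4"}
C = {"E1", "E5"}


def prob(event):
    """Sum of sample-point probabilities (0 for the empty event)."""
    return sum(p[e] for e in event)


def cond(x, given):
    """Conditional probability P(x | given) = P(x and given) / P(given)."""
    return prob(x & given) / prob(given)


def independent(x, y):
    """x and y are independent when P(x | y) equals P(x)."""
    return isclose(cond(x, y), prob(x))


print(round(cond(A, B), 4))  # 0.2206 (part d)
print(round(cond(A, C), 4))  # 0.6875 (part g)
print(cond(C, B))            # 0.0    (part f)
print(independent(A, B), independent(A, C), independent(B, C))  # False False False
```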
3.56 The 36 possible outcomes obtained when tossing two dice are listed below:
(1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6)
(2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (2, 6)
(3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6)
(4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6)
(5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6)
(6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6)
A: {(1, 2), (1, 4), (1, 6), (2, 1), (2, 3), (2, 5), (3, 2), (3, 4), (3, 6), (4, 1), (4, 3), (4, 5), (5, 2), (5, 4), (5, 6), (6, 1), (6, 3), (6, 5)}
B: {(3, 6), (4, 5), (5, 4), (5, 6), (6, 3), (6, 5), (6, 6)}
$A \cap B$: {(3, 6), (4, 5), (5, 4), (5, 6), (6, 3), (6, 5)}
If A and B are independent, then $P(A)P(B) = P(A \cap B)$.
$P(A) = 18/36 = 1/2$, $P(B) = 7/36$, $P(A \cap B) = 6/36 = 1/6$
$P(A)P(B) = (1/2)(7/36) = 7/72 \ne 1/6 = P(A \cap B)$. Thus, A and B are not independent.

3.57
a. $P(A \cap C) = 0$, so A and C are mutually exclusive.
$P(B \cap C) = 0$, so B and C are mutually exclusive.
b. $P(A) = P(1) + P(2) + P(3) = .20 + .05 + .30 = .55$
$P(B) = P(3) + P(4) = .30 + .10 = .40$
$P(C) = P(5) + P(6) = .10 + .25 = .35$
$P(A \cap B) = P(3) = .30$
$P(A \mid B) = P(A \cap B)/P(B) = .30/.40 = .75$
A and B are independent if $P(A \mid B) = P(A)$. Since $P(A \mid B) = .75$ and $P(A) = .55$, A and B are not independent.
Since A and C are mutually exclusive, they are not independent. Similarly, since B and C are mutually exclusive, they are not independent.
c. Using the probabilities of sample points, $P(A \cup B) = P(1) + P(2) + P(3) + P(4) = .20 + .05 + .30 + .10 = .65$.
Using the additive rule, $P(A \cup B) = P(A) + P(B) - P(A \cap B) = .55 + .40 - .30 = .65$.
Using the probabilities of sample points, $P(A \cup C) = P(1) + P(2) + P(3) + P(5) + P(6) = .20 + .05 + .30 + .10 + .25 = .90$.
Using the additive rule, $P(A \cup C) = P(A) + P(C) - P(A \cap C) = .55 + .35 - 0 = .90$.

3.58 From the exercise, $P(A) = .15$, $P(B) = .10$, and $P(A \cap B) = .05$.
a. If events A and B are mutually exclusive, then $P(A \cap B) = 0$. For this problem, $P(A \cap B) = .05$. Therefore, events A and B are not mutually exclusive.
b. $P(B \mid A) = P(A \cap B)/P(A) = .05/.15 = .333$
c. Events A and B are independent if $P(B \mid A) = P(B)$. For this exercise, $P(B \mid A) = .333$ and $P(B) = .10$. Since these are not equal, events A and B are not independent.

3.59 Define the following events:
A: {Company is a banking/investment company}
B: {Company is based in United States}
From the problem, we know that $P(A \cap B) = 4/20 = .20$ and $P(B) = 9/20 = .45$.
$P(A \mid B) = P(A \cap B)/P(B) = .20/.45 = .444$

3.60 Define the following events:
G: {The respondent is assigned to the guilt state}
A: {The respondent is assigned to the anger state}
C: {The respondent chooses the stated option to repair car}
a. From Exercise 3.44, we know $P(G) = 57/171 = .333$ and $P(G \cap C) = 45/171 = .263$.
$P(C \mid G) = P(G \cap C)/P(G) = .263/.333 = .790$
b. From Exercise 3.44, we know $P(C) = 60/171 = .351$. Thus, $P(C^c) = 1 - .351 = .649$. Also, $P(A \cap C^c) = 50/171 = .292$, so
$P(A \mid C^c) = P(A \cap C^c)/P(C^c) = .292/.649 = .450$
c. Two events C and G are independent if $P(C \cap G) = P(C)P(G)$. From Exercise 3.44, $P(G) = .333$, $P(C) = .351$, and $P(G \cap C) = .263$.
$P(G)P(C) = .333(.351) = .117 \ne .263 = P(G \cap C)$. Thus, C and G are not independent.

3.61 Define the following events:
A: {Internet user has wireless connection via mobile device}
B: {Internet user uses Twitter}
From the exercise, $P(A) = .54$ and $P(B \mid A) = .25$.
$P(A \cap B) = P(B \mid A)P(A) = .25(.54) = .135$

3.62 Define the following events:
A: {Person is victim of identity theft}
B: {Theft occurred from unauthorized use of credit card}
From the exercise, $P(A) = .05$ and $P(B \mid A) = .53$.
a. $P(A) = .05$
b. $P(A \cap B) = P(B \mid A)P(A) = .53(.05) = .0265$

3.63 Define the following events:
F: {Worker is fully compensated}
P: {Worker is partially compensated}
N: {Worker is non-compensated}
R: {Worker retired}
From the exercise, $P(F) = 127/244 = .520$, $P(P) = 45/244 = .184$, $P(R \mid F) = 7/127 = .055$, $P(R \mid P) = 11/45 = .244$, and $P(R \mid N) = 10/72 = .139$.
a. $P(R \mid F) = 7/127 = .055$
b. $P(R \mid N) = 10/72 = .139$
c. The two events are independent if $P(R \mid F) = P(R)$. Here, $P(R) = (7 + 11 + 10)/244 = 28/244 = .115$ and $P(R \mid F) = 7/127 = .055$. Since these are not equal, events R and F are not independent.

3.64 Define the following events:
A: {Respondent works during summer vacation}
B: {Respondent does not work during summer vacation}
C: {Respondent unemployed}
D: {Respondent monitors business emails}
From Exercise 3.14: $P(A) = .46$, $P(B) = .35$, $P(C) = .19$. From this exercise, $P(D \mid A) = .35$.
a. $P(D \mid A) = .35$
b. $P(A \cap D) = P(D \mid A)P(A) = .35(.46) = .161$
c. $P(B \cap D) = 0$ (If an employee is not working, then he/she will not monitor business emails.)

3.65 Define the following events:
I: {Invests in market}
N: {No investment}
a. $P(I \mid \text{IQ} \ge 6) = \frac{P(I \cap \{\text{IQ} \ge 6\})}{P(\text{IQ} \ge 6)} = \frac{(10{,}270 + 6{,}698 + 5{,}135 + 4{,}464)/158{,}044}{(31{,}943 + 17{,}958 + 12{,}145 + 9{,}531)/158{,}044} = \frac{26{,}567/158{,}044}{71{,}577/158{,}044} = .371$
b. $P(I \mid \text{IQ} \le 5) = \frac{P(I \cap \{\text{IQ} \le 5\})}{P(\text{IQ} \le 5)} = \frac{(44{,}651 - 26{,}567)/158{,}044}{(158{,}044 - 71{,}577)/158{,}044} = \frac{18{,}084/158{,}044}{86{,}467/158{,}044} = .209$
c. Yes, it appears that investing in the stock market is dependent on IQ. If investing in the stock market and IQ were independent, then $P(I \mid \text{IQ} \le 5) = P(I \mid \text{IQ} \ge 6) = P(I)$. Since $P(I \mid \text{IQ} \le 5) \ne P(I \mid \text{IQ} \ge 6)$, investing in the stock market and IQ are dependent.

3.66 Define the following events:
$A_i$: {ith CEO selected has a bachelor’s degree}
a. $P(A_1) = 13/40 = .325$
b. If the first 4 CEOs selected each have just a bachelor’s degree, then on the next pick there are only 13 - 4 = 9 such CEOs left. Similarly, after picking 4 CEOs, there are only 40 - 4 = 36 CEOs left to choose from.
$P(A_5 \mid A_1 \cap A_2 \cap A_3 \cap A_4) = 9/36 = .25$

3.67 Define the following events:
A: {Ambulance can travel to location A in under 8 minutes}
B: {Ambulance can travel to location B in under 8 minutes}
C: {Ambulance is busy}
We are given the travel probabilities .58 and .42 and $P(C) = .3$. Treating the travel probabilities as applying when the ambulance is not busy, $P(A \mid C^c) = .58$ and $P(B \mid C^c) = .42$.
a. $P(A \cap C^c) = P(A \mid C^c)P(C^c) = .58(1 - .3) = .406$
b. $P(B \cap C^c) = P(B \mid C^c)P(C^c) = .42(1 - .3) = .294$

3.68 If A and B are independent, then $P(A \cap B) = P(A)P(B)$. For this exercise, $P(A) = (1{,}174 + 416)/1{,}800 = 1{,}590/1{,}800 = .883$, $P(B) = (1{,}174 + 89)/1{,}800 = 1{,}263/1{,}800 = .702$, and $P(A \cap B) = 1{,}174/1{,}800 = .652$.
$P(A)P(B) = .883(.702) = .620 \ne .652 = P(A \cap B)$. Thus, A and B are not independent.

3.69 Define the following events:
A: {Alarm A sounds alarm}
B: {Alarm B sounds alarm}
I: {Intruder}
a. From the problem, $P(A \mid I) = .9$, $P(B \mid I) = .95$, $P(A \mid I^c) = .2$, and $P(B \mid I^c) = .1$.
b. Since the two systems are operating independently of each other, $P(A \cap B \mid I) = P(A \mid I)P(B \mid I) = .9(.95) = .855$.
c. $P(A \cap B \mid I^c) = P(A \mid I^c)P(B \mid I^c) = .2(.1) = .02$
d. $P(A \cup B \mid I) = P(A \mid I) + P(B \mid I) - P(A \cap B \mid I) = .9 + .95 - .855 = .995$

3.70
a. Since there are 2 vineyards and 3 years, there are a total of 2(3) = 6 combinations.
b. Of the 6 combinations, 3 of them are from the Llarga vineyard. Thus, $P(\text{Llarga}) = 3/6 = .5$.
c. Of the 6 combinations, 2 of them are Year 3. Thus, $P(\text{Year 3}) = 2/6 = .333$.
d. If the four tasters are independent, then the probability that each selects Llarga is $P(\text{Llarga})P(\text{Llarga})P(\text{Llarga})P(\text{Llarga}) = .5(.5)(.5)(.5) = .0625$.

3.71 Define the following event:
A: {The specimen labeled “red snapper” was really red snapper}
a. The probability that you are actually served red snapper the next time you order it at a restaurant is $P(A) = 1 - .77 = .23$.
b. $P(\text{at least one customer is actually served red snapper}) = 1 - P(\text{no customer is actually served red snapper})$
$= 1 - P(A^c \cap A^c \cap A^c \cap A^c \cap A^c) = 1 - P(A^c)P(A^c)P(A^c)P(A^c)P(A^c) = 1 - .77^5 = 1 - .271 = .729$
Note: In order to compute the above probability, we had to assume that the trials or events are independent. This assumption is likely not valid. If a restaurant served one customer a look-alike variety, then it probably served the next one a look-alike variety.
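The complement approach in Exercise 3.71(b) computes P(at least one) as 1 minus P(none). It generalizes to any number of independent trials. Below is a minimal sketch that assumes independence, which is exactly what the note above cautions about. `prob_at_least_one` is a hypothetical helper name. The last line applies the same product rule to the all-correct probability in Exercise 3.72(a).

```python
def prob_at_least_one(p_success, n):
    """P(at least one success in n independent trials) = 1 - P(no successes)."""
    # Assumes the n trials are independent with the same success probability.
    p_none = (1 - p_success) ** n
    return 1 - p_none


# Exercise 3.71(b): each order is genuine red snapper with probability .23
p = prob_at_least_one(0.23, 5)
print(round(p, 3))  # 0.729

# Exercise 3.72(a): four independent correct CVSA readings, each with probability .98
print(round(0.98 ** 4, 4))  # 0.9224
```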
3.72 First, define the following event:
A: {CVSA correctly determines the veracity of a suspect}
$P(A) = .98$ (from claim)
a. The event that the CVSA is correct for all four suspects is the event $A \cap A \cap A \cap A$.
$P(A \cap A \cap A \cap A) = .98(.98)(.98)(.98) = .9224$
b. The event that the CVSA is incorrect for at least one of the four suspects is the event $(A \cap A \cap A \cap A)^c$.
$P[(A \cap A \cap A \cap A)^c] = 1 - P(A \cap A \cap A \cap A) = 1 - .9224 = .0776$
c. If the CVSA had an accuracy of .98, then the probability of observing 2 incorrect results would be less than .0776. Since 2 incorrect results were observed, either a rare event occurred or the accuracy of the CVSA is not .98 but something less than .98.

3.73 Define the following events:
A: {Patient receives PMI sheet}
B: {Patient was hospitalized}
$P(A) = .20$, $P(A \cap B) = .12$, and $P(B \mid A) = P(A \cap B)/P(A) = .12/.20 = .60$.

3.74 Define the following events:
I: {Leak ignites immediately (jet fire)}
D: {Leak has delayed ignition (flash fire)}
From the problem, $P(I) = .01$ and $P(D \mid I^c) = .01$.
A delayed ignition can occur only if the leak does not ignite immediately, so $P(D) = P(D \mid I^c)P(I^c)$ and $P(I \cap D) = 0$. The probability of a jet fire or a flash fire is
$P(I \cup D) = P(I) + P(D) - P(I \cap D) = P(I) + P(D \mid I^c)P(I^c) - P(I \cap D) = .01 + .01(1 - .01) - 0 = .01 + .0099 = .0199$
A tree diagram of this problem is:
[Tree diagram: first branch I (.01) or $I^c$ (.99); $I^c$ then branches to D (.01), giving $P(I^c \cap D) = .99(.01) = .0099$, and to $D^c$ (.99), giving $P(I^c \cap D^c) = .99(.99) = .9801$]

3.75
a. If the coin is balanced, then $P(H) = .5$ and $P(T) = .5$ on any trial. Also, we can assume that the result of any coin toss is independent of any other. Thus,
$P(HHHHHHHHHH) = P(H)P(H) \cdots P(H) = .5(.5) \cdots (.5) = .5^{10} = .0009766$
$P(HHTTHTTHHH) = P(H)P(H)P(T)P(T)P(H)P(T)P(T)P(H)P(H)P(H) = .5^{10} = .0009766$
$P(TTTTTTTTTT) = P(T)P(T) \cdots P(T) = .5(.5) \cdots (.5) = .5^{10} = .0009766$
b. Define the following events:
A: {10 coin tosses result in all heads or all tails}
B: {10 coin tosses result in a mix of heads and tails}
$P(A) = P(HHHHHHHHHH) + P(TTTTTTTTTT) = .0009766 + .0009766 = .0019532$
c. $P(B) = 1 - P(A) = 1 - .0019532 = .9980468$
d. From the above probabilities, the chance that either all heads or all tails occurred is extremely small. Thus, if one of these sequences really occurred, it is most likely sequence #2.

3.76 Define the following events:
A: {Algorithm predicts defects}
B: {Module has defects}
C: {Algorithm is correct}
a. Accuracy $= P(C) = P(A \cap B) + P(A^c \cap B^c) = \frac{d}{a+b+c+d} + \frac{a}{a+b+c+d} = \frac{a+d}{a+b+c+d}$
b. Detection rate $= P(A \mid B) = \frac{d}{b+d}$
c. False alarm $= P(A \mid B^c) = \frac{c}{a+c}$
d. Precision $= P(B \mid A) = \frac{d}{c+d}$
e. From the SWDEFECTS file the table is:
                               Module Has Defects
Algorithm Predicts Defects     False     True
No                               400       29
Yes                               49       20
Accuracy $= P(C) = \frac{a+d}{a+b+c+d} = \frac{400 + 20}{400 + 29 + 49 + 20} = \frac{420}{498} = .843$
The probability that the algorithm is correct is .843.
Detection rate $= P(A \mid B) = \frac{d}{b+d} = \frac{20}{29 + 20} = \frac{20}{49} = .408$
The probability that the algorithm predicts a defect given the module is actually defective is .408.
False alarm $= P(A \mid B^c) = \frac{c}{a+c} = \frac{49}{400 + 49} = \frac{49}{449} = .109$
The probability that the algorithm predicts a defect given the module is not defective is .109.
Precision $= P(B \mid A) = \frac{d}{c+d} = \frac{20}{49 + 20} = \frac{20}{69} = .290$
The probability that the module is defective given the algorithm predicted a defect is .290.

3.77
a. $P(B_1 \cap A) = P(A \mid B_1)P(B_1) = .3(.75) = .225$
b. $P(B_2 \cap A) = P(A \mid B_2)P(B_2) = .5(.25) = .125$
c. $P(A) = P(B_1 \cap A) + P(B_2 \cap A) = .225 + .125 = .35$
d. $P(B_1 \mid A) = P(B_1 \cap A)/P(A) = .225/.35 = .643$
e. $P(B_2 \mid A) = P(B_2 \cap A)/P(A) = .125/.35 = .357$
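The four measures in Exercise 3.76 all come from the same 2x2 table, so they can be computed together. This is an illustrative sketch, not software associated with the SWDEFECTS data. The cell letters a, b, c and d follow the layout used in the solution: a and b are the "No" row, c and d are the "Yes" row, and b and d fall in the "True" (defective) column. The function name `classifier_metrics` is hypothetical.

```python
def classifier_metrics(a, b, c, d):
    """Measures from the 2x2 table in Exercise 3.76.

    a: predicts no defect, module not defective
    b: predicts no defect, module defective
    c: predicts defect,    module not defective
    d: predicts defect,    module defective
    """
    n = a + b + c + d
    return {
        "accuracy": (a + d) / n,        # P(A and B) + P(A^c and B^c)
        "detection_rate": d / (b + d),  # P(A | B)
        "false_alarm": c / (a + c),     # P(A | B^c)
        "precision": d / (c + d),       # P(B | A)
    }


m = classifier_metrics(a=400, b=29, c=49, d=20)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.843
# detection_rate: 0.408
# false_alarm: 0.109
# precision: 0.290
```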
3.78 First, we find the following probabilities:
$P(A \cap B_1) = P(A \mid B_1)P(B_1) = .4(.2) = .08$
$P(A \cap B_2) = P(A \mid B_2)P(B_2) = .25(.15) = .0375$
$P(A \cap B_3) = P(A \mid B_3)P(B_3) = .6(.65) = .39$
$P(A) = P(A \cap B_1) + P(A \cap B_2) + P(A \cap B_3) = .08 + .0375 + .39 = .5075$
a. $P(B_1 \mid A) = P(A \cap B_1)/P(A) = .08/.5075 = .158$
b. $P(B_2 \mid A) = P(A \cap B_2)/P(A) = .0375/.5075 = .074$
c. $P(B_3 \mid A) = P(A \cap B_3)/P(A) = .39/.5075 = .768$

3.79 If A is independent of $B_1$, $B_2$, and $B_3$, then $P(A \mid B_1) = P(A) = .4$. Then
$P(B_1 \mid A) = \frac{P(A \mid B_1)P(B_1)}{P(A)} = \frac{.4(.2)}{.4} = .2$

3.80 From the information given, $P(D) = 1/80$, $P(D^c) = 79/80$, $P(N \mid D) = 1/2$, $P(N^c \mid D) = 1/2$, $P(N \mid D^c) = 1$, and $P(N^c \mid D^c) = 0$. Using Bayes’ Rule,
$P(D \mid N) = \frac{P(D \cap N)}{P(N)} = \frac{P(N \mid D)P(D)}{P(N \mid D)P(D) + P(N \mid D^c)P(D^c)} = \frac{(1/2)(1/80)}{(1/2)(1/80) + (1)(79/80)} = \frac{1/160}{1/160 + 158/160} = \frac{1/160}{159/160} = \frac{1}{159} = .0063$

3.81 Define the following events:
E: {Expert makes the correct decision}
N: {Novice makes the correct decision}
M: {Matched condition}
(The similar distracter and non-similar distracter conditions are not used below.)
a. $P(E^c \mid M) = 1 - .9212 = .0788$
b. $P(N^c \mid M) = 1 - .7455 = .2545$
c. Since $P(N^c \mid M) = .2545 > P(E^c \mid M) = .0788$, it is more likely that the participant is a novice.

3.82
a. $P(E_1 \mid \text{error}) = \frac{P(E_1 \cap \text{error})}{P(\text{error})} = \frac{P(\text{error} \mid E_1)P(E_1)}{P(\text{error} \mid E_1)P(E_1) + P(\text{error} \mid E_2)P(E_2) + P(\text{error} \mid E_3)P(E_3)} = \frac{.01(.30)}{.01(.30) + .03(.20) + .02(.50)} = \frac{.003}{.003 + .006 + .01} = \frac{.003}{.019} = .158$
b. $P(E_2 \mid \text{error}) = \frac{P(E_2 \cap \text{error})}{P(\text{error})} = \frac{P(\text{error} \mid E_2)P(E_2)}{P(\text{error} \mid E_1)P(E_1) + P(\text{error} \mid E_2)P(E_2) + P(\text{error} \mid E_3)P(E_3)} = \frac{.03(.20)}{.01(.30) + .03(.20) + .02(.50)} = \frac{.006}{.019} = .316$
c. $P(E_3 \mid \text{error}) = \frac{P(E_3 \cap \text{error})}{P(\text{error})} = \frac{P(\text{error} \mid E_3)P(E_3)}{P(\text{error} \mid E_1)P(E_1) + P(\text{error} \mid E_2)P(E_2) + P(\text{error} \mid E_3)P(E_3)} = \frac{.02(.50)}{.01(.30) + .03(.20) + .02(.50)} = \frac{.01}{.019} = .526$
d. If there was a serious error, the probability that the error was made by engineer 3 is .526. This probability is higher than for any of the other engineers. Thus, engineer #3 is most likely responsible for the error.

3.83
a. Converting the percentages to probabilities, $P(275\text{-}300) = .52$, $P(305\text{-}325) = .39$, and $P(330\text{-}350) = .09$.
b. Using Bayes’ Theorem,
$P(275\text{-}300 \mid CC) = \frac{P(275\text{-}300 \cap CC)}{P(CC)} = \frac{P(CC \mid 275\text{-}300)P(275\text{-}300)}{P(CC \mid 275\text{-}300)P(275\text{-}300) + P(CC \mid 305\text{-}325)P(305\text{-}325) + P(CC \mid 330\text{-}350)P(330\text{-}350)}$
$= \frac{.775(.52)}{.775(.52) + .77(.39) + .86(.09)} = \frac{.403}{.403 + .3003 + .0774} = \frac{.403}{.7807} = .516$

3.84 Define the following events:
U: {Athlete uses testosterone}
P: {Test is positive}
a. Sensitivity is $P(P \mid U) = 50/100 = .5$.
b. Specificity is $P(P^c \mid U^c) = 1 - 9/900 = 1 - .01 = .99$.
c. First, we need to find the probability that an athlete is a user: $P(U) = 100/1{,}000 = .1$. Next, we need to find the probability of a positive test:
$P(P) = P(P \mid U)P(U) + P(P \mid U^c)P(U^c) = .5(.1) + .01(.9) = .05 + .009 = .059$
The positive predictive value is $P(U \mid P) = \frac{P(U \cap P)}{P(P)} = \frac{P(P \mid U)P(U)}{P(P)} = \frac{.5(.1)}{.059} = .847$.

3.85 Define the following events:
S: {Shale}
D: {Dolomite}
G: {Gamma ray reading > 60}
From the exercise: $P(D) = 476/771 = .617$, $P(S) = 295/771 = .383$, $P(G \mid D) = 34/476 = .071$, and $P(G \mid S) = 280/295 = .949$.
$P(D \cap G) = P(G \mid D)P(D) = .071(.617) = .0438$ and
$P(G) = P(G \mid D)P(D) + P(G \mid S)P(S) = .071(.617) + .949(.383) = .0438 + .3635 = .4073$.
Thus, $P(D \mid G) = P(D \cap G)/P(G) = .0438/.4073 = .1075$. Since this probability is so small, we would suggest that the area should not be mined.

3.86 Define the following events:
D: {Defect in steel casting}
H: {NDE detects a ‘hit’ or defect in steel casting}
From the problem, $P(H \mid D) = .97$, $P(H \mid D^c) = .005$, and $P(D) = .01$.
$P(H) = P(H \mid D)P(D) + P(H \mid D^c)P(D^c) = .97(.01) + .005(.99) = .0097 + .00495 = .01465$
$P(D \mid H) = \frac{P(D \cap H)}{P(H)} = \frac{P(H \mid D)P(D)}{P(H)} = \frac{.97(.01)}{.01465} = \frac{.0097}{.01465} = .6621$

3.87 Define the following event:
D: {Chip is defective}
From the exercise, $P(S_1) = .15$, $P(S_2) = .05$, $P(S_3) = .10$, $P(S_4) = .20$, $P(S_5) = .12$, $P(S_6) = .20$, and $P(S_7) = .18$. Also, $P(D \mid S_1) = .001$, $P(D \mid S_2) = .0003$, $P(D \mid S_3) = .0007$, $P(D \mid S_4) = .006$, $P(D \mid S_5) = .0002$, $P(D \mid S_6) = .0002$, and $P(D \mid S_7) = .001$.
a. We must find the probability of each supplier given a defective chip. First,
$P(D) = \sum_{i=1}^{7} P(D \mid S_i)P(S_i) = .001(.15) + .0003(.05) + .0007(.10) + .006(.20) + .0002(.12) + .0002(.20) + .001(.18)$
$= .00015 + .000015 + .00007 + .0012 + .000024 + .00004 + .00018 = .001679$
Then
$P(S_1 \mid D) = \frac{P(S_1 \cap D)}{P(D)} = \frac{P(D \mid S_1)P(S_1)}{P(D)} = \frac{.001(.15)}{.001679} = \frac{.00015}{.001679} = .0893$
$P(S_2 \mid D) = \frac{P(D \mid S_2)P(S_2)}{P(D)} = \frac{.0003(.05)}{.001679} = \frac{.000015}{.001679} = .0089$
$P(S_3 \mid D) = \frac{P(D \mid S_3)P(S_3)}{P(D)} = \frac{.0007(.10)}{.001679} = \frac{.00007}{.001679} = .0417$
$P(S_4 \mid D) = \frac{P(D \mid S_4)P(S_4)}{P(D)} = \frac{.006(.20)}{.001679} = \frac{.0012}{.001679} = .7147$
$P(S_5 \mid D) = \frac{P(D \mid S_5)P(S_5)}{P(D)} = \frac{.0002(.12)}{.001679} = \frac{.000024}{.001679} = .0143$
$P(S_6 \mid D) = \frac{P(D \mid S_6)P(S_6)}{P(D)} = \frac{.0002(.20)}{.001679} = \frac{.00004}{.001679} = .0238$
$P(S_7 \mid D) = \frac{P(D \mid S_7)P(S_7)}{P(D)} = \frac{.001(.18)}{.001679} = \frac{.00018}{.001679} = .1072$
Of these probabilities, .7147 is the largest. This implies that if a failure is observed, supplier number 4 was most likely responsible.
b. If the seven suppliers all produce defective chips at the same rate of .0005, then $P(D \mid S_i) = .0005$ for all i = 1, 2, …, 7 and $P(D) = .0005$.
For any supplier i, $P(S_i \cap D) = P(D \mid S_i)P(S_i) = .0005P(S_i)$ and
$P(S_i \mid D) = \frac{P(S_i \cap D)}{P(D)} = \frac{P(D \mid S_i)P(S_i)}{P(D)} = \frac{.0005P(S_i)}{.0005} = P(S_i)$
Thus, if a defective is observed, then it most likely came from the supplier with the largest proportion of sales (probability). In this case, the most likely supplier would be either supplier 4 or supplier 6. Both of these have probabilities of .20.

3.88 Define the following events:
A: {Alarm A sounds alarm}
B: {Alarm B sounds alarm}
I: {Intruder}
From the problem: $P(A \mid I) = .9$, $P(B \mid I) = .95$, $P(A \mid I^c) = .2$, $P(B \mid I^c) = .1$, and $P(I) = .4$.
Since the two systems are operating independently of each other,
$P(A \cap B \mid I) = P(A \mid I)P(B \mid I) = .9(.95) = .855$
$P(A \cap B \cap I) = P(A \cap B \mid I)P(I) = .855(.4) = .342$
$P(A \cap B \mid I^c) = P(A \mid I^c)P(B \mid I^c) = .2(.1) = .02$
$P(A \cap B \cap I^c) = P(A \cap B \mid I^c)P(I^c) = .02(.6) = .012$
Thus, $P(A \cap B) = P(A \cap B \cap I) + P(A \cap B \cap I^c) = .342 + .012 = .354$.
Finally, $P(I \mid A \cap B) = \frac{P(A \cap B \cap I)}{P(A \cap B)} = \frac{.342}{.354} = .966$

3.89
a. If $\frac{P(T \mid E)}{P(T^c \mid E)} > 1$, then $P(T \mid E) > P(T^c \mid E)$. Thus, the probability of more than two bullets given the evidence is greater than the probability of two bullets given the evidence. This supports the theory that more than two bullets were used in the assassination of JFK.
b. Using Bayes’ Theorem,
$P(T \mid E) = \frac{P(T)P(E \mid T)}{P(T)P(E \mid T) + P(T^c)P(E \mid T^c)}$ and $P(T^c \mid E) = \frac{P(T^c)P(E \mid T^c)}{P(T)P(E \mid T) + P(T^c)P(E \mid T^c)}$.
Thus, $\frac{P(T \mid E)}{P(T^c \mid E)} = \frac{P(T)P(E \mid T)}{P(T^c)P(E \mid T^c)}$.

3.90
a. If the Dow Jones Industrial Average increases, a large New York bank would tend to decrease the prime interest rate. Therefore, the two events are not mutually exclusive since they could occur simultaneously.
b. The next sale by a PC retailer could not be both a notebook and a desktop computer. Since the two events cannot occur simultaneously, the events are mutually exclusive.
c. Since both events cannot occur simultaneously, the events are mutually exclusive.

3.91
a. The two probability rules for a sample space are that the probability for any sample point is between 0 and 1 and that the sum of the probabilities of all the sample points is 1. For this exercise, all the probabilities of the sample points are between 0 and 1 and
$\sum_{i=1}^{4} P(S_i) = P(S_1) + P(S_2) + P(S_3) + P(S_4) = .2 + .1 + .3 + .4 = 1.0$
b. $P(A) = P(S_1) + P(S_4) = .2 + .4 = .6$

3.92 $P(A \cup B) = P(A) + P(B) - P(A \cap B) = .7 + .5 - .4 = .8$

3.93
a. If events A and B are mutually exclusive, then $P(A \cap B) = 0$, so $P(A \mid B) = P(A \cap B)/P(B) = 0/.3 = 0$.
b. No. If events A and B are independent, then $P(A \mid B) = P(A)$. However, from the exercise we know $P(A) = .2$, and from part a, we know $P(A \mid B) = 0$. Thus, events A and B are not independent.

3.94
a. Because events A and B are independent, we have $P(A \cap B) = P(A)P(B) = .3(.1) = .03$. Thus, $P(A \cap B) \ne 0$, and the two events cannot be mutually exclusive.
b. $P(A \mid B) = P(A \cap B)/P(B) = .03/.1 = .3$
$P(B \mid A) = P(A \cap B)/P(A) = .03/.3 = .1$
c. $P(A \cup B) = P(A) + P(B) - P(A \cap B) = .3 + .1 - .03 = .37$

3.95 $P(A \cap B) = .4$ and $P(A \mid B) = .8$. Since $P(A \mid B) = P(A \cap B)/P(B)$, substitute the given probabilities into the formula and solve for P(B):
$.8 = \frac{.4}{P(B)} \Rightarrow P(B) = \frac{.4}{.8} = .5$

3.96 The number of ways to select 5 things from 50 is a combination of 50 things taken 5 at a time, or
$\binom{50}{5} = \frac{50!}{5!(50-5)!} = \frac{50!}{5!\,45!} = \frac{50 \cdot 49 \cdot 48 \cdot 47 \cdot 46 \cdot 45!}{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 \cdot 45!} = 2{,}118{,}760$

3.97
a. $P(A \cap B) = 0$
$P(B \cap C) = P(2) = .2$
$P(A \cup C) = P(1) + P(2) + P(3) + P(5) + P(6) = .3 + .2 + .1 + .1 + .2 = .9$
$P(A \cup B \cup C) = P(1) + P(2) + P(3) + P(4) + P(5) + P(6) = .3 + .2 + .1 + .1 + .1 + .2 = 1$
$P(B^c) = P(1) + P(3) + P(5) + P(6) = .3 + .1 + .1 + .2 = .7$
$P(A^c \cap B) = P(2) + P(4) = .2 + .1 = .3$
$P(B \mid C) = \dfrac{P(B \cap C)}{P(C)} = \dfrac{P(2)}{P(2) + P(5) + P(6)} = \dfrac{.2}{.2 + .1 + .2} = \dfrac{.2}{.5} = .4$
$P(B \mid A) = \dfrac{P(B \cap A)}{P(A)} = \dfrac{0}{P(A)} = 0$
b. Since $P(A \cap B) = 0$ and $P(A)P(B) \neq 0$, these two would not be equal, implying A and B are not independent. However, A and B are mutually exclusive, since $P(A \cap B) = 0$.
c. $P(B) = P(2) + P(4) = .2 + .1 = .3$. But $P(B \mid C)$, calculated above, is .4. Since these are not equal, B and C are not independent. Since $P(B \cap C) = .2 \neq 0$, B and C are not mutually exclusive.

3.98 a. $6! = 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 = 720$
b. $\dbinom{10}{9} = \dfrac{10!}{9!(10-9)!} = \dfrac{10 \cdot 9 \cdot 8 \cdots 1}{(9 \cdot 8 \cdots 1)(1)} = 10$
c. $\dbinom{10}{1} = \dfrac{10!}{1!(10-1)!} = \dfrac{10 \cdot 9 \cdot 8 \cdots 1}{(1)(9 \cdot 8 \cdots 1)} = 10$
d. $\dbinom{6}{3} = \dfrac{6!}{3!(6-3)!} = \dfrac{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(3 \cdot 2 \cdot 1)(3 \cdot 2 \cdot 1)} = 20$
e. $0! = 1$

3.99 Define the following events:
E: {Industrial accident caused by faulty Engineering & Design}
P: {Industrial accident caused by faulty Procedures & Practices}
M: {Industrial accident caused by faulty Management & Oversight}
T: {Industrial accident caused by faulty Training & Communication}
a. The sample points for this problem are E, P, M, and T. Reasonable probabilities are $P(E) = 27/83 = .3253$, $P(P) = 24/83 = .2892$, $P(M) = 22/83 = .2651$, and $P(T) = 10/83 = .1205$.
b. $P(E) = 27/83 = .3253$. Approximately 32.53% of all industrial accidents are caused by faulty Engineering & Design.
c. P(Industrial accident caused by something other than procedures & practices) $= P(P^c) = 1 - P(P) = 1 - .2892 = .7108$. Approximately 71.08% of all industrial accidents are caused by something other than faulty Procedures & Practices.

3.100 a. Define the following events:
J: {Raise based on job performance}
C: {Raise based on cost of living}
U: {Unsure}
The 3 sample points are J, C, and U.
b. We will base the probabilities on the proportions of the 10,000 U.S. workers surveyed who responded in each category. Thus, $P(J) = .35$, $P(C) = .50$, and $P(U) = .15$.
c. P(Raise based on either job performance or cost of living) $= P(J) + P(C) = .35 + .50 = .85$

3.101 Define the event:
B: {Small business owned by non-Hispanic white female}
From the problem, $P(B) = .27$. The probability that a small business owned by a non-Hispanic white is male-owned is $P(B^c) = 1 - P(B) = 1 - .27 = .73$.

3.102 Define the following events:
C: {Public school building has inadequate plumbing}
D: {Public school has plans for repairing building}
From the problem, we know $P(C) = .25$ and $P(D \mid C) = .38$. Thus, $P(C \cap D) = P(D \mid C)P(C) = .38(.25) = .095$.

3.103 a. This statement is false. All probabilities are between 0 and 1 inclusive; one cannot have a probability of 4.
b. If we assume that the probabilities are the same as the percents (changed to proportions), then this is a true statement: $P(4 \text{ or } 5) = P(4) + P(5) = .6020 + .1837 = .7857$.
c. This statement is true. There were no observations with one star. Thus, $P(1) = 0$.
d. This statement is false. $P(2) = .0408$ and $P(5) = .1837$, so $P(5) > P(2)$.

3.104 Define the following events:
S: {Cause of fatal crash is speeding}
C: {Cause of fatal crash is missing a curve}
From the problem, we know $P(S) = .3$ and $P(S \cap C) = .12$. Thus,
$P(C \mid S) = \dfrac{P(C \cap S)}{P(S)} = \dfrac{.12}{.3} = .4$

3.105 a. B C
b. $A^c$
c. CB
d. A $C^c$

3.106 a. The 5 sample points are: Total population, Agricultural change, Presence of industry, Growth, and Population concentration.
b. The probabilities are best estimated with the sample proportions. Thus,
P(Total population) = .18
P(Agricultural change) = .05
P(Presence of industry) = .27
P(Growth) = .05
P(Population concentration) = .45
c. Define the following event:
A: {Factor specified is population-related}
P(A) = P(Total population) + P(Growth) + P(Population concentration) $= .18 + .05 + .45 = .68$
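The arithmetic in Exercises 3.88, 3.96, and 3.98 can be re-checked in a few lines of Python. The sketch below is only an illustration, not part of the textbook's method. Like the solution to 3.88, it assumes the two alarms are conditionally independent given an intruder and given no intruder. The function name `posterior_given_both` is hypothetical. `math.comb` and `math.factorial` are the standard-library counting functions.

```python
from fractions import Fraction
from math import comb, factorial

def posterior_given_both(prior, p_a, p_b, p_a_not, p_b_not):
    """Return P(I | A and B), assuming A and B are conditionally
    independent given I and given I^c (as in the solution to 3.88)."""
    joint_i = p_a * p_b * prior                    # P(A and B and I)
    joint_not_i = p_a_not * p_b_not * (1 - prior)  # P(A and B and I^c)
    return joint_i / (joint_i + joint_not_i)       # divide by P(A and B)

# Exercise 3.88 inputs, kept exact with Fraction to avoid rounding
post = posterior_given_both(prior=Fraction(4, 10),
                            p_a=Fraction(9, 10), p_b=Fraction(95, 100),
                            p_a_not=Fraction(2, 10), p_b_not=Fraction(1, 10))
print(post, round(float(post), 3))   # 57/59 0.966

# Exercise 3.96: ways to choose 5 items from 50
print(comb(50, 5))                   # 2118760

# Exercise 3.98, parts a through e
print(factorial(6), comb(10, 9), comb(10, 1), comb(6, 3), factorial(0))  # 720 10 10 20 1
```

Using `Fraction` keeps the intermediate joint probabilities (.342 and .012) exact, so the posterior 57/59 agrees with the hand value .966 without rounding drift.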
3.107 Define the following events:
G: {Regularly use the golf course}
T: {Regularly use the tennis courts}
Given: $P(G) = .7$ and $P(T) = .5$.
The event "uses neither facility" can be written as $G^c \cap T^c$ or $(G \cup T)^c$. We are given $P(G^c \cap T^c) = P[(G \cup T)^c] = .05$. The complement of the event "uses neither facility" is the event "uses at least one of the two facilities," which can be written as $G \cup T$:
$P(G \cup T) = 1 - P[(G \cup T)^c] = 1 - .05 = .95$
From the additive rule, $P(G \cup T) = P(G) + P(T) - P(G \cap T)$, so $.95 = .7 + .5 - P(G \cap T)$, which gives $P(G \cap T) = .25$.
a. The Venn diagram for the sample space S has these region probabilities: G only, .45; both G and T, .25; T only, .25; neither, .05.
b. $P(G \cup T) = .95$ from above.
c. $P(G \cap T) = .25$ from above.
d. $P(G \mid T) = \dfrac{P(G \cap T)}{P(T)} = \dfrac{.25}{.5} = .5$

3.108 Define the following events:
A: {Electrical switch monitors quality of power}
B: {Electrical switch not wired properly}
From the problem, $P(A) = .90$ and $P(B \mid A) = .90$. Thus,
$P(A \cap B^c) = P(B^c \mid A)P(A) = (1 - .90)(.90) = .09$

3.109 a. $P(A) = \dfrac{1{,}465}{2{,}143} = .684$
b. $P(B) = \dfrac{265}{2{,}143} = .124$
c. No. There is one sample point that they have in common: Plaintiff trial win – reversed, Jury.
d. $P(A^c) = 1 - P(A) = 1 - .684 = .316$
e. $P(A \cup B) = \dfrac{194 + 71 + 429 + 111 + 731}{2{,}143} = \dfrac{1{,}536}{2{,}143} = .717$
f. $P(A \cap B) = \dfrac{194}{2{,}143} = .091$

3.110 Since there are 11 individuals who are willing to serve on the panel, the number of different panels of 5 experts is a combination of 11 things taken 5 at a time, or
$\dbinom{11}{5} = \dfrac{11!}{5!\,6!} = \dfrac{11 \cdot 10 \cdot 9 \cdots 1}{(5 \cdot 4 \cdot 3 \cdot 2 \cdot 1)(6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1)} = 462$

3.111 Define the following events:
A: {The watch is accurate}
N: {The watch is not accurate}
Assuming the manufacturer's claim is correct, $P(N) = .05$ and $P(A) = 1 - P(N) = 1 - .05 = .95$.
The sample space for the purchase of four of the manufacturer's watches is listed below.
(A, A, A, A) (N, A, A, A) (A, N, N, A) (N, A, N, N)
(A, A, A, N) (A, A, N, N) (N, A, N, A) (N, N, A, N)
(A, A, N, A) (A, N, A, N) (N, N, A, A) (N, N, N, A)
(A, N, A, A) (N, A, A, N) (A, N, N, N) (N, N, N, N)
a.
All four watches not being accurate as claimed is the sample point (N, N, N, N). Assuming the watches purchased operate independently and the manufacturer's claim is correct,
$P(N, N, N, N) = P(N)P(N)P(N)P(N) = .05^4 = .00000625$
b. The sample points in the sample space that consist of exactly two watches failing to meet the claim are listed below.
(A, A, N, N) (N, A, A, N) (A, N, A, N) (N, A, N, A) (A, N, N, A) (N, N, A, A)
The probability that exactly two of the four watches fail to meet the claim is the sum of the probabilities of these six sample points. Assuming the watches purchased operate independently and the manufacturer's claim is correct,
$P(A, A, N, N) = P(A)P(A)P(N)P(N) = .95(.95)(.05)(.05) = .00225625$
All six of the sample points have the same probability. Therefore, the probability that exactly two of the four watches fail to meet the claim when the manufacturer's claim is correct is $6(.00225625) = .0135$.
c. The sample points in the sample space that consist of three of the four watches failing to meet the claim are listed below.
(A, N, N, N) (N, N, A, N) (N, A, N, N) (N, N, N, A)
The probability that three of the four watches fail to meet the claim is the sum of the probabilities of these four sample points. Assuming the watches purchased operate independently and the manufacturer's claim is correct,
$P(A, N, N, N) = P(A)P(N)P(N)P(N) = .95(.05)(.05)(.05) = .00011875$
All four of the sample points have the same probability. Therefore, the probability that three of the four watches fail to meet the claim when the manufacturer's claim is correct is $4(.00011875) = .000475$.
If this event occurred, we would tend to doubt the validity of the manufacturer's claim, since its probability of occurring is so small.
d. All four watches tested failing to meet the claim is the sample point (N, N, N, N).
Assuming the watches purchased operate independently and the manufacturer's claim is correct,
$P(N, N, N, N) = P(N)P(N)P(N)P(N) = .05(.05)(.05)(.05) = .00000625$
Since the probability of observing this event is so small if the claim is true, we have strong evidence against the validity of the claim. However, we do not have conclusive proof that the claim is false. The event can still occur (with probability .00000625), although that chance is extremely small.

3.112 The possible ways of ranking the blades are:
GSW SGW WGS
GWS SWG WSG
If the consumer had no preference but still ranked the blades, then the 6 possibilities are equally likely. Therefore, each of the 6 possibilities has a probability of 1/6 of occurring.
a. P(Ranks G first) $= P(GSW) + P(GWS) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3}$
b. P(Ranks G last) $= P(SWG) + P(WSG) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3}$
c. P(Ranks G last and W second) $= P(SWG) = \frac{1}{6}$
d. $P(WGS) = \frac{1}{6}$

3.113 Define the following events:
A: {Never smoked cigars}
B: {Former cigar smoker}
C: {Current cigar smoker}
D: {Died from cancer}
E: {Did not die from cancer}
a. $P(D \mid A) = \dfrac{P(D \cap A)}{P(A)} = \dfrac{782/137{,}243}{121{,}529/137{,}243} = \dfrac{782}{121{,}529} = .006$
b. $P(D \mid B) = \dfrac{P(D \cap B)}{P(B)} = \dfrac{91/137{,}243}{7{,}848/137{,}243} = \dfrac{91}{7{,}848} = .012$
c. $P(D \mid C) = \dfrac{P(D \cap C)}{P(C)} = \dfrac{141/137{,}243}{7{,}866/137{,}243} = \dfrac{141}{7{,}866} = .018$

3.114 a. Consecutive tosses of a coin are independent events, since what occurs on one toss does not affect the next outcome.
b. If the individuals are randomly selected, then what one individual says should not affect what the next person says. They are independent events.
c. The results in two consecutive at-bats are probably not independent. The player may have faced the same pitcher both times, which may affect the outcome.
d. The amounts of gain or loss for two different stocks bought and sold on the same day are probably not independent. The market might be way up or down on a certain day, so that all stocks are affected.
e. The amounts of gain or loss for two different stocks that are bought and sold in different time periods are independent. What happens to one stock should not affect what happens to the other.
f. The prices bid by two different development firms in response to the same building construction proposal would probably not be independent. The same variables (materials, labor, etc.) would be present for both firms to consider in their bids.

3.115 Define the following events:
A: {Wheelchair user had an injurious fall}
B: {Wheelchair user had all five features installed in the home}
C: {Wheelchair user had no falls}
D: {Wheelchair user had none of the features installed in the home}
a. $P(A) = \dfrac{48}{306} = .157$
b. $P(B) = \dfrac{9}{306} = .029$
c. $P(C \cap D) = \dfrac{89}{306} = .291$
d. $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)} = \dfrac{2/306}{9/306} = \dfrac{2}{9} = .222$
e. $P(A \mid D) = \dfrac{P(A \cap D)}{P(D)} = \dfrac{20/306}{109/306} = \dfrac{20}{109} = .183$

3.116 Define the following events:
A1: {Paraguay is assigned to Group A}
A2: {Ecuador is assigned to Group A}
B1: {Paraguay is assigned to Group B}
B2: {Sweden or top team in pot 3 is assigned to Group B}
D1: {Paraguay is assigned to Group D}
D2: {Ecuador is assigned to Group D}
If the teams are drawn at random from each pot, the probability that any team is assigned to a given group is 1/8.
a. $P(A_1) = 1/8 = .125$
b. $P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2) = 1/8 + 1/8 - 0 = 2/8 = .25$
c. $P(B_1 \cap B_2) = P(B_1)P(B_2) = (1/8)(2/8) = 2/64 = .03125$
d. We can find this probability by looking at how the slots can be filled. We will look only at how the teams from pot 2 can be put into Groups A, B, C, and D. The order of filling these does not matter, so we will look at the ways to fill Group C, then Group D, then Group A, then Group B.
First, we will find the total number of ways we can fill these 4 slots (Groups) when Group C cannot have Paraguay or Ecuador. Since Group C cannot have Paraguay or Ecuador, there are only 6 ways to fill Group C.
There would then be 7 ways to fill Group D, 6 ways to fill Group A, and 5 ways to fill Group B. The total number of ways to fill these 4 Groups without having Paraguay or Ecuador in Group C is 6(7)(6)(5) = 1,260.
Now, we will find the number of ways we can fill these 4 Groups when Group C cannot have Paraguay or Ecuador and Group D does have either Paraguay or Ecuador. There are 6 ways to fill Group C, 2 ways to fill Group D, 6 ways to fill Group A, and 5 ways to fill Group B. The total number of ways to fill these 4 Groups without Paraguay or Ecuador in Group C and with either Paraguay or Ecuador in Group D is 6(2)(6)(5) = 360.
Thus, given that Group C has neither Paraguay nor Ecuador, the probability that Group D has either Paraguay or Ecuador is $360/1{,}260 = 2/7 = .286$. Finally, the probability that Group D has neither Paraguay nor Ecuador is $1 - .286 = .714$.

3.117 Define the following events:
S1: {Salesman makes a sale on the first visit}
S2: {Salesman makes a sale on the second visit}
$P(S_1) = .4$ and $P(S_2 \mid S_1^c) = .65$
The sample points of the experiment are $S_1 \cap S_2^c$, $S_1^c \cap S_2$, and $S_1^c \cap S_2^c$.
A sale on the first visit means there is no second visit, so $P(S_1 \cap S_2^c) = P(S_1)$. The probability the salesman will make a sale is:
$P(S_1 \cap S_2^c) + P(S_1^c \cap S_2) = P(S_1) + P(S_2 \mid S_1^c)P(S_1^c) = .4 + .65(1 - .4) = .4 + .39 = .79$

3.118 Define the following events:
S: {System shuts down}
F1: {Hardware failure}
F2: {Software failure}
F3: {Power failure}
From the exercise, we know $P(F_1) = .01$, $P(F_2) = .05$, and $P(F_3) = .02$. Also, $P(S \mid F_1) = .73$, $P(S \mid F_2) = .12$, and $P(S \mid F_3) = .88$.
The probability that the current shutdown is due to a hardware failure is:
$P(F_1 \mid S) = \dfrac{P(F_1 \cap S)}{P(S)} = \dfrac{P(S \mid F_1)P(F_1)}{P(S \mid F_1)P(F_1) + P(S \mid F_2)P(F_2) + P(S \mid F_3)P(F_3)} = \dfrac{.73(.01)}{.73(.01) + .12(.05) + .88(.02)} = \dfrac{.0073}{.0073 + .006 + .0176} = \dfrac{.0073}{.0309} = .2362$
The probability that the current shutdown is due to a software failure is:
$P(F_2 \mid S) = \dfrac{P(F_2 \cap S)}{P(S)} = \dfrac{P(S \mid F_2)P(F_2)}{P(S \mid F_1)P(F_1) + P(S \mid F_2)P(F_2) + P(S \mid F_3)P(F_3)} = \dfrac{.12(.05)}{.73(.01) + .12(.05) + .88(.02)} = \dfrac{.006}{.0073 + .006 + .0176} = \dfrac{.006}{.0309} = .1942$
The probability that the current shutdown is due to a power failure is:
$P(F_3 \mid S) = \dfrac{P(F_3 \cap S)}{P(S)} = \dfrac{P(S \mid F_3)P(F_3)}{P(S \mid F_1)P(F_1) + P(S \mid F_2)P(F_2) + P(S \mid F_3)P(F_3)} = \dfrac{.88(.02)}{.73(.01) + .12(.05) + .88(.02)} = \dfrac{.0176}{.0073 + .006 + .0176} = \dfrac{.0176}{.0309} = .5696$

3.119 a. Suppose we let the four positions in a sample point represent, in order, (1) Raise a broad mix of crops, (2) Raise livestock, (3) Use chemicals sparingly, and (4) Use techniques for regenerating the soil, such as crop rotation. A farmer is either likely (L) or unlikely (U) to engage in each activity. The possible classifications are:
LLLL LLLU LLUL LULL ULLL LLUU LULU LUUL
ULLU ULUL UULL LUUU ULUU UULU UUUL UUUU
b. Since there are 16 classifications (sample points) and all are equally likely, each has a probability of 1/16. Thus, $P(UUUU) = \frac{1}{16}$.
c. The probability that a farmer will be classified as likely on at least three criteria is
$P(LLLL) + P(LLLU) + P(LLUL) + P(LULL) + P(ULLL) = 5\left(\frac{1}{16}\right) = \frac{5}{16}$

3.120 Define the following events:
C: {Committee judges joint acceptable}
I: {Inspector judges joint acceptable}
The sample points of this experiment are $C \cap I$, $C \cap I^c$, $C^c \cap I$, and $C^c \cap I^c$.
a. The probability the inspector judges the joint to be acceptable is:
$P(I) = P(C \cap I) + P(C^c \cap I) = \dfrac{101}{153} + \dfrac{23}{153} = \dfrac{124}{153} = .810$
The probability the committee judges the joint to be acceptable is:
$P(C) = P(C \cap I) + P(C \cap I^c) = \dfrac{101}{153} + \dfrac{10}{153} = \dfrac{111}{153} = .725$
b. The probability that both the committee and the inspector judge the joint to be acceptable is:
$P(C \cap I) = \dfrac{101}{153} = .660$
The probability that neither judges the joint to be acceptable is:
$P(C^c \cap I^c) = \dfrac{19}{153} = .124$
c. The probability the inspector and committee disagree is:
$P(C \cap I^c) + P(C^c \cap I) = \dfrac{10}{153} + \dfrac{23}{153} = \dfrac{33}{153} = .216$
The probability the inspector and committee agree is:
$P(C \cap I) + P(C^c \cap I^c) = \dfrac{101}{153} + \dfrac{19}{153} = \dfrac{120}{153} = .784$

3.121 Define the following events:
O1: {Component #1 in System A operates properly}
O2: {Component #2 in System A operates properly}
O3: {Component #3 in System A operates properly}
A: {System A works properly}
a. $P(O_1) = 1 - P(O_1^c) = 1 - .12 = .88$
$P(O_2) = 1 - P(O_2^c) = 1 - .09 = .91$
$P(O_3) = 1 - P(O_3^c) = 1 - .11 = .89$
$P(A) = P(O_1 \cap O_2 \cap O_3) = P(O_1)P(O_2)P(O_3) = .88(.91)(.89) = .7127$ (since the three components operate independently)
b. $P(A^c) = 1 - P(A) = 1 - .7127 = .2873$ (see part a)
c. Define the following events:
C1: {Component 1 in System B works properly}
C2: {Component 2 in System B works properly}
D3: {Component 3 in System B works properly}
D4: {Component 4 in System B works properly}
C: {Subsystem C works properly}
D: {Subsystem D works properly}
The probability a component fails is .1, so the probability a component works properly is $1 - .1 = .9$.
Subsystem C works properly if both components 1 and 2 work properly:
$P(C) = P(C_1 \cap C_2) = P(C_1)P(C_2) = .9(.9) = .81$ (since the components operate independently)
Similarly, $P(D) = P(D_3 \cap D_4) = P(D_3)P(D_4) = .9(.9) = .81$.
The system operates properly if either subsystem C or D operates properly. The probability that System B operates properly is:
$P(C \cup D) = P(C) + P(D) - P(C \cap D) = P(C) + P(D) - P(C)P(D) = .81 + .81 - .81(.81) = .9639$
d. The probability exactly one subsystem fails in System B is:
$P(C \cap D^c) + P(C^c \cap D) = P(C)P(D^c) + P(C^c)P(D) = .81(1 - .81) + (1 - .81)(.81) = .1539 + .1539 = .3078$
e.
The probability that System B fails is the probability that both subsystems fail:
$P(C^c \cap D^c) = P(C^c)P(D^c) = (1 - .81)(1 - .81) = .0361$
f. The system operating correctly 99% of the time means it fails 1% of the time. With independent subsystems in parallel, the system fails only if all n subsystems fail. The probability one subsystem fails is .19, so the probability that all n subsystems fail is $(.19)^n$. Thus, we must find the smallest n such that $(.19)^n \le .01$. Since $(.19)^2 = .0361$ and $(.19)^3 = .006859$, we need n = 3.

3.122 Define the following events:
R: {Successful regime change is achieved}
M: {Mission is extended to support a weak Iraq government}
a. The probability that a successful regime change is not achieved is $P(R^c) = 1 - P(R) = 1 - .7 = .3$.
b. $P(R \mid M) = .26$
c. Given that $P(M) = .55$, the probability that the mission is extended and results in a successful regime change is $P(M \cap R) = P(R \mid M)P(M) = .26(.55) = .143$.

3.123 The probability of a false positive is $P(A \mid B)$.

3.124 Define the following events:
A1: {Fuse made by line 1}
A2: {Fuse made by line 2}
D: {Fuse is defective}
From the exercise, we know $P(D \mid A_1) = .06$ and $P(D \mid A_2) = .025$. Also, $P(A_1) = P(A_2) = .5$.
Two fuses are going to be selected, and we need to find the probability that exactly one of the two is defective. We can get one defective fuse out of two by getting a defective on the first and a non-defective on the second $(D \cap D^c)$ or a non-defective on the first and a defective on the second $(D^c \cap D)$.
The probability of getting one defective out of two fuses given line 1 is:
$P(1D \mid A_1) = P(D \cap D^c \mid A_1) + P(D^c \cap D \mid A_1) = P(D \mid A_1)P(D^c \mid A_1) + P(D^c \mid A_1)P(D \mid A_1) = .06(1 - .06) + (1 - .06)(.06) = .06(.94) + .94(.06) = .1128$
The probability of getting one defective out of two fuses given line 2 is:
$P(1D \mid A_2) = P(D \cap D^c \mid A_2) + P(D^c \cap D \mid A_2) = P(D \mid A_2)P(D^c \mid A_2) + P(D^c \mid A_2)P(D \mid A_2) = .025(1 - .025) + (1 - .025)(.025) = .025(.975) + .975(.025) = .04875$
The probability of getting one defective out of two fuses is:
$P(1D) = P(1D \cap A_1) + P(1D \cap A_2) = P(1D \mid A_1)P(A_1) + P(1D \mid A_2)P(A_2) = .1128(.5) + .04875(.5) = .0564 + .024375 = .080775$
Finally, we want to find:
$P(A_1 \mid 1D) = \dfrac{P(1D \cap A_1)}{P(1D)} = \dfrac{.0564}{.080775} = .6982$

3.125 Define the following events:
A: {Press is correctly adjusted}
B: {Press is incorrectly adjusted}
D: {Part is defective}
From the exercise, $P(A) = .90$, $P(D \mid A) = .05$, and $P(D \mid B) = .50$. We also know that event B is the complement of event A. Thus, $P(B) = 1 - P(A) = 1 - .90 = .10$, and
$P(B \mid D) = \dfrac{P(B \cap D)}{P(D)} = \dfrac{P(D \mid B)P(B)}{P(D \mid B)P(B) + P(D \mid A)P(A)} = \dfrac{.50(.10)}{.50(.10) + .05(.90)} = \dfrac{.05}{.05 + .045} = \dfrac{.05}{.095} = .526$

3.126 There are a total of $6 \times 6 = 36$ outcomes when rolling 2 dice. If we let the first number in the pair represent the outcome of die number 1 and the second number represent the outcome of die number 2, then the possible outcomes are:
1,1 1,2 1,3 1,4 1,5 1,6
2,1 2,2 2,3 2,4 2,5 2,6
3,1 3,2 3,3 3,4 3,5 3,6
4,1 4,2 4,3 4,4 4,5 4,6
5,1 5,2 5,3 5,4 5,5 5,6
6,1 6,2 6,3 6,4 6,5 6,6
If both dice are fair, then each of these outcomes is equally likely and has a probability of 1/36.
a. To win on the first roll, a player must roll a 7 or 11. There are 6 ways to roll a 7 and 2 ways to roll an 11. Thus, the probability of winning on the first roll is:
$P(7 \text{ or } 11) = \dfrac{8}{36} = .2222$
b. To lose on the first roll, a player must roll a 2 or 3. There is 1 way to roll a 2 and 2 ways to roll a 3. Thus, the probability of losing on the first roll is:
$P(2 \text{ or } 3) = \dfrac{3}{36} = .0833$
c. If a player rolls a 4 on the first roll, the game will end on the next roll if the player rolls a 4 (player wins) or a 7 (player loses). There are 3 ways to roll a 4 and 6 ways to roll a 7. Thus,
$P(4 \text{ or } 7 \text{ on 2nd roll}) = \dfrac{3 + 6}{36} = \dfrac{9}{36} = .25$

3.127 Define the following events:
A: {Dealer draws a blackjack}
B: {Player draws a blackjack}
a. For the dealer to draw a blackjack, he needs to draw an ace and a face card. There are $\dbinom{4}{1} = \dfrac{4!}{1!(4-1)!} = 4$ ways to draw an ace and $\dbinom{12}{1} = \dfrac{12!}{1!(12-1)!} = 12$ ways to draw a face card (there are 12 face cards in the deck). The total number of ways a dealer can draw a blackjack is $4 \times 12 = 48$. The total number of ways a dealer can draw 2 cards is
$\dbinom{52}{2} = \dfrac{52!}{2!(52-2)!} = 1{,}326$
Thus, the probability that the dealer draws a blackjack is $P(A) = \dfrac{48}{1{,}326} = .0362$.
b. In order for the player to win with a blackjack, the player must draw a blackjack and the dealer must not. Using our notation, this is the event $B \cap A^c$. We need the probability that the player draws a blackjack, $P(B)$, and the probability that the dealer does not draw a blackjack given that the player does, $P(A^c \mid B)$. Then the probability that the player wins with a blackjack is $P(A^c \mid B)P(B)$.
The probability that the player draws a blackjack is the same as the probability that the dealer draws a blackjack, which is $P(B) = .0362$.
There are 5 scenarios in which the dealer does not draw a blackjack given that the player does. First, the dealer could draw an ace and not a face card. Second, the dealer could draw a face card and not an ace. Third, the dealer could draw two cards that are neither aces nor face cards. Fourth, the dealer could draw two aces. Finally, the dealer could draw two face cards.
The number of ways the dealer could draw an ace and not a face card given the player draws a blackjack is
$\dbinom{3}{1}\dbinom{36}{1} = \dfrac{3!}{1!(3-1)!} \cdot \dfrac{36!}{1!(36-1)!} = 3(36) = 108$
(Note: Given the player has drawn a blackjack, there are only 3 aces left and 36 cards that are neither aces nor face cards.)
The number of ways the dealer could draw a face card and not an ace given the player draws a blackjack is
$\dbinom{11}{1}\dbinom{36}{1} = \dfrac{11!}{1!(11-1)!} \cdot \dfrac{36!}{1!(36-1)!} = 11(36) = 396$
The number of ways the dealer could draw neither a face card nor an ace given the player draws a blackjack is
$\dbinom{36}{2} = \dfrac{36!}{2!(36-2)!} = 630$
The number of ways the dealer could draw two aces given the player draws a blackjack is
$\dbinom{3}{2} = \dfrac{3!}{2!(3-2)!} = 3$
The number of ways the dealer could draw two face cards given the player draws a blackjack is
$\dbinom{11}{2} = \dfrac{11!}{2!(11-2)!} = 55$
The total number of ways the dealer can draw two cards given the player draws a blackjack is
$\dbinom{50}{2} = \dfrac{50!}{2!(50-2)!} = 1{,}225$
The probability that the dealer does not draw a blackjack given the player draws a blackjack is
$P(A^c \mid B) = \dfrac{108 + 396 + 630 + 3 + 55}{1{,}225} = \dfrac{1{,}192}{1{,}225} = .9731$
Finally, the probability that the player wins with a blackjack is
$P(B \cap A^c) = P(A^c \mid B)P(B) = .9731(.0362) = .0352$

3.128 a. Define the following events:
W: {Player wins the game Go}
F: {Player plays first (black stones)}
The probability that the player who plays first (black stones) wins is $319/577 = .553$.
b. By ranking combination, the probability that the player who plays first wins is:
$P(\text{first player wins} \mid CA) = 34/34 = 1$
$P(\text{first player wins} \mid CB) = 69/79 = .873$
$P(\text{first player wins} \mid CC) = 66/118 = .559$
$P(\text{first player wins} \mid BA) = 40/54 = .741$
$P(\text{first player wins} \mid BB) = 52/95 = .547$
$P(\text{first player wins} \mid BC) = 27/79 = .342$
$P(\text{first player wins} \mid AA) = 15/28 = .536$
$P(\text{first player wins} \mid AB) = 11/51 = .216$
$P(\text{first player wins} \mid AC) = 5/39 = .128$
c. There are three combinations where the player with the black stones (first) is ranked higher than the player with the white stones: CA, CB, and BA.
$P(\text{first player wins} \mid CA \text{ or } CB \text{ or } BA) = \dfrac{34 + 69 + 40}{34 + 79 + 54} = \dfrac{143}{167} = .856$
d.
There are three combinations where the players are of the same level: CC, BB, and AA.
$P(\text{first player wins} \mid CC \text{ or } BB \text{ or } AA) = \dfrac{66 + 52 + 15}{118 + 95 + 28} = \dfrac{133}{241} = .552$

3.129 First, we will list all possible sample points for placing a car (C) and 2 goats (G) behind doors #1, #2, and #3. If the first position corresponds to door #1, the second position to door #2, and the third position to door #3, the sample space is:
(C G G) (G C G) (G G C)
Now, suppose you pick door #1. Initially, the probability that you will win the car is 1/3, since only one of the sample points has a car behind door #1. The host will now open a door behind which is a goat. If you pick door #1 in the first sample point (C G G), the host will open either door #2 or door #3. Suppose he opens door #3 (it does not matter which). If you pick door #1 in the second sample point (G C G), the host will open door #3. If you pick door #1 in the third sample point (G G C), the host will open door #2. Now, the new sample space will be:
(C G) (G C) (G C)
where the first position corresponds to door #1 (the one you chose) and the second position corresponds to the door that was not opened by the host. Now, if you keep door #1, the probability that you win the car is 1/3. However, if you switch to the remaining door, the probability that you win the car is 2/3. Based on these probabilities, it is to your advantage to switch doors.
The above could be repeated by selecting door #2 or door #3 initially. In either of these cases, the probability of winning the car is again 1/3 if you do not switch and 2/3 if you switch. Thus, Marilyn was correct.
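The counting arguments in Exercises 3.126, 3.127, and 3.129 can be re-checked by brute-force enumeration. The Python sketch below is one way to do this; it is not the textbook's method, and the function names `monty_hall` and `prob_total` are hypothetical. For the Monty Hall game, it makes the same assumptions as the solution: the contestant first picks door #1 (index 0), and the host always opens a goat door, choosing at random when two are available. For blackjack, it follows the solution's definition, an ace plus one of 12 face cards. Instead of summing the five scenarios, it uses the complement: after the player's blackjack, 3 × 11 = 33 of the 1,225 remaining pairs give the dealer a blackjack. Both routes give 1,192/1,225.

```python
from fractions import Fraction
from itertools import product
from math import comb

def monty_hall(first_pick=0):
    """Exact P(win) for 'stay' and 'switch' with 3 doors, enumerating
    the car's position and the host's choice of goat door."""
    stay = switch = Fraction(0)
    for car in range(3):
        # host opens a door that is neither the contestant's pick nor the car;
        # if two such doors exist, assume he picks one uniformly at random
        options = [d for d in range(3) if d not in (first_pick, car)]
        for opened in options:
            p = Fraction(1, 3) * Fraction(1, len(options))
            other = next(d for d in range(3) if d not in (first_pick, opened))
            if car == first_pick:
                stay += p
            if car == other:
                switch += p
    return stay, switch

# Exercise 3.126: the 36 equally likely outcomes for two fair dice
rolls = list(product(range(1, 7), repeat=2))

def prob_total(totals):
    """Probability that the sum of the two dice falls in `totals`."""
    hits = sum(1 for a, b in rolls if a + b in totals)
    return Fraction(hits, len(rolls))

# Exercise 3.127, counting as the solution does (ace plus one of 12 face cards)
p_dealer_bj = Fraction(comb(4, 1) * comb(12, 1), comb(52, 2))         # 48/1326
# complement: 3 remaining aces x 11 remaining face cards among C(50, 2) pairs
p_no_dealer_bj = 1 - Fraction(comb(3, 1) * comb(11, 1), comb(50, 2))  # 1192/1225
p_player_wins = p_no_dealer_bj * p_dealer_bj

print(monty_hall())  # (Fraction(1, 3), Fraction(2, 3))
print(prob_total({7, 11}), prob_total({2, 3}), prob_total({4, 7}))  # 2/9 1/12 1/4
print(round(float(p_player_wins), 4))  # 0.0352
```

Exact `Fraction` arithmetic makes the enumeration results directly comparable to the hand answers (8/36, 3/36, 9/36, and 1/3 versus 2/3 for stay versus switch). Calling `monty_hall(first_pick=2)` gives the same pair, which matches the solution's remark that the starting door does not matter.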
3.130 Suppose we define the following event:
E: {Error produced when dividing}
From the problem, we know that $P(E) = 1/9{,}000{,}000{,}000$.
The probability of no error produced when dividing is
$P(E^c) = 1 - P(E) = 1 - 1/9{,}000{,}000{,}000 = 8{,}999{,}999{,}999/9{,}000{,}000{,}000 \approx .999999999 \approx 1.0000$
Suppose we want to find the probability of no errors in 2 divisions (assuming each division is independent):
$P(E^c \cap E^c) = P(E^c)P(E^c) = [8{,}999{,}999{,}999/9{,}000{,}000{,}000]^2 \approx 1.0000$
Thus, in general, the probability of no errors in k divisions would be:
$P(E^c \cap E^c \cap \cdots \cap E^c) = [P(E^c)]^k = [8{,}999{,}999{,}999/9{,}000{,}000{,}000]^k$
Suppose a user ran a program that performed 1 billion divisions. The probability of no errors in these 1 billion divisions would be:
$[P(E^c)]^{1{,}000{,}000{,}000} = [8{,}999{,}999{,}999/9{,}000{,}000{,}000]^{1{,}000{,}000{,}000} = .8948$
Thus, the probability of at least 1 error in 1 billion divisions would be
$1 - [P(E^c)]^{1{,}000{,}000{,}000} = 1 - [8{,}999{,}999{,}999/9{,}000{,}000{,}000]^{1{,}000{,}000{,}000} = 1 - .8948 = .1052$

Chapter 4 Random Variables and Probability Distributions

4.1 a. The number of newspapers sold by the New York Times each month can take on a countable number of values. Thus, this is a discrete random variable.
b. The amount of ink used in printing the Sunday edition of the New York Times can take on any value in an interval (an uncountably infinite number of values). Thus, this is a continuous random variable.
c. The actual number of ounces in a one-gallon bottle of laundry detergent can take on any value in an interval. Thus, this is a continuous random variable.
d. The number of defective parts in a shipment of nuts and bolts can take on a countable number of values. Thus, this is a discrete random variable.
e. The number of people collecting unemployment insurance each month can take on a countable number of values. Thus, this is a discrete random variable.

4.2 a. The closing price of a particular stock on the New York Stock Exchange is discrete.
It can take on only a countable number of values.
b. The number of shares of a particular stock that are traded on a particular day is discrete. It can take on only a countable number of values.
c. The quarterly earnings of a particular firm are discrete. They can take on only a countable number of values.
d. The percentage change in yearly earnings between 2011 and 2012 for a particular firm is continuous. It can take on any value in an interval.
e. The number of new products introduced per year by a firm is discrete. It can take on only a countable number of values.
f. The time until a pharmaceutical company gains approval from the U.S. Food and Drug Administration to market a new drug is continuous. It can take on any value in an interval of time.

4.3 Since there is only a fixed number of possible outcomes to the experiment, the random variable x, the number of stars in the rating, is discrete.

4.4 The number of customers, x, waiting in line can take on values 0, 1, 2, 3, …. Even though the list is never ending, we call this list countable. Thus, the random variable is discrete.

4.5 The variable x, total compensation in 2011 (in millions of dollars), is reported in whole-number dollars. Since there is a countable number of possible outcomes, this variable is discrete.

4.6 A banker might be interested in the number of new accounts opened in a month or the number of mortgages the bank currently holds, both of which are discrete random variables.

4.7 An economist might be interested in the percentage of the work force that is unemployed or the current inflation rate, both of which are continuous random variables.

4.8 The manager of a hotel might be concerned with the number of employees on duty at a specific time or the number of vacancies on a certain night.
4.9 The manager of a clothing store might be concerned with the number of employees on duty at a specific time of day or the number of articles of a particular type of clothing that are on hand.

4.10 A stockbroker might be interested in the length of time until the stock market closes for the day.

4.11 a. $p(22) = .25$
b. $P(x = 20 \text{ or } x = 24) = P(x = 20) + P(x = 24) = .15 + .20 = .35$
c. $P(x \le 23) = P(x = 20) + P(x = 21) + P(x = 22) + P(x = 23) = .15 + .10 + .25 + .30 = .80$

4.12 a. The variable x can take on values 1, 3, 5, 7, and 9.
b. The value of x that has the highest probability associated with it is 5. It has a probability of .4.
c. Using MINITAB, the probability distribution of x is graphed below.
[Figure: graph of p(x) versus x for x = 1, 3, 5, 7, 9]
d. $P(x = 7) = .2$
e. $P(x \ge 5) = p(5) + p(7) + p(9) = .4 + .2 + .1 = .7$
f. $P(x > 2) = p(3) + p(5) + p(7) + p(9) = .2 + .4 + .2 + .1 = .9$
g. $\mu = E(x) = \sum x\,p(x) = 1(.1) + 3(.2) + 5(.4) + 7(.2) + 9(.1) = .1 + .6 + 2.0 + 1.4 + .9 = 5.0$

4.13 a. We know $\sum p(x) = 1$. Thus, $p(2) + p(3) + p(5) + p(8) + p(10) = 1$, so
$p(5) = 1 - [p(2) + p(3) + p(8) + p(10)] = 1 - (.15 + .10 + .25 + .25) = .25$
b. $P(x = 2 \text{ or } x = 10) = P(x = 2) + P(x = 10) = .15 + .25 = .40$
c. $P(x \le 8) = P(x = 2) + P(x = 3) + P(x = 5) + P(x = 8) = .15 + .10 + .25 + .25 = .75$

4.14 a. This is not a valid distribution because $\sum p(x) = .1 + .3 + .3 + .2 = .9 \neq 1$.
b. This is a valid distribution because $0 \le p(x) \le 1$ for all values of x and $\sum p(x) = .25 + .5 + .25 = 1$.
c. This is not a valid distribution because $p(4) = -.3 < 0$.
d. The sum of the probabilities over all possible values of the random variable is $\sum p(x) = .15 + .15 + .45 + .35 = 1.1 \neq 1$, so this is not a valid probability distribution.

4.15 a. When a die is tossed, the number of spots observed on the upturned face can be 1, 2, 3, 4, 5, or 6. Since the six sample points are equally likely, each one has a probability of 1/6. The probability distribution of x may be summarized in tabular form:
x      1    2    3    4    5    6
p(x)   1/6  1/6  1/6  1/6  1/6  1/6
b. The probability distribution of x may also be presented in graphical form:
[Figure: graph of p(x) = 1/6 for x = 1, 2, 3, 4, 5, 6]

4.16 a.
The sample points are (where H = head, T = tail):
Sample point:   HHH  HHT  HTH  THH  HTT  THT  TTH  TTT
x = # heads:     3    2    2    2    1    1    1    0
b. If each sample point is equally likely, then P(sample point) $= \frac{1}{k} = \frac{1}{8}$, where k = 8 is the number of sample points. Thus,
$p(3) = \frac{1}{8}$, $p(2) = \frac{1}{8} + \frac{1}{8} + \frac{1}{8} = \frac{3}{8}$, $p(1) = \frac{1}{8} + \frac{1}{8} + \frac{1}{8} = \frac{3}{8}$, and $p(0) = \frac{1}{8}$.
c. Using MINITAB, the graph of p(x) is:
[Figure: graph of p(x) versus x for x = 0, 1, 2, 3]
d. $P(x = 2 \text{ or } x = 3) = p(2) + p(3) = \frac{3}{8} + \frac{1}{8} = \frac{4}{8} = \frac{1}{2}$

4.17 a. $\mu = E(x) = \sum x\,p(x) = -4(.02) + (-3)(.07) + (-2)(.10) + (-1)(.15) + 0(.30) + 1(.18) + 2(.10) + 3(.06) + 4(.02)$
$= -.08 - .21 - .20 - .15 + 0 + .18 + .20 + .18 + .08 = 0$
$\sigma^2 = E[(x - \mu)^2] = \sum (x - \mu)^2 p(x) = (-4 - 0)^2(.02) + (-3 - 0)^2(.07) + (-2 - 0)^2(.10) + (-1 - 0)^2(.15) + (0 - 0)^2(.30) + (1 - 0)^2(.18) + (2 - 0)^2(.10) + (3 - 0)^2(.06) + (4 - 0)^2(.02)$
$= .32 + .63 + .40 + .15 + 0 + .18 + .40 + .54 + .32 = 2.94$
$\sigma = \sqrt{2.94} = 1.715$
b. Using MINITAB, the graph is:
[Figure: histogram of x, p(x) versus x for x = -4, ..., 4, with μ and the interval μ ± 2σ marked]
$\mu \pm 2\sigma = 0 \pm 2(1.715) = 0 \pm 3.430 = (-3.430, 3.430)$
c. $P(-3.430 < x < 3.430) = p(-3) + p(-2) + p(-1) + p(0) + p(1) + p(2) + p(3) = .07 + .10 + .15 + .30 + .18 + .10 + .06 = .96$

4.18 a. $\mu = E(x) = \sum x\,p(x) = 10(.05) + 20(.20) + 30(.30) + 40(.25) + 50(.10) + 60(.10) = .5 + 4 + 9 + 10 + 5 + 6 = 34.5$
$\sigma^2 = E[(x - \mu)^2] = \sum (x - \mu)^2 p(x) = (10 - 34.5)^2(.05) + (20 - 34.5)^2(.20) + (30 - 34.5)^2(.30) + (40 - 34.5)^2(.25) + (50 - 34.5)^2(.10) + (60 - 34.5)^2(.10)$
$= 30.0125 + 42.05 + 6.075 + 7.5625 + 24.025 + 65.025 = 174.75$
$\sigma = \sqrt{174.75} = 13.219$
b. Using MINITAB, the graph is:
[Figure: histogram of x, p(x) versus x for x = 10, 20, ..., 60]
c. $\mu \pm 2\sigma = 34.5 \pm 2(13.219) = 34.5 \pm 26.438 = (8.062, 60.938)$
$P(8.062 < x < 60.938) = p(10) + p(20) + p(30) + p(40) + p(50) + p(60) = .05 + .20 + .30 + .25 + .10 + .10 = 1.00$

4.19 a. It would seem that the mean of both would be 1, since they both are symmetric distributions centered at 1.
b. The distribution of x seems more variable, since there appears to be greater probability for the two extreme values of 0 and 2 than there is in the distribution of y.
c.
For x: $\mu = E(x) = \sum x p(x) = 0(.3) + 1(.4) + 2(.3) = 0 + .4 + .6 = 1$
$\sigma^2 = E[(x - \mu)^2] = \sum (x - \mu)^2 p(x) = (0 - 1)^2(.3) + (1 - 1)^2(.4) + (2 - 1)^2(.3) = .3 + 0 + .3 = .6$
For y: $\mu = E(y) = \sum y p(y) = 0(.1) + 1(.8) + 2(.1) = 0 + .8 + .2 = 1$
$\sigma^2 = E[(y - \mu)^2] = \sum (y - \mu)^2 p(y) = (0 - 1)^2(.1) + (1 - 1)^2(.8) + (2 - 1)^2(.1) = .1 + 0 + .1 = .2$
The variance for x is larger than that for y.
4.20 a. The possible values of x are 1, 2, 3, and 4 or more.
b. $P(x = 1) = .26$
c. $P(x \ge 4) = .25$
d. We cannot compute E(x) because the last category of x ("4 or more") covers more than one value. To compute E(x), each possible value of x must be a single number.
4.21 a. The probability distribution for x is found by converting the Percent column to a probability column, dividing each percent by 100. The probability distribution of x is:
x:     2      3      4      5
p(x):  .0408  .1735  .6020  .1837
b. $P(x = 5) = p(5) = .1837$
c. $P(x \le 2) = p(2) = .0408$
d. $E(x) = \sum_{i=1}^{4} x_i p(x_i) = 2(.0408) + 3(.1735) + 4(.6020) + 5(.1837) = .0816 + .5205 + 2.4080 + .9185 = 3.9286 \approx 3.93$
The average driver-side star rating for a car is 3.93.
4.22 a. Yes. Relative frequencies are observed values from a sample, and they are commonly used to estimate unknown probabilities. In addition, relative frequencies have the same properties as the probabilities in a probability distribution, namely:
1. all relative frequencies are greater than or equal to zero;
2. the sum of all the relative frequencies is 1.
b. Using MINITAB, the graph of the probability distribution is:
[Figure: MINITAB graph of p(age) versus age]
c. Let x = age of employee. Then $P(x > 30) = .13 + .15 + .12 = .40$ and $P(x \ge 40) = 0$.
$P(x < 30) = .02 + .04 + .05 + .07 + .04 + .02 + .07 + .02 + .11 + .07 = .51$
d. $P(x = 25 \text{ or } x = 26) = .02 + .07 = .09$
4.23 a. In order for this to be a valid probability distribution, all probabilities must be between 0 and 1 and the sum of all the probabilities must be 1. For these data, all the probabilities are between 0 and 1.
If you sum all of the probabilities, the sum is 1.
b. $P(x \ge 10) = P(x = 10) + P(x = 11) + \cdots + P(x = 20) = .02 + .02 + .02 + .02 + .01 + .01 + .01 + .01 + .01 + .005 + .005 = .14$
c. The mean of x is
$\mu = E(x) = \sum x p(x) = 0(.17) + 1(.10) + 2(.11) + \cdots + 20(.005) = 0 + .1 + .22 + .33 + \cdots + .1 = 4.655$
The variance of x is
$\sigma^2 = E[(x - \mu)^2] = \sum (x - \mu)^2 p(x) = (0 - 4.655)^2(.17) + (1 - 4.655)^2(.1) + (2 - 4.655)^2(.11) + \cdots + (20 - 4.655)^2(.005) = 3.6837 + 1.3359 + .7754 + \cdots + 1.1773 = 19.8560$
d. From Chebyshev's Rule, we know that at least .75 of the observations will fall within 2 standard deviations of the mean. The standard deviation is $\sigma = \sqrt{19.8560} = 4.456$. The interval is:
$\mu \pm 2\sigma = 4.655 \pm 2(4.456) = 4.655 \pm 8.912 \Rightarrow (-4.257, 13.567)$
4.24 a. The probability distribution for x is:
Grill display combination   x    p(x)
1-2-3                       6    35/124 = .282
1-2-4                       7    8/124 = .065
1-2-5                       8    42/124 = .339
2-3-4                       9    4/124 = .032
2-3-5                       10   1/124 = .008
2-4-5                       11   34/124 = .274
b. $P(x > 10) = p(11) = .274$
4.25 a. The possible values of x are 0, 2, 3, and 4.
b. To find the probability distribution of x, we first find the frequency distribution of x and then divide the frequencies by n = 106 to get the probabilities. The probability distribution of x is:
x:     0      2      3      4
f(x):  35     58     5      8
p(x):  .3302  .5472  .0472  .0755
c. $E(x) = \sum x p(x) = 0(.3302) + 2(.5472) + 3(.0472) + 4(.0755) = 1.538$. For all social robots, the average number of legs on a robot is 1.538.
4.26 a. For this problem, x = sequence number of a Florida tropical storm within a season that develops into a hurricane. Thus, x can take on the values 1, 2, 3, .... Since this is a countable number of outcomes, x is a discrete random variable.
b. The probability distribution of x is found by dividing the number of storms for each value of x by the total number of storms, n = 67. The probability distribution of x is:
x:     1      2      3      4      5      6      7      8
f(x):  4      10     5      6      11     5      5      5
p(x):  .0597  .1493  .0746  .0896  .1642  .0746  .0746  .0746
x:     9      10     11     12     13     14     15     22
f(x):  4      2      5      1      1      1      1      1
p(x):  .0597  .0299  .0746  .0149  .0149  .0149  .0149  .0149
c. $P(x = 5) = .1642$
d.
$P(x < 5) = P(x = 1) + P(x = 2) + P(x = 3) + P(x = 4) = .0597 + .1493 + .0746 + .0896 = .3732$
e. The expected value is
$E(x) = \sum x p(x) = 1(.0597) + 2(.1493) + 3(.0746) + \cdots + 22(.0149) = .0597 + .2986 + .2238 + \cdots + .3278 = 6.1174$
The average sequence number of a Florida tropical storm within a season that develops into a hurricane is 6.1174.
f. No, it is not likely. The probability is only .0149.
4.27 a. The random variable x is a discrete random variable because it can take on only the values 0, 1, 2, 3, 4, or 5 in this example.
b. $p(0) = \frac{5!}{0!(5-0)!}(.35)^0(.65)^{5-0} = \frac{5\cdot4\cdot3\cdot2\cdot1}{1(5\cdot4\cdot3\cdot2\cdot1)}(1)(.65)^5 = .65^5 = .1160$
$p(1) = \frac{5!}{1!(5-1)!}(.35)^1(.65)^{5-1} = \frac{5\cdot4\cdot3\cdot2\cdot1}{1(4\cdot3\cdot2\cdot1)}(.35)^1(.65)^4 = 5(.35)(.65)^4 = .3124$
$p(2) = \frac{5!}{2!(5-2)!}(.35)^2(.65)^{5-2} = \frac{5\cdot4\cdot3\cdot2\cdot1}{(2\cdot1)(3\cdot2\cdot1)}(.35)^2(.65)^3 = 10(.35)^2(.65)^3 = .3364$
$p(3) = \frac{5!}{3!(5-3)!}(.35)^3(.65)^{5-3} = \frac{5\cdot4\cdot3\cdot2\cdot1}{(3\cdot2\cdot1)(2\cdot1)}(.35)^3(.65)^2 = 10(.35)^3(.65)^2 = .1811$
$p(4) = \frac{5!}{4!(5-4)!}(.35)^4(.65)^{5-4} = \frac{5\cdot4\cdot3\cdot2\cdot1}{(4\cdot3\cdot2\cdot1)(1)}(.35)^4(.65)^1 = 5(.35)^4(.65)^1 = .0488$
$p(5) = \frac{5!}{5!(5-5)!}(.35)^5(.65)^{5-5} = \frac{5\cdot4\cdot3\cdot2\cdot1}{(5\cdot4\cdot3\cdot2\cdot1)(1)}(.35)^5(.65)^0 = (.35)^5 = .0053$
c. The two properties of the probability distribution of a discrete random variable are that $0 \le p(x) \le 1$ for all x and $\sum p(x) = 1$. From part b, all the probabilities are between 0 and 1, and
$\sum p(x) = .1160 + .3124 + .3364 + .1811 + .0488 + .0053 = 1$
d. $P(x \ge 4) = p(4) + p(5) = .0488 + .0053 = .0541$
4.28 a. First, we must find the probability distribution of x. Define the following events:
C: {Chicken is contaminated}
N: {Chicken is not contaminated}
If 3 slaughtered chickens are randomly selected, then the possible outcomes are: CCC, CCN, CNC, NCC, CNN, NCN, NNC, and NNN.
These outcomes are NOT equally likely, since $P(C) = 1/100 = .01$ and $P(N) = 1 - P(C) = 1 - .01 = .99$.
$P(CCC) = P(C \cap C \cap C) = P(C)P(C)P(C) = .01(.01)(.01) = .000001$
$P(CCN) = P(CNC) = P(NCC) = P(C \cap C \cap N) = P(C)P(C)P(N) = .01(.01)(.99) = .000099$
$P(CNN) = P(NCN) = P(NNC) = P(C \cap N \cap N) = P(C)P(N)P(N) = .01(.99)(.99) = .009801$
$P(NNN) = P(N \cap N \cap N) = P(N)P(N)P(N) = .99(.99)(.99) = .970299$
The variable x is defined as the number of contaminated chickens in the sample. The value of x for each of the outcomes is:
Event   x   Probability
CCC     3   .000001
CCN     2   .000099
CNC     2   .000099
NCC     2   .000099
CNN     1   .009801
NCN     1   .009801
NNC     1   .009801
NNN     0   .970299
The probability distribution of x is:
x:     0        1        2        3
p(x):  .970299  .029403  .000297  .000001
b. Using MINITAB, the probability graph for x is:
[Figure: MINITAB graph of p(x) versus x, x = 0, 1, 2, 3]
c. $P(x \le 1) = P(x = 0) + P(x = 1) = .970299 + .029403 = .999702$
4.29 a. $p(1) = .23(.77)^{1-1} = .23(.77)^0 = .23$. The probability that one would encounter a contaminated cartridge on the first trial is .23.
b. $p(5) = .23(.77)^{5-1} = .23(.77)^4 = .0809$. The probability that one would encounter the first contaminated cartridge on the fifth trial is .0809.
c. $P(x \ge 2) = 1 - P(x < 2) = 1 - P(x = 1) = 1 - .23 = .77$. The probability that the first contaminated cartridge is found on the second trial or later is .77.
4.30 a. If the first letters of consumers' last names are all equally likely, then $P(x = i) = 1/26$ for i = 1, 2, ..., 26.
b. The expected value is
$E(x) = \sum x p(x) = 1\left(\frac{1}{26}\right) + 2\left(\frac{1}{26}\right) + 3\left(\frac{1}{26}\right) + \cdots + 26\left(\frac{1}{26}\right) = 13.5$
The average number assigned to a consumer based on his or her last name is 13.5.
c. This probability distribution is probably not realistic. Very few consumers have last names that begin with Q or U, whereas many consumers have last names that begin with S or T. One could estimate the true probability distribution of x by taking a random sample of names from a phone book and looking at the relative frequency distribution of the values of x assigned to the sampled names.
4.31 a. $p(0) = \frac{\binom{20}{0}\binom{100-20}{3-0}}{\binom{100}{3}} = \frac{\frac{20!}{0!(20-0)!}\cdot\frac{80!}{3!(80-3)!}}{\frac{100!}{3!(100-3)!}} = \frac{\frac{20!}{0!20!}\cdot\frac{80!}{3!77!}}{\frac{100!}{3!97!}} = \frac{82{,}160}{161{,}700} = .508$
b. $p(1) = \frac{\binom{20}{1}\binom{100-20}{3-1}}{\binom{100}{3}} = \frac{\frac{20!}{1!19!}\cdot\frac{80!}{2!78!}}{\frac{100!}{3!97!}} = \frac{63{,}200}{161{,}700} = .391$
c. $p(2) = \frac{\binom{20}{2}\binom{100-20}{3-2}}{\binom{100}{3}} = \frac{\frac{20!}{2!18!}\cdot\frac{80!}{1!79!}}{\frac{100!}{3!97!}} = \frac{15{,}200}{161{,}700} = .094$
d. $p(3) = \frac{\binom{20}{3}\binom{100-20}{3-3}}{\binom{100}{3}} = \frac{\frac{20!}{3!17!}\cdot\frac{80!}{0!80!}}{\frac{100!}{3!97!}} = \frac{1{,}140}{161{,}700} = .007$
4.32 a. $E(x) = \sum_{\text{all } x} x p(x)$
Firm A: $E(x) = 0(.01) + 500(.01) + 1000(.01) + 1500(.02) + 2000(.35) + 2500(.30) + 3000(.25) + 3500(.02) + 4000(.01) + 4500(.01) + 5000(.01)$
$= 0 + 5 + 10 + 30 + 700 + 750 + 750 + 70 + 40 + 45 + 50 = 2450$
Firm B: $E(x) = 0(.00) + 200(.01) + 700(.02) + 1200(.02) + 1700(.15) + 2200(.30) + 2700(.30) + 3200(.15) + 3700(.02) + 4200(.02) + 4700(.01)$
$= 0 + 2 + 14 + 24 + 255 + 660 + 810 + 480 + 74 + 84 + 47 = 2450$
b. $\sigma^2 = \sum_{\text{all } x} (x - \mu)^2 p(x)$
Firm A: $\sigma^2 = (0 - 2450)^2(.01) + (500 - 2450)^2(.01) + \cdots + (5000 - 2450)^2(.01)$
$= 60{,}025 + 38{,}025 + 21{,}025 + 18{,}050 + 70{,}875 + 750 + 75{,}625 + 22{,}050 + 24{,}025 + 42{,}025 + 65{,}025 = 437{,}500$
$\sigma = \sqrt{437{,}500} = 661.44$
Firm B: $\sigma^2 = (0 - 2450)^2(.00) + (200 - 2450)^2(.01) + \cdots + (4700 - 2450)^2(.01)$
$= 0 + 50{,}625 + 61{,}250 + 31{,}250 + 84{,}375 + 18{,}750 + 18{,}750 + 84{,}375 + 31{,}250 + 61{,}250 + 50{,}625 = 492{,}500$
$\sigma = \sqrt{492{,}500} = 701.78$
Firm B faces greater risk of physical damage because it has a higher variance and standard deviation.
4.33 To find the probability distribution of x, we sum the probabilities associated with the same value of x. The probability distribution is:
x:     8.5      9        9.5      10       10.5     11       12
p(x):  .462189  .288764  .141671  .069967  .025236  .011657  .000518
4.34 To determine which group of Finnish citizens has the highest average IQ score, we must find the expected value for each group. To do this, we first find the probability distribution for each group by dividing the frequency for each IQ level in each group by the group total.
The probability distributions are:
IQ:         1     2     3     4     5     6     7     8     9
Invest:     .020  .030  .045  .120  .190  .230  .150  .115  .100
No Invest:  .041  .083  .088  .174  .217  .191  .099  .062  .045
For investors, $E(x) = \sum x p(x) = 1(.020) + 2(.030) + 3(.045) + \cdots + 9(.100) = 5.895$
For non-investors, $E(x) = \sum x p(x) = 1(.041) + 2(.083) + 3(.088) + \cdots + 9(.045) = 4.992$
Thus, the investors had a higher average IQ than the non-investors.
4.35 a. Let x = the potential flood damages. Since we assume that the business incurs damages if it rains and incurs no damages if it does not rain, the probability distribution of x is:
x:     0    300,000
p(x):  .7   .3
b. The expected loss due to flood damage is
$E(x) = \sum_{\text{all } x} x p(x) = 0(.7) + 300{,}000(.3) = 0 + 90{,}000 = \$90{,}000$
4.36 Let x = winnings in the Florida lottery. The probability distribution for x is:
x:     -$1                       $6,999,999
p(x):  22,999,999/23,000,000     1/23,000,000
The expected net winnings would be:
$\mu = E(x) = \sum_{\text{all } x} x p(x) = (-1)\left(\frac{22{,}999{,}999}{23{,}000{,}000}\right) + 6{,}999{,}999\left(\frac{1}{23{,}000{,}000}\right) = -\$.70$
The average winnings of all those who play the lottery is -$.70; that is, players lose an average of $.70 per play.
4.37 a. Since there are 20 equally likely outcomes, the probability of each of the 20 numbers is 1/20. The probability distribution of x is: $P(x = 5) = 1/20 = .05$; $P(x = 10) = 1/20 = .05$; etc.
x:     5    10   15   20   25   30   35   40   45   50
p(x):  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05
x:     55   60   65   70   75   80   85   90   95   100
p(x):  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05
b. $\mu = E(x) = \sum x p(x) = 5(.05) + 10(.05) + 15(.05) + \cdots + 95(.05) + 100(.05) = 52.5$
c.
$\sigma^2 = E[(x - \mu)^2] = \sum (x - \mu)^2 p(x) = (5 - 52.5)^2(.05) + (10 - 52.5)^2(.05) + (15 - 52.5)^2(.05) + \cdots + (95 - 52.5)^2(.05) + (100 - 52.5)^2(.05) = 831.25$
$\sigma = \sqrt{831.25} = 28.83$
Since the uniform distribution is not mound-shaped, we will use Chebyshev's theorem to describe the data. We know that at least 8/9 of the observations will fall within 3 standard deviations of the mean and at least 3/4 of the observations will fall within 2 standard deviations of the mean. For this problem,
$\mu \pm 2\sigma = 52.5 \pm 2(28.83) = 52.5 \pm 57.66 \Rightarrow (-5.16, 110.16)$
Thus, at least 3/4 of the data will fall between -5.16 and 110.16. For our problem, all of the observations fall within 2 standard deviations of the mean. In addition, because every value of x is equally likely, x is just as likely to fall within any interval of equal length.
d. If a player spins the wheel twice, the total number of outcomes will be 20(20) = 400. The sample space is:
(5, 5)    (10, 5)    (15, 5)    (20, 5)    (25, 5)    ...  (100, 5)
(5, 10)   (10, 10)   (15, 10)   (20, 10)   (25, 10)   ...  (100, 10)
(5, 15)   (10, 15)   (15, 15)   (20, 15)   (25, 15)   ...  (100, 15)
  ...
(5, 100)  (10, 100)  (15, 100)  (20, 100)  (25, 100)  ...  (100, 100)
Each of these outcomes is equally likely, so each has a probability of 1/400 = .0025. Now, let x equal the sum of the two numbers in each sample. There is one sample with a sum of 10, two samples with a sum of 15, three samples with a sum of 20, etc. If the sum of the two numbers exceeds 100, then x is zero. The probability distribution of x is:
x:     0      10     15     20     25     30     35     40     45     50
p(x):  .5250  .0025  .0050  .0075  .0100  .0125  .0150  .0175  .0200  .0225
x:     55     60     65     70     75     80     85     90     95     100
p(x):  .0250  .0275  .0300  .0325  .0350  .0375  .0400  .0425  .0450  .0475
e. We assumed that the wheel is fair, or that all outcomes are equally likely.
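As a cross-check on the two-spin distribution in part d, the 400 equally likely outcomes can be enumerated directly. The Python sketch below is an illustration added here, not part of the manual's MINITAB workflow; it assumes, as the solution does, that the wheel is fair and that x = 0 whenever the sum of the two spins exceeds 100. The helper name two_spin_distribution is hypothetical, and exact fractions are used so that no rounding enters the check.

```python
from collections import Counter
from fractions import Fraction

def two_spin_distribution(values=range(5, 101, 5), limit=100):
    """Return {x: probability} for the sum of two independent fair spins,
    scoring any sum above `limit` as x = 0 (the rule used in part d)."""
    values = list(values)
    counts = Counter()
    for first in values:
        for second in values:
            total = first + second
            counts[total if total <= limit else 0] += 1
    n_outcomes = len(values) ** 2  # 20 * 20 = 400 equally likely outcomes
    return {x: Fraction(c, n_outcomes) for x, c in sorted(counts.items())}

dist = two_spin_distribution()
mean = sum(x * p for x, p in dist.items())
variance = sum((x - mean) ** 2 * p for x, p in dist.items())

print(float(dist[0]), float(dist[100]))  # 0.525 0.0475
print(float(mean), float(variance))      # 33.25 1471.3125
```

The same enumeration also reproduces the mean and variance reported for this two-spin game in part f.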
f. $\mu = E(x) = \sum x p(x) = 0(.5250) + 10(.0025) + 15(.0050) + 20(.0075) + \cdots + 100(.0475) = 33.25$
$\sigma^2 = E[(x - \mu)^2] = \sum (x - \mu)^2 p(x) = (0 - 33.25)^2(.5250) + (10 - 33.25)^2(.0025) + (15 - 33.25)^2(.0050) + (20 - 33.25)^2(.0075) + \cdots + (100 - 33.25)^2(.0475) = 1{,}471.3125$
$\sigma = \sqrt{1{,}471.3125} = 38.3577$
g. $P(x = 0) = .525$
h. Given that the player obtains a 20 on the first spin, the possible values for x (the sum of the two spins) are 0 (the player spins 85, 90, 95, or 100 on the second spin), 25, 30, ..., 100. To get an x of 25, the player would have to spin a 5 on the second spin. Similarly, the player would have to spin a 10 on the second spin in order to get an x of 30, etc. Since all of the outcomes are equally likely on the second spin, the distribution of x is:
x:     0    25   30   35   40   45   50   55   60
p(x):  .20  .05  .05  .05  .05  .05  .05  .05  .05
x:     65   70   75   80   85   90   95   100
p(x):  .05  .05  .05  .05  .05  .05  .05  .05
i. The probability that the player's total score will exceed one dollar is the probability that x is zero: $P(x = 0) = .20$.
j. Given that the player obtains a 65 on the first spin, the possible values for x (the sum of the two spins) are 0 (the player spins 40, 45, 50, ..., 100 on the second spin), 70, 75, 80, ..., 100. In order to get an x of 70, the player would have to spin a 5 on the second spin. Similarly, the player would have to spin a 10 on the second spin in order to get an x of 75, etc. Since all of the outcomes are equally likely on the second spin, the distribution of x is:
x:     0    70   75   80   85   90   95   100
p(x):  .65  .05  .05  .05  .05  .05  .05  .05
The probability that the player's total score will exceed one dollar is the probability that x is zero: $P(x = 0) = .65$.
4.38 a. Each point in the system can have one of 2 status levels, “free” or “obstacle”.
Define the following events:
AF: {Point A is free}        AO: {Point A is obstacle}
BF: {Point B is free}        BO: {Point B is obstacle}
CF: {Point C is free}        CO: {Point C is obstacle}
Thus, the sample points for the space are: AFBFCF, AFBFCO, AFBOCF, AFBOCO, AOBFCF, AOBFCO, AOBOCF, AOBOCO
b. Since the probability of any point in the system having a “free” status is .5, the probability of any point having an “obstacle” status is also .5. Thus, the probability of each of the sample points above is $P(A_i B_i C_i) = .5(.5)(.5) = .125$.
The values of Y, the number of free links in the system, for each sample point are listed below. A link is free if both of its points are free. Thus, a link from A to B is free if A is free and B is free, and a link from B to C is free if B is free and C is free.
Sample point   Y   Probability
AFBFCF         2   .125
AFBFCO         1   .125
AFBOCF         0   .125
AFBOCO         0   .125
AOBFCF         1   .125
AOBFCO         0   .125
AOBOCF         0   .125
AOBOCO         0   .125
The probability distribution for Y is:
Y:            0     1     2
Probability:  .625  .250  .125
4.39 Let x = bookie's earnings per dollar wagered. Then x can take on the values $1 (you lose) and -$5 (you win). The only way you win is if you pick 3 winners in 3 games. If the probability of picking 1 winner in 1 game is .5, then $P(www) = p(w)p(w)p(w) = .5(.5)(.5) = .125$ (assuming the games are independent). Thus, the probability distribution for x is:
x:     $1    -$5
p(x):  .875  .125
$E(x) = \sum x p(x) = 1(.875) + (-5)(.125) = .875 - .625 = \$.25$
4.40 a. $\binom{6}{2} = \frac{6!}{2!(6-2)!} = \frac{6!}{2!4!} = \frac{6\cdot5\cdot4\cdot3\cdot2\cdot1}{(2\cdot1)(4\cdot3\cdot2\cdot1)} = 15$
b. $\binom{5}{2} = \frac{5!}{2!(5-2)!} = \frac{5!}{2!3!} = \frac{5\cdot4\cdot3\cdot2\cdot1}{(2\cdot1)(3\cdot2\cdot1)} = 10$
c. $\binom{7}{0} = \frac{7!}{0!(7-0)!} = \frac{7!}{0!7!} = \frac{7\cdot6\cdot5\cdot4\cdot3\cdot2\cdot1}{(1)(7\cdot6\cdot5\cdot4\cdot3\cdot2\cdot1)} = 1$ (Note: 0! = 1)
d. $\binom{6}{6} = \frac{6!}{6!(6-6)!} = \frac{6!}{6!0!} = \frac{6\cdot5\cdot4\cdot3\cdot2\cdot1}{(6\cdot5\cdot4\cdot3\cdot2\cdot1)(1)} = 1$
e. $\binom{4}{3} = \frac{4!}{3!(4-3)!} = \frac{4!}{3!1!} = \frac{4\cdot3\cdot2\cdot1}{(3\cdot2\cdot1)(1)} = 4$
4.41 a. x is discrete. It can take on only six values.
b. This is a binomial distribution.
c. $p(0) = \binom{5}{0}(.7)^0(.3)^{5-0} = \frac{5!}{0!5!}(.7)^0(.3)^5 = \frac{5\cdot4\cdot3\cdot2\cdot1}{1(5\cdot4\cdot3\cdot2\cdot1)}(.7)^0(.3)^5 = (1)(.00243) = .00243$
$p(1) = \binom{5}{1}(.7)^1(.3)^{5-1} = \frac{5!}{1!4!}(.7)^1(.3)^4 = .02835$
$p(2) = \binom{5}{2}(.7)^2(.3)^{5-2} = \frac{5!}{2!3!}(.7)^2(.3)^3 = .1323$
$p(3) = \binom{5}{3}(.7)^3(.3)^{5-3} = \frac{5!}{3!2!}(.7)^3(.3)^2 = .3087$
$p(4) = \binom{5}{4}(.7)^4(.3)^{5-4} = \frac{5!}{4!1!}(.7)^4(.3)^1 = .36015$
$p(5) = \binom{5}{5}(.7)^5(.3)^{5-5} = \frac{5!}{5!0!}(.7)^5(.3)^0 = .16807$
[Figure: MINITAB histogram of p(x) versus x, x = 0, ..., 5, with μ = 3.5 and μ ± 2σ marked]
d. $\mu = np = 5(.7) = 3.5$ and $\sigma = \sqrt{npq} = \sqrt{5(.7)(.3)} = 1.0247$
e. $\mu \pm 2\sigma = 3.5 \pm 2(1.0247) = 3.5 \pm 2.0494 \Rightarrow (1.4506, 5.5494)$
4.42 a. $p(0) = \binom{3}{0}(.3)^0(.7)^{3-0} = \frac{3!}{0!3!}(.3)^0(.7)^3 = \frac{3\cdot2\cdot1}{1(3\cdot2\cdot1)}(.3)^0(.7)^3 = (1)(.7)^3 = .343$
$p(1) = \binom{3}{1}(.3)^1(.7)^{3-1} = \frac{3!}{1!2!}(.3)^1(.7)^2 = .441$
$p(2) = \binom{3}{2}(.3)^2(.7)^{3-2} = \frac{3!}{2!1!}(.3)^2(.7)^1 = .189$
$p(3) = \binom{3}{3}(.3)^3(.7)^{3-3} = \frac{3!}{3!0!}(.3)^3(.7)^0 = .027$
b. The probability distribution in tabular form is:
x:     0     1     2     3
p(x):  .343  .441  .189  .027
4.43 a. $P(x = 1) = \frac{5!}{1!4!}(.2)^1(.8)^4 = \frac{5\cdot4\cdot3\cdot2\cdot1}{(1)(4\cdot3\cdot2\cdot1)}(.2)^1(.8)^4 = 5(.2)^1(.8)^4 = .4096$
b. $P(x = 2) = \frac{4!}{2!2!}(.6)^2(.4)^2 = \frac{4\cdot3\cdot2\cdot1}{(2\cdot1)(2\cdot1)}(.6)^2(.4)^2 = 6(.6)^2(.4)^2 = .3456$
c. $P(x = 0) = \frac{3!}{0!3!}(.7)^0(.3)^3 = \frac{3\cdot2\cdot1}{(1)(3\cdot2\cdot1)}(.7)^0(.3)^3 = 1(.7)^0(.3)^3 = .027$
d. $P(x = 3) = \frac{5!}{3!2!}(.1)^3(.9)^2 = \frac{5\cdot4\cdot3\cdot2\cdot1}{(3\cdot2\cdot1)(2\cdot1)}(.1)^3(.9)^2 = 10(.1)^3(.9)^2 = .0081$
e. $P(x = 2) = \frac{4!}{2!2!}(.4)^2(.6)^2 = \frac{4\cdot3\cdot2\cdot1}{(2\cdot1)(2\cdot1)}(.4)^2(.6)^2 = 6(.4)^2(.6)^2 = .3456$
f. $P(x = 1) = \frac{3!}{1!2!}(.9)^1(.1)^2 = \frac{3\cdot2\cdot1}{(1)(2\cdot1)}(.9)^1(.1)^2 = 3(.9)^1(.1)^2 = .027$
4.44 a. $P(x = 2) = P(x \le 2) - P(x \le 1) = .167 - .046 = .121$ (from Table I, Appendix D, with n = 10 and p = .4)
b. $P(x \le 5) = .034$
c. $P(x > 1) = 1 - P(x \le 1) = 1 - .919 = .081$
d. $P(x < 10) = P(x \le 9) = 0$
e. $P(x \ge 10) = 1 - P(x \le 9) = 1 - .002 = .998$
f. $P(x = 2) = P(x \le 2) - P(x \le 1) = .206 - .069 = .137$
4.45 a. $\mu = np = 25(.5) = 12.5$; $\sigma^2 = np(1 - p) = 25(.5)(.5) = 6.25$ and $\sigma = \sqrt{6.25} = 2.5$
b. $\mu = np = 80(.2) = 16$; $\sigma^2 = np(1 - p) = 80(.2)(.8) = 12.8$ and $\sigma = \sqrt{12.8} = 3.578$
c. $\mu = np = 100(.6) = 60$; $\sigma^2 = np(1 - p) = 100(.6)(.4) = 24$ and $\sigma = \sqrt{24} = 4.899$
d. $\mu = np = 70(.9) = 63$; $\sigma^2 = np(1 - p) = 70(.9)(.1) = 6.3$ and $\sigma = \sqrt{6.3} = 2.510$
e. $\mu = np = 60(.8) = 48$; $\sigma^2 = np(1 - p) = 60(.8)(.2) = 9.6$ and $\sigma = \sqrt{9.6} = 3.098$
f.
$\mu = np = 1{,}000(.04) = 40$; $\sigma^2 = np(1 - p) = 1{,}000(.04)(.96) = 38.4$ and $\sigma = \sqrt{38.4} = 6.197$
4.46 x is a binomial random variable with n = 4.
a. If the probability distribution of x is symmetric, then p(0) = p(4) and p(1) = p(3).
We know $p(x) = \binom{n}{x} p^x q^{n-x}$, x = 0, 1, ..., n. When n = 4,
$p(0) = p(4) \Rightarrow \binom{4}{0} p^0 q^4 = \binom{4}{4} p^4 q^0 \Rightarrow \frac{4!}{0!4!} p^0 q^4 = \frac{4!}{4!0!} p^4 q^0 \Rightarrow q^4 = p^4 \Rightarrow q = p$
Since p + q = 1, p = .5. Therefore, the probability distribution of x is symmetric when p = .5.
b. If the probability distribution of x is skewed to the right, then the mean is greater than the median. Therefore, there are more small values in the distribution (0, 1) than large values (3, 4). For this to happen, p must be smaller than .5. If we pick p = .2, the probability distribution of x will be skewed to the right.
c. If the probability distribution of x is skewed to the left, then the mean is smaller than the median. Therefore, there are more large values in the distribution (3, 4) than small values (0, 1). For this to happen, p must be larger than .5. If we pick p = .8, the probability distribution of x will be skewed to the left.
d. In part a, x is a binomial random variable with n = 4 and p = .5.
$p(x) = \binom{4}{x}.5^x .5^{4-x}$, x = 0, 1, 2, 3, 4
$p(0) = \binom{4}{0}.5^0 .5^4 = \frac{4!}{0!4!}.5^4 = 1(.5)^4 = .0625$
$p(1) = \binom{4}{1}.5^1 .5^3 = \frac{4!}{1!3!}.5^4 = 4(.5)^4 = .25$
$p(2) = \binom{4}{2}.5^2 .5^2 = \frac{4!}{2!2!}.5^4 = 6(.5)^4 = .375$
$p(3) = p(1) = .25$ (since the distribution is symmetric)
$p(4) = p(0) = .0625$
The probability distribution of x in tabular form is:
x:     0      1    2     3    4
p(x):  .0625  .25  .375  .25  .0625
$\mu = np = 4(.5) = 2$
Using MINITAB, the graph of the probability distribution of x when n = 4 and p = .5 is as follows:
[Figure: MINITAB histogram of p(x) versus x for n = 4, p = .5, with μ = 2 marked]
In part b, x is a binomial random variable with n = 4 and p = .2.
$p(x) = \binom{4}{x}.2^x .8^{4-x}$, x = 0, 1, 2, 3, 4
$p(0) = \binom{4}{0}.2^0 .8^4 = 1(1)(.8)^4 = .4096$
$p(1) = \binom{4}{1}.2^1 .8^3 = 4(.2)(.8)^3 = .4096$
$p(2) = \binom{4}{2}.2^2 .8^2 = 6(.2)^2(.8)^2 = .1536$
$p(3) = \binom{4}{3}.2^3 .8^1 = 4(.2)^3(.8) = .0256$
$p(4) = \binom{4}{4}.2^4 .8^0 = 1(.2)^4(1) = .0016$
The probability distribution of x in tabular form is:
x:     0      1      2      3      4
p(x):  .4096  .4096  .1536  .0256  .0016
$\mu = np = 4(.2) = .8$
Using MINITAB, the graph of the probability distribution of x when n = 4 and p = .2 is as follows:
[Figure: MINITAB histogram of p(x) versus x for n = 4, p = .2, with μ = .8 marked]
In part c, x is a binomial random variable with n = 4 and p = .8.
$p(x) = \binom{4}{x}.8^x .2^{4-x}$, x = 0, 1, 2, 3, 4
$p(0) = \binom{4}{0}.8^0 .2^4 = 1(1)(.2)^4 = .0016$
$p(1) = \binom{4}{1}.8^1 .2^3 = 4(.8)(.2)^3 = .0256$
$p(2) = \binom{4}{2}.8^2 .2^2 = 6(.8)^2(.2)^2 = .1536$
$p(3) = \binom{4}{3}.8^3 .2^1 = 4(.8)^3(.2) = .4096$
$p(4) = \binom{4}{4}.8^4 .2^0 = 1(.8)^4(1) = .4096$
The probability distribution of x in tabular form is:
x:     0      1      2      3      4
p(x):  .0016  .0256  .1536  .4096  .4096
Note: The distribution of x when n = 4 and p = .8 is the reverse of the distribution of x when n = 4 and p = .2.
$\mu = np = 4(.8) = 3.2$
Using MINITAB, the graph of the probability distribution of x when n = 4 and p = .8 is as follows:
[Figure: MINITAB histogram of p(x) versus x for n = 4, p = .8, with μ = 3.2 marked]
e. In general, when p = .5, a binomial distribution will be symmetric regardless of the value of n. When p is less than .5, the binomial distribution will be skewed to the right; when p is greater than .5, it will be skewed to the left. (Refer to parts a, b, and c.)
4.47 a. Let S = adult who does not work while on summer vacation.
b. To see if x is approximately a binomial random variable, we check the characteristics:
1. n identical trials. Although the trials are not exactly identical, they are close. Taking a sample of reasonable size n from a very large population will result in trials being essentially identical.
2. Two possible outcomes. The adults can either not work on their summer vacation or work on their summer vacation. S = adult does not work on summer vacation and F = adult does work on summer vacation.
3. P(S) remains the same from trial to trial.
If we sample without replacement, then P(S) will change slightly from trial to trial. However, the changes are extremely small, essentially 0.
4. Trials are independent. Again, although the trials are not exactly independent, they are very close.
5. The random variable x = number of adults who do not work on their summer vacation in n = 10 trials.
Thus, x is very close to being a binomial. We will assume that it is a binomial random variable.
c. For this problem, p = .35.
d. Using MINITAB with n = 10 and p = .35, the probability is:
Probability Density Function
Binomial with n = 10 and p = 0.35
x    P( X = x )
3    0.252220
Thus, $P(x = 3) = .2522$.
e. Using MINITAB with n = 10 and p = .35, the probability is:
Cumulative Distribution Function
Binomial with n = 10 and p = 0.35
x    P( X <= x )
2    0.261607
Thus, $P(x \le 2) = P(x = 0) + P(x = 1) + P(x = 2) = .2616$.
4.48 a. To see if x is approximately a binomial random variable, we check the characteristics:
1. n identical trials. Although the trials are not exactly identical, they are close. Taking a sample of size n = 15 from a very large population will result in trials being essentially identical.
2. Two possible outcomes. The hotel guests are either aware of and participate in the conservation efforts or they are not. S = hotel guest is aware of and participates in conservation efforts and F = hotel guest is not aware of and/or does not participate in conservation efforts.
3. P(S) remains the same from trial to trial. If we sample without replacement, then P(S) will change slightly from trial to trial. However, the changes are extremely small, essentially 0.
4. Trials are independent. Again, although the trials are not exactly independent, they are very close.
5. The random variable x = number of hotel guests who are aware of and participate in conservation efforts in n = 15 trials.
Thus, x is very close to being a binomial.
We will assume that it is a binomial random variable.
b. Define the following events:
P: {hotel guest is aware of conservation program}
A: {hotel guest participates in conservation efforts}
Then, $p = P(P \cap A) = P(P \mid A)P(A) = .72(.66) = .4752$.
c. Using MINITAB with n = 15 and p = .45, the probability is:
Cumulative Distribution Function
Binomial with n = 15 and p = 0.45
x    P( X <= x )
9    0.923071
Thus, $P(x \ge 10) = 1 - P(x \le 9) = 1 - .9231 = .0769$.
4.49 a. To see if x is approximately a binomial random variable, we check the characteristics:
1. n identical trials. Although the trials are not exactly identical, they are close. Taking a sample of size n = 250 from a very large population will result in trials being essentially identical.
2. Two possible outcomes. A U.S. adult has either used the internet and paid to download music or he/she has not. S = U.S. adult has used the internet and paid to download music and F = U.S. adult has not used the internet and/or has not paid to download music.
3. P(S) remains the same from trial to trial. If we sample without replacement, then P(S) will change slightly from trial to trial. However, the changes are extremely small, essentially 0.
4. Trials are independent. Again, although the trials are not exactly independent, they are very close.
5. The random variable x = number of U.S. adults who have used the internet and paid to download music in n = 250 trials.
Thus, x is very close to being a binomial. We will assume that it is a binomial random variable.
b. For this example, p = .5. Half of the adults in the U.S. have used the internet and paid to download music.
c. $E(x) = np = 250(.5) = 125$
4.50 a. We will check the 5 characteristics of a binomial random variable.
1. The experiment consists of n identical trials.
2. There are only 2 possible outcomes for each trial.
Let S = general practice physician in the United States does not recommend medicine as a career and F = general practice physician in the United States does recommend medicine as a career.
3. The probability of success (S) is the same from trial to trial. For each trial, $p = P(S) = .60$ and $q = 1 - p = 1 - .60 = .40$.
4. The trials are independent.
5. The binomial random variable x is the number of general practice physicians in the United States in n trials who do not recommend medicine as a career.
Thus, x is a binomial random variable.
b. From the information given, p = .60.
c. $\mu = E(x) = np = 25(.60) = 15$ and $\sigma = \sqrt{npq} = \sqrt{25(.60)(.40)} = \sqrt{6} = 2.4495$
d. From Table I, Appendix D, with n = 25 and p = .60, $P(x \ge 1) = 1 - P(x = 0) = 1 - .000 = 1.000$.
4.51 For this problem, let x = number of law librarians who are unsatisfied with their job. Then x is a binomial random variable with n = 20 and p = 1 - .90 = .10. Using MINITAB with n = 20 and p = .10, the probability is:
Cumulative Distribution Function
Binomial with n = 20 and p = 0.1
x    P( X <= x )
2    0.676927
Thus, $P(x \le 2) = .6769$.
4.52 a. Let x = number of students out of 20 who initially answer the question correctly. Then x is a binomial random variable with n = 20 and p = .5. Using MINITAB with n = 20 and p = .5, the probability is:
Cumulative Distribution Function
Binomial with n = 20 and p = 0.5
x    P( X <= x )
10   0.588099
Thus, $P(x > 10) = 1 - P(x \le 10) = 1 - .5881 = .4119$.
b. Let y = number of students out of 20 who answer the question correctly after immediate feedback. Then y is a binomial random variable with n = 20 and p = .7. Using MINITAB with n = 20 and p = .7, the probability is:
Cumulative Distribution Function
Binomial with n = 20 and p = 0.7
x    P( X <= x )
10   0.0479619
Thus, $P(y > 10) = 1 - P(y \le 10) = 1 - .0480 = .9520$.
4.53 a. Let x = number of pairs correctly identified by an expert in 5 trials.
Then x is a binomial random variable with n = 5 and p = .92. Using MINITAB with n = 5 and p = .92, the probability is:
Probability Density Function
Binomial with n = 5 and p = 0.92
x    P( X = x )
5    0.659082
Thus, $P(x = 5) = .6591$.
b. Let y = number of pairs correctly identified by a novice in 5 trials. Then y is a binomial random variable with n = 5 and p = .75. Using MINITAB with n = 5 and p = .75, the probability is:
Probability Density Function
Binomial with n = 5 and p = 0.75
x    P( X = x )
5    0.237305
Thus, $P(y = 5) = .2373$.
4.54 a. Let x = number of commissioners out of 4 who vote in favor of an issue. Then x is a binomial random variable with n = 4 and p = .5 (since they are equally likely to vote for or against an issue). The probability that your vote counts is equal to P(x = 2).
$P(x = 2) = \frac{4!}{2!(4-2)!}.5^2(.5)^{4-2} = \frac{4\cdot3\cdot2\cdot1}{(2\cdot1)(2\cdot1)}.5^2(.5)^2 = .375$
b. Let x = number of commissioners out of 2 who vote in favor of an issue. Then x is a binomial random variable with n = 2 and p = .5 (since they are equally likely to vote for or against an issue). The probability that your vote counts is equal to P(x = 1).
$P(x = 1) = \frac{2!}{1!(2-1)!}.5^1(.5)^{2-1} = \frac{2\cdot1}{(1)(1)}.5(.5) = .5$
4.55 Let x = number of major bridges in Denver out of 10 that will have a rating of 4 or below in 2020. Then x has an approximate binomial distribution with n = 10 and p = .09.
a. $P(x \ge 3) = 1 - P(x \le 2) = 1 - [P(x = 0) + P(x = 1) + P(x = 2)]$
$= 1 - \left[\binom{10}{0}.09^0(.91)^{10-0} + \binom{10}{1}.09^1(.91)^{10-1} + \binom{10}{2}.09^2(.91)^{10-2}\right]$
$= 1 - \left[\frac{10!}{0!10!}.09^0 .91^{10} + \frac{10!}{1!9!}.09^1 .91^9 + \frac{10!}{2!8!}.09^2 .91^8\right] = 1 - [.389 + .385 + .171] = .055$
b. Since the probability of seeing at least 3 bridges out of 10 with ratings of 4 or less is so small, we can conclude that the forecast that 9% of all major Denver bridges will have ratings of 4 or less in 2020 is too small. There would probably be more than 9%.
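The arithmetic in 4.55a can be reproduced with a short Python sketch built on the standard library's math.comb (available in Python 3.8 and later). The helper names binomial_pmf and binomial_cdf are hypothetical, chosen here for illustration; they simply apply the binomial formula used throughout this section. Carrying full precision instead of the three-decimal terms gives about .0540 rather than .055; the difference is only rounding and does not change the conclusion in part b.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(x = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_cdf(k, n, p):
    """P(x <= k), obtained by summing the pmf from 0 through k."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

n, p = 10, 0.09
terms = [binomial_pmf(i, n, p) for i in range(3)]  # P(x = 0), P(x = 1), P(x = 2)
at_least_3 = 1 - binomial_cdf(2, n, p)             # P(x >= 3) = 1 - P(x <= 2)

print([round(t, 3) for t in terms])  # [0.389, 0.385, 0.171]
print(round(at_least_3, 4))          # 0.054
```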
4.56 Define the following events:
A: {Taxpayer is audited}
B: {Taxpayer has income less than $1 million}
C: {Taxpayer has income of $1 million or higher}
a. From the information given in the problem, $P(A \mid B) = 1/100 = .01$ and $P(A \mid C) = 9/100 = .09$.
b. Let x = number of taxpayers with incomes under $1 million who are audited. Then x is a binomial random variable with n = 5 and p = .01.
$P(x = 1) = \binom{5}{1}.01^1(.99)^{5-1} = \frac{5!}{1!4!}.01^1(.99)^4 = .0480$
$P(x > 1) = 1 - [P(x = 0) + P(x = 1)] = 1 - \left[\binom{5}{0}.01^0(.99)^{5-0} + .0480\right] = 1 - \left[\frac{5!}{0!5!}.01^0(.99)^5 + .0480\right] = 1 - [.9510 + .0480] = 1 - .9990 = .0010$
c. Let x = number of taxpayers with incomes of $1 million or more who are audited. Then x is a binomial random variable with n = 5 and p = .09.
$P(x = 1) = \binom{5}{1}.09^1(.91)^{5-1} = \frac{5!}{1!4!}.09^1(.91)^4 = .3086$
$P(x > 1) = 1 - [P(x = 0) + P(x = 1)] = 1 - \left[\binom{5}{0}.09^0(.91)^{5-0} + .3086\right] = 1 - \left[\frac{5!}{0!5!}.09^0(.91)^5 + .3086\right] = 1 - [.6240 + .3086] = 1 - .9326 = .0674$
d. Let x = number of taxpayers with incomes under $1 million who are audited. Then x is a binomial random variable with n = 2 and p = .01. Let y = number of taxpayers with incomes of $1 million or more who are audited. Then y is a binomial random variable with n = 2 and p = .09.
$P(x = 0) = \binom{2}{0}.01^0(.99)^{2-0} = \frac{2!}{0!2!}.01^0(.99)^2 = .9801$
$P(y = 0) = \binom{2}{0}.09^0(.91)^{2-0} = \frac{2!}{0!2!}.09^0(.91)^2 = .8281$
$P(x = 0)P(y = 0) = .9801(.8281) = .8116$
e. We must assume that the variables defined as x and y are binomial random variables. That is, we must assume that the trials are identical, that the probability of success is the same from trial to trial, and that the trials are independent.
4.57 a. $\mu = E(x) = np = 800(.65) = 520$ and $\sigma = \sqrt{npq} = \sqrt{800(.65)(.35)} = \sqrt{182} = 13.49$
b. Half of the 800 food items would be 400. A value of x = 400 would have a z-score of:
$z = \frac{x - \mu}{\sigma} = \frac{400 - 520}{13.49} = -8.90$
Since the z-score associated with 400 items is so small (-8.90), it would be virtually impossible to observe fewer than half with any pesticides if the 65% value were correct.
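The standardization in 4.57 can be sketched in Python as follows. The helper name binomial_z_score is hypothetical; the function only applies $\mu = np$ and $\sigma = \sqrt{npq}$ from part a and then $z = (x - \mu)/\sigma$ from part b. Keeping σ unrounded gives z of about -8.895, which the solution reports as -8.90 after rounding σ to 13.49.

```python
from math import sqrt

def binomial_z_score(x, n, p):
    """Standardize a binomial count x using mu = np and sigma = sqrt(np(1 - p))."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return mu, sigma, (x - mu) / sigma

# 4.57: n = 800 food items, p = .65, and x = 400 (half of the items)
mu, sigma, z = binomial_z_score(400, 800, 0.65)
print(round(mu, 2), round(sigma, 2))  # 520.0 13.49
print(round(z, 3))
```

The same helper applies to 4.58 by calling it with x = 4, n = 500, and p = .001.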
4.58 Assuming the supplier's claim is true, \(\mu = np = 500(.001) = .5\) and \(\sigma = \sqrt{npq} = \sqrt{500(.001)(.999)} = \sqrt{.4995} = .707\).

If the supplier's claim is true, we would expect to find only .5 defective switches in a sample of size 500. Therefore, it is not likely we would find 4. Based on the sample, the guarantee is probably inaccurate. Note: \(z = \frac{x - \mu}{\sigma} = \frac{4 - .5}{.707} = 4.95\). This is an unusually large z-score.

4.59 a. We must assume that the probability that a specific type of ball meets the requirements is always the same from trial to trial and that the trials are independent. To use the binomial probability distribution, we need to know the probability that a specific type of golf ball meets the requirements.

b. For a binomial distribution, \(\mu = np\) and \(\sigma = \sqrt{npq}\). In this example, n = two dozen = \(2(12) = 24\), \(p = .10\), and \(q = 1 - .10 = .90\). (Success here means the golf ball does not meet standards.)

\(\mu = np = 24(.10) = 2.4\) and \(\sigma = \sqrt{npq} = \sqrt{24(.10)(.90)} = 1.47\)

c. In this situation, \(n = 24\), p = probability of success = probability a golf ball does meet standards = .90, and \(q = 1 - .90 = .10\).

\(\mu = E(y) = np = 24(.90) = 21.6\) and \(\sigma = \sqrt{npq} = \sqrt{24(.90)(.10)} = 1.47\) (Note that σ is the same as in part b.)

4.60 a. For this test, \(n = 20\) and \(p = .10\), so x is a binomial random variable with \(n = 20\) and \(p = .10\). Using Table I, Appendix D, with \(n = 20\) and \(p = .10\), \(P(x \le 1) = .392\).

b. For the experiment in part a, the level of confidence is \(1 - P(x \le 1) = 1 - .392 = .608\). Since this value is not close to 1, this would not be an acceptable level.

c. Suppose we increase n from 20 to 25. Using Table I, Appendix D, with \(n = 25\) and \(p = .10\), \(P(x \le 1) = .271\). This value is smaller than the value found in part a. The level of confidence is \(1 - P(x \le 1) = 1 - .271 = .729\).

Now, suppose we keep \(n = 20\) but change K to 0 instead of 1. Using Table I, Appendix D, with \(n = 20\) and \(p = .10\), \(P(x = 0) = .122\). This value is again smaller than the value found in part a. The level of confidence is \(1 - P(x = 0) = 1 - .122 = .878\).
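In Exercise 4.60, the level of confidence is \(1 - P(x \le K)\) for a binomial x with \(p = .10\). The sketch below assumes only Python's standard library; `confidence` and `smallest_n` are illustrative names. It reproduces parts a through c. It also searches for the smallest n that reaches .95, which is the question taken up in part d.

```python
from math import comb

def confidence(n: int, k: int, p: float = 0.10) -> float:
    """Level of confidence 1 - P(x <= k) for x ~ Binomial(n, p)."""
    return 1 - sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))

def smallest_n(k: int, target: float = 0.95, p: float = 0.10, n_max: int = 1000):
    """Smallest sample size whose level of confidence is at least target."""
    for n in range(k + 1, n_max + 1):
        if confidence(n, k, p) >= target:
            return n
    return None  # no n up to n_max reaches the target

print(round(confidence(20, 1), 3), round(confidence(25, 1), 3), round(confidence(20, 0), 3))
print(smallest_n(0), smallest_n(1))
```

The linear search is adequate here because the level of confidence increases with n for fixed K.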
d. Suppose we let \(K = 0\). Now, we need to find n such that the level of confidence is at least .95, which means that \(P(x = 0) \le .05\).

\[P(x = 0) = \binom{n}{0}(.1)^0(.9)^{n-0} = \frac{n!}{0!\,n!}(.9)^n = .9^n \le .05\]
\[\ln(.9^n) \le \ln(.05) \;\Rightarrow\; n\ln(.9) \le \ln(.05) \;\Rightarrow\; n \ge \frac{\ln(.05)}{\ln(.9)} = \frac{-2.99573}{-.10536} = 28.4\]

The inequality reverses in the last step because \(\ln(.9)\) is negative. Thus, if \(K = 0\), we need a sample size of 29 or larger to get a level of confidence of at least .95.

Now, suppose \(K = 1\). We need to find n such that the level of confidence is at least .95, which means that \(P(x \le 1) \le .05\).

\[P(x \le 1) = P(x = 0) + P(x = 1) = \binom{n}{0}(.1)^0(.9)^{n} + \binom{n}{1}(.1)^1(.9)^{n-1} = .9^n + n(.1)(.9)^{n-1} = .9^{n-1}(.9 + .1n) \le .05\]

From here, we will use trial and error:

    n     \(.9^{n-1}(.9 + .1n)\)
    30    .1837
    40    .0805
    45    .0524
    46    .0480

Thus, for \(K = 1\), we would need a sample size of 46 to get a level of confidence of at least .95.

4.61 a. The random variable x is discrete since it can assume a countable number of values (0, 1, 2, ...).

b. This is a Poisson probability distribution with \(\lambda = 3\).

c. In order to graph the probability distribution, we need to know the probabilities for the possible values of x. Using MINITAB with \(\lambda = 3\):

    Probability Density Function
    Poisson with mean = 3
    x     P( X = x )
    0     0.049787
    1     0.149361
    2     0.224042
    3     0.224042
    4     0.168031
    5     0.100819
    6     0.050409
    7     0.021604
    8     0.008102
    9     0.002701
    10    0.000810

Using MINITAB, the probability distribution of x in graphical form is:

[Figure: MINITAB histogram of x, showing f(x) for x = 0, 1, ..., 10]

d. \(\mu = \lambda = 3\), \(\sigma^2 = \lambda = 3\), and \(\sigma = \sqrt{3} = 1.7321\)

4.62 \(\lambda = 1.5\). Using MINITAB with the Poisson distribution and \(\lambda = 1.5\), the probabilities are:

    Cumulative Distribution Function
    Poisson with mean = 1.5
    x    P( X <= x )
    3    0.934358
    2    0.808847
    0    0.223130
    6    0.999074

a. \(P(x \le 3) = .934358\).

b. \(P(x \ge 3) = 1 - P(x \le 2) = 1 - .808847 = .191153\).

c. \(P(x = 3) = P(x \le 3) - P(x \le 2) = .934358 - .808847 = .125511\).
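The Poisson probabilities that MINITAB reports in Exercises 4.61 and 4.62 follow directly from \(p(x) = \lambda^x e^{-\lambda}/x!\). The sketch below assumes only Python's standard library. `poisson_pmf` and `poisson_cdf` are illustrative helper names, not a library API.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability p(x) = lam^x e^(-lam) / x!."""
    return lam**x * exp(-lam) / factorial(x)

def poisson_cdf(x: int, lam: float) -> float:
    """Cumulative probability P(X <= x)."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

# Exercise 4.61c: the table of p(x) for lambda = 3, x = 0, ..., 10
table_461 = {x: round(poisson_pmf(x, 3), 6) for x in range(11)}

# Exercise 4.62 with lambda = 1.5
p_le_3 = poisson_cdf(3, 1.5)                        # part a
p_ge_3 = 1 - poisson_cdf(2, 1.5)                    # part b
p_eq_3 = poisson_cdf(3, 1.5) - poisson_cdf(2, 1.5)  # part c
```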
d. \(P(x = 0) = .22313\).

e. \(P(x > 0) = 1 - P(x = 0) = 1 - .22313 = .77687\).

f. \(P(x > 6) = 1 - P(x \le 6) = 1 - .999074 = .000926\).

4.63 a.
\[P(x = 1) = \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}} = \frac{\binom{3}{1}\binom{5-3}{3-1}}{\binom{5}{3}} = \frac{\frac{3!}{1!2!}\cdot\frac{2!}{2!0!}}{\frac{5!}{3!2!}} = \frac{3(1)}{10} = .3\]

b.
\[P(x = 3) = \frac{\binom{3}{3}\binom{9-3}{5-3}}{\binom{9}{5}} = \frac{\frac{3!}{3!0!}\cdot\frac{6!}{2!4!}}{\frac{9!}{5!4!}} = \frac{1(15)}{126} = .119\]

c.
\[P(x = 2) = \frac{\binom{2}{2}\binom{4-2}{2-2}}{\binom{4}{2}} = \frac{\frac{2!}{2!0!}\cdot\frac{2!}{0!2!}}{\frac{4!}{2!2!}} = \frac{1(1)}{6} = .167\]

d.
\[P(x = 0) = \frac{\binom{2}{0}\binom{4-2}{2-0}}{\binom{4}{2}} = \frac{\frac{2!}{0!2!}\cdot\frac{2!}{2!0!}}{\frac{4!}{2!2!}} = \frac{1(1)}{6} = .167\]

4.64 For \(N = 8\), \(n = 3\), and \(r = 5\):

a.
\[P(x = 1) = \frac{\binom{5}{1}\binom{8-5}{3-1}}{\binom{8}{3}} = \frac{\frac{5!}{1!4!}\cdot\frac{3!}{2!1!}}{\frac{8!}{3!5!}} = \frac{5(3)}{56} = .268\]

b.
\[P(x = 0) = \frac{\binom{5}{0}\binom{8-5}{3-0}}{\binom{8}{3}} = \frac{\frac{5!}{0!5!}\cdot\frac{3!}{3!0!}}{\frac{8!}{3!5!}} = \frac{1(1)}{56} = .018\]

c.
\[P(x = 3) = \frac{\binom{5}{3}\binom{8-5}{3-3}}{\binom{8}{3}} = \frac{\frac{5!}{3!2!}\cdot\frac{3!}{0!3!}}{\frac{8!}{3!5!}} = \frac{10(1)}{56} = .179\]

d. \(P(x \ge 4) = P(x = 4) + P(x = 5) = 0\). Since the sample size is only 3, there is no way to get 4 or more successes in only 3 trials.

4.65 a. Using MINITAB with \(\lambda = 1\) and the Poisson distribution, the probability is:

    Cumulative Distribution Function
    Poisson with mean = 1
    x    P( X <= x )
    2    0.919699

\(P(x \le 2) = .919699\)

b. Using MINITAB with \(\lambda = 2\) and the Poisson distribution, the probability is:

    Cumulative Distribution Function
    Poisson with mean = 2
    x    P( X <= x )
    2    0.676676

\(P(x \le 2) = .676676\)

c. Using MINITAB with \(\lambda = 3\) and the Poisson distribution, the probability is:

    Cumulative Distribution Function
    Poisson with mean = 3
    x    P( X <= x )
    2    0.423190

\(P(x \le 2) = .42319\)

d. The probability decreases as λ increases. This is reasonable because λ is equal to the mean. As the mean increases, the probability that x is at or below a particular value will decrease.

4.66 a. To graph the Poisson probability distribution with \(\lambda = 5\), we need to calculate p(x) for x = 0 to 15.
Using MINITAB with \(\lambda = 5\), the results are:

    Probability Density Function
    Poisson with mean = 5
    x     P( X = x )
    0     0.006738
    1     0.033690
    2     0.084224
    3     0.140374
    4     0.175467
    5     0.175467
    6     0.146223
    7     0.104445
    8     0.065278
    9     0.036266
    10    0.018133
    11    0.008242
    12    0.003434
    13    0.001321
    14    0.000472
    15    0.000157

Using MINITAB, the graph is:

[Figure: MINITAB histogram of x, showing f(x) for x = 0 to 15]

b. \(\mu = \lambda = 5\), \(\sigma^2 = \lambda = 5\), and \(\sigma = \sqrt{5} = 2.2361\)

\[\mu \pm 2\sigma = 5 \pm 2(2.2361) = 5 \pm 4.4722 = (.5278,\ 9.4722)\]

c. Using MINITAB with \(\lambda = 5\):

    Cumulative Distribution Function
    Poisson with mean = 5
    x    P( X <= x )
    9    0.968172
    0    0.006738

\[P(.5278 < x < 9.4722) = P(1 \le x \le 9) = P(x \le 9) - P(x = 0) = .968172 - .006738 = .961434\]

4.67 For this problem, \(N = 100\), \(n = 10\), and \(x = 4\).

a. If the sample is drawn without replacement, the hypergeometric distribution should be used. The hypergeometric distribution requires that sampling be done without replacement.

b. If the sample is drawn with replacement, the binomial distribution should be used. Sampling with replacement keeps the trials independent, with the same probability of success on every draw, as the binomial distribution requires.

4.68 With \(N = 10\), \(n = 5\), and \(r = 7\), x can take on the values 2, 3, 4, or 5.

a.
\[P(x = 2) = \frac{\binom{7}{2}\binom{10-7}{5-2}}{\binom{10}{5}} = \frac{\frac{7!}{2!5!}\cdot\frac{3!}{3!0!}}{\frac{10!}{5!5!}} = \frac{21(1)}{252} = .083\]
\[P(x = 3) = \frac{\binom{7}{3}\binom{10-7}{5-3}}{\binom{10}{5}} = \frac{\frac{7!}{3!4!}\cdot\frac{3!}{2!1!}}{\frac{10!}{5!5!}} = \frac{35(3)}{252} = .417\]
\[P(x = 4) = \frac{\binom{7}{4}\binom{10-7}{5-4}}{\binom{10}{5}} = \frac{\frac{7!}{4!3!}\cdot\frac{3!}{1!2!}}{\frac{10!}{5!5!}} = \frac{35(3)}{252} = .417\]
\[P(x = 5) = \frac{\binom{7}{5}\binom{10-7}{5-5}}{\binom{10}{5}} = \frac{\frac{7!}{5!2!}\cdot\frac{3!}{0!3!}}{\frac{10!}{5!5!}} = \frac{21(1)}{252} = .083\]

The probability distribution of x in tabular form is:

    x       2      3      4      5
    p(x)   .083   .417   .417   .083

b. \(\mu = \frac{nr}{N} = \frac{5(7)}{10} = 3.5\) and \(\sigma^2 = \frac{r(N-r)n(N-n)}{N^2(N-1)} = \frac{7(10-7)5(10-5)}{10^2(10-1)} = \frac{525}{900} = .583\)

c. \(\sigma = \sqrt{.5833} = .764\), so

\[\mu \pm 2\sigma = 3.5 \pm 2(.764) = 3.5 \pm 1.528 = (1.972,\ 5.028)\]

The graph of the distribution is:

[Figure: graph of p(x) for x = 2, 3, 4, 5, with μ = 3.5 and the interval endpoints 1.972 and 5.028 marked]

d. \(P(1.972 < x < 5.028) = P(2 \le x \le 5) = 1.000\)

4.69 a. The characteristics of a binomial random variable are:

1. n identical trials.
We are selecting 10 robots from 106. On the first trial, we are selecting 1 robot out of 106. On the next trial, we are selecting 1 robot out of 105. On the 10th trial, we are selecting 1 robot out of 97. These trials are not identical.

2. Two possible outcomes. A selected robot either has no legs or wheels, or it has some legs or wheels. S = robot has no legs or wheels and F = robot has legs and/or wheels. This condition is met.

3. P(S) remains the same from trial to trial. For this example, the probability of success does not stay constant. On the first trial, there are 106 robots, of which 15 have neither legs nor wheels. Thus, P(S) on the first trial is 15/106. If a robot with neither legs nor wheels is selected on the first trial, then P(S) on the second trial would be 14/105. If a robot with neither legs nor wheels is not selected on the first trial, then P(S) on the second trial would be 15/105. The value of P(S) is not constant from trial to trial. This condition is not met.

4. Trials are independent. The trials are not independent. The type of robot selected on one trial affects the type of robot selected on the next trial. This condition is not met.

5. The random variable x = number of robots selected that do not have legs or wheels in 10 trials.

The necessary conditions for a binomial random variable are not met.

b. The characteristics of a hypergeometric random variable are:

1. The experiment consists of randomly drawing n elements without replacement from a set of N elements, r of which are successes and (N − r) of which are failures. For this example, there are a total of \(N = 106\) robots. Of these, \(r = 15\) have neither legs nor wheels and \(N - r = 106 - 15 = 91\) have some legs and/or wheels. We are selecting \(n = 10\) robots.

2. The hypergeometric random variable x is the number of successes in the draw of n elements. For this example, x = number of robots selected with no legs or wheels in 10 selections.
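The hypergeometric probabilities and moments used in Exercises 4.63, 4.64, and 4.68 can be checked with exact integer binomial coefficients. The sketch below assumes only Python's standard library. `hypergeom_pmf` is a hypothetical helper. It returns 0 for impossible outcomes, such as 4 or more successes in 3 draws (Exercise 4.64d).

```python
from math import comb, sqrt

def hypergeom_pmf(x: int, N: int, n: int, r: int) -> float:
    """P(x) = C(r, x) C(N - r, n - x) / C(N, n); 0 outside 0 <= x <= n."""
    if x < 0 or x > n:
        return 0.0
    # math.comb(a, b) is 0 when b > a, which covers the other impossible cases
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

# Exercise 4.63a and Exercise 4.64 (N = 8, n = 3, r = 5)
p_463a = hypergeom_pmf(1, 5, 3, 3)
p_464 = {x: hypergeom_pmf(x, 8, 3, 5) for x in range(6)}

# Exercise 4.68: N = 10, n = 5, r = 7
N, n, r = 10, 5, 7
dist = {x: hypergeom_pmf(x, N, n, r) for x in range(n + 1)}
mu = n * r / N
var = r * (N - r) * n * (N - n) / (N**2 * (N - 1))
sigma = sqrt(var)
within_2sd = sum(p for x, p in dist.items() if mu - 2 * sigma < x < mu + 2 * sigma)
```

As a further check, the mean computed directly from `dist` agrees with the formula \(nr/N\).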
c.
\[\mu = \frac{nr}{N} = \frac{10(15)}{106} = 1.415 \quad\text{and}\quad \sigma^2 = \frac{r(N-r)n(N-n)}{N^2(N-1)} = \frac{15(106-15)10(106-10)}{106^2(106-1)} = 1.1107, \quad \sigma = \sqrt{1.1107} = 1.0539\]

d.
\[P(x = 2) = \frac{\binom{15}{2}\binom{106-15}{10-2}}{\binom{106}{10}} = \frac{\frac{15!}{2!(15-2)!}\cdot\frac{91!}{8!(91-8)!}}{\frac{106!}{10!(106-10)!}} = \frac{105(8.49869 \times 10^{10})}{3.18535 \times 10^{13}} = .2801\]

4.70 a. \(E(x) = \lambda = 45\)

b. \(\sigma = \sqrt{\lambda} = \sqrt{45} = 6.7082\), so \(z = \frac{x - \mu}{\sigma} = \frac{360 - 45}{6.7082} = 46.96\)

c. Using MINITAB with \(\lambda = 45\) and the Poisson distribution, the probability is:

    Cumulative Distribution Function
    Poisson with mean = 45
    x     P( X <= x )
    65    0.998028

\(P(x \le 65) = .998028\)

4.71 a. With \(\lambda = 4.5\), \(P(x = 0) = \frac{4.5^0 e^{-4.5}}{0!} = .0111\)

b. \(P(x = 1) = \frac{4.5^1 e^{-4.5}}{1!} = .0500\)

c. \(E(x) = \lambda = 4.5\) and \(\sigma = \sqrt{4.5} = 2.12\)

4.72 Let x = number of male nannies in 10 trials. For this problem, \(N = 4{,}176\), \(r = 24\), and \(n = 10\). The 10! terms in numerator and denominator cancel:

\[P(x \ge 1) = 1 - P(x = 0) = 1 - \frac{\binom{24}{0}\binom{4176-24}{10-0}}{\binom{4176}{10}} = 1 - \frac{\frac{24!}{0!24!}\cdot\frac{4152!}{10!4142!}}{\frac{4176!}{10!4166!}} = 1 - \frac{4152!/4142!}{4176!/4166!} = 1 - \frac{1.50613 \times 10^{36}}{1.59559 \times 10^{36}} = 1 - .9439 = .0561\]

4.73 Let x = number of "clean" cartridges selected in 5 trials. For this problem, \(N = 158\), \(n = 5\), and \(r = 122\).

\[P(x = 5) = \frac{\binom{122}{5}\binom{36}{0}}{\binom{158}{5}} = \frac{\frac{122!}{5!117!}\cdot\frac{36!}{0!36!}}{\frac{158!}{5!153!}} = .2693\]

4.74 Using MINITAB with \(\lambda = 5\) and the Poisson distribution, the probability is:

    Cumulative Distribution Function
    Poisson with mean = 5
    x     P( X <= x )
    10    0.986305

\(P(x > 10) = 1 - P(x \le 10) = 1 - .986305 = .013695\)

4.75 Let x = number of times "total visitors" is selected in 5 museums. For this exercise, x has a hypergeometric distribution with \(N = 30\), \(n = 5\), \(r = 8\), and \(x = 0\).

\[P(x = 0) = \frac{\binom{8}{0}\binom{30-8}{5-0}}{\binom{30}{5}} = \frac{\frac{8!}{0!(8-0)!}\cdot\frac{22!}{5!(22-5)!}}{\frac{30!}{5!(30-5)!}} = .1848\]

4.76 Let x = number of game-day traffic fatalities at the winning team's location. For this exercise, x has a Poisson distribution with \(\lambda = .5\).

\[P(x \ge 3) = 1 - P(x \le 2) = 1 - [P(x = 0) + P(x = 1) + P(x = 2)] = 1 - \left[\frac{.5^0 e^{-.5}}{0!} + \frac{.5^1 e^{-.5}}{1!} + \frac{.5^2 e^{-.5}}{2!}\right] = 1 - [.6065 + .3033 + .0758] = .0144\]

4.77 Let x = number of times the cell phone accesses color code "b" in 7 handoffs. For this problem, x has a hypergeometric distribution with \(N = 85\), \(n = 7\), and \(r = 40\).

\[P(x = 2) = \frac{\binom{40}{2}\binom{85-40}{7-2}}{\binom{85}{7}} = \frac{\frac{40!}{2!(40-2)!}\cdot\frac{45!}{5!(45-5)!}}{\frac{85!}{7!(85-7)!}} = \frac{780(1{,}221{,}759)}{4{,}935{,}847{,}320} = .1931\]

4.78 For this exercise, x has a hypergeometric distribution with \(N = 57\), \(n = 10\), and \(r = 45\).

a.
\[P(x = 5) = \frac{\binom{45}{5}\binom{57-45}{10-5}}{\binom{57}{10}} = \frac{\frac{45!}{5!(45-5)!}\cdot\frac{12!}{5!(12-5)!}}{\frac{57!}{10!(57-10)!}} = \frac{1{,}221{,}759(792)}{43{,}183{,}019{,}880} = .0224\]

b.
\[P(x = 8) = \frac{\binom{45}{8}\binom{57-45}{10-8}}{\binom{57}{10}} = \frac{\frac{45!}{8!(45-8)!}\cdot\frac{12!}{2!(12-2)!}}{\frac{57!}{10!(57-10)!}} = \frac{215{,}553{,}195(66)}{43{,}183{,}019{,}880} = .3294\]

c. \(E(x) = \mu = \frac{nr}{N} = \frac{10(45)}{57} = 7.895\)

4.79 Let x = number of flaws in a 4-meter length of wire. For this exercise, x has a Poisson distribution with \(\lambda = .8\). The roll will be rejected if there is at least one flaw in the 4-meter sample of wire.

\[P(x \ge 1) = 1 - P(x = 0) = 1 - \frac{.8^0 e^{-.8}}{0!} = 1 - .4493 = .5507\]

We have to assume that the flaws are randomly distributed throughout the roll of wire and that the 4-meter sample of wire is representative of the entire roll.

4.80 a. \(\lambda = 9\): an average of 9 noise events per unit of time.

b. \(\sigma = \sqrt{\lambda} = \sqrt{9} = 3\)

c. \(SNR = \frac{\mu}{\sigma} = \frac{9}{3} = 3\)

4.81 a. Using MINITAB with \(\lambda = 10\):

    Probability Density Function
    Poisson with mean = 10
    x     P( X = x )
    24    0.0000732

\(P(x = 24) = .0000732\)

b. Using MINITAB with \(\lambda = 10\):

    Probability Density Function
    Poisson with mean = 10
    x     P( X = x )
    23    0.0001756

\(P(x = 23) = .0001756\)

c. Yes, these probabilities are good approximations for the probabilities of "fire" and "theft." The researchers estimated these probabilities to be .0001, indicating that these would be extremely rare events. Our probabilities of .0001 and .0002 are very close to .0001.

4.82 If it takes exactly 5 minutes to wash a car and there are 5 cars in line, it will take 5(5) = 25 minutes to wash these 5 cars. Thus, for anyone to be in line at closing time, more than 1 car must arrive in the final ½ hour. In addition, if on average 10 cars arrive per hour, then an average of 5 cars will arrive per ½ hour (30 minutes). If we let x = number of cars to arrive in ½ hour, then x is a Poisson random variable with \(\lambda = 5\).
Using MINITAB with \(\lambda = 5\):

    Cumulative Distribution Function
    Poisson with mean = 5
    x    P( X <= x )
    1    0.0404277

\(P(x > 1) = 1 - P(x \le 1) = 1 - .0404277 = .9595723\)

Since this probability is so large, it is very likely that someone will be in line at closing time.

4.83 Let x = number of females promoted among the 72 employees awarded promotion, where x is a hypergeometric random variable. From the problem, \(N = 302\), \(n = 72\), and \(r = 73\). We need to determine whether observing only 5 females among those promoted is consistent with fair promotion.

\[E(x) = \mu = \frac{nr}{N} = \frac{72(73)}{302} = 17.40\]

If 72 employees are promoted, we would expect about 17 of them to be females.

\[V(x) = \sigma^2 = \frac{r(N-r)n(N-n)}{N^2(N-1)} = \frac{73(302-73)72(302-72)}{302^2(302-1)} = 10.084, \qquad \sigma = \sqrt{10.084} = 3.176\]

By Chebyshev's Theorem, at least 8/9 of all observations fall within 3 standard deviations of the mean. The interval from 3 standard deviations below the mean to 3 standard deviations above the mean is:

\[\mu \pm 3\sigma = 17.40 \pm 3(3.176) = 17.40 \pm 9.528 = (7.872,\ 26.928)\]

If there is no discrimination in promoting females, we would expect between 8 and 26 of the 72 employees promoted to be females. Since we observed only 5 females promoted, we would infer that females were not promoted fairly.

4.84 Table II in the text gives the area between \(z = 0\) and \(z = z_0\). In this exercise, the answers may thus be read directly from the table by looking up the appropriate z.

a. \(P(0 < z < 2.0) = .4772\)

b. \(P(0 < z < 3.0) = .4987\)

c. \(P(0 < z < 1.5) = .4332\)

d. \(P(0 < z < .80) = .2881\)

4.85 Using Table II, Appendix D:

a. \(P(z > 1.46) = .5 - P(0 < z < 1.46) = .5 - .4279 = .0721\)

b. \(P(z < -1.56) = .5 - P(-1.56 < z < 0) = .5 - .4406 = .0594\)

c. \(P(.67 < z < 2.41) = P(0 < z < 2.41) - P(0 < z < .67) = .4920 - .2486 = .2434\)

d. \(P(-1.96 < z < -.33) = P(-1.96 < z < 0) - P(-.33 < z < 0) = .4750 - .1293 = .3457\)

e. \(P(z > 0) = .5\)

f. \(P(-2.33 < z < 1.50) = P(-2.33 < z < 0) + P(0 < z < 1.50) = .4901 + .4332 = .9233\)
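Table II tabulates the area between 0 and z under the standard normal curve. The sketch below assumes only Python's `math` module. It computes the same areas from the error-function identity \(\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]\) and rechecks a few answers from Exercises 4.84 and 4.85. The names `phi` and `table_ii` are illustrative.

```python
from math import erf, sqrt

def phi(z: float) -> float:
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def table_ii(z: float) -> float:
    """Area between 0 and z, the quantity listed in Table II."""
    return abs(phi(z) - 0.5)

area_484a = table_ii(2.0)              # P(0 < z < 2.0)
area_485a = 1 - phi(1.46)              # P(z > 1.46)
area_485d = phi(-0.33) - phi(-1.96)    # P(-1.96 < z < -.33)
area_485f = phi(1.50) - phi(-2.33)     # P(-2.33 < z < 1.50)
```

These results differ from the four-decimal table entries only in the fifth decimal place.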
4.86 Using Table II, Appendix D:

a. \(P(-1 \le z \le 1) = P(-1 \le z \le 0) + P(0 \le z \le 1) = .3413 + .3413 = .6826\)

b. \(P(-2 \le z \le 2) = P(-2 \le z \le 0) + P(0 \le z \le 2) = .4772 + .4772 = .9544\)

c. \(P(-2.16 \le z \le .55) = P(-2.16 \le z \le 0) + P(0 \le z \le .55) = .4846 + .2088 = .6934\)

d. \(P(-.42 < z < 1.96) = P(-.42 < z < 0) + P(0 < z < 1.96) = .1628 + .4750 = .6378\)

e. \(P(z \ge -2.33) = P(-2.33 \le z \le 0) + P(z \ge 0) = .4901 + .5 = .9901\)

f. \(P(z < 2.33) = P(z < 0) + P(0 \le z < 2.33) = .5 + .4901 = .9901\)

4.87 Using Table II, Appendix D:

a. \(P(-1 \le z \le 1) = P(-1 \le z \le 0) + P(0 \le z \le 1) = .3413 + .3413 = .6826\)

b. \(P(-1.96 \le z \le 1.96) = P(-1.96 \le z \le 0) + P(0 \le z \le 1.96) = .4750 + .4750 = .9500\)

c. \(P(-1.645 \le z \le 1.645) = P(-1.645 \le z \le 0) + P(0 \le z \le 1.645) = .4500 + .4500 = .9000\) (using interpolation)

d. \(P(-2 \le z \le 2) = P(-2 \le z \le 0) + P(0 \le z \le 2) = .4772 + .4772 = .9544\)

4.88 Using Table II, Appendix D:

a. \(P(z \ge z_0) = .05\), so \(A_1 = .5 - .05 = .4500\). Looking up the area .4500 in Table II gives \(z_0 = 1.645\).

b. \(P(z \ge z_0) = .025\), so \(A_1 = .5 - .025 = .4750\). Looking up the area .4750 in Table II gives \(z_0 = 1.96\).

c. \(P(z \le z_0) = .025\), so \(A_1 = .5 - .025 = .4750\). Looking up the area .4750 in Table II gives z = 1.96. Since \(z_0\) is to the left of 0, \(z_0 = -1.96\).

d. \(P(z \ge z_0) = .10\), so \(A_1 = .5 - .1 = .4000\). Looking up the area .4000 in Table II gives \(z_0 = 1.28\).

e. \(P(z > z_0) = .10\), so \(A_1 = .5 - .1 = .4000\) and \(z_0 = 1.28\) (same as in d).

4.89 Using Table II of Appendix D:

a. \(P(z \le z_0) = .2090\), so \(A = .5 - .2090 = .2910\). Looking up the area .2910 in the body of Table II gives .81. Since the graph shows \(z_0\) on the left side of 0, \(z_0 = -.81\).

b. \(P(z \le z_0) = .7090\), and \(P(z \le z_0) = P(z \le 0) + P(0 \le z \le z_0) = .5 + P(0 \le z \le z_0)\). Therefore, \(P(0 \le z \le z_0) = .7090 - .5 = .2090 = A\). Looking up the area .2090 in the body of Table II gives \(z_0 = .55\).

c. \(P(-z_0 \le z \le z_0) = 2P(0 \le z \le z_0) = .8472\). Therefore, \(P(0 \le z \le z_0) = .8472/2 = .4236\). Looking up the area .4236 in the body of Table II gives \(z_0 = 1.43\).

d. \(P(-z_0 \le z \le z_0) = 2P(0 \le z \le z_0) = .1664\). Therefore, \(P(0 \le z \le z_0) = .1664/2 = .0832\). Looking up the area .0832 in the body of Table II gives \(z_0 = .21\).
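Exercises 4.88 and 4.89 use Table II in reverse, going from an area to \(z_0\). Python's `statistics.NormalDist` (standard library, Python 3.8 and later) provides `inv_cdf` for this. The sketch below first converts each stated area into a cumulative probability \(P(z \le z_0)\). Table values are rounded to two decimals, so the results agree only to that precision.

```python
from statistics import NormalDist

std = NormalDist()  # mean 0, standard deviation 1

z_488a = std.inv_cdf(1 - 0.05)           # P(z >= z0) = .05
z_488c = std.inv_cdf(0.025)              # P(z <= z0) = .025
z_489a = std.inv_cdf(0.2090)             # P(z <= z0) = .2090
z_489c = std.inv_cdf(0.5 + 0.8472 / 2)   # P(-z0 <= z <= z0) = .8472
z_489d = std.inv_cdf(0.5 + 0.1664 / 2)   # P(-z0 <= z <= z0) = .1664
```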
e. \(P(z_0 \le z \le 0) = .4798\). By symmetry, \(P(z_0 \le z \le 0) = P(0 \le z \le -z_0)\). Looking up the area .4798 in the body of Table II gives 2.05, so \(z_0 = -2.05\).

f. \(P(-1 < z < z_0) = .5328\), where \(P(-1 < z < z_0) = P(-1 < z < 0) + P(0 < z < z_0) = P(0 < z < 1) + P(0 < z < z_0)\). Thus, \(P(0 < z < z_0) = .5328 - .3413 = .1915\). Looking up the area .1915 in the body of Table II gives \(z_0 = .50\).

4.90 a. \(z = 1\)

b. \(z = -1\)

c. \(z = 0\)

d. \(z = -2.5\)

e. \(z = 3\)

4.91 a. \(z = \frac{x - \mu}{\sigma} = \frac{20 - 30}{4} = -2.50\)

b. \(z = \frac{x - \mu}{\sigma} = \frac{30 - 30}{4} = 0\)

c. \(z = \frac{x - \mu}{\sigma} = \frac{27.5 - 30}{4} = -0.625\)

d. \(z = \frac{x - \mu}{\sigma} = \frac{15 - 30}{4} = -3.75\)

e. \(z = \frac{x - \mu}{\sigma} = \frac{35 - 30}{4} = 1.25\)

f. \(z = \frac{x - \mu}{\sigma} = \frac{25 - 30}{4} = -1.25\)

4.92 Using Table II of Appendix D:

a. To find the probability that x assumes a value more than 2 standard deviations from μ:
\[P(x < \mu - 2\sigma) + P(x > \mu + 2\sigma) = P(z < -2) + P(z > 2) = 2P(z > 2) = 2(.5 - .4772) = 2(.0228) = .0456\]
To find the probability that x assumes a value more than 3 standard deviations from μ:
\[P(x < \mu - 3\sigma) + P(x > \mu + 3\sigma) = P(z < -3) + P(z > 3) = 2P(z > 3) = 2(.5 - .4987) = 2(.0013) = .0026\]

b. To find the probability that x assumes a value within 1 standard deviation of its mean:
\[P(\mu - \sigma < x < \mu + \sigma) = P(-1 < z < 1) = 2P(0 < z < 1) = 2(.3413) = .6826\]
To find the probability that x assumes a value within 2 standard deviations of μ:
\[P(\mu - 2\sigma < x < \mu + 2\sigma) = P(-2 < z < 2) = 2P(0 < z < 2) = 2(.4772) = .9544\]

c. To find the value of x that represents the 80th percentile, we must first find the value of z that corresponds to the 80th percentile: \(P(z < z_0) = .80\). Thus, \(A_1 + A_2 = .80\). Since \(A_1 = .50\), \(A_2 = .80 - .50 = .30\). Using the body of Table II, \(z_0 = .84\). To find x, we substitute the values into the z-score formula:
\[z = \frac{x - \mu}{\sigma} \;\Rightarrow\; .84 = \frac{x - 1000}{10} \;\Rightarrow\; x = .84(10) + 1000 = 1008.4\]
To find the value of x that represents the 10th percentile, we must first find the value of z that corresponds to the 10th percentile: \(P(z < z_0) = .10\). Thus, \(A_1 = .50 - .10 = .40\). Using the body of Table II, \(z_0 = -1.28\). To find x, we substitute the values into the z-score formula:
\[z = \frac{x - \mu}{\sigma} \;\Rightarrow\; -1.28 = \frac{x - 1000}{10} \;\Rightarrow\; x = -1.28(10) + 1000 = 987.2\]

4.93 a. \[P(10 \le x \le 12) = P\left(\frac{10 - 11}{2} \le z \le \frac{12 - 11}{2}\right) = P(-0.50 \le z \le 0.50) = A_1 + A_2 = .1915 + .1915 = .3830\]

b. \[P(6 \le x \le 10) = P\left(\frac{6 - 11}{2} \le z \le \frac{10 - 11}{2}\right) = P(-2.50 \le z \le -0.50) = P(-2.50 \le z \le 0) - P(-0.50 \le z \le 0) = .4938 - .1915 = .3023\]

c. \[P(13 \le x \le 16) = P\left(\frac{13 - 11}{2} \le z \le \frac{16 - 11}{2}\right) = P(1.00 \le z \le 2.50) = P(0 \le z \le 2.50) - P(0 \le z \le 1.00) = .4938 - .3413 = .1525\]

d. \[P(7.8 \le x \le 12.6) = P\left(\frac{7.8 - 11}{2} \le z \le \frac{12.6 - 11}{2}\right) = P(-1.60 \le z \le 0.80) = A_1 + A_2 = .4452 + .2881 = .7333\]

e. \[P(x \ge 13.24) = P\left(z \ge \frac{13.24 - 11}{2}\right) = P(z \ge 1.12) = A_2 = .5 - A_1 = .5000 - .3686 = .1314\]

f. \[P(x \ge 7.62) = P\left(z \ge \frac{7.62 - 11}{2}\right) = P(z \ge -1.69) = A_1 + A_2 = .4545 + .5000 = .9545\]

4.94 The random variable x has a normal distribution with \(\mu = 50\) and \(\sigma = 3\).

a. \(P(x \le x_0) = .8413\), so \(A_1 + A_2 = .8413\). Since \(A_1 = .5\), \(A_2 = .8413 - .5 = .3413\). Looking up the area .3413 in the body of Table II, Appendix D gives \(z_0 = 1.0\). To find \(x_0\), substitute all the values into the z-score formula:
\[z = \frac{x - \mu}{\sigma} \;\Rightarrow\; 1.0 = \frac{x_0 - 50}{3} \;\Rightarrow\; x_0 = 50 + 3(1.0) = 53\]

b. \(P(x > x_0) = .025\), so \(A = .5 - .025 = .4750\). Looking up the area .4750 in the body of Table II, Appendix D gives \(z_0 = 1.96\). To find \(x_0\), substitute all the values into the z-score formula:
\[z = \frac{x - \mu}{\sigma} \;\Rightarrow\; 1.96 = \frac{x_0 - 50}{3} \;\Rightarrow\; x_0 = 50 + 3(1.96) = 55.88\]

c. \(P(x > x_0) = .95\), so \(A_1 + A_2 = .95\). Since \(A_2 = .5\), \(A_1 = .95 - .5 = .4500\). Looking up the area .4500 in the body of Table II, Appendix D gives 1.645. Because .4500 lies exactly between two table values, we average the two z-scores. Since \(x_0\) lies below the mean, \(z_0 = -1.645\). To find \(x_0\), substitute into the z-score formula:
\[z = \frac{x - \mu}{\sigma} \;\Rightarrow\; -1.645 = \frac{x_0 - 50}{3} \;\Rightarrow\; x_0 = 50 + 3(-1.645) = 45.065\]

d. \(P(41 \le x \le x_0) = .8630\). For \(x = 41\), \(z = \frac{41 - 50}{3} = -3\), so
\[A_1 = P(41 \le x \le \mu) = P(-3 \le z \le 0) = P(0 \le z \le 3) = .4987\]
Since \(A_1 + A_2 = .8630\) and \(A_1 = .4987\), \(A_2 = .8630 - .4987 = .3643\). Looking up .3643 in the body of Table II, Appendix D gives \(z_0 = 1.1\). To find \(x_0\), substitute into the z-score formula:
\[z = \frac{x - \mu}{\sigma} \;\Rightarrow\; 1.1 = \frac{x_0 - 50}{3} \;\Rightarrow\; x_0 = 50 + 3(1.1) = 53.3\]

e. \(P(x < x_0) = .10\), so \(A = .5 - .10 = .4000\). Looking up the area .4000 in the body of Table II, Appendix D gives 1.28. Since \(z_0\) is to the left of 0, \(z_0 = -1.28\). To find \(x_0\), substitute all the values into the z-score formula:
\[z = \frac{x - \mu}{\sigma} \;\Rightarrow\; -1.28 = \frac{x_0 - 50}{3} \;\Rightarrow\; x_0 = 50 + 3(-1.28) = 46.16\]

f. \(P(x > x_0) = .01\), so \(A = .5 - .01 = .4900\). Looking up the area .4900 in the body of Table II, Appendix D gives \(z_0 = 2.33\). To find \(x_0\), substitute all the values into the z-score formula:
\[z = \frac{x - \mu}{\sigma} \;\Rightarrow\; 2.33 = \frac{x_0 - 50}{3} \;\Rightarrow\; x_0 = 50 + 3(2.33) = 56.99\]

4.95 a. To approximate the binomial distribution with the normal distribution, the interval \(\mu \pm 3\sigma = np \pm 3\sqrt{npq}\) should lie in the range 0 to n. When \(n = 25\) and \(p = .4\):
\[np \pm 3\sqrt{npq} = 25(.4) \pm 3\sqrt{25(.4)(1 - .4)} = 10 \pm 3\sqrt{6} = 10 \pm 7.3485 = (2.6515,\ 17.3485)\]
Since the interval calculated does lie in the range 0 to 25, we can use the normal approximation.

b. \(\mu = np = 25(.4) = 10\) and \(\sigma^2 = npq = 25(.4)(.6) = 6\)

c. \(P(x \ge 9) = 1 - P(x \le 8) = 1 - .274 = .726\) (Table I, Appendix D)

d. \[P(x \ge 9) \approx P\left(z \ge \frac{(9 - .5) - 10}{\sqrt{6}}\right) = P(z \ge -.61) = .5000 + .2291 = .7291\] (using Table II, Appendix D)

4.96 \(\mu = np = 1000(.5) = 500\) and \(\sigma = \sqrt{npq} = \sqrt{1000(.5)(.5)} = 15.811\)

a. Using the normal approximation,
\[P(x > 500) \approx P\left(z > \frac{(500 + .5) - 500}{15.811}\right) = P(z > .03) = .5 - .0120 = .4880\] (from Table II, Appendix D)

b. \[P(490 \le x < 500) \approx P\left(\frac{(490 - .5) - 500}{15.811} \le z \le \frac{(500 - .5) - 500}{15.811}\right) = P(-.66 \le z \le -.03) = .2454 - .0120 = .2334\] (from Table II, Appendix D)

c. \[P(x > 550) \approx P\left(z > \frac{(550 + .5) - 500}{15.811}\right) = P(z > 3.19) = .5 - .49929 = .00071\] (from Table II, Appendix D)

4.97 a. Using MINITAB with \(\mu = 105.3\) and \(\sigma = 8\), the probability is:

    Cumulative Distribution Function
    Normal with mean = 105.3 and standard deviation = 8
    x      P( X <= x )
    120    0.966932

\(P(x > 120) = 1 - P(x \le 120) = 1 - .966932 = .033068\)

b. Using MINITAB with \(\mu = 105.3\) and \(\sigma = 8\), the probabilities are:

    Cumulative Distribution Function
    Normal with mean = 105.3 and standard deviation = 8
    x      P( X <= x )
    110    0.721566
    100    0.253825

\(P(100 < x < 110) = P(x \le 110) - P(x \le 100) = .721566 - .253825 = .467741\)

c. Using MINITAB with \(\mu = 105.3\) and \(\sigma = 8\), the value of a is found:

    Inverse Cumulative Distribution Function
    Normal with mean = 105.3 and standard deviation = 8
    P( X <= x )    x
    0.25           99.9041

Thus, a = 99.9041.
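The normal approximation in Exercise 4.95 can be compared with the exact binomial answer. The sketch below uses only the standard library. It applies the \(\mu \pm 3\sigma\) check from part a, the exact binomial sum for \(P(x \ge 9)\), and the continuity-corrected normal probability. `NormalDist` uses the unrounded z-score rather than the two-decimal table value, so its result differs slightly from .7291.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 25, 0.4
mu, sigma = n * p, sqrt(n * p * (1 - p))

# Rule of thumb from part a: mu +/- 3 sigma must lie between 0 and n
approx_ok = 0 <= mu - 3 * sigma and mu + 3 * sigma <= n

# Part c: exact binomial P(x >= 9)
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(9, n + 1))

# Part d: normal approximation with the continuity correction (9 - .5)
approx = 1 - NormalDist(mu, sigma).cdf(9 - 0.5)

print(approx_ok, round(exact, 4), round(approx, 4))
```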
4.98 Using MINITAB with \(\mu = 67.755\) and \(\sigma = 26.871\), the probabilities are:

    Cumulative Distribution Function
    Normal with mean = 67.755 and standard deviation = 26.871
    x      P( X <= x )
    40     0.150826
    120    0.974070

a. \(P(x < 40) = .150826\)

b. \(P(40 < x < 120) = P(x \le 120) - P(x \le 40) = .974070 - .150826 = .823244\)

c. \(P(x > 120) = 1 - P(x \le 120) = 1 - .974070 = .02593\)

d. We want to find a where \(P(x < a) = .25\). Using MINITAB with \(\mu = 67.755\) and \(\sigma = 26.871\), the value of a is found:

    Inverse Cumulative Distribution Function
    Normal with mean = 67.755 and standard deviation = 26.871
    P( X <= x )    x
    0.25           49.6308

Thus, a = 49.6308.

4.99 a. Using MINITAB with \(\mu = 59\) and \(\sigma = 5\), the probability is:

    Cumulative Distribution Function
    Normal with mean = 59 and standard deviation = 5
    x     P( X <= x )
    60    0.579260

\(P(x > 60) = 1 - P(x \le 60) = 1 - .57926 = .42074\)

b. Using MINITAB with \(\mu = 43\) and \(\sigma = 5\), the probability is:

    Cumulative Distribution Function
    Normal with mean = 43 and standard deviation = 5
    x     P( X <= x )
    60    0.999663

\(P(x > 60) = 1 - P(x \le 60) = 1 - .999663 = .000337\)

4.100 a. Using Table II, Appendix D,
\[P(x > 0) = P\left(z > \frac{0 - 5.26}{10}\right) = P(z > -0.526) \approx .5 + P(-0.53 < z < 0) = .5 + .2019 = .7019\]

b. \[P(5 < x < 15) = P\left(\frac{5 - 5.26}{10} < z < \frac{15 - 5.26}{10}\right) = P(-0.026 < z < 0.974) \approx P(-.03 < z < 0) + P(0 < z < .97) = .0120 + .3340 = .3460\]

c. \[P(x < 1) = P\left(z < \frac{1 - 5.26}{10}\right) = P(z < -0.426) \approx .5 - P(-0.43 < z < 0) = .5 - .1664 = .3336\]

d. \[P(x < -25) = P\left(z < \frac{-25 - 5.26}{10}\right) = P(z < -3.026) \approx .5 - P(-3.03 < z < 0) = .5 - .4988 = .0012\]

The probability of seeing a win percentage of −25% or anything more unusual is very small (p = .0012). We would therefore conclude that the average casino win percentage is not 5.26%.

4.101 a. Let x = buy-side analyst's forecast error. Then x has an approximate normal distribution with \(\mu = .85\) and \(\sigma = 1.93\). Using Table II, Appendix D,
\[P(x > 2.00) = P\left(z > \frac{2.00 - .85}{1.93}\right) = P(z > .60) = .5 - .2257 = .2743\]

b. Let y = sell-side analyst's forecast error. Then y has an approximate normal distribution with \(\mu = -.05\) and \(\sigma = .85\).
Using Table II, Appendix D,
\[P(y > 2.00) = P\left(z > \frac{2.00 - (-.05)}{.85}\right) = P(z > 2.41) = .5 - .4920 = .0080\]

4.102 Let x = driver's head injury rating. The random variable x has a normal distribution with \(\mu = 605\) and \(\sigma = 185\). Using Table II, Appendix D,

a. \[P(500 < x < 700) = P\left(\frac{500 - 605}{185} < z < \frac{700 - 605}{185}\right) = P(-0.57 < z < 0.51) = P(-0.57 < z < 0) + P(0 < z < 0.51) = .2157 + .1950 = .4107\]

b. \[P(400 < x < 500) = P\left(\frac{400 - 605}{185} < z < \frac{500 - 605}{185}\right) = P(-1.11 < z < -0.57) = P(-1.11 < z < 0) - P(-0.57 < z < 0) = .3665 - .2157 = .1508\]

c. \[P(x < 850) = P\left(z < \frac{850 - 605}{185}\right) = P(z < 1.32) = .5 + P(0 < z < 1.32) = .5 + .4066 = .9066\]

d. \[P(x > 1{,}000) = P\left(z > \frac{1{,}000 - 605}{185}\right) = P(z > 2.14) = .5 - P(0 < z < 2.14) = .5 - .4838 = .0162\]

4.103 From Exercise 4.49, we determined that x is a binomial random variable with \(n = 250\) and \(p = .5\).

a. \(\mu = np = 250(.5) = 125\)

b. \(\sigma = \sqrt{npq} = \sqrt{250(.5)(.5)} = \sqrt{62.5} = 7.9057\)

c. \(z = \frac{x - \mu}{\sigma} = \frac{200 - 125}{7.9057} = 9.49\)

d. To approximate the binomial distribution with the normal distribution, the interval \(\mu \pm 3\sigma = np \pm 3\sqrt{npq}\) should lie in the range 0 to n. When \(n = 250\) and \(p = .5\),
\[np \pm 3\sqrt{npq} = 125 \pm 3(7.9057) = 125 \pm 23.7171 = (101.2829,\ 148.7171)\]
Since the interval calculated does lie in the range 0 to 250, we can use the normal approximation. Using MINITAB with \(\mu = 125\) and \(\sigma = 7.9057\), the approximate probability is:

    Cumulative Distribution Function
    Normal with mean = 125 and standard deviation = 7.9057
    x      P( X <= x )
    200    1

\(P(x \le 200) \approx 1\)

4.104 Let x = number of patients who undergo laser surgery who have serious post-laser vision problems in 100,000 trials. Then x is a binomial random variable with \(n = 100{,}000\) and \(p = .01\).

\[E(x) = \mu = np = 100{,}000(.01) = 1{,}000, \qquad \sigma = \sqrt{npq} = \sqrt{100{,}000(.01)(.99)} = \sqrt{990} = 31.464\]

To see if the normal approximation is appropriate, we use:
\[\mu \pm 3\sigma = 1{,}000 \pm 3(31.464) = 1{,}000 \pm 94.392 = (905.608,\ 1{,}094.392)\]
Since the interval lies in the range of 0 to 100,000, the normal approximation is appropriate.
\[P(x < 950) \approx P\left(z < \frac{949.5 - 1000}{31.464}\right) = P(z < -1.61) = .5 - .4463 = .0537\] (using Table II, Appendix D)

4.105 If the goalkeeper stands in the middle of the goal and can reach any ball within 9 feet, then the only way a player can score is by shooting the ball within 3 feet of either goal post.

a. If a player aims at the right goal post, then the player will score if x is between −3 and 0. Using Table II, Appendix D, we get
\[P(-3 < x < 0) = P\left(\frac{-3 - 0}{3} < z < \frac{0 - 0}{3}\right) = P(-1 < z < 0) = .3413\]

b. If a player aims at the center of the goal, then the player will score if x is greater than 9 or less than −9. Using Table II, Appendix D, we get
\[P(x > 9) + P(x < -9) = P\left(z > \frac{9 - 0}{3}\right) + P\left(z < \frac{-9 - 0}{3}\right) = P(z > 3) + P(z < -3) = (.5 - .4987) + (.5 - .4987) = .0026\]

c. If a player aims halfway between the right goal post and the outer limit of the goalkeeper's reach, then the player will score if x is between −1.5 and 1.5. Using Table II, Appendix D, we get
\[P(-1.5 < x < 1.5) = P\left(\frac{-1.5 - 0}{3} < z < \frac{1.5 - 0}{3}\right) = P(-.5 < z < .5) = .1915 + .1915 = .3830\]

4.106 Let x = number of defects per million. Then x has an approximate normal distribution with \(\mu = 3\sigma\). Using Table II, Appendix D,
\[P(3\sigma - 1.5\sigma < x < 3\sigma + 1.5\sigma) = P\left(\frac{3\sigma - 1.5\sigma - 3\sigma}{\sigma} < z < \frac{3\sigma + 1.5\sigma - 3\sigma}{\sigma}\right) = P(-1.5 < z < 1.5) = .4332 + .4332 = .8664\]

It is fairly likely that the goal will be met. Since the probability is .8664, the goal would be met approximately 86.64% of the time.

4.107 a. Let x = rating. Then x has a normal distribution with \(\mu = 50\) and \(\sigma = 15\). Using Table II, Appendix D, we need \(x_0\) such that \(P(x \ge x_0) = .10\).
\[P(x \ge x_0) = P\left(z \ge \frac{x_0 - 50}{15}\right) = P(z \ge z_0) = .10\]
Here \(A_1 = .5 - .10 = .4000\). Looking up the area .4000 in Table II gives \(z_0 = 1.28\).
\[z_0 = \frac{x_0 - 50}{15} \;\Rightarrow\; 1.28 = \frac{x_0 - 50}{15} \;\Rightarrow\; x_0 = 50 + 1.28(15) = 69.2\]

b. We need \(x_0\) such that \(P(x \ge x_0) = .10 + .20 + .40 = .70\).
\[P(x \ge x_0) = P\left(z \ge \frac{x_0 - 50}{15}\right) = P(z \ge z_0) = .70\]
Here \(A_1 = .70 - .5 = .2000\). Looking up the area .2000 in Table II gives .52. Since \(x_0\) lies below the mean, \(z_0 = -.52\).
\[z_0 = \frac{x_0 - 50}{15} \;\Rightarrow\; -.52 = \frac{x_0 - 50}{15} \;\Rightarrow\; x_0 = 50 - .52(15) = 42.2\]

4.108 a. Using MINITAB with \(\mu = 7.5\) and \(\sigma = 2.5\), the probability is:

    Cumulative Distribution Function
    Normal with mean = 7.5 and standard deviation = 2.5
    x    P( X <= x )
    9    0.725747

\(P(x \le 9) = .725747\). Since this probability is less than .90, the regulations are not being met at EMS station A.

b. Using MINITAB with \(\mu = 7.5\) and \(\sigma = 2.5\), the probability is:

    Cumulative Distribution Function
    Normal with mean = 7.5 and standard deviation = 2.5
    x    P( X <= x )
    2    0.0139034

\(P(x \le 2) = .0139034\). Since this probability is so small, it would be very unlikely that the call was serviced by Station A.

4.109 a. Using Table II, Appendix D, with \(\mu = 75\) and \(\sigma = 7.5\),
\[P(x > 80) = P\left(z > \frac{80 - 75}{7.5}\right) = P(z > .67) = .5 - .2486 = .2514\]
Thus, 25.14% of the scores exceeded 80.

b. We need \(x_0\) such that \(P(x < x_0) = .98\).
\[P(x < x_0) = P\left(z < \frac{x_0 - 75}{7.5}\right) = P(z < z_0) = .98\]
Here \(A_1 = .98 - .5 = .4800\). Looking up the area .4800 in Table II gives \(z_0 = 2.05\).
\[z_0 = \frac{x_0 - 75}{7.5} \;\Rightarrow\; 2.05 = \frac{x_0 - 75}{7.5} \;\Rightarrow\; x_0 = 90.375\]

4.110 Let x = wage rate. The random variable x is normally distributed with \(\mu = 18.50\) and \(\sigma = 1.25\). Using Table II, Appendix D,

a. \[P(x > 19.80) = P\left(z > \frac{19.80 - 18.50}{1.25}\right) = P(z > 1.04) = .5 - P(0 < z < 1.04) = .5 - .3508 = .1492\]

b. \[P(x \ge 19.80) = P\left(z \ge \frac{19.80 - 18.50}{1.25}\right) = P(z \ge 1.04) = .5 - P(0 < z < 1.04) = .5 - .3508 = .1492\]

c. \(P(x < \mu) = P(x > \mu) = .5\). Thus, the median wage rate is \(\mu = 18.50\). (Recall from Section 2.4 that in a symmetric distribution, the mean equals the median.)

4.111 Let x = number of additional Electoral College votes a candidate will win if he/she wins California's 55 votes. Then x has a normal distribution with \(\mu = 241.5\) and \(\sigma = 49.8\). To be elected, the candidate will have to win an additional 270 − 55 = 215 votes, so x has to be greater than or equal to 215. Using MINITAB with \(\mu = 241.5\) and \(\sigma = 49.8\), the probability is:

    Cumulative Distribution Function
    Normal with mean = 241.5 and standard deviation = 49.8
    x      P( X <= x )
    215    0.297318

\(P(x \ge 215) = 1 - P(x \le 215) = 1 - .297318 = .702682\)

The probability the candidate becomes the next president if he/she wins California is about .70.
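The three shot strategies in Exercise 4.105 differ only in where the scoring region sits relative to the aim point. The sketch below assumes, as in the z-scores above, that the deviation x of a shot from the aim point is normal with mean 0 and σ = 3 feet. `NormalDist` comes from Python's standard `statistics` module.

```python
from statistics import NormalDist

shot = NormalDist(mu=0, sigma=3)  # deviation (feet) of the shot from the aim point

aim_post = shot.cdf(0) - shot.cdf(-3)            # part a: score if -3 < x < 0
aim_center = shot.cdf(-9) + (1 - shot.cdf(9))    # part b: score if x < -9 or x > 9
aim_halfway = shot.cdf(1.5) - shot.cdf(-1.5)     # part c: score if -1.5 < x < 1.5

best = max(("post", aim_post), ("center", aim_center), ("halfway", aim_halfway),
           key=lambda pair: pair[1])
```

Under these assumptions, the halfway strategy gives the highest scoring probability.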
4.112 Let x = number of guests who participate in the conservation program in 200 trials. Then x is a binomial random variable with \(n = 200\) and \(p = .45\). The mean of the distribution is \(\mu = np = 200(.45) = 90\), and the standard deviation is \(\sigma = \sqrt{npq} = \sqrt{200(.45)(.55)} = \sqrt{49.5} = 7.0356\).

To approximate the binomial distribution with the normal distribution, the interval \(\mu \pm 3\sigma = np \pm 3\sqrt{npq}\) should lie in the range 0 to n. When \(n = 200\) and \(p = .45\),
\[np \pm 3\sqrt{npq} = 90 \pm 3(7.0356) = 90 \pm 21.1068 = (68.8932,\ 111.1068)\]
Since the interval calculated does lie in the range 0 to 200, we can use the normal approximation.
\[P(x > 110) \approx P\left(z > \frac{(110 + .5) - 90}{7.0356}\right) = P(z > 2.91) = .5 - .4982 = .0018\]
Since this probability is so low, it is very unlikely that the claim is true.

4.113 b. Let v = number of credit card users out of 100 who carry Visa. Then v is a binomial random variable with \(n = 100\) and \(p_v = .50\), so \(E(v) = np_v = 100(.50) = 50\).

Let d = number of credit card users out of 100 who carry Discover. Then d is a binomial random variable with \(n = 100\) and \(p_d = .09\), so \(E(d) = np_d = 100(.09) = 9\).

c. To see if the normal approximation is valid, we use:
\[np_v \pm 3\sqrt{np_v q_v} = 100(.5) \pm 3\sqrt{100(.5)(.5)} = 50 \pm 3(5) = 50 \pm 15 = (35,\ 65)\]
Since the interval lies in the range 0 to 100, we can use the normal approximation to approximate the probability.
\[P(v \le 50) \approx P\left(z \le \frac{(50 + .5) - 50}{5}\right) = P(z \le .1) = .5 + .0398 = .5398\]

Let a = number of credit card users out of 100 who carry American Express. Then a is a binomial random variable with \(n = 100\) and \(p_a = .08\). To see if the normal approximation is valid, we use:
\[np_a \pm 3\sqrt{np_a q_a} = 100(.08) \pm 3\sqrt{100(.08)(.92)} = 8 \pm 3(2.713) = 8 \pm 8.139 = (-.139,\ 16.139)\]
Since the interval does not lie in the range 0 to 100, using the normal approximation to approximate the probability is risky.
\[P(a \ge 50) \approx P\left(z \ge \frac{(50 - .5) - 8}{2.713}\right) = P(z \ge 15.30) = .5 - .5 = 0\]

d. For the normal approximation to be valid, \(\mu \pm 3\sigma\) must lie in the interval (0, n).
This check was done in part c for both portions of the question. The normal approximation was justified for the first part but not the second.

4.114 a. Let x = quantity injected per container. The random variable x has a normal distribution with \(\mu = 10\) and \(\sigma = .2\).
\(P(x < 10) = P\left(z < \frac{10 - 10}{.2}\right) = P(z < 0) = .5\)

b. Since the container needed to be reprocessed, it cost $10. Upon refilling, it contained 10.60 units at a cost of 10.60($20) = $212. Thus, the total cost of filling this container is $10 + $212 = $222. Since the container sells for $230, the profit is $230 - $222 = $8.

c. Let x = quantity injected per container. The random variable x has a normal distribution with \(\mu = 10.10\) and \(\sigma = .2\). The expected value of x is \(E(x) = \mu = 10.10\). The cost of a container with 10.10 units is 10.10($20) = $202. Thus, the expected profit is the selling price minus the cost, or $230 - $202 = $28.

4.115 We have to find the probability of observing x = .7 or anything more unusual under each of the two values of \(\mu\).

Without receiving executive coaching: Using Table II, Appendix D, with \(\mu = .75\) and \(\sigma = .085\),
\(P(x \le .7) = P\left(z \le \frac{.7 - .75}{.085}\right) = P(z \le -.59) = .5 - .2224 = .2776\)

After receiving executive coaching: Using Table II, Appendix D, with \(\mu = .52\) and \(\sigma = .075\),
\(P(x \ge .7) = P\left(z \ge \frac{.7 - .52}{.075}\right) = P(z \ge 2.40) = .5 - .4918 = .0082\)

Since the probability of observing \(x \le .7\) for those not receiving executive coaching is much larger than the probability of observing \(x \ge .7\) for those receiving executive coaching, it is more likely that the leader did not receive executive coaching.

4.116 a. If z is a standard normal random variable, \(Q_L = z_L\) is the value of the standard normal distribution that has 25% of the data to the left and 75% to the right. Find \(z_L\) such that
\(P(z < z_L) = .25 \Rightarrow A_1 = .50 - .25 = .25\)
Look up the area \(A_1 = .25\) in the body of Table II of Appendix D; \(z_L = -.67\) (taking the closest value). If interpolation is used, -.675 would be obtained.
\(Q_U = z_U\) is the value of the standard normal distribution that has 75% of the data to the left and 25% to the right. Find \(z_U\) such that
\(P(z < z_U) = .75 = A_1 + A_2 = P(z < 0) + P(0 < z < z_U) = .5 + P(0 < z < z_U)\)
Therefore, \(P(0 < z < z_U) = .25\). Look up the area .25 in the body of Table II of Appendix D; \(z_U = .67\) (taking the closest value).

b. Recall that the inner fences of a box plot are located \(1.5(Q_U - Q_L)\) outside the hinges (\(Q_L\) and \(Q_U\)). The lower inner fence is:
\(Q_L - 1.5(Q_U - Q_L) = -.67 - 1.5(.67 - (-.67)) = -.67 - 1.5(1.34) = -2.68\)
(-2.70 if \(z_L = -.675\) and \(z_U = .675\))
The upper inner fence is:
\(Q_U + 1.5(Q_U - Q_L) = .67 + 1.5(.67 - (-.67)) = .67 + 1.5(1.34) = 2.68\)
(2.70 if \(z_L = -.675\) and \(z_U = .675\))

c. Recall that the outer fences of a box plot are located \(3(Q_U - Q_L)\) outside the hinges (\(Q_L\) and \(Q_U\)). The lower outer fence is:
\(Q_L - 3(Q_U - Q_L) = -.67 - 3(.67 - (-.67)) = -.67 - 3(1.34) = -4.69\)
(-4.725 if \(z_L = -.675\) and \(z_U = .675\))
The upper outer fence is:
\(Q_U + 3(Q_U - Q_L) = .67 + 3(.67 - (-.67)) = .67 + 3(1.34) = 4.69\)
(4.725 if \(z_L = -.675\) and \(z_U = .675\))

d. \(P(z < -2.68) + P(z > 2.68) = 2P(z > 2.68) = 2(.5000 - .4963) = 2(.0037) = .0074\) (Table II, Appendix D)
(or \(2(.5000 - .4965) = .0070\) if -2.70 and 2.70 are used)
\(P(z < -4.69) + P(z > 4.69) = 2P(z > 4.69) \approx 2(.5000 - .5000) = 0\)

e. In a normal probability distribution, the probability of an observation falling beyond the inner fences is only .0074, and the probability of an observation falling beyond the outer fences is approximately zero. Because these probabilities are so small, we would expect essentially no observations beyond the fences; observations that do fall beyond them are therefore probably outliers.

4.117 a. The proportion of measurements that one would expect to fall in the interval \(\mu \pm \sigma\) is about .68.
b. The proportion of measurements that one would expect to fall in the interval \(\mu \pm 2\sigma\) is about .95.
c. The proportion of measurements that one would expect to fall in the interval \(\mu \pm 3\sigma\) is about 1.00.

4.118 a. \(IQR = Q_U - Q_L = 195 - 72 = 123\)
b. \(IQR/s = 123/95 = 1.295\)
c. Yes.
Since IQR/s is approximately 1.3, this implies that the data are approximately normal.

4.119 If the data are normally distributed, then the normal probability plot should be an approximately straight line. Of the three plots, only plot c implies that the data are normally distributed: the data points in plot c form an approximately straight line. In both plots a and b, the data points do not form a straight line.

4.120 a. Using MINITAB, the stem-and-leaf display is:

    Stem-and-Leaf Display: x
    Stem-and-leaf of x   N = 28
    Leaf Unit = 0.10
     5   1  11266
     6   2  1
     8   3  35
    11   4  035
    14   5  039
    14   6  3457
    10   7  346
     7   8  24469
     2   9  47

Since the data do not form a mound shape, the data may not be normally distributed.

b. Using MINITAB, the descriptive statistics are:

    Descriptive Statistics: x
    Variable  N   Mean   StDev  Minimum  Q1     Median  Q3     Maximum
    x         28  5.511  2.765  1.100    3.350  6.100   8.050  9.700

The standard deviation is 2.765.

c. Using the printout from MINITAB in part b, \(Q_L = 3.35\) and \(Q_U = 8.05\). The \(IQR = Q_U - Q_L = 8.05 - 3.35 = 4.7\). If the data are normally distributed, then \(IQR/s \approx 1.3\). For these data, \(IQR/s = 4.7/2.765 = 1.70\). This is a fair amount larger than 1.3, which indicates that the data may not be normally distributed.

d. Using MINITAB, the normal probability plot is:

[Figure: MINITAB probability plot of x, Normal - 95% CI (Mean 5.511, StDev 2.765, N 28, AD 0.533, P-Value 0.158)]

The data do not form a particularly straight line. This indicates that the data are not normally distributed.

4.121 a. Using MINITAB, a histogram of the data is:

[Figure: MINITAB histogram of Support with fitted normal curve (Mean 67.76, StDev 26.87, N 992)]

The data are fairly mound-shaped. This indicates that the data are probably from a normal distribution.

b.
Using MINITAB, the descriptive statistics are:

    Descriptive Statistics: Support
    Variable  N    Mean    StDev   Minimum  Q1      Median  Q3      Maximum
    Support   992  67.755  26.871  0.000    49.000  68.000  86.000  155.000

If the data are normal, then approximately 68% of the observations should fall within 1 standard deviation of the mean. For these data, the interval is \(\bar{x} \pm s = 67.755 \pm 26.871 \Rightarrow (40.884, 94.626)\). There are 665 of the 992 observations in this interval, which is 665/992 = .670, or 67%. This is very close to 68%.

If the data are normal, then approximately 95% of the observations should fall within 2 standard deviations of the mean. For these data, the interval is \(\bar{x} \pm 2s = 67.755 \pm 2(26.871) = 67.755 \pm 53.742 \Rightarrow (14.013, 121.497)\). There are 946 of the 992 observations in this interval, which is 946/992 = .954, or 95.4%. This is very close to 95%.

If the data are normal, then approximately 100% of the observations should fall within 3 standard deviations of the mean. For these data, the interval is \(\bar{x} \pm 3s = 67.755 \pm 3(26.871) = 67.755 \pm 80.613 \Rightarrow (-12.858, 148.368)\). There are 991 of the 992 observations in this interval, which is 991/992 = .999, or 99.9%. This is very close to 100%.

Since these percentages are very close to the percentages for the normal distribution, this indicates that the data are probably from a normal distribution.

c. The \(IQR = Q_U - Q_L = 86 - 49 = 37\) and the standard deviation is s = 26.871. If the data are normal, then \(IQR/s \approx 1.3\). For these data, \(IQR/s = 37/26.871 = 1.377\). This is very close to 1.3, which indicates that the data probably come from a normal distribution.

d. Using MINITAB, the normal probability plot is:

[Figure: MINITAB probability plot of Support, Normal - 95% CI (Mean 67.76, StDev 26.87, N 992, AD 0.496, P-Value 0.214)]

Except for the several 0's on the left of the plot, the data are very close to a straight line. This again indicates that the data probably come from a normal distribution.
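The box-plot fence calculation in Exercise 4.116 can be checked without Table II by using `statistics.NormalDist` from the Python standard library. This is a minimal sketch: it uses the exact standard normal quartiles (about ±.6745), so it reproduces the interpolated answers (fences near ±2.70 and ±4.72) rather than the ±.67 table values, and the helper name `prob_outside` is illustrative only.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mu = 0, sigma = 1

q_l = z.inv_cdf(0.25)   # lower quartile Q_L, about -.6745
q_u = z.inv_cdf(0.75)   # upper quartile Q_U, about +.6745
iqr = q_u - q_l

inner = (q_l - 1.5 * iqr, q_u + 1.5 * iqr)   # inner fences
outer = (q_l - 3.0 * iqr, q_u + 3.0 * iqr)   # outer fences

def prob_outside(lo, hi):
    """P(z < lo) + P(z > hi) for a standard normal z."""
    return z.cdf(lo) + (1 - z.cdf(hi))

p_inner = prob_outside(*inner)
p_outer = prob_outside(*outer)

print(f"inner fences: ({inner[0]:.3f}, {inner[1]:.3f}), P(beyond) = {p_inner:.4f}")
print(f"outer fences: ({outer[0]:.3f}, {outer[1]:.3f}), P(beyond) = {p_outer:.2e}")
```

The exact tail probability beyond the inner fences is close to the .0070 obtained from the interpolated fences, and the probability beyond the outer fences is of the order of a few in a million, which is why the manual treats it as zero.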
4.122 The histogram of the data is very close to a normal distribution. The engineers should use the normal distribution to model the behavior of shear strength for rack fractures.

4.123 a. If the data are normal, then approximately 68% of the observations should fall within 1 standard deviation of the mean. For these data, the interval is \(\bar{x} \pm s = 89.2906 \pm 3.1834 \Rightarrow (86.1072, 92.4740)\). There are 34 of the 50 observations in this interval, which is 34/50 = .68, or 68%. This is exactly 68%.

If the data are normal, then approximately 95% of the observations should fall within 2 standard deviations of the mean. For these data, the interval is \(\bar{x} \pm 2s = 89.2906 \pm 2(3.1834) = 89.2906 \pm 6.3668 \Rightarrow (82.9238, 95.6574)\). There are 48 of the 50 observations in this interval, which is 48/50 = .96, or 96%. This is very close to 95%.

If the data are normal, then approximately 100% of the observations should fall within 3 standard deviations of the mean. For these data, the interval is \(\bar{x} \pm 3s = 89.2906 \pm 3(3.1834) = 89.2906 \pm 9.5502 \Rightarrow (79.7404, 98.8408)\). There are 50 of the 50 observations in this interval, which is 50/50 = 1.00, or 100%. This is exactly 100%.

Since these percentages are very close to the percentages for the normal distribution, this indicates that the data are approximately normal.

The \(IQR = Q_U - Q_L = 91.88 - 87.2725 = 4.6075\) and the standard deviation is s = 3.1834. If the data are normal, then \(IQR/s \approx 1.3\). For these data, \(IQR/s = 4.6075/3.1834 = 1.447\). This is fairly close to 1.3, which indicates that the data are approximately normal.

b. The data on the plot are fairly close to a straight line. This indicates that the data are approximately normal.

4.124 Based on the normal probability plot, it appears that the data are not approximately normal. If the data are normal, then the probability plot should be a straight line. In this graph, the plot of the data is not a straight line.
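The interval proportions and the IQR/s ratio used in Exercises 4.120 through 4.123 are easy to compute directly. The sketch below uses a hypothetical helper, `normality_checks` (not part of MINITAB or any library), and applies it to the 28 observations of Exercise 4.120, read back from the stem-and-leaf display with leaf unit 0.10. Python's `statistics.quantiles` with its default 'exclusive' method reproduces MINITAB's quartiles for these data (3.35 and 8.05); other quartile conventions can give slightly different IQRs.

```python
import statistics

def normality_checks(data):
    """Return (mean, s, proportions within 1, 2, 3 s of the mean, IQR/s)."""
    xbar = statistics.mean(data)
    s = statistics.stdev(data)                    # sample standard deviation (n - 1 divisor)
    props = []
    for k in (1, 2, 3):
        lo, hi = xbar - k * s, xbar + k * s
        props.append(sum(lo <= x <= hi for x in data) / len(data))
    q1, _, q3 = statistics.quantiles(data, n=4)   # default method='exclusive'
    return xbar, s, props, (q3 - q1) / s

# Exercise 4.120 data, recovered from the stem-and-leaf display (leaf unit = 0.10)
x = [1.1, 1.1, 1.2, 1.6, 1.6, 2.1, 3.3, 3.5, 4.0, 4.3, 4.5, 5.0, 5.3, 5.9,
     6.3, 6.4, 6.5, 6.7, 7.3, 7.4, 7.6, 8.2, 8.4, 8.4, 8.6, 8.9, 9.4, 9.7]

xbar, s, props, ratio = normality_checks(x)
print(f"mean = {xbar:.3f}, s = {s:.3f}, IQR/s = {ratio:.2f}")
print("proportions within 1, 2, 3 s:", [round(p, 3) for p in props])
```

The helper recovers the MINITAB summary (mean 5.511, s 2.765, IQR/s = 1.70). It also shows that only 16 of the 28 observations (.571) fall within one standard deviation of the mean, well below .68, which is consistent with the conclusion in 4.120 that these data may not be normal.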
4.125 The information given in the problem states that \(\bar{x} = 4.71\), s = 6.09, \(Q_L = 1\), and \(Q_U = 6\). To be normal, the data have to be symmetric. If the data are symmetric, then the mean would equal the median and would be halfway between the lower and upper quartiles. Halfway between the lower and upper quartiles is 3.5. The sample mean is 4.71, which is much larger than 3.5. This implies that the data may not be normal.

In addition, the interquartile range divided by the standard deviation will be approximately 1.3 if the data are normal. For these data,
\(\frac{IQR}{s} = \frac{Q_U - Q_L}{s} = \frac{6 - 1}{6.09} = .82\)
The value of .82 is much smaller than the 1.3 expected for normal data. Again, this is an indication that the data are not normal.

Finally, the standard deviation is larger than the mean. Since the variable in this case cannot take values less than 0, a standard deviation larger than the mean indicates that the data are skewed to the right. This implies that the data are not normal.

4.126 We will look at the 4 methods for determining if the data are normal. First, we will look at a histogram of the data. Using MINITAB, the histogram of the failure times of the 50 used panels is:

[Figure: MINITAB histogram of Fail with fitted normal curve (Mean 1.935, StDev 0.9287, N 50)]

From the histogram, the data appear to be roughly normal.

Next, we look at the intervals \(\bar{x} \pm s\), \(\bar{x} \pm 2s\), and \(\bar{x} \pm 3s\). If the proportions of observations falling in each interval are approximately .68, .95, and 1.00, then the data are approximately normal. Using MINITAB, the summary statistics are:

    Descriptive Statistics: Fail
    Variable  N   Mean   StDev  Q1     Median  Q3
    Fail      50  1.935  0.929  1.218  1.835   2.645

\(\bar{x} \pm s = 1.935 \pm .929 \Rightarrow (1.006, 2.864)\)
33 of the 50 values fall in this interval. The proportion is 33/50 = .66. This is fairly close to the .68 we would expect if the data were normal.

\(\bar{x} \pm 2s = 1.935 \pm 2(.929) = 1.935 \pm 1.858 \Rightarrow (0.077, 3.793)\)
49 of the 50 values fall in this interval.
The proportion is 49/50 = .98. This is a fair amount above the .95 we would expect if the data were normal.

\(\bar{x} \pm 3s = 1.935 \pm 3(.929) = 1.935 \pm 2.787 \Rightarrow (-0.852, 4.722)\)
50 of the 50 values fall in this interval. The proportion is 50/50 = 1.00. This is equal to the 1.00 we would expect if the data were normal.

From this method, it appears that the data may be normal.

Next, we look at the ratio of the IQR to s. \(IQR = Q_U - Q_L = 2.645 - 1.218 = 1.427\).
\(\frac{IQR}{s} = \frac{1.427}{.929} = 1.54\)
This is somewhat larger than the 1.3 we would expect if the data were normal, but not dramatically so; this method indicates the data may be normal.

Finally, using MINITAB, the normal probability plot is:

[Figure: MINITAB probability plot of Fail, Normal - 95% CI (Mean 1.935, StDev 0.9287, N 50, AD 0.305, P-Value 0.557)]

Since the data form a fairly straight line, the data may be normal.

From the 4 different methods, all indications are that the failure times are approximately normal.

4.127 We will look at the 4 methods for determining if the data are normal. First, we will look at a histogram of the data. Using MINITAB, the histogram of the driver's head injury rating is:

[Figure: MINITAB histogram of DrivHead with fitted normal curve (Mean 603.7, StDev 185.4, N 98)]

From the histogram, the data appear to be somewhat skewed to the right but are fairly mound-shaped. This indicates that the data are approximately normal.

Next, we look at the intervals \(\bar{x} \pm s\), \(\bar{x} \pm 2s\), and \(\bar{x} \pm 3s\). If the proportions of observations falling in each interval are approximately .68, .95, and 1.00, then the data are approximately normal. Using MINITAB, the summary statistics are:

    Descriptive Statistics: DrivHead
    Variable  N   Mean   StDev  Minimum  Q1     Median  Q3     Maximum
    DrivHead  98  603.7  185.4  216.0    475.0  605.0   724.3  1240.0

\(\bar{x} \pm s = 603.7 \pm 185.4 \Rightarrow (418.3, 789.1)\)
68 of the 98 values fall in this interval. The proportion is .69.
This is very close to the .68 we would expect if the data were normal.

\(\bar{x} \pm 2s = 603.7 \pm 2(185.4) = 603.7 \pm 370.8 \Rightarrow (232.9, 974.5)\)
96 of the 98 values fall in this interval. The proportion is .98. This is a fair amount larger than the .95 we would expect if the data were normal.

\(\bar{x} \pm 3s = 603.7 \pm 3(185.4) = 603.7 \pm 556.2 \Rightarrow (47.5, 1{,}159.9)\)
97 of the 98 values fall in this interval. The proportion is .99. This is fairly close to the 1.00 we would expect if the data were normal.

From this method, it appears that the data may be normal.

Next, we look at the ratio of the IQR to s. \(IQR = Q_U - Q_L = 724.3 - 475.0 = 249.3\).
\(\frac{IQR}{s} = \frac{249.3}{185.4} = 1.34\)
This is essentially equal to the 1.3 we would expect if the data were normal. This method indicates the data are approximately normal.

Finally, using MINITAB, the normal probability plot is:

[Figure: MINITAB probability plot of DrivHead, Normal - 95% CI (Mean 603.7, StDev 185.4, N 98, AD 0.492, P-Value 0.214)]

Since the data form a fairly straight line, the data are approximately normal.

From the 4 different methods, all indications are that the driver's head injury rating data are approximately normal.

4.128 We will look at the 4 methods for determining if the data are normal. First, we will look at a histogram of the data. Using MINITAB, the histogram of the sanitation scores is:

[Figure: MINITAB histogram of Score]

From the histogram, the data appear to be skewed to the left. This indicates that the data are not normal.

Next, we look at the intervals \(\bar{x} \pm s\), \(\bar{x} \pm 2s\), and \(\bar{x} \pm 3s\). If the proportions of observations falling in each interval are approximately .68, .95, and 1.00, then the data are approximately normal.
Using MINITAB, the summary statistics are:

    Descriptive Statistics: Score
    Variable  N    Mean    StDev  Minimum  Q1      Median  Q3      Maximum
    Score     182  95.044  5.391  56.000   94.000  96.500  98.000  100.000

\(\bar{x} \pm s = 95.044 \pm 5.391 \Rightarrow (89.653, 100.435)\)
164 of the 182 values fall in this interval. The proportion is .90. This is much larger than the .68 we would expect if the data were normal.

\(\bar{x} \pm 2s = 95.044 \pm 2(5.391) = 95.044 \pm 10.782 \Rightarrow (84.262, 105.826)\)
174 of the 182 values fall in this interval. The proportion is .96. This is slightly larger than the .95 we would expect if the data were normal.

\(\bar{x} \pm 3s = 95.044 \pm 3(5.391) = 95.044 \pm 16.173 \Rightarrow (78.871, 111.217)\)
178 of the 182 values fall in this interval. The proportion is .978. This is somewhat smaller than the 1.00 we would expect if the data were normal.

From this method, it appears that the data are not normal.

Next, we look at the ratio of the IQR to s. \(IQR = Q_U - Q_L = 98 - 94 = 4\).
\(\frac{IQR}{s} = \frac{4}{5.391} = .742\)
This is much smaller than the 1.3 we would expect if the data were normal. This method indicates the data are not normal.

Finally, using MINITAB, the normal probability plot is:

[Figure: MINITAB probability plot of Score, Normal - 95% CI (Mean 95.04, StDev 5.391, N 182, AD 10.307, P-Value <0.005)]

Since the data do not form a straight line, the data are not normal.

From the 4 different methods, all indications are that the sanitation scores data are not normal.

4.129 We will look at the 4 methods for determining if the 3 variables are normal.

Distance: First, we will look at a histogram of the data. Using MINITAB, the histogram of the distance data is:

[Figure: MINITAB histogram of DISTANCE with fitted normal curve (Mean 298.9, StDev 7.525, N 40)]

From the histogram, the distance data do not appear to have a normal distribution.

Next, we look at the intervals \(\bar{x} \pm s\), \(\bar{x} \pm 2s\), and \(\bar{x} \pm 3s\).
If the proportions of observations falling in each interval are approximately .68, .95, and 1.00, then the data are approximately normal. Using MINITAB, the summary statistics are:

    Descriptive Statistics: DISTANCE, ACCURACY, INDEX
    Variable  N   Mean    StDev  Minimum  Q1      Median  Q3      Maximum
    DISTANCE  40  298.95  7.53   283.20   294.60  299.05  302.00  318.90

\(\bar{x} \pm s = 298.95 \pm 7.53 \Rightarrow (291.42, 306.48)\)
28 of the 40 values fall in this interval. The proportion is 28/40 = .70. This is fairly close to the .68 we would expect if the data were normal.

\(\bar{x} \pm 2s = 298.95 \pm 2(7.53) = 298.95 \pm 15.06 \Rightarrow (283.89, 314.01)\)
37 of the 40 values fall in this interval. The proportion is 37/40 = .925. This is a fair amount below the .95 we would expect if the data were normal.

\(\bar{x} \pm 3s = 298.95 \pm 3(7.53) = 298.95 \pm 22.59 \Rightarrow (276.36, 321.54)\)
40 of the 40 values fall in this interval. The proportion is 40/40 = 1.00. This is equal to the 1.00 we would expect if the data were normal.

From this method, it appears that the distance data may not be normal.

Next, we look at the ratio of the IQR to s. \(IQR = Q_U - Q_L = 302 - 294.6 = 7.4\).
\(\frac{IQR}{s} = \frac{7.4}{7.53} = .983\)
This is much smaller than the 1.3 we would expect if the data were normal. This method indicates the distance data may not be normal.

Finally, using MINITAB, the normal probability plot is:

[Figure: MINITAB probability plot of DISTANCE, Normal - 95% CI (Mean 298.9, StDev 7.525, N 40, AD 0.521, P-Value 0.174)]

Since the data do not form a straight line, the distance data may not be normal.

From the 4 different methods, all indications are that the distance data are not normal.

Accuracy: First, we will look at a histogram of the data.
Using MINITAB, the histogram of the accuracy data is:

[Figure: MINITAB histogram of ACCURACY with fitted normal curve (Mean 61.97, StDev 5.226, N 40)]

From the histogram, the accuracy data do not appear to have a normal distribution.

Next, we look at the intervals \(\bar{x} \pm s\), \(\bar{x} \pm 2s\), and \(\bar{x} \pm 3s\). Using MINITAB, the summary statistics are:

    Descriptive Statistics: DISTANCE, ACCURACY, INDEX
    Variable  N   Mean    StDev  Minimum  Q1      Median  Q3      Maximum
    ACCURACY  40  61.970  5.226  45.400   59.400  61.950  64.075  73.000

\(\bar{x} \pm s = 61.97 \pm 5.226 \Rightarrow (56.744, 67.196)\)
30 of the 40 values fall in this interval. The proportion is 30/40 = .75. This is much greater than the .68 we would expect if the data were normal.

\(\bar{x} \pm 2s = 61.97 \pm 2(5.226) = 61.97 \pm 10.452 \Rightarrow (51.518, 72.422)\)
37 of the 40 values fall in this interval. The proportion is 37/40 = .925. This is a fair amount below the .95 we would expect if the data were normal.

\(\bar{x} \pm 3s = 61.97 \pm 3(5.226) = 61.97 \pm 15.678 \Rightarrow (46.292, 77.648)\)
39 of the 40 values fall in this interval. The proportion is 39/40 = .975. This is a fair amount lower than the 1.00 we would expect if the data were normal.

From this method, it appears that the accuracy data may not be normal.

Next, we look at the ratio of the IQR to s. \(IQR = Q_U - Q_L = 64.075 - 59.4 = 4.675\).
\(\frac{IQR}{s} = \frac{4.675}{5.226} = .895\)
This is much smaller than the 1.3 we would expect if the data were normal. This method indicates the accuracy data may not be normal.

Finally, using MINITAB, the normal probability plot is:

[Figure: MINITAB probability plot of ACCURACY, Normal - 95% CI (Mean 61.97, StDev 5.226, N 40, AD 0.601, P-Value 0.111)]

Since the data do not form a straight line, the accuracy data may not be normal.

From the 4 different methods, all indications are that the accuracy data are not normal.

Index: First, we will look at a histogram of the data.
Using MINITAB, the histogram of the index data is:

[Figure: MINITAB histogram of INDEX with fitted normal curve (Mean 1.927, StDev 0.6602, N 40)]

From the histogram, the index data do not appear to have a normal distribution.

Next, we look at the intervals \(\bar{x} \pm s\), \(\bar{x} \pm 2s\), and \(\bar{x} \pm 3s\). If the proportions of observations falling in each interval are approximately .68, .95, and 1.00, then the data are approximately normal. Using MINITAB, the summary statistics are:

    Descriptive Statistics: DISTANCE, ACCURACY, INDEX
    Variable  N   Mean   StDev  Minimum  Q1     Median  Q3     Maximum
    INDEX     40  1.927  0.660  1.170    1.400  1.755   2.218  3.580

\(\bar{x} \pm s = 1.927 \pm .660 \Rightarrow (1.267, 2.587)\)
30 of the 40 values fall in this interval. The proportion is 30/40 = .75. This is much greater than the .68 we would expect if the data were normal.

\(\bar{x} \pm 2s = 1.927 \pm 2(.660) = 1.927 \pm 1.320 \Rightarrow (.607, 3.247)\)
37 of the 40 values fall in this interval. The proportion is 37/40 = .925. This is a fair amount below the .95 we would expect if the data were normal.

\(\bar{x} \pm 3s = 1.927 \pm 3(.660) = 1.927 \pm 1.980 \Rightarrow (-.053, 3.907)\)
40 of the 40 values fall in this interval. The proportion is 40/40 = 1.000. This is equal to the 1.00 we would expect if the data were normal.

From this method, it appears that the index data may not be normal.

Next, we look at the ratio of the IQR to s. \(IQR = Q_U - Q_L = 2.218 - 1.4 = .818\).
\(\frac{IQR}{s} = \frac{.818}{.66} = 1.24\)
This is fairly close to the 1.3 we would expect if the data were normal. This method indicates the index data may be normal.

Finally, using MINITAB, the normal probability plot is:

[Figure: MINITAB probability plot of INDEX, Normal - 95% CI (Mean 1.927, StDev 0.6602, N 40, AD 1.758, P-Value <0.005)]

Since the data do not form a straight line, the index data may not be normal.

From 3 of the 4 different methods, the indications are that the index data are not normal.
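The normal probability plots in Exercises 4.119 through 4.130 are judged by eye. A rough numerical companion, sketched below under stated assumptions, pairs the sorted data with standard normal scores and computes their correlation; values near 1 correspond to a nearly straight plot. The plotting positions (i - 3/8)/(n + 1/4) (Blom's rule) are an assumption here and MINITAB's default positions may differ; the correlation summary is a swapped-in technique that the manual itself does not use, and the function names are hypothetical.

```python
import math
from statistics import NormalDist, mean

def normal_scores(n):
    """Standard normal quantiles at Blom plotting positions (i - 3/8)/(n + 1/4)."""
    z = NormalDist()
    return [z.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def probability_plot_r(data):
    """Correlation between the sorted data and their normal scores."""
    xs = sorted(data)
    zs = normal_scores(len(xs))
    mx, mz = mean(xs), mean(zs)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(xs, zs))
    sxx = sum((a - mx) ** 2 for a in xs)
    szz = sum((b - mz) ** 2 for b in zs)
    return sxz / math.sqrt(sxx * szz)

# A linear transform of the normal scores plots as an exact straight line (r = 1);
# nine ties plus one extreme value plot far from a straight line.
straight = [50 + 10 * z for z in normal_scores(20)]
skewed = [0] * 9 + [100]
r_straight = probability_plot_r(straight)
r_skewed = probability_plot_r(skewed)
print(round(r_straight, 4), round(r_skewed, 4))
```

Like the visual check, this is a screening tool rather than a formal test; the Anderson-Darling statistic and p-value that MINITAB prints beside each plot serve that purpose.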
4.130 We will look at the 4 methods for determining if the data are normal. First, we will look at a histogram of the data. Using MINITAB, the histogram of the tensile strength values is:

[Figure: MINITAB histogram of Strength]

From the histogram, the data appear to be somewhat skewed to the left. This might indicate that the data are not normal.

Next, we look at the intervals \(\bar{x} \pm s\), \(\bar{x} \pm 2s\), and \(\bar{x} \pm 3s\). If the proportions of observations falling in each interval are approximately .68, .95, and 1.00, then the data are approximately normal. Using MINITAB, the summary statistics are:

    Descriptive Statistics: Strength
    Variable  N   Mean    StDev  Minimum  Q1      Median  Q3      Maximum
    Strength  11  342.13  7.91   328.20   334.70  343.60  347.80  356.30

\(\bar{x} \pm s = 342.13 \pm 7.91 \Rightarrow (334.22, 350.04)\)
8 of the 11 values fall in this interval. The proportion is .73. This is somewhat larger than the .68 we would expect if the data were normal.

\(\bar{x} \pm 2s = 342.13 \pm 2(7.91) = 342.13 \pm 15.82 \Rightarrow (326.31, 357.95)\)
All 11 of the 11 values fall in this interval. The proportion is 1.00. This is somewhat larger than the .95 we would expect if the data were normal.

\(\bar{x} \pm 3s = 342.13 \pm 3(7.91) = 342.13 \pm 23.73 \Rightarrow (318.40, 365.86)\)
Again, all 11 of the 11 values fall in this interval. The proportion is 1.00. This is equal to the 1.00 we would expect if the data were normal.

From this method, the data appear to be normal.

Next, we look at the ratio of the IQR to s. \(IQR = Q_U - Q_L = 347.80 - 334.70 = 13.1\).
\(\frac{IQR}{s} = \frac{13.1}{7.91} = 1.656\)
This is much larger than the 1.3 we would expect if the data were normal. This method indicates the data are not normal.
Finally, using MINITAB, the normal probability plot is:

[Figure: MINITAB probability plot of Strength, Normal - 95% CI (Mean 342.1, StDev 7.907, N 11, AD 0.154, P-Value 0.937)]

Since the data form a fairly straight line, the data could be normal.

Of the 4 different methods, two (the histogram and the IQR/s ratio) indicate that the data may not be from a normal distribution, while the interval proportions and the probability plot are consistent with normality.

4.131 Exercise 2.51 states that the mean number of semester hours for those taking the CPA exam is 141.31 and the median is 140. It also states that most colleges require only 128 semester hours for an undergraduate degree. Thus, the minimum value for the total semester hours is around 128. The z-score associated with 128 is:
\(z = \frac{x - \bar{x}}{s} = \frac{128 - 141.31}{17.77} = -.75\)
If the data are normal, we know that about .34 of the observations lie between the mean and 1 standard deviation below the mean, so about .16 of the observations lie more than 1 standard deviation below the mean. With this distribution, that is impossible, because the minimum value is only .75 standard deviation below the mean. Thus, the data are not normal. The mean is greater than the median, which indicates that the data are skewed to the right.

4.132 a. \(f(x) = \frac{1}{d - c} = \frac{1}{7 - 3} = \frac{1}{4}\), so
\(f(x) = \begin{cases} 1/4 & 3 \le x \le 7 \\ 0 & \text{otherwise} \end{cases}\)

b. \(\mu = \frac{c + d}{2} = \frac{3 + 7}{2} = \frac{10}{2} = 5\), \(\sigma = \frac{d - c}{\sqrt{12}} = \frac{7 - 3}{\sqrt{12}} = 1.155\)

c. \(\mu \pm \sigma = 5 \pm 1.155 \Rightarrow (3.845, 6.155)\)
\(P(\mu - \sigma \le x \le \mu + \sigma) = P(3.845 \le x \le 6.155) = \frac{b - a}{d - c} = \frac{6.155 - 3.845}{7 - 3} = \frac{2.31}{4} = .5775\)

4.133 a. \(f(x) = \frac{1}{d - c} = \frac{1}{45 - 20} = \frac{1}{25} = .04\), so
\(f(x) = \begin{cases} .04 & 20 \le x \le 45 \\ 0 & \text{otherwise} \end{cases}\)

b. \(\mu = \frac{c + d}{2} = \frac{20 + 45}{2} = \frac{65}{2} = 32.5\), \(\sigma = \frac{d - c}{\sqrt{12}} = \frac{45 - 20}{\sqrt{12}} = 7.22\)

c. Using MINITAB, the graph is:

[Figure: graph of f(x) = 1/25 on 20 ≤ x ≤ 45, with μ - 2σ = 18.06, μ = 32.5, and μ + 2σ = 46.94 marked]

\(\mu \pm 2\sigma = 32.5 \pm 2(7.22) \Rightarrow (18.06, 46.94)\)
\(P(18.06 \le x \le 46.94) = P(20 \le x \le 45) = (45 - 20)(.04) = 1\)

4.134 From Exercise 4.133, \(f(x) = \begin{cases} .04 & 20 \le x \le 45 \\ 0 & \text{otherwise} \end{cases}\)

a. \(P(20 < x < 30) = (30 - 20)(.04) = .4\)
b. \(P(20 \le x \le 30) = (30 - 20)(.04) = .4\)
c. \(P(x > 30) = (45 - 30)(.04) = .6\)
d. \(P(x > 45) = (45 - 45)(.04) = 0\)
e. \(P(x \le 40) = (40 - 20)(.04) = .8\)
f. \(P(x < 40) = (40 - 20)(.04) = .8\)
g.
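\(P(15 \le x \le 35) = (35 - 20)(.04) = .6\)
h. \(P(21.5 \le x \le 31.5) = (31.5 - 21.5)(.04) = .4\)

4.135 \(P(x > a) = e^{-a/\theta} = e^{-a/1}\). Using a calculator:
a. \(P(x > 1) = e^{-1/1} = e^{-1} = .367879\)
b. \(P(x \le 3) = 1 - P(x > 3) = 1 - e^{-3/1} = 1 - e^{-3} = 1 - .049787 = .950213\)
c. \(P(x > 1.5) = e^{-1.5/1} = e^{-1.5} = .223130\)
d. \(P(x \le 5) = 1 - P(x > 5) = 1 - e^{-5/1} = 1 - e^{-5} = 1 - .006738 = .993262\)

4.136 a. \(P(x \le 4) = 1 - P(x > 4) = 1 - e^{-4/2.5} = 1 - e^{-1.6} = 1 - .201897 = .798103\)
b. \(P(x > 5) = e^{-5/2.5} = e^{-2} = .135335\)
c. \(P(x \le 2) = 1 - P(x > 2) = 1 - e^{-2/2.5} = 1 - e^{-.8} = 1 - .449329 = .550671\)
d. \(P(x > 3) = e^{-3/2.5} = e^{-1.2} = .301194\)

4.137 \(f(x) = \frac{1}{d - c} = \frac{1}{200 - 100} = \frac{1}{100} = .01\), so
\(f(x) = \begin{cases} .01 & 100 \le x \le 200 \\ 0 & \text{otherwise} \end{cases}\)

a. \(\mu = \frac{c + d}{2} = \frac{100 + 200}{2} = \frac{300}{2} = 150\), \(\sigma = \frac{d - c}{\sqrt{12}} = \frac{200 - 100}{\sqrt{12}} = \frac{100}{\sqrt{12}} = 28.8675\)
\(\mu \pm 2\sigma = 150 \pm 2(28.8675) = 150 \pm 57.735 \Rightarrow (92.265, 207.735)\)
\(P(x < 92.265) + P(x > 207.735) = P(x < 100) + P(x > 200) = 0 + 0 = 0\)

b. \(\mu \pm 3\sigma = 150 \pm 3(28.8675) = 150 \pm 86.6025 \Rightarrow (63.3975, 236.6025)\)
\(P(63.3975 < x < 236.6025) = P(100 \le x \le 200) = (200 - 100)(.01) = 1\)

c. From part a, \(\mu \pm 2\sigma \Rightarrow (92.265, 207.735)\).
\(P(92.265 < x < 207.735) = P(100 \le x \le 200) = (200 - 100)(.01) = 1\)

4.138 With \(\theta = 2\), \(\mu = \sigma = 2\).
a. \(\mu \pm 3\sigma = 2 \pm 3(2) = 2 \pm 6 \Rightarrow (-4, 8)\)
Since \(\mu - 3\sigma\) lies below 0, find the probability that x is more than \(\mu + 3\sigma = 8\):
\(P(x > 8) = e^{-8/2} = e^{-4} = .018316\)

b. \(\mu \pm 2\sigma = 2 \pm 2(2) = 2 \pm 4 \Rightarrow (-2, 6)\)
Since \(\mu - 2\sigma\) lies below 0, find the probability that x is between 0 and 6:
\(P(x < 6) = 1 - P(x \ge 6) = 1 - e^{-6/2} = 1 - e^{-3} = 1 - .049787 = .950213\)

c. \(\mu \pm .5\sigma = 2 \pm .5(2) = 2 \pm 1 \Rightarrow (1, 3)\)
\(P(1 < x < 3) = P(x > 1) - P(x > 3) = e^{-1/2} - e^{-3/2} = e^{-.5} - e^{-1.5} = .606531 - .223130 = .383401\)

4.139 For this problem, \(f(x) = \frac{1}{3600 - 0} = \frac{1}{3600}\). Thus,
\(f(x) = \begin{cases} 1/3600 & 0 \le x \le 3600 \\ 0 & \text{otherwise} \end{cases}\)
The last 15 minutes represent the last 15(60) = 900 seconds.
\(P(2700 \le x \le 3600) = (3600 - 2700)\frac{1}{3600} = \frac{900}{3600} = .25\)

4.140 a. \(P(1200 < x < 1500) = P(x > 1200) - P(x > 1500) = e^{-1200/1000} - e^{-1500/1000} = e^{-1.2} - e^{-1.5} = .3012 - .2231 = .0781\)
b. \(P(x > 1200) = e^{-1200/1000} = e^{-1.2} = .3012\)
c. \(P(x < 1500 \mid x > 1200) = \frac{P(1200 < x < 1500)}{P(x > 1200)} = \frac{.0781}{.3012} = .2593\)

4.141 a. Let x = temperature with no bolt-on trace elements. Then x has a uniform distribution.
\(f(x) = \frac{1}{d - c} = \frac{1}{290 - 260} = \frac{1}{30}\), so
\(f(x) = \begin{cases} 1/30 & 260 \le x \le 290 \\ 0 & \text{otherwise} \end{cases}\)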
\(P(280 < x < 284) = (284 - 280)\frac{1}{30} = \frac{4}{30} = .133\)

Let y = temperature with bolt-on trace elements. Then y has a uniform distribution.
\(f(y) = \frac{1}{d - c} = \frac{1}{285 - 278} = \frac{1}{7}\), so
\(f(y) = \begin{cases} 1/7 & 278 \le y \le 285 \\ 0 & \text{otherwise} \end{cases}\)
\(P(280 < y < 284) = (284 - 280)\frac{1}{7} = \frac{4}{7} = .571\)

b. \(P(x \le 268) = (268 - 260)\frac{1}{30} = \frac{8}{30} = .267\)
\(P(y \le 268) = (268 - 260)(0) = 0\)

4.142 a. Let x = number of anthrax spores. Then x has an approximately uniform distribution.
\(f(x) = \frac{1}{d - c} = \frac{1}{10 - 0} = \frac{1}{10} = .1\), so
\(f(x) = \begin{cases} .1 & 0 \le x \le 10 \\ 0 & \text{otherwise} \end{cases}\)
\(P(x < 8) = (8 - 0)(.1) = .8\)

b. \(P(2 < x < 5) = (5 - 2)(.1) = .3\)

4.143 a. \(P(x > 2) = e^{-2/2.5} = e^{-.8} = .449329\) (using a calculator)
b. \(P(x < 5) = 1 - P(x \ge 5) = 1 - e^{-5/2.5} = 1 - e^{-2} = 1 - .135335 = .864665\) (using a calculator)

4.144 a. Let x = time until the first critical part failure. Then x has an exponential distribution with \(\theta = .1\).
\(P(x > 1) = e^{-1/.1} = e^{-10} = .0000454\) (using a calculator)

b. 30 minutes = .5 hours.
\(P(x \le .5) = 1 - P(x > .5) = 1 - e^{-.5/.1} = 1 - e^{-5} = 1 - .0067 = .9933\)

4.145 a. For this problem, x has a uniform distribution on the interval from 0 to 1. Thus,
\(\mu = \frac{c + d}{2} = \frac{0 + 1}{2} = .5\)

b. For this problem, \(f(x) = \begin{cases} 1 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}\)
\(P(x > .7) = (1 - .7)(1) = .3\)

c. With n = 2, the total number of possible connections is \(\binom{2}{2} = \frac{2!}{2!(2 - 2)!} = 1\). Thus, the density can be either 0 or 1. Therefore, the uniform model would not be a good approximation for the distribution of network density.

4.146 a. For layer 2, let x = amount of loss. Since the amount of loss is random between .01 and .05 million dollars, the uniform distribution for x is:
\(f(x) = \frac{1}{d - c} = \frac{1}{.05 - .01} = \frac{1}{.04} = 25\), so
\(f(x) = \begin{cases} 25 & .01 \le x \le .05 \\ 0 & \text{otherwise} \end{cases}\)
A graph of the distribution looks like the following:

[Figure: graph of f(x) = 25 on .01 ≤ x ≤ .05]

\(\mu = \frac{c + d}{2} = \frac{.01 + .05}{2} = .03\), \(\sigma = \frac{d - c}{\sqrt{12}} = \frac{.05 - .01}{\sqrt{12}} = .0115\), \(\sigma^2 = .0115^2 = .00013\)
The mean loss for layer 2 is .03 million dollars, and the variance of the loss for layer 2 is .00013 million dollars squared.
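The uniform and exponential calculations in Exercises 4.132 through 4.146 reduce to two rules: for x uniform on [c, d], a probability is the overlap length times the height 1/(d - c), with mean (c + d)/2 and standard deviation (d - c)/√12; for x exponential with mean θ, P(x > a) = e^(-a/θ). A minimal sketch of both rules follows; the function names are illustrative and do not come from any library.

```python
import math

def uniform_prob(a, b, c, d):
    """P(a <= x <= b) for x uniform on [c, d]: overlap length times height 1/(d - c)."""
    overlap = max(0.0, min(b, d) - max(a, c))
    return overlap / (d - c)

def uniform_mean_sd(c, d):
    """Mean (c + d)/2 and standard deviation (d - c)/sqrt(12) of a uniform on [c, d]."""
    return (c + d) / 2, (d - c) / math.sqrt(12)

def exp_tail(a, theta):
    """P(x > a) = e^(-a/theta) for an exponential with mean theta, a >= 0."""
    return math.exp(-a / theta)

p_g = uniform_prob(15, 35, 20, 45)                 # 4.134 g: (35 - 20)(.04)
mu, sigma = uniform_mean_sd(100, 200)              # 4.137: mean and standard deviation
p_le_4 = 1 - exp_tail(4, 2.5)                      # 4.136 a: P(x <= 4), theta = 2.5
p_between = exp_tail(1200, 1000) - exp_tail(1500, 1000)   # 4.140 a
p_spores = uniform_prob(0, 8, 0, 10)               # 4.142 a: P(x < 8) on [0, 10]
print(p_g, mu, sigma, p_le_4, p_between, p_spores)
```

Clipping the interval [a, b] to [c, d] inside `uniform_prob` handles cases such as 4.134 g and 4.137, where part of the requested interval lies outside the support and contributes nothing.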
b. For layer 6, let x = amount of loss. Since the amount of loss is random between .50 and 1.00 million dollars, the uniform distribution for x is:
\(f(x) = \frac{1}{d - c} = \frac{1}{1.00 - .50} = \frac{1}{.50} = 2\), so
\(f(x) = \begin{cases} 2 & .50 \le x \le 1.00 \\ 0 & \text{otherwise} \end{cases}\)
A graph of the distribution looks like the following:

[Figure: graph of f(x) = 2 on .50 ≤ x ≤ 1.00]

\(\mu = \frac{c + d}{2} = \frac{.50 + 1.00}{2} = .75\), \(\sigma = \frac{d - c}{\sqrt{12}} = \frac{1.00 - .50}{\sqrt{12}} = .1443\), \(\sigma^2 = .1443^2 = .0208\)
The mean loss for layer 6 is .75 million dollars, and the variance of the loss for layer 6 is .0208 million dollars squared.

c. A loss of $10,000 corresponds to x = .01. \(P(x \ge .01) = 1\)
A loss of $25,000 corresponds to x = .025. \(P(x \le .025) = \text{Base} \times \text{Height} = (.025 - .01)(25) = .015(25) = .375\)

d. A loss of $750,000 corresponds to x = .75, and a loss of $1,000,000 corresponds to x = 1.
\(P(.75 \le x \le 1) = (1.00 - .75)(2) = .25(2) = .5\)
A loss of $900,000 corresponds to x = .90.
\(P(x \ge .90) = (1.00 - .90)(2) = .10(2) = .20\)
\(P(x = .90) = 0\)

4.147 a. The amount dispensed by the beverage machine is a continuous random variable, since it can take on any value between 6.5 and 7.5 ounces.

b. Since the amount dispensed is random between 6.5 and 7.5 ounces, x is a uniform random variable.
\(f(x) = \frac{1}{d - c} = \frac{1}{7.5 - 6.5} = \frac{1}{1} = 1\), so
\(f(x) = \begin{cases} 1 & 6.5 \le x \le 7.5 \\ 0 & \text{otherwise} \end{cases}\)
The graph is as follows:

[Figure: graph of f(x) = 1 on 6.5 ≤ x ≤ 7.5, with μ - 2σ = 6.423, μ = 7, and μ + 2σ = 7.577 marked]

c. \(\mu = \frac{c + d}{2} = \frac{6.5 + 7.5}{2} = \frac{14}{2} = 7\), \(\sigma = \frac{d - c}{\sqrt{12}} = \frac{7.5 - 6.5}{\sqrt{12}} = .2887\)
\(\mu \pm 2\sigma = 7 \pm 2(.2887) = 7 \pm .5774 \Rightarrow (6.423, 7.577)\)

d. \(P(x \ge 7) = (7.5 - 7)(1) = .5\)

e. \(P(x < 6) = 0\)

f. \(P(6.5 \le x \le 7.25) = (7.25 - 6.5)(1) = .75\)

g. The probability that the next bottle filled will contain more than 7.25 ounces is:
\(P(x > 7.25) = (7.5 - 7.25)(1) = .25\)
Assuming the fills are independent, the probability that each of the next 6 bottles filled will contain more than 7.25 ounces is:
\(P(x_1 > 7.25 \cap x_2 > 7.25 \cap \cdots \cap x_6 > 7.25) = [P(x > 7.25)]^6 = .25^6 = .0002\)

4.148 a. Two minutes equals 120 seconds.
\(P(x > 120) = e^{-120/95} = .2828\)

b.
Using MINITAB, a histogram of the data with the exponential distribution displayed on top of it is: Histogram of INTTIME Exponential Mean N 70 95.52 267 60 Frequency 50 40 30 20 10 0 0 75 150 225 300 INTTIME 375 450 525 The data appear to fit the exponential distribution fairly well. 4.149 a. Let x = product’s lifetime at the end of its lifetime. Then x has an exponential distribution with 500, 000 . P x 700,000 1– P x 700,000 1 – e700000/5000000 1– e1.4 1 .246597 .753403 b. Let y = product’s lifetime during its normal life. Then y has a uniform distribution. f ( y) 1 d c (c y d ) 1 1 1 d c 1, 000,000 100, 000 900,000 1 Therefore, f ( y ) 900,000 0 (100,000 y 1,000,000) otherwise 1 P( y 700, 000) (700, 000 100, 000) .667 900, 000 Copyright © 2014 Pearson Education, Inc. 228 Chapter 4 c. P x 830,000 1– P x 830,000 1– e830000/5000000 1– e1.66 1 .190139 .809861 (Using a calculator) 1 P( y 830, 000) (830, 000 100, 000) .811 900, 000 4.150 Let x = cycle availability, where x has a uniform distribution on the interval from 0 to 1. d c 1 0 c d 0 1 .289 .5 and 2 2 12 12 The 10th percentile is that value of x such that 10% of all observations are below it. Let K1 = 10th percentile. P ( x K1 ) ( K1 0)(1 0) K1 .10 The lower quartile is that value of x such that 25% of all observations are below it. Let K2 = 25th percentile. P ( x K 2 ) ( K 2 0)(1 0) K 2 .25 The upper quartile is that value of x such that 75% of all observations are below it. Let K3 = 75th percentile. P ( x K 3 ) ( K 3 0)(1 0) K 3 .75 4.151 Let x = number of inches a gouge is from one end of the spindle. Then x has a uniform distribution with f(x) as follows: 1 1 1 f ( x) d c 18 0 18 0 0 x 18 otherwise In order to get at least 14 consecutive inches without a gouge, the gouge must be within 4 inches of either end. Thus, we must find: P x 4 P x 14 (4 0) 1/18 (18 14) 1/18 4 /18 4 /18 8 /18 .4444 4.152 a. Let x = life length of CD-ROM. Then x has an exponential distribution with 25, 000 . 
\(R(t) = P(x > t) = e^{-t/25{,}000}\)
b. \(R(8{,}760) = P(x > 8{,}760) = e^{-8{,}760/25{,}000} = e^{-.3504} = .7044\)
c. S(t) = probability that at least one of the two drives has a life length exceeding t hours = 1 − probability that neither has a life length exceeding t hours. Assuming the two drives fail independently:
\(S(t) = 1 - P(x_1 \le t)P(x_2 \le t) = 1 - [1 - P(x_1 > t)][1 - P(x_2 > t)] = 1 - (1 - e^{-t/25{,}000})(1 - e^{-t/25{,}000}) = 1 - (1 - 2e^{-t/25{,}000} + e^{-t/12{,}500}) = 2e^{-t/25{,}000} - e^{-t/12{,}500}\)
d. \(S(8{,}760) = 2e^{-8{,}760/25{,}000} - e^{-8{,}760/12{,}500} = 2(.7044) - .4962 = 1.4088 - .4962 = .9126\)
e. The probability in part d is greater than that in part b. We would expect this: the probability that at least one of two drives lasts longer than 8,760 hours should be greater than the probability that a single drive lasts longer than 8,760 hours.
4.153 Let x be a random variable with an exponential distribution with mean \(\theta\), and let k = median of the distribution. Then \(P(x \ge k) = .5\). We now need to find k:
\(P(x \ge k) = e^{-k/\theta} = .5 \Rightarrow -k/\theta = \ln(.5) \Rightarrow k = -\theta\ln(.5) = .693147\theta\)
4.154 a. Using MINITAB, a graph is:
[Graph: f(p) = 1 over the interval 0 ≤ p ≤ 1]
b. \(\mu = \frac{c + d}{2} = \frac{0 + 1}{2} = .5\), \(\sigma = \frac{d - c}{\sqrt{12}} = \frac{1 - 0}{\sqrt{12}} = .289\), \(\sigma^2 = .289^2 = .083\)
c. \(P(p > .95) = (1 - .95)(1) = .05\)
\(P(p < .95) = (.95 - 0)(1) = .95\)
d. The analyst should use a uniform probability distribution with c = .90 and d = .95:
\(f(p) = \frac{1}{d - c} = \frac{1}{.95 - .90} = \frac{1}{.05} = 20\) for \(.90 \le p \le .95\) and \(f(p) = 0\) otherwise.
4.155 a. For \(\theta = 250\), \(P(x > a) = e^{-a/250}\).
For a = 300 and b = 200, show \(P(x > a + b) = P(x > a)P(x > b)\):
\(P(x > 300 + 200) = P(x > 500) = e^{-500/250} = e^{-2} = .1353\)
\(P(x > 300)P(x > 200) = e^{-300/250}e^{-200/250} = e^{-1.2}e^{-.8} = (.3012)(.4493) = .1353\)
Since \(P(x > 300 + 200) = P(x > 300)P(x > 200)\), then \(\frac{P(x > 300 + 200)}{P(x > 300)} = P(x > 200)\).
Also, show \(P(x > 300 + 200 \mid x > 300) = P(x > 200)\). Since the event \(x > 500\) is contained in the event \(x > 300\), \(P(x > 300 + 200 \mid x > 300) = \frac{P(x > 300 + 200)}{P(x > 300)}\). We already showed that this ratio equals \(P(x > 200)\), so \(P(x > 300 + 200 \mid x > 300) = P(x > 200)\).
b. Let a = 50 and b = 100. Show \(P(x > a + b) = P(x > a)P(x > b)\):
\(P(x > 50 + 100) = P(x > 150) = e^{-150/250} = e^{-.6} = .5488\)
\(P(x > 50)P(x > 100) = e^{-50/250}e^{-100/250} = e^{-.2}e^{-.4} = (.8187)(.6703) = .5488\)
Since \(P(x > 50 + 100) = P(x > 50)P(x > 100)\), then \(\frac{P(x > 50 + 100)}{P(x > 50)} = P(x > 100)\).
Also, show \(P(x > 50 + 100 \mid x > 50) = P(x > 100)\).
Since we already showed that \(\frac{P(x > 50 + 100)}{P(x > 50)} = P(x > 100)\), then \(P(x > 50 + 100 \mid x > 50) = \frac{P(x > 50 + 100)}{P(x > 50)} = P(x > 100)\).
c. Show \(P(x > a + b) = P(x > a)P(x > b)\):
\(P(x > a + b) = e^{-(a + b)/250} = e^{-a/250}e^{-b/250} = P(x > a)P(x > b)\)
4.156 a. This experiment consists of 100 trials. Each trial results in one of two outcomes: the chip is defective or not defective. If the number of chips produced in one hour is much larger than 100, then we can assume that the probability of a defective chip is the same on each trial and that the trials are independent. Thus, x is a binomial random variable. If, however, the number of chips produced in an hour is not much larger than 100, the trials would not be independent, and x would not be a binomial random variable.
b. This experiment consists of two trials. Each trial results in one of two outcomes: the applicant is qualified or not qualified. However, the trials are not independent. The probability of selecting a qualified applicant on the first trial is 3 out of 5, while the probability of selecting a qualified applicant on the second trial depends on what happened on the first trial. Thus, x is not a binomial random variable; it is a hypergeometric random variable.
c. The number of trials is not fixed in this experiment, so x is not a binomial random variable. In this experiment, x counts the number of calls received.
d. The number of trials in this experiment is 1000. Each trial can result in one of two outcomes: the voter favors a state income tax or does not. Since 1000 is small compared to the number of registered voters in Florida, the probability of selecting a voter in favor of the state income tax is essentially the same from trial to trial, and the trials are essentially independent of each other. Thus, x is a binomial random variable.
4.157 \(p(x) = \binom{n}{x}p^x q^{n - x},\ x = 0, 1, 2, \ldots, n\)
a. \(P(x = 3) = p(3) = \binom{7}{3}(.5)^3(.5)^4 = \frac{7!}{3!4!}(.5)^3(.5)^4 = 35(.125)(.0625) = .2734\)
b. \(P(x = 3) = p(3) = \binom{4}{3}(.8)^3(.2)^1 = \frac{4!}{3!1!}(.8)^3(.2)^1 = 4(.512)(.2) = .4096\)
c. \(P(x = 1) = p(1) = \binom{15}{1}(.1)^1(.9)^{14} = \frac{15!}{1!14!}(.1)^1(.9)^{14} = 15(.1)(.228768) = .3432\)
4.158 a. \(\mu = \sum x p(x) = 10(.2) + 12(.3) + 18(.1) + 20(.4) = 15.4\)
\(\sigma^2 = \sum (x - \mu)^2 p(x) = (10 - 15.4)^2(.2) + (12 - 15.4)^2(.3) + (18 - 15.4)^2(.1) + (20 - 15.4)^2(.4) = 18.44\)
\(\sigma = \sqrt{18.44} = 4.294\)
b. \(P(x < 15) = p(10) + p(12) = .2 + .3 = .5\)
c. \(\mu \pm 2\sigma = 15.4 \pm 2(4.294) = (6.812, 23.988)\)
d. \(P(6.812 < x < 23.988) = .2 + .3 + .1 + .4 = 1.0\)
4.159 From Table I, Appendix D (n = 20, p = .7):
a. \(P(x = 14) = P(x \le 14) - P(x \le 13) = .584 - .392 = .192\)
b. \(P(x \le 12) = .228\)
c. \(P(x > 12) = 1 - P(x \le 12) = 1 - .228 = .772\)
d. \(P(9 \le x \le 18) = P(x \le 18) - P(x \le 8) = .992 - .005 = .987\)
e. \(P(8 < x < 18) = P(x \le 17) - P(x \le 8) = .965 - .005 = .960\)
f. \(\mu = np = 20(.7) = 14\), \(\sigma^2 = npq = 20(.7)(.3) = 4.2\), \(\sigma = \sqrt{4.2} = 2.049\)
g. \(\mu \pm 2\sigma = 14 \pm 2(2.049) = 14 \pm 4.098 = (9.902, 18.098)\)
\(P(9.902 < x < 18.098) = P(10 \le x \le 18) = P(x \le 18) - P(x \le 9) = .992 - .017 = .975\)
4.160 a. Using MINITAB with \(\lambda = 2\):
Probability Density Function
Poisson with mean = 2
x    P( X = x )
3    0.180447
\(p(3) = P(x = 3) = .180447\)
b. Using MINITAB with \(\lambda = 1\):
Probability Density Function
Poisson with mean = 1
x    P( X = x )
4    0.0153283
\(p(4) = P(x = 4) = .0153283\)
c. Using MINITAB with \(\lambda = .5\):
Probability Density Function
Poisson with mean = .5
x    P( X = x )
2    0.0758163
\(p(2) = P(x = 2) = .0758163\)
4.161 a. Poisson
b. Binomial
c. Binomial
4.162 a. \(P(x = 2) = \frac{\binom{r}{x}\binom{N - r}{n - x}}{\binom{N}{n}} = \frac{\binom{3}{2}\binom{8 - 3}{5 - 2}}{\binom{8}{5}} = \frac{\frac{3!}{2!1!} \cdot \frac{5!}{3!2!}}{\frac{8!}{5!3!}} = \frac{3(10)}{56} = .536\)
b. \(P(x = 2) = \frac{\binom{2}{2}\binom{6 - 2}{2 - 2}}{\binom{6}{2}} = \frac{\frac{2!}{2!0!} \cdot \frac{4!}{0!4!}}{\frac{6!}{2!4!}} = \frac{1(1)}{15} = .067\)
c. \(P(x = 3) = \frac{\binom{4}{3}\binom{5 - 4}{4 - 3}}{\binom{5}{4}} = \frac{\frac{4!}{3!1!} \cdot \frac{1!}{1!0!}}{\frac{5!}{4!1!}} = \frac{4(1)}{5} = .8\)
4.163 a. Discrete - The number of damaged inventory items is countable.
b. Continuous - The average monthly sales can take on any value within an acceptable limit.
c. Continuous - The number of square feet can take on any positive value.
d. Continuous - The length of time we must wait can take on any positive value.
4.164 a. \(f(x) = \frac{1}{d - c} = \frac{1}{90 - 10} = \frac{1}{80}\) for \(10 \le x \le 90\) and \(f(x) = 0\) otherwise.
b. \(\mu = \frac{c + d}{2} = \frac{10 + 90}{2} = 50\), \(\sigma = \frac{d - c}{\sqrt{12}} = \frac{90 - 10}{\sqrt{12}} = 23.094011\)
c. The interval \(\mu \pm 2\sigma = 50 \pm 2(23.094) = 50 \pm 46.188 = (3.812, 96.188)\) is indicated on the graph.
[Graph: f(x) = 1/80 over 10 ≤ x ≤ 90, with μ − 2σ = 3.812, μ = 50, and μ + 2σ = 96.188 marked]
d. \(P(x \le 60) = (60 - 10)\frac{1}{80} = \frac{50}{80} = \frac{5}{8} = .625\)
e. \(P(x > 90) = 0\)
f. \(P(x \le 80) = (80 - 10)\frac{1}{80} = \frac{70}{80} = \frac{7}{8} = .875\)
g. \(P(\mu - \sigma \le x \le \mu + \sigma) = P(50 - 23.094 \le x \le 50 + 23.094) = P(26.906 \le x \le 73.094) = (73.094 - 26.906)\frac{1}{80} = \frac{46.188}{80} = .577\)
h. \(P(x > 75) = (90 - 75)\frac{1}{80} = \frac{15}{80} = .1875\)
4.165 a. \(P(z \le 2.1) = A_1 + A_2 = .5 + .4821 = .9821\)
b. \(P(z \ge 2.1) = A_2 = .5 - A_1 = .5 - .4821 = .0179\)
c. \(P(z \ge -1.65) = A_1 + A_2 = .4505 + .5000 = .9505\)
d. \(P(-2.13 \le z \le -.41) = P(-2.13 \le z \le 0) - P(-.41 \le z \le 0) = .4834 - .1591 = .3243\)
e. \(P(-1.45 \le z \le 2.15) = A_1 + A_2 = .4265 + .4842 = .9107\)
f. \(P(z \le -1.43) = A_1 = .5 - A_2 = .5000 - .4236 = .0764\)
4.166 a. \(P(z \le z_0) = .5080 \Rightarrow P(0 \le z \le z_0) = .5080 - .5 = .0080\)
Looking up the area .0080 in Table II gives \(z_0 = .02\).
b. \(P(z \ge z_0) = .5517 \Rightarrow P(z_0 \le z \le 0) = .5517 - .5 = .0517\)
Looking up the area .0517 in Table II gives \(z_0 = -.13\).
c. \(P(z \ge z_0) = .1492 \Rightarrow P(0 \le z \le z_0) = .5 - .1492 = .3508\)
Looking up the area .3508 in Table II gives \(z_0 = 1.04\).
d. \(P(z_0 \le z \le .59) = .4773 \Rightarrow P(z_0 \le z \le 0) + P(0 \le z \le .59) = .4773\)
Since \(P(0 \le z \le .59) = .2224\), \(P(z_0 \le z \le 0) = .4773 - .2224 = .2549\).
Looking up the area .2549 in Table II gives \(z_0 = -.69\).
4.167 a. For the probability density function \(f(x) = \frac{e^{-x/7}}{7},\ x > 0\), x is an exponential random variable.
b. For the probability density function \(f(x) = \frac{1}{20},\ 5 \le x \le 25\), x is a uniform random variable.
c. For the probability density function \(f(x) = \frac{e^{-.5[(x - 10)/5]^2}}{5\sqrt{2\pi}}\), x is a normal random variable.
4.168 a. \(P(x \le 1) = 1 - P(x > 1) = 1 - e^{-1/3} = 1 - .716531 = .283469\) (using a calculator)
b. \(P(x > 1) = e^{-1/3} = .716531\)
c. \(P(x = 1) = 0\) (x is a continuous random variable; there is no probability associated with a single point.)
d. \(P(x \le 6) = 1 - P(x > 6) = 1 - e^{-6/3} = 1 - e^{-2} = 1 - .135335 = .864665\) (using a calculator)
e. \(P(2 \le x \le 10) = P(x \ge 2) - P(x > 10) = e^{-2/3} - e^{-10/3} = .513417 - .035674 = .47774\) (using a calculator)
4.169 a. \(P(x \le 80) = P\left(z \le \frac{80 - 75}{10}\right) = P(z \le .5) = .5 + .1915 = .6915\) (Table II, Appendix D)
b. \(P(x \ge 85) = P\left(z \ge \frac{85 - 75}{10}\right) = P(z \ge 1) = .5 - .3413 = .1587\) (Table II, Appendix D)
c. \(P(70 \le x \le 75) = P\left(\frac{70 - 75}{10} \le z \le \frac{75 - 75}{10}\right) = P(-.5 \le z \le 0) = P(0 \le z \le .5) = .1915\) (Table II, Appendix D)
d. \(P(x > 80) = 1 - P(x \le 80) = 1 - .6915 = .3085\) (Refer to part a.)
e. \(P(x = 78) = 0\), since a single point does not have an area.
f. \(P(x \le 110) = P\left(z \le \frac{110 - 75}{10}\right) = P(z \le 3.5) = .5 + .49977 = .99977\) (Table II, Appendix D)
4.170 \(\mu = np = 100(.5) = 50\), \(\sigma = \sqrt{npq} = \sqrt{100(.5)(.5)} = 5\)
a. \(P(x \le 48) \approx P\left(z \le \frac{(48 + .5) - 50}{5}\right) = P(z \le -.30) = .5 - .1179 = .3821\)
b. \(P(50 \le x \le 65) \approx P\left(\frac{(50 - .5) - 50}{5} \le z \le \frac{(65 + .5) - 50}{5}\right) = P(-.10 \le z \le 3.10) = .0398 + .49903 = .53883\)
c. \(P(x \ge 70) \approx P\left(z \ge \frac{(70 - .5) - 50}{5}\right) = P(z \ge 3.90) = .5 - .49995 = .00005\)
d. \(P(55 \le x \le 58) \approx P\left(\frac{(55 - .5) - 50}{5} \le z \le \frac{(58 + .5) - 50}{5}\right) = P(.90 \le z \le 1.70) = P(0 \le z \le 1.70) - P(0 \le z \le .90) = .4554 - .3159 = .1395\)
e. \(P(x = 62) \approx P\left(\frac{(62 - .5) - 50}{5} \le z \le \frac{(62 + .5) - 50}{5}\right) = P(2.30 \le z \le 2.50) = P(0 \le z \le 2.50) - P(0 \le z \le 2.30) = .4938 - .4893 = .0045\)
f. \(P(x \le 49 \text{ or } x \ge 72) \approx P\left(z \le \frac{(49 + .5) - 50}{5}\right) + P\left(z \ge \frac{(72 - .5) - 50}{5}\right) = P(z \le -.10) + P(z \ge 4.30) = (.5 - .0398) + (.5 - .5) = .4602\)
4.171 x is a normal random variable with \(\mu = 40\), \(\sigma^2 = 36\), and \(\sigma = 6\).
a. \(P(x \ge x_0) = .10\). So, A = .5 − .10 = .4000. Looking up the area .4000 in the body of Table II, Appendix D, gives \(z_0 = 1.28\). To find \(x_0\), substitute the values into the z-score formula:
\(z_0 = \frac{x_0 - \mu}{\sigma} \Rightarrow 1.28 = \frac{x_0 - 40}{6} \Rightarrow x_0 = 1.28(6) + 40 = 47.68\)
b. \(P(\mu \le x < x_0) = .40\). Looking up the area .4000 in the body of Table II, Appendix D, gives \(z_0 = 1.28\). To find \(x_0\), substitute the values into the z-score formula:
\(z_0 = \frac{x_0 - \mu}{\sigma} \Rightarrow 1.28 = \frac{x_0 - 40}{6} \Rightarrow x_0 = 1.28(6) + 40 = 47.68\)
c. \(P(x < x_0) = .05\). So, A = .5000 − .0500 = .4500. Looking up the area .4500 in the body of Table II, Appendix D, gives \(z_0 = -1.645\). (.45 is halfway between .4495 and .4505; therefore, we average the z-scores: \(\frac{1.64 + 1.65}{2} = 1.645\).) \(z_0\) is negative because \(x_0\) lies to the left of the mean.
To find \(x_0\), substitute the values into the z-score formula:
\(z_0 = \frac{x_0 - \mu}{\sigma} \Rightarrow -1.645 = \frac{x_0 - 40}{6} \Rightarrow x_0 = -1.645(6) + 40 = 30.13\)
d. \(P(x > x_0) = .40\). So, A = .5000 − .4000 = .1000. Looking up the area .1000 in the body of Table II, Appendix D, gives \(z_0 = .25\). To find \(x_0\), substitute the values into the z-score formula:
\(z_0 = \frac{x_0 - \mu}{\sigma} \Rightarrow .25 = \frac{x_0 - 40}{6} \Rightarrow x_0 = .25(6) + 40 = 41.5\)
e. \(P(x_0 \le x < \mu) = .45\). Looking up the area .4500 in the body of Table II, Appendix D, gives \(z_0 = -1.645\). (.45 is halfway between .4495 and .4505; therefore, we average the z-scores: \(\frac{1.64 + 1.65}{2} = 1.645\).) \(z_0\) is negative because \(x_0\) lies to the left of the mean.
To find \(x_0\), substitute the values into the z-score formula:
\(z_0 = \frac{x_0 - \mu}{\sigma} \Rightarrow -1.645 = \frac{x_0 - 40}{6} \Rightarrow x_0 = -1.645(6) + 40 = 30.13\)
4.172 a. We will check the 5 characteristics of a binomial random variable.
1. The experiment consists of n = 5 identical trials. We have to assume that the number of bottled water brands is large.
2. There are only 2 possible outcomes for each trial. Let S = brand of bottled water used tap water and F = brand of bottled water did not use tap water.
3. The probability of success (S) is the same from trial to trial. For each trial, p = P(S) = .25 and q = 1 − p = 1 − .25 = .75.
4. The trials are independent.
5. The binomial random variable x is the number of brands in the 5 trials that used tap water.
If the total number of brands of bottled water is large, then the above characteristics will be essentially true. Thus, x is a binomial random variable.
b. The formula for the probability distribution for x is \(p(x) = \binom{5}{x}(.25)^x(.75)^{5 - x}\), for x = 0, 1, 2, 3, 4, 5.
c. \(P(x = 2) = \binom{5}{2}(.25)^2(.75)^{5 - 2} = \frac{5!}{2!3!}(.25)^2(.75)^3 = .2637\)
d. \(P(x \le 1) = P(x = 0) + P(x = 1) = \binom{5}{0}(.25)^0(.75)^{5} + \binom{5}{1}(.25)^1(.75)^{4} = \frac{5!}{0!5!}(.25)^0(.75)^5 + \frac{5!}{1!4!}(.25)^1(.75)^4 = .2373 + .3955 = .6328\)
e. \(E(x) = np = 65(.25) = 16.25\), \(\sigma = \sqrt{npq} = \sqrt{65(.25)(.75)} = \sqrt{12.1875} = 3.49\)
To see whether the normal approximation is appropriate, we use:
\(\mu \pm 3\sigma = 16.25 \pm 3(3.49) = 16.25 \pm 10.47 = (5.78, 26.72)\)
Since this interval lies in the range from 0 to 65, the normal approximation is appropriate.
\(P(x \ge 20) \approx P\left(z \ge \frac{(20 - .5) - 16.25}{3.49}\right) = P(z \ge .93) = .5 - P(0 \le z \le .93) = .5 - .3238 = .1762\) (Using Table II, Appendix D)
Since this probability is not small, it would not be unusual for 20 or more of the brands to contain tap water.
4.173 a. \(\sum_{i=1}^{6} p(x_i) = p(0) + p(1) + p(2) + p(3) + p(4) + p(5) = .0102 + .0768 + .2304 + .3456 + .2592 + .0778 = 1.0000\)
b. \(P(x = 4) = .2592\)
c. \(P(x < 2) = P(x = 0) + P(x = 1) = .0102 + .0768 = .0870\)
d. \(P(x \ge 3) = P(x = 3) + P(x = 4) + P(x = 5) = .3456 + .2592 + .0778 = .6826\)
e. \(E(x) = \sum_{i=1}^{6} x_i p(x_i) = 0(.0102) + 1(.0768) + 2(.2304) + 3(.3456) + 4(.2592) + 5(.0778) = 0 + .0768 + .4608 + 1.0368 + 1.0368 + .3890 = 3.0002\)
On average, 3 out of every 5 dentists will use nitrous oxide.
4.174 Let x = transmission delay. The random variable x has a normal distribution with \(\mu = 48.5\) and \(\sigma = 8.5\). Using Table II, Appendix D:
a. \(P(x < 57) = P\left(z < \frac{57 - 48.5}{8.5}\right) = P(z < 1.00) = .5 + P(0 < z < 1) = .5 + .3413 = .8413\)
b. \(P(40 < x < 60) = P\left(\frac{40 - 48.5}{8.5} < z < \frac{60 - 48.5}{8.5}\right) = P(-1 < z < 1.35) = P(-1 < z < 0) + P(0 < z < 1.35) = .3413 + .4115 = .7528\)
4.175 a. For this problem, c = 0 and d = 1.
\(f(x) = \frac{1}{d - c} = \frac{1}{1 - 0} = 1\) for \(0 \le x \le 1\) and \(f(x) = 0\) otherwise.
\(\mu = \frac{c + d}{2} = \frac{0 + 1}{2} = .5\), \(\sigma^2 = \frac{(d - c)^2}{12} = \frac{(1 - 0)^2}{12} = \frac{1}{12} = .0833\), \(\sigma = \sqrt{.0833} = .289\)
b. \(P(.2 < x < .4) = (.4 - .2)(1) = .2\)
c. \(P(x > .995) = (1 - .995)(1) = .005\). Since the probability of observing a trajectory greater than .995 is so small, we would not expect to see a trajectory exceeding .995.
4.176 a. Using MINITAB with \(\lambda = 1.2\):
Probability Density Function
Poisson with mean = 1.2
x    P( X = x )
0    0.301194
\(P(x = 0) = .301194\)
b. Using MINITAB with \(\lambda = 1.2\):
Cumulative Distribution Function
Poisson with mean = 1.2
x    P( X <= x )
1    0.662627
\(P(x \ge 2) = 1 - P(x \le 1) = 1 - .662627 = .337373\)
4.177 Let x = interarrival time between patients. Then x is an exponential random variable with a mean of 4 minutes.
a. \(P(x < 1) = 1 - P(x \ge 1) = 1 - e^{-1/4} = 1 - e^{-.25} = 1 - .778801 = .221199\)
b. Assuming that the interarrival times are independent:
P(next 4 interarrival times are all less than 1 minute) \(= [P(x < 1)]^4 = .221199^4 = .002394\)
c. \(P(x > 10) = e^{-10/4} = e^{-2.5} = .082085\)
4.178 Let x = number of trees infected with Dutch elm disease among the two trees purchased. For this problem, x is a hypergeometric random variable with N = 10, n = 2, and r = 3.
a. The probability that both trees will be healthy is:
\(P(x = 0) = \frac{\binom{3}{0}\binom{10 - 3}{2 - 0}}{\binom{10}{2}} = \frac{\frac{3!}{0!3!} \cdot \frac{7!}{2!5!}}{\frac{10!}{2!8!}} = \frac{1(21)}{45} = .467\)
b. The probability that at least one tree will be infected is:
\(P(x \ge 1) = 1 - P(x = 0) = 1 - .467 = .533\)
4.179 a. We will check the 5 characteristics of a binomial random variable.
1. The experiment consists of n = 20 identical trials.
2. There are only 2 possible outcomes for each trial. Let S = intruding object is detected and F = intruding object is not detected.
3. The probability of success (S) is the same from trial to trial. For each trial, p = P(S) = .8 and q = 1 − p = 1 − .8 = .2.
4. The trials are independent.
5. The binomial random variable x is the number of intruding objects in the 20 trials that are detected.
Thus, x is a binomial random variable.
b. For this experiment, n = 20 and p = .8.
c. Using Table I, Appendix D, with n = 20 and p = .8:
\(P(x = 15) = P(x \le 15) - P(x \le 14) = .370 - .196 = .174\)
d. Using Table I, Appendix D, with n = 20 and p = .8:
\(P(x \ge 15) = 1 - P(x \le 14) = 1 - .196 = .804\)
e. \(E(x) = np = 20(.8) = 16\). For every 20 intruding objects, SBIRS will detect an average of 16.
4.180 Let x = demand for white bread. Then x is a normal random variable with \(\mu = 7200\) and \(\sigma = 300\).
a. \(P(x \le x_0) = .94\). Find \(x_0\).
\(P(x \le x_0) = P\left(z \le \frac{x_0 - 7200}{300}\right) = P(z \le z_0) = .94\)
\(A_1 = .94 - .50 = .4400\). Using Table II and area .4400, \(z_0 = 1.555\).
\(z_0 = \frac{x_0 - 7200}{300} \Rightarrow 1.555 = \frac{x_0 - 7200}{300} \Rightarrow x_0 = 7666.5 \approx 7667\)
b. If the company produces 7,667 loaves, the company will be left with more than 500 loaves if the demand is less than 7,667 − 500 = 7,167.
\(P(x < 7167) = P\left(z < \frac{7167 - 7200}{300}\right) = P(z < -.11) = .5 - .0438 = .4562\) (from Table II, Appendix D)
Thus, on 45.62% of the days the company will be left with more than 500 loaves.
4.181 a. Let \(x_1\) = repair time for machine 1. Then \(x_1\) has an exponential distribution with \(\theta_1 = 1\) hour.
\(P(x_1 > 1) = e^{-1/1} = e^{-1} = .367879\) (using a calculator)
b. Let \(x_2\) = repair time for machine 2. Then \(x_2\) has an exponential distribution with \(\theta_2 = 2\) hours.
\(P(x_2 > 1) = e^{-1/2} = e^{-.5} = .606531\) (using a calculator)
c. Let \(x_3\) = repair time for machine 3.
Then \(x_3\) has an exponential distribution with \(\theta_3 = .5\) hours.
\(P(x_3 > 1) = e^{-1/.5} = e^{-2} = .135335\) (using a calculator)
Since the mean repair time for machine 4 is the same as for machine 3, \(P(x_4 > 1) = P(x_3 > 1) = .135335\).
d. The only way that the repair time for the entire system will not exceed 1 hour is if all four machines are repaired within 1 hour. Assuming the four repair times are independent, the probability that the repair time for the entire system exceeds 1 hour is:
P(repair time for entire system exceeds 1 hour) \(= 1 - P[(x_1 \le 1) \cap (x_2 \le 1) \cap (x_3 \le 1) \cap (x_4 \le 1)] = 1 - P(x_1 \le 1)P(x_2 \le 1)P(x_3 \le 1)P(x_4 \le 1)\)
\(= 1 - (1 - .367879)(1 - .606531)(1 - .135335)(1 - .135335) = 1 - (.632121)(.393469)(.864665)(.864665) = 1 - .185954 = .814046\)
4.182 a. In order for the number of deaths to follow a Poisson distribution, we must assume that the probability of a death is the same for any week. We must also assume that the number of deaths in any week is independent of the number of deaths in any other week. The first assumption may not be valid, since the probability of a death may not be the same for every week. The number of passengers varies from week to week, so the probability of a death may change. Also, factors such as weather, which varies from week to week, may increase or decrease the chance of derailment.
b. \(E(x) = \lambda = 16\) and \(\sigma = \sqrt{\lambda} = \sqrt{16} = 4\)
c. The z-score corresponding to x = 4 is \(z = \frac{4 - 16}{4} = -3\). Since x = 4 lies 3 standard deviations below the mean, it would be very unlikely that only 4 or fewer deaths occur next week.
d. Using MINITAB with \(\lambda = 16\), we get the following probability:
Cumulative Distribution Function
Poisson with mean = 16
x    P( X <= x )
4    0.0004004
\(P(x \le 4) = .0004\)
This probability is consistent with the answer in part c: the probability of 4 or fewer deaths is essentially zero, so such an outcome would be very unlikely.
4.183 a. For N = 209, r = 10, and n = 8, \(E(x) = \frac{nr}{N} = \frac{8(10)}{209} = .383\)
b. \(P(x = 4) = \frac{\binom{10}{4}\binom{209 - 10}{8 - 4}}{\binom{209}{8}} = \frac{\binom{10}{4}\binom{199}{4}}{\binom{209}{8}} = .0002\)
4.184 To construct a relative frequency histogram for the data, we can use 7 measurement classes.
\(\text{Interval width} = \frac{\text{Largest number} - \text{Smallest number}}{\text{Number of classes}} = \frac{98.0716 - .7434}{7} = 13.9\)
We will use an interval width of 14 and a starting value of .74335. The measurement classes, frequencies, and relative frequencies are given in the table.
Class   Measurement Class        Class Frequency   Class Relative Frequency
1       .74335 to 14.74335       6                 6/40 = .150
2       14.74335 to 28.74335     4                 .100
3       28.74335 to 42.74335     6                 .150
4       42.74335 to 56.74335     6                 .150
5       56.74335 to 70.74335     5                 .125
6       70.74335 to 84.74335     4                 .100
7       84.74335 to 98.74335     9                 .225
Total                            40                1.000
The histogram looks like the data could be from a uniform distribution. The last class (84.74335 to 98.74335) has a few more observations in it than we would expect. However, we cannot expect a perfect graph from a sample of only 40 observations.
[Figure: Histogram of Class; relative frequency (0 to .25) versus class boundaries .74335 through 98.74335]
4.185 Using MINITAB, the histogram with the normal distribution overlaid is:
[Figure: MINITAB histogram of Bankruptcy with fitted normal curve; Mean = 2.549, StDev = 1.828, N = 49]
The data are skewed to the right and do not appear to be normally distributed.
Using MINITAB, the descriptive statistics are:
Descriptive Statistics: Bankruptcy
Variable     N   Mean   StDev  Minimum  Q1     Median  Q3     Maximum
Bankruptcy   49  2.549  1.828  1.000    1.350  1.700   3.500  10.100
\(\bar{x} \pm s = 2.549 \pm 1.828 = (0.721, 4.377)\)
\(\bar{x} \pm 2s = 2.549 \pm 2(1.828) = 2.549 \pm 3.656 = (-1.107, 6.205)\)
\(\bar{x} \pm 3s = 2.549 \pm 3(1.828) = 2.549 \pm 5.484 = (-2.935, 8.033)\)
Of the 49 measurements, 44 are in the interval (0.721, 4.377). The proportion is 44/49 = .898. This is much larger than the proportion (.68) stated by the Empirical Rule.
Of the 49 measurements, 47 are in the interval (−1.107, 6.205). The proportion is 47/49 = .959. This is close to the proportion (.95) stated by the Empirical Rule.
Of the 49 measurements, 48 are in the interval (−2.935, 8.033). The proportion is 48/49 = .980. This is smaller than the proportion (1.00) stated by the Empirical Rule. This would imply that the data are not normal.
\(IQR = Q_U - Q_L = 3.500 - 1.350 = 2.15\), and \(\frac{IQR}{s} = \frac{2.15}{1.828} = 1.176\). If the data are normally distributed, this ratio should be close to 1.3. Since 1.176 is smaller than 1.3, this indicates that the data may not be normal.
Using MINITAB, the normal probability plot is:
[Figure: MINITAB normal probability plot of Bankruptcy (Normal, 95% CI); Mean = 2.549, StDev = 1.828, N = 49, AD = 3.160, P-Value < 0.005]
Since this plot is not a straight line, the data are not normal.
All four checks indicate that the data are not normal.
4.186 Using MINITAB, the histogram with the normal distribution overlaid is:
[Figure: MINITAB histogram of PENALTY with fitted normal curve; Mean = 132309, StDev = 249632, N = 38]
The data are skewed to the right and do not appear to be normally distributed.
Using MINITAB, the descriptive statistics are:
Descriptive Statistics: PENALTY
Variable  N   Mean    StDev   Minimum  Q1     Median  Q3      Maximum
PENALTY   38  132309  249632  2500     20000  35000   101250  1000000
\(\bar{x} \pm s = 132{,}309 \pm 249{,}632 = (-117{,}323,\ 381{,}941)\)
\(\bar{x} \pm 2s = 132{,}309 \pm 2(249{,}632) = 132{,}309 \pm 499{,}264 = (-366{,}955,\ 631{,}573)\)
\(\bar{x} \pm 3s = 132{,}309 \pm 3(249{,}632) = 132{,}309 \pm 748{,}896 = (-616{,}587,\ 881{,}205)\)
Of the 38 measurements, 34 are in the interval (−117,323, 381,941). The proportion is 34/38 = .895. This is much larger than the proportion (.68) stated by the Empirical Rule.
Of the 38 measurements, 35 are in the interval (−366,955, 631,573). The proportion is 35/38 = .921. This is smaller than the proportion (.95) stated by the Empirical Rule.
Of the 38 measurements, 36 are in the interval (−616,587, 881,205). The proportion is 36/38 = .947.
This is much smaller than the proportion (1.00) stated by the Empirical Rule. This would imply that the data are not normal.
\(IQR = Q_U - Q_L = 101{,}250 - 20{,}000 = 81{,}250\), and \(\frac{IQR}{s} = \frac{81{,}250}{249{,}632} = .33\). If the data are normally distributed, this ratio should be close to 1.3. Since .33 is much smaller than 1.3, this indicates that the data are not normal.
Using MINITAB, the normal probability plot is:
[Figure: MINITAB normal probability plot of PENALTY (Normal, 95% CI); Mean = 132309, StDev = 249632, N = 38, AD = 7.602, P-Value < 0.005]
Since this plot is not a straight line, the data are not normal.
All four checks indicate that the data are not normal.
4.187 Let x equal the difference between the actual weight and the recorded weight (the error of measurement). The random variable x is normally distributed with \(\mu = 592\) and \(\sigma = 628\).
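As a cross-check on the table-based answers for this exercise, the normal model for the measurement error (μ = 592, σ = 628) can be evaluated directly with Python's standard-library `statistics.NormalDist` (Python 3.8+). This is a sketch added for illustration and is not part of the original solution. Table II rounds z to two decimals, so the exact values differ slightly: for example, P(x > 0) comes out to about .827 rather than .8264. The helper name `mean_for_understate` is hypothetical.

```python
from statistics import NormalDist

SIGMA = 628.0
error = NormalDist(mu=592.0, sigma=SIGMA)   # x = actual weight - recorded weight

p_under = 1 - error.cdf(0)      # weight understated  <=>  x > 0
p_over = error.cdf(0)           # weight overstated   <=>  x < 0
p_400 = 1 - error.cdf(400)      # P(x > 400)


def mean_for_understate(target: float, sigma: float = SIGMA) -> float:
    """Mean error mu that makes P(x > 0) equal target.

    P(x > 0) = P(z > -mu/sigma) = target  =>  -mu/sigma = z_(1 - target).
    """
    z = NormalDist().inv_cdf(1 - target)
    return 0.0 - sigma * z       # "0.0 -" returns +0.0 rather than -0.0 when z is 0


print(f"P(understate) = {p_under:.4f}, expected overstatements in 100 = {100 * p_over:.1f}")
print(f"P(x > 400) = {p_400:.4f}")
print(f"mu for P(understate) = .5: {mean_for_understate(0.5):.1f}")
print(f"mu for P(understate) = .4: {mean_for_understate(0.4):.1f}")
```

With the exact percentile \(z_{.60} \approx .2533\), the mean that gives P(understate) = .4 is about −159. The table value \(z_0 = .25\) gives −157, and the difference is due only to rounding z.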
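a. We want to find the probability that the weigh-in-motion equipment understates the actual weight of the truck. This would be true if the error of measurement is positive.
\(P(x > 0) = P\left(z > \frac{0 - 592}{628}\right) = P(z > -.94) = .5000 + .3264 = .8264\)
b. P(overstate the weight) = 1 − P(understate the weight) = 1 − .8264 = .1736 (Refer to part a.)
For 100 measurements, the weight would be overstated approximately 100(.1736) = 17.36, or about 17, times.
c. \(P(x > 400) = P\left(z > \frac{400 - 592}{628}\right) = P(z > -.31) = .5000 + .1217 = .6217\)
d. We want P(understate the weight) = .5. To understate the weight, x > 0. Thus, we want to find \(\mu\) so that \(P(x > 0) = .5\):
\(P(x > 0) = P\left(z > \frac{0 - \mu}{628}\right) = .5\)
From Table II, Appendix D, \(z_0 = 0\). To find \(\mu\), substitute into the z-score formula:
\(z_0 = \frac{x_0 - \mu}{\sigma} \Rightarrow 0 = \frac{0 - \mu}{628} \Rightarrow \mu = 0\)
Thus, the mean error should be set at 0.
We want P(understate the weight) = .4. To understate the weight, x > 0. Thus, we want to find \(\mu\) so that \(P(x > 0) = .4\). Then A = .5 − .40 = .1. Looking up the area .1000 in the body of Table II, Appendix D, gives \(z_0 = .25\). To find \(\mu\), substitute into the z-score formula:
\(z_0 = \frac{x_0 - \mu}{\sigma} \Rightarrow .25 = \frac{0 - \mu}{628} \Rightarrow \mu = -.25(628) = -157\)
4.188 Let x = number of packets observed by a network sensor in 150 trials. Then x has an approximate binomial distribution with n = 150 and p = .001. The virus will be detected if at least 1 packet is observed.
\(P(x \ge 1) = 1 - P(x = 0) = 1 - \binom{150}{0}(.001)^0(.999)^{150 - 0} = 1 - \frac{150!}{0!150!}(.999)^{150} = 1 - .8606 = .1394\)
4.189 a. \(\mu = np = 25(.05) = 1.25\), \(\sigma = \sqrt{npq} = \sqrt{25(.05)(.95)} = 1.09\)
Since \(\mu\) is not an integer, x could not equal its expected value.
b. The event is \((x \ge 5)\). From Table I with n = 25 and p = .05:
\(P(x \ge 5) = 1 - P(x \le 4) = 1 - .993 = .007\)
c. Since the probability obtained in part b is so small, it is unlikely that 5% applies to this agency. The percentage is probably greater than 5%.
4.190 a. Let x = crop yield. The random variable x has a normal distribution with \(\mu = 1{,}500\) and \(\sigma = 250\).
\(P(x < 1{,}600) = P\left(z < \frac{1{,}600 - 1{,}500}{250}\right) = P(z < .4) = .5 + .1554 = .6554\) (Using Table II, Appendix D)
b. Let \(x_1\) = crop yield in the first year and \(x_2\) = crop yield in the second year. If \(x_1\) and \(x_2\) are independent, then the probability that the farm will lose money for two straight years is:
\(P(x_1 < 1{,}600)P(x_2 < 1{,}600) = P\left(z_1 < \frac{1{,}600 - 1{,}500}{250}\right)P\left(z_2 < \frac{1{,}600 - 1{,}500}{250}\right) = P(z_1 < .4)P(z_2 < .4) = (.5 + .1554)(.5 + .1554) = (.6554)(.6554) = .4295\) (Using Table II, Appendix D)
c. \(P(\mu - 2\sigma \le x \le \mu + 2\sigma) = P(1{,}500 - 2\sigma \le x \le 1{,}500 + 2\sigma) = P\left(\frac{[1{,}500 - 2\sigma] - 1{,}500}{\sigma} \le z \le \frac{[1{,}500 + 2\sigma] - 1{,}500}{\sigma}\right) = P(-2 \le z \le 2) = 2P(0 \le z \le 2) = 2(.4772) = .9544\) (Using Table II, Appendix D)
4.191 Let x = number of grants awarded to the north side in 140 trials. The random variable x has a hypergeometric distribution with N = 743, n = 140, and r = 601.
a. \(E(x) = \frac{nr}{N} = \frac{140(601)}{743} = 113.24\)
\(\sigma^2 = \frac{r(N - r)n(N - n)}{N^2(N - 1)} = \frac{601(743 - 601)140(743 - 140)}{743^2(743 - 1)} = 17.5884\), \(\sigma = \sqrt{17.5884} = 4.194\)
b. If the grants were awarded at random, we would expect approximately 113 to be awarded to the north side. We observed 140.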
The z-score associated with 140 is:
\(z = \frac{x - \mu}{\sigma} = \frac{140 - 113.24}{4.194} = 6.38\)
Because this z-score is so large, it would be extremely unlikely to observe all 140 grants going to the north side if they were randomly selected. Thus, we would conclude that the grants were not randomly selected.
4.192 Let x = length of time a bus is late. Then x is a uniform random variable with probability distribution \(f(x) = \frac{1}{20}\) for \(0 \le x \le 20\) and \(f(x) = 0\) otherwise.
a. \(\mu = \frac{c + d}{2} = \frac{0 + 20}{2} = 10\)
b. \(P(x > 19) = (20 - 19)\frac{1}{20} = \frac{1}{20} = .05\)
c. It would be doubtful that the director's claim is true, since the probability of a bus being more than 19 minutes late is so small.
4.193 a. The properties of valid probability distributions are \(\sum p(x) = 1\) and \(0 \le p(x) \le 1\) for all x.
For Arc a1: \(0 \le p(x) \le 1\) for all x and \(\sum p(x) = .05 + .10 + .25 + .60 = 1.00\). Thus, this is a valid probability distribution.
For Arc a2: \(0 \le p(x) \le 1\) for all x and \(\sum p(x) = .10 + .30 + .60 + 0 = 1.00\). Thus, this is a valid probability distribution.
For Arc a3: \(0 \le p(x) \le 1\) for all x and \(\sum p(x) = .05 + .25 + .70 + 0 = 1.00\). Thus, this is a valid probability distribution.
For Arc a4: \(0 \le p(x) \le 1\) for all x and \(\sum p(x) = .90 + .10 + 0 + 0 = 1.00\). Thus, this is a valid probability distribution.
b. For Arc a1, \(P(x > 1) = P(x = 2) + P(x = 3) = .25 + .6 = .85\)
c. For Arc a2, \(P(x > 1) = P(x = 2) = .60\)
For Arc a3, \(P(x > 1) = P(x = 2) = .70\)
For Arc a4, \(P(x > 1) = 0\)
d. For Arc a1, \(E(x) = \sum x p(x) = 0(.05) + 1(.10) + 2(.25) + 3(.60) = 0 + .10 + .50 + 1.80 = 2.40\). The average capacity of Arc a1 is 2.40.
For Arc a2, \(E(x) = \sum x p(x) = 0(.10) + 1(.30) + 2(.60) = 0 + .30 + 1.20 = 1.50\). The average capacity of Arc a2 is 1.50.
For Arc a3, \(E(x) = \sum x p(x) = 0(.05) + 1(.25) + 2(.70) = 0 + .25 + 1.40 = 1.65\). The average capacity of Arc a3 is 1.65.
For Arc a4, \(E(x) = \sum x p(x) = 0(.90) + 1(.10) = 0 + .10 = .10\). The average capacity of Arc a4 is 0.10.
e. \(\sigma^2 = E[(x - \mu)^2] = \sum (x - \mu)^2 p(x)\)
For Arc a1:
\(\sigma^2 = (0 - 2.4)^2(.05) + (1 - 2.4)^2(.10) + (2 - 2.4)^2(.25) + (3 - 2.4)^2(.60) = (-2.4)^2(.05) + (-1.4)^2(.10) + (-.4)^2(.25) + (.6)^2(.60) = .288 + .196 + .04 + .216 = .74\), \(\sigma = \sqrt{.74} = .86\)
We would expect most observations to fall within 2 standard deviations of the mean, or \(\mu \pm 2\sigma = 2.40 \pm 2(.86) = 2.40 \pm 1.72 = (.68, 4.12)\).
For Arc a2:
\(\sigma^2 = (0 - 1.5)^2(.10) + (1 - 1.5)^2(.30) + (2 - 1.5)^2(.60) = (-1.5)^2(.10) + (-.5)^2(.30) + (.5)^2(.60) = .225 + .075 + .15 = .45\), \(\sigma = \sqrt{.45} = .67\)
We would expect most observations to fall within 2 standard deviations of the mean, or \(\mu \pm 2\sigma = 1.50 \pm 2(.67) = 1.50 \pm 1.34 = (.16, 2.84)\).
For Arc a3:
\(\sigma^2 = (0 - 1.65)^2(.05) + (1 - 1.65)^2(.25) + (2 - 1.65)^2(.70) = (-1.65)^2(.05) + (-.65)^2(.25) + (.35)^2(.70) = .136125 + .105625 + .08575 = .3275\), \(\sigma = \sqrt{.3275} = .57\)
We would expect most observations to fall within 2 standard deviations of the mean, or \(\mu \pm 2\sigma = 1.65 \pm 2(.57) = 1.65 \pm 1.14 = (.51, 2.79)\).
For Arc a4:
\(\sigma^2 = (0 - .1)^2(.90) + (1 - .1)^2(.10) = (-.1)^2(.90) + (.9)^2(.10) = .009 + .081 = .090\), \(\sigma = \sqrt{.09} = .30\)
We would expect most observations to fall within 2 standard deviations of the mean, or \(\mu \pm 2\sigma = .10 \pm 2(.30) = .10 \pm .60 = (-.50, .70)\).
4.194 Let x = number of doctors who refuse ethics consultation in n = 10 trials. From Exercise 2.11, we can estimate p with \(\hat{p} = .195\). Then x will be a binomial random variable with n = 10 and p = .195. Using MINITAB with n = 10 and p = .195, the probability is:
Cumulative Distribution Function
Binomial with n = 10 and p = 0.195
x    P( X <= x )
1    0.391097
\(P(x \ge 2) = 1 - P(x \le 1) = 1 - .391097 = .608903\)
4.195 a. Using MINITAB with \(\lambda = 5\):
Cumulative Distribution Function
Poisson with mean = 5
x    P( X <= x )
2    0.124652
\(P(x < 3) = P(x \le 2) = .125\)
b. \(E(x) = \lambda = 5\). The average number of calls blocked during the peak hour of video conferencing call time is 5.
4.196 Let x = number of spoiled bottles in the sample of 3. Since the sampling will be done without replacement, x is a hypergeometric random variable with N = 12, n = 3, and r = 1.
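Under this hypergeometric model, the probability \(p(x) = \binom{r}{x}\binom{N - r}{n - x} / \binom{N}{n}\) can be checked with a small Python sketch built on `math.comb` (Python 3.8+). The function name `hypergeom_pmf` is hypothetical. The same helper also reproduces the hand calculations in Exercises 4.162 and 4.178.

```python
from math import comb


def hypergeom_pmf(x: int, N: int, n: int, r: int) -> float:
    """P(x successes) when n items are drawn without replacement from N items, r of which are successes."""
    if not (0 <= x <= n and x <= r and n - x <= N - r):
        return 0.0                      # impossible outcome
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)


# Exercise 4.196: N = 12 bottles, n = 3 sampled, r = 1 spoiled
print(hypergeom_pmf(1, N=12, n=3, r=1))              # 0.25
# Exercise 4.178: both purchased trees healthy (x = 0), N = 10, n = 2, r = 3
print(round(hypergeom_pmf(0, N=10, n=2, r=3), 3))    # 0.467
```

The guard returns 0 for outcomes that cannot occur, such as drawing 2 spoiled bottles when only one exists.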
\(P(x = 1) = \frac{\binom{1}{1}\binom{12 - 1}{3 - 1}}{\binom{12}{3}} = \frac{\frac{1!}{1!0!} \cdot \frac{11!}{2!9!}}{\frac{12!}{3!9!}} = \frac{1(55)}{220} = .25\)
4.197 Let x = number of defective CDs in n = 1,600 trials. Then x is a binomial random variable with n = 1,600 and p = .006.
\(\mu = E(x) = np = 1{,}600(.006) = 9.6\), \(\sigma^2 = npq = 1{,}600(.006)(.994) = 9.5424\), \(\sigma = \sqrt{9.5424} = 3.089\)
To see whether the normal approximation is appropriate, we use:
\(\mu \pm 3\sigma = 9.6 \pm 3(3.089) = 9.6 \pm 9.267 = (0.333, 18.867)\)
Since the interval lies in the range of 0 to 1,600, the normal approximation is appropriate.
\(P(x \ge 12) \approx P\left(z \ge \frac{11.5 - 9.6}{3.089}\right) = P(z \ge 0.62) = .5 - .2324 = .2676\) (Using Table II, Appendix D)
Since this probability is fairly large, it would not be unusual to see 12 or more defectives in a sample of 1,600 if 99.4% were defect-free. Thus, there would be no evidence to cast doubt on the manufacturer's claim.
4.198 a. If a large number of measurements are observed, then the relative frequencies should be very good estimators of the probabilities.
b. \(\mu = E(x) = \sum x p(x) = 1(.01) + 2(.04) + 3(.04) + 4(.08) + 5(.10) + 6(.15) + 7(.25) + 8(.20) + 9(.08) + 10(.05)\)
\(= .01 + .08 + .12 + .32 + .50 + .90 + 1.75 + 1.60 + .72 + .50 = 6.50\)
The average number of checkout lanes per store is 6.5.
c. \(\sigma^2 = \sum (x - \mu)^2 p(x) = (1 - 6.5)^2(.01) + (2 - 6.5)^2(.04) + (3 - 6.5)^2(.04) + (4 - 6.5)^2(.08) + (5 - 6.5)^2(.10) + (6 - 6.5)^2(.15) + (7 - 6.5)^2(.25) + (8 - 6.5)^2(.20) + (9 - 6.5)^2(.08) + (10 - 6.5)^2(.05)\)
\(= .3025 + .8100 + .4900 + .5000 + .2250 + .0375 + .0625 + .4500 + .5000 + .6125 = 3.99\)
\(\sigma = \sqrt{3.99} = 1.9975\)
d. Chebyshev's Rule says that at least 0 of the observations should fall in the interval \(\mu \pm \sigma\), and at least 75% of the observations should fall in the interval \(\mu \pm 2\sigma\).
e. \(\mu \pm \sigma = 6.5 \pm 1.9975 = (4.5025, 8.4975)\)
\(P(4.5025 < x < 8.4975) = .10 + .15 + .25 + .20 = .70\). This is at least 0.
\(\mu \pm 2\sigma = 6.5 \pm 2(1.9975) = 6.5 \pm 3.995 = (2.505, 10.495)\)
\(P(2.505 < x < 10.495) = .04 + .08 + .10 + .15 + .25 + .20 + .08 + .05 = .95\). This is at least .75, or 75%.
4.199 a. The contract will be profitable if the total cost, x, is less than $1,000,000.
\(P(x < 1{,}000{,}000) = P\left(z < \frac{1{,}000{,}000 - 850{,}000}{170{,}000}\right) = P(z < .88) = .5 + .3106 = .8106\)
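The normal-model answers in this exercise can be cross-checked with Python's `statistics.NormalDist`. This sketch is not part of the original solution. It uses exact z-values instead of the rounded Table II values (.88 and 2.33), so it gives roughly .811 for part a and about $1,245,479 for the 99th percentile R in part c. These are close to the table-based .8106 and $1,246,100.

```python
from statistics import NormalDist

# Total contract cost x, normal with mu = $850,000 and sigma = $170,000
cost = NormalDist(mu=850_000, sigma=170_000)

p_profit = cost.cdf(1_000_000)   # a. P(x < 1,000,000): contract is profitable
p_loss = 1 - p_profit            # b. P(x > 1,000,000): contract loses money
R = cost.inv_cdf(0.99)           # c. value R with P(x <= R) = .99

print(f"P(profit) = {p_profit:.4f}")
print(f"P(loss)   = {p_loss:.4f}")
print(f"R         = ${R:,.0f}")
```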
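b. The contract will result in a loss if total cost, x, exceeds $1,000,000.
\( P(x > 1{,}000{,}000) = 1 - P(x \le 1{,}000{,}000) = 1 - .8106 = .1894 \)
c. We need the value R such that \( P(x < R) = .99 \).
\( P(x < R) = P\left(z < \dfrac{R - 850{,}000}{170{,}000}\right) = P(z < z_0) = .99 \)
\( A_1 = .99 - .5 = .4900 \)
Looking up the area .4900 in Table II, \( z_0 = 2.33 \).
\( z_0 = \dfrac{R - 850{,}000}{170{,}000} \Rightarrow 2.33 = \dfrac{R - 850{,}000}{170{,}000} \Rightarrow R = 2.33(170{,}000) + 850{,}000 = \$1{,}246{,}100 \)

4.200 a. For θ = 17: to graph the distribution, we pick several values of x and find the value of f(x), where x = time between arrivals of the smaller craft at the pier.
\( f(x) = \dfrac{1}{\theta} e^{-x/\theta} = \dfrac{1}{17} e^{-x/17} \)
\( f(1) = \tfrac{1}{17} e^{-1/17} = .0555 \qquad f(3) = \tfrac{1}{17} e^{-3/17} = .0494 \qquad f(5) = \tfrac{1}{17} e^{-5/17} = .0438 \)
\( f(7) = \tfrac{1}{17} e^{-7/17} = .0390 \qquad f(10) = \tfrac{1}{17} e^{-10/17} = .0327 \qquad f(15) = \tfrac{1}{17} e^{-15/17} = .0243 \)
\( f(20) = \tfrac{1}{17} e^{-20/17} = .0181 \qquad f(25) = \tfrac{1}{17} e^{-25/17} = .0135 \)
Using MINITAB, the graph is:
[Figure: f(x) plotted against x, for x from 0 to 25]
b. We want to find the probability that the time between arrivals is less than 15 minutes.
\( P(x < 15) = 1 - P(x \ge 15) = 1 - e^{-15/17} = 1 - .4138 = .5862 \)

4.201 We know from the Empirical Rule that almost all the observations are larger than \( \mu - 2\sigma \) (about 95% lie between \( \mu - 2\sigma \) and \( \mu + 2\sigma \)). Thus, we need \( \mu - 2\sigma \ge 100 \).
For the binomial, \( \mu = np = n(.4) \) and \( \sigma = \sqrt{npq} = \sqrt{n(.4)(.6)} = \sqrt{.24n} \).
\( \mu - 2\sigma \ge 100 \Rightarrow .4n - 2\sqrt{.24n} \ge 100 \Rightarrow .4n - .98\sqrt{n} - 100 \ge 0 \)
Solving \( .4n - .98\sqrt{n} - 100 = 0 \) for \( \sqrt{n} \) with the quadratic formula and keeping the positive root, we get:
\( \sqrt{n} = \dfrac{.98 \pm \sqrt{.98^2 - 4(.4)(-100)}}{2(.4)} = \dfrac{.98 \pm 12.687}{.8} \Rightarrow \sqrt{n} = 17.084 \)
\( n = 17.084^2 = 291.9 \approx 292 \)

4.202 Let x = tensile strength of a particular metal part. Then x is a normal random variable with μ = 25 and σ = 2. The tolerance limits are 21 and 30.
\( P(x < 21) = P\left(z < \dfrac{21 - 25}{2}\right) = P(z < -2) = .5 - .4772 = .0228 \) (Using Table II, Appendix D)
\( P(21 < x < 30) = P\left(\dfrac{21 - 25}{2} < z < \dfrac{30 - 25}{2}\right) = P(-2 < z < 2.5) = .4772 + .4938 = .9710 \)
\( P(x > 30) = P\left(z > \dfrac{30 - 25}{2}\right) = P(z > 2.5) = .5 - .4938 = .0062 \)
\( E(\text{Profit}) = (-\$2)(.0228) + (\$10)(.9710) + (-\$1)(.0062) = -\$.0456 + \$9.71 - \$.0062 = \$9.66 \)

4.203 Let x = load. Then x has a normal distribution with μ = 20,000. We are given \( P(10{,}000 < x < 30{,}000) = .95 \). We want to find σ.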
\( P(10{,}000 < x < 30{,}000) = .95 \Rightarrow P(z_1 < z < z_2) = .95 \Rightarrow P(z_1 < z < 0) + P(0 < z < z_2) = .95 \)
By symmetry, \( P(0 < z < z_2) = .95/2 = .4750 \).
Looking up area .4750 in Table II, Appendix D, \( z_2 = 1.96 \) and \( z_1 = -1.96 \).
\( z_2 = \dfrac{x - \mu}{\sigma} \Rightarrow 1.96 = \dfrac{30{,}000 - 20{,}000}{\sigma} \Rightarrow \sigma = \dfrac{10{,}000}{1.96} = 5{,}102 \)

4.204 a. Let x = number of passengers in 1,500 who will be detained for luggage inspection. Then x is a binomial random variable with n = 1,500 and p = .20. The expected number of passengers detained will be:
\( E(x) = np = 1{,}500(.2) = 300 \)
b. For n = 4,000: \( E(x) = np = 4{,}000(.2) = 800 \)
c. \( P(x > 600) \approx P\left(z > \dfrac{600.5 - 800}{\sqrt{4{,}000(.2)(.8)}}\right) = P(z > -7.89) = .5 + .5 = 1.0 \)

4.205 a. Using Table II, Appendix D:
For σ = 1:
\( P(-1 < x < 1) + P(4 < x < 6) + P(9 < x < 11) \)
\( = P\left(\dfrac{-1 - 5}{1} < z < \dfrac{1 - 5}{1}\right) + P\left(\dfrac{4 - 5}{1} < z < \dfrac{6 - 5}{1}\right) + P\left(\dfrac{9 - 5}{1} < z < \dfrac{11 - 5}{1}\right) \)
\( = P(-6 < z < -4) + P(-1 < z < 1) + P(4 < z < 6) = 0 + (.3413 + .3413) + 0 = .6826 \)
For σ = 2:
\( P(-1 < x < 1) + P(4 < x < 6) + P(9 < x < 11) \)
\( = P\left(\dfrac{-1 - 5}{2} < z < \dfrac{1 - 5}{2}\right) + P\left(\dfrac{4 - 5}{2} < z < \dfrac{6 - 5}{2}\right) + P\left(\dfrac{9 - 5}{2} < z < \dfrac{11 - 5}{2}\right) \)
\( = P(-3 < z < -2) + P(-.5 < z < .5) + P(2 < z < 3) = (.4987 - .4772) + (.1915 + .1915) + (.4987 - .4772) = .4260 \)
For σ = 4:
\( P(-1 < x < 1) + P(4 < x < 6) + P(9 < x < 11) \)
\( = P\left(\dfrac{-1 - 5}{4} < z < \dfrac{1 - 5}{4}\right) + P\left(\dfrac{4 - 5}{4} < z < \dfrac{6 - 5}{4}\right) + P\left(\dfrac{9 - 5}{4} < z < \dfrac{11 - 5}{4}\right) \)
\( = P(-1.5 < z < -1) + P(-.25 < z < .25) + P(1 < z < 1.5) = (.4332 - .3413) + (.0987 + .0987) + (.4332 - .3413) = .3812 \)
b. For σ = 1, 764 of the 1,100 flechettes hit a target. The proportion is 764/1,100 = .6945. This is a little higher than the probability computed in part a.
For σ = 2, 462 of the 1,100 flechettes hit a target. The proportion is 462/1,100 = .42. This is very close to the probability computed in part a.
For σ = 4, 408 of the 1,100 flechettes hit a target. The proportion is 408/1,100 = .3709. Again, this is close to the probability computed in part a.
c. If the Army wants to maximize the chance of hitting the target that the prototype gun is aimed at, then σ should be set at 1. The probability of hitting the target is .6826. If the Army wants to hit multiple targets with a single shot of the weapon, then σ should be set at 2. The probability of hitting at least one of the targets is .4260.

4.206 Let x = number of disasters in 25 trials.
If NASA’s assessment is correct, then x is a binomial random variable with n = 25 and p = 1/60,000 = .00001667. If the Air Force’s assessment is correct, then x is a binomial random variable with n = 25 and p = 1/35 = .02857.
If NASA’s assessment is correct, then the probability of no disasters in 25 missions would be:
\( P(x = 0) = \binom{25}{0}\left(\dfrac{1}{60{,}000}\right)^0\left(\dfrac{59{,}999}{60{,}000}\right)^{25} = .9996 \)
Thus, the probability of at least one disaster would be \( P(x \ge 1) = 1 - P(x = 0) = 1 - .9996 = .0004 \).
If the Air Force’s assessment is correct, then the probability of no disasters in 25 missions would be:
\( P(x = 0) = \binom{25}{0}\left(\dfrac{1}{35}\right)^0\left(\dfrac{34}{35}\right)^{25} = .4845 \)
Thus, the probability of at least one disaster would be \( P(x \ge 1) = 1 - P(x = 0) = 1 - .4845 = .5155 \).
One disaster actually did occur. If NASA’s assessment were correct, it would be almost impossible for at least one disaster to occur in 25 trials. If the Air Force’s assessment were correct, one disaster in 25 trials would not be an unusual event. Thus, the Air Force’s assessment appears to be appropriate.

Chapter 5 Sampling Distributions

5.1 a–b. The different samples of n = 2 selected with replacement, and their means, are:
Possible samples:  0, 0   0, 2   0, 4   0, 6   2, 0   2, 2   2, 4   2, 6
x̄:                 0      1      2      3      1      2      3      4
Possible samples:  4, 0   4, 2   4, 4   4, 6   6, 0   6, 2   6, 4   6, 6
x̄:                 2      3      4      5      3      4      5      6
c. Since each sample is equally likely, the probability of any one being selected is \( \left(\tfrac{1}{4}\right)\left(\tfrac{1}{4}\right) = \tfrac{1}{16} \).
d. \( P(\bar{x} = 0) = \tfrac{1}{16} \), \( P(\bar{x} = 1) = \tfrac{1}{16} + \tfrac{1}{16} = \tfrac{2}{16} \), \( P(\bar{x} = 2) = \tfrac{1}{16} + \tfrac{1}{16} + \tfrac{1}{16} = \tfrac{3}{16} \), \( P(\bar{x} = 3) = \tfrac{4}{16} \), \( P(\bar{x} = 4) = \tfrac{3}{16} \), \( P(\bar{x} = 5) = \tfrac{2}{16} \), \( P(\bar{x} = 6) = \tfrac{1}{16} \)
x̄:      0     1     2     3     4     5     6
p(x̄):   1/16  2/16  3/16  4/16  3/16  2/16  1/16
e. Using MINITAB, the graph is:
[Figure: Histogram of x-bar (probability, 0 to .25, versus x-bar, 0 to 6)]

5.2 Answers will vary.
Using a statistical package, 100 samples of size 2 were generated with replacement from the population containing 0, 2, 4, and 6. The sample mean was computed for each of the 100 samples of size 2. The relative frequency distribution for these 100 sample means is:
x̄   Frequency   Relative frequency   p(x̄)
0    4           .04                  1/16 = .0625
1    15          .15                  2/16 = .1250
2    17          .17                  3/16 = .1875
3    30          .30                  4/16 = .2500
4    21          .21                  3/16 = .1875
5    10          .10                  2/16 = .1250
6    3           .03                  1/16 = .0625
The exact distribution is in the last column, headed p(x̄). The relative frequencies from this sample are similar to the probabilities from the exact distribution.

5.3 If the observations are independent of each other, then
\( P(1, 1) = p(1)p(1) = .2(.2) = .04 \)
\( P(1, 2) = p(1)p(2) = .2(.3) = .06 \)
\( P(1, 3) = p(1)p(3) = .2(.2) = .04 \), etc.
a. The possible samples, their means, and their probabilities are listed below, grouped by the first observation:
Samples (1, 1)–(1, 5): x̄ = 1, 1.5, 2, 2.5, 3; probability = .04, .06, .04, .04, .02
Samples (2, 1)–(2, 5): x̄ = 1.5, 2, 2.5, 3, 3.5; probability = .06, .09, .06, .06, .03
Samples (3, 1)–(3, 5): x̄ = 2, 2.5, 3, 3.5, 4; probability = .04, .06, .04, .04, .02
Samples (4, 1)–(4, 5): x̄ = 2.5, 3, 3.5, 4, 4.5; probability = .04, .06, .04, .04, .02
Samples (5, 1)–(5, 5): x̄ = 3, 3.5, 4, 4.5, 5; probability = .02, .03, .02, .02, .01
Summing the probabilities, the probability distribution of x̄ is:
x̄:     1    1.5  2    2.5  3    3.5  4    4.5  5
p(x̄):  .04  .12  .17  .20  .20  .14  .08  .04  .01
b. Using MINITAB, the graph is:
[Figure: Histogram of x-bar (probability, 0 to .20, versus x-bar, 1 to 5)]
c. \( P(\bar{x} \ge 4.5) = .04 + .01 = .05 \)
d. No. The probability of observing x̄ = 4.5 or larger is small (.05).

5.4 \( E(x) = \sum x\,p(x) = 1(.2) + 2(.3) + 3(.2) + 4(.2) + 5(.1) = .2 + .6 + .6 + .8 + .5 = 2.7 \)
\( E(\bar{x}) = \sum \bar{x}\,p(\bar{x}) = 1.0(.04) + 1.5(.12) + 2.0(.17) + 2.5(.20) + 3.0(.20) + 3.5(.14) + 4.0(.08) + 4.5(.04) + 5.0(.01) \)
\( = .04 + .18 + .34 + .50 + .60 + .49 + .32 + .18 + .05 = 2.7 \)

5.5 a. For a sample of size n = 2, the sample mean and sample median are exactly the same. Thus, the sampling distribution of the sample median is the same as that for the sample mean (see Exercise 5.3a).
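The enumeration in Exercises 5.3 and 5.4 can be checked mechanically. The Python sketch below assumes independent draws, as stated in 5.3, and uses `fractions.Fraction` so the probabilities come out exactly rather than as rounded decimals; the variable names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Population of Exercise 5.3: p(1)=.2, p(2)=.3, p(3)=.2, p(4)=.2, p(5)=.1
pmf = {1: Fraction(2, 10), 2: Fraction(3, 10), 3: Fraction(2, 10),
       4: Fraction(2, 10), 5: Fraction(1, 10)}

# Sampling distribution of xbar for n = 2 independent draws.
dist = defaultdict(Fraction)                 # missing keys start at Fraction(0)
for x1, x2 in product(pmf, repeat=2):
    xbar = Fraction(x1 + x2, 2)
    dist[xbar] += pmf[x1] * pmf[x2]          # independence: P(x1, x2) = p(x1) p(x2)

tail = sum(prob for xbar, prob in dist.items() if xbar >= Fraction(9, 2))
mean_x = sum(x * p for x, p in pmf.items())          # E(x), Exercise 5.4
mean_xbar = sum(xb * p for xb, p in dist.items())    # E(xbar), Exercise 5.4

for xbar in sorted(dist):
    print(f"xbar = {float(xbar):>3}  p = {float(dist[xbar]):.2f}")
print("P(xbar >= 4.5) =", float(tail))
print("E(x) =", float(mean_x), " E(xbar) =", float(mean_xbar))
```

Working with exact fractions makes it easy to confirm that the probabilities sum to 1 and that E(x̄) equals E(x) exactly, not just to two decimals.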
b. The probability histogram for the sample median is identical to that for the sample mean (see Exercise 5.3b).

5.6 a. Answers will vary. A statistical package was used to generate 500 samples of size 15 from a uniform distribution on the interval from 150 to 200. The sample mean was computed for each sample of size 15. Using MINITAB, a histogram of the sample means is:
[Figure: Histogram of Mean (frequency, 0 to 120, versus sample mean, 156 to 192)]
b. The sample medians were computed for each of the 500 samples of size 15 used in part a. Using MINITAB, a histogram of the sample medians is:
[Figure: Histogram of Median (frequency, 0 to 120, versus sample median, 156 to 192)]
The sampling distribution of the sample medians is more spread out than the sampling distribution of the sample means. In addition, more of the sample means than of the sample medians fall near the middle of the distribution.

5.7 a. Answers will vary. A statistical package was used to generate 500 samples of size 25 from a uniform distribution on the interval from 00 to 99. The sample mean was computed for each sample of size 25. Using MINITAB, a histogram of the sample means is:
[Figure: Histogram of Mean (frequency, 0 to 60, versus sample mean, 30 to 72)]
b. The sample variances were computed for each of the 500 samples of size 25 used in part a. Using MINITAB, a histogram of the sample variances is:
[Figure: Histogram of Variance (frequency, 0 to 60, versus sample variance, 400 to 1400)]

5.8 a. \( \mu = \sum x\,p(x) = 0\left(\tfrac{1}{3}\right) + 1\left(\tfrac{1}{3}\right) + 4\left(\tfrac{1}{3}\right) = \tfrac{5}{3} = 1.667 \)
\( \sigma^2 = \sum (x - \mu)^2 p(x) = \left(0 - \tfrac{5}{3}\right)^2\tfrac{1}{3} + \left(1 - \tfrac{5}{3}\right)^2\tfrac{1}{3} + \left(4 - \tfrac{5}{3}\right)^2\tfrac{1}{3} = \tfrac{78}{27} = 2.889 \)
b.
Sample:       0, 0  0, 1  0, 4  1, 0  1, 1  1, 4  4, 0  4, 1  4, 4
x̄:            0     0.5   2     0.5   1     2.5   2     2.5   4
Probability:  1/9   1/9   1/9   1/9   1/9   1/9   1/9   1/9   1/9
The sampling distribution of x̄ is:
x̄:            0     0.5   1     2     2.5   4
Probability:  1/9   2/9   1/9   2/9   2/9   1/9
c. \( E(\bar{x}) = \sum \bar{x}\,p(\bar{x}) = 0\left(\tfrac{1}{9}\right) + 0.5\left(\tfrac{2}{9}\right) + 1\left(\tfrac{1}{9}\right) + 2\left(\tfrac{2}{9}\right) + 2.5\left(\tfrac{2}{9}\right) + 4\left(\tfrac{1}{9}\right) = \tfrac{15}{9} = \tfrac{5}{3} = 1.667 \)
Since \( E(\bar{x}) = \mu \), x̄ is an unbiased estimator for μ.
d. Recall that \( s^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} \).
For the first sample, \( s^2 = \dfrac{(0^2 + 0^2) - \frac{(0 + 0)^2}{2}}{2 - 1} = 0 \).
For the second sample, \( s^2 = \dfrac{(0^2 + 1^2) - \frac{(0 + 1)^2}{2}}{2 - 1} = 1 - \tfrac{1}{2} = .5 \).
The rest of the values are shown in the table below.
Sample:       0, 0  0, 1  0, 4  1, 0  1, 1  1, 4  4, 0  4, 1  4, 4
s²:           0     0.5   8     0.5   0     4.5   8     4.5   0
Probability:  1/9   1/9   1/9   1/9   1/9   1/9   1/9   1/9   1/9
The sampling distribution of s² is:
s²:           0     0.5   4.5   8
Probability:  3/9   2/9   2/9   2/9
e. \( E(s^2) = \sum s^2 p(s^2) = 0\left(\tfrac{3}{9}\right) + .5\left(\tfrac{2}{9}\right) + 4.5\left(\tfrac{2}{9}\right) + 8\left(\tfrac{2}{9}\right) = \tfrac{26}{9} = 2.889 \)
Since \( E(s^2) = \sigma^2 \), s² is an unbiased estimator for σ².

5.9 a. \( \mu = \sum x\,p(x) = 2\left(\tfrac{1}{3}\right) + 4\left(\tfrac{1}{3}\right) + 9\left(\tfrac{1}{3}\right) = \tfrac{15}{3} = 5 \)
b. The possible samples of size n = 3, their sample means (x̄), and their sample medians (m) are listed below. Each sample has probability 1/27. For each first observation a, the samples are listed in the order (a, 2, 2), (a, 2, 4), (a, 2, 9), (a, 4, 2), (a, 4, 4), (a, 4, 9), (a, 9, 2), (a, 9, 4), (a, 9, 9).
a = 2: x̄ = 2, 8/3, 13/3, 8/3, 10/3, 5, 13/3, 5, 20/3; m = 2, 2, 2, 2, 4, 4, 2, 4, 9
a = 4: x̄ = 8/3, 10/3, 5, 10/3, 4, 17/3, 5, 17/3, 22/3; m = 2, 4, 4, 4, 4, 4, 4, 4, 9
a = 9: x̄ = 13/3, 5, 20/3, 5, 17/3, 22/3, 20/3, 22/3, 9; m = 2, 4, 9, 4, 4, 9, 9, 9, 9
The sampling distribution of x̄ is:
x̄:      2     8/3   10/3  4     13/3  5     17/3  20/3  22/3  9
p(x̄):   1/27  3/27  3/27  1/27  3/27  6/27  3/27  3/27  3/27  1/27   (total 27/27 = 1)
\( E(\bar{x}) = \sum \bar{x}\,p(\bar{x}) = 2\left(\tfrac{1}{27}\right) + \tfrac{8}{3}\left(\tfrac{3}{27}\right) + \tfrac{10}{3}\left(\tfrac{3}{27}\right) + 4\left(\tfrac{1}{27}\right) + \tfrac{13}{3}\left(\tfrac{3}{27}\right) + 5\left(\tfrac{6}{27}\right) + \tfrac{17}{3}\left(\tfrac{3}{27}\right) + \tfrac{20}{3}\left(\tfrac{3}{27}\right) + \tfrac{22}{3}\left(\tfrac{3}{27}\right) + 9\left(\tfrac{1}{27}\right) \)
\( = \tfrac{2 + 8 + 10 + 4 + 13 + 30 + 17 + 20 + 22 + 9}{27} = \tfrac{135}{27} = 5 \)
c. Since μ = 5 in part a and \( E(\bar{x}) = 5 \), x̄ is an unbiased estimator of μ.
The median was calculated for each sample and is shown in the table in part b. The sampling distribution of m is:
m:     2     4      9
p(m):  7/27  13/27  7/27   (total 27/27 = 1)
\( E(m) = \sum m\,p(m) = 2\left(\tfrac{7}{27}\right) + 4\left(\tfrac{13}{27}\right) + 9\left(\tfrac{7}{27}\right) = \tfrac{14 + 52 + 63}{27} = \tfrac{129}{27} = 4.778 \)
Since \( E(m) = 4.778 \ne 5 = \mu \), m is a biased estimator of μ.
d. Use the sample mean, x̄. It is an unbiased estimator.

5.10 a. \( \mu = \sum x\,p(x) = 0\left(\tfrac{1}{3}\right) + 1\left(\tfrac{1}{3}\right) + 2\left(\tfrac{1}{3}\right) = 1 \)
b. The 27 possible samples of size 3 each have probability 1/27. For each first observation a, the samples are listed in the order (a, 0, 0), (a, 0, 1), (a, 0, 2), (a, 1, 0), (a, 1, 1), (a, 1, 2), (a, 2, 0), (a, 2, 1), (a, 2, 2); their sample means are:
a = 0: x̄ = 0, 1/3, 2/3, 1/3, 2/3, 1, 2/3, 1, 4/3
a = 1: x̄ = 1/3, 2/3, 1, 2/3, 1, 4/3, 1, 4/3, 5/3
a = 2: x̄ = 2/3, 1, 4/3, 1, 4/3, 5/3, 4/3, 5/3, 2
From the above, the sampling distribution of the sample mean is:
x̄:           0     1/3   2/3   1     4/3   5/3   2
Probability:  1/27  3/27  6/27  7/27  6/27  3/27  1/27
c. For the same 27 samples, in the same order, the sample medians are:
a = 0: m = 0, 0, 0, 0, 1, 1, 0, 1, 2
a = 1: m = 0, 1, 1, 1, 1, 1, 1, 1, 2
a = 2: m = 0, 1, 2, 1, 1, 2, 2, 2, 2
From these values, the sampling distribution of the sample median is:
m:            0     1      2
Probability:  7/27  13/27  7/27
d. \( E(\bar{x}) = \sum \bar{x}\,p(\bar{x}) = 0\left(\tfrac{1}{27}\right) + \tfrac{1}{3}\left(\tfrac{3}{27}\right) + \tfrac{2}{3}\left(\tfrac{6}{27}\right) + 1\left(\tfrac{7}{27}\right) + \tfrac{4}{3}\left(\tfrac{6}{27}\right) + \tfrac{5}{3}\left(\tfrac{3}{27}\right) + 2\left(\tfrac{1}{27}\right) = 1 \)
Since \( E(\bar{x}) = \mu \), x̄ is an unbiased estimator for μ.
\( E(m) = \sum m\,p(m) = 0\left(\tfrac{7}{27}\right) + 1\left(\tfrac{13}{27}\right) + 2\left(\tfrac{7}{27}\right) = 1 \)
Since \( E(m) = \mu \), m is an unbiased estimator for μ.
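The enumerations in Exercises 5.9 and 5.10 are mechanical enough to automate. The sketch below uses only the Python standard library; the helper name `expected_mean_and_median` is made up for this illustration. It lists every sample of size 3 drawn with replacement, weights each one equally (1/27 here), and returns the exact expected values of x̄ and of the median m as fractions.

```python
import statistics
from fractions import Fraction
from itertools import product

def expected_mean_and_median(population, n=3):
    """Exact E(xbar) and E(m) over all equally likely samples of size n
    drawn with replacement from `population` (hypothetical helper)."""
    samples = list(product(population, repeat=n))
    w = Fraction(1, len(samples))                 # 1/27 for 3 values and n = 3
    e_xbar = sum(w * Fraction(sum(s), n) for s in samples)
    # statistics.median returns the middle value when n is odd.
    e_median = sum(w * Fraction(statistics.median(s)) for s in samples)
    return e_xbar, e_median

print(expected_mean_and_median([2, 4, 9]))   # Exercise 5.9
print(expected_mean_and_median([0, 1, 2]))   # Exercise 5.10
```

For the population {2, 4, 9}, the expected median is 129/27 = 43/9 ≈ 4.778 rather than μ = 5, which is the bias found in Exercise 5.9. For {0, 1, 2}, both expectations equal μ = 1.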
e. \( \sigma_{\bar{x}}^2 = \sum(\bar{x} - \mu)^2 p(\bar{x}) = (0 - 1)^2\tfrac{1}{27} + \left(\tfrac{1}{3} - 1\right)^2\tfrac{3}{27} + \left(\tfrac{2}{3} - 1\right)^2\tfrac{6}{27} + (1 - 1)^2\tfrac{7}{27} + \left(\tfrac{4}{3} - 1\right)^2\tfrac{6}{27} + \left(\tfrac{5}{3} - 1\right)^2\tfrac{3}{27} + (2 - 1)^2\tfrac{1}{27} = \tfrac{2}{9} = .2222 \)
\( \sigma_m^2 = \sum(m - \mu)^2 p(m) = (0 - 1)^2\tfrac{7}{27} + (1 - 1)^2\tfrac{13}{27} + (2 - 1)^2\tfrac{7}{27} = \tfrac{14}{27} = .5185 \)
f. Since both the sample mean and the sample median are unbiased estimators and the variance is smaller for the sample mean, the sample mean would be the preferred estimator of μ.

5.11 Answers will vary. MINITAB was used to generate 500 samples of n = 25 observations from a uniform population from 1 to 50. The means and medians of the first 10 samples are shown in the table below. For example, sample 1 consisted of the observations 28, 27, 11, 19, 50, 30, 47, 26, 9, 33, 50, 15, 21, 41, 31, 41, 35, 32, 32, 17, 6, 32, 39, 34, 21.
Sample   Mean    Median
1        29.08   31
2        25.88   26
3        28.24   28
4        27.40   26
5        24.88   23
6        24.20   23
7        23.16   22
8        25.84   29
9        22.88   19
10       23.88   23
Using MINITAB, side-by-side histograms of the means and medians of the 500 samples are:
[Figure: Histogram of Mean, Median (side-by-side panels; frequency, 0 to 140, versus value, 12 to 42)]
a. Yes, it appears that x̄ and the median are unbiased estimators of the population mean (for a uniform population from 1 to 50, μ = (1 + 50)/2 = 25.5). The centers of both distributions above appear to be around 25 to 26. In fact, the mean of the sampling distribution of x̄ is 25.65 and the mean of the sampling distribution of the median is 25.73.
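The Exercise 5.11 simulation can be sketched in Python as below. This is not the MINITAB session used in the solution: the seed is arbitrary, and the population is taken to be the integers 1 to 50 with equal probability (an assumption based on the sample values shown). The numbers will therefore differ from those reported, but the comparison of centers and spreads should look similar.

```python
import random
import statistics

# Hypothetical re-run of Exercise 5.11: 500 samples of n = 25 integers
# drawn uniformly from 1 to 50. The seed is arbitrary.
rng = random.Random(12)
means, medians = [], []
for _ in range(500):
    sample = [rng.randint(1, 50) for _ in range(25)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

for label, values in (("mean", means), ("median", medians)):
    print(f"{label:>6}: center {statistics.mean(values):.2f}, "
          f"spread (sd) {statistics.stdev(values):.2f}")
```

Both centers land near 25.5, while the standard deviation of the medians is noticeably larger than that of the means.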
b. The sampling distribution of the median has greater variation; it is more spread out than the sampling distribution of x̄.

5.12 a. The mean of the random variable x is
\( E(x) = \sum x\,p(x) = 1(.2) + 2(.3) + 3(.2) + 4(.2) + 5(.1) = 2.7 \)
From Exercise 5.3, the sampling distribution of x̄ is:
x̄:     1    1.5  2    2.5  3    3.5  4    4.5  5
p(x̄):  .04  .12  .17  .20  .20  .14  .08  .04  .01
The mean of the sampling distribution of x̄ is:
\( E(\bar{x}) = \sum \bar{x}\,p(\bar{x}) = 1(.04) + 1.5(.12) + 2(.17) + 2.5(.20) + 3(.20) + 3.5(.14) + 4(.08) + 4.5(.04) + 5(.01) = 2.7 \)
Since \( E(\bar{x}) = E(x) = \mu \), x̄ is an unbiased estimator of μ.
b. The variance of the sampling distribution of x̄ is:
\( \sigma_{\bar{x}}^2 = \sum(\bar{x} - \mu)^2 p(\bar{x}) = (1 - 2.7)^2(.04) + (1.5 - 2.7)^2(.12) + (2 - 2.7)^2(.17) + (2.5 - 2.7)^2(.20) + (3 - 2.7)^2(.20) \)
\( \quad + (3.5 - 2.7)^2(.14) + (4 - 2.7)^2(.08) + (4.5 - 2.7)^2(.04) + (5 - 2.7)^2(.01) = .805 \)
c. \( \mu \pm 2\sigma_{\bar{x}} \Rightarrow 2.7 \pm 2\sqrt{.805} \Rightarrow 2.7 \pm 1.794 \Rightarrow (.906,\ 4.494) \)
\( P(.906 < \bar{x} < 4.494) = .04 + .12 + .17 + .20 + .20 + .14 + .08 = .95 \)

5.13 a. Refer to the solution to Exercise 5.3. The values of s² and the corresponding probabilities are listed below, using
\( s^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} \)
For sample (1, 1): \( s^2 = \dfrac{(1^2 + 1^2) - \frac{(1 + 1)^2}{2}}{2 - 1} = \dfrac{2 - 2}{1} = 0 \)
For sample (1, 2): \( s^2 = \dfrac{(1^2 + 2^2) - \frac{(1 + 2)^2}{2}}{2 - 1} = \dfrac{5 - 4.5}{1} = .5 \)
The rest of the values are calculated and shown below:
Samples (1, 1)–(1, 5): s² = 0.0, 0.5, 2.0, 4.5, 8.0; p(s²) = .04, .06, .04, .04, .02
Samples (2, 1)–(2, 5): s² = 0.5, 0.0, 0.5, 2.0, 4.5; p(s²) = .06, .09, .06, .06, .03
Samples (3, 1)–(3, 5): s² = 2.0, 0.5, 0.0, 0.5, 2.0; p(s²) = .04, .06, .04, .04, .02
Samples (4, 1)–(4, 5): s² = 4.5, 2.0, 0.5, 0.0, 0.5; p(s²) = .04, .06, .04, .04, .02
Samples (5, 1)–(5, 5): s² = 8.0, 4.5, 2.0, 0.5, 0.0; p(s²) = .02, .03, .02, .02, .01
The sampling distribution of s² is:
s²:      0.0  0.5  2.0  4.5  8.0
p(s²):   .22  .36  .24  .14  .04
b. \( \sigma^2 = \sum(x - \mu)^2 p(x) = (1 - 2.7)^2(.2) + (2 - 2.7)^2(.3) + (3 - 2.7)^2(.2) + (4 - 2.7)^2(.2) + (5 - 2.7)^2(.1) = 1.61 \)
c. \( E(s^2) = \sum s^2 p(s^2) = 0(.22) + .5(.36) + 2(.24) + 4.5(.14) + 8(.04) = 1.61 \)
d. The sampling distribution of s is listed below, where \( s = \sqrt{s^2} \):
s:     0.000  0.707  1.414  2.121  2.828
p(s):  .22    .36    .24    .14    .04
e. \( E(s) = \sum s\,p(s) = 0(.22) + .707(.36) + 1.414(.24) + 2.121(.14) + 2.828(.04) = 1.00394 \)
Since E(s) = 1.00394 is not equal to \( \sigma = \sqrt{\sigma^2} = \sqrt{1.61} = 1.269 \), s is a biased estimator of σ.
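Exercises 5.12 and 5.13 can be verified by weighting each of the 25 ordered samples by p(x₁)p(x₂) and averaging x̄, s², and s directly. A Python sketch, with illustrative variable names, assuming independent draws as in Exercise 5.3:

```python
import math
from itertools import product

# Population of Exercises 5.3, 5.12, and 5.13.
pmf = {1: .2, 2: .3, 3: .2, 4: .2, 5: .1}
mu = sum(x * p for x, p in pmf.items())                    # 2.7
sigma2 = sum((x - mu) ** 2 * p for x, p in pmf.items())    # 1.61

# Weight each ordered sample (x1, x2) by p(x1) * p(x2) and accumulate moments.
E_xbar = var_xbar = E_s2 = E_s = 0.0
for (x1, p1), (x2, p2) in product(pmf.items(), repeat=2):
    w = p1 * p2
    xbar = (x1 + x2) / 2
    s2 = ((x1 - xbar) ** 2 + (x2 - xbar) ** 2) / (2 - 1)   # divisor n - 1
    E_xbar += w * xbar
    var_xbar += w * (xbar - mu) ** 2
    E_s2 += w * s2
    E_s += w * math.sqrt(s2)

print(f"E(xbar) = {E_xbar:.3f}, Var(xbar) = {var_xbar:.3f}")
print(f"sigma^2 = {sigma2:.3f}, E(s^2) = {E_s2:.3f}")
print(f"sigma = {math.sqrt(sigma2):.4f}, E(s) = {E_s:.4f}")
```

Computed without rounding, E(s) = 1.42/√2 ≈ 1.0041; the 1.00394 in part e reflects rounding s to three decimals. Either way, E(s) falls well short of σ ≈ 1.269, while E(s²) matches σ² exactly.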
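5.14 The mean of the random variable x is:
\( E(x) = \sum x\,p(x) = 1(.2) + 2(.3) + 3(.2) + 4(.2) + 5(.1) = 2.7 \)
From Exercise 5.5, the sampling distribution of the sample median is:
m:     1    1.5  2    2.5  3    3.5  4    4.5  5
p(m):  .04  .12  .17  .20  .20  .14  .08  .04  .01
The mean of the sampling distribution of the sample median m is:
\( E(m) = \sum m\,p(m) = 1(.04) + 1.5(.12) + 2(.17) + 2.5(.20) + 3(.20) + 3.5(.14) + 4(.08) + 4.5(.04) + 5(.01) = 2.7 \)
Since E(m) = μ, m is an unbiased estimator of μ.

5.15 The sampling distribution of x̄ is approximately normal if the sample size is sufficiently large or if the population being sampled is normal (in which case it is exactly normal).

5.16 a. \( \mu_{\bar{x}} = 10,\ \sigma_{\bar{x}} = \sigma/\sqrt{n} = 3/\sqrt{25} = 0.6 \)
b. \( \mu_{\bar{x}} = 100,\ \sigma_{\bar{x}} = \sigma/\sqrt{n} = 25/\sqrt{25} = 5 \)
c. \( \mu_{\bar{x}} = 20,\ \sigma_{\bar{x}} = \sigma/\sqrt{n} = 40/\sqrt{25} = 8 \)
d. \( \mu_{\bar{x}} = 10,\ \sigma_{\bar{x}} = \sigma/\sqrt{n} = 100/\sqrt{25} = 20 \)

5.17 a. \( \mu_{\bar{x}} = 100,\ \sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{\sqrt{100}}{\sqrt{4}} = 5 \)
b. \( \mu_{\bar{x}} = 100,\ \sigma_{\bar{x}} = \dfrac{\sqrt{100}}{\sqrt{25}} = 2 \)
c. \( \mu_{\bar{x}} = 100,\ \sigma_{\bar{x}} = \dfrac{\sqrt{100}}{\sqrt{100}} = 1 \)
d. \( \mu_{\bar{x}} = 100,\ \sigma_{\bar{x}} = \dfrac{\sqrt{100}}{\sqrt{50}} = 1.414 \)
e. \( \mu_{\bar{x}} = 100,\ \sigma_{\bar{x}} = \dfrac{\sqrt{100}}{\sqrt{500}} = .447 \)
f. \( \mu_{\bar{x}} = 100,\ \sigma_{\bar{x}} = \dfrac{\sqrt{100}}{\sqrt{1{,}000}} = .316 \)

5.18 a. \( \mu_{\bar{x}} = \mu = 20,\ \sigma_{\bar{x}} = \sigma/\sqrt{n} = 16/\sqrt{64} = 2 \)
b. By the Central Limit Theorem, the distribution of x̄ is approximately normal. For the Central Limit Theorem to apply, n must be sufficiently large. For this problem, n = 64 is sufficiently large.
c. \( z = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \dfrac{15.5 - 20}{2} = -2.25 \)
d. \( z = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \dfrac{23 - 20}{2} = 1.50 \)

5.19 In Exercise 5.18, it was determined that the mean and standard deviation of the sampling distribution of the sample mean are 20 and 2, respectively. Using Table II, Appendix D:
a. \( P(\bar{x} < 16) = P\left(z < \dfrac{16 - 20}{2}\right) = P(z < -2) = .5 - .4772 = .0228 \)
b. \( P(\bar{x} > 23) = P\left(z > \dfrac{23 - 20}{2}\right) = P(z > 1.50) = .5 - .4332 = .0668 \)
c. \( P(\bar{x} > 25) = P\left(z > \dfrac{25 - 20}{2}\right) = P(z > 2.5) = .5 - .4938 = .0062 \)
d. \( P(16 < \bar{x} < 22) = P\left(\dfrac{16 - 20}{2} < z < \dfrac{22 - 20}{2}\right) = P(-2 < z < 1) = .4772 + .3413 = .8185 \)
e. \( P(\bar{x} < 14) = P\left(z < \dfrac{14 - 20}{2}\right) = P(z < -3) = .5 - .4987 = .0013 \)

5.20 For this population and sample size, \( E(\bar{x}) = \mu = 100 \) and \( \sigma_{\bar{x}} = \sigma/\sqrt{n} = 10/\sqrt{900} = 1/3 \).
a. Almost all of the time, the sample mean will be within three standard deviations of the mean, i.e., \( \mu \pm 3\sigma_{\bar{x}} \Rightarrow 100 \pm 3\left(\tfrac{1}{3}\right) \Rightarrow 100 \pm 1 \Rightarrow (99,\ 101) \). Thus, the smallest value of x̄ we would expect is 99 and the largest value would be 101.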
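The Table II lookups in Exercise 5.19 can be cross-checked with `statistics.NormalDist` from the Python standard library, which evaluates the normal CDF directly instead of rounding z to two decimals. A sketch, assuming (as the solution does) that x̄ is approximately normal with mean 20 and standard deviation 2:

```python
from statistics import NormalDist

# Exercises 5.18-5.19: mu = 20, sigma = 16, n = 64, so by the Central Limit
# Theorem xbar is approximately normal with mean 20 and sd 16 / sqrt(64) = 2.
mu, sigma, n = 20, 16, 64
xbar = NormalDist(mu, sigma / n ** 0.5)

answers = {
    "a. P(xbar < 16)": xbar.cdf(16),
    "b. P(xbar > 23)": 1 - xbar.cdf(23),
    "c. P(xbar > 25)": 1 - xbar.cdf(25),
    "d. P(16 < xbar < 22)": xbar.cdf(22) - xbar.cdf(16),
    "e. P(xbar < 14)": xbar.cdf(14),
}
for label, prob in answers.items():
    print(f"{label} = {prob:.4f}")
```

Each value agrees with the table-based answer to within about .0005.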
b. No more than three standard deviations, i.e., \( 3\left(\tfrac{1}{3}\right) = 1 \).
c. No, the previous answer depended only on the standard deviation of the sampling distribution of the sample mean, not on the mean itself.

5.21 By the Central Limit Theorem, the sampling distribution of x̄ is approximately normal with \( \mu_{\bar{x}} = \mu = 30 \) and \( \sigma_{\bar{x}} = \sigma/\sqrt{n} = 16/\sqrt{100} = 1.6 \). Using Table II, Appendix D:
a. \( P(\bar{x} \ge 28) = P\left(z \ge \dfrac{28 - 30}{1.6}\right) = P(z \ge -1.25) = .5 + .3944 = .8944 \)
b. \( P(22.1 \le \bar{x} \le 26.8) = P\left(\dfrac{22.1 - 30}{1.6} \le z \le \dfrac{26.8 - 30}{1.6}\right) = P(-4.94 \le z \le -2) = .5 - .4772 = .0228 \)
c. \( P(\bar{x} \le 28.2) = P\left(z \le \dfrac{28.2 - 30}{1.6}\right) = P(z \le -1.13) = .5 - .3708 = .1292 \)
d. \( P(\bar{x} \ge 27.0) = P\left(z \ge \dfrac{27.0 - 30}{1.6}\right) = P(z \ge -1.88) = .5 + .4699 = .9699 \)

5.22 Answers will vary. A computer package was used to generate 500 samples of size n = 2. The sample mean was computed for each of the 500 samples. This was repeated for 500 samples each of size n = 5, n = 10, n = 30, and n = 50. Using MINITAB, the relative frequency histograms for x̄ for each of the sample sizes are:
[Figure: Histogram of xbar2, xbar5, xbar10, xbar30, xbar50 (relative frequency, 0 to .6, versus x̄, 15 to 90)]
All of the histograms look mound-shaped. As n increases, the spread of the values of x̄ decreases.

5.23 a. From Exercise 2.33, the population of interarrival times is skewed to the right.
b. The population mean and standard deviation are:
\( \mu = \dfrac{\sum x}{N} = \dfrac{25{,}504.845}{267} = 95.52 \)
\( \sigma^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{N}}{N} = \dfrac{4{,}665{,}241.665 - \frac{(25{,}504.845)^2}{267}}{267} = 8{,}348.025727 \)
\( \sigma = \sqrt{8{,}348.025727} = 91.3675 \)
c. By the Central Limit Theorem, the sampling distribution of x̄ will be approximately normal. Theoretically, \( \mu_{\bar{x}} = \mu = 95.52 \) and \( \sigma_{\bar{x}} = \sigma/\sqrt{n} = 91.3675/\sqrt{40} = 14.4465 \).
d. \( P(\bar{x} < 90) = P\left(z < \dfrac{90 - 95.52}{14.4465}\right) = P(z < -.38) = .5 - .1480 = .3520 \) (Using Table II, Appendix D.)
e & f. Answers will vary. A statistical package was used to randomly select 40 interarrival times from the Phishing data set, and x̄ was computed.
This was repeated 50 times to simulate 50 students each selecting 40 interarrival times and computing x̄. Using MINITAB, a histogram of the 50 x̄ values is:
[Figure: Histogram of Means (frequency, 0 to 16, versus mean, 60 to 120)]
This shape is somewhat normal.
g. Using MINITAB, the mean and standard deviation of these 50 means are:
Descriptive Statistics: Means
Variable  N   Mean   StDev  Minimum  Q1     Median  Q3      Maximum
Means     50  96.09  14.08  52.73    86.36  95.65   105.23  130.23
The mean of these 50 means is 96.09. This is very close to \( \mu_{\bar{x}} = 95.52 \) found in part c. The standard deviation of these 50 means is 14.08. This is also very close to \( \sigma_{\bar{x}} = \sigma/\sqrt{n} = 91.3675/\sqrt{40} = 14.4465 \) found in part c.

5.24 a. \( \mu_{\bar{x}} = \mu = 96{,}850 \)
b. \( \sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{30{,}000}{\sqrt{50}} = 4{,}242.6407 \)
c. By the Central Limit Theorem, the sampling distribution of x̄ is approximately normal.
d. \( z = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \dfrac{89{,}500 - 96{,}850}{4{,}242.6407} = -1.73 \)
e. \( P(\bar{x} > 89{,}500) = P(z > -1.73) = .5 + .4582 = .9582 \) (Using Table II, Appendix D)

5.25 a. \( \mu_{\bar{x}} = \mu = 68 \). The average value of the sample mean level of support is 68.
b. \( \sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{27}{\sqrt{45}} = 4.0249 \). The standard deviation of the distribution of the sample means is 4.0249.
c. Because the sample size is large (n = 45 > 30), the Central Limit Theorem says that the sampling distribution of x̄ is approximately normal.
d. \( P(\bar{x} > 65) = P\left(z > \dfrac{65 - 68}{4.0249}\right) = P(z > -.75) = .5 + .2734 = .7734 \) (Using Table II, Appendix D)

5.26 a. \( E(\bar{x}) = \mu_{\bar{x}} = \mu = .10 \)
b. Since n = 50 > 30, the sampling distribution of x̄ is approximately normal by the Central Limit Theorem.
c. \( \sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{.10}{\sqrt{50}} = .0141 \)
\( P(\bar{x} > .13) = P\left(z > \dfrac{.13 - .10}{.0141}\right) = P(z > 2.13) = .5 - .4834 = .0166 \) (Using Table II, Appendix D)

5.27 By the Central Limit Theorem, the sampling distribution of x̄ is approximately normal with \( \mu_{\bar{x}} = \mu = 105.3 \) and \( \sigma_{\bar{x}} = \sigma/\sqrt{n} = 8/\sqrt{64} = 1 \).
\( P(\bar{x} < 103) = P\left(z < \dfrac{103 - 105.3}{1}\right) = P(z < -2.3) = .5 - .4893 = .0107 \) (Using Table II, Appendix D)

5.28 a. \( E(\bar{x}) = E(x) = \mu = \dfrac{c + d}{2} = \dfrac{0 + 3{,}600}{2} = 1{,}800 \). The average value of the sample mean number of seconds from the start of the hour is 1,800 seconds.
b. \( \sigma^2 = \dfrac{(d - c)^2}{12} = \dfrac{(3{,}600 - 0)^2}{12} = 1{,}080{,}000 \), so \( \sigma_{\bar{x}}^2 = \dfrac{\sigma^2}{n} = \dfrac{1{,}080{,}000}{60} = 18{,}000 \) and \( \sigma_{\bar{x}} = \sqrt{18{,}000} = 134.16 \).
c. Because the sample size is sufficiently large, by the Central Limit Theorem, the sampling distribution of x̄ is approximately normal.
d. \( P(1{,}700 < \bar{x} < 1{,}900) = P\left(\dfrac{1{,}700 - 1{,}800}{\sqrt{18{,}000}} < z < \dfrac{1{,}900 - 1{,}800}{\sqrt{18{,}000}}\right) = P(-.75 < z < .75) = .2734 + .2734 = .5468 \) (Using Table II, Appendix D)
e. \( P(\bar{x} > 2{,}000) = P\left(z > \dfrac{2{,}000 - 1{,}800}{\sqrt{18{,}000}}\right) = P(z > 1.49) = .5 - .4319 = .0681 \) (Using Table II, Appendix D)

5.29 a. By the Central Limit Theorem, the sampling distribution of x̄ is approximately normal with mean \( \mu_{\bar{x}} = \mu = .53 \) and standard deviation \( \sigma_{\bar{x}} = \sigma/\sqrt{n} = .193/\sqrt{50} = .0273 \).
b. \( P(\bar{x} > .58) = P\left(z > \dfrac{.58 - .53}{.0273}\right) = P(z > 1.83) = .5 - .4664 = .0336 \)
c. If Before Tensioning, μ = .53:
\( P(\bar{x} \ge .59) = P\left(z \ge \dfrac{.59 - .53}{.0273}\right) = P(z \ge 2.20) = .5 - .4861 = .0139 \)
If After Tensioning, μ = .58:
\( P(\bar{x} \ge .59) = P\left(z \ge \dfrac{.59 - .58}{.0273}\right) = P(z \ge 0.37) = .5 - .1443 = .3557 \)
Since the probability of getting a maximum differential of .59 or more Before Tensioning is so small, it would be very unlikely that the measurements were obtained Before Tensioning. However, since the probability of getting a maximum differential of .59 or more After Tensioning is not small, it would not be unusual for the measurements to have been obtained After Tensioning. Thus, most likely, the measurements were obtained After Tensioning.

5.30 a. Since the sample size is small, we also have to assume that the distribution from which the sample was drawn is normal. Then \( \mu_{\bar{x}} = 1.8 \) and \( \sigma_{\bar{x}} = \sigma/\sqrt{n} = .5/\sqrt{20} = .1118 \).
\( P(\bar{x} > 1.85) = P\left(z > \dfrac{1.85 - 1.8}{.1118}\right) = P(z > 0.45) = .5 - .1736 = .3264 \) (Using Table II, Appendix D)
b. Using MINITAB, the descriptive statistics are:
Descriptive Statistics: Rough
Variable  N   Mean   StDev  Minimum  Q1     Median  Q3     Maximum
Rough     20  1.881  0.524  1.060    1.303  2.040   2.293  2.640
From this output, the value of x̄ is 1.881.
c. For x̄ = 1.881:
\( P(\bar{x} \ge 1.881) = P\left(z \ge \dfrac{1.881 - 1.8}{.1118}\right) = P(z \ge 0.72) = .5 - .2642 = .2358 \)
Since this probability is fairly large, observing a sample mean of x̄ = 1.881 is not unusual. The assumptions in part a appear to be valid.
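A sketch of Exercise 5.28 in Python: it computes the uniform mean and variance from the formulas used in parts a and b, then applies the normal approximation with `statistics.NormalDist`. Because z is not rounded to two decimals, part d comes out near .544 rather than .5468; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Exercise 5.28: x ~ Uniform(c = 0, d = 3600) seconds; samples of n = 60.
c, d, n = 0, 3600, 60
mu = (c + d) / 2                  # 1,800
sigma2 = (d - c) ** 2 / 12        # 1,080,000
se = math.sqrt(sigma2 / n)        # sqrt(18,000), about 134.16

xbar = NormalDist(mu, se)         # Central Limit Theorem approximation
p_between = xbar.cdf(1900) - xbar.cdf(1700)    # part d
p_above = 1 - xbar.cdf(2000)                   # part e
print(f"mu = {mu:.0f}, se = {se:.2f}, d: {p_between:.4f}, e: {p_above:.4f}")
```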
By the Central Limit Theorem, the sampling distribution of x is approximately normal with 10 .5538 . x 6 and x 326 n 7.5 6 P( x 7.5) P z P( z 2.71) .5 .4966 .0034 .5538 (Using Table II, Appendix D) b. We first need to find the probability of observing the current data or anything more unusual if the true mean is 6. 300 6 P( x 300) P z P( z 530.88) .5 .5 0 .5538 Since the probability of observing a sample mean of 300 ppb or higher is essentially 0 if the true mean is 6 ppb, we would infer that the true mean PFOA concentration for the population of people who live near DuPont’s Teflon facility is not 6 ppb but higher than 6 ppb. 5.32 a. By the Central Limit Theorem, the sampling distribution of x is approximately normal with x and x / n / 100 . b. The mean of the x distribution is equal to the mean of the distribution of the fleet or the fleet mean score. c. x 30 and x / n / 100 60 / 100 6 . 45 30 P( x 45) P z P z 2.5 .5 .4938 .0062 (Using Table II, Appendix D) 6 d. The sample mean of 45 tends to refute the claim. If the true fleet mean was as high as 30, observing a sample mean of 45 or higher would be extremely unlikely (probability = .0062). Thus, we would infer that the true mean is actually not 30 but something higher. Thus, we would refute the company’s claim that the mean “couldn’t possibly be as large as 30.” Copyright © 2014 Pearson Education, Inc. Sampling Distributions 5.33 275 By the Central Limit Theorem, the sampling distribution of x is approximately normal with x 40 and x n 5 100 .5 . 42 40 P( x 42) P z P( z 4) .5 .5 0 (Using Table II, Appendix D) .5 Since this probability is so small, it is very unlikely that the sample was selected from the population of convicted drug dealers. 5.34 For n 36 , x 406 and x / n 10.1/ 36 1.6833 . By the Central Limit Theorem, the sampling distribution is approximately normal (n is large). 400.8 406 P( x 400.8) P z (using Table II, Appendix D) P( z 3.09) .5 .4990 .0010 1.6833 We agree with the first operator. 
If the true value of μ is 406, it would be extremely unlikely to observe an x̄ as small as 400.8 or smaller (probability = .0010). Thus, we would infer that the true value of μ is less than 406.

5.35 For n = 50, we can use the Central Limit Theorem to decide the shape of the distribution of the sample mean bacterial counts. For the handrubbing sample, the sampling distribution of x̄ is approximately normal with mean \( \mu_{\bar{x}} = 35 \) and standard deviation \( \sigma_{\bar{x}} = \sigma/\sqrt{n} = 59/\sqrt{50} = 8.344 \). For the handwashing sample, the sampling distribution of x̄ is approximately normal with mean \( \mu_{\bar{x}} = 69 \) and standard deviation \( \sigma_{\bar{x}} = \sigma/\sqrt{n} = 106/\sqrt{50} = 14.991 \).
For Handrubbing:
\( P(\bar{x} < 30 \mid \mu = 35) = P\left(z < \dfrac{30 - 35}{8.344}\right) = P(z < -.60) = .5 - .2257 = .2743 \) (using Table II, Appendix D)
For Handwashing:
\( P(\bar{x} < 30 \mid \mu = 69) = P\left(z < \dfrac{30 - 69}{14.991}\right) = P(z < -2.60) = .5 - .4953 = .0047 \) (using Table II, Appendix D)
Since the probability of getting a sample mean less than 30 is not small for the handrubbing group (.2743) but is very small for the handwashing group (.0047), the sample of workers probably came from the handrubbing group.

5.36 a. \( \mu_{\hat{p}} = p = .2 \) and \( \sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}} = \sqrt{\dfrac{.2(1 - .2)}{50}} = .0566 \)
b. \( \mu_{\hat{p}} = p = .2 \) and \( \sigma_{\hat{p}} = \sqrt{\dfrac{.2(1 - .2)}{1{,}000}} = .0126 \)
c. \( \mu_{\hat{p}} = p = .2 \) and \( \sigma_{\hat{p}} = \sqrt{\dfrac{.2(1 - .2)}{400}} = .02 \)

5.37 a. \( \mu_{\hat{p}} = p = .1 \) and \( \sigma_{\hat{p}} = \sqrt{\dfrac{.1(1 - .1)}{500}} = .0134 \)
b. \( \mu_{\hat{p}} = p = .5 \) and \( \sigma_{\hat{p}} = \sqrt{\dfrac{.5(1 - .5)}{500}} = .0224 \)
c. \( \mu_{\hat{p}} = p = .7 \) and \( \sigma_{\hat{p}} = \sqrt{\dfrac{.7(1 - .7)}{500}} = .0205 \)

5.38 a. \( \mu_{\hat{p}} = p = .3 \) and \( \sigma_{\hat{p}} = \sqrt{\dfrac{.3(1 - .3)}{80}} = .0512 \)
b. The sampling distribution of p̂ will be approximately normal since the sample size is sufficiently large.
c. \( z = \dfrac{\hat{p} - p}{\sqrt{p(1 - p)/n}} = \dfrac{.35 - .3}{\sqrt{.3(1 - .3)/80}} = .98 \)
d. \( P(\hat{p} > .35) = P\left(z > \dfrac{.35 - .3}{\sqrt{.3(1 - .3)/80}}\right) = P(z > .98) = .5 - .3365 = .1635 \) (using Table II, Appendix D)

5.39 a. \( E(\hat{p}) = \mu_{\hat{p}} = p = .85 \) and \( \sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}} = \sqrt{\dfrac{.85(1 - .85)}{250}} = .0226 \)
b. The sampling distribution of p̂ will be approximately normal since the sample size is sufficiently large.
c. \( P(\hat{p} < .9) = P\left(z < \dfrac{.9 - .85}{\sqrt{.85(1 - .85)/250}}\right) = P(z < 2.21) = .5 + .4864 = .9864 \) (using Table II, Appendix D)

5.40 We would not expect to see any values of p̂ more than 3 standard deviations below or above the mean value of p̂.
\( z = \dfrac{\hat{p} - p}{\sqrt{p(1 - p)/n}} \Rightarrow -3 = \dfrac{\hat{p} - .4}{\sqrt{.4(1 - .4)/1{,}500}} \Rightarrow \hat{p} = .4 - 3\sqrt{\dfrac{.4(1 - .4)}{1{,}500}} = .4 - .0379 = .3621 \)
The smallest value we would expect for p̂ would be .3621.
\( z = \dfrac{\hat{p} - p}{\sqrt{p(1 - p)/n}} \Rightarrow 3 = \dfrac{\hat{p} - .4}{\sqrt{.4(1 - .4)/1{,}500}} \Rightarrow \hat{p} = .4 + 3\sqrt{\dfrac{.4(1 - .4)}{1{,}500}} = .4 + .0379 = .4379 \)
The largest value we would expect for p̂ would be .4379.

5.41 a. Answers will vary. Using a statistical package, 500 samples of size 10 were generated from the population of (0, 1). The histogram of the 500 sample proportions is:
[Figure: Histogram of p-hat10 (frequency, 0 to 120, versus p-hat, 0.0 to 1.0)]
b. Using a statistical package, 500 samples of size 25 were generated from the population of (0, 1). The histogram of the 500 sample proportions is:
[Figure: Histogram of p-hat25 (frequency, 0 to 100, versus p-hat, 0.12 to 0.72)]
c. Using a statistical package, 500 samples of size 100 were generated from the population of (0, 1). The histogram of the 500 sample proportions is:
[Figure: Histogram of p-hat100 (frequency, 0 to 50, versus p-hat, 0.40 to 0.64)]
d. As the sample size increases, the spread of the values of p̂ decreases. In the graph in part a, the values of p̂ range from 0 to 1. In the graph in part b, they range from .20 to .76. In the graph in part c, they range from .37 to .64. In all graphs, the distributions are mound-shaped, and as the sample size increases, the distribution becomes more peaked.

5.42 a. \( \mu_{\hat{p}} = p = .4 \) and \( \sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}} = \sqrt{\dfrac{.4(1 - .4)}{106}} = .0476 \)
b. The sampling distribution of p̂ will be approximately normal since the sample size is sufficiently large.
c. \( P(\hat{p} \ge .59) = P\left(z \ge \dfrac{.59 - .4}{\sqrt{.4(1 - .4)/106}}\right) = P(z \ge 3.99) = .5 - .49997 = .00003 \) (Using Table II, Appendix D)
d. \( \hat{p} = \dfrac{x}{n} = \dfrac{63}{106} = .59 \). We found in part c that \( P(\hat{p} \ge .59) = .00003 \). Since this probability is so small, it casts doubt on the assumption that 40% of all social robots are designed with legs but no wheels.

5.43 a. \( \mu_{\hat{p}} = p = .67 \)
b. \( \sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}} = \sqrt{\dfrac{.67(1 - .67)}{1{,}000}} = .0149 \)
c. By the Central Limit Theorem, the sampling distribution of p̂ will be approximately normal since the sample size is sufficiently large.
d. \( P(\hat{p} < .75) = P\left(z < \dfrac{.75 - .67}{\sqrt{.67(1 - .67)/1{,}000}}\right) = P(z < 5.38) \approx .5 + .5 = 1 \) (using Table II, Appendix D)
e. \( P(\hat{p} > .5) = P\left(z > \dfrac{.5 - .67}{\sqrt{.67(1 - .67)/1{,}000}}\right) = P(z > -11.43) \approx .5 + .5 = 1 \) (using Table II, Appendix D)

5.44 By the Central Limit Theorem, the sampling distribution of p̂ will be approximately normal since the sample size is sufficiently large, with \( \mu_{\hat{p}} = p = .45 \) and \( \sigma_{\hat{p}} = \sqrt{\dfrac{.45(1 - .45)}{500}} = .0222 \).
a. \( P(.4 < \hat{p} < .5) = P\left(\dfrac{.4 - .45}{\sqrt{.45(1 - .45)/500}} < z < \dfrac{.5 - .45}{\sqrt{.45(1 - .45)/500}}\right) = P(-2.25 < z < 2.25) = .4878 + .4878 = .9756 \) (using Table II, Appendix D)
b. \( P(\hat{p} > .6) = P\left(z > \dfrac{.6 - .45}{\sqrt{.45(1 - .45)/500}}\right) = P(z > 6.74) \approx .5 - .5 = 0 \) (using Table II, Appendix D)

5.45 a. By the Central Limit Theorem, the sampling distribution of p̂ will be approximately normal since the sample size is sufficiently large, with \( \mu_{\hat{p}} = p = .03 \) and \( \sigma_{\hat{p}} = \sqrt{\dfrac{.03(1 - .03)}{1{,}000}} = .0054 \).
b. \( P(\hat{p} < .05) = P\left(z < \dfrac{.05 - .03}{\sqrt{.03(1 - .03)/1{,}000}}\right) = P(z < 3.71) = .5 + .4999 = .9999 \) (using Table II, Appendix D)
c. \( P(\hat{p} > .025) = P\left(z > \dfrac{.025 - .03}{\sqrt{.03(1 - .03)/1{,}000}}\right) = P(z > -.93) = .5 + .3238 = .8238 \) (using Table II, Appendix D)

5.46 a. Let \( \hat{p}_H \) = sample proportion of Finnish citizens with high IQ who invest in the stock market. By the Central Limit Theorem, the sampling distribution of \( \hat{p}_H \) will be approximately normal since the sample size is sufficiently large, with \( \mu_{\hat{p}_H} = p_H = .44 \) and \( \sigma_{\hat{p}_H} = \sqrt{\dfrac{p_H(1 - p_H)}{n}} = \sqrt{\dfrac{.44(1 - .44)}{500}} = .0222 \).
\( P\left(\hat{p}_H > \tfrac{150}{500}\right) = P(\hat{p}_H > .3) = P\left(z > \dfrac{.3 - .44}{\sqrt{.44(1 - .44)/500}}\right) = P(z > -6.31) \approx .5 + .5 = 1 \) (using Table II, Appendix D)
b.
b. Let \(\hat p_A\) = sample proportion of Finnish citizens with average IQ who invest in the stock market. By the Central Limit Theorem, the sampling distribution of \(\hat p_A\) will be approximately normal since the sample size is sufficiently large, with \(\mu_{\hat p_A} = p_A = .26\) and \(\sigma_{\hat p_A} = \sqrt{\frac{p_A(1 - p_A)}{n}} = \sqrt{\frac{.26(1 - .26)}{500}} = .0196\).
\(P\left(\hat p_A > \frac{150}{500}\right) = P(\hat p_A > .3) = P\left(z > \frac{.3 - .26}{\sqrt{.26(1 - .26)/500}}\right) = P(z > 2.04) = .5 - .4793 = .0207\) (using Table II, Appendix D)
c. Let \(\hat p_L\) = sample proportion of Finnish citizens with low IQ who invest in the stock market. By the Central Limit Theorem, the sampling distribution of \(\hat p_L\) will be approximately normal since the sample size is sufficiently large, with \(\mu_{\hat p_L} = p_L = .14\) and \(\sigma_{\hat p_L} = \sqrt{\frac{p_L(1 - p_L)}{n}} = \sqrt{\frac{.14(1 - .14)}{500}} = .0155\).
\(P\left(\hat p_L > \frac{150}{500}\right) = P(\hat p_L > .3) = P\left(z > \frac{.3 - .14}{\sqrt{.14(1 - .14)/500}}\right) = P(z > 10.31) = .5 - .5 = 0\) (using Table II, Appendix D)

5.47
a. By the Central Limit Theorem, the sampling distribution of \(\hat p\) will be approximately normal since the sample size is sufficiently large, with \(\mu_{\hat p} = p = .4\) and \(\sigma_{\hat p} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{.4(1 - .4)}{50}} = .0693\).
\(P(\hat p > .6) = P\left(z > \frac{.6 - .4}{\sqrt{.4(1 - .4)/50}}\right) = P(z > 2.89) = .5 - .4981 = .0019\) (using Table II, Appendix D)
b. Since the probability of observing a value of \(\hat p\) larger than .6 is so small (.0019) and we observed \(\hat p = .62\), we would conclude that the true proportion of adult cell phone owners who download an “app” is not .4 but something larger than .4.
c. If the value \(\hat p = .62\) was obtained at a convention of the International Association for the Wireless Telecommunications Industry, then it is probably not representative of the population of all adult cell phone owners. Those who attend such a convention would tend to be more tech-savvy than the population of all adult cell phone owners, so \(\hat p = .62\) would be larger than what we would expect from the general population.
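The normal-approximation probabilities for \(\hat p\) in Exercises 5.42 through 5.47 all follow one pattern: standardize with \(\sigma_{\hat p} = \sqrt{p(1 - p)/n}\) and read the normal table. A minimal Python sketch of that pattern is below, assuming Python 3.8+ for `statistics.NormalDist`; the function name `phat_prob` is illustrative. Because it uses the exact normal CDF rather than Table II, its answers can differ from the table-based ones in the last digit.

```python
import math
from statistics import NormalDist


def phat_prob(p, n, lower=None, upper=None):
    """P(lower < p-hat < upper) under the normal approximation p-hat ~ N(p, sqrt(p(1 - p)/n)).

    Pass None for an open end of the interval.
    """
    dist = NormalDist(mu=p, sigma=math.sqrt(p * (1 - p) / n))
    lo = dist.cdf(lower) if lower is not None else 0.0
    hi = dist.cdf(upper) if upper is not None else 1.0
    return hi - lo


# Exercise 5.47(a): P(p-hat > .6) when p = .4 and n = 50 (table answer .0019)
print(round(phat_prob(0.4, 50, lower=0.6), 4))
# Exercise 5.44(a): P(.4 < p-hat < .5) when p = .45 and n = 500 (table answer .9756)
print(round(phat_prob(0.45, 500, lower=0.4, upper=0.5), 4))
```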
5.48 From Exercise 4.48, we defined the following events:
P: {hotel guest is aware of conservation program}
A: {hotel guest participates in conservation efforts}
Then, the probability of a hotel guest being aware of and participating in the hotel’s conservation efforts is \(p = P(P \cap A) = P(P \mid A)P(A) = .72(.66) = .4752\).
By the Central Limit Theorem, the sampling distribution of \(\hat p\) will be approximately normal since the sample size is sufficiently large, with \(\mu_{\hat p} = p = .4752\) and \(\sigma_{\hat p} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{.4752(1 - .4752)}{100}} = .0499\).
\(P\left(\hat p < \frac{42}{100}\right) = P(\hat p < .42) = P\left(z < \frac{.42 - .4752}{\sqrt{.4752(1 - .4752)/100}}\right) = P(z < -1.11) = .5 - .3665 = .1335\) (using Table II, Appendix D)

5.49
a. \(E(\hat p) = \mu_{\hat p} = p = .92\)
b. By the Central Limit Theorem, the sampling distribution of \(\hat p\) will be approximately normal since the sample size is sufficiently large, with \(\mu_{\hat p} = p = .92\) and \(\sigma_{\hat p} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{.92(1 - .92)}{1000}} = .0086\).
\(P\left(\hat p < \frac{900}{1000}\right) = P(\hat p < .9) = P\left(z < \frac{.9 - .92}{\sqrt{.92(1 - .92)/1000}}\right) = P(z < -2.33) = .5 - .4901 = .0099\) (using Table II, Appendix D)

5.50
a. As the sample size increases, the standard error decreases. This property is important because the larger the sample size, the less variable our estimator will be. Thus, as n increases, our estimator will tend to be closer to the parameter we are trying to estimate.
b. This would indicate that the statistic would not be a very good estimator of the parameter. If the standard error is not a function of the sample size, then a statistic based on one observation would be as good an estimator as a statistic based on 1000 observations.
c. \(\bar x\) would be preferred over A as an estimator of the population mean because the standard error of \(\bar x\) is smaller than the standard error of A.
d. The standard error of \(\bar x\) is \(\frac{\sigma}{\sqrt n} = \frac{10}{\sqrt{64}} = 1.25\) and the standard error of A is 2.5.
If the sample size is sufficiently large, the Central Limit Theorem says the distribution of \(\bar x\) is approximately normal. Using the Empirical Rule, approximately 68% of all the values of \(\bar x\) will fall between \(\mu - 1.25\) and \(\mu + 1.25\).
Approximately 95% of all the values of \(\bar x\) will fall between \(\mu - 2.50\) and \(\mu + 2.50\). Approximately all of the values of \(\bar x\) will fall between \(\mu - 3.75\) and \(\mu + 3.75\).
Using the Empirical Rule, approximately 68% of all the values of A will fall between \(\mu - 2.50\) and \(\mu + 2.50\). Approximately 95% of all the values of A will fall between \(\mu - 5.00\) and \(\mu + 5.00\). Approximately all of the values of A will fall between \(\mu - 7.50\) and \(\mu + 7.50\).

5.51
a. “The sampling distribution of the sample statistic A” is the probability distribution of the variable A.
b. A is an unbiased estimator of \(\theta\) if the mean of the sampling distribution of A is \(\theta\).
c. If both A and B are unbiased estimators of \(\theta\), then the statistic whose standard deviation is smaller is a better estimator of \(\theta\).
d. No. The Central Limit Theorem applies only to the sample mean. If A is the sample mean, \(\bar x\), and n is sufficiently large, then the Central Limit Theorem will apply. However, A and B cannot both be sample means. Thus, we cannot apply the Central Limit Theorem to both A and B.

5.52
a. First we must compute \(\mu\) and \(\sigma\). The probability distribution for x is:
x       1    2    3    4
p(x)    .3   .2   .2   .3
\(\mu = E(x) = \sum x p(x) = 1(.3) + 2(.2) + 3(.2) + 4(.3) = 2.5\)
\(\sigma^2 = E[(x - \mu)^2] = \sum (x - \mu)^2 p(x) = (1 - 2.5)^2(.3) + (2 - 2.5)^2(.2) + (3 - 2.5)^2(.2) + (4 - 2.5)^2(.3) = 1.45\)
\(\mu_{\bar x} = \mu = 2.5\), \(\sigma_{\bar x} = \frac{\sigma}{\sqrt n} = \frac{\sqrt{1.45}}{\sqrt{40}} = .1904\)
b. By the Central Limit Theorem, the distribution of \(\bar x\) is approximately normal because the sample size, n = 40, is sufficiently large. Yes, the answer depends on the sample size.

5.53 By the Central Limit Theorem, the sampling distribution of \(\bar x\) is approximately normal with \(\mu_{\bar x} = \mu = 19.6\) and \(\sigma_{\bar x} = \frac{\sigma}{\sqrt n} = \frac{3.2}{\sqrt{68}} = .388\).
a. \(P(\bar x \le 19.6) = P\left(z \le \frac{19.6 - 19.6}{.388}\right) = P(z \le 0) = .5\)
b. \(P(\bar x \le 19) = P\left(z \le \frac{19 - 19.6}{.388}\right) = P(z \le -1.55) = .5 - .4394 = .0606\) (using Table II, Appendix D)
c. \(P(\bar x \ge 20.1) = P\left(z \ge \frac{20.1 - 19.6}{.388}\right) = P(z \ge 1.29) = .5 - .4015 = .0985\) (using Table II, Appendix D)
d. \(P(19.2 \le \bar x \le 20.6) = P\left(\frac{19.2 - 19.6}{.388} \le z \le \frac{20.6 - 19.6}{.388}\right) = P(-1.03 \le z \le 2.58) = .3485 + .4951 = .8436\) (using Table II, Appendix D)
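For Exercises 5.52 and 5.53, the mean, variance, and standard error of a discrete distribution, and the normal-approximation probability for \(\bar x\), can be checked with the following Python sketch (standard library only, assuming Python 3.8+ for `statistics.NormalDist`). It mirrors the hand calculations above; the final probability uses the exact normal CDF, so it may differ slightly from the Table II value .8436.

```python
import math
from statistics import NormalDist

# Probability distribution of x from Exercise 5.52
dist = {1: 0.3, 2: 0.2, 3: 0.2, 4: 0.3}

mu = sum(x * px for x, px in dist.items())               # mu = E(x)
var = sum((x - mu) ** 2 * px for x, px in dist.items())  # sigma^2 = E[(x - mu)^2]
n = 40
se_xbar = math.sqrt(var) / math.sqrt(n)                  # sigma_xbar = sigma / sqrt(n)
print(round(mu, 4), round(var, 4), round(se_xbar, 4))  # 2.5 1.45 0.1904

# Exercise 5.53(d): P(19.2 <= x-bar <= 20.6), x-bar approximately N(19.6, 3.2 / sqrt(68))
xbar_dist = NormalDist(19.6, 3.2 / math.sqrt(68))
print(round(xbar_dist.cdf(20.6) - xbar_dist.cdf(19.2), 4))
```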
5.54
a. \(\mu_{\hat p} = p = .35\) and \(\sigma_{\hat p} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{.35(1 - .35)}{500}} = .0213\).
b. By the Central Limit Theorem, the sampling distribution of \(\hat p\) will be approximately normal since the sample size is sufficiently large.

5.55 By the Central Limit Theorem, the sampling distribution of \(\hat p\) will be approximately normal since the sample size is sufficiently large, with \(\mu_{\hat p} = p = .8\) and \(\sigma_{\hat p} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{.8(1 - .8)}{300}} = .0231\).
a. \(P(\hat p < .83) = P\left(z < \frac{.83 - .8}{\sqrt{.8(1 - .8)/300}}\right) = P(z < 1.30) = .5 + .4032 = .9032\) (using Table II, Appendix D)
b. \(P(\hat p > .75) = P\left(z > \frac{.75 - .8}{\sqrt{.8(1 - .8)/300}}\right) = P(z > -2.17) = .5 + .4850 = .9850\) (using Table II, Appendix D)
c. \(P(.79 < \hat p < .81) = P\left(\frac{.79 - .8}{\sqrt{.8(1 - .8)/300}} < z < \frac{.81 - .8}{\sqrt{.8(1 - .8)/300}}\right) = P(-.43 < z < .43) = .1664 + .1664 = .3328\) (using Table II, Appendix D)

5.56 Answers will vary. One hundred samples of size n = 2 were selected from a normal distribution with a mean of 100 and a standard deviation of 10. The process was repeated for samples of size n = 5, n = 10, n = 30, and n = 50. For each sample, the value of \(\bar x\) was computed. Using MINITAB, the histograms for each set of 100 \(\bar x\)’s were constructed:
[Figure: Histogram of xbar2, xbar5, xbar10, xbar30, xbar50 (MINITAB panels with normal fit)]
Variable   Mean    StDev   N
xbar2      101.1   6.614   100
xbar5      99.70   6.278   100
xbar10     99.73   3.249   100
xbar30     100.2   2.040   100
xbar50     100.1   1.512   100
The sampling distribution of \(\bar x\) is normal regardless of the sample size because the population we sampled from was normal. Notice that as the sample size n increases, the variances of the sampling distributions decrease.

5.57 Answers will vary. One hundred samples of size n = 2 were selected from a uniform distribution on the interval from 0 to 10.
The process was repeated for samples of size n = 5, n = 10, n = 30, and n = 50. For each sample, the value of \(\bar x\) was computed. Using MINITAB, the histograms for each set of 100 \(\bar x\)’s were constructed:
[Figure: Histogram of xbar2, xbar5, xbar10, xbar30, xbar50 (MINITAB panels with normal fit)]
Variable   Mean    StDev    N
xbar2      4.935   2.073    100
xbar5      4.828   1.610    100
xbar10     5.004   0.9256   100
xbar30     5.010   0.5652   100
xbar50     4.998   0.4323   100
For small sizes of n, the sampling distributions of \(\bar x\) are only roughly normal. As n increases, the sampling distributions of \(\bar x\) become more nearly normal.

5.58
a. Tossing a coin two times can result in:
2 heads (2 ones)
2 tails (2 zeros)
1 head, 1 tail (1 one, 1 zero)
b. \(\bar x_{\text{2 heads}} = \frac{1 + 1}{2} = 1\); \(\bar x_{\text{2 tails}} = \frac{0 + 0}{2} = 0\); \(\bar x_{\text{1H,1T}} = \frac{1 + 0}{2} = \frac{1}{2}\)
c. \(\hat p_{\text{2 heads}} = \frac{2}{2} = 1\); \(\hat p_{\text{2 tails}} = \frac{0}{2} = 0\); \(\hat p_{\text{1H,1T}} = \frac{1}{2}\)
d. There are four possible combinations for one coin tossed two times, as shown below:
Coin Tosses   p̂
H, H          1
H, T          1/2
T, H          1/2
T, T          0
The sampling distribution of p̂ is therefore:
p̂      p(p̂)
0       1/4
1/2     1/2
1       1/4
e. The sampling distribution of \(\hat p\) is given in the histogram shown.
[Figure: Histogram of p-hat (p(p-hat) vs. p-hat)]

5.59 Given: \(\mu = 100\) and \(\sigma = 10\)
n     σ/√n
1     10
5     4.472
10    3.162
20    2.236
30    1.826
40    1.581
50    1.414
The graph of \(\sigma/\sqrt n = 10/\sqrt n\) against n is given here:
[Figure: Scatterplot of st err vs n]

5.60
a. \(\mu_{\bar x} = \mu = 141\)
b. \(\sigma_{\bar x} = \frac{\sigma}{\sqrt n} = \frac{18}{\sqrt{100}} = 1.8\)
c. By the Central Limit Theorem, the sampling distribution of \(\bar x\) is approximately normal.
d. \(z = \frac{\bar x - \mu_{\bar x}}{\sigma_{\bar x}} = \frac{142 - 141}{1.8} = 0.56\)
e. \(P(\bar x > 142) = P(z > 0.56) = .5 - .2123 = .2877\) (using Table II, Appendix D)

5.61 By the Central Limit Theorem, the sampling distribution of \(\bar x\) is approximately normal with \(\mu_{\bar x} = 19\) and \(\sigma_{\bar x} = \frac{\sigma}{\sqrt n} = \frac{65}{\sqrt{100}} = 6.5\).
\(P(\bar x < 10) = P\left(z < \frac{10 - 19}{6.5}\right) = P(z < -1.38) = .5 - .4162 = .0838\) (using Table II, Appendix D)

5.62
a. For x to be a binomial random variable, the n trials must be identical. We can assume that the process of selecting a worker is identical from trial to trial. Each trial has two possible outcomes: a worker either missed work due to a back injury or did not. The probability of success must be the same from trial to trial; we can assume that the probability of missing work due to a back injury is constant. The trials must be independent of each other; we can assume that the outcome of one trial will not affect the outcome of any other. Thus, x is a binomial random variable.
b. From the information given in the problem, the estimate of p is .40.
c. \(\mu_{\hat p} = p = .4\) and \(\sigma_{\hat p} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{.4(1 - .4)}{100}} = .0490\)
d. \(P(\hat p < .38) = P\left(z < \frac{.38 - .4}{\sqrt{.4(1 - .4)/100}}\right) = P(z < -.41) = .5 - .1591 = .3409\) (using Table II, Appendix D)

5.63
a. \(E(\hat p) = \mu_{\hat p} = p = .60\)
b. \(\sigma_{\hat p} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{.6(1 - .6)}{75}} = .0566\)
c. By the Central Limit Theorem, the sampling distribution of \(\hat p\) will be approximately normal since the sample size is sufficiently large.
d. \(P(\hat p > .70) = P\left(z > \frac{.70 - .6}{\sqrt{.6(1 - .6)/75}}\right) = P(z > 1.77) = .5 - .4616 = .0384\) (using Table II, Appendix D)

5.64
a. \(\mu_{\bar x} = \mu = 89.34\); \(\sigma_{\bar x} = \frac{\sigma}{\sqrt n} = \frac{7.74}{\sqrt{35}} = 1.3083\)
b.
c. \(P(\bar x > 88) = P\left(z > \frac{88 - 89.34}{1.3083}\right) = P(z > -1.02) = .5 + .3461 = .8461\) (using Table II, Appendix D)
d. \(P(\bar x < 87) = P\left(z < \frac{87 - 89.34}{1.3083}\right) = P(z < -1.79) = .5 - .4633 = .0367\) (using Table II, Appendix D)

5.65
a. By the Central Limit Theorem, the sampling distribution of \(\bar x\) is approximately normal with \(\mu_{\bar x} = \mu\) and \(\sigma_{\bar x} = \sigma/\sqrt n = \sigma/\sqrt{50}\).
b. \(\mu_{\bar x} = 40\) and \(\sigma_{\bar x} = \sigma/\sqrt{50} = 12/\sqrt{50} = 1.6971\).
\(P(\bar x > 44) = P\left(z > \frac{44 - 40}{1.6971}\right) = P(z > 2.36) = .5 - .4909 = .0091\) (using Table II, Appendix D)
c. \(\mu \pm 2\frac{\sigma}{\sqrt n} = 40 \pm 2(1.6971) = 40 \pm 3.3942 \Rightarrow (36.6058, 43.3942)\)
\(P(36.6058 < \bar x < 43.3942) = P\left(\frac{36.6058 - 40}{1.6971} < z < \frac{43.3942 - 40}{1.6971}\right) = P(-2 < z < 2) = 2(.4772) = .9544\) (using Table II, Appendix D)

5.66
a. The mean diameter of the bearings, \(\mu\), is unknown, and the standard deviation is .001 inch.
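Assuming that the distribution of the diameters of the bearings is normal, the sampling distribution of the sample mean is also normal. The mean and standard deviation of the sampling distribution are:
\(\mu_{\bar x} = \mu\), \(\sigma_{\bar x} = \frac{\sigma}{\sqrt n} = \frac{.001}{\sqrt{25}} = .0002\)
Having the sample mean fall within .0001 inch of \(\mu\) implies \(|\bar x - \mu| \le .0001\), or \(-.0001 \le \bar x - \mu \le .0001\).
\(P(-.0001 \le \bar x - \mu \le .0001) = P\left(\frac{-.0001}{.0002} \le z \le \frac{.0001}{.0002}\right) = P(-.50 \le z \le .50) = .1915 + .1915 = .3830\) (using Table II, Appendix D)
b. The approximation is unlikely to be accurate. For the Central Limit Theorem to apply, the sample size must be sufficiently large. For a very skewed distribution, n = 25 is not sufficiently large, and thus the Central Limit Theorem will not apply.

5.67 From Exercise 5.66, \(\sigma = .001\). We must assume the Central Limit Theorem applies (n is only 25). Thus, the distribution of \(\bar x\) is approximately normal with \(\mu_{\bar x} = .501\) and \(\sigma_{\bar x} = \frac{\sigma}{\sqrt n} = \frac{.001}{\sqrt{25}} = .0002\). Using Table II, Appendix D,
\(P(\bar x < .4994) + P(\bar x > .5006) = P\left(z < \frac{.4994 - .501}{.0002}\right) + P\left(z > \frac{.5006 - .501}{.0002}\right) = P(z < -8) + P(z > -2) = (.5 - .5) + (.5 + .4772) = .9772\)

5.68
a. By the Central Limit Theorem, the sampling distribution of \(\bar x\) is approximately normal with \(\mu_{\bar x} = \mu\) and \(\sigma_{\bar x} = \frac{\sigma}{\sqrt n} = \frac{6}{\sqrt{344}} = .3235\).
b. If \(\mu = 18.5\), \(P(\bar x \ge 19.1) = P\left(z \ge \frac{19.1 - 18.5}{.3235}\right) = P(z \ge 1.85) = .5 - .4678 = .0322\) (using Table II, Appendix D)
c. If \(\mu = 19.5\), \(P(\bar x \ge 19.1) = P\left(z \ge \frac{19.1 - 19.5}{.3235}\right) = P(z \ge -1.24) = .5 + .3925 = .8925\) (using Table II, Appendix D)
d. If \(\mu = 19.1\), \(P(\bar x \ge 19.1) = P\left(z \ge \frac{19.1 - 19.1}{.3235}\right) = P(z \ge 0) = .5\)
e. We know that \(P(z \ge 0) = .5\), which corresponds to \(\mu = 19.1\). If \(P(\bar x \ge 19.1) = P\left(z \ge \frac{19.1 - \mu}{.3235}\right) = .2\), then, because .2 < .5, the value \(\frac{19.1 - \mu}{.3235}\) must be greater than 0. Thus, \(\mu\) must be less than 19.1: whenever \(\mu < 19.1\), \(P(\bar x \ge 19.1) < .5\).

5.69
a. \(E(\hat p) = \mu_{\hat p} = p = .2\) and \(\sigma_{\hat p} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{.2(1 - .2)}{250}} = .0253\)
b. \(E(\hat p) \pm 2\sigma_{\hat p} = .2 \pm 2(.0253) = .2 \pm .0506 \Rightarrow (.1494, .2506)\)
c. By the Central Limit Theorem, the sampling distribution of \(\hat p\) will be approximately normal since the sample size is sufficiently large. Thus,
\(P(.1494 < \hat p < .2506) = P\left(\frac{.1494 - .2}{.0253} < z < \frac{.2506 - .2}{.0253}\right) = P(-2 < z < 2) = .4772 + .4772 = .9544\)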
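Part c of Exercise 5.69 relies on the normal approximation. As a sketch (Python 3.8+, standard library only), the code below computes the \(\pm 2\sigma_{\hat p}\) interval and its approximate probability and, as an added comparison that is not part of the original solution, the exact binomial probability that \(\hat p = x/n\) falls strictly inside the same interval. The exact normal CDF gives .9545 where Table II gives .9544.

```python
import math
from statistics import NormalDist

p, n = 0.2, 250                               # Exercise 5.69
se = math.sqrt(p * (1 - p) / n)               # standard error of p-hat
lower, upper = p - 2 * se, p + 2 * se         # E(p-hat) +/- 2 standard errors

# Normal approximation justified by the Central Limit Theorem
phat_dist = NormalDist(p, se)
approx = phat_dist.cdf(upper) - phat_dist.cdf(lower)

# Exact binomial probability that p-hat = x/n lands strictly inside (lower, upper)
exact = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
            for k in range(n + 1) if lower < k / n < upper)

print(f"interval: ({lower:.4f}, {upper:.4f})")
print(f"normal approximation: {approx:.4f}, exact binomial: {exact:.4f}")
```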
5.70
a. By the Central Limit Theorem, the sampling distribution of \(\bar x\) is approximately normal since \(n \ge 30\), with \(\mu_{\bar x} = 840\) and \(\sigma_{\bar x} = \frac{\sigma}{\sqrt n} = \frac{15}{\sqrt{50}} = 2.1213\).
b. \(P(\bar x \le 830) = P\left(z \le \frac{830 - 840}{2.1213}\right) = P(z \le -4.71) = .5 - .5 = 0\)
c. Since the probability of observing a mean of 830 or less is extremely small (\(\approx 0\)) if the true mean is 840, we would tend to believe that the mean is not 840, but something less.
d. By the Central Limit Theorem, the sampling distribution of \(\bar x\) is approximately normal since \(n \ge 30\), with \(\mu_{\bar x} = 840\) and \(\sigma_{\bar x} = \frac{\sigma}{\sqrt n} = \frac{45}{\sqrt{50}} = 6.3640\).
\(P(\bar x \le 830) = P\left(z \le \frac{830 - 840}{6.3640}\right) = P(z \le -1.57) = .5 - .4418 = .0582\)

5.71
a. Let \(p_1\) = probability of an error = 1/100 = .01 and \(p_2\) = probability of an error resulting in a significant problem = 1/500 = .002. Let \(\hat p_1\) = proportion of errors. Then \(E(\hat p_1) = \mu_{\hat p_1} = p_1 = .01\). Let \(\hat p_2\) = proportion of significant errors. Then \(E(\hat p_2) = \mu_{\hat p_2} = p_2 = .002\).
b. Since the distribution of \(\hat p_2\) will be approximately normal by the Central Limit Theorem, we would expect the proportion of significant errors to fall within 2 standard deviations of the expected value. The interval would be:
\(\mu_{\hat p_2} \pm 2\sigma_{\hat p_2} = .002 \pm 2\sqrt{\frac{.002(1 - .002)}{60{,}000}} = .002 \pm .00036 \Rightarrow (.00164, .00236)\)

5.72 Even though the number of flaws per piece of siding has a Poisson distribution, the Central Limit Theorem implies that the distribution of the sample mean will be approximately normal with \(\mu_{\bar x} = 2.5\) and \(\sigma_{\bar x} = \frac{\sigma}{\sqrt n} = \frac{\sqrt{2.5}}{\sqrt{35}} = .2673\). Therefore,
\(P(\bar x > 2.1) = P\left(z > \frac{2.1 - 2.5}{\sqrt{2.5}/\sqrt{35}}\right) = P(z > -1.50) = .5 + .4332 = .9332\) (using Table II, Appendix D)

5.73
a. If x is an exponential random variable with mean 60, then \(E(x) = 60\), and the standard deviation of x is also 60. Then \(E(\bar x) = \mu_{\bar x} = 60\) and \(V(\bar x) = \sigma_{\bar x}^2 = \frac{\sigma^2}{n} = \frac{60^2}{100} = 36\).
b. Because the sample size is fairly large, the Central Limit Theorem says that the sampling distribution of \(\bar x\) is approximately normal.
c. \(P(\bar x \le 30) = P\left(z \le \frac{30 - 60}{\sqrt{36}}\right) = P(z \le -5.0) = .5 - .5 = 0\) (using Table II, Appendix D)

5.74
a. By the Central Limit Theorem, the distribution of \(\bar x\) is approximately normal, with \(\mu_{\bar x} = 157\) and \(\sigma_{\bar x} = \frac{\sigma}{\sqrt n} = \frac{3}{\sqrt{40}} = .474\).
The sample mean is 1.3 psi below 157, or \(\bar x = 157 - 1.3 = 155.7\).
\(P(\bar x \le 155.7) = P\left(z \le \frac{155.7 - 157}{.474}\right) = P(z \le -2.74) = .5 - .4969 = .0031\) (using Table II, Appendix D)
If the claim is true, it is very unlikely (probability = .0031) to observe a sample mean 1.3 psi below 157 psi. Thus, the actual population mean is probably not 157 but something lower.
b. If \(\mu = 156\): \(P(\bar x \le 155.7) = P\left(z \le \frac{155.7 - 156}{.474}\right) = P(z \le -.63) = .5 - .2357 = .2643\) (using Table II, Appendix D)
The observed sample is more likely if \(\mu = 156\) rather than \(\mu = 157\).
If \(\mu = 158\): \(P(\bar x \le 155.7) = P\left(z \le \frac{155.7 - 158}{.474}\right) = P(z \le -4.85) = .5 - .5 = 0\)
The observed sample is less likely if \(\mu = 158\) rather than \(\mu = 157\).
c. If \(\sigma = 2\), \(\sigma_{\bar x} = \frac{2}{\sqrt{40}} = .316\).
\(P(\bar x \le 155.7) = P\left(z \le \frac{155.7 - 157}{.316}\right) = P(z \le -4.11) = .5 - .5 = 0\) (using Table II, Appendix D)
The observed sample is less likely if \(\sigma = 2\) than if \(\sigma = 3\).
If \(\sigma = 6\), \(\sigma_{\bar x} = \frac{6}{\sqrt{40}} = .949\).
\(P(\bar x \le 155.7) = P\left(z \le \frac{155.7 - 157}{.949}\right) = P(z \le -1.37) = .5 - .4147 = .0853\) (using Table II, Appendix D)
The observed sample is more likely if \(\sigma = 6\) than if \(\sigma = 3\).

5.75 Answers will vary. We are to assume that the fecal bacteria concentrations of water specimens follow an approximately normal distribution. Suppose that the distribution of the fecal bacteria concentration at a beach is normal with a true mean of 360 and a standard deviation of 40. If only a single sample were selected, then the probability of getting an observation at the 400 level or higher would be:
\(P(x \ge 400) = P\left(z \ge \frac{400 - 360}{40}\right) = P(z \ge 1) = .5 - .3413 = .1587\) (using Table II, Appendix D)
Thus, even if the water is safe, the beach would be closed approximately 15.87% of the time.
On the other hand, if the mean were 440 and the standard deviation were still 40, then the probability of getting a single observation below the 400 level would be:
\(P(x < 400) = P\left(z < \frac{400 - 440}{40}\right) = P(z < -1) = .5 - .3413 = .1587\) (using Table II, Appendix D)
Thus, the beach would remain open approximately 15.87% of the time when it should be closed.
Now, suppose we took a random sample of 64 water specimens.
The sampling distribution of \(\bar x\) is approximately normal by the Central Limit Theorem with \(\mu_{\bar x} = \mu\) and \(\sigma_{\bar x} = \frac{\sigma}{\sqrt n} = \frac{40}{\sqrt{64}} = 5\).
If \(\mu = 360\), \(P(\bar x \ge 400) = P\left(z \ge \frac{400 - 360}{5}\right) = P(z \ge 8) = .5 - .5 = 0\). Thus, with samples of size 64, the beach would essentially never be closed when the water was actually safe.
If \(\mu = 440\), \(P(\bar x < 400) = P\left(z < \frac{400 - 440}{5}\right) = P(z < -8) = .5 - .5 = 0\). Thus, with samples of size 64, the beach would essentially never be left open when the water was actually unsafe.
The single-sample standard can lead to unsafe or inconvenient decisions, but a single specimen is much easier to collect than a sample of 64.

Chapter 6 Inferences Based on a Single Sample: Estimation with Confidence Intervals

6.1
a. For \(\alpha = .10\), \(\alpha/2 = .10/2 = .05\). \(z_{\alpha/2} = z_{.05}\) is the z-score with .05 of the area to the right of it. The area between 0 and \(z_{.05}\) is \(.5 - .05 = .4500\). Using Table II, Appendix D, \(z_{.05} = 1.645\).
b. For \(\alpha = .01\), \(\alpha/2 = .01/2 = .005\). \(z_{\alpha/2} = z_{.005}\) is the z-score with .005 of the area to the right of it. The area between 0 and \(z_{.005}\) is \(.5 - .005 = .4950\). Using Table II, Appendix D, \(z_{.005} = 2.575\).
c. For \(\alpha = .05\), \(\alpha/2 = .05/2 = .025\). \(z_{\alpha/2} = z_{.025}\) is the z-score with .025 of the area to the right of it. The area between 0 and \(z_{.025}\) is \(.5 - .025 = .4750\). Using Table II, Appendix D, \(z_{.025} = 1.96\).
d. For \(\alpha = .20\), \(\alpha/2 = .20/2 = .10\). \(z_{\alpha/2} = z_{.10}\) is the z-score with .10 of the area to the right of it. The area between 0 and \(z_{.10}\) is \(.5 - .10 = .4000\). Using Table II, Appendix D, \(z_{.10} = 1.28\).

6.2
a. \(z_{\alpha/2} = 1.96\). Using Table II, Appendix D, \(P(0 \le z \le 1.96) = .4750\). Thus, \(\alpha/2 = .5 - .4750 = .025\), \(\alpha = 2(.025) = .05\), and \(1 - \alpha = 1 - .05 = .95\). The confidence level is 100%(.95) = 95%.
b. \(z_{\alpha/2} = 1.645\). Using Table II, Appendix D, \(P(0 \le z \le 1.645) = .45\). Thus, \(\alpha/2 = .5 - .45 = .05\), \(\alpha = 2(.05) = .10\), and \(1 - \alpha = 1 - .10 = .90\). The confidence level is 100%(.90) = 90%.
c. \(z_{\alpha/2} = 2.575\). Using Table II, Appendix D, \(P(0 \le z \le 2.575) = .495\). Thus, \(\alpha/2 = .5 - .495 = .005\), \(\alpha = 2(.005) = .01\), and \(1 - \alpha = 1 - .01 = .99\). The confidence level is 100%(.99) = 99%.
d. \(z_{\alpha/2} = 1.282\). Using Table II, Appendix D, \(P(0 \le z \le 1.282) = .4\). Thus, \(\alpha/2 = .5 - .4 = .1\), \(\alpha = 2(.1) = .20\), and \(1 - \alpha = 1 - .20 = .80\). The confidence level is 100%(.80) = 80%.
e. \(z_{\alpha/2} = .99\). Using Table II, Appendix D, \(P(0 \le z \le .99) = .3389\). Thus, \(\alpha/2 = .5 - .3389 = .1611\), \(\alpha = 2(.1611) = .3222\), and \(1 - \alpha = 1 - .3222 = .6778\). The confidence level is 100%(.6778) = 67.78%.

6.3 For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The confidence interval is \(\bar x \pm z_{.025}\frac{\sigma}{\sqrt n}\):
a. \(28 \pm 1.96\frac{\sqrt{12}}{\sqrt{75}} = 28 \pm .784 \Rightarrow (27.216, 28.784)\)
b. \(102 \pm 1.96\frac{\sqrt{22}}{\sqrt{200}} = 102 \pm .65 \Rightarrow (101.35, 102.65)\)
c. \(15 \pm 1.96\frac{.3}{\sqrt{100}} = 15 \pm .0588 \Rightarrow (14.9412, 15.0588)\)
d. \(4.05 \pm 1.96\frac{.83}{\sqrt{100}} = 4.05 \pm .163 \Rightarrow (3.887, 4.213)\)
e. No. Since the sample size in each part was large (n ranged from 75 to 200), the Central Limit Theorem indicates that the sampling distribution of \(\bar x\) is approximately normal.

6.4
a. For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The confidence interval is:
\(\bar x \pm z_{.025}\frac{\sigma}{\sqrt n} \Rightarrow 25.9 \pm 1.96\frac{2.7}{\sqrt{90}} = 25.9 \pm .56 \Rightarrow (25.34, 26.46)\)
b. For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\). The confidence interval is:
\(\bar x \pm z_{.05}\frac{\sigma}{\sqrt n} \Rightarrow 25.9 \pm 1.645\frac{2.7}{\sqrt{90}} = 25.9 \pm .47 \Rightarrow (25.43, 26.37)\)
c. For confidence coefficient .99, \(\alpha = .01\) and \(\alpha/2 = .01/2 = .005\). From Table II, Appendix D, \(z_{.005} = 2.58\). The confidence interval is:
\(\bar x \pm z_{.005}\frac{\sigma}{\sqrt n} \Rightarrow 25.9 \pm 2.58\frac{2.7}{\sqrt{90}} = 25.9 \pm .73 \Rightarrow (25.17, 26.63)\)

6.5
a. For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The confidence interval is:
\(\bar x \pm z_{\alpha/2}\frac{s}{\sqrt n} \Rightarrow 26.2 \pm 1.96\frac{4.1}{\sqrt{70}} = 26.2 \pm .96 \Rightarrow (25.24, 27.16)\)
b. The confidence coefficient of .95 means that in repeated sampling, 95% of all confidence intervals constructed will include \(\mu\).
c. For confidence coefficient .99, \(\alpha = .01\) and \(\alpha/2 = .01/2 = .005\). From Table II, Appendix D, \(z_{.005} = 2.58\). The confidence interval is:
\(\bar x \pm z_{\alpha/2}\frac{s}{\sqrt n} \Rightarrow 26.2 \pm 2.58\frac{4.1}{\sqrt{70}} = 26.2 \pm 1.26 \Rightarrow (24.94, 27.46)\)
d. As the confidence coefficient increases, the width of the confidence interval also increases.
e. Yes. Since the sample size is 70, the Central Limit Theorem applies. This ensures that the distribution of \(\bar x\) is approximately normal, regardless of the original distribution.

6.6 If we were to repeatedly draw samples from the population and form the interval \(\bar x \pm 1.96\sigma_{\bar x}\) each time, approximately 95% of the intervals would contain \(\mu\). We have no way of knowing whether our interval estimate is one of the 95% that contain \(\mu\) or one of the 5% that do not.

6.7 A point estimator is a single value used to estimate the parameter, \(\mu\). An interval estimator is two values, an upper and a lower bound, which define an interval within which we attempt to enclose the parameter, \(\mu\). An interval estimate also has a measure of confidence associated with it.

6.8
a. For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The confidence interval is:
\(\bar x \pm z_{.025}\frac{s}{\sqrt n} \Rightarrow 33.9 \pm 1.96\frac{3.3}{\sqrt{100}} = 33.9 \pm .647 \Rightarrow (33.253, 34.547)\)
b. \(\bar x \pm z_{.025}\frac{s}{\sqrt n} \Rightarrow 33.9 \pm 1.96\frac{3.3}{\sqrt{400}} = 33.9 \pm .323 \Rightarrow (33.577, 34.223)\)
c. For part a, the width of the interval is 2(.647) = 1.294. For part b, the width of the interval is 2(.323) = .646. When the sample size is quadrupled, the width of the confidence interval is halved.

6.9 Yes. As long as the sample size is sufficiently large, the Central Limit Theorem says the distribution of \(\bar x\) is approximately normal regardless of the original distribution.

6.10
a. The confidence coefficient that was used is .95.
b. Southwest: We are 95% confident that the true mean airfare for Southwest Airlines was between $412 and $496.
Delta: We are 95% confident that the true mean airfare for Delta Airlines was between $468 and $500.
USAir: We are 95% confident that the true mean airfare for USAir was between $247 and $372.
c. “95% confident” means that in repeated sampling, 95% of all confidence intervals constructed will contain the true mean.
d. To reduce the width of the confidence interval, one would use a smaller confidence coefficient. The smaller the confidence coefficient, the smaller the associated z-score, so a smaller quantity is added to and subtracted from \(\bar x\).

6.11
a. The point estimate of \(\mu\) is \(\bar x = 3.11\).
b. For confidence coefficient .98, \(\alpha = .02\) and \(\alpha/2 = .02/2 = .01\). From Table II, Appendix D, \(z_{.01} = 2.33\). The confidence interval is:
\(\bar x \pm z_{.01}\frac{\sigma}{\sqrt n} \Rightarrow 3.11 \pm 2.33\frac{.66}{\sqrt{307}} = 3.11 \pm .088 \Rightarrow (3.022, 3.198)\)
c. This statement is incorrect. Once the interval is constructed, there is no probability involved: the true mean is either in the interval or it is not. A better statement would be: “We are 98% confident that the true mean GPA is between 3.022 and 3.198.”
d. Since the sample size is so large (n = 307), the Central Limit Theorem applies. Thus, it does not matter whether the distribution of grades is skewed or not.

6.12
a. The 90% confidence interval is (66.350, 69.160).
b. For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\). The confidence interval is:
\(\bar x \pm z_{.05}\frac{\sigma}{\sqrt n} \Rightarrow 67.755 \pm 1.645\frac{26.871}{\sqrt{992}} = 67.755 \pm 1.403 \Rightarrow (66.352, 69.158)\)
This is close to the interval reported in the output.
c. We are 90% confident that the true mean level of support for all senior managers is between 66.350 and 69.160.
d. No. The 90% confidence interval does not contain 75. Therefore, 75 is not a likely value for the true mean.

6.13 For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\).
The 90% confidence interval is:
\(\bar x \pm z_{.05}\frac{s}{\sqrt n} \Rightarrow 6{,}563 \pm 1.645\frac{2{,}484}{\sqrt{1{,}751}} = 6{,}563 \pm 97.65 \Rightarrow (6{,}465.35, 6{,}660.65)\)
We are 90% confident that the true mean expense per full-time equivalent employee of all U.S. Army hospitals is between $6,465.35 and $6,660.65.

6.14
a. From the printout, the 95% confidence interval is (1.6711, 2.1989).
b. We are 95% confident that the true mean failure time of used colored display panels is between 1.6711 and 2.1989 years.
c. If 95% confidence intervals are formed repeatedly, approximately 95% of the intervals will contain the true mean failure time.

6.15
a. Using MINITAB, the descriptive statistics are:
Descriptive Statistics: Wheels
Variable   N    Mean    StDev   Minimum   Q1      Median   Q3      Maximum
Wheels     28   3.214   1.371   1.000     2.000   3.000    4.000   8.000
For confidence coefficient .99, \(\alpha = .01\) and \(\alpha/2 = .01/2 = .005\). From Table II, Appendix D, \(z_{.005} = 2.58\). The confidence interval is:
\(\bar x \pm z_{.005}\frac{\sigma}{\sqrt n} \Rightarrow 3.214 \pm 2.58\frac{1.371}{\sqrt{28}} = 3.214 \pm .668 \Rightarrow (2.546, 3.882)\)
b. We are 99% confident that the true mean number of wheels used on all social robots is between 2.546 and 3.882.
c. In repeated sampling, approximately 99% of all similarly constructed confidence intervals will contain the true mean.

6.16
a. The population of interest is all U.S. women who shop on Black Friday.
b. The quantitative variable of interest is the number of hours spent shopping.
c. Using MINITAB, the descriptive statistics are:
Descriptive Statistics: Hours
Variable   N    Mean    StDev   Minimum   Q1      Median   Q3      Maximum
Hours      38   6.079   2.755   3.000     4.000   5.000    7.250   16.000
For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The confidence interval is:
\(\bar x \pm z_{.025}\frac{\sigma}{\sqrt n} \Rightarrow 6.079 \pm 1.96\frac{2.755}{\sqrt{38}} = 6.079 \pm .876 \Rightarrow (5.203, 6.955)\)
d. We are 95% confident that the true mean number of hours spent shopping on Black Friday is between 5.203 and 6.955.
e. No. The confidence interval constructed in part c contains 5.5. Therefore, 5.5 is not an unusual value for the mean.
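The large-sample interval \(\bar x \pm z_{\alpha/2}\,s/\sqrt n\) used in Exercises 6.3 through 6.16 can be written as a small Python function. This is a sketch, assuming Python 3.8+ for `statistics.NormalDist`; the name `z_interval` is illustrative. It computes \(z_{\alpha/2}\) with `inv_cdf` instead of reading Table II (for example, about 2.576 where the solutions use 2.58), so endpoints can differ from the hand answers in the third decimal.

```python
import math
from statistics import NormalDist


def z_interval(xbar, sd, n, conf=0.95):
    """Large-sample confidence interval xbar +/- z_{alpha/2} * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    half_width = z * sd / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Exercise 6.15: 99% interval for the mean number of wheels
print(z_interval(3.214, 1.371, 28, conf=0.99))
# Exercise 6.16: 95% interval for the mean hours spent shopping
print(z_interval(6.079, 2.755, 38, conf=0.95))
```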
6.17
a. The target parameter is \(\mu\), the population mean salary of the 500 CEOs who participated in the Forbes’ survey.
b. Answers will vary. Using MINITAB, a sample of 50 CEOs was selected. The ranks of the 50 selected are: 9, 10, 14, 18, 19, 22, 25, 32, 38, 45, 49, 50, 55, 60, 66, 69, 77, 96, 104, 106, 115, 147, 152, 167, 192, 197, 209, 213, 229, 241, 245, 261, 268, 292, 305, 309, 325, 337, 342, 358, 364, 370, 376, 384, 405, 417, 423, 433, 470, 482.
c. Using MINITAB, the descriptive statistics are:
Descriptive Statistics: Pay ($mil)
Variable     N    Mean    StDev   Minimum   Q1     Median   Q3      Maximum
Pay ($mil)   50   12.40   10.34   0.940     4.20   7.18     19.47   37.90
The sample mean is \(\bar x = 12.40\) and the sample standard deviation is \(s = 10.34\).
d. Using MINITAB, the descriptive statistics for the entire data set are:
Descriptive Statistics: Pay ($mil)
Variable     N     Mean    StDev   Minimum   Q1      Median   Q3       Maximum
Pay ($mil)   478   9.247   9.842   0.000     3.413   6.100    11.346   101.965
From the above, the standard deviation of the population is $9.842 million.
e. For confidence coefficient .99, \(\alpha = .01\) and \(\alpha/2 = .01/2 = .005\). From Table II, Appendix D, \(z_{.005} = 2.58\). The confidence interval is:
\(\bar x \pm z_{.005}\frac{\sigma}{\sqrt n} \Rightarrow 12.40 \pm 2.58\frac{9.84}{\sqrt{50}} = 12.40 \pm 3.59 \Rightarrow (8.81, 15.99)\)
f. We are 99% confident that the true mean salary of all 500 CEOs in the Forbes’ survey is between $8.81 million and $15.99 million.
g. From part d, the true mean salary of all 500 CEOs is $9.247 million. This value does fall within the 99% confidence interval that we found in part e.

6.18
a. Using MINITAB, the descriptive statistics are:
Descriptive Statistics: Rate
Variable   N    Mean    Median   StDev   Minimum   Maximum   Q1      Q3
Rate       30   79.73   80.00    5.96    60.00     90.00     76.75   84.00
For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\). The confidence interval is:
\(\bar x \pm z_{.05}\frac{s}{\sqrt n} \Rightarrow 79.73 \pm 1.645\frac{5.96}{\sqrt{30}} = 79.73 \pm 1.79 \Rightarrow (77.94, 81.52)\)
b. We are 90% confident that the mean participation rate for all companies that have 401(k) plans is between 77.94% and 81.52%.
c. We must assume that the sample size (n = 30) is sufficiently large so that the Central Limit Theorem applies.
d. Yes. Since 71% is not included in the 90% confidence interval, it can be concluded that this company's participation rate is lower than the population mean.
e. The center of the confidence interval is \(\bar x\). If 60% is changed to 80%, the value of \(\bar x\) will increase, so the center point will be larger. The value of \(s^2\) will decrease if 60% is replaced by 80%, causing the width of the interval to decrease.

6.19
a. An estimate of the true mean Mach rating score of all purchasing managers is \(\bar x = 99.6\).
b. For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The 95% confidence interval is:
\(\bar x \pm z_{\alpha/2}\frac{s}{\sqrt n} \Rightarrow 99.6 \pm 1.96\frac{12.6}{\sqrt{122}} = 99.6 \pm 2.24 \Rightarrow (97.36, 101.84)\)
c. We are 95% confident that the true mean Mach rating score of all purchasing managers is between 97.36 and 101.84.
d. Yes, there is evidence to dispute this claim. We are 95% confident that the true mean Mach rating score is between 97.36 and 101.84. It would be very unlikely that the true mean Mach score is as low as 85.

6.20
a. For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The confidence interval is:
\(\bar x \pm z_{.025}\frac{\sigma}{\sqrt n} \Rightarrow 1.96 \pm 1.96\frac{.15}{\sqrt{55}} = 1.96 \pm .04 \Rightarrow (1.92, 2.00)\)
b. No. The value of 2.2 does not fall in the 95% confidence interval. Therefore, it is not a likely value for the true mean facial WHR.

6.21 To answer the question, we will first form 90% confidence intervals for each of the 2 SAT scores. For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\).
The confidence interval for SAT-Mathematics scores is:
\[\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} \Rightarrow 19 \pm 1.645\frac{65}{\sqrt{265}} \Rightarrow 19 \pm 6.57 \Rightarrow (12.43,\ 25.57)\]
We are 90% confident that the mean change in SAT-Mathematics score is between 12.43 and 25.57 points.

The confidence interval for SAT-Verbal scores is:
\[\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} \Rightarrow 7 \pm 1.645\frac{49}{\sqrt{265}} \Rightarrow 7 \pm 4.95 \Rightarrow (2.05,\ 11.95)\]
We are 90% confident that the mean change in SAT-Verbal score is between 2.05 and 11.95 points.

The SAT-Mathematics test is the more likely of the two to have 15 as the mean change in score. The value 15 lies in the 90% confidence interval for the mean change in SAT-Mathematics score. It does not lie in the 90% confidence interval for the mean change in SAT-Verbal score.

6.22 \(\bar{x} = \frac{11{,}298}{5{,}000} = 2.26\)
For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The confidence interval is:
\[\bar{x} \pm z_{.025}\frac{s}{\sqrt{n}} \Rightarrow 2.26 \pm 1.96\frac{1.5}{\sqrt{5{,}000}} \Rightarrow 2.26 \pm .04 \Rightarrow (2.22,\ 2.30)\]
We are 95% confident that the mean number of roaches produced per roach per week is between 2.22 and 2.30.

6.23 a. For confidence coefficient .80, \(\alpha = .20\) and \(\alpha/2 = .20/2 = .10\). From Table II, Appendix D, \(z_{.10} = 1.28\). From Table III, with \(df = n - 1 = 5 - 1 = 4\), \(t_{.10} = 1.533\).

b. For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\). From Table III, with \(df = 4\), \(t_{.05} = 2.132\).

c. For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). From Table III, with \(df = 4\), \(t_{.025} = 2.776\).

d. For confidence coefficient .98, \(\alpha = .02\) and \(\alpha/2 = .02/2 = .01\). From Table II, Appendix D, \(z_{.01} = 2.33\). From Table III, with \(df = 4\), \(t_{.01} = 3.747\).

e. For confidence coefficient .99, \(\alpha = .01\) and \(\alpha/2 = .01/2 = .005\). From Table II, Appendix D, \(z_{.005} = 2.575\). From Table III, with \(df = 4\), \(t_{.005} = 4.604\).

f. Both the t- and z-distributions are symmetric around 0 and mound-shaped. The t-distribution is more spread out than the z-distribution.

6.24 a. If x is normally distributed, the sampling distribution of \(\bar{x}\) is normal, regardless of the sample size.

b. If nothing is known about the distribution of x, the sampling distribution of \(\bar{x}\) is approximately normal if n is sufficiently large. If n is not large and the distribution of x is not known, the distribution of \(\bar{x}\) is unknown.

6.25 a. \(P(-t_0 < t < t_0) = .95\) where df = 10. Because of symmetry, the statement can be written \(P(0 < t < t_0) = .475\), so \(P(t > t_0) = .025\) and \(t_0 = 2.228\).

b. \(P(t \le -t_0 \text{ or } t \ge t_0) = .05\) where df = 10. By symmetry, \(2P(t \ge t_0) = .05 \Rightarrow P(t \ge t_0) = .025 \Rightarrow t_0 = 2.228\).

c. \(P(t \le t_0) = .05\) where df = 10. Because of symmetry, the statement can be written \(P(t \ge -t_0) = .05\), so \(-t_0 = 1.812\) and \(t_0 = -1.812\).

d. \(P(t \le -t_0 \text{ or } t \ge t_0) = .10\) where df = 20. By symmetry, \(2P(t \ge t_0) = .10 \Rightarrow P(t \ge t_0) = .05 \Rightarrow t_0 = 1.725\).

e. \(P(t \le -t_0 \text{ or } t \ge t_0) = .01\) where df = 5. By symmetry, \(2P(t \ge t_0) = .01 \Rightarrow P(t \ge t_0) = .005 \Rightarrow t_0 = 4.032\).

6.26 a. \(P(t \ge t_0) = .025\) where df = 11; \(t_0 = 2.201\).

b. \(P(t \ge t_0) = .01\) where df = 9; \(t_0 = 2.821\).

c. \(P(t \le t_0) = .005\) where df = 6. Because of symmetry, the statement can be rewritten \(P(t \ge -t_0) = .005\), so \(t_0 = -3.707\).

d. \(P(t \ge t_0) = .05\) where df = 18; \(t_0 = 1.734\).

6.27 First, we must compute \(\bar{x}\) and s.
\[\bar{x} = \frac{\sum x}{n} = \frac{30}{6} = 5, \qquad s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{176 - \frac{30^2}{6}}{6 - 1} = \frac{26}{5} = 5.2, \qquad s = \sqrt{5.2} = 2.2804\]

a. For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .05\). From Table III, Appendix D, with \(df = n - 1 = 6 - 1 = 5\), \(t_{.05} = 2.015\). The 90% confidence interval is:
\[\bar{x} \pm t_{.05}\frac{s}{\sqrt{n}} \Rightarrow 5 \pm 2.015\frac{2.2804}{\sqrt{6}} \Rightarrow 5 \pm 1.88 \Rightarrow (3.12,\ 6.88)\]

b. For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .025\). From Table III, Appendix D, with df = 5, \(t_{.025} = 2.571\). The 95% confidence interval is:
\[\bar{x} \pm t_{.025}\frac{s}{\sqrt{n}} \Rightarrow 5 \pm 2.571\frac{2.2804}{\sqrt{6}} \Rightarrow 5 \pm 2.39 \Rightarrow (2.61,\ 7.39)\]

c. For confidence coefficient .99, \(\alpha = .01\) and \(\alpha/2 = .005\). From Table III, Appendix D, with df = 5, \(t_{.005} = 4.032\). The 99% confidence interval is:
\[\bar{x} \pm t_{.005}\frac{s}{\sqrt{n}} \Rightarrow 5 \pm 4.032\frac{2.2804}{\sqrt{6}} \Rightarrow 5 \pm 3.75 \Rightarrow (1.25,\ 8.75)\]

d. a) For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .05\). From Table III, Appendix D, with \(df = n - 1 = 25 - 1 = 24\), \(t_{.05} = 1.711\). The 90% confidence interval is:
\[\bar{x} \pm t_{.05}\frac{s}{\sqrt{n}} \Rightarrow 5 \pm 1.711\frac{2.2804}{\sqrt{25}} \Rightarrow 5 \pm .78 \Rightarrow (4.22,\ 5.78)\]
b) For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .025\). From Table III, Appendix D, with df = 24, \(t_{.025} = 2.064\). The 95% confidence interval is:
\[\bar{x} \pm t_{.025}\frac{s}{\sqrt{n}} \Rightarrow 5 \pm 2.064\frac{2.2804}{\sqrt{25}} \Rightarrow 5 \pm .94 \Rightarrow (4.06,\ 5.94)\]
c) For confidence coefficient .99, \(\alpha = .01\) and \(\alpha/2 = .005\). From Table III, Appendix D, with df = 24, \(t_{.005} = 2.797\). The 99% confidence interval is:
\[\bar{x} \pm t_{.005}\frac{s}{\sqrt{n}} \Rightarrow 5 \pm 2.797\frac{2.2804}{\sqrt{25}} \Rightarrow 5 \pm 1.28 \Rightarrow (3.72,\ 6.28)\]
Increasing the sample size decreases the width of the confidence interval.

6.28 For this sample,
\[\bar{x} = \frac{\sum x}{n} = \frac{1567}{16} = 97.9375, \qquad s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{155{,}867 - \frac{1567^2}{16}}{16 - 1} = 159.9292, \qquad s = \sqrt{s^2} = 12.6463\]

a. For confidence coefficient .80, \(\alpha = .20\) and \(\alpha/2 = .10\). From Table III, Appendix D, with \(df = n - 1 = 16 - 1 = 15\), \(t_{.10} = 1.341\). The 80% confidence interval for \(\mu\) is:
\[\bar{x} \pm t_{.10}\frac{s}{\sqrt{n}} \Rightarrow 97.94 \pm 1.341\frac{12.6463}{\sqrt{16}} \Rightarrow 97.94 \pm 4.240 \Rightarrow (93.700,\ 102.180)\]

b. For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .025\). From Table III, Appendix D, with df = 15, \(t_{.025} = 2.131\). The 95% confidence interval for \(\mu\) is:
\[\bar{x} \pm t_{.025}\frac{s}{\sqrt{n}} \Rightarrow 97.94 \pm 2.131\frac{12.6463}{\sqrt{16}} \Rightarrow 97.94 \pm 6.737 \Rightarrow (91.203,\ 104.677)\]
The 95% confidence interval for \(\mu\) is wider than the 80% confidence interval for \(\mu\) found in part a.

c. For part a: We are 80% confident that the true population mean lies between 93.700 and 102.180. For part b: We are 95% confident that the true population mean lies between 91.203 and 104.677. The 95% confidence interval is wider than the 80% confidence interval. The more confident you want to be that \(\mu\) lies in an interval, the wider the range of possible values must be.

6.29 a. The target parameter is \(\mu\), the mean trap spacing for the population of red spiny lobster fishermen fishing in Baja California Sur, Mexico.

b.
Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Trap
Variable  N   Mean   StDev  Minimum  Q1     Median  Q3     Maximum
Trap      7   89.86  11.63  70.00    82.00  93.00   99.00  105.00

The point estimate of \(\mu\) is \(\bar{x} = 89.86\).

c. For this problem, the sample size is \(n = 7\). For such a small sample, the Central Limit Theorem does not apply. Therefore, we do not know what the sampling distribution of \(\bar{x}\) is.

d. For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table III, Appendix D, with \(df = n - 1 = 7 - 1 = 6\), \(t_{.025} = 2.447\). The 95% confidence interval is:
\[\bar{x} \pm t_{.025}\frac{s}{\sqrt{n}} \Rightarrow 89.86 \pm 2.447\frac{11.63}{\sqrt{7}} \Rightarrow 89.86 \pm 10.756 \Rightarrow (79.104,\ 100.616)\]

e. We are 95% confident that the true mean trap spacing for the population of red spiny lobster fishermen fishing in Baja California Sur, Mexico is between 79.104 and 100.616 meters.

f. We must assume that the population of trap spacings is normally distributed and that the sample is a random sample.

6.30 For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table III, Appendix D, with \(df = n - 1 = 12 - 1 = 11\), \(t_{.025} = 2.201\). The 95% confidence interval is:
\[\bar{x} \pm t_{.025}\frac{s}{\sqrt{n}} \Rightarrow 3{,}643 \pm 2.201\frac{4{,}487}{\sqrt{12}} \Rightarrow 3{,}643 \pm 2{,}850.92 \Rightarrow (792.08,\ 6{,}493.92)\]
We are 95% confident that the true mean level of radon exposure in tombs in the Valley of the Kings is between 792.08 and 6,493.92 Bq/m³.

6.31 For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table III, Appendix D, with \(df = n - 1 = 25 - 1 = 24\), \(t_{.05} = 1.711\). The 90% confidence interval is:
\[\bar{x} \pm t_{.05}\frac{s}{\sqrt{n}} \Rightarrow 75.4 \pm 1.711\frac{10.9}{\sqrt{25}} \Rightarrow 75.4 \pm 3.73 \Rightarrow (71.67,\ 79.13)\]
We are 90% confident that the true mean breaking strength of the white wood is between 71.67 and 79.13.

6.32 We must assume that the distribution of the LOSs for all patients is normal.

a. For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table III, Appendix D, with \(df = n - 1 = 20 - 1 = 19\), \(t_{.05} = 1.729\). The 90% confidence interval is:
\[\bar{x} \pm t_{.05}\frac{s}{\sqrt{n}} \Rightarrow 3.8 \pm 1.729\frac{1.2}{\sqrt{20}} \Rightarrow 3.8 \pm .464 \Rightarrow (3.336,\ 4.264)\]

b. We are 90% confident that the mean LOS is between 3.336 and 4.264 days.

c. "90% confidence" means that if repeated samples of size n are selected from the population and 90% confidence intervals are constructed, 90% of all intervals thus constructed will contain the population mean.

6.33 a. The 95% confidence interval for the mean surface roughness of coated interior pipe is (1.63580, 2.12620).

b. No. Since 2.5 does not fall in the 95% confidence interval, it would be very unlikely for the average surface roughness to be as high as 2.5 micrometers.

6.34 a. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: AAII
Variable  N   Mean   StDev  Minimum  Q1    Median  Q3     Maximum
AAII      13  10.82  7.71   -1.60    5.45  9.80    16.80  24.80

For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table III, Appendix D, with \(df = n - 1 = 13 - 1 = 12\), \(t_{.05} = 1.782\). The 90% confidence interval is:
\[\bar{x} \pm t_{.05}\frac{s}{\sqrt{n}} \Rightarrow 10.82 \pm 1.782\frac{7.71}{\sqrt{13}} \Rightarrow 10.82 \pm 3.81 \Rightarrow (7.01,\ 14.63)\]
We are 90% confident that the true average annualized percentage return on investment of all stock screeners provided by AAII is between 7.01 and 14.63.

b. Since the confidence interval in part a contains only positive values, the AAII stock screeners perform better than the S&P 500 on average.

c. We must assume that the annualized percentage returns on investment for all stock screeners are normally distributed and that the sample is random. Yes, this assumption seems to be satisfied. A histogram of the data is:
[Figure: MINITAB histogram, "Histogram of x" (Frequency vs. x)]
The distribution is fairly mound-shaped.

6.35 a. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Skid
Variable  N   Mean   StDev  Minimum  Q1     Median  Q3     Maximum
Skid      20  358.5  117.8  141.0    276.0  367.5   438.0  574.0

For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table III, Appendix D, with \(df = n - 1 = 20 - 1 = 19\), \(t_{.025} = 2.093\).
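Table III values such as \(t_{.025} = 2.093\) for df = 19 can also be checked without a table. The minimal standard-library sketch below integrates the Student's t density with Simpson's rule and uses bisection to find the upper-tail point. This numerical approach is a swapped-in illustration, not the book's procedure. In practice a library routine such as SciPy's `scipy.stats.t.ppf` does this directly. The function names are hypothetical, and the search assumes \(t_\alpha < 100\).

```python
import math


def t_upper_tail(t0: float, df: int, steps: int = 2000) -> float:
    """P(T >= t0) for t0 >= 0, via Simpson's rule on the t density over [0, t0]."""
    # Normalizing constant Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)), via lgamma
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x: float) -> float:
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t0 / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(t0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3  # symmetry: P(T >= 0) = .5


def t_critical(alpha: float, df: int) -> float:
    """t_alpha: the value with upper-tail area alpha (0 < alpha < .5), by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid  # tail area still too large: t_alpha lies further right
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_critical(0.025, 19), 3))  # 2.093, matching Table III
```

Lower-tail points follow from symmetry, as in 6.25c: \(P(t \le t_0) = .05\) with df = 10 gives `-t_critical(0.05, 10)`, about −1.812. Because the t density has heavier tails than the normal, each \(t_\alpha\) exceeds the matching \(z_\alpha\), which is the point of 6.23f.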
The 95% confidence interval is:
\[\bar{x} \pm t_{.025}\frac{s}{\sqrt{n}} \Rightarrow 358.5 \pm 2.093\frac{117.8}{\sqrt{20}} \Rightarrow 358.5 \pm 55.13 \Rightarrow (303.37,\ 413.63)\]

b. We are 95% confident that the mean skidding distance is between 303.37 and 413.63 meters.

c. For the inference to be valid, the skidding distances must come from a normal distribution. We will use the four methods to check for normality. First, we look at a histogram of the data. Using MINITAB, the histogram of the data is:
[Figure: MINITAB histogram, "Histogram of Skid" (Frequency vs. Skid)]
From the histogram, the data appear to be fairly mound-shaped. This indicates that the data may be normal.

Next, we look at the intervals \(\bar{x} \pm s\), \(\bar{x} \pm 2s\), and \(\bar{x} \pm 3s\). If the proportions of observations falling in these intervals are approximately .68, .95, and 1.00, then the data are approximately normal.
\(\bar{x} \pm s \Rightarrow 358.5 \pm 117.8 \Rightarrow (240.7,\ 476.3)\): 14 of the 20 values fall in this interval. The proportion is .70. This is very close to the .68 we would expect if the data were normal.
\(\bar{x} \pm 2s \Rightarrow 358.5 \pm 2(117.8) \Rightarrow 358.5 \pm 235.6 \Rightarrow (122.9,\ 594.1)\): 20 of the 20 values fall in this interval. The proportion is 1.00. This is larger than the .95 we would expect if the data were normal.
\(\bar{x} \pm 3s \Rightarrow 358.5 \pm 3(117.8) \Rightarrow 358.5 \pm 353.4 \Rightarrow (5.1,\ 711.9)\): 20 of the 20 values fall in this interval. The proportion is 1.00. This is exactly the 1.00 we would expect if the data were normal.
From this method, it appears that the data may be normal.

Next, we look at the ratio of the IQR to s: \(IQR = Q_U - Q_L = 438 - 276 = 162\), and \(\frac{IQR}{s} = \frac{162}{117.8} = 1.38\). This is fairly close to the 1.3 we would expect if the data were normal. This method indicates the data may be normal.
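The empirical-rule and IQR/s checks used in 6.35 and 6.36 can be automated for a raw sample. The minimal sketch below uses only Python's `statistics` module, and `normality_checks` is a hypothetical helper name. Quartiles come from `statistics.quantiles`, whose default "exclusive" method places them at the \((n+1)p\) positions. We believe this matches MINITAB's default quartile rule, but other software may give slightly different quartiles. The raw skid and MTBE data are not reproduced in this section, so the usage line only recomputes the IQR/s ratio from the printed MINITAB summary.

```python
from statistics import mean, quantiles, stdev


def normality_checks(data: list[float]) -> dict[str, float]:
    """Two quick normality heuristics from the text.

    Returns the proportions of observations within 1, 2 and 3 standard
    deviations of the mean (roughly .68, .95 and 1.00 for normal data)
    and the ratio IQR/s (roughly 1.3 for normal data).
    """
    xbar, s = mean(data), stdev(data)
    n = len(data)
    result = {}
    for k in (1, 2, 3):
        inside = sum(1 for x in data if xbar - k * s <= x <= xbar + k * s)
        result[f"within_{k}s"] = inside / n
    q1, _, q3 = quantiles(data, n=4)  # default 'exclusive' method: (n + 1)p positions
    result["iqr_over_s"] = (q3 - q1) / s
    return result


# Exercise 6.35 from the MINITAB summary: Q1 = 276.0, Q3 = 438.0, s = 117.8
print(round((438.0 - 276.0) / 117.8, 2))  # 1.38
```

These are screening heuristics only. The histogram and normal probability plot in the text remain the primary visual checks.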
Finally, using MINITAB, the normal probability plot is:
[Figure: MINITAB "Probability Plot of Skid" (Normal, 95% CI); Mean = 358.5, StDev = 117.8, N = 20, AD = 0.170, P-Value = 0.921]
Since the data form a fairly straight line, the data may be normal.

From above, all the methods indicate the data may be normal. It appears that the assumption that the data come from a normal distribution is probably valid.

d. No. A distance of 425 meters falls above the 95% confidence interval computed in part a. It would be very unlikely for the true mean skidding distance to be 425 meters or more.

6.36 a. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: MTBE
Variable  N   Mean   StDev   Minimum  Q1    Median  Q3     Maximum
MTBE      12  97.17  113.76  8.00     12.0  50.5    146.0  367.0

A point estimate for the true mean MTBE level for all well sites located near the New Jersey gasoline service station is \(\bar{x} = 97.17\).

b. For confidence coefficient .99, \(\alpha = .01\) and \(\alpha/2 = .01/2 = .005\). From Table III, Appendix D, with \(df = n - 1 = 12 - 1 = 11\), \(t_{.005} = 3.106\). The 99% confidence interval is:
\[\bar{x} \pm t_{.005}\frac{s}{\sqrt{n}} \Rightarrow 97.17 \pm 3.106\frac{113.76}{\sqrt{12}} \Rightarrow 97.17 \pm 102.00 \Rightarrow (-4.83,\ 199.17)\]
We are 99% confident that the true mean MTBE level for all well sites located near the New Jersey gasoline service station is between −4.83 and 199.17. (An MTBE level cannot be negative, so the practical lower limit is 0.)

c. We must assume that the data were sampled from a normal distribution. We will use the four methods to check for normality. First, we look at a histogram of the data. Using MINITAB, the histogram of the data is:
[Figure: MINITAB histogram, "Histogram of MTBE" (Frequency vs. MTBE)]
From the histogram, the data do not appear to be mound-shaped. This indicates that the data may not be normal.

Next, we look at the intervals \(\bar{x} \pm s\), \(\bar{x} \pm 2s\), and \(\bar{x} \pm 3s\).
If the proportions of observations falling in these intervals are approximately .68, .95, and 1.00, then the data are approximately normal.
\(\bar{x} \pm s \Rightarrow 97.17 \pm 113.76 \Rightarrow (-16.59,\ 210.93)\): 10 of the 12 values fall in this interval. The proportion is .83. This is not very close to the .68 we would expect if the data were normal.
\(\bar{x} \pm 2s \Rightarrow 97.17 \pm 2(113.76) \Rightarrow 97.17 \pm 227.52 \Rightarrow (-130.35,\ 324.69)\): 11 of the 12 values fall in this interval. The proportion is .92. This is somewhat smaller than the .95 we would expect if the data were normal.
\(\bar{x} \pm 3s \Rightarrow 97.17 \pm 3(113.76) \Rightarrow 97.17 \pm 341.28 \Rightarrow (-244.11,\ 438.45)\): 12 of the 12 values fall in this interval. The proportion is 1.00. This is exactly the 1.00 we would expect if the data were normal.
From this method, it appears that the data may not be normal.

Next, we look at the ratio of the IQR to s: \(IQR = Q_U - Q_L = 146.0 - 12.0 = 134.0\), and \(\frac{IQR}{s} = \frac{134.0}{113.76} = 1.18\). This is somewhat smaller than the 1.3 we would expect if the data were normal. This method indicates the data may not be normal.

Finally, using MINITAB, the normal probability plot is:
[Figure: MINITAB "Probability Plot of MTBE" (Normal, 95% CI); Mean = 97.17, StDev = 113.8, N = 12, AD = 0.929, P-Value = 0.012]
Since the data do not form a fairly straight line, the data may not be normal.

From above, all the methods indicate the data may not be normal. It appears that the data probably are not normal.

6.37 a. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Diox Amt
Variable  Crude  N   Mean   StDev  Minimum  Q1     Median  Q3     Maximum
Diox Amt  No     10  2.590  1.542  0.100    1.125  2.850   4.000  4.000
          Yes    6   0.517  0.407  0.200    0.200  0.450   0.700  1.300

For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table III, Appendix D, with \(df = n - 1 = 6 - 1 = 5\), \(t_{.025} = 2.571\).
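Once the critical value (here \(t_{.025} = 2.571\)) is read from Table III, the interval arithmetic in 6.27 through 6.39 reduces to the two small functions below. This is a sketch that assumes the caller supplies the t critical value from Table III, because Python's standard library has no t quantile function. The function names are illustrative. The shortcut formula for \(s^2\) is the one used in 6.27 and 6.28.

```python
from math import sqrt


def mean_sd_from_sums(n: int, sum_x: float, sum_x2: float) -> tuple[float, float]:
    """Sample mean and standard deviation from n, sum of x and sum of x^2."""
    xbar = sum_x / n
    s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)  # shortcut formula for s^2
    return xbar, sqrt(s2)


def t_interval(xbar: float, s: float, n: int, t_crit: float) -> tuple[float, float]:
    """Small-sample CI: xbar +/- t_{alpha/2} * s / sqrt(n).

    t_crit is t_{alpha/2} with n - 1 df (from a t table). Validity assumes
    a random sample from an approximately normal population.
    """
    me = t_crit * s / sqrt(n)
    return xbar - me, xbar + me


xbar, s = mean_sd_from_sums(6, 30, 176)     # Exercise 6.27: 5.0 and about 2.2804
print(t_interval(xbar, s, 6, 2.015))        # 90%: about (3.12, 6.88)
print(t_interval(0.517, 0.407, 6, 2.571))   # 6.37a, 95%: about (0.090, 0.944)
```

Keeping the critical value as an argument makes it easy to reproduce the manual's table-based answers exactly.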
The 95% confidence interval is:
\[\bar{x} \pm t_{.025}\frac{s}{\sqrt{n}} \Rightarrow .517 \pm 2.571\frac{.407}{\sqrt{6}} \Rightarrow .517 \pm .427 \Rightarrow (.090,\ .944)\]
We are 95% confident that the true mean amount of dioxide present in water specimens that contain oil is between .090 and .944 mg/l.

b. For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table III, Appendix D, with \(df = n - 1 = 10 - 1 = 9\), \(t_{.025} = 2.262\). The 95% confidence interval is:
\[\bar{x} \pm t_{.025}\frac{s}{\sqrt{n}} \Rightarrow 2.590 \pm 2.262\frac{1.542}{\sqrt{10}} \Rightarrow 2.590 \pm 1.103 \Rightarrow (1.487,\ 3.693)\]
We are 95% confident that the true mean amount of dioxide present in water specimens that do not contain oil is between 1.487 and 3.693 mg/l.

c. The confidence interval for the mean amount of dioxide in water specimens that contain oil lies entirely below the confidence interval for specimens that do not contain oil. Therefore, we can conclude that the mean amount of dioxide present in water containing oil is significantly less than the mean amount present in water not containing oil.

6.38 Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Comp
Variable  N   Mean  StDev  Minimum  Q1    Median  Q3    Maximum
COMP      10  1473  465    825      1120  1458    1758  2220

For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table III, Appendix D, with \(df = n - 1 = 10 - 1 = 9\), \(t_{.05} = 1.833\). The 90% confidence interval is:
\[\bar{x} \pm t_{.05}\frac{s}{\sqrt{n}} \Rightarrow 1473 \pm 1.833\frac{465}{\sqrt{10}} \Rightarrow 1473 \pm 269.54 \Rightarrow (1{,}203.46,\ 1{,}742.54)\]
We are 90% confident that the true mean threshold compensation level for all major airlines is between $1,203.46 and $1,742.54.

6.39 a. The population from which the sample was drawn is the Forbes 212 Biggest Private Companies.

b. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Revenue
Variable  N   Mean  StDev  Minimum  Q1    Median  Q3    Maximum
Revenue   15  5.39  4.86   2.01     2.33  2.80    6.99  17.77

For confidence coefficient .98, \(\alpha = .02\) and \(\alpha/2 = .02/2 = .01\). From Table III, Appendix D, with \(df = n - 1 = 15 - 1 = 14\), \(t_{.01} = 2.624\).
The 98% confidence interval is:
\[\bar{x} \pm t_{.01}\frac{s}{\sqrt{n}} \Rightarrow 5.39 \pm 2.624\frac{4.86}{\sqrt{15}} \Rightarrow 5.39 \pm 3.293 \Rightarrow (2.097,\ 8.683)\]

c. We are 98% confident that the mean revenue is between $2.097 billion and $8.683 billion.

d. The population must be normally distributed for the procedure used in part b to be valid.

e. Yes. The value of $5.0 billion falls in the 98% confidence interval computed in part b, so the claim is believable.

6.40 By the Central Limit Theorem, the sampling distribution of \(\hat{p}\) is approximately normal with mean \(\mu_{\hat{p}} = p\) and standard deviation \(\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}\).

6.41 The sample size is large enough if both \(n\hat{p} \ge 15\) and \(n\hat{q} \ge 15\).

a. When \(n = 400\) and \(\hat{p} = .10\): \(n\hat{p} = 400(.10) = 40\) and \(n\hat{q} = 400(.90) = 360\). Since both numbers are greater than or equal to 15, the sample size is sufficiently large to conclude the normal approximation is reasonable.

b. When \(n = 50\) and \(\hat{p} = .10\): \(n\hat{p} = 50(.10) = 5\) and \(n\hat{q} = 50(.90) = 45\). Since \(n\hat{p}\) is less than 15, the sample size is not large enough to conclude the normal approximation is reasonable.

c. When \(n = 20\) and \(\hat{p} = .5\): \(n\hat{p} = 20(.5) = 10\) and \(n\hat{q} = 20(.5) = 10\). Since both numbers are less than 15, the sample size is not large enough to conclude the normal approximation is reasonable.

d. When \(n = 20\) and \(\hat{p} = .3\): \(n\hat{p} = 20(.3) = 6\) and \(n\hat{q} = 20(.7) = 14\). Since both numbers are less than 15, the sample size is not large enough to conclude the normal approximation is reasonable.

6.42 a. The sample size is large enough if both \(n\hat{p} \ge 15\) and \(n\hat{q} \ge 15\). Here \(n\hat{p} = 121(.88) = 106.48\) and \(n\hat{q} = 121(.12) = 14.52\). Since \(n\hat{q}\) is less than 15, the sample size is not large enough to conclude the normal approximation is reasonable. However, 14.52 is very close to 15, so the normal approximation may work fairly well.

b. For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\). The 90% confidence interval is:
\[\hat{p} \pm z_{.05}\sqrt{\frac{pq}{n}} \approx \hat{p} \pm 1.645\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .88 \pm 1.645\sqrt{\frac{.88(.12)}{121}} \Rightarrow .88 \pm .049 \Rightarrow (.831,\ .929)\]

c. We must assume that the sample is a random sample from the population of interest and that the sample size is sufficiently large.

6.43 a. The sample size is large enough if both \(n\hat{p} \ge 15\) and \(n\hat{q} \ge 15\). Here \(n\hat{p} = 225(.46) = 103.5\) and \(n\hat{q} = 225(.54) = 121.5\). Since both numbers are greater than or equal to 15, the sample size is sufficiently large to conclude the normal approximation is reasonable.

b. For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The 95% confidence interval is:
\[\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .46 \pm 1.96\sqrt{\frac{.46(.54)}{225}} \Rightarrow .46 \pm .065 \Rightarrow (.395,\ .525)\]

c. We are 95% confident the true value of p falls between .395 and .525.

d. "95% confidence interval" means that if repeated samples of size 225 were selected from the population and 95% confidence intervals formed, 95% of all such intervals would contain the true value of p.

6.44 a. Of the 50 observations, 15 like the product, so \(\hat{p} = \frac{15}{50} = .30\). The sample size is large enough if both \(n\hat{p} \ge 15\) and \(n\hat{q} \ge 15\). Here \(n\hat{p} = 50(.3) = 15\) and \(n\hat{q} = 50(.7) = 35\). Since both numbers are greater than or equal to 15, the sample size is sufficiently large to conclude the normal approximation is reasonable.
For confidence coefficient .80, \(\alpha = .20\) and \(\alpha/2 = .20/2 = .10\). From Table II, Appendix D, \(z_{.10} = 1.28\). The confidence interval is:
\[\hat{p} \pm z_{.10}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .3 \pm 1.28\sqrt{\frac{.3(.7)}{50}} \Rightarrow .3 \pm .083 \Rightarrow (.217,\ .383)\]

b. We are 80% confident the proportion of all consumers who like the new snack food is between .217 and .383.

6.45 a. \(\hat{p} = \frac{x}{n} = \frac{818}{2{,}045} = .4\)

b. By the Central Limit Theorem, the sampling distribution of \(\hat{p}\) is approximately normal with \(\mu_{\hat{p}} = p\) and \(\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}\) if n is sufficiently large. The sample size is sufficiently large if \(n\hat{p} \ge 15\) and \(n\hat{q} \ge 15\). For this exercise, \(n\hat{p} = 2{,}045(.4) = 818\) and \(n\hat{q} = 2{,}045(.6) = 1{,}227\). Since both values are greater than 15, the sample size is sufficiently large.

c. For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\).
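The large-sample rule and the interval \(\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}\) used from 6.41 onward can be sketched in a few lines of Python. This assumes the standard library's `statistics.NormalDist` in place of Table II, and the function names are illustrative. Because \(n\hat{p} = x\) and \(n\hat{q} = n - x\), the rule is checked on the counts directly, which avoids floating-point rounding at the boundary of 15.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_ok(x: int, n: int) -> bool:
    """Rule of thumb from the text: n*p_hat >= 15 and n*q_hat >= 15."""
    return x >= 15 and n - x >= 15  # n*p_hat = x and n*q_hat = n - x


def proportion_interval(x: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Large-sample CI: p_hat +/- z_{alpha/2} * sqrt(p_hat * q_hat / n)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    me = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - me, p_hat + me


# Exercise 6.45: 818 of 2,045 homes
print(large_sample_ok(818, 2045))        # True
print(proportion_interval(818, 2045))    # about (0.379, 0.421)
```

When `large_sample_ok` returns False (as in 6.55, 6.57 and 6.58a), the text switches to Wilson's adjustment instead.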
The confidence interval is:
\[\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .4 \pm 1.96\sqrt{\frac{.4(.6)}{2{,}045}} \Rightarrow .4 \pm .021 \Rightarrow (.379,\ .421)\]

d. We are 95% confident that the true proportion of Arlington, Texas homes with market values that are overestimated by more than 10% by Zillow is between .379 and .421.

e. No, the claim is not believable. The 95% confidence interval constructed in part c does not contain .3. Thus, .3 is not a likely value for p.

6.46 a. \(\hat{p} = \frac{x}{n} = \frac{506}{1{,}003} = .504\)

b. For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\). The confidence interval is:
\[\hat{p} \pm z_{.05}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .504 \pm 1.645\sqrt{\frac{.504(.496)}{1{,}003}} \Rightarrow .504 \pm .026 \Rightarrow (.478,\ .530)\]

c. We are 90% confident that the true proportion of adults living in the U.S. who have paid to download music from the internet is between .478 and .530.

d. "90% confident" means that in repeated sampling, 90% of all intervals constructed in a similar manner will contain the true proportion.

6.47 a. The population of interest is all American adults.

b. The sample is the 1,000 adults surveyed.

c. The parameter of interest is p, the proportion of all American adults who think Starbucks coffee is overpriced.

d. The sample size is large enough if both \(n\hat{p} \ge 15\) and \(n\hat{q} \ge 15\). Here \(n\hat{p} = 1{,}000(.73) = 730\) and \(n\hat{q} = 1{,}000(.27) = 270\). Since both numbers are greater than or equal to 15, the sample size is sufficiently large to conclude the normal approximation is reasonable.
For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The 95% confidence interval is:
\[\hat{p} \pm z_{\alpha/2}\sqrt{\frac{pq}{n}} \approx \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .73 \pm 1.96\sqrt{\frac{.73(.27)}{1{,}000}} \Rightarrow .73 \pm .028 \Rightarrow (.702,\ .758)\]
We are 95% confident that the true proportion of all American adults who say Starbucks coffee is overpriced is between .702 and .758.

6.48 a. \(\hat{p} = \frac{x}{n} = \frac{63}{106} = .594\)
For confidence coefficient .99, \(\alpha = .01\) and \(\alpha/2 = .01/2 = .005\). From Table II, Appendix D, \(z_{.005} = 2.58\).
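Table II gives \(z_{.005} = 2.58\) here, while 6.23e, 6.73 and 6.77 use 2.575. Both are roundings of the same quantile, about 2.5758. The quick standard-library check below computes the z critical values used in this chapter. It assumes Python 3.8 or later for `statistics.NormalDist`, and `z_crit` is an illustrative name.

```python
from statistics import NormalDist


def z_crit(conf: float) -> float:
    """z_{alpha/2} for a two-sided interval with confidence level conf (alpha = 1 - conf)."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)


for conf in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"{conf:.0%} confidence: z = {z_crit(conf):.4f}")
# 80% confidence: z = 1.2816
# 90% confidence: z = 1.6449
# 95% confidence: z = 1.9600
# 98% confidence: z = 2.3263
# 99% confidence: z = 2.5758
```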
The confidence interval is:
\[\hat{p} \pm z_{.005}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .594 \pm 2.58\sqrt{\frac{.594(.406)}{106}} \Rightarrow .594 \pm .123 \Rightarrow (.471,\ .717)\]
We are 99% confident that the true proportion of all social robots designed with legs but no wheels is between .471 and .717.

b. Since .40 does not fall in the 99% confidence interval, it is very unlikely that the true proportion of all social robots designed with legs but no wheels is .40.

6.49 a. \(\hat{p} = \frac{x}{n} = \frac{1{,}298}{2{,}163} = .60\)

b. For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The confidence interval is:
\[\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .60 \pm 1.96\sqrt{\frac{.60(.40)}{2{,}163}} \Rightarrow .60 \pm .02 \Rightarrow (.58,\ .62)\]

c. We are 95% confident that the true proportion of all drivers who are using a cell phone while operating a motor passenger vehicle is between .58 and .62.

6.50 Using Wilson's adjustment, \(\tilde{p} = \frac{x + 2}{n + 4} = \frac{20 + 2}{528 + 4} = \frac{22}{532} = .041\)
For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The confidence interval is:
\[\tilde{p} \pm z_{.025}\sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n + 4}} \Rightarrow .041 \pm 1.96\sqrt{\frac{.041(.959)}{528 + 4}} \Rightarrow .041 \pm .017 \Rightarrow (.024,\ .058)\]

6.51 \(\hat{p} = \frac{x}{n} = \frac{144}{351} = .41\)
For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\). The confidence interval is:
\[\hat{p} \pm z_{.05}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .41 \pm 1.645\sqrt{\frac{.41(.59)}{351}} \Rightarrow .41 \pm .043 \Rightarrow (.367,\ .453)\]
We are 90% confident that the true probability of unauthorized use of computer systems at an organization is between .367 and .453. If we were to take repeated samples and form similar confidence intervals, 90% of the confidence intervals would contain the true probability.

6.52 \(\hat{p} = \frac{x}{n} = \frac{15}{100} = .15\)
Suppose we form a 95% confidence interval for the true proportion of minority-owned franchises in Mississippi. For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The confidence interval is:
\[\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .15 \pm 1.96\sqrt{\frac{.15(.85)}{100}} \Rightarrow .15 \pm .07 \Rightarrow (.08,\ .22)\]
We are 95% confident that the true percentage of minority-owned franchises in Mississippi is between 8% and 22%.
Since 20.5% falls in this interval, we would not conclude that the percentage of minority-owned franchises in Mississippi is less than the national value.

6.53 Of the 2,778 sampled firms, 748 announced one or more acquisitions during the year 2000. Thus, \(\hat{p} = \frac{x}{n} = \frac{748}{2{,}778} = .269\).
The sample size is large enough if both \(n\hat{p} \ge 15\) and \(n\hat{q} \ge 15\). Here \(n\hat{p} = 2{,}778(.269) \approx 747\) and \(n\hat{q} = 2{,}778(.731) \approx 2{,}031\). Since both numbers are greater than or equal to 15, the sample size is sufficiently large to conclude the normal approximation is reasonable.
For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\). The 90% confidence interval is:
\[\hat{p} \pm z_{\alpha/2}\sqrt{\frac{pq}{n}} \approx \hat{p} \pm 1.645\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .269 \pm 1.645\sqrt{\frac{.269(.731)}{2{,}778}} \Rightarrow .269 \pm .014 \Rightarrow (.255,\ .283)\]
We are 90% confident that the true proportion of all firms that announced one or more acquisitions during the year 2000 is between .255 and .283. As percentages, the limits are 25.5% and 28.3%.

6.54 a. The population is all senior human resource executives at U.S. companies.

b. The population parameter of interest is p, the proportion of all senior human resource executives at U.S. companies who believe that their hiring managers are interviewing too many people to find qualified candidates for the job.

c. The point estimate of p is \(\hat{p} = \frac{x}{n} = \frac{211}{502} = .42\). The sample size is large enough if both \(n\hat{p} \ge 15\) and \(n\hat{q} \ge 15\). Here \(n\hat{p} = 502(.42) = 210.84\) and \(n\hat{q} = 502(.58) = 291.16\). Since both numbers are greater than or equal to 15, the sample size is sufficiently large to conclude the normal approximation is reasonable.

d. For confidence coefficient .98, \(\alpha = .02\) and \(\alpha/2 = .02/2 = .01\). From Table II, Appendix D, \(z_{.01} = 2.33\). The confidence interval is:
\[\hat{p} \pm z_{.01}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .42 \pm 2.33\sqrt{\frac{.42(.58)}{502}} \Rightarrow .42 \pm .051 \Rightarrow (.369,\ .471)\]
We are 98% confident that the true proportion of all senior human resource executives at U.S.
companies who believe that their hiring managers are interviewing too many people to find qualified candidates for the job is between .369 and .471.

e. A 90% confidence interval would be narrower. A narrower interval contains fewer values, so we would be less confident that it contains p.

6.55 a. For the large-sample estimation method to be valid, \(n\hat{p} \ge 15\) and \(n\hat{q} \ge 15\). For this exercise, \(\hat{p} = \frac{x}{n} = \frac{1}{333} = .003\), \(n\hat{p} = 333(.003) = .999\), and \(n\hat{q} = 333(.997) = 332.001\). Since one of these values is less than 15, the large-sample estimation method is not valid.

b. \(\tilde{p} = \frac{x + 2}{n + 4} = \frac{1 + 2}{333 + 4} = \frac{3}{337} = .009\)
For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The confidence interval is:
\[\tilde{p} \pm z_{.025}\sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n + 4}} \Rightarrow .009 \pm 1.96\sqrt{\frac{.009(.991)}{333 + 4}} \Rightarrow .009 \pm .010 \Rightarrow (-.001,\ .019)\]
We are 95% confident that the true proportion of all mountain casualties that require a femoral shaft splint is between 0 and .019. (The proportion cannot be negative, so the lower endpoint is taken to be 0.)

6.56 a. The point estimate of p is \(\hat{p} = \frac{x}{n} = \frac{16}{308} = .052\). The sample size is large enough if both \(n\hat{p} \ge 15\) and \(n\hat{q} \ge 15\). Here \(n\hat{p} = 308(.052) = 16\) and \(n\hat{q} = 308(.948) = 292\). Since both numbers are greater than or equal to 15, the sample size is sufficiently large to conclude the normal approximation is reasonable.
For confidence coefficient .99, \(\alpha = .01\) and \(\alpha/2 = .01/2 = .005\). From Table II, Appendix D, \(z_{.005} = 2.58\). The confidence interval is:
\[\hat{p} \pm z_{.005}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .052 \pm 2.58\sqrt{\frac{.052(.948)}{308}} \Rightarrow .052 \pm .033 \Rightarrow (.019,\ .085)\]
We are 99% confident that the true proportion of diamonds for sale on the open market that are classified as "D" color is between .019 and .085.

b. The point estimate of p is \(\hat{p} = \frac{x}{n} = \frac{81}{308} = .263\). The sample size is large enough if both \(n\hat{p} \ge 15\) and \(n\hat{q} \ge 15\). Here \(n\hat{p} = 308(.263) = 81\) and \(n\hat{q} = 308(.737) = 227\). Since both numbers are greater than or equal to 15, the sample size is sufficiently large to conclude the normal approximation is reasonable.
For confidence coefficient .99, \(\alpha = .01\) and \(\alpha/2 = .01/2 = .005\). From Table II, Appendix D, \(z_{.005} = 2.58\). The confidence interval is:
\[\hat{p} \pm z_{.005}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .263 \pm 2.58\sqrt{\frac{.263(.737)}{308}} \Rightarrow .263 \pm .065 \Rightarrow (.198,\ .328)\]
We are 99% confident that the true proportion of diamonds for sale on the open market that are classified as "VS1" clarity is between .198 and .328.

6.57 a. The parameter of interest is p, the proportion of all fillets that are red snapper.

b. The estimate of p is \(\hat{p} = \frac{x}{n} = \frac{22 - 17}{22} = \frac{5}{22} = .23\). The sample size is large enough if both \(n\hat{p} \ge 15\) and \(n\hat{q} \ge 15\). Here \(n\hat{p} = 22(.23) \approx 5\) and \(n\hat{q} = 22(.77) \approx 17\). Since \(n\hat{p}\) is less than 15, the sample size is not large enough to conclude the normal approximation is reasonable.

c. We will use Wilson's adjustment to form the confidence interval. Using Wilson's adjustment, the point estimate of the true proportion of all fillets that are red snapper is
\[\tilde{p} = \frac{x + 2}{n + 4} = \frac{5 + 2}{22 + 4} = \frac{7}{26} = .269\]
For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). Wilson's adjusted 95% confidence interval is:
\[\tilde{p} \pm z_{\alpha/2}\sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n + 4}} \Rightarrow .269 \pm 1.96\sqrt{\frac{.269(.731)}{22 + 4}} \Rightarrow .269 \pm .170 \Rightarrow (.099,\ .439)\]
We are 95% confident that the true proportion of all fillets that are red snapper is between .099 and .439.

6.58 a. For the large-sample estimation method to be valid, \(n\hat{p} \ge 15\) and \(n\hat{q} \ge 15\). For this exercise, \(\hat{p} = \frac{x}{n} = \frac{12}{131} = .092\), \(n\hat{p} = 131(.092) = 12.05\), and \(n\hat{q} = 131(.908) = 118.95\). Since one of these values is less than 15, the large-sample estimation method is not valid. We will use Wilson's adjustment:
\[\tilde{p} = \frac{x + 2}{n + 4} = \frac{12 + 2}{131 + 4} = \frac{14}{135} = .104\]
For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\).
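The text's "Wilson's adjustment" adds two successes and two failures: \(\tilde{p} = (x + 2)/(n + 4)\), with \(n + 4\) in the standard error. This form is also widely known as the Agresti–Coull or "plus four" interval. The exact Wilson score interval uses a slightly different formula. The sketch below implements the form used in this section. The function name is illustrative, and `NormalDist` replaces Table II.

```python
from math import sqrt
from statistics import NormalDist


def plus_four_interval(x: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Adjusted CI used in the text: p_tilde +/- z_{alpha/2} * sqrt(p_tilde*(1-p_tilde)/(n+4))."""
    p_t = (x + 2) / (n + 4)  # two extra successes and two extra failures
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    me = z * sqrt(p_t * (1 - p_t) / (n + 4))
    return p_t - me, p_t + me


# Exercise 6.58a: 12 of 131 women with eye-shadow dermatitis
lo, hi = plus_four_interval(12, 131)
print(f"({lo:.3f}, {hi:.3f})")
```

The third decimal can differ from the hand computation because the text rounds \(\tilde{p}\) to .104 before subtracting. Unrounded, the lower limit is about .052 rather than .053.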
The confidence interval is:
\[\tilde{p} \pm z_{.025}\sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n + 4}} \Rightarrow .104 \pm 1.96\sqrt{\frac{.104(.896)}{131 + 4}} \Rightarrow .104 \pm .051 \Rightarrow (.053,\ .155)\]
We are 95% confident that the true proportion of women with cosmetic dermatitis from using eye shadow who have a nickel allergy is between .053 and .155.

b. For the large-sample estimation method to be valid, \(n\hat{p} \ge 15\) and \(n\hat{q} \ge 15\). For this exercise, \(\hat{p} = \frac{x}{n} = \frac{25}{250} = .1\), \(n\hat{p} = 250(.1) = 25\), and \(n\hat{q} = 250(.9) = 225\). Since both of these values are greater than 15, the large-sample estimation method is valid.
For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The confidence interval is:
\[\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .1 \pm 1.96\sqrt{\frac{.1(.9)}{250}} \Rightarrow .1 \pm .037 \Rightarrow (.063,\ .137)\]
We are 95% confident that the true proportion of women with cosmetic dermatitis from using mascara who have a nickel allergy is between .063 and .137.

c. No, we cannot determine which group is referenced. The value of .12 falls in both confidence intervals.

6.59 \(\hat{p} = \frac{x}{n} = \frac{282{,}200}{332{,}000} = .85\)
Suppose we form a 95% confidence interval for the true proportion of first-class mail within the same city that is delivered on time between Dec. 10 and Mar. 3. For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The confidence interval is:
\[\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .85 \pm 1.96\sqrt{\frac{.85(.15)}{332{,}000}} \Rightarrow .85 \pm .001 \Rightarrow (.849,\ .851)\]
We are 95% confident that the true proportion of first-class mail within the same city that is delivered on time between Dec. 10 and Mar. 3 is between .849 and .851, or between 84.9% and 85.1%. This interval does not contain the reported 95% of first-class mail delivered on time. It appears that the performance of the USPS was below the standard during this time period.

6.60 To compute the necessary sample size, use \(n = \left(\frac{z_{\alpha/2}\sigma}{ME}\right)^2 = \frac{(z_{\alpha/2})^2\sigma^2}{(ME)^2}\), where \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). Thus,
\[n = \frac{1.96^2(7.2)}{.3^2} = 307.328 \approx 308\]
You would need a sample of size 308.

6.61 a. An estimate of \(\sigma\) is obtained from:
\[\text{range} \approx 4s \Rightarrow s \approx \frac{\text{range}}{4} = \frac{34 - 30}{4} = 1\]
To compute the necessary sample size, use \(n = \left(\frac{z_{\alpha/2}\sigma}{ME}\right)^2\), where \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\). Thus,
\[n = \left(\frac{1.645(1)}{.2}\right)^2 = 67.65 \approx 68\]

b. A less conservative estimate of \(\sigma\) is obtained from:
\[\text{range} \approx 6s \Rightarrow s \approx \frac{\text{range}}{6} = \frac{34 - 30}{6} = .6667\]
Thus,
\[n = \left(\frac{z_{\alpha/2}\sigma}{ME}\right)^2 = \left(\frac{1.645(.6667)}{.2}\right)^2 = 30.07 \approx 31\]

6.62 a. To compute the needed sample size, use \(n = \frac{(z_{\alpha/2})^2 pq}{(ME)^2}\), where \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). Thus,
\[n = \frac{(1.96)^2(.2)(.8)}{.08^2} = 96.04 \approx 97\]
You would need a sample of size 97.

b. \(n = \frac{(1.96)^2(.5)(.5)}{.08^2} = 150.0625 \approx 151\). You would need a sample of size 151.

6.63 For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\). We know \(\hat{p}\) is in the middle of the interval, so \(\hat{p} = \frac{.54 + .26}{2} = .4\). The confidence interval is:
\[\hat{p} \pm z_{.05}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .4 \pm 1.645\sqrt{\frac{.4(.6)}{n}}\]
We know
\[.4 - 1.645\sqrt{\frac{.4(.6)}{n}} = .26 \Rightarrow .4 - \frac{.8059}{\sqrt{n}} = .26 \Rightarrow \frac{.8059}{\sqrt{n}} = .14 \Rightarrow \sqrt{n} = 5.756 \Rightarrow n = 5.756^2 = 33.1 \approx 34\]

6.64 a. For a width of 5 units, \(ME = 5/2 = 2.5\). To compute the needed sample size, use \(n = \left(\frac{z_{\alpha/2}\sigma}{ME}\right)^2\), where \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). Thus,
\[n = \left(\frac{1.96(14)}{2.5}\right)^2 = 120.47 \approx 121\]
You would need to take 121 samples at a cost of 121($10) = $1,210. Yes, you do have sufficient funds.

b. For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\).
\[n = \left(\frac{1.645(14)}{2.5}\right)^2 = 84.86 \approx 85\]
You would need to take 85 samples at a cost of 85($10) = $850. You still have sufficient funds but face an increased risk of error.

6.65 a. The width of a confidence interval is \(W = 2(ME) = 2z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\).
For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\).
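The width formula and the two sample-size formulas used in 6.60 through 6.78 are sketched below. The functions take the table value of \(z_{\alpha/2}\) as an argument so that they reproduce the manual's answers. Substituting the exact quantile can change a rounded-up n by one. For example, in 6.71, z = 1.6449 gives 1,691 rather than 1,692. The function names are illustrative. `max_confidence` inverts the margin-of-error formula as in 6.78c.

```python
from math import ceil, sqrt
from statistics import NormalDist


def ci_width(z: float, sigma: float, n: int) -> float:
    """W = 2 * z_{alpha/2} * sigma / sqrt(n)."""
    return 2 * z * sigma / sqrt(n)


def n_for_mean(z: float, sigma: float, me: float) -> int:
    """Smallest n with z * sigma / sqrt(n) <= me, i.e. ceil((z*sigma/me)^2)."""
    return ceil((z * sigma / me) ** 2)


def n_for_proportion(z: float, p: float, me: float) -> int:
    """ceil(z^2 * p * q / me^2); use p = .5 when no estimate of p is available."""
    return ceil(z ** 2 * p * (1 - p) / me ** 2)


def max_confidence(sigma: float, n: int, me: float) -> float:
    """Largest confidence level whose margin of error with n observations is <= me."""
    z = me * sqrt(n) / sigma
    return 2 * NormalDist().cdf(z) - 1


print([round(ci_width(1.96, 1, n), 3) for n in (16, 25, 49, 100, 400)])  # 6.65a
print(n_for_mean(1.96, 14, 2.5))        # 6.64a: 121
print(n_for_proportion(1.645, 0.4, 0.14))  # 6.63: 34 (ME = half-width .14)
print(round(max_confidence(2, 100, 0.1), 3))  # 6.78c: 0.383
```

Exercise 6.63 is the proportion formula in reverse. Half the reported width, .14, plays the role of ME.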
For \(n = 16\), \(W = 2z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 2(1.96)\frac{1}{\sqrt{16}} = 0.98\)
For \(n = 25\), \(W = 2(1.96)\frac{1}{\sqrt{25}} = 0.784\)
For \(n = 49\), \(W = 2(1.96)\frac{1}{\sqrt{49}} = 0.56\)
For \(n = 100\), \(W = 2(1.96)\frac{1}{\sqrt{100}} = 0.392\)
For \(n = 400\), \(W = 2(1.96)\frac{1}{\sqrt{400}} = 0.196\)
b.
6.66 The sample size will be larger than necessary for any value of p other than .5.
6.67 From Exercise 6.29, the standard deviation is 11.63. If the width of the interval is 5, then \(ME = 5/2 = 2.5\). For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\).
\(n = \left(\frac{z_{\alpha/2}\sigma}{ME}\right)^2 = \left(\frac{1.96(11.63)}{2.5}\right)^2 = 83.14 \approx 84\)
6.68 From Exercise 6.48, \(\hat{p} = .594\). For confidence coefficient .99, \(\alpha = .01\) and \(\alpha/2 = .01/2 = .005\). From Table II, Appendix D, \(z_{.005} = 2.58\).
\(n = \frac{(z_{\alpha/2})^2 pq}{(ME)^2} = \frac{2.58^2(.594)(.406)}{.075^2} = 285.4 \approx 286\)
6.69 For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\).
\(n = \left(\frac{z_{\alpha/2}\sigma}{ME}\right)^2 = \left(\frac{1.645(10.9)}{4}\right)^2 = 20.09 \approx 21\). Thus, we would need a sample of size 21.
6.70 To compute the necessary sample size, use \(n = \left(\frac{z_{\alpha/2}\sigma}{ME}\right)^2\) where \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\).
Thus, \(n = \left(\frac{1.96(12)}{1.5}\right)^2 = 245.86 \approx 246\).
6.71 For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\). Since no estimate of p is given, we will use .5. The sample size is:
\(n = \frac{(z_{\alpha/2})^2 pq}{(ME)^2} = \frac{1.645^2(.5)(.5)}{.02^2} = 1{,}691.3 \approx 1{,}692\)
6.72 a. Since no confidence level was given, we will use 95%. From Exercise 6.16, \(s = 2.755\). For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\).
\(n = \left(\frac{z_{\alpha/2}\sigma}{ME}\right)^2 = \left(\frac{1.96(2.755)}{.5}\right)^2 = 116.6 \approx 117\)
b. Answers will vary. A plan would need to be devised so that shoppers were selected from a variety of different stores in a variety of locations, so that the sample would be representative of the entire population.
6.73 For confidence coefficient .99, \(\alpha = .01\) and \(\alpha/2 = .01/2 = .005\).
From Table II, Appendix D, \(z_{.005} = 2.575\). From the previous estimate, we will use \(\hat{p} = 1/3\) to estimate p.
\(n = \frac{(z_{\alpha/2})^2 pq}{(ME)^2} = \frac{2.575^2(1/3)(2/3)}{.01^2} = 14{,}734.7 \approx 14{,}735\)
6.74 From Exercise 6.58, the value of \(\hat{p}\) for both groups was close to .1. Since no confidence level was given, we will use 95%. For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\).
\(n = \frac{(z_{\alpha/2})^2 pq}{(ME)^2} = \frac{1.96^2(.1)(.9)}{.03^2} = 384.16 \approx 385\)
6.75 To compute the needed sample size, use \(n = \left(\frac{z_{\alpha/2}\sigma}{ME}\right)^2\) where \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\).
Thus, for \(s = 10\), \(n = \left(\frac{1.96(10)}{3}\right)^2 = 42.68 \approx 43\).
For \(s = 20\), \(n = \left(\frac{1.96(20)}{3}\right)^2 = 170.74 \approx 171\).
For \(s = 30\), \(n = \left(\frac{1.96(30)}{3}\right)^2 = 384.16 \approx 385\).
6.76 To compute the necessary sample size, use \(n = \left(\frac{z_{\alpha/2}\sigma}{ME}\right)^2\) where \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\).
Thus, \(n = \left(\frac{1.645(10)}{1}\right)^2 = 270.6 \approx 271\).
6.77 The bound is \(ME = .05\). For confidence coefficient .99, \(\alpha = .01\) and \(\alpha/2 = .01/2 = .005\). From Table II, Appendix D, \(z_{.005} = 2.575\).
We estimate p with \(\hat{p} = 11/27 = .407\). Thus, \(n = \frac{(z_{\alpha/2})^2 pq}{(ME)^2} = \frac{2.575^2(.407)(.593)}{.05^2} = 640.1 \approx 641\).
The necessary sample size would be 641, so the sample of 27 was not large enough.
6.78 a. To compute the needed sample size, use \(n = \left(\frac{z_{\alpha/2}\sigma}{ME}\right)^2\) where \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\).
Thus, \(n = \left(\frac{1.645(2)}{.1}\right)^2 = 1{,}082.41 \approx 1{,}083\).
b. As the sample size decreases, the width of the confidence interval increases. Therefore, if we sample 100 parts instead of 1,083, the confidence interval will be wider.
c. To compute the maximum confidence level that could be attained while meeting management's specifications, set \(n = 100\) and solve for \(z_{\alpha/2}\):
\(n = \left(\frac{z_{\alpha/2}\sigma}{ME}\right)^2 \Rightarrow 100 = \frac{z_{\alpha/2}^2(2)^2}{.1^2} \Rightarrow z_{\alpha/2}^2 = \frac{100(.01)}{4} = .25 \Rightarrow z_{\alpha/2} = .5\)
Using Table II, Appendix D, \(P(0 < z < .5) = .1915\). Thus, \(\alpha/2 = .5000 - .1915 = .3085\), \(\alpha = 2(.3085) = .617\), and \(1 - \alpha = 1 - .617 = .383\). The maximum confidence level would be 38.3%.
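The sample-size calculations in Exercises 6.60 through 6.78 reduce to a few lines of arithmetic. The sketch below is a minimal illustration assuming Python 3.8+ (for `statistics.NormalDist`); the helper names are hypothetical. Results are rounded up with `math.ceil`, matching the rule of always rounding n up to the next whole observation, and the last helper inverts the mean formula as in Exercise 6.78c.

```python
import math
from statistics import NormalDist


def n_for_mean(z, sigma, me):
    """Sample size for estimating mu: n = (z * sigma / ME)^2, rounded up."""
    return math.ceil((z * sigma / me) ** 2)


def n_for_proportion(z, p, me):
    """Sample size for estimating p: n = z^2 * p * q / ME^2, rounded up.
    Use p = .5 when no estimate of p is available (the conservative choice)."""
    return math.ceil(z ** 2 * p * (1 - p) / me ** 2)


def max_confidence(n, sigma, me):
    """Largest confidence level reachable with n observations:
    solve n = (z * sigma / ME)^2 for z, then return P(-z < Z < z)."""
    z = me * math.sqrt(n) / sigma
    return 2 * NormalDist().cdf(z) - 1


print(n_for_mean(1.645, (34 - 30) / 4, 0.2))    # 6.61a, s ~ range/4: 68
print(n_for_proportion(2.575, 1 / 3, 0.01))     # 6.73: 14735
print(f"{max_confidence(100, 2, 0.1):.3f}")     # 6.78c: 0.383
```

One design caveat: if \((z\sigma/ME)^2\) were mathematically an exact integer, floating-point round-off could in principle push `math.ceil` up by one; none of the exercises here land on that case.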
6.79 a. Percentage sampled \(= \frac{n}{N}(100\%) = \frac{1000}{2500}(100\%) = 40\%\)
Finite population correction factor: \(\sqrt{\frac{N-n}{N}} = \sqrt{\frac{2500-1000}{2500}} = \sqrt{.6} = .7746\)
b. Percentage sampled \(= \frac{1000}{5000}(100\%) = 20\%\)
Finite population correction factor: \(\sqrt{\frac{5000-1000}{5000}} = \sqrt{.8} = .8944\)
c. Percentage sampled \(= \frac{1000}{10{,}000}(100\%) = 10\%\)
Finite population correction factor: \(\sqrt{\frac{10{,}000-1000}{10{,}000}} = \sqrt{.9} = .9487\)
d. Percentage sampled \(= \frac{1000}{100{,}000}(100\%) = 1\%\)
Finite population correction factor: \(\sqrt{\frac{100{,}000-1000}{100{,}000}} = \sqrt{.99} = .995\)
6.80 \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N}}\)
a. \(\sigma_{\bar{x}} = \frac{200}{\sqrt{1000}}\sqrt{\frac{2500-1000}{2500}} = 4.90\)
b. \(\sigma_{\bar{x}} = \frac{200}{\sqrt{1000}}\sqrt{\frac{5000-1000}{5000}} = 5.66\)
c. \(\sigma_{\bar{x}} = \frac{200}{\sqrt{1000}}\sqrt{\frac{10{,}000-1000}{10{,}000}} = 6.00\)
d. \(\sigma_{\bar{x}} = \frac{200}{\sqrt{1000}}\sqrt{\frac{100{,}000-1000}{100{,}000}} = 6.293\)
6.81 a. \(\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N}} = \frac{50}{\sqrt{2000}}\sqrt{\frac{10{,}000-2000}{10{,}000}} = 1.00\)
b. \(\hat{\sigma}_{\bar{x}} = \frac{50}{\sqrt{4000}}\sqrt{\frac{10{,}000-4000}{10{,}000}} = .6124\)
c. \(\hat{\sigma}_{\bar{x}} = \frac{50}{\sqrt{10{,}000}}\sqrt{\frac{10{,}000-10{,}000}{10{,}000}} = 0\)
d. As n increases, \(\sigma_{\bar{x}}\) decreases.
e. We are computing the standard error of \(\bar{x}\). If the entire population is sampled, then \(\bar{x} = \mu\). There is no sampling error, so \(\sigma_{\bar{x}} = 0\).
6.82 a. For \(n = 64\), with the finite population correction factor:
\(\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N}} = \frac{24}{\sqrt{64}}\sqrt{\frac{5000-64}{5000}} = 3\sqrt{.9872} = 2.9807\)
Without the finite population correction factor: \(\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{24}{\sqrt{64}} = 3\)
\(\hat{\sigma}_{\bar{x}}\) without the finite population correction factor is slightly larger.
b. For \(n = 400\), with the finite population correction factor:
\(\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N}} = \frac{24}{\sqrt{400}}\sqrt{\frac{5000-400}{5000}} = 1.2\sqrt{.92} = 1.1510\)
Without the finite population correction factor: \(\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{24}{\sqrt{400}} = 1.2\)
\(\hat{\sigma}_{\bar{x}}\) without the finite population correction factor is larger.
c. In part a, n is smaller relative to N than in part b. Therefore, the finite population correction factor makes less of a difference in part a than in part b.
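The finite population correction in Exercises 6.79 through 6.82 can be tabulated with a short sketch. The helper names are hypothetical, and the factor follows this chapter's form \(\sqrt{(N-n)/N}\); some texts use \(N-1\) in the denominator instead, which makes a negligible difference for large N.

```python
import math


def fpc(n, N):
    """Finite population correction factor sqrt((N - n) / N), as used in this chapter."""
    return math.sqrt((N - n) / N)


def se_mean(s, n, N=None):
    """Estimated standard error of x-bar; applies the correction only when N is given."""
    se = s / math.sqrt(n)
    return se if N is None else se * fpc(n, N)


# Exercise 6.80: sigma = 200, n = 1000, population sizes from Exercise 6.79
for N in (2_500, 5_000, 10_000, 100_000):
    print(N, f"{fpc(1000, N):.4f}", f"{se_mean(200, 1000, N):.3f}")

# Exercise 6.82a: s = 24, n = 64, N = 5000, with and without the correction
print(f"{se_mean(24, 64, 5000):.4f}", se_mean(24, 64))
```

When n = N (Exercise 6.81c), the factor is zero and so is the standard error, matching part e's remark that a census has no sampling error.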
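6.83 The approximate 95% confidence interval for p is:
\(\hat{p} \pm 2\hat{\sigma}_{\hat{p}} \Rightarrow \hat{p} \pm 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\sqrt{\frac{N-n}{N}} \Rightarrow .42 \pm 2\sqrt{\frac{.42(.58)}{1600}}\sqrt{\frac{6000-1600}{6000}} \Rightarrow .42 \pm .021 \Rightarrow (.399, .441)\)
6.84 An approximate 95% confidence interval for \(\mu\) is:
\(\bar{x} \pm 2\hat{\sigma}_{\bar{x}} \Rightarrow \bar{x} \pm 2\frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N}} \Rightarrow 422 \pm 2\frac{14}{\sqrt{40}}\sqrt{\frac{375-40}{375}} \Rightarrow 422 \pm 4.184 \Rightarrow (417.816, 426.184)\)
6.85 a. \(\bar{x} = \frac{\sum x}{n} = \frac{1081}{30} = 36.03\)
\(s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{41{,}747 - \frac{1{,}081^2}{30}}{30-1} = 96.3782\)
The approximate 95% confidence interval is:
\(\bar{x} \pm 2\hat{\sigma}_{\bar{x}} \Rightarrow \bar{x} \pm 2\frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N}} \Rightarrow 36.03 \pm 2\frac{\sqrt{96.3782}}{\sqrt{30}}\sqrt{\frac{300-30}{300}} \Rightarrow 36.03 \pm 3.40 \Rightarrow (32.63, 39.43)\)
b. \(\hat{p} = \frac{x}{n} = \frac{21}{30} = .7\)
The approximate 95% confidence interval is:
\(\hat{p} \pm 2\hat{\sigma}_{\hat{p}} \Rightarrow \hat{p} \pm 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\sqrt{\frac{N-n}{N}} \Rightarrow .7 \pm 2\sqrt{\frac{.7(.3)}{30}}\sqrt{\frac{300-30}{300}} \Rightarrow .7 \pm .159 \Rightarrow (.541, .859)\)
6.86 a. For \(N = 2{,}193\), \(n = 223\), \(\bar{x} = 116{,}754\), and \(s = 39{,}185\), the approximate 95% confidence interval is:
\(\bar{x} \pm 2\hat{\sigma}_{\bar{x}} \Rightarrow \bar{x} \pm 2\frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N}} \Rightarrow 116{,}754 \pm 2\frac{39{,}185}{\sqrt{223}}\sqrt{\frac{2{,}193-223}{2{,}193}} \Rightarrow 116{,}754 \pm 4{,}974.06 \Rightarrow (111{,}779.94, 121{,}728.06)\)
b. We are 95% confident that the mean salary of all vice presidents who subscribe to Quality Progress is between $111,779.94 and $121,728.06.
6.87 a. First, we must estimate p: \(\hat{p} = \frac{x}{n} = \frac{759}{1{,}355} = .560\)
For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). Since \(n/N = 1{,}355/1{,}696 = .799 > .05\), we must use the finite population correction factor. The 95% confidence interval is:
\(\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}}\sqrt{\frac{N-n}{N}} \Rightarrow .560 \pm 1.96\sqrt{\frac{.560(.440)}{1{,}355}}\sqrt{\frac{1{,}696-1{,}355}{1{,}696}} \Rightarrow .560 \pm .012 \Rightarrow (.548, .572)\)
b. We used the finite population correction factor because the sample size was very large compared to the population size.
c. We are 95% confident that the true proportion of active NFL players who select a professional coach as the most influential in their career is between .548 and .572.
6.88 a. The population of interest is the set of all households headed by women that have incomes of $25,000 or more in the database.
b. Yes. Since \(n/N = 1{,}333/25{,}000 = .053\) exceeds .05, we need to apply the finite population correction.
c. The standard error for \(\hat{p}\) should include the finite population correction factor: \(\hat{\sigma}_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\sqrt{\frac{N-n}{N}}\)
d.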
For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\). With \(\hat{\sigma}_{\hat{p}} = \sqrt{\frac{.708(1-.708)}{1{,}333}}\sqrt{\frac{25{,}000-1{,}333}{25{,}000}} = .012\), the approximate 90% confidence interval is:
\(\hat{p} \pm 1.645\hat{\sigma}_{\hat{p}} \Rightarrow .708 \pm 1.645(.012) \Rightarrow .708 \pm .020 \Rightarrow (.688, .728)\)
6.89 a. First, we must calculate the sample mean:
\(\bar{x} = \frac{\sum_{i=1}^{15} f_i x_i}{n} = \frac{3(108) + 2(55) + 1(500) + \cdots + 19(100)}{100} = \frac{15{,}646}{100} = 156.46\)
The point estimate of the mean value of the parts inventory is \(\bar{x} = 156.46\).
b. The sample variance and standard deviation are:
\(s^2 = \frac{\sum_{i=1}^{15} f_i x_i^2 - \frac{\left(\sum_{i=1}^{15} f_i x_i\right)^2}{n}}{n-1} = \frac{3(108)^2 + 2(55)^2 + \cdots + 19(100)^2 - \frac{15{,}646^2}{100}}{100-1} = \frac{6{,}776{,}336 - \frac{15{,}646^2}{100}}{99} = 43{,}720.83677\)
\(s = \sqrt{s^2} = \sqrt{43{,}720.83677} = 209.10\)
The estimated standard error is \(\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N}} = \frac{209.10}{\sqrt{100}}\sqrt{\frac{500-100}{500}} = 18.7025\)
c. The approximate 95% confidence interval is:
\(\bar{x} \pm 2\hat{\sigma}_{\bar{x}} \Rightarrow 156.46 \pm 2(18.7025) \Rightarrow 156.46 \pm 37.405 \Rightarrow (119.055, 193.865)\)
We are 95% confident that the mean value of the parts inventory is between $119.06 and $193.87.
d. Since the interval in part c does not include $300, $300 is not a reasonable value for the mean value of the parts inventory.
6.90 For \(N = 1{,}500\), \(n = 35\), \(\bar{x} = 1\), and \(s = 124\), the approximate 95% confidence interval is:
\(\bar{x} \pm 2\hat{\sigma}_{\bar{x}} \Rightarrow \bar{x} \pm 2\frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N}} \Rightarrow 1 \pm 2\frac{124}{\sqrt{35}}\sqrt{\frac{1{,}500-35}{1{,}500}} \Rightarrow 1 \pm 41.43 \Rightarrow (-40.43, 42.43)\)
We are 95% confident that the mean error of the new system is between -$40.43 and $42.43.
6.91 \(\hat{p} = \frac{x}{n} = \frac{15}{175} = .086\)
The standard error of \(\hat{p}\) is \(\hat{\sigma}_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\sqrt{\frac{N-n}{N}} = \sqrt{\frac{.086(1-.086)}{175}}\sqrt{\frac{3000-175}{3000}} = .0206\)
An approximate 95% confidence interval for p is \(\hat{p} \pm 2\hat{\sigma}_{\hat{p}} \Rightarrow .086 \pm 2(.0206) \Rightarrow .086 \pm .041 \Rightarrow (.045, .127)\)
Since .07 falls in the 95% confidence interval, it is not an unusual value. Thus, there is no evidence that more than 7% of the corn-related products in this state have to be removed from shelves and warehouses.
6.92 a. \(\alpha/2 = .05/2 = .025\); \(\chi^2_{.025,7} = 16.0128\) and \(\chi^2_{.975,7} = 1.68987\)
b. \(\alpha/2 = .10/2 = .05\); \(\chi^2_{.05,16} = 26.2962\) and \(\chi^2_{.95,16} = 7.96164\)
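c. \(\alpha/2 = .01/2 = .005\); \(\chi^2_{.005,20} = 39.9968\) and \(\chi^2_{.995,20} = 7.43386\)
d. \(\alpha/2 = .05/2 = .025\); \(\chi^2_{.025,20} = 34.1696\) and \(\chi^2_{.975,20} = 9.59083\)
6.93 a. For confidence level .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). Using Table IV, Appendix D, with \(df = n - 1 = 50 - 1 = 49\), \(\chi^2_{.05,49} = 67.5048\) and \(\chi^2_{.95,49} = 34.7642\). The 90% confidence interval is:
\(\frac{(n-1)s^2}{\chi^2_{.05}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.95}} \Rightarrow \frac{(50-1)2.5^2}{67.5048} \le \sigma^2 \le \frac{(50-1)2.5^2}{34.7642} \Rightarrow 4.537 \le \sigma^2 \le 8.809\)
b. For confidence level .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). Using Table IV, Appendix D, with \(df = n - 1 = 15 - 1 = 14\), \(\chi^2_{.05,14} = 23.6848\) and \(\chi^2_{.95,14} = 6.57063\). The 90% confidence interval is:
\(\frac{(n-1)s^2}{\chi^2_{.05}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.95}} \Rightarrow \frac{(15-1).02^2}{23.6848} \le \sigma^2 \le \frac{(15-1).02^2}{6.57063} \Rightarrow .00024 \le \sigma^2 \le .00085\)
c. For confidence level .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). Using Table IV, Appendix D, with \(df = n - 1 = 22 - 1 = 21\), \(\chi^2_{.05,21} = 32.6705\) and \(\chi^2_{.95,21} = 11.5913\). The 90% confidence interval is:
\(\frac{(n-1)s^2}{\chi^2_{.05}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.95}} \Rightarrow \frac{(22-1)31.6^2}{32.6705} \le \sigma^2 \le \frac{(22-1)31.6^2}{11.5913} \Rightarrow 641.86 \le \sigma^2 \le 1{,}809.09\)
d. For confidence level .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). Using Table IV, Appendix D, with \(df = n - 1 = 5 - 1 = 4\), \(\chi^2_{.05,4} = 9.48773\) and \(\chi^2_{.95,4} = .710721\). The 90% confidence interval is:
\(\frac{(n-1)s^2}{\chi^2_{.05}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.95}} \Rightarrow \frac{(5-1)1.5^2}{9.48773} \le \sigma^2 \le \frac{(5-1)1.5^2}{.710721} \Rightarrow .94859 \le \sigma^2 \le 12.6632\)
6.94 To find the 90% confidence interval for \(\sigma\), we take the square roots of the endpoints of the 90% confidence interval for \(\sigma^2\) from Exercise 6.93.
a. The 90% confidence interval for \(\sigma\) is: \(\sqrt{4.537} \le \sigma \le \sqrt{8.809} \Rightarrow 2.13 \le \sigma \le 2.97\)
b. The 90% confidence interval for \(\sigma\) is: \(\sqrt{.00024} \le \sigma \le \sqrt{.00085} \Rightarrow .015 \le \sigma \le .029\)
c. The 90% confidence interval for \(\sigma\) is: \(\sqrt{641.86} \le \sigma \le \sqrt{1{,}809.09} \Rightarrow 25.34 \le \sigma \le 42.53\)
d. The 90% confidence interval for \(\sigma\) is: \(\sqrt{.94859} \le \sigma \le \sqrt{12.6632} \Rightarrow .974 \le \sigma \le 3.559\)
6.95 Using MINITAB, the descriptive statistics are:
Descriptive Statistics: x
Variable  N  Mean  StDev  Minimum  Q1    Median  Q3    Maximum
x         6  6.17  3.31   2.00     2.75  6.50    8.75  11.00
For confidence level .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). Using Table IV, Appendix D, with \(df = n - 1 = 6 - 1 = 5\), \(\chi^2_{.025,5} = 12.8325\) and \(\chi^2_{.975,5} = .831211\).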
The 95% confidence interval is:
\(\frac{(n-1)s^2}{\chi^2_{.025}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.975}} \Rightarrow \frac{(6-1)3.31^2}{12.8325} \le \sigma^2 \le \frac{(6-1)3.31^2}{.831211} \Rightarrow 4.269 \le \sigma^2 \le 65.904\)
6.96 a. The target parameter is \(\sigma^2\), the population variance of WR scores for all convicted drug dealers.
b. For confidence level .99, \(\alpha = .01\) and \(\alpha/2 = .01/2 = .005\). Using Table IV, Appendix D, with \(df = n - 1 = 100 - 1 = 99\), \(\chi^2_{.005,99} = 140.169\) and \(\chi^2_{.995,99} = 67.3276\). The 99% confidence interval is:
\(\frac{(n-1)s^2}{\chi^2_{.005}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.995}} \Rightarrow \frac{(100-1)6^2}{140.169} \le \sigma^2 \le \frac{(100-1)6^2}{67.3276} \Rightarrow 25.426 \le \sigma^2 \le 52.935\)
c. “99% confidence” means that in repeated sampling, 99% of all confidence intervals constructed in a similar manner will contain the true variance.
d. We must assume that a random sample was selected and that the population of interest is approximately normal.
e. The variance is measured in squared WR scores, which is difficult to relate to the data. The standard deviation is measured in WR scores, the same units as the data.
f. The 99% confidence interval for \(\sigma\) is: \(\sqrt{25.426} \le \sigma \le \sqrt{52.935} \Rightarrow 5.042 \le \sigma \le 7.276\)
We are 99% confident that the true standard deviation of WR scores is between 5.042 and 7.276.
6.97 a. To find the confidence interval for \(\sigma\), we first find the confidence interval for \(\sigma^2\) and then take the square root of the endpoints. For confidence level .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). Using Table IV, Appendix D, with \(df = n - 1 = 55 - 1 = 54\), \(\chi^2_{.025,54} = 71.4202\) and \(\chi^2_{.975,54} = 32.3574\). The 95% confidence interval is:
\(\frac{(n-1)s^2}{\chi^2_{.025}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.975}} \Rightarrow \frac{(55-1)(.15)^2}{71.4202} \le \sigma^2 \le \frac{(55-1)(.15)^2}{32.3574} \Rightarrow .0170 \le \sigma^2 \le .0375\)
The 95% confidence interval for \(\sigma\) is: \(\sqrt{.0170} \le \sigma \le \sqrt{.0375} \Rightarrow 0.130 \le \sigma \le 0.194\)
We are 95% confident that the true standard deviation of the facial WHR values for all CEOs at publicly traded Fortune 500 firms is between .130 and .194.
b. In order for the interval to be valid, the distribution of WHR values should be approximately normal. The distribution should look like:
[Figure: Normal distribution]
6.98 a. The 90% confidence interval for \(\sigma^2\) is (672, 779). We are 90% confident that the true population variance of the level of support for all senior managers at CPA firms is between 672 and 779 points squared.
b. For confidence level .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). Using MINITAB with \(df = n - 1 = 992 - 1 = 991\), \(\chi^2_{.05,991} = 1065.35\) and \(\chi^2_{.95,991} = 918.926\). The 90% confidence interval is:
\(\frac{(n-1)s^2}{\chi^2_{.05}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.95}} \Rightarrow \frac{(992-1)(722)}{1065.35} \le \sigma^2 \le \frac{(992-1)(722)}{918.926} \Rightarrow 671.6 \le \sigma^2 \le 778.6\)
This corresponds to the interval on the printout.
c. The 90% confidence interval for \(\sigma\) is (25.9, 27.9).
d. To form the confidence interval for \(\sigma\) using the interval in part a, we take the square root of the endpoints: \((\sqrt{672}, \sqrt{779}) \Rightarrow (25.9, 27.9)\). This is the same as the interval on the printout.
e. We are 90% confident that the true population standard deviation of the level of support for all senior managers at CPA firms is between 25.9 and 27.9 points.
f. We must assume that the distribution of the level of support is approximately normal. From Exercise 4.121, we concluded that the data were approximately normally distributed.
6.99 To find the confidence interval for \(\sigma\), we first find the confidence interval for \(\sigma^2\) and then take the square root of the endpoints. For confidence level .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). Using Table IV, Appendix D, with \(df = n - 1 = 12 - 1 = 11\), \(\chi^2_{.025,11} = 21.9200\) and \(\chi^2_{.975,11} = 3.81575\). The 95% confidence interval is:
\(\frac{(n-1)s^2}{\chi^2_{.025}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.975}} \Rightarrow \frac{(12-1)(4{,}487)^2}{21.9200} \le \sigma^2 \le \frac{(12-1)(4{,}487)^2}{3.81575} \Rightarrow 10{,}103{,}323.86 \le \sigma^2 \le 58{,}039{,}666.91\)
The 95% confidence interval for \(\sigma\) is: \(\sqrt{10{,}103{,}323.86} \le \sigma \le \sqrt{58{,}039{,}666.91} \Rightarrow 3{,}178.57 \le \sigma \le 7{,}618.38\)
We are 95% confident that the true standard deviation of radon levels in tombs in the Valley of Kings is between 3,178.57 and 7,618.38 Bq/m³.
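The chi-square intervals for \(\sigma^2\) and \(\sigma\) in Exercises 6.93 through 6.99 all follow one pattern. Python's standard library has no chi-square quantile function, so this minimal sketch (helper names hypothetical) takes the two Table IV critical values as arguments; the example reproduces Exercise 6.99. As in the text, the interval is valid only if the sampled population is approximately normal.

```python
import math


def variance_ci(n, s, chi2_upper, chi2_lower):
    """Confidence interval for sigma^2 from a normal population:
    (n-1)s^2 / chi2_{alpha/2}  <=  sigma^2  <=  (n-1)s^2 / chi2_{1-alpha/2}.

    chi2_upper is chi2_{alpha/2, n-1} (the larger table value) and chi2_lower
    is chi2_{1-alpha/2, n-1}; both are read from Table IV and passed in.
    """
    ss = (n - 1) * s ** 2
    return ss / chi2_upper, ss / chi2_lower


def sd_ci(n, s, chi2_upper, chi2_lower):
    """Interval for sigma: square roots of the variance-interval endpoints."""
    lo, hi = variance_ci(n, s, chi2_upper, chi2_lower)
    return math.sqrt(lo), math.sqrt(hi)


# Exercise 6.99: n = 12, s = 4,487, 95% confidence, df = 11
print(variance_ci(12, 4487, 21.9200, 3.81575))
print(sd_ci(12, 4487, 21.9200, 3.81575))
```

Because the chi-square distribution is skewed, the resulting interval is not symmetric about \(s^2\), which is why the text reports it as two separate endpoints rather than as "estimate ± margin".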
6.100 a. To find the confidence interval for \(\sigma\), we first find the confidence interval for \(\sigma^2\) and then take the square root of the endpoints. For confidence level .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). Using Table IV, Appendix D, with \(df = n - 1 = 18 - 1 = 17\), \(\chi^2_{.025,17} = 30.1910\) and \(\chi^2_{.975,17} = 7.56418\). The 95% confidence interval is:
\(\frac{(n-1)s^2}{\chi^2_{.025}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.975}} \Rightarrow \frac{(18-1)(6.3)^2}{30.1910} \le \sigma^2 \le \frac{(18-1)(6.3)^2}{7.56418} \Rightarrow 22.349 \le \sigma^2 \le 89.201\)
The 95% confidence interval for \(\sigma\) is: \(\sqrt{22.349} \le \sigma \le \sqrt{89.201} \Rightarrow 4.727 \le \sigma \le 9.445\)
b. We are 95% confident that the true standard deviation of conduction times of the prototype system is between 4.727 and 9.445.
c. No, the prototype system does not satisfy this requirement. To meet the requirement, the entire confidence interval constructed in part a would have to lie below 7. The interval contains 7, but it also contains values greater than 7.
6.101 Using MINITAB, the descriptive statistics are:
Descriptive Statistics: Drug
Variable  N   Mean    StDev  Variance  Minimum  Median  Maximum
Drug      50  89.291  3.183  10.134    81.790   89.375  94.830
For confidence level .99, \(\alpha = .01\) and \(\alpha/2 = .01/2 = .005\). From Table IV, Appendix D, with \(df = n - 1 = 50 - 1 = 49\), \(\chi^2_{.005,49} = 79.4900\) and \(\chi^2_{.995,49} = 27.9907\). The 99% confidence interval is:
\(\frac{(n-1)s^2}{\chi^2_{.005}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.995}} \Rightarrow \frac{(50-1)(10.134)}{79.4900} \le \sigma^2 \le \frac{(50-1)(10.134)}{27.9907} \Rightarrow 6.247 \le \sigma^2 \le 17.740\)
We are 99% confident that the true population variance in drug concentrations for the new method is between 6.247 and 17.740.
6.102 a. Answers will vary. Using a statistical package, a random sample of 10 observations is:
148.289, 41.891, 73.051, 29.140, 211.240, 4.777, 49.255, 99.407, 90.823, 84.203
b. Using MINITAB, the descriptive statistics are:
Descriptive Statistics: Sample
Variable  N   Mean  StDev  Variance  Minimum  Median  Maximum
Sample    10  83.2  60.5   3665.2    4.78     78.6    211.2
For confidence level .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\).
Using Table IV, Appendix D, with \(df = n - 1 = 10 - 1 = 9\), \(\chi^2_{.025,9} = 19.0228\) and \(\chi^2_{.975,9} = 2.70039\). The 95% confidence interval is:
\(\frac{(n-1)s^2}{\chi^2_{.025}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.975}} \Rightarrow \frac{(10-1)(3{,}665.2)}{19.0228} \le \sigma^2 \le \frac{(10-1)(3{,}665.2)}{2.70039} \Rightarrow 1{,}734.066 \le \sigma^2 \le 12{,}215.569\)
The measure of reliability for this estimate is 95%.
c. Using MINITAB, the descriptive statistics are:
Descriptive Statistics: INTTIME
Variable  N    Mean   StDev  Variance  Minimum  Median  Maximum
INTTIME   267  95.52  91.54  8379.41   1.86     70.88   513.52
The variance reported here has a denominator of 266 instead of 267. The true population variance is found by \(\sigma^2 = \frac{\sum (x - \mu)^2}{N}\), so the population variance is \(\sigma^2 = \frac{8{,}379.41(266)}{267} = 8{,}348.03\). This value is in the 95% confidence interval constructed in part b. We know that in repeated sampling, 95% of all intervals constructed in a similar manner will contain the true variance and 5% will not. The interval that we constructed could have been one of the 5% that do not contain the true variance.
6.103 Using MINITAB, the descriptive statistics are:
Descriptive Statistics: Spacing
Variable  N  Mean   StDev  Variance  Minimum  Median  Maximum
Spacing   7  89.86  11.63  135.14    70.00    93.00   105.00
For confidence level .99, \(\alpha = .01\) and \(\alpha/2 = .01/2 = .005\). Using Table IV, Appendix D, with \(df = n - 1 = 7 - 1 = 6\), \(\chi^2_{.005,6} = 18.5476\) and \(\chi^2_{.995,6} = .675727\). The 99% confidence interval is:
\(\frac{(n-1)s^2}{\chi^2_{.005}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.995}} \Rightarrow \frac{(7-1)(135.14)}{18.5476} \le \sigma^2 \le \frac{(7-1)(135.14)}{.675727} \Rightarrow 43.717 \le \sigma^2 \le 1{,}199.952\)
6.104 Using MINITAB, the descriptive statistics are:
Descriptive Statistics: Honey, DM
Variable  N   Mean    StDev  Variance  Minimum  Median  Maximum
Honey     35  10.714  2.855  8.151     4.000    11.000  16.000
DM        33  8.333   3.256  10.604    3.000    9.000   15.000
a. For confidence level .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). Using MINITAB with \(df = n - 1 = 35 - 1 = 34\), \(\chi^2_{.05,34} = 48.6024\) and \(\chi^2_{.95,34} = 21.6643\).
The 90% confidence interval for the variance is:
\(\frac{(n-1)s^2}{\chi^2_{.05}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.95}} \Rightarrow \frac{(35-1)8.151}{48.6024} \le \sigma^2 \le \frac{(35-1)8.151}{21.6643} \Rightarrow 5.702 \le \sigma^2 \le 12.792\)
The 90% confidence interval for the standard deviation is: \(\sqrt{5.702} \le \sigma \le \sqrt{12.792} \Rightarrow 2.39 \le \sigma \le 3.58\)
b. For confidence level .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). Using MINITAB with \(df = n - 1 = 33 - 1 = 32\), \(\chi^2_{.05,32} = 46.1943\) and \(\chi^2_{.95,32} = 20.0719\). The 90% confidence interval for the variance is:
\(\frac{(n-1)s^2}{\chi^2_{.05}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.95}} \Rightarrow \frac{(33-1)10.604}{46.1943} \le \sigma^2 \le \frac{(33-1)10.604}{20.0719} \Rightarrow 7.346 \le \sigma^2 \le 16.906\)
The 90% confidence interval for the standard deviation is: \(\sqrt{7.346} \le \sigma \le \sqrt{16.906} \Rightarrow 2.71 \le \sigma \le 4.11\)
c. Since the confidence intervals overlap, the researchers cannot conclude that the variances of the two groups differ.
6.105 a. \(P(t \ge t_0) = .05\) where df = 20. Thus, \(t_0 = 1.725\).
b. \(P(t \ge t_0) = .005\) where df = 9. Thus, \(t_0 = 3.250\).
c. \(P(t \le -t_0 \text{ or } t \ge t_0) = .10\) where df = 8 is equivalent to \(P(t \ge t_0) = .10/2 = .05\) where df = 8. Thus, \(t_0 = 1.860\).
d. \(P(t \le -t_0 \text{ or } t \ge t_0) = .01\) where df = 17 is equivalent to \(P(t \ge t_0) = .01/2 = .005\) where df = 17. Thus, \(t_0 = 2.898\).
6.106 a. For a small sample from a normal distribution with unknown standard deviation, we use the t-statistic. For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table III, Appendix D, with \(df = n - 1 = 23 - 1 = 22\), \(t_{.025} = 2.074\).
b. For a large sample from a distribution with an unknown standard deviation, we can estimate the population standard deviation with s and use the z-statistic. For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\).
c. For a small sample from a normal distribution with known standard deviation, we use the z-statistic. For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\).
d. For a large sample from a distribution about which nothing is known, we can estimate the population standard deviation with s and use the z-statistic.
For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\).
e. For a small sample from a distribution about which nothing is known, we can use neither z nor t.
6.107 a. For confidence coefficient .99, \(\alpha = .01\) and \(\alpha/2 = .01/2 = .005\). From Table II, Appendix D, \(z_{.005} = 2.575\). The confidence interval is:
\(\bar{x} \pm z_{.005}\frac{s}{\sqrt{n}} \Rightarrow 32.5 \pm 2.575\frac{30}{\sqrt{225}} \Rightarrow 32.5 \pm 5.15 \Rightarrow (27.35, 37.65)\)
b. The sample size is \(n = \left(\frac{z_{\alpha/2}\sigma}{ME}\right)^2 = \left(\frac{2.575(30)}{.5}\right)^2 = 23{,}870.25 \approx 23{,}871\).
c. For confidence level .99, \(\alpha = .01\) and \(\alpha/2 = .01/2 = .005\). Using MINITAB with \(df = n - 1 = 225 - 1 = 224\), \(\chi^2_{.005,224} = 282.268\) and \(\chi^2_{.995,224} = 173.238\). The 99% confidence interval is:
\(\frac{(n-1)s^2}{\chi^2_{.005}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.995}} \Rightarrow \frac{(225-1)(30)^2}{282.268} \le \sigma^2 \le \frac{(225-1)(30)^2}{173.238} \Rightarrow 714.215 \le \sigma^2 \le 1{,}163.717\)
d. “99% confidence” means that if repeated samples of size 225 were selected from the population and 99% confidence intervals constructed for the population mean, then 99% of all the intervals constructed would contain the population mean.
6.108 a. Of the 400 observations, 227 had the characteristic, so \(\hat{p} = 227/400 = .5675\). The sample size is large enough if both \(n\hat{p} \ge 15\) and \(n\hat{q} \ge 15\):
\(n\hat{p} = 400(.5675) = 227\) and \(n\hat{q} = 400(.4325) = 173\)
Since both numbers are greater than or equal to 15, the sample size is sufficiently large to conclude the normal approximation is reasonable.
For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The confidence interval is:
\(\hat{p} \pm z_{.025}\sqrt{\frac{pq}{n}} \approx \hat{p} \pm 1.96\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .5675 \pm 1.96\sqrt{\frac{.5675(.4325)}{400}} \Rightarrow .5675 \pm .0486 \Rightarrow (.5189, .6161)\)
b. For this problem, \(ME = .02\). For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). Thus,
\(n = \frac{(z_{\alpha/2})^2 pq}{(ME)^2} = \frac{(1.96)^2(.5675)(.4325)}{.02^2} = 2{,}357.2 \approx 2{,}358\)
Thus, the sample size would need to be 2,358.
6.109 a. Using Table IV, Appendix D, with \(df = n - 1 = 10 - 1 = 9\), \(\chi^2_{.025,9} = 19.0228\) and \(\chi^2_{.975,9} = 2.70039\).
b. Using Table IV, Appendix D, with \(df = n - 1 = 20 - 1 = 19\), \(\chi^2_{.025,19} = 32.8523\) and \(\chi^2_{.975,19} = 8.90655\).
c. Using Table IV, Appendix D, with \(df = n - 1 = 50 - 1 = 49\), \(\chi^2_{.005,49} = 79.4900\) and \(\chi^2_{.995,49} = 27.9907\).
6.110 a. The finite population correction factor is: \(\sqrt{\frac{N-n}{N}} = \sqrt{\frac{100-20}{100}} = .8944\)
b. The finite population correction factor is: \(\sqrt{\frac{N-n}{N}} = \sqrt{\frac{2{,}000-50}{2{,}000}} = .9874\)
c. The finite population correction factor is: \(\sqrt{\frac{N-n}{N}} = \sqrt{\frac{1{,}500-300}{1{,}500}} = .8944\)
6.111 The parameters of interest for the problems are:
(1) The question requires a categorical response. One parameter of interest might be the proportion, p, of all Americans over 18 years of age who think their health is generally very good or excellent.
(2) A parameter of interest might be the mean number of days, \(\mu\), in the previous 30 days that all Americans over 18 years of age felt that their physical health was not good because of injury or illness.
(3) A parameter of interest might be the mean number of days, \(\mu\), in the previous 30 days that all Americans over 18 years of age felt that their mental health was not good because of stress, depression, or problems with emotions.
(4) A parameter of interest might be the mean number of days, \(\mu\), in the previous 30 days that all Americans over 18 years of age felt that their physical or mental health prevented them from performing their usual activities.
6.112 a. A point estimate for the average number of latex gloves used per week by all healthcare workers with latex allergy is \(\bar{x} = 19.3\).
b. For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The confidence interval is:
\(\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} \Rightarrow 19.3 \pm 1.96\frac{11.9}{\sqrt{46}} \Rightarrow 19.3 \pm 3.44 \Rightarrow (15.86, 22.74)\)
c. We are 95% confident that the true average number of latex gloves used per week by all healthcare workers with a latex allergy is between 15.86 and 22.74.
d. The conditions required for the interval to be valid are:
i. The sample was randomly selected from the target population.
ii. The sample size is sufficiently large, i.e., \(n \ge 30\).
6.113 a. The point estimate of p is \(\hat{p} = .11\).
b. The sample size is large enough if both \(n\hat{p} \ge 15\) and \(n\hat{q} \ge 15\):
\(n\hat{p} = 150(.11) = 16.5\) and \(n\hat{q} = 150(.89) = 133.5\)
Since both numbers are greater than or equal to 15, the sample size is sufficiently large to conclude the normal approximation is reasonable.
For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The confidence interval is:
\(\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .11 \pm 1.96\sqrt{\frac{.11(.89)}{150}} \Rightarrow .11 \pm .05 \Rightarrow (.06, .16)\)
c. We are 95% confident that the true proportion of MSDSs that are satisfactorily completed is between .06 and .16.
6.114 a. Since all the people surveyed were from Muncie, Indiana, the population of interest is all consumers in Muncie, Indiana.
b. The characteristic of interest in the population is the proportion of shoppers who believe that “Made in the USA” means that 100% of labor and materials are from the USA.
c. The point estimate of p is \(\hat{p} = \frac{x}{n} = \frac{64}{106} = .604\).
d. The sample size is large enough if both \(n\hat{p} \ge 15\) and \(n\hat{q} \ge 15\):
\(n\hat{p} = 106(.604) = 64.024\) and \(n\hat{q} = 106(.396) = 41.976\)
Since both numbers are greater than or equal to 15, the sample size is sufficiently large to conclude the normal approximation is reasonable.
For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\). The confidence interval is:
\(\hat{p} \pm z_{.05}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .604 \pm 1.645\sqrt{\frac{.604(.396)}{106}} \Rightarrow .604 \pm .078 \Rightarrow (.526, .682)\)
e. We are 90% confident that the true proportion of shoppers who believe that “Made in the USA” means that 100% of labor and materials are from the USA is between .526 and .682.
“90% confidence” means that if we took repeated samples of size 106 and computed 90% confidence intervals for the true proportion of shoppers who believe that “Made in the USA” means that 100% of labor and materials are from the USA, 90% of the intervals computed would contain the true proportion.
f. From above, we will use \(\hat{p} = .604\). For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\).
\(n = \frac{(z_{\alpha/2})^2 pq}{(ME)^2} = \frac{1.645^2(.604)(.396)}{.05^2} = 258.9 \approx 259\)
6.115 For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). For this study,
\(n = \left(\frac{z_{\alpha/2}\sigma}{ME}\right)^2 = \left(\frac{1.96(5)}{1}\right)^2 = 96.04 \approx 97\)
The sample size needed is 97.
6.116 a. From the printout, the 90% confidence interval for the mean lead level is (0.61, 5.16).
b. From the printout, the 90% confidence interval for the mean copper level is (0.2637, 0.5529).
c. We are 90% confident that the mean lead level in water specimens from Crystal Lakes Manors is between .61 and 5.16. We are 90% confident that the mean copper level in water specimens from Crystal Lakes Manors is between .2637 and .5529.
d. 90% confidence means that if repeated samples of size n are selected and 90% confidence intervals formed, 90% of all confidence intervals will contain the true mean.
6.117 First, we must estimate p: \(\hat{p} = \frac{x}{n} = \frac{50}{72} = .694\). The approximate 95% confidence interval is:
\(\hat{p} \pm 2\sqrt{\frac{\hat{p}\hat{q}}{n}}\sqrt{\frac{N-n}{N}} \Rightarrow .694 \pm 2\sqrt{\frac{.694(.306)}{72}}\sqrt{\frac{251-72}{251}} \Rightarrow .694 \pm .092 \Rightarrow (.602, .786)\)
We are 95% confident that the true proportion of all New Jersey Governor’s Council business members that have employees with substance abuse problems is between .602 and .786.
6.118 a. For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\). The 90% confidence interval is:
\(\bar{x} \pm z_{.05}\frac{\sigma}{\sqrt{n}} \Rightarrow 12.2 \pm 1.645\frac{10}{\sqrt{100}} \Rightarrow 12.2 \pm 1.645 \Rightarrow (10.555, 13.845)\)
We are 90% confident that the mean number of days of sick leave taken by all the company’s employees is between 10.555 and 13.845.
b. For confidence coefficient .99, \(\alpha = .01\) and \(\alpha/2 = .01/2 = .005\).
For confidence coefficient .99, .01 and / 2 .01/ 2 .005 . From Table II, Appendix D, z.005 2.58 . z 2.58(10) The sample size is n / 2 166.4 167 2 ME You would need to take n = 167 samples. 2 c. 2 To find the confidence interval for , we first find the confidence interval for 2 and then take the square root of the endpoints. For confidence level .90, .10 and / 2 .10 / 2 .05 . Using 2 2 124.342 and .95,99 77.9295 . The 90% confidence MINITAB with df n 1 100 1 99 , .05,99 interval is: (n 1) s 2 2 .05 2 (n 1) s 2 2 .95 (100 1)(10) 2 (100 1)(10) 2 2 79.619 2 127.038 124.342 77.9295 The 90% confidence interval for the standard deviation is: 79.619 127.038 8.923 11.271 We are 90% confident that the true population standard deviation of the number of sick days is between 8.923 and 11.271. Copyright © 2014 Pearson Education, Inc. Inferences Based on a Single Sample: Estimation with Confidence Intervals 335 6.119 a. For confidence coefficient .99, .01 and / 2 .01/ 2 .005 . From Table II, Appendix D, z.005 2.58 . The 99% confidence interval is: x z / 2 6.120 s n 4.25 2.58 12.02 56 4.25 4.14 (0.11, 8.39) b. We are 99% confident that the true mean number of blogs/forums per site of all Fortune 500 firms that provide blogs and forums for marketing tools is between 0.11 and 8.39. c. No. Since our sample size is 56, the sampling distribution of x is approximately normal by the Central Limit Theorem. a. Answers will vary. Using MINITAB, 30 random numbers were generated using the uniform distribution from 1 to 308. The random numbers were: 9, 15, 19, 36, 46, 47, 63, 73, 90, 92, 108, 112, 117, 127, 144, 145, 150, 151, 172, 178, 218, 229, 230, 241, 242, 246, 252, 267, 274, 282 The 308 observations were numbered in the order that they appear in the file. Using the random numbers generated above, I selected the 9th, 15th, 19th, etc. observations for the sample. 
The selected sample is:
.31, .34, .34, .50, .52, .53, .64, .72, .70, .70, .75, .78, 1.00, 1.00, 1.03, 1.04, 1.07, 1.10, .21, .24, .58, 1.01, .50, .57, .58, .61, .70, .81, .85, 1.00
b. Using MINITAB, the descriptive statistics for the sample of 30 observations are:
Descriptive Statistics: carats-samp
Variable  N   Mean    Median  StDev   Minimum  Maximum  Q1      Q3
carats-s  30  0.6910  0.7000  0.2620  0.2100   1.1000   0.5150  1.0000
From above, \(\bar{x} = .6910\) and \(s = .2620\).
c. For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The confidence interval is:
\(\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} \Rightarrow .691 \pm 1.96\frac{.262}{\sqrt{30}} \Rightarrow .691 \pm .094 \Rightarrow (.597, .785)\)
d. We are 95% confident that the mean number of carats is between .597 and .785.
e. From Exercise 2.49, we computed the “population” mean to be .631. This mean does fall in the 95% confidence interval computed in part c.
6.121 There are a total of 96 channel catfish in the sample. The point estimate of p is \(\hat{p} = \frac{x}{n} = \frac{96}{144} = .667\).
The sample size is large enough if both \(n\hat{p} \ge 15\) and \(n\hat{q} \ge 15\):
\(n\hat{p} = 144(.667) = 96\) and \(n\hat{q} = 144(.333) = 48\)
Since both numbers are greater than or equal to 15, the sample size is sufficiently large to conclude the normal approximation is reasonable.
For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\). The confidence interval is:
\(\hat{p} \pm z_{.05}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .667 \pm 1.645\sqrt{\frac{.667(.333)}{144}} \Rightarrow .667 \pm .065 \Rightarrow (.602, .732)\)
We are 90% confident that the true proportion of channel catfish in the population is between .602 and .732.
6.122 a. For confidence coefficient .99, \(\alpha = .01\) and \(\alpha/2 = .01/2 = .005\). From Table II, Appendix D, \(z_{.005} = 2.58\). The confidence interval is:
\(\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} \Rightarrow 1.13 \pm 2.58\frac{2.21}{\sqrt{72}} \Rightarrow 1.13 \pm .67 \Rightarrow (.46, 1.80)\)
We are 99% confident that the mean number of pecks at the blue string is between .46 and 1.80.
b. Yes. The mean number of pecks at the white string is 7.5. This value does not fall in the 99% confidence interval for the blue string found in part a.
Thus, the chickens are more apt to peck at white string. a. For confidence coefficient .99, .01 and / 2 .01/ 2 .005 . From Table III, Appendix D, with df n 1 3 1 2 , t.005 9.925 . The confidence interval is: x t.005 b. c. s n 49.3 9.925 1.5 3 49.3 8.60 40.70, 57.90 We are 99% confident that the mean percentage of B(a)p removed from all soil specimens using the poison is between 40.70% and 57.90%. We must assume that the distribution of the percentages of B(a)p removed from all soil specimens using the poison is normal. d. Since the 99% confidence interval for the mean percent removed contains 50%, this would be a very possible value. e. For confidence level .90, .10 and / 2 .10 / 2 .05 . Using Table IV, Appendix D, with 2 2 5.99147 and .95,2 .102587 . The 90% confidence interval is: df n 1 3 1 2 , .05,2 (n 1) s 2 2 .05 2 (n 1) s 2 2 .95 (3 1)(1.5) 2 (3 1)(1.5)2 2 .751 2 43.865 5.99147 .102587 We are 90% confident that the true population variance in the percentages of B(z)p removed is between .751 and 43.865. 6.124 For confidence coefficient .90, .10 and / 2 .10 / 2 .05 . From Table II, Appendix D, z.05 1.645 . For a width of .06, ME .06 / 2 .03 . The sample size is n ( z / 2 )2 pq (1.645)2 (.17)(.83) 424.2 425 ME 2 .032 Copyright © 2014 Pearson Education, Inc. Inferences Based on a Single Sample: Estimation with Confidence Intervals 337 You would need to take n 425 samples. 6.125 a. Using MINITAB, the descriptive statistics are: Descriptive Statistics: IQ25, IQ60 Variable N IQ25 36 IQ60 36 Mean 66.83 45.31 Median 66.50 45.00 StDev 14.36 12.70 Minimum 41.00 22.00 Maximum 94.00 73.00 Q1 54.25 36.25 Q3 80.00 58.00 For confidence coefficient .99, .01 and / 2 .01/ 2 .005 . From Table II, Appendix D, z.005 2.58 . The confidence interval is: x z.005 s n 66.83 2.58 14.36 36 66.83 6.17 (60.66, 73.00) We are 99% confident that the mean raw IQ score for all 25-year-olds is between 60.66 and 73.00. b. 
We must assume that the sample is random, the observations are independent, and the sample size is sufficiently large.
c. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The confidence interval is:
$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} \Rightarrow 45.31 \pm 1.96\frac{12.7}{\sqrt{36}} \Rightarrow 45.31 \pm 4.15 \Rightarrow (41.16, 49.46)$
We are 95% confident that the mean raw IQ score for all 60-year-olds is between 41.16 and 49.46.

6.126 a. The point estimate of p is $\hat{p} = x/n = 35/55 = .636$.
b. The sample size is large enough if both $n\hat{p} \ge 15$ and $n\hat{q} \ge 15$.
$n\hat{p} = 55(.636) = 35$ and $n\hat{q} = 55(.364) = 20$
Since both numbers are greater than or equal to 15, the sample size is sufficiently large to conclude the normal approximation is reasonable.
For confidence coefficient .99, $\alpha = .01$ and $\alpha/2 = .01/2 = .005$. From Table II, Appendix D, $z_{.005} = 2.58$. The confidence interval is:
$\hat{p} \pm z_{.005}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .636 \pm 2.58\sqrt{\frac{.636(.364)}{55}} \Rightarrow .636 \pm .167 \Rightarrow (.469, .803)$
c. We are 99% confident that the true proportion of fatal accidents involving children is between .469 and .803.
d. The sample proportion of children killed by air bags who were not wearing seat belts or were improperly restrained is $24/35 = .686$. This is a rather large proportion. Whether a child is killed by an air bag could be related to whether or not he/she was properly restrained. Thus, the number of children killed by air bags could possibly be reduced if children were properly restrained.
e. For confidence coefficient .99, $\alpha = .01$ and $\alpha/2 = .01/2 = .005$. From Table II, Appendix D, $z_{.005} = 2.575$. Also, $ME = .1$. The sample size is
$n = \frac{(z_{\alpha/2})^2 pq}{ME^2} = \frac{(2.575)^2(.636)(.364)}{.1^2} = 153.5 \approx 154$

6.127 a. The point estimate of p is $\hat{p} = \frac{x}{n} = \frac{52}{60} = .867$.
b. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The confidence interval is:
$\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .867 \pm 1.96\sqrt{\frac{.867(.133)}{60}} \Rightarrow .867 \pm .086 \Rightarrow (.781, .953)$
c.
We are 95% confident that the true proportion of Wal-Mart stores in California that have more than 2 inaccurately priced items per 100 scanned is between .781 and .953.
d. If 99% of the California Wal-Mart stores are in compliance, then only 1% (.01) would not be. However, the 95% confidence interval for the proportion that are not in compliance runs from .781 to .953. The value .01 is not in this interval, so it is not a likely value. This claim is not believable.
e. The sample size is large enough if both $n\hat{p} \ge 15$ and $n\hat{q} \ge 15$.
$n\hat{p} = 60(.867) = 52$ and $n\hat{q} = 60(.133) = 8$
Since $n\hat{q}$ is less than 15, the sample size is not large enough to conclude the normal approximation is reasonable. Thus, the confidence interval constructed in part b may not be valid, and any inference based on this interval is questionable.
f. From above, the value of $\hat{p}$ is .867. For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D, $z_{.05} = 1.645$.
$n = \frac{z_{\alpha/2}^2\,pq}{(ME)^2} = \frac{1.645^2(.867)(.133)}{.05^2} = 124.8 \approx 125$

6.128 Using MINITAB, the descriptive statistics are:

Descriptive Statistics: r
Variable  N   Mean    Median  StDev   Minimum  Maximum  Q1      Q3
r         34  0.4224  0.4300  0.1998  -0.0800  0.7400   0.2925  0.6000

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The confidence interval is:
$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} \Rightarrow .4224 \pm 1.96\frac{.1998}{\sqrt{34}} \Rightarrow .4224 \pm .0672 \Rightarrow (.3552, .4896)$
We are 95% confident that the mean value of r is between .3552 and .4896.

6.129 a. Of the 24 observations, 20 were 2 weeks of vacation, so $\hat{p} = 20/24 = .833$. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The confidence interval is:
$\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .833 \pm 1.96\sqrt{\frac{.833(.167)}{24}} \Rightarrow .833 \pm .149 \Rightarrow (.684, .982)$
b. The sample size is large enough if both $n\hat{p} \ge 15$ and $n\hat{q} \ge 15$.
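The large-sample interval $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$ and the $\ge 15$ check used in these solutions can be sketched in Python as follows. This is a minimal sketch. The function names are hypothetical, and the tabled critical value is passed in rather than looked up. The example reproduces part a of Exercise 6.129.

```python
import math


def large_sample_ok(n, p_hat, minimum=15):
    """Rule used in these solutions: n*p_hat >= 15 and n*q_hat >= 15."""
    return n * p_hat >= minimum and n * (1 - p_hat) >= minimum


def proportion_ci(x, n, z):
    """Large-sample interval p_hat +/- z * sqrt(p_hat * q_hat / n).

    z is the tabled value z_{alpha/2}, e.g. 1.96 for 95% confidence.
    """
    p_hat = x / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


if __name__ == "__main__":
    # Exercise 6.129: 20 of 24 observations, 95% confidence
    low, high = proportion_ci(20, 24, 1.96)
    print(round(low, 3), round(high, 3))  # 0.684 0.982
    print(large_sample_ok(24, 20 / 24))   # False, since n*q_hat = 4
```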
$n\hat{p} = 24(.833) = 20$ and $n\hat{q} = 24(.167) = 4$
Since $n\hat{q}$ is less than 15, the sample size is not sufficiently large to conclude the normal approximation is reasonable. The validity of the confidence interval is in question.
c. The bound is $ME = .02$. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. Thus,
$n = \frac{(z_{\alpha/2})^2 pq}{(ME)^2} = \frac{1.96^2(.833)(.167)}{.02^2} = 1{,}336.02 \approx 1{,}337$
Thus, we would need a sample size of 1,337.

6.130 a. The point estimate for the fraction of the entire market who refuse to purchase bars is:
$\hat{p} = \frac{x}{n} = \frac{23}{244} = .094$
b. The sample size is large enough if both $n\hat{p} \ge 15$ and $n\hat{q} \ge 15$.
$n\hat{p} = 244(.094) = 22.9$ and $n\hat{q} = 244(.906) = 221.1$
Since both numbers are greater than or equal to 15, the sample size is sufficiently large to conclude the normal approximation is reasonable.
c. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The confidence interval is:
$\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .094 \pm 1.96\sqrt{\frac{(.094)(.906)}{244}} \Rightarrow .094 \pm .037 \Rightarrow (.057, .131)$
d. The best estimate of the true fraction of the entire market who refuse to purchase bars six months after the poisoning is .094. We are 95% confident that the true fraction of the entire market who refuse to purchase bars six months after the poisoning is between .057 and .131.

6.131 For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. From Exercise 6.130, a good approximation for p is .094. Also, $ME = .02$. The sample size is
$n = \frac{(z_{\alpha/2})^2 pq}{(ME)^2} = \frac{(1.96)^2(.094)(.906)}{.02^2} = 817.9 \approx 818$
You would need a sample of $n = 818$.

6.132 The point estimate of p is $\hat{p} = \frac{x}{n} = \frac{36}{83} = .434$.
The sample size is large enough if both $n\hat{p} \ge 15$ and $n\hat{q} \ge 15$.
$n\hat{p} = 83(.434) = 36$ and $n\hat{q} = 83(.566) = 47$
Since both numbers are greater than or equal to 15, the sample size is sufficiently large to conclude the normal approximation is reasonable.
For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$.
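The Table II lookup of $z_{\alpha/2}$ and the sample-size formulas used in Exercises 6.129(c) and 6.131 can be sketched with Python's `statistics.NormalDist`. This is a minimal sketch, and the helper names are hypothetical. The exact quantile differs slightly from some rounded table entries. For example, $z_{.005} = 2.5758$, which the solutions round to 2.58 or 2.575. As a result, a sample size computed from the exact value can occasionally differ by one from a tabled answer. Rounding n up matches the convention used throughout these solutions.

```python
import math
from statistics import NormalDist


def z_critical(alpha):
    """Upper-tail point z_{alpha/2} of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def n_for_proportion(z, p, me):
    """n = z^2 p q / ME^2, rounded up to the next whole observation."""
    return math.ceil(z ** 2 * p * (1 - p) / me ** 2)


def n_for_mean(z, sigma, me):
    """n = (z sigma / ME)^2, rounded up to the next whole observation."""
    return math.ceil((z * sigma / me) ** 2)


if __name__ == "__main__":
    print(round(z_critical(0.05), 4))            # tabled as 1.96
    print(n_for_proportion(1.96, 0.094, 0.02))   # Exercise 6.131
    print(n_for_proportion(1.96, 0.833, 0.02))   # Exercise 6.129(c)
```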
From Table II, Appendix D, $z_{.025} = 1.96$. The confidence interval is:
$\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .434 \pm 1.96\sqrt{\frac{.434(.566)}{83}} \Rightarrow .434 \pm .107 \Rightarrow (.327, .541)$
We are 95% confident that the true proportion of healthcare workers with latex allergies who suspect that they actually have the allergy is between .327 and .541.

6.133 Sampling error has to do with chance. In a population there is variation, so not all observations are the same. Sampling error is the chance difference between a sample statistic and the corresponding population value that arises because only some of the population units are observed. By chance, one might get a sample that overestimates the mean just because all the observations in the sample happen to be high. Nonsampling error has nothing to do with the sampling itself. These errors could be due to respondents misunderstanding the question being asked, or to asking a question that the respondent does not know how to answer, etc.

6.134 a. Using Chebyshev's Theorem, we know that at least $1 - \frac{1}{k^2}$ of the observations fall within k standard deviations of the mean. The 20th and 80th percentiles bound the middle 60% of the data, so we want to find k such that $1 - \frac{1}{k^2} = .60$:
$1 - \frac{1}{k^2} = .60 \Rightarrow \frac{1}{k^2} = .40 \Rightarrow k^2 = \frac{1}{.4} = 2.5 \Rightarrow k = \sqrt{2.5} = 1.5811$
Thus,
$s \approx \frac{80\text{th percentile} - 20\text{th percentile}}{2k} = \frac{73{,}000 - 35{,}100}{2(1.5811)} = 11{,}985.3267$
For confidence coefficient .98, $\alpha = .02$ and $\alpha/2 = .02/2 = .01$. From Table II, Appendix D, $z_{.01} = 2.33$.
$n = \left(\frac{z_{\alpha/2}\,\sigma}{ME}\right)^2 = \left(\frac{2.33(11{,}985.3267)}{2{,}000}\right)^2 = 194.96 \approx 195$
b. See part a.
c. We have to assume that the estimate of the standard deviation is accurate.

6.135 a. Answers will vary.
Using a computer package, the 100 selected invoices are:
3590 1453 3726 2844 1767 1259 1091 1795 4431 4565
4586 1020 2135 1078 2659 4694 2572 4559 4601 965
4553 1052 3448 574 1360 3803 2247 1164 1862 2385
1255 4966 658 4007 4743 3746 3029 3723 3950 4662
217 949 4580 4126 1794 2912 67 2514 3544 1596
2344 1603 3744 1886 151 4258 183 1869 4509 4572
3875 34 3781 4993 1284 2177 4290 13 2717 287
2977 3459 4639 2272 3620 4646 1544 919 3820 1216
2052 4881 2220 3883 346 4744 312 4325 602 3137
121 2373 4684 2025 2254 4018 2304 3503 1634 2470
The 10 observation numbers ending in 0 are 3590, 1020, 1360, 3950, 4580, 4290, 3620, 3820, 2220, and 2470.
b. $\hat{p} = \frac{x}{n} = \frac{10}{100} = .10$
For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D, $z_{.05} = 1.645$. The confidence interval is:
$\hat{p} \pm z_{.05}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .10 \pm 1.645\sqrt{\frac{.10(.90)}{100}} \Rightarrow .10 \pm .049 \Rightarrow (.051, .149)$
c. Our sample proportion was $\hat{p} = .10$, which is equal to the true proportion. The confidence interval does contain .10.

6.136 Since the manufacturer wants to be reasonably certain the process is really out of control before shutting it down, we want to use a high level of confidence for our inference. We will form a 99% confidence interval for the mean breaking strength. For confidence coefficient .99, $\alpha = .01$ and $\alpha/2 = .01/2 = .005$. From Table III, Appendix D, with $df = n - 1 = 9 - 1 = 8$, $t_{.005} = 3.355$. The 99% confidence interval is:
$\bar{x} \pm t_{.005}\frac{s}{\sqrt{n}} \Rightarrow 985.6 \pm 3.355\frac{22.9}{\sqrt{9}} \Rightarrow 985.6 \pm 25.61 \Rightarrow (959.99, 1{,}011.21)$
We are 99% confident that the true mean breaking strength is between 959.99 and 1,011.21. Since 1,000 is contained in this interval, it is not an unusual value for the true mean breaking strength. Thus, we would not conclude that the process is out of control.

6.137 a. As long as the sample is random (and thus representative), a reliable estimate of the mean weight of all the scallops can be obtained.
b. The government is using only the sample mean to make a decision.
Rather than using a point estimate, they should probably use a confidence interval to estimate the true mean weight of the scallops, so that they can include a measure of reliability.
c. We will form a 95% confidence interval for the mean weight of the scallops. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Weight
Variable  N   Mean    StDev   Minimum  Q1      Median  Q3      Maximum
Weight    18  0.9317  0.0753  0.8400   0.8800  0.9100  0.9800  1.1400

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table III, Appendix D, with $df = n - 1 = 18 - 1 = 17$, $t_{.025} = 2.110$. The 95% confidence interval is:
$\bar{x} \pm t_{.025}\frac{s}{\sqrt{n}} \Rightarrow .9317 \pm 2.110\frac{.0753}{\sqrt{18}} \Rightarrow .9317 \pm .0374 \Rightarrow (.8943, .9691)$
We are 95% confident that the true mean weight of the scallops is between .8943 and .9691. Recall that the weights have been scaled so that a mean weight of 1 corresponds to 1/36 of a pound. Since the entire confidence interval lies below 1, we have sufficient evidence to indicate that the minimum weight restriction was violated.

Chapter 7 Inferences Based on a Single Sample: Tests of Hypothesis

7.1 The null hypothesis is the "status quo" hypothesis, while the alternative hypothesis is the research hypothesis.
7.2 The test statistic is used to decide whether or not to reject the null hypothesis in favor of the alternative hypothesis.
7.3 The "level of significance" of a test is $\alpha$. This is the probability that the test statistic will fall in the rejection region when the null hypothesis is true.
7.4 A Type I error is rejecting the null hypothesis when it is true. A Type II error is accepting the null hypothesis when it is false.
$\alpha$ = the probability of committing a Type I error.
$\beta$ = the probability of committing a Type II error.
7.5 The four possible results are:
1. Rejecting the null hypothesis when it is true. This would be a Type I error.
2. Accepting the null hypothesis when it is true. This would be a correct decision.
3. Rejecting the null hypothesis when it is false.
This would be a correct decision.
4. Accepting the null hypothesis when it is false. This would be a Type II error.
7.6 We can compute a measure of reliability for rejecting the null hypothesis when it is true. This measure of reliability is the probability of rejecting the null hypothesis when it is true, which is $\alpha$. However, it is generally not possible to compute a measure of reliability for accepting the null hypothesis when it is false. We would have to compute the probability of accepting the null hypothesis when it is false, $\beta$, for every value of the parameter in the alternative hypothesis.
7.7 When you reject the null hypothesis in favor of the alternative hypothesis, this does not prove the alternative hypothesis is correct. We are $100(1-\alpha)\%$ confident that there is sufficient evidence to conclude that the alternative hypothesis is correct. If we were to repeatedly draw samples from the population and perform the test each time, approximately $100(1-\alpha)\%$ of the tests performed would yield the correct decision.
7.8 a through f. [Sketches of the z-distribution with each rejection region shaded; figures not reproduced.]
g. The probabilities of a Type I error for the rejection regions in parts a through f are:
$P(z > 1.96) = .025$
$P(z > 1.645) = .05$
$P(z > 2.575) = .005$
$P(z < -1.28) = .1003$
$P(z < -1.645 \text{ or } z > 1.645) = .10$
$P(z < -2.575 \text{ or } z > 2.575) = .01$
7.9 a. Let p = proportion of college presidents who believe that their online education courses are as good as or superior to courses that utilize traditional face-to-face instruction. The null hypothesis would be:
$H_0: p = .68$
b. The rejection region requires $\alpha/2 = .01/2 = .005$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.005} = 2.575$. The rejection region for a two-tailed test is $z < -2.575$ or $z > 2.575$.
7.10 a. Let $\mu$ = average gain in green fees, lessons, or equipment expenditures for participating golf facilities. The null and alternative hypotheses would be:
$H_0: \mu = \$2{,}400$
$H_a: \mu > \$2{,}400$
b. The value $\alpha = .05$ is the Type I error rate. This means that the probability of concluding that the average gain in fees, lessons, or equipment expenditures for participating golf facilities exceeds \$2,400 when, in fact, the average is \$2,400, is .05.
c. The rejection region requires $\alpha = .05$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z > 1.645$.
7.11 Let p = student loan default rate in this year. To see if the student loan default rate is less than .07, we test:
$H_0: p = .07$
$H_a: p < .07$
7.12 Let p = proportion of U.S. companies that have formal, written travel and entertainment policies for their employees. The null hypothesis would be:
$H_0: p = .80$
7.13 Let $\mu$ = mean caloric content of Virginia school lunches. To test the claim that the average caloric content dropped after the testing period ended, we test:
$H_0: \mu = 863$
$H_a: \mu < 863$
7.14 Let $\mu$ = average Libor rate for 1-year loans. Since many Western banks think that the reported average Libor rate (1.1%) is too high, they want to show that the average is less than 1.1. The appropriate hypotheses would be:
$H_0: \mu = 1.1$
$H_a: \mu < 1.1$
7.15 a. Since the company must give proof the drug is safe, the null hypothesis would be that the drug is unsafe. The alternative hypothesis would be that the drug is safe.
b. A Type I error would be concluding the drug is safe when it is not safe. A Type II error would be concluding the drug is not safe when it is. $\alpha$ is the probability of concluding the drug is safe when it is not. $\beta$ is the probability of concluding the drug is not safe when it is.
c. In this problem, it would be more important for $\alpha$ to be small. We would want the probability of concluding the drug is safe when it is not to be as small as possible.
7.16 a. A Type I error would be concluding the proposed user is unauthorized when, in fact, the proposed user is authorized. A Type II error would be concluding the proposed user is authorized when, in fact, the proposed user is unauthorized.
In this case, the more serious error would be a Type II error. One would not want to conclude that the proposed user is authorized when he/she is not.
b. The Type I error rate is 1%. This means that the probability of concluding the proposed user is unauthorized when, in fact, the proposed user is authorized is .01. The Type II error rate is .00025%. This means that the probability of concluding the proposed user is authorized when, in fact, the proposed user is unauthorized is .0000025.
c. The Type I error rate is .01%. This means that the probability of concluding the proposed user is unauthorized when, in fact, the proposed user is authorized is .0001. The Type II error rate is .005%. This means that the probability of concluding the proposed user is authorized when, in fact, the proposed user is unauthorized is .00005.
7.17 a. A Type I error is rejecting the null hypothesis when it is true. In a murder trial, we would be concluding that the accused is guilty when, in fact, he/she is innocent. A Type II error is accepting the null hypothesis when it is false. In this case, we would be concluding that the accused is innocent when, in fact, he/she is guilty.
b. Both errors are serious. However, if an innocent person is found guilty of murder and is put to death, there is no way to correct the error. On the other hand, if a guilty person is set free, he/she could murder again.
c. In a jury trial, $\alpha$ is assumed to be smaller than $\beta$. The only way to convict the accused is a unanimous decision of guilt. Thus, the probability of convicting an innocent person is set to be small.
d. In order to get a unanimous vote to convict, there has to be overwhelming evidence of guilt. The probability of getting a unanimous vote of guilt if the person is really innocent will be very small.
e. If a jury is prejudiced against a guilty verdict, the value of $\alpha$ will decrease. The probability of convicting an innocent person will be even smaller if the jury is prejudiced against a guilty verdict.
f. If a jury is prejudiced against a guilty verdict, the value of $\beta$ will increase. The probability of declaring a guilty person innocent will be larger if the jury is prejudiced against a guilty verdict.
7.18 a. The null hypothesis is $H_0$: There is no intrusion.
b. The alternative hypothesis is $H_a$: There is an intrusion.
c. $\alpha = P(\text{warning} \mid \text{no intrusion}) = \frac{1}{1000} = .001$
$\beta = P(\text{no warning} \mid \text{intrusion}) = \frac{500}{1000} = .5$
7.19 a. $p = P(z \ge 1.20) = .5 - .3849 = .1151$
b. $p = P(z \le -1.20) = .5 - .3849 = .1151$
c. The p-value is $p = P(z \ge 1.20) + P(z \le -1.20) = 2(.1151) = .2302$
7.20 We will reject $H_0$ if the p-value $< \alpha$.
a. $.06 > .05$, do not reject $H_0$.
b. $.10 > .05$, do not reject $H_0$.
c. $.01 < .05$, reject $H_0$.
d. $.001 < .05$, reject $H_0$.
e. $.251 > .05$, do not reject $H_0$.
f. $.042 < .05$, reject $H_0$.
7.21 a. Since the p-value = .10 is greater than $\alpha = .05$, $H_0$ is not rejected.
b. Since the p-value = .05 is less than $\alpha = .10$, $H_0$ is rejected.
c. Since the p-value = .001 is less than $\alpha = .01$, $H_0$ is rejected.
d. Since the p-value = .05 is greater than $\alpha = .025$, $H_0$ is not rejected.
e. Since the p-value = .45 is greater than $\alpha = .10$, $H_0$ is not rejected.
7.22 $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{49.4 - 50}{4.1/\sqrt{100}} = -1.46$
p-value $= p = P(z \ge -1.46) = .5 + .4279 = .9279$ (using Table II, Appendix D)
Since the p-value is so large, $H_0$ would not be rejected for any reasonable value of $\alpha$. There is no evidence to indicate the mean is greater than 50.
7.23 p-value $= p = P(z \ge 2.17) = .5 - P(0 \le z \le 2.17) = .5 - .4850 = .0150$ (using Table II, Appendix D)
The probability of observing a test statistic of 2.17 or anything more unusual, if the true mean is 100, is .0150. Since this probability is so small, there is evidence that the true mean is greater than 100.
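The Table II lookups in Exercises 7.19, 7.22, and 7.23 can be reproduced with `statistics.NormalDist.cdf`. This is a minimal sketch. The function name and the `alternative` labels are hypothetical. The results agree with the table-based answers only to about three decimal places because Table II rounds its areas. For example, the exact two-tailed value for 7.19c is .2301.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_p_value(z, alternative):
    """p-value for an observed z statistic.

    alternative is "greater" (Ha: mu > mu0), "less" (Ha: mu < mu0),
    or "two-sided" (Ha: mu != mu0).
    """
    upper = 1 - STANDARD_NORMAL.cdf(z)  # P(Z >= z)
    lower = STANDARD_NORMAL.cdf(z)      # P(Z <= z)
    if alternative == "greater":
        return upper
    if alternative == "less":
        return lower
    if alternative == "two-sided":
        return 2 * min(upper, lower)    # double the smaller tail area
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


if __name__ == "__main__":
    print(round(z_p_value(1.20, "greater"), 4))    # Exercise 7.19a
    print(round(z_p_value(1.20, "two-sided"), 4))  # Exercise 7.19c
    print(round(z_p_value(2.17, "greater"), 4))    # Exercise 7.23
```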
7.24 First, find the value of the test statistic:
$z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{10.7 - 10}{3.1/\sqrt{50}} = 1.60$
p-value $= p = P(z \le -1.60 \text{ or } z \ge 1.60) = 2P(z \ge 1.60) = 2(.5 - .4452) = 2(.0548) = .1096$ (using Table II, Appendix D)
Since the p-value exceeds .10, there is no evidence to reject $H_0$ for $\alpha \le .10$.
7.25 p-value $= p = P(z \ge 2.17) + P(z \le -2.17) = 2(.5 - .4850) = 2(.0150) = .0300$ (using Table II, Appendix D)
7.26 a. The p-value reported by SPSS is for a two-tailed test. Thus, $P(z \le -1.63) + P(z \ge 1.63) = .1032$. For this one-tailed test, the p-value is $p = P(z \ge 1.63) = .1032/2 = .0516$. Since the p-value $= .0516 > \alpha = .05$, $H_0$ is not rejected. There is insufficient evidence to indicate $\mu > 75$ at $\alpha = .05$.
b. For this one-tailed test, the p-value is $P(z \le 1.63)$. Since $P(z \ge 1.63) = .1032/2 = .0516$, $P(z \le 1.63) = 1 - .0516 = .9484$. Since the p-value $= .9484 > \alpha = .10$, $H_0$ is not rejected. There is insufficient evidence to indicate $\mu < 75$ at $\alpha = .10$.
c. For this one-tailed test, the p-value is $P(z \ge 1.63) = .1032/2 = .0516$. Since the p-value $= .0516 < \alpha = .10$, $H_0$ is rejected. There is sufficient evidence to indicate $\mu > 75$ at $\alpha = .10$.
d. For this two-tailed test, the p-value is .1032. Since the p-value $= .1032 > \alpha = .01$, $H_0$ is not rejected. There is insufficient evidence to indicate $\mu \neq 75$ at $\alpha = .01$.
7.27 The smallest value of $\alpha$ for which the null hypothesis would be rejected is just greater than .06.
7.28 a. By the Central Limit Theorem, the sampling distribution of $\bar{x}$ is approximately normal with $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{20}{\sqrt{400}} = 1$.
b. The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{72.5 - 70}{20/\sqrt{400}} = 2.5$.
c. The p-value is $p = P(z \ge 2.5) = .5 - .4938 = .0062$.
d. The rejection region requires $\alpha = .01$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.01} = 2.33$. The rejection region is $z > 2.33$.
e. Since the p-value is less than $\alpha$ ($p = .0062 < .01$), $H_0$ is rejected. There is sufficient evidence to indicate the true mean is greater than 70 at $\alpha = .01$.
f. Since the observed value of the test statistic falls in the rejection region ($z = 2.5 > 2.33$), $H_0$ is rejected.
There is sufficient evidence to indicate the true mean is greater than 70 at $\alpha = .01$.
g. Yes, the conclusions in parts e and f agree.
7.29 a. The decision rule is to reject $H_0$ if $\bar{x} > 270$. Recall that $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}}$. Therefore, "reject $H_0$ if $\bar{x} > 270$" can be written as "reject $H_0$ if $z > \frac{270 - 255}{63/\sqrt{81}} = 2.14$." The decision rule in terms of z is to reject $H_0$ if $z > 2.14$.
b. $\alpha = P(z > 2.14) = .5 - P(0 < z < 2.14) = .5 - .4838 = .0162$
7.30 a. $H_0: \mu = 100$
$H_a: \mu > 100$
The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{110 - 100}{60/\sqrt{100}} = 1.67$
The rejection region requires $\alpha = .05$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z > 1.645$.
Since the observed value of the test statistic falls in the rejection region ($z = 1.67 > 1.645$), $H_0$ is rejected. There is sufficient evidence to indicate the true population mean is greater than 100 at $\alpha = .05$.
b. $H_0: \mu = 100$
$H_a: \mu \neq 100$
The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{110 - 100}{60/\sqrt{100}} = 1.67$
The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.025} = 1.96$. The rejection region is $z < -1.96$ or $z > 1.96$.
Since the observed value of the test statistic does not fall in the rejection region ($z = 1.67 < 1.96$), $H_0$ is not rejected. There is insufficient evidence to indicate $\mu$ does not equal 100 at $\alpha = .05$.
c. In part a, we rejected $H_0$ and concluded the mean was greater than 100. In part b, we did not reject $H_0$, because there was insufficient evidence to conclude the mean was different from 100. The one-tailed test in part a places all of $\alpha = .05$ in the upper tail, so its critical value (1.645) is smaller than the two-tailed critical value (1.96). This makes it easier to reject $H_0$ when the sample mean falls in the hypothesized direction.
7.31 a. $H_0: \mu = .36$
$H_a: \mu < .36$
The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{.323 - .36}{\sqrt{.034}/\sqrt{64}} = -1.61$
The rejection region requires $\alpha = .10$ in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.10} = 1.28$. The rejection region is $z < -1.28$.
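The rejection-region rules applied in Exercises 7.30 and 7.31 can be combined in one function. This is a minimal sketch, and the name `reject_h0` is hypothetical. The function computes exact critical values (for example 1.6449 and 1.2816) instead of using the rounded Table II entries. It reaches the same decisions for the statistics in these exercises.

```python
from statistics import NormalDist


def reject_h0(z, alpha, alternative):
    """Rejection-region rule for a large-sample z test.

    Returns True when the observed z falls in the rejection region.
    """
    nd = NormalDist()
    if alternative == "greater":
        return z > nd.inv_cdf(1 - alpha)            # z > z_alpha
    if alternative == "less":
        return z < -nd.inv_cdf(1 - alpha)           # z < -z_alpha
    if alternative == "two-sided":
        return abs(z) > nd.inv_cdf(1 - alpha / 2)   # |z| > z_{alpha/2}
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


if __name__ == "__main__":
    print(reject_h0(1.67, 0.05, "greater"))     # Exercise 7.30a
    print(reject_h0(1.67, 0.05, "two-sided"))   # Exercise 7.30b
    print(reject_h0(-1.61, 0.10, "less"))       # Exercise 7.31a
```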
Since the observed value of the test statistic falls in the rejection region ($z = -1.61 < -1.28$), $H_0$ is rejected. There is sufficient evidence to indicate the mean is less than .36 at $\alpha = .10$.
b. $H_0: \mu = .36$
$H_a: \mu \neq .36$
The test statistic is $z = -1.61$ (see part a).
The rejection region requires $\alpha/2 = .10/2 = .05$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z < -1.645$ or $z > 1.645$.
Since the observed value of the test statistic does not fall in the rejection region ($z = -1.61 > -1.645$), $H_0$ is not rejected. There is insufficient evidence to indicate the mean is different from .36 at $\alpha = .10$.
7.32 a. Let $\mu$ = true mean level of support. To determine if the true mean level of support differs from 75, we test:
$H_0: \mu = 75$
$H_a: \mu \neq 75$
b. For this problem, a Type I error would be concluding the true mean level of support differs from 75 when, in fact, the true mean level of support is 75. A Type II error would be concluding the true mean level of support equals 75 when, in fact, the true mean level of support differs from 75.
c. The test statistic is $z = 8.4923$ and the p-value is $p < .0001$.
d. Since the p-value is less than $\alpha$ ($p < .0001 < .05$), $H_0$ is rejected. There is sufficient evidence to indicate the true mean level of support differs from 75 at $\alpha = .05$.
e. We do not need to make any assumptions about the distribution of support levels. The sample size is very large ($n = 992$). Thus, the Central Limit Theorem applies and no assumption about the population distribution is necessary.
7.33 a. Let $\mu$ = true mean willingness to eat the brand of sliced apples. To determine if the true mean willingness to eat the brand of sliced apples exceeds 3, we test:
$H_0: \mu = 3$
$H_a: \mu > 3$
The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{3.69 - 3}{2.44/\sqrt{408}} = 5.71$.
The rejection region requires $\alpha = .05$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z > 1.645$.
Since the observed value of the test statistic falls in the rejection region ($z = 5.71 > 1.645$), $H_0$ is rejected. There is sufficient evidence to indicate that the true mean willingness to eat the brand of sliced apples exceeds 3 at $\alpha = .05$.
b. Even though the willingness-to-eat scores are not normally distributed, the test in part a is valid. Because the sample size is so large ($n = 408$), the Central Limit Theorem applies.
7.34 a. Let $\mu$ = mean Mach rating score for all purchasing managers. To determine if the mean Mach rating score is different from 85, we test:
$H_0: \mu = 85$
$H_a: \mu \neq 85$
b. The rejection region requires $\alpha/2 = .10/2 = .05$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z < -1.645$ or $z > 1.645$.
c. The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{99.6 - 85}{12.6/\sqrt{122}} = 12.80$.
d. Since the observed value of the test statistic falls in the rejection region ($z = 12.80 > 1.645$), $H_0$ is rejected. There is sufficient evidence to indicate that the true mean Mach rating score of all purchasing managers is not 85 at $\alpha = .10$.
7.35 Let $\mu$ = true mean facial width-to-height ratio. To determine if the true mean facial width-to-height ratio differs from 2.2, we test:
$H_0: \mu = 2.2$
$H_a: \mu \neq 2.2$
The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{1.96 - 2.2}{.15/\sqrt{55}} = -11.87$.
The p-value is $p = P(z \le -11.87) + P(z \ge 11.87) \approx 0 + 0 = 0$.
Since the p-value is so small ($p \approx 0$), $H_0$ will be rejected for any reasonable value of $\alpha$. There is sufficient evidence to indicate the true mean facial width-to-height ratio differs from 2.2, even at $\alpha = .001$.
7.36 a. Let $\mu$ = true mean rate of return of round-trip trades. To determine if the true mean rate of return of round-trip trades is positive, we test:
$H_0: \mu = 0$
$H_a: \mu > 0$
b. The rejection region requires $\alpha = .05$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z > 1.645$.
c.
$\alpha$ = probability of making a Type I error, or the probability of rejecting $H_0$ when $H_0$ is true. Thus, $\alpha$ = probability of concluding the true mean rate of return of round-trip trades is positive when, in fact, it is not.
d. The test statistic is $t = 4.73$ and the p-value is $p = 0.000$.
e. Since the p-value is less than $\alpha$ ($p = 0.000 < .05$), $H_0$ is rejected. There is sufficient evidence to indicate the true mean rate of return of round-trip trades is positive at $\alpha = .05$.
7.37 a. Let $\mu$ = true mean weight of golf tees. To determine if the process is not operating satisfactorily, we test:
$H_0: \mu = .250$
$H_a: \mu \neq .250$
b. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Tees
Variable  N   Mean     Median   StDev    Minimum  Maximum  Q1       Q3
Tees      40  0.25248  0.25300  0.00223  0.24700  0.25600  0.25100  0.25400

Thus, $\bar{x} = .25248$ and $s = .00223$.
c. The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{.25248 - .250}{.00223/\sqrt{40}} = 7.03$.
d. The p-value is $p = P(z \le -7.03) + P(z \ge 7.03) \approx 0 + 0 = 0$.
e. The rejection region requires $\alpha/2 = .01/2 = .005$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.005} = 2.575$. The rejection region is $z < -2.575$ or $z > 2.575$.
f. Since the observed value of the test statistic falls in the rejection region ($z = 7.03 > 2.575$), $H_0$ is rejected. There is sufficient evidence to indicate the process is performing in an unsatisfactory manner at $\alpha = .01$.
g. $\alpha$ is the probability of a Type I error. A Type I error, in this case, is to say the process is unsatisfactory when, in fact, it is satisfactory. The risk, then, is to the producer, since he will be spending time and money to repair a process that is not in error.
$\beta$ is the probability of a Type II error. A Type II error, in this case, is to say the process is satisfactory when, in fact, it is not. This is the consumer's risk, since he could unknowingly purchase a defective product.
7.38 Let $\mu$ = mean IQ score of all Norway residents who were 6th-born or later. To determine if the mean IQ score of all Norway residents who were 6th-born or later is lower than the country mean, we test:
$H_0: \mu = 5.2$
$H_a: \mu < 5.2$
The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{4.7 - 5.2}{1.8/\sqrt{581}} = -6.70$.
The p-value is $p = P(z \le -6.70) \approx .5 - .5 = 0$ (using Table II, Appendix D).
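Exercises 7.37 and 7.38 use the large-sample statistic $z = (\bar{x} - \mu_0)/(s/\sqrt{n})$, which fits in a one-line helper. This is a minimal sketch with a hypothetical function name. As in the solutions, the sample standard deviation $s$ replaces $\sigma$, which relies on the large-sample (Central Limit Theorem) argument. The .034 in Exercise 7.31 enters the formula under a square root, so the example passes it as `math.sqrt(0.034)`.

```python
import math


def z_statistic(x_bar, mu0, s, n):
    """Large-sample test statistic z = (x_bar - mu0) / (s / sqrt(n)).

    The sample standard deviation s stands in for sigma, which is
    reasonable only when n is large.
    """
    return (x_bar - mu0) / (s / math.sqrt(n))


if __name__ == "__main__":
    print(round(z_statistic(4.7, 5.2, 1.8, 581), 2))           # Exercise 7.38
    print(round(z_statistic(0.25248, 0.250, 0.00223, 40), 2))  # Exercise 7.37
    # Exercise 7.31: the .034 is treated as a variance, hence the square root
    print(round(z_statistic(0.323, 0.36, math.sqrt(0.034), 64), 2))
```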
Since the p-value is less than $\alpha$ ($p \approx 0 < .01$), $H_0$ is rejected. There is sufficient evidence to indicate the true mean IQ score of all Norway residents who were 6th-born or later is lower than the country mean at $\alpha = .01$.
7.39 a. Let $\mu$ = mean estimated time to read the report. To determine if the students, on average, overestimate the time it takes to read the report, we test:
$H_0: \mu = 48$
$H_a: \mu > 48$
The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{60 - 48}{41/\sqrt{40}} = 1.85$.
The p-value is $p = P(z \ge 1.85) = .5 - .4678 = .0322$ (using Table II, Appendix D).
Since the p-value is less than $\alpha$ ($p = .0322 < .10$), $H_0$ is rejected. There is sufficient evidence to indicate the students, on average, overestimate the time it takes to read the report at $\alpha = .10$.
b. Let $\mu$ = mean estimated number of pages of the report read. To determine if the students, on average, underestimate the number of report pages read, we test:
$H_0: \mu = 32$
$H_a: \mu < 32$
The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{28 - 32}{14/\sqrt{42}} = -1.85$.
The p-value is $p = P(z \le -1.85) = .5 - .4678 = .0322$ (using Table II, Appendix D).
Since the p-value is less than $\alpha$ ($p = .0322 < .10$), $H_0$ is rejected. There is sufficient evidence to indicate the students, on average, underestimate the number of report pages read at $\alpha = .10$.
c. No. In both tests, the sample sizes are greater than 30. Thus, the Central Limit Theorem applies, and the distribution of $\bar{x}$ is approximately normal regardless of the population distribution.
7.40 Using MINITAB, the descriptive statistics are:

Descriptive Statistics: GASTURBINE
Variable    N   Mean   StDev  Minimum  Q1    Median  Q3     Maximum
GASTURBINE  67  11066  1595   8714     9918  10656   11842  16243

To determine if the mean heat rate of gas turbines augmented with high pressure inlet fogging exceeds 10,000 kJ/kWh, we test:
$H_0: \mu = 10{,}000$
$H_a: \mu > 10{,}000$
The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{11{,}066 - 10{,}000}{1{,}595/\sqrt{67}} = 5.47$.
The rejection region requires $\alpha = .05$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z > 1.645$.
Since the observed value of the test statistic falls in the rejection region ($z = 5.47 > 1.645$), $H_0$ is rejected. There is sufficient evidence to indicate the true mean heat rate of gas turbines augmented with high pressure inlet fogging exceeds 10,000 kJ/kWh at $\alpha = .05$.
7.41 To determine if the mean point-spread error is different from 0, we test:
$H_0: \mu = 0$
$H_a: \mu \neq 0$
The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{1.6 - 0}{13.3/\sqrt{240}} = 1.86$.
The rejection region requires $\alpha/2 = .01/2 = .005$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.005} = 2.575$. The rejection region is $z < -2.575$ or $z > 2.575$.
Since the observed value of the test statistic does not fall in the rejection region ($z = 1.86 < 2.575$), $H_0$ is not rejected. There is insufficient evidence to indicate that the true mean point-spread error is different from 0 at $\alpha = .01$.
7.42 a. Let $\mu$ = average full-service fee (in thousands of dollars) of U.S. funeral homes in the current year. To determine if the average full-service fee exceeds \$6,500, we test:
$H_0: \mu = 6.50$
$H_a: \mu > 6.50$
b. Using MINITAB, the output is:

Descriptive Statistics: FUNERAL
Variable  N   Mean   Median  StDev  Minimum  Maximum  Q1     Q3
Fee       36  6.819  6.600   1.265  5.200    11.600   6.025  7.400

$H_0: \mu = 6.50$
$H_a: \mu > 6.50$
The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{6.819 - 6.50}{1.265/\sqrt{36}} = 1.51$.
The rejection region requires $\alpha = .05$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$.
The rejection region is $z > 1.645$.

Since the observed value of the test statistic does not fall in the rejection region ($z = 1.51 < 1.645$), $H_0$ is not rejected. There is insufficient evidence to indicate the true mean full-service fee of U.S. funeral homes in the current year exceeds \$6,500 at $\alpha = .05$.

c. No. Since the sample size ($n = 36$) is greater than 30, the Central Limit Theorem applies. The sampling distribution of $\bar{x}$ is approximately normal regardless of the population distribution.

7.43 a. To determine if the true mean forecast error for buy-side analysts is positive, we test:

$H_0: \mu = 0$
$H_a: \mu > 0$

The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \approx \frac{.85 - 0}{1.93/\sqrt{3{,}526}} = 26.15$.

The observed p-value of the test is $p = P(z > 26.15) \approx 0$ (using Table II, Appendix D).

Since the p-value is less than $\alpha$ ($p \approx 0 < .01$), $H_0$ is rejected. There is sufficient evidence to indicate that the true mean forecast error for buy-side analysts is positive at $\alpha = .01$. This means that the buy-side analysts are overestimating earnings.

b. To determine if the true mean forecast error for sell-side analysts is negative, we test:

$H_0: \mu = 0$
$H_a: \mu < 0$

The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \approx \frac{-.05 - 0}{.85/\sqrt{58{,}562}} = -14.24$.

The observed p-value of the test is $p = P(z < -14.24) \approx 0$ (using Table II, Appendix D).

Since the p-value is less than $\alpha$ ($p \approx 0 < .01$), $H_0$ is rejected. There is sufficient evidence to indicate that the true mean forecast error for sell-side analysts is negative at $\alpha = .01$. This means that the sell-side analysts are underestimating earnings.

7.44 a. To determine if the sample data refute the manufacturer's claim, we test:

$H_0: \mu = 10$
$H_a: \mu < 10$

b. A Type I error is concluding the mean number of solder joints inspected per second is less than 10 when, in fact, it is 10 or more. A Type II error is concluding the mean number of solder joints inspected per second is at least 10 when, in fact, it is less than 10.

c.
Using MINITAB, the descriptive statistics are:

    Descriptive Statistics: PCB
    Variable   N   Mean  Median  StDev  Minimum  Maximum     Q1      Q3
    PCB       48  9.292   9.000  2.103    0.000   13.000  9.000  10.000

$H_0: \mu = 10$
$H_a: \mu < 10$

The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \approx \frac{9.292 - 10}{2.103/\sqrt{48}} = -2.33$.

The rejection region requires $\alpha = .05$ in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z < -1.645$.

Since the observed value of the test statistic falls in the rejection region ($z = -2.33 < -1.645$), $H_0$ is rejected. There is sufficient evidence to indicate the mean number of inspections per second is less than 10 at $\alpha = .05$.

7.45 a. To determine if CEOs at all California small firms generally agree with the statement, we test:

$H_0: \mu = 3.5$
$H_a: \mu > 3.5$

The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \approx \frac{3.85 - 3.5}{1.5/\sqrt{137}} = 2.73$.

The rejection region requires $\alpha = .05$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z > 1.645$.

Since the observed value of the test statistic falls in the rejection region ($z = 2.73 > 1.645$), $H_0$ is rejected. There is sufficient evidence to indicate CEOs at all California small firms generally agree with the statement (true mean scale score exceeds 3.5) at $\alpha = .05$.

b. Although the sample mean of 3.85 is far enough from 3.5 to conclude statistically that the population mean score is greater than 3.5, a mean of 3.85 may not differ from 3.5 by enough to be of any practical importance. A statistically significant result need not be practically significant.

c. No. Since the sample size ($n = 137$) is greater than 30, the Central Limit Theorem applies. The sampling distribution of $\bar{x}$ is approximately normal regardless of the population distribution.

7.46 a. No. Since the hypothesized value $\mu_M = 60{,}000$ falls in the 95% confidence interval, it is a likely candidate for the true mean. Thus, we would not reject $H_0$. There is no evidence that the mean salary for males differs from \$60,000.

b.
To determine if the true mean salary of males with post-graduate degrees differs from \$60,000, we test:

$H_0: \mu = 60{,}000$
$H_a: \mu \neq 60{,}000$

The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{61{,}340 - 60{,}000}{2{,}185} = 0.61$.

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.025} = 1.96$. The rejection region is $z < -1.96$ or $z > 1.96$.

Since the observed value of the test statistic does not fall in the rejection region ($z = 0.61 < 1.96$), $H_0$ is not rejected. There is insufficient evidence to indicate the true mean salary of males with post-graduate degrees differs from \$60,000 at $\alpha = .05$.

c. Parts a and b must agree. In both cases, a two-sided test/confidence interval is used. The z-score used in both parts is the same, as are $\bar{x}$ and $s_{\bar{x}}$.

d. No. Since the hypothesized value $\mu_F = 33{,}000$ falls in the 95% confidence interval, it is a likely candidate for the true mean. Thus, we would not reject $H_0$. There is no evidence that the mean salary for females differs from \$33,000.

e. To determine if the true mean salary of females with post-graduate degrees differs from \$33,000, we test:

$H_0: \mu = 33{,}000$
$H_a: \mu \neq 33{,}000$

The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{32{,}227 - 33{,}000}{932} = -0.83$.

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.025} = 1.96$. The rejection region is $z < -1.96$ or $z > 1.96$.

Since the observed value of the test statistic does not fall in the rejection region ($z = -0.83 > -1.96$), $H_0$ is not rejected. There is insufficient evidence to indicate the true mean salary of females with post-graduate degrees differs from \$33,000 at $\alpha = .05$.

f. Parts d and e must agree. In both cases, a two-sided test/confidence interval is used. The z-score used in both parts is the same, as are $\bar{x}$ and $s_{\bar{x}}$.

7.47 a. We should use the t-distribution in testing a hypothesis about a population mean if the sample size is small, the population being sampled from is normal, and the variance of the population is unknown.

b. Both distributions are mound-shaped and symmetric. The t-distribution is flatter than the z-distribution, with more probability in its tails.

7.48 a. $\alpha = P(t > 1.440) = .10$ (using Table III, Appendix D, with $df = 6$)

b. $\alpha = P(t < -1.782) = .05$ (using Table III, Appendix D, with $df = 12$)

c. $\alpha = P(t < -2.060) + P(t > 2.060) = .025 + .025 = .05$ (using Table III, Appendix D, with $df = 25$)

d. The probability of a Type I error is computed above for each of the parts.

7.49 a. The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t-distribution with $df = n - 1 = 14 - 1 = 13$. From Table III, Appendix D, $t_{.025} = 2.160$. The rejection region is $t < -2.160$ or $t > 2.160$.

b. The rejection region requires $\alpha = .01$ in the upper tail of the t-distribution with $df = n - 1 = 24 - 1 = 23$. From Table III, Appendix D, $t_{.01} = 2.500$. The rejection region is $t > 2.500$.

c. The rejection region requires $\alpha = .10$ in the upper tail of the t-distribution with $df = n - 1 = 9 - 1 = 8$. From Table III, Appendix D, $t_{.10} = 1.397$. The rejection region is $t > 1.397$.

d. The rejection region requires $\alpha = .01$ in the lower tail of the t-distribution with $df = n - 1 = 12 - 1 = 11$. From Table III, Appendix D, $t_{.01} = 2.718$. The rejection region is $t < -2.718$.

e. The rejection region requires $\alpha/2 = .10/2 = .05$ in each tail of the t-distribution with $df = n - 1 = 20 - 1 = 19$. From Table III, Appendix D, $t_{.05} = 1.729$. The rejection region is $t < -1.729$ or $t > 1.729$.

f. The rejection region requires $\alpha = .05$ in the lower tail of the t-distribution with $df = n - 1 = 4 - 1 = 3$. From Table III, Appendix D, $t_{.05} = 2.353$. The rejection region is $t < -2.353$.

7.50 a.

$H_0: \mu = 6$
$H_a: \mu < 6$

The test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{4.8 - 6}{1.3/\sqrt{5}} = -2.064$.

The necessary assumption is that the population is normal.

The rejection region requires $\alpha = .05$ in the lower tail of the t-distribution with $df = n - 1 = 5 - 1 = 4$. From Table III, Appendix D, $t_{.05} = 2.132$.
The rejection region is $t < -2.132$.

Since the observed value of the test statistic does not fall in the rejection region ($t = -2.064 > -2.132$), $H_0$ is not rejected. There is insufficient evidence to indicate the mean is less than 6 at $\alpha = .05$.

b.

$H_0: \mu = 6$
$H_a: \mu \neq 6$

The test statistic is $t = -2.064$ (from part a). The assumption is the same as in part a.

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t-distribution with $df = n - 1 = 5 - 1 = 4$. From Table III, Appendix D, $t_{.025} = 2.776$. The rejection region is $t < -2.776$ or $t > 2.776$.

Since the observed value of the test statistic does not fall in the rejection region ($t = -2.064 > -2.776$), $H_0$ is not rejected. There is insufficient evidence to indicate the mean is different from 6 at $\alpha = .05$.

c. For part a, the p-value is $P(t < -2.064)$. Using MINITAB:

    Cumulative Distribution Function
    Student's t distribution with 4 DF
         x  P( X <= x )
    -2.064    0.0539809

The p-value is $p = .05398$.

For part b, the p-value is $P(t < -2.064) + P(t > 2.064)$, so $p = 2(.05398) = .10796$.

7.51 a. We must assume that a random sample was drawn from a normal population.

b. The hypotheses are:

$H_0: \mu = 1{,}000$
$H_a: \mu > 1{,}000$

The test statistic is $t = 1.89$ and the p-value is $p = .038$. Since the p-value is small, there is evidence to reject $H_0$ for any $\alpha > .038$ (for example, $\alpha = .05$). There is evidence to indicate the mean is greater than 1,000 for $\alpha > .038$.

c. The hypotheses are:

$H_0: \mu = 1{,}000$
$H_a: \mu \neq 1{,}000$

The test statistic is $t = 1.89$ and the p-value is $p = 2(.038) = .076$. There is no evidence to reject $H_0$ for $\alpha = .05$, so there is insufficient evidence to indicate the mean is different from 1,000 at that level. There is evidence to reject $H_0$ for $\alpha > .076$; for such $\alpha$, there is evidence to indicate the mean is different from 1,000.

7.52 a. To determine if the mean of the trap spacing measurements differs from 95 meters, we test:

$H_0: \mu = 95$
$H_a: \mu \neq 95$

b. The value of $\bar{x}$ varies from sample to sample.
The next sample may yield a value of $\bar{x}$ that is greater than 95. We must determine how unusual a value of $\bar{x} = 89.9$ is if the true mean is 95.

c. The test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{89.9 - 95}{11.6/\sqrt{7}} = -1.16$.

d. Using MINITAB, the results are:

    One-Sample T
    Test of mu = 95 vs not = 95
    N     Mean    StDev  SE Mean  95% CI                   T      P
    7  89.9000  11.6000   4.3844  (79.1718, 100.6282)  -1.16  0.289

The p-value is $p = .289$.

e. Suppose we pick $\alpha = .05$. For this problem, $\alpha$ = probability of concluding the mean trap spacing is different from 95 when, in fact, the mean trap spacing is equal to 95.

f. Since the p-value is greater than $\alpha$ ($p = .289 > .05$), $H_0$ is not rejected. There is insufficient evidence to indicate the mean trap spacing is different from 95 at $\alpha = .05$.

g. In order for the test to be valid, the population of trap spacing measurements must be normal and the sample must be random.

h. From Exercise 6.29, the 95% confidence interval is (79.104, 100.616). Since 95 is contained in this interval, there is no evidence to indicate the mean trap spacing is different from 95. This agrees with the test in part f.

7.53 To determine if the mean level of radon exposure in the tombs is less than 6,000 Bq/m³, we test:

$H_0: \mu = 6{,}000$
$H_a: \mu < 6{,}000$

From the printout, the test statistic is $t = -1.82$. The printout p-value of .096 is two-tailed; since this is a one-tailed test and $t$ falls in the direction of $H_a$, the p-value is $p = .096/2 = .0480$.

Since the p-value is less than $\alpha$ ($p = .048 < .10$), $H_0$ is rejected. There is sufficient evidence to indicate the mean level of radon exposure is less than 6,000 Bq/m³ at $\alpha = .10$.

7.54 a. To determine if the true mean breaking strength of the new bonding adhesive is less than 5.70 MPa, we test:

$H_0: \mu = 5.70$
$H_a: \mu < 5.70$

b. The rejection region requires $\alpha = .01$ in the lower tail of the t-distribution with $df = n - 1 = 10 - 1 = 9$. From Table III, Appendix D, $t_{.01} = 2.821$. The rejection region is $t < -2.821$.

c. The test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{5.07 - 5.70}{.46/\sqrt{10}} = -4.33$.

d. Since the observed value of the test statistic falls in the rejection region ($t = -4.33 < -2.821$), $H_0$ is rejected. There is sufficient evidence to indicate the true mean breaking strength of the new bonding adhesive is less than 5.70 MPa at $\alpha = .01$.

e. We must assume that the sample was random and selected from a normal population.

7.55 a. To determine if the mean surface roughness of coated interior pipe differs from 2 micrometers, we test:

$H_0: \mu = 2$
$H_a: \mu \neq 2$

b. From the printout, the test statistic is $t = -1.02$.

c. The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t-distribution with $df = n - 1 = 20 - 1 = 19$. From Table III, Appendix D, $t_{.025} = 2.093$. The rejection region is $t < -2.093$ or $t > 2.093$.

d. Since the observed value of the test statistic does not fall in the rejection region ($t = -1.02 > -2.093$), $H_0$ is not rejected. There is insufficient evidence to indicate the true mean surface roughness of coated interior pipe differs from 2 micrometers at $\alpha = .05$.

e. The p-value is $p = .322$. Since the p-value is not less than $\alpha = .05$, $H_0$ is not rejected. There is insufficient evidence to indicate the true mean surface roughness of coated interior pipe differs from 2 micrometers at $\alpha = .05$.

f. From Exercise 6.33, we found the 95% confidence interval for the mean surface roughness of coated interior pipe to be (1.636, 2.126). Since the hypothesized value ($\mu = 2$) falls in the confidence interval, it is a likely value, and we cannot reject it. The confidence interval and the test of hypothesis lead to the same conclusion because the critical values for the two techniques are the same.

7.56 a. Let $\mu$ = mean annualized percentage return on investment. To determine if the mean annualized percentage return on investment is positive, we test:

$H_0: \mu = 0$
$H_a: \mu > 0$

b. From the printout, $\bar{x} = 10.8231$ and $s = 7.7115$. The test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{10.8231 - 0}{7.7115/\sqrt{13}} = 5.06$.

c. The p-value is $p \approx .0001$.

d. Since the p-value is less than $\alpha$ ($p \approx .0001 < .05$), $H_0$ is rejected.
There is sufficient evidence to indicate the mean annualized percentage return on investment is positive at $\alpha = .05$.

e. We must assume that we have selected a random sample from the population and that the population of annualized percentage returns on investment for all AAII stock screeners is normally distributed.

7.57 a. Let $\mu$ = mean daily amount of distilled water collected by the new system. To determine if the mean daily amount of distilled water collected by the new system is greater than 1.4, we test:

$H_0: \mu = 1.4$
$H_a: \mu > 1.4$

b. For this problem, $\alpha$ = probability of concluding the mean daily amount of distilled water collected by the new system is greater than 1.4 when, in fact, it is not greater than 1.4. Since $\alpha = .10$, this means that $H_0$ will be rejected when it is true about 10% of the time.

c. Using MINITAB, the descriptive statistics are:

    Descriptive Statistics: Water
    Variable  N   Mean  StDev  Minimum     Q1  Median     Q3  Maximum
    Water     3  5.243  0.192    5.070  5.070   5.210  5.450    5.450

Thus, $\bar{x} = 5.243$ and $s = .192$.

d. The test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{5.243 - 1.4}{.192/\sqrt{3}} = 34.67$.

e. Using MINITAB:

    One-Sample T: Water
    Test of mu = 1.4 vs > 1.4
                                                95% Lower
    Variable  N     Mean    StDev  SE Mean      Bound      T      P
    Water     3  5.24333  0.19218  0.11096    4.91935  34.64  0.000

The p-value is $p = 0.000$. (MINITAB's $t = 34.64$ differs slightly from the hand-computed 34.67 because $\bar{x}$ and $s$ were rounded in part d.)

f. Since the p-value is less than $\alpha$ ($p = .000 < .10$), $H_0$ is rejected. There is sufficient evidence to indicate the mean daily amount of distilled water collected by the new system is greater than 1.4 at $\alpha = .10$.

7.58 Using MINITAB, the descriptive statistics are:

    Descriptive Statistics: Choice
    Variable   N   Mean  StDev  Minimum     Q1  Median     Q3  Maximum
    Choice    11  58.91   7.78    43.00  56.00   58.00  62.00    76.00

Let $\mu$ = mean choice score for consumers shopping with flexed arms. To determine if the mean choice score for consumers shopping with flexed arms is higher than 43, we test:

$H_0: \mu = 43$
$H_a: \mu > 43$
The test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{58.91 - 43}{7.78/\sqrt{11}} = 6.78$.

The rejection region requires $\alpha = .05$ in the upper tail of the t-distribution with $df = n - 1 = 11 - 1 = 10$. From Table III, Appendix D, $t_{.05} = 1.812$. The rejection region is $t > 1.812$.

Since the observed value of the test statistic falls in the rejection region ($t = 6.78 > 1.812$), $H_0$ is rejected. There is sufficient evidence to indicate the mean choice score for consumers shopping with flexed arms is higher than 43 at $\alpha = .05$.

7.59 Using MINITAB, the results are:

    One-Sample T: Skid
    Test of mu = 425 vs < 425
                                                  95% Upper
    Variable   N     Mean    StDev  SE Mean        Bound      T      P
    Skid      20  358.450  117.817   26.345      404.004  -2.53  0.010

To determine if the mean skidding distance is less than 425 meters, we test:

$H_0: \mu = 425$
$H_a: \mu < 425$

The test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{358.45 - 425}{117.817/\sqrt{20}} = -2.53$.

The rejection region requires $\alpha = .10$ in the lower tail of the t-distribution with $df = n - 1 = 20 - 1 = 19$. From Table III, Appendix D, $t_{.10} = 1.328$. The rejection region is $t < -1.328$.

Since the observed value of the test statistic falls in the rejection region ($t = -2.53 < -1.328$), $H_0$ is rejected. There is sufficient evidence to indicate the true mean skidding distance is less than 425 meters at $\alpha = .10$. There is sufficient evidence to refute the claim.

7.60 Using MINITAB, the descriptive statistics are:

    Descriptive Statistics: Dioxide
    Variable  Oil   N   Mean  StDev  Minimum     Q1  Median     Q3  Maximum
    Dioxide   No   10  2.590  1.542    0.100  1.125   2.850  4.000    4.000
              Yes   6  0.517  0.407    0.200  0.200   0.450  0.700    1.300

a. To determine if the mean amount of dioxide present in water specimens that contain oil is less than 3 mg/l, we test:

$H_0: \mu = 3$
$H_a: \mu < 3$

The test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{.517 - 3}{.407/\sqrt{6}} = -14.94$.

The rejection region requires $\alpha = .10$ in the lower tail of the t-distribution with $df = n - 1 = 6 - 1 = 5$. From Table III, Appendix D, $t_{.10} = 1.476$. The rejection region is $t < -1.476$.
Since the observed value of the test statistic falls in the rejection region ($t = -14.94 < -1.476$), $H_0$ is rejected. There is sufficient evidence to indicate the mean amount of dioxide present in water specimens that contain oil is less than 3 mg/l at $\alpha = .10$.

b. To determine if the mean amount of dioxide present in water specimens that do not contain oil is less than 3 mg/l, we test:

$H_0: \mu = 3$
$H_a: \mu < 3$

The test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{2.59 - 3}{1.542/\sqrt{10}} = -0.84$.

The rejection region requires $\alpha = .10$ in the lower tail of the t-distribution with $df = n - 1 = 10 - 1 = 9$. From Table III, Appendix D, $t_{.10} = 1.383$. The rejection region is $t < -1.383$.

Since the observed value of the test statistic does not fall in the rejection region ($t = -0.84 > -1.383$), $H_0$ is not rejected. There is insufficient evidence to indicate the mean amount of dioxide present in water specimens that do not contain oil is less than 3 mg/l at $\alpha = .10$.

7.61 To determine if the true mean crack intensity of the Mississippi highway exceeds the AASHTO recommended maximum, we test:

$H_0: \mu = .100$
$H_a: \mu > .100$

Since the sample variance is $s^2 = .011$, $s = \sqrt{.011} \approx .105$, and the test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{.210 - .100}{\sqrt{.011}/\sqrt{8}} = 2.97$.

The rejection region requires $\alpha = .01$ in the upper tail of the t-distribution with $df = n - 1 = 8 - 1 = 7$. From Table III, Appendix D, $t_{.01} = 2.998$. The rejection region is $t > 2.998$.

Since the observed value of the test statistic does not fall in the rejection region ($t = 2.97 < 2.998$), $H_0$ is not rejected. There is insufficient evidence to indicate that the true mean crack intensity of the Mississippi highway exceeds the AASHTO recommended maximum at $\alpha = .01$.

7.62 a. Using MINITAB, the descriptive statistics are:

    Descriptive Statistics: Plants
    Variable   N   Mean  StDev  Minimum     Q1  Median     Q3  Maximum
    Plants    20  3.900  2.770    1.000  1.250   3.500  5.000   11.000

Let $\mu$ = mean number of active nuclear power plants operating in all states. To determine if the mean number of active nuclear power plants operating in all states exceeds 3, we test:
$H_0: \mu = 3$
$H_a: \mu > 3$

The test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{3.9 - 3}{2.77/\sqrt{20}} = 1.45$.

The rejection region requires $\alpha = .10$ in the upper tail of the t-distribution with $df = n - 1 = 20 - 1 = 19$. From Table III, Appendix D, $t_{.10} = 1.328$. The rejection region is $t > 1.328$.

Since the observed value of the test statistic falls in the rejection region ($t = 1.45 > 1.328$), $H_0$ is rejected. There is sufficient evidence to indicate the mean number of active nuclear power plants operating in all states exceeds 3 at $\alpha = .10$.

b. We will look at the four methods for determining if the data are normal. First, we will look at a histogram of the data. Using MINITAB, the histogram of the number of power plants is:

[Figure: Histogram of Plants with fitted normal curve (Mean 3.9, StDev 2.770, N 20); x-axis: Plants, y-axis: Frequency]

From the histogram, the data appear to be skewed to the right. This indicates that the data may not be normal.

Next, we look at the intervals $\bar{x} \pm s$, $\bar{x} \pm 2s$, and $\bar{x} \pm 3s$. If the proportions of observations falling in each interval are approximately .68, .95, and 1.00, then the data are approximately normal.

$\bar{x} \pm s \Rightarrow 3.9 \pm 2.77 \Rightarrow (1.13, 6.67)$: 12 of the 20 values fall in this interval. The proportion is .60, smaller than the .68 we would expect if the data were normal.

$\bar{x} \pm 2s \Rightarrow 3.9 \pm 2(2.77) \Rightarrow 3.9 \pm 5.54 \Rightarrow (-1.64, 9.44)$: 19 of the 20 values fall in this interval. The proportion is .95, the same as the .95 we would expect if the data were normal.

$\bar{x} \pm 3s \Rightarrow 3.9 \pm 3(2.77) \Rightarrow 3.9 \pm 8.31 \Rightarrow (-4.41, 12.21)$: 20 of the 20 values fall in this interval. The proportion is 1.00, equal to the 1.00 we would expect if the data were normal.

From the first interval, it appears that the data might not be normal.

Next, we look at the ratio of the IQR to $s$: $IQR = Q_U - Q_L = 5.00 - 1.25 = 3.75$, so $\frac{IQR}{s} = \frac{3.75}{2.77} = 1.35$. This is pretty close to the 1.3 we would expect if the data were normal, so this method indicates the data may be normal.
Finally, using MINITAB, the normal probability plot is:

[Figure: Probability Plot of Plants, Normal - 95% CI (Mean 3.9, StDev 2.770, N 20, AD 0.664, P-Value 0.070); x-axis: Plants, y-axis: Percent]

Since the data do not form a straight line, the data may not be normal.

From 3 of the 4 different methods, the indications are that the number of power plants data are not normal.

c. The two largest values are 9 and 11. The two lowest values are 1 and 1. Using MINITAB with these data deleted yields the descriptive statistics:

    Descriptive Statistics: Plants2
    Variable   N   Mean  StDev  Minimum     Q1  Median     Q3  Maximum
    Plants2   16  3.500  1.826    1.000  2.000   3.500  5.000    7.000

To determine if the mean number of active nuclear power plants operating in all states exceeds 3 (using the reduced data set), we test:

$H_0: \mu = 3$
$H_a: \mu > 3$

The test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{3.50 - 3}{1.826/\sqrt{16}} = 1.095$.

The rejection region requires $\alpha = .10$ in the upper tail of the t-distribution with $df = n - 1 = 16 - 1 = 15$. From Table III, Appendix D, $t_{.10} = 1.341$. The rejection region is $t > 1.341$.

Since the observed value of the test statistic does not fall in the rejection region ($t = 1.095 < 1.341$), $H_0$ is not rejected. There is insufficient evidence to indicate the mean number of active nuclear power plants operating in all states exceeds 3 at $\alpha = .10$. By eliminating the top two and bottom two observations, we have changed the decision about $H_0$.

d. It is very dangerous to eliminate data points to satisfy assumptions. The data may, in fact, not be normal. By eliminating data points, one changes the kind of data that come from the parent population. Thus, incorrect decisions could be made.

7.63 Using MINITAB, the descriptive statistics for the 2 plants are:

    Descriptive Statistics: AL1, AL2
    Variable  N     Mean    StDev  Minimum  Q1   Median  Q3  Maximum
    AL1       2  0.00750  0.00354  0.00500   *  0.00750   *  0.01000
    AL2       2   0.0700   0.0283   0.0500   *   0.0700   *   0.0900

To determine if plant 1 is violating the OSHA standard, we test:

$H_0: \mu = .004$
$H_a: \mu > .004$

The test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{.0075 - .004}{.00354/\sqrt{2}} = 1.40$.

Since no $\alpha$ level was given, we will use $\alpha = .10$. The rejection region requires $\alpha = .10$ in the upper tail of the t-distribution with $df = n - 1 = 2 - 1 = 1$. From Table III, Appendix D, $t_{.10} = 3.078$. The rejection region is $t > 3.078$.

Since the observed value of the test statistic does not fall in the rejection region ($t = 1.40 < 3.078$), $H_0$ is not rejected. There is insufficient evidence to indicate the OSHA standard is violated by plant 1 at $\alpha = .10$.

To determine if plant 2 is violating the OSHA standard, we test:

$H_0: \mu = .004$
$H_a: \mu > .004$

The test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{.07 - .004}{.0283/\sqrt{2}} = 3.30$.

Since no $\alpha$ level was given, we will use $\alpha = .10$. The rejection region requires $\alpha = .10$ in the upper tail of the t-distribution with $df = n - 1 = 2 - 1 = 1$. From Table III, Appendix D, $t_{.10} = 3.078$. The rejection region is $t > 3.078$.

Since the observed value of the test statistic falls in the rejection region ($t = 3.30 > 3.078$), $H_0$ is rejected. There is sufficient evidence to indicate the OSHA standard is violated by plant 2 at $\alpha = .10$.

7.64 a. Since the value of $\hat{p}$ (.63) is much smaller than the hypothesized value of $p$ (.70), it is likely that the null hypothesis is not correct.

b. First, check to see if $n$ is large enough: $np_0 = 100(.7) = 70$ and $nq_0 = 100(.3) = 30$. Since both $np_0 \ge 15$ and $nq_0 \ge 15$, the normal approximation will be adequate.

$H_0: p = .70$
$H_a: p < .70$

The test statistic is $z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0/n}} = \frac{.63 - .70}{\sqrt{.70(.30)/100}} = -1.53$.

The rejection region requires $\alpha = .05$ in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z < -1.645$.
Since the observed value of the test statistic does not fall in the rejection region ($z = -1.53 > -1.645$), $H_0$ is not rejected. There is insufficient evidence to indicate that the proportion is less than .70 at $\alpha = .05$.

c. The p-value is $p = P(z < -1.53) = .5 - .4370 = .0630$. Since $p$ is not less than $\alpha = .05$, $H_0$ is not rejected.

7.65 a. $z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0/n}} = \frac{.83 - .9}{\sqrt{.9(.1)/100}} = -2.33$

b. The denominator in Exercise 7.64 is $\sqrt{.7(.3)/100} = .0458$, as compared to $\sqrt{.9(.1)/100} = .03$ in part a. Since the denominator in this problem is smaller, the absolute value of $z$ is larger.

c. The rejection region requires $\alpha = .05$ in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z < -1.645$.

Since the observed value of the test statistic falls in the rejection region ($z = -2.33 < -1.645$), $H_0$ is rejected. There is sufficient evidence to indicate the population proportion is less than .9 at $\alpha = .05$.

d. The p-value is $p = P(z < -2.33) = .5 - .4901 = .0099$ (from Table II, Appendix D). Since the p-value is less than $\alpha = .05$, $H_0$ is rejected.

7.66 a. No. The p-value is the probability of observing your test statistic or anything more unusual if $H_0$ is true. For this problem, the p-value $= .3300/2 = .1650$. Given the true value of the population proportion, $p$, is .5, the probability of observing a test statistic of $z = .44$ or larger is .1650. Since the p-value is not small ($p = .1650$), there is no evidence to reject $H_0$. There is no evidence to indicate the population proportion is greater than .5.

b. If the alternative hypothesis were two-tailed, the p-value would be 2 times the p-value for a one-tailed test. For this problem, the p-value $= .3300$. The probability of observing your test statistic or anything more unusual if $H_0$ is true is .3300. Since the p-value is so large, there is no evidence to reject $H_0$ for $\alpha = .10$. There is no evidence to indicate that $p \neq .5$ for $\alpha = .10$.
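The large-sample z tests in this section (for a mean in Exercises 7.37 through 7.45, for a proportion in Exercises 7.64 and 7.65) all follow one pattern: standardize the estimate, then read a normal tail area. The Python sketch below is an illustrative check, not part of the manual's MINITAB and Table II workflow. It assumes the standard library's `statistics.NormalDist` in place of Table II, so its p-values can differ from the tabled ones in the third or fourth decimal place. The helper names (`tail_p_value`, `z_test_mean`, `z_test_prop`, `normal_approx_ok`, `t_statistic`) are hypothetical. Because the standard library has no t-distribution, `t_statistic` only computes $t$; the comparison with a Table III critical value is left to the reader.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, sd 1; stands in for Table II


def tail_p_value(z, tail):
    """p-value of an observed z for Ha: '>' (upper), '<' (lower) or '!=' (two-tailed)."""
    if tail == ">":
        return 1 - STD_NORMAL.cdf(z)
    if tail == "<":
        return STD_NORMAL.cdf(z)
    if tail == "!=":
        return 2 * (1 - STD_NORMAL.cdf(abs(z)))
    raise ValueError("tail must be '>', '<' or '!='")


def z_test_mean(xbar, mu0, s, n, tail):
    """Large-sample test of H0: mu = mu0, estimating sigma_xbar by s / sqrt(n)."""
    z = (xbar - mu0) / (s / sqrt(n))
    return z, tail_p_value(z, tail)


def normal_approx_ok(n, p0, minimum=15):
    """Sample-size rule used in this chapter: n*p0 >= 15 and n*q0 >= 15."""
    return n * p0 >= minimum and n * (1 - p0) >= minimum


def z_test_prop(p_hat, p0, n, tail):
    """Large-sample test of H0: p = p0, with sigma_phat = sqrt(p0*q0/n) computed under H0."""
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    return z, tail_p_value(z, tail)


def t_statistic(xbar, mu0, s, n):
    """Small-sample statistic t = (xbar - mu0) / (s / sqrt(n)), with df = n - 1."""
    return (xbar - mu0) / (s / sqrt(n))


# Exercise 7.39a: H0: mu = 48 vs Ha: mu > 48
z, p = z_test_mean(60, 48, 41, 40, ">")
print(f"7.39a: z = {z:.2f}, p = {p:.4f}")

# Exercise 7.64b: check n, then H0: p = .70 vs Ha: p < .70
print("7.64b normal approximation adequate:", normal_approx_ok(100, .70))
z, p = z_test_prop(.63, .70, 100, "<")
print(f"7.64b: z = {z:.2f}, p = {p:.4f}")

# Exercise 7.61: s = sqrt(.011), compared with t_.01 = 2.998 (df = 7) from Table III
t = t_statistic(.210, .100, sqrt(.011), 8)
print(f"7.61: t = {t:.2f}, reject H0 at alpha = .01: {t > 2.998}")
```

Computing $\sigma_{\hat{p}}$ from $p_0$ rather than $\hat{p}$ mirrors the manual's formula $\sigma_{\hat{p}} = \sqrt{p_0 q_0/n}$, the standard error under $H_0$.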
7.67 From Exercise 6.44, $n = 50$, and since $p$ is the proportion of consumers who do not like the snack food, $\hat{p}$ will be:

$\hat{p} = \frac{\text{Number of 0's in sample}}{n} = \frac{29}{50} = .58$

First, check to see if the normal approximation will be adequate: $np_0 = 50(.5) = 25$ and $nq_0 = 50(.5) = 25$. Since both $np_0 \ge 15$ and $nq_0 \ge 15$, the normal distribution will be adequate.

a.

$H_0: p = .5$
$H_a: p > .5$

The test statistic is $z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0/n}} = \frac{.58 - .5}{\sqrt{.5(1 - .5)/50}} = 1.13$.

The rejection region requires $\alpha = .10$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.10} = 1.28$. The rejection region is $z > 1.28$.

Since the observed value of the test statistic does not fall in the rejection region ($z = 1.13 < 1.28$), $H_0$ is not rejected. There is insufficient evidence to indicate the proportion of customers who do not like the snack food is greater than .5 at $\alpha = .10$.

b. The p-value is $p = P(z > 1.13) = .5 - .3708 = .1292$ (using Table II, Appendix D).

7.68 The sample size is large enough to use the normal approximation if $np_0 \ge 15$ and $nq_0 \ge 15$.

a. $np_0 = 900(.975) = 877.5 \ge 15$ and $nq_0 = 900(.025) = 22.5 \ge 15$. Thus, the sample size is large enough.

b. $np_0 = 125(.01) = 1.25 < 15$ and $nq_0 = 125(.99) = 123.75 \ge 15$. Thus, the sample size is not large enough.

c. $np_0 = 40(.75) = 30 \ge 15$ and $nq_0 = 40(.25) = 10 < 15$. Thus, the sample size is not large enough.

d. $np_0 = 15(.75) = 11.25 < 15$ and $nq_0 = 15(.25) = 3.75 < 15$. Thus, the sample size is not large enough.

e. $np_0 = 12(.62) = 7.44 < 15$ and $nq_0 = 12(.38) = 4.56 < 15$. Thus, the sample size is not large enough.

7.69 a. $\hat{p} = \frac{x}{n} = \frac{506}{755} = .67$

b. To determine if the true proportion of all internet-using adults who have paid to download music exceeds .7, we test:

$H_0: p = .7$
$H_a: p > .7$

c. The test statistic is $z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0/n}} = \frac{.67 - .7}{\sqrt{.7(1 - .7)/755}} = -1.80$.

d. The rejection region requires $\alpha = .01$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.01} = 2.33$. The rejection region is $z > 2.33$.

e. The p-value is $p = P(z > -1.80) = .5 + .4641 = .9641$ (using Table II, Appendix D).

f. Since the observed value of the test statistic does not fall in the rejection region ($z = -1.80 < 2.33$), $H_0$ is not rejected. There is insufficient evidence to indicate the true proportion of all internet-using adults who have paid to download music exceeds .7 at $\alpha = .01$.

g. Since the p-value is not less than $\alpha$ ($p = .9641 > .01$), $H_0$ is not rejected. There is insufficient evidence to indicate the true proportion of all internet-using adults who have paid to download music exceeds .7 at $\alpha = .01$.

7.70 a. $p$ = true proportion of all satellite radio subscribers who have a satellite radio receiver in their car.

b. The null hypothesis is $H_0: p = .8$.

c. To determine if the claim is too high, the alternative hypothesis is $H_a: p < .8$.

d. $\hat{p} = \frac{x}{n} = \frac{396}{501} = .79$

The test statistic is $z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0/n}} = \frac{.79 - .8}{\sqrt{.8(1 - .8)/501}} = -.56$.

e. The rejection region requires $\alpha = .10$ in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.10} = 1.28$. The rejection region is $z < -1.28$.

f. The p-value is $p = P(z < -.56) = .5 - .2123 = .2877$ (using Table II, Appendix D).

g. Since the observed value of the test statistic does not fall in the rejection region ($z = -.56 > -1.28$), $H_0$ is not rejected. There is insufficient evidence to indicate the claim is too high at $\alpha = .10$.

Since the p-value is not less than $\alpha$ ($p = .2877 > .10$), $H_0$ is not rejected. There is insufficient evidence to indicate the claim is too high at $\alpha = .10$.

7.71 To determine if the sample provides sufficient evidence to indicate that the true percentage of all firms that announced one or more acquisitions during the year 2000 is less than 30%, we test:

$H_0: p = .30$
$H_a: p < .30$

The point estimate is $\hat{p} = \frac{x}{n} = \frac{748}{2{,}778} = .269$.

The test statistic is $z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0/n}} = \frac{.269 - .30}{\sqrt{.30(.70)/2{,}778}} = -3.57$.

The rejection region requires $\alpha = .05$ in the lower tail of the z-distribution.
From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z < -1.645$.

Since the observed value of the test statistic falls in the rejection region ($z = -3.57 < -1.645$), $H_0$ is rejected. There is sufficient evidence to indicate that the true percentage of all firms that announced one or more acquisitions during the year 2000 is less than 30% at $\alpha = .05$.

7.72 a. If there is no relationship between color and gummy bear flavor, then .5 of the population of students will correctly identify the color.

b. To determine if color and flavor are related, we test:

$H_0: p = .5$, $H_a: p > .5$

c. From the printout, the p-value is $p = .000$. Since the p-value is less than $\alpha$ ($p = .000 < .01$), $H_0$ is rejected. There is sufficient evidence to indicate that color and flavor are related at $\alpha = .01$.

7.73 a. To determine whether the true proportion of toothpaste brands with the ADA seal verifying effective decay prevention is less than .5, we test:

$H_0: p = .5$, $H_a: p < .5$

b. From the printout, the p-value is $p = .231$.

c. Since the observed p-value is greater than $\alpha$ ($p = .231 > .10$), $H_0$ is not rejected. There is insufficient evidence to indicate the true proportion of toothpaste brands with the ADA seal verifying effective decay prevention is less than .5 at $\alpha = .10$.

7.74 a. $\hat{p} = \frac{x}{n} = \frac{198}{1{,}982} = .10$

To determine if the percentage of all residential properties purchased for vacation homes is less than 14%, we test:

$H_0: p = .14$, $H_a: p < .14$

The test statistic is $z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \frac{.10 - .14}{\sqrt{.14(1 - .14)/1{,}982}} = -5.13$.

The rejection region requires $\alpha = .01$ in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.01} = 2.33$. The rejection region is $z < -2.33$.

Since the observed value of the test statistic falls in the rejection region ($z = -5.13 < -2.33$), $H_0$ is rejected. There is sufficient evidence to indicate the percentage of all residential properties purchased for vacation homes is less than 14% at $\alpha = .01$.

b. The return rate is only $1{,}982/45{,}000 = .044$. This is a very low return rate. Since this is a self-selected sample (only those who wanted to respond returned their questionnaire), it is very likely that the sample was not representative. Therefore, the results are very suspect.

7.75 $\hat{p} = \frac{x}{n} = \frac{417 + 77}{845} = .585$

To determine if fewer than 60% of the coffee growers in southern Mexico are either certified or transitioning to become certified, we test:

$H_0: p = .60$, $H_a: p < .60$

The test statistic is $z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \frac{.585 - .60}{\sqrt{.60(1 - .60)/845}} = -.89$.

The rejection region requires $\alpha = .05$ in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z < -1.645$.

Since the observed value of the test statistic does not fall in the rejection region ($z = -.89 > -1.645$), $H_0$ is not rejected. There is insufficient evidence to indicate that fewer than 60% of the coffee growers in southern Mexico are either certified or transitioning to become certified at $\alpha = .05$.

7.76 $\hat{p} = \frac{x}{n} = \frac{53}{500} = .106$

To determine if the French unemployment rate dropped after the enactment of the 35-hour work week law, we test:

$H_0: p = .12$, $H_a: p < .12$

The test statistic is $z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \frac{.106 - .12}{\sqrt{.12(1 - .12)/500}} = -.96$.

The rejection region requires $\alpha = .05$ in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z < -1.645$.

Since the observed value of the test statistic does not fall in the rejection region ($z = -.96 > -1.645$), $H_0$ is not rejected. There is insufficient evidence to indicate that the French unemployment rate dropped after the enactment of the 35-hour work week law at $\alpha = .05$.

7.77 a. Let $p$ = proportion of middle-aged women who exhibit skin improvement after using the cream. For this problem, $\hat{p} = \frac{x}{n} = \frac{24}{33} = .727$.

First we check to see if the normal approximation is adequate: $np_0 = 33(.6) = 19.8$ and $nq_0 = 33(.4) = 13.2$. Since $nq_0 = 13.2$ is less than 15, the assumption of normality may not be valid. We will go ahead and perform the test.

To determine if the cream will improve the skin of more than 60% of middle-aged women, we test:

$H_0: p = .60$, $H_a: p > .60$

The test statistic is $z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \frac{.727 - .60}{\sqrt{.60(.40)/33}} = 1.49$.

The rejection region requires $\alpha = .05$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z > 1.645$.

Since the observed value of the test statistic does not fall in the rejection region ($z = 1.49 < 1.645$), $H_0$ is not rejected. There is insufficient evidence to indicate the cream will improve the skin of more than 60% of middle-aged women at $\alpha = .05$.

b. The p-value is $p = P(z \ge 1.49) = .5 - .4319 = .0681$ (using Table II, Appendix D). Since the p-value is greater than $\alpha$ ($p = .0681 > .05$), $H_0$ is not rejected. There is insufficient evidence to indicate the cream will improve the skin of more than 60% of middle-aged women at $\alpha = .05$.

7.78 First, check to see if $n$ is large enough: $np_0 = 2{,}376(.7) = 1{,}663.2$ and $nq_0 = 2{,}376(.3) = 712.8$. Since both $np_0 \ge 15$ and $nq_0 \ge 15$, the normal approximation will be adequate.

$\hat{p} = \frac{x}{n} = \frac{1{,}554}{2{,}376} = .654$

To determine if the true detection rate for pictures of PTW is different from .70, we test:

$H_0: p = .70$, $H_a: p \ne .70$

The test statistic is $z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \frac{.654 - .70}{\sqrt{.70(.30)/2{,}376}} = -4.89$.

The rejection region requires $\alpha/2 = .10/2 = .05$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z < -1.645$ or $z > 1.645$.

Since the observed value of the test statistic falls in the rejection region ($z = -4.89 < -1.645$), $H_0$ is rejected. There is sufficient evidence to indicate the true detection rate for pictures of PTW is different from .70 at $\alpha = .10$.

7.79 Answers will vary.
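Each decision in Exercises 7.71 through 7.78 compares the observed $z$ with a critical value from Table II. A minimal Python sketch of that comparison is shown below. The function name is hypothetical, and the critical values are computed with `statistics.NormalDist().inv_cdf` rather than read from the table, so $z_{.01}$ appears as 2.326 instead of the rounded 2.33.

```python
from statistics import NormalDist


def z_rejection_decision(z, alpha, tail):
    """Return (critical value, reject H0?) for a z test at level alpha.

    tail="upper": reject if z > z_alpha
    tail="lower": reject if z < -z_alpha
    tail="two":   reject if |z| > z_{alpha/2}
    """
    std_normal = NormalDist()
    if tail == "two":
        crit = std_normal.inv_cdf(1 - alpha / 2)
        return crit, abs(z) > crit
    crit = std_normal.inv_cdf(1 - alpha)
    if tail == "upper":
        return crit, z > crit
    return crit, z < -crit
```

For instance, `z_rejection_decision(-4.89, 0.10, "two")` returns a critical value of about 1.645 and `True` (reject), matching Exercise 7.78, while `z_rejection_decision(-0.89, 0.05, "lower")` returns `False`, matching Exercise 7.75.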
The target population will be all households in the United States that have televisions. The experimental unit will be an individual household in the United States with a television. The variable to be measured is whether or not the household has a DVR. Let $p$ = proportion of all households in the United States with televisions that also have DVRs. The hypotheses of interest are:

$H_0: p = .41$, $H_a: p \ne .41$

The test statistic is $z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \frac{\hat{p} - .41}{\sqrt{.41(1 - .41)/n}}$.

One could use a random number generator to generate 500 random telephone numbers to call to obtain a random sample.

7.80 Let $p$ = proportion of students choosing the three-grill display so that Grill #2 is a compromise between a more desirable and a less desirable grill.

$\hat{p} = \frac{x}{n} = \frac{85}{124} = .685$

To determine if the proportion of students choosing the three-grill display so that Grill #2 is a compromise between a more desirable and a less desirable grill is greater than .167, we test:

$H_0: p = .167$, $H_a: p > .167$

The test statistic is $z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \frac{.685 - .167}{\sqrt{.167(.833)/124}} = 15.47$.

The rejection region requires $\alpha = .05$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z > 1.645$.

Since the observed value of the test statistic falls in the rejection region ($z = 15.47 > 1.645$), $H_0$ is rejected. There is sufficient evidence to indicate that the true proportion of students choosing the three-grill display so that Grill #2 is a compromise between a more desirable and a less desirable grill is greater than .167 at $\alpha = .05$.

7.81 To minimize the probability of a Type I error, we will select $\alpha = .01$.
First, check to see if the normal approximation is adequate: $np_0 = 100(.5) = 50$ and $nq_0 = 100(.5) = 50$. Since both $np_0 \ge 15$ and $nq_0 \ge 15$, the normal distribution will be adequate.

$\hat{p} = \frac{x}{n} = \frac{56}{100} = .56$

To determine if more than half of all Diet Coke drinkers prefer Diet Pepsi, we test:

$H_0: p = .5$, $H_a: p > .5$

The test statistic is $z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \frac{.56 - .5}{\sqrt{.5(.5)/100}} = 1.20$.

The rejection region requires $\alpha = .01$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.01} = 2.33$. The rejection region is $z > 2.33$.

Since the observed value of the test statistic does not fall in the rejection region ($z = 1.20 < 2.33$), $H_0$ is not rejected. There is insufficient evidence to indicate that more than half of all Diet Coke drinkers prefer Diet Pepsi at $\alpha = .01$. Since $H_0$ was not rejected, there is no evidence that Diet Coke drinkers prefer Diet Pepsi.

7.82 Using Table IV, Appendix D:

a. For $n = 12$, $df = n - 1 = 12 - 1 = 11$: $P(\chi^2 > \chi_0^2) = .10 \Rightarrow \chi_0^2 = 17.2750$

b. For $n = 9$, $df = n - 1 = 9 - 1 = 8$: $P(\chi^2 > \chi_0^2) = .05 \Rightarrow \chi_0^2 = 15.5073$

c. For $n = 5$, $df = n - 1 = 5 - 1 = 4$: $P(\chi^2 > \chi_0^2) = .025 \Rightarrow \chi_0^2 = 11.1433$

7.83 a. $df = n - 1 = 16 - 1 = 15$; reject $H_0$ if $\chi^2 < 6.26214$ or $\chi^2 > 27.4884$

b. $df = n - 1 = 23 - 1 = 22$; reject $H_0$ if $\chi^2 > 40.2894$

c. $df = n - 1 = 15 - 1 = 14$; reject $H_0$ if $\chi^2 > 21.0642$

d. $df = n - 1 = 13 - 1 = 12$; reject $H_0$ if $\chi^2 < 3.57056$

e. $df = n - 1 = 7 - 1 = 6$; reject $H_0$ if $\chi^2 < 1.63539$ or $\chi^2 > 12.5916$

f. $df = n - 1 = 25 - 1 = 24$; reject $H_0$ if $\chi^2 < 13.8484$

7.84 a. It would be necessary to assume that the population has a normal distribution.

b. $H_0: \sigma^2 = 1$, $H_a: \sigma^2 > 1$

The test statistic is $\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{(7 - 1)(4.84)}{1} = 29.04$.

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with $df = n - 1 = 7 - 1 = 6$. From Table IV, Appendix D, $\chi^2_{.05} = 12.5916$. The rejection region is $\chi^2 > 12.5916$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 29.04 > 12.5916$), $H_0$ is rejected. There is sufficient evidence to indicate that the variance is greater than 1 at $\alpha = .05$.

c.
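The chi-square test for a variance in Exercises 7.84 through 7.86 reduces to one formula and a comparison with Table IV. The Python sketch below is illustrative only: the function names are hypothetical, and the critical values are passed in by hand from the table because the Python standard library has no chi-square quantile function.

```python
def chi_square_variance_stat(n, s2, sigma0_sq):
    """Test statistic (n - 1) * s^2 / sigma0^2 for H0: sigma^2 = sigma0^2, df = n - 1."""
    return (n - 1) * s2 / sigma0_sq


def chi_square_decision(stat, lower_crit=None, upper_crit=None):
    """Reject H0 if stat falls below lower_crit or above upper_crit.

    Critical values are read from a chi-square table (e.g., Table IV) for
    df = n - 1; pass None for a tail that is not part of the rejection region.
    """
    below = lower_crit is not None and stat < lower_crit
    above = upper_crit is not None and stat > upper_crit
    return below or above
```

With Exercise 7.84 as an example, `chi_square_variance_stat(7, 4.84, 1)` gives 29.04, and `chi_square_decision(29.04, upper_crit=12.5916)` returns `True`, so $H_0$ is rejected, as in part b.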
$H_0: \sigma^2 = 1$, $H_a: \sigma^2 \ne 1$

The test statistic is $\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{(7 - 1)(4.84)}{1} = 29.04$.

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the $\chi^2$ distribution with $df = n - 1 = 7 - 1 = 6$. From Table IV, Appendix D, $\chi^2_{.975} = 1.237347$ and $\chi^2_{.025} = 14.4494$. The rejection region is $\chi^2 < 1.237347$ or $\chi^2 > 14.4494$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 29.04 > 14.4494$), $H_0$ is rejected. There is sufficient evidence to indicate that the variance is not equal to 1 at $\alpha = .05$.

7.85 a. $H_0: \sigma^2 = 1$, $H_a: \sigma^2 > 1$

The test statistic is $\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{(100 - 1)(4.84)}{1} = 479.16$.

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with $df = n - 1 = 100 - 1 = 99$. From Table IV, Appendix D, $\chi^2_{.05} = 124.342$ (the tabled value for $df = 100$, the closest available). The rejection region is $\chi^2 > 124.342$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 479.16 > 124.342$), $H_0$ is rejected. There is sufficient evidence to indicate the variance is larger than 1 at $\alpha = .05$.

b. In part b of Exercise 7.84, the test statistic was $\chi^2 = 29.04$. The conclusion was to reject $H_0$, as it was in this problem.

7.86 Some preliminary calculations are:

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{176 - \frac{30^2}{7}}{7 - 1} = 7.9048$

To determine if $\sigma^2 < 1$, we test:

$H_0: \sigma^2 = 1$, $H_a: \sigma^2 < 1$

The test statistic is $\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{(7 - 1)(7.9048)}{1} = 47.43$.

The rejection region requires $\alpha = .05$ in the lower tail of the $\chi^2$ distribution with $df = n - 1 = 7 - 1 = 6$. From Table IV, Appendix D, $\chi^2_{.95} = 1.63539$. The rejection region is $\chi^2 < 1.63539$.

Since the observed value of the test statistic does not fall in the rejection region ($\chi^2 = 47.43 > 1.63539$), $H_0$ is not rejected. There is insufficient evidence to indicate the variance is less than 1 at $\alpha = .05$.

7.87 a. To determine whether the population of institutional investors performs consistently, we test:

$H_0: \sigma^2 = 10^2 = 100$, $H_a: \sigma^2 < 100$

b. The rejection region requires $\alpha = .05$ in the lower tail of the $\chi^2$ distribution with $df = n - 1 = 200 - 1 = 199$.
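The shortcut formula used for $s^2$ in Exercise 7.86 needs only $\sum x$, $\sum x^2$ and $n$. A small Python sketch (hypothetical function name) reproduces the value 7.9048; as a sanity check on the formula itself, it agrees with `statistics.variance` on any list of numbers, for instance the illustrative list [1, 2, 3, 4], which is not data from the exercise.

```python
import statistics


def sample_variance_from_sums(sum_x, sum_x2, n):
    """Shortcut formula: s^2 = (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


# Exercise 7.86: sum x = 30, sum x^2 = 176, n = 7
print(round(sample_variance_from_sums(30, 176, 7), 4))

# Illustrative check against the definitional formula in the standard library
data = [1, 2, 3, 4]
print(sample_variance_from_sums(sum(data), sum(x * x for x in data), len(data)),
      statistics.variance(data))
```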
Using MINITAB, we get:

Inverse Cumulative Distribution Function
Chi-Square with 199 DF
P( X <= x )    x
0.05           167.361

The rejection region is $\chi^2 < 167.361$.

c. For this problem, $\alpha = .05$. The probability of concluding the standard deviation is less than 10 when, in fact, it is equal to 10 is .05. If this test were repeated a large number of times, approximately 5% of the time we would conclude the standard deviation was less than 10 when it really was 10.

d. From the printout, $\chi^2 = 154.81$ and the p-value is $p = .009$.

e. Since the p-value is less than $\alpha$ ($p = .009 < .05$), $H_0$ is rejected. There is sufficient evidence to indicate the standard deviation is less than 10% at $\alpha = .05$.

f. We must assume that a random sample was selected from the target population and the population sampled from is approximately normal.

7.88 a. To determine if the variance of the population of trap spacing measurements is larger than 10, we test:

$H_0: \sigma^2 = 10$, $H_a: \sigma^2 > 10$

b. Using MINITAB, the results are:

Descriptive Statistics: Spacing
Variable  N  Mean   StDev  Variance  Minimum  Q1     Median  Q3     Maximum
Spacing   7  89.86  11.63  135.14    70.00    82.00  93.00   99.00  105.00

The sample variance is $s^2 = 135.14$.

c. The value of $s^2$ is a random variable. The next time a random sample is selected, the value of $s^2$ could be much greater or much smaller. We need to find out how unusual it is to obtain a value of $s^2$ of 135.14 if $\sigma^2 = 10$.

d. The test statistic is $\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{(7 - 1)(135.14)}{10} = 81.084$.

e. Using MINITAB, the p-value is $p = P(\chi^2 \ge 81.084) \approx 0$.

f. Since the p-value is so small, $H_0$ is rejected for any reasonable value of $\alpha$. There is sufficient evidence to indicate the true population variance is greater than 10.

g. We must assume that a random sample was selected from the target population and the population sampled from is approximately normal.

7.89 a. Let $\sigma^2$ = weight variance of tees. To determine if the weight variance differs from .000004 (injection mold process is out of control), we test:

$H_0: \sigma^2 = .000004$, $H_a: \sigma^2 \ne .000004$

b. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Tees
Variable  N   Mean     StDev    Minimum  Q1       Median   Q3       Maximum
Tees      40  0.25248  0.00223  0.24700  0.25100  0.25300  0.25400  0.25600

The test statistic is $\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{(40 - 1)(.00223)^2}{.000004} = 48.49$.

The rejection region requires $\alpha/2 = .01/2 = .005$ in each tail of the $\chi^2$ distribution with $df = n - 1 = 40 - 1 = 39$. From Table IV, Appendix D, $\chi^2_{.005} = 66.7659$ and $\chi^2_{.995} = 20.7065$ (the tabled values for $df = 40$). The rejection region is $\chi^2 > 66.7659$ or $\chi^2 < 20.7065$.

Since the observed value of the test statistic does not fall in the rejection region ($20.7065 < \chi^2 = 48.49 < 66.7659$), $H_0$ is not rejected. There is insufficient evidence to indicate the injection mold process is out of control at $\alpha = .01$.

c. We must assume that the distribution of the weights of tees is approximately normal. Using MINITAB, a histogram of the data is:

[Figure: MINITAB histogram of Tees with fitted normal curve (Mean 0.2525, StDev 0.002230, N 40)]

The data look fairly mound-shaped, so the assumption of normality seems to be reasonably satisfied.

7.90 a. To determine if the breaking strength variance of the new adhesive is less than the variance of the standard composite adhesive, $\sigma^2 = .25$, we test:

$H_0: \sigma^2 = .25$, $H_a: \sigma^2 < .25$

b. The rejection region requires $\alpha = .01$ in the lower tail of the $\chi^2$ distribution with $df = n - 1 = 10 - 1 = 9$. From Table IV, Appendix D, $\chi^2_{.99} = 2.087912$. The rejection region is $\chi^2 < 2.087912$.

c. The test statistic is $\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{(10 - 1)(.46)^2}{.25} = 7.6176$.

d. Since the observed value of the test statistic does not fall in the rejection region ($\chi^2 = 7.6176 > 2.087912$), $H_0$ is not rejected. There is insufficient evidence to indicate the breaking strength variance of the new adhesive is less than the variance of the standard composite adhesive, $\sigma^2 = .25$, at $\alpha = .01$.

e. We must assume that the distribution of the breaking strengths is approximately normal and that a random sample was selected from this population.

7.91 To determine whether the true conduction time standard deviation is less than 7 seconds (variance less than 49), we test:

$H_0: \sigma^2 = 7^2$, $H_a: \sigma^2 < 7^2$

The test statistic is $\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{(18 - 1)(6.3)^2}{7^2} = 13.77$.

The rejection region requires $\alpha = .01$ in the lower tail of the $\chi^2$ distribution with $df = n - 1 = 18 - 1 = 17$. From Table IV, Appendix D, $\chi^2_{.99} = 6.40776$. The rejection region is $\chi^2 < 6.40776$.

Since the observed value of the test statistic does not fall in the rejection region ($\chi^2 = 13.77 > 6.40776$), $H_0$ is not rejected. There is insufficient evidence to indicate the true conduction time standard deviation is less than 7 seconds at $\alpha = .01$. Thus, there is no evidence that the prototype system satisfies this requirement.

7.92 Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Drug
Variable  N   Mean    StDev  Variance  Minimum  Median  Maximum
Drug      50  89.291  3.183  10.134    81.790   89.375  94.830

To determine whether the new method of determining drug concentration is less variable than the standard method, we test:

$H_0: \sigma^2 = 9$, $H_a: \sigma^2 < 9$

The test statistic is $\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{(50 - 1)(10.134)}{9} = 55.174$.

The rejection region requires $\alpha = .01$ in the lower tail of the $\chi^2$ distribution with $df = n - 1 = 50 - 1 = 49$. From Table IV, Appendix D, $\chi^2_{.99} = 29.7067$ (the tabled value for $df = 50$). The rejection region is $\chi^2 < 29.7067$.

Since the observed value of the test statistic does not fall in the rejection region ($\chi^2 = 55.174 > 29.7067$), $H_0$ is not rejected. There is insufficient evidence to indicate the new method of determining drug concentration is less variable than the standard method at $\alpha = .01$.
7.93 To determine if the diameters of the ball bearings are more variable when produced by the new process, we test:

$H_0: \sigma^2 = .00156$, $H_a: \sigma^2 > .00156$

The test statistic is $\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{99(.00211)}{.00156} = 133.90$.

The rejection region requires use of the upper tail of the $\chi^2$ distribution with $df = n - 1 = 100 - 1 = 99$. We will use $df = 100 \approx 99$ due to the limitations of the table. From Table IV, Appendix D,

$\chi^2_{.025} = 129.561 < 133.90 < 135.807 = \chi^2_{.010}$

The p-value of the test is between .010 and .025. The decision made depends on the desired $\alpha$. For $\alpha = .010$, there is not enough evidence to show that the variance in the diameters is greater than .00156. For $\alpha = .025$, there is enough evidence to show that the variance in the diameters is greater than .00156.

7.94 Using MINITAB, the descriptive statistics are:

Descriptive Statistics: GASTURBINE
Variable    N   Mean   StDev  Minimum  Q1    Median  Q3     Maximum
GASTURBINE  67  11066  1595   8714     9918  10656   11842  16243

To determine if the heat rates of the augmented gas turbine engine are more variable than the heat rates of the standard gas turbine engine, we test:

$H_0: \sigma^2 = 1{,}500^2$, $H_a: \sigma^2 > 1{,}500^2$

The test statistic is $\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{(67 - 1)(1{,}595)^2}{1{,}500^2} = 74.625$.

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with $df = n - 1 = 67 - 1 = 66$. Using MINITAB,

Inverse Cumulative Distribution Function
Chi-Square with 66 DF
P( X <= x )    x
0.95           85.9649

The rejection region is $\chi^2 > 85.9649$.

Since the observed value of the test statistic does not fall in the rejection region ($\chi^2 = 74.625 < 85.9649$), $H_0$ is not rejected. There is insufficient evidence to indicate the heat rates of the augmented gas turbine engine are more variable than the heat rates of the standard gas turbine engine at $\alpha = .05$.

7.95 a. Since the sample mean of 3.85 is not that far from the value of 3.5, a large standard deviation would indicate that the value 3.85 is not very many standard deviations from 3.5.

b.
The rejection region requires $\alpha = .01$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.01} = 2.33$. The rejection region is $z > 2.33$.

The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{3.85 - 3.5}{\sigma/\sqrt{137}}$. To reject $H_0$, we need $z > 2.33$. Thus, we need to find $\sigma$ so that $z = 2.33$:

$\frac{3.85 - 3.5}{\sigma/\sqrt{137}} = 2.33 \Rightarrow 3.85 - 3.5 = \frac{2.33\,\sigma}{\sqrt{137}} \Rightarrow .35 = .199065\,\sigma \Rightarrow \sigma = 1.758$

Thus, the largest value of $\sigma$ for which we will reject $H_0$ is 1.758.

c. To determine if $\sigma < 1.758$, we test:

$H_0: \sigma^2 = 1.758^2$, $H_a: \sigma^2 < 1.758^2$

The test statistic is $\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{(137 - 1)(1.5)^2}{1.758^2} = 99.011$.

The rejection region requires $\alpha = .01$ in the lower tail of the $\chi^2$ distribution with $df = n - 1 = 137 - 1 = 136$. Since there are no values in the table with $df > 100$, we will use MINITAB to compute the p-value of the test statistic.

Cumulative Distribution Function
Chi-Square with 136 DF
x        P( X <= x )
99.011   0.0072496

Since the p-value is less than $\alpha$ ($p = .0072496 < .01$), $H_0$ is rejected. There is sufficient evidence to indicate the standard deviation of the scores is less than 1.758 at $\alpha = .01$.

7.96 a. The power of a test increases when:

1. The distance between the null and alternative values of $\mu$ increases.
2. The value of $\alpha$ increases.
3. The sample size increases.

b. The power of a test is equal to $1 - \beta$. As $\beta$ increases, the power decreases.

7.97 a. By the Central Limit Theorem, the sampling distribution of $\bar{x}$ is approximately normal with $\mu_{\bar{x}} = \mu = 500$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{100}{\sqrt{25}} = 20$.

b. $\bar{x}_0 = \mu_0 + z_\alpha \sigma_{\bar{x}} = \mu_0 + z_\alpha \frac{\sigma}{\sqrt{n}}$, where $z_\alpha = z_{.05} = 1.645$ from Table II, Appendix D. Thus, $\bar{x}_0 = 500 + 1.645(20) = 532.9$.

c. The sampling distribution of $\bar{x}$ is approximately normal by the Central Limit Theorem with $\mu_{\bar{x}} = \mu = 550$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{100}{\sqrt{25}} = 20$.

d. $\beta = P(\bar{x} < \bar{x}_0 = 532.9 \text{ when } \mu = 550) = P\left(z < \frac{532.9 - 550}{100/\sqrt{25}}\right) = P(z < -.86) = .5 - .3051 = .1949$

e. Power $= 1 - \beta = 1 - .1949 = .8051$

7.98 From Exercise 7.97 we want to test $H_0: \mu = 500$ against $H_a: \mu > 500$ using $\alpha = .05$, $\sigma = 100$, $n = 25$, and $\bar{x}_0 = 532.9$.
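Exercises 7.94 and 7.95 rely on MINITAB for chi-square probabilities beyond Table IV. When the degrees of freedom are even (66 and 136 here), the chi-square CDF has an exact closed form through its link with the Poisson distribution, $P(\chi^2 > x) = P(\text{Poisson}(x/2) \le df/2 - 1)$. The Python sketch below uses that identity plus bisection for the inverse. It is a hypothetical helper offered as a cross-check on the MINITAB output, valid only for even df and moderate $x$.

```python
from math import exp


def chi2_cdf_even_df(x, df):
    """P(X <= x) for a chi-square variable with an even number of df.

    Uses the identity P(X > x) = P(Poisson(x/2) <= df/2 - 1), which holds
    only for even df. exp(-x/2) underflows for x above roughly 1490, so this
    sketch is meant for moderate values of x.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 0.0
    lam = x / 2
    term = exp(-lam)        # Poisson probability of 0 events
    upper_tail = term
    for k in range(1, df // 2):
        term *= lam / k     # Poisson probability of k events
        upper_tail += term
    return 1 - upper_tail


def chi2_inverse_even_df(prob, df, tol=1e-8):
    """Return x with P(X <= x) = prob, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even_df(hi, df) < prob:  # widen until the root is bracketed
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_cdf_even_df(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

With these helpers, `chi2_cdf_even_df(99.011, 136)` returns about .00725, in line with the MINITAB output in Exercise 7.95c, and `chi2_inverse_even_df(0.95, 66)` returns about 85.965, the critical value used in Exercise 7.94.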
a. $\beta = P(\bar{x} < 532.9 \text{ when } \mu = 575) = P\left(z < \frac{532.9 - 575}{100/\sqrt{25}}\right) = P(z < -2.11) = .5 - .4826 = .0174$ (using Table II, Appendix D)

b. Power $= 1 - \beta = 1 - .0174 = .9826$

c. In Exercise 7.97, $\beta = .1949$ and power $= .8051$. The value of $\beta$ has decreased in this exercise since $\mu = 575$ is further from the hypothesized value than $\mu = 550$. As a result, the power of the test in this exercise has increased (when $\beta$ decreases, the power of the test increases).

7.99 a. The sampling distribution of $\bar{x}$ will be approximately normal (by the Central Limit Theorem) with $\mu_{\bar{x}} = \mu = 75$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{49}} = 2.143$.

b. The sampling distribution of $\bar{x}$ will be approximately normal (by the Central Limit Theorem) with $\mu_{\bar{x}} = \mu = 70$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{49}} = 2.143$.

c. First, find $\bar{x}_0 = \mu_0 - z_\alpha \sigma_{\bar{x}} = \mu_0 - z_\alpha \frac{\sigma}{\sqrt{n}}$, where $z_{.10} = 1.28$ from Table II, Appendix D. Thus, $\bar{x}_0 = 75 - 1.28\left(\frac{15}{\sqrt{49}}\right) = 72.257$.

Now, find $\beta = P(\bar{x} > 72.257 \text{ when } \mu = 70) = P\left(z > \frac{72.257 - 70}{15/\sqrt{49}}\right) = P(z > 1.05) = .5 - .3531 = .1469$

d. Power $= 1 - \beta = 1 - .1469 = .8531$

7.100 a. From Exercise 7.99, we want to test $H_0: \mu = 75$ against $H_a: \mu < 75$ using $\alpha = .10$, $\sigma = 15$, $n = 49$, and $\bar{x}_0 = 72.257$. Using Table II, Appendix D:

If $\mu = 74$, $\beta = P(\bar{x} > 72.257 \text{ when } \mu = 74) = P\left(z > \frac{72.257 - 74}{15/\sqrt{49}}\right) = P(z > -.81) = .5 + .2910 = .7910$

If $\mu = 72$, $\beta = P(\bar{x} > 72.257 \text{ when } \mu = 72) = P\left(z > \frac{72.257 - 72}{15/\sqrt{49}}\right) = P(z > .12) = .5 - .0478 = .4522$

If $\mu = 70$, $\beta = P(\bar{x} > 72.257 \text{ when } \mu = 70) = .1469$ (refer to Exercise 7.99, part c).

If $\mu = 68$, $\beta = P(\bar{x} > 72.257 \text{ when } \mu = 68) = P\left(z > \frac{72.257 - 68}{15/\sqrt{49}}\right) = P(z > 1.99) = .5 - .4767 = .0233$

If $\mu = 66$, $\beta = P(\bar{x} > 72.257 \text{ when } \mu = 66) = P\left(z > \frac{72.257 - 66}{15/\sqrt{49}}\right) = P(z > 2.92) = .5 - .4982 = .0018$

In summary,

μ     β
74    .7910
72    .4522
70    .1469
68    .0233
66    .0018

b. Using MINITAB, the graph is:

[Figure: MINITAB scatterplot of beta vs mu, mu from 66 to 74]

c. Looking at the graph, $\beta$ is approximately .62 when $\mu = 73$.

d. Power $= 1 - \beta$. Therefore,

μ     β       Power
74    .7910   .2090
72    .4522   .5478
70    .1469   .8531
68    .0233   .9767
66    .0018   .9982

[Figure: MINITAB scatterplot of Power vs mu, mu from 66 to 74]

The power curve starts out close to 1 when $\mu = 66$ and decreases as $\mu$ increases, while the $\beta$ curve is close to 0 when $\mu = 66$ and increases as $\mu$ increases.

e. As the distance between the true mean $\mu$ and the null hypothesized mean $\mu_0$ increases, $\beta$ decreases and the power increases. We can also see that as $\beta$ increases, the power decreases.

7.101 a. First, the sample is sufficiently large if both $np_0 \ge 15$ and $nq_0 \ge 15$: $np_0 = 100(.7) = 70$ and $nq_0 = 100(1 - .7) = 30$. Since both $np_0 \ge 15$ and $nq_0 \ge 15$, the normal distribution will be adequate. Thus, the sampling distribution of $\hat{p}$ will be approximately normal with $E(\hat{p}) = p = .7$ and $\sigma_{\hat{p}} = \sqrt{\frac{p_0 q_0}{n}} = \sqrt{\frac{.7(.3)}{100}} = .0458$.

b. The sampling distribution of $\hat{p}$ will be approximately normal with $E(\hat{p}) = p = .65$ and $\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{.65(.35)}{100}} = .0477$.

c. First, find

$\hat{p}_{0,L} = p_0 - z_{\alpha/2}\sigma_{\hat{p}} = p_0 - z_{\alpha/2}\sqrt{\frac{p_0 q_0}{n}}$, where $z_{.05/2} = z_{.025} = 1.96$ from Table II, Appendix D. Thus, $\hat{p}_{0,L} = .7 - 1.96\sqrt{\frac{.7(.3)}{100}} = .610$.

$\hat{p}_{0,U} = p_0 + z_{\alpha/2}\sigma_{\hat{p}} = p_0 + z_{\alpha/2}\sqrt{\frac{p_0 q_0}{n}} = .7 + 1.96\sqrt{\frac{.7(.3)}{100}} = .790$

Now, find

$\beta = P(.610 < \hat{p} < .79 \text{ when } p = .65) = P\left(\frac{.610 - .65}{\sqrt{.65(.35)/100}} < z < \frac{.79 - .65}{\sqrt{.65(.35)/100}}\right) = P(-.84 < z < 2.94) = .2995 + .4984 = .7979$

d. $\beta = P(.610 < \hat{p} < .79 \text{ when } p = .71) = P\left(\frac{.610 - .71}{\sqrt{.71(.29)/100}} < z < \frac{.79 - .71}{\sqrt{.71(.29)/100}}\right) = P(-2.20 < z < 1.76) = .4861 + .4608 = .9469$

7.102 a. To determine if the mean size of California homes exceeds the national average, we test:

$H_0: \mu = 2{,}390$, $H_a: \mu > 2{,}390$

The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{2{,}507 - 2{,}390}{257/\sqrt{100}} = 4.55$.

The rejection region requires $\alpha = .01$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.01} = 2.33$. The rejection region is $z > 2.33$.

Since the observed value of the test statistic falls in the rejection region ($z = 4.55 > 2.33$), $H_0$ is rejected. There is sufficient evidence to indicate the mean size of California homes exceeds the national average at $\alpha = .01$.
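The $\beta$ calculations in Exercises 7.97 through 7.100 follow the same two steps: locate the boundary $\bar{x}_0$ under $H_0$, then find the probability that $\bar{x}$ falls on the non-rejection side of it under the alternative mean. A minimal Python sketch of those steps is given below. The function name is hypothetical; $z_\alpha$ is passed in (1.645 or 1.28, as in the solutions), and the normal CDF comes from `statistics.NormalDist`, so results can differ from the Table II answers by a few thousandths because $z$ is not rounded to two decimals.

```python
from math import sqrt
from statistics import NormalDist


def beta_mean_z_test(mu0, mu_a, sigma, n, z_alpha, tail):
    """Type II error probability for a one-tailed z test of H0: mu = mu0.

    tail="upper": reject if xbar > xbar0 = mu0 + z_alpha * sigma / sqrt(n)
    tail="lower": reject if xbar < xbar0 = mu0 - z_alpha * sigma / sqrt(n)
    beta is the probability that xbar lands outside the rejection region
    when the true mean is mu_a.
    """
    se = sigma / sqrt(n)
    std_normal = NormalDist()
    if tail == "upper":
        xbar0 = mu0 + z_alpha * se
        return std_normal.cdf((xbar0 - mu_a) / se)
    xbar0 = mu0 - z_alpha * se
    return 1 - std_normal.cdf((xbar0 - mu_a) / se)


# Exercise 7.100: beta and power at several alternative means
for mu_a in (74, 72, 70, 68, 66):
    beta = beta_mean_z_test(75, mu_a, 15, 49, 1.28, "lower")
    print(mu_a, round(beta, 4), round(1 - beta, 4))
```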
b. To compute the power, we must first set up the rejection region in terms of $\bar{x}$:

$\bar{x}_0 = \mu_0 + z_\alpha \sigma_{\bar{x}} \approx \mu_0 + z_\alpha \frac{s}{\sqrt{n}} = 2{,}390 + 2.33\left(\frac{257}{\sqrt{100}}\right) = 2{,}449.88$

We would reject $H_0$ if $\bar{x} > 2{,}449.88$. The power of the test when $\mu = 2{,}490$ would be:

Power $= P(\bar{x} > 2{,}449.88 \mid \mu = 2{,}490) = P\left(z > \frac{\bar{x}_0 - \mu_a}{\sigma_{\bar{x}}}\right) = P\left(z > \frac{2{,}449.88 - 2{,}490}{257/\sqrt{100}}\right) = P(z > -1.56) = .5 + .4406 = .9406$

c. The power of the test when $\mu = 2{,}440$ would be:

Power $= P(\bar{x} > 2{,}449.88 \mid \mu = 2{,}440) = P\left(z > \frac{2{,}449.88 - 2{,}440}{257/\sqrt{100}}\right) = P(z > .38) = .5 - .1480 = .3520$

7.103 a. We have failed to reject $H_0$ when it is not true. This is a Type II error. To compute $\beta$, first find:

$\bar{x}_0 = \mu_0 - z_\alpha \sigma_{\bar{x}} = \mu_0 - z_\alpha \frac{\sigma}{\sqrt{n}}$, where $z_{.05} = 1.645$ from Table II, Appendix D. Thus, $\bar{x}_0 = 5.0 - 1.645\left(\frac{.01}{\sqrt{100}}\right) = 4.998355$.

Then find:

$\beta = P(\bar{x} > 4.998355 \text{ when } \mu = 4.9975) = P\left(z > \frac{4.998355 - 4.9975}{.01/\sqrt{100}}\right) = P(z > .86) = .5 - .3051 = .1949$

b. We have rejected $H_0$ when it is true. This is a Type I error. The probability of a Type I error is $\alpha = .05$.

c. A departure of .0025 below 5.0 is $\mu = 4.9975$. Using part a, $\beta = .1949$ when $\mu = 4.9975$. The power of the test is $1 - \beta = 1 - .1949 = .8051$.

7.104 To compute the power, we must first set up the rejection region in terms of $\hat{p}$. The rejection region requires $\alpha = .10$ in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.10} = 1.28$. The rejection region is $z < -1.28$. Thus,

$\hat{p}_{0,L} = p_0 - z_\alpha \sigma_{\hat{p}} = p_0 - z_\alpha\sqrt{\frac{p_0 q_0}{n}} = .8 - 1.28\sqrt{\frac{.8(.2)}{501}} = .8 - .023 = .777$

Power $= P(\hat{p} < .777 \mid p = .82) = P\left(z < \frac{.777 - .82}{\sqrt{.82(.18)/501}}\right) = P(z < -2.51) = .5 - .4940 = .0060$

7.105 To compute the power, we must first set up the rejection region in terms of $\hat{p}$. The rejection region requires $\alpha/2 = .01/2 = .005$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.005} = 2.575$. The rejection region is $z < -2.575$ or $z > 2.575$. Thus,

$\hat{p}_{0,L} = p_0 - z_{\alpha/2}\sigma_{\hat{p}} = p_0 - z_{\alpha/2}\sqrt{\frac{p_0 q_0}{n}} = .5 - 2.575\sqrt{\frac{.5(.5)}{121}} = .5 - .117 = .383$ and

$\hat{p}_{0,U} = p_0 + z_{\alpha/2}\sigma_{\hat{p}} = p_0 + z_{\alpha/2}\sqrt{\frac{p_0 q_0}{n}} = .5 + 2.575\sqrt{\frac{.5(.5)}{121}} = .5 + .117 = .617$.
Power $= P(\hat{p} < .383 \text{ or } \hat{p} > .617 \mid p = .65) = P\left(z < \frac{.383 - .65}{\sqrt{.65(.35)/121}}\right) + P\left(z > \frac{.617 - .65}{\sqrt{.65(.35)/121}}\right) = P(z < -6.16) + P(z > -.76) = (.5 - .5) + (.5 + .2764) = .7764$

7.106 a. To determine if the mean mpg for 2011 Honda Civic autos is greater than 36 mpg, we test:

$H_0: \mu = 36$, $H_a: \mu > 36$

b. The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{38.3 - 36}{6.4/\sqrt{50}} = 2.54$.

The rejection region requires $\alpha = .05$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z > 1.645$.

Since the observed value of the test statistic falls in the rejection region ($z = 2.54 > 1.645$), $H_0$ is rejected. There is sufficient evidence to indicate that the mean mpg for 2011 Honda Civic autos is greater than 36 mpg at $\alpha = .05$. We must assume that the sample was a random sample.

c. First find: $\bar{x}_0 = \mu_0 + z_\alpha \sigma_{\bar{x}} = \mu_0 + z_\alpha \frac{\sigma}{\sqrt{n}}$, where $z_{.05} = 1.645$ from Table II, Appendix D. Thus, $\bar{x}_0 = 36 + 1.645\left(\frac{6.4}{\sqrt{50}}\right) = 37.49$.

For $\mu = 36.5$: Power $= P(\bar{x} > 37.49 \mid \mu = 36.5) = P\left(z > \frac{37.49 - 36.5}{6.4/\sqrt{50}}\right) = P(z > 1.09) = .5 - .3621 = .1379$

For $\mu = 37$: Power $= P(\bar{x} > 37.49 \mid \mu = 37) = P\left(z > \frac{37.49 - 37}{6.4/\sqrt{50}}\right) = P(z > .54) = .5 - .2054 = .2946$

For $\mu = 37.5$: Power $= P(\bar{x} > 37.49 \mid \mu = 37.5) = P\left(z > \frac{37.49 - 37.5}{6.4/\sqrt{50}}\right) = P(z > -.01) = .5 + .0040 = .5040$

For $\mu = 38$: Power $= P(\bar{x} > 37.49 \mid \mu = 38) = P\left(z > \frac{37.49 - 38}{6.4/\sqrt{50}}\right) = P(z > -.56) = .5 + .2123 = .7123$

For $\mu = 38.5$: Power $= P(\bar{x} > 37.49 \mid \mu = 38.5) = P\left(z > \frac{37.49 - 38.5}{6.4/\sqrt{50}}\right) = P(z > -1.12) = .5 + .3686 = .8686$

d. Using MINITAB, the plot is:

[Figure: MINITAB scatterplot of Power vs Mu, mu from 36.5 to 38.5]

e. From the plot, the power is approximately .60. For $\mu = 37.75$:

Power $= P(\bar{x} > 37.49 \mid \mu = 37.75) = P\left(z > \frac{37.49 - 37.75}{6.4/\sqrt{50}}\right) = P(z > -.29) = .5 + .1141 = .6141$

f. From the plot, the power is approximately 1. For $\mu = 41$:

Power $= P(\bar{x} > 37.49 \mid \mu = 41) = P\left(z > \frac{37.49 - 41}{6.4/\sqrt{50}}\right) = P(z > -3.88) = .5 + .5 = 1.0$

If the true value of $\mu$ is 41, the approximate probability that the test will fail to reject $H_0$ is $\beta = 1 - 1 = 0$.

7.107 First, find $\bar{x}_0$ such that $P(\bar{x} < \bar{x}_0) = .05$.
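Exercises 7.101, 7.104, and 7.105 compute power for the large-sample proportion test by first expressing the rejection region in terms of $\hat{p}$ (using the standard error under $p_0$) and then evaluating it under the alternative value of $p$ (using the standard error under that value). A Python sketch of that procedure follows. The function name is hypothetical, and the critical values are exact normal quantiles from `statistics.NormalDist`, so the results agree with the table-based answers only to about two or three decimals.

```python
from math import sqrt
from statistics import NormalDist


def power_proportion_z_test(p0, p_a, n, alpha, tail):
    """Power of the large-sample z test of H0: p = p0 when the true value is p_a.

    The rejection region is expressed in terms of p_hat with the null
    standard error sqrt(p0*q0/n); the probability of landing in it is then
    computed with the standard error under p_a. tail is "lower", "upper" or "two".
    """
    nd = NormalDist()
    se0 = sqrt(p0 * (1 - p0) / n)
    se_a = sqrt(p_a * (1 - p_a) / n)
    if tail == "two":
        z_crit = nd.inv_cdf(1 - alpha / 2)
        lower, upper = p0 - z_crit * se0, p0 + z_crit * se0
        return nd.cdf((lower - p_a) / se_a) + 1 - nd.cdf((upper - p_a) / se_a)
    z_crit = nd.inv_cdf(1 - alpha)
    if tail == "lower":
        return nd.cdf((p0 - z_crit * se0 - p_a) / se_a)
    return 1 - nd.cdf((p0 + z_crit * se0 - p_a) / se_a)
```

For example, `power_proportion_z_test(0.5, 0.65, 121, 0.01, "two")` is about .776, close to the .7764 found in Exercise 7.105, and `1 - power_proportion_z_test(0.7, 0.65, 100, 0.05, "two")` gives $\beta \approx .796$, close to the .7979 of Exercise 7.101c.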
$P(\bar{x} < \bar{x}_0) = P\left(z < \frac{\bar{x}_0 - 10}{1.2/\sqrt{48}}\right) = P(z < z_0) = .05$

From Table II, Appendix D, $z_0 = -1.645$. Thus,

$z_0 = \frac{\bar{x}_0 - 10}{1.2/\sqrt{48}} = -1.645 \Rightarrow \bar{x}_0 = -1.645(.173) + 10 = 9.715$

The probability of a Type II error is:

$\beta = P(\bar{x} > 9.715 \mid \mu = 9.5) = P\left(z > \frac{9.715 - 9.5}{1.2/\sqrt{48}}\right) = P(z > 1.24) = .5 - .3925 = .1075$

7.108 For a large-sample test of hypothesis about a population mean, no assumption about the shape of the population distribution is necessary, because the Central Limit Theorem assures that the test statistic will be approximately normally distributed. For a small-sample test of hypothesis about a population mean, we must assume that the population being sampled from is normal. The test statistic for the large-sample test is the z statistic, and the test statistic for the small-sample test is the t statistic.

7.109 The smaller the p-value associated with a test of hypothesis, the stronger the support for the alternative hypothesis. The p-value is the probability of observing your test statistic or anything more unusual, given the null hypothesis is true. If this value is small, it would be very unusual to observe this test statistic if the null hypothesis were true. Thus, it would indicate the alternative hypothesis is true.

7.110 The elements of the test of hypothesis that should be specified prior to analyzing the data are: null hypothesis, alternative hypothesis, and rejection region based on $\alpha$.

7.111 There is not a direct relationship between $\alpha$ and $\beta$. That is, if $\alpha$ is known, it does not mean $\beta$ is known, because $\beta$ depends on the value of the parameter in the alternative hypothesis and the sample size. However, as $\alpha$ decreases, $\beta$ increases for a fixed value of the parameter and a fixed sample size. Thus, if $\alpha$ is very small, $\beta$ will tend to be large.

7.112 P(Type I error) = P(rejecting $H_0$ when it is true). Thus, if rejection of $H_0$ would cause your firm to go out of business, you would want this probability, $\alpha$, to be small.

7.113 a. $H_0: \mu = 80$, $H_a: \mu < 80$

The test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{72.6 - 80}{\sqrt{19.4}/\sqrt{20}} = -7.51$.

The rejection region requires $\alpha = .05$ in the lower tail of the t-distribution with $df = n - 1 = 20 - 1 = 19$.
From Table III, Appendix D, $t_{.05} = 1.729$. The rejection region is $t < -1.729$.

Since the observed value of the test statistic falls in the rejection region ($t = -7.51 < -1.729$), $H_0$ is rejected. There is sufficient evidence to indicate that the mean is less than 80 at $\alpha = .05$.

b. $H_0: \mu = 80$, $H_a: \mu \ne 80$

The test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{72.6 - 80}{\sqrt{19.4}/\sqrt{20}} = -7.51$.

The rejection region requires $\alpha/2 = .01/2 = .005$ in each tail of the t-distribution with $df = n - 1 = 20 - 1 = 19$. From Table III, Appendix D, $t_{.005} = 2.861$. The rejection region is $t < -2.861$ or $t > 2.861$.

Since the observed value of the test statistic falls in the rejection region ($t = -7.51 < -2.861$), $H_0$ is rejected. There is sufficient evidence to indicate that the mean is different from 80 at $\alpha = .01$.

7.114 a. $H_0: \mu = 8.3$, $H_a: \mu \ne 8.3$

The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{8.2 - 8.3}{.79/\sqrt{175}} = -1.67$.

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.025} = 1.96$. The rejection region is $z < -1.96$ or $z > 1.96$.

Since the observed value of the test statistic does not fall in the rejection region ($-1.96 < z = -1.67 < 1.96$), $H_0$ is not rejected. There is insufficient evidence to indicate that the mean is different from 8.3 at $\alpha = .05$.

b. $H_0: \mu = 8.4$, $H_a: \mu \ne 8.4$

The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{8.2 - 8.4}{.79/\sqrt{175}} = -3.35$.

The rejection region is the same as in part a: $z < -1.96$ or $z > 1.96$.

Since the observed value of the test statistic falls in the rejection region ($z = -3.35 < -1.96$), $H_0$ is rejected. There is sufficient evidence to indicate that the mean is different from 8.4 at $\alpha = .05$.

c. $H_0: \sigma = 1$, $H_a: \sigma \ne 1$, or equivalently $H_0: \sigma^2 = 1$, $H_a: \sigma^2 \ne 1$

The test statistic is $\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{(175 - 1)(.79)^2}{1} = 108.59$.

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the $\chi^2$ distribution with $df = n - 1 = 175 - 1 = 174$. Since there are no values in the table with $df > 100$, we will use MINITAB to find the critical values.
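In Exercise 7.113 the value 19.4 enters the test statistic through its square root, that is, it plays the role of the sample variance $s^2$; using 19.4 directly as $s$ would give $t \approx -1.71$ instead of $-7.51$. The short Python sketch below (hypothetical function name) computes the one-sample statistic $(\bar{x} - \mu_0)/(s/\sqrt{n})$ and reproduces the values in Exercises 7.113 and 7.114; the table lookup for the critical value is still done by hand.

```python
from math import sqrt


def mean_test_statistic(xbar, mu0, s, n):
    """(xbar - mu0) / (s / sqrt(n)): the t statistic for a small sample,
    or the large-sample z statistic when s is used in place of sigma."""
    return (xbar - mu0) / (s / sqrt(n))


t_7_113 = mean_test_statistic(72.6, 80, sqrt(19.4), 20)  # s^2 = 19.4, df = 19
z_7_114a = mean_test_statistic(8.2, 8.3, 0.79, 175)
z_7_114b = mean_test_statistic(8.2, 8.4, 0.79, 175)
print(round(t_7_113, 2), round(z_7_114a, 2), round(z_7_114b, 2))
```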
Inverse Cumulative Distribution Function
Chi-Square with 174 DF
P( X <= x )        x
      0.025  139.367
      0.975  212.419
Thus, χ².025 = 212.419 and χ².975 = 139.367. The rejection region is χ² > 212.419 or χ² < 139.367.
Since the observed value of the test statistic falls in the rejection region (χ² = 108.59 < 139.367), H0 is rejected. There is sufficient evidence to indicate the standard deviation differs from 1 at α = .05.
d. In part a, the rejection region is z < −1.96 or z > 1.96. In terms of x̄, the rejection region would be:
z = (x̄U − μ0)/σx̄: 1.96 = (x̄U − 8.3)/(.79/√175), so x̄U − 8.3 = .117 and x̄U = 8.417
z = (x̄L − μ0)/σx̄: −1.96 = (x̄L − 8.3)/(.79/√175), so x̄L − 8.3 = −.117 and x̄L = 8.183
Based on x̄, the rejection region would be: Reject H0 if x̄ < 8.183 or x̄ > 8.417.
The power of the test is the probability the test statistic falls in the rejection region, given the alternative hypothesis is true. In this case, we will let μa = 8.5.
Power = P(x̄ < 8.183 | μa = 8.5) + P(x̄ > 8.417 | μa = 8.5)
= P(z < (8.183 − 8.5)/(.79/√175)) + P(z > (8.417 − 8.5)/(.79/√175))
= P(z < −5.31) + P(z > −1.39) = (.5 − .5) + (.5 + .4177) = .9177 (Using Table II, Appendix D)
7.115 a. H0: p = .35
Ha: p < .35
The test statistic is z = (p̂ − p0)/√(p0q0/n) = (.29 − .35)/√(.35(.65)/200) = −1.78.
The rejection region requires α = .05 in the lower tail of the z-distribution. From Table II, Appendix D, z.05 = 1.645. The rejection region is z < −1.645.
Since the observed value of the test statistic falls in the rejection region (z = −1.78 < −1.645), H0 is rejected. There is sufficient evidence to indicate p < .35 at α = .05.
b. H0: p = .35
Ha: p ≠ .35
The test statistic is z = −1.78 (from part a).
The rejection region requires α/2 = .05/2 = .025 in each tail of the z-distribution. From Table II, Appendix D, z.025 = 1.96. The rejection region is z < −1.96 or z > 1.96.
Since the observed value of the test statistic does not fall in the rejection region (z = −1.78 > −1.96), H0 is not rejected.
There is insufficient evidence to indicate p is different from .35 at α = .05.
7.116 a. The p-value is p = P(t ≥ 1.174) = .1288. Since the p-value is not very small, there is no evidence to reject H0 for α ≤ .10. There is no evidence to indicate the mean is greater than 10.
b. We must assume that a random sample was selected from a population that is normally distributed.
c. For the alternative hypothesis Ha: μ ≠ 10, the p-value is 2 times the p-value for the one-tailed test: p = 2(.1288) = .2576. There is no evidence to reject H0 for α ≤ .10. There is no evidence to indicate the mean is different from 10.
7.117 a. H0: σ² = 30
Ha: σ² > 30
The test statistic is χ² = (n − 1)s²/σ0² = (41 − 1)(6.9)²/30 = 63.48.
The rejection region requires α = .05 in the upper tail of the χ² distribution with df = n − 1 = 41 − 1 = 40. From Table IV, Appendix D, χ².05 = 55.7585. The rejection region is χ² > 55.7585.
Since the observed value of the test statistic falls in the rejection region (χ² = 63.48 > 55.7585), H0 is rejected. There is sufficient evidence to indicate the variance is larger than 30 at α = .05.
b. H0: σ² = 30
Ha: σ² ≠ 30
The test statistic is χ² = 63.48 (from part a).
The rejection region requires α/2 = .05/2 = .025 in each tail of the χ² distribution with df = n − 1 = 41 − 1 = 40. From Table IV, Appendix D, χ².025 = 59.3417 and χ².975 = 24.4331. The rejection region is χ² < 24.4331 or χ² > 59.3417.
Since the observed value of the test statistic falls in the rejection region (χ² = 63.48 > 59.3417), H0 is rejected. There is sufficient evidence to indicate the variance is not 30 at α = .05.
7.118 a. The rejection region requires α = .01 in the lower tail of the z-distribution. From Table II, Appendix D, z.01 = 2.33. The rejection region is z < −2.33.
b. The test statistic is z = (x̄ − μ0)/σx̄ = (19.3 − 20)/(11.9/√46) = −.40.
c. Since the observed value of the test statistic does not fall in the rejection region (z = −.40 > −2.33), H0 is not rejected. There is insufficient evidence to indicate the true mean number of latex gloves used per week by all hospital employees is less than 20 at α = .01.
7.119 a. The rejection region requires α/2 = .01/2 = .005 in each tail of the χ² distribution with df = n − 1 = 46 − 1 = 45. Using MINITAB,
Inverse Cumulative Distribution Function
Chi-Square with 45 DF
P( X <= x )        x
      0.005  24.3110
      0.995  73.1661
The rejection region is χ² < 24.3110 or χ² > 73.1661.
c. The test statistic is χ² = (n − 1)s²/σ0² = (46 − 1)(11.9)²/100 = 63.7245.
d. Since the observed value of the test statistic does not fall in the rejection region (24.3110 < χ² = 63.7245 < 73.1661), H0 is not rejected. There is insufficient evidence to indicate the variance is different from 100 at α = .01.
7.120 a. p̂ = x/n = 64/106 = .604
b. H0: p = .70
Ha: p ≠ .70
c. The test statistic is z = (p̂ − p0)/√(p0q0/n) = (.604 − .70)/√(.70(.30)/106) = −2.16.
d. The rejection region requires α/2 = .01/2 = .005 in each tail of the z-distribution. From Table II, Appendix D, z.005 = 2.58. The rejection region is z < −2.58 or z > 2.58.
e. Since the observed value of the test statistic does not fall in the rejection region (z = −2.16 > −2.58), H0 is not rejected. There is insufficient evidence to indicate the true proportion of consumers who believe “Made in the USA” means 100% of labor and materials are from the United States is different from .70 at α = .01.
7.121 a. To determine if the average high technology stock is riskier than the market as a whole, we test:
H0: μ = 1
Ha: μ > 1
b. The test statistic is t = (x̄ − μ0)/(s/√n).
The rejection region requires α = .10 in the upper tail of the t-distribution with df = n − 1 = 15 − 1 = 14. From Table III, Appendix D, t.10 = 1.345. The rejection region is t > 1.345.
c. We must assume the population of beta coefficients of technology stocks is normally distributed.
d. The test statistic is t = (x̄ − μ0)/(s/√n) = (1.23 − 1)/(.37/√15) = 2.41.
Since the observed value of the test statistic falls in the rejection region (t = 2.41 > 1.345), H0 is rejected. There is sufficient evidence to indicate the mean high technology stock is riskier than the market as a whole at α = .10.
e. From Table III, Appendix D, with df = n − 1 = 15 − 1 = 14, .01 < P(t ≥ 2.41) < .025. Thus, .01 < p-value < .025. The probability of observing this test statistic, t = 2.41, or anything more unusual is between .01 and .025. Since this probability is small, there is evidence to indicate the null hypothesis is false for α = .05.
f. To determine if the variance of the stock beta values differs from .15, we test:
H0: σ² = .15
Ha: σ² ≠ .15
The test statistic is χ² = (n − 1)s²/σ0² = (15 − 1)(.37)²/.15 = 12.7773.
The rejection region requires α/2 = .05/2 = .025 in each tail of the χ² distribution with df = n − 1 = 15 − 1 = 14. From Table IV, Appendix D, χ².975 = 5.62872 and χ².025 = 26.1190. The rejection region is χ² < 5.62872 or χ² > 26.1190.
Since the observed value of the test statistic does not fall in the rejection region (5.62872 < χ² = 12.7773 < 26.1190), H0 is not rejected. There is insufficient evidence to indicate the variance of the stock beta values differs from .15 at α = .05.
7.122 a. The population parameter of interest is p = proportion of items that had the wrong price scanned at California Wal-Mart stores.
b. To determine if the true proportion of items scanned at California Wal-Mart stores with the wrong price exceeds the 2% NIST standard, we test:
H0: p = .02
Ha: p > .02
c. The test statistic is z = (p̂ − p0)/√(p0q0/n) = (.083 − .02)/√(.02(.98)/1000) = 14.23.
The rejection region requires α = .05 in the upper tail of the z-distribution. From Table II, Appendix D, z.05 = 1.645. The rejection region is z > 1.645.
d. Since the observed value of the test statistic falls in the rejection region (z = 14.23 > 1.645), H0 is rejected.
There is sufficient evidence to indicate that the true proportion of items scanned at California Wal-Mart stores with the wrong price exceeds the 2% NIST standard at α = .05. This means that the proportion of items with wrong prices at California Wal-Mart stores is much higher than what is allowed.
e. In order for the inference to be valid, the sampling distribution of p̂ must be approximately normal. For this assumption to be valid, both np0 ≥ 15 and nq0 ≥ 15.
np0 = 1000(.02) = 20
nq0 = 1000(.98) = 980
Since np0 ≥ 15 and nq0 ≥ 15, we can assume the distribution of p̂ is approximately normal.
7.123 a. Let p = proportion of time the camera correctly detects liars. The null hypothesis would be: H0: p = .75
b. A Type I error would be to conclude the camera cannot correctly identify liars 75% of the time when, in fact, it can. A Type II error would be to conclude the camera can correctly identify liars 75% of the time when, in fact, it cannot.
7.124 a. A Type I error is rejecting H0 when H0 is true. In this case, we would conclude that the mean number of carats per diamond is different from .6 when, in fact, it is equal to .6. A Type II error is accepting H0 when H0 is false. In this case, we would conclude that the mean number of carats per diamond is equal to .6 when, in fact, it is different from .6.
b. From Exercise 6.120, the random sample of 30 diamonds yielded x̄ = .691 and s = .262. Let μ = mean number of carats per diamond. To determine if the mean number of carats per diamond is different from .6, we test:
H0: μ = .6
Ha: μ ≠ .6
The test statistic is z = (x̄ − μ0)/σx̄ = (.691 − .6)/(.262/√30) = 1.90.
The rejection region requires α/2 = .05/2 = .025 in each tail of the z-distribution. From Table II, Appendix D, z.025 = 1.96. The rejection region is z < −1.96 or z > 1.96.
Since the observed value of the test statistic does not fall in the rejection region (z = 1.90 < 1.96), H0 is not rejected.
There is insufficient evidence to indicate the mean number of carats per diamond is different from .6 carats at α = .05.
c. When α is changed, H0, Ha, and the test statistic remain the same. The rejection region requires α/2 = .10/2 = .05 in each tail of the z-distribution. From Table II, Appendix D, z.05 = 1.645. The rejection region is z < −1.645 or z > 1.645.
Since the observed value of the test statistic falls in the rejection region (z = 1.90 > 1.645), H0 is rejected. There is sufficient evidence to indicate the mean number of carats per diamond is different from .6 carats at α = .10.
d. When the value of α changes, the decision can also change. Thus, it is very important to include the level of α used in all decisions.
7.125 Some preliminary calculations are:
x̄ = Σx/n = 667.3/7 = 95.33
s² = [Σx² − (Σx)²/n]/(n − 1) = [63,867.99 − (667.3)²/7]/(7 − 1) = 42.539
s = √42.539 = 6.5222
a. To determine if the true mean cost-of-living index for Southeastern cities is different from the mean national cost-of-living index of 100, we test:
H0: μ = 100
Ha: μ ≠ 100
b. Since the sample size is so small, we must assume that the population being sampled is normal. In addition, we must assume that the sample is random.
c. The test statistic is t = (x̄ − μ0)/(s/√n) = (95.33 − 100)/(6.5222/√7) = −1.89.
The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution. From Table III, Appendix D, with df = n − 1 = 7 − 1 = 6, t.025 = 2.447. The rejection region is t < −2.447 or t > 2.447.
Since the observed value of the test statistic does not fall in the rejection region (t = −1.89 > −2.447), H0 is not rejected. There is insufficient evidence to indicate the true mean cost-of-living index for Southeastern cities is different from the mean national cost-of-living index of 100 at α = .05.
d. The observed significance level is p-value = P(t ≤ −1.89) + P(t ≥ 1.89). Since we did not reject H0 in part c, we know that the p-value must be greater than .05.
Using MINITAB,
Cumulative Distribution Function
Student's t distribution with 6 DF
    x  P( X <= x )
-1.89    0.0538261
Thus, the p-value is p = 2(.0538261) = .1076522.
7.126 a. Let p = proportion of shoppers using cents-off coupons. To determine if the proportion of shoppers using cents-off coupons exceeds .65, we test:
H0: p = .65
Ha: p > .65
The test statistic is z = (p̂ − p0)/√(p0q0/n) = (.72 − .65)/√(.65(.35)/1,000) = 4.64.
The rejection region requires α = .05 in the upper tail of the z-distribution. From Table II, Appendix D, z.05 = 1.645. The rejection region is z > 1.645.
Since the observed value of the test statistic falls in the rejection region (z = 4.64 > 1.645), H0 is rejected. There is sufficient evidence to indicate the proportion of shoppers using cents-off coupons exceeds .65 at α = .05.
b. The sample size is large enough if np0 ≥ 15 and nq0 ≥ 15.
np0 = 1000(.65) = 650
nq0 = 1000(.35) = 350
Since both np0 ≥ 15 and nq0 ≥ 15, the normal distribution will be adequate.
c. The p-value is p = P(z ≥ 4.64) ≈ .5 − .5 = 0. (Using Table II, Appendix D.) Since the p-value is smaller than α = .05, H0 is rejected. There is sufficient evidence to indicate the proportion of shoppers using cents-off coupons exceeds .65 at α = .05.
7.127 a. The hypotheses would be:
H0: Individual does not have the disease
Ha: Individual does have the disease
b. A Type I error would be: Conclude the individual has the disease when in fact he/she does not. This would be a false positive test. A Type II error would be: Conclude the individual does not have the disease when in fact he/she does. This would be a false negative test.
c. If the disease is serious, either error would be grave. Arguments could be made for either error being more grave. However, I believe a Type II error would be more grave: concluding the individual does not have the disease when he/she does. This person would not receive critical treatment, and may suffer very serious consequences. Thus, it is more important to minimize β.
7.128 Using MINITAB, the descriptive statistics are:
Descriptive Statistics: Tunnel
Variable   N   Mean  Median  StDev  Minimum  Maximum     Q1      Q3
Tunnel    10  989.8   970.5  160.7    735.0   1260.0  862.5  1096.8
To determine whether peak hour pricing succeeded in reducing the average number of vehicles attempting to use the Lincoln Tunnel during the peak rush hour, we test:
H0: μ = 1,220
Ha: μ < 1,220
The test statistic is t = (x̄ − μ0)/(s/√n) = (989.8 − 1,220)/(160.7/√10) = −4.53.
Since no α is given, we will use α = .05. The rejection region requires α = .05 in the lower tail of the t-distribution with df = n − 1 = 10 − 1 = 9. From Table III, Appendix D, t.05 = 1.833. The rejection region is t < −1.833.
Since the observed value of the test statistic falls in the rejection region (t = −4.53 < −1.833), H0 is rejected. There is sufficient evidence to indicate that peak hour pricing succeeded in reducing the average number of vehicles attempting to use the Lincoln Tunnel during the peak rush hour at α = .05.
7.129 To determine if the true standard deviation of the point-spread errors exceeds 15 (variance exceeds 225), we test:
H0: σ² = 225
Ha: σ² > 225
The test statistic is χ² = (n − 1)s²/σ0² = (240 − 1)(13.3)²/225 = 187.896.
The rejection region requires α in the upper tail of the χ² distribution with df = n − 1 = 240 − 1 = 239. The maximum value of df in Table IV is 100. Thus, we cannot find the rejection region using Table IV. Using a statistical package, the p-value associated with χ² = 187.896 is p = .9938. Since the p-value is so large, there is no evidence to reject H0. There is insufficient evidence to indicate that the true standard deviation of the point-spread errors exceeds 15 for any reasonable value of α.
Since the observed standard deviation (13.3) is less than the hypothesized value of the standard deviation (15) under H0, there is no way H0 will be rejected for any reasonable value of α.
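The p-value for Exercise 7.129 came from a statistical package because Table IV stops at df = 100. As a cross-check, here is a minimal Python sketch that uses only the standard library. It computes χ² = (n − 1)s²/σ0² and obtains the upper-tail area from the power series for the regularized lower incomplete gamma function P(a, z), using the fact that the χ² CDF with df degrees of freedom at x equals P(df/2, x/2). The helper names `chi2_cdf` and `variance_test_upper` are hypothetical; in practice a library routine such as SciPy's `scipy.stats.chi2.sf` would normally be used instead.

```python
import math


def chi2_cdf(x, df):
    """Chi-square CDF at x > 0, computed as P(df/2, x/2).

    P(a, z) is the regularized lower incomplete gamma function, evaluated with
    its power series; the series always converges and is fastest when z is not
    much larger than a.
    """
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of sum_n z^n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= z / (a + n)
        total += term
    # Multiply by z^a e^(-z) / Gamma(a), on the log scale to avoid overflow.
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def variance_test_upper(n, s, sigma0_sq):
    """Test H0: sigma^2 = sigma0_sq against Ha: sigma^2 > sigma0_sq."""
    stat = (n - 1) * s ** 2 / sigma0_sq
    p_value = 1.0 - chi2_cdf(stat, n - 1)  # upper-tail area
    return stat, p_value


# Exercise 7.129: n = 240, s = 13.3, sigma0^2 = 225 (sigma0 = 15)
stat, p = variance_test_upper(240, 13.3, 225)
print(f"chi-square = {stat:.3f}, p-value = {p:.4f}")
```

The same helper reproduces Table IV entries, which is a quick way to confirm it is behaving: for df = 40, `chi2_cdf(55.7585, 40)` is approximately .95, matching the critical value χ².05 used in Exercise 7.117a.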
7.130 a. To determine if the true mean number of pecks at the blue string is less than 7.5, we test:
H0: μ = 7.5
Ha: μ < 7.5
The test statistic is z = (x̄ − μ0)/σx̄ = (1.13 − 7.5)/(2.21/√72) = −24.46.
The rejection region requires α = .01 in the lower tail of the z-distribution. From Table II, Appendix D, z.01 = 2.33. The rejection region is z < −2.33.
Since the observed value of the test statistic falls in the rejection region (z = −24.46 < −2.33), H0 is rejected. There is sufficient evidence to indicate the true mean number of pecks at the blue string is less than 7.5 at α = .01.
b. From Exercise 6.122, the 99% confidence interval is (.46, 1.80). Since the hypothesized value of the mean (μ = 7.5) does not fall in the confidence interval, it is not a likely candidate for the true value of the mean. Thus, you would reject it. This agrees with the conclusion in part a.
7.131 a. First, check to see if n is large enough:
np0 = 132(.5) = 66
nq0 = 132(.5) = 66
Since both np0 ≥ 15 and nq0 ≥ 15, the normal distribution will be adequate.
To determine if there is evidence to reject the claim that no more than half of all manufacturers are dissatisfied with their trade promotion spending, we test:
H0: p = .5
Ha: p > .5
The test statistic is z = (p̂ − p0)/√(p0q0/n) = (.36 − .5)/√(.5(.5)/132) = −3.22.
The rejection region requires α = .02 in the upper tail of the z-distribution. From Table II, Appendix D, z.02 = 2.05. The rejection region is z > 2.05.
Since the observed value of the test statistic does not fall in the rejection region (z = −3.22 < 2.05), H0 is not rejected. There is insufficient evidence to reject the claim that no more than half of all manufacturers are dissatisfied with their trade promotion spending at α = .02.
b. The observed significance level is p-value = P(z ≥ −3.22) ≈ .5 + .5 = 1. Since this p-value is so large, H0 will not be rejected for any reasonable value of α.
c. First, we must define the rejection region in terms of p̂.
p̂0 = p0 + z.02σp̂ = .5 + 2.05√(.5(.5)/132) = .589
β = P(p̂ < .589 | p = .55) = P(z < (.589 − .55)/√(.55(.45)/132)) = P(z < .90) = .5 + .3159 = .8159
7.132 a. p̂ = 24/40 = .6
To determine if the proportion of shoplifters turned over to police is greater than .5, we test:
H0: p = .5
Ha: p > .5
The test statistic is z = (p̂ − p0)/√(p0q0/n) = (.6 − .5)/√(.5(.5)/40) = 1.26.
The rejection region requires α = .05 in the upper tail of the z-distribution. From Table II, Appendix D, z.05 = 1.645. The rejection region is z > 1.645.
Since the observed value of the test statistic does not fall in the rejection region (z = 1.26 < 1.645), H0 is not rejected. There is insufficient evidence to indicate the proportion of shoplifters turned over to police is greater than .5 at α = .05.
b. To determine if the normal approximation is appropriate, we check:
np0 = 40(.5) = 20
nq0 = 40(.5) = 20
Since both np0 ≥ 15 and nq0 ≥ 15, the normal distribution will be adequate.
c. The observed significance level of the test is p-value = P(z ≥ 1.26) = .5 − .3962 = .1038. (Using Table II, Appendix D.) The probability of observing the value of our test statistic or anything more unusual if the true value of p is .5 is .1038. Since this p-value is so large, there is no evidence to reject H0. There is no evidence to indicate the true proportion of shoplifters turned over to police is greater than .5.
d. Any value of α that is greater than the p-value would lead one to reject H0. Thus, for this problem, we would reject H0 for any value of α > .1038.
7.133 a. A Type II error is concluding the percentage of shoplifters turned over to police is 50% when, in fact, the percentage is higher than 50%.
b. First, calculate the value of p̂0 that corresponds to the border between the acceptance region and the rejection region.
P(p̂ > p̂0) = P(z > z0) = .05. From Table II, Appendix D, z0 = 1.645.
p̂0 = p0 + 1.645σp̂ = .5 + 1.645√(.5(.5)/40) = .5 + .1300 = .6300
β = P(p̂ < .6300 when p = .55) = P(z < (.6300 − .55)/√(.55(.45)/40)) = P(z < 1.02) = .5 + .3461 = .8461
c. If n increases, the probability of a Type II error would decrease. For example, with n = 100, first calculate the value of p̂0 that corresponds to the border between the acceptance region and the rejection region.
P(p̂ > p̂0) = P(z > z0) = .05. From Table II, Appendix D, z0 = 1.645.
p̂0 = .5 + 1.645√(.5(.5)/100) = .5 + .082 = .582
β = P(p̂ < .582 when p = .55) = P(z < (.582 − .55)/√(.55(.45)/100)) = P(z < 0.64) = .5 + .2389 = .7389
7.134 a. To determine whether the mean profit change for restaurants with frequency programs is greater than $1,050, we test:
H0: μ = 1,050
Ha: μ > 1,050
b. Using MINITAB, the descriptive statistics are:
Descriptive Statistics: x
Variable   N  Mean  StDev  Variance  Minimum    Q1  Median    Q3  Maximum
x         12  2509   2149   4619332    -2191  1646    2493  3426     6553
The test statistic is t = (x̄ − μ0)/(s/√n) = (2509 − 1050)/(2149/√12) = 2.35.
The rejection region requires α = .05 in the upper tail of the t-distribution with df = n − 1 = 12 − 1 = 11. From Table III, Appendix D, t.05 = 1.796. The rejection region is t > 1.796.
Since the observed value of the test statistic falls in the rejection region (t = 2.35 > 1.796), H0 is rejected. There is sufficient evidence to indicate the mean profit change for restaurants with frequency programs is greater than $1,050 for α = .05. It appears that the frequency program would be profitable for the company if adopted nationwide.
7.135 a. To determine if the production process should be halted, we test:
H0: μ = 3
Ha: μ > 3
where μ = mean amount of vinyl chloride in the air.
The test statistic is z = (x̄ − μ0)/σx̄ = (3.1 − 3)/(.5/√50) = 1.41.
The rejection region requires α = .01 in the upper tail of the z-distribution. From Table II, Appendix D, z.01 = 2.33. The rejection region is z > 2.33.
Since the observed value of the test statistic does not fall in the rejection region (z = 1.41 < 2.33), H0 is not rejected. There is insufficient evidence to indicate the mean amount of vinyl chloride in the air is more than 3 parts per million at α = .01. Do not halt the manufacturing process.
b. As plant manager, I do not want to shut down the plant unnecessarily. Therefore, I want α = P(shut down plant when μ = 3) to be small.
c. The p-value is p = P(z ≥ 1.41) = .5 − .4207 = .0793. Since the p-value is not less than α = .01, H0 is not rejected.
7.136 a. A Type II error would be concluding the mean amount of vinyl chloride in the air is less than or equal to 3 parts per million when, in fact, it is more than 3 parts per million.
b. From Exercise 7.135, z0 = (x̄0 − μ0)/(σ/√n), so 2.33 = (x̄0 − 3)/(.5/√50) and x̄0 = 3.165.
For μ = 3.1, β = P(x̄ < 3.165) = P(z < (3.165 − 3.1)/(.5/√50)) = P(z < .92) = .5 + .3212 = .8212 (from Table II, Appendix D)
c. Power = 1 − β = 1 − .8212 = .1788
d. For μ = 3.2, β = P(x̄ < 3.165) = P(z < (3.165 − 3.2)/(.5/√50)) = P(z < −.49) = .5 − .1879 = .3121
Power = 1 − β = 1 − .3121 = .6879
As the plant's mean vinyl chloride level departs further from 3, the power increases.
7.137 a. No, it increases the risk of falsely rejecting H0, i.e., closing the plant unnecessarily.
b. First, find x̄0 such that P(x̄ > x̄0) = P(z > z0) = .05.
From Table II, Appendix D, z0 = 1.645, so 1.645 = (x̄0 − 3)/(.5/√50) and x̄0 = 3.116.
Then, compute:
β = P(x̄ < 3.116 when μ = 3.1) = P(z < (3.116 − 3.1)/(.5/√50)) = P(z < .23) = .5 + .0910 = .5910
Power = 1 − β = 1 − .5910 = .4090
c. The power of the test increases as α increases.
7.138 a. Some preliminary calculations:
x̄ = Σx/n = 79.93/5 = 15.986
s² = [Σx² − (Σx)²/n]/(n − 1) = [1,277.7627 − (79.93)²/5]/(5 − 1) = .00043
s = √.00043 = .0207
To determine if the mean measurement differs from 16.01, we test:
H0: μ = 16.01
Ha: μ ≠ 16.01
The test statistic is t = (x̄ − μ0)/(s/√n) = (15.986 − 16.01)/(.0207/√5) = −2.59.
The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n − 1 = 5 − 1 = 4. From Table III, Appendix D, t.025 = 2.776. The rejection region is t < −2.776 or t > 2.776.
Since the observed value of the test statistic does not fall in the rejection region (t = −2.59 > −2.776), H0 is not rejected. There is insufficient evidence to indicate the true mean measurement differs from 16.01 at α = .05.
b. We must assume that the sample of measurements was randomly selected from a population of measurements that is normally distributed.
c. To determine if the standard deviation of the weight measurements is greater than .01, we test:
H0: σ² = .01²
Ha: σ² > .01²
The test statistic is χ² = (n − 1)s²/σ0² = (5 − 1)(.00043)/.01² = 17.2.
The rejection region requires α = .05 in the upper tail of the χ² distribution with df = n − 1 = 5 − 1 = 4. From Table IV, Appendix D, χ².05 = 9.48773. The rejection region is χ² > 9.48773.
Since the observed value of the test statistic falls in the rejection region (χ² = 17.2 > 9.48773), H0 is rejected. There is sufficient evidence to indicate the standard deviation of the weight measurements is greater than .01 at α = .05.
7.139 a. To determine if the GSR for all scholarship athletes at Division I institutions differs from 60%, we test:
H0: p = .60
Ha: p ≠ .60
The point estimate is p̂ = x/n = 315/500 = .63.
The test statistic is z = (p̂ − p0)/√(p0q0/n) = (.63 − .60)/√(.60(.40)/500) = 1.37.
The rejection region requires α/2 = .01/2 = .005 in each tail of the z-distribution. From Table II, Appendix D, z.005 = 2.58. The rejection region is z < −2.58 or z > 2.58.
Since the observed value does not fall in the rejection region (z = 1.37 < 2.58), H0 is not rejected. There is insufficient evidence to conclude that the GSR for all scholarship athletes at Division I institutions differs from 60% at α = .01.
b. To determine if the GSR for all male basketball players at Division I institutions differs from 58%, we test:
H0: p = .58
Ha: p ≠ .58
The point estimate is p̂ = x/n = 84/200 = .42.
The test statistic is z = (p̂ − p0)/√(p0q0/n) = (.42 − .58)/√(.58(.42)/200) = −4.58.
The rejection region requires α/2 = .01/2 = .005 in each tail of the z-distribution. From Table II, Appendix D, z.005 = 2.58. The rejection region is z < −2.58 or z > 2.58.
Since the observed value falls in the rejection region (z = −4.58 < −2.58), H0 is rejected. There is sufficient evidence to conclude that the GSR for all male basketball players at Division I institutions differs from 58% at α = .01.
7.140 a. z = (x̄ − μ0)/σx̄ = (10.2 − 0)/(31.3/√50) = 2.30
b. For this two-sided test, the p-value = P(z ≥ 2.30) + P(z ≤ −2.30) = (.5 − .4893) + (.5 − .4893) = .0214. Since this value is so small, there is evidence to reject H0. There is sufficient evidence to indicate the mean level of feminization is different from 0% for any value of α > .0214.
c. z = (x̄ − μ0)/σx̄ = (15.0 − 0)/(25.1/√50) = 4.23
For this two-sided test, the p-value = P(z ≥ 4.23) + P(z ≤ −4.23) ≈ (.5 − .5) + (.5 − .5) = 0. Since this value is so small, there is evidence to reject H0. There is sufficient evidence to indicate the mean level of feminization is different from 0% for any value of α > 0.
7.141 Let μ = mean lacunarity measurement for all grassland pixels. To determine if the area sampled is grassland, we test:
H0: μ = 220
Ha: μ ≠ 220
The test statistic is z = (x̄ − μ0)/σx̄ = (225 − 220)/(20/√100) = 2.50.
The rejection region requires α/2 = .01/2 = .005 in each tail of the z-distribution. From Table II, Appendix D, z.005 = 2.58. The rejection region is z < −2.58 or z > 2.58.
Since the observed value of the test statistic does not fall in the rejection region (z = 2.50 < 2.58), H0 is not rejected. There is insufficient evidence to conclude that the area sampled is not grassland at α = .01.
7.142 a. z = (x̄ − μ0)/σx̄ = (52.3 − 51)/(7.1/√50) = 1.29
The p-value is p = P(z ≥ 1.29) + P(z ≤ −1.29) = (.5 − .4015) + (.5 − .4015) = .1970. (Using Table II, Appendix D.)
b. The p-value is p = P(z ≥ 1.29) = .5 − .4015 = .0985. (Using Table II, Appendix D.)
c. z = (x̄ − μ0)/σx̄ = (52.3 − 51)/(10.4/√50) = 0.88
The p-value is p = P(z ≥ 0.88) + P(z ≤ −0.88) = (.5 − .3106) + (.5 − .3106) = .3788. (Using Table II, Appendix D.)
d. In part a, in order to reject H0, α would have to be greater than .1970. In part b, in order to reject H0, α would have to be greater than .0985. In part c, in order to reject H0, α would have to be greater than .3788.
e. For a two-tailed test, α/2 = .01/2 = .005. From Table II, Appendix D, z.005 = 2.58.
z = (x̄ − μ0)/σx̄: 2.58 = (52.3 − 51)/(s/√50), so 2.58s/√50 = 52.3 − 51, .3649s = 1.3, and s = 3.56
7.143 a. To determine whether the true mean rating for this instructor-related factor exceeds 4, we test:
H0: μ = 4
Ha: μ > 4
The test statistic is z = (x̄ − μ0)/σx̄ = (4.7 − 4)/(1.62/√40) = 2.73.
The rejection region requires α = .05 in the upper tail of the z-distribution. From Table II, Appendix D, z.05 = 1.645. The rejection region is z > 1.645.
Since the observed value of the test statistic falls in the rejection region (z = 2.73 > 1.645), H0 is rejected. There is sufficient evidence to indicate that the true mean rating for this instructor-related factor exceeds 4 at α = .05.
b. If the sample size is large enough, one could almost always reject H0. Thus, we might be able to detect very small differences if the sample size is large enough. This would be statistical significance. However, even though statistical significance is found, it does not necessarily mean that there is practical significance. A statistically significant difference can sometimes be found between the hypothesized value of a mean and the estimated value of the mean even though, in practice, this difference would mean nothing. Whether a difference is large enough to matter in practice is the question of practical significance.
c. Since the sample size is sufficiently large (n = 40), the Central Limit Theorem indicates that the sampling distribution of x̄ is approximately normal. Also, since the sample size is large, s is a good estimator of σ.
Thus, the analysis used is appropriate. 7.144 Let p proportion of patients taking the pill who reported an improved condition. First we check to see if the normal approximation is adequate: np0 7000(.5) 3500 nq0 7000(.5) 3500 Since both np0 15 and nq0 15 , the normal distribution will be adequate. To determine if there really is a placebo effect at the clinic, we test: H 0 : p .5 H a : p .5 The test statistic is z pˆ p0 p0 q0 n .7 .5 .5(.5) 7000 33.47 The rejection region requires .05 in the upper tail of the z-distribution. From Table II, Appendix D, z.05 1.645 . The rejection region is z 1.645 . Since the observed value of the test statistic falls in the rejection region z 33.47 1.645 , H0 is rejected. There is sufficient evidence to indicate that there really is a placebo effect at the clinic at .05 . Copyright © 2014 Pearson Education, Inc. 408 7.145 Chapter 7 Using MINITAB, the descriptive statistics are: Descriptive Statistics: Candy Variable Candy N 5 Mean 22.000 StDev 2.000 Minimum 20.000 Q1 20.500 Median 21.000 Q3 24.000 Maximum 25.000 To give the benefit of the doubt to the students we will use a small value of . (We do not want to reject H0 when it is true to favor the students.) Thus, we will use .001 . We must also assume that the sample comes from a normal distribution. To determine if the mean number of candies exceeds 15, we test: H 0 : 15 H a : 15 The test statistic is z x o n 22 15 2 5 7.83 The rejection region requires .001 in the upper tail of the z-distribution. From Table II, Appendix D, z.001 3.08 . The rejection region is z 3.08 . Since the observed value of the test statistic falls in the rejection region z 7.83 3.08 , H0 is rejected. There is sufficient evidence to indicate the mean number of candies exceeds 15 at .001 . Copyright © 2014 Pearson Education, Inc. Chapter 8 Inferences Based on Two Samples: Confidence Intervals and Tests of Hypotheses 8.1 a. b. 
8.2 8.3 1 2 x 1 2 1 2 2 x 2 2 2 1 n1 2 150 2 150 2 n2 900 100 150 6 144, 156 1600 100 150 8 142, 158 12 22 900 1600 100 100 2500 5 100 c. x x 1 2 150 150 0 d. ( 1 2 ) 2 e. The variability of the difference between the sample means is greater than the variability of the individual sample means. a. x 1 12 x b. x 2 10 x c. x x 1 2 12 10 2 x x d. Since n1 30 and n2 30 , the sampling distribution of x1 x2 is approximately normal by the Central Limit Theorem. a. For confidence coefficient .95, .05 and / 2 .05 / 2 .025 . From Table II, Appendix D, z.025 1.96 . The confidence interval is: 1 2 12 n1 22 n2 1 n1 2 2 2 12 n1 22 n2 1 2 n1 2 n2 900 1600 0 10 ( 10, 10) 100 100 1 2 ( x1 x2 ) z.025 1 (150 150) 2 1 1 x x n2 4 3 12 n1 64 64 (5, 275 5, 240) 1.96 22 n2 .5 .375 4 2 32 64 64 25 .625 64 150 2 200 2 35 24.5 10.5, 59.5 400 400 We are 95% confident that the difference between the population means is between 10.5 and 59.5. 409 Copyright © 2014 Pearson Education, Inc. 410 Chapter 8 b. The test statistic is z ( x1 x2 ) ( 1 2 ) 2 1 n1 2 2 (5, 275 5, 240) 0 2 2 2.8 150 200 400 400 n2 The p-value is p P( z 2.8) P( z 2.8) 2P( z 2.8) 2(.5 .4974) 2 .0026 .0052 Since the p-value is so small, there is evidence to reject H0. There is evidence to indicate the two population means are different for .0052 . c. The p-value would be half of the p-value in part b. The p-value p P( z 2.8) .5 .4974 .0026 . Since the p-value is so small, there is evidence to reject H0. There is evidence to indicate the mean for population 1 is larger than the mean for population 2 for .0026 . d. The test statistic is z ( x1 x2 ) ( 1 2 ) 2 1 n1 2 2 (5, 275 5, 240) 25 n2 2 2 .8 150 200 400 400 The p-value of the test is p P( z .8) P( z .8) 2P( z .8) 2(.5 .2881) 2 .2119 .4238 Since the p-value is so large, there is no evidence to reject H0. There is no evidence to indicate that the difference in the 2 population means is different from 25 for .10 . e. 8.4 We must assume that we have two independent random samples. 
Assumptions about the two populations:
1. Both sampled populations have relative frequency distributions that are approximately normal.
2. The population variances are equal.

Assumptions about the two samples: The samples are randomly and independently selected from the populations.

8.5 a. No. Both populations must be normal.
b. No. Both population variances must be equal.
c. No. Both populations must be normal.
d. Yes.
e. No. Both populations must be normal.

8.6 a. $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = \frac{(25-1)120 + (25-1)100}{25+25-2} = \frac{5{,}280}{48} = 110$

b. $s_p^2 = \frac{(20-1)12 + (10-1)20}{20+10-2} = \frac{408}{28} = 14.5714$

c. $s_p^2 = \frac{(6-1).15 + (10-1).2}{6+10-2} = \frac{2.55}{14} = .1821$

d. $s_p^2 = \frac{(16-1)3{,}000 + (17-1)2{,}500}{16+17-2} = \frac{85{,}000}{31} = 2{,}741.9355$

In each case, $s_p^2$ falls nearer the variance with the larger sample size.

8.7 Some preliminary calculations are:

$$\bar{x}_1 = \frac{\sum x_1}{n_1} = \frac{11.8}{5} = 2.36 \qquad \bar{x}_2 = \frac{\sum x_2}{n_2} = \frac{14.4}{4} = 3.6$$

$$s_1^2 = \frac{\sum x_1^2 - \frac{(\sum x_1)^2}{n_1}}{n_1 - 1} = \frac{30.78 - \frac{(11.8)^2}{5}}{5-1} = .733 \qquad s_2^2 = \frac{\sum x_2^2 - \frac{(\sum x_2)^2}{n_2}}{n_2 - 1} = \frac{53.1 - \frac{(14.4)^2}{4}}{4-1} = .42$$

a. $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = \frac{(5-1).733 + (4-1).42}{5+4-2} = \frac{4.192}{7} = .5989$

b. $H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 < 0$

The test statistic is

$$t = \frac{(\bar{x}_1-\bar{x}_2) - D_0}{\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} = \frac{(2.36 - 3.6) - 0}{\sqrt{.5989\left(\frac{1}{5}+\frac{1}{4}\right)}} = \frac{-1.24}{.5191} = -2.39$$

The rejection region requires $\alpha = .10$ in the lower tail of the t-distribution with $df = n_1 + n_2 - 2 = 5 + 4 - 2 = 7$. From Table III, Appendix D, $t_{.10} = 1.415$. The rejection region is $t < -1.415$.

Since the test statistic falls in the rejection region ($t = -2.39 < -1.415$), $H_0$ is rejected. There is sufficient evidence to indicate that $\mu_2 > \mu_1$ at $\alpha = .10$.

c. A small-sample confidence interval is needed because $n_1 = 5 < 30$ and $n_2 = 4 < 30$. For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table III, Appendix D, with $df = n_1 + n_2 - 2 = 5 + 4 - 2 = 7$, $t_{.05} = 1.895$. The 90% confidence interval for $(\mu_1 - \mu_2)$ is:

$$(\bar{x}_1-\bar{x}_2) \pm t_{.05}\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)} \Rightarrow (2.36 - 3.6) \pm 1.895\sqrt{.5989\left(\frac{1}{5}+\frac{1}{4}\right)} \Rightarrow -1.24 \pm .98 \Rightarrow (-2.22,\ -0.26)$$

d.
The confidence interval in part c provides more information about $(\mu_1 - \mu_2)$ than the test of hypothesis in part b. The test in part b only tells us that $\mu_2$ is greater than $\mu_1$. The confidence interval, however, estimates how large the difference between $\mu_1$ and $\mu_2$ is.

8.8 a. $\mu_{\bar{x}_1-\bar{x}_2} = \mu_1 - \mu_2 = 10$ and $\sigma_{\bar{x}_1-\bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}} = \sqrt{\frac{9}{100}+\frac{16}{100}} = \sqrt{.25} = .5$

b. The sampling distribution of $\bar{x}_1 - \bar{x}_2$ is approximately normal by the Central Limit Theorem since $n_1 \geq 30$ and $n_2 \geq 30$.

c. $\bar{x}_1 - \bar{x}_2 = 26.6 - 15.5 = 11.1$. No, it does not appear that $\bar{x}_1 - \bar{x}_2 = 11.1$ contradicts the null hypothesis $H_0: \mu_1 - \mu_2 = 10$. The value 11.1 is fairly close to 10.

d. The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.025} = 1.96$. The rejection region is $z < -1.96$ or $z > 1.96$.

e. $H_0: \mu_1 - \mu_2 = 10$
$H_a: \mu_1 - \mu_2 \neq 10$

The test statistic is

$$z = \frac{(\bar{x}_1-\bar{x}_2) - 10}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}} = \frac{(26.6 - 15.5) - 10}{.5} = 2.2$$

The rejection region is $z < -1.96$ or $z > 1.96$ (refer to part d). Since the observed value of the test statistic falls in the rejection region ($z = 2.2 > 1.96$), $H_0$ is rejected. There is sufficient evidence to indicate the difference in the population means is not equal to 10 at $\alpha = .05$.

f. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The confidence interval is:

$$(\bar{x}_1-\bar{x}_2) \pm z_{.025}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}} \Rightarrow (26.6 - 15.5) \pm 1.96\sqrt{\frac{9}{100}+\frac{16}{100}} \Rightarrow 11.1 \pm .98 \Rightarrow (10.12,\ 12.08)$$

We are 95% confident that the difference in the two means is between 10.12 and 12.08.

g. The confidence interval gives more information.

8.9 a. The p-value is $p = .1150$. Since the p-value is not small, there is no evidence to reject $H_0$ for $\alpha = .10$. There is insufficient evidence to indicate the two population means differ for $\alpha = .10$.

b. If the alternative hypothesis had been one-tailed, the p-value would be half of the value for the two-tailed test. Here, p-value $= .1150/2 = .0575$.
There is no evidence to reject $H_0$ for $\alpha = .05$. There is insufficient evidence to indicate the mean for population 1 is less than the mean for population 2 at $\alpha = .05$. There is evidence to reject $H_0$ for $\alpha > .0575$. There is sufficient evidence to indicate the mean for population 1 is less than the mean for population 2 at $\alpha > .0575$.

8.10 Some preliminary calculations:

$$\bar{x}_1 = \frac{\sum x_1}{n_1} = \frac{654}{15} = 43.6 \qquad \bar{x}_2 = \frac{\sum x_2}{n_2} = \frac{858}{16} = 53.625$$

$$s_1^2 = \frac{\sum x_1^2 - \frac{(\sum x_1)^2}{n_1}}{n_1 - 1} = \frac{28{,}934 - \frac{654^2}{15}}{15-1} = \frac{419.6}{14} = 29.9714 \qquad s_2^2 = \frac{46{,}450 - \frac{858^2}{16}}{16-1} = \frac{439.75}{15} = 29.3167$$

$$s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = \frac{(15-1)29.9714 + (16-1)29.3167}{15+16-2} = \frac{859.3501}{29} = 29.6328$$

a. $H_0: \mu_2 - \mu_1 = 10$
$H_a: \mu_2 - \mu_1 > 10$

The test statistic is

$$t = \frac{(\bar{x}_2-\bar{x}_1) - D_0}{\sqrt{s_p^2\left(\frac{1}{n_2}+\frac{1}{n_1}\right)}} = \frac{(53.625 - 43.6) - 10}{\sqrt{29.6328\left(\frac{1}{16}+\frac{1}{15}\right)}} = \frac{.025}{1.9564} = .013$$

The rejection region requires $\alpha = .01$ in the upper tail of the t-distribution with $df = n_1 + n_2 - 2 = 15 + 16 - 2 = 29$. From Table III, Appendix D, $t_{.01} = 2.462$. The rejection region is $t > 2.462$.

Since the test statistic does not fall in the rejection region ($t = .013 < 2.462$), $H_0$ is not rejected. There is insufficient evidence to conclude $\mu_2 - \mu_1 > 10$ at $\alpha = .01$.

b. For confidence coefficient .98, $\alpha = .02$ and $\alpha/2 = .02/2 = .01$. From Table III, Appendix D, with $df = n_1 + n_2 - 2 = 15 + 16 - 2 = 29$, $t_{.01} = 2.462$. The 98% confidence interval for $(\mu_2 - \mu_1)$ is:

$$(\bar{x}_2-\bar{x}_1) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_2}+\frac{1}{n_1}\right)} \Rightarrow (53.625 - 43.6) \pm 2.462\sqrt{29.6328\left(\frac{1}{16}+\frac{1}{15}\right)} \Rightarrow 10.025 \pm 4.817 \Rightarrow (5.208,\ 14.842)$$

We are 98% confident that the difference between the mean of population 2 and the mean of population 1 is between 5.208 and 14.842.

8.11 a. $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = \frac{(17-1)3.4^2 + (12-1)4.8^2}{17+12-2} = 16.237$

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \neq 0$

The test statistic is

$$t = \frac{(\bar{x}_1-\bar{x}_2) - 0}{\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} = \frac{(5.4 - 7.9) - 0}{\sqrt{16.237\left(\frac{1}{17}+\frac{1}{12}\right)}} = -1.646$$

Since no $\alpha$ was given, we will use $\alpha = .05$. The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t-distribution with $df = n_1 + n_2 - 2 = 17 + 12 - 2 = 27$.
From Table III, Appendix D, $t_{.025} = 2.052$. The rejection region is $t < -2.052$ or $t > 2.052$.

Since the observed value of the test statistic does not fall in the rejection region ($t = -1.646 > -2.052$), $H_0$ is not rejected. There is insufficient evidence to indicate $\mu_1 - \mu_2$ is different from 0 at $\alpha = .05$.

b. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table III, Appendix D, with $df = n_1 + n_2 - 2 = 17 + 12 - 2 = 27$, $t_{.025} = 2.052$. The confidence interval is:

$$(\bar{x}_1-\bar{x}_2) \pm t_{.025}\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)} \Rightarrow (5.4 - 7.9) \pm 2.052\sqrt{16.237\left(\frac{1}{17}+\frac{1}{12}\right)} \Rightarrow -2.50 \pm 3.12 \Rightarrow (-5.62,\ 0.62)$$

8.12 a. The target parameter is $\mu_1 - \mu_2$, the difference in mean trap measurements between the Bahia Tortugas fishing cooperative and the Punta Abreojos fishing cooperative.

b. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: BT, PA

Variable  N   Mean  StDev  Variance  Minimum     Q1  Median      Q3  Maximum
BT        7  89.86  11.63    135.14    70.00  82.00   93.00   99.00   105.00
PA        8  99.63  27.38    749.70    66.00  76.50   96.00  115.00   153.00

The point estimate is $\bar{x}_1 - \bar{x}_2 = 89.86 - 99.63 = -9.77$.

c. Since the sample sizes for both samples are so small, the Central Limit Theorem does not apply. In addition, the population standard deviations are not known and must be estimated with the sample standard deviations.

d. $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = \frac{(7-1)135.14 + (8-1)749.70}{7+8-2} = \frac{6{,}058.74}{13} = 466.0569$

For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table III, Appendix D, with $df = n_1 + n_2 - 2 = 7 + 8 - 2 = 13$, $t_{.05} = 1.771$. The 90% confidence interval for $(\mu_1 - \mu_2)$ is:

$$(\bar{x}_1-\bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)} \Rightarrow (89.86 - 99.63) \pm 1.771\sqrt{466.0569\left(\frac{1}{7}+\frac{1}{8}\right)} \Rightarrow -9.77 \pm 19.787 \Rightarrow (-29.557,\ 10.017)$$

e. Since 0 is in the 90% confidence interval, there is insufficient evidence to indicate a difference in the mean trap measurements between the two fishing cooperatives.

f. We must assume that we have independent random samples from normal populations and that the population variances are the same.

8.13
a. $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = \frac{(25-1)10.41^2 + (25-1)7.12^2}{25+25-2} = \frac{3{,}817.5}{48} = 79.53125$

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. Using MINITAB with $df = n_1 + n_2 - 2 = 25 + 25 - 2 = 48$, $t_{.025} = 2.011$. The 95% confidence interval for $(\mu_1 - \mu_2)$ is:

$$(\bar{x}_1-\bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)} \Rightarrow (25.08 - 19.38) \pm 2.011\sqrt{79.53125\left(\frac{1}{25}+\frac{1}{25}\right)} \Rightarrow 5.7 \pm 5.073 \Rightarrow (.627,\ 10.773)$$

b. Since 0 does not fall in the 95% confidence interval, there is evidence to indicate there is a difference in the mean response times between the two groups. Since the interval contains only positive numbers, it indicates that the mean response time for the group of students whose last names begin with the letters R–Z is shorter than the mean response time for the group of students whose last names begin with the letters A–I. This supports the researchers' last name effect theory.

8.14 Let $\mu_1$ = the mean test score of students on the SAT reading test in classrooms that used educational software and $\mu_2$ = the mean test score of students on the SAT reading test in classrooms that did not use the technology.

a. The parameter of interest is $\mu_1 - \mu_2$.

b. The null and alternative hypotheses for the test are:
$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 > 0$

c. Since the p-value of the test is so large ($p = 0.62$), we would not reject $H_0$ for any reasonable value of $\alpha$. There is insufficient evidence to indicate that the mean test score of the students on the SAT reading test was significantly higher in classrooms using reading software products than in classrooms that did not use educational software. This agrees with the conclusion of the DOE.

8.15 a. Let $\mu_1$ = mean number of items recalled by those in the video only group and $\mu_2$ = mean number of items recalled by those in the audio and video group. To determine if the mean number of items recalled by the two groups is the same, we test:
$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \neq 0$

b.
$s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = \frac{(20-1)1.98^2 + (20-1)2.13^2}{20+20-2} = 4.22865$

The test statistic is

$$t = \frac{(\bar{x}_1-\bar{x}_2) - D_0}{\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} = \frac{(3.70 - 3.30) - 0}{\sqrt{4.22865\left(\frac{1}{20}+\frac{1}{20}\right)}} = \frac{0.4}{.65028} = 0.62$$

c. The rejection region requires $\alpha/2 = .10/2 = .05$ in each tail of the t-distribution with $df = n_1 + n_2 - 2 = 20 + 20 - 2 = 38$. From Table III, Appendix D, $t_{.05} = 1.684$. The rejection region is $t < -1.684$ or $t > 1.684$.

d. Since the observed value of the test statistic does not fall in the rejection region ($t = 0.62 < 1.684$), $H_0$ is not rejected. There is insufficient evidence to indicate a difference in the mean number of items recalled by the two groups at $\alpha = .10$.

e. The p-value is $p = .542$. This is the probability of observing our test statistic or anything more unusual if $H_0$ is true. Since the p-value is not less than $\alpha = .10$, there is no evidence to reject $H_0$. There is insufficient evidence to indicate a difference in the mean number of items recalled by the two groups at $\alpha = .10$.

f. We must assume:
1. Both populations are normal.
2. The samples are random and independent.
3. $\sigma_1^2 = \sigma_2^2$.

8.16 Let $\mu_1$ = mean drug concentration for Site 1 and $\mu_2$ = mean drug concentration for Site 2. To determine if there is a difference in the mean drug concentration between the two sites, we test:
$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \neq 0$

From the printout, the test statistic is $t = .57$ and the p-value is $p = .573$. Since the p-value is not small, $H_0$ is not rejected. There is insufficient evidence to indicate a difference in the mean drug concentrations between the two sites.

8.17 a. Let $\mu_1$ = mean forecast error of buy-side analysts and $\mu_2$ = mean forecast error of sell-side analysts. For confidence coefficient 0.95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The 95% confidence interval is:

$$(\bar{x}_1-\bar{x}_2) \pm z_{.025}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}} \approx (.85 - (-.05)) \pm 1.96\sqrt{\frac{1.93^2}{3{,}526}+\frac{.85^2}{58{,}562}} \Rightarrow .90 \pm .064 \Rightarrow (.836,\ .964)$$
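The interval arithmetic in part a can be checked with a short Python sketch. The function name `two_sample_z_ci` is hypothetical. The sample standard deviations stand in for $\sigma_1$ and $\sigma_2$, which is the same large-sample approximation used above. The critical value $z_{.025} = 1.96$ is passed in rather than looked up.

```python
import math


def two_sample_z_ci(xbar1, xbar2, s1, s2, n1, n2, z_crit=1.96):
    """Large-sample confidence interval for mu1 - mu2.

    Sample standard deviations replace the unknown population values,
    which is reasonable only when n1 and n2 are both large.
    """
    diff = xbar1 - xbar2
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)  # estimated standard error of xbar1 - xbar2
    margin = z_crit * se
    return diff - margin, diff + margin


# Summary statistics from Exercise 8.17a (buy-side = sample 1, sell-side = sample 2)
lo, hi = two_sample_z_ci(0.85, -0.05, 1.93, 0.85, 3526, 58562)
print(f"95% CI for mu1 - mu2: ({lo:.3f}, {hi:.3f})")
```

Passing a different `z_crit` (for example, 2.58 for 99% confidence) gives other confidence levels under the same large-sample assumption.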
We are 95% confident that the difference in the mean forecast error of buy-side analysts and sell-side analysts is between .836 and .964.

b. Based on the 95% confidence interval in part a, the buy-side analysts have the greater mean forecast error because the interval contains only positive numbers.

c. The assumptions about the underlying populations of forecast errors that are necessary for the validity of the inference are:
1. The samples are randomly and independently sampled.
2. The sample sizes are sufficiently large.

8.18 a. No. Looking only at the sample means, as the students went from no solutions to check figures, the sample mean improvement score increased. However, as the students went from check figures to complete solutions, the sample mean improvement score dropped below that of the no solutions group.

b. The problem with using only the sample means to make inferences about the population mean knowledge gains for the three groups of students is that we don't know the variability, or "spread," of the probability distributions of the populations.

c. Let $\mu_1$ = mean knowledge gain for students in the "no solutions" group and $\mu_2$ = mean knowledge gain for students in the "check figures" group. To determine if the test score improvement decreases as the level of assistance increases, we test:
$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 > 0$

d. Since the observed significance level of the test is not less than $\alpha = .05$ ($p = .8248 > .05$), $H_0$ is not rejected. There is insufficient evidence to indicate that the mean knowledge gain of students in the "no solutions" group is greater than the mean knowledge gain of students in the "check figures" group at $\alpha = .05$.

e. Let $\mu_3$ = mean knowledge gain of students in the "complete solutions" group. To determine if the test score improvement decreases as the level of assistance increases, we test:
$H_0: \mu_2 - \mu_3 = 0$
$H_a: \mu_2 - \mu_3 > 0$

f.
Since the observed significance level of the test is not less than $\alpha = .05$ ($p = .1849 > .05$), do not reject $H_0$. There is insufficient evidence to indicate that the mean knowledge gain of students in the "check figures" group is greater than the mean knowledge gain of students in the "complete solutions" group at $\alpha = .05$.

g. To determine if the test score improvement decreases as the level of assistance increases, we test:
$H_0: \mu_1 - \mu_3 = 0$
$H_a: \mu_1 - \mu_3 > 0$

h. Since the observed significance level of the test is not less than $\alpha = .05$ ($p = .2726 > .05$), do not reject $H_0$. There is insufficient evidence to indicate that the mean knowledge gain of students in the "no solutions" group is greater than the mean knowledge gain of students in the "complete solutions" group at $\alpha = .05$.

8.19 a. The descriptive statistics are:

Descriptive Statistics: Text-line, Witness-line, Intersection

Variable      N    Mean  Median   StDev  Minimum  Maximum      Q1      Q3
Text-line     3  0.3830  0.3740  0.0531   0.3350   0.4400  0.3350  0.4400
Witness-line  6  0.3042  0.2955  0.1015   0.1880   0.4390  0.2045  0.4075
Intersection  5  0.3290  0.3190  0.0443   0.2850   0.3930  0.2900  0.3730

Let $\mu_1$ = mean zinc measurement for the text-line, $\mu_2$ = mean zinc measurement for the witness-line, and $\mu_3$ = mean zinc measurement for the intersection.

$$s_p^2 = \frac{(n_1-1)s_1^2 + (n_3-1)s_3^2}{n_1+n_3-2} = \frac{(3-1).0531^2 + (5-1).0443^2}{3+5-2} = .00225$$

For $\alpha = .05$, $\alpha/2 = .05/2 = .025$. Using Table III, Appendix D, with $df = n_1 + n_3 - 2 = 3 + 5 - 2 = 6$, $t_{.025} = 2.447$. The 95% confidence interval is:

$$(\bar{x}_1-\bar{x}_3) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_3}\right)} \Rightarrow (.3830 - .3290) \pm 2.447\sqrt{.00225\left(\frac{1}{3}+\frac{1}{5}\right)} \Rightarrow 0.0540 \pm .0848 \Rightarrow (-0.0308,\ 0.1388)$$

We are 95% confident that the difference in mean zinc level between text-line and intersection is between $-0.0308$ and $0.1388$.
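The following minimal Python sketch reproduces the pooled-variance interval used in part a. Like the solution, it assumes equal population variances. The helper name `pooled_t_ci` is hypothetical. Because the sketch uses only the standard library, the Table III value $t_{.025} = 2.447$ (df = 6) is supplied by hand.

```python
import math


def pooled_t_ci(xbar_a, xbar_b, s_a, s_b, n_a, n_b, t_crit):
    """Pooled-variance t interval for mu_a - mu_b (equal variances assumed).

    Returns the pooled variance and the (lower, upper) interval endpoints.
    """
    # Weighted average of the two sample variances, weights = degrees of freedom
    sp2 = ((n_a - 1) * s_a**2 + (n_b - 1) * s_b**2) / (n_a + n_b - 2)
    se = math.sqrt(sp2 * (1 / n_a + 1 / n_b))
    diff = xbar_a - xbar_b
    return sp2, (diff - t_crit * se, diff + t_crit * se)


# Text-line (n = 3) versus intersection (n = 5) from Exercise 8.19a
sp2, (lo, hi) = pooled_t_ci(0.3830, 0.3290, 0.0531, 0.0443, 3, 5, t_crit=2.447)
print(f"sp2 = {sp2:.5f}, 95% CI = ({lo:.4f}, {hi:.4f})")
```

The sketch does not round $s_p^2$ to .00225 before taking the square root. Its endpoints can therefore differ from $(-0.0308, 0.1388)$ in the fourth decimal place.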
To determine if there is a difference in the mean zinc measurement between text-line and intersection, we test:
$H_0: \mu_1 - \mu_3 = 0$
$H_a: \mu_1 - \mu_3 \neq 0$

The test statistic is

$$t = \frac{(\bar{x}_1-\bar{x}_3) - D_0}{\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_3}\right)}} = \frac{(.3830 - .3290) - 0}{\sqrt{.00225\left(\frac{1}{3}+\frac{1}{5}\right)}} = 1.56$$

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t-distribution with $df = n_1 + n_3 - 2 = 3 + 5 - 2 = 6$. From Table III, Appendix D, $t_{.025} = 2.447$. The rejection region is $t < -2.447$ or $t > 2.447$.

Since the observed value of the test statistic does not fall in the rejection region ($t = 1.56 < 2.447$), $H_0$ is not rejected. There is insufficient evidence to indicate a difference in the mean zinc measurement between text-line and intersection at $\alpha = .05$.

b. $$s_p^2 = \frac{(n_2-1)s_2^2 + (n_3-1)s_3^2}{n_2+n_3-2} = \frac{(6-1).1015^2 + (5-1).0443^2}{6+5-2} = .006596$$

For $\alpha = .05$, $\alpha/2 = .05/2 = .025$. Using Table III, Appendix D, with $df = n_2 + n_3 - 2 = 6 + 5 - 2 = 9$, $t_{.025} = 2.262$. The 95% confidence interval is:

$$(\bar{x}_2-\bar{x}_3) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_2}+\frac{1}{n_3}\right)} \Rightarrow (.3042 - .3290) \pm 2.262\sqrt{.006596\left(\frac{1}{6}+\frac{1}{5}\right)} \Rightarrow -.0248 \pm .1112 \Rightarrow (-.1361,\ .0864)$$

We are 95% confident that the difference in mean zinc level between witness-line and intersection is between $-0.1361$ and $0.0864$.

To determine if there is a difference in mean zinc measurement between the witness-line and the intersection, we test:
$H_0: \mu_2 - \mu_3 = 0$
$H_a: \mu_2 - \mu_3 \neq 0$

The test statistic is

$$t = \frac{(\bar{x}_2-\bar{x}_3) - D_0}{\sqrt{s_p^2\left(\frac{1}{n_2}+\frac{1}{n_3}\right)}} = \frac{(.3042 - .3290) - 0}{\sqrt{.006596\left(\frac{1}{6}+\frac{1}{5}\right)}} = -.50$$

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t-distribution. From Table III, Appendix D, with $df = n_2 + n_3 - 2 = 6 + 5 - 2 = 9$, $t_{.025} = 2.262$. The rejection region is $t < -2.262$ or $t > 2.262$.

Since the observed value of the test statistic does not fall in the rejection region ($t = -.50 > -2.262$), $H_0$ is not rejected. There is insufficient evidence to indicate a difference in mean zinc measurement between witness-line and intersection at $\alpha = .05$.

c. If we order the sample means, the largest is text-line, the next largest is intersection, and the smallest is witness-line. In parts a and b, we found that text-line is not different from the intersection and that the witness-line is not different from the intersection. However, we cannot make any decisions about the difference between the witness-line and the text-line.

d. In order for the above inferences to be valid, we must assume:
1. The three samples are randomly selected in an independent manner from the three target populations.
2. All three sampled populations have distributions that are approximately normal.
3. All three population variances are equal (i.e., $\sigma_1^2 = \sigma_2^2 = \sigma_3^2$).

8.20 Let $\mu_1$ = mean amount of surplus Missouri producers are willing to sell to the biomass market and $\mu_2$ = mean amount of surplus Illinois producers are willing to sell to the biomass market. To determine if there is a difference in the mean amount of surplus producers are willing to sell to the biomass market between Missouri and Illinois producers, we test:
$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \neq 0$

The test statistic is

$$z = \frac{(\bar{x}_1-\bar{x}_2) - (\mu_1-\mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}} \approx \frac{(21.5 - 22.2) - 0}{\sqrt{\frac{33.4^2}{431}+\frac{34.9^2}{508}}} = -.31$$

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.025} = 1.96$. The rejection region is $z < -1.96$ or $z > 1.96$.

Since the observed value of the test statistic does not fall in the rejection region ($z = -.31 > -1.96$), $H_0$ is not rejected. There is insufficient evidence to indicate a difference in the mean amount of surplus producers are willing to sell to the biomass market between Missouri and Illinois producers at $\alpha = .05$.

8.21
Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Control, Rude

Variable   N   Mean  StDev  Minimum     Q1  Median      Q3  Maximum
Rude      45  8.511  3.992    0.000  5.500   9.000  11.000   18.000
Control   53  11.81   7.38     0.00   5.50   12.00   17.50    30.00

Let $\mu_1$ = mean performance level of students in the rudeness group and $\mu_2$ = mean performance level of students in the control group. To determine if the true mean performance level for students in the rudeness condition is lower than the true mean performance level for students in the control group, we test:
$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 < 0$

The test statistic is

$$z = \frac{(\bar{x}_1-\bar{x}_2) - 0}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}} \approx \frac{(8.511 - 11.81) - 0}{\sqrt{\frac{3.992^2}{45}+\frac{7.38^2}{53}}} = -2.81$$

The rejection region requires $\alpha = .01$ in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.01} = 2.33$. The rejection region is $z < -2.33$.

Since the observed value of the test statistic falls in the rejection region ($z = -2.81 < -2.33$), $H_0$ is rejected. There is sufficient evidence to indicate the true mean performance level for students in the rudeness condition is lower than the true mean performance level for students in the control group at $\alpha = .01$.

8.22 a. If the manipulation was successful, then the positive group (requiring a strong display of positive emotions) should have the higher mean response. The members of this group should disagree with the statement presented, resulting in higher responses.

b. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Positive, Neutral

Variable   N    Mean   StDev  Minimum      Q1  Median      Q3  Maximum
Positive  78  4.4872  0.6595   2.0000  4.0000  5.0000  5.0000   5.0000
Neutral   67  1.8955  0.4965   1.0000  2.0000  2.0000  2.0000   3.0000

Let $\mu_1$ = mean response for the positive group and $\mu_2$ = mean response for the neutral group.
To determine if the manipulation was successful, we test:
$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 > 0$

The test statistic is

$$z = \frac{(\bar{x}_1-\bar{x}_2) - (\mu_1-\mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}} \approx \frac{(4.4872 - 1.8955) - 0}{\sqrt{\frac{.6595^2}{78}+\frac{.4965^2}{67}}} = 26.94$$

The rejection region requires $\alpha = .05$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z > 1.645$.

Since the observed value of the test statistic falls in the rejection region ($z = 26.94 > 1.645$), $H_0$ is rejected. There is sufficient evidence to indicate the mean response for the positive group is greater than the mean response for the neutral group at $\alpha = .05$. Thus, the manipulation was successful.

c. We need to assume that random and independent samples were selected from each of the populations.

8.23 Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Honey, DM

Variable   N    Mean  StDev  Minimum      Q1  Median      Q3  Maximum
Honey     35  10.714  2.855    4.000   9.000  11.000  12.000   16.000
DM        33   8.333  3.256    3.000   6.000   9.000  11.500   15.000

Let $\mu_1$ = mean improvement in total cough symptoms score for children receiving the honey dosage and $\mu_2$ = mean improvement in total cough symptoms score for children receiving the DM dosage. To test if honey may be a preferable treatment for the cough and sleep difficulty associated with childhood upper respiratory tract infection, we test:
$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 > 0$

The test statistic is

$$z = \frac{(\bar{x}_1-\bar{x}_2) - 0}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}} \approx \frac{(10.714 - 8.333) - 0}{\sqrt{\frac{2.855^2}{35}+\frac{3.256^2}{33}}} = 3.20$$

Since no $\alpha$ was given, we will use $\alpha = .05$. The rejection region requires $\alpha = .05$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z > 1.645$.

Since the observed value of the test statistic falls in the rejection region ($z = 3.20 > 1.645$), $H_0$ is rejected. There is sufficient evidence to indicate that honey may be a preferable treatment for the cough and sleep difficulty associated with childhood upper respiratory tract infection at $\alpha = .05$.

8.24 a.
Let $\mu_1$ = the mean heat rate of traditional augmented gas turbines and $\mu_2$ = the mean heat rate of aeroderivative augmented gas turbines. Some preliminary calculations are:

$$s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = \frac{(39-1)1{,}279^2 + (7-1)2{,}652^2}{39+7-2} = 2{,}371{,}831.409$$

To determine if there is a difference between the mean heat rate of traditional augmented gas turbines and the mean heat rate of aeroderivative augmented gas turbines, we test:
$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \neq 0$

The test statistic is

$$t = \frac{(\bar{x}_1-\bar{x}_2) - D_0}{\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} = \frac{(11{,}544 - 12{,}312) - 0}{\sqrt{2{,}371{,}831.409\left(\frac{1}{39}+\frac{1}{7}\right)}} = \frac{-768}{632.1782} = -1.21$$

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t-distribution with $df = n_1 + n_2 - 2 = 39 + 7 - 2 = 44$. From Table III, Appendix D, $t_{.025} = 2.021$. The rejection region is $t < -2.021$ or $t > 2.021$.

Since the observed value of the test statistic does not fall in the rejection region ($t = -1.21 > -2.021$), $H_0$ is not rejected. There is insufficient evidence to indicate a difference between the mean heat rate of traditional augmented gas turbines and the mean heat rate of aeroderivative augmented gas turbines at $\alpha = .05$.

b. Let $\mu_3$ = the mean heat rate of advanced augmented gas turbines and $\mu_2$ = the mean heat rate of aeroderivative augmented gas turbines. Some preliminary calculations are:

$$s_p^2 = \frac{(n_3-1)s_3^2 + (n_2-1)s_2^2}{n_3+n_2-2} = \frac{(21-1)639^2 + (7-1)2{,}652^2}{21+7-2} = 1{,}937{,}117.077$$

To determine if there is a difference between the mean heat rate of advanced augmented gas turbines and the mean heat rate of aeroderivative augmented gas turbines, we test:
$H_0: \mu_3 - \mu_2 = 0$
$H_a: \mu_3 - \mu_2 \neq 0$

The test statistic is

$$t = \frac{(\bar{x}_3-\bar{x}_2) - D_0}{\sqrt{s_p^2\left(\frac{1}{n_3}+\frac{1}{n_2}\right)}} = \frac{(9{,}764 - 12{,}312) - 0}{\sqrt{1{,}937{,}117.077\left(\frac{1}{21}+\frac{1}{7}\right)}} = \frac{-2{,}548}{607.4329} = -4.19$$

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t-distribution with $df = n_3 + n_2 - 2 = 21 + 7 - 2 = 26$. From Table III, Appendix D, $t_{.025} = 2.056$. The rejection region is $t < -2.056$ or $t > 2.056$.
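Both pooled t statistics in this exercise can be reproduced with a short Python sketch. The function name is hypothetical. The standard deviations 1,279, 639 and 2,652 are taken from the $s_p^2$ calculations above. The two-tailed critical values 2.021 and 2.056 are the Table III values quoted in the text, so the sketch assumes equal population variances just as the solution does.

```python
import math


def pooled_t_statistic(xbar_a, xbar_b, s_a, s_b, n_a, n_b, d0=0.0):
    """Pooled-variance t statistic for H0: mu_a - mu_b = d0 (equal variances assumed)."""
    sp2 = ((n_a - 1) * s_a**2 + (n_b - 1) * s_b**2) / (n_a + n_b - 2)
    se = math.sqrt(sp2 * (1 / n_a + 1 / n_b))
    return ((xbar_a - xbar_b) - d0) / se


# Part a: traditional (n = 39) vs. aeroderivative (n = 7); t_.025 with df = 44 is 2.021
t_a = pooled_t_statistic(11544, 12312, 1279, 2652, 39, 7)
# Part b: advanced (n = 21) vs. aeroderivative (n = 7); t_.025 with df = 26 is 2.056
t_b = pooled_t_statistic(9764, 12312, 639, 2652, 21, 7)

for part, t, t_crit in (("a", t_a, 2.021), ("b", t_b, 2.056)):
    # Two-tailed test: reject H0 when |t| exceeds the critical value
    decision = "reject H0" if abs(t) > t_crit else "do not reject H0"
    print(f"part {part}: t = {t:.2f} -> {decision}")
```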
Since the observed value of the test statistic falls in the rejection region ($t = -4.19 < -2.056$), $H_0$ is rejected. There is sufficient evidence to indicate a difference between the mean heat rate of advanced augmented gas turbines and the mean heat rate of aeroderivative augmented gas turbines at $\alpha = .05$.

8.25 a. We cannot provide a measure of reliability because we have no measure of the variability or variance of the data.

b. We would need the variances of the two samples.

c. Let $\mu_1$ = mean age for self-employed immigrants and $\mu_2$ = mean age for wage-earning immigrants. To determine if the mean age for self-employed immigrants is less than the mean age for wage-earning immigrants, we test:
$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 < 0$

The rejection region requires $\alpha = .01$ in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.01} = 2.33$. The rejection region is $z < -2.33$.

d. Taking $\sigma_1 = \sigma_2 = \sigma$, we use the following to solve for $\sigma$:

$$z = \frac{(\bar{x}_1-\bar{x}_2) - (\mu_1-\mu_2)}{\sqrt{\frac{\sigma^2}{n_1}+\frac{\sigma^2}{n_2}}} \Rightarrow \frac{(44.88 - 46.79) - 0}{\sqrt{\frac{\sigma^2}{870}+\frac{\sigma^2}{84{,}875}}} = -2.33 \Rightarrow \sigma = \frac{-1.91}{-2.33\sqrt{\frac{1}{870}+\frac{1}{84{,}875}}} = 24.056$$

e. The true value of $\sigma$ is likely to be smaller than 24.056, because a standard deviation that large would be too large for the ages of people.

8.26 a. $H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 < 0$

The rejection region requires $\alpha = .10$ in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.10} = 1.28$. The rejection region is $z < -1.28$.

b. $H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 < 0$

The test statistic is

$$z = \frac{\bar{d} - 0}{s_d/\sqrt{n_d}} = \frac{-3.5 - 0}{\sqrt{21}/\sqrt{38}} = -4.71$$

The rejection region is $z < -1.28$ (refer to part a). Since the observed value of the test statistic falls in the rejection region ($z = -4.71 < -1.28$), $H_0$ is rejected. There is sufficient evidence to indicate $\mu_1 - \mu_2 < 0$ at $\alpha = .10$.

c. Since the number of pairs is greater than 30, we do not need to assume that the population of differences is normal. The sampling distribution of $\bar{d}$ is approximately normal by the Central Limit Theorem. We must assume that the differences are randomly selected.

d.
For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D, $z_{.05} = 1.645$. The 90% confidence interval is:

$$\bar{d} \pm z_{.05}\frac{s_d}{\sqrt{n_d}} \Rightarrow -3.5 \pm 1.645\frac{\sqrt{21}}{\sqrt{38}} \Rightarrow -3.5 \pm 1.223 \Rightarrow (-4.723,\ -2.277)$$

e. The confidence interval provides more information since it gives an interval of possible values for the difference between the population means.

8.27 a. The rejection region requires $\alpha = .05$ in the upper tail of the t-distribution with $df = n_d - 1 = 12 - 1 = 11$. From Table III, Appendix D, $t_{.05} = 1.796$. The rejection region is $t > 1.796$.

b. From Table III, with $df = n_d - 1 = 24 - 1 = 23$, $t_{.10} = 1.319$. The rejection region is $t > 1.319$.

c. From Table III, with $df = n_d - 1 = 4 - 1 = 3$, $t_{.025} = 3.182$. The rejection region is $t > 3.182$.

d. Using MINITAB, with $df = n_d - 1 = 80 - 1 = 79$, $t_{.01} = 2.374$. The rejection region is $t > 2.374$.

8.28 a. The differences are:

Pair        1  2  3  4  5  6
Difference  3  2  2  4  0  1

$$\bar{d} = \frac{\sum d_i}{n_d} = \frac{12}{6} = 2 \qquad s_d^2 = \frac{\sum d_i^2 - \frac{(\sum d_i)^2}{n_d}}{n_d - 1} = \frac{34 - \frac{(12)^2}{6}}{6-1} = 2$$

b. $\mu_d = \mu_1 - \mu_2$

c. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table III, Appendix D, with $df = n_d - 1 = 6 - 1 = 5$, $t_{.025} = 2.571$. The confidence interval is:

$$\bar{d} \pm t_{\alpha/2}\frac{s_d}{\sqrt{n_d}} \Rightarrow 2 \pm 2.571\frac{\sqrt{2}}{\sqrt{6}} \Rightarrow 2 \pm 1.484 \Rightarrow (.516,\ 3.484)$$

d. $H_0: \mu_d = 0$
$H_a: \mu_d \neq 0$

The test statistic is $t = \frac{\bar{d}}{s_d/\sqrt{n_d}} = \frac{2}{\sqrt{2}/\sqrt{6}} = 3.46$

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t-distribution with $df = n_d - 1 = 6 - 1 = 5$. From Table III, Appendix D, $t_{.025} = 2.571$. The rejection region is $t < -2.571$ or $t > 2.571$.

Since the observed value of the test statistic falls in the rejection region ($t = 3.46 > 2.571$), $H_0$ is rejected. There is sufficient evidence to indicate that the mean difference is different from 0 at $\alpha = .05$.

8.29 Let $\mu_1$ = mean of population 1 and $\mu_2$ = mean of population 2.

a. $H_0: \mu_d = 0$
$H_a: \mu_d < 0$, where $\mu_d = \mu_1 - \mu_2$

b. Some preliminary calculations are:

Pair          1   2   3   4   5   6   7   8   9  10
Population 1 19  25  31  52  49  34  59  47  17  51
Population 2 24  27  36  53  55  34  66  51  20  55
Difference   -5  -2  -5  -1  -6   0  -7  -4  -3  -4

$$\bar{d} = \frac{\sum d_i}{n_d} = \frac{-37}{10} = -3.7 \qquad s_d^2 = \frac{\sum d_i^2 - \frac{(\sum d_i)^2}{n_d}}{n_d - 1} = \frac{181 - \frac{(-37)^2}{10}}{10-1} = 4.9$$

The test statistic is $t = \frac{\bar{d}}{s_d/\sqrt{n_d}} = \frac{-3.7}{\sqrt{4.9}/\sqrt{10}} = -5.29$

The rejection region requires $\alpha = .10$ in the lower tail of the t-distribution with $df = n_d - 1 = 10 - 1 = 9$. From Table III, Appendix D, $t_{.10} = 1.383$. The rejection region is $t < -1.383$.

Since the observed value of the test statistic falls in the rejection region ($t = -5.29 < -1.383$), $H_0$ is rejected. There is sufficient evidence to indicate the mean of population 1 is less than the mean of population 2 at $\alpha = .10$.

c. For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table III, Appendix D, with $df = n_d - 1 = 10 - 1 = 9$, $t_{.05} = 1.833$. The 90% confidence interval is:

$$\bar{d} \pm t_{\alpha/2}\frac{s_d}{\sqrt{n_d}} \Rightarrow -3.7 \pm 1.833\frac{\sqrt{4.9}}{\sqrt{10}} \Rightarrow -3.7 \pm 1.28 \Rightarrow (-4.98,\ -2.42)$$

We are 90% confident that the difference in the two population means is between $-4.98$ and $-2.42$.

d. We must assume that the population of differences is normal and that the sample of differences is randomly selected.

8.30 a. Some preliminary calculations:

$$\bar{d} = \frac{\sum d_i}{n_d} = \frac{468}{40} = 11.7 \qquad s_d^2 = \frac{\sum d_i^2 - \frac{(\sum d_i)^2}{n_d}}{n_d - 1} = \frac{6{,}880 - \frac{468^2}{40}}{40-1} = 36.0103$$

To determine if $\mu_1 - \mu_2 = \mu_d$ is different from 10, we test:
$H_0: \mu_d = 10$
$H_a: \mu_d \neq 10$

The test statistic is $z = \frac{\bar{d} - D_0}{s_d/\sqrt{n_d}} = \frac{11.7 - 10}{\sqrt{36.0103}/\sqrt{40}} = 1.79$

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.025} = 1.96$. The rejection region is $z < -1.96$ or $z > 1.96$.

Since the observed value of the test statistic does not fall in the rejection region ($z = 1.79 < 1.96$), $H_0$ is not rejected. There is insufficient evidence to indicate $\mu_1 - \mu_2 = \mu_d$ is different from 10 at $\alpha = .05$.

b. The p-value is $p = P(z \geq 1.79) + P(z \leq -1.79) = (.5 - .4633) + (.5 - .4633) = .0367 + .0367 = .0734$. The probability of observing our test statistic or anything more unusual if $H_0$ is true is .0734. Since this p-value is not small, there is no evidence to indicate $\mu_1 - \mu_2 = \mu_d$ is different from 10 at $\alpha = .05$.

c. No, we do not need to assume that the population of differences is normally distributed. Because our sample size is 40, the Central Limit Theorem applies.

8.31 a. Let $\mu_1$ = mean starting BMI and $\mu_2$ = mean ending BMI. To determine if the mean BMI at the end of the camp is less than the mean BMI at the start of camp, we test:
$H_0: \mu_d = 0$
$H_a: \mu_d > 0$, where $\mu_d = \mu_1 - \mu_2$

b. The data should be analyzed with a paired-difference t-test. Each camper had his or her BMI measured at the start of the camp and at the end. Therefore, these two sets of BMIs are not independent.

c. The test statistic is

$$z = \frac{(\bar{x}_1-\bar{x}_2) - (\mu_1-\mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}} \approx \frac{(34.9 - 31.6) - 0}{\sqrt{\frac{6.9^2}{76}+\frac{6.2^2}{76}}} = 3.10$$

d. The test statistic is $z = \frac{\bar{d}}{\sigma_d/\sqrt{n_d}} \approx \frac{3.3}{1.5/\sqrt{76}} = 19.18$.

e. The test statistic using the paired-difference formula is much larger than the test statistic using the independent samples formula. The test statistic for the paired difference provides more evidence to support the alternative hypothesis.

f. Since the p-value is less than $\alpha$ ($p < .0001 < .01$), $H_0$ is rejected. There is sufficient evidence to indicate the mean BMI at the end of camp is less than the mean BMI at the start of camp.

g. No, the differences in the BMI values do not have to be normally distributed. The sample size is $n = 76$. Thus, the Central Limit Theorem applies and says that the sampling distribution of $\bar{d}$ will be approximately normal.

h. For confidence coefficient .99, $\alpha = .01$ and $\alpha/2 = .01/2 = .005$. From Table II, Appendix D, $z_{.005} = 2.58$. The 99% confidence interval is:

$$\bar{d} \pm z_{\alpha/2}\frac{\sigma_d}{\sqrt{n_d}} \Rightarrow 3.3 \pm 2.58\frac{1.5}{\sqrt{76}} \Rightarrow 3.3 \pm .444 \Rightarrow (2.856,\ 3.744)$$

We are 99% confident that the true difference in the mean BMI scores between the start of camp and the end of camp is between 2.856 and 3.744.

8.32 a.
8.32 a. The data should be analyzed using a paired-difference test because that is how the data were collected. Evaluation scores were collected twice from each agency, once in Year 1 and once in Year 2. Since the two sets of data are not independent, they cannot be analyzed using independent samples.

b. The differences between the Year 2 score and the Year 1 score for each agency are:

Agency             Score Yr 1   Score Yr 2   Difference (Yr 2 - Yr 1)
GSA                    34           40              6
Agriculture            33           35              2
Social Security        33           33              0
USAID                  32           42             10
Defense                17           32             15

c. The mean and standard deviation of the differences are:
$\bar{d} = \frac{\sum d_i}{n_d} = \frac{33}{5} = 6.6$, $\quad s_d^2 = \frac{\sum d_i^2 - \frac{(\sum d_i)^2}{n_d}}{n_d - 1} = \frac{365 - \frac{33^2}{5}}{5 - 1} = \frac{147.2}{4} = 36.8$, $\quad s_d = \sqrt{36.8} = 6.066$

d. The test statistic is $t = \frac{\bar{d} - D_0}{s_d/\sqrt{n_d}} = \frac{6.6 - 0}{6.066/\sqrt{5}} = 2.43$.

e. The rejection region requires $\alpha = .10$ in the upper tail of the t-distribution with $df = n_d - 1 = 5 - 1 = 4$. From Table III, Appendix D, $t_{.10} = 1.533$. The rejection region is $t > 1.533$.

f. Since the observed value of the test statistic falls in the rejection region ($t = 2.43 > 1.533$), $H_0$ is rejected. There is sufficient evidence to indicate that the true mean evaluation score of government agencies in Year 2 exceeds the true mean evaluation score in Year 1 at $\alpha = .10$.

8.33 a. Since the data were collected as "twin holes," they need to be analyzed as paired differences.

b. Each difference is the 1st-hole measurement minus the 2nd-hole measurement at the same location.

Location:     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
1st hole:   5.5  11.0   5.9   8.2  10.0   7.9  10.1   7.4   7.0   9.2   8.3   8.6  10.5   5.5  10.0
2nd hole:   5.7  11.2   6.0   5.6   9.3   7.0   8.4   9.0   6.0   8.1  10.0   8.1  10.4   7.0  11.2
Difference: -0.2  -0.2  -0.1   2.6   0.7   0.9   1.7  -1.6   1.0   1.1  -1.7   0.5   0.1  -1.5  -1.2

c. $\bar{d} = \frac{\sum d_i}{n_d} = \frac{2.1}{15} = .14$, $\quad s_d^2 = \frac{\sum d_i^2 - \frac{(\sum d_i)^2}{n_d}}{n_d - 1} = \frac{22.65 - \frac{2.1^2}{15}}{15 - 1} = 1.597$, $\quad s_d = \sqrt{1.597} = 1.2637$

d. For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$.
From Table III, Appendix D, with $df = n_d - 1 = 15 - 1 = 14$, $t_{.05} = 1.761$. The 90% confidence interval is:
$\bar{d} \pm t_{\alpha/2}\frac{s_d}{\sqrt{n_d}} \Rightarrow .14 \pm 1.761\frac{1.2637}{\sqrt{15}} \Rightarrow .14 \pm .575 \Rightarrow (-.435, .715)$

We are 90% confident that the true difference in the mean THM measurements between the 1st and 2nd hole is between -.435 and .715.

e. Yes, the geologists can conclude that there is no evidence of a difference in the true mean THM measurements between the original holes and their twin holes, because 0 falls in the interval at $\alpha = .10$.

8.34 a. Let $\mu_d = \mu_F - \mu_C$. To determine if the mean score for the fictitious brand is greater than the mean score for the commercially available brand, we test:
$H_0: \mu_d = 0$
$H_a: \mu_d > 0$

b. The data should be analyzed as paired differences. Each child rated both brands, so the samples are not independent.

c. The rejection region requires $\alpha = .05$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z > 1.645$.

Since the test statistic falls in the rejection region ($z = 5.71 > 1.645$), $H_0$ is rejected. There is sufficient evidence to indicate the mean score for the fictitious brand is greater than the mean score for the commercially available brand at $\alpha = .05$.

d. The p-value is $p = P(z \ge 5.71) \approx .5 - .5 = 0$.

e. Yes. Since the p-value is less than $\alpha = .01$, the conclusion would still be to reject $H_0$.

8.35 a. The data should be analyzed using a paired-difference experiment because that is how the data were collected. Response rates were observed twice from each survey, once with the "not selling" introduction method and once with the standard introduction method. Since the two sets of data are not independent, they cannot be analyzed using independent samples.

b. Some preliminary calculations are:
$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(29 - 1)(.12)^2 + (29 - 1)(.11)^2}{29 + 29 - 2} = .01325$
Let $\mu_1$ = mean response rate for those using the "not selling" introduction and $\mu_2$ = mean response rate for those using the standard introduction. We use the independent-samples t-test to determine if the mean response rate for "not selling" is higher than that for the standard introduction, and we test:
$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 > 0$

The test statistic is $t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(.262 - .246) - 0}{\sqrt{.01325\left(\frac{1}{29} + \frac{1}{29}\right)}} = .53$.

The rejection region requires $\alpha = .05$ in the upper tail of the t-distribution with $df = n_1 + n_2 - 2 = 29 + 29 - 2 = 56$. From Table III, Appendix D, $t_{.05} = 1.671$. The rejection region is $t > 1.671$.

Since the observed value of the test statistic does not fall in the rejection region ($t = .53 < 1.671$), $H_0$ is not rejected. There is insufficient evidence to indicate the mean response rate for "not selling" is higher than that for the standard introduction at $\alpha = .05$.

c. Since the p-value is less than $\alpha$ ($p = .001 < .05$), $H_0$ is rejected. There is sufficient evidence to indicate the mean response rate for "not selling" is higher than that for the standard introduction at $\alpha = .05$.

d. The two inferences in parts b and c have different results because the independent-samples t-test is not appropriate for this study; the paired-difference design is better. Response rates vary greatly from survey to survey. The paired-difference design eliminates these survey-to-survey differences.

8.36 a. Let $\mu_1$ = mean driver chest injury rating and $\mu_2$ = mean passenger chest injury rating. Because the data are paired, we are interested in $\mu_d = \mu_1 - \mu_2$, the difference in mean chest injury ratings between drivers and passengers.

b. The data were collected as matched pairs and thus must be analyzed as matched pairs. Two ratings are obtained for each car: the driver's chest injury rating and the passenger's chest injury rating.
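The pooled-variance statistic in 8.35(b) can be reproduced from the reported means and standard deviations with a small helper. This is a sketch. The name `pooled_t` is hypothetical, and the critical value 1.671 is the Table III value quoted in the solution, typed in rather than computed. Part c cannot be reproduced here, because the paired differences themselves are not listed in this section.

```python
import math

def pooled_t(xbar1, s1, n1, xbar2, s2, n2, d0=0.0):
    """Pooled-variance two-sample t statistic (assumes equal population variances)."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    t = ((xbar1 - xbar2) - d0) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return sp2, t, n1 + n2 - 2

# Exercise 8.35(b): "not selling" introduction vs. standard introduction.
sp2, t_stat, df = pooled_t(.262, .12, 29, .246, .11, 29)
t_05 = 1.671                      # Table III value quoted in the solution
print(f"s_p^2 = {sp2:.5f}, t = {t_stat:.2f}, df = {df}, reject H0: {t_stat > t_05}")
```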
c. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: DrivChst, PassChst, diff
Variable   N     Mean   Median   StDev   Minimum   Maximum       Q1       Q3
DrivChst  98   49.663   50.000   6.670    34.000    68.000   45.000   54.000
PassChst  98   50.224   50.500   7.107    35.000    69.000   45.000   55.000
diff      98   -0.561    0.000   5.517   -15.000    13.000   -4.000    3.000

For confidence coefficient .99, $\alpha = .01$ and $\alpha/2 = .01/2 = .005$. From Table II, Appendix D, $z_{.005} = 2.58$. The 99% confidence interval is:
$\bar{d} \pm z_{.005}\frac{s_d}{\sqrt{n_d}} \Rightarrow -0.561 \pm 2.58\frac{5.517}{\sqrt{98}} \Rightarrow -0.561 \pm 1.438 \Rightarrow (-1.999, 0.877)$

d. We are 99% confident that the difference between the mean chest injury ratings of drivers and front-seat passengers is between -1.999 and 0.877. Since 0 is in the confidence interval, there is no evidence that the true mean driver chest injury rating exceeds the true mean passenger chest injury rating.

e. Since the sample size is large, the sampling distribution of $\bar{d}$ is approximately normal by the Central Limit Theorem. We must assume that the differences are randomly selected.

8.37 Some preliminary calculations are:

Operator:                     1   2   3   4   5   6   7   8   9  10
Difference (Before - After):  5   3   9   7   2   2   1  11   0   5

$\bar{d} = \frac{\sum d_i}{n_d} = \frac{39}{10} = 3.9$, $\quad s_d^2 = \frac{\sum d_i^2 - \frac{(\sum d_i)^2}{n_d}}{n_d - 1} = \frac{319 - \frac{39^2}{10}}{10 - 1} = 18.5444$, $\quad s_d = \sqrt{18.5444} = 4.3063$

a. To determine if the new napping policy reduced the mean number of customer complaints, we test:
$H_0: \mu_d = 0$
$H_a: \mu_d > 0$

The test statistic is $t = \frac{\bar{d} - 0}{s_d/\sqrt{n_d}} = \frac{3.9 - 0}{4.3063/\sqrt{10}} = 2.864$.

The rejection region requires $\alpha = .05$ in the upper tail of the t-distribution with $df = n_d - 1 = 10 - 1 = 9$. From Table III, Appendix D, $t_{.05} = 1.833$. The rejection region is $t > 1.833$.

Since the observed value of the test statistic falls in the rejection region ($t = 2.864 > 1.833$), $H_0$ is rejected. There is sufficient evidence to indicate the new napping policy reduced the mean number of customer complaints at $\alpha = .05$.

b. In order for the above test to be valid, we must assume that:
1. The population of differences is normal.
2. The differences are randomly selected.

c. Several uncontrolled variables could lead to an invalid conclusion. Examples include the time of day the agents worked, the day of the week they worked, and how much sleep they got before working.

8.38 Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Initial, Final, Diff
Variable  N     Mean   StDev  Minimum      Q1  Median      Q3  Maximum
Initial   3    5.640   1.075    4.560   4.560   5.650   6.710    6.710
Final     3    5.453   1.125    4.270   4.270   5.580   6.510    6.510
Diff      3   0.1867  0.1106   0.0700  0.0700  0.2000  0.2900   0.2900

Let $\mu_1$ = mean initial pH level, $\mu_2$ = mean final pH level, and $\mu_d = \mu_1 - \mu_2$ = difference in mean pH levels between the initial and final time periods. To determine if the mean pH level after 30 days differs from the initial pH level, we test:
$H_0: \mu_d = 0$
$H_a: \mu_d \ne 0$

The test statistic is $t = \frac{\bar{d} - D_0}{s_d/\sqrt{n_d}} = \frac{.1867 - 0}{.1106/\sqrt{3}} = 2.924$.

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t-distribution with $df = n_d - 1 = 3 - 1 = 2$. From Table III, Appendix D, $t_{.025} = 4.303$. The rejection region is $t < -4.303$ or $t > 4.303$.

Since the observed value of the test statistic does not fall in the rejection region ($t = 2.924 < 4.303$), $H_0$ is not rejected. There is insufficient evidence to indicate the mean pH level after 30 days differs from the initial pH level at $\alpha = .05$.

8.39 Some preliminary calculations are:

Circuit:              1    2    3    4    5    6    7    8    9   10   11
Standard method:    .80  .80  .83  .53  .50  .96  .99  .98  .81  .95  .99
Huffman-coding:     .78  .80  .86  .53  .51  .68  .82  .72  .45  .79  .77
Difference:         .02  .00 -.03  .00 -.01  .28  .17  .26  .36  .16  .22

$\bar{d} = \frac{\sum d_i}{n_d} = \frac{1.43}{11} = 0.13$, $\quad s_d^2 = \frac{\sum d_i^2 - \frac{(\sum d_i)^2}{n_d}}{n_d - 1} = \frac{0.3799 - \frac{1.43^2}{11}}{11 - 1} = 0.0194$, $\quad s_d = \sqrt{0.0194} = 0.1393$

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table III, Appendix D, with $df = n_d - 1 = 11 - 1 = 10$, $t_{.025} = 2.228$.
The 95% confidence interval is:
$\bar{d} \pm t_{.025}\frac{s_d}{\sqrt{n_d}} \Rightarrow .13 \pm 2.228\frac{.1393}{\sqrt{11}} \Rightarrow .13 \pm .094 \Rightarrow (0.036, 0.224)$

We are 95% confident that the true difference in mean compression ratio between the standard method and the Huffman-based coding method is between 0.036 and 0.224. Since 0 is not contained in the interval, we can conclude there is a difference in mean compression ratios between the two methods. Since both endpoints of the interval are positive, we can conclude that the mean compression ratio for the Huffman-based method is smaller than that for the standard method.

8.40 Let $\mu_1$ = mean number of crashes caused by red-light running per intersection before the camera is installed and $\mu_2$ = mean number of crashes caused by red-light running per intersection after the camera is installed. The data are collected as paired data, so we will analyze the data using a paired t-test with $\mu_d = \mu_1 - \mu_2$. Using MINITAB to compute the differences (Di) and the summary statistics, the results are:

Descriptive Statistics: Before, After, Di
Variable   N   Mean  StDev  Minimum     Q1  Median     Q3  Maximum
Before    13  2.513  1.976    0.270  0.805   2.400  3.405    7.350
After     13  1.506  1.448    0.000  0.260   1.360  2.380    4.920
Di        13  1.007  1.209   -0.850  0.265   0.560  2.335    2.780

To determine if the mean number of crashes caused by red-light running has been reduced since the installation of red-light cameras, we test:
$H_0: \mu_d = 0$
$H_a: \mu_d > 0$

The test statistic is $t = \frac{\bar{d} - D_0}{s_d/\sqrt{n_d}} = \frac{1.007 - 0}{1.209/\sqrt{13}} = 3.003$.

The rejection region requires $\alpha = .05$ in the upper tail of the t-distribution with $df = n_d - 1 = 13 - 1 = 12$. From Table III, Appendix D, $t_{.05} = 1.782$. The rejection region is $t > 1.782$.

Since the observed value of the test statistic falls in the rejection region ($t = 3.003 > 1.782$), $H_0$ is rejected. There is sufficient evidence to indicate that the photo-red enforcement program is effective in reducing red-light-running crash incidents at intersections at $\alpha = .05$.
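A paired analysis can start from either the raw pairs (as in 8.39) or a MINITAB-style summary (as in 8.40). The short Python sketch below handles both cases. The helper name `t_from_summary` is hypothetical. The critical values 2.228 and 1.782 are the Table III values from the solutions, typed in because the standard library has no t quantile function.

```python
import math
import statistics

def t_from_summary(d_bar, s_d, n, d0=0.0):
    """Paired-difference t statistic from the mean and SD of the differences."""
    return (d_bar - d0) / (s_d / math.sqrt(n))

# Exercise 8.39: compression ratios for 11 circuits.
standard = [.80, .80, .83, .53, .50, .96, .99, .98, .81, .95, .99]
huffman = [.78, .80, .86, .53, .51, .68, .82, .72, .45, .79, .77]
diffs = [s - h for s, h in zip(standard, huffman)]
d_bar, s_d = statistics.mean(diffs), statistics.stdev(diffs)
half = 2.228 * s_d / math.sqrt(len(diffs))   # t_.025 with df = 10 (Table III)
print(f"8.39: 95% CI = ({d_bar - half:.3f}, {d_bar + half:.3f})")

# Exercise 8.40: MINITAB summary for the crash differences (Di).
t_840 = t_from_summary(1.007, 1.209, 13)
print(f"8.40: t = {t_840:.3f}, reject H0 at alpha = .05: {t_840 > 1.782}")
```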
8.41 Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Male, Female, Diff
Variable   N   Mean  Median  StDev  Minimum  Maximum      Q1     Q3
Male      19  5.895   6.000  2.378    3.000   12.000   4.000  8.000
Female    19  5.526   5.000  2.458    3.000   12.000   4.000  7.000
Diff      19  0.368   1.000  3.515   -5.000    7.000  -3.000  3.000

Let $\mu_1$ = mean number of swims by male rat pups and $\mu_2$ = mean number of swims by female rat pups. Then $\mu_d = \mu_1 - \mu_2$. To determine if there is a difference in the mean number of swims required by male and female rat pups, we test:
$H_0: \mu_d = 0$
$H_a: \mu_d \ne 0$

The test statistic is $t = \frac{\bar{d} - D_0}{s_d/\sqrt{n_d}} = \frac{.368 - 0}{3.515/\sqrt{19}} = 0.46$.

The rejection region requires $\alpha/2 = .10/2 = .05$ in each tail of the t-distribution with $df = n_d - 1 = 19 - 1 = 18$. From Table III, Appendix D, $t_{.05} = 1.734$. The rejection region is $t < -1.734$ or $t > 1.734$.

Since the observed value of the test statistic does not fall in the rejection region ($t = .46 < 1.734$), $H_0$ is not rejected. There is insufficient evidence to indicate that there is a difference in the mean number of swims required by male and female rat pups at $\alpha = .10$. (Using MINITAB, the p-value is .653.)

Since the sample size is not large, we must assume that the population of differences is normally distributed and that the sample of differences is random. There is no indication that the sample differences are not from a random sample. However, because the number of swims is discrete, the differences are probably not normal.

8.42 Using MINITAB, the descriptive statistics are:

Descriptive Statistics: HMETER, HSTATIC, Diff
Variable   N       Mean     StDev    Minimum         Q1     Median        Q3   Maximum
HMETER    40     1.0405    0.0403     0.9936     1.0047     1.0232    1.0883    1.1026
HSTATIC   40     1.0410    0.0410     0.9930     1.0043     1.0237    1.0908    1.1052
Diff      40  -0.000523  0.001291  -0.004480  -0.001078  -0.000165  0.000317  0.001580

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table III, Appendix D, with $df = n_d - 1 = 40 - 1 = 39$, $t_{.025} = 2.021$.
The 95% confidence interval is:
$\bar{d} \pm t_{.025}\frac{s_d}{\sqrt{n_d}} \Rightarrow -0.000523 \pm 2.021\frac{0.001291}{\sqrt{40}} \Rightarrow -0.000523 \pm 0.000413 \Rightarrow (-0.000936, -0.000110)$

We are 95% confident that the true difference in mean density measurements between the two methods is between -0.000936 and -0.000110. Both endpoints of this interval are smaller in absolute value than the desired maximum difference of .002, so the winery should choose the alternative method of measuring wine density.

8.43 a. From the exercise, we know that $x_1$ and $x_2$ are binomial random variables with the number of trials equal to $n_1$ and $n_2$. From Chapter 7, we know that for large $n$, the distribution of $\hat{p}_1 = x_1/n_1$ is approximately normal. Since $x_1$ is simply $\hat{p}_1$ multiplied by a constant, $x_1$ will also have an approximately normal distribution. Similarly, the distribution of $\hat{p}_2 = x_2/n_2$ is approximately normal, and thus the distribution of $x_2$ is approximately normal.

b. The Central Limit Theorem is necessary to find the sampling distributions of $\hat{p}_1$ and $\hat{p}_2$ when $n_1$ and $n_2$ are large. Once we have established that both $\hat{p}_1$ and $\hat{p}_2$ have normal distributions, the distribution of their difference will also be normal.

8.44 a. The rejection region requires $\alpha = .01$ in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.01} = 2.33$. The rejection region is $z < -2.33$.

b. The rejection region requires $\alpha = .025$ in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.025} = 1.96$. The rejection region is $z < -1.96$.

c. The rejection region requires $\alpha = .05$ in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z < -1.645$.

d. The rejection region requires $\alpha = .10$ in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.10} = 1.28$. The rejection region is $z < -1.28$.

8.45 From Section 6.4, it was given that the distribution of $\hat{p}$ is approximately normal if $n\hat{p} \ge 15$ and $n\hat{q} \ge 15$.
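Python's standard library can check both the lower-tail critical values in 8.44 and the large-sample rule stated for 8.45. Table II rounds the critical values to two decimals (for example, 2.33 rather than 2.326). The helper `normal_ok` is a hypothetical name that applies the threshold of 15 from Section 6.4 to both samples.

```python
from statistics import NormalDist

# Exercise 8.44: lower-tail critical values -z_alpha (Table II rounds these).
for alpha in (.01, .025, .05, .10):
    print(f"alpha = {alpha}: reject H0 if z < {NormalDist().inv_cdf(alpha):.3f}")

def normal_ok(n1, p1, n2, p2, threshold=15):
    """True when n*p-hat and n*q-hat are both >= threshold in both samples."""
    counts = [n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)]
    return all(c >= threshold for c in counts)

# Exercise 8.45, parts a-e: (n1, p1-hat, n2, p2-hat).
cases = {
    "a": (12, .42, 14, .57),
    "b": (12, .92, 14, .86),
    "c": (30, .70, 30, .73),
    "d": (100, .93, 250, .97),
    "e": (125, .08, 200, .12),
}
for part, args in cases.items():
    print(part, normal_ok(*args))
```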
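a. $n_1\hat{p}_1 = 12(.42) = 5.04 < 15$ and $n_1\hat{q}_1 = 12(.58) = 6.96 < 15$
$n_2\hat{p}_2 = 14(.57) = 7.98 < 15$ and $n_2\hat{q}_2 = 14(.43) = 6.02 < 15$
Thus, the sample sizes are not large enough to conclude the sampling distribution of $(\hat{p}_1 - \hat{p}_2)$ is approximately normal.

b. $n_1\hat{p}_1 = 12(.92) = 11.04 < 15$ and $n_1\hat{q}_1 = 12(.08) = 0.96 < 15$
$n_2\hat{p}_2 = 14(.86) = 12.04 < 15$ and $n_2\hat{q}_2 = 14(.14) = 1.96 < 15$
Thus, the sample sizes are not large enough to conclude the sampling distribution of $(\hat{p}_1 - \hat{p}_2)$ is approximately normal.

c. $n_1\hat{p}_1 = 30(.70) = 21 \ge 15$ and $n_1\hat{q}_1 = 30(.30) = 9 < 15$
$n_2\hat{p}_2 = 30(.73) = 21.9 \ge 15$ and $n_2\hat{q}_2 = 30(.27) = 8.1 < 15$
Thus, the sample sizes are not large enough to conclude the sampling distribution of $(\hat{p}_1 - \hat{p}_2)$ is approximately normal.

d. $n_1\hat{p}_1 = 100(.93) = 93 \ge 15$ and $n_1\hat{q}_1 = 100(.07) = 7 < 15$
$n_2\hat{p}_2 = 250(.97) = 242.5 \ge 15$ and $n_2\hat{q}_2 = 250(.03) = 7.5 < 15$
Thus, the sample sizes are not large enough to conclude the sampling distribution of $(\hat{p}_1 - \hat{p}_2)$ is approximately normal.

e. $n_1\hat{p}_1 = 125(.08) = 10 < 15$ and $n_1\hat{q}_1 = 125(.92) = 115 \ge 15$
$n_2\hat{p}_2 = 200(.12) = 24 \ge 15$ and $n_2\hat{q}_2 = 200(.88) = 176 \ge 15$
Thus, the sample sizes are not large enough to conclude the sampling distribution of $(\hat{p}_1 - \hat{p}_2)$ is approximately normal.

8.46 For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The 95% confidence interval for $p_1 - p_2$ is approximately:

a. $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.65 - .58) \pm 1.96\sqrt{\frac{.65(1 - .65)}{400} + \frac{.58(1 - .58)}{400}} \Rightarrow .07 \pm .067 \Rightarrow (.003, .137)$

b. $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.31 - .25) \pm 1.96\sqrt{\frac{.31(1 - .31)}{180} + \frac{.25(1 - .25)}{250}} \Rightarrow .06 \pm .086 \Rightarrow (-.026, .146)$

c. $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.46 - .61) \pm 1.96\sqrt{\frac{.46(1 - .46)}{100} + \frac{.61(1 - .61)}{120}} \Rightarrow -.15 \pm .131 \Rightarrow (-.281, -.019)$

8.47 a. $H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 > 0$

We will need to calculate the following:
$\hat{p}_1 = \frac{320}{800} = .40$, $\quad \hat{p}_2 = \frac{400}{800} = .50$, $\quad \hat{p} = \frac{320 + 400}{800 + 800} = .45$

The test statistic is $z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(.40 - .50) - 0}{\sqrt{(.45)(.55)\left(\frac{1}{800} + \frac{1}{800}\right)}} = -4.02$.

The rejection region requires $\alpha = .05$ in the upper tail of the z-distribution.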
From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z > 1.645$.

Since the observed value of the test statistic does not fall in the rejection region ($z = -4.02 < 1.645$), $H_0$ is not rejected. There is insufficient evidence to indicate that $p_1 > p_2$ at $\alpha = .05$.

b. $H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 \ne 0$

The test statistic is $z = -4.02$.

The rejection region requires $\alpha/2 = .01/2 = .005$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.005} = 2.58$. The rejection region is $z < -2.58$ or $z > 2.58$.

Since the observed value of the test statistic falls in the rejection region ($z = -4.02 < -2.58$), $H_0$ is rejected. There is sufficient evidence to indicate that the proportions are unequal at $\alpha = .01$.

c. $H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 < 0$

The test statistic, as above, is $z = -4.02$.

The rejection region requires $\alpha = .01$ in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.01} = 2.33$. The rejection region is $z < -2.33$.

Since the observed value of the test statistic falls in the rejection region ($z = -4.02 < -2.33$), $H_0$ is rejected. There is sufficient evidence to indicate that $p_1 < p_2$ at $\alpha = .01$.

d. For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D, $z_{.05} = 1.645$. The confidence interval is:
$(\hat{p}_1 - \hat{p}_2) \pm z_{.05}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.4 - .5) \pm 1.645\sqrt{\frac{(.4)(.6)}{800} + \frac{(.5)(.5)}{800}} \Rightarrow -.10 \pm .04 \Rightarrow (-.14, -.06)$

We are 90% confident that the difference between $p_1$ and $p_2$ is between -.14 and -.06.

8.48 $\hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{55(.7) + 65(.6)}{55 + 65} = \frac{77.5}{120} = .646$, $\quad \hat{q} = 1 - \hat{p} = 1 - .646 = .354$

$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 > 0$

The test statistic is $z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(.7 - .6) - 0}{\sqrt{.646(.354)\left(\frac{1}{55} + \frac{1}{65}\right)}} = 1.14$.

The rejection region requires $\alpha = .05$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z > 1.645$.
Since the observed value of the test statistic does not fall in the rejection region ($z = 1.14 < 1.645$), $H_0$ is not rejected. There is insufficient evidence to indicate the proportion from population 1 is greater than that for population 2 at $\alpha = .05$.

8.49 a. $\hat{p}_1 = \frac{x_1}{n_1} = \frac{29}{189} = .153$

b. $\hat{p}_2 = \frac{x_2}{n_2} = \frac{32}{149} = .215$

c. For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D, $z_{.05} = 1.645$. The 90% confidence interval is:
$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.153 - .215) \pm 1.645\sqrt{\frac{.153(.847)}{189} + \frac{.215(.785)}{149}} \Rightarrow -.062 \pm .070 \Rightarrow (-.132, .008)$

d. We are 90% confident that the difference in the proportion of bidders who fall prey to the winner's curse between super-experienced bidders and less-experienced bidders is between -.132 and .008. Since this interval contains 0, there is no evidence to indicate that there is a difference in the proportion of bidders who fall prey to the winner's curse between super-experienced bidders and less-experienced bidders.

8.50 a. The point estimate for the proportion of all Democrats who prefer steak as their favorite barbeque food is $\hat{p}_1 = \frac{x_1}{n_1} = \frac{662}{1{,}250} = .5296$.

b. The point estimate for the proportion of all Republicans who prefer steak as their favorite barbeque food is $\hat{p}_2 = \frac{x_2}{n_2} = \frac{586}{930} = .6301$.

c. The point estimate for the difference between proportions of all Democrats and all Republicans who prefer steak as their barbeque food is $\hat{p}_1 - \hat{p}_2 = .5296 - .6301 = -.1005$.

d. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The 95% confidence interval for the difference between the proportions of all Democrats and all Republicans who prefer steak as their barbeque food is:
$(\hat{p}_1 - \hat{p}_2) \pm z_{.025}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.5296 - .6301) \pm 1.96\sqrt{\frac{.5296(.4704)}{1{,}250} + \frac{.6301(.3699)}{930}} \Rightarrow -.1005 \pm .0416 \Rightarrow (-.1421, -.0589)$
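The two large-sample tools used in 8.47 through 8.50 can be written as two small helpers. One is the pooled z statistic for $H_0: p_1 - p_2 = 0$; the other is the unpooled confidence interval for $p_1 - p_2$. This is a sketch with hypothetical function names. It works from the raw counts, so its output can differ from the hand solutions in the third decimal, because those solutions round $\hat{p}_1$ and $\hat{p}_2$ first.

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 - p2 = 0."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                    # pooled estimate p-hat
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def two_prop_ci(x1, n1, x2, n2, z):
    """Large-sample CI for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

print(two_prop_z(320, 800, 400, 800))            # Exercise 8.47
print(two_prop_ci(29, 189, 32, 149, 1.645))      # Exercise 8.49(c), 90%
print(two_prop_ci(662, 1250, 586, 930, 1.96))    # Exercise 8.50(d), 95%
```

The test uses the pooled $\hat{p}$ because, under $H_0$, both samples share one proportion. The interval uses separate estimates because no common value is assumed.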
e. We are 95% confident that the difference in proportions of all Democrats and all Republicans who prefer steak as their favorite barbeque food is between -.1421 and -.0589. Since this interval does not contain 0, there is sufficient evidence to indicate a significant difference between the proportions of all Democrats and all Republicans who prefer steak as their favorite barbeque food.

f. "95% confident" means that in repeated sampling, 95% of all confidence intervals constructed in the same manner will contain the true population difference in proportions and 5% will not.

8.51 a. Let $p_1$ = proportion of producers who are willing to offer windrowing services to the biomass market in Missouri and $p_2$ = proportion of producers who are willing to offer windrowing services to the biomass market in Illinois. The parameter of interest is $p_1 - p_2$.

b. To determine if the proportion of producers who are willing to offer windrowing services differs between Missouri and Illinois, we test:
$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 \ne 0$

c. The test statistic is $z = 2.67$.

d. The rejection region requires $\alpha/2 = .01/2 = .005$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.005} = 2.58$. The rejection region is $z < -2.58$ or $z > 2.58$.

e. The p-value is $p = .008$.

f. Since the observed value of the test statistic falls in the rejection region ($|z| = 2.67 > 2.58$), $H_0$ is rejected. There is sufficient evidence to indicate that the proportion of producers who are willing to offer windrowing services differs between Missouri and Illinois at $\alpha = .01$. Since the p-value is less than $\alpha$ ($p = .008 < .01$), $H_0$ is rejected. This is the same conclusion as above.

8.52 a. Let $p_1$ = proportion of men who prefer to keep track of appointments in their head and $p_2$ = proportion of women who prefer to keep track of appointments in their head.
To determine if the proportion of men who prefer to keep track of appointments in their head is greater than that of women, we test:
$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 > 0$

b. $\hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{500(.56) + 500(.46)}{500 + 500} = .51$ and $\hat{q} = 1 - \hat{p} = 1 - .51 = .49$

The test statistic is $z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(.56 - .46) - 0}{\sqrt{.51(.49)\left(\frac{1}{500} + \frac{1}{500}\right)}} = 3.16$.

c. The rejection region requires $\alpha = .01$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.01} = 2.33$. The rejection region is $z > 2.33$.

d. The p-value is $p = P(z \ge 3.16) \approx .0008$, which is essentially 0.

e. Since the observed value of the test statistic falls in the rejection region ($z = 3.16 > 2.33$), $H_0$ is rejected. There is sufficient evidence to indicate the proportion of men who prefer to keep track of appointments in their head is greater than that of women at $\alpha = .01$.

8.53 a. The first population of interest is all hospital patients admitted in January. The second population of interest is all hospital patients admitted in May.

b. $\hat{p}_1 = \frac{x_1}{n_1} = \frac{32}{192} = .167$, $\quad \hat{p}_2 = \frac{x_2}{n_2} = \frac{34}{403} = .084$

The point estimate for the difference in malaria admission rates in January and May is $\hat{p}_1 - \hat{p}_2 = .167 - .084 = .083$.

c. For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D, $z_{.05} = 1.645$. The 90% confidence interval is:
$(\hat{p}_1 - \hat{p}_2) \pm z_{.05}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.167 - .084) \pm 1.645\sqrt{\frac{.167(.833)}{192} + \frac{.084(.916)}{403}} \Rightarrow .083 \pm .050 \Rightarrow (.033, .133)$

d. Since 0 is not contained in the confidence interval, we can conclude that a difference exists in the true malaria admission rates in January and May.

8.54 a. Let $p_1$ = proportion of customers returning the printed survey and $p_2$ = proportion of customers returning the electronic survey. Some preliminary calculations are:
$\hat{p}_1 = \frac{x_1}{n_1} = \frac{261}{631} = .414$, $\quad \hat{p}_2 = \frac{x_2}{n_2} = \frac{155}{414} = .374$

For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D, $z_{.05} = 1.645$.
The 90% confidence interval is:
$(\hat{p}_1 - \hat{p}_2) \pm z_{.05}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.414 - .374) \pm 1.645\sqrt{\frac{.414(.586)}{631} + \frac{.374(.626)}{414}} \Rightarrow .04 \pm .051 \Rightarrow (-.011, .091)$

We are 90% confident that the difference in the response rates for the two types of surveys is between -.011 and .091.

b. Since the value .05 falls in the 90% confidence interval, it is not an unusual value. Thus, there is no evidence that the difference in response rates is different from .05. The researchers would be able to make this inference.

8.55 Let $p_1$ = proportion of salmonella in the region's water and $p_2$ = proportion of salmonella in the region's wildlife. Some preliminary calculations are:
$\hat{p}_1 = \frac{x_1}{n_1} = \frac{18}{252} = .071$, $\quad \hat{p}_2 = \frac{x_2}{n_2} = \frac{20}{476} = .042$, $\quad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{18 + 20}{252 + 476} = \frac{38}{728} = .052$

To determine if the prevalence of salmonella in the region's water differs from the prevalence of salmonella in the region's wildlife, we test:
$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 \ne 0$

The test statistic is $z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{.071 - .042}{\sqrt{.052(.948)\left(\frac{1}{252} + \frac{1}{476}\right)}} = 1.68$.

The rejection region requires $\alpha/2 = .01/2 = .005$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.005} = 2.58$. The rejection region is $z < -2.58$ or $z > 2.58$.

Since the observed value of the test statistic does not fall in the rejection region ($z = 1.68 < 2.58$), $H_0$ is not rejected. There is insufficient evidence to indicate the prevalence of salmonella in the region's water differs from the prevalence of salmonella in the region's wildlife at $\alpha = .01$.

8.56 Let $p_1$ = proportion of patients in the angioplasty group who had subsequent heart attacks and $p_2$ = proportion of patients in the medication-only group who had subsequent heart attacks. Some preliminary calculations:
$\hat{p}_1 = \frac{x_1}{n_1} = \frac{211}{1{,}145} = .184$, $\quad \hat{q}_1 = 1 - \hat{p}_1 = 1 - .184 = .816$
$\hat{p}_2 = \frac{x_2}{n_2} = \frac{202}{1{,}142} = .177$, $\quad \hat{q}_2 = 1 - \hat{p}_2 = 1 - .177 = .823$

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$.
A 95% confidence interval for the difference in the rate of heart attacks for the two groups is:
$(\hat{p}_1 - \hat{p}_2) \pm z_{.025}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.184 - .177) \pm 1.96\sqrt{\frac{.184(.816)}{1{,}145} + \frac{.177(.823)}{1{,}142}} \Rightarrow .007 \pm .032 \Rightarrow (-.025, .039)$

Since this interval contains 0, there is insufficient evidence to indicate that there is a difference in the rate of heart attacks between the angioplasty group and the medication-only group at $\alpha = .05$. Yes, we agree with the study's conclusion.

8.57 Let $p_1$ = proportion of African American MBA students who begin their careers as entrepreneurs and $p_2$ = proportion of white MBA students who begin their careers as entrepreneurs. Some preliminary calculations:
$\hat{p}_1 = \frac{x_1}{n_1} = \frac{209}{1{,}304} = .1603$, $\quad \hat{q}_1 = 1 - \hat{p}_1 = 1 - .1603 = .8397$
$\hat{p}_2 = \frac{x_2}{n_2} = \frac{356}{7{,}120} = .05$, $\quad \hat{q}_2 = 1 - \hat{p}_2 = 1 - .05 = .95$
$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{209 + 356}{1{,}304 + 7{,}120} = .0671$, $\quad \hat{q} = 1 - \hat{p} = 1 - .0671 = .9329$

To determine if African American MBA students are more likely to begin their careers as entrepreneurs than white MBA students, we test:
$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 > 0$

The test statistic is $z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{.1603 - .05}{\sqrt{.0671(.9329)\left(\frac{1}{1{,}304} + \frac{1}{7{,}120}\right)}} = 14.64$.

Since no $\alpha$ was given, we will use $\alpha = .05$. The rejection region requires $\alpha = .05$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z > 1.645$.

Since the observed value of the test statistic falls in the rejection region ($z = 14.64 > 1.645$), $H_0$ is rejected. There is sufficient evidence to indicate that the proportion of African American MBA students who begin their careers as entrepreneurs is significantly greater than the proportion of white MBA students who begin their careers as entrepreneurs.

8.58 Let $p_1$ = accuracy rate for modules with correct code and $p_2$ = accuracy rate for modules with defective code. Some preliminary calculations are:
$\hat{p}_1 = \frac{x_1}{n_1} = \frac{400}{449} = .891$, $\quad \hat{p}_2 = \frac{x_2}{n_2} = \frac{20}{49} = .408$

For confidence coefficient .99, $\alpha = .01$ and $\alpha/2 = .01/2 = .005$. From Table II, Appendix D, $z_{.005} = 2.58$.
The 99% confidence interval is:
$(\hat{p}_1 - \hat{p}_2) \pm z_{.005}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.891 - .408) \pm 2.58\sqrt{\frac{.891(.109)}{449} + \frac{.408(.592)}{49}} \Rightarrow .483 \pm .185 \Rightarrow (.298, .668)$

We are 99% confident that the difference in accuracy rates between modules with correct code and modules with defective code is between .298 and .668.

8.59 a. Let $p_1$ = proportion of women who have food cravings and $p_2$ = proportion of men who have food cravings. We know that $\hat{p}_1 = .97$ and $\hat{p}_2 = .67$. For the test to be valid, we need $n_1p_1 \ge 15$ and $n_1q_1 \ge 15$. Thus, $n_1(.97) \ge 15 \Rightarrow n_1 \ge 15/.97 \approx 16$ and $n_1(.03) \ge 15 \Rightarrow n_1 \ge 15/.03 = 500$. Also, we need $n_2p_2 \ge 15$ and $n_2q_2 \ge 15$. Thus, $n_2(.67) \ge 15 \Rightarrow n_2 \ge 15/.67 \approx 23$ and $n_2(.33) \ge 15 \Rightarrow n_2 \ge 15/.33 \approx 46$. Thus, $n_1 \ge 500$ and $n_2 \ge 46$.

b. This study involved 1,000 McMaster University students. It is very dangerous to generalize the results of this study to the general adult population of North America. The sample of students used may not be representative of the population of interest.

8.60 Let $p_1$ = proportion of TV commercials in 1998 that used religious symbolism and $p_2$ = proportion of TV commercials in a recent study that used religious symbolism. Some preliminary calculations are:
$\hat{p}_1 = \frac{x_1}{n_1} = \frac{16}{797} = .020$, $\quad \hat{p}_2 = \frac{x_2}{n_2} = \frac{51}{1{,}499} = .034$, $\quad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{16 + 51}{797 + 1{,}499} = \frac{67}{2{,}296} = .029$

To determine if the percentage of TV commercials that use religious symbolism has changed since the 1998 study, we test:
$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 \ne 0$

The test statistic is $z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{.020 - .034}{\sqrt{.029(.971)\left(\frac{1}{797} + \frac{1}{1{,}499}\right)}} = -1.90$.

Since no $\alpha$ was given, we will use $\alpha = .05$. The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.025} = 1.96$. The rejection region is $z < -1.96$ or $z > 1.96$.

Since the observed value of the test statistic does not fall in the rejection region ($z = -1.90 > -1.96$), $H_0$ is not rejected.
There is insufficient evidence to indicate the percentage of TV commercials that use religious symbolism has changed since the 1998 study at $\alpha = .05$.

8.61 a. For confidence coefficient .99, $\alpha = .01$ and $\alpha/2 = .01/2 = .005$. From Table II, Appendix D, $z_{.005} = 2.58$.
$n_1 = n_2 = \frac{(z_{\alpha/2})^2(p_1q_1 + p_2q_2)}{(ME)^2} = \frac{2.58^2[.4(1 - .4) + .7(1 - .7)]}{.01^2} = \frac{2.99538}{.0001} = 29{,}953.8 \approx 29{,}954$

b. For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D, $z_{.05} = 1.645$. Since we have no prior information about the proportions, we use $p_1 = p_2 = .5$ to get a conservative estimate. For a width of .05, the margin of error is .025.
$n_1 = n_2 = \frac{(z_{\alpha/2})^2(p_1q_1 + p_2q_2)}{(ME)^2} = \frac{(1.645)^2[.5(1 - .5) + .5(1 - .5)]}{.025^2} = 2{,}164.82 \approx 2{,}165$

c. From part b, $z_{.05} = 1.645$.
$n_1 = n_2 = \frac{(z_{\alpha/2})^2(p_1q_1 + p_2q_2)}{(ME)^2} = \frac{(1.645)^2[.2(1 - .2) + .3(1 - .3)]}{.03^2} = \frac{1.00123}{.0009} = 1{,}112.48 \approx 1{,}113$

8.62 a. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$.
$n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(ME)^2} = \frac{(1.96)^2(15^2 + 17^2)}{3.2^2} = 192.83 \approx 193$

b. If the range of each population is 60, we would estimate $\sigma$ by $60/4 = 15$. For confidence coefficient .99, $\alpha = .01$ and $\alpha/2 = .01/2 = .005$. From Table II, Appendix D, $z_{.005} = 2.58$.
$n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(ME)^2} = \frac{(2.58)^2(15^2 + 15^2)}{8^2} = 46.80 \approx 47$

c. For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D, $z_{.05} = 1.645$. For a width of 1, the margin of error is .5.
$n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(ME)^2} = \frac{(1.645)^2(5.8 + 7.5)}{.5^2} = 143.96 \approx 144$

8.63 $n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(ME)^2}$

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$.
$n_1 = n_2 = \frac{1.96^2(14 + 14)}{1.8^2} = 33.2 \approx 34$

8.64 First, find the sample sizes needed for width 5, or margin of error 2.5. For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D, $z_{.05} = 1.645$.
$n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(ME)^2} = \frac{(1.645)^2(10^2 + 10^2)}{2.5^2} = 86.59 \approx 87$

Thus, the necessary sample size from each population is 87.
Therefore, sufficient funds have been allocated to meet the specifications, since $n_1 = n_2 = 100$ are large enough samples.

8.65 For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$.

$n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(ME)^2} = \frac{(1.96)^2(9^2 + 9^2)}{2^2} = 155.6 \approx 156$

8.66 For confidence coefficient .99, $\alpha = .01$ and $\alpha/2 = .01/2 = .005$. From Table II, Appendix D, $z_{.005} = 2.58$.

$n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(ME)^2} = \frac{2.58^2(1^2 + 1^2)}{.5^2} = 53.25 \approx 54$

8.67 For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D, $z_{.05} = 1.645$. If we assume that we do not know the return rates, we will use .5 for both.

$n_1 = n_2 = \frac{(z_{.05})^2(p_1q_1 + p_2q_2)}{(ME)^2} = \frac{1.645^2[.5(.5) + .5(.5)]}{.01^2} = 13{,}530.1 \approx 13{,}531$

8.68 For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D, $z_{.05} = 1.645$. Since no information is given about the values of $p_1$ and $p_2$, we will be conservative and use .5 for both. A width of .04 means the margin of error is $.04/2 = .02$.

$n_1 = n_2 = \frac{(z_{\alpha/2})^2(p_1q_1 + p_2q_2)}{(ME)^2} = \frac{1.645^2[.5(.5) + .5(.5)]}{.02^2} = 3{,}382.5 \approx 3{,}383$

8.69 For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$.

$n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(ME)^2} = \frac{1.96^2(15^2 + 15^2)}{1^2} = 1{,}728.72 \approx 1{,}729$

8.70 a. For confidence coefficient .80, $\alpha = .20$ and $\alpha/2 = .20/2 = .10$. From Table II, Appendix D, $z_{.10} = 1.28$. Since we have no prior information about the proportions, we use $p_1 = p_2 = .5$ to get a conservative estimate. For a width of .06, the margin of error is .03.

$n_1 = n_2 = \frac{(z_{\alpha/2})^2(p_1q_1 + p_2q_2)}{(ME)^2} = \frac{(1.28)^2[.5(1 - .5) + .5(1 - .5)]}{.03^2} = 910.22 \approx 911$

b. For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D, $z_{.05} = 1.645$. Using the formula for the sample size needed to estimate a single proportion,

$n = \frac{(z_{\alpha/2})^2pq}{(ME)^2} = \frac{1.645^2(.5)(1 - .5)}{.02^2} = \frac{.6765}{.0004} = 1{,}691.27 \approx 1{,}692$

No, the sample size from part a is not large enough.

8.71 a.
For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. From Exercise 8.56, $\hat{p}_1 = .184$ and $\hat{p}_2 = .177$.

$n_1 = n_2 = \frac{(z_{\alpha/2})^2(p_1q_1 + p_2q_2)}{(ME)^2} = \frac{1.96^2[.184(.816) + .177(.823)]}{.015^2} = 5{,}050.7 \approx 5{,}051$

b. The study would involve $5{,}051 \times 2 = 10{,}102$ patients. A study this large would be extremely time-consuming and expensive.

c. A difference of .015 is so small that detecting it may have little practical significance; a difference this close to 0 might not make any practical difference.

8.72 For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$.

$n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(ME)^2} = \frac{1.96^2(35^2 + 80^2)}{10^2} = 292.9 \approx 293$

8.73 a. With $\nu_1 = 9$ and $\nu_2 = 6$, $F_{.05} = 4.10$.

b. With $\nu_1 = 18$ and $\nu_2 = 14$, $F_{.01} \approx 3.57$. (Since $\nu_1 = 18$ is not given, we estimate the value between those for $\nu_1 = 15$ and $\nu_1 = 20$.)

c. With $\nu_1 = 11$ and $\nu_2 = 4$, $F_{.025} \approx 8.80$. (Since $\nu_1 = 11$ is not given, we estimate the value by averaging those given for $\nu_1 = 10$ and $\nu_1 = 12$.)

d. With $\nu_1 = 20$ and $\nu_2 = 5$, $F_{.10} = 3.21$.

8.74 a. With $\nu_1 = 2$ and $\nu_2 = 30$, $P(F \ge 5.39) = .01$ (Table VIII, Appendix D).

b. With $\nu_1 = 24$ and $\nu_2 = 10$, $P(F \ge 2.74) = .05$ (Table VI, Appendix D). Thus, $P(F < 2.74) = 1 - P(F \ge 2.74) = 1 - .05 = .95$.

c. With $\nu_1 = 7$ and $\nu_2 = 1$, $P(F \ge 236.8) = .05$ (Table VI, Appendix D). Thus, $P(F < 236.8) = 1 - P(F \ge 236.8) = 1 - .05 = .95$.

d. With $\nu_1 = 40$ and $\nu_2 = 40$, $P(F > 2.11) = .01$ (Table VIII, Appendix D).

8.75 a. Reject $H_0$ if $F > F_{.10} = 1.74$. (From Table V, Appendix D, with $\nu_1 = 30$ and $\nu_2 = 20$.)

b. Reject $H_0$ if $F > F_{.05} = 2.04$. (From Table VI, Appendix D, with $\nu_1 = 30$ and $\nu_2 = 20$.)

c. Reject $H_0$ if $F > F_{.025} = 2.35$. (From Table VII.)

d. Reject $H_0$ if $F > F_{.01} = 2.78$. (From Table VIII.)

8.76 To test $H_0: \sigma_1^2 = \sigma_2^2$ against $H_a: \sigma_1^2 \neq \sigma_2^2$, the rejection region is $F > F_{\alpha/2}$ with $\nu_1 = 10$ and $\nu_2 = 12$.

a. $\alpha = .20$ and $\alpha/2 = .20/2 = .10$: Reject $H_0$ if $F > F_{.10} = 2.19$ (Table V, Appendix D)

b. $\alpha = .10$ and $\alpha/2 = .10/2 = .05$: Reject $H_0$ if $F > F_{.05} = 2.75$ (Table VI, Appendix D)

c.
$\alpha = .05$ and $\alpha/2 = .05/2 = .025$: Reject $H_0$ if $F > F_{.025} = 3.37$ (Table VII, Appendix D)

d. $\alpha = .02$ and $\alpha/2 = .02/2 = .01$: Reject $H_0$ if $F > F_{.01} = 4.30$ (Table VIII, Appendix D)

8.77 a. The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = n_1 - 1 = 25 - 1 = 24$ and $\nu_2 = n_2 - 1 = 20 - 1 = 19$. From Table VI, Appendix D, $F_{.05} = 2.11$. The rejection region is $F > 2.11$ (if $s_1^2 > s_2^2$).

b. The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = n_2 - 1 = 15 - 1 = 14$ and $\nu_2 = n_1 - 1 = 10 - 1 = 9$. From Table VI, Appendix D, $F_{.05} = 3.01$. The rejection region is $F > 3.01$ (if $s_2^2 > s_1^2$).

c. The rejection region requires $\alpha/2 = .10/2 = .05$ in the upper tail of the F-distribution. If $s_1^2 > s_2^2$, $\nu_1 = n_1 - 1 = 21 - 1 = 20$ and $\nu_2 = n_2 - 1 = 31 - 1 = 30$. From Table VI, Appendix D, $F_{.05} = 1.93$. The rejection region is $F > 1.93$. If $s_1^2 < s_2^2$, $\nu_1 = n_2 - 1 = 30$ and $\nu_2 = n_1 - 1 = 20$. From Table VI, $F_{.05} = 2.04$. The rejection region is $F > 2.04$.

d. The rejection region requires $\alpha = .01$ in the upper tail of the F-distribution with $\nu_1 = n_2 - 1 = 41 - 1 = 40$ and $\nu_2 = n_1 - 1 = 31 - 1 = 30$. From Table VIII, Appendix D, $F_{.01} = 2.30$. The rejection region is $F > 2.30$ (if $s_2^2 > s_1^2$).

e. The rejection region requires $\alpha/2 = .05/2 = .025$ in the upper tail of the F-distribution. If $s_1^2 > s_2^2$, $\nu_1 = n_1 - 1 = 7 - 1 = 6$ and $\nu_2 = n_2 - 1 = 16 - 1 = 15$. From Table VII, Appendix D, $F_{.025} = 3.41$. The rejection region is $F > 3.41$. If $s_1^2 < s_2^2$, $\nu_1 = n_2 - 1 = 15$ and $\nu_2 = n_1 - 1 = 6$. From Table VII, Appendix D, $F_{.025} = 5.27$. The rejection region is $F > 5.27$.

8.78 a. To determine if a difference exists between the population variances, we test:

$H_0: \sigma_1^2 = \sigma_2^2 \qquad H_a: \sigma_1^2 \neq \sigma_2^2$

The test statistic is $F = \frac{s_2^2}{s_1^2} = \frac{8.75}{3.87} = 2.26$.

The rejection region requires $\alpha/2 = .10/2 = .05$ in the upper tail of the F-distribution with $\nu_1 = n_2 - 1 = 27 - 1 = 26$ and $\nu_2 = n_1 - 1 = 12 - 1 = 11$. From Table VI, Appendix D, $F_{.05} = 2.60$. The rejection region is $F > 2.60$.
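The "larger sample variance over smaller sample variance" rule used for the two-tailed tests in Exercises 8.77 and 8.78 can be sketched in Python. The helper name is hypothetical, and the critical value is not computed: it is passed in from the F tables (Tables V through VIII, Appendix D), because the numerator and denominator degrees of freedom depend on which sample variance ends up on top.

```python
def variance_ratio_test(var1, n1, var2, n2, f_crit):
    """Two-tailed F test of H0: sigma1^2 = sigma2^2 with the larger variance on top.

    f_crit is the upper-tail table value F_{alpha/2} for the degrees of
    freedom returned here; it is looked up in the tables, not computed.
    """
    if var1 >= var2:
        f, df_num, df_den = var1 / var2, n1 - 1, n2 - 1
    else:
        f, df_num, df_den = var2 / var1, n2 - 1, n1 - 1
    return f, df_num, df_den, f > f_crit

# Exercise 8.78a: s1^2 = 3.87 (n1 = 12), s2^2 = 8.75 (n2 = 27), F_.05 = 2.60
f, df_num, df_den, reject = variance_ratio_test(3.87, 12, 8.75, 27, f_crit=2.60)
print(round(f, 2), df_num, df_den, reject)  # 2.26 26 11 False
```

This ordering applies only to two-tailed tests; for a one-tailed alternative (as in Exercises 8.79 and 8.85) the numerator is fixed by $H_a$, not by which sample variance is larger.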
Since the observed value of the test statistic does not fall in the rejection region ($F = 2.26 < 2.60$), $H_0$ is not rejected. There is insufficient evidence to indicate a difference between the population variances at $\alpha = .10$.

b. The p-value is $p = 2P(F \ge 2.26)$. From Tables V and VI, Appendix D, with $\nu_1 = 26$ and $\nu_2 = 11$,

$2(.05) < 2P(F \ge 2.26) < 2(.10) \Rightarrow .10 < p < .20$

There is no evidence to reject $H_0$ for $\alpha \le .10$.

8.79 a. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Sample 1, Sample 2
Variable   N   Mean    Median   StDev   Minimum   Maximum   Q1      Q3
Sample 1   6   2.417   2.400    1.436   0.700     4.400     1.075   3.650
Sample 2   5   4.36    3.70     2.97    1.40      8.90      1.84    7.20

To determine if the variance for population 2 is greater than that for population 1, we test:

$H_0: \sigma_1^2 = \sigma_2^2 \qquad H_a: \sigma_1^2 < \sigma_2^2$

The test statistic is $F = \frac{s_2^2}{s_1^2} = \frac{2.97^2}{1.436^2} = 4.28$.

The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = n_2 - 1 = 5 - 1 = 4$ and $\nu_2 = n_1 - 1 = 6 - 1 = 5$. From Table VI, Appendix D, $F_{.05} = 5.19$. The rejection region is $F > 5.19$.

Since the observed value of the test statistic does not fall in the rejection region ($F = 4.28 < 5.19$), $H_0$ is not rejected. There is insufficient evidence to indicate the variance for population 2 is greater than that for population 1 at $\alpha = .05$.

b. The p-value is $p = P(F \ge 4.28)$. From Tables V and VI, Appendix D, with $\nu_1 = 4$ and $\nu_2 = 5$, $.05 < p < .10$. There is no evidence to reject $H_0$ for $\alpha = .05$, but there is evidence to reject $H_0$ for $\alpha = .10$.

8.80 a. To determine if a difference exists between the population variances, we test:

$H_0: \sigma_{BT}^2 = \sigma_{PA}^2 \qquad H_a: \sigma_{BT}^2 \neq \sigma_{PA}^2$

b. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: BT, PA
Variable   N   Mean    Variance   Minimum   Q1      Median   Q3       Maximum
BT         7   89.86   135.14     70.00     82.00   93.00    99.00    105.00
PA         8   99.63   749.70     66.00     76.50   96.00    115.00   153.00

$s_{BT}^2 = 135.14$ and $s_{PA}^2 = 749.70$.

c. The test statistic is $F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_{PA}^2}{s_{BT}^2} = \frac{749.70}{135.14} = 5.55$.

d.
Using MINITAB, the p-value is:

Cumulative Distribution Function
F distribution with 7 DF in numerator and 6 DF in denominator
x      P( X <= x )
5.55   0.973421

The p-value is $p = 2(1 - .973421) = 2(.026579) = .053158$.

e. Since the p-value is not less than $\alpha$ ($p = .053158 > .01$), $H_0$ is not rejected. There is insufficient evidence to indicate the variances are different at $\alpha = .01$.

8.81 Let $\sigma_1^2$ = variance of the number of ads recalled by children in the video-only group and $\sigma_2^2$ = variance of the number of ads recalled by children in the A/V group.

a. To determine if the group variances are equal, we test:

$H_0: \sigma_1^2 = \sigma_2^2 \qquad H_a: \sigma_1^2 \neq \sigma_2^2$

b. The test statistic is $F = \frac{\text{larger sample variance}}{\text{smaller sample variance}} = \frac{s_2^2}{s_1^2} = \frac{2.13^2}{1.98^2} = 1.157$.

c. The rejection region requires $\alpha/2 = .10/2 = .05$ in the upper tail of the F-distribution with $\nu_1 = n_2 - 1 = 20 - 1 = 19$ and $\nu_2 = n_1 - 1 = 20 - 1 = 19$. From Table VI, Appendix D, $F_{.05} = 2.16$. The rejection region is $F > 2.16$.

d. Since the observed value of the test statistic does not fall in the rejection region ($F = 1.157 < 2.16$), $H_0$ is not rejected. There is insufficient evidence to indicate the variances of the number of ads recalled by the children in the video-only group and the A/V group differ at $\alpha = .10$.

e. Since we could not reject $H_0$ that the variances are equal, the assumption of equal variances appears reasonable, and the inference about the population means is probably valid.

8.82 a. The amount of variability of GHQ scores tells us how similar or different the members of the group are on GHQ scores. The larger the variability, the larger the differences among the members' GHQ scores; the smaller the variability, the smaller those differences.

b. Let $\sigma_1^2$ = variance of the mental health scores of the employed and $\sigma_2^2$ = variance of the mental health scores of the unemployed.
To determine if the variability in mental health scores differs for employed and unemployed workers, we test:

$H_0: \sigma_1^2 = \sigma_2^2 \qquad H_a: \sigma_1^2 \neq \sigma_2^2$

c. The test statistic is $F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_2^2}{s_1^2} = \frac{5.10^2}{3.26^2} = 2.45$.

The rejection region requires $\alpha/2 = .05/2 = .025$ in the upper tail of the F-distribution with $\nu_1 = n_2 - 1 = 49 - 1 = 48$ and $\nu_2 = n_1 - 1 = 142 - 1 = 141$. Using MINITAB,

Inverse Cumulative Distribution Function
F distribution with 48 DF in numerator and 141 DF in denominator
P( X <= x )   x
0.975         1.55339

The rejection region is $F > 1.55$.

Since the observed value of the test statistic falls in the rejection region ($F = 2.45 > 1.55$), $H_0$ is rejected. There is sufficient evidence to indicate that the variability in mental health scores differs for employed and unemployed workers at $\alpha = .05$.

d. We must assume that the 2 populations of mental health scores are normally distributed and that the 2 samples are independent random samples.

8.83 Let $\sigma_1^2$ = variance at site 1 and $\sigma_2^2$ = variance at site 2. To determine if the variances at the two locations differ, we test:

$H_0: \sigma_1^2 = \sigma_2^2 \qquad H_a: \sigma_1^2 \neq \sigma_2^2$

From the printout, the test statistic is $F = .844$ and the p-value is $p = .681$. Since the p-value is not less than $\alpha$ ($p = .681 > .05$), $H_0$ is not rejected. There is insufficient evidence to indicate the variances at the two locations differ at $\alpha = .05$.

8.84 Let $\sigma_1^2$ = variance of the zinc measurements from the text-line, $\sigma_2^2$ = variance of the zinc measurements from the witness-line, and $\sigma_3^2$ = variance of the zinc measurements from the intersection. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Text-line, Witness-line, Intersection
Variable       N   Mean     Median   StDev    Minimum   Maximum   Q1       Q3
Text-line      3   0.3830   0.3740   0.0531   0.3350    0.4400    0.3350   0.4400
Witness-line   6   0.3042   0.2955   0.1015   0.1880    0.4390    0.2045   0.4075
Intersection   5   0.3290   0.3190   0.0443   0.2850    0.3930    0.2900   0.3730

a. To determine if the variation in the zinc measurements for the text-line and the intersection differ, we test:

$H_0: \sigma_1^2 = \sigma_3^2 \qquad H_a: \sigma_1^2 \neq \sigma_3^2$

The test statistic is $F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_1^2}{s_3^2} = \frac{.0531^2}{.0443^2} = 1.437$.

The rejection region requires $\alpha/2 = .05/2 = .025$ in the upper tail of the F-distribution with $\nu_1 = n_1 - 1 = 3 - 1 = 2$ and $\nu_2 = n_3 - 1 = 5 - 1 = 4$. From Table VII, Appendix D, $F_{.025} = 10.65$. The rejection region is $F > 10.65$.

Since the observed value of the test statistic does not fall in the rejection region ($F = 1.437 < 10.65$), $H_0$ is not rejected. There is insufficient evidence to indicate the variation in the zinc measurements for the text-line and the intersection differ at $\alpha = .05$.

b. To determine if the variation in the zinc measurements for the witness-line and the intersection differ, we test:

$H_0: \sigma_2^2 = \sigma_3^2 \qquad H_a: \sigma_2^2 \neq \sigma_3^2$

The test statistic is $F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_2^2}{s_3^2} = \frac{.1015^2}{.0443^2} = 5.250$.

The rejection region requires $\alpha/2 = .05/2 = .025$ in the upper tail of the F-distribution with $\nu_1 = n_2 - 1 = 6 - 1 = 5$ and $\nu_2 = n_3 - 1 = 5 - 1 = 4$. From Table VII, Appendix D, $F_{.025} = 9.36$. The rejection region is $F > 9.36$.

Since the observed value of the test statistic does not fall in the rejection region ($F = 5.25 < 9.36$), $H_0$ is not rejected. There is insufficient evidence to indicate the variation in the zinc measurements for the witness-line and the intersection differ at $\alpha = .05$.

c. There is no indication that the variances of the zinc measurements for the three locations differ.

d. With only 3, 6, and 5 measurements, it is very difficult to check the assumptions.

8.85
Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Novice, Experienced
Variable      N    Mean    Median   StDev   Minimum   Maximum   Q1      Q3
Novice        12   32.83   32.00    8.64    20.00     48.00     26.75   39.00
Experienced   12   20.58   19.50    5.74    10.00     31.00     17.25   24.75

a. Let $\sigma_1^2$ = variance in inspection errors for novice inspectors and $\sigma_2^2$ = variance in inspection errors for experienced inspectors. Since we wish to determine if the data support the belief that the variance is lower for experienced inspectors than for novice inspectors, we test:

$H_0: \sigma_1^2 = \sigma_2^2 \qquad H_a: \sigma_1^2 > \sigma_2^2$

The test statistic is $F = \frac{s_1^2}{s_2^2} = \frac{8.64^2}{5.74^2} = 2.27$.

The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = n_1 - 1 = 12 - 1 = 11$ and $\nu_2 = n_2 - 1 = 12 - 1 = 11$. Using MINITAB:

Inverse Cumulative Distribution Function
F distribution with 11 DF in numerator and 11 DF in denominator
P( X <= x )   x
0.95          2.81793

The rejection region is $F > 2.82$.

Since the observed value of the test statistic does not fall in the rejection region ($F = 2.27 < 2.82$), $H_0$ is not rejected. The sample data do not support her belief at $\alpha = .05$.

b. Using MINITAB:

Cumulative Distribution Function
F distribution with 11 DF in numerator and 11 DF in denominator
x      P( X <= x )
2.27   0.905144

The p-value is $P(F \ge 2.27) = 1 - P(F < 2.27) = 1 - .905 = .095$.

8.86 Let $\sigma_1^2$ = heat rate variance of traditional augmented gas turbines, $\sigma_2^2$ = heat rate variance of aeroderivative augmented gas turbines, and $\sigma_3^2$ = heat rate variance of advanced augmented gas turbines. Using MINITAB, some preliminary calculations are:

Descriptive Statistics: HEATRATE
Variable   ENGINE        N    Mean    StDev   Minimum   Q1      Median   Q3      Maximum
HEATRATE   Advanced      21   9764    639     9105      9252    9669     10060   11588
           Aeroderiv     7    12312   2652    8714      9469    12414    14628   16243
           Traditional   39   11544   1279    10086     10592   11183    11964   14796

a. To determine if the heat rate variances for traditional and aeroderivative augmented gas turbines differ, we test:

$H_0: \sigma_1^2 = \sigma_2^2 \qquad H_a: \sigma_1^2 \neq \sigma_2^2$

The test statistic is $F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_2^2}{s_1^2} = \frac{2652^2}{1279^2} = 4.299$.

The rejection region requires $\alpha/2 = .05/2 = .025$ in the upper tail of the F-distribution with numerator df $= n_2 - 1 = 7 - 1 = 6$ and denominator df $= n_1 - 1 = 39 - 1 = 38$. From Table VII, Appendix D, $F_{.025} \approx 2.74$. The rejection region is $F > 2.74$.

Since the observed value of the test statistic falls in the rejection region ($F = 4.299 > 2.74$), $H_0$ is rejected. There is sufficient evidence to indicate the heat rate variances for traditional and aeroderivative augmented gas turbines differ at $\alpha = .05$. Since the test in Exercise 8.24a assumes that the population variances are the same, the validity of that test is suspect, because we just found evidence that the variances differ.

b. To determine if the heat rate variances for advanced and aeroderivative augmented gas turbines differ, we test:

$H_0: \sigma_2^2 = \sigma_3^2 \qquad H_a: \sigma_2^2 \neq \sigma_3^2$

The test statistic is $F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_2^2}{s_3^2} = \frac{2652^2}{639^2} = 17.224$.

The rejection region requires $\alpha/2 = .05/2 = .025$ in the upper tail of the F-distribution with numerator df $= n_2 - 1 = 7 - 1 = 6$ and denominator df $= n_3 - 1 = 21 - 1 = 20$. From Table VII, Appendix D, $F_{.025} = 3.13$. The rejection region is $F > 3.13$.

Since the observed value of the test statistic falls in the rejection region ($F = 17.224 > 3.13$), $H_0$ is rejected. There is sufficient evidence to indicate the heat rate variances for advanced and aeroderivative augmented gas turbines differ at $\alpha = .05$.
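The MINITAB probabilities quoted in this section, such as $P(F \le 5.55) = .973421$ with 7 and 6 df in Exercise 8.80(d) and $P(F \le 2.27) = .905144$ with 11 and 11 df in Exercise 8.85(b), can be approximated without MINITAB. The sketch below assumes the standard identities $P(F \le x) = I_{\nu_1 x/(\nu_1 x + \nu_2)}(\nu_1/2, \nu_2/2)$ and $P(T > |t|) = \tfrac{1}{2} I_{\nu/(\nu + t^2)}(\nu/2, 1/2)$, where $I$ is the regularized incomplete beta function, evaluated here by a continued-fraction expansion (modified Lentz method). It is an illustrative implementation, not the algorithm MINITAB uses, and the function names are hypothetical.

```python
import math

def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even-numbered term of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd-numbered term
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction where it converges quickly; otherwise use symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_cdf(x, df1, df2):
    """P(F <= x) for an F distribution with df1 (numerator) and df2 (denominator) df."""
    if x <= 0:
        return 0.0
    return reg_inc_beta(df1 / 2.0, df2 / 2.0, df1 * x / (df1 * x + df2))

def t_cdf(t, df):
    """P(T <= t) for Student's t distribution with df degrees of freedom."""
    upper_tail = 0.5 * reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))  # P(T > |t|)
    return 1.0 - upper_tail if t >= 0 else upper_tail

# Exercise 8.80(d): two-tailed p-value from the F(7, 6) CDF at 5.55
print(round(f_cdf(5.55, 7, 6), 4))
print(round(2 * (1 - f_cdf(5.55, 7, 6)), 4))
```

The same `t_cdf` reproduces the Student's t probability that MINITAB reports for Exercise 8.100(c).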
Since the test in Exercise 8.24b assumes that the population variances are the same, the validity of that test is also suspect, because we just found evidence that the variances differ.

8.87 a. Let $\sigma_1^2$ = variance of the order-to-delivery times for the Persian Gulf War and $\sigma_2^2$ = variance of the order-to-delivery times for Bosnia.

Descriptive Statistics: Gulf, Bosnia
Variable   N   Mean    Median   StDev     Minimum   Maximum   Q1      Q3
Gulf       9   25.24   27.50    10.5204   9.10      41.20     15.30   32.15
Bosnia     9   7.38    6.50     3.6537    3.00      15.10     5.25    9.20

To determine if the variances of the order-to-delivery times for the Persian Gulf and Bosnia shipments are equal, we test:

$H_0: \frac{\sigma_1^2}{\sigma_2^2} = 1 \qquad H_a: \frac{\sigma_1^2}{\sigma_2^2} \neq 1$

The test statistic is $F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_1^2}{s_2^2} = \frac{10.5204^2}{3.6537^2} = 8.29$.

The rejection region requires $\alpha/2 = .05/2 = .025$ in the upper tail of the F-distribution with $\nu_1 = n_1 - 1 = 9 - 1 = 8$ and $\nu_2 = n_2 - 1 = 9 - 1 = 8$. From Table VII, Appendix D, $F_{.025} = 4.43$. The rejection region is $F > 4.43$.

Since the observed value of the test statistic falls in the rejection region ($F = 8.29 > 4.43$), $H_0$ is rejected. There is sufficient evidence to indicate the variances of the order-to-delivery times for the Persian Gulf and Bosnia shipments differ at $\alpha = .05$.

b. No. One assumption necessary for the small-sample confidence interval for $(\mu_1 - \mu_2)$ is that $\sigma_1^2 = \sigma_2^2$. For this problem, there is evidence to indicate that $\sigma_1^2 \neq \sigma_2^2$.

8.88 Let $\sigma_1^2$ = variance of improvement scores in the honey dosage group and $\sigma_2^2$ = variance of improvement scores in the DM dosage group. From Exercise 8.23, $s_1 = 2.855$ and $s_2 = 3.256$. To determine if the variability in coughing improvement scores differs for the two groups, we test:

$H_0: \sigma_1^2 = \sigma_2^2 \qquad H_a: \sigma_1^2 \neq \sigma_2^2$

The test statistic is $F = \frac{\text{larger sample variance}}{\text{smaller sample variance}} = \frac{s_2^2}{s_1^2} = \frac{3.256^2}{2.855^2} = 1.30$.

The rejection region requires $\alpha/2 = .10/2 = .05$ in the upper tail of the F-distribution with numerator df $= n_2 - 1 = 33 - 1 = 32$ and denominator df $= n_1 - 1 = 35 - 1 = 34$.
From Table VI, Appendix D, $F_{.05} = 1.84$. The rejection region is $F > 1.84$.

Since the observed value of the test statistic does not fall in the rejection region ($F = 1.30 < 1.84$), $H_0$ is not rejected. There is insufficient evidence to indicate the variability in the coughing improvement scores differs for the two groups at $\alpha = .10$.

8.89 a. The 2 samples are randomly selected in an independent manner from the two populations. The sample sizes, $n_1$ and $n_2$, are large enough so that $\bar{x}_1$ and $\bar{x}_2$ each have approximately normal sampling distributions and so that $s_1^2$ and $s_2^2$ provide good approximations to $\sigma_1^2$ and $\sigma_2^2$. This will be true if $n_1 \ge 30$ and $n_2 \ge 30$.

b. 1. Both sampled populations have relative frequency distributions that are approximately normal.
2. The population variances are equal.
3. The samples are randomly and independently selected from the populations.

c. 1. The relative frequency distribution of the population of differences is normal.
2. The sample of differences is randomly selected from the population of differences.

d. The two samples are independent random samples from binomial distributions. Both samples should be large enough so that the normal distribution provides an adequate approximation to the sampling distributions of $\hat{p}_1$ and $\hat{p}_2$.

e. The two samples are independent random samples from populations that are normally distributed.

8.90 a. $H_0: \sigma_1^2 = \sigma_2^2 \qquad H_a: \sigma_1^2 \neq \sigma_2^2$

The test statistic is $F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_2^2}{s_1^2} = \frac{120.1}{31.3} = 3.84$.

The rejection region requires $\alpha/2 = .05/2 = .025$ in the upper tail of the F-distribution with $\nu_1 = n_2 - 1 = 15 - 1 = 14$ and $\nu_2 = n_1 - 1 = 20 - 1 = 19$. From Table VII, Appendix D, $F_{.025} = 2.66$. The rejection region is $F > 2.66$.

Since the observed value of the test statistic falls in the rejection region ($F = 3.84 > 2.66$), $H_0$ is rejected. There is sufficient evidence to conclude $\sigma_1^2 \neq \sigma_2^2$ at $\alpha = .05$.

b.
No, we should not use a small-sample t-test to test $H_0: (\mu_1 - \mu_2) = 0$ against $H_a: (\mu_1 - \mu_2) \neq 0$, because the assumption of equal variances does not seem to hold: we concluded $\sigma_1^2 \neq \sigma_2^2$ in part a.

8.91 a. $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{11(74.2) + 13(60.5)}{12 + 14 - 2} = 66.7792$

$H_0: \mu_1 - \mu_2 = 0 \qquad H_a: \mu_1 - \mu_2 > 0$

The test statistic is $t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(17.8 - 15.3) - 0}{\sqrt{66.7792\left(\frac{1}{12} + \frac{1}{14}\right)}} = .78$

The rejection region requires $\alpha = .05$ in the upper tail of the t-distribution with df $= n_1 + n_2 - 2 = 12 + 14 - 2 = 24$. From Table III, Appendix D, $t_{.05} = 1.711$. The rejection region is $t > 1.711$.

Since the observed value of the test statistic does not fall in the rejection region ($t = .78 < 1.711$), $H_0$ is not rejected. There is insufficient evidence to indicate that $\mu_1 > \mu_2$ at $\alpha = .05$.

b. For confidence coefficient .99, $\alpha = .01$ and $\alpha/2 = .01/2 = .005$. From Table III, Appendix D, with df $= n_1 + n_2 - 2 = 12 + 14 - 2 = 24$, $t_{.005} = 2.797$. The confidence interval is:

$(\bar{x}_1 - \bar{x}_2) \pm t_{.005}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = (17.8 - 15.3) \pm 2.797\sqrt{66.7792\left(\frac{1}{12} + \frac{1}{14}\right)} = 2.50 \pm 8.99 \Rightarrow (-6.49,\ 11.49)$

c. For confidence coefficient .99, $\alpha = .01$ and $\alpha/2 = .01/2 = .005$. From Table II, Appendix D, $z_{.005} = 2.58$.

$n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(ME)^2} = \frac{(2.58)^2(74.2 + 60.5)}{2^2} = 224.15 \approx 225$

8.92 Some preliminary calculations are:

$\hat{p}_1 = \frac{x_1}{n_1} = \frac{110}{200} = .55 \qquad \hat{p}_2 = \frac{x_2}{n_2} = \frac{130}{200} = .65 \qquad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{110 + 130}{200 + 200} = \frac{240}{400} = .6$

a. $H_0: p_1 - p_2 = 0 \qquad H_a: p_1 - p_2 < 0$

The test statistic is $z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(.55 - .65) - 0}{\sqrt{.6(1 - .6)\left(\frac{1}{200} + \frac{1}{200}\right)}} = \frac{-.10}{.049} = -2.04$

The rejection region requires $\alpha = .10$ in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.10} = 1.28$. The rejection region is $z < -1.28$.

Since the observed value of the test statistic falls in the rejection region ($z = -2.04 < -1.28$), $H_0$ is rejected. There is sufficient evidence to conclude $p_1 - p_2 < 0$ at $\alpha = .10$.

b. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$.
From Table II, Appendix D, $z_{.025} = 1.96$. The 95% confidence interval for $(p_1 - p_2)$ is approximately:

$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} = (.55 - .65) \pm 1.96\sqrt{\frac{.55(1 - .55)}{200} + \frac{.65(1 - .65)}{200}} = -.10 \pm .096 \Rightarrow (-.196,\ -.004)$

c. From part b, $z_{.025} = 1.96$. Using the information from our samples, we can use $p_1 = .55$ and $p_2 = .65$. For a width of .01, the margin of error is .005.

$n_1 = n_2 = \frac{(z_{\alpha/2})^2(p_1q_1 + p_2q_2)}{(ME)^2} = \frac{(1.96)^2[.55(1 - .55) + .65(1 - .65)]}{.005^2} = \frac{1.82476}{.000025} = 72{,}990.4 \approx 72{,}991$

8.93 a. For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D, $z_{.05} = 1.645$. The confidence interval is:

$(\bar{x}_1 - \bar{x}_2) \pm z_{.05}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (12.2 - 8.3) \pm 1.645\sqrt{\frac{2.1}{135} + \frac{3.0}{148}} = 3.90 \pm .31 \Rightarrow (3.59,\ 4.21)$

b. $H_0: \mu_1 - \mu_2 = 0 \qquad H_a: \mu_1 - \mu_2 \neq 0$

The test statistic is $z = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{(12.2 - 8.3) - 0}{\sqrt{\frac{2.1}{135} + \frac{3.0}{148}}} = 20.60$

The rejection region requires $\alpha/2 = .01/2 = .005$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.005} = 2.58$. The rejection region is $z < -2.58$ or $z > 2.58$.

Since the observed value of the test statistic falls in the rejection region ($z = 20.60 > 2.58$), $H_0$ is rejected. There is sufficient evidence to indicate that $\mu_1 \neq \mu_2$ at $\alpha = .01$.

c. For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D, $z_{.05} = 1.645$.

$n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(ME)^2} = \frac{(1.645)^2(2.1 + 3.0)}{.2^2} = 345.02 \approx 346$

8.94 a. This is a paired difference experiment. Some preliminary calculations are:

Pair                           1   2   3   4   5
Difference (Pop. 1 - Pop. 2)   6   4   4   3   2

$\bar{d} = \frac{\sum d_i}{n_d} = \frac{19}{5} = 3.8 \qquad s_d^2 = \frac{\sum d_i^2 - \frac{(\sum d_i)^2}{n_d}}{n_d - 1} = \frac{81 - \frac{19^2}{5}}{5 - 1} = 2.2 \qquad s_d = \sqrt{2.2} = 1.4832$

$H_0: \mu_d = 0 \qquad H_a: \mu_d \neq 0$

The test statistic is $t = \frac{\bar{d} - 0}{s_d/\sqrt{n_d}} = \frac{3.8 - 0}{1.4832/\sqrt{5}} = 5.73$

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t-distribution with df $= n_d - 1 = 5 - 1 = 4$. From Table III, Appendix D, $t_{.025} = 2.776$. The rejection region is $t < -2.776$ or $t > 2.776$.
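The large-sample z statistic of Exercise 8.93(b) and the paired-difference computations of Exercise 8.94 can be reproduced with the sketch below. The function names are hypothetical; the critical value $t_{.025} = 2.776$ (df = 4) is taken from Table III rather than computed, and `statistics.stdev` uses the $n - 1$ divisor, matching $s_d$ in the text.

```python
import math
from statistics import mean, stdev

def two_sample_z(xbar1, var1, n1, xbar2, var2, n2, d0=0.0):
    """Large-sample z for H0: mu1 - mu2 = d0; sample variances stand in for sigma^2."""
    return (xbar1 - xbar2 - d0) / math.sqrt(var1 / n1 + var2 / n2)

def paired_t(diffs, t_crit):
    """Paired-difference t for H0: mu_d = 0, plus the CI d-bar +/- t_crit * s_d / sqrt(n_d)."""
    n_d = len(diffs)
    d_bar = mean(diffs)
    s_d = stdev(diffs)  # sample standard deviation with the n - 1 divisor
    se = s_d / math.sqrt(n_d)
    return d_bar / se, (d_bar - t_crit * se, d_bar + t_crit * se)

# Exercise 8.93(b): s1^2 = 2.1 (n1 = 135), s2^2 = 3.0 (n2 = 148)
print(round(two_sample_z(12.2, 2.1, 135, 8.3, 3.0, 148), 2))
# Exercise 8.94: differences 6, 4, 4, 3, 2 with t_.025 = 2.776
t, (lo, hi) = paired_t([6, 4, 4, 3, 2], t_crit=2.776)
print(round(t, 2), round(lo, 2), round(hi, 2))  # 5.73 1.96 5.64
```

The same `two_sample_z` helper applies to Exercise 8.99, where the inputs are the squared sample standard deviations 12.57 and 9.26 with $n_1 = n_2 = 45$.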
Since the observed value of the test statistic falls in the rejection region ($t = 5.73 > 2.776$), $H_0$ is rejected. There is sufficient evidence to indicate that the population means are different at $\alpha = .05$.

b. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. Therefore, we use the same t-value as above, $t_{.025} = 2.776$. The confidence interval is:

$\bar{d} \pm t_{\alpha/2}\frac{s_d}{\sqrt{n_d}} = 3.8 \pm 2.776\frac{1.4832}{\sqrt{5}} = 3.8 \pm 1.84 \Rightarrow (1.96,\ 5.64)$

c. The sample of differences must be randomly selected from a population of differences that has a normal distribution.

8.95 a. Let $\mu_1$ = average size of the right temporal lobe of the brain for the short-recovery group and $\mu_2$ = average size of the right temporal lobe of the brain for the long-recovery group. The target parameter is $\mu_1 - \mu_2$. We must assume that the two samples are random and independent, that the two populations being sampled are approximately normal, and that the two population variances are equal.

b. Let $p_1$ = proportion of athletes who have a good self-image of their body and $p_2$ = proportion of nonathletes who have a good self-image of their body. The target parameter for this comparison is $p_1 - p_2$. We must assume that the two samples are random and independent and that the sample sizes are sufficiently large.

c. Let $\mu_1$ = average weight of eggs produced by a sample of chickens on regular feed and $\mu_2$ = average weight of eggs produced by a sample of chickens fed a diet supplemented by corn oil. Let $\mu_d = \mu_1 - \mu_2$ = average difference in weight between eggs produced by the chickens on regular feed and then on a diet supplemented with corn oil. The target parameter is $\mu_d$. We must assume that we have a random sample of differences and, if the sample size is small, that the population of differences is approximately normal. If the number of differences is greater than 30, we do not need to assume that the population of differences is normal.

8.96 a.
Let $p_1$ = proportion of Opening Doors students enrolled full time and $p_2$ = proportion of traditional students enrolled full time. The target parameter for this comparison is $p_1 - p_2$.

b. Let $\mu_1$ = mean GPA of Opening Doors students and $\mu_2$ = mean GPA of traditional students. The target parameter for this comparison is $\mu_1 - \mu_2$.

8.97 a. Let $\mu_1$ = mean score for males and $\mu_2$ = mean score for females. For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D, $z_{.05} = 1.645$. The 90% confidence interval is:

$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \approx (39.08 - 38.79) \pm 1.645\sqrt{\frac{6.73^2}{127} + \frac{6.94^2}{114}} = 0.29 \pm 1.452 \Rightarrow (-1.162,\ 1.742)$

We are 90% confident that the difference in mean service-rating scores between males and females is between -1.162 and 1.742.

b. To determine if the service-rating score variances differ by gender, we test:

$H_0: \sigma_1^2 = \sigma_2^2 \qquad H_a: \sigma_1^2 \neq \sigma_2^2$

The test statistic is $F = \frac{\text{larger sample variance}}{\text{smaller sample variance}} = \frac{s_2^2}{s_1^2} = \frac{6.94^2}{6.73^2} = 1.06$.

The rejection region requires $\alpha/2 = .10/2 = .05$ in the upper tail of the F-distribution with numerator df $= n_2 - 1 = 114 - 1 = 113$ and denominator df $= n_1 - 1 = 127 - 1 = 126$. Using MINITAB, we get:

Inverse Cumulative Distribution Function
F distribution with 113 DF in numerator and 126 DF in denominator
P( X <= x )   x
0.95          1.35141

$F_{.05} = 1.35$. The rejection region is $F > 1.35$.

Since the observed value of the test statistic does not fall in the rejection region ($F = 1.06 < 1.35$), $H_0$ is not rejected. There is insufficient evidence to indicate the service-rating score variances differ by gender at $\alpha = .10$.

c. Since we did not reject $H_0$ in part b, the confidence interval in part a is valid. Because 0 falls in the 90% confidence interval, there is no evidence of a difference in the mean service-rating scores between males and females.

8.98 Using MINITAB, some preliminary calculations are:

Descriptive Statistics: Spillage
Variable   Cause       N    Mean     StDev   Variance   Q1      Median   Q3
Spillage   Collision   10   76.6     70.4    4950.9     35.0    41.5     102.0
           Fire        11   75.0     61.9    3829.6     33.0    50.0     82.0
           Grounding   11   53.73    29.45   867.22     36.00   41.00    62.00
           HullFail    9    63.7     63.1    3984.5     31.0    36.0     73.5
           Unknown     1    25.000   *       *          *       25.000   *

a. Let $\mu_1$ = mean spillage for accidents caused by collision and $\mu_2$ = mean spillage for accidents caused by fire/explosion.

$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(10 - 1)4{,}950.9 + (11 - 1)3{,}829.6}{10 + 11 - 2} = 4{,}360.7421$

For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table III, Appendix D, with df $= n_1 + n_2 - 2 = 10 + 11 - 2 = 19$, $t_{.05} = 1.729$. The confidence interval is:

$(\bar{x}_1 - \bar{x}_2) \pm t_{.05}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = (76.6 - 75.0) \pm 1.729\sqrt{4{,}360.7421\left(\frac{1}{10} + \frac{1}{11}\right)} = 1.6 \pm 49.89 \Rightarrow (-48.29,\ 51.49)$

b. Let $\mu_3$ = mean spillage for accidents caused by grounding and $\mu_4$ = mean spillage for accidents caused by hull failure.

$s_p^2 = \frac{(n_3 - 1)s_3^2 + (n_4 - 1)s_4^2}{n_3 + n_4 - 2} = \frac{(11 - 1)867.22 + (9 - 1)3{,}984.5}{11 + 9 - 2} = 2{,}252.6778$

To determine if the mean spillage amount for accidents caused by grounding is different from the mean spillage amount caused by hull failure, we test:

$H_0: \mu_3 - \mu_4 = 0 \qquad H_a: \mu_3 - \mu_4 \neq 0$

The test statistic is $t = \frac{(\bar{x}_3 - \bar{x}_4) - D_0}{\sqrt{s_p^2\left(\frac{1}{n_3} + \frac{1}{n_4}\right)}} = \frac{(53.73 - 63.7) - 0}{\sqrt{2{,}252.6778\left(\frac{1}{11} + \frac{1}{9}\right)}} = \frac{-9.97}{21.333} = -.47$

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t-distribution with df $= n_3 + n_4 - 2 = 11 + 9 - 2 = 18$. From Table III, Appendix D, $t_{.025} = 2.101$. The rejection region is $t < -2.101$ or $t > 2.101$.

Since the observed value of the test statistic does not fall in the rejection region ($t = -.47 > -2.101$), $H_0$ is not rejected. There is insufficient evidence to indicate the mean spillage amount for accidents caused by grounding is different from the mean spillage amount caused by hull failure at $\alpha = .05$.

c. The necessary assumptions are that the distributions from which the samples were selected are approximately normal, that the samples are independent, and that the variances of the two populations are equal.
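The pooled-variance computations in Exercise 8.98 (parts a and b, and the same formula in Exercise 8.91) and the IQR/s screen that part c applies to each sample can be sketched in Python as follows. The helper names are hypothetical; the t critical values are the Table III values quoted in the text, and the group variances are read from the MINITAB summary.

```python
import math

def pooled_var(var1, n1, var2, n2):
    """Pooled sample variance s_p^2 with n1 + n2 - 2 degrees of freedom."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

def pooled_t(xbar1, var1, n1, xbar2, var2, n2, t_crit):
    """Pooled-variance t for H0: mu1 - mu2 = 0, plus the CI (xbar1 - xbar2) +/- t_crit * SE."""
    se = math.sqrt(pooled_var(var1, n1, var2, n2) * (1 / n1 + 1 / n2))
    diff = xbar1 - xbar2
    return diff / se, (diff - t_crit * se, diff + t_crit * se)

def iqr_over_s(q1, q3, s):
    """Rough normality screen from part c: close to 1.3 for normal data."""
    return (q3 - q1) / s

# Part a: collision vs. fire, 90% CI with t_.05 = 1.729 (df = 19)
_, (lo, hi) = pooled_t(76.6, 4950.9, 10, 75.0, 3829.6, 11, t_crit=1.729)
print(round(lo, 2), round(hi, 2))
# Part b: grounding vs. hull failure, t_.025 = 2.101 (df = 18)
t_b, _ = pooled_t(53.73, 867.22, 11, 63.7, 3984.5, 9, t_crit=2.101)
print(round(t_b, 2))
# Part c: IQR / s for the collision sample
print(round(iqr_over_s(35.0, 102.0, 70.4), 2))
```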
Below are the histograms for each of the samples:

[Figure: Histogram of Spillage with fitted normal curves, panel variable Cause. Collision: Mean 76.6, StDev 70.36, N 10; Fire: Mean 75, StDev 61.88, N 11; Grounding: Mean 53.73, StDev 29.45, N 11; HullFail: Mean 63.67, StDev 63.12, N 9. Horizontal axis: Spillage; vertical axis: Frequency.]

Based on the shapes of the histograms, it does not appear that the data are normally distributed. Also, we know that if the data are normally distributed, then the interquartile range (IQR) divided by the standard deviation should be approximately 1.3. We compute IQR/s for each of the samples:

Collision: $IQR/s = (102.0 - 35.0)/70.4 = .95$
Fire: $IQR/s = (82 - 33)/61.9 = .79$
Grounding: $IQR/s = (62.0 - 36)/29.45 = .88$
Hull Failure: $IQR/s = (73.5 - 31)/63.1 = .67$

Since all of these ratios are quite a bit smaller than 1.3, it appears that none of the samples comes from a normal distribution. Thus, the assumption of normal distributions appears to be violated.

The sample standard deviations are:

Collision: $s = 70.4$
Fire: $s = 61.9$
Grounding: $s = 29.45$
Hull Failure: $s = 63.1$

Without doing formal tests, it appears that the variances of the Collision, Fire, and Hull Failure groups are probably not significantly different. However, the variance for the Grounding group appears to be smaller than the others.

d. Let $\sigma_1^2$ = variance of spillage for accidents caused by collision and $\sigma_3^2$ = variance of spillage for accidents caused by grounding.
To determine if the variances of the amounts of spillage due to collision and grounding differ, we test:

$H_0: \sigma_1^2 = \sigma_3^2$
$H_a: \sigma_1^2 \neq \sigma_3^2$

The test statistic is

$$F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_1^2}{s_3^2} = \frac{4{,}950.9}{867.22} = 5.71$$

The rejection region requires $\alpha/2 = .02/2 = .01$ in the upper tail of the F-distribution with $\nu_1 = n_1 - 1 = 10 - 1 = 9$ and $\nu_2 = n_3 - 1 = 11 - 1 = 10$. From Table VIII, Appendix D, $F_{.01} = 4.94$. The rejection region is $F > 4.94$.

Since the observed value of the test statistic falls in the rejection region ($F = 5.71 > 4.94$), $H_0$ is rejected. There is sufficient evidence to indicate the variances of the amounts of spillage due to collision and grounding differ at $\alpha = .02$.

8.99 a. Yes. The sample mean of the virtual-reality group is 10.67 points higher than the sample mean of the simple user interface group.

b. Let $\mu_1$ = mean improvement score for the virtual-reality group and $\mu_2$ = mean improvement score for the simple user interface group. To determine if the mean improvement score for the virtual-reality group is higher than that for the simple user interface group, we test:

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 > 0$

The test statistic is

$$z = \frac{(\bar{x}_1-\bar{x}_2) - D_0}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}} \approx \frac{(43.15-32.48) - 0}{\sqrt{\frac{12.57^2}{45}+\frac{9.26^2}{45}}} = \frac{10.67}{2.3274} = 4.58$$

The rejection region requires $\alpha = .05$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z > 1.645$.

Since the observed value of the test statistic falls in the rejection region ($z = 4.58 > 1.645$), $H_0$ is rejected. There is sufficient evidence to indicate the mean improvement score for the virtual-reality group is higher than that for the simple user interface group at $\alpha = .05$.

8.100 a. Since there is much variability among cars, using matched pairs blocks out the car-to-car variability so that the means of the 2 types of shocks can be compared.

b. Let $\mu_1$ = mean strength of the manufacturer's shock and $\mu_2$ = mean strength of the competitor's shock. Also, let $\mu_d = \mu_1 - \mu_2$.
Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Manufacturer, Competitor, Di
Variable       N    Mean     StDev    Minimum    Q1       Median    Q3       Maximum
Manufacturer   6   10.717    1.752     8.800     9.400    10.100    12.675   13.200
Competitor     6   10.300    1.818     8.400     8.850     9.700    12.250   13.000
Di             6    0.4167   0.1329    0.2000    0.3500    0.4000    0.5250   0.6000

To determine if there is a difference in the mean strength of the two types of shocks after 20,000 miles, we test:

$H_0: \mu_d = 0$
$H_a: \mu_d \neq 0$

The test statistic is

$$t = \frac{\bar{d} - 0}{s_d/\sqrt{n_d}} = \frac{.4167 - 0}{.1329/\sqrt{6}} = 7.68$$

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t-distribution with df $= n_d - 1 = 6 - 1 = 5$. From Table III, Appendix D, $t_{.025} = 2.571$. The rejection region is $t < -2.571$ or $t > 2.571$.

Since the observed value of the test statistic falls in the rejection region ($t = 7.68 > 2.571$), $H_0$ is rejected. There is sufficient evidence to indicate a difference in the mean strength of the two types of shocks after 20,000 miles at $\alpha = .05$.

c. Using MINITAB:

Cumulative Distribution Function
Student's t distribution with 5 DF
x        P( X <= x )
7.68     0.999702

The observed significance level is

$$p = P(t \le -7.68) + P(t \ge 7.68) = 2P(t \ge 7.68) = 2(1 - .999702) = .000596$$

d. We must assume that the population of differences is normally distributed and that the sample is random.

e. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table III, Appendix D, with df $= n_d - 1 = 6 - 1 = 5$, $t_{.025} = 2.571$. The 95% confidence interval is:

$$\bar{d} \pm t_{.025}\frac{s_d}{\sqrt{n_d}} \Rightarrow .4167 \pm 2.571\frac{.1329}{\sqrt{6}} \Rightarrow .4167 \pm .1395 \Rightarrow (.2772,\ .5562)$$

We are 95% confident that the difference in mean strength between the two types of shocks after 20,000 miles is between .2772 and .5562.

f. Some preliminary calculations are:

$$s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = \frac{(6-1)1.752^2 + (6-1)1.818^2}{6+6-2} = 3.1873$$

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table III, Appendix D, with df $= n_1+n_2-2 = 6+6-2 = 10$, $t_{.025} = 2.228$.
The 95% confidence interval is:

$$(\bar{x}_1-\bar{x}_2) \pm t_{.025}\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)} \Rightarrow (10.717-10.300) \pm 2.228\sqrt{3.1873\left(\frac{1}{6}+\frac{1}{6}\right)} \Rightarrow .417 \pm 2.2965 \Rightarrow (-1.8795,\ 2.7135)$$

We are 95% confident that the difference in mean strength between the two types of shocks after 20,000 miles is between $-1.8795$ and $2.7135$.

g. The interval assuming independent samples in part f is $(-1.8795, 2.7135)$, while the interval assuming paired differences in part e is $(.2772, .5562)$. The interval assuming independent samples is much wider because the paired-difference interval removes the car-to-car differences. The interval from part e gives more information because it is narrower.

h. No. If the data were collected using a paired experiment, then the data must be analyzed as a paired experiment.

8.101 a. The data should be analyzed as a paired difference experiment because each actor who won an Academy Award was paired with another actor with similar characteristics who did not win the award.

b. Let $\mu_1$ = mean life expectancy of Academy Award winners and $\mu_2$ = mean life expectancy of non-Academy Award winners. Let $\mu_d = \mu_1 - \mu_2$. To compare the mean life expectancies of Academy Award winners and non-winners, we test:

$H_0: \mu_d = 0$
$H_a: \mu_d \neq 0$

c. Since the p-value was so small, there is sufficient evidence to indicate the mean life expectancies of Academy Award winners and non-winners are different for any value of $\alpha > .003$. Since the sample mean life expectancy of Academy Award winners is greater than that for non-winners, we can conclude that Academy Award winners have a longer mean life expectancy than non-winners.

8.102 a. Let $\mu_1$ = mean carat size of diamonds certified by GIA and $\mu_2$ = mean carat size of diamonds certified by HRD. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The 95% confidence interval is:

$$(\bar{x}_1-\bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}} \approx (.6723-.8129) \pm 1.96\sqrt{\frac{.2456^2}{151}+\frac{.1831^2}{79}} \Rightarrow -.1406 \pm .0563 \Rightarrow (-.1969,\ -.0843)$$

b.
We are 95% confident that the difference in mean carat size between diamonds certified by GIA and those certified by HRD is between $-.1969$ and $-.0843$. Since both endpoints are negative, the mean carat size of diamonds certified by HRD is larger than the mean carat size of diamonds certified by GIA by anywhere from .0843 to .1969 carats.

c. Let $\mu_3$ = mean carat size of diamonds certified by IGI. The 95% confidence interval for $\mu_1 - \mu_3$ is:

$$(\bar{x}_1-\bar{x}_3) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_3^2}{n_3}} \approx (.6723-.3665) \pm 1.96\sqrt{\frac{.2456^2}{151}+\frac{.2163^2}{78}} \Rightarrow .3058 \pm .0620 \Rightarrow (.2438,\ .3678)$$

d. We are 95% confident that the difference in mean carat size between diamonds certified by GIA and those certified by IGI is between .2438 and .3678. Since both endpoints are positive, the mean carat size of diamonds certified by GIA is larger than the mean carat size of diamonds certified by IGI by anywhere from .2438 to .3678 carats.

e. The 95% confidence interval for $\mu_2 - \mu_3$ is:

$$(\bar{x}_2-\bar{x}_3) \pm z_{\alpha/2}\sqrt{\frac{\sigma_2^2}{n_2}+\frac{\sigma_3^2}{n_3}} \approx (.8129-.3665) \pm 1.96\sqrt{\frac{.1831^2}{79}+\frac{.2163^2}{78}} \Rightarrow .4464 \pm .0627 \Rightarrow (.3837,\ .5091)$$

f. We are 95% confident that the difference in mean carat size between diamonds certified by HRD and those certified by IGI is between .3837 and .5091. Since both endpoints are positive, the mean carat size of diamonds certified by HRD is larger than the mean carat size of diamonds certified by IGI by anywhere from .3837 to .5091 carats.

Let $\sigma_1^2$ = variance of carat size for diamonds certified by GIA, $\sigma_2^2$ = variance of carat size for diamonds certified by HRD, and $\sigma_3^2$ = variance of carat size for diamonds certified by IGI.

g. To determine if the variation in carat size differs for diamonds certified by GIA and diamonds certified by HRD, we test:

$H_0: \sigma_1^2 = \sigma_2^2$
$H_a: \sigma_1^2 \neq \sigma_2^2$

The test statistic is

$$F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_1^2}{s_2^2} = \frac{.2456^2}{.1831^2} = 1.799$$

The rejection region requires $\alpha/2 = .05/2 = .025$ in the upper tail of the F-distribution with $\nu_1 = n_1 - 1 = 151 - 1 = 150$ and $\nu_2 = n_2 - 1 = 79 - 1 = 78$. Using MINITAB, $F_{.025} = 1.494$.
The rejection region is $F > 1.494$.

Since the observed value of the test statistic falls in the rejection region ($F = 1.799 > 1.494$), $H_0$ is rejected. There is sufficient evidence to indicate the variation in carat size differs for diamonds certified by GIA and those certified by HRD at $\alpha = .05$.

h. To determine if the variation in carat size differs for diamonds certified by GIA and diamonds certified by IGI, we test:

$H_0: \sigma_1^2 = \sigma_3^2$
$H_a: \sigma_1^2 \neq \sigma_3^2$

The test statistic is

$$F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_1^2}{s_3^2} = \frac{.2456^2}{.2163^2} = 1.289$$

The rejection region requires $\alpha/2 = .05/2 = .025$ in the upper tail of the F-distribution with $\nu_1 = n_1 - 1 = 151 - 1 = 150$ and $\nu_2 = n_3 - 1 = 78 - 1 = 77$. Using MINITAB, $F_{.025} = 1.497$. The rejection region is $F > 1.497$.

Since the observed value of the test statistic does not fall in the rejection region ($F = 1.289 < 1.497$), $H_0$ is not rejected. There is insufficient evidence to indicate the variation in carat size differs for diamonds certified by GIA and those certified by IGI at $\alpha = .05$.

i. To determine if the variation in selling price differs for diamonds certified by HRD and diamonds certified by IGI, we test:

$H_0: \sigma_2^2 = \sigma_3^2$
$H_a: \sigma_2^2 \neq \sigma_3^2$

The test statistic is

$$F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_2^2}{s_3^2} = \frac{2{,}898^2}{2{,}121^2} = 1.87$$

The rejection region requires $\alpha/2 = .05/2 = .025$ in the upper tail of the F-distribution with $\nu_1 = n_2 - 1 = 79 - 1 = 78$ and $\nu_2 = n_3 - 1 = 78 - 1 = 77$. Using MINITAB, $F_{.025} = 1.567$. The rejection region is $F > 1.567$.

Since the observed value of the test statistic falls in the rejection region ($F = 1.87 > 1.567$), $H_0$ is rejected. There is sufficient evidence to indicate the variation in selling price differs for diamonds certified by HRD and those certified by IGI at $\alpha = .05$.

j. We will look at the 4 methods for determining if the data are normal. First, we look at histograms of the data.
Using MINITAB, the histograms of the carat sizes for the 3 certification bodies are:

[Figure: Histogram of CARAT; panel variable: CERT (panels: GIA, HRD, IGI); x-axis: CARAT; y-axis: Frequency.]

From the histograms, none of the data sets appears to be mound-shaped, so none appears to be normal.

Next, we look at the intervals $\bar{x} \pm s$, $\bar{x} \pm 2s$, and $\bar{x} \pm 3s$. If the proportions of observations falling in these intervals are approximately .68, .95, and 1.00, then the data are approximately normal.

For GIA:

$\bar{x} \pm s \Rightarrow .6723 \pm .2456 \Rightarrow (.4267,\ .9179)$: 84 of the 151 values fall in this interval. The proportion is .56. This is much smaller than the .68 we would expect if the data were normal.

$\bar{x} \pm 2s \Rightarrow .6723 \pm 2(.2456) \Rightarrow .6723 \pm .4912 \Rightarrow (.1811,\ 1.1635)$: 151 of the 151 values fall in this interval. The proportion is 1.00. This is larger than the .95 we would expect if the data were normal.

$\bar{x} \pm 3s \Rightarrow .6723 \pm 3(.2456) \Rightarrow .6723 \pm .7368 \Rightarrow (-.0645,\ 1.4091)$: 151 of the 151 values fall in this interval. The proportion is 1.00. This is the same as the 1.00 we would expect if the data were normal.

From this method, it appears that the data are not normal.

For IGI:

$\bar{x} \pm s \Rightarrow .3665 \pm .2163 \Rightarrow (.1502,\ .5828)$: 69 of the 78 values fall in this interval. The proportion is .88. This is much larger than the .68 we would expect if the data were normal.

$\bar{x} \pm 2s \Rightarrow .3665 \pm 2(.2163) \Rightarrow .3665 \pm .4326 \Rightarrow (-.0661,\ .7991)$: 74 of the 78 values fall in this interval. The proportion is .95. This is the same as the .95 we would expect if the data were normal.

$\bar{x} \pm 3s \Rightarrow .3665 \pm 3(.2163) \Rightarrow .3665 \pm .6489 \Rightarrow (-.2824,\ 1.0154)$: 78 of the 78 values fall in this interval. The proportion is 1.00. This is the same as the 1.00 we would expect if the data were normal.

From this method, it appears that the data are not normal.

For HRD:

$\bar{x} \pm s \Rightarrow .8129 \pm .1831 \Rightarrow (.6298,\ .9960)$: 30 of the 79 values fall in this interval. The proportion is .38.
This is much smaller than the .68 we would expect if the data were normal.

$\bar{x} \pm 2s \Rightarrow .8129 \pm 2(.1831) \Rightarrow .8129 \pm .3662 \Rightarrow (.4467,\ 1.1791)$: 79 of the 79 values fall in this interval. The proportion is 1.00. This is larger than the .95 we would expect if the data were normal.

$\bar{x} \pm 3s \Rightarrow .8129 \pm 3(.1831) \Rightarrow .8129 \pm .5493 \Rightarrow (.2636,\ 1.3622)$: 79 of the 79 values fall in this interval. The proportion is 1.00. This is the same as the 1.00 we would expect if the data were normal.

From this method, it appears that the data are not normal.

Next, we look at the ratio of the IQR to s. Using MINITAB, the quartiles are:

Descriptive Statistics: CARAT
Variable  CERT    N     Mean     StDev    Q1       Median   Q3
CARAT     GIA    151   0.6723   0.2456   0.5000   0.7000   0.9000
          HRD     79   0.8129   0.1831   0.6500   0.8100   1.0000
          IGI     78   0.3665   0.2163   0.2100   0.2900   0.4850

For GIA: $IQR = Q_U - Q_L = .9 - .5 = .4$ and $IQR/s = .4/.2456 = 1.63$. This is larger than the 1.3 we would expect if the data were normal, so this method indicates the data are not normal.

For IGI: $IQR = Q_U - Q_L = .485 - .210 = .275$ and $IQR/s = .275/.2163 = 1.27$. This is very close to the 1.3 we would expect if the data were normal, so this method indicates the data might be normal.

For HRD: $IQR = Q_U - Q_L = 1.00 - .65 = .35$ and $IQR/s = .35/.1831 = 1.91$. This is much larger than the 1.3 we would expect if the data were normal, so this method indicates the data are not normal.

Finally, using MINITAB, the normal probability plots are:

[Figure: Probability Plot of CARAT, Normal, 95% CI; panel variable: CERT (panels: GIA, HRD, IGI); x-axis: CARAT; y-axis: Percent. Legend: GIA mean 0.6723, StDev 0.2456, N 151, AD 3.268, P-Value <0.005; HRD mean 0.8129, StDev 0.1831, N 79, AD 3.405, P-Value <0.005; IGI mean 0.3665, StDev 0.2163, N 78, AD 5.561, P-Value <0.005.]

Since the data do not form a straight line for GIA, the data are not normal. Since the data do not form a straight line for IGI, the data are not normal.
Since the data do not form a straight line for HRD, the data are not normal.

From the 4 methods, nearly all indications are that the carat size data are not normal for any of the certification bodies; the only exception is the IQR/s ratio for IGI, which is close to 1.3.

8.103 a. Let $\mu_1$ = mean response by noontime watchers and $\mu_2$ = mean response by non-noontime watchers. To determine if the mean response differs for noontime and non-noontime watchers, we test:

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \neq 0$

b. Since the p-value ($p = .02$) is less than $\alpha = .05$, $H_0$ is rejected. There is sufficient evidence to indicate the mean response differs for noontime and non-noontime watchers at $\alpha = .05$.

c. Since the p-value ($p = .02$) is greater than $\alpha = .01$, $H_0$ is not rejected. There is insufficient evidence to indicate the mean response differs for noontime and non-noontime watchers at $\alpha = .01$.

d. Since the two sample means are so close together, there appears to be no “practical” difference between the two means. Even if the difference between the two means is statistically significant, it has no practical importance.

8.104 a. Let $p_1$ = proportion of managers and professionals who are male and $p_2$ = proportion of part-time MBA students who are male. To see if the samples are sufficiently large:

$n_1\hat{p}_1 = 162(.95) = 153.9$ and $n_1\hat{q}_1 = 162(.05) = 8.1$
$n_2\hat{p}_2 = 109(.689) = 75.101$ and $n_2\hat{q}_2 = 109(.311) = 33.899$

Since $n_1\hat{q}_1 = 8.1 < 15$, the normal approximation may not be adequate. We will go ahead and perform the test anyway.

First, we calculate the overall estimate of the common proportion under $H_0$:

$$\hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1+n_2} = \frac{162(.95) + 109(.689)}{162+109} = .845$$

To determine if the population of managers and professionals consists of more males than the part-time MBA population, we test:

$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 > 0$

The test statistic is

$$z = \frac{(\hat{p}_1-\hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} = \frac{(.95-.689) - 0}{\sqrt{.845(.155)\left(\frac{1}{162}+\frac{1}{109}\right)}} = 5.82$$

The rejection region requires $\alpha = .05$ in the upper tail of the z-distribution.
From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z > 1.645$.

Since the observed value of the test statistic falls in the rejection region ($z = 5.82 > 1.645$), $H_0$ is rejected. There is sufficient evidence to indicate that the population of managers and professionals consists of more males than the part-time MBA population at $\alpha = .05$.

b. We had to assume:
1. Both samples were randomly selected.
2. Both sample sizes are sufficiently large.

c. Let $p_1$ = proportion of managers and professionals who are married and $p_2$ = proportion of part-time MBA students who are married. To see if the samples are sufficiently large:

$n_1\hat{p}_1 = 162(.912) = 147.7$ and $n_1\hat{q}_1 = 162(.088) = 14.3$
$n_2\hat{p}_2 = 109(.534) = 58.2$ and $n_2\hat{q}_2 = 109(.466) = 50.8$

Since $n_1\hat{q}_1 = 14.3 < 15$, the normal approximation may not be adequate. We will go ahead and perform the test anyway.

First, we calculate the overall estimate of the common proportion under $H_0$:

$$\hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1+n_2} = \frac{162(.912) + 109(.534)}{162+109} = .760$$

To determine if the population of managers and professionals consists of more married individuals than the part-time MBA population, we test:

$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 > 0$

The test statistic is

$$z = \frac{(\hat{p}_1-\hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} = \frac{(.912-.534) - 0}{\sqrt{.760(.240)\left(\frac{1}{162}+\frac{1}{109}\right)}} = 7.14$$

The rejection region requires $\alpha = .01$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.01} = 2.33$. The rejection region is $z > 2.33$.

Since the observed value of the test statistic falls in the rejection region ($z = 7.14 > 2.33$), $H_0$ is rejected. There is sufficient evidence to indicate that the population of managers and professionals consists of more married individuals than the part-time MBA population at $\alpha = .01$.

d. We had to assume:
1. Both samples were randomly selected.
2. Both sample sizes are sufficiently large.

8.105 a. Let $p_1$ = proportion of employed individuals who had a routine checkup in the past year and $p_2$ = proportion of unemployed individuals who had a routine checkup in the past year.
The researchers are interested in whether there is a difference in these two proportions, so the parameter of interest is $p_1 - p_2$.

b. To determine if there is a difference in the proportions of employed and unemployed individuals who had a routine checkup in the past year, we test:

$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 \neq 0$

c. Some preliminary calculations are:

$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{642}{1{,}140} = .563 \qquad \hat{p}_2 = \frac{x_2}{n_2} = \frac{740}{1{,}106} = .669$$

$$\hat{p} = \frac{x_1+x_2}{n_1+n_2} = \frac{642+740}{1{,}140+1{,}106} = \frac{1{,}382}{2{,}246} = .615 \qquad \hat{q} = 1 - \hat{p} = 1 - .615 = .385$$

The test statistic is

$$z = \frac{(\hat{p}_1-\hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} = \frac{(.563-.669) - 0}{\sqrt{.615(.385)\left(\frac{1}{1{,}140}+\frac{1}{1{,}106}\right)}} = -5.16$$

d. The rejection region requires $\alpha/2 = .01/2 = .005$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.005} = 2.58$. The rejection region is $z < -2.58$ or $z > 2.58$.

e. The p-value is $p = P(z \le -5.16) + P(z \ge 5.16) \approx (.5 - .5) + (.5 - .5) = 0$. This agrees with what was reported.

f. Since the observed value of the test statistic falls in the rejection region ($z = -5.16 < -2.58$), $H_0$ is rejected. There is sufficient evidence to indicate a difference in the proportions of employed and unemployed individuals who had routine checkups in the past year at $\alpha = .01$.

8.106 a. Let $\mu_1$ = mean annual percentage turnover for U.S. plants and $\mu_2$ = mean annual percentage turnover for Japanese plants. The descriptive statistics are:

Descriptive Statistics: US, Japan
Variable  N   Mean    Median   StDev   Minimum   Maximum   Q1      Q3
US        5   6.562   6.870    1.217   4.770     8.000     5.415   7.555
Japan     5   3.118   3.220    1.227   1.920     4.910     1.970   4.215

$$s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = \frac{(5-1)1.217^2 + (5-1)1.227^2}{5+5-2} = 1.4933$$

To determine if the mean annual percentage turnover for U.S. plants exceeds that for Japanese plants, we test:

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 > 0$

The test statistic is

$$t = \frac{(\bar{x}_1-\bar{x}_2) - D_0}{\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} = \frac{(6.562-3.118) - 0}{\sqrt{1.4933\left(\frac{1}{5}+\frac{1}{5}\right)}} = 4.456$$
The rejection region requires $\alpha = .05$ in the upper tail of the t-distribution with df $= n_1+n_2-2 = 5+5-2 = 8$. From Table III, Appendix D, $t_{.05} = 1.860$. The rejection region is $t > 1.860$.

Since the observed value of the test statistic falls in the rejection region ($t = 4.456 > 1.860$), $H_0$ is rejected. There is sufficient evidence to indicate the mean annual percentage turnover for U.S. plants exceeds that for Japanese plants at $\alpha = .05$.

b. The p-value is $p = P(t \ge 4.456)$. Using MINITAB, with df $= n_1+n_2-2 = 5+5-2 = 8$:

Cumulative Distribution Function
Student's t distribution with 8 DF
x        P( X <= x )
4.456    0.998939

$p = P(t \ge 4.456) = 1 - .9989 = .0011$. Since the p-value is so small, there is evidence to reject $H_0$ for any $\alpha > .0011$.

c. The necessary assumptions are:
1. Both sampled populations are approximately normal.
2. The population variances are equal.
3. The samples are randomly and independently sampled.

There is no indication that the populations are not normal, although the sample sizes are so small that the assumptions are hard to check. The two sample variances are similar, so there is no evidence that the population variances are unequal. Overall, there is no indication that the assumptions are invalid.

8.107 a. The two populations of interest are all male cell phone users and all female cell phone users.

b. The estimate of the proportion of men who sometimes do not drive safely while talking or texting on a cell phone is $\hat{p}_1 = .32$. The estimate of the proportion of women is $\hat{p}_2 = .25$.

c. For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D, $z_{.05} = 1.645$. A 90% confidence interval for the difference between the proportions of men and women who sometimes do not drive safely while talking or texting on a cell phone is:

$$(\hat{p}_1-\hat{p}_2) \pm z_{.05}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1}+\frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.32-.25) \pm 1.645\sqrt{\frac{.32(.68)}{643}+\frac{.25(.75)}{643}} \Rightarrow .07 \pm .041 \Rightarrow (.029,\ .111)$$

d.
Since the interval does not contain 0, there is sufficient evidence to indicate a difference between the proportions of men and women who sometimes do not drive safely while talking or texting on a cell phone. Also, the interval contains only positive values, so we can conclude that men are more likely than women to sometimes not drive safely while talking or texting on a cell phone.

e. The estimate of the proportion of men who used their cell phone in an emergency is $\hat{p}_1 = .71$. The estimate of the proportion of women is $\hat{p}_2 = .77$.

f. Let $p_1$ = proportion of men who used their cell phone in an emergency and $p_2$ = proportion of women who used their cell phone in an emergency. Some preliminary calculations are:

$\hat{p}_1 = x_1/n_1 \Rightarrow x_1 = n_1\hat{p}_1 = 643(.71) = 456.53$
$\hat{p}_2 = x_2/n_2 \Rightarrow x_2 = n_2\hat{p}_2 = 643(.77) = 495.11$

$$\hat{p} = \frac{x_1+x_2}{n_1+n_2} = \frac{456.53+495.11}{643+643} = .74 \qquad \hat{q} = 1 - \hat{p} = 1 - .74 = .26$$

To determine whether the proportions of men and women who used their cell phones in an emergency differ, we test:

$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 \neq 0$

The test statistic is

$$z = \frac{(\hat{p}_1-\hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} = \frac{(.71-.77) - 0}{\sqrt{.74(.26)\left(\frac{1}{643}+\frac{1}{643}\right)}} = -2.45$$

The rejection region requires $\alpha/2 = .10/2 = .05$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z < -1.645$ or $z > 1.645$.

Since the observed value of the test statistic falls in the rejection region ($z = -2.45 < -1.645$), $H_0$ is rejected. There is sufficient evidence to indicate the proportions of men and women who used their cell phone in an emergency differ at $\alpha = .10$.

8.108 For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D, $z_{.05} = 1.645$. We estimate $p_1 = p_2 = .5$.

$$n_1 = n_2 = \frac{(z_{\alpha/2})^2(p_1q_1 + p_2q_2)}{(ME)^2} = \frac{(1.645)^2[.5(.5) + .5(.5)]}{.05^2} = 541.205 \approx 542$$

8.109 a. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Purchasers, Nonpurchasers
Variable   N    Mean    Median   StDev   Minimum   Maximum   Q1      Q3
Purchase   20   39.80   38.00    10.04   23.00     59.00     32.25   48.75
Nonpurch   20   47.20   52.00    13.62   22.00     66.00     33.50   58.75

Let $\mu_1$ = mean age of nonpurchasers and $\mu_2$ = mean age of purchasers.

$$s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = \frac{(20-1)13.62^2 + (20-1)10.04^2}{20+20-2} = 143.153$$

To determine if there is a difference in the mean age of purchasers and nonpurchasers, we test:

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \neq 0$

The test statistic is

$$t = \frac{(\bar{x}_1-\bar{x}_2) - 0}{\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} = \frac{(47.20-39.80) - 0}{\sqrt{143.153\left(\frac{1}{20}+\frac{1}{20}\right)}} = 1.956$$

The rejection region requires $\alpha/2 = .10/2 = .05$ in each tail of the t-distribution with df $= n_1+n_2-2 = 20+20-2 = 38$. From Table III, Appendix D, $t_{.05} = 1.684$. The rejection region is $t < -1.684$ or $t > 1.684$.

Since the observed value of the test statistic falls in the rejection region ($t = 1.956 > 1.684$), $H_0$ is rejected. There is sufficient evidence to indicate the mean ages of purchasers and nonpurchasers differ at $\alpha = .10$.

b. The necessary assumptions are:
1. Both sampled populations are approximately normal.
2. The population variances are equal.
3. The samples are randomly and independently sampled.

c. Using the standard normal table to approximate the t-distribution, the p-value is

$$p = P(t \le -1.956) + P(t \ge 1.956) \approx (.5 - .4748) + (.5 - .4748) = .0504$$

(An exact calculation with the t-distribution, df = 38, gives a p-value of roughly .058.) The probability of observing a test statistic of this value or more unusual if $H_0$ is true is therefore only about .05 to .06. Since this value is less than $\alpha = .10$, $H_0$ is rejected. There is sufficient evidence to indicate there is a difference in the mean age of purchasers and nonpurchasers.

d. For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table III, Appendix D, with df = 38, $t_{.05} = 1.684$. The confidence interval is:

$$(\bar{x}_2-\bar{x}_1) \pm t_{.05}\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)} \Rightarrow (39.8-47.2) \pm 1.684\sqrt{143.153\left(\frac{1}{20}+\frac{1}{20}\right)} \Rightarrow -7.4 \pm 6.37 \Rightarrow (-13.77,\ -1.03)$$

We are 90% confident that the difference in mean ages between purchasers and nonpurchasers is between $-13.77$ and $-1.03$.
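The p-value in 8.109 c was read from the standard normal table, which only approximates the t-distribution with df = 38. Below is a minimal standard-library Python sketch of an exact t tail probability. It numerically integrates the t density with Simpson's rule, which is a swapped-in numerical technique and not how MINITAB computes its CDF; the helper names `t_pdf`, `t_upper_tail`, and `two_sided_p` are hypothetical. The MINITAB CDF values quoted in 8.100 c and 8.106 b serve as cross-checks.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0: 0.5 minus a Simpson's-rule integral of the pdf on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def two_sided_p(t, df):
    """Two-tailed p-value for an observed t statistic."""
    return 2 * t_upper_tail(abs(t), df)


# 8.109 c: t = 1.956 with df = 38, exact t vs. the normal-table approximation
p_exact = two_sided_p(1.956, 38)
p_normal = 2 * (1 - NormalDist().cdf(1.956))

# Cross-checks against the MINITAB CDF values quoted in 8.100 c and 8.106 b
p_shock = two_sided_p(7.68, 5)       # MINITAB: 2(1 - .999702) = .000596
p_turnover = t_upper_tail(4.456, 8)  # MINITAB: 1 - .998939 = .001061

print(f"8.109 c: t-based p = {p_exact:.4f}, normal approximation p = {p_normal:.4f}")
print(f"8.100 c: p = {p_shock:.6f}; 8.106 b: p = {p_turnover:.6f}")
```

For 8.109 c the normal approximation gives about .0505 while the t-based value is roughly .058, and both are below $\alpha = .10$. In practice `scipy.stats.t.sf` would do the same job; the hand-rolled integral only keeps the sketch dependency-free.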
8.110 Let $p_1$ = proportion of larvae that died in containers containing high carbon dioxide levels and $p_2$ = proportion of larvae that died in containers containing normal carbon dioxide levels. The parameter of interest for this problem is $p_1 - p_2$, the difference in the death rates for the two groups.

Some preliminary calculations are:

$$\hat{p} = \frac{x_1+x_2}{n_1+n_2} = \frac{.10(80) + .05(80)}{80+80} = .075 \qquad \hat{q} = 1 - \hat{p} = 1 - .075 = .925$$

To determine if an increased level of carbon dioxide is effective in killing a higher percentage of leaf-eating larvae, we test:

$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 > 0$

The test statistic is

$$z = \frac{(\hat{p}_1-\hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{80}+\frac{1}{80}\right)}} = \frac{(.10-.05) - 0}{\sqrt{.075(.925)\left(\frac{1}{80}+\frac{1}{80}\right)}} = 1.201$$

The rejection region requires $\alpha = .01$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.01} = 2.33$. The rejection region is $z > 2.33$.

Since the observed value of the test statistic does not fall in the rejection region ($z = 1.201 < 2.33$), $H_0$ is not rejected. There is insufficient evidence to indicate that an increased level of carbon dioxide is effective in killing a higher percentage of leaf-eating larvae at $\alpha = .01$.

8.111 a. Let $p_1$ = proportion of African-American drivers searched by the LAPD and $p_2$ = proportion of white drivers searched by the LAPD. Some preliminary calculations are:

$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{12{,}016}{61{,}688} = .195 \qquad \hat{p}_2 = \frac{x_2}{n_2} = \frac{5{,}312}{106{,}892} = .050$$

$$\hat{p} = \frac{x_1+x_2}{n_1+n_2} = \frac{12{,}016+5{,}312}{61{,}688+106{,}892} = \frac{17{,}328}{168{,}580} = .103$$

To determine if the proportions of African-American and white drivers searched differ, we test:

$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 \neq 0$

The test statistic is

$$z = \frac{(\hat{p}_1-\hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} = \frac{.195-.050}{\sqrt{.103(.897)\left(\frac{1}{61{,}688}+\frac{1}{106{,}892}\right)}} = 94.35$$

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.025} = 1.96$. The rejection region is $z < -1.96$ or $z > 1.96$.
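The pooled two-proportion z statistic used in 8.104, 8.105, 8.107 f, 8.110, and 8.111 a, and the unpooled interval used in 8.107 c, can be sketched in a few lines of Python. This is a minimal illustration that assumes only the counts and proportions quoted in these solutions; the function names are hypothetical, and critical values are passed in from Table II rather than computed.

```python
import math


def pooled_two_prop_z(p1_hat, n1, p2_hat, n2):
    """z statistic for H0: p1 - p2 = 0, using the pooled estimate of p."""
    p_hat = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)  # common proportion under H0
    se = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


def two_prop_ci(p1_hat, n1, p2_hat, n2, z_crit):
    """Large-sample CI for p1 - p2 with the unpooled standard error."""
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_crit * se, diff + z_crit * se


# 8.111 a: drivers searched, computed from the raw counts
z_search = pooled_two_prop_z(12016 / 61688, 61688, 5312 / 106892, 106892)

# 8.105 c: routine checkups, employed vs. unemployed
z_checkup = pooled_two_prop_z(642 / 1140, 1140, 740 / 1106, 1106)

# 8.107 c: 90% CI for men vs. women (z_.05 = 1.645)
lo, hi = two_prop_ci(0.32, 643, 0.25, 643, 1.645)

print(f"z (8.111 a) = {z_search:.2f}")
print(f"z (8.105 c) = {z_checkup:.2f}")
print(f"90% CI (8.107 c) = ({lo:.3f}, {hi:.3f})")
```

Computed from the raw counts, the 8.111 a statistic comes out near 94.5 rather than 94.35, because the hand calculation rounds the sample proportions to three decimals; the conclusion is unchanged. The other two results match the values reported earlier ($z = -5.16$ and $(.029, .111)$).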
Since the observed value of the test statistic falls in the rejection region ($z = 94.35 > 1.96$), $H_0$ is rejected. There is sufficient evidence to indicate the proportions of African-American drivers and white drivers searched differ at $\alpha = .05$.

b. Let $p_1$ = proportion of ‘hits’ for African-American drivers searched by the LAPD and $p_2$ = proportion of ‘hits’ for white drivers searched by the LAPD. Some preliminary calculations are:

$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{5{,}134}{12{,}016} = .427 \qquad \hat{p}_2 = \frac{x_2}{n_2} = \frac{3{,}006}{5{,}312} = .566$$

$$\hat{p} = \frac{x_1+x_2}{n_1+n_2} = \frac{5{,}134+3{,}006}{12{,}016+5{,}312} = \frac{8{,}140}{17{,}328} = .470$$

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The 95% confidence interval is:

$$(\hat{p}_1-\hat{p}_2) \pm z_{.025}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1}+\frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.427-.566) \pm 1.96\sqrt{\frac{.427(.573)}{12{,}016}+\frac{.566(.434)}{5{,}312}} \Rightarrow -.139 \pm .016 \Rightarrow (-.155,\ -.123)$$

We are 95% confident that the difference in ‘hit’ rates between African-American drivers and white drivers searched by the LAPD is between $-.155$ and $-.123$.

8.112 a. $$\mu_d = \frac{\sum_{i=1}^{51} d_i}{51} = \frac{155}{51} = 3.04$$

b. We do not need to estimate anything; we know the parameter's value.

c. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: MATH2011, MATH2001, DiffMath
Variable   N    Mean     StDev   Minimum   Q1       Median   Q3       Maximum
MATH2011   51   536.84   41.84   457.00    501.00   527.00   570.00   617.00
MATH2001   51   533.80   33.79   474.00    505.00   526.00   561.00   603.00
DiffMath   51     3.04   13.40   -31.00     -6.00     2.00    12.00    32.00

Let $\mu_1$ = mean Math SAT score in 2011 and $\mu_2$ = mean Math SAT score in 2001. Then $\mu_d = \mu_1 - \mu_2$. To determine if the true mean Math SAT score in 2011 differs from that in 2001, we test:

$H_0: \mu_d = 0$
$H_a: \mu_d \neq 0$

The test statistic is

$$z = \frac{\bar{d} - D_0}{s_d/\sqrt{n_d}} = \frac{3.04 - 0}{13.40/\sqrt{51}} = 1.62$$

The rejection region requires $\alpha/2 = .10/2 = .05$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z < -1.645$ or $z > 1.645$.
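The paired-difference statistic $(\bar{d} - D_0)/(s_d/\sqrt{n_d})$ is computed the same way whether it is referred to the z-table (as in 8.112, with 51 differences) or the t-table (as in 8.100, with 6 differences). Here is a minimal Python sketch working only from the summary values in the MINITAB output; the helper names are hypothetical, and the critical value is supplied by the user from the appropriate table.

```python
import math


def paired_statistic(d_bar, s_d, n_d, d0=0.0):
    """(d_bar - d0) / (s_d / sqrt(n_d)) for a paired-difference test."""
    return (d_bar - d0) / (s_d / math.sqrt(n_d))


def paired_ci(d_bar, s_d, n_d, crit):
    """d_bar +/- crit * s_d / sqrt(n_d); crit is a t or z critical value."""
    half = crit * s_d / math.sqrt(n_d)
    return d_bar - half, d_bar + half


# 8.112 c: Math SAT differences for 51 states (referred to the z-table)
z_sat = paired_statistic(3.04, 13.40, 51)

# 8.100 b and e: shock-absorber strength differences (t-table, t_.025 = 2.571)
t_shock = paired_statistic(0.4167, 0.1329, 6)
ci_shock = paired_ci(0.4167, 0.1329, 6, 2.571)

print(f"z (8.112 c) = {z_sat:.2f}")
print(f"t (8.100 b) = {t_shock:.2f}")
print(f"95% CI (8.100 e) = ({ci_shock[0]:.4f}, {ci_shock[1]:.4f})")
```

The output reproduces $z = 1.62$, $t = 7.68$, and the interval $(.2772, .5562)$. Only the reference distribution for the critical value changes between the two problems, which is why the sketch leaves `crit` as an input.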
Since the observed value of the test statistic does not fall in the rejection region ($z = 1.62 < 1.645$), $H_0$ is not rejected. There is insufficient evidence to indicate the true mean Math SAT score in 2011 is different from that in 2001 at $\alpha = .10$.

8.113 For probability .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. Since we have no prior information about the proportions, we use $p_1 = p_2 = .5$ to get a conservative estimate.

$$n_1 = n_2 = \frac{(z_{\alpha/2})^2(p_1q_1 + p_2q_2)}{(ME)^2} = \frac{(1.96)^2[.5(1-.5) + .5(1-.5)]}{.02^2} = \frac{1.9208}{.0004} = 4{,}802$$

8.114 For confidence level .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The standard deviation can be estimated by dividing the range by 4:

$$\sigma \approx \frac{\text{Range}}{4} = \frac{4}{4} = 1$$

$$n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(ME)^2} = \frac{1.96^2(1^2 + 1^2)}{.2^2} = 192.08 \approx 193$$

8.115 Some preliminary calculations are:

$$s_1^2 = \frac{\sum x_1^2 - \frac{(\sum x_1)^2}{n_1}}{n_1 - 1} = \frac{10{,}251 - \frac{225^2}{5}}{5-1} = \frac{126}{4} = 31.5$$

$$s_2^2 = \frac{\sum x_2^2 - \frac{(\sum x_2)^2}{n_2}}{n_2 - 1} = \frac{10{,}351 - \frac{227^2}{5}}{5-1} = \frac{45.2}{4} = 11.3$$

Let $\sigma_1^2$ = variance for instrument A and $\sigma_2^2$ = variance for instrument B. Since we wish to determine if there is a difference in the precision of the two instruments, we test:

$H_0: \sigma_1^2 = \sigma_2^2$
$H_a: \sigma_1^2 \neq \sigma_2^2$

The test statistic is

$$F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_1^2}{s_2^2} = \frac{31.5}{11.3} = 2.79$$

The rejection region requires $\alpha/2 = .10/2 = .05$ in the upper tail of the F-distribution with $\nu_1 = n_1 - 1 = 5 - 1 = 4$ and $\nu_2 = n_2 - 1 = 5 - 1 = 4$. From Table VI, Appendix D, $F_{.05} = 6.39$. The rejection region is $F > 6.39$.

Since the observed value of the test statistic does not fall in the rejection region ($F = 2.79 < 6.39$), $H_0$ is not rejected. There is insufficient evidence of a difference in the precision of the two instruments at $\alpha = .10$.

8.116 Let $\mu_1$ = the mean relational intimacy score for participants in the CMC group and $\mu_2$ = the mean relational intimacy score for participants in the FTF group.
Using MINITAB, the descriptive statistics are:

Descriptive Statistics: CMC, FTF
Variable  N    Mean    StDev   Minimum   Q1      Median   Q3      Maximum
CMC       24   3.500   0.780   2.000     3.000   3.500    4.000   5.000
FTF       24   3.542   0.658   2.000     3.000   4.000    4.000   5.000

Some preliminary calculations are:

$$s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = \frac{(24-1).780^2 + (24-1).658^2}{24+24-2} = .5207$$

To determine if the mean relational intimacy score for participants in the CMC group is lower than the mean relational intimacy score for participants in the FTF group, we test:

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 < 0$

The test statistic is

$$t = \frac{(\bar{x}_1-\bar{x}_2) - D_0}{\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} = \frac{(3.500-3.542) - 0}{\sqrt{.5207\left(\frac{1}{24}+\frac{1}{24}\right)}} = \frac{-.042}{.20831} = -.20$$

The rejection region requires $\alpha = .10$ in the lower tail of the t-distribution with df $= n_1+n_2-2 = 24+24-2 = 46$. From Table III, Appendix D, $t_{.10} = 1.303$. The rejection region is $t < -1.303$.

Since the observed value of the test statistic does not fall in the rejection region ($t = -.20 > -1.303$), $H_0$ is not rejected. There is insufficient evidence to indicate that the mean relational intimacy score for participants in the CMC group is lower than the mean relational intimacy score for participants in the FTF group at $\alpha = .10$.

8.117 a. Let $\mu_{C1}$ = mean relational intimacy score for the CMC group on the first meeting and $\mu_{C3}$ = mean relational intimacy score for the CMC group on the third meeting. Let $\mu_{Cd} = \mu_{C3} - \mu_{C1}$, the difference in mean relational intimacy score between the third and first meetings for the CMC group. To determine if the mean relational intimacy score increases between the first and third meetings, we test:

$H_0: \mu_{Cd} = 0$
$H_a: \mu_{Cd} > 0$

b. The researchers used the paired t-test because the same individuals participated in each of the three meeting sessions, so the samples are not independent.

c. Since the p-value is so small ($p = .003$), $H_0$ would be rejected.
There is sufficient evidence to indicate that the mean relational intimacy score for participants in the CMC group increased from the first to the third meeting for any value of $\alpha > .003$.

d. Let $\mu_{F1}$ = mean relational intimacy score for the FTF group on the first meeting and $\mu_{F3}$ = mean relational intimacy score for the FTF group on the third meeting. Let $\mu_{Fd} = \mu_{F1} - \mu_{F3}$, the difference in mean relational intimacy score between the first and third meetings for the FTF group. To determine if the mean relational intimacy score changes between the first and third meetings, we test:

$H_0: \mu_{Fd} = 0$
$H_a: \mu_{Fd} \neq 0$

e. Since the p-value is not small ($p = .39$), $H_0$ would not be rejected. There is insufficient evidence to indicate that the mean relational intimacy score for participants in the FTF group changed from the first to the third meeting for any value of $\alpha < .39$.

8.118 a. Let $\mu_1$ = mean scale score for employees who report positive spillover of work skills and $\mu_2$ = mean scale score for employees who did not report positive work spillover. To determine if the mean scale score for employees who report positive spillover of work skills differs from the mean scale score for employees who did not report positive work spillover, we test:

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \neq 0$

b. It is appropriate to apply the large-sample z-test because 114 workers were studied and divided into two groups. The test is valid provided each group is large (as a rule of thumb, at least 30 workers).

c. From the printout, the test statistic is $t = 8.847$ (equal variances not assumed) and the p-value is $p = .000$. Since the p-value is less than $\alpha$ ($p = .000 < .05$), $H_0$ is rejected. There is sufficient evidence to indicate the mean scale score for employees who report positive spillover of work skills is different from the mean scale score for employees who did not report positive work spillover at $\alpha = .05$.

d. We are 95% confident that the difference between the mean use of creative ideas scale scores for the two groups is between .627 and .988.
Since the interval does not contain 0, there is a significant difference in the mean scale scores between the two groups. Yes, the inference derived from the confidence interval agrees with that from the hypothesis test.

e. Let $p_1$ = proportion of male workers among those who reported positive work spillover and $p_2$ = proportion of male workers among those who did not report positive spillover of work skills. To determine if the proportions of male workers in the two groups are significantly different, we test:

$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 \neq 0$

From the printout, the test statistic is $z = .75$ and the p-value is $p = .453$. Since the p-value is not small, there is no evidence to reject $H_0$. There is insufficient evidence to indicate the proportions of male workers in the two groups are significantly different for any value of $\alpha < .453$.

8.119 Attitude toward the Advertisement: The p-value is $p = .091$. There is no evidence to reject $H_0$ for $\alpha = .05$. There is no evidence to indicate the first ad will be more effective when shown to males for $\alpha = .05$. There is evidence to reject $H_0$ for $\alpha = .10$. There is evidence to indicate the first ad will be more effective when shown to males for $\alpha = .10$.

Attitude toward Brand of Soft Drink: The p-value is $p = .032$. There is evidence to reject $H_0$ for $\alpha > .032$. There is evidence to indicate the first ad will be more effective when shown to males for $\alpha > .032$.

Intention to Purchase the Soft Drink: The p-value is $p = .050$. There is no evidence to reject $H_0$ for $\alpha = .05$. There is no evidence to indicate the first ad will be more effective when shown to males for $\alpha = .05$. There is evidence to reject $H_0$ for $\alpha > .050$. There is evidence to indicate the first ad will be more effective when shown to males for $\alpha > .050$.

No, I do not agree with the author's hypothesis. Using $\alpha = .05$, the results agree with the author's hypothesis only for the attitude toward the brand. If we use $\alpha = .10$, then the author's hypotheses are all supported.

8.120 a.
To determine if the mean salary of all males with post-graduate degrees exceeds the mean salary of all females with post-graduate degrees, we test:

$H_0: \mu_M - \mu_F = 0$
$H_a: \mu_M - \mu_F > 0$

b. The test statistic is

$$z = \frac{(\bar{x}_M - \bar{x}_F) - 0}{\sqrt{s_{\bar{x}_M}^2 + s_{\bar{x}_F}^2}} = \frac{61{,}340 - 32{,}227}{\sqrt{2{,}185^2 + 932^2}} = 12.26$$

c. The rejection region requires $\alpha = .01$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.01} = 2.33$. The rejection region is $z > 2.33$.

d. Since the observed value of the test statistic falls in the rejection region ($z = 12.26 > 2.33$), $H_0$ is rejected. There is sufficient evidence to indicate the mean salary of all males with post-graduate degrees exceeds the mean salary of all females with post-graduate degrees at $\alpha = .01$.

8.121 a. Let $p_1$ = proportion of 9th grade boys who gambled weekly or daily in 1992 and $p_2$ = proportion of 9th grade boys who gambled weekly or daily in 1998. The researchers are interested in whether there is a difference in these two proportions, so the parameter of interest is $p_1 - p_2$. Some preliminary calculations are:

$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{4{,}684}{21{,}484} = .218 \qquad \hat{p}_2 = \frac{x_2}{n_2} = \frac{5{,}313}{23{,}199} = .229$$

$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{4{,}684 + 5{,}313}{21{,}484 + 23{,}199} = \frac{9{,}997}{44{,}683} = .224 \qquad \hat{q} = 1 - \hat{p} = 1 - .224 = .776$$

To determine if there is a difference in the proportions of 9th grade boys who gambled weekly or daily in 1992 and 1998, we test:

$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 \neq 0$

The test statistic is

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(.218 - .229) - 0}{\sqrt{.224(.776)\left(\frac{1}{21{,}484} + \frac{1}{23{,}199}\right)}} = -2.79$$

The rejection region requires $\alpha/2 = .01/2 = .005$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.005} = 2.58$. The rejection region is $z < -2.58$ or $z > 2.58$.

Since the observed value of the test statistic falls in the rejection region ($z = -2.79 < -2.58$), $H_0$ is rejected. There is sufficient evidence to indicate a difference in the proportions of 9th grade boys who gambled weekly or daily in 1992 and 1998 at $\alpha = .01$.

b. Yes.
If sample sizes are large enough, differences can almost always be found. Suppose we compute a 99% confidence interval. For confidence coefficient .99, $\alpha = .01$ and $\alpha/2 = .01/2 = .005$. From Table II, Appendix D, $z_{.005} = 2.58$. The 99% confidence interval is:

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.218 - .229) \pm 2.58\sqrt{\frac{.218(.782)}{21{,}484} + \frac{.229(.771)}{23{,}199}} \Rightarrow -.011 \pm .010 \Rightarrow (-.021, -.001)$$

We are 99% confident that the difference in the proportions of 9th grade boys who gambled weekly or daily in 1992 and 1998 is between $-.021$ and $-.001$. Although the difference is statistically significant, it is at most about 2 percentage points, which may be of little practical importance.

8.122 a. We cannot make inferences about the difference between the mean salaries of male and female accounting/finance/banking professionals because no standard deviations are provided.

b. To determine if the mean salary for males is significantly greater than that for females, we test:

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 > 0$

The rejection region requires $\alpha = .05$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$.

To make things easier, we will assume that the standard deviations for the two groups are the same, $\sigma_1 = \sigma_2 = \sigma$. The test statistic is

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{69{,}484 - 52{,}012 - 0}{\sigma\sqrt{\frac{1}{1400} + \frac{1}{1400}}} = \frac{17{,}472}{\sigma\sqrt{2/1400}} = \frac{462{,}265.67}{\sigma}$$

In order to reject $H_0$, this test statistic must fall in the rejection region, that is, be greater than 1.645. Solving for $\sigma$, we get:

$$\frac{462{,}265.67}{\sigma} > 1.645 \Rightarrow \sigma < \frac{462{,}265.67}{1.645} = 281{,}012.57$$

Thus, to reject $H_0$, the common standard deviation of the two groups has to be less than \$281,012.57.

c. Yes. In fact, reasonable values for the standard deviation will be around \$5,000, which is much smaller than the required \$281,012.57.

d. These data were collected from voluntary subjects who responded to a Web-based survey. Thus, this is not a random sample but a self-selected sample. Generally, subjects who respond to surveys tend to have very strong opinions, which may not be the same as those of the population in general.
Thus, the results from this self-selected sample may not reflect the results from the population in general.

8.123 Let $\mu_1$ = mean output for Design 1, $\mu_2$ = mean output for Design 2, and $\mu_d = \mu_1 - \mu_2$. Some preliminary calculations are:

Working Day                         8/16   8/17   8/18   8/19   8/20   8/23   8/24   8/25
Difference (Design 1 − Design 2)     −53   −271   −206   −266   −213   −183   −118    −87

$$\bar{d} = \frac{\sum d}{n_d} = \frac{-1{,}397}{8} = -174.625$$

$$s_d^2 = \frac{\sum d^2 - \frac{(\sum d)^2}{n_d}}{n_d - 1} = \frac{289{,}793 - \frac{(-1{,}397)^2}{8}}{8 - 1} = 6{,}548.839 \qquad s_d = \sqrt{6{,}548.839} = 80.925$$

To determine if Design 2 is superior to Design 1, we test:

$H_0: \mu_d = 0$
$H_a: \mu_d < 0$

The test statistic is

$$t = \frac{\bar{d} - 0}{s_d/\sqrt{n_d}} = \frac{-174.625 - 0}{80.925/\sqrt{8}} = -6.103$$

Since no value of $\alpha$ was given, we will use $\alpha = .05$. The rejection region requires $\alpha = .05$ in the lower tail of the t-distribution with df $= n_d - 1 = 8 - 1 = 7$. From Table III, Appendix D, $t_{.05} = 1.895$. The rejection region is $t < -1.895$.

Since the observed value of the test statistic falls in the rejection region ($t = -6.103 < -1.895$), $H_0$ is rejected. There is sufficient evidence to indicate Design 2 is superior to Design 1 at $\alpha = .05$.

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table III, Appendix D, with df $= n_d - 1 = 8 - 1 = 7$, $t_{.025} = 2.365$. A 95% confidence interval for $\mu_d$ is:

$$\bar{d} \pm t_{.025}\frac{s_d}{\sqrt{n_d}} \Rightarrow -174.625 \pm 2.365\frac{80.925}{\sqrt{8}} \Rightarrow -174.625 \pm 67.666 \Rightarrow (-242.29, -106.96)$$

Since this interval contains only negative values (it does not contain 0), there is evidence to indicate Design 2 is superior to Design 1.

Chapter 9 Design of Experiments and Analysis of Variance

9.1 Since only one factor is utilized, the treatments are the four levels (A, B, C, D) of the qualitative factor.

9.2 The treatments are the combinations of levels of each of the two factors. There are $2 \times 5 = 10$ treatments. They are: (A, 50), (A, 60), (A, 70), (A, 80), (A, 90), (B, 50), (B, 60), (B, 70), (B, 80), (B, 90).

9.3 One has no control over the levels of the factors in an observational experiment.
One does have control of the levels of the factors in a designed experiment.

9.4 a. College GPAs are measured on college students. The experimental units are college students.
b. Household income is measured on households. The experimental units are households.
c. Gasoline mileage is measured on automobiles. The experimental units are the automobiles of a particular model.
d. The experimental units are the sectors on a computer diskette.
e. The experimental units are the states.

9.5 a. This is an observational experiment. The economist has no control over the factor levels or unemployment rates.
b. This is a designed experiment. The manager chooses the three incentive programs to compare and randomly assigns an incentive program to each of nine plants.
c. This is an observational experiment. Even though the marketer chooses the publication, he has no control over who responds to the ads.
d. This is an observational experiment. The load on the facility's generators is only observed, not controlled.
e. This is an observational experiment. One has no control over the distance of the haul, the goods hauled, or the price of diesel fuel.

9.6 a. The response variable is QB production score.
b. There is one factor, draft position.
c. The treatments are the three levels of draft position: top 10, picks 11–50, and after pick 50.
d. The experimental units are the drafted quarterbacks.

9.7 a. The experimental units are the firms with CPAs.
b. The response variable is the firm's likelihood of reporting sustainability policies.
c. There are two factors: firm size and firm type.
d. There are two levels of firm size (large and small) and two levels of firm type (public and private).
e. The treatments are the combinations of the factor levels. There are $2 \times 2 = 4$ treatments: large/public, large/private, small/public, and small/private.

9.8 a.
The experimental units are the accounting alumni.
b. The response variable is income.
c. There are 2 factors in the problem: Mach score classification and gender.
d. Mach score classification has 3 levels: high, moderate, and low. Gender has 2 levels: male and female.
e. There are a total of $3 \times 2 = 6$ treatments in this experiment. The treatments are all of the Mach score classification–gender combinations.

9.9 a. The study is a designed experiment because the experimental units (study participants) were randomly assigned to the treatments (gift giver and gift receiver).
b. The experimental units are the study participants. The response variable is the level of appreciation, measured on a scale from 1 to 7. There is one factor, role. There are two levels of role and, thus, two treatments: gift giver and gift receiver.

9.10 a. The response variable in this problem is the consumer's opinion on the value of the discount offer.
b. There are two treatments in this problem: within-store price promotion and between-store price promotion.
c. The experimental units are the consumers.

9.11 a. There are 2 factors in this problem, each with 2 levels. Thus, there are a total of $2 \times 2 = 4$ treatments.
b. The 4 treatments are: (Within-store, home), (Within-store, in store), (Between-store, home), and (Between-store, in store).

9.12 a. There are 2 factors in the problem: type of yeast and temperature. Type of yeast has 2 levels: brewer's yeast and baker's yeast. Temperature has 4 levels: 45°, 48°, 51°, and 54°C.
b. The response variable is the autolysis yield.
c. There are a total of $2 \times 4 = 8$ treatments in this experiment. The treatments are all the type of yeast–temperature combinations.
d. This is a designed experiment.

9.13 a. The experimental units for this study are the students in the introductory psychology class.
b. The study is a designed experiment because the students are randomly assigned to a particular study group.
c.
There are 2 factors in this problem: class standing and study group.
d. Class standing has 3 levels: low, medium, and high. Study group has 2 levels: practice exam and review.
e. There are a total of $3 \times 2 = 6$ treatments. They are: (Low, Review), (Low, Practice exam), (Medium, Review), (Medium, Practice exam), (High, Review), and (High, Practice exam).
f. The response variable is the final exam score.

9.14 a. The dependent variable is the dissolution time.
b. There are 3 factors in this experiment: binding agent, binding concentration, and relative density. Binding agent has 2 levels: khaya gum and PVP. Binding concentration has 2 levels: .5% and 4.0%. Relative density has 2 levels: high and low.
c. There could be a total of $2 \times 2 \times 2 = 8$ treatments for this experiment. They are:

khaya gum, .5%, high       PVP, .5%, high
khaya gum, .5%, low        PVP, .5%, low
khaya gum, 4.0%, high      PVP, 4.0%, high
khaya gum, 4.0%, low       PVP, 4.0%, low

9.15 a. From Table VI with $\nu_1 = 4$ and $\nu_2 = 4$, $F_{.05} = 6.39$.
b. From Table VIII with $\nu_1 = 4$ and $\nu_2 = 4$, $F_{.01} = 15.98$.
c. From Table V with $\nu_1 = 30$ and $\nu_2 = 40$, $F_{.10} = 1.54$.
d. From Table VII with $\nu_1 = 15$ and $\nu_2 = 12$, $F_{.025} = 3.18$.

9.16 a. $P(F \le 3.48) = 1 - .05 = .95$, using Table VI, Appendix D, with $\nu_1 = 5$ and $\nu_2 = 9$.
b. $P(F > 3.09) = .01$, using Table VIII, Appendix D, with $\nu_1 = 15$ and $\nu_2 = 20$.
c. $P(F > 2.40) = .05$, using Table VI, Appendix D, with $\nu_1 = 15$ and $\nu_2 = 15$.
d. $P(F \le 1.83) = 1 - .10 = .90$, using Table V, Appendix D, with $\nu_1 = 8$ and $\nu_2 = 40$.

9.17 a. In dot diagram #2, the difference between the sample means is small relative to the variability within the sample observations. In dot diagram #1, the values in each of the samples are grouped together with a range of 4, while in diagram #2 the range of values is 8.

b. For diagram #1,
$$\bar{x}_1 = \frac{\sum x_1}{n_1} = \frac{7 + 8 + 9 + 9 + 10 + 11}{6} = \frac{54}{6} = 9 \qquad \bar{x}_2 = \frac{\sum x_2}{n_2} = \frac{12 + 13 + 14 + 14 + 15 + 16}{6} = \frac{84}{6} = 14$$

For diagram #2,
$$\bar{x}_1 = \frac{5 + 5 + 7 + 11 + 13 + 13}{6} = \frac{54}{6} = 9 \qquad \bar{x}_2 = \frac{10 + 10 + 12 + 16 + 18 + 18}{6} = \frac{84}{6} = 14$$

c. For diagram #1, $\bar{x} = \dfrac{\sum x}{n} = \dfrac{54 + 84}{12} = 11.5$ and
$$SST = \sum_{i=1}^{2} n_i(\bar{x}_i - \bar{x})^2 = 6(9 - 11.5)^2 + 6(14 - 11.5)^2 = 75$$

For diagram #2,
$$SST = \sum_{i=1}^{2} n_i(\bar{x}_i - \bar{x})^2 = 6(9 - 11.5)^2 + 6(14 - 11.5)^2 = 75$$

d. For diagram #1,
$$s_1^2 = \frac{\sum x_1^2 - \frac{(\sum x_1)^2}{n_1}}{n_1 - 1} = \frac{496 - \frac{54^2}{6}}{6 - 1} = 2 \qquad s_2^2 = \frac{\sum x_2^2 - \frac{(\sum x_2)^2}{n_2}}{n_2 - 1} = \frac{1{,}186 - \frac{84^2}{6}}{6 - 1} = 2$$
$$SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 = (6 - 1)2 + (6 - 1)2 = 20$$

For diagram #2,
$$s_1^2 = \frac{558 - \frac{54^2}{6}}{6 - 1} = 14.4 \qquad s_2^2 = \frac{1{,}248 - \frac{84^2}{6}}{6 - 1} = 14.4$$
$$SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 = (6 - 1)14.4 + (6 - 1)14.4 = 144$$

e. For diagram #1, SS(Total) = SST + SSE = 75 + 20 = 95. SST is $\dfrac{SST}{SS(\text{Total})} \times 100\% = \dfrac{75}{95} \times 100\% = 78.95\%$ of SS(Total).

For diagram #2, SS(Total) = SST + SSE = 75 + 144 = 219. SST is $\dfrac{75}{219} \times 100\% = 34.25\%$ of SS(Total).

f. For diagram #1, $MST = \dfrac{SST}{k - 1} = \dfrac{75}{2 - 1} = 75$, $MSE = \dfrac{SSE}{n - k} = \dfrac{20}{12 - 2} = 2$, and $F = \dfrac{MST}{MSE} = \dfrac{75}{2} = 37.5$.

For diagram #2, $MST = \dfrac{75}{2 - 1} = 75$, $MSE = \dfrac{144}{12 - 2} = 14.4$, and $F = \dfrac{75}{14.4} = 5.21$.

g. The rejection region for both diagrams requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = k - 1 = 2 - 1 = 1$ and $\nu_2 = n - k = 12 - 2 = 10$. From Table VI, Appendix D, $F_{.05} = 4.96$. The rejection region is $F > 4.96$.

For diagram #1, since the observed value of the test statistic falls in the rejection region ($F = 37.5 > 4.96$), $H_0$ is rejected. There is sufficient evidence to indicate the samples were drawn from populations with different means at $\alpha = .05$.

For diagram #2, since the observed value of the test statistic falls in the rejection region ($F = 5.21 > 4.96$), $H_0$ is rejected. There is sufficient evidence to indicate the samples were drawn from populations with different means at $\alpha = .05$.

h. We must assume both populations are normally distributed with a common variance.

9.18 For each dot diagram, we want to test:

$H_0: \mu_1 = \mu_2$
$H_a: \mu_1 \neq \mu_2$

From Exercise 9.17:

Diagram #1: $\bar{x}_1 = 9$, $\bar{x}_2 = 14$, $s_1^2 = 2$, $s_2^2 = 2$
Diagram #2: $\bar{x}_1 = 9$, $\bar{x}_2 = 14$, $s_1^2 = 14.4$, $s_2^2 = 14.4$

a.
Since $n_1 = n_2 = 6$, the pooled variance $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ reduces to the average of the two sample variances.

Diagram #1: $s_p^2 = \dfrac{s_1^2 + s_2^2}{2} = \dfrac{2 + 2}{2} = 2$. In Exercise 9.17, MSE = 2.
Diagram #2: $s_p^2 = \dfrac{s_1^2 + s_2^2}{2} = \dfrac{14.4 + 14.4}{2} = 14.4$. In Exercise 9.17, MSE = 14.4.

The pooled variance for the two-sample t-test is the same as the MSE for the F-test.

b. Diagram #1:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{9 - 14}{\sqrt{2\left(\frac{1}{6} + \frac{1}{6}\right)}} = -6.12$$
In Exercise 9.17, $F = 37.5$.

Diagram #2:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{9 - 14}{\sqrt{14.4\left(\frac{1}{6} + \frac{1}{6}\right)}} = -2.28$$
In Exercise 9.17, $F = 5.21$.

The test statistic for the F-test is the square of the test statistic for the t-test: $(-6.12)^2 \approx 37.5$ and $(-2.28)^2 \approx 5.21$, up to rounding.

c. Diagram #1: For the t-test, the rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t-distribution with df $= n_1 + n_2 - 2 = 6 + 6 - 2 = 10$. From Table III, Appendix D, $t_{.025} = 2.228$. The rejection region is $t < -2.228$ or $t > 2.228$.

Diagram #2: For the t-test, the rejection region is the same as for Diagram #1, since we are using the same $\alpha$, $n_1$, and $n_2$ for both tests.

In Exercise 9.17, the rejection region for both diagrams using the F-test is $F > 4.96$. The tabled F value equals the square of the tabled t value: $2.228^2 \approx 4.96$.

d. Diagram #1: For the t-test, since the test statistic falls in the rejection region ($t = -6.12 < -2.228$), we would reject $H_0$. In Exercise 9.17, using the F-test, we rejected $H_0$.

Diagram #2: For the t-test, since the test statistic falls in the rejection region ($t = -2.28 < -2.228$), we would reject $H_0$. In Exercise 9.17, using the F-test, we rejected $H_0$.

e. Assumptions for the t-test:
1. Both populations have relative frequency distributions that are approximately normal.
2. The two population variances are equal.
3. Samples are selected randomly and independently from the populations.

Assumptions for the F-test:
1. Both population probability distributions are normal.
2. The two population variances are equal.
3. Samples are selected randomly and independently from the respective populations.

The assumptions are the same for both tests.
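The calculations in Exercises 9.17 and 9.18 can be reproduced directly from the dot-diagram data. This is a minimal sketch that uses only the Python standard library. The helper names `one_way_anova` and `pooled_t` are hypothetical and chosen for this illustration. The script recomputes SST, SSE, MSE, and F for each diagram and confirms that, with two groups, the squared pooled t statistic equals the ANOVA F statistic.

```python
from statistics import mean

def one_way_anova(*groups):
    """Return SST, SSE, MST, MSE and F for a completely randomized design."""
    all_obs = [x for g in groups for x in g]
    n, k = len(all_obs), len(groups)
    grand_mean = mean(all_obs)
    # Treatment sum of squares: sum of n_i * (xbar_i - xbar)^2
    sst = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Error sum of squares: squared deviations from each group's own mean
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    mst, mse = sst / (k - 1), sse / (n - k)
    return sst, sse, mst, mse, mst / mse

def pooled_t(g1, g2):
    """Two-sample t statistic for H0: mu1 = mu2 using the pooled variance."""
    n1, n2 = len(g1), len(g2)
    ss1 = sum((x - mean(g1)) ** 2 for x in g1)
    ss2 = sum((x - mean(g2)) ** 2 for x in g2)
    sp2 = (ss1 + ss2) / (n1 + n2 - 2)
    return (mean(g1) - mean(g2)) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5

# Dot-diagram data from Exercise 9.17
diagram1 = ([7, 8, 9, 9, 10, 11], [12, 13, 14, 14, 15, 16])
diagram2 = ([5, 5, 7, 11, 13, 13], [10, 10, 12, 16, 18, 18])

for label, (g1, g2) in (("#1", diagram1), ("#2", diagram2)):
    sst, sse, mst, mse, f = one_way_anova(g1, g2)
    t = pooled_t(g1, g2)
    # For k = 2 groups, t * t should match F (up to floating-point error)
    print(label, sst, sse, mse, round(f, 2), round(t, 2), round(t * t, 2))
```

The critical values ($F_{.05} = 4.96$ and $t_{.025} = 2.228$) still come from Tables VI and III. The design choice here is to compute SSE from deviations about each group mean rather than with the shortcut formula, which keeps the code close to the definition of SSE.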
9.19 Refer to Exercise 9.17. The ANOVA tables are:

For diagram #1:
Source      df    SS    MS     F
Treatment    1    75    75   37.5
Error       10    20     2
Total       11    95

For diagram #2:
Source      df    SS     MS     F
Treatment    1    75     75   5.21
Error       10   144   14.4
Total       11   219

9.20 a. $SSE = SS(\text{Total}) - SST = 46.5 - 17.5 = 29.0$

df for Error is $41 - 6 = 35$

$$MST = \frac{SST}{k - 1} = \frac{17.5}{6} = 2.9167 \qquad MSE = \frac{SSE}{n - k} = \frac{29.0}{35} = .8286 \qquad F = \frac{MST}{MSE} = \frac{2.9167}{.8286} = 3.52$$

The ANOVA table is:
Source      df    SS       MS       F
Treatment    6   17.5   2.9167   3.52
Error       35   29.0    .8286
Total       41   46.5

b. The number of treatments is $k$. We know $k - 1 = 6$, so $k = 7$.

c. The total sample size is $n = 41 + 1 = 42$, where 41 = df(Total).

d. First, number the 42 experimental units from 1 to 42. Then generate over 100 uniform random numbers from 1 to 42 (more than 42 are needed because duplicates are discarded). The first 6 different random numbers correspond to treatment 1, and the next 6 different random numbers correspond to treatment 2. Repeat the process for treatments 3, 4, 5, 6, and 7.

e. To determine if there is a difference among the population means, we test:

$H_0: \mu_1 = \mu_2 = \cdots = \mu_7$
$H_a$: At least one of the population means differs from the rest

The test statistic is $F = 3.52$.

The rejection region requires $\alpha = .10$ in the upper tail of the F-distribution with numerator df $\nu_1 = k - 1 = 7 - 1 = 6$ and denominator df $\nu_2 = n - k = 42 - 7 = 35$. From Table V, Appendix D, $F_{.10} \approx 1.98$. The rejection region is $F > 1.98$.

Since the observed value of the test statistic falls in the rejection region ($F = 3.52 > 1.98$), $H_0$ is rejected. There is sufficient evidence to indicate a difference among the population means at $\alpha = .10$.

f. The observed significance level is $P(F \ge 3.52)$. Using MINITAB,

Cumulative Distribution Function
F distribution with 6 DF in numerator and 35 DF in denominator

   x   P( X <= x )
3.52      0.992128

$P(F \ge 3.52) = 1 - .992128 = .007872$.

g.
$H_0: \mu_1 = \mu_2$
$H_a: \mu_1 \neq \mu_2$

The test statistic is

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{MSE\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{3.7 - 4.1}{\sqrt{.8286\left(\frac{1}{6} + \frac{1}{6}\right)}} = -.76$$

The rejection region requires $\alpha/2 = .10/2 = .05$ in each tail of the t-distribution with df $= n - k = 35$. From Table III, Appendix D, $t_{.05} \approx 1.697$. The rejection region is $t < -1.697$ or $t > 1.697$.

Since the observed value of the test statistic does not fall in the rejection region ($t = -.76 > -1.697$), $H_0$ is not rejected. There is insufficient evidence to indicate that $\mu_1$ and $\mu_2$ differ at $\alpha = .10$.

h. For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table III, Appendix D, with df = 35, $t_{.05} \approx 1.697$. The confidence interval is:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{.05}\sqrt{MSE\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \Rightarrow (3.7 - 4.1) \pm 1.697\sqrt{.8286\left(\frac{1}{6} + \frac{1}{6}\right)} \Rightarrow -.4 \pm .892 \Rightarrow (-1.292, .492)$$

i. The confidence interval for $\mu_1$ is:

$$\bar{x}_1 \pm t_{.05}\sqrt{\frac{MSE}{n_1}} \Rightarrow 3.7 \pm 1.697\sqrt{\frac{.8286}{6}} \Rightarrow 3.7 \pm .631 \Rightarrow (3.069, 4.331)$$

9.21 a. Using MINITAB, the results are:

One-way ANOVA: T1, T2, T3

Source  DF     SS    MS     F      P
Factor   2  12.30  6.15  2.93  0.105
Error    9  18.89  2.10
Total   11  31.19

S = 1.449   R-Sq = 39.44%   R-Sq(adj) = 25.98%

b. $H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: At least two treatment means differ

The test statistic is $F = 2.93$ and the p-value is $p = .105$.

Since the p-value is not less than $\alpha$ ($p = .105 > .01$), $H_0$ is not rejected. There is insufficient evidence to indicate a difference in the treatment means at $\alpha = .01$.

9.22 a. The type of design used was a completely randomized design.

b. The dependent variable is the decrease in the number of promotional cards sold after implementation of the pay cuts.

c. There is one factor in this example, type of pay cut. The factor levels are: unilateral wage cut, general wage cut, and baseline.

d. Let $\mu_1$ = mean decrease in cards sold for those receiving the "unilateral wage cut", $\mu_2$ = mean decrease in cards sold for those receiving the "general wage cut", and $\mu_3$ = mean decrease in cards sold for those receiving the "baseline".
To determine if the average decrease in cards sold differs depending on whether one or more of the workers received a pay cut, we test:

$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: At least two treatment means differ

e. Since the p-value is less than $\alpha$ ($p = .001 < .01$), $H_0$ is rejected. There is sufficient evidence to indicate the average decrease in the number of cards sold differs depending on whether one or more of the workers received a pay cut at $\alpha = .01$.

9.23 a. A completely randomized design was used for this study. The experimental units are the bus customers. The dependent variable is the performance score. There is one factor, bus depot, with 3 levels: Depot 1, Depot 2, and Depot 3. These factor levels are the treatments of the experiment.

b. Yes. The p-value from the ANOVA F-test was $p = .0001$. For a 95% confidence level, $\alpha = .05$. Since the p-value is less than $\alpha$ ($p = .0001 < .05$), $H_0$ is rejected. There is sufficient evidence to indicate the mean customer performance scores differed across the three bus depots at $\alpha = .05$.

9.24 a. This is a completely randomized design because the subjects were randomly assigned to one of three groups.

b. The response variable was the total WTP (willingness to pay) value, and the treatments were the 3 types of instructions given.

c. To determine if the mean total WTP values differed among the three groups, we test:

$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: At least two treatment means differ

d. Number the subjects from 1 to 252. Then, use a random number generator to generate 350 to 400 random numbers from 1 to 252. (We need to generate more than 252 random numbers to account for duplicates.) The first 84 different random numbers will be assigned to group 1, the next 84 different random numbers will be assigned to group 2, and the rest will be assigned to group 3.

9.25 a.
To determine if the mean LUST discount percentages across the seven states differ, we test:

$H_0: \mu_1 = \mu_2 = \cdots = \mu_7$
$H_a$: At least two treatment means differ

b. From the ANOVA table, the test statistic is $F = 1.60$ and the p-value is $p = 0.174$. Since the observed p-value is not less than $\alpha$ ($p = .174 > .10$), $H_0$ is not rejected. There is insufficient evidence to indicate a difference in the mean LUST discount percentages among the seven states at $\alpha = .10$.

9.26 a. To determine if differences exist in the mean rates of return among the three types of fund groups, we test:

$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: At least two treatment means differ

b. The rejection region requires $\alpha = .01$ in the upper tail of the F-distribution with $\nu_1 = k - 1 = 3 - 1 = 2$ and $\nu_2 = n - k = 90 - 3 = 87$. Using MINITAB,

Inverse Cumulative Distribution Function
F distribution with 2 DF in numerator and 87 DF in denominator

P( X <= x )         x
       0.99   4.85777

The rejection region is $F > 4.86$.

c. Since the observed value of the test statistic falls in the rejection region ($F = 6.965 > 4.86$), $H_0$ is rejected. There is sufficient evidence to indicate differences exist in the mean rates of return among the three types of fund groups at $\alpha = .01$.

9.27 To determine if the mean road rage score differs for the three income groups, we test:

$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: At least two treatment means differ

The test statistic is $F = 3.90$ and the p-value is $p = .01$. Since the p-value is less than $\alpha = .05$, $H_0$ is rejected. There is sufficient evidence to indicate the mean road rage score differs for the three income groups for any value of $\alpha > .01$. Since the sample means increase as income increases, it appears that road rage increases as income increases.

9.28 a. The experimental units are the participants in the study.

b. The dependent variable is the brand recall score.

c. There is one factor in this study, TV viewing group. Since there is only one factor, the treatments correspond to the factor levels of this variable. Thus, the treatments are the same as the three levels of TV viewing group.
These 3 levels are violent content code, sex content code, and neutral TV.

d. The means given are only sample means. If new samples were selected and sample means computed, the values and order of the sample means could change. In addition, the variances are not taken into account.

e. The test statistic is $F = 20.45$ and the p-value is $p = 0.000$.

f. Since the p-value is less than $\alpha$ ($p = 0.000 < .01$), $H_0$ is rejected. There is sufficient evidence to indicate differences in the mean recall scores among the three viewing groups at $\alpha = .01$. The researchers can conclude that the content of the TV show affects the recall of embedded commercials.

g. Using MINITAB, the histograms of the three viewing groups are:

[Figure: Histogram of VIOLENT, SEX, NEUTRAL (frequency vs. recall score, each panel with a fitted normal curve). Legend: VIOLENT Mean 2.083, StDev 1.730, N 108; SEX Mean 1.713, StDev 1.664, N 108; NEUTRAL Mean 3.167, StDev 1.811, N 108.]

The assumptions for ANOVA are that the data are approximately normal and that the variances of the groups are the same. From the legend above, the standard deviations are 1.730, 1.664, and 1.811, which are all very similar. From the plots, the distributions of the violent group and the neutral group are fairly normal. The distribution of the sex group is skewed to the right and may not be normal.

9.29 a. This was a completely randomized design.

b. The experimental units are the college students. The dependent variable is the attitude toward tanning score, and the treatments are the 3 conditions (view product advertisement with models with a tan, view product advertisement with models with no tan, and view product advertisement with no model).

c.
Let $\mu_1$ = mean attitude score for those viewing the product advertisement with models with a tan, $\mu_2$ = mean attitude score for those viewing the product advertisement with models without a tan, and $\mu_3$ = mean attitude score for those viewing the product advertisement with no models. To determine if the treatment mean scores differ among the three groups, we test:

$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: At least two treatment means differ

d. These are just sample means. To determine if the population means differ, we must take the variability within the groups into account, that is, determine how many standard deviations apart these sample means are. In addition, if the experiment were conducted again, the sample means could change.

e. The hypotheses are:

$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: At least two treatment means differ

The test statistic is $F = 3.60$ and the p-value is $p = .03$. Since the p-value is less than $\alpha$ ($p = .03 < .05$), $H_0$ is rejected. There is sufficient evidence to indicate a difference in the mean attitude scores among the three groups at $\alpha = .05$.

f. We must assume that we have random samples from approximately normal populations with equal variances.

9.30 a. To determine if the mean knowledge gain differs among the three groups, we test:

$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: At least two treatment means differ

b. Using MINITAB, the results are:

One-way ANOVA: NO, CHECK, FULL

Source  DF      SS    MS     F      P
Factor   2    6.64  3.32  0.45  0.637
Error   72  527.36  7.32
Total   74  534.00

S = 2.706   R-Sq = 1.24%   R-Sq(adj) = 0.00%

c. The test statistic is $F = 0.45$ and the p-value is $p = 0.637$. Since the p-value ($p = 0.637$) is larger than any reasonable significance level, $H_0$ is not rejected. There is insufficient evidence to indicate a difference in the mean knowledge gained among the three levels of assistance for any reasonable value of $\alpha$. Practically speaking, there is no evidence that one type of assistance helps students more than another.

9.31 a.
To determine if the mean level of trust differs among the six treatments, we test:

$H_0: \mu_1 = \mu_2 = \cdots = \mu_6$
$H_a$: At least two treatment means differ

b. The test statistic is $F = 2.21$.

The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = k - 1 = 6 - 1 = 5$ and $\nu_2 = n - k = 230 - 6 = 224$. Using MINITAB,

Inverse Cumulative Distribution Function
F distribution with 5 DF in numerator and 231 DF in denominator

P( X <= x )         x
       0.95   2.25436

The rejection region is $F > 2.25$.

Since the observed value of the test statistic does not fall in the rejection region ($F = 2.21 < 2.25$), $H_0$ is not rejected. There is insufficient evidence to indicate that at least two mean trust levels differ at $\alpha = .05$.

c. We must assume that all six samples are drawn from normal populations, that the six population variances are the same, and that the samples are independent.

d. I would classify this experiment as designed. Each subject was randomly assigned to receive one of the six scenarios.

9.32 a. I would classify this experiment as designed. Each subject was randomly assigned to receive one of the three dosages (DM, honey, nothing). There are 3 treatments in the study corresponding to the 3 dosages: DM, honey, and nothing.

b. Using MINITAB, the output is:

One-way ANOVA: TotalScore versus Treatment

Source      DF       SS      MS      F      P
Treatment    2   318.51  159.25  17.51  0.000
Error      102   927.72    9.10
Total      104  1246.23

S = 3.016   R-Sq = 25.56%   R-Sq(adj) = 24.10%

To determine if differences exist in the mean improvement scores among the 3 treatment groups, we test:

$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: At least two treatment means differ

The test statistic is $F = 17.51$ and the p-value is $p = 0.000$. Since the observed p-value ($p = 0.000$) is less than any reasonable value of $\alpha$, $H_0$ is rejected. There is sufficient evidence to indicate a difference in the mean improvement scores among the three levels of dosage for any reasonable value of $\alpha$.
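Every remaining column of a one-way ANOVA printout can be recovered from its sums of squares and degrees of freedom, as was done by hand in Exercise 9.20. This is a minimal sketch in standard-library Python. The function name `anova_from_ss` is hypothetical. The adjusted $R^2$ uses the usual definition $1 - MSE/[SS(\text{Total})/(n - 1)]$, and we assume this is the definition behind MINITAB's R-Sq(adj). Applied to the SS and df in the Exercise 9.32 printout, the script should reproduce the printed F, S, R-Sq, and R-Sq(adj).

```python
def anova_from_ss(ss_treatment, ss_error, df_treatment, df_error):
    """Fill in a one-way ANOVA table from its sums of squares and df."""
    mst = ss_treatment / df_treatment
    mse = ss_error / df_error
    ss_total = ss_treatment + ss_error
    df_total = df_treatment + df_error          # equals n - 1
    return {
        "MST": mst,
        "MSE": mse,
        "F": mst / mse,
        "S": mse ** 0.5,                        # pooled standard deviation
        "R-Sq": ss_treatment / ss_total,        # share of SS(Total) explained
        "R-Sq(adj)": 1 - mse / (ss_total / df_total),
    }

# Sums of squares and degrees of freedom from the Exercise 9.32 printout
table = anova_from_ss(318.51, 927.72, 2, 102)
for name, value in table.items():
    print(f"{name}: {value:.4f}")

# The same helper reproduces the F statistic of Exercise 9.20(a)
print(round(anova_from_ss(17.5, 29.0, 6, 35)["F"], 2))
```

Only the arithmetic is shown here. The p-value in the printout requires the F distribution's CDF, which the Python standard library does not provide.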
9.33 To determine if the mean THICKNESS differs among the 4 types of housing, we test:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: At least two treatment means differ
The test statistic is $F = 11.74$ and the p-value is $p = 0.000$. Since the observed p-value ($p = 0.000$) is less than any reasonable value of $\alpha$, $H_0$ is rejected. There is sufficient evidence to indicate a difference in the mean thickness among the four levels of housing for any reasonable value of $\alpha$.

To determine if the mean WHIPPING CAPACITY differs among the 4 types of housing, we test:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: At least two treatment means differ
The test statistic is $F = 31.36$ and the p-value is $p = 0.000$. Since the observed p-value ($p = 0.000$) is less than any reasonable value of $\alpha$, $H_0$ is rejected. There is sufficient evidence to indicate a difference in the mean whipping capacity among the four levels of housing for any reasonable value of $\alpha$.

To determine if the mean STRENGTH differs among the 4 types of housing, we test:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: At least two treatment means differ
The test statistic is $F = 1.70$ and the p-value is $p = 0.193$. Since the observed p-value ($p = 0.193$) is higher than any reasonable value of $\alpha$, $H_0$ is not rejected. There is insufficient evidence to indicate a difference in the mean strength among the four levels of housing for any reasonable value of $\alpha$.

Thus, the mean thickness and the mean whipping capacity (percent overrun) differ among the 4 housing systems.

9.34 a. $\bar{x} = \frac{\sum_{i=1}^{3} n_i \bar{x}_i}{n} = \frac{26(10.5) + 25(3.9) + 25(1.4)}{76} = \frac{405.5}{76} = 5.3355$
$SST = \sum_{i=1}^{3} n_i (\bar{x}_i - \bar{x})^2 = 26(10.5 - 5.3355)^2 + 25(3.9 - 5.3355)^2 + 25(1.4 - 5.3355)^2 = 1{,}132.1941$

b. $SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2 = (26 - 1)(7.6)^2 + (25 - 1)(7.5)^2 + (25 - 1)(7.5)^2 = 1{,}444 + 1{,}350 + 1{,}350 = 4{,}144$

c. $SS(\text{Total}) = SST + SSE = 1{,}132.1941 + 4{,}144 = 5{,}276.1941$
$MST = \frac{SST}{k - 1} = \frac{1132.1941}{3 - 1} = 566.0971$
$MSE = \frac{SSE}{n - k} = \frac{4144}{76 - 3} = 56.7671$
$F = \frac{MST}{MSE} = \frac{566.0971}{56.7671} = 9.97$
The ANOVA table is:
Source   df       SS         MS      F-value
Groups    2   1132.1941   566.0971    9.97
Error    73   4144.00      56.77
Total    75   5276.1941

d. To determine if the mean drops in anxiety levels differ among the 3 groups, we test:
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: At least two treatment means differ
The test statistic is $F = 9.97$. The rejection region requires $\alpha = .01$ in the upper tail of the F-distribution with $\nu_1 = k - 1 = 3 - 1 = 2$ and $\nu_2 = n - k = 76 - 3 = 73$. From Table VIII, Appendix D, $F_{.01} \approx 4.92$. The rejection region is $F > 4.92$. Since the observed value of the test statistic falls in the rejection region ($F = 9.97 > 4.92$), $H_0$ is rejected. There is sufficient evidence to indicate a difference in the mean drops in anxiety levels among the three groups at $\alpha = .01$.

e. The assumption of constant variance appears to be satisfied since the three sample variances are all very similar ($7.6^2 = 57.76$, $7.5^2 = 56.25$, and $7.5^2 = 56.25$). We are unable to check the normality assumption because we would need the individual drops in anxiety levels to create a histogram or stem-and-leaf plot.

9.35 The number of pairwise comparisons is equal to $k(k - 1)/2$.
a. For $k = 3$, the number of comparisons is $3(3 - 1)/2 = 3$.
b. For $k = 5$, the number of comparisons is $5(5 - 1)/2 = 10$.
c. For $k = 4$, the number of comparisons is $4(4 - 1)/2 = 6$.
d. For $k = 10$, the number of comparisons is $10(10 - 1)/2 = 45$.

9.36 The experimentwise error rate is the probability of making a Type I error in at least one of all the comparisons made. If the experimentwise error rate is $\alpha = .05$, then each individual comparison is made at a value of $\alpha$ that is less than .05.

9.37 A comparisonwise error rate is the error rate (the probability of declaring the means different when, in fact, they are not different, which is also the probability of a Type I error) for each individual comparison. That is, if each comparison is run using $\alpha = .05$, then the comparisonwise error rate is .05.

9.38 a.
From the diagram, the following pairs of treatments are significantly different because they are not connected by a line: A and E, A and B, A and D, C and E, C and B, C and D, and E and D. All other pairs of means are not significantly different because they are connected by a line.
b. From the diagram, the following pairs of treatments are significantly different because they are not connected by a line: A and B, A and D, C and B, C and D, E and B, E and D, and B and D. All other pairs of means are not significantly different because they are connected by a line.
c. From the diagram, the following pairs of treatments are significantly different because they are not connected by a line: A and E, A and B, and A and D. All other pairs of means are not significantly different because they are connected by a line.
d. From the diagram, the following pairs of treatments are significantly different because they are not connected by a line: A and E, A and B, A and D, C and E, C and B, C and D, E and D, and B and D. All other pairs of means are not significantly different because they are connected by a line.

9.39 $(\mu_1 - \mu_2)$: (2, 15). Since all values in the interval are positive, $\mu_1$ is significantly greater than $\mu_2$.
$(\mu_1 - \mu_3)$: (4, 7). Since all values in the interval are positive, $\mu_1$ is significantly greater than $\mu_3$.
$(\mu_1 - \mu_4)$: (-10, 3). Since 0 is in the interval, $\mu_1$ is not significantly different from $\mu_4$. However, since the center of the interval is less than 0, the sample mean for treatment 4 is larger than that for treatment 1.
$(\mu_2 - \mu_3)$: (-5, 11). Since 0 is in the interval, $\mu_2$ is not significantly different from $\mu_3$. However, since the center of the interval is greater than 0, the sample mean for treatment 2 is larger than that for treatment 3.
$(\mu_2 - \mu_4)$: (-12, -6). Since all values in the interval are negative, $\mu_4$ is significantly greater than $\mu_2$.
$(\mu_3 - \mu_4)$: (-8, -5). Since all values in the interval are negative, $\mu_4$ is significantly greater than $\mu_3$.
Thus, the largest mean is $\mu_4$, followed by $\mu_1$, $\mu_2$, and $\mu_3$.
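The decision rule used in 9.39 (an interval entirely above 0 means the first mean is larger, an interval entirely below 0 means the second mean is larger, and an interval containing 0 means no significant difference) is mechanical enough to sketch in a few lines of Python. The helper `compare` is a hypothetical name; it only classifies each interval, using the 9.39 endpoints.

```python
def compare(lower, upper):
    """Classify a confidence interval for the difference mu_i - mu_j."""
    if lower > 0:
        return "mu_i > mu_j"                # every plausible difference is positive
    if upper < 0:
        return "mu_i < mu_j"                # every plausible difference is negative
    return "no significant difference"      # 0 lies inside the interval


# Intervals from Exercise 9.39, keyed by (i, j) for mu_i - mu_j
intervals = {
    (1, 2): (2, 15),
    (1, 3): (4, 7),
    (1, 4): (-10, 3),
    (2, 3): (-5, 11),
    (2, 4): (-12, -6),
    (3, 4): (-8, -5),
}

decisions = {}
for (i, j), (lo, hi) in intervals.items():
    decisions[(i, j)] = compare(lo, hi)
    center = (lo + hi) / 2                  # point estimate of mu_i - mu_j
    print(f"mu{i} - mu{j}: ({lo}, {hi}) -> {decisions[(i, j)]}; center = {center}")
```

The center of each interval is printed only as a descriptive point estimate; it does not change the significance decision.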
9.40 a. The number of pairwise comparisons is $c = \frac{k(k - 1)}{2} = \frac{3(3 - 1)}{2} = \frac{6}{2} = 3$. These are: $(\mu_{\text{general}} - \mu_{\text{unilateral}})$, $(\mu_{\text{general}} - \mu_{\text{baseline}})$, and $(\mu_{\text{unilateral}} - \mu_{\text{baseline}})$.
b. A multiple comparison procedure is recommended to keep the experimentwise error rate at the selected level.
c. Since the confidence interval contains only positive values, there is evidence of a significant difference in the average decrease in promotional cards sold. Since the values are positive, the average decrease for the baseline is greater than the average decrease for the general wage cut.

9.41 Since all confidence intervals contain only positive values, there is evidence that all population means are different. The largest mean is for Depot 1, the next highest is for Depot 2, and the lowest is for Depot 3.

9.42 a. The test statistic is $F = 22.68$ and the p-value is $p = 0.001$. Since the observed p-value ($p = 0.001$) is less than any reasonable level we select (.01, .05, or .10), we reject $H_0$. There is sufficient evidence to indicate a difference in the mean number of alternatives listed among the three emotional states for any $\alpha > .001$.
b. The probability of declaring at least one pair of means different when they are not is .05.
c. The mean number of alternatives listed under the guilty state is significantly higher than the mean numbers of alternatives listed under the angry and neutral states. There is no significant difference in the mean number of alternatives listed under the angry and neutral states.

9.43 a. Tukey's multiple comparison method is preferred over other methods because it controls the experimentwise error rate at the chosen level. It is also more powerful than the other methods when all pairwise comparisons are made.
b. From the confidence interval comparing large-cap and medium-cap mutual funds, we find that 0 is in the interval. Thus, 0 is not an unusual value for the difference in the mean rates of return between large-cap and medium-cap mutual funds. This means we would not reject $H_0$. There is insufficient evidence of a difference in mean rates of return between large-cap and medium-cap mutual funds at $\alpha = .05$.
c. From the confidence interval comparing large-cap and small-cap mutual funds, we find that 0 is not in the interval. Thus, 0 is an unusual value for the difference in the mean rates of return between large-cap and small-cap mutual funds. This means we would reject $H_0$. There is sufficient evidence of a difference in mean rates of return between large-cap and small-cap mutual funds at $\alpha = .05$.
d. From the confidence interval comparing medium-cap and small-cap mutual funds, we find that 0 is in the interval. Thus, 0 is not an unusual value for the difference in the mean rates of return between medium-cap and small-cap mutual funds. This means we would not reject $H_0$. There is insufficient evidence of a difference in mean rates of return between medium-cap and small-cap mutual funds at $\alpha = .05$.
e. From the above, the mean rate of return for large-cap mutual funds is the largest, followed by medium-cap, followed by small-cap mutual funds. The mean rate of return for large-cap funds is significantly larger than that for small-cap funds. No other significant differences exist.
f. We are 95% confident of these decisions.

9.44 The mean attitude score for those viewing the product advertisement with models with no tan was significantly lower than the mean attitude scores of the other two groups. There is no significant difference in the mean attitude scores between those viewing the product advertisement with models with tans and those viewing the product advertisement with no models. This indicates that the type of product advertisement can influence a consumer's attitude toward tanning.

9.45 a. The probability of declaring at least one pair of means different when they are not is .01.
b. There are a total of k(k - 1)/2 = 3(3 - 1)/2 = 3 pairwise comparisons. They are:
'Under $30 thousand' to 'Between $30 and $60 thousand'
'Under $30 thousand' to 'Over $60 thousand'
'Between $30 and $60 thousand' to 'Over $60 thousand'
c. Means for groups in homogeneous subsets are displayed in the table:
Income Group        N    Subset 1   Subset 2
Under $30,000      379     4.60
$30,000-$60,000    392                5.08
Over $60,000       267                5.15
d. Two of the comparisons in part b will yield confidence intervals that do not contain 0. They are:
'Under $30 thousand' to 'Between $30 and $60 thousand'
'Under $30 thousand' to 'Over $60 thousand'

9.46 a. The total number of pairwise comparisons made in the Bonferroni analysis is $k(k - 1)/2 = 3(3 - 1)/2 = 3$.
b. The confidence interval for comparing the V and S groups is (-.923, .183). (Violence is subtracted from Sex.) Since the confidence interval contains 0, there is no evidence of a difference in mean recall between the V and S groups at $\alpha = .05$.
c. The confidence interval for comparing the V and N groups is (.530, 1.636). (Violence is subtracted from Neutral.) Since the confidence interval does not contain 0, there is evidence of a difference in mean recall between the V and N groups at $\alpha = .05$. Since both endpoints are positive, the mean recall for the Neutral group is significantly higher than that of the V group.
The confidence interval for comparing the S and N groups is (.901, 2.007). (Sex is subtracted from Neutral.) Since the confidence interval does not contain 0, there is evidence of a difference in mean recall between the S and N groups at $\alpha = .05$. Since both endpoints are positive, the mean recall for the Neutral group is significantly higher than that of the S group.
d. Yes. The mean recalls for the V and S groups are both significantly lower than the mean recall for the Neutral group.
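Exercises 9.35, 9.40, and 9.46 all rely on the count $c = k(k - 1)/2$ of pairwise comparisons. The Python sketch below enumerates the pairs with the standard-library `itertools.combinations` and, assuming the usual Bonferroni split of the experimentwise error rate, divides $\alpha = .05$ evenly over the comparisons. The name `pairwise_plan` is illustrative, not part of any statistics package.

```python
from itertools import combinations


def pairwise_plan(groups, experimentwise_alpha=0.05):
    """Return all unordered pairs of groups and the Bonferroni per-comparison alpha."""
    pairs = list(combinations(groups, 2))          # k(k-1)/2 unordered pairs
    per_comparison_alpha = experimentwise_alpha / len(pairs)
    return pairs, per_comparison_alpha


# Exercise 9.46: Violence (V), Sex (S), Neutral (N)
pairs, alpha_each = pairwise_plan(["V", "S", "N"])
print(pairs)                  # [('V', 'S'), ('V', 'N'), ('S', 'N')]
print(round(alpha_each, 4))   # alpha used for each individual comparison

# Exercise 9.35: number of pairwise comparisons for several values of k
counts = {k: len(list(combinations(range(k), 2))) for k in (3, 4, 5, 10)}
print(counts)                 # {3: 3, 4: 6, 5: 10, 10: 45}
```

Splitting $\alpha$ this way keeps the experimentwise error rate at or below .05, which is why each individual comparison (9.36) is run at a level smaller than .05.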
9.47 The mean level of trust for the "no close" technique is significantly higher than those for the "visual close" and "thermometer close" techniques. The mean level of trust for the "impending event" technique is significantly higher than that for the "thermometer close" technique. No other significant differences exist.

9.48 Using MINITAB, the multiple comparisons of the means are shown below:
Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons
Individual confidence level = 98.06%

Honey subtracted from:
           Lower   Center    Upper
DM        -4.120   -2.381   -0.642
Control   -5.890   -4.201   -2.511

DM subtracted from:
           Lower   Center    Upper
Control   -3.535   -1.820   -0.104

None of the three confidence intervals contains 0. The confidence interval for the difference in mean improvement scores between DM and Honey is (-4.120, -0.642). Since this interval lies entirely below zero, the improvement scores for Honey are significantly higher than those for DM. The confidence interval for the difference in mean improvement scores between the Control group and Honey is (-5.890, -2.511). Since this interval lies entirely below zero, the improvement scores for Honey are significantly higher than those for the Control group. The confidence interval for the difference in mean improvement scores between the Control group and DM is (-3.535, -0.104). Since this interval also lies entirely below zero, the improvement scores for DM are significantly higher than those for the Control group. Compared to the Control group (no treatment) and DM, honey is the preferable treatment since it has significantly higher improvement scores. The statement is appropriate.

9.49 a. The confidence interval for $(\mu_{\text{CAGE}} - \mu_{\text{BARN}})$ is (-.1250, -.0323). Since 0 is not contained in this interval, there is sufficient evidence of a difference in the mean shell thickness between the cage and barn egg housing systems.
Since both endpoints are negative, the mean thickness is larger for the barn egg housing system.
b. The confidence interval for $(\mu_{\text{CAGE}} - \mu_{\text{FREE}})$ is (-.1233, -.0307). Since 0 is not contained in this interval, there is sufficient evidence of a difference in the mean shell thickness between the cage and free range egg housing systems. Since both endpoints are negative, the mean thickness is larger for the free range egg housing system.
c. The confidence interval for $(\mu_{\text{CAGE}} - \mu_{\text{ORGANIC}})$ is (-.1050, -.0123). Since 0 is not contained in this interval, there is sufficient evidence of a difference in the mean shell thickness between the cage and organic egg housing systems. Since both endpoints are negative, the mean thickness is larger for the organic egg housing system.
d. The confidence interval for $(\mu_{\text{BARN}} - \mu_{\text{FREE}})$ is (-.0501, .0535). Since 0 is contained in this interval, there is insufficient evidence of a difference in the mean shell thickness between the barn and free range egg housing systems. Since the center of the interval is greater than 0, the sample mean for barn is greater than that for free range.
e. The confidence interval for $(\mu_{\text{BARN}} - \mu_{\text{ORGANIC}})$ is (-.0318, .0718). Since 0 is contained in this interval, there is insufficient evidence of a difference in the mean shell thickness between the barn and organic egg housing systems. Since the center of the interval is greater than 0, the sample mean for barn is greater than that for organic.
f. The confidence interval for $(\mu_{\text{FREE}} - \mu_{\text{ORGANIC}})$ is (-.0335, .0701). Since 0 is contained in this interval, there is insufficient evidence of a difference in the mean shell thickness between the free range and organic egg housing systems. Since the center of the interval is greater than 0, the sample mean for free range is greater than that for organic.
g.
We rank the housing system means as follows:
Housing System: Cage < Organic < Free < Barn
We are 95% confident that the mean shell thickness for the cage housing system is significantly less than the mean thickness for the other three housing systems. There is no significant difference in the mean shell thicknesses among the barn, free range, and organic housing systems.

9.50 a. There are 3 blocks since the Block df $= b - 1 = 2$, and 5 treatments since the Treatment df $= k - 1 = 4$.
b. There were 15 observations since the Total df $= n - 1 = 14$.
c. $H_0: \mu_1 = \mu_2 = \cdots = \mu_5$
$H_a$: At least two treatment means differ
d. The test statistic is $F = \frac{MST}{MSE} = 9.109$.
e. The rejection region requires $\alpha = .01$ in the upper tail of the F distribution with $\nu_1 = k - 1 = 5 - 1 = 4$ and $\nu_2 = n - k - b + 1 = 15 - 5 - 3 + 1 = 8$. From Table VIII, Appendix D, $F_{.01} = 7.01$. The rejection region is $F > 7.01$.
f. Since the observed value of the test statistic falls in the rejection region ($F = 9.109 > 7.01$), $H_0$ is rejected. There is sufficient evidence to indicate that at least two treatment means differ at $\alpha = .01$.
g. The assumptions necessary to assure the validity of the test are as follows:
1. The probability distributions of observations corresponding to all the block-treatment combinations are normal.
2. The variances of all the probability distributions are equal.

9.51 a. $CM = \frac{\left(\sum x_i\right)^2}{n} = \frac{49^2}{9} = 266.7778$
$SSB = \sum_{i=1}^{b} \frac{B_i^2}{k} - CM = \frac{17^2}{3} + \frac{15^2}{3} + \frac{17^2}{3} - 266.7778 = .8889$
$SSE = SS(\text{Total}) - SST - SSB = 30.2222 - 21.5555 - .8889 = 7.7778$
$MST = \frac{SST}{k - 1} = \frac{21.5555}{2} = 10.7778$
$MSE = \frac{SSE}{n - k - b + 1} = \frac{7.7778}{4} = 1.9445$
$MSB = \frac{SSB}{b - 1} = \frac{.8889}{2} = .4445$
$F_T = \frac{MST}{MSE} = \frac{10.7778}{1.9445} = 5.54$
$F_B = \frac{MSB}{MSE} = \frac{.4445}{1.9445} = .23$
The ANOVA table is:
Source      df      SS        MS       F
Treatment    2   21.5555   10.7778   5.54
Block        2     .8889     .4445    .23
Error        4    7.7778    1.9445
Total        8   30.2222
b. $H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: At least two treatment means differ
c. The test statistic is $F = \frac{MST}{MSE} = 5.54$.
d. A Type I error would be concluding at least two treatment means differ when they do not.
A Type II error would be concluding all the treatment means are the same when at least two differ.
e. The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with $\nu_1 = k - 1 = 3 - 1 = 2$ and $\nu_2 = n - k - b + 1 = 9 - 3 - 3 + 1 = 4$. From Table VI, Appendix A, $F_{.05} = 6.94$. The rejection region is $F > 6.94$. Since the observed value of the test statistic does not fall in the rejection region ($F = 5.54 < 6.94$), $H_0$ is not rejected. There is insufficient evidence to indicate at least two of the treatment means differ at $\alpha = .05$.

9.52 a. The ANOVA table is as follows:
Source      df     SS       MS        F
Treatment    2   12.032    6.016    50.958
Block        3   71.749   23.916   202.586
Error        6     .708     .118
Total       11   84.489
b. To determine if the treatment means differ, we test:
$H_0: \mu_A = \mu_B = \mu_C$
$H_a$: At least two treatment means differ
The test statistic is $F = \frac{MST}{MSE} = 50.958$. The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with $\nu_1 = k - 1 = 3 - 1 = 2$ and $\nu_2 = n - k - b + 1 = 12 - 3 - 4 + 1 = 6$. From Table VI, Appendix D, $F_{.05} = 5.14$. The rejection region is $F > 5.14$. Since the observed value of the test statistic falls in the rejection region ($F = 50.958 > 5.14$), $H_0$ is rejected. There is sufficient evidence to indicate that the treatment means differ at $\alpha = .05$.
c. To see if the blocking was effective, we test:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: At least two block means differ
The test statistic is $F = \frac{MSB}{MSE} = 202.586$. The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with $\nu_1 = b - 1 = 4 - 1 = 3$ and $\nu_2 = n - k - b + 1 = 12 - 3 - 4 + 1 = 6$. From Table VI, Appendix D, $F_{.05} = 4.76$. The rejection region is $F > 4.76$. Since the observed value of the test statistic falls in the rejection region ($F = 202.586 > 4.76$), $H_0$ is rejected. There is sufficient evidence to indicate that blocking was effective in reducing the experimental error at $\alpha = .05$.
d. From the printouts, we are given the differences in the sample means.
The differences between Treatment B and Treatments A and C are both positive (1.125 and 2.450), so Treatment B has the largest sample mean. The difference between Treatments A and C is positive (1.325), so Treatment A has a larger sample mean than Treatment C. Thus, Treatment B has the largest sample mean, Treatment A the next largest, and Treatment C the smallest. From the printout, all the means are significantly different from each other.
e. The assumptions necessary to assure the validity of the inferences above are:
1. The probability distributions of observations corresponding to all the block-treatment combinations are normal.
2. The variances of all the probability distributions are equal.

9.53 a. $SST = .2(500) = 100$, $SSB = .3(500) = 150$
$SSE = SS(\text{Total}) - SST - SSB = 500 - 100 - 150 = 250$
$MST = \frac{SST}{k - 1} = \frac{100}{4 - 1} = 33.3333$
$MSE = \frac{SSE}{n - k - b + 1} = \frac{250}{36 - 4 - 9 + 1} = \frac{250}{24} = 10.4167$
$MSB = \frac{SSB}{b - 1} = \frac{150}{9 - 1} = 18.75$
$F_T = \frac{MST}{MSE} = \frac{33.3333}{10.4167} = 3.20$
$F_B = \frac{MSB}{MSE} = \frac{18.75}{10.4167} = 1.80$
To determine if differences exist among the treatment means, we test:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: At least two treatment means differ
The test statistic is $F = 3.20$. The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with $\nu_1 = k - 1 = 4 - 1 = 3$ and $\nu_2 = n - k - b + 1 = 36 - 4 - 9 + 1 = 24$. From Table VI, Appendix D, $F_{.05} = 3.01$. The rejection region is $F > 3.01$. Since the observed value of the test statistic falls in the rejection region ($F = 3.20 > 3.01$), $H_0$ is rejected. There is sufficient evidence to indicate differences among the treatment means at $\alpha = .05$.
To determine if differences exist among the block means, we test:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_9$
$H_a$: At least two block means differ
The test statistic is $F = 1.80$. The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with $\nu_1 = b - 1 = 9 - 1 = 8$ and $\nu_2 = n - k - b + 1 = 36 - 4 - 9 + 1 = 24$. From Table VI, Appendix D, $F_{.05} = 2.36$. The rejection region is $F > 2.36$.
Since the observed value of the test statistic does not fall in the rejection region ($F = 1.80 < 2.36$), $H_0$ is not rejected. There is insufficient evidence to indicate differences among the block means at $\alpha = .05$.

b. $SST = .5(500) = 250$, $SSB = .2(500) = 100$
$SSE = SS(\text{Total}) - SST - SSB = 500 - 250 - 100 = 150$
$MST = \frac{SST}{k - 1} = \frac{250}{4 - 1} = 83.3333$
$MSE = \frac{SSE}{n - k - b + 1} = \frac{150}{36 - 4 - 9 + 1} = 6.25$
$MSB = \frac{SSB}{b - 1} = \frac{100}{9 - 1} = 12.5$
$F_T = \frac{MST}{MSE} = \frac{83.3333}{6.25} = 13.33$
$F_B = \frac{MSB}{MSE} = \frac{12.5}{6.25} = 2.00$
To determine if differences exist among the treatment means, we test:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: At least two treatment means differ
The test statistic is $F = 13.33$. The rejection region is $F > 3.01$ (same as above). Since the observed value of the test statistic falls in the rejection region ($F = 13.33 > 3.01$), $H_0$ is rejected. There is sufficient evidence to indicate differences exist among the treatment means at $\alpha = .05$.
To determine if differences exist among the block means, we test:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_9$
$H_a$: At least two block means differ
The test statistic is $F = 2.00$. The rejection region is $F > 2.36$ (same as above). Since the observed value of the test statistic does not fall in the rejection region ($F = 2.00 < 2.36$), $H_0$ is not rejected. There is insufficient evidence to indicate differences exist among the block means at $\alpha = .05$.

c. $SST = .2(500) = 100$, $SSB = .5(500) = 250$
$SSE = SS(\text{Total}) - SST - SSB = 500 - 100 - 250 = 150$
$MST = \frac{SST}{k - 1} = \frac{100}{4 - 1} = 33.3333$
$MSE = \frac{SSE}{n - k - b + 1} = \frac{150}{36 - 4 - 9 + 1} = 6.25$
$MSB = \frac{SSB}{b - 1} = \frac{250}{9 - 1} = 31.25$
$F_T = \frac{MST}{MSE} = \frac{33.3333}{6.25} = 5.33$
$F_B = \frac{MSB}{MSE} = \frac{31.25}{6.25} = 5.00$
To determine if differences exist among the treatment means, we test:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: At least two treatment means differ
The test statistic is $F = 5.33$. The rejection region is $F > 3.01$ (same as above). Since the observed value of the test statistic falls in the rejection region ($F = 5.33 > 3.01$), $H_0$ is rejected.
There is sufficient evidence to indicate differences exist among the treatment means at $\alpha = .05$.
To determine if differences exist among the block means, we test:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_9$
$H_a$: At least two block means differ
The test statistic is $F = 5.00$. The rejection region is $F > 2.36$ (same as above). Since the observed value of the test statistic falls in the rejection region ($F = 5.00 > 2.36$), $H_0$ is rejected. There is sufficient evidence to indicate differences exist among the block means at $\alpha = .05$.

d. $SST = .4(500) = 200$, $SSB = .4(500) = 200$
$SSE = SS(\text{Total}) - SST - SSB = 500 - 200 - 200 = 100$
$MST = \frac{SST}{k - 1} = \frac{200}{4 - 1} = 66.6667$
$MSE = \frac{SSE}{n - k - b + 1} = \frac{100}{36 - 4 - 9 + 1} = 4.1667$
$MSB = \frac{SSB}{b - 1} = \frac{200}{9 - 1} = 25$
$F_T = \frac{MST}{MSE} = \frac{66.6667}{4.1667} = 16.0$
$F_B = \frac{MSB}{MSE} = \frac{25}{4.1667} = 6.00$
To determine if differences exist among the treatment means, we test:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: At least two treatment means differ
The test statistic is $F = 16.0$. The rejection region is $F > 3.01$ (same as above). Since the observed value of the test statistic falls in the rejection region ($F = 16.0 > 3.01$), $H_0$ is rejected. There is sufficient evidence to indicate differences among the treatment means at $\alpha = .05$.
To determine if differences exist among the block means, we test:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_9$
$H_a$: At least two block means differ
The test statistic is $F = 6.00$. The rejection region is $F > 2.36$ (same as above). Since the observed value of the test statistic falls in the rejection region ($F = 6.00 > 2.36$), $H_0$ is rejected. There is sufficient evidence to indicate differences exist among the block means at $\alpha = .05$.

e. $SST = .2(500) = 100$, $SSB = .2(500) = 100$
$SSE = SS(\text{Total}) - SST - SSB = 500 - 100 - 100 = 300$
$MST = \frac{SST}{k - 1} = \frac{100}{4 - 1} = 33.3333$
$MSE = \frac{SSE}{n - k - b + 1} = \frac{300}{36 - 4 - 9 + 1} = 12.5$
$MSB = \frac{SSB}{b - 1} = \frac{100}{9 - 1} = 12.5$
$F_T = \frac{MST}{MSE} = \frac{33.3333}{12.5} = 2.67$
$F_B = \frac{MSB}{MSE} = \frac{12.5}{12.5} = 1.00$
To determine if differences exist among the treatment means, we test:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: At least two treatment means differ
The test statistic is $F = 2.67$. The rejection region is $F > 3.01$ (same as above).
Since the observed value of the test statistic does not fall in the rejection region ($F = 2.67 < 3.01$), $H_0$ is not rejected. There is insufficient evidence to indicate differences exist among the treatment means at $\alpha = .05$.
To determine if differences exist among the block means, we test:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_9$
$H_a$: At least two block means differ
The test statistic is $F = 1.00$. The rejection region is $F > 2.36$ (same as above). Since the observed value of the test statistic does not fall in the rejection region ($F = 1.00 < 2.36$), $H_0$ is not rejected. There is insufficient evidence to indicate differences among the block means at $\alpha = .05$.

9.54 a. This experimental design is a randomized block design because, in part B, the same subjects provided WTP amounts for insuring both a sculpture and a painting. Each subject had 2 responses.
b. The dependent (response) variable is the WTP amount. The treatments are the two scenarios (sculpture and painting). The blocks are the 84 subjects.
c. To determine if there is a difference in the mean WTP amounts between sculptures and paintings, we test:
$H_0: \mu_1 = \mu_2$
$H_a: \mu_1 \neq \mu_2$

9.55 a. A randomized block design should be used to analyze the data because the same employees were measured at all three time periods. Thus, the blocks are the employees and the treatments are the three time periods.
b. There is still enough information in the table to reach a conclusion because the p-values are given.
c. To determine if there are differences in the mean competence levels among the three time periods, we test:
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: At least two treatment means differ
d. The p-value is $p = 0.001$. At any significance level greater than .001, we reject $H_0$. There is sufficient evidence to conclude that there is a difference in the mean competence levels among the three time periods for any value of $\alpha > .001$.
e. With 90% confidence, the mean competence before the training is significantly less than the mean competence 2 days after and 2 months after. There is no significant difference in the mean competence between 2 days after and 2 months after.

9.56 a. This was a randomized complete block design. The blocks are the months and the treatments are the 3 methods of measuring electrical consumption.
b. $df_{\text{Method}} = k - 1 = 3 - 1 = 2$, $df_{\text{Error}} = n - k - b + 1 = 12 - 3 - 4 + 1 = 6$,
$SST = (k - 1)MST = (3 - 1)(.195) = .390$, $SSB = (b - 1)MSB = (4 - 1)(10.780) = 32.340$,
$F_{\text{Month}} = \frac{MSB}{MSE} = \frac{10.780}{.069} = 156.23$
Source            df     SS       MS      F-value   p-value
Forecast Method    2    .390     .195      2.83      .08
Month              3  32.340   10.780    156.23     < .01
Error              6    .414     .069
Total             11  33.144
c. To determine if there is a difference in the mean electrical consumption values among the three methods, we test:
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: At least two of the treatment means differ
The test statistic is $F = 2.83$ and the p-value is $p = .08$. Since the p-value is not less than $\alpha$ ($p = .08 > .05$), $H_0$ is not rejected. There is insufficient evidence to indicate a difference in mean electrical consumption values among the three methods at $\alpha = .05$.

9.57 a. The treatments were the 8 different activities.
b. The blocks were the 15 adults who participated in the study.
c. Since the p-value is less than $\alpha$ ($p = .001 < .01$), $H_0$ is rejected. There is sufficient evidence to indicate a difference in mean heart rate among the 8 activities at $\alpha = .01$.
d. Treadmill jogging had the highest mean heart rate. It was significantly greater than the mean heart rates of all the other activities. Brisk treadmill walking had the second highest mean heart rate. It was significantly less than the mean heart rate of treadmill jogging, but significantly greater than the mean heart rates of the other 6 activities. There was no significant difference in the mean heart rates among Wii aerobics, Wii muscle conditioning, Wii yoga, and Wii balance. The mean heart rates for these activities were significantly less than the mean heart rates for treadmill jogging and brisk treadmill walking, but greater than the mean heart rates for handheld gaming and rest. There was no significant difference in the mean heart rate between handheld gaming and rest. The mean heart rates for these two activities were significantly less than those for the other 6 activities.

9.58 a. The time of the year (month) could affect the number of rigs running, so a randomized complete block design was used to "block" out the month-to-month variation.
b. There are 3 treatments in this experiment. They are the three states: California, Utah, and Alaska.
c. There are 3 blocks in this experiment: the three months selected (Month 1, Month 2, and Month 3).
d. To determine if there is a difference in the mean number of rigs running among the three states, we test:
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: At least two treatment means differ
e. From the printout, the test statistic is $F = 38.0685$ and the p-value is $p = .0025$. Since the p-value is so small, we would reject $H_0$ for any value of $\alpha > .0025$. There is sufficient evidence to indicate a difference in the mean number of oil rigs running among the three states.
f. From the XLSTAT printout, there is no significant difference in the mean number of oil rigs running in Alaska and Utah. However, both of these states have a significantly smaller mean number of rigs running than California. Thus, California has the largest mean number of oil rigs running.

9.59 a. To compare the mean item scores, we test:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_5$
$H_a$: At least two of the treatment means differ
b. Each of the 11 items was reviewed by each of the 5 systematic reviews. Since all reviews were made on each item, the observations are not independent. Thus, the randomized block ANOVA is appropriate.
c. The p-value for Review is $p = 0.319$. Since the p-value is not small, $H_0$ would not be rejected for any reasonable value of $\alpha$.
There is insufficient evidence to indicate a difference in the mean review scores among the 5 systematic reviews. The p-value for Item is $p = 0.000$. Since the p-value is small, $H_0$ would be rejected for any reasonable value of $\alpha$. There is sufficient evidence to indicate a difference in the mean scores among the 11 items.
d. None of the means are significantly different because all means are connected by the letter 'a'. This agrees with the conclusion drawn in part c about the treatment Review.
e. The experimentwise error rate is .05. This means that the probability of declaring at least one pair of means different when they are not different is .05.

9.60 Using SAS, the ANOVA table is:
The ANOVA Procedure
Dependent Variable: temp
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             11      18.53700000    1.68518182      0.52   0.8634
Error             18      58.03800000    3.22433333
Corrected Total   29      76.57500000

R-Square   Coeff Var   Root MSE   temp Mean
0.242076    1.885189   1.795643    95.25000

Source    DF      Anova SS   Mean Square   F Value   Pr > F
STUDENT    9   18.41500000    2.04611111      0.63   0.7537
PLANT      2    0.12200000    0.06100000      0.02   0.9813

To determine if there are differences among the mean temperatures among the three treatments, we test:
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: At least two of the treatment means differ
The test statistic is $F = 0.02$. The associated p-value is $p = .9813$. Since the p-value is very large, there is no evidence of a difference in mean temperature among the three treatments for any reasonable value of $\alpha$. Since there is no evidence of a difference, we do not need to compare the means. It appears that the presence of plants or pictures of plants does not reduce stress.
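The randomized block F statistics in 9.53 and 9.60 come from the same partition, SS(Total) = SST + SSB + SSE, with $n - k - b + 1$ error degrees of freedom. The Python sketch below assumes one observation per treatment-block combination ($n = kb$), which holds in both exercises; `rbd_anova` is a hypothetical helper name, and the critical values 3.01 and 2.36 are simply the table values quoted in the 9.53 solution.

```python
def rbd_anova(ss_total, sst, ssb, k, b):
    """Randomized block ANOVA F statistics from sums of squares.

    Assumes k treatments, b blocks, and one observation per
    treatment-block combination, so n = k * b.
    """
    n = k * b
    sse = ss_total - sst - ssb          # SSE = SS(Total) - SST - SSB
    df_error = n - k - b + 1
    mse = sse / df_error
    f_t = (sst / (k - 1)) / mse         # F for treatments: MST / MSE
    f_b = (ssb / (b - 1)) / mse         # F for blocks: MSB / MSE
    return f_t, f_b


# Exercise 9.60: PLANT is the treatment (k = 3), STUDENT is the block (b = 10)
f_plant, f_student = rbd_anova(76.575, 0.122, 18.415, k=3, b=10)
print(round(f_plant, 2), round(f_student, 2))

# Exercise 9.53: SS(Total) = 500, k = 4, b = 9; fractions of SS(Total) for SST and SSB
F_CRIT_T = 3.01   # F.05 with (3, 24) df, as quoted in the 9.53 solution
F_CRIT_B = 2.36   # F.05 with (8, 24) df, as quoted in the 9.53 solution
scenarios = {"a": (0.2, 0.3), "b": (0.5, 0.2), "c": (0.2, 0.5),
             "d": (0.4, 0.4), "e": (0.2, 0.2)}

results = {}
for part, (p_t, p_b) in scenarios.items():
    f_t, f_b = rbd_anova(500, p_t * 500, p_b * 500, k=4, b=9)
    # (F_T, F_B, reject H0 for treatments?, reject H0 for blocks?)
    results[part] = (round(f_t, 2), round(f_b, 2), f_t > F_CRIT_T, f_b > F_CRIT_B)
    print(part, results[part])
```

The loop reproduces the five pairs of decisions in 9.53: the treatment test rejects in parts a through d, and the block test rejects only in parts c and d.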
9.61 Using MINITAB, the ANOVA table is:

Two-way ANOVA: Rate versus Week, Day
Source   DF   SS       MS     F      P
Week      8    575.2   71.9   6.10   0.000
Day       4     94.2   23.5   2.00   0.118
Error    32    376.9   11.8
Total    44   1046.4

Day    1     2     3     4     5
Mean   8.8   4.6   5.8   5.4   6.4
[MINITAB plot of individual 95% CIs for the five day means omitted]

To determine if there is a difference in the mean rate of absenteeism among the 5 days of the week, we test:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$
$H_a$: At least 2 of the treatment means differ
The test statistic is $F = 2.00$ and the p-value is $p = .118$. Since the p-value is not small, $H_0$ is not rejected. There is insufficient evidence to indicate a difference in the mean rate of absenteeism among the 5 days of the week for any value of $\alpha < .118$.
To test for the effectiveness of blocking, we test:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_9$
$H_a$: At least 2 of the block means differ
The test statistic is $F = 6.10$ and the p-value is $p = .000$. Since the p-value is so small, $H_0$ is rejected. There is sufficient evidence to indicate blocking was effective for any reasonable value of $\alpha$.

9.62 a. The treatments are the 4 pre-slaughter phases. The blocks are the 8 cows.
b. Using SPSS, the output is:

Tests of Between-Subjects Effects
Dependent Variable: Rate
Source            Type III Sum of Squares   df   Mean Square   F          Sig.
Corrected Model             2444.000(a)     10       244.400      5.108   .001
Intercept                 341551.125         1    341551.125   7137.777   .000
Cow                         1922.875         7       274.696      5.741   .001
Phase                        521.125         3       173.708      3.630   .030
Error                       1004.875        21        47.851
Total                     345000.000        32
Corrected Total             3448.875        31
a. R Squared = .709 (Adjusted R Squared = .570)

The ANOVA table in simpler form is:
Source   df   SS         MS        F-value   p-value
Phase     3    521.125   173.708   3.630     .030
Cow       7   1922.875   274.696   5.741     .001
Error    21   1004.875    47.851
Total    31   3448.875

c. To determine if there are differences among the mean heart rates of cows in the four pre-slaughter phases, we test:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: At least 2 of the treatment means differ
The test statistic is $F = 3.63$ and the p-value is $p = .030$. Since the p-value is less than $\alpha$ ($p = .030 < .05$), $H_0$ is rejected. There is sufficient evidence to indicate a difference in the mean heart rates of cows among the four pre-slaughter phases at $\alpha = .05$.
d. Since we rejected $H_0$ in part c, a multiple comparison procedure is warranted. Using SPSS, the results are:

Homogeneous Subsets: Rate, Tukey HSD(a,b)
Phase   N   Subset 1   Subset 2
2.00    8    97.0000
3.00    8   103.1250   103.1250
4.00    8   105.1250   105.1250
1.00    8              108.0000
Sig.          .119       .508
Means for groups in homogeneous subsets are displayed. Based on observed means. The error term is Mean Square(Error) = 47.851.
a. Uses Harmonic Mean Sample Size = 8.000.
b. Alpha = 0.05.

There is a significant difference in the mean heart rates between the first phase and the second phase: the mean heart rate in the first phase is significantly greater than the mean heart rate in the second phase. No other differences exist.

9.63 Using MINITAB, the ANOVA table is:

Two-way ANOVA: Corrosion versus Time, System
Source   DF   SS        MS        F        P
Time      2   63.1050   31.5525   337.06   0.000
System    3    9.5833    3.1944    34.12   0.000
Error     6    0.5617    0.0936
Total    11   73.2500
S = 0.3060   R-Sq = 99.23%   R-Sq(adj) = 98.59%

System   1        2        3         4
Mean     9.0667   9.7333   11.0667   8.7333
[MINITAB plot of individual 95% CIs for the system means, based on the pooled StDev, omitted]

To determine if there is a difference in mean corrosion rates among the 4 systems, we test:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: At least 2 of the treatment means differ
The test statistic is $F = 34.12$ and the p-value is $p = .000$.
Since the p-value is so small, $H_0$ is rejected. There is sufficient evidence to indicate a difference in mean corrosion rates among the 4 systems at any reasonable value of $\alpha$.
Using SAS, Tukey's multiple comparison results are:

Tukey's Studentized Range (HSD) Test for CORROSION
NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.
Alpha                                   0.05
Error Degrees of Freedom                6
Error Mean Square                       0.093611
Critical Value of Studentized Range     4.89559
Minimum Significant Difference          0.8648

Means with the same letter are not significantly different.
Tukey Grouping   Mean      N   SYSTEM
A                11.0667   3   3
B                 9.7333   3   2
B C               9.0667   3   1
C                 8.7333   3   4

The mean corrosion rate for system 3 is significantly larger than all of the other mean corrosion rates. The mean corrosion rate for system 2 is significantly larger than the mean for system 4. If we want the system (epoxy coating) with the lowest corrosion rate, we would pick either system 1 or system 4. There is no significant difference between these two systems, and both are in the lowest corrosion rate group.

9.64 a. There are two factors.
b. No, we cannot tell whether the factors are qualitative or quantitative.
c. Yes. There are four levels of factor A and three levels of factor B.
d. A treatment consists of a combination of one level of factor A and one level of factor B. There are a total of $4 \times 3 = 12$ treatments.
e. One problem with only one replicate is that there are no degrees of freedom for error. This is overcome by having at least two replicates.

9.65 a.
The ANOVA table is:
Source   df   SS     MS       F
A         2    .8     .4000    3.69
B         3   5.3    1.7667   16.31
AB        6   9.6    1.6000   14.77
Error    12   1.3     .1083
Total    23  17.0

The entries are computed as follows:
df for A is $a - 1 = 3 - 1 = 2$; df for B is $b - 1 = 4 - 1 = 3$; df for AB is $(a-1)(b-1) = 2(3) = 6$; df for Error is $n - ab = 24 - 3(4) = 12$; df for Total is $n - 1 = 24 - 1 = 23$.
$SSE = SS(\text{Total}) - SSA - SSB - SSAB = 17.0 - .8 - 5.3 - 9.6 = 1.3$
$MSA = \frac{SSA}{a-1} = \frac{.8}{3-1} = .40$, $MSB = \frac{SSB}{b-1} = \frac{5.3}{4-1} = 1.7667$, $MSAB = \frac{SSAB}{(a-1)(b-1)} = \frac{9.6}{(3-1)(4-1)} = 1.60$, $MSE = \frac{SSE}{n-ab} = \frac{1.3}{24-3(4)} = .1083$
$F_A = \frac{MSA}{MSE} = \frac{.4000}{.1083} = 3.69$, $F_B = \frac{MSB}{MSE} = \frac{1.7667}{.1083} = 16.31$, $F_{AB} = \frac{MSAB}{MSE} = \frac{1.6000}{.1083} = 14.77$
b. Sum of Squares for Treatment: $SST = SSA + SSB + SSAB = .8 + 5.3 + 9.6 = 15.7$
$MST = \frac{SST}{ab-1} = \frac{15.7}{3(4)-1} = 1.4273$, $F_T = \frac{MST}{MSE} = \frac{1.4273}{.1083} = 13.18$
To determine if the treatment means differ, we test:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_{12}$
$H_a$: At least 2 of the treatment means differ
The test statistic is $F = 13.18$. The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = ab - 1 = 3(4) - 1 = 11$ and $\nu_2 = n - ab = 24 - 3(4) = 12$. From Table VI, Appendix D, $F_{.05} \approx 2.75$. The rejection region is $F > 2.75$.
Since the observed value of the test statistic falls in the rejection region ($F = 13.18 > 2.75$), $H_0$ is rejected. There is sufficient evidence to indicate the treatment means differ at $\alpha = .05$.
c. Yes. We need to partition the Treatment Sum of Squares into the Main Effect and Interaction Sums of Squares. Then we test whether factors A and B interact. Depending on the conclusion of the test for interaction, we either test for main effects or compare the treatment means.
d. Two factors are said to interact if the effect of one factor on the dependent variable is not the same at different levels of the second factor. If the factors interact, then tests for main effects are not necessary; instead, we compare the treatment means for one factor at each level of the second.
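The arithmetic in parts a and b of Exercise 9.65 follows one pattern: degrees of freedom come from $a$, $b$, and $n$; SSE is found by subtraction; each mean square is SS/df; and each F is a mean square divided by MSE. A minimal Python sketch of that bookkeeping, assuming a two-factor design with every cell observed (the function name `factorial_anova` is hypothetical):

```python
def factorial_anova(a, b, n, ss_a, ss_b, ss_ab, ss_total):
    """Fill in a two-factor ANOVA table from the sums of squares."""
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "Error": n - a * b}
    ss = {"A": ss_a, "B": ss_b, "AB": ss_ab,
          "Error": ss_total - ss_a - ss_b - ss_ab}  # SSE by subtraction
    ms = {source: ss[source] / df[source] for source in df}
    f = {source: ms[source] / ms["Error"] for source in ("A", "B", "AB")}
    # The treatment SS pools both main effects and the interaction
    ms_treat = (ss_a + ss_b + ss_ab) / (a * b - 1)
    f["Treatments"] = ms_treat / ms["Error"]
    return df, ss, ms, f


# Exercise 9.65: a = 3 levels of A, b = 4 levels of B, n = 24 observations
df, ss, ms, f = factorial_anova(3, 4, 24, ss_a=0.8, ss_b=5.3, ss_ab=9.6, ss_total=17.0)
for source, value in f.items():
    print(f"F_{source} = {value:.2f}")
```

Carried at full precision, the treatment F statistic is 13.17; the 13.18 in part b comes from dividing the rounded values 1.4273 and .1083. The conclusion is the same either way.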
e. To determine if the factors interact, we test:
$H_0$: Factors A and B do not interact to affect the response mean
$H_a$: Factors A and B do interact to affect the response mean
The test statistic is $F = \frac{MSAB}{MSE} = 14.77$. The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = (a-1)(b-1) = (3-1)(4-1) = 6$ and $\nu_2 = n - ab = 24 - 3(4) = 12$. From Table VI, Appendix D, $F_{.05} = 3.00$. The rejection region is $F > 3.00$.
Since the observed value of the test statistic falls in the rejection region ($F = 14.77 > 3.00$), $H_0$ is rejected. There is sufficient evidence to indicate the two factors interact to affect the response mean at $\alpha = .05$.
f. No. Testing for main effects is not warranted because interaction is present. Instead, we compare the treatment means of one factor at each level of the second factor.

9.66 a. Factor A has $3 + 1 = 4$ levels and factor B has $1 + 1 = 2$ levels.
b. There are a total of $23 + 1 = 24$ observations and $4 \times 2 = 8$ treatments. Therefore, there were $24/8 = 3$ observations for each treatment.
c. The missing entries are computed as follows:
df for AB is $(a-1)(b-1) = (4-1)(2-1) = 3$; df for Error is $n - ab = 24 - 4(2) = 16$; df for Treatments is $ab - 1 = 4(2) - 1 = 7$.
$MSA = \frac{SSA}{a-1} \Rightarrow SSA = (a-1)MSA = (4-1)(.75) = 2.25$
$MSB = \frac{SSB}{b-1} = \frac{.95}{2-1} = .95$
$MSAB = \frac{SSAB}{(a-1)(b-1)} \Rightarrow SSAB = (a-1)(b-1)MSAB = (4-1)(2-1)(.30) = .9$
$SSE = SS(\text{Total}) - SSA - SSB - SSAB = 6.5 - 2.25 - .95 - .9 = 2.4$
$MSE = \frac{SSE}{n-ab} = \frac{2.4}{24-4(2)} = .15$
$SST = SSA + SSB + SSAB = 2.25 + .95 + .9 = 4.1$, $MST = \frac{SST}{ab-1} = \frac{4.1}{7} = .5857$
$F_A = \frac{MSA}{MSE} = \frac{.75}{.15} = 5.00$, $F_B = \frac{MSB}{MSE} = \frac{.95}{.15} = 6.33$, $F_{AB} = \frac{MSAB}{MSE} = \frac{.30}{.15} = 2.00$, $F_T = \frac{MST}{MSE} = \frac{.5857}{.15} = 3.90$
The ANOVA table is:
Source       df   SS     MS    F
Treatments    7   4.1    .59   3.90
A             3   2.25   .75   5.00
B             1    .95   .95   6.33
AB            3    .90   .30   2.00
Error        16   2.40   .15
Total        23   6.50
d. To determine whether the treatment means differ, we test:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_8$
$H_a$: At least two treatment means differ
The test statistic is $F = \frac{MST}{MSE} = 3.90$. The rejection region requires $\alpha = .10$ in the upper tail of the F-distribution with $\nu_1 = ab - 1 = 4(2) - 1 = 7$ and $\nu_2 = n - ab = 24 - 4(2) = 16$. From Table V, Appendix D, $F_{.10} = 2.13$. The rejection region is $F > 2.13$.
Since the observed value of the test statistic falls in the rejection region ($F = 3.90 > 2.13$), $H_0$ is rejected. There is sufficient evidence to indicate the treatment means differ at $\alpha = .10$.
e. To determine if the factors interact, we test:
$H_0$: Factors A and B do not interact to affect the response mean
$H_a$: Factors A and B do interact to affect the response mean
The test statistic is $F = 2.00$. The rejection region requires $\alpha = .10$ in the upper tail of the F-distribution with $\nu_1 = (a-1)(b-1) = (4-1)(2-1) = 3$ and $\nu_2 = n - ab = 24 - 4(2) = 16$. From Table V, Appendix D, $F_{.10} = 2.46$. The rejection region is $F > 2.46$.
Since the observed value of the test statistic does not fall in the rejection region ($F = 2.00 < 2.46$), $H_0$ is not rejected. There is insufficient evidence to indicate factors A and B interact at $\alpha = .10$.
To determine if the four means of factor A differ, we test:
$H_0$: There is no difference in the four means of factor A
$H_a$: At least two of the factor A means differ
The test statistic is $F = 5.00$. The rejection region requires $\alpha = .10$ in the upper tail of the F-distribution with $\nu_1 = a - 1 = 4 - 1 = 3$ and $\nu_2 = n - ab = 24 - 4(2) = 16$. From Table V, Appendix D, $F_{.10} = 2.46$. The rejection region is $F > 2.46$.
Since the observed value of the test statistic falls in the rejection region ($F = 5.00 > 2.46$), $H_0$ is rejected. There is sufficient evidence to indicate at least two of the four means of factor A differ at $\alpha = .10$.
To determine if the 2 means of factor B differ, we test:
$H_0$: There is no difference in the two means of factor B
$H_a$: The two means of factor B differ
The test statistic is $F = 6.33$. The rejection region requires $\alpha = .10$ in the upper tail of the F-distribution with $\nu_1 = b - 1 = 2 - 1 = 1$ and $\nu_2 = n - ab = 24 - 4(2) = 16$. From Table V, Appendix D, $F_{.10} = 3.05$. The rejection region is $F > 3.05$.
Since the observed value of the test statistic falls in the rejection region ($F = 6.33 > 3.05$), $H_0$ is rejected.
There is sufficient evidence to indicate the two means of factor B differ at $\alpha = .10$. All of the tests performed are warranted because interaction was not significant.

9.67 a. The treatments are the combinations of the levels of factor A and the levels of factor B. There are $2 \times 3 = 6$ treatments. The treatment means are:
$\bar{x}_{11} = \frac{3.1 + 4.0}{2} = 3.55$, $\bar{x}_{12} = \frac{4.6 + 4.2}{2} = 4.4$, $\bar{x}_{13} = \frac{6.4 + 7.1}{2} = 6.75$
$\bar{x}_{21} = \frac{5.9 + 5.3}{2} = 5.6$, $\bar{x}_{22} = \frac{2.9 + 2.2}{2} = 2.55$, $\bar{x}_{23} = \frac{3.3 + 2.5}{2} = 2.9$
Using MINITAB, the graph is:
[Figure: Scatterplot of A1, A2 vs B (treatment means for levels 1 and 2 of factor A plotted against levels 1, 2, and 3 of factor B)]
The treatment means appear to be different because the sample means are quite different. The factors appear to interact because the lines are not parallel.
b. $SST = SSA + SSB + SSAB = 4.4408 + 4.1267 + 18.0067 = 26.5742$
$MST = \frac{SST}{ab-1} = \frac{26.5742}{2(3)-1} = 5.315$, $F_T = \frac{MST}{MSE} = \frac{5.315}{.24583} = 21.62$
To determine whether the treatment means differ, we test:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_6$
$H_a$: At least two treatment means differ
The test statistic is $F = \frac{MST}{MSE} = 21.62$. The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = ab - 1 = 2(3) - 1 = 5$ and $\nu_2 = n - ab = 12 - 2(3) = 6$. From Table VI, Appendix D, $F_{.05} = 4.39$. The rejection region is $F > 4.39$.
Since the observed value of the test statistic falls in the rejection region ($F = 21.62 > 4.39$), $H_0$ is rejected. There is sufficient evidence to indicate that the treatment means differ at $\alpha = .05$. This supports the plot in part a.
c. Yes. Since there are differences among the treatment means, we test for interaction. To determine whether factors A and B interact, we test:
$H_0$: Factors A and B do not interact to affect the mean response
$H_a$: Factors A and B do interact to affect the mean response
The test statistic is $F = \frac{MSAB}{MSE} = \frac{9.0033}{.24583} = 36.62$. The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = (a-1)(b-1) = (2-1)(3-1) = 2$ and $\nu_2 = n - ab = 12 - 2(3) = 6$.
From Table VI, Appendix D, $F_{.05} = 5.14$. The rejection region is $F > 5.14$.
Since the observed value of the test statistic falls in the rejection region ($F = 36.62 > 5.14$), $H_0$ is rejected. There is sufficient evidence to indicate that factors A and B interact to affect the response mean at $\alpha = .05$.
d. No. Because interaction is present, the tests for main effects are not warranted.
e. The results of the tests in parts b and c support the visual interpretation in part a.

9.68 a. The treatments are the combinations of the levels of factor A and the levels of factor B. There are $2 \times 2 = 4$ treatments. The treatment means are:
$\bar{x}_{11} = \frac{29.6 + 35.2}{2} = 32.4$, $\bar{x}_{12} = \frac{47.3 + 42.1}{2} = 44.7$
$\bar{x}_{21} = \frac{12.9 + 17.6}{2} = 15.25$, $\bar{x}_{22} = \frac{28.4 + 22.7}{2} = 25.55$
Using MINITAB, the graph is:
[Figure: Scatterplot of A1, A2 vs B (treatment means for levels 1 and 2 of factor A plotted against levels 1 and 2 of factor B)]
The factors do not appear to interact; the lines are almost parallel. The treatment means do appear to differ because the sample means range from 15.25 to 44.7.
b. $CM = \frac{(\sum x_i)^2}{n} = \frac{235.8^2}{8} = 6950.205$
$SS(\text{Total}) = \sum x_i^2 - CM = 7922.92 - 6950.205 = 972.715$
$SSA = \frac{\sum A_i^2}{br} - CM = \frac{154.2^2 + 81.6^2}{2(2)} - 6950.205 = 7609.05 - 6950.205 = 658.845$
$SSB = \frac{\sum B_i^2}{ar} - CM = \frac{95.3^2 + 140.5^2}{2(2)} - 6950.205 = 7205.585 - 6950.205 = 255.38$
$SSAB = \frac{\sum\sum AB_{ij}^2}{r} - SSA - SSB - CM = \frac{64.8^2 + 89.4^2 + 30.5^2 + 51.1^2}{2} - 658.845 - 255.38 - 6950.205 = 7866.43 - 7864.43 = 2$
$SSE = SS(\text{Total}) - SSA - SSB - SSAB = 972.715 - 658.845 - 255.38 - 2 = 56.49$
df for A is $a - 1 = 2 - 1 = 1$; df for B is $b - 1 = 2 - 1 = 1$; df for AB is $(a-1)(b-1) = (2-1)(2-1) = 1$; df for Error is $n - ab = 8 - 2(2) = 4$; df for Total is $n - 1 = 8 - 1 = 7$.
$MSA = \frac{SSA}{a-1} = \frac{658.845}{1} = 658.845$, $MSB = \frac{SSB}{b-1} = \frac{255.38}{1} = 255.38$, $MSAB = \frac{SSAB}{(a-1)(b-1)} = \frac{2}{1} = 2$, $MSE = \frac{SSE}{n-ab} = \frac{56.49}{4} = 14.1225$
$F_A = \frac{MSA}{MSE} = \frac{658.845}{14.1225} = 46.65$, $F_B = \frac{MSB}{MSE} = \frac{255.38}{14.1225} = 18.08$, $F_{AB} = \frac{MSAB}{MSE} = \frac{2}{14.1225} = .14$
The ANOVA table is:
Source   df   SS        MS         F
A         1   658.845   658.845    46.65
B         1   255.380   255.380    18.08
AB        1     2.000     2.000      .14
Error     4    56.490    14.1225
Total     7   972.715
c. $SST = SSA + SSB + SSAB = 658.845 + 255.380 + 2.000 = 916.225$, with $ab - 1 = 2(2) - 1 = 3$ df
$MST = \frac{SST}{ab-1} = \frac{916.225}{3} = 305.408$, $F_T = \frac{MST}{MSE} = \frac{305.408}{14.1225} = 21.63$
To determine whether the treatment means differ, we test:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: At least two treatment means differ
The test statistic is $F = 21.63$. The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = ab - 1 = 2(2) - 1 = 3$ and $\nu_2 = n - ab = 8 - 2(2) = 4$. From Table VI, Appendix D, $F_{.05} = 6.59$. The rejection region is $F > 6.59$.
Since the observed value of the test statistic falls in the rejection region ($F = 21.63 > 6.59$), $H_0$ is rejected. There is sufficient evidence to indicate the treatment means differ at $\alpha = .05$. This agrees with the conclusion in part a.
d. Since there are differences among the treatment means, we test for the presence of interaction:
$H_0$: Factors A and B do not interact to affect the response means
$H_a$: Factors A and B do interact to affect the response means
The test statistic is $F = .14$. The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = (a-1)(b-1) = (2-1)(2-1) = 1$ and $\nu_2 = n - ab = 8 - 2(2) = 4$. From Table VI, Appendix D, $F_{.05} = 7.71$. The rejection region is $F > 7.71$.
Since the observed value of the test statistic does not fall in the rejection region ($F = .14 < 7.71$), $H_0$ is not rejected. There is insufficient evidence to indicate the factors interact at $\alpha = .05$.
e. Since the interaction was not significant, we test for main effects. To determine whether the two means of factor A differ, we test:
$H_0: \mu_1 = \mu_2$
$H_a: \mu_1 \neq \mu_2$
The test statistic is $F = 46.65$. The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = a - 1 = 2 - 1 = 1$ and $\nu_2 = n - ab = 8 - 2(2) = 4$. From Table VI, Appendix D, $F_{.05} = 7.71$. The rejection region is $F > 7.71$.
Since the observed value of the test statistic falls in the rejection region ($F = 46.65 > 7.71$), $H_0$ is rejected.
There is sufficient evidence to indicate the two means of factor A differ at $\alpha = .05$.
To determine whether the two means of factor B differ, we test:
$H_0: \mu_1 = \mu_2$
$H_a: \mu_1 \neq \mu_2$
The test statistic is $F = 18.08$. The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = b - 1 = 2 - 1 = 1$ and $\nu_2 = n - ab = 8 - 2(2) = 4$. From Table VI, Appendix D, $F_{.05} = 7.71$. The rejection region is $F > 7.71$.
Since the observed value of the test statistic falls in the rejection region ($F = 18.08 > 7.71$), $H_0$ is rejected. There is sufficient evidence to indicate the two means of factor B differ at $\alpha = .05$.
f. The results of all the tests agree with those in part a.
g. Since no interaction is present, but the means of both factors A and B differ, we compare the two means of factor A and compare the two means of factor B. Since there are only two means to compare for each factor, the higher population mean corresponds to the higher sample mean.
Factor A: $\bar{x}_1 = \frac{A_1}{br} = \frac{29.6 + 35.2 + 47.3 + 42.1}{2(2)} = 38.55$, $\bar{x}_2 = \frac{A_2}{br} = \frac{12.9 + 17.6 + 28.4 + 22.7}{2(2)} = 20.4$
The mean for level 1 of factor A is significantly higher than the mean for level 2.
Factor B: $\bar{x}_1 = \frac{B_1}{ar} = \frac{29.6 + 35.2 + 12.9 + 17.6}{2(2)} = 23.825$, $\bar{x}_2 = \frac{B_2}{ar} = \frac{47.3 + 42.1 + 28.4 + 22.7}{2(2)} = 35.125$
The mean for level 2 of factor B is significantly higher than the mean for level 1.

9.69 a. $SSA = .2(1000) = 200$, $SSB = .1(1000) = 100$, $SSAB = .1(1000) = 100$
$SSE = SS(\text{Total}) - SSA - SSB - SSAB = 1000 - 200 - 100 - 100 = 600$
$SST = SSA + SSB + SSAB = 200 + 100 + 100 = 400$
$MSA = \frac{SSA}{a-1} = \frac{200}{3-1} = 100$, $MSB = \frac{SSB}{b-1} = \frac{100}{3-1} = 50$, $MSAB = \frac{SSAB}{(a-1)(b-1)} = \frac{100}{(3-1)(3-1)} = 25$
$MSE = \frac{SSE}{n-ab} = \frac{600}{27-3(3)} = 33.333$, $MST = \frac{SST}{ab-1} = \frac{400}{3(3)-1} = 50$
$F_A = \frac{MSA}{MSE} = \frac{100}{33.333} = 3.00$, $F_B = \frac{MSB}{MSE} = \frac{50}{33.333} = 1.50$, $F_{AB} = \frac{MSAB}{MSE} = \frac{25}{33.333} = .75$, $F_T = \frac{MST}{MSE} = \frac{50}{33.333} = 1.50$
Source   df   SS     MS       F
A         2    200   100      3.00
B         2    100    50      1.50
AB        4    100    25       .75
Error    18    600    33.333
Total    26   1000
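Every part of Exercise 9.69 uses the same design ($a = b = 3$, $r = 3$, $n = 27$, $SS(\text{Total}) = 1000$), so the whole decision sequence can be scripted: test the treatment means first, and test for interaction only if the treatment means differ. In the Python sketch below the function name is hypothetical, and the critical values 2.51 and 2.93 are simply the Table VI values quoted in the solution, hard-coded rather than computed; they apply only to this design (8 and 4 numerator df, 18 denominator df).

```python
def analyze_proportions(p_a, p_b, p_ab, ss_total=1000.0, a=3, b=3, r=3,
                        f_crit_treat=2.51, f_crit_inter=2.93):
    """Run the Exercise 9.69 test sequence when SS are given as proportions."""
    n = a * b * r
    ss_a, ss_b, ss_ab = p_a * ss_total, p_b * ss_total, p_ab * ss_total
    mse = (ss_total - ss_a - ss_b - ss_ab) / (n - a * b)
    result = {
        "F_A": ss_a / (a - 1) / mse,
        "F_B": ss_b / (b - 1) / mse,
        "F_AB": ss_ab / ((a - 1) * (b - 1)) / mse,
        "F_T": (ss_a + ss_b + ss_ab) / (a * b - 1) / mse,
    }
    # Step 1: do the treatment means differ? If not, stop here.
    result["treatments_differ"] = result["F_T"] > f_crit_treat
    # Step 2: only then ask whether factors A and B interact
    result["interaction"] = result["treatments_differ"] and result["F_AB"] > f_crit_inter
    return result


parts = {"a": (.2, .1, .1), "b": (.1, .1, .5), "c": (.4, .1, .2), "d": (.4, .4, .1)}
for label, props in parts.items():
    res = analyze_proportions(*props)
    print(label, {key: round(value, 2) if isinstance(value, float) else value
                  for key, value in res.items()})
```

Part a stops after the treatment test ($F_T = 1.50 < 2.51$), while parts b, c, and d all reach a significant interaction, matching the hand calculations.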
To determine whether the treatment means differ, we test:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_9$
$H_a$: At least two treatment means differ
The test statistic is $F = \frac{MST}{MSE} = 1.50$. Suppose $\alpha = .05$. The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = ab - 1 = 3(3) - 1 = 8$ and $\nu_2 = n - ab = 27 - 3(3) = 18$. From Table VI, Appendix D, $F_{.05} = 2.51$. The rejection region is $F > 2.51$.
Since the observed value of the test statistic does not fall in the rejection region ($F = 1.50 < 2.51$), $H_0$ is not rejected. There is insufficient evidence to indicate the treatment means differ at $\alpha = .05$. Since no treatment mean differences were detected, no further tests are needed.
b. $SSA = .1(1000) = 100$, $SSB = .1(1000) = 100$, $SSAB = .5(1000) = 500$
$SSE = SS(\text{Total}) - SSA - SSB - SSAB = 1000 - 100 - 100 - 500 = 300$
$SST = SSA + SSB + SSAB = 100 + 100 + 500 = 700$
$MSA = \frac{SSA}{a-1} = \frac{100}{3-1} = 50$, $MSB = \frac{SSB}{b-1} = \frac{100}{3-1} = 50$, $MSAB = \frac{SSAB}{(a-1)(b-1)} = \frac{500}{(3-1)(3-1)} = 125$
$MSE = \frac{SSE}{n-ab} = \frac{300}{27-3(3)} = 16.667$, $MST = \frac{SST}{ab-1} = \frac{700}{3(3)-1} = 87.5$
$F_A = \frac{MSA}{MSE} = \frac{50}{16.667} = 3.00$, $F_B = \frac{MSB}{MSE} = \frac{50}{16.667} = 3.00$, $F_{AB} = \frac{MSAB}{MSE} = \frac{125}{16.667} = 7.50$, $F_T = \frac{MST}{MSE} = \frac{87.5}{16.667} = 5.25$
Source   df   SS     MS       F
A         2    100    50      3.00
B         2    100    50      3.00
AB        4    500   125      7.50
Error    18    300    16.667
Total    26   1000
To determine if the treatment means differ, we test:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_9$
$H_a$: At least two treatment means differ
The test statistic is $F = \frac{MST}{MSE} = 5.25$. The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = ab - 1 = 3(3) - 1 = 8$ and $\nu_2 = n - ab = 27 - 3(3) = 18$. From Table VI, Appendix D, $F_{.05} = 2.51$. The rejection region is $F > 2.51$.
Since the observed value of the test statistic falls in the rejection region ($F = 5.25 > 2.51$), $H_0$ is rejected. There is sufficient evidence to indicate the treatment means differ at $\alpha = .05$. Since the treatment means differ, we next test for interaction between factors A and B.
To determine if factors A and B interact, we test:
$H_0$: Factors A and B do not interact to affect the mean response
$H_a$: Factors A and B do interact to affect the mean response
The test statistic is $F = \frac{MSAB}{MSE} = 7.50$. The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = (a-1)(b-1) = (3-1)(3-1) = 4$ and $\nu_2 = n - ab = 27 - 3(3) = 18$. From Table VI, Appendix D, $F_{.05} = 2.93$. The rejection region is $F > 2.93$.
Since the observed value of the test statistic falls in the rejection region ($F = 7.50 > 2.93$), $H_0$ is rejected. There is sufficient evidence to indicate the factors A and B interact at $\alpha = .05$. Since interaction is present, no tests for main effects are necessary.
c. $SSA = .4(1000) = 400$, $SSB = .1(1000) = 100$, $SSAB = .2(1000) = 200$
$SSE = SS(\text{Total}) - SSA - SSB - SSAB = 1000 - 400 - 100 - 200 = 300$
$SST = SSA + SSB + SSAB = 400 + 100 + 200 = 700$
$MSA = \frac{SSA}{a-1} = \frac{400}{3-1} = 200$, $MSB = \frac{SSB}{b-1} = \frac{100}{3-1} = 50$, $MSAB = \frac{SSAB}{(a-1)(b-1)} = \frac{200}{(3-1)(3-1)} = 50$
$MSE = \frac{SSE}{n-ab} = \frac{300}{27-3(3)} = 16.667$, $MST = \frac{SST}{ab-1} = \frac{700}{3(3)-1} = 87.5$
$F_A = \frac{MSA}{MSE} = \frac{200}{16.667} = 12.00$, $F_B = \frac{MSB}{MSE} = \frac{50}{16.667} = 3.00$, $F_{AB} = \frac{MSAB}{MSE} = \frac{50}{16.667} = 3.00$, $F_T = \frac{MST}{MSE} = \frac{87.5}{16.667} = 5.25$
Source   df   SS     MS       F
A         2    400   200      12.00
B         2    100    50       3.00
AB        4    200    50       3.00
Error    18    300    16.667
Total    26   1000
To determine if the treatment means differ, we test:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_9$
$H_a$: At least two treatment means differ
The test statistic is $F = \frac{MST}{MSE} = 5.25$. The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = ab - 1 = 3(3) - 1 = 8$ and $\nu_2 = n - ab = 27 - 3(3) = 18$. From Table VI, Appendix D, $F_{.05} = 2.51$. The rejection region is $F > 2.51$.
Since the observed value of the test statistic falls in the rejection region ($F = 5.25 > 2.51$), $H_0$ is rejected. There is sufficient evidence to indicate the treatment means differ at $\alpha = .05$. Since the treatment means differ, we next test for interaction between factors A and B.
To determine if factors A and B interact, we test:
$H_0$: Factors A and B do not interact to affect the mean response
$H_a$: Factors A and B do interact to affect the mean response
The test statistic is $F = \frac{MSAB}{MSE} = 3.00$. The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = (a-1)(b-1) = (3-1)(3-1) = 4$ and $\nu_2 = n - ab = 27 - 3(3) = 18$. From Table VI, Appendix D, $F_{.05} = 2.93$. The rejection region is $F > 2.93$.
Since the observed value of the test statistic falls in the rejection region ($F = 3.00 > 2.93$), $H_0$ is rejected. There is sufficient evidence to indicate the factors A and B interact at $\alpha = .05$. Since interaction is present, no tests for main effects are necessary.
d. $SSA = .4(1000) = 400$, $SSB = .4(1000) = 400$, $SSAB = .1(1000) = 100$
$SSE = SS(\text{Total}) - SSA - SSB - SSAB = 1000 - 400 - 400 - 100 = 100$
$SST = SSA + SSB + SSAB = 400 + 400 + 100 = 900$
$MSA = \frac{SSA}{a-1} = \frac{400}{3-1} = 200$, $MSB = \frac{SSB}{b-1} = \frac{400}{3-1} = 200$, $MSAB = \frac{SSAB}{(a-1)(b-1)} = \frac{100}{(3-1)(3-1)} = 25$
$MSE = \frac{SSE}{n-ab} = \frac{100}{27-3(3)} = 5.556$, $MST = \frac{SST}{ab-1} = \frac{900}{3(3)-1} = 112.5$
$F_A = \frac{MSA}{MSE} = \frac{200}{5.556} = 36.00$, $F_B = \frac{MSB}{MSE} = \frac{200}{5.556} = 36.00$, $F_{AB} = \frac{MSAB}{MSE} = \frac{25}{5.556} = 4.50$, $F_T = \frac{MST}{MSE} = \frac{112.5}{5.556} = 20.25$
Source   df   SS     MS      F
A         2    400   200     36.00
B         2    400   200     36.00
AB        4    100    25      4.50
Error    18    100    5.556
Total    26   1000
To determine if the treatment means differ, we test:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_9$
$H_a$: At least two treatment means differ
The test statistic is $F = \frac{MST}{MSE} = 20.25$. The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = ab - 1 = 3(3) - 1 = 8$ and $\nu_2 = n - ab = 27 - 3(3) = 18$. From Table VI, Appendix D, $F_{.05} = 2.51$. The rejection region is $F > 2.51$.
Since the observed value of the test statistic falls in the rejection region ($F = 20.25 > 2.51$), $H_0$ is rejected. There is sufficient evidence to indicate the treatment means differ at $\alpha = .05$. Since the treatment means differ, we next test for interaction between factors A and B.
To determine if factors A and B interact, we test:
$H_0$: Factors A and B do not interact to affect the mean response
$H_a$: Factors A and B do interact to affect the mean response
The test statistic is $F = \frac{MSAB}{MSE} = 4.50$. The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = (a-1)(b-1) = (3-1)(3-1) = 4$ and $\nu_2 = n - ab = 27 - 3(3) = 18$. From Table VI, Appendix D, $F_{.05} = 2.93$. The rejection region is $F > 2.93$.
Since the observed value of the test statistic falls in the rejection region ($F = 4.50 > 2.93$), $H_0$ is rejected. There is sufficient evidence to indicate the factors A and B interact at $\alpha = .05$. Since interaction is present, no tests for main effects are necessary.

9.70 a. The experimental design used was a factorial design.
b. The two factors are diet and age. There are 2 levels of diet: fine limestone (FL) and coarse limestone (CL). There are 2 levels of age: young and old. There are $2 \times 2 = 4$ treatments: FL/young, FL/old, CL/young, and CL/old.
c. The experimental units are the hens.
d. The dependent variable is egg shell thickness.
e. If diet and age do not interact, then the effect of diet on egg shell thickness is the same at each level of age.
f. This indicates that there is no significant difference in mean egg shell thickness between the young and old hens.
g. This indicates that there is a significant difference in mean egg shell thickness due to diet. The mean egg shell thickness for eggs produced by hens on the CL diet is greater than the mean egg shell thickness for eggs produced by hens on the FL diet.

9.71 a. The two factors are type of statement and order of information. There are $2 \times 2 = 4$ treatments: concrete/statement first, concrete/behavior first, abstract/statement first, and abstract/behavior first.
b. This indicates that the effect of type of statement on the level of hypocrisy depends on the order of the information.
c. Using MINITAB, a plot of the means is:
[Figure: Scatterplot of Hypocrisy vs Order (Statement, Behavior), with separate lines for Type = Abstract and Concrete]
d. Since the interaction between the type of statement and the order of information was significant, the tests for main effects should not be performed. Multiple comparisons on some or all of the pairs of treatments should be performed next.

9.72 a. There are a total of $2 \times 4 = 8$ treatments.
b. The interaction between temperature and type was significant. This means that the effect of type of yeast on the mean autolysis yield depends on the level of temperature.
c. To determine if the main effect of type of yeast is significant, we test:
$H_0: \mu_{Ba} = \mu_{Br}$
$H_a: \mu_{Ba} \neq \mu_{Br}$
To determine if the main effect of temperature is significant, we test:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: At least two of the temperature means differ
d. The tests for the main effects should not be run since the test for interaction was significant. If interaction is significant, the interaction effects can mask the main effects, so the main effect tests would not be informative.
e. Baker's yeast: The mean yield for temperature 54° is significantly lower than the mean yields for the other 3 temperatures. There is no difference in the mean yields for the temperatures 45°, 48°, and 51°.
Brewer's yeast: The mean yield for temperature 54° is significantly lower than the mean yields for the other 3 temperatures. There is no difference in the mean yields for the temperatures 45°, 48°, and 51°.

9.73 a. If justice reparation potential and producer need interact, then the effect of justice reparation potential on intention depends on the level of producer need.
b.
To determine if interaction exists, we test: H0: Justice reparation potential and producer need do not interact Ha: Justice reparation potential and producer need do interact The test statistic is F 20.55 and the p-value is p 0.000 . Since the p-value is less than ( p 0.000 .01) , H0 is rejected. There is sufficient evidence to indicate reparation justice potential and producer need interact to affect intension at .01 . 9.74 c. No. Since the test for interaction was significant, then the tests for the main effects are not necessary. d. This plot indicates that for high reparation justice potential, as producer need changes from High to Moderate, the mean intension decreases. However, for low reparation justice potential, as producer need changes from High to Moderate, the mean intension increases. This indicates that the effect of reparation justice potential on intension depends on the level of producer need. e. Yes. This is exactly what the graph shows. a. There are a total of 2 4 8 treatments for this study. They include all combinations of Insomnia status and Education level. The 8 treatments are: Normal sleeper, College graduate Normal sleeper, Some college Normal sleeper, High school graduate Normal sleeper, High school dropout Chronic insomnia, College graduate Chronic insomnia, Some college Chronic insomnia, High school graduate Chronic insomnia, High school dropout Copyright © 2014 Pearson Education, Inc. 530 Chapter 9 b. Since Insomnia and Education did not interact, this means that the effect of Insomnia on the Fatigue Severity Scale does not depend on the level of Education. In a graph, the lines will be parallel. A possible graph of this situation is: Scatterplot of FSS vs Insomnia Education 1 2 3 4 11 10 9 FSS 8 7 6 5 4 3 2 1.0 9.75 1.2 1.4 1.6 Insomnia 1.8 2.0 c. This means that the researchers can infer that the population mean FSS for people who had insomnia is higher than the population mean FSS for normal sleepers. d. 
This means that at least one level of education had a mean FSS score that differed from the rest. There may be more than one difference, but there is at least one.

e. With 95% confidence, we can conclude that the mean FSS value for high school dropouts is significantly higher than the mean FSS values for the 3 other education levels. There is no significant difference in the mean FSS values for college graduates, those with some college, and high school graduates.

a. $df_{Order} = a - 1 = 2 - 1 = 1$, $df_{Menu} = b - 1 = 2 - 1 = 1$, $df_{O \times M} = (a - 1)(b - 1) = (2 - 1)(2 - 1) = 1$, $df_{Error} = n - ab = 180 - 2(2) = 176$

Source          df    F-value   p-value
Order            1    --        --
Menu             1    --        --
Order x Menu     1    11.25     <.001
Error          176
Total          179

b. Since the p-value is less than $\alpha$ ($p < .001 < .05$), $H_0$ is rejected. There is sufficient evidence to indicate order and menu interact to affect the amount willing to pay at $\alpha = .05$.

c. No, these results are not required to complete the analysis. Since the test for interaction was significant, there is no need to run the main effect tests.

d. Using MINITAB, a graph of the means is:

[Figure: Interaction Plot for WillingPay; x-axis: Menu (Homogeneous, Mixed); y-axis: Mean; one line per Order level (Vice, Virtue)]

9.76 a. There are two factors for this experiment, housing system and weight class. There are a total of $2 \times 4 = 8$ treatments. The treatments are:

Cage, M     Barn, M     Free, M     Organic, M
Cage, L     Barn, L     Free, L     Organic, L

b. Using SAS, the results are:

The GLM Procedure
Dependent Variable: OVERRUN

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              7      11364.52381    1623.50340     14.93   <.0001
Error             20       2175.33333     108.76667
Corrected Total   27      13539.85714

R-Square   Coeff Var   Root MSE   OVERRUN Mean
0.839339    2.061383   10.42913       505.9286

Source            DF      Type I SS   Mean Square   F Value   Pr > F
HOUSING            3    10787.79048    3595.93016     33.06   <.0001
WTCLASS            1      329.14286     329.14286      3.03   0.0973
HOUSING*WTCLASS    3      247.59048      82.53016      0.76   0.5303

Source            DF    Type III SS   Mean Square   F Value   Pr > F
HOUSING            3    10787.79048    3595.93016     33.06   <.0001
WTCLASS            1      320.47407     320.47407      2.95   0.1015
HOUSING*WTCLASS    3      247.59048      82.53016      0.76   0.5303

c. To determine if interaction between housing system and weight class exists, we test:

$H_0$: Housing system and weight class do not interact
$H_a$: Housing system and weight class do interact

The test statistic is $F = 0.76$ and the p-value is $p = .5303$. Since the p-value is not less than $\alpha$ ($p = .5303 > .05$), $H_0$ is not rejected. There is insufficient evidence to indicate that housing system and weight class interact at $\alpha = .05$.

d. To determine if there is a difference in mean whipping capacity among the 4 housing systems, we test:

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: At least two treatment means differ

The test statistic is $F = 33.06$ and the p-value is $p < .0001$. Since the p-value is less than $\alpha$ ($p < .0001 < .05$), $H_0$ is rejected. There is sufficient evidence to indicate a difference in mean whipping capacity among the 4 housing systems at $\alpha = .05$.

e. To determine if there is a difference in mean whipping capacity between the 2 weight classes, we test:

$H_0: \mu_1 = \mu_2$
$H_a: \mu_1 \neq \mu_2$

The test statistic is $F = 2.95$ and the p-value is $p = .1015$. Since the p-value is not less than $\alpha$ ($p = .1015 > .05$), $H_0$ is not rejected. There is insufficient evidence to indicate a difference in mean whipping capacity between the 2 weight classes at $\alpha = .05$.

9.77 Yes. Using MINITAB, a plot of the data is:

[Figure: Scatterplot of Mean Time vs Density (Low, High); one line per Agent (Gum, PVP)]

Since the lines are not parallel, this indicates interaction is present. The increase in mean time when density is increased from low to high for PVP is not as great as the increase in mean time when density is increased from low to high for Gum.

9.78 Using MINITAB, the results of the ANOVA are:

General Linear Model: NUMBER versus GROUP, SET

Factor   Type    Levels   Values
GROUP    fixed        3   3, 6, 12
SET      fixed        3   FIRST, LAST, MIDDLE

Analysis of Variance for NUMBER, using Adjusted SS for Tests

Source      DF    Seq SS    Adj SS   Adj MS       F       P
GROUP        2    15.267    15.267    7.633    7.59   0.001
SET          2    62.600    62.600   31.300   31.11   0.000
GROUP*SET    4     7.133     7.133    1.783    1.77   0.142
Error       81    81.500    81.500    1.006
Total       89   166.500

S = 1.00308   R-Sq = 51.05%   R-Sq(adj) = 46.22%

Means

SET       N   NUMBER
FIRST    30   3.0000
LAST     30   1.1000
MIDDLE   30   1.4000

GROUP     N   NUMBER
3        30   2.4000
6        30   1.4333
12       30   1.6667

Tukey 95.0% Simultaneous Confidence Intervals
Response Variable NUMBER
All Pairwise Comparisons among Levels of GROUP

GROUP = 3 subtracted from:
GROUP       Lower    Center     Upper
6          -1.586   -0.9667   -0.3477
12         -1.352   -0.7333   -0.1143

GROUP = 6 subtracted from:
GROUP       Lower    Center     Upper
12        -0.3857    0.2333    0.8523
Tukey 95.0% Simultaneous Confidence Intervals
Response Variable NUMBER
All Pairwise Comparisons among Levels of SET

SET = FIRST subtracted from:
SET         Lower    Center     Upper
LAST       -2.519    -1.900    -1.281
MIDDLE     -2.219    -1.600    -0.981

SET = LAST subtracted from:
SET         Lower    Center     Upper
MIDDLE    -0.3190    0.3000    0.9190

To determine if group size and photo set interact to affect the number of selections, we test:

$H_0$: Group size and photo set do not interact to affect the number of selections
$H_a$: Group size and photo set interact to affect the number of selections

The test statistic is $F = 1.77$ and the p-value is $p = .142$. Since the p-value is not small, $H_0$ is not rejected. There is insufficient evidence to indicate that group size and photo set interact to affect the number of selections for any reasonable level of $\alpha$.

Since there is no evidence of an interaction, we will next test for the main effects. To determine if group size had an effect on the mean number of selections, we test:

$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: At least two group size means differ

The test statistic is $F = 7.59$ and the p-value is $p = .001$. Since the p-value is so small, $H_0$ is rejected. There is sufficient evidence to indicate that group size has an effect on the mean number of selections for any level of $\alpha$ greater than .001.

To determine if photo set had an effect on the mean number of selections, we test:

$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: At least two photo set means differ

The test statistic is $F = 31.11$ and the p-value is $p = .000$. Since the p-value is so small, $H_0$ is rejected. There is sufficient evidence to indicate that photo set has an effect on the mean number of selections for any reasonable level of $\alpha$.
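As a check on the printout, the F, S, and R-Sq entries of the Exercise 9.78 MINITAB table can be recomputed from its SS and DF columns alone. The minimal Python sketch below applies $MS = SS/df$, $F = MS/MSE$, $S = \sqrt{MSE}$, $R^2 = 1 - SSE/SS(\text{Total})$ and $R^2_{adj} = 1 - MSE/[SS(\text{Total})/df_{Total}]$; the helper name `anova_summary` is hypothetical, and the inputs are copied from the printout.

```python
# Sketch: rebuild the F, S, R-Sq and R-Sq(adj) entries of the Exercise 9.78
# table from its sums of squares and degrees of freedom (values copied from
# the printout; the helper name is hypothetical).
import math


def anova_summary(effects, sse, df_error, ss_total, df_total):
    """Return (F ratio per source, S, R^2, adjusted R^2)."""
    mse = sse / df_error
    f_ratios = {name: (ss / df) / mse for name, (ss, df) in effects.items()}
    s = math.sqrt(mse)                           # Root MSE, printed as S
    r_sq = 1 - sse / ss_total                    # share of variation explained
    r_sq_adj = 1 - mse / (ss_total / df_total)   # penalized for model df
    return f_ratios, s, r_sq, r_sq_adj


effects = {"GROUP": (15.267, 2), "SET": (62.600, 2), "GROUP*SET": (7.133, 4)}
f, s, r2, r2a = anova_summary(effects, sse=81.500, df_error=81,
                              ss_total=166.500, df_total=89)
for name, value in f.items():
    print(f"{name:10s} F = {value:.2f}")
print(f"S = {s:.5f}  R-Sq = {100 * r2:.2f}%  R-Sq(adj) = {100 * r2a:.2f}%")
```

The printed values match the printout (F = 7.59, 31.11 and 1.77; S = 1.00308; R-Sq = 51.05%; R-Sq(adj) = 46.22%).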
Since both main effects are significant, we will run Tukey's multiple comparison procedure on each main effect to find where the differences exist. The mean numbers of selections made for the different group sizes are:

Means:    1.433    1.667    2.400
Groups:   6        12       3
          ______________

The confidence interval comparing size 3 to size 6 is (-1.586, -.3477). Since both endpoints of the interval are negative, the mean number of selections for size 3 is significantly greater than the mean number of selections for size 6. The confidence interval comparing size 3 to size 12 is (-1.352, -.1143). Since both endpoints of the interval are negative, the mean number of selections for size 3 is significantly greater than the mean number of selections for size 12. The confidence interval comparing size 6 to size 12 is (-.3857, .8523). Since 0 is contained in the interval, there is no significant difference in the mean number of selections between sizes 6 and 12. Thus, there are significantly more selections made for group size 3 than for the other two sizes.

The mean numbers of selections made for the different photo sets are:

Means:    1.10    1.40      3.00
Groups:   Last    Middle    First
          ______________

The confidence interval comparing the first photo set to the last photo set is (-2.519, -1.281). Since both endpoints of the interval are negative, the mean number of selections for the first photo set is significantly greater than the mean number of selections for the last photo set. The confidence interval comparing the first photo set to the middle photo set is (-2.219, -.981). Since both endpoints of the interval are negative, the mean number of selections for the first photo set is significantly greater than the mean number of selections for the middle photo set. The confidence interval comparing the middle photo set to the last photo set is (-.3190, .9190).
Since 0 is contained in the interval, there is no significant difference in the mean number of selections between the last photo set and the middle photo set. Thus, there are significantly more selections made for the first photo set than for the other two photo sets.

9.79 Using MINITAB, a complete factorial design was fit to the data:

General Linear Model: RECALL versus CONTENT, BEFORE

Factor    Type    Levels   Values
CONTENT   fixed        3   NEUTRAL, SEX, VIOLENT
BEFORE    fixed        2   NO, YES

Analysis of Variance for RECALL, using Adjusted SS for Tests

Source            DF     Seq SS     Adj SS   Adj MS       F       P
CONTENT            2    123.265    120.004   60.002   20.01   0.000
BEFORE             1      6.458      6.393    6.393    2.13   0.145
CONTENT*BEFORE     2      7.472      7.472    3.736    1.25   0.289
Error            318    953.421    953.421    2.998
Total            323   1090.617

S = 1.73153   R-Sq = 12.58%   R-Sq(adj) = 11.21%

Grouping Information Using Tukey Method and 95.0% Confidence

CONTENT     N    Mean    Grouping
NEUTRAL   108   3.167    A
VIOLENT   108   2.090      B
SEX       108   1.731      B

Means that do not share a letter are significantly different.

Grouping Information Using Tukey Method and 95.0% Confidence

BEFORE      N    Mean    Grouping
NO        162   2.470    A
YES       162   2.188    A

Means that do not share a letter are significantly different.

First, we test for the interaction term. To determine if content group and whether one had watched the commercial before interact to affect recall, we test:

$H_0$: Content and whether one watched the commercial before do not interact
$H_a$: Content and whether one watched the commercial before do interact

The test statistic is $F = 1.25$ and the p-value is $p = .289$. Since the p-value is not small, $H_0$ is not rejected. There is no evidence to indicate content and whether the commercial was viewed before interact to affect recall for any reasonable value of $\alpha$.

Next, we test for the main effects. To determine if the mean recall differs among the content groups, we test:

$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: At least two means differ

The test statistic is $F = 20.01$ and the p-value is $p = .000$.
Since the p-value is very small, $H_0$ is rejected. There is evidence to indicate the mean recall differs among the different content groups for any reasonable value of $\alpha$. Tukey's multiple comparison on the content means yielded the following: the mean recall for those in the neutral content group was significantly higher than the mean recall of the other 2 groups. No other differences existed.

To determine if the mean recall differs between those who had watched the ad before and those who had not, we test:

$H_0: \mu_1 = \mu_2$
$H_a: \mu_1 \neq \mu_2$

The test statistic is $F = 2.13$ and the p-value is $p = .145$. Since the p-value is not small, $H_0$ is not rejected. There is no evidence to indicate the mean recall differs between those who had watched the ad before and those who had not for any reasonable value of $\alpha$. These results agree with the researchers' conclusions.

9.80 Using MINITAB, the ANOVA results are:

General Linear Model: Deviation versus Group, Trail

Factor   Type    Levels   Values
Group    fixed        4   F, G, M, N
Trail    fixed        2   C, E

Analysis of Variance for Deviation, using Adjusted SS for Tests

Source         DF     Seq SS     Adj SS    Adj MS       F       P
Group           3    16271.2    13000.6    4333.5    5.91   0.001
Trail           1    46445.5    46445.5   46445.5   63.34   0.000
Group*Trail     3     2245.2     2245.2     748.4    1.02   0.386
Error         112    82131.7    82131.7     733.3
Total         119   147093.6

First, we must test for treatment effects.

$SST = SS_{Group} + SS_{Trail} + SS_{G \times T} = 16{,}271.2 + 46{,}445.5 + 2{,}245.2 = 64{,}961.9$, with $df = 3 + 1 + 3 = 7$.

$MST = \dfrac{SST}{ab - 1} = \dfrac{64{,}961.9}{4(2) - 1} = 9{,}280.2714$

$F = \dfrac{MST}{MSE} = \dfrac{9{,}280.2714}{733.3} = 12.66$

To determine if there are differences in mean deviations among the 8 treatments, we test:

$H_0$: All treatment means are the same
$H_a$: At least two treatment means differ

The test statistic is $F = 12.66$. Since no $\alpha$ was given, we will use $\alpha = .05$. The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with $\nu_1 = ab - 1 = 4(2) - 1 = 7$ and $\nu_2 = n - ab = 120 - 4(2) = 112$. From Table VI, Appendix D, $F_{.05} \approx 2.09$. The rejection region is $F > 2.09$.
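The treatment F statistic used in this test can be reproduced from the printout. Because the Seq SS and Adj SS for Group differ, the design is unbalanced, so the treatment sum of squares is taken from the sequential (Seq SS) column, whose entries add up to $SS(\text{Total}) - SSE$. A minimal Python sketch, with values copied from the MINITAB output and the critical value 2.09 taken from the text rather than computed:

```python
# Sketch: reproduce the treatment F test of Exercise 9.80 by pooling the
# sequential sums of squares from the MINITAB printout (values copied from it).
seq_ss = {"Group": 16271.2, "Trail": 46445.5, "Group*Trail": 2245.2}
a, b, n = 4, 2, 120      # levels of Group, levels of Trail, total sample size
sse, ss_total = 82131.7, 147093.6

sst = sum(seq_ss.values())   # sequential SS partition the model (treatment) SS
df_treat = a * b - 1         # 7
df_error = n - a * b         # 112
mst = sst / df_treat
mse = sse / df_error
f_stat = mst / mse

f_crit = 2.09                # F(.05; 7, 112) as read from Table VI in the text
print(f"SST = {sst:.1f} (SS Total - SSE = {ss_total - sse:.1f})")
print(f"MST = {mst:.4f}, MSE = {mse:.1f}, F = {f_stat:.2f}")
print("reject H0" if f_stat > f_crit else "do not reject H0")
```

Using the Adj SS column instead would not add up to the model sum of squares here, which is why the sketch sums the sequential column.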
Since the observed value of the test statistic falls in the rejection region ($F = 12.66 > 2.09$), $H_0$ is rejected. There is sufficient evidence that differences exist among the treatment means at $\alpha = .05$.

Since differences exist, we now test for the interaction effect between Trail and Group. To determine if Trail and Group interact, we test:

$H_0$: Trail and Group do not interact
$H_a$: Trail and Group do interact

The test statistic is $F = 1.02$ and the p-value is $p = .386$. Since the p-value is greater than $\alpha$ ($p = .386 > .05$), $H_0$ is not rejected. There is insufficient evidence that Trail and Group interact at $\alpha = .05$.

Since there is no evidence of interaction, we test for the main effects of Trail and Group.

To determine if there are differences in the mean trail deviations between the two levels of Trail, we test:

$H_0: \mu_1 = \mu_2$
$H_a: \mu_1 \neq \mu_2$

The test statistic is $F = 63.34$ and the p-value is $p = .000$. Since the p-value is less than $\alpha$ ($p = .000 < .05$), $H_0$ is rejected. There is sufficient evidence that the mean trail deviations differ between the fecal extract trail and the control trail at $\alpha = .05$.

To determine if there are differences in the mean trail deviations among the four levels of Group, we test:

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: At least two means differ

The test statistic is $F = 5.91$ and the p-value is $p = .001$. Since the p-value is less than $\alpha$ ($p = .001 < .05$), $H_0$ is rejected. There is sufficient evidence that the mean trail deviations differ among the four groups at $\alpha = .05$.

9.81 a. Low Load, Ambiguous: $Total_1 = n_1\bar{x}_1 = 25(18) = 450$
High Load, Ambiguous: $Total_2 = n_2\bar{x}_2 = 25(6.1) = 152.5$
Low Load, Common: $Total_3 = n_3\bar{x}_3 = 25(7.8) = 195$
High Load, Common: $Total_4 = n_4\bar{x}_4 = 25(6.3) = 157.5$

b. $CM = \dfrac{(\text{sum of all observations})^2}{n} = \dfrac{(450 + 152.5 + 195 + 157.5)^2}{100} = \dfrac{955^2}{100} = 9{,}120.25$

c. Low Load total is $450 + 195 = 645$. High Load total is $152.5 + 157.5 = 310$.

$SS(\text{Load}) = \dfrac{\sum_{i=1}^{a} A_i^2}{br} - CM = \dfrac{645^2 + 310^2}{2(25)} - 9{,}120.25 = 10{,}242.5 - 9{,}120.25 = 1{,}122.25$

Ambiguous total is $450 + 152.5 = 602.5$.
Common total is $195 + 157.5 = 352.5$.

$SS(\text{Name}) = \dfrac{\sum_{j=1}^{b} B_j^2}{ar} - CM = \dfrac{602.5^2 + 352.5^2}{2(25)} - 9{,}120.25 = 9{,}745.25 - 9{,}120.25 = 625$

$SS(\text{Load} \times \text{Name}) = \dfrac{\sum_{i=1}^{a}\sum_{j=1}^{b} AB_{ij}^2}{r} - SS(\text{Load}) - SS(\text{Name}) - CM = \dfrac{450^2 + 152.5^2 + 195^2 + 157.5^2}{25} - 1{,}122.25 - 625 - 9{,}120.25 = 11{,}543.5 - 1{,}122.25 - 625 - 9{,}120.25 = 676$

d. Low Load, Ambiguous: $s_1^2 = 15^2 = 225$, $(n_1 - 1)s_1^2 = (25 - 1)225 = 5{,}400$
High Load, Ambiguous: $s_2^2 = 9.5^2 = 90.25$, $(n_2 - 1)s_2^2 = (25 - 1)90.25 = 2{,}166$
Low Load, Common: $s_3^2 = 9.5^2 = 90.25$, $(n_3 - 1)s_3^2 = (25 - 1)90.25 = 2{,}166$
High Load, Common: $s_4^2 = 10^2 = 100$, $(n_4 - 1)s_4^2 = (25 - 1)100 = 2{,}400$

e. $SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2 + (n_4 - 1)s_4^2 = 5{,}400 + 2{,}166 + 2{,}166 + 2{,}400 = 12{,}132$

f. $SS(\text{Total}) = SS(\text{Load}) + SS(\text{Name}) + SS(\text{Load} \times \text{Name}) + SSE = 1{,}122.25 + 625 + 676 + 12{,}132 = 14{,}555.25$

g. The ANOVA table is:

Source         df          SS          MS       F
Load            1    1,122.25    1,122.25    8.88
Name            1      625.00      625.00    4.95
Load x Name     1      676.00      676.00    5.35
Error          96   12,132.00     126.375
Total          99   14,555.25

h. Yes. We computed $F = 5.35$, which is almost the same as 5.34. The difference could be due to round-off error.

i. To determine if interaction between Load and Name is present, we test:

$H_0$: Load and Name do not interact
$H_a$: Load and Name do interact

The test statistic is $F = 5.35$. The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with $\nu_1 = (a - 1)(b - 1) = (2 - 1)(2 - 1) = 1$ and $\nu_2 = n - ab = 100 - 2(2) = 96$. From Table VI, Appendix D, $F_{.05} \approx 3.96$. The rejection region is $F > 3.96$.

Since the observed value of the test statistic falls in the rejection region ($F = 5.35 > 3.96$), $H_0$ is rejected. There is sufficient evidence to indicate that Load and Name interact at $\alpha = .05$.

Using MINITAB, a graph of the results is:

[Figure: Scatterplot of Mean vs Load (Low, High); one line per Name (1, 2)]

From the graph, the interaction is quite apparent.
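The interaction seen in the graph can also be quantified directly from the four cell means. For a 2 x 2 factorial with $r$ observations per cell, the interaction contrast is the difference between the simple effects of Name at the two Load levels, and $SS(\text{Load} \times \text{Name}) = r(\text{contrast})^2/4$. The Python sketch below uses the cell means from part a and the MSE from the part g table; it is a cross-check under those inputs, not the textbook's method.

```python
# Sketch: quantify the Load x Name interaction of Exercise 9.81 from the four
# cell means (r = 25 observations per cell). For a 2x2 factorial with r
# replicates, SS(AB) = r * contrast**2 / 4, where the contrast is the
# difference between the two simple effects.
r = 25
means = {("Low", "Ambiguous"): 18.0, ("High", "Ambiguous"): 6.1,
         ("Low", "Common"): 7.8, ("High", "Common"): 6.3}
mse = 126.375  # MSE from the ANOVA table in part g

# Simple effect of Name (Ambiguous minus Common) at each level of Load
effect_low = means[("Low", "Ambiguous")] - means[("Low", "Common")]
effect_high = means[("High", "Ambiguous")] - means[("High", "Common")]
contrast = effect_low - effect_high
ss_ab = r * contrast ** 2 / 4
f_ab = ss_ab / mse  # df(AB) = 1, so MS(AB) equals SS(AB)

print(f"Name effect at Low load:  {effect_low:.1f}")
print(f"Name effect at High load: {effect_high:.1f}")
print(f"SS(Load x Name) = {ss_ab:.2f}, F = {f_ab:.2f}")
```

The two simple effects (about 10.2 and -0.2) restate numerically what the graph shows, and the sketch recovers SS = 676 and F = 5.35.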
For Low load, the mean number of jelly beans taken for the ambiguous name is much higher than the mean number taken for the common name. However, for High load, there is essentially no difference in the mean number of jelly beans taken between the two names.

j. We must assume that:
1. The response distribution for each Load-Name combination (treatment) is normal.
2. The response variance is constant for all Load-Name combinations.
3. Random and independent samples of experimental units are associated with each Load-Name combination.

9.82 A one-way ANOVA has only one factor with 2 or more levels. A two-way ANOVA has 2 factors, each at 2 or more levels.

9.83 In a completely randomized design, independent random selection of treatments to be assigned to experimental units is required. In a randomized block design, the experimental units are first grouped into blocks such that within the blocks the experimental units are homogeneous and between the blocks the experimental units are heterogeneous. Once the experimental units are grouped into blocks, the treatments are randomly assigned to the experimental units within each block so that each treatment appears one time in each block.

9.84 There are $3 \times 2 = 6$ treatments. They are A1B1, A1B2, A2B1, A2B2, A3B1, and A3B2.

9.85 When the overall level of significance of a multiple comparisons procedure is $\alpha$, the level of significance for each comparison is less than $\alpha$. This is because the comparisons within the experiment are not independent of each other.

9.86 a. $SSE = SS(\text{Total}) - SST = 62.55 - 36.95 = 25.60$

$df_{Treatment} = k - 1 = 4 - 1 = 3$, $df_{Error} = n - k = 20 - 4 = 16$, $df_{Total} = n - 1 = 20 - 1 = 19$

$MST = \dfrac{SST}{df} = \dfrac{36.95}{3} = 12.32$, $MSE = \dfrac{SSE}{df} = \dfrac{25.60}{16} = 1.60$, $F = \dfrac{MST}{MSE} = \dfrac{12.32}{1.60} = 7.70$

The ANOVA table:

Source       df      SS      MS       F
Treatment     3   36.95   12.32    7.70
Error        16   25.60    1.60
Total        19   62.55

b. To determine if there is a difference in the treatment means, we test:

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: At least two means differ

where $\mu_i$ represents the mean for the $i$th treatment.

The test statistic is $F = \dfrac{MST}{MSE} = 7.70$.

The rejection region requires $\alpha = .10$ in the upper tail of the F distribution with $\nu_1 = k - 1 = 4 - 1 = 3$ and $\nu_2 = n - k = 20 - 4 = 16$. From Table V, Appendix D, $F_{.10} = 2.46$. The rejection region is $F > 2.46$.

Since the observed value of the test statistic falls in the rejection region ($F = 7.70 > 2.46$), $H_0$ is rejected. There is sufficient evidence to conclude that at least two of the means differ at $\alpha = .10$.

c. $\bar{x}_4 = \dfrac{\sum x_4}{n_4} = \dfrac{57}{5} = 11.4$

For confidence level .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table III, Appendix D, with $df = 16$, $t_{.05} = 1.746$. The confidence interval is:

$\bar{x}_4 \pm t_{.05}\sqrt{\dfrac{MSE}{n_4}} \Rightarrow 11.4 \pm 1.746\sqrt{\dfrac{1.6}{5}} \Rightarrow 11.4 \pm .99 \Rightarrow (10.41, 12.39)$

9.87 a. $SST = SS(\text{Total}) - SS(\text{Block}) - SSE = 22.31 - 10.688 - .288 = 11.334$

$MST = \dfrac{SST}{k - 1} = \dfrac{11.334}{4 - 1} = 3.778$, with $df = k - 1 = 4 - 1 = 3$

$MS(\text{Block}) = \dfrac{SS(\text{Block})}{b - 1} = \dfrac{10.688}{5 - 1} = 2.672$, with $df = b - 1 = 5 - 1 = 4$

$MSE = \dfrac{SSE}{n - k - b + 1} = \dfrac{.288}{20 - 4 - 5 + 1} = .024$, with $df = n - k - b + 1 = 12$

$F_T = \dfrac{MST}{MSE} = \dfrac{3.778}{.024} = 157.42$, $F_B = \dfrac{MS(\text{Block})}{MSE} = \dfrac{2.672}{.024} = 111.33$

The ANOVA table is:

Source       df       SS      MS         F
Treatment     3   11.334   3.778    157.42
Block         4   10.688   2.672    111.33
Error        12    0.288   0.024
Total        19   22.310

b. To determine if there are differences among the treatment means, we test:

$H_0: \mu_A = \mu_B = \mu_C = \mu_D$
$H_a$: At least two treatment means differ

The test statistic is $F = \dfrac{MST}{MSE} = 157.42$.

The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with $\nu_1 = k - 1 = 4 - 1 = 3$ and $\nu_2 = n - k - b + 1 = 20 - 4 - 5 + 1 = 12$. From Table VI, Appendix D, $F_{.05} = 3.49$. The rejection region is $F > 3.49$.

Since the observed value of the test statistic falls in the rejection region ($F = 157.42 > 3.49$), $H_0$ is rejected. There is sufficient evidence to indicate differences among the treatment means at $\alpha = .05$.

c.
Since there is evidence of differences among the treatment means, we need to compare the treatment means. The number of pairwise comparisons is $\dfrac{k(k - 1)}{2} = \dfrac{4(4 - 1)}{2} = 6$.

d. To determine if there are differences among the block means, we test:

$H_0$: All block means are the same
$H_a$: At least two block means differ

The test statistic is $F = \dfrac{MS(\text{Block})}{MSE} = 111.33$.

The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with $\nu_1 = b - 1 = 5 - 1 = 4$ and $\nu_2 = n - k - b + 1 = 20 - 4 - 5 + 1 = 12$. From Table VI, Appendix D, $F_{.05} = 3.26$. The rejection region is $F > 3.26$.

Since the observed value of the test statistic falls in the rejection region ($F = 111.33 > 3.26$), $H_0$ is rejected. There is sufficient evidence that the block means differ at $\alpha = .05$.

9.88 a. $df_{AB} = (a - 1)(b - 1) = 3(5) = 15$, $df_{Error} = n - ab = 48 - 4(6) = 24$

$SSAB = MSAB(df) = 3.1(15) = 46.5$

$SS(\text{Total}) = SSA + SSB + SSAB + SSE = 2.6 + 9.2 + 46.5 + 18.7 = 77$

$MSA = \dfrac{SSA}{a - 1} = \dfrac{2.6}{3} = .8667$, $MSB = \dfrac{SSB}{b - 1} = \dfrac{9.2}{5} = 1.84$, $MSE = \dfrac{SSE}{n - ab} = \dfrac{18.7}{24} = .7792$

$F_A = \dfrac{MSA}{MSE} = \dfrac{.8667}{.7792} = 1.11$, $F_B = \dfrac{MSB}{MSE} = \dfrac{1.84}{.7792} = 2.36$, $F_{AB} = \dfrac{MSAB}{MSE} = \dfrac{3.1}{.7792} = 3.98$

Source   df     SS       MS       F
A         3    2.6    .8667    1.11
B         5    9.2     1.84    2.36
AB       15   46.5      3.1    3.98
Error    24   18.7    .7792
Total    47   77.0

b. Factor A has $a = 3 + 1 = 4$ levels and factor B has $b = 5 + 1 = 6$ levels. The number of treatments is $ab = 4 \times 6 = 24$. The total number of observations is $n = 47 + 1 = 48$. Thus, two replicates were performed ($48/24 = 2$).

c. $SST = SSA + SSB + SSAB = 2.6 + 9.2 + 46.5 = 58.3$

$MST = \dfrac{SST}{ab - 1} = \dfrac{58.3}{4(6) - 1} = 2.5348$, $F = \dfrac{MST}{MSE} = \dfrac{2.5348}{.7792} = 3.25$

To determine whether the treatment means differ, we test:

$H_0: \mu_1 = \mu_2 = \cdots = \mu_{24}$
$H_a$: At least two treatment means differ

The test statistic is $F = \dfrac{MST}{MSE} = 3.25$.

The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with $\nu_1 = ab - 1 = 4(6) - 1 = 23$ and $\nu_2 = n - ab = 48 - 4(6) = 24$. From Table VI, Appendix D, $F_{.05} \approx 2.03$. The rejection region is $F > 2.03$.

Since the observed value of the test statistic falls in the rejection region ($F = 3.25 > 2.03$), $H_0$ is rejected.
There is sufficient evidence to indicate the treatment means differ at $\alpha = .05$.

d. Since there are differences among the treatment means, we test for the presence of interaction:

$H_0$: Factor A and factor B do not interact to affect the response mean
$H_a$: Factor A and factor B do interact to affect the response mean

The test statistic is $F = \dfrac{MSAB}{MSE} = 3.98$.

The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with $\nu_1 = (a - 1)(b - 1) = (4 - 1)(6 - 1) = 15$ and $\nu_2 = n - ab = 48 - 4(6) = 24$. From Table VI, Appendix D, $F_{.05} = 2.11$. The rejection region is $F > 2.11$.

Since the observed value of the test statistic falls in the rejection region ($F = 3.98 > 2.11$), $H_0$ is rejected. There is sufficient evidence to indicate factors A and B interact to affect the response means at $\alpha = .05$.

Since the interaction is significant, the main effect tests are not warranted; instead, multiple comparisons of the treatment means need to be performed.

9.89 a. A completely randomized design was used.

b. There are 4 treatments: 3 robots/colony, 6 robots/colony, 9 robots/colony, and 12 robots/colony.

c. To determine if there were differences in the mean energy expended (per robot) among the 4 colony sizes, we test:

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: At least two means differ

d. Since the p-value is less than $\alpha$ ($p < .001 < .05$), $H_0$ is rejected. There is sufficient evidence to indicate differences in mean energy expended per robot among the 4 colony sizes at $\alpha = .05$.

e. The total number of comparisons conducted is $c = \dfrac{k(k - 1)}{2} = \dfrac{4(4 - 1)}{2} = 6$.

f. The mean energy expended by robots in the 12-robot colony is significantly smaller than the mean energy expended by robots in any of the other size colonies. There are no significant differences in the mean energy expended by robots in the 3-robot colony, the 6-robot colony, and the 9-robot colony.

9.90 a. The response is the evaluation by the undergraduate student of the ethical behavior of the salesperson.

b. There are two factors: type of sales job at two levels (high tech vs. low tech) and sales task at two levels (new account development vs. account maintenance).

c. The treatments are the $2 \times 2 = 4$ combinations of type of sales job and sales task: high tech/new account development, low tech/new account development, high tech/account maintenance, and low tech/account maintenance.

d. The experimental units are the college students.

9.91 a. This is a complete $2 \times 2$ factorial design. The 2 factors are Color and Question. There are two levels of color: Blue and Red. There are two levels of question: difficult and simple. The 4 treatments are: blue/difficult, blue/simple, red/difficult, and red/simple.

b. Since the p-value is small ($p = .03$), $H_0$ is rejected. There is a significant interaction between color and question. The effect of color on the mean score is different at each level of question.

c. Using MINITAB, the graph is:

[Figure: Scatterplot of Blue, Red vs Quest; x-axis: Quest (Difficult, Simple); one line per color (Blue, Red)]

Since the lines are not parallel, the graph indicates interaction between color and question.

9.92 a. This is a randomized block design. The blocks are the 12 plots of land. The treatments are the three methods used on the shrubs: fire, clipping, and control. The response variable is the mean number of flowers produced. The experimental units are the 36 shrubs.

b. Plot

c. To determine if there is a difference in the mean number of flowers produced among the three treatments, we test:

$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: The mean number of flowers produced differs for at least two of the methods

The test statistic is $F = 5.42$ and the p-value is $p = .009$. Since the p-value is so small ($p = .009$), $H_0$ is rejected. There is sufficient evidence that there are differences in the mean number of flowers produced among the three treatments for any level of $\alpha$ greater than .009.

d. There is no significant difference in the mean number of flowers produced by Clipping and Burning. The mean number of flowers produced by Control is significantly less than the mean number for Clipping and Burning.

9.93 a. The experimental design used in this example was a randomized block design.

b. The experimental units in this problem are the electronic commerce and internet-based companies. The response variable is the rate of return for the stock of the companies. The treatments are the 4 categories of companies: e-companies, internet software and service, internet hardware, and internet communication. The blocks are the 3 age categories: 1-year-old, 3-year-old, and 5-year-old.

9.94 a. The experimental unit in the study is the college tennis coach. The dependent variable is the response to the statement "the Prospective Student-Athlete Form on the web site contributes very little to the recruiting process" on a scale from 1 to 7. There is one factor in the study, the NCAA division of the college tennis coach. There are 3 levels of this factor and, thus, 3 treatments: Division I, Division II, and Division III.

b. To determine if the mean responses of tennis coaches from the different divisions differ, we test:

$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: At least two means differ

c. Since the observed p-value of the test ($p = .003$) is less than $\alpha = .05$, $H_0$ is rejected. There is sufficient evidence to indicate differences in mean response among coaches of the 3 divisions at $\alpha = .05$.

d. The mean response for Division I coaches is significantly higher than the mean responses for the Division II and Division III coaches. There is no significant difference in the mean responses between Division II and Division III coaches.

9.95 a. To determine if leadership style affects behavior of subordinates, we test:

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: At least two treatment means differ

The test statistic is $F = 30.4$. The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with $\nu_1 = ab - 1 = 2(2) - 1 = 3$ and $\nu_2 = n - ab = 257 - 2(2) = 253$.
From Table VI, Appendix D, $F_{.05} \approx 2.60$. The rejection region is $F > 2.60$.

Since the observed value of the test statistic falls in the rejection region ($F = 30.4 > 2.60$), $H_0$ is rejected. There is sufficient evidence to indicate that leadership style affects behavior of subordinates at $\alpha = .05$.

b. From the table, the mean response for High control, low consideration is significantly higher than that for any of the other three treatments. The mean response for Low control, low consideration is significantly higher than that for High control, high consideration and for Low control, high consideration. No other significant differences exist.

c. The assumptions for Bonferroni's method are the same as those for the ANOVA. Thus, we must assume that:
i. The populations sampled from are normal.
ii. The population variances are the same.
iii. The samples are independent.

9.96 From the printout, the p-value for treatments (Decoy) is $p = .589$. Since the p-value is not small, we cannot reject $H_0$. There is insufficient evidence to indicate a difference in mean percentage of a goose flock to approach to within 46 meters of the pit blind among the three decoy types. This conclusion is valid for any reasonable value of $\alpha$.

9.97 a. This is a complete $6 \times 6$ factorial design.

b. There are 2 factors: Coagulant and pH level. There are 6 levels of coagulant: 5, 10, 20, 50, 100, and 200 mg/liter. There are 6 levels of pH: 4.0, 5.0, 6.0, 7.0, 8.0, and 9.0. There are $6 \times 6 = 36$ treatments. In the pairs, let the coagulant level be the first number and pH level the second. The 36 treatments are:

(5, 4.0) (10, 4.0) (20, 4.0) (50, 4.0) (100, 4.0) (200, 4.0)
(5, 5.0) (10, 5.0) (20, 5.0) (50, 5.0) (100, 5.0) (200, 5.0)
(5, 6.0) (10, 6.0) (20, 6.0) (50, 6.0) (100, 6.0) (200, 6.0)
(5, 7.0) (10, 7.0) (20, 7.0) (50, 7.0) (100, 7.0) (200, 7.0)
(5, 8.0) (10, 8.0) (20, 8.0) (50, 8.0) (100, 8.0) (200, 8.0)
(5, 9.0) (10, 9.0) (20, 9.0) (50, 9.0) (100, 9.0) (200, 9.0)

9.98 a. The response is the weight of a brochure. There is one factor and it is carton. The treatments are the five different cartons, while the experimental units are the brochures.

b. $CM = \dfrac{(\sum y)^2}{n} = \dfrac{.75005^2}{40} = .01406437506$

$SS(\text{Total}) = \sum y^2 - CM = .014066537 - .01406437506 = .00000216264$

$SST = \sum \dfrac{T_i^2}{n_i} - CM = \dfrac{.14767^2}{8} + \dfrac{.15028^2}{8} + \dfrac{.14962^2}{8} + \dfrac{.15217^2}{8} + \dfrac{.15031^2}{8} - .01406437506 = .01406568209 - .01406437506 = .00000130703$

$SSE = SS(\text{Total}) - SST = .00000216264 - .00000130703 = .00000085561$

$MST = \dfrac{SST}{k - 1} = \dfrac{.00000130703}{5 - 1} = .000000326756$

$MSE = \dfrac{SSE}{n - k} = \dfrac{.00000085561}{40 - 5} = .000000024446$

$F = \dfrac{MST}{MSE} = \dfrac{.000000326756}{.000000024446} = 13.37$

Source        df   SS              MS               F
Treatments     4   .00000130703    .000000326756    13.37
Error         35   .00000085561    .000000024446
Total         39   .00000216264

To determine whether there are differences in mean weight per brochure among the five cartons, we test:

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$
$H_a$: At least two treatment means differ

The test statistic is $F = 13.37$. The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with $\nu_1 = k - 1 = 5 - 1 = 4$ and $\nu_2 = n - k = 40 - 5 = 35$. From Table VI, Appendix D, $F_{.05} \approx 2.61$. The rejection region is $F > 2.61$.

Since the observed value of the test statistic falls in the rejection region ($F = 13.37 > 2.61$), $H_0$ is rejected. There is sufficient evidence to indicate a difference in mean weight per brochure among the five cartons at $\alpha = .05$.

c. We must assume that the distributions of weights for the brochures in the five cartons are normal, that the variances of the weights for the brochures in the five cartons are equal, and that random and independent samples were selected from each of the cartons.

d. Using MINITAB, the results of Tukey's multiple comparison procedure are:

Level     N       Mean      StDev
Carton1   8   0.018459   0.000105
Carton2   8   0.018785   0.000101
Carton3   8   0.018703   0.000109
Carton4   8   0.019021   0.000232
Carton5   8   0.018789   0.000188

Pooled StDev = 0.000156

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons
Individual confidence level = 99.32%

Carton1 subtracted from:
               Lower        Center        Upper
Carton2    0.0001013     0.0003262    0.0005512
Carton3    0.0000188     0.0002437    0.0004687
Carton4    0.0003375     0.0005625    0.0007875
Carton5    0.0001050     0.0003300    0.0005550

Carton2 subtracted from:
               Lower        Center        Upper
Carton3   -0.0003075    -0.0000825    0.0001425
Carton4    0.0000113     0.0002363    0.0004612
Carton5   -0.0002212     0.0000037    0.0002287

Carton3 subtracted from:
               Lower        Center        Upper
Carton4    0.0000938     0.0003187    0.0005437
Carton5   -0.0001387     0.0000862    0.0003112

Carton4 subtracted from:
               Lower        Center        Upper
Carton5   -0.0004575    -0.0002325   -0.0000075
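Reading a Tukey printout comes down to one rule: a pair of means differs significantly when its simultaneous interval excludes 0. The minimal Python sketch below applies that rule to the carton intervals above; the tuple layout and the `classify` helper are illustrative choices, not part of MINITAB. In the printout, "Carton1 subtracted from: Carton2" is an interval for the mean of Carton2 minus the mean of Carton1.

```python
# Sketch: classify each Tukey interval from the carton printout as significant
# (interval excludes 0) or not. Each tuple is (first, second, lower, upper) for
# the difference mean(second) - mean(first), copied from the MINITAB output.
intervals = [
    (1, 2, 0.0001013, 0.0005512), (1, 3, 0.0000188, 0.0004687),
    (1, 4, 0.0003375, 0.0007875), (1, 5, 0.0001050, 0.0005550),
    (2, 3, -0.0003075, 0.0001425), (2, 4, 0.0000113, 0.0004612),
    (2, 5, -0.0002212, 0.0002287), (3, 4, 0.0000938, 0.0005437),
    (3, 5, -0.0001387, 0.0003112), (4, 5, -0.0004575, -0.0000075),
]


def classify(lower, upper):
    """Return the direction of a significant difference, or None if 0 is inside."""
    if lower > 0:
        return "second > first"
    if upper < 0:
        return "second < first"
    return None


significant = {}
for first, second, lo, hi in intervals:
    verdict = classify(lo, hi)
    if verdict is not None:
        significant[(first, second)] = verdict
    print(f"Carton{second} - Carton{first}: ({lo:+.7f}, {hi:+.7f}) "
          f"-> {verdict or 'no significant difference'}")
```

Seven of the ten intervals exclude 0; only the pairs (2, 3), (2, 5) and (3, 5) do not, which is the pattern summarized in the interpretation of the Tukey results.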
The means arranged in order are:

Carton 1   Carton 3   Carton 2   Carton 5   Carton 4
.018459    .018703    .018785    .018789    .019021

The interpretation of the Tukey results is: the mean weight for carton 4 is significantly higher than the mean weights of all the other cartons. The mean weights of cartons 2, 3, and 5 are not significantly different from each other, but they are significantly higher than the mean weight of carton 1.

e. Since there are differences among the cartons, management should sample from many cartons.

9.99 a. There is one factor in this problem, which is Group. There are 5 treatments in this problem, corresponding to the 5 levels of Group: Casualties, Survivors, Implementers/casualties, Implementers/survivors, and Formulators. The response variable is the ethics score. The experimental units are the employees enrolled in an Executive MBA program.

b. To determine if there are any differences among the mean ethics scores for the five groups, we test:

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$
$H_a$: At least two means differ

c. The test statistic is $F = 9.85$ and the p-value is $p = .000$. Since the p-value (.000) is less than any reasonable significance level $\alpha$, $H_0$ is rejected. There is sufficient evidence to indicate a difference in the mean ethics scores among the five groups of employees for any reasonable value of $\alpha$.

d. We will check the assumptions of normality and equal variances. Using MINITAB, the histograms are:

[Figure: Histogram of CASUAL, SURVIVE, IMPCAS, IMPSUR, FORMUL, with fitted normal curves; x-axis: ethics score; y-axis: Frequency]

Group      Mean     StDev     N
CASUAL     1.787    0.8324    47
SURVIVE    1.845    1.023     71
IMPCAS     1.593    0.6360    27
IMPSUR     2.545    1.301     33
FORMUL     2.871    1.176     31

The data for some of the 5 groups do not look particularly mound-shaped, so the assumption of normality is probably not valid.
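The group sizes, means, and standard deviations reported with the histograms are enough to reproduce the F statistic from part c, using $SST = \sum n_i(\bar{x}_i - \bar{x})^2$ and $SSE = \sum (n_i - 1)s_i^2$. In this Python sketch the inputs are the rounded panel values, so the result should agree with the printed $F = 9.85$ only up to rounding; `one_way_f` is a hypothetical helper name.

```python
# Sketch: reproduce the one-way ANOVA F statistic of Exercise 9.99 from the
# group sizes, means, and standard deviations shown with the histograms.
# Uses SST = sum n_i (mean_i - grand_mean)^2 and SSE = sum (n_i - 1) s_i^2.
groups = {  # name: (n, mean, standard deviation)
    "CASUAL": (47, 1.787, 0.8324),
    "SURVIVE": (71, 1.845, 1.023),
    "IMPCAS": (27, 1.593, 0.6360),
    "IMPSUR": (33, 2.545, 1.301),
    "FORMUL": (31, 2.871, 1.176),
}


def one_way_f(summary):
    """Return (F, numerator df, denominator df) from per-group summaries."""
    k = len(summary)
    n = sum(size for size, _, _ in summary.values())
    grand_mean = sum(size * mean for size, mean, _ in summary.values()) / n
    sst = sum(size * (mean - grand_mean) ** 2 for size, mean, _ in summary.values())
    sse = sum((size - 1) * sd ** 2 for size, _, sd in summary.values())
    mst, mse = sst / (k - 1), sse / (n - k)
    return mst / mse, k - 1, n - k


f_stat, df1, df2 = one_way_f(groups)
print(f"F = {f_stat:.2f} with df = ({df1}, {df2})")
```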
Using MINITAB, the boxplots are:

[Figure: Boxplot of CASUAL, SURVIVE, IMPCAS, IMPSUR, FORMUL; y-axis: Data]

The spreads of the responses do not appear to be about the same. The Implementers/survivors and Formulators groups have more variability than the other three groups. Thus, the assumption of constant variance is probably not valid. The assumptions required for the ANOVA F-test do not appear to be reasonably satisfied.

e. The Bonferroni method is preferred over the other multiple comparison methods because it does not require equal sample sizes, and the five groups of employees do not have the same sample sizes. In addition, for pairwise comparisons it is more powerful than Scheffé's method.

f. The number of pairwise comparisons for this analysis is
$c = \frac{k(k-1)}{2} = \frac{5(5-1)}{2} = \frac{20}{2} = 10$.

g. The mean ethics scores for both Groups 4 and 5 are significantly higher than the mean ethics scores for Groups 1, 2, and 3. There is no significant difference in the mean ethics scores between Group 4 and Group 5, and there is no significant difference among Groups 1, 2, and 3.

9.100 a. This is a randomized block design.
Response: the length of time required for a cut to stop bleeding
Factor: drug
Factor type: qualitative
Treatments: drugs A, B, and C
Experimental units: subjects

b.
Using MINITAB, the results are:

General Linear Model: Y versus Drug, Person

Factor   Type    Levels   Values
Drug     fixed   3        A B C
Person   fixed   5        1 2 3 4 5

Analysis of Variance for Y, using Adjusted SS for Tests

Source   DF   Seq SS   Adj SS   Adj MS   F       P
Drug      2    156.4    156.4     78.2    3.91   0.066
Person    4   7645.8   7645.8   1911.5   95.51   0.000
Error     8    160.1    160.1     20.0
Total    14   7962.3

Tukey 90.0% Simultaneous Confidence Intervals
Response Variable Y
All Pairwise Comparisons among Levels of Drug

Drug = A subtracted from:
Drug   Lower    Center   Upper
B     -11.56   -4.820    1.922
C      -3.72    3.020    9.762

Drug = B subtracted from:
Drug   Lower    Center   Upper
C      1.098    7.840   14.58

Let $\mu_1$, $\mu_2$, and $\mu_3$ represent the mean clotting times for the three drugs. To determine if there is a difference in mean clotting time among the 3 drugs, we test:
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: At least two treatment means differ

The test statistic is $F = \frac{MS(\text{Drug})}{MSE} = 3.91$ and the p-value is $p = .066$. Since the p-value is less than $\alpha$ ($p = .066 < .10$), $H_0$ is rejected. There is sufficient evidence to indicate differences in the mean clotting times among the three drugs at $\alpha = .10$.

c. The observed level of significance is given as $p = .066$.

d. To determine if there is a significant difference in the mean response over blocks, we test:
$H_0$: The five block (person) means are equal
$H_a$: At least two block means differ

The test statistic is $F = \frac{MS(\text{Person})}{MSE} = 95.51$ and the p-value is $p = .000$. Since the p-value is less than $\alpha$ ($p = .000 < .10$), $H_0$ is rejected. There is sufficient evidence to indicate differences in the mean clotting times among the five people at $\alpha = .10$.
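The F statistics in parts b and d are each a mean square divided by MSE, where every mean square is SS/df. The Python sketch below recomputes them from the sums of squares and degrees of freedom in the table. The dictionary layout and the name rbd_f_ratios are illustrative choices. Note that SS(Error)/df(Error) = 160.1/8 = 20.01, which MINITAB rounds to 20.0.

```python
# Randomized block ANOVA for the clotting-time data (Exercise 9.100),
# rebuilt from the SS and df columns of the MINITAB table.
SS = {"Drug": 156.4, "Person": 7645.8, "Error": 160.1}
DF = {"Drug": 2, "Person": 4, "Error": 8}


def rbd_f_ratios(ss, df):
    """Return {source: (MS, F)} for the treatment and block rows of a randomized block ANOVA."""
    mse = ss["Error"] / df["Error"]
    table = {}
    for source in ss:
        if source == "Error":
            continue
        ms = ss[source] / df[source]     # mean square = SS / df
        table[source] = (ms, ms / mse)   # F = MS / MSE
    return table


for source, (ms, f) in rbd_f_ratios(SS, DF).items():
    print(f"{source:<7} MS = {ms:8.2f}  F = {f:6.2f}")
```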
e. The confidence interval to compare drugs A and B is (-11.56, 1.922). Since 0 is in the interval, there is insufficient evidence of a difference in mean clotting times between drugs A and B. The confidence interval to compare drugs A and C is (-3.72, 9.762). Since 0 is in the interval, there is insufficient evidence of a difference in mean clotting times between drugs A and C. The confidence interval to compare drugs B and C is (1.098, 14.58). Since 0 is not in the interval, there is evidence of a difference in mean clotting times between drugs B and C. Since both endpoints are positive, the mean clotting time for drug C is greater than that for drug B. In summary, the mean clotting time for drug C is greater than that for drug B; no other significant differences are detected.

9.101 a. The response is the quality of the steel ingot.
b. There are two factors: temperature and pressure. They are quantitative factors since they are numerical.
c. The treatments are the $3 \times 5 = 15$ factor-level combinations of temperature and pressure.
d. The steel ingots are the experimental units.

9.102 a. The degrees of freedom for “Type of message retrieval system” is $a - 1 = 2 - 1 = 1$. The degrees of freedom for “Pricing option” is $b - 1 = 2 - 1 = 1$. The degrees of freedom for the interaction of Type of message retrieval system and Pricing option is $(a-1)(b-1) = (2-1)(2-1) = 1$. The degrees of freedom for error is $n - ab = 120 - 2(2) = 116$.

Source                              df    SS   MS   F
Type of message retrieval system      1   -    -    2.001
Pricing option                        1   -    -    5.019
Type of system × Pricing option       1   -    -    4.986
Error                               116   -    -
Total                               119

b. To determine if “Type of system” and “Pricing option” interact to affect the mean willingness to buy, we test:
$H_0$: “Type of system” and “Pricing option” do not interact
$H_a$: “Type of system” and “Pricing option” interact

c. The test statistic is $F = \frac{MSAB}{MSE} = 4.986$.

The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with $\nu_1 = (a-1)(b-1) = (2-1)(2-1) = 1$ and $\nu_2 = n - ab = 120 - 2(2) = 116$. From Table VI, Appendix D, $F_{.05} \approx 3.92$. The rejection region is $F > 3.92$.
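Every degrees-of-freedom entry in parts a and c follows from a, b, and n alone. The Python sketch below makes this bookkeeping explicit and checks that the component df add up to the total df, n - 1. The function name two_factor_df is hypothetical.

```python
def two_factor_df(a, b, n):
    """Degrees of freedom for a complete a x b factorial with n observations in total."""
    df = {
        "A": a - 1,
        "B": b - 1,
        "AxB": (a - 1) * (b - 1),
        "Error": n - a * b,
        "Total": n - 1,
    }
    # The component df must add up to the total df.
    assert df["A"] + df["B"] + df["AxB"] + df["Error"] == df["Total"]
    return df


# Exercise 9.102: 2 retrieval systems x 2 pricing options, n = 120
print(two_factor_df(a=2, b=2, n=120))
```

With a = b = 3 and n = 27, the function returns the 18 error df used in Exercise 9.106. With a = 2, b = 3, and n = 132, it returns the 126 error df used in Exercise 9.109.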
Since the observed value of the test statistic falls in the rejection region ($F = 4.986 > 3.92$), $H_0$ is rejected. There is sufficient evidence to indicate “Type of system” and “Pricing option” interact to affect the mean willingness to buy at $\alpha = .05$.

d. No. Since the test in part c indicated that interaction between “Type of system” and “Pricing option” is present, we should not test for the main effects. Instead, we should proceed directly to a multiple comparison procedure to compare selected treatment means. When interaction is present, it can mask the main effects.

9.103 a. We will select size as the quantitative variable and color as the qualitative variable. To determine if the mean size of diamonds differs among the 6 colors, we test:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 = \mu_6$
$H_a$: At least two means differ

b. Using MINITAB, the ANOVA table is:

One-way ANOVA: Carats versus Color

Analysis of Variance for Carats
Source   DF    SS        MS       F      P
Color      5    0.7963   0.1593   2.11   0.064
Error    302   22.7907   0.0755
Total    307   23.5869

Level   N    Mean     StDev
D       16   0.6381   0.3195
E       44   0.6232   0.2677
F       82   0.5929   0.2648
G       65   0.5808   0.2792
H       61   0.6734   0.2643
I       40   0.7310   0.2918

Pooled StDev = 0.2747

The test statistic is $F = 2.11$ and the p-value is $p = .064$. Since the p-value is less than $\alpha$ ($p = .064 < .10$), $H_0$ is rejected. There is sufficient evidence to indicate the mean sizes of diamonds differ among the 6 colors at $\alpha = .10$.

c. We will check the assumptions of normality and equal variances.
Using MINITAB, the histograms are:

[Figure: Histogram of CARAT with fitted normal curves, paneled by COLOR (D, E, F, G, H, I); x-axis: CARAT, y-axis: Frequency]

The data for the 6 colors do not look particularly mound-shaped, so the assumption of normality is probably not valid. However, the ANOVA F-test is fairly robust to moderate departures from normality, so this does not necessarily invalidate the ANOVA results.

Using MINITAB, the box plots are:

[Figure: Boxplot of CARAT vs COLOR; x-axis: COLOR (D, E, F, G, H, I), y-axis: CARAT]

The spreads of all the colors appear to be about the same, so the assumption of constant variance is probably valid.

d. Using MINITAB, the Tukey confidence intervals are:

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons among Levels of COLOR
Individual confidence level = 99.53%

COLOR = D subtracted from:
COLOR   Lower     Center    Upper
E      -0.2435   -0.0149   0.2136
F      -0.2591   -0.0452   0.1688
G      -0.2758   -0.0574   0.1611
H      -0.1846    0.0353   0.2552
I      -0.1387    0.0929   0.3244

COLOR = E subtracted from:
COLOR   Lower     Center    Upper
F      -0.1765   -0.0303   0.1160
G      -0.1952   -0.0424   0.1104
H      -0.1046    0.0503   0.2051
I      -0.0632    0.1078   0.2788

COLOR = F subtracted from:
COLOR   Lower     Center    Upper
G      -0.1422   -0.0122   0.1178
H      -0.0518    0.0805   0.2129
I      -0.0129    0.1381   0.2890
COLOR = G subtracted from:
COLOR   Lower     Center   Upper
H      -0.0469   0.0927   0.2322
I      -0.0071   0.1502   0.3075

COLOR = H subtracted from:
COLOR   Lower     Center   Upper
I      -0.1017   0.0576   0.2168

All of the confidence intervals contain 0. Thus, at 95% simultaneous confidence, there is no evidence that the mean sizes of the diamonds differ among the colors. This appears to disagree with the test of hypothesis in part b, but that test was run using $\alpha = .10$, while the Tukey intervals correspond to $\alpha = .05$. At $\alpha = .05$ the F-test would not reject $H_0$ either, since $p = .064 > .05$.

9.104 a. The experimenters expected much variation in the number of participants from week to week (more participants at the beginning and fewer as time goes on). By blocking on weeks, this extraneous source of variation can be controlled.

b. $df(\text{Week}) = b - 1 = 6 - 1 = 5$
$MS(\text{Prompt}) = \frac{SST}{df} = \frac{1185.0}{4} = 296.25$
$F(\text{Prompt}) = \frac{MST}{MSE} = \frac{296.25}{7.43} = 39.87$

The ANOVA table is:

Source   df   SS       MS       F       p
Prompt    4   1185.0   296.25   39.87   0.0001
Week      5    386.4    77.28   10.40   0.0001
Error    20    148.6     7.43
Total    29   1720.0

c. To determine if a difference exists in the mean number of walkers per week among the five walker groups, we test:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$
$H_a$: At least two treatment means differ
where $\mu_i$ represents the mean number of walkers in group $i$.

The test statistic is $F = 39.87$.

The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = k - 1 = 5 - 1 = 4$ and $\nu_2 = n - k - b + 1 = 30 - 5 - 6 + 1 = 20$. From Table VI, Appendix D, $F_{.05} = 2.87$. The rejection region is $F > 2.87$.
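When each treatment appears exactly once in each block (n = kb), the error degrees of freedom $\nu_2 = n - k - b + 1$ equals $(k-1)(b-1)$. The Python sketch below assumes that one-observation-per-cell layout and reproduces the df column for both randomized block exercises in this section. The function name rbd_df is hypothetical.

```python
def rbd_df(k, b):
    """Degrees of freedom for a randomized block design with k treatments and b blocks,
    assuming one observation per treatment-block combination (n = k * b)."""
    n = k * b
    return {
        "Treatments": k - 1,
        "Blocks": b - 1,
        "Error": n - k - b + 1,  # same as (k - 1) * (b - 1) when n = k * b
        "Total": n - 1,
    }


walkers = rbd_df(k=5, b=6)   # Exercise 9.104: 5 walker groups, 6 weeks
clotting = rbd_df(k=3, b=5)  # Exercise 9.100: 3 drugs, 5 people
print(walkers)
print(clotting)
```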
Since the observed value of the test statistic falls in the rejection region ($F = 39.87 > 2.87$), $H_0$ is rejected. There is sufficient evidence to indicate differences exist in the mean number of walkers per week among the 5 walker groups at $\alpha = .05$.

d. The following conclusions are drawn. There is no significant difference in the mean number of walkers per week between the "Frequent/High" group and the "Frequent/Low" group. The means for these two groups are significantly higher than the means for the other three groups. There is no significant difference in the mean number of walkers per week between the "Infrequent/Low" group and the "Infrequent/High" group. The means for these two groups are significantly higher than the mean for the "Control" group.

e. In order for the above inferences to be valid, the following assumptions must hold:
1) The probability distributions of observations corresponding to all block-treatment conditions are normal.
2) The variances of all the probability distributions are equal.

9.105 a. The treatments are the $3 \times 3 = 9$ combinations of PES and Trust. The nine treatments are: (BC, Low), (PC, Low), (NA, Low), (BC, Med), (PC, Med), (NA, Med), (BC, High), (PC, High), and (NA, High).

b. $df(\text{Trust}) = b - 1 = 3 - 1 = 2$
$SSE = SS(\text{Total}) - SS(\text{PES}) - SS(\text{Trust}) - SS(\text{PT}) = 301.55 - 4.35 - 15.20 - 3.50 = 278.50$
$MS(\text{PES}) = \frac{SS(\text{PES})}{df(\text{PES})} = \frac{4.35}{2} = 2.175$
$MS(\text{Trust}) = \frac{SS(\text{Trust})}{df(\text{Trust})} = \frac{15.20}{2} = 7.600$
$MS(\text{PT}) = \frac{SS(\text{PT})}{df(\text{PT})} = \frac{3.50}{4} = .875$
$MSE = \frac{SSE}{df(\text{Error})} = \frac{278.50}{191} = 1.458$
$F_{PES} = \frac{MS(\text{PES})}{MSE} = \frac{2.175}{1.458} = 1.49$
$F_{Trust} = \frac{MS(\text{Trust})}{MSE} = \frac{7.600}{1.458} = 5.21$
$F_{PT} = \frac{MS(\text{PT})}{MSE} = \frac{.875}{1.458} = .600$

The ANOVA table is:

Source         df    SS       MS      F
PES              2     4.35   2.175   1.49
Trust            2    15.20   7.600   5.21
PES × Trust      4     3.50    .875   0.60
Error          191   278.50   1.458
Total          199   301.55

c. To determine if PES and Trust interact, we test:
$H_0$: PES and Trust do not interact to affect the mean tension
$H_a$: PES and Trust do interact to affect the mean tension

The test statistic is $F = 0.60$.
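The computations in part b follow one pattern: SSE is found by subtraction, each MS is SS/df, and each F is MS/MSE. The Python sketch below does this bookkeeping for the PES/Trust table. It assumes only the SS and df values shown above, and the function name complete_table is hypothetical. It also shows that 199 total df implies n = 200 observations, which is what yields the 191 error df.

```python
# Exercise 9.105: complete a two-factor ANOVA table when SSE is found by subtraction.
SS_TOTAL, DF_TOTAL = 301.55, 199
EFFECTS = {  # source: (SS, df)
    "PES": (4.35, 2),
    "Trust": (15.20, 2),
    "PES x Trust": (3.50, 4),
}


def complete_table(effects, ss_total, df_total):
    """Fill in SSE, df(Error), MSE, and (MS, F) for each effect."""
    sse = ss_total - sum(ss for ss, _ in effects.values())
    df_error = df_total - sum(df for _, df in effects.values())
    mse = sse / df_error
    rows = {name: (ss / df, (ss / df) / mse) for name, (ss, df) in effects.items()}
    return sse, df_error, mse, rows


sse, df_error, mse, rows = complete_table(EFFECTS, SS_TOTAL, DF_TOTAL)
print(f"SSE = {sse:.2f}, df(Error) = {df_error}, MSE = {mse:.3f}, n = {DF_TOTAL + 1}")
for name, (ms, f) in rows.items():
    print(f"{name:<12} MS = {ms:.3f}  F = {f:.2f}")
```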
The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = (a-1)(b-1) = (3-1)(3-1) = 4$ and $\nu_2 = n - ab = 200 - 3(3) = 191$. From Table VI, Appendix D, $F_{.05} = 2.37$. The rejection region is $F > 2.37$.

Since the observed value of the test statistic does not fall in the rejection region ($F = 0.60 < 2.37$), $H_0$ is not rejected. There is insufficient evidence to indicate that PES and Trust interact at $\alpha = .05$.

d. The plot of the treatment means is:

[Figure: Interaction Plot for Mean; x-axis: Perform (BC, NA, PC), y-axis: Mean; one line for each Trust level (High, Low, Medium)]

The three lines corresponding to the Trust levels are almost parallel. This indicates that PES and Trust do not interact, which agrees with the result in part c.

e. Since the interaction is not significant, the tests for the main effects should be run.

9.106 a. There are a total of $a \times b = 3 \times 3 = 9$ treatments in this study.

b. Using MINITAB, the ANOVA results are:

ANOVA: Y versus Display, Price

Factor    Type    Levels   Values
Display   fixed   3        1 2 3
Price     fixed   3        1 2 3

Analysis of Variance for Y
Source          DF   SS        MS        F         P
Display          2   1691393    845696   1709.37   0.000
Price            2   3089054   1544527   3121.89   0.000
Display*Price    4    510705    127676    258.07   0.000
Error           18      8905       495
Total           26   5300057

S = 22.2428   R-Sq = 99.83%   R-Sq(adj) = 99.76%

To get the SS for Treatments, we add the SS for Display, the SS for Price, and the SS for Interaction:
$SST = 1{,}691{,}393 + 3{,}089{,}054 + 510{,}705 = 5{,}291{,}152$, with $df = 2 + 2 + 4 = 8$.
$MST = \frac{SST}{ab - 1} = \frac{5{,}291{,}152}{3(3) - 1} = 661{,}394$
$F = \frac{MST}{MSE} = \frac{661{,}394}{495} = 1336.15$

To determine whether the treatment means differ, we test:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_9$
$H_a$: At least two treatment means differ

The test statistic is $F = \frac{MST}{MSE} = 1{,}336.15$.

The rejection region requires $\alpha = .10$ in the upper tail of the F-distribution with $\nu_1 = ab - 1 = 3(3) - 1 = 8$ and $\nu_2 = n - ab = 27 - 3(3) = 18$. From Table V, Appendix D, $F_{.10} = 2.04$. The rejection region is $F > 2.04$.
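Pooling the main-effect and interaction sums of squares into a single treatment SS uses the same arithmetic every time; Exercise 9.109 repeats it. The Python sketch below performs that pooling. It assumes the SS, df, and MSE values printed in the MINITAB tables, and the helper name is hypothetical.

```python
def treatment_test_from_factorial(effects, mse):
    """Pool factorial sums of squares into one treatment SS; return (SST, df, MST, F).

    effects maps each source (main effects and interaction) to (SS, df).
    """
    sst = sum(ss for ss, _ in effects.values())
    df = sum(d for _, d in effects.values())  # equals ab - 1
    mst = sst / df
    return sst, df, mst, mst / mse


# Exercise 9.106 (Display x Price), with MSE = 495 as printed by MINITAB
display_price = {"Display": (1691393, 2), "Price": (3089054, 2), "Display*Price": (510705, 4)}
print(treatment_test_from_factorial(display_price, mse=495))

# Exercise 9.109 (Prep x Standing), with MSE = 3.801
prep_standing = {"Prep": (54.735, 1), "Standing": (16.500, 2), "Prep*Standing": (13.470, 2)}
print(treatment_test_from_factorial(prep_standing, mse=3.801))
```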
Since the observed value of the test statistic falls in the rejection region ($F = 1{,}336.15 > 2.04$), $H_0$ is rejected. There is sufficient evidence to indicate the treatment means differ at $\alpha = .10$.

c. Since there are differences among the treatment means, we next test for the presence of interaction between Display (factor A) and Price (factor B).
$H_0$: Factors A and B do not interact to affect the response means
$H_a$: Factors A and B do interact to affect the response means

The test statistic is $F = \frac{MSAB}{MSE} = 258.07$ and the p-value is $p = .000$. Since the p-value is less than $\alpha$ ($p = .000 < .10$), $H_0$ is rejected. There is sufficient evidence to indicate the two factors interact at $\alpha = .10$.

d. The main effect tests are not warranted since interaction is present in part c.

e. The nine treatment means need to be compared.

9.107 a. This is a $2 \times 2$ factorial experiment.
b. The two factors are the tent type (treated or untreated) and location (inside or outside). There are $2 \times 2 = 4$ treatments. The four treatments are (treated, inside), (treated, outside), (untreated, inside), and (untreated, outside).
c. The response variable is the number of mosquito bites received in a 20-minute interval.
d. There is sufficient evidence to indicate interaction is present. This indicates that the effect of the tent type on the number of mosquito bites depends on whether the person is inside or outside.

9.108 a. This is a completely randomized design with a complete four-factor factorial treatment structure.
b. There are a total of $2 \times 2 \times 2 \times 2 = 16$ treatments.
c. Using SAS, the output is:

Analysis of Variance Procedure
Dependent Variable: Y
                        Sum of       Mean
Source            DF    Squares      Square     F Value   Pr > F
Model             15    546745.50    36449.70   5.11      0.0012
Error             16    114062.00     7128.88
Corrected Total   31    660807.50

R-Square   C.V.       Root MSE   Y Mean
0.827390   41.46478   84.433     203.63

d.
Source                 DF   Anova SS     Mean Square   F Value   Pr > F
SPEED                   1    56784.50     56784.50      7.97     0.0123
FEED                    1    21218.00     21218.00      2.98     0.1037
SPEED*FEED              1    55444.50     55444.50      7.78     0.0131
COLLET                  1   165025.13    165025.13     23.15     0.0002
SPEED*COLLET            1    44253.13     44253.13      6.21     0.0241
FEED*COLLET             1   142311.13    142311.13     19.96     0.0004
SPEED*FEED*COLLET       1    54946.13     54946.13      7.71     0.0135
WEAR                    1      378.13       378.13      0.05     0.8208
SPEED*WEAR              1     1540.13      1540.13      0.22     0.6483
FEED*WEAR               1      946.13       946.13      0.13     0.7204
SPEED*FEED*WEAR         1      528.13       528.13      0.07     0.7890
COLLET*WEAR             1     1682.00      1682.00      0.24     0.6337
SPEED*COLLET*WEAR       1      512.00       512.00      0.07     0.7921
FEED*COLLET*WEAR        1       72.00        72.00      0.01     0.9212
SPEE*FEED*COLLE*WEAR    1     1104.50      1104.50      0.15     0.6991

To determine if the interaction terms are significant, we add together the sums of squares for all interaction terms, as well as their degrees of freedom:
$SS(\text{Interaction}) = 55{,}444.50 + 44{,}253.13 + 142{,}311.13 + 54{,}946.13 + 1{,}540.13 + 946.13 + 528.13 + 1{,}682.00 + 512.00 + 72.00 + 1{,}104.50 = 303{,}339.78$
$df(\text{Interaction}) = 11$
$MS(\text{Interaction}) = \frac{SS(\text{Interaction})}{df(\text{Interaction})} = \frac{303{,}339.78}{11} = 27{,}576.34364$
$F_{\text{Interaction}} = \frac{MS(\text{Interaction})}{MSE} = \frac{27{,}576.34364}{7128.88} = 3.87$

To determine if interaction effects are present, we test:
$H_0$: No interaction effects exist
$H_a$: Interaction effects exist

The test statistic is $F = 3.87$.

The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = 11$ and $\nu_2 = 16$. From Table VI, Appendix D, $F_{.05} = 2.49$. The rejection region is $F > 2.49$.

Since the observed value of the test statistic falls in the rejection region ($F = 3.87 > 2.49$), $H_0$ is rejected. There is sufficient evidence to indicate that interaction effects exist at $\alpha = .05$.

Since the sums of squares for a balanced factorial design are independent of each other, we can look at the SAS output to determine which of the interaction effects are significant. The three-way interaction between speed, feed, and collet is significant ($p = .0135$). There are three two-way interactions with p-values less than .05.
However, all of these two-way interaction terms are embedded in the significant three-way interaction term.

e. Yes. Since the significant interaction terms do not include wear, it is necessary to perform the main effect test for wear. All other main effects are contained in a significant interaction term. To determine if the mean finish measurements differ for the different levels of wear, we test:
$H_0$: The mean finish measurements for the two levels of wear are the same
$H_a$: The mean finish measurements for the two levels of wear are different

The test statistic is $F = 0.05$ and the p-value is $p = .8208$. Since the p-value is not less than $\alpha$ ($p = .8208 > .05$), $H_0$ is not rejected. There is insufficient evidence to indicate that the mean finish measurements differ for the different levels of wear at $\alpha = .05$.

f. We must assume that:
i. The populations sampled from are normal.
ii. The population variances are the same.
iii. The samples are random and independent.

9.109 Using MINITAB, the ANOVA table is:

ANOVA: Rating versus Prep, Standing

Factor     Type    Levels   Values
Prep       fixed   2        PRACTICE REVIEW
Standing   fixed   3        HI LOW MED

Analysis of Variance for Rating
Source          DF    SS        MS       F       P
Prep              1    54.735   54.735   14.40   0.000
Standing          2    16.500    8.250    2.17   0.118
Prep*Standing     2    13.470    6.735    1.77   0.174
Error           126   478.955    3.801
Total           131   563.659

S = 1.94967   R-Sq = 15.03%   R-Sq(adj) = 11.66%

Tukey 95.0% Simultaneous Confidence Intervals
Response Variable Rating
All Pairwise Comparisons among Levels of Prep

Prep = PRACTICE subtracted from:
Prep     Lower    Center   Upper
REVIEW  -1.960   -1.288   -0.6162

First, we must test for treatment effects:
$SST = SSP + SSS + SSPS = 54.735 + 16.500 + 13.470 = 84.705$, with $df = 1 + 2 + 2 = 5$.
$MST = \frac{SST}{ab - 1} = \frac{84.705}{2(3) - 1} = 16.941$
$F = \frac{MST}{MSE} = \frac{16.941}{3.801} = 4.46$

To determine if there are differences in mean ratings among the 6 treatments, we test:
$H_0$: All treatment means are the same
$H_a$: At least two treatment means differ

The test statistic is $F = 4.46$. Since no $\alpha$ was given, we will use $\alpha = .05$. The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with $\nu_1 = ab - 1 = 2(3) - 1 = 5$ and $\nu_2 = n - ab = 132 - 2(3) = 126$. From Table VI, Appendix D, $F_{.05} = 2.29$. The rejection region is $F > 2.29$.

Since the observed value of the test statistic falls in the rejection region ($F = 4.46 > 2.29$), $H_0$ is rejected. There is sufficient evidence that differences exist among the treatment means at $\alpha = .05$.

Since differences exist, we now test for the interaction effect between Preparation and Class Standing. To determine if Preparation and Class Standing interact, we test:
$H_0$: Preparation and Class Standing do not interact
$H_a$: Preparation and Class Standing do interact

The test statistic is $F = 1.77$ and the p-value is $p = .174$. Since the p-value is greater than $\alpha$ ($p = .174 > .05$), $H_0$ is not rejected. There is insufficient evidence that Preparation and Class Standing interact at $\alpha = .05$.

Since there is no evidence of interaction, we test for the main effects of Preparation and Class Standing. To determine if there are differences in the mean rating among the three levels of Class Standing, we test:
$H_0: \mu_L = \mu_M = \mu_H$
$H_a$: At least two treatment means differ

The test statistic is $F = 2.17$ and the p-value is $p = .118$. Since the p-value is greater than $\alpha$ ($p = .118 > .05$), $H_0$ is not rejected. There is insufficient evidence that the mean ratings differ among the 3 levels of Class Standing at $\alpha = .05$.

To determine if there are differences in the mean rating between the two levels of Preparation, we test:
$H_0: \mu_P = \mu_R$
$H_a: \mu_P \neq \mu_R$

The test statistic is $F = 14.40$ and the p-value is $p = .000$. Since the p-value is less than $\alpha$ ($p = .000 < .05$), $H_0$ is rejected.
There is sufficient evidence that the mean ratings differ between the two levels of Preparation at $\alpha = .05$. Because there are only 2 levels of Preparation, no further comparisons are needed. The Tukey interval for $\mu_R - \mu_P$, (-1.960, -0.6162), lies entirely below 0, so the mean rating for Practice is higher than the mean rating for Review.
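This solution tests in a fixed order: the overall treatment F-test first, then interaction, then main effects only if there is no evidence of interaction. Exercise 9.106 follows the same sequence. The Python sketch below encodes that decision sequence, taking the test results as inputs. The function name and the wording of the conclusions are illustrative. The inputs are the results reported above: the overall F = 4.46 exceeds $F_{.05} = 2.29$, and the p-values come from the MINITAB table.

```python
def factorial_strategy(treatments_differ, p_interaction, p_main, alpha=0.05):
    """Follow the two-factor ANOVA testing sequence used in these solutions.

    treatments_differ: result of the overall F-test on all ab treatment means.
    p_interaction:     p-value for the A x B interaction test.
    p_main:            dict mapping each factor name to its main-effect p-value.
    Returns the conclusions in the order the tests are made.
    """
    if not treatments_differ:
        return ["no evidence that treatment means differ; stop"]
    if p_interaction < alpha:
        return ["interaction present; compare the treatment means instead of testing main effects"]
    conclusions = ["no evidence of interaction; test the main effects"]
    for factor, p in p_main.items():
        if p < alpha:
            conclusions.append(f"{factor}: means differ")
        else:
            conclusions.append(f"{factor}: no evidence of a difference")
    return conclusions


# Exercise 9.109: overall F-test rejected H0, interaction p = .174, main-effect p-values below.
result = factorial_strategy(True, 0.174, {"Standing": 0.118, "Prep": 0.000})
for line in result:
    print(line)
```

For Exercise 9.106, where the interaction p-value is .000 at $\alpha = .10$, the same function stops at the interaction step. This matches part d of that exercise.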