6.6 The Central Limit Theorem

6.6.1 State the Central Limit Theorem

The Central Limit Theorem states that for large random samples, the sampling distribution of the sample mean is close to a normal probability distribution.

6.6.2 Apply the Central Limit Theorem to make predictions about and calculate probabilities for sample means

The following steps use the Central Limit Theorem to calculate the probability for a given sample mean.

Step 1: Calculate the z-score for the sample mean, X̄, using one of the following formulas.

If the POPULATION standard deviation, σ, is known:

$$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

where:
X̄ : is the SAMPLE mean
µ : is the population mean
σ : is the population standard deviation
n : is the sample size

OR, if the SAMPLE standard deviation, s, is known:

$$z = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$

where:
X̄ : is the SAMPLE mean
µ : is the population mean
s : is the sample standard deviation
n : is the sample size

Step 2: Look up the probability in Appendix D and determine the desired probability using the same methods as before.

Examples:

1. A normal population has a mean of 60 and a standard deviation of 12. You select a random sample of 9. Compute the probability that the sample mean is:
   a. between 60 and 63
   b. greater than 63
   c. less than 56
   d. between 56 and 63
   e. between 50 and 56

2. A population of 100 with an unknown shape has a mean of 75. You select a sample of 40. The standard deviation of the sample is 5. Compute the probability that the sample mean is:
   a. less than 74
   b. between 74 and 77
   c. between 76 and 77
   d. greater than 77
   e. less than 76

Remember: For finding sample mean probabilities:
Step 1: Calculate the z-score.
Step 2: Use Appendix D (z-score chart) and interpret your answer.

Extra Examples:

1. A normal population has a mean of 60 and a standard deviation of 8. A random sample of 9 is taken.
   a. What is the probability that the sample mean is between 60 and 65?
   b. What is the probability that the sample mean is between 54 and 60?
   c. What is the probability that the sample mean is between 54 and 65?

2. A population of unknown shape has a mean of 70. You select a sample of 42. The standard deviation of the sample is 5.
   a. Compute the probability the sample mean is greater than 71.
   b. Compute the probability the sample mean is less than 68.8.
   c. Compute the probability the sample mean is greater than 68.8.
   d. Compute the probability the sample mean is less than 71.

3. A trucking company claims that the mean weight of their delivery trucks when they are fully loaded is 6000 pounds and the standard deviation is 250 pounds. Assume that the population follows the normal distribution. Ninety trucks are randomly selected and weighed.
   a. What is the probability that the sample mean is between 6020 and 6070 pounds?
   b. What is the probability that the sample mean is between 5970 and 5980 pounds?
   c. What is the probability that the sample mean is between 5970 and 6020 pounds?
   d. What is the probability that the sample mean is more than 6020 pounds?
   e. What is the probability that the sample mean is less than 6020 pounds?
   f. What is the probability that the sample mean is more than 5970 pounds?
   g. What is the probability that the sample mean is less than 5970 pounds?
   h. What is the probability that the sample mean is between 5970 and 6000 pounds?

More Practice! Worksheet 6.6

1. The mean rent for a one-bedroom apartment in Southern California is $2,200 per month. The distribution of the monthly costs does not follow the normal distribution. In fact, it is positively skewed. What is the probability of selecting a sample of 50 one-bedroom apartments and finding the mean to be at least $1,950 per month? The standard deviation of the sample is $250.

2. According to an IRS study, it takes an average of 330 minutes for taxpayers to prepare, copy, and electronically file a 1040 tax form. A consumer watchdog agency selects a random sample of 40 taxpayers and finds the standard deviation of the time to prepare, copy, and electronically file form 1040 is 80 minutes.
   a. What assumption or assumptions do you need to make about the shape of the population?
   b. What is the standard error of the mean in this example?
   c. What is the likelihood the sample mean is greater than 320 minutes?
   d. What is the likelihood the sample mean is between 320 and 350 minutes?
   e. What is the likelihood the sample mean is greater than 350 minutes?

3. Recent studies indicate that the typical 50-year-old woman spends $350 per year for personal-care products. The distribution of the amounts spent is positively skewed. We select a random sample of 40 women. The mean amount spent for those sampled is $335, and the standard deviation of the sample is $45. What is the likelihood of finding a sample mean this large or larger from the specified population?

4. Information from the American Institute of Insurance indicates the mean amount of life insurance per household in the United States is $110,000. This distribution is positively skewed. The standard deviation of the population is not known.
   a. A random sample of 50 households revealed a mean of $112,000 and a standard deviation of $40,000. What is the standard error of the mean?
   b. Suppose that you selected 50 samples of households. What is the expected shape of the distribution of the sample mean?
   c. What is the likelihood of selecting a sample with a mean of at least $112,000?
   d. What is the likelihood of selecting a sample with a mean of more than $100,000?
   e. Find the likelihood of selecting a sample with a mean of more than $100,000 but less than $112,000.

5. The mean age at which men in the United States marry for the first time is 24.8 years. The shape and the standard deviation of the population are both unknown. For a random sample of 60 men, what is the likelihood that the mean age at which they were married for the first time is less than 25.1 years? Assume that the standard deviation of the sample is 2.5 years.

6. A recent study by the Greater Los Angeles Taxi Drivers Association showed that the mean fare charged for service from Hermosa Beach to the Los Angeles International Airport is $18.00 and the standard deviation is $3.50. We select a sample of 15 fares.
   a. What is the likelihood that the sample mean is between $17.00 and $20.00?
   b. What must you assume to make the above calculation?

7.0 Hypothesis Testing

7.1 Introduction

7.1.1 Describe the purpose of Hypothesis Testing

What is Hypothesis Testing?

Hypothesis testing is a statistical procedure which involves a decision-making process for evaluating claims about a certain parameter of a population.

As a researcher of data, you may be interested in answering many types of questions. Automobile manufacturers may be interested in determining whether seat belts will reduce the severity of injuries caused by accidents. A ladies' wear store may want to know whether the general public prefers a certain colour in a new line of fashion swimwear. These types of questions can be answered using the methods of hypothesis testing. Hypothesis testing starts with a statement about a population parameter such as the mean.

What is a Hypothesis?

In statistical analysis we make a claim, that is, state a hypothesis, then follow up with tests to verify the assertion or to determine that it is untrue. Because we utilize statistical inference, it is not necessary to measure the entire population; instead, we take a sample from the population to determine whether the empirical evidence from the sample does or does not support the statement concerning the population. As noted, hypothesis testing starts with a statement about a population parameter such as the mean.

Example: One statement about the performance of a new model car is that the mean miles per gallon is 30. Another statement is that the mean miles per gallon is not 30. Only one of these statements is correct. To test the validity of the assumption (hypothesis) that the mean miles per gallon is 30, we must select a sample from the population, calculate sample statistics, and based on certain decision rules either accept or reject the hypothesis.

7.2 Hypothesis Tests

7.2.1 Name and describe the components of a statistical hypothesis test

Five-Step Procedure for Hypothesis Testing

When conducting hypothesis tests we actually employ a strategy of "proof by contradiction." We hope to accept a statement to be true by rejecting or ruling out another statement. Statistical hypothesis testing is a five-step procedure:

Step 1: State the null and alternate hypotheses.
Step 2: State the level of significance.
Step 3: Compute the test statistic.
Step 4: Formulate the decision rule.
Step 5: Select the sample and make a decision.

Hypothesis Testing - Step 1

The first step is to state the null and alternate hypotheses.

What is the null hypothesis?

For example, a recent newspaper report made the claim that the mean length of a hospital stay was 3.3 days. You think that the true length of stay is some other length than 3.3 days. The null hypothesis is written

H0: µ = 3.3

It is the statement about the value of the population parameter, in this case the population mean. The null hypothesis is established for the purpose of testing. On the basis of the sample evidence, it is either rejected or not rejected. In other words, it is accepted or rejected. If the null hypothesis is rejected, then we accept the alternate hypothesis. The alternate hypothesis is written

H1: µ ≠ 3.3

There are two other formats for writing the null and alternate hypotheses.

Suppose you think that the mean length of stay is greater than 3.3 days. The null and alternate hypotheses would be written:

H0: µ ≤ 3.3
H1: µ > 3.3

Note that in this case the null hypothesis indicates "no change or that µ is less than 3.3." The alternate hypothesis states that the mean length of stay is greater than 3.3 days.

Suppose you think that the mean length of stay is less than 3.3 days. The null and alternate hypotheses would be written:

H0: µ ≥ 3.3
H1: µ < 3.3

It is important to remember that no matter how the problem is stated, the null hypothesis will always contain the equal sign. The equality sign will never appear in the alternate hypothesis.

One-tailed versus two-tailed test

When a direction is expressed in the alternate hypothesis, such as > or <, the test is referred to as a one-tailed test. When the alternate hypothesis is that of "≠" (not equal to), the test is a two-tailed test.

Hypothesis Testing - Step 2

After setting up the null hypothesis and alternate hypothesis, the next step is to state the level of significance.

The level of significance is designated α, the Greek letter alpha. It will indicate when the sample mean is too far away from the hypothesized mean for the null hypothesis to be true. When a true null hypothesis is rejected, it is referred to as a Type I error; the level of significance is the probability of making this error. If the null hypothesis is not true, but our sample results indicate that it is, we have a Type II error.

Hypothesis Testing - Step 3

Step 3 of the hypothesis testing procedure is to compute the test statistic.

What is a test statistic?

A test statistic is a value, determined from sample information, that is used to decide whether to reject the null hypothesis.

Which test statistic do I use?

The answer to this question is determined by factors such as whether the population standard deviation is known and the size of the sample. The standard normal distribution, the z value, is used if the population is normally distributed and the population standard deviation is known, or when the sample size is greater than 30.

Hypothesis Testing - Step 4

Formulate the Decision Rule: A decision rule is based on H0 and H1, the level of significance, and the test statistic. The decision rule is formulated by finding the critical values for z. If we are applying a one-tailed test, there is only one critical value. If we are applying a two-tailed test, there are two critical values.
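
As an alternative to reading a printed table, the critical values can be computed directly. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` (Python 3.8 or later). The function name `critical_values` and its `tail` argument are hypothetical names chosen for illustration; they do not come from the course material.

```python
from statistics import NormalDist


def critical_values(alpha, tail="two"):
    """Return the z critical value(s) for a given significance level.

    tail: "two" when H1 uses "≠", "upper" when H1 uses ">",
          "lower" when H1 uses "<".
    (Illustrative helper; the name and arguments are assumptions.)
    """
    std_normal = NormalDist()  # standard normal: mean 0, std dev 1
    if tail == "two":
        # Split alpha evenly between the two tails.
        z = std_normal.inv_cdf(1 - alpha / 2)
        return (-z, z)
    if tail == "upper":
        # All of alpha sits in the upper tail.
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "lower":
        # All of alpha sits in the lower tail.
        return (std_normal.inv_cdf(alpha),)
    raise ValueError("tail must be 'two', 'upper', or 'lower'")


if __name__ == "__main__":
    print(critical_values(0.01, "two"))
    print(critical_values(0.01, "upper"))
```

Rounded to two decimal places, the results should agree with a z table: at the 0.01 level, about ±2.58 for a two-tailed test and 2.33 for an upper-tailed test.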

[Diagram: critical values for a two-tailed test at the 0.01 level of significance]

Consider a two-tailed test at the 0.01 level of significance. Since this is a two-tailed test, half of the 0.01 is found in each tail: 0.005. The area where H0 is not rejected is therefore 0.99. Since Appendix D is based on half of the area under the curve, we locate 0.99/2 = 0.4950 in the body of the table to find the corresponding critical values of z, ±2.58. Therefore, our decision rule is: Reject the null hypothesis and accept the alternate hypothesis if the computed value of z does not fall in the region between -2.58 and +2.58.

To find the critical value for a one-tailed test at the 0.01 level of significance, place the 0.01 of the total area in the upper or lower tail. This means that 0.5000 - 0.01 = 0.4900 of the area is located between the z value of 0 and the critical value. We locate 0.4900 in the body of Appendix D, and our decision rule is to reject the null hypothesis if the computed value of the test statistic exceeds 2.33 for an upper-tailed test or is less than -2.33 for a lower-tailed test.

[Diagrams: acceptance and rejection areas for an upper-tailed test]

Hypothesis Testing - Step 5

Select the Sample and Make a Decision: The final step is to select the sample and compute the value of the test statistic. This value is compared to the critical value, or values, and a decision is made whether or not to reject the null hypothesis.

In the following example the critical values for z are -2.58 and +2.58 (a two-tailed test). The computed value of z = 1.55. Since the computed value falls in the acceptance range, we do not reject the null hypothesis.

7.0 Hypothesis test examples:

1. A company manufactures desks. Their production follows the normal distribution, with a mean of 200 per week and a standard deviation of 16. The president would like to investigate whether the mean number of desks is different from 200 at the 0.01 significance level. A sample accumulated over 150 weeks has a mean of 203.5. Is the president right in assuming that the mean number of desks is different from 200?

2. The rate at which a stock of aspirin turns over each year has a mean of 6.0 and a standard deviation of 0.50. A random sample of 64 aspirin revealed a mean of 5.84. It is suspected that the mean turnover has changed and is no longer 6.0. Use the 0.05 significance level to test the hypothesis that the mean turnover is not 6.0.

3. The mean age of passenger cars in the US is 8.4 years. A sample of 40 cars in the student lots at the University of Tennessee showed the mean age to be 9.2 years. The standard deviation of this sample was 2.8 years. At the 0.1 significance level, can we conclude the mean age is more than 8.4 years for the cars of Tennessee students?

4. The manager of a store wants to find whether the mean unpaid balance is more than $400. The level of significance is set at 0.05. A random sample of 60 unpaid balances revealed the sample mean is $407 and the standard deviation is $22.50. Should she conclude that the mean is greater than $400?

5. The mean amount of time spent watching TV per day for eighth graders is 1.6 hours. A sample of 35 eighth graders showed the mean number of hours to be 1.3 hours with a standard deviation of 1.0 hours. At the 0.01 significance level, can we conclude that the mean time is less than 1.6 hours?

6. The mean number of minutes spent on the phone by employees is said to be 37 with a standard deviation of 2.1. The owner of a company wants to determine whether the mean number of minutes is less than 37. She takes a sample of 43 employees and finds that the mean amount of time spent is 33. Can we conclude that the mean number of minutes is less than 37? (Use the 0.05 significance level.)

7. A town council claims that the mean time citizens spend commuting to work is 28 minutes. A company believes that the mean is not 28 minutes and takes a sample of 50 citizens. They determine that the mean commuting time of the sample is 36 minutes with a standard deviation of 11 minutes. At the 0.01 significance level, can the company conclude that the mean commuting time for the town is different from 28 minutes?

Worksheet for 7.0

1. The following information is available.
   H0: µ = 50
   H1: µ ≠ 50
   The sample mean is 49, and the sample size is 36. The population follows the normal distribution and the standard deviation is 5. Use the .05 significance level.

2. The following information is available.
   H0: µ ≤ 10
   H1: µ > 10
   The sample mean is 12 for a sample of 36. The population follows the normal distribution and the standard deviation is 3. Use the .02 significance level.

3. A sample of 36 observations is selected from a normal population. The sample mean is 21, and the sample standard deviation is 5. Conduct the following test of hypothesis using the .05 significance level.
   H0: µ ≤ 20
   H1: µ > 20

4. A sample of 64 observations is selected from a normal population. The sample mean is 215, and the sample standard deviation is 15. Conduct the following test of hypothesis using the .03 significance level.
   H0: µ ≥ 220
   H1: µ < 220

For Exercises 5-8:
(a) State the null hypothesis and the alternate hypothesis.
(b) State the decision rule.
(c) Compute the value of the test statistic.
(d) What is your decision regarding H0?
(e) What is the p-value? Interpret it.

5. The manufacturer of the X-15 steel-belted radial truck tire claims that the mean mileage the tire can be driven before the tread wears out is 60,000 miles. The Crosset Truck Company bought 48 tires and found that the mean mileage for their trucks is 59,500 miles with a standard deviation of 5,000 miles. Is Crosset’s experience different from that claimed by the manufacturer at the .05 significance level?

6. The MacBurger restaurant chain claims that the waiting time of customers for service is normally distributed, with a mean of 3 minutes and a standard deviation of 1 minute. The quality-assurance department found in a sample of 50 customers at the Warren Road MacBurger that the mean waiting time was 2.75 minutes. At the .05 significance level, can we conclude that the mean waiting time is less than 3 minutes?

7. A recent national survey found that high school students watched an average (mean) of 6.8 DVDs per month. A random sample of 36 college students revealed that the mean number of DVDs watched last month was 6.2, with a standard deviation of 0.5. At the .05 significance level, can we conclude that college students watch fewer DVDs a month than high school students?

8. At the time she was hired as a server at the Grumney Family Restaurant, Beth Brigden was told, “You can average more than $80 a day in tips.” Over the first 35 days she was employed at the restaurant, the mean daily amount of her tips was $84.85, with a standard deviation of $11.38. At the .01 significance level, can Ms. Brigden conclude that she is earning an average of more than $80 in tips?
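
The five-step procedure of Section 7.2 can also be traced in code. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` in place of Appendix D, and using the figures from the first hypothesis test example (desk production: hypothesized mean 200, standard deviation 16, sample of 150 weeks, sample mean 203.5, 0.01 significance level). The function `z_test_mean` and its parameter names are hypothetical, chosen for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(sample_mean, mu0, std_dev, n, alpha, tail="two"):
    """One-sample z test for a mean (illustrative sketch).

    tail: "two" (H1: mu != mu0), "upper" (H1: mu > mu0),
          or "lower" (H1: mu < mu0).
    Returns (z, critical value, reject H0?).
    """
    std_normal = NormalDist()

    # Test statistic: z = (sample mean - hypothesized mean) / standard error
    standard_error = std_dev / sqrt(n)
    z = (sample_mean - mu0) / standard_error

    # Decision rule: compare z with the critical value for the chosen tail.
    if tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    elif tail == "upper":
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
    elif tail == "lower":
        critical = std_normal.inv_cdf(alpha)
        reject = z < critical
    else:
        raise ValueError("tail must be 'two', 'upper', or 'lower'")
    return z, critical, reject


if __name__ == "__main__":
    # Steps 1 and 2: H0: mu = 200, H1: mu != 200, alpha = 0.01
    z, critical, reject = z_test_mean(203.5, 200, 16, 150, 0.01, "two")
    # Steps 3 to 5: report the statistic, the rule, and the decision.
    print(f"z = {z:.2f}, critical values = ±{critical:.2f}")
    print("Reject H0" if reject else "Do not reject H0")
```

With these inputs the computed z is about 2.68, which falls outside the region between -2.58 and +2.58, so under this sketch H0 would be rejected. The same function can be pointed at the one-tailed exercises by passing `"upper"` or `"lower"` for `tail`; when only the sample standard deviation is available, it is passed as `std_dev`, mirroring the second z formula in Section 6.6.2.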