P201 Lecture Notes
Statistical Inference and Estimation
Chapter 6

The two processes that are statistical inference.

I. Estimation.

Question asked: What is the value of the population parameter?

Example: What % of U.S. citizens believe that Congress should raise the debt ceiling?
What % of persons in Chattanooga believe that kids under 18 should be allowed in Coolidge Park only under adult supervision?
What % of persons using the Riverwalk believe that dogs should be prohibited?

Answers: A number, called a point estimate, or an interval, called a confidence interval. The interval is one that has a prespecified probability (usually .95) of surrounding the population parameter.

So the answer to the first example might be reported as “37% with a 5% margin of error.” This is a combination of a point estimate (the 37%) and an interval estimate (from 37-5 to 37+5, or from 32% to 42%).

II. Hypothesis Testing.

A. With one population. . .
Deciding whether a particular population parameter (usually the mean) equals a value specified by prior research or other considerations.
Example: A light bulb is advertised as having an “average lifetime” of 5000 hours.
Question: Is the mean of the population of lifetimes of bulbs produced by the manufacturer equal to 5000 or not?

B. With two populations. . .
Deciding whether corresponding parameters (usually means) of the two populations are equal or not.
Example: Statistics taught with lab vs. Statistics taught without lab.
Question: Is the mean amount learned by the population of students taught with a lab equal to the mean amount learned by the population of students taught without lab?

C. Three populations. . .
Deciding whether corresponding parameters (usually means) of the three populations are equal or not.

And on and on and on.
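The “37% with a 5% margin of error” report from the estimation example can be unpacked with a short Python sketch. The first function only restates the arithmetic in the notes. The second uses the usual large-sample formula for the margin of error of a proportion, which is an assumption here: the notes do not say how the pollster arrived at 5%, and the sample size of 400 is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(point_estimate, margin_of_error):
    """Return (lower, upper) limits centered on the point estimate."""
    return point_estimate - margin_of_error, point_estimate + margin_of_error


def proportion_margin_of_error(p_hat, n, confidence=0.95):
    """Large-sample margin of error for a sample proportion.

    Assumption: the notes do not give this formula; it is the common
    z * sqrt(p(1-p)/n) approximation.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return z * sqrt(p_hat * (1 - p_hat) / n)


# The example from the notes: 37% with a 5% margin of error.
lower, upper = confidence_interval(37, 5)
print(f"Interval estimate: {lower}% to {upper}%")

# Hypothetical poll: 400 respondents, 37% of whom said yes.
moe = proportion_margin_of_error(0.37, 400)
print(f"Margin of error: {100 * moe:.1f} percentage points")
```

With the hypothetical sample of 400, the margin of error comes out a little under 5 percentage points, which is the kind of figure a pollster would round and report.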
Biderman’s 201 Handouts P201 Topic 11: Statistical Inference- 1 2/5/2016

Estimation – not covered in Corty

Estimation is using information from samples to guess the value of a population parameter or difference between parameters. A lot of this goes on during an election season.

Point Estimate
A single value which represents our single best estimate of the value of a population parameter.

Interval Estimate (usually reported as “Margin of error”)
An interval which has a prespecified probability of surrounding the unknown parameter. This interval estimate is called a confidence interval (CI). Typically the interval estimate is centered around the point estimate.

[Figure: an interval marked with its lower limit, point estimate, and upper limit.]

The sample statistic most often used as a point estimate is the Sample Mean.

Reporting the result of estimation: (Hypothetical data) . . .
“From the result of the XYZ poll, it is estimated that 17% of adult residents of the U.S. have tried gluten free diets, with margin of error equal to 3%.”
This means that the pollster is 95% confident that the actual population percentage of persons trying gluten free diets is between 14% and 20%.

Introduction to Hypothesis Testing: The mean of a population

Suppose we have a single population, say the population of light bulbs mentioned above. Suppose we want to determine whether or not the mean of the population equals 5000.

You might wonder: What does it matter whether the mean is 5000?
Answer: The manufacturer might be selling light bulbs whose average lifetime is only 2000 hours and may be counting on the fact that no one really pays attention. But the difference between 2000 hours and 5000 hours will add up over multiple purchases. In times when money is tight, those small differences may combine to make a large overall difference.

Two possibilities
H0. The population mean equals 5000.
H1. The population mean does not equal 5000.
These possibilities are called hypotheses.
The first, the hypothesis of no difference, is called the Null Hypothesis (H0).
The second, the hypothesis of a difference, is called the Alternative Hypothesis (H1).

Our task is to decide which of the two hypotheses is true. We state the decision as “I reject the null” or “I fail to reject the null.” (I will sometimes say, “I retain the null.”)

Why can’t we just know about the population?

Light bulbs – The manufacturer may have simply made up the numbers on the package. They may have made a mistake in estimation.

Treatment for C.Diff – One doctor says take antibiotics for 2 weeks. He/she believes that the mean number of C.Diff bacteria will be essentially 0 after 2 weeks. A 2nd doctor says take them for 6 weeks. He/she believes that the mean number of C.Diff bacteria will not be zero until 6 weeks.

The point is that no one “knows” the correct value. If we have a belief about a specific population value, we have to test that belief using hypothesis testing procedures.

Two general approaches.

1. The Bill Gates (Warren Buffett) approach.
Purchase ALL the light bulbs in the population. Measure the lifetime of each bulb. Compute the mean.
If the mean equals 5000, retain the null. If the mean does not equal 5000, reject the null.
Problem: Too many bulbs.

2. The Plan B approach.
Take a sample of light bulbs. Compute the mean of the sample.
From our study of sampling distributions, we know that the sample mean will not exactly equal the population mean. But we also know that it’ll be close to the population mean. If the population mean is 5000, then the sample mean should be close to 5000. So . . .

An intuitively reasonable decision rule
If the value of the sample mean is “close” to 5000, decide that the null must be true.
If the value of the sample mean is “far” from 5000, decide that the null must be false.

But how close is “close”? How far is “far”?
What if the mean of the lifetimes of 25 bulbs were 4999.99? Most rational people would retain the null.
What if the mean of the lifetimes of 25 bulbs were 1003.23? Most rational people would reject the null.
What if the mean of the lifetimes of 25 bulbs were 4876.44? Hmm. A gray area.

Clearly we need some rules, or perhaps we should say that we need an operational definition of “close” and of “far”.

Close and Far as Probabilities – not covered in Corty

Recall from sampling distributions that means of samples from a population vary around the mean of that population. This variation is due to sampling error.

When we take samples from a population whose mean is 5000, for example, the means most likely to occur will be those close to 5000. The means least likely to occur will be those far from 5000.

This suggests that we could say that a mean is “close” to the hypothesized mean if it is one of those means that would be likely to be obtained from a population whose mean was the hypothesized value. And we could say that a mean is “far” from the hypothesized mean if it is one of those that would be unlikely to be obtained from a population whose mean was the hypothesized value.

Our Decision Rule Stated in terms of Probabilities

So we can describe our decision rule (“close” vs. “far”) in terms of probabilities . . .

If the sample mean is one of those that would have high probability of occurring if the population mean were 5000, then we’ll conclude that the population mean must be 5000.

If the sample mean is one of those that would have low probability of occurring if the population mean were 5000, then we’ll conclude that the population mean must not be 5000.

The p-value.

Statisticians state the decision rule in terms of probabilities. They formalize the process by computing a special probability called the p-value.
The p-value is the probability of an outcome (e.g., sample mean value) as extreme as, or more extreme than, the obtained outcome if the null hypothesis is true.

Statisticians base their decision on the p-value.

Our Decision Rule described in terms of the p-value

Statisticians have agreed that if the p-value is smaller than or equal to an agreed-upon criterion value, then the null hypothesis is to be rejected. But if the p-value is larger than the agreed-upon criterion value, the null hypothesis will not be rejected.

Close = High probability = large p-value.
Far = Low probability = small p-value.

Significance Level

The criterion against which the p-value is compared is called the significance level. Typically, the significance level is (arbitrarily) set at .05.

Our Final Description of the process in terms of p-value and significance level . . .

If the p-value is larger than the significance level, then the null hypothesis is not rejected.
If the p-value is less than or equal to the significance level, then the null hypothesis is rejected.

Corty’s Steps in Hypothesis Testing

Every hypothesis test involves a set of steps. Every statistical text has a variation on the following list of steps. These steps are always carried out, regardless of the type of hypothesis being tested.

Step 1. Pick the statistical test appropriate for your hypothesis.
Right now, you don’t know of any statistical tests. That will soon change.

Step 2. Make sure your data meet the assumptions of the test, e.g., unimodality, symmetry, near normality, no outliers.
Create a frequency distribution with Normal Curve Overlay.
Look for outliers – extremely positive or negative values.
Look at skewness values – should be |skewness| <= 1.5.

Step 3. List the null and the alternative hypothesis.
See my example.

Step 4. Set the significance level of the test. (Corresponds to Corty’s “critical value” step.)
Significance level will always be .05 for this course.
See below for critical values.

Step 5. Compute the value of the test statistic. Also compute its p-value.
See example.

Step 6. Compare the p-value with the significance level and make your decision. Then interpret the result.

You should memorize these steps for the next exam.

Worked Out Example – The light bulb example

Suppose that purchasers from across the country were contacted and instructed to go to the nearest hardware/building supply store and purchase a packet of the bulbs and then to select one bulb randomly from the packet. Those bulbs were then packed in foam and sent to a centralized testing facility where 100 of them were randomly selected from the nearly 500 shipped from across the country. Those 100 were plugged into standard sockets and power was applied. They were allowed to burn continuously until they failed, and the time to failure of each bulb in the sample was recorded.

Suppose that the manufacturer had substantial evidence that the standard deviation of the population was equal to 300 and that this value was not related to the mean of the population. We couldn’t actually know this in real life.

Step 1: Test Statistic:
A test statistic appropriate for testing a hypothesis about the mean of a single population is the Z test. The test is called the One Population Z Test.

Step 2: Assumptions.
We’ll assume that the data are essentially unimodal and symmetric, pretty nearly normally distributed. We’ll assume that there are no outliers.

Step 3: Null hypothesis and Alternative hypothesis.
We want to determine whether the manufacturer’s claim that the “average” lifetime in the population is 5000 is true or not. This suggests the following . . .
Null Hypothesis: The mean of the population equals 5000. µ = 5000.
Alternative Hypothesis: The mean of the population does not equal 5000. µ ≠ 5000.

Step 4: Significance Level
This one's easy.
It's quite common to use .05 as the criterion for the probability of the observed outcome. If that probability is less than or equal to .05, we reject. If that probability is larger than .05, we retain the null.

So let the significance level be .05.

As we’ll see, a significance level of .05 corresponds to a critical Z value of + or – 1.96.

Step 5: Computed value of test statistic and the p-value

Suppose the mean of the sample of 100 lifetimes was 4935.33 with sample standard deviation equal to 305.4. Then

$$Z = \frac{\bar{X} - \text{Hypothesized mean}}{\text{Standard error of the mean}} = \frac{4935.33 - 5000}{300/10} = \frac{-64.67}{30} = -2.16$$

The p-value for a Z of -2.16 is computed as a Normal Distribution Problem.

Recall that the p-value is the probability of a value as extreme as the obtained value of Z. The obtained Z is -2.16, so any Z as negative as -2.16 or more negative is “as extreme as” the obtained Z. But note that any Z as positive as +2.16 or larger is also “as extreme as” the obtained Z. So we want the probability of a Z as negative as -2.16 plus the probability of a Z as positive as +2.16.

To solve it, we get the two areas beyond 2.16 in either direction: the area to the left of the Z as a negative number and the area to the right of Z as a positive number.

[Figure: distribution of sample means if the null were true, with the tail areas beyond Z = -2.16 and Z = +2.16 marked.]

Tail area (gotten from the Normal Distribution table) = .0154 in each tail.
p-value = .0154 + .0154 = .0308

Step 6: Compare the p-value with the significance level.
.0308 is less than .0500, so we’ll reject the null hypothesis.
Our conclusion is that the population mean is NOT equal to 5000.

Critical Values of the Z statistic

The obtained Z value for the above example was -2.16. The p-value was .0308.
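The Z and p-value from the light bulb example can be reproduced with a short Python sketch. The function names are illustrative choices, not part of the course software, and the standard library's normal distribution stands in for the Normal Distribution table.

```python
from math import sqrt
from statistics import NormalDist


def one_population_z(sample_mean, hypothesized_mean, sigma, n):
    """Distance of the sample mean from the hypothesized mean,
    measured in standard errors of the mean (sigma / sqrt(n))."""
    return (sample_mean - hypothesized_mean) / (sigma / sqrt(n))


def two_tailed_p_value(z):
    """Probability of a Z at least as far from 0 as the obtained Z,
    counting both tails of the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Light bulb example: sample mean 4935.33, hypothesized mean 5000,
# population standard deviation 300, sample size 100.
z = one_population_z(4935.33, 5000, 300, 100)
p = two_tailed_p_value(z)
decision = "Reject" if p <= 0.05 else "Do not reject"
print(f"Z = {z:.2f}, p-value = {p:.4f}, decision: {decision}")
```

Because the code does not round Z to -2.16 before looking up the tail area, its p-value is about .031 rather than the table-based .0308. The decision is the same.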
Here are some other Z values that we could have obtained and the p-value for each of them.

Z value we could have obtained    p-value    Decision
0.50                              0.6170     Do not reject
1.00                              0.3174     Do not reject
1.50                              0.1336     Do not reject
1.70                              0.0892     Do not reject
1.80                              0.0718     Do not reject
1.96                              0.0500     Reject
2.16 (our Z)                      0.0308     Reject
2.50                              0.0124     Reject
3.00                              0.0027     Reject
3.50                              0.0005     Reject

Note the pattern. As Zs get farther from 0, the p-values get smaller and smaller.

Note that there is one “special” Z value whose p is exactly 0.0500. That Z is called the critical Z. Its value is 1.96. If your Z is -1.96 or +1.96, you know that your p-value is exactly .05. Any Z larger in absolute value than the critical value will have a p smaller than .05. This Z will always be the value that divides “Do not reject” from “Reject”, whenever you do a Z test.

This means that knowledgeable data analysts don’t even bother to compute p-values when they do a Z test. They remember that the Critical Z is 1.96 and after conducting their research, if their obtained Z is equal to or more negative than -1.96 or equal to or more positive than +1.96, they reject.

We will use p-values, not critical values.

If all of our hypothesis tests were Z tests, then we would not bother computing p-values for any of them. If that were the case, we’d simply compare our obtained Z with 1.96 and base our decision on the result of that comparison.

But 99+% of our statistical tests will NOT be Z tests. And computation of critical values for the other types of tests is cumbersome.

Luckily, our computer program will compute the p-value for each of the other tests that we’ll conduct. So we don’t have to deal with critical values of any of the statistical tests that follow. (I may mention them in passing, but we’ll let the computer do that work for us.)

Another example.
A drug manufacturer has developed a drug that it believes will affect the duration of cold symptoms. Suppose that prior careful measurement of cold symptoms has determined that the population average duration of symptoms without treatment is 8 days with population standard deviation equal to 2 days.

The manufacturer recruits a sample of persons who sign a waiver allowing the representatives of the manufacturer to “give them” colds by swabbing their nasal passages with a fluid containing the cold virus. The time of swab is time 0. The number of days until a carefully calibrated measure indicated the absence of cold symptoms was recorded. For 25 persons, the values are listed below . . .

Days     Frequency    Percent    Valid Percent    Cumulative Percent
3        3            12.0       12.0             12.0
4        1            4.0        4.0              16.0
5        4            16.0       16.0             32.0
6        7            28.0       28.0             60.0
7        4            16.0       16.0             76.0
8        2            8.0        8.0              84.0
9        1            4.0        4.0              88.0
10       3            12.0       12.0             100.0
Total    25           100.0      100.0

Step 1: Test Statistic:
A test statistic appropriate for testing a hypothesis about the mean of a single population is the Z test.

Step 2: Assumptions.
Check these.

Step 3: Null hypothesis and Alternative hypothesis.
We want to determine whether the “average” duration of symptoms in the population of persons taking the drug is 8 days, the value without treatment, or not. This suggests the following . . .
Null Hypothesis: The mean of the population equals 8. µ = 8.
Alternative Hypothesis: The mean of the population does not equal 8. µ ≠ 8.

Step 4: Significance Level
This one's easy. It's quite common to use .05 as the criterion for the probability of the observed outcome. If that probability is less than or equal to .05, we reject. If that probability is larger than .05, we retain the null.

So let the significance level be .05.

As we’ll see, a significance level of .05 corresponds to a critical Z value of + or – 1.96.
Step 5: Computed value of test statistic and the p-value

The mean of the 25 durations in the sample is 6.32 days. Then

$$Z = \frac{\bar{X} - \text{Hypothesized mean}}{\text{Standard error of the mean}} = \frac{6.32 - 8}{2/5} = \frac{-1.68}{0.4} = -4.2$$

The p-value for a Z of -4.20 is computed as a Normal Distribution Problem.

Recall that the p-value is the probability of a value as extreme as the obtained value of Z.

To solve it, we get the two areas beyond 4.2 in either direction: the area to the left of the Z as a negative number and the area to the right of Z as a positive number.

[Figure: distribution of sample means if the null were true, with the tail areas beyond Z = -4.20 and Z = +4.20 marked.]

Tail area = .0000133 in each tail.
p-value = .0000133 + .0000133 = .0000267, which is .0000 to four decimal places.

Step 6: Compare the p-value with the significance level.
.0000 is less than .0500, so we’ll reject the null hypothesis.
Our conclusion is that the population mean is NOT equal to 8.

Possible results of the Hypothesis Testing Process

                       State of World
Decision               Null True, µ=5000                     Null False, µ≠5000
Fail to reject Null    Correct Failure to Reject             Incorrect Failure to Reject (Type II Error)
Reject Null            Incorrect Rejection (Type I Error)    Correct Rejection

Correct Failure to Reject – a good outcome
The null is true. µ really does equal 5000. The manufacturer’s claim is true.
We do not reject the null but instead conclude that the null is true. We make a correct decision.

Correct Rejection – another good outcome
The null is false. µ really does not equal 5000. The manufacturer’s claim is wrong.
We "detected" the difference between the actual population mean and the manufacturer’s claim.

Incorrect Rejection: Type I Error
The null is true. µ really does equal 5000. The manufacturer’s claim is true.
But unbeknownst to us, because of a random accumulation of factors, our outcome was one which seemed inconsistent with the null. So we rejected it and incorrectly accused the manufacturer of lying on its packaging.
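How often an incorrect rejection happens can be seen in a small simulation. The Python sketch below draws many samples of 100 bulbs from a population in which the null really is true (µ = 5000, standard deviation 300) and counts how often the Z test rejects anyway. The normally distributed lifetimes, the 2000 simulated studies, and the function names are all assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def two_tailed_p(sample_mean, hypothesized_mean, sigma, n):
    """p-value of the One Population Z Test with a two-tailed alternative."""
    z = (sample_mean - hypothesized_mean) / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def type_i_error_rate(mu=5000, sigma=300, n=100, alpha=0.05,
                      trials=2000, seed=201):
    """Fraction of simulated studies that reject a null that is true."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population whose mean really is mu.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        if two_tailed_p(sample_mean, mu, sigma, n) <= alpha:
            rejections += 1  # an incorrect rejection: a Type I error
    return rejections / trials


rate = type_i_error_rate()
print(f"Proportion of incorrect rejections: {rate:.3f}")
```

The proportion should land near .05, differing from it only through the randomness of the simulation.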
Controlling P(Type I Error): The significance level.
When the null is true, we reject it only if the p-value is less than or equal to the significance level, and that happens with probability equal to the significance level. So, in most research, where the significance level is .05, the probability of this error is .05.

Incorrect Retention: Type II Error
The null is false. µ does not equal 5000. The manufacturer is lying.
But, because of a random accumulation of factors, our outcome was one which seemed consistent with the null. So we did not reject it and issued a statement saying incorrectly that the manufacturer’s packaging had truthful language.

Controlling P(Type II Error).
Having large samples is the primary means of minimizing P(Type II Error). Larger sample sizes lead to smaller P(Type II Error).

Two-tailed vs. One-tailed Alternative Hypotheses

For one population tests, the Null is always: “Population mean equals X.”

Most of the time, the alternative hypothesis is, “Population mean does not equal X.”

The “does not equal” means that we will reject the null . . .
if the sample mean is less than the hypothesized value, or
if it is larger than the hypothesized value.

Most of the time we’re so ignorant of the situation that we won’t be able to predict whether the population mean will be less than or larger than the hypothesized value when the null is false.

Occasionally, however, we’ll know that if the null is false, the population mean can only be larger than the hypothesized value.

In other instances, we’ll know that if the null is false, the population mean can only be smaller than the hypothesized value.

Example . . .

A manufacturer of food products is trying to determine how much sodium to add to its packaged meat. The population mean taste rating of its meat with 9% sodium is 57 with standard deviation = 7. An experiment is conducted in which samples of meat with sodium = 12% are rated.

The null hypothesis is: Population mean rating of the 12% meat = 57.

Alternative hypothesis: ?
Suppose the manufacturer knows that the meat with more sodium won’t be rated worse (except on rare occasions due to sampling error). In this case, the manufacturer knows that if the sodium has an effect, it will only INCREASE the mean rating. So the alternative hypothesis would be

(One-tailed) Alternative hypothesis: Population mean > 57.

Using a one-tailed alternative hypothesis changes the p-value.

If the alternative hypothesis is “Pop mean > hypothesized value”, the p-value is the probability of a value as large as the obtained value.

If the alternative hypothesis is “Pop mean < hypothesized value”, the p-value is the probability of a value as small as the obtained value.

Worked out example:

Ratings of a sample of 25 persons are taken. Here they are . . .

rating
73 57 56 66 49
59 70 45 56 51
56 48 70 73 47
59 63 59 63 60
56 67 60 60 62

Mean = 59.4. S = 7.778.

Step 1: Test Statistic:
A test statistic appropriate for testing a hypothesis about the mean of a single population is the Z test.

Step 2: Assumptions.
Check these.

Step 3: Null hypothesis and Alternative hypothesis.
We want to determine whether the population mean rating of the meat with 12% sodium is 57, the mean rating of the 9% meat, or is larger than 57. This suggests the following . . .
Null Hypothesis: The mean of the population equals 57. µ = 57.
Alternative Hypothesis: The mean of the population is larger than 57. µ > 57.

Step 4: Significance Level
This one's easy. It's quite common to use .05 as the criterion for the probability of the observed outcome. If that probability is less than or equal to .05, we reject. If that probability is larger than .05, we retain the null.

So let the significance level be .05.

For a one-tailed alternative of this kind, a significance level of .05 corresponds to a critical Z value of +1.645 rather than + or – 1.96, because all of the .05 lies in one tail.
Step 5: Computed value of test statistic and the p-value

$$Z = \frac{\bar{X} - \text{Hypothesized mean}}{\text{Standard error of the mean}} = \frac{59.4 - 57}{7/5} = \frac{2.4}{1.4} \approx 1.7$$

The p-value for a Z of 1.70 is computed as a Normal Distribution Problem.

Recall that the p-value is the probability of a value as LARGE as the obtained value of Z (since we’re employing a one-tailed alternative).

To solve it, we get only one area: the area to the right of Z = +1.70. The area in the left tail is not counted, because the alternative hypothesis points in one direction only.

[Figure: distribution of sample means if the null were true, with the tail area beyond Z = +1.70 marked.]

Tail area = .0446
p-value = .0446

Step 6: Compare the p-value with the significance level.
.0446 is less than .0500, so we’ll reject the null hypothesis.
Our conclusion is that the population mean is larger than 57.
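The sodium example can be checked from the raw ratings with a short Python sketch. It shows how the choice of alternative hypothesis changes which tail area counts as the p-value. The function name and the "greater" / "less" / "two-sided" labels are illustrative assumptions, not terms from the notes.

```python
from math import sqrt
from statistics import NormalDist, mean

# The 25 taste ratings of the 12% sodium meat listed in the notes.
ratings = [73, 57, 56, 66, 49, 59, 70, 45, 56, 51, 56, 48, 70,
           73, 47, 59, 63, 59, 63, 60, 56, 67, 60, 60, 62]


def z_test_p_value(z, alternative="two-sided"):
    """p-value for a Z statistic under the chosen alternative hypothesis."""
    cdf = NormalDist().cdf
    if alternative == "greater":   # H1: population mean > hypothesized value
        return 1 - cdf(z)          # area to the right of Z
    if alternative == "less":      # H1: population mean < hypothesized value
        return cdf(z)              # area to the left of Z
    if alternative == "two-sided":  # H1: population mean != hypothesized value
        return 2 * (1 - cdf(abs(z)))  # both tails
    raise ValueError(f"unknown alternative: {alternative}")


sample_mean = mean(ratings)
# Hypothesized mean 57, population standard deviation 7, N = 25.
z = (sample_mean - 57) / (7 / sqrt(len(ratings)))

p_one_tailed = z_test_p_value(z, "greater")
p_two_tailed = z_test_p_value(z, "two-sided")
print(f"Mean = {sample_mean:.1f}, Z = {z:.2f}")
print(f"One-tailed p = {p_one_tailed:.4f}, two-tailed p = {p_two_tailed:.4f}")
```

With Z left unrounded at about 1.71, the one-tailed p-value is about .043, slightly below the table-based .0446, and still under .05. The two-tailed p-value is twice as large and exceeds .05, so the same data would not lead to rejection under a two-tailed alternative.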