Confidence Interval for the Population Proportion

Slide 1
What a way to start a section of notes, but anyway. Imagine you are at ground level in front of my house, at the curb. The picture on this slide is the view of a sprinkler turned on full blast. The one thing wrong with the picture is that the sprinkler does not shoot in both directions at once; it shoots left and then right. I drew both directions for illustration.

Slide 2
When I put my sprinkler in the center of my yard, it covers the middle 95% of the front yard. Let's think about an experiment we could undertake. Say you are outside my house late at night when all the lights are out, and you are blindfolded. Then we spin you around a lot. Your job is to put the sprinkler down somewhere in the yard. What is the probability that the center of the yard will get wet? Did you say 95%? Sure you did, and here is why. The sprinkler reaches as far from wherever it sits as it does from the center, so it wets the center exactly when it lands inside that same middle 95% of the yard. Since it lands there 95% of the time, the center of the yard gets wet 95% of the time. Hope this helps you understand confidence intervals. If not, well, sorry.

Slide 3: Overview
In this section we study one of the two basic inference methods: confidence intervals (hypothesis testing is the other). Confidence intervals are used when our interest is in estimating an unknown population parameter.

Slide 4: Overview
In a previous section we saw the sampling distribution of sample proportions, and on the next slide I show these results again. The sampling distribution of the sample proportions:
1) is approximately normal when the sample size is large enough,
2) has a mean equal to the population proportion p of the population the sample is drawn from (even when we do not know the value of p, the pattern is the same), and
3) has a standard error equal to the square root of p(1 - p)/n.
In this section, for practical purposes, we use the sample proportion P hat in the standard error formula instead of the population proportion p, because we do not know p and are trying to estimate it.
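The three properties on slide 4 can be checked with a quick simulation. This is a minimal sketch, assuming a hypothetical population proportion p = .30 and samples of size n = 200 (neither comes from the notes). It draws many samples, computes P hat for each, and compares the average and spread of those P hat values with p and with the square root of p(1 - p)/n.

```python
import random
import statistics
from math import sqrt

def simulate_sample_proportions(p, n, reps, seed=1):
    """Draw `reps` samples of size n from a population whose proportion
    in the category is p, and return the sample proportion of each."""
    rng = random.Random(seed)
    props = []
    for _ in range(reps):
        # Each person is in the category with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        props.append(successes / n)
    return props

p, n = 0.30, 200  # hypothetical population proportion and sample size
phats = simulate_sample_proportions(p, n, reps=5000)

print(f"mean of the P hat values: {statistics.fmean(phats):.4f}  (p = {p})")
print(f"SD of the P hat values:   {statistics.pstdev(phats):.4f}")
print(f"sqrt(p(1 - p)/n):         {sqrt(p * (1 - p) / n):.4f}")
```

The mean of the simulated P hat values should land very close to p, and their standard deviation very close to the standard error formula; a histogram of `phats` would look roughly bell-shaped.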
Slide 5
The number line here is really just the line segment from 0 to 1, and the value marked on it is the population proportion of one category of a qualitative variable. The lowercase letter p is the population proportion of the category. The sample proportion is typed (and pronounced) P hat.
[Figure: number line from 0 to 1 with p marked, and the sampling distribution of P hat centered at µPhat with spread σphat]
The standard deviation of the distribution of sample proportions, σphat, is given another name: the standard error of the proportion. It equals $\sigma_{\hat{P}} = \sqrt{p(1-p)/n}$, and in practice we estimate it with $\sqrt{\hat{P}(1-\hat{P})/n}$.

Slide 6
Here I just have the sampling distribution of the sample proportions P hat. Remember the center is at the population proportion p.
[Figure: sampling distribution of P hat, centered at µPhat = p]

Slide 7
On the previous slide I have arrows pointing in both directions from the center of the distribution. By adjusting the length of the arrows we can capture any chosen percentage of sample proportions. For example, if we go out 1.96 standard errors in either direction, we capture 95% of all sample proportions. Now think about slide 2 and slide 7 together: starting at the center, we cover 95% of the "ground."

Slide 8
As I explained on slide 2 with the sprinkler, a similar story holds here. Say in one sample we get the sample proportion P hat 1. If we take the same arrow length we used from the center and now measure it in both directions from P hat 1, the "ground" covered will include the center 95% of the time.
[Figure: sampling distribution centered at µPhat, with one sample proportion P hat 1 marked and arrows of the same length drawn from it]

Slide 9
From my example you see that the sample proportion will usually not equal the population proportion. So a confidence interval builds a margin of error around our sample proportion in the hope that the interval will include the population proportion. We calculate the interval as follows:
1) Take the sample proportion.
2) Calculate another value, which I will explain later.
3) Get two numbers: subtract the other value from the sample proportion, and add the other value to the sample proportion.
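The sprinkler argument of slides 7 to 9 can be made concrete with a small simulation. This is a minimal sketch, assuming a hypothetical population with p = .30 and samples of size 400, and taking the "other value" to be 1.96 standard errors estimated from P hat (as slides 11 and 12 explain). Each repetition plays the role of one blindfolded placement of the sprinkler; the code counts how often the interval covers the true p.

```python
import random
from math import sqrt

def interval(successes, n, z=1.96):
    """P hat minus and plus z standard errors (SE estimated from P hat)."""
    phat = successes / n
    margin = z * sqrt(phat * (1 - phat) / n)
    return phat - margin, phat + margin

def coverage(p, n, reps=2000, seed=7):
    """Fraction of simulated samples whose interval contains the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # One "blindfolded" sample: each of n people is in the category with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        lower, upper = interval(successes, n)
        if lower <= p <= upper:
            hits += 1
    return hits / reps

# Hypothetical population: p = .30, samples of size 400.
rate = coverage(p=0.30, n=400)
print(f"share of intervals that contain p: {rate:.3f}")
```

Expect a value close to .95 rather than exactly .95. This P hat plus or minus 1.96 standard errors interval (often called the Wald interval) can fall a little short of 95% when n is small or p is near 0 or 1.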
The interval from the low value to the high value is hoped to contain the true, unknown population proportion.

Slide 10
From the last slide I now repeat some ideas. The line represents sample proportions, and the vertical marker is the one we got in our sample. Then we calculate another value (I show you how later). Subtract this number from the sample proportion to get the lower limit of the interval, and add it to the sample proportion to get the upper limit.
[Figure: number line of P hat values showing the lower limit, the sample proportion, and the upper limit]

Slide 11: Estimating with confidence - confidence interval
A property we learned earlier, combined with a more precise version of the 68-95-99.7 rule, is that 95% of sample proportions lie within 1.96 standard errors of the population proportion. Imagine that 1.96 standard errors is how far my sprinkler sprays in one direction. If the sprinkler wets the middle 95% of the yard when it sits in the center, then when it is placed at other spots in the yard the center will get wet 95% of the time.

Slide 12: Estimating with confidence - confidence interval
To get a confidence interval for the unknown population proportion we
1. Calculate the sample proportion.
2. Calculate 1.96 standard errors.
3. Subtract 1.96 standard errors from the sample proportion to get the lower limit, and add 1.96 standard errors to the sample proportion to get the upper limit.

Slide 13: Example based on example 9.1 of the book, page 193
There is a population of adults who watch the evening news. A drug maker may be interested in the proportion of viewers who would ask their doctor for information about a drug advertised on the evening news, so the parameter of interest is the population proportion of viewers who asked their doctor about the drug. Say 693 adults are surveyed to see if they had asked their doctor about the drug. In the sample, 104 said they had.
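The three steps of slide 12 can be written as a small function and applied to these counts (104 of 693). This is a sketch, not the book's procedure verbatim; the function name `proportion_ci` is made up for illustration, and it keeps full precision instead of rounding the intermediate values.

```python
from math import sqrt

def proportion_ci(successes, n, z=1.96):
    """Follow the three steps of slide 12 (hypothetical helper name)."""
    # Step 1: the sample proportion.
    phat = successes / n
    # Step 2: z standard errors, with the standard error estimated from P hat.
    margin = z * sqrt(phat * (1 - phat) / n)
    # Step 3: subtract and add the margin to get the lower and upper limits.
    return phat, margin, (phat - margin, phat + margin)

phat, margin, (lower, upper) = proportion_ci(104, 693)
print(f"P hat = {phat:.4f}, margin = {margin:.4f}")
print(f"95% interval: ({lower:.3f}, {upper:.3f})")
```

Without rounding along the way the limits come out to about .1235 and .1767, which agree with the hand calculation to three decimals.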
The sample proportion is P hat = 104/693 ≈ .15. The standard error of the proportion is the square root of (.15)(.85)/693 = square root of .0001840 ≈ .01356, or about .014.

Slide 14: Problem
So we need 1.96 times .014 ≈ .027. The lower limit of the interval is .15 - .027 = .123, and the upper limit is .15 + .027 = .177. When we use this method, 95% of the intervals it produces will contain the true, unknown population proportion.

Slide 15: Estimating with confidence - critical z
The Z of 1.96 is the Z that gives a 95% confidence interval. 1.96 is called the critical Z, or Zα/2.
[Figure: standard normal curve split into areas .025 | .475 | .475 | .025, with the critical values at -1.96 and 1.96]
.025 is the area to the right of the critical z = 1.96.

Slide 16: Estimating with confidence - critical z
What if we want a 90% confidence interval?
[Figure: standard normal curve split into areas .05 | .45 | .45 | .05]
The Z we should use is 1.645.

Slide 17: Estimating with confidence
Note that when we went from a 95% to a 90% interval, the interval shrank. The 90% interval leaves us less confident and gives a narrower interval; a 95% interval leaves us more confident and gives a wider interval. If you want to be 100% sure the interval includes the unknown proportion, guess that it lies between 0 and 1. You can be sure the number is in that rather large interval, but this is not very practical.

Slide 18: 99% confidence interval
The Z to use if you want a 99% confidence interval is 2.575 (some tables round it to 2.576). The Z's for 4 commonly used confidence levels are on page 139.

Slide 19: Estimating with confidence - summary
A C% confidence interval means we can be C% confident the unknown parameter lies within Z standard errors of the sample proportion. What this really means is that we arrived at these numbers by a method that gives correct results C% of the time. Here C, written as a decimal (for example .95), is the confidence coefficient.

Slide 20: Level of significance - alpha
The book uses the Greek letter alpha to stand for what is called the level of significance:
alpha (α) = 1 - confidence coefficient.
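The link between the confidence level, alpha, and the critical z of slides 15 to 20 can be computed directly instead of read from a table. This is a sketch using Python's standard `statistics.NormalDist`; the helper name `critical_z` is hypothetical. It puts alpha/2 in each tail and finds the z with area alpha/2 to its right.

```python
from statistics import NormalDist

def critical_z(confidence):
    """Z_{alpha/2}: the standard normal value with area alpha/2 to its right."""
    alpha = 1 - confidence  # level of significance
    return NormalDist().inv_cdf(1 - alpha / 2)

for c in (0.90, 0.95, 0.99):
    print(f"{c:.0%} confidence: alpha = {1 - c:.2f}, critical z = {critical_z(c):.3f}")
```

The printed values are 1.645, 1.960 and 2.576; the exact 99% value is about 2.5758, which is why tables show either 2.575 or 2.576. Larger confidence means a larger critical z and so a wider interval, matching slide 17.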
Well, if we are, for example, 95% confident the interval includes the unknown proportion, then there is a 5% probability that the method gives an interval that does not include the unknown proportion.

Slide 21
The level of significance is the probability that a confidence interval does not include the true population proportion. Typically we pick a level of significance of .05. In general, instead of the specific value .05, we refer to the level of significance as alpha. Since alpha = 1 - confidence coefficient, an alpha of .05 gives a confidence coefficient of .95, and we talk about a 95% confidence interval. A 5% chance that the population proportion is not in the interval calculated from the sample is the same as being 95% confident that the population proportion is in the interval calculated from the sample.

Slide 22
So our confidence interval is the part of the number line between the limits P hat - (Zα/2 times the standard error) and P hat + (Zα/2 times the standard error), where the standard error = square root of ((P hat)(1 - P hat)/n). Remember n is the sample size, and P hat is the number of responses in the category of the qualitative variable we are interested in divided by the total number of responses. An example would be the number of folks who say Mt. Dew is their favorite pop divided by all the people who were asked their favorite pop (some said Coke, some said Pepsi, some said Mt. Dew, and so on).

Slide 23
In statistics the word estimator is used often. An estimator is a method or procedure for using sample information to learn about a population parameter. The sample proportion P hat is a point estimator of the population proportion p. Note that once we collect a sample and calculate the sample proportion, the resulting value is called a point estimate.
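Computing a point estimate from raw categorical answers, as in the Mt. Dew example of slide 22, amounts to a count divided by a total. The survey answers below are made-up data for illustration only, and `point_estimate` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical answers to "What is your favorite pop?" (made-up data).
responses = ["Coke", "Pepsi", "Mt. Dew", "Coke", "Mt. Dew",
             "Sprite", "Coke", "Mt. Dew", "Pepsi", "Coke"]

def point_estimate(responses, category):
    """P hat = (responses in the category) / (total number of responses)."""
    counts = Counter(responses)  # a missing category counts as 0
    return counts[category] / len(responses)

print(point_estimate(responses, "Mt. Dew"))  # 0.3
```

Here 3 of the 10 made-up responses are Mt. Dew, so P hat = .3. This single number is the point estimate; plugging it into the slide 22 formula would turn it into an interval estimate.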
The point estimator has a drawback: as you and I know, its value varies from sample to sample, so the estimate will likely not equal the true, unknown population proportion. This is why we move to the confidence interval, or what is more formally called an interval estimator.

Slide 24
In an introductory stats class it is customary to mention, without proof, some properties of estimators. I do that here; the estimators we use in this class typically have these properties. An unbiased estimator of a population parameter is an estimator whose expected value is equal to the parameter. The sample proportion is unbiased: its expected value is the population proportion, which basically means the average of all possible sample proportions is the population proportion. An unbiased estimator is said to be consistent if the difference between the estimator and the parameter grows smaller as the sample size grows larger. Note that the standard error of the sample proportion has n in the denominator, so as n gets larger the standard error shrinks.

Slide 25
If there are two or more unbiased estimators of a parameter, the one with the smaller variance is said to be relatively more efficient. For most of the rest of the term we will see ideas similar to those in this section; hypothesis testing is a variation of what we have done here. Lastly, I want to work a problem to develop one last idea. On the next slide I have an example where P hat = .48 for various sample sizes, with intervals based on 95% confidence. Note how, as the sample size increases, Z times the standard error decreases and the lower and upper limits get closer together. Z times the standard error is the margin of error I mentioned before.

Slide 26
[Table: 95% intervals for P hat = .48 at several sample sizes, showing the margin of error (third column) and the lower and upper limits]

Slide 27
With this thinking we can see the margin of error is Zα/2 times the square root of (P hat)(1 - P hat)/n. Note that on the previous slide this is the third column. If we set this margin of error equal to B we have
$B = Z_{\alpha/2}\sqrt{\hat{P}(1-\hat{P})/n}$
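The pattern described for slide 26 (a shrinking margin of error as n grows) can be reproduced for P hat = .48 at 95% confidence. The sample sizes below are hypothetical choices, not the values in the slide's table; they are picked so each is four times the last, which shows the margin of error halving each time.

```python
from math import sqrt

def margin_of_error(phat, n, z=1.96):
    """B = z * sqrt(P hat (1 - P hat) / n)."""
    return z * sqrt(phat * (1 - phat) / n)

phat = 0.48
# Hypothetical sample sizes; not the values from the slide 26 table.
for n in (100, 400, 1600):
    b = margin_of_error(phat, n)
    print(f"n = {n:5d}  margin = {b:.4f}  interval = ({phat - b:.3f}, {phat + b:.3f})")
```

Because n sits under a square root, cutting the margin of error in half takes four times as many observations.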
We can use the equation for B to see what sample size we need to get a certain margin of error. Squaring both sides and solving for n gives
$n = \frac{(Z_{\alpha/2})^2\,\hat{P}(1-\hat{P})}{B^2}$
Since n must be a whole number, round the result up.

Slide 28
The last point I would make is that when we are planning a sample size we do not know P hat yet, so we should either 1) use .5 if we have no idea about the population proportion (.5 makes P hat(1 - P hat) as large as possible, so it gives the safest, largest n), or 2) use some value we think is relevant for the problem at hand. Let's do problem 9.16 on page 147 to consider this idea. From the problem we have
$n = (1.645)^2(.5)(.5)/(.03)^2 \approx 2.71(.25)/.0009 \approx 752.8$, which rounds up to 753.
(Without rounding 1.645² to 2.71, the result is 751.7, which rounds up to 752.)
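The sample size formula from slides 27 and 28 fits in a few lines. This is a sketch with a hypothetical helper name, `sample_size`; it defaults P hat to .5 as slide 28 suggests when nothing is known about p, and rounds up to a whole respondent.

```python
from math import ceil

def sample_size(margin, z, phat=0.5):
    """n = z^2 * P hat (1 - P hat) / B^2, rounded up to a whole number.
    phat = 0.5 is the cautious default when nothing is known about p."""
    return ceil(z ** 2 * phat * (1 - phat) / margin ** 2)

print(sample_size(margin=0.03, z=1.645))  # 752
```

With z = 1.645 kept at full precision the answer is 752; the slide's 753 comes from rounding 1.645² to 2.71 before dividing. Passing a smaller guess such as `phat=0.3` gives a smaller n, which is why .5 is the safe choice.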