Demonstrations IV
Dr. Scott Stevens

AA. Estimation of the population mean when σ is known (Confidence intervals)
AB. Estimation of the Population Mean when σ unknown (Confidence intervals)
AC. Estimation of Population Proportion (Confidence Interval for One Proportion)
AD. Determination of Appropriate Sample Size (Proportion)

AA. Estimation of the population mean when σ is known (Confidence intervals)

Problem: Statisticians sometimes report the 50% confidence interval, with the margin for sampling error known as the probable error. For example, an estimate x-bar of the average useful life of a TV picture tube is said to have a probable error of e years if the interval from x-bar - e to x-bar + e has a 50% chance of including the population mean. Calculate the probable error if the standard deviation in TV tube lives is known to be 2.5 years and the average useful lifetime in a sample of 25 TV tubes is found to be 8.15 years.

This is an estimation problem. In estimation problems, you are given information about a sample, and are asked to provide an estimate of a population parameter. This estimate usually is expressed as a confidence interval, or interval estimate. The process of doing this is always the same. We compute a statistic from the sample, such as the sample mean, and use this value as a point estimate (single number estimate) of the corresponding population parameter. We then compute a margin of error for this estimate. We'll have more to say about the margin of error below: how to compute it, and what it means.

When finding the confidence interval for the mean, we can use the procedure in the box titled "Finding a Confidence Interval for the Population Mean", which follows. The box describes how to build a confidence interval for the mean when the population standard deviation σ is known (which is the situation in this demonstration) and when it is unknown (as in demonstration AB). Read that procedure now, then come back here and continue.
*****

I've implemented this box in a spreadsheet template called Confidence Interval for the Mean. You can find it on the website. You won't have this template available for exams, but it will allow you to check your work in homework, and will walk you through the required calculations step by step.

I haven't gone into the why of this box, but I think your book does an okay job of explaining it in Chapter 8. The basic idea is that the critical score tells you how many "standard deviations" you have to extend out from the population mean in order to pick up the fraction c of the sample means. The second term (σ/√n or s/√n) tells you how big one standard deviation is for the sampling distribution of the mean. (The actual value is σ/√n, but in real life, we rarely know σ, so we approximate with s/√n.) This quantity, s/√n, comes up a lot, and it's usually called the standard error. It's not a great name, but get used to it. Whenever you see a reference to the standard error of a statistic, it always means: "here's our approximation to how big one standard deviation is in the sampling distribution of this statistic".

One more comment before proceeding. As we've mentioned before, notation in statistics is not always universal. When referring to a critical z or t score, I'll always indicate it with an asterisk (z* or t*). This is just a way of letting you know that this particular score is a critical value. Although this usage is common, your book doesn't use it.

Finding a Confidence Interval for the Population Mean

What you need:
   The mean of your sample, x-bar
   The size of your sample, n
   Either the standard deviation of the population (σ), or the standard deviation of your sample (s)
   The confidence level that you desire, c. (95% is the most common.)

What you get: A confidence interval for the population mean at the specified confidence level.

1. The confidence interval for the population mean always looks like this: x-bar ± margin of error.
We abbreviate margin of error as MOE. If you know σ, the population standard deviation, go to step 2a. If you do not know the population standard deviation σ, then go to step 2b.

2a. You know σ. In order to compute the MOE, you'll need to find the critical z value, z*. Find z*. In Excel, z* = NORMSINV((1+c)/2). You can also find the value of z* for the most commonly used confidence levels by looking in the table below.

   For confidence level     Use z* =
   .90                      1.645
   .95                      1.960
   .99                      2.576

   MOE = z* σ/√n, so confidence interval = x-bar ± z* σ/√n
   Go on to step 3.

2b. You don't know σ. In order to compute MOE you'll need the critical t value, t*, and the sample standard deviation s. Find t*. In Excel, t* = TINV(1-c,n-1). Alternatively, you can use the book's explanation of how to obtain t* from Table E.3 at the back of your text. Find the sample standard deviation s, if it hasn't been given. You can do this with the Excel command =STDEV(data range), where data range is the set of sample values.

   confidence interval = x-bar ± t* s/√n
   Go on to step 3.

3. Verify that the assumptions of this technique are satisfied. You are okay if any of the following are true.
   The original population is normally distributed, or
   The data is roughly normally distributed and n > 10, or
   The data is roughly symmetric and n > 20, or
   The sample is large (n > 30 is usually enough)

Now let's solve the problem. Its whole discussion of probable error just says: The probable error is the margin of error for a 50% confidence interval. So we have:

   x-bar = 8.15
   σ = 2.5
   n = 25
   c = 0.50

I'll show the work in Excel, using my Confidence Interval for the Population Mean template. As always, the template includes in the blue box the formulas used to perform the calculation. Note that, for this problem, I typed in the values of σ, n, and x-bar. If you actually typed in the 25 TV tube lifetimes, Excel would have computed n and x-bar for you.
Confidence Interval for the Population Mean

   Inputs
   Population standard deviation, σ, if known    2.5     <== CELL MUST BE EMPTY IF SIGMA NOT KNOWN!
   Sample mean, x-bar                            8.15
   Sample size, n                                25
   Sample standard deviation, s                  (blank)

   Results                                       Value          Formula
   confidence level, c                           0.5
   sample mean, x-bar                            8.15           =AVERAGE(range)
   sample size, n                                25             =COUNT(range)
   sigma known, s not needed                     --
   critical z value, z*                          0.674490366    =NORMSINV((1+c)/2)
   standard error, SE                            0.5            =sigma/SQRT(n)
   margin of error, MOE                          0.337245183    =z* x SE
   lower confidence limit                        7.812754817    =x-bar - MOE
   upper confidence limit                        8.487245183    =x-bar + MOE

   Check: Population must be roughly symmetric--sample<30

So our answer is: The probable error in our 8.15 year estimate of average TV life is about 0.337 years, which is about 4 months. Since the sample is of 25 TV sets, this conclusion should be valid as long as the distribution of TV set lifetimes is roughly symmetric. Since the problem didn't give us the data, we can't conclude that for sure.

The only fishy thing in the work above is the calculation of the critical z, z*. Where does that (1+c)/2 come from? It may be easiest to understand by dealing with a particular confidence level, like the 0.5 level in this problem. We want a "central chunk" of the standard normal distribution; in this case, the middle 50% (or 0.50) of it. How far from the middle of the distribution are the endpoints of this chunk? Well, if 50% of the area is in this chunk, then 50% is outside of it, too. Since the chunk we're looking at is central, this means that 25% of the area is in each of the two tails. So I want to know what z values give me 25% of the area in each tail. The cutoffs are =NORMSINV(0.25) and =NORMSINV(0.75). Make sure you see why. These values are -0.67449 and +0.67449. Since the one value will always be the negative of the other (why?), we just compute the upper one.
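For readers who want to check the template's arithmetic outside Excel, here is a minimal Python sketch of step 2a applied to the TV tube problem. It is an illustration only, not part of the course materials: the function name z_interval is made up for this example, and NormalDist().inv_cdf from Python's standard library is used as a stand-in for NORMSINV.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, c):
    """Confidence interval for the mean when sigma is known (step 2a).

    Hypothetical helper for illustration; mirrors the template's formulas.
    """
    # Area to the left of z* is (1 + c)/2, as in =NORMSINV((1+c)/2)
    z_star = NormalDist().inv_cdf((1 + c) / 2)
    se = sigma / sqrt(n)          # standard deviation of the sample mean
    moe = z_star * se             # margin of error
    return z_star, moe, (x_bar - moe, x_bar + moe)


# Values from the TV tube problem: x-bar = 8.15, sigma = 2.5, n = 25, c = 0.50
z_star, moe, (low, high) = z_interval(x_bar=8.15, sigma=2.5, n=25, c=0.50)
print(f"z* = {z_star:.5f}")                   # template shows 0.674490366
print(f"probable error (MOE) = {moe:.4f}")    # template shows 0.337245183
print(f"50% interval: {low:.3f} to {high:.3f}")
```

The two cutoffs discussed above correspond to NormalDist().inv_cdf(0.25) and NormalDist().inv_cdf(0.75); the sketch computes only the upper one, since the lower is its negative.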
[Figure: standard normal density plotted against z. The central region between -z* and z* has area c; each tail has area (1 - c)/2.]

The graph above shows the more general situation. If we use a confidence level c, then we want the area in the central chunk to be c, so we want the area in the tails to total to 1 - c. Since there are two tails, each gets an area of (1 - c)/2. Now look how much of the curve's area lies to the left of z*: (1 - c)/2 for the lower tail and c for the central chunk. The total is (1 + c)/2, so the value of z* is given by =NORMSINV((1+c)/2). (Recall how NORMSINV works from Demonstration X.)

What this means: the whole idea of a confidence interval

It's important to understand what the confidence interval (or margin of error) tells you, and what it does not. Let me try to get this across to you by eavesdropping on a conversation between Bob, the TV Guy, and his friend Phil (also known as "Phil the Nitpicker"). Phil's TV is broken.

Phil: So my TV tube's shot, huh?
Bob: 'Fraid so. I'm gonna have to put in a new one.
Phil: How long will the new one last?
Bob: Well, it varies, of course.
Phil: Okay, I understand you can't tell me how long my new tube will last. But how long do TV tubes last on average?
Bob: Well, I'd say about 8.15 years.
Phil: Really?
Bob: Really. A buddy of mine has an electronics store. He ran 25 TV sets until their picture tubes went south, and he said that they lasted an average of 8.15 years. He likes figuring stuff like that.
Phil: You have weird friends, Bob.
Bob: (eyeing Phil) Tell me about it.
Phil: Still...that's only 25 sets. I mean, you can't say that the average for all TV tubes in the world is 8.15 years, just because it was 8.15 years for the sets your friend had.
Bob: Nope, Phil, you can't. That's why I said, "about 8.15 years". Hand me that screwdriver, willya?
Phil: (hands it over) Yeah. Yeah. (pauses) So when you say that tubes last on average about 8.15 years, you mean, what?
Like, a tube will last between, say, 8 and 8.3 years? I mean, I'll be good for 8 years anyway, huh?
Bob: (stopping work to look at Phil) Look, man, I have no idea what your tube will do. It could burn out in a week, a year, 10 years, or 50 years, for all I know. That's what warranties are for. Some tubes last a long time, some don't. I'm just saying that, on average, they last about 8.15 years. Okay?
Phil: Yeah, sure, sure. I understand. Sure. That's what I meant.

Bob, a bit fed up, returns to working on the set.

Phil: So, the average of your friend's tubes was 8.15 years, so you figure that the average for all the tubes is about 8.15 years. That makes sense.
Bob: Seems so to me.
Phil: So, what? You figure the average lifetime for all the picture tubes in the world is between 8 years and 8.3 years?
Bob: I reckon so. Probably.
Phil: Yeah, yeah. Me too. I reckon so. (pauses) What do you mean, "probably"?
Bob: (pulling his head out of the back of the TV) Look, Phil, I DON'T KNOW, okay? It's possible, I suppose, that my friend just happened to get a bunch of Super TV tubes in his sets! Or maybe they were particularly bad, just by chance! So maybe, just maybe, the actual average life of a picture tube is way off from 8.15 years! Maybe they only last 3 months on average, and everyone that you and I ever heard of has just been really lucky!
Phil: But you don't think so...
Bob: No, I don't think so. It could happen, but it's bloody unlikely! All right? I think it's pretty darned likely that the average for all picture tubes is between 8 and 8.3 years. I can't tell you exactly how likely it is that I'm right about this, but I never took statistics in TV repair school!
Phil: I was just asking for your opinion...
Bob: (putting down his tools, keeping himself under control) Okay. Here's my opinion. I know a guy with 25 sets, and his tubes lasted an average of 8.15 years. That part, I'm sure of.
From that, I'm guessing that the average lifetime of all of the TV tubes in the world is about 8.15 years. And, since you are so concerned about it, by "about 8.15 years", I mean "between 8 and 8.3 years". But maybe I'm wrong, and the average isn't between 8 and 8.3 years. I'm saying that I guess that there's about a 10% chance that I'm wrong, but that's a guess, too. Now, DO YOU WANT ME TO FIX YOUR SET, OR DON'T YOU?
Phil: Well, about how much is it going to cost?

As annoying as Phil is in the conversation above, he does have a few points. First, you can't be sure that a sample, even one taken with proper care to assure randomness, is representative of the population as a whole. Second, if you're going to tell me that the lifetime is "about 8.15 years", I need to know what you consider to be "about 8.15". Is 8 years "about" 8.15? How about 7 years? 12 years? Finally, even if you specify to me what you mean by "about", there's still the possibility that you're wrong: that the range that you gave me doesn't include the actual mean lifetime of TV tubes. (And, of course, even if you could tell me exactly how long tubes last on average, that still doesn't make any guarantees about my particular tube.)

So the consequence of all of this is: when you're going to estimate the value of the population mean, μ, by saying that it's "about as big as x-bar", you have to tell me two things: first off, what do you mean by "about", and secondly, how likely is it that you're right. And this is exactly what we do whenever we build a confidence interval. We use the sample mean, x-bar, as our guess for the population mean, μ. Our definition of "about" comes from our MOE (margin of error). Finally, our chance of being right in our estimate is what we call our confidence level.

So in the language of statistics, Bob was saying that he believed that the 90% confidence interval for mean tube life ran from 8 to 8.3, implying that the margin of error was 0.15 for this interval.
He's quite wrong, as the solution to the TV tube problem above shows. There is, in fact, only a 50% chance that the average life is between 7.81 and 8.49 years. If you computed the 90% confidence interval that Bob needs, you'd find it has a margin of error of 1.645 × 0.5 = 0.8225 years, or about 10 months. That is, there is about a 90% chance that the mean lifetime of all TV tubes is within 10 months of 8.15 years.

Meaning of "90% Confidence"

A very good way to think of what confidence means (say, "90% confidence") is this thought experiment. I have a barrel, filled with black marbles and white marbles, mixed thoroughly together. You know that 90% of the marbles in the barrel are white and 10% are black. Reach into the barrel and take a marble. Hold it tightly in your hand, and don't look at it. We would then say that you are 90% confident that the marble you took is white. Without looking at it, you don't know what it is, but picking a marble the way that you did will give you a white marble 90% of the time.

It's terribly common for students to completely misinterpret confidence intervals. Check out the interpretations below and make sure you understand their mistakes.

WHAT THE SOLUTION TO THE PROBLEM DOES NOT MEAN: The MOE was 0.337 years for a confidence level of 50%. This does not mean that 50% of all TVs have lifetimes within 0.337 years of 8.15. TV lifetimes are much more spread out than that. The confidence interval is a statement about the population mean: the average lifetime of a TV set.

WHAT THE SOLUTION TO THE PROBLEM DOES NOT MEAN: The MOE was 0.337 years for a confidence level of 50%. This does not mean that 50% of all samples of 25 TV sets will have means within 0.337 of 8.15 years. Again, we are using a sample to build a confidence interval for the population mean.

WHAT THE SOLUTION TO THE PROBLEM DOES NOT MEAN: The MOE was 0.337 years for a confidence level of 50%. This does not mean that the population mean will be within 0.337 years of 8.15 "50% of the time".
The mean lifetime of TV sets is a single number, and it either is in the confidence interval, or it isn't. After all, the marble in your hand in the thought experiment above isn't white 90% of the time and black 10% of the time, is it? It doesn't flash back and forth from color to color!

Sometimes people will say "there's a 50% chance that the population mean is within 0.337 years of 8.15". This is skirting the edges of acceptable language, and it can give a false impression. It's better to say that we're 50% confident that the average lifetime of a TV set is within 0.337 years of 8.15 years.

AB. Estimation of the Population Mean when σ unknown (Confidence intervals)

Problem: Henry Cavendish (who, by the way, was a real mad scientist) made 23 measurements of the density of the earth relative to the density of water. In so doing, he is sometimes credited as being "the man who weighed the earth". Assuming that his observations (provided below, in the DATA entries of the template) are independent measurements made from a normal distribution, does the 99% confidence interval include the value of 5.517 that is now accepted as the density of the earth?

We are given the observations in a sample and asked to compute from it alone a confidence interval for the mean of the population. To solve this, we're going to use the box "Finding a Confidence Interval for the Population Mean" from demonstration AA. We'll find s and use it to find the standard error (which is s/√n). We'll find the critical t value (which is =TINV(1-c, n-1)), then multiply the standard error by this t value to get the margin of error (MOE). This quantity is added to and subtracted from the sample mean (x-bar) to give the confidence interval.

Here's the work for this one, in Excel. Again, I've used my Excel template Confidence Interval for the Population Mean.
Confidence Interval for the Population Mean

   DATA: 5.1 5.27 5.29 5.29 5.3 5.34 5.34 5.36 5.39 5.42 5.44 5.46 5.47 5.53 5.57 5.58 5.62 5.63 5.65 5.68 5.75 5.79 5.85

   Population standard deviation, σ, if known    (blank)
   (Template note: Since you've provided data, I'll compute the values of x-bar, s and n. I'll ignore anything in the three input cells for them.)

   Results                                       Value          Formula
   confidence level, c                           0.99
   sample mean, x-bar                            5.483478261    =AVERAGE(range)
   sample size, n                                23             =COUNT(range)
   sample standard deviation, s                  0.190420795    =STDEV(range)
   critical t value, t*                          2.818760549    =TINV(1-c,n-1)
   standard error, SE                            0.03970548     =s/SQRT(n)
   margin of error, MOE                          0.111920242    =t* x SE
   lower confidence limit                        5.371558019    =x-bar - MOE
   upper confidence limit                        5.595398503    =x-bar + MOE

   Check: Population must be roughly symmetric--sample<30

So the 99% confidence interval runs from 5.37 to 5.60 times the density of water. This does indeed include 5.517. We were instructed to assume that the population is normal, so the assumptions in step 3 of the box "Finding a Confidence Interval for the Population Mean" are satisfied.

Two final points before we leave this problem. First, let's look at the formula for the critical t value, t*. Where does this come from? Well, unlike the other INV functions in Excel, TINV wants you to tell it the total area in the two tails. Don't ask me why. It means, though, that a confidence level of c requires that the "central chunk" of the t distribution have an area of c, which means that the total area in the two tails must be 1 - c. (See the graph of central and tail areas in demonstration AA if this doesn't make sense.) The n - 1 is the "degrees of freedom" for the t distribution, and we'll talk about what that means in class. For confidence interval work for one mean, it's always n - 1.

Okay, the last point: you can get Excel to do this work for you directly. If you're interested, I've provided instructions below. I'd encourage you to give it a try. So why don't we just let Excel do this work all the time? Four reasons.

1. Excel doesn't check assumptions. You must, or what Excel tells you could be garbage.
2. Excel requires that you provide all of the sample observations. You can't just tell it x-bar, s, and n, and let it go.
3. Excel assumes that you don't know σ. If you do, then it's proper to use the z-distribution, not the t-distribution. (Frankly, this isn't much of an objection. In real life, we almost never know sigma.)
4. It's important that you understand what you're doing. It's almost impossible to expand upon your knowledge of stats if you don't.

Having Excel Automatically Generate the Confidence Interval for the Mean of a Population from a Sample Alone

What you need: a randomly selected sample from a population. The sample must satisfy the requirements in step 3 of the procedure "Finding a Confidence Interval for the Population Mean".

What you get: a lot of information, including the confidence interval for the mean of the population.

Step 1: Enter your sample into Excel, either as a single row, or a single column.
Step 2: On Excel's Tools menu, choose Data Analysis. From the menu that appears, choose Descriptive Statistics. Click OK.
Step 3: Click in the box labeled Input Range, then highlight the numbers in your sample. The range identifying these cells should appear in the Input Range box.
Step 4: Check the boxes next to Summary Statistics and Confidence Level for Mean. In the Confidence Level For Mean box, put the desired confidence level. If you want a 99% confidence interval, type 99.
Step 5: Click OK. The top row of the output (labeled Mean) tells you x-bar, and the bottom row (labeled Confidence Level) tells you the MOE.
Step 6: To get the lower limit of the confidence interval, subtract the MOE from the mean. To get the upper limit of the confidence interval, add the MOE to the mean.
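As a cross-check outside Excel, the same interval for the Cavendish data can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the course's method: the helper names (t_pdf, t_central_area, t_critical) are invented for this example, and t* is found by numerically integrating the t density and bisecting, as a stand-in for TINV(1-c, n-1). If SciPy happens to be installed, scipy.stats.t.ppf((1 + c) / 2, n - 1) should give the same critical value.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_central_area(t, df, steps=2000):
    """Area under the t density between -t and t (Simpson's rule, steps even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3   # twice the area from 0 to t, by symmetry


def t_critical(c, df):
    """Bisection for t* whose central area is c (hypothetical stand-in for TINV)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_central_area(mid, df) < c:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Cavendish's 23 measurements, as listed in the problem
data = [5.1, 5.27, 5.29, 5.29, 5.3, 5.34, 5.34, 5.36, 5.39, 5.42, 5.44, 5.46,
        5.47, 5.53, 5.57, 5.58, 5.62, 5.63, 5.65, 5.68, 5.75, 5.79, 5.85]

n = len(data)
x_bar = mean(data)
s = stdev(data)                      # sample standard deviation, like =STDEV
t_star = t_critical(0.99, n - 1)     # n - 1 degrees of freedom
moe = t_star * s / sqrt(n)           # t* times the standard error
low, high = x_bar - moe, x_bar + moe

print(f"x-bar = {x_bar:.4f}, s = {s:.4f}, t* = {t_star:.4f}, MOE = {moe:.4f}")
print(f"99% interval: {low:.3f} to {high:.3f}; contains 5.517: {low < 5.517 < high}")
```

The numerical integration is only there to keep the sketch dependency-free; it is slower and less general than a library routine, but for this problem it should agree with the template's t* and MOE to several decimal places.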
   Mean                        5.483478
   Standard Error              0.039705
   Median                      5.46
   Mode                        5.29
   Standard Deviation          0.190421
   Sample Variance             0.03626
   Kurtosis                    -0.5514
   Skewness                    0.149761
   Range                       0.75
   Minimum                     5.1
   Maximum                     5.85
   Sum                         126.12
   Count                       23
   Confidence Level(99.0%)     0.11192

Here's the result of applying the box above to the data in the problem. As you can see, we get a mean of 5.483 and a MOE of 0.11192, just as we did earlier. Note the entries for standard deviation (s), standard error, and count (n).

AC. Estimation of Population Proportion (Confidence Interval for One Proportion)

Problem: After the confirmation hearing of Justice Clarence Thomas, a survey of 1300 members of the National Association for Female Executives revealed that all but 299 of them considered sexual harassment in the workplace to be a problem. Find the 95% confidence interval for the fraction of all female executives who consider sexual harassment in the workplace to be a problem.

The difference between this problem and the preceding ones is that the parameter of interest is a population proportion (π) rather than a population mean (μ). The work will be very similar to the work in demonstration Y, with this difference: the standard error of the proportion is not σ/√n, but is given by

   standard error of the proportion = √(p(1 - p)/n)

(NOTE: In my Excel templates, I refer to the sample proportion p as "p-hat". Different textbooks use different symbols for this quantity, and I wrote the templates for another book. Just ignore the "-hat" for this class.)

We'll do this problem with our Excel spreadsheet. The formulas used in the sheet are shown, also. Cell B3 contains the confidence level. B2 is p, D3 is the sample size, and B7 is the MOE. Note that proportion problems always use a z score, not a t score. This is the case even though we don't know the population standard deviation. I've created a template, Confidence Interval for the Population Proportion, for this kind of problem, but let's do it here from scratch, just for practice.
Spreadsheet values:

   Given Data
   p-hat =               0.77
   c =                   0.95         n =             1300

   Calculations                       Confidence Interval
   z* =                  1.95996      lower limit =   0.74712
   std. err. of p-hat =  0.01167      upper limit =   0.79288
   MOE =                 0.02288

   Checks
   # successes =         1001         # failures =    299
   At least 5 of each, technique okay.

Spreadsheet formulas:

   Given Data
   p-hat =               0.77
   c =                   0.95         n =             1300

   Calculations                                   Confidence Interval
   z* =                  =NORMSINV((1+B3)/2)      lower limit =   =B2-B7
   std. err. of p-hat =  =SQRT(B2*(1-B2)/D3)      upper limit =   =B2+B7
   MOE =                 =B5*B6

   Checks
   # successes =         =B2*D3                   # failures =    =D3-B9
   =IF(AND(B9>=5,D9>=5),"At least 5 of each, technique okay.","Insufficient successes or failures.")

So, assuming that the sample is randomly selected from the target population, we are 95% confident that the proportion of all female executives who find sexual harassment in the workplace a problem is between 74.7% and 79.3%.

Is it likely that this sample is random? No. First, the poll was taken of the membership of the National Association for Female Executives, based in New York. It is doubtful that this group is representative of female executives as a whole (the target population), but even if it were, we must ask how the poll was taken. Most likely it was either a convenience sample (for example, of women in the home chapter of NYC), or a self-selected sample (of members who, say, visited the organization's website and chose to vote). If the former is the case, then the sample will reflect the bias of NYC residents. If the latter is the case, then the harassment figure is probably inflated, since the people most eager to express their opinion are likely to be those with strong feelings on the matter.

The second bias is due to the timing of the poll. The issue of sexual harassment was saturating the media at that time, and emotions were running high on the Thomas/Hill debate. This could lead people to identify harassment as a problem who normally would "shrug it off".

Note the "Checks" box on my Excel sheet.
In order for this technique to be valid, we should have at least 5 successes and 5 failures in our sample. In this case, 23% of the 1300 votes were "failures": women who said "no". 0.23 × 1300 = 299 is obviously a lot bigger than 5, and the remaining 1001 women were "successes": they said "yes". Never use a statistical technique without checking its assumptions!

AD. Determination of Appropriate Sample Size (Proportion)

Problem: Ms. Goodman wishes to conduct a study to determine who controls the TV remote in American couples, the man or the woman. She will construct a 95% confidence interval for the proportion of men who control the remote, and wants her margin of error to be no more than 4.5%. How large a sample should she take?

This is a proportion problem, and the margin of error in the proportion formula is the product of the critical z score (z*) and the standard error of the proportion, √(p(1 - p)/n). The z score for 95% is just 1.960, as we saw in the table in step 2a of the box "Finding a Confidence Interval for the Population Mean". (It can also be computed by z* = NORMSINV((1+c)/2).) The MOE for this problem is to be 0.045. So for Ms. Goodman to get her reported answer, we'd need that

   0.045 = 1.960 × √(p(1 - p)/n)

Now divide both sides by 1.96 and square both sides:

   (0.045/1.96)² = p(1 - p)/n, or n = p(1 - p)/(0.045/1.96)²

The problem, of course, is that we don't know what p is! We haven't yet taken the sample! What do we do? Two approaches are often adopted, and we'll discuss them both.

(1) Cover your butt. If you look at the formula for n above, you'll find that it always takes on its biggest value when p is 0.5. We can feel confident, then, that our sample will be large enough if n is at least

   0.5(1 - 0.5)/(0.045/1.96)²

This is 474.3, so surveying 475 randomly selected couples should be sufficient.

(2) If you don't want to take a worst case estimate of p, your only alternative is to use a reasonable estimate for it.
Often this value is obtained by doing a rather small study and using the observed value of p in the equation for n above. For example, Ms. Goodman might do a preliminary survey of 50 couples and find that in 40 of those 50 couples, the man controlled the remote. Using p = 0.8 in the formula for n, then, would give 303.5, so Ms. Goodman might decide on a total survey of, say, 320 people. (The slight inflation of the value of n would be a good idea, since our value for p was estimated.)

Note that this second approach can also be used to estimate the required sample size when building a confidence interval for the population mean. We proceed in the same way, taking the expression for MOE and setting it equal to the MOE desired. For the mean, though, this formula is going to include σ or s, and there is no "worst case" value for these. In brief, you have to be able to make a reasonable guess of s or σ (or do a small preliminary study to find a reasonable guess), or you can't determine the required sample size for a certain MOE in a study of a confidence interval for the population mean.
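The two sample-size calculations for the proportion can be sketched in a few lines of Python. This is only an illustration of the formula n = p(1 - p)/(MOE/z*)² derived above; the function name required_n is made up for this example, and NormalDist().inv_cdf from the standard library stands in for NORMSINV.

```python
from math import ceil
from statistics import NormalDist


def required_n(moe, c, p=0.5):
    """Sample size so a proportion's margin of error is at most moe.

    Hypothetical helper: implements n = p(1 - p) / (moe / z*)^2.
    The default p = 0.5 is the worst case (largest n).
    """
    z_star = NormalDist().inv_cdf((1 + c) / 2)   # about 1.960 when c = 0.95
    return p * (1 - p) / (moe / z_star) ** 2


worst_case = required_n(moe=0.045, c=0.95)          # approach (1): p = 0.5
pilot_based = required_n(moe=0.045, c=0.95, p=0.8)  # approach (2): pilot p = 40/50

# Round up, since a fraction of a couple cannot be surveyed
print(f"worst case: n = {worst_case:.1f}, so survey at least {ceil(worst_case)}")
print(f"pilot p = 0.8: n = {pilot_based:.1f}, so survey at least {ceil(pilot_based)}")
```

Rounding up with ceil gives the smallest whole sample that meets the target; any extra padding beyond that (such as the 320 suggested above) is a judgment call to allow for the uncertainty in the pilot estimate of p.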