10/31/08 132 × 185 Row total × Column total = = 106.6 229 Overall total Oct. 31 Statistic for the day: Per capita consumption of candy by Americans in 2007: 24.5 pounds Rule of sample proportions (p. 359) IF: 73 × 110 218 = 63.6of interest 1. There is a population proportion 2. We have a random sample from the 2population (Observed − Expected) 3. The sample is large enough so that we will see at least five Expected of both possible outcomes THEN: 1 √ size are taken and the sample margin of error If numerous samples of the=same sample size proportion is computed every time, the resulting histogram will: 1 1 margin of error = √ 1. be roughly bell-shaped1007 = 31.7 = 0.032 364population × · · · × proportion 326 2. have mean equal 365 to the×true P (No Match) = Assignment: Read Chapter 20 Exercises pp. 382-383: 1, 2, 4, 6 365 ×equal 365 × 3. have standard deviation to:· · · × 365 ! " " " " " # = 0.109 (population proportion) × (1 − population proportion) sample size ! " " " # standard deviation = " (.56) × (.44) = 0.16 1007 1 Sample means: quantitative variables Suppose we want to estimate the mean weight at PSU What is the uncertainty in the mean? We need a margin of error for the mean. Suppose we take another sample of 237. What will the mean be? Will it be 152.5 again? Probably not. Consider what would happen if we took 1000 samples, each of size 237, and computed 1000 means. Data from stat 100 survey, spring 2004. Sample size 237. Mean value is 152.5 pounds. Standard deviation is about (240 – 100)/4 = 35 Hypothetical result, using a “population” that resembles our sample: Extremely interesting: The histogram of means is bell-shaped, even though the original population was skewed! Formula for estimating the standard deviation of the sample mean (don’t need histogram) Just like in the case of proportions, we would like to have a simple formula to find the standard deviation of the mean without having to resample a lot of times. Suppose we have the standard deviation of the original sample. Then the standard deviation of the sample mean is: Standard deviation is about (157 – 148)/4 = 9/4 = 2.25 1 10/31/08 Example: SAT math scores So in our example of weights: The standard deviation of the sample is about 35. Hence by our formula: Standard deviation of the mean is 35 divided by the square root of 237: 35/15.4 = 2.3 (We estimated 2.25) So the margin of error of the sample mean is 2×2.3 = 4.6 Report 152.5 ± 4.6 (Same as 147.9 to 157.1) A sample of 100 SAT math scores with a mean of 540 would be very unusual. A sample of 100 with a mean of 510 would not be unusual. Suppose nationally we know that the SAT math test has a mean of 100 points and a standard deviation of 100 points. Draw by hand a picture of what you expect the distribution of sample means based on samples of size 100 to look like. Sample means have a normal distribution mean 500 standard deviation 100/10 = 10 So draw a bell shaped curve, centered at 500, with 95% of the bell between 500 – 20 = 480 and 500 + 20 = 520 132 × 185 Row total × Column total = = 106.6 229 Overall total 73 × 110 = 63.6 Rule of sample 218 means (p. 363) (Observed − Expected) IF: 1. The population of measurements of interest is2bell-shaped, OR Expected 2. A large sample (at least 30) is taken. 1 margin of error = √ sample size THEN: If numerous samples of the same size are taken and the sample mean is computed every time, the resulting histogram will: 1 1 margin of error = √ = = 0.032 1. be roughly bell-shaped 1007 31.7 2. have mean equal to the true population mean 365 × 364 × · · · × 326 P (No Match) = = 0.109 3. have standard deviation estimated by:365 × · · · × 365 365 × sample standard deviation √ sample size ! " " " " " " " # 132 × 185 Row total × Column total = = 106.6 (population ×total (1 − population proportion) 229 proportion) Overall sample size 73 × 110 1 218 = 63.6 (Observed − Expected)2 Expected Back to proportions: Suppose the true proportion is known margin of error = √ 1 sample size Back to proportions (cont’d): 1 1 margin of error = √ = = 0.032 31.7 Suppose the true proportion 1007 is known Recall that in craps, P(win) = 244/495 = 0.493 for each game. 365 × 364 × · · · × 326 Recall that in craps, P(win) = 244/495 P (No Match) = = 0.493 for each game. = 0.109 Q: If you play 100 games of craps, what proportion will you win? sample standard deviation The RoSP says that the sampling distribution √ of the proportion of wins sample size A: Dunno. Q: Okay, I guess that was obvious. But can we at least give a range of possible proportions that will be valid most (say, 95%) of the time? A: Sure. Use the Rule of Sample Proportions. 365 × 365 × · · · × 365 in 100 games will be: • • • ! " " " (population proportion) × (1 − population proportion) " # roughly normal, sample size with mean 0.493 ! " and standard deviation " " (.493) × (.507) # 100 = 0.05. 1 Therefore, the 68-95-99.7 rule says that 95% of the time, the proportion of wins in 100 games will be between 0.493−2×0.05 and 0.493+2×0.05. (0.393 and 0.593) 2 10/31/08 True proportion unknown Next, suppose we do not know the true population proportion value. This is far more common in reality! How can we use information from the sample to estimate the true population proportion? Suppose we have a sample of 200 students in STAT 100 and find that 28 of them are left handed. Our sample proportion is: We can now estimate the standard deviation of the sample proportion based on a sample of size 200: ! " " " " # (.14) × (1 − .14) = 0.025 200 (94 − 106.6)2 (38 − 25.4)2 (91 − 78.4)2 (6 − 18.6)2 + + + = 18.3 106.6 25.4 78.4 18.6 2 Hence, 2 standard (73 deviations − 62.4) = 2×.025 = .05 = 1.80 62.4 (53 − 63.6)2 = 1.77 63.6 0.14 (35 − 45.6)2 = 2.46 On the following two slides, we’ll pretend 45.6 that the true population proportion is 0.12. (57 − 46.4)2 = 2.42 46.4 P (no match with MY birthday) = P (no match at all) ≈ Note the green curve, which is the truth. 364 49 = 0.874 365 364 1225 = 0.035 365 2 Of course, ordinarily we don’t know where it lies, but at least we know its standard deviation. Thus, we can build a confidence interval around our 14% estimate (in red). If we repeat the sampling over and over, 95% of our confidence intervals will contain the true proportion of 0.12. This is why we use the term “95% confidence interval”. If we take another sample, the red line will move but the green curve will not! Definition of “95%132confidence interval fortotal the= 106.6 true × 185 Row total × Column = 229 Overall total population proportion”: 73 × 110 = 63.6 218 the sample that is almost An interval of values computed from 2 certain (95% certain in (Observed this case) to cover the true but − Expected) Expected unknown population proportion. margin of error = √ The plan: margin of error = √ 1 sample size 1 1 = = 0.032 1007 31.7 1. Take a sampleP (No Match) = 365 × 364 × · · · × 326 = 0.109 365 × 365 × · · · × 365 2. Compute the sample proportion sample standard deviation √ sample size 3. Compute the estimate of the standard deviation of the sample proportion: (proportion) × (1 − proportion) ! " " " " # sample size 4. 95% confidence interval(.14) for×the true population proportion: (1 − .14) = 0.025 sample proportion ± 2×SD 200 ! " " " # Other confidence coefficients The confidence intervals we’ve been constructing (using ±2×stdev) are called 95% confidence intervals. The confidence coefficient is 95%, or .95. It means that we cover the middle 95% of the normal curve associated with sample proportions, and this requires 2 standard deviations. We can change the confidence coefficient by using the normal table (p. 157) to determine the appropriate number of standard deviations. 1 3 10/31/08 Other confidence coefficients: An example Suppose we want a 90% confidence interval instead of 95%. How many standard deviations span the middle 90% of the normal curve? 90% confidence interval (see p. 157) Since 90% is in the middle, there is 5% in either end. So find z for .05 and z for .95. We get z = ±1.64 90% confidence interval: sample proportion ± 1.64×stdev Back to the original example of sample size 200, 14% lefthanded in sample. We found the sample stdev to be .025. Hence, to construct a 90% CI we have .140 – 1.64×.025 = .140 − .041 = .099 .140 + 1.64×.025 = .140 + .041 = .181 90% confidence interval: .099 to .181 (95% confidence interval was: .090 to .190) So the 90% CI is shorter (a more precise estimate) but less accurate. It has a 10% chance of missing the true population proportion. 4