Chapter 9 - Estimation Using a Single Sample

The objective of inferential statistics is to use sample data to estimate some unknown characteristic of the corresponding population, such as a mean (μ) or a proportion (π). Two methods of accomplishing this are point estimation and interval estimation.

9.1 Point Estimation

Point Estimate: a single number, based on sample data, that represents a plausible value of a population characteristic.

When estimating a population proportion, the one obvious choice is the sample proportion p. When estimating a population mean, you can use the sample mean ($\bar{X}$), the sample median, or a trimmed mean. To decide among candidates, we use information from each statistic's sampling distribution and choose the one that tends to yield the most accurate estimate.

Unbiased statistic: a statistic whose mean value is equal to the value of the population characteristic being estimated. When given a choice of several unbiased statistics, use the one with the smallest standard deviation.

9.2 Large-Sample Confidence Interval for a Population Proportion

Confidence Interval (CI): a range of plausible values for an unknown population characteristic. It is constructed from sample data so that, with a chosen degree of confidence (the confidence level), the value of the characteristic will be captured between the lower and upper endpoints of the interval. The general form is statistic ± margin of error.

Confidence Level: the success rate of the method used to construct the confidence interval.

Note: this is confidence in the method used to construct the interval, not confidence in the specific interval.

Usual choices for confidence levels are:
o 90%, which has an area of .05 in each tail and a z* of 1.645
o 95%, which has an area of .025 in each tail and a z* of 1.96
o 99%, which has an area of .005 in each tail and a z* of 2.576

Ex.
If the method were used to generate an interval over and over with different samples, a 95% confidence level would mean that about 95% of the resulting intervals capture the true value of the characteristic being estimated.

As the confidence level increases, the confidence interval gets wider and the precision decreases.

Methods for Constructing Confidence Intervals

Large-Sample CI for a population proportion π: When n is sufficiently large, the statistic p is unbiased and has a sampling distribution that is approximately normal, with mean π and standard deviation $\sqrt{\pi(1-\pi)/n}$. Because π is unknown, p is used in its place when estimating the standard deviation, and the CI is

$$p \pm z^* \sqrt{\frac{p(1-p)}{n}}$$

Conditions:
o Simple random sample
o Approximately normal: np ≥ 10 and n(1 - p) ≥ 10
o n < 10% of the population

For example: for a 95% CI, find .975 in the table of standard normal (z) curve areas to get a z* value of 1.96. About 95% of the values of p are within 1.96 standard deviations of the mean π, so with 95% confidence π is within the interval

$$p - 1.96\sqrt{\frac{p(1-p)}{n}} \quad \text{to} \quad p + 1.96\sqrt{\frac{p(1-p)}{n}}, \quad \text{or} \quad p \pm 1.96\sqrt{\frac{p(1-p)}{n}}$$

For a 99% CI, you would find .995 in the table of standard normal curve areas and z* = 2.576 (often rounded to 2.58). For a 90% CI, you would find .95 in the table and z* = 1.645.

Standard Error of a statistic: the estimated standard deviation of the statistic.

Choosing a sample size

If the sampling distribution is approximately normal, then the bound on error of estimation (B) is 1.96 · (standard error of the statistic) for a 95% confidence interval.
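The large-sample interval for a proportion, and the "success rate of the method" reading of the confidence level, can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function names (`z_star`, `proportion_ci`, `coverage_rate`), the simulated population proportion, the sample size, and the random seed are all hypothetical choices, not part of the chapter.

```python
import math
import random
from statistics import NormalDist


def z_star(confidence: float) -> float:
    """z critical value that leaves (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def proportion_ci(successes: int, n: int, confidence: float = 0.95):
    """Large-sample CI for a proportion: p +/- z* * sqrt(p(1 - p) / n)."""
    p = successes / n
    # Normal-approximation condition from the notes.
    if n * p < 10 or n * (1 - p) < 10:
        raise ValueError("need np >= 10 and n(1 - p) >= 10")
    margin = z_star(confidence) * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


def coverage_rate(true_pi, n, confidence=0.95, trials=2000, seed=1):
    """Fraction of simulated intervals that capture true_pi (hypothetical setup)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw one random sample of size n and count the successes.
        successes = sum(rng.random() < true_pi for _ in range(n))
        low, high = proportion_ci(successes, n, confidence)
        hits += low <= true_pi <= high
    return hits / trials


if __name__ == "__main__":
    print(proportion_ci(160, 400))   # interval around p = 0.40
    print(coverage_rate(0.4, 200))   # should land near 0.95
```

The simulation repeats the interval construction on many samples from a population whose proportion is known, so the share of intervals that capture it can be compared with the nominal confidence level.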
The sample size (n) required to estimate a population proportion π to within an amount B with 95% confidence is

$$n = \pi(1-\pi)\left(\frac{1.96}{B}\right)^2$$

9.3 Confidence Interval for a Population Mean

The general formula for a confidence interval for a population mean (μ) when
o $\bar{X}$ is the sample mean from a random sample,
o the sampling distribution of $\bar{X}$ is approximately normal (stated as given, supported by a graph of the data, or n ≥ 30), and
o the population standard deviation (σ) is known
is

$$\bar{X} \pm z^* \frac{\sigma}{\sqrt{n}}$$

If σ is unknown, we must use the sample data to estimate σ with the sample standard deviation s. The result is a different standardized variable, denoted by t:

$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

This variable has more variability than z, so we must use t distributions.

t distributions are distinguished by their number of degrees of freedom (df) and have these properties:
o The t distribution corresponding to any fixed number of degrees of freedom is bell shaped and centered at zero (just like the standard normal (z) distribution).
o Each t distribution is more spread out than the standard normal (z) distribution.
o As the number of degrees of freedom increases, the spread of the t distribution decreases.
o As the number of degrees of freedom increases, the corresponding sequence of t distributions approaches the standard normal (z) distribution.

The probability distribution of the standardized variable $t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$ is the t distribution with degrees of freedom = n - 1.

A One-Sample t Confidence Interval for μ has the formula

$$\bar{X} \pm (t \text{ critical value}) \frac{s}{\sqrt{n}}$$

where $\bar{X}$ is the mean of a random sample, the population is normally distributed or n ≥ 30, and σ is unknown. The t critical value is based on df = n - 1.

The sample size required to estimate a population mean (μ) to within an amount B (the bound on error of estimation) with 95% confidence is

$$n = \left(\frac{1.96\,\sigma}{B}\right)^2$$

If σ is unknown, it may be estimated based on previous information or, for a population that is not "too skewed," by using range/4. If the desired confidence level is not 95%, then replace 1.96 with the appropriate z* value.
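The two sample-size formulas and the one-sample t interval can be sketched in Python as follows. The function names are hypothetical, and so is the default `pi_guess=0.5`: the notes do not say which value of π to plug in, and 0.5 is used here only because it gives the largest (most cautious) n. The t critical value is passed in as an argument, as if read from a t table, because the Python standard library has no t distribution.

```python
import math
from statistics import NormalDist, mean, stdev


def z_star(confidence: float = 0.95) -> float:
    """z critical value that leaves (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_for_proportion(bound, pi_guess=0.5, confidence=0.95):
    """n = pi(1 - pi)(z*/B)^2, rounded up to a whole observation."""
    # pi_guess is an assumed planning value, not something the notes specify.
    return math.ceil(pi_guess * (1 - pi_guess) * (z_star(confidence) / bound) ** 2)


def sample_size_for_mean(bound, sigma, confidence=0.95):
    """n = (z* * sigma / B)^2, rounded up to a whole observation."""
    return math.ceil((z_star(confidence) * sigma / bound) ** 2)


def t_interval(data, t_critical):
    """One-sample t interval: mean +/- t* * s / sqrt(n)."""
    n = len(data)
    margin = t_critical * stdev(data) / math.sqrt(n)  # stdev uses n - 1
    center = mean(data)
    return center - margin, center + margin


if __name__ == "__main__":
    print(sample_size_for_proportion(0.05))      # planning value pi = 0.5
    print(sample_size_for_mean(1, sigma=5))      # sigma assumed known
    # Hypothetical data; 2.776 is the 95% t critical value for df = 4.
    print(t_interval([10, 12, 14, 16, 18], 2.776))
```

Rounding up rather than to the nearest integer keeps the bound B from being exceeded. When σ is not known, an estimate such as range/4 could be passed as `sigma`.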
General rule of thumb for using t procedures:
o If n < 30, the sampling distribution might have more variability and skewness.
o If 15 ≤ n < 30, t procedures can be used if there are no extreme outliers.
o If n < 15, include "proceed with caution"; the distribution must be symmetric to use t procedures.

9.4 Interpreting and Communicating the Results of Statistical Analyses

Interpretation of Confidence Interval: We can be 90% confident that the true proportion is between ___ and ___.

Interpretation of Confidence Level: We have used a method to produce this estimate that successfully captures the true proportion 90% of the time.

A wide confidence interval indicates that we don't have very precise information about the population characteristic being estimated. The width of a confidence interval is affected by the confidence level, the sample size, and the standard deviation of the statistic used. The best strategy for decreasing the width of a confidence interval is to take a larger sample.

Additional Notes

Statement about a Confidence Interval: We are ____% confident that the true mean/proportion of [context] lies within the interval from _____ to ______.

To make the margin of error smaller:
o Make z* smaller by lowering the confidence level.
o Increase the sample size n (to cut the margin of error in half, n must be 4 times larger).
o Make σ smaller (this can't really be changed, but a different statistic with a smaller standard deviation can be used).
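The three ways of shrinking the margin of error listed above can be checked numerically. The sketch below assumes the known-σ margin z* · σ/√n for a mean; the function name and the values σ = 10 and n = 100 are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def margin_of_error(sigma, n, confidence=0.95):
    """Margin of error z* * sigma / sqrt(n) for a mean with known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)


if __name__ == "__main__":
    base = margin_of_error(sigma=10, n=100)
    print("baseline:            ", round(base, 3))
    # Lower confidence level -> smaller z* -> smaller margin.
    print("90% confidence:      ", round(margin_of_error(10, 100, 0.90), 3))
    # Four times the sample size -> half the margin.
    print("n = 400:             ", round(margin_of_error(10, 400), 3))
    # Smaller standard deviation -> proportionally smaller margin.
    print("sigma = 5:           ", round(margin_of_error(5, 100), 3))
```

Because n enters through a square root, halving the margin takes four times the data, which is why lowering the confidence level is cheaper but gives up some of the method's success rate.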