Chapter 7.1 CONFIDENCE INTERVALS AND SAMPLE SIZE WHEN \(\sigma\) IS KNOWN

Estimation
• One important aspect of inferential statistics is estimation: the process of estimating the value of a population parameter using information obtained from a sample.
• The accuracy of an estimate depends on several key factors:
  • Sample size
  • Whether the observations are chosen randomly
  • Other assumptions that we will explore
In this chapter, statistical procedures for estimating the population mean, variance, and standard deviation will be explored.

What We Will Cover
• Confidence intervals for the population mean when \(\sigma\) is known
• Determining sample size
• Confidence intervals for the population mean when \(\sigma\) is unknown

Point Estimates
• A point estimate is a single numerical value used as an estimate of a parameter.
• The best point estimate of the population mean \(\mu\) is the sample mean \(\bar{x}\).
• Sample measures (i.e., statistics) that are used to estimate population measures (i.e., parameters) are also called estimators.
• Statisticians are always looking for "good" estimators.

Properties of a Good Estimator
• Point estimators are considered "good" if they demonstrate the following properties:
  • Unbiasedness
  • Efficiency (relative efficiency)
  • Consistency

Unbiasedness
• An estimator is unbiased if the expected value (or mean) of the estimates obtained from samples of a given size is equal to the parameter being estimated.
• \(\theta\) = the population parameter of interest
• \(\hat{\theta}\) = the point estimator of \(\theta\)
• The sample statistic \(\hat{\theta}\) is an unbiased estimator of the population parameter \(\theta\) if
\[ E(\hat{\theta}) = \theta \]

Efficiency
• Assume that a simple random sample of n elements can be used to provide two unbiased estimators of the same population parameter, say \(\hat{\theta}_1\) and \(\hat{\theta}_2\).
• The point estimator \(\hat{\theta}_1\) is said to be more efficient relative to \(\hat{\theta}_2\) if:
\[ \mathrm{Var}(\hat{\theta}_1) < \mathrm{Var}(\hat{\theta}_2) \]
• In other words, the point estimator with the smaller variance, and hence the smaller standard deviation, tends to provide estimates that are closer to the true population parameter.

Consistency
• A point estimator is consistent if its values tend to become closer to the true population parameter as the sample size becomes larger.
• As the sample size n approaches the population size N, the estimate should move closer to the population parameter.

Interval Estimation
• Interval estimate: an interval or range of values used to estimate the parameter.
• For example, we might say that the mean height of males falls within 3 inches of 70 inches:
\[ 67 \le \mu \le 73, \quad \text{or} \quad 70 \pm 3 \]
• However, the interval estimate may or may not contain the value of the true underlying population parameter.
• A degree of confidence must be assigned before creating an interval estimate.
• Confidence level: the probability that the interval estimate will contain the true population parameter given repeated sampling.
  • This is often expressed as a percentage.
  • For example, one might say they are 99% confident that the interval estimate 70 ± 3 contains the true population mean \(\mu\).
  • This is interpreted as: under repeated sampling, intervals constructed this way contain the true population mean with probability 0.99.
• Confidence interval (CI): a specific interval estimate of a parameter determined by using data obtained from a sample and by using the specific confidence level of the estimate.
• When an interval estimate is made and a confidence level is assigned, we have a confidence interval.
• The most common confidence levels used for confidence intervals are 90%, 95%, and 99%.

Margin of Error
• A point estimator cannot be expected to provide the exact value of the population parameter.
• An interval estimate can be formed by adding a margin of error to, and subtracting it from, the point estimate:
\[ \bar{x} \pm \text{margin of error} \]
• Margin of error (maximum error of the estimate): the maximum likely difference between the point estimator of a parameter and the actual value of the population parameter.
• Its purpose is to provide information about how close the point estimate is to the population parameter.

Assumptions for CI when \(\sigma\) is Known
• The sample must be a random sample.
  • All of the sample points have an equal chance of being selected.
• In most applications, a sample size of \(n \ge 30\) is adequate.
  • If \(n < 30\), then the population must be normally distributed.
• If the population distribution is highly skewed or contains outliers, then a sample size of 50 or more is recommended.
• In practice, get as much data as you can; more is better.

Interval Estimate of a Population Mean when \(\sigma\) is Known
• In order to develop an interval estimate of a population mean, the margin of error must be computed using either:
  • The population standard deviation, \(\sigma\)
  • The sample standard deviation, s
• \(\sigma\) is rarely known exactly, but often a good estimate can be obtained from comprehensive historical data.
• This is typically what we are referring to when we say that \(\sigma\) is known.
Interval Estimate of a Population Mean when \(\sigma\) is Known
• Interval estimate of \(\mu\):
\[ \bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \]
where
  • \(\bar{x}\) = the sample mean
  • \(1 - \alpha\) = the confidence coefficient
  • \(z_{\alpha/2}\) = the z value providing an area of \(\alpha/2\) in the upper tail of the standard normal distribution
  • \(\sigma\) = the population standard deviation
  • \(n\) = the sample size
• Values of \(z_{\alpha/2}\) for the most commonly used confidence levels are:

Confidence Level   \(\alpha\)   \(\alpha/2\)   Table Lookup Area   \(z_{\alpha/2}\)
90%                0.10         0.05           0.950               1.65
95%                0.05         0.025          0.975               1.96
99%                0.01         0.005          0.995               2.58

Meaning of Confidence
• Suppose we choose \(\alpha = 0.1\) and construct intervals using \(\bar{x} \pm 1.65 \cdot \frac{\sigma}{\sqrt{n}}\).
• We can say that 90% of the intervals constructed this way under repeated sampling will contain the true population mean.
• We say that such an interval has been established at the 90% confidence level.
• The value 0.9 is referred to as the confidence coefficient.

When \(\sigma\) is Known: Example 1
• Discount Sounds has 260 retail outlets throughout the United States. The firm is evaluating a potential location for a new outlet based, in part, on the mean annual income of the individuals in the marketing area of the new location.
• A sample of size n = 36 was taken, and the sample mean income is $41,100. The population is not believed to be highly skewed. The population standard deviation is known to be $4,500, and the confidence coefficient to be used in the interval estimate is 0.95.
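As a rough check on the interval formula, the following Python sketch applies it to the Example 1 figures stated above (sample mean $41,100, \(\sigma\) = $4,500, n = 36, confidence coefficient 0.95). The function names `z_critical` and `z_interval` are illustrative and not part of the original slides. The critical value is taken from the standard library's `statistics.NormalDist`, so it is about 1.95996 and not the rounded table value 1.96.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Return z_{alpha/2} for a two-sided interval (illustrative helper)."""
    alpha = 1 - confidence
    # Area to the left of z_{alpha/2} under the standard normal curve.
    return NormalDist().inv_cdf(1 - alpha / 2)


def z_interval(xbar, sigma, n, confidence=0.95):
    """Return (lower, upper, margin) for the mean when sigma is known."""
    margin = z_critical(confidence) * sigma / sqrt(n)
    return xbar - margin, xbar + margin, margin


# Example 1 figures from the slides: xbar = 41,100, sigma = 4,500, n = 36.
low, high, margin = z_interval(41_100, 4_500, 36, confidence=0.95)
print(round(margin), round(low), round(high))  # 1470 39630 42570
```

At the 90% and 99% levels the exact critical values (about 1.645 and 2.576) differ slightly from the rounded table entries 1.65 and 2.58, so margins computed this way will not match a hand calculation to the last digit.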
When \(\sigma\) is Known: Example 1
• First, find the margin of error given a confidence coefficient of 0.95:
  • \(z_{\alpha/2} = 1.96\)
  • \(n = 36\)
  • \(\sigma\) = $4,500
\[ z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} = 1.96 \cdot \frac{4{,}500}{\sqrt{36}} = 1{,}470 \]
• Using the calculated margin of error of $1,470, we can construct the interval estimate of \(\mu\):
\[ \bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \]
  • $41,100 ± $1,470
  • ($39,630, $42,570)
• We can say that we are 95% confident that the interval contains the true population mean \(\mu\).
• We can also do this for varying levels of confidence:

Confidence Level   Margin of Error   Interval Estimate
90%                $1,237.50         ($39,862.50, $42,337.50)
95%                $1,470.00         ($39,630.00, $42,570.00)
99%                $1,935.00         ($39,165.00, $43,035.00)

• Notice that in order to have a higher degree of confidence, the margin of error, and thus the width of the confidence interval, must be larger.

Confidence Intervals: Graphically (figure omitted)

When \(\sigma\) is Known: Example 2
• A researcher wishes to estimate the number of days it takes an automobile dealer to sell a Kia Forte.
• A random sample of 50 cars had a mean time on the dealer's lot of 54 days.
• Assume the population standard deviation to be 6.0 days.
• Find the best point estimate of the population mean and the 95% confidence interval of the population mean.
• The best point estimate of the population mean is the sample mean, \(\bar{x} = 54\) days.
• With \(\bar{x} = 54\), \(z_{\alpha/2} = 1.96\), \(\sigma = 6\), and \(n = 50\):
\[ 54 \pm 1.96 \cdot \frac{6}{\sqrt{50}} = 54 \pm 1.7 \]
• The confidence interval is (52.3, 55.7).
• With 95% confidence, we can say that \(52.3 < \mu < 55.7\).

Interval Estimate of a Population Mean when \(\sigma\) is Known
• Sometimes confidence levels other than 90%, 95%, and 99% are used.
• We may then need to calculate other values of \(z_{\alpha/2}\).
• The value of \(\alpha\) represents the total area in both tails of the distribution.
• \(\alpha\) is found by subtracting the desired confidence level (as a decimal) from 1.
• For example, for a confidence level of 98%, we take \(\alpha = 1 - 0.98 = 0.02\).
• Then \(\alpha/2 = 0.01\).
• Subtract this value from 1 to get the cumulative area to look up in the table: \(1 - 0.01 = 0.99\).
• The closest z-score is 2.33.
• So the interval is:
\[ \bar{x} \pm 2.33 \cdot \frac{\sigma}{\sqrt{n}} \]

Sample Size
• The size of the sample is very important in statistical estimation.
• How large must the sample be to obtain an accurate estimate?
• The answer depends on three main factors:
  • The margin of error
  • The population standard deviation
  • The degree of confidence

Determining Sample Size
• If a desired margin of error is selected prior to sampling, the sample size necessary to satisfy it can be determined by rearranging the equation for the margin of error.
• Let E = the desired margin of error:
\[ E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \]
• Solving for n gives the necessary sample size for a given margin of error:
\[ n = \left( \frac{z_{\alpha/2} \cdot \sigma}{E} \right)^2 \]
• If the result is not a whole number, round up to the next whole number.

Example 1: Determining Sample Size
• Discount Sounds is evaluating a potential location for a new retail outlet based, in part, on the mean annual income of the individuals in the marketing area of the new location.
• Suppose that the Discount Sounds management team wants an estimate of the population mean such that there is a 0.95 probability that the sampling error is $500 or less.
• How large a sample is needed to meet the required precision?
• We want to find n such that the margin of error equals 500:
\[ 500 = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \]
• At 95% confidence, \(z_{\alpha/2} = z_{0.025} = 1.96\). Recall that \(\sigma = 4{,}500\).
\[ n = \left( \frac{z_{\alpha/2} \cdot \sigma}{E} \right)^2 = \left( \frac{1.96 \cdot 4{,}500}{500} \right)^2 = 311.17 \]
• Rounding up, a sample size of 312 is needed to reach the desired precision.
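The two calculations above (looking up \(z_{\alpha/2}\) for a less common confidence level, and solving for n) can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the names `z_critical` and `required_sample_size` are made up for this sketch, and the critical value is computed with the standard library's `statistics.NormalDist` instead of being read from a table.

```python
from math import ceil
from statistics import NormalDist


def z_critical(confidence):
    """Return z_{alpha/2} for the given confidence level (illustrative helper)."""
    alpha = 1 - confidence
    # Look up the z value whose cumulative area is 1 - alpha/2.
    return NormalDist().inv_cdf(1 - alpha / 2)


def required_sample_size(sigma, margin, confidence):
    """Smallest whole n with z_{alpha/2} * sigma / sqrt(n) <= margin."""
    z = z_critical(confidence)
    # n = (z * sigma / E)^2, rounded up to the next whole number.
    return ceil((z * sigma / margin) ** 2)


print(round(z_critical(0.98), 2))              # 2.33
print(required_sample_size(4_500, 500, 0.95))  # 312
```

Rounding up with `ceil` matters: rounding 311.17 down to 311 would leave the margin of error slightly above the $500 target.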
Example 2: Determining Sample Size
• A sociologist wishes to estimate the average number of automobile thefts per day in a large city to within 2 automobiles.
• He wishes to be 99% confident, and from a previous study the standard deviation was found to be 4.2.
• How many days should he select to survey?
• First, what key information do we know?
  • \(\alpha = 0.01\)
  • \(z_{\alpha/2} = 2.58\)
  • \(E = 2\)
  • \(\sigma = 4.2\)
\[ n = \left( \frac{z_{\alpha/2} \cdot \sigma}{E} \right)^2 = \left( \frac{2.58 \cdot 4.2}{2} \right)^2 = 29.35 \]
• Rounding up, the sociologist should survey 30 or more days to achieve the desired accuracy.

Determining Sample Size
• The necessary sample size equation requires a value for the population standard deviation \(\sigma\).
• If \(\sigma\) is unknown, a preliminary or planning value for \(\sigma\) can be used in the equation:
  • Use the estimate of the population standard deviation computed in a previous study.
  • Conduct a pilot study and use the sample standard deviation from that study.
  • Use judgement or a best guess for the value of \(\sigma\).
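Because the required sample size depends on the square of the planning value, a rough guess for \(\sigma\) can change n noticeably. The Python sketch below illustrates this using the Example 2 settings (E = 2, table value \(z_{\alpha/2} = 2.58\)). Only \(\sigma = 4.2\) comes from the slides; the planning values 3.5 and 5.0 are hypothetical, chosen purely for illustration, and the function name `required_sample_size` is likewise an assumption.

```python
from math import ceil

Z_99 = 2.58  # rounded table value for 99% confidence, as used in the slides
E = 2        # desired margin of error (automobile thefts per day)


def required_sample_size(sigma, margin, z):
    """n = (z * sigma / E)^2, rounded up to the next whole number."""
    return ceil((z * sigma / margin) ** 2)


# 4.2 is the value from Example 2; 3.5 and 5.0 are hypothetical planning values.
for sigma in (3.5, 4.2, 5.0):
    print(sigma, required_sample_size(sigma, E, Z_99))
# 3.5 21
# 4.2 30
# 5.0 42
```

When the planning value is uncertain, choosing it toward the high end of the plausible range gives a larger, more conservative sample size.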