Oller's Brief on: Confidence Intervals

CONCEPTS AND OBJECTIVES
1. Identifying and creating confidence intervals.
2. How to craft an interval when σ is known.
3. How to craft an interval when σ is unknown.
4. How to select the appropriate sample size given a desired margin of error.
5. Using Excel to construct confidence intervals.

A. BACKGROUND INFORMATION
We know from previous exercises:
1. A sample of size n is drawn from a population.
2. If you sum the values of the sample and divide by n, you will obtain a sample mean, $\bar{x}$. That sample mean is a point estimate of the population mean μ.
3. Our expectation is that the sample mean will equal the population mean: $E[\bar{x}] = \mu$. However, we know that this expectation will not likely be realized exactly in any one sample. This leads to our sampling error, which is $\bar{x} - \mu$.
4. Additionally, we saw that if we took many samples from a population, the sample means and sampling errors will take the form of an (approximately) normal distribution in two instances.
a. If the sample size is greater than or equal to 30.
b. If the population is normally distributed.
This distribution has a standard deviation of these sample means or sampling errors. This standard deviation is also known as the standard error and is denoted as $\sigma_{\bar{x}}$. This should not be confused with the standard deviation of the population, σ. Rather than construct numerous samples and derive a standard error, we can use the commonly used property that
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
Using Excel to calculate the standard error- Type =(value for the population standard deviation/sqrt(sample size)).
5. We know that we can construct a z-score for any value $\bar{x}$ that comes from a normally distributed set of sample means.
$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$$

B. THE CONFIDENCE INTERVAL
Rationale: $\bar{x}$ is merely a point estimate of μ. We can learn more about the quality of our sample mean by constructing a confidence interval.
1. Confidence Interval- The confidence interval is equal to $\bar{x} \pm$ Margin of Error.
2. Confidence Level- The confidence level is the percentage of confidence intervals, constructed from repeated samples, that will contain the population mean μ. E.g. 90%
3. Confidence coefficient- The confidence level expressed as a proportion. E.g. .9
4. The level of significance- The complement to the confidence coefficient. α = Level of Significance = 1 - confidence coefficient. This is the probability that an interval estimation procedure will yield an interval that does not include μ.
To estimate the confidence interval, we have to evaluate our particular circumstance.
Case 1- σ is known
Case 2- σ is unknown
Case 3- n<30 and the population is not normally distributed.

C. CASE 1, σ KNOWN
First, the sample size needs to be large or the population needs to be normally distributed, such that we can infer that our sampling distribution is normal. With a normal sampling distribution and a known population standard deviation, we can use the standard normal distribution.
The CI = $\bar{x} \pm$ Margin of Error. The margin of error equals the z-score at a given level of significance multiplied by the standard error.
$$CI = \bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
Where:
CI = the confidence interval
$\bar{x}$ = the point estimate (sample mean)
$\sigma/\sqrt{n}$ = the standard error
α% = the error we are willing to accept
(1 - α)% = the confidence level (confidence coeff.)
$z_{\alpha/2}$ = the critical value of Z: the value corresponding to an area (probability) of α/2 in each of the lower and upper tails.
This equation means that a proportion α/2 of the values will be in the upper tail and α/2 in the lower tail. To find the critical z value, we need to look up 1-α/2 in the body of the Z chart and find the corresponding Z score. Why use 1-α/2 instead of 1-α?

[Figure: Standard Normal Distribution Confidence Interval. Vertical axis: F(x); horizontal axis: Score, from -3 to 3; a tail area of α/2 is marked.]

Observations:
1. The higher the confidence level, the wider the CI.
2. $z_{\alpha/2} \cdot \sigma/\sqrt{n}$ is the maximum sampling error, i.e. the margin of error.

Suppose: $\bar{x} = 125$, σ = 24, n = 36.
Find (construct) a 90% CI for the population mean.
To find the critical z value, you need to look up the relevant probability from within the table. Use 1-α/2: α = 1 - .9 = .1, and .1/2 = .05. So, look up 1 - .05 = .95, which gives Z = 1.65 (more precisely, 1.645).
In Excel, find the Z score by typing =normsinv(1-α/2).
Find the margin of error by typing =(Z score result*(σ/sqrt(n))).
You can find the lower boundary of the confidence interval by typing =($\bar{x}$ - the margin of error).
You can find the upper boundary of the confidence interval by typing =($\bar{x}$ + the margin of error).
How would you interpret this result?
Construct a 99% CI for the population mean.

D. CASE 2, σ UNKNOWN
In this case, we will still operate under the condition that the sampling distribution is normal (the sample size is large or the population is normally distributed); however, you cannot use σ to find your standard error. You must use S as a proxy. Recall that
$$S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
a. In Excel, you just need to enter =stdev(sample range).
If σ is unknown, then we cannot use the z distribution because $\frac{\bar{x} - \mu}{S/\sqrt{n}}$ is no longer a Standard Normal random variable. This statistic does not follow the Z distribution anymore, but it does have a particular distribution called the t-distribution.
The t-distribution is:
1. A continuous distribution
2. Bell-shaped and symmetrical about zero
3. More spread out (flatter) than the Standard Normal Distribution; however, as n increases, the t-distribution approaches the z-distribution.
To use the t-table (page 329) of your book, you need two pieces of information.
1. α/2
2. The degrees of freedom. DF = n-1.
Notes:
1. As n (and likewise the DF) becomes larger, the t score declines.
2. As n becomes large (greater than 30), t is approximately equal to z.
CI using t-distribution
$$CI = \bar{x} \pm t_{df=n-1,\,\alpha/2} \cdot \frac{S}{\sqrt{n}}$$
Where $t_{df=n-1,\,\alpha/2}$ is the t-value from the t-table using the t curve with df = n-1 and a right tail area of α/2.
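The two interval formulas (Case 1 with z and Case 2 with t) can be sketched in Python as a rough cross-check on hand or Excel calculations. This is only a minimal sketch and not part of the original brief: the function names `z_interval` and `t_interval` and the five-value sample are hypothetical. Because the Python standard library has no t-distribution, the t critical value is assumed to be read from a t-table and passed in.

```python
import math
from statistics import NormalDist, mean, stdev


def z_interval(x_bar, sigma, n, confidence):
    """Case 1 (sigma known): x_bar +/- z_(alpha/2) * sigma / sqrt(n)."""
    alpha = 1 - confidence
    # Critical z: the score with area 1 - alpha/2 to its left,
    # the same lookup as =normsinv(1-alpha/2) in Excel.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


def t_interval(sample, t_crit):
    """Case 2 (sigma unknown): x_bar +/- t * S / sqrt(n).

    t_crit is assumed to come from a t-table with df = n - 1 and an
    upper-tail area of alpha/2.
    """
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation, n - 1 denominator
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Case 1 example from the text: x_bar = 125, sigma = 24, n = 36, 90% CI.
low, high = z_interval(125, 24, 36, 0.90)
print(f"90% z-interval: {low:.2f} to {high:.2f}")

# Hypothetical sample of 5 values; 2.776 is the table t-value for
# df = 4 with an upper-tail area of .025 (a 95% CI).
t_low, t_high = t_interval([10, 12, 14, 16, 18], 2.776)
print(f"95% t-interval: {t_low:.2f} to {t_high:.2f}")
```

For the Case 1 example this gives roughly 118.42 to 131.58. Using the rounded table value Z = 1.65 instead gives the slightly wider 118.4 to 131.6.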
USING EXCEL- To calculate the value of t using Excel, type the following: =tinv(α,degrees of freedom)
****Make sure you use α, not α/2. Excel automatically makes the adjustment.****
Do problem 18 on page 336.

E. CASE 3, n<30 AND THE POPULATION IS NOT NORMALLY DISTRIBUTED
You cannot use t or z. You must increase your sample size.

F. HOW TO DETERMINE YOUR SAMPLE SIZE GIVEN A DESIRED MARGIN OF ERROR
$z_{\alpha/2} \cdot \sigma/\sqrt{n}$ is the margin of error.
Let
$$e = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
If we know the values of e, σ, and Z, then we can find n. Solve for n:
$$e\sqrt{n} = z_{\alpha/2}\,\sigma$$
$$\sqrt{n} = \frac{z_{\alpha/2}\,\sigma}{e}$$
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^2 = \frac{z_{\alpha/2}^2\,\sigma^2}{e^2}$$
If e = 30, σ = 200, and α = .05, find the required sample size, and always round up.

APPENDIX

Term: Population Standard Deviation (σ)
Description: This is the standard deviation for the entire population. It measures the average dispersion from the mean.
Formula: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
Excel Command: =stdevp(data)

Term: Sample Standard Deviation (s)
Description: This is the standard deviation of your sample. It is an approximation of σ. For example, if you have a population of 500 and sample 100 of these observations, σ would be the standard deviation of all 500, whereas s would be the standard deviation of the 100 observations in your sample.
Formula: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Excel Command: =stdev(data)

Term: Population Mean (µ)
Description: This is the average of the entire population.
Formula: $\mu = \frac{\sum_{1}^{N} x}{N}$
Excel Command: =average(data)

Term: Sample Mean ($\bar{x}$)
Description: This is the average of the sample. It can take numerous potential values. It is a point estimate of the population mean.
Formula: $\bar{x} = \frac{\sum_{1}^{n} x}{n}$
Excel Command: =average(data)

Term: Standard Error ($\sigma_{\bar{x}}$)
Description: This is the standard deviation of the sampling distribution (the distribution of the potential values for the sample mean). It is basically the average error when using a sample mean.
Formula: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ if σ is known, or $\sigma_{\bar{x}} = \frac{s}{\sqrt{n}}$ if σ is unknown.
Excel Command: =σ/sqrt(n)

Term: Z(α/2)
Description: This is a Z score at the location where the area in either tail is α/2. It measures distance from the mean in terms of standard errors. Basically, this measure is telling you how far away from the mean you must go to leave that probability of occurrence in each tail. We are using Z when sigma is known.
Formula: A z-score is equal to $\frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$. However, we are finding a z-score at a given probability, so don't use this calculation for confidence intervals.
Excel Command: =normsinv(α/2) This will give you the negative z-score. =normsinv(1-α/2) This will give you the positive z-score. Both values have the same magnitude.

Term: t(α/2)
Description: This is a t score at the location where the area in either tail is α/2. It measures distance from the mean in terms of standard errors. Basically, this measure is telling you how far away from the mean you must go to leave that probability of occurrence in each tail. We are using this distribution when sigma is unknown.
Formula: A t-score is equal to $\frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$. However, we are finding a t-score at a given probability, so don't use this calculation for confidence intervals.
Excel Command: =tinv(α, d.f.) The degrees of freedom for these problems is n-1. This finds a t-score with an area of α/2 in the upper tail.

Term: Margin of Error
Description: This is the distance you need to add and subtract from the sample mean to get your confidence interval. It is determined by your level of significance and standard error.
Formula: Margin of Error = Z(α/2) * $\sigma_{\bar{x}}$, if using the standard normal distribution, or t(α/2) * $\sigma_{\bar{x}}$, if using the t distribution.
Excel Command: Just use the * to multiply the two terms.

Term: Confidence Interval
Description: This is a range that will include the population mean with a probability equal to the confidence level.
Formula: Upper Boundary = $\bar{x}$ + Margin of Error. Lower Boundary = $\bar{x}$ - Margin of Error.
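
The sample-size formula from section F, n = (z·σ/e)², can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `required_sample_size` is hypothetical, and the exercise values are read here as e = 30, σ = 200, and α = .05, which is an interpretation of the exercise rather than something the brief confirms.

```python
import math
from statistics import NormalDist


def required_sample_size(e, sigma, alpha):
    """Smallest n whose margin of error z_(alpha/2) * sigma / sqrt(n) is <= e."""
    # Critical z with area alpha/2 in the upper tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n_exact = (z * sigma / e) ** 2
    # Always round up: rounding down would leave the margin of error above e.
    return math.ceil(n_exact)


# Assumed reading of the exercise: e = 30, sigma = 200, alpha = .05.
n = required_sample_size(30, 200, 0.05)
print(n)
```

Rounding up matters because the unrounded result is rarely a whole number, and the next-smaller whole sample size would produce a margin of error slightly larger than the desired e.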