Chapter 6.1 — Confidence Intervals

Stat 226 – Introduction to Business Statistics I
Chapter 6, Section 6.1: Confidence Intervals
Spring 2009, Section A
Professor: Dr. Petrutza Caragea
Tuesdays and Thursdays 9:30-10:50 a.m.

Sample means vary in value and form a sampling distribution in which not all samples result in x̄-values equal to the population mean µ.

We should not expect to obtain a sample mean x̄ (based on a specific sample) that is exactly equal to the population mean µ.

However, we can expect the point estimate to be fairly close in value to the population mean for a sufficiently large sample size (the sampling distribution becomes approximately normal for large sample sizes).

Recall the 68-95-99.7 rule: approximately 95% of all observations from a normal distribution fall within ±2 standard deviations of the mean.

Using this concept, we can construct so-called confidence intervals:

We know that x̄ follows a normal distribution with mean µ and standard deviation σ/√n, i.e.,

\[ \bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right) \]

Therefore, we can anticipate approximately 95% of all random samples of size n from some population with unknown µ and known σ to produce sample means x̄ that fall between

\[ \mu - 2 \cdot \frac{\sigma}{\sqrt{n}} \quad \text{and} \quad \mu + 2 \cdot \frac{\sigma}{\sqrt{n}} \]

This interval

\[ \left( \mu - 2 \cdot \frac{\sigma}{\sqrt{n}} \; ; \; \mu + 2 \cdot \frac{\sigma}{\sqrt{n}} \right) \]

is based on the 68-95-99.7 rule. We know from Chapter 1 that the actual z-score corresponding to the middle 95% is z = 1.96, so more precisely we have

\[ \left( \mu - 1.96 \cdot \frac{\sigma}{\sqrt{n}} \; ; \; \mu + 1.96 \cdot \frac{\sigma}{\sqrt{n}} \right) \]

We are going to use z = 1.96 from now on when constructing a 95% confidence interval.

If the sample size n is large enough, the sampling distribution of the sample means is approximately normal.
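The 95% claim above can be checked with a small simulation. The sketch below is only an illustration: the population values (µ = 100, σ = 15), the sample size, the number of repetitions, and the function name are assumptions made for this example and do not come from the course material.

```python
import math
import random


def fraction_within(mu, sigma, n, reps, z=1.96, seed=226):
    """Draw `reps` samples of size n from N(mu, sigma) and return the
    fraction of sample means that lie within z * sigma / sqrt(n) of mu."""
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if abs(sample_mean - mu) <= half_width:
            hits += 1
    return hits / reps


# Hypothetical population values, chosen only for illustration.
coverage = fraction_within(mu=100.0, sigma=15.0, n=30, reps=5000)
print(f"fraction of sample means within 1.96 * sigma / sqrt(n) of mu: {coverage:.3f}")
```

The printed fraction should be close to 0.95; it will not be exactly 0.95 because the simulation uses a finite number of samples.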
Our point estimate x̄ will hardly ever be exactly equal to the population mean µ, but it will most likely (≈ 95% of the time) fall within 2 standard deviations (σ/√n) of the population mean µ.

Example: ACT scores ∼ N(µ, 5.9); let's take samples of size n = 76

⇒ approximately 95% of all samples of size 76 will produce sample means between

\[ \mu - 1.96 \cdot \frac{5.9}{\sqrt{76}} \quad \text{and} \quad \mu + 1.96 \cdot \frac{5.9}{\sqrt{76}} \]

It can be shown that this concept can be reversed in the following sense: whenever x̄ lies within 1.96 · σ/√n of µ, µ also lies within 1.96 · σ/√n of x̄. So for approximately 95% of all samples, the interval x̄ ± 1.96 · σ/√n contains the unknown µ.

Confidence Intervals (short: CI)

Definition of a Confidence Interval

A confidence interval for the unknown population mean µ is an interval (or range) of plausible values for µ. It is constructed such that, with a chosen degree (or level) of confidence C, the value of the unknown population mean will be captured inside the interval.
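For the ACT example above (σ = 5.9, n = 76), the distance 1.96 · σ/√n can be computed directly. The sketch below also centers an interval of that half-width on an observed sample mean, which is the "reversed" reading of the same distance. The observed value x̄ = 21.0 and the helper name are hypothetical and serve only to illustrate the arithmetic.

```python
import math

SIGMA = 5.9  # population standard deviation of ACT scores (from the example)
N = 76       # sample size (from the example)
Z_95 = 1.96  # z-score for the middle 95%

# Distance within which about 95% of sample means fall from mu.
half_width = Z_95 * SIGMA / math.sqrt(N)


def interval_around(center, half_width):
    """Return (lower, upper) for the interval center +/- half_width."""
    return (center - half_width, center + half_width)


# Hypothetical observed sample mean, not taken from the course data.
x_bar = 21.0
lower, upper = interval_around(x_bar, half_width)

print(f"half-width: {half_width:.4f}")
print(f"interval around x_bar = {x_bar}: ({lower:.4f}, {upper:.4f})")
```

Under these numbers the half-width is roughly 1.33 points, so about 95% of samples of size 76 give a sample mean within roughly 1.33 of µ.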
For each confidence interval we have a confidence level C:

- C provides information on how much "confidence" we can have in the method used to construct the CI
- usual choices for C are 90%, 95%, and 99%
- C can be interpreted as the long-run rate of success for the method used to construct the CI

Example: 99% level of confidence

A 99% confidence interval is constructed such that in the long run it is successful in capturing the true unknown population mean 99% of the time.

More precisely, we have that C = (1 − α) · 100%

The relevant number is called α; it measures the difference between certainty (i.e. 100%) and the desired level of confidence.

Example: for C = 99%, α = 0.01.

A level C confidence interval for population mean µ

For a sufficiently large sample size n (so that the CLT applies and x̄ follows approximately a normal distribution) or a population that is already normally distributed, the general formula for a level C confidence interval for the population mean µ when σ is known is given by

\[ \left( \bar{x} - z^* \cdot \frac{\sigma}{\sqrt{n}} \; ; \; \bar{x} + z^* \cdot \frac{\sigma}{\sqrt{n}} \right) \]

i.e., in short notation,

\[ \bar{x} \pm z^* \cdot \frac{\sigma}{\sqrt{n}} \]

Finding the critical value z* for a level C confidence interval:

The desired level of confidence C determines which critical value z* is used. The three most commonly used confidence levels, 90%, 95%, and 99%, use critical values 1.645, 1.96, and 2.575, respectively. Use Table A to find z*.

Example: A random sample of size n = 25 from last semester's heights data yielded a sample mean of x̄ = 69.36.
We know the population standard deviation is σ = 4.004.

Find a 90% confidence interval for the unknown population mean µ.

\[ 90\%\ \text{CI} \implies z^* = 1.645 \implies 69.36 \pm \underbrace{1.645 \cdot \frac{4.004}{\sqrt{25}}}_{1.31732} \implies (68.04268,\ 70.67732) \]

What about a 95% confidence interval?

\[ 95\%\ \text{CI} \implies z^* = 1.96 \implies 69.36 \pm \underbrace{1.96 \cdot \frac{4.004}{\sqrt{25}}}_{1.56957} \implies (67.79043,\ 70.92957) \]

Why settle for a 90% CI or 95% CI when we can construct 99% CIs?

The higher level of confidence comes with a price tag: the resulting interval is wider than the 90% or 95% confidence interval:

\[ 99\%\ \text{CI} \implies z^* = 2.575 \implies 69.36 \pm \underbrace{2.575 \cdot \frac{4.004}{\sqrt{25}}}_{2.06206} \implies (67.29794,\ 71.42206) \]

The width of any confidence interval is given by

\[ \text{width} = 2 \cdot z^* \cdot \frac{\sigma}{\sqrt{n}} \]

In the previous 3 examples, the width of the corresponding CIs was

90%: 2.63463
95%: 3.13914
99%: 4.12412

Handout on simulated confidence intervals

Chapter 6.1 — Interpretation of Confidence Intervals

Referring to the handout on the 100 simulated confidence intervals, we can take away the following facts:

1 We can be C% confident that the unknown population mean µ falls in the constructed level C confidence interval, i.e. between the lower and upper CI bound for a specific calculated example.
2 If we took repeated samples, approximately C% of all confidence intervals constructed from these samples would include the unknown population mean µ in the long run.
3 The interpretation of a CI is always in terms of the unknown population mean µ and never in terms of the sample mean x̄.
The sample mean x̄, the center of every CI, is always included in the CI by construction.

Be careful:

Before we take a sample from a population, we can say there is a C% chance (e.g. a 95% chance) that our confidence interval will include the population parameter µ if we plan on constructing C% confidence intervals (e.g. 95% CIs).

Once we have taken the sample, the outcome is fixed: our interval either contains µ or it does not. We just don't know which. There is no longer a C% chance; all we can say is that we are C% confident (e.g. 95% confident).

Chapter 6.1 — Confidence Intervals

We saw that the two desirable properties, a high level of confidence and a narrow (precise) CI, work against each other.

The higher the level of confidence, the wider the confidence interval and therefore the less precision we have in estimating the unknown µ.

remedy: If we need a certain level of confidence, but also a specific precision, we can increase the sample size n.

If n goes up ⇒ σx̄ = σ/√n will go down! We get a narrower interval with more precision.

margin of error

\[ m = z^* \cdot \frac{\sigma}{\sqrt{n}} \]

is also referred to as the so-called margin of error.

Changing one of the three components z*, σ, or n in the margin of error will have the following impact on the width of the confidence interval:

1 level of confidence C = (1 − α) · 100% will change z*: a higher C gives a larger z* and therefore a wider interval
2 sample size n will change the standard deviation σx̄ = σ/√n: a larger n gives a smaller σx̄ and therefore a narrower interval
3 population standard deviation σ: a larger σ gives a wider interval

sample size calculations

If we want both a high level of confidence and a small margin of error (i.e. a narrow confidence interval), we need to take a sample of size

\[ n \geq \left( \frac{z^* \cdot \sigma}{m} \right)^2 \]

The right-hand side rarely corresponds to an integer, so we always need to round up to the next larger integer.

Why the next larger? If we rounded down, the corresponding confidence interval would no longer have the desired margin of error, but a slightly larger one!

Example: What sample size should be used to estimate the mean age of workers in a large factory within 1 year at a 95% level of confidence if the standard deviation σ for the variable age is known to be 3.5?

Chapter 6.1 — Assumptions for Confidence Intervals

Necessary Assumptions for Constructing CIs

1 The sampling distribution of x̄ has to follow at least approximately a normal distribution, i.e. either
  the sample size is large enough for the CLT to apply if the population we sample from does not follow a normal distribution, or
  the population we sample from follows a normal distribution.
2 The sample taken has to be a random sample.

worksheets
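The sample size rule n ≥ (z* · σ/m)² can be applied to the factory-age example above (95% confidence, σ = 3.5, m = 1 year). The sketch below is one possible way to carry out the calculation; the function names and the dictionary of critical values are assumptions made for this illustration, with the z* values copied from the list of common confidence levels in this section.

```python
import math

# Critical values as listed in this section for the common confidence levels.
Z_STAR = {90: 1.645, 95: 1.96, 99: 2.575}


def margin_of_error(z_star, sigma, n):
    """Margin of error m = z* * sigma / sqrt(n)."""
    return z_star * sigma / math.sqrt(n)


def required_sample_size(z_star, sigma, m):
    """Smallest integer n with n >= (z* * sigma / m)^2, i.e. always round up."""
    return math.ceil((z_star * sigma / m) ** 2)


z = Z_STAR[95]
n = required_sample_size(z, sigma=3.5, m=1.0)

print(f"required sample size: {n}")
print(f"margin of error with n - 1 = {n - 1}: {margin_of_error(z, 3.5, n - 1):.4f}")
print(f"margin of error with n = {n}: {margin_of_error(z, 3.5, n):.4f}")
```

With these inputs (z* · σ/m)² is about 47.06, so the calculation rounds up to n = 48. Comparing the two margins of error shows why rounding down is not an option: with 47 workers the margin of error is slightly above 1 year, while with 48 it is below 1 year.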