4/19/2013

Introduction to Inference
Estimating with Confidence
IPS Chapter 6.1
© 2009 W.H. Freeman and Company

Objectives (IPS Chapter 6.1)
Estimating with confidence
Statistical confidence
Confidence intervals
Confidence interval for a population mean
How confidence intervals behave
Choosing the sample size

Overview of Inference
Methods for drawing conclusions about a population from sample data are called statistical inference.
Methods:
Confidence Intervals - estimating a value of a population parameter
Tests of significance - assess evidence for a claim about a population
Inference is appropriate when data are produced by either a random sample or a randomized experiment.

How Statistical Inference Works
Statistical Inference is the process of drawing conclusions using data that are subject to random variation. It makes propositions about populations, using data drawn from the population of interest via some form of random sampling.
The result of a statistical inference is a statistical proposition. Some common forms of statistical proposition are:
an estimate, i.e. a particular value that best approximates some population parameter of interest
a confidence interval, i.e. an interval constructed from the data in such a way that, under repeated sampling, it would contain the true parameter value with the probability stated in the confidence level
a test of significance, i.e. a decision to reject or accept a hypothesis/claim about the nature or state of a population on the basis of a statistically significant outcome/result

How Statistical Inference Works
Population under study: µ = ?, p = ?
1. Take a SRS of size n.
2. Compute the value of a sample statistic (x̄, p̂).
3. Use Statistical Inference to draw conclusions.

Confidence Intervals
A confidence interval gives an estimated range of values which is likely to include an unknown population parameter.
It’s calculated from a sample taken from the population and is of the form:
estimate ± margin of error
The level of confidence is the probability that an interval constructed this way captures the true value of the population parameter.

Computing Confidence Intervals for µ when the population standard deviation σ is known

Example 1
The weight of single eggs of the brown variety is normally distributed with an unknown mean µ and a known standard deviation σ = 5 g. You buy a carton of 12 brown eggs and find that the box weighs 770 g, for an average weight of 64.2 g per egg. What can you conclude about the true mean weight µ of all brown eggs?
µ = ?, σ = 5 g, total weight 770 g, x̄ = 64.2 g
$P\left(\mu - 1\cdot\frac{\sigma}{\sqrt{n}} < \bar{x} < \mu + 1\cdot\frac{\sigma}{\sqrt{n}}\right) \approx 0.68$, or 68%

Example 1 Cont.
Therefore, there is a 68% chance that the interval $\bar{x} \pm 1\cdot\frac{\sigma}{\sqrt{n}}$ includes the unknown value µ.
[Figure: distribution of x̄, with x̄ = 64.2, $\sigma/\sqrt{n} = 5/\sqrt{12} = 1.443$, and the location of the mean µ marked.]
68% Confidence Interval for the mean weight of a brown egg:
$\bar{x} \pm 1\cdot\frac{\sigma}{\sqrt{n}} = 64.2 \pm 1.443 = (62.757, 65.643)$

Example 1 Cont.
$P\left(\mu - 2\cdot\frac{\sigma}{\sqrt{n}} < \bar{x} < \mu + 2\cdot\frac{\sigma}{\sqrt{n}}\right) \approx 0.95$, or 95%
Therefore, there is a 95% chance that the interval $\bar{x} \pm 2\cdot\frac{\sigma}{\sqrt{n}}$ includes the unknown value µ.
[Figure: distribution of x̄, with x̄ = 64.2, $\sigma/\sqrt{n} = 5/\sqrt{12} = 1.443$, and the location of the mean µ marked.]
95% Confidence Interval for the mean weight of a brown egg:
$\bar{x} \pm 2\cdot\frac{\sigma}{\sqrt{n}} = 64.2 \pm 2(1.443) = (61.314, 67.086)$

Example 1 Cont.
What is an 80% Confidence Interval for the mean weight of a brown egg?
We need to find a value z* so that:
$P\left(\mu - z^*\frac{\sigma}{\sqrt{n}} < \bar{x} < \mu + z^*\frac{\sigma}{\sqrt{n}}\right) = 0.80$, or 80%
The area in each tail is (1 − 0.80)/2 = 0.10, so −z* = invNorm(.10, 0, 1) = −1.28, that is, z* = 1.28.
We can use z* to calculate the margin of error for the interval:
$m = z^*\frac{\sigma}{\sqrt{n}}$
Therefore an 80% CI for the mean weight of a brown egg is:
$64.2 \pm 1.28\cdot\frac{5}{\sqrt{12}} = 64.2 \pm 1.8475$, or (62.35, 66.05)

Confidence Interval for µ (σ given)
In general
A level C (expressed as a %) confidence interval for µ when σ is known is given by:
$\bar{x} \pm z^*\frac{\sigma}{\sqrt{n}}$
Assumptions
The population from which the sample is taken is normally distributed, or the sample size is n ≥ 30.
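The general formula can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function name `z_interval` and its parameters are hypothetical, and z* is taken from the standard library's `statistics.NormalDist` rather than from Table D or a calculator.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, level):
    """Level-C confidence interval for a mean when sigma is known.

    Hypothetical helper for illustration; `level` is a decimal such as 0.80.
    """
    # z* leaves (1 - C)/2 in each tail, so it is the (1 + C)/2 quantile
    # of the standard normal distribution.
    z_star = NormalDist().inv_cdf((1 + level) / 2)
    margin = z_star * sigma / sqrt(n)  # m = z* * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Brown egg example: x-bar = 64.2 g, sigma = 5 g, n = 12, C = 80%
low, high = z_interval(64.2, 5, 12, 0.80)
print(round(low, 2), round(high, 2))  # 62.35 66.05
```

The unrounded z* (about 1.2816) gives a margin slightly larger than the 1.8475 obtained with z* = 1.28, but the interval agrees with the worked example to two decimal places.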
[Figure: normal curve with −z* and z* marked.]

Example
Weights of newborn babies follow a normal distribution with a standard deviation σ = 1 lb and an unknown mean µ. To estimate µ we look at the next 10 babies born. We find that the sample mean x̄ for these 10 babies is 6.35 lbs.
(a) What is a 90% CI for µ based on this sample?
(b) What is a 95% CI for µ based on this sample?
(c) What is an 85% CI for µ based on this sample?

How do we find specific z* values?
Table D: The bottom row of the table lists confidence levels C; the z* value for each level is given in the row above it.
Example: For a 98% confidence level, z* = 2.326
We can also use software. For example, in Excel:
=NORMINV(probability, mean, standard_dev)
gives z for a given cumulative probability.
Since we want the middle C probability, the cumulative probability we require is the lower-tail area (1 − C)/2.
Example: For a 98% confidence level, =NORMINV(.01,0,1) = −2.32635 (= −z*)

Computing CI using the TI-83
1. Press STAT.
2. Select TESTS, then ZInterval.
Select Inpt: Data; enter the value for σ, the list (Li) where the sample data is stored, and the confidence level (C-Level) in decimal format.
OR
Select Inpt: Stats; enter the value for σ, x̄, n, and the confidence level (C-Level).
3. Select Calculate and press Enter.

Example 1: Calories in Apples
The following table shows the number of calories in a sample of 10 apples of a certain variety.
49 69 39 30 54 50 65 63 64 41
Compute (a) 80%, (b) 90%, and (c) 98% confidence intervals for the true population mean of the number of calories in apples of this variety based on this sample. Assume caloric content in apples of this type is normally distributed with a standard deviation σ = 10 calories.

What does it all mean?
Say we compute a 95% confidence interval. “95% confidence” means that 95% of the time the interval we compute captures the true value of the population mean (µ), and 5% of the time it misses it.
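The "95% of the time" statement can be checked by simulation. The sketch below is an illustration only: the function `coverage`, the true mean µ = 50, the number of trials, and the seed are all assumptions chosen for the demonstration (σ = 10 and n = 10 mirror the apples example). It draws many samples from a normal population and counts how often the z interval captures µ.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def coverage(mu, sigma, n, level, trials, seed=0):
    """Fraction of simulated z intervals that contain the true mean mu.

    Hypothetical helper; mu is known here only because we are simulating.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_star = NormalDist().inv_cdf((1 + level) / 2)
    margin = z_star * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and compute its mean.
        xbar = mean(rng.gauss(mu, sigma) for _ in range(n))
        # The interval x-bar +/- m either captures mu or misses it.
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials


rate = coverage(mu=50, sigma=10, n=10, level=0.95, trials=10_000)
print(rate)  # expected to be close to 0.95
```

Any single interval either contains µ or does not; the 95% describes the long-run behavior of the method across repeated samples.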
Confidence intervals - Summary
The confidence interval is a range of values with an associated probability or confidence level C. The probability quantifies the chance that the interval contains the true population parameter.

Sample size and experimental design
You may need a certain margin of error (e.g., drug trial, manufacturing specs). In many cases, the population variability (σ) is fixed, but we can choose the number of measurements (n).
So plan ahead what sample size to use to achieve that margin of error.
$m = z^*\frac{\sigma}{\sqrt{n}} \iff n = \left(\frac{z^*\sigma}{m}\right)^2$
Remember, though, that sample size is not always stretchable at will. There are typically costs and constraints associated with large samples. The best approach is to use the smallest sample size that can give you useful results.

What sample size for a given margin of error?
Density of bacteria in solution:
Measurement equipment has standard deviation σ = 1 × 10^6 bacteria/ml fluid.
How many measurements should you make to obtain a margin of error of at most 0.5 × 10^6 bacteria/ml with a confidence level of 90%?
For a 90% confidence interval, z* = 1.645.
$n = \left(\frac{z^*\sigma}{m}\right)^2 = \left(\frac{1.645 \times 1}{0.5}\right)^2 = 3.29^2 = 10.8241$ (σ and m both in units of 10^6 bacteria/ml)
Using only 10 measurements will not be enough to ensure that m is no more than 0.5 × 10^6. Therefore, we need at least 11 measurements.
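The sample size calculation can also be sketched in Python. The function name `required_n` is hypothetical and the code is only an illustration of the formula n = (z*σ/m)²; the key step is rounding up, since rounding down would give a margin of error larger than the target.

```python
from math import ceil
from statistics import NormalDist


def required_n(sigma, margin, level):
    """Smallest n whose z-interval margin of error is at most `margin`.

    Hypothetical helper for illustration; `level` is a decimal such as 0.90.
    """
    z_star = NormalDist().inv_cdf((1 + level) / 2)
    # Always round up: a fractional n must become the next whole measurement.
    return ceil((z_star * sigma / margin) ** 2)


# Bacteria example: sigma = 1e6, target margin = 0.5e6, C = 90%
n = required_n(sigma=1e6, margin=0.5e6, level=0.90)
print(n)  # 11
```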