Chapter 8 – Confidence Intervals about a Single Parameter

There are two branches of statistical inference: 1) estimation of parameters and 2) testing hypotheses about the values of parameters. We will consider estimation first.

Defn: A point estimate of a parameter (a numerical characteristic of a population) is a specific numerical value based on the data obtained from a sample.

To obtain a point estimate of a parameter, we summarize the information contained in the sample data by using a statistic. A particular statistic used to provide a point estimate of a parameter is called an estimator.

Characteristics of an Estimator

To provide good estimation, an estimator should have the following characteristics:

1) The estimator should be unbiased. In other words, the expectation, or mean, of the sampling distribution of the estimator for samples of a given size should be equal to the parameter which the statistic is estimating.

2) The estimator should be consistent. In other words, as we increase the size of the sample which we select from the population, the estimator should yield values which tend to get closer to the true value of the parameter being estimated.

3) The estimator should be relatively efficient. In other words, of all possible statistics that could be used to estimate a particular parameter, we want to choose the statistic whose sampling distribution has the smallest variance.

It turns out that, for a population mean, $\mu$, the sample mean, $\bar{X}$, is an estimator which is unbiased, consistent, and relatively efficient.

In any given situation, we have no way of knowing precisely how close the estimated value of a parameter is to the true value of the parameter. However, if the estimator satisfies the three properties listed above, we can be confident that the estimated value is unlikely to differ much from the true parameter value.
On the other hand, it is nearly certain that the estimated value will not be exactly equal to the true parameter value. We want to have some idea of the precision of the estimate. Hence, rather than simply calculating a point estimate of the parameter value, we find a confidence interval estimate.

Defn: A confidence interval estimate of a parameter consists of an interval of numbers obtained from a point estimate of the parameter, together with a percentage that specifies how confident we are that the true parameter value lies in the interval. This percentage is called the confidence level, or confidence coefficient.

Thus, there are three quantities that must be specified: 1) the point estimate of the parameter, 2) the width of the interval (which is usually centered at the value of the point estimate), and 3) the confidence level.

Confidence Interval for a Population Mean (Assuming that the population standard deviation is known):

If we are sampling from a population which has a normal distribution, or if our sample size is large enough ($n \ge 30$), then we may find the confidence interval as follows (we will talk later about the more realistic situation in which the population standard deviation is unknown). We know that, in either of these situations (normality or large sample size), the random variable

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

has either a standard normal probability distribution (in the first case) or an approximate standard normal probability distribution (in the second case). Hence, for a given value of $\alpha$, we may make the following probability statement, where $z_{\alpha/2}$ is the value with area $\alpha/2$ to its right under the standard normal curve:

$$P\left(-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \le z_{\alpha/2}\right) = 1 - \alpha.$$

We can rearrange the quantities in parentheses to obtain an equivalent probability statement by multiplying throughout the inequality by the quantity $\sigma / \sqrt{n}$. We obtain

$$P\left(-z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \bar{X} - \mu \le z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha.$$

We want to isolate the parameter, $\mu$, in the middle of the inequality, so we subtract $\bar{X}$ throughout, obtaining

$$P\left(-\bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le -\mu \le -\bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha.$$

We are almost done.
We do not want $-\mu$ in the center, but rather $\mu$, so we multiply throughout by $-1$ (which reverses the direction of the inequalities), obtaining

$$P\left(\bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha.$$

Then the endpoints of our confidence interval are $\bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ and $\bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$. The confidence level is $1 - \alpha$. Intervals constructed in the above manner are called confidence intervals for the corresponding parameter (in this case, a population mean, $\mu$).

We are usually interested in one of several specific values of the confidence level: 90%, 95%, or 99%.

1) For $1 - \alpha = .90$, we have $z_{0.05} = 1.645$, and the 90% confidence interval for $\mu$ has the form $\left(\bar{X} - 1.645 \frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.645 \frac{\sigma}{\sqrt{n}}\right)$.

2) For $1 - \alpha = .95$, we have $z_{0.025} = 1.96$, and the 95% confidence interval for $\mu$ has the form $\left(\bar{X} - 1.96 \frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}}\right)$.

3) For $1 - \alpha = .99$, we have $z_{0.005} = 2.575$ (the table value; computed to more decimal places it is 2.5758), and the 99% confidence interval for $\mu$ has the form $\left(\bar{X} - 2.575 \frac{\sigma}{\sqrt{n}},\ \bar{X} + 2.575 \frac{\sigma}{\sqrt{n}}\right)$.

We interpret a confidence interval as follows. A confidence interval with confidence level $(1 - \alpha)100\%$ is an interval obtained from sample data by a method such that $(1 - \alpha)100\%$ of all intervals obtained by this method would, in fact, contain the true value of the parameter, and $\alpha \cdot 100\%$ of all intervals so obtained would not contain the true value of the parameter.

Estimation of Population Means, When $\sigma$ Is Unknown

If we know the population standard deviation, $\sigma$, then we can use the previous formula for finding a confidence interval for the population mean, $\mu$. However, in nearly every practical situation, we know neither $\mu$ nor $\sigma$. We then must use the sample standard deviation, $S$, as an estimate of the population standard deviation, and the previous formula for the confidence interval for $\mu$ is no longer valid. Instead of using the standard normal distribution for constructing the confidence interval, we use a related distribution, called the t distribution.

Characteristics of the t Distribution:

1) It is bell-shaped.
2) The distribution is unimodal (one peak) and symmetric, and the mean, median and mode are all equal to 0.
3) The curve is continuous; i.e., there are no gaps.
4) The total area under the curve is 1.
5) The curve extends indefinitely in both directions, approaching, but never touching, the horizontal axis.
6) The variance is greater than 1 (for d.f. > 2, it equals d.f./(d.f. - 2)).
7) There is actually a family of t-distributions, each one characterized by a parameter called the degrees of freedom.
8) The t-curve is wider than the standard normal curve, but for larger sample sizes, the t-curve is closer to the standard normal curve. (See the graph on page 358 of your textbook for examples of t-curves, compared to the standard normal distribution.)

Note: The symbol d.f. will be used for degrees of freedom. The concept denotes the number of values in the data set which are free to vary. When we first select a random sample of size $n$, we have $n$ degrees of freedom. After we compute the sample mean, we have used up one degree of freedom, and now d.f. = $n - 1$. If we compute a second statistic from the data, then we use up another degree of freedom, and now d.f. = $n - 2$.

Formula for a Confidence Interval for $\mu$ When $\sigma$ is Unknown, and $n < 30$:

We modify our previous formula somewhat. When the population standard deviation is unknown, the $(1 - \alpha)100\%$ confidence interval for the mean is given by:

$$\left(\bar{X} - t_{\alpha/2,\, n-1} \frac{S}{\sqrt{n}},\ \bar{X} + t_{\alpha/2,\, n-1} \frac{S}{\sqrt{n}}\right).$$

Here $S$ is the sample standard deviation, and $t_{\alpha/2,\, n-1}$ is called a critical value for the t distribution with d.f. = $n - 1$.

We can find t critical values using Table F in Appendix C if we know the degrees of freedom of the t-distribution and the confidence level, $1 - \alpha$. For this purpose we would use the two-tail values for $\alpha$. For example, if our confidence level is $(1 - \alpha)100\% = 95\%$, then $\alpha = 0.05$. If our sample size is 25, then d.f. = 24, and the t-critical value is 2.064.
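As a rough illustration of the formula above, the following Python sketch computes the t-based interval from summary statistics. The function name `t_interval` and the sample mean and standard deviation used (50.0 and 10.0) are hypothetical, chosen only for illustration; the critical value 2.064 is the table value quoted above for d.f. = 24, and it is passed in by hand rather than computed.

```python
from math import sqrt


def t_interval(xbar, s, n, t_crit):
    """Return (lower, upper) endpoints of a t-based confidence interval for a mean.

    xbar   : sample mean
    s      : sample standard deviation
    n      : sample size
    t_crit : critical value t_{alpha/2, n-1}, looked up in a t table
    """
    if n < 2:
        raise ValueError("need at least two observations")
    margin = t_crit * s / sqrt(n)  # half-width of the interval
    return xbar - margin, xbar + margin


# Hypothetical summary statistics (not from the text): n = 25, so d.f. = 24.
lower, upper = t_interval(xbar=50.0, s=10.0, n=25, t_crit=2.064)
print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
```

With these made-up inputs the half-width is 2.064 × 10 / 5 = 4.128, so the interval is centered at 50 and extends 4.128 in each direction. Taking the critical value as an argument mirrors the table lookup described above and keeps the sketch free of third-party libraries.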
Finding a Confidence Interval for $\mu$ Using the TI-83 Calculator:

Given a data set of $n$ sample values $X_1, X_2, \ldots, X_n$, where these represent a random sample from a population with unknown mean $\mu$, we can find a $(1 - \alpha)100\%$ confidence interval for $\mu$ as follows:

1) Enter the data in the calculator, using STAT and 1:Edit.
2) Choose STAT, then TESTS, then 8:TInterval.
3) Choose Data by hitting ENTER; or, if the sample mean and sample standard deviation are given to you, choose Stats.
4) If you chose Data, enter the variable name for List: and enter the appropriate confidence level. If you chose Stats, enter the value of the sample mean, the value of the sample standard deviation, the sample size, and the appropriate confidence level.
5) Hit ENTER.

Example: p. 364, Exercise 9.

Example: p. 365, Exercise 11.

Estimation of Population Proportions

Sometimes we want to use sample data to estimate the proportion, $p$, of a population that have a certain characteristic. The point estimate of $p$ would be the proportion, $\hat{p}$, of the sample members that have the characteristic of interest.

In this case, we are talking about using a binomial experiment to do the estimation. We would select a random sample of $n$ members of the population; the randomness ensures that the $n$ trials are independent of each other. We are seeking the same information about each sample member: whether the member has the characteristic of interest. Thus the trials are identical. Each trial results in one of two possible outcomes; either the member has the characteristic, or the member does not have the characteristic. Assuming that the sample is random, the probability that a member has the characteristic is $p$, the proportion of the population who have the characteristic. Hence the three conditions of a binomial experiment are satisfied.
If we define $X$ = number of members of the sample who have the characteristic, then $X$ has a binomial distribution with parameters $n$ and $p$. The proportion of the sample who have the characteristic of interest is then $\hat{p} = \frac{X}{n}$. If the sample size is large enough, then this random variable has an approximate normal distribution, and the random variable

$$\frac{\hat{p} - p}{\sqrt{\hat{p}(1 - \hat{p}) / n}}$$

has an approximate standard normal distribution.

We can construct a $(1 - \alpha)100\%$ confidence interval for $p$ using this fact. The form of the confidence interval is

$$\left(\hat{p} - z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\ \hat{p} + z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right).$$

To find $z_{\alpha/2}$, we look in the table of the standard normal distribution, Table E in Appendix C. If our confidence level is $(1 - \alpha)100\% = 95\%$, then $\alpha = 0.05$, and we divide this by two and subtract the result from 0.5, to get 0.4750. Looking in the table, we find that the critical value corresponding to the probability 0.4750 is 1.96.

Finding a Confidence Interval for $p$ Using the TI-83 Calculator:

Given the number $n$ of trials in our binomial experiment (size of sample) and the number of successes (number of members of the sample with the characteristic of interest), where the sample was taken from a population in which the unknown proportion of members with the characteristic is $p$, we can find a $(1 - \alpha)100\%$ confidence interval for $p$ as follows:

1) Enter the data in the calculator, using STAT and 1:Edit.
2) Choose STAT, then TESTS, then A:1-PropZInt.
3) Enter the value of $X$, the sample size, and the appropriate confidence level.
4) Hit ENTER.

Example: p. 374, Exercise 7.
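The confidence interval for a proportion described in this section can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `z_critical` and `proportion_interval` and the counts used (120 successes in 400 trials) are hypothetical, and the standard library's `statistics.NormalDist` (Python 3.8 or later) stands in for the lookup in the standard normal table.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Return z_{alpha/2} for a confidence level given as a fraction, e.g. 0.95."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    alpha = 1.0 - confidence
    # z_{alpha/2} leaves area alpha/2 in the upper tail of the standard normal.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def proportion_interval(x, n, confidence=0.95):
    """Return (lower, upper) for p, using p_hat +/- z * sqrt(p_hat(1 - p_hat)/n)."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    p_hat = x / n  # sample proportion
    margin = z_critical(confidence) * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical counts (not from the text): 120 successes in 400 trials.
lower, upper = proportion_interval(120, 400, confidence=0.95)
print(f"95% CI for p: ({lower:.4f}, {upper:.4f})")
```

Calling `z_critical(0.95)` gives a value that rounds to the 1.96 found in the table, so the sketch agrees with the table-based procedure for the 95% level. It does not check whether the sample is large enough for the normal approximation; that judgment is left to the reader.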