Chapter 9 – Estimating the Value of a Parameter Using Confidence Intervals

There are two branches of statistical inference: 1) estimation of parameters and 2) testing hypotheses about the values of parameters. We will consider estimation first.

Defn: A point estimate of a parameter (a numerical characteristic of a population) is a specific numerical value based on the data obtained from a sample.

To obtain a point estimate of a parameter, we summarize the information contained in the sample data by using a statistic. A particular statistic used to provide a point estimate of a parameter is called an estimator. For any parameter, there are many different possible estimators that could be used. In other words, there are many different ways to use a data set to find a number that estimates the parameter. Some of these estimators are better than others. We want to find the single best possible estimator of the parameter.

Characteristics of a Good Estimator

To provide good estimation, an estimator should have the following characteristics:

1) The estimator should be unbiased. In other words, the expectation, or mean, of the sampling distribution of the estimator for samples of a given size should be equal to the parameter which the statistic is estimating.
2) The estimator should be consistent. In other words, if we increase the size of the sample which we select from the population, the estimator should tend to yield a value closer to the true value of the parameter being estimated.
3) The estimator should be relatively efficient. In other words, of all possible statistics that could be used to estimate a particular parameter, we want to choose the statistic whose sampling distribution has the smallest variance.

It turns out that, for a population mean, $\mu$, the sample mean, $\bar{X}$, is an estimator which is unbiased, consistent, and relatively efficient.

In any given situation, we have no way of knowing precisely how close the estimated value of a parameter is to the true value of the parameter. However, if the estimator satisfies the three properties listed above, we can be highly confident that the estimated parameter value is unlikely to differ from the true parameter value by much. On the other hand, it is nearly certain that the estimated value will not be exactly equal to the true parameter value. We want to have some idea of the precision of the estimate. Hence, rather than simply calculating a point estimate of the parameter value, we find a confidence interval estimate.

Defn: A confidence interval estimate of a parameter consists of an interval of numbers obtained from a point estimate of the parameter, together with a percentage that specifies how confident we are that the true parameter value lies in the interval. This percentage is called the confidence level, or confidence coefficient.

Thus, there are three quantities that must be specified: 1) the point estimate of the parameter, 2) the width of the interval (which is usually centered at the value of the point estimate), and 3) the confidence level.

Estimation of Population Means, When $\sigma$ Is Unknown

The Central Limit Theorem says that for large samples ($n \ge 30$), the following random variable has an approximate standard normal distribution:

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}.$$

This formula involves two unknown parameters, the population mean $\mu$ and the population standard deviation $\sigma$. We want to be able to estimate the unknown population mean, but we need to do something about the unknown standard deviation. We will use the sample standard deviation, $S$, as an estimate of the population standard deviation. Hence, we are trying to estimate the population mean using the random variable

$$\frac{\bar{X} - \mu}{S / \sqrt{n}}.$$

However, we cannot say that this random variable has an approximate standard normal distribution.
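
A small simulation can make the last point concrete. The following Python sketch is an illustration only, under assumed settings that do not come from the text: a normal population with hypothetical mean 50 and standard deviation 10, sample sizes 5 and 60, and 10,000 repetitions. It counts how often $|(\bar{X} - \mu)/(S/\sqrt{n})|$ exceeds 1.96, a cutoff that a standard normal variable exceeds about 5% of the time.

```python
import math
import random

def studentized_mean(sample, mu):
    """Return (xbar - mu) / (s / sqrt(n)) for one sample."""
    n = len(sample)
    xbar = sum(sample) / n
    # Sample standard deviation S, with n - 1 in the denominator.
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    return (xbar - mu) / (s / math.sqrt(n))

def tail_fraction(n, trials=10000, cutoff=1.96, seed=0):
    """Fraction of simulated samples whose studentized mean exceeds the cutoff in absolute value."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    mu, sigma = 50.0, 10.0      # hypothetical population values (assumption)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if abs(studentized_mean(sample, mu)) > cutoff:
            hits += 1
    return hits / trials

small = tail_fraction(n=5)    # small sample
large = tail_fraction(n=60)   # larger sample
print(f"n = 5:  fraction beyond 1.96 = {small:.3f}")
print(f"n = 60: fraction beyond 1.96 = {large:.3f}")
```

For the small sample the fraction should come out well above 0.05, and it should move closer to 0.05 as n grows. This is why a different reference distribution is needed when $S$ replaces $\sigma$.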
The random variable $(\bar{X} - \mu)/(S/\sqrt{n})$ has a different distribution, called a t-distribution, which we use in place of the standard normal distribution when constructing the confidence interval.

Characteristics of the t Distribution:

1) It is bell-shaped.
2) The distribution is unimodal (one peak) and symmetric, and the mean, median and mode are all equal to 0.
3) The curve is continuous; i.e., there are no gaps.
4) The total area under the curve is 1.
5) The curve extends indefinitely in both directions, approaching, but never touching, the horizontal axis.
6) The variance is greater than 1 (whenever d.f. > 2, so that the variance is defined).
7) There is actually a family of t-distributions, each one characterized by a parameter called the degrees of freedom.
8) The t-curve is wider than the standard normal curve, but for larger sample sizes, the t-curve is closer to the standard normal curve. (See the graph on page 425 of your textbook for examples of t-curves, compared to the standard normal distribution.)

Note: The symbol d.f. will be used for degrees of freedom. The concept denotes the number of values in the data set which are free to vary. When we first select a random sample of size $n$, we have $n$ degrees of freedom. After we compute the sample mean, we have used up one degree of freedom, and now d.f. = $n - 1$. If we compute a second statistic from the data, then we use up another degree of freedom, and now d.f. = $n - 2$.

Formula for a Confidence Interval for $\mu$:

(The development of the formula is presented below, for anyone who wants to go through it and understand how confidence intervals are constructed. You will be responsible only for understanding what a confidence interval is, and being able to find and interpret confidence intervals.)

If we are sampling from a population which has a normal distribution, or if our sample size is large enough ($n \ge 30$), then we may find the confidence interval as follows.
We know that, in either of the situations (normality or large sample size), the random variable $\dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ has either a t-distribution (in the first case) or an approximate t-distribution (in the second case). Hence, for a given value of $\alpha$, we may make the following probability statement:

$$P\left(-t_{\alpha/2,\,n-1} < \frac{\bar{X} - \mu}{S/\sqrt{n}} < t_{\alpha/2,\,n-1}\right) = 1 - \alpha.$$

We can rearrange the quantities in parentheses to obtain an equivalent probability statement by multiplying throughout the inequality by the quantity $S/\sqrt{n}$. We obtain

$$P\left(-t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} < \bar{X} - \mu < t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}}\right) = 1 - \alpha.$$

We want to isolate the parameter, $\mu$, in the middle of the inequality, so we subtract $\bar{X}$ throughout, obtaining

$$P\left(-\bar{X} - t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} < -\mu < -\bar{X} + t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}}\right) = 1 - \alpha.$$

We are almost done. We do not want $-\mu$ in the center, but rather $\mu$, so we multiply throughout by $-1$ (which reverses the direction of the inequalities), obtaining

$$P\left(\bar{X} - t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}}\right) = 1 - \alpha.$$

Then the endpoints of our confidence interval are $\bar{X} - t_{\alpha/2,\,n-1}\dfrac{S}{\sqrt{n}}$ and $\bar{X} + t_{\alpha/2,\,n-1}\dfrac{S}{\sqrt{n}}$. The confidence level is $1 - \alpha$. Thus, a $(1 - \alpha)100\%$ confidence interval for the mean $\mu$ is given by:

$$\left(\bar{X} - t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}},\;\; \bar{X} + t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}}\right).$$

Here $S$ is the sample standard deviation, and $t_{\alpha/2,\,n-1}$ is called a critical value for the t distribution with d.f. = $n - 1$. We can find t critical values using Table F in Appendix C if we know the degrees of freedom of the t-distribution and the confidence level, $1 - \alpha$. For this purpose we would use the two-tail values for $\alpha$. For example, if our confidence level is $(1 - \alpha)100\% = 95\%$, then $\alpha = 0.05$. If our sample size is 25, then d.f. = 24, and the t critical value is 2.064.

Intervals constructed in the above manner are called confidence intervals for the corresponding parameter (in this case, a population mean, $\mu$). We are usually interested in one of several specific values of the confidence level: 90%, 95%, or 99%. We interpret a confidence interval as follows.
A confidence interval with confidence level $(1 - \alpha)100\%$ is an interval obtained from sample data by a method such that $(1 - \alpha)100\%$ of all intervals obtained by this method would, in fact, contain the true value of the parameter, and $\alpha \cdot 100\%$ of all intervals so obtained would not contain the true value of the parameter.

Finding a Confidence Interval for $\mu$ Using the TI-83 Calculator:

Given a data set of $n$ sample values $X_1, X_2, \ldots, X_n$, where these represent a random sample from a population with unknown mean $\mu$, we can find a $(1 - \alpha)100\%$ confidence interval for $\mu$ as follows:

1) Enter the data in the calculator, using STAT and 1:Edit.
2) Choose STAT, then TESTS, then 8:TInterval.
3) Choose Data by hitting ENTER; or, if the sample mean and sample standard deviation are given to you, choose Stats.
4) If we don't know the sample mean and sample standard deviation, then we would choose Data, enter the variable name for List:, and enter the appropriate confidence level. If we know the sample mean and the sample standard deviation, then we would choose Stats, rather than Data, and enter the value of the sample mean, the value of the sample standard deviation, the sample size, and the appropriate confidence level.
5) Hit ENTER.

Example: p. 432, Exercise 17.

Example: p. 432, Exercise 21.

Estimation of Population Proportions

Sometimes we want to use sample data to estimate the proportion, $p$, of a population that have a certain characteristic. The point estimate of $p$ would be the proportion, $\hat{p}$, of the sample members that have the characteristic of interest.

In this case, we are talking about using a binomial experiment to do the estimation. We would select a random sample of $n$ members of the population; the randomness ensures that the $n$ trials are independent of each other. We are seeking the same information about each sample member: whether the member has the characteristic of interest. Thus the trials are identical.
Each trial results in one of two possible outcomes; either the member has the characteristic, or the member does not have the characteristic. Assuming that the sample is random, the probability that a member has the characteristic is $p$, the proportion of the population who have the characteristic. Hence the three conditions of a binomial experiment are satisfied.

If we define $X$ = number of members of the sample who have the characteristic, then $X$ has a binomial distribution with parameters $n$ and $p$. The proportion of the sample who have the characteristic of interest is then $\hat{p} = \dfrac{X}{n}$. If the sample size is large enough, then this random variable has an approximate normal distribution, and the random variable

$$\frac{\hat{p} - p}{\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}}$$

has an approximate standard normal distribution.

We can construct a $(1 - \alpha)100\%$ confidence interval for $p$ using this fact. The form of the confidence interval is

$$\left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\;\; \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right).$$

To find $z_{\alpha/2}$, we look in the table of the standard normal distribution, Table E in Appendix C. If our confidence level is $(1 - \alpha)100\% = 95\%$, then $\alpha = 0.05$, and we divide this by two and subtract from 0.5, to get 0.4750. Looking in the table, we find that the critical value corresponding to the probability 0.4750 is 1.96.

Finding a Confidence Interval for $p$ Using the TI-83 Calculator:

Given the number $n$ of trials in our binomial experiment (size of sample) and the number of successes (number of members of the sample with the characteristic of interest), where the sample was taken from a population in which the unknown proportion of members with the characteristic is $p$, we can find a $(1 - \alpha)100\%$ confidence interval for $p$ as follows:

1) Enter the data in the calculator, using STAT and 1:Edit.
2) Choose STAT, then TESTS, then A:1-PropZInt.
3) Enter the value of X, the sample size, and the appropriate confidence level.
4) Hit ENTER.

Example: p. 442, Exercise 15.

Example: p. 442, Exercise 19.
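
The two interval formulas in this chapter can also be checked outside the calculator. The following is a minimal Python sketch, not the TI-83 procedure: the function names and the sample figures (120 successes in 400 trials; a sample mean of 50 and sample standard deviation of 10 with $n = 25$) are hypothetical and chosen only for illustration. The t critical value 2.064 is the table value quoted above for d.f. = 24 at 95% confidence; it is passed in by hand because the Python standard library has no t table.

```python
import math
from statistics import NormalDist

def z_critical(level):
    """Two-sided standard normal critical value z_{alpha/2} for a level such as 0.95."""
    alpha = 1.0 - level
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

def proportion_interval(x, n, level=0.95):
    """Large-sample interval for p: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    margin = z_critical(level) * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def t_interval(xbar, s, n, t_crit):
    """Interval for mu: xbar +/- t_crit * s / sqrt(n), with t_crit read from a t table."""
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical sample: 120 of 400 members have the characteristic.
p_low, p_high = proportion_interval(x=120, n=400, level=0.95)
print(f"95% interval for p:  ({p_low:.4f}, {p_high:.4f})")

# Hypothetical summary statistics; 2.064 is the table value for d.f. = 24, 95%.
m_low, m_high = t_interval(xbar=50.0, s=10.0, n=25, t_crit=2.064)
print(f"95% interval for mu: ({m_low:.3f}, {m_high:.3f})")
```

Both functions follow the same pattern as the formulas: a point estimate plus or minus a critical value times an estimated standard error. Only the critical value and the standard error change between the two cases.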