Review of Confidence Interval Concepts

A confidence interval is an interval of values that is likely to "capture" the unknown value of a population parameter of interest, such as the true population mean, μ, or the true mean difference, μd. Confidence intervals can also be used to estimate the difference between two independent samples; we will save that discussion for a future lesson.

The confidence level is the probability (the long-run fraction of times) that the procedure used to construct an interval produces an interval that actually captures the true population value. For example, suppose we repeatedly drew samples of the same size from a population, constructed a 95% confidence interval from each sample, and repeated this process 1000 times. We would then expect about 95%, or 950, of these confidence intervals to contain the true population parameter. In practice, though, we typically construct only one such interval, and we say that we are X% confident that this interval has captured the true parameter. That single interval might or might not contain the true value.

As a result, confidence intervals are exactly that: statements of how confident you are. They should not be interpreted as saying, for example, that there is a 95% probability that the true value is in this particular interval. Once the interval has been computed, the true value is either in it (probability 1) or not in it (probability 0).

In most situations considered in our text, the general format for a confidence interval is

Sample statistic ± Multiplier × Standard error

In other words, we form a confidence interval by adding an appropriate number of standard errors to, and subtracting it from, the sample estimate.

Last week we considered confidence intervals for one proportion, where the multiplier was a z-value. But what if our variable of interest is a quantitative variable (e.g. GPA, Age, Height) and we want to estimate the population mean?
In such a situation, confidence intervals for a proportion are not appropriate, since our interest is in a mean amount and not a proportion. We apply similar techniques, but now we estimate the population mean, μ, using the sample mean, and the multiplier is a t-value.

Until now we assumed that our random variable came from a normal distribution with a known population standard deviation, σ. Typically, however, we do not know this parameter and must estimate it with the standard deviation of the sample, denoted "s". Because of this extra estimation step, the multiplier can no longer be taken from the standard normal distribution; it comes instead from a t-distribution. The t-distribution is similar to the standard normal distribution from which the z-values came: both are symmetric and centered on 0. The difference is that when using a t-table we need to consider a new feature, degrees of freedom (df), which are based on the sample size, n (for the mean intervals in this lesson, df = n − 1).

Example of 1-proportion and means confidence intervals

In the spring of 2006, all students registered for STAT200 were asked to complete a survey. A total of 1004 students responded. If we assume that this sample represents the PSU-UP undergraduate population, then we can make inferences regarding this population based on the survey results. What follows are three examples.

1. Have you ever made out with someone of the same sex?
2. How much are you willing to spend on a date (or want your date to spend on you)?
3. What is your weight (pounds)? If you could choose your "ideal" weight (in pounds), what would it be?

Solutions:

1.
$$\hat{p} \pm z^{*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.215 \pm 1.96\sqrt{\frac{0.215(1-0.215)}{1004}}, \quad\text{so}\quad 0.189 \le p \le 0.241$$

Variable     X    N     Sample p   95% CI                 Z-Value   P-Value
SameSex_1    216  1004  0.215139   (0.189722, 0.240557)   -18.05    0.000

Interpretation: We are 95% confident that the true proportion of PSU-UP undergraduate students who have made out with someone of the same sex is between 18.9% and 24.1%.

2.
$$\bar{x} \pm t^{*}\frac{s}{\sqrt{n}} = 51.94 \pm 1.98 \cdot \frac{41.544}{\sqrt{988}}, \quad\text{so}\quad \$49.35 \le \mu \le \$54.54$$

Variable   N    Mean      StDev     SE Mean   95% CI
DateSpnd   988  51.9423   41.5440   1.3217    (49.3487, 54.5360)

Interpretation: We are 95% confident that the true mean amount of money that PSU-UP undergraduate students think should be spent on the first date is between $49.35 and $54.54.

3.
$$\bar{x}_d \pm t^{*}\frac{s_d}{\sqrt{n}} = 6.789 \pm 1.98 \cdot \frac{18.171}{\sqrt{995}}, \quad\text{so}\quad 5.66 \text{ lbs} \le \mu_d \le 7.92 \text{ lbs}$$

Paired T for Weight - IdlWt

             N    Mean      StDev      SE Mean
Weight       995  152.067   32.212     1.021
IdlWt        995  145.277   32.963     1.045
Difference   995  6.78935   18.17087   0.57606

95% CI for mean difference: (5.65892, 7.91977)

Interpretation: We are 95% confident that the true mean difference between actual weight and ideal weight of PSU-UP undergraduate students is between 5.66 lbs and 7.92 lbs.

Question: Does this mean that the actual weight of all PSU-UP undergraduate students is more than their ideal weight?

Answer: No. Since this interval estimates a mean difference, we cannot say that all students feel that their actual weight is more than their ideal weight. Instead, we say that on average PSU-UP undergraduate students feel that their actual weight is more than their ideal weight.
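As a rough check on the hand calculations, the three intervals can be recomputed from the summary statistics in the software output. The following is a minimal Python sketch and not part of the original lesson: the helper name `interval` and the variable names are hypothetical, and the multipliers 1.96 and 1.98 are simply the table values used in the solutions above.

```python
import math


def interval(estimate, multiplier, standard_error):
    """Return (lower, upper) for estimate ± multiplier × standard error."""
    margin = multiplier * standard_error
    return estimate - margin, estimate + margin


# 1. One proportion: 216 "yes" answers out of 1004, z* = 1.96 (assumed table value)
n_prop = 1004
p_hat = 216 / n_prop
se_prop = math.sqrt(p_hat * (1 - p_hat) / n_prop)
ci_prop = interval(p_hat, 1.96, se_prop)

# 2. One mean: amount spent on a date, t* = 1.98 (table value used in the text)
se_mean = 41.5440 / math.sqrt(988)
ci_mean = interval(51.9423, 1.98, se_mean)

# 3. Paired difference: actual weight minus ideal weight, t* = 1.98
se_diff = 18.17087 / math.sqrt(995)
ci_diff = interval(6.78935, 1.98, se_diff)

for label, (low, high) in [
    ("Proportion", ci_prop),
    ("Mean date spending", ci_mean),
    ("Mean weight difference", ci_diff),
]:
    print(f"{label}: ({low:.4f}, {high:.4f})")
```

The two t-intervals computed this way come out slightly wider than the software intervals, because statistical software uses the exact t multiplier for the actual degrees of freedom (about 1.962 here) while a printed t-table gives the more conservative 1.98.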