Class Notes, Mar 31

12.4 General Confidence Interval Procedure for One Mean

As we recall, the general format of a confidence interval is:

sample estimate ± multiplier × standard error.

When the parameter is the mean $\mu$ of a population, the sample estimate is $\bar{x}$ (the sample mean) and the standard error is $\text{s.e.}(\bar{x}) = s/\sqrt{n}$, where $s$ is the sample standard deviation and $n$ is the sample size. The multiplier is denoted $t^*$. Its value is determined using a probability distribution called the Student's t-distribution, or just the t-distribution.

Note: When we studied the confidence interval for a population proportion, the multiplier $z^*$ was determined using the standard normal distribution. The difference here is that the sample mean and the sample proportion call for different probability distributions.

Every t-distribution has a parameter called its degrees of freedom, abbreviated df. For problems involving inference about a single mean, df = n - 1, where n is the sample size.

Features of the t-distribution:
1. The t-distribution is bell-shaped and centered at 0.
2. It looks like the standard normal distribution, except that it is more spread out (it has heavier tails).
3. When df is large, the t-distribution is very close to the standard normal distribution.

Conditions to satisfy in order to use a t confidence interval:
1. The population is bell-shaped and the sample is a random sample. In practice, for a small sample, the data should show no extreme skewness and should not contain any outliers.
2. If the population is not bell-shaped, a large random sample (n ≥ 30) will do. But if there are extreme outliers, it is better to have an even larger sample.

How do we determine the $t^*$ multiplier? Learn to use Table 12.1 on p. 451.

How do we calculate a confidence interval for a population mean? We use 6 steps:
1. Make sure the appropriate conditions apply.
2. Determine the sample mean and standard deviation ($\bar{x}$ and $s$).
3. Calculate the standard error of the mean, $\text{s.e.}(\bar{x}) = s/\sqrt{n}$.
4. Calculate df = n - 1 and choose a confidence level.
5. Use Table 12.1 (or statistical software) to find $t^*$.
6. The interval is $\bar{x} \pm t^* \times \text{s.e.}(\bar{x})$, that is, $\bar{x} \pm t^* \, s/\sqrt{n}$.

Question 1: Suppose we want to know whether the average number of CDs that PSU students own is smaller than 24. We draw a random sample of 250 students. The Minitab output is:

    Descriptive Statistics: C1

    Variable    N     Mean   Median   TrMean    StDev   SE Mean
    C1        250   24.867   25.459   24.885   10.195     0.645

    Variable   Minimum   Maximum       Q1       Q3
    C1          -6.503    54.147   18.998   30.838

Use the 6 steps above to find a 95% confidence interval for the average number of CDs PSU students own. Is 24 included in this interval? How do you interpret the fact that 24 is (or is not) included in this confidence interval?

Interpretation of a confidence interval for a population mean: the interval gives a range of plausible values for the true population mean. The confidence level describes the method: if we repeated the sampling many times and computed a 95% interval each time, about 95% of those intervals would cover the true population mean.

12.5 General Confidence Interval for the Difference Between Two Means

Suppose we want to know whether there is any difference between the average number of CDs that male students own and the average number that female students own. One way to express this question is to write the null hypothesis as

$H_0: \mu_{\text{female}} - \mu_{\text{male}} = 0$,

where $\mu_{\text{female}} - \mu_{\text{male}}$ is the difference between the population mean numbers of CDs for female and male students. (We estimate it from two samples, one of males and one of females.)

The general format of a confidence interval for the difference in two means is:

difference in sample means ± $t^*$ × standard error,

where the difference in sample means is $\bar{x}_1 - \bar{x}_2$ and the standard error is

$\text{s.e.}(\bar{x}_1 - \bar{x}_2) = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$,

with $s_1, s_2$ the sample standard deviations and $n_1, n_2$ the sample sizes of the two samples.

In our example, to calculate the confidence interval for the difference between the average number of CDs owned by male students and by female students, we:
1. Check the conditions on the male and female samples separately to see whether we can use a confidence interval.
2. Calculate the sample means $\bar{x}_1$ and $\bar{x}_2$, then compute the difference in sample means $\bar{x}_1 - \bar{x}_2$.
3. Identify the sample sizes and standard deviations for the male and female samples (usually from Minitab output).
4. Find $t^*$ using Table 12.1. (We won't do this by hand, because the df formula for this procedure is too complicated; we read the results from Minitab instead.)
5. Calculate the confidence interval.

Note: We can use this method only for 2 independent samples.

Question 2: Suppose we want to compare the average GPA of male and female Stat 200 students. Using the data we used on Thursday, try to apply the steps above to this problem. The Minitab output is:

    Two-Sample T-Test and CI: GPA, Gender

    Two-sample T for GPA

    Gender     N    Mean   StDev   SE Mean
    female   257   3.094   0.510     0.032
    male     156   2.950   0.566     0.045

    Difference = mu (female) - mu (male)
    Estimate for difference:  0.1438
    95% CI for difference:  (0.0348, 0.2527)
    T-Test of difference = 0 (vs not =): T-Value = 2.60  P-Value = 0.010  DF = 301

The 95% confidence interval for the difference between the mean GPA of females and the mean GPA of males ($\mu_{\text{female}} - \mu_{\text{male}}$) is (0.0348, 0.2527). Since 0 is not included in this interval, we can say there is a difference between the mean GPAs of males and females; because the whole interval is positive, the female mean appears to be higher.

Interpretation of our 95% confidence interval in this case: we are 95% confident that the true difference $\mu_{\text{female}} - \mu_{\text{male}}$ lies between 0.0348 and 0.2527. "95% confident" means that if we repeated the same sampling a large number of times and computed an interval each time, about 95% of those intervals would contain the true difference between the two population means.

It is sometimes reasonable to assume equal variances; in that case we use the pooled standard deviation. The equal-variance assumption is appropriate when we have reason to believe that the two populations we are interested in have the same variability. In practice, if the standard deviations of the two samples are about the same, we have good reason to use the pooled standard deviation.
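The one-mean and two-mean intervals above can be sketched in Python using only the standard library. This is a minimal sketch, not the textbook's or Minitab's implementation: the helper names (`t_star`, `one_sample_t_ci`, `welch_ci`, `pooled_ci`) are hypothetical, and $t^*$ is found by numerically integrating the t density (Simpson's rule) and bisecting, as a stand-in for reading Table 12.1. For the unpooled interval we assume the "complicated" df formula is the Welch-Satterthwaite approximation; it gives about 301.1 for Question 2, consistent with the DF = 301 Minitab reports. The pooled version uses $s_p = \sqrt{((n_1-1)s_1^2 + (n_2-1)s_2^2)/(n_1+n_2-2)}$ with df $= n_1 + n_2 - 2$.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: float) -> float:
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule on [0, x] (the curve is symmetric about 0)."""
    n = 2 * max(50, math.ceil(x * 100))  # even number of subintervals, each at most 0.005 wide
    h = x / n
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_star(conf: float, df: float) -> float:
    """Multiplier t* leaving central area `conf` under the t curve (the value Table 12.1 lists)."""
    p = 1 - (1 - conf) / 2  # upper cutoff, e.g. 0.975 for 95% confidence
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains t*
        hi *= 2
    for _ in range(50):  # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def one_sample_t_ci(xbar: float, s: float, n: int, conf: float = 0.95):
    """Steps 3-6 of the one-mean procedure: xbar +/- t* * s / sqrt(n), df = n - 1."""
    se = s / math.sqrt(n)
    margin = t_star(conf, n - 1) * se
    return xbar - margin, xbar + margin


def welch_ci(x1, s1, n1, x2, s2, n2, conf=0.95):
    """Unpooled CI for mu1 - mu2; df from the Welch-Satterthwaite approximation."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    margin = t_star(conf, df) * se
    diff = x1 - x2
    return diff - margin, diff + margin, df


def pooled_ci(x1, s1, n1, x2, s2, n2, conf=0.95):
    """Equal-variance CI for mu1 - mu2 using the pooled SD; df = n1 + n2 - 2."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df)
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    margin = t_star(conf, df) * se
    diff = x1 - x2
    return diff - margin, diff + margin, df


if __name__ == "__main__":
    # Question 1: summary numbers from the Minitab output
    print(one_sample_t_ci(24.867, 10.195, 250))
    # Question 2: female (sample 1) minus male (sample 2), rounded summaries
    print(welch_ci(3.094, 0.510, 257, 2.950, 0.566, 156))
    print(pooled_ci(3.094, 0.510, 257, 2.950, 0.566, 156))
```

Because the inputs are the rounded summary numbers from the output, the Question 2 intervals come out close to, but not exactly equal to, Minitab's (its estimate 0.1438 uses unrounded means).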
Using the previous example, we see that the standard deviations for males and females (0.566 and 0.510) are about the same (how do you make this judgment?). Hence, we try the pooled standard deviation. The output is:

    Two-Sample T-Test and CI: GPA, Gender

    Two-sample T for GPA

    Gender     N    Mean   StDev   SE Mean
    female   257   3.094   0.510     0.032
    male     156   2.950   0.566     0.045

    Difference = mu (female) - mu (male)
    Estimate for difference:  0.1438
    95% CI for difference:  (0.0377, 0.2499)
    T-Test of difference = 0 (vs not =): T-Value = 2.66  P-Value = 0.008  DF = 411

The result is very close to the one we obtained with the unpooled standard deviation; the main change is that df is now larger (411 = $n_1 + n_2 - 2$, compared with 301).

12.6 The Difference Between Two Proportions (Independent)

Again, we have the general format: sample estimate ± multiplier × standard error, where the sample estimate is $\hat{p}_1 - \hat{p}_2$, the standard error is

$\text{s.e.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$,

and the multiplier, denoted $z^*$, is determined using the standard normal distribution (by contrast, $t^*$ is used for the difference between two means).

Conditions for a confidence interval for the difference in two proportions:
1. Sample proportions are available based on independent samples from the two populations.
2. All of the quantities $n_1\hat{p}_1$, $n_2\hat{p}_2$, $n_1(1-\hat{p}_1)$, and $n_2(1-\hat{p}_2)$ are at least 10.

For example, if we want to see whether there is any difference between the proportions of right-handed male and female Stat 200 students, we can use a 95% confidence interval. The Minitab output is:

    Test and CI for Two Proportions: Handed, Gender

    Success = right-handed

    Gender     X     N   Sample p
    female   236   258   0.914729
    male     129   156   0.826923

    Estimate for p(female) - p(male):  0.0878056
    95% CI for p(female) - p(male):  (0.0193535, 0.156258)
    Test for p(female) - p(male) = 0 (vs not = 0):  Z = 2.51  P-Value = 0.012

The 95% confidence interval for the difference between the proportions of right-handed females and males, $p_{\text{female}} - p_{\text{male}}$, is (0.019, 0.156).
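The two-proportion interval and its conditions can be sketched the same way. This is a minimal sketch with hypothetical helper names (`two_prop_conditions_ok`, `two_prop_ci`); $z^*$ comes from the standard library's `statistics.NormalDist().inv_cdf` in place of a table. Because the counts X and N in the output are exact, the result should agree with Minitab's interval to about four decimal places.

```python
import math
from statistics import NormalDist


def two_prop_conditions_ok(x1: int, n1: int, x2: int, n2: int) -> bool:
    """Condition 2: n1*p1, n1*(1-p1), n2*p2, n2*(1-p2) all at least 10.

    With p1 = x1/n1, n1*p1 is just the success count x1 and n1*(1-p1) is n1 - x1.
    """
    return min(x1, n1 - x1, x2, n2 - x2) >= 10


def two_prop_ci(x1: int, n1: int, x2: int, n2: int, conf: float = 0.95):
    """CI for p1 - p2: (p1_hat - p2_hat) +/- z* * s.e.(p1_hat - p2_hat)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


if __name__ == "__main__":
    # Right-handedness example: female (sample 1) vs male (sample 2)
    print(two_prop_conditions_ok(236, 258, 129, 156))
    print(two_prop_ci(236, 258, 129, 156))
```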
Since 0 is not included in this interval, we say there is a difference between the proportions of right-handed males and females; because the interval is entirely positive, the proportion appears higher for females.

Interpretation of our 95% confidence interval: we are 95% confident that the true difference $p_{\text{female}} - p_{\text{male}}$ lies between 0.019 and 0.156. In other words, if we repeated the same sampling a large number of times and computed an interval each time, about 95% of those intervals would contain the true difference between the two population proportions.

Question 3: Decide whether each of the following cases calls for a CI for one proportion, the difference of 2 proportions, one mean, or the difference of 2 means.
1. To compare the proportions of male and female students who have at least one tattoo, we draw a sample of 200 students of each gender.
2. State A claims that its average income tax is higher than that of State B.
3. To estimate the average time needed to drive from State College to NYC, we ask 200 students to drive from State College to NYC on the same day.
4. To see which gender is more likely to smoke, we use a sample of males and a sample of females, then compare the share of smokers in each sample.
5. We want to see whether a new drug has a cure rate of 65%.
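What "95% confidence" means can be checked by simulation. The sketch below assumes hypothetical true proportions (0.9 and 0.8, chosen for illustration, not taken from the class data), repeatedly draws independent samples of the same sizes as the handedness example, builds a 95% interval each time, and reports the fraction of intervals that contain the true difference. That long-run fraction of intervals is what the 95% refers to. The names `coverage` and `two_prop_ci` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def two_prop_ci(x1, n1, x2, n2, conf=0.95):
    """Large-sample CI for p1 - p2 from success counts x1, x2."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


def coverage(true_p1, true_p2, n1, n2, reps=2000, seed=1):
    """Fraction of simulated intervals that contain the true difference p1 - p2."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    target = true_p1 - true_p2
    hits = 0
    for _ in range(reps):
        # Simulate a fresh pair of independent random samples
        x1 = sum(rng.random() < true_p1 for _ in range(n1))
        x2 = sum(rng.random() < true_p2 for _ in range(n2))
        lo, hi = two_prop_ci(x1, n1, x2, n2)
        hits += lo <= target <= hi
    return hits / reps


if __name__ == "__main__":
    # Hypothetical true proportions; sample sizes match the handedness example
    print(coverage(0.9, 0.8, 258, 156))
```

With a few thousand repetitions the reported coverage should land near 0.95; simulation noise and the normal approximation make it only approximately 0.95.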