CHAPTER 7, Sections 7.1 & 7.2 Revised Jan 27, 2011 Inference for the Mean of a Population when sigma is unknown, Sec 7.1 Previously we made the assumption that we knew the population standard deviation, σ. We then developed a confidence interval and used tests of significance to evaluate evidence for or against a hypothesis, all with a known σ. In many situations, σ is unknown. In this section, we will continue doing inference for the population mean, but we will use the sample standard deviation , s, as a substitute for the unknown population standard deviation. The procedures are called t procedures. Confidence Interval for a Mean First, Assumptions for Inference about the population mean: Our data are a simple random sample (SRS) of size n taken from a normally distributed population with mean µ and standard deviation σ, both of which are unknown. Unless a small sample is used, the assumption that the data comes from a SRS is more important than the assumption that the population distribution is normal. Because we do not know the population sigma, σ, we make two changes in our procedure: 1. The sample standard deviation , s, is used in place of the unknown σ to estimate the standard deviation of the sample mean. The result is called the standard error of the mean, SEM. SEM = SEx s n Where s is the sample standard deviation and n is the sample size. 2. We calculate a different test statistic, t instead of z, and our P Value comes from the t distribution instead of the Normal distribution. The t-distributions: The t-distribution is used when we do not know the population sigma, σ. The t-distributions have density curves similar in shape to the standard normal curve, but with more spread. Lecture 9, Section 7.1 & 7.2 Page 1 The t-distributions have more probability in the tails and less in the center when compared with the standard normal distribution. This is because substituting the estimate s for the unknown value of σ introduces more uncertainty. For sample sizes less than 30, s tends to underestimate σ. As the sample size increases, the t-density curve approaches the N(0,1) distribution. This is because s estimates σ with less bias as the sample size increases. The t Distributions Suppose that an SRS of size n is drawn from a N ( , ) population, but the values of µ and σ are unknown. Then the one-sample t statistic t x s/ n has the t distribution with n-1 degrees of freedom. There is a separate distribution for every sample size. For any given sample size the degrees of freedom = n-1. Each line on the t table is for one specific degree of freedom. For any line on the t table there are columns which show the value of t which coincides with certain specific probabilities in the right tail of the t distribution. The One-Sample t Confidence Interval Suppose that a SRS of size n is drawn from a normal population having unknown mean µ and σ. A level C confidence interval for µ is xt* s n where t* is the value for the t(n-1) density curve with area C between –t* and t*. Xbar is the point estimate, and t* (s)/√(n) is the margin of error. This interval is exact when the population distribution is normal and is approximately correct for large n in other cases. Examples : 1. Suppose X, Bob’s golf scores, are approximately normally distributed with unknown mean and standard deviation. A SRS of n = 16 scores is selected and a sample mean of x = 77 and a sample standard deviation, s, = 3 is calculated. Calculate a 90% confidence interval for . Lecture 9, Section 7.1 & 7.2 Page 2 2. (Example 7.1 in Textbook) Corn soy blend, CSB is highly nutritious, low-cost fortified food that can be incorporated into different food preparations worldwide. As part of a study to evaluate appropriate vitamin C levels in this commodity, measurements were taken on samples of CSB produced in a factory. The following data are the amounts of vitamin C, measured in milligrams per 100 grams of blend, for a random sample of size 8 from a production run. Compute a 95% confidence interval for where is the population mean vitamin C content of the CSB. 26 31 23 22 11 22 14 31 By hand: x = 22.5 Lecture 9, Section 7.1 & 7.2 Page 3 s = 7.191 n = 8 df = 7 Using SPSS: analyze > descriptive statistics > explore Move “vitaminC” to “dependent list”. Click “statistics” and select “descriptives” and change/keep a confidence interval. Click “continue” followed by “OK”. 95% Descriptives Vitamin C Statistic 22.50 Mean 95% Confidence Interval for Mean Lower Bound Upper Bound 5% Trimmed Mean Median Variance Std. Deviation Std. Error 2.542 16.49 28.51 22.67 22.50 51.714 7.191 Minimum 11 Maximum 31 Range 20 Interquartile Range 14 Skewness -.443 .752 Kurtosis -.631 1.481 The One-Sample t test: 1. State the Null and Alternative hypothesis. 2. Find the test statistic: Suppose that a SRS of size n is drawn from a normal population having unknown mean µand unknown σ. To test the hypothesis H 0 : 0 based on a SRS of size n, compute the one-sample t statistic t x 0 s/ n 3. Calculate the p-value. In terms of a random variable T having the t(n-1) distribution, the Pvalue for a test of H 0 : 0 against Ha : 0 is P(T t ) , one side right Ha : 0 is P(T t ) , one side left Ha : 0 is 2 P(T | t |) , two side These P-values are exact if the population distribution is normal and are approximately correct for large n in other cases. Lecture 9, Section 7.1 & 7.2 Page 4 4. State the conclusions in terms of the problem. Use the given α or if the value is not specified use α = 0.05 as the default. Then compare the P-value to the α level. If P-value α, then reject H0 . We have sufficient evidence to reject Ho. If P-value > α, then we fail to reject H0 . We lack sufficient evidence to reject H0 . Either way, we should use the words of the story in the conclusion. Examples: 1. Experiments on learning in animals sometimes measure how long it takes mice to find their way through a maze. Suppose the population mean time is 18 seconds for one particular maze. A researcher thinks that loud noise will decrease the time it takes a mouse to complete the maze. She measures how long each of 30 mice take to complete the maze with loud noise stimulus. She gets a sample mean, Xbar = 16 seconds and a sample standard deviation, s = 3 seconds. Do a one sided hypothesis test to test the researchers assertions with α = 0.1. 2. (Example 7.2 in Textbook) Suppose that we know that sufficient vitamin C was added to the CSB mixture to produce a mean vitamin C content in the final product of 40 mg/100 g. It is suspected that some of the vitamin C is lost in the production process. To test this hypothesis we can conduct a one-sided test to determine if there is sufficient evidence to conclude that vitamin C is lost. A sample of 8 batches of CSB was tested for vitamin C. The sample mean = 22.50 and the sample standard deviation = 7.191. Use α = 0.05 level. By hand: Ho: µ = 40 Lecture 9, Section 7.1 & 7.2 Page 5 Ha: µ < 40 one side left test Using SPSS: analyze > compare means > One sample T test Move “vitaminc” into the “test variable box” and type in 40 for the test value. To change the confidence interval, Click “options” and change confidence interval from 95% to whatever. I did not do this as I will keep the 95% default. Click “continue”. Lastly click “OK”. One-Sample Statistics N Vitamin C Mean 8 Std. Deviation 22.50 Std. Error Mean 7.191 2.542 One-Sample Test Test Value = 40 95% Confidence Interval of the Difference vitamin C t -6.883 df 7 Sig. (2-tailed) .000 Mean Difference -17.500 Lower -23.51 Upper -11.49 Matched Pairs Design: A common design to compare two treatments is the matched pairs design. One type of matched pair design has 2 subjects who are similar in important aspects matched in pairs and each treatment is given to one of the subjects in each pair. A 2nd common design does not use matched subjects. Instead each subject is given 2 treatments in random order. Ex: Each subject does a left hand test and a right hand test. Lecture 9, Section 7.1 & 7.2 Page 6 Another common design uses before-treatment and after-treatment observations on each of the subjects in the experiment. Each subject thereby provides data on the difference, or improvement, or reduction associated with the treatment. Assumptions for the matched pair t test: 1. The data values are paired and we analyze the line-by-line differences. 2. The line-by-line differences are independent of each other. 3. The line-by-line differences are normally distributed with unknown population mean and unknown population standard deviation. Paired t Procedures: To compare the responses to the two treatments in a matched pairs design, determine the difference between the two treatments for each subject and analyze the observed differences. Example: (Problem 7.31 is done by hand and using SPSS): The researchers studying vitamin C in CSB in example 7.1 were also interested in a similar commodity called wheat soy blend (WSB). Both these commodities are mixed with other ingredients and cooked. Loss of vitamin C as a result of cooking was a concern of the researchers. One preparation used in Haiti called gruel can be made from WSB, salt, sugar, milk, banana, and other optional items to improve the taste. Five samples of gruel prepared in Haitian households were obtained. The vitamin C content of these 5 samples was measured before and after cooking. Set up appropriate hypotheses and carry out a significance test for these data. Hypotheses: Ho: µ of differences (before – after) =0; µ = pop mean of differences Ha: µ of differences (before – after) >0 Here are the data: Sample 1 Before 73 After 70 Difference 3 2 79 77 2 3 86 79 7 4 88 86 2 5 78 67 11 Xbar 80.8 75.8 5.0 s 6.140 7.530 3.937 BY HAND: Xbar of differences = Test statistic, t = 2.840 Lecture 9, Section 7.1 & 7.2 Page 7 5.0 Std Dev of differences =s = 3.937 P Value is between .02 and .025. Exact P Value = .0234 ( from computer). Conclusion: if α = .05: Reject Ho. There is sufficient evidence to conclude that vitamin C is lower after cooking. Using SPSS: > Analyze > Compare Means > Paired – Sample T test. Move “before and after” to “paired variable box” (whichever variable is listed first will come first in the subtraction) Click “OK” Paired Samples Statistics Mean Pair 1 N Std. Deviation Std. Error Mean Before 80.8000 5 6.14003 2.74591 After 75.8000 5 7.52994 3.36749 Paired Samples Test Paired Differences 95% Confidence Interval of the Mean Pair Before - 1 After 5.00000 Std. Std. Error Deviation Mean 3.93700 1.76068 Difference Lower .11156 Sig. (2Upper t 9.88844 2.840 df tailed) 4 .047 (P-Values for t tests are given by SPSS as Sig (2 tailed) and must be divided by 2 for one-side tests). One side P Value = .0235. A confidence interval or statistical test is called robust if the confidence level or Pvalue does not change very much when the assumptions of the procedure are violated. The t procedures are robust against non-normality of the population when there are no outliers, especially when the distribution is roughly symmetric and unimodal. Robustness and use of the One-Sample t and Matched Pair t procedures: Unless a small sample is used, the assumption that the data comes from a SRS is more important than the assumption that the population distribution is normal. Lecture 9, Section 7.1 & 7.2 Page 8 n<15: Use t procedures only if the data are close to normal with no outliers. n is 15 - 39: The t procedure can be used except in the presence of outliers or strong skewness. n≥40: The t procedure can be used even for clearly skewed distributions. Comparing Two Means: Two-Sample Problems: Sec 7.2 A situation in which two populations or two treatments based on separate samples are compared. A two-sample problem can arise: from a randomized comparative experiment which randomly divides the units into two groups and imposes a different treatment on each group. From a comparison of two random samples selected separately from two different populations. Note: Do not confuse two-sample designs with matched pair designs! In the twosample t problems, each group is composed of separate subjects, ie, no subject is in both groups, and each subject furnishes only one piece of data, ie, no subject is tested twice. Assumptions for Comparing Two Means: Two independent simple random samples, from two distinct populations are compared. The same variable is measured on both samples. The sample observations are independent, ie, neither sample has an influence on the other. Both populations are normally distributed. The means 1 and 2 and standard deviations 1 and 2 of both populations are unknown. Typically we want to compare two population means by giving a confidence interval for their difference, 1 2 , or by testing the hypothesis of no difference, H0 : 1 2 0. The Two-Sample t Confidence Interval: Suppose that an SRS of size n1 is drawn from a normal population with unknown mean and that an independent SRS of size n2 is drawn from another normal 1 Lecture 9, Section 7.1 & 7.2 Page 9 population with unknown mean 2 . The confidence interval for the difference between population means, 1 2 is given by s12 s22 ( x1 x2 ) t * n1 n2 The value of t* is determined for the confidence level, C, desired. The confidence interval formula is still composed of the same two components, (1) a point estimate for the difference between population means, and (2) a margin of error which expresses the uncertainty involved. Note that the two sample sizes do not have to be equal. Here, t* is the value for the t(k) density curve with area C between –t* and t*. The value of the degrees of freedom, k, is approximated by software, or if we do the calculations by hand, we determine the df using the smaller of the two sample sizes. Two-Sample t Procedure For Tests Of Significance: 1. Write the hypotheses in terms of the difference between means. H0 : u1 2 0 H a : 1 2 0 H a : 1 2 0 H a : 1 2 0 one side right or one side left two side or 2. Calculate the test statistic. A SRS of size n is drawn from a normal 1 1 and a SRS of size n2 is drawn from another normal population with unknown mean 2 . Sigma 1 and sigma 2 are also population with unknown mean unknown. To test the hypothesis H 0 : u1 2 0 the two-sample t statistic is: t x1 x 2 s12 s22 n1 n2 and use P-values or critical values for the t(k) distribution, where the degrees of freedom k is either approximated by software, or is based on the df for the smaller of the the two sample sizes. Note: SPSS calculates a value for degrees of freedom using a complicated formula which takes the sample sizes Lecture 9, Section 7.1 & 7.2 Page 10 and sample standard deviations into account. This value does not have to be an integer. The df value can be as small as the df for the smaller sample, or it can be as large as the sum of the two dfs. The closer the sample sizes are, and the closer the two sample standard deviations are, the higher the calculated df value will be. 3. Calculate the P-value. Note: Unless we use software, we can only get a range for the P-value. We use the following formulas: H a : 1 2 0 use P(T t ) one side right test H a : 1 2 0 use P(T t ) one side left test H a : 1 2 0 use 2 P(T | t |) two side test Note: If doing the calculations by hand you should use the df for the smaller of the two sample sizes. The resulting procedure is conservative. 4. State the conclusions in terms of the problem. Choose a significance level such as α = 0.05, then compare the P-value to the α level. If P-value α, then reject H0 ; we have sufficient evidence to reject Ho If P-value > α, then fail to reject H0 ; we lack sufficient evidence …….. Robustness and use of the Two-Sample t Procedures: The two-sample t procedures are more robust than the one-sample t methods, particularly when the distributions are not symmetric. They are robust in the following circumstances: If two samples are equal size and the two populations that the samples come from have similar distributions then the t distribution is accurate for a variety of distributions even when the sample sizes are as small as n1 n2 5 . When the two population distributions are different, larger samples are needed. n n 15 : Use two-sample t procedures if the data are close to normal. If 1 2 the data are clearly non normal or if outliers are present, do not use t. n1 + n2 is between 15-39: The t procedures can be used except in the presence of outliers or strong skewness. n n 40 : The t procedures can be used even for clearly skewed 1 2 distributions. Lecture 9, Section 7.1 & 7.2 Page 11 Examples: 1. The U.S. Department of Agriculture (USDA) uses many types of surveys to obtain important economic estimates. In one pilot study they estimated wheat prices in July and in September using independent samples. Here is a brief summary from the report: Month July September n 90 45 x s / n (SEM) 3.50 3.61 0.023 0.029 a. Note that the standard error of the sample mean,(SEM) was reported, instead of sample standard deviations. We find the standard deviation for each of the samples as follows: b. Use a significance test to examine whether or not the price of wheat was the same in July and September. Be sure to give details and carefully state your conclusion. Ho: µ july = µ sept Ha: µ july ≠ µ sept two side test because direction not indicated Lecture 9, Section 7.1 & 7.2 Page 12 c. Give a 95% confidence interval for the increase in the population mean between July and September. 2. The survey for Study Habits and Attitudes (SSHA) is a psychological test designed to measure the motivation, study habits, and attitudes toward learning of college students. These factors, along with ability are important in explaining success in school. Scores on the SSHA range from 0 to 200. A selective private college gives the SSHA to a SRS of both male and female first-year students. Here are the data for 18 women: 154 109 137 115 152 140 145 178 101 103 126 126 137 165 165 129 200 148 Here are the data for 20 men: 108 140 114 91 180 109 132 75 88 113 115 151 126 70 92 115 169 187 146 104 a. Examine each sample graphically, with special attention to outliers and skewness. Is use of a t procedure acceptable for these data? STEMPLOT MEN STEM WOMEN 50 7 8 8 21 9 984 10 139 5543 11 5 6 12 669 2 13 77 60 14 058 1 15 24 9 16 55 17 8 70 18 19 20 0 Lecture 9, Section 7.1 & 7.2 Page 13 6 5 Frequency 4 3 2 1 Mean = 140.56 Std. Dev. = 26.262 N = 18 0 100 120 140 160 Women scores Lecture 9, Section 7.1 & 7.2 Page 14 180 200 7 6 Frequency 5 4 3 2 1 Mean = 121.25 Std. Dev. = 32.852 N = 20 0 60 80 100 120 140 160 180 200 Men scores b. Most studies have found that the mean SSHA score for men is lower than the mean score in a comparable group of women. Test this supposition here. That is, state the hypotheses, carry out the test and obtain a P-value, and give your conclusions. Lecture 9, Section 7.1 & 7.2 Page 15 Using SPSS: Note: The data needs to be typed in using two columns. In the first column you need to put all the scores. In the second column define the grouping variable as gender and enter “ women” next to the women’s scores and “men” next to the men’s scores. Analyze > Compare means > Independent Sample T test Move “score” to “Test Variable” box and “gender” to “grouping variable” box. Click “define groups” and enter “women” for group 1 and “men” for group 2. Click “continue” followed by “OK”. Group Statistics score group women N men Mean Std. Deviation 18 140.56 26.262 6.190 20 121.25 32.852 7.346 Levene's Test for Equality of Variances F score Equal variances assumed Equal variances not assumed Std. Error Mean 1.030 Sig. .317 t-test for Equality of Means t Sig. (2tailed) df Mean Differenc e Std. Error Differenc e 95% Confidence Interval of the Difference Lower Upper 1.986 36 .055 19.306 9.721 -.410 39.021 2.010 35.537 .052 19.306 9.606 -.185 38.797 Independent Samples Test We ALWAYS use the second line, “Equal variances not assumed” to get the t test statistic, p-value, etc. Never assume equal variances in this test. NOTE: PValue in the above example is .052/2 = .026 Give a 95% confidence interval for the mean difference between the SSHA scores of male and female first-year students at this college. Lecture 9, Section 7.1 & 7.2 Page 16 3. Suppose we wanted to compare how students performed on test 1 versus test 2 in stat 301. Below is data for a random sample of 10 students taking stat 301 along with the printout of the results from running A MATCHED PAIR T TEST. Subject Test 1 1 2 3 4 5 6 7 8 9 10 Test 2 60 59 90 87 99 100 92 82 75 84 55 60 82 85 100 98 90 76 79 82 Paired Samples Statistics Pair 1 Tes t 1 Tes t 2 Mean 82.80 80.70 N 10 10 Std. Deviation 14.382 14.507 Std. Error Mean 4.548 4.588 Paired Samples Test Paired Differences Pair 1 Tes t 1 - Test 2 Mean 2.100 Std. Deviation 3.573 Std. Error Mean 1.130 95% Confidence Interval of the Difference Lower Upper -.456 4.656 t 1.859 df 9 Sig. (2-tailed) .096 Answer the questions below based on the test. a. Why was a matched pairs test used as opposed to a two sample t-test? Lecture 9, Section 7.1 & 7.2 Page 17 b. Is there a difference between test 1 and test 2 scores. Write out the two-side matched pair hypotheses to test this and give the P-value. Are our results significant at the 5% significance level? c. Suppose one of our friends thought test 1 was easier and the students generally did better on it. We want to test whether the student is correct. Write out the one-side matched pair hypotheses to test this and give the P-value. Are our results significant at the 5% significance level? Lecture 9, Section 7.1 & 7.2 Page 18 4. An instructor thought the material for the second test was more difficult and hence the students may have done worse on the second test. She decided to randomly sample 10 Exam 1 tests and a different 10 Exam 2 tests, with each sample taken from all the tests taken, ie the two groups of 10 are different students. Below is data for the two random samples and the analysis. Sample Test 1 Sample Test 2 60 59 90 87 99 100 92 82 75 84 55 60 82 85 100 98 90 76 79 82 Group Statistics s core test 1 2 N Mean 82.80 80.70 10 10 Std. Error Mean 4.548 4.588 Std. Deviation 14.382 14.507 Independent Samples Test Levene's Test for Equality of Variances F s core Equal variances ass umed Equal variances not as sumed .015 Sig. .905 t-tes t for Equality of Means t df Sig. (2-tailed) Mean Difference Std. Error Difference 95% Confidence Interval of the Difference Lower Upper .325 18 .749 2.100 6.460 -11.472 15.672 .325 17.999 .749 2.100 6.460 -11.472 15.672 a. What procedure was used and why? Two sample T test because the two groups are made up of different students. b. Write out the one-side hypotheses for this two-sample t test and give the Pvalue. Are your results significant at the 5% significance level? Lecture 9, Section 7.1 & 7.2 Page 19