Lecture 7: Analysis and Interpretation of Inferential Data Using the t Distribution
Reading: Best and Kahn, Chapter 11, pages 406 to 423

Outline
• Independent versus dependent samples
• Assumptions of the independent-samples t test
• Calculating the independent-samples t test
• Degrees of freedom for the independent-samples t test
• Using Excel to calculate the t test
• Interpreting t-test output from SPSS
• Presenting the results in APA format

Review: Hypothesis Testing
• Identify the claim to be tested and put it in symbolic form.
• Identify the null hypothesis.
• Identify the alternative hypothesis.
• Select the significance level α based on the seriousness of a Type I error.
• Identify the statistic that is relevant to the test and its sampling distribution.
• Compute the test statistic and find either the P-value or the critical value(s).
• Draw the graph.
• Reject H0 if the test statistic is in the critical region or the P-value ≤ α. Fail to reject H0 if the test statistic is not in the critical region or the P-value > α.

Finding P-Values
• Left-tailed test: P-value = area to the left of the test statistic.
• Right-tailed test: P-value = area to the right of the test statistic.
• Two-tailed test: if the test statistic is to the left of center, P-value = twice the area to the left of the test statistic; if it is to the right of center, P-value = twice the area to the right of the test statistic.

Independent versus Dependent Samples
Definition: Two samples drawn from two populations are independent if the selection of one sample from one population does not affect the selection of the second sample from the second population. Otherwise, the samples are dependent.
Prem Mann, Introductory Statistics, 7/E Copyright © 2010 John Wiley & Sons. All right reserved

Example 1
Suppose we want to estimate the difference between the mean salaries of all male and all female executives. To do so, we draw two samples, one from the population of male executives and another from the population of female executives.
These two samples are independent because they are drawn from two different populations, and the samples have no effect on each other.

Example 2
Suppose we want to estimate the difference between the mean weights of all participants before and after a weight loss program. To accomplish this, suppose we take a sample of 40 participants and measure their weights before and after the completion of this program. Note that these two samples include the same 40 participants. This is an example of two dependent samples. Such samples are also called paired or matched samples.

T-test: Definition
The t test compares the means of two groups. It is a versatile statistical test that can be used to test whether two group means are different. For example:
• Is Nightmare on Elm Street 2 scarier than Nightmare on Elm Street 1? Have people watch both movies and measure their heart rates.
• Does listening to music while you work improve your attention? Have some people write an essay while listening to music and then write a different essay while working in silence. Then compare their essay grades.
In both of these examples the same people take part in both conditions, so the samples are dependent (paired) in the sense defined above.

Why Use the t Test?
Case 1
• The population standard deviation is not known.
• The sample size is small (n < 30).
• The population from which the sample is selected is normally distributed.
Case 2
• The population standard deviation is not known.
• The sample size is large (n ≥ 30).

Hypotheses about Two Groups
Suppose I ask you to rate your anxiety about taking the basic statistics course on a scale of 1 to 10, where 1 means very little anxiety and 10 means high anxiety. I then pose the following questions:
1. Is male anxiety any different from female anxiety?
2. Do students who have previous mathematics experience suffer less anxiety?
3. Do part-time students experience as much anxiety as full-time students?
4. Are undergraduate students more anxious than postgraduate students?
Each one of these questions can be examined using an independent-samples t test.

Is There a Difference between the Two Group Means?
If you calculate two sample means and they are different, there are two possible reasons for the difference:
1. Each group comes from a different population, and the sample means represent two different population means. In this case you should reject the null hypothesis.
2. The groups come from the same population, and the means vary by chance: you just happened to pick two groups with means that are far apart. In this case you should fail to reject the null hypothesis.

The Independent t Test
The independent t test is used in situations in which there are two experimental conditions and different participants are used in each condition.

Assumptions of the Independent-Samples t Test
• The variable being measured is normally distributed in each population.
• The variances of the groups being assessed are equal (homogeneous).
• Sample 1 is randomly sampled from population 1, and sample 2 is randomly sampled from population 2.

Independent-Samples t Test Equation
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\text{estimate of the standard error}}$

Estimate of the Standard Error
Recall that the standard error tells us how much the differences between sample means vary by chance alone. If the standard deviation is large, then large differences between sample means can occur by chance. If the standard deviation is small, then only small differences between sample means are expected. The standard error of the sampling distribution is used to assess whether the difference between two sample means is statistically significant or simply a chance result.

Variance Sum Law
The variance sum law is used to calculate the standard deviation of the sampling distribution of differences between sample means.
It states that the variance of the difference between independent variables is equal to the sum of their variances (Howell, 2006). In essence, this tells you that the variance of the sampling distribution of differences is equal to the sum of the variances of the two sampling distributions of the mean, one for each population from which the samples were taken.

Calculate the Standard Error for Each Population
Using the sample standard deviations, we calculate the standard error of each population's sampling distribution:
SE of the sampling distribution for population 1 = $\dfrac{s_1}{\sqrt{N_1}}$
SE of the sampling distribution for population 2 = $\dfrac{s_2}{\sqrt{N_2}}$

Recall that variance is equal to standard deviation squared. Calculate the variance of each sampling distribution:
Variance of the sampling distribution for population 1 = $\left(\dfrac{s_1}{\sqrt{N_1}}\right)^2 = \dfrac{s_1^2}{N_1}$
Variance of the sampling distribution for population 2 = $\left(\dfrac{s_2}{\sqrt{N_2}}\right)^2 = \dfrac{s_2^2}{N_2}$

By the variance sum law, to find the variance of the sampling distribution of differences we add the variances of the two sampling distributions:
Variance of the sampling distribution of differences = $\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}$
To find the standard error of the sampling distribution of differences, we take the square root of this variance:
SE of the sampling distribution of differences = $\sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}$
Substituting this SE into the previous equation for t (see page 409, Best and Kahn) gives
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}}$
This equation works only when the sample sizes are equal. Sometimes we want to compare two groups that contain different numbers of participants; then the above equation is not appropriate, and the pooled variance estimate t test is used instead.

The Pooled Variance Estimate for Two Samples
$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

Pooled Standard Deviation for Two Samples
The pooled standard deviation for two samples is computed as
$s_p = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$
where $n_1$ and $n_2$ are the sizes of the two samples, and $s_1^2$ and $s_2^2$ are the variances of the two samples, respectively. Here $s_p$ is an estimator of σ.
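As a rough numerical check on the variance sum law and on the two standard-error formulas, here is a minimal Python sketch using only the standard library. The function names are illustrative and do not come from any statistics package. The simulation assumes two normally distributed populations with made-up parameters (σ1 = 3, n1 = 10; σ2 = 4, n2 = 15) and compares the variance of many simulated differences $\bar{x}_1 - \bar{x}_2$ with σ1²/n1 + σ2²/n2. The last function uses $s_p\sqrt{1/n_1 + 1/n_2}$, the estimator of the standard deviation of $\bar{x}_1 - \bar{x}_2$ given in Mann.

```python
import math
import random
import statistics


def se_from_variance_sum(s1, n1, s2, n2):
    """SE of x̄1 - x̄2 by the variance sum law: sqrt(s1²/n1 + s2²/n2)."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)


def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation s_p: each variance is weighted by its df (n - 1)."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def se_from_pooled_sd(s1, n1, s2, n2):
    """SE of x̄1 - x̄2 based on the pooled SD: s_p * sqrt(1/n1 + 1/n2)."""
    return pooled_sd(s1, n1, s2, n2) * math.sqrt(1 / n1 + 1 / n2)


def simulated_var_of_difference(sigma1, n1, sigma2, n2, reps=10_000, seed=1):
    """Variance of x̄1 - x̄2 over many pairs of independent normal samples (assumed normal, mean 0)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        xbar1 = statistics.fmean(rng.gauss(0, sigma1) for _ in range(n1))
        xbar2 = statistics.fmean(rng.gauss(0, sigma2) for _ in range(n2))
        diffs.append(xbar1 - xbar2)
    return statistics.variance(diffs)


# Variance sum law: the simulated variance should be close to sigma1²/n1 + sigma2²/n2
theory = 3**2 / 10 + 4**2 / 15
sim = simulated_var_of_difference(3, 10, 4, 15)
print(f"theory = {theory:.3f}, simulation = {sim:.3f}")

# Equal sample sizes: the two SE formulas agree
print(se_from_variance_sum(3, 12, 4, 12), se_from_pooled_sd(3, 12, 4, 12))

# Unequal sample sizes (those of the diet soda example below): they differ
print(se_from_variance_sum(3, 14, 4, 16), se_from_pooled_sd(3, 14, 4, 16))
```

With equal sample sizes (12 and 12) the two printed standard errors agree up to floating-point rounding; with sample sizes of 14 and 16 they differ (about 1.282 versus 1.307). This is the sense in which the unpooled formula matches the pooled approach only when the sample sizes are equal.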
Estimator of the Standard Deviation of $\bar{x}_1 - \bar{x}_2$
The estimator of the standard deviation of $\bar{x}_1 - \bar{x}_2$ is
$s_{\bar{x}_1 - \bar{x}_2} = s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$

Degrees of Freedom (df) for the Independent-Samples t Test
The degrees of freedom for the independent-samples t test reflect the size of each sample minus one:
df = n1 + n2 − 2, or df = (n1 − 1) + (n2 − 1), or df = df1 + df2

Example
A sample of 14 cans of Brand I diet soda gave a mean of 23 calories per can with a standard deviation of 3 calories. Another sample of 16 cans of Brand II diet soda gave a mean of 25 calories per can with a standard deviation of 4 calories.

Question
At the 1% significance level, can you conclude that the mean number of calories per can is different for these two brands of diet soda? Assume that the calories per can of diet soda are normally distributed for each of the two brands and that the standard deviations for the two populations are equal.

Solution Step 1:
H0: μ1 − μ2 = 0 (The mean numbers of calories are not different.)
H1: μ1 − μ2 ≠ 0 (The mean numbers of calories are different.)
Solution Step 2:
• The two samples are independent.
• σ1 and σ2 are unknown but equal.
• The sample sizes are small, but both populations are normally distributed.
• We will use the t distribution.
Solution Step 3:
The ≠ sign in the alternative hypothesis indicates that the test is two-tailed.
α = .01
Area in each tail = α/2 = .01/2 = .005
df = n1 + n2 − 2 = 14 + 16 − 2 = 28
The critical values of t are −2.763 and 2.763 (page 483, Best and Kahn).
Solution Step 4:
$s_p = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\dfrac{(14 - 1)(3)^2 + (16 - 1)(4)^2}{14 + 16 - 2}} = 3.57071421$
$s_{\bar{x}_1 - \bar{x}_2} = s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}} = (3.57071421)\sqrt{\dfrac{1}{14} + \dfrac{1}{16}} = 1.30674760$
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}} = \dfrac{(23 - 25) - 0}{1.30674760} = -1.531$
Solution Step 5:
The value of the test statistic, t = −1.531, falls in the nonrejection region. Therefore, we fail to reject the null hypothesis. Consequently, we cannot conclude that the mean numbers of calories per can differ for the two brands of diet soda.

Example 2
A sample of 40 children from New York State showed that the mean time they spend watching television is 28.50 hours per week with a standard deviation of 4 hours. Another sample of 35 children from California showed that the mean time they spend watching television is 23.25 hours per week with a standard deviation of 5 hours.

Question
Using a 2.5% significance level, can you conclude that the mean time spent watching television by children in New York State is greater than that for children in California? Assume that the standard deviations for the two populations are equal.

Solution Step 1:
H0: μ1 − μ2 = 0
H1: μ1 − μ2 > 0
Step 2:
• The two samples are independent.
• The standard deviations of the two populations are unknown but assumed to be equal.
• Both samples are large.
• We use the t distribution to make the test.
Step 3:
α = .025
Area in the right tail of the t distribution = α = .025
df = n1 + n2 − 2 = 40 + 35 − 2 = 73
The critical value of t is 1.993.
Figure 10.4
Step 4:
$s_p = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\dfrac{(40 - 1)(4)^2 + (35 - 1)(5)^2}{40 + 35 - 2}} = 4.49352655$
$s_{\bar{x}_1 - \bar{x}_2} = s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}} = (4.49352655)\sqrt{\dfrac{1}{40} + \dfrac{1}{35}} = 1.04004930$
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}} = \dfrac{(28.50 - 23.25) - 0}{1.04004930} = 5.048$
Step 5:
The value of the test statistic, t = 5.048, falls in the rejection region. Therefore, we reject the null hypothesis H0. Hence, we conclude that children in New York State spend more time, on average, watching TV than children in California.

Using Excel to Calculate t
http://www.ehow.com/video_4983079_use-excels-ttest-function.html?sms_ss=gmail&at_xt=4cc6188f5805f4b4,0

Output from the Independent-Samples t Test
SPSS gives the output for an independent-samples t test in two tables. Suppose we are assessing statistics anxiety in two groups drawn from the population, each with a sample size of 12. Group one is exposed to an innovative teaching strategy and group two to the traditional statistics lecture. The group exposed to the innovative strategy had a mean anxiety level of 5.50 with a standard deviation of 2.78 and a standard error of .802. The group exposed to the traditional lecture had a mean anxiety level of 5.58 with a standard deviation of 2.503 and a standard error of .723.
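The pooled-variance calculations in the two worked examples, and the two rows that SPSS reports for the anxiety data, can be reproduced from summary statistics alone. The sketch below is a minimal illustration in Python (standard library only); the function names are hypothetical. For the "equal variances not assumed" row it uses the Welch–Satterthwaite degrees of freedom, which is the usual basis for that row. Critical values are taken from the slides (±2.763, 1.993 and 2.074) rather than computed. Because only the rounded means 5.50 and 5.58 are given, the anxiety case is set up so that the mean difference equals the −.0833 reported by SPSS.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t test from summary statistics ("equal variances assumed")."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    t = (mean1 - mean2) / se  # tests H0: mu1 - mu2 = 0
    return t, df, se


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Separate-variance t test ("equal variances not assumed"), Welch-Satterthwaite df."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2  # squared standard error of each mean
    se = math.sqrt(v1 + v2)            # variance sum law
    t = (mean1 - mean2) / se
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df, se


# Diet soda example: two-tailed, alpha = .01, critical values ±2.763 (df = 28)
t_soda, df_soda, _ = pooled_t(23, 3, 14, 25, 4, 16)
print(f"Soda: t = {t_soda:.3f}, df = {df_soda}, reject H0: {abs(t_soda) > 2.763}")

# TV example: right-tailed, alpha = .025, critical value 1.993 (df = 73)
t_tv, df_tv, _ = pooled_t(28.50, 4, 40, 23.25, 5, 35)
print(f"TV:   t = {t_tv:.3f}, df = {df_tv}, reject H0: {t_tv > 1.993}")

# Anxiety example: n = 12 per group. mean2 is chosen so that mean1 - mean2
# equals the reported -.0833. With equal n, swapping the SDs does not change t, df or SE.
mean_diff = -0.0833
args = (5.50, 2.77980, 12, 5.50 - mean_diff, 2.50303, 12)
t_eq, df_eq, se_eq = pooled_t(*args)
t_w, df_w, se_w = welch_t(*args)

margin = 2.074 * se_eq  # critical t for df = 22, two-tailed alpha = .05
ci = (mean_diff - margin, mean_diff + margin)
print(f"Equal variances assumed:     t = {t_eq:.3f}, df = {df_eq}, SE = {se_eq:.4f}")
print(f"Equal variances not assumed: t = {t_w:.3f}, df = {df_w:.3f}")
print(f"95% CI of the difference: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The soda test should fail to reject H0 and the TV test should reject it, as in Step 5 of each example. The anxiety values agree with the SPSS table to within rounding; the lower confidence limit can differ from SPSS in the third decimal because 2.074 and −.0833 are themselves rounded.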
SPSS Output (Table 1): Group Statistics for Anxiety
Teaching strategy | N | Mean | Std. Deviation | Std. Error Mean
Lecture | 12 | 5.58 | 2.77980 | .80246
Innovative | 12 | 5.50 | 2.50303 | .72256

Table 2: Independent Samples Test (Anxiety)
Columns F and Sig. are from Levene's Test for Equality of Variances; the remaining columns are from the t-test for Equality of Means.
Row | F | Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI Lower | 95% CI Upper
Equal variances assumed | .201 | .659 | −.077 | 22 | .939 | −.083 | 1.079 | −2.322 | 2.156
Equal variances not assumed | | | −.077 | 21.762 | .939 | −.083 | 1.079 | −2.324 | 2.158

Explanation
Notice that there is more information in the second table. The top row labels the statistics computed; below each label are the values calculated by SPSS. The first column is divided into two rows:
Row 1: equal variances assumed
Row 2: equal variances not assumed
Remember that one of the assumptions of the t test is equal variances. When this assumption is violated, we have the option of using a more conservative estimate (row 2).

Levene's Test for Equality of Variances
F: Levene's test of homogeneity of variances computes a statistic called F. For our data, F = .201.
Sig.: In this column, SPSS reports the significance (p value) of Levene's F. If this value is ≤ .05, we conclude that the variances of the two groups differ significantly. The p value associated with Levene's F is .659. Since .659 is greater than .05, the difference between the variances is not significant. Thus, we do not have evidence that we have violated the assumption of equal variances.

t Test for Equality of Means
Because Levene's test was not significant, we use the top row of the output, labeled "Equal variances assumed".
t: The t value is −.077.
df: There are 22 degrees of freedom.
Sig. (2-tailed): The two-tailed p value for this t is .939. Since our criterion is p ≤ .05 and .939 is greater than .05, the difference in statistics anxiety between students exposed to the innovative teaching strategy and students exposed to the traditional lecture is not significant.
Mean Difference: The difference in anxiety between the samples is −.0833.
Std. Error Difference: The standard error of the mean difference is $s_{\bar{x}_1 - \bar{x}_2} = 1.0798$.
95% Confidence Interval of the Difference: Another way of determining whether there is a significant difference between the two means is to compute a confidence band around the observed mean difference. If 0 falls within the band, the difference is not significant; if the band does not include 0, the difference is significant (at the .05 level for a 95% band).
Lower: The lower limit of the band is −2.322.
Upper: The upper limit of the band is 2.156.
Since 0 falls within the 95% confidence band, the difference between the two samples' statistics anxiety levels is not significant.

What Do the Results Say?
Based on our two samples, t = −.077. The absolute value of our calculated t does not exceed the critical value of t = ±2.074 (df = 22, α = .05, two-tailed). Thus, we fail to reject the null hypothesis. There is no significant difference in statistics anxiety between students exposed to traditional lecture methods and those exposed to innovative strategies. If the results had indicated a significant difference, we would compute an effect size using Cohen's d.

Presenting Results Using APA Format
You then present your results:
Students who were exposed to innovative strategies had a mean statistics anxiety level of 5.50 (s = 2.78), while that of students exposed to traditional lectures was 5.58 (s = 2.50). Statistics anxiety levels did not differ significantly between students exposed to traditional teaching methods and those exposed to innovative methods within this study sample, t(22) = −.077, p > .05.
Or: No significant difference was found between students exposed to traditional teaching methods and those exposed to an innovative strategy, t(22) = −.077, p = .939.

In-class Exercise
Best and Kahn, page 444, nos. 8 and 9