Hypothesis Tests: Two Independent Samples Cal State Northridge 320 Andrew Ainsworth PhD 2 Major Points Psy 320 - Cal State Northridge • What are independent samples? • Distribution of differences between means • An example • Heterogeneity of Variance • Effect size • Confidence limits 3 Psy 320 - Cal State Northridge Independent Samples • Samples are independent if the participants in each sample are not related in any way • Samples can be considered independent for 2 basic reasons ▫ First, samples randomly selected from 2 separate populations 4 Independent Samples Psy 320 - Cal State Northridge • Getting Independent Samples ▫ Method #1 Random Selection Population #2 Random Selection Population #1 Sample #1 Sample #2 5 Psy 320 - Cal State Northridge Independent Samples • Samples can be considered independent for 2 basic reasons ▫ Second, participants are randomly selected from a single population ▫ Subjects are then randomly assigned to one of 2 samples 6 Independent Samples Psy 320 - Cal State Northridge • Getting Independent Samples ▫ Method #2 Population Random Selection and Random Assignment Sample #1 Sample #2 7 Psy 320 - Cal State Northridge Sampling Distributions Again • If we draw all possible pairs of independent samples of size n1 and n2, respectively, from two populations. • Calculate a mean X 1 and X 2 in each one. • Record the difference between these values. • What have we created? • The Sampling Distribution of the Difference Between Means 8 Psy 320 - Cal State Northridge Sampling Distribution of the Difference Between Means Population 1 Population 2 Calculate Statistic 1 Sample 1 Sample 2 X11 X21 2 Sample 1 Sample 2 X12 X22 3 Sample 1 Sample 2 X13 X23 ... ... Sample 1 Sample 2 ∞ X1 X2 • Mean of sampling distribution X 1 is 1 • Mean of sampling distribution X 2 is 2 • So, the mean of a sampling distribution X 1 X 2 is 1- 2 9 Psy 320 - Cal State Northridge Sampling Distribution of the Difference Between Means • Shape ▫ approximately normal if both populations are approximately normal ▫ or N1 and N2 are reasonably large. 10 Psy 320 - Cal State Northridge Sampling Distribution of the Difference Between Means • Variability (variance sum rule) ▫ The variance of the sum or difference of 2 independent samples is equal to the sum of their variances 2 X1 X 2 X X 1 2 2 X1 2 X1 n1 2 X2 2 X2 n2 X2 n1 1 X2 2 n2 ; Remembering that X 1 Think Pathagoras c 2 a 2 b 2 X 1 n1 11 Psy 320 - Cal State Northridge Sampling Distribution of the Difference Between Means • Sampling distribution of X 1 X 2 12 n1 1 - 2 22 n2 12 Psy 320 - Cal State Northridge Converting Mean Differences into ZScores • In any normal distribution with a known mean and standard deviation, we can calculate z-scores to estimate probabilities. z ( X 1 X 2 ) ( 1 2 ) X X 1 2 • What if we don’t know the ’s? • You got it, we guess with s! 13 Psy 320 - Cal State Northridge Converting Mean Differences into t-scores • We can use • If, sX1 X 2 to estimate X X 1 2 ( X 1 X 2 ) ( 1 2 ) t s X1 X 2 ▫ The variances of the samples are roughly equal (homogeneity of variance) ▫ We estimate the common variance by pooling (averaging) 14 Psy 320 - Cal State Northridge Homogeneity of Variance • We assume that the variances are equal because: ▫ If they’re from the same population they should be equal (approximately) ▫ If they’re from different populations they need to be similar enough for us to justify comparing them (“apples to oranges” and all that) 15 Psy 320 - Cal State Northridge Pooling Variances • So far, we have taken 1 sample estimate (e.g. mean, SD, variance) and used it as our best estimate of the corresponding population parameter • Now we have 2 samples, the average of the 2 sample statistics is a better estimate of the parameter than either alone 16 Psy 320 - Cal State Northridge Pooling Variances • Also, if there are 2 groups and one groups is larger than it should “give” more to the average estimate (because it is a better estimate itself) • Assuming equal variances and pooling allows us to use an estimate of the population that is smaller than if we did not assume they were equal 17 Psy 320 - Cal State Northridge Calculating Pooled Variance • Remembering that SS SS s df n 1 2 sp2 is the pooled estimate: we pool SS and df to get s 2 p SS pooled df pooled SS1 SS 2 SS1 SS 2 SS1 SS 2 df1 df 2 (n1 1) (n2 1) n1 n2 2 18 Psy 320 - Cal State Northridge Calculating Pooled Variance SS1 SS 2 s n1 n2 2 2 p Method 1 SS1 ( X i1 X 1 ) Method 2 SS1 (n1 1) s 2 SS 2 ( X i 2 X 2 ) 2 1 2 SS 2 (n2 1) s 2 2 19 Psy 320 - Cal State Northridge Calculating Pooled Variance • When sample sizes are equal (n1 = n2) it is simply the average of the 2 2 2 s 2 p s X1 s X 2 2 • When unequal n we need a weighted average (n1 1) s (n2 1) s s n1 n2 2 2 p 2 1 2 2 20 Psy 320 - Cal State Northridge Calculating SE for the Differences Between Means s s X1 X 2 2 p n1 s 2 p n2 • Once we calculate the pooled variance we just substitute it in for the sigmas earlier • If n1 = n2 then s X1 X 2 s 2 p s 2 p 2 1 2 2 s s n1 n2 n1 n2 21 Psy 320 - Cal State Northridge Example: Violent Videos Games • Does playing violent video games increase aggressive behavior • Two independent randomly selected/assigned groups ▫ Grand Theft Auto: San Andreas (violent: 8 subjects) VS. NBA 2K7 (non-violent: 10 subjects) ▫ We want to compare mean number of aggressive behaviors following game play 22 Psy 320 - Cal State Northridge Hypotheses (Steps 1 and 2) • 2-tailed ▫ h0: 1 - 2 = 0 or Type of video game has no impact on aggressive behaviors ▫ h1: 1 - 2 0 or Type of video game has an impact on aggressive behaviors • 1-tailed ▫ h0: 1 - 2 ≤ 0 or GTA leads to the same or less aggressive behaviors as NBA ▫ h1: 1 - 2 > 0 or GTA leads to more aggressive behaviors than NBA 23 Psy 320 - Cal State Northridge Distribution, Test and Alpha (Steps 3 and 4) • We don’t know for either group, so we cannot perform a Z-test • We have to estimate with s so it is a t-test (t distribution) • There are 2 independent groups • So, an independent samples t-test • Assume alpha = .05 24 Psy 320 - Cal State Northridge Decision Rule (Step 5) • dftotal = (n1 – 1) + (n2 – 1) = n1 + n2 – 2 ▫ Group 1 has 8 subjects – df = 8 – 1= 7 ▫ Group 2 has 10 subjects – df = 10 – 1 = 9 • dftotal = (n1 – 1) + (n2 – 1) = 7 + 9 = 16 OR dftotal = n1 + n2 – 2 = 8 + 10 – 2 = 16 • 1-tailed t.05(16) = ____; If to > ____ reject 1-tailed • 2-tailed t.05(16) = ____; If to > ____ reject 2-tailed df 1-tailed 2-tailed 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 0.1 0.2 3.078 1.886 1.683 1.533 1.476 1.44 1.415 1.397 1.383 1.372 1.363 1.356 1.35 1.345 1.341 1.337 1.333 1.33 1.328 1.325 1.323 1.321 1.319 1.318 1.316 Critical Values of Student's t 0.05 0.025 0.01 0.005 0.0005 0.1 0.05 0.02 0.01 0.001 6.314 12.706 31.821 63.657 636.619 2.92 4.303 6.965 9.925 31.598 2.353 3.182 4.5415 5.841 12.941 2.132 2.776 3.747 4.604 8.61 2.015 2.571 3.365 4.032 6.859 1.943 2.447 3.143 3.707 5.959 1.895 2.365 2.998 3.499 5.405 1.86 2.306 2.896 3.355 5.041 1.833 2.262 2.821 3.25 4.781 1.812 2.228 2.764 3.169 4.587 1.796 2.201 2.718 3.106 4.437 1.782 2.179 2.681 3.055 4.318 1.771 2.16 2.65 3.012 4.221 1.761 2.145 2.624 2.977 4.14 1.753 2.131 2.602 2.947 4.073 1.746 2.12 2.583 2.921 4.015 1.74 2.11 2.567 2.898 3.965 1.734 2.101 2.552 2.878 3.922 1.729 2.093 2.539 2.861 4.883 1.725 2.086 2.528 2.845 3.85 1.721 2.08 2.518 2.831 3.819 1.717 2.074 2.508 2.819 3.792 1.714 2.069 2.5 2.807 3.767 1.711 2.064 2.492 2.797 3.745 1.708 2.06 2.485 2.787 3.725 25 Psy 320 - Cal State Northridge t Distribution 26 Psy 320 - Cal State Northridge Calculate to (Step 6) Data Subject 1 2 3 4 5 6 7 8 9 10 GTA 12 9 11 13 8 10 9 10 NBA 9 9 8 6 11 7 8 11 8 7 Mean s s2 10.25 1.669 2.786 8.4 1.647 2.713 27 Psy 320 - Cal State Northridge Calculate to (Step 6) • We have means of 10.25 (GTA) and 8.4 (NBA), but both of these are sample means. • We want to test differences between sample means. ▫ Not between a sample and a population mean 28 Psy 320 - Cal State Northridge Calculate to (Step 6) • The t for 2 groups is: ( X 1 X 2 ) ( 1 2 ) t s X1 X 2 • But under the null 1 - 2 = 0, so: ( X1 X 2 ) t s X1 X 2 29 Psy 320 - Cal State Northridge Calculate to (Step 6) • We need to calculate s X1 X 2 and to do that we need to first calculate s 2 p (n1 1) s n2 1 s __(____) __ ____ s n1 n2 2 __ __ 2 2 p 2 1 2 2 _____ _____ _____ ____ __ __ 30 Calculate to (Step 6) Psy 320 - Cal State Northridge 2 p • Since s is _____ we can insert this into the equation s X X 1 s X1 X 2 2 s 2 p n1 s 2 p n2 ____ ____ __ __ ___ ___ ____ Therefore, X 1 X 2 ___ ___ ___ to ___ s X1 X 2 ___ ___ 31 Psy 320 - Cal State Northridge Make your decision (Step 7) • Since ____ > ____ we would reject the null hypothesis under a 2-tailed test • Since ____ > ____ we would reject the null hypothesis under a 1-tailed test • There is evidence that violent video games increase aggressive behaviors 32 Psy 320 - Cal State Northridge Heterogeneous Variances • Refers to case of unequal population variances • We don’t pool the sample variances just add them • We adjust df and look t up in tables for adjusted df • For adjustment use Minimum df = smaller n - 1 33 Psy 320 - Cal State Northridge Effect Size for Two Groups • Extension of what we already know. • We can often simply express the effect as the difference between means (e.g. 1.85) • We can scale the difference by the size of the standard deviation. ▫ Gives d (aka Cohen’s d) ▫ Which standard deviation? 34 Psy 320 - Cal State Northridge Effect Size • Use either standard deviation or their pooled average. ▫ We will pool because neither has any claim to priority. ▫ Pooled st. dev. = 2.745 1.657 35 Effect Size, cont. Psy 320 - Cal State Northridge X viol X nonviol 10.25 8.4 d s pooled 1.657 1.85 1.12 1.657 • This difference is approximately 1.12 standard deviations • This is a very large effect 36 Psy 320 - Cal State Northridge Confidence Limits CI.95 X 1 X 2 t.05,2tailed * s X 1 X 2 (____ ____) ____ ____ ____ ____ ____ ____ 37 Psy 320 - Cal State Northridge Confidence Limits • p = .95 that based on the sample values the interval formed in this way includes the true value of 1 - 2 • The probability that interval includes 1 - 2 given the value of X 1 X 2 • Does 0 fall in the interval? What does that mean? 38 Psy 320 - Cal State Northridge Computer Example • SPSS reproduces the t value and the confidence limits. • All other statistics are the same as well. • Note different df depending on homogeneity of variance. • Remember: Don’t pool if heterogeneity and modify degrees of freedom 39 SPSS output Psy 320 - Cal State Northridge Group Statistics AGGRESS GAME 1.00 gta 2.00 nba N Mean 10.2500 8.4000 8 10 Std. Deviation 1.66905 1.64655 Std. Error Mean .59010 .52068 Independent Samples Test Levene's Test for Equality of Variances F AGGRESS Equal variances assumed Equal variances not assumed .005 Sig. .942 t-test for Equality of Means t df Sig. (2-tailed) Mean Difference Std. Error Difference 95% Confidence Interval of the Difference Lower Upper 2.355 16 .032 1.8500 .78571 .18436 3.51564 2.351 15.048 .033 1.8500 .78697 .17308 3.52692