Class 14
Testing Hypotheses about Means
Paired samples
10.3, pp. 419-425

Weight (in pounds) of 72 anorexic patients before and after treatment

Patients 1-24
Weight before  80.7 89.4 91.8 74.0 78.1 88.3 87.3 75.1 80.6 78.4 77.6 88.7 81.3 78.1 70.5 77.3 85.2 86.0 81.4 79.7 85.5 84.4 79.0 77.5
Weight after   80.2 81.0 86.4 86.3 76.1 78.1 75.1 86.7 73.5 84.6 77.4 79.5 89.6 81.4 81.8 77.3 84.2 75.4 79.5 73.0 88.3 84.7 81.4 81.2

Patients 25-48
Weight before  72.3 89.0 80.5 84.9 81.5 82.6 79.9 88.7 94.9 76.3 81.0 80.5 85.0 89.2 81.3 76.5 70.0 80.4 83.3 83.0 87.7 84.2 86.4 76.5
Weight after   88.2 78.8 82.2 85.6 81.4 81.9 76.4 103.6 98.4 93.4 73.4 82.1 96.7 95.3 82.4 72.5 90.9 71.3 85.4 81.6 89.1 83.9 82.7 75.7

Patients 49-72
Weight before  80.2 87.8 83.3 79.7 84.5 80.8 87.4 83.6 83.3 86.0 82.5 86.7 79.6 76.9 94.2 73.4 80.5 81.6 82.1 77.6 83.5 89.9 86.0 87.3
Weight after   82.6 100.4 85.2 83.6 84.6 86.2 86.7 95.2 94.3 91.5 91.9 100.3 76.7 76.8 101.6 94.9 75.2 77.3 95.5 90.7 92.5 93.8 91.7 98.0

Data/Data Analysis/Descriptive Statistics/Summary Statistics and Confidence Level for Mean

                           Before     After
Mean                       82.36      85.04
Standard Error             0.61       0.93
Median                     81.85      84.05
Mode                       86         81.4
Standard Deviation         5.184      7.927
Sample Variance            26.875     62.838
Kurtosis                   -0.007     -0.614
Skewness                   -0.022     0.408
Range                      24.9       32.3
Minimum                    70         71.3
Maximum                    94.9       103.6
Sum                        5929.9     6122.8
Count                      72         72
Confidence Level(95.0%)    1.218      1.863

• Standard Error = s/n^.5. For the After column: 7.9/72^.5 = 0.93.
• 82.36 +/- 1.218 is the 95% confidence interval for the mean weight before treatment.

Test Statistic
H0: μb = μa
Ha: μa > μb

$$ s_{pooled} = \sqrt{\frac{71(26.875) + 71(62.838)}{142}} = 6.6975 $$

$$ t = \frac{85.04 - 82.36}{6.6975 \times \sqrt{\frac{1}{72} + \frac{1}{72}}} = 2.40 $$

P-value = t.dist.rt(2.40,142) = 0.0088

To run the same test with the Data Analysis tool, the data must be in two columns.
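The hand calculation of the pooled test statistic can be checked outside Excel. The following is a minimal Python sketch, assuming only the summary statistics reported above (means, sample variances, n = 72 per group). The helper names `pooled_sd`, `t_pdf` and `t_upper_tail` are illustrative, not part of any library, and the upper-tail probability is approximated by Simpson's rule rather than by Excel's own T.DIST.RT routine.

```python
import math

def pooled_sd(n1, var1, n2, var2):
    """Pooled standard deviation of two independent samples."""
    return math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) with Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 - area if t >= 0 else 0.5 + area

# Summary statistics taken from the Descriptive Statistics output
n = 72
mean_before, var_before = 82.36, 26.875
mean_after, var_after = 85.04, 62.838

sp = pooled_sd(n, var_before, n, var_after)
t_stat = (mean_after - mean_before) / (sp * math.sqrt(1 / n + 1 / n))
df = 2 * n - 2
p_value = t_upper_tail(t_stat, df)  # one-tail, Ha: mu_a > mu_b

print(f"s_pooled = {sp:.4f}, t = {t_stat:.2f}, df = {df}, p = {p_value:.4f}")
```

Because the means are rounded to two decimals here, the result may differ from the Excel output in the last digit or two.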
t-Test: Two-Sample Assuming Equal Variances

                                After      Before
Mean                            85.039     82.360
Variance                        62.838     26.875
Observations                    72         72
Pooled Variance                 44.857
Hypothesized Mean Difference    0.000
df                              142
t Stat                          2.400
P(T<=t) one-tail                0.00884
t Critical one-tail             1.656
P(T<=t) two-tail                0.018
t Critical two-tail             1.977

Same as the previous slide! If this is all you want, =t.test() is for you!

The 2-sample t-test we just did is VALID. But we can do better….. by taking advantage of our paired data.

Paired Data
• n1 must equal n2.
• For each of the before values, there must be a corresponding after value for the same element.
  – Here the data elements are the patients, and the paired nature of the data is OBVIOUS.
• Using a paired test when the data are paired USUALLY leads to a valid and LOWER p-value.
  – Because s1 and s2 (the standard deviations of each group) do NOT enter into the "equation".
  – Instead, we use the sample standard deviation of the n differences…which is usually "pretty" small.
• Instead of dealing with the variation in weights across the patients (s1 and s2), we deal only with the variation in pounds gained.
  – 90 to 92 and 45 to 47 are both gains of 2.

H0: μb = μa
Ha: μa > μb

t-Test: Paired Two Sample for Means

                                After      Before
Mean                            85.039     82.36
Variance                        62.838     26.875
Observations                    72         72
Pearson Correlation             0.3498
Hypothesized Mean Difference    0
df                              71
t Stat                          2.9116
P(T<=t) one-tail                0.0024
t Critical one-tail             1.6666
P(T<=t) two-tail                0.0048
t Critical two-tail             1.9939

Better than before! The one-tail p-value drops from 0.0088 to 0.0024.

If all you want is the p-value…..
=t.test(array1,array2,1,1) takes you directly to the p-value.
  – The third argument: 1 for 1-tail.
  – The fourth argument: 1 for paired.

A paired two-sample t-test for means
is equivalent to
a one-sample t-test of H0: μa−b = 0, where a−b is each patient's after-minus-before difference.
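One way to see why pairing helps is the identity Var(a − b) = Var(a) + Var(b) − 2·r·s_a·s_b: a positive correlation between before and after weights shrinks the variance of the differences. The sketch below is an illustration in Python, assuming only the figures printed in the two Excel outputs (variances, Pearson correlation, n = 72). The variable names are made up for this example, and the critical value 1.6666 is copied from the paired output rather than computed.

```python
import math

# Figures copied from the Excel outputs
n = 72
mean_after, mean_before = 85.039, 82.360
var_after, var_before = 62.838, 26.875
r = 0.3498                 # Pearson correlation between before and after
t_crit_one_tail = 1.6666   # one-tail critical value, df = 71, from the paired output

mean_diff = mean_after - mean_before

# Unpaired (pooled) standard error: uses the spread of weights across patients
pooled_var = ((n - 1) * var_after + (n - 1) * var_before) / (2 * n - 2)
se_pooled = math.sqrt(pooled_var * (1 / n + 1 / n))
t_pooled = mean_diff / se_pooled

# Paired standard error: uses only the spread of the n differences
var_diff = var_after + var_before - 2 * r * math.sqrt(var_after * var_before)
sd_diff = math.sqrt(var_diff)
se_paired = sd_diff / math.sqrt(n)
t_paired = mean_diff / se_paired

print(f"pooled: se = {se_pooled:.3f}, t = {t_pooled:.3f}")
print(f"paired: sd of differences = {sd_diff:.3f}, se = {se_paired:.3f}, t = {t_paired:.3f}")
print("reject H0 at the 5% level (one-tail):", t_paired > t_crit_one_tail)
```

With the same mean difference in the numerator, the smaller paired standard error is what produces the larger t statistic. If the correlation were zero or negative, the paired standard error would not be smaller, which is why the slide says USUALLY.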
ID    Group   Before   After   Aft-Before
1     1       80.7     80.2    -0.5
2     1       89.4     81.0    -8.4
3     1       91.8     86.4    -5.4
4     1       74.0     86.3    12.3
5     1       78.1     76.1    -2.0
6     1       88.3     78.1    -10.2
...
67    3       82.1     95.5    13.4
68    3       77.6     90.7    13.1
69    3       83.5     92.5    9.0
70    3       89.9     93.8    3.9
71    3       86.0     91.7    5.7
72    3       87.3     98.0    10.7

Summary of the Aft-Before column
Average           2.679167
count             72
stdev             7.807796
standard error    0.920158
t-stat            2.911639   (2.68/.92)
dof               71
p-value           0.002401

Case: The Sophomore Jinx

The Data….

Exhibit 1
American League Rookie Award Data, Non Pitchers
(BA and SA are shown in thousandths, e.g. 306 = .306)

                               Rookie Year             Sophomore Year
Year   Player                  G     AB    BA    SA    G     AB    BA    SA
1949   Roy Sievers             140   471   306   471   113   370   238   395
1950   Walter Dropo            136   559   322   583   99    360   239   369
1951   Gilbert McDougald       131   402   306   488   152   555   263   369
1953   Harvey Kuenn            155   679   308   386   155   656   306   390
...
1998   Ben Grieve              155   583   288   458   148   486   265   481
1999   Carlos Beltran          156   663   293   454   98    372   247   366
2001   Ichiro Suzuki           157   692   350   457   157   647   321   425
2002   Eric Hinske             151   566   279   481   124   449   243   437
2003   Angel Berroa            158   567   287   451   134   512   262   385

Exhibit 2
National League Non-Pitchers

                               Rookie Year             Sophomore Year
Year   Player                  G     AB    BA    SA    G     AB    BA    SA
1950   Samuel Jethroe          141   582   273   442   148   572   280   460
1951   Willie Mays             121   464   274   472   34    127   236   409
1953   James Gilliam           151   605   278   415   146   607   282   418
1954   Wallace Moon            151   635   304   435   152   593   295   459
1955   William Virdon          144   534   281   433   157   580   319   445
...
1996   Todd Hollandsworth      149   478   291   437   106   296   247   368
1997   Scott Rolen             156   561   283   469   160   601   290   532
2000   Rafael Furcal           131   455   295   382   79    324   275   370
2001   Albert Pujols           161   590   329   610   157   590   314   561

H0:
Ha:
Test Statistic
P-value and Conclusion
additional notes….