Class 14 notes

Class 14
Testing Hypotheses about Means
Paired samples
10.3 p 419-425
Weight (in pounds) of 72 anorexic patients before and after treatment

Weight  Weight   Weight  Weight   Weight  Weight
before  after    before  after    before  after
 80.7    80.2     72.3    88.2     80.2    82.6
 89.4    81.0     89.0    78.8     87.8   100.4
 91.8    86.4     80.5    82.2     83.3    85.2
 74.0    86.3     84.9    85.6     79.7    83.6
 78.1    76.1     81.5    81.4     84.5    84.6
 88.3    78.1     82.6    81.9     80.8    86.2
 87.3    75.1     79.9    76.4     87.4    86.7
 75.1    86.7     88.7   103.6     83.6    95.2
 80.6    73.5     94.9    98.4     83.3    94.3
 78.4    84.6     76.3    93.4     86.0    91.5
 77.6    77.4     81.0    73.4     82.5    91.9
 88.7    79.5     80.5    82.1     86.7   100.3
 81.3    89.6     85.0    96.7     79.6    76.7
 78.1    81.4     89.2    95.3     76.9    76.8
 70.5    81.8     81.3    82.4     94.2   101.6
 77.3    77.3     76.5    72.5     73.4    94.9
 85.2    84.2     70.0    90.9     80.5    75.2
 86.0    75.4     80.4    71.3     81.6    77.3
 81.4    79.5     83.3    85.4     82.1    95.5
 79.7    73.0     83.0    81.6     77.6    90.7
 85.5    88.3     87.7    89.1     83.5    92.5
 84.4    84.7     84.2    83.9     89.9    93.8
 79.0    81.4     86.4    82.7     86.0    91.7
 77.5    81.2     76.5    75.7     87.3    98.0
Data/Data Analysis/Descriptive Statistics/Summary Statistics and Confidence Level for Mean

                            Before      After
Mean                         82.36      85.04
Standard Error                0.61       0.93
Median                       81.85      84.05
Mode                         86         81.4
Standard Deviation            5.184      7.927
Sample Variance              26.875     62.838
Kurtosis                     -0.007     -0.614
Skewness                     -0.022      0.408
Range                        24.9       32.3
Minimum                      70         71.3
Maximum                      94.9      103.6
Sum                        5929.9     6122.8
Count                        72         72
Confidence Level(95.0%)       1.218      1.863

Standard Error = s/n^.5; for After, 7.9/72^.5 = 0.93.
82.36 +/- 1.218 is the 95% confidence interval for the mean weight before treatment.
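As a rough check, the standard error and the 95% margin can be recomputed from the summary statistics. This is only a sketch: the helper name `margin_of_error` is made up for illustration, and the critical value 1.9939 (two-tail, 71 degrees of freedom) is copied from the paired t-test output in these notes rather than computed.

```python
import math

def margin_of_error(stdev, n, t_crit):
    """Return (standard error, margin) for a confidence interval on a mean."""
    se = stdev / math.sqrt(n)  # s / n^.5
    return se, t_crit * se

# Assumption: two-tail critical value for df = 71, taken from the Excel output.
T_CRIT_95_DF71 = 1.9939

se_before, m_before = margin_of_error(5.184, 72, T_CRIT_95_DF71)
se_after, m_after = margin_of_error(7.927, 72, T_CRIT_95_DF71)

print(f"Before: SE = {se_before:.2f}, interval 82.36 +/- {m_before:.3f}")
print(f"After:  SE = {se_after:.2f}, interval 85.04 +/- {m_after:.3f}")
```

The margins should agree with the "Confidence Level(95.0%)" row of the Descriptive Statistics output to about three decimals.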
Test Statistic
H0: μb = μa
Ha: μa > μb
\[
s_{\text{pooled}} = \sqrt{\frac{71(26.875) + 71(62.838)}{142}} = 6.6975
\]
\[
t = \frac{85.04 - 82.36}{6.6975 \times \sqrt{\frac{1}{72} + \frac{1}{72}}} = 2.40
\]
P-value = t.dist.rt(2.40,142) = 0.0088
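The pooled calculation can be reproduced from the summary statistics alone. The sketch below uses only the Python standard library; `t_upper_tail` is a hypothetical helper that approximates what Excel's t.dist.rt returns by integrating the Student t density numerically (Simpson's rule), so small differences in the last digits are possible.

```python
import math

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 by Simpson's rule on the t density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3  # area above t = 0.5 minus area from 0 to t

# Summary statistics from the Descriptive Statistics output
n_b, mean_b, var_b = 72, 82.36, 26.875   # Before
n_a, mean_a, var_a = 72, 85.04, 62.838   # After

df = n_a + n_b - 2
s_pooled = math.sqrt(((n_b - 1) * var_b + (n_a - 1) * var_a) / df)
t_stat = (mean_a - mean_b) / (s_pooled * math.sqrt(1 / n_a + 1 / n_b))
p_value = t_upper_tail(t_stat, df)  # one-tail, matching Ha: mu_a > mu_b

print(f"s_pooled = {s_pooled:.4f}, t = {t_stat:.2f}, df = {df}, p = {p_value:.4f}")
```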
H0: μb = μa
Ha: μa > μb
Data must be in two columns.

t-Test: Two-Sample Assuming Equal Variances
                                  After     Before
Mean                             85.039     82.360
Variance                         62.838     26.875
Observations                     72         72
Pooled Variance                  44.857
Hypothesized Mean Difference      0.000
df                              142
t Stat                            2.400
P(T<=t) one-tail                  0.00884
t Critical one-tail               1.656
P(T<=t) two-tail                  0.018
t Critical two-tail               1.977

The t Stat and one-tail p-value are the same as on the previous slide!
If this is all you want, =t.test() is for you!
The 2-sample t-test we just did is VALID.
But we can do better by taking advantage of our paired data.
Paired Data
• n1 must equal n2.
• For each of the before values, there must be a corresponding after value for the same element.
  – Here the data elements are the patients, and the paired nature of the data is OBVIOUS.
• Using a paired test when the data are paired USUALLY leads to a valid and LOWER p-value.
  – Because s1 and s2 (the standard deviations of each group) do NOT enter into the “equation”.
  – Instead, we use the sample standard deviation of the n differences, which is usually “pretty” small.
    • Instead of dealing with the variation in weights across the patients (s1 and s2), we deal only with the variation in pounds gained.
      – 90 to 92 and 45 to 47 are both gains of 2.
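A small made-up example may help show why the paired statistic is usually larger. The weights below are hypothetical (not the class data): patients differ a lot from one another, but each gains only a pound or three. The sketch computes the pooled two-sample statistic and the paired statistic side by side.

```python
import math
import statistics

# Hypothetical weights: large spread across patients, similar gains.
before = [45, 60, 75, 90, 105]
after = [47, 63, 76, 93, 106]
n = len(before)

# Two-sample (pooled) statistic: s1 and s2 enter through the pooled variance.
pooled_var = ((n - 1) * statistics.variance(before)
              + (n - 1) * statistics.variance(after)) / (2 * n - 2)
t_pooled = (statistics.mean(after) - statistics.mean(before)) / math.sqrt(
    pooled_var * (1 / n + 1 / n))

# Paired statistic: only the standard deviation of the n differences enters.
diffs = [a - b for a, b in zip(after, before)]
t_paired = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

print(f"pooled t = {t_pooled:.3f}, paired t = {t_paired:.3f}")
```

With these made-up numbers the pooled statistic is about 0.13 while the paired statistic is about 4.47, because the variation across patients swamps the gain in the first calculation but drops out of the second.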
H0: μb = μa
Ha: μa > μb
t-Test: Paired Two Sample for Means
                                  After     Before
Mean                             85.039     82.36
Variance                         62.838     26.875
Observations                     72         72
Pearson Correlation               0.3498
Hypothesized Mean Difference      0
df                               71
t Stat                            2.9116
P(T<=t) one-tail                  0.0024
t Critical one-tail               1.6666
P(T<=t) two-tail                  0.0048
t Critical two-tail               1.9939

Better than before!
H0: μb = μa
Ha: μa > μb
If all you want is the p-value…
=t.test(array1,array2,1,1) takes you directly to the p-value.
– Third argument: 1 for 1-tail
– Fourth argument: 1 for paired
H0: μb = μa
Ha: μa > μb
A paired two-sample t-test for means is equivalent to a one-sample t-test of H0: μA-B = 0 on the After - Before differences.
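In symbols, the equivalence can be sketched as follows. The notation ($a_i$, $b_i$, $d_i$, $\bar{d}$, $s_d$) is introduced here for illustration and does not appear on the slides: $a_i$ and $b_i$ are the after and before weights of patient $i$, and $s_d$ is the sample standard deviation of the $n$ differences.

```latex
\[
d_i = a_i - b_i, \qquad
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i = \bar{a} - \bar{b}, \qquad
t = \frac{\bar{d} - 0}{s_d / \sqrt{n}}, \qquad
\text{df} = n - 1 .
\]
```

The numerator is the same difference in means as in the two-sample test; only the denominator and the degrees of freedom change.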
ID    Group   Before   After   Aft-Before
1     1       80.7     80.2      -0.5
2     1       89.4     81        -8.4
3     1       91.8     86.4      -5.4
4     1       74       86.3      12.3
5     1       78.1     76.1      -2
6     1       88.3     78.1     -10.2
...
67    3       82.1     95.5      13.4
68    3       77.6     90.7      13.1
69    3       83.5     92.5       9
70    3       89.9     93.8       3.9
71    3       86       91.7       5.7
72    3       87.3     98        10.7

Summary of the Aft-Before column:
Average           2.679167
count             72
stdev             7.807796
standard error    0.920158
t-stat            2.911639   (2.68/.92)
dof               71
p-value           0.002401
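The same one-sample calculation on the differences can be sketched in Python from the summary values above. As an assumption, the p-value is approximated by numerically integrating the Student t density with a hypothetical helper, `t_upper_tail`, instead of calling a statistics library, so the last digits may differ slightly from Excel's.

```python
import math

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 by Simpson's rule on the t density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

# Summary of the Aft-Before differences, as reported in the notes
mean_diff, sd_diff, n = 2.679167, 7.807796, 72

se = sd_diff / math.sqrt(n)        # standard error of the mean difference
t_stat = (mean_diff - 0) / se      # H0: mean difference = 0
dof = n - 1
p_value = t_upper_tail(t_stat, dof)  # one-tail, Ha: mean difference > 0

print(f"se = {se:.6f}, t = {t_stat:.4f}, dof = {dof}, p = {p_value:.4f}")
```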
Case: The Sophomore Jinx
The Data….
G = games, AB = at bats, BA = batting average, SA = slugging average (BA and SA shown x 1000).

Exhibit 1
American League Rookie Award Data, Non Pitchers
                                 Rookie Year            Sophomore Year
Year  Player                  G    AB   BA   SA       G    AB   BA   SA
1949  Roy Sievers            140  471  306  471      113  370  238  395
1950  Walter Dropo           136  559  322  583       99  360  239  369
1951  Gilbert McDougald      131  402  306  488      152  555  263  369
1953  Harvey Kuenn           155  679  308  386      155  656  306  390
...
1998  Ben Grieve             155  583  288  458      148  486  265  481
1999  Carlos Beltran         156  663  293  454       98  372  247  366
2001  Ichiro Suzuki          157  692  350  457      157  647  321  425
2002  Eric Hinske            151  566  279  481      124  449  243  437
2003  Angel Berroa           158  567  287  451      134  512  262  385

Exhibit 2
National League Non-Pitchers
                                 Rookie Year            Sophomore Year
Year  Player                  G    AB   BA   SA       G    AB   BA   SA
1950  Samuel Jethroe         141  582  273  442      148  572  280  460
1951  Willie Mays            121  464  274  472       34  127  236  409
1953  James Gilliam          151  605  278  415      146  607  282  418
1954  Wallace Moon           151  635  304  435      152  593  295  459
1955  William Virdon         144  534  281  433      157  580  319  445
...
1996  Todd Hollandsworth     149  478  291  437      106  296  247  368
1997  Scott Rolen            156  561  283  469      160  601  290  532
2000  Rafael Furcal          131  455  295  382       79  324  275  370
2001  Albert Pujols          161  590  329  610      157  590  314  561
H0:
Ha:
P-value and Conclusion
Test Statistic
additional notes….