Statistical Methods I
Hypothesis Development and Ttests
Self Check – Answers
Question 1: Researchers wanted to test the folklore that women who were not given
information regarding the gender of their unborn child could accurately guess the
gender at levels greater than chance. To test this, they asked a sample of 104
pregnant women to guess the sex of their babies. Of these, 67 guessed correctly.
a) Develop the appropriate null and alternative hypothesis statements for this test.
Since there are two outcomes (male/female), guessing would most likely result in a 50%
“success” rate. If we use this as our benchmark, then “success” in determining gender
better than 50% would be better than guessing. Therefore, the null and alternative
hypotheses (or the “claim”) are:
H0: p ≤ 50%
Ha: p > 50%
b) Test this hypothesis using alpha = .05.
Without a dataset, it is easier just to execute this one by hand (yes it is). Here is the
math:
Z = (.6442 - .50)/SQRT((.6442*.3558)/104) (note that the .6442 comes from 67/104)
Z = .1442/.0469 = 3.07.
This indicates that the sample proportion of .6442 is more than 3 standard errors above
the benchmark of .50 established for guessing.
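The hand calculation can also be reproduced in a short SAS DATA step. This is only a sketch and is not part of the attached code; the variable names (x, n, p0, phat, se_hat, se_null) are made up for illustration. It follows the calculation shown, which puts the sample proportion in the standard error, and for comparison it also computes the common textbook variant that uses the null value of .50 there.

```sas
/* Sketch: one-sample z statistic for a proportion, from counts only. */
data _null_;
   x  = 67;                  /* correct guesses (from the question)   */
   n  = 104;                 /* women sampled (from the question)     */
   p0 = 0.50;                /* benchmark for pure guessing           */

   phat    = x / n;                          /* sample proportion     */
   se_hat  = sqrt(phat * (1 - phat) / n);    /* SE as used by hand    */
   se_null = sqrt(p0 * (1 - p0) / n);        /* SE under the null     */

   z_hat  = (phat - p0) / se_hat;            /* about 3.07            */
   z_null = (phat - p0) / se_null;           /* about 2.94            */

   put phat= 6.4 z_hat= 6.2 z_null= 6.2;     /* results go to the log */
run;
```

Both versions land near 3, so the choice of standard error does not change the conclusion here.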
c) Explain your conclusion.
The one-tailed critical z value associated with alpha = .05 is 1.645. Since 3.07 is well
beyond this critical value, we would easily reject the null hypothesis and
conclude that pregnant women can determine the gender of their babies at a higher
rate than guessing.
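If you would rather not look up the critical value in a table, a DATA step along the following lines should produce it, together with the one-tailed p-value. This is a hedged sketch: the variable names are illustrative, and z = 3.07 is simply typed in from part b).

```sas
/* Sketch: one-tailed critical value and p-value for the z test. */
data _null_;
   alpha  = 0.05;
   z      = 3.07;                   /* test statistic from part b)    */

   z_crit = probit(1 - alpha);      /* upper-tail cutoff, about 1.645 */
   p_one  = 1 - probnorm(z);        /* upper-tail p-value             */
   reject = (z > z_crit);           /* 1 = reject H0, 0 = do not      */

   put z_crit= 6.3 p_one= 8.5 reject=;
run;
```

PROBIT returns the standard normal quantile and PROBNORM the standard normal cumulative probability, so the rejection rule "z beyond the critical value" and the rule "p-value below alpha" always agree.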
Question 2: Refer to the Pennstate1 Dataset
a) The variable “Fastest” is the fastest speed that students have admitted to driving.
Is the average fastest speed greater than 90? Test this using alpha=.01. Explain
your conclusion.
This is a one sample ttest of mean. Using the SAS code attached, you will generate the
following output:
N      Mean     Std Dev    Std Err    Minimum    Maximum
189    97.15    18.47      1.3434     30.00      150.00

Mean     99% CL Mean          Std Dev    99% CL Std Dev
97.15    93.65    100.60      18.47      16.29    21.26

DF     t Value    Pr > |t|
188    5.32       <.0001
The first box provides descriptive statistics on our sample. From this we know that the
mean fastest speed is 97MPH (that’s fast!). So, we would not be surprised to see a ttest
result that would indicate that the average speed is greater than 90.
The second box provides the 99% confidence interval for the fastest speed. This interval
was not specifically requested, but you should get into the habit of providing it as part
of your analysis. You will also notice that the value of 90 is outside (below) the 99%
interval of 93.65MPH and 100.60MPH.
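To see where the interval comes from, it can be rebuilt from the first box alone. The DATA step below is a sketch under that assumption: it takes the rounded mean and standard error from the output, and the variable names are illustrative. Because the inputs are rounded, the limits may differ from the printed ones in the last digit or so.

```sas
/* Sketch: 99% confidence interval for the mean from summary statistics. */
data _null_;
   n     = 189;
   xbar  = 97.15;                      /* sample mean (first box)       */
   se    = 1.3434;                     /* standard error (first box)    */
   alpha = 0.01;

   df     = n - 1;
   t_crit = tinv(1 - alpha/2, df);     /* two-sided t critical value    */
   lower  = xbar - t_crit * se;
   upper  = xbar + t_crit * se;

   put df= t_crit= 6.3 lower= 7.2 upper= 7.2;
run;
```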
The third box is really our output of interest. Here, we see that the t-statistic is 5.32,
meaning that our sample mean of 97 MPH is over 5 standard errors above the
hypothesized value of 90. The associated p-value of <.0001 also indicates
that students drive much faster than 90 MPH. Recall that the p-value is the
probability of seeing a result at least this extreme if the null hypothesis
(fastest speed is 90 MPH or less) were true. Alpha, by contrast, is the Type I error rate
we are willing to accept: the probability of rejecting the null hypothesis when it is true.
This p-value of <.0001 is about as low as you will ever see. Since the p-value is lower than
our alpha (.01), we will confidently reject the null hypothesis and conclude that students
are driving faster than 90 MPH (yikes).
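The t-statistic and a one-sided p-value can be checked by hand from the summary statistics. The DATA step below is a sketch with illustrative variable names, using the rounded values from the output. Note that the attached PROC TTEST call prints a two-sided p-value; recent SAS releases also accept a SIDES=U option on the PROC TTEST statement to request the upper-tailed test directly.

```sas
/* Sketch: one-sample t test of H0: mu <= 90 from summary statistics. */
data _null_;
   n     = 189;
   xbar  = 97.15;                   /* sample mean                    */
   sd    = 18.47;                   /* sample standard deviation      */
   mu0   = 90;                      /* hypothesized value             */
   alpha = 0.01;

   df      = n - 1;
   se      = sd / sqrt(n);          /* standard error, about 1.34     */
   t       = (xbar - mu0) / se;     /* about 5.32                     */
   t_crit  = tinv(1 - alpha, df);   /* one-tailed critical value      */
   p_upper = 1 - probt(t, df);      /* upper-tail p-value             */

   put se= 7.4 t= 6.2 t_crit= 6.3 p_upper=;
run;
```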
b) Let's hypothesize that male students’ fastest speeds are greater than female
students’ fastest speeds. Develop the hypothesis statements and test this at
alpha = .01. Explain your conclusion.
Because males and females represent two independent populations, we will use a
two-sample independent t-test of the difference between the means. Using the SAS Code
attached, you will generate the following output:
SEX           N      Mean        Std Dev    Std Err    Minimum    Maximum
Female        102    88.4020     14.4313    1.4289     30.00      130.00
Male          87     107.4       17.4339    1.8691     55.00      150.00
Diff (1-2)           -19.0003    15.8828    2.3179

SEX           Method           Mean      99% CL Mean         Std Dev    99% CL Std Dev
Female                         88.40     84.65     92.15     14.43      12.20    17.57
Male                           107.4     102.5     112.3     17.43      14.54    21.61
Diff (1-2)    Pooled           -19.00    -25.03    -12.97    15.88      14.00    18.29
Diff (1-2)    Satterthwaite    -19.00    -25.13    -12.87

Method           Variances    DF        t Value    Pr > |t|
Pooled           Equal        187       -8.20      <.0001
Satterthwaite    Unequal      167.25    -8.08      <.0001

Equality of Variances
Method      Num DF    Den DF    F Value    Pr > F
Folded F    86        101       1.46       0.0678
The associated hypothesis statements for this test are:
H0: µm ≤ µf
Ha: µm > µf
The first box provides descriptive statistics on our sample for each gender. From this we
know that the mean fastest speed for the female students is 88 MPH and the mean
fastest speed for the male students is 107 MPH. The difference between the two is
-19 MPH (the value is negative because it is female minus male). Note that if
the null hypothesis was true for this test, the difference would be close to 0 or positive. So, we
would not be surprised to see a t-test result that would indicate that there is a difference
between the two genders.
The second box provides the 99% confidence interval for the individual genders’ fastest
speeds and for the difference between the two genders’ fastest speeds. This interval
was not specifically requested, but you should get into the habit of providing it as part
of your analysis. You will also notice that the value of 0 is not included in the 99% interval
for the difference, which runs from -25.13 MPH to -12.87 MPH. Note that technically we should
be using the Satterthwaite (unpooled) results, since the equality-of-variances test in the
fourth box (p = 0.0678) gives some evidence that the variances differ, but the results
for both methods are similar.
The third box is really our output of interest. Here, we see that the t-statistic is over 8
in absolute value for the two methods, meaning that our outcome of men driving 19 MPH
faster than the women is about 8 standard errors away from the hypothesized difference of 0.
The associated p-value of <.0001 also indicates that a true difference exists and the men are
driving faster. Recall that the p-value is the probability of seeing a difference at least
this extreme if the null hypothesis were true (the difference is 0 or, in this case,
positive, meaning that the women drive as fast or faster). Alpha, by contrast, is the
Type I error rate we accept: the probability of rejecting the null hypothesis when it is true.
This p-value of <.0001 is about as low as you will ever see. Since the p-value is lower than our
alpha (.01), we will confidently reject the null hypothesis and conclude that men are
driving faster than women.
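Both t-statistics in the third box can be rebuilt from the group summaries in the first box. The DATA step below is a sketch with illustrative variable names, using the rounded values from the output (so the results may differ slightly in the last digit). Since the difference is female minus male, the alternative "men are faster" corresponds to a negative difference, and the one-sided p-value is taken from the lower tail.

```sas
/* Sketch: pooled and Satterthwaite t statistics from group summaries. */
data _null_;
   n_f = 102;  mean_f = 88.4020;  sd_f = 14.4313;   /* females */
   n_m = 87;   mean_m = 107.4;    sd_m = 17.4339;   /* males   */

   diff = mean_f - mean_m;          /* female minus male, about -19  */

   /* Pooled: assumes equal variances */
   df_p = n_f + n_m - 2;
   sp   = sqrt(((n_f - 1)*sd_f**2 + (n_m - 1)*sd_m**2) / df_p);
   se_p = sp * sqrt(1/n_f + 1/n_m);
   t_p  = diff / se_p;              /* about -8.20                   */

   /* Satterthwaite: does not assume equal variances */
   v_f  = sd_f**2 / n_f;
   v_m  = sd_m**2 / n_m;
   se_s = sqrt(v_f + v_m);
   t_s  = diff / se_s;              /* about -8.08                   */
   df_s = (v_f + v_m)**2 / (v_f**2/(n_f - 1) + v_m**2/(n_m - 1));

   /* Lower-tail p-values, because Ha says the difference is negative */
   p_p = probt(t_p, df_p);
   p_s = probt(t_s, df_s);

   put sp= 8.4 se_p= 7.4 t_p= 6.2 df_p= p_p=;
   put se_s= 7.4 t_s= 6.2 df_s= 7.2 p_s=;
run;
```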
The fourth box is simply a test of equality of variances. The results tell us whether we
should be using the pooled or the unpooled (Satterthwaite) results. For this test, the null
is that the variances are equal. So, if the p-value is small, we would reject the null and
conclude that they are different. If the p-value is large (greater
than .1), this would indicate that the two groups have similar variance and we would
use the pooled results. Here the p-value is 0.0678. That is above .01 and .05 but below the
.1 guideline, so the cautious choice is to treat the variances as different and report the
Satterthwaite results.
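The folded F statistic is just the larger sample variance divided by the smaller one, and its p-value is two-sided. The DATA step below is a sketch of that calculation with illustrative variable names; it hard-codes the male group in the numerator because that group has the larger standard deviation in this output.

```sas
/* Sketch: folded F test for equality of two variances. */
data _null_;
   n_f = 102;  sd_f = 14.4313;      /* females (smaller variance)    */
   n_m = 87;   sd_m = 17.4339;      /* males (larger variance)       */

   f   = sd_m**2 / sd_f**2;         /* larger over smaller, ~1.46    */
   ndf = n_m - 1;                   /* numerator df = 86             */
   ddf = n_f - 1;                   /* denominator df = 101          */

   /* Two-sided p-value: double the upper tail, capped at 1 */
   p = min(1, 2 * (1 - probf(f, ndf, ddf)));

   put f= 6.2 ndf= ddf= p= 7.4;
run;
```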
Question 3: Refer to the Cholest1 Dataset
a) Medical researchers believe that a patient’s cholesterol level drops after a heart
attack. Develop the hypothesis statements to test this statement and the
associated testing matrix. The time period of interest is from Day 2 after the
attack to Day 4 after the attack.
H0: µd ≤ 0
Ha: µd > 0
Where, difference = Day 2 Cholesterol – Day 4 Cholesterol.
                     Ho True           Ho False
Reject Ho            Type 1 Error      Valid Decision
Fail to Reject Ho    Valid Decision    Type 2 Error
b) Test this claim at the alpha = .01 level. Explain your conclusion.
This is a paired t-test of means, because the two measurements are taken
on the same patients, two days apart. Using the SAS Code attached, you will
generate the following output:
Statistics
Difference          N     Lower CL Mean    Mean     Upper CL Mean
twoday - fourday    45    14.20            25.82    37.43

Lower CL Std Dev    Std Dev    Upper CL Std Dev    Std Err    Minimum    Maximum
32.00               38.65      48.83               5.76       -46        108

T-Tests
Difference          DF    t Value    Pr > |t|
twoday - fourday    44    4.48       <.0001
The first box provides us with the descriptive statistics on the difference between the two
measurements. It also includes the confidence interval around the difference. The limits shown
match a 95% interval, which is the default when alpha is not set on the PROC TTEST statement.
The second box is really our box of interest. Here, we find that the t-statistic of 4.48
indicates that our mean difference of 25.82 is 4.48 standard errors above the null
mean of 0. The associated p-value of <.0001 is much less than our established alpha
value of .01. Therefore, we would reject the null with confidence and conclude that
the mean difference is greater than 0, that is, cholesterol is lower on Day 4 than on Day 2.
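A paired t-test is a one-sample t-test on the differences, so the statistic can be checked from the first box. The DATA step below is a sketch with illustrative variable names, using the rounded values from the output.

```sas
/* Sketch: paired t test of H0: mu_d <= 0 from the summary of differences. */
data _null_;
   n     = 45;
   dbar  = 25.82;                   /* mean of (twoday - fourday)     */
   sd_d  = 38.65;                   /* std dev of the differences     */
   alpha = 0.01;

   df      = n - 1;
   se      = sd_d / sqrt(n);        /* standard error, about 5.76     */
   t       = dbar / se;             /* about 4.48                     */
   t_crit  = tinv(1 - alpha, df);   /* one-tailed critical value      */
   p_upper = 1 - probt(t, df);      /* upper-tail p-value             */

   put se= 6.2 t= 6.2 t_crit= 6.3 p_upper=;
run;
```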
c) Explain the implications of a Type 1 Error and the probability of this happening.
A Type 1 error occurs when we reject the null hypothesis when the null was actually true.
The probability of making this error is capped by the alpha we chose, .01. The reported
p-value of <.0001, which is the probability of seeing a mean difference this large if the null
were true, is far below that limit. In the present context, a patient could be
given cholesterol lowering medication, when it is not necessary, with potentially
life-threatening results.
SAS CODE for Ttests Self Test
No SAS Code for Question 1.
*Question 2: Refer to the Pennstate1 Dataset
a)The variable "Fastest" is the fastest speed that students have admitted to
driving.
Is the average fastest speed greater than 90? Test this using alpha=.01.
Explain your conclusion.;
Proc ttest data=jlp.pennstate1 H0=90 alpha=.01;
Var Fastest;
Run;
*Question 2: b)
Lets hypothesize that male students' fastest speeds are
greater than female students' fastest speeds.
Develop the hypothesis statements and test this at alpha = .01. Explain your
conclusion.;
Proc ttest data=jlp.pennstate1 alpha=.01;
Var Fastest;
Class sex;
Run;
*Question 3: Refer to the Cholest1 Dataset
b)
Test this claim at the alpha = .01 level.
Explain your conclusion.;
*Paired t-test on the difference twoday - fourday;
Proc ttest data=jlp.Cholest1;
Paired twoday*fourday;
Run;