Statistical Methods I
Hypothesis Development and T-tests Self Check: Answers

Question 1: Researchers wanted to test the folklore that women who were not given information regarding the gender of their unborn child could accurately guess the gender at levels greater than chance. To test this, they asked a sample of 104 pregnant women to guess the sex of their babies. Of these, 67 guessed correctly.

a) Develop the appropriate null and alternative hypothesis statements for this test.

Since there are two outcomes (male/female), pure guessing should produce a 50% “success” rate. If we use this as our benchmark, then a success rate above 50% would be better than guessing. Therefore, the null and alternative hypotheses (the alternative being the “claim”) are:

H0: p ≤ 50%
Ha: p > 50%

b) Test this hypothesis using alpha = .05.

Without a dataset, it is easier just to execute this one by hand (yes it is). Here is the math:

Z = (.6442 - .50) / SQRT((.6442 * .3558) / 104)    (note that the .6442 comes from 67/104)
Z = .1442 / .0469 = 3.07

This indicates that the sample proportion of .6442 is more than 3 standard errors above the benchmark of .50 established for guessing.

c) Explain your conclusion.

The critical value for a one-sided test at alpha = .05 is 1.645. Since 3.07 is well beyond the critical value, we easily reject the null hypothesis and conclude that pregnant women can guess the gender of their babies at a higher rate than chance.

Question 2: Refer to the Pennstate1 Dataset

a) The variable “Fastest” is the fastest speed that students have admitted to driving. Is the average fastest speed greater than 90? Test this using alpha=.01. Explain your conclusion.

This is a one-sample t-test of the mean.
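As a hand check on the t statistic for this test, the following DATA step is a minimal sketch that recomputes it from the summary statistics SAS reports for Fastest in this question (n = 189, mean 97.15, standard deviation 18.47). The variable names are illustrative and are not part of the attached code.

```sas
/* Sketch only: recompute the one-sample t statistic from summary statistics. */
/* All variable names here are illustrative assumptions.                      */
data _null_;
  n    = 189;       /* sample size reported by PROC TTEST     */
  xbar = 97.15;     /* sample mean of Fastest                 */
  s    = 18.47;     /* sample standard deviation              */
  mu0  = 90;        /* hypothesized mean (H0=90 in the code)  */

  se = s / sqrt(n);                /* standard error of the mean       */
  t  = (xbar - mu0) / se;          /* t statistic with n - 1 DF        */
  p_upper = 1 - probt(t, n - 1);   /* one-sided (upper-tail) p-value   */

  put se= t= p_upper=;             /* written to the SAS log           */
run;
```

The standard error should come out near 1.34 and t near 5.32, in line with the PROC TTEST output for this question.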
Using the SAS code attached, you will generate the following output:

N      Mean     Std Dev   Std Err   Minimum   Maximum
189    97.15    18.47     1.3434    30.00     150.00

Mean     99% CL Mean        Std Dev   99% CL Std Dev
97.15    93.65    100.60    18.47     16.29    21.26

DF     t Value   Pr > |t|
188    5.32      <.0001

The first box provides descriptive statistics on our sample. From this we know that the mean fastest speed is 97 MPH (that’s fast!). So, we would not be surprised to see a t-test result indicating that the average fastest speed is greater than 90.

The second box provides the 99% confidence interval for the mean fastest speed. This interval was not specifically requested, but you should get into the habit of providing it as part of your analysis. You will also notice that the value of 90 lies outside (below) the 99% interval of 93.65 MPH to 100.60 MPH.

The third box is really our output of interest. Here, we see that the t-statistic is 5.32, meaning that our sample mean of 97 MPH is over 5 standard errors above the hypothesized value of 90. The associated p-value of <.0001 also indicates that the mean fastest speed is well above 90 MPH. SAS reports the two-sided Pr > |t|; because the sample mean falls on the side of the alternative, the one-sided p-value is half of that and is still <.0001. Recall that the p-value is the probability of seeing a result at least this extreme if the null hypothesis (mean fastest speed is 90 MPH or less) were true, while alpha is the probability of a Type I error (rejecting a true null hypothesis) that we are willing to accept. This value of <.0001 is about as low as you will ever see. Since the p-value is lower than our alpha (.01), we confidently reject the null hypothesis and conclude that the students’ mean fastest speed is greater than 90 MPH (yikes).

b) Let’s hypothesize that male students’ fastest speeds are greater than female students’ fastest speeds. Develop the hypothesis statements and test this at alpha = .01. Explain your conclusion.

Because males and females represent two independent populations, we will use a two-sample independent t-test of the difference between the means.
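To see where the pooled and Satterthwaite results for this test come from, the sketch below recomputes both from the per-group summary statistics reported in the output for this question. It is a hand check under stated assumptions, not the attached code: the variable names are illustrative, and the male mean is entered as the rounded 107.4 that SAS prints, so the results may differ from the output in the last digit.

```sas
/* Sketch only: two-sample t statistics from summary statistics.       */
/* Variable names are illustrative; m2 uses the rounded printed mean.  */
data _null_;
  n1 = 102; m1 = 88.4020; s1 = 14.4313;   /* Female (group 1) */
  n2 = 87;  m2 = 107.4;   s2 = 17.4339;   /* Male   (group 2) */

  diff = m1 - m2;                         /* Female minus Male, as in Diff (1-2) */

  /* Pooled (equal variances) */
  sp        = sqrt(((n1 - 1)*s1**2 + (n2 - 1)*s2**2) / (n1 + n2 - 2));
  se_pooled = sp * sqrt(1/n1 + 1/n2);
  t_pooled  = diff / se_pooled;

  /* Satterthwaite (unequal variances) */
  v1 = s1**2 / n1;
  v2 = s2**2 / n2;
  se_unequal = sqrt(v1 + v2);
  t_unequal  = diff / se_unequal;
  df_satt    = (v1 + v2)**2 / (v1**2/(n1 - 1) + v2**2/(n2 - 1));

  /* One-sided p-values for Ha: male mean > female mean (diff < 0) */
  p_pooled  = probt(t_pooled, n1 + n2 - 2);
  p_unequal = probt(t_unequal, df_satt);

  put sp= se_pooled= t_pooled= / se_unequal= t_unequal= df_satt= / p_pooled= p_unequal=;
run;
```

The pooled standard deviation and standard error should land near 15.88 and 2.32, the two t statistics near -8.2 and -8.1, and the Satterthwaite degrees of freedom near 167.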
Using the SAS Code attached, you will generate the following output:

SEX          N     Mean       Std Dev    Std Err   Minimum   Maximum
Female       102   88.4020    14.4313    1.4289    30.00     130.00
Male         87    107.4      17.4339    1.8691    55.00     150.00
Diff (1-2)         -19.0003   15.8828    2.3179

SEX          Method          Mean     99% CL Mean        Std Dev   99% CL Std Dev
Female                       88.40    84.65    92.15     14.43     12.20    17.57
Male                         107.4    102.5    112.3     17.43     14.54    21.61
Diff (1-2)   Pooled          -19.00   -25.03   -12.97    15.88     14.00    18.29
Diff (1-2)   Satterthwaite   -19.00   -25.13   -12.87

Method          Variances   DF       t Value   Pr > |t|
Pooled          Equal       187      -8.20     <.0001
Satterthwaite   Unequal     167.25   -8.08     <.0001

Equality of Variances
Method     Num DF   Den DF   F Value   Pr > F
Folded F   86       101      1.46      0.0678

The associated hypothesis statements for this test are:

H0: µm ≤ µf
Ha: µm > µf

The first box provides descriptive statistics on our sample for each gender. From this we know that the mean fastest speed for the female students is 88 MPH, the mean fastest speed for the male students is 107 MPH, and the difference between the two is 19 MPH (the reported value is negative because it is computed as female minus male). Note that if there were no difference between the genders, the sample difference would be close to 0. So, we would not be surprised to see a t-test result indicating that there is a difference between the two genders.

The second box provides the 99% confidence interval for the individual genders’ fastest speeds and for the difference between the two genders’ fastest speeds. This interval was not specifically requested, but you should get into the habit of providing it as part of your analysis. You will also notice that the value of 0 is not included in the 99% interval for the difference, -25.13 MPH to -12.87 MPH. Note that technically we should be using the Satterthwaite (unpooled) results, since the equality-of-variances test in the fourth box (p = 0.0678) suggests that the variances differ, but the results for both methods are similar.

The third box is really our output of interest.
Here, we see that the t-statistic is over 8 in absolute value for both methods (it is negative because the difference is female minus male), meaning that our outcome of men driving 19 MPH faster than the women is about 8 standard errors away from the hypothesized difference of 0. The associated p-value of <.0001 also indicates that a true difference exists and the men are driving faster. SAS reports the two-sided Pr > |t|; because the observed difference is in the direction of the alternative, the one-sided p-value is half of that and is still <.0001. Recall that the p-value is the probability of seeing a result at least this extreme if the null hypothesis (the female-minus-male difference is 0 or positive, meaning that the women drive at least as fast) were true, while alpha is the probability of a Type I error (rejecting a true null hypothesis) that we are willing to accept. This value of <.0001 is about as low as you will ever see. Since the p-value is lower than our alpha (.01), we confidently reject the null hypothesis and conclude that men are driving faster than women.

The fourth box is simply a test of equality of variances. The results tell us whether we should be using the pooled or the unpooled (Satterthwaite) results. For this test, the null hypothesis is that the two variances are equal. If the p-value is small, we reject that null and conclude that the variances are different. Here the p-value of 0.0678 is below the .10 cutoff used for this check (although it is not below .01), so we treat the variances as different and prefer the Satterthwaite results. If the p-value is large (greater than .1), this would indicate that the two groups have similar variance and we would use the pooled results.

Question 3: Refer to the Cholest1 Dataset

a) Medical researchers believe that a patient’s cholesterol level drops after a heart attack. Develop the hypothesis statements to test this statement and the associated testing matrix. The time period of interest is from Day 2 after the attack to Day 4 after the attack.

H0: µd ≤ 0
Ha: µd > 0

where difference = Day 2 Cholesterol - Day 4 Cholesterol.

                      Ho True          Ho False
Reject Ho             Type 1 Error     Valid Decision
Fail to Reject Ho     Valid Decision   Type 2 Error

b) Test this claim at the alpha = .01 level. Explain your conclusion.

This is a paired t-test of means, because the two measurements are taken on the same patients, two days apart.
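A paired t-test is the same calculation as a one-sample t-test on the within-patient differences. The sketch below illustrates this with the dataset and variable names used in the attached code (jlp.Cholest1, twoday, fourday). The dataset name diffs, the variable d, and the SIDES=U and ALPHA=.01 options are additions for illustration and do not appear in the attached code.

```sas
/* Sketch only: paired t-test as a one-sample test on differences.   */
/* "diffs" and "d" are hypothetical names introduced for this check. */
data diffs;
  set jlp.Cholest1;
  d = twoday - fourday;    /* Day 2 minus Day 4, matching the hypotheses */
run;

/* SIDES=U requests the upper one-sided test (Ha: mean difference > 0); */
/* ALPHA=.01 sets the confidence level to 99%.                          */
proc ttest data=diffs h0=0 sides=U alpha=.01;
  var d;
run;
```

Under these assumptions the t Value and DF should agree with the PAIRED statement results (4.48 with 44 DF). With SIDES=U, SAS reports a one-sided p-value, and the confidence limits it prints are one-sided as well.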
Using the SAS Code attached, you will generate the following output:

Statistics
Difference         N    Lower CL Mean   Mean    Upper CL Mean   Lower CL Std Dev   Std Dev   Upper CL Std Dev   Std Err   Minimum   Maximum
twoday - fourday   45   14.20           25.82   37.43           32.00              38.65     48.83              5.76      -46       108

T-Tests
Difference         DF   t Value   Pr > |t|
twoday - fourday   44   4.48      <.0001

The first box provides us with the descriptive statistics on the difference between the two measurements. It also includes the confidence interval around the difference. Note that these are 95% limits, the PROC TTEST default, because the attached code for this question does not set alpha.

The second box is really our box of interest. Here, we find that the t-statistic of 4.48 indicates that our mean difference of 25.82 is 4.48 standard errors above the null mean of 0. The associated p-value of <.0001 is much less than our established alpha value of .01. Therefore, we would reject the null with confidence and conclude that the mean difference is greater than 0, that is, cholesterol drops between Day 2 and Day 4.

c) Explain the implications of a Type 1 Error and the probability of this happening.

A Type 1 error occurs when we reject the null hypothesis when the null was actually true. The probability of this happening is controlled by alpha: by testing at alpha = .01, we accept at most a 1% chance of rejecting a true null hypothesis. The reported p-value, very low at <.0001, tells us that a difference this large would be extremely unlikely if the null were true. In the present context, a Type 1 error would mean concluding that cholesterol drops after a heart attack when it does not. As a consequence, a patient could be given cholesterol-lowering medication when it is not necessary, with potentially life-threatening results.

SAS CODE for Ttests Self Test

No SAS Code for Question 1.

*Question 2: Refer to the Pennstate1 Dataset
a) The variable "Fastest" is the fastest speed that students have admitted to driving. Is the average fastest speed greater than 90? Test this using alpha=.01. Explain your conclusion.;
Proc ttest data=jlp.pennstate1 H0=90 alpha=.01;
Var Fastest;
Run;

*Question 2:
b) Lets hypothesize that male students' fastest speeds are greater than female students' fastest speeds. Develop the hypothesis statements and test this at alpha = .01.
Explain your conclusion.;
Proc ttest data=jlp.pennstate1 alpha=.01;
Var Fastest;
Class sex;
Run;

*Question 3: Refer to the Cholest1 Dataset
b) Test this claim at the alpha = .01 level. Explain your conclusion.;
Proc ttest data=jlp.Cholest1;
Paired twoday*fourday;
Run;
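Although Question 1 was worked by hand, the same arithmetic can be checked in a short DATA step. The sketch below is not part of the original code and its variable names are illustrative. It reproduces the hand calculation, which uses the sample proportion in the standard error, and for comparison also computes the statistic with the null value of .50 in the standard error, a common textbook form of the test.

```sas
/* Sketch only: one-sample z test of a proportion for Question 1. */
/* Variable names are illustrative assumptions.                   */
data _null_;
  x  = 67;        /* correct guesses          */
  n  = 104;       /* women sampled            */
  p0 = 0.50;      /* benchmark for guessing   */

  phat = x / n;                              /* sample proportion (.6442)           */

  se_hat = sqrt(phat * (1 - phat) / n);      /* standard error used in the hand calc */
  z      = (phat - p0) / se_hat;

  se_null = sqrt(p0 * (1 - p0) / n);         /* standard error under the null value  */
  z_null  = (phat - p0) / se_null;

  p_upper = 1 - probnorm(z);                 /* one-sided p-value for z              */
  z_crit  = probit(0.95);                    /* one-sided critical value, alpha=.05  */

  put phat= z= z_null= p_upper= z_crit=;     /* written to the SAS log               */
run;
```

The first statistic should match the hand calculation (about 3.07) and the second is about 2.94. Both exceed the critical value of 1.645, so the conclusion does not depend on which standard error is used.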