STAT500 HW#10 Solutions

1) (15 pts; 5pts each)
a) The value of the pooled-variance t-statistic is t = -3.83.
b) The value of the non-pooled-variance t’-statistic is t’ = -3.79.
c) We want to test Ho: μ1 = μ2, Ha: μ1 ≠ μ2. Both statistics (t and t’) lead one to reject the null hypothesis (p = 0.000 and 0.000, respectively) at α = 0.01. Similarly, because the p-values for both statistics are less than α = 0.001, we would reject the null hypothesis at that level as well. Thus there is a difference in mean bonus percentages between the genders, and the conclusion to reject the null hypothesis is the same at both the 1% and 0.1% levels of significance, regardless of which statistic (t or t’) is used to test the hypothesis.

2) (16 pts; 4pts each)
The data are paired since twins are “naturally” paired, so the paired t-test is used.
a) The hypotheses are Ho: μa - μn = 0, Ha: μa - μn ≠ 0 (two-tailed test). From the output, we can reject Ho (t = 4.95; P-value ≈ 0) at α = 0.05, and conclude that the mean final grades are different for academic versus non-academic emphasis.
b) The mean difference between final grades of the students in academic and non-academic environments is (from the output) 3.800. The 95% confidence interval is (2.230, 5.370). The size of the difference in the mean final grades is estimated to be between 2.230 and 5.370 with 95% confidence.
c) The conditions appear to be satisfied for the use of a paired t-procedure. For example, the normality plot suggests that the differences are normally distributed.
d) It appears that using twins in this study to control for variation in the final scores was effective compared to taking a random sample of 30 students in both types of environments. Justification is provided by rejection of the null hypothesis in the paired procedure (controlling for variation) and a failure to reject the same hypothesis in an independent two-sample t-procedure.

3) (16 pts; 4pts each)
a. The data are not paired.
We need to determine whether to use the pooled variance or separate variances. Also, both n1 and n2 are less than 30, so we need to check the normality assumption for both sets. The normality assumptions are satisfied in both cases.
S1 / S2 = 0.6 (< 0.707), so separate variances need to be used (by rule of thumb).
The hypotheses are H0: μ(wide) - μ(narrow) = 0, Ha: μ(wide) - μ(narrow) ≠ 0 (two-tailed test).
Using the Minitab 2-sample t procedure (do not check the box for “assume equal variance”), the P-value of the test is 0.005 with “not equal” as the alternative. For any level of significance greater than 0.005, H0 is rejected. Thus, these two types of jets have different mean noise levels.
b. The size of the difference in the mean noise level is estimated by a 98% CI. From the Minitab output, a 98% CI for the difference is (-14.79, -1.58).
c. Since zero is not in the 98% confidence interval, we can conclude at α = 0.02 that the two types of jets do in fact have different mean noise levels.
d. The difference between the two mean jet noise levels is at most 14.79 in magnitude with 98% confidence. This represents a small percentage of the mean noise level, and may not be of practical importance.

4) (15 pts; 5pts each)
a) The null hypothesis is “There is no difference in the amount spent on campaigns between males and females”, which means the difference between female campaign expenditures and male expenditures is equal to 0. The alternative hypothesis is “Female candidates spend less on their campaigns than male candidates”, i.e. the difference between female campaign expenditures and male expenditures is less than 0.
Ho: μf - μm = 0, Ha: μf - μm < 0 (one-sided left-tailed test).
We have independent samples because the samples were randomly selected and have no paired structure. Furthermore, according to the normality plot, there is no evidence that the samples are not drawn from a normal distribution (see chart below).
In addition, for variance similarity, Sf / Sm = 51.95 / 61.92 = 0.84. Since 0.707 < 0.84 < 1.414, a pooled variance can be used.
b) We use the pooled t-test procedure in Minitab (Stat > Basic Statistics > 2-sample t…, then check the “Assume equal variances” box under the “Options” tab) and get the following output:
_________________________________________________________________
Two-sample T for Female_1 vs Male_1

           N   Mean  StDev  SE Mean
Female_1  20  245.3   52.0       12
Male_1    20  351.0   61.9       14

Difference = μ (Female_1) - μ (Male_1)
Estimate for difference: -105.7
98% upper bound for difference: -67.3
T-Test of difference = 0 (vs <): T-Value = -5.85  P-Value = 0.000  DF = 38
Both use Pooled StDev = 57.1554
_________________________________________________________________
The p-value that Minitab calculated is less than our desired significance level (0.000 < 0.02). This means that we have sufficient evidence to conclude that women running for office in the state legislature spend less than their male counterparts.
c. Pooled standard deviation: sp = sqrt(((20-1)51.95² + (20-1)61.92²)/(20+20-2)) = 57.15
t = (245.3 - 351)/(57.15*sqrt(1/20+1/20)) = -5.85
From the table at α = 0.02 and df = 38, tα/2 = 2.42857. So,
98% CI = (-105.7) ± 2.42857 * 57.16 * sqrt(1/20 + 1/20) = -105.7 ± 43.90 = (-149.60, -61.80)
With 98% confidence, female candidates spend between 61.80 and 149.60 less than male candidates.

5) (15 pts; 5pts each)
a. Null hypothesis: California does not have a higher mean hysterectomy cost than Montana.
Alternative hypothesis: California has a higher mean hysterectomy cost than Montana.
The patients were selected from each group at random and are not related to each other. There are more than 30 observations in each group, so we can assume normality of the sample means. We can use the two-sample t-test with unequal variances (unequal variances since SMont / SCalif = 320 / 890 = 0.36).
The Minitab output is:
_________________________________________________________________
Two-Sample T-Test and CI

Sample    N   Mean  StDev  SE Mean
1       200   7458    320       23
2       200  12690    890       63

Difference = μ (1) - μ (2)
Estimate for difference: -5232.0
95% upper bound for difference: -5121.6
T-Test of difference = 0 (vs <): T-Value = -78.23  P-Value = 0.000  DF = 249
_________________________________________________________________
Since P-value = 0.000 < α = 0.05, we reject the null hypothesis at α = 0.05.
b. The 95% CI for the difference is (-5363.7, -5100.3).
c. SMont / SCalif = 320/890 = 0.360 < 0.707 (use the separate-variance t-test).

6) (15 pts)
The sample size is not large, so we check the normality assumption for the data. This plot suggests that normality is reasonable for both the husband and wife data.
We first do a two-sample t-test. Minitab two-sample t-test output (using pooled variance):
_________________________________________________________________
Two-Sample T-Test and CI: Husband, Wife

Two-sample T for Husband vs Wife

          N  Mean  StDev  SE Mean
Husband  10  49.7   18.9      6.0
Wife     10  47.1   17.4      5.5

Difference = μ (Husband) - μ (Wife)
Estimate for difference: 2.60
95% lower bound for difference: -11.48
T-Test of difference = 0 (vs >): T-Value = 0.32  P-Value = 0.376  DF = 18
Both use Pooled StDev = 18.1583
_________________________________________________________________
With this result, we cannot reject H0.
Now we do the paired t-test. Again, we start by checking the normality of the age differences. This plot suggests that the differences (husband & wife) are normally distributed. We can, therefore, proceed with a paired t-test for the mean difference (μd).
We want to test Ho: μd = 0, Ha: μd > 0, where μd = male (mean age) - female (mean age), at α = 0.05. (Note: This is a one-sided right-tailed test.)
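As a rough check on the paired procedure set up here, the test statistic and the one-sided lower bound can be recomputed by hand from three summary values of the differences. The sketch below assumes the values Minitab reports for these data (mean difference 2.60, standard deviation 3.57, n = 10) and a t-table value t(0.05, 9) = 1.833; the function name is illustrative, and small discrepancies from Minitab are expected because the inputs are rounded.

```python
import math

def paired_t_summary(mean_diff, sd_diff, n, t_crit):
    """Return (t statistic, one-sided lower confidence bound) for a paired t-test.

    Works from summary statistics of the pairwise differences only.
    """
    se = sd_diff / math.sqrt(n)          # standard error of the mean difference
    t_stat = mean_diff / se              # test statistic for Ho: mu_d = 0
    lower = mean_diff - t_crit * se      # lower bound matching Ha: mu_d > 0
    return t_stat, lower

# Assumed inputs: rounded summary values for Husband - Wife,
# and t(0.05, df = 9) = 1.833 taken from a t table.
t_stat, lower = paired_t_summary(2.60, 3.57, 10, t_crit=1.833)
print(f"t = {t_stat:.2f}, 95% lower bound = {lower:.2f}")
```

A statistic larger than the critical value (equivalently, a lower bound above 0) corresponds to rejecting Ho at α = 0.05.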
Following is the Minitab output for the paired t-test:
_________________________________________________________________
Paired T-Test and CI: Husband, Wife

Paired T for Husband - Wife

             N   Mean  StDev  SE Mean
Husband     10  49.70  18.92     5.98
Wife        10  47.10  17.36     5.49
Difference  10   2.60   3.57     1.13

95% lower bound for mean difference: 0.53
T-Test of mean difference = 0 (vs > 0): T-Value = 2.31  P-Value = 0.023
_________________________________________________________________
P-value = 0.023 < 0.05, so we reject the null hypothesis at α = 0.05. This time, we conclude that the mean age of married men is higher than the mean age of married women.
The two test results are different. The paired t-test is more appropriate because the data are paired.

7) (8 pts)
Following is the Minitab output:
_________________________________________________________________
Power and Sample Size

Paired t Test

Testing mean paired difference = 0 (versus > 0)
Calculating power for mean paired difference = difference
α = 0.02  Assumed standard deviation of paired differences = 3.57

Difference  Sample Size  Target Power  Actual Power
         1          109           0.8      0.800122
_________________________________________________________________
So, we should sample at least 109 married couples.
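
The sample size in Problem 7 can be roughly cross-checked without Minitab. The sketch below uses the normal-approximation formula n ≈ ((z_α + z_β)·σ/δ)² and, as an assumption not taken from the source, a commonly used small-sample correction that adds z_α²/2 for a one-sided t-test. Minitab's own calculation method is not shown in the output, so any agreement should be read as a sanity check only. The function name and its parameters are illustrative.

```python
import math
from statistics import NormalDist

def paired_t_sample_size(delta, sd_diff, alpha, power):
    """Approximate n for a one-sided paired t-test.

    Returns (n from the plain normal approximation,
             n with the z_alpha**2 / 2 small-sample correction).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # one-sided critical z
    z_beta = NormalDist().inv_cdf(power)        # z for the target power
    base = ((z_alpha + z_beta) * sd_diff / delta) ** 2
    corrected = base + z_alpha ** 2 / 2         # assumed correction term
    return math.ceil(base), math.ceil(corrected)

# Inputs from the Minitab output: difference 1, SD 3.57, alpha 0.02, power 0.8.
n_normal, n_corrected = paired_t_sample_size(1, 3.57, 0.02, 0.8)
print(n_normal, n_corrected)
```

With these inputs the plain normal approximation gives 107 and the corrected version gives 109, the latter matching the sample size in the Minitab output.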