Exercise #4
In-service Master's Program, Year 2, Class A  NA0C0003  程方麗

Application Activity with Confidence Intervals

1. Two groups, scores on three tests. Confidence intervals for the mean differences:
a. Aptitude test (37 points possible): (-1.57, .99)
b. Grammaticality judgment test (200 points possible): (-5.62, .76)
c. Phonemic distribution task (96 points possible): (-10.8, -.001)

Q: Are the groups statistically different from each other on any of the tests?
Q: What can you say about the precision of the estimates of mean difference?
Q: Which test do you think has the largest effect size?

Zero falls within the intervals for the Aptitude test and the Grammaticality judgment test, but not within the interval for the Phonemic distribution task. Therefore, there is no statistically significant difference between the two groups on the Aptitude test or the Grammaticality judgment test, but there is a statistically significant difference on the Phonemic distribution task. In terms of distance from zero, the interval for the Phonemic distribution task reaches farthest from zero, implying that this task is likely to have the largest effect size. However, its interval is also the widest, so its estimate of the mean difference is the least precise.

In terms of the confidence intervals for the three tests, only the Phonemic distribution task shows a statistically significant difference between the two groups. As for the precision of the estimates, the width of each interval can be compared with the number of points possible: the relative width for the Phonemic distribution task is .114 (a CI of nearly 11 points divided by 96), for the Grammaticality judgment test it is .035 (a CI of nearly 7 points divided by 200), and for the Aptitude test it is .081 (a CI of nearly 3 points divided by 37). The Phonemic distribution task has the widest interval relative to its scale, so its estimate is the least precise of the three.
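The comparisons above can be checked with a few lines of Python. This is only an illustrative sketch: the dictionary layout and the helper name `summarize` are assumptions made for the example, not part of the exercise. It uses the exact interval limits rather than the rounded widths (11, 7, and 3 points), so the relative widths come out slightly smaller than the figures quoted above (roughly .112, .032, and .069), but the ordering of the three tests is the same.

```python
# Illustrative sketch: summarise the three intervals from item 1.
# Each entry is (lower limit, upper limit, points possible).
intervals = {
    "Aptitude test": (-1.57, 0.99, 37),
    "Grammaticality judgment test": (-5.62, 0.76, 200),
    "Phonemic distribution task": (-10.8, -0.001, 96),
}


def summarize(lower, upper, max_points):
    """Return descriptive facts about one confidence interval (hypothetical helper)."""
    width = upper - lower
    return {
        # The difference is statistically significant when zero is outside the interval.
        "excludes_zero": not (lower <= 0 <= upper),
        # The midpoint of a symmetric interval is the point estimate of the difference.
        "midpoint": (lower + upper) / 2,
        "width": width,
        # Width relative to the test's scale, to compare precision across tests.
        "relative_width": width / max_points,
    }


summary = {name: summarize(*values) for name, values in intervals.items()}

for name, s in summary.items():
    print(
        f"{name}: excludes zero = {s['excludes_zero']}, "
        f"midpoint = {s['midpoint']:.2f}, "
        f"width = {s['width']:.2f}, "
        f"relative width = {s['relative_width']:.3f}"
    )
```

The midpoint is printed because it is the point estimate of the mean difference when the interval is symmetric, which is a more direct guide to effect size than the interval's width.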
In this vein, the Grammaticality judgment test, whose relative width (.035) is narrower than that of the Aptitude test (.081), gives the most precise estimate. As for effect size, the interval that lies farthest from zero suggests the largest effect. Therefore, the Phonemic distribution task is likely to have the largest effect size.

2. Mean differences between groups: PTP-NP = 3.42, OLP-NP = 3.38, OLP-PTP = 0.14

Q: Are the groups statistically different from each other on any of the tests?
Q: What can you say about the precision of the estimates of mean difference?
Q: Which test has the largest effect size?

As Figure 4.3 shows, the confidence interval for PTP-NP falls roughly within the range (-0.10~0.00), the confidence interval for OLP-NP lies within the range (-0.10~0.001), and the confidence interval for OLP-PTP lies roughly within the range (-0.037~0.038). The interval for PTP-NP lies to the left of zero with its upper limit at zero, meaning that the result does not reject the null hypothesis and no statistical difference is shown. Likewise, the interval for OLP-NP lies almost entirely to the left of zero, extending only very slightly (0.001) beyond it. Both intervals sit on one side of zero while touching or barely crossing it. For OLP-PTP, zero lies almost exactly in the middle of the interval. In other words, the rejection regions on the right and left sides are almost the same, with an alpha of .025 in each tail of the 95% confidence interval. Taken together, the groups are not statistically different from each other on any of the comparisons.
As Figure 4.3 shows, the confidence interval for PTP-NP falls roughly within the range (-0.10~0.00), with a relative width of .292 (about 1 point divided by 3.42); the confidence interval for OLP-NP lies within the range (-0.10~0.001), with a relative width of .295 (about 1 point divided by 3.38); and the confidence interval for OLP-PTP lies roughly within the range (-0.037~0.038), with a relative width of .536 (0.075 divided by 0.14). For the OLP-PTP pair, zero lies nearly in the middle of the interval, implying that no statistically significant difference exists between OLP and PTP. In addition, the OLP-PTP pair has a wider relative width (.536) than the other two pairs (.292 for PTP-NP and .295 for OLP-NP), implying that its estimate is less precise than the fairly precise estimates for the other two pairs. In terms of effect size, the greater the distance from zero, the bigger the effect size is likely to be. Therefore, the PTP-NP pair and the OLP-NP pair are expected to have bigger effect sizes than the OLP-PTP pair.

3.
Variant 1a: DeKeyser (2000) found a statistical correlation between age of arrival and scores on the grammaticality judgment test (r = -.62, n = 57, p < .001).
Variant 1b: Flege, Yeni-Komshian, and Liu (1999) found a statistical correlation between age of arrival and pronunciation scores (r = -.89, n = 264, p < .001).
Variant 2a: DeKeyser (2000) found a correlation between age of arrival and scores on the grammaticality judgment test (95% CI: -.76, -.42).
Variant 2b: Flege, Yeni-Komshian, and Liu (1999) found a statistical correlation between age of arrival and pronunciation scores (95% CI: -.92, -.87).

Q: What do the confidence intervals tell you that p-values can't?

The correlation in Variant 1b (r = -.89) is larger in magnitude than the correlation in Variant 1a (r = -.62), implying that Variant 1b has a bigger effect size than Variant 1a.
However, the p-value versions provide no further information beyond the fact that there is a statistically significant result and the size of the effect. We do not know whether such data are sufficient to support practical application. For that, the 95% CI needs to be considered. Since Variant 2b has a narrower 95% CI (width 0.05) than Variant 2a (width 0.34), Variant 2b gives the more precise estimate, and it can be inferred that the relationship in Variant 2b is more likely to make a practical difference in reality.

A p-value reflects only whether a result is statistically significant, while a confidence interval reflects not only whether the result is significant but also how precise the estimate is and how large the effect is. By considering confidence intervals, one can better judge whether the effect observed for the sampled participants in a study is likely to generalize to populations outside the study. Therefore, confidence intervals help us judge the practicability of a specific treatment. A statistically significant p-value from a study with a limited number of participants is not, on its own, sufficient grounds for believing that the treatment can be generalized or put into practice if the results lack a precise estimate.
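To see how sample size drives the width of these intervals, a 95% CI for a correlation can be approximated from r and n alone. The sketch below uses the Fisher z transformation, which is an assumption on my part: the exercise does not say how the intervals in Variants 2a and 2b were obtained, so the results should land near the quoted limits without necessarily matching them exactly. The function name `fisher_ci` and the `studies` dictionary are illustrative only.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson correlation via the Fisher z transform.

    Assumes roughly bivariate-normal data; z_crit = 1.96 gives a 95% interval.
    """
    z = math.atanh(r)               # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)     # standard error on the z scale
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper


# r and n as reported in Variants 1a and 1b.
studies = {
    "DeKeyser (2000)": (-0.62, 57),
    "Flege, Yeni-Komshian, and Liu (1999)": (-0.89, 264),
}

results = {}
for label, (r, n) in studies.items():
    lower, upper = fisher_ci(r, n)
    results[label] = {
        "lower": lower,
        "upper": upper,
        "width": upper - lower,
        "r_squared": r ** 2,  # proportion of shared variance
    }
    print(
        f"{label}: r = {r}, n = {n}, "
        f"approx. 95% CI = ({lower:.2f}, {upper:.2f}), "
        f"width = {upper - lower:.2f}, r^2 = {r ** 2:.2f}"
    )
```

Because the standard error shrinks as n grows, the study with n = 264 produces a much narrower interval than the study with n = 57, which is the precision information that the p-values (both p < .001) do not convey.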