WattsSEDU7006-8-2

Exploring Inferential Statistics and Their Discontents
Stephen W. Watts
Northcentral University

Jackson (2012) Chapter Exercises

#2a. This is a one-tailed test. We are interested only in whether the product prevents cavities.
#2b. H0: μ new TP ≥ μ oth; Ha: μ new TP < μ oth
#2c. σM = σ / √N = 1.12 / √60 = 1.12 / 7.746 = 0.145; zobt = (M – μ) / σM = (1.5 – 1.73) / 0.145 = –0.23 / 0.145 = –1.59
#2d. zcv = –1.645
#2e. H0 should not be rejected, because zobt = –1.59 does not fall beyond zcv = –1.645. The difference in cavities between other brands and the new toothpaste is not large enough to support the claim.
#2f. CI = M ± z * SEM, where M = 1.5, z = ±1.96, and SEM = 1.12 / √60 = 1.12 / 7.746 = 0.145. 1.5 ± 1.96 * 0.145 = 1.5 ± 0.284; CI = 1.216 to 1.784 cavities.
#4. As the degrees of freedom increase, the critical value decreases. By comparing his tobt with the tcv for df = 13, he has a greater chance of failing to reject the null hypothesis even though it should be rejected, which is a Type II error.
#6a. This is a two-tailed test; the hypothesis is that there is a difference, but no direction is indicated.
#6b. H0: μ cm = μ pop; Ha: μ cm ≠ μ pop
#6c. t = (M – μ) / SEM; tobt = (59 – 58) / 1.016 = 0.984
#6d. t(13)cv = ±2.160
#6e. H0 should not be rejected. The difference in spatial ability between those who listen to classical music and those in the general population who do not is not large enough to reject the null hypothesis.
#6f. CI = M ± tcv * SEM = 59 ± 2.160 * 1.016 = 59 ± 2.195; CI = 56.805 to 61.195
#8a. χ² = Σ (O – E)² / E = (31 – 24)² / 24 + (89 – 96)² / 96 = 7² / 24 + (–7)² / 96 = 49 / 24 + 49 / 96 = 2.042 + 0.510; χ²obt = 2.552
#8b. df = 1
#8c. χ²cv = 3.841
#8d. The null hypothesis is rejected if χ²obt is greater than χ²cv; this is not the case.
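The goodness-of-fit arithmetic in #8a through #8d can be checked with a few lines of Python. This is only a minimal sketch: the function name `chi_square_gof` is hypothetical, the observed and expected counts are the ones given in #8a, and the critical value is the tabled figure from #8c rather than something the code derives.

```python
def chi_square_gof(observed, expected):
    """Return the chi-square statistic: the sum of (O - E)^2 / E over categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [31, 89]  # observed counts from #8a
expected = [24, 96]  # expected counts from #8a
CHI2_CV = 3.841      # tabled critical value for df = 1, alpha = .05 (from #8c)

chi2_obt = chi_square_gof(observed, expected)
df = len(observed) - 1             # number of categories minus one
reject_h0 = chi2_obt > CHI2_CV     # decision rule stated in #8d

print(f"chi2({df}) = {chi2_obt:.3f}, reject H0: {reject_h0}")
# prints: chi2(1) = 2.552, reject H0: False
```

The sketch reproduces the hand calculation: the obtained value of 2.552 is below the critical value of 3.841.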
The researcher will fail to reject the null hypothesis, because the number of people exercising in California is not significantly larger than the number of people exercising in the United States.

#2a. This study meets the assumptions of the independent-groups t test, with a one-tailed test.
#2b. H0: μ NM ≥ μ Mus; Ha: μ NM < μ Mus
#2c. tcv = 1.86; t(8)obt = 2.193; p < .05, one-tailed test (SPSS was used to get the obtained value).
#2d. Reject the null hypothesis, and conclude that studying without music leads to better attention to the material.
#2e. r² = t² / (t² + df) = 2.193² / (2.193² + 8) = 4.809 / (4.809 + 8) = 4.809 / 12.809; r² = 0.375, a large effect size.
#2f. [Figure: bar graph of mean test scores (y-axis, 0 to 8) for the Music and No Music conditions]
#2g. CI = (M1 – M2) ± tcv * sM1–M2, where sM1–M2 is the standard error of the difference between means. Because tobt = (M1 – M2) / sM1–M2, sM1–M2 = 1.556 / 2.193 = 0.710. CI = (7.556 – 6.000) ± 1.86 * 0.710 = 1.556 ± 1.321; CI = 0.235 to 2.877
#4a. This study meets the assumptions of the t test for correlated groups, with a one-tailed test.
#4b. H0: μ NM ≥ μ Mus; Ha: μ NM < μ Mus
#4c. tcv = 2.015; t(5)obt = 2.739; p < .05, one-tailed test (Excel was used to get the obtained value).

t-Test: Paired Two Sample for Means

                                No Music       Music
Mean                            6.666666667    7.666666667
Variance                        1.466666667    0.666666667
Observations                    6              6
Pearson Correlation             0.674199862
Hypothesized Mean Difference    0
df                              5
t Stat                          2.738612788
P(T<=t) one-tail                0.020429702
t Critical one-tail             2.015048372
P(T<=t) two-tail                0.040859404
t Critical two-tail             2.570581835

#4d. Reject the null hypothesis, and conclude that studying without music leads to better test scores.
#4e. r² = t² / (t² + df) = 2.739² / (2.739² + 5) = 7.502 / (7.502 + 5) = 7.502 / 12.502; r² = 0.60, a large effect size.
#4f. [Figure: bar graph of mean test scores (y-axis, 0 to 8) for the Music and No Music conditions]
#4g. CI = (M1 – M2) ± tcv * sD, where sD is the standard error of the difference scores. Because tobt = (M1 – M2) / sD, sD = 1 / 2.739 = 0.365. CI = (7.667 – 6.667) ± 2.015 * 0.365 = 1 ± 0.735; CI = 0.265 to 1.735
#6a. This study meets the assumptions of the Wilcoxon rank-sum test, with a one-tailed test.
#6b. H0: μ gs ≥ μ rs; Ha: μ gs < μ rs
#6c.
W(n1 = 7, n2 = 7)cv = 39; W(n1 = 7, n2 = 7) = 43, p > .05, one-tailed test.

Red Sauce           Green Sauce
Score   Rank        Score   Rank
7       8           4       1
6       4.5         5       2
9       12.5        6       4.5
10      14          8       10.5
6       4.5         7       8
7       8           6       4.5
8       10.5        9       12.5
Sum     62          Sum     43

#6d. Fail to reject the null hypothesis. Taste scores for the two sauces did not differ significantly.
#8a. χ²(1, N = 93) = 6.732, p < .05

            Observed          Expected          (O – E)² / E
            Men     Women     Men     Women     Men      Women
Front       15      27        21.2    20.8      1.826    1.866
Back        32      19        25.8    25.2      1.504    1.537
Total       47      46

Row totals: Front = 42, Back = 51; N = 93. χ²obt = 1.826 + 1.866 + 1.504 + 1.537 = 6.732; χ²cv = 3.841; df = 1.

#8b. df = (2 – 1) * (2 – 1) = 1
#8c. χ²cv = 3.841
#8d. Reject the null hypothesis. There is a significant difference in seating preferences between women and men. More men sit in the back row and more women sit in the front row.

Part I Assignment Question Answers

What are degrees of freedom? How are they calculated?
The degrees of freedom are the number of scores in a sample that are free to vary. For any given mean, all of the values except the last one can change freely while maintaining the mean. Thus, for a single sample, the degrees of freedom can be calculated with the formula df = N – 1.

What do inferential statistics allow you to infer?
Inferential statistics allow you to infer what can be said about a population based on the results of a research study conducted on a sample of that population, provided that specific sampling and statistical procedures are followed.

What is the General Linear Model (GLM)? Why does it matter?
The general linear model (GLM) unifies many statistical procedures, such as the t test, ANOVA, ANCOVA, and regression, as special cases of a single linear model. Since the goal of data analysis “is to summarize or describe accurately what is happening in the data” (Trochim & Donnelly, 2008, p. 297), the GLM enhances this ability. The GLM has brought together disparate statistical tools into one model, while also allowing for the development of advanced techniques, such as SEM and HLM, for testing complex models.

Compare and contrast parametric and nonparametric statistics.
Why and in what types of cases would you use one over the other?
Parametric statistics are used when analyzing interval or ratio data. To use parametric statistics, the data must be bell-shaped and have known or calculable population means and standard deviations. Parametric statistics allow for more accurate and precise estimates than nonparametric measures as long as the assumptions upon which they are based are true. Parametric statistics allow scores to be predicted on the basis of probabilities. Nonparametric statistics are used when analyzing nominal or ordinal data. With nonparametric statistics the distribution of the data does not have to be bell-shaped, and population parameters are not needed. Nonparametric statistics are used in situations where (a) an assumption about the probability distribution is not warranted, (b) ranking is involved, or (c) preferences are being assessed. The design of a study may affect which particular statistical test is used, but it does not determine whether parametric or nonparametric statistics are appropriate.

Why is it important to pay attention to the assumptions of the statistical test? What are your options if your dependent variable scores are not normally distributed?
Parametric statistics are only accurate, or have statistical power, when the assumptions upon which the test is based are true. If the test assumptions are not met, the results can be inaccurate, misleading, or wrong. When dependent variable scores are not exactly normally distributed but are still roughly bell-shaped, a parametric t test can be performed. If, however, the dependent variable scores are not bell-shaped, the principal option for analysis is the use of nonparametric statistical tests.

Part II Assignment Question Answers

What does p = .05 mean? What are some misconceptions about the meaning of p = .05? Why are they wrong? Should all research adhere to the p = .05 standard for significance? Why or why not?
A p value of .05 means that, if the null hypothesis were true, a result at least as extreme as the one observed would occur by chance 5% of the time. When .05 is adopted as the significance level, or alpha, of a study, it represents the probability of committing a Type I error, that is, of rejecting the null hypothesis when it is true. Schmidt (2010) identified six misconceptions regarding significance: that significance (a) indicates reliable replication, (b) identifies the size of a relationship, (c) indicates no relationship when a result is not significant, (d) is essential to research, (e) guarantees impartiality, and (f) contributes to the field. Significance has no bearing on replication; it indicates only that, in situations where there is no actual relationship between variables, there is a 1-in-20 chance that a significant finding will be found. Correlation or effect size identifies the strength of a relationship; significance does not. Both Type I and Type II errors are to be expected, because there is always a chance that a significant result will be found even if no relationship exists, or that a nonsignificant result will be found even if a relationship does exist. Significance identifies the probability of only the first possibility. Significance can be manipulated through sample size, is not the best measure for summarizing research data, is not essential to research, does not guarantee impartiality of observations, and may distort and “retard the development of cumulative knowledge” (Schmidt, 2010, p. 239). Cohen (1992) recommended that, when multiple null hypotheses are being tested, the significance level be lowered so that “experimentwise [sic] risk not become too large” (p. 156); otherwise conclusion validity is decreased by the “fishing and the error rate problem” (Trochim & Donnelly, 2008, p. 255). While a significance level of .05 is traditional, there is nothing sacrosanct about that level of Type I probability, and Faul, Erdfelder, Lang, and Buchner (2007) concluded that there is no reason why the exact significance level of results should not be reported in reports and articles.
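Cohen's (1992) concern about experimentwise risk can be illustrated with a short calculation. The Python sketch below is not taken from any of the cited sources: it assumes that the tests are independent and that every null hypothesis is true, and the function names `familywise_error_rate` and `bonferroni_alpha` are hypothetical. The Bonferroni adjustment is shown as one common way of lowering the per-test significance level, not as the specific method Cohen prescribed.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one Type I error across n_tests independent tests,
    assuming every null hypothesis is true."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha that keeps the overall risk at or below alpha (Bonferroni)."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


ALPHA = 0.05
for k in (1, 5, 10, 20):
    risk = familywise_error_rate(ALPHA, k)
    adjusted = bonferroni_alpha(ALPHA, k)
    print(f"{k:>2} tests: overall Type I risk = {risk:.3f}, "
          f"Bonferroni per-test alpha = {adjusted:.4f}")
```

Under these assumptions, a single test at alpha = .05 carries the traditional 1-in-20 risk, while ten tests carry roughly a 40% chance of at least one spurious significant result.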
Compare and contrast the concepts of effect size and statistical significance.
There are four interrelated components of statistical power analysis: significance, sample size, effect size, and power (Cohen, 1992; Faul, Erdfelder, Lang, & Buchner, 2007; Trochim & Donnelly, 2008). Many researchers focus on significance almost to the exclusion of the other components (Cohen, 1992). The idea behind statistical analysis of research is to determine whether a treatment creates two separate populations whose differences can be measured. The effect size is a measure of the degree to which the experimental mean is expected to deviate from the control mean, or of the variance accounted for in a study. Statistical significance is the maximum allowable risk of erroneously rejecting the null hypothesis and committing a Type I error. The two are similar in that, as either decreases, the minimum sample size must increase in order to maintain sufficient power to find a significant result; as either increases, the minimum sample size needed to maintain the same power decreases. Other than their relationship to statistical power, significance and effect size are not similar at all.

What is the difference between a statistically significant result and a clinically or “real world” significant result? Give examples of both.
A statistically significant result traditionally can occur by chance 1 in 20 times even if the null hypothesis is true. Clinical, or real-world, significance occurs when research results are useful in expanding the knowledge of the field. Statistical significance indicates that the results obtained differ from the population mean and probably are not due to chance.
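The contrast between statistical and practical significance can be sketched numerically. The Python example below uses entirely hypothetical numbers that do not come from any study cited here (a population mean of 100 and standard deviation of 15), and the function name `one_sample_z` is likewise an assumption made for illustration. It runs a one-sample z test and reports Cohen's d alongside the two-tailed p value.

```python
import math


def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    """Return (z, two-tailed p, Cohen's d) for a one-sample z test."""
    sem = pop_sd / math.sqrt(n)                # standard error of the mean
    z = (sample_mean - pop_mean) / sem
    p = math.erfc(abs(z) / math.sqrt(2))       # two-tailed p under the standard normal
    d = (sample_mean - pop_mean) / pop_sd      # effect size in standard deviation units
    return z, p, d


ALPHA = 0.05

# Hypothetical scenario A: a trivial difference with a very large sample.
z_a, p_a, d_a = one_sample_z(100.5, 100.0, 15.0, 10_000)

# Hypothetical scenario B: a moderate difference with a small sample.
z_b, p_b, d_b = one_sample_z(106.0, 100.0, 15.0, 16)

for label, z, p, d in (("A", z_a, p_a, d_a), ("B", z_b, p_b, d_b)):
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"Scenario {label}: z = {z:.2f}, p = {p:.4f} ({verdict}), d = {d:.2f}")
```

In scenario A the result is statistically significant even though the effect is negligible, while in scenario B a moderate effect fails to reach significance because the sample is small. Neither outcome by itself says whether the difference matters in practice.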
Schmidt (2010) described the results of a meta-analysis in which eight studies were found to be statistically nonsignificant, while another eight studies were found to be statistically significant, even though all 16 studies used the same “test of decision making to supervisory rating of job performance in various midlevel jobs” (p. 234). A study can return statistical significance even if the null hypothesis is true, or fail to be statistically significant even though the null hypothesis is false; clinical significance is never achieved unless the underlying data represent an important difference (Carver, 1978). Schmidt’s meta-analysis is an example of practical significance because eliminating the sampling and measurement error from the 16 studies resulted in “basically a single value (.32) because there is nearly no variation” (p. 236).

What is NHST? Describe the assumptions of the model.
Null hypothesis significance testing (NHST) assumes that differences in means from the H0 are due principally to sampling variance or chance (Kirk, 2003). If a statistically significant result is found, the null hypothesis is rejected on the assumption that the result is probably not due to those factors.

Describe and explain three criticisms of NHST.
Three criticisms of null hypothesis significance testing are that (a) the wrong question is being answered, (b) it is a trivial exercise, and (c) it makes an ordinal value out of a ratio value, losing much in the conversion. Researchers want to ascertain truth when conducting research: What is the probability that this treatment will consistently result in the outcomes predicted (reliability), and that the research hypothesis is true (validity)? Significance testing contributes to answering neither of these questions, but instead answers a different question: What is the probability that the results obtained were not a chance occurrence?
The assumptions of significance testing recognize that there is sampling variance in all research measures. Based on this assumption, the probability that two means on any measure will be exactly the same to an infinite number of decimal places is zero. Therefore, the null hypothesis will be false in every case, making it a trivial exercise to compare anything with it. The third criticism of NHST is that significance testing creates an arbitrary “go-no-go decision straddled over p = .05” (Cohen, 1992, p. 156) or a “cliff effect” (Kirk, 2003, p. 87) that tends to mark significant findings as important and nonsignificant findings as unimportant; a “hurdle, with the statistical significance test coming before further consideration of the results” (Carver, 1978, p. 388). This hurdle, in essence, places chance and sample size before real-world significance.

Describe and explain two alternatives to NHST. What do their proponents consider to be their advantages?
An alternative to NHST proposed by Schmidt (2010) is to follow the example of physical scientists and use confidence intervals. He suggests that confidence intervals provide more information than significance tests do and are just as objective. He further cites the recommendation of the 1999 APA Task Force Report on significance testing that confidence intervals should be reported in research results. Kirk (2003) noted that confidence intervals provide all of the information associated with significance, but also contain “a range of values within which the population parameter is likely to lie, . . . the same unit of measurement as the data, . . . [and] are especially useful in assessing the practical significance of results” (p. 88). Both Schmidt (2010) and Kirk (2003) also propose the reporting of effect sizes in research results. Effect sizes summarize data by identifying the strength of an association.
By reporting this strength, researchers make it possible to estimate the practical usefulness of the results.

Which type of analysis would best answer the research question you stated in Activity 1? Justify your answer.
I did not propose or state a research question in Activity 1, nor was one asked of me.

References

Carver, R. P. (1978). The case against statistical significance testing. Harvard Educational Review, 48(3), 378-399. Retrieved from http://scholasticadministrator.typepad.com/thisweekineducation/files/the_case_against_statistical_significance_testing.pdf

Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159. doi:10.1037/0033-2909.112.1.155

Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175-191. Retrieved from http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/download-and-register/Dokumente/GPower3BRM-Paper.pdf

Jackson, S. L. (2012). Research methods and statistics: A critical thinking approach (4th ed.). Belmont, CA: Wadsworth Cengage Learning.

Kirk, R. E. (2003). The importance of effect magnitude. In S. F. Davis (Ed.), Handbook of research methods in experimental psychology (pp. 83-105). doi:10.1002/9780470756973.ch5

Schmidt, F. (2010). Detecting and correcting the lies that data tell. Perspectives on Psychological Science, 5(3), 233-242. doi:10.1177/1745691610369339

Trochim, W. M. K., & Donnelly, J. P. (2008). The research methods knowledge base (3rd ed.). Mason, OH: Cengage Learning.