NORTHCENTRAL UNIVERSITY ASSIGNMENT COVER SHEET

Learner: Stephen W. Watts

THIS FORM MUST BE COMPLETELY FILLED IN

Academic Integrity: All work submitted in each course must be the Learner's own. This includes all assignments, exams, term papers, and other projects required by the faculty mentor. The known submission of another person's work represented as that of the Learner's without properly citing the source of the work will be considered plagiarism and will result in an unsatisfactory grade for the work submitted or for the entire course, and may result in academic dismissal.

EDU7006-8 Quantitative Research Design — Dr. Theresa Thonhauser
Assignment 7: Samples, Power Analysis, and Design Sensitivity

Part I
(a) Compare and contrast internal and external validity. Describe and give examples of research questions for which external validity is a primary concern. Describe and give examples of research questions for which internal validity is a primary concern. Discuss strategies researchers use in order to make strong claims about the applicability of their findings to a target population.
(b) Compare and contrast random selection and random assignment. Be sure to include a discussion of when you would want to do one or the other and the possible consequences of failing to do random selection or random assignment in particular situations.
(c) Explain the relationship between sample size and the likelihood of a statistically significant difference between measured values of two groups. In other words, explain why, all else being equal, as sample size increases the likelihood of finding a statistically significant relationship increases.
(d) Compare and contrast probability and non-probability sampling. What are the advantages and disadvantages of each?

Part II
(a) In a few sentences, describe two designs that can address your research question. The designs must involve two different statistical analyses. For each design, specify and justify each of the four factors and calculate the estimated sample size you will need. Give reasons for any parameters you need to specify for G*Power.

Include peer-reviewed journal articles as needed to support your responses to Part I. Length: 5-7 pages (approximately 350 words per page).

Faculty Use Only
Steve: Thank you for your work on this paper! Your responses to the essay questions and power analysis are thorough, thoughtful, and well supported. Based on what I currently know about your research, you are right on track with your power analysis, and the statistical tests you are selecting seem appropriate for your proposed research. While you will work with your dissertation chair and committee on your research questions and hypotheses, I've given you some feedback on them in this paper. I think you have a really interesting study topic, but I recommend clarifying and streamlining the language and variables in the research question and hypotheses. Please see my comments throughout your document for details and let me know if you have any questions. Thanks! Theresa Thonhauser
Content Score: 97% Writing Score: 93%
August 27, 2012

Samples, Power Analysis, and Design Sensitivity
Stephen W. Watts
Northcentral University

Part I
1. Validity focuses on the truth or accuracy of findings (Cozby & Bates, 2012). Validity constructs are foundational principles used to determine the quality of research (Trochim & Donnelly, 2008) and are operative when working with questions dealing with cause and effect (Cozby & Bates, 2012). "Internal validity is synonymous with control" (Salkind, 2009, p. 231). If a study has no confounds, so that the results of the outcome variable are exclusively attributable to the manipulation of the independent variable (Jackson, 2012; Yu & Ohlund, 2010), and if the structural integrity of the research design allows no other plausible explanations for the results (Oncu & Cakir, 2011), the study is said to have good internal validity. External validity, on the other hand, "is synonymous with generalizability" (Salkind, 2009, p. 231) and consists in the degree to which the findings of a study can be generalized (Jackson, 2012) to other populations and places (Cozby & Bates, 2012; Oncu & Cakir, 2011; Trochim & Donnelly, 2008), "treatment variables, and measurement instruments" (Dimitrov & Rumrill, 2003, p. 159).

A study does not simply have or lack internal or external validity; these constructs are not binary. Further, as the internal validity of a study increases, making its causal conclusions more determinate, its external validity tends to decrease, making the findings less extendable. Conversely, the higher the generalizability, the less likely a study is to establish a causal relationship.

Research questions for which external validity is the primary concern are those focused on generalizing the findings of the study to a larger population. For example, Ferguson and DeFelice (2010) surveyed graduate students in a specific class offered in two formats of different lengths to determine whether, when all else was kept reasonably constant, the two formats produced differences in students' satisfaction with the communication in the class, students' likelihood of taking another online class, students' perceived learning, or students' academic performance. While it was instructive that there were differences between students who attended the shorter summer classes and students who attended the regular-length classes, the authors were more interested in generalizing the findings so that they could make inferences about ways to improve online classes, and they suggested that the study "may have implications for an institution's policies for determining the length of online courses" (p. 81).

Research questions for which internal validity is a primary concern are those specifically looking for cause and effect. For example, Chyung and Vachon (2005) investigated the research question, "what factors of an e-learning system do e-learners express as satisfying factors (i.e., motivation factors) and what factors do they express as dissatisfying factors (i.e., hygiene factors)" (p. 103)? In this study the authors specifically wanted to determine what causes satisfaction and dissatisfaction in students, and whether satisfaction is or is not simply the inverse of dissatisfaction, and vice versa. Joo et al. (2009) demonstrated their interest in internal validity with two research questions:
"does learners' satisfaction predict academic achievement in the corporate cyber education" (p. 3931) and "does learners' satisfaction predict learning transfer in the corporate cyber education" (p. 3931). Another way of asking whether a study has internal validity is to ask whether its findings have predictive validity. It is this ability to predict that Joo et al. (2009) hoped to demonstrate, and they found that learner satisfaction is predictive of both academic achievement and transfer of learning.

To make a strong claim regarding the applicability of a finding to a target population, a study must have strong external validity. The most important strategy for researchers who desire strong external validity is to ensure that subjects are randomly selected (Jackson, 2012; Oncu & Cakir, 2011; Trochim & Donnelly, 2008). Once a sample has been randomly selected, it is important to minimize dropout (Trochim & Donnelly, 2008), ensure that the researchers have been properly instructed on how to interact with subjects (Salkind, 2009), and "be careful in interpreting the results, . . . because any over-interpretation would also be a threat to external validity" (Oncu & Cakir, 2011, p. 1105). Another strategy for strengthening external validity is replication of results in various settings, with different samples, and at different times (Cozby & Bates, 2012; Jackson, 2012).

2. Random selection is a manner of choosing a sample from a population for a study so that each member of the population has an equal chance of being in the sample (Trochim & Donnelly, 2008). A sample that is chosen randomly from an appropriate population and is of sufficient size is said to be representative of the population (Jackson, 2012). A representative sample enhances the external validity of a study: "When the sample is representative of the population, we can be fairly confident that the results we find based on the sample also hold for the population" (Jackson, 2012, p. 100). Random selection may also be used to identify different start times in a multiple-baseline design (Koehler & Levin, 1998). Without random selection of subjects, determining the population to which study results can be generalized is much more difficult, if not impossible.

Once a sample is selected, random assignment is the manner of allocating the subjects in the sample to groups by chance, such that each subject has an equal chance of being in any specific group (Trochim & Donnelly, 2008). True experiments are defined by random assignment of subjects to groups because "randomization ensures that the individual characteristic composition of the two groups will be virtually identical in every way" (Cozby & Bates, 2012, p. 82). Statistically equivalent groups enhance internal validity within a study (Trochim & Donnelly, 2008). Without random assignment of subjects to control and experimental groups, outcomes must be interpreted with caution because, given the potential for selection bias (Oncu & Cakir, 2011), "we can never conclude that the independent variable definitely caused any of the observed changes in the dependent variable" (Jackson, 2012, p. 350).

Random selection and random assignment are not mutually exclusive. A study can have neither random selection nor random assignment, either one alone, or both. Random selection increases the external validity of a study, provided that the selection is done from the population to which the researcher hopes to generalize. Random assignment, on the other hand, increases the internal validity of a study by providing probabilistic equivalency between groups. The short sketch below illustrates both procedures.
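The following minimal sketch simulates both procedures side by side. It is illustrative only: the sampling frame of 10,000 learners, the sample size of 310, and the group labels are hypothetical values chosen for this example, not part of any procedure proposed in this paper.

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# Hypothetical sampling frame: 10,000 enrolled learners.
population = [f"learner_{i}" for i in range(10_000)]

# Random SELECTION: every member of the frame has an equal chance of
# entering the sample, supporting external validity (generalizability).
sample = random.sample(population, k=310)

# Random ASSIGNMENT: every sampled subject has an equal chance of landing
# in either condition, supporting internal validity (group equivalence).
random.shuffle(sample)
control, experimental = sample[:155], sample[155:]

print(len(control), len(experimental))  # 155 155
```

Note that the two steps are independent: a study that filled `population` by convenience but still shuffled subjects into `control` and `experimental` would keep the internal-validity benefit of random assignment while losing the external-validity benefit of random selection.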
3. One aspect that is crucial to any experimental design is sample size. A sample with too few participants runs the risk of (a) poorly reflecting the underlying population, (b) not finding a significant result when the null hypothesis is false, and (c) producing nonreplicable results. A sample with too many participants, however, can be much more costly and slower to study (Acheson, 2010). As the number of subjects drawn from a single population increases, variability around the mean decreases; as the number of subjects decreases, variability around the mean is likely to increase (Jackson, 2012). Statistical power analysis rests upon four variables: the sample size (N), the significance criterion (α), the population effect size (ES), and the statistical power (1 − β, where β is the probability of a Type II error). Each of these variables is directly related to the other three (Cohen, 1992; Faul, Erdfelder, Lang, & Buchner, 2007).

In a simple example, in order to reject the null hypothesis, a comparison is made between the mean of the control group and the mean of the experimental group. If the mean of the experimental group, when converted to a z-score, surpasses a critical value, the null hypothesis is rejected. The z-score is calculated using the following equation:

$$z = \frac{M - \mu}{\sigma_M}$$

To have a better possibility of rejecting the null hypothesis, z must be as large as possible. The observed mean difference determines the numerator. To increase the probability that the outcome is significant, I must manipulate the denominator: the smaller the denominator, the larger the value of z. The formula for the denominator, the standard error of the mean, is:

$$\sigma_M = \frac{\sigma}{\sqrt{N}}$$

By increasing the size of N, the standard error of the mean gets smaller, increasing the size of z. This is the effect that I am looking for. Therefore, as the size of N increases, the value of z increases, increasing the likelihood of a statistically significant result; conversely, as the size of N decreases, the value of z decreases, decreasing the likelihood of a statistically significant result. The short sketch following this paragraph makes the relationship concrete.
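As a worked illustration of the two equations above, the sketch below holds the mean difference and σ constant and varies only N; the numeric values are invented solely for this example. The two-tailed p-value shrinks as N grows, which is exactly the relationship just described.

```python
# Worked illustration of z = (M - mu) / (sigma / sqrt(N)) with the mean
# difference and sigma held constant; the values below are made up.
from math import sqrt

from scipy import stats

mean_diff = 2.0  # M - mu, held constant
sigma = 10.0     # population standard deviation, held constant

for n in (25, 100, 400):
    se = sigma / sqrt(n)           # standard error shrinks as N grows
    z = mean_diff / se             # so z grows with N
    p = 2 * stats.norm.sf(abs(z))  # two-tailed p-value
    print(f"N={n:4d}  SE={se:5.2f}  z={z:5.2f}  p={p:.4f}")

# Output:
# N=  25  SE= 2.00  z= 1.00  p=0.3173
# N= 100  SE= 1.00  z= 2.00  p=0.0455
# N= 400  SE= 0.50  z= 4.00  p=0.0001
```

With the same underlying mean difference, quadrupling N halves the standard error and doubles z, moving the identical effect from nonsignificant to significant.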
4. Probability sampling utilizes random selection: a procedure or process is implemented to ensure that each unit of the population has a fair and equal chance of being selected (Trochim & Donnelly, 2008). "Because the determination of who will end up in the sample is determined by nonsystematic and random rules, the chance that the sample will truly represent the population is great" (Salkind, 2009, p. 90). Probability sampling has the following advantages: (a) it allows the estimation of confidence intervals (Trochim & Donnelly, 2008), (b) it is the easiest way to get a representative sample (Salkind, 2009; Trochim & Donnelly, 2008), (c) it is "more accurate and rigorous than non-probabilistic samples" (Trochim & Donnelly, 2008, p. 48), and (d) it allows the computing of sampling variances (McCready, 2006). Probability sampling, however, can also be "time consuming and tedious" (Salkind, 2009, p. 97).

There are two types of non-probability sampling. The first is purposive sampling, in which subjects with specific criteria are sought for inclusion in a study (Jackson, 2012; Trochim & Donnelly, 2008). The second may be called accidental (Trochim & Donnelly, 2008), haphazard (Jackson, 2012), person-on-the-street, or convenience sampling (Jackson, 2012; Trochim & Donnelly, 2008). Convenience sampling is often used in educational research, where "the characteristics of a specific group of individuals match the attributes of the phenomenon being studied" (Rhode, 2009, p. 4; see also Ali & Ahmad, 2011; Boling, Hough, Krinsky, Saleem, & Stevens, 2011; Tallent-Runnels et al., 2006). Non-probability sampling is often used because it is (a) convenient, (b) inexpensive, (c) usable "when it is not feasible, practical, or theoretically sensible to use random sampling" (Jackson, 2012, p. 101), and (d) applicable in situations where withholding of treatment may be unethical. The problem with non-probability sampling is that the ability to generalize is weakened (Salkind, 2009; Trochim & Donnelly, 2008).

Part II

1a. To meet the given factors, the sample must include at least 620 subjects: 310 subjects in each group when ES (d) = .2, α = .05, and β = .2, for a one-tailed t-test with two independent groups of equal size (see Figure 1a below).

1b. For a sample size of 310 (155 in each group), the compromise function yields α = .09 and β = .35 (see Figure 1b below). "Determining a meaningful effect size is a judgment call. . . . Selecting a desirable power is achieved by balancing the need to detect an outcome with the difficulty in obtaining large sample sizes" (Houser, 2007, pp. 2-3). Due to the law of large numbers, the benefit of a larger sample diminishes while the cost and time involved increase (McCready, 2006). Using a t-test for independent groups, we compare the means of the samples to determine how likely it is that the groups come from the same population, or whether they differ enough to conclude that they come from different populations. In this case, "if an outcome is detected, then power is not an issue, the sample was obviously large enough to detect it" (Houser, 2007, p. 1). The diminished statistical power caused by the smaller sample does not mean that an effect cannot be found; it just means that the chance of finding a significant result is reduced. While a significance criterion of .09 and a power of .65 are traditionally unacceptable (Salkind, 2009), if the information to be gained is important, or the study is preliminary, it may be worthwhile to do the study anyway.

Figure 1a. [G*Power output for the analysis in 1a.]
Figure 1b. [G*Power output for the analysis in 1b.]

2a. For a one-way analysis of variance (ANOVA) with three groups and ES (f) = .1, α = .05, and β = .2, the sample must include at least 969 subjects (see Figure 2a below).

2b. For a sample of 486, approximately half the size computed in 2a, the compromise function yields α = .10 and β = .39 (see Figure 2b below). An ANOVA compares the means of three or more groups to determine whether "at least one group mean differs from the others by more than would be expected based on chance" (Jackson, 2012, p. 287). It is traditional to utilize a beta/alpha ratio of 4:1 (Cohen, 1992), but some authors argue that "the benefit of balanced Type I and Type II error risks often offsets the costs of violating significance level conventions" (Faul et al., 2007, p. 177). I chose the traditional ratio for this assignment. If an appropriate sample size is not attainable, a convenience sample of this size still provides a 90% chance of not committing a Type I error and a 61% chance of not committing a Type II error. If the study addresses a new or developing theory for which there are few previous reports or pilot studies, a study with a convenience sample may still be worth conducting, even with a diminished capacity to find a significant result. A programmatic cross-check of the 1a and 2a estimates follows the figures.

Figure 2a. [G*Power output for the analysis in 2a.]
Figure 2b. [G*Power output for the analysis in 2b.]
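The a priori estimates in 1a and 2a can be reproduced programmatically. The sketch below uses the statsmodels library rather than G*Power (an assumption of this illustration; the computations above were done in G*Power) and should match the same sample sizes up to rounding.

```python
# Cross-check of the a priori G*Power estimates in 1a and 2a, sketched with
# statsmodels (an assumption; the paper's own computations used G*Power).
import math
from statsmodels.stats.power import TTestIndPower, FTestAnovaPower

# 1a: one-tailed independent-samples t-test, d = .2, alpha = .05,
# power = .8 (beta = .2), two equal groups.
n_per_group = TTestIndPower().solve_power(
    effect_size=0.2, alpha=0.05, power=0.8, ratio=1.0, alternative="larger")
print(math.ceil(n_per_group))  # 310 per group, i.e., 620 subjects in total

# 2a: one-way ANOVA with three groups, f = .1, alpha = .05, power = .8.
# For FTestAnovaPower, solve_power returns the TOTAL sample size.
n_total = FTestAnovaPower().solve_power(
    effect_size=0.1, alpha=0.05, power=0.8, k_groups=3)
print(math.ceil(n_total))  # about 969 in total, matching G*Power to rounding
```

Reproducing a power analysis in a second tool is a cheap safeguard against mis-set parameters (for example, a two-tailed test selected where a one-tailed test was intended).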
3. The following research question and corresponding hypotheses will be investigated relative to two experimental designs:

Q1. How does satisfaction of adult learners, as measured by the Learner satisfaction subsection of the LSTQ (Gunawardena, Linder-VanBerschot, LaPointe, & Rao, 2010), vary, if at all, in an online live virtual classroom (LVC) environment between learners who continuously see the instructor through visual technology (a webcam) and learners who cannot see the instructor through a webcam?

H0. Measures of learner satisfaction are statistically equivalent when the visual (webcam) element is used continuously as opposed to when it is not used continuously or not used at all in online LVC instruction of adult technical professional development courses (μLVC = μwc).

Ha. Measures of learner satisfaction are statistically different when the visual (webcam) element is used continuously as opposed to when it is not in online LVC instruction of adult technical professional development courses (μLVC ≠ μwc).

3a. In the first design, data will be collected from the online classes of at least ten instructors who teach various technologies. Each instructor will teach two instances of two different online classes lasting five consecutive days or less. These classes will be paired, such that one instance of the class will be taught according to that instructor's normal delivery (the control) and one instance will be taught in the normal style with the addition of a webcam transmitting the instructor's image to the class during interactive periods of the class (the experiment). Whether the control class or the experimental class is taught first will be randomized. At the end of the class, each student will be encouraged to fill out the Learner Satisfaction and Transfer-of-learning Questionnaire (LSTQ), developed and validated by Gunawardena et al. (2010), in addition to the regular course evaluation. Incomplete surveys, and surveys that have the same value for all sixteen questions, will be discarded.

The independent variable in this first design is the visual element, which has two attributes: full use of the webcam (1) and minimal use of the webcam (0). The dependent variable is learner satisfaction, a construct derived from the Learner satisfaction subscale of the LSTQ, which consists of five 5-point Likert-scale questions. The learner satisfaction construct is an ordinal variable varying from strongly agree = 5 to strongly disagree = 1. This design will use two groups of equal size, one representing each attribute of the independent variable. It is not known whether the distribution of scores from the Learner satisfaction subscale of the LSTQ will be normal, so the Wilcoxon rank-sum test will be used to determine whether one sample has significantly larger values than the other. According to the hypotheses, it is not known whether use of the webcam will increase or decrease scores, so a two-tailed test will be used. It is expected that the scores from the subscale will be leptokurtic with negative skew; therefore the Laplace parent distribution will be selected in G*Power. As the standard deviation of the data is unknown, the effect size will be set to d = 0.3: slightly larger than a small effect size, but not a medium one. The traditional values of α = 0.05 and β = 0.2 are generally acceptable for most research in the social sciences (Salkind, 2009) and have been selected here. Based on the preceding factors, an a priori analysis indicates that a minimum sample size of N = 234 is required to have an adequate chance of rejecting the null hypothesis if it is false (see Figure 3a below). A sketch of how the resulting data might be analyzed follows the figure.

Figure 3a. [G*Power output for the analysis in 3a.]
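To show what the planned analysis for design one would look like in practice, here is a hedged sketch of the two-tailed Wilcoxon rank-sum (equivalently, Mann-Whitney U) test on the Learner satisfaction subscale. The scores are fabricated placeholders for two groups of 117 learners (N = 234), not study data, and scipy is assumed as the analysis library.

```python
# Hedged sketch of the planned analysis for design one: a two-tailed
# Wilcoxon rank-sum (Mann-Whitney U) test on Learner satisfaction subscale
# scores. All scores below are fabricated placeholders, not study data.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(7)

# Placeholder subscale totals (five 5-point Likert items, so 5-25) for the
# webcam and no-webcam groups of 117 learners each (N = 234).
webcam = rng.integers(5, 26, size=117)
no_webcam = rng.integers(5, 26, size=117)

u_stat, p_value = mannwhitneyu(webcam, no_webcam, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.3f}")
# H0 is rejected at alpha = .05 only if p < .05; with placeholder scores
# drawn from the same distribution, p will usually be well above .05.
```

Because the test ranks the pooled scores rather than assuming normality, it suits the anticipated leptokurtic, negatively skewed subscale distribution.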
3b. Utilizing the same collection procedure, the second design adds a second independent variable, technology, which can take five different values, making a 2 × 5 factorial design. This design can determine whether there is a significant main effect of the webcam on learner satisfaction across all groups, and whether learner satisfaction is significantly affected by the technology taught in the class. The factorial design also allows determining whether interaction effects exist between use of the webcam and the different technologies. The independent variables in this design are the visual element, which has two attributes, and the type of technology, which has five attributes. According to Heine (n.d.), the appropriate test is the ANOVA: Fixed effects, special, main effects and interactions statistical test. I have chosen an effect size to match the previous discussion, one slightly larger than a small effect size but not a medium one, so f = 0.15. I have also chosen the traditionally acceptable values for alpha and beta: α = 0.05 and β = 0.2. Since this is a two-factor test, the degrees of freedom for the interaction are the number of possible values for the first factor less one, multiplied by the number of possible values for the second factor less one; in this case, df = (2 − 1)(5 − 1) = 4. The number of groups in a factorial design is determined by multiplying the numbers of possible values of all factors together; in this case, 2 × 5 = 10. Based on the preceding factors, an a priori analysis indicates that a minimum sample size of N = 536 is required to have an adequate chance of finding significant main and interaction effects if they exist (see Figure 3b below).

Figure 3b. [G*Power output for the analysis in 3b.]

References

Acheson, A. (2010). Sample size. In N. J. Salkind (Ed.), Encyclopedia of research design (pp. 1300-1302). doi:10.4135/9781412961288

Ali, A., & Ahmad, I. (2011). Key factors for determining students' satisfaction in distance learning courses: A study of Allama Iqbal Open University. Contemporary Educational Technology, 2(2), 118-134. Retrieved from http://cedtech.net/

Boling, E. C., Hough, M., Krinsky, H., Saleem, H., & Stevens, M. (2011). Cutting the distance in distance education: Perspectives on what promotes positive, online learning experiences. Internet and Higher Education. doi:10.1016/j.iheduc.2011.11.006

Chyung, S. Y., & Vachon, M. (2005). An investigation of the profiles of satisfying and dissatisfying factors in e-learning. Performance Improvement Quarterly, 59(3), 227-245. doi:10.1177/0741713609331546

Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159. doi:10.1037/0033-2909.112.1.155

Cozby, P. C., & Bates, S. C. (2012). Methods in behavioral research (11th ed.). Boston, MA: McGraw Hill Higher Education.
Dimitrov, D. M., & Rumrill, P. D., Jr. (2003). Pretest-posttest designs and measurement of change. Work, 20(2), 159-165. Retrieved from http://www.phys.lsu.edu/faculty/browne/MNS_Seminar/JournalArticles/Pretest-posttest_design.pdf

Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175-191. Retrieved from http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/download-and-register/Dokumente/GPower3-BRM-Paper.pdf

Ferguson, J. M., & DeFelice, A. E. (2010). Length of online course and student satisfaction, perceived learning, and academic performance. International Review of Research in Open and Distance Learning, 11(2), 73-84. Retrieved from http://www.irrodl.org/index.php/irrodl

Gunawardena, C. N., Linder-VanBerschot, J. A., LaPointe, D. K., & Rao, L. (2010). Predictors of learner satisfaction and transfer of learning in a corporate online education program. The American Journal of Distance Education, 24(1), 207-226. doi:10.1080/08923647.2010.522919

Heine, H. (n.d.). ANOVA: Fixed effects, special, main effects and interaction. Unpublished manuscript, Department of Experimental Psychology, University of Dusseldorf, Dusseldorf, Germany. Retrieved from http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/user-guide-by-distribution/f/anova_fixed_effects_special

Houser, J. (2007). How many are enough? Statistical power analysis and sample size estimation in clinical research. Journal of Clinical Research Best Practices, 3(3), 1-5. Retrieved from http://firstclinical.com/journal/2007/0703_Power.pdf

Jackson, S. L. (2012). Research methods and statistics: A critical thinking approach (4th ed.). Belmont, CA: Wadsworth Cengage Learning.

Joo, Y. J., Park, S. H., Park, S. Y., Kim, S. M., Kim, E. K., & Kim, J. Y. (2009). Relationships among learners' satisfaction, academic achievement and learning transfer in the corporate cyber education. In G. Siemens & C. Fulford (Eds.), Proceedings of World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 (pp. 3930-3933). Chesapeake, VA: AACE.

Koehler, M. J., & Levin, J. R. (1998). Regulated randomization: A potentially sharper analytical tool for the multiple-baseline design. Psychological Methods, 3(2), 206-217. doi:10.1037/1082-989X.3.2.206

McCready, W. C. (2006). Applying sampling procedures. In F. T. L. Leong & J. T. Austin (Eds.), The psychology research handbook: A guide for graduate students and research assistants (2nd ed., pp. 147-161). doi:10.4135/9781412976626.n10

Oncu, S., & Cakir, H. (2011). Research in online learning environments: Priorities and methodologies. Computers & Education, 57, 1098-1108. doi:10.1016/j.compedu.2010.12.009

Rhode, J. F. (2009). Interaction equivalency in self-paced online learning environments: An exploration of learner preferences. The International Review of Research in Open and Distance Learning, 10(1). Retrieved from http://www.irrodl.org/index.php/irrodl/article/view/603/1178

Salkind, N. J. (2009). Exploring research (7th ed.). Upper Saddle River, NJ: Pearson/Prentice Hall.

Tallent-Runnels, M. K., Thomas, J. A., Lan, W. Y., Cooper, S., Ahern, T. C., Shaw, S. M., & Liu, X. (2006). Teaching courses online: A review of the research. Review of Educational Research, 76(1), 93-135. doi:10.3102/00346543076001093
Trochim, W. M. K., & Donnelly, J. P. (2008). The research methods knowledge base (3rd ed.). Mason, OH: Cengage Learning.

Yu, C.-H., & Ohlund, B. (2010). Threats to validity of research design. Retrieved from http://www.creative-wisdom.com/teaching/WBI/threat.shtml