NORTHCENTRAL UNIVERSITY ASSIGNMENT COVER SHEET
Learner: Stephen W Watts
THIS FORM MUST BE COMPLETELY FILLED IN
Academic Integrity: All work submitted in each course must be the Learner’s own. This
includes all assignments, exams, term papers, and other projects required by the faculty mentor.
The known submission of another person’s work represented as that of the Learner’s without
properly citing the source of the work will be considered plagiarism and will result in an
unsatisfactory grade for the work submitted or for the entire course, and may result in academic
dismissal.
EDU7006-8
Dr. Theresa Thonhauser
Quantitative Research Design
7 Samples, Power Analysis, and Design
Sensitivity
Assignment: Part 1 (a) Compare and contrast internal and external validity. Describe and
give examples of research questions for which external validity is a primary concern. Describe
and give examples of research questions in which internal validity is a primary concern. Discuss
strategies researchers use in order to make strong claims about the applicability of their findings
to a target population. (b) Compare and contrast random selection and random assignment. Be
sure to include a discussion of when you would want to do one or the other and the possible
consequences of failing to do random selection or random assignment in particular situations. (c)
Explain the relationship between sample size and the likelihood of a statistically significant
difference between measured values of two groups. In other words, explain why, all else being
equal, as sample size increases the likelihood of finding a statistically significant relationship
increases. (d) Compare and contrast probability and non-probability sampling. What are the
advantages and disadvantages of each?
Part II (a) In a few sentences, describe two designs that can address your research
question. The designs must involve two different statistical analyses. For each design, specify
and justify each of the four factors and calculate the estimated sample size you’ll need. Give
reasons for any parameters you need to specify for G*Power. Include peer-reviewed journal
articles as needed to support your responses to Part I. Length: 5-7 pages (app. 350 words per
page)
Faculty Use Only
Steve: Thank you for your work on this paper! Your responses to the essay questions and power analysis
are thorough, thoughtful, and well supported. Based on what I currently know about your research, you
are right on track with your power analysis, and the statistical tests you are selecting seem appropriate for
your proposed research.
While you will work with your dissertation chair and committee on your research questions and
hypotheses, I’ve given you some feedback on them in this paper. I think you have a really interesting
study topic, but I recommend clarifying and streamlining the language and variables in the research
question and hypotheses. Please see my comments throughout your document for details and let me know
if you have any questions. Thanks!
Theresa Thonhauser
Content Score: 97%
Writing Score: 93%
August 27, 2012
Samples, Power Analysis, and Design Sensitivity
Stephen W. Watts
Northcentral University
Samples, Power Analysis, and Design Sensitivity
Part I
1. Validity focuses on the truth or accuracy of findings (Cozby & Bates, 2012), is a
foundational principle used to determine the quality of research (Trochim & Donnelly, 2008),
and is operative when working with questions dealing with cause and effect (Cozby & Bates,
2012). “Internal validity is synonymous with control” (Salkind, 2009, p. 231). If a study has no
confounds, the results of the outcome variable are exclusively attributable to the manipulation of
the independent variable (Jackson, 2012; Yu & Ohlund, 2010), and if the structural integrity of
the research design allows no other plausible explanations for the results (Oncu & Cakir, 2011),
the study is said to have good internal validity. External validity, on the other hand, “is
synonymous with generalizability” (Salkind, 2009, p. 231), and consists in the degree to which
the findings of a study can be generalized (Jackson, 2012) to other populations and places (Cozby &
Bates, 2012; Oncu & Cakir, 2011; Trochim & Donnelly, 2008), “treatment variables, and
measurement instruments” (Dimitrov & Rumrill, 2003, p. 159).
A study does not simply have or lack internal or external validity; these constructs are
not binary. Further, as the internal validity of a study increases, strengthening causal inference,
the external validity tends to decrease, making the findings less extendable. Conversely, the
higher the generalizability, the less likely a study will support a causal claim.
Research questions for which external validity would be the primary concern are those
focused on being able to generalize the findings of the study to a larger population. For example,
Ferguson and DeFelice (2010) conducted a study in which graduate students participating in a
specific class that was offered in two formats based on length of course were surveyed to
determine if there were differences in the students’ satisfaction with the communication in the
class, whether students were more likely to take another online class, whether students’
perceived learning differed, or whether students’ academic performance differed because of the
two formats, when all else was kept reasonably constant. While it was instructive that there were
differences between the students who attended the shorter summer classes as opposed to students
who attended the regular length classes, the authors were more interested in generalizing the
findings so that they could make inferences regarding ways to improve online classes, and
suggested that the study “may have implications for an institution’s policies for determining the
length of online courses” (p. 81).
Research questions for which internal validity is a primary concern are those that are
specifically looking for cause-and-effect. For example, Chyung and Vachon (2005) investigated
the research question, “what factors of an e-learning system do e-learners express as satisfying
factors (i.e., motivation factors) and what factors do they express as dissatisfying factors (i.e.,
hygiene factors)?” (p. 103). In this study, the authors specifically wanted to determine what
causes satisfaction and dissatisfaction in students; the study determined that satisfaction is not
simply the inverse of dissatisfaction, and vice versa. Joo et al.
(2009) demonstrated their interest in internal validity with two research questions: “does
learners’ satisfaction predict academic achievement in the corporate cyber education” (p. 3931)
and “does learners’ satisfaction predict learning transfer in the corporate cyber education” (p.
3931). Another way of framing internal validity is to ask whether the findings of a study have
predictive value. It is this ability to predict that Joo et al. (2009) were hoping to find as a result
of their study. They found that learners’ satisfaction is predictive of both academic achievement
and transfer of learning.
In order to make a strong claim regarding the applicability of a finding to a target
population a study must have strong external validity. The most important strategy for
researchers who desire strong external validity is to ensure that subjects are randomly selected
(Jackson, 2012; Oncu & Cakir, 2011; Trochim & Donnelly, 2008). Once a sample has been
randomly selected it is important to minimize dropout (Trochim & Donnelly, 2008), ensure that
the researchers have been properly instructed on how to interact with subjects (Salkind, 2009),
and “be careful in interpreting the results, . . . because any over-interpretation would also be a
threat to external validity” (Oncu & Cakir, 2011, p. 1105). Another strategy for strengthening
external validity is through replication of results in various settings, with different samples, and
at different times (Cozby & Bates, 2012; Jackson, 2012).
2. Random selection is a manner of choosing a sample from a population for a study so
that each member of the population has an equal chance of being in the sample (Trochim &
Donnelly, 2008). A sample that is chosen randomly from an appropriate population and is of
sufficient size is said to be representative of the population (Jackson, 2012). When a sample is
representative it enhances the external validity of a study. “When the sample is representative
of the population, we can be fairly confident that the results we find based on the sample also
hold for the population” (Jackson, 2012, p. 100). Random selection may also be used to identify
different start times in a multiple baseline design (Koehler & Levin, 1998). Without random
selection of subjects, determining the population to which study results can be generalized is
much more difficult, or even impossible.
Once a sample is selected, random assignment is the manner of relegating the subjects in
the sample to appropriate groups by chance, such that each subject has an equal chance of being
in any specific group (Trochim & Donnelly, 2008). True experiments are defined by random
assignment of subjects to groups because “randomization ensures that the individual
characteristic composition of the two groups will be virtually identical in every way” (Cozby &
Bates, 2012, p. 82). Statistically equivalent groups enhance internal validity within a study
(Trochim & Donnelly, 2008). Without random assignment of subjects to control and
experimental groups, outcomes must be interpreted with caution: because of potential selection
bias (Oncu & Cakir, 2011), “we can never conclude that the independent variable definitely
caused any of the observed changes in the dependent variable” (Jackson, 2012, p. 350).
Random selection and random assignment are not mutually exclusive. A study can have
neither random selection nor random assignment, either random selection or random assignment,
or both random selection and random assignment. Random selection increases the external
validity of a study provided that the selection is done from the appropriate population that the
researcher is hoping to generalize to. Random assignment, on the other hand, increases the
internal validity of a study by providing probabilistic equivalency.
3. One aspect that is crucial to any experimental design is sample size. A sample with
too few participants runs the risk of (a) poorly reflecting the underlying population, (b) not
finding a significant result when the null hypothesis is false, and (c) producing nonreplicable
results. A sample with too many participants, however, can be much more costly and slower to
conduct (Acheson, 2010). As the number of subjects drawn from a single population increases,
variability around the mean decreases; as the number of subjects decreases, variability around
the mean is likely to increase (Jackson, 2012). Statistical power analysis rests upon four
variables: the sample size (N), the significance criterion (α), the population effect size (ES), and
statistical power (1 − β). Each of these variables is directly related to the other three (Cohen,
1992; Faul, Erdfelder, Lang, & Buchner, 2007).
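The interdependence of these four variables can be illustrated with a short sketch (in
Python with the statsmodels library, as an alternative to G*Power; my illustration, with
approximate values): fixing any three of the variables determines the fourth.

# Illustrative sketch (not part of the original analysis): fix any three of
# {effect size, alpha, power, N} and the fourth is determined. Assumes a
# two-group, one-tailed independent-samples t-test.
from statsmodels.stats.power import TTestIndPower

t_solver = TTestIndPower()

# Fix d, alpha, and N (per group) -> solve for power.
power = t_solver.solve_power(effect_size=0.2, alpha=0.05, nobs1=155,
                             alternative='larger')
# Fix alpha, power, and N -> solve for the smallest detectable effect size.
min_d = t_solver.solve_power(alpha=0.05, power=0.8, nobs1=155,
                             alternative='larger')
print(round(power, 2), round(min_d, 2))   # approximately 0.54 and 0.28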
In a simple example, in order to reject the null hypothesis, a comparison is made between
the mean of the control group and the mean of the experimental group. If the mean of the
experimental group, when converted to a z-score, surpasses a critical value, the null hypothesis is
rejected. The z-score is calculated using the following equation:

z = (X̄ − μ) / σ_X̄

To have a better possibility of rejecting the null hypothesis, z must be as large as possible. The
outcome variable directly determines the numerator. To increase the probability that the
outcome is significant I must manipulate the denominator; the smaller the denominator,
the larger the value of z. The formula for the denominator, the standard error of the mean, is:

σ_X̄ = σ / √N

By increasing the size of N, the standard error of the mean gets smaller, increasing the size of z.
This is the effect that I am looking for. Therefore, as the size of N increases, the value of z
increases, increasing the likelihood of a statistically significant result. Conversely, as the size of
N decreases, the value of z decreases, decreasing the likelihood of a statistically significant
result.
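A minimal numeric sketch of this argument (in Python with SciPy; the population and
sample values are hypothetical, chosen only for illustration) shows the standard error
shrinking and z growing as N increases:

# Hypothetical illustration: the same raw mean difference yields a larger z
# and a smaller p as N grows, because the standard error shrinks as 1/sqrt(N).
import math
from scipy.stats import norm

mu, sigma = 100.0, 15.0      # hypothetical population mean and SD
sample_mean = 103.0          # hypothetical observed experimental-group mean

for n in (25, 100, 400):
    se = sigma / math.sqrt(n)            # standard error of the mean
    z = (sample_mean - mu) / se          # z grows as se shrinks
    p = 1 - norm.cdf(z)                  # one-tailed p-value
    print(n, round(se, 2), round(z, 2), round(p, 4))
# n=25  -> se=3.00, z=1.00, p=0.1587
# n=100 -> se=1.50, z=2.00, p=0.0228
# n=400 -> se=0.75, z=4.00, p<0.0001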
4. Probability sampling utilizes random selection. For random selection, a procedure or
process is implemented to ensure that each unit of the population has a fair and equal chance of
being selected (Trochim & Donnelly, 2008). “Because the determination of who will end up in
the sample is determined by nonsystematic and random rules, the chance that the sample will
truly represent the population is great” (Salkind, 2009, p. 90). Probability sampling has the
following advantages: (a) it allows the estimation of confidence intervals (Trochim & Donnelly,
2008), (b) it is the easiest way to get a representative sample (Salkind, 2009; Trochim &
Donnelly, 2008), (c) it is “more accurate and rigorous than non-probabilistic samples” (Trochim
& Donnelly, 2008, p. 48), and (d) it allows the computation of sampling variances (McCready,
2006). Probability sampling, however, can also be “time consuming and tedious” (Salkind,
2009, p. 97).
There are two types of non-probability sampling. The first type of non-probability
sampling is purposive, in which subjects with specific criteria are sought for inclusion in a study
(Jackson, 2012; Trochim & Donnelly, 2008). The second type of non-probability sampling may
be called accidental (Trochim & Donnelly, 2008), haphazard (Jackson, 2012), person-on-the-street,
or convenience sampling (Jackson, 2012; Trochim & Donnelly, 2008). Convenience
sampling is often used in educational research, where “the characteristics of a specific group of
individuals match the attributes of the phenomenon being studied” (Rhode, 2009, p. 4; see also
Ali & Ahmad, 2011; Boling, Hough, Krinsky, Saleem, & Stevens, 2011; Tallent-Runnels et al.,
2006). Non-probability sampling is often used because it is (a) convenient, (b) inexpensive, (c)
usable “when it is not feasible, practical, or theoretically sensible to use random sampling”
(Jackson, 2012, p. 101), and (d) appropriate in situations where withholding of treatment may be
unethical. The problem with non-probability sampling is that the ability to generalize is
weakened (Salkind, 2009; Trochim & Donnelly, 2008).
Part II
1a. The sample must include at least 620 subjects to meet the given factors: 310 subjects
for each group, when ES = .2, α = .05, and β = .2, on a one-tailed t-test with two independent
groups of equal size (see Figure 1a below).
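This Figure 1a result can be approximated outside of G*Power; the following sketch
(assuming Python with the statsmodels library) solves for the per-group sample size under the
same four factors:

# Reproduces the a priori calculation: d = .2, alpha = .05, power = .8,
# one-tailed, two independent groups of equal size.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.2, alpha=0.05,
                                          power=0.8, ratio=1.0,
                                          alternative='larger')
print(n_per_group)   # approximately 310 per group, i.e., about 620 in total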
1b. For a sample size of 310 (155 in each group) using the compromise function, the
resulting alpha and beta are α = .09, β = .35, p = 0.05 (see Figure 1b below). “Determining a
meaningful effect size is a judgment call. . . . Selecting a desirable power is achieved by
balancing the need to detect an outcome with the difficulty in obtaining large sample sizes”
(Houser, 2007, pp. 2-3). Due to the law of large numbers, the benefit of a larger sample
diminishes, while the cost and time involved increase (McCready, 2006). Using a t-test for
independent groups, the sample means are compared to determine how likely it is that the groups
come from the same population, or whether they differ enough to conclude that they come from
different populations. In this case, “if an outcome is detected, then power is not an issue, the
sample was obviously large enough to detect it” (Houser, 2007, p. 1). The diminished statistical
power because of the smaller sample does not mean that an effect cannot be found; it just means
that the chance of finding a significant result is reduced. While a significance criterion of .09 and
power of .65 are traditionally unacceptable (Salkind, 2009), if the information to be attained is of
high importance, or the work is preliminary, it may be worthwhile to do the study anyway.
Figure 1a.
Figure 1b.
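The compromise result in Figure 1b can be reconstructed from first principles; the
following sketch (assuming Python with SciPy; my reconstruction, not G*Power's own
algorithm) fixes N = 310 and a beta/alpha ratio of 4, then solves for the alpha at which β = 4α:

# Compromise analysis for the one-tailed independent-groups t-test.
from scipy.stats import t, nct
from scipy.optimize import brentq

n_per_group, d, q = 155, 0.2, 4.0
df = 2 * n_per_group - 2                       # 308
delta = d * (n_per_group / 2) ** 0.5           # noncentrality parameter

def beta_minus_q_alpha(alpha):
    t_crit = t.ppf(1 - alpha, df)              # one-tailed critical value
    beta = nct.cdf(t_crit, df, delta)          # Type II error at this alpha
    return beta - q * alpha

alpha = brentq(beta_minus_q_alpha, 0.001, 0.5)
print(round(alpha, 2), round(q * alpha, 2))    # roughly 0.09 and 0.35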
2a. The sample must include at least 969 subjects for a one-way analysis of variance
(ANOVA) with three groups and ES = .1, α = .05, and β = .2 (see Figure 2a below).
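This Figure 2a result can likewise be approximated outside of G*Power (a sketch
assuming Python with statsmodels):

# A priori one-way ANOVA calculation: f = .1, alpha = .05, power = .8, 3 groups.
from statsmodels.stats.power import FTestAnovaPower

n_total = FTestAnovaPower().solve_power(effect_size=0.1, alpha=0.05,
                                        power=0.8, k_groups=3)
print(n_total)   # roughly 960-970 in total, in line with G*Power's 969
                 # (G*Power rounds up to equal group sizes)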
2b. For a sample of 486, approximately half the size required in 2a, the resulting alpha
and beta are α = .10, β = .39, p = 0.05 (see Figure 2b below). An ANOVA allows comparing the
means of three or more groups to determine if “at least one group mean differs from the others
by more than would be expected based on chance” (Jackson, 2012, p. 287). It is traditional to
utilize a beta/alpha ratio of 4:1 (Cohen, 1992), but some authors argue that “the benefit of
balanced Type I and Type II error risks often offsets the costs of violating significance level
conventions” (Faul et al., 2007, p. 177). I chose the traditional ratio for this assignment. If an
appropriate sample size is not attainable, a convenience sample of this size still yields a 90%
chance of not committing a Type I error and a 61% chance of not committing a Type II error. If
the study is being conducted regarding a new or developing theory, and there are few previous
reports or pilot studies, a study with a convenience sample may still be worthwhile even with a
diminished capacity to find a significant result.
Figure 2a.
Figure 2b.
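The Figure 2b compromise can be reconstructed in the same fashion as Figure 1b (a
sketch assuming Python with SciPy; my reconstruction, not G*Power's code): fix the total N at
486 and a beta/alpha ratio of 4, then solve for alpha.

# Compromise analysis for the one-way ANOVA with three groups.
from scipy.stats import f as f_dist, ncf
from scipy.optimize import brentq

n_total, k, f_es, q = 486, 3, 0.1, 4.0
df1, df2 = k - 1, n_total - k                  # 2 and 483
lam = (f_es ** 2) * n_total                    # noncentrality, 4.86

def beta_minus_q_alpha(alpha):
    f_crit = f_dist.ppf(1 - alpha, df1, df2)
    beta = ncf.cdf(f_crit, df1, df2, lam)      # Type II error at this alpha
    return beta - q * alpha

alpha = brentq(beta_minus_q_alpha, 0.001, 0.5)
print(round(alpha, 2), round(q * alpha, 2))    # roughly 0.10 and 0.39-0.40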
3. The following research question and corresponding hypotheses will be investigated
relative to two experimental designs:
Q1. How does satisfaction of adult learners, as measured by the Learner satisfaction
subsection of the LSTQ (Gunawardena, Linder-VanBerschot, LaPointe, & Rao, 2010), vary, if at
all, in an online live virtual classroom (LVC) environment between learners who continuously
see the instructor through visual technology (webcam) and learners who cannot see the instructor
through a webcam?
H0. Measures of learner satisfaction are statistically equivalent when the visual
(webcam) element is used continuously as opposed to when it is not, in online LVC instruction
of adult technical professional development courses (μLVC = μwc).
Ha. Measures of learner satisfaction are statistically different when the visual (webcam)
element is used continuously as opposed to when it is not, in online LVC instruction of adult
technical professional development courses (μLVC ≠ μwc).
3a. In the first design, data will be collected from the online classes of ten (minimum)
instructors who teach various technologies. Each instructor will teach two instances of two
different online classes of five consecutive days or less duration. These classes will be paired,
such that one instance of the class will be taught according to that instructor’s normal delivery
(the control) and one instance will be taught in the normal style with the addition of a webcam
transmitting the instructor’s image to the class during interactive periods of the class (the
experiment). Whether the control class or experimental class will be taught first will be
randomized. Each student will be encouraged at the end of the class to fill out the Learner
Satisfaction and Transfer-of-learning Questionnaire (LSTQ), developed and validated by
WattsSEDU7006-8-7
12
Gunawardena et al. (2010), in addition to the regular course evaluation. Incomplete surveys, or
surveys that have the same value for all sixteen questions, will be discarded.
The independent variable in this first design is the visual element, which has two
attributes: full use of the webcam (1) and minimal use of the webcam (0). The dependent
variable in this first design is learner satisfaction, a construct that will be derived from the
Learner satisfaction subscale of the LSTQ, consisting of five 5-point Likert scale questions. The
learner satisfaction construct is an ordinal variable varying from strongly agree = 5 to strongly
disagree = 1. This design will use two groups of equal size, one representing each attribute of the
independent variable. It is not known whether the distribution of scores from the Learner
satisfaction subscale of the LSTQ will be normal, so the Wilcoxon rank-sum test will be used to
determine whether one sample has significantly larger values than the other. According to the
hypotheses, it is not known whether use of the webcam will increase or decrease scores, so a
two-tailed test will be used. It is expected that the scores from the subscale will be leptokurtic
with negative skew; therefore, the Laplace parent distribution will be selected in G*Power. As
the standard deviation of the data is unknown, the effect size will be set to d = 0.3: slightly larger
than a small effect size, but not a medium one. Traditional values of α = 0.05 and β = 0.2 are generally
acceptable for most research in the social sciences (Salkind, 2009) and have been selected in this
case. Based on the preceding factors, and using an a priori analysis, a minimum sample size of
N = 234 is required to have an optimal chance of rejecting the null hypothesis, if it is false (see
Figure 3a below).
Figure 3a.
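As an illustration of the planned analysis (with hypothetical scores, not study data), the
Wilcoxon rank-sum test can be run via SciPy's Mann-Whitney U procedure, its equivalent, in a
sketch such as the following:

# Hypothetical Learner satisfaction subscale means for a handful of students
# in each condition; the test makes no normality assumption.
from scipy.stats import mannwhitneyu

webcam    = [4.2, 4.6, 3.8, 4.4, 4.0, 4.8, 3.6, 4.2]
no_webcam = [3.4, 3.8, 4.0, 3.2, 3.6, 4.2, 3.0, 3.8]

stat, p = mannwhitneyu(webcam, no_webcam, alternative='two-sided')
print(stat, round(p, 3))   # reject H0 at alpha = .05 only if p < .05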
3b. Utilizing the same collection procedure, in design two I add a second independent
variable, technology, which can have five different values, making a 2 × 5 factorial design. In
this design I can determine whether the webcam has a significant main effect on learner
satisfaction across all groups, or whether learner satisfaction is
significantly affected by the technology of the class. The factorial design also allows for
determining whether interaction effects exist between using the webcam and different
technologies.
The independent variables in this design are the visual element, which has two attributes,
and the type of technology, which has five attributes. According to Heine (n.d.), the appropriate
statistical test is the ANOVA: fixed effects, special, main effects and interactions test. I have
chosen an effect size to match the previous discussion, one slightly larger than a small effect size
but not a medium one, so f = 0.15. I have also chosen traditionally acceptable values for alpha
and beta, where α = 0.05 and β = 0.2. Since this is a two-factor test, the degrees of freedom are
the number of possible values for the first factor less one, multiplied by the number of possible
values for the second factor less one; in this case, df = (2 − 1)(5 − 1) = 1 × 4 = 4. The number of
groups in a factorial design is determined by multiplying together the number of possible values
of all factors; in this case, 2 × 5 = 10. Based on the preceding factors, and using an a priori
analysis, a minimum sample size of N = 536 is required to have an optimal chance of finding
significant main and interaction effects if they exist (see Figure 3b below).
Figure 3b.
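The Figure 3b calculation can be approximated directly from the noncentral F
distribution (a sketch assuming Python with SciPy; my reconstruction, not G*Power's code),
searching for the smallest total N that reaches the target power:

# 2 x 5 factorial: interaction df = (2-1)(5-1) = 4, 10 cells, f = .15,
# alpha = .05, target power = .80.
from scipy.stats import f as f_dist, ncf

f_es, alpha, target_power, df1, n_groups = 0.15, 0.05, 0.80, 4, 10

n_total = 20
while True:
    df2 = n_total - n_groups                   # denominator df
    lam = (f_es ** 2) * n_total                # noncentrality parameter
    f_crit = f_dist.ppf(1 - alpha, df1, df2)
    power = 1 - ncf.cdf(f_crit, df1, df2, lam)
    if power >= target_power:
        break
    n_total += 1
print(n_total, round(power, 3))   # roughly 536, in line with Figure 3b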
References
Acheson, A. (2010). Sample size. In N. J. Salkind (Ed.), Encyclopedia of research design (pp.
1300-1302). doi:10.4135/9781412961288
Ali, A., & Ahmad, I. (2011). Key factors for determining students’ satisfaction in distance
learning courses: A study of Allama Iqbal Open University. Contemporary Educational
Technology, 2(2), 118-134. Retrieved from http://cedtech.net/
Boling, E. C., Hough, M., Krinsky, H., Saleem, H., & Stevens, M. (2011). Cutting the distance in
distance education: Perspectives on what promotes positive, online learning experiences.
Internet and Higher Education. doi:10.1016/j.iheduc.2011.11.006
Chyung, S. Y., & Vachon, M. (2005). An investigation of the profiles of satisfying and
dissatisfying factors in e-learning. Performance Improvement Quarterly, 59(3), 227-245.
doi:10.1177/0741713609331546
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159. doi:10.1037/0033-2909.112.1.155
Cozby, P. C., & Bates, S. C. (2012). Methods in behavioral research (11th ed.). Boston, MA:
McGraw Hill Higher Education.
Dimitrov, D. M., & Rumrill, P. D., Jr. (2003). Pretest-posttest designs and measurement of
change. Work, 20(2), 159-165.
Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical
power analysis program for the social, behavioral, and biomedical sciences. Behavior
Research Methods, 39(2), 175-191. doi:10.3758/BF03193146
Ferguson, J. M., & DeFelice, A. E. (2010). Length of online course and student satisfaction,
perceived learning, and academic performance. International Review of Research in
Open and Distance Learning, 11(2), 73-84. Retrieved from http://www.irrodl.org/
index.php/irrodl
Gunawardena, C. N., Linder-VanBerschot, J. A., LaPointe, D. K., & Rao, L. (2010). Predictors
of learner satisfaction and transfer of learning in a corporate online education program.
The American Journal of Distance Education, 24(1), 207-226.
doi:10.1080/08923647.2010.522919
Heine, H. (n.d.). ANOVA: Fixed effects, special, main effects and interaction. Unpublished
manuscript, Department of Experimental Psychology, University of Dusseldorf,
Dusseldorf, Germany. Retrieved from http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/user-guide-by-distribution/f/anova_fixed_effects_special
Houser, J. (2007). How many are enough? Statistical power analysis and sample size estimation
in clinical research. Journal of Clinical Research Best Practices, 3(3), 1-5.
Retrieved from http://firstclinical.com/journal/2007/0703_Power.pdf
Jackson, S. L. (2012). Research methods and statistics: A critical thinking approach (4th ed.).
Belmont, CA: Wadsworth Cengage Learning.
Joo, Y. J., Park, S. H., Park, S. Y., Kim, S. M., Kim, E. K., & Kim, J. Y. (2009). Relationships
among learners’ satisfaction, academic achievement and learning transfer in the
corporate cyber education. In G. Siemens & C. Fulford (Eds.), Proceedings of World
Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 (pp.
3930-3933). Chesapeake, VA: AACE.
Koehler, M. J., & Levin, J. R. (1998). Regulated randomization: A potentially sharper analytical
tool for the multiple-baseline design. Psychological Methods, 3(2), 206-217.
doi:10.1037/1082-989X.3.2.206
McCready, W. C. (2006). Applying sampling procedures. In F. T. L. Leong & J. T. Austin
(Eds.), The psychology research handbook: A guide for graduate students and research
assistants (2nd ed., pp. 147-161). doi:10.4135/9781412976626.n10
Oncu, S., & Cakir, H. (2011). Research in online learning environments: Priorities and
methodologies. Computers & Education, 57, 1098-1108.
doi:10.1016/j.compedu.2010.12.009
Rhode, J. F. (2009). Interaction equivalency in self-paced online learning environments: An
exploration of learner preferences. The International Review of Research in Open and
Distance Learning, 10(1). Retrieved from http://www.irrodl.org/index.php/irrodl/
article/view/603/1178
Salkind, N. J. (2009). Exploring research (7th ed.). Upper Saddle River, NJ: Pearson/Prentice
Hall.
Tallent-Runnels, M. K., Thomas, J. A., Lan, W. Y., Cooper, S., Ahern, T. C., Shaw, S. M., &
Liu, X. (2006). Teaching courses online: A review of the research. Review of Educational
Research, 76(1), 93-135. doi:10.3102/00346543076001093
Trochim, W. M. K., & Donnelly, J. P. (2008). The research methods knowledge base (3rd ed.).
Mason, OH: Cengage Learning.
Yu, C.-H., & Ohlund, B. (2010). Threats to validity of research design. Retrieved from
http://www.creative-wisdom.com/teaching/WBI/threat.shtml