PHYSICS EDUCATION RESEARCH SECTION
The Physics Education Research Section (PERS) publishes articles describing important results from the
field of physics education research. Manuscripts should be submitted using the web-based system that can
be accessed via the American Journal of Physics home page, http://ajp.dickinson.edu, and will be forwarded
to the PERS editor for consideration.
Quantitative critical thinking: Student activities using Bayesian
updating
Aaron R. Warren a)
Department of Chemistry and Physics, Purdue University Northwest, 1401 S. US-421, Westville,
Indiana 46391-9542
(Received 1 June 2016; accepted 6 November 2017)
One of the central roles of physics education is the development of students’ ability to evaluate
proposed hypotheses and models. This ability is important not just for students’ understanding of
physics but also to prepare students for future learning beyond physics. In particular, it is often
hoped that students will better understand the manner in which physicists leverage the availability
of prior knowledge to guide and constrain the construction of new knowledge. Here, we discuss
how the use of Bayes’ Theorem to update the estimated likelihood of hypotheses and models can
help achieve these educational goals through its integration with evaluative activities that use
hypothetico-deductive reasoning. Several types of classroom and laboratory activities are presented
that engage students in the practice of Bayesian likelihood updating on the basis of either
consistency with experimental data or consistency with pre-established principles and models. This
approach is sufficiently simple for introductory physics students while offering a robust mechanism
to guide relatively sophisticated student reflection concerning models, hypotheses, and problem solutions. A quasi-experimental study utilizing algebra-based introductory courses is presented to
assess the impact of these activities on student epistemological development. The results indicate
gains on the Epistemological Beliefs Assessment for Physical Science (EBAPS) at a minimal cost
of class-time. © 2018 American Association of Physics Teachers.
https://doi.org/10.1119/1.5012750
I. INTRODUCTION
There is widespread agreement that preparation for the
contemporary and future workplace requires physics students
(including non-majors) to develop an understanding of the
procedures by which physics and science generally operate.1–5 Many of these procedures fall within the category of
abilities termed “critical thinking,” involving evaluative
reflection upon data, hypotheses, and models. Although a
number of instructors may prefer to value other abilities
more, we argue that it is a mistake to do so, since traditional course designs leave students underprepared for their
futures. This is in part because the students often complete
physics courses with their initial novice-like traits still
intact.6,7
Novice physics students are often prone to unscientific
and disorganized evaluation of models and hypotheses. They
usually prefer methods that might be viewed as primarily
authoritarian in nature due to their reliance upon recognized
authorities such as instructors, outstanding peers, and answer
keys rather than an independent consideration of evidence.8,9
If little attention is paid during a course to helping students
to develop the means to evaluate models and hypotheses in a
robust fashion, then there can be little hope that they will finish a class with any measurable improvement in associated
critical thinking abilities. For example, it has been found10,11 that students in traditional labs spend significantly less time
engaged in sense-making than their peers in non-traditional
courses, particularly in the amount of time spent making
sense of errors and uncertainties. Moreover, many studies
find that introductory physics courses, both traditional and
non-traditional, even including those that produce gains on
measures of conceptual understanding, tend to have a negative impact on students’ attitudes.12,13
Despite the broadly negative findings on student attitude
shifts during introductory courses, there are some well-established curricula that have demonstrated success in
helping to improve student beliefs. In particular, courses
with an explicit focus on model-building, including
Physics by Inquiry,14 Physics of Everyday Thinking,15
Modeling Instruction,16–18 and the Investigative Science
Learning Environment (ISLE),19,20 have been able to demonstrate gains on student attitudes. Additionally, courses
that pay special attention to epistemological considerations have been shown to produce gains.21,22 Critical
thinking and reflection on the process of learning are
woven within these curricula as students continually propose and evaluate hypotheses and models.
Several evaluation techniques that may receive attention
in these curricula are dimensional analysis, special-case
analysis, and limit-case analysis. Each evaluation technique
applies pre-determined criteria and standards to an assertion
in order to test whether the assertion is well-defined and consistent with prior knowledge. Students are then expected to
update the estimated likelihood of an assertion in a qualitative, intuitive fashion. The attention that these courses generally pay to helping students reflect on their learning is
thought to be one essential feature that enables positive attitudinal shifts to occur.
Specific evaluation activities, employing hypothetico-deductive reasoning, have been shown to positively impact
student problem-solving ability.8,9 These activities, although
ideally meant for curricula such as ISLE or Modeling
Instruction, are curriculum-agnostic in the sense that they
may be easily and directly incorporated within even traditional courses (although it is not known how their effects are
modified by being employed in different curricular environments). Here, we endeavor to deepen such activities and
make them more precise through the use of Bayes’ Theorem
to quantitatively update student estimates of likelihoods. We
propose that the incorporation of Bayes’ Theorem within
these evaluation activities for an algebra-based introductory
physics curriculum may provide a novel means for generating positive attitudinal shifts among students. Given the
effect hypothetico-deductive evaluation activities are already
known to have on problem-solving ability, their extension to
include Bayesian updating in order to also produce positive
attitudinal shifts may make them especially useful in fulfilling the broader educational goals of many physics courses.
The central purpose of this report is therefore to test
whether students’ use of Bayes’ Theorem to estimate and
update subjective probabilities of hypotheses can positively
impact their epistemological attitudes. We first present the
motivation for this study by examining the manner by which
Bayes’ Theorem creates explicit values for organized student
reflection at the end of the evaluative process, enabling
important but subtle points of general scientific reasoning to
be clarified. This motivation is rooted within the logical
structure of hypothetico-deductive reasoning, which we
briefly review. We then discuss how the use of Bayes’
Theorem is related to hypothesis-evaluation and illustrate its
use with examples. Next, templates and examples of several
novel types of curricular materials that engage students in
the use of Bayes’ Theorem are described. Finally, the results
of a quasi-experimental study assessing the epistemological
effect of these materials are presented. While not conclusive,
the results support the hypothesis that Bayesian evaluation
activities enhance students’ epistemological attitudes. These
results, when combined with previous work showing that
evaluation activities are able to improve problem-solving
ability at a relatively low cost of class-time and instructor
preparation, suggest that the Bayesian evaluation activities
have promising educational potential.
II. HYPOTHETICO-DEDUCTIVE REASONING
One of the central cognitive tools in scientific reasoning is
the hypothetico-deductive (HD) process,23 which is used to
test a hypothesis by deducing a prediction and then comparing an actual result with the predicted result. A generic template for this process is shown in Table I. It should be noted
that the assumptions of the planned test will often require
testing or assessment of their own before any change in confidence regarding the hypothesis may be granted.
Table I. A schematic outline of the hypothetico-deductive process.

IF [the hypothesis to be tested is assumed to be true]
AND [a test is planned under certain assumed conditions]
THEN [a prediction is deduced]
AND/BUT [results of the test, including associated uncertainties, were obtained]
THEREFORE [estimated likelihood of hypothesis should/should not be changed]
Depending on the nature of the hypothesis being tested,
there are two general types of situations in which students
can be asked to use the HD process.
Direct evaluation: The hypothesis makes assertions about
the nature of some data. For example, assertions about data
patterns (e.g., asserting that some particular data are best
modeled with a linear fit), the presence and impact of errors
associated with data collection, or the comparison of empirical results with a theoretical prediction.
Indirect evaluation: The hypothesis makes assertions
about relationships between physical models (e.g., coherence
in some limiting cases or special cases) or between a physical model and general theoretical principles (e.g., energy
conservation).25
The difference between these two categories is the immediate presence or absence of empirical data. The term
“Indirect” is used for the latter category because data have
already been used to establish confidence in some other models or principles, and thus, those data are now being used
indirectly to test the model at hand. It is in this way that
much work in theoretical physics is done. String theory, for
example, is largely motivated by inconsistencies between
general relativity and quantum field theory, and through the
careful use of indirect evaluation, it qualifies as an empirical
science despite the difficulty in generating directly testable
predictions.26 This is a typical stage in the development of
theoretical knowledge, as a new framework first gains confidence by demonstrating consistency with pre-existing principles (and the data they are supported by) before being tested
against data from novel experiments. One of the distinctive
features of introductory physics is that it is often the first
opportunity available to help students become more sensitive
to this point that one can reason at the level of physical models and theories to test new ideas without always needing to
collect more data. This concept is essential for students to be able to construct well-organized, hierarchical knowledge in
many fields, including physics, and can be introduced at the
beginning of the course by the following example.
A. Introductory example
We start with two cylindrical containers, with differing
radii and with marks at equal height intervals. Water is initially poured into the wide cylinder up to mark 4. It is then
poured from the wide cylinder to the narrow cylinder and is
observed to reach mark 6 (see Fig. 1). On the basis of this
result, one naïve model relating the height of the water in the two cylinders, h_wide and h_narrow, is

$$h_{\text{narrow}} = h_{\text{wide}} + 2. \qquad (1)$$

Fig. 1. Water is poured from a wide cylinder to a narrow cylinder, and the height is observed to change from mark 4 in the wide cylinder to mark 6 in the narrow cylinder. [From Lawson, The Neurological Basis of Learning, Development and Discovery: Implications for Science and Mathematics Instruction. Copyright 2003 by Kluwer Academic Publishers. Reproduced by permission of Kluwer Academic Publishers (Ref. 24).]
We evaluate this model using the HD process, either
directly or indirectly. Direct evaluation amounts to repeating
the experiment with a change in the independent variable
(the water’s initial height) and comparing the result with the
prediction of our model.
Direct evaluation:
IF the model is true
AND the experiment is repeated with the wide cylinder
initially filled to mark 6 instead of mark 4
THEN when poured into the narrow cylinder, the water
should reach mark 8
BUT when the experiment is performed, we see the water
reaches mark 9
THEREFORE we must decrease confidence in our model.
An indirect evaluation of the water-pouring model entails
a clever thought experiment that turns the experimental procedure around in order to arrive at a contradiction with a
general theoretical principle that we already have strong confidence in.
Indirect Evaluation:
IF the model is true
AND water is initially poured into the narrow cylinder up
to mark 2 and then transferred from the narrow cylinder
into the wide cylinder
THEN the model predicts that h_wide = 0
BUT this means that the water disappears, which clearly
violates volume conservation for our fluid
THEREFORE we must decrease confidence in our model.
Here, prior data that went into developing students’ established confidence in volume conservation as a valid principle
for water (generally provided by earlier K-12 science
courses) have been leveraged in order to test a new assertion.
Since the indirect approach is able to modify confidence in
the proposed model, it is epistemically equivalent to the
direct evaluation of the model with new data from new
experiments. Students usually express surprise and appreciation for the indirect approach, as they often have not engaged
in thought experimentation or critical reflection of a
quantitative model. This simple scenario provides a touchstone example of such reasoning which other activities can build upon, as discussed below.

III. BAYESIAN UPDATING
At the end of the HD process, one may update confidence
in the hypothesis being tested, if the assumptions of the test
were sufficiently met. This updating is usually performed in
a qualitative fashion by students in introductory physics
courses, sometimes with the exaggerated and incorrect belief
that the result of a test may disprove a hypothesis or (even
worse) that a result may prove a hypothesis. Such a binary
model of truth value hampers the ability to adapt to new
information, as it severely overestimates the impact of any
single test. Thus, students with particular attitudes toward
knowing possess different attitudes toward learning, affecting their adaptiveness to new information or claims.27 In
contemporary science, the likelihood of a hypothesis is often
calculated and updated using Bayes’ Theorem, which codifies the optimal manner by which new evidence should be
used to update the estimated likelihood of a claim.28 One need only look at the extensive role Bayesian inferencing plays in gravitational wave detection29 to see its impact on physics, for example.
A typical use of Bayes’ Theorem requires one to begin
with an initial probability for a hypothesis H, denoted P(H),
which is called the prior. Once some evidence E is gathered,
the probability of the evidence P(E) and the conditional
probability PðEjHÞ of observing evidence E given hypothesis H are used to calculate the posterior probability PðHjEÞ
of hypothesis H given the evidence E according to
PðHjEÞ ¼
PðEjHÞ
PðHÞ:
PðEÞ
(2)
This can be re-written in a more convenient form for our purposes,

$$P(H \mid E) = \frac{P(H)\,R}{P(H)\,R + 1 - P(H)}, \qquad (3)$$

where

$$R = \frac{P(E \mid H)}{P(E \mid \lnot H)} \qquad (4)$$
is a Bayes factor. This is the ratio by which a particular piece of evidence is relatively more likely given hypothesis H than given its converse, ¬H. This factor encodes the inferential power of the evidence with regard to H. For example, a strong confirmatory result means that the evidence can be much better explained by the hypothesis than otherwise, and R will tend toward large values in such cases (i.e., R ≫ 1). A strong disconfirmatory result means that the evidence is much better explained by assuming that the hypothesis is false than otherwise, and R will tend toward zero (although never equaling zero). A null result is when the evidence offers no discrimination between H and ¬H, in which case R = 1. Guidelines for the estimation of R are provided in Table II.
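Equation (3) makes the update itself a one-line computation. As a minimal sketch (the helper name and sample numbers are ours, purely for illustration), a chosen prior and Bayes factor yield a posterior as follows:

```python
def bayes_update(prior, R):
    """Posterior P(H|E) from prior P(H) and Bayes factor R, per Eq. (3)."""
    return prior * R / (prior * R + 1 - prior)

# A student with prior 0.5 who judges that the evidence "strongly
# favors not-H" might choose R between 1/150 and 1/20 (see Table II):
print(bayes_update(0.5, 1 / 50))  # ~0.0196: confidence drops sharply
```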
In order to better help students understand the act of confidence updating along with the associated epistemic implications, it seems natural to engage them in the use of Bayes' Theorem.

Table II. Guidelines for estimation of R. [Adapted with permission from Kass and Raftery, J. Am. Stat. Assoc. 90(430), 791 (1995). Copyright 1995, American Statistical Association (Ref. 30).]

R               Interpretation
< 1/150         ¬H very strongly favored
1/150 to 1/20   ¬H strongly favored
1/20 to 1/3     ¬H substantially favored
1/3 to 1        ¬H barely favored
1 to 3          H barely favored
3 to 20         H substantially favored
20 to 150       H strongly favored
> 150           H very strongly favored
While the full-fledged use of Bayes' Theorem is
perhaps too much to expect in an introductory physics
course, it is quite possible to employ it in a loose fashion.
This can be introduced at the beginning of a course with the
water-pouring example.
A. Introductory example
Direct evaluation:
If we revisit the direct evaluation of the water-pouring task from above, then the hypothesis H is the assertion that h_narrow = h_wide + 2. Before planning or doing the direct evaluation, each student is asked to estimate their confidence in H, thus providing the prior P(H). If the student is completely unsure, then they are told to set P(H) = 0.5, representing a 50% chance that the hypothesis is or is not valid, corresponding to the maximum epistemic uncertainty. The test is then performed, and the evidence E is the observation that the water rose to mark 9 instead of mark 8. The same student may then use Table II to select a value for R, which in this case should be a small number because the result is much more likely if our hypothesis is false than if it is true. Let us suppose the student, based on their subjective estimation of the inferential power of this result, selects R = 0.001, which is in the category of being a very strong disconfirmation of the hypothesis (< 1/150).
Specifically, this choice for the R value indicates that the
student believes that the obtained result is 1000 times more
likely to occur if the hypothesis is false than if the hypothesis
were true. Another student may select R = 0.05 if they feel
that the result is only 20 times more likely if the hypothesis
is false than if it is true. Note that the value of R cannot be
zero, though, because one can never be absolutely sure of the
measurements; for example, there is always the possibility of
miscounting the marks or having made a procedural error
with the materials, no matter how many times it is
performed.
Continuing with the example, where our student has identified an initial subjective confidence of P(H) = 0.5 and selected an updating coefficient of R = 0.001, the subjective posterior probability is then calculated as

$$P(H \mid E) = \frac{0.5 \times 0.001}{0.5 \times 0.001 + 1 - 0.5} \approx 9.99 \times 10^{-4}, \qquad (5)$$
demonstrating how the direct evaluation allows the student
to drastically reduce the likelihood of this model on the basis
of compelling disconfirmatory evidence. Within the HD
process, this posterior probability is the precise formulation
of the conclusion reached in the THEREFORE portion of the
process.
Although the specific values used are subjective and will
vary student-by-student, as more evaluations are performed
(say by repeating the experiment with new values), each student’s posterior probability will asymptotically approach
zero. If the students do this for a total of three tests and then
plot the results for the class, they can directly see the convergence for themselves. After doing this, it is important to
point out several epistemic corollaries. First, the fact that we
can never reach a probability of zero is the quantitative
expression of the maxim that we should never rule out any
hypothesis with an absolute certainty. Similarly, in cases
where a hypothesis is confirmed by multiple experiments,
the posterior probability will approach, but never reach, a
value of 1; this expresses the maxim that one ought never to
have absolute confidence in the truth of a hypothesis.
Second, the fact that everyone starts with subjective prior
probabilities and yet inevitably converges to a single asymptotic result illustrates the objective aspect of science.
Students may tend to conflate “absolute” with “objective,”
believing that in order to be objectively valid, something
must be absolutely true, and this example is the first time
when many of them may begin to realize that these are logically distinct properties.
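To make the convergence concrete, the following sketch (repeating the helper from the earlier sketch; the per-trial Bayes factors are hypothetical) first reproduces Eq. (5) and then drives three very different priors through three disconfirming trials:

```python
def bayes_update(prior, R):
    """Posterior P(H|E) from prior P(H) and Bayes factor R, per Eq. (3)."""
    return prior * R / (prior * R + 1 - prior)

# Single update of Eq. (5): prior P(H) = 0.5 and R = 0.001.
print(bayes_update(0.5, 0.001))  # ~9.99e-4

# Three students with different priors apply their own (subjective)
# Bayes factors over three successive disconfirming trials; every
# posterior falls below ~1e-4, approaching but never reaching zero.
for prior in (0.2, 0.5, 0.9):
    p = prior
    for R in (0.05, 0.02, 0.01):  # hypothetical per-trial estimates
        p = bayes_update(p, R)
    print(f"prior {prior:.1f} -> posterior {p:.1e}")
```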
The emphasis that the Bayesian activity places upon the
selection and justification of the updating coefficient R is
intended to create value for the act of experimental error
analysis. Students in traditional labs spend very little time
engaged in sense-making.11 Indeed, students generally do
not think at all about error analysis, and if they do, they think
only that it is a waste of time.10 Uncertainties and errors are
viewed as being unrelated to physics proper and of little
importance to the experiment or its outcome. Students tend
to take a “trouble-shooting” approach in their attitude toward
error analysis, intending to get the expected result instead of
accepting the results and trying to make sense of them.
One intention of the Bayesian direct evaluation activities
is to redefine for students what the result of an experiment
really is. Instead of being a simple yes/no match to a prediction in order to confirm a hypothesis that the students already
accept as true, the Bayesian approach requires students to
shift their perspective and see an experiment as a way to
gain new information that affects their confidence in a
hypothesis which can never be fully proven or disproven. It
is now acceptable for the experiment to produce a disconfirmatory result, and the goal becomes not to get the predicted
result but to figure out whether their result is confirmatory,
null, or disconfirmatory (and to what extent). The only way
to determine that is via careful error analysis. Thus, the goal
of the laboratory is no longer to prove something they know
but to gather new information that has a real, calculable
effect on their confidence in the hypothesis, and this creates
value for the act of error analysis.
The evaluative reflection that Bayesian updating enables
is able to create authentic value for the act of error analysis
from the students’ perspective in a traditional lab. After all,
there seems to be little point to error analysis if one already
and absolutely knows the truth, thanks to the textbook or
instructor. Without Bayesian updating, error analysis in an
introductory lab is more likely to be reduced in the students’
view to an artificial and unmotivated burden. Another way to
create value and engage students in error analysis is by
framing the experiments as investigations instead of tests,
such as in Refs. 10 and 11. We leave open as a possibility
that the inclusion of Bayesian updating may benefit both labs
that run traditional testing experiments as well as those that
are set up as more open-ended investigative experiments.
Indirect evaluation:
Bayes’ Theorem also clarifies the importance of indirect
evaluation as a scientific reasoning strategy. For the indirect
evaluation of the water-pouring task, H is still the assertion that h_narrow = h_wide + 2, but E now represents the evidence that this model is inconsistent with volume conservation (as found via the special-case analysis with h_narrow treated as the independent variable and set to an initial value of 2 marks).
Students start with their personal priors P(H) and make estimates of R, which will again be a small number due to the
fact that the thought experiment appears to violate volume
conservation. Note that R cannot be zero, though, because
there may always be unknown factors which make volume
conservation an invalid principle in this setting; this is
extremely unlikely, and shocking if true, but impossible to
rule out. Again, students can use their estimates to calculate
posterior probabilities, and doing a few similar thought
experiments (such as starting with h_narrow = 1) allows them
to see convergence toward zero likelihood of the model. The
use of indirect evaluation is thus demonstrated to be epistemically equivalent to the use of direct evaluation. This illustrates the evaluative role of scientific principles, which are
able to serve as powerful tools to modify confidence in proposed hypotheses and models. The precise values of the posteriors that each student gets from direct evaluation of H will
generally be different from the posteriors arrived at via indirect evaluation (due to differing estimates of R), but this is a
difference of degrees and not of kind.
Finally, the broader importance of why the students should
value this process is something that deserves continual emphasis during the course. The ability to reason precisely, even
under uncertain conditions, is one of the most daunting challenges for students, especially for those who initially tend
toward absolutist epistemological beliefs. Bayes’ Theorem is
powerful in this regard because it shows that uncertainty in a
hypothesis is not something that can or should be avoided;
rather, it is unavoidable, and evaluative tests provide the only
valid mechanism by which learning can be achieved and uncertainty diminished. By continually reflecting on their epistemic
uncertainties and seeing how those uncertainties change in
response to the HD process via Bayes’ Theorem, students can
gain a certain comfort level in dealing with the uncertainty and
may begin to treat it from a more productive point of view.
IV. CURRICULAR MATERIAL DESIGNS
To engage students in the use of both the HD process and
Bayes’ Theorem, a variety of activities have been developed.
These activities allow students to exercise the use of each of
these reasoning tools as part of direct and indirect evaluations.
A. Direct evaluation activities
Direct evaluation is primarily exercised by students during
in-class activities that involve the analysis of pre-recorded data
and in laboratory experiments that require students to design
and conduct experiments to test specific physical models. An
example in-class activity is shown in Fig. 2, illustrating a typical format for engaging students in the construction and evaluation of empirical hypotheses regarding relationships between
measured quantities. This activity is done in lecture at the
beginning of a module focusing on Newtonian mechanics.
Here, evaluation is not simply a summative test performed to
verify a model, but a formative tool that plays a central role in
scientific reasoning, even when one is just beginning to explore
a particular class of phenomena. For the example shown in
Fig. 2, typical proposed relationships include linear, quadratic,
and sinusoidal functions of the angle. Students work in groups,
with laptops, to graph, fit, and test proposed functions and to
update their confidence in each proposal.
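As a sketch of what such an analysis might look like in practice (the data, candidate functions, and fit diagnostic below are our own illustrative stand-ins, not the published activity), each group could fit its assigned hypothesis and compare residuals before updating confidences with Eq. (3):

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical measurements of some quantity vs. angle (radians).
rng = np.random.default_rng(0)
theta = np.linspace(0.1, 1.2, 8)
y = 3.1 * np.sin(theta) + rng.normal(0.0, 0.05, theta.size)

# Competing empirical hypotheses proposed by the class.
models = {
    "linear": lambda x, a, b: a * x + b,
    "quadratic": lambda x, a, b, c: a * x**2 + b * x + c,
    "sinusoidal": lambda x, a: a * np.sin(x),
}

for name, f in models.items():
    popt, _ = curve_fit(f, theta, y)
    rss = np.sum((y - f(theta, *popt)) ** 2)  # residual sum of squares
    print(f"{name:10s} RSS = {rss:.4f}")
```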
Activities such as this may be utilized in a few different
ways. When time is available or the number of competing
hypotheses is suitably small, students can complete the entire
activity in small groups before sharing and discussing results
with the class. An alternative approach, which is sometimes
more appropriate, is to reconvene as a class after part (a) is
complete, compile a single class list of hypotheses under
consideration along with initial estimates of the likelihood of
each, assign particular hypotheses to each group for testing
in part (b), and then reconvene the class again afterward to
share results and identify an optimal hypothesis.
While some direct evaluation activities have been developed for use in lecture or recitation, such as above, the majority have been made for use in the laboratory. Laboratory
experiments in introductory classes often intend to test a particular physical model. As shown in Fig. 3, the laboratory
report can be used as a mechanism to compel students to
reflect on the HD process being instantiated by the particular
experiment at hand and to use Bayes’ Theorem to modify
their confidence in the tested model. The summary in part (h)
of the report generally follows a consistent pattern:
IF the model is true
AND the proposed experiment is conducted as described
THEN the actual results should confirm the prediction
deduced from the model
AND/BUT our results, after accounting for both random and
systematic errors, confirm/fail to test/disconfirm the model
THEREFORE we are more/equally/less confident in the
model, as estimated with Bayes’ Theorem.
In addition, the lab report highlights the importance of
experimental error analysis. Students not only perform error
analysis but also reflect on how making an inference about
the validity of the model strongly depends upon sufficient
minimization of the errors. If the errors are too significant,
the R factor in Bayes’ Theorem will be essentially equal to 1
and the posterior probability of the model will nearly equal
the prior; thus, very little information about the validity of
the model can be learned from such an experiment.
Fig. 2. An example in-class activity for direct evaluation of empirical hypotheses. Note the notation change in the presentation of Bayes' Theorem; the prior is renamed the initial confidence, P(H) = C_i, and the posterior is the final confidence, P(H|E) = C_f. This terminology and notation are consistently used on all Bayesian activities given to students.

Moreover, the directionality of the systematic errors is
especially important to consider when estimating R. For
example, if one tests Newtonian mechanics by predicting the
average acceleration of a cart rolling down an incline under
the assumed condition of no rolling friction, then the systematic error induced by the presence of rolling friction will
cause the actual acceleration to be less than predicted. If, after
doing the experiment, the actual acceleration is significantly
less than predicted, then this error will cause R to be closer to
1 since the error increases the likelihood that the evidence
could have been obtained even though the hypothesis (of Newton's Laws) is true. However, if the actual acceleration is
greater than predicted, then this systematic error will make R
closer to 0 since the error decreases the likelihood that the
evidence could have been obtained while the hypothesis was
valid. Although one does not need Bayes’ Theorem to realize
this, its use provides a structure to draw out such reasoning
and explicitly demonstrates via the R factor why and how
error analysis is central to the direct evaluation of a model.
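One way to make this reasoning quantitative, sketched below with invented numbers and a deliberately crude error model (a Gaussian likelihood for P(E|H) and a flat stand-in for P(E|¬H); none of this is prescribed by the original activities), is to let the known direction of the friction shift set the center of the likelihood:

```python
import math

def normal_pdf(x, mu, sigma):
    """Gaussian probability density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical incline experiment: prediction a = 1.20 m/s^2 ignoring
# rolling friction, which can only lower the true value (here assumed
# to shift it down by ~0.075 m/s^2); measurement sigma = 0.05 m/s^2.
a_pred, shift, sigma = 1.20, 0.075, 0.05
a_obs = 1.10  # observed below the prediction, as friction would cause

p_E_given_H = normal_pdf(a_obs, a_pred - shift, sigma)
# Crude stand-in for P(E|not-H): a flat density over a 2 m/s^2 band
# of accelerations considered plausible if the model were false.
p_E_given_notH = 1.0 / 2.0

print(f"R = {p_E_given_H / p_E_given_notH:.1f}")
# R ~ 14 (H substantially favored); had a_obs exceeded a_pred by the
# same margin, the friction shift would instead push R well below 1.
```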
B. Indirect evaluation activities
Students can practice indirect evaluation during lecture,
recitation, and homework activities that require them to test
either their proposed problem solutions or a proposed physical model, without collecting any new data. Valid indirect
evaluation techniques such as dimensional, special-case,
limit-case, and trend-case analyses are foreign to most students. Demonstration of these techniques during lecture and
practice of their use during recitation and homework, along
with specific and steady feedback, is therefore required for
students to gradually adopt these techniques. Such feedback
includes written comments on student work pointing out
when a student’s selected special-case was not productive,
perhaps because the selected change does not yield an intuitively or conceptually obvious impact on the answer. Other
times, the feedback may point out mistakes in conceptual or
quantitative reasoning a student made during the evaluative
analysis. It may take a minimum of 6–8 weeks before the
majority of a class begins to demonstrate proficiency with
indirect evaluation strategies.8,9
Figure 4 shows a general template for activities that engage
students in self-evaluation of their proposed solutions to standard end-of-chapter problems. This approach first requires the
students to reflect on how their solution is derived as an instantiation of a general physical model, such as the ideal gas law
or rigid body motion. Then, the activity points out that a solution can be evaluated by testing for consistency with other
pieces of information that are already held in high confidence.
The students do this via the HD process, using techniques such
as special-case analysis,8 and then update their confidence in
their solution via Bayes’ Theorem. In this way, students can
reduce the epistemic uncertainty in their work in a scientifically authentic fashion, without relying upon external authorities. Near the end of the semester, some of this scaffolding can
be removed as students internalize the process, including
reminders about the HD process and prompts to update confidence estimates with Bayes’ Theorem.
Fig. 3. A template for laboratory experiment reports that highlights the use of hypothetico-deductive reasoning and the use of Bayes' Theorem.

Fig. 4. After attempting to solve a standard end-of-chapter problem, this activity compels students to indirectly evaluate their solution using techniques such as special-case and limit-case analysis and to update their confidence in their solution, thereby modifying the epistemic uncertainty in their own work.

V. IMPACT ON STUDENT ATTITUDES
A. Methods
To assess the impact of the Bayesian evaluation activities
on student learning, the Epistemological Beliefs Assessment
for Physical Science (EBAPS)21,31 was administered in a
pre/post-test format to three one-semester algebra-based
introductory physics courses. This instrument includes 30
items that ask students to respond to statements on a 5-point
Likert scale ranging from strongly disagree to strongly agree.
Scoring of the EBAPS features five subscales that measure
specific aspects of student epistemological sophistication.
The subscales are Axis 1, Structure of Scientific Knowledge;
Axis 2, Nature of Knowing and Learning; Axis 3, Real-life
Applicability; Axis 4, Evolving Knowledge; Axis 5, Source
of Ability to Learn. In this study, we required an instrument
capable of performing measurements that are tightly aligned
to the particular attitudes that the Bayesian activities are
intended to impact. None of the eight categories of highly
correlated items from the Colorado Learning Attitudes about
Science Survey (CLASS)12 were deemed appropriate measures. The Sense-Making/Effort category comes closest, but
it conflates the nature of learning, the structure of knowledge, and other expert-identified attitudinal dimensions that
the EBAPS keeps separate. This is perhaps not surprising
(and even expected) since the CLASS categories are based
on student responses in order to achieve high reliability, and
students are not likely to attend to the distinct conceptual
dimensions an expert recognizes. In particular, the EBAPS
scores for Axes 2 and 4 focus more specifically on attitudes
that the Bayesian activities are designed to affect. Thus,
although not as widely used or reliable as the CLASS, the
EBAPS allows the most appropriate measurement of the specific attitudes we wish to assess.
Axis 2—Nature of Knowing and Learning (Nat. Learn.)
subscale—assesses whether students believe that learning
science consists of information retention or whether it is
inherently constructive, progressing by leveraging prior
experiences to interpret and test new information and requiring continual reflection upon one’s own understanding.
Given that the Bayesian activities involve students reflecting
upon and altering their confidence based on tests employing
both their prior knowledge in thought experiments and their
empirical results in laboratory, we expect that such activities
could improve student gains on this subscale. In particular,
questions 1, 11, 12, 13, and 30 each deal with students’ attitudes toward constructivist activities that assess one’s confidence and change subjective confidence by using prior
knowledge to independently test an idea or problem solution.
Axis 4—Evolving Knowledge (Evo. Know.) subscale—
focuses on how well students are able to avoid absolutism
and extreme relativism. The HD process and Bayes’
Theorem necessarily refute absolutism, as models are tested
and confidence levels modified. Bayesian activities also give
students grounds to reject extreme relativism by demonstrating that, regardless of one’s personal prior confidence, the
results of direct and indirect evaluations can cause everyone’s confidence in a proposition to converge toward 0 or 1.
This subscale relies solely on questions 6, 28, and 29, each
of which deals with attitudes toward subjective probability in
science and its ability to be affected by new evidence.
The remaining subscales (Axes 1, 3, and 5) are not predicted to show any effect due to Bayesian activities. For
instance, although various hypothetico-deductive activities,
such as special-case analysis, do intend to help students construct a hierarchical organization to physical knowledge, the
addition of Bayesian updating to those activities does not
draw further attention to such a structure. Instead, the
Bayesian updating only propagates the implication of that
structure in order to revise confidence in a hypothesis or
model. Likewise, Bayesian updating has no direct bearing on
attitudes toward problem-solving sophistication, notions of
self-efficacy, or the relationship between abstract physical
models or representations and the real world.
The three courses were each taught by the same instructor,
for lectures and labs, and used the textbook by Etkina
et al.,32 which is based on the ISLE approach. However,
only two of the courses (labeled E1 and E2) included
Bayesian activities; the third course (labeled C) did not
include any Bayesian activities. Within each experimental
condition course, E1 and E2, there were three direct evaluation activities conducted during the lecture, eight laboratory
experiments and reports using Bayesian updating, and 14
indirect evaluation activities incorporated within the weekly
recitation assignments. The survey administration in all three
courses was similar, being completed during class and without incentivization. Student responses were checked to
ensure that they indicated serious effort, for example, by not
giving the same response on the final 10 items of the survey.
B. Results
The results of the pre/post-tests, after listwise deletion of
students who did not fully complete either the pre- or post-surveys, are shown in Table III, including mean scores on all
five subscales. The percentages of students who provided
complete responses to both the pre- and post-surveys were
78% (E1: 21 of 27 students), 80% (E2: 35 of 44 students),
and 82% (C: 18 of 22 students). The overall score and each
subscale score range from 0 to 100. Bayesian estimation
(BEST)33 is used to determine the mean and 95% highest-density interval (HDI) of the distribution of the difference of
means. Bayesian estimation has the advantage of generating
an explicit distribution of credible values and is more informative than traditional distribution characterization via the
mean and standard error. This study focuses on Axis 2 (Nat.
Learn.) and Axis 4 (Evo. Know.), which are shown in bold
font, since those are the only subscales that the Bayesian
activities are intended to affect.
The E1 and E2 classes appear to have differences in their
performances. Similar variability has been seen in CLASS
pre-/post-results obtained from prior sections of this course
during routine course assessment, with gains in overall
CLASS scores fluctuating by roughly 5%–10% from one
semester to the next despite nearly identical instructional
materials and style. Given the relatively high reliability and
validity of the CLASS, it seems most likely that statistical
noise is the cause of the EBAPS differences between E1 and
E2, and we therefore combine the two sections in order to
better smooth fluctuations and test the predicted impact of
the Bayesian activities.
Table III. Pre-/post-comparisons for C (N = 18), E1 (N = 21), and E2 (N = 35). Classes E1 and E2 included Bayesian activities, while C did not. Mean gains and 95% HDI for the distribution of mean gains are provided by Bayesian estimation (Ref. 33). Bayesian activities are predicted to affect attitudinal shifts in Axis 2 (Nat. Learn.) and Axis 4 (Evo. Know.), shown in bold font.

C (N = 18)   Overall          Struct. know.     Nat. learn.       Real-life app.    Evo. know.         Src. learn.
Pre          67.3 ± 2.1       58.8 ± 2.4        67.9 ± 3.2        73.1 ± 4.3        70.4 ± 5.1         75.6 ± 3.6
Post         70.3 ± 1.4       62.4 ± 2.0        67.5 ± 2.4        79.5 ± 2.6        73.6 ± 3.6         79.7 ± 3.1
Mean gain    2.6 (−2.9, 7.8)  3.4 (−3.5, 10.3)  −0.4 (−9.2, 8.2)  5.6 (−5.4, 16.0)  3.1 (−10.7, 16.8)  4.6 (−5.8, 14.8)

E1 (N = 21)  Overall          Struct. know.     Nat. learn.       Real-life app.    Evo. know.         Src. learn.
Pre          69.0 ± 1.6       59.2 ± 2.7        63.2 ± 2.9        77.7 ± 3.2        71.0 ± 3.2         83.3 ± 2.6
Post         73.4 ± 1.6       65.7 ± 2.6        72.3 ± 2.4        81.8 ± 2.9        75.4 ± 3.6         84.0 ± 2.4
Mean gain    4.4 (−0.5, 9.4)  6.3 (−1.8, 14.0)  9.5 (1.2, 17.5)   4.3 (−4.8, 13.6)  4.1 (−6.3, 14.1)   0.7 (−7.0, 8.1)

E2 (N = 35)  Overall          Struct. know.     Nat. learn.       Real-life app.    Evo. know.         Src. learn.
Pre          61.9 ± 2.0       56.6 ± 2.5        60.0 ± 2.1        68.0 ± 3.6        57.1 ± 4.0         71.4 ± 3.4
Post         66.0 ± 1.6       57.6 ± 2.1        63.7 ± 2.0        76.0 ± 3.0        66.0 ± 3.4         77.0 ± 2.2
Mean gain    4.0 (−1.2, 9.5)  0.8 (−6.2, 7.5)   3.9 (−1.9, 9.9)   7.9 (−1.8, 17.7)  9.0 (−1.7, 20.1)   5.4 (−3.4, 13.7)

The results for the mean gains of the Control class versus
the gains from the two experimental classes (combined) are
shown in Fig. 5 and summarized in Table IV. Although this
study is only concerned with Axis 2 (Nat. Learn.) and Axis 4
(Evo. Know.), for the sake of completeness, we include the
results for each subscale as well as the overall EBAPS score.
The data were then multiply imputed34,35 to produce estimated values for the missing pre-/post-test results from individual students. Globally, the missing data rate was 10.2%
(19 of 186 possible). A total of 50 imputations were generated, pooled, and assessed for each missing value. The
method used was multiple imputation by chained equations,36,37 and the variables used in the imputation model
included the set of individual gains on all five EBAPS subscales as well as the overall score.
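For readers who wish to reproduce this kind of treatment, the sketch below uses scikit-learn's IterativeImputer as a rough analogue of multiple imputation by chained equations; the data are synthetic, and the study's actual MICE implementation (Refs. 36 and 37) is not reproduced here.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Synthetic stand-in for the gain-score matrix (93 students x 6 scores:
# the five EBAPS subscales plus the overall score), with NaN marking
# students who skipped the pre- or post-survey (~10% missing overall).
rng = np.random.default_rng(0)
gains = rng.normal(5.0, 10.0, size=(93, 6))
gains[rng.random(gains.shape) < 0.10] = np.nan

# Each call produces one imputed data set; the study pooled 50 such
# imputations, which here we approximate by averaging a handful.
imputations = [
    IterativeImputer(sample_posterior=True, random_state=k).fit_transform(gains)
    for k in range(5)
]
pooled = np.mean(imputations, axis=0)
print(pooled.shape, np.isnan(pooled).any())  # (93, 6) False
```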
To test whether the Bayesian activities produced a significant effect on Axis 2 and Axis 4, as predicted, we will analyze the data using two separate approaches. First, we will
use classical null-hypothesis significance testing in order to
make inferences about effects observed on Axis 2 and Axis
4. Our second analysis will use Bayesian parameter estimation to calculate likelihoods and effect sizes for Axis 2 and
Axis 4. Since this is an exploratory study to gather tentative
information about the impacts of these new activities, rather
than a test of a firm hypothesis, we will proceed in an investigative fashion. Our goal is not a strict binary judgment of impact versus no impact, because even if there were significant
evidence for one or the other, there would still be significant
threats to external validity posed by the sample composition
and size, by the unknown reliability of the EBAPS subscales,
and by the fact that only one method of instruction from a
single instructor was employed. Instead, our analytical goal
is to generate information regarding the observed impact of
these activities which can inform any future implementation
and study of their attitudinal impact. As will be seen, both
the classical and Bayesian approaches will paint a similar
picture of the impact produced by the activities.
Fig. 5. Gains for mean student scores in the Control and Experimental (E1 and E2 combined) classes, with bars indicating one standard error of the mean.

Table IV. Summary of mean gains for the control and experimental (E1 and E2 combined) classes. Cohen's d is provided as an estimate of the effect size. Although this study is only concerned with Axis 2 and Axis 4 (in bold font), the overall score and all subscales are also included for completeness.

               Overall      Axis 1       Axis 2       Axis 3      Axis 4      Axis 5
Mean gain (C)  2.9 ± 1.5    3.6 ± 2.4    −0.3 ± 2.6   6.4 ± 3.9   3.2 ± 5.4   4.2 ± 3.2
Mean gain (E)  4.2 ± 0.8    3.1 ± 1.6    5.7 ± 1.6    6.5 ± 2.1   7.1 ± 3.1   3.8 ± 1.9
Cohen's d      0.184        0.040        0.453        0.005       0.146       0.026

First, taking the classical approach, we perform a multivariate analysis of variance with one independent variable (class condition) and two dependent variables (subscale gains on Axis 2 (Nat. Learn.) and Axis 4 (Evo. Know.)). The
MANOVA indicates a marginally significant effect (Pillai = 0.052, F(2, 90) = 2.474, and p = 0.090) with a moderate effect size (partial η² = 0.063 and 95% HDI = (0.009, 0.170)). The observed power is 0.54, which means that a repetition of this study would be nearly as likely to produce a
false negative as to identify a true positive. Thus, if these
results are to be tested further, then future work must involve
larger samples and perhaps utilize a higher-precision instrument (yet to be developed) that is more capable of detecting
the attitudinal shifts that the Bayesian activities are intended
to produce.
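For reference, the effect sizes tabulated above are Cohen's d values computed from the two groups' per-student gains; a minimal sketch (with synthetic gain scores standing in for the real data) is:

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1)
                  + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

# Synthetic Axis 2 gain scores: experimental (N = 56) vs. control (N = 18).
rng = np.random.default_rng(1)
gain_E = rng.normal(5.7, 12.0, 56)
gain_C = rng.normal(-0.3, 11.0, 18)
print(f"d = {cohens_d(gain_E, gain_C):.2f}")
```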
Continuing our exploration of the results, we tentatively
accept the marginal significance of the MANOVA in order to
investigate how each individual axis may have been affected
by the class condition. Strictly speaking, no firm conclusions
may be drawn from this work; however, it is informative to
see whether one axis appears to have responded differently
from another. Indeed, the class condition shows a likely and moderately strong effect on Axis 2 (F = 4.793, p = 0.031, and partial η² = 0.050 with 95% HDI (0.002, 0.158)), but a much weaker and less likely effect on Axis 4 (F = 0.499, p = 0.482, and partial η² = 0.005 with 95% HDI (0.000, 0.058)).
We now undertake an independent Bayesian analysis of
the data. The likelihoods that the gains in the Experiment
condition exceeded the gains in the Control condition on
each subscale and corresponding effect sizes were estimated
using the BEST software package. The results are shown in
Table V. Although the results are listed for all subscales, we
are only concerned with Axis 2 and Axis 4. In Bayesian
parameter estimation, a prior estimate of credible parameters
is updated and credibility thereby reallocated across the space
of possible parameter values. Unlike classical inferencing,
whereby multiple tests require p-value corrections in order to
moderate family-wise Type I error rates, multiple comparisons made in a Bayesian estimation do not usually require
any correction or modification (the exception being situations
when a multi-level model would be appropriate).38 Here, to
the extent that we believe Axis 2 and Axis 4 measure distinct
attitudinal dimensions (as intended in the construction of the
subscales), a multi-level model is inappropriate.
Table V. The difference of mean gains between conditions, the estimated likelihood that gains in the Experiment condition exceeded Control group gains, and the estimated effect sizes, all provided by Bayesian parameter estimation (Ref. 33). Although this study is only concerned with Axis 2 and Axis 4 (in bold font), all subscales are included for completeness.

                                Axis 1               Axis 2             Axis 3              Axis 4              Axis 5
Difference of gains (BEST)      −0.6 (−6.6, 5.7)     6.4 (0.0, 12.8)    0.2 (−9.5, 9.8)     3.6 (−8.5, 16.7)    −0.6 (−9.4, 7.4)
Likelihood Egain > Cgain (%)    43.4                 97.4               51.6                70.6                44.4
Effect size (95% HDI)           −0.04 (−0.47, 0.40)  0.51 (0.02, 1.00)  0.01 (−0.46, 0.46)  0.14 (−0.34, 0.61)  −0.05 (−0.58, 0.51)

Examining the estimates yielded by the BEST estimation
procedure, we find that there is a 97.4% likelihood that the
Experiment condition outperformed the Control condition on
Axis 2, while there is a 70.6% likelihood for Axis 4. Effect
sizes are estimated at roughly 0.51 for Axis 2 and 0.14 for
Axis 4. These results support the same conclusions as our
classical analysis, without any need for worry regarding tentative acceptance of marginal significances or other such
threats to internal validity. Namely, we again conclude that
the activities appear to have had a moderate and highly likely
impact on Axis 2, alongside a weak and less likely impact on
Axis 4. The external validity of these conclusions still suffers
a number of threats posed by the limited sample sizes, the
specific nature of the instruction in each course, and the
unknown reliability of the EBAPS subscale scores. These are
factors that only further testing can address. Despite these
important caveats, when taken as a whole, the results are
suggestive of an educational impact that merits further study.
An immediate follow-up question that emerges from our
analysis is why the activities might positively impact attitudes on Axis 2 (Nat. Learn.) to a much greater degree than
on Axis 4 (Evo. Know.). This result is surprising considering
the emphasis the Bayesian activities place upon confidence
updating. One hypothesis we favor is that the use of subjective probabilities in the Bayesian activities places obvious and repeated value on the relativistic nature of science. Meanwhile, the emergence of intersubjective agreement which is gradually realized via multiple tests and
updatings of a single hypothesis (and which tends to universally drive subjective probabilities toward 0 or 1) was only
experienced by students in one Bayesian activity early in the
course. Therefore, students may be led by this skewed experience toward an overly relativist perception of scientific
knowledge.
To test this hypothesis, student responses to the three questions that factor into the Axis 4 subscale score, shown in Fig. 6,
may be analyzed for changes in student attitudes toward relativist or absolutist positions. For example, for question #29, we
classify answer C as a neutral response, while answers A and B
show relativist attitudes and answers D and E express absolutist
attitudes. For questions #6 and #28, we classify responses A
and B as absolutist, C as neutral, while D and E are relativist
responses. As shown in Fig. 7, aggregating the pre- and post-responses from each condition and examining the change in
response rates for these three categories, we find that students
in the Control tended to adopt more absolutist attitudes from
pre- to post-tests, while students in the Experimental group
showed a shift toward relativist attitudes. This pattern held for responses to all three items, as well as for the aggregate shown here.
Fig. 6. The items from EBAPS which are used to score Axis 4 (Evo. Know.). The responses scored as most expert-like for this subscale are #6 A, #28 E, and #29 C. Here, we will instead sort student responses into three categories (relativist, neutral, and absolutist) as described in the text.

Fig. 7. Shown, for each condition, are changes in the cumulative percentage of responses to the questions used for the Axis 4 subscale (questions #6, 28, and 29), which were classified as relativist, neutral, or absolutist.

Given that the Axis 4 subscale intends to assess the extent
to which a student is able to maintain a suitably sophisticated
understanding of scientific knowledge, being rooted in intersubjective agreements which are constrained by nature,
attitudes that are “too absolutist” or “too relativist” are
accordingly scored lower. Thus, it may be that the curricular
selection of the specific Bayesian activities used produced
attitudes which were scored only slightly higher than those
in the Control condition. A more frequent engagement in
Bayesian activities that develop intersubjective convergence
via multiple Bayesian updatings (performed either serially
by a single student/group or in parallel by pooling results
from a number of independent evaluations) may better
enable expert-like sophistication and enable a greater positive shift in student attitudes on Axis 4.
An alternative hypothesis, which is not mutually exclusive
to the above hypothesis, is that these results are due to an
incomplete growth of students’ epistemic attitudes. The
Reflective Judgment Model39,40 views people as generally
progressing through several levels of growth (Prereflective,
Quasi-Reflective, and Reflective) which, roughly speaking,
are typified by absolutist, relativist, and expert-like attitudes,
respectively. Thus, relativist attitudes are an intermediate
stage between absolutist and expert-like attitudes. From this
perspective, we would suggest that the Bayesian activities
may enable a progression in students’ level of reflective
judgment. If true, the results obtained would indicate that the
students’ progression was incomplete but was nonetheless
substantially enhanced in the Experimental condition,
whereas the Control condition appeared to regress toward a
Prereflective level.
C. Discussion
The results suggest that the Bayesian activities may generate a meaningful and educationally significant effect on specific student attitudes regarding the nature of learning, although this suggestion is not conclusive. Studies with
larger sample sizes and a variety of instructional methodologies are required to more precisely test and estimate the
effects of the Bayesian activities. Additionally, the development of an instrument designed specifically to assess student
attitudes that pertain to uncertainty, confidence, and changes
in confidence due to new information would allow results to
carry stronger inferential power. Given the distinctive nature
of the results obtained in this work, namely the ability to positively shift student epistemic attitudes in a one-semester
introductory physics course, this is worthy of further
exploration.
It is worth noting that any effects of the Bayesian activities came at a relatively small cost in terms of class time, as
the activities did not require a burdensome curricular investment. The cumulative amount of time (in lecture, recitation,
and lab) that students in the E1 and E2 classes spent on
Bayesian activities was roughly 4–5 hours out of a total of 72 hours of instructional time over the semester. It is possible
that a greater incorporation of Bayesian activities would produce greater epistemological gains.
Although these courses were based on the ISLE curriculum, the Bayesian activities themselves could easily be
adapted to any curriculum, including traditional designs. Of
course, the impact of the Bayesian activities is almost certainly non-trivially related to other pedagogical features of
the course, and so, their effectiveness is sure to vary depending on the context of use. The ISLE approach is centered
upon the notion of model development, testing, and revision/
rejection and therefore lends itself very naturally to the
inclusion of Bayesian activities. Approaches similar to ISLE,
such as Modeling Instruction or Physics by Inquiry, could
similarly prove to be amenable to the addition of Bayesian
activities. Traditional pedagogical approaches may make it
more difficult for engagement in Bayesian activities to carry
sufficient meaning or significance to the students and may
therefore mitigate the effectiveness of these activities. On
the other hand, if exam questions or other assessments are
given that include the exercise of Bayesian activities, perhaps the explicit value placed on these activities would be
sufficient to produce both attitudinal and problem-solving
gains.
VI. CONCLUSION
Several activity types, featuring both direct and indirect
evaluation of hypotheses, have been proposed in order to
engage students in Bayesian updating. The results from a
quasi-experimental study indicate that these Bayesian activities benefit some aspects of student epistemology, particularly those regarding the nature of knowing and learning.
The ability of a set of activities to produce a positive shift in student attitudes makes them a distinctly useful curricular tool, particularly given the well-documented difficulty many courses have in producing such shifts.
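For readers who want the mechanism in one line, the update that these activities rehearse is the standard form of Bayes' theorem, written here in generic notation for a hypothesis H and evidence E (this is a generic statement, not the exact form printed on any particular activity sheet):
\[
  P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) + P(E \mid \lnot H)\,P(\lnot H)},
\]
with the posterior P(H|E) serving as the prior for the next round of testing.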
Moreover, the prospective portability of these activities across a variety of course designs, at minimal cost of class time,
means that they may be broadly helpful in many courses
without needing extensive instructor preparation or course
alterations. Considering that other positive attitudinal shifts
have generally required extensive investments of preparation
and class time (e.g., Refs. 14, 15, 18, and 21), this seems
rather remarkable. Beyond that, these activities appear to
have enhanced a curriculum (ISLE) that already exhibited
positive attitudinal shifts. We caution that further testing is
required, given the limitations imposed by the sample size in
this study, and also that qualitative analysis via interviews
would help to deepen an understanding of how and to what
degree these Bayesian activities influence student attitudes.
The specific selection of activities used in the study also
appears to have caused students to shift their attitude regarding the evolution of knowledge toward an overly relativist
perspective. This suggests that there is room for improvement in the design and selection of Bayesian activities in
order to generate further gains. The optimal selection of
Bayesian activities is almost certain to depend to some
degree upon the particular course design that they are incorporated into. Moreover, their use and impact in calculus-based introductory classes may differ from what was seen
here in algebra-based courses and is something we are currently studying.
One intent of the Bayesian activities, particularly the
direct evaluation activities, was to motivate students to
attend to error analysis and to greater sense-making in general. This has not been tested yet and stands out as something
that ought to be studied. This is particularly true if, as we
hypothesized, the Bayesian activities are able to increase
sense-making among students even in more traditional lab
designs.
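As an illustration of the kind of computation a direct-evaluation activity asks students to reflect on, the following is a minimal sketch in Python. The hypothesis labels, predicted values, and measurement numbers are invented for illustration and are not drawn from the study's activity sheets; the sketch assumes a Gaussian error model, weighs two competing hypotheses against a single measurement, and renormalizes.

    import math

    def gaussian_likelihood(measured, predicted, sigma):
        # Probability density of obtaining `measured` if the hypothesis
        # predicts `predicted`, assuming Gaussian measurement error
        # with standard deviation `sigma`.
        z = (measured - predicted) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

    # Two hypothetical models of a cart's acceleration (m/s^2),
    # starting from equal priors; all numbers are illustrative.
    priors = {"frictionless model": 0.5, "friction model": 0.5}
    predictions = {"frictionless model": 1.20, "friction model": 0.95}

    measured, sigma = 1.02, 0.08  # measured value and its uncertainty

    # Bayes' theorem: posterior is proportional to prior * likelihood.
    unnormalized = {h: priors[h] * gaussian_likelihood(measured, predictions[h], sigma)
                    for h in priors}
    total = sum(unnormalized.values())
    posteriors = {h: w / total for h, w in unnormalized.items()}

    for h in posteriors:
        print(f"{h}: prior {priors[h]:.2f} -> posterior {posteriors[h]:.2f}")

With these invented numbers the posterior for the friction model rises to roughly 0.9, and a student can see directly how the measurement uncertainty sigma controls how decisive a single measurement is allowed to be, which is precisely the link to error analysis described above.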
Finally, there are many open questions, including the interaction of these activities with demographic variables in producing attitudinal gains. Considering the many ways in which
gender affects student attitudes in physics courses (e.g., Refs.
41 and 42), one may anticipate a similar difference in the
impact of Bayesian activities. Another important question
would be the robustness and transferability of students’ attitudes beyond the physics classroom and surveys, similar to
studies of the transferability of scientific reasoning abilities
(e.g., Ref. 43). Since Bayes’ Theorem is presented and used
across a wide variety of physical contexts in these activities,
does that make it possible for students to recognize and use it
more generally? In particular, one may wonder whether the
instantiations of Bayesian updating that these activities provide are sufficiently generic as to facilitate transfer across
topical domains,44 which is indeed a broader goal of these
activities and would be worthy of continued pursuit.
ACKNOWLEDGMENTS
The author received financial support for this study via
Instructional Improvement Program grants at his institution
and is thankful to three anonymous reviewers whose
comments greatly improved the quality of this article.
a)Electronic mail: arwarren@purdue.edu
1. R. W. Bybee and B. Fuchs, "Preparing the 21st century workforce: A new reform in science and technology education," J. Res. Sci. Teach. 43(4), 349–352 (2006).
2. R. Gott, S. Duggan, and P. Johnson, "What do practicing applied scientists do and what are the implications for science education?," Res. Sci. Technol. Educ. 17, 97–107 (1999).
3. E. Lottero-Perdue and N. W. Brickhouse, "Learning on the job: The acquisition of scientific competence," Sci. Educ. 86, 756–782 (2002).
4. S. Duggan and R. Gott, "What sort of science education do we really need?," Int. J. Sci. Educ. 24, 661–679 (2002).
5. National Academy of Engineering, Educating the Engineer of 2020: Adapting Engineering Education to the New Century (The National Academies Press, Washington, DC, 2005).
6. E. F. Redish, J. M. Saul, and R. N. Steinberg, "Student expectations in introductory physics," Am. J. Phys. 66, 212–224 (1998).
7. M. Sahin, "Effects of problem-based learning on university students' epistemological beliefs about physics and physics learning and conceptual understanding of Newtonian mechanics," J. Sci. Educ. Technol. 19, 266–275 (2010).
8. A. R. Warren, "Impact of teaching students to use evaluation strategies," Phys. Rev. Spec. Top.-Phys. Educ. Res. 6, 020103 (2010).
9. A. R. Warren, Ph.D. dissertation, Rutgers University, 2006.
10. R. Lippmann, Ph.D. dissertation, University of Maryland, 2003.
11. A. Karelina and E. Etkina, "Acting like a physicist: Student approach study to experimental design," Phys. Rev. Spec. Top.-Phys. Educ. Res. 3, 020106 (2007).
12. W. K. Adams, K. K. Perkins, N. S. Podolefsky, M. Dubson, N. D. Finkelstein, and C. E. Wieman, "New instrument for measuring student beliefs about physics and learning physics: The Colorado Learning Attitudes about Science Survey," Phys. Rev. Spec. Top.-Phys. Educ. Res. 2, 010101 (2006).
13. A. Madsen, S. B. McKagan, and E. C. Sayre, "How physics instruction impacts students' beliefs about learning physics," Phys. Rev. Spec. Top.-Phys. Educ. Res. 11, 010115 (2015).
14. B. A. Lindsey, L. Hsu, H. Sadaghiani, J. W. Taylor, and K. Cummings, "Positive attitudinal shifts with the Physics by Inquiry curriculum across multiple implementations," Phys. Rev. Spec. Top.-Phys. Educ. Res. 8, 010102 (2012).
15. V. Otero and K. Gray, "Attitudinal gains across multiple universities using the Physics and Everyday Thinking curriculum," Phys. Rev. Spec. Top.-Phys. Educ. Res. 4, 020104 (2008).
16. D. Hestenes, "Toward a modeling theory of physics instruction," Am. J. Phys. 55, 440–454 (1987).
17. D. Hestenes, C. Megowan-Romanowicz, S. Osborn Popp, J. Jackson, and R. Culbertson, "A graduate program for high school physics and physical science teachers," Am. J. Phys. 79(9), 971–979 (2011).
18. E. Brewe, L. Kramer, and G. O'Brien, "Modeling instruction: Positive attitudinal shifts in introductory physics measured with CLASS," Phys. Rev. Spec. Top.-Phys. Educ. Res. 5, 013102 (2009).
19. E. Etkina and A. Van Heuvelen, "Investigative science learning environment," in Forum on Education of the American Physical Society, Spring issue (2004), pp. 12–14.
20. E. Etkina, A. Van Heuvelen, S. White-Brahmia, D. T. Brookes, M. Gentile, S. Murthy, D. Rosengrant, and A. Warren, "Scientific abilities and their assessment," Phys. Rev. Spec. Top.-Phys. Educ. Res. 2, 020103 (2006).
21. A. Elby, "Helping physics students learn how to learn," Am. J. Phys., Phys. Educ. Suppl. 69(7), S54–S64 (2001).
22. E. F. Redish and D. Hammer, "Reinventing college physics for biologists: Explicating an epistemological curriculum," Am. J. Phys. 77, 629–642 (2009).
23. A. E. Lawson, "The generality of hypothetico-deductive reasoning," Am. Biol. Teach. 62(7), 482–495 (2000).
24. A. E. Lawson, The Neurological Basis of Learning, Development and Discovery: Implications for Science and Mathematics Instruction (Kluwer Academic Publishers, New York, 2003).
25. E. Etkina, A. Warren, and M. Gentile, "The role of models in physics instruction," Phys. Teach. 44(1), 34–39 (2006).
26. R. Dawid, String Theory and the Scientific Method (Cambridge U. P., Cambridge, UK, 2013).
27. J. Brownlee, S. Walker, S. Lennox, B. Exley, and S. Pearce, "The first year university experience: Using personal epistemology to understand effective learning and teaching in higher education," High. Educ. 58(5), 599–618 (2009).
28. D. Sivia and J. Skilling, Data Analysis: A Bayesian Tutorial, 2nd ed. (Oxford U. P., Oxford, 2006).
29. B. P. Abbott et al., "Observation of gravitational waves from a binary black hole merger," Phys. Rev. Lett. 116, 061102 (2016).
30. R. E. Kass and A. E. Raftery, "Bayes factors," J. Am. Stat. Assoc. 90(430), 773–795 (1995).
31. A. Elby, J. Fredriksen, C. Schwartz, and B. White, "Epistemological beliefs assessment for physical science," <http://www2.physics.umd.edu/elby/EBAPS/home.htm>.
32. E. Etkina, M. Gentile, and A. Van Heuvelen, College Physics, 1st ed. (Pearson, Boston, 2014).
33. J. K. Kruschke, "Bayesian estimation supersedes the t test," J. Exp. Psychol. Gen. 142(2), 573–603 (2013).
34. R. J. Little and D. B. Rubin, Statistical Analysis with Missing Data, 2nd ed. (Wiley, Canada, 2002).
35. N. J. Horton and K. P. Kleinman, "Much ado about nothing: A comparison of missing data methods and software to fit incomplete data regression models," Am. Stat. 61(1), 79–90 (2007).
36. I. R. White, P. Royston, and A. M. Wood, "Multiple imputation using chained equations: Issues and guidance for practice," Stat. Med. 30(4), 377–399 (2011).
37. S. van Buuren and K. Groothuis-Oudshoorn, "mice: Multivariate imputation by chained equations in R," J. Stat. Software 45(3), 1–67 (2011).
38. A. Gelman, J. Hill, and M. Yajima, "Why we (usually) don't have to worry about multiple comparisons," J. Res. Educ. Eff. 5(2), 189–211 (2012).
39. P. M. King and K. S. Kitchener, Developing Reflective Judgment: Understanding and Promoting Intellectual Growth and Critical Thinking in Adolescents and Adults (Jossey-Bass, San Francisco, 1994).
40. P. M. King and K. S. Kitchener, "Reflective judgment: Theory and research on the development of epistemic assumptions through adulthood," Educ. Psychol. 39(1), 5–18 (2004).
41. L. Kost, S. Pollock, and N. Finkelstein, "Characterizing the gender gap in introductory physics," Phys. Rev. Spec. Top.-Phys. Educ. Res. 5, 010101 (2009).
42. J. M. Nissen and J. T. Shemwell, "Gender, experience, and self-efficacy in introductory physics," Phys. Rev. Phys. Educ. Res. 12, 020105 (2016).
43. E. Etkina, S. Murthy, and X. Zou, "Using introductory labs to engage students in experimental design," Am. J. Phys. 74(11), 979–986 (2006).
44. J. A. Kaminski, V. M. Sloutsky, and A. F. Heckler, "The cost of concreteness: The effect of nonessential information on analogical transfer," J. Exp. Psychol. Appl. 19(1), 14–29 (2013).