Metacognitive Monitoring 1 Running head: METACOGNITIVE MONITORING

The Effects of Persuasive and Expository Text on Metacognitive Monitoring and Control
Daniel L. Dinsmore, Sandra M. Loughlin, and Meghan M. Parkinson
University of Maryland
This investigation examined metacognitive processes across two text types (persuasive and
expository). We also considered the effects of think aloud and three expertise levels
(acclimation, competence, and proficiency) for scrolling (i.e., moving back and forth in text) and
calibration (i.e., difference between confidence and performance). Participants were
undergraduates enrolled in either human development (n = 38) or government/politics courses (n
= 38), and practicing attorneys (n = 4). Participants read two passages on judicial review
presented via computer, and trace data on scrolling behaviors were logged during reading.
Additionally, a calibration measure was completed after reading. Think-alouds were coded for
metacognitive utterances. Data were analyzed via non-parametric bootstrapping. Significant
differences between text type were found for scrolling, calibration, and utterance categories.
There was no significant difference for think aloud condition on scrolling or calibration. Only
scrolling was statistically different for expertise level. However, median differences revealed
interesting trends between expertise groups requiring further investigation.
The Effects of Persuasive and Expository Text on Metacognitive Monitoring and Control
Students, particularly those at the undergraduate level, are often required to read,
evaluate, and use information presented in text. Too often, it is assumed that undergraduates are
competent readers able to use text type effectively to learn essential content. However, this
assumption has recently been called into question (e.g., Fox, Dinsmore, Maggioni, & Alexander,
2009). Specifically, Fox et al. (2009) found that undergraduates enrolled in a research methods
course were able to recall only limited information from course-related texts and did not display
the strategic processing expected of competent readers. One possible explanation for those
reported shortfalls was that these undergraduates were poor at monitoring and controlling their
cognitive processes, particularly with regard to comprehension (Wiley, Griffin, & Thiede, 2005).
These problems at monitoring and controlling may revolve around the students’ inability to use
prior knowledge (e.g., Shapiro, 2008), calibrate their learning (i.e., monitoring the relation
between confidence and performance; e.g., Dunlosky, Serra, Matvey, & Rawson, 2005), set
goals, or activate appropriate strategies (e.g., Aleven, McLaren, Roll, & Koedinger, 2006). The
present study was compelled by these concerns and by the goal of understanding how
presumably competent readers engage with texts cognitively and metacognitively.
As we move into this investigation, we are aided by the fact that the research on
metacognition represents a mature line of inquiry. In particular, there is an extensive literature on
the relation between metacognition and text (e.g., Wiley, Griffin, & Thiede, 2005). However,
despite the richness and diversity in this line of research, several problems and gaps persist.
Specifically, the literature on metacognitive monitoring and control has relied primarily on short
segments of text rather than extended discourse and has not considered the potential effects of
text type or genre on monitoring or control processes. Further, we considered the possibility that
certain measures of metacognition in the literature may actually influence metacognitive
processing. Finally, we found limited consideration given to the levels of readers’ expertise in a
domain and their subsequent metacognitive processing. The aim of the present research is to
address these gaps.
For the purpose of this investigation, metacognition is defined as "thinking about
thinking" (Miller, Kessel, & Flavell, 1970, p. 613), which encompasses four key components
(Flavell, 1979): metacognitive knowledge, metacognitive experiences, cognitive goals, and
strategy activation. Metacognitive knowledge refers to knowledge or beliefs that guide the
course of mental operations at either the person, task, or strategy level, while metacognitive
experiences are the cognitive or affective experiences that pertain to a mental operation.
Cognitive goals refer to cognitive or metacognitive goals that direct cognitive or metacognitive
activity. Finally, strategies are cognitive actions that are evoked to monitor (metacognitive
strategies) or make (cognitive strategies) progress toward a goal.
There are a number of studies of metacognition that have examined the difference between
individuals’ confidence and their performance (i.e., calibration) with tasks involving the
memorization of word pairs (e.g., Thiede & Dunlosky, 1994) or general knowledge questions
(Dahl, Allwood, & Hagberg, 2009). However, these studies have infrequently considered the
effect of topic or domain on the outcomes reported, particularly as it relates to academic domains
(Parkinson & Dinsmore, in preparation). Although this research provides insights into
monitoring and control processes, it sheds little light on what might be occurring within the
minds of undergraduates reading challenging texts from which they are expected to learn new
and complex content. Further, those monitoring studies that have used connected discourse
typically utilize expository texts such as Encarta (e.g., Moos & Azevedo, 2008).
Expository text is characterized as non-fiction reading material in which the intent is to
inform or explain (Williams, Stafford, Lauer, Hall, & Pollini, 2009). Although students often
read expository texts, the aforementioned studies have not been designed to establish how that
particular type of text over other forms may affect metacognitive monitoring and control. For
that reason, we have chosen to compare participants’ metacognitive processing with two text
types (i.e., expository and persuasive text).
Persuasive text is defined as text in which an author argues a point of view in order to
change a reader’s knowledge, beliefs, or interest (Kamalski, Sanders, & Lentz, 2002; Murphy,
Long, Holleran, & Esterly, 2003). Our interest in persuasive text for this study comes from the
finding that such text can be influential in sparking students’ interest and deepening their
knowledge (e.g., Buehl, Alexander, Murphy, & Sperl, 2001; Carrell & Connor, 1991). This may
be especially true for two-sided refutational text in which competing views on an issue are
presented, although to the advantage of one view over the other (Allen, 1991). We expected that
by presenting participants with two different texts on judicial review, we might uncover
differences in their metacognitive monitoring and control.
Whether the text is expository or persuasive, it is still necessary to find some viable
method for unearthing typically covert mental processes. This is no easy task and is made more
difficult because the measurements themselves may in fact disrupt these mental processes. This
has long presented a problem for metacognitive researchers, who have attempted a variety of
metacognitive measures. In their review of the metacognition literature, Dinsmore et al. (2008)
identified six types of measures in the contemporary literature: self-report, observation, think-aloud, interviews, and performance ratings. Measures of metacognition should be chosen based
on their utility in uncovering these mental processes without disrupting them. Previous measures,
such as performance ratings (i.e., calibration) and observational measures such as rereading (as
measured by the number of times participants scroll backward through text; e.g., Johnson-Glenberg, 2005) and help seeking (operationalized as soliciting help from an outside source; e.g.,
Aleven & Koedinger, 2002) have not shown evidence of disrupting covert mental processes.
Of particular interest here was the effect think-aloud protocols may have on other
measures of monitoring and control commonly used (i.e., calibration and observational
measures). This issue has become more salient because there has been an increasing use of think-aloud methodology in the metacognition literature (Dinsmore et al., 2008). In a think-aloud
protocol, participants are asked to perform a task while continuously reporting thoughts that
occur during the task (Ericsson & Simon, 1984). Further, Ericsson and Simon conjecture that these
thoughts emanate from working memory. By positioning these concurrent verbalizations in
working memory, think-aloud protocol should only elicit verbalizations about deliberately
enacted strategies, not automated skills (e.g., decoding in reading).
However, the question of whether the think-aloud protocol affects processing is far from
resolved. Veenman, Elshout, and Groen (1993) investigated this issue with measures of
regulatory processing in a discovery learning situation. They found no significant differences in
their measures of regulatory processing (as measured by student performance relevant to
strategic processing) between think-aloud and no think-aloud conditions. However, they did
find that time on task differed significantly between the two groups. Because the think-aloud
protocol took significantly longer, it is quite possible that it placed higher
demands on participants’ working memory. These higher demands may limit the amount of
strategic processing one is able to engage in; conversely, the task may take longer because the protocol itself
is eliciting more strategic processing from the participant.
Although there is no direct empirical evidence that think-aloud protocol affects strategic
processing, there is evidence that it negatively affects learning outcomes. Karahasanović, Hinkel,
Sjøberg, and Thomas (2009) concluded in their study that think-aloud protocol impacted not only
reading time, but also negatively impacted participants’ posttest scores. Further, in a descriptive
study, Greatorex and Süto (2008) found wide variation in
participants’ comments about their experience with the think-aloud protocol.
Given the increased demands on participants’ time, the
variability of participants’ descriptions of their experience with the think-aloud protocol, and the
negative impact on learning outcomes, it seems likely that some aspect of metacognitive
monitoring and control would be affected by the think-aloud protocol. This study addresses this
concern by comparing participants’ responses on measures of metacognitive monitoring and
control not expected to affect covert mental processing (i.e., scrollbacks, calibration, and help
seeking) in a think-aloud and a no think-aloud condition. We would expect that the think-aloud
protocol would elicit more instances of metacognitive monitoring and control due to the fact that
it attempts to make normally covert processes overt.
Finally, the present research addresses how metacognitive monitoring and control change
with expertise in a particular domain, a relation that has received minimal consideration in the
literature. With a few notable exceptions, most studies of metacognition have not considered the
effect of expertise on metacognitive processes (e.g., de Bruin, Rikers, & Schmidt, 2007). Rather,
they have investigated single populations (i.e., readers who have similar levels of expertise
relative to the content of the text) or have not addressed the issue of expertise at all (e.g., Rhodes
& Castel, 2008). This research paradigm is problematic because the literature predicts
differential processes for individuals at varying levels of expertise within a domain. For example,
Alexander’s Model of Domain Learning (MDL; Alexander, 1997) hypothesizes that levels of
expertise (i.e., acclimation, competence, and proficiency) result from the differential confluence
of knowledge, interest, and strategies; a confluence that likely has implications for metacognitive
monitoring and control processes. For instance, it is probable that individuals at higher levels of
expertise are more knowledgeable about and invested in issues relevant to their domain, and thus
likely engage in different patterns of metacognitive monitoring and control, particularly with
respect to calibration, than do novices reading the same text.
In the current study, this relation was addressed by targeting pools of participants at
varying levels of expertise in government and politics, the domain in which our task was situated
(i.e., the texts utilized for this study were on the topic of judicial review). The first pool was
composed of undergraduates in a human development course, whom we predicted to have low prior
knowledge and interest in the domain (i.e., acclimation). We also recruited undergraduates
enrolled in a government and politics course, whom we expected to demonstrate moderate levels
of prior knowledge and interest in government and politics (i.e., competence). Lastly, we
included practicing attorneys for our expert group, predicting that they would articulate high
levels of prior knowledge and interest in government and politics. Moreover, their professional
status indicated their level of expertise in the domain. It was our expectation that these groups
would differentially monitor and control their reading behaviors.
The participants for this study were recruited from three different pools. The first pool
consisted of undergraduates at a large mid-Atlantic university in the United States enrolled in
two sections of an introductory human development course. For the students enrolled in the
human development course (n = 38) the average age was 21.16. Participants in this first pool
were 52.63% female and 76.32% Caucasian. The average GPA for this first pool was 3.25 and
they had completed an average of 80.68 cumulative college credits. These participants came
from a variety of academic majors.
The second pool consisted of undergraduates at the same university who were enrolled in
an upper-level government and politics course. For the students enrolled in the government and
politics course (n = 38) the average age was 20.34. Participants in this second pool were 39.47%
female and 60.53% Caucasian. The average GPA for this second pool was 3.30 and they had
completed an average of 74.47 cumulative college credits. Of the participants from the
government and politics class, 71.05% were government and politics majors.
The third pool consisted of practicing attorneys from the mid-Atlantic region of the
United States. For the practicing attorneys (n = 4) the average age was 28.5. Participants in this
third pool were all male and 75.00% Caucasian.
The materials for this study were all computerized. The materials in the computer
environment consisted of two text passages and a glossary, as well as the measures for the study.
Text passages. The texts for this study consisted of an expository passage and a two-sided
refutational passage. The topic for these texts was judicial review. Currently, there is some
debate over the use (i.e., the overuse) of judicial review that is referred to as judicial activism.
Each of the two passages was adapted so that they were of similar length and difficulty. These
passages were presented in a text box with scroll arrows on the right hand side. Three lines of
text were clearly visible at a time. Lines of text both above and below the target text were in light
The expository passage (Appendix A) was adapted from a Microsoft Encarta entry on the
judicial branch (Microsoft, 2008). This passage described the role of the judicial branch and did
not contain any argument relating to judicial review or judicial activism. The expository passage
was 1,111 words and was 79 lines long. The Flesch Reading Ease for this passage was 44.5 and
the Flesch-Kincaid Grade Level was 12.4.
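The two readability indices reported for the passages follow the standard published formulas, which can be sketched as below. The word, sentence, and syllable counts passed to the functions are hypothetical inputs; the passages' sentence and syllable counts are not reported here, so this sketch is not expected to reproduce the reported values.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher values indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: approximate U.S. school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for illustration only (not taken from the passages).
ease = flesch_reading_ease(words=100, sentences=5, syllables=150)
grade = flesch_kincaid_grade(words=100, sentences=5, syllables=150)
```

Both indices depend only on average sentence length and average syllables per word, which is why two passages of similar length can still differ in estimated difficulty.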
The two-sided refutational passage (Appendix B) was adapted from two sources. The
first source was from a transcript of a speech given by then Attorney General Alberto Gonzales
at the American Enterprise Institute on January 17, 2007, entitled "Democracy and the Third
Branch" (Gonzales, 2007). Gonzales argued in his speech that the judicial branch should
exercise extreme caution when declaring executive and legislative actions unconstitutional.
The second source was an article written by Clint Bolick, a member of the CATO Institute,
which appeared in the Wall Street Journal April 3, 2007, entitled, "A Cheer for Judicial
Activism" (Bolick, 2007). Bolick argued in the article that the judiciary must do everything
possible to ensure that the government does not infringe on individuals' civil liberties. These two
sources were woven together to create a two-sided refutational text in which Gonzales's
arguments restricting the use of judicial activism were refuted by Bolick's arguments for the
judiciary to do everything possible to ensure individuals' liberties. The two-sided refutational
passage was 1,213 words and was 81 lines long. The Flesch Reading Ease for this passage was
39.2 and the Flesch-Kincaid Grade Level was 14.2.
Glossary. The glossary consisted of a glossary of terms as well as biographical
information on both Alberto Gonzales and Clint Bolick. The glossary of terms listed keywords
from each of the texts and gave their definitions. The definitions for these terms were adapted
from the Merriam-Webster Online Dictionary (Merriam-Webster, 2008). Sample terms included:
judicial activism, Alexander Hamilton, James Madison, tyranny, Constitutional law, deference,
and judicial review. The brief biographies for both Alberto Gonzales and Clint Bolick were each
less than three hundred words.
The measures for this study were also all computerized. The measures for the study
include: demographics, prior knowledge, topic interest, passage knowledge, and calibration.
Demographics. The demographics questionnaire had students report their sex, age, and
ethnicity (using the United States Census Bureau categories). Undergraduates were
also asked to report their academic major, cumulative college credits completed, and
cumulative grade point average (based on a four-point scale).
Prior knowledge. The prior knowledge test measured participants' prior knowledge on the
topic of the judicial review process. The measure consisted of sixteen multiple-choice items
based on information in both the expository and persuasive passages. All the prior knowledge
questions came from the two passages (eight from the expository passage and eight from the
persuasive passage).
The responses for the multiple-choice items were scored using a targeted response model
(Alexander, Murphy, & Kulikowich, 1998). In this way, differentiation between those immersed
in the topic or domain and those not immersed in the topic or domain could be made. An
example of one of the multiple choice items appears below.
Appellate jurisdiction is exercised by __________.
a. the United States courts of appeals (4)
b. the Supreme Court (2)
c. the President (0)
d. trial courts (1)
The answer choices corresponded to one of the following categories: in-topic correct
responses, in-topic incorrect responses, in-domain incorrect responses, and popular lore
responses. In this case, the answer choice "the United States courts of appeals" was the in-topic
correct response and was scored a 4. The answer choice "the Supreme Court" was the in-topic
incorrect response and was scored a 2: it was incorrect
but fell within the topic of judicial review. The answer choice "trial courts" was the in-domain
incorrect response and was scored a 1. Although the trial courts fall within the domain of
government and politics, they have no role in the topic of judicial review. The answer choice
"the President" was a "popular lore" answer and was scored a 0. This response was one that
someone with little to no domain knowledge might choose.
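The graded scoring described above can be sketched as a simple lookup. The category weights (4, 2, 1, 0) and the option-to-category mapping come from the example item; the dictionary and function names are hypothetical and are not part of the original scoring materials.

```python
# Weights for each response category, taken from the example item.
WEIGHTS = {
    "in_topic_correct": 4,
    "in_topic_incorrect": 2,
    "in_domain_incorrect": 1,
    "popular_lore": 0,
}

# Hypothetical answer key for the example item: option letter -> category.
EXAMPLE_ITEM = {
    "a": "in_topic_correct",     # the United States courts of appeals
    "b": "in_topic_incorrect",   # the Supreme Court
    "c": "popular_lore",         # the President
    "d": "in_domain_incorrect",  # trial courts
}


def score_item(item_key, choice):
    """Return the graded score (0, 1, 2, or 4) for one chosen option."""
    return WEIGHTS[item_key[choice]]


def score_test(item_keys, choices):
    """Return (total score, maximum possible score) across all items."""
    total = sum(score_item(key, choice) for key, choice in zip(item_keys, choices))
    maximum = WEIGHTS["in_topic_correct"] * len(item_keys)
    return total, maximum
```

Under this scheme an eight-item test has a maximum of 32 points, so two participants with the same number of fully correct answers can still receive different totals depending on how close their incorrect choices were to the topic.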
The Cronbach's alpha for the prior knowledge measure was 0.59. Although lower than
the suggested alpha for experimental measures of 0.70, the depressed alpha in this case may
represent participants' fragmentary knowledge on the topic of judicial review. Bernardi (1994)
suggests that alpha is partially dependent on the sample chosen. In this case, it is quite possible
that the weak correlation between items may have been due to the fact that the participants
(particularly the human development undergraduates) may have some declarative knowledge
(i.e., statements or propositions about a domain) but that this knowledge may not be principled
(i.e., overarching conceptualizations in a domain).
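For readers less familiar with the statistic, Cronbach's alpha can be computed from item variances and the variance of total scores, as in the minimal sketch below. The function name and the data layout (one list of item scores per participant) are assumptions for illustration; the study's raw item scores are not available here.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of participants' item-score lists.

    scores: one inner list per participant, each with the same k item scores.
    """
    k = len(scores[0])
    items = list(zip(*scores))  # transpose: one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)
```

When items covary weakly, the sum of the item variances approaches the variance of the total score and alpha falls toward zero, which is consistent with the fragmentary-knowledge interpretation offered above.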
Topic interest. Topic interest was assessed by having participants report their level of
interest for ten items related to the judicial branch. These items included how interested they
were in: checks and balances, historic court decisions, judges and justices, the Constitution, and
the founding fathers. The participants were asked to respond to these ten items by making a slash
on a 100-pixel line with "not interested" and "very interested" at opposite poles. The Cronbach's
alpha for this scale was 0.90. An example item for the topic interest scale appears below.
Governmental systems of checks and balances
not interested
very interested
Passage knowledge. Knowledge of each passage was assessed immediately after participants read
it. These passage knowledge questions related directly to information presented in the
passage and were similar to the prior knowledge questions both in the wording of the questions
and the response format. However, the particular questions that appeared after each passage
could only be answered from the passage the participants had just read. There were eight
questions per passage.
Cronbach’s alpha for this scale was 0.39. As discussed above, these were the same
questions as the prior knowledge test. The lower alpha when these items are presented after
reading a passage actually presents an interesting picture. One possibility is the differential
ability of participants to learn from text, thereby weakening further the correlations between
items for this sample. Regardless, since these posttest items were taken directly from the
passage, the validity for the scale outweighs concerns about the reliability as reported by
Cronbach's alpha.
Calibration. Immediately following each passage knowledge question, participants were
asked to rate their confidence in the answer to the preceding passage knowledge question. The
participants were asked to respond to the calibration items by "clicking on the line indicating
how confident you would be in the accuracy of your response to the following questions." The
Cronbach's alpha for the confidence scales was 0.89. A sample item for calibration appears below.
Appellate jurisdiction is exercised by __________.
Trace Data
In addition to the measures described above, we collected trace data in the form of
logfiles for scrollbacks and help seeking. We also collected trace data in the form of audiotapes
for the think-aloud protocol.
Scrollbacks. Scrollbacks were operationalized as the number of times a participant
scrolled backward through the text by three or more lines, similar to a study conducted by
Johnson-Glenberg (2005). Trace data were collected on participants' navigation patterns through
the text to give us a count of the total number of scrollbacks for each passage for each
participant. We also tracked the amount of time the participants spent on each
portion of the text (i.e., the three-line segments).
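One way such a count could be derived from a navigation log is sketched below. The log format (an ordered sequence of the line number at the top of the visible window) and the rule that a scrollback is one uninterrupted backward run are assumptions made for illustration; the actual logfile structure is not described here.

```python
def count_scrollbacks(positions, min_lines=3):
    """Count uninterrupted backward runs spanning at least `min_lines` lines.

    positions: hypothetical ordered log of the top visible line number.
    """
    count = 0
    run_start = None  # position at which the current backward run began
    for prev, curr in zip(positions, positions[1:]):
        if curr < prev:
            # Moving backward: open a run if one is not already open.
            if run_start is None:
                run_start = prev
        else:
            # Forward (or no) movement closes any open backward run.
            if run_start is not None and run_start - prev >= min_lines:
                count += 1
            run_start = None
    # Close a backward run that lasts until the end of the log.
    if run_start is not None and run_start - positions[-1] >= min_lines:
        count += 1
    return count
```

A reader who moves from line 5 back to line 2 and then forward again would register one scrollback under this rule, whereas a single-line glance backward would not be counted.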
Help seeking. Help seeking was operationalized as the number of times a participant
accessed the glossary, whether to view the terms
or the biographies. Trace data were collected to give us the total number of times for each
passage that the participants accessed the glossary.
Think aloud. Participants in the think-aloud condition were asked to think aloud while
reading each of the two passages (see the procedures section for more information on the think-aloud protocol). The 35 think-alouds were transcribed into text files by the first and third authors.
These transcripts were then coded for instances of metacognitive monitoring and control by the
first and second authors. Using Flavell's (1979) conception of metacognition, transcripts were
coded for instances of metacognitive knowledge (MK), metacognitive experiences (ME), goals
(G), and the activation of strategies (AS). During coding, goals and the activation of strategies
were combined into a single code (G/AS). Definitions of these three codes and examples for each
appear in Table 1. The level of inter-rater reliability for a randomly selected 20% of the think-alouds
was 90.66%. Coding disagreements were resolved through conference. This
level of inter-rater reliability was considered acceptable, and the first author coded the remainder
of the think-aloud transcripts using this coding scheme.
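Assuming the reported reliability figure is simple percent agreement (the text does not name the index), it could be computed as in the sketch below. The function name and the example code sequences are hypothetical.

```python
def percent_agreement(codes_a, codes_b):
    """Percentage of coded segments on which two raters assigned the same code."""
    if len(codes_a) != len(codes_b):
        raise ValueError("Both raters must code the same number of segments.")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


# Hypothetical codes from two raters for four utterances.
rater_1 = ["MK", "ME", "G/AS", "ME"]
rater_2 = ["MK", "ME", "G/AS", "MK"]
agreement = percent_agreement(rater_1, rater_2)
```

Percent agreement does not correct for agreement expected by chance, so a chance-corrected index would generally yield a lower value for the same codes.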
All participants were treated according to APA (5th Edition) guidelines and completed a
consent form before participating. The experiment was conducted on four PCs in a laboratory
running Internet Explorer 7.0. Data were sent from the PCs to a secure external Apache server
running on a UNIX platform. The experiment was administered by the first, second, and third
authors. Both the order of passages (i.e., expository and persuasive) and think-aloud
condition were counterbalanced in a Latin square design.
No think-aloud condition. Participants were seated at one of four computer workstations
in the laboratory. At the beginning of the experiment participants were instructed, "For all of the
measures, if you don't know the answer, please take your best guess." Participants completed the
demographic, prior knowledge, and topic interest measures. According to the Latin square design,
participants in the no think-aloud condition either had the expository passage first or the
persuasive passage first. Participants were told that they were going to answer questions after
reading the passages. As the participants read the text, they scrolled up or down through the text
until they reached the end of the text. By clicking the "continue" button at the end of the passage,
students were directed to the passage knowledge measure. Immediately after the recognition
items, participants completed the confidence scales for each question (calibration). Following the
calibration items, participants completed beliefs and passage interest measures for that particular
passage. Participants then repeated the same procedure for the second passage. Following the
experiment participants were debriefed.
Think-aloud condition. The procedure for the think-aloud condition was identical to the
no think-aloud condition, except for the following additions. Before the first passage subjects
were given instructions for the think-aloud protocol and given a short practice passage. The
protocol for the think aloud is included in Appendix C. The practice passage was about
mosquitoes and was adapted from a popularly written science article by Marston Bates (1975).
Once participants felt comfortable reading aloud, they then read either the expository or
persuasive text first. Before each passage participants were instructed, "As you read this text,
please say out loud what you are thinking and doing." Participants could choose to read aloud or
not. If participants were silent for more than 30 seconds the experimenter prompted the subjects
again to please say out loud what they were thinking or doing. This procedure was repeated for
the second passage.
Results and Discussion
Results for each of the three research hypotheses (i.e., textual influences on
metacognitive monitoring and control, effects of the think-aloud protocol on metacognitive
monitoring and control, and the influence of domain expertise on metacognitive monitoring and
control) are presented and briefly discussed.
Due to internet and power failures during data collection, data from 4 participants were
lost. The following analyses used the 76 remaining participants (n = 36 for the think-aloud
condition and n = 40 for the no think-aloud condition). In addition, one think-aloud recording was unusable
due to poor tape quality. Given the circumstances, these data can be considered missing at
random and not a participant effect. All presented analyses use data from the remaining 76
participants and 35 think-aloud transcripts unless otherwise noted.
Means and standard deviations for the metacognitive monitoring and control variables
(i.e., scrollbacks, help seeking, absolute accuracy, and bias) appear in Table 2 across passages
and think-aloud conditions. The data in Table 2 provide evidence that the number of help seeking
behaviors that this sample engaged in was very limited. Due to the very low prevalence of help
seeking in this investigation (i.e., three participants), we have excluded it from further analyses.
Since the data collected during this investigation consisted of both trace data (i.e., count
data) and difference scores (i.e., calibration), inferential analysis on the means of these variables
was considered inappropriate. Since these frequency data and difference scores should not be
considered to follow a normal distribution, we chose to use a non-parametric bootstrap
technique. The bootstrap has been identified as a suitable technique for data that do not meet parametric assumptions (Efron
& Tibshirani, 1993), such as the frequency and difference scores in this investigation. For all of the
following tests we used the bootstrapping technique to resample (N=5000) from the participants
in our study (n=76). The re-sample created a distribution in which we calculated the median
(Med) along with a 95% confidence interval at the 2.5 (P2.5) and 97.5 (P97.5) percentiles. This
allowed us to test null hypotheses that differences between passages, conditions, groups, or
interactions were zero at α = 0.05.
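The resampling procedure described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the statistic computed on each resample is assumed here to be the mean (the text does not specify it), and the function names, the seed, and the interpolation rule for percentiles are all hypothetical choices.

```python
import random
from statistics import mean, median


def percentile(sorted_values, p):
    """Linear-interpolation percentile of an already sorted list (0 <= p <= 100)."""
    rank = (len(sorted_values) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def bootstrap_summary(values, n_resamples=5000, statistic=mean, seed=0):
    """Return (Med, P2.5, P97.5) of the bootstrap distribution of `statistic`."""
    rng = random.Random(seed)
    n = len(values)
    # Each resample draws n observations with replacement from the sample.
    stats = sorted(statistic(rng.choices(values, k=n)) for _ in range(n_resamples))
    return median(stats), percentile(stats, 2.5), percentile(stats, 97.5)


def excludes_zero(p_low, p_high):
    """True when the 95% percentile interval does not contain zero."""
    return p_low > 0 or p_high < 0
```

Under this decision rule, a null hypothesis of zero difference is rejected at α = 0.05 only when both percentile bounds fall on the same side of zero.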
Textual Influences on Metacognitive Monitoring and Control
We compared the monitoring and control variables (i.e., scrollbacks, absolute accuracy,
and bias) between the expository and persuasive passages. In addition to looking at differences
between these measures, we also examined the think-aloud data from participants in the think-aloud condition (n=35).
Scrollbacks. We began by testing to see how many times participants scrolled backward
through the expository passage and the persuasive passage (a between-passages test). Figure 1
displays the medians for both passages (0.76 for the expository passage and 1.00 for the
persuasive passage). The median difference between these two passages was -0.25. This suggests
that overall scrollbacks were used 0.25 more times during the persuasive passage than the
expository passage. This difference was not significant (Med = -0.25, P2.5 = -0.82, P97.5 =
The lack of difference between passages may mask the difference within individuals
between the passages. Specifically, we calculated the difference scores for each individual on
scrollbacks between the passages in order to investigate whether individuals used scrollbacks
more often for the expository or persuasive passage. This is in effect a within-subjects repeated
measures test using bootstrapping. First, we tested the value of the absolute difference (by
participant) in the usage of scrollbacks between the passages by subtracting the number of
scrollbacks in the persuasive passage from the number of scrollbacks in the expository passage.
The median of the resample from the bootstrap test was 0.99. This indicates that a participant at
the 50th percentile of difference scores had a difference in scrollback usage between the passages
of 0.99, regardless of which passage they used the greater number of scrollbacks for. This
median difference was significantly different from zero (Med = 0.99, P2.5 = 0.72, P97.5 = 1.32).
Specifically, to test our hypothesis that the persuasive passage would elicit more evidence
of metacognitive monitoring and control, we examined the directionality of these difference
scores (i.e., did the individuals use more scrollbacks for the persuasive passage?). Here we tested
the value of the signed difference (retaining the positive or negative value of the difference
score) in participants’ usage of scrollbacks between the passages. The median difference was
-0.24. This indicates that a participant at the 50th percentile of signed difference scores used 0.24
more scrollbacks for the persuasive text than the expository text. However, this was not a
significant difference (Med = -0.24, P2.5 = -0.61, P97.5 = 0.14). This evidence suggests that there
is in fact a main effect for scrollbacks between passages, but that the directionality (i.e., which
passage was greater) was non-significant in this sample.
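The distinction between the absolute and the signed difference scores can be made concrete with a small example. The scrollback counts below are hypothetical and are not data from this study; the function name `difference_scores` is likewise an assumption. Differences are taken as expository minus persuasive, so negative values mean more scrollbacks for the persuasive passage.

```python
import statistics


def difference_scores(expository, persuasive):
    """Per-participant difference scores (expository minus persuasive)."""
    signed = [e - p for e, p in zip(expository, persuasive)]
    absolute = [abs(d) for d in signed]
    return signed, absolute


# Hypothetical scrollback counts for five readers (illustration only).
expository = [0, 2, 1, 3, 0]
persuasive = [1, 0, 1, 5, 2]

signed, absolute = difference_scores(expository, persuasive)
median_signed = statistics.median(signed)      # direction of the difference
median_absolute = statistics.median(absolute)  # size of the difference, ignoring direction
```

In this toy sample the median absolute difference is 2 while the median signed difference is -1: individuals can differ noticeably between passages even when the direction of that difference is not consistent across readers, which is the pattern reported above.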
Calibration. To test participants’ calibration between passages, we calculated both
absolute accuracy and bias. We used a similar procedure to the one Nietfeld, Cao, and Osborne
(2005) used. For absolute accuracy we calculated the difference between their overall confidence
on the multiple-choice items (on a 100-pixel scale) and their corresponding performance on
those posttest multiple-choice items (percent correct on the posttest). Since we used a targeted
response model, we divided the scores across the eight items by 32 (maximum possible score on
all eight items) instead of the total number of items, as Nietfeld et al. (2005) did. We then took
the absolute value of these differences to get an absolute accuracy score for each individual. For
bias, we used the same procedure except that we retained the signed value of the difference score
between confidence and performance to see if the participants were over- or under-confident.
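As a rough sketch of the two calibration scores, the computation for one participant on one passage might look as follows. The function name, the argument names, and the treatment of confidence as a single 0-100 value are assumptions made for illustration; only the divisor of 32 and the signed versus absolute distinction come from the description above.

```python
def calibration_scores(confidence, item_points, max_points=32):
    """Return (absolute_accuracy, bias) for one participant on one passage.

    confidence  -- overall confidence on a 0-100 scale (assumed to be one value)
    item_points -- points earned on each multiple-choice item
    max_points  -- maximum possible score across all items (32 in this study)
    """
    performance = 100.0 * sum(item_points) / max_points  # percent of maximum score
    bias = confidence - performance  # positive: overconfident; negative: under-confident
    return abs(bias), bias


# Hypothetical participant: confidence of 70, 24 of 32 points earned (75 percent).
absolute_accuracy, bias = calibration_scores(70, [4, 4, 2, 3, 4, 1, 4, 2])
```

For this hypothetical participant the bias is -5 (under-confident by 5 points) and the absolute accuracy is 5.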
We hypothesized that participants would be better calibrated for the persuasive passage
than the expository passage. Figure 2 presents the absolute accuracy and bias scores between the
passages. Lower difference scores indicate that participants were better calibrated. Medians for
absolute accuracy were 11.41 for the expository passage and 11.45 for the persuasive passage.
The median difference between the two passages for these participants on absolute accuracy was
-0.075. This indicates that a participant with an absolute difference score at the 50th percentile
was more closely calibrated on the expository passage than the persuasive passage by 0.075
points, regardless of whether they were overconfident or under-confident on the passages. This
difference was not significant (Med = -0.075, P2.5 = -3.88, P97.5 = 3.97).
Further, we predicted that participants would be overconfident for the expository passage,
but not so for the persuasive passage. Figure 2 shows that, in fact, participants were
overconfident for the expository passage (Med = 3.97) and under-confident for the persuasive
passage (Med = -4.63). This indicates that a participant with a signed difference score at the 50th
percentile was overconfident on the expository passage by 3.97 points (confidence was higher
than performance) and under-confident on the persuasive passage by 4.63 points (performance
was higher than confidence). The median difference between the passages was significant (Med
= -8.59, P2.5 = -14.74, P97.5 = -2.39). This evidence suggests that while absolute accuracy did not
differ between the two passages, the direction of miscalibration (i.e., over- or under-confidence) did.
Think aloud. Figure 3 presents the differences in median number of utterances for
metacognitive knowledge, metacognitive experiences, and goals/activation of strategies within
the 35 participants in the think-aloud condition. Again, we tested the differences within
individuals by subtracting the number of metacognitive knowledge utterances in the persuasive
passage from the metacognitive knowledge utterances in the expository passage. The medians
for metacognitive knowledge utterances were 3.14 and 1.23 for the expository and persuasive
passages respectively. The median difference between passages at the individual level for
metacognitive knowledge was 1.91. This indicates that a participant with a metacognitive
knowledge difference score at the 50th percentile made 1.91 more metacognitive knowledge
utterances in the expository passage than the persuasive passage. This difference was significant
(Med = 1.91, P2.5 = 0.91, P97.5 = 3.09).
The medians for metacognitive experience utterances were 2.71 and 5.60 for the
expository and persuasive passages respectively. The median difference between passages was
-2.77. This indicates that a participant with a metacognitive experience difference score at the 50th
percentile made 2.77 more metacognitive experience utterances in the persuasive passage than
the expository passage. This difference was also significant (Med = -2.77, P2.5 = -4.52, P97.5 =
-1.11).
The medians for goals/activation of strategies were 1.54 and 2.11 for the expository and
persuasive passages respectively. The median difference between passages was -0.57. This
indicates that a participant with a goals/activation difference score at the 50th percentile made
0.57 more goals/activation of strategy utterances in the persuasive passage than the expository
passage. This difference was also significant (Med = -0.57, P2.5 = -1.11, P97.5 = -0.29). Overall,
this evidence suggests that the type of text may influence both the amount and the kind of
metacognitive monitoring and control that readers display.
Effects of the Think-Aloud Protocol on Metacognitive Monitoring and Control
Next, we turn to an examination of the differences between the think-aloud and no
think-aloud groups with regard to scrollbacks and calibration. We predicted that think aloud would elicit
greater metacognitive monitoring and control. To test the hypotheses about the difference
between the think-aloud and no think-aloud conditions, we again relied on the non-parametric
bootstrap. For the following analyses differences in scrollbacks and calibration were examined
for passages within participants (i.e., difference scores) between the two groups (essentially a
repeated measures test of passage effects using the think-aloud groups as the between-subjects
effect).
Scrollbacks. To test the hypothesis that participants in the think-aloud group would
demonstrate more metacognitive monitoring and control via scrollbacks and calibration, we
conducted a bootstrap test with a null hypothesis that the difference in the medians of each group
equaled zero. Figure 4 presents these data for scrollbacks. First, we looked to see if there were
differences in the absolute (unsigned) difference in scrollbacks between passages for each
participant. The absolute median difference between scrollbacks for each individual on the
passages was 0.64 for both the think-aloud and no think-aloud groups. The difference between
these two medians was not significant (Med = 0.00, P2.5 = -0.98, P97.5 = 0.51). This indicates that
a participant at the 50th percentile of the think-aloud group had the same number of scrollbacks
as a participant at the 50th percentile of the no think-aloud group.
Further, we examined if the groups scrolled back differently for one passage versus the
other by retaining the signed difference. For the think-aloud group, the median difference in
scrollbacks was -0.36, indicating that a participant in the 50th percentile of the think-aloud group
scrolled back 0.36 times more often in the persuasive passage than the expository passage. For
the no think-aloud group, the median difference in scrollbacks was -0.13, indicating that a
participant in the 50th percentile of the no think-aloud group scrolled 0.13 times more often in the
persuasive passage than the expository passage. The difference between the two groups on
scrollbacks between the passages was not significant (Med = -0.23, P2.5 = -0.98, P97.5 = 0.52).
These tests indicate that there were no between-subjects effects for the think-aloud condition.
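A between-groups comparison of this kind can be sketched as a bootstrap of the difference in group medians. This is an illustrative sketch, not the study's analysis code: the function name is hypothetical, and resampling each group separately (with replacement, at its original size) is an assumption about how the groups were handled.

```python
import random
import statistics


def bootstrap_median_difference(group_a, group_b, n_resamples=5000, seed=0):
    """Return (Med, P2.5, P97.5) for median(group_a) - median(group_b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group independently, keeping its original size (an assumption).
        a = rng.choices(group_a, k=len(group_a))
        b = rng.choices(group_b, k=len(group_b))
        diffs.append(statistics.median(a) - statistics.median(b))
    diffs.sort()
    p2_5 = diffs[int(0.025 * (n_resamples - 1))]
    p97_5 = diffs[int(0.975 * (n_resamples - 1))]
    return statistics.median(diffs), p2_5, p97_5
```

Here `group_a` and `group_b` would hold each participant's difference score (signed or absolute) for the two conditions; an interval that contains zero, as in the results above, gives no evidence of a between-subjects effect.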
Calibration. To test the hypothesis that the think-aloud group would be more closely
calibrated than the no think-aloud group, we conducted a bootstrap test with a null hypothesis
that the difference between groups was zero. This is the between-subjects effect (think-aloud
condition) for the repeated measures (i.e., the passages) for calibration. Results for both absolute
accuracy and bias are presented in Figure 5. The first examination was of differences in the
groups’ absolute accuracy (the unsigned difference between confidence and performance). There
were no significant differences in absolute accuracy between the two groups for either the
expository passage (Med = -0.33, P2.5 = -4.17, P97.5 = 3.34) or the persuasive passage (Med =
1.31, P2.5 = -2.71, P97.5 = 5.28).
The second examination was for bias (the signed difference between confidence and
performance). The difference between the two groups (i.e., think-aloud and no think-aloud) for
the expository passage was 0.94. This means that a participant at the 50th percentile of the
think-aloud group was more under-confident compared to a participant at the 50th percentile of the no
think-aloud group, though this was not statistically significant (Med = 0.94, P2.5 = -5.01, P97.5 =
7.09). These differences were also not significant for the persuasive passage with a median
difference between the two groups of 0.89. This indicates that a participant in the 50th percentile
of the think-aloud group was more under-confident than a participant at the 50th percentile of the
no think-aloud group (Med = 0.89, P2.5 = -5.42, P97.5 = 6.79). These tests indicate that there were
no significant differences in the between-subjects effects for think-aloud condition.
Effect of Domain Expertise on Metacognitive Monitoring and Control
The participants for the study were chosen specifically because their various levels of
expertise were hypothesized to differ. Although the undergraduates (i.e., from the human
development and government and politics classes) are similar to each other in terms of their GPA
and cumulative college credits completed, they differed in both their prior knowledge and
interest in the judicial review process. Table 3 and Figure 6 show the differences in mean levels
of prior knowledge and topic interest across the three participant pools (which were continuous,
normally distributed data). As we would expect, there is a clear increase in both prior knowledge
and topic interest from the human development undergraduates (those in acclimation),
government and politics undergraduates (those in competence), to the practicing attorneys (those
in proficiency).
An omnibus ANOVA test indicated that there were significant differences in both prior
knowledge (F = 12.72, df = 2, p < 0.01) and topic interest (F = 9.59, df = 2, p < 0.01) between
these three groups. Contrasts (i.e., Fisher's LSD) indicated that there were also significant
differences between the human development undergraduates and the government and politics
undergraduates in both prior knowledge (Mdif = 6.46, SE = 1.54, p < 0.01) and topic interest (Mdif
= 14.46, SE = 4.08, p < 0.01). There were also significant differences between the human
development undergraduates and the practicing attorneys in both prior knowledge (Mdif = 12.85,
SE = 3.50, p < 0.01) and topic interest (Mdif = 29.33, SE = 8.92, p < 0.01). However, significant
differences were not found between the government and politics undergraduates and the
practicing attorneys in either prior knowledge (Mdif = 6.39, SE = 3.50, p = 0.072) or topic interest
(Mdif = 14.86, SE = 8.96, p = 0.10). We contend that these differences were not
detected because of the small sample of practicing attorneys combined with the conservative
nature of contrasts such as Fisher's LSD.
Since significant differences were found between the two undergraduate participant pools
and we were able to obtain large enough samples, an examination of the differences between
these two groups with regard to scrollbacks and calibration was undertaken. To test the hypotheses
about the difference between the human development undergraduates (i.e., those in acclimation)
and the government and politics undergraduates (i.e., those in competence), we again relied on
the non-parametric bootstrap. For the following analyses differences in scrollbacks and
calibration were examined for passages within participants (i.e., difference scores) between the
two groups (essentially a repeated measures test of passage effects using the developmental
groupings as the between-subjects effect).
Scrollbacks. To test the hypothesis that participants in the acclimation group would use
scrollbacks more often in the expository passage than the persuasive passage, whereas the
participants in the competence group would use scrollbacks more for the persuasive passage than
the expository passage, we conducted a bootstrap test with a null hypothesis that the difference in
the medians of each group equaled zero. Figure 7 presents these data for scrollbacks. First, we
looked to see if there were differences in the absolute (unsigned) difference in scrollbacks
between passages for each participant. The absolute median difference between scrollbacks for
each individual on the passages was 0.57 for the acclimation group and 1.38 for the competence
group. The difference between these two medians was significant (Med = -0.80, P2.5 = -1.38,
P97.5 = -0.26). This indicates that the absolute between-passage difference in scrollbacks for a
participant at the 50th percentile of the competence group was 0.80 larger than that for a
participant at the 50th percentile of the acclimation group, regardless of which passage they used
the greater number for.
Further, we examined which of these passages had more scrollbacks in each of these
groups by retaining the signed difference. For the acclimation group, the median difference in
scrollbacks was 0.29, indicating that a participant in the 50th percentile of the acclimation group
scrolled back 0.29 times more often in the expository passage than the persuasive passage. For
the competence group, the median difference in scrollbacks was -0.84, indicating that a
participant in the 50th percentile of the competence group scrolled 0.84 times more often in the
persuasive passage than the expository passage. The difference between the two groups on
scrollbacks between the passages was significant (Med = 1.08, P2.5 = 0.33, P97.5 = 1.73). This
indicates that in addition to a repeated measures effect for passages (i.e., textual effects of
metacognitive monitoring and control), there is also a between-subjects effect of developmental
Calibration. To test the hypothesis that the group in competence would be more closely
calibrated than the group in acclimation, we conducted a bootstrap test with a null hypothesis
that the difference between groups was zero. This is the between-subjects effect (developmental
group) for the repeated measures (i.e., the passages) for calibration. Results for both absolute
accuracy and bias are presented in Figure 8. First, we looked to see if there were differences in
their absolute accuracy (the unsigned difference between confidence and performance). There
were no significant differences in absolute accuracy between the two groups for either the
expository passage (Med = -1.74, P2.5 = -5.48, P97.5 = 1.86) or the persuasive passage (Med =
0.00, P2.5 = -4.05, P97.5 = 4.09).
For bias, slight differences between the groups began to emerge, particularly for the
persuasive passage. The difference between the two groups (i.e., acclimation and competence)
for the expository passage was 0.71. This means that a participant at the 50th percentile of the
competence group was slightly more overconfident compared to a participant at the 50th
percentile of the acclimation group, though this was not statistically significant (Med = 0.71, P2.5
= -5.28, P97.5 = 6.59). However, for the persuasive passage this difference was greater. In fact,
for our sample, a participant in the 50th percentile of the acclimation group was more
under-confident (by 5.13 points) than a participant at the 50th percentile of the competence group,
although this difference was not significant (Med = 5.13, P2.5 = -1.18, P97.5 = 11.55).
To our knowledge, this study was the first to investigate metacognitive monitoring and
control between expository and persuasive text. Previous evidence of persuasion on knowledge
and interest (Buehl et al., 2001) spurred us to investigate learners’ strategic processing with
persuasive text. In addition, it was important to us that measures of metacognitive monitoring
and control did not change or elicit different levels (quantity or quality) of participants’ mental
processing. Moreover, by sampling from participant pools which we hypothesized would have
varying levels of expertise (i.e., acclimation, competence, and proficiency) we were able to
examine these differences among participants of different familiarity with the domain.
Of the results presented above, the most surprising to us was the very limited use of the
help-seeking feature (i.e., the glossary) by the participants of all expertise levels in this
investigation. Given the active lines of research dealing with help seeking in the literature (e.g.,
Aleven & Koedinger, 2002), we expected participants to use help seeking in at least one of the
two passages. Two reasons may underlie the limited use of help seeking here. One, it may be a
reflection of the differences in task environment, and two, it may be a reflection of the
participants' motivation.
First, unlike Aleven and Koedinger’s work (which primarily deals with well-structured
tasks such as solving geometry problems), the task environment here was an ill-structured task,
comprehending text. Since participants were not required to find “an answer” to a problem, but
rather try to comprehend the passage, the participants may have been unaware that they needed
to seek help (a monitoring problem). Additionally, the accessibility of the help seeking feature
may make a difference in their probability of using the feature. For example, if the environment
(such as a cognitive tutor) prompts students with a help-seeking option, they may be more likely
to examine these features. In this investigation, participants were told the help-seeking feature
was available, but were not prompted during the task to use this feature.
Second, participants’ motivation may have played a role in the limited use of help
seeking within this study (a control problem). Participants may know they do not understand a
term, but lack the interest or need to comprehend the passage to actually seek help. This finding
is particularly helpful in attempts to structure environments (computerized and otherwise) that
encourage participants to monitor and control their mental processes. However, with these results
in mind, we caution that if participants are prompted to seek help, this does not mean that they
will be able or willing to seek help on their own. This is particularly salient in the literature
dealing with metacognition and self-regulated learning since a large percentage of studies use
some form of prompting (Dinsmore et al., 2008).
Textual Influences on Metacognitive Monitoring and Control
Rereading differed across conditions, as we found evidence that participants used
scrollbacks differently across the two passages, but that the participants did not necessarily use
scrollbacks more for the persuasive passage than for the expository passage. One possible
explanation relates to working memory demands; another is a limitation of the measure chosen
for this investigation. If, as Kellogg (2001) found, persuasive text
places greater demands on working memory than expository text, this may explain why some
participants reread more for the persuasive text and others reread more for the expository text. To
cope with these higher demands on working memory, one might need to use strategies such as
rereading. On the other hand, it is also
possible that the high demands on working memory make monitoring and control more costly,
causing one to reread less of the persuasive passage. We suspect that some of these issues will be
clarified as we examine rereading among the domain expertise groups. It would be our
contention that the prior knowledge and interest of the individual may help explain these findings
with regard to rereading.
An alternative explanation may involve the limitations of using scrollbacks to measure
rereading. We were only able to detect when participants scrolled back more than three lines.
Since we had collected think-aloud data for some individuals, an inspection of these transcripts
revealed that participants reported rereading more times than they had scrolled back (i.e., going
back one or two lines). We will continue to examine participants' strategic moves through text
with measures more fine-tuned than scrollbacks. For example, we are hoping that using
eye-tracking methodology will help us examine strategic moves through different types of text.
While absolute accuracy did not differ between passages as we expected, the difference
in bias (i.e., overconfidence and under-confidence) was significant. We can forward two possible
explanations for this finding. First, the difference in bias may relate to participants’ relative
familiarity in reading expository and persuasive passages. A large majority of the participants in
the study were university undergraduates who read mostly expository texts (i.e., textbooks) for
their classes. Familiarity with this type of text may increase their confidence to levels beyond
their actual performance. Conversely, their relative unfamiliarity with persuasive texts
(especially in the classroom environment) may make them less confident in their ability to
comprehend the text. As we collect more data for practicing attorneys, we hypothesize that their
familiarity with legal briefs (a type of persuasive text) may moderate their bias scores.
Overall, metacognitive experience and goals/activation of strategies were higher for the
persuasive passage, while metacognitive knowledge was higher for the expository passage. This
finding makes sense to us, since the expository passage was primarily a collection of declarative
facts (e.g., “Since Marbury v. Madison, about 150 federal laws have been struck down in whole
or in part, along with about 1000 state laws and more than 100 municipal ordinances.”). Making
connections to their prior knowledge (e.g., “I knew that”, “I didn’t know that”) was the main
metacognitive monitoring activity during this expository passage. In contrast, in the persuasive
passage participants had to evaluate both comprehension and agreement (e.g., “I don’t
understand that”, “I agree with that”) in order to analyze the arguments being presented in the
passage (e.g., “Gonzales, arguing against judicial activism, states that courts should be very
careful in taking the step of declaring that a law or agency action is unconstitutional.”).
Interestingly, there were more utterances of goals/activation of strategies in the persuasive
passage. This may indicate increased engagement with the text, especially since the participants
had to evaluate both comprehension and agreement more closely in the persuasive text than the
expository text. This finding supports the explanation that scrollbacks may not be
fine-grained enough to differentiate the activation of strategies in these two types of texts.
Effects of the Think-Aloud Protocol on Metacognitive Monitoring and Control
In line with previous studies (e.g., Veenman et al., 1993), we did not find significant
differences between the think-aloud and no think-aloud conditions in metacognitive monitoring
and control as measured by scrollbacks and calibration.
However, this does not mean that differences do not exist. It may be the case that our ability to
detect these differences was limited by our measures. For example, rereading was
operationalized as scrolling back through more than three lines of text. It is possible, as we stated
above, that participants looked back one or two lines more often in one of the conditions, but that
this difference was undetectable in our data.
Although we found no significant differences here, we are still unsure that the
think-aloud protocol has no impact on metacognitive monitoring and control, especially given the
evidence that this protocol significantly affects participant outcomes (Karahasanović, Hinkel,
Sjøberg, & Thomas, 2009). Since we have other data on participant outcomes in addition to the
multiple-choice items, we plan to investigate whether this is the case in this study as well.
Additionally, as Greatorex and Süto (2008) reported in their descriptive study, participants
reported varied experiences with the think-aloud protocol. An examination of Table 4 shows that
while the means for scrollbacks are similar for the two conditions, the standard deviation for the
think-aloud condition was higher. In fact, Box’s test (which tests the equality of the covariance
matrices between groups) was significant (F = 3.07, df = 3, 1559607, p < 0.05). This finding is in
line with what Greatorex and Süto (2008) found in their descriptive study. We can forward two
explanations for this difference in variance between the groups. The literature suggests that the
directions (specifically whether they chose to read out loud or not) may have primed some
participants to engage in certain behaviors, strategic and otherwise (Bannert & Mengelkamp,
Effect of Domain Expertise on Metacognitive Monitoring and Control
For the third question, a between-subjects effect of developmental level, we found
significant differences between the groups' rereading behavior, but not their calibration. This
question is one of central importance, as studies comparing metacognition at different levels of
expertise are limited in the contemporary literature (Dinsmore et al., 2008). First, we found that
the government and politics students reread more for the persuasive passage than the expository
passage. This clarifies the findings from the within-subjects effects of the passages above. In
fact, it was interesting that unlike the trend for all the participants together, the human
development undergraduates as a group reread more for the expository text than the
persuasive text. Not only were these participants likely more unfamiliar with persuasive text in
the classroom environment, they were as a group more unfamiliar with the topic. We contend
that they were probably able to engage with the expository text more easily because it required
less prior knowledge to comprehend. Conversely, it is likely that more prior knowledge would be
necessary to understand and engage with the arguments presented for and against judicial
activism, which would subsequently impact ease of comprehension.
We did not find significant differences for calibration, which is in line with previous
research that novices and experts do not necessarily differ in how well-calibrated they are to a
task (Lichtenstein & Fischhoff, 1980). Both groups (i.e., acclimation and competence) were
overconfident for the expository passage and under-confident on the persuasive passage. Overall,
these participants actually seemed to be fairly well calibrated. The median participant was only
miscalibrated by about 11 points. We were surprised that most participants, particularly the
participants in acclimation, were so accurate.
Considering metacognition through a developmental theory of expertise, such as the
MDL, is crucial. When assigning course texts, students' prior knowledge and interest should be
considered as these factors indicate competence within a domain and the subsequent ease with
which students can engage with particular types of text. As Fox et al. (2009) have cautioned,
undergraduates are not as apt to learn from assigned texts as instructors often assume. Difficulty
learning from text often stems from poor metacognitive monitoring and control (Wiley, Griffin,
& Thiede, 2005). In order to further investigate the findings reported in this study it may be
necessary to collect more data from practicing attorneys, so we can examine differences among
all levels of expertise, including those demonstrating proficiency.
Author Note
We would like to thank Emily Fox for her help in adapting the passages. We would also
like to thank the members of the Disciplined Reading and Learning Research Laboratory for
their helpful comments and feedback on this manuscript.
References
Aleven, V. A., & Koedinger, K. R. (2002). An effective metacognitive strategy: Learning by
doing and explaining with a computer-based Cognitive Tutor. Cognitive Science, 26, 147-179.
Aleven, V. A., McLaren, B., Roll, I., & Koedinger, K. R. (2006). Toward Meta-cognitive
Tutoring: A Model of Help Seeking with a Cognitive Tutor. International Journal of
Artificial Intelligence in Education, 16, 101-128.
Alexander, P. A. (1997). Mapping the multidimensional nature of domain learning: The interplay
of cognitive, motivational, and strategic forces. In M. L. Maehr & P. R. Pintrich (Eds.),
Advances in motivation and achievement (Vol. 10, pp. 213–250). Greenwich, CT: JAI Press.
Alexander, P. A., Murphy, P. K., & Kulikowich, J. M. (1998). What responses to
domain-specific analogy problems reveal about emerging competence: A new perspective on an
old acquaintance. Journal of Educational Psychology, 90, 397-406.
Allen, M. (1991). Meta-analysis comparing the persuasiveness of one-sided and two-sided
messages. Western Journal of Speech Communication, 55, 390-404.
Bannert, M., & Mengelkamp, C. (2008). Assessment of metacognitive skills by means of
instruction to think aloud and reflect when prompted. Does the verbalization affect
learning? Metacognition and Learning, 3, 39-58.
Bates, M. (1975). The lady lives on blood. In A. Ternes (Ed.), Ants, Indians, and little dinosaurs
(pp. 74-82). New York: Charles Scribner’s Sons.
Bernardi, R. A. (1994). Validating research results when Cronbach's alpha is below .70: A
methodological procedure. Educational and Psychological Measurement, 54, 766-775.
Bolick, C. (2007). A cheer for judicial activism. Retrieved January 21, 2008,
Buehl, M. M., Alexander, P. A., Murphy, P. K., & Sperl, C. T. (2001). Profiling persuasion: The
role of beliefs, knowledge, and interest in the processing of persuasive texts that vary by
argument structure. Journal of Literacy Research, 33, 269-301.
Carrell, P. L., & Connor, U. (1991). Reading and writing descriptive and persuasive texts. The
Modern Language Journal, 75, 314-324.
Dahl, M., Allwood, C. M., & Hagberg, B. (2009). The realism in older people's confidence
judgments of answers to general knowledge questions. Psychology and Aging, 24, 234-238.
de Bruin, A. B. H., Rikers, R. M. J. P., & Schmidt, H. G. (2007). Improving metacomprehension
accuracy and self-regulation in cognitive skill acquisition: The effect of learner expertise.
European Journal of Cognitive Psychology, 19, 671-688.
Dinsmore, D. L., Alexander, P. A., & Loughlin, S. M. (2008). Focusing the conceptual lens on
metacognition, self-regulation, and self-regulated learning. Educational Psychology
Review, 20, 391-409.
Dunlosky, J., Serra, M. J., Matvey, G., & Rawson, K. A. (2005). Second-order judgments about
judgments of learning. Journal of General Psychology, 132, 335-346.
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman &
Ericsson, K. A., & Simon, H. A. (1984). Protocol analysis: Verbal reports as data. Cambridge,
MA, US: The MIT Press.
Flavell, J. H. (1979). Metacognition and cognitive monitoring: A new area of cognitive–
developmental inquiry. American Psychologist, 34, 906–911.
Fox, E., Dinsmore, D. L., Maggioni, L., & Alexander, P. A. (2009, April). Factors associated
with undergraduates’ success in reading and learning from course texts. Paper presented
at the annual meeting of the American Educational Research Association, San Diego,
Gonzales, A. (2007). Speech to the American Enterprise Institute. Retrieved January 21, 2008,
Greatorex, J., & Süto, W. M. I. (2008). What do GCSE examiners think of 'thinking aloud'?
Findings from an exploratory study. Educational Research, 50, 319-331.
Johnson-Glenberg, M. C. (2005). Web-based training of metacognitive strategies for text
comprehension: Focus on poor comprehenders. Reading and Writing, 18, 755-786.
Kamalski, J., Sanders, T., & Lentz, L. (2008). Coherence marking, prior knowledge, and
comprehension of informative and persuasive texts: Sorting things out. Discourse
Processes, 45, 323-345.
Karahasanović, A., Hinkel, U. N., Sjøberg, D. I. K., & Thomas, R. (2009). Comparison of
feedback-collection and think-aloud methods in program comprehension studies.
Behaviour & Information Technology, 28, 139-164.
Kellogg, R. T. (2001). Competition for working memory among writing processes. American
Journal of Psychology, 114, 175-191.
Lichtenstein, S., & Fischhoff, B. (1980). Training for calibration. Organizational Behavior and
Human Performance, 26, 149-171.
Microsoft (2008). The judicial branch. Retrieved January 21, 2008,
Miller, P. H., Kessel, F. S., & Flavell, J. H. (1970). Thinking about people thinking about people
thinking about...: A study of social–cognitive development. Child Development, 41, 613–
Moos, D. C., & Azevedo, R. (2008). Self-regulated learning with hypermedia: The role of prior
domain knowledge. Contemporary Educational Psychology, 33, 270-298.
Murphy, P. K., Long, J. F., Holleran, T. A., & Esterly, E. (2003). Persuasion online or on paper:
A new take on an old issue. Learning and Instruction, 13, 511-532.
Nietfeld, J. L., Cao, L., & Osborne, J. W. (2005). Metacognitive monitoring accuracy and student
performance in the postsecondary classroom. Journal of Experimental Education, 74, 7-28.
Parkinson, M. M., & Dinsmore, D. L. (in preparation). Calibrating calibration.
Rhodes, M. G., & Castel, A. D. (2008). Metacognition and part-set cuing: Can interference
be predicted at retrieval? Memory & Cognition, 36, 1429-1438.
Shapiro, A. M. (2008). Hypermedia design as learner scaffolding. Educational Technology
Research and Development, 56, 29-44.
Thiede, K. W., & Dunlosky, J. (1994). Delaying students' metacognitive monitoring improves
their accuracy in predicting their recognition performance. Journal of Educational
Psychology, 86, 290-302.
Veenman, M. V. J., Elshout, J. J., & Groen, M. G. M. (1993). Thinking aloud: Does it affect
regulatory processes in learning? Tijdschrift voor Onderwijsresearch, 18, 322-330.
Wiley, J., Griffin, T. D., & Thiede, K. W. (2005). Putting the comprehension in
metacomprehension. Journal of General Psychology, 132, 408-428.
Williams, J. P., Stafford, K. B., Lauer, K. D., Hall, K. M., & Pollini, S. (2009). Embedding reading
comprehension training in content-area instruction. Journal of Educational Psychology,
101, 1-20.
Appendix A
Expository Passage
The Judicial Branch, the portion of the United States national government that decides
cases arising under federal laws and under the Constitution of the United States. The judicial
branch interprets laws that have been passed by the legislative branch (Congress) and approved
by the president of the United States, who leads the executive branch.
Article III of the Constitution vests the judicial power in “one supreme Court, and in such
inferior courts as the Congress may from time to time establish.” This means that apart from the
Supreme Court, the organization of the judicial branch is left in the hands of Congress.
Beginning with the Judiciary Act of 1789, Congress created several types of courts and other
judicial organizations, which now include lower courts, specialized courts, and administrative
offices to help run the judicial system.
Federal courts have a leading role in interpreting laws, rules, and other government
actions, and determining whether they conform to the Constitution. This function of judicial
review was asserted in 1803 by Chief Justice John Marshall in the case of Marbury v. Madison.
Judicial review includes both interpreting the law and judging cases. First, in Marshall’s words,
“it is emphatically the province and duty of the judicial department to say what the law is.” This
need to explain the law stems from the fact that the Constitution and many laws include vague
words or phrases. The ambiguity of the Constitution’s 14th Amendment, for example, makes it
one of the most important sources of cases argued before the Supreme Court. The amendment
guarantees citizens “due process of law” and “equal protection of the laws.” The meaning of
these phrases is unclear, leading to protracted court battles over the application of the 14th
Amendment to groups such as racial minorities, women, people with disabilities, and legal and
illegal aliens. Confusion and disagreement over the amendment have thrust the courts into
disputes over affirmative action, abortion, sexual preferences, welfare benefits, and the rights of
the disabled.
Striking down laws or practices that violate the Constitution is another function of
judicial review. Although the Court voided few laws during its first hundred years, it proved
much more willing to take such strong steps in the 20th century. Since Marbury v. Madison,
about 150 federal laws have been struck down in whole or in part, along with about 1000 state
laws and more than 100 municipal ordinances.
The courts do not always have the final say in settling issues of legal interpretation.
Working together, Congress and the states can compel the courts to accept a legal principle by
amending the Constitution. After the Supreme Court ruled that income taxes were
unconstitutional in Pollock v. Farmers’ Loan & Trust Co. in 1895, for example, Congress and the
states ratified the 16th Amendment in 1913 to permit such taxes. Amending the Constitution is
difficult and is usually time consuming, however.
The president and members of Congress have their own ideas of what the Constitution
permits, and on occasion they may try to impede or simply ignore the courts’ decisions.
The president of the United States appoints federal judges, but these appointments are subject to
approval by the Senate. Once confirmed by the Senate, federal judges have appointments for life
or until they choose to retire. Federal judges can be removed from their positions only if they are
convicted of impeachable offenses by the Senate, but this has happened on only a few occasions.
The life-long appointments of federal judges make it easier for the judiciary to stay removed
from political pressure. The long terms mean that presidential appointees to federal courts will
have an influence that lasts for decades, so the Senate closely scrutinizes many appointments,
and sometimes blocks them altogether.
The federal courts—which include district courts, courts of appeal, and the Supreme
Court—handle only a small part of the legal cases in the United States. Most cases involve state
and local laws, so they are tried in state and local courts rather than federal courts. Despite its
relatively narrow jurisdiction, the caseload of the federal court system usually increases every
year. To cope with the rapidly rising volume of work, Congress has repeatedly expanded the
number of lower federal courts and judges.
Most federal cases start out in the district courts, which are trial courts—courts that hear
testimony about the facts of a case. There are about 90 district courts, including one or more in
each state, one in the District of Columbia, one in Puerto Rico, and three territorial courts with
jurisdiction over Guam, the Virgin Islands of the United States, and other U.S. territories. Each
district is assigned from 2 to 28 judges, and there are about 650 district court judges in all. Each
year the district courts handle more than 250,000 civil cases and more than 45,000 criminal
cases, but only a tiny percentage of the civil and criminal cases actually go to trial.
After a district court hears the facts of a case and issues a decision, the decision can be appealed
to the second tier in the judicial branch, the courts of appeals. The appeals courts can consider
only questions of law and legal interpretation, and in nearly all cases must accept the lower
court’s factual findings. An appeals court cannot, for example, consider whether the physical
evidence in a case was enough to prove a person was guilty. Instead, the appeals court might
consider whether the district court followed appropriate rules in accepting evidence during the
The federal appeals courts system was created in 1891 to assist the Supreme Court with
its workload. About 50,000 such appeals are filed every year. For appeals purposes, the United
States is divided into 12 judicial areas called circuits, each with an appeals court containing from
6 to 28 judges. Every state, territory, and the District of Columbia belongs to an appeals circuit.
An additional appeals court, the Court of Appeals for the Federal Circuit, has nationwide
jurisdiction over major federal questions.
Decisions of the appeals courts are final, unless the U.S. Supreme Court agrees to hear a
further appeal. In district courts, most cases are heard by a single judge. In the appeals courts,
cases are usually heard by a panel of three or more judges. When all of the court’s panels of
judges sit together to hear a case the court is said to be sitting en banc.
The United States Supreme Court is the highest court of the country. It consists of nine
judges called justices, including a chief justice and eight associate justices. This number has
remained steady for decades and now seems fixed, although in the 19th century the Court’s size
Appendix B
Persuasive Passage
Judicial activism has always been a subject of argument, but is now getting more
attention, particularly due to recent court decisions, such as Hamdan v. Rumsfeld. In this case, a
federal court decided that the Executive Branch could not hold certain suspects without trial
indefinitely. Judicial activism is viewed by its critics, such as Alberto Gonzales, as “the judiciary
overstepping the bounds set by the Constitution.” On the other hand, supporters of a strong,
active judiciary, such as Clint Bolick, feel that recent cases in which judges have been described
as "activist" are actually examples of the judiciary upholding its constitutional role and
protecting the rights of individuals. Both sides base their supporting arguments on historical
grounds, on checks and balances of the Constitution, and on citizens’ rights.
Historical references are used as support both by critics and by those in favor of judicial
activism. Gonzales uses the writers of the U. S. Constitution to support his argument against
activism, saying that he does not believe those who wrote the Constitution ever intended that
judges or courts would take on the role of making policy. He refers to Alexander Hamilton's
statement in the Federalist Papers in which Hamilton says that the judicial branch of the
government will have the least power to endanger political rights because of the limited nature of
the functions assigned to it in the Constitution. Bolick uses similar but more compelling
historical references to make his case in favor of a stronger role for the courts. He argues that
judicial review, the power to invalidate unconstitutional laws, was essential to the type of
government established by our Constitution. He quotes James Madison, another writer of the
Constitution, who argued that one role of the judicial branch will be to guard our individual
rights from possible violation by the executive or legislative branches of government. For
example, courts have found that certain anti-abortion legislation made by states violates the 14th
amendment of the Constitution, which protects the "right to privacy." Therefore, many state laws
regarding abortion have been deemed unconstitutional by the courts and thrown out. So the
function of judicial review given to the courts by the Constitution actually gives them great
power as the guardian of the constitutional rights of every citizen.
The checks and balances of the three branches of government are also used to both
criticize and support an active role for the judiciary branch. The writers of the U. S. Constitution
envisioned three separate but equal branches of the federal government. The checks and balances
of the Constitution ensure that no one branch of government or person has too much power.
Gonzales, arguing against judicial activism, states that courts should be very careful in taking the
step of declaring that a law or agency action is unconstitutional. He says that lawmakers and
Executive Branch officials have sworn to uphold the Constitution, just as judges do. Courts that
too easily use the Constitution as a way to strike down the actions of the other branches may not
be allowing the legislature and the President to exercise their proper constitutional roles.
However, Bolick raises the counter-argument that the courts are well equipped to second-guess
lawmakers’ decisions that may be made too hastily or for the wrong reasons and that do not take
into account all of the possible Constitutional issues. If legislators carefully considered the merits
and constitutionality of legislation, then Gonzales's arguments might have merit. But our
legislators rarely even read the complex bills they pass, which all too often are written to please
outside interests, such as lobbyists who may have special interests or big business at heart.
Judges, by contrast, look carefully at the competing evidence presented by both sides, as they
should. If the courts did not check whether laws or decisions by the executive branch are actually
in line with the Constitution, they would not be carrying out their own constitutional role. This
would undo our checks and balances system and allow the legislative and executive branches to
have too much power.
Protection of citizens’ rights is another issue used both to criticize and to support an
active role for judges and the courts. Gonzales agrees that the courts must protect people from
situations where the wishes of the majority might go against an individual’s constitutional rights.
But he says that it is far more important to guard against the situation of having activist judges
who undermine the right of the people to govern themselves. We elect lawmakers and our
president and we have the right to expect that they will express the will of the majority – that is
their job. And if they do not, we have the power to select different representatives and a different
president in the next election. But when power is held by a few judges who are not elected and
who can overturn the actions of our elected officials, we face a far greater danger. Yet, in posing
this argument, Gonzales fails to take into account the other side of the problem, individual rights.
Bolick says that the situation of unelected judges overriding the strong and clearly expressed
wishes of a majority of the voters is extremely rare. A far greater problem is that judges do not
take enough care to protect individual rights. The courts are much more likely to presume that
laws and government actions are constitutional, making it much harder for individuals to prove
that their rights have been violated. Even worse, courts have decided that the Constitution does
not protect some very important individual rights against the interference of the government,
including some related to the protections and privileges that go with being a citizen. So not only
are courts ignoring legislation that is unconstitutional, they are interpreting the Constitution in a
way that lets the government override the rights of individual citizens.
Gonzales concludes that if the people have decided they favor your policy goals at the
ballot box, then you get a chance to set policy and make laws. He says that the party that controls
Congress and has the votes to enact laws supporting their policies should be free to do so without
contradiction from activist judges who disagree with those laws on political grounds. Bolick
shifts the argument away from the narrow issue of politics. He argues instead that the importance
of judicial activism revolves around the minority rights that are the essential element of the
Constitution and our democracy. He says that a court gavel can be David's hammer against the
Goliath of big government. Among our governmental institutions, courts alone are designed to
protect the individual against the power of the majority, and against special interest groups with
too much influence. We all have a stake in seeing that the judiciary does protect us, for as
government expands with new demands, such as Homeland Security, our freedom depends on
the willingness of courts to keep the government in line. For better or worse, the courts are the
last line of defense against the government running roughshod over individual liberties. When
judges swear allegiance to the Constitution, they must be aware of the danger of going beyond
the proper bounds of their judicial power, but even more so of the greater danger of not using it
Appendix C
Protocol for Think-Aloud Condition
Instructions for Think-aloud Protocol
"In this investigation, we are interested in what you think and do while you read a text. What we
want you to do is say what you are thinking and doing out loud. You can decide for yourself
whether you would like to read the text silently or out loud, or do some of both. Do whatever
feels most natural to you. We are only interested in what you are thinking or doing as you read.
For example, if you are going back to reread, please say that's what you are doing. If something
in the text reminds you of prior experiences or things you already know, let us know. If you are
thinking that you don't understand something, please say that, too. There are no right or wrong
things to say here, just whatever is going through your head as you read. If you are quiet for a
period of time, I'll ask you to say what you're thinking. Do you have any questions?"
Instructions for Practice Passage
"So that you can get comfortable with thinking aloud while you read, I'm going to give you a
practice passage to read first. This is just a practice, and I won't be recording what you say. You
can take your time and get used to how it feels. So, what I want you to do now is read the
passage and say what you're thinking and doing out loud."
Table 1
Codes Used for Think-Aloud Transcripts
Code: Metacognitive Knowledge (MK)
Definition: Knowledge or beliefs that affect the course of mental operations about a person, task, or strategy.
Examples:
    "Wow, I never knew that."
    "Judicial activism, I'm pretty sure I know what that is."
Code: Metacognitive Experience (ME)
Definition: Cognitive or affective experiences that pertain to a mental operation.
Examples:
    "I'm being distracted by noise
    "Ok, I didn't understand that part."
Code: Goals and Activation of Strategies (G/AS)
Definition: Realizing through a metacognitive experience and planning to evoke a strategy and evidence of those strategies.
Examples:
    "I'll just start that paragraph over."
    "I'm going back, to re-read
Table 2
Descriptive Statistics of Metacognitive Monitoring and Control Across Think-Aloud
Conditions and Across Passages
Mean (SD)
1.76 (2.04)
0.13 (0.47)
22.89 (13.50)
-8.57 (15.26)
Table 3
Descriptive Statistics of the Three Participant Groups on Prior Knowledge and Interest in
the Judicial Review Process
Prior Knowledge Mean (SD)    Topic Interest Mean (SD)
41.65 (7.00)                 45.70 (16.21)
48.11 (6.53)                 60.16 (17.52)
54.50 (2.08)                 75.03 (18.63)
Note. HDU = human development undergraduates, GPU = government and politics
undergraduates, PA = practicing attorneys
Table 4
Descriptive Statistics of Metacognitive Monitoring and Control Between Think-Aloud
Conditions and Across Passages
Think aloud Mean (SD)    No think aloud Mean (SD)
2.03 (2.40)              1.53 (1.65)
0.56 (0.33)              0.13 (0.56)
23.36 (11.73)            22.47 (15.06)
-8.57 (18.25)            -8.58 (12.20)
Figure 1
Median Number of Scrollbacks by Passage
Number of Scrollbacks
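The study reports analyzing its data via non-parametric bootstrapping. As a rough illustration only, the sketch below shows one way a percentile bootstrap interval for the median difference in scrollbacks between the two passages could be computed. It assumes paired counts (each participant read both passages); the function name, the number of resamples, the 95% level, and the example counts are hypothetical and are not taken from the study.

```python
import random
from statistics import median


def bootstrap_median_diff(expository, persuasive, n_resamples=2000,
                          alpha=0.05, seed=0):
    """Percentile bootstrap interval for the median paired difference.

    Differences are persuasive minus expository for each participant.
    All defaults are illustrative assumptions, not values from the study.
    """
    if len(expository) != len(persuasive) or not expository:
        raise ValueError("need two non-empty sequences of equal length")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = [p - e for e, p in zip(expository, persuasive)]
    n = len(diffs)
    # Resample participants with replacement and record each resample's median.
    stats = sorted(median(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return median(diffs), lower, upper


if __name__ == "__main__":
    # Hypothetical scrollback counts for five participants.
    print(bootstrap_median_diff([0, 2, 1, 4, 0], [3, 2, 5, 4, 1]))
```

An interval that excludes zero would suggest a difference between passages under these assumptions. Resampling paired differences rather than the two passages separately keeps each participant's two counts together.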
Figure 2
Median Calibration Scores for Absolute Accuracy and Bias Between the Expository and
Persuasive Passages
Calibration (Confidence-Performance)
Absolute Accuracy
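The paper defines calibration as the difference between confidence and performance. The sketch below illustrates how the two indices named in Figure 2 are commonly derived from that difference: bias as the mean signed difference and absolute accuracy as the mean absolute difference. It assumes item-level confidence and performance on a shared 0-100 scale; the scale, the function name, and the example values are assumptions for illustration, not the study's actual scoring procedure.

```python
from statistics import mean


def calibration_scores(confidence, performance):
    """Return (bias, absolute_accuracy) for paired item-level scores.

    bias: mean of (confidence - performance); positive values indicate
        overconfidence, negative values underconfidence.
    absolute_accuracy: mean of |confidence - performance|; values closer
        to zero indicate better calibration.
    """
    if len(confidence) != len(performance) or not confidence:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [c - p for c, p in zip(confidence, performance)]
    bias = mean(diffs)
    absolute_accuracy = mean(abs(d) for d in diffs)
    return bias, absolute_accuracy


if __name__ == "__main__":
    # Hypothetical example: two items, confidence ratings vs. scored answers.
    print(calibration_scores([80, 60], [100, 0]))
```

Because signed errors can cancel, a bias near zero does not by itself imply good absolute accuracy, which is one reason the two indices are reported separately.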
Figure 3
Differences in Think-Aloud Utterances for the Expository and Persuasive Passages
Median Number of Utterances
Note: MK = Metacognitive Knowledge, ME = Metacognitive Experience, G/AS =
Goals/Activation of Strategies
Figure 4
Absolute and Signed Difference in Number of Scrollbacks Between the Expository and
Persuasive Passages Among Think-Aloud and No Think-Aloud Conditions
Difference in Scrollbacks
Think Aloud
No Think Aloud
Figure 5
Absolute Accuracy and Bias of the Expository and Persuasive Passages Among the Think-Aloud
and No Think-Aloud Conditions
Calibration (Confidence-Performance)
Think Aloud
No Think Aloud
Bias (Expository) Bias (Persuasive)
Figure 6
Differences in Prior Knowledge and Topic Interest Among the Three Participant Groups
Average Score
Prior Knowledge
Topic Interest
Note. HD = Human Development Undergraduates, GP = Government and Politics
Undergraduates, PA = Practicing Attorneys
Figure 7
Absolute and Signed Difference in Number of Scrollbacks Between the Expository and
Persuasive Passages Among Human Development and Government and Politics Undergraduates
Difference in Scrollbacks
Note. HD = Human Development Undergraduates, GP = Government and Politics
Figure 8
Absolute Accuracy and Bias of the Expository and Persuasive Passages Among Human
Development and Government and Politics Undergraduates
Calibration (Confidence-Performance)
Absolute Accuracy Absolute Accuracy Bias (Expository)
Bias (Persuasive)
Note. HD = Human Development Undergraduates, GP = Government and Politics