Running Head: METACOGNITION AND WORD LEARNING

“What Did I Learn?” and “How Did I Do?” The Relation between Metacognition and Word
Learning
Meghan M. Parkinson
University of Maryland
Abstract
Undergraduates’ metacognitive processes during word learning are a crucial component of
building representations of key concepts from text. Metacognitive monitoring was measured
through self-report of judgments of learning and confidence ratings (N = 60). Accuracy and bias
scores were calculated to determine students’ ability to calibrate their word learning. Results
indicated that undergraduates were poorly calibrated and overconfident about their word
knowledge. This supports evidence from previous studies on calibration of declarative
knowledge and reading comprehension. Judgment accuracy was a significant predictor of gains
in word knowledge from pretest to posttest. These findings warrant consideration of
metacognitive monitoring with regard to widely cited processes of word learning.
Alexander (2005) proposed a lifespan development approach to reading, drawing on a
body of research supporting the Model of Domain Learning. Such an approach moves away from
the idea that emerging reading is the only critical period of learners’ development. In fact,
undergraduate students can still be in what Alexander called the acclimation phase of reading
development if they have little prior topic knowledge (e.g., knowledge of psychology) and only
situational interest (e.g., the instructor makes psychology interesting). Further, it is assumed that
undergraduates are able to recognize authors’ differing intents across texts, and even instructors’
purpose in assigning particular texts, although this assumption is not supported by think-aloud
data from undergraduates (Fox, Dinsmore, Maggioni, & Alexander, 2008).
As undergraduates gain topic knowledge and interest they are more likely to use
strategies effectively and eventually may move towards deep-processing strategies and
principled knowledge. This perspective on adult readers’ developing competence draws attention
to several key aspects. First, reading is itself a domain and knowledge of language and discourse
influences development of competence. Readers must comprehend a text at two different levels:
the textbase and a situation model based on inferences made from the text and the additive
quality of those inferences with prior topic knowledge (Kintsch, 1994). In order to comprehend
the textbase upon which the situation model operates, readers must draw upon conditional and
procedural knowledge for decoding text and building proposition models.
Key to comprehending the textbase is the ability to understand most of the individual
words within the text. If a word is unfamiliar, readers draw upon their domain knowledge of
reading in order to use strategies such as morphological analysis and using the context to derive
meaning. The Model of Domain Learning suggests that readers’ metacognitive monitoring
improves, as part of general improvement in strategic processing, when learners gain competence
in a given domain. In the case of reading, that domain competence is compounded with gains in
principled knowledge and deep-processing in a content-area domain. For example,
undergraduate psychology majors must be scaffolded in order to comprehend discourse
characteristic of the field. They take courses introducing statistics and research methods as well
as core findings from existing literature in the field. Reading an empirical study for the first time
is quite challenging for an undergraduate unfamiliar with the conventions of the field. However,
gaining competence in reading empirical studies eventually leads to gains in principled
knowledge of psychology, which in turn increase knowledge of reading within the domain of
psychology.
It has been well documented that development in academic domains is predicated on
students’ ability to acquire a base of conceptual knowledge and that individuals’ vocabulary is an
effective indicator of that knowledge base (Alexander, Murphy, Woods, Duhon, & Parker,
1997). It has also been well documented that students’ metacognitive awareness (i.e., knowledge
of self as a learner and thinker) is significantly related to academic development (Veenman,
Elshout, & Meijer, 1997). Yet, what is less well understood is the degree to which metacognitive
monitoring predicts word learning. Word learning is a necessary component of successful
reading comprehension (Davis, 1944; Stahl, 1999). Although metacognitive monitoring has been
studied in relation to declarative knowledge (Nietfeld & Schraw, 2002) and reading
comprehension (Dunlosky, Rawson, & Middleton, 2005), it has rarely been studied in relation to
word learning. The purpose of this study is to investigate that relation between metacognition
and word learning for competent readers.
Metacognition
Flavell (1979) described four kinds of metacognitive occurrences: metacognitive
knowledge, metacognitive experiences, goals, and actions. Metacognitive knowledge includes
beliefs about self as a learner. Metacognitive experiences include thoughts and feelings that
coincide with cognitive tasks. Metacognitive goals refer to the global and specific objectives of
cognitive tasks. Finally, metacognitive actions include strategies utilized to achieve those
specified goals. Several researchers have categorized metacognitive experiences, goals, and
actions under the umbrella of regulation of cognition (Baker & Brown, 1984; Schraw &
Moshman, 1995). The metacognitive experience of monitoring is one type of regulatory process.
Monitoring is situation-specific (Baker & Brown, 1984). For example, readers may fail to
monitor their comprehension of a text if they are uninterested in the topic, or too tired to
concentrate. Further, adults tend to be quite inaccurate in their monitoring of cognitive tasks
(Glenberg & Epstein, 1985), but training can improve monitoring accuracy and subsequently
performance on outcome tasks (Lichtenstein & Fischhoff, 1980; Nietfeld & Schraw, 2002).
Judgments of learning. A specific type of metacognitive monitoring is judgments of
learning (JOLs), or self-evaluations made after a learning task to determine how well information
will be remembered (Koriat, 1997). Numerous studies have demonstrated that JOLs impact
performance through improved strategy selection and use (e.g., Thiede, Anderson, & Therriault,
2003). Remaining questions in the literature on JOLs are a) what information individuals use to
make their JOLs; and b) how to improve accuracy of JOLs.
One view of the information used to make JOLs is called the accessibility hypothesis.
From this perspective, individuals consider the amount of information accessed from memory in
order to make a JOL. Additionally, the quality of that accessed information is typically
unevaluated, and so individuals base their judgments on quantity of information (Dunlosky,
Rawson, & Middleton, 2005). Some suggestions from the literature on how to improve the
accuracy of JOLs include delaying judgments after a task (Thiede, Anderson, & Therriault,
2003), strategy training (Nietfeld & Schraw, 2002), and explicitly asking participants to access
specific information before making JOLs (Dunlosky et al., 2005).
Calibration. Individuals ask themselves, “What did I learn?” in order to make a JOL, but
they ask themselves, “How well did I learn?” in order to rate their confidence. Determining the
accuracy of JOLs is an example of monitoring accuracy, or the degree to which confidence
judgments match performance (Nietfeld & Schraw, 2002). Another name for this is calibration.
Well-calibrated individuals are aware of what they do and do not know, and thus they can adjust their
strategies appropriately (Glenberg & Epstein, 1985; Thiede, Anderson, & Therriault, 2003).
Calibration is operationalized as either absolute accuracy or relative accuracy. Absolute
accuracy is individuals’ match between performance and confidence ratings averaged across
items (Nietfeld & Schraw, 2002). Relative accuracy is the item-by-item match between
performance and confidence ratings (Dunlosky, Rawson, & Middleton, 2005). Thus, researchers
studying individual differences in calibration to a cognitive task would calculate absolute
accuracy. Researchers studying specific judgments for particular items would calculate relative
accuracy. Further, to determine individuals’ tendency to be overconfident or underconfident, bias
must be calculated (Nietfeld & Schraw, 2002). Formulas for these constructs will be provided in
the results section of the paper.
Extensive research has focused on the role of metacognitive monitoring in reading
comprehension (Dunlosky et al., 2005; Glenberg & Epstein, 1985). What has been
overwhelmingly neglected is empirical evidence for the role of metacognitive monitoring in word
learning during reading. This research gap is puzzling, as vocabulary knowledge has been
consistently found to have a positive influence on reading comprehension (Baumann, 2004).
Given Dunlosky et al.’s (2005) findings of differences in JOLs based on grain-size (global vs.
term-specific), it seems likely that metacognitive monitoring is related but distinct for reading
comprehension and word learning from context. Before such a contrast can be studied, empirical
evidence must be provided for the relation of metacognitive monitoring and word learning.
Word Learning from Context
Incidental word learning. Jenkins, Stein, and Wysocki (1984) defined incidental word
learning as the ability to derive and retain new word information without explicit direction.
Because it has been hypothesized that only 10% of new word meanings are learned through
direct instruction (Nagy, Anderson, & Herman, 1987), and because reading accounts for such a
large portion of individuals’ learning of new words, it is crucial to examine the metacognitive
processes that lead readers to recognize the presence of unknown or partially known words in
text and to take appropriate cognitive action. Noticing a gap in linguistic knowledge and the
concomitant need to locate or infer meaning for unknown words requires metacognitive
monitoring, and locating or constructing meanings for those words requires regulation of
cognition.
Daalen-Kapteijns, Elshout-Mohr, and de Glopper (2001) described orientations to word
learning while reading for comprehension. They describe text-oriented activities as those in
which a reader engages to understand the main idea of the text. From this orientation, readers
would only derive meaning for unknown words, through strategies such as substitution and
checking, if it was necessary to sustain the flow of reading comprehension. On the other hand,
word-oriented activities are those concerned with using context to determine the meaning of
unknown words. This would lead readers to a context-specific representation of word meaning
that may or may not support future encounters with the same word in different contexts. Finally,
Daalen-Kapteijns et al. describe vocabulary-knowledge-oriented activities. These are driven by
the goal of increasing vocabulary knowledge and encoding new features to one’s mental lexicon.
Readers purposefully decontextualize derived aspects of word meaning in order to associate them
with what they already know about similar words or morphological parts.
It is the encoding entailed in the last type of orientation that was the basis for Sternberg
and Powell’s (1983) theory of learning word meaning from context. More recently, researchers
(e.g., Bolger, Balass, Landen, & Perfetti, 2008) have posited an instance-based learning approach
to deriving word meanings. From this perspective, encounters with words provide information
about one or more features, and the context of the encounter is encoded along with those
features. This information shapes subsequent encounters with the same word. Over several
encounters enough information accrues, and associations become strengthened enough to
abstract certain core features that constitute a decontextualized understanding of the word’s
meaning. Although the instance-based learning approach describes the process for learning new
word meanings, it does not indicate the mechanism driving the process. As Daalen-Kapteijns et
al. (2001) suggest, not every reader will engage in the encoding of word features the same way.
The current study suggests calibration as a metacognitive mechanism to predict word learning.
After searching for clues, readers infer word meaning with the use of context. They make
a guess about what meaning best fits the sentence. Sternberg and Powell (1983) named this
process selective encoding, or finding relevant information from the context in order to
determine word meaning. Next, readers must use a combination of clues presented in context in
order to make an appropriate guess about an unknown word’s meaning. Readers also compare
clues from context to prior knowledge about the topic or situation. Sternberg and Powell refer to
this as selective comparison. Incidental word learning is only effective if learners accurately
monitor their selective encoding and selective comparison.
Measuring word learning. Much of our current understanding of word learning processes
has been accumulated from studies utilizing artificially constructed texts (Swanborn & de
Glopper, 1999). Although these experimental manipulations illuminate specific aspects of word
learning, they have poor generalizability to typically encountered opportunities for incidental
word learning. Adults continue to add one to two new word meanings to their vocabulary every
day, and rarely do so through explicit instruction (Goulden, Nation, & Read, 1990).
There is a need for studying adult readers’ incidental word learning from naturally
occurring texts. One of the major challenges to this approach is measuring word learning in a
way that is sensitive to partial word knowledge, as readers may only encode broad features of
meaning if they engage in text-oriented activities (Bolger et al., 2008; Daalen-Kapteijns et al.,
2001). According to Durso and Shore (1991) there are three levels of words: unknown words,
frontier words, and known words. Readers are unable to distinguish unknown words from made-up
words. Frontier words are words that readers recognize as real because they have encountered
them before, and that readers can generally place in the correct general context, even without
knowing the word's meaning. Known words are those that readers can define and understand
within multiple contexts.
Since a word-knowledge measure must be sensitive to partial word knowledge, it is
important to consider whether to use a multiple-choice or constructed answer format to capture
maximum variability. In the case of multiple-choice questions, the distractors are extremely
important as they constrain word features to varying degrees. Standardized tests of vocabulary
typically use this approach. Distractors should be of similar difficulty (i.e., all low-frequency
words) and they should follow a graduated response model (i.e., one choice should be the same part
of speech, another should be the same semantic category, another should include one correct feature,
and another should be the multifaceted definition). Additionally, the stems for multiple-choice
questions should be controlled for contextual support (Anderson & Freebody, 1981).
Given the difficulty of meeting all those requirements, and the constraints on variability
posed by clues from the stem or the context of the study, the constructed response method was
chosen for the current study. When participants are given the opportunity to generate a
definition, they can demonstrate as little or much as they know about a particular word. It is then
the responsibility of researchers to create a scoring system that adheres to a scheme capable of
capturing partial word knowledge without rewarding answers that are only distantly related.
Context effects are also irrelevant in this method of testing because target words are presented in
isolation.
The way in which word learning is measured has implications for the results of the study.
The current study seeks to measure word learning from incidental exposure to words in context.
Therefore, passages were selected from texts of typical difficulty and similar style to those read
by undergraduates. Further, attention was not called to the target words, nor was direction given
as to the need to later generate a definition for the target words. This design ensures that the
study lends empirical support to adults’ ability to encode meaning features incidentally from text.
A constructed response pretest/posttest design was utilized to capture as much variance in partial
word knowledge as possible. It was also designed to decrease reliance on participants’ use of
synonyms for definitions, as some low-frequency words do not have an easier synonym. Finally,
measuring metacognitive monitoring throughout the tasks was a crucial aspect of the current
study, as metacognition has rarely been studied with regard to word learning.
Purpose
The purpose of the current study is to address several gaps in the literature on word
learning. First, research has primarily focused on children’s developmental gains in competence.
As Alexander’s (2005) lifespan development perspective of reading suggests, undergraduates are
still gaining knowledge, interest, and competence reading within particular domains. Thus, it is
imperative to examine this cross-section of readers’ lifelong development in order to
appropriately scaffold required reading and consequently improve learning for students.
Additionally, the progression from novice to expert within a domain such as psychology
reciprocally interacts with metacognitive monitoring (Alexander et al., 1997).
Given this landscape, it seems prudent to study metacognition with regard to word
learning, as word learning is crucial to the development of principled knowledge within a
domain. Monitoring is especially promising as a mechanism for change in the hypothesized
views of word learning (Bolger et al., 2008; Sternberg & Powell, 1983). Since previous findings
have shown that judgments of learning differ at the passage and word level (Dunlosky et al.,
2005) it is necessary to determine if that finding is replicable across different kinds of discourse.
Finally, more evidence is needed for incidental word learning so that future studies can
distinguish between the processes and products of incidental vs. intentional (instructed) word
learning. Before such a comparison can be made, adults’ ability, or lack thereof, to derive meaning
incidentally from context must be established. The current study aims to provide evidence for
incidental word learning and for the influence of metacognitive monitoring on word learning.
The research design offers several unique strengths in filling these gaps in the literature, such as
naturally occurring texts, opportunities to self-report monitoring from fine-grained to global, and
open-ended pretest and posttest items. The following research questions are under investigation.
Do undergraduates’ JOLs and calibration as measures of metacognitive monitoring predict their
gains in word knowledge? JOLs and calibration should uniquely predict gains in word
knowledge because they differ in grain size and are elicited either after reading passages or after completing
posttest items.
Do JOLs and calibration across words predict changes in word knowledge? The reason
for this question is to determine the contribution of relative accuracy to word learning. The
previous question addressed absolute accuracy. Relative accuracy provides information on each
item, rather than averaging across items. It is expected that relative accuracy will predict change
in word knowledge, but that JOLs will not because they entail metacognitive monitoring at a
more global level (whole passage and whole group of words).
Do contextual factors (i.e., text difficulty and part of speech) contribute to differences in
word learning, over and above indicators of metacognitive monitoring (i.e., JOLs and
calibration)? Word and text factors are expected to have a mediating effect on metacognitive
monitoring because task difficulty may decrease the accuracy of metacognitive monitoring,
thereby decreasing its influence on word learning.
Method
Participants
Ninety-six undergraduates participated in the study, but data were only analyzed from 60
of the participants. There were several reasons for removing participants from the data analysis.
First, a large number of participants completed the first session but were absent from class for, or
could not complete, the second session. Second, several participants failed to complete a whole
section or measure. Third, a few participants were removed because they indicated that they were
non-native English speakers on their demographics form.
The students were enrolled in either a human development class or an education class at
a large, public university in the mid-Atlantic region of the United States. Students were primarily
juniors (65%) and had an average age of 21.1 years. Eighteen male and 42 female students
participated, and were predominantly Caucasian (58.3%).
Measures
Woodcock-Johnson III Diagnostic Reading Battery. Participants completed the reading
comprehension and vocabulary subscales from the Woodcock-Johnson III Diagnostic Reading
Battery. The W-J III DRB reading comprehension subscale is a series of cloze tasks, where
students must fill in the blank with the appropriate word for each sentence. The W-J III DRB
vocabulary subscales are a series of association tasks where a word is presented and participants
are directed to provide a synonym for the synonyms subscale, an antonym for the antonyms
subscale, and the appropriate word for the analogies subscale. These measures provided
information about participants’ general level of reading skill, specifically reading comprehension
and vocabulary knowledge. Cronbach’s alpha was .65 for this sample of undergraduates.
Word-knowledge pretest. To assess participants' prior knowledge of the target words, the
author created a word knowledge pretest for the study. The word-knowledge pretest consists of a
list of 60 words (Appendix A). Thirty target words were chosen from the text passages
administered in session two. These words are low frequency words, those that occur less than ten
times per 5 million words of running text, as determined by Carroll, Davies, and Richman's
(1971) The American Heritage Word Frequency Book. Example target words are dispelled and
dilapidation. Target words were chosen with consideration for part of speech. Previous work has
found that it is easier to derive meaning for nouns than for other parts of speech (Brown, 1957).
For this reason, the current study sought to balance the number of nouns and non-nouns to
analyze differences in both word learning and calibration based on part of speech.
Ten more words were chosen from text surrounding the passages. Example filler words
are admonish and arbitrary. The purpose of the filler words was to prevent participants from
focusing on target words that they would see again in session two. Finally, ten pseudowords
from a previous study (Schwanenflugel, Stahl, & McFalls, 1997) were added to the word-knowledge pretest. Pseudowords follow English language rules for orthography, but have no
meaning. Example pseudowords are devernal and edarthic.
The directions given to participants were, "Write a definition or short description for
every word that you can on the list. Please make your definitions as clear as possible so that I
know that you understand the meaning of the word. I am not interested in the number of words
that you know, so just do your best." After participants completed this first phase of the pretest,
directions indicated, "Go through the list again and place a check mark beside any word that you
left blank if you have seen it before or if it is familiar to you, even if you are not quite sure what
it means." The purpose for this set of instructions was to gain information about partial word
knowledge participants may have for target words. The pseudowords forced participants to
discriminate between words they may have previously encountered, and therefore know some
semantic feature of, and words that they have never encountered and that have no meaning.
Responses to the word-knowledge pretest were scored on a scale of 0 to 3. A score of
three was given to direct definitions or synonyms, as determined by the dictionary and thesaurus.
A score of two was given for indirect synonyms, and a score of one was given for some correct
feature of word meaning. On the pretest, a score of one was also given to any target words with a
check mark. A zero was given for incorrect answers. The author coded all responses to target
words, and two additional raters each scored one-third of the target word responses. A
calculation of Cohen's Kappa index of interrater reliability revealed 85% interrater reliability.
This calculation is corrected for chance agreements, and is therefore a conservative estimate
(Cohen, 1968). Additionally, the word knowledge pretest was found to be positively correlated
with the vocabulary subscale from the W-J III DRB, r = .48, p < .01. This provides evidence for
the validity of the word-knowledge pretest as a measure of existing word knowledge.
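The chance-corrected agreement index mentioned above can be illustrated with a minimal Python sketch. It computes an unweighted Cohen's kappa for two raters who scored the same responses on the 0 to 3 scale. The function name and the example scores are hypothetical and are not the study's data, and because the text does not say whether a weighted variant was used, the unweighted form is an assumption.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty score lists")
    n = len(rater_a)
    # Proportion of responses on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal score counts.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical 0-3 scores for eight responses (illustration only).
scores_author = [3, 2, 0, 1, 3, 0, 2, 1]
scores_rater = [3, 2, 0, 1, 2, 0, 2, 1]
kappa = cohens_kappa(scores_author, scores_rater)
```

In this made-up example the raters agree on seven of eight responses, and kappa is lower than the raw agreement rate because a quarter of the agreements would be expected by chance alone.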
Narrative passages. Participants read six counterbalanced narrative passages, each
approximately 250 words in length, to present the target words in typically encountered context
(Appendix B). The passages were taken from two sources, The Tales of Edgar Allan Poe (2004)
and The Complete Works of Washington Irving (1978). These books were selected as sources
because the narratives were written by famous American male authors of roughly the same period.
Based on text readability, a typically performing college sophomore could comprehend about
75% of text written by Washington Irving with ease, and 95% of the text written by Edgar Allan
Poe. Text readability, often referred to as text difficulty, was determined by the Lexile
Framework for Reading (2004). Lexiles are based on semantic difficulty (word frequency) and
syntactic complexity (sentence length).
Existing narrative texts were utilized in the current study in order to increase
generalizability. Empirical work on word learning has chiefly used artificially constructed texts
and tasks in order to create experimental manipulations (Durso & Shore, 1991; Fukkink, 2005;
McKeown, 1985). By manipulating text, researchers change the characteristics of target words,
contextual support, and text difficulty. Changing these factors does not simulate word learning
opportunities in typically encountered texts. Thus, the current study sought to study word
learning in a manner which reflects a task undergraduates are likely to encounter over the course
of typical reading.
The texts were specifically chosen as domain general to avoid the confounding of prior
topic knowledge with prior word knowledge. The focus of the current study is solely on the
domain of reading. Once more is known about knowledge, interest, and strategic processing of
word learning in this domain, the layer of content-domain knowledge, interest, and strategic
processing can be added.
Judgment of learning scales. Each passage was followed with two judgment of learning
scales (Appendix B). The first question asked, "How confident are you in your understanding of
the passage's overall meaning?" The second question asked, "How confident are you in your
understanding of the individual word meanings from the passage?" Participants responded by
marking a slash on a 100-mm line with 0% at one end and 100% at the other end. The value in
using continuous rating scales rather than categorical scales has been demonstrated in the
literature (Albaum, Best, & Hawkins, 1981; Schraw, Potenza, & Nebelsick-Gullet, 1993) and
was deemed the best way to capture individual differences in self-report of judgments of
learning. Cronbach’s alpha was .87 for the passage JOL scales and .83 for the word meaning
JOL scales.
Word-knowledge posttest. The word-knowledge posttest was similar to the word-knowledge pretest, with a shorter format and slightly different directions. Specifically, the
posttest consisted of only the target words, and not the filler words and pseudowords.
Participants were instructed to, "Write a definition or short description for each word. Please
make your definitions as clear as possible so that I know you understand the meaning of the
word. If you are unsure of a word's meaning, write your best guess." Responses were scored on
the same 0 to 3 scale as the pretest and the interrater reliability reported earlier includes scoring
on posttest responses. The word-knowledge posttest was found to be significantly correlated with
the word-knowledge pretest, r = .67, p < .01.
Confidence scales. A confidence scale followed each word on the posttest. The directions
demonstrated the calibration question as well as how to mark the 100-mm line. The question
asked, "How confident are you in the accuracy of your response?" Participants generated a
definition, or best guess description for each target word, and then evaluated the accuracy of
their response from 0% to 100% on the confidence scale. Reliability was .92 for the confidence
scale.
Procedure
Measures for the first session were group administered during class time. Participants
completed the reading comprehension and vocabulary subscales of the W-J III DRB and the word-knowledge pretest. These measures were counterbalanced across participants and took
approximately 35 minutes to complete. Demographic information was also collected at this time.
One week later participants were administered session two measures. Allowing one
week to elapse between sessions made it likely that participants had forgotten specific words on the
word-knowledge pretest, which should contribute to the validity of word-knowledge posttest data.
Second session measures included the contextual passages, judgment of learning scales, and the
word-knowledge posttest with calibration scales. The passages were counterbalanced.
Results
Do undergraduates’ JOLs and calibration as measures of metacognitive monitoring predict their
gains in word knowledge?
Word knowledge. Participants performed as expected for age-level and grade-level on the
reading comprehension (M = 37.18, SD = 2.91) and vocabulary knowledge (M = 48.10, SD =
3.28) subscales of the W-J III DRB. Prior knowledge for the specific words used in the study, as
measured by the word-knowledge pretest (M = 16.87, SD = 8.16) was quite variable. This
highlights the importance of testing for individual differences in word knowledge when
conducting studies of word learning. Participants also showed variable performance on the word-knowledge posttest (M = 15.67, SD = 10.23). Their mean difference scores (posttest mean –
pretest mean) were negative (M = -.28, SD = .39), a finding that will be interpreted in the
discussion section (Table 1).
Word-knowledge mean difference scores negatively correlated with bias, r = -.28, p < .05
(Table 2). This suggests that the greater the change in word knowledge, the less confident
participants were in their knowledge.
Judgments of learning. Participants gave higher ratings for the passage JOL (M = 76.72,
SD = 14.00) than for word meaning JOL scales (M = 70.96, SD = 15.89), t(59) = 3.02, p < .01.
This suggests that overall, participants were fairly confident that they had comprehended the
global meaning of the passages, but were less confident they had comprehended finer-grained
word meanings within the passages. However, passage JOLs were related to word JOLs, r = .52,
p < .01. Further, JOLs of words within passages were related to performance on the vocabulary
subscale of the W-J III DRB, r = .30, p < .05.
Calibration. The confidence scales (M = 28.73, SD = 18.66) included in the word-knowledge
posttest captured ratings of how confident individuals were in their responses to the
posttest. Absolute accuracy (the absolute value of confidence score minus percent correct on the
posttest) was calculated to capture participants’ average calibration across all items on the
posttest. When absolute accuracy is zero, it indicates perfectly accurate calibration to the task.
For this task, absolute accuracy could range from 0 to 100. The signed version of the same
difference is called bias, which indicates whether participants were over- or under-confident in
their responses. Although the means are reported for these variables (Table 1), a bootstrap
analysis was used because this study utilized difference scores, which are not assumed to follow
a normal distribution (Bonate, 2000).
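The two person-level calibration scores can be illustrated with a short Python sketch. It is not the study's scoring procedure: it assumes that each participant's confidence ratings are averaged across posttest items, that items are scored 1 (correct) or 0 (incorrect), and that the average is compared with percent correct. The function and argument names are hypothetical.

```python
def calibration_scores(confidence_ratings, item_correct):
    """Return (absolute_accuracy, bias) for one participant.

    confidence_ratings: per-item confidence on a 0-100 scale (assumed layout).
    item_correct: per-item scores, 1 for a correct definition, 0 otherwise
                  (binary scoring is an assumption made for illustration).
    """
    mean_confidence = sum(confidence_ratings) / len(confidence_ratings)
    percent_correct = 100 * sum(item_correct) / len(item_correct)
    bias = mean_confidence - percent_correct  # > 0 overconfident, < 0 underconfident
    absolute_accuracy = abs(bias)             # 0 means perfectly calibrated
    return absolute_accuracy, bias


# Hypothetical participant: average confidence of 50 with 25% correct.
print(calibration_scores([80, 60, 40, 20], [1, 0, 0, 0]))  # (25.0, 25.0)
```

Under these assumptions a positive bias marks overconfidence and a negative bias marks underconfidence, while absolute accuracy discards the direction of the error.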
A non-parametric bootstrap technique was used to create a 95% confidence interval
around the median scores of random samples drawn from the participants in the study. Bootstrap
has been identified as a good technique for data that do not meet parametric assumptions
(Efron & Tibshirani, 1993), such as the difference scores in this investigation. The strategy was
used to resample (N = 5000) from the participants in the current study (n = 58). The resamples
created a distribution from which the median was calculated (Med) along with a 95% confidence
interval at the 2.5 (P2.5) and 97.5 (P97.5) percentiles. This allowed testing of the null hypothesis
that the median difference score was zero at α = 0.05.
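A percentile bootstrap of this kind can be sketched as follows. This is a minimal illustration using only the Python standard library, not the software used in the study. The function names, the fixed seed, and the simple index-based percentile rule are assumptions.

```python
import random
from statistics import median


def bootstrap_median_ci(scores, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval around the median of difference scores.

    Returns (Med, P2.5, P97.5) when alpha is 0.05. Med is the median of the
    bootstrap distribution of resample medians.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(scores)
    # Each resample draws n scores with replacement and records its median.
    medians = sorted(median(rng.choices(scores, k=n)) for _ in range(n_resamples))
    lower = medians[int((alpha / 2) * n_resamples)]
    upper = medians[int((1 - alpha / 2) * n_resamples) - 1]
    return median(medians), lower, upper


def excludes_zero(lower, upper):
    """True when zero lies outside the interval, i.e. the null is rejected."""
    return not (lower <= 0 <= upper)
```

For example, `bootstrap_median_ci(bias_scores)` with a hypothetical list of 58 bias scores would return the three values reported in the text as Med, P2.5, and P97.5, and `excludes_zero` applies the decision rule at α = 0.05.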
Figure 1 presents the absolute accuracy and bias for the participants on the posttest. The
median for absolute accuracy (the absolute difference between confidence and performance) was
14.56. This indicates that a participant at the 50th percentile had an absolute difference score
between confidence and performance of 14.56. This difference was significant (Med = 14.56,
P2.5 = 11.24, P97.5 = 18.43). The median for bias (the signed difference between confidence and
performance) was 11.66. This indicates that a participant at the 50th percentile had a signed
difference score between confidence and performance of 11.66, indicating that they were
overconfident. This difference was significant (Med = 11.66, P2.5 = 7.59, P97.5 = 15.95).
A regression analysis (Table 4) was run to determine the influence of bias on the mean
difference word-knowledge scores. Bias was a significant predictor of gains in word knowledge.
Do JOLs and calibration across words predict changes in word knowledge?
Features of the task may influence JOLs and calibration, and subsequently participants’
word learning. In order to determine the impact of task features, analysis must be approached
across words rather than across people. This is especially the case in considering texts because
target words are nested within particular passages. Therefore, a measure of relative accuracy, or
calibration for each item, was obtained using Kendall’s tau-b. This type of correlation was chosen
because it tests the association between two ordinal variables. As a correlation coefficient,
relative accuracy ranges from -1 to 1.
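Kendall's tau-b can be computed directly from pair counts, as in the sketch below. It assumes that, for each word, the two inputs are participants' confidence ratings and their scores on that word; how the ratings were paired in the study is not specified here. The function name is hypothetical.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences of ordinal values.

    tau_b = (C - D) / sqrt((C + D + Tx) * (C + D + Ty)), where C and D are
    the concordant and discordant pairs, Tx counts pairs tied only on x,
    and Ty counts pairs tied only on y.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: excluded from every term
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denominator = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denominator == 0:
        return float("nan")  # undefined when either variable is constant
    return (concordant - discordant) / denominator
```

The tie correction matters for this kind of data because scores on a single word take only a few distinct values, so many pairs of participants are tied on performance.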
First, a regression analysis was run to determine the influence of JOLs and relative
accuracy on gains in word knowledge (Table 5). While relative accuracy was found to be a
significant predictor of word-knowledge gain, JOLs were not a significant predictor.
Do contextual factors (i.e., text difficulty and part of speech) contribute to differences in word
learning, over and above indicators of metacognitive monitoring (i.e., JOLs and calibration)?
Text difficulty. Recall that texts for the current study were selected to differ in difficulty
and that difficulty was determined by Lexile rating, considering sentence length and word
difficulty. For the regression analysis, each word was coded as zero if it was from one of the
three passages deemed easier for the typical high school graduate. Words were coded as one if
they were from one of the three passages deemed somewhat challenging for high school
graduates.
Part of speech. Since word meaning derivation has been found to be easier for nouns than
for other parts of speech (Brown, 1957), type of word was an important consideration for the
current study. Nouns were coded as a one and non-nouns were coded as a zero.
Regression analysis (Table 6) revealed that relative accuracy was the only significant predictor of
gains in word knowledge, with or without the entry of text difficulty and part of speech.
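The dummy coding and two-step entry described above can be sketched in plain Python. This is an illustration under stated assumptions, not the analysis that produced Table 6. Predictors are entered in two blocks (monitoring variables, then context variables), only R² and its change are computed, and significance tests are omitted. All function names and dictionary keys are hypothetical.

```python
def code_word(rel_acc, jol, part_of_speech, passage_is_challenging):
    """Build one word-level record with the 0/1 codes described in the text."""
    return {
        "rel_acc": rel_acc,
        "jol": jol,
        "noun": 1 if part_of_speech == "noun" else 0,    # nouns = 1, non-nouns = 0
        "hard_text": 1 if passage_is_challenging else 0,  # challenging passage = 1
    }


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(predictor_rows, outcome):
    """R^2 of an ordinary least squares fit that includes an intercept."""
    rows = [[1.0] + [float(v) for v in row] for row in predictor_rows]
    k = len(rows[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(outcome) / len(outcome)
    ss_res = sum((y - f) ** 2 for y, f in zip(outcome, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in outcome)
    return 1 - ss_res / ss_tot


def r_squared_change(words, gains):
    """Return (R^2 for Step 1, change in R^2 after adding the context codes)."""
    step1 = [[w["rel_acc"], w["jol"]] for w in words]
    step2 = [[w["rel_acc"], w["jol"], w["noun"], w["hard_text"]] for w in words]
    r2_step1 = r_squared(step1, gains)
    return r2_step1, r_squared(step2, gains) - r2_step1
```

Because each word is one row, the context codes describe the word and its passage rather than any participant, which is why this analysis is run across words.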
Discussion
Metacognitive Monitoring Influences Word Learning
Results support the influence of metacognitive monitoring on word learning. Following is
an examination of the concepts underlying this finding, as well as how these findings relate to
hypotheses of the study.
Word learning. Word learning was represented by changes in word knowledge from
pretest to posttest. These changes were found to be small and negative, indicating that on the
whole participants demonstrated greater knowledge on the pretest than they did on the posttest.
There are several interpretations of this finding. First, individuals have a larger receptive
vocabulary, or lexicon of known and partially known meanings, than expressive vocabulary, or
the meanings that can be appropriately communicated (Durso & Coggins, 1991). Thus,
participants may have learned some aspect of word meaning, but found themselves unable to
express that new meaning, especially in light of information they may have had about that word
before encountering it in the specific context presented.
This can be further explained by McKeown’s (1985) findings that multiple encounters
with a word cause interference because oftentimes information presented in new contexts does
not overlap, or even conflicts with information already in memory. Given that participants
recognized the target words as real words, based on some prior experience, during the
word-knowledge pretest, it is likely that the contexts presented did not provide information that
supported their partial knowledge. For example, several participants correctly defined clove as a
“type of spice,” or “a portion of a plant,” but were unable to produce a definition on the posttest.
Within the context of the passage, clove meant “stuck.” According to the instance-based learning
approach (Bolger et al., 2008), features of word meaning are encoded with the context. If this is
the most recent information available, and readers have not determined how to resolve seemingly
unrelated information about a word with pre-existing knowledge, it is easy to see why it was
difficult to generate definitions on the posttest that were relatively easy on the pretest.
One limitation to the study was that directions on the posttest did not indicate that
participants should attempt all words. It was impossible to determine which words were left
blank because participants were unable to generate a definition, and which were left blank due to
fatigue effects or indifference. Nevertheless, it is interesting that overall there was a slight loss in
demonstrated word knowledge. This makes sense in light of the presented framework for word
learning, but it was not fully anticipated from college undergraduates.
Further, greater gains in word learning were related to decreasing confidence in responses
to the word-knowledge posttest. This suggests that undergraduates who were engaged in learning
were unsure of how well they were learning. Given the effects of interference just described, it is
not surprising that confusion led to decreasing confidence. Perhaps these word learners were in
the acclimation phase of Alexander’s (2005) conceptualization of reading development.
Judgments of learning. Although it seems intuitive that passage JOLs are related to JOLs
of words within passages, this link has not always been present in previous studies (Dunlosky et
al., 2005). Knowledge of word meanings has historically been found to contribute to reading
comprehension (Stahl, 1999), but evidence has not been accumulated to suggest that
metacognitive monitoring of word meaning knowledge is related to metacognitive monitoring of
reading comprehension.
JOLs were not expected to influence word learning to the same extent as relative
accuracy because the JOLs represent a more global judgment than confidence ratings. In other
words, JOLs represent metacognitive monitoring of passage comprehension, but confidence
ratings represent metacognitive monitoring of specific word knowledge. Although these types of
monitoring were found to be related (Table 2), the regression analysis supported the hypothesis
that JOLs and confidence ratings do indeed represent distinct constructs.
Calibration. Both absolute accuracy and relative accuracy were found to influence word
learning. Overall, undergraduates were overconfident in their performance on the
word-knowledge posttest. Some of the participants were extremely well-calibrated, but most were
poorly calibrated, as suggested by previous research (Glenberg & Epstein, 1985). Perhaps
individuals who demonstrated very little word knowledge on the posttest were well aware of
their lack of knowledge, and individuals who demonstrated a great deal of word knowledge were
aware of the knowledge they possessed. It is those who were in the middle, effortfully learning
new aspects of certain words and unsure how to gauge their progress, that may have been the
least calibrated.
Results for the first research question support the hypothesis that calibration, as
represented by both absolute accuracy and relative accuracy, influences word learning. Little
research has directly addressed metacognitive monitoring with regard to word learning while
reading, so this evidence is a first step to better understanding this relation.
Contextual Factors that Influence Word Learning
Contextual factors of the text and chosen target words were considered as influences on
word learning. Specifically, text difficulty was chosen as an indicator because the relative ease
with which a reader can comprehend the text influences their capacity to engage in word
learning. Part of speech was chosen as an indicator of difficulty for the target words because
several studies have found nouns to be easier to learn than adjectives, verbs, and other parts of
speech (Brown, 1957; Schwanenflugel, Stahl, & McFalls, 1997). Neither of these indicators of
contextual difficulty was found to influence word learning, contrary to the hypothesis that they
would have a mediating effect upon metacognitive monitoring of word learning.
Conclusions
Including calibration in the conceptualization of metacognitive monitoring provides
critical information about not just what individuals believe they are learning, but how well they
estimate that learning has progressed. Since calibration can be measured across items (by person)
it is possible to consider person factors, such as general reading skill. When calibration is
measured item-by-item (by word) it is possible to consider factors inherent in the presentation of
words within context. Studying word learning in typical contexts approximates conditions
undergraduates are likely to encounter when reading independently for courses or for pleasure.
Metacognitive monitoring should be studied not only within particular domains, but also
across levels of processes typical of particular domains, from global to specific. In order to make
this feasible, measurement of monitoring should occur throughout a complex task, such as
reading. This type of paradigm allows examination of monitoring relations across levels of
processing as well as information about how undergraduates rate their performance at various
stages of their learning. Students were poorly calibrated and overconfident in their word
learning, as anticipated from studies of calibration related to reading comprehension (Glenberg
& Epstein, 1985). Further investigation of this tendency, along with word, text, and person
factors that may influence students’ efforts to calibrate their learning, would supply
much-needed evidence for the best kinds of feedback and training instructors might use to scaffold
their undergraduates’ learning from assigned texts.
Future Directions
This exploratory study of metacognitive monitoring and word learning establishes their
relation to each other and warrants further investigation. First, it is important to consider
metacognitive monitoring as a process, not just a product. Think-aloud protocols could be used
to illuminate the processes undergraduates call upon, and when they are apt to use them, while
reading connected discourse. It would also be of interest to determine whether the processes
captured by think-aloud protocols relate to the self-report measures used in this study to measure
monitoring as a product. Perhaps the method of self-report is itself difficult for participants to
calibrate. Think-aloud protocols would be especially effective paired with computer
administration, as a computer environment could capture specific data regarding time to complete
specific tasks and typewritten responses. Computer administration would also allow a
longer type of study that could compare word learning in different contexts. Examples of context
variation include providing dictionary definitions, comparing narrative vs. expository text,
comparing informational vs. persuasive text, and looking at the influence of interest and prior
knowledge by presenting texts from different domains.
Further empirical investigation into the categories of vocabulary interest (Daalen-Kapteijns
et al., 2003) would also provide much-needed information about when a reader
chooses to pay attention to unknown words within text. One’s orientation to word learning and
its relation to reading comprehension would presumably influence both metacognitive
monitoring and strategic processing. It may be helpful to create profiles of monitoring and
strategy use for each of the approaches described by Daalen-Kapteijns et al. Another way to
understand attention and word learning would be to directly compare intentional word learning
tasks with incidental word learning tasks (such as the task in the current study). To date, the
literature on word learning tends to illuminate either intentional or incidental tasks, but very few
studies have directly compared the two types of word learning (Fukkink & de Glopper, 1998;
Swanborn & de Glopper, 1999).
The impact of feedback on calibration has been examined for word pairs and knowledge
questions, but not for a complex task such as reading (Lichtenstein & Fischhoff, 1980; Nietfeld
& Schraw, 2002). Although feedback was found to significantly and immediately improve
calibration and subsequent performance for the simpler types of tasks, such effects may not be as
straightforward for reading. Perhaps calibration feedback would need to be combined with
instruction in metacognitive monitoring and strategy use in order to have any impact on reading
outcomes. Additionally, developmental data are essential to understanding undergraduates’ word
learning, as students may be acclimating to the demands of reading challenging texts across
several domains. This would capture the impact of feedback over time and take into
consideration increasing proficiency in both the content domain and domain of reading.
Instructional Implications
Feedback from instructors is one way in which undergraduates can hope to improve their
metacognitive monitoring of word learning. Discussion questions are a common assignment in
undergraduate courses, and one that instructors sometimes use as an indicator of whether
students are engaging in deep processing with the assigned texts. Oftentimes, feedback on the
types of questions asked, and on how students generated those questions, is not included in
instruction. Helping students become more strategic readers helps them become better learners.
One aspect of strategic reading concerns word learning, as understanding vocabulary directly
improves reading comprehension (Stahl, 1999). Teaching students how to better monitor their
reading, and specifically their learning of terms signifying core concepts, should be an important
goal in all college courses. Students are expected to learn independently at the undergraduate
level, but cannot do so if those expectations are not properly scaffolded both inside the classroom
and through assignments outside the classroom.
References
Albaum, G., Best, R., & Hawkins, D. I. (1981). Continuous vs. discrete semantic differential
rating scales. Psychological Reports, 49, 83-86.
Alexander, P. A. (2005). The path to competence: A lifespan developmental perspective on
reading. Journal of Literacy Research, 37, 413-436.
Alexander, P. A., Murphy, P. K., Woods, B. S., Duhon, K. E., & Parker, D. (1997). College
instruction and concomitant changes in students’ knowledge, interest, and strategy use: A
study of domain learning. Contemporary Educational Psychology, 22, 125-146.
Anderson, R. C., & Freebody, P. (1981). Vocabulary knowledge. In J. Guthrie (Ed.),
Comprehension and teaching: Research reviews (pp. 77-117). Newark, DE: International
Reading Association.
Baker, L., & Brown, A. L. (1984). Metacognitive skills and reading. In P. D. Pearson, M. Kamil,
R. Barr, & P. Mosenthal (Eds.), Handbook of reading research (Vol. 1, pp. 353-394).
White Plains, NY: Longman.
Baumann, J. F. (2004). Vocabulary-comprehension relationships. In B. Maloch, J. V. Hoffman,
D. L. Schallert, C. M. Fairbanks, & J. Worthy (Eds.), 54th yearbook of the National
Reading Conference. Oak Creek, WI: National Reading Conference, Inc.
Bolger, D. J., Balass, M., Landen, E., & Perfetti, C. A. (2008). Context variation and definitions
in learning the meanings of words: An instance-based learning approach. Discourse
Processes, 45, 122-159.
Bonate, P. L. (2000). Analysis of pretest-posttest designs. Boca Raton, FL: Chapman & Hall.
Brown, R. W. (1957). Linguistic determinism and the part of speech. Journal of Abnormal and
Social Psychology, 55, 1-5.
Carroll, J. B., Davies, P., & Richman, B. (1971). The American Heritage word frequency book.
Boston: Houghton Mifflin Company.
Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled
disagreement or partial credit. Psychological Bulletin, 70, 213-220.
Davis, F. B. (1944). Fundamental factors of comprehension in reading. Psychometrika, 9,
185-197.
Dunlosky, J., Rawson, K. A., & Middleton, E. L. (2005). What constrains the accuracy of
metacomprehension judgments? Testing the transfer-appropriate-monitoring and
accessibility hypotheses. Journal of Memory and Language, 52, 551-565.
Durso, F. T., & Coggins, K. A. (1991). Organized instruction for the improvement of word
knowledge skills. Journal of Educational Psychology, 83, 108-112.
Durso, F. T., & Shore, W. J. (1991). Partial knowledge of word meanings. Journal of
Experimental Psychology: General, 120, 190-202.
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman &
Hall.
Flavell, J. H. (1979). Metacognition and cognitive monitoring: A new area of
cognitive-developmental inquiry. American Psychologist, 34, 906-911.
Fox, E., Dinsmore, D. L., Maggioni, L., & Alexander, P. A. (2008, March). Undergraduates’
independent and scaffolded reading of course texts: Further evidence of fragile
understanding. Paper presented at the annual meeting of the American Educational
Research Association, New York.
Fukkink, R. G. (2005). Deriving word meaning from written context: a process analysis.
Learning and Instruction, 15, 23-43.
Glenberg, A. M., & Epstein, W. (1985). Calibration of comprehension. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 11, 702-718.
Goulden, R., Nation, P., & Read, J. (1990). How large can a receptive vocabulary be? Applied
Linguistics, 11, 341-363.
Irving, W. (1978). The complete works of Washington Irving: The sketch book of Geoffrey
Crayon, Gent. (H. Springer, Ed.) Boston: Twayne Publishers.
Jenkins, J. R., Stein, M. L., & Wysocki, K. (1984). Learning vocabulary through reading.
American Educational Research Journal, 21, 767-787.
Kintsch, W. (1994). Text comprehension, memory, and learning. American Psychologist, 49,
294-303.
Koriat, A. (1997). Monitoring one’s own knowledge during study: A cue-utilization approach to
judgments of learning. Journal of Experimental Psychology: General, 126, 349-370.
Lexile (2004). http://www.lexile.com
Lichtenstein, S., & Fischhoff, B. (1980). Training for calibration. Organizational Behavior and
Human Performance, 26, 149-171.
McKeown, M. G. (1985). The acquisition of word meaning from context by children of high and
low ability. Reading Research Quarterly, 20, 482-496.
Nagy, W. E., Anderson, R. C., & Herman, P. A. (1987). Learning word meanings from context
during normal reading. American Educational Research Journal, 24, 237-270.
Nietfeld, J. L., & Schraw, G. (2002). The effect of knowledge and strategy training on monitoring
accuracy. The Journal of Educational Research, 95, 131-142.
Poe, E. A. (2004). The tales of Edgar Allan Poe. New York: Simon & Schuster.
Schraw, G., & Moshman, D. (1995). Metacognitive theories. Educational Psychology Review, 7,
351-371.
Schraw, G., Potenza, M. T., & Nebelsick-Gullet, L. (1993). Constraints on the calibration of
performance. Contemporary Educational Psychology, 18, 455-463.
Schwanenflugel, P. J., Stahl, S. A., & McFalls, E. L. (1997). Partial word knowledge and
vocabulary growth during reading comprehension. National Reading Research Center
Universities of Georgia and Maryland, Reading research report no. 76.
Stahl, S. A. (1999). Vocabulary development. Cambridge, MA: Brookline Books.
Sternberg, R. J., & Powell, J. S. (1983). Comprehending verbal comprehension. American
Psychologist, 38, 878-893.
Swanborn, M. S. L., & de Glopper, K. (1999). Incidental word learning while reading: A
meta-analysis. Review of Educational Research, 69, 261-285.
Thiede, K. W., Anderson, M. C. M., & Therriault, D. (2003). Accuracy of metacognitive
monitoring affects learning of texts. Journal of Educational Psychology, 95, 66-73.
van Daalen-Kapteijns, M., Elshout-Mohr, M., & de Glopper, K. (2001). Deriving the meaning of
unknown words from multiple contexts. Language Learning, 51, 145-181.
Veenman, M. V. J., Elshout, J. J., & Meijer, J. (1997). The generality vs domain-specificity of
metacognitive skills in novice learning across domains. Learning and Instruction, 7,
187-209.
Appendix A: Items on Word-Knowledge Pretest and Posttest
Target Words
Boding
Boorish
Capacious
Clove
Congeniality
Contrived
Countenance
Dilapidation
Dispelled
Filigreed
Fissure
Gilded
Motley
Pacific
Pertinacious
Petulant
Pommel
Psalmody
Specious
Specter
Stave
Sullen
Tarn
Varlet
Veritable
Vignette
Waggery
Wended
Withe
Filler Words
Admonish
Arbitrary
Benefactor
Derision
Docile
Expound
Forlorn
Harbinger
Ineffable
Incipient
Kindle
Lucid
Melancholy
Prodigious
Sagacious
Sentiment
Thwart
Tumultuous
Vehemently
Wane
Whim
Pseudowords
Calsar
Devernal
Drallen
Edarthic
Fossern
Jandelar
Merriton
Phisteron
Redistac
Thonstan
Appendix B: Sample Passage and Judgment of Learning Scales
The portrait, I have already said, was that of a young girl. It was a mere head and shoulders, done
in what is technically termed a vignette manner, much in the style of the favorite heads of Sully.
The arms, the bosom, and even the ends of the radiant hair melted imperceptibly into the vague
yet deep shadow which formed the background of the whole. The frame was oval, richly gilded
and filigreed in Moresque. As a thing of art nothing could be more admirable than the painting
itself. But it could have been neither the execution of the work, nor the immortal beauty of the
countenance, which had so suddenly and so vehemently moved me. Least of all, could it have
been that my fancy, shaken from its half slumber, had mistaken the head for that of a living
person. I saw at once that the peculiarities of the design, of the vignetting, and of the frame, must
have instantly dispelled such an idea – must have prevented even its momentary entertainment.
Thinking earnestly upon these points, I remained, for an hour perhaps, half sitting, half reclining,
with my vision riveted upon the portrait. At length, satisfied with the true secret of its effect, I
fell back within the bed. I had found the spell of the picture in an absolute life-likeness of
expression, which, at first startling, finally confounded, subdued, and appalled me.
How confident are you in your understanding of the passage's overall meaning?
0_______________________________________________100%
How confident are you in your understanding of individual word meanings from the passage?
0_______________________________________________100%
Table 1
Descriptive Statistics of Metacognitive Monitoring and Word Knowledge
                                           Person                           Word
                                   Min.    Max.   Mean (SD)        Min.    Max.   Mean (SD)
Judgments of Learning             25.83   93.83   70.96 (15.89)   63.12   83.23   72.85 (6.10)
Confidence Ratings                 0.80   72.53   28.73 (18.66)    7.33   55.24   27.69 (12.60)
Calibration¹                       0.31   56.97   14.63 (13.89)    0.02    0.62    0.37 (0.15)
Word Knowledge Difference Score   -1.33    0.56   -0.28 (0.39)    -0.70    0.40   -0.21 (0.32)
Comprehension                     31.00   44.00   37.18 (2.91)
Vocabulary                        42.00   56.00   48.10 (3.28)
Bias                             -30.10   56.97   11.71 (16.46)
Note. ¹Calibration was calculated as absolute accuracy for questions across people and
relative accuracy across words.
Table 2
Intercorrelations between Metacognitive Monitoring, Word Knowledge, and General Reading
Skills
Variable       1        2        3        4        5        6        7
1. Comp       --
2. Vocab      .49**    --
3. JOL        .10      .30*     --
4. PCR        .16      .38**    .46**    --
5. Bias      -.12      .17      .28*     .81**    --
6. AbsA      -.05      .22      .37**    .81**    .81**    --
7. WKC        .18      .24      .24      .14     -.28*    -.04     --
Note. Comp = Nelson-Denny Comprehension; Vocab = Nelson-Denny Vocabulary; JOL = Judgment of
Learning; PCR = Posttest Confidence Rating; Bias = Bias; AbsA = Absolute Accuracy
(calibration); WKC = Word Knowledge Change.
*p < .05, **p < .01
Table 3
Intercorrelations between Relative Accuracy, Judgments of Learning, and Confidence
Variable       1        2        3
1. RelA       --
2. JOL        .01      --
3. Conf       .41*    -.01      --
Note. RelA = Relative Accuracy (Calibration); JOL = Judgments of
Learning; Conf = Confidence.
*p < .05
Table 4
Summary of Step-wise Regression Analysis for Person Variables Predicting Change in Word
Knowledge (N = 57)
Variable            B       SE B     β
Step 1
  Bias             -.01     .00     -.28*
Step 2
  Bias             -.01     .00     -.32*
  Comprehension     .00     .20      .02
  Vocabulary        .20     .25      .03
Note. R² = .08 for Step 1 (p < .05); ΔR² = .07 for Step 2.
*p < .05
Table 5
Summary of Step-wise Regression Analysis for Metacognitive Monitoring Variables Predicting
Change in Word Knowledge (N = 24)
Variable                B       SE B     β
Judgment of learning   -.01     .01     -.19
Relative accuracy       .95     .39      .47*
Note. R² = .23 (NS).
*p < .05
Table 6
Summary of Step-wise Regression Analysis for Context Variables Predicting Change in Word
Knowledge (N = 24)
Variable                B       SE B     β
Step 1
  Relative accuracy     .95     .39      .47*
  JOL                  -.01     .01     -.19
Step 2
  Relative accuracy    1.00     .36      .50*
  JOL                  -.01     .01     -.14
  Part of speech       -.21     .11     -.36
  Text difficulty       .20     .11      .33
Note. R² = .23 for Step 1; ΔR² = .22 for Step 2 (p < .05).
*p < .05
Figure 1
Median Differences in Absolute Accuracy and Bias
[Bar chart. Vertical axis: Calibration (Confidence - Performance), scaled from 0 to 16. Bars: Absolute Accuracy and Bias; asterisks mark significant medians.]