Running Head: METACOGNITION AND WORD LEARNING

“What Did I Learn?” and “How Did I Do?” The Relation between Metacognition and Word Learning

Meghan M. Parkinson
University of Maryland

Abstract

Undergraduates’ metacognitive processes during word learning are a crucial component of building representations of key concepts from text. Metacognitive monitoring was measured through self-reported judgments of learning and confidence ratings (N = 60). Accuracy and bias scores were calculated to determine students’ ability to calibrate their word learning. Results indicated that undergraduates were poorly calibrated and overconfident about their word knowledge, which is consistent with evidence from previous studies on calibration of declarative knowledge and reading comprehension. Judgment accuracy was a significant predictor of gains in word knowledge from pretest to posttest. These findings warrant consideration of metacognitive monitoring with regard to widely cited processes of word learning.

Alexander (2005) proposed a lifespan development approach to reading, drawing on a body of research supporting the Model of Domain Learning. Such an approach moves away from the idea that emerging reading is the only critical period of learners’ development. In fact, undergraduate students can still be in what Alexander called the acclimation phase of reading development if they have little prior topic knowledge (e.g., knowledge of psychology) and only situational interest (e.g., the instructor makes psychology interesting). Further, it is assumed that undergraduates are able to recognize authors’ differing intents across texts, and even instructors’ purposes in assigning particular texts, although this assumption is not supported by think-aloud data from undergraduates (Fox, Dinsmore, Maggioni, & Alexander, 2008).
As undergraduates gain topic knowledge and interest, they are more likely to use strategies effectively and may eventually move toward deep-processing strategies and principled knowledge. This perspective on adult readers’ developing competence draws attention to several key aspects. First, reading is itself a domain, and knowledge of language and discourse influences the development of competence. Readers must comprehend a text at two levels: the textbase and a situation model, which is built from inferences drawn from the text and integrated with prior topic knowledge (Kintsch, 1994). To comprehend the textbase upon which the situation model operates, readers must draw upon conditional and procedural knowledge for decoding text and building propositional models. Key to comprehending the textbase is the ability to understand most of the individual words within the text. If a word is unfamiliar, readers draw upon their domain knowledge of reading to apply strategies such as morphological analysis and using context to derive meaning. The Model of Domain Learning suggests that readers’ metacognitive monitoring improves, as part of a general improvement in strategic processing, when learners gain competence in a given domain. In the case of reading, that domain competence is compounded by gains in principled knowledge and deep processing in a content-area domain. For example, undergraduate psychology majors must be scaffolded in order to comprehend the discourse characteristic of the field. They take courses introducing statistics and research methods as well as core findings from the existing literature. Reading an empirical study for the first time is quite challenging for an undergraduate unfamiliar with the conventions of the field.
However, gaining competence in reading empirical studies eventually leads to gains in principled knowledge of psychology, which in turn increase knowledge of reading within the domain of psychology. It has been well documented that development in academic domains is predicated on students’ ability to acquire a base of conceptual knowledge, and that individuals’ vocabulary is an effective indicator of that knowledge base (Alexander, Murphy, Woods, Duhon, & Parker, 1997). It has also been well documented that students’ metacognitive awareness (i.e., knowledge of self as a learner and thinker) is significantly related to academic development (Veenman, Elshout, & Meijer, 1997). What is less well understood, however, is the degree to which metacognitive monitoring predicts word learning. Word learning is a necessary component of successful reading comprehension (Davis, 1944; Stahl, 1999). Although metacognitive monitoring has been studied in relation to declarative knowledge (Nietfeld & Schraw, 2002) and reading comprehension (Dunlosky, Rawson, & Middleton, 2005), it has rarely been studied in relation to word learning. The purpose of this study is to investigate the relation between metacognition and word learning in competent readers.

Metacognition

Flavell (1979) described four kinds of metacognitive occurrences: metacognitive knowledge, metacognitive experiences, goals, and actions. Metacognitive knowledge includes beliefs about oneself as a learner. Metacognitive experiences include thoughts and feelings that coincide with cognitive tasks. Metacognitive goals refer to the global and specific objectives of cognitive tasks. Finally, metacognitive actions include the strategies used to achieve those goals. Several researchers have categorized metacognitive experiences, goals, and actions under the umbrella of regulation of cognition (Baker & Brown, 1984; Schraw & Moshman, 1995). The metacognitive experience of monitoring is one type of regulatory process.
Monitoring is situation-specific (Baker & Brown, 1984). For example, readers may fail to monitor their comprehension of a text if they are uninterested in the topic or too tired to concentrate. Further, adults tend to be quite inaccurate in monitoring cognitive tasks (Glenberg & Epstein, 1985), but training can improve monitoring accuracy and, subsequently, performance on outcome tasks (Lichtenstein & Fischhoff, 1980; Nietfeld & Schraw, 2002).

Judgments of learning. A specific type of metacognitive monitoring is the judgment of learning (JOL), a self-evaluation made after a learning task to predict how well information will be remembered (Koriat, 1997). Numerous studies have demonstrated that JOLs affect performance through improved strategy selection and use (e.g., Thiede, Anderson, & Therriault, 2003). Questions that remain in the literature on JOLs are (a) what information individuals use to make their JOLs and (b) how to improve the accuracy of JOLs. One view of the information used to make JOLs is the accessibility hypothesis. From this perspective, individuals consider the amount of information they can access from memory when making a JOL. Because the quality of that accessed information typically goes unevaluated, individuals base their judgments on its quantity (Dunlosky, Rawson, & Middleton, 2005). Suggestions from the literature for improving the accuracy of JOLs include delaying judgments after a task (Thiede, Anderson, & Therriault, 2003), strategy training (Nietfeld & Schraw, 2002), and explicitly asking participants to access specific information before making JOLs (Dunlosky et al., 2005).

Calibration. Individuals ask themselves, “What did I learn?” in order to make a JOL, but they ask themselves, “How well did I learn?” in order to rate their confidence.
Determining the accuracy of JOLs is an example of monitoring accuracy, or the degree to which confidence judgments match performance (Nietfeld & Schraw, 2002). Another name for this is calibration. Well-calibrated individuals are aware of what they do and do not know, so they can adjust their strategies appropriately (Glenberg & Epstein, 1985; Thiede, Anderson, & Therriault, 2003). Calibration is operationalized as either absolute accuracy or relative accuracy. Absolute accuracy is the match between an individual’s confidence ratings and performance, averaged across items (Nietfeld & Schraw, 2002). Relative accuracy is the item-by-item correspondence between confidence ratings and performance, that is, whether the items an individual is more confident about are also the items on which he or she performs better (Dunlosky, Rawson, & Middleton, 2005). Thus, researchers studying individual differences in calibration to a cognitive task would calculate absolute accuracy, whereas researchers studying judgments for particular items would calculate relative accuracy. Further, to determine individuals’ tendency to be overconfident or underconfident, bias must be calculated (Nietfeld & Schraw, 2002). Formulas for these constructs are provided in the Results section.

Extensive research has focused on the role of metacognitive monitoring in reading comprehension (Dunlosky et al., 2005; Glenberg & Epstein, 1985). What has been overwhelmingly neglected is empirical evidence for the role of metacognitive monitoring in word learning during reading. This gap is puzzling, as vocabulary knowledge has consistently been found to have a positive influence on reading comprehension (Baumann, 2004). Given Dunlosky et al.’s (2005) finding that JOLs differ by grain size (global vs. term-specific), it seems likely that metacognitive monitoring for reading comprehension and for word learning from context are related but distinct. Before such a contrast can be studied, empirical evidence must be provided for the relation between metacognitive monitoring and word learning.
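The three calibration indices described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study’s scoring code: the helper names (`to_percent`, `absolute_accuracy`, `bias`, `gamma`) are hypothetical; 0 to 3 item scores are assumed to be rescaled to percent correct by dividing by 3; absolute accuracy is taken as the mean per-item absolute gap; and relative accuracy is computed as a Goodman-Kruskal gamma correlation, a common choice in the metamemory literature that the text above does not specify.

```python
from itertools import combinations

def _check(confidence, performance):
    # Both lists hold one value per posttest item, on a 0-100 scale.
    if not confidence or len(confidence) != len(performance):
        raise ValueError("confidence and performance must be non-empty and equal length")

def to_percent(score, max_score=3):
    """Hypothetical helper: rescale a 0..max_score item score to percent correct (assumed conversion)."""
    return 100 * score / max_score

def absolute_accuracy(confidence, performance):
    """Hypothetical helper: mean |confidence - performance| across items; 0 means perfect calibration."""
    _check(confidence, performance)
    return sum(abs(c - p) for c, p in zip(confidence, performance)) / len(confidence)

def bias(confidence, performance):
    """Hypothetical helper: mean signed (confidence - performance); > 0 overconfident, < 0 underconfident."""
    _check(confidence, performance)
    return sum(c - p for c, p in zip(confidence, performance)) / len(confidence)

def gamma(confidence, performance):
    """Hypothetical helper: Goodman-Kruskal gamma across item pairs (one index of relative accuracy).

    Pairs tied on either variable are skipped; returns None if no untied pairs remain.
    """
    _check(confidence, performance)
    concordant = discordant = 0
    for i, j in combinations(range(len(confidence)), 2):
        product = (confidence[i] - confidence[j]) * (performance[i] - performance[j])
        if product > 0:
            concordant += 1  # higher confidence went with the better-scored item
        elif product < 0:
            discordant += 1  # higher confidence went with the worse-scored item
    if concordant + discordant == 0:
        return None
    return (concordant - discordant) / (concordant + discordant)

# Invented data for one participant: four confidence ratings and 0-3 item scores.
confidence = [80, 60, 90, 20]
performance = [to_percent(s) for s in [3, 1, 2, 0]]
print(absolute_accuracy(confidence, performance), bias(confidence, performance), gamma(confidence, performance))
```

Because bias averages signed gaps, overconfident and underconfident items can cancel out; this is why absolute accuracy is typically reported alongside it.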
Word Learning from Context

Incidental word learning. Jenkins, Stein, and Wysocki (1984) defined incidental word learning as the ability to derive and retain new word information without explicit direction. Because it has been hypothesized that only 10% of new word meanings are learned through direct instruction (Nagy, Anderson, & Herman, 1987), and because reading accounts for such a large portion of individuals’ learning of new words, it is crucial to examine the metacognitive processes that lead readers to recognize unknown or partially known words in text and to take appropriate cognitive action. Noticing a gap in linguistic knowledge, and the concomitant need to locate or infer meanings for unknown words, requires metacognitive monitoring; locating or constructing those meanings requires regulation of cognition.

Daalen-Kapteijns, Elshout-Mohr, and de Glopper (2001) described orientations to word learning while reading for comprehension. They describe text-oriented activities as those in which a reader engages to understand the main idea of the text. From this orientation, readers derive meanings for unknown words, through strategies such as substitution and checking, only when doing so is necessary to sustain the flow of reading comprehension. Word-oriented activities, on the other hand, are concerned with using context to determine the meaning of unknown words. This orientation leads readers to a context-specific representation of word meaning that may or may not support future encounters with the same word in different contexts. Finally, Daalen-Kapteijns et al. describe vocabulary-knowledge-oriented activities. These are driven by the goal of increasing vocabulary knowledge and encoding new features into one’s mental lexicon. Readers purposefully decontextualize derived aspects of word meaning in order to associate them with what they already know about similar words or morphological parts.
It is the encoding entailed in this last orientation that was the basis for Sternberg and Powell’s (1983) theory of learning word meaning from context. More recently, researchers (e.g., Bolger, Balass, Landen, & Perfetti, 2008) have posited an instance-based learning approach to deriving word meanings. From this perspective, each encounter with a word provides information about one or more of its features, and the context of the encounter is encoded along with those features. This information shapes subsequent encounters with the same word. Over several encounters, enough information accrues, and associations become strong enough, to abstract the core features that constitute a decontextualized understanding of the word’s meaning. Although the instance-based learning approach describes the process of learning new word meanings, it does not specify the mechanism driving that process. As Daalen-Kapteijns et al. (2001) suggest, not every reader will engage in the encoding of word features in the same way. The current study proposes calibration as a metacognitive mechanism that predicts word learning.

In Sternberg and Powell’s (1983) account, readers first search the context for clues to an unknown word’s meaning. Sternberg and Powell named this process selective encoding, or finding relevant information in the context in order to determine word meaning. Next, readers must combine the clues presented in context (selective combination) in order to make an appropriate guess about the word’s meaning, that is, the meaning that best fits the sentence. Readers also compare clues from context to prior knowledge about the topic or situation; Sternberg and Powell refer to this as selective comparison. Incidental word learning is effective only if learners accurately monitor their selective encoding and selective comparison.

Measuring word learning.
Much of our current understanding of word learning processes has accumulated from studies using artificially constructed texts (Swanborn & de Glopper, 1999). Although these experimental manipulations illuminate specific aspects of word learning, they generalize poorly to the opportunities for incidental word learning that readers typically encounter. Adults continue to add one to two new word meanings to their vocabulary every day, and rarely do so through explicit instruction (Goulden, Nation, & Read, 1990). There is therefore a need to study adult readers’ incidental word learning from naturally occurring texts. One of the major challenges to this approach is measuring word learning in a way that is sensitive to partial word knowledge, as readers may encode only broad features of meaning if they engage in text-oriented activities (Bolger et al., 2008; Daalen-Kapteijns et al., 2001).

According to Durso and Shore (1991), there are three categories of words: unknown words, frontier words, and known words. Readers are unable to distinguish unknown words from made-up words. Frontier words are words that readers recognize as real because they have encountered them before, and that they can place in the correct general context even without knowing their meaning. Known words are those that readers can define and whose meaning they understand across multiple contexts. Because a word-knowledge measure must be sensitive to partial word knowledge, it is important to consider whether a multiple-choice or constructed-response format will capture the most variability. In multiple-choice questions, the distractors are extremely important, as they constrain word features to varying degrees. Standardized tests of vocabulary typically use this approach.
Distractors should be of similar difficulty (i.e., all low-frequency words), and they should follow a graduated response model (i.e., one choice should share the target’s part of speech, another its semantic category, another one correct feature, and another should be the full, multifaceted definition). Additionally, the stems of multiple-choice questions should be controlled for contextual support (Anderson & Freebody, 1981). Given the difficulty of meeting all of these requirements, and the constraints on variability posed by clues from the stem or the context of the study, the constructed-response method was chosen for the current study. When participants are given the opportunity to generate a definition, they can demonstrate as little or as much as they know about a particular word. It is then the researchers’ responsibility to create a scoring scheme capable of capturing partial word knowledge without rewarding answers that are only distantly related. Context effects at test are also avoided in this method because target words are presented in isolation.

The way in which word learning is measured has implications for the results of the study. The current study seeks to measure word learning from incidental exposure to words in context. Therefore, passages were selected from texts of typical difficulty and of a style similar to those read by undergraduates. Further, attention was not called to the target words, nor were participants told that they would later need to generate definitions for them. This design is intended to ensure that the study speaks to adults’ ability to encode meaning features incidentally from text. A constructed-response pretest/posttest design was used to capture as much variance in partial word knowledge as possible. It was also designed to reduce reliance on participants’ use of synonyms as definitions, as some low-frequency words do not have an easier synonym.
Finally, measuring metacognitive monitoring throughout the tasks was a crucial aspect of the current study, as metacognition has rarely been studied with regard to word learning.

Purpose

The purpose of the current study is to address several gaps in the literature on word learning. First, research has primarily focused on children’s developmental gains in competence. As Alexander’s (2005) lifespan development perspective on reading suggests, undergraduates are still gaining knowledge, interest, and competence in reading within particular domains. Thus, it is imperative to examine this cross-section of readers’ lifelong development in order to scaffold required reading appropriately and consequently improve students’ learning. Additionally, the progression from novice to expert within a domain such as psychology interacts reciprocally with metacognitive monitoring (Alexander et al., 1997). Given this landscape, it seems prudent to study metacognition with regard to word learning, as word learning is crucial to the development of principled knowledge within a domain. Monitoring is especially promising as a mechanism for change in the hypothesized models of word learning (Bolger et al., 2008; Sternberg & Powell, 1983). Because previous findings have shown that judgments of learning differ at the passage and word levels (Dunlosky et al., 2005), it is necessary to determine whether that finding replicates across different kinds of discourse. Finally, more evidence is needed for incidental word learning so that future studies can distinguish between the processes and products of incidental versus intentional (instructed) word learning. Before such a comparison can be made, adults’ ability (or inability) to derive meaning incidentally from context must be established. The current study aims to provide evidence for incidental word learning and for the influence of metacognitive monitoring on word learning.
The research design offers several unique strengths for filling these gaps in the literature, such as naturally occurring texts, opportunities to self-report monitoring from the fine-grained to the global level, and open-ended pretest and posttest items. The following research questions are under investigation.

Do undergraduates’ JOLs and calibration, as measures of metacognitive monitoring, predict their gains in word knowledge? JOLs and calibration should uniquely predict gains in word knowledge because they differ in grain size and are elicited at different points (after reading passages vs. after completing posttest items).

Do JOLs and calibration across words predict changes in word knowledge? This question addresses the contribution of relative accuracy to word learning; the previous question addressed absolute accuracy. Relative accuracy provides information on each item rather than averaging across items. It is expected that relative accuracy will predict change in word knowledge but that JOLs will not, because JOLs entail metacognitive monitoring at a more global level (the whole passage and the whole group of words).

Do contextual factors (i.e., text difficulty and part of speech) contribute to differences in word learning over and above indicators of metacognitive monitoring (i.e., JOLs and calibration)? Word and text factors are expected to have a mediating effect on metacognitive monitoring because task difficulty may decrease the accuracy of metacognitive monitoring, thereby decreasing its influence on word learning.

Method

Participants

Ninety-six undergraduates participated in the study, but data from only 60 participants were analyzed. There were several reasons for removing participants from the data analysis. First, a large number of participants completed the first session but were absent from class during, or could not complete, the second session. Second, several participants failed to complete a whole section or measure.
Third, a few participants were removed because they indicated on their demographics form that they were non-native English speakers. The students were enrolled in either a human development class or an education class at a large, public university in the mid-Atlantic region of the United States. Students were primarily juniors (65%) and had an average age of 21.1 years. Eighteen male and 42 female students participated, and participants were predominantly Caucasian (58.3%).

Measures

Woodcock-Johnson III Diagnostic Reading Battery. Participants completed the reading comprehension and vocabulary subscales from the Woodcock-Johnson III Diagnostic Reading Battery (W-J III DRB). The W-J III DRB reading comprehension subscale is a series of cloze tasks in which students must fill in the blank with the appropriate word for each sentence. The W-J III DRB vocabulary subscales are a series of association tasks in which a word is presented and participants are directed to provide a synonym (synonyms subscale), an antonym (antonyms subscale), or the appropriate word to complete an analogy (analogies subscale). These measures provided information about participants’ general level of reading skill, specifically reading comprehension and vocabulary knowledge. Cronbach’s alpha was .65 for this sample of undergraduates.

Word-knowledge pretest. To assess participants' prior knowledge of the target words, the author created a word-knowledge pretest for the study. The word-knowledge pretest consists of a list of 60 words (Appendix A). Thirty target words were chosen from the text passages administered in session two. These are low-frequency words, that is, words that occur fewer than ten times per 5 million words of running text, as determined by Carroll, Davies, and Richman's (1971) The American Heritage Word Frequency Book. Example target words are dispelled and dilapidation. Target words were chosen with consideration for part of speech.
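Cronbach’s alpha, reported above for the W-J III DRB subscales and used for the other scales in this study, compares the sum of the item variances with the variance of participants’ total scores: alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below implements that standard formula with the Python standard library; `cronbach_alpha` is a hypothetical helper, and the example ratings are invented for illustration, not taken from the study.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Hypothetical helper: Cronbach's alpha for a respondents x items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    if len(scores) < 2:
        raise ValueError("need at least two respondents")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular matrix with at least two items")
    # Variance of each item (column) across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Invented data: three respondents rating two JOL items on a 0-100 line.
example = [[70, 75], [50, 55], [90, 80]]
print(round(cronbach_alpha(example), 3))
```

Sample and population variances give the same alpha as long as one kind is used consistently, because the n/(n - 1) factor cancels in the ratio.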
Previous work has found that it is easier to derive meaning for nouns than for other parts of speech (Brown, 1957). For this reason, the current study sought to balance the number of nouns and non-nouns in order to analyze differences in both word learning and calibration based on part of speech. Ten filler words were also chosen from the text surrounding the passages; example filler words are admonish and arbitrary. The purpose of the filler words was to prevent participants from focusing on the target words that they would see again in session two. Finally, ten pseudowords from a previous study (Schwanenflugel, Stahl, & McFalls, 1997) were added to the word-knowledge pretest. Pseudowords follow English rules of orthography but have no meaning. Example pseudowords are devernal and edarthic.

The directions given to participants were, "Write a definition or short description for every word that you can on the list. Please make your definitions as clear as possible so that I know that you understand the meaning of the word. I am not interested in the number of words that you know, so just do your best." After participants completed this first phase of the pretest, the directions stated, "Go through the list again and place a check mark beside any word that you left blank if you have seen it before or if it is familiar to you, even if you are not quite sure what it means." The purpose of this set of instructions was to gain information about partial word knowledge participants might have for target words. The pseudowords forced participants to discriminate between words they may have encountered before, and therefore may know some semantic feature of, and items they could not have encountered because they have no meaning.

Responses to the word-knowledge pretest were scored on a scale of 0 to 3. A score of three was given to direct definitions or synonyms, as determined by a dictionary and thesaurus.
A score of two was given for indirect synonyms, and a score of one was given for some correct feature of word meaning. On the pretest, a score of one was also given to any target word that had been left blank but checked as familiar. A zero was given for incorrect answers. The author coded all responses to target words, and two additional raters each scored one-third of the target word responses. Cohen's kappa indicated an interrater reliability of .85. Because kappa corrects for chance agreement, it is a conservative estimate (Cohen, 1968). Additionally, the word-knowledge pretest was positively correlated with the vocabulary subscale of the W-J III DRB, r = .48, p < .01, which provides evidence for the validity of the word-knowledge pretest as a measure of existing word knowledge.

Narrative passages. Participants read six counterbalanced narrative passages, each approximately 250 words in length, which presented the target words in typically encountered contexts (Appendix B). The passages were taken from two sources, The Tales of Edgar Allan Poe (2004) and The Complete Works of Washington Irving (1978). These books were selected because both contain narratives written by famous American male authors of roughly the same period. Based on text readability, a typically performing college sophomore could comprehend about 75% of the text written by Washington Irving with ease, and 95% of the text written by Edgar Allan Poe. Text readability, often referred to as text difficulty, was determined by the Lexile Framework for Reading (2004). Lexile measures are based on semantic difficulty (word frequency) and syntactic complexity (sentence length). Existing narrative texts were used in the current study in order to increase generalizability. Empirical work on word learning has chiefly used artificially constructed texts and tasks in order to create experimental manipulations (Durso & Shore, 1991; Fukkink, 2005; McKeown, 1985).
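Cohen’s kappa, used above to report agreement among raters on the 0 to 3 scoring scale, corrects observed agreement for the agreement expected by chance from each rater’s marginal frequencies: kappa = (p_observed - p_expected) / (1 - p_expected). The sketch below computes the unweighted version; `cohens_kappa` is a hypothetical helper and the ratings are invented. Because the scale is ordinal, a weighted kappa (Cohen, 1968) may be what was actually computed, so this sketch does not claim to reproduce the reported value.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Hypothetical helper: unweighted Cohen's kappa for two raters' categorical codes."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("ratings must be equal-length and non-empty")
    # Observed agreement: proportion of responses given the same score by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)

# Invented 0-3 scores from two raters for six responses.
author = [3, 2, 1, 0, 3, 3]
second_rater = [3, 2, 0, 0, 3, 2]
print(round(cohens_kappa(author, second_rater), 3))
```

In this invented example the raters agree on four of six responses (about 67%), but kappa is noticeably lower because some of that agreement would be expected by chance.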
By manipulating text, researchers change the characteristics of target words, contextual support, and text difficulty. Changing these factors does not simulate the word learning opportunities in typically encountered texts. Thus, the current study sought to study word learning in a manner that reflects a task undergraduates are likely to encounter over the course of typical reading. The texts were deliberately chosen to be domain-general, to avoid confounding prior topic knowledge with prior word knowledge. The focus of the current study is solely on the domain of reading. Once more is known about knowledge, interest, and strategic processing in word learning within this domain, the layer of content-domain knowledge, interest, and strategic processing can be added.

Judgment of learning scales. Each passage was followed by two judgment of learning scales (Appendix B). The first question asked, "How confident are you in your understanding of the passage's overall meaning?" The second question asked, "How confident are you in your understanding of the individual word meanings from the passage?" Participants responded by marking a slash on a 100-mm line with 0% at one end and 100% at the other. The value of using continuous rating scales rather than categorical scales has been demonstrated in the literature (Albaum, Best, & Hawkins, 1981; Schraw, Potenza, & Nebelsick-Gullet, 1993), and such scales were deemed the best way to capture individual differences in self-reported judgments of learning. Cronbach’s alpha was .87 for the passage JOL scales and .83 for the word meaning JOL scales.

Word-knowledge posttest. The word-knowledge posttest was similar to the word-knowledge pretest, with a shorter format and slightly different directions. Specifically, the posttest consisted of only the target words, without the filler words and pseudowords. Participants were instructed, "Write a definition or short description for each word.
Please make your definitions as clear as possible so that I know you understand the meaning of the word. If you are unsure of a word's meaning, write your best guess." Responses were scored on the same 0 to 3 scale as the pretest, and the interrater reliability reported earlier includes scoring of posttest responses. The word-knowledge posttest was significantly correlated with the word-knowledge pretest, r = .67, p < .01.

Confidence scales. A confidence scale followed each word on the posttest. The directions presented the calibration question and showed how to mark the 100-mm line. The question asked, "How confident are you in the accuracy of your response?" Participants generated a definition or best-guess description for each target word and then rated the accuracy of their response from 0% to 100% on the confidence scale. Reliability was .92 for the confidence scale.

Procedure

Measures for the first session were group-administered during class time. Participants completed the reading comprehension and vocabulary subscales of the W-J III DRB and the word-knowledge pretest. These measures were counterbalanced across participants and took approximately 35 minutes to complete. Demographic information was also collected at this time. One week later, participants completed the session two measures. Allowing one week to elapse between sessions made it likely that participants had forgotten the specific words on the word-knowledge pretest, which should contribute to the validity of the word-knowledge posttest data. Second-session measures included the contextual passages, the judgment of learning scales, and the word-knowledge posttest with calibration scales. The passages were counterbalanced.

Results

Do undergraduates’ JOLs and calibration as measures of metacognitive monitoring predict their gains in word knowledge?

Word knowledge.
Participants performed as expected for their age and grade level on the reading comprehension (M = 37.18, SD = 2.91) and vocabulary knowledge (M = 48.10, SD = 3.28) subscales of the W-J III DRB. Prior knowledge of the specific words used in the study, as measured by the word-knowledge pretest (M = 16.87, SD = 8.16), was quite variable. This highlights the importance of testing for individual differences in word knowledge when conducting studies of word learning. Participants also showed variable performance on the word-knowledge posttest (M = 15.67, SD = 10.23). Their mean difference scores (posttest mean – pretest mean) were negative (M = -.28, SD = .39), a finding that will be interpreted in the Discussion section (Table 1). Word-knowledge mean difference scores were negatively correlated with bias, r = -.28, p < .05 (Table 2). This suggests that the greater participants’ gain in word knowledge, the lower their bias, that is, the less overconfident they were about that knowledge.

Judgments of learning. Participants gave higher ratings on the passage JOL scales (M = 76.72, SD = 14.00) than on the word meaning JOL scales (M = 70.96, SD = 15.89), t(59) = 3.02, p < .01. This suggests that, overall, participants were fairly confident that they had comprehended the global meaning of the passages but were less confident that they had comprehended the finer-grained word meanings within the passages. However, passage JOLs were related to word JOLs, r = .52, p < .01. Further, JOLs of words within passages were related to performance on the vocabulary subscale of the W-J III DRB, r = .30, p < .05.

Calibration. The confidence scales (M = 28.73, SD = 18.66) included in the word-knowledge posttest captured how confident individuals were in their responses to the posttest. Absolute accuracy (the absolute value of the difference between confidence score and percent correct on the posttest) was calculated to capture participants’ average calibration across all items on the posttest. An absolute accuracy of zero indicates perfectly accurate calibration to the task.
For this task, absolute accuracy could range from 0 to 100. The signed version of the same difference is called bias; it indicates whether participants were overconfident (positive values) or underconfident (negative values) in their responses. Although means are reported for these variables (Table 1), a bootstrap analysis was used because this study relied on difference scores, which cannot be assumed to follow a normal distribution (Bonate, 2000). The bootstrap has been identified as a good technique for data that do not meet parametric assumptions (Efron & Tibshirani, 1993), such as the difference scores in this investigation. A nonparametric bootstrap was used to create a 95% confidence interval around the median: the participants in the current study (n = 58) were resampled with replacement 5,000 times, and the resulting distribution yielded the median (Med) along with a 95% confidence interval bounded by the 2.5th (P2.5) and 97.5th (P97.5) percentiles. This allowed testing of the null hypothesis that the median difference between confidence and performance was zero at α = .05; a median is significant when its interval excludes zero.
Figure 1 presents absolute accuracy and bias for the participants on the posttest. The median absolute accuracy (the absolute difference between confidence and performance) was 14.56, meaning that a participant at the 50th percentile had an absolute difference of 14.56 points between confidence and performance. This difference was significant (Med = 14.56, P2.5 = 11.24, P97.5 = 18.43). The median bias (the signed difference between confidence and performance) was 11.66, meaning that a participant at the 50th percentile rated their confidence 11.66 points higher than their performance, that is, was overconfident. This difference was also significant (Med = 11.66, P2.5 = 7.59, P97.5 = 15.95).
A regression analysis (Table 4) was run to determine the influence of bias on the word-knowledge difference scores.
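The percentile bootstrap used for the medians above can be sketched with the Python standard library alone. This is an illustration under assumptions, not the study's analysis script: the 5,000 resamples follow the text, but the fixed seed, the use of the median of the bootstrap medians as Med, and the `statistics.quantiles` interpolation for the 2.5th and 97.5th percentiles are choices made here, and the function names are hypothetical.

```python
import random
from statistics import median, quantiles


def bootstrap_median_ci(scores, n_resamples=5000, seed=0):
    """Percentile bootstrap 95% interval for the median of one sample.

    Returns (med, p2_5, p97_5): the median of the bootstrap medians and the
    2.5th and 97.5th percentiles of their distribution.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(scores)
    # Each resample draws n scores with replacement and keeps its median.
    boot_medians = [median(rng.choices(scores, k=n)) for _ in range(n_resamples)]
    cuts = quantiles(boot_medians, n=40)  # 39 cut points: 2.5%, 5%, ..., 97.5%
    return median(boot_medians), cuts[0], cuts[-1]


def rejects_zero(ci):
    """Reject H0 (median difference = 0) when 0 lies outside the interval."""
    _, lower, upper = ci
    return not (lower <= 0 <= upper)


if __name__ == "__main__":
    # Hypothetical bias scores (confidence minus performance); not study data.
    bias = [12.0, 8.5, 15.2, -3.1, 20.4, 9.9, 11.3, 6.7, 14.8, 2.6]
    ci = bootstrap_median_ci(bias)
    print("Med = %.2f, P2.5 = %.2f, P97.5 = %.2f" % ci, rejects_zero(ci))
```

Applied to participant-level bias scores, an interval lying entirely above zero corresponds to the kind of overconfidence reported for the median participant.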
Bias was a significant negative predictor of gains in word knowledge (β = -.28, p < .05).
Do JOLs and calibration across words predict changes in word knowledge?
Features of the task may influence JOLs and calibration, and subsequently participants’ word learning. To determine the impact of task features, the analysis must be conducted across words rather than across people. This is especially important with texts, because target words are nested within particular passages. Therefore, a measure of relative accuracy, that is, calibration for each item, was obtained using Kendall’s tau-b. This type of correlation was chosen because it tests the association between two ordinal variables; as a correlation coefficient, relative accuracy ranges from -1 to 1. First, a regression analysis was run to determine the influence of JOLs and relative accuracy on gains in word knowledge (Table 5). Relative accuracy was a significant predictor of word-knowledge gain, but JOLs were not.
Do contextual factors (i.e., text difficulty and part of speech) contribute to differences in word learning over and above indicators of metacognitive monitoring (i.e., JOLs and calibration)?
Text difficulty. Recall that the texts for the current study were selected to differ in difficulty, as determined by Lexile rating, which considers sentence length and word difficulty. For the regression analysis, each word was coded as zero if it came from one of the three passages deemed easier for the typical high school graduate and as one if it came from one of the three passages deemed somewhat challenging for high school graduates.
Part of speech. Because deriving word meanings has been found to be easier for nouns than for other parts of speech (Brown, 1957), type of word was an important consideration for the current study. Nouns were coded as one and non-nouns as zero.
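Kendall’s tau-b adjusts the concordant-minus-discordant pair count for ties, which matter here because rubric scores take only four values. The sketch below assumes, for illustration, that a word’s relative accuracy is the tau-b between participants’ confidence ratings and their rubric scores on that word; the section does not name the paired variables, and `relative_accuracy_by_word` is a hypothetical helper. In practice, `scipy.stats.kendalltau`, whose default variant is tau-b, would give the same coefficient.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b between two paired ordinal sequences (O(n^2) pairs)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired sequences of length >= 2")
    s = 0         # concordant pairs minus discordant pairs
    untied_x = 0  # pairs not tied on x (n0 - n1)
    untied_y = 0  # pairs not tied on y (n0 - n2)
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the x difference
            dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of the y difference
            s += dx * dy
            untied_x += dx != 0
            untied_y += dy != 0
    if untied_x == 0 or untied_y == 0:
        return float("nan")  # undefined when either variable is constant
    return s / sqrt(untied_x * untied_y)


def relative_accuracy_by_word(confidence, scores):
    """Hypothetical per-word relative accuracy.

    confidence[w] and scores[w] hold every participant's confidence rating
    and rubric score for word w, listed in the same participant order.
    """
    return {w: kendall_tau_b(confidence[w], scores[w]) for w in confidence}


if __name__ == "__main__":
    # Hypothetical ratings for one word from five participants; not study data.
    confidence = {"clove": [90, 40, 70, 10, 55]}
    scores = {"clove": [3, 1, 2, 0, 2]}
    print(relative_accuracy_by_word(confidence, scores))
```

A per-word coefficient near 1 would mean that participants who were more confident about that word also tended to define it better; values near 0 would mean confidence carried little information about performance on that word.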
Regression analysis revealed that relative accuracy was the only significant predictor of gains in word knowledge, with or without the entry of text difficulty and part of speech (Table 6).
Discussion
Metacognitive Monitoring Influences Word Learning
Results support the influence of metacognitive monitoring on word learning. What follows examines the concepts underlying this finding and how the findings relate to the hypotheses of the study.
Word learning. Word learning was represented by changes in word knowledge from pretest to posttest. These changes were small and negative, indicating that, on the whole, participants demonstrated greater knowledge on the pretest than on the posttest. There are several interpretations of this finding. First, individuals have a larger receptive vocabulary (the lexicon of known and partially known meanings) than expressive vocabulary (the meanings they can appropriately communicate) (Durso & Coggins, 1991). Thus, participants may have learned some aspect of a word’s meaning but been unable to express that new meaning, especially in light of information they may have had about the word before encountering it in the specific context presented. This can be further explained by McKeown’s (1985) finding that multiple encounters with a word can cause interference, because information presented in new contexts often does not overlap, or even conflicts, with information already in memory. Given that participants recognized the target words as real words during the word-knowledge pretest, based on some prior experience, it is likely that the contexts presented did not provide information that supported their partial knowledge. For example, several participants correctly defined clove on the pretest as a “type of spice” or “a portion of a plant,” but were unable to produce a definition on the posttest.
Within the context of the passage, clove meant “stuck.” According to the instance-based learning approach (Bolger et al., 2008), features of word meaning are encoded together with the context. If this is the most recent information available, and readers have not determined how to reconcile seemingly unrelated information about a word with pre-existing knowledge, it is easy to see why it was difficult to generate definitions on the posttest that had been relatively easy on the pretest.
One limitation of the study was that the posttest directions did not instruct participants to attempt all words. It was therefore impossible to determine which words were left blank because participants were unable to generate a definition and which were left blank because of fatigue or indifference. Nevertheless, it is interesting that overall there was a slight loss in demonstrated word knowledge. This makes sense in light of the presented framework for word learning, but it was not fully anticipated from college undergraduates. Further, greater gains in word learning were related to lower bias, that is, to less overconfidence on the word-knowledge posttest. This suggests that undergraduates who were engaged in learning were less sure of how well they were learning. Given the effects of interference just described, it is not surprising that confusion tempered their confidence. Perhaps these word learners were in the acclimation phase of Alexander’s (2005) conceptualization of reading development.
Judgments of learning. Although it seems intuitive that passage JOLs are related to JOLs of words within passages, this link has not always been found in previous studies (Dunlosky et al., 2005). Knowledge of word meanings has historically been found to contribute to reading comprehension (Stahl, 1999), but little evidence has accumulated to suggest that metacognitive monitoring of word-meaning knowledge is related to metacognitive monitoring of reading comprehension.
JOLs were not expected to influence word learning to the same extent as relative accuracy, because JOLs represent a more global judgment than confidence ratings. In other words, JOLs represent metacognitive monitoring of passage comprehension, whereas confidence ratings represent metacognitive monitoring of specific word knowledge. Although these types of monitoring were found to be related (Table 2), the regression analysis supported the hypothesis that JOLs and confidence ratings represent distinct constructs.
Calibration. Both bias, the signed form of absolute accuracy, and relative accuracy were found to predict word learning. Overall, undergraduates were overconfident in their performance on the word-knowledge posttest. Some participants were extremely well calibrated, but most were poorly calibrated, as previous research suggests (Glenberg & Epstein, 1985). Perhaps individuals who demonstrated very little word knowledge on the posttest were well aware of their lack of knowledge, and individuals who demonstrated a great deal of word knowledge were aware of the knowledge they possessed. Those in the middle, effortfully learning new aspects of certain words and unsure how to gauge their progress, may have been the least calibrated. Results for the first research question support the hypothesis that calibration, as represented by both bias and relative accuracy, influences word learning. Little research has directly addressed metacognitive monitoring of word learning during reading, so this evidence is a first step toward better understanding this relation.
Contextual Factors that Influence Word Learning
Contextual factors of the text and of the chosen target words were considered as influences on word learning. Specifically, text difficulty was chosen as an indicator because the relative ease with which a reader can comprehend a text influences their capacity to engage in word learning.
Part of speech was chosen as an indicator of difficulty for the target words because several studies have found nouns to be easier to learn than adjectives, verbs, and other parts of speech (Brown, 1957; Schwanenflugel, Stahl, & McFalls, 1997). Neither of these indicators of contextual difficulty was found to influence word learning, contrary to the hypothesis that they would have a mediating effect on metacognitive monitoring of word learning.
Conclusions
Including calibration in the conceptualization of metacognitive monitoring provides critical information not just about what individuals believe they are learning, but about how well they estimate their learning has progressed. When calibration is measured across items (by person), it is possible to consider person factors, such as general reading skill. When calibration is measured item by item (by word), it is possible to consider factors inherent in the presentation of words within context. Studying word learning in typical contexts approximates the conditions undergraduates are likely to encounter when reading independently for courses or for pleasure.
Metacognitive monitoring should be studied not only within particular domains, but also across the levels of processing typical of those domains, from global to specific. To make this feasible, monitoring should be measured throughout a complex task, such as reading. This type of paradigm allows examination of monitoring relations across levels of processing, as well as of how undergraduates rate their performance at various stages of their learning. Students were poorly calibrated and overconfident in their word learning, as anticipated from studies of calibration in reading comprehension (Glenberg & Epstein, 1985).
Further investigation of this tendency, along with the word, text, and person factors that may influence students’ efforts to calibrate their learning, would supply much-needed evidence about the kinds of feedback and training instructors might best use to scaffold their undergraduates’ learning from assigned texts.
Future Directions
This exploratory study establishes a relation between metacognitive monitoring and word learning that warrants further investigation. First, it is important to consider metacognitive monitoring as a process, not just a product. Think-aloud protocols could be used to illuminate the processes undergraduates call upon while reading connected discourse and when they are apt to use them. It would also be of interest to determine whether the processes captured by think-aloud protocols relate to the self-report measures used in this study to measure monitoring as a product; perhaps the method of self-report is itself difficult for participants to calibrate. Think-aloud work would be especially effective paired with computer administration, as a computer environment could capture specific data such as the time needed to complete specific tasks and typewritten responses. Computer administration, with these benefits, would also allow a longer study that could compare word learning in different contexts. Examples of context variation include providing dictionary definitions, comparing narrative vs. expository text, comparing informational vs. persuasive text, and examining the influence of interest and prior knowledge by presenting texts from different domains. Further empirical investigation into the categories of vocabulary interest (Daalen-Kapteijns et al., 2003) would also provide much-needed information about when a reader chooses to pay attention to unknown words within text. One’s orientation to word learning and its relation to reading comprehension would presumably influence both metacognitive monitoring and strategic processing.
It may be helpful to create profiles of monitoring and strategy use for each of the approaches described by Daalen-Kapteijns et al. Another way to understand attention and word learning would be to directly compare intentional word learning tasks with incidental word learning tasks (such as the task in the current study). To date, the literature on word learning has tended to illuminate either intentional or incidental tasks; very few studies have directly compared the two (Fukkink & de Glopper, 1998; Swanborn & de Glopper, 1999).
The impact of feedback on calibration has been examined for word pairs and knowledge questions, but not for a complex task such as reading (Lichtenstein & Fischhoff, 1980; Nietfeld & Schraw, 2002). Although feedback significantly and immediately improved calibration and subsequent performance on these simpler tasks, such effects may not be as straightforward for reading. Perhaps calibration feedback would need to be combined with instruction in metacognitive monitoring and strategy use to have any impact on reading outcomes. Additionally, developmental data are essential to understanding undergraduates’ word learning, as students may be acclimating to the demands of reading challenging texts across several domains. Such data would capture the impact of feedback over time and take into account increasing proficiency in both the content domain and the domain of reading.
Instructional Implications
Feedback from instructors is one way undergraduates can improve their metacognitive monitoring of word learning. Discussion questions are a common assignment in undergraduate courses, and instructors sometimes use them as an indicator of whether students are processing the assigned texts deeply. Often, however, instruction does not include feedback on the types of questions students asked or on how they generated those questions.
Helping students become more strategic readers helps them become better learners. One aspect of strategic reading concerns word learning, as understanding vocabulary directly improves reading comprehension (Stahl, 1999). Teaching students how to better monitor their reading, and specifically their learning of terms that signify core concepts, should be an important goal in all college courses. Students are expected to learn independently at the undergraduate level but cannot do so if those expectations are not properly scaffolded both inside the classroom and through assignments outside it.
References
Albaum, G., Best, R., & Hawkins, D. I. (1981). Continuous vs. discrete semantic differential rating scales. Psychological Reports, 49, 83-86.
Alexander, P. A. (2005). The path to competence: A lifespan developmental perspective on reading. Journal of Literacy Research, 37, 413-436.
Alexander, P. A., Murphy, P. K., Woods, B. S., Duhon, K. E., & Parker, D. (1997). College instruction and concomitant changes in students’ knowledge, interest, and strategy use: A study of domain learning. Contemporary Educational Psychology, 22, 125-146.
Anderson, R. C., & Freebody, P. (1981). Vocabulary knowledge. In J. Guthrie (Ed.), Comprehension and teaching: Research reviews (pp. 77-117). Newark, DE: International Reading Association.
Baker, L., & Brown, A. L. (1984). Metacognitive skills and reading. In P. D. Pearson, M. Kamil, R. Barr, & P. Mosenthal (Eds.), Handbook of reading research (Vol. 1, pp. 353-394). White Plains, NY: Longman.
Baumann, J. F. (2004). Vocabulary-comprehension relationships. In B. Maloch, J. V. Hoffman, D. L. Schallert, C. M. Fairbanks, & J. Worthy (Eds.), 54th yearbook of the National Reading Conference. Oak Creek, WI: National Reading Conference, Inc.
Bolger, D. J., Balass, M., Landen, E., & Perfetti, C. A. (2008). Context variation and definitions in learning the meanings of words: An instance-based learning approach. Discourse Processes, 45, 122-159.
Bonate, P. L. (2000). Analysis of pretest-posttest designs. Boca Raton, FL: Chapman & Hall.
Brown, R. W. (1957). Linguistic determinism and the part of speech. Journal of Abnormal and Social Psychology, 55, 1-5.
Carroll, J. B., Davies, P., & Richman, B. (1971). The American Heritage word frequency book. Boston: Houghton Mifflin Company.
Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213-220.
Davis, F. B. (1944). Fundamental factors of comprehension in reading. Psychometrika, 9, 185-197.
Dunlosky, J., Rawson, K. A., & Middleton, E. L. (2005). What constrains the accuracy of metacomprehension judgments? Testing the transfer-appropriate-monitoring and accessibility hypotheses. Journal of Memory and Language, 52, 551-565.
Durso, F. T., & Coggins, K. A. (1991). Organized instruction for the improvement of word knowledge skills. Journal of Educational Psychology, 83, 108-112.
Durso, F. T., & Shore, W. J. (1991). Partial knowledge of word meanings. Journal of Experimental Psychology: General, 120, 190-202.
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.
Flavell, J. H. (1979). Metacognition and cognitive monitoring: A new area of cognitive-developmental inquiry. American Psychologist, 34, 906-911.
Fox, E., Dinsmore, D. L., Maggioni, L., & Alexander, P. A. (2008, March). Undergraduates’ independent and scaffolded reading of course texts: Further evidence of fragile understanding. Paper presented at the annual meeting of the American Educational Research Association, New York.
Fukkink, R. G. (2005). Deriving word meaning from written context: A process analysis. Learning and Instruction, 15, 23-43.
Glenberg, A. M., & Epstein, W. (1985). Calibration of comprehension. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 702-718.
Goulden, R., Nation, P., & Read, J. (1990). How large can a receptive vocabulary be? Applied Linguistics, 11, 341-363.
Irving, W. (1978). The complete works of Washington Irving: The sketch book of Geoffrey Crayon, Gent. (H. Springer, Ed.). Boston: Twayne Publishers.
Jenkins, J. R., Stein, M. L., & Wysocki, K. (1984). Learning vocabulary through reading. American Educational Research Journal, 21, 767-787.
Kintsch, W. (1994). Text comprehension, memory, and learning. American Psychologist, 49, 294-303.
Koriat, A. (1997). Monitoring one’s own knowledge during study: A cue-utilization approach to judgments of learning. Journal of Experimental Psychology: General, 126, 349-370.
Lexile. (2004). http://www.lexile.com
Lichtenstein, S., & Fischhoff, B. (1980). Training for calibration. Organizational Behavior and Human Performance, 26, 149-171.
McKeown, M. G. (1985). The acquisition of word meaning from context by children of high and low ability. Reading Research Quarterly, 20, 482-496.
Nagy, W. E., Anderson, R. C., & Herman, P. A. (1987). Learning word meanings from context during normal reading. American Educational Research Journal, 24, 237-270.
Nietfeld, J. L., & Schraw, G. (2002). The effect of knowledge and strategy training on monitoring accuracy. The Journal of Educational Research, 95, 131-142.
Poe, E. A. (2004). The tales of Edgar Allan Poe. New York: Simon & Schuster.
Schraw, G., & Moshman, D. (1995). Metacognitive theories. Educational Psychology Review, 7, 351-371.
Schraw, G., Potenza, M. T., & Nebelsick-Gullet, L. (1993). Constraints on the calibration of performance. Contemporary Educational Psychology, 18, 455-463.
Schwanenflugel, P. J., Stahl, S. A., & McFalls, E. L. (1997). Partial word knowledge and vocabulary growth during reading comprehension. National Reading Research Center, Universities of Georgia and Maryland, Reading research report no. 76.
Stahl, S. A. (1999). Vocabulary development. Cambridge, MA: Brookline Books.
Sternberg, R. J., & Powell, J. S. (1983). Comprehending verbal comprehension. American Psychologist, 38, 878-893.
Swanborn, M. S. L., & de Glopper, K. (1999). Incidental word learning while reading: A meta-analysis. Review of Educational Research, 69, 261-285.
Thiede, K. W., Anderson, M. C. M., & Therriault, D. (2003). Accuracy of metacognitive monitoring affects learning of texts. Journal of Educational Psychology, 95, 66-73.
van Daalen-Kapteijns, M., Elshout-Mohr, M., & de Glopper, K. (2001). Deriving the meaning of unknown words from multiple contexts. Language Learning, 51, 145-181.
Veenman, M. V. J., Elshout, J. J., & Meijer, J. (1997). The generality vs domain-specificity of metacognitive skills in novice learning across domains. Learning and Instruction, 7, 187-209.
Appendix A: Items on Word-Knowledge Pretest and Posttest
Target Words: Boding, Boorish, Capacious, Clove, Congeniality, Contrived, Countenance, Dilapidation, Dispelled, Filigreed, Fissure, Gilded, Motley, Pacific, Pertinacious, Petulant, Pommel, Psalmody, Specious, Specter, Stave, Sullen, Tarn, Varlet, Veritable, Vignette, Waggery, Wended, Withe
Filler Words: Admonish, Arbitrary, Benefactor, Derision, Docile, Expound, Forlorn, Harbinger, Ineffable, Incipient, Kindle, Lucid, Melancholy, Prodigious, Sagacious, Sentiment, Thwart, Tumultuous, Vehemently, Wane, Whim
Pseudowords: Calsar, Devernal, Drallen, Edarthic, Fossern, Jandelar, Merriton, Phisteron, Redistac, Thonstan
Appendix B: Sample Passage and Judgment of Learning Scales
The portrait, I have already said, was that of a young girl. It was a mere head and shoulders, done in what is technically termed a vignette manner, much in the style of the favorite heads of Sully. The arms, the bosom, and even the ends of the radiant hair melted imperceptibly into the vague yet deep shadow which formed the background of the whole. The frame was oval, richly gilded and filigreed in Moresque. As a thing of art nothing could be more admirable than the painting itself.
But it could have been neither the execution of the work, nor the immortal beauty of the countenance, which had so suddenly and so vehemently moved me. Least of all, could it have been that my fancy, shaken from its half slumber, had mistaken the head for that of a living person. I saw at once that the peculiarities of the design, of the vignetting, and of the frame, must have instantly dispelled such an idea – must have prevented even its momentary entertainment. Thinking earnestly upon these points, I remained, for an hour perhaps, half sitting, half reclining, with my vision riveted upon the portrait. At length, satisfied with the true secret of its effect, I fell back within the bed. I had found the spell of the picture in an absolute life-likeness of expression, which, at first startling, finally confounded, subdued, and appalled me.
How confident are you in your understanding of the passage's overall meaning?
0 _______________________________________________ 100%
How confident are you in your understanding of individual word meanings from the passage?
0 _______________________________________________ 100%

Table 1
Descriptive Statistics of Metacognitive Monitoring and Word Knowledge
Variable | Person Min. | Person Max. | Person Mean (SD) | Word Min. | Word Max. | Word Mean (SD)
Judgments of Learning | 25.83 | 93.83 | 70.96 (15.89) | 63.12 | 83.23 | 72.85 (6.10)
Confidence Ratings | 0.80 | 72.53 | 28.73 (18.66) | 7.33 | 55.24 | 27.69 (12.60)
Calibration¹ | 0.31 | 56.97 | 14.63 (13.89) | 0.02 | 0.62 | 0.37 (0.15)
Word Knowledge Difference Score | -1.33 | 0.56 | -0.28 (0.39) | -0.70 | 0.40 | -0.21 (0.32)
Comprehension | 31.00 | 44.00 | 37.18 (2.91) | | |
Vocabulary | 42.00 | 56.00 | 48.10 (3.28) | | |
Bias | -30.10 | 56.97 | 11.71 (16.46) | | |
Note. ¹Calibration was calculated as absolute accuracy across items for each person (Person columns) and as relative accuracy for each word (Word columns).

Table 2
Intercorrelations between Metacognitive Monitoring, Word Knowledge, and General Reading Skills
Variable | 1 | 2 | 3 | 4 | 5 | 6 | 7
1. Comp | – | | | | | |
2. Vocab | .49** | – | | | | |
3. JOL | .10 | .30* | – | | | |
4. PCR | .16 | .38** | .46** | – | | |
5. Bias | -.12 | .17 | .28* | .81** | – | |
6. AbsA | -.05 | .22 | .37** | .81** | .81** | – |
7. WKC | .18 | .24 | .24 | .14 | -.28* | -.04 | –
Note. Comp = Nelson-Denny Comprehension; Vocab = Nelson-Denny Vocabulary; JOL = Judgment of Learning; PCR = Posttest Confidence Rating; Bias = Bias; AbsA = Absolute Accuracy (calibration); WKC = Word Knowledge Change. *p < .05, **p < .01

Table 3
Intercorrelations between Relative Accuracy, Judgments of Learning, and Confidence
Variable | 1 | 2 | 3
1. RelA | – | |
2. JOL | .01 | – |
3. Conf | .41* | -.01 | –
Note. RelA = Relative Accuracy (Calibration); JOL = Judgments of Learning; Conf = Confidence. *p < .05

Table 4
Summary of Step-wise Regression Analysis for Person Variables Predicting Change in Word Knowledge (N = 57)
Variable | B | SE B | β
Step 1
Bias | -.01 | .00 | -.28*
Step 2
Bias | -.01 | .00 | -.32*
Comprehension | .00 | .20 | .02
Vocabulary | .20 | .25 | .03
Note. R² = .08 for Step 1 (p < .05); ΔR² = .07 for Step 2. *p < .05

Table 5
Summary of Step-wise Regression Analysis for Metacognitive Monitoring Variables Predicting Change in Word Knowledge (N = 24)
Variable | B | SE B | β
Judgment of learning | -.01 | .01 | -.19
Relative accuracy | .95 | .39 | .47*
Note. R² = .23 (NS). *p < .05

Table 6
Summary of Step-wise Regression Analysis for Context Variables Predicting Change in Word Knowledge (N = 24)
Variable | B | SE B | β
Step 1
Relative accuracy | .95 | .39 | .47*
JOL | -.01 | .01 | -.19
Step 2
Relative accuracy | 1.00 | .36 | .50*
JOL | -.01 | .01 | -.14
Part of speech | -.21 | .11 | -.36
Text difficulty | .20 | .11 | .33
Note. R² = .23 for Step 1; ΔR² = .22 for Step 2 (p < .05). *p < .05

Figure 1
Median Differences in Absolute Accuracy and Bias
[Bar chart: y-axis, Calibration (Confidence - Performance); bars for Absolute Accuracy and Bias, both marked * as significant.]