Running head: METACOGNITIVE MONITORING

The Effects of Persuasive and Expository Text on Metacognitive Monitoring and Control
Daniel L. Dinsmore, Sandra M. Loughlin, and Meghan M. Parkinson
University of Maryland

Abstract
This investigation examined metacognitive processes across two text types (persuasive and expository). We also considered the effects of thinking aloud and of three expertise levels (acclimation, competence, and proficiency) on scrolling (i.e., moving back and forth in text) and calibration (i.e., the difference between confidence and performance). Participants were undergraduates enrolled in either human development (n = 38) or government and politics courses (n = 38), and practicing attorneys (n = 4). Participants read two passages on judicial review presented via computer, and trace data on scrolling behaviors were logged during reading. Additionally, a calibration measure was completed after reading. Think-alouds were coded for metacognitive utterances. Data were analyzed via non-parametric bootstrapping. Significant differences between text types were found for scrolling, calibration, and utterance categories. There was no significant difference for think-aloud condition on scrolling or calibration. Only scrolling differed significantly by expertise level; however, median differences revealed interesting trends between expertise groups that require further investigation.

The Effects of Persuasive and Expository Text on Metacognitive Monitoring and Control
Students, particularly those at the undergraduate level, are often required to read, evaluate, and use information presented in text. Too often, it is assumed that undergraduates are competent readers able to use text type effectively to learn essential content. However, this assumption has recently been called into question (e.g., Fox, Dinsmore, Maggioni, & Alexander, 2009). Specifically, Fox et al. (2009) found that undergraduates enrolled in a research methods course were able to recall only limited information from course-related texts and did not display the strategic processing expected of competent readers. One possible explanation for those reported shortfalls was that these undergraduates were poor at monitoring and controlling their cognitive processes, particularly with regard to comprehension (Wiley, Griffin, & Thiede, 2005). These problems with monitoring and controlling may revolve around students' inability to use prior knowledge (e.g., Shapiro, 2008), calibrate their learning (i.e., monitor the relation between confidence and performance; e.g., Dunlosky, Serra, Matvey, & Rawson, 2005), set goals, or activate appropriate strategies (e.g., Aleven, McLaren, Roll, & Koedinger, 2006). The present study was compelled by these concerns and by the goal of understanding how presumably competent readers engage with texts cognitively and metacognitively. As we move into this investigation, we are aided by the fact that research on metacognition represents a mature line of inquiry. In particular, there is an extensive literature on the relation between metacognition and text (e.g., Wiley, Griffin, & Thiede, 2005). However, despite the richness and diversity in this line of research, several problems and gaps persist.
Specifically, the literature on metacognitive monitoring and control has relied primarily on short segments of text rather than extended discourse and has not considered the potential effects of text type or genre on monitoring or control processes. Further, we considered the possibility that Metacognitive Monitoring 4 certain measures of metacognition in the literature may actually influence metacognitive processing. Finally, we found limited consideration given to the levels of readers’ expertise in a domain and their subsequent metacognitive processing. The aim of the present research is to address these gaps. For the purpose of this investigation, metacognition is defined as "thinking about thinking" (Miller, Kessel, & Flavell, 1970, p. 613), which encompasses four key components (Flavell, 1979): metacognitive knowledge, metacognitive experiences, cognitive goals, and the strategy activation. Metacognitive knowledge refers to knowledge or beliefs that guide the course of mental operations at either the person, task, or strategy level, while metacognitive experiences are the cognitive or affective experiences that pertain to a mental operation. Cognitive goals refer to cognitive or metacognitive goals that direct cognitive or metacognitive activity. Finally, strategies are cognitive actions that are evoked to monitor (metacognitive strategies) or make (cognitive strategies) progress toward a goal. There are number of studies of metacognition that have examined the difference between individuals’ confidence and their performance (i.e., calibration) with tasks involving the memorization of word pairs (e.g., Thiede & Dunlosky, 1994) or general knowledge questions (Dahl, Allwood, & Hagberg, 2009). However, these studies have infrequently considered the effect of topic or domain on the outcomes reported, particularly as it relates to academic domains (Parkinson & Dinsmore, in preparation). Although this research provides insights into monitoring and control processes, it sheds little light on what might be occurring within the minds of undergraduates reading challenging texts from which they are expected to learn new and complex content. Further, those monitoring studies that have used connected discourse typically utilize expository texts such as Encarta (e.g., Moos & Azevedo, 2008). Metacognitive Monitoring 5 Expository text is characterized as non-fiction reading material in which the intent is to inform or explain (Williams, Stafford, Lauer, Hall, & Pollini, 2009). Although students often read expository texts, the aforementioned studies have not been designed to establish how that particular type of text over other forms may affect metacognitive monitoring and control. For that reason, we have chosen to compare participants’ metacognitive processing with two text types (i.e., expository and persuasive text). Persuasive text is defined as text in which an author argues a point of view in order to change a reader’s knowledge, beliefs, or interest (Kamalski, Sanders, & Lentz, 2002; Murphy, Long, Holleran, & Esterly, 2003). Our interest in persuasive text for this study comes from the finding that such text can be influential in sparking students’ interest and deepening their knowledge (e.g., Buehl, Alexander, Murphy, & Sperl, 2001; Carrell & Connor, 1991). This may be especially true for two-sided refutational text in which competing views on an issue are presented, although to the advantage of one view over the other (Allen, 1991). 
We expected that by presenting participants with two different texts on judicial review, we might uncover differences in their metacognitive monitoring and control. Whether the text is expository or persuasive, it is still necessary to find some viable method for unearthing typically covert mental processes. This is no easy task, and it is made more difficult because the measurements themselves may in fact disrupt these mental processes. This has long presented a problem for metacognitive researchers, who have attempted a variety of metacognitive measures. In their review of the metacognition literature, Dinsmore et al. (2008) identified the types of measures used in the contemporary literature, including self-report, observation, think-aloud protocols, interviews, and performance ratings. Measures of metacognition should be chosen based on their utility in uncovering these mental processes without disrupting them. Previous measures, such as performance ratings (i.e., calibration) and observational measures such as rereading (as measured by the number of times participants scroll backward through text; e.g., Johnson-Glenberg, 2005) and help seeking (operationalized as soliciting help from an outside source; e.g., Aleven & Koedinger, 2002), have not shown evidence of disrupting covert mental processes. Of particular interest here was the effect think-aloud protocols may have on other commonly used measures of monitoring and control (i.e., calibration and observational measures). This issue has become more salient because of the increasing use of think-aloud methodology in the metacognition literature (Dinsmore et al., 2008). In a think-aloud protocol, participants are asked to perform a task while continuously reporting the thoughts that occur during the task (Ericsson & Simon, 1984). Further, Ericsson and Simon conjecture that these thoughts emanate from working memory. By positioning these concurrent verbalizations in working memory, a think-aloud protocol should elicit verbalizations only about deliberately enacted strategies, not automated skills (e.g., decoding in reading). However, the question of whether the think-aloud protocol affects processing is far from resolved. Veenman, Elshout, and Groen (1993) investigated this issue with measures of regulatory processing in a discovery learning situation. They found no significant differences in their measures of regulatory processing (as measured by student performance relevant to strategic processing) between think-aloud and no think-aloud conditions. However, they did find that time on task differed significantly between the two groups. Since the think-aloud protocol took significantly longer, it is quite possible that it placed higher demands on participants' working memory. These higher demands may limit the amount of strategic processing one is able to engage in; conversely, the task may take longer because the protocol itself elicits more strategic processing from the participant. Although there is no direct empirical evidence that the think-aloud protocol affects strategic processing, there is evidence that it negatively affects learning outcomes. Karahasanović, Hinkel, Sjøberg, and Thomas (2009) concluded that the think-aloud protocol not only affected reading time but also negatively affected participants' posttest scores.
Further, in a descriptive study, Greatorex and Süto (2008) found wide variation in participants' comments about their experience with the think-aloud protocol. Taken together, the increased demands on participants' time, the variability in participants' descriptions of their experience with the think-aloud protocol, and the negative impact on learning outcomes make it seem likely that some aspect of metacognitive monitoring and control would be affected by the think-aloud protocol. This study addresses this concern by comparing participants' responses on measurements of metacognitive monitoring and control not expected to affect covert mental processing (i.e., scrollbacks, calibration, and help seeking) in a think-aloud and a no think-aloud condition. We would expect the think-aloud protocol to elicit more instances of metacognitive monitoring and control because it attempts to make normally covert processes overt. Finally, the present research addresses how metacognitive monitoring and control change with expertise in a particular domain, a relation that has received minimal consideration in the literature. With a few notable exceptions (e.g., de Bruin, Rikers, & Schmidt, 2007), most studies of metacognition have not considered the effect of expertise on metacognitive processes. Rather, they have investigated single populations (i.e., readers who have similar levels of expertise relative to the content of the text) or have not addressed the issue of expertise at all (e.g., Rhodes & Castel, 2008). This research paradigm is problematic because the literature predicts differential processes for individuals at varying levels of expertise within a domain. For example, Alexander's Model of Domain Learning (MDL; Alexander, 1997) hypothesizes that levels of expertise (i.e., acclimation, competence, and proficiency) result from the differential confluence of knowledge, interest, and strategies, a confluence that likely has implications for metacognitive monitoring and control processes. For instance, it is probable that individuals at higher levels of expertise are more knowledgeable about and invested in issues relevant to their domain, and thus engage in different patterns of metacognitive monitoring and control, particularly with respect to calibration, than novices reading the same text. In the current study, this relation was addressed by targeting pools of participants at varying levels of expertise in government and politics, the domain in which our task was situated (i.e., the texts utilized for this study were on the topic of judicial review). The first pool comprised undergraduates in a human development course whom we predicted to have low prior knowledge and interest in the domain (i.e., acclimation). We also recruited undergraduates enrolled in a government and politics course, whom we expected to demonstrate moderate levels of prior knowledge and interest in government and politics (i.e., competence). Lastly, we included practicing attorneys for our expert group, predicting that they would articulate high levels of prior knowledge and interest in government and politics. Moreover, their professional status indicated their level of expertise in the domain. It was our expectation that these groups would differentially monitor and control their reading behaviors.

Method

Participants

The participants for this study were recruited from three different pools.
The first pool consisted of undergraduates at a large mid-Atlantic university in the United States enrolled in two sections of an introductory human development course. For the students enrolled in the Metacognitive Monitoring 9 human development course (n = 38) the average age was 21.16. Participants in this first pool were 52.63% female and 76.32% Caucasian. The average GPA for this first pool was 3.25 and they had completed an average of 80.68 cumulative college credits. These participants came from a variety of academic majors. The second pool consisted of undergraduates at the same university that were enrolled in an upper-level government and politics course. For the students enrolled in the government and politics course (n = 38) the average age was 20.34. Participants in this second pool were 39.47% female and 60.53% Caucasian. The average GPA for this second pool was 3.30 and they had completed an average of 74.47 cumulative college credits. 71.05% of the participants from the government and politics class were government and politics majors. The third pool consisted of practicing attorneys from the mid-Atlantic region of the United States. For the practicing attorneys (n = 4) the average age was 28.5. Participants in this third pool were all male and 75.00% Caucasian. Materials The materials for this study were all computerized. The materials in the computer environment consisted of two text passages and a glossary, as well as the measures for the study. Text passages. The texts for this study consisted of an expository passage and a two-sided refutational passage. The topic for these texts was judicial review. Currently, there is some debate over the use (i.e., the overuse) of judicial review that is referred to as judicial activism. Each of the two passages was adapted so that they were of similar length and difficulty. These passages were presented in a text box with scroll arrows on the right hand side. Three lines of text were clearly visible at a time. Lines of text both above and below the target text were in light gray. Metacognitive Monitoring 10 The expository passage (Appendix A) was adapted from a Microsoft Encarta entry on the judicial branch (Microsoft, 2008). This passage described the role of the judicial branch and did not contain any argument relating to judicial review or judicial activism. The expository passage was 1,111 words and was 79 lines long. The Flesch Reading Ease for this passage was 44.5 and the Flesch-Kincaid Grade Level was 12.4. The two-sided refutational passage (Appendix B) was adapted from two sources. The first source was from a transcript of a speech given by then Attorney General Alberto Gonzales at the American Enterprise Institute on January 17, 2007, entitled, ”Democracy and the Third Branch" (Gonzales, 2007). Gonazales argued in his speech that the judicial branch should exercise extreme caution in when declaring executive and legislative actions unconstitutional. The second source was an article written by Clint Bolick, a member of the CATO Institute, which appeared in the Wall Street Journal April 3, 2007, entitled, "A Cheer for Judicial Activism" (Bolick, 2007). Bolick argued in the article that the judiciary must do everything possible to ensure that the government does not infringe on individuals' civil liberties. 
These two sources were woven together to create a two-sided refutational text in which Gonzales's arguments restricting the use of judicial activism were refuted by Bolick's arguments for the judicial to do everything possible to ensure individuals' liberties. The two-sided refutational passage was 1,213 words and was 81 lines long. The Flesch Reading Ease for this passage was 39.2 and the Flesch-Kincaid Grade Level was 14.2. Glossary. The glossary consisted of a glossary of terms as well as biographical information on both Alberto Gonzales and Clint Bolick. The glossary of terms listed keywords from each of the texts and gave their definitions. The definitions for these terms were adapted from the Merriam-Webster Online Dictionary (Merriam-Webster, 2008). Sample terms included: Metacognitive Monitoring 11 judicial activism, Alexander Hamilton, James Madison, tyranny, Constitutional law, deference, and judicial review. The brief biographies for both Alberto Gonzales and Clint Bolick were each less than three hundred words. Measures The measures for this study were also all computerized. The measures for the study include: demographics, prior knowledge, topic interest, passage knowledge, and calibration. Demographics. The demographics questionnaire had students report their sex, age, and ethnicity (using the United States Census Bureau categories). For the undergraduates, they were also asked to report their academic major, cumulative college credits completed, and their cumulative grade point average (based on a four-point scale). Prior knowledge. The prior knowledge test measured participants' prior knowledge on the topic of the judicial review process. The measure consisted of sixteen multiple-choice items based on information in both the expository and persuasive passages. All the prior knowledge questions came from the two passages (eight from the expository passage and eight from the persuasive passage). The responses for the multiple-choice items were scored using a targeted response model (Alexander, Murphy, & Kulikowich, 1998). In this way, differentiation between those immersed in the topic or domain and those not immersed in the topic or domain could be made. An example of one of the multiple choice items appears below. Appellate jurisdiction is exercised by __________. a. the United States courts of appeals (4) b. the Supreme Court (2) c. the President (0) Metacognitive Monitoring 12 d. trial courts (1) The answer choices corresponded to one of the following categories: in-topic correct responses, in-topic incorrect responses, in-domain incorrect response, and popular lore responses. In this case, the answer choice "the United States Court of Appeals" was the in-topic correct response and was scored a 4. The answer choice "the Supreme Court" was the in-topic incorrect response and was scored a 2. The answer choice, "the Supreme Court" was incorrect, but was within the topic of judicial review. The answer choice, "trial courts" was the in-domain incorrect response and was scored a 1. Although the trial courts fall within the domain of government and politics, they have no role in the topic of judicial review. The answer choice, "the President" was a “popular lore” answer and was scored a 0. This response was one in which someone with little to no domain knowledge may choose. The Cronbach's alpha for the prior knowledge measure was 0.59. 
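To make the targeted response scoring concrete, the following sketch shows one way such a scheme could be implemented. The weights (4 for in-topic correct, 2 for in-topic incorrect, 1 for in-domain incorrect, 0 for popular lore) and the sample item mirror the example above, but the function, variable names, and data layout are hypothetical illustrations rather than the instrument actually used in the study.

```python
# Minimal sketch of the targeted response scoring model described above.
# Option weights: 4 = in-topic correct, 2 = in-topic incorrect,
# 1 = in-domain incorrect, 0 = popular lore. Names are hypothetical.

OPTION_WEIGHTS = {
    "appellate_jurisdiction": {
        "a": 4,  # the United States courts of appeals (in-topic correct)
        "b": 2,  # the Supreme Court (in-topic incorrect)
        "c": 0,  # the President (popular lore)
        "d": 1,  # trial courts (in-domain incorrect)
    },
    # ...the remaining fifteen prior knowledge items would follow here
}

def score_prior_knowledge(responses):
    """Sum targeted-response weights over a participant's answers.

    `responses` maps item identifiers to the option letter chosen;
    unanswered or unknown items contribute zero.
    """
    total = 0
    for item, chosen in responses.items():
        total += OPTION_WEIGHTS.get(item, {}).get(chosen, 0)
    return total

# A participant who chose "the Supreme Court" earns 2 of the 4 possible
# points on the sample item.
print(score_prior_knowledge({"appellate_jurisdiction": "b"}))  # -> 2
```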
The obtained alpha of 0.59 is lower than the 0.70 typically suggested for experimental measures; the depressed alpha in this case may reflect participants' fragmentary knowledge of the topic of judicial review. Bernardi (1994) suggests that alpha is partially dependent on the sample chosen. In this case, it is quite possible that the weak correlations between items arose because the participants (particularly the human development undergraduates) had some declarative knowledge (i.e., statements or propositions about a domain) that was not yet principled (i.e., organized around overarching conceptualizations in the domain).

Topic interest. Topic interest was assessed by having participants report their level of interest in ten items related to the judicial branch. These items included how interested they were in checks and balances, historic court decisions, judges and justices, the Constitution, and the founding fathers. The participants were asked to respond to these ten items by making a slash on a 100-pixel line with "not interested" and "very interested" at opposite poles. The Cronbach's alpha for this scale was 0.90. An example item for the topic interest scale appears below.

Governmental systems of checks and balances
not interested __________________________________________ very interested

Passage knowledge. Knowledge of each passage was assessed immediately after participants finished reading it. These passage knowledge questions related directly to information presented in the passage and were similar to the prior knowledge questions in both wording and response format. However, the particular questions that appeared after each passage could be answered only from the passage the participants had just read. There were eight questions per passage. Cronbach's alpha for this scale was 0.39. As discussed above, these were the same questions as the prior knowledge test. The lower alpha when these items are presented after reading a passage actually presents an interesting picture. One possibility is the differential ability of participants to learn from text, which would further weaken the correlations between items for this sample. Regardless, since these posttest items were taken directly from the passage, the validity of the scale outweighs concerns about its reliability as indexed by Cronbach's alpha.

Calibration. Immediately following each passage knowledge question, participants were asked to rate their confidence in the answer to the preceding passage knowledge question. The participants were asked to respond to the calibration items by "clicking on the line indicating how confident you would be in the accuracy of your response to the following questions." The Cronbach's alpha for the confidence scales was 0.89. A sample item for calibration appears below.

Appellate jurisdiction is exercised by __________.
0% __________________________________________ 100%

Trace Data

In addition to the measures described above, we collected trace data in the form of log files for scrollbacks and help seeking. We also collected trace data in the form of audiotapes for the think-aloud protocol.

Scrollbacks. Scrollbacks were operationalized as the number of times a participant scrolled backward through the text by at least three lines, similar to the procedure in Johnson-Glenberg (2005). Trace data were collected on participants' navigation patterns through the text to give us a count of the total number of scrollbacks for each passage for each participant.
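The scrollback count can be recovered from such a navigation log in a straightforward way. The sketch below assumes the log reduces to an ordered list of the topmost visible line after each scroll event; a scrollback is tallied whenever the reader moves backward by at least three lines. The log format and function name are assumptions for illustration, not the actual logging code used in the study.

```python
def count_scrollbacks(line_positions, threshold=3):
    """Count backward moves of at least `threshold` lines.

    `line_positions` is the ordered sequence of topmost visible line
    numbers recorded in the trace log for one participant and passage.
    """
    scrollbacks = 0
    for previous, current in zip(line_positions, line_positions[1:]):
        if previous - current >= threshold:
            scrollbacks += 1
    return scrollbacks

# A reader who reaches line 30, jumps back to line 12, and then reads on
# registers a single scrollback.
print(count_scrollbacks([0, 10, 20, 30, 12, 20, 30]))  # -> 1
```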
We also tracked the amount of time the participants spent on each portion of the text (i.e., the three-line segments).

Help seeking. Help seeking was operationalized as the number of times a participant accessed the glossary terms or biographies. Access to the glossary counted whether participants viewed the terms or the biographies. Trace data were collected to give us the total number of times for each passage that the participants accessed the glossary.

Think aloud. Participants in the think-aloud condition were asked to think aloud while reading each of the two passages (see the Procedure section for more information on the think-aloud protocol). The 35 think-alouds were transcribed into text files by the first and third authors. These transcripts were then coded for instances of metacognitive monitoring and control by the first and second authors. Using Flavell's (1979) conception of metacognition, transcripts were coded for instances of metacognitive knowledge (MK), metacognitive experiences (ME), goals (G), and the activation of strategies (AS). During coding, goals and the activation of strategies were combined into a single code (G/AS). Definitions of these three codes and examples for each appear in Table 1. The level of inter-rater reliability for a randomly selected 20% of the think-alouds was 90.66%. Differences between these codes were resolved through conference. This level of inter-rater reliability was considered acceptable, and the first author coded the remainder of the think-aloud transcripts using this coding scheme.

Procedure

All participants were treated according to APA (5th edition) guidelines and completed a consent form before participating. The experiment was conducted on four PCs in a laboratory running Internet Explorer 7.0. Data were sent from the PCs to a secure external Apache server running on a UNIX platform. The experiment was administered by the first, second, and third authors. Both the order of passages (i.e., expository and persuasive) and think-aloud condition were counterbalanced in a Latin-squares design.

No think-aloud condition. Participants were seated at one of four computer workstations in the laboratory. At the beginning of the experiment participants were instructed, "For all of the measures, if you don't know the answer, please take your best guess." Participants completed the demographic, prior knowledge, and topic interest measures. According to the Latin-squares design, participants in the no think-aloud condition read either the expository passage or the persuasive passage first. Participants were told that they were going to answer questions after reading the passages. As the participants read, they scrolled up or down through the text until they reached the end. By clicking the "continue" button at the end of the passage, participants were directed to the passage knowledge measure. Immediately after the recognition items, participants completed the confidence scales for each question (calibration). Following the calibration items, participants completed beliefs and passage interest measures for that particular passage. Participants then repeated the same procedure for the second passage. Following the experiment, participants were debriefed.

Think-aloud condition. The procedure for the think-aloud condition was identical to the no think-aloud condition, except for the following additions.
Before the first passage subjects were given instructions for the think-aloud protocol and given a short practice passage. The protocol for the think aloud is included in Appendix C. The practice passage was about mosquitoes and was adapted from a popularly written science article by Marston Bates (1975). Once participants felt comfortable reading aloud, they then read either the expository or persuasive text first. Before each passage participants were instructed, "As you read this text, please say out loud what you are thinking and doing." Participants could choose to read aloud or not. If participants were silent for more than 30 seconds the experimenter prompted the subjects again to please say out loud what they were thinking or doing. This procedure was repeated for the second passage. Results and Discussion Results for each of the three research hypotheses (i.e., textual influences on metacognitive monitoring and control, effects of the think-aloud protocol on metacognitive monitoring and control, and the influence of domain expertise on metacognitive monitoring and control) are presented and briefly discussed. Due to internet and power failures during data collection, data from 4 participants were Metacognitive Monitoring 17 lost. The following analyses used the 76 remaining participants (n = 36 for the think aloud condition and n = 40 for the no think aloud condition). In addition one think aloud was unusable due to poor tape quality. Given the circumstances, these data can be considered missing at random and not a participant effect. All presented analyses use data from the remaining 76 participants and 35 think-aloud transcripts unless otherwise noted. Means and standard deviations for the metacognitive monitoring and control variables (i.e., scrollbacks, help seeking, absolute accuracy, and bias) appear in Table 2 across passages and think-aloud conditions. The data in Table 2 provide evidence that the number of help seeking behaviors that this sample engaged in was very limited. Due to the very low prevalence of help seeking in this investigation (i.e., three participants), we have excluded it from further analyses. Since the data collected during this investigation consisted of both trace data (i.e., count data) and difference scores (i.e., calibration), inferential analysis on the means of these variables was considered inappropriate. Since these frequency data and difference scores should not be considered to follow a normal distribution, we chose to use a non-parametric bootstrap technique. Bootstrap has been identified as a good technique to test non-parametric data (Efron & Tibshirani, 1993), such as frequency and difference scores in this investigation. For all of the following tests we used the bootstrapping technique to resample (N=5000) from the participants in our study (n=76). The re-sample created a distribution in which we calculated the median (Med) along with a 95% confidence interval at the 2.5 (P2.5) and 97.5 (P97.5) percentiles. This allowed us to test null hypotheses that differences between passages, conditions, groups, or interactions were zero at α = 0.05. Textual Influences on Metacognitive Monitoring and Control We compared the monitoring and control variables (i.e., scrollbacks, absolute accuracy, Metacognitive Monitoring 18 and bias) between the expository and persuasive passages. In addition to looking at differences between these measures, we also examined the think-aloud data from participants in the thinkaloud condition (n=35). Scrollbacks. 
We began by testing how many times participants scrolled backward through the expository passage and the persuasive passage (a between-passages test). Figure 1 displays the medians for both passages (0.76 for the expository passage and 1.00 for the persuasive passage). The median difference between these two passages was -0.25, suggesting that scrollbacks were used 0.25 more times during the persuasive passage than the expository passage overall. This difference was not significant (Med = -0.25, P2.5 = -0.82, P97.5 = 0.35). The lack of difference between passages may mask differences within individuals between the passages. Specifically, we calculated difference scores for each individual on scrollbacks between the passages in order to investigate whether individuals used scrollbacks more often for the expository or the persuasive passage. This is in effect a within-subjects repeated measures test using bootstrapping. First, we tested the value of the absolute difference (by participant) in the usage of scrollbacks between the passages, computed by subtracting the number of scrollbacks in the persuasive passage from the number in the expository passage. The median of the resample from the bootstrap test was 0.99. This indicates that a participant at the 50th percentile of difference scores had a difference in scrollback usage between the passages of 0.99, regardless of which passage they used the greater number of scrollbacks for. This median difference was significantly different from zero (Med = 0.99, P2.5 = 0.72, P97.5 = 1.32). Next, to test our hypothesis that the persuasive passage would elicit more evidence of metacognitive monitoring and control, we examined the directionality of these difference scores (i.e., did individuals use more scrollbacks for the persuasive passage?). Here we tested the value of the signed difference (retaining the positive or negative value of the difference score) in participants' usage of scrollbacks between the passages. The median difference was -0.24. This indicates that a participant at the 50th percentile of signed difference scores used 0.24 more scrollbacks for the persuasive text than the expository text. However, this was not a significant difference (Med = -0.24, P2.5 = -0.61, P97.5 = 0.14). This evidence suggests that there is in fact a main effect for scrollbacks between passages, but that the directionality (i.e., which passage was greater) was non-significant in this sample.

Calibration. To test participants' calibration between passages, we calculated both absolute accuracy and bias, using a procedure similar to that of Nietfeld, Cao, and Osborne (2005). For absolute accuracy, we calculated the difference between participants' overall confidence on the multiple-choice items (on a 100-pixel scale) and their corresponding performance on those posttest multiple-choice items (percent correct on the posttest). Since we used a targeted response model, we divided the scores across the eight items by 32 (the maximum possible score on all eight items) instead of the total number of items, as Nietfeld et al. (2005) did. We then took the absolute value of these differences to get an absolute accuracy score for each individual. For bias, we used the same procedure except that we retained the signed value of the difference score between confidence and performance to see whether participants were over- or under-confident.
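The sketch below illustrates how the calibration indices and the bootstrap test described above could be computed: absolute accuracy is the unsigned difference between mean confidence (on the 0-100 scale) and performance expressed as a percentage of the 32 possible targeted-response points, bias retains the sign, and the resampling routine draws 5,000 bootstrap samples of participants and reports the median of the difference scores with a 95% percentile interval. The function names, data layout, and example values are illustrative assumptions, not the authors' analysis code.

```python
import numpy as np

def calibration_scores(confidences, item_scores, max_points=32):
    """Return (absolute accuracy, bias) for one participant and passage.

    `confidences` holds the eight 0-100 confidence ratings; `item_scores`
    holds the corresponding targeted-response points (0, 1, 2, or 4).
    Performance is rescaled to the same 0-100 range as confidence.
    """
    confidence = float(np.mean(confidences))
    performance = 100.0 * float(np.sum(item_scores)) / max_points
    bias = confidence - performance  # positive = overconfident
    return abs(bias), bias

def bootstrap_median(diff_scores, n_resamples=5000, seed=0):
    """Bootstrap the median of per-participant difference scores.

    Returns the median of the resampled medians and the 2.5th and 97.5th
    percentiles; the null hypothesis of no difference is rejected when
    that interval excludes zero.
    """
    rng = np.random.default_rng(seed)
    diffs = np.asarray(diff_scores, dtype=float)
    medians = np.empty(n_resamples)
    for i in range(n_resamples):
        resample = rng.choice(diffs, size=diffs.size, replace=True)
        medians[i] = np.median(resample)
    return (np.median(medians),
            np.percentile(medians, 2.5),
            np.percentile(medians, 97.5))

# Example: signed bias difference scores (expository minus persuasive)
# for a handful of hypothetical participants.
med, p_lo, p_hi = bootstrap_median([4.0, -2.5, 7.0, 1.5, -0.5, 3.0])
print(f"Med = {med:.2f}, P2.5 = {p_lo:.2f}, P97.5 = {p_hi:.2f}")
```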
We hypothesized that participants would be better calibrated for the persuasive passage than the expository passage. Figure 2 presents the absolute accuracy and bias scores between the passages. Lower difference scores indicate that participants were better calibrated. Medians for absolute accuracy were 11.41 for the expository passage and 11.45 for the persuasive passage. The median difference between the two passages for these participants on absolute accuracy was Metacognitive Monitoring 20 -0.075. This indicates that a participant with an absolute difference score at the 50th percentile was more closely calibrated on the expository passage than the persuasive passage by 0.075 points, regardless of whether they were overconfident or under-confident on the passages. This difference was not significant (Med = -0.075, P2.5 = -3.88, P97.5 = 3.97). Further, we predicted that participants would be overconfident for the expository passage, but not so for the persuasive passage. Figure two shows that, in fact, participants were overconfident for the expository passage (Med = 3.97) and under-confident for the persuasive passage (Med = -4.63). This indicates that a participant with a signed difference score at the 50th percentile was overconfident on the expository passage by 3.97 points (confidence was higher than performance) and under-confident on the persuasive passage by -4.63 points (performance was higher than confidence). The median difference between the passages was significant (Med = -8.59, P2.5 = -14.74, P97.5 = -2.39). This evidence suggests that while absolute accuracy did not differ between the two passages, the manner in which they differed (i.e. over- or underconfidence) did. Think aloud. Figure 3 presents the differences in median number of utterances for metacognitive knowledge, metacognitive experiences, and goals/activation of strategies within the 35 participants in the think-aloud condition. Again, we tested the differences within individuals by subtracting the number of metacognitive knowledge utterances in the persuasive passage from the metacognitive knowledge utterances in the expository passage. The medians for metacognitive knowledge utterances were 3.14 and 1.23 for the expository and persuasive passages respectively. The median difference between passages at the individual level for metacognitive knowledge was 1.91. This indicates that a participant with a metacognitive knowledge difference score at the 50th percentile made 1.91 more metacognitive knowledge Metacognitive Monitoring 21 utterances in the expository passage than the persuasive passage. This difference was significant (Med = 1.91, P2.5 = 0.91, P97.5 = 3.09). The medians for metacognitive experience utterances were 2.71 and 5.60 for the expository and persuasive passages respectively. The median difference between passages was 2.77. This indicates that a participant with a metacognitive experience difference score at the 50th percentile made 2.77 more metacognitive experience utterances in the persuasive passage than the expository passage. This difference was also significant (Med = -2.77, P2.5 = -4.52, P97.5 = 1.11). The medians for goals/activation of strategies were 1.54 and 2.11 for the expository and persuasive passages respectively. The median difference between passages was -0.57. This indicates that a participant with a goals/activation difference score at the 50th percentile made 0.57 more goals/activation of strategy utterances in the persuasive passage than the expository passage. 
This difference was also significant (Med = -0.57, P2.5 = -1.11, P97.5 = -0.29). Overall, this evidence demonstrates that the type of text may elicit not only more metacognitive monitoring and control, but also different types of metacognitive monitoring and control.

Effects of the Think-Aloud Protocol on Metacognitive Monitoring and Control

Next, we turn to an examination of the differences between the think-aloud and no think-aloud groups with regard to scrollbacks and calibration. We predicted that thinking aloud would elicit greater metacognitive monitoring and control. To test the hypotheses about the difference between the think-aloud and no think-aloud conditions, we again relied on the non-parametric bootstrap. For the following analyses, differences in scrollbacks and calibration were examined for passages within participants (i.e., difference scores) between the two groups (essentially a repeated measures test of passage effects using the think-aloud groups as the between-subjects effect).

Scrollbacks. To test the hypothesis that participants in the think-aloud group would demonstrate more metacognitive monitoring and control via scrollbacks and calibration, we conducted a bootstrap test with a null hypothesis that the difference in the medians of each group equaled zero. Figure 4 presents these data for scrollbacks. First, we looked to see whether there were differences in the absolute (unsigned) difference in scrollbacks between passages for each participant. The absolute median difference between scrollbacks for each individual on the passages was 0.64 for both the think-aloud and no think-aloud groups. The difference between these two medians was not significant (Med = 0.00, P2.5 = -0.98, P97.5 = 0.51). This indicates that a participant at the 50th percentile of the think-aloud group had the same absolute difference in scrollbacks between passages as a participant at the 50th percentile of the no think-aloud group. Further, we examined whether the groups scrolled back differently for one passage versus the other by retaining the signed difference. For the think-aloud group, the median difference in scrollbacks was -0.36, indicating that a participant at the 50th percentile of the think-aloud group scrolled back 0.36 times more often in the persuasive passage than the expository passage. For the no think-aloud group, the median difference in scrollbacks was -0.13, indicating that a participant at the 50th percentile of the no think-aloud group scrolled back 0.13 times more often in the persuasive passage than the expository passage. The difference between the two groups on scrollbacks between the passages was not significant (Med = -0.23, P2.5 = -0.98, P97.5 = 0.52). These tests indicate that there were no between-subjects effects for the think-aloud condition.

Calibration. To test the hypothesis that the think-aloud group would be more closely calibrated than the no think-aloud group, we conducted a bootstrap test with a null hypothesis that the difference between groups was zero. This is the between-subjects effect (think-aloud condition) for the repeated measures (i.e., the passages) for calibration. Results for both absolute accuracy and bias are presented in Figure 5. The first examination was of differences in the groups' absolute accuracy (the unsigned difference between confidence and performance).
There were no significant differences in absolute accuracy between the two groups for either the expository passage (Med = -0.33, P2.5 = -4.17, P97.5 = 3.34) or the persuasive passage (Med = 1.31, P2.5 = -2.71, P97.5 = 5.28). The second examination was for bias (the signed difference between confidence and performance). The difference between the two groups (i.e., think-aloud and no think-aloud) for the expository passage was 0.94. This means that a participant at the 50th percentile of the think-aloud group was more under-confident than a participant at the 50th percentile of the no think-aloud group, though this was not statistically significant (Med = 0.94, P2.5 = -5.01, P97.5 = 7.09). These differences were also not significant for the persuasive passage, with a median difference between the two groups of 0.89. This indicates that a participant at the 50th percentile of the think-aloud group was more under-confident than a participant at the 50th percentile of the no think-aloud group (Med = 0.89, P2.5 = -5.42, P97.5 = 6.79). These tests indicate that there were no significant differences in the between-subjects effects for think-aloud condition.

Effect of Domain Expertise on Metacognitive Monitoring and Control

The participants for the study were chosen specifically because their various levels of expertise were hypothesized to differ. Although the undergraduates (i.e., those from the human development and government and politics classes) were similar to each other in terms of their GPA and cumulative college credits completed, they differed in both their prior knowledge of and interest in the judicial review process. Table 3 and Figure 6 show the differences in mean levels of prior knowledge and topic interest across the three participant pools (which were continuous, normally distributed data). As we would expect, there is a clear increase in both prior knowledge and topic interest from the human development undergraduates (those in acclimation), to the government and politics undergraduates (those in competence), to the practicing attorneys (those in proficiency). An omnibus ANOVA indicated that there were significant differences in both prior knowledge (F = 12.72, df = 2, p < 0.01) and topic interest (F = 9.59, df = 2, p < 0.01) between these three groups. Contrasts (i.e., Fisher's LSD) indicated that there were also significant differences between the human development undergraduates and the government and politics undergraduates in both prior knowledge (Mdif = 6.46, SE = 1.54, p < 0.01) and topic interest (Mdif = 14.46, SE = 4.08, p < 0.01). There were also significant differences between the human development undergraduates and the practicing attorneys in both prior knowledge (Mdif = 12.85, SE = 3.50, p < 0.01) and topic interest (Mdif = 29.33, SE = 8.92, p < 0.01). However, significant differences were not found between the government and politics undergraduates and the practicing attorneys in either prior knowledge (Mdif = 6.39, SE = 3.50, p = 0.072) or topic interest (Mdif = 14.86, SE = 8.96, p = 0.10). We contend that these differences were not detected due to the small sample size of the practicing attorneys combined with the conservative nature of contrasts such as Fisher's LSD. Since significant differences were found between the two undergraduate participant pools and we were able to obtain large enough samples, an examination of the differences between these two groups with regard to scrollbacks and calibration was undertaken.
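For readers who wish to run the same kind of group comparison on their own data, the sketch below pairs a one-way omnibus ANOVA with Fisher's LSD pairwise contrasts (unadjusted t-tests on a pooled within-group error term), the combination reported above. The group scores are placeholder values, and the function is a generic illustration rather than the analysis script used in the study.

```python
import numpy as np
from scipy import stats

def fisher_lsd(groups, labels):
    """One-way omnibus ANOVA followed by Fisher's LSD pairwise contrasts."""
    f_stat, p_omnibus = stats.f_oneway(*groups)
    print(f"Omnibus ANOVA: F = {f_stat:.2f}, p = {p_omnibus:.3f}")

    # Pooled error term (mean square within) with N - k degrees of freedom.
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    df_error = n_total - k
    ss_within = sum(np.sum((np.asarray(g) - np.mean(g)) ** 2) for g in groups)
    ms_within = ss_within / df_error

    for i in range(k):
        for j in range(i + 1, k):
            m_dif = np.mean(groups[i]) - np.mean(groups[j])
            se = np.sqrt(ms_within * (1 / len(groups[i]) + 1 / len(groups[j])))
            t = m_dif / se
            p = 2 * stats.t.sf(abs(t), df_error)
            print(f"{labels[i]} vs. {labels[j]}: "
                  f"Mdif = {m_dif:.2f}, SE = {se:.2f}, p = {p:.3f}")

# Placeholder prior knowledge scores for the three participant pools.
fisher_lsd(
    [np.array([20.0, 24, 18, 22, 25]),  # human development (acclimation)
     np.array([27.0, 30, 26, 29, 31]),  # government and politics (competence)
     np.array([33.0, 36, 35, 34])],     # practicing attorneys (proficiency)
    ["acclimation", "competence", "proficiency"],
)
```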
To test the hypotheses about the difference between the human development undergraduates (i.e., those in acclimation) and the government and politics undergraduates (i.e., those in competence), we again relied on Metacognitive Monitoring 25 the non-parametric bootstrap. For the following analyses differences in scrollbacks and calibration were examined for passages within participants (i.e., difference scores) between the two groups (essentially a repeated measures test of passage effects using the developmental groupings as the between-subjects effect). Scrollbacks. To test the hypothesis that participants in the acclimation group would use scrollbacks more often in the expository passage than the persuasive passage, whereas the participants in the competence group would use scrollbacks more for the persuasive passage than the expository passage, we conducted a bootstrap test with a null hypothesis that the difference in the medians of each group equaled zero. Figure 7 presents these data for scrollbacks. First, we looked to see if there were differences in the absolute (unsigned) difference in scrollbacks between passages for each participant. The absolute median difference between scrollbacks for each individual on the passages was 0.57 for the acclimation group and 1.38 for the competence group. The difference between these two medians was significant (Med = -0.80, P2.5 = -1.38, P97.5 = -0.26). This indicates that a participant at the 50th percentile of the competence group used 0.80 more scrollbacks in one passage versus the other, regardless of which passage they used the greater number for. Further, we examined which of these passages had more scrollbacks in each of these groups by retaining the signed difference. For the acclimation group, the median difference in scrollbacks was 0.29, indicating that a participant in the 50th percentile of the acclimation group scrolled back 0.29 times more often in the expository passage than the persuasive passage. For the competence group, the median difference in scrollbacks was -0.84, indicating that a participant in the 50th percentile of the competence group scrolled 0.84 times more often in the persuasive passage than the expository passage. The difference between the two groups on Metacognitive Monitoring 26 scrollbacks between the passages was significant (Med = 1.08, P2.5 = 0.33, P97.5 = 1.73). This indicates that in addition to a repeated measures effect for passages (i.e., textual effects of metacognitive monitoring and control), there is also a between-subjects effect of developmental level. Calibration. To test the hypothesis that the group in competence would be more closely calibrated than the group in acclimation, we conducted a bootstrap test with a null hypothesis that the difference between groups was zero. This is the between-subjects effects (developmental group) for the repeated measures (i.e., the passages) for calibration. Results for both absolute accuracy and bias are presented in Figure 8. First, we looked to see if there were differences in their absolute accuracy (the unsigned difference between confidence and performance). There were no significant differences in absolute accuracy between the two groups for either the expository passage (Med = -1.74, P2.5 = -5.48, P97.5 = 1.86) or the persuasive passage (Med = 0.00, P2.5 = -4.05, P97.5 = 4.09). For bias, slight differences between the groups began to emerge, particularly for the persuasive passage. 
The difference between the two groups (i.e., acclimation and competence) for the expository passage was 0.71. This means that a participant at the 50th percentile of the competence group was slightly more overconfident compared to a participant at the 50th percentile of the acclimation group, though this was not statistically significant (Med = 0.71, P2.5 = -5.28, P97.5 = 6.59). However, for the persuasive passage this difference was greater. In fact, for our sample, a participant in the 50th percentile of the acclimation group was more underconfident (by 5.13 points) than a participant at the 50th percentile of the competence group, although this difference was not significant (Med = 5.13, P2.5 = -1.18, P97.5 = 11.55). Conclusions Metacognitive Monitoring 27 To our knowledge, this study was the first to investigate metacognitive monitoring and control between expository and persuasive text. Previous evidence of persuasion on knowledge and interest (Buehl et al, 2001) spurred us to investigate learners’ strategic processing with persuasive text. In addition, it was important to us that measures of metacognitive monitoring and control did not change or elicit different levels (quantity or quality) of participants’ mental processing. Moreover, by sampling from participant pools which we hypothesized would have varying levels of expertise (i.e., acclimation, competence, and proficiency) we were able to examine these differences among participants of different familiarity with the domain. Of the results presented above, the most surprising to us was the very limited use of the help-seeking feature (i.e., the glossary) by the participants of all expertise levels in this investigation. Given the active lines of research dealing with help seeking in the literature (e.g., Aleven & Koedinger, 2002), we expected participants to use help seeking in at least one of the two passages. Two reasons may underlie the limited use of help seeking here. One, it may be a reflection of the differences in task environment, and two, it may be a reflection of the participants' motivation. First, unlike Aleven and Koedinger’s work (which primarily deals with well-structured tasks such as solving geometry problems), the task environment here was an ill-structured task, comprehending text. Since participants were not required to find “an answer” to a problem, but rather try to comprehend the passage, the participants may have been unaware that they needed to seek help (a monitoring problem). Additionally, the accessibility of the help seeking feature may make a difference in their probability of using the feature. For example, if the environment (such as a cognitive tutor) prompts students with a help-seeking option, they may be more likely to examine these features. In this investigation, participants were told the help-seeking feature Metacognitive Monitoring 28 was available, but were not prompted during the task to use this feature. Second, participants’ motivation may have played a role in the limited use of help seeking within this study (a control problem). Participants may know they do not understand a term, but lack the interest or need to comprehend the passage to actually seek help. This finding is particularly helpful in attempts to structure environments (computerized and otherwise) that encourage participants to monitor and control their mental processes. 
However, with these results in mind, we caution that if participants are prompted to seek help, this does not mean that they will be able or willing to seek help on their own. This is particularly salient in the literature dealing with metacognition and self-regulated learning since a large percentage of studies use some form of prompting (Dinsmore, et al, 2008). Textual Influences on Metacognitive Monitoring and Control Rereading differed across conditions, as we found evidence that participants used scrollbacks differently across the two passages, but that the participants did not necessarily use scrollbacks more for the persuasive passage than for the expository passage. One possible explanation may relate to working memory demands, while the other is a limitation with the choice of measure in this investigation. If in fact, as Kellogg (2001) found, that persuasive text places greater demands on working memory than expository text, this may explain why some participants reread more for the persuasive text and others reread more for the expository text. In order to deal with higher demands on working memory, one might need to use strategies, such as rereading to deal with the higher demands of the persuasive passage. On the other hand, it is also possible that the high demands on working memory make monitoring and control more costly, causing one to reread less of the persuasive passage. We suspect that some of these issues will be clarified as we examine rereading among the domain expertise groups. It would be our Metacognitive Monitoring 29 contention that prior knowledge and interest of the individual may help explain these findings in regards to rereading. An alternative explanation may involve the limitations of using scrollbacks to measure rereading. We were only able to detect when participants scrolled back more than three lines. Since we had collected think-aloud data for some individuals, an inspection of these transcripts revealed that participants reported rereading more times than they had scrolled back (i.e., going back one or two lines). We will continue to examine participants' strategic moves through text with measures more fine-tuned than scrollbacks. For example, we are hoping that using eyetracking methodology will help us examine strategic moves through different types of text. While absolute accuracy did not differ between passages as we expected, the difference in bias (i.e., overconfidence and under-confidence) was significant. We can forward two possible explanations for this finding. First, the difference in bias may relate to participants relative familiarity in reading expository and persuasive passages. A large majority of the participants in the study were university undergraduates who read mostly expository texts (i.e., textbooks) for their classes. Familiarity with this type of text may increase their confidence to levels beyond their actual performance. Conversely, their relative unfamiliarity with persuasive texts (especially in the classroom environment) may make them less confident in their ability to comprehend the text. As we collect more data for practicing attorneys, we hypothesize that their familiarity with legal briefs (a type of persuasive text) may moderate their bias scores. Overall, metacognitive experience and goals/activation of strategies were higher for the persuasive passage, while metacognitive knowledge was higher for the expository passage. 
This finding makes sense to us, since the expository passage was primarily a collection of declarative facts (e.g., “Since Marbury v. Madison, about 150 federal laws have been struck down in whole Metacognitive Monitoring 30 or in part, along with about 1000 state laws and more than 100 municipal ordinances.”). Making connections to their prior knowledge (e.g., “I knew that”, “I didn’t know that”) was the main metacognitive monitoring activity during this expository passage. Whereas, in the persuasive passage participants had to evaluate both comprehension and agreement (e.g., “I don’t understand that”, “I agree with that”) in order to analyze the arguments being presented in the passage (e.g., “Gonzales, arguing against judicial activism, states that courts should be very careful in taking the step of declaring that a law or agency action is unconstitutional.”). Interestingly, there were more utterances of goals/activation of strategies in the persuasive passage. This may indicate increased engagement with the text, especially since the participants had to evaluate both comprehension and agreement more closely in the persuasive text than the expository text. This finding supports the explanation that perhaps scrollbacks are not fine grained enough to differentiate the activation of strategies in these two types of texts. Effects of the Think-Aloud Protocol on Metacognitive Monitoring and Control In line with previous studies (e.g., Veenman et al, 1993), we did not find significant differences in metacognitive monitoring and control as measured by scrollbacks and calibration. However, this does not mean that differences do not exist. It may be the case that our ability to detect these differences was limited by our measures. For example, the rereading was operationalized as scrolling back through more than three lines of text. It is possible, as we stated above, that participants looked back one or two lines more often in one of the conditions, but that this difference was undetectable in our data. Although we found no significant differences here, we are still unsure that the thinkaloud protocol has no impact on metacognitive monitoring and control, especially given the evidence that this protocol significantly affects participant outcomes (Karahasanović, Hinkel, Metacognitive Monitoring 31 Sjøberg, and Thomas, 2009). Since we have other data on participant outcomes in addition to the multiple-choice items, we plan to investigate whether this is the case in this study as well. Additionally, as Greatorx and Süto (2008) reported in their descriptive study, participants reported varied experiences with the think-aloud protocol. An examination of Table 4 shows that while the means for scrollbacks are similar for the two conditions, the standard deviation for the think-aloud condition was higher. In fact Box’s Test (which tests the equality of the covariance matrices between groups) was significant (F = 3.07, df = 3, 1559607, p < 0.05). This finding is in line with what Greatorex and Süto (2008) found in their descriptive study. We can forward two explanations for this difference in variance between the groups. The literature suggests that the directions (specifically whether they chose to read out loud or not) may have primed some participants to engage in certain behaviors, strategic and otherwise (Bannert & Mengelkamp, 2008). 
Although we found no significant differences here, we are still not certain that the think-aloud protocol has no impact on metacognitive monitoring and control, especially given the evidence that this protocol significantly affects participant outcomes (Karahasanović, Hinkel, Sjøberg, & Thomas, 2009). Since we have other data on participant outcomes in addition to the multiple-choice items, we plan to investigate whether this is the case in this study as well. Additionally, as Greatorex and Süto (2008) reported in their descriptive study, participants' experiences with the think-aloud protocol varied. An examination of Table 4 shows that while the means for scrollbacks are similar for the two conditions, the standard deviation for the think-aloud condition was higher. In fact, Box's test (which tests the equality of the covariance matrices between groups) was significant (F = 3.07, df = 3, 1559607, p < .05), which is in line with the varied experiences Greatorex and Süto described. One explanation for this difference in variance between the groups comes from the literature: the directions (specifically, whether participants chose to read out loud or not) may have primed some participants to engage in certain behaviors, strategic and otherwise (Bannert & Mengelkamp, 2008).

Effect of Domain Expertise on Metacognitive Monitoring and Control

For the third question, a between-subjects effect of developmental level, we found significant differences between the groups' rereading behavior but not their calibration. This question is one of central importance, as studies comparing metacognition at different levels of expertise are limited in the contemporary literature (Dinsmore et al., 2008). First, we found that the government and politics students reread more for the persuasive passage than for the expository passage, which clarifies the within-subjects effects of the passages reported above. In fact, it was interesting that, unlike the trend for all participants together, the human development undergraduates as a group reread more for the expository text than for the persuasive text. Not only were these participants likely less familiar with persuasive text in the classroom environment, they were as a group less familiar with the topic. We contend that they were probably able to engage with the expository text more easily because it required less prior knowledge to comprehend. Conversely, it is likely that more prior knowledge would be necessary to understand and engage with the arguments presented for and against judicial activism, which would subsequently impact ease of comprehension.

We did not find significant differences for calibration, which is in line with previous research indicating that novices and experts do not necessarily differ in how well calibrated they are to a task (Lichtenstein & Fischhoff, 1980). Both groups (i.e., acclimation and competence) were overconfident for the expository passage and under-confident for the persuasive passage. Overall, these participants actually seemed to be fairly well calibrated; the median participant was only miscalibrated by about 11 points. We were surprised that most participants, particularly those in acclimation, were so accurate.

Considering metacognition through a developmental theory of expertise, such as the MDL, is crucial. When assigning course texts, students' prior knowledge and interest should be considered, as these factors indicate competence within a domain and the subsequent ease with which students can engage with particular types of text. As Fox et al. (2009) have cautioned, undergraduates are not as apt to learn from assigned texts as instructors often assume, and difficulty learning from text often stems from poor metacognitive monitoring and control (Wiley, Griffin, & Thiede, 2005). In order to further investigate the findings reported in this study, it may be necessary to collect more data from practicing attorneys so that we can examine differences among all levels of expertise, including those demonstrating proficiency.

Author Note

We would like to thank Emily Fox for her help in adapting the passages. We would also like to thank the members of the Disciplined Reading and Learning Research Laboratory for their helpful comments and feedback on this manuscript.

References

Aleven, V. A., & Koedinger, K. R. (2002). An effective metacognitive strategy: Learning by doing and explaining with a computer-based Cognitive Tutor. Cognitive Science, 26, 147-179.
Aleven, V. A., McLaren, B., Roll, I., & Koedinger, K. R. (2006). Toward meta-cognitive tutoring: A model of help seeking with a Cognitive Tutor. International Journal of Artificial Intelligence in Education, 16, 101-128.
Alexander, P. A. (1997). Mapping the multidimensional nature of domain learning: The interplay of cognitive, motivational, and strategic forces. In M. L. Maehr & P. R. Pintrich (Eds.), Advances in motivation and achievement (Vol. 10, pp. 213-250). Greenwich, CT: JAI Press.
Alexander, P. A., Murphy, P. K., & Kulikowich, J. M. (1998). What responses to domain-specific analogy problems reveal about emerging competence: A new perspective on an old acquaintance. Journal of Educational Psychology, 90, 397-406.
Allen, M. (1991). Meta-analysis comparing the persuasiveness of one-sided and two-sided messages. Western Journal of Speech Communication, 55, 390-404.
Bannert, M., & Mengelkamp, C. (2008). Assessment of metacognitive skills by means of instruction to think aloud and reflect when prompted. Does the verbalization affect learning? Metacognition and Learning, 3, 39-58.
Bates, M. (1975). The lady lives on blood. In A. Ternes (Ed.), Ants, Indians, and little dinosaurs (pp. 74-82). New York: Charles Scribner's Sons.
Bernardi, R. A. (1994). Validating research results when Cronbach's alpha is below .70: A methodological procedure. Educational and Psychological Measurement, 54, 766-775.
Bolick, C. (2007). A cheer for judicial activism. Retrieved January 21, 2008, from http://www.cato.org/pub_display.php?pub_id=8168
Buehl, M. M., Alexander, P. A., Murphy, P. K., & Sperl, C. T. (2001). Profiling persuasion: The role of beliefs, knowledge, and interest in the processing of persuasive texts that vary by argument structure. Journal of Literacy Research, 33, 269-301.
Carrell, P. L., & Connor, U. (1991). Reading and writing descriptive and persuasive texts. The Modern Language Journal, 75, 314-324.
Dahl, M., Allwood, C. M., & Hagberg, B. (2009). The realism in older people's confidence judgments of answers to general knowledge questions. Psychology and Aging, 24, 234-238.
de Bruin, A. B. H., Rikers, R. M. J. P., & Schmidt, H. G. (2007). Improving metacomprehension accuracy and self-regulation in cognitive skill acquisition: The effect of learner expertise. European Journal of Cognitive Psychology, 19, 671-688.
Dinsmore, D. L., Alexander, P. A., & Loughlin, S. M. (2008). Focusing the conceptual lens on metacognition, self-regulation, and self-regulated learning. Educational Psychology Review, 20, 391-409.
Dunlosky, J., Serra, M. J., Matvey, G., & Rawson, K. A. (2005). Second-order judgments about judgments of learning. Journal of General Psychology, 132, 335-346.
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.
Ericsson, K. A., & Simon, H. A. (1984). Protocol analysis: Verbal reports as data. Cambridge, MA: The MIT Press.
Flavell, J. H. (1979). Metacognition and cognitive monitoring: A new area of cognitive-developmental inquiry. American Psychologist, 34, 906-911.
Fox, E., Dinsmore, D. L., Maggioni, L., & Alexander, P. A. (2009, April). Factors associated with undergraduates' success in reading and learning from course texts. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.
Gonzales, A. (2007). Speech to the American Enterprise Institute. Retrieved January 21, 2008, from http://jurist.law.pitt.edu/paperchase/2007/01/gonzales-disparages-judicial.php
Greatorex, J., & Süto, W. M. I. (2008). What do GCSE examiners think of 'thinking aloud'? Findings from an exploratory study. Educational Research, 50, 319-331.
Johnson-Glenberg, M. C. (2005). Web-based training of metacognitive strategies for text comprehension: Focus on poor comprehenders. Reading and Writing, 18, 755-786.
Kamalski, J., Sanders, T., & Lentz, L. (2008). Coherence marking, prior knowledge, and comprehension of informative and persuasive texts: Sorting things out. Discourse Processes, 45, 323-345.
Karahasanović, A., Hinkel, U. N., Sjøberg, D. I. K., & Thomas, R. (2009). Comparing of feedback-collection and think-aloud methods in program comprehension studies. Behaviour & Information Technology, 28, 139-164.
Kellogg, R. T. (2001). Competition for working memory among writing processes. American Journal of Psychology, 114, 175-191.
Lichtenstein, S., & Fischhoff, B. (1980). Training for calibration. Organizational Behavior and Human Performance, 26, 149-171.
Microsoft. (2008). The judicial branch. Retrieved January 21, 2008, from http://encarta.msn.com/encyclopedia_761595623/Judicial_Branch.html
Miller, P. H., Kessel, F. S., & Flavell, J. H. (1970). Thinking about people thinking about people thinking about...: A study of social-cognitive development. Child Development, 41, 613-623.
Moos, D. C., & Azevedo, R. (2008). Self-regulated learning with hypermedia: The role of prior domain knowledge. Contemporary Educational Psychology, 33, 270-298.
Murphy, P. K., Long, J. F., Holleran, T. A., & Esterly, E. (2003). Persuasion online or on paper: A new take on an old issue. Learning and Instruction, 13, 511-532.
Nietfeld, J. L., Cao, L., & Osborne, J. W. (2005). Metacognitive monitoring accuracy and student performance in the postsecondary classroom. Journal of Experimental Education, 74, 7-28.
Parkinson, M. M., & Dinsmore, D. L. (in preparation). Calibrating calibration.
Rhodes, M. G., & Castel, A. D. (2008). Metacognition and part-set cuing: Can interference be predicted at retrieval? Memory & Cognition, 36, 1429-1438.
Shapiro, A. M. (2008). Hypermedia design as learner scaffolding. Educational Technology Research and Development, 56, 29-44.
Thiede, K. W., & Dunlosky, J. (1994). Delaying students' metacognitive monitoring improves their accuracy in predicting their recognition performance. Journal of Educational Psychology, 86, 290-302.
Veenman, M. V. J., Elshout, J. J., & Groen, M. G. M. (1993). Thinking aloud: Does it affect regulatory processes in learning? Tijdschrift voor Onderwijsresearch, 18, 322-330.
Wiley, J., Griffin, T. D., & Thiede, K. W. (2005). Putting the comprehension in metacomprehension. Journal of General Psychology, 132, 408-428.
Williams, J. P., Stafford, K. B., Lauer, K. D., Hall, K. M., & Pollini, S. (2009). Embedding reading comprehension training in content-area instruction. Journal of Educational Psychology, 101, 1-20.

Appendix A
Expository Passage

The Judicial Branch, the portion of the United States national government that decides cases arising under federal laws and under the Constitution of the United States. The judicial branch interprets laws that have been passed by the legislative branch (Congress) and approved by the president of the United States, who leads the executive branch. Article III of the Constitution vests the judicial power in "one supreme Court, and in such inferior courts as the Congress may from time to time establish." This means that apart from the Supreme Court, the organization of the judicial branch is left in the hands of Congress.
Beginning with the Judiciary Act of 1789, Congress created several types of courts and other judicial organizations, which now include lower courts, specialized courts, and administrative offices to help run the judicial system.

Federal courts have a leading role in interpreting laws, rules, and other government actions, and determining whether they conform to the Constitution. This function of judicial review was asserted in 1803 by Chief Justice John Marshall in the case of Marbury v. Madison. Judicial review includes both interpreting the law and judging cases. First, in Marshall's words, "it is emphatically the province and duty of the judicial department to say what the law is." This need to explain the law stems from the fact that the Constitution and many laws include vague words or phrases. The ambiguity of the Constitution's 14th Amendment, for example, makes it one of the most important sources of cases argued before the Supreme Court. The amendment guarantees citizens "due process of law" and "equal protection of the laws." The meaning of these phrases is unclear, leading to protracted court battles over the application of the 14th Amendment to groups such as racial minorities, women, people with disabilities, and legal and illegal aliens. Confusion and disagreement over the amendment have thrust the courts into disputes over affirmative action, abortion, sexual preferences, welfare benefits, and the rights of the disabled.

Striking down laws or practices that violate the Constitution is another function of judicial review. Although the Court voided few laws during its first hundred years, it proved much more willing to take such strong steps in the 20th century. Since Marbury v. Madison, about 150 federal laws have been struck down in whole or in part, along with about 1000 state laws and more than 100 municipal ordinances.

The courts do not always have the final say in settling issues of legal interpretation. Working together, Congress and the states can compel the courts to accept a legal principle by amending the Constitution. After the Supreme Court ruled that income taxes were unconstitutional in Pollock v. Farmers' Loan & Trust Co. in 1895, for example, Congress and the states ratified the 16th Amendment in 1913 to permit such taxes. Amending the Constitution is difficult and is usually time consuming, however. The president and members of Congress have their own ideas of what the Constitution permits, and on occasion they may try to impede or simply ignore the courts' decisions.

The president of the United States appoints federal judges, but these appointments are subject to approval by the Senate. Once confirmed by the Senate, federal judges have appointments for life or until they choose to retire. Federal judges can be removed from their positions only if they are convicted of impeachable offenses by the Senate, but this has happened on only a few occasions. The life-long appointments of federal judges makes it easier for the judiciary to stay removed from political pressure. The long terms mean that presidential appointees to federal courts will have an influence that lasts for decades, so the Senate closely scrutinizes many appointments, and sometimes blocks them altogether.

The federal courts—which include district courts, courts of appeal, and the Supreme Court—handle only a small part of the legal cases in the United States.
Most cases involve state and local laws, so they are tried in state and local courts rather than federal courts. Despite its relatively narrow jurisdiction, the caseload of the federal court system usually increases every year. To cope with the rapidly rising volume of work, Congress has repeatedly expanded the number of lower federal courts and judges.

Most federal cases start out in the district courts, which are trial courts—courts that hear testimony about the facts of a case. There are about 90 district courts, including one or more in each state, one in the District of Columbia, one in Puerto Rico, and three territorial courts with jurisdiction over Guam, the Virgin Islands of the United States, and other U.S. territories. Each district is assigned from 2 to 28 judges, and there are about 650 district court judges in all. Each year the district courts handle more than 250,000 civil cases and more than 45,000 criminal cases, but only a tiny percentage of the civil and criminal cases actually go to trial.

After a district court hears the facts of a case and issues a decision, the decision can be appealed to the second tier in the judicial branch, the courts of appeals. The appeals courts can consider only questions of law and legal interpretation, and in nearly all cases must accept the lower court's factual findings. An appeals court cannot, for example, consider whether the physical evidence in a case was enough to prove a person was guilty. Instead, the appeals court might consider whether the district court followed appropriate rules in accepting evidence during the trial.

The federal appeals courts system was created in 1891 to assist the Supreme Court with its workload. About 50,000 such appeals are filed every year. For appeals purposes, the United States is divided into 12 judicial areas called circuits, each with an appeals court containing from 6 to 28 judges. Every state, territory, and the District of Columbia belongs to an appeals circuit. An additional appeals court, the Court of Appeals for the Federal Circuit, has nationwide jurisdiction over major federal questions. Decisions of the appeals courts are final, unless the U.S. Supreme Court agrees to hear a further appeal.

In district courts, most cases are heard by a single judge. In the appeals courts, cases are usually heard by a panel of three or more judges. When all of the court's panels of judges sit together to hear a case, the court is said to be sitting en banc.

The United States Supreme Court is the highest court of the country. It consists of nine judges called justices, including a chief justice and eight associate justices. This number has remained steady for decades and now seems fixed, although in the 19th century the Court's size varied.

Appendix B
Persuasive Passage

Judicial activism has always been a subject of argument, but is now getting more attention, particularly due to recent court decisions, such as Hamdan v. Rumsfeld. In this case, a federal court decided that the Executive Branch could not hold certain suspects without trial indefinitely. Judicial activism is viewed by its critics, such as Alberto Gonzales, as "the judiciary overstepping the bounds set by the Constitution." On the other hand, supporters of a strong, active judiciary, such as Clint Bolick, feel that recent cases in which judges have been described as "activist" are actually examples of the judiciary upholding its constitutional role and protecting the rights of individuals.
Both sides base their supporting arguments on historical grounds, on checks and balances of the Constitution, and on citizens' rights.

Historical references are used as support both by critics and by those in favor of judicial activism. Gonzales uses the writers of the U.S. Constitution to support his argument against activism, saying that he does not believe those who wrote the Constitution ever intended that judges or courts would take on the role of making policy. He refers to Alexander Hamilton's statement in the Federalist Papers in which Hamilton says that the judicial branch of the government will have the least power to endanger political rights because of the limited nature of the functions assigned to it in the Constitution. Bolick uses similar but more compelling historical references to make his case in favor of a stronger role for the courts. He argues that judicial review, the power to invalidate unconstitutional laws, was essential to the type of government established by our Constitution. He quotes James Madison, another writer of the Constitution, who argued that one role of the judicial branch will be to guard our individual rights from possible violation by the executive or legislative branches of government. For example, courts have found that certain anti-abortion legislation made by states violates the 14th amendment of the Constitution, which protects the "right to privacy." Therefore, many state laws regarding abortion have been deemed unconstitutional by the courts and thrown out. So the function of judicial review given to the courts by the Constitution actually gives them great power as the guardian of the constitutional rights of every citizen.

The checks and balances of the three branches of government are also used to both criticize and support an active role for the judiciary branch. The writers of the U.S. Constitution envisioned three separate but equal branches of the federal government. The checks and balances of the Constitution ensure that no one branch of government or person has too much power. Gonzales, arguing against judicial activism, states that courts should be very careful in taking the step of declaring that a law or agency action is unconstitutional. He says that lawmakers and Executive Branch officials have sworn to uphold the Constitution, just as judges do. Courts that too easily use the Constitution as a way to strike down the actions of the other branches may not be allowing the legislature and the President to exercise their proper constitutional roles. However, Bolick raises the counter-argument that the courts are well equipped to second-guess lawmakers' decisions that may be made too hastily or for the wrong reasons and that do not take into account all of the possible Constitutional issues. If legislators carefully considered the merits and constitutionality of legislation, then Gonzales's arguments might have merit. But our legislators rarely even read the complex bills they pass, which all too often are written to please outside interests, such as lobbyists who may have special interests or big business at heart. Judges, by contrast, look carefully at the competing evidence presented by both sides, as they should. If the courts did not check whether laws or decisions by the executive branch are actually in line with the Constitution, they would not be carrying out their own constitutional role. This would undo our checks and balances system and allow the legislative and executive branches to have too much power.
Protection of citizens' rights is another issue used both to criticize and to support an active role for judges and the courts. Gonzales agrees that the courts must protect people from situations where the wishes of the majority might go against an individual's constitutional rights. But he says that it is far more important to guard against the situation of having activist judges who undermine the right of the people to govern themselves. We elect lawmakers and our president and we have the right to expect that they will express the will of the majority – that is their job. And if they do not, we have the power to select different representatives and a different president in the next election. But when power is held by a few judges who are not elected and who can overturn the actions of our elected officials, we face a far greater danger. Yet, in posing this argument, Gonzales fails to take into account the other side of the problem, individual rights. Bolick says that the situation of unelected judges overriding the strong and clearly expressed wishes of a majority of the voters is extremely rare. A far greater problem is that judges do not take enough care to protect individual rights. The courts are much more likely to presume that laws and government actions are constitutional, making it much harder for individuals to prove that their rights have been violated. Even worse, courts have decided that the Constitution does not protect some very important individual rights against the interference of the government, including some related to the protections and privileges that go with being a citizen. So not only are courts ignoring legislation that is unconstitutional, they are interpreting the Constitution in a way that lets the government override the rights of individual citizens.

Gonzales concludes that if the people have decided they favor your policy goals at the ballot box, then you get a chance to set policy and make laws. He says that the party that controls Congress and has the votes to enact laws supporting their policies should be free to do so without contradiction from activist judges who disagree with those laws on political grounds. Bolick shifts the argument away from the narrow issue of politics. He argues instead that the importance of judicial activism revolves around the minority rights that are the essential element of the Constitution and our democracy. He says that a court gavel can be David's hammer against the Goliath of big government. Among our governmental institutions, courts alone are designed to protect the individual against the power of the majority, and against special interest groups with too much influence. We all have a stake in seeing that the judiciary does protect us, for as government expands with new demands, such as Homeland Security, our freedom depends on the willingness of courts to keep the government in line. For better or worse, the courts are the last line of defense against the government running roughshod over individual liberties. When judges swear allegiance to the Constitution, they must be aware of the danger of going beyond the proper bounds of their judicial power, but even more so of the greater danger of not using it enough.
Appendix C
Protocol for Think-Aloud Condition

Instructions for Think-aloud Protocol
"In this investigation, we are interested in what you think and do while you read a text. What we want you to do is say what you are thinking and doing out loud. You can decide for yourself whether you would like to read the text silently or out loud, or do some of both. Do whatever feels most natural to you. We are only interested in what you are thinking or doing as you read. For example, if you are going back to reread, please say that's what you are doing. If something in the text reminds you of prior experiences or things you already know, let us know. If you are thinking that you don't understand something, please say that, too. There are no right or wrong things to say here, just whatever is going through your head as you read. If you are quiet for a period of time, I'll ask you to say what you're thinking. Do you have any questions?"

Instructions for Practice Passage
"So that you can get comfortable with thinking aloud while you read, I'm going to give you a practice passage to read first. This is just a practice, and I won't be recording what you say. You can take your time and get used to how it feels. So, what I want you to do now is read the passage and say what you're thinking and doing out loud."

Table 1
Codes Used for Think-Aloud Transcripts

Metacognitive Knowledge (MK): Knowledge or beliefs that affect the course of mental operations about a person, task, or strategy. Examples: "Wow, I never knew that."; "Judicial activism, I'm pretty sure I know what that is."
Metacognitive Experience (ME): Cognitive or affective experiences that pertain to a mental operation. Examples: "I'm being distracted by noise outside."; "Ok, I didn't understand that part."
Goals and Activation of Strategies (G/AS): Realizing through a metacognitive experience and planning to evoke a strategy, and evidence of those strategies. Examples: "I'll just start that paragraph over."; "I'm going back, to re-read something."

Table 2
Descriptive Statistics of Metacognitive Monitoring and Control Across Think-Aloud Conditions and Across Passages

                      Min.      Max.     Mean (SD)
Scrollbacks           0.00      8.00     1.76 (2.04)
Help Seeking          0.00      3.00     0.13 (0.47)
Absolute Accuracy     2.00      64.00    22.89 (13.50)
Bias                  -49.88    25.00    -8.57 (15.26)

Table 3
Descriptive Statistics of the Three Participant Groups on Prior Knowledge and Interest in the Judicial Review Process

         Prior Knowledge                        Topic Interest
         Min.     Max.     Mean (SD)            Min.     Max.     Mean (SD)
HDU      23.00    58.00    41.65 (7.00)         12.60    85.20    45.70 (16.21)
GPU      35.00    59.00    48.11 (6.53)         9.90     87.10    60.16 (17.52)
PA       52.00    57.00    54.50 (2.08)         53.30    97.50    75.03 (18.63)

Note. HDU = human development undergraduates, GPU = government and politics undergraduates, PA = practicing attorneys.

Table 4
Descriptive Statistics of Metacognitive Monitoring and Control Between Think-Aloud Conditions and Across Passages

                      Think aloud                              No think aloud
                      Min.     Max.     Mean (SD)              Min.      Max.     Mean (SD)
Scrollbacks           0.00     8.00     2.03 (2.40)            0.00      6.00     1.53 (1.65)
Help Seeking          0.00     2.00     0.56 (0.33)            0.00      3.00     0.13 (0.56)
Absolute Accuracy     2.00     50.00    23.36 (11.73)          4.00      64.00    22.47 (15.06)
Bias                  -49.88   25.00    -8.57 (18.25)          -30.88    16.13    -8.58 (12.20)
Figure 1
Median Number of Scrollbacks by Passage
[Figure: median number of scrollbacks plotted for the expository and persuasive passages; y-axis: Number of Scrollbacks.]

Figure 2
Median Calibration Scores for Absolute Accuracy and Bias Between the Expository and Persuasive Passages
[Figure: median absolute accuracy and bias plotted for the expository and persuasive passages; y-axis: Calibration (Confidence-Performance).]

Figure 3
Differences in Think-Aloud Utterances for the Expository and Persuasive Passages
[Figure: median number of utterances per code plotted for the expository and persuasive passages; y-axis: Median Number of Utterances.]
Note: MK = Metacognitive Knowledge, ME = Metacognitive Experience, G/AS = Goals/Activation of Strategies.

Figure 4
Absolute and Signed Difference in Number of Scrollbacks Between the Expository and Persuasive Passages Among Think-Aloud and No Think-Aloud Conditions
[Figure: absolute and signed scrollback differences plotted for the think-aloud and no think-aloud conditions; y-axis: Difference in Scrollbacks (Expository-Persuasive).]

Figure 5
Absolute Accuracy and Bias of the Expository and Persuasive Passages Among the Think-Aloud and No Think-Aloud Conditions
[Figure: absolute accuracy and bias for each passage plotted for the think-aloud and no think-aloud conditions; y-axis: Calibration (Confidence-Performance).]

Figure 6
Differences in Prior Knowledge and Topic Interest Among the Three Participant Groups
[Figure: mean prior knowledge and topic interest scores plotted for the three participant groups; y-axis: Average Score.]
Note. HD = Human Development Undergraduates, GP = Government and Politics Undergraduates, PA = Practicing Attorneys.

Figure 7
Absolute and Signed Difference in Number of Scrollbacks Between the Expository and Persuasive Passages Among Human Development and Government and Politics Undergraduates
[Figure: absolute and signed scrollback differences plotted for the two undergraduate groups; y-axis: Difference in Scrollbacks (Expository-Persuasive).]
Note. HD = Human Development Undergraduates, GP = Government and Politics Undergraduates.

Figure 8
Absolute Accuracy and Bias of the Expository and Persuasive Passages Among Human Development and Government and Politics Undergraduates
[Figure: absolute accuracy and bias for each passage plotted for the two undergraduate groups; y-axis: Calibration (Confidence-Performance).]
Note. HD = Human Development Undergraduates, GP = Government and Politics Undergraduates.