A technique for automatically scoring open-ended concept maps

Ellen M. Taricani and Roy B. Clariana
The Pennsylvania State University, College of Education

Editorial contact: Dr. Roy Clariana, Penn State University, 1510 S. Quebec Way #42, Denver, CO 80231; RClariana@psu.edu; 303-369-7071

Educational Technology Research and Development, 54, 61-78.

Running Head: Scoring open-ended concept maps

Abstract

This descriptive investigation seeks to confirm and extend a technique for automatically scoring concept maps. Sixty unscored concept maps from a published dissertation were scored using a computer-based technique adapted from the Pathfinder network approach of Schvaneveldt (1990) and colleagues. The scores were based on the link lines drawn between terms and on the geometric distances between terms. These concept map scores were compared to terminology and comprehension posttest scores. Concept map scores derived from link data were more related to terminology, whereas concept map scores derived from distance data were more related to comprehension. A step-by-step description of the scoring technique is presented, and the next steps in the development process are discussed.

Keywords: computer-based assessment, Pathfinder networks, technology tools

A technique for automatically scoring open-ended concept maps

Currently there is considerable interest in the use of concept maps both to promote and to measure meaningful learning (Shavelson, Lang, & Lewin, 1994). Concept maps are sketches or diagrams that show the relationships among a set of terms by the positions of the terms and by labeled lines and arrows connecting some of the terms. Guided by Ausubel's (1968) theory, teachers and researchers, mainly in science education, have considered concept hierarchy, proposition correctness, and cross-concept links to be the most salient features of concept maps (Rye & Rubba, 2002).

Concept maps are most often scored by raters using rubrics to quantify content, meaning, and visual arrangement. Extensive empirical research has shown that the scoring approaches with the highest reliability and criterion-related validity compare specific features in student concept maps to those in expert referent maps (Ruiz-Primo & Shavelson, 1996). Because of the nature of the information in concept maps and the idiosyncrasies of individual maps, raters almost necessarily must be subject-matter experts in the content of the map. Also, scoring concept maps by hand takes time and experience. McClure, Sonak, and Suen (1999) reported that raters can be overwhelmed by complex scoring rubrics. After comparing six concept map scoring approaches, they concluded that the reliability of concept map scores decreases substantially as the cognitive complexity of the scoring task increases. Taken together, these issues militate against the casual use of concept maps for assessment in the classroom with all but the simplest scoring rubrics.

Several large-scale projects are underway that seek to automate concept map scoring (Cañas, Coffey, Carnot, Feltovich, Hoffman, Feltovich, & Novak, 2003; Herl, O'Neil, Chung, & Schacter, 1999; Luckie, 2001).
This investigation considers a concept map scoring technique, based on Pathfinder associative networks (Schvaneveldt, Dearholt, & Durso, 1988), that does not require raters.

Pathfinder Associative Networks

Pathfinder networks (PFNets) are a well-established method for representing knowledge that has been applied in a number of domains of interest to instructional designers (Jonassen, Beissner, & Yacci, 1993). PFNets are 2-dimensional graphic network representations of a matrix of relationship data in which concepts are represented as nodes and relationships as unlabeled links connecting the nodes. PFNets visually resemble concept maps, but without linking terms.

There are three steps in the Pathfinder approach. In Step 1, raw proximity data is collected, typically using a word-relatedness judgment task. Participants are shown a set of terms two at a time and judge the relatedness of each pair of terms on a scale from one (low) to nine (high). The number of pair-wise comparisons that participants must make is (n² − n)/2, with n equal to the total number of terms in the list. In Step 2, a software tool called Knowledge Network and Orientation Tool for the Personal Computer (KNOT, 1998) is used to reduce the raw proximity data into a PFNet representation. Pathfinder uses an algorithm to determine a least-weighted path that links all of the terms. The rules for calculating the least-weighted path can be adapted by adjusting parameters that reduce or prune the number of links in the resulting PFNet (refer to Dearholt & Schvaneveldt, 1990). The resulting PFNet is based on a data reduction approach that is purported to represent the most salient relationships in the raw proximity data. In Step 3, the similarity of the participant's PFNet to an expert referent PFNet is calculated, also using KNOT software (Goldsmith & Davenport, 1990). The total number of derived links shared by two PFNets is called links in common. Common is a positive integer that ranges from zero to the maximum number of links in the referent PFNet.

How can this Pathfinder approach be used to score concept maps? Several investigators have shown that concept map-like tasks capture some of the same types of relational information as word-relatedness judgment tasks (Jonassen, 1987; McClure et al., 1999; Rye & Rubba, 2002; Schau, Mattern, Zeilik, & Teague, 1999). Compared to word-relatedness judgment tasks, concept maps are relatively fast to complete and perhaps more authentic (Shavelson et al., 1994). In the present investigation, concept maps provide an alternative to word-relatedness judgment tasks in Step 1 for obtaining raw proximity data, while Steps 2 and 3 are conducted in the conventional way. Thus, the main contribution of this investigation is in clearly describing how components of a concept map can be converted into raw proximity data in Step 1, and then describing the validity and reliability of the scores obtained.

An Automatic Technique for Scoring Concept Maps

What information components of concept maps can be collected automatically? Concepts, links, and linking terms can be counted in various ways. In addition, following Kinchin (2000), Yin, Vanides, Ruiz-Primo, Ayala, and Shavelson (2004) have proposed that map structure complexity, as determined by examining the overall visual layout of the map, should also be considered.
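As a point of reference for the concept-map alternative discussed below, the following minimal Python sketch illustrates the conventional Step 1 word-relatedness judgment task; it is an illustration only, and the seven heart-related terms and placeholder ratings are hypothetical rather than drawn from the original studies.

# Sketch of Step 1 as a word-relatedness judgment task (illustration only;
# this investigation instead derives proximity data from concept maps).
# The terms and the placeholder ratings below are hypothetical.
from itertools import combinations

terms = ["left atrium", "lungs", "oxygenate", "pulmonary artery",
         "pulmonary vein", "deoxygenate", "right ventricle"]
n = len(terms)
pairs = list(combinations(terms, 2))      # each unordered pair is judged once
assert len(pairs) == (n * n - n) // 2     # (n^2 - n)/2 judgments in all

ratings = {pair: 5 for pair in pairs}     # stand-in for 1 (low) to 9 (high) judgments

# Store the judgments as the lower triangle of an n x n proximity matrix,
# the layout that Pathfinder-style tools expect.
lower_triangle = [[ratings[(terms[j], terms[i])] for j in range(i)]
                  for i in range(n)]
for term, row in zip(terms, lower_triangle):
    print(f"{term:16s} {row}")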
Automatically measuring these components of concept maps (concepts, links, linking terms, and overall layout) is easier with closed-ended concept mapping tasks, where the student is provided with a predefined list of concepts and linking terms, such as the concept map scoring software used by Herl et al. (1999). However, many investigators refer to open-ended concept mapping, where participants may use any concepts and linking terms in their maps, as the gold standard for capturing students' knowledge structures (McClure et al., 1999; Ruiz-Primo, Schultz, Li, & Shavelson, 1999; Yin et al., 2004, p. 24). But automatically scoring open-ended concept maps is considerably more difficult.

Clariana, Koul, and Salehi (in press) piloted a technique for scoring open-ended concept maps. Practicing teachers enrolled in graduate courses constructed concept maps on paper while researching the topic "the structure and function of the human heart and circulatory system" online. Participants were given the online addresses of five articles that ranged in length from 1,000 to 2,400 words but were encouraged to view additional resources. After completing their research, participants then used their concept map as an outline to write a 250-word text summary of this topic (see Clariana, 2003). Computer software tools (Clariana, 2002) were used to measure the geometric distances between terms in the concept maps, referred to as distance data, and to count the link lines that connected terms, referred to as link data (see Figure 1). Using Pathfinder KNOT software, the raw distance and link data were converted into PFNets and then were compared to an expert's PFNets to obtain similarity scores. Five pairs of raters using rubrics also scored all of the concept maps and text summaries. The correlations (Pearson r) between the concept maps scored by raters and the concept map link-based scores, the concept map distance-based scores, and the text summaries scored by raters were 0.36, 0.54, and 0.49, respectively. The correlations between the text summaries scored by raters and the concept map link-based and distance-based scores were 0.76 and 0.71, respectively. The authors concluded that these "automatically derived concept map scores can provide a relatively low-cost, easy to use, and easy to interpret measure of students' science content knowledge."

Figure 1. The link and distance data arrays of a simple map. [The figure shows lower-triangle link and distance arrays for seven terms (left atrium, lungs, oxygenate, pulmonary artery, pulmonary vein, deoxygenate, and right ventricle); link cells are coded 1 or 0 for the presence or absence of a drawn link line, and distance cells hold the geometric distances between term pairs.]

It is important to note that "link line" does not correspond to the term "proposition" used in the concept map literature. A proposition is the combination of two concepts (e.g., subject–predicate) and a linking term (e.g., verb) that describes the relationship between the two concepts. For example, in the proposition "the aorta is a blood vessel", "aorta" and "blood vessel" are concepts and "is a" is the linking term. Typically, propositions are scored based on correctness, which includes deciding whether the linking term is valid and significant for that context.
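To make the distinction between link data and distance data concrete, the short Python sketch below records a small map like the one in Figure 1 as a lower-triangle link array and a lower-triangle distance array. It is a minimal illustration rather than the authors' ALA-Mapper or KNOT code, and the term coordinates and drawn links are hypothetical stand-ins.

# Sketch of converting one drawn concept map into Figure 1 style raw data.
# Term positions and link lines below are hypothetical; in the study,
# ALA-Mapper recorded the actual on-paper positions and raters recorded links.
import math

positions = {                         # (x, y) location of each term on the page
    "left atrium":      (120, 80),
    "lungs":            (40, 160),
    "oxygenate":        (60, 220),
    "pulmonary artery": (200, 160),
    "pulmonary vein":   (160, 150),
    "deoxygenate":      (240, 220),
    "right ventricle":  (250, 90),
}
drawn_links = {                       # link lines actually drawn (unordered pairs)
    frozenset({"left atrium", "pulmonary vein"}),
    frozenset({"lungs", "pulmonary artery"}),
    frozenset({"lungs", "oxygenate"}),
    frozenset({"right ventricle", "pulmonary artery"}),
}
terms = list(positions)

def link_array(terms, links):
    """Lower-triangle similarity data: 1 if a link line was drawn, else 0."""
    return [[1 if frozenset({terms[i], terms[j]}) in links else 0
             for j in range(i)] for i in range(len(terms))]

def distance_array(terms, positions):
    """Lower-triangle dissimilarity data: geometric distance between terms."""
    return [[round(math.dist(positions[terms[i]], positions[terms[j]]))
             for j in range(i)] for i in range(len(terms))]

for label, array in (("Link", link_array(terms, drawn_links)),
                     ("Distance", distance_array(terms, positions))):
    print(f"{label} array (lower triangle):")
    for term, row in zip(terms, array):
        print(f"  {term:16s} {row}")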
Recently, Harper, Hoeft, Evans, and Jentsch (2004) reported that the correlation between just counting link lines and actually scoring correct and valid propositions in the same set of maps was r = 0.97, suggesting that the substantial extra time and effort required to specify and hand-score all possible linking terms adds little additional information over just counting link lines. Thus, the approach used by Clariana et al. (in press) is a significant departure from most concept map scoring approaches because it considers link lines, not proposition correctness. In addition, their technique uses the relative spatial location of concepts to communicate hierarchical and coordinate concept relations (Robinson, Corliss, Bush, Bera, & Tomberlin, 2003, p. 26).

This approach is founded on previous research on free association norms (Deese, 1965), on structural analyses of text propositions (Frase, 1969), on the Matrix Model of Memory (Humphreys, Bain, & Pike, 1989; Pike, 1984), and on current neural network models of cognition (McClelland, McNaughton, & O'Reilly, 1995). For example, in an untrained neural network, concept units (e.g., subjects, predicates, and linking words) are randomly associated, but as the network learns by experiencing many propositions, structural relationships emerge. Elman (in press) has shown that verbs (linking words) and nouns (subjects–predicates) separate into two distinct clusters, and then nouns sub-cluster based on their meaning (e.g., animate and inanimate). The present investigation assumes that the distances between concept terms in concept maps capture aspects of this underlying fundamental concept structure (e.g., subject–predicate associations). In addition, distance data may provide a direct measure of what Yin et al. (2004) call map structure complexity.

In a follow-up study, Clariana and Poindexter (2004) used the same Pathfinder scoring technique but asked participants to draw network maps rather than concept maps. Network maps are like concept maps, except that there are no linking terms. The mapping directions specifically instructed the participants to use spatial closeness to show relationships and intentionally deemphasized the use of link lines. Participants completed one of three print-based text lessons on the heart and circulatory system. The three lesson treatments included adjunct constructed-response questions, scrambled sentences, and a reading-only control. Participants completed three multiple-choice posttests that assessed identification, terminology, and comprehension, and then were handed a list of 25 preselected terms and asked to draw a network map. The adjunct question treatment was significantly more effective than the other lesson treatments for the comprehension outcome, and no other treatment comparisons were significant. Scores based on network map link data (relative to distance data) were more related to terminology, with Pearson's r = .77 compared to r = .69, while scores based on network map distance data (relative to link data) were more related to comprehension, with Pearson's r = .71 compared to r = .53. Thus the geometric distances between terms related more to the broader processes and functions of the heart and circulatory system, while the links drawn to connect terms related more to verbatim knowledge from the lesson text covering facts, terminology, and definitions.
The purpose of the present descriptive investigation is to confirm and extend these two previous experimental studies involving this computer-based technique for scoring open-ended concept maps. The ultimate goal of this line of research is to develop a software tool that allows a learner to create and save a concept map, automatically scores the map relative to an archived expert referent map, and provides the learner with specific feedback on the map. One necessary component of this software tool is the mathematical approach used to convert raw data into scores in Step 2. Therefore, this investigation compares multiple data reduction approaches for forming PFNets. The resulting scores are compared to traditional multiple-choice posttests of terminology and comprehension of the lesson content in order to estimate the concurrent criterion-related validity of the various concept map scores.

Method

A recent dissertation that used concept maps as an instructional treatment provides an ideal and convenient existing data set for the purposes of the present investigation (Taricani, 2002). Though these data were previously published, the present investigation does not reexamine the original research questions. Further, this computer-based scoring method was not available at the time of the original publication, and the concept maps were not scored by raters in that study. In Taricani's dissertation, undergraduate students were randomly assigned to one of five treatment conditions: a learner-generated concept map treatment with feedback, a learner-generated concept map treatment without feedback, a partially completed fill-in-the-blank concept map treatment with feedback, a partially completed fill-in-the-blank concept map treatment without feedback, and finally a reading-only, no map or feedback control treatment. Of these five, only the first two treatments involved creating a concept map, so only these two treatments are included in the present investigation.

Participants

Participants were freshman students at a large northeastern university (n = 60) recruited as volunteers from both science and non-science courses. They were randomly assigned to either the learner-generated concept map treatment with feedback (Feedback) or the learner-generated concept map treatment without feedback (No Feedback). The No Feedback treatment group had 17 males and 13 females, and the Feedback group, 12 males and 18 females. Participation was voluntary, and participants were rewarded with either extra course credit or pizza and ice cream for their participation but not for performance.

Materials

The print-based instructional text was a 1,900-word passage on the human heart developed by Dwyer (1972) called "The human heart: Parts of the heart, circulation of blood, and cycle of blood pressure." The text dealt with the parts of the heart and the internal processes that occur during the systolic and diastolic phases. The complexity of the information provided was suitable for these participants' general knowledge and comprehension levels.

Following the approach used by Novak and Gowin (1984), a two-page lesson on how to create a concept map was developed (available from Taricani, 2002, pp. 122-124). This two-page lesson included a description of concept mapping and an example of a hierarchical concept map. A short paragraph was presented that contained a familiar scenario about a student who took a walk to the campus library.
The walk scenario was selected as a metaphor of blood flow through the heart (e.g., the student moves from point to point, going past various buildings along the way, and blood flows from point to point, passing through various components along the way). After reading this paragraph, participants were asked to draw a concept map of the library walk scenario on a blank sheet of paper as practice. To foreshadow the lesson treatment, feedback in the form of an instructor-prepared hierarchical concept map of the library walk was given to the Feedback group after they had completed the drawing portion of the two-page concept map lesson. Feedback was not provided to the No Feedback group.

Posttests

The multiple-choice criterion posttest, originally developed and validated by Dwyer (1972), consisted of 20 questions that dealt with terminology and 20 questions that dealt with comprehension. The terminology test was designed to measure declarative knowledge of facts, terms, and definitions. The comprehension test was designed to measure a more thorough understanding of the processes of the human heart, with a specific focus on the functions of different parts of the heart. The KR-20 reliability for the posttest was 0.83.

Procedure

The procedure was similar for both the Feedback and No Feedback treatment groups. First, participants completed a demographic survey. Next they completed the two-page training lesson on how to draw a concept map, with or without feedback. Then participants read the three-page instructional text on the human heart and were asked to draw a concept map of that information on a blank piece of paper while reading the lesson text. Participants could use any terms and any number of terms in their maps. After completing their concept maps, the Feedback group was given an instructor-generated hierarchical concept map as feedback. They were asked to compare this feedback map to their own map and were allowed to change their map using pens of a different color. Only three of the participants made changes to their concept maps after viewing the feedback map. Finally, the lesson materials and concept maps were collected, and then all participants completed the multiple-choice posttests.

Establishing the Expert Concept Map Scoring Referent

An optimized expert referent concept map is required for comparison during the scoring process (i.e., in Step 3). With open-ended concept maps, participants may use any terms and any number of terms in their maps, but only those terms also included in an expert referent (including synonyms and metonyms) can count during scoring. In order to optimize the value of the information across the entire set of 60 concept maps, a frequency list containing all of the terms used by the participants was prepared. A common term, such as left ventricle, had a high frequency count, while a term such as white corpuscles had a low frequency count. Functionally equivalent terms, such as auricle and atrium, were listed together as acceptable synonyms. A biology content expert was provided with the instructional text and the frequency list of terms and was asked to draw a concept map of the structure and function of the human heart and circulatory system. The expert was not required to use terms from the list, though the list was intended to influence the expert's choice of terms. The resulting optimized expert referent concept map contained 26 terms with 36 links between terms.
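A brief Python sketch of the term pooling just described may be helpful. It is an illustration only: the participant term lists and the synonym groupings shown are hypothetical, but the idea of pooling terms from many maps into a frequency list, with functionally equivalent terms counted together, follows the procedure above.

# Sketch of building the term frequency list given to the content expert.
# The participant term lists and synonym groupings below are hypothetical;
# the study pooled the terms from all 60 maps and treated functional
# equivalents (e.g., auricle and atrium) as the same term.
from collections import Counter

synonyms = {"auricle": "atrium",
            "left auricle": "left atrium",
            "right auricle": "right atrium"}

participant_terms = [
    ["left ventricle", "aorta", "left auricle", "lungs"],
    ["left ventricle", "atrium", "white corpuscles"],
    ["left ventricle", "auricle", "aorta"],
]                                        # one list of terms per concept map

def canonical(term: str) -> str:
    term = term.lower().strip()
    return synonyms.get(term, term)

frequency = Counter(canonical(t) for terms in participant_terms for t in terms)
for term, count in frequency.most_common():
    print(f"{count:3d}  {term}")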
Recall that the instructor-generated concept map that served as feedback during the lesson was hierarchical, following the requirements of Novak and Gowin (1984). However, the optimized expert referent concept map developed after the fact for scoring purposes was clearly not hierarchical; it had a network structure that consisted of a cluster of terms in the center and two flow loops, one to the left and the other to the right of the central cluster. This nonhierarchical network structure for the expert's map is supported by Ruiz-Primo and Shavelson (1996) and Safayeni, Derbentseva, and Cañas (2003), who have questioned the requirement of imposing hierarchical structure on concept maps.

Step 1 – Collecting Raw Concept Map Proximity Data

Link Data

Concept map link data were collected following Clariana et al. (in press). First, the links on each concept map were entered into an array that can be analyzed by KNOT software. Each array listed the 26 terms used by the expert down the left column and across the top row, so that every cell in the array represents the relationship between two of the 26 terms. Following convention, since the data above the diagonal are redundant with data below the diagonal, each link array consisted of just the 325 elements in the lower triangle of the array (i.e., (26² − 26)/2). A "1" entered in a link array cell indicates that a link was drawn between those two terms on the concept map, while a "0" indicates that no link was drawn. Referring back to Figure 1, since a link is drawn between left atrium and pulmonary vein, a "1" is entered into the corresponding cell in the link array. Linking terms, such as "is a", "has a", and "contains", were not considered; only the presence or absence of each link line (ink on paper) was recorded. Link raw data is similarity data, where smaller values indicate a weaker relationship. When a student did not include one of the 26 terms, all possible links to that term were coded as "0" to indicate no relationship to the other terms. The inter-rater reliability for transferring links from the paper-based maps to the link array was 0.98.

Distance Data

Concept map distance data were collected using ALA-Mapper software (Clariana, 2002). ALA-Mapper converts the spatial location of the terms in a concept map into a distance array containing all of the pair-wise distances between terms (refer back to Figure 1). Each paper-based concept map was recreated in ALA-Mapper, a manual process that can introduce error, though care was taken to maintain the original spatial proportional relationships between the terms in the map. The inter-rater reliability for this process was 0.81. Distance data is dissimilarity data, where larger values indicate a weaker relationship. When a student did not include one of the 26 terms, all possible links to that term were coded as infinity to indicate no relationship to the other terms.

Step 2 – Reducing Raw Concept Map Data into PFNets

KNOT software was used to establish link-based and distance-based PFNets for each concept map by selecting the command "Analyze Proximity Data" found in the "Data" main menu. To establish PFNets, KNOT software requires information that describes the raw data and also two calculation parameters.
The required descriptive information includes: (a) the data structure (e.g., lower triangle), (b) whether the data is dissimilarity or similarity data (e.g., the link data are similarity data and the distance data are dissimilarity data), (c) the number of decimal places in the data (e.g., zero for both data sets), (d) the maximum value (e.g., set to "1" for the link data and to "1000" for the distance data), and (e) the minimum value (e.g., "0.1" for both data sets). Note that the maximum of 1000 for distance data and the minimum of 0.1 for link data are necessary in order to exclude missing terms from the analysis.

In addition, two KNOT software parameters, q and Minkowski's r, are used to define the rules for including and removing links; thus, these two parameters set the amount of data reduction that is desired. Minkowski's r can range from one to infinity, and q can range from two to n − 1 (where n is the number of terms). The r parameter is used in the calculation of the weight between terms (Dearholt & Schvaneveldt, 1990, p. 3). The q parameter is used to set the depth of the path that is considered during link pruning. Changing the r and q parameters usually results in changes in the resulting PFNets. Using the largest meaningful values of r and q produces a PFNet with the fewest links (the most data reduction), while using the smallest values of r and q produces a PFNet with the most links (the least data reduction).

The link raw data in this investigation provides an interesting exception. Because the link raw data consists only of ones and zeros, identical PFNets will be produced no matter what values are set for the r and q parameters. Further, there is no data reduction when creating these link-based PFNets. Here, the expert's link raw data contains 36 links, and the expert's link-based PFNet contains the same 36 links. For link data, the r and q parameters were set to the maximum values, with r equal to infinity and q equal to 25 (i.e., n − 1), because these are the recommended parameter values for ordinal-level data.

What values of the r and q parameters are appropriate for raw concept map distance data? Few specific guidelines exist for which value of Minkowski's r is best in which situation (Durso & Coggins, 1990). Most investigations use the maximum values of r (infinity) and q (n − 1), designated in the form PFNet (r, q), perhaps because this should reveal the most salient information in the raw data and also because most research involving Pathfinder uses ordinal-level data. Rather than use one set of values, Dearholt and Schvaneveldt (1990) state, "Frequently, important information from a given set of proximity data can be obtained from different PFNets, constructed using different values of r and q" (p. 5). Following this recommendation, four separate sets of PFNets were established from the raw distance data across a range of Minkowski's r values including 1, 2, 3, and infinity. Regarding the q parameter, because distance raw data are the actual distances between terms in 2-dimensional space, the value used for q is not as critical (regarding violations of the triangle inequality in Euclidean space, see Buzydlowski, 2002, p. 27). Thus, following convention, the q parameter was set to its maximum meaningful value of 25 (i.e., n − 1).
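To make the role of the r and q parameters concrete, the following Python sketch re-implements the PFNet(r, q) pruning rule described by Dearholt and Schvaneveldt (1990) in simplified form for dissimilarity data such as the distance arrays. It is an illustration applied to a small hypothetical matrix, not the KNOT program itself.

# Simplified sketch of PFNet(r, q) pruning for dissimilarity data (illustration
# only, not the KNOT implementation). A link i-j survives only if no indirect
# path of at most q links has a smaller Minkowski r-metric weight than the
# direct dissimilarity w[i][j].
import math

INF = math.inf

def combine(a, b, r):
    """Weight of two path segments joined under Minkowski's r."""
    if a == INF or b == INF:
        return INF
    return max(a, b) if r == INF else (a ** r + b ** r) ** (1.0 / r)

def pfnet(w, r=INF, q=None):
    """Links (j, i) of PFNet(r, q) for a symmetric dissimilarity matrix w."""
    n = len(w)
    q = n - 1 if q is None else q
    best = [row[:] for row in w]            # cheapest path of at most 1 link
    for _ in range(q - 1):                  # extend to paths of at most q links
        new = [row[:] for row in best]
        for i in range(n):
            for j in range(n):
                for m in range(n):
                    via = combine(w[i][m], best[m][j], r)
                    if via < new[i][j]:
                        new[i][j] = via
        best = new
    return {(j, i) for i in range(n) for j in range(i)
            if w[i][j] < INF and w[i][j] <= best[i][j]}

# Hypothetical 4-term distance-like matrix (symmetric, zero diagonal).
w = [[0, 3, 6, 9],
     [3, 0, 4, 8],
     [6, 4, 0, 5],
     [9, 8, 5, 0]]
print("PFNet(inf, n-1):", sorted(pfnet(w, r=INF)))   # fewest links
print("PFNet(1, n-1):  ", sorted(pfnet(w, r=1)))     # most links

Consistent with the description above, the larger r value prunes this small example down to a chain of three links, while r = 1 retains all six links; binary link arrays, by contrast, yield the same PFNet for any choice of r and q.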
Step 3 – Comparing PFNets to the Optimized Expert Referent

KNOT software was used to compare participants' PFNets to the optimized expert referent PFNet by selecting the command "Analyze Networks" found in the "Network" main menu. The software provides two highly correlated measures of similarity called common and configural similarity. Common is a global measure of similarity that is simply the total number of links shared by two PFNets. Common is reported in this investigation because it is easier to understand and also because it was a better predictor than configural similarity in the two previous studies.

Reliability of the Concept Map Scores

Because the expert's link-based PFNet contains 36 links, the participants' maximum possible score is 36 (see the column labeled "Maximum Possible" in Table 1). These link-based scores may be viewed as a 36-item test. A test item response table with 37 columns by 60 rows was created. Column 1 listed the 60 participants' names, while columns 2 through 37 listed the performance of each student for each of the expert's 36 links. Students received a "1" in the appropriate cell when their link agreed with the expert and a "0" when it did not. For the link-based scores with 36 possible links, Cronbach alpha was 0.86 for the No Feedback group and 0.76 for the Feedback group. In the same way, a test item response table for the distance-based scores was created (34 maximum possible for PFNet (2, 25)). Cronbach alpha was 0.80 for the No Feedback group and 0.66 for the Feedback group. Thus, the link-based scores are a little more reliable than the distance-based scores, and both link-based and distance-based scores for the No Feedback group are more reliable than those of the Feedback group.

Table 1. Means and standard deviations for the multiple-choice posttests and the alternate concept map scores.

                            Maximum   Observed   No Feedback Group   Feedback Group
                            Possible  Range      M       (SD)        M       (SD)
Multiple-Choice Posttests
  Terminology               20        1 to 19    9.5     (4.6)       8.3     (4.1)
  Comprehension             20        2 to 19    8.7     (4.1)       7.0     (3.7)
Concept Map Scores
  Link data
    PFNet (∞, 25)           36        1 to 27    11.5    (6.4)       11.8    (5.1)
  Distance data
    PFNet (1, 25)           65        1 to 65    23.4    (13.4)      27.3    (15.0)
    PFNet (2, 25)           34        1 to 21    10.3    (5.2)       9.5     (4.0)
    PFNet (3, 25)           31        1 to 17    9.2     (4.5)       8.4     (3.7)
    PFNet (∞, 25)           28        1 to 16    8.4     (4.2)       7.7     (3.4)

Results and Discussion

Though not of central interest in the present analysis, the seven posttest and concept map dependent variables listed in Table 1 were analyzed by MANOVA to determine whether the Feedback and No Feedback group means were different. Though the No Feedback group means for both the multiple-choice and the concept map scores tended to be a little larger than those of the Feedback group (see Table 1), none of the seven dependent variables were significantly different. For example, the greatest between-subjects effect was observed for the comprehension posttest scores, with F(1, 59) = 2.887, MSe = 0.038, p = .09. The remainder of this section will describe the relative characteristics of the concept map scores.

Estimating the Concurrent Criterion-related Validity of the Concept Map Scores

Correlation analysis is a common approach used for estimating concurrent criterion-related validity (Crocker & Algina, 1986). Pearson's correlation was used to compare the terminology and comprehension multiple-choice criterion posttests to each concept map score, as shown in Table 2.
Table 2. Correlation analyses of concept map scores with the multiple-choice posttests.

                          No Feedback           Feedback
                          Term      Comp        Term      Comp
Link data
  PFNet (∞, 25)           0.78*     0.54*       0.46*     0.30
Distance data
  PFNet (1, 25)           0.32      0.45*       0.02      0.05
  PFNet (2, 25)           0.48*     0.61*       0.26      0.11
  PFNet (3, 25)           0.40*     0.53*       0.36      0.18
  PFNet (∞, 25)           0.38*     0.49*       0.38*     0.22

(* p < .05; Term – terminology posttest; Comp – comprehension posttest)

First, the No Feedback and Feedback groups obtained very different values and patterns of correlations in Table 2, and few of the Feedback group correlations were significant. The low correlations observed for the Feedback group between multiple-choice posttest scores and concept map scores can be at least partially accounted for by attenuation due to the lower test reliability of the Feedback group's concept map scores, especially for the distance-based scores. Further, it is not obvious that concept map scores should strongly correlate with multiple-choice test scores; thus, Pearson r values in the 0.30 to 0.40 range seem reasonable and even expected. Thus, establishing both convergent and discriminant validity is an important step in understanding what concept maps measure.

However, the low correlations for the Feedback group may also be partially accounted for by the negative effect of concept maps as feedback (Tan, 2000). Recall that the main difference between the Feedback and No Feedback groups is the instructor-generated hierarchical concept map given as feedback to the Feedback group. Lee and Nelson (in press) suggest that presenting completed concept maps to learners as instructional materials places emphasis on "…unintentional imitation of its structure by the learners because the learners are given a concept map and cannot organize the concept map to make it fit with their internal knowledge structures." Similarly, Lambiotte and Dansereau (1992) report that "students with more well established schemas for the circulatory system performed less well when structure was imposed by an outline or a knowledge map (sic, concept map)" (p. 198). Safayeni et al. (2003) go so far as to suggest that hierarchical concept maps of the circulatory and other biological systems are incompatible with the actual understood nature of those systems (p. 12) and propose a new form of concept map called cyclic concept maps (Kinchin, 2000). Further, neither the participants nor the expert created hierarchical concept maps of this content, and so the instructor-generated hierarchical concept map given as feedback was likely difficult to reconcile since it was so different from the students' nonhierarchical maps. In hindsight, providing this hierarchical concept map as feedback was ill-advised. Future research should consider the effects of using cyclical versus hierarchical concept maps as lesson feedback.

Second, scores derived from concept map link data were, in general, more related to multiple-choice posttest performance than were those derived from concept map distance data. Again, this can be at least partially accounted for by attenuation due to the lower reliability of the distance-based concept map scores. On the other hand, because it takes extra work to draw the link lines, links drawn on paper are probably more salient to the individual creating the concept map than are the distances between terms.
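Two small calculations lie behind the reliability and attenuation points made above. The Python sketch below re-implements them for illustration (it is not the authors' code): Cronbach's alpha over a person-by-item table of 0/1 link agreements, and Spearman's classic correction for attenuation. The tiny item-response table and the observed correlation of .30 are hypothetical; the reliabilities of .66 and .83 are values reported in the text.

# Illustrative re-implementation (not the authors' code) of two calculations
# referred to above: Cronbach's alpha for a 0/1 item-response table, and
# Spearman's correction for attenuation. The small table and the observed
# r of .30 are hypothetical; alpha = .66 and KR-20 = .83 come from the text.
import math

def cronbach_alpha(table):
    """Cronbach's alpha for a person x item table of numeric (here 0/1) scores."""
    k = len(table[0])
    def variance(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
    item_vars = [variance([row[j] for row in table]) for j in range(k)]
    total_var = variance([sum(row) for row in table])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def disattenuate(r_observed, rel_x, rel_y):
    """Spearman's correction: r_true = r_observed / sqrt(rel_x * rel_y)."""
    return r_observed / math.sqrt(rel_x * rel_y)

# Hypothetical 6-person x 5-link table (1 = participant's link matches the expert's).
table = [[1, 1, 0, 1, 0],
         [1, 0, 0, 1, 0],
         [1, 1, 1, 1, 1],
         [0, 0, 0, 1, 0],
         [1, 1, 1, 0, 1],
         [0, 0, 1, 0, 0]]
print(f"alpha = {cronbach_alpha(table):.2f}")

# An observed r of .30 between a map score with reliability .66 and a posttest
# with reliability .83 corresponds to a noticeably larger disattenuated value.
print(f"corrected r = {disattenuate(0.30, 0.66, 0.83):.2f}")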
Third, identical to the findings reported by Clariana and Poindexter (2004), for the No Feedback group, distance-based scores relative to link-based scores were a better predictor of comprehension, r = .61 compared to r = .54, and link-based scores relative to distance-based scores were a better predictor of terminology, r = .78 compared to r = .48 (see Table 2). This finding suggests that link and distance data in the same map are capturing different aspects of the participants' knowledge.

General Discussion

This investigation confirms a technique for scoring open-ended concept maps using a Pathfinder network analysis approach. The concept map score reliability and the criterion-related correlation results indicate that the technique is capturing different aspects of participants' knowledge of the lesson content. Most automatic concept map scoring systems focus on proposition correctness, which depends entirely on the correctness of the linking term connecting the two concepts (see Cañas et al., 2003; Herl et al., 1999; Luckie, 2001). Based on our connectionist view of representing knowledge, the technique applied in this investigation disregards linking terms entirely yet still manages to capture information. In addition, our technique differs from other current automatic scoring approaches in that (a) it represents concept map raw data in vector arrays that allow for easier manipulation, aggregation, reduction, comparison, and analysis; (b) it provides a measure of map structural complexity that likely reflects the structure of the content; (c) participants' concept maps can easily be compared to multiple distinct forms of expert concept maps and to each other as required by the researcher; and (d) pragmatically, it can be applied to both closed-ended and open-ended concept maps.

The instructional content and the centrality of the expert referent map in this investigation limit the generalizability of these findings. First, the lesson content involved blood flow through the circulatory system, which is essentially a cyclical path, and Pathfinder networks are path representations (Safayeni et al., 2003). This concept map scoring technique may not be adequate for content that is not path-like. Second, the quality of the expert referent map is critical to the quality of the analyses, since every map is compared to the referent map. Future research should consider how consistently superior referent maps can be established, perhaps by using multiple expert maps.

The ultimate goal of this line of research is to develop a software tool that can be used to create, save, and automatically score concept maps, providing feedback on acceptable and less acceptable components of the map. Future research should consider using weighted and directional link lines. Students could select links that are dashed, thin, or thick to indicate degree of relatedness and single- or double-headed arrows to show the directionality of the link relationship. This type of data can easily be represented in the data arrays used in this investigation, and KNOT software can already accommodate this type of analysis. An important module of this software tool is the mathematical approach used to convert raw concept map data into scores. Besides PFNets, future research should consider alternate scoring approaches for directly analyzing raw link and distance data, such as correlation, factor analysis, and multidimensional scaling, as well as approaches for comparing visual representations from the fields of biology (e.g., protein visualization tools) and engineering (e.g., data mining). Further refinement of this technique for scoring concept maps is warranted.
References

Ausubel, D.P. (1968). Educational psychology: A cognitive view. New York, NY: Holt, Rinehart, and Winston.

Buzydlowski, J.W. (2002). A comparison of self-organizing maps and Pathfinder networks for the mapping of co-cited authors. Doctoral dissertation, Drexel University. Retrieved October 3, 2004, from http://faculty.cis.drexel.edu/~jbuzydlo/papers/thesis.pdf

Cañas, A.J., Coffey, J.W., Carnot, M.J., Feltovich, P., Hoffman, R.R., Feltovich, J., & Novak, J.D. (2003). A summary of literature pertaining to the use of concept mapping techniques and technologies for education and performance support. A report for The Chief of Naval Education and Training, Pensacola, Florida. Retrieved October 3, 2004, from http://www.ihmc.us/users/aCañas/Publications/ConceptMapLitReview/IHMC/Literature Review on Concept Mapping.pdf

Clariana, R.B. (2002). ALA-Mapper, version 1.0. Retrieved October 3, 2002, from http://www.personal.psu.edu/rbc4/ala.htm

Clariana, R.B. (2003). Latent semantic analysis writing prompt, used with the permission of Knowledge Analysis Technologies, LLC. Retrieved October 30, 2003, from http://www.personal.psu.edu/rbc4/frame.htm

Clariana, R.B., Koul, R., & Salehi, R. (in press). The criterion-related validity of a computer-based approach for scoring concept maps. International Journal of Instructional Media, 33(3).

Clariana, R.B., & Poindexter, M.T. (2004). The influence of relational and proposition-specific processing on structural knowledge. Presented at the annual meeting of the American Educational Research Association, April 12-16, San Diego, CA. Retrieved April 15, 2004, from http://www.personal.psu.edu/rbc4/aera_2004.doc

Crocker, L.A., & Algina, J. (1986). Introduction to classical and modern test theory. New York, NY: Holt, Rinehart, and Winston.

Dearholt, D.W., & Schvaneveldt, R.W. (1990). Properties of Pathfinder networks. In R.W. Schvaneveldt (Ed.), Pathfinder associative networks: Studies in knowledge organization (pp. 1-30). Norwood, NJ: Ablex Publishing Corporation.

Deese, J. (1965). The structure of associations in language and thought. Baltimore, MD: The Johns Hopkins Press.

Durso, F.T., & Coggins, K.A. (1990). Graphs in the social and psychological sciences: Empirical contributions of Pathfinder. In R.W. Schvaneveldt (Ed.), Pathfinder associative networks: Studies in knowledge organization (pp. 31-51). Norwood, NJ: Ablex Publishing Corporation.

Dwyer, F.M. (1972). A guide for improving visualized instruction. State College, PA: Learning Services.

Elman, J.L. (in press). An alternative view of the mental lexicon. Trends in Cognitive Science. Retrieved December 10, 2004, from http://www.crl.ucsd.edu/~elman/Papers/elman_tics_opinion_2004.pdf

Frase, L.T. (1969). Structural analysis of the knowledge that results from thinking about text. Journal of Educational Psychology, 60(6, Part 2), 1-16.

Goldsmith, T.E., & Davenport, D.M. (1990). Assessing structural similarity of graphs. In R.W. Schvaneveldt (Ed.), Pathfinder associative networks: Studies in knowledge organization (pp. 75-87). Norwood, NJ: Ablex Publishing Corporation.
Harper, M.E., Hoeft, R.M., Evans, A.W. III, & Jentsch, F.G. (2004). Scoring concept maps: Can a practical method of scoring concept maps be used to assess trainee's knowledge structures? Paper presented at the Human Factors and Ergonomics Society 48th Annual Meeting.

Herl, H.E., O'Neil, H.F., Jr., Chung, G.K.W.K., & Schacter, J. (1999). Reliability and validity of a computer-based knowledge mapping system to measure content understanding. Computers in Human Behavior, 15, 315-333.

Humphreys, M.S., Bain, J.D., & Pike, R. (1989). Different ways to cue a coherent memory system: A theory for episodic, semantic and procedural tasks. Psychological Review, 96, 208-233.

Jonassen, D.H. (1987). Assessing cognitive structure: Verifying a method using pattern notes. Journal of Research and Development in Education, 20(3), 1-13.

Jonassen, D.H., Beissner, K., & Yacci, M. (1993). Structural knowledge: Techniques for representing, conveying, and acquiring structural knowledge. Hillsdale, NJ: Lawrence Erlbaum Associates.

Kinchin, I.M. (2000). Using concept maps to reveal understanding: A two-tier analysis. School Science Review, 81, 41-46.

KNOT (1998). Knowledge Network and Orientation Tool for the Personal Computer, version 4.3. Retrieved October 3, 2003, from http://interlinkinc.net/

Lambiotte, J., & Dansereau, D. (1992). Effects of knowledge maps and prior knowledge on recall of science lecture content. Journal of Experimental Education, 60, 189-201.

Lee, Y., & Nelson, D. (in press). Viewing or visualizing – which concept map strategy works best on problem solving performance? British Journal of Educational Technology.

Luckie, D. (2001). C-TOOLS: Concept-Connector Tools for online learning in science. A National Science Foundation grant. Retrieved October 3, 2003, from http://www.msu.edu/~luckie/CTOOLS.pdf

McClelland, J.L., McNaughton, B.L., & O'Reilly, R.C. (1995). Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models. Psychological Review, 102, 419-457.

McClure, J., Sonak, B., & Suen, H. (1999). Concept map assessment of classroom learning: Reliability, validity, and logistical practicality. Journal of Research in Science Teaching, 36, 475-492.

Novak, J.D., & Gowin, D.B. (1984). Learning how to learn. Cambridge, UK: Cambridge University Press.

Pike, R. (1984). A comparison of convolution and matrix distributed memory systems. Psychological Review, 91, 281-294.

Robinson, D.H., Corliss, S.B., Bush, A.M., Bera, S.J., & Tomberlin, T. (2003). Optimal presentation of graphic organizers and text: A case for large bites? Educational Technology Research and Development, 51(4), 25-41.

Ruiz-Primo, M.A., & Shavelson, R.J. (1996). Problems and issues in the use of concept maps in science assessment. Journal of Research in Science Teaching, 33, 569-600.

Ruiz-Primo, M.A., Schultz, S., Li, M., & Shavelson, R.J. (1999). On the cognitive validity of interpretations of scores from alternative concept mapping techniques (CSE Tech. Rep. No. 503). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing (CRESST).

Rye, J.A., & Rubba, P.A. (2002). Scoring concept maps: An expert map-based scheme weighted for relationships. School Science and Mathematics, 102, 33-44.
Safayeni, F., Derbentseva, N., & Cañas, A.J. (2003). Concept maps: A theoretical note on concepts and the need for cyclic concept maps. Retrieved October 3, 2004, from http://cmap.ihmc.us/Publications/ResearchPapers/Cyclic%20Concept%20Maps.pdf

Schau, C., Mattern, N., Zeilik, M., & Teague, K.W. (1999). Select-and-fill-in concept map scores as a measure of students' connected understanding of introductory astronomy. Paper presented at the annual meeting of the American Educational Research Association, Montreal, Canada, April. Available from ERIC ED434909.

Schvaneveldt, R.W. (Ed.). (1990). Pathfinder associative networks: Studies in knowledge organization. Norwood, NJ: Ablex Publishing Corporation.

Schvaneveldt, R.W., Dearholt, D.W., & Durso, F.T. (1988). Graph theoretic foundations of Pathfinder networks. Computers and Mathematics with Applications, 15, 337-345.

Shavelson, R.J., Lang, H., & Lewin, B. (1994). On concept maps as potential authentic assessment in science (CSE Tech. Rep. No. 388). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing (CRESST).

Tan, S.C. (2000). The effects of incorporating concept mapping into computer assisted instruction. Journal of Educational Computing Research, 23, 113-131.

Taricani, E.M. (2002). Effects of the level of generativity in concept mapping with knowledge of correct response feedback on learning. Dissertation Abstracts International, ISBN 0-493-67101-3.

Yin, Y., Vanides, J., Ruiz-Primo, M.A., Ayala, C.C., & Shavelson, R.J. (2004). A comparison of two construct-a-concept-map science assessments: Created linking phrases and selected linking phrases (CSE Tech. Rep. No. 624). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing (CRESST).