Application of Generalizability Theory to Concept-Map Assessment Research
Yue Yin & Richard J. Shavelson
Stanford Educational Assessment Laboratory (SEAL)
Stanford University & CRESST
AERA 2004, San Diego CA

Overview
• Part 1: Feasibility of applying G-theory to concept-map assessment (CMA) research
  - Examining the dependability of CMA scores
  - Designing a CMA for a particular application
  - Narrowing down alternatives
• Part 2: Empirical study of using G-theory to compare two CMAs:
  - Construct-a-map with created linking phrases (C)
  - Construct-a-map with selected linking phrases (S)

A Concept-map
[Figure: example concept map with its components labeled: concepts/terms, linking lines, linking phrases, proposition]

Variations in CMA Components
Variation         Examples
Task              - Topic only
                  - Topic and concepts (C)
                  - Topic, concepts and linking phrases (S)
                  - Topic, incomplete concepts or incomplete linking phrases (fill-in-the-nodes or fill-in-the-lines)
Response          - Computer
                  - Paper-pencil
Scoring System    - Link score
                  - Concept score
                  - Proposition score
                  - Structure score

Part 1
Feasibility of Applying G Theory to CMA Research

Viewing CMA with G theory
• Basic idea
  A particular type of score, given by a particular rater, based on a particular type of concept map, on a particular occasion, … is a sample from a multifaceted universe.
• Object of measurement
  People (the variation in students’ knowledge structure)
• Facets
  Task (concept & proposition), response format, scoring system, rater, occasion, …

G theory vs. CTT
Similarity
G theory                      CTT
• Concept-term sampling       • Equivalence of alternate forms
• Proposition sampling        • Internal consistency
• Rater sampling              • Inter-rater reliability
• Occasion sampling           • Stability over time

G Theory’s Advantage
• Conceptually integrates and simultaneously evaluates all the technical properties above
• Estimates not only the effects of individual facets, but also interaction effects
• Permits us to optimize an assessment’s technical quality

Examining Technical Properties & Designing Assessments
• Examining dependability (G study)
  How well can a measure of students’ declarative knowledge structure be generalized across concept-map tasks, scoring systems, occasions, raters, propositions, and different concept samples?
• Designing an assessment (D study)
  How many concept-map tasks, scoring systems, occasions, raters, propositions, and/or different concept samples will be needed to obtain a reliable measurement of students’ declarative knowledge structure?

Narrowing Down Alternatives
• Task
  - Which task type is more reliable over raters, occasions, propositions, and concept samples?
  - The more reliable task type accordingly needs fewer raters, occasions, propositions, and concept samples.
• Scoring system
  - Which scoring system is more reliable over raters, occasions, propositions, and concept samples?
  - The more reliable scoring system accordingly needs fewer raters, occasions, propositions, and concept samples.
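The D-study question above (how many propositions and occasions are needed for a reliable score) can be sketched in a few lines. This is a minimal illustration, not the authors' analysis: it assumes a fully crossed person x item x occasion design with random facets, for which the relative G coefficient is var_p / (var_p + var_pi/n_i + var_po/n_o + var_pio,e/(n_i * n_o)). The variance components below are hypothetical placeholders, and the function names are illustrative only.

```python
def relative_g_coefficient(var_p, var_pi, var_po, var_pio_e, n_items, n_occasions):
    """Relative G coefficient for a crossed p x i x o random-effects design.

    Only interactions involving persons contribute to relative error;
    each is divided by the number of conditions averaged over in the D study.
    """
    if n_items < 1 or n_occasions < 1:
        raise ValueError("n_items and n_occasions must be at least 1")
    relative_error = (
        var_pi / n_items
        + var_po / n_occasions
        + var_pio_e / (n_items * n_occasions)
    )
    return var_p / (var_p + relative_error)


def min_items_for_target(target, n_occasions, max_items=64, **components):
    """Smallest number of items reaching the target coefficient, or None."""
    for n_items in range(1, max_items + 1):
        if relative_g_coefficient(n_items=n_items, n_occasions=n_occasions,
                                  **components) >= target:
            return n_items
    return None


# Hypothetical variance components (NOT taken from the study).
components = dict(var_p=0.10, var_pi=0.05, var_po=0.02, var_pio_e=0.20)

# Tabulate the coefficient over candidate D-study designs.
for n_occasions in (1, 2, 3):
    for n_items in (4, 8, 16, 32):
        g = relative_g_coefficient(n_items=n_items, n_occasions=n_occasions,
                                   **components)
        print(f"occasions={n_occasions} items={n_items:2d} relative G={g:.3f}")
```

Running the same tabulation with variance components estimated separately for two task types or two scoring systems is one way to compare how many items and occasions each would need to reach a chosen reliability level.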
Part 2
Empirical Study of Using G-theory to Compare Two CMAs

Two Frequently Used CMAs
• Construct-a-map with created linking phrases (C): provides a cognitively valid measure of knowledge structure (e.g., Ruiz-Primo et al., 2001; Yin et al., 2004)
• Construct-a-map with selected linking phrases (S): provides an efficient way to measure knowledge structure (e.g., Klein et al., 2001)

Method
• Concept-map task
  - 9 concepts (for C & S): water, volume, cubic centimeter, wood, density, mass, buoyancy, gram, and matter
  - 6 linking phrases (for S only):
    is a measure of…
    has a property of…
    depends on…
    is a form of…
    is mass divided by…
    divided by volume equals…
• Participants
  - 92 eighth-graders
  - 46 girls
  - previously studied a related unit
  - no related instruction between two occasions
• Procedures
  - CS (n = 22)
  - SC (n = 23)
  - CC (n = 26)
  - SS (n = 21)

Criterion Map
[Figure: criterion map connecting water, wood, mass, matter, density, gram, cubic centimeter (CC), buoyancy, and volume with labeled links such as "has a property of", "is a form of", "depends on", "is mass divided by", "divided by volume equals", and "is a unit of"; mandatory propositions are marked]

Source of Variation
CS & SC                   CC & SS
• Person (P)              • Person (P)
• Proposition/Item (I)    • Proposition/Item (I)
• Format (F)              • Occasion (O)
• P x F                   • P x O
• P x I                   • P x I
• F x I                   • O x I
• P x F x I, e            • P x O x I, e

Variance Component Estimate

G Study in SC & CS
[Figure: bar chart of percent of total variability (0% to 70%) by source (P, F, I, PF, PI, FI, PFI,e) for the CS and SC groups]

G Study in CC & SS
[Figure: bar chart of percent of total variability (0% to 70%) by source (P, O, I, PO, PI, OI, POI,e) for the CC and SS groups]

D Study for C CMA
[Figure: relative G coefficient (0 to 1) against number of items/propositions (0 to 32), with three curves labeled 1, 2, and 3]

D Study for S CMA
[Figure: relative G coefficient (0 to 1) against number of items/propositions (0 to 32), with three curves labeled 1, 2, and 3]

Conclusions
• G study pinpoints multiple sources of measurement error, thereby giving insight into how to improve the reliability and applicability of CMA via a D study
• C and S mapping tasks are not equivalent in their technical properties
• Fewer occasions and propositions are needed in S than C to get a reliable evaluation of students’ declarative knowledge structure

Thank You for Your Interest!
To get the complete paper, please either contact Yue Yin at yyin@stanford.edu or download the file directly at http://www.stanford.edu/dept/SUSE/SEAL/