TerraNova
Evaluation of a Standardized Test
Mini-Project 1
Teresa Frields and Mitzi Hoback

A. General Information
Title: TerraNova
Publisher: CTB/McGraw-Hill
Date of Publication: 1997

A. General Information
Cost:
Varies according to what is purchased
$122 per 30 Complete Battery Plus consumable test booklets
$92.50 per 30 Complete Battery Plus reusable test booklets

A. General Information
Administration Time:
Varies by test and level
Typically given over several test sessions or days
Fall, Winter, and Spring testing periods available

B. Brief Description of Purpose and Nature of Test
General Purpose of Test:
Constructed as a “comprehensive modular assessment series” of student achievement
Promoted as a device to help diverse audiences understand student academic achievement and progress
Reports provide data that allow national comparison of group and individual achievement

B. Brief Description of Purpose and Nature of Test
Population for which test is applicable:
K-12
Reading/language arts and mathematics tests available for grades K-12
Science and social studies tests available for grades 1-12

B. Brief Description of Purpose and Nature of Test
Description of Content:
Multiple-choice format
Generates precise norm-referenced achievement scores and a full complement of objective mastery scores
Designed to measure concepts, processes, and skills taught throughout the nation
Content areas measured are Reading/Language Arts, Mathematics, Science, and Social Studies

B. Brief Description of Purpose and Nature of Test
Appropriateness of Assessment Method:
Selected-response items can provide information on basic knowledge and some patterns of reasoning
Does not provide evidence for performance standards/targets
Other TerraNova formats combine selected-response and constructed-response items

C. Technical Evaluation Norms/Standards
1. Type – The battery generates precise norm-referenced achievement scores and a full complement of objective mastery scores.
Types of scores provided:
Scaled Scores
Grade Equivalents
National Percentiles
National Stanines
Normal Curve Equivalents
Reports are provided both for individual students and for groups of students.

C. Technical Evaluation Norms/Standards
2. Standardization Sample – Size:
The norming sample was a stratified national sample.
295 schools
Fall & Spring norming studies involved between 860,000 and 1,720,000

C. Technical Evaluation Norms/Standards
2. Standardization Sample – Representativeness:
Separate sampling designs were used for institutions of different types
Public schools stratified by region, community type, size, & Orshansky Percentile (an indicator of socioeconomic status)

C. Technical Evaluation Norms/Standards
2. Standardization Sample – Procedure followed in obtaining sample:
Spring Standardization – April 1996
Fall Standardization – October 1996
Recommended test administration period is a five-week window centered on the norming periods

C. Technical Evaluation Norms/Standards
2. Standardization Sample – Availability of subgroup norms:
Questionnaire sent to participating schools
95% responded in the fall
100% responded in the spring

C. Technical Evaluation Norms/Standards
3. Standard setting procedures employed – Qualifications and selection of judges:
Experienced teachers and curriculum specialists with national reputations were nominated
Judges had to possess “deep understanding” of one of the five content areas

C. Technical Evaluation Norms/Standards
3. Standard setting procedures employed – Number of judges:
2 committees for each of 5 content areas: Primary/Elementary and Middle/High School
4-5 teachers per committee, plus one external curriculum expert and one CTB content expert (approximately 70 people total)

C. Technical Evaluation Reliability
1.
Types – Measure of internal consistency:
Kuder-Richardson Formula 20 (KR20)
Item-pattern KR20 (a unique measure that takes into account the additional accuracy associated with IRT item-pattern scoring)
Coefficient alpha
On individual student score reports, a student’s score is reported along with a confidence band.

C. Technical Evaluation Reliability
2. Results:
Reliability coefficients were consistently in the .80s and .90s
Spelling was consistently lower
Grades 1 and 2 also had slightly lower coefficients

C. Technical Evaluation Validity
1. Types – Content-related:
Numerous studies (e.g., classroom pilots, usability, sensitivity) conducted
Advisory panel of teachers, administrators, and content specialists from all parts of the country
Based on recommendations of the SCANS (Secretary’s Commission on Achieving Necessary Skills) report

C. Technical Evaluation Validity
1. Types – Content-related:
Developers and scorers worked together as constructed-response items were scored, checking the consistency and accuracy of the scoring guides and process
Reviewed various informational sources for children to determine topics of interest

C. Technical Evaluation Validity
1. Types – Criterion-related:
Conducted a variety of research studies, such as correlations with the SAT, ACT, NAEP, and TIMSS

C. Technical Evaluation Validity
1. Types – Construct-related:
Careful test development process to support content validity and comprehensiveness of the test
Construct validity for skills, concepts, and processes measured in each subject

C. Technical Evaluation Validity
2. Results:
Provides achievement scores that are valid for several types of educational decision making
A thorough validity evaluation encompassed content-, criterion-, and construct-related evidence

Bias
Used the following procedures to reduce the amount of bias:
Ensured a valid test plan
Followed stringent editorial guidelines
Conducted expert reviews
Analyzed student data for differential item functioning
Selected the best items

D.
Summary of MMY Reviews
Reviewed by Judith A. Monsaas, Assoc. Prof. of Education, North Georgia College and State University, Dahlonega, GA
Tests are “very engaging and user friendly”
Materials are well-constructed and attractive
Addition of performance standards is helpful for schools moving toward a standards-based curriculum framework

D. Review, continued
Claims to assist in decision making in many areas, including evaluation of student progress, instructional program planning, curriculum analysis, class grouping, etc.
This reviewer believes the test can support this claim
Has a particularly useful section for parents on “Using Test Results”

D. Review, continued
“Although these tests are attractive and more engaging than most achievement tests I have inspected, I doubt that students will forget that they are taking a test.”
The section on “Avoiding Misinterpretations” when using grade equivalents is helpful

D. Review, continued
Process used to develop the test and ensure content validity was very thorough and clearly explained
Norming and score reporting methods are well-developed
Reviewer’s only problem is with the mastery classifications for the criterion-referenced interpretations. She feels they are arbitrarily defined.

D. Review, continued
Reviewed by Anthony J. Nitko, Professor, Department of Educational Psychology, University of Arizona, Tucson, AZ
One change in the new edition is that items within each subtest are organized according to contextual themes, countering the criticism that standardized tests assess strictly decontextualized knowledge and skills

D. Review, continued
Developers carefully analyzed curriculum guides from around the country, as well as national and state standards and textbook series
Several usability studies were run. The results were used to improve test items, teachers’ directions, and page designs

D. Review, continued
Earlier editions were criticized for problems related to speed. This version corrects those.
Typically fewer than 4% of students fail to respond to the last item on each subtest
“One of the better batteries of its type.”
Teachers’ materials are exceptionally well-done and informative

E. Critique of the Instrument
Our research on the TerraNova helps us to draw the following conclusions:
A complete and comprehensive test
Numerous measures and studies were done to ensure that technical requirements were met
TerraNova takes pride in its overall test design, construction, norming, national standardization process, reliability, validity, and reduction of bias

E. Critique of the Instrument
Does a good job supporting its purpose as a measure to aid in student achievement
Provides three main types of information: norm-referenced information, some criterion-referenced information, and standards-based performance information
Serves as a good measure for comparing student achievement with national performance

E. Critique of the Instrument
This is not a test that should be used by itself. It is simply one type of measure and cannot be the only measure used in making critical decisions
When used in conjunction with other test methods and teacher judgment, it is an effective measure for what it purports to do
Caution should be used when using this assessment to track state standards. Although it purports to be accurately correlated with them, there is no substantial proof.

E. Critique of the Instrument
Interesting Tidbits:
Del Harnish has done research on bias issues and is published for his work on the TerraNova
Testnote Clarity is a computer program for disaggregating data, which the user can customize and apply to the district curriculum
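
Supplementary sketch: KR20 and a confidence band
The Reliability slides name KR20 as a measure of internal consistency and mention that student scores are reported with a confidence band. The Python sketch below shows how a KR20 coefficient could be computed from right/wrong item scores, and how a band could be built from a classical standard error of measurement. The function names, the toy response matrix, and the scale-score values are all hypothetical, and the classical formula SEM = SD * sqrt(1 - reliability) is an assumption used for illustration; it is not the publisher's IRT item-pattern procedure.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson Formula 20 for rows of 0/1 item scores.

    responses: list of examinee rows, each a list of 0 (wrong) or 1 (right).
    """
    n_people = len(responses)
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("KR20 needs at least two items")

    # Variance of examinees' total scores (population variance).
    totals = [sum(row) for row in responses]
    total_var = pvariance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")

    # Sum of p*q over items, where p is the proportion answering correctly.
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people
        pq_sum += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - pq_sum / total_var)


def confidence_band(score, sd, reliability, z=1.0):
    """Band of +/- z classical SEMs around an observed score (assumed formula)."""
    sem = sd * (1 - reliability) ** 0.5
    return (score - z * sem, score + z * sem)


# Hypothetical data: 5 examinees, 4 items (not TerraNova data).
toy_responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

if __name__ == "__main__":
    print(round(kr20(toy_responses), 3))
    # Hypothetical scale score 650, SD 40, reliability .91.
    print(confidence_band(650, 40, 0.91))
```

For the toy matrix the coefficient works out to 0.80, and with the hypothetical values above the band is roughly 638 to 662. A lower reliability, such as those reported for Spelling, would widen the band for the same standard deviation.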