TerraNova
Evaluation of a Standardized
Test
Mini-Project 1
Teresa Frields and Mitzi Hoback
A. General Information
Title: TerraNova
Publisher: CTB/McGraw-Hill
Date of Publication: 1997
A. General Information
Cost
Varies as to what is purchased
$122 per 30 Complete Battery Plus
consumable test booklets
$92.50 per 30 Complete Battery Plus
reusable test booklets
A. General Information
Administration Time
Varies by test and level
Typically given over a period of several test
sessions or days
Fall, Winter, and Spring testing periods
available
B. Brief Description of Purpose
and Nature of Test
General Purpose of Test
Constructed as a “comprehensive modular
assessment series” of student achievement
Promoted as a device to help diverse
audiences understand student academic
achievement and progress
Reports provide useful and informative data
which allows for national comparison of
group and individual achievement
B. Brief Description of
Purpose and Nature of Test
Population for which test is
applicable
K-12
Reading/language arts and mathematics
available for K-12
Science and social studies tests available
for grades 1-12
B. Brief Description of Purpose
and Nature of Test
Description of Content
Multiple choice format
Generates precise norm-referenced achievement
scores and a full complement of objective mastery
scores
Designed to measure concepts, processes, and
skills taught throughout the nation
Content areas measured are Reading/Language
Arts, Mathematics, Science, and Social Studies
B. Brief Description of
Purpose and Nature of Test
Appropriateness of
Assessment Method
Selected-response items can provide
information on basic knowledge and some
patterns of reasoning
Does not provide evidence for performance
standards/targets
Other TerraNova formats provide a
combination of selected-response and
constructed-response
C. Technical Evaluation
Norms/Standards
1. Type – The battery generates precise norm-referenced achievement scores and a full
complement of objective mastery scores.
Types of scores provided:
Scaled Scores
Grade Equivalents
National Percentiles
National Stanines
Normal Curve Equivalents
Reports are provided both individually and as groups of
students.
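The relationship between three of these score types can be illustrated with a minimal sketch. It assumes the conventional definitions: a normal curve equivalent is 50 + 21.06 times the normal deviate of the percentile rank, and stanines use the standard 4-7-12-17-20-17-12-7-4 percent bands. The function names are hypothetical, and the publisher's own conversion tables may differ in rounding.

```python
from bisect import bisect_right
from statistics import NormalDist

# Lowest national percentile rank falling in stanines 2 through 9
# (standard 4-7-12-17-20-17-12-7-4 percent bands; an assumption here).
STANINE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]


def percentile_to_nce(percentile):
    """Convert a national percentile rank (1-99) to a normal curve equivalent."""
    z = NormalDist().inv_cdf(percentile / 100)
    return 50 + 21.06 * z


def percentile_to_stanine(percentile):
    """Map a national percentile rank (1-99) to a stanine (1-9)."""
    return bisect_right(STANINE_CUTS, percentile) + 1


if __name__ == "__main__":
    for pr in (1, 25, 50, 75, 99):
        print(pr, round(percentile_to_nce(pr), 1), percentile_to_stanine(pr))
```

NCEs match percentile ranks at 1, 50, and 99 but are spaced on an equal-interval scale in between, which is why they are commonly preferred for averaging across groups.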
C. Technical Evaluation
Norms/Standards
2. Standardization Sample – Size: The
norming sample was based on a stratified
national sample.
 295 schools
 Fall & Spring norming studies involved
between 860,000 and 1,720,000
C. Technical Evaluation
Norms/Standards
2. Standardization Sample –
Representativeness:
Separate sampling designs were used for
institutions of different types
Public schools stratified by region,
community type, size, & Orshansky
Percentile (an indicator of socioeconomic
status)
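The idea of a stratified sample can be sketched as follows. This is only an illustration of proportional stratified sampling, not the publisher's actual sampling design; the function name, the dictionary field names, and the sampling fraction are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(schools, keys, fraction, seed=0):
    """Draw the same fraction of schools from every stratum.

    schools  -- list of dicts describing schools (hypothetical fields)
    keys     -- names of the fields that define a stratum
    fraction -- share of each stratum to draw (at least one school is taken)
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for school in schools:
        strata[tuple(school[k] for k in keys)].append(school)

    sample = []
    for stratum_key in sorted(strata):
        members = strata[stratum_key]
        n_take = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n_take))
    return sample


if __name__ == "__main__":
    demo = [{"name": f"W{i}", "region": "West", "community_type": "urban"}
            for i in range(12)]
    demo += [{"name": f"S{i}", "region": "South", "community_type": "rural"}
             for i in range(8)]
    for school in stratified_sample(demo, ["region", "community_type"], 0.25):
        print(school["name"])
```

Sampling within each stratum keeps every combination of characteristics represented in proportion to its share of the school population.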
C. Technical Evaluation
Norms/Standards
Standardization Sample – Procedure followed
in obtaining sample:
Spring Standardization – April, 1996
Fall Standardization – October 1996
Recommended test administration period is
a five-week window centered on the norming
periods
C. Technical Evaluation
Norms/Standards
3. Standardization Sample – Availability of
subgroup norms
Questionnaire sent to participating schools
95% responded in the fall
100% responded in the spring
C. Technical Evaluation
Norms/Standards
3. Standard setting procedures employed –
qualifications and selection of judges:
Nominations were made of experienced
teachers and curriculum specialists with
national reputations
Judges had to possess “deep understanding”
of one of the five content areas
C. Technical Evaluation
Norms/Standards
3. Standard setting procedures employed –
number of judges:
2 committees for each of 5 content areas
Primary/Elementary and Middle/High
School
4-5 teachers per committee, one curriculum
expert (external) and one CTB content
expert (approximately 70 people total)
C. Technical Evaluation
Reliability
1. Types – Measure of internal consistency:
Kuder-Richardson Formula 20 (KR20)
Item pattern KR20 (a unique measure that
takes into account the additional accuracy
associated with IRT item-pattern scoring)
Coefficient alpha
On individual student score reports, a student’s score is
reported along with a confidence band.
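As a rough illustration of how an internal-consistency coefficient and a confidence band relate, the sketch below computes classical KR20 from right/wrong item scores and a band of one standard error of measurement around a score. It assumes the classical formulas (population variance of total scores, SEM = SD x sqrt(1 - reliability)); the TerraNova reports use IRT-based estimates, so this is not the publisher's procedure, and the function names are hypothetical.

```python
from math import sqrt
from statistics import pvariance


def kr20(responses):
    """Classical KR20 for a list of student rows of 0/1 item scores."""
    n_students = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    total_var = pvariance(totals)  # population variance (an assumption)

    pq_sum = 0.0
    for item in range(n_items):
        p = sum(row[item] for row in responses) / n_students
        pq_sum += p * (1 - p)  # item variance for a right/wrong item

    return (n_items / (n_items - 1)) * (1 - pq_sum / total_var)


def confidence_band(score, score_sd, reliability):
    """Return (low, high): the score plus or minus one classical SEM."""
    sem = score_sd * sqrt(1 - reliability)
    return score - sem, score + sem


if __name__ == "__main__":
    data = [
        [1, 1, 1, 1],
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    print(kr20(data))
    print(confidence_band(650, 10, 0.91))
```

The lower the reliability coefficient, the wider the band around a reported score for the same score spread.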
C. Technical Evaluation
Reliability
2. Results:
Reliability coefficients were consistently
.80s and .90s
Spelling consistently lower
Grade 1 and 2 also had slightly lower
coefficients
C. Technical Evaluation
Validity
1. Types – Content-related:
Numerous studies (e.g. classroom pilots, usability,
sensitivity) conducted
Advisory panel of teachers, administrators, and
content specialists from all parts of country
Based on recommendations of SCANS
(Secretary’s Commission on Achieving Necessary
Skills) report
C. Technical Evaluation
Validity
1. Types – Content-related:
Developers and scorers worked together
as constructed-response items were scored
to check the consistency and accuracy of
scoring guides and process
Reviewed various informational sources
for children to determine topics of interest
C. Technical Evaluation
Validity
1. Types – Criterion-related:
Conducted a variety of research studies,
such as correlation with SAT and ACT,
NAEP, TIMSS
C. Technical Evaluation
Validity
1. Types – Construct-related:
Careful test development process to support
content validity and comprehensiveness of
test
Construct validity for skills, concepts and
processes measured in each subject
C. Technical Evaluation
Validity
2. Results:
Provides achievement scores that are valid
for several types of educational decision
making
A thorough validity evaluation
encompassed content-, criterion-, and
construct-related evidence
Bias
Used the following procedures to reduce the
amount of bias:
Ensured valid test plan
Followed stringent editorial guidelines
Conducted expert reviews
Analyzed student data for differential item
functioning
Selected best items
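The slides do not say which differential item functioning statistic was used. As an illustration only, the sketch below applies the Mantel-Haenszel procedure, a widely used DIF method, to one item: students are matched on total score, and the common odds ratio across score levels is converted to the delta scale (-2.35 times its natural log). The function name and input layout are hypothetical.

```python
from math import log


def mantel_haenszel_dif(strata):
    """Mantel-Haenszel DIF statistics for a single item.

    strata -- one tuple per matched score level:
              (reference_correct, reference_incorrect,
               focal_correct, focal_incorrect)
    Returns (common_odds_ratio, mh_delta). A ratio above 1 (negative
    delta) means the item favors the reference group.
    Raises ZeroDivisionError if no level has both a reference-incorrect
    and a focal-correct response.
    """
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, focal_right, focal_wrong in strata:
        n = ref_right + ref_wrong + focal_right + focal_wrong
        if n == 0:
            continue  # skip empty score levels
        numerator += ref_right * focal_wrong / n
        denominator += ref_wrong * focal_right / n

    odds_ratio = numerator / denominator
    return odds_ratio, -2.35 * log(odds_ratio)


if __name__ == "__main__":
    levels = [(40, 10, 30, 20), (45, 5, 40, 10)]
    print(mantel_haenszel_dif(levels))
```

An item with a delta far from zero would be a candidate for expert review or removal before the final items are selected.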
D. Summary of MMY Reviews
Reviewed by Judith A. Monsaas, Assoc.
Prof. of Education, North Georgia College
and State University, Dahlonega, GA
Tests are “very engaging and user friendly.”
Materials are well constructed and
attractive
Addition of performance standards is
helpful for schools moving toward a
standards-based curriculum framework
D. Review, continued
Claims to assist in decision making in many
areas, including evaluation of student
progress, instructional program planning,
curriculum analysis, class grouping, etc.
This reviewer believes they can support this
claim
Has a particularly useful section for parents
on “Using Test Results”
D. Review, continued
“Although these tests are attractive and
more engaging than most achievement tests
I have inspected, I doubt that students will
forget that they are taking a test.”
Section on “Avoiding
Misinterpretations” when using grade
equivalents is helpful
D. Review, continued
Process used to develop the test and ensure
content validity was very thorough and
clearly explained
Norming and score reporting methods are
well-developed
Reviewer’s only problem is with the
mastery classifications for the criterion-referenced interpretations. She feels they
are arbitrarily defined.
D. Review, continued
Reviewed by Anthony J. Nitko, Professor,
Department of Educational Psychology,
University of Arizona, Tucson, AZ
One change in the new edition is that items
within each subtest are organized according
to contextual themes, countering the
criticism that standardized tests assess
strictly decontextualized knowledge and
skills
D. Review, continued
Developers carefully analyzed curriculum
guides from around the country, as well as
national and state standards and textbook
series
Several usability studies were run. The
results of these were used to improve test
items, teachers’ directions, and page designs
D. Review, continued
Earlier editions criticized for problems
related to speed. This version corrects
those. Typically fewer than 4% of students
fail to respond to the last item on each
subtest
“One of the better batteries of its type.”
Teachers’ materials exceptionally well-done
and informative
E. Critique of the Instrument
Our research on the TerraNova helps us to
draw the following conclusions:
A complete and comprehensive test
Numerous measures and studies were done
to ensure technical requirements
TerraNova takes pride in its overall test
design, construction, norming, national
standardization process, reliability, validity,
and the reduction of bias issues
E. Critique of the Instrument
Does a good job supporting its purpose as a
measure to aid in student achievement
Provides three main types of information:
norm-referenced information,
some criterion information, and standards-based performance information
Serves as a good measure in comparing
student achievement with national
performances
E. Critique of the Instrument
This is not a test that should be used by itself. It is
simply one type of measure and cannot be the only
measure used in making critical decisions
When used in conjunction with other test methods
and teacher judgment, it is an effective measure
for what it purports to do
Caution should be used when using this
assessment to track state standards. Although it
purports to be accurately correlated with them, there is no
substantial proof.
E. Critique of the Instrument
Interesting Tidbits:
Del Harnish has done research on bias
issues and has published his work on the
TerraNova
Testnote Clarity is a computer program
available for the disaggregation of data,
which allows the user to customize the data and
apply it to district curriculum