
Classroom Assessment
A Practical Guide for Educators
by Craig A. Mertler
Chapter 11
Standardized Tests
A standardized test is one that is administered, scored,
and interpreted in identical fashion for all examinees.
Standardized tests allow educators to gain a sense of
the average level of performance for a well-defined
group of students.
Classroom teachers have no control over these types of
tests, but must understand their nature and
Achievement tests measure academic skills; aptitude
tests measure potential or future achievement.
Nationally known standardized tests include:
Many states also use state-mandated tests, which are
authorized by state legislatures or boards of education,
and are used as high school graduation requirements.
Two types of standardized tests are norm-referenced
(no predetermined passing score; performance is based
on comparisons to others) and criterion-referenced
(performance is compared to preestablished criteria).
Methods of Reporting Scores on
Standardized Tests
Criterion-Referenced Tests
• Permit teachers to draw inferences about what
students can do relative to a large domain.
• Answer the following questions:
  ▪ What does this student know?
  ▪ What can this student do? What content and skills has the
    student mastered?
• Report raw scores, usually in the form of the number or
percentage of items answered correctly.
• Other, less common results include speed of
performance, quality of performance, and precision of
performance.
Norm-Referenced Tests
• Permit comparisons to a well-defined norm group
(intended to represent the current level of achievement
for a specific group of students at a specific grade
level).
• Answer the following questions:
  ▪ What is the relative standing of this student across this
    broad domain of content?
  ▪ How does the student compare to other similar students?
• Scores are often transformed to a common
distribution: the normal distribution, or bell-shaped curve.
• Normal distribution
  ▪ Three main characteristics:
    ◦ Distribution is symmetrical.
    ◦ Mean, median, and mode are the same score
      and are located at center of distribution.
    ◦ Percentage of cases in each standard deviation
      is known precisely.
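
The known percentages can be checked directly. The following is a minimal Python sketch, not part of the original slides; the function name `share_within` is a hypothetical helper, and it relies only on the standard-library error function to compute the proportion of a normal distribution that lies within a given number of standard deviations of the mean.

```python
import math

def share_within(k: float) -> float:
    """Proportion of a normal distribution lying within k standard
    deviations on either side of the mean."""
    return math.erf(k / math.sqrt(2))

# Print the share of cases within 1, 2, and 3 standard deviations.
for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k):.1%}")
```

The results are roughly 68%, 95%, and 99.7%; rounded figures quoted for these bands vary slightly from source to source.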
• Raw score
  ▪ Number of items answered correctly.
  ▪ Not very useful for norm-referenced tests.
  ▪ Score must be transformed in order to be useful
    for comparisons.
• Percentile rank: Single number that indicates the
percentage of the norm group that scored below a given
raw score.
  ▪ Ranges from 1 to 99; much more compact in the
    middle of the distribution (doesn’t represent equal
    units).
  ▪ Often misinterpreted as the percentage of items
    answered correctly.
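
To make the difference between a percentile rank and a percent-correct score concrete, here is a small Python sketch. It is an illustration only: the norm-group scores, the 40-item test length, and the function name `percentile_rank` are all hypothetical, and published tests derive percentile ranks from much larger norm samples.

```python
def percentile_rank(raw_score, norm_scores):
    """Percentage of the norm group scoring below raw_score,
    limited to the reported range of 1 to 99."""
    below = sum(1 for s in norm_scores if s < raw_score)
    rank = round(100 * below / len(norm_scores))
    return max(1, min(99, rank))

# Hypothetical norm group: raw scores on a 40-item test.
norm_group = [12, 15, 18, 20, 22, 24, 25, 27, 28, 30,
              31, 32, 33, 34, 35, 35, 36, 37, 38, 39]

raw = 30
# Share of items the student answered correctly.
print("percent correct:", 100 * raw / 40)
# Standing of the student relative to the norm group.
print("percentile rank:", percentile_rank(raw, norm_group))
```

With these made-up numbers, the same raw score of 30 is 75 percent correct but only the 45th percentile, which is why the two should not be read interchangeably.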
• Grade-equivalent score: The grade in the norm group
for which a certain raw score was the median.
  ▪ Consists of two numerical components: The first
    number indicates grade level and the second
    indicates the month during that school year
    (ranges from 0 to 9); for example, a grade-equivalent
    score of 4.2 refers to the second month of fourth grade.
  ▪ Often misinterpreted as a standard to be achieved.
  ▪ Although scores represent months, they do not
    represent equal units.
• Standardized score: Score that results from a
transformation to fit the normal distribution.
  ▪ Overcomes previous limitation of unequal units.
  ▪ Allows for comparison of performance across
    two different measures.
  ▪ Reports performance on various scales to
    determine how many standard deviations the
    score is away from the mean.
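
As a rough illustration of comparing performance across two different measures, the sketch below expresses two raw scores as the number of standard deviations from their own test's mean. The test names, raw scores, means, and standard deviations are hypothetical values chosen for the example.

```python
def z_score(raw, mean, sd):
    """Number of standard deviations a raw score lies from the mean."""
    return (raw - mean) / sd

# Hypothetical results for one student on two different tests.
reading_z = z_score(raw=42, mean=36, sd=6)   # one SD above the reading mean
math_z = z_score(raw=55, mean=60, sd=10)     # half an SD below the math mean

print("reading:", reading_z)
print("math:", math_z)
```

Although the math raw score (55) is numerically larger than the reading raw score (42), the standardized values show that this student's relative standing is stronger in reading.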
  ▪ z-score
    ◦ More than 99% of scores fall in the range of –3.00 to
      +3.00.
    ◦ Sign indicates whether above or below mean;
      number indicates how many standard deviations
      away from mean.
    ◦ Half the students will be above; half will be below.
    ◦ Problems with interpreting negative scores.
  ▪ T-score
    ◦ Provides location of score in distribution with mean
      of 50 and standard deviation of 10 (over 99% of
      scores range from 20 to 80).
    ◦ Can be misinterpreted as percentages.
  ▪ SAT/GRE score
    ◦ Provides location of score in distribution with mean
      of 500 and standard deviation of 100 (over 99% of
      scores range from 200 to 800).
  ▪ Stanine score
    ◦ Provides the location of a raw score in a specific
      segment or band of the normal distribution.
    ◦ Mean of 5 and standard deviation of 2; range from 1
      to 9.
    ◦ Represents coarse groupings; does not provide very
      specific information.
  ▪ Normal curve equivalent (NCE) score
    ◦ Mean of 50 and standard deviation of 21.06;
      matches percentile ranks at three specific points (1,
      50, and 99).
    ◦ Unlike percentile ranks, represents equal units.
  ▪ Deviation IQ score
    ◦ Provides location of score in distribution with mean
      of 100 and standard deviation of 15 or 16.
    ◦ Primarily used with measures of mental ability.
• All standardized scores provide the same
information, simply reported on different scales.
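
The point that every standardized score carries the same information can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a publisher's scoring procedure: the helper names are hypothetical, the deviation IQ scale is assumed to use a standard deviation of 15 (some instruments use 16), and stanines are assumed to be the usual half-standard-deviation bands centered on 5.

```python
import math

# (mean, standard deviation) of each reporting scale
SCALES = {
    "T-score": (50, 10),
    "SAT/GRE": (500, 100),
    "NCE": (50, 21.06),
    "Deviation IQ": (100, 15),  # assumption: 15 rather than 16
}

def rescale(z, mean, sd):
    """Place a z-score on a scale with the given mean and SD."""
    return mean + z * sd

def stanine(z):
    """Assign a z-score to one of nine bands, each half an SD wide."""
    band = math.floor(2 * z + 5.5)
    return max(1, min(9, band))

def report(z):
    """Express one z-score on every scale listed in SCALES, plus stanine."""
    scores = {name: rescale(z, m, s) for name, (m, s) in SCALES.items()}
    scores["Stanine"] = stanine(z)
    return scores

# A student one standard deviation above the mean.
print(report(1.0))
```

Every value in the report is derived from the single z-score, so moving between scales changes only the units, not the student's position in the distribution.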
Interpreting Student Performance
Norm-Referenced Tests
• Error exists in all educational measures.
  ▪ Can affect scores both negatively and positively.
• Standard error of measurement (standard error or
SEM): The average amount of measurement error
across students in the norm group.
  ▪ Provides a range (known as a confidence interval)
    of performance when both added to and subtracted
    from a test score.
Confidence Interval = Score ± Standard Error
  ▪ Purpose of the confidence interval is to determine the
    range of scores that we are reasonably confident
    represents a student’s true ability.
  ▪ Approximately 68% confidence interval (observed score ± one
    standard error).
  ▪ Approximately 95% confidence interval (observed score ± two
    standard errors).
  ▪ More than 99% confidence interval (observed score ± three
    standard errors).
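
The interval arithmetic is simple enough to sketch in Python. The observed score of 75 and the standard error of 3 are hypothetical values, and `confidence_interval` is an illustrative helper name rather than anything defined in the source.

```python
def confidence_interval(score, sem, n_errors=1):
    """Range obtained by subtracting and adding n_errors standard
    errors of measurement to an observed score."""
    margin = n_errors * sem
    return score - margin, score + margin

# Hypothetical observed score and standard error of measurement.
observed, sem = 75, 3

for n in (1, 2, 3):
    low, high = confidence_interval(observed, sem, n)
    print(f"observed score with {n} standard error(s): {low} to {high}")
```

Wider intervals buy more confidence at the cost of precision: with these numbers, the range grows from 72–78 to 66–84 as the number of standard errors goes from one to three.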
  ▪ On norm-referenced tests, confidence intervals
    are presented around student’s obtained
    percentile rank score.
    ◦ Known as national percentile bands.
    ◦ Can be used to compare subtests by
      examining the bands for overlap.
    ◦ When bands overlap, there is no real difference
      between estimates of true achievement on subtests.
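
Checking bands for overlap can be expressed as a short Python sketch. The subtest names and percentile bands below are hypothetical, and `bands_overlap` is an illustrative helper; score reports typically show the same comparison graphically.

```python
def bands_overlap(band_a, band_b):
    """True when two (low, high) percentile bands share any values."""
    low_a, high_a = band_a
    low_b, high_b = band_b
    return low_a <= high_b and low_b <= high_a

# Hypothetical national percentile bands for one student.
bands = {
    "Reading": (48, 66),
    "Mathematics": (60, 78),
    "Science": (20, 35),
}

# Overlapping bands: no real difference between the two estimates.
print(bands_overlap(bands["Reading"], bands["Mathematics"]))
# Non-overlapping bands: the difference is likely to be real.
print(bands_overlap(bands["Reading"], bands["Science"]))
```

In this made-up case, reading and mathematics would be treated as equivalent, while science stands out as a genuinely lower result.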
Uses of Test Results for Teachers
Two main ways that test results can be used by teachers:
• For revising instruction for the entire class.
• For developing intervention strategies for individual
students.
Standardized test results have not typically been used
to aid teachers in making instructional decisions.
Data-driven decision making takes some practice and
experience for classroom teachers.
For revising instruction for the entire class:
Standardized Test Scores
1. Identify any content area or subtest where there are high percentages of students who
performed below average.
2. Based on these percentages, rank order the 6–8 content areas or subtests with the poorest
performance.
3. From this list, select 1–2 content areas to examine further by addressing the following:
• Where is this content addressed in our district’s curriculum?
• At what point in the school year are these concepts/skills taught?
• How are the students taught these concepts/skills?
• How are students required to demonstrate that they have mastered the concepts/skills? In
other words, how are they assessed in the classroom?
4. Identify new/different methods of instruction, reinforcement, assessment, etc.
Revise Instruction
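
Steps 1 and 2 of the class-level process amount to a sort, which the following Python sketch illustrates. The subtest names and percentages are hypothetical, and `rank_weakest` is an illustrative helper; the later steps (examining the curriculum and choosing new methods) remain matters of professional judgment.

```python
def rank_weakest(pct_below_average, how_many=8):
    """Rank subtests from the highest to the lowest percentage of
    students performing below average and keep the first how_many."""
    ranked = sorted(pct_below_average.items(),
                    key=lambda item: item[1], reverse=True)
    return ranked[:how_many]

# Hypothetical class report: percentage of students below average.
class_results = {
    "Vocabulary": 22,
    "Reading Comprehension": 41,
    "Spelling": 18,
    "Math Computation": 35,
    "Math Problem Solving": 52,
    "Science": 27,
}

weakest = rank_weakest(class_results, how_many=3)
for subtest, pct in weakest:
    print(f"{subtest}: {pct}% below average")
```

From a ranked list like this one, a teacher would then pick one or two areas to examine in depth.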
For developing intervention strategies for individual students:
Standardized Test Scores
1. Identify any content area or subtest where the student performed below average.
2. Rank order the 6–8 content areas or subtests with the poorest performance.
3. From this list, select 1–2 content areas to serve as the focus of the intervention.
4. Identify new/different methods of instruction, reinforcement, assessment, etc., in order to
meet the needs of the individual student.
Revise Instruction
Analyzing Student
Performance—An Example