Norm-Referenced Test

Standardized Testing and
California Schools’ API
Scores
What’s the Connection?
Let’s Start Thinking




1. Where is the best place to examine
direct data about student learning?
2. List at least three advantages and three
disadvantages to using standardized
assessment tools.
3. List at least three advantages and three
disadvantages to using local or homegrown
assessment tools.
4. What are some advantages to embedded
assessment?
What’s the Deal with Testing?
As a society, we like numbers. If something can be
quantified, it is viewed as valid or more scientific.
If it cannot be quantified, we view the activity with
suspicion.
Machine scoring of a test is fast, efficient, and
cheap.
Hand scoring of a test is slow, time consuming, and
very expensive.
Lessons from the Past
• Mass testing came about in the late 1800s and early 1900s.
• It was originally used to decide who was qualified to attend
universities and who was bound to work in factories.
• It attempted to model the efficient factory methods of Henry
Ford: a test should be easy, cheap, and work for everyone.
• Early IQ tests (the Army Alpha and Beta tests) were developed for
the U.S. Army as a way to decide the career path of new
recruits.
• Early tests were also developed to determine which immigrants
could enter the U.S.
Standardized Tests – What’s the
Difference?

Criterion-Referenced Test
Criterion-referenced tests, also called mastery tests,
compare a person's performance to a set of
objectives. Anyone who meets the criterion can get
a high score.
Everyone knows what the benchmarks / objectives
are and can attain mastery to meet them.
It is possible for ALL the test takers to achieve
100% mastery.
Standardized Tests – What’s the
Difference?
Norm-Referenced Test
Norm-referenced tests compare an individual's
performance with the performance of others.
They are designed to yield a normal curve, with 50%
of test takers scoring above the 50th percentile and
50% scoring below it, so half the test takers MUST
“pass” and half the test takers MUST “fail”.
The test makers design the test with questions that
MOST people will get incorrect.
If too many people get a question correct, or too many
score well, then test questions are “thrown out” until
they achieve a normal curve again.
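The difference between the two kinds of scoring can be sketched in a few lines of Python. This is only an illustration: the student names, the 50-item test, and the 80% mastery cutoff are hypothetical, and "percentile rank" is taken here to mean the percent of the group scoring strictly below a student, which is one common definition.

```python
from bisect import bisect_left


def meets_criterion(raw, total_items, cutoff_pct=80.0):
    """Criterion-referenced: compare the score to a fixed standard.

    The 80% cutoff is a hypothetical mastery level, not one from any real test.
    """
    return 100.0 * raw / total_items >= cutoff_pct


def percentile_rank(raw, group_scores):
    """Norm-referenced: percent of the group scoring strictly below this score."""
    ordered = sorted(group_scores)
    below = bisect_left(ordered, raw)  # count of scores lower than `raw`
    return 100.0 * below / len(ordered)


# Hypothetical raw scores on a 50-item test; everyone did well.
TOTAL_ITEMS = 50
raw_scores = {"Ana": 46, "Ben": 44, "Cruz": 42, "Dee": 41}
group = list(raw_scores.values())

mastery = {name: meets_criterion(s, TOTAL_ITEMS) for name, s in raw_scores.items()}
ranks = {name: percentile_rank(s, group) for name, s in raw_scores.items()}

for name in raw_scores:
    print(name, "mastery:", mastery[name], "percentile rank:", ranks[name])
```

Under the criterion, all four students reach mastery. Under the norm-referenced ranking, the same four scores are still spread from the bottom of the group to the top.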

Interpreting Test Scores
(some definitions)



Raw score. This is the number of items the student
answered correctly. It is used to calculate the other,
more useful scores.
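Stanine. One of nine sections of the normal curve,
numbered 1 to 9. Each section is half a standard
deviation wide, except the two open-ended sections at
the extremes. Stanines can be easily averaged and compared
from test to test, but are less precise than other scores.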
Normal curve equivalent (NCE). For these scores,
the normal curve is divided into equal units ranging
from 1 to 99, with an average of 50. These can be
averaged and compared from test to test or year to
year.
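As a rough sketch of how these scores relate to each other, the Python below converts a percentile into an NCE and a stanine. It assumes scores follow a normal curve exactly and uses the conventional NCE scaling constant of 21.06, which is not stated in these slides. The function names are made up for this example.

```python
import math
from statistics import NormalDist

# Conventional NCE scaling constant (an assumption, not from the slides):
# it makes NCEs of 1, 50, and 99 line up with percentiles 1, 50, and 99.
NCE_SD = 21.06


def percentile_to_z(percentile):
    """Convert a percentile (strictly between 0 and 100) to a z-score."""
    return NormalDist().inv_cdf(percentile / 100.0)


def percentile_to_nce(percentile):
    """NCE = 50 + 21.06 * z, an equal-interval scale with an average of 50."""
    return 50.0 + NCE_SD * percentile_to_z(percentile)


def percentile_to_stanine(percentile):
    """Stanines have a mean of 5 and bands half a standard deviation wide."""
    z = percentile_to_z(percentile)
    stanine = math.floor(2 * z + 5.5)
    return max(1, min(9, stanine))  # the end bands 1 and 9 are open-ended


for p in (1, 25, 50, 75, 99):
    print(p, round(percentile_to_nce(p), 1), percentile_to_stanine(p))
```

Percentiles bunch up near the middle of the curve, so equal steps in percentile are not equal steps in performance. That is why NCEs, rather than percentiles, are the scores that can be averaged.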
Normal Curve


Half of the test takers
are grouped into the
“passing” region of
the curve and half into
the “failing” region of
the curve.
So by definition, half
the test takers MUST
“fail”, i.e. be below the
50th percentile.
State/School Goals

So when a school says that their goal is to
have 70% of their students above the 50th
percentile, is this possible?

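Well, yes, but only because percentiles are relative.
Half of the norm group falls below the 50th percentile
by definition, so other schools, taken together, would
have to have more than half of their students below it.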
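The arithmetic behind this answer can be sketched as follows. The sketch makes a simplifying assumption: the students being compared are themselves the whole norm group, so exactly half of them are above the 50th percentile. Real norm groups are usually a separate reference sample. The enrollment numbers are hypothetical.

```python
def share_above_elsewhere(total_students, school_students, school_share_above):
    """Share of students OUTSIDE one school who are above the 50th percentile.

    Assumes the tested students are the whole norm group, so exactly half
    of `total_students` score above the 50th percentile.
    """
    above_total = 0.5 * total_students
    above_in_school = school_share_above * school_students
    others = total_students - school_students
    return (above_total - above_in_school) / others


# Two equal-sized schools: if one has 70% above, the other is left with 30%.
two_equal_schools = share_above_elsewhere(2000, 1000, 0.70)

# One school among ten of equal size: the other nine average just under 48%.
one_of_ten_schools = share_above_elsewhere(10000, 1000, 0.70)

print(round(two_equal_schools, 3), round(one_of_ten_schools, 3))
```

The fewer students there are outside the high-scoring school, the further below the 50th percentile they must fall to balance it.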
Closer to Home: San Diego
City Schools (SDCS)


In 2001, SDCS officials reported that as a
district (second largest in the state), they
had 66% of their students above the 50th
percentile on the SAT/9 test for 2000.
The news media reported “the shame of
SDCS” because 1/3 of their students
were below the 50th percentile.
Was this a fair report?
MEASUREMENT AND EVALUATION:
CRITERION- VERSUS NORM-REFERENCED
TESTING


Many educators and members of the public fail to
grasp the distinctions between criterion-referenced
and norm-referenced testing. It is common to hear the
two types of testing referred to as if they serve the
same purposes or share the same characteristics.
Much confusion can be eliminated if the basic
differences are understood.
The following is adapted from: Popham, W. J. (1975).
Educational evaluation. Englewood Cliffs, New
Jersey: Prentice-Hall, Inc.
MEASUREMENT AND EVALUATION:
CRITERION- VERSUS NORM-REFERENCED
TESTING
Dimension: Purpose

Criterion-Referenced Tests:
To determine whether each student has achieved specific skills or concepts.
To find out how much students know before instruction begins and after it has finished.

Norm-Referenced Tests:
To rank each student with respect to the achievement of others in broad areas of knowledge.
To discriminate between high and low achievers.
MEASUREMENT AND EVALUATION:
CRITERION- VERSUS NORM-REFERENCED
TESTING
Dimension: Content

Criterion-Referenced Tests:
Measures specific skills which make up a designated curriculum.
These skills are identified by teachers and curriculum experts.
Each skill is expressed as an instructional objective.

Norm-Referenced Tests:
Measures broad skill areas sampled from a variety of textbooks, syllabi, and the judgments of curriculum experts.
MEASUREMENT AND EVALUATION:
CRITERION- VERSUS NORM-REFERENCED
TESTING
Dimension: Item Characteristics

Criterion-Referenced Tests:
Each skill is tested by at least four items in order to obtain an adequate sample of student performance and to minimize the effect of guessing.
The items which test any given skill are parallel in difficulty.

Norm-Referenced Tests:
Each skill is usually tested by fewer than four items.
Items vary in difficulty.
Items are selected that discriminate between high and low achievers.
MEASUREMENT AND EVALUATION:
CRITERION- VERSUS NORM-REFERENCED
TESTING
Dimension: Score Interpretation

Criterion-Referenced Tests:
Each individual is compared with a preset standard for acceptable achievement. The performance of other examinees is irrelevant.
A student's score is usually expressed as a percentage.
Student achievement is reported for individual skills.

Norm-Referenced Tests:
Each individual is compared with other examinees and assigned a score, usually expressed as a percentile, a grade equivalent score, or a stanine.
Student achievement is reported for broad skill areas, although some norm-referenced tests do report student achievement for individual skills.
Tests Currently Used in
California
• California Achievement Test – 6th Edition (CAT/6): National Norm-Referenced Test
• California Standards Test (CST): State Norm-Referenced Test with Scaled Scores
• Golden State Exam (GSE): Criterion-Referenced Test
• CA High School Exit Exam (CA-HSEE): Criterion-Referenced Test
Testing Case In Point

In this scenario we will use a fictitious
“norm-referenced” test being given at a
single high school.
Testing Case In Point


John and his fellow students at Anywhere
High School are given the “Let’s Achieve
Test” version 1 (LAT/1).
The LAT/1 is a norm-referenced test.
Testing Case In Point



John does not perform well on the test,
compared to the other test takers.
He scores below the 50th percentile and is
classified “below grade level”.
John spends the next school year getting
extra tutoring, staying after school, and
going to Saturday tutoring sessions.
Testing Case In Point



The following school year on the LAT/1,
John performs better than he did the
previous year.
However, because of a school-wide focus
on the test, all the other students in the
school also perform better.
As a result, John’s norm-referenced test
score is still below the 50th percentile and
he is still classified as “below grade level”.
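John's situation can be sketched with a few made-up numbers. The scores, the classmates' names, and the uniform 8-point gain are hypothetical, and the percentile rank is computed within the school's own cohort, since the scenario involves a single high school.

```python
def percentile_rank(score, cohort_scores):
    """Percent of the cohort scoring strictly below `score`."""
    below = sum(1 for s in cohort_scores if s < score)
    return 100.0 * below / len(cohort_scores)


# Hypothetical LAT/1 raw scores for five students in year 1.
year1 = {"Kim": 52, "John": 60, "Li": 68, "Sam": 75, "Ava": 83}

# Assume the school-wide focus raises every student's score by the same amount.
GAIN = 8
year2 = {name: score + GAIN for name, score in year1.items()}

john_rank_year1 = percentile_rank(year1["John"], list(year1.values()))
john_rank_year2 = percentile_rank(year2["John"], list(year2.values()))

print("John raw:", year1["John"], "->", year2["John"])
print("John percentile rank:", john_rank_year1, "->", john_rank_year2)
```

John's raw score goes up, but his percentile rank does not move, because a norm-referenced score only records where he stands relative to everyone else.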
Academic Performance Index (API)

The API score was created to provide a
systematic method to rank-order schools based
on a number of criteria. It is intended to measure
the academic growth and performance of a school.
The schools would receive a rank compared to
ALL other schools in the state and a second
ranking comparing them to SIMILAR schools
around the state.
Early Proposed API Criteria (1999):
Test Results (SAT/9) – 60% of score
Attendance Rates
Graduation Rates
Other statewide test results (GSE, CA-HSEE)

From 1999 to 2002, ONLY the SAT/9 test results were
used to calculate 100% of a school’s API score.
Current API Criteria
(baseline set in 2002):
• California Achievement Test (CAT/6): about 12% of
score. Includes mathematics, reading, language, and science.
• California Standards Test (CST): about 73% of score.
Includes mathematics, science, language arts, and social science.
• CA High School Exit Exam (CA-HSEE): about 15% of
score.

Eventually API scores will also include graduation
and attendance rates from schools as part of the
overall “score”.
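To see what the approximate weights above imply, here is a minimal sketch of a weighted composite. It is not the state's actual API formula, which these slides do not describe. The function name and the component scores on a common scale are hypothetical, and only the three weights come from the slide.

```python
# Approximate weights from the slide; everything else here is hypothetical.
API_WEIGHTS = {"CAT/6": 0.12, "CST": 0.73, "CA-HSEE": 0.15}


def weighted_composite(component_scores, weights=API_WEIGHTS):
    """Combine component scores (all on one common scale) using the weights."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    missing = set(weights) - set(component_scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(weights[name] * component_scores[name] for name in weights)


# Hypothetical component scores for one school.
example_school = {"CAT/6": 600.0, "CST": 700.0, "CA-HSEE": 800.0}
composite = weighted_composite(example_school)

print(round(composite, 1))
```

With the CST carrying about 73% of the weight, the composite stays close to the CST component, whatever the other two tests show.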
Consider This
• So, does this system adequately measure the success of CA
students?
• Does it reflect the learning that is
happening in CA classrooms?
Some Questions



What are the appropriate uses of norm-referenced tests? Criterion-referenced tests?
How should these tests be used at the
state/district/school level?
What role does testing play in looking at
school performance? Student
performance? Teacher performance?
The Real Question
We Should Ask



Testing is a reality that is here to stay.
It has been legislated by the state of CA
under the STAR system and by the federal
government under the NCLB Act.
So we should really be asking:
How do we use these tools to support
students and their learning in CA schools?