Economic Perspectives on Standardized Testing

Richard P. Phelps
(c) 2002, by Richard P. Phelps
Economic Perspectives on Standardized Testing: Outline
1. Why can’t economists and psychologists just get along?
2. Overview of economic theory as it pertains to education & testing
3. Human capital theory and the economics of information
4. Supply & demand; benefits & costs; goods & bads
5. The cost of standardized testing (from society’s point of view)
6. The benefits of standardized testing (information)
7. The benefits of standardized testing (motivation)
8. Optimal testing system structures
9. Optimal testing industry structures
10. Discussion
Topic 1: Why can’t economists and psychologists just get along?
1) Why can’t economists and psychologists just get along?
[answer: sometimes they do]
• Tversky and Kahneman, two cognitive psychologists, asked
themselves why rational economic man patronizes casinos, where the
odds are against him.
• Their experiments revealed that tolerance of (or attraction to) risk
varies widely among individuals, and that most people weigh small risks
against low-probability but very large gains “sub-optimally.”
• Tversky and Kahneman’s work is now required reading for any
economics major.
• Experimental economics, which strongly resembles cognitive
psychology in its methods, is now the fastest growing area of research
in the field.
1) Why can’t economists and psychologists just get along?
[answer: sometimes they do not]
Test Utility research
• Thousands of studies conducted by I/O psychologists from the 1960s
through the 1980s
• Dozens of meta-analyses
• Even a few meta-analyses of the meta-analyses
• Few economists, then or now, even aware of the field
Decline in interest in Test Utility research
• Regulatory ruling against validity generalization in late 1980s by Civil
Rights office in Reagan administration
• National Research Council forms committee with curious membership
to critique a single Test Utility study (critique interpreted by many as a
condemnation of all Test Utility research)
Topic 2: Overview of economic theory as it pertains to education & testing
2) Economic theory as it pertains to education in general
Traditionally, education economics has been conducted in two fields:
Labor Economics
• Labor markets for teachers and graduates
• Returns (in wages) to investment (in years) in education
Public Finance
• Returns (in achievement, attainment) to investment (in tax revenues)
• Funding equity, adequacy, efficiency, & intra-metropolitan migration
2) Economic theory as it pertains to testing in particular
Human Capital Theory
• Higher wages over the long term can more than compensate for the
earnings foregone while still in school
• …assumed a strong correlation between accumulation (years in school,
any school) and earning power (applicable knowledge and skills)
Economics of Information
• Basic economic assumption of “perfect information” is simplistic
• When buyer and seller have “asymmetric” information, classic
economic assumptions are not appropriate
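The human capital argument above (higher long-term wages offsetting the earnings foregone while in school) can be sketched as a net-present-value comparison. This is only a minimal illustration: the function names, wage levels, discount rate, and career length are all hypothetical and do not come from the presentation.

```python
def net_present_value(cash_flows, rate):
    """Discount a list of yearly cash flows (year 0 first) at the given rate."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def schooling_premium_npv(years_in_school, working_years,
                          wage_without, wage_with, rate):
    """NPV of staying in school minus NPV of going to work immediately.

    Hypothetical simplification: wages are flat, schooling has no tuition,
    and the only cost of schooling is the earnings foregone while enrolled.
    """
    # Path A: work right away at the lower wage for the whole horizon.
    work_now = [wage_without] * (years_in_school + working_years)
    # Path B: earn nothing while in school, then earn the higher wage.
    stay_in_school = [0.0] * years_in_school + [wage_with] * working_years
    return (net_present_value(stay_in_school, rate)
            - net_present_value(work_now, rate))


if __name__ == "__main__":
    # Illustrative numbers only.
    gain = schooling_premium_npv(4, 40, 30_000, 45_000, rate=0.03)
    print(f"Net gain from schooling: {gain:,.0f}")
```

A positive result means the later wage premium more than compensates for the foregone earnings; a higher discount rate or a smaller premium can reverse the sign.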
Topic 3: Human capital theory and the economics of information
3) Human capital theory: seminal works
• Human Capital (1964), Gary Becker
• Schooling, Experience, and Earnings (1974), Jacob Mincer
• Dozens of World Bank reports
3) Economics of Information: seminal works
• “The Market for Lemons” (1970) George Akerlof
– When buyers can evaluate a purchase based only on a quality assessment of the
entire group, sellers have an incentive to market poor quality merchandise and,
over time, the average quality of goods declines. Often-used counters to quality
decline are: guarantees, brand names, franchising, and credentials.
• “Economics of Imperfect Information” (1976) Rothschild, Stiglitz,
Grossman
– Perfectly competitive markets have perfect information. In markets without
perfect information, there is little incentive for private individuals to fill the
breach (Consumer Reports is an exception, and not very profitable). Thus,
there can be a role for government to promote market efficiency, by providing
information.
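The quality decline described under “The Market for Lemons” can be sketched with a toy simulation. The rules below are a deliberately simplified assumption, not Akerlof's full model: buyers pay the average quality of whatever is on offer, and any seller whose good is worth more than that price withdraws it. The function name and the quality values are hypothetical.

```python
def lemons_market(qualities, rounds=10):
    """Return the group-average quality (the price buyers pay) in each round.

    Simplifying assumptions (hypothetical): buyers cannot tell goods apart
    and so pay the average quality of everything offered; a seller keeps a
    good on the market only if its quality does not exceed that price.
    """
    offered = sorted(qualities)
    history = []
    for _ in range(rounds):
        if not offered:
            break
        price = sum(offered) / len(offered)
        history.append(price)
        # Sellers of above-average goods are underpaid and leave the market.
        remaining = [q for q in offered if q <= price]
        if len(remaining) == len(offered):
            break  # nobody else leaves; the market has settled
        offered = remaining
    return history


if __name__ == "__main__":
    print(lemons_market([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Under these assumptions the average quality on offer falls each round until only the poorest goods remain, which is the outcome that guarantees, brand names, franchising, and credentials are meant to counter.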
3) Screening, signaling, filtering, credentialing, I
• Education and Jobs: The Great Training Robbery (1970), Ivar Berg
– Employers pay for credentials, not human capital; they know little to nothing of
the quality of education programs, only the perception thereof
• Generating Inequality (1975) Lester Thurow
– Employers want “trainable” employees, and judge that those who could endure
schooling are probably more trainable than those who could not
• Work of Piore and Doeringer on “Market Segmentation”
– Neither education nor education credentials matter in “secondary” labor markets,
only in the “primary” market, which has career ladders
3) Screening, signaling, filtering, credentialing, II
• Market Signaling (1973), Michael Spence
– Diplomas are a signaling device to employers, who take a gamble with every
new hire; a diploma is evidence from which the graduate hopes employers will
conclude that certain human capital has been obtained, but it is not proof that it has
• “On the Weak versus the Strong Version of the Screening Hypothesis”
(1979) George Psacharopoulos
– Weak: employers pay only higher starting wages for “better” credentials
– Strong: employers continue to pay higher wages for “better” credentials even
after they become familiar with each employee’s actual productivity
• “Higher Education as a Filter” (1973) Kenneth Arrow
• “The Theory of Screening” (1975) Joseph Stiglitz
3) Empirical and theoretical work on standards
• Burton Weisbrod (1964)
– Discovered that 90% of adults are hired within the boundaries of a school district
other than the one from which they graduated
– So, employers are not familiar with and have no influence over the education
standards used to train virtually all their employees
• John Bishop (1980s)
– It is unreasonable to expect a teacher to be both a sympathetic coach and a
neutral judge. External exams let them be coaches exclusively, which is in
keeping with what most of them probably want anyway.
• Robert Costrell (1994)
– School district incentives are to inflate grades and socially promote. If they
maintain tough standards, they only hurt their own children in later competition
against graduates of other districts where standards are lax and grades inflated.
– Standards must be enforced externally, or they will not be.
Topic 4: Supply & demand; benefits & costs; goods & bads
4) Benefits & costs; goods & bads
• Economists are (small d) democrats
– what is a “good” or a benefit is relative to each individual; the researcher
does not get to decide what is good or bad for the consumer; consumers
decide for themselves
– but, we’d all like more money (freely exchangeable) and more free time
• Economists assume we all want more of something (even if it is
spiritual enlightenment), and that we can’t always get it
• Benefits have two phases: creation and capture
– Not all potential benefits are realized, or “captured”
– (e.g.,) You do very well and learn very much at a college with a terrible
reputation, and then cannot get a job because of that reputation
4) The demand for standardized testing
• Phelps (1998) - 40 years of public opinion poll data
– The adult public is not ignorant about standardized tests, since all have
taken many, for better or for worse
– Support for high-stakes standardized testing is overwhelming, and has
been consistently so for decades
– Most stakeholders, including students and parents, are strongly supportive.
Teachers are usually supportive, but don’t like being judged for outcomes
over which they have little control. Education professors are strongly
opposed. Administrators have been on the fence, may now be opposed.
– The year 2000 “testing backlash” was a heavily hyped public relations
creature, completely unsupported by the objective evidence.
4) “Natural Experiments” in test demand and valuation:
a) countries liberalize education,
b) drop test requirements,
c) find that standards deteriorate,
d) then revert back to testing
• Many Western European and North American states (1960s – 1970s)
• Many Post-Colonial, Newly-Independent states (1940s – 1970s)
• Ex-Communist Eastern European states (1990s – 2000s)
4) Trends in test adding/dropping, OECD countries: 1974-1999
4) Countries adding or dropping large-scale, external testing, by type of testing: 1974-1999
Number of countries or provinces...
Type of testing | ...adding testing | ...dropping testing
Assessments | 17 | 0
Upper secondary exit exams | 12* | 0
University entrance exams | 5 | 0
Subject-area end-of-course exams | 6 | 0
Lower secondary exit or entrance exams | 4 | 2
Inclusion of voc/prof tracks in exit exam system | 3 | 0
Primary/secondary-level achievement testing | 2 | 1
Diagnostic testing | 2 | 0
TOTAL | 51 | 3
4) Countries with nationally standardized high-stakes exit exams, by level of education
Primary school: Belgium (French); Italy; Netherlands; Russia; Singapore; Switzerland (some cantons)
Lower secondary school: Belgium (French); Canada: Quebec; China; Czech Republic; Denmark; France; Hungary; Iceland; Ireland; Italy; Japan; Korea; Netherlands; New Zealand; Norway; Portugal; Russia; Singapore; Sweden; Switzerland; United Kingdom: England & Wales, Scotland
Upper secondary school: Belgium: (Flemish) & (French); Canada: Alberta, British Columbia, Manitoba, New Brunswick, Newfoundland, Quebec; China; Denmark; Finland; France; Germany; Hungary; Iceland; Italy; Japan; Netherlands; Norway; Portugal; Russia; Singapore; Sweden; Switzerland; United Kingdom: England & Wales, Scotland
4) Demand for testing is not unlimited – saturation is possible
School district response to state test mandates (1991)
State and local tests' purpose and content are… | Percent of districts substituting state test
…exactly the same or very similar | 82
…somewhat or moderately similar | 69
…not at all similar or very little | 41
SOURCE: U.S. GAO, 1993.
Topic 5: The Cost of Standardized Testing (from society’s point of view)
5) Cost jargon
• Marginal cost (the cost of the next unit): For a test, it is the cost that is
incurred due to the addition of a test, and only that cost.
– (e.g., during test administration, the school building must be maintained,
but such would be the case without a test, too. The test is not responsible
for this cost.)
– Subject-matter instruction occurs whether or not there is external testing,
so it also is not a cost of the test.
• Opportunity cost (the cost of foregone opportunities; i.e., instead of
doing this, you could have been at work making money): For a test,
the time a teacher spends preparing for, monitoring, or scoring a test is
time he could have spent planning his course, grading homework, etc.
– If the teacher makes productive use of the time while students are taking a
test, there are no opportunity costs.
5) Average all-inclusive per-student costs of two test types in states having both: 1990-91
Cost factors | Type of test: Multiple-choice | Type of test: Performance
Start-up development | $2 | $10
Ongoing, annual costs | $16 | $33
SOURCE: U.S. GAO, 1993, p.43
5) Average per-student costs of two test types in states having both, with adjustments: 1990-91
 | All systemwide tests | Sample of 11 state performance tests | Sample of 6 multiple-choice tests in those same states
All-inclusive marginal cost | $15 | $33 | $16
…minus adjustment for regular school year administration | -7 | -15 | -7
...minus adjustment for replacement of preexisting tests | -6 | -12 | -12
Marginal cost after adjustments | $5 | $11 | $2
SOURCE: Phelps, 2000.
5) “Economies” jargon
• The unit cost of producing your product declines the more of an
“economy” you have (because fixed/overhead costs get spread out)
– Scale – you can sell at lower cost because you make so many of them
– Scope – you can sell at lower cost because you make other stuff that is
similar, or in similar ways
– Learning – you figure out ways to be more efficient and productive as
you gain experience
• There are many “economies” (just like validities)
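The parenthetical in the first bullet (fixed/overhead costs get spread out) can be made concrete with a small sketch of unit cost under economies of scale. The function name and the dollar figures are hypothetical and are not drawn from the testing-cost data in this presentation.

```python
def unit_cost(fixed_cost, variable_cost_per_unit, units):
    """Average cost per unit when a fixed cost is spread over `units` units."""
    if units <= 0:
        raise ValueError("units must be positive")
    return fixed_cost / units + variable_cost_per_unit


if __name__ == "__main__":
    # Illustrative numbers only: a fixed development cost of 100,000 and a
    # per-unit cost of 5 (printing, scoring, and so on).
    for n in (1_000, 10_000, 100_000):
        print(n, unit_cost(100_000, 5, n))
```

As volume grows, the fixed component per unit shrinks toward zero and unit cost approaches the variable cost, which is why scale favors large producers.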
[Figure: Economies of scale in state performance testing]
[Figure: Some economies of scope in state performance testing]
5) General structure of testing costs
[Figure: matrix of testing cost structures. One dimension, "Scorers are...": GROUPS of teachers or professional scorers; INDIVIDUAL teachers or professional scorers; a COMPUTER. Other dimension, "Students take tests...": EN MASSE; in GROUPS; ONE at a TIME.]
5) Slack capacity in U.S. students’ time = opportunity for windfall gain?
Average number of hours per day devoted to…
Region/Country | Sports | TV watching | Playing or socializing | Studying
USA | 2.2 | 2.6 | 2.5 | 2.3
East Asia (N = 5) | 0.9 | 2.4 | 1.3 | 3.1
West Europe (N = 4) | 1.6 | 2.0 | 2.4 | 2.8
East Europe (N = 7) | 1.6 | 2.6 | 2.5 | 2.9
Topic 6: The Benefits of Standardized Testing -- Information
6) Information benefits of testing
• For whom? Could be anyone – student, parent, teacher, school,
public, postsecondary institution, employer, …
• Information can be used beneficially in:
– Diagnosis (of student, teacher, school, ….)
– Alignment (to standards, schedule, each other, …)
– Learning for teachers
– Goodwill with public
– Decisions (promotion, placement, selection, …)
6) Information benefits of testing – how are they measured?
• Predictive validity (fairly measurable)
• Allocative efficiency (fairly measurable)
– (the greater the range restriction the higher the
allocative efficiency?)
• Alignment (not so easy to measure)
• Goodwill (not at all easy to measure)
Topic 7: The Benefits of Standardized Testing -- Motivation
7) Motivational benefits of testing – how are they measured?
• In controlled experiments:
– Ex. A) One group is told the test at the end of the course comes with a
reward; control group told it does not count
– Ex. B) One group is tested throughout course; control group is not
• In large-scale studies -- graduates from regions with high-stakes
tests compared to their non-tested counterparts:
– By their relative performance on another, common test
– By their relative wages after graduation
– By their relative rates of dropout, persistence, attainment, …
• “Backwash Effect” (e.g., students in states with high-stakes
high school graduation tests perform better even on the
8th-grade-level IAEP, TIMSS, or NAEP)
7) Large-scale studies finding benefits to the use of external, high-stakes examinations
• John Bishop (1980s+) several studies -- IAEP, TIMSS,
SAT, NY State, Canada, …
• Winfield; Fredericksen; Bishop; Jacobson (minimum
comp. states)
• Others: Graham, Husted (SAT); Grissmer, Flanagan
(NAEP); Phelps (TIMSS+); Carnoy (NAEP); Rosenshine
(NAEP); Braun (NAEP); Wenglinsky
7) Smaller-scale studies finding benefits to the use of high-stakes examinations
• Controlled experiments – Tuckman, Trimble; Webb; Wolf, Smith;
Egeland; Jones; Brown, Walberg; Tuckman; Khalaf, Hanna; others….
• Evaluations -- Anderson, Muir, Bateson, Blackmore, Rogers;
Heyneman; G.A.O.; Achieve; Stake, Theobald; Bond, Cohen; Calder;
Glassnap, Pogio, Miller; others…
• Case studies – S.R.E.B.; Schleisman; Neville; Goldberg, Roswell;
Schlawin; Delong; Lerner; Jett, Shafer; others…
7) Bishop's estimates of dollar value of high-stakes exams on student outcomes
 | Difference (in standard deviation units) | Difference (in grade-level-equivalent units) | Difference per student (in net present value) in 1993 dollars*
Canada: High-stakes testing provinces vs. others | .233 (in math); .183 (in science) | .75 (in math); .67 (in science) | $13,370 (in math); $11,940 (in science)
USA: New York State vs. rest of U.S. | .164 (in SAT Verbal + Math) | .75 (verbal + math) | $13,370
IAEP: High-stakes testing countries vs. others | .586 (in math) | 2.0 (in math); .7 (in science) | $35,650 (in math); $12,480 (in science)
TIMSS: High-stakes testing countries vs. others | n/a | .9 (in math); 1.3 (in science) | $16,040 (in math); $23,170 (in science)
* Based on male-female average, averaged across six longitudinal studies, cited in Bishop, 1995a, Table 2, counting only
general academic achievement, not accounting for technical abilities.
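As a quick consistency check on the table above, the dollar figures can be divided by the grade-level-equivalent differences. The sketch below copies the pairs from the table; the resulting per-grade-level value is only what the table's own numbers imply, not a figure stated in the presentation, and the names used are hypothetical.

```python
# (grade-level-equivalent difference, 1993 dollars per student),
# copied from the table of Bishop's estimates.
BISHOP_ESTIMATES = {
    "Canada, math": (0.75, 13370),
    "Canada, science": (0.67, 11940),
    "New York State, SAT verbal + math": (0.75, 13370),
    "IAEP, math": (2.0, 35650),
    "IAEP, science": (0.7, 12480),
    "TIMSS, math": (0.9, 16040),
    "TIMSS, science": (1.3, 23170),
}


def implied_dollars_per_grade_level(estimates):
    """Divide each dollar estimate by its grade-level-equivalent difference."""
    return {label: dollars / gle for label, (gle, dollars) in estimates.items()}


if __name__ == "__main__":
    for label, ratio in implied_dollars_per_grade_level(BISHOP_ESTIMATES).items():
        print(f"{label}: about ${ratio:,.0f} per grade-level equivalent")
```

Every row yields nearly the same ratio, which suggests the dollar column was obtained by applying one per-grade-level value to each difference; small discrepancies are consistent with rounding in the published figures.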
Topic 8: Optimal testing system structures
8) Single or multiple target systems
• Becker and Rosen (1990)
– A “single target” examination (e.g., minimum competency) is problematic
• Set too high, slower kids will be discouraged and drop out
• Set too low, and advanced kids will be bored and may work less
– Examination systems should have multiple targets
• Empirical Studies of 1970s–1980s Minimum Competency Exams (e.g., Ligon,
Mangino, Babcock Johnstone, Brightman, Davis)
– Performance of lowest students did improve, but that of advanced students either
stayed flat, or decreased
• Jonathan Jacobson (1992)
– Longitudinal analysis of students from minimum competency states showed that
slowest students gained and middle students lost
– Probably, the test induced resource flows to the slow students and away from the
middle students
8) Examples of multiple target systems
• Hierarchical, or “tiered,” systems – British system, New York State
– All students must pass exams with broad, common requirements, but at choice of
levels (Advanced or Ordinary; Competency or Honors)
– British just recently changed, creating a hybrid that looks more like continental
exam systems
• Branched or parallel track systems – Most of Continental Europe
– Students choose (or the choice is made for them) where to concentrate their efforts,
and they are tested mostly on that concentration
– First branching (junior high level) into academic, general, vocational
– Second branching (high school level) into subject area or vocational concentration
8) Some current research on testing system structure
• John Bishop
– Suspects that standardized end-of-course or end-of-year examinations may be the
optimal form of standardized testing.
– Why? – perhaps because they combine the best of both worlds
• standardized and external
• concise, targeted, with very strong alignment between curriculum and test
• Value-added systems
– Concerns for volatility and fairness mandate that the testing be frequent – at least
annual
• Tests are not the only quality control measure; how should the whole set be
optimized? (Phelps, Just for the Kids, others…)
8) The more high-stakes decision points, the better the student performance?
Figure 1: Average TIMSS Score and Number of Quality Control Measures Used, by Country
[Scatter plot. Vertical axis: Average Percent Correct (grades 7 & 8), 0 to 80. Horizontal axis: Number of Quality Control Measures Used, 0 to 20. Series: Top-Performing Countries; Bottom-Performing Countries.]
SOURCE: Phelps, 2001
8) Quality control has proportionally greater effect in poorer countries
Figure 2: Average TIMSS Score and Number of Quality Control Measures Used (each adjusted for GDP/capita), by Country
[Scatter plot. Vertical axis: Average Percent Correct (grades 7 & 8) (per GDP/capita). Horizontal axis: Number of Quality Control Measures Used (per GDP/capita).]
SOURCE: Phelps, 2001
Topic 9: Optimal testing industry structures
9) The industry structure game, in theory
• Selfish consumers want a perfectly competitive industry
– Lots of producers, cutthroat competition
– Easy producer entry to, exit from industry
– Low prices, lots of choice and information
• Selfish producers want to be monopolists
– Raise prices, lower quality
– Block new entrants, withhold information
9) The industry structure game, in practice
• Consumers want stable suppliers, salespeople they know, brand
names they can trust
– So, sure, they want competition, choice, and low prices…
– But, they do not want to have to try out a new brand of detergent after
every visit to the grocery store
• Producers try to avoid monopoly, or else get regulated or split up
– e.g., Microsoft pushes Apple and Corel to the brink of bankruptcy, then
tosses each of them a lifeline to keep them in business (barely)
– So, the goal is to approach having a monopoly without quite having one
9) Competitive strategy theory
• In industries with steep economies (of scale, scope, learning, ….)
there is only room for so many producers
– If you do not have the relevant “economies” in your firm, you had better
focus on a specialty niche that makes you unique, or else get out
• (e.g.) General Electric/RCA Consumer Electronics (1987)
– Crowded field: Sony, Zenith, Philips, Toshiba, Mitsubishi, others
• Sony - technological edge, reputation for quality, could charge high prices
• Niche players – Mitsubishi (big screen TVs); Sharp (flat panels)
• Low cost players – Koreans had entered market, Chinese were purchasing the
facilities of bankrupt American firms (e.g., Admiral, Philco, Sylvania)
• Japanese manufacturers were building assembly plants in US and Mexico in
order to lower their shipping costs for large sets
– GE was “stuck in the middle” – could not compete on cost or quality and
had no unique niche – they sold out
9) Possible sources of competitive advantage in the testing industry
• Advantages related to scale economies
– Huge item banks take time to accumulate and test and they are
copyrighted (‘sunk costs’ => barrier to entry)
– Established client base, relationships
• Advantages related to scope economies
– Much psychometric expertise is equally useful across a variety of tests
– Customers’ needs are largely similar across states, countries
– Good brand name provides instant cachet in new markets
• Advantages related to learning economies
– Experience working with, knowledge of clients
– Experience gained with a new type of product will lower cost for
subsequent, similar projects
9) Niche markets in educational testing (where “economies” may be of little help)
• Custom-made performance tests, “built from scratch”
• Some special education and psychological testing that requires
one-on-one administration, highly-specialized protocols, or
licensed test administrators
• Some vocational-occupational testing that employs “hands on”
demonstrations observed by specialists
• Oral interviews
Topic 10: Discussion