Koppitz Developmental Scoring System for the Bender® Gestalt Test

The KOPPITZ-2
A revision of
Dr. Elizabeth Koppitz’
Bender Developmental
Scoring System for
Young Children.
Now for ages 5 years
through 89 years.
What is the Koppitz-2?
The Koppitz-2 is an extensive revision,
redevelopment, and extension (up and
down) of the original Koppitz (1963, 1975)
Bender-Gestalt Test for Young Children,
one of the most popular individually
administered tests for children of the last
century.
How is Koppitz-2 Different from the
Original Koppitz?
• Original age range, 5-10 years.
--New = age range 5-89 years
• Original, 30 items
--New = 34 items ages 5-7, and 45 items
for ages 8-89.
• Original, local sample of 1,100, all from
NY
--New = National sample of 3,535
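As a rough illustration of the age split, the sketch below picks a record form from an examinee's age. The names `FORMS` and `select_form` and the dictionary keys are hypothetical. Only the age bands, item counts, and card counts come from the figures quoted in this presentation.

```python
# Hypothetical lookup built only from the figures quoted in this presentation.
FORMS = [
    # (min_age, max_age, items, cards)
    (5, 7, 34, 13),
    (8, 89, 45, 12),
]

def select_form(age_years):
    """Return item and card counts for the form covering this age.

    age_years is assumed to be the age in completed years (an integer).
    """
    for min_age, max_age, items, cards in FORMS:
        if min_age <= age_years <= max_age:
            return {"ages": f"{min_age}-{max_age}", "items": items, "cards": cards}
    raise ValueError("age outside the 5-89 year range")

print(select_form(6))
print(select_form(42))
```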
Key Differences in the Revision vs
Original Koppitz, cont.
• Original, only offered perceptual ages.
--New = Standard scores, percentile ranks, and
age equivalents
--Supplementary conversion to T-scores, z-scores,
NCEs, and stanines provided.
• Original, 9 cards used at all ages
--New = 13 cards for ages 5-7 and 12 cards for
ages 8-89 (16 cards overall, Designs 5-13
used at all ages)
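The relationship among these score metrics can be sketched with the usual normal-curve formulas. This is an illustration only. It assumes a standard score metric with mean 100 and SD 15, which this presentation does not state, and the function name `convert_standard_score` is hypothetical. The manual's own conversion tables should be used in practice.

```python
import math
from statistics import NormalDist

def convert_standard_score(score, mean=100.0, sd=15.0):
    """Convert a standard score to other normalized metrics.

    mean and sd are assumptions (the common 100/15 metric), not values
    taken from the presentation.
    """
    z = (score - mean) / sd
    # Stanines have mean 5 and SD 2; round half up, then clip to 1..9.
    stanine = math.floor(2 * z + 5.5)
    return {
        "z": z,
        "T": 50 + 10 * z,        # T-scores: mean 50, SD 10
        "NCE": 50 + 21.06 * z,   # normal curve equivalents: mean 50, SD 21.06
        "stanine": max(1, min(9, stanine)),
        "percentile": 100 * NormalDist().cdf(z),
    }

print(convert_standard_score(115))
```

The percentile here comes from the normal distribution, so it will match published tables only to the extent that those tables are themselves normalized.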
Key Differences in the Revision vs
Original Koppitz, cont.
• Original, scored via error count.
--New = scored in a positive direction via
an absence of errors
• Original, items selected on group mean
differences only
--New = classical true score theory
applied to item selection
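The shift from counting errors to crediting their absence can be pictured as follows. This is only a sketch. The item names are invented for illustration and are not actual Koppitz-2 scoring items, and one point per item is an assumption.

```python
def error_count(error_present):
    """Original-style score: number of items on which an error was observed."""
    return sum(1 for has_error in error_present.values() if has_error)

def positive_score(error_present):
    """Positive-direction score: one point (assumed) per item with no error."""
    return sum(1 for has_error in error_present.values() if not has_error)

# Hypothetical protocol: True means the scored error was observed.
protocol = {"item_a": False, "item_b": True, "item_c": False, "item_d": False}

print(error_count(protocol), positive_score(protocol))
```

Under this framing a higher score indicates better performance, which matches the direction of standard scores and percentile ranks.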
And, How is it the same?
• Retains all 9 original Bender cards at all
age levels.
• Retains Koppitz emphasis on the Bender
as assessing VMI from a developmental
perspective.
• Retains Koppitz’ conceptual approach to
interpretation of BG performance.
• Retains the unstructured aspects of
Bender administration.
And, How is it the same, cont.?
• Provides data on time to completion.
• Retains Koppitz’ emotional indicators but
adds more scoring guides and provides an
EI record form for ease of use.
Original scoring system
• Koppitz’ (1963) original developmental scoring
system was derived from a list of distortions Koppitz
observed in the drawings of young children
using the original nine Bender (1938) cards.
• Koppitz then chose items based on their ability
to differentiate among 77 children from grades 1
through 4 on the basis of grade placement.
• These items were then used to develop normative
data from the protocols of 1,100 children ages 5
through 10 years.
How were new items derived?
• A group of experienced psychologists was gathered to
write developmentally appropriate items for the new
designs apparent in the Bender-Gestalt II and its 16
(versus 9 in the original) designs.
• Additional scoring elements were written for the original
Bender designs as well since it was not known how the
original Koppitz items would fare using modern
techniques of item selection.
• Once these items were agreed upon (and there were
more than 100 for initial analyses), trained, supervised
staff scored all protocols according to the newly devised
Koppitz Developmental Scoring System.
How were new items derived, cont.?
• Classical test theory was used to guide the item
analyses which were performed on all agreed
upon scoring elements.
• Corrected or partial point-biserial correlations
between item and total score were calculated at every
age interval.
• Using the item means and discrimination
indexes (reviewed separately by age level), 34
items were retained for 5- through 7-year-olds
and 45 items for the groups aged 8 years and older.
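The two statistics named above can be sketched as follows: the item mean (proportion of examinees credited on the item) and the corrected point-biserial, meaning the correlation between a 0/1 item and the total score with that item removed. The data and function names are hypothetical, and the retention thresholds the authors applied are not given here, so none are assumed.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; undefined if either variable has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def item_means(scores):
    """Proportion of examinees credited on each 0/1 item."""
    return [sum(col) / len(col) for col in zip(*scores)]

def corrected_item_total(scores):
    """Correlate each 0/1 item with the total of the remaining items."""
    n_items = len(scores[0])
    out = []
    for j in range(n_items):
        item = [row[j] for row in scores]
        rest = [sum(row) - row[j] for row in scores]  # total without item j
        out.append(pearson(item, rest))
    return out

# Hypothetical data: rows are examinees in one age interval, columns are items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]

print(item_means(scores))
print(corrected_item_total(scores))
```

Removing the item from the total avoids inflating the correlation, which matters most when the item pool is small.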
Koppitz-2 Materials
• 16 design cards of the Bender-Gestalt-II
• Two record forms
Ages 5-7 years
Ages 8-89 years
• Supplemental EI Record Form
• Scoring template
• Examiner’s Manual
The Koppitz-2 Standardization Sample
3,535 individuals drawn during 2001 and
2002 to represent the US population at
large according to the 2000 US Census
statistics, stratified on the basis of age,
sex, race, ethnicity, geographic region,
and SES level (as estimated by
educational attainment—of parents for
children and the individual for adults).
How successful was the actual
sampling?
• Very!
• On nearly all variables, the actual sample was
within 1 percentage point of the population
values. The largest discrepancy occurred on
SES, where the sample was off by 2.1%,
undersampling those with more than a HS education.
• In calculation of the norms tables, sample
weights were calculated to perfectly mimic the
population data.
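One simple way to picture such weights is post-stratification on a single variable, where each stratum's weight is its population proportion divided by its sample proportion. The sketch below uses made-up counts and proportions. The actual weighting in the norming involved several stratification variables, and the procedure used is not described here.

```python
def poststratification_weights(sample_counts, population_props):
    """Weight per stratum = population proportion / sample proportion."""
    n = sum(sample_counts.values())
    return {
        stratum: population_props[stratum] / (count / n)
        for stratum, count in sample_counts.items()
    }

# Hypothetical numbers for illustration only (not Koppitz-2 data).
sample_counts = {"hs_or_less": 520, "more_than_hs": 480}
population_props = {"hs_or_less": 0.50, "more_than_hs": 0.50}

weights = poststratification_weights(sample_counts, population_props)

# After weighting, each stratum's share of the sample matches the population.
n = sum(sample_counts.values())
weighted_share = {s: sample_counts[s] * weights[s] / n for s in sample_counts}
print(weights)
print(weighted_share)
```

An undersampled stratum receives a weight above 1 and an oversampled stratum a weight below 1.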
Summary of Reliability Results
The overall reliability of the Koppitz-2 DBGT VMI has
been demonstrated to be quite good. Relative to
Anastasi and Urbina’s (1997) three sources of test error
(content, time, and scorer), the coefficients determined
demonstrate very acceptable levels of score reliability.
The internal consistency reliability of the VMI is
consistently high across demographic classifications as
well as various diagnostic groups. The magnitude of
these reliability coefficients strongly suggests that the
Koppitz DBGT scores generally possess relatively small,
acceptable amounts of error and that test users can
have confidence in the consistency of Koppitz DBGT
results when obtained after carefully following the
standardized administration and scoring procedures
detailed in the manual.
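For readers unfamiliar with internal consistency estimates, the sketch below computes coefficient alpha on a small 0/1 item matrix. The presentation does not say which coefficient was used, so alpha is an assumption chosen because it is a common option. The data and function name are hypothetical.

```python
from statistics import pvariance

def coefficient_alpha(scores):
    """Coefficient alpha: k/(k-1) * (1 - sum of item variances / total variance)."""
    k = len(scores[0])
    item_vars = [pvariance(col) for col in zip(*scores)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical data: rows are examinees, columns are items scored 0/1.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]

print(round(coefficient_alpha(scores), 3))
```

Values closer to 1 indicate that the items vary together and that less of the score variance is attributable to content sampling error.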