A-Test-Scores-11.22.15

Rivier University
Education Division
Specialist in Assessment
of Intellectual Functioning
(SAIF) Program
ED 656, 657, 658, & 659
John O. Willis, Ed.D., SAIF
11.22.15 Rivier Univ.
SAIF
Statistics
John O. Willis
1
Statistics:
Test Scores
One measurement is worth a
thousand expert opinions.
— Donald Sutherland
A little inaccuracy sometimes
saves a ton of explanation.
— H. H. Munro (Saki)
For more accurate, more detailed,
and more entertaining information
on these topics, please see W.
Joel Schneider's Psychometrics
from the Ground Up at
https://assessingpsyche.wordpress.com/psychometrics-from-the-ground-up/
and
Kevin McGrew's Applied
Psychometrics at
http://themindhub.com/research-reports
We can measure the
same thing with many
different units.
We measure the same
distances with many
different units.
[Map: Disability Rights Center, Low Avenue, NH State House, Phenix Avenue, Main Street]
0.1 miles = 528 feet = 176 yards = 6,336 inches = 161 meters = 8 chains = 32 rods
We measure the same
temperatures with
many different units.
ºC        ºF        K
100       212       373.15
37        98.6      310.15
0         32        273.15
-17.8     0         255.35
Test authors and
publishers feel
compelled to do
the same thing
with test scores.
Z scores      -4    -3    -2    -1     0     1     2     3     4
Standard      40    55    70    85   100   115   130   145   160
Scaled               1     4     7    10    13    16    19
V-Scale        3     6     9    12    15    18    21    24
T             10    20    30    40    50    60    70    80    90
NCE            1     1     8    29    50    71    92    99    99
Percentile   0.1   0.1     2    16    50    84    98  99.9  99.9
SCORES USED WITH THE TESTS
When a new test is developed, it is
normed on a sample of hundreds or
thousands of people. The sample
should be like that for a good
opinion poll: female and male,
urban and rural, different parts of
the country, different income
levels, etc.
The scores from that norming
sample are used as a yardstick for
measuring the performance of
people who then take the test.
This human yardstick allows for
the difficulty levels of different
tests. The student is being
compared to other students on
both difficult and easy tasks.
You can see from the illustration
below that there are more scores
in the middle than at the very
high and low ends. Many
different scoring systems are
used, just as you can measure
the same distance as 1 yard, 3
feet, 36 inches, 91.4 centimeters,
0.91 meter, or 1/1760 mile.
[Figure: bell curve drawn with 200 &s. Each && = 1%.]

Percent   Standard   Scaled   T         Percentile   Woodcock-Johnson
in each   Scores     Scores   Scores    Ranks        Classif.
2.2%      – 69       1–3      – 29      – 02         Very Low
6.7%      70–79      4–5      30–36     03–08        Low
16.1%     80–89      6–7      37–42     09–24        Low Average
50%       90–110     8–12     43–56     25–75        Average
16.1%     111–120    13–14    57–63     77–91        High Average
6.7%      121–130    15–16    64–70     92–98        Superior
2.2%      131 –      17–19    71 –      98 –         Very Superior

Stanines (standard scores):
Very Low         – 73
Low              74–81
Below Average    82–88
Low Average      89–96
Average          97–103
High Average     104–111
Above Average    112–118
High             119–126
Very High        127 –
Adapted from Willis, J. O. & Dumont, R. P., Guide to Identification of Learning Disabilities (3rd ed.) (Peterborough, NH: Authors, 2002,
pp. 39-40). Also available at http://www.myschoolpsychology.com/testing-information/sample-explanations-of-classification-labels/
PERCENTILE RANKS (PR) simply
state the percent of persons in the
norming sample who scored the same
as or lower than the student. A
percentile rank of 63 would be high
average – as high as or higher than
63% and lower than the other 37% of
the norming sample. It would be in
Stanine 6. The middle 50% of
examinees' scores fall between
percentile ranks of 25 and 75.
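The definition above can be sketched in a few lines of Python. This is only an illustration: the function name and the tiny norming sample are made up, and real tests use norms tables built from far larger samples.

```python
def percentile_rank(norm_scores, score):
    """Percent of the norming sample scoring the same as or lower than `score`."""
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100.0 * at_or_below / len(norm_scores)

# Hypothetical raw scores from a tiny, made-up norming sample.
sample = [12, 15, 15, 18, 20, 21, 21, 23, 25, 30]

# Seven of the ten scores are 21 or lower.
print(percentile_rank(sample, 21))
```

A percentile rank computed this way says nothing about percent correct; it only locates the score within the sample.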
A percentile rank of 63 would mean
that you scored as high as or higher
than 63 percent of the people in the
test’s norming sample and lower
than the other 37 percent.
Never use the abbreviations “%ile” or
“%.” Those abbreviations guarantee
your reader will think you mean
“percent correct,” which is an entirely
different matter.
Percentile ranks (PR) are not equal
units. They are all scrunched up in the
middle and spread out at the two
ends. Therefore, percentile ranks
cannot be added, subtracted,
multiplied, divided, or – therefore –
averaged (except for finding the
median if you are into that sort of
thing).
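A short Python sketch shows why averaging percentile ranks misleads. It assumes a perfectly normal distribution and uses the standard library's NormalDist; converting to z scores before averaging is one common workaround, not something these slides prescribe, and the helper names are hypothetical.

```python
from statistics import NormalDist, mean

unit_normal = NormalDist()  # mean 0, standard deviation 1

def pr_to_z(pr):
    """z score for a percentile rank, assuming a normal distribution."""
    return unit_normal.inv_cdf(pr / 100)

def z_to_pr(z):
    """Percentile rank for a z score, assuming a normal distribution."""
    return 100 * unit_normal.cdf(z)

prs = [50, 99]
naive = mean(prs)                                # averaging the ranks directly
via_z = z_to_pr(mean(pr_to_z(p) for p in prs))   # averaging on an equal-interval scale

print(round(naive, 1), round(via_z, 1))
```

The two answers differ by more than ten percentile points, because the distance from percentile rank 50 to 99 is much longer in standard-deviation units than the raw ranks suggest.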
NORMAL CURVE EQUIVALENTS
(NCE) were – like so many clear,
simple, understandable things –
invented by the government. NCEs
are equal-interval standard scores
cleverly designed to look like percentile ranks. With a mean of 50 and
standard deviation of 21.06, they line
up with percentile ranks at 1, 50, and
99, but nowhere else, because percentile ranks are not equal intervals.
Percentile Ranks and
Normal Curve Equivalents
PR     1  10  20  30  40  50  60  70  80  90  99
NCE    1  23  32  39  45  50  55  61  68  77  99

PR     1   3   8  17  32  50  68  83  92  97  99
NCE    1  10  20  30  40  50  60  70  80  90  99
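The relationship between percentile ranks and NCEs can be sketched in Python, assuming a perfectly normal distribution and the mean of 50 and standard deviation of 21.06 given above. The function names are illustrative, not from any test manual.

```python
from statistics import NormalDist

unit_normal = NormalDist()  # mean 0, standard deviation 1
NCE_MEAN, NCE_SD = 50, 21.06

def pr_to_nce(pr):
    """Normal curve equivalent for a percentile rank."""
    z = unit_normal.inv_cdf(pr / 100)
    return NCE_MEAN + NCE_SD * z

def nce_to_pr(nce):
    """Percentile rank for a normal curve equivalent."""
    z = (nce - NCE_MEAN) / NCE_SD
    return 100 * unit_normal.cdf(z)

# Equal steps in NCE produce unequal steps in percentile rank.
for nce in (1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 99):
    print(nce, round(nce_to_pr(nce)))
```

The printed percentile ranks bunch up near 50 and spread out toward 1 and 99, which is the rubber-band effect the slides describe.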
[Figure: percentile ranks (PR, the rubber band) plotted against normal curve equivalents (NCE, the stick) on a 0 to 100 axis.]
A Normal Curve Equivalent
of 57 would be in the 63rd
percentile rank (Stanine 6).
The middle 50% of
examinees' Normal Curve
Equivalent scores fall
between 36 and 64.
Because they are equal units,
Normal Curve Equivalents can
be added and subtracted, and
most statisticians would
probably let you multiply,
divide, and average them.
Z SCORES are the
fundamental standard score.
One z score equals one standard deviation. Although only
a few tests (favored mostly by
occupational therapists) report
them, z scores are the basis
for all other standard scores.
Z SCORES have an average
(mean) of 0.00 and a standard
deviation of 1.00. A z score of
+0.33 would be in the 63rd
percentile rank, and it would
be in Stanine 6. The middle
50% of examinees' z scores
fall between -0.67 and +0.67.
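These figures can be checked with Python's standard library, assuming a perfectly normal distribution. The function name is made up for illustration.

```python
from statistics import NormalDist

unit_normal = NormalDist()  # z scores: mean 0.00, standard deviation 1.00

def z_to_percentile_rank(z):
    """Percent of a normal distribution at or below the given z score."""
    return 100 * unit_normal.cdf(z)

# A z score of +0.33 lands at about the 63rd percentile rank.
print(round(z_to_percentile_rank(0.33)))

# The middle 50% of z scores lies between the 25th and 75th percentile ranks.
low, high = unit_normal.inv_cdf(0.25), unit_normal.inv_cdf(0.75)
print(round(low, 2), round(high, 2))
```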
Wechsler-type STANDARD
SCORES ("quotients" on some
tests) have an average (mean) of
100 and a standard deviation of
15. A standard score of 105
would be in the 63rd percentile
rank and in Stanine 6. The middle
50% of examinees' standard
scores fall between 90 and 110.
[Technically, any score defined
by its mean and standard
deviation is a “standard score,”
but we usually (except, until
recently, with tests published
by Pro-Ed) use “standard
score” for standard scores with
mean = 100 and s.d. = 15.]
Wechsler-type SCALED SCORES
("standard scores" [which they
are] on some Pro-Ed tests) are
standard scores with an average
(mean) of 10 and a standard
deviation of 3. A scaled score of
11 would be in the 63rd percentile
rank and in Stanine 6. The middle
50% of students' scaled scores
fall between 8 and 12.
V-SCALE SCORES have a mean of
15 and standard deviation of 3 (like
Scaled Scores). A v-scale score of
16 would be in the 63rd percentile
rank and in Stanine 6. The middle
50% of examinees' V-Scale Scores
fall between 13 and 17. V-Scale
Scores simply extend the Scaled-Score range downward for the
Vineland Adaptive Behavior Scales.
T SCORES have an average
(mean) of 50 and a standard
deviation of 10. A T score of 53
would be in the 62nd percentile
rank, Stanine 6. The middle
50% of examinees' T scores fall
between approximately 43 and
57. [Remember: T scores, Scaled
Scores, NCEs, and z scores are
actually all standard scores.]
CEEB SCORES for the SATs,
GREs, and other Educational
Testing Service tests used to
have an average (mean) of 500
and a standard deviation of 100.
A CEEB score of 533 would have
been in the 63rd percentile rank,
Stanine 6. The middle 50% of
examinees' CEEB scores used to
fall between approximately 433
and 567.
BRUININKS-OSERETSKY
SUBTEST SCALE SCORES have
an average (mean) of 15 and a
standard deviation of 5. A
Bruininks-Oseretsky (BOT-2)
Scale Score of 17 would be in the
66th percentile rank, Stanine 6.
The middle 50% of examinees'
scores fall between
approximately 12 and 18.
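Because every one of these scales is defined by a mean and a standard deviation, converting between them is a single step through z. The Python sketch below assumes a perfectly normal distribution and uses the means and standard deviations stated above; the dictionary keys and function names are illustrative only.

```python
from statistics import NormalDist

# name: (mean, standard deviation), as given in the text
SCALES = {
    "z": (0, 1),
    "standard": (100, 15),
    "scaled": (10, 3),
    "v-scale": (15, 3),
    "T": (50, 10),
    "NCE": (50, 21.06),
    "CEEB": (500, 100),
    "BOT-2 scale": (15, 5),
}

def convert(score, from_scale, to_scale):
    """Re-express a score on another scale by way of its z score."""
    from_mean, from_sd = SCALES[from_scale]
    to_mean, to_sd = SCALES[to_scale]
    z = (score - from_mean) / from_sd
    return to_mean + to_sd * z

def percentile_rank(score, scale):
    """Percentile rank of a score on the named scale, assuming normality."""
    scale_mean, scale_sd = SCALES[scale]
    return 100 * NormalDist(scale_mean, scale_sd).cdf(score)

print(convert(115, "standard", "T"))            # one s.d. above the mean on both scales
print(round(percentile_rank(105, "standard")))  # standard score 105
print(round(percentile_rank(53, "T")))          # T score 53
```

Published norms tables are built empirically, so a real table may differ from this idealized conversion by a point or two.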
QUARTILES ordinarily divide
scores into the lowest,
antepenultimate, penultimate,
and ultimate quarters (25%) of
scores. However, they are
sometimes modified in odd ways.
DECILES divide scores into ten
groups, each containing 10% of
the scores.
STANINES (standard nines)
are a nine-point scoring system.
Stanines 4, 5, and 6 are
approximately the middle half
(54%)* of scores, or average
range. Stanines 1, 2, and 3 are
approximately the lowest one
fourth (23%). Stanines 7, 8, and
9 are approximately the highest
one fourth (23%).
_________________________
* But who’s counting?
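For those who are counting, here is a Python sketch. It assumes the conventional stanine definition (each stanine is half a standard deviation wide, with Stanine 5 centered on the mean) and a perfectly normal distribution; the function names are made up.

```python
from bisect import bisect_right
from statistics import NormalDist

# z-score boundaries between Stanines 1|2, 2|3, ..., 8|9
CUTS = [-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75]
unit_normal = NormalDist()

def stanine_from_z(z):
    """Stanine (1 to 9) for a z score."""
    return bisect_right(CUTS, z) + 1

def stanine_from_standard_score(ss, mean=100, sd=15):
    """Stanine for a Wechsler-type standard score."""
    return stanine_from_z((ss - mean) / sd)

def percent_in_stanines(first, last):
    """Percent of a normal distribution falling in stanines first..last."""
    lower = unit_normal.cdf(CUTS[first - 2]) if first > 1 else 0.0
    upper = unit_normal.cdf(CUTS[last - 1]) if last < 9 else 1.0
    return 100 * (upper - lower)

print(stanine_from_standard_score(105))       # Stanine 6
print(round(percent_in_stanines(4, 6), 1))    # roughly 54.7
print(round(percent_in_stanines(1, 3), 1))    # roughly 22.7
```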
Why do
authors
and
publishers
create and
select
all these
different
scores?
• Immortality. We still talk about
“Wechsler-type standard scores”
with a mean of 100 and standard
deviation (s.d.) of 15. [Of
course, Dr. Wechsler’s name
has also gained some
prominence from all the tests he
published before and after his
death in 1981.]
• Retaliation? I have always
fantasized that the 1960
conversion of Stanford-Binet IQ
scores to a mean of 100 and s.d.
of 16 resulted from Wechsler’s
grabbing market share from the
1937 Stanford-Binet with his
1939 Wechsler-Bellevue and
1949 WISC and other tests.
My personal hypothesis was that
when Wechsler’s deviation IQ (M =
100, s.d. = 15) proved to be such
a popular improvement over the
Binet ratio IQ (Mental Age/
Chronological Age x 100) (MA/CA
x 100) there was no way the next
Binet edition was going to use that
score. [This idea is probably
nonsense, but I like it.]
[Wechsler went with a deviation IQ
based on the mean and s.d.
because the old ratio IQ (MA/CA x
100) did not mean the same thing
at different ages. For instance, an
IQ of 110 might be at the 90th
percentile at age 12, the 80th at
age 10, and the 95th at age 14.
The deviation IQ means the same
thing at all ages.]
[The raw data from the Binet ratio IQ
scores did show a mean of about 100
(mental age = chronological age) and
a standard deviation, varying
considerably from age to age, of
something like 16 points, so both the
Binet and the Wechsler choices were
reasonable. However, picking just
one would have made life a lot easier
for evaluators from 1960 to 2003.]
In any case, the subtle difference
between s.d. 15 and 16 (WISC
115 = Binet 116, WISC 85 = Binet
84, WISC 145 = Binet 148, etc.)
plagued evaluators with the
1960/1972 and 1986 editions of
the Binet. The 2003 edition finally
switched to s.d. 15.
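The equivalences above follow from holding the z score constant while changing the standard deviation. A minimal Python sketch, with a hypothetical function name:

```python
def rescale(score, old_sd, new_sd, mean=100):
    """Re-express a score on a scale with the same mean but a different s.d."""
    return mean + (score - mean) * new_sd / old_sd

# Wechsler-type (s.d. 15) scores re-expressed on the old Binet (s.d. 16) scale.
for wisc in (85, 115, 145):
    print(wisc, round(rescale(wisc, 15, 16)))
```

The gap grows with distance from the mean: one point at one standard deviation, three points at three.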
• Matching the precision of the
score to the precision of the
measurement. Total or composite scores based on several
subtests are usually sufficiently
reliable and based on sufficient
items to permit a fine-grained
15-point subdivision of each
standard deviation (standard
score).
It can be argued that a subtest
with less reliability and fewer items
should not be sliced so thin. There
might be fewer than 15 items! A
scaled score dividing each standard
deviation into only 3 points would
seem more appropriate, but there
are consequently big jumps
between scores on such scales.
The Vineland Adaptive Behavior
Scale v-scale extends the scaled
score measurement downward
another 5 points to differentiate
among persons with very low
ratings because the Vineland is
often used with persons who
obtain extremely low ratings. The
v-scale helpfully subdivides the
lowest 0.1% of ratings.
T scores, dividing each standard
deviation into 10 slices, are finer
grained than scaled scores (3
slices), but not quite as narrow as
standard scores (15). The
Differential Ability Scales,
Reynolds Intellectual Assessment
Scales, and many personality and
neuropsychological tests and
inventories use T scores.
Dr. Bill Lothrop often quoted Prof.
Charles P. "Phil" Fogg:
Gathering data with a rake
and examining them under
a microscope.
Test scores may give the illusion
of greater precision than the test
actually provides.
However, Kevin McGrew
(http://www.iapsych.com/iapap101/iap101brief5.pdf) warns us
that wide-band scores, such as
scaled scores, can be dangerously
imprecise. For example, a scaled
score of 4 might be equivalent to a
standard score of 68, 69, or 70 (the
range usually associated with
intellectual disability) or 71 or 72
(above that range).
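The arithmetic behind that example can be sketched in Python. The sketch assumes an idealized linear mapping in which each scaled-score point covers five standard-score points; actual norms tables are derived empirically, and the function name is made up.

```python
def standard_scores_for_scaled(scaled):
    """Whole-number standard scores that would round to the given scaled score."""
    center = 100 + 5 * (scaled - 10)   # one scaled point = 5 standard-score points
    low_edge, high_edge = center - 2.5, center + 2.5
    return [ss for ss in range(40, 161) if low_edge < ss < high_edge]

# Scaled score 4 spans standard scores on both sides of the usual cutoff of 70.
print(standard_scores_for_scaled(4))
```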
That lack of precision can have
severe consequences when
comparing scores, tracking
progress, and deciding whether a
defendant is eligible for special
education or for the death penalty
(http://www.atkinsmrdeathpenalty.com/).
The WJ IV, KTEA-II, and WIAT-III, for
example, use standard scores with
mean 100 and s.d. 15 for both (sub)tests
and composites. This practice does
not seem to have caused any harm,
even if it is unsettling to those of us
who trained on the 1949 WISC and
1955 WAIS.
• Sometimes test scores offer a
special utility. The 1986 Stanford-Binet Fourth Ed. (Thorndike,
Hagen, & Sattler) used composite
scores with M = 100 and s.d. = 16
and subtest scores with M = 50
and s.d. = 8.
With that clever system, you
could convert subtest scores to
composite scores simply by
doubling the subtest score. It
was very handy for evaluators.
Mentally converting 43 to 86 was
much easier than mentally
converting scaled score 7 or T
score 40 to standard score 85.
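A small Python comparison of the mental arithmetic involved. The function names are hypothetical; the means and standard deviations are those stated above.

```python
def sb4_subtest_to_composite(subtest):
    """Stanford-Binet Fourth Ed.: subtest (M 50, s.d. 8) to composite (M 100, s.d. 16)."""
    return 2 * subtest

def scaled_to_standard(scaled):
    """Scaled score (M 10, s.d. 3) to standard score (M 100, s.d. 15)."""
    return 100 + 15 * (scaled - 10) / 3

def t_to_standard(t):
    """T score (M 50, s.d. 10) to standard score (M 100, s.d. 15)."""
    return 100 + 15 * (t - 50) / 10

print(sb4_subtest_to_composite(43))   # doubling is all it takes
print(scaled_to_standard(7))          # needs a subtraction, a multiplication, and an addition
print(t_to_standard(40))
```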
Sample Explanation for
Evaluators Choosing to
Translate all Test Scores into
a Single, Rosetta Stone
Classification Scheme
[In addition to writing the following
note in the report, remind the reader
again in at least two subsequent
footnotes. Readers will forget.]
“Throughout this report, for all of
the tests, I am using the stanine
labels shown below (Very Low,
Low, Below Average, Low
Average, Average, High Average,
Above Average, High, and Very
High), even if the particular test
may have a different labeling
system in its manual.”
Stanines
[Figure: bell curve drawn with 200 &s, so each && = 1%.]

Stanine  Label           Percent  Percentile  Standard Score  Scaled Score  v-score  T Score
1        Very Low        4%       1 – 4       – 73            1 – 4         1 – 9    – 32
2        Low             7%       4 – 11      74 – 81         5 – 6         10 – 11  33 – 37
3        Below Average   12%      11 – 23     82 – 88         7             12       38 – 42
4        Low Average     17%      23 – 40     89 – 96         8 – 9         13 – 14  43 – 47
5        Average         20%      40 – 60     97 – 103        10            15       48 – 52
6        High Average    17%      60 – 77     104 – 111       11 – 12       16 – 17  53 – 57
7        Above Average   12%      77 – 89     112 – 118       13            18       58 – 62
8        High            7%       89 – 96     119 – 126       14 – 15       19 – 20  63 – 67
9        Very High       4%       96 – 99     127 –           16 – 19       21 – 24  68 –
Adapted from Willis, J. O. & Dumont, R. P., Guide to identification of learning disabilities (1998 New York State ed.) (Acton, MA: Copley
Custom Publishing, 1998, p. 26). Also available at http://alpha.fdu.edu/psychology/test_score_descriptions.htm.
Obviously, that explanation is
for translating all scores into
stanines. You would modify
the explanation if you elected
to translate all scores into a
different classification scheme,
such as that used with the
Woodcock-Johnson. (Boilerplate
is always risky in reports!)
Sample Explanation for
Evaluators Using the
Rich Variety of Score
Classifications Offered
by the Several Publishers
of the Tests Inflicted on
the Innocent Examinee.
“Throughout this report, for the
various tests, I am using a variety
of different statistics and different
classification labels (e.g., Poor,
Below Average, and High Average)
provided by the test publishers.
Please see p. i of the Appendix to
this report for an explanation of
the various classification schemes.”
[Figure: bell curve drawn with 200 &s. Each && = 1%.]

Percent   Standard   Scaled   V-Scale   T         z-scores          Percentile
in each   Scores     Scores   Scores    Scores                      Ranks
2.2%      – 69       1–3      1–8       – 29      < -2.00           – 02
6.7%      70–79      4–5      9–10      30–36     -2.00 to -1.34    03–08
16.1%     80–89      6–7      11–12     37–42     -1.33 to -0.68    09–24
50%       90–109     8–11     13–16     43–56     -0.67 to 0.66     25–74
16.1%     110–119    12–13    17–18     57–62     0.67 to 1.32      75–90
6.7%      120–129    14–15    19–20     63–69     1.33 to 1.99      91–97
2.2%      130 –      16–19    21–24     70 –      2.00 –            98 –

Classification labels, lowest band to highest:
Wechsler Classification: Extremely Low, Borderline, Low Average, Average, High Average, Superior, Very Superior
WISC-V: Extremely Low, Very Low, Low Average, Average, High Average, Very High, Extremely High
DAS Classification: Very Low, Low, Below Average, Average, Above Average, High, Very High
Woodcock-Johnson Classif.: Very Low, Low, Low Average, Average (90–110), High Average (111–120), Superior (121–130), Very Superior (131 –)
Pro-Ed Classification: Very Poor, Poor, Below Average, Average, Above Average, Superior, Very Superior
KTEA-3 Classification: Lower Extreme, Below Average (70–84), Average (85–115), Above Average (116–130), Upper Extreme
Vineland Adaptive Levels: Low (– 70), Moderately Low (71–85), Adequate (86–114), Moderately High (115–129), High (130 –)
My score is 110! I am
adequate, average, high
average, or above average.
I’m glad that much is clear!
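The point can be made mechanically with a small Python lookup. The bands below are a partial transcription of the chart above (only the rows needed for scores near 110), the scheme names are shortened, and the function name is made up.

```python
# Partial bands transcribed from the chart above: (lowest score, highest score, label).
SCHEMES = {
    "Wechsler": [(90, 109, "Average"), (110, 119, "High Average")],
    "DAS": [(90, 109, "Average"), (110, 119, "Above Average")],
    "Woodcock-Johnson": [(90, 110, "Average"), (111, 120, "High Average")],
    "KTEA": [(85, 115, "Average"), (116, 130, "Above Average")],
    "Vineland": [(86, 114, "Adequate"), (115, 129, "Moderately High")],
}

def labels_for(score):
    """Label each scheme assigns to the score (only the partial bands above are covered)."""
    found = {}
    for scheme, bands in SCHEMES.items():
        for low, high, label in bands:
            if low <= score <= high:
                found[scheme] = label
    return found

for scheme, label in labels_for(110).items():
    print(f"{scheme}: {label}")
```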
[Figure: bell curve drawn with 200 &s. Each && = 1%.]

Percent   Standard   Scaled   V-Scale   T         z-scores          Percentile
in each   Scores     Scores   Scores    Scores                      Ranks
2.2%      – 69       1–3      1–8       – 29      < -2.00           – 02
6.7%      70–79      4–5      9–10      30–36     -2.00 to -1.34    03–08
16.1%     80–89      6–7      11–12     37–42     -1.33 to -0.68    09–24
50%       90–109     8–11     13–16     43–56     -0.67 to 0.66     25–74
16.1%     110–119    12–13    17–18     57–62     0.67 to 1.32      75–90
6.7%      120–129    14–15    19–20     63–69     1.33 to 1.99      91–97
2.2%      130 –      16–19    21–24     70 –      2.00 –            98 –

Bruininks-Oseretsky scale scores: 0 to 28 along the same axis

Classification labels, lowest band to highest:
RIAS Classification: Significantly Below Av., Moderately Below Av., Below Average, Average, Above Average, Moderately Above Av., Significantly Above Av.
Stanford-Binet Classification: Moderately Impaired (40–54), Mildly Impaired (55–69), Borderline, Low Average, Average, High Average, Superior, Gifted (130–144), Very Gifted (145–160)
Leiter Classification: Severe Delay (30–39), Moderate Delay (40–54), Very Low/Mild Delay (55–69), Low, Below Average, Average, Above Average, High, Very High/Gifted
Woodcock-Johnson Classif.: Very Low, Low, Low Average, Average (90–110), High Average (111–120), Superior (121–130), Very Superior (131 –)
Pro-Ed Classification: Very Poor, Poor, Below Average, Average, Above Average, Superior, Very Superior
KTEA II Classification: Lower Extreme, Below Average (70–84), Average (85–115), Above Average (116–130), Upper Extreme
Vineland Adaptive Levels: Low (– 70), Moderately Low (71–85), Adequate (86–114), Moderately High (115–129), High (130 –)
Adapted from Willis, J. O. & Dumont, R. P., Guide to identification of learning disabilities (1998 New York State ed.) (Acton, MA: Copley
Custom Publishing, 1998, p. 27). Also available at http://alpha.fdu.edu/psychology/test_score_descriptions.htm.
[Detail of the low end of the classification chart]
Wechsler Classification: Extremely Low, Borderline
DAS Classification: Very Low, Low
RIAS Classification: Significantly Below Av., Moderately Below Av.
Stanford-Binet Classification: Moderately Impaired (40–54), Mildly Impaired (55–69), Borderline
Leiter Classification: Severe Delay (30–39), Moderate Delay (40–54), Very Low/Mild Delay (55–69), Low
Woodcock-Johnson Classif.: Very Low, Low
Pro-Ed Classification: Very Poor, Poor
KTEA II Classification: Lower Extreme, Below Average (70–84)
Vineland Adaptive Levels: Low (– 70), Moderately Low (71–85)
PUBLISHER'S SCORING SYSTEM FOR THE WECHSLER SCALES
[These are not the student’s own scores, just the scoring systems for the tests.]
When a new test is developed, it is normed on a sample of hundreds or thousands of people. The sample should be
like that for a good opinion poll: female and male, urban and rural, different parts of the country, different income
levels, etc. The scores from that norming sample are used as a yardstick for measuring the performance of people
who then take the test. This human yardstick allows for the difficulty levels of different tests. The student is being
compared to other students on both difficult and easy tasks. You can see from the illustration below that there are
more scores in the middle than at the very high and low ends.
Many different scoring systems are used, just as you can measure the same distance as 1 yard, 3 feet, 36 inches,
91.4 centimeters, 0.91 meter, or 1/1760 mile.
PERCENTILE RANKS (PR) simply state the percent of persons in the norming sample who scored the same as
or lower than the student. A percentile rank of 50 would be Average – as high as or higher than 50% and lower
than the other 50% of the norming sample. The middle half of scores falls between percentile ranks of 25 and 75.
STANDARD SCORES (called "quotients" on Pro-Ed tests) have an average (mean) of 100 and a standard
deviation of 15. A standard score of 100 would also be at the 50th percentile rank. The middle half of these
standard scores falls between 90 and 110.
SCALED SCORES (called "standard scores" by Pro-Ed) are standard scores with an average (mean) of 10 and a
standard deviation of 3. A scaled score of 10 would also be at the 50th percentile rank. The middle half of these
standard scores falls between 8 and 12.
QUARTILES ordinarily divide scores into the lowest, next highest, next highest, and highest quarters (25%) of
scores. However, they are sometimes modified as shown below. It is essential to know what kind of quartile is
being reported.
DECILES divide scores into ten groups, each containing 10% of the scores.
[Figure: bell curve drawn with 200 &s. Each && = 1%.]

Percent   Standard   Scaled   Percentile
in each   Scores     Scores   Ranks
2%        – 69       1–3      – 02
7%        70–79      4–5      03–08
16%       80–89      6–7      09–24
50%       90–109     8–11     25–74
16%       110–119    12–13    75–90
7%        120–129    14–15    91–97
2%        130 –      16–19    98 –

Wechsler IQ Classification: Extremely Low, Borderline (WISC-V: Very Low), Low Average, Average, High Average, Superior (WISC-V: Very High), Very Superior
WIAT-III Classifications: Very Low (– 54), Low (55–69), Below Average (70–84), Average (85–115), Above Average (116–130), Superior (131–145), Very Superior (146 –)
Quartiles: 1 = Lowest 25%, 2 = Next 25%, 3 = Next 25%, 4 = Highest 25%
Modified Quartiles: 0 = Lowest 5%, 1 = Next 20%, 2 = Next 25%, 3 = Next 25%, 4 = Highest 25%
Modified Quartile-Based Scores: 0 = Lowest 25%, 1 = Next 25%, 2 = Next 25%, 3 = Highest 25% with 1 or more errors, 4 = zero errors
Deciles: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
It is essential that the reader
know (and be reminded)
precisely what classification
scheme(s) we are using with
the scores, whether we use all
the different ones provided
with the various tests or
translate everything into a
common language.
I usually put all my test scores
in an appendix to the narrative
report. The right-most column
is usually a verbal label for each
score (e.g., “Above Average”).
I use footnotes to explain the
test scores, confidence bands,
and percentile ranks in at least
the first table in the appendix.
The last column gets a footnote
in every table so I can keep
reminding the reader that I am
either using one set of verbal
labels (not necessarily the
publisher’s) for scores or that I
am using various publishers’
different sets of labels, so the
same score may have different
names.
Ralph's Test Scores in Standard Scores, Percentile Ranks, and Stanines for his Age

Test                                                Standard  90%             Percentile (2)  Classification (3)
                                                    Score     Confidence (1)
WISC-IV Full Scale IQ                               110       106 – 114       75              High Average
DAS-II General Conceptual Ability                   110       106 – 114       75              Above Average
Woodcock-Johnson III General Intellectual Ability   110       106 – 114       75              Average
Stanford-Binet 5 Full Scale IQ                      85        81 – 89         16              Low Average
KABC-III Mental Processing Index                    85        81 – 89         16              Average
RIAS Composite Intelligence Index                   85        81 – 89         16              Below Average
WIAT III Reading Comprehension                      85        81 – 89         16              Average
Gray Oral Reading Test Oral Reading Quotient        85        81 – 89         16              Below Average
1. Even on the best tests, scores can never be perfectly accurate. This range shows how much scores are
likely to vary 90% of the time just by pure chance.
2. Percentile ranks tell the percentage of students the same age who scored the same as Ralph or lower. For
example, a percentile rank of 67 would mean that Ralph scored as high as or higher than 67 percent of
students his age and lower than the remaining 33 percent.
3. Each test uses its own particular scheme for classifying scores. The same score may be called different
names on different tests. Please see the explanation on p. i of the Appendix to this report.
Because of the dramatic discrepancy between Ralph's Average General
Intellectual Ability on the WJ III and his Average Reading Comprehension on the WIAT-III, the team should consider the possibility that he
might have a specific learning disability in reading comprehension.
1. These are the standard, scaled, or T scores used
with the various tests. Please see p. i of the Appendix
to this report for an explanation of these scores.
2. Even on the best tests, scores can never be
perfectly accurate. This range shows how much
scores are likely to vary 90% of the time just by pure
chance.
3. Percentile ranks tell the percentage of students the
same age who scored the same as Ralph or lower.
For example, a percentile rank of 67 would mean that
Ralph scored as high as or higher than 67 percent of
students his age and lower than the remaining 33
percent.
4. Each test uses its own particular scheme for
classifying scores. The same score may be called
different names on different tests. Please see the
explanation on p. i of the Appendix to this report.
– or –
4. Each test uses its own particular scheme for
classifying scores. The classification schemes for
the various tests taken by Ecomodine are
explained on p. ii. I have taken the liberty of
substituting "stanine" classifications, as explained
on p. i, for the publishers' classifications. These
are NOT the classification labels used by the
various test publishers. Please see p. ii.
If, as I usually do, I copy and
paste parts of tables into my
narrative (perhaps deleting
some rows and columns), I
again footnote all columns in the
first table and footnote the
verbal label column in all tables.
• No matter what you do, you will
confuse some readers, annoy
others, and enrage a few.
• Explain what you are doing in at
least three places in the narrative
and in a footnote on every table
and a few score citations in text.
However, bear in mind that all
such classification schemes are
arbitrary (not, as attorneys say,
“arbitrary and capricious,” just
arbitrary).
"It is customary to break down
the continuum of IQ test scores
into categories. . . . other
reasonable systems for dividing
scores into qualitative levels do
exist, and the choice of the
dividing points between different
categories is fairly arbitrary. . . .
“It is also unreasonable to place too
much importance on the particular
label (e.g., 'borderline impaired')
used by different tests that
measure the same construct
(intelligence, verbal ability, and so
on)." [Roid, G. H. (2003). Stanford-Binet Intelligence Scales, Fifth
Edition, Examiner's Manual. Itasca,
IL: Riverside Publishing, p. 150.]
Page 153
"Qualitative descriptors are only suggestions
and are not evidence-based; alternate terms
may be used as appropriate" [emphasis in
original].
[WISC-V Technical and interpretive manual, p. 152.]
Life becomes more complicated
when scores are not normally
distributed, as is often the case
with neuropsychological tests
and behavioral checklists, and
sometimes with visual-motor
and language measures.
It is easy to check. In a normal
distribution (or one that has
been brutally forced into the
Procrustean bed of a normal
distribution), the following
scores should be equivalent.
If the standard scores do not match these percentile
ranks in the norms tables, the score distribution is
not normal and the standard scores and percentile
ranks must be interpreted separately. See the test
manual and other books by the test author(s).
PR      SS     ss    v     T     B-O    z       PR
99.9    145    19    24    80    30     +3.0    99.9
98      130    16    21    70    25     +2.0    98
84      115    13    18    60    20     +1.0    84
50      100    10    15    50    15      0      50
16       85     7    12    40    10     –1.0    16
2        70     4     9    30     5     –2.0    2
0.1      55     1     6    20     0     –3.0    0.1
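The check described above can be sketched in a few lines of Python. This is only an illustration, not a procedure from any test manual. The means and standard deviations are read off the table (for example, SS = 100 + 15z), and the names `SCALES`, `equivalents`, and `matches_normal`, as well as the one-point tolerance, are assumptions made for this example.

```python
import math

# Mean and standard deviation of each scale, inferred from the table above.
SCALES = {
    "SS": (100, 15),   # standard score
    "ss": (10, 3),     # scaled score
    "v": (15, 3),      # v-scale score
    "T": (50, 10),     # T score
    "B-O": (15, 5),    # B-O scale score
}

def percentile_rank(z):
    """Percent of a normal distribution that falls below z."""
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

def equivalents(z):
    """Return the score on every scale that corresponds to a given z score."""
    row = {name: mean + sd * z for name, (mean, sd) in SCALES.items()}
    row["PR"] = percentile_rank(z)
    return row

def matches_normal(standard_score, tabled_pr, tolerance=1.0):
    """True if a norms-table percentile rank agrees with the normal curve.

    The tolerance (in percentile points) is an arbitrary choice for this sketch.
    """
    z = (standard_score - 100) / 15
    return abs(percentile_rank(z) - tabled_pr) <= tolerance

# Reproduce the rows of the table, from z = +3 down to z = -3.
for z in (3, 2, 1, 0, -1, -2, -3):
    print(z, equivalents(z))
```

If `matches_normal` returns False for score and percentile pairs copied from a test's norms tables, that is a hint that the distribution is not normal and the two kinds of scores need to be interpreted separately.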
http://myweb.stedwards.edu/brianws/3328fa09/sec1/lecture11.htm
Brian William Smith
Dumont/Willis Extra Easy Evaluation Battery
(DWEEEB)
http://www.myschoolpsychology.com/Humor.pdf
http://www.myschoolpsychology.com/wp-content/uploads/2014/02/DWEEEB.pdf
SCORES IN THE AVERAGE RANGE
[Figure: a bell curve drawn with 200 &s; each && = 1%. Scaled scores (s.s.) 1 through 19 run along the axis.]
Classification    Low Average    Average        High Average
Percent           0.1%           99.8%          0.1%
S.S.              – 55           56 – 144       145 –
T                 – 20           21 – 79        80 –
PR                – 0.1          0.2 – 99.8     99.9 –
[Figure: a bell curve drawn with 200 &s; each && = 1%. Scaled scores (s.s.) 1 through 19 run along the axis.]
Classification    Below Average    Average     Above Average
Percent           49%              2%          49%
S.S.              < 100            100         > 100
T                 < 50             50          > 50
P.R.              – 48             49 – 51     52 –
A publisher calling a score
“average” does not make the
student’s performance average.
If a student earned a Low
Average reading score of 85 on
the KTEA or WIAT-II and is then
classified as Average for precisely
the same score on the KTEA-II or
WIAT-III, the student is still in the
bottom 16% of the population!
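A small Python sketch makes the same point: the label depends on the scheme, but the percentile rank does not. The two sets of bands are copied from the classification labels on the "I call ... Average" slides in this deck. The names `TEN_POINT`, `FIFTEEN_POINT`, and `classify` are invented for this example, and no claim is made about which publisher uses which set.

```python
import math

# Each entry is (lowest score in the band, label), listed from high to low.
# Bands are copied from the classification slides; the names are hypothetical.
TEN_POINT = [
    (130, "Very Superior"), (120, "Superior"), (110, "High Average"),
    (90, "Average"), (80, "Low Average"), (70, "Borderline"),
    (float("-inf"), "Extremely Low"),
]
FIFTEEN_POINT = [
    (146, "Very Superior"), (131, "Superior"), (116, "Above Average"),
    (85, "Average"), (70, "Below Average"), (55, "Low"),
    (float("-inf"), "Very Low"),
]

def classify(score, scheme):
    """Return the label of the first band whose lower bound the score reaches."""
    for lower_bound, label in scheme:
        if score >= lower_bound:
            return label
    raise ValueError("scheme has no band for this score")

def percentile_rank(score, mean=100.0, sd=15.0):
    """Percent of a normal population scoring below the given standard score."""
    z = (score - mean) / sd
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

score = 85
print(classify(score, TEN_POINT))      # label under the first set of bands
print(classify(score, FIFTEEN_POINT))  # label under the second set of bands
print(round(percentile_rank(score)))   # the percentile rank is the same either way
```

Changing the scheme changes only the first two printed lines; the percentile rank never moves.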
HAND ME THAT GLUE GUN
Byron Preston, 15, hasn't gone to school for four
months. . . . He . . . was expelled for possession
of a "weapon" -- a tattoo gun, which he took to
school to practice tattooing on fruit. "It doesn't
shoot anything," complains his father, James. "It
just happens to have the word 'gun'." But school
officials wouldn't listen, saying a student having a
"gun" at school calls for automatic expulsion
according to their zero tolerance policy. A Prince
George's County Public Schools spokesman says
the policy is "under review" by the school board.
The Prestons have been told verbally that they
won the appeal of the expulsion, but somehow
the paperwork to reinstate Byron into school has
never shown up. (RC/WTTG-TV)
I call 90 - 109 “Average.”
[Figure: a bell curve drawn with 200 &s; each && = 1%. Scaled scores 1 through 19 run along the axis, with two sets of classification labels beneath it.]

First set of labels:
Extremely Low     – 69
Borderline        70 – 79
Low Average       80 – 89
Average           90 – 109
High Average      110 – 119
Superior          120 – 129
Very Superior     130 –

Second set of labels:
Very Low          – 55
Low               55 – 69
Below Average     70 – 84
Average           85 – 115
Above Average     116 – 130
Superior          131 – 145
Very Superior     146 –
I call 85 - 115 “Average.”
[Figure: the same bell curve of 200 &s (each && = 1%) and the same two sets of classification labels shown with the 90 – 109 slide above.]
I call 80 - 119 “Average.”
[Figure: the same bell curve of 200 &s (each && = 1%) and the same two sets of classification labels shown with the 90 – 109 slide above.]
I call him “Nice Kitty.”
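How much of the population each definition of "Average" takes in can be estimated with a short Python sketch. It assumes a normal distribution with mean 100 and standard deviation 15, and it treats each whole-number score as covering half a point on either side; both are assumptions of this example, and `percent_in_range` is a made-up helper name.

```python
import math

def cdf(score, mean=100.0, sd=15.0):
    """Proportion of a normal population scoring below the given value."""
    return 0.5 * (1 + math.erf((score - mean) / (sd * math.sqrt(2))))

def percent_in_range(low, high):
    """Percent of the population earning a whole-number score from low to high.

    Each whole score is treated as the interval from score - 0.5 to score + 0.5.
    """
    return 100 * (cdf(high + 0.5) - cdf(low - 0.5))

# The three definitions of "Average" from the slides above.
for low, high in ((90, 109), (85, 115), (80, 119)):
    print(f"{low} - {high}: {percent_in_range(low, high):.1f}% of the population")
```

Under these assumptions the three ranges cover about half, about seven in ten, and about eight in ten examinees, so the word "Average" can describe quite different shares of the population depending on who is using it.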