Cross-cultural research with the Snijders-Oomen Nonverbal
Intelligence Tests
Jaap A. Laros¹ and Peter J. Tellegen²
¹ University of Brasília, Brazil; ² University of Groningen, The Netherlands
Abstract
The SON-R 2.5-7 and the SON-R 5.5-17 are the latest revisions of the Snijders-Oomen
nonverbal intelligence test, originally developed in the Netherlands in 1943 for use with deaf
children. The tests consist of 6 to 7 subtests mainly focussed on visual-spatial abilities and
abstract and concrete reasoning. Research in the Netherlands indicates that the nonverbal
SON-R tests are well suited for use with children of ethnic minorities. With traditional tests
the cognitive abilities of minority children are often underestimated as a result of their lack of
knowledge of the official language. Notwithstanding the favorable research results with
minority children in the Netherlands, it cannot simply be assumed that the SON-R tests can
be used unmodified in countries that differ greatly from the Netherlands. In this
presentation we will discuss results obtained with the SON-R tests in Australia, the USA,
Great Britain, China, Peru and Brazil.
Introduction
With the latest revision and standardization in the Netherlands of the Snijders-Oomen
Nonverbal Intelligence Tests, various cross-cultural validation studies have been initiated.
The objective of these studies was to verify to what degree the SON-R tests can be used in
cultures different from the cultures of Western Europe in general, and from the culture of
the Netherlands in particular. In other words, the goal of these studies was to
determine whether, and to what extent, adaptations of the material (e.g., test instructions, testing format,
examples, items) of the SON-R tests are required for cross-national and cross-cultural use.
The purpose of adapting the SON-R tests is to be able to assess the
construct of intelligence in a fair way in multiple cultures.
A crucial phase in the adaptation of conventional tests is the translation of the instrument from
the source language into a target language. To obtain an equivalent test in another language or
culture, not only is a translation needed that preserves the meaning of the verbal test materials,
but additional changes may also be necessary to ensure the equivalence of the versions of the
test in multiple languages or cultures, such as changes to item format and testing
procedures (Hambleton, 1993). Close correspondence between the original version and the
translated version is required before reliance can be placed on results based on translated tests.
In addition to using effective translation practices, test users also need to examine the
psychometric properties of translated tests (Ellis, 1995). The adaptation of nonverbal tests for
multiple cultures does not include the difficult and often very problematic test translation
phase and is therefore much less complicated than the adaptation process of (partly) verbal
tests.
The fact that the adaptation process of nonverbal tests is much less complicated than the one
needed for intelligence tests which use written or spoken language is one of the great
advantages of nonverbal intelligence tests. The circumstance that nonverbal intelligence tests
often do not need translation does not automatically mean that there is no need for empirical
studies on the equivalence of their applications in different cultures. Van de Vijver and
Poortinga (1997) noted that when a psychological instrument developed in one society is
applied in a different cultural context, invariance of psychometric properties like reliability
and validity cannot be merely assumed, but has to be empirically demonstrated. The
occurrence of bias can change the psychometric properties of an instrument when it is used in
a different culture. Bias can appear at various levels: at the test, subtest, or item level. A test,
subtest, or an item is biased if it does not measure the same psychological trait across cultural
groups (Van de Vijver, 1998). Three different types of bias can be differentiated: construct
bias, method bias and item bias (Van de Vijver & Poortinga, 1997). A test shows construct
bias if the construct measured is not identical across different cultural groups. Method bias
occurs as a consequence of nuisance variables due to method-related factors.
Three sorts of method bias can be distinguished: sample bias, instrument bias and
administration bias. Sample bias occurs when the samples are incomparable on aspects other
than the target variable. Instrument bias is related to characteristics of the instrument that are
not identical for different culture groups. An example of this kind of bias is stimulus
familiarity. The third type of method bias occurs when there are problems related to the
administration of a test, for instance when communication problems arise because the subject
has insufficient knowledge of the language that the examiner uses.
Item bias occurs if persons with the same amount of the trait being measured, but belonging to
different groups, have different probabilities of giving a specific response to the item.
The probability that administration bias will occur with a nonverbal intelligence test like the
SON-R is low, because the instructions are given either nonverbally or verbally, depending
on the communication possibilities of the subject. Moreover, the feedback provided
after each item and the examiner’s ‘showing how to do it’ reduce the chance that
administration bias occurs.
Drenth (1975) has described the SON-test as an example of culture-reduced tests, meaning
that these tests only measure a limited number of culturally determined skills that they do not
intend to measure. Examples of culturally determined skills are: being able to use a pencil,
being capable of working with numbers, and being able to understand the instructions. Drenth
has argued that it is not essential for a culture-reduced test to produce similar score
distributions in different cultural groups; the only condition for a culture-reduced test is that it
should not reflect skill differences determined by cultural factors. Drenth defines a culture-fair
test as a culture-reduced test for the particular groups under consideration.
The conception of Jensen (1980) of a culture-reduced test differs slightly from Drenth´s
conception: Jensen considers a test culture-reduced when the performance on this test is only
to a limited extent influenced by culturally determined familiarity with the stimulus figures
and response format. Jensen has formulated the following criteria for culture-reduced tests: (a)
the test has to be a performance measure, (b) instructions should be given in mime to exclude
the influence of language, (c) preliminary practice items should be provided, (d) items should
not depend on time, (e) items should require abstract reasoning rather than factual information,
and (f) problems should be designed in such a way as to ensure subjects are unable to guess
from memory of similar items encountered in the past. The SON-R tests do not satisfy all of
Jensen’s criteria for culture-reduced tests because one subtest of the SON-R 5.5-17
(Hidden Pictures) uses a time limit. Jensen has also stated that the use of abstract
geometrical figures reduces the likelihood of cultural bias. In the SON-R tests various subtests
contain concrete, meaningful pictures instead of abstract, geometrical figures. The subtests
using meaningful picture materials might be culture-specific.
The research finding that immigrant children in the Netherlands (mainly children from
Morocco, Turkey, Surinam and the Dutch Antilles) perform better on the SON-R tests than on
traditional intelligence tests like the WISC-R and RAKIT (Laros & Tellegen, 1991; Tellegen,
Winkel, Wijnberg-Williams & Laros, 1998) provides a positive empirical indication of the
culture-fairness of the SON-R for the immigrant groups in the Dutch population. One of the
reasons why immigrant children attain lower mean scores on traditional intelligence tests than
on nonverbal intelligence tests like the SON-R is the relatively strong reliance of these tests on
verbal abilities and specific knowledge learned in school. This is especially the case with the
so-called omnibus intelligence tests like the various Wechsler scales which contain subtests
like Information and Vocabulary which presuppose specific knowledge learned in school
(Helms-Lorenz & Van de Vijver, 1995).
The fact that minority groups show lower means on a test, however, does not necessarily
mean that a test is culturally biased. Van de Vijver & Poortinga (1992) argued that the
desirability of cultural loadings in measurement procedures is determined by the intention of
the test in question. If a particular test is intended to test knowledge gained during a course at
school it is quite likely that culture-specific knowledge is tested. In this case, cultural loadings
in tests are unavoidable and even desirable. In general a distinction can be made between
generalizations about achievements and about aptitudes. In the latter case, cultural loadings
are undesirable (Helms-Lorenz & Van de Vijver, 1995).
A second research result with positive implications for the culture-fairness of the
SON-R tests for the immigrant groups in the Netherlands is the finding that there is no
relation between their length of stay in the Netherlands and their IQ-scores, indicating that
performance on the SON-R is not dependent on knowledge of the Dutch language (Laros &
Tellegen, 1991).
Notwithstanding these positive indications of the culture-fairness of the SON-R
tests, some negative indications are also available, especially with reference to the subtests of
the SON-R in which concrete, meaningful picture materials are used instead of abstract
geometrical figures. Our expectation before the start of the cross-cultural validation studies
with the SON-R tests, was to encounter some bias in the subtests that use meaningful picture
materials because some of the pictures used seemed rather specific for western cultures.
Before presenting the results of these studies, a short description of the SON-R tests will be
given.
Description of the SON-R tests
The SON test was originally developed in 1943 by Snijders-Oomen for use with deaf
children. She intended to measure a broad spectrum of intelligence functions without
being dependent on the use of oral or written language. With subsequent revisions, norms
and instructions for use with hearing subjects were also developed. The latest revision
comprises separate tests for younger and older children, the SON-R 2.5-7 (Tellegen
et al., 1998) and the SON-R 5.5-17 (Snijders, Tellegen & Laros, 1989). In table 1 some
characteristics of the tests are presented.
Table 1: Characteristics of the SON-tests in the Netherlands

                               SON-R 2.5-7                  SON-R 5.5-17
 age range                     2;0 – 7;11 years             5;6 – 16;11 years
 N sample                      1124                         1350
 N subtests                    6                            7
 reasoning abilities           Categories, Situations,      Categories, Situations,
                               Analogies                    Analogies, Stories
 spatial abilities             Mosaics, Patterns, Puzzles   Mosaics, Patterns
 perceptual abilities          ---                          Hidden Pictures
 administration time           50 min.                      90 min.
 mean reliability subtests     .72                          .76
 reliability total score       .90                          .93
 generalizability total score  .78                          .85
Most subtests are related to either reasoning abilities or spatial abilities, while the
SON-R 5.5-17 also includes a test for perceptual abilities. In the subtest Categories
the subject has to find the common element between three pictures and select two other
pictures that belong to the same category. In Situations one or more parts are missing
from a drawing and the subject has to select those parts from a number of alternatives to
make the drawing a meaningful whole. In Analogies the transformation of an abstract
element is shown and the subject has to perform the same transformation on another
element by selecting the proper alternative. In Stories cards have to be ordered to make
a meaningful story.
The SON-R 2.5-7
The revised version of the Snijders-Oomen Nonverbal Intelligence Test (Tellegen et al.,
1998) for children between the ages of 2.5 and 7 years comprises 6 subtests. In sequence
of administration the subtests are: Mosaics, Categories, Puzzles, Analogies, Situations
and Patterns. In the subtest Mosaics the child is asked to copy different mosaic patterns
with red/white squares. In the first part of the subtest Categories the child has to sort
cards based on the category they belong to; in the second part, two out of five pictures have
to be chosen that are missing in a certain category of pictures. In the first part of
Puzzles, three pieces have to be copied in a frame to resemble an example; in the second
part the child is asked to form a whole from three to six puzzle pieces. In the first part of
Analogies the child is required to sort three to five forms according to form, colour, and
size into two compartments. In part two of Analogies a geometric figure changes in one
or more aspects to form another figure. To obtain a similar transformation with a second
figure, the child is asked to choose the correct alternative. In the first part of Situations,
items show only the upper halves of four pictures and the child is asked to find the
missing halves. In the second part of Situations the task is to indicate the missing parts
of drawings of concrete situations. In Patterns the child is asked to copy several patterns
with a pencil.
The subtests Mosaics, Puzzles and Patterns reflect spatial abilities, while the remaining
subtests Categories, Analogies and Situations are more directed to abstract and concrete
reasoning abilities.
The SON-R 2.5-7 is administered individually; the mean administration
time amounts to 50 minutes. A very important element of the
instructions is the examiner’s ‘showing how to do it’ for part of the items.
Another very important aspect of the test administration is the feedback that the
examiner offers after each item. The feedback offered with the SON-R 2.5-7 goes
beyond the feedback given with the SON-R 5.5-17; the child is not only informed
whether the answer was right or wrong, but the examiner helps the child to find the
correct solution. Because of this aspect the SON-R 2.5-7 has more similarities with a
learning potential test than with a traditional test of intelligence (Tellegen & Laros,
1993a).
The standardization of the SON-R 2.5-7 is based on a nationwide sample of 1124
children varying in age from 2 years and 3 months to 7 years and 3 months. The
reliability (alpha stratified) of the total score of the SON-R 2.5-7 increases from .86 at
2.5 years to .92 at 7.5 years with a mean value of .90. The generalizability of the total
score (alpha) increases from .71 at 2.5 years to .82 at 7.5 years with a mean value of .78.
The average reliability of the subtests is .72. The test-retest correlation of the SON-R
2.5-7 with an interval of 3 months is .79. Besides a total IQ-score, scores on the
Performance Scale and the Reasoning Scale of the SON-R 2.5-7 are also calculated. The
score on the Performance Scale is based on the three performance subtests Mosaics,
Puzzles and Patterns, and the score on the Reasoning Scale is based on the three
reasoning subtests Categories, Analogies and Situations.
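For readers who want to see what the ‘alpha stratified’ coefficient reported above involves, the following sketch illustrates one common way of computing it from subtest variances and subtest reliabilities. The numeric values and the six-subtest layout are made-up placeholders, not figures taken from the SON-R 2.5-7 manual.

```python
# Illustrative computation of stratified coefficient alpha.
# All numbers below are hypothetical placeholders, not actual SON-R 2.5-7 values.

def stratified_alpha(subtest_vars, subtest_alphas, total_var):
    """Stratified alpha: 1 - sum(var_i * (1 - alpha_i)) / var_total."""
    error = sum(v * (1.0 - a) for v, a in zip(subtest_vars, subtest_alphas))
    return 1.0 - error / total_var

subtest_vars   = [9.0, 8.5, 10.2, 9.6, 8.8, 9.9]      # scaled-score variances (hypothetical)
subtest_alphas = [0.70, 0.74, 0.68, 0.75, 0.73, 0.72]  # subtest reliabilities (hypothetical)
total_var      = 225.0                                 # variance of the total score (hypothetical)

print(round(stratified_alpha(subtest_vars, subtest_alphas, total_var), 2))
```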
The validity of the SON-R 2.5-7 has been investigated through various validation
studies in the Netherlands and in other countries like Great Britain, the USA, and
Australia by comparing the results on this test with results on other intelligence- and
language development tests. The results of the studies outside the Netherlands will be
described in the third part of this paper.
The results on the SON-R 2.5-7 have been compared in the Netherlands with the
following tests: the WISC-R, the WPPSI-R, the TONI-2, the Stutsman, the Kaufman-ABC,
the BOS 2-30, the LDT, the RAKIT, the TOMAL, the DTVP-2 and the Reynell and
Schlichting language development tests. The sample size of the various validation
studies varies from 26 to 558 subjects; the mean sample size amounts to 118 subjects.
The 21 correlations of the SON-R 2.5-7 with other non-verbal (intelligence) tests vary
from .45 to .83 and have a mean value of .65. The 12 correlations with general
intelligence measures vary from .54 to .87 with a mean of .65. The 19 correlations of the
SON-R 2.5-7 with measures for verbal ability and verbal intelligence vary from .20 to
.71 and have a mean value of .48. With some of the general intelligence tests it is
possible to calculate the correlation with the performance scale, the verbal scale and
with the total score. In all of these cases the correlation of the SON-R 2.5-7 with the
performance scale was higher than the correlation with the verbal scale.
These diverse validation studies support the divergent and convergent validity of the
SON-R 2.5-7, but the highly varying correlations imply that substantial differences
between the scores on the SON-R 2.5-7 and on other intelligence tests can appear.
Possible explanations for these highly varying correlations are: differences between the
contents of the SON-R and other intelligence tests, differences in the test procedure of
the SON-R and the various other tests, the very young age at which the children were
tested, and the long interval between the administrations of the tests.
The SON-R 5.5-17
The revised version of the Snijders-Oomen Nonverbal Intelligence Test (Snijders, Tellegen &
Laros, 1989) for children and adolescents between the ages of 5.5 and 17 years consists of 7
subtests. In sequence of administration the 7 subtests are: Categories, Mosaics, Hidden
Pictures, Patterns, Situations, Analogies and Stories. In Categories a child has to choose two
out of five pictures, which are missing in a certain category of pictures. The task in Mosaics
consists of copying figures with red/white squares. In Hidden Pictures the task is to find a
given picture that is hidden several times in a bigger drawing. In Patterns a part of a particular
pattern or line is missing: the child has to draw the missing part with a pencil. The task in
Situations is to indicate the missing parts of drawings of concrete situations. In Analogies
geometrical figures are presented with the problem format A : B = C : D; the child has to
discover the principle behind the transformation A : B and apply it to figure C to find the
correct figure D out of four alternatives. In Stories the child has to order a number of cards in
such a way that they form a logical story. Categories, Situations and Analogies are multiple
choice tests, while Mosaics, Hidden Pictures, Patterns and Stories are so-called ‘action’ tests.
In action tests the solution has to be sought in an active manner which makes observation of
behaviour possible.
The SON-R 5.5-17 can be divided into four types of tests according to their contents: abstract
reasoning tests (Categories & Analogies), concrete reasoning tests (Situations & Stories),
spatial tests (Mosaics & Patterns) and perceptual tests (Hidden Pictures). Principal
components analysis (PCA) has been performed on the subtest correlations to obtain
empirical confirmation of the theoretical dimensions of the test. Although for the youngest
age groups the four theoretical dimensions were supported by the loadings on the first four
varimax rotated components, across all age groups the two component solution with a
‘reasoning’ and a ‘spatial’ component provided more consistent results.
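As an illustration of the analysis described above, the sketch below performs a principal components analysis on a subtest correlation matrix and applies a varimax rotation to the first two components. The correlation matrix is a constructed placeholder with an assumed ‘reasoning’ and ‘spatial’ pattern; it does not contain the actual SON-R 5.5-17 correlations.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Varimax rotation of a (p variables x k components) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3
                          - (gamma / p) * rotated @ np.diag(np.sum(rotated ** 2, axis=0))))
        rotation = u @ vt
        d_new = np.sum(s)
        if d != 0.0 and d_new / d < 1 + tol:
            break
        d = d_new
    return loadings @ rotation

# Constructed (placeholder) correlation matrix for the 7 subtests, built from an
# assumed two-dimensional structure; not the actual SON-R 5.5-17 correlations.
names = ["Categories", "Situations", "Analogies", "Stories",
         "Mosaics", "Patterns", "Hidden Pictures"]
structure = np.array([[0.7, 0.2], [0.7, 0.2], [0.6, 0.3], [0.7, 0.1],
                      [0.2, 0.7], [0.2, 0.7], [0.3, 0.4]])
R = structure @ structure.T
np.fill_diagonal(R, 1.0)

# PCA: eigendecomposition of the correlation matrix, loadings of the first two
# components, then varimax rotation of those loadings.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1][:2]
loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
for name, row in zip(names, varimax(loadings).round(2)):
    print(f"{name:16s} {row}")
```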
The SON-R 5.5-17 is administered individually; the average administration time amounts to
90 minutes. The role of time scoring is kept to a minimum; only the subtest Hidden
Pictures uses a time limit. For the other subtests sufficient time is allowed for
answering each item. The administration procedure of the SON-R differs from the
procedure used in traditional intelligence
tests in two important aspects: (1) the provision of feedback following each item, and (2) the use of an adaptive
procedure. The feedback given during the application of the SON-R 5.5-17 is restricted to
informing the examinee whether the answer was right or wrong. The feedback clarifies the
instructions and gives the child the opportunity to learn from his own errors and successes and
to adjust his problem-solving strategy. With the test procedure used in the SON-R, each item
becomes an opportunity to learn and adjust (Tellegen & Laros, 1993).
The adaptive procedure is made possible by dividing the subtests in two or three parallel
series of about 10 items; every child starts with the easiest item of the first series. Each series
is broken off after two errors; the starting point for the next series is determined by the score
on the preceding series. In this way, the administration of items is determined by the subject´s
individual performance and the presentation is limited to the most relevant items for each
subject. On average, the number of items presented with the adaptive procedure is 50% of
the total number of items in the subtests, while the maximum number presented amounts to
about 60%.
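To make the adaptive routing described above more concrete, the sketch below simulates the administration of one subtest. The division into parallel series and the break-off rule (two errors per series) follow the description in the text; the entry rule for the next series and the crediting of skipped items are simplified assumptions, not the official SON-R routing tables.

```python
def administer_subtest(series, solves_item):
    """Adaptively administer one subtest.

    series      -- two or three lists of items, each ordered from easy to hard
    solves_item -- callable(item) -> bool, simulating the child's response
    """
    score = 0
    start = 0                              # the first series starts with the easiest item
    for items in series:
        errors = 0
        passed = 0
        for item in items[start:]:
            if solves_item(item):
                passed += 1
            else:
                errors += 1
                if errors == 2:            # each series is broken off after two errors
                    break
        score += start + passed            # items below the entry point are credited (assumption)
        start = (start + passed) // 2      # simplified entry rule for the next series (assumption)
    return score

# Tiny demonstration: three series of ten items of increasing difficulty,
# with a simulated child who solves items up to difficulty level 6.
series = [list(range(1, 11)) for _ in range(3)]
print(administer_subtest(series, solves_item=lambda difficulty: difficulty <= 6))
```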
The standardization of the SON-R 5.5-17 is based on a nationwide sample of 1350 children
and adolescents varying in age from 6 to 14 years. The reliability coefficient (alpha stratified)
of the total score of the test increases from .90 at six years to .94 at fourteen years with a
mean value of .93. The generalizability of the total score (alpha) increases from .81 at six
years to .88 at fourteen years with a mean value of .85. The average reliability of the subtests
is .76. The validity of the SON-R 5.5-17 is evident from the clear relationship with different
indicators of school career such as school type, class repetition and school report marks. The
mean multiple correlation of the SON-R with these indicators of school career amounts to .59.
For children in the age group of 7 to 9 years the multiple correlation is .54, for children in the
age group 10-11 years .60, and for children in the age group 13-14 years the multiple
correlation increases to .63.
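The multiple correlations reported above can be understood as the correlation between the observed SON-R total score and the score predicted from the school-career indicators by a linear regression. The sketch below illustrates that computation on simulated data; the variable names and values are hypothetical.

```python
import numpy as np

def multiple_correlation(y, X):
    """Multiple correlation of y with the columns of X (illustrative sketch)."""
    X1 = np.column_stack([np.ones(len(y)), X])        # add an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)     # least-squares regression weights
    y_hat = X1 @ beta
    return np.corrcoef(y, y_hat)[0, 1]                # correlation of observed and predicted

# Hypothetical data: a SON-R total score and three school-career indicators.
rng = np.random.default_rng(0)
n = 200
indicators = rng.normal(size=(n, 3))                  # e.g. school type, repetition, marks (placeholders)
iq = 100 + 8 * indicators[:, 0] + 5 * indicators[:, 2] + rng.normal(scale=10, size=n)
print(round(multiple_correlation(iq, indicators), 2))
```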
Cross-cultural studies with the SON-R 2.5-7
Research in Australia
The validity study with the SON-R 2.5-7 in Australia was executed in 1996 in Victoria under
supervision of Jo Jenkinson of the Deakin University in co-operation with the University of
Groningen (Jenkinson, Roberts, Dennehy & Tellegen, 1996; Tellegen, 1997).
The study is based on 155 subjects, 72 boys and 83 girls, with a mean age of 4 years and
5 months (standard deviation 10 months). Within the group of 155 children three groups can
be differentiated: children without specific handicaps (N=59); hearing impaired children
(N=59), and children with a developmental retardation (N=37).
In this research both the Wechsler Preschool and Primary Scale of Intelligence – Revised
(WPPSI-R) and the SON-R 2.5-7 were administered in counterbalanced order; the mean interval
between applications was 20 days. The SON-R 2.5-7 was administered according to the
standard test procedure by psychology students from the University of Groningen, while the
WPPSI-R was administered by Australian psychologists. The hearing impaired children and
the children with a developmental retardation only did the performance scale of the WPPSI-R.
Results
The correlation of the SON-R 2.5-7 with the WPPSI-R in the total Australian sample amounts
to .78. Within the three different groups the correlation between the performance scale of the
WPPSI-R and the SON-R 2.5-7 was .74, .74 and .75. Within the non-handicapped group the
correlation of the SON-R 2.5-7 with the verbal scale (.54) is lower than the correlation with
the performance scale (.74) of the WPPSI-R. The scores on the SON-R 2.5-7 are on average 5
points lower than the scores on the WPPSI-R.
Table 2: Characteristics of the research with the SON-R outside The Netherlands
_________________________________________________________________________________
 SON-R 2.5-7              Australia            Great Britain       United States
---------------------------------------------------------------------------------
 N of subjects            155                  58                  75/31/26/29/47
 Age                      4;5 (0;10)           6;3 (0;3)           5;1/4;7/4;7/5;6/4;7
 Correlation with         .78 (WPPSI-R PIQ)    .87 (BAS 6 subt.)   .59 (WPPSI-R FSIQ)
 criterion test                                                    .66 (K-ABC)
                                                                   .61 (MSCA)
                                                                   .47 (PPVT-R)
                                                                   .61 (PLS-3)
_________________________________________________________________________________
 SON-R 5.5-17             China                Peru                Brazil
---------------------------------------------------------------------------------
 N of subjects            302                  160                 82
 Age                      11;6                 9;4                 10;5
 Correlation with                              .77 (WISC-R FSIQ)   .60 (school marks)
 criterion test
 Items with problems:
   Categories             6                    3                   10
   Situations             3                    2                   4
   Stories                0                    0                   0
_________________________________________________________________________________
Research in the USA
The validation study executed in the USA (West Virginia), supervised by Stephen O'Keefe of
the West Virginia Graduate College, involved the administration of the SON-R 2.5-7 and five
other cognitive tests. The tests that were administered were the following: the WPPSI-R, the
Kaufman-ABC, the McCarthy Scales of Children’s Abilities (MSCA), the Peabody Picture
and Vocabulary Test - Revised (PPVT-R) and the Preschool Language Scale-3 (PLS-3).
The SON-R 2.5-7 was partly administered by psychology students from the University of
Groningen and partly by psychologists from West Virginia. The amount of time between the
administration of the SON-R 2.5-7 and the other tests generally was very short; in the
majority of the cases the other test was applied on the same day. The PLS-3 was not
administered during this study; the test scores on the PLS-3 were collected on another
occasion and could be used in this research. The number of children that took both the
SON-R 2.5-7 and another test varied from 26 (in the case of the MSCA) to 75 (in the case of the WPPSI-R).
Results
The SON-R 2.5-7 had a correlation of .59 with the total score on the WPPSI-R; the
correlations with the performance and verbal scales were .60 and .43 respectively. The
average age of the 75 children that did both tests was 5.1 years. The mean total score on the
SON-R 2.5-7 was more than two points lower than the total score on the WPPSI-R (94.5
versus 96.8) and nearly four points lower than the performance scale of the WPPSI-R (94.5
versus 98.3).
The SON-R 2.5-7 showed a correlation of .66 with the Kaufman-ABC. The correlation with
the simultaneous scale was much higher than with the scale for sequential processing (.58
versus .29). The correlation with the nonverbal scale of the Kaufman-ABC amounted to .61.
The average age of the 31 children that did both the Kaufman-ABC and the SON-R 2.5-7 was
4.6 years.
With the general cognitive index of the MSCA the SON-R 2.5-7 showed a correlation of .61;
the correlation with the verbal scale was .48; while the correlation with the perceptual
performance scale amounted to .61. The average age of the 26 children that did both tests was
4.6 years.
The correlation of the SON-R 2.5-7 with the PPVT-R was .47; the average age of the 29
children that did both tests was 5.5 years.
With the total language score of the PLS-3 the SON-R 2.5-7 showed a correlation of .61; with
the Auditory Comprehension Scale the correlation was .59, while the correlation with the
Expressive Communication Scale amounted to .56. The average age of the 47 children who
did both tests was 4.6 years.
Research in Great Britain
In this validation study in Great Britain, which took place in 1996 and was
supervised by Julie Dockrell of the University of London, the SON-R 2.5-7 and the British
Ability Scales (BAS) were both administered. The BAS was administered by psychology
students from the University of London, and the SON-R by psychology students
from the University of Groningen. Both tests were administered in counterbalanced order to 58 children,
34 boys and 24 girls, from the first year of primary school. The mean age of the children was
6;3 years (standard deviation 3 months). The interval between test administrations varied from
some days to some weeks. Within the total sample three groups of children can be
distinguished: a group without specific handicaps (N=20), a group for which English is the
second language (N=22), and a group with learning disabilities (N=16).
Six subtests of the BAS were applied, the four subtests of the short version (Naming
Vocabulary, Digit Recall, Similarities, and Matrices) and two extra nonverbal subtests (Block
Design and Visual Recognition).
Results
The correlation of the SON-R 2.5-7 with the short version of the BAS is .80. When the two
nonverbal subtests are included the correlation increases to .87. The correlation with the
verbal part (three verbal subtests) of the shortened version of the BAS was .71, while the
correlation with the performance part (three nonverbal subtests) amounted to .78. The
correlations in the group of children without any handicaps are considerably lower than in the
other two groups (.56 versus .76 and .78). In the total English sample the SON-R IQ-scores
are 7 points lower than on the short form of the BAS. The difference in IQ-scores between the
group of normal children and children with learning disabilities for the SON-R 2.5-7 is 40.8
points and for the short form of the BAS 42.8 points.
Conclusions of the studies in Australia, the USA, and Great Britain
In the three cross-cultural studies the correlations of the SON-R 2.5-7 with the
performance scales of a number of criterion tests could be compared to the correlations
with the verbal scales. In all three studies the correlation of the SON-R 2.5-7 with the
performance scale of the criterion test was clearly stronger than with the verbal scale. In
the Australian study the SON-R 2.5-7 showed correlations with the performance and
verbal scales of the WPPSI-R of .74 and .54 respectively; in the study in the USA these
correlations were .60 and .43. In the same study the SON-R 2.5-7 showed a correlation
of .61 with the perceptual performance scale of the MSCA, and a correlation of .48 with
the verbal scale of the MSCA. In the research in Great Britain the SON-R 2.5-7 showed
a correlation with the performance part of the BAS of .78 and a correlation of .71 with
the verbal part of the BAS. These correlations obtained in Australia, the USA and Great
Britain support the convergent and the divergent validity of the SON-R 2.5-7.
Cross-cultural Studies with the SON-R 5.5-17
Research in China
Chinese psychology students in collaboration with Milly Judistera, a Dutch
psychologist, executed the research with the SON-R 5.5-17 in China, which took place
in 1996. The study was supervised in China by Professor Zhang Hou Can of the Beijing
Normal University. This study was a pilot study as a preparation of the standardization
and adaptation of the SON-R 5.5-17 for China.
All 7 subtests of the SON-R 5.5-17 were administered according to the standard test
procedure by Chinese psychology students to a sample of 302 Chinese children,
consisting of 165 boys and 137 girls. To make the administration of the test possible,
the instructions of the SON-R 5.5-17 were translated into the Chinese language. The
Chinese students were trained in the administration of the SON-R 5.5-17 by Milly
Judistira with the help of an English-speaking Chinese psychology student as an interpreter.
The Chinese subjects were tested in the following age groups: 6.5-year-olds (103),
11.5-year-olds (94), and 14.5-year-olds (105). The children came from Beijing (23), Tianjin
(81), Miyun (77) and Guangrao (121). The sampling procedure used was not a random
procedure; the schools were chosen on the basis of existing contacts with the University of
Beijing, and within the schools children were chosen by the teachers. Although the
teachers claimed to have chosen the children in a random manner, there is no guarantee
that this really was the case. The research results therefore have to be interpreted with
some caution.
Psychometric characteristics
The generalizability coefficient (alpha) of the total score of the SON-R 5.5-17 in the
Chinese sample increases from .76 at 6.5 years to .82 at 14.5 years with a mean value of
.80. These values are lower than the ones found for the Dutch standardization sample;
here the generalizability of the total score increased from .81 at 6.5 years to .88 at 14.5
years with a mean value of .85. The generalizability coefficient is computed by the
usual coefficient alpha in which the subtests are the unit of analysis; the number of
subtests and the mean correlation between the subtests determine this coefficient. The
mean correlation between the 7 subtests of the SON-R 5.5-17 was lower for the three
Chinese age groups than for the comparable age groups of the Dutch sample (.31, .40,
and .42 versus .37, .50 and .51).
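The statement above that the generalizability coefficient is determined by the number of subtests and their mean intercorrelation can be made explicit with the standardized (Spearman-Brown) form of coefficient alpha. The sketch below applies this form to the mean correlations reported in the text; because the reported coefficients may be based on the raw-score form of alpha, the values obtained here are close approximations rather than exact reproductions.

```python
def standardized_alpha(k, mean_r):
    """Standardized coefficient alpha for k components with mean intercorrelation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)

# Mean inter-subtest correlations reported in the text (7 subtests per test form).
mean_correlations = {
    "Chinese 6.5 yrs": 0.31, "Chinese 11.5 yrs": 0.40, "Chinese 14.5 yrs": 0.42,
    "Dutch 6.5 yrs": 0.37, "Dutch 11.5 yrs": 0.50, "Dutch 14.5 yrs": 0.51,
}
for group, r in mean_correlations.items():
    print(f"{group:17s} alpha = {standardized_alpha(7, r):.2f}")
```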
The mean of the standardized total score obtained by the Chinese children was quite
close to the mean standardized total score of the Dutch norm group (98.9 versus 100).
The mean scores of the Chinese children on Categories, Situations and Stories, all
subtests that use meaningful picture materials, were lower in comparison with the Dutch
norm group (respectively 93.6, 91.6, and 95.3 versus 100). The mean of these three
subtests amounts to 93.5. The mean scores of the Chinese children on Mosaics, Hidden
Pictures, Patterns and Analogies, all subtests using non-meaningful picture material
such as geometrical forms, were 96.9, 101.9, 104.2 and 108.1. The mean of these four
subtests amounts to 102.8 and is higher in comparison to the mean score of the four
subtests in the Dutch norm group (100). The difference found in the Chinese sample
between the mean score of the subtests with meaningful picture material (93.5) and the
mean score of the subtests with non-meaningful picture material (102.8) is significant at
the 5% level. This result is an indication that the Chinese children had more difficulties
with the subtests that use meaningful picture materials than with the subtests using
non-meaningful picture materials.
The relatively low scores of the Chinese children on Categories, Situations and Stories
conformed to expectations beforehand; the meaningful pictures used in these subtests
were not expected to be familiar to cultures very different from the Dutch culture. One
of the items of the subtest Categories, for instance, shows people leaving a church;
such a situation appeared not to be very familiar to Chinese children.
The lower mean score on Mosaics (96.9 versus 100) does not correspond with earlier
expectations, since the item material of this subtest consists of abstract geometrical
figures, which according to Jensen (1980) reduces the probability of cultural bias. The
Chinese children did not show any problems with the other two subtests containing
abstract geometrical figures; on these subtests, Patterns and Analogies, they even
showed higher mean scores than their Dutch colleagues (104.2 and 108.1 versus 100).
The lower score on Mosaics might be a result of the fact that the majority of the Chinese
children were not tested with the original version of the subtest Mosaics. Due to
financial restrictions there was only one original SON-R 5.5-17 test available for the
research in China; in order to be able to test various children in the same period of time
a number of copies of the test were made. The squares of the copied version of Mosaics
did not fit very well in the frame in which the mosaic patterns had to be copied.
Probably the use of non-standardized test material caused the lower
score of the Chinese children on Mosaics.
Factor analysis failed to show a clear similarity between the factor structure of the 7
subtests of the SON-R 5.5-17 in the Chinese sample and in the Dutch standardization
sample. In the Dutch sample Principal Component Analysis (PCA) with varimax
rotation resulted in two clear factors: a ‘reasoning’ and a ‘spatial’ factor. Although in
the youngest age groups four factors appeared (‘spatial’, ‘concrete reasoning’, ‘abstract
reasoning’, and ‘perception’) the two factor solution offered better and more consistent
results across all different age groups. In the Chinese sample PCA with varimax rotation
did not provide the same results as in the Netherlands. In the Dutch research the two
spatial subtests showed high loadings on the second factor (the ‘spatial’ factor) while in
the Chinese study this only seems to be the case for the youngest group. Moreover, in
the Dutch research Categories showed high loadings on the first factor (the ‘reasoning’
factor); but this was not the case for the eldest group of the Chinese sample. The results
of the factor analysis suggest a lack of test equivalence of the SON-R 5.5-17 in the
Chinese and the Dutch culture.
These results, however, have to be interpreted with some caution because reservations
can be made about the appropriateness of factor analysis as a method for assessing
equivalence of test applications in different cultures. Hambleton and Bollwark (1991)
note in this respect that the disadvantage of factor analysis is that the results are sample
dependent, since it is based on classical item statistics. Their conclusion also applies to
other frequently used classical statistical methods for detecting a possible cultural loading of a
test that are sample dependent, like the comparison of p-values and the comparison of
total test scores. All these methods presuppose the use of equal ability groups. Even in
the case of non-equal ability groups researchers must still check that the ordering of
item difficulties is the same in the two different cultures (Hambleton & Kanjee, 1995).
Following the recommendation of Hambleton & Kanjee, the correlation coefficients
between the ordering of the item difficulties in the two cultures have been calculated.
The Spearman´s rho correlation coefficients between the item difficulties of the six
subtests of the SON-R (one subtest, Hidden Pictures, does not consist of independent
items so no correlations could be computed) obtained in the Chinese sample and in the
Dutch sample are the following: .98 (Categories), .99 (Mosaics), .99 (Patterns), .98
(Situations), .99 (Analogies) & .98 (Stories). Although the correlations between the item
difficulties in the two cultures are all very high, Categories, Situations, and Stories show
a slightly lower correlation than the remaining subtests. This might be an indication of
cultural bias of these subtests.
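The check on the ordering of item difficulties can be carried out as a rank correlation between the proportions correct obtained in the two samples. The sketch below uses scipy's spearmanr on made-up item p-values; the numbers are placeholders, not the actual SON-R item statistics.

```python
from scipy.stats import spearmanr

# Hypothetical proportions correct (p-values) for the items of one subtest in
# two samples; placeholders, not actual SON-R item statistics.
p_dutch   = [0.95, 0.90, 0.84, 0.75, 0.66, 0.58, 0.47, 0.35, 0.24, 0.12]
p_chinese = [0.93, 0.88, 0.78, 0.80, 0.60, 0.55, 0.50, 0.30, 0.26, 0.10]

rho, _ = spearmanr(p_dutch, p_chinese)   # rank correlation of the item difficulty orderings
print(round(rho, 2))
```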
The above-described psychometric characteristics of the SON-R 5.5-17 in this study
with 302 Chinese children suggest that the subtests Categories, Situations and Stories
might have a cultural bias. A next step in this study was an attempt to identify the
sources of this possible cultural bias of the three subtests.
Results of the judgmental procedure
According to Hambleton (1993) both judgmental and statistical methods should be used
in studies to determine the equivalence of a test in multiple languages or cultures. In this
study the judgmental procedure consisted of reviewing the test instructions, the testing
format, the examples, and all the items of Categories, Situations and Stories for possible
cultural bias with respect to the Chinese population of children from 6 to 17
years of age. The reviewing took place during a group discussion with the Chinese
psychology students. The test instructions, the testing format and the examples used in
the SON-R 5.5-17 were judged by them as containing no cultural bias. The only aspect
of the testing procedure that caused some problems for the Chinese administrators of the
SON-R 5.5-17 was the providing of feedback after each item. Some students reported
that they felt uncomfortable giving feedback because they believed that this might
influence the emotions of the children. Possibly the problems with the provision of
feedback are related to the fact that in Chinese culture communication often occurs in
an indirect manner.
The results of the judgmental procedure with respect to the items of the three subtests
are the following: 6 items of Categories (items 2b, 3b, 4b, 6a, 7a, and 9b) and 3 items of
Situations (2b, 2c and 10b) were identified as being possibly culture specific. For the
subtest Stories no culture specific items were identified.
For these culture specific items new items have been developed that are adapted to the
Chinese culture. A Chinese student of the art academy has drawn the adapted items. In
the future standardization and validation research of the SON-R 5.5-17 in China, items
that were identified as being culture specific will be replaced by adapted items.
Research in Peru
The validation study with the SON-R 5.5-17 in Peru took place in 1996 and was
executed by two psychology students from the University of Groningen, and by
Peruvian psychologists. The research was supervised in Peru by Veronica Bisso Cajas,
from the Consortium of Catholic Centres of Education in Lima. This study served as a
preparation of the standardization and adaptation of the SON-R 5.5-17 for Peru.
All 7 subtests of the SON-R 5.5-17 were administered according to the standard test
procedure by the two Dutch psychology students and by ten Peruvian psychologists. To
make the administration of the test possible, the instructions of the SON-R 5.5-17 were
translated into Spanish. The Peruvian psychologists were trained in the administration
procedure of the SON-R 5.5-17.
The test was administered to a sample of 160 Peruvian children, consisting of 79 boys
and 81 girls. The age of the children varied from 6 years to 15 years with a mean of 9;4
years. All the subjects of the Peruvian sample lived in the city of Lima, the capital of
Peru; 50% of the children came from State schools and 50% from private schools. The State
schools in Peru are fully supported by the government; children who attend these schools
generally come from families with a low SES-level and often live in very poor
circumstances. The private schools are partly financed by religious congregations and
private persons; children who attend these schools generally come from families with
a high or moderately high SES-level. The sampling procedure used was only partially a
random procedure; the schools were chosen on the basis of existing contacts with the
Consortium of Catholic Centres of Education, but within the schools children were
selected at random. In this research there is an over-representation of children with
moderate and high SES levels. The Peruvian sample cannot be considered as
representative for Peru, nor for Lima. The results of this study, therefore, cannot simply
be generalised to all of Peru and have to be interpreted with caution.
Psychometric characteristics
The results of this study show that the Peruvian children obtained a lower mean score
on the SON-R 5.5-17 than the children of the Dutch norm group (94.0 versus 100). The
mean scores of the Peruvian children on Categories, Situations and Stories, the subtests
which use meaningful picture material, were 97.1, 91.7 and 93.1. The mean of these
three subtests amounts to 93.9. The mean scores of the Peruvian children on Mosaics,
Hidden Pictures, Patterns and Analogies, all subtests using non-meaningful picture
materials, were 91.6, 92.2, 101.4, and 98.1. The mean of these four subtests amounts to
95.8. The difference between the mean score of the subtests with meaningful picture
material (93.9) and the mean score of the subtests with non-meaningful picture material
(95.8) is significant at the 5% level. This finding is in accordance with expectations
beforehand; Peruvian children, like the Chinese children, showed more problems with
the subtests containing meaningful pictures than with the subtests containing non-meaningful pictures.
To check whether the ordering of item difficulties was the same in the Peruvian sample
and the Dutch sample Spearman’s rho correlation coefficients were calculated. The
correlation coefficients between the item difficulties of the subtests Categories,
Situations and Stories obtained in the Peruvian sample and in the Dutch norm sample
are .97, .98, and .97. These correlations are based on 48 children from the total sample
of Peruvian children. Note that the correlation coefficients between the item difficulties
of Categories, Situations and Stories obtained in the Peruvian and in the Dutch sample
are of the same magnitude as the correlations between the item difficulties found in the
Chinese and the Dutch sample.
In addition to the SON-R 5.5-17, a Spanish version of the WISC-R was also
administered to all 160 children of the Peruvian sample in order to obtain information
about the validity of the SON-R 5.5-17 in Peru. Since there were no Peruvian norms
available for the WISC-R, norm scores were used that are based on the standardization
sample of the USA (Wechsler, 1974).
The correlation of the SON-R 5.5-17 with the full scale of the WISC-R was .77, with
the performance scale .74 and with the verbal scale .69. These correlations are slightly
lower than those found in a study in the Netherlands with 35 children from an
outpatient psychiatric university clinic (Tellegen & Laros, 1993b). In this study the
correlation of the SON-R 5.5-17 with the full scale of the WISC-R was .80, with the
performance scale .80 and with the verbal scale .66.
The mean IQ-score of the Peruvian sample on the WISC-R was 96.7, the mean score on
the verbal scale 94.3, and the mean score on the performance scale 100.2. These scores
cannot simply be compared to the mean IQ-score on the SON-R 5.5-17 because the tests
have been standardized in different years; the WISC-R was standardized in 1974 and
the SON-R 5.5-17 in 1988. As Flynn (1987) has observed, norms on intelligence tests
become stricter over the years. In a recent study (Sattler, 1992) comparing the
norms of the WISC-R with those of the most recent version of the Wechsler test for
children, the WISC-III (Wechsler, 1991), the norm scores on the WISC-III appeared to
be lower than the norm scores on the WISC-R. In the 17 years between the two
standardizations (1974-1991) the norm scores for the full scale decreased 5.3 points, for
the verbal scale 2.4 points, and for the performance scale 7.4 points. Based on the
differences found between the norms of the WISC-R and the WISC-III one can correct
the obtained scores for the differences in years of standardization of the WISC-R and
the SON-R 5.5-17. After correction, the mean norm score on the WISC-R of the
Peruvian sample becomes 92.3 for the full scale, 94.1 for the performance scale and
92.3 for the verbal scale. The Peruvian children obtain, after correction, a higher score
on the SON-R 5.5-17 than on the WISC-R.
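The correction applied above can be reproduced by pro-rating the WISC-R to WISC-III norm differences (observed over 17 years) across the 14 years that separate the WISC-R norms (1974) from the SON-R 5.5-17 norms (1988). The small sketch below carries out this arithmetic with the figures given in the text.

```python
# Pro-rating the WISC-R -> WISC-III norm differences (1974-1991, 17 years) over
# the 14 years separating the WISC-R (1974) and SON-R 5.5-17 (1988) standardizations.
# The norm shifts and observed Peruvian means are the values reported in the text.

norm_shift_17y = {"full scale": 5.3, "verbal": 2.4, "performance": 7.4}
observed_mean  = {"full scale": 96.7, "verbal": 94.3, "performance": 100.2}

years = 1988 - 1974
for scale, shift in norm_shift_17y.items():
    corrected = observed_mean[scale] - shift * years / 17
    print(scale, round(corrected, 1))
```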
Results of the judgmental procedure
To identify culture specific items of the subtests Categories, Situations and Stories a
judgmental procedure was used. With the remaining subtests (Mosaics, Hidden Pictures,
Patterns, Analogies) no judgmental procedure was used, because these subtests are less
likely to be culture specific. When a child responded incorrectly on an item of
Categories, Situations or Stories the examiner checked, after testing, whether the
pictures used in the item were understood by the child. This procedure, however, has
not been followed in a systematic way; not all of the children who responded incorrectly
to an item were asked whether the pictures used in that particular item were familiar to them. The
difference with the judgmental procedure used in the Chinese study is that there the
examiners (psychology students) were asked to judge the cultural loading of the items;
in this study the examinees (children) were asked to give their judgement.
The results of the judgmental procedure with respect to the items of the three subtests
are the following: 3 items of Categories (items 1a, 2b, and 4a) and 2 items and one
example of Situations (example A, and items 2c and 4b) were identified as being
possibly culture specific.
Research in Brazil
The research in Brazil, which took place in 1996, was executed under the supervision of
Jaap Laros of the University of Brasilia, one of the authors of the SON-R tests. Six
psychology students of the above-mentioned university did the actual test administration
at schools. To make the administration of the test possible, the test instructions of the
SON-R 5.5-17 were translated into Portuguese. The Brazilian psychology students were
trained in the administration of the SON-R 5.5-17 by their supervisor. The study was a
preparation for the standardization and adaptation of the SON-R 5.5-17 for Brazil.
In this Brazilian study the subtests Categories, Situations and Stories of the SON-R 5.5-17
were administered individually to a sample of 82 Brazilian children, consisting of 41
girls and 41 boys. The age of the children varied from 7 to 14 years with a mean of 10.4
years. All children came from 2 state schools from Brasilia, the capital of Brazil. The
two schools differed in respect to the SES-level of their pupils; the 36 children from the
first school generally came from families with a moderate SES-level, while the 46
children from the second school principally came from families with a low SES-level.
The sampling procedure used was not a totally random procedure; the schools were
selected on the basis of existing contacts with the University of Brasilia, but within the
schools the children were selected at random, using only their age as a selection
criterion. The Brazilian sample cannot be considered to be representative for Brazil, nor
for Brasilia. The results of this study should therefore be interpreted with some caution.
To gain additional information on possible cultural bias, the three subtests of the
SON-R 5.5-17 were administered in a nonstandard fashion. Van de Vijver & Poortinga
(1997) stress the importance of nonstandard administration of tests in cross-cultural
studies to evaluate the adequacy of stimulus and response formats, and of the test
administration procedure. In such a nonstandard test procedure it usually is very
informative to ask examinees to motivate their responses. In the standard procedure of
the SON-R 5.5-17 the subtests are divided into two or three parallel series of about 10
items; each child starts with the easiest item of the first series, but with which item of
the second or third series is started, depends on the performance of the child on the
previous series. In the nonstandard procedure used in this study the items were offered
in order of increasing difficulty; the subtests Categories and Situations with 27 and 33
items were broken off after twelve errors, while the subtest Stories with 20 items was
broken off after 8 errors. With this nonstandard test procedure the children
responded to many more items than would have been the case if the standard adaptive
procedure had been used. The average number of items administered with the adaptive
procedure is 50% of the total number of items; with the nonstandard procedure this
number amounts to about 90% of the total number of items.
Psychometric characteristics
The reliability coefficients (alpha) for the three subtests in the Brazilian sample were as
follows: .85 for Categories, .87 for Situations and .82 for Stories. When corrected for
the influence of age, the corrected reliability coefficients are: .76 for Categories, .82 for
Situations and .76 for Stories. These values of the reliability coefficients are quite
similar to the values found in a Dutch sample of 415 subjects when the three subtests
were administrated without an adaptive procedure (Laros & Tellegen, 1991, page 33).
The results from this research indicate further that the Brazilian children obtained a
lower mean total score on the SON-R 5.5-17 (based on the subtests Categories,
Situations and Stories) in comparison to the Dutch norm group (96.2 versus 100). The
mean scores of the Brazilian children for Categories (94.8) and Situations (95.0) are
relatively low in comparison with the Dutch norm group (100); the mean score of the
Brazilian sample on Stories (97.5) comes close to the mean score of their Dutch
colleagues. The children of the school with a comparatively low SES-level seemed to
have more difficulties with Categories and Situations than the children of the school
with a higher SES-level. The mean total score of the low SES-group (N=46) on the three
subtests was 94.8, while their mean scores on the separate subtests were as follows:
Categories 92.9, Situations 92.5 and Stories 97.0. It can be concluded that Brazilian
children had more problems with Categories and Situations than with Stories. This trend
seems to get stronger for Brazilian children with a low SES-level.
The Spearman’s rho correlation coefficients between the item difficulties of the subtests
Categories, Situations and Stories obtained in the Brazilian sample and in the Dutch
norm sample are .97, .92, and .96. The relatively low correlation coefficient for the subtest
Situations might be an indication of item bias in this subtest, but might also reflect the
different test administration procedures that were used in the Dutch and the Brazilian
research. If the latter were the case, the use of the non-standard administration
procedure would have had more effect on the item difficulties of Situations than on the
item difficulties of the other two subtests.
In addition to the administration of the three subtests of the SON-R 5.5-17, other data of
the 82 Brazilian children were gathered in order to obtain information about the validity
of the subtests. School marks on mathematics, science, and Portuguese were collected
for every child of the Brazilian sample. The school marks give an indication of the
school achievement of the children. The children were also evaluated on their
motivation, cooperation and concentration by their schoolteachers and by the
psychology students who administered (a part of) the SON-R 5.5-17.
Of the three subtests, the report marks showed the highest correlations with the subtest
Categories: science (.61); mathematics (.59), and Portuguese (.38); they showed the
lowest correlations with the subtest Stories: science (.26); mathematics (.36), and
Portuguese (.29). The correlations of the report marks with the total score on the SON-R
5.5-17 are as following: science (.52); mathematics (.49), and Portuguese (.38). The
multiple correlation of the report marks with the total score on the SON-R 5.5-17
amounts to .60. The correlations of report marks with the total score on the SON-R 5.5-17
in the Dutch norm sample are of the same magnitude as the correlations found in the
Brazilian sample. Considering the fact that in the Brazilian study the total score on the
SON-R 5.5-17 was based on only three instead of all seven subtests, the correlations in
this study are quite high.
The judgment of the teachers on motivation, cooperation and concentration of their
pupils correlated higher with the total score on the SON-R 5.5-17 than the judgment of
the psychology students. When the judgment scores of the teachers and the psychology
students were combined into a mean score, these scores showed a higher correlation
with the total score on the SON-R 5.5-17 than the scores based on the judgment of the
teachers alone. The correlation of the total score on the SON-R 5.5-17 with the mean
score on motivation is .40, with the mean score on cooperation .45, and with the mean
score on concentration .19. The multiple correlation of the total score on the SON-R
5.5-17 with the mean scores on motivation, cooperation and concentration is .50.
Results of the judgmental procedure
The judgmental procedure used in this research consisted of the following: when a child
responded incorrectly on an example or an item the examiner checked, immediately
after the incorrect answer, whether the child understood the pictures that were used in
the item. This procedure was followed in a systematic way, and formed the basis on
which conclusions were made regarding the cultural specificity of the items. For the
subtest Categories the judgmental procedure implied that after each incorrectly
answered item the child had to indicate his or her familiarity with eight figures. [Each
item of the subtest Categories is built up of eight figures; three figures define the
category, and five figures are the alternatives from which the child has to choose the two
that pertain to the category]. In the case of the subtest Situations the procedure implied that
after each incorrectly answered item, the child had to indicate his or her familiarity with
five to fourteen figures. The procedure implied for the subtest Stories that the child had
to indicate his or her familiarity with four to seven figures for every incorrectly
answered item. As a consequence of this time consuming test procedure the amount of
time needed to administer the three subtests of the SON-R 5.5-17 came close to an hour.
The results obtained with this judgmental procedure indicate that the Brazilian children
had no problems understanding the subtest instructions of the SON-R 5.5-17. On the
basis of the answers that the children provided with respect to their (un)familiarity with
the figures used in the items, 14 items were identified as possibly culture-specific:
10 items of Categories (items 2b, 2c, 3a, 4a, 4b, 5c, 6a, 8a, 8c and 9c) and 4 items of
Situations (items 2b, 3c, 4a, and 10b). The criterion used to classify an item
as possibly culture-specific was the following: when one of the figures that form
an item was indicated by more than 5% of the Brazilian sample as unfamiliar, the
item was classified as possibly culture-specific. This criterion may be somewhat
too strict, considering the large number of items of the subtest Categories that were
classified as possibly culture-specific.
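The flagging rule itself is straightforward to make explicit. The sketch below (in Python; the item composition and unfamiliarity counts are hypothetical and serve only to show the rule) marks an item as possibly culture-specific as soon as any single figure in that item was reported unfamiliar by more than 5% of the sample.

    # Hypothetical counts of children (out of n) who reported each figure of an
    # item as unfamiliar. The values are illustrative only, not the observed data.
    n_children = 82
    unfamiliar_counts = {
        "Categories 2b": [0, 1, 7, 2, 0, 0, 3, 1],  # eight figures per Categories item
        "Categories 3a": [0, 0, 1, 0, 2, 0, 0, 5],
        "Situations 4a": [6, 0, 1, 0, 2],           # five to fourteen figures per Situations item
    }

    THRESHOLD = 0.05  # more than 5% of the sample

    def possibly_culture_specific(counts, n, threshold=THRESHOLD):
        """Flag an item if any of its figures was unfamiliar to more than the
        given proportion of the sample."""
        return any(count / n > threshold for count in counts)

    for item, counts in unfamiliar_counts.items():
        flagged = possibly_culture_specific(counts, n_children)
        print(f"{item}: {'possibly culture-specific' if flagged else 'not flagged'}")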
In the case of the subtest Categories, twice as many figures used in the alternatives were
unfamiliar to the Brazilian children as figures that defined the category. With the
subtest Situations, however, twice as many ‘main’ figures were identified as unfamiliar
as figures used in the alternatives. For the subtest Stories no adaptations for the
Brazilian culture were found to be necessary.
Conclusions of the studies in China, Peru and Brazil
In the cross-cultural validation studies in China and Peru, where all seven subtests of the
SON-R 5.5-17 were administered, the average of the mean scores on the three subtests with
meaningful picture materials (Categories, Situations and Stories) was significantly lower, at
the 5% level, than the average of the mean scores on the four remaining subtests that contain
non-meaningful picture materials (Mosaics, Hidden Pictures, Patterns and Analogies). This
finding is in accordance with our prior expectations, which were partly based
on the theoretical considerations of Jensen (1980) about culture-fair tests. This result indicates
possible cultural bias in the subtests Categories, Situations and Stories.
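The text does not specify which statistical test was used for this comparison; one conventional choice, sketched here in Python under the assumption of a paired comparison of per-child averages and with purely hypothetical scores, is a paired-samples t-test on the mean of the three meaningful subtests versus the mean of the four non-meaningful subtests.

    import numpy as np
    from scipy import stats

    # Hypothetical standardized subtest scores for six children (illustrative only).
    # Columns: Categories, Situations, Stories, Mosaics, Hidden Pictures,
    # Patterns, Analogies.
    scores = np.array([
        [ 8,  9,  8, 11, 10, 11, 10],
        [ 9,  8,  9, 10, 11, 10, 11],
        [ 7,  9,  8, 10, 10, 11, 10],
        [ 9, 10,  9, 11, 12, 10, 11],
        [ 8,  8,  7, 10,  9, 11, 10],
        [10,  9,  9, 12, 11, 12, 11],
    ])

    meaningful = scores[:, :3].mean(axis=1)      # Categories, Situations, Stories
    non_meaningful = scores[:, 3:].mean(axis=1)  # Mosaics, Hidden Pictures, Patterns, Analogies

    # Paired comparison of the two per-child averages, judged at the 5% level.
    t_stat, p_value = stats.ttest_rel(meaningful, non_meaningful)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}, significant at 5%: {p_value < 0.05}")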
Another indication of possible bias is given by the correlation between the item difficulties in
the Dutch norm group and the item difficulties in the different cultures. The correlation
coefficients give an indication of the extent to which the ordering of the item difficulties is the
same in the two cultures. A different ordering of the items might indicate that the
items do not have the same meaning for the two cultural groups. The Spearman's rho
correlation coefficients for the subtests Mosaics, Patterns, and Analogies in the Chinese study
were all .99. The correlation coefficients for Categories in the Chinese, Peruvian, and
Brazilian studies are .98, .97, and .97, respectively; for Situations .98, .98, and .92; and for
Stories .98, .97, and .96. Although all the correlation coefficients are quite high, the
correlations for the three subtests Categories, Situations and Stories, in which meaningful
picture materials are used, are somewhat lower, which might indicate possible bias in these
subtests. The lowest correlations were found in the Brazilian sample, but there the
non-standard test procedure makes a straightforward comparison with the item difficulties in
the Dutch sample problematic.
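As an illustration of how such a comparison of difficulty orderings can be computed, the following Python sketch (using scipy's spearmanr; the proportion-correct values are hypothetical, not the published item difficulties) correlates the difficulties of the same items in two samples.

    import numpy as np
    from scipy.stats import spearmanr

    # Hypothetical proportion-correct (p) values for the same ten items in two
    # samples; a real analysis would use the observed item difficulties per culture.
    p_dutch = np.array([0.92, 0.85, 0.78, 0.71, 0.63, 0.55, 0.48, 0.36, 0.27, 0.18])
    p_other = np.array([0.90, 0.88, 0.74, 0.69, 0.60, 0.58, 0.41, 0.39, 0.22, 0.20])

    # Spearman's rho compares the rank ordering of the item difficulties; a value
    # near 1 means the items keep (nearly) the same relative difficulty in both groups.
    rho, p_value = spearmanr(p_dutch, p_other)
    print(f"Spearman's rho = {rho:.2f}")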
The reliability coefficients of the subtests in the three studies are comparable with the
coefficients found in the Dutch sample, although the values in the Dutch sample are
somewhat higher. In the Chinese study, the only study where a factor analysis was performed,
the resulting factor structure failed to show a clear similarity with the factor structure in the
Dutch sample. This might be an indication of the cultural loading of some of the subtests.
In the Peruvian and Brazilian studies clear indications of the validity of the SON-R 5.5-17 were
found. The SON-R 5.5-17 showed a relatively high correlation with the WISC-R (.77) in the
Peruvian study. In the Brazilian study the standardized total score based on three subtests of
the SON-R showed satisfactory correlations with school marks on science (.52), mathematics
(.49), and Portuguese (.38). The correlations of report marks with the total score on the SON-R
based on all seven subtests in the Dutch sample are of the same magnitude as the correlations found
in the Brazilian sample. In the Brazilian study the total score on the SON-R, based on three
subtests, correlated .40 with motivation, .45 with cooperation and .19 with concentration.
In all three studies a judgmental procedure was used for the identification of item bias in the
subtests Categories, Situations and Stories. In the Chinese study the Chinese psychology
students who administered the SON-R 5.5-17 reviewed the items for possible cultural
bias. In the Peruvian and Brazilian studies the children who responded incorrectly to an item
were asked whether they understood the pictures used in that item. As a result of
this procedure 14 of the 33 items (42%) of the subtest Categories were identified as
possibly culture-specific. One of these items was identified as culture-specific in all three
studies (item 2b), 3 items were identified as biased in two studies (items 4a, 4b and 6a), and 9
items were identified as biased in just one study (items 1a, 2c, 3a, 3b, 5c, 7a, 8a, 8c, and 9c).
For the subtest Situations 6 of the 33 items (18%) and one example were identified as
possibly culture-specific. Three of the items were identified in two different studies (items 2b,
2c, and 10b); the remaining 3 items were identified as biased in just one study (items 3c, 4a,
4b). The example that was identified as culture-specific (example A) was identified in only
one study.
General Conclusions
The research results obtained in various countries for both the SON-R 2.5-7 and the
SON-R 5.5-17 indicate that both tests can be used in cultures that differ from the
Dutch culture. The validity coefficients of both SON-R tests obtained in various
countries are comparable to the validity coefficients found in the Dutch norm sample.
For the SON-R 5.5-17, however, some adaptations will have to be made in the subtests
Categories and Situations. For the subtest Categories 42% of the items will have to be
adapted, while for the subtest Situations 18% of the items and one example will have to
be adapted. For the other subtests of the SON-R 5.5-17 no adaptations have to be made.
Although no specific research has been done on item bias in the SON-R 2.5-7,
adaptations of this test seem less necessary. For this young age group the test uses
only very simple pictures, and in the construction phase of the test special attention was
paid to possible item bias. The research finding that immigrant children in the
Netherlands had the same mean score on the subtests that use meaningful picture
materials as on the subtests that use non-meaningful picture materials, such as
geometrical forms, is an empirical argument for the culture-fairness of the SON-R
2.5-7. Only future empirical research will tell whether the impression is correct that
no adaptations are necessary for cross-cultural use of the SON-R 2.5-7.
References
Anastasi, A. (1989). Psychological testing (6th edition). New York: Macmillan.
Brouwer, A., Koster, M. & Veenstra, B. (1995). Validation of the Snijders-Oomen
Test (SON-R 2.5-7) for Dutch and Australian children with disabilities.
Groningen: Internal Report, Department of Educational and Personality Psychology.
Ceci, S.J. (1991). How much does schooling influence general intelligence and
its cognitive components? A reassessment of the evidence. Developmental
Psychology, 27, 703-722.
Cooper, C.R. & Denner, J. (1998). Theories linking culture and psychology:
universal and community-specific processes. Annual Review of Psychology, 49,
559-584.
Cronbach, L.J. (1990). Essentials of psychological testing (5th edition). New
York: Harper-Collins Publishers Inc.
Drenth, P.J.D. (1975). Psychological tests for developing countries: rationale
and objectives. Dutch Journal of Psychology [Nederlands Tijdschrift voor de
Psychologie], 30, 5-22.
Ellis, B.B. (1995). A partial test of Hulin's psychometric theory of measurement
equivalence in translated tests. European Journal of Psychological Assessment, 11,
184-193.
Hambleton, R.K. (1993). Translating achievement tests for use in cross-national
studies. European Journal of Psychological Assessment, 9, 57-68.
Hambleton, R.K. (1994). Guidelines for adapting educational and psychological
tests: a progress report. European Journal of Psychological Assessment, 10, 229-240.
Hambleton, R.K. & Bollwark, J. (1991). Adapting tests for use in different
cultures: technical issues and methods. Bulletin of the International Test
Commission, 18, 3-32.
Hambleton, R.K. & Kanjee, A. (1995). Increasing the validity of cross-cultural
assessments: use of improved methods for test adaptations. European Journal of
Psychological Assessment, 11, 147-157.
Hambleton, R.K. & Slater, S.C. (1997). Item response theory models and testing
practices: current international status and future directions. European Journal of
Psychological Assessment, 13, 21-28.
Hamers, J.H.M., Sijtsma, K. & Ruijssenaars, A.J.J.M. (1993). Learning
Potential Assessment. Theoretical, Methodological and Practical Issues. Amsterdam:
Swets & Zeitlinger.
Helms-Lorenz, M. & van de Vijver, F.J. (1995). Cognitive assessment in
education in a multicultural society. European Journal of Psychological Assessment,
11, 158-169.
Holland, P.W. & Thayer, D.T. (1988). Differential item performance and the
Mantel-Haenszel procedure. In H. Wainer & H.I. Braun (Eds.), Test Validity (pp.
129-145). Hillsdale, NJ: Lawrence Erlbaum.
Horn, J., ten (1996). Validation research of the Snijders-Oomen nonverbal
intelligence test (SON-R 2.5-7) in the USA. Groningen: Internal Report, Department
of Educational and Personality Psychology.
Hu, S. & Oakland, T. (1991). Global and regional perspectives on testing
children and youth: an empirical study. International Journal of Psychology, 26,
329-344.
Hulin, C.L. (1987). A psychometric theory of evaluations of item and scale
translations – Fidelity across languages. Journal of Cross-Cultural Psychology, 18,
115-142.
Jenkinson, J., Roberts, S., Dennehy, S. & Tellegen, P. (1996). Validation of the
Snijders-Oomen Nonverbal Intelligence Test – Revised 2.5-7 for Australian children
with disabilities. Journal of Psychoeducational Assessment, 14, 276-286.
Jensen, A.R. (1980). Bias in mental testing. New York: Free Press.
Jensen, A.R. (1984). Test bias: Concepts and criticisms. In: C.R. Reynolds &
R.T. Brown (Eds.), Perspectives on bias in mental testing. New York: Free Press.
Judistira, E.M. (1996). A preliminary validation research with the SON-R 5.5-17
in China. Groningen: Internal Report, Department of Educational and Personality
Psychology.
Fan, X., Wilson, V.L. & Kapes, J.T. (1996). Ethnic group representation in test
construction samples and test bias: the standardization fallacy revisited. Educational
and Psychological Measurement, 56, 365-381.
Laros, J.A. & Tellegen, P.J. (1991). Construction and validation of the SON-R
5.5-17, the Snijders-Oomen non-verbal intelligence test. Groningen: Wolters-Noordhoff.
Le Clerq, M. & Holvast, L. (1996). The SON-R 5.5-17 and the WISC-R applied
to Peruvian school children. Groningen: Internal Report, Department of Educational
and Personality Psychology. SWAP: 96-1227.
Linden, W.J. van der, & Hambleton, R.K. (Eds.) (1997). Handbook of Modern
Item Response Theory. New York: Springer-Verlag.
Oakland, T., Wechsler, S., Bensuan, E. & Stafford, M. (1994). The construct of
intelligence among Brazilian children – An exploratory study. School Psychology
International, 15, 361-370.
Parmar, R.S. (1989). Cross-cultural transfer of non-verbal intelligence tests: an
(in)validation study. British Journal of Educational Psychology, 59, 379-388.
Poortinga, Y.H. (1995). Cultural bias in assessment: historical and thematic
issues. European Journal of Psychological Assessment, 11, 140-146.
Sattler, J.M. (1992). Assessment of children, Revised and updated third edition.
San Diego, CA: J.M. Sattler, Publisher, Inc.
Sijtsma, K. & Molenaar, I. (1987). Reliability of test scores in nonparametric
item response theory. Psychometrika, 52, 79-98.
Snijders, J.Th., Tellegen, P.J. & Laros, J.A. (1989). Snijders-Oomen Nonverbal
Intelligence Test: SON-R 5.5-17. Manual and research report. Groningen: Wolters-Noordhoff.
Tellegen, P. (1997). An Addition and Correction to the Jenkinson et al. (1996) Australian
SON-R 2.5-7 Validation Study. Journal of Psychoeducational Assessment, 15, 67-69.
Tellegen, P.J. & Laros, J.A. (1993a). The Snijders-Oomen nonverbal intelligence
tests: general intelligence tests or tests for learning potential? In: Hamers, J.H.M.,
Sijtsma, K. & Ruijssenaars, A.J.J.M., Learning Potential Assessment. Theoretical,
Methodological and Practical Issues. Amsterdam: Swets & Zeitlinger.
Tellegen, P.J. & Laros, J.A. (1993b). The construction and validation of a nonverbal test
of intelligence: the revision of the Snijders-Oomen tests. European Journal of Psychological
Assessment, 9, 147-157.
Tellegen, P.J., Winkel, M., Wijnberg-Williams, B.J. & Laros, J.A. (1998).
Snijders-Oomen Nonverbal Intelligence Test, SON-R 2.5-7, Manual & Research
Report. Lisse: Swets & Zeitlinger.
Van de Vijver, F.J.R. (1997). Meta-analysis of cross-cultural comparisons of
cognitive test performance. Journal of Cross-Cultural Psychology, 28, 678-709.
Van de Vijver, F.J.R. (1998). Cross-cultural assessment: value for money?
Invited Address for Division 2 of the International Association of Applied
Psychology. San Francisco, August 10.
Van de Vijver, F.J.R. & Poortinga, Y.H. (1992). Testing in culturally
heterogeneous populations: When are cultural loadings undesirable? European
Journal of Psychological Assessment, 8, 17-24.
Van de Vijver, F.J.R. & Poortinga, Y.H. (1997). Towards an integrated analysis
of bias in cross-cultural assessment. European Journal of Psychological Assessment,
13, 29-37.
Wang, Z-M. (1993). Psychology in China: a review. Annual Review of
Psychology, 44, 87-116.
Wechsler, D. (1974). Wechsler Intelligence Scale for Children, Revised (WISC-R).
New York: The Psychological Corporation.
Wechsler, D. (1991). Wechsler Intelligence Scale for Children: Third edition
(WISC-III). New York: The Psychological Corporation.
Weiss, S.C. (1980). Culture Fair Intelligence Test (CFIT) and Draw-a-Person
scores from a rural Peruvian sample. Journal of Social Psychology, 11, 147-148.
Zhang, H-C. (1988). Psychological Measurement in China. International Journal
of Psychology, 23, 101-117.