Cross-cultural research with the Snijders-Oomen Nonverbal Intelligence Tests

Jaap A. Laros 1 and Peter J. Tellegen 2
1 University of Brasília, Brazil, 2 University of Groningen, The Netherlands

Abstract

The SON-R 2.5-7 and the SON-R 5.5-17 are the latest revisions of the Snijders-Oomen nonverbal intelligence test, originally developed in the Netherlands in 1943 for use with deaf children. The tests consist of 6 to 7 subtests mainly focused on visual-spatial abilities and on abstract and concrete reasoning. Research in the Netherlands indicates that the nonverbal SON-R tests are well suited for use with children of ethnic minorities. With traditional tests, the cognitive abilities of minority children are often underestimated as a result of their lack of knowledge of the official language. Notwithstanding the favorable research results with minority children in the Netherlands, it cannot simply be assumed that the SON-R tests can be used unmodified in countries that differ greatly from the Netherlands. In this presentation we discuss results obtained with the SON-R tests in Australia, the USA, Great Britain, China, Peru and Brazil.

Introduction

With the latest revision and standardization in the Netherlands of the Snijders-Oomen Nonverbal Intelligence Tests, various cross-cultural validation studies have been initiated. The objective of these studies was to verify to what degree the SON-R tests can be used in cultures different from those of Western Europe in general, and from the cultures in the Netherlands in particular. In other words, the goal of these studies was to discern whether, and to what extent, adaptations of the material of the SON-R tests (e.g., test instructions, testing format, examples, items) are required for cross-national and cross-cultural use. The reason for adapting the SON-R tests is to be able to assess the construct of intelligence fairly in multiple cultures.
A crucial phase in the adaptation of conventional tests is the translation of the instrument from the source language into a target language. To obtain an equivalent test in another language or culture, not only is a translation needed that preserves the meaning of the verbal test materials, but additional changes may also be necessary to ensure the equivalence of the versions of the test in multiple languages or cultures, such as changes affecting item format and testing procedures (Hambleton, 1993). Close correspondence between the original version and the translated version is required before reliance can be placed on results based on translated tests. In addition to using effective translation practices, test users also need to examine the psychometric properties of translated tests (Ellis, 1995). The adaptation of nonverbal tests for multiple cultures does not include the difficult and often very problematic test translation phase and is therefore much less complicated than the adaptation process of (partly) verbal tests. This is one of the great advantages of nonverbal intelligence tests over intelligence tests that use written or spoken language. The fact that nonverbal intelligence tests often do not need translation does not automatically mean that there is no need for empirical studies on the equivalence of their applications in different cultures. Van de Vijver and Poortinga (1997) noted that when a psychological instrument developed in one society is applied in a different cultural context, invariance of psychometric properties like reliability and validity cannot merely be assumed, but has to be empirically demonstrated. The occurrence of bias can change the psychometric properties of an instrument when it is used in a different culture. Bias can appear at various levels: at test, subtest, or item level.
A test, subtest, or item is biased if it does not measure the same psychological trait across cultural groups (Van de Vijver, 1998). Three different types of bias can be distinguished: construct bias, method bias and item bias (Van de Vijver & Poortinga, 1997). A test shows construct bias if the construct measured is not identical across different cultural groups. Method bias occurs as a consequence of nuisance variables due to method-related factors. Three sorts of method bias can be distinguished: sample bias, instrument bias and administration bias. Sample bias occurs when the samples are incomparable on aspects other than the target variable. Instrument bias is related to characteristics of the instrument that are not identical for different cultural groups; an example of this kind of bias is stimulus familiarity. The third type of method bias occurs when there are problems related to the administration of a test; this type of bias can arise, for instance, when communication problems occur because of the subject's insufficient knowledge of the language that the examiner uses. Item bias occurs if persons with the same amount of the trait being estimated, but belonging to different groups, have different probabilities of making a specific response to the item. The probability that administration bias will occur with a nonverbal intelligence test like the SON-R is low, because the instructions are given in a nonverbal or a verbal way, depending on the subject's possibilities of communication. Moreover, the provision of feedback following each item and the examiner's 'showing how to do it' reduce the chance of administration bias occurring. Drenth (1975) has described the SON test as an example of a culture-reduced test, meaning that such tests measure only a limited number of culturally determined skills that they do not intend to measure.
Examples of culturally determined skills are: being able to use a pencil, being capable of working with numbers, and being able to understand the instructions. Drenth has argued that it is not essential for a culture-reduced test to produce similar score distributions in different cultural groups; the only condition for a culture-reduced test is that it should not reflect skill differences determined by cultural factors. Drenth defines a culture-fair test as a culture-reduced test for the particular groups under consideration. Jensen's (1980) conception of a culture-reduced test differs slightly from Drenth's: Jensen considers a test culture-reduced when performance on the test is only to a limited extent influenced by culturally determined familiarity with the stimulus figures and response format. Jensen has formulated the following criteria for culture-reduced tests: (a) the test has to be a performance measure, (b) instructions should be given in mime to exclude the influence of language, (c) preliminary practice items should be provided, (d) items should not depend on time, (e) items should require abstract reasoning rather than factual information, and (f) problems should be designed in such a way as to ensure subjects are unable to guess from memory of similar items encountered in the past. The SON-R tests do not satisfy all of Jensen's criteria for culture-reduced tests because one subtest of the SON-R 5.5-17 (Hidden Pictures) uses a time limit. Jensen has also stated that the use of abstract geometrical figures reduces the likelihood of cultural bias. In the SON-R tests, various subtests contain concrete, meaningful pictures instead of abstract, geometrical figures. The subtests using meaningful picture materials might therefore be culture-specific.
The research finding that immigrant children in the Netherlands (mainly children from Morocco, Turkey, Surinam and the Dutch Antilles) perform better on the SON-R tests than on traditional intelligence tests like the WISC-R and RAKIT (Laros & Tellegen, 1991; Tellegen, Winkel, Wijnberg-Williams & Laros, 1998) provides a positive empirical indication of the culture-fairness of the SON-R for the immigrant groups in the Dutch population. One of the reasons why immigrant children attain lower mean scores on traditional intelligence tests than on nonverbal intelligence tests like the SON-R is the relatively strong reliance of these tests on verbal abilities and on specific knowledge learned in school. This is especially the case with the so-called omnibus intelligence tests like the various Wechsler scales, which contain subtests like Information and Vocabulary that presuppose specific knowledge learned in school (Helms-Lorenz & Van de Vijver, 1995). The fact that minority groups show lower means on a test, however, does not necessarily mean that the test is culturally biased. Van de Vijver and Poortinga (1992) argued that the desirability of cultural loadings in measurement procedures is determined by the intention of the test in question. If a particular test is intended to assess knowledge gained during a course at school, it is quite likely that culture-specific knowledge is tested; in this case, cultural loadings are unavoidable and even desirable. In general, a distinction can be made between generalizations about achievements and generalizations about aptitudes. In the latter case, cultural loadings are undesirable (Helms-Lorenz & Van de Vijver, 1995).
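The definition of item bias given earlier (persons with the same trait level but from different groups having different probabilities of a specific response) can be illustrated with a minimal sketch. The check below is a generic standardization-style index, not the procedure used in the SON-R studies, and the data are invented for the example: item pass rates of two groups are compared within each total-score level, so that trait level is held roughly constant.

```python
# Illustrative sketch (not the SON-R procedure): a simple item-bias check
# comparing pass rates of two groups matched on total test score.
# All response data below are invented for the example.

def dif_index(responses_a, responses_b, totals_a, totals_b):
    """Weighted mean difference in item pass rate between groups A and B,
    computed within each total-score level and averaged over levels
    (weighted by the number of group-A subjects at each level)."""
    levels = sorted(set(totals_a) | set(totals_b))
    diff_sum, weight_sum = 0.0, 0
    for level in levels:
        a = [r for r, t in zip(responses_a, totals_a) if t == level]
        b = [r for r, t in zip(responses_b, totals_b) if t == level]
        if a and b:  # only levels observed in both groups
            p_a = sum(a) / len(a)
            p_b = sum(b) / len(b)
            diff_sum += len(a) * (p_a - p_b)
            weight_sum += len(a)
    return diff_sum / weight_sum

# Invented item responses (1 = correct) and matched total scores.
resp_a = [1, 1, 0, 1, 0, 1]
resp_b = [0, 1, 0, 0, 0, 1]
tot_a  = [5, 5, 3, 4, 3, 4]
tot_b  = [5, 5, 3, 4, 3, 4]
print(round(dif_index(resp_a, resp_b, tot_a, tot_b), 2))  # → 0.33
```

A value near zero would suggest no item bias at this item; a clearly positive or negative value, as in this fabricated example, would suggest that the groups differ on the item even at equal total-score levels.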
A second research result with positive implications for the culture-fairness of the SON-R tests for the immigrant groups in the Netherlands is the finding that there is no relation between their length of stay in the Netherlands and their IQ-scores, indicating that performance on the SON-R is not dependent on knowledge of the Dutch language (Laros & Tellegen, 1991). Notwithstanding these positive indications of the culture-fairness of the SON-R tests, some negative indications are also available, especially in reference to the subtests of the SON-R in which concrete, meaningful picture materials are used instead of abstract geometrical figures. Our expectation before the start of the cross-cultural validation studies with the SON-R tests was to encounter some bias in the subtests that use meaningful picture materials, because some of the pictures used seemed rather specific to western cultures. Before presenting the results of these studies, a short description of the SON-R tests will be given.

Description of the SON-R tests

The SON test was originally developed in 1943 by Snijders-Oomen for use with deaf children. She intended to measure a broad spectrum of intelligence functions without being dependent on the use of oral or written language. With subsequent revisions, norms and instructions for use with hearing subjects were also developed. The latest revision comprises separate tests for younger and older children, the SON-R 2.5-7 (Tellegen et al., 1998) and the SON-R 5.5-17 (Snijders, Tellegen & Laros, 1989). In Table 1 some characteristics of the tests are presented.

Table 1: Characteristics of the SON-tests in the Netherlands

                                SON-R 2.5-7              SON-R 5.5-17
  age range                     2;0 – 7;11 years         5;6 – 16;11 years
  N sample                      1124                     1350
  N subtests                    6                        7
  reasoning abilities           Categories, Situations,  Categories, Situations,
                                Analogies                Analogies, Stories
  spatial abilities             Mosaics, Patterns,       Mosaics, Patterns
                                Puzzles
  perceptual abilities          ---                      Hidden Pictures
  administration time           50 min.                  90 min.
  mean reliability subtests     .72                      .76
  reliability total score       .90                      .93
  generalizability total score  .78                      .85

Most subtests are related to either reasoning abilities or spatial abilities, while in the SON-R 5.5-17 a test for perceptual abilities is also included. In the subtest Categories the subject has to find the common element between three pictures and select two other pictures that belong to the same category. In Situations one or more parts are missing from a drawing, and the subject has to select those parts from a number of alternatives to make the drawing a meaningful whole. In Analogies the transformation of an abstract element is shown, and the subject has to perform the same transformation on another element by selecting the proper alternative. In Stories cards have to be ordered to make a meaningful story.

The SON-R 2.5-7

The revised version of the Snijders-Oomen Nonverbal Intelligence Test (Tellegen et al., 1998) for children between the ages of 2.5 and 7 years comprises 6 subtests. In sequence of administration the subtests are: Mosaics, Categories, Puzzles, Analogies, Situations and Patterns. In the subtest Mosaics the child is asked to copy different mosaic patterns with red/white squares. In the first part of the subtest Categories the child has to sort cards based on the category they belong to; in the second part two out of five pictures have to be chosen which are missing in a certain category of pictures. In the first part of Puzzles, three pieces have to be copied in a frame to resemble an example; in the second part the child is asked to form a whole from three to six puzzle pieces. In the first part of Analogies the child is required to sort three to five forms according to form, colour, and size into two compartments. In part two of Analogies a geometric figure changes in one or more aspects to form another figure.
To obtain a similar transformation with a second figure, the child is asked to choose the correct alternative. In the first part of Situations, with items showing only the upper halves of four pictures, the child is asked to find the missing halves. In the second part of Situations the task is to indicate the missing parts of drawings of concrete situations. In Patterns the child is asked to copy several patterns with a pencil. The subtests Mosaics, Puzzles and Patterns reflect spatial abilities, while the remaining subtests Categories, Analogies and Situations are more directed at abstract and concrete reasoning abilities. The SON-R 2.5-7 is administered individually; the mean administration time amounts to 50 minutes. An important element of the instructions is the examiner's demonstration ('showing how to do it') for part of the items. Another important aspect of the test administration is the feedback that the examiner offers after each item. The feedback offered with the SON-R 2.5-7 goes beyond the feedback given with the SON-R 5.5-17: the child is not only informed whether the answer was right or wrong, but the examiner also helps the child to find the correct solution. Because of this aspect the SON-R 2.5-7 has more in common with a learning potential test than with a traditional test of intelligence (Tellegen & Laros, 1993a). The standardization of the SON-R 2.5-7 is based on a nationwide sample of 1124 children varying in age from 2 years and 3 months to 7 years and 3 months. The reliability (alpha stratified) of the total score of the SON-R 2.5-7 increases from .86 at 2.5 years to .92 at 7.5 years, with a mean value of .90. The generalizability of the total score (alpha) increases from .71 at 2.5 years to .82 at 7.5 years, with a mean value of .78. The average reliability of the subtests is .72. The test-retest correlation of the SON-R 2.5-7 with an interval of 3 months is .79.
Besides a total IQ-score, scores on the Performance Scale and the Reasoning Scale of the SON-R 2.5-7 are also calculated. The score on the Performance Scale is based on the three performance subtests Mosaics, Puzzles and Patterns, and the score on the Reasoning Scale is based on the three reasoning subtests Categories, Analogies and Situations. The validity of the SON-R 2.5-7 has been investigated through various validation studies in the Netherlands and in other countries such as Great Britain, the USA and Australia, by comparing the results on this test with results on other intelligence and language development tests. The results of the studies outside the Netherlands will be described in the third part of this paper. The results on the SON-R 2.5-7 have been compared in the Netherlands with the following tests: the WISC-R, the WPPSI-R, the TONI-2, the Stutsman, the Kaufman-ABC, the BOS 2-30, the LDT, the RAKIT, the TOMAL, the DTVP-2 and the Reynell and Schlichting language development tests. The sample size of the various validation studies varies from 26 to 558 subjects; the mean sample size amounts to 118 subjects. The 21 correlations of the SON-R 2.5-7 with other nonverbal (intelligence) tests vary from .45 to .83, with a mean value of .65. The 12 correlations with general intelligence measures vary from .54 to .87, with a mean of .65. The 19 correlations of the SON-R 2.5-7 with measures of verbal ability and verbal intelligence vary from .20 to .71, with a mean value of .48. With some of the general intelligence tests it is possible to calculate the correlation with the performance scale, the verbal scale and the total score. In all of these cases the correlation of the SON-R 2.5-7 with the performance scale was higher than the correlation with the verbal scale.
These diverse validation studies support the divergent and convergent validity of the SON-R 2.5-7, but the highly varying correlations imply that substantial differences between the scores on the SON-R 2.5-7 and on other intelligence tests can appear. Possible explanations for these highly varying correlations are: differences between the contents of the SON-R and other intelligence tests, differences in the test procedures of the SON-R and the various other tests, the very young age at which the children were tested, and the long interval of time between the administrations of the tests.

The SON-R 5.5-17

The revised version of the Snijders-Oomen Nonverbal Intelligence Test (Snijders, Tellegen & Laros, 1989) for children and adolescents between the ages of 5.5 and 17 years consists of 7 subtests. In sequence of administration the 7 subtests are: Categories, Mosaics, Hidden Pictures, Patterns, Situations, Analogies and Stories. In Categories a child has to choose two out of five pictures which are missing in a certain category of pictures. The task in Mosaics consists of copying figures with red/white squares. In Hidden Pictures the task is to find a given picture that is hidden several times in a bigger drawing. In Patterns a part of a particular pattern or line is missing; the child has to draw the missing part with a pencil. The task in Situations is to indicate the missing parts of drawings of concrete situations. In Analogies geometrical figures are presented in the problem format A : B = C : D; the child has to discover the principle behind the transformation A : B and apply it to figure C to find the correct figure D out of four alternatives. In Stories the child has to order a number of cards in such a way that they form a logical story. Categories, Situations and Analogies are multiple-choice tests, while Mosaics, Hidden Pictures, Patterns and Stories are so-called 'action' tests.
In action tests the solution has to be sought in an active manner, which makes observation of behaviour possible. The SON-R 5.5-17 can be divided into four types of tests according to their contents: abstract reasoning tests (Categories & Analogies), concrete reasoning tests (Situations & Stories), spatial tests (Mosaics & Patterns) and perceptual tests (Hidden Pictures). Principal components analysis (PCA) has been performed on the subtest correlations to obtain empirical confirmation of the theoretical dimensions of the test. Although for the youngest age groups the four theoretical dimensions were supported by the loadings on the first four varimax-rotated components, across all age groups the two-component solution with a 'reasoning' and a 'spatial' component provided more consistent results. The SON-R 5.5-17 is administered individually; the average administration time amounts to 90 minutes. The role of time scoring is kept to a minimum; only with the subtest Hidden Pictures is a time limit used. For the other subtests sufficient time is allowed for answering each item. The administration procedure of the SON-R differs from the procedure used in traditional intelligence tests in two important respects: (1) the provision of feedback following each item, and (2) the use of an adaptive procedure. The feedback given during the administration of the SON-R 5.5-17 is restricted to informing the examinee whether the answer was right or wrong. The feedback clarifies the instructions and gives the child the opportunity to learn from his own errors and successes and to adjust his problem-solving strategy. With this test procedure each item becomes an opportunity to learn and adjust (Tellegen & Laros, 1993). The adaptive procedure is made possible by dividing the subtests into two or three parallel series of about 10 items; every child starts with the easiest item of the first series.
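This adaptive series procedure can be sketched in a few lines, under the rules stated in the surrounding text: a series is broken off after two errors, and the starting point of the next series depends on the score on the preceding series. The exact mapping from score to starting item is not documented here, so the rule in the sketch is a hypothetical stand-in, and the child's answers are simulated.

```python
# Sketch of an adaptive series procedure of the kind described in the
# text. The score-to-start mapping is a hypothetical illustration, not
# the SON-R's actual rule; answer(item) simulates the child's response.

def administer_series(series, start, answer):
    """Present the items of one series from `start` onward, stopping
    after two errors. Returns (number correct, number presented)."""
    errors, correct, presented = 0, 0, 0
    for item in series[start:]:
        presented += 1
        if answer(item):
            correct += 1
        else:
            errors += 1
            if errors == 2:
                break
    return correct, presented

def administer_subtest(all_series, answer):
    """Run the parallel series in order; each starting point follows
    the preceding score (illustrative rule: skip one easy item per two
    points scored)."""
    start, total_correct, total_presented = 0, 0, 0
    for series in all_series:
        correct, presented = administer_series(series, start, answer)
        total_correct += correct
        total_presented += presented
        start = min(correct // 2, len(series) - 1)
    return total_correct, total_presented

# Two series of item "difficulties"; the simulated child solves every
# item easier than difficulty 6.
series = [list(range(1, 11)), list(range(3, 13))]
correct, presented = administer_subtest(series, lambda d: d < 6)
```

In this simulation only 10 of the 20 items are presented, illustrating how the procedure limits presentation to the items most relevant to the subject's level.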
Each series is broken off after two errors; the starting point for the next series is determined by the score on the preceding series. In this way, the administration of items is determined by the subject's individual performance, and the presentation is limited to the most relevant items for each subject. On average the number of items presented with the adaptive procedure is 50% of the total number of items in the subtests, while the maximum number to be presented amounts to about 60%. The standardization of the SON-R 5.5-17 is based on a nationwide sample of 1350 children and adolescents varying in age from 6 to 14 years. The reliability coefficient (alpha stratified) of the total score of the test increases from .90 at six years to .94 at fourteen years, with a mean value of .93. The generalizability of the total score (alpha) increases from .81 at six years to .88 at fourteen years, with a mean value of .85. The average reliability of the subtests is .76. The validity of the SON-R 5.5-17 is evident from the clear relationship with different indicators of school career such as school type, class repetition and school report marks. The mean multiple correlation of the SON-R with these indicators of school career amounts to .59. For children in the age group of 7 to 9 years the multiple correlation is .54, for children in the age group 10-11 years .60, and for children in the age group 13-14 years the multiple correlation increases to .63.

Cross-cultural studies with the SON-R 2.5-7

Research in Australia

The validity study with the SON-R 2.5-7 in Australia was executed in 1996 in Victoria under the supervision of Jo Jenkinson of Deakin University, in co-operation with the University of Groningen (Jenkinson, Roberts, Dennehy & Tellegen, 1996; Tellegen, 1997). The study is based on 155 subjects, 72 boys and 83 girls, with a mean age of 4 years and 5 months (standard deviation 10 months).
Within the group of 155 children three groups can be differentiated: children without specific handicaps (N=59), hearing-impaired children (N=59), and children with a developmental retardation (N=37). In this research both the Wechsler Preschool and Primary Scale of Intelligence – Revised (WPPSI-R) and the SON-R 2.5-7 were administered, in alternating order; the mean interval between administrations was 20 days. The SON-R 2.5-7 was administered according to the standard test procedure by psychology students from the University of Groningen, while the WPPSI-R was administered by Australian psychologists. The hearing-impaired children and the children with a developmental retardation only did the performance scale of the WPPSI-R.

Results

The correlation of the SON-R 2.5-7 with the WPPSI-R in the total Australian sample amounts to .78. Within the three different groups the correlations between the performance scale of the WPPSI-R and the SON-R 2.5-7 were .74, .74 and .75. Within the non-handicapped group the correlation of the SON-R 2.5-7 with the verbal scale of the WPPSI-R (.54) is lower than the correlation with its performance scale (.74). The scores on the SON-R 2.5-7 are on average 5 points lower than the scores on the WPPSI-R.

Table 2: Characteristics of the research with the SON-R outside the Netherlands

  SON-R 2.5-7                      Australia           Great Britain       United States
  N of subjects                    155                 58                  75/31/26/29/47
  Age (years;months)               4;5 (0;10)          6;3 (0;3)           5;1/4;7/4;7/5;6/4;7
  Correlation with criterion test  .78 (WPPSI-R PIQ)   .87 (BAS, 6 subt.)  .59 (WPPSI-R FSIQ)
                                                                           .66 (K-ABC)
                                                                           .61 (MSCA)
                                                                           .47 (PPVT-R)
                                                                           .61 (PLS-3)

  SON-R 5.5-17                     China               Peru                Brazil
  N of subjects                    302                 160                 82
  Age (years;months)               11;6                9;4                 10;5
  Correlation with criterion test                      .77 (WISC-R FSIQ)   .60 (school marks)
  Items with problems:
    Categories                     6                   3                   10
    Situations                     3                   2                   4
    Stories                        0                   0                   0

Research in the USA

The validation study executed in the USA (West Virginia), supervised by Stephen O'Keefe of the West Virginia Graduate College, involved the administration of the SON-R 2.5-7 and five other cognitive tests: the WPPSI-R, the Kaufman-ABC, the McCarthy Scales of Children's Abilities (MSCA), the Peabody Picture Vocabulary Test – Revised (PPVT-R) and the Preschool Language Scale-3 (PLS-3). The SON-R 2.5-7 was administered partly by psychology students from the University of Groningen and partly by psychologists from West Virginia. The amount of time between the administration of the SON-R 2.5-7 and the other tests generally was very short; in the majority of cases the other test was administered on the same day. The PLS-3 was not administered during this study; the test scores on the PLS-3 were collected on another occasion and could be used in this research. The number of children that took both the SON-R 2.5-7 and another test varied from 26 (in the case of the MSCA) to 75 (in the case of the WPPSI-R).

Results

The SON-R 2.5-7 had a correlation of .59 with the total score on the WPPSI-R; the correlations with the performance and verbal scales were .60 and .43 respectively. The average age of the 75 children that did both tests was 5.1 years.
The mean total score on the SON-R 2.5-7 was more than two points lower than the total score on the WPPSI-R (94.5 versus 96.8) and nearly four points lower than the score on the performance scale of the WPPSI-R (94.5 versus 98.3). The SON-R 2.5-7 showed a correlation of .66 with the Kaufman-ABC. The correlation with the simultaneous scale was much higher than with the scale for sequential processing (.58 versus .29). The correlation with the nonverbal scale of the Kaufman-ABC amounted to .61. The average age of the 31 children that did both the Kaufman-ABC and the SON-R 2.5-7 was 4.6 years. With the general cognitive index of the MSCA the SON-R 2.5-7 showed a correlation of .61; the correlation with the verbal scale was .48, while the correlation with the perceptual performance scale amounted to .61. The average age of the 26 children that did both tests was 4.6 years. The correlation of the SON-R 2.5-7 with the PPVT-R was .47; the average age of the 29 children that did both tests was 5.5 years. With the total language score of the PLS-3 the SON-R 2.5-7 showed a correlation of .61; with the Auditory Comprehension Scale the correlation was .59, while the correlation with the Expressive Communication Scale amounted to .56. The average age of the 47 children who did both tests was 4.6 years.

Research in Great Britain

This validation study in Great Britain took place in 1996 and was supervised by Julie Dockrell of the University of London; both the SON-R 2.5-7 and the British Ability Scales (BAS) were administered. The BAS was administered by psychology students from the University of London, and the SON-R was administered by psychology students from the University of Groningen. Both tests were administered in alternating order to 58 children, 34 boys and 24 girls, from the first year of primary school. The mean age of the children was 6;3 years (standard deviation 3 months). The interval between test administrations varied from a few days to a few weeks.
Within the total sample three groups of children can be distinguished: a group without specific handicaps (N=20), a group for which English is the second language (N=22), and a group with learning disabilities (N=16). Six subtests of the BAS were administered: the four subtests of the short version (Naming Vocabulary, Digit Recall, Similarities, and Matrices) and two extra nonverbal subtests (Block Design and Visual Recognition).

Results

The correlation of the SON-R 2.5-7 with the short version of the BAS is .80. When the two nonverbal subtests are included, the correlation increases to .87. The correlation with the verbal part (three verbal subtests) of the shortened version of the BAS was .71, while the correlation with the performance part (three nonverbal subtests) amounted to .78. The correlations in the group of children without any handicaps are considerably lower than in the other two groups (.56 versus .76 and .78). In the total English sample the SON-R IQ-scores are 7 points lower than on the short form of the BAS. The difference in IQ-scores between the group of children without handicaps and the children with learning disabilities is 40.8 points for the SON-R 2.5-7 and 42.8 points for the short form of the BAS.

Conclusions of the studies in Australia, the USA, and Great Britain

In the three cross-cultural studies the correlations of the SON-R 2.5-7 with the performance scales of a number of criterion tests could be compared to the correlations with the verbal scales. In all three studies the correlation of the SON-R 2.5-7 with the performance scale of the criterion test was clearly stronger than with the verbal scale. In the Australian study the SON-R 2.5-7 showed correlations with the performance and verbal scales of the WPPSI-R of .74 and .54 respectively; in the study in the USA these correlations were .60 and .43.
In the same study the SON-R 2.5-7 showed a correlation of .61 with the perceptual performance scale of the MSCA, and a correlation of .48 with the verbal scale of the MSCA. In the research in Great Britain the SON-R 2.5-7 showed a correlation of .78 with the performance part of the BAS and a correlation of .71 with the verbal part of the BAS. These correlations obtained in Australia, the USA and Great Britain support the convergent and divergent validity of the SON-R 2.5-7.

Cross-cultural studies with the SON-R 5.5-17

Research in China

The research with the SON-R 5.5-17 in China took place in 1996 and was executed by Chinese psychology students in collaboration with Milly Judistira, a Dutch psychologist. The study was supervised in China by Professor Zhang Hou Can of Beijing Normal University. This study was a pilot study in preparation for the standardization and adaptation of the SON-R 5.5-17 for China. All 7 subtests of the SON-R 5.5-17 were administered according to the standard test procedure by Chinese psychology students to a sample of 302 Chinese children, consisting of 165 boys and 137 girls. To make the administration of the test possible, the instructions of the SON-R 5.5-17 were translated into Chinese. The Chinese students were trained in the administration of the SON-R 5.5-17 by Milly Judistira, with the help of an English-speaking Chinese psychology student as an interpreter. The Chinese subjects were tested in the following age groups: 6.5-year-olds (103), 11.5-year-olds (94), and 14.5-year-olds (105). The children came from Beijing (23), Tianjin (81), Miyun (77) and Guangrao (121). The sampling procedure used was not a random procedure: the schools were chosen on the basis of existing contacts with the University of Beijing, and within the schools children were chosen by the teachers. Although the teachers claimed to have chosen the children in a random manner, there is no guarantee that this really was the case.
The research results therefore have to be interpreted with some caution. Psychometric characteristics The generalizability coefficient (alpha) of the total score of the SON-R 5.5-17 in the Chinese sample increases from .76 at 6.5 years to .82 at 14.5 years with a mean value of .80. These values are lower than those found for the Dutch standardization sample, where the generalizability of the total score increased from .81 at 6.5 years to .88 at 14.5 years with a mean value of .85. The generalizability coefficient is computed as the usual coefficient alpha with the subtests as the unit of analysis; the number of subtests and the mean correlation between the subtests determine this coefficient. The mean correlation between the 7 subtests of the SON-R 5.5-17 was lower for the three Chinese age groups than for the comparable age groups of the Dutch sample (.31, .40, and .42 versus .37, .50 and .51). The mean of the standardized total score obtained by the Chinese children was quite close to the mean standardized total score of the Dutch norm group (98.9 versus 100). The mean scores of the Chinese children on Categories, Situations and Stories, all subtests that use meaningful picture materials, were lower in comparison with the Dutch norm group (93.6, 91.6, and 95.3 respectively, versus 100). The mean of these three subtests amounts to 93.5. The mean scores of the Chinese children on Mosaics, Hidden Pictures, Patterns and Analogies, all subtests using non-meaningful picture material such as geometrical forms, were 96.9, 101.9, 104.2 and 108.1. The mean of these four subtests amounts to 102.8 and is higher than the mean score on the four subtests in the Dutch norm group (100). The difference found in the Chinese sample between the mean score of the subtests with meaningful picture material (93.5) and the mean score of the subtests with non-meaningful picture material (102.8) is significant at the 5% level.
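The dependence of the generalizability coefficient on the number of subtests and their mean intercorrelation can be sketched as follows. This is our illustration, not part of the original study; it assumes the coefficient described above is the standardized coefficient alpha, and the function name is ours.

```python
# Standardized coefficient alpha for k subtests with mean intercorrelation
# mean_r (Spearman-Brown form): determined entirely by these two quantities.
def standardized_alpha(k, mean_r):
    return k * mean_r / (1 + (k - 1) * mean_r)

# Under this assumption the reported values are reproduced closely:
print(round(standardized_alpha(7, 0.31), 2))  # Chinese 6.5-year-olds: 0.76
print(round(standardized_alpha(7, 0.51), 2))  # Dutch 14.5-year-olds: 0.88
```

The sketch also makes clear why the lower mean subtest intercorrelations in the Chinese sample translate directly into lower generalizability coefficients.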
The significant difference between these means indicates that the Chinese children had more difficulties with the subtests that use meaningful picture materials than with the subtests using non-meaningful picture materials. The relatively low scores of the Chinese children on Categories, Situations and Stories were in line with prior expectations; the meaningful pictures used in these subtests were not expected to be familiar in cultures very different from the Dutch culture. One of the items of the subtest Categories, for instance, shows people leaving a church; such a situation appeared not to be very familiar to Chinese children. The lower mean score on Mosaics (96.9 versus 100) does not correspond with earlier expectations, since the item material of this subtest consists of abstract geometrical figures, which according to Jensen (1980) reduces the probability of cultural bias. The Chinese children did not show any problems with the other two subtests containing abstract geometrical figures; on these subtests, Patterns and Analogies, they even showed higher mean scores than the Dutch children (104.2 and 108.1 versus 100). The lower score on Mosaics might be a result of the fact that the majority of the Chinese children were not tested with the original version of the subtest Mosaics. Due to financial restrictions only one original SON-R 5.5-17 test was available for the research in China; in order to be able to test various children in the same period of time, a number of copies of the test were made. The squares of the copied version of Mosaics did not fit very well in the frame in which the mosaic patterns had to be copied. The use of this non-standardized test material probably caused the lower score of the Chinese children on Mosaics. Factor analysis failed to show a clear similarity between the factor structure of the 7 subtests of the SON-R 5.5-17 in the Chinese sample and in the Dutch standardization sample.
In the Dutch sample Principal Component Analysis (PCA) with varimax rotation resulted in two clear factors: a ‘reasoning’ and a ‘spatial’ factor. Although in the youngest age groups four factors appeared (‘spatial’, ‘concrete reasoning’, ‘abstract reasoning’, and ‘perception’), the two-factor solution offered better and more consistent results across all age groups. In the Chinese sample PCA with varimax rotation did not provide the same results as in the Netherlands. In the Dutch research the two spatial subtests showed high loadings on the second factor (the ‘spatial’ factor), while in the Chinese study this only seems to be the case for the youngest group. Moreover, in the Dutch research Categories showed high loadings on the first factor (the ‘reasoning’ factor), but this was not the case for the oldest group of the Chinese sample. The results of the factor analysis suggest a lack of test equivalence of the SON-R 5.5-17 between the Chinese and the Dutch culture. These results, however, have to be interpreted with some caution, because reservations can be made about the appropriateness of factor analysis as a method for assessing the equivalence of test applications in different cultures. Hambleton and Bollwark (1991) note in this respect that a disadvantage of factor analysis is that the results are sample dependent, since it is based on classical item statistics. Their conclusion also applies to other often-used classical statistical methods to detect a possible cultural loading of a test that are sample dependent, like the comparison of p-values and the comparison of total test scores. All these methods presuppose the use of equal ability groups. Even in the case of non-equal ability groups researchers must still check that the ordering of item difficulties is the same in the two different cultures (Hambleton & Kanjee, 1995).
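Such an ordering check can be sketched as follows. The proportions-correct below are hypothetical, not the actual item data, and the helper functions are ours; Spearman's rho is computed with the standard untied-data formula.

```python
def ranks(values):
    # rank 1..n by increasing value (no ties in this illustration)
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(x, y):
    # untied-data formula: rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical proportions correct for the same 8 items in two cultural
# samples; items 2 and 3 trade places in the second sample.
p_sample_a = [0.95, 0.90, 0.82, 0.75, 0.60, 0.48, 0.30, 0.15]
p_sample_b = [0.97, 0.85, 0.88, 0.70, 0.58, 0.45, 0.33, 0.12]
print(round(spearman_rho(p_sample_a, p_sample_b), 2))  # 0.98
```

A single adjacent transposition already drops rho from 1.00 to about 0.98, which shows how sensitive the coefficient is to even small reorderings of item difficulty.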
Following the recommendation of Hambleton and Kanjee, the correlation coefficients between the orderings of the item difficulties in the two cultures have been calculated. The Spearman's rho correlation coefficients between the item difficulties of six subtests of the SON-R (one subtest, Hidden Pictures, does not consist of independent items, so no correlation could be computed) obtained in the Chinese sample and in the Dutch sample are the following: .98 (Categories), .99 (Mosaics), .99 (Patterns), .98 (Situations), .99 (Analogies) and .98 (Stories). Although the correlations between the item difficulties in the two cultures are all very high, Categories, Situations, and Stories show a slightly lower correlation than the remaining subtests. This might be an indication of cultural bias in these subtests. The above-described psychometric characteristics of the SON-R 5.5-17 in this study with 302 Chinese children suggest that the subtests Categories, Situations and Stories might have a cultural bias. A next step in this study was an attempt to identify the sources of this possible cultural bias of the three subtests. Results of the judgmental procedure According to Hambleton (1993) both judgmental and statistical methods should be used in studies to determine the equivalence of a test in multiple languages or cultures. In this study the judgmental procedure consisted of reviewing the test instructions, the testing format, the examples, and all items of Categories, Situations and Stories for possible cultural bias for the Chinese population of children from 6 to 17 years of age. The reviewing took place during a group discussion with the Chinese psychology students. The test instructions, the testing format and the examples used in the SON-R 5.5-17 were judged by them as containing no cultural bias.
The only aspect of the testing procedure that caused some problems for the Chinese administrators of the SON-R 5.5-17 was providing feedback after each item. Some students reported that they felt uncomfortable giving feedback because they believed that this might influence the emotions of the children. Possibly the problems with providing feedback are related to the fact that in the Chinese culture communication often occurs in an indirect manner. The results of the judgmental procedure with respect to the items of the three subtests are the following: 6 items of Categories (items 2b, 3b, 4b, 6a, 7a, and 9b) and 3 items of Situations (2b, 2c and 10b) were identified as being possibly culture specific. For the subtest Stories no culture specific items were identified. For these culture specific items new items have been developed that are adapted to the Chinese culture. A Chinese student of the art academy drew the adapted items. In the future standardization and validation research of the SON-R 5.5-17 in China, items that were identified as being culture specific will be replaced by adapted items. Research in Peru The validation study with the SON-R 5.5-17 in Peru took place in 1996 and was carried out by two psychology students from the University of Groningen and by Peruvian psychologists. The research was supervised in Peru by Veronica Bisso Cajas, from the Consortium of Catholic Centres of Education in Lima. This study served as a preparation for the standardization and adaptation of the SON-R 5.5-17 for Peru. All 7 subtests of the SON-R 5.5-17 were administered according to the standard test procedure by the two Dutch psychology students and by ten Peruvian psychologists. To make the administration of the test possible, the instructions of the SON-R 5.5-17 were translated into Spanish. The Peruvian psychologists were trained in the administration procedure of the SON-R 5.5-17.
The test was administered to a sample of 160 Peruvian children, consisting of 79 boys and 81 girls. The age of the children varied from 6 years to 15 years with a mean of 9;4 years. All the subjects of the Peruvian sample lived in the city of Lima, the capital of Peru; 50% of the children came from State schools and 50% from private schools. The State schools in Peru are fully supported by the government; children who attend these schools generally come from families with a low SES-level and often live in very poor circumstances. The private schools are partly financed by religious congregations and private persons; children who attend these schools generally come from families with a high or moderately high SES-level. The sampling procedure was only partially random; the schools were chosen on the basis of existing contacts with the Consortium of Catholic Centres of Education, but within the schools children were selected at random. In this research there is an over-representation of children with moderate and high SES-levels. The Peruvian sample cannot be considered representative of Peru, nor of Lima. The results of this study, therefore, cannot simply be generalized to all of Peru and have to be interpreted with caution. Psychometric characteristics The results of this study show that the Peruvian children obtained a lower mean score on the SON-R 5.5-17 than the children of the Dutch norm group (94.0 versus 100). The mean scores of the Peruvian children on Categories, Situations and Stories, the subtests which use meaningful picture material, were 97.1, 91.7 and 93.1. The mean of these three subtests amounts to 93.9. The mean scores of the Peruvian children on Mosaics, Hidden Pictures, Patterns and Analogies, all subtests using non-meaningful picture materials, were 91.6, 92.2, 101.4, and 98.1. The mean of these four subtests amounts to 95.8.
The difference between the mean score of the subtests with meaningful picture material (93.9) and the mean score of the subtests with non-meaningful picture material (95.8) is significant at the 5% level. This finding is in accordance with prior expectations; the Peruvian children, like the Chinese children, showed more problems with the subtests containing meaningful pictures than with the subtests containing non-meaningful pictures. To check whether the ordering of item difficulties was the same in the Peruvian sample and the Dutch sample, Spearman's rho correlation coefficients were calculated. The correlation coefficients between the item difficulties of the subtests Categories, Situations and Stories obtained in the Peruvian sample and in the Dutch norm sample are .97, .98, and .97. These correlations are based on 48 children from the total sample of Peruvian children. Note that the correlation coefficients between the item difficulties of Categories, Situations and Stories obtained in the Peruvian and in the Dutch sample are of the same magnitude as the correlations between the item difficulties found in the Chinese and the Dutch sample. In addition to the SON-R 5.5-17, a Spanish version of the WISC-R was also administered to all 160 children of the Peruvian sample in order to obtain information about the validity of the SON-R 5.5-17 in Peru. Since there were no Peruvian norms available for the WISC-R, norm scores were used that are based on the standardization sample of the USA (Wechsler, 1974). The correlation of the SON-R 5.5-17 with the full scale of the WISC-R was .77, with the performance scale .74 and with the verbal scale .69. These correlations are slightly lower than those found in a study in the Netherlands with 35 children from a psychiatric outpatient university clinic (Tellegen & Laros, 1993b).
In this study the correlation of the SON-R 5.5-17 with the full scale of the WISC-R was .80, with the performance scale .80 and with the verbal scale .66. The mean IQ-score of the Peruvian sample on the WISC-R was 96.7; the mean score on the verbal scale was 94.3 and the mean score on the performance scale 100.2. These scores cannot simply be compared to the mean IQ-score on the SON-R 5.5-17 because the tests have been standardized in different years; the WISC-R was standardized in 1974 and the SON-R 5.5-17 in 1988. As Flynn (1987) has observed, norms on intelligence tests become stricter over the years. In a recent study (Sattler, 1992) comparing the norms of the WISC-R with those of the most recent version of the Wechsler test for children, the WISC-III (Wechsler, 1991), the norm scores on the WISC-III appeared to be lower than the norm scores on the WISC-R. In the 17 years between the two standardizations (1974-1991) the norm scores for the full scale decreased 5.3 points, for the verbal scale 2.4 points, and for the performance scale 7.4 points. Based on the differences found between the norms of the WISC-R and the WISC-III, one can correct the obtained scores for the difference in years of standardization of the WISC-R and the SON-R 5.5-17. After correction, the mean norm score on the WISC-R of the Peruvian sample becomes 92.3 for the full scale, 94.1 for the performance scale and 92.3 for the verbal scale. The Peruvian children thus obtain, after correction, a higher score on the SON-R 5.5-17 than on the WISC-R. Results of the judgmental procedure To identify culture specific items of the subtests Categories, Situations and Stories a judgmental procedure was used. With the remaining subtests (Mosaics, Hidden Pictures, Patterns, Analogies) no judgmental procedure was used, because these subtests are less likely to be culture specific.
When a child responded incorrectly to an item of Categories, Situations or Stories, the examiner checked after testing whether the pictures used in the item were understood by the child. This procedure, however, was not followed in a systematic way; not all children who responded incorrectly to an item were asked whether the pictures used in that particular item were familiar to them. The difference with the judgmental procedure used in the Chinese study is that there the examiners (psychology students) were asked to judge the cultural loading of the items, whereas in this study the examinees (children) were asked to give their judgment. The results of the judgmental procedure with respect to the items of the three subtests are the following: 3 items of Categories (items 1a, 2b, and 4a) and 2 items and one example of Situations (example A, and items 2c and 4b) were identified as being possibly culture specific. Research in Brazil The research in Brazil, which took place in 1996, was carried out under the supervision of Jaap Laros of the University of Brasilia, one of the authors of the SON-R tests. Six psychology students of this university did the actual test administration at the schools. To make the administration of the test possible, the test instructions of the SON-R 5.5-17 were translated into Portuguese. The Brazilian psychology students were trained in the administration of the SON-R 5.5-17 by their supervisor. The study was a preparation for the standardization and adaptation of the SON-R 5.5-17 for Brazil. In this Brazilian study the subtests Categories, Situations and Stories of the SON-R 5.5-17 were administered individually to a sample of 82 Brazilian children, consisting of 41 girls and 41 boys. The age of the children varied from 7 to 14 years with a mean of 10.4 years. All children came from two state schools in Brasilia, the capital of Brazil.
The two schools differed with respect to the SES-level of their pupils; the 36 children from the first school generally came from families with a moderate SES-level, while the 46 children from the second school principally came from families with a low SES-level. The sampling procedure was not a totally random procedure; the schools were selected on the basis of existing contacts with the University of Brasilia, but within the schools the children were selected at random, using only their age as a selection criterion. The Brazilian sample cannot be considered representative of Brazil, nor of Brasilia. The results of this study should therefore be interpreted with some caution. To gain additional information on possible cultural bias, the three subtests of the SON-R 5.5-17 were administered in a nonstandard fashion. Van de Vijver and Poortinga (1997) stress the importance of nonstandard administration of tests in cross-cultural studies to evaluate the adequacy of stimulus and response formats, and of the test administration procedure. In such a nonstandard test procedure it usually is very informative to ask examinees to motivate their responses. In the standard procedure of the SON-R 5.5-17 the subtests are divided into two or three parallel series of about 10 items; each child starts with the easiest item of the first series, but the item with which the second or third series is started depends on the performance of the child on the previous series. In the nonstandard procedure used in this study the items were offered in order of increasing difficulty; the subtests Categories and Situations, with 27 and 33 items respectively, were broken off after twelve errors, while the subtest Stories, with 20 items, was broken off after 8 errors. With the use of this nonstandard test procedure the children responded to many more items than would have been the case if the standard adaptive procedure had been used.
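The break-off rule of this nonstandard procedure can be sketched as follows. This is a minimal illustration with hypothetical response patterns; the function name is ours.

```python
# Items are offered in order of increasing difficulty; administration is
# broken off once the child has made max_errors errors (12 for Categories
# and Situations, 8 for Stories).
def items_administered(responses, max_errors):
    errors = 0
    for n, correct in enumerate(responses, start=1):
        if not correct:
            errors += 1
            if errors == max_errors:
                return n  # break-off criterion reached at item n
    return len(responses)  # criterion never reached: all items offered

# A hypothetical child failing every Stories item from item 10 onward
# still sees 17 of the 20 items before the break-off criterion is met:
print(items_administered([True] * 9 + [False] * 11, max_errors=8))  # 17
```

Because the error thresholds are generous relative to the subtest lengths, most children pass through nearly the whole item set, which is what makes this procedure informative about item difficulties across the full range.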
The average number of items administered with the adaptive procedure is 50% of the total number of items; with the nonstandard procedure this number amounts to about 90% of the total number of items. Psychometric characteristics The reliability coefficients (alpha) for the three subtests in the Brazilian sample were as follows: .85 for Categories, .87 for Situations and .82 for Stories. When corrected for the influence of age, the reliability coefficients are: .76 for Categories, .82 for Situations and .76 for Stories. These values are quite similar to the values found in a Dutch sample of 415 subjects in which the three subtests were administered without an adaptive procedure (Laros & Tellegen, 1991, page 33). The results of this research further indicate that the Brazilian children obtained a lower mean total score on the SON-R 5.5-17 (based on the subtests Categories, Situations and Stories) in comparison to the Dutch norm group (96.2 versus 100). The mean scores of the Brazilian children for Categories (94.8) and Situations (95.0) are relatively low in comparison with the Dutch norm group (100); the mean score of the Brazilian sample on Stories (97.5) comes close to the mean score of the Dutch children. The children of the school with a comparatively low SES-level seemed to have more difficulties with Categories and Situations than the children of the school with a higher SES-level. The mean total score of the low SES-group (N=46) on the three subtests was 94.8; their mean scores on the separate subtests were: Categories 92.9, Situations 92.5 and Stories 97.0. It can be concluded that the Brazilian children had more problems with Categories and Situations than with Stories, and this trend appears to be stronger for Brazilian children with a low SES-level.
The Spearman's rho correlation coefficients between the item difficulties of the subtests Categories, Situations and Stories obtained in the Brazilian sample and in the Dutch norm sample are .97, .92, and .96. The relatively low correlation coefficient for the subtest Situations might be an indication of item bias in this subtest, but might also reflect the different test administration procedures that were used in the Dutch and the Brazilian research. If the latter were the case, the use of the nonstandard administration procedure would have had more effect on the item difficulties of Situations than on the item difficulties of the other two subtests. In addition to the administration of the three subtests of the SON-R 5.5-17, other data on the 82 Brazilian children were gathered in order to obtain information about the validity of the subtests. School marks on mathematics, science, and Portuguese were collected for every child of the Brazilian sample. The school marks give an indication of the school achievement of the children. The children were also evaluated on their motivation, cooperation and concentration by their schoolteachers and by the psychology students who administered (a part of) the SON-R 5.5-17. Of the three subtests, the report marks showed the highest correlations with the subtest Categories: science (.61), mathematics (.59), and Portuguese (.38); they showed the lowest correlations with the subtest Stories: science (.26), mathematics (.36), and Portuguese (.29). The correlations of the report marks with the total score on the SON-R 5.5-17 are as follows: science (.52), mathematics (.49), and Portuguese (.38). The multiple correlation of the report marks with the total score on the SON-R 5.5-17 amounts to .60. The correlations of report marks with the total score on the SON-R 5.5-17 in the Dutch norm sample are of the same magnitude as the correlations found in the Brazilian sample.
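A multiple correlation like the .60 reported above can be obtained as the Pearson correlation between the criterion and its least-squares prediction from the predictors. The sketch below uses synthetic data, not the Brazilian scores; only the sample size (n = 82) is taken from the study.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 82  # same sample size as the Brazilian study; the data are synthetic

# three predictor variables (e.g. report marks) and a criterion (total score)
X = rng.normal(size=(n, 3))
y = X @ np.array([0.5, 0.4, 0.3]) + rng.normal(scale=1.0, size=n)

# multiple correlation R = correlation between y and its least-squares fit
X1 = np.column_stack([np.ones(n), X])           # add an intercept column
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)   # OLS coefficients
R = np.corrcoef(y, X1 @ beta)[0, 1]
print(round(R, 2))
```

Because R is the correlation of the criterion with the best linear combination of the predictors, it is always at least as high as the largest single-predictor correlation, which is why the multiple correlation of .60 exceeds the individual report-mark correlations of .38 to .52.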
Considering the fact that in the Brazilian study the total score on the SON-R 5.5-17 was based on only three instead of all seven subtests, the correlations in this study are quite high. The judgment of the teachers on the motivation, cooperation and concentration of their pupils correlated higher with the total score on the SON-R 5.5-17 than the judgment of the psychology students. When the judgment scores of the teachers and the psychology students were combined into a mean score, these scores showed a higher correlation with the total score on the SON-R 5.5-17 than the scores based on the judgment of the teachers alone. The correlation of the total score on the SON-R 5.5-17 with the mean score on motivation is .40, with the mean score on cooperation .45, and with the mean score on concentration .19. The multiple correlation of the total score on the SON-R 5.5-17 with the mean scores on motivation, cooperation and concentration is .50. Results of the judgmental procedure The judgmental procedure used in this research consisted of the following: when a child responded incorrectly to an example or an item, the examiner checked, immediately after the incorrect answer, whether the child understood the pictures that were used in the item. This procedure was followed in a systematic way, and formed the basis on which conclusions were made regarding the cultural specificity of the items. For the subtest Categories the judgmental procedure implied that after each incorrectly answered item the child had to indicate his or her familiarity with eight figures. [Each item of the subtest Categories is built up of eight figures; three figures define the category, and five figures are the alternatives from which the child has to choose the two that pertain to the category.] In the case of the subtest Situations the procedure implied that after each incorrectly answered item the child had to indicate his or her familiarity with five to fourteen figures.
For the subtest Stories the procedure implied that the child had to indicate his or her familiarity with four to seven figures for every incorrectly answered item. As a consequence of this time-consuming test procedure, the amount of time needed to administer the three subtests of the SON-R 5.5-17 came close to an hour. The results obtained with this judgmental procedure indicate that the Brazilian children had no problems understanding the subtest instructions of the SON-R 5.5-17. On the basis of the answers that the children provided with respect to their (un)familiarity with the figures used in the items, 14 items were identified as being possibly culture specific: 10 items of Categories (items 2b, 2c, 3a, 4a, 4b, 5c, 6a, 8a, 8c and 9c) and 4 items of Situations (items 2b, 3c, 4a, and 10b). The criterion used to classify an item as being possibly culture specific was the following: when one of the figures that form an item was indicated by more than 5% of the Brazilian sample as being unfamiliar, the item was classified as being possibly culture specific. This criterion might be somewhat too strict, considering the large number of items of the subtest Categories that were classified as possibly culture specific. In the case of the subtest Categories, twice as many figures used in the alternatives were unfamiliar to the Brazilian children as figures that defined the category. With the subtest Situations, however, twice as many ‘main’ figures were identified as unfamiliar as figures used in the alternatives. For the subtest Stories no adaptations for the Brazilian culture were found to be necessary. Conclusions of the studies in China, Peru and Brazil
In the cross-cultural validation studies in China and Peru, where all seven subtests of the SON-R 5.5-17 were administered, the average of the mean scores on the three subtests with meaningful picture materials (Categories, Situations and Stories) was significantly lower at the 5% level than the average of the mean scores on the four remaining subtests that contain non-meaningful picture materials (Mosaics, Hidden Pictures, Patterns and Analogies). This finding is in accordance with our prior expectations, which were partially based on the theoretical considerations of Jensen (1980) about culture-fair tests. This result indicates a possible cultural bias in the subtests Categories, Situations and Stories. Another indication of possible bias is given by the correlation between the item difficulties in the Dutch norm group and the item difficulties in different cultures. The correlation coefficients give an indication of the extent to which the ordering of item difficulties is the same in the two different cultures. A different ordering of the items might indicate that the items do not have the same meaning for two different cultural groups. The Spearman's rho correlation coefficients for the subtests Mosaics, Patterns, and Analogies in the Chinese study were all .99. The correlation coefficients for Categories in the Chinese, Peruvian, and Brazilian studies are respectively .98, .97, and .97; for Situations .98, .98, and .92; for Stories .98, .97, and .96. Although all the correlation coefficients are quite high, the correlations for the three subtests Categories, Situations and Stories, in which meaningful picture materials are used, are somewhat lower, which might indicate possible bias in these subtests. The lowest correlations were found in the Brazilian sample, but there the nonstandard test procedure makes a straightforward comparison with the item difficulties in the Dutch sample problematic.
The reliability coefficients of the subtests in the three studies are comparable with the coefficients found in the Dutch sample, although the values in the Dutch sample are somewhat higher. In the Chinese study, the only study in which a factor analysis was performed, the resulting factor structure failed to show a clear similarity with the factor structure in the Dutch sample. This might be an indication of the cultural loading of some of the subtests. In the Peruvian and Brazilian studies clear indications of the validity of the SON-R 5.5-17 were found. The SON-R 5.5-17 showed a relatively high correlation with the WISC-R (.77) in the Peruvian study. In the Brazilian study the standardized total score based on three subtests of the SON-R showed satisfactory correlations with school marks on science (.52), mathematics (.49), and Portuguese (.38). The correlations of report marks with the total score on the SON-R based on 7 subtests in the Dutch sample are of the same magnitude as the correlations found in the Brazilian sample. In the Brazilian study the total score on the SON-R, based on three subtests, correlated .40 with motivation, .45 with cooperation and .19 with concentration. In all three studies a judgmental procedure was used for the identification of item bias in the subtests Categories, Situations and Stories. In the Chinese study the Chinese psychology students who applied the SON-R 5.5-17 reviewed the items for possible cultural bias. In the Peruvian and Brazilian studies the children who responded incorrectly to an item were asked whether they understood the pictures used in that item. As a result of this procedure 14 of the 33 items (42%) of the subtest Categories have been identified as possibly culture specific.
One of these items was identified as culture specific in all three studies (item 2b), 3 items were identified as biased in two studies (items 4a, 4b and 6a), and 10 items were identified as biased in just one study (items 1a, 2c, 3a, 3b, 5c, 7a, 8a, 8c, 9b, and 9c). For the subtest Situations 6 of the 33 items (18%) and one example were identified as possibly culture specific. Three of the items were identified in two different studies (items 2b, 2c, and 10b); the remaining 3 items were identified as biased in just one study (items 3c, 4a, and 4b). The example that was identified as culture specific (example A) was identified in only one study. General Conclusions The research results obtained in various countries for both the SON-R 2.5-7 and the SON-R 5.5-17 indicate that both tests can be used in cultures that are different from the Dutch culture. The validity coefficients of both SON-R tests found in the various countries are comparable to the validity coefficients found in the Dutch norm sample. For the SON-R 5.5-17, however, some adaptations will have to be made in the subtests Categories and Situations. For the subtest Categories 42% of the items will have to be adapted, while for the subtest Situations 18% of the items and one example will have to be adapted. For the other subtests of the SON-R 5.5-17 no adaptations have to be made. Although no specific research has been done on item bias of the SON-R 2.5-7, adaptations of this test seem to be less necessary. For this young age group the test uses only very simple pictures. In the construction phase of this test special attention was paid to possible item bias. The research result that immigrant children in the Netherlands had the same mean score on the subtests that use meaningful picture materials as on the subtests that use non-meaningful picture materials such as geometrical forms is an empirical argument for the culture-fairness of the SON-R 2.5-7.
Only future empirical research will tell whether the impression is correct that for cross-cultural use of the SON-R 2.5-7 no adaptations are necessary.

References

Anastasi, A. (1989). Psychological testing (6th edition). New York: Macmillan.
Brouwer, A., Koster, M. & Veenstra, B. (1995). Validation of the Snijders-Oomen test (SON-R 2.5-7) for Dutch and Australian children with disabilities. Groningen: Internal Report, Department of Educational and Personality Psychology.
Ceci, S.J. (1991). How much does schooling influence general intelligence and its cognitive components? A reassessment of the evidence. Developmental Psychology, 27, 703-722.
Cooper, C.R. & Denner, J. (1998). Theories linking culture and psychology: universal and community-specific processes. Annual Review of Psychology, 49, 559-584.
Cronbach, L.J. (1990). Essentials of psychological testing (5th edition). New York: Harper-Collins.
Drenth, P.J.D. (1975). Psychological tests for developing countries: rationale and objectives. Dutch Journal of Psychology [Nederlands Tijdschrift voor de Psychologie], 30, 5-22.
Ellis, B.B. (1995). A partial test of Hulin's psychometric theory of measurement equivalence in translated tests. European Journal of Psychological Assessment, 11, 184-193.
Fan, X., Wilson, V.L. & Kapes, J.T. (1996). Ethnic group representation in test construction samples and test bias: the standardization fallacy revisited. Educational and Psychological Measurement, 56, 365-381.
Hambleton, R.K. (1993). Translating achievement tests for use in cross-national studies. European Journal of Psychological Assessment, 9, 57-68.
Hambleton, R.K. (1994). Guidelines for adapting educational and psychological tests: a progress report. European Journal of Psychological Assessment, 10, 229-240.
Hambleton, R.K. & Bollwark, J. (1991). Adapting tests for use in different cultures: technical issues and methods. Bulletin of the International Test Commission, 18, 3-32.
Hambleton, R.K. & Kanjee, A. (1995). Increasing the validity of cross-cultural assessments: use of improved methods for test adaptations. European Journal of Psychological Assessment, 11, 147-157.
Hambleton, R.K. & Slater, S.C. (1997). Item response theory models and testing practices: current international status and future directions. European Journal of Psychological Assessment, 13, 21-28.
Hamers, J.H.M., Sijtsma, K. & Ruijssenaars, A.J.J.M. (1993). Learning potential assessment: theoretical, methodological and practical issues. Amsterdam: Swets & Zeitlinger.
Helms-Lorenz, M. & van de Vijver, F.J. (1995). Cognitive assessment in education in a multicultural society. European Journal of Psychological Assessment, 11, 158-169.
Holland, P.W. & Thayer, D.T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H.I. Braun (Eds.), Test validity (pp. 129-145). Hillsdale, NJ: Lawrence Erlbaum.
Horn, J. ten (1996). Validation research of the Snijders-Oomen nonverbal intelligence test (SON-R 2.5-7) in the USA. Groningen: Internal Report, Department of Educational and Personality Psychology.
Hu, S. & Oakland, T. (1991). Global and regional perspectives on testing children and youth: an empirical study. International Journal of Psychology, 26, 329-344.
Hulin, C.L. (1987). A psychometric theory of evaluations of item and scale translations: fidelity across languages. Journal of Cross-Cultural Psychology, 18, 115-142.
Jenkinson, J., Roberts, S., Dennehy, S. & Tellegen, P. (1996). Validation of the Snijders-Oomen Nonverbal Intelligence Test - Revised 2.5-7 for Australian children with disabilities. Journal of Psychoeducational Assessment, 14, 276-286.
Jensen, A.R. (1980). Bias in mental testing. New York: Free Press.
Jensen, A.R. (1984). Test bias: concepts and criticisms. In C.R. Reynolds & R.T. Brown (Eds.), Perspectives on bias in mental testing. New York: Free Press.
Judistira, E.M. (1996). A preliminary validation research with the SON-R 5.5-17 in China. Groningen: Internal Report, Department of Educational and Personality Psychology.
Laros, J.A. & Tellegen, P.J. (1991). Construction and validation of the SON-R 5.5-17, the Snijders-Oomen non-verbal intelligence test. Groningen: Wolters-Noordhoff.
Le Clerq, M. & Holvast, L. (1996). The SON-R 5.5-17 and the WISC-R applied to Peruvian school children. Groningen: Internal Report, Department of Educational and Personality Psychology. SWAP: 96-1227.
Linden, W.J. van der & Hambleton, R.K. (Eds.) (1997). Handbook of modern item response theory. New York: Springer Verlag.
Oakland, T., Wechsler, S., Bensuan, E. & Stafford, M. (1994). The construct of intelligence among Brazilian children: an exploratory study. School Psychology International, 15, 361-370.
Parmar, R.S. (1989). Cross-cultural transfer of non-verbal intelligence tests: an (in)validation study. British Journal of Educational Psychology, 59, 379-388.
Poortinga, Y.H. (1995). Cultural bias in assessment: historical and thematic issues. European Journal of Psychological Assessment, 11, 140-146.
Sattler, J.M. (1992). Assessment of children (revised and updated third edition). San Diego, CA: J.M. Sattler, Publisher, Inc.
Sijtsma, K. & Molenaar, I. (1987). Reliability of test scores in nonparametric item response theory. Psychometrika, 52, 79-98.
Snijders, J.Th., Tellegen, P.J. & Laros, J.A. (1989). Snijders-Oomen Nonverbal Intelligence Test: SON-R 5.5-17. Manual and research report. Groningen: Wolters-Noordhoff.
Tellegen, P. (1997). An addition and correction to the Jenkinson et al. (1996) Australian SON-R 2.5-7 validation study. Journal of Psychoeducational Assessment, 15, 67-69.
Tellegen, P.J. & Laros, J.A. (1993a). The Snijders-Oomen nonverbal intelligence tests: general intelligence tests or tests for learning potential? In J.H.M. Hamers, K. Sijtsma & A.J.J.M. Ruijssenaars (Eds.), Learning potential assessment: theoretical, methodological and practical issues. Amsterdam: Swets & Zeitlinger.
Tellegen, P.J. & Laros, J.A. (1993b). The construction and validation of a nonverbal test of intelligence: the revision of the Snijders-Oomen tests. European Journal of Psychological Assessment, 9, 147-157.
Tellegen, P.J., Winkel, M., Wijnberg-Williams, B.J. & Laros, J.A. (1998). Snijders-Oomen Nonverbal Intelligence Test, SON-R 2.5-7. Manual & research report. Lisse: Swets & Zeitlinger.
Van de Vijver, F.J.R. (1997). Meta-analysis of cross-cultural comparisons of cognitive test performance. Journal of Cross-Cultural Psychology, 28, 678-709.
Van de Vijver, F.J.R. (1998). Cross-cultural assessment: value for money? Invited address for Division 2 of the International Association of Applied Psychology, San Francisco, August 10.
Van de Vijver, F.J.R. & Poortinga, Y.H. (1992). Testing in culturally heterogeneous populations: when are cultural loadings undesirable? European Journal of Psychological Assessment, 8, 17-24.
Van de Vijver, F.J.R. & Poortinga, Y.H. (1997). Towards an integrated analysis of bias in cross-cultural assessment. European Journal of Psychological Assessment, 13, 29-37.
Wang, Z-M. (1993). Psychology in China: a review. Annual Review of Psychology, 44, 87-116.
Wechsler, D. (1974). Wechsler Intelligence Scale for Children - Revised (WISC-R). New York: Psychological Corporation.
Wechsler, D. (1991). Wechsler Intelligence Scale for Children: third edition (WISC-III). New York: Psychological Corporation.
Weiss, S.C. (1980). Culture Fair Intelligence Test (CFIT) and Draw-a-Person scores from a rural Peruvian sample. Journal of Social Psychology, 11, 147-148.
Zhang, H-C. (1988). Psychological measurement in China. International Journal of Psychology, 23, 101-117.