Test Bias and Construct Validity - Arthur Robert Jensen memorial site

Test Bias and Construct Validity Author(s): Arthur R. Jensen Source: The Phi Delta Kappan, Vol. 58, No. 4 (Dec., 1976), pp. 340-346 Published by: Phi Delta Kappa International Stable URL: http://www.jstor.org/stable/20298577 . Accessed: 26/06/2014 07:52
R. Arthur Jensen AND BIAS TEST VALIDITY CONSTRUCT Recent bias research no shows a number M are familiar with that our mental .lWAost psychologists the claims of critics tests are biased culturally minorities, especially biased in favor of middle-class culturally a As whites. the literature. the dividual's background is similar to that of the modal cultural configuration of American yardstick. a narrow, There everyone by is a conspiracy biased collection real measure "Intelligence because named, to due or genes inherent probably linguistic, of the test." thus and may and values to speak skills, especially standard English, that instrument white middle-class scores The of almost two and because children minority guarantees biased that children will obtain of other group any the more similar experi similar the more people, than R. 
JENSEN is professor of research and psy psychology Uni Institute Learning, chologist, of Human is This article versity of California, Berkeley. an address at the drawn he delivered from 1975, meeting of the American September, in Chicago. Association Psychological educational 340 is 4. The is a performance between social classes and between certain IQ and majority success in with a number tests, tests this of different very using I of school variance sources. The an shows estimated attributable figures of children, socioeconomic we percentages to each easiest based class of analysis those in the last column, average absolute difference these sample of ages 5 to 12.l schoolchildren, 1 with ance, total a random representing California between between within the same that between the families class social a (on scale of SES) is nine IQ points, than the greater between social classes. is 50% difference In short, discriminate the IQ tests race or that notion terms in largely average of or much in more and parents as background, class groups. difference among the sharing family, and cultural linguistic same the between or racial social intellectual samples large question children in California. Because of its the Wechsler Intelligence familiarity, Scale for Children-Revised (WISC-R) is a good example, with data on full scale IQs of more than 600 whites and 600 blacks IQ difference same differences show to examine able a difference large the children big are these differences? been some social class is just a myth. The IQ shows do have as biased, average as IQTests Show Differences? where the same family is If the Wechsler IQ difference average classes? too, Notice, than which for validity and attainment and how between difference culturally as produces 10-point never may occupations. Where Do so is social biased criteria, like culturally educational in middle and lower racial groups. minority tests biased 5. 
Culturally show good theless predictive predicting test 12 is the important here average is status, siblings as between blacks and whites? Or a larger difference between siblings in cultural differences biggest are The point: inde difference, critics claim, what kind of bias is it that background. Table ARTHUR of background which of similarity scores But full siblings within also 12 IQ points.2 share experience test in function Just and white or culture not do are but of their backgrounds." "The IQ test is a seriously ences be psycho biases temporal reward tests penalize higher children. dif not environment, in the cultural, "Aptitude middle-class ability scores IQ class social and ethnic, "Racial, in mean ferences CB called aptly tests." background) in and intelligence more been have mis never were they to measure (cultural other is that blacks or in the U.S. Similarity race average of socioeconomic pendent IQ points. cognitive by the tests. direct blacks.) The upon heavily cultural knowledge and 3. poorly." in these criticisms implication common 10 group) was only six IQ points. (The largest SES difference was 26 IQ points in the whites and 12 IQ points in the dif draw minorities sampled do have are: tests 2. The other who may themes middle-class specific and linguistic usage. verbal the test was in which the The average IQ differences of the all possible comparisons each racial classes social (within cupation. between in vocabulary of their frequency Blacks, tests of mental the items be penalized." always are tests sadly will developed might of all persons." from backgrounds than the culture tended an Anglo to make of "Persons vocabularies, The main a society." measures "IQ fering 1. The they an in Anglo-centric; to which extent on whites. by typical. are tests "IQ measure are They included usage a just I have picked up few direct quotations from are here reminder, be." 
should words are based tests are and of cultural in any of black/white significant of widely used tests of intelligence. "The certain against blacks, indices bias scores their several using to vari of of grasp In discussing distinguish loaded ordered be generality the of tent are less general scale oc test's of test test may be the more contain acquired must first con two bias. same test and a items of continuum is the specificity con informational the items. The the culture information quired, only Tests along loading, which culture or biased. culture can we bias, between and culture loading the not mean does culture cepts: Culture as clearly the giving the in IQ. For a 10-point on parental Criteria of Cultural Bias content culture the in which could be ac loaded it is. A that information within or narrower a could particular can This culture. be usually A ing. ent to in is, City loaded this than 10-cent the more sense, candy culture "How question, can bars you many for the Whether to respect is a in the population groups is an It The abstract. a volves test or two in used with biased culture of test the cultural or predictions groups com to respect scores between their scores. between per social individuals, racial whether se, or classes, a be cannot groups, obviously is no criterion of bias. There proper a priori that any two basis for assuming be equal in whatever should populations is that test the measure. general test external types: or are bias and of internal, construct and validity predictive validity. For of criteria uses of tests, predictive validity is crucial. One criterion of test bias is this: Do the intercepts and slopes the of of regression measures criterion do words, scores test the predict equally well for both groups? 
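The regression criterion just stated (equal intercepts and slopes when a criterion measure is regressed on test scores in each group) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function names `fit_line` and `regression_gap` are hypothetical, the data are invented, and a real analysis would also test whether any gap exceeds sampling error.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def regression_gap(scores_a, crit_a, scores_b, crit_b):
    """Return (intercept difference, slope difference), group A minus group B."""
    int_a, slope_a = fit_line(scores_a, crit_a)
    int_b, slope_b = fit_line(scores_b, crit_b)
    return int_a - int_b, slope_a - slope_b


# Invented data: both groups lie on the same line, criterion = 0.04 * score - 1.4,
# so an unbiased predictor in the sense above should show gaps near zero.
group_a_scores = [85, 95, 100, 110, 120]
group_a_crit = [2.0, 2.4, 2.6, 3.0, 3.4]
group_b_scores = [80, 90, 100, 105, 115]
group_b_crit = [1.8, 2.2, 2.6, 2.8, 3.2]

if __name__ == "__main__":
    gap_intercept, gap_slope = regression_gap(
        group_a_scores, group_a_crit, group_b_scores, group_b_crit
    )
    print("intercept gap:", gap_intercept)
    print("slope gap:", gap_slope)
```

Gaps close to zero mean a single regression equation serves both groups; a nonzero intercept gap with equal slopes would mean the common line over- or under-predicts one group by a constant amount.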
When the regres sion lines of two groups have different a and slopes, intercepts person's on the criterion dicted performance ? etc. will be influenced job, school, group not are biased blind. Its test and membership, adequate test, on the predictors. other hand, ture prediction or scholastic based on a of An pre ? by scores un is color fu person's job lines regression standard tests. and for whites A blacks regression single equation predicts equally well for both the few racial groups.3 Interestingly, exceptions reported in the literature would favor the black groups if the tests were used for selection, i.e., the any ence in the regression lines is such that for any given test score whites slightly outperform blacks on the criterion. In to overpre the criteri on a the selection Extremes of Culture Loading ad evi overwhelming com when blacks against general more about conclusions them.) validity criteria of test bias Construct no but complicated less im portant. It is very likely that tests which show little or no bias in terms of the indices of construct validity are also in.predictive validity. Construct validity criteria of bias refer to internal characteristics of the test and the degree of similarity of their text test of measures of the that ability in both as as well group If there is a difference other? means on the what the test does test, measures unsuspected our in the in group of theory The each names of from ranges concrete vocabulary and mon, words each choice subject set the pattern. and must of to Items rare Pro of 60 plates, of geo part which a multiple from complete correctly in complexity range a from most by very The pattern select six difficulty the of a designs with amissing metric the to consist of consisting com easy, very concepts. gressive Matrices the and to it. The words abstract and examiner to point is asked subject 150 plates, The pictures. 
the pictures four one of consists PPVT with level is that a to up 3-year-olds the capacity beyond adult. average Both of these tests were individually to about 600 white and administered 400 black children, ages 6 to 12, in California two The schools.4 groups show a typical IQ difference of about one standard deviation (15 points) on tests. both other predict differences Matrices. Progressive level of difficulty the one two tests I let us contrast First, believe most psychologists will agree are widely separated on the culture-loading the Peabody Picture Vo continuum: (PPVT) and Raven's cabulary Test passable a battery differences our in question. Does test measures yield are borne out empirical populations of what predictions ly in the the or a test, individual in the same hypothetical con the involves also bias, to group in validity, of whether question of tests, one from properties Construct another. of Raw Correlation be Scores with Age. that the Peabody first indication The tween the two groups? I shall illustrate the application of and some or similarly in both racial groups is the fact the criteria bias on of construct known a of variety of mental tests standard internal of abilities, the United sive are criteria these test any controversy whites and States. We have on these two data our in others over around largely black differences test the more groups in exten than and population, revolved bias has well-known in test blacks scores. white/ the age in months, a were quite different that unlikely the nearly each group. efficient same the raw in scores of correlation and about same measuring something in both groups, it seems the scores would have correlation Internal Consistency internal quite tests in both racial groups. .70, for both tests about between correlation the are groups the If behave instruments Raven that well mainly intelligence or IQ tests. 
Table 1. Estimated Percent of Variance and Average Absolute Difference in WISC-R IQ Independently Associated with Race (White/Black), Social Class, and Between and Within Families

Source                                             % Variance   Average IQ Difference
Race (within social classes)                           14               12
Social class (within races)                             8                6
Between families (within race and social class)        29                9
Within families (siblings)                             44               12
Measurement error                                       5                4
Total                                                 100               17

Bracketed subtotals: race and social class together, 22%; between and within families together, 73%.
Sample size: whites = 622; blacks = 622.

In all the the populations for which examples, evidence of test bias was sought by on differ tend pared with whites. (There are too few studies of other ethnic groups to permit previously Research on this criterion of test bias is unequivocal. is a negligible There difference in the slopes and intercepts on biased theory performance, is as accurate for blacks scores, as for whites. of not statistical on test scores differ appreciably for the in question? In other two populations his In brief, unbiased practical dence on the predictive validity of standard tests indicates that they are are Legitimate blacks performance gives which on, tests the words, blacks' dict to is supposed sample vantage. differences Score other the on based ex the content that is generally peculiar to the members of one group but not to the members of another group, it is liable to parisons in bias of determination specific matter. empirical To specific populations. test contains that the tent two any is no such thing as test bias in the There more of performance issue. separate two con cultural particular the test to be biased with the (or more) it buy tent causes $1?" be load to its cultural corresponds the respond test item requiring name York in New three parks content determined simply by examination of the test items. or generality of the The specificity consistency in the Peabody with in Reliability. 
The reliability co is .96, both for age whites and liabilities and .86. the blacks; whites and Raven (The Raven rected consists for .90 re lower only because fewer of length are a has of re Raven blacks than the Peabody liability the for for items. the test, Cor Raven's is higher than the Peabody's.) reliability are example, adults only schoolchildren. shown performances and quantitatively tatively, have sort of thing; that in test differences can be closely simu is, black/white lated, . . tests of. variety the same quali of groups by comparing and older white younger chil the more much The London found (e.g., bannister), than some while difficult, so that the overall differences average out and the English and the American white same the children mean however, on every obtain a item in the same the were for the blacks is background and blacks, we a reflect cultural between should the were group in taking other more or we formance, ferent internal we than careless or made the tests, at the answers, guesses contaminated their per haphazard otherwise But more should consistency that the see dif quite reliabilities. expect are reliabilities whites and Rank Order of Item Difficulty. The highly for comparable blacks. passing an item is an index of item difficulty. We P percentage can the of rank the compare in the white values group of order and black P these and groups express the degree of similarity between the groups by means the P values. between are tions the ing with corrected for correla us attenuation, each racial group of the the i.e., reliability of Ps within each itself, order rank correlation the (All of correlation the of racial the difficulty and males rank the is of correlation item other difficul and as blacks white between between females. 
males (The and is females The difficulties when greater correlations corrected of item are all .99 or in the Raven for attenuation. This was found not to be the case when test scores of white Peabody in schoolchildren London, scores with compared matched number 342 England, of age white children in California. A of items differed markedly in order of as many as 50 for Londoners bronco, and some were difficulty, items in rank order apart and Californians. Words thermos, and caboose, differ blacks Correlation P Decrements. of item for white matched for for difficulties In both ad score within item passing 2, and tion of P decrements whatever a is most similarity. The sensitive correlation The on males and and .880 that the the and in blacks. races sexes two same indeed again than in whites .980. that tests were it would be rank their order and the differences in difficulty between items adjacent difficulty should be so alike in both the black and white seem It would groups. even and loading con information tween values and P Matching Are verbal and Peabody tests more verbal? The Raven biased small differences Items. than and Raven that we Peabody seen in the preceding show analyses the little difference between two bias indices Going of a step we further, non between have we well each racial measures item as a whole test among group. tells b, correlations, items discriminate is measur within each items that tween the us how out racial the highest score for both a of the the two that the items individual differences are racial group discriminate have item much between It turns groups. the the These groups. be items correlations with blacks same most total and whites. of Wrong Answers Analysis Culture bias leads to the expectation that whites and blacks should make errors different But wrong. 
the among distractors multiple of the items they get of analysis re incorrect sponses (errors) in the Peabody that the errors are distributed over fashion choice distractors the than children's the same as who whites. it was various dren multiple and blacks. were several to this finding: exceptions items blacks made different significant On some errors shows in a for each item in the for whites proportions In Raven's there Matrices very matched how racial instance examined. the and within persons The second set given b) be the individual have the tests on the and group, coefficients) ing and how well same P of cor total discriminates the order racial (phi items the nonchance decrements. and the Raven The first set of correlations, us tells tent as the Peabody and the Raven should both show such high degrees of similarity between blacks and whites in rank each single choice more that two tests as dissimilar remarkable evidence point-biserial items and single that best measure race. If the items of these culturally biased for blacks, remarkable see decrements correlate a in favor of the correlations racial in whites no more the .830. decrements we Thus at blacks' is .823 differ of P Raven's blacks P is females two group items adjacent between correlation of of for the a, groups (corrected whites' and between tenuation) P decrements correla two index a for showed is some the Peabody a) compared relations between dichotomy. The done It Item Discriminabilities we items in the test. This is PyP^^ so on, where is the P2-^3> and Pj percent passing item 1,P2 is the percent across the black difference there in also that a vocabulary test in English may be a biased test of intelligence for Mexican Let's jacent so on. in group. Thus and thereby was analysis significant Raven. Raven difficulty same the for difficulty are group no show scores. 
Peabody the white between items matched Racial Group the level of item difficulty altogether and look only at the differ between in blacks that the of the two are matched out lower on than difference Peabody the to obtain Peabody Americans. remove ences blacks the we and ties. in culture .988.) cross-racial no find then the difficulties items and the the of is not as different and black black males In .983. order ties on the Peabody whites black we But California than items, the the were blacks It turns see in the rank order of item difficul whites The item is .987. females between of between black words, like between corrected order for blacks and whites correlation The the test, rank Peabody between correlation rank Americans. do group.) On were and ence of highly in rank order of differences that we see .between English difficulty If one sets group. whites to expect on If Raven expect scores 35 item against Mexican-American of kind should group. The Picture Vocabulary to really difference biased culture-reduced Raven blacks, as for whites. If the Peabody more of items Peabody significant passing the rank but test, order of item difficulty about about IQ. California lower percent have For each group. we a Peabody items found same the percent passing. Raven when more items for difficulty white culture-loaded those were the with children, however, words much easier certain in Raven in Cali residing and Raven Peabody of linguistic Califor and and blacks also to most are they for American Obviously Londoners very of whites Test dren." of backgrounds nians differ fornia. "A even unfamiliar in England, though moderate difficulty But found in every of responses proportions error were distractors the proportions were approximately such that the black for white two to the chil years in younger Thus age. 
chronological it that appears that the few differences were found between white and black are children more in level of mental differences to cultural than related clearly to on the An overall Matrices: race, sex, the age, and items, the cultural blacks fornia have differences and for to a diversity such of is a variance, turns out sources to other relative sensitive index of bias. that the interaction, statistically It though accounts significant, of for less than 1% of the total variance in both the Peabody and the Raven. We found that we could closely within simulate, whole this error, with all of their the margin of of analysis its main interactions, effects using only as as picture matrices! diverse progressive strikes me as quite be tests such pseudo-race and copying free-choice comparison. two into groups: a younger sample group (ages 6 to 9) and a slightly overlapping older group (ages 8 to 11). The same analysis of variance blacks and these two that whites, was on performed when different of groups all the features of reproduced whites, on the of variance the two analysis There is no racial difference groups. between the two sets of variances, with in the margin of sampling error. This is true for both the Peabody and the Raven. The action was pseudo-race also about X items as geometric preferences designs, for match and size, number with preferences which age.5 inter 1% of the total variance. size puzzles, comparisons, discrimination tures, sorting The types of analysis described above all with tain highly points cultural when mentioning. The rank between correlated groups the test as well, similar results. But cer are worth Stanford-Binet. difficulty tests to other gains items greater are more order hetero simple picture of animal pic colored verbal buttons, In a doctoral thesis, Paul Nichols analyzed 16 items of the Stanford-Binet from year III-6 through IV-6 ? the most of sequence heterogeneous items in the whole test ? 
given to 2,514 black and 2,526 white children, all between 4 and 5 years of Note are age.6 three important with dealing a only points: 1) we restricted por are within of racial or cogency hetero geneous, since it is so unlikely that a cultural difference between two groups a one-year are all been of preschoolers to exposed public rank-order schooling. correlation age inter ? they the com between the blacks and whites in the percent passing each of these 16 Stanford-Binet items turns out to be .99 (without blacks applied more op picture vocabulary, aesthetic posite analogies, comparisons, and so on. following directions, correction been of order comprehension, for correlation Internal Bias of Other Tests have rank no is probably There The ing stimuli on the basis of color, form, change same the test of intelligence geneous collection items to be found anywhere than the items included in the Stanford-Binet tests for ages 3x/2 to 5. The items involve children conservation Piagetian simple in and val, 3) haven't yet mon culture on performed age groups children. comparing by older white and younger result difficulty in the two groups over a set of items that differ markedly in their on knowledge and specific demands tion of the Stanford-Binet test (16 items from year III-6 through IV-6), 2) all the quantitatively This has been shown for developmental the white the entire white implausible. qualitatively, of tests, divided ? on tests and vocabulary an argument Such simulated, closely sample. 
We called this comparison of two different age groups of whites a We of on A variety of other tests have shown the same sort of thing; that is, black/ white differences in test performance and variance, and all of loadings the first principal component can sampling P decre choice correlations, item factor and as indices rank order of item difficulties, inter-item simu the white group, distractors, that Cali closely within The interaction of greatest interest in terms of detecting culture bias is the race X items interaction. The size of this ments, com argue between whites late age differences to culturally respect to black/white would would skills. findings, are tests these one parisons, non quite subjects. interaction, time for less than these light of that biased with was variance performed on the following factors and all their interactions, for both the Pea and Raven's body Picture Vocabulary this variance. total maintain of analysis In Differences became and accounted significant of White/Black but races, interaction .2% of Simulation two the using whites of ages 6 to 9 and blacks of ages 8 to 11, we found that the race X items maturity differences. the same analysis by doing Finally, again Thus, and whites in The attenuation). the P between this Pearson values of is .96. age at range least, the Stanford-Binet IQ test does not look at all culture biased. I would be quite surprised if black/white comparisons turned out very differently from this on section any other in any other age the of Stanford-Binet range. Wechsler Intelligence Scale for Chil dren. The WISC provides some striking examples of how invalid are the critics' armchair subjective, bias in specific test of cultural analyses items. For example, a favorite target of test critics is the WISC Verbal item: Comprehension "What is the thing to do if a fellow (girl) much Good "(A) Good morning (B) " evening (C)None of these. "It or false." is a good morning. 
True smaller than yourself starts to fight with you?" This item is often claimed to be culturally biased against blacks, and David Wechsler himself was confronted by this claim in an interview with Dan Rather on a recent CBS-TV program, "The IQ Myth." After seeing the CBS program, a Frank psychology graduate student, Miele, looked up the item statistics on this and other WISC items. He obtained WISC tests on large samples of age matched and black schoolchil white dren and looked at the rank order of of this purportedly biased difficulty item within each racial group. When the easiest item in the whole WISC is ranked 1 and the hardest is ranked 161, the DECEMBER 1976 This content downloaded from 129.130.252.222 on Thu, 26 Jun 2014 07:52:52 AM All use subject to JSTOR Terms and Conditions 343 rank order in difficulty of the "pick a fight" item is only 42 within the black group, as compared to 47 within the white group. item is relatively The whites! for In this short, easier claims are thus easily debunked at the cross-racial items across correlation rank for over all 161 of the is sexes the me The .95. within same the given bat and T. R. correlational Osborne on data white itself and 237 black children in Georgia schools. All the children were given 29 tests of verbal, numerical, spatial, form board, vocabulary, racial two that older years the WISC is Stanford-Binet are items, Yet the heterogeneous. of WISC different difficulty preciably very of items is not ap for whites and lie Personnel Wonder Test. made adults, up items geneous of 50 verbal, spatial, numerical, have that found test for hetero very ? is a This used general intelligence to as the next. cross the close to based groups representative and whites our by evidence any or that the g factor tests standard is in the in blacks g than in whites. "Not is the key complexity al intelligence]. Items some active se, but per difficulty to g [gener that require mental manipulation, mental trans ... 
are the of the input, formation most g-loaded items." conscious nonverbal, and so on. We logical, the correlation with grade found least a different some blacks. widely not measured the order school substantial Note also rank g of blacks like each attenuation, correlations fluctuate I have on group .96.) items, much load for racial reasoning, arithmetic, is .97. (The correlation of difficulty rank in whites with that in blacks who average g magnitude racial group same the unity. correlation each of correlations one from Corrected ? great variety nonverbal in grades 5 through 8. cross-racial ings are of about the correlation of have 541 blacks and whites The tests? Miele cognitive order of difficulty WISC by just looking whites tery of cognitive sent bias statistics. item The of similar is this general factor for and Frank particular blacks than for armchair How blacks If these various biased culturally for were tests cognitive these two popula it seems improbable that the tions, of the bias would be so magnitude uniform over all types of tests that they all would have same the of pattern g loadings (within the margin of sampling in error) These black and various on loadings g, differential blacks white populations. substantial all with tests, no yield evidence cultural biases of in American and whites. in per cent passing the 50 items, between samples of more than 700 blacks and is .94. The P decrements 700 whites, correlate .81. We also tried to find out if five black and five white psychologists could sort out the eight most and the eight least racially discriminating items when all 16 were items randomly better than chance. inspection very poor discriminate clue blacks On the factor the item's (or first substantially other all the criminate found racial a either does highly between common factor general within the racial the racial to group, item The general component of individual geneous ? as the the largest differences of g first principal ? tests. 
measure of g when it is its processes correct 344 on a among analyzed of other battery tests, that are heterogeneous content loading preferably in informational tests and in the types of cognitive involved in arriving at the for load one and blacks to be set have in the formu correlation becomes correlation constitutes that the g factor in this diverse tests is the as to not psychological the inspecting dozens tests of g to ? the conclu regarding g is sion that the key word complexity same of complexity the mental a test item required by to produce the person answer. Not per difficulty correct but complexity that some active require some conscious lation, formation is of the sensorimotor as for whites. the rather input, short-term and than manipulation item involves, the This is true for blacks and whites Bender-Gestalt the Test, Draw-a Man Test, the Illinois Test of Psycho linguistic Abilities, and tests of reading, arithmetic and 13 tests in all.7 This test battery of years for races. age, and 13 14 diverse in correlate the on and samples Boston, g load .98 across correction same loadings cognitive large The .98 without (That's of g from Baltimore. attenuation.) done I have correlation and 975 blacks, all drawn tests a in separately factor-analyzed Philadelphia, ings of the the achieve of it perhaps a cerebral even response, more The items. and the true is for just mental an transformation more for it all all animals is ^-loaded. alike. I and humans, that possess cortex. 
If we hypothesize that the well established of average IQ difference about 15 points between blacks and whites is mainly a difference in g, in the sense cognitive rather of a capacity complexity as just than for dealing in a difference any with form, to due specific cultural content in the IQ test, then we should predict that blacks and cross-racial a battery achievement daresay habitual se, memory are the of the subtests of the Wechsler Intelli gence Scale for Children combined with intercorrelated in the to g. Items key mental manipu trans mental or a ability most ^-loaded Nichols of hundreds many I am led items, of loadings and seven Paul in any do we But its and operations order for the with using correlations despite processes knows yet sense. physiological some idea individual Cor .68. in common nature. the factor g split-half correction-for-attenuation of basic By out unreliability, large battery g for blacks tests answers. turned strong evidence 7 of any test (or test item) as a intelligence is factor on group of 986 whites validity each factor g have mental different seemingly call upon? No one they really what makes for g, certainly over the 29 tests. This the corrected la, This .97. high was An in or the set based ment - construct done to determine between spelling, source single in a hetero cognitive of the criterion or factor intelligence collection important final step was one ings, dis groups. was analysis principal component each racial group. in-race Isg the Same g in Blacks and Whites? can be defined first usual is true within this group tests the great diversity of their content the was of the split-half subgroups. In this way we can determine the reliability of the rected the and black racial is this g factor that all complex cognitive split in half and a principal correlation that What was analysis each Also, from batteries. in the white based on whites, factor general correlates component) the item's racial dis and samples. 
The from group standard separately randomly be intercor item taken principal-components done within will least In other the groups. words, a test item is correlated items, the more we the on principal with highly the most with the different correlation hand, of all loading racial more items or each criminability, both to which most a be and whites. analysis within relations as the armchair to A components cards separate Again, is shown items of tween on presented shuffled. The judges sorted no

What Is the Nature of g?

were tests The spelling. several of California

whites will differ less in average performance on tasks involving lesser cognitive complexity than on tasks involving greater cognitive complexity. What do we find?

Reaction-Time Studies. One test of this hypothesis is based on differences in reaction time (RT) to simple and choice visual and auditory stimuli. In all persons, reaction time increases as a function of stimulus complexity, i.e., the number of bits of information in the signal to which the person responds. It has also been shown that there is no correlation between simple RT and IQ, but there is a negative correlation between IQ and choice RT. That is, persons with higher IQs show quicker RT in a choice situation that calls for some information processing. Four independent experiments using quite different methods but comparing simple and choice RTs in whites and blacks all show no significant race difference for simple RT. But they all show a significant race (or race confounded with SES) difference for choice or complex RT.8

In acts as his person these con more on on than backward digit span. Richard Figueroa in 622 and 622 whites blacks from drawn age-matched California of samples randomly schools.9 Both predictions are fully borne out by the data.
We found that backward span correlates significantly higher with total IQ than does forward span; and is true this within each racial We group. also found that the difference between whites and blacks in backward memory is more span than is no there status, race significant ence in forward memory race difference the their values. absolute average, show tween simple whites. a Blacks, difference larger RT and choice RT, on in substantial span. or four-choice, the of The several methods I have described test bias, in terms of for detecting than do various practically complexity simple RT; therefore, in accord with our choice RT shows significant hypothesis, with correlations IQ and with race, while simple RT does not. and Forward If Memory. mental Backward reflects g manipulation for capacity and transforma should ence on mental a expect those racial larger differ more requiring transforma and tests manipulation tion of the input in order to arrive at the output. The and forward digit-span subtests culture-loaded Wechsler shows battery. the difference smallest backward numbers somewhat digit in more and transformation digit span. span reverse ? the digit span white/black repeating ? order mental than socioeconomic Duncan's by a series calls manipulation does forward for evidence comparisons, number of and and The Figure a of function 1 shows the total WISC IQs ?s race of socioeconomic 2 Figure ward digit-span and index Duncan's and forward scores versus as a large representative blacks. are results backward of function span of transformation is signifi input has pre two dicted unknown previously phenomena: 1) the differential correla tion of forward and backward digit span with IQ, and 2) the significantly smaller racial difference in forward than back ward digit span. I do not know of any hypothesis invoking cultural bias in the Wechsler dicted psychological tests either that of phenomena. bias shows bias in any of back cant beyond the .001 level.) 
Thus the theory of g as a capacity for and the con dealing with complexity scious used, in samples of None of unequivocal: indices of cultural any significant of these tests of indication when are they used with blacks arid whites. Correlation status. shows a on based widely standardized and group tests of intelligence to given whites is well-known, - diverse quite dividual Index regarding black/white However, the several objective forward of any of the subtests. I think, would agree that Everyone, of in Moreover, average as measured of race and SES. (The interaction of race X backward tests of the Wechsler lend themselves nicely to a test of this hypothesis. For one thing, most clinical psychologists judge the digit-span test to be one of the least The of SES. tion, and if it is the g factor on which blacks and whites essentially differ, then we as a function samples the conclusions. strong Figure 1. WISC-R Full Scale ?Q of = = black (N 622) and white (N 622) status Digit-Span and validity can of course be applied to any other groups in the population. But th? evidence regarding groups other than U.S. blacks.and whites is either lacking or is still too sketchy to permit any ^L_L_J_I_I_L_L_I_I_I_L 0123456789 SocioeconomicStatus of of persons' test construct test's features internal performances RT eight-choice zero status. socioeconomic Conclusion task is still a very low level of com plexity as compared with most IQ test items, but it is still more complex than the 9 be of total movement time, independently which is only slightly correlated with It RT and is unrelated to complexity. that a two should be remembered choice, 8 Figure 2. WISC-R Forward and Back ward Digit Span scaled scores (x~?O, o = 3) of black and white samples as a function is measured incidentally, 7 5 6 3 4 Socioeconomic Status 12 0 differ span, but the remains memory as large memory span. 
socioeconomic forward control for we When ?s twice in difference

it is the difference between simple and choice RT that is of primary interest, not

forward and control, A and I tested these predictions backward experiments, own

This being so, our theory of g should predict the following:

1. Backward digit span should correlate more highly with total IQ than should forward digit span.
2. Blacks and whites should differ

would these have pre interesting

raw scores with age, internal consistency reliability, rank order of item difficulty (i.e., percent passing), relative difficulty of adjacent items, item correlation with total score, loadings of items or tests on the general factor, and relative frequencies in choice of error distractors, are all substantially the same in the white and black groups.

I conclude that these standardized tests of intelligence (the Peabody Picture Vocabulary, Raven's Progressive Matrices, Stanford-Binet, Wechsler Intelligence Scale for Children, Wonderlic Personnel Test, and most likely many other similar tests) show practically no evidence of differential culture bias for blacks and whites. They behave statistically much the same in both racial groups and perform essentially the same job in both groups. Claims about specific cultural biases in test items based on subjective, armchair speculation and surmise, the sole method of those critics of tests who wish to foster the myth of culture bias, are proven false by the objective evidence. Moreover, the fact that it may be possible to specially devise culturally biased items in no way proves that all of our existing standard tests are culturally biased. Culturally loaded, of course. But not culturally biased. The distinction is crucial. The myth of culture bias thrives on obscuring this distinction.

3. T.
The large general factor measured by our standard tests of intelligence is clearly the same factor in blacks as in whites. The hypothesis that this general factor is a capacity for conscious mental manipulation and transformation of stimulus inputs at a high level of cognitive complexity has led to predictions that are borne out empirically.

nor science Neither cause the As findings. to question, cate results, the tests causes alternative ? other hypotheses peatedly of test and populations, score variance, theories and against renounce openly that objective disproves. one evidence sig of these is repli as to seek pit an those re

4. For full details, see A. R. Jensen, "How Biased Are Culture-Loaded Tests?" Genetic Psychology Monographs, November, 1974, pp. 185-244.

7. Nichols, op. cit.

8. J. J. Bosco, "Social Class and the Processing of Visual Information," Final Report, Project No. 9-3-041, Contract No. OEG-5-9-325041-0034 (010) (Washington, D.C.: Office of Education, U.S. Department of Health, Education, and Welfare, May, 1970); A. R. Jensen, "Race and Mental Ability," op. cit.; C. E. Noble, "Race, Reality, and Experimental Psychology," Perspectives in Biology and Medicine, Autumn, 1969, pp. 10-30; and Y. Poortinga, "A Comparison of African and European Students in Simple Auditory and Visual Tasks," in L. J. Cronbach and P. J. Drenth, eds., Mental Tests and Cultural Adaptation (The Hague: Mouton, 1972), pp. 349-54.

9. Jensen and Figueroa, "Forward and Backward Digit-Span Interaction with Race and IQ," op. cit.

"Race and Mental Ability," in Man Variation ed., Racial

Graduate Students Find Studies

2. The percentages of variance were estimated as follows: First, the following correlations were obtained: Race x SES = .4381; Race x IQ = .4955; SES x IQ = .4355. Then partial correlations were obtained, partialing out SES from Race x IQ, and Race from SES x IQ, yielding (Race x IQ)/SES = .3765; (SES x IQ)/Race = .2797. The proportion of IQ variance attributable to Race and SES independently of one another is the square of the partial correlations, i.e., .14 for Race and .08 for SES. The WISC-R gives .95 as the test-retest (one-month interval) reliability of Full Scale IQ in the age range of the present sample. This means there is 5% measurement error. Thus 14% + 8% + 5% = 27%, leaving 73% for variance Between Families (within racial and SES groups) and Within Families. By Between Families variance is meant the interfamily variability between the means of the siblings. Within Families variance is variability among siblings within the same family. The Between Families and Within Families variances were determined from my study of sibling correlations in large samples of whites and blacks on a highly comparable IQ scale (Lorge-Thorndike IQ), which was .43 in both racial groups. This means that 43% of the variance within racial groups is attributable to differences between families, which includes SES differences. The remainder of the variance is due to variance within families and error variance.

of higher education find disappointing intellectually tutions studies their and a to according damaging, at Berke by researchers emotionally moods their their their serious beset energies and requirements of and imaginative searchers say in Education. The pushed tion intellectual re the of Katz Joseph the based Testing Service, on in-depth interviews conclusions more than 100 graduate area, than Berkeley by more four universities, nationwide the Education, graduate criticizes report of education for departments for their role as failing to train students the undergraduates. Instead, students that graduate claim of teachers researchers are "taught have graduate economic cle In it."
have not addition, their job Chroni the students, to adjusted hard times and declining for the is taking tive in pressing and leadership very in gradu initia little for a rethinking purposes of most of the graduate like 3) They find com peers. maintain that have civility," problems. of of been in in created the size of a number offer to of many "a loss in the number of and recommendations remedy Urge They of graduate "greater of the flow of information equalization between a of members departments. and Hartnett students' at and "demeaning" like college treated students graduate Katz students graduate prospective and also graduate recommend departments." that faculty They members come sensitive to the of graduate the supply as well students, and of graduate more problems as that "avoid the placed on spend in be students professional lectual reports. "Apparently ate education goals for contempt schools markets teaching if not to to neglect and at opportunities to simply by the growth graduate the Lilly Institute lively inter them among problems, theoretical community breadth, quiry, at narrow limited" intellectual authors a "competitive inadequate not are isola in expect subjected are They of to join a instead intellectual professors treatment. traumatic but find and freshmen, these students data gathered by the ETS. Financed by grants from and the National Endowment of but are changes education hope for working with others." then returned questionnaires 700 graduate on and 1) They 2) They to for at graduate that are quick concentrating specialty." with the arrive scholars, "relative and actions, munity The in the students call graduate less experience expectations into times researchers, atti old of of "access State University of New York at Stony Brook and Rodney of the Hartnett Educational with mospheres of Higher that the adequate," Students community school at drill than capacities," the Chronicle the students. 
not busy work, most man's to make ly thwarted: relent evidence and researchers lives if by even all of which make graduate times more resemble military exercise two structure the school ley's Wright Institute. students "find Many crammed, and grim, in clear claims. report The for survey two-year the insti of spite assumptions no are longer tudes in U.S. students graduate Many in programs, the old and Damaging Disappointing less for the to Jane R. Mercer 1. I am indebted WISC-R data and the SES ratings. They have in A. R. Jensen and in detail been described R. A. and Backward "Forward Figueroa, Race Interaction with and IQ," Digit-Span Decem Journal of Educational Psychology, ber, 1975, pp. 882-93. 346 L. G. Humphreys, S. A. Cleary, A. Wesman, "Educational Use of Kendrick, Tests with Ameri Disadvantaged Students,'* can Psychologist, 1975, pp. 15-41; January, L. G. Humphreys, of Group "Implications for Test Assess Differences Interpretation," ment in a Pluralistic of Society, Proceedings on Testing the 1972 Invitational Conference Problems N.J.: Educational Test (Princeton, pp. 56-71; and R. L. Linn, ing Service, 1973), "Fair in Selection," Review Test Use of Educational Research, pp. Spring, 1973, 139-161. Press, 1975). (New York: Academic "For 6. A. R. Jensen and R. A. Figueroa, and Backward Interaction ward Digit-Span and IQ," op. cit.; Paul L. Nichols, with Race on of Heredity and Environment "The Effects in 4- and 7 Test Performance Intelligence White and Negro Year-Old Pairs," Sibling of Min doctoral dissertation, University 1972. nesota, out response criticize, their limits analytically determine mental other our researchers, A. 5. A. R. Jensen, in J. F. Ebling, nificance. social justice is served by denying within and error variance. 
This means that if we exclude variance due to race (14%) we are left with 86%, and the sum of the SES variance plus Between Families variance (within SES groups) divided by 86% must be .43, i.e., we solve (8% + x)/86% = .43, so x is 29%, which is the percent of variance between families within SES and racial groups. The remainder of 44% is the variance within families.

of creation proletariat," the number teaching graduate be made graduate training."

be emotional of years education, a "prestigious to limited a high-class limits that intel be students and part that of
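The arithmetic in note 2 (partial correlations, squared to give shares of variance, then the solve for the between-families share) can be retraced in a few lines. The following is only a sketch under stated assumptions: the function and variable names are hypothetical, the standard first-order partial-correlation formula is assumed to be the one used, and percentages are rounded to whole numbers as the note appears to do.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


def decompose_variance():
    """Retrace the variance partition of note 2 (hypothetical helper)."""
    # Zero-order correlations as given in note 2.
    r_race_ses = 0.4381
    r_race_iq = 0.4955
    r_ses_iq = 0.4355

    # Race x IQ with SES partialed out, and SES x IQ with Race partialed out.
    r_race_iq_ses = partial_corr(r_race_iq, r_race_ses, r_ses_iq)
    r_ses_iq_race = partial_corr(r_ses_iq, r_race_ses, r_race_iq)

    # Squared partial correlations, as whole percentages (rounding assumed).
    race = round(100 * r_race_iq_ses**2)
    ses = round(100 * r_ses_iq_race**2)

    # Test-retest reliability of .95 is read as 5% measurement error.
    error = round(100 * (1 - 0.95))

    # Sibling correlation of .43 within racial groups: solve
    # (ses + x) / (100 - race) = .43 for the between-families share x.
    sibling_r = 0.43
    between_families = round(sibling_r * (100 - race) - ses)

    # Whatever is left is variance within families.
    within_families = 100 - race - ses - between_families - error

    return {
        "r_race_iq_given_ses": r_race_iq_ses,
        "r_ses_iq_given_race": r_ses_iq_race,
        "race": race,
        "ses": ses,
        "error": error,
        "between_families": between_families,
        "within_families": within_families,
    }


if __name__ == "__main__":
    for name, value in decompose_variance().items():
        print(name, value)
```

Computed this way, the partial correlations come out close to the .3765 and .2797 given in the note; small differences in the fourth decimal place are to be expected from rounding of the input correlations.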
