Tests of Cognitive Intelligence

Common Characteristics of Individual Intelligence Tests
• Individually administered
• Administration requires advanced training
• Tests cover a wide range of age and ability
• Examiner must establish rapport
• Immediate scoring of items
• Usually requires about one hour
• Allows opportunity for observation

Two Main Individually Administered Intelligence Tests
Stanford-Binet
• Binet wanted to create a process for identifying intellectually limited children so they could be removed from the regular classroom and placed in special education.
Wechsler Scales
• Developed in response to the perceived shortcomings of the Stanford-Binet.

Binet's Principles of Test Construction
• Wanted tasks to measure judgment, attention, and reasoning.
• Guided by two major concepts: age differentiation and general mental ability.
• Age differentiation: Binet searched for tasks that could be completed by 2/3 to 3/4 of the children in a particular age group, by fewer younger children, and by more older children.
• General mental ability: measured only the total product of the various tasks. Judged the value of a task by its correlation with the combined result of all other tasks.

Early Binet Scales
1905: 30 items ordered by difficulty. The test lacked:
• adequate measuring units to express results (only used idiot, imbecile, and moron)
• adequate normative data (only used 50 subjects)
• evidence of validity
1908: Grouped items according to age level rather than simply by increasing difficulty. Introduced the concept of mental age.
• Increased norm group to 203.
• Criticized because it produced only one score, almost exclusively related to verbal, language, and reading ability.

1916 Stanford-Binet Intelligence Scale
• Lewis Terman increased the size of the standardization sample, though it included only white, native-Californian children.
• Introduced the intelligence quotient (IQ) concept to show subjects' rate of mental development.
• IQ = (MA/CA) x 100, where MA is mental age and CA is chronological age.
• However, the maximum mental age on the scale was 19.5, so a maximum chronological age also had to be set; it was fixed at 16.

1937 Scale
• Extended age range down to 2 and up to 22 years, 10 months.
• Scoring standards and instructions were improved.
• Several performance items added.
• Standardization sample improved to include 3,184 subjects from 11 states. Subjects were selected according to their fathers' occupations. Still, the sample included only whites, mainly from urban areas.
• Developed an alternate form.

Problems with 1937 Form
• Reliability was higher for older subjects than for younger ones, and higher in the lower IQ ranges than in the higher ones.
• Scores were most unstable for young children with high IQs.
• Each age group also had a different standard deviation, which made interpretation difficult.

1960 Stanford-Binet
• Used Binet's principles to redo the scale.
• Solved the problem of differential variation in IQ by using the deviation IQ concept. Set the mean at 100 with an SD of 16. Scores at one age level could now be compared with those at another.
• No new normative sample in 1960; a 1972 renorming used 2,100 children and included non-whites.

Modern Binet Scale
• Totally revised in 1986 by Thorndike et al.
• Used Thurstone's multidimensional model (1938): g made up of crystallized ability (verbal and quantitative reasoning), fluid-analytic abilities (abstract-visual reasoning), and short-term memory.
• Used IRT (Rasch model) to determine the proper order of the items.
• Used a routing test (Vocabulary) in an attempt to adapt testing to the specific ability level of each examinee without computer adaptive testing.

Structure of the SB-IV
• Verbal Reasoning included vocabulary test, comprehension test, absurdities test, and verbal relations test.
• Abstract-Visual Reasoning included pattern analysis test, copying test, matrices test, and paper-folding and cutting test.
• Quantitative Reasoning included quantitative test, number series test, and equation-building test.
• Short-term Memory included bead memory, memory for sentences, memory for digits, and memory for objects.
• Composite included all areas combined.

Psychometric Properties of the SB-IV
• Standardization sample has 5,000+ subjects in 47 states and DC.
• Sample stratified based on the 1980 census: geographic region, community size, ethnic group, age, and gender.
• Internal consistency reliability is .98 for the composite and .93-.97 for area scores. Some individual test scores are lower; the lowest is .73 for memory for objects.
• Test-retest reliabilities for the composite score were .91 and .90 for 5- and 8-year-olds, respectively.
• Factor analysis supports the structure of the test.
• Correlations with other IQ tests are generally in the .70s and .80s.

Wechsler Scales
• David Wechsler worked at New York's Bellevue Hospital. He was dissatisfied with the Stanford-Binet's focus on children and its production of a single score.
• In 1939, he created the Wechsler-Bellevue, later called the WAIS.
• In 1949, he created the children's version, the WISC.
• In 1967, he added the WPPSI for children ages 2.5 to 7.

Structure of the WAIS
• The WAIS yields separate verbal and performance IQs.
• The WAIS-III has four index scores: verbal comprehension, working memory, perceptual organization, and processing speed.

Verbal and Performance Tests on the WAIS
Verbal:
• Vocabulary
• Similarities
• Arithmetic
• Digit Span
• Information
• Comprehension
• Letter-Number Sequencing
Performance:
• Picture Completion
• Digit Symbol-Coding
• Block Design
• Matrix Reasoning
• Picture Arrangement
• Symbol Search
• Object Assembly

Scales and Norms for the WAIS
• Determine the raw score for each subtest.
• Convert raw scores to standard scores, called scaled scores (M=10, SD=3).
• There are conversions for 13 age groups. This method of conversion obscures any differences in performance by age.
• Subtest scaled scores are added, then converted to WAIS-III composite scores.
• Three composite scores: verbal, performance, and full scale, each with M=100, SD=15.
• Four index scores: verbal comprehension, perceptual organization, working memory, and processing speed.

Standardization of the WAIS
• Standardized on a stratified sample of 2,450 adults representative of the US population aged 16-89.
• There were 200 cases per age group, except for smaller numbers in the two oldest groups.
• It is still difficult to know the effects of self-selection, since participants had to be invited and had to agree to be included.

Reliability of the WAIS
• Internal consistency and test-retest reliabilities are about .95 or higher for full scale and verbal scores.
• They are about .90 for performance and the three other index scores: perceptual organization, working memory, and processing speed.
• Internal consistency reliabilities for the subtests range from the upper .70s to the low .90s. Test-retest is about .83.
• Generally, performance reliabilities are lower than verbal reliabilities on the subtests.

Validity of the WAIS
• Great deal of information on criterion-related and construct validity.
• Factor analyses support use of the 4 index scores.
• Comparison studies show the pattern of WAIS-III scores for many special groups, e.g., Alzheimer's disease, Parkinson's disease, learning disabilities, brain injury.
• The WAIS is the top test in use today.

WISC-III
• The most popular test for assessing the intellectual ability of children ages 6 years, 0 months to 16 years, 11 months.
• Similar in structure to the WAIS, with easier items.
• Both tests yield verbal, performance, and full scale IQs and 4 index scores.
• Most of the subtests are the same.

Psychometric Properties of the WISC-III
• Standardization program involved 2,200 cases selected to represent the US population of children aged 6-16.
• Composite scores generally have internal consistency reliabilities in the mid-.90s and test-retest reliabilities around .90.
• Subtest reliabilities are generally in the mid-.80s.
• Object Assembly and Mazes are problematic, with reliabilities in the .60s.

Group Differences in IQ
• Psychological tests are designed to measure differences among people.
• Test scores that demonstrate differences among people may suggest that people are not created with the same basic abilities.
• Biggest problem: some ethnic groups obtain lower average scores on some psychological tests. On average, African Americans score 15 points lower than whites on IQ tests.
• The dispute is not whether differences occur but why they occur: environment vs. biology.

Problems with Biology Argument
• IQ scores are improving (called the Flynn effect), more so for African Americans than for whites.
• Victimization by stereotyping could affect test performance and grades.
• The construct of race has no biological meaning, based on evidence from studies in population genetics, the human genome, and physical anthropology.

Criticisms Related to Content Validity
• Specific items were thought to be biased because some children would not have had the opportunity to learn the material.
• Members of ethnic groups might answer some items differently but still correctly.
• Scores are affected by language skills inculcated as part of a white, middle-class upbringing that is foreign to inner-city children.

Responses to Content Validity Criticisms
• Test developers are indifferent to the opportunities people have to learn the information on the tests. The meaning they assign to the tests comes from correlations of test scores with other variables.
• Some evidence suggests that linguistic bias in standardized tests does not cause the observed differences (Scheuneman, 1987).
• Elimination of biased items from a test did not change the test scores (Bianchini, 1976).
• Researchers could not find classes of items most likely to be missed by minority group members (Wild, et al., 1989).

Other Ways of Thinking About Differences
• Differences in test scores may reflect patterns of problem solving that characterize different subcultures (e.g., MBTI).
• R. D. Goldman (1973) proposed the differential process theory, which maintains that different strategies may lead to effective solutions for many types of tasks. Strategies mediate abilities and performance.

Criterion Issues
• Most standardized tests are evaluated against other standardized tests. The criterion may be the same test dressed up differently, or both may be measuring test-wiseness.
• IQ tests may be correlated with achievement tests. Achievement may be moderated by opportunity to learn.
• Goldman and Hartig (1976) found scores on the WISC to be unrelated to teacher ratings of classroom performance for minority children, but significantly related for non-minority children.
• Majority and minority children grow up in different social environments. Perhaps test scores accurately reflect the effects of social and economic inequality.
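
Worked Example: Ratio IQ and Deviation Scores
The score definitions from the Binet and Wechsler sections can be illustrated with a short Python sketch. It shows the 1916 ratio IQ, (MA/CA) x 100 with chronological age capped at 16, and a linear standard-score transform of the kind behind the deviation IQ (M=100, SD=16 or 15) and Wechsler scaled scores (M=10, SD=3). The function names, the norm-group mean and SD, and the example values are hypothetical, chosen only for illustration. Published tests convert scores with age-specific norm tables, so the linear formula here is a simplifying assumption and not the tests' actual procedure.

```python
def ratio_iq(mental_age, chronological_age, max_ca=16.0):
    """Ratio IQ = (MA / CA) x 100.

    Chronological age is capped at max_ca, mirroring the 1916 scale,
    which set the maximum chronological age at 16.
    """
    ca = min(chronological_age, max_ca)
    return 100.0 * mental_age / ca


def deviation_score(raw, norm_mean, norm_sd, mean=100.0, sd=16.0):
    """Rescale a raw score to a chosen mean and SD via a z-score.

    norm_mean and norm_sd describe the examinee's own age group
    (hypothetical values here; real tests use norm tables).
    """
    z = (raw - norm_mean) / norm_sd
    return mean + sd * z


# A child aged 8 performing at the level of a typical 10-year-old.
print(ratio_iq(10, 8))

# An adult aged 30 with a mental age of 16: CA is capped at 16.
print(ratio_iq(16, 30))

# One SD above the age-group mean on three different scales.
print(deviation_score(60, 50, 10))                  # 1960 Stanford-Binet metric
print(deviation_score(60, 50, 10, mean=100, sd=15)) # Wechsler composite metric
print(deviation_score(60, 50, 10, mean=10, sd=3))   # Wechsler scaled-score metric
```

Because a deviation score is defined relative to the examinee's own age group, the same score has the same relative standing at every age. This is how the 1960 revision addressed the unequal standard deviations across age groups that hampered the 1937 form.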