Tests of Cognitive Intelligence
Common Characteristics of Individual
Intelligence Tests
• individually administered
• administration requires advanced training
• tests cover wide range of age and ability
• examiner must establish rapport
• immediate scoring of items
• usually requires about one hour
• allows opportunity for observation
Two Main Individually Administered
Intelligence Tests
Stanford-Binet
• Alfred Binet wanted to create a process for identifying
intellectually limited children so they could
be removed from the regular classroom and
put in special education.
Wechsler Scales
• Developed in response to the perceived
shortcomings of the Stanford-Binet
Binet’s Principles of Test Construction
• Wanted tasks to measure judgment, attention, and
reasoning.
• Guided by two major concepts: age differentiation
and general mental ability.
• Age differentiation: Binet searched for tasks that
could be completed by 2/3 to 3/4 of the children in a
particular age group and were completed by fewer
younger children and more older children.
• General mental ability: measured only the total
product of the various tasks. Judged value of task
in terms of its correlation with the combined result
of all other tasks.
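Binet's two screening ideas can be sketched in code. This is only an illustration, not Binet's actual procedure: the function names, the pass rates, and the score matrix are all hypothetical, and the correlation used is an ordinary Pearson coefficient.

```python
from math import sqrt

def fits_age_level(pass_rates, age):
    """Age differentiation: pass_rates maps age (years) -> proportion passing.

    The task fits `age` if 2/3 to 3/4 of that group pass it, fewer younger
    children pass it, and more older children pass it.
    """
    target = pass_rates[age]
    younger = [p for a, p in pass_rates.items() if a < age]
    older = [p for a, p in pass_rates.items() if a > age]
    return (2 / 3 <= target <= 3 / 4
            and all(p < target for p in younger)
            and all(p > target for p in older))

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) *
                      sum((y - my) ** 2 for y in ys))

def task_vs_rest(scores, task):
    """General mental ability: correlate one task with the sum of all others.

    scores is a list of rows, one row of task scores per child.
    """
    task_scores = [row[task] for row in scores]
    rest_scores = [sum(row) - row[task] for row in scores]
    return pearson(task_scores, rest_scores)

# Hypothetical data, for illustration only.
rates = {6: 0.40, 7: 0.70, 8: 0.90}
print(fits_age_level(rates, 7))   # the task suits 7-year-olds
print(fits_age_level(rates, 8))   # too easy to place at age 8
print(task_vs_rest([[1, 1, 1], [1, 0, 1], [0, 0, 0], [0, 1, 0]], 0))
```

A task with a high value from `task_vs_rest` would be kept under the general mental ability principle, and one with a low value would be dropped.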
Early Binet Scales
1905: 30 items ordered by difficulty. Test lacked:
• adequate measuring units to express results (only used
idiot, imbecile, and moron)
• adequate normative data (only used 50 subjects)
• evidence of validity
1908: Grouped items according to age level rather than
simply increasing difficulty. Introduced concept of mental
age.
• Increased norm group to 203.
• Criticized because it produced only one score almost
exclusively related to verbal, language, and reading ability
1916 Stanford Binet Intelligence Scale
• Lewis Terman increased the size of the standardization
sample, though it included only white, native-Californian
children.
• Introduced intelligence quotient concept to show
subjects’ rate of mental development.
–IQ = (MA/CA) x 100
• However, maximum mental age was 19.5. Had to
set maximum chronological age, too, so set it at
16.
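The ratio IQ formula, with the chronological age ceiling of 16, can be worked through in a short sketch. The function name and the rounding to a whole number are assumptions made for illustration.

```python
def ratio_iq(mental_age, chronological_age, max_ca=16.0):
    """IQ = (MA / CA) x 100, with CA capped at max_ca (16 on the 1916 scale)."""
    ca = min(chronological_age, max_ca)
    return round(100 * mental_age / ca)

print(ratio_iq(10, 8))    # mental age ahead of chronological age -> 125
print(ratio_iq(6, 8))     # mental age behind chronological age -> 75
print(ratio_iq(16, 30))   # adult: CA is treated as 16 -> 100
```

Without the ceiling, an adult's ratio IQ would fall every year simply because chronological age keeps rising while mental age on the scale does not.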
1937 scale
• Extended age range down to 2 and up to 22 years,
10 months.
• Scoring standards and instructions were improved
• Several performance items added
• Standardization sample improved to include 3184
subjects from 11 states. Subjects selected
according to their fathers’ occupations. Still,
sample included only whites and mainly those
from urban areas.
• Developed alternate form.
Problems with 1937 Form
• Reliability was higher for older subjects than
for younger ones, and higher in the lower IQ
ranges than in the higher ones
• Scores were most unstable for young
children with high IQ.
• Each age group also had different standard
deviations which made interpretation
difficult
1960 Stanford-Binet
• Used Binet’s principles to redo scale.
• Solved problem of differential variation in IQ
by using the deviation IQ concept. Set mean
at 100 with SD of 16. Could now compare
scores of one age level with another.
• No new normative sample was collected in 1960;
a 1972 renorming used 2,100 children and
included non-whites.
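The deviation IQ idea can be sketched as a standard-score conversion. The function name and the age-group means and standard deviations below are hypothetical; only the target mean of 100 and SD of 16 come from the slide.

```python
def deviation_iq(raw_score, group_mean, group_sd, mean=100, sd=16):
    """Express a raw score relative to the examinee's own age group.

    z is the distance from the age-group mean in SD units; the result is
    placed on a scale with the given mean and SD (100 and 16 here).
    """
    z = (raw_score - group_mean) / group_sd
    return round(mean + sd * z)

# Two hypothetical age groups with different raw-score spreads.
print(deviation_iq(60, group_mean=50, group_sd=10))  # 1 SD above -> 116
print(deviation_iq(36, group_mean=30, group_sd=6))   # 1 SD above -> 116
```

Both children sit one standard deviation above their own age group, so they get the same IQ even though the raw-score spread differs between the groups. That is what makes scores comparable across age levels.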
Modern Binet Scale
• Totally revised in 1986 by Thorndike et al.
• Used Thurstone’s multidimensional model (1938):
G made up of crystallized ability (verbal &
quantitative reasoning), fluid-analytic abilities
(abstract-visual reasoning) and short-term
memory.
• Used IRT (Rasch model) to determine proper order
of the items
• Used routing test (Vocabulary) as attempt to adapt
testing to specific ability level of each examinee
without computer adaptive testing
Structure of the SB-IV
• Verbal Reasoning included vocabulary test,
comprehension test, absurdities test, and verbal
relations test.
• Abstract-Visual Reasoning included pattern analysis
test, copying test, matrices test, paper-folding and
cutting test.
• Quantitative Reasoning included quantitative test,
number series test, equation-building test.
• Short-term Memory included bead memory, memory
for sentences, memory for digits, and memory for
objects.
• Composite included all areas combined.
Psychometric properties of SB-IV
• Standardization sample has 5000+ subjects in 47
states and DC.
• Sample stratified based on 1980 census – geographic
region, community size, ethnic group, age, and gender.
• Internal consistency reliability is .98 for composite and
.93-.97 for area scores. Some individual test scores
are lower: .73 for memory for objects is the lowest.
• Test-retest reliabilities for composite score were .91
and .90 for 5 and 8-year-olds.
• Factor analysis supports the structure of the test.
• Correlations with other IQ tests are generally in the
70s and 80s
Wechsler Scales
• David Wechsler worked at NY’s Bellevue Hospital.
He was dissatisfied with the Stanford-Binet's
focus on children and its production of a single
score.
• In 1939, he created the Wechsler-Bellevue, later
called the WAIS.
• In 1949, he created the children’s version, the
WISC.
• In 1967, he added the WPPSI for children ages 2.5 to 7.
Structure of the WAIS
• The WAIS yields separate verbal and
performance IQs
• The WAIS-III has four index scores: Verbal
comprehension, working memory,
perceptual organization, and processing
speed.
Verbal and Performance Tests on the WAIS
Verbal:
• Vocabulary
• Similarities
• Arithmetic
• Digit span
• Information
• Comprehension
• Letter-number sequencing
Performance:
• Picture completion
• Digit symbol-coding
• Block design
• Matrix reasoning
• Picture arrangement
• Symbol search
• Object assembly
Scales and Norms for the WAIS
• Determine raw score for each subtest.
• Convert raw scores to standard scores, called scaled
scores (M=10, SD=3)
• There are conversions for 13 age groups. This method of
conversion obscures any differences in performance by
age.
• Subtest scaled scores are added, then converted to WAIS-III
composite scores.
• Three composite scores: verbal, performance, full scale,
each with M=100, SD=15
• Four index scores: verbal comprehension, perceptual
organization, working memory, processing speed
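The raw-to-scaled conversion can be sketched as follows. The real WAIS-III uses published norm tables for each age group. The linear formula, the norm values, the age-group labels, and the clamp to a 1 to 19 range below are all assumptions made for illustration.

```python
# Hypothetical (mean, SD) of raw scores on one subtest, by age group.
HYPOTHETICAL_NORMS = {
    "20-24": (40.0, 8.0),
    "70-74": (28.0, 8.0),
}

def scaled_score(raw, age_group):
    """Convert a raw subtest score to a scaled score (M=10, SD=3).

    The raw score is compared only with the examinee's own age group.
    """
    mean, sd = HYPOTHETICAL_NORMS[age_group]
    z = (raw - mean) / sd
    return max(1, min(19, round(10 + 3 * z)))

print(scaled_score(40, "20-24"))  # average for a young adult -> 10
print(scaled_score(28, "70-74"))  # average for an older adult -> 10
print(scaled_score(48, "20-24"))  # 1 SD above the age-group mean -> 13
```

The first two calls show why this method obscures age differences: raw scores of 40 and 28 both become a scaled score of 10 because each is average for its own age group.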
Standardization of the WAIS
• Standardized on a stratified sample of 2,450
adults representative of the US population
aged 16-89.
• There were 200 cases per age group, except
for the smaller numbers in the two oldest
groups.
• Still difficult to know the effects of self-selection,
since participants had to be invited
and had to agree to take part.
Reliability of the WAIS
• Internal consistency and test-retest reliabilities are
about .95 or higher for full scale and verbal scores.
• They’re about .90 for performance and three other
index scores: perceptual organization, working
memory, and processing speed.
• Internal consistency reliabilities for the subtests
range from the upper .70s to the low .90s. Test-retest is
about .83.
• Generally, performance reliabilities are lower than
verbal reliabilities on the subtests.
Validity of the WAIS
• Great deal of information on criterion-related and
construct validity.
• Factor analyses support use of 4 index scores.
• Comparison studies show the pattern of WAIS-III
scores for many special groups, e.g., Alzheimer's
disease, Parkinson's disease, learning disabilities, brain
injury.
• Is the top test used today
WISC-III
• Is the most popular test for assessing
intellectual ability of children ages 6 years, 0
months to 16 years, 11 months.
• Similar to structure of the WAIS, with easier
items
• Both tests yield verbal, performance, and
full scale IQ and 4 index scores
• Most of the subtests are the same
Psychometric Properties of the WISC-III
• Standardization program involved 2,200 cases
selected to represent the US population of children
aged 6-16.
• Composite scores generally have internal
consistency reliabilities in the mid-.90s and
test-retest reliabilities around .90.
• Subtest reliabilities are generally in the mid-.80s.
• Object Assembly and Mazes are problematic, with
reliabilities in the .60s.
Group Differences in IQ
• Psychological tests are designed to measure
differences among people.
• Test scores that demonstrate differences among
people may suggest that people are not created
with the same basic abilities.
• Biggest problem: Some ethnic groups obtain
lower average scores on some psychological
tests. On average African Americans score 15
points lower than whites on IQ tests.
• Dispute is not whether differences occur but why
they occur: environment vs. biology.
Problems with Biology Argument
• IQ scores are improving (called the Flynn
effect), more so for African Americans than
whites.
• Victimization by stereotyping could affect
test performance and grades.
• Construct of race has no biological meaning
based on evidence from studies in
population genetics, the human genome and
physical anthropology.
Criticisms Related to Content
Validity
• Looking at specific items, it was thought that
they might be biased because some children
wouldn’t have the opportunity to learn the
material
• Members of ethnic groups might answer
some items differently but still correctly
• Scores affected by language skills
inculcated as part of a white, middle-class
upbringing foreign to inner city children
Responses to Content Validity Criticisms
• Test developers are indifferent to the opportunities
people have to learn the information on the tests. The
meaning they assign to the tests comes from
correlations of test scores with other variables.
• Some evidence suggests that the linguistic bias in
standardized tests does not cause the observed
differences (Scheuneman, 1987).
• Elimination of biased items from a test didn’t change
the test scores (Bianchini, 1976).
• Can’t find classes of items most likely to be missed by
minority group members (Wild, et al., 1989)
Other Ways of Thinking About
Differences
• Differences in test scores may reflect
patterns of problem-solving that
characterize different subcultures (e.g.,
MBTI)
• R. D. Goldman (1973) proposed the
differential process theory which maintains
that different strategies may lead to effective
solutions for many types of tasks.
Strategies mediate abilities and
performance.
Criterion Issues
• Most standardized tests are evaluated against other
standardized tests. The criterion may be the same test
dressed up differently, or both may be measuring test-wiseness.
• IQ tests may be correlated with achievement tests.
Achievement may be moderated by opportunity to learn.
• Goldman and Hartig (1976) found scores on the WISC to
be unrelated to teacher ratings of classroom performance
for minority children, but significant for non-minority
children.
• Majority and minority children grow up in different social
environments. Perhaps test scores accurately reflect the
effects of social and economic inequality.