Testing 461
3 – 15 – 99
Brief lecture

Cognitive tests are tests of achievement and aptitude. Group tests are less expensive and easier to score than individual tests, and anyone can administer them. A drawback is that test quality depends on the individual administering the test. Group tests also cannot discriminate as finely as an individual test.

Group tests – There is a “lower ceiling” and a “higher ceiling”. We cannot differentiate finely among scores at the low end or at the high end. So GROUP TESTS ARE GOOD FOR PEOPLE WHO SCORE IN THE MIDDLE.

A person’s IQ depends on genetic endowment & past experiences. There is a complex interaction between the two. An IQ test measures a general level of intellectual functioning. IQ tests measure the same underlying trait. Intelligence tests predict future behavior. Know what the “Jangle Fallacy” is.

Testing 461
3 – 22 – 99
Second lecture for test two

Intelligence is one of the most researched topics in psychology, but its definition remains elusive. An OPERATIONAL DEFINITION defines a concept in terms of the way it is measured: “It tests what it tests.” Intelligence tests were invented to measure intelligence, not to define it. A REAL DEFINITION is one that seeks to tell us the true nature of the thing being defined, so we ask experts in the field. Experts broadly agree that intelligence is the capacity to learn from experience and the capacity to adapt to one’s environment. That learning and adaptation are both crucial to intelligence stands out in cases where mentally disabled persons fail to possess one or the other capacity in sufficient degree.

Theorists and contributors to the concept of intelligence:
Francis Galton (1869)
Binet & Simon (1905)
Charles Spearman (1904, 1923, 1927)
L. L. Thurstone (1931)
Wechsler (1939) – The aggregate or global capacity of the individual to act purposefully, to think rationally and to deal effectively with the environment.
H. Gardner (1983, 1993)
Raymond Cattell
R. Sternberg (1985, 1986)

The STANFORD – BINET TEST (4TH EDITION)

Why do IQ tests if genetics have set your destiny in stone? So we can identify those who have impairments and intervene in time to help them.

Binet and Simon introduced the first IQ test in 1905 to identify kids who would do well in education; those who did not do well would be placed in a special education program. There were 30 items on the test, representative of what kids were going to be asked. The aim was not measurement but classification. It was a brief and practical test. It directly measured what Binet & Simon regarded as the essential factor in intelligence, practical judgment, rather than wasting time on lower level abilities involving sensory, motor and perceptual elements. They took a pragmatic view of IQ. Much of the test was weighted toward verbal skills, to get away from Galton’s tradition.

The major innovation of the 1908 scale was the introduction of the concept of mental level. The tests had been standardized on about 300 normal children between 3 and 13 years of age. This allowed Binet & Simon to order the tests according to the age level at which they were typically passed.

William Stern corrected a flaw in their work: a 5-year-old functioning at the 2-year-old level is more impaired than a 13-year-old at the 10-year-old level, even though the age difference is 3 years in each case. Stern came up with IQ = mental age / chronological age x 100. We do not use ratio IQ anymore; we use deviation IQ. Deviation IQ indicates the degree to which an individual deviates from the expected performance for a person of their age. We are compared to people of our same age.

In 1916, Lewis Terman and his associates at Stanford revised and translated the Binet & Simon scales, producing the Stanford – Binet test. Terman was the one who abbreviated “IQ”. The test is in its 4th edition.
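As a worked illustration of the ratio IQ formula and the idea of deviation IQ from this lecture, here is a minimal sketch. The function names are hypothetical, and the deviation scale (mean 100, SD 15) and the raw-score numbers are assumptions for illustration; the notes do not state them for this test.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Stern's ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    return 100.0 * mental_age / chronological_age


def deviation_iq(raw_score: float, age_mean: float, age_sd: float,
                 scale_mean: float = 100.0, scale_sd: float = 15.0) -> float:
    """Deviation IQ: distance from the same-age mean in SD units, rescaled.

    scale_mean and scale_sd are assumed values, not taken from the notes.
    """
    if age_sd <= 0:
        raise ValueError("age_sd must be positive")
    z = (raw_score - age_mean) / age_sd
    return scale_mean + scale_sd * z


# Stern's two cases: both children are 3 years behind their age level.
print(ratio_iq(2, 5))              # 5-year-old at the 2-year-old level
print(round(ratio_iq(10, 13), 1))  # 13-year-old at the 10-year-old level

# Hypothetical raw score one SD above the mean for the examinee's age group.
print(deviation_iq(raw_score=60, age_mean=50, age_sd=10))
```

The same 3-year lag gives a ratio IQ of 40 for the 5-year-old but about 77 for the 13-year-old, which is the point of Stern's correction.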
The Stanford – Binet is one of two individual IQ tests; the other is the Wechsler test. The Wechsler Preschool and Primary Scale of Intelligence – Revised (WPPSI – R) looks at both verbal and nonverbal scores. There is a Verbal IQ and a Performance IQ. They combine to make a Full Scale IQ.

The psychometric view looks at IQ as a multifactorial trait that involves general cognitive ability plus the sum total of knowledge that someone has acquired. It captures both aptitude and achievement. p. 156

Remember that Galton said IQ was hereditary. He did a lot of psychophysical tests. For him, IQ was determined by sensory ability; his ideas relate to the “speed of processing” tasks on IQ tests. This is all inherent according to him, which differs from Binet & Simon’s emphasis on verbal ability.

Charles Spearman and the g Factor – 2 Factor Theory of Intelligence p. 156
Spearman proposed that IQ consisted of two kinds of factors: a single general factor, g, and numerous specific factors, s1, s2, … Spearman also helped invent FACTOR ANALYSIS to aid his investigation of the nature of IQ. In Spearman’s view, an examinee’s performance on any homogeneous test of intellectual ability was determined mainly by two influences: g, the general factor, and s, a factor specific to that test. He likened g to “energy” or “power” in the cortex. The specific factor, s, was thought to have a physiological substrate localized in the group of neurons serving the particular kind of mental operation demanded by the test.

Spearman believed that individual differences in g were most directly reflected in the ability to use three principles of cognition: apprehension of experience, eduction of relations and eduction of correlates. “Eduction” refers to the process of figuring things out. He reasoned that some tests were heavily loaded with the g factor and some depended more on the s factors. Two tests heavily loaded with g should correlate strongly.
In contrast, tests not saturated with g should show minimal correlation with one another. This is referred to as the 2-factor theory of intelligence.

THURSTONE and the Primary Mental Abilities p. 158
Thurstone didn’t agree with Spearman’s g factor; he said several broad group factors could best explain the empirical results from tests. He came up with another factor theory. Of the 12 factors he proposed, 7 have been corroborated. They are designated the primary mental abilities: verbal comprehension, word fluency, number (math), space (visualizing a 3-dimensional object in your head), associative memory, perceptual speed and inductive reasoning (finding a rule, as in a number sequence completion test). Thurstone viewed intelligence as the ability to be flexible and to modify behavior in a meaningful way.

Raymond Cattell and the Fluid/Crystallized Distinction
Cattell said that g could be subdivided further. He proposed an influential theory of the structure of intelligence that has been revised and extended by Horn (1968, 1985). Instead of finding a single general factor or a half dozen group factors, Cattell and Horn identified 2 major factors, which they labeled fluid intelligence, gf, and crystallized intelligence, gc. Fluid intelligence is a largely nonverbal and relatively culture – reduced form of mental efficiency. It is related to a person’s inherent capacity to learn and solve problems, so it is used during adaptation to new situations. By contrast, crystallized intelligence represents what one has already learned through the investment of fluid intelligence in cultural settings. Crystallized intelligence is highly dependent on culture and experience and is used for tasks that require a learned or habitual response. Since crystallized intelligence arises when fluid intelligence is applied to cultural products, we would expect these two kinds of intelligence to be correlated.
GARDNER and the Theory of Multiple Intelligences
Howard Gardner had a much broader theory of intelligence. He did his work in the 1980’s. He proposed a theory of multiple intelligences based loosely upon the study of brain – behavior relationships. There are several relatively independent human intelligences. He says six natural intelligences have been confirmed: linguistic, musical, logical/math, spatial, bodily/kinesthetic and personal intelligence. So someone like a ballet dancer has intelligence. Intelligence is the ability to solve problems or create products that are valued in one or more cultures.

Sternberg and the Triarchic Theory of Intelligence p. 168
In addition to proposing that certain mental mechanisms are required for intelligent behavior, Sternberg emphasizes that intelligence involves adaptation to the real – world environment. So he valued “practical intelligence”. Sternberg’s theory is called triarchic because it deals with three aspects of intelligence. He said IQ tests look only for memory, not the fit between the person and the culture/environment. Truly intelligent people adapt themselves to the environment or change the environment to fit themselves. Sternberg (1986) has made it clear that intelligence has too many components to be measured by any single test.
1. Componential intelligence – Consists of the internal mechanisms that are responsible for intelligent behavior.
2. Experiential intelligence – A person with this intelligence is able to deal effectively with novel tasks.
3. Contextual intelligence – Defined as “mental activity involved in purposive adaptation to, shaping of and selection of real world environments relevant to one’s life.” So it is “niche finding.” We either shape or select the environment that fits our needs.

Testing Class / 4 – 5 – 99
IQ Test Issues

Validity – Do the tests measure what they are supposed to measure? They do well at predicting, but not at individual prediction.
Better predictions come from grades, study habits, motivation, and home and family variables.

Stability of IQ – IQ is a fairly stable trait. If you are very young, age 5 for example, then your IQ can vary greatly, but the older you get the more stable it becomes. By 17 your IQ is set in stone. But individual variation can be great due to dramatic changes in health and living conditions.

Correlates of IQ – Family size and birth order. The oldest/first born is the smartest if the family is small. Socioeconomic status: the richer you are, the higher your IQ, because there are more resources to learn from. Compare a poor working class family that does not have the time and resources to make their children smarter. Poor kids score 10 – 20 points lower than middle class kids. It is not that the poor are less intelligent; the opportunity to learn is not there, and the helplessness of living in poor neighborhoods doesn’t motivate them to do well in school.

Ethnicity – Asian Americans score 10 – 20 points higher than whites. Blacks, Hispanics and Native Americans score 10 – 15 points lower than whites. But 15% – 25% of blacks score higher than whites. Asians in Asia score higher than anyone else.

TEST BIAS
Possible explanations for group differences in IQ.

Test bias hypothesis – Middle class whites were made the norm, so of course the test will reflect their way of life and not the life of a poor kid. These white kids have more experience with the test already because they “lived it” in a sense. They were the target for the assessment. The definition of test bias is differential validity for different sub – populations, whereas fairness is the appropriateness of the action taken based on those results.

However, some culture fair tests show some of the same biases among races. The differences may be due to MOTIVATION, which is culturally influenced. The real issue is BIAS vs. FAIRNESS. A test is biased if it is differentially valid for different groups.
Fairness is the appropriateness of the action taken based on those results.

2nd hypothesis – The genetic hypothesis was proposed by Arthur Jensen. There is no great research to prove this. He said the differences are in abstract reasoning.

Last hypothesis – The environmental hypothesis: growing up impoverished with poor nutrition, poor education and fewer resources to learn. Kids may become less motivated to do well in school. They won’t see an education as a ticket out of their destructive lives. If there is no hope, then “why try” becomes their attitude. Support for the environmental hypothesis comes from adoption studies. Poor blacks raised in rich white families scored higher on IQ tests than blacks raised in their poor neighborhoods.

The BELL CURVE book says IQ is inherited, so a task force was formed by the APA to refute this.

As a group, when people reach adolescence their IQs remain stable. But longitudinal research, which looks at individuals, shows a difference: individual scores can change by 18 points.

The higher your IQ, the better you can pick out a particular stimulus.
Nerve conduction – People who are intelligent have higher nerve conduction rates.
Heritability is .45 for children and .75 for adults.

The Shipley test we took was used as a general full scale IQ test. It was first used to detect dementia. It uses T – corrected scores. It has good concurrent validity of .6 to .9.

Culture reduced tests try to minimize the effects of culture. Cultures differ in the value they place on speedy performance, in language, and in familiarity with the content of IQ tests. 3 factors:
1. LANGUAGE – Minimize language on culture free tests by using non – verbal test items.
2. SPEED
3. TEST CONTENT – Items on the test are equally familiar or equally unfamiliar.

Culture Fair IQ Test – A non – verbal group IQ test. CATTELL divided Spearman’s g factor into crystallized (culture) and fluid intelligence.
Cattell – create a test of pure “FLUID” intelligence. It has to be accomplished.
The Stanford – Binet IQ test was the standard IQ test for a long time. Then in 1939 David Wechsler came along and created his “Wechsler Intelligence Scales.” Wechsler said the Stanford – Binet concentrated on verbal abilities too much. The Wechsler adult test, now referred to as the WAIS – III, came out in 1997. The Wechsler tests overtook the Stanford – Binet in the 1970’s.

5% of kids in school have some kind of disability.

Achievement tests – The students have to read them. They are not given orally like an individual test.

A BASELINE is established when the kid answers the required # of questions correctly in a row. The assumption is that these are all easy questions. A CEILING is reached when the kid answers a string of questions incorrectly in a row. The questions are too hard.

WISC is for kids.
Coding subtest – We give them a test booklet and a pencil with no eraser.
Another subtest is for Similarity. Ex. how are a wheel and a ball alike? They are both round, they roll, etc.
Rearrange pictures to put them in order of a sequence.
The Block test is the single best performance measure of IQ.
The single most important subtest is Vocabulary. Verbal skills are emphasized – just define words.
Object – Assembly test. A picture puzzle for everyone. Everyone gets the same presentation. Points are awarded for junctures – they don’t have to solve the entire puzzle. This is the least reliable of the subtests.
Common sense test – Looks at practical everyday problems.

4 – 12 – 99 / Last lecture before 2nd test
Continue the Stanford – Binet & Wechsler subtests lecture from last week.

Remember the list we came up with one month ago on the constructs of intelligence. This is the list she typed up for us.
Creativity – Not really assessed on mainstream tests today.
Adaptation – Coding sequence ability & block design test.
Speed of Processing – All timed tests assess this.
Logic – Picture story sequence and block design.
Breadth of knowledge – This is covered in those tests.
Motivation – Well, just the fact they are taking an IQ test says something about this. Higher scores would mean higher motivation.
Interpersonal Skills – Not really assessed either.
So creativity and interpersonal skills are NOT ASSESSED on IQ tests.

Sternberg did a study of his own.
Expert list on IQ:
1. Verbal ability IQ
2. Problem – solving IQ
3. Practical IQ
Public list on IQ:
1. Social Competence – This was on the test; the public chose this as #1.
2. Verbal ability

From last lecture: The Stanford – Binet test was very popular. It was the only test used for a long time; then Wechsler’s test took over in the early 1970’s.

Continue w/ S.B. Test
The Stanford – Binet has been criticized for its lack of uniformity. However, its psychometric properties are good. Test – retest reliability is really good for the S.B. The Wechsler test became very popular because of its use of both verbal & non – verbal subtests. The S.B. is better than the Wechsler if we want to assess people in the lower or higher percentiles, because it can spot finer differences in scores there. It identifies mentally retarded people better because the Wechsler test bottoms out in the mid 50’s for IQ.

Regardless of which test is used for IQ:
90 – 110: Average. 50% of the population will fall into this range.
110 – 120: Above average
130 and up: Very Superior
Then:
80 – 90: Low average
70 – 80: Borderline
70 and below: Mentally Retarded
55 – 70: Mildly retarded
20 – 40: Moderately retarded
20 – 25: Profoundly retarded

Developmental Tests – 4 Scales/categories
These scales are used to assess infants and pre – schoolers. They are much more difficult to assess than school age kids because they cannot talk and don’t keep their attention on you. They don’t have any motivation yet; they don’t care what you are doing. They all have to be tested individually. Mothers are good informants about their baby’s behavior. These tests have lower reliability and lower validity than tests for school age kids.
So the purpose of developmental tests is to Identify At Risk Children for Developmental Delays. Concurrent validity is going to be important: we want to know that our assessment is consistent with what mom is telling us about her baby and with what pre – school teachers are saying. The goal is early intervention. When we spot a developmentally delayed baby, the earlier the intervention the better off he will be. These tests lack predictive validity; they will not predict cognitive ability. We are looking at sensorimotor abilities only, and there is no correlation between sensorimotor abilities and cognition.

BAYLEY II Test
The NORM group for the Bayley II scales was babies from 2 months to 42 months old. We give this test to babies we think are NOT NORMAL. This test will pick out the delayed kids. The manual is very specific and the tester must know the test cold; you cannot keep looking back at the manual for instructions. The tester must “engage” the baby with novel toys – bells and sugar pills. You must first look at the baby’s behavior: is he teething or not? Scores have a mean of 100 and an SD of 15, but it is not an IQ test.

Denver II Developmental Screening Test
Most babies will get this test. The tester doesn’t have to be trained as for the Bayley test. It is quicker to give. It works for both healthy and “at risk” kids. A delay is when a kid does not pass an item that 90% of other kids their age pass. A kid is classified as abnormal with 2 or more delays in the area being tested. It is an individually given test. It is just a screening instrument. It rarely yields false positives – 5% of kids turn out normal when the test said they were abnormal. But it has a huge problem with false negatives – it misses kids who should have been flagged. 80% turned out to have problems when the test said they were okay. But when it does “red flag” kids, those kids are likely to have developmental problems.
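The Denver II delay rule described above can be sketched in a few lines. This is only an illustration of the rule as stated in these notes (a delay is a failed item that 90% of same-age kids pass; 2 or more delays in an area is abnormal), not the scoring procedure from the actual test manual. The record layout and function names are hypothetical.

```python
from typing import List, Tuple

PEER_PASS_THRESHOLD = 90.0  # percent of same-age kids passing the item (from the notes)
DELAYS_FOR_ABNORMAL = 2     # "2 or more delays" in the area tested (from the notes)

# Each item is (passed, pct_peers_passing); this layout is an assumption.
Item = Tuple[bool, float]


def count_delays(items: List[Item]) -> int:
    """Count failed items that at least 90% of same-age kids pass."""
    return sum(1 for passed, pct in items
               if not passed and pct >= PEER_PASS_THRESHOLD)


def classify_area(items: List[Item]) -> str:
    """Label one developmental area 'abnormal' or 'normal' by the delay count."""
    return "abnormal" if count_delays(items) >= DELAYS_FOR_ABNORMAL else "normal"


# Hypothetical results for one area: two failures on items most peers pass,
# one failure on a harder item that does not count as a delay.
area = [(True, 95.0), (False, 92.0), (False, 60.0), (False, 90.0)]
print(count_delays(area))
print(classify_area(area))
```

Failing a hard item that only 60% of peers pass does not count as a delay, so only the two failures on 90%-plus items push this area to "abnormal".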
Vineland Test
Today a law passed by Congress defines mental retardation as not only an IQ that falls two standard deviations below the mean but also an adaptive behavior measure that falls 2 SDs below the mean. Self – sufficiency counts; some kids who were once classified as mentally retarded were not, because of language barriers. The Vineland test identifies kids who are not self – sufficient. It can be done over the phone.

Factor Analysis
We divide the “g” factor up into distinct mental components. We use subtests to identify the “g” factors. Take a global score, then use factor analysis on a group of items to find “g”. This makes it possible to build multiple aptitude test batteries. Ex. BAT, Differential Aptitude Test, the vocational aptitude test.

GRE Test
The GRE is now done on the computer. It is easy to take now, and the results come back immediately. It has been around since 1949. But you cannot go back to change an answer. You can cancel the test before walking out the door. It doesn’t really predict future GPA when the range is restricted, but it does when the range is not restricted. You must wait 60 days before taking the test again. You start out with a moderately difficult question; if you answer correctly you get harder questions, and if you answer incorrectly you get easier questions. Sorta “interactive”. The abstract reasoning part was added in the 1970’s. Different programs look at different parts of the test. The entire purpose of the test is to predict graduate performance. GRE scores are correlated with first year grades. You are not invited back to grad school if you have a “C” grade. But when the norm group was made up of students admitted regardless of grades, there was good validity for the GRE.

CREATIVE TEST (not in text) p. 100 in workbook
J.P. Guilford said creative people create ideas and products which are novel. This comes from divergent thinking. Convergent thinking is what we are trained to do; we arrive at a single correct response to a question.
Paul Torrance came up with a test using 4 factors Guilford found. Divergent thinking starts with a period of more common, unoriginal thinking; later we get more creative. These 4 factors are:
1. Fluency – Total # of ideas.
2. Flexibility – Total # of categories those ideas fall into. More categories, more creative.
3. Originality – Novelty of response.
4. Elaboration – How many details we were able to come up with.

Bias in Content Validity – Bias in content validity is probably the most common criticism made by those who denounce the use of standardized tests with minorities. The items ask for information that minority or disadvantaged children have not had equal opportunity to learn.

Bias in Predictive or Criterion – Related Validity – A test is considered biased with respect to predictive validity when the inference drawn from the test score is not made with the smallest feasible random error or if there is constant error in an inference or prediction as a function of membership in a particular group. According to this viewpoint, a test is unbiased if the results for all relevant subpopulations cluster equally well around a single regression line. For example, an unbiased SAT will predict the future academic performance of both blacks and whites with near – identical accuracy.

Bias in Construct Validity – Since this is such a broad concept, the definition of bias in construct validity requires a general statement amenable to research from a variety of viewpoints with a broad range of methods. If a test is non – biased, then comparisons across relevant subpopulations should reveal a high degree of similarity for the factorial structure of the test & the rank order of item difficulties within the test. In general, ability and aptitude tests fare quite well by these criteria.

FAIRNESS – Social Values & Test Fairness
The concept of test fairness incorporates social values and philosophies of test use.
1. Unqualified Individualism – In the open market American tradition, the ethical stance of unqualified individualism dictates that, without exception, the best qualified candidates should be selected for employment, admission or other privilege.
2. Quotas – The ethical stance of quotas acknowledges that many bureaucracies and educational institutions owe their very existence to the city or state in which they function. The city exists at the will of the people, so these institutions may be ethically bound to represent their city by hiring a proportionate number of each race.
3. Qualified Individualism – A radical variant of individualism. This is when one is hired regardless of race and gender and solely upon tested abilities.
With respect to selection ratios, the practical impact of qualified individualism is therefore midway between quotas and unqualified individualism.

Vandenberg & Vogler concluded that a substantial genetic component to IQ has been proved by decades of adoption studies, familial research, and twin projects. The genetic contribution to human IQ is usually measured in terms of a HERITABILITY INDEX, which can vary from 0 to 1.0. The heritability index is an estimate of how much of the total variance in a given trait is due to genetic factors. 0 means genetic factors make no contribution, while 1.0 means that genetic factors are exclusively responsible for the variance in a trait. Of course, heritability is somewhere between the 2 extremes.
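To make the heritability index concrete, here is a minimal sketch of the proportion described above. It assumes the total variance is simply the sum of a genetic part and an environmental part (ignoring any interaction or covariance between them), which is a simplification not stated in the notes. The function name and the variance numbers are hypothetical.

```python
def heritability_index(genetic_variance: float,
                       environmental_variance: float) -> float:
    """Share of total trait variance attributed to genetic factors (0 to 1.0).

    Assumes total variance = genetic + environmental, with no interaction
    term; that additive split is an assumption made for illustration.
    """
    if genetic_variance < 0 or environmental_variance < 0:
        raise ValueError("variances must be non-negative")
    total = genetic_variance + environmental_variance
    if total == 0:
        raise ValueError("total variance must be positive")
    return genetic_variance / total


# The two extremes from the notes.
print(heritability_index(0.0, 10.0))   # no genetic contribution
print(heritability_index(10.0, 0.0))   # genetic factors account for all variance

# A hypothetical in-between case.
print(round(heritability_index(45.0, 55.0), 2))
```

The index describes variance in a population, so a value between the two extremes says nothing about how much of any one person's score comes from genes.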