Test Administration

The Examiner & the Subject

Relationship between examiner & subject
• Feldman & Sullivan (1960)
• WISC administered to children in two conditions:
  • Enhanced rapport
  • Neutral

Race of tester
• Little evidence that the race of the tester significantly affects test performance (at least with black & white American children)

Training of testers
• Patterson et al. (1995)

Expectancy or Rosenthal effects
Rosenthal & Jacobson (1968)
• Classic study conducted at a public elementary school described as “lower class”
• The school used a tracking system in which children were sorted into one of three tracks (fast, medium & slow) based on reading performance
• About 600 students in the school
• Students at the school were described as “low achievers”; pre-test data showed an average IQ of 98 for boys and 99 for girls
• In the spring of 1964, students were given the “Harvard Test of Inflected Acquisition”, which supposedly predicted academic “blooming”
• At the beginning of the following school year, researchers randomly selected 20% of the students at the school and told teachers that these students were “academic bloomers”
• At the end of the year, the IQ scores of children labeled as “bloomers” were compared with those of children who were not given this label

Results
• Students labeled as “bloomers” gained significantly more IQ points over the year than control students

Reinforcing responses
• Sweet (1970)
• Terrell et al. (1978): administered the WISC-R to lower-income African-American 2nd graders under one of 4 conditions:
  • No feedback
  • Verbal praise
  • Candy
  • Culturally relevant verbal praise (“nice job, blood”)

Standardized Administration
• Standardized testing procedures are so important that they are listed as an essential criterion for valid testing in the Standards for Educational & Psychological Testing
• Requires that the tester be thoroughly familiar with the materials, which takes several practice administrations

Background & Motivation of Examinee

Test anxiety
Sample scale items:
• When taking an important examination I sweat a great deal
• I freeze up when I take intelligence tests or school exams
• I really don’t understand why some people get so upset about tests
• I dread courses in which the instructor likes to give “pop” quizzes

Test Anxiety (cont’d)
• Test anxiety is negatively correlated with school achievement, aptitude test scores, & measures of intelligence
• Test anxiety is exacerbated by tests with time limits

Siegman (1956)
• Compared performance levels of high- and low-anxious medical/psychiatric patients on timed & untimed subtests from the WAIS
[Figure: scores of low-anxious and high-anxious subjects on untimed vs. timed tests]

Test-Smart or “Coached” Individuals
Powers & Swinton (1984)
• Mailed test preparation materials, including extra practice tests, explanations of answers to practice test questions, and hints or strategies for answering different item types
• Special coaching and preparation yielded scores 53 points higher than those of uncoached respondents

FairTest Fact Sheet
The GRE can be conquered with tricks having nothing to do with the knowledge, persistence, thoughtfulness, and other qualities that are vital to graduate study and professional performance.
One coaching book advises: "Taking the GRE is a game with its own rules, traps, and measures of success…How you do on the GRE is an indication of how well you play the game, but it is not an indication of how 'intelligent' you are, or what kind of student you will make."

FairTest on Coaching (cont’d)
The exam's susceptibility to coaching undermines educational equity by advantaging students who can afford test prep materials (many of whom already score in the upper percentiles) over those who cannot. The most comprehensive coaching classes (which generally offer the greatest score gains) cost $1,000 or more. One coaching company claims its students gain on average 212 points on the GRE, a substantial advantage in the graduate school application process.

While ETS asserts that the GRE is not coachable, it promotes its own materials: test takers can purchase a diagnostic service for $15 or Preparing to Take the General Test for $18, or can use the free POWERPREP software package. While there are no independent studies on coaching's impact on the GRE, independent studies of coaching for the similar SAT exam demonstrate that coaching can improve scores (see FairTest's The SAT Coaching Cover-Up).
Computer-Administered Tests
• Improves standardization
• Can individually tailor the administration of test items
• Allows for precise timing of questions & responses
• Saves time & expense
• Allows as much time as necessary for responses
• Reduces bias in testing
• Examinees often disclose more in response to a computer-administered test than to a human-administered test

Computerized Adaptive Testing
• An adaptive test is one in which the questions are tailored specifically to the individual being examined
• An adaptive test of mental ability, for example, will include items that are neither too easy nor too difficult for the respondent
• As a result, when two individuals of different ability take the same test, they might respond to completely different questions
[Figure: a ten-item routing test directs each examinee to one of seven 10-item blocks: most difficult, difficult, less difficult, average, easier, easy, or easiest items]

Computerized Adaptive Testing Using Item Response Theory
• The theory allows calculation of the difficulty of each item, the discriminating power of the item, and the probability of guessing the correct answer
• It then provides procedures for:
  • Estimating the respondent’s ability on the basis of his or her responses to each item
  • Choosing the optimal test items on the basis of that estimate
  • Revising that estimate on the basis of responses to each new item

Advantages of Computerized Adaptive Testing
• Makes it possible to achieve high levels of accuracy using an extremely small set of test questions (reducing fatigue and boredom)
• Improves item security
• Saves time & money
• Research shows that CAT versions of tests produce scores that correlate highly (about .90) with scores on paper-and-pencil versions

Behavioural Assessment Methodology
• Research has shown that in certain areas (e.g., estimating job aptitude), the best kind of test involves giving a person some of the tasks he/she will be required to perform on the job, and then observing & rating that performance
• Reliability of observers’ ratings is critical in these kinds of measures

Issues in Behavioural Assessment
Reactivity
• Reliability and accuracy are highest when someone is monitoring the observers, and decrease when their work is not monitored
Drift
• After training, observers tend to “drift” away from the standards or rules they followed in training
Expectancies
• Behavioural observers tend to notice, and pay more attention to, behaviour they expect to see
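The three IRT procedures listed under "Computerized Adaptive Testing Using Item Response Theory" (estimate ability, choose the optimal item, revise the estimate) can be sketched as a short loop. This is a minimal illustration under assumptions that are not taken from the slides: items follow the three-parameter logistic (3PL) model, with a = discrimination, b = difficulty and c = guessing probability; ability is estimated by expected a posteriori (EAP) scoring over a grid with a standard-normal prior; and the next item is the unused one with maximum Fisher information at the current estimate. The item bank and all names (`ITEM_BANK`, `run_cat`, etc.) are hypothetical.

```python
import math

# Hypothetical item bank (illustration only): each item is (a, b, c) =
# (discrimination, difficulty, guessing probability) under the 3PL model.
ITEM_BANK = [
    (1.2, -2.0, 0.20),
    (1.0, -1.0, 0.20),
    (1.5, -0.5, 0.25),
    (0.8,  0.0, 0.20),
    (1.3,  0.5, 0.20),
    (1.1,  1.0, 0.25),
    (1.6,  1.5, 0.20),
    (0.9,  2.0, 0.20),
]

# Grid of ability values from -4.0 to +4.0 used for EAP estimation.
GRID = [-4.0 + 0.1 * k for k in range(81)]


def p_correct(theta, a, b, c):
    """3PL probability of a correct answer (no 1.7 scaling constant)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = p_correct(theta, a, b, c)
    return a * a * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def eap_estimate(responses):
    """Estimate ability as the posterior mean over GRID (standard-normal prior).

    `responses` is a list of ((a, b, c), score) pairs; score is 1 or 0.
    """
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for (a, b, c), score in responses:
            p = p_correct(theta, a, b, c)
            w *= p if score == 1 else 1.0 - p
        weights.append(w)
    return sum(t * w for t, w in zip(GRID, weights)) / sum(weights)


def next_item(theta, used):
    """Choose the unused item that is most informative at the current estimate."""
    unused = [i for i in range(len(ITEM_BANK)) if i not in used]
    return max(unused, key=lambda i: item_information(theta, *ITEM_BANK[i]))


def run_cat(answer, n_items=5):
    """Give n_items adaptively; `answer(item)` returns 1 (correct) or 0."""
    theta, used, administered, responses = 0.0, set(), [], []
    for _ in range(min(n_items, len(ITEM_BANK))):
        i = next_item(theta, used)           # choose the optimal item
        used.add(i)
        administered.append(ITEM_BANK[i])
        score = answer(ITEM_BANK[i])         # observe the response
        responses.append((ITEM_BANK[i], score))
        theta = eap_estimate(responses)      # revise the ability estimate
    return theta, administered


if __name__ == "__main__":
    for label, examinee in [("all correct", lambda item: 1), ("all wrong", lambda item: 0)]:
        theta, items = run_cat(examinee)
        print(label, round(theta, 2), [b for (_, b, _) in items])
```

Running the two simulated examinees illustrates the point made in the slides: both start with the same item (the estimate is 0 for everyone), but after the first response their estimates move in opposite directions and they are routed to different questions. EAP is used here only because it stays finite even when every answer is correct or wrong; maximum-likelihood estimation is a common alternative.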