Testing in Clinical Psychology
Chapter V

What is a test?
- A test is a systematic procedure for observing and describing a person’s behaviour in a standard situation.
- Tests present a set of planned stimuli (inkblots or true-false questions) and ask the client to respond to them in some way.
- The client’s reactions become the test results or scores, to be used as samples, signs, or correlates in the clinician’s assessment strategy.
- Test data may lead to conservative, situation-specific statements or to sweeping, high-level inferences.

In some ways tests differ from other assessment devices:
- A test can be administered in a private setting.
- A client’s test responses can be quantitatively compared to statistical norms established by the responses of others.
- Tests can be administered in groups as well as individually.

What do tests measure?
- Tests provide measures of everything from A to Z.
- Some tests ask direct, specific questions (do you ever feel discouraged?), while others ask for general reactions to less distinct stimuli (tell me what you see in this drawing).
- Some have correct answers (what is a chicken?), while others probe for opinions or preferences (I enjoy looking at flowers: T or F?).
- Some are presented in paper-and-pencil form; some are given orally.
- Some require verbal skills (what does analogy mean?),
some ask the client to perform various tasks (please trace the correct path through this puzzle maze), and others combine verbal, numerical, and performance items.

Many tests can be grouped into three categories:
- Intellectual or cognitive abilities
- Personality characteristics
- Attitudes, interests, preferences, and values

Test construction procedures
- Analytic Approach: Begin by asking “What are the qualities I want to measure?”, “How do I define these qualities?”, and “What kinds of tests and test items would make sense for assessing these qualities?” This is a deductive approach to test construction.
- Empirical Approach: Instead of deciding ahead of time what test content should be used to measure a particular target, the tester lets the content “choose itself”. Empirically driven testers are usually willing to employ items that reliably discriminate among the target groups even though the conceptual relevance of those items cannot always be explained clearly.
- Sequential System Approach: Combines aspects of the analytic and empirical techniques. For example, testers who choose initial test items analytically may then examine results statistically to determine which item responses are, and are not, correlated with one another; which items are too easy or too difficult; and which items do and do not discriminate. Groups of correlated items are then identified as scales, which are thought to be relatively pure measures of certain dimensions of personality, mental ability, or the like.

Standardization and score interpretation
- Standardization refers to consistency in the administration and scoring of a test.
- The numbers that come from the standardization sample (means, variances, percentages, and so on) are called norms.
- Test scores can also be interpreted against a criterion established by the tester rather than against a normative sample.
- Test scores can also be interpreted by comparing a client’s scores only with his or her own other scores.
This is called ipsative measurement.

Avoiding distortion in test scores
- Circumstances under which a test is given (temperature, noise, presence of a stranger, etc.) can affect the results.
- Some clients tend to respond in particular ways; this tendency is called a response set, response style, or response bias (social desirability is one example).
- Minimizing extraneous sources of variability:
  - Developing clear, simple instructions
  - Pilot testing
  - Enlisting experts on test bias
  - Building in indicators of response bias

Ethical Standards for Psychologists’ Use of Tests (Table 5.2, p. 170)
- Competence
- Professional/scientific responsibility
- Integrity
- Respect for rights and dignity
- Concern for others’ welfare
- Social responsibility
- Access to test materials

Criteria for Judging the Psychometric Quality of a Test (Table 5.4, p. 172)
- Norms
- Internal consistency
- Test-retest reliability
- Inter-rater reliability
- Content validity
- Construct validity
- Generalization validity
- Clinical utility

Theories of Intelligence
- General intelligence model (g): psychometric approach; intelligence as a global, general ability
- Multiple Specific Intelligences Models: intelligence as a collection of relatively separate abilities
  - Sternberg’s triarchic theory (three kinds of intelligence: analytical, creative, practical)
  - Gardner (8 frames of mind: verbal, mathematical, spatial, bodily-kinaesthetic, musical, intrapersonal, interpersonal, and naturalistic)
- Hierarchical and Factor Analytic Models: combination of the two

Tests of Intellectual Functioning
The Binet Scale:
- 1905: 1st version. 30 questions and tasks, including things like unwrapping a piece of candy, repeating numbers or sentences from memory, etc.
- 1908 revision: Binet tests were age graded, so younger children were expected to pass the earlier questions, and older children were expected to pass the later ones.
- 1916 Stanford-Binet.
  Intelligence Quotient: IQ = (Mental Age / Chronological Age) x 100. These categories are used: very superior, superior, high average, average, low average, borderline, and mentally retarded (mildly, moderately, severely, and profoundly retarded).
- 1960 edition: The way IQ was derived changed. IQ tables were used in which the formula’s results were corrected in light of the mean and variance of IQs at each age level.
- 1986 edition: Within each subtest, the items are arranged in increasing order of difficulty, and the results are organised to assess four major areas of intellectual functioning: verbal reasoning, abstract/visual reasoning, quantitative reasoning, and short-term memory. A Standard Age Score (SAS) is determined for each subtest by using tables that convert raw scores to normalised standard scores with a mean of 50 and a standard deviation of 8 for each age group.

The Wechsler Scale:
- Wechsler-Bellevue (WB): aimed at adults (aged 17 and older)
  - It is a point scale (the client receives credit for each correct answer).
  - Items were arranged in subtests based on similarity.
  - Each subtest contained increasingly difficult items.
  - The WB contained six verbal subtests (information, comprehension, arithmetic, similarities, digit span, and vocabulary) and five performance subtests (digit symbol, picture completion, block design, picture arrangement, and object assembly).
- WAIS (Wechsler Adult Intelligence Scale):
  - 6 verbal and 5 performance subtests
  - Measures Verbal IQ, Performance IQ, and Full-Scale IQ
- The WAIS-III:
  - Extended age range (through age 89)
  - Four new Index Scores (verbal comprehension, working memory, perceptual organisation, and processing speed)
  - Clinicians can obtain a multifaceted description of a person’s cognitive strengths and weaknesses
- WISC (Wechsler Intelligence Scale for Children):
  - 12 subtests (6 verbal and 6 performance), of which only 10 were usually administered
  - Covers ages 5 to 15, so it is not useful for very young children
  - WPPSI (the Wechsler Preschool and Primary Scale of Intelligence) extended the range down to 4 years old
  - WPPSI-R extended the range down to 3 years old
- WISC-R:
  - 12 subtests (6 verbal and 6 performance), of which only 10 were usually administered
  - More representative content than the WISC
- WISC-III:
  - New items were added to replace those that were outdated, culturally unfair, too easy, or too difficult
  - The Symbol Search subtest was added as a supplementary subtest for the Coding subtest

Other Intelligence Tests:
- Kaufman Assessment Battery for Children (K-ABC):
  - Children 2 ½ to 12 ½ years of age
  - It defines intelligence as the ability to solve new problems (fluid intelligence) rather than knowledge of facts (crystallized intelligence)
- Woodcock-Johnson Psycho-Educational Battery:
  - Both children and adults
  - 27 subtests cover cognitive ability, academic achievement, and individual interests
- The Peabody Picture Vocabulary Test-Revised
- The Porteus Maze Test
- The Raven’s Progressive Matrices

Aptitude and Achievement Tests
- The Scholastic Aptitude Test (SAT)
- Woodcock-Johnson Cognitive Battery III
- Woodcock-Johnson Achievement Battery III
- Kaufman Test of Educational Achievement (K-TEA)
- Wechsler Individual
Achievement Test (WIAT)

Tests of Attitudes, Interests, Preferences and Values
- The Strong-Campbell Interest Inventory
- The Kuder Occupational Interest Survey
- Career Assessment Inventory
These paper-and-pencil tests are designed to assess clients’ preferences for various pursuits, occupations, academic subjects, recreational activities, and people.
- Allport-Vernon-Lindzey Study of Values: generalised life orientations (theoretical, economic, aesthetic, social, political, and religious)
- Purpose-in-Life Test: humanistic value assessment scale
- Reinforcement Survey Schedule: a list of situations and activities that the client rates in terms of desirability

Personality Tests
- Personality can be defined as the pattern of behavioural and psychological characteristics by which a person can be compared and contrasted with other people.
- Clinicians seek ways to describe and understand consistencies and inconsistencies in a given person, and also how people in general tend to resemble and differ from one another.
- Theoretical approaches to personality vary.
- Objective Tests: present relatively clear, specific stimuli such as questions, statements, or concepts to which the client responds with direct answers, choices, or ratings.
- Projective Tests: assume that each individual’s personality will determine the way she interprets things.
  Clients are asked to respond to ambiguous or unstructured stimuli (inkblots, drawings, etc.), and their responses are interpreted as a reflection of both conscious and unconscious aspects of their personality structure and dynamics.

Objective Personality Tests
- Personal Data Sheet: one of the first objective tests
- The MMPI (the Minnesota Multiphasic Personality Inventory):
  - True / false / cannot say response format
  - When compared to “normals”, members of various diagnostic groups showed statistically different responses to many items
  - There are 10 clinical scales (Hypochondriasis, Depression, Conversion Hysteria, Psychopathic Deviate, Masculinity-Femininity, Paranoia, Psychasthenia, Schizophrenia, Hypomania, Social Introversion) and 4 validity scales (?, L, F, K)
  - Clinicians conduct profile analyses by comparing a client’s MMPI scores with those of other clients:
    a. Clinically: by recalling previous clients’ patterns
    b. Statistically: by reference to books containing sample profiles and the characteristics of the people who produced them
  - The original MMPI was criticised for its outdated and unrepresentative standardisation sample, for deficiencies in its coverage of some aspects of mental disorders, for its old-fashioned items, and for the unreliability of some of its scales
- MMPI-2
- The CPI (the California Psychological Inventory):
  - Broad-range, empirically constructed, objective personality test
  - Developed to measure personality in the normal population
  - Items are grouped into more diverse and positively oriented scales (sociability, self-acceptance, responsibility, dominance, self-control, etc.)
    and three validity scales
  - Representative standardisation sample
- Other Objective Personality Inventories: Personality Research Form (PRF), the Millon Clinical Multiaxial Inventory (MCMI-II), and the Myers-Briggs Type Indicator (MBTI)

Objective Tests Based on Factor Analysis:
- The aim is to determine the minimum number of traits or characteristics.
- One approach is to examine how much different traits overlap with one another.
- Factor analysis (FA) is a mathematical procedure that helps to reduce the complexity of many different traits by grouping them into clusters, or factors, based on the pattern of correlations among the traits.
- Cattell 16 PF (16 Personality Factors Questionnaire)
- Eysenck Personality Questionnaire (3 basic personality factors: Psychoticism, Introversion-Extraversion, and Emotional Stability)
- Many studies have resulted in 5-factor solutions. The Big Five factors include:
  1. Neuroticism: a tendency to feel anxious, angry, and depressed in many situations
  2. Extraversion: a tendency to be assertive and active, and to prefer being with other people
  3. Openness: a quality indicating active imagination, curiosity, and receptiveness to many experiences
  4. Agreeableness: an orientation toward positive, sympathetic, helpful interactions with others
  5. Conscientiousness: a tendency to be reliable and persistent in pursuing goals

Behavioural Tests:
- Fear Survey Schedule: a list of objects, persons, and situations that the client rates in terms of fearsomeness
- State-Trait Anxiety Inventory
- The Social Phobia and Anxiety Inventory
- PTSD Symptom Scale
- Beck Depression Inventory
- The Multiple Affect Adjective Checklist
- The Bulimia Test-Revised

Projective Personality Tests
The Rorschach Inkblot Test:
- A set of 10 coloured and black-and-white inkblots
- The client is shown the 10 cards, one at a time, and is asked what she sees or what the blot could be
- The tester records all responses verbatim and takes notes about response times, how the card was held as responses occurred, noticeable emotional reactions, and
  other behaviours
- When she is done, the tester goes back through the set of cards and conducts an inquiry, or systematic questioning of the client about the characteristics of each blot
- Initial reactions and comments during the inquiry are coded

The Thematic Apperception Test (TAT):
- Consists of 30 drawings of people, objects, and landscapes
- Generally 10 of these cards (one of them blank) are administered; the selection is determined by the client’s age and sex and by the clinician’s interests
- The tester shows each picture and asks the client to make up a story about it, including what led up to the scene, what is now happening, and what is going to happen
- The client is encouraged to say what the people in the drawings are thinking and feeling
- For the blank card, respondents are asked to imagine a drawing, describe it, and then construct a story about it
- Analysis of the TAT can focus on both the content and the structure of TAT stories
  - Content: what the client describes (the people, the feelings, the events, the outcomes)
  - Structure: how the client tells her story (logic, organisation, use of language, the appearance of speech dysfluency, the misunderstanding of instructions or stimuli in the drawing, and obvious emotional arousal)
- Some clinicians prefer TAT scoring systems that are relatively unstructured. They develop an idiosyncratic combination of principles derived from psychodynamic theory and their clinical experience

Incomplete Sentence Test:
- Asks clients to complete incomplete sentences
- How the client finishes the sentences reflects important personality characteristics
- “I like…”, “My father…”, “I secretly…” (Rotter Incomplete Sentence Blank)

Example: Rotter Incomplete Sentence Blank
1. I feel . . . hopeful about most things.
2. I regret . . . not being able to communicate with my ex-wife.
3. Other people . . . are usually fair and honest.
4. I am best when . . . I'm at home with my family.
5. What bothers me is . . . the thought of losing contact with my children.
6. The happiest time . . . is when I'm spending time with my children.
7. I am afraid of . . . being separated from my children.
8. My father . . . is someone I can always talk to about things.
9. I dislike to . . . argue with my wife.
10. I failed . . . to understand my wife's needs.
11. At home . . . is one of the places I like best.
12. Boys . . . can be a challenge to keep up with!
13. My mother . . . always took care of her family.
14. I suffer . . . from trying too hard sometimes.
15. The future . . . seems uncertain right now.
16. Other kids . . . were my best friends when I was young.
17. My nerves are . . . somewhat unsettled lately.
18. Girls . . . were a mystery to me in High School.
19. My greatest worry is . . . not being able to see my kids.

Projective Drawings:
- The client’s drawings serve as the basis for the clinician’s inferences about various aspects of the client’s personality
- House-Tree-Person (HTP)
- Draw-a-Person Test (DAP)
- Bender-Gestalt (sometimes)
- Example: Draw a Person (C. 7 years old)

The Psychometric Properties of Tests
Reliability:
- In general, the reliability of psychological tests tends to be adequate, but not uniformly so
- Determining the reliability of projective tests is problematic because split-half, parallel-form, and test-retest coefficients often do not make sense with such instruments
- The scoring of projective tests has traditionally been far more subjective than that of objective tests
- Interrater reliabilities have tended to be low
- More objective scoring systems have been developed for some of the projective tests (such as the Rorschach)
Validity:
- Overall, the validity of psychological tests has been less impressive than their reliability
- For most tests the size of the discrepancy between reliability and validity is too great
- In general, the closer a test’s content or tasks are to the content or tasks being assessed (i.e.
the criterion), the higher the validity will be

Distortion of Test Scores:
- Non-standard data collection procedures
- Client’s motivation
- Structure of the items and response alternatives
- The testing circumstances
- Client’s tendency to respond in particular ways (i.e. response style, social desirability bias, and response bias):
  - Social desirability (responding in the most socially acceptable way)
  - Acquiescent response style (tendency to agree with any self-descriptive item)
- Other client variables (culture, education, etc.)
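
Worked sketch: scores and reliability coefficients

Several quantities named in these notes are simple arithmetic, and a small sketch may make them concrete: the ratio IQ, a standard score with a mean of 50 and a standard deviation of 8, a correlation of the kind used for test-retest or split-half reliability, and an internal consistency coefficient. The function names and the sample numbers below are hypothetical, chosen only for illustration. Two assumptions are worth flagging: standard_score uses a plain linear z-score transform, whereas published tables normalise the score distribution first, and Cronbach's alpha is used here as one common index of internal consistency, which the notes themselves do not name.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance


def ratio_iq(mental_age, chronological_age):
    """Ratio IQ as given in the notes: (mental age / chronological age) x 100."""
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    return 100.0 * mental_age / chronological_age


def standard_score(raw, norm_scores, new_mean=50.0, new_sd=8.0):
    """Rescale a raw score against a norm group (hypothetical helper).

    Assumption: a linear z-score transform. Real norm tables also
    normalise the distribution, which this sketch does not do.
    """
    sigma = pstdev(norm_scores)
    if sigma == 0:
        raise ValueError("norm_scores must vary")
    z = (raw - mean(norm_scores)) / sigma
    return new_mean + new_sd * z


def pearson_r(x, y):
    """Pearson correlation, usable as a test-retest or split-half coefficient."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cronbach_alpha(rows):
    """Internal consistency; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    # Sum of the variances of each item (column) across respondents.
    item_var = sum(pvariance(col) for col in zip(*rows))
    # Variance of each respondent's total score.
    total_var = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_var / total_var)


# A child with a mental age of 10 and a chronological age of 8.
print(ratio_iq(10, 8))  # 125.0
```

With this definition, a child whose mental age equals her chronological age always scores 100, which is why later editions moved to tables corrected for the mean and variance of IQs at each age level.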