PSYCHOLOGICAL ASSESSMENT

GENERAL CONCEPTS

Uses:
1. Measure differences between individuals or between reactions of the same individual under different circumstances
2. Detection of intellectual difficulties, severe emotional problems, and behavioral disorders
3. Classification of students according to type of instruction (slow and fast learners), educational and occupational counseling, selection of applicants for professional schools
4. Individual counseling – educational and vocational plans, emotional well-being, effective interpersonal relations; enhance self-understanding and personal development; aid in decision-making
5. Basic research – nature and extent of individual differences, psychological traits, group differences, identification of biological and cultural factors
6. Investigating problems such as developmental changes across the lifespan, effectiveness of educational interventions, psychotherapy outcomes, community program impact assessment, and the influence of environment on performance

Tests range from measures of broad aptitudes to measures of specific skills.

Features of a psychological test:
- Sample of behavior
- Objective and standardized measure of behavior
- Diagnostic or predictive value depends on how well it serves as an indicator of a relatively broad and significant area of behavior
- Tests alone are not enough – it has to be empirically demonstrated that test performance is related to the behavior being assessed
- Tests need not closely resemble the behavior they are meant to predict
- Prediction – assumes that the individual's performance on the test generalizes to other situations
- Capacity – tests measure "potential" only in the sense that present behavior can be used as an indicator of future behavior
No psychological test can do more than measure behavior.

STANDARDIZATION
- Uniformity of procedure when administering and scoring a test; testing conditions must be the same for all
- Establishing norms (normal or average performance of others who took the same test under the same conditions)
o Raw scores are meaningless unless evaluated against suitable interpretative data
o Standardization sample – indicates average performance and the frequency of deviating by varying degrees from the average
o Indicates a person's position with reference to all others who took the test
o In personality tests, indicates scores typically obtained by average persons

OBJECTIVE MEASUREMENT OF DIFFICULTY
- Objective – scores remain the same regardless of examiner characteristics
- Difficulty – items passed by the greatest number of people are the easiest

RELIABILITY
- Consistency of scores obtained when a person is retested with the same test or with an equivalent form of the test

VALIDITY
- Degree to which the test measures what it is supposed to measure
- Requires independent, external criteria against which the test is evaluated
- Validity coefficient – indicates how closely criterion performance can be predicted from the test score: a low coefficient means low correspondence between test performance and criterion; a high coefficient means high correspondence
- Broader tests must be validated against accumulated data based on different investigations
- Validity is first established on a representative sample of test-takers before the test is ready for use; through validation we learn what the test actually measures

Guidelines in the Use of Psychological Tests
General: prevent the misinterpretation and misuse of the test to avoid:
o Rendering the test invalid; and
o Hurting the individual
A qualified examiner needs to:
o Select, administer, score, and interpret the test
o Evaluate validity, reliability, difficulty level, and norms
o Be familiar with standardized instructions and conditions
o Understand the test, test-taker, and testing conditions
o Remember that
scores obtained can only be interpreted with reference to the specific procedure used to validate the test
o Obtain some background data in order to interpret the score
o Obtain information on other special factors that influenced the score
The test user is anyone who uses test scores to arrive at decisions
o Most frequent cause of misuse: insufficient or faulty knowledge about the test
Ensure the security of test content and communication
o Need to forestall deliberate efforts to fake scores
o Need to communicate in order to:
- Dispel the mystery surrounding the test and correct prevalent misconceptions
- Present relevant data about reliability, validity, and other psychometric properties
- Familiarize test-takers with procedures, dispel anxiety, and ensure that their best performance is given
- Give feedback regarding test performance
The test administration:
o Should help predict how the client will behave outside the testing situation
o Influences specific to the testing situation introduce error variance and reduce test validity
o Examiners need to memorize exact verbal instructions, prepare test materials in advance, and familiarize themselves with the specific testing procedure
The testing conditions:
o Suitable testing room with adequate lighting, ventilation, seating facilities, and work space
o Consider the implications of details during testing (e.g. improvised answer sheet, paper-and-pencil vs. computer, familiar examiner vs. stranger)
o Need to follow the standardized procedure in the most minute detail, and to take every detail of the testing conditions into account when interpreting test results
Some examiners may deviate from procedure to extract more information. However, scores obtained this way can no longer be compared to the norm.
Establish rapport
o Examiner's efforts to arouse interest in the test, elicit cooperation, and encourage test-takers to respond in a manner appropriate to the test objectives
o Any deviation from standard motivating conditions should be noted and used in interpretation
o Maximizing rapport:
- Maintain a friendly, cheerful, and relaxed manner
- Consider examinee characteristics (e.g. for children, consider presenting the test as a game and keeping test periods brief)
- Be sensitive to special difficulties
- Give reassurance – no one is expected to finish or to get all items correct (every test implies a threat to a person's prestige)
- Eliminate surprise (e.g. by familiarizing test-takers with the kinds of items to expect)
- Convince them that it is in their own interest to obtain a valid and reliable score (e.g. avoiding waste of time, arriving at correct decisions)
Examiner and Situational Variables
- E.g. age, sex, ethnicity, professional or socioeconomic status, training and experience, personality characteristics
- Manner: warm vs. cold, rigid vs. natural
- Testing variables: nature of the test, purpose of testing, instructions given to test-takers
- Examiner's non-verbal behavior (e.g. facial or postural cues)
- Test-taker variables: activities preceding the task, receiving feedback
- In case these situations cannot be controlled, qualify this in the feedback / report
Training effects
- Coaching – close resemblance of test content and coaching material raises scores; because gains are tied to specific test content, there is low generalizability of improvement to other criteria
- It is more effective to train broad cognitive skills (e.g. problem-solving)
- Test sophistication – repeated testing experience introduces an advantage over first-time test-takers

NORMS AND THE MEANING OF TEST SCORES
REMEMBER: In the absence of additional interpretative data, a raw score on any psychological test is meaningless.
Norms – represent the test performance of the standardization sample
The raw score is converted into a derived score, which:
o Measures relative standing in the normative sample – performance in reference to other persons
o Permits direct comparison of performance on different tests
o Can be expressed in terms of:
- Developmental level attained; or
- Relative position within a specified group

STATISTICAL CONCEPTS
Statistics – used to organize and summarize quantitative data to facilitate understanding of it
Frequency distribution – tabulating scores into class intervals and counting how often a score falling in each class interval appears within the data
Normal curve features:
- Largest number of cases cluster in the center of the range
- Number of cases drops gradually in both directions as the extremes are approached
- Bilaterally symmetrical – 50% of cases fall to the left and 50% to the right of the center
- Single peak in the center
Central tendency – single, most typical or representative score used to characterize the performance of an entire group
- Mean – average; add all scores and divide by the total number of cases
- Mode – most frequent score; midpoint of the class interval with the highest frequency; highest point on the distribution curve
- Median – middlemost score when all scores have been arranged from smallest to largest
Variability – extent of individual differences around the central tendency
- Range – distance between the highest and lowest scores
- Deviation – difference between an individual's score and the mean of the group (x = X - M)
- Standard deviation – square root of the variance; used to compare the variability of different groups
o A higher standard deviation means more individual differences (variation)

DEVELOPMENTAL NORMS
Basal age – highest age at and below which all tests were passed
Mental age – basal age + partial credits in months for age-level tests passed above the basal age
o The mental age unit shrinks with increasing age
Grade equivalent – mean raw score obtained by children in each grade
Disadvantages:
o Appropriate only for common subjects taught across
grade levels (e.g. not applicable at the high school level)
o Emphasis on different subjects may vary from grade to grade
o Grade norms are not performance standards
Ordinal scales – based on the sequential patterning of early behavioral development
o Developmental stages follow a constant order; each stage presupposes mastery of an earlier stage

WITHIN-GROUP NORMS
Percentile – percentage of persons in the standardization sample who fall below a given raw score
o Indicates a person's relative position in the standardization sample
o The lower the percentile, the lower the standing
o Advantages:
- Easy to compute
- Easily understood
- Universally applicable
o Disadvantage: inequality of units – shows only the relative position, not the amount of difference between scores (e.g. only 2.14% of cases fall between -3σ and -2σ, but 13.59% fall between -2σ and -1σ; equal distances in σ units thus cover unequal percentages of cases: 2.14 + 13.59 = 15.73% between -3σ and -1σ)
Standard score – individual's distance from the mean in terms of standard deviation units
o Linear transformation – retains the exact numerical relations of the original raw scores (subtract a constant, divide by a constant)
- Also called z-score: z = (X - µ) / σ, where X = raw score, µ = mean, and σ = standard deviation
o Non-linear transformation – fits scores to any specified distribution curve (usually the normal curve)
Normalized standard scores – distribution that has been transformed to fit the normal curve
o Compute the percentage of persons falling at or above each raw score
o Locate the percentage in the normal curve
o Obtain the normalized standard score
o Example: a score of -1 means the person surpassed approximately 16% of the group
- T-score – (normalized standard score x 10) + 50
o µ = 50, σ = 10
- Stanine – "standard nine"
o µ = 5, σ = 2
o Percentage of cases in stanines 1 through 9: 4, 7, 12, 17, 20, 17, 12, 7, 4
Deviation IQ
o Ratio IQ = (mental age / chronological age) x 100; if IQ = 100, mental age = chronological age
o Deviation IQ – standard score with µ = 100 and σ = 15 (or 16, depending on the test)
o DIQ is only comparable across tests if they have the same mean and standard deviation
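The within-group conversions above are simple arithmetic on a raw score. The following Python sketch is only an illustration — the norm-sample scores and function names are invented, not taken from any published test.

```python
import statistics

def z_score(raw, mean, sd):
    """Linear standard score: distance from the mean in SD units."""
    return (raw - mean) / sd

def t_score(z):
    """T-score: normalized standard score rescaled to mean 50, SD 10."""
    return 50 + 10 * z

def stanine(z):
    """Approximate stanine: mean 5, SD 2, limited to the 1-9 scale."""
    return min(9, max(1, round(5 + 2 * z)))

def deviation_iq(z, sd=15):
    """Deviation IQ: standard score with mean 100, SD 15 (or 16)."""
    return 100 + sd * z

def ratio_iq(mental_age, chronological_age):
    """Classic ratio IQ: mental age / chronological age x 100."""
    return 100 * mental_age / chronological_age

# Hypothetical standardization sample of raw scores
norms = [38, 42, 45, 47, 50, 52, 54, 57, 60, 65]
mean, sd = statistics.mean(norms), statistics.pstdev(norms)

z = z_score(60, mean, sd)   # a raw score of 60 expressed in SD units
print(round(z, 2), round(t_score(z)), stanine(z), round(deviation_iq(z)))
```

Note that published tests assign stanines from the fixed percentage bands (4-7-12-17-20-17-12-7-4) rather than by rounding a z-score, so the stanine helper above is only an approximation.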
Relativity of Norms
- An IQ should always be accompanied by the name of the test on which it was obtained
- An individual's standing may be misrepresented if inappropriate norms are used
- Sources of variation across tests:
o Test content
o Scale units (mean and standard deviation)
o Composition of the standardization sample
Normative sample – ideally, a representative cross-section of the population for which the test is designed
o Sample – group of persons actually tested
o Population – larger but similarly constituted group from which the sample is drawn
o Should be large enough to provide stable values
o Should be representative of the population under consideration; else, restrict the population to fit the sample (redefine the population)
o Should consider specific influences affecting the normative sample
Anchor norms – used to work out equivalency between tests
o Equipercentile method – scores are considered equivalent if they have equal percentiles on the two tests (e.g. the 80th percentile on Test A = IQ of 115, the 80th percentile on Test B = IQ of 120; therefore Test A's 115 corresponds to Test B's 120)
Specific norms – tests are standardized on more narrowly defined populations to suit the purpose of the test (a.k.a. subgroup or local norms)
Fixed reference group – used for comparability and continuity of scores
o An independently sampled group against which future test scores are compared
o Updated via an anchor test (or list of common items) containing items that occurred in the original reference group; adjustments are made by comparing the frequency of correct answers on the common items between the previous group and the present group

DOMAIN-REFERENCED TEST INTERPRETATION
- A.k.a. "criterion-referenced" testing
- Reference is a content domain rather than a group of persons
- Tests mastery of specific content (what can the client do?)
- Content meaning – focus on what test-takers can do vs.
how they compare with others
- Should have content that is widely recognized as important
- Should have items that sample each objective
- Best used for testing basic skills at the elementary level
Mastery testing – whether the individual has or has not attained a pre-established level of mastery
o Individual differences are of little or no importance
o Impractical for content beyond elementary skills because of differing levels of achievement and instruction
- Tests need to cover the critical variables required for performance of certain functions
- Efforts should be made to address the limitations of a single test score:
o The cutoff should be a band of scores rather than a single score from one administration of the test
o Decisions should draw on other sources of information
o Both test-construction experts and content experts should decide on cutoff scores
o Cutoff scores should be established on empirical data

RELIABILITY
- Consistency of scores obtained by the same person across time, items, or other test conditions
- Extent to which individual differences in test scores represent "true" differences or chance errors
- Estimates what proportion of test score variance is error variance
o Error variance – differences in scores resulting from conditions that are irrelevant to the purpose of the test
- No test is a perfectly reliable instrument

CORRELATION COEFFICIENT
- Expresses the degree of relationship between two sets of scores
- Zero correlation indicates the total absence of a relationship
- Pearson Product-Moment Correlation Coefficient – takes into account the individual's position in the group and the amount of deviation from the mean
- Statistical significance – whether findings in the sample can be generalized to the population
o "Significant at the .01 level" = there is only about a 1 in 100 chance that the finding in the sample is wrong (i.e. only a 1 in 100 chance that the correlation is actually 0)
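Because later parts of this section lean on the same arithmetic (reliability coefficients, the Spearman-Brown correction, and the standard error of measurement discussed below), here is a minimal Python sketch. The two score lists are invented for illustration, and the function names are mine.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment r: built from each person's deviation
    from the group mean on the two sets of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_brown(r_half, factor=2):
    """Estimated reliability when a test is lengthened `factor` times
    (factor=2 corrects a split-half coefficient to full length)."""
    return factor * r_half / (1 + (factor - 1) * r_half)

def sem(sd, reliability):
    """Standard error of measurement: the true score is likely to lie
    within +/- 1 SEM of the obtained score."""
    return sd * math.sqrt(1 - reliability)

# Invented scores for ten examinees tested on two occasions
first  = [12, 15, 11, 18, 20, 14, 16, 19, 13, 17]
second = [13, 14, 12, 19, 19, 15, 15, 20, 12, 18]
r = pearson_r(first, second)            # test-retest reliability estimate
print(round(r, 2))
print(round(spearman_brown(0.80), 2))   # half-test r of .80 -> full-test .89
print(round(sem(15, 0.89), 1))          # SD 15, r = .89 -> SEM of about 5
```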
Significance level – the risk of error we are willing to take in drawing conclusions from our data
Confidence interval – range of scores within which the true score might fall, given a specified level of confidence
Reliability coefficient – use of the correlation coefficient to express a psychometric property
o Acceptable level of the reliability coefficient: .80 - .90

TYPES OF RELIABILITY

Test-Retest Reliability
- Repeat the same test with the same person on another occasion
- Correlate the scores from the two separate testing occasions
- Source of error variance: fluctuations in performance between the two testing occasions
- Shows how far test results can be generalized across situations
- Higher reliability = lower susceptibility to random day-to-day changes
- Need to specify the length of the interval; the interval rarely exceeds six months
- Disadvantage: practice effect
- Can only be applied to tests in which performance is not appreciably affected by repetition (e.g. sensorimotor and motor tests)

Alternate-Form Reliability
- The same person is tested with one form on one occasion and with an alternate, equivalent form on another occasion
- Correlate the scores on the two forms
- Measures both temporal stability and consistency of responses to two different item samples
- Source of error variance: content sampling (to what extent does performance depend on the specific items or arrangement of the test?)
- Parallel forms must:
o Be independently constructed;
o Have items expressed in the same form;
o Have the same type of content;
o Have an equivalent range and level of difficulty;
o Have equivalent instructions, time limits, and sample items
- Disadvantage: reduces but does not completely eliminate the practice effect
- Questionable for tasks whose nature changes with repetition (e.g.
insight tasks)

Split-Half Reliability
- Two scores are obtained for each person by dividing the test into equivalent halves
- Single administration of a single form
- Source of error variance: content sampling
- Yields a coefficient of internal consistency
- A longer test is more reliable
- Spearman-Brown formula – estimates the effect of lengthening or shortening a test on its reliability
o Used because this method technically computes the reliability of only half the test

Inter-Item Consistency (Kuder-Richardson Reliability and Coefficient Alpha)
- Single administration of a single form
- Consistency of responses to all items in the test
- Sources of error variance:
o Content sampling
o Heterogeneity of the behavior domain sampled
- More homogeneous items = more consistency
o However, is a homogeneous test appropriate for a heterogeneous psychological construct? Is the criterion being predicted homogeneous or heterogeneous?
- Unless the items are highly homogeneous, the Kuder-Richardson coefficient will be lower than the split-half coefficient
- Coefficient alpha is used for tests whose items are not scored as right or wrong (e.g. personality inventories)

Scorer Reliability
- Correlate the scores obtained by two separate scorers
- Factors excluded from error variance:
o True variance (remains in the scores)
o Irrelevant factors that can be controlled experimentally

Interpreting reliability coefficients: r = .85 means 85% of score variance is true variance and 15% is error variance
Analysis of error variance (example):
o Delayed alternate-form reliability = .70 → error variance = 1 - .70 = .30 (content + time)
o Split-half reliability = .80 → error variance = 1 - .80 = .20 (content)
o Error variance due to time: .30 - .20 = .10 (time)
o Scorer reliability = .92 → error variance = 1 - .92 = .08 (interscorer)
o Total error variance = .20 + .10 + .08 = .38
o True variance = 1 - .38 = .62

Speed tests – composed of low-difficulty items with a very short time limit
Power tests – items of increasing difficulty with a generous (or no) time limit; difficulty is such that no one can obtain a perfect score
- The reliability of speed tests cannot be measured from a single administration; a single-trial (split-half) coefficient would be spuriously high
- Reliability is affected by the range of individual differences in the group
- Reliability is also affected by the average ability level of the group

Standard Error of Measurement (SEM)
- Another way of expressing reliability
- Gives the interval within which the true score may lie (obtained score ± 1 SEM)

VALIDITY
- What the test measures and how well it measures it
- What can be inferred from the test scores

Content-Description Procedures (Content Validity)
- Systematic examination of the test to evaluate whether it covers a representative sample of the behavior to be tested
- Content must be broadly defined to cover major objectives
- Important to consider test-taker responses, not just the relevance of the content
- Test specifications – content areas or topics to be covered, objectives, importance of each topic, number of items per topic
o More appropriate for achievement tests
o Does the test cover a representative sample of the specified skills and knowledge?
o Is test performance free from irrelevant variables?
- Face validity – whether the test "looks valid" to test-takers and other technically untrained observers
o A desirable feature of a test, but not a substitute for other types of validity

Criterion-Related Validity
- Correlation coefficient between a test score and a direct and independent measure of the criterion
- Indicates the test's effectiveness in predicting performance in specified activities
- The concurrent/predictive distinction is not about time but about the objective of testing
- Concurrent – used to diagnose existing status (Does the person qualify for the job?)
o Criterion data are already available
- Predictive – used to predict future performance (Does the person have the prerequisites to do well in the job?)
- Avoid criterion contamination (e.g.
rater's knowledge of test scores contaminates criterion ratings)
- Criterion measure examples: academic achievement, performance in training, actual job performance, contrasted groups (extremes of the distribution of criterion measures), psychiatric diagnoses, ratings by authorities, correlation between the new test and a previously available test
- Pre-test and post-test scores: training is valid if items failed in the pre-test are passed in the post-test after training
- Structural Equation Modeling (SEM) – explores relationships among constructs and the paths through which a construct affects criterion performance

Construct Validity
- Extent to which a test measures a theoretical construct or trait
- Evidence includes research on the nature of the trait and on the conditions affecting its development and manifestation
- Age differentiation – used in traditional intelligence tests
- Correlation with other tests – the new test measures approximately the same behavior as the previous test
o Moderate correlation is desirable (a very high correlation would mean the new test is redundant)
- Factorial validity – identifying the underlying factors and determining which factors account for test scores
- Internal consistency – measure of homogeneity
o Upper criterion group vs. lower criterion group – items on which the upper criterion group does not score higher are eliminated
- Convergent validation – the test correlates highly with other tests it should theoretically correlate with
- Discriminant validation – the test does not correlate with variables from which it should theoretically differ

Measurement and Interpretation of Validity
- Validity coefficient – correlation between test scores and the criterion
- Conditions affecting validity:
o Demographics of the group
o Sample heterogeneity (e.g. a pre-selected, restricted sample lowers the coefficient)
o Change over time because of changing selection standards
o Form of the relationship between test and criterion (linear vs. curvilinear)
- Heteroscedasticity – unequal variability at high and low score levels (e.g.
little variability in Test A scores when Test B scores are low, wider variability in Test A scores when Test B scores are higher)

Uses of Tests for Decision-Making
- Selection – each person is either accepted or rejected
- Placement – assignment to different categories based on a single score
- Classification – involves two or more criteria for assignment
- Differential validity – the test should be able to detect differences in a person's expected performance in different jobs or programs (i.e. the test should show whether a person is good at Job A and not at Job B)
o The battery should include tests that are good predictors of criterion A and poor predictors of criterion B, and vice-versa
- Multiple discriminant functions – determine how closely a person's set of scores approximates the typical scores in a given job, diagnosis, etc.; used when:
o The criterion is unavailable but group characteristics are known
o There is a non-linear relationship between a criterion and one or more predictors
Test bias
- Slope bias – significantly different validity coefficients in the two groups (differential validity)
- Intercept bias – the test systematically under- or over-predicts criterion performance for a particular group

ITEM ANALYSIS
- Used to shorten a test while increasing its reliability and validity
Item difficulty – percentage of people passing the item
o Items are usually arranged in order of increasing difficulty
o The higher the inter-item correlations, the wider the spread of item difficulty should be
Thurstone Absolute Scaling
o Find scale values of items separately within each group by converting the percentage passing into z-values
o Translate all these scale values into corresponding values for the group chosen as the reference group
o The test score distribution must approximate the normal curve
Item discrimination – degree to which an item differentiates correctly among test-takers in the behavior being measured
o In contrasted groups, the upper and lower 27% of scorers are used
o Purpose: identify deficiencies in the test or in the teaching
Index of discrimination (D) –
difference between the percentage passing among upper scorers and the percentage passing among lower scorers (convert the number of persons passing into percentages)
Phi coefficient – relationship between the item and the criterion

Item Response Theory
- Item-test regression – represents both item difficulty and item discrimination
o Difficulty level – the 50% threshold (50% passing and 50% failing)
o Discriminative power – the steeper the curve, the higher the discrimination index
- Item performance is related to an estimated amount of a latent trait
- Item information functions – take all item parameters into account and show how efficiently an item measures behavior at different ability levels
- Item parameters should not vary according to the ability level of the group
Cross-validation – independent validation of the test on a group separate from the one on which the items were selected
o Factors lowering validity across different groups:
- Size of the original item pool – if the original item pool was large and the number of items retained is small, there is a higher chance that the validity of the retained items is spurious, because of more opportunities to capitalize on chance differences
- Size of the sample – a smaller sample has higher error variance
- Items assembled without a guiding theory
Differential item functioning – identifies items on which persons of equal ability from different cultural groups have different probabilities of success
- Possible reason: the item does not measure the same construct in the two groups.
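To make the item-analysis quantities above concrete, here is a Python sketch of item difficulty (percentage passing), the index of discrimination D from upper and lower 27% groups, and a two-parameter logistic item characteristic curve of the kind used in item response theory. The records and function names are invented for illustration.

```python
import math

def item_difficulty(responses):
    """p-value: proportion of examinees passing the item (1 = pass).
    A higher p means an easier item."""
    return sum(responses) / len(responses)

def discrimination_index(records, item):
    """Index of discrimination D: proportion passing the item in the
    top 27% of total scorers minus the proportion in the bottom 27%."""
    ranked = sorted(records, key=lambda r: r["total"], reverse=True)
    k = max(1, round(0.27 * len(ranked)))
    p_upper = sum(r["items"][item] for r in ranked[:k]) / k
    p_lower = sum(r["items"][item] for r in ranked[-k:]) / k
    return p_upper - p_lower

def icc_2pl(theta, a, b):
    """Two-parameter logistic item characteristic curve: probability of
    passing at ability theta, where b is the difficulty (the 50%
    threshold) and a is the discrimination (slope steepness)."""
    return 1 / (1 + math.exp(-a * (theta - b)))

# Invented records: total test score plus 0/1 responses to three items
data = [
    {"total": 9, "items": [1, 1, 1]},
    {"total": 8, "items": [1, 1, 0]},
    {"total": 6, "items": [1, 0, 1]},
    {"total": 5, "items": [0, 1, 0]},
    {"total": 3, "items": [0, 0, 1]},
    {"total": 2, "items": [0, 0, 0]},
]
print(item_difficulty([r["items"][0] for r in data]))  # passed by half
print(discrimination_index(data, 0))                   # item 0 separates groups
print(round(icc_2pl(theta=0.0, a=1.5, b=0.0), 2))      # 0.5 at the threshold
```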
INTELLIGENCE
- Ability level at a given point in time
- A score is not indicative of the reasons behind performance
o Should be descriptive rather than explanatory
- Should not be used to label individuals but to help in understanding them
o Start where they are, assess strengths and weaknesses, make interventions
o Contribute to self-understanding and personal development
- Not a single entity but a composite of several functions
o Combination of abilities required for survival and advancement within a culture
- Intelligence tests are largely measures of scholastic aptitude or academic achievement
o Reflective of prior educational achievement
o Indicator of future performance
o Effective predictor of performance in various occupations and daily life activities
- Should not be the only basis for making decisions
Heritability and Modifiability
- Heritability index – how much of the variation in scores is due to genetics?
o Obtained using correlations between monozygotic and dizygotic twins
- Limitations:
o Applicable to populations, not individuals – mental retardation can still be due to a defective gene
o Limited to the population's characteristics at a given time
o Does not indicate modifiability
- IQ is not fixed and unchanging; it can be modified
o Changes can result from life events or environmental interventions
o Training in cognitive skills, problem-solving strategies, and efficient learning habits
Motivation
- Personality is not independent of aptitude; aptitudes cannot be investigated independently of affect
o Prediction of subsequent performance can be enhanced by combining test scores with information about motivation and attitudes
- Achievement shapes self-concept, which in turn can help shape cognitive performance

Theories of Intelligence Organization
Two-Factor Theory (Charles Spearman)
- All intellectual activities share a common factor (g)
- (s) – specific factors limited to very narrow abilities
- Only g accounts for the correlation between performance on two intelligence tests
- Aim of testing: measure the amount of g
- A single test that is highly saturated with
g could be substituted for a test with heterogeneous items
- Abstract relations are the best measures of g
- Group factors – degree of correlation that may result above and beyond g (e.g. arithmetic, mechanical, linguistic) – common to some but not all activities

Multiple Factor Theories (Thurstone)
- Group factors called "primary mental abilities":
o Verbal comprehension – reading comprehension, verbal analogies, verbal reasoning, etc.
o Word fluency – anagrams, rhyming, naming words within a category
o Number – speed and accuracy of arithmetic operations
o Space – perception of fixed spatial or geometric relations, manipulatory visualizations
o Associative memory
o Perceptual speed – quick and accurate grasp of visual details, similarities, and differences
o Induction / general reasoning – find a rule and apply it to other cases

Structure of Intellect Model (Guilford)
- Mental abilities can be traced to underlying factors, which are categorized along three dimensions:
o Operations – what the person does (memory recording and retention, divergent and convergent production, evaluation, cognition)
o Contents – nature of the materials on which operations are performed (visual, auditory, symbolic, semantic, behavioral)
o Products – form in which information is processed (units, classes, relations, systems, transformations, implications)

Cattell-Horn-Carroll (CHC) Theory
Cattell:
o Fluid intelligence – broad ability to reason, form concepts, and solve problems using unfamiliar information or novel procedures.
o Crystallized intelligence – breadth and depth of a person's acquired knowledge, the ability to communicate one's knowledge, and the ability to reason using previously learned experiences or procedures.
Carroll:
o Layers (strata) representing narrow, broad, and general abilities

Nature and Development of Traits
- Differences in factor patterns are influenced by experiential background
- Change over time is also observed, partly because of different methods of carrying out the same task
- Mechanisms:
o Learning set – learning through the presentation of different problems of the same kind
o Transfer of training – formal schooling where efficient and systematic problem-solving techniques are learned
o Co-occurrence of learning experiences – learn one, learn all in a proper environment
- Processing skills tend to be specific to the type of content being processed (domain specificity)
o Domain – content (linguistic, mathematical) or context (cultural, social, geographical)

ABILITY TESTS
- Index of general level of performance
- Often designated as tests of scholastic aptitude or academic achievement
- The intelligence tests developed so far are largely measures of scholastic achievement. Are there tests that will measure so-called "practical, everyday" intelligence?

Individually-Administered Intelligence Tests
Stanford-Binet Intelligence Test (5th Edition)
o Age range: 2-65 years old
o Verbal and Non-verbal domains, each assessing five factors:
- Fluid Reasoning
- Knowledge
- Quantitative Reasoning
- Visual-Spatial Processing
- Working Memory
o Reliability: split-half, test-retest, interscorer
o Validity: content, construct (age differentiation)
Wechsler Scales
o Wechsler Preschool and Primary Scale of Intelligence (WPPSI-R), Wechsler Intelligence Scale for Children (WISC-IV), Wechsler Adult Intelligence Scale (WAIS-IV)
o Verbal:
- Information
- Comprehension
- Similarities
- Vocabulary
o Performance:
- Block Design
- Matrix Reasoning
- Visual Puzzles
- Picture Completion
- Figure Weights
o Working Memory:
- Digit Span
- Arithmetic
- Letter-Number Sequencing
o Processing Speed:
- Symbol Search
- Coding
o Reliability: test-retest, interscorer
o Validity: construct (convergent with other measures of cognitive abilities, including motor, memory, language, attention)
Differential Ability Scales
o Measure specific abilities rather than a global IQ
- Core subtests = General Conceptual Ability
- Diagnostic subtests – relatively independent abilities
- Achievement tests
o Reliability: internal consistency, test-retest
o Validity: criterion (with Wechsler, Stanford-Binet), construct
Kaufman Scales

Tests for Special Populations
Infant and Preschool Testing
- Require individual administration
Bayley Scales of Infant Development
o Assess current developmental status rather than predict subsequent ability
o Mental scale – sensory and perceptual abilities, memory, learning, problem-solving, vocalization, verbal communication, abstract thinking
o Motor scale – gross motor abilities
o Behavior rating scale – personality development: emotional and social behavior, attention span and arousal, persistence, goal-directedness
McCarthy Scales of Children's Abilities
o Index of functioning at the time of testing
o Scales: Verbal, Perceptual-Performance, Quantitative, General Cognitive, Memory, Motor
Piagetian Scales
o Presuppose a uniform sequence of development through successive stages
   Object permanence
   Development of means for achieving desired ends
   Imitation
   Operational causality
   Object relations in space
o Development of schemata for relating to objects
Mental retardation – substantial limitations in present functioning
o Sub-average intellectual functioning concurrent with limitations in two or more of the following: communication, self-care, home living, social skills, community use, self-direction, health and safety, functional academics, leisure, and work
Vineland Adaptive Behavior Scale – focuses on what the individual habitually does rather than what s/he can do

Testing Persons with Physical Disabilities
Modify the testing medium, time limits, and content of tests
Individualized assessment using a variety of data from different sources
Hearing impairments – usually handicapped by verbal tests
Visual impairments – adapt tests for oral administration; no performance tests
Motor impairments – may not be able to compose oral or written responses; no time limit; more prone to fatigue

Multicultural Testing
Language and speed removed as parameters
Varying test content
Raven's Advanced Progressive Matrices
o Measure of 'g'
o Requires eduction of relations among abstract items
Culture-Fair Intelligence Test
o Cattell
o Test of fluid reasoning
o Inductive reasoning – make broad generalizations based on available data
o Think logically and solve problems in novel situations, independent of acquired knowledge
o Subtests:
   Series – choose the item that best completes the series
   Classification – identify the two figures that are in some way different from the others
   Matrices – complete the design or matrix presented
   Conditions – select the one that duplicates the conditions given
Goodenough-Harris Drawing Test
o Accuracy of observation and development of conceptual thinking
o Test may measure different functions at different ages
Approaches to cross-cultural testing:
1. Choose items that are common across cultures; validate against local criteria
2. 
Develop a test within one culture and administer it to persons with different cultural backgrounds
3. Develop different tests for each culture, validate each, and use it only within that culture

Group Tests
Used in educational systems, government service, industry, and the military
Typically employ a multiple-choice format for uniformity and objectivity in scoring
Items of increasing difficulty arranged in separately timed subtests
Spiral-omnibus format – single long time limit, mixed items of increasing difficulty
Advantages:
o Can be administered to many examinees simultaneously
o Greatly simplify the examiner's role
o Provide more uniform testing conditions
o Scoring is more objective
o Provide better-established norms
Disadvantages:
o Less opportunity for rapport and for maintaining cooperation and interest
o Less likely to detect extraneous interfering variables
o Examinees have restricted responses – penalizes original thinkers
o Little to no opportunity for direct observation
o Lack of flexibility

Tests for Multiple Aptitudes
Multilevel batteries
o Sample major intellectual skills found to be prerequisite for schoolwork
o Suitable for schools because of comparability across levels
o Youngest age suitable for group testing: kindergarten / 1st grade
Cognitive Abilities Test
o Verbal – verbal classification, sentence completion, verbal analogies
o Quantitative – quantitative relations, number series, equation building
o Nonverbal – figure classification, figure analogies, figure analysis
Test of Cognitive Skills
o Sequences – understanding and applying rules of arrangement in patterns of figures, letters, or numbers
o Analogies – identifying the relationship in a pair and applying the principle to select a second pair exhibiting the same relationship
o Verbal Reasoning – identification of essential elements of objects or things, inferring relationships between sets of words, drawing logical conclusions from verbal passages
o Memory – definitions of a set of artificial words are presented and recall is tested after other tests 
have been given
Used because of:
o Intra-individual variation in performance on intelligence scales
o Intelligence tests are found to be primarily a measure of verbal comprehension
Differential Aptitude Tests
o For educational and career counseling of students in grades 8–12
o Verbal Reasoning, Numerical Reasoning, Abstract Reasoning, Perceptual Speed and Accuracy, Mechanical Reasoning, Space Relations, Spelling, Language Usage
Multidimensional Aptitude Battery
o Group test designed to measure the same aptitudes as the WAIS-R
o Suitable for adolescents and adults, but not for individuals with mental disturbance or retardation
o Provides fully interpretable scores at the subtest level (T-scores), at the Verbal and Performance levels, and as an overall total score

Psychological Issues in Ability Testing
Nature of intelligence
o Intelligence is complex and dynamic
o Intelligence test performance is highly stable
o Intelligence develops cumulatively
Environmental contributions to intelligence
o Environmental stability contributes to IQ stability
o Prerequisite learning skills contribute to subsequent learning (functional academics + personality characteristics)
o Rises or drops in IQ may occur as a result of environmental changes
Genetics and development
o Preschool tests have moderate predictive validity; infant tests have none
o In the absence of inborn pathology, environment plays a major role in subsequent development
o Developmental transformations – rudimentary skills in infancy are transformed with age into more complex manifestations
o Individual differences within an age level are greater than individual differences across age levels
o Changes that occur with aging vary with the individual
o Demands on adults differ from those on school-age children (practical vs. 
academic information)
o Mean IQ of adults has increased over the years (Flynn effect)
Research trends
o Cross-sectional analyses of IQ trends – older adults score lower (largely because earlier cohorts received less education)
o Longitudinal studies – scores tend to improve with age
Culture
o Cultural changes, and not simply age, determine rises and declines in performance
o Cultural influences will and should be reflected in test scores
o Cultural differences can become a handicap when the individual moves from one culture to another and attempts to succeed in the latter culture

PERSONALITY
Measures emotional, motivational, interpersonal, and attitudinal characteristics of a person
Development of Personality Tests
Content-related procedures
o Obtain information regarding a psychological construct; create items consistent with that construct
o Example: Woodworth Personal Data Sheet – information regarding psychiatric and pre-neurotic symptoms
Empirical criterion keying
o Development of a scoring key in terms of some external criterion
o Select items that differentiate between clinical samples and the normal population
o Example: if 25% or more "normal" people answered an item unfavorably, it could not be "abnormal," since it is present in the "normal" population with such frequency
o Responses are treated as diagnostic or symptomatic of the criterion behavior with which they are associated
o Examples: Minnesota Multiphasic Personality Inventory (MMPI), California Psychological Inventory, Personality Inventory for Children
Factor analysis
o Systematic classification of personality traits
o Example: Guilford-Zimmerman Temperament Survey
Personality theories
o Biopsychosocial
   Source of reinforcement (detached, discordant, dependent, independent, and ambivalent)
   Pattern of coping behavior (active vs. 
passive)
   Not a general personality instrument; helps in differential diagnosis
   Example: Millon Clinical Multiaxial Inventory
o Manifest Needs System (Henry Murray)
   Results in ipsative scores – the strength of a need is expressed in relation to other needs within the individual, which makes comparisons across individuals questionable
   Example: Edwards Personal Preference Schedule
      Achievement – need to accomplish tasks well
      Deference – need to conform to customs and defer to others
      Order – need to plan well and be organized
      Exhibition – need to be the center of attention in a group
      Autonomy – need to be free of responsibilities and obligations
      Affiliation – need to form strong friendships and attachments
      Intraception – need to analyze the behaviors and feelings of others
      Succorance – need to receive support and attention from others
      Dominance – need to be a leader and influence others
      Abasement – need to accept blame for problems and confess errors to others
      Nurturance – need to be of assistance to others
      Change – need to seek new experiences and avoid routine
      Endurance – need to follow through on tasks and complete assignments
      Heterosexuality – need to be associated with and attractive to members of the opposite sex
      Aggression – need to express one's opinions and be critical of others
   Example: Personality Research Form and other Jackson inventories
      Behaviorally-oriented and mutually exclusive definitions of 20 personality constructs
      For prediction of the behavior of individuals in normal contexts

Test-Taking Attitudes and Response Bias
Faking
o Faking good – choosing answers that create a favorable impression
o Faking bad – choosing answers that make the test-taker appear more disturbed
o Face validity increases susceptibility to faking
Social desirability – test-taker is unaware of putting up a "good front"
Impression management – conscious dissembling to create a specific effect
o Avoid with: forced-choice items
Acquiescence – tendency to answer True or Yes
o Avoid: the number of items keyed positively should equal the number of items keyed 
negatively
Deviation – tendency to give unusual or uncommon responses

Traits, States, Persons, and Situations
Behavior can be explained by traits, by states, and by their interaction
Individuals differ in the extent to which they alter behavior to meet the situation
Different behavior settings influence behavior
Trait – relatively stable
State – transitory condition

Measuring Interests and Attitudes
Values
Difficulties with value inventories:
o Sampling values systematically
o Finding the appropriate level of abstraction
o Delimiting value domains
o Early inventories were incompatible with contemporary definitions
Interest Inventories
Interest testing – used for educational and career assessment
o Also stimulated by occupational selection and classification
Opinions and attitudes
o For social psychology research
o Consumer research and employee relations
Interest inventories have exploration validity – they increase the behaviors needed for career exploration
o Used to introduce the individual to careers that he or she has not previously considered
Issue: sex fairness
o Because tests are validated against existing groups, they perpetuate group differences
Example: Strong Interest Inventory
o "Like," "Indifferent," or "Dislike" responses to 5 categories:
   Occupations
   School subjects
   Activities
   Leisure activities
   Day-to-day contact with various people
o Levels of scores include the six RIASEC themes (Realistic, Investigative, Artistic, Social, Enterprising, Conventional)
o Personal Style scales (e.g., risk-taking)
o Validity: criterion
Example: Jackson Vocational Interest Survey
o Work roles – what a person does on the job
o Work styles – preference for situations or environments
o 34 basic interest scales covering 26 work roles and 8 work styles
o Equally applicable to both sexes
o Validity: construct
Example: Kuder Occupational Interest Survey
o Uses forced-choice triads (mark items from most liked to least liked)
o 10 broad interest areas: Outdoor, Mechanical, Computational, Scientific, Persuasive, Artistic, Literary, Musical, Social Service, Clerical
o Grouped based on content validity
o Scores expressed 
as correlations between the respondent's scores and the interest patterns of particular groups
Example: Self-Directed Search
o Self-administered, self-scored, self-interpreted
o Holland – occupational preference is a choice of a way of life
   Individuals seek environments that are congruent with their personality types
   Vocational choices are implementations of self-concepts
Trends
o Expansion of occupational levels covered
o Attention to the effect of the inventory on the test-taker
o RIASEC model is not a good fit for minority and other cultures

Opinion Surveys and Attitude Scales
Attitude – tendency to react favorably or unfavorably toward a stimulus (e.g., an ethnic group, custom, or institution)
o Cannot be directly observed
o Must be inferred from verbal and nonverbal behavior
Opinions – replies to specific questions

Other Assessment Techniques
Measures of Styles and Types
Cognitive style – preferred and typical modes of perceiving, remembering, thinking, and problem solving
Individuals differ in how they perceive and categorize situations, which depends on prior learning and experience
Aptitude cannot be investigated independently of affect
o Example: performance on perceptual tasks is related to attitude, motivation, and emotion
o Example: flexibility of closure = socially retiring, independent, analytical
o Example: field dependence – extent to which perception of what is upright is influenced by the surrounding visual field; sometimes called "cognitive control"
   Field-independent = active, participant approach to learning
Personality types – constructs used to explain similarities and differences in preferred modes of thinking, perceiving, and behaving across individuals
o Example: Myers-Briggs Type Indicator
   Attitude: Introversion vs. Extraversion
   Ways of Perceiving: Sensing vs. Intuition
   Ways of Judging: Thinking vs. Feeling
   Lifestyle: Judging vs. 
Perceiving
o All types are valuable and necessary; each has strengths and weaknesses
o Individuals are more skilled within their preferred functions, processes, and attitudes
o Criticism: stereotypes; categorical data

Situational Tests
Place the individual in a situation closely resembling a "real-life" criterion situation
Character Education Inquiry – makes use of familiar, natural situations in one's routine
o Measures honesty, self-control, altruism
Situational Stress Test – samples the individual's behavior in a stressful, frustrating, or emotionally disruptive environment
Leaderless Group Discussion – a group is assigned a topic for discussion; measures verbal communication, verbal problem-solving, and acceptance by peers
Role-playing

Self-Concepts and Personal Constructs
How events are perceived by the individual
Extent of self-acceptance by the individual
Capacity to conceptualize the self – capability to assume distance from one's self and one's impulses
o Manifests in test-taking defensiveness, response sets, social desirability
o Increases with age, education, SES, and intelligence
Example: Washington University Sentence Completion Test – measures levels of ego development
o Presocial, Impulsive, Self-Protective, Conformist, Self-Aware, Conscientious, Individualistic, Autonomous, and Integrated
Self-Esteem Inventories and Others
o Self-esteem – evaluative component of the self-construct; the individual's evaluation of his or her performance
o Example: Adjective Checklist – consists of 300 adjectives and adjectival phrases commonly used to describe a person's attributes
o Example: Q-Sort – given piles, arrange cards from "most characteristic" to "least characteristic" in a forced-normal distribution (the examiner specifies the number of cards to be placed in each pile)
o Example: Semantic Differential – examines the connotations of any given concept for the individual (e.g., on a scale where 1 means bad and 7 means good, how would you rate "Father"?) 
   Dimensions: Evaluative (good-bad, valuable-worthless, clean-dirty); Potency (strong-weak, large-small, heavy-light); Activity (active-passive, fast-slow, sharp-dull)

Observer Reports
Naturalistic observation – direct observation of spontaneous behavior in natural settings (e.g., diary method, time sampling)
o No control is exerted over the stimulus situation
Interview – elicits life-history data
o Can range from highly structured to unstructured
o Affords direct observation
Ratings – evaluation of the individual based on cumulative, uncontrolled observations
o Disadvantages:
   Ambiguity
   Amount of relevant contact
   Halo effect
   Error of central tendency
   Leniency error
Nominating technique – each person chooses one person with whom he or she would like to study, work, or eat lunch
o Can identify potential leaders and isolates
o Good concurrent and predictive validity, because there are many raters, raters are in a good position to observe, and the observers' opinions influence the observed person's actions

APPLICATIONS OF TESTING
Educational Testing
o Prediction and classification within a specific educational setting
o Uses educational achievement tests
Biodata
o Interviews and questionnaires elicit life-history data
o Consistently good predictors of performance
o Developed through criterion keying and cross-validation, or through identification of constructs via job analyses and surveys
Achievement tests
o Measure the effects of a specific program of instruction or training
o Measure the effects of relatively standardized (controlled, known) experiences
Aptitude – cumulative influence of different learning experiences in daily living
o Measures the effects of learning under relatively uncontrolled or unknown conditions
Ability – any measure of cognitive behavior
o Sample of what the individual knows at the time of testing
o Level of development in one or more abilities
o Includes both aptitude and achievement
No two tests correlate perfectly with one another
o Differences between achievement and ability scores may reflect over-prediction or under-prediction
Achievement testing is objective, uniform, and efficient 
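The point that no two tests correlate perfectly, and that a predicted score can over- or under-shoot the criterion, can be made concrete with a short sketch. This is a hypothetical illustration (made-up scores, plain-Python arithmetic), not anything from the source: the validity coefficient is simply the Pearson correlation between test scores and criterion scores.

```python
# Hypothetical sketch: computing a validity coefficient (Pearson r) and
# flagging over-/under-prediction when an aptitude test predicts an
# achievement criterion. All scores below are made-up illustrative data.
from statistics import mean, stdev

aptitude    = [95, 100, 105, 110, 120, 130]   # predictor (test) scores
achievement = [88, 96, 101, 108, 125, 128]    # criterion scores

def pearson_r(x, y):
    """Validity coefficient: correlation of test scores with the criterion."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

r = pearson_r(aptitude, achievement)          # high, but not perfect

# Least-squares line predicting achievement from aptitude.
slope = r * stdev(achievement) / stdev(aptitude)
intercept = mean(achievement) - slope * mean(aptitude)

# Positive residual = achievement exceeds prediction (under-prediction);
# negative residual = prediction exceeds achievement (over-prediction).
residuals = [y - (intercept + slope * x)
             for x, y in zip(aptitude, achievement)]
```

A validity coefficient near 1 means criterion performance is closely predictable from the test; the sign of each residual shows whether the test over- or under-predicted that examinee.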
Functions:
o Reveal weaknesses in learning
o Give direction to subsequent learning
o Motivate the learner
o Provide a means of adapting instruction to individual results
o Aid in evaluating teaching
o Aid in formulating educational goals (analysis of educational objectives, critical examination of the content and methods of instruction)
Item format
o Multiple choice is often used
   May promote rote memorization and learning of isolated facts vs. development of problem-solving and conceptual understanding
o Constructed-response / open-ended – requires the examinee to generate an answer
o Portfolio assessment – cumulative record of a sample of a student's work in various areas over a period of time
General Achievement Batteries
o Provide profiles of scores on individual subtests or in major academic areas
o Allow horizontal and vertical comparisons
o The large majority have overlapping items for different levels
o Some are concurrently normed with aptitude tests to enable direct comparison of scores
Tests of Minimum Competency
o Ascertain mastery of basic skills
Teacher-Made Classroom Tests
Tests for College Level
o Used for placement, admissions, graduate school admission, and special appointments
Diagnostic and Prognostic Testing
o Diagnosis of learning disabilities
o Prognostic – predicts usual performance in a course
o Teach-test-teach – how well the student can learn during one-to-one instruction
Assessment in Early Childhood Education
o Measures outcomes of early childhood education
o School readiness – attainment of the prerequisite skills, knowledge, attitudes, motivation, and behavior needed to profit from school instruction
o Emphasis on abilities required for learning to read

Selection and classification of personnel
o Individuals should be placed in the job for which they are best qualified
o Traits irrelevant to job requirements should not affect selection decisions
o Selection tests should be validated against job performance
Global Procedures for Performance Assessment
o Job sample – the task is part of the work to be performed, but all applicants operate 
under uniform conditions
o Simulations – reproduce functions of the job
o Job analysis – identifies the requirements that differentiate one job from other jobs
   Description of job activities in terms of behavioral requirements
Occupational Testing
Identify aspects of performance that differentiate good and poor workers
o Facilitates effective use of tests across jobs that may seem different
Job elements – units describing critical work requirements
Synthetic validation – it is possible to identify skills, knowledge, and performance requirements common to many jobs
o Job analysis to identify elements and their relative weights
o Analysis and empirical study of each test to determine the extent to which it measures proficiency in performing job elements
o Validity of each test is derived from the weights of these elements in the job and in the test
Validity generalization – application of prior validity findings to a new situation via meta-analysis
Multiple-factor theory
o Considers behaviors under the control of the worker plus environmental conditions
o Effectiveness + productivity + utility
o Any job has multiple performance components, consisting of various combinations of knowledge, skills, and motivations
Tests of verbal and numerical reasoning have some predictive validity across different jobs; however, additional variables need to be measured
Special aptitude tests – for testing abilities that are "supplemental" to IQ, such as mechanical, musical, etc. 
o For abilities specific to situations, not included in standard batteries
o Example: psychomotor tests – manual dexterity, motor, perceptual, mechanical abilities
o Mechanical aptitudes – rapid manipulation of items, spatial manipulation/perception
o Clerical aptitudes – perceptual speed, accuracy
o Computer-related aptitudes
o Social and emotional aspects of intelligence – knowledge, skills, and abilities of examinees in interpersonal relations and self-management

Personality Testing in the Workplace
Identify the personality dimensions most relevant to specific jobs
Examples:
o Emotional stability – quick decision-making under stressful conditions
o Agreeableness – needed for extensive interpersonal contact
Integrity tests – applicant's attitude toward, and history of involvement in, illegal behaviors
Leadership – ability to persuade others to work toward a common goal

Clinical and Counseling Psychology
Individual intelligence tests, educational tests, brief questionnaires, and rating scales
For diagnostic, prognostic, and therapeutic decisions in mental health settings
Psychological assessment – intensive study of one or more individuals through multiple sources of data
o Provides an integrated picture of the individual
o Multiple sources protect against overgeneralizing from test data
o Aim: making informed decisions pertaining to differential diagnosis, career selection, treatment, education, and forensic questions
o Continuous process of hypothesis generation and testing
o Involves professional judgment based on knowledge about the specific problems of specific populations
o Ecological viewpoint – also need to consider the context of the person's life
Explore patterns of test scores for strengths and weaknesses
Profile analysis considers:
o Amount of scatter or variation among scores
o Base-rate data – frequency of such features in the normative population
o Score patterns typical of special populations / clinical syndromes
Irregularities in performance suggest avenues for exploration
Observing general behavior in 
the context of testing
Intelligence tests
o Integrate the test's statistical information with human development, personality theory, etc.
o Consider both skills and extraneous conditions
o Need for supplementary information
o Call for individualized interpretation of test performance rather than uniform application of any type of pattern analysis
Neuropsychological Assessment
Apply what is known about brain-behavior relationships to the diagnosis and treatment of brain-damaged individuals
o E.g., left-hemisphere lesion = Verbal < Performance on the Wechsler; the opposite pattern appears in right-hemisphere lesions and diffuse brain damage
Age affects the behavioral symptoms caused by brain damage
o Amount of prior learning and intellectual development matters
o The younger the age, the greater the effect of brain damage on intellectual functioning
Chronicity – the amount of time elapsed since the injury affects physiological changes and behavioral recovery through learning/compensation
Intellectual impairment may be an indirect result of brain damage
The same behavior may be due to organic, emotional, or mixed causes
Premorbid ability level is needed to gauge the extent of damage
Instruments: tests of perception of spatial relations and of memory for newly learned material (example: Bender Visual-Motor Gestalt Test)
o Results are difficult to interpret in terms of score patterns
Batteries can measure all significant neuropsychological skills:
o Detect brain damage
o Help identify and localize the damaged area
o Differentiate among syndromes
o Plan rehabilitation by identifying the type and extent of behavioral deficits

Identifying Learning Disabilities
Specific learning disability
o Disorder in the basic psychological processes involved in using and understanding spoken or written language, which manifests in an imperfect ability to listen, think, speak, read, write, spell, or do mathematical calculations
o Does not include children whose learning problems result from economic, environmental, or cultural disadvantage
o Severe discrepancy between ability and achievement in different 
communication and math skills
o Not achieving in a manner commensurate with age and ability levels, even with proper education
o Shows normal or above-normal intelligence, with difficulties in learning one or more basic skills
o May also manifest in difficulty perceiving and encoding information, poor integration of input from different senses, and disruption of sensorimotor coordination
o Also: aggression, affective and interpersonal problems stemming from academic failures and frustration
Assessment uses different sources because:
o Various behavioral disorders are associated with LD
o Individuals differ in their combination of symptoms
o Specific information is needed on the nature and extent of the disability
Dynamic assessment – deliberate departure from standardized or uniform test administration to elicit qualitative data
o "Testing the limits" – additional cues are provided
o Learning potential assessment – teach-test-teach
o Disadvantages:
   Transportability – extent to which it can be used by others
   Generalizability of problem-solving to real-life problems

Behavioral Assessment
o Define the problem through functional analysis of behavior
o Select appropriate treatments
o Assess behavior change resulting from treatment
o Procedures: self-report, direct observation, physiological measures (for anxiety, sex, and sleep disorders)
Career Assessment
o Integrate information from expressed interests, preferences, and the value system
o Career maturity – mastery of the vocational tasks appropriate to one's age level
Clinical Judgment
o Influenced by cultural stereotypes and fallacious prediction principles
o Used when satisfactory tests are unavailable
o Suited for cases that are rare and idiosyncratic, where frequency is too low for the development of statistical strategies
o Psychologists with low levels of cognitive complexity are more likely to form biased clinical judgments
The Assessment Report
o There is no standard form or outline
o The report must adapt to the needs, interests, and background of those who will receive it
o Should select what is relevant to 
answering questions
o Concentrate on the individual's differentiating characteristics rather than on traits on which the individual is average
o Barnum effect – pseudo-validation from general, vague statements that apply to most people

ETHICAL AND SOCIAL CONSIDERATIONS
1. Do no harm.
   a. Provide services and use techniques only in areas in which they have been trained
   b. Choose tests that are appropriate for the purpose and for the examinee
   c. Recognize the boundaries of competence and the limitations of expertise
2. Be sufficiently knowledgeable about the science of human behavior to guard against unwarranted inferences in interpretation.
3. Protect the safety and security of test materials.
4. Protect the safety and security of examinees.
   a. Persons should not be subjected to testing programs under false pretenses
   b. Protect the individual's privacy
      i. Information asked for must be relevant to the purpose of testing
      ii. Informed consent should include the purpose of testing, the data needed, and the use of scores
   c. The test-taker should have the opportunity to comment on the report
      i. The report should be readily understandable, free from technical jargon and labels, and oriented toward the immediate objective of testing
   d. Records should not be released without the knowledge or consent of the examinee, unless mandated by law