Evaluating Psychological Tests

Psychological testing
• Suffers a credibility problem in the eyes of the general public
• Two main problems
  – Tests used inappropriately
    • Goddard (1912) used a translation of Binet’s test to assess the ability of American immigrants and concluded that 79% of Italian immigrants were ‘feeble-minded’ (a biased use of the test)
  – Tests themselves can be flawed
    • They often measure supposed constructs that are not supported by proper factor analysis (e.g., internal locus of control)

External bias in tests
• Do group differences imply test bias (difficulty unrelated to the characteristic being assessed)?
  – View 1: innate abilities can differ across groups (Reynolds, 1995; Kline, 1993)
    • Japanese people have higher-than-average spatial abilities
    • African Americans have ‘lower IQ’ (Herrnstein & Murray, 1996)
  – View 2: ethnic and gender groups must have the same underlying abilities, so evidence to the contrary must be a product of measuring something other than what is relevant
    • Kline calls this view the ‘egalitarian fallacy’

Dealing with differences
• External bias is detected through different regression equations for each group, not through different means
• What purpose does research in this area serve?
  – Within-group differences far outweigh between-group differences

Detecting internal bias
• If only gross (total) scores are considered, items that are hard for one group and easy for the other might balance out, giving a false impression of the test’s ‘health’
• Alternative: run a mixed factorial ANOVA
  – Each test item (question) is entered as a level of a repeated-measures factor
  – Group is the between-subjects variable
    • Main effect of item: expected (items differ in difficulty)
    • Main effect of group: shows external bias
    • Item × group interaction: shows internal bias, because the pattern of responding differs across the groups
  – The method is susceptible to power manipulation: with a large enough sample, even trivial interactions become significant

Bias - performance characteristics
• Response bias: individuals are more likely to agree than disagree (Cronbach, 1946)
  – The response set of acquiescence
    • Does not cause a problem if everyone behaves in the same manner, because standard scores will be unaffected
    • But there are considerable individual differences in acquiescence, so it can cause a major problem
  – Changing the polarity of items removes this difficulty
• Social desirability
  – Counteracted by lie scales and consistency measures

Obvious influences
• Motivation
• Expectation
• Anxiety
• Test-specific practice

Revisiting Validity

Validity – different definitions
• Correctness or truth of an inference
• Validity with respect to the IV
  – Are we truly manipulating what we think we are?
    • Often relies on the construct of interest being adequately described
    • How do you manipulate something like the unconscious?
• Validity with respect to the DV
  – Extent to which you are measuring what you claim to measure

Different types of validity
• Content validity
  – Whether the target construct is adequately addressed
  – When measuring depression, the test should assess aspects such as fatigue, anxiety, appetite, motivation, and libido
  – Assessed through expert opinion, so it has a certain amount of subjectivity

Different types of validity
• Criterion-related validity
  – How the measure compares with some already validated measure
  – Two types
    • Predictive (criterion measured at a later time)
    • Concurrent (criterion measured at the same time)

Different types of validity
• Construct validity
  – Most important
  – Are the experimental manipulations we make really manipulating the construct of interest?
  – Evaluation requires
    • A clear definition of the construct, which can be difficult (e.g., IQ has many different facets)
    • Assessing the match between the construct and the operations used to represent it (experimental manipulations)
  – Can involve criterion and content validity
  – Viewed as an evolving, never-ending process

Different types of validity
• Internal validity: the degree to which the independent and dependent variables are causally linked
• External validity: the degree to which the causal relationship holds across different settings

How relevant is validity to you?
• Reviewing articles essentially means addressing validity and reliability issues
  – In an examination, it would be useful, although not essential, to discuss the different forms of validity
• In the discussion sections of reports, you are again essentially evaluating the results with respect to validity and reliability
  – You would not really use the formal language used here; this is a style issue
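The mixed-ANOVA check for internal bias described under ‘Detecting internal bias’ can be sketched in Python. This is a minimal illustration under stated assumptions, not a procedure taken from the sources cited above: the function name `mixed_anova_item_bias`, the data layout, and the scores are all hypothetical, and the sketch assumes a balanced design (equal numbers of respondents per group, everyone answering every item). It partitions the sums of squares into group, item, and item × group effects and returns each F ratio with its degrees of freedom.

```python
from statistics import fmean


def mixed_anova_item_bias(groups):
    """Hypothetical sketch: items (within subjects) x group (between subjects).

    groups maps a group label to a list of respondents; each respondent is a
    list of item scores, with the same items in the same order for everyone.
    Returns {effect: (F, df_numerator, df_denominator)}.
    Assumes a balanced design (equal numbers of respondents per group).
    """
    if len({len(members) for members in groups.values()}) != 1:
        raise ValueError("this sketch assumes equal group sizes")
    labels = list(groups)
    a = len(labels)                                   # number of groups
    b = len(groups[labels[0]][0])                     # number of items
    n = len(groups[labels[0]])                        # respondents per group
    subjects = [s for g in labels for s in groups[g]]
    n_subj = len(subjects)
    scores = [x for s in subjects for x in s]
    gm = fmean(scores)                                # grand mean

    ss_total = sum((x - gm) ** 2 for x in scores)

    # Between subjects: group effect vs. respondents-within-groups error
    ss_subjects = b * sum((fmean(s) - gm) ** 2 for s in subjects)
    ss_group = sum(
        b * n * (fmean([x for s in groups[g] for x in s]) - gm) ** 2
        for g in labels
    )
    ss_subj_err = ss_subjects - ss_group

    # Within subjects: item effect, item x group interaction, residual error
    item_means = [fmean([s[j] for s in subjects]) for j in range(b)]
    ss_item = n_subj * sum((m - gm) ** 2 for m in item_means)
    ss_cells = sum(
        n * (fmean([s[j] for s in groups[g]]) - gm) ** 2
        for g in labels
        for j in range(b)
    )
    ss_interaction = ss_cells - ss_group - ss_item
    ss_item_err = (ss_total - ss_subjects) - ss_item - ss_interaction

    df_group, df_subj_err = a - 1, n_subj - a
    df_item, df_inter = b - 1, (b - 1) * (a - 1)
    df_item_err = (b - 1) * (n_subj - a)
    ms_subj_err = ss_subj_err / df_subj_err
    ms_item_err = ss_item_err / df_item_err

    return {
        # Group main effect: difference in overall score level between groups
        "group": (ss_group / df_group / ms_subj_err, df_group, df_subj_err),
        # Item main effect: items differ in difficulty
        "item": (ss_item / df_item / ms_item_err, df_item, df_item_err),
        # Interaction: the pattern of responding across items differs by group
        "item_x_group": (ss_interaction / df_inter / ms_item_err,
                         df_inter, df_item_err),
    }


if __name__ == "__main__":
    # Hypothetical scores: equal group totals but opposite item patterns
    data = {
        "A": [[5, 4, 2, 1], [4, 4, 1, 1], [5, 3, 2, 2]],
        "B": [[1, 2, 4, 5], [1, 1, 4, 4], [2, 2, 3, 5]],
    }
    for effect, (f, df1, df2) in mixed_anova_item_bias(data).items():
        print(f"{effect}: F({df1}, {df2}) = {f:.2f}")
```

With these made-up scores both groups have identical total scores, so the group F is 0, yet the item × group interaction is large (F(3, 12) ≈ 45.33). This is the pattern that looking only at gross scores would hide. The sketch returns F ratios only; p-values could be obtained from the F distribution, for example with `scipy.stats.f.sf(F, df1, df2)`.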