Joni L Mihura: The Validity of Psychological Tests as Measures of Aggressive Behavior: A Review of the Literature

E.M. Farrer & J.L. Mihura
University of Toledo

Abstract

This study reviews the empirical literature on the validity of psychological tests as measures of aggressive behavior. The psychological tests were categorized into two groups: (a) self-report questionnaires (e.g., BDHI, JI, PAI) and (b) performance personality tests (e.g., Rorschach and Hand Test). For the criterion variable of aggressive behavior, only studies using observational measures are included in the review (e.g., ward reports, patient records, chart reviews). The effect sizes of the psychological tests compared to observational measures are presented and then compared using a monotrait-multimethod approach. Also of interest are similar studies using a multitrait-multimethod approach, comparing and contrasting similar constructs (e.g., aggression, anger, antisocial behavior) using the same or different methods (self-report measures, performance personality tests, and observational measures). The goal of the study is therefore twofold: (1) to review the literature on the validity of psychological tests as measures of aggressive behavior and (2) to place this aggression literature in a psychometric context regarding more general issues of monomethod-heteromethod approaches to validity.

Method cont.

The observational measures needed to be clear in how they measured aggressive behavior. Records of past aggressive behavior (e.g., chart reviews, criminal file reviews) also had to have a well-defined way of measuring aggressive behavior, including "objective" systems such as the number of institutional infractions for forensic samples. Only studies written and conducted in English and free of clear criterion contamination (e.g., with behavior ratings made blind to psychological test data) were included.
For comparative purposes, the findings are reported as effect sizes, converted where necessary to Pearson r as the common metric. As a rule of thumb, the magnitude of an effect size (r) can be classified as (a) small = .10, (b) moderate = .30, and (c) large = .50.

Table 2. Aggressive Behavior as Measured by Performance Personality and Self-Report Tests: Summary Statistics

Measurement Method              k     N    rw
Performance Personality Test    3   246   .31
Self-Report Tests               5   498   .27
Total Tests                     8   744   .29

Note: k = number of effect sizes included in the summary statistic; rw = mean r weighted by N

Figure 1. Multitrait-Multimethod Table

Different measurement method, moderate construct overlap: e.g., self-report anger measure compared to aggressive behavior.
Different measurement method, high construct overlap (major study question): self-report or performance personality aggression measures compared to aggressive behavior.
Same measurement method, moderate construct overlap: e.g., self-report anger measure compared to self-report aggression measure.
Same measurement method, high construct overlap: e.g., self-report aggression measure compared to self-report aggression measure.

Results cont.

Table 3. MTMM Results: Weighted Mean Effect Sizes

                      Construct Overlap
Measurement Method    Moderate                  High
Different             .16 (k = 8, N = 924)      .28 (k = 9, N = 814)
Same                  .46 (k = 6, N = 1,771)    .77 (k = 4, N = 371)

Introduction

In psychology, a general picture of a person's level of functioning and personality is most often obtained from the person him- or herself. This can be done with self-report measures, such as the Personality Assessment Inventory (PAI), or with performance personality tests, such as the Rorschach. These measures, however, rely heavily on the respondent as the source of information, whereas behavior measures rely on others as the source of information.

Many self-report measures used for screening are broadband inventories such as the PAI or the Jesness Inventory (JI). Several are also designed specifically to measure the construct of interest. The construct of particular interest to this review is aggression.
Aggression can be defined as "the act or practice of attacking without provocation" (Coccaro et al., 1997). Aggression can be verbal or physical and, for this study, is directed outwardly. Reliance on self-report measures and performance personality tests of aggression is of particular interest because of the consequences that could follow if the aggression is carried out. How well can self-report measures and performance personality tests designed to measure aggression actually predict aggressive behavior? Further, related constructs such as anger and antisocial behavior carry similar implications. How well do tests specifically measuring those related constructs predict aggressive behavior? Also, how well do the same constructs measured by different methods compare? This multitrait-multimethod approach is of particular interest to the study. According to Campbell and Fiske (1959), measures of the same construct obtained by different methods should agree, and should agree better than measures of different constructs obtained by different methods.

Thus, the study has two goals. The first is to review how self-report and performance personality measures of aggression compare to observable behavior. The second is to compare similar but slightly different constructs to themselves and to each other using the same and different methods.

Method

Studies were located by conducting a PsycINFO search of articles published within the past 30 years with Aggressive Behavior, Antisocial Behavior, or Violence as Subject terms. The results were further limited by a classification code of personality scales and inventories or clinical psychological testing. This limit was applied because the search without the classification code retrieved a high volume of articles (19,651); it reduced the number of articles to 387. The remaining articles were kept or eliminated based on the following criteria.
The tests in a study had to include a self-report or performance personality measure of aggression, anger, or antisocial behavior. The study also had to use an observational measure that could be correlated with the self-report or performance personality test.

Results

For studies that reported more than one effect size, the effect sizes were averaged so that each study contributed one effect size.

Table 1. Aggressive Behavior as Measured by Performance Personality and Self-Report Tests

Study   Measures                       Sample     N     r
Performance Personality Tests
3.      Rorschach AgC                  C         94   .27
14.     Hand Test AOS&ACT-MOV          MR        36   .57
13.     Hand Test AOS&ACT-MOV          MR       116   .27
Self-Report Tests
18.     PAI AGG                        F        169   .26
4.      Aggression Questionnaire PA    S         91   .33
17.     BDHI Assault                   F         60   .26
15.     BDHI Assault                   C         51   .40
7.      PAI AGG-P                      F        127   .20

Note: C = Clinical; MR = Mentally Retarded; F = Forensic; S = Student

Table 1 shows the effect sizes for performance personality tests and self-report measures compared with observed behavior; they range from .20 to .57. Summary statistics were also computed for the self-report and performance personality test effect sizes by taking the mean of each grouping of tests, weighted by N. As shown in Table 2, both performance personality and self-report tests had overall medium effect size relationships with aggressive behavior (r = .31 and r = .27, respectively).

Table 3 shows what happens to the effect sizes when slightly different constructs are measured using different methods and when the same constructs are measured by the same methods. Again, the effect sizes in the table are weighted to compensate for the varying sample sizes.

Discussion

According to Jacob Cohen (1988), "…when one looks at near-maximum correlation coefficients, of personality measures…with real-life criteria, the values one encounters fall at the order of r = .30." This corresponds to the findings above.
The values here were obtained using personality measures whose constructs were the same as, or highly overlapping with, the real-life criteria in question. Self-report and performance personality measures also showed similar effect sizes (r = .27 and r = .31, respectively).

The data also correspond quite well to Campbell and Fiske's (1959) MTMM approach. Measuring moderately overlapping constructs with different methods yielded a low effect size (r = .16). With high construct overlap and the same method, on the other hand, the correlation was quite high (r = .77), about what one would expect for test-retest reliability. This also corresponds with Meyer et al.'s (2001) finding that a single measure represents only a certain portion of one's personality and that different sources of information tend to provide their own unique interpretation of someone's personality or behavior.

Further work remains. Performance personality tests and self-report measures were compared to each other, to behavioral measures, and to themselves; the next step would be to see how well observational measures compare to themselves.

References

1. Campbell, D.T., & Fiske, D.W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.
2. Cohen, J. (1988). Set correlation and contingency tables. Applied Psychological Measurement, 12(4), 425-434.
3. Meyer, G.J., Finn, S.E., Eyde, L.D., Kay, G.G., Moreland, K.L., Dies, R.R., Eisman, E.J., Kubiszyn, T.W., & Reed, G.M. (2001). Psychological testing and psychological assessment. American Psychologist, 56(2), 128-165.

See Handout for the List of Reviewed Studies.
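
Worked example: N-weighted mean effect sizes

The Results section describes the summary statistics as the mean of each grouping of tests weighted by N. The following is a minimal sketch of that arithmetic, using the N and r values reported in Table 1. It assumes a simple N-weighted mean of the raw r values (no Fisher z transformation), which the poster does not state explicitly; the names TABLE_1, weighted_mean_r, and summarize are illustrative and not part of the original study.

```python
# Effect sizes transcribed from Table 1 as (group, N, r).
# Group labels are illustrative names, not terms from the poster.
TABLE_1 = [
    ("performance", 94, 0.27),   # 3.  Rorschach AgC
    ("performance", 36, 0.57),   # 14. Hand Test AOS&ACT-MOV
    ("performance", 116, 0.27),  # 13. Hand Test AOS&ACT-MOV
    ("self_report", 169, 0.26),  # 18. PAI AGG
    ("self_report", 91, 0.33),   # 4.  Aggression Questionnaire PA
    ("self_report", 60, 0.26),   # 17. BDHI Assault
    ("self_report", 51, 0.40),   # 15. BDHI Assault
    ("self_report", 127, 0.20),  # 7.  PAI AGG-P
]


def weighted_mean_r(rows):
    """Mean of r weighted by sample size N (assumption: raw r, no z transform)."""
    total_n = sum(n for _, n, _ in rows)
    return sum(n * r for _, n, r in rows) / total_n


def summarize(rows, group=None):
    """Return (k, N, weighted mean r rounded to 2 places) for one group or all rows."""
    selected = [row for row in rows if group is None or row[0] == group]
    k = len(selected)
    total_n = sum(n for _, n, _ in selected)
    return k, total_n, round(weighted_mean_r(selected), 2)


if __name__ == "__main__":
    print(summarize(TABLE_1, "performance"))  # (3, 246, 0.31)
    print(summarize(TABLE_1, "self_report"))  # (5, 498, 0.27)
    print(summarize(TABLE_1))                 # (8, 744, 0.29)
```

Under this assumption the sketch returns the k, N, and rw values listed in Table 2, which suggests the table was built from the Table 1 rows in this way.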