Solving Classification Problems for Symptom Validity Tests with Mixed Groups Validation

Richard Frederick, Ph.D., ABPP (Forensic)
US Medical Center for Federal Prisoners
Springfield, Missouri

I am not a neuropsychologist.
[Figures: My view of brain; Your view of brain]

My board certifications:
Forensic Psychology, American Board of Professional Psychology
Assessment Psychology, American Board of Assessment Psychology

My professional goal: Use tests properly in forensic psychological assessments

Goals of workshop
Participants in this workshop will be able to employ Excel graphing methods:
--to evaluate classification characteristics of symptom validity tests
--to adapt symptom validity test scores to their individual, local, base rates
--to combine information from local base rate and multiple symptom validity tests

richardfrederick.com

Something is terribly wrong
1. The SIRS has sensitivity = .485 and specificity = .995.
2. The SIRS was administered to 131 criminal defendants who were strongly suspected of feigned psychopathology. 68% of them were categorized as feigning by the SIRS.

What is a classification test?
A structured routine for determining which individuals belong to which of two groups.
(1) There are two groups.
(2) It's not easy to determine which group an individual belongs to without the help of the test.

Real World
The distributions represent our estimations of how the populations of the two groups score on the test. We generally estimate the population distributions by sampling. We notice that the populations have two separate, but overlapping, distributions. The extent of the overlap is of concern to us.

Questions that must be addressed in research before we can continue:
(1) Are there really two separate groups?
(2) Can we effectively represent the population distributions by sampling?

Real World
What we notice next: The mean separation between the groups is 10 points. Persons in Population A have a mean score that is 10 points below persons in Population B.
The sd for each population is the same. The mean separation between groups is one sd. When researchers talk about mean separation, they often refer to effect size. Often, Cohen's d is the statistic used to refer to standardized mean separation. Here, Cohen's d = 1. This is often referred to as a large, or very large, effect size.

Making tests often means finding those characteristics that best separate the distributions of the two groups.

[Figure series, captions in slide order: Mean separation = 0. Two distributions of gender with respect to: Intelligence. Moderately large mean separation. Two distributions of gender with respect to: Longevity. Large mean separation. Two distributions of gender with respect to: Hair Length. Very large mean separation. Two distributions of gender with respect to: Body Mass.]

Real World
Summary:
(1) We have two groups.
(2) We have a test for which the two groups score differentially.
(3) The difference in mean scores represents a very large effect.

Foundations of TPR and FPR
More commonly, researchers report Sensitivity and Specificity. These terms are common, but not the most helpful. We are going to use the terms True Positive Rate (TPR) and False Positive Rate (FPR).
TPR = Sensitivity
FPR = 1 - Specificity

What are TPR and FPR?
TPR is the proportion of individuals who do have the condition who generate positive scores. Among those who have the condition, TPR is the rate of scores beyond the cut in the direction that indicates the presence of the condition.
FPR is the proportion of individuals who do NOT have the condition who generate positive scores. Among those who do not have the condition, FPR is the rate of scores beyond the cut in the direction that indicates the presence of the condition.

[Figure: two overlapping distributions labeled Have nots and Haves, with a green vertical line.]
The green line represents the cut score. Scores to the LEFT of the line are classified NEGATIVE. Scores to the right are classified POSITIVE. Here, the False Positive Rate is 92.4%. The True Positive Rate is 100%. As we move the line to the right, both rates DECREASE.
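The effect of moving the cut can be sketched numerically. The block below is only an illustration under stated assumptions: both groups are taken to be normally distributed with sd = 10, the have-nots are centered at 50 and the haves at 60 (one sd apart, as in the running example), and scores at or above the cut are called positive. The function names are hypothetical, not part of the workshop materials.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def rates_at_cut(cut, mean_have_nots=50.0, mean_haves=60.0, sd=10.0):
    """Return (TPR, FPR) when scores at or above `cut` are positive.

    The means and sd are illustrative assumptions, not estimates.
    """
    tpr = 1.0 - normal_cdf(cut, mean_haves, sd)      # haves beyond the cut
    fpr = 1.0 - normal_cdf(cut, mean_have_nots, sd)  # have-nots beyond the cut
    return tpr, fpr

# Slide the cut from left to right; both rates fall together.
cuts = [40, 50, 60, 70, 80]
table = [(c, *rates_at_cut(c)) for c in cuts]
for c, tpr, fpr in table:
    print(f"cut={c}  TPR={tpr:.3f}  FPR={fpr:.3f}")
```

With these assumed distributions, FPR only approaches zero at cuts where TPR is also small, which is the trade-off the slide describes.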
To totally eliminate false positives, we have to be willing to identify almost no one as a positive.

                        Truth: Have disorder    Truth: Don't       Total
Test: Has disorder      True Positives          False Positives    Positives
Test: Doesn't           False Negatives         True Negatives     Negatives
Total                   Haves                   Have Nots

TPR = True Positives/Haves
FPR = False Positives/Have Nots

[Figure: two overlapping distributions labeled Haves and Have nots.]

A positive score will be one that is associated with Population A membership. If we set a point at which a score will be used to say, "This score represents Population A," such a score will be referred to as a "positive score." A positive score can be a true positive or a false positive; which one is unknown to us.

The True Positive Rate is the proportion of Population A members who generate a positive score. In our figure, the point at which we begin to identify "positive scores" is at 50, the mean of Population A. Scores at or below 50 are called positive, and a person who generates a positive score is classified as a Population A member. We can pick any value to be our "cut score," but it's hard to pick one that doesn't result in some Population B members producing "positive scores."

In our figure, 50% of the Population A members have scores at 50 or below. This is the True Positive Rate. TPR = .50.
In our figure, 16% of the Population B members have scores at 50 or below. This is the False Positive Rate. FPR = .16.

We note that it is not the test that has a certain TPR and FPR. It is the chosen test score that has a certain TPR and FPR. A different test score will almost certainly have a different TPR and FPR.

Overcoming limiting factors of "known groups" validation in determining test score sensitivity and specificity

We think of a test as a way to characterize a dependency. As you have more of X, you have more of Y. Y depends on X. X predicts Y. X is some construct. Y is some test score. There is a relationship that we wish to characterize and quantify. Let's consider feigning.
As you are more likely to feign, you are more likely to engage in certain behavior. This behavior might be "providing answers to items on a test" at a certain rate. You might choose more items, or you might choose fewer items, than "normals." We develop the idea that we can identify individuals who respond at a certain rate as feigners, and we decide to make a decision point about when we call test takers feigners and when we don't. We call that decision point a cut score.

We call test scores at or beyond the cut score: positive scores.
Some positive scores are correct: true positives.
Some positive scores are incorrect: false positives.
If our test is any good, and if the relationship between X and Y is strong, then our rate of true positives is much higher than our rate of false positives.

Let's skip to the end. We are now using the test in our clinic. We look over our results. We see a number of "positive scores." We know that those "positive scores" are some unknown mixture of "true positives" and "false positives." We'd like to know what the ratio of that mixture is.

Here's how we do it:
First, we estimate what the true positive rate of the cut score is.
Then, we estimate what the false positive rate of the cut score is.
Then, we figure out what percentage of people in our sample are feigning.
Then we can get the ratio of the mixture of our true positives and false positives in all the positive scores in our clinic. (We call this positive predictive power.)

Getting TPR and FPR:
We depend on researchers to tell us what the estimates of true positive rate and false positive rate are. They usually do this through a process called "criterion groups validation." People with more confidence than might be called for refer to this process as "known groups validation."

The process is seemingly straightforward. Identify two groups. One group has the condition. All the positives in this group are "true positives." One group doesn't have the condition.
All the positives in this group are "false positives."
The rate of "true positives" is the sensitivity of the test. TPR = sensitivity.
The rate of "false positives" is the non-specificity of the test. FPR = 1 - specificity.

There are many problems with this process, but let's focus on the main two.

Problem 1: In Study 1, for a given cut score, researchers report the TPR is .67 and the FPR is .12. In Study 2, for the same cut score, researchers report TPR = .58 and FPR = .09. Which values do you use?

Problem 2: In Study 1, for a given cut score, researchers report the TPR is .67 and the FPR is .12. In Study 2, for a different cut score, researchers report TPR = .58 and FPR = .09. Which cut score do you use?

"Known" groups validation
Let's validate a test! God whispers to us what truth is and we identify 100 honest responders and 100 feigners. We take our best shot at a test.

Test results
                  TRUTH: Feigners   TRUTH: Honest   Total
TEST: Positive    49                1               50
TEST: Negative    51                99              150
Total             100               100

We say for our test:
True positive rate = 49/100 = 49% [sensitivity = 49%]
False positive rate = 1/100 = 1% [specificity = 99%]

Because God does not whisper to us anymore, we take this test, our best test, and we say, "This is the best we can do." Let's call it our Gold Standard. We will now make criterion groups with this test, and we will call the groups "Known Groups." We will then validate tests, based on these Known Groups.

Our move from TRUTH to KNOWN GROUPS
                  "KNOWN" Feigners   "KNOWN" Honest   Total
TRUTH: Feigners   49                 51               100
TRUTH: Honest     1                  99               100
Total             50                 150

We forget what truth is and develop faith in our gold standard. All that remains are the "KNOWN" GROUPS: 50 and 150.

Let's validate a new test, which just happens to be a perfect test. What test diagnostic efficiencies will we assign our new, perfect, test?
                          "KNOWN" Feigners   "KNOWN" Honest   Total
PERFECT TEST: Positive    49                 51               100
PERFECT TEST: Negative    1                  99               100
Total                     50                 150

TPR = 49/50 = 98%, FPR = 51/150 = 34%

Our belief that we can make perfect criterion groups from imperfect criteria has led us to misunderstand tremendously what we are doing. Let's begin to address these problems in a non-traditional way.

Table for Computation of Test Characteristics
                  Positive (Feigners)   Negative (Not Feigning)
Test Positive     80%                   10%                       Computation for Positive Predictive Power
Test Negative     20%                   90%                       Computation for Negative Predictive Power
                  Sensitivity = 80%     Specificity = 90%

The same table, restated in the terms we will use:
PPP = Ratio of True Positives to All Positives
NPP = Ratio of True Negatives to All Negatives
True Positive Rate (TPR) = 80%
False Positive Rate (FPR) = 10%

Table for Computation of Test Characteristics
Base Rate of Feigning   100%   0%
Test Positive           80%    10%
Test Negative           20%    90%
True Positive Rate (TPR) = 80%
False Positive Rate (FPR) = 10%
NOTE: Calculations of TPR and FPR are INDEPENDENT of Base Rate

Keeping only the positive row, and then expressing it as proportions:
Base Rate of Feigning       1.00   0
Proportion Tests Positive   .80    .10
True Positive Rate (TPR) = .80
False Positive Rate (FPR) = .10

The Test Validation Summary
[Figure: Proportion Positive Scores on Classification Test (y-axis, 0 to 1) against Base Rate of Feigning (x-axis, .00 to 1.00). The False Positive Rate is marked at BR = 0 and the True Positive Rate at BR = 1.]

REMINDER: Here is what we are working on: figuring out which positives in our clinic are true positives.
First, we estimate what the true positive rate of the cut score is.
Then, we estimate what the false positive rate of the cut score is.
Let's do that part now.
Then, we figure out what percentage of people in our sample are feigning.
Then we can get the ratio of the mixture of our true positives and false positives in all the positive scores in our clinic.

Mixed groups validation

Table for Computation of Test Characteristics
Base Rate of Malingering   0     .2    .4    .6    .8    1
Pr + Tests                 .1    .24   .38   .52   .66   .8
FPR = .1 (at BR = 0)
TPR = .8 (at BR = 1)
(.8, .6, .4, .2 are mixed groups, not pure)

The Test Validation Summary
[Figure: Proportion Positive Scores on Classification Test (y-axis) against Base Rate of Malingering (x-axis), running from the False Positive Rate at BR = 0 to the True Positive Rate at BR = 1.]
When BR = 1, 80% of scores are positive, and all are true positives.
When 0 < BR < 1, positive scores are some mixture of true positives and false positives. That mixture is easily discernible.
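The row of proportions in the mixed groups table follows from weighting TPR and FPR by the base rate. The sketch below assumes the slide's illustrative values (TPR = .80, FPR = .10); the function name is hypothetical.

```python
def positive_score_mixture(base_rate, tpr=0.80, fpr=0.10):
    """Split the expected proportion of positive scores in a mixed group.

    Returns (true_positive_part, false_positive_part, proportion_positive).
    """
    true_part = base_rate * tpr             # feigners who score positive
    false_part = (1.0 - base_rate) * fpr    # non-feigners who score positive
    return true_part, false_part, true_part + false_part

base_rates = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
pps_row = [round(positive_score_mixture(br)[2], 2) for br in base_rates]
for br in base_rates:
    tp, fp, pps = positive_score_mixture(br)
    print(f"BR={br:.1f}  true part={tp:.2f}  false part={fp:.2f}  PPS={pps:.2f}")
```

Because the proportion of positive scores equals FPR + BR x (TPR - FPR), it is a straight line in BR, which is why the two endpoints can be estimated from mixed groups alone.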
When BR = 0, 10% of scores are positive, and all are false positives.

When we say FPR = .16 and TPR = .50, what we're saying is that, no matter what samples we test, we expect to see no fewer than 16% positive scores and no more than 50% positive scores.
[Figure: Proportion Positive Scores in Sample (y-axis, 0 to 1) against Base Rate of Population A in Sample (x-axis, 0 to 1); FPR = .16, TPR = .50.]
Movement along this line from left to right represents increasing rate of Population A and increasing rate of positive scores.

The Test Validation Summary
FPR is the proportion of positive scores obtained when BR = 0.
TPR is the proportion of positive scores obtained when BR = 1.
The BR of the condition varies moving along the solid straight line as the proportion of positive scores increases from FPR to TPR.
[Figure: Proportion Positive Scores on Classification Test (y-axis) against Base Rate of Condition (x-axis), with NPP and PPP curves; FPR = .10, TPR = .80.]

The Test Validation Summary
The proportion of positive scores rises linearly with base rate, and the mixture moves from 0% true positives (at BR = 0) to 100% true positives (at BR = 1). The share of positive scores that are true positives (PPP), however, does not change linearly. PPP changes in a non-linear, or curvilinear, fashion.
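The curvature of PPP and NPP can be checked directly from their definitions as ratios. This is a minimal sketch using the slide's example values (TPR = .80, FPR = .10); the function names are hypothetical.

```python
def ppp(base_rate, tpr=0.80, fpr=0.10):
    """Positive predictive power: true positives / all positives."""
    all_positives = base_rate * tpr + (1.0 - base_rate) * fpr
    return base_rate * tpr / all_positives

def npp(base_rate, tpr=0.80, fpr=0.10):
    """Negative predictive power: true negatives / all negatives."""
    all_negatives = base_rate * (1.0 - tpr) + (1.0 - base_rate) * (1.0 - fpr)
    return (1.0 - base_rate) * (1.0 - fpr) / all_negatives

for br in [0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0]:
    print(f"BR={br:.2f}  PPP={ppp(br):.3f}  NPP={npp(br):.3f}")
```

If PPP were linear in base rate, its value at BR = .5 would be .5; under these assumed rates it is considerably higher, because the denominator (all positives) also changes with base rate.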
Table for Computation of Test Characteristics
Base Rate of Malingering   0     .2    .4    .6    .8    1
Pr + Tests                       .24   .38   .52   .66
(Only the mixed groups are observed; the values at BR = 0 and BR = 1 are not.)

The Test Validation Summary
[Figure: Proportion Positive Scores on Classification Test (y-axis) against Base Rate of Malingering (x-axis), with the False Positive Rate at BR = 0 and the True Positive Rate at BR = 1.]

[Figure: Proportion Positive TOMM Scores (y-axis) against Estimated Base Rate of Malingering (x-axis). FPR = .052, SE = .021; TPR = .777, SE = .061.]

[Figure: TOMM, no simulation studies. Proportion Positive TOMM Scores (y-axis) against Estimated Base Rate of Malingering (x-axis), with NPP and PPP curves. FPR = .056, SE = .025; TPR = .742, SE = .093.]

For any imperfect test:
PPP ranges from 0 to 1 as base rate ranges from 0 to 1.
NPP ranges from 0 to 1 as base rate ranges from 1 to 0.

Using MGV to estimate test diagnostic efficiencies of the Reliable Digit Span
Laurie Ragatz, PhD
Richard Frederick, PhD

What is Reliable Digit Span?
RDS is a symptom validity measure for Digit Span. The value of RDS is derived by adding the longest strings for which both trials were passed, for both forward and backward Digit Span. Researched cut scores include 5 or lower, 6 or lower, 7 or lower, or 8 or lower.
Reliable Digit Span Example:

Forward Digit Span (each trial is marked Correct or Incorrect)
1 4
2 5
5 7 1
8 3 4
5 9 4 6
7 2 3 9
Directions: Examinee recalls numbers in the same order they were provided by the examiner.

Backward Digit Span (each trial is marked Correct or Incorrect)
Example    Correct Answer
1 2        2 1
7 4        4 7
5 3 9      9 3 5
8 2 4      4 2 8
Directions: Examinee recalls numbers in the reverse order they were provided by the examiner.

Reliable Digit Span: 4 + 3 = 7

(1) We found all available articles dealing with RDS and identified the cut scores investigated. We included simulator studies.
(2) Based on the authors' decision about criterion group membership, we calculated the overall base rate of malingering in the study.
(3) We observed the overall rate of positive scores in the study at the identified cut score.
(4) We did not include any data for persons with mental retardation. The rate of positive scores among persons with mental retardation was exceedingly high for all cut scores.

Example: Smith (2010) reported 203 TOMMs at cut < 45.
                       Criterion group
Test outcome           Is malingering   Is not malingering   Total
Test score positive    42               15                   57
Test score negative    21               125                  146
Total                  63               140                  203

We have 63 malingerers in a sample of 203. BR = 63/203 = 0.31.
We have 57 positive scores. Proportion positive scores (PPS) is 57/203 = .28.
For this study, we plot (BR, PPS) = (.31, .28): x = .31, y = .28. Our n for WLS = 203.

RDS = 5 or lower
Study                                             N     BR      PPS
Meyers & Volbrecht                                96    0.490   0.052
Mathias, Greve, Bianchini et al. 2002             54    0.444   0.093
Etherton, Bianchini, Greve et al 2005             157   0.223   0.057
Etherton, Bianchini, Ciota, & Greve 2005          60    0.333   0.083
Axelrod, Fichtenberg, Millis, & Wertheimer, nd    65    0.554   0.215
Ylioga, Baird, Podell (2009)                      62    0.532   0.113
Harrison, Rosenblum, Currie (2010)                133   0.113   0.008

Using weighted least squares regression (with N as the weight), we regressed Proportion Positive Scores (PPS) on Base Rate (BR) to generate the Proportion Positive Score Line. We obtained a y-intercept of -.015 (all negative values are truncated to 0), and a slope of .265.

[Figure: scatterplot of the seven (BR, PPS) points with the fitted line. Put these data in WLS to obtain regression line characteristics.]
RDS: 5 or lower, FPR = 0, TPR = .265

RDS = 6 or lower
Study                                             N     BR      PPS
Duncan & Ausborn, 2002                            187   0.283   0.230
Meyers & Volbrecht                                96    0.490   0.094
Mathias, Greve, Bianchini et al. 2002             54    0.444   0.185
Etherton, Bianchini, Greve et al 2005             157   0.223   0.089
Strauss, Slick, Hunter, et al 2002                74    0.459   0.243
Etherton, Bianchini, Ciota, & Greve 2005          60    0.333   0.117
Axelrod, Fichtenberg, Millis, & Wertheimer, nd    65    0.554   0.354
Ylioga, Baird, Podell (2009)                      62    0.532   0.242
Harrison, Rosenblum, Currie (2010)                133   0.113   0.045
Babikian, Boone, Lu, & Arnold                     154   0.429   0.130
Greiffenstein & Baker (2008)                      87    0.775   0.368
y-intercept = .015, slope = .419
RDS: 6 or lower, FPR = .015, TPR = .434

RDS = 7 or lower
Study                                             N     BR      PPS
Duncan & Ausborn, 2002                            187   0.283   0.394
Meyers & Volbrecht                                96    0.490   0.260
Mathias, Greve, Bianchini et al. 2002             54    0.444   0.333
Etherton, Bianchini, Greve et al 2005             157   0.223   0.270
Inman & Berry, 2002                               92    0.478   0.130
Etherton, Bianchini, Ciota, & Greve 2005          60    0.333   0.133
Axelrod, Fichtenberg, Millis, & Wertheimer, nd    65    0.554   0.554
Ruocco, Swirsky-Sacchetti, Chute et al., 2007     77    0.041   0.338
Merten, Bossink, Schmand (first)                  48    0.500   0.458
Ylioga, Baird, Podell (2009)                      62    0.532   0.452
Harrison, Rosenblum, Currie (2010)                133   0.113   0.083
Greiffenstein, Baker, Gola (1994)                 106   0.406   0.396
Babikian, Boone, Lu, & Arnold                     154   0.429   0.234
Greiffenstein, Gola, Baker (1995)                 177   0.384   0.582
Greiffenstein & Baker (2008)                      602   0.492   0.419
y-intercept = .187, slope = .39
RDS: 7 or lower, FPR = .187, TPR = .618

RDS = 8 or lower
Study                                             N     BR      PPS
Meyers & Volbrecht                                96    0.490   0.458
Mathias, Greve, Bianchini et al. 2002             54    0.444   0.500
Etherton, Bianchini, Greve et al 2005             157   0.223   0.490
Etherton, Bianchini, Ciota, & Greve 2005          60    0.333   0.217
Axelrod, Fichtenberg, Millis, & Wertheimer, nd    65    0.554   0.754
Ylioga, Baird, Podell (2009)                      62    0.532   0.565
Harrison, Rosenblum, Currie (2010)                133   0.113   0.263
Greiffenstein, Baker, Gola (1994)                 106   0.406   0.557
Babikian, Boone, Lu, & Arnold                     154   0.429   0.377
y-intercept = .236, slope = .824
RDS: 8 or lower, FPR = .236, TPR = .824

As we move from a cut score of 5 or lower to 6 or lower, we obtain substantial improvement in the TPR estimate with little cost in FPR increase.

Our choice for best cut score for RDS
RDS: 6 or lower, FPR = .015, TPR = .434

Cut score     FPR            TPR
5 or lower    0 (.038)       .25 (.07)
6 or lower    .015 (.053)    .434 (.082)
7 or lower    .187 (.102)    .618 (.155)
8 or lower    .236 (.112)    .824 (.190)

By using WLS regression, we can obtain standard errors of our estimates of FPR and TPR. So, new researchers can test hypotheses about parametric values of FPR and TPR.
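The regression step can be sketched with the seven studies listed for the cut score of 5 or lower. This is a minimal closed-form weighted least squares fit with N as the weight. It is an illustration, not the presenters' actual procedure or software, and it does not compute standard errors. The function and variable names are hypothetical. The fit should land close to the reported intercept of about -.015 and slope of about .265.

```python
def weighted_least_squares(x, y, w):
    """Fit y = intercept + slope * x, minimizing sum(w * residual**2)."""
    total_w = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# RDS 5 or lower: study N, base rate (BR), proportion positive scores (PPS)
n = [96, 54, 157, 60, 65, 62, 133]
br = [0.490, 0.444, 0.223, 0.333, 0.554, 0.532, 0.113]
pps = [0.052, 0.093, 0.057, 0.083, 0.215, 0.113, 0.008]

intercept, slope = weighted_least_squares(br, pps, n)
fpr_hat = max(0.0, intercept)   # line value at BR = 0, negatives truncated to 0
line_at_br_1 = intercept + slope  # line value at BR = 1
print(f"intercept={intercept:.3f}  slope={slope:.3f}")
print(f"FPR estimate={fpr_hat:.3f}  line value at BR=1: {line_at_br_1:.3f}")
```

The same function could be applied to the tables for the other cut scores by swapping in their N, BR, and PPS columns.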
Overcoming limiting factors of "known groups" validation in determining test score sensitivity and specificity

Summary:
1. The TVS and MGV allow powerful research into existing published data sets. Summary data are used.
2. Understanding of parametric values of TPR and FPR is facilitated when researchers publish results on a variety of cut scores that should be considered. A frequency distribution would be ideal, for example:

RDS   n      RDS   n      RDS   n
0     5      5     7      10    88
1     0      6     51     11    74
2     0      7     68     12    61
3     1      8     79     13    32
4     3      9     98     14    12

3. Combining studies in this way allows us to generate stable values of TPR and FPR with SEs so that new research can test those values.
4. Researchers should focus on the basis for estimating BRs in their research groups. All research estimating FPR and TPR is vulnerable to error when the purity of research groups is overestimated. Working towards a reliable estimate of mixed group base rate will facilitate better validation studies.

Reliably estimate local base rates of feigning for proper allocation of sensitivity and specificity information

How can the Test Validation Summary help me determine my local BR?
1. Get the best estimate of the test FPR and TPR for a certain test score.
2. Find the proportion of test scores in your sample that are positive scores.

The Test Validation Summary
You review your records and determine that 40% of your patients have a positive score when the score has FPR = .10 and TPR = .80.
From the TVS, you see that this corresponds to a BR = .43. You see that in your clinic, the PPP for a positive score is .86 and the NPP for a negative score is .86.
[Figure: Proportion Positive Scores on Classification Test (y-axis) against Base Rate of Condition (x-axis), with NPP and PPP curves; FPR = .10, TPR = .80.]

From a sample, observe the rate of positive scores. Use the TVS to estimate the condition BR in that sample, and the PPP and NPP for that BR.
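Reading the base rate off the TVS amounts to inverting the straight line that runs from FPR to TPR. The sketch below works through the clinic example above (40% positive scores, FPR = .10, TPR = .80). The function names are hypothetical, and clipping the estimate to the range 0 to 1 is an assumption made here for robustness.

```python
def estimate_base_rate(prop_positive, fpr, tpr):
    """Invert PPS = FPR + BR * (TPR - FPR) to solve for BR."""
    br = (prop_positive - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, br))  # clip to [0, 1] (assumption)

def predictive_powers(base_rate, fpr, tpr):
    """Return (PPP, NPP) at the given base rate."""
    positives = base_rate * tpr + (1.0 - base_rate) * fpr
    negatives = 1.0 - positives
    ppp = base_rate * tpr / positives
    npp = (1.0 - base_rate) * (1.0 - fpr) / negatives
    return ppp, npp

local_br = estimate_base_rate(0.40, fpr=0.10, tpr=0.80)
local_ppp, local_npp = predictive_powers(local_br, fpr=0.10, tpr=0.80)
print(f"BR={local_br:.2f}  PPP={local_ppp:.2f}  NPP={local_npp:.2f}")
```

The quality of the result depends entirely on how good the FPR and TPR estimates are for the chosen cut score; the calculation itself adds no information.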
527 criminal defendants who took RMT and VIP concurrently
Rate of positive scores in this sample was .113
PPP = .814
1 - NPP = .077

[Figure: TOMM, no simulation studies. Proportion Positive TOMM Scores (y-axis) against Estimated Base Rate of Malingering (x-axis). FPR = .056, SE = .025; TPR = .742, SE = .093.]

Beth A. Caillouet, Bernice A. Marcopulos, Jesse G. Brand, Julie Ann Kent, & Richard I. Frederick

Question: What are the BRs of malingering in the two samples?
Information needed: Estimates of TOMM FPR and TPR. From the TOMM TVS, we get FPR = .056, TPR = .742.
Sample 1: Secondary gain present. Proportion positive scores = 55/220 = .25.
Sample 2: Secondary gain absent. Proportion positive scores = 34/299 = .11.
Use the TOMM TVS to estimate the BR of each sample.
When PPS = .25, BR = .28.
When PPS = .11, BR = .08.

Defensibly choose symptom validity cut scores that are ideally suited for their local base rates

M-FAST
TPR = .93, FPR = .17, BR malingering = 35%, N = 86

Step 1. Split the sample by base rate: Malingering = .35(86) = 30; Genuine = .65(86) = 56.
Step 2. Apply TPR and FPR to get the positives: Malingering = .93(30) = 28; Genuine = .17(56) = 9.52. The negatives are the remainders: 2 and 46.48.
Step 3. Round to whole persons and total the rows:

               Malingering   Genuine   Total
MFAST > 5      28            10        38
MFAST < 6      2             46        48
Total          30            56        86

TPR = 28/30 = .93
FPR = 10/56 = .18
BR malingering = .35
PPP = 28/38 = .737
NPP = 46/48 = .958

The same cut score at a lower base rate:

               Malingering   Genuine   Total
MFAST > 5      28            103       131
MFAST < 6      2             467       469
Total          30            570       600

TPR = 28/30 = .93
FPR = 103/570 = .18
BR malingering = .05
PPP = 28/131 = .213
NPP = 467/469 = .996

Test validation summary for M-FAST cut score recommended by test manual
[Figure: TVS with NPP, PPP, TPR, and 1 - FPR marked.]
M-FAST > 5: FPR = .17, TPR = .93
At the recommended cut score, FPR is very high.
PPP does not even reach 50% correct decisions until BR > .16.
At BR = .05, PPP does not exceed .50 until the cut score is adjusted to > 9 on the M-FAST.

Combining information from local base rate and multiple symptom validity tests

You can get estimates of PPP and NPP for the sample you work with, IF you can reliably estimate the BR.

737 defendants were administered:
Rey 15 Item Memory Test (RMT): memorize and reproduce 15 items. A very easy test. Score is items reproduced (0 to 15).
Word Recognition Test (WRT): memorize 15 words, identify those 15 and correctly reject 15 from a list of 30. Score is number of hits and correct rejections (0 to 30).

RMT validated using MGV with clinical probability judgments (Frederick & Bowden, 2009).
RMT < 9: FPR = .025, TPR = .574

We found 726 defendants who completed BOTH RMT and WRT. 81/726 failed the RMT = .111 proportion positive scores. By observation of the TVS, then, BR = .16, PPP = .814, NPP = .923.

If PPP = .814, then in this sample, the probability of feigning if RMT is positive is .814.
If NPP = .923, then in this sample, the probability of feigning if RMT is negative is .077, or 1 - .923.

To conduct MGV, we sampled from two groups:
1. The 645 individuals who passed the RMT (had a negative score).
2. The 81 individuals who failed the RMT (had a positive score).
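Drawing different numbers of cases from these two groups produces mixed samples with different expected base rates. The sketch below assumes the expected base rate of a mixed sample is the average of its members' probabilities of malingering (.814 after a failed RMT, .077 after a passed RMT). The function name and the sample sizes in the loop are illustrative choices, not part of the source.

```python
def mixed_group_base_rate(n_failures, n_passes, p_fail=0.814, p_pass=0.077):
    """Expected base rate of malingering in a sample mixing both groups.

    Each failure contributes p_fail and each pass contributes p_pass,
    so the group base rate is the mean of the individual probabilities.
    """
    total = n_failures + n_passes
    return (n_failures * p_fail + n_passes * p_pass) / total

# Illustrative mixes: 40 failures with varying numbers of passes.
mixes = [(40, 360), (40, 160), (40, 40), (40, 10)]
for n_fail, n_pass in mixes:
    br = mixed_group_base_rate(n_fail, n_pass)
    share = n_fail / (n_fail + n_pass)
    print(f"failures={n_fail}  passes={n_pass}  share failing={share:.2f}  BR={br:.4f}")
```

Varying the share of failures therefore spreads the samples along the base-rate axis, which is what the regression of proportion positive scores on base rate needs.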
Example of sampling
645 individuals with negative scores, p(mal) = .077: sample n = 360
81 individuals with positive scores, p(mal) = .814: sample n = 40
400 cases, 10% failures, 90% passes
Overall p(mal) = (40*.814 + 360*.077)/400 = .151
Sample 25 times, plot x = .151, y = observed rate of positive WRT scores, n for WLS = 400

Group   Ratio   Failures   Passes   N     BR       Samples
1       0       0          645      645   0.077    1
2       0.1     40         360      400   0.1507   25
3       0.2     40         160      200   0.2244   25
4       0.3     40         93       133   0.2981   25
5       0.4     40         60       100   0.3718   25
6       0.5     40         40       80    0.4455   25
7       0.6     40         27       67    0.5192   25
8       0.7     40         17       57    0.5929   25
9       0.8     40         10       50    0.6666   25
10      0.9     40         4        44    0.7403   25
11      1       81         0        81    0.814    1

For each sample, BR was pre-estimated. Then we observed the rate of positive WRT scores at each potential cut score.

Word Recognition Test (WRT)
Range 4 to 30, Mean = 23.2
Within group of RMT < 9, mean = 18.7
Within group of RMT > 8, mean = 23.8

Word Recognition Test (WRT)
For every potential cut score of WRT (4-30), we plotted all x, y pairs obtained from sampling.
We performed WLS to obtain the FPR and TPR estimates at every potential cut score.
We plotted the FPR and TPR estimates at every potential cut score to generate the ROC curve.
AUC = .905, SE = .012, 95% CI for AUC = .881-.930.
Best cut scores:
LTE 18 (TPR = .563, FPR = .034)
LTE 19 (TPR = .620, FPR = .066)

[Figure: LTE 18. Proportion Positive Scores (y-axis, 0 to 1) against Base Rate (x-axis, 0 to 1).]
[Figure: ROC curve for the Word Recognition Test, TPR (y-axis) against FPR (x-axis).]

WORD RECOGNITION TEST (WRT)
Summary:
1. We can use tests to form mixed groups for validation.
2. The best estimates of FPR and TPR for a test cut score allow us to estimate PPP and NPP at our sample BR.
3. Instead of a "known groups" design (which is misleading), we do not presume to know (or care) about the status of any individual.
We assign individuals "probabilities of having the condition" based on their test score.
4. Mixed groups have an overall "probability of having the condition," which is the average of the individual probabilities.
5. We do not need to be certain about group memberships. We gain much flexibility by working with probabilities of having the condition vs. certainties of having the condition.

Another example
Dawes 1967 showed that valid probability judgments are excellent base rate indicators. His work was substantiated in Frederick 2000 and Frederick and Bowden 2009.

To conduct MGV, we formed groups of defendants for whom individual ratings of likelihood of malingering psychosis were generated by forensic psychologists, before any testing took place. The BR of malingered psychosis for each group was then the mean of the probability ratings. If each member of the group had been rated as 10% likely to feign psychosis, then the BR of the group was estimated to be 10%. We then observed the hit rate (proportion positive scores) for the groups for a variety of F-family indicators of feigning on the MMPI-2 and MMPI-2-RF.

We formed 15 groups of 30 individuals. For each group, we had a static base rate, which was the mean of the probability judgments assigned before testing. Within each group, we iteratively observed the hit rate of positive F-family indicators at each potential cut score. Using the BR estimate and the proportion positive scores at each potential cut score, we performed WLS to generate estimates of FPR and TPR. From these estimates, we generated ROC curves.

15 groups, 30 defendants in each group, 450 defendants
Each defendant was rated from 0 to 100 before testing, with respect to the likelihood he would feign psychosis. Groups were formed after first sorting individuals by ratings, from lowest to highest.
Mean ratings of groups (each group, n = 30):
0, 0, 1.2, 4.2, 5.0, 5.0, 5.0, 5.0, 8.1, 10, 14.5, 22.2, 30.3, 45.7, 72.3
Rates of positive F-family scores at each potential cut were observed.

Scale              AUC    SE     95% CI
F                  .904   .015   .874-.933
Fp                 .870   .018   .834-.906
Fp (no L items)    .905   .015   .877-.934
F-r                .940   .011   .919-.962
Fp-r               .926   .013   .901-.950

Estimates by Nicholson, Mouton, Bagby, Buis, Peterson, and Buigas (1998), AUCs and SEs: F (.929, .021), Fp (.885, .027)

Scale              Cutoff    FPR    TPR
F                  GTE 28    .043   .635
Fp                 GTE 8     .054   .484
Fp (no L items)    GTE 7     .055   .537
F-r                GTE 20    .050   .640
Fp-r               GTE 8     .055   .652

Summary:
1. Using the estimates of likelihood of feigning based only on clinician judgment prior to testing did not result in random results. We can assume that mean probability judgments were effective base rate estimates.
2. Our estimates of F and Fp are consistent with estimates in large, well-validated analysis.
3. In this study, MMPI-2-RF indicators have higher mean AUC and lower SE than their MMPI-2 counterparts.

Combine information about F with the SIRS-2
Scale   Cutoff    FPR    TPR
F       GTE 28    .043   .635

131 defendants who took MMPI and SIRS

F     Frequency   Percent   Valid Percent   Cumulative Percent
4     4           2.7       3.1             3.1
6     1           .7        .8              3.8
9     1           .7        .8              4.6
10    2           1.3       1.5             6.1
11    3           2.0       2.3             8.4
12    2           1.3       1.5             9.9
13    3           2.0       2.3             12.2
14    4           2.7       3.1             15.3
15    2           1.3       1.5             16.8
16    3           2.0       2.3             19.1
17    3           2.0       2.3             21.4
18    2           1.3       1.5             22.9
19    4           2.7       3.1             26.0
20    5           3.4       3.8             29.8
21    2           1.3       1.5             31.3
22    7           4.7       5.3             36.6
23    3           2.0       2.3             38.9
24    1           .7        .8              39.7
25    5           3.4       3.8             43.5
26    6           4.0       4.6             48.1
27    6           4.0       4.6             52.7
28    7           4.7       5.3             58.0

52.7% of cases are 27 or lower
47.3% of cases are 28 or higher
What is the base rate of feigned psychopathology?

[Figure: TVS for F GTE 28 (FPR = .043, TPR = .635), with BR, TPR, FPR, NPP, and PPP marked.]

What we say:
Within our sample of 131 defendants, the BR of feigned psychopathology is .73 (NOT .475).
At BR = .73, the PPP of F GTE 28 is .976.
At BR = .73, the NPP of F GTE 28 is .492, so p(feigning if LTE 27) is still .508.
(Remember, they’re being given the SIRS for a reason.)

[Diagram labels: F < 28; NPP about .66; F > 27]

Application of MGV to a CGV estimation of FPR and TPR

Greve, Bianchini, Love, Brennan, & Heinly (2006) articulated six separate groups with increasing base rates of malingering, based on formal criteria for malingering (the Slick criteria), to validate the MMPI-2 Fake Bad Scale (FBS):
1. No incentive (no evidence of external incentive and no test performance suggestive of malingering; n = 18, mean FBS = 15.4)
2. Incentive (external incentive, but no test performance suggestive of malingering; n = 79, mean FBS = 19.5)
3. Suspect (external incentive and at least one indicator suggestive of malingering; n = 66, mean FBS = 22.7)
4. Statistically Likely (external incentive; at least two indicators suggestive of malingering; n = 51, mean FBS = 22.8)
5. Probable (external incentive; strong indicators of malingering; n = 31, mean FBS = 26.9)
6. Definite (external incentive; very strong indicators of malingering; n = 14, mean FBS = 29.8)

Even though it is clear that
BR Definite > BR Probable > BR Statistically Likely > BR Suspect > BR Incentive Only > BR No Incentive,
to conduct “Known” groups validation they were required to ignore this obvious circumstance and to define
BR No Incentive = BR Incentive Only = 0
BR Statistically Likely = BR Probable = BR Definite = 1.0
and to drop all participants defined as Suspect, yielding the following ROC.

[Figure: FBS ROC generated by “Known” groups validation by Greve & Bianchini]

If we had estimates of the BR for each of the subgroups formed by Greve and Bianchini, we could use MGV to estimate FPR and TPR for each potential cut score.
We have our stable estimate of TOMM FPR and TPR

[Figure: TOMM, no simulation studies. Proportion Positive TOMM Scores plotted against Estimated Base Rate of Malingering. FPR = .056, SE = .025; TPR = .742, SE = .093]

We can get estimates of BRs for those groups from other work by Greve & Bianchini. They formed similar groups using the Slick criteria to investigate the TOMM. We can use the proportion of positive TOMMs in each of these subgroups to estimate the BRs in each of them.

From Greve, Bianchini, Doane (2006)

Group       Proportion Positive TOMM Scores (%)
No Inc      0
Inc Only    5
Suspect     20
Probable    47
Definite    78

The Test Validation Summary

[Figure: Proportion Positive Scores on TOMM plotted against Base Rate of Malingering. TOMM: FPR = .056, TPR = .742]

Group       Proportion Positive TOMM Scores   Est BR of malingering
No Inc      0                                 0
Inc Only    .05                               0
Suspect     .20                               .21
Probable    .47                               .633
Definite    .78                               1

We take these BR estimates and reapply them to the Greve & Bianchini FBS data.

Example of MGV for FBS based on BR estimates for Greve & Bianchini groups established by Slick criteria

At FBS > 27
Base Rate of Malingering    0      0      .21    .633   1
Pr + Tests                  .11    .09    .23    .52    .79
n                           18     79     66     31     14

For FBS > 27, using WLS regression, FPR = .091, TPR = .773
(For WLS, n is the weighting variable.)

Evaluate constructs that underlie symptom validity tests

[Figure: 10 clinical studies using Rey 15-Item Test; no simulators, all clinical data. Probability RMT Score < 9 plotted against Estimated Base Rate of Malingering. FPR = .054, SE = .037; TPR = .570, SE = .119]

RMT validated using MGV with clinical probability judgments.
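The two-step FBS example above (estimate group base rates from TOMM hit rates, then regress FBS hit rates on those base rates) can be retraced with a small script. This is a sketch under stated assumptions, not the presenter's own procedure. It assumes the MGV line is p(+) = FPR + (TPR − FPR) × BR, so the weighted regression intercept estimates FPR and intercept plus slope estimates TPR. It also assumes base rate estimates are clipped to the 0 to 1 range. The function names are made up for illustration.

```python
def invert_hit_rate(p_pos, fpr, tpr):
    """Estimate a group's base rate from its proportion of positive scores."""
    br = (p_pos - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, br))  # assumption: clip to the 0-1 range


def wls_line(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Step 1: TOMM hit rates -> base rates, using TOMM FPR = .056, TPR = .742.
suspect_br = invert_hit_rate(0.20, 0.056, 0.742)    # about .21
definite_br = invert_hit_rate(0.78, 0.056, 0.742)   # exceeds 1, clipped to 1

# Step 2: regress FBS > 27 hit rates on the slide's tabled base rates,
# weighting each group by its n.
base_rates = [0, 0, 0.21, 0.633, 1]
prop_positive = [0.11, 0.09, 0.23, 0.52, 0.79]
n = [18, 79, 66, 31, 14]
intercept, slope = wls_line(base_rates, prop_positive, n)
fbs_fpr = intercept
fbs_tpr = intercept + slope
print(f"FPR ~ {fbs_fpr:.3f}, TPR ~ {fbs_tpr:.3f}")
```

Run on the tabled inputs, the fit lands close to, though not exactly on, the reported FPR = .091 and TPR = .773. The small gap presumably reflects rounding in the tabled values. For the same reason, base rates recomputed from the rounded TOMM percentages need not match every tabled BR exactly.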
FPR = .025, TPR = .574 (Frederick & Bowden, 2009)
CI, TPR = .574, SE = .044

We will generate a TVS based on these values and find PPP and 1 – NPP to estimate the probability of bad intent represented by an RMT score.

                                        Low Effort              High Effort
Intends to respond correctly            Inconsistent/Invalid    Compliant/Valid
Does not intend to respond correctly    Irrelevant/Invalid      Suppressed/Invalid

Validity Indicator Profile

VIP Verbal Subtest Items
• Easy: Baby Drink Infant
• Moderate: People Ally Folk
• Difficult: Nimiety Conceit Surfeit

[Figure: Proportion Correct plotted against Running Mean Serial Position, Sectors 1 to 3. Labeled curves: Compliant, Irrelevant, Suppression, Malingering]

[Figure: Compliant curve, Proportion Correct plotted against Running Mean Serial Position, Sectors 1 to 3. Annotations: easy items, not guessing, knowledgeable responding; guessing is imminent; difficult items, guessing]

[Figure: Proportion Correct plotted against Running Mean Serial Position. Labeled curves: Careless curves, Inconsistent curves]

527 criminal defendants who took RMT and VIP concurrently
Rate of positive scores in this sample was .113
PPP = .814
1 – NPP = .077

Here we are matching VIP categories to the construct most likely captured by the VIP.

[Figure: scatterplot, Probability of Suppression or Irrelevant on VIP (y-axis). FPR = 0, SE = .028; TPR = .859, SE = .130]

BR of .42 estimated for this group is the mean of PPP for positive RMT scores in this group and (1 – NPP) for negative RMT scores in the group.
Points in the scatterplot represent groups of 25 individuals. Defendants were sorted by clinical ratings of malingering, then divided into 20 groups of 25 and one group of 27, for 527 defendants.
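The group base rate described above, the mean of PPP for positive RMT scores and 1 – NPP for negative RMT scores, amounts to averaging one probability per person. A minimal sketch follows, using the PPP and 1 – NPP values reported for this sample. The function name and the example group (12 positives out of 25) are hypothetical and are not taken from the data.

```python
PPP = 0.814            # p(feigning | RMT positive), from the slide
ONE_MINUS_NPP = 0.077  # p(feigning | RMT negative), from the slide


def group_base_rate(rmt_positive_flags):
    """Average the per-person probabilities implied by each RMT outcome."""
    probs = [PPP if positive else ONE_MINUS_NPP for positive in rmt_positive_flags]
    return sum(probs) / len(probs)


# Hypothetical group of 25 defendants, 12 with RMT < 9 (positive).
flags = [True] * 12 + [False] * 13
br = group_base_rate(flags)
print(f"estimated group BR = {br:.3f}")
```

Each subgroup's estimate then serves as the x-value for that subgroup's point in the scatterplot, with the VIP hit rate as the y-value.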
[x-axis of the VIP Suppression/Irrelevant scatterplot: Probability Feigned Cognitive Impairment Given by RMT < 9]

[Figure: same 21 subgroups, N = 527 defendants. Probability Inconsistent or Lower on VIP plotted against Probability Feigned Cognitive Impairment]

527 criminal defendants
VRIN was converted to “probability of invalid responding” by dividing the VRIN raw score by 12. VRIN raw scores > 12 were assigned p = 1.
We are interested in FPR and TPR for “Invalid”