VALIDITY OF THE LIE DETECTOR
A Psychophysiological Perspective

JOHN J. FUREDY
University of Toronto

RONALD J. HESLEGRAVE
Defense and Civil Institute of Environmental Medicine, Toronto

The profession of polygraphy uses physiological measures to improve the detection of deception. The science most relevant for assessing the utility of these measures is that of psychophysiology. This article examines the validity of polygraphy from the perspective of the science of psychophysiology, which employs physiological measures to study and differentiate psychological processes. The focus is on the version practiced currently by most members of the American Polygraph Association, the "control question technique" (CQT). A brief consideration of some critical terms is followed by a description of CQT polygraphy, and then by a review of the literature. We conclude that as a scientific tool, CQT polygraphy is of questionable validity, although it is probably a better-than-chance detector of guilt.

AUTHORS' NOTE: The preparation of this article was supported by grants to JJF from the National Research Council of Canada and the Social Sciences Research Council of Canada (Sabbatical Leave Fellowship). We are indebted to Caroline Davis for comments on an earlier draft. Correspondence can be addressed to John Furedy at the Department of Psychology, University of Toronto, Toronto, Ontario, M5S 1A1, Canada.

CRIMINAL JUSTICE AND BEHAVIOR, Vol. 15 No. 2, June 1988 219-246
© 1988 American Association for Correctional Psychology

Especially in the United States, but also in Canada, Israel, Japan, and the United Kingdom, the forensic use of the lie detector is increasing (Office of Technology Assessment Report, 1983). The evaluation of polygraphic assessment continues to be a matter of heated and complex controversy among both laymen and experts in cognate fields. The aim of this review is to shed some light on the validity of the lie detector, and the practice of polygraphy as established by the American Polygraph Association (APA).1

In any controversial area, the basic terms are often only emotively rather than intellectually understood. Usage may hallow a particular meaning that is not justified by the facts. This is the case with the meaning of polygraphy. Through association with the APA, this term has come to mean the specific procedures of the profession of polygraphers that are used, purportedly, to detect deception. As a result, it has become common to talk about "administering a polygraph test," and polygraphic organizations have indicated that such administration should be restricted to bona fide experts in "polygraphy." One effect of this usage is that it implies that a polygraph test is a standardized technique for differentiating honesty from guilt (or, at the very least, innocence from guilt) by an expert. In fact, as will become apparent, the "test" is not standardized, and it is doubtful whether the examiners can be considered to be (scientific) experts in the legal sense of the term.

More generally, the term polygraph refers to an instrument whose uses are far wider than lie detection and also far less mysterious. The polygraph monitors a number of subtle physiological changes in functions (such as heart rate, skin resistance, and blood pressure) that the body constantly undergoes when responding to internal needs and external demands from the environment. These small changes are influenced by the autonomic nervous system; they are not under voluntary control, and we are not generally aware of them. The changes in autonomic functions are detected by amplifying and recording the functions on a multichannel instrument, the polygraph. It is so called because it has many (poly) pens, with each pen dedicated to recording (or graphing) a specific physiological function.

Because the topic under discussion has a number of other terminological difficulties, we shall begin our treatment with a section that considers some of the critical terms and the relations between them.

TERMS OF REFERENCE

The expression lie detector will be used here to refer to the methods used to establish guilt or innocence by the polygraphic profession as practiced by members of the APA. This restriction is important because there are, of course, many other logically possible ways of attempting to differentiate guilt from innocence. We focus on the APA type of polygraphic methods both because these methods are primarily in use in Western democratic societies and because it is these methods that have evolved out of the science of psychophysiology.

The term validity requires a little elaboration. In the psychological testing literature (the polygraph being, essentially, a psychological test; see Lykken, 1981), reliability is regarded as necessary, though not sufficient, for validity. A test is reliable when repeated administrations of the test to the same individual (assumed to be unchanged) from one occasion to another yield comparable results. For the test to be valid, it must also measure what it purports to measure. So, to take a simple example from physics, a faulty thermometer that has a constant measurement error (say, 10 degrees) due to a manufacturing flaw may be perfectly reliable, but quite invalid or inaccurate; it will reliably show temperatures 10 degrees different from the actual temperature. The modern instruments (polygraphs) available to most professional polygraphers are reliable in the sense of providing an adequate record of the actual physiological functions. From the point of view of reliability, therefore, there is no great problem with these instruments.
Rather, it is their validity or accuracy that is at issue, and we shall refer to accuracy rather than validity, because this more clearly allows us to discuss the possible errors in classification and, more important, to make a critical distinction between two sorts of errors. When polygraphers are asked to give a single percentage figure on the "accuracy" of polygraphy, the misconception is promoted that either the errors are of a single type, or that differences between error types are unimportant. On the contrary, it is critical to distinguish between at least two major sorts of errors: false negative and false positive. The distinction between these error classes was strongly emphasized by the modern approach to psychophysical problems (Tanner & Swets, 1954): the theory of signal detection (TSD). Although the status of TSD and its treatment of the concept of psychophysical threshold is still a matter of controversy, the distinction between the two sorts of errors is a widely accepted and sound one.

In the context of detecting deception, false negative errors are instances where the deceptive suspect is declared to be truthful. There is a tendency to assume that any shortfall from perfection of the polygraph is represented only by errors in favor of the accused, that is, of letting the guilty go free, which is consistent with the intention of the justice system. However, false positive errors also occur, in which the truthful are wrongly judged to be deceptive. These errors lead to the innocent being falsely accused, and, from a judicial point of view, are more serious than the false negative errors. Moreover, the severity of this problem is magnified by evidence that in polygraphy, false positive errors may be more likely than false negative errors (Horvath, 1977).
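To make the distinction between the two error types concrete, the following minimal sketch (in Python) shows how two sets of outcomes can share one overall "accuracy" figure while differing sharply in their false positive and false negative rates. The counts are entirely hypothetical and are not drawn from any study; the names `Outcomes` and `error_rates` are illustrative assumptions, and inconclusive outcomes are left out for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Outcomes:
    """Counts of examiner decisions, split by the examinee's true status."""
    guilty_called_deceptive: int    # correct detections
    guilty_called_truthful: int     # false negatives
    innocent_called_truthful: int   # correct clearances
    innocent_called_deceptive: int  # false positives


def error_rates(o: Outcomes) -> dict:
    """Return overall accuracy together with the two separate error rates."""
    guilty = o.guilty_called_deceptive + o.guilty_called_truthful
    innocent = o.innocent_called_truthful + o.innocent_called_deceptive
    correct = o.guilty_called_deceptive + o.innocent_called_truthful
    return {
        "accuracy": correct / (guilty + innocent),
        "false_negative_rate": o.guilty_called_truthful / guilty,
        "false_positive_rate": o.innocent_called_deceptive / innocent,
    }


# Hypothetical counts, invented for illustration only.
examiner_a = error_rates(Outcomes(90, 10, 60, 40))
examiner_b = error_rates(Outcomes(60, 40, 90, 10))

for name, rates in (("A", examiner_a), ("B", examiner_b)):
    print(name, rates)
```

Under these made-up counts both examiners are 75% accurate overall, yet A wrongly classifies 40% of the innocent as deceptive and B only 10%. A single percentage figure cannot convey that difference.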
Another complicating aspect in polygraphy is that in addition to the truthful and deceptive decisions, there is also a third sort of decision that is made when the examiner is unwilling to assign the examinee to the truthful or deceptive categories. This third type of decision, the "inconclusive" result, would appear to be an error from a scientific point of view, because neither guilt nor innocence has been detected. However, from a legal point of view, the outcome is consistent with the intention of justice, according to which the accused cannot be considered guilty regardless of the "true" nature of this circumstance.

It is also relevant to consider the qualitative implications of the distinction between false negative and false positive errors. These implications concern the subjective value that is placed on the two sorts of errors, which is determined both by the circumstances and by the ethical views of the person evaluating those circumstances. As an illustration, contrast a habitual thief with a record of previous thefts with a person without any previous convictions who is in a position of responsibility and authority (e.g., a bank manager). Most would argue that false positive errors are less serious than false negative errors in the case of the habitual thief. That is, even if the thief were to be wrongly accused inasmuch as he or she had not committed that particular theft, he or she probably committed other similar crimes without being caught. In contrast, the false negative error of not catching the habitual criminal for an actual crime is a more serious problem, especially from the point of view of future victims. Just the reverse holds for the "solid-citizen" case, where an incorrect accusation (even if later proved wrong) can have a dramatically deleterious effect on the life of the unfortunate innocent accused.

Not only the circumstances but also the views of the evaluator are important.
Hence, while most evaluators would agree that there is a difference between the habitual-criminal and respectable-executive circumstances, the size of this difference is very much in the eye of the beholding evaluator. An evaluator who is for "law and order" would probably argue that false positives in the case of the criminal are almost completely unimportant, rather than being merely less important than in the case of the respectable executive. An evaluator whose values lay a greater emphasis on civil liberties would, on the other hand, argue that false positives should never be viewed as unimportant even in the case of habitual criminals, because such a position is contrary to the principle that all persons are entitled to a fair hearing. It is also likely that polygraphers tend, in general, to undervalue the importance of false positive errors, since polygraph results represent only one source of evidence contributing to the eventual outcome of the accusation. As detailed below, a recent Israeli experiment showed that in situations where the risk of falsely classifying innocent subjects was high, professional polygraphers were very ready to accept such false positive errors as long as they thought that their rate of detecting the guilty was also high (Ben-Shakhar, Lieblich, & Bar-Hillel, 1982).

This discussion makes clear why statements of the form, "polygraphy is X% accurate," are too simplistic, masking as they do a number of technical as well as ethical complexities. It is also important to recognize the role played by the attitudes of evaluators toward accepting errors, and toward weighing these errors depending upon the circumstances.
Because the circumstances are known by the evaluator prior to the administration of the polygraph, and because the test is not standardized, it is likely that not only will the outcome be judged on the basis of examinee circumstance and examiner attitude, but also the administration of the test will be shaped by these prejudices. Because the test is psychological in the sense of involving a complex interview-like interaction between examiner and examinee, any biases in designing and administering the test are likely to produce outcomes that are consistent with those biases. So different individuals accused of different crimes may be given quite different tests, even though all those tests are called by a single name: a polygraph test. Indeed, the term test itself is potentially misleading, because it suggests relatively standardized instruments such as IQ tests that, though controversial, give essentially the same results across competent operators.

Although there is much dispute about the accuracy of polygraphy, there is agreement that this profession offers a technical tool rather than a novel scientific area of investigation. All technical tools are founded on basic scientific fields. Because the central technological claim of polygraphy is that it uses physiological measures to detect deception, the scientific field that is most cognate to polygraphy is psychophysiology. Psychophysiology may be defined as that branch of psychology that uses unobtrusively measured physiological changes (of which we are not normally aware) to study psychological processes (for elaboration, see, e.g., Furedy, 1983). Because deception is a psychological process, at least that part of polygraphy that uses the physiological measures to detect deception is appropriately conceived as a technique that attempts to apply the principles of psychophysiology.

Of course the polygraphic situation has other components, to be described in greater detail below.
These components include the pretest interview (where the examiner tries to establish the content for the test), and the confession-inducing function of the polygraph. Concerning the first component, scientific information from the fields of psychological testing and clinical psychology appears germane, while the field of social psychology is relevant to the second component. Accordingly, the scientific underpinnings of the profession of polygraphy are varied and complicated, so that there is unlikely to be a simple answer to the question of polygraphy's scientific status.

THE CONTEMPORARY PRACTICE OF POLYGRAPHY

The prescientific and modern origins of polygraphy have been summarized elsewhere (Furedy, 1986; for a more detailed account, see Heslegrave, 1981). In this article we shall move directly to describe the current version.

DESCRIPTION OF THE CURRENT VERSION OF THE POLYGRAPHIC EXAMINATION

The polygraphic examination as practiced by the APA is a process that lasts from one to one-and-a-half hours. It has two phases: the initial interview and the polygraphic test. The first phase, lasting between 30 and 60 minutes, is used to establish rapport with the interviewee, to work out the exact form of the questions to be asked, to convince the interviewee (sometimes by way of "demonstration") of the infallibility of the polygraph, and to ensure that the polygraph is in working order in the sense that the physiological changes are being clearly recorded. During this time, the polygrapher seeks to present a very "professional" image.

Nowadays polygraphic equipment is sufficiently standardized that four functions are recorded. Probably the most sensitive function is one commonly referred to as the "GSR" or the "galvano" channel by polygraphers. It is the short-term change in skin resistance (or conductance) that is elicited by most stimuli one or two seconds following the stimulus, and that lasts two or three seconds.
This is the "galvanic skin response," hence the abbreviation "GSR." Another channel, known as the "cardio" function by polygraphers, is the approximate mean of the systolic and diastolic blood pressures. This is recorded by a pressure cuff on the arm, but by settling for mean pressure, the polygrapher is able to get a continuous, beat-by-beat estimate of relative (up versus down) changes in pressure. To obtain this "cardio" measure, the cuff has to be kept continuously inflated between the systolic and diastolic levels. Although not painful, this procedure certainly detracts to a significant extent from the ideal psychophysiological measure, which should be completely unobtrusive (see Furedy, 1983).

A third function is respiration, in which relatively small changes in amplitude and frequency are under at least partial autonomic control. Because of the complexity of the respiratory waveform, and because autonomic control is only partial, this function has not been extensively studied by psychophysiologists. Polygraphers, however, make significant use of this channel, and when only three functions are recorded, it is usual to have two respiratory channels (obtained by having two recording couplers around the thoracic and abdominal areas on the torso).

When a fourth channel records an additional function, that function has been changes in the peripheral vasculature, as represented by the blood flow in the tip of the index finger. The vasomotor response (VMR) behaves rather similarly to the GSR, inasmuch as it is activated as a short-term change by any change in stimulation, and it increases as a function of stimulus intensity.

Except for respiration, the measured changes are under autonomic rather than voluntary control, and the subject is not normally aware of them.
Even in the case of respiration, given that the changes being measured are quite small, and that the interviewee is told not to make any gross changes in breathing, the autonomic, nonvoluntarily controlled description is reasonably accurate. However, another reason for this channel is to discover noncompliance with instructions, which presumably may be used as another index of guilt or deception, with deceptive suspects being less compliant.

The polygraphic test proper is based on the rationale that questions ("stimuli") that are relevant will elicit greater physiological responses in the guilty than in the innocent. However, because individual differences in physiological responsivity are great, the comparison needs to be made within subjects. That is, one must compare the responses to relevant questions with those to other questions for each person, rather than contrasting the responses of different persons. Early polygraphers simply compared crime-relevant questions (e.g., Did you kill X?) to irrelevant ones (e.g., Were you born in year Y?), but this poses the obvious problem that innocent subjects may respond more to the relevant question simply because it is more emotionally arousing or anxiety provoking than the irrelevant question.

The current polygraphic solution to this problem is to use "control" questions. These control questions are made up in consultation with the interviewee during the first phase. They are designed to elicit at least as much emotion as the relevant questions do for the innocent. For example, the polygrapher may establish during the first phase that the interviewee has stolen something on a previous occasion. Then the polygrapher asks the interviewee to answer "no" (i.e., lie) to the following control question: "Apart from the present problem, did you ever steal anything in your life?"
The deception on this question by an innocent subject is presumed to produce a larger response than the (truthful) answer by him or her to the relevant question. In this modern "control-question" approach, the interviewee can be judged to be deceptive if responses to the relevant questions exceed those to the control questions by a reasonable margin.

It is important to recognize that, as emphasized by Lykken (1981), the term control here is not used in the scientific sense. In the scientific sense of the term, assuming the phenomenon to be investigated is deception, the control question should be identical in every respect to the relevant question except for the presence of deception in the latter question; in terms of the logic of scientific experimentation, the relevant question is like the experimental condition.2 The scientific sense of "control" is not properly applicable, in our opinion, to the polygraphic field situation, but only to a certain laboratory paradigm designed specifically to study the psychological process of deception (Hemsley, 1977; Heslegrave, 1981). Nevertheless, even though the polygraphic control-question technique (CQT) provides no control in the scientific sense of the term, it may still prove an accurate way of detecting guilt, by attempting to control the differences in the emotional content or the psychological significance between the relevant and control questions.

In the modern CQT examination, there are about ten questions. The form and content of all of these questions are determined in the interview phase in conjunction with the interviewee. About three each of the questions are relevant, control, and irrelevant. In a single "test," the questions are presented about 15-30 seconds apart, and they are formulated so that the subject can answer yes or no to each question. After three of these "test" series, there is a pause.
During the pause the polygrapher usually leaves the interrogation room to interpret the tests. If a decision cannot be made by the polygrapher, additional tests may be administered. During these tests, but most commonly following the pause after the third test, the confession-inducing function of polygraphy is brought into play. By this point the examiner has had a lot of information on which to "judge" the examinee. The available information includes not only the physiological responses, but possibly the past criminal records, the present behavior of the subject during the interview, and the current facts about the case. Even if the examiner is not sure that the examinee is lying, that doubt can be resolved (according to professional polygraphers) by pressing the examinee into a confession of guilt. Usually this involves asserting that the "machine" indicates (to the examiner) that the subject has been lying to the relevant questions. According to anecdotal evidence of polygraphers, about 30% of the subjects break down at this point and admit their guilt. Polygraphers argue that this confession-inducing function is really a part of polygraphy's detection function. However, that argument rests on the very debatable assumption that all such induced confessions are true. We shall consider this problem in greater detail below.

An important aspect of the polygraphic examination is the interpretation of the physiological responses. Currently, there are two basic methods of scoring: subjective and objective. The subjective method, which used to be the only method used by most polygraphers (whose training is often limited to 7-8 weeks of course work to cover physiology, psychology, psychophysiology, and the polygraphic techniques, plus a 6-month internship), consists of simply inspecting the shape of the responses and deciding, in a general sort of way, whether the person has been deceptive.
This qualitative scoring method rests on the assumption that there is a unique pattern of physiological responding associated with lying, namely, the "specific lie response." This notion used to be popular with polygraphers, but has no evidential basis in the science of psychophysiology. Nevertheless, perhaps because individual physiological recordings are intuitively unique and striking, the notion persists among some professional polygraphers that such qualitative scoring is accurate.

The "objective," quantitative method of scoring has more recently been adopted as the preferred method, especially since polygraphers with sound psychophysiological research credentials have documented its utility. The rationale for this method, originated by Backster (1962) and improved on by Raskin (1976), does not require the assumption of a specific lie response, but only that, in terms of some specifiable quantity, the physiological responses to the relevant questions exceed those to the control questions. The method itself classifies the differences between pairs of relevant and control questions in each response channel as a function of magnitude ranging from +3 to -3. The algebraic value of these numbers is positive to the extent that the control response exceeds the relevant response. The algebraic sum of these scores determines how the subject is classified. For example, with 3 response channels, 3 question pairs, and 3 tests there are 27 scores, so the total can range from +81 to -81, and the cutoff points are as follows: truthful (+6 or more), deceptive (-6 or less), and "inconclusive" (between +5 and -5). Strictly speaking, the last classification refers really to the outcome of the polygraphic examination rather than to the interviewee, because it indicates merely that the test score does not permit the polygrapher to come to a decision concerning whether the interviewee is truthful or deceptive.
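The summing and cutoff rule just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure any examiner actually follows: the function name `classify` and the example score lists are hypothetical, and the sketch says nothing about how each individual score from -3 to +3 is assigned.

```python
def classify(scores, cutoff=6):
    """Classify one examination from its per-comparison scores.

    Each score is an integer from -3 to +3; positive values mean the control
    response exceeded the relevant response.
    """
    for s in scores:
        if not isinstance(s, int) or not -3 <= s <= 3:
            raise ValueError(f"score out of range: {s!r}")
    total = sum(scores)
    if total >= cutoff:
        return total, "truthful"
    if total <= -cutoff:
        return total, "deceptive"
    return total, "inconclusive"


# 3 channels x 3 question pairs x 3 tests = 27 scores (hypothetical values).
CHANNELS, PAIRS, TESTS = 3, 3, 3
n_scores = CHANNELS * PAIRS * TESTS

strongest_truthful = classify([3] * n_scores)
mild_deceptive = classify([-1] * 6 + [0] * (n_scores - 6))
borderline = classify([1] * 5 + [0] * (n_scores - 5))
print(strongest_truthful, mild_deceptive, borderline)
```

The first example reaches the maximum total of +81; the second shows that six scores of -1 among 27 are enough for a "deceptive" classification; the third, with a total of +5, falls in the inconclusive band.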
Because the inconclusive classification describes the examination rather than the examinee, polygraphers claim that it is not a real "decision," and that only a binary, truthful/deceptive decision is involved in undergoing a polygraphic test. However, the inconclusive outcome is clearly a third category that is applied to the examinee by both the polygrapher and the polygrapher's clients, and it is also obvious that the outcome affects the examinee differently from both a truthful and a deceptive outcome. One such differential consequence for the examinee of an inconclusive outcome is that the polygrapher may decide to give one or two additional tests, or even another complete examination on another occasion.

It is important to recognize that, compared to the scoring methods used in the science of psychophysiology, the "objectivity" of the above polygraphic scoring method is severely limited, and has been labeled by its originators (Barland & Raskin, 1975) as "semi-objective." One basic problem is that the score (ranging from +3 to -3) is arrived at by subjective and qualitative means. Another problem is that the setting of the cutoff points (+6 and -6) for inconclusives is arbitrary. On the other hand, polygraphers point out that their task is more difficult than that of the psychophysiologist, who can average over many subjects and draw statistical inferences concerning whether or not there is a significant difference between two conditions. The polygrapher is required to make a decision concerning a specific individual.

Another arbitrary aspect of the polygraphic cutoff criteria is that there is no allowance for the number of channels and the number of tests. While it is true that the first number is usually three or four, and that the second number usually varies between three and five, this still remains a problem at least in principle.
This is so because, mathematically, the chances of scoring an examination inconclusive decrease as the number of channels used and tests administered, and hence the number of scores summed, increases. These chances, indeed, tend toward zero as the number of summed scores approaches infinity. However, it would be possible for the polygraphers to counter that the same sort of difficulty holds for traditional group significance testing in experimental psychophysiology, where the chances of finding a "significant" difference become near-perfect as the sample size becomes very large. It is because of this that a statistically significant difference between two groups on an IQ test is often considered to be psychologically insignificant if the difference is small and the group samples are large (e.g., in the hundreds).3

Accordingly, we suggest that it is only the basic scoring method rather than the cutoff criteria that suffers uniquely from subjectivity. That problem, however, is exacerbated by the fact that it is polygraphic practice to have the records scored by the examiner, rather than being "blindly" scored by an individual who has access only to the physiological records themselves. Even when measurement is fully objective, errors from bias can creep in. That is a principle that holds not only in the biological sciences but also in such "hard" sciences as astronomy. However, when judgment is required of the sort involved in deciding to characterize a relevant question as "clearly" (2) versus "slightly" (1) greater than its paired control, it is obvious that the biases of the observer can significantly affect the numerical values assigned. In this regard, professional polygraphers are loath to give up their privilege of scoring their own records, if only because they need the information on the spot. Yet in terms of objectivity or lack of bias, it would seem that this problem is a considerable one for the apparently "objective" mode of polygraphic scoring.
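Returning briefly to the cutoff criteria: the claim that a fixed cutoff of 6 yields fewer inconclusives as more scores are summed can be checked numerically. The sketch below (in Python) computes the exact probability that a total falls between -5 and +5 under a deliberately artificial assumption, namely that every score is independent and equally likely to be any integer from -3 to +3. Real examinees' scores would not behave this way, so only the direction of the trend is meaningful; the function names are illustrative.

```python
def sum_distribution(n_scores, values=range(-3, 4)):
    """Exact distribution of the sum of n_scores independent scores,
    each equally likely to be any integer in `values` (an assumption)."""
    values = list(values)
    p = 1.0 / len(values)
    dist = {0: 1.0}
    for _ in range(n_scores):
        new = {}
        for total, prob in dist.items():
            for v in values:
                new[total + v] = new.get(total + v, 0.0) + prob * p
        dist = new
    return dist


def p_inconclusive(n_scores, cutoff=6):
    """Probability that the total lies strictly between -cutoff and +cutoff."""
    dist = sum_distribution(n_scores)
    return sum(prob for total, prob in dist.items() if -cutoff < total < cutoff)


# 3 channels x 3 question pairs x (3, 4, or 5 tests) = 27, 36, or 45 scores.
results = {n: p_inconclusive(n) for n in (27, 36, 45)}
for n, prob in results.items():
    print(n, round(prob, 3))
```

Under this toy model the inconclusive probability falls as the number of scores rises, because the spread of the total grows while the cutoff stays fixed.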
Despite the risk of examiner bias, field scoring is done almost exclusively by the examiner, and it is only in research that blind-scoring studies have been undertaken (e.g., Horvath, 1977). One potential amelioration of this problem is represented by the efforts of Raskin and his colleagues (e.g., Kircher & Raskin, 1983) to provide computer scoring of tests. This kind of objective scoring may result in superior decisions by individual polygraphers, although it must be stressed that no degree of sophistication on the measurement side can overcome the other problems such as those discussed above with regard to the scientific shortcomings of the so-called control-question technique.

Before concluding this description, two variants of traditional lie detection should be mentioned. These variants are not part of the current professional polygrapher's package, but they are intended for the same purpose: detection of the guilty. The first variant is a psychophysiologically based method known as the Guilty Knowledge Technique (GKT). The GKT was introduced by Lykken (1959) as an alternative to the standard CQT used by professional polygraphers. The rationale is to focus on the guilty knowledge (i.e., significance of the question) rather than the guilty person (i.e., emotional content of the question), and therefore to ask questions to which only the guilty can know the truthful answer. Both the rationale and the accuracy of the GKT are superior to those of the CQT (see Bradley & Janisse, 1981; Lykken, 1981; Podlesny & Raskin, 1977), although from a purely psychophysiological perspective it is important to note that it is probably not the process of deception but rather an orienting process that is being detected (Ben-Shakhar, 1977; Furedy, 1986; Heslegrave, 1981). However, the evidence favoring the GKT is based only on laboratory studies, and this fact is no accident.
For its implementation, the GKT requires that the details of the crime to be used in the polygraph test be kept secret, in order that the guilty knowledge be truly unique to the perpetrator of the crime. Because this requirement runs counter to normal police procedures, as well as being quite difficult to implement, the GKT is seldom used by current polygraphers. However, it is clear that if police and polygraphic methods are modified, the GKT is a promising future method for the accurate detection of the guilty when a specific crime has been committed (see also Lykken, 1981, for a hypothetical case).

The second variant is the Psychological Stress Evaluator (PSE). This seeks to make use of the fact that there are small changes in voice inflection that a speaker is unaware of, and that respond to stress by changes in the microtremors found in the larynx (Lippold, 1971). These changes can be amplified and displayed either on an oscilloscope or a graphic recorder. As a response to stress, or to the emotional content of the question, these changes can be taken as an indication of lying in the same way as other measures in the CQT polygraph test. The PSE is very convenient to use. Not only is it unnecessary to affix electrodes to the body, but the PSE can even be used on tapes or telephone conversations. Fortunately for those who would regret the decline of personal freedom if all our conversations could be so monitored for their truthfulness, research has shown the PSE to be no better than chance (Brenner, Branscombe, & Schwartz, 1979; Horvath, 1978; Lynch & Henry, 1979; Nachshon & Feldman, 1980). In this regard, the PSE has to be sharply distinguished from modern CQT polygraphy, for even polygraphy's severest critics (e.g., Lykken, 1981) do not argue that polygraphy's accuracy is not better than chance, and the APA itself has officially rejected the use of the PSE (Horvath, 1982).
VALIDITY

In this section we shall briefly review some of the literature pertinent to the accuracy and validity of current detection of deception techniques. The literature on accuracy is vast, and there is no pretension of exhaustiveness in this brief review. This review will cover several of the more important issues related to determining the validity of detection of deception techniques. Our intention is to provide the reader with information that will facilitate a critical evaluation of accuracy claims. One necessary prolegomenon, however, is that the accuracy estimates of polygraphic detection of deception are dependent upon a great many factors. The skill level of the examiner, the psychological state of the subject, the scoring procedures, the questioning techniques, and the particular physiological variables measured are but a few of the variables that must be taken into account when one is attempting to determine the accuracy or validity of the procedure. As an overriding caveat, it should be clear that most studies have provided insufficient control over the many factors that can influence the accuracy of detection of deception procedures. Accordingly, the validity of these procedures remains an unresolved issue and estimates of accuracy range from chance to perfection.

IS DECEPTION ACTUALLY DETECTED?

Polygraphy has come to be known as the "detection of deception," but this still leaves open the question of whether, in fact, it is deception that is being detected by the procedure. We shall focus on variants of the control question technique (CQT) in this regard, although we shall also consider the guilty knowledge technique (GKT) at the end of this subsection. In the CQT, the control questions are paired with the relevant questions by being temporally adjacent in the question series; this temporal proximity minimizes differential habituation effects.
The control questions are designed in a pretest interview and deal with similar circumstances to those covered by the relevant questions "so that the subject is very likely to be deceptive to them or very concerned about them" (Podlesny & Raskin, 1977, p. 786). Although there is some dispute over the exact theoretical formulation underlying the CQT (Lykken, 1978, 1979, 1981; Raskin, 1978; Raskin & Podlesny, 1979), in general the theory is that guilty subjects will be deceptive to relevant questions and show stronger autonomic responding to the relevant than the control questions. In contrast, the control question is meant to be "a stronger stimulus for the innocent subject because he knows he is truthful to the relevant questions; he has been led to believe that the control questions are also very important in assessing his veracity . . . and he is either deceptive in his answers, very concerned about his answers, or unsure of his truthfulness because of the vagueness of the questions and problems in recalling the events" (Raskin & Podlesny, 1979, p. 54). Although a number of practical and theoretical problems with the CQT have been identified by Lykken (1974, 1978, 1979), the main point is that the procedure does not provide an adequate scientific control for detecting deception, because it is impossible to estimate what the relevant response would have been if the answer to the relevant question had been honest. Indeed, Raskin and Podlesny (1979) have argued that the control question is not meant as a scientific control for deception. Rather, it is meant as a stronger emotional stimulus than the relevant question for innocent subjects. Therefore, it is meant as an "emotional standard" (Barland & Raskin, 1973, p. 43) designed to enhance the innocent subject's responses to control questions.
In fact, in Raskin and Podlesny's terms (1979), quoted above, the control questions are meant to be of great concern to all subjects (since guilty and innocent subjects cannot be discriminated beforehand).

The users of the CQT, then, do not employ the technique to detect deception per se, but rather employ it to detect the guilty. One can, indeed, generate the seemingly paradoxical conclusion that deception could be detected only in those innocent subjects who give larger responses to the control than to the relevant questions. In this case, innocent individuals would be those who, being asked to be deceptive to the control questions and being truthful (and hence innocent) to the relevant questions, would be classified by the CQT user (the polygrapher) as "nondeceptive," whereas they were actually deceptive (as instructed) with respect to the control question.

Lykken's (1959, 1960) Guilty Knowledge Technique (GKT) does not suffer from the scientific problems of control methodology that weaken the CQT. This is because the GKT does provide a control comparison that is a reasonable estimate of the subjects' responses to relevant questions if they were being honest to those questions. However, the GKT is also not designed to detect deception per se. The rationale, rather, is that because of the guilty person's special knowledge about some crime-related issue, the response to a question about that issue will exceed that of a person who does not possess any such guilty knowledge, because the relevant knowledge would be more salient to the guilty.4 So even if the GKT were to be commonly used in the field, which it is not, it would still not serve to detect deception per se, despite the fact that the term detection of deception has been accepted into current usage. However, the fact that deception is not being detected highlights the problems with this technique.
If an innocent suspect has crime-relevant, but not genuinely guilty, knowledge that has been acquired in ways not associated with the commission of the crime (e.g., if the police who were at the scene of the crime inadvertently release information that is later used to construct a relevant question in the GKT), then he or she may be misclassified as guilty.5

LABORATORY VERSUS FIELD VALIDITY

Even if current techniques do not detect deception per se, it is still possible that they do detect the guilty, and differentiate them from those who did not commit specific criminal acts. However, it is very difficult to get a precise estimate of the accuracy of polygraphy. Polygraphers themselves who write in polygraphy journals are, not surprisingly, very sanguine about the level of their profession's accuracy. Recently, Ansley (1983) has provided a review in which he reports an accuracy of 96.3% for field cases. However, this review fails to cite a number of studies that reported quite low accuracy rates, and it also includes reference to many studies that lacked any semblance of scientific control. Yet the overall accuracy issue appears confused even when only scientifically respectable studies are considered. Whereas Lykken (1978, 1979) estimates accuracy to be only slightly above chance (i.e., 64%-71%), Raskin and his associates (Podlesny & Raskin, 1978; Raskin, 1978), reviewing the same body of literature, put the figure as near perfect, that is, 90%-95%. In what follows, we consider some of the complex factors that are responsible for this great variation in estimates between two respected members of the scientific psychophysiological community. One of the most significant factors that affect conclusions concerning the validity of the CQT is whether the accuracy is determined in laboratory and mock-event studies, or in actual field investigations.
There appears to be a consensus that the differences between the laboratory and field situations are sufficiently great that laboratory results should not be generalized to the validity of these procedures in field settings. Most would agree that the subjects undergoing real-life polygraphic interviews would be considerably more aroused and concerned than those subjects involved in laboratory experiments. In addition, in field situations, subjects would vary in many ways: subjects would view the examination, examiner, and the purpose of the test differently; they would probably be more heterogeneous and vary from laboratory subjects on such factors as age, background, intelligence, and personality; the events preceding the examination would differ as well as the time period between the critical event and the examination; in the field, the guilty subjects would be more motivated to deceive; and the anxiety or stress levels of guilty and innocent subjects may differ more extensively in the field than in the lab. It should also be noted that although a number of factors have been listed that may differ from the laboratory to the field situation, the direction of these effects has not been specified. For example, although we can probably assume that subjects undergoing an actual criminal investigation would be more aroused and stressed, we cannot assume that guilty subjects would therefore be either more or less detectable. On the one hand, the additional arousal could lead the guilty to respond more strongly, but it is also possible that the innocent would come to be anxious about the relevant question and hence show greater responses to it. Indeed, it is not unreasonable to suppose that the move from lab to field increases and decreases, respectively, the significance of the relevant and control questions.
Because CQT polygraphy's rationale depends crucially on the relevant-control comparison, the above supposition would in itself be sufficient to produce a reversal of direction of effects as one moved from the lab to the field. More generally, any views on the effects of these factors on polygraphy's accuracy are no more than guesses. There is no adequate research that allows one even to begin to estimate these differences between laboratory and field situations in order for laboratory results to be validly generalized to field settings.

Finally, we must be clear that field validation reports cannot include studies such as card tests on criminal suspects (e.g., Kugelmass, Lieblich, Ben-Ishai, Opatowski, & Kaplan, 1968) or mock crime investigations on convicted psychopaths (Raskin & Hare, 1978). Accurate estimates of the validity of polygraphy, then, can be based only on field examinations of actual criminal cases. However, this restriction is not sufficient: The outcomes of the polygraph examinations must also be verified against some criterion of "ground truth," which has usually taken the form of judicial outcome or confession. If these necessary restrictions are accepted, then there are only a handful of reports that are relevant as evidence on this issue.

Lykken (1974) cited Bersh (1969) as the only adequate study of validity in the area. In that study Bersh obtained 323 criminal investigations conducted by the military, and had a four-member panel of lawyers reach a decision (disregarding legal technicalities) concerning the guilt or innocence of the accused. After discarding 80 cases on the grounds of insufficient evidence, the panel produced unanimous, majority, and split decisions, respectively, for 157, 59, and 27 cases. Using the judicial decision as the criterion, the polygrapher's decision was correct in 92% of the unanimous cases, but only in 75% of the majority cases.
While these statistics seem to provide impressive levels of validity, there are several key problems to consider. The first problem is that the adequacy of the criterion "ground truth" measure is questionable. It cannot be assumed that all judicial decisions were correct, because there is no way of independently estimating the "ground truth." It may even be possible that the 8% error in unanimous cases occurred through mistakes made by the panel rather than by the polygrapher! The second problem is that, as Lykken (1974) has pointed out, the polygrapher's decision was made on the basis of the facts concerning the case as well as clinical impressions of the subject under investigation at the time of examination. In that case, it is possible to view the study as one of reliability, with the polygrapher serving as an additional judge. The fact that the polygrapher obtained more "correct" decisions in the unanimous than in the majority cases then can be viewed as simply illustrating that as the agreement among the original four judges rose, so too did the agreement of the polygrapher-judge with the panel of original judges. It is true that both Raskin (1978) and Barland (1982) have correctly indicated that there is no evidence from Bersh's study to support Lykken's interpretation that the polygrapher functioned as yet another judge. However, because the polygrapher followed the usual professional practice of basing decisions not only on the records but also on other factors, Lykken's interpretation cannot be ruled out of consideration. The third problem is that from all the cases on which the polygrapher made a decision, one-third were discarded through receiving split decisions from the panel of judges.
If we assume that the polygrapher's accuracy for this one-third of the cases was no better than chance, then over all the cases selected (323 cases) the number of correct detections based on the panel decisions and indecisions (corrected for total cases in each category) would have been 75% against a chance rate of 50%, a less impressive statistic. It might also be observed that the polygrapher seemed to be able to make guilty versus innocent decisions in that one-third of the cases where, according to the panel, the evidence was not sufficient to yield a unanimous legal decision. At least to critics of polygraphy, this suggests that polygraphers are apt to make decisions in situations where the ground truth is not determinable. Only if polygraphy is regarded as a sort of magic path to truth would this possibility be an untroubling one.

In a follow-up study, Horvath (1977) examined judgments from 10 polygraph examiners of law enforcement agencies. An important methodological advance over the earlier Bersh (1969) study was that in half the cases the "ground truth" was verified by confession of the guilty person. Another methodologically advantageous feature was that the polygraphers used only the physiological records for their judgments. This feature is important because it can be argued (see, e.g., Lykken, 1981) that without such "blind" record reading a study can at best produce information on the shared prejudice among polygraphers (i.e., reliability), rather than on accuracy (i.e., validity). In the 560 judgments made by the 10 examiners for verified cases, correct decisions occurred only 64% of the time. Moreover, there were no differences between high- and low-experience examiners (greater than versus less than 3 years' experience).
Raskin (1978) has argued that this unusually low accuracy rate may be partly attributable to less experience, poor training, and the fact that polygraphers usually do not operate simply on the basis of the records, but have other behavioral symptoms to consider. However, especially as the physiological recordings are supposed to be the essence of polygraphy, these results cannot be viewed as very supportive of the notion that polygraphy's accuracy is very high.

In another extension of Bersh's work, Barland and Raskin (1976) reported a study in which Barland administered control-question tests to 92 criminal suspects and then had a panel of experts review the cases with the polygraph evidence removed. The panel's decisions were the criterion against which the polygraph decisions would be judged. In only 64 cases could the panel achieve a majority decision, and of those a clear polygraphic decision could be achieved by blind scoring of charts by Raskin in only 51 of the cases. Of those 51 cases, 40 were criterion guilty and 11 were criterion innocent. Raskin scored 86% of the cases correctly. However, when the guilty and innocent subjects are considered separately, he scored 98% of the guilty correctly, but only 45% of the innocent correctly, which yields an average of 71.5% correct classifications, if these sample accuracy rates are representative of true (population) accuracy. Consideration of these various statistics indicates that the high degree of accuracy at detecting guilty subjects must be balanced against excessively high false positive outcomes. Barland (1982) has noted, however, that unlike the Bersh (1969) study, the case histories that were given to his panel were often incomplete, and there were an unusually large number of cases that were not classified because of inconclusive evidence.
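The difference between the 86% overall figure and the 71.5% average for the Barland and Raskin (1976) charts is a matter of how the guilty and innocent groups are weighted, and it can be reproduced with a few lines of arithmetic. The sketch below is illustrative only. The counts of correctly scored charts (39 of 40 guilty, 5 of 11 innocent) are not given in the text; they are assumed here as the whole numbers that best match the reported 98% and 45%. The function and variable names are likewise hypothetical.

```python
# Illustrative arithmetic for the Barland and Raskin (1976) blind-scoring result.
# ASSUMPTION: the correct-decision counts are inferred from the reported
# percentages (98% of 40 guilty, 45% of 11 innocent); the text does not state them.
N_GUILTY, N_INNOCENT = 40, 11
CORRECT_GUILTY, CORRECT_INNOCENT = 39, 5


def overall_accuracy(correct_guilty, n_guilty, correct_innocent, n_innocent):
    """Proportion correct over all cases pooled, so the larger group dominates."""
    return (correct_guilty + correct_innocent) / (n_guilty + n_innocent)


def accuracy_at_base_rate(guilty_rate, innocent_rate, p_guilty):
    """Expected proportion correct if a fraction p_guilty of examinees is guilty."""
    return p_guilty * guilty_rate + (1.0 - p_guilty) * innocent_rate


GUILTY_RATE = CORRECT_GUILTY / N_GUILTY        # accuracy on criterion-guilty cases
INNOCENT_RATE = CORRECT_INNOCENT / N_INNOCENT  # accuracy on criterion-innocent cases

# Pooled accuracy in this sample, where about 78% of the cases were guilty.
OVERALL = overall_accuracy(CORRECT_GUILTY, N_GUILTY, CORRECT_INNOCENT, N_INNOCENT)

# Accuracy expected with equal numbers of guilty and innocent examinees,
# which is the simple average of the two group rates.
EQUAL_MIX = accuracy_at_base_rate(GUILTY_RATE, INNOCENT_RATE, 0.5)

print(f"guilty: {GUILTY_RATE:.1%}  innocent: {INNOCENT_RATE:.1%}")
print(f"pooled: {OVERALL:.1%}  equal mix: {EQUAL_MIX:.1%}")
```

Under these assumed counts the pooled figure is about 86% and the equal-mix figure about 71.5%, matching the two numbers in the text. The pooled figure is higher only because guilty cases, which were scored far more accurately, made up most of the sample.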
Barland has raised the possibility that this incompleteness, coupled with the philosophy of American jurisprudence to protect the innocent, may have caused bias in favor of innocent decisions. While this argument may have some merit, the data suggest that this caused a bias toward generating inconclusive decisions rather than those of innocence. Those classed as innocent were probably innocent "beyond a reasonable doubt." Again, however, the validity of the panel decisions as to "ground truth" is questionable.

In contrast to these somewhat low accuracy estimates, proponents of polygraphy have argued that several other studies reporting high accuracies do meet Lykken's "criteria of blind interpretation of confirmed polygraphy charts from criminal suspects" (Raskin & Podlesny, 1979, p. 56). These studies (Horvath & Reid, 1972; Hunter & Ash, 1973; Raskin, 1976; Slowik & Buckley, 1975; Wicklander & Hunter, 1975) report, on the average, 90% and 89% accuracy, respectively, for detection in guilty and innocent suspects. However, Lykken has claimed that the charts were not randomly selected, but rather chosen from the subset that showed "clear" truthful or deceptive patterns. On the other hand, this charge of nonrandom chart selection could also be leveled at the (low accuracy) study of Horvath (1977).

More recently, a study by Ginton, Daie, Elaad, and Ben-Shakhar (1982) appears to provide relevant data on accuracy. An important advantage of this work is that there was clear and independent evidence for what the "ground truth" was, and yet the situation was a real, field situation instead of a simulated, laboratory arrangement.
In their arrangements, subjects did or did not commit some act; the act was committed freely rather than being simulated; the act was verifiable; subjects were concerned about the outcome of the polygraphic examination, and believed that the experimenter did not know who had and who had not committed the act, and that the polygrapher did not know the proportion of guilty people. In total, 21 Israeli police officers participated in the study as part of a police course. During part of the course, subjects were given tests that required written answers. Unbeknownst to the subjects, the answers were secretly recorded on a hidden chemical page underneath their exam page. Later the subjects scored their own test with answer sheets under conditions that physically allowed alteration of their original test sheets. A few days later all subjects were told that cheating had occurred on the tests and were given an opportunity to clear themselves of suspicion by taking a polygraph exam. However, it was also made clear that a negative polygraphic outcome would adversely affect their future careers in the force. In all, 7 subjects cheated, but of the 21 subjects only 15 took the polygraph exam; 3 confessed, 1 guilty subject did not show up for the exam, and 2 (1 guilty and 1 innocent) refused to take the exam, leaving only 2 guilty subjects and 13 innocent subjects. The evaluations of the subjects were made blindly on the charts alone, on the behavior of the subjects alone, and on both the charts and behavior (which is the standard polygraphic practice). In addition, charts were scored both globally and by the field-numerical-scoring technique proposed by Barland and Raskin (1975). The 2 guilty subjects form too small a sample to base any conclusions on, but suffice it to say that the various methods did result in misclassifications (innocent or inconclusive).
For the 13 innocent subjects, one relevant comparison is that among the chart-alone, behavior-alone, and chart-and-behavior conditions, with global chart scoring. Here the respective frequencies of innocent (correct), guilty, and inconclusive (which would probably be sufficiently serious as to affect an innocent policeman's career adversely) categories were as follows: for the chart-alone condition, 7, 3, and 3; for the remaining two conditions, 11, 2, and 0. Accordingly, with the global method of scoring, addition of the charts has no effect on correct or incorrect decisions, while behavioral information (with or without the chart information) appears to provide more accurate and less ambiguous (fewer inconclusive) decisions. The behavioral information appears to reduce the frequency of inconclusives obtained from using the charts alone.

The other issue of interest is to compare the accuracy of the (older) global method of chart scoring with the "semi-objective" numerical system. In the chart-alone (blind) condition, the respective frequencies of the innocent, guilty, and inconclusive categories were 7, 3, and 3 (global) and 5, 1, and 7 (numerical). The corresponding frequencies for the chart-and-behavior condition were 11, 2, and 0 (global) and 6, 1, and 6 (numerical). The most obvious feature of these results is that the numerical method produces more inconclusive decisions and fewer "hits" (innocent classifications) than the global method. This sort of "trade-off" relation is expected on the basis of signal-detection theory. Of course the sample size involved in this study is far too small for any definitive fine-grained conclusions. However, what is clear from what appears to be the only field study that was both sufficiently real life and adequately controlled is that CQT polygraphy, though better than chance, produces a significant percentage of wrong decisions.
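The trade-off between inconclusive outcomes and hits in the Ginton et al. (1982) data can be made concrete by computing accuracy in two ways: over all 13 innocent subjects, and over only those subjects who received a definite decision. The sketch below uses the global and numerical frequencies quoted above for the chart-alone and chart-and-behavior conditions. The dictionary layout and the function name are illustrative assumptions, and with only 13 subjects the percentages are descriptive rather than inferential.

```python
# Frequencies of (innocent, guilty, inconclusive) decisions for the 13 innocent
# subjects, as quoted in the text from Ginton et al. (1982).
DECISIONS = {
    ("chart-alone", "global"): (7, 3, 3),
    ("chart-alone", "numerical"): (5, 1, 7),
    ("chart-and-behavior", "global"): (11, 2, 0),
    ("chart-and-behavior", "numerical"): (6, 1, 6),
}


def rates(correct, wrong, inconclusive):
    """Summarize one scoring condition for innocent subjects."""
    total = correct + wrong + inconclusive
    decided = correct + wrong
    return {
        # Inconclusives counted as failures to clear an innocent subject.
        "correct_of_all": correct / total,
        # Inconclusives set aside; only definite decisions are scored.
        "correct_of_decided": correct / decided,
        "inconclusive_rate": inconclusive / total,
    }


SUMMARY = {key: rates(*counts) for key, counts in DECISIONS.items()}

for (condition, method), r in SUMMARY.items():
    print(
        f"{condition:20s} {method:10s} "
        f"all: {r['correct_of_all']:.0%}  "
        f"decided: {r['correct_of_decided']:.0%}  "
        f"inconclusive: {r['inconclusive_rate']:.0%}"
    )
```

On these numbers, chart-alone numerical scoring looks less accurate than global scoring when inconclusives count against it (5 of 13 versus 7 of 13) but more accurate among decided cases (5 of 6 versus 7 of 10). Which summary is appropriate depends on how costly an inconclusive result is for an innocent examinee, a point the text raises when it notes that such an outcome could itself harm a policeman's career.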
In summary, the primary scientific authorities on the detection of deception have different estimates of the validity of the techniques for field use. Raskin (Raskin & Podlesny, 1979) believes the accuracy to be approximately 90% in field situations, while Lykken has repeatedly (e.g., Lykken, 1974) stated that detection of deception techniques have an accuracy rate of 64% to 71% (against chance rates of 50%). There is a dearth of adequate research on the subject, and that research, moreover, is complicated and difficult to do. At this stage it appears likely that the accuracy of polygraphy is somewhere between Raskin's and Lykken's estimates. That is, the technique is better than chance, but not so foolproof that one does not need to be critical of the procedures and the results of those procedures. Moreover, as also discussed elsewhere in this review, there are special problems having to do with the gravity of making false positive errors, as well as other issues that render the evaluation of polygraphy a complicated and controversial matter.6 In addition, the use of polygraphy raises other ethico-legal issues in liberal democratic societies. We have not discussed these here (for a brief account, see Furedy, 1986), but their existence makes it important to recognize that conclusions regarding polygraphy are conditioned not only by the facts but also by the differing values of various assessors.

NOTES

1. Even this qualified assertion is not meant to imply that the practices and techniques employed by members of the APA are scientifically based, and therefore reliable and valid. Rather, this assertion implies only that the measure employed and the interview techniques used on any occasion are drawn from a finite common pool of measures and techniques.
As will become apparent, different individual situations can result in different measures and techniques being employed by different examiners, and the specific interview will undoubtedly vary with different examiners. In other words, the level of standardization of the polygraph "test" does not approach the level reached by other psychological tests designed to measure such concepts as IQ, personality, and temperament.

2. In addition to the obvious fact that control questions vary in more ways from relevant questions than the presence or absence of deception, there are other unique difficulties with the rationale of the CQT that actually generate the paradoxical conclusion that deception is detected in the innocent rather than the guilty. The paradox arises from the fact that examinees are supposed to be deceptive to the control question. The CQT involves comparing responses to the relevant and control questions, and "decisions of truthful or deceptive require substantially larger overall responses to control or relevant questions, respectively" (Podlesny & Raskin, 1977, p. 786). Accordingly, if one assumes that it is deception as a psychological process rather than guilt about the commission of a specific crime that is to be detected (as the claim that polygraphy involves the "detection of deception" would suggest), then larger autonomic responses to the control relative to the relevant question would be interpretable as detecting deception (in the innocent suspect). The fact that it is possible to generate this paradoxical conclusion indicates that the rationale of the CQT is beset with complications that are not apparent to one who simply assumes that the "control" question is simply a scientific control.

3. The argument that increasing the number of measures and subjects in the sample reduces the chances of finding a no-difference result holds only if the population difference is not precisely zero.
However, that assumption probably holds for most natural phenomena. Specifically, the difference in IQ between males and females is probably not precisely zero, but it is also probably not large enough to be psychologically meaningful, even though it would produce a statistically significant sample difference with sufficiently large sample sizes. Similarly, by increasing the number of channels and tests, the polygrapher increases the chances of a conclusive outcome (either deceptive or truthful) under conditions where the "true" ("population") relevant and control responses do not differ by a "reasonable margin" (analogous to a "psychologically meaningful" difference in the IQ example).

4. This is a rather simplified description of the findings concerning the GKT. For a more thorough discussion, see the work of Ben-Shakhar and his colleagues (e.g., Ben-Shakhar, 1977; Ben-Shakhar, Lieblich, & Kugelmass, 1975).

5. A recent study by Bradley and Warfield (1984) indicates that the GKT may work even when the innocent possess crime-relevant knowledge, and this aspect of the study is certainly stressed in the abstract's "major conclusion that subject may have crime-relevant information and not be classed, based on detection scores, as guilty" (Bradley & Warfield, 1984, p. 683). However, in one group of 10 innocent subjects, the rate of misclassification was 60% (Bradley & Warfield, 1984, p. 687), suggesting that there are conditions where possession of crime-relevant knowledge by the innocent certainly does invalidate the GKT.

6. For an exhaustive, recent review of the studies reporting accuracy rates, see Office of Technology Assessment report (1983).

REFERENCES

Ansley, N. (1983). A compendium of polygraphy validity. Polygraph, 12, 53-61.
Backster, C. (1962). Methods of strengthening our polygraph technique. Police, 6(5), 61-68.
Barland, G. H. (1982). On the accuracy of the polygraph: An evaluative review of Lykken's tremor in the blood. Polygraph, 11, 215-224.
Barland, G. H., & Raskin, D. C. (1973). Detection of deception. In W. F. Prokasy & D. C. Raskin (Eds.), Electrodermal activity in psychological research. New York: Academic Press.
Barland, G. H., & Raskin, D. C. (1975). An evaluation of field techniques in detection of deception. Psychophysiology, 12, 321-330.
Barland, G. H., & Raskin, D. C. (1976, March). Validity and reliability of polygraph examinations of criminal suspects (Report No. 76-1, Contract 75-NI-99-0001, U.S. Department of Justice). Salt Lake City: University of Utah.
Ben-Shakhar, G. (1977). A further study on the dichotomization theory in detection of information. Psychophysiology, 14, 408-413.
Ben-Shakhar, G., Lieblich, I., & Bar-Hillel, M. (1982). An evaluation of polygraphers' judgments: A review from a decision theoretic perspective. Journal of Applied Psychology, 67, 701-703.
Ben-Shakhar, G., Lieblich, I., & Kugelmass, S. (1975). Detection of information and GSR habituation: An attempt to derive detection efficiency from two habituation curves. Psychophysiology, 12, 283-288.
Bersh, P. J. (1969). A validation of polygraph examiner judgments. Journal of Applied Psychology, 53, 399-403.
Bradley, M. T., & Janisse, M. P. (1981). Accuracy demonstrations, threats, and the detection of deception: Cardiovascular, electrodermal, and pupillary measures. Psychophysiology, 18, 307-315.
Bradley, M. T., & Warfield, J. F. (1984). Innocence, information, and the guilty knowledge test in the detection of deception. Psychophysiology, 21, 683-689.
Brenner, M., Branscomb, H. H., & Schwartz, G. E. (1979). Psychological stress evaluator: Two tests of a vocal measure. Psychophysiology, 16, 351-357.
Furedy, J. J. (1983). Operational, analogical, and genuine definitions of psychophysiology. International Journal of Psychophysiology, 7, 13-19.
Furedy, J. J. (1986). Lie detection as psychophysiological differentiation: Some fine lines. In M. Coles, E. Donchin, & S.
Porges (Eds.), Psychophysiology: Systems, processes, and applications: A handbook. New York: Guilford.
Ginton, A., Daie, N., Elaad, E., & Ben-Shakhar, G. (1982). A method of evaluating the use of the polygraph in a real-life situation. Journal of Applied Psychology, 67, 131-137.
Hemsley, G. D. (1977). Experimental studies in the behavioral indicants of deception. Unpublished doctoral dissertation, University of Toronto.
Heslegrave, R. J. (1981). A psychophysiological analysis of the detection of deception: The role of information, retrieval, novelty and conflict mechanism. Unpublished Ph.D. thesis, University of Toronto.
Horvath, F. S. (1977). The effect of selected variables on interpretation of polygraph records. Journal of Applied Psychology, 62, 127-136.
Horvath, F. S. (1978). An experimental comparison of the psychological stress evaluator and the galvanic skin response in the detection of deception. Journal of Applied Psychology, 63, 338-344.
Horvath, F. S. (1982). Detecting deception: The promise and the reality of voice stress analysis. Polygraph, 11, 304-308.
Horvath, F. S., & Reid, J. E. (1972). The polygraph silent answer test. Journal of Criminal Law, Criminology, & Police Science, 63, 285-293.
Hunter, F. L., & Ash, P. (1973). The accuracy and consistency of polygraph examiners' diagnosis. Journal of Police Science & Administration, 1, 370-375.
Kircher, J. C., & Raskin, D. C. (1983). Clinical versus statistical lie detection revisited: Through a lens sharply. Psychophysiology, 20, 452 (Abstract).
Kugelmass, S., Lieblich, I., Ben-Ishai, A., Opatowski, A., & Kaplan, M. (1968). Experimental evaluation of galvanic skin response and blood pressure change indices during criminal interrogation. Journal of Criminal Law, Criminology, & Police Science, 59, 632-635.
Lippold, O. (1971). Physiological tremor. Scientific American, 224, 65-73.
Lykken, D. T. (1959). The GSR in the detection of guilt.
Journal of Applied Psychology,43,385-388. Lykken, D. T. (1960). The validity of the guilty knowledge technique: The effects of faking. Journal of Applied Psychology. 44,258-262. Lykken, O. T. (1974). Psychology and the lie detector industry. American Psychologist,29, 725-739. Lykken, D. T. (1978). The psychopath and lie detector. Psychophysiology, 15,137142. Lykken, D. T. (1979). The detection of deception. Psychological Bulletin, 86,47-53. Lykken, D. T. (1981). A tremor in the blood: Uses and abuses of the lie detector. New York: McGraw-Hill. Lynch, B. E., & Henry, D. R. (1979). A validity study of the Psychological Stress Evaluator. Canadian Journal of Behavioural Science, 11, 89-94. Nachshon, I., & Feldman, B. (1980). Vocal indices of psychological stress: A validation study of the psychological stress evaluator. Journal of Police Science & Administration. 8,40-53. Office of Technology Assessment. (1983, November). Scientific Validity of Polygraph Testing. Washington, DC: Government Printing Office. Podlesny, J. A., & Raskin, D. C. (1977). Physiological measures and the detection of deception. Psychological Bulletin. 84,782-799. Podlesny, J. A., & Raskin, D. C. (1978). Effectiveness of techniques and physiological measures in the detection of deception. Psychophysiology, 15,344-359. Raskin, R. C. (1976, June). Reliability of chart interpretations and sources of errors in polygraphy examinations (Report No. 76-3, contract 75-Nii-99-0001, U.S. Department of Justice). Salt Lake City: University of Utah. Raskin, D. C. (1978). Scientific assessment of the accuracy of detection of deception: A reply to Lykken. Psychophysiology, 15,143-147. Raskin, D. C, A Hare, R. D. (1978). Psychopathy and detection of deception in a prison population. Psychophysiology, 15, 126-136. Raskin, D. C, & Podlesny, J. A. (1979). Truth and deception: A reply to Lykken. Psychological Bulletin, 86,54-59. . Slovik, S. M., & Buckley, J. P. (1975). 
Relative accuracy of polygraph examiner diagnosis of respiration, blood pressure, and GSR recordings. Journal of Police Science & Administration, 3, 305-309. Tanner, W. P., Jr., & Swets, J. A. (1954). A decision-making theory of visual detection. Psychological Review, 61, 401-409. Wicklander, D. E., & Hunter, F. L. (1975). The influence of auxiliary sources of information in polygraph diagnoses. Journal of Police Science & Administration, 33,405-409.