Forensic Voiceprints
By Katherine Ramsland
Source: http://www.care2.com/c2c/groups/disc.html?gpp=5154&pst=855449

The Origin of Voiceprints

Voice analysis for the KGB? That's what political prisoners with special skills are forced to do in Alexander Solzhenitsyn's fact-based novel, The First Circle. Although imprisoned, these scientists have a unique position in Stalin's Russia. They live in a penal institution that doubles as a scientific research center and their assignment is to develop voiceprint technology. While the Russian secret police analyze phone calls in Germany, the technicians are pressed to figure out how to scientifically measure the individuality of the human voice. The novel offers a fascinating glimpse into the early days of this technology, but it was not in 1949 Russia where it all began.

The idea that someone could be identified by the sound of his voice had its origins in the work of Alexander Melville Bell (father to Alexander Graham Bell). Over one hundred years ago, he developed a visual representation of what the spoken word would look like. It was based on pronunciation and he showed that there were subtle differences among different people who said the same things. His son later joined him in using the system to help deaf people to speak.

Then in 1941, the laboratories of Bell Telephone in New Jersey produced a machine, the sound spectrograph, for mapping a voice onto a graph. It analyzed sound waves and produced a visual record of voice patterns that were based on frequency, intensity, and time. Acoustic scientists used it during World War II, as seen in Solzhenitsyn's novel, to attempt to identify enemy voices on telephones and radios. However, with the war's end, the urgency for this technology diminished and little came of it until later.

Voiceprint technology began to get notice for criminal investigations in the early 1960s when the New York City Police Department received numerous bomb threats by phone against major airlines.
Stymied, the FBI asked Bell Labs to help. Lawrence G. Kersta, one of their senior engineers, was assigned the task of figuring out a method of identification that would stop the calls and bring the perpetrators to justice. He was a physicist who had worked with the sound spectrograph in its early days. It took him more than two years and the analysis of over 50,000 voices, but he managed to offer a technique that he claimed tested at 99.65% accuracy. He had even brought in professional mimics to try to fool the machine, but try as they might to imitate someone else's voice, the mimics showed up in the graph as quite dissimilar from the original voices. Kersta eventually broke away from Bell Laboratories to market the machine on his own. Then in 1966, the Michigan State Police started to work on a practical application of voiceprint technology in criminal investigations. They formed a Voice Identification Unit and hired Kersta to train these officers. Their intent was to use it to assist with ongoing cases, but it wasn't long before its legal weight was reviewed in a courtroom. Voiceprint technology came into the American courts in the 1960s, and judges were divided on whether or not to admit it as scientific evidence. There was little research to support it, there were few people who really could be called technical experts, and linguists testified against one another on its viability. The first case was in military court, United States v. Wright, and that began the judicial controversy. One court ruled the technology admissible, but a dissenting judge wrote a detailed opinion on why it should not be considered scientifically acceptable. The New Jersey Supreme Court was the first non-military court to make an appellate review, in State v. Cary. Courts in New York and California had admitted this type of testimony, so the New Jersey justices remanded the case to check the accuracy of the equipment. 
Another appeal came their way and they ruled that it was too early to tell whether this method was reliable. After several more times back and forth, with no new scientific support, the voiceprint identification evidence was excluded. The reason for this, and the subsequent case history, are supplied in detail in Section Five. First, let's look at how the sound spectrograph worked in a murder investigation.

The Voiceprint of a Killer

The year was 1971. Neil LaFeve, an amiable, law-abiding game warden in Wisconsin, was found murdered on September 24th, his 32nd birthday. That afternoon, he had been out in the woods posting signs and had planned to finish long before the party that his wife had organized for him. When he failed to show up, his wife grew worried and phoned his boss. They discussed it together, but there was no reason they could think of that Neil might still be out in the woods.

LaFeve's boss drove out to have a look. He noticed that all the signs had been posted, so when darkness came and there was still no indication that LaFeve was returning, he called the police. They searched through the night, but gave up without finding the missing warden.

In the morning, the search party came across LaFeve's truck. It was empty and the door was ajar. Things looked bad and only got worse when they found a large amount of blood not far away. Another searcher picked up some broken sunglasses and two spent shells from a .22 rifle. From there, more signs of a wounded man formed a trail: human body matter, a tooth, blood, and bone fragments. They felt certain they would not find him alive.

Finally the search party reached a spot that looked like it had been recently dug up. The police got shovels and soon they had located Neil LaFeve, without his head. Another freshly dug spot nearby, though much smaller, yielded his head. It had been hacked off with a blunt instrument (a shovel or spade) and two bullets were embedded in the skull.
The coroner also found several bullets in the corpse.

The first step was to determine if LaFeve had any enemies. The officers in charge of the investigation looked through a list of men that LaFeve had arrested for poaching, because these men could have a vendetta. The brutality of the attack indicated rage or revenge, not just a random killing. All of the men who had been convicted of hunting illegally on those grounds were located and interviewed on tape, and a few were asked to submit to polygraph exams.

However, there was one man who refused to cooperate: 21-year-old Brian Hussong. LaFeve had arrested him several times, yet he had continued to poach. Hussong had no alibi for September 24th and he resisted all attempts to clear up the murder mystery. He seemed a likely suspect.

Sergeant Marvin Gerlikovski was in charge, so he got a rare court order that allowed him to put a wiretap on Hussong's house. He took the extra precaution of recording everything that was said, which paid off in a way he didn't expect. It wasn't long before Hussong got on the phone to get his grandmother to hide his guns and give him an alibi. She appeared to cooperate, so Gerlikovski sent detectives to her house. Flustered, she led them straight to the hiding place. Ballistics experts confirmed a match between the .22 rifle and the bullets found in LaFeve's body, which was enough evidence to place Hussong under arrest.

Gerlikovski then sent the tapes he had made to Michigan's Voice Identification Unit, at that time the best in the world for this type of procedure. The leading experts in voiceprint analysis had trained these officers. Ernest Nash examined the tapes, gave his opinion, and ended up serving as an expert witness during Hussong's trial. However, it was not Hussong's voice that he testified about, but that of Hussong's grandmother. She had denied saying that she had hidden the guns, so Nash explained how he could match her voice to that of the voice on the tape.
He then used his laboratory results to affirm that she was definitely the person speaking to her grandson on the tape. The jury listened to the tapes again, and after less than four hours of deliberation, they returned a guilty verdict of first-degree murder that gave Hussong a life term in prison.

So just what is it about the human voice that makes it electronically measurable?

The Spectrograph and the Human Voice

Anyone who talks on a phone or tape recorder is fair game for voice analysis, especially if they have criminal intent. More and more law enforcement officers are being trained in voiceprint analysis, and with the development of computer and digital spectrogram technology, the procedure is becoming widely used.

Lawrence Kersta noted that each person's voice has a unique quality that can be mapped on a graph. The individuality derives primarily from differences in physical vocal mechanisms. One person's vocal cords, no matter how similar they might look, process sounds differently than someone else's. The size and shape of someone's vocal cavity, tongue, and nasal cavities contribute to this, as well as how that person coordinates lips, jaw, tongue, and soft palate to make speech. No combination of these things is like any other. That means that our voices are sufficiently unique to make personal identification based on voice sounds possible.

Although Kersta also believed that an individual's voice does not change over his or her lifetime, other experts have disputed him on this point. If the body changes, so does the voice. Even where a person lives can produce voice changes, as can illness, stress, aging, and other factors. Nevertheless, Kersta maintained that the essential qualities of the voice remain constant. He felt that he finally proved this in one of the most famous cases involving the spectrograph: that of the reclusive Howard Hughes.
In 1971, a man named Clifford Irving came to New York to cut a deal for what he claimed was Hughes' autobiography, ghosted by him. He had letters that he insisted were written by Hughes and experts soon authenticated them. The publisher McGraw-Hill bought into his claim, advancing him $765,000 and announcing their intent to publish the book. Eventually Irving turned in a 1200 page manuscript. It was difficult to ascertain whether Hughes had actually authorized this transaction since for the past fifteen years he had been exceedingly elusive. That Irving had letters from him seemed a good indicator that they knew each other. Several people who had known Hughes read the manuscript and felt convinced that it was genuinely his story. However, he finally surfaced from his retreat on Paradise Island in the Bahamas to renounce the book. Hughes claimed that he had never met Clifford Irving and that the whole thing was a fake. He added that he did not know where Irving had gotten his information. However, he was not willing to make his renunciation in person. He agreed only to do this by phone. That meant that he could be identified only by his voice—how it sounded and what he said. A group of reporters familiar with him from his early days was assembled by NBC in Los Angeles to ask him questions for two hours. Their purpose was to authenticate the voice on the phone as that of the famous, eccentric billionaire, and they were to ask some key questions that would trip up an imposter. The man on the phone responded in convincing detail. He talked about such things as the make of his plane and trips that he had made, but he stumbled when asked about the good luck charm that a woman had presented to him before his 1938 trip around the world. He said that he could not recall the incident, but moments later he did: She had placed chewing gum on the tail of his plane. 
This entire phone conversation was recorded and as they listened again, the reporters all believed that Hughes had been the man on the phone. That meant that Irving was a fraud.

Irving defended himself by insisting that the person who had called was the imposter, but NBC had hired Lawrence Kersta to make a voiceprint analysis. He measured pitch, tone, and volume to compare the voice pictures on a line-by-line basis, comparing a recording of a speech that Hughes had made in 1947 with the recordings from the interview. Finally he announced that the man who had spoken to reporters was Howard Hughes. Even one of Kersta's most vocal critics, phonetics professor Peter Ladefoged, admitted that the recordings were remarkably identical. Irving was arrested and convicted of forgery. He repaid the publisher and was sentenced to thirty months in prison.

Since the recordings had been made nearly a quarter of a century apart and Hughes' voice had deepened, there had been concern that changes would make the reading impossible. However, the spectrographic patterns proved to be impressively similar. This result further convinced Kersta that the inherent uniqueness of an individual's voice remains constant.

Spectrographic analysis of the human voice has made a similar impact in other criminal cases, so let's see more specifically how an interpretation is made.

How It Works

Many law enforcement laboratories are equipped with at least one sound spectrograph, although there are several types to choose from. This machine plots the frequency of a complex sound according to time and intensity. Its function is based on the idea that the human voice is produced by a combination of physiological structures and harmonics.

The vocal column begins in the vocal folds and ends at the lips. The vocal folds function acoustically as a closed end so that the vocal column becomes a closed-tube resonator. The tension of the vocal folds determines the vibrational frequency.
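For an idealized tube that is closed at one end and open at the other, textbook acoustics places the resonances at odd multiples of a quarter-wavelength frequency: f_n = (2n - 1) * c / (4 * L), where c is the speed of sound and L is the tube length. The Python sketch below only illustrates that formula. The function name, the 343 m/s speed of sound, and the 17.5 cm tube length are assumed illustrative values, not figures from this article, and a real vocal column is not a uniform tube.

```python
def closed_tube_resonances(length_m, speed_of_sound=343.0, count=3):
    """Resonant frequencies (Hz) of an ideal tube closed at one end.

    Assumption: a uniform tube, closed at the vocal folds and open at
    the lips, so only odd quarter-wavelength modes are supported.
    """
    return [
        (2 * n - 1) * speed_of_sound / (4.0 * length_m)
        for n in range(1, count + 1)
    ]


# Hypothetical example: a 17.5 cm tube (an assumed, illustrative length).
if __name__ == "__main__":
    for number, freq in enumerate(closed_tube_resonances(0.175), start=1):
        print(f"resonance {number}: {freq:.0f} Hz")
```

With these assumed values the first three resonances fall near 490, 1470, and 2450 Hz. Changing the length shifts every resonance together, which is one simplified way to picture why differently shaped vocal columns yield different patterns.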
When a sound is produced, those harmonics nearest the resonant frequency of the vocal column increase in amplitude. If the shape of the mouth, throat, or lips changes, the frequencies vary with the change.

The sound spectrograph converts the sound of a voice into a visual graphic display known as a voiceprint. The analog spectrograph has four parts: a magnetic tape recorder unit, a tape scanning device, a filter, and an electronic stylus that writes the information onto electrically sensitive paper.

A high-quality tape is fastened to the scanning drum, which holds a 2.5-second segment of tape time. The process takes about eighty to ninety seconds to complete. As the drum revolves, an electronic filter is activated that allows only a certain band of frequencies to get through to the recorder. These frequencies are translated into electrical energy that gets recorded by the stylus. As the process continues, the filter moves into increasingly higher frequencies and the stylus records the intensity levels of each defined range. The final print shows a pattern of closely spaced lines that represent 2.5 seconds' worth of all of the distinguishable frequencies of that person's voice as it was taped.

The horizontal axis on a voiceprint represents time. The vertical axis represents frequency, registering how high or low a voice is. The degree of darkness within each region on the graph illustrates the degree of intensity, or the voice's volume.

Two kinds of prints can be made: bar prints, which are utilized for identification, and contour prints, which help to file the prints in a computer. Recent developments include digital spectrographs that can be used with a computer for enhanced comparison and measurement, but some specialists still prefer the older analog model.

Comparisons are made between voice samples and when sufficient similarity exists between one pattern and another, the voices are believed to have a high probability of originating from the same person.
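The scanning process described above can be approximated in software. The following is a minimal Python sketch, not a model of the analog instrument's circuitry: it swaps the sweeping analog filter for a windowed single-frequency Fourier sum, and the function names, frame length, hop size, and band centers are all assumptions chosen for illustration.

```python
import cmath
import math


def band_intensity(frame, sample_rate, freq_hz):
    """Intensity of one frequency band within one short slice of audio."""
    n = len(frame)
    total = 0j
    for i, sample in enumerate(frame):
        # Hann window: fades the slice edges to limit spectral leakage.
        window = 0.5 - 0.5 * math.cos(2 * math.pi * i / n)
        total += window * sample * cmath.exp(
            -2j * math.pi * freq_hz * i / sample_rate
        )
    return abs(total) / n


def spectrogram(samples, sample_rate, band_centers_hz, frame_len=256, hop=128):
    """Return grid[band][time] of intensities.

    Rows are swept from the lowest band to the highest, loosely
    mirroring the filter that moves into increasingly higher
    frequencies. Columns are successive time slices.
    """
    starts = range(0, len(samples) - frame_len + 1, hop)
    grid = []
    for freq in sorted(band_centers_hz):
        row = [
            band_intensity(samples[s:s + frame_len], sample_rate, freq)
            for s in starts
        ]
        grid.append(row)
    return grid


if __name__ == "__main__":
    # Hypothetical input: a quarter second of a pure 500 Hz tone.
    rate = 8000
    tone = [math.sin(2 * math.pi * 500 * i / rate) for i in range(2000)]
    for freq, row in zip([250, 500, 1000, 2000],
                         spectrogram(tone, rate, [250, 500, 1000, 2000])):
        print(freq, "Hz:", " ".join(f"{v:.2f}" for v in row))
```

In the returned grid each row is one frequency band and each column is one time slice, so mapping larger values to darker marks would give a crude digital counterpart of the printed voiceprint. A practical tool would use a fast Fourier transform rather than this slow direct sum.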
For forensic purposes, the voiceprint interpreter needs a recording of the suspect's voice (e.g., from an interview) to compare to the sample made in the context of a crime, such as an obscene phone call or taped conversation. Other people's voices, unrelated to the crime, are used for elimination factors (points of dissimilarity).

Interpreters use two methods of identification:

Aural: listening to the voice on tape to compare single sounds and series of sounds for similarities and discrepancies; the examiner also listens for breath patterns, inflections, unusual speech habits, and accents.

Visual: reading the voiceprints to compare their appearances.

First, the examiner evaluates the recording of the unknown voice, to make sure it has sufficient quality and clarity for analysis. Then the examiner turns to the recording of the known person to ensure that it has similar clarity. The best test cases have the suspect repeat what was said on the "unknown voice" tape, or at least include as many of the same words as possible.

The aural and visual methods are combined to come up with one of five conclusions:

positive identification
probable identification
positive elimination
probable elimination
no decision

The highest standard, positive identification, requires the identification of twenty speech sounds that possess similarities. "Positive elimination" derives from twenty or more differences, and the rest fall on a spectrum in between.

Voiceprint Analysis Expertise

To be qualified as experts in voiceprint analysis, technicians must:

1. complete a course of study on spectrographic analysis that generally runs from two to four weeks
2. complete one hundred voice comparison cases under intense personal supervision by a known expert
3. be examined by a board of experts in the field

Since courts generally contest the methods of interpretation, not the actual accuracy or reliability of the spectrographic instrument, it is important that any spectrograph technician who testifies in court be highly qualified. The less training and experience the technician has, the more such testimony becomes vulnerable to serious questions by the judge and jury.

All of the studies that have been done on spectrographic accuracy, including a 1986 FBI survey, show that those people who have been properly trained and who use standard aural and visual procedures get highly accurate results. The opposite is true where training and/or analysis methods are limited. Bringing such studies to the attention of the courts could help determine who is indeed an expert and could minimize some of the controversy and confusion that comes from misperception.

Those who do the recordings for analysis must also be competent to operate the recording device, because the quality of the tape has great bearing on the interpreter's results.

The skills involved in aural and visual voice interpretation include:

1. Critical listening, with an ear for anomalies and the ability to audit foreground information as distinguishable from background
2. Ability to check for tape tampering
3. Experience reading magnetic tapes
4. Ability to operate the spectrograph equipment, both for general results and for zooming in on specific patterns
5. Ability to work with an investigative team

In all likelihood, voiceprints will continue to play a key role in any investigation that involves voice evidence. As such, they will become part of the evidence brought into court. Like other technologies that once were resisted but are now fully admissible, voiceprints may soon have their day.