Forensic Voiceprints

By Katherine Ramsland
Source: http://www.care2.com/c2c/groups/disc.html?gpp=5154&pst=855449
The Origin of Voiceprints
Voice analysis for the KGB?
That's what political prisoners with special skills are forced to do in Alexander
Solzhenitsyn's fact-based novel, The First Circle. Although imprisoned, these
scientists have a unique position in Stalin's Russia. They live in a penal
institution that doubles as a scientific research center and their assignment is to
develop voiceprint technology. While the Russian secret police analyze phone
calls in Germany, the technicians are pressed to figure out how to scientifically
measure the individuality of the human voice. The novel offers a fascinating
glimpse into the early days of this technology, but it was not in 1949 Russia
where it all began.
The idea that someone could be identified by the sound of his voice had its
origins in the work of Alexander Melville Bell (father to Alexander Graham Bell).
Over one hundred years ago, he developed a visual representation of what the
spoken word would look like. It was based on pronunciation and he showed that
there were subtle differences among different people who said the same things.
His son later joined him in using the system to help deaf people to speak.
Then in 1941, the laboratories of Bell Telephone in New Jersey produced a
machine—the sound spectrograph—for mapping a voice onto a graph. It
analyzed sound waves and produced a visual record of voice patterns that were
based on frequency, intensity, and time. Acoustic scientists used it during World
War II, as seen in Solzhenitsyn's novel, to attempt to identify enemy voices on
telephones and radios. However, with the war's end, the urgency for this
technology diminished and little came of it until later.
Voiceprint technology began to attract notice in criminal investigations in the early
1960s when the New York City Police Department received numerous bomb
threats by phone against major airlines. Stymied, the FBI asked Bell Labs to
help.
Lawrence G. Kersta, one of their senior engineers, was assigned the task of
figuring out a method of identification that would stop the calls and bring the
perpetrators to justice. He was a physicist who had worked with the sound
spectrograph in its early days. It took him more than two years and the analysis
of over 50,000 voices, but he managed to offer a technique that he claimed
tested at 99.65% accuracy. He had even brought in professional mimics to try
to fool the machine, but try as they might to imitate someone else's voice, the
mimics showed up in the graph as quite dissimilar from the original voices.
Kersta eventually broke away from Bell Laboratories to market the machine on
his own.
Then in 1966, the Michigan State Police started to work on a practical application
of voiceprint technology in criminal investigations. They formed a Voice
Identification Unit and hired Kersta to train these officers. Their intent was to
use it to assist with ongoing cases, but it wasn't long before its legal weight was
reviewed in a courtroom.
Voiceprint technology came into the American courts in the 1960s, and judges
were divided on whether or not to admit it as scientific evidence. There was
little research to support it, there were few people who really could be called
technical experts, and linguists testified against one another on its viability.
The first case was in military court, United States v. Wright, and that began the
judicial controversy. One court ruled the technology admissible, but a dissenting
judge wrote a detailed opinion on why it should not be considered scientifically
acceptable.
The New Jersey Supreme Court was the first non-military court to make an
appellate review, in State v. Cary. Courts in New York and California had
admitted this type of testimony, so the New Jersey justices remanded the case
to check the accuracy of the equipment. Another appeal came their way and
they ruled that it was too early to tell whether this method was reliable. After
several more times back and forth, with no new scientific support, the voiceprint
identification evidence was excluded.
The reason for this, and the subsequent case history, are supplied in detail in
Section Five. First, let's look at how the sound spectrograph worked in a murder
investigation.
The Voiceprint of a Killer
The year was 1971. Neil LaFeve, an amiable but law-abiding game warden in
Wisconsin, was found murdered on September 24th, on his 32nd birthday. That
afternoon, he had been out in the woods posting signs and had planned to finish
long before the party that his wife had organized for him. When he failed to
show up, his wife grew worried and phoned his boss. They discussed it
together, but there was no reason they could think of that Neil might still be out
in the woods.
LaFeve's boss drove out to have a look. He noticed that all the signs had been
posted, so when darkness came and there was still no indication that LaFeve was
returning, he called the police. They searched through the night, but gave up
without finding the missing warden.
In the morning, the search party came across LaFeve's truck. It was empty and
the door was ajar. Things looked bad and only got worse when they found a
large amount of blood not far away. Another searcher picked up some broken
sunglasses and two spent shells from a .22 rifle. From there, more signs of a
wounded man formed a trail: human body matter, a tooth, blood, and bone
fragments. They felt certain they would not find him alive.
Finally the search party reached a spot that looked like it had been recently dug
up. The police got shovels and soon they had located Neil LaFeve - without his
head. Another freshly dug spot nearby, though much smaller, yielded his head.
It had been hacked off with a blunt instrument (a shovel or spade) and two
bullets were imbedded in the skull. The coroner also found several bullets in the
corpse.
The first step was to determine if LaFeve had any enemies. The officers in
charge of the investigation looked through a list of men that LaFeve had arrested
for poaching, because these men could have a vendetta. The brutality of the
attack indicated rage or revenge, not just a random killing.
All of the men who had been convicted of hunting illegally on those grounds
were located and interviewed on tape, and a few were asked to submit to
polygraph exams. However, there was one man who refused to cooperate: 21-year-old
Brian Hussong. LaFeve had arrested him several times, yet he had
continued to poach. Hussong had no alibi for September 24th and he resisted all
attempts to clear up the murder mystery. He seemed a likely suspect.
Sergeant Marvin Gerlikovski was in charge, so he got a rare court order that
allowed him to put a wiretap on Hussong's house. He took the extra precaution
of recording everything that was said, which paid off in a way he didn't expect.
It wasn't long before Hussong got on the phone to get his grandmother to hide
his guns and give him an alibi. She appeared to cooperate, so Gerlikovski sent
detectives to her house. Flustered, she led them straight to the hiding place.
Ballistics experts confirmed a match between the .22 rifle and the bullets found
in LaFeve's body, which was enough evidence to place Hussong under arrest.
Gerlikovski then sent the tapes he had made to Michigan's Voice Identification
Unit—at that time the best in the world for this type of procedure. The leading
experts in voiceprint analysis had trained these officers. Ernest Nash examined
the tapes, gave his opinion, and ended up serving as an expert witness during
Hussong's trial. However, it was not Hussong's voice that he testified about, but
that of Hussong's grandmother. She had denied saying that she had hidden the
guns, so Nash explained how he could match her voice to that of the voice on
the tape. He then used his laboratory results to affirm that she was definitely
the person speaking to her grandson on the tape.
The jury listened to the tapes again, and after less than four hours of
deliberation, they returned a guilty verdict of first-degree murder that gave
Hussong a life term in prison.
So just what is it about the human voice that makes it electronically measurable?
The Spectrograph and the Human Voice
Anyone who talks on a phone or tape recorder is fair game for voice analysis,
especially if they have criminal intent. Increasingly, more law enforcement
officers are getting trained in voiceprint analysis, and with the development of
computer and digital spectrogram technology, the procedure is becoming widely
used.
Lawrence Kersta noted that each person's voice has a unique quality that can be
mapped on a graph. The individuality derives primarily from differences in
physical vocal mechanisms. One person's vocal cords, no matter how similar
they might look, process sounds differently than someone else's. The size and
shape of someone's vocal cavity, tongue, and nasal cavities contribute to this, as
well as how that person coordinates lips, jaw, tongue, and soft palate to make
speech. No combination of these things is like any other. That means that our
voices are sufficiently unique to make personal identification based on voice
sounds possible.
Although Kersta also believed that an individual's voice does not change over his
or her lifetime, other experts have disputed him on this point. If the body
changes, so does the voice. Even where a person lives can effect voice changes,
as well as illness, stress, aging, and other factors. Nevertheless, Kersta
maintained that the essential qualities of the voice remain constant. He felt that
he finally proved this in one of the most famous cases involving the
spectrograph: that of the reclusive Howard Hughes.
In 1971, a man named Clifford Irving came to New York to cut a deal for what
he claimed was Hughes' autobiography, ghosted by him. He had letters that he
insisted were written by Hughes and experts soon authenticated them. The
publisher McGraw-Hill bought into his claim, advancing him $765,000 and
announcing their intent to publish the book. Eventually Irving turned in a 1,200-page
manuscript.
It was difficult to ascertain whether Hughes had actually authorized this
transaction since for the past fifteen years he had been exceedingly elusive.
That Irving had letters from him seemed a good indicator that they knew each
other. Several people who had known Hughes read the manuscript and felt
convinced that it was genuinely his story. However, he finally surfaced from his
retreat on Paradise Island in the Bahamas to renounce the book.
Hughes claimed that he had never met Clifford Irving and that the whole thing
was a fake. He added that he did not know where Irving had gotten his
information. However, he was not willing to make his renunciation in person.
He agreed only to do this by phone. That meant that he could be identified only
by his voice—how it sounded and what he said.
A group of reporters familiar with him from his early days was assembled by NBC
in Los Angeles to ask him questions for two hours. Their purpose was to
authenticate the voice on the phone as that of the famous, eccentric billionaire,
and they were to ask some key questions that would trip up an imposter. The
man on the phone responded in convincing detail. He talked about such things
as the make of his plane and trips that he had made, but he stumbled when
asked about the good luck charm that a woman had presented to him before his
1938 trip around the world. He said that he could not recall the incident, but
moments later he did: She had placed chewing gum on the tail of his plane.
This entire phone conversation was recorded and as they listened again, the
reporters all believed that Hughes had been the man on the phone. That meant
that Irving was a fraud.
Irving defended himself by insisting that the person who had called was the
imposter, but NBC had hired Lawrence Kersta to make a voiceprint analysis. He
measured pitch, tone, and volume to compare the voice pictures on a line-by-line
basis, comparing a recording of a speech that Hughes had made in 1947 with
the recordings from the interview. Finally he announced that the man who had
spoken to reporters was Howard Hughes.
Even one of Kersta's most vocal critics, phonetics professor Peter Ladefoged,
admitted that the recordings were remarkably similar. Irving was arrested and
convicted of forgery. He repaid the publisher and was sentenced to thirty
months in prison.
Since the recordings had been made nearly a quarter of a century apart and
Hughes' voice had deepened, there had been concern that changes would make
the reading impossible. However, the spectrographic patterns proved to be
impressively similar. This result further convinced Kersta that the inherent
uniqueness of an individual's voice remains constant.
Spectrographic analysis of the human voice has made a similar impact in other
criminal cases, so let's see more specifically how an interpretation is made.
How It Works
Many law enforcement laboratories are equipped with at least one sound
spectrograph, although there are several types to choose from. This machine
plots the frequency of a complex sound according to time and intensity. Its
function is based on the idea that the human voice is produced by a combination
of physiological structures and harmonics.
The vocal column begins in the vocal folds and ends at the lips. The vocal folds
function acoustically as a closed end so that the vocal column becomes a closed-tube
resonator. The tension of the vocal folds determines the vibrational
frequency. When a sound is produced, those harmonics nearest the resonant
frequency of the vocal column increase in amplitude. If the shape of the mouth,
throat, or lips changes, the frequencies vary with the change.
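The closed-tube idea can be made concrete with a short calculation. The sketch below assumes a uniform tube closed at the vocal folds and open at the lips, which resonates at odd multiples of c / (4L). The 17 cm length and the 343 m/s speed of sound are common textbook values, not figures from this article, and the function name is only illustrative. Under those assumptions the first three resonances land near 500, 1500, and 2500 Hz.

```python
def closed_tube_resonances(length_m, count=3, speed_of_sound=343.0):
    """Resonant frequencies (Hz) of a tube closed at one end and open at the other.

    Only odd quarter-wavelength modes fit: f_n = (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, count + 1)]


# Assumed 17 cm vocal column (a textbook figure, not taken from the article).
for freq in closed_tube_resonances(0.17):
    print(round(freq), "Hz")
```

Shortening the assumed length raises every resonance, which is one simple way to see why differently sized vocal tracts produce different patterns.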
The sound spectrograph converts the sound of a voice into a visual graphic
display known as a voiceprint. The analog spectrograph has four parts: a
magnetic tape recorder unit, a tape scanning device, a filter, and an electronic
stylus that writes the information onto electrically sensitive paper.
A high-quality tape is fastened to the scanning drum, which holds a 2.5-second segment
of tape time. The process takes about eighty to ninety seconds to complete. As
the drum revolves, an electronic filter is activated that allows only a certain band
of frequencies to get through to the recorder. These frequencies are translated
into electrical energy that gets recorded by the stylus. As the process continues,
the filter moves into increasingly higher frequencies and the stylus records the
intensity levels of each defined range. The final print shows a pattern of closely
spaced lines that represent 2.5 seconds' worth of all of the distinguishable
frequencies of that person's voice as it was taped.
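The band-by-band sweep described above can be imitated digitally. The following is a rough sketch, not the analog instrument's actual circuitry: it takes one frequency band at a time, measures how strongly that band is present in each successive slice of the recording, and then moves up to the next band. The function name, the sample rate, the frame length, and the synthetic 500 Hz test tone are all assumptions made for the demonstration.

```python
import math


def band_sweep(samples, rate, frame_len, band_centers):
    """Return one row per band: that band's amplitude in each successive time frame."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    rows = []
    for freq in band_centers:  # the "filter" moves up one band per pass
        step = 2.0 * math.pi * freq / rate
        row = []
        for frame in frames:
            # Correlate the frame with a cosine and a sine at the band's frequency.
            re = sum(x * math.cos(step * n) for n, x in enumerate(frame))
            im = sum(x * math.sin(step * n) for n, x in enumerate(frame))
            row.append(2.0 * math.hypot(re, im) / frame_len)
        rows.append(row)
    return rows


# A synthetic 500 Hz tone stands in for a taped voice (assumption for the demo).
rate = 8000
tone = [math.sin(2.0 * math.pi * 500 * n / rate) for n in range(rate // 4)]
rows = band_sweep(tone, rate, frame_len=400, band_centers=[500, 1000, 1500])
```

Each row of the result corresponds to one pass of the filter, and each value in a row corresponds to one moment on the tape, so the grid as a whole has the same layout as the printed voiceprint.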
The horizontal axis on a voiceprint represents the parameter of time. The
vertical axis is the frequency, registering how high or low a voice is. The degree of
darkness within each region on the graph illustrates the degree of intensity, or
the voice's volume.
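To illustrate that layout, the sketch below draws a small grid of intensities as text, with time running left to right, frequency running bottom to top, and darker characters standing for louder regions. The five-character shading scale, the function name, and the sample grid are hypothetical choices for the illustration and have nothing to do with how the electrically sensitive paper actually darkens.

```python
SHADES = " .:*#"  # lightest to darkest; an arbitrary five-level scale


def render_voiceprint(rows):
    """Draw rows (lowest frequency first, values 0..1) with high frequencies on top."""
    lines = []
    for row in reversed(rows):
        chars = []
        for value in row:
            clamped = min(max(value, 0.0), 1.0)
            index = min(int(clamped * len(SHADES)), len(SHADES) - 1)
            chars.append(SHADES[index])
        lines.append("".join(chars))
    return "\n".join(lines)


# Hypothetical grid: a low band that fades out while a high band fades in.
print(render_voiceprint([[1.0, 0.5, 0.0],
                         [0.0, 0.5, 1.0]]))
```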
Two kinds of prints can be made: bar prints, which are utilized for identification,
and contour prints, which help to file the prints in a computer.
Recent developments include digital spectrographs that can be used with a
computer for enhanced comparison and measurement, but some specialists still
prefer the older analog model.
Comparisons are made between voice samples and when sufficient similarity
exists between one pattern and another, the voices are believed to have a high
probability of originating from the same person. For forensic purposes, the
voiceprint interpreter needs a recording of the suspect's voice (e.g., from an
interview) to compare to the sample made in the context of a crime, such as an
obscene phone call or taped conversation. Other people's voices, unrelated to
the crime, are used for elimination factors (points of dissimilarity).
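The notion of "sufficient similarity" between patterns can be given a rough numeric flavor. Examiners of the period judged the prints by eye and ear, so the cosine similarity used below is a swapped-in stand-in for illustration only, and the three small intensity grids are invented data rather than real voiceprints.

```python
import math


def pattern_similarity(a, b):
    """Cosine similarity of two equal-sized intensity grids (1.0 = same shape)."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("patterns must have the same dimensions")
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm = (math.sqrt(sum(x * x for x in flat_a))
            * math.sqrt(sum(y * y for y in flat_b)))
    return dot / norm if norm else 0.0


# Invented grids: a questioned call, a suspect sample, and an unrelated voice.
questioned = [[0.9, 0.1], [0.2, 0.8]]
suspect = [[0.8, 0.2], [0.1, 0.9]]
unrelated = [[0.1, 0.9], [0.8, 0.2]]

print(pattern_similarity(questioned, suspect) > pattern_similarity(questioned, unrelated))
```

The unrelated voice plays the role of the elimination samples mentioned above: it gives a baseline for how dissimilar two different speakers can look.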
Interpreters use two methods of identification:
Aural: listening to the voice on tape to compare single sounds and series of
sounds for similarities and discrepancies; the examiner also listens for breath
patterns, inflections, unusual speech habits, and accents.
Visual: reading the voiceprints to compare their appearances.
First, the examiner evaluates the recording of the unknown voice, to make
sure it has sufficient quality and clarity for analysis. Then the examiner turns to
the voice of the known person to ensure that the recording has similar clarity.
The best test cases have the suspect repeat what was said on the "unknown
voice" tape, or at least include as many of the same words as possible.
The aural and visual methods are combined to come up with one of five
conclusions:
positive identification
probable identification
positive elimination
probable elimination
no decision
The highest standard requires the identification of twenty speech sounds that
possess similarities. "Positive elimination" derives from twenty or more
differences, and the rest fall on a spectrum in between.
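The five-way outcome can be sketched as a simple decision rule. Only the two thresholds of twenty come from the text above. The cutoff of ten for the "probable" categories, the handling of conflicting counts, and the function and parameter names are assumptions made up for this sketch, since the article says only that the remaining conclusions fall somewhere in between.

```python
def conclude(similar, different, positive_min=20, probable_min=10):
    """Map counts of matching and differing speech sounds to one of five conclusions."""
    if similar >= positive_min and different >= positive_min:
        return "no decision"  # conflicting evidence (assumed handling)
    if similar >= positive_min:
        return "positive identification"
    if different >= positive_min:
        return "positive elimination"
    # probable_min is a hypothetical cutoff, not a figure from the article.
    if similar >= probable_min and similar > different:
        return "probable identification"
    if different >= probable_min and different > similar:
        return "probable elimination"
    return "no decision"


print(conclude(similar=22, different=1))
```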
Voiceprint Analysis Expertise
To be qualified as experts in voiceprint analysis, technicians must:
1. complete a course of study on spectrographic analysis that generally runs
from two to four weeks
2. complete one hundred voice comparison cases under intense personal
supervision by a known expert
3. be examined by a board of experts in the field
Since courts generally contest the methods of interpretation, not the actual
accuracy or reliability of the spectrographic instrument, it is important that any
spectrograph technician who testifies in court be highly qualified. The less
training and experience the technician has, the more such testimony becomes
vulnerable to serious questions by the judge and jury.
All of the studies that have been done on spectrographic accuracy, including a
1986 FBI survey, show that those people who have been properly trained and
who use standard aural and visual procedures get highly accurate results. The
opposite is true where training and/or analysis methods are limited. Bringing
such studies to the attention of the courts could help determine who is indeed an
expert and could minimize some of the controversy and confusion that comes
from misperception.
Those who do the recordings for analysis must also be competent to operate the
recording device, because the quality of the tape has great bearing on the
interpreter's results.
The skills involved in aural and visual voice interpretation include:
1. Critical listening, with an ear for anomalies and the ability to audit
foreground information as distinguishable from background
2. Ability to check for tape tampering
3. Experience reading magnetic tapes
4. Ability to operate the spectrograph equipment, both for general results
and for zooming in on specific patterns
5. Ability to work with an investigative team
In all likelihood, voiceprints will continue to play a key role in any investigation
that involves voice evidence. As such, they will become part of the evidence
brought into court. Like other technologies that once were resisted but are now
fully admissible, voiceprints may soon have their day.