Roy Morris, Summer Project - Seidenberg School of Computer

The Study of Voiceprint Technology
as a Means of Voice Verification
Roy Morris
Instructor: Charles C. Tappert
Abstract
This technical paper defines the voiceprint and
explains how it functions as a means of
authentication and verification. It describes the
methods used to create a voiceprint and the
techniques used to perform a spectral analysis,
such as the analysis of a spectrogram, which is
used to measure a voice sample. The work also
lists some current and possible future
applications.
Introduction
As the number of internet users and the
amount of information online increase, there is
a parallel increase in the need for stronger
security. Passwords alone no longer provide
enough security, and many users are seriously
considering biometrics as a form of protection
through verification. As the study of
biometrics expands, voice authentication is
being looked at as a form of biometrics that
can provide that added security. “More
financial institutions are considering voice
biometrics as a way to fight call center fraud.
That's because other forms of authentication
are proving ineffective at a time when socially
engineered attacks against call centers are on
the rise” [3]. This study focuses on how well a
person’s voice can be individualized and used
as a means of authentication and verification.
The fundamental premise of voice verification
is that every voice is distinctive enough to
tell one person from the next through the
analysis of a voiceprint [5].
A person’s voiceprint is defined as “a
graphic record made by a sound
spectrograph of the energy patterns emitted
by speech. No two voiceprints are alike” [6].
By recording, editing, and analyzing the
spectrographic features of a voice, we can
study what makes the voice a sufficient
means of verification.
For this study, each participant was
recorded repeating the phrase “My name is
Name” twenty times. Participants were
instructed to speak in a relaxed but clear
manner in order to get clean voice samples.
Once twenty utterances were captured from each
subject, the background noise was filtered out
so that each sample utterance was free of
interfering noise. This is referred to as
energy thresholding. The phrase “My name is”
was then isolated so that all samples contained
the same utterance, which keeps the samples
consistent and comparable across subjects.
Next, a spectral analysis of the utterance was
conducted.
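The energy thresholding step described above can
be sketched roughly as follows. This is not the
procedure used in the study, where noise was
removed in Audacity; the function name, the frame
length, and the threshold value are assumptions
chosen only for illustration. The sketch splits a
signal into short frames, computes the mean energy
of each frame, and keeps only the frames whose
energy exceeds the threshold.

```python
def energy_threshold(samples, frame_len=4, threshold=0.01):
    """Keep only the frames whose mean energy exceeds the threshold.

    frame_len and threshold are illustrative assumptions, not values
    taken from the study.
    """
    kept = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        # Mean energy of the frame: average of the squared amplitudes.
        energy = sum(x * x for x in frame) / len(frame)
        if energy > threshold:
            kept.extend(frame)
    return kept


# Synthetic example: quiet noise, a louder "speech" burst, quiet noise.
signal = [0.01, -0.02, 0.01, 0.0,
          0.5, -0.6, 0.4, -0.5,
          0.02, -0.01, 0.0, 0.01]
trimmed = energy_threshold(signal)
```

In this synthetic example only the middle frame
survives, because the two outer frames have a mean
energy far below the threshold.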
A spectrogram is defined as a visual
representation of sound that shows the
amplitude of the frequency components of a
signal over time. A spectrogram is created
with a mathematical algorithm called the Fast
Fourier Transform (FFT). The signal is
decomposed into its frequency components, with
time shown on the x-axis and frequency
displayed on the y-axis.
[Figure: example spectrogram created from one
of the subjects in the study]
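To make the construction of a spectrogram more
concrete, here is a minimal sketch that cuts a
signal into overlapping frames and computes the
magnitude of each frequency component per frame.
It is an illustration under stated assumptions, not
the method WaveSurfer uses: it computes a naive
discrete Fourier transform, which gives the same
values as an FFT but more slowly, and the frame
length, hop size, Hann window, and test tone are
all assumed for the example.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitude of each non-negative frequency bin of one frame (naive DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        mags.append(abs(total))
    return mags


def spectrogram(samples, frame_len=64, hop=32):
    """Return a list of columns; column i holds the bin magnitudes
    of the frame starting at sample i * hop (time runs along the list)."""
    columns = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # A Hann window reduces leakage between neighbouring bins.
        windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * t / frame_len))
                    for t, x in enumerate(frame)]
        columns.append(dft_magnitudes(windowed))
    return columns


# Synthetic test tone (assumed): 1000 Hz sampled at 8000 Hz.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(256)]
spec = spectrogram(tone)
# Strongest frequency bin in the first column, converted to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * rate / 64
```

Each column corresponds to one position on the time
axis and each entry in a column to one position on
the frequency axis. For the pure test tone, the
strongest bin in every column is the one at 1000 Hz.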
In equation (1), $v(n)$ is the vocal tract
impulse response and $g(n)$ is the excitation.
In the frequency domain, the signal is expressed
as
$S(f) = G(f) \cdot V(f)$ (2)
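The reason the product form in (2) is useful can
be made explicit. Taking the logarithm of the
magnitude turns the product into a sum, and the
inverse Fourier transform of that log spectrum is
the real cepstrum. This is the standard textbook
definition, given here as a sketch rather than as
the exact procedure of [2]; the symbol $c(n)$ is
introduced only for this illustration.

```latex
\begin{align}
\log |S(f)| &= \log |G(f)| + \log |V(f)| \\
c(n) &= \mathcal{F}^{-1}\{\log |S(f)|\}
\end{align}
```

Because the vocal tract contributes a slowly
varying spectral envelope and the excitation a
rapidly varying one, the two terms of the sum fall
in different regions of $c(n)$ and can be separated
there.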
Other analysis techniques used for speech
verification include Mel cepstrum analysis,
human factor cepstrum analysis, LPC analysis,
PLP analysis, and temporal analysis.
The advantage of Mel-frequency analysis is
that it approximates the response of the human
auditory system more closely. It does this by
spacing the frequency bands along the mel
scale, which is roughly logarithmic in
frequency, rather than using the linearly
spaced frequency bands derived directly from
the FFT.
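As a small illustration of band placement on the
mel scale, the sketch below converts between hertz
and mels and computes band centers that are evenly
spaced in mels. The conversion formula used here,
2595 * log10(1 + f / 700), is one commonly used
definition and is an assumption, since the paper
does not say which formula was applied; the
function names and the 0 to 4000 Hz range are also
chosen only for the example.

```python
import math


def hz_to_mel(freq_hz):
    """Convert hertz to mels (one common formula; assumed here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(low_hz, high_hz, n_bands):
    """Center frequencies (Hz) of n_bands spaced evenly on the mel scale."""
    low, high = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high - low) / (n_bands + 1)
    return [mel_to_hz(low + step * (i + 1)) for i in range(n_bands)]


centers = mel_band_centers(0.0, 4000.0, 10)
# Spacing in Hz between neighbouring band centers.
gaps = [b - a for a, b in zip(centers, centers[1:])]
```

The bands are equally spaced in mels, but the gaps
in hertz grow with frequency, so low frequencies
are covered more densely than high ones.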
Speech recognition approaches fall into three
main categories: the acoustic phonetic
approach, the pattern recognition method, and
the artificial intelligence technique [2].
The acoustic phonetic approach is based on
the theory that specific phonetic sounds can
be found within the speech sample. The pattern
recognition method is “one in which the speech
pattern are required directly without explicit
feature determination and segmentation”. The
greatest advantage of the artificial
intelligence technique is that it allows for
parallel computation.
One of the predominant spectral analysis
techniques used in voice verification is
cepstral analysis. This technique essentially
separates the excitation from the vocal tract
response. The speech signal is given as
$s(n) = g(n) * v(n)$ (1)
where $*$ denotes convolution.
LPC (linear predictive coding) analysis
is based on the idea that a speech sample can
be predicted from the samples that precede it.
According to [2], a given speech sample at time
n, $s(n)$, can be represented as a linear
combination of the previous p speech samples:
$s(n) = a_1 s(n-1) + a_2 s(n-2) + \cdots + a_p s(n-p)$
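One way to estimate the coefficients of such a
linear combination is the autocorrelation method
solved with the Levinson-Durbin recursion. The
sketch below is a generic illustration of that
technique, not the procedure described in [2]; the
function names and the synthetic test signal are
assumptions made for the example.

```python
def autocorrelation(signal, max_lag):
    """Autocorrelation r[0..max_lag] of a finite signal."""
    n = len(signal)
    return [sum(signal[t] * signal[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(signal, order):
    """Estimate a_1..a_p in s(n) ~ a_1 s(n-1) + ... + a_p s(n-p)
    with the autocorrelation method (Levinson-Durbin recursion)."""
    r = autocorrelation(signal, order)
    a = []         # predictor coefficients found so far
    error = r[0]   # prediction error energy
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(len(a)))
        k = acc / error  # reflection coefficient for this order
        a = [a[j] - k * a[i - 2 - j] for j in range(len(a))] + [k]
        error *= (1.0 - k * k)
    return a


# Synthetic signal (assumed): each sample is exactly
# 1.3 * previous - 0.4 * the one before that.
signal = [1.0, 1.3]
for _ in range(198):
    signal.append(1.3 * signal[-1] - 0.4 * signal[-2])

coeffs = lpc(signal, 2)  # close to [1.3, -0.4]
```

Because the synthetic signal was generated by a
known two-term recursion, the estimated
coefficients can be checked against the values
that produced it.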
In analyzing the spectrograms for this
study, the voice samples were segmented into
phonetic sounds. The phrase “My name is”, for
instance, has seven phonetic sounds. Dynamic
Time Warping (DTW) can also be used to segment
the individual sounds. DTW is a non-linear
pattern recognition algorithm and has become
one of the main algorithms used in modern
speech recognition. It measures the similarity
between two voice samples by establishing an
alignment between two sequences of feature
vectors [2].
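The alignment idea behind DTW can be sketched with
a short dynamic programming routine. This is a
minimal textbook form of the algorithm, offered as
an illustration only: it compares sequences of
single numbers instead of feature vectors, and the
function name and the sample sequences are
assumptions made for the example.

```python
def dtw_distance(a, b):
    """Cumulative cost of the best alignment between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Either sequence may repeat an element (stretch), or both advance.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


template = [0.0, 1.0, 2.0, 1.0, 0.0]
slower = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 0.0]  # same shape, stretched
different = [2.0, 2.0, 0.0, 0.0, 2.0]

same_shape_cost = dtw_distance(template, slower)
other_shape_cost = dtw_distance(template, different)
```

The stretched sequence aligns with the template at
zero cost even though the two have different
lengths, while the sequence with a different shape
does not. This tolerance to differences in speaking
rate is what makes the non-linear alignment useful
for comparing utterances.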
Many companies today are creating speech
recognition software for all sorts of consumer
products. For instance, one company that
considers itself among the leaders in speech
technologies has incorporated speech
recognition and verification software into
computers, automobiles, Bluetooth devices,
mobile phones, and even home appliances. The
list of uses for speech verification seems
endless. Sensory reports a False Acceptance
Rate of 0.01% for its products, while its False
Reject Rate rests just under 5% [1]. These
numbers are impressive, but do they hold up in
practice? Most companies that manufacture
speech verification software report similar
numbers, but their products do not all operate
at the same efficiency.
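To clarify what the two rates measure, the sketch
below computes a False Acceptance Rate and a False
Reject Rate from similarity scores at a given
decision threshold. The scores are made up purely
for illustration and do not come from this study or
from any vendor; the function name and the rule
that a score at or above the threshold is accepted
are likewise assumptions.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) when scores >= threshold are accepted."""
    # Impostor attempts that were wrongly accepted.
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    # Genuine attempts that were wrongly rejected.
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(genuine_scores))


# Made-up similarity scores, for illustration only.
genuine = [0.91, 0.85, 0.78, 0.95, 0.60]
impostor = [0.30, 0.45, 0.82, 0.20, 0.10]

far, frr = far_frr(genuine, impostor, threshold=0.80)
strict_far, strict_frr = far_frr(genuine, impostor, threshold=0.90)
```

Raising the threshold lowers the False Acceptance
Rate but raises the False Reject Rate, which is why
the two numbers should be read together and at the
same operating point.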
Conclusion
In conclusion, many people today are turning
to biometrics as a means of protecting their
information both online and offline. With the
help of a quality voiceprint, speech
verification can be used to help secure one’s
information. Speech recognition can be placed
into three main categories: the acoustic
phonetic approach, the pattern recognition
method, and the artificial intelligence
technique. For this voiceprint study, the
analysis of the recorded voice samples was only
partially completed; a spectral analysis must
still be done in order to obtain numeric values
for the voice samples. The science behind
speech recognition is growing, and the
applications of this technology appear to be
endless once it is perfected. A research
company named Opus
Research will be holding its annual Voice
Biometric Conference in Singapore. “We're
very pleased to showcase the ever-expanding set of present solutions and future
opportunities for voice biometrics to support
speaker identification and verification
around the world. With enrolled voiceprints
already exceeding 20 + million, we’re
witnessing an accelerated deployment of
voice biometric-based solutions to support
trusted commerce” [4].
References
[1] savedelete.com/7-best-free-speechrecognition-software.html
[2] Krishan Kant Lavania, “Reviewing Human-Machine
Interaction through Speech Recognition approaches
and Analyzing an approach for Designing an
Efficient System”, International Journal of
Computer Applications (0975 – 8887), Volume 38,
No. 3, January 2012
[3] Tracy Kitten, “Voice Biometrics as a Fraud
Fighter, Could Emerging Technology Play New
Role in Call Centers?”, May 22, 2012
[4] Dan Miller, “Global Growth Brings Voice
Biometrics Conference To Singapore” August
22, 2012,
http://voicebiocon.com/
[5] Steve Cain,
http://expertpages.com/news/voiceprint
identification.htm, updated August 23, 2012.
[6] Kären M. Hess and Christine Hess Orthmann,
Criminal Investigation, 2010.
Appendix
Specifications of the Study
Each subject was asked to record the phrase –
“My name is Name”. There were 20 utterances
recorded from each participant. Before the
subjects were asked to speak, they were each
instructed to speak naturally, but clearly while
recording the utterances. Utterances considered
“bad” were all deleted and re-recorded.
Hardware used
Computer: Hewlett Packard Model G71 Notebook PC
Microphone: IDT High Definition audio
CODEC (Internal Microphone)
Software used
WaveSurfer version 8.5.8 - WaveSurfer is an
open source tool for sound visualization and
manipulation. Its applications include
speech analysis and sound annotation.
WaveSurfer was chosen for its ease of use
and user-friendly interface. It was mainly
used for creating wav files and viewing
spectrograms of the samples collected.
http://www.speech.kth.se/wavesurfer/
Audacity 2.0.1 – Audacity is a free tool that
can be downloaded from
http://audacity.sourceforge.net/. It runs on
many different operating systems and can
record and mix sounds, among a number of
other functions. Audacity was chosen for the
study for its ability to cleanly remove
background noise and to cut the portions of
sound that were not needed.
Subject Demographics
The subjects are from somewhat different
ethnic backgrounds, but all of them speak
English as their primary language, and none
of them has an identifiably foreign accent.
Name               Age   Gender
Andrew Calipa      21    M
Anita Valencia     22    F
Cody Tacktil       29    M
Desiree Morris     27    F
Michael Valencia   19    M
Noemi Valdovinos   22    F
Pauly Fidun        26    M
Rob Sorensen       21    M
Rob Weinstein      28    M
Roy Morris         30    M
Tony Federici      31    M