The Study of Voiceprint Technology as a Means of Voice Verification

Roy Morris
Instructor: Charles C. Tappert

Abstract

This technical paper defines the voiceprint and explains how it functions as a means of authentication and verification. It describes the methods used to create a voiceprint and the techniques used to perform a spectral analysis, such as the analysis of a spectrogram, when measuring a voice sample. The work also lists some current and possible future applications.

Introduction

As the number of internet users and the volume of information increase, so does the need for stronger security. Passwords alone no longer provide enough security, and many users are seriously considering biometrics as a form of protection through verification. As the study of biometrics expands, voice authentication is being examined as a form of biometrics that can provide that extra security. “More financial institutions are considering voice biometrics as a way to fight call center fraud. That's because other forms of authentication are proving ineffective at a time when socially engineered attacks against call centers are on the rise” [3].

This study focuses on how well a person’s voice can be individualized and used as a means of authentication and verification. The fundamental premise of voice verification is that every voice is distinctive enough to tell one person from the next through the analysis of a voiceprint [5]. A voiceprint is defined as “a graphic record made by a sound spectrograph of the energy patterns emitted by speech. No two voiceprints are alike” [6]. By recording, editing, and analyzing the spectrographic features of a voice, we can study what makes the voice a sufficient means of verification.

For this study, each participant was recorded repeating the phrase “My name is Name” twenty times.
They were instructed to speak in a relaxed but clear manner in order to obtain clean voice samples. Once twenty utterances were captured from each subject, the background noise was filtered out so that each sample utterance was free of interfering noise. This is referred to as energy thresholding. In addition, the phrase “My name is” was isolated so that all samples contained the same utterance, which keeps the comparison between speakers consistent.

Next, a spectral analysis of the utterance was conducted. A spectrogram is a visual representation of sound that shows the amplitude of the frequency components of a signal over time. It is computed with the fast Fourier transform (FFT), which decomposes the signal into its frequency components; time is shown on the x-axis and frequency on the y-axis. Below is an example of a spectrogram created by one of the subjects in the study.

In Equation (1), $v(n)$ is the vocal tract impulse response and $g(n)$ is the excitation. The frequency-domain representation is

$S(\omega) = G(\omega) \cdot V(\omega)$ (2)

Other analysis techniques used for speech verification include mel cepstrum analysis, human factor cepstrum analysis, LPC analysis, PLP analysis, and temporal analysis. The advantage of the mel-frequency approach is that it approximates the response of the human auditory system more closely. It does this by positioning the frequency bands logarithmically along the mel scale, which matches human hearing better than the linearly spaced frequency bands derived directly from the FFT.

Speech recognition approaches fall into three main categories: the acoustic phonetic approach, the pattern recognition method, and the artificial intelligence technique [2]. The acoustic phonetic approach is based on the theory that specific phonetic sounds can be found within the speech sample.
The pattern recognition method is “one in which the speech pattern are required directly without explicit feature determination and segmentation”, and the greatest advantage of the artificial intelligence technique is that it allows for parallel computation.

One of the predominant spectral analysis techniques used in voice verification is cepstral analysis. This technique essentially separates the excitation from the vocal tract response. The speech signal is given as

$s(n) = g(n) * v(n)$ (1)

where $*$ denotes convolution.

LPC analysis is also of interest. This technique is based on the idea that a speech sample can be approximated by a linear combination of the previous $p$ speech samples. “LPC analysis states that a given speech sample for a signal at time n, $s(n)$, can be represented as a linear combination of all the previous p speech sample as given below: $s(n) = a_1 s(n-1) + a_2 s(n-2) + \cdots + a_p s(n-p)$” [2].

In the study of this work’s spectrograms, the voice samples were segmented into phonetic sounds. The phrase “My name is”, for instance, has seven phonetic sounds. Dynamic time warping (DTW) can also be used to segment the individual sounds. DTW is a non-linear pattern recognition algorithm that has become one of the main algorithms used in modern speech recognition. It measures the similarity between two voice samples by establishing an alignment between two sequences of feature vectors [2].

Many companies today are creating speech recognition software for all sorts of consumer products. For instance, a company that considers itself among the leaders in speech technologies has incorporated speech recognition and verification software into computers, automobiles, Bluetooth devices, mobile phones, and even home appliances. The list of uses for speech verification appears to be endless. Sensory reports a False Acceptance Rate of 0.01% for its products, while the False Reject Rate rests just under 5% [1]. These numbers are impressive, but are they realistic?
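The two error rates quoted above are simple ratios, and a small worked example may make them easier to interpret. The following is a minimal sketch, assuming a verifier that outputs a similarity score and accepts a speaker when the score reaches a threshold. The scores, the threshold values, and the `error_rates` helper are hypothetical and are not taken from any vendor's product or from this study.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false acceptance rate, false reject rate) as fractions.

    Assumption: a higher score means a closer match, and a speaker is
    accepted when score >= threshold.
    """
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    far = false_accepts / len(impostor_scores)
    frr = false_rejects / len(genuine_scores)
    return far, frr


# Hypothetical similarity scores, for illustration only.
genuine = [0.91, 0.88, 0.95, 0.60, 0.83]    # true speaker attempts
impostor = [0.20, 0.35, 0.72, 0.10, 0.41]   # impostor attempts

for threshold in (0.5, 0.7, 0.9):
    far, frr = error_rates(genuine, impostor, threshold)
    print(f"threshold={threshold}: FAR={far:.0%}, FRR={frr:.0%}")
```

Raising the threshold lowers the false acceptance rate and raises the false reject rate, so a pair of published figures describes only one operating point of a system.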
Most companies that manufacture speech verification software report similar numbers, but their products do not all perform equally well.

Conclusion

Many people today are turning to biometrics as a means of protecting their information both online and offline. With the help of a quality voiceprint, speech verification can help secure one’s information. Speech recognition approaches fall into three main categories: the acoustic phonetic approach, the pattern recognition method, and the artificial intelligence technique. For this voiceprint study, the analysis of the recorded voice samples was only partially completed; a spectral analysis must still be done in order to obtain the numeric values of the voice samples. The science behind speech recognition is growing, and once the technology is perfected its applications appear to be endless.

Opus Research, a research company, will be holding its annual Voice Biometrics Conference in Singapore. “We're very pleased to showcase the ever-expanding set of present solutions and future opportunities for voice biometrics to support speaker identification and verification around the world. With enrolled voiceprints already exceeding 20+ million, we’re witnessing an accelerated deployment of voice biometric-based solutions to support trusted commerce” [4].
References

[1] savedelete.com/7-best-free-speechrecognition-software.html
[2] Krishan Kant Lavania, “Reviewing Human-Machine Interaction through Speech Recognition approaches and Analyzing an approach for Designing an Efficient System”, International Journal of Computer Applications (0975 – 8887), Volume 38, No. 3, January 2012.
[3] Tracy Kitten, “Voice Biometrics as a Fraud Fighter, Could Emerging Technology Play New Role in Call Centers?”, May 22, 2012.
[4] Dan Miller, “Global Growth Brings Voice Biometrics Conference To Singapore”, August 22, 2012, http://voicebiocon.com/
[5] Steve Cain, http://expertpages.com/news/voiceprint identification.htm, updated August 23, 2012.
[6] Kären M. Hess and Christine Hess Orthmann, Criminal Investigation, 2010.

Appendix

Specifications of the Study

Each subject was asked to record the phrase “My name is Name”. Twenty utterances were recorded from each participant. Before recording, each subject was instructed to speak naturally but clearly. Utterances considered “bad” were deleted and re-recorded.

Hardware used

Computer: Hewlett Packard Model G71 Notebook PC
Microphone: IDT High Definition Audio CODEC (internal microphone)

Software used

WaveSurfer version 8.5.8: WaveSurfer is an open source tool for sound visualization and manipulation. Typical applications include speech analysis and sound annotation. WaveSurfer was chosen for its ease of use and user-friendly interface. It was mainly used for creating WAV files and viewing spectrograms of the samples collected. http://www.speech.kth.se/wavesurfer/

Audacity 2.0.1: Audacity is a free tool that can be downloaded from http://audacity.sourceforge.net/. It runs on many different operating systems and can record and mix sounds, among a number of other functions.
I chose Audacity for this study for its ability to cleanly remove background noise and to cut the portions of sound that were not needed.

Subject Demographics

The subjects come from somewhat different ethnic backgrounds, but all of them speak English as their primary language, and none of them has a noticeably foreign accent.

Name               Age   Gender
Andrew Calipa      21    M
Anita Valencia     22    F
Cody Tacktil       29    M
Desiree Morris     27    F
Michael Valencia   19    M
Noemi Valdovinos   22    F
Pauly Fidun        26    M
Rob Sorensen       21    M
Rob Weinstein      28    M
Roy Morris         30    M
Tony Federici      31    M
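Illustrative Processing Sketch

The processing chain described in the body of the paper (energy thresholding, short-time spectral analysis, and comparison by dynamic time warping) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the procedure carried out in WaveSurfer and Audacity for the study. Synthetic pure tones stand in for recorded speech, a direct DFT stands in for the FFT so that only the standard library is needed, and the frame length, hop size, threshold ratio, sampling rate, and all function names are hypothetical choices made for illustration.

```python
import cmath
import math


def frame_signal(samples, frame_len=64, hop=32):
    """Split a sample list into overlapping fixed-length frames."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, hop)]


def energy_threshold(frames, ratio=0.1):
    """Drop frames whose energy is below `ratio` times the loudest frame.

    The ratio is an illustrative assumption, not a value from the study.
    """
    energies = [sum(x * x for x in frame) for frame in frames]
    cutoff = ratio * max(energies)
    return [f for f, e in zip(frames, energies) if e >= cutoff]


def magnitude_spectrum(frame):
    """Hann-window one frame and return |DFT| for bins 0..N/2.

    A direct O(N^2) DFT is used for clarity; an FFT yields the same values.
    """
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed)))
            for k in range(n // 2 + 1)]


def features(samples):
    """Frame the signal, discard quiet frames, and return one spectrum per frame."""
    return [magnitude_spectrum(f) for f in energy_threshold(frame_signal(samples))]


def dtw_distance(seq_a, seq_b):
    """Accumulated cost of the best DTW alignment of two feature sequences."""
    inf = float("inf")
    rows, cols = len(seq_a), len(seq_b)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])  # Euclidean distance
            cost[i][j] = local + min(cost[i - 1][j],       # stretch seq_b
                                     cost[i][j - 1],       # stretch seq_a
                                     cost[i - 1][j - 1])   # advance both
    return cost[rows][cols]


def synthetic_utterance(freq_hz, tone_len, rate=8000, pad=256):
    """Silence, a pure tone, then silence: a stand-in for a recorded utterance."""
    tone = [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(tone_len)]
    return [0.0] * pad + tone + [0.0] * pad


reference = features(synthetic_utterance(1000, 512))
same_slower = features(synthetic_utterance(1000, 768))  # same tone, held longer
different = features(synthetic_utterance(2000, 512))    # a different tone

# With 64-sample frames at 8000 Hz, a 1000 Hz tone falls in DFT bin 8.
middle = reference[len(reference) // 2]
peak_bin = middle.index(max(middle))

close = dtw_distance(reference, same_slower)
far = dtw_distance(reference, different)
print("peak bin:", peak_bin)
print("DTW cost, same tone at a different length:", close)
print("DTW cost, different tone:", far)
```

Because DTW may repeat frames of either sequence, the longer version of the same tone aligns with the reference at almost no cost, while the different tone does not. Real speech would need a richer feature, such as mel-frequency cepstral coefficients, in place of the raw magnitude spectrum used here.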