SPEAKER RECOGNITION

Chhavi Jain
Pursuing M.Tech, Dept. of CSE
DPGITM, Gurgaon, India

ABSTRACT

Speaker verification software, or voice biometrics, is technology that relies on the fact that we all have a unique voice that identifies us. Speaker verification software is most commonly used in security applications, such as using your voice to gain access to restricted areas of a building or to computer files. It is also used in call centers, where clients speak the menu option they wish to select. In many applications, speech may be the main or only means of transferring information, and so it provides a simpler method of authentication. The telephone system provides a familiar network for obtaining and delivering the speech signal. For telephony-based applications, no additional equipment or networks need to be installed, as the telephone and mobile networks suffice. For non-telephone applications, sound cards and microphones are available as a cheap alternative. In our application we build speaker identification software that identifies the person on a call when he or she says “hello” and gives us the name of the speaker, provided that speaker is already in our software database.

1 INTRODUCTION

Speech is the vocalized form of human communication. When we talk to a person, the speech signal carries many levels of information for the listener. First and most important, it conveys a message in words. A speech signal also carries other information, such as the language of the speaker, emotion, gender and the identity of the speaker. In speaker recognition our aim is to identify the speaker from his or her speech.

2 SPEAKER RECOGNITION

There are three different methods in speaker recognition:

[Figure: Speaker Recognition and its branches, including Speaker Detection and Speaker Identification]

Speaker Detection: Speaker detection searches speech and tags it according to who is speaking. In large audio archives, speaker detection is used to search for a speaker in the speech, detect that speaker, and index the audio so that the speech regions of that speaker can be used effectively.

Speaker Verification: Speaker verification is the use of a machine to verify a person’s claimed identity from his or her voice. It is also known by other terms, including voice verification, speaker authentication, voice authentication, talker authentication and talker verification. Here a person makes an identity claim by entering some information into the system, such as a username, employee number or smart card number. If speaker recognition is text dependent, the text is known to the user. The speaker then speaks the text into a microphone, and the system analyzes the signal to verify whether it is the right person or not.

Speaker Identification: In speaker identification there is no prior identity claim. The system just takes the speech signal from the microphone and decides which known user it belongs to, or that the speaker is unknown. For this we have to match the speaker's voice sample against all the users in the system and find the one whose voice is nearest or equal to the voice presented for identification.

3 MOTIVATION

Sometimes we meet our friends, relatives or family members and forget to tell them about some important task. How can we make a computer or mobile phone intelligent enough to remind us of the important tasks related to the person who is talking to us? A speaker identification system allows us to find a person based on his or her voice and to be reminded of the tasks related to that person. So in this dissertation our main concentration is on speaker identification.

4 SPEAKER IDENTIFICATION

Speech Signal Acquisition: First of all, we need the speaker's voice to feed into the system, both for training and for matching the features of the voice to identify the speaker. A microphone, telephone handset or mobile phone can be used to take the speaker's acoustic wave and convert it into an analog speech signal. An analog-to-digital (A/D) converter converts the analog speech signal into a digital speech signal. An A/D converter can sample at 8000-192000 samples per second, at 8-64 bits of resolution, with 1-4 channels. As the sample frequency, resolution or number of channels increases, the size of the speech signal data increases. For a small device with little memory we need a lower sample frequency so that the device can handle the data. A sample frequency of 8000 samples per second at 16 bits of resolution with one channel (16,000 bytes of raw data per second) can capture the features of the speaker's voice and is accurate enough for a small device. On a computer we can increase the sample frequency and bit resolution for better results. So in this module our main motive is to take the sound signal from the speaker and either save it into the system or process it in real time.

Speech Feature Extraction: In speech feature extraction we find the parameters that distinguish speakers from each other. Different speech features can be obtained from the speech signal, such as MFCC (mel-frequency cepstral coefficients) and LPCC (linear prediction cepstral coefficients). The main motive of feature extraction is to reduce the data while retaining the information needed to classify the speaker. If the speech signal data is reduced, fewer computations are needed and less time is taken for comparison. Chapter 3 gives the full detail of feature extraction from the speech signal.

Speech Feature Matching: In this module the data from feature extraction is classified with one of the classification models of speaker recognition, such as HMM, GMM, ANN or Vector Quantization. After classification of the feature data, the training data is matched against the testing data. Both offline and online data matching are performed. Different distance matching methods are used to find the result. Another chapter of the dissertation gives the full detail of feature classification and matching.
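The feature matching step described in Section 4 can be illustrated with a minimal sketch. It assumes Vector Quantization with Euclidean distance, which is only one of the models the text lists and not necessarily the configuration used in this work. The function names (train_codebook, distortion, identify), the codebook size and the synthetic feature vectors are all hypothetical, chosen for illustration; real input would be per-frame features such as MFCC vectors.

```python
import numpy as np

def train_codebook(features, size=4, iters=20, seed=0):
    """Build a VQ codebook (k-means centroids) from one speaker's feature vectors."""
    rng = np.random.default_rng(seed)
    # Start from randomly chosen training vectors.
    codebook = features[rng.choice(len(features), size=size, replace=False)].copy()
    for _ in range(iters):
        # Assign every vector to its nearest codeword (Euclidean distance).
        dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
        nearest = dists.argmin(axis=1)
        for k in range(size):
            members = features[nearest == k]
            if len(members):  # keep the old codeword if its cell is empty
                codebook[k] = members.mean(axis=0)
    return codebook

def distortion(features, codebook):
    """Average distance from each feature vector to its nearest codeword."""
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return dists.min(axis=1).mean()

def identify(features, codebooks):
    """Return the enrolled speaker whose codebook gives the lowest distortion."""
    return min(codebooks, key=lambda name: distortion(features, codebooks[name]))

# Synthetic stand-in for per-frame features (assumption: 12-dimensional vectors).
# This is not real speech data.
rng = np.random.default_rng(1)
train = {
    "speaker_a": rng.normal(0.0, 1.0, size=(200, 12)),
    "speaker_b": rng.normal(5.0, 1.0, size=(200, 12)),
}
codebooks = {name: train_codebook(x) for name, x in train.items()}

# A test utterance drawn from the same distribution as speaker_b.
test_utterance = rng.normal(5.0, 1.0, size=(50, 12))
print(identify(test_utterance, codebooks))
```

Training builds one codebook per enrolled speaker, and identification picks the codebook with the lowest average distortion. Because every enrolled speaker is compared, the matching time grows with the size of the speaker database.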
5 ISSUES IN SPEAKER IDENTIFICATION

There are some issues related to speaker identification:

1. Noise in the signal (environmental noise from the surroundings)
2. Age of the speaker (the voice changes with age)
3. Emotional state of the speaker, such as anger
4. Illness of the speaker, such as a cough
5. Channel mismatch
6. Different hardware used to capture the speech signal

6 APPLICATIONS

The most obvious practical applications of automatic speaker identification are security systems of various kinds. The human voice can serve as a key for any secured object, and in general it is not easy to lose or forget. Another important property of speech is that it can be transmitted over a telephone channel, for example. This makes it possible to identify speakers automatically and grant access to secured objects by telephone. Nowadays this approach is beginning to be used for telephone credit card purchases and bank transactions. The human voice can also be used to prove identity when accessing physical facilities: a speaker model is stored in a small chip, which can be used as an access tag in place of a PIN code.

Another important application of speaker identification is monitoring people by their voices. For instance, it is useful in information retrieval, where recorded debates or news are indexed by speaker and the speech of only the speakers of interest is then retrieved. It can also be used to monitor criminals in public places by identifying them by their voices.

In fact, all these examples are examples of real-time systems. For any identification system to be useful in practice, the response time, that is, the time spent on identification, should be minimized. A growing speaker database is also a common fact of practical systems and is a further reason to optimize the system.

7 SPEAKER IDENTIFICATION MODEL

[Figure: speaker identification model]

8 CONCLUSION

When the speaker identification system is trained with the word “hello” only, its success rate in identifying the right speaker is 25.7% for words other than “hello”. For the same word “hello”, the system has a success rate of 97.14%. Our speaker identification system can play an important role in applications such as voice-based caller ID on mobile phones, because when the speaker starts with “hello” there is a 97.14% chance that our system identifies the speaker.

9 FUTURE WORK

There is still a wide array of open research problems relevant to speaker identification. The results of the experiment in this dissertation are very promising and encourage us to extend the work in speaker identification even further. This can include the following:

- Make caller ID software for mobile phones based on the algorithm.
- Try to improve the result for the same word with a different classification model for speaker recognition.
- Extract new features that can correctly identify the speaker with less training.
- For different words our results are not so good, so we need to analyze new scope for our speaker identification system.
- Try other feature matching models such as GMM, ANN and HMM.

ACKNOWLEDGMENT

It is with a deep sense of gratitude and reverence that I express my sincere thanks to my Dissertation Supervisor, Mr. Manu Phogat, for his guidance, encouragement, help and useful suggestions. His untiring and painstaking efforts, methodical approach and individual help made it possible for me to complete this work in time.