SPEAKER RECOGNITION
Chhavi Jain
Pursuing M.Tech, Dept. of CSE
DPGITM, Gurgaon, India
ABSTRACT
Speaker verification software, or voice biometrics, is technology that relies on the fact that each of us has a unique voice
that identifies us. Speaker verification software is most commonly used in security applications, such as using your voice
to gain access to restricted areas of a building or to computer files. It is also used in call centers, where clients speak
the menu option they wish to select. In many applications, speech may be the main or only means of transferring information,
and so it provides a simpler method of authentication. The telephone system provides a familiar network for
obtaining and delivering the speech signal. For telephony-based applications, no additional equipment or networks need to
be installed, since the existing telephone and mobile networks suffice. For non-telephone applications,
sound cards and microphones are available as a cheap alternative. In our application we build speaker
identification software that identifies the person on a call when he or she says “hello” and gives us the name of the
speaker if that speaker is already in our software database.
1 INTRODUCTION
Speech is the vocalized form of human communication. When we talk to a person, the speech signal
carries many levels of information for the listener. First and most important, it conveys a message
in words to the listener. A speech signal also carries other information, such as the language of
the speaker, emotion, gender and the identity of the speaker. In speaker recognition our aim is to
identify the speaker from his or her speech.
2 SPEAKER RECOGNITION
There are three different methods in speaker recognition:
[Figure: Speaker Recognition; labels: Speaker Detection, Speaker Identification]
Speaker Detection: In speaker detection, speech is searched and then tagged according to who is
speaking. In large audio archives, speaker detection is used to search for the speaker in the
speech, detect the speaker, and index the audio so that the speaker's speech regions can be used
effectively.
Speaker Verification: Speaker verification is the use of a machine to verify a person’s claimed
identity from his or her voice. It is also known by other terms, including voice verification,
speaker authentication, voice authentication, talker authentication and talker verification. Here a
person makes an identity claim by entering some information into the system, such as a username,
employee number or smart card number. If speaker recognition is text dependent, the text is known to
the user. The speaker then speaks the text into the microphone, and the system analyzes the signal
to verify whether or not it is the right person.
Speaker Identification: In speaker identification there is no prior identity claim. The system just
takes the speech signal from the microphone and determines which enrolled user the speaker is, or
reports the speaker as unknown. Here we have to match the speaker's voice sample against all the
users in the system and find the one whose voice is nearest or equal to the voice presented for
identification.
3 MOTIVATION
Sometimes we meet friends, relatives or family members and forget to tell them about an important
task. How can we make a computer or mobile device intelligent enough to remind us of the important
task related to the person who is talking to us? A speaker identification system allows us to find a
person based on his or her voice and to be reminded of the task related to that person. So in this
dissertation our main concentration is on speaker identification.
4 SPEAKER IDENTIFICATION
Speech Signal Acquisition: First of all we need the speaker's voice to feed into the system, both
for training and for matching the features of the speaker's voice to identify the speaker. A
microphone, telephone handset or mobile phone can be used to take the speaker's acoustic wave and
convert it into an analog speech signal. An analog-to-digital (A/D) converter converts the analog
speech signal into a digital speech signal. An A/D converter can sample at 8000-192000 samples per
second, at 8-64 bits of resolution, with 1-4 channels. As the sample frequency, resolution or number
of channels increases, the size of the speech signal data increases. For a small device with little
memory we need a lower sample frequency so that the device can handle the data. A sample frequency
of 8000 samples per second at 16 bits of resolution with one channel can capture the features of the
speaker's voice and is accurate enough for a small device. On a computer we can increase the sample
frequency and bit resolution for better results. So in this module our main aim is to take the sound
signal from the speaker and either save it in the system or process it in real time.
Speech Feature Extraction: In speech feature extraction we find the parameters that distinguish
speakers from each other. Different speech features, such as MFCC and LPCC, are obtained from the
speech signal. The main aim of feature extraction is to reduce the data while retaining the
information that classifies the speaker. If the speech signal data is reduced, fewer computations
and less time are needed for comparison. Chapter 3 gives the full detail of feature extraction from
the speech signal.
Speech Feature Matching: In this module the data from feature extraction is classified using
different classification models for speaker recognition, such as HMM, GMM, ANN and Vector
Quantization. After the feature data has been classified, the training data is matched against the
testing data. Both offline and online data matching are performed. Different distance matching
methods are used to find the result. The chapter on feature classification and matching gives the
full detail.
5 ISSUES IN SPEAKER IDENTIFICATION
There are some issues related to speaker identification:
1. Noise in the signal (environmental noise from the surroundings)
2. Age of the speaker (the voice changes with age)
3. Emotional state of the speaker, such as anger
4. Illness of the speaker, such as a cough
5. Channel mismatch
6. Different hardware used to capture the speech signal
6 APPLICATIONS
Practical applications for automatic speaker identification are obviously various kinds of security
systems. The human voice can serve as a key for any secured object, and in general it is not easy to
lose or forget it. Another important property of speech is that it can be transmitted over a
telephone channel, for example. This makes it possible to identify speakers automatically and
provide access to secured objects by telephone. Nowadays, this approach is beginning to be used for
telephone credit card purchases and bank transactions. The human voice can also be used to prove
identity when accessing physical facilities, by storing the speaker model in a small chip that
serves as an access tag and is used instead of a PIN code.
Another important application of speaker identification is monitoring people by their voices. For
instance, it is useful in information retrieval, by indexing recorded debates or news by speaker and
then retrieving speech only for the speakers of interest. It can also be used to monitor criminals
in public places by identifying them by their voices. In fact, all of these are examples of
real-time systems. For any identification system to be useful in practice, the response time, that
is, the time spent on identification, should be minimized. A growing speaker database is also a
common fact of practical systems and is a further reason to optimize the system.
7 SPEAKER IDENTIFICATION MODEL
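As an illustration of the identification flow described in Section 4, the following is a minimal sketch and not the system built in this work. It assumes that each utterance has already been reduced to a fixed-length feature vector (for example, averaged MFCCs). The function names, the speaker names, the example vectors and the distance threshold are all hypothetical, and a plain averaged template with Euclidean distance stands in for the VQ, GMM, HMM or ANN models named in Section 4.

```python
import math


def pcm_size_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Size of uncompressed audio: rate x bits x channels x duration, in bytes."""
    return sample_rate_hz * bits_per_sample * channels * seconds // 8


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def enroll(training_vectors):
    """Average a speaker's training vectors into one template (assumed model)."""
    n = len(training_vectors)
    return [sum(column) / n for column in zip(*training_vectors)]


def identify(test_vector, templates, max_distance):
    """Return the nearest enrolled speaker, or "unknown" if none is close enough."""
    best_name, best_dist = "unknown", float("inf")
    for name, template in templates.items():
        dist = euclidean(test_vector, template)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else "unknown"


# One second of audio at the small-device setting from the text
# (8000 samples per second, 16 bits, one channel).
print(pcm_size_bytes(8000, 16, 1, 1))  # 16000

# Hypothetical 3-dimensional feature vectors for two enrolled speakers.
templates = {
    "alice": enroll([[1.0, 2.0, 3.0], [1.2, 1.8, 3.1]]),
    "bob": enroll([[4.0, 0.5, 1.0], [4.2, 0.7, 0.8]]),
}
print(identify([1.1, 2.0, 3.0], templates, max_distance=1.0))  # alice
print(identify([9.0, 9.0, 9.0], templates, max_distance=1.0))  # unknown
```

The threshold is what separates identification over a closed set of users from the "unknown" outcome: without it, the nearest enrolled speaker would always be returned, however far away the test vector is.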
8 CONCLUSION
When the speaker identification system is trained with the word “hello” only, its success rate in
correctly identifying the right speaker is 25.7% for a different word. For the same word “hello”,
the speaker identification system has a success rate of 97.14%.
Our speaker identification system can play an important role in applications such as voice-based
caller ID on mobile phones, because when the speaker starts with “hello” there is a 97.14% chance
that our system identifies the speaker.
9 FUTURE WORK
There is still a wide array of open research problems relevant to the issues in speaker
identification. The results of the experiments in this dissertation are very promising and encourage
us to extend the work on speaker identification even further. This can include the following:
- Try other feature matching models such as GMM, ANN and HMM.
- For a different word our results are not so good, so we need to analyze new scope for the
  classification model for speaker recognition.
- Make caller ID software for mobile phones based on our speaker identification system.
- Try to improve the result for the same word with a different algorithm.
- Extract new features that can correctly identify the speaker with less training.
ACKNOWLEDGMENT
It is with a deep sense of gratitude and reverence that I express my sincere thanks to my
dissertation supervisor, Mr. Manu Phogat, for his guidance, encouragement, help and useful
suggestions. His untiring and painstaking efforts, methodical approach and individual help made it
possible for me to complete this work in time.