SPEAKER RECOGNITION

Chhavi Jain
Pursuing M.Tech, Dept. of CSE
DPGITM, Gurgaon, India
Speaker verification software, or voice biometrics, is technology that relies on the fact that each of us has a unique voice that identifies us. Speaker verification software is most commonly used in security applications, such as gaining access to restricted areas of a building or to computer files by voice. It is also used in call centers, where callers speak the menu option they wish to select. In many applications, speech may be the main or only means of transferring information, so it provides a simpler method of authentication. The telephone system provides a familiar network for obtaining and delivering the speech signal. Telephony-based applications need no additional equipment or networks, because the telephone and mobile networks suffice. For non-telephone applications, sound cards and microphones are available as a cheap alternative. In our application we are building speaker identification software that identifies the person on a call when he or she says “hello”. If that speaker is already in our software database, it gives us the speaker's name.
Speaker Recognition
Speech is one of the most natural forms of human communication. When we talk to a person, the speech signal gives the listener many levels of information. First and most importantly, it conveys a message in words to the listener. A speech signal also carries other information, such as the language, emotion, gender and identity of the speaker. So in speaker recognition our aim is to identify the speaker from his or her speech.

Speaker Detection: In speaker detection, speech is searched and then tagged according to who is speaking. In large audio data archives, speaker detection is used to search for a speaker in the speech and detect that speaker. It then indexes the speaker's speech regions so they can be used effectively.
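The indexing step can be made concrete with a minimal sketch. It assumes that an upstream detector, not shown, has already tagged time segments with speaker labels. The segment format, the `max_gap` merge rule and the function names `build_speaker_index` and `speaking_time` are hypothetical and do not come from this work.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical input format: (start_seconds, end_seconds, speaker_label),
# assumed to be produced by an upstream speaker detector (not shown).
Segment = Tuple[float, float, str]
Region = Tuple[float, float]


def build_speaker_index(segments: List[Segment], max_gap: float = 0.5) -> Dict[str, List[Region]]:
    """Group tagged segments into per-speaker speech regions.

    Segments are processed in time order. A segment is merged into the
    previous region only if the same speaker was tagged immediately before
    and the pause between them is at most `max_gap` seconds.
    """
    index: Dict[str, List[Region]] = defaultdict(list)
    last_speaker = None
    for start, end, speaker in sorted(segments):
        regions = index[speaker]
        if speaker == last_speaker and start - regions[-1][1] <= max_gap:
            # Same speaker continuing after a short pause: extend the region.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            # New speaker turn: start a new region for this speaker.
            regions.append((start, end))
        last_speaker = speaker
    return dict(index)


def speaking_time(index: Dict[str, List[Region]], speaker: str) -> float:
    """Total indexed speech duration (seconds) for one speaker; 0 if absent."""
    return sum(end - start for start, end in index.get(speaker, []))
```

With such an index, the speech regions of a speaker of interest can be retrieved directly, without rescanning the whole archive.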
Speaker detection is one of three different methods in speaker recognition. The other two are speaker verification and speaker identification.

Speaker Verification: Speaker verification is the use of a machine to verify a person's claimed identity from his or her voice. It is also known by several other terms, including voice verification, speaker authentication, voice authentication, talker authentication and talker verification. Here a person makes an identity claim by entering some identifying information, such as a number, into the system. If the speaker recognition is text-dependent, the user knows the text in advance. The speaker then speaks the text in front of a microphone, and the system analyzes the signal to verify whether he or she is the right person.

Speaker Identification: In speaker identification there is no prior identity claim. The system simply takes the speech signal from the microphone and determines which user or group it belongs to, or whether the speaker is unknown. To do this, the speaker's voice sample has to be matched against all the users in the system. The system then finds the user whose voice is nearest or equal to the voice being identified.

Sometimes we meet our friends, relatives or family members and forget to tell them about some important task. How can we make a computer or mobile phone intelligent enough to remind us of the important tasks related to the speaker who is talking to us? A speaker identification system lets us recognize a person from his or her voice and then reminds us of the tasks related to that person. So in this dissertation our main focus is on speaker identification.

Applications of speaker identification obviously include various kinds of security systems. The human voice can serve as a key to any secured object, and in general it is not easy to lose or forget. Another important property of speech is that it can be transmitted, for example, over a telephone channel. This makes it possible to identify speakers automatically and grant access to secured objects by telephone. Nowadays, this approach is beginning to be used for transactions over the telephone. The human voice can also be used to prove identity when accessing physical facilities. The speaker model is stored in a small chip, which serves as an access tag in place of a PIN code.

Another important application of speaker identification is monitoring people by their voices. For instance, it is useful in information retrieval. Recorded debates or news can be indexed by speaker, and then only the speech of the speakers of interest is retrieved. It can also be used to monitor criminals in public places by identifying them by their voices. In fact, all of these examples are real-time systems. For any identification system to be useful in practice, the response time, i.e. the time spent on identification, should be minimized. A growing speaker database is also common in practical systems and can also slow down the system's response.

There are some other issues related to speaker identification:
- Noise in the signal (environmental noise from the surroundings)
- Age of the speaker (the voice changes with the speaker's age)
- Emotional state of the speaker, such as anger
- Illness of the speaker, such as a cough
- Channel mismatch
- Different hardware used to capture the speech signal

Speech Signal Acquisition: First of all, we need the speaker's voice to feed into the system. It is used both for training and for matching the features of the speaker's voice to identify the speaker. A microphone, telephone handset or mobile phone can be used to capture the speaker's acoustic wave and convert it into an analog speech signal. An analog-to-digital (A/D) converter then converts the analog speech signal into a digital speech signal. A/D converters can sample at 8000-192000 samples per second, with 8-64 bits of resolution and 1-4 channels. As the sampling frequency, resolution or number of channels increases, the size of the speech signal data increases. Small devices have limited memory, so they need a lower sampling frequency to keep the data manageable. A sampling frequency of 8000 samples per second at 16-bit resolution with one channel is enough to capture the features of the speaker's voice and gives accurate results on small devices. For example, one second of such audio occupies 8000 × 16 / 8 = 16,000 bytes. On a computer, we can increase the sampling frequency and bit resolution for better results. So the main goal of this module is to take the sound signal from the speaker and either save it in the system or process it in real time.

Speech Feature Extraction: In speech feature extraction we find the parameters that distinguish one speaker from another. Different features, such as Mel-frequency cepstral coefficients (MFCC) and linear prediction cepstral coefficients (LPCC), are obtained from the speech signal. The main goal of feature extraction is to reduce the amount of data while retaining the information needed to classify the speaker. When the speech data is reduced, fewer comparisons need to be made. Chapter 3 gives full details of feature extraction from the speech signal.

Speech Feature Matching: In this module, the data from feature extraction is classified with one of the classification models used in speaker recognition. These include the hidden Markov model (HMM), Gaussian mixture model (GMM), artificial neural network (ANN) and vector quantization (VQ). After the feature data is classified, the training data is matched against the testing data. Both offline and online matching are performed, and different distance measures are used to find the result. A separate chapter gives full details of feature classification and matching.
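Vector quantization is one of the classification models listed above. The following is a minimal sketch of VQ-based matching, assuming feature vectors (for example MFCCs) have already been extracted. Each enrolled speaker gets a k-means codebook. A test utterance is assigned to the speaker whose codebook gives the lowest average Euclidean distortion, or rejected as unknown if that distortion exceeds an optional threshold. The function names, codebook size and threshold are illustrative assumptions, not the author's implementation.

```python
import numpy as np


def train_codebook(features: np.ndarray, size: int = 16, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Train a VQ codebook with plain k-means.

    features: (n_frames, n_dims) array of feature vectors (e.g. MFCCs).
    Returns a (size, n_dims) array of codewords. Requires size <= n_frames.
    """
    rng = np.random.default_rng(seed)
    # Initialise the codewords with randomly chosen training vectors.
    idx = rng.choice(len(features), size=size, replace=False)
    codebook = features[idx].astype(float)
    for _ in range(iters):
        # Assign each vector to its nearest codeword (squared Euclidean distance).
        d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = d.argmin(axis=1)
        # Move each codeword to the mean of the vectors assigned to it.
        for k in range(size):
            members = features[nearest == k]
            if len(members):
                codebook[k] = members.mean(axis=0)
    return codebook


def distortion(features: np.ndarray, codebook: np.ndarray) -> float:
    """Average Euclidean distance from each vector to its nearest codeword."""
    d = np.sqrt(((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2))
    return float(d.min(axis=1).mean())


def identify(features: np.ndarray, codebooks: dict, threshold: float = None):
    """Return the enrolled speaker with the lowest distortion, or None if unknown."""
    scores = {name: distortion(features, cb) for name, cb in codebooks.items()}
    best = min(scores, key=scores.get)
    # Open-set case: reject the utterance if even the best match is too far away.
    if threshold is not None and scores[best] > threshold:
        return None
    return best
```

Plain k-means is used here for brevity. The LBG (Linde-Buzo-Gray) splitting algorithm is a common alternative for building VQ codebooks.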
When the speaker identification system is trained with the word “Hello” only and tested with a different word, the success rate for correctly identifying the speaker is 25.7%. When it is tested with the same word, “Hello”, the success rate is 97.14%. Our speaker identification system can play an important role in applications such as caller ID on mobile phones.

... encouragement, help and useful suggestions. His untiring and painstaking efforts, methodical approach and individual help made it possible for me to complete this work in time.
