International Journal of Engineering Trends and Technology (IJETT) – Volume 9 Number 3 - Mar 2014
Speaker Identification with Back Propagation Neural Network Using Tunneling Algorithm
PPS Subhashini #1, Dr. M. Satya Sairam #2, Dr. D Srinivasarao #3
#1 Associate Professor, Dept. of ECE, RVR & JC College of Engineering, Guntur
#2 Professor & Head, Dept. of ECE, Chalapathi Institute of Engineering and Technology, Guntur
#3 Professor & Head, Dept. of ECE, JNTUH, Hyderabad
Abstract
Speaker identification has been an active area of research due to its diverse applications, and it continues to be a challenging research topic. Back Propagation (BP) neural networks offer attractive possibilities for solving signal processing and pattern classification problems. Several algorithms have been proposed for choosing the BP neural network prototypes and training the network; the selection of the BP prototypes and the network weights is a system identification problem. This paper implements an enhanced training method for the BP neural network based on the tunneling algorithm, and the proposed method is tested on the speaker identification problem. Features are obtained using linear predictive coefficients (LPC), and these features are classified using a Back Propagation neural network. The efficiency of the proposed method is tested on different speaker voices. It is shown that the use of the tunneling algorithm results in faster learning and convergence to the global minimum.
Keywords - ANN, Back Propagation training, LPCC method, MLP network.
I. INTRODUCTION
Speaker recognition is the identification of the
person who is speaking by characteristics of their voices
(voice biometrics), also called voice recognition.
Recognizing the speaker can simplify the task of translating speech in systems that have been trained on a specific person's voice, or it can be used to authenticate or verify the identity of a speaker as part of a security process.
Speaker recognition has a history dating back
some four decades and uses the acoustic features of speech
that have been found to differ between individuals. These
acoustic patterns reflect both anatomy (e.g., size and shape of the throat and mouth) and learned behavioral patterns (e.g., voice pitch and speaking style). Speaker verification has earned speaker recognition its classification as a "behavioral biometric" [1][2].
Each speaker recognition system has two phases:
Enrollment and verification. During enrollment, the
speaker's voice is recorded and typically a number of
features are extracted to form a voice print, template, or
model. In the verification phase, a speech sample or
"utterance" is compared against a previously created voice
print. For identification systems, the utterance is compared
against multiple voice prints in order to determine the best
match(es) while verification systems compare an utterance
against a single voice print. Because of the process
involved, verification is faster than identification.
Speaker recognition systems fall into two
categories: text-dependent and text-independent. If the text
must be the same for enrollment and verification this is
called text-dependent recognition. In a text-dependent system, prompts can either be common across all speakers or unique to each speaker.
Text-independent systems are most often used for speaker
identification as they require very little if any cooperation
by the speaker. In this case the text during enrollment and
test is different. In fact, the enrollment may happen without
the user's knowledge.[3]
This paper focuses on speaker recognition using
Multilayer Perceptron (MLP) Neural Network based on
back propagation training algorithm. From the results it was observed that the proposed method has a very high success rate in recognising different speaker identities.
II. ARTIFICIAL NEURAL NETWORKS
Neural network or Artificial Neural Network
(ANN) is a massively parallel distributed processor made
up of simple processing units, which has a natural
propensity for storing experiential knowledge and making
it available for use. A neural network contains a large
number of simple neuron-like processing elements, and a large number of weighted connections encode the knowledge of the network. Though biologically inspired, many of the neural network models developed do not duplicate the operation of the human brain [5].
The intelligence of a neural network emerges from the collective behavior of its neurons. Each neuron performs only a very limited operation. Even though each individual neuron works slowly, together they can quickly find a solution by working in parallel. This fact can explain why humans can recognize a visual scene faster than a digital computer. Training a neural network can be accomplished in two ways:
- Supervised learning method
- Unsupervised learning method

In supervised learning, the model defines the effect one set of observations, called inputs, has on another set of observations, called outputs. In other words, the inputs are assumed to be at the beginning and the outputs at the end of the causal chain. The models can include mediating variables between the inputs and outputs.
In unsupervised learning, all the observations are assumed to be caused by latent variables, i.e., the observations are assumed to be at the end of the causal chain. With unsupervised learning it is possible to learn larger and more complex models than with supervised learning, because in supervised learning one is trying to find the connection between two sets of observations.
The backpropagation method is the basis for training a supervised neural network. The output is a real value which lies between 0 and 1, based on the sigmoid function.
III. MLP NEURAL NETWORKS
The Multi Layer Perceptron (MLP) is a type of artificial neural network for supervised learning problems such as pattern classification. A feed-forward network has a layered structure. Each layer consists of units which receive their input from the units in the layer directly below and send their output to the units in the layer directly above. There are no connections within a layer. The inputs are fed into the first layer of hidden units; no processing takes place in the input units. The training output values are vectors of length equal to the number of classes. After training, the network responds to a new pattern [8].
This network consists of an input layer, a number of hidden layers and an output layer, as shown in Fig. 1. The activation of the hidden units in layer Nh,1 is a function f of the weighted inputs plus a bias. The output of the hidden units is distributed over the next layer of Nh,2 hidden units, and so on until the last layer of hidden units, whose outputs are fed into a layer of No output units [6].
The output of each hidden neuron is then weighted and
passed to the output layer. The outputs of the network
consist of sums of the weighted hidden layer neurons. The
design of an MLP requires several decisions that include
the number of hidden units in the hidden layer, values of
the prototypes, the functions used at the hidden units and
the weights applied between the hidden layer and the
output layer [7].
The performance of an MLP network depends on the
number of inputs, the shape of the sigmoid function at the
hidden units and the method used for determining the
network weights. MLP networks are trained starting from randomly selected initial weights.
Fig. 1 Multi-layer network with l hidden layers
A. Working Principle
Consider the MLP neural network with one hidden layer shown in Fig. 2, with I input nodes, J hidden nodes and K output nodes. The I-dimensional input z is passed directly to the hidden layer. Each of the J neurons in the hidden layer applies the unipolar sigmoid activation function defined in Equation (1):

f1(net_j) = 1 / (1 + exp(-λ · net_j))     (1)

where λ > 0, the net of a node is the summation of its weighted inputs, and J is the number of nodes in the hidden layer. Each of the K neurons in the output layer applies the generalized sigmoid activation function defined in Equation (2):

f2(net_k) = exp(λ · net_k) / Σ_{m=1..K} exp(λ · net_m)     (2)

where K is the number of nodes in the output layer. The generalized sigmoid function is the general case of the sigmoid function, with the ability to specify the steepness of the function as well as an offset that should be taken into consideration. The use of the generalized sigmoid function introduces additional flexibility into the MLP model. Since the response of each output neuron is tempered by the responses of all the output neurons, the competition actually fosters cooperation among the output neurons [15].
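To make these activation functions concrete, a minimal Python sketch is given below. The slope parameter lam and the normalized-exponential form used for Equation (2) follow the reconstruction above and are assumptions of this sketch, not details stated explicitly in the paper.

import numpy as np

def unipolar_sigmoid(net, lam=1.0):
    # Equation (1): f1(net) = 1 / (1 + exp(-lambda * net)), with lambda > 0
    return 1.0 / (1.0 + np.exp(-lam * net))

def generalized_sigmoid(net, lam=1.0):
    # Equation (2) as reconstructed: each output is normalized by the sum of the
    # exponentials of all K output activations, so every output neuron's response
    # is tempered by the responses of the others.
    e = np.exp(lam * net - np.max(lam * net))  # subtract the maximum for stability
    return e / e.sum()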
When the MLP Neural network is used in classification,
the hidden layer performs clustering while the output layer
performs classification. The hidden units apply a non-linear sigmoid activation function to the input patterns. The
output layer would linearly combine all the outputs of the
hidden layer. Each output node would then give an output
value, which represents the probability that the input
pattern falls under that class. Back-propagation can be applied to networks with any number of layers. In this paper a feed-forward neural network with a single layer of hidden units is used, with a sigmoid activation function for the units.

Fig. 2 Multi-layer network with a single hidden layer
IV. BACKPROPAGATION NEURAL NETWORK
The MLP neural network trained with the backpropagation training algorithm is known as a Back Propagation (BP) neural network. The error surface of a
complex network is full of hills and valleys. Because of the
gradient descent, the network can get trapped in a local
minimum when there is a much deeper minimum nearby.
Probabilistic methods can help to avoid this trap, but they
tend to be slow. Another suggested possibility is to
increase the number of hidden units. Although this will
work because of the higher dimensionality of the error
space, and the chance to get trapped is smaller, it appears
that there is some upper limit of the number of hidden units
which, when exceeded, again results in the system being
trapped in local minima [8].
Back Propagation with tunneling is used to train the designed BP neural network; it replaces the plain gradient-descent rule for MLP learning and can find the global minimum from an arbitrary initial choice in the weight space. The algorithm consists of two phases. The first phase is a local search that implements BP learning. The second phase implements dynamic tunneling in the weight space, avoiding the local trap and thereby generating the point of next descent. Repeating the two phases alternately forms a training procedure that leads to the global minimum. This algorithm is computationally efficient.

A. Back Propagation Training
The first phase in the Back Propagation tunneling training algorithm is the Back-Propagation training algorithm. A two-layer feed-forward network does not by itself offer a solution to the problem of how to adjust the weights from input to hidden units. The solution is that the errors for the units of the hidden layer are determined by back-propagating the errors of the units of the output layer; the method is therefore often called the back-propagation learning rule. Back-propagation can also be considered a generalization of the delta rule for non-linear activation functions in multi-layer networks.
The error measure E is defined as the total quadratic error for pattern p at the output units. When a learning pattern is clamped, the activation values are propagated to the output units and the actual network output is compared with the desired output values; we usually end up with an error in each of the output units.
Let I be the number of input nodes, J the number of hidden nodes and K the number of output nodes of the MLP neural network. Let V be the weight matrix of the hidden layer and W the weight matrix of the output layer. The size of W is K x J and the size of V is J x I.
B. The steps for the training cycle
1) Applying the feature vectors one by one to the input layer, the output of the hidden layer is computed as

   y_j = f1( Σ_{i=1..I} v_ji · z_i ),   j = 1, 2, ..., J     (3)

   where f1(.) is the unipolar sigmoid function defined in (1). The output of the output layer is computed using (4):

   o_k = f2( Σ_{j=1..J} w_kj · y_j ),   k = 1, 2, ..., K     (4)

   where f2(.) is the generalized sigmoid function defined in (2).

2) The error value is computed using (5):

   E = (1/2) (d_k - o_k)^2 + E,   k = 1, 2, ..., K     (5)

   where d_k is the desired output of output node k.

3) The error signal vectors δ_o and δ_y of both layers are computed. The dimension of δ_o is (K x 1) and the dimension of δ_y is (J x 1). The error signal terms of the output layer are given by (6):

   δ_ok = (d_k - o_k) · o_k · (1 - o_k),   k = 1, 2, ..., K     (6)

   The error signal terms of the hidden layer are given by (7):

   δ_yj = y_j (1 - y_j) · Σ_{k=1..K} δ_ok · w_kj,   j = 1, 2, ..., J     (7)

4) The output layer weights are adjusted as

   w_kj = w_kj + η · δ_ok · y_j,   k = 1, 2, ..., K,  j = 1, 2, ..., J     (8)

5) The hidden layer weights are adjusted using (9):

   v_ji = v_ji + η · δ_yj · z_i,   j = 1, 2, ..., J,  i = 1, 2, ..., I     (9)

6) Steps 1) to 5) are repeated for all the feature vectors.

7) The training cycle is repeated for 1000 epochs.
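A compact Python sketch of this training cycle is shown below. It follows Equations (3)-(9) as reconstructed above; the learning rate eta, the uniform random weight initialization and the helper name train_bp are illustrative assumptions rather than details given in the paper.

import numpy as np

def train_bp(features, targets, I, J, K, eta=0.1, lam=1.0, epochs=1000, seed=0):
    # features: list of I-dimensional feature vectors z
    # targets:  list of K-dimensional desired outputs d (e.g., one-hot speaker labels)
    rng = np.random.default_rng(seed)
    V = rng.uniform(-0.5, 0.5, (J, I))        # hidden-layer weights (J x I)
    W = rng.uniform(-0.5, 0.5, (K, J))        # output-layer weights (K x J)
    for _ in range(epochs):                   # step 7): repeat the training cycle
        for z, d in zip(features, targets):   # step 6): all feature vectors
            y = 1.0 / (1.0 + np.exp(-lam * (V @ z)))          # (3) hidden outputs
            net_o = W @ y
            e = np.exp(lam * net_o - np.max(lam * net_o))
            o = e / e.sum()                                    # (4) network outputs
            delta_o = (d - o) * o * (1.0 - o)                  # (6) output error signals
            delta_y = y * (1.0 - y) * (W.T @ delta_o)          # (7) hidden error signals
            W += eta * np.outer(delta_o, y)                    # (8) output-layer update
            V += eta * np.outer(delta_y, z)                    # (9) hidden-layer update
    return V, W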
V. TUNNELING ALGORITHM
The second phase in the Back Propagation tunneling training algorithm is dynamic tunneling. A random point W in the weight space is selected. For a new point W + ε, where ε is a small perturbation of W, if E(W + ε) ≤ E(W), BP is applied until a local minimum is found; otherwise, the tunneling technique takes place until a point in a lower basin is found, as shown in Fig. 3. The algorithm automatically enters the two phases alternately, and the weights are modified according to their respective update rules.

Fig. 3 Error surface

The tunneling is implemented by solving the differential equation given in (10):

   dW_i(t)/dt = ρ · ( W_i(t) - W_i* )^(1/3)     (10)

where ρ is the learning rate of the tunneling phase and W* is the last local minimum of E(W) found by the BP phase. The tunneling trajectory is started from the perturbed point

   W_i(0) = W_i* + ε_i     (11)

where ε_i is a small perturbation. The equation is integrated for a fixed amount of time with a small time-step Δt. After every Δt, E(W) is computed with the new value of W_i, keeping the remaining components of W unchanged. Tunneling comes to a halt when E(W) ≤ E(W*), and the next gradient descent is initiated. If this condition of descent is not satisfied, the process is repeated with all the components of W until the condition of descent is reached. If the condition is not satisfied for any W_i, then the last local minimum is taken as the global minimum [21][22].
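The sketch below illustrates one way this tunneling phase could be coded in Python around the reconstructed Equations (10)-(11). The cube-root exponent follows the dynamic tunneling scheme of [21]; the step count max_steps, the perturbation size eps and the function name tunneling_phase are illustrative assumptions.

import numpy as np

def tunneling_phase(E, w_star, rho=0.01, eps=1e-3, dt=0.01, max_steps=1000, seed=0):
    # E: callable returning the error for a weight vector; w_star: last local minimum.
    rng = np.random.default_rng(seed)
    E_star = E(w_star)
    for i in range(w_star.size):                      # perturb one component at a time
        w = w_star.copy()
        w[i] = w_star[i] + eps * rng.choice([-1.0, 1.0])   # (11) perturbed starting point
        for _ in range(max_steps):                    # integrate (10) with time-step dt
            diff = w[i] - w_star[i]
            w[i] += dt * rho * np.sign(diff) * np.abs(diff) ** (1.0 / 3.0)
            if E(w) <= E_star:                        # lower basin found:
                return w                              # start the next BP descent here
    return None            # no lower basin found: w_star is taken as the global minimum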
VI. FEATURE EXTRACTION USING LINEAR PREDICTIVE CODING (LPC)
A. Introduction
Linear Predictive Coding (LPC) is one of the most powerful speech analysis techniques and one of the most useful methods for encoding good quality speech at a low bit rate. It provides extremely accurate estimates of speech parameters and is relatively efficient to compute [4].
B. Basic Principles
LPC starts with the assumption that the speech signal is produced by a buzzer at the end of a tube. The glottis (the space between the vocal cords) produces the buzz, which is characterized by its intensity (loudness) and frequency (pitch). The vocal tract (the throat and mouth) forms the tube, which is characterized by its resonances, called formants.
LPC analyzes the speech signal by estimating the formants, removing their effects from the speech signal, and estimating the intensity and frequency of the remaining buzz. The process of removing the formants is called inverse filtering, and the remaining signal is called the residue.
The numbers which describe the formants and the
residue can be stored or transmitted somewhere else. LPC
synthesizes the speech signal by reversing the process: use
the residue to create a source signal, use the formants to
create a filter (which represents the tube), and run the
source through the filter, resulting in speech.
Because speech signals vary with time, this
process is done on short chunks of the speech signal, which
are called frames. Usually 30 to 50 frames per second give
intelligible speech with good compression.[20]
C. Windowing.
For short-term analysis the signal must be zero outside a defined range; this is achieved by multiplying the signal with a window. Normally a window width of 20-30 ms is chosen, and the window shift is usually 10 ms.
D. Window shapes.
Common window shapes are the rectangular window, Wn = 1, and the Hamming window, Wn = 0.54 - 0.46 cos(2πn/(N-1)).
Other common windows are the Gauss, Hann and Blackman windows. Here a rectangular window, Wn = 1, with a window width of 20 ms is used. LPC uses the autocorrelation method of autoregressive (AR) modeling to find the filter coefficients. The generated filter might not model the process exactly, even if the data sequence is truly an AR process of the correct order, because the autocorrelation method implicitly windows the data, that is, it assumes that signal samples beyond the length of x are 0.
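The feature-extraction stage described above can be sketched in Python as follows, using the autocorrelation method solved with the Levinson-Durbin recursion and the 20 ms rectangular window with a 10 ms shift mentioned in the text. Averaging the per-frame coefficients into a single 11-dimensional vector per utterance is an assumption made here for illustration; the paper does not state how frame-level coefficients are combined.

import numpy as np

def levinson_durbin(r, order):
    # Solve the autocorrelation normal equations for the LPC predictor coefficients.
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err    # reflection coefficient
        a[1:i + 1] += k * a[i - 1::-1][:i]
        err *= 1.0 - k * k
    return a                                                 # [1, a1, ..., a_order]

def lpc_features(signal, fs, order=11, frame_ms=20, hop_ms=10):
    frame_len = int(fs * frame_ms / 1000)                    # 20 ms rectangular window
    hop = int(fs * hop_ms / 1000)                            # 10 ms window shift
    coeffs = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = np.asarray(signal[start:start + frame_len], dtype=float)
        r = np.correlate(frame, frame, mode="full")[frame_len - 1:frame_len + order]
        if r[0] <= 0:                                        # skip silent frames
            continue
        coeffs.append(levinson_durbin(r, order)[1:])         # a1 ... a11 per frame
    return np.mean(coeffs, axis=0)                           # 11-dimensional feature vector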
Table 1. Linear Predictive Coding (LPC) coefficients for the three speakers

Speaker    Coeff1   Coeff2   Coeff3   Coeff4    Coeff5    Coeff6    Coeff7    Coeff8    Coeff9    Coeff10   Coeff11
Speaker1   1.000    -2.801   2.619    -0.1713   -1.3305   0.2278    1.4217    -1.4536   0.54563   -0.03196  -0.02005
Speaker1   1.9501   -1.851   3.569    0.77883   -0.3803   1.178     2.3718    -0.5034   1.4958    0.91817   0.93008
Speaker1   1.2311   -2.570   2.850    0.05983   -1.0994   0.45901   1.6528    -1.2225   0.77677   0.19918   0.21109
Speaker1   1.6068   -2.194   3.2262   0.43554   -0.7236   0.83471   2.0285    -0.8467   1.1525    0.57488   0.58679
Speaker2   1.000    -2.389   1.9524   -0.7100   0.21869   0.83471   0.02764   0.02490   0.056714  -0.14382  0.070365
Speaker2   1.1897   -2.199   2.1421   -0.5203   0.40834   -0.1026   0.2173    0.21456   0.24637   0.045834  0.26002
Speaker2   1.1934   -2.196   2.1458   -0.5165   0.41212   0.08703   0.22107   0.21834   0.25015   0.049611  0.2638
Speaker2   1.6822   -1.707   2.6346   -0.0278   0.90091   0.09081   0.70987   0.70713   0.73894   0.5384    0.75259
Speaker2   1.000    -2.389   1.9524   -0.7100   0.21869   0.5796    0.21285   0.25396   0.091236  -0.37531  0.17657
Speaker3   1.000    -2.167   1.3171   -0.1625   0.54389   -0.8750   0.27763   0.31874   0.15602   -0.31053  0.24135
Speaker3   1.0648   -2.102   1.3819   -0.0977   0.60867   -0.8103   1.2012    1.2423    1.0796    0.61302   1.1649
Speaker3   1.9883   -1.178   2.3054   0.8258    1.5322    0.11324   0.79564   0.83675   0.67403   0.20748   0.75936
Speaker3   1.5828   1.5843   1.8999   0.42026   1.1267    -0.2923   0.795     0.836     0.764     0.209     0.7541
VII. EXPERIMENTAL RESULTS
This work implements the tunneling algorithm to train a Back Propagation neural network, with the aim of reducing the computational effort required for training the network. The tests are carried out on the speaker recognition problem. For each speech signal, 11 features are extracted using the Linear Predictive Coding (LPC) technique and classified into one of 3 speakers, as shown in Fig. 4.
The Back Propagation network is trained with five utterances for each word and tested with five more utterances which are different from the training utterances. The network was designed to recognize the 3 speakers; 11 features are extracted from each speech signal using standard LPC analysis. The speech signals of the three speakers for the word 'hello' are shown in Fig. 5, Fig. 6 and Fig. 7 respectively. The features obtained for four utterances of each of the three speakers are tabulated in Table 1. The designed BP neural network consists of 11 input nodes, a variable number of hidden nodes from 1 to 15, and 3 output nodes. The experimental results are evaluated by plotting the percentage of correct classification along the Y axis against the number of hidden nodes on the X axis, as shown in Fig. 8. It was observed that, as the number of hidden nodes is varied between 1 and 15, the percentage of correct classification also increases linearly.
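As a sketch of how this experiment could be reproduced, the loop below varies the number of hidden nodes from 1 to 15, trains the network with the train_bp routine sketched in Section IV, and records the percentage of correct classification on the test utterances. The helper names and the argmax decision rule are assumptions made for illustration, not the authors' code.

import numpy as np

def classify(V, W, z, lam=1.0):
    # Forward pass of the trained network; the predicted speaker is the
    # output node with the largest response.
    y = 1.0 / (1.0 + np.exp(-lam * (V @ z)))
    return int(np.argmax(W @ y))

def hidden_node_sweep(train_x, train_y, test_x, test_labels, I=11, K=3):
    results = {}
    for J in range(1, 16):                                   # 1 to 15 hidden nodes
        V, W = train_bp(train_x, train_y, I=I, J=J, K=K)     # sketched in Section IV
        correct = sum(classify(V, W, z) == lbl for z, lbl in zip(test_x, test_labels))
        results[J] = 100.0 * correct / len(test_x)           # % correct classification
    return results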
Fig.4 Block Diagram
Fig. 8 Percentage of correct classification versus number of hidden nodes
Fig.5 speaker 1 speech signal for the word hello
Fig. 6 Speaker 2 speech signal for the word hello
Fig. 7 Speaker 3 speech signal for the word hello

VIII. CONCLUSION
Back Propagation neural network training with the tunneling algorithm is proposed for the identification of different speakers. The proposed method is tested on three speakers with ten utterances each. It is found that this method requires less time for training the BP neural network and has a good success rate in speaker identification. It was also observed that the percentage of correct classification increases with the number of hidden nodes.

REFERENCES
[1] L. R. Rabiner and B. H. Juang, "Fundamentals of Speech Recognition", Prentice Hall India.
[2] Sadaoki Furui, "Digital Speech Processing, Synthesis, and Recognition".
[3] M. R. Schroeder, "Speech and Speaker Recognition".
[4] John L. Ostrander, Timothy D., "Speech Recognition Using LPC Analysis".
[5] Tom Mitchell, "Artificial Neural Networks," in Machine Learning, 1st ed., McGraw-Hill, 1997, pp. 95-111.
[6] Peter Dreisiger, Cara MacNish and Wei Liu, "Estimating Conceptual Similarities Using Distributed Representations and Extended Backpropagation", (IJACSA) International Journal of Advanced Computer Science and Applications, vol. 2, no. 4, 2011.
[7] O. Batsaikhan and Y. P. Singh, "Mongolian Character Recognition using Multilayer Perceptron", Proceedings of the 9th International Conference on Neural Information Processing, vol. 2, 2002.
[8] D. Y. Lee, "Handwritten digit recognition using K nearest neighbour, radial basis function and backpropagation neural networks", Neural Computation, vol. 3, pp. 440-449.
[9] A. M. Numan-Al-Mobin, Mobarakol Islam, Kaustubh Dhar, Tajul Islam, Md. Rezwan and M. Hossain, "Backpropagation with Vector Chaotic Learning Rate".
[10] Marine Campedel-Oudot, Olivier Cappé and Eric Moulines, "Estimation of the Spectral Envelope of Voiced Sounds Using a Penalized Likelihood Approach", IEEE Transactions on Speech and Audio Processing, vol. 9, no. 5, July 2001.
[11] Jonas Samuelsson and Per Hedelin, "Recursive Coding of Spectrum Parameters", IEEE Transactions on Speech and Audio Processing, vol. 9, no. 5, July 2001.
[12] Bishnu S. Atal, "The History of Linear Prediction".
[13] Roy C. Snell and Fausto Milinazzo, "Formant Location from LPC Analysis Data", IEEE Transactions on Speech and Audio Processing, vol. 1, no. 2, April 1993.
[14] Madre, G. Baghious, "Linear Predictive Speech Coding Using Fermat Number Transform".
[15] Shih-Chi Huang and Yih-Fang Huang, "Learning Algorithms for Perceptrons Using Back-Propagation with Selective Updates".
[16] Fu-Chuang Chen, "Back-Propagation Neural Networks for Nonlinear Self-Tuning Adaptive Control".
[17] John H. L. Hansen and Brian D. Womack, "Feature Analysis and Neural Network-Based Classification of Speech Under Stress", IEEE Transactions on Speech and Audio Processing, vol. 4, no. 4, July 1996.
[18] Sharada C. Sajjan and Vijaya C., "Speech Recognition Using Hidden Markov Models", World Journal of Science and Technology, vol. 1, no. 12, pp. 75-78, 2011, ISSN: 2231-2587.
[19] Ben Pinkowski, "LPC Spectral Moments for Clustering Acoustic Transients", IEEE Transactions on Speech and Audio Processing, vol. 1, no. 3, July 1993.
[20] Aki Härmä, "Linear Predictive Coding with Modified Filter Structures", IEEE Transactions on Speech and Audio Processing.
[21] Chowdhury and Singh, "Use of Dynamic Tunneling with Backpropagation in Training Feedforward Neural Networks".
[22] T. Kathirvalavakumar and P. Thangavel, "A Modified Backpropagation Training Algorithm for Feedforward Neural Networks".