Speech Feature Analysis Using Step-Weighted
Linear Discriminant Analysis
Jiang Hai, Er Meng Joo
School of Electrical and Electronic Engineering, Nanyang Technological University
S1-B4b-06, Nanyang Avenue, Nanyang Technological University, Singapore 639798
Telephone: (65) 67905472
EDICS: 2.SPEE
Abstract: In speech feature extraction, a simple strategy for improving the discriminability of the feature vectors is to append their delta coefficients. However, the dimension of the feature vector then increases considerably, so reducing the dimension of the feature space efficiently is key to keeping the computation tractable. In this paper, a step-weighted linear discriminant dimensionality reduction technique is proposed. Dimensionality reduction using linear discriminant analysis (LDA) is commonly based on optimizing certain separability criteria in the output space. The resulting optimization problem is linear, but these separability criteria are not directly related to the classification accuracy in the output space. As a result, even the best weighting function in the input space can result in poor classification of the data in the output space. With the step-weighted linear discriminant dimensionality reduction technique, the weighting function of the between-class scatter matrix is adjusted in the output space each time one dimension is removed. We describe this method and present an application to a speaker-independent isolated digit recognition task.
Keywords: Dimensionality reduction, Linear Discriminant Analysis, Speech recognition
$n, k, l$: Counters over the patterns and classes
$N$: The total number of data samples
$K$: The total number of classes
$d^{(kl)}$: The Euclidean distance between the means of class $k$ and class $l$ in the input space
$w(\cdot)$: The weighting function
$n_k$: The number of training vectors in class $k$
$x_{nk}, \tilde{x}_{nk}$: The $n$-th training pattern of class $k$ in the input space and in the output space, for $n = 1, \ldots, N$ and $k = 1, \ldots, K$
$S_W, \tilde{S}_W$: The within-class scatter matrix in the input and output space
$S_B, \tilde{S}_B$: The between-class scatter matrix of the class means in the input and output space
$v_k, \tilde{v}_k$: The mean of class $k$ in the input and output space
$v, \tilde{v}$: The global sample mean in the input and output space
$T_n$: The transformation matrix formed by $[\phi_1, \phi_2, \ldots, \phi_n]$
$\phi$: The mapping $\phi: x \to y$
$n, m$: The dimensions of the input space and the output space
Table 1 Notation Conventions Used in This Paper
1. Introduction
Dimensionality reduction is the process of mapping high-dimensional patterns to a lower-dimensional subspace, and it is typically used as a preprocessing step in classification applications. The optimality criterion of choice for classification purposes is the Bayes error, which is the minimum achievable classification error given the underlying distribution. However, estimating the Bayes error is a time-consuming and unreliable task. Because of this difficulty in directly estimating the Bayes error, linear projections based on scatter matrices are quite popular in dimensionality reduction for classification purposes. The optimality criterion of Fisher's linear discriminant is $J = \mathrm{tr}(S_W^{-1} S_B)$ [1].
It is common to apply linear discriminant analysis (LDA) in statistical pattern classification tasks to decrease the dimension and thereby reduce the computation. The LDA transformation attempts to reduce the dimension while keeping most of the discriminative information in the feature space. Recently, LDA and improved versions of LDA have been applied to several problems, such as face recognition [2][3] and speech recognition [4]. In speech recognition tasks, the feature space dimension can be increased by extending the feature vector with data from a range of neighboring frames. Doing so noticeably increases the discriminability of the feature space, but at the same time the computation becomes impractical. Compressing the dimension of the feature space efficiently is therefore very useful in speech signal processing.
An optimality criterion based on scatter matrices is, in general, not directly related to classification accuracy. Therefore, a weighted scatter matrix is often constructed in which smaller distances are weighted more heavily than larger distances [5]. However, this scatter matrix is calculated in the input space, and it can be far from the true scatter matrix in the output space when the dimensionality is reduced by more than one dimension. Rohit Lotlikar and Ravi Kothari proposed the fractional-step dimensionality reduction method to overcome this problem [6]. However, they considered only the between-class scatter matrix and did not recompute the within-class scatter matrix, which causes useful discriminant information to be lost during the projection. In this paper, we introduce the concept of step-weighted dimensionality reduction, in which the dimensionality is reduced from n to m (m < n) one dimension per step. In addition to describing the step-weighted LDA algorithm, we present an application to the speaker-independent isolated digit recognition problem.
2. The Conventional LDA
The LDA problem is formulated as follows [7]. Let $x \in \mathbb{R}^n$ be a feature vector. We seek a transformation $x \to \tilde{x}$, $\phi: \mathbb{R}^n \to \mathbb{R}^m$ with $m < n$, such that minimum loss of discrimination occurs in the transformed space. In practice, $m$ is much smaller than $n$. A common form of the optimality criterion to be maximized is the function $J = \mathrm{tr}(S_W^{-1} S_B)$. In classical LDA, the corresponding input-space within-class and between-class scatter matrices are defined by
$$ S_B = \sum_{k=1}^{K} n_k (v_k - v)(v_k - v)^t \qquad (1) $$

$$ S_W = \sum_{k=1}^{K} \sum_{n=1}^{n_k} (x_{nk} - v_k)(x_{nk} - v_k)^t \qquad (2) $$

$$ v_k = \frac{1}{n_k} \sum_{n=1}^{n_k} x_{nk} \qquad (3) $$

$$ v = \frac{1}{N} \sum_{k=1}^{K} n_k v_k \qquad (4) $$
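As a concrete illustration of equations (1) to (4), the input-space scatter matrices can be computed from labeled data along the following lines. This is a minimal NumPy sketch under our own naming; the arrays X and labels and the function name are illustrative and do not come from the paper.

```python
import numpy as np

def scatter_matrices(X, labels):
    """Between-class (1) and within-class (2) scatter matrices.

    X      : (N, n) array of training patterns, one row per pattern.
    labels : (N,) array of class labels.
    """
    classes = np.unique(labels)
    v = X.mean(axis=0)                      # global sample mean, Eq. (4)
    n_dim = X.shape[1]
    S_B = np.zeros((n_dim, n_dim))
    S_W = np.zeros((n_dim, n_dim))
    for k in classes:
        X_k = X[labels == k]
        n_k = X_k.shape[0]
        v_k = X_k.mean(axis=0)              # class mean, Eq. (3)
        diff = (v_k - v)[:, None]
        S_B += n_k * (diff @ diff.T)        # Eq. (1)
        centered = X_k - v_k
        S_W += centered.T @ centered        # Eq. (2)
    return S_B, S_W
```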
LDA maximizes, in a certain sense, the ratio of the between-class and the within-class scatter matrices after transformation. This makes it possible to choose a transform that keeps the most discriminative information while reducing the dimension. Precisely, we want to maximize the objective function

$$ \max_{\phi} \frac{\left| \phi^{t} S_B \phi \right|}{\left| \phi^{t} S_W \phi \right|} $$

The columns of the optimum $\phi$ are the generalized eigenvectors corresponding to the $m$ largest-magnitude eigenvalues of the equation

$$ S_B \phi = \lambda S_W \phi \qquad (5) $$
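In code, the conventional LDA projection can be obtained by solving the generalized eigenvalue problem (5). The sketch below uses SciPy's symmetric generalized eigensolver; the function name is ours and the sketch assumes S_W is non-singular.

```python
import numpy as np
from scipy.linalg import eigh

def lda_projection(S_B, S_W, m):
    """Return an (n, m) matrix whose columns are the generalized eigenvectors
    of S_B * phi = lambda * S_W * phi with the m largest eigenvalues,
    i.e. the conventional LDA projection of Eq. (5)."""
    # eigh solves the symmetric generalized problem with eigenvalues in
    # ascending order, so the last m columns correspond to the largest ones.
    eigvals, eigvecs = eigh(S_B, S_W)
    return eigvecs[:, -m:][:, ::-1]

# Usage: Y = X @ lda_projection(S_B, S_W, m) projects the data to m dimensions.
```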
3. Step-Weighted Linear Discriminant Analysis (SW-LDA)
Because the definition of the between-class scatter matrix is not directly related to classification accuracy, a weighted scatter matrix is often constructed in which smaller distances are weighted more heavily than larger distances [5]:

$$ S_B = \sum_{k=1}^{K} \sum_{l=1}^{K} w\left(d^{(kl)}\right) n_k n_l (v_k - v_l)(v_k - v_l)^t \qquad (6) $$
In conventional LDA, if we wish to reduce the dimensionality from n to m (m much smaller than n), we compute S_B and its eigenvectors $\phi_1, \phi_2, \ldots, \phi_n$, and obtain the m-dimensional representation spanned by $\phi_1, \phi_2, \ldots, \phi_m$. When there are many classes, it is quite possible that some pair of classes is separated along the same orientation as $\phi_n$ or $\phi_{n-1}, \phi_{n-2}, \ldots, \phi_{m+1}$. Because $\phi_1, \phi_2, \ldots, \phi_n$ are mutually orthogonal, such a pair of classes would overlap heavily in the m-dimensional space. Although the two classes are well separated in the original space, they are not sufficiently weighted in the computation of S_B once those dimensions have been projected away.
We therefore compress the data gradually, one dimension per step. At each dimension-reduction step, we recompute the between-class and within-class scatter matrices based on the changed inter-class and intra-class distances, rebuild the weighting function, and then compute the eigenvectors. In this way, class centers that come closer together are weighted increasingly heavily. The corresponding output-space within-class and between-class scatter matrices are defined by
$$ \tilde{S}_B = \sum_{k=1}^{K} n_k (\tilde{v}_k - \tilde{v})(\tilde{v}_k - \tilde{v})^t \qquad (7) $$

$$ \tilde{S}_B = \sum_{k=1}^{K} \sum_{l=1}^{K} w\left(\tilde{d}^{(kl)}\right) n_k n_l (\tilde{v}_k - \tilde{v}_l)(\tilde{v}_k - \tilde{v}_l)^t \qquad (8) $$

$$ \tilde{S}_W = \sum_{k=1}^{K} \sum_{n=1}^{n_k} (\tilde{x}_{nk} - \tilde{v}_k)(\tilde{x}_{nk} - \tilde{v}_k)^t \qquad (9) $$

$$ \tilde{v}_k = \frac{1}{n_k} \sum_{n=1}^{n_k} \tilde{x}_{nk} \qquad (10) $$

$$ \tilde{v} = \frac{1}{N} \sum_{k=1}^{K} n_k \tilde{v}_k \qquad (11) $$
The entire procedure for reducing the dimensionality to m is as follows (a code sketch of the loop is given after the steps).
Step 1: Calculate $S_B$ and $S_W$ according to equations (1) and (2);
Step 2: Compute the transformation matrix $T_{n-1} = [\phi_1, \phi_2, \ldots, \phi_{n-1}]$ and reduce the feature space dimensionality from $n$ to $n-1$;
Step 3: Calculate $\tilde{S}_B$ and $\tilde{S}_W$ according to equations (8) and (9);
Step 4: Compute the transformation matrix $T_{n-2} = [\phi_1, \phi_2, \ldots, \phi_{n-2}]$ and reduce the feature space dimensionality from $n-1$ to $n-2$;
Step 5: Repeat Steps 3 and 4 until the feature space reaches $m$ dimensions;
Step 6: Compute the overall transformation matrix $T = T_m \cdot T_{m+1} \cdot \ldots \cdot T_{n-2} \cdot T_{n-1}$;
Step 7: After training, use the transformation matrix $T$ to project the observed feature vectors from $n$ to $m$ dimensions.
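Under our reading of Steps 1 to 7, the SW-LDA loop could be sketched as follows. It reuses the illustrative helpers sketched earlier (scatter_matrices, weighted_between_scatter, lda_projection); none of these names come from the paper.

```python
import numpy as np

def sw_lda(X, labels, m, p):
    """Step-weighted LDA: reduce the feature space from n to m dimensions,
    one dimension per step, re-weighting the scatter matrices after each step."""
    Y = X.copy()
    n = X.shape[1]
    T = np.eye(n)                                    # accumulated overall transform (Step 6)
    S_B, S_W = scatter_matrices(Y, labels)           # Step 1: Eqs. (1) and (2)
    for target_dim in range(n - 1, m - 1, -1):
        T_step = lda_projection(S_B, S_W, target_dim)    # drop one dimension
        Y = Y @ T_step                                   # Steps 2 and 4: project the data
        T = T @ T_step
        S_B = weighted_between_scatter(Y, labels, p)     # Step 3: Eq. (8)
        _, S_W = scatter_matrices(Y, labels)             # Step 3: Eq. (9)
    return T, Y

# Usage (Step 7): project observed vectors with x @ T, e.g.
# T, Y_train = sw_lda(X_train, labels_train, m=24, p=0)
```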
4. Application to Speech Database
Our speech recognition experiments are based on an HMM-based speech recognizer for a speaker-independent isolated English digit task. The TI46 corpus of isolated words, which was designed and collected at Texas Instruments (TI), is used in the proposed system. The TI46 corpus contains 16 speakers: 8 males and 8 females. There are 15 utterances of each English digit (0-9) from each speaker: 10 designated as training tokens and 5 designated as testing tokens in the proposed system. The front-end features of this system are 16 Mel-frequency cepstral coefficients (MFCCs) plus their deltas, so the original dimensionality of the speech feature space is 32. The recognition system is trained with the 10 clean training tokens per speaker, and the training corpus contains 1600 speech utterances altogether. To test the robustness of the speech recognition, we add white noise to the clean testing utterances at different signal-to-noise ratios (SNRs).
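As an illustration of this front end, the 32-dimensional features (16 MFCCs plus their deltas) and the noisy test signals could be produced roughly as follows. This sketch assumes a librosa-style MFCC API; the function names, parameters, and the use of librosa are our own assumptions, not details from the paper.

```python
import numpy as np
import librosa

def front_end(y, sr):
    """16 MFCCs plus their deltas -> 32-dimensional frame vectors."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=16)   # shape (16, frames)
    delta = librosa.feature.delta(mfcc)                  # shape (16, frames)
    return np.vstack([mfcc, delta]).T                    # shape (frames, 32)

def add_white_noise(y, snr_db):
    """Add white Gaussian noise to a clean utterance at the given SNR in dB."""
    signal_power = np.mean(y ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=y.shape)
    return y + noise
```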
To apply LDA, conventional weighted LDA (W-LDA), and step-weighted LDA (SW-LDA) in our speech recognition system, we need labeled training data. The class labels are the digit identities of the training utterances.
In all our simulations, we chose the dimensionality of the compressed feature space to be m = 24 and m = 16. We present the results obtained on each data set with the LDA, W-LDA, and proposed SW-LDA algorithms. For W-LDA and SW-LDA, the simulations were run with w(d) taken from the set $\{d^{0}, d^{-2}, d^{-4}, d^{-6}, d^{-8}, d^{-10}, d^{-12}, d^{-14}, d^{-16}\}$. For each choice of w(d), the training accuracy was noted.
[Figure: % recognition versus the power of d (from -16 to 0) for LDA, W-LDA, and SW-LDA; panels for SNR = 100, SNR = 20, SNR = 10, and the average recognition rate.]
Figure 1 The recognition rate for dimensionality reduction from 32 to 24
Figure 1 shows the speech recognition results when the dimension of the feature vectors is reduced from 32 to 24. The recognition accuracies show that SW-LDA generally performs better than the W-LDA method. SW-LDA is also better than common LDA when the weighting function is $d^{0}$, $d^{-1}$, or $d^{-2}$. The best weighting function for SW-LDA is $w(d) = d^{0}$.
[Figure: % recognition versus the power of d (from -16 to 0) for LDA, W-LDA, and SW-LDA; panels for SNR = 100, SNR = 20, SNR = 10, and the average recognition rate.]
Figure 2 The recognition rate for dimensionality reduction from 32 to 16
Figure 2 shows the testing accuracies obtained with common LDA, conventional W-LDA, and SW-LDA for different weighting functions. The recognition rates show that SW-LDA is better than W-LDA over the whole range of powers. SW-LDA also performs better than conventional LDA when the power of d is $-8$ or below. The best weighting function for SW-LDA is $w(d) = d^{-8}$.
5. Conclusion
We have proposed a method of dimensionality reduction based on SW-LDA. Using SW-LDA, one can obtain better dimensionality reduction performance than with the common LDA technique. When the dimensionality is reduced further, the SW-LDA method shows a clearer advantage over the conventional weighted LDA method. With SW-LDA, the speech recognition accuracy obtained with MFCC-based feature extraction is noticeably higher than that of common LDA and weighted LDA.
References
[1] K. Fukunaga, Introduction to Statistical Pattern Recognition, New York: Academic Press, 1990.
[2] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: recognition using class specific linear projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, July 1997.
[3] K. Etemad and R. Chellappa, "Discriminant analysis for recognition of human face images," Journal of the Optical Society of America A, vol. 14, no. 8, pp. 1724-1733, 1997.
[4] A. Martin, D. Charlet, and L. Mauuary, "Robust speech/non-speech detection using LDA applied to MFCC," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01), vol. 1, May 2001.
[5] Y. Gao and H. Erdogan, "Weighted pairwise scatter to improve linear discriminant analysis," in Proc. ICSLP, vol. 4, 2000, pp. 608-611.
[6] R. Lotlikar and R. Kothari, "Fractional-step dimensionality reduction," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 6, June 2000.
[7] J. Duchene and S. Leclercq, "An optimal transformation for discriminant and principal component analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, no. 6, November 1988.