PERFORMANCE EVALUATION OF AUTOMATIC SPEAKER RECOGNITION SCHEMES
D. Venugopal and V.V.S. Sarma
Indian Institute of Science
Bangalore 560012
ABSTRACT
A mathematical formulation of an automatic speaker verification scheme as a two-class pattern recognition problem is presented. Expressions for the expected values and the variance of the design-set and the test-set error rates are derived. The bound on the performance of an automatic speaker identification system as a cascade of independent verification systems is derived. The implications of these results in the design of an automatic speaker recognition system are discussed.
I INTRODUCTION
The problem of performance evaluation of any automatic speaker verification system (ASVS) is yet to be satisfactorily solved. In the general pattern recognition literature, performance estimation has received considerable notice recently, and the importance of these results in the design of an ASVS is discussed in a recent note of the authors [1]. In this paper, an ASVS is analysed as a 2-class pattern recognition problem to bring out explicitly the effect of the various parameters on the performance estimation, following the mathematical model of a pattern verification system provided by Dixon and Bourdeau [2]. The expected error rates for both the design-set and the test-set are derived as a function of the number of samples per speaker (N), the number of features (L), the number of customers (M), the number of design impostors (K) and the Mahalanobis distance ($\Delta$) between the classes under Gaussian assumptions. The variance of the design-set error rate is derived, bringing out the importance of choosing a sufficient number of impostors at the system design stage. The expected error rates for an automatic speaker identification system (ASIS) as a cascade of M independent ASVS are derived. The importance of these results in the design of an automatic speaker recognition system is pointed out.

II MATHEMATICAL MODEL FOR ASVS
In an ASVS, a given speaker, not necessarily belonging to the system, utters a predetermined phrase. He also presents a label claiming that he is a particular "customer" belonging to the system. In the system, a predetermined set of features, possibly depending on the label entered, is extracted from the utterance and the speaker is either accepted or rejected.

Let the ASVS be designed for a set of M known customers $S = \{S_1, \ldots, S_M\}$ and a single alien group, $S_0$, of all unknown speakers (rest of the world). This notion of an alien group of speakers distinguishes an ASVS from an ASIS and is essential, as there is always a chance of a person not belonging to the set S trying to impersonate one of S. Corresponding to each member of S, there is a label $L_j$ ($j = 1, \ldots, M$). If any speaker wants to be verified under the label $L_j$, then we define two classes: the class $C_j$ consisting of the speaker $S_j$ and the impostor class $\bar{C}_j$ consisting of the remaining (M-1) speakers and the alien group $S_0$.

$C_j = \{S_j\}; \quad \bar{C}_j = \{S_0, S_i, \ i = 1, \ldots, M, \ i \neq j\}, \quad j = 1, \ldots, M$

The system will, on the basis of the feature vector X of dimension L, assign the speakers to $C_j$ or $\bar{C}_j$. The accept/reject rule using the optimal Bayesian classifier is to accept the claim of a particular speaker as valid if

$P(C_j/L_j, X) > P(\bar{C}_j/L_j, X)$    (1)

and reject it otherwise. The a posteriori probabilities are given by

$P(C_j/L_j, X) = p(X/L_j, C_j)\, p(L_j/C_j)\, P(C_j) / p(L_j, X)$    (2a)
$P(\bar{C}_j/L_j, X) = p(X/L_j, \bar{C}_j)\, p(L_j/\bar{C}_j)\, P(\bar{C}_j) / p(L_j, X)$    (2b)

where the probabilities have the usual meanings: $p(L_j/C_j)$ is the probability of speaker $S_j$ wanting to be verified under his label $L_j$, and $p(L_j/\bar{C}_j)$ is the probability of an "impostor" trying to present label $L_j$. While the actual values of the various probabilities of (2a) and (2b) depend upon the conditions in a particular environment, we may assume in most cases $p(L_j/C_j) > p(L_j/\bar{C}_j)$ and $P(\bar{C}_j) > P(C_j)$, assuming equal a priori probabilities for all speakers. A reasonable assumption considering the above inequalities is $p(L_j/C_j)P(C_j) = p(L_j/\bar{C}_j)P(\bar{C}_j)$. Again, it may be postulated that in the presence of the knowledge of class $C_j$ the feature vector is independent of the label $L_j$, i.e. $p(X/L_j, C_j) = p(X/C_j)$. The decision rule can now be rewritten as: accept if

$p(X/C_j) > p(X/\bar{C}_j)$    (3)

and reject otherwise.

The class-conditional densities in (3) are given by

$p(X/C_j) = p(X/S_j)$
$p(X/\bar{C}_j) = \sum_{i=0, i \neq j}^{M} p(X/S_i) P(S_i), \quad j = 1, \ldots, M$    (4)

where $p(X/S_i)$, $i = 0, 1, \ldots, M$ are speaker-conditional densities of the feature vector X. If the distributions $p(X/S_i)$, $i = 0, \ldots, M$ are completely known, then the error rate can be calculated exactly. Otherwise $p(X/S_i)$, $i = 1, \ldots, M$ are to be estimated from N labelled samples of each speaker of set S, whereas $p(X/S_0)$ is to be estimated from a set of "impostor references" (of speakers of $S_0$). Just as the number of training samples per class is finite, the number of impostors (K) that can be considered to represent the group $S_0$ is also finite.
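The accept/reject rule (3) with the mixture density (4) can be illustrated by a minimal sketch. It assumes Gaussian speaker-conditional densities with a common known covariance, and it renormalises the prior weights of the impostor mixture so that the mixture integrates to one; the function names and the toy numbers are hypothetical and are not taken from the paper.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate normal N(mean, cov) evaluated at x."""
    x = np.asarray(x, dtype=float)
    diff = x - np.asarray(mean, dtype=float)
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + maha)

def accept_claim(x, claimed, means, cov, priors):
    """Rule (3): accept the claim iff p(X/C_j) > p(X/C-bar_j).

    means[0] stands for the alien group S_0, means[1..M] for the customers.
    The impostor density is the prior-weighted mixture of all speakers other
    than the claimed one (weights renormalised: an assumption of this sketch).
    """
    log_target = gaussian_logpdf(x, means[claimed], cov)
    others = [i for i in range(len(means)) if i != claimed]
    weights = np.array([priors[i] for i in others], dtype=float)
    weights /= weights.sum()
    log_terms = np.array([gaussian_logpdf(x, means[i], cov) for i in others])
    log_impostor = np.logaddexp.reduce(log_terms + np.log(weights))
    return bool(log_target > log_impostor)

# Toy example: alien group at the origin, two customers (M = 2), L = 2 features.
means = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
cov = np.eye(2)
priors = [0.5, 0.25, 0.25]
print(accept_claim([3.0, 0.0], 1, means, cov, priors))  # utterance near customer 1
print(accept_claim([0.0, 0.0], 1, means, cov, priors))  # utterance near the alien group
```

Working in the log domain avoids underflow when L is large; the comparison itself is the same as in (3).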
Nature of the Class-Conditional Densities:
Assumption: $p(X/C_j) \sim N(\mu_j, \Sigma_j)$ and $p(X/\bar{C}_j) \sim N(\bar{\mu}_j, \bar{\Sigma}_j)$, where $\Sigma_j = \Sigma$ and $\bar{\Sigma}_j = p\Sigma$, with $p = M+K-1$, are the known covariance matrices of the two classes $C_j$ and $\bar{C}_j$, and the means $\mu_j$ and $\bar{\mu}_j$ are to be estimated from N design samples of class $C_j$ and pN design samples of class $\bar{C}_j$.
Remark: It may not be unreasonable to assume $p(X/S_j) = p(X/C_j)$ to be Gaussian. Then $p(X/\bar{C}_j)$ is a finite mixture of Gaussians. This will be a multimodal distribution for small M and K. Again, it may not be unreasonable, for large M and K, to fit a Gaussian distribution to samples of class $\bar{C}_j$.
Classifier: For notational convenience, we denote $C_j$ by $C_1$ and $\bar{C}_j$ by $C_2$, with $p(X/C_1) \sim N(\mu_1, \Sigma)$ and $p(X/C_2) \sim N(\mu_2, p\Sigma)$. For this case of unequal means and covariance matrices, the minimum probability of error classifier is the one using a quadratic discriminant function. In this paper, the minimax linear discriminant with equal error rate is used for further analysis. We define a linear discriminant
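$d = \Sigma^{*-1}(\hat{\mu}_1 - \hat{\mu}_2)$, where $\Sigma^* = t\Sigma_1 + (1-t)\Sigma_2$    (5)
with $\Sigma_1 = \Sigma$ and $\Sigma_2 = p\Sigma$, where t, $0 \le t \le 1$, is a parameter of our choice, and $\hat{\mu}_1 = (1/N)\sum_{i} X_{1i}$ and $\hat{\mu}_2 = (1/pN)\sum_{i} X_{2i}$, where $X_{ji}$ is the ith labelled sample of class $C_j$. For equal error rate the threshold $\theta$ is given by

$\theta = d'(p^{1/2}\hat{\mu}_1 + \hat{\mu}_2)/(p^{1/2} + 1)$    (6)

Therefore, a sample X is assigned to class $C_1$ if $d'X \ge \theta$, i.e.

$d'X - \theta \ge 0$    (7)

and to class $C_2$ otherwise.

III TEST AND DESIGN-SET ERROR RATES FOR AN ASVS
The ASVS may be tested either by new utterances of the speakers belonging to S and $S_0$ (test set) or by the sample utterances used for designing the system (design set).
Test-set error rate: The expected test-set error rate ($\epsilon_T$) may be written from (7) as

$\epsilon_T = \mathrm{pr}[d'X < \theta]$    (8)

where X is the feature vector corresponding to an arbitrary new utterance from class $C_1$.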
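The difference between the test-set error rate and the design-set error rate can be illustrated with a small Monte Carlo sketch of the classifier (5)-(6). It assumes class covariances $\Sigma$ and $p\Sigma$ as read from the text, takes $\Sigma$ as the identity, and counts only class $C_1$ errors; the function names and parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def equal_error_discriminant(x1, x2, sigma, p):
    """Direction d and equal-error threshold theta from design samples.

    x1: (N, L) samples of class C1; x2: (p*N, L) samples of class C2.
    Sigma* in (5) is a positive multiple of sigma under the assumed
    covariances, and a positive scale on d does not change the decisions.
    """
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    d = np.linalg.solve(sigma, m1 - m2)
    root_p = np.sqrt(p)
    theta = d @ (root_p * m1 + m2) / (root_p + 1.0)
    return d, theta

def class1_error_rates(n, dim, p, delta, trials=300, n_test=500, seed=0):
    """Average design-set and test-set error rates for class C1."""
    rng = np.random.default_rng(seed)
    sigma = np.eye(dim)
    mu1 = np.zeros(dim)
    mu1[0] = delta            # class means separated by delta along one axis
    mu2 = np.zeros(dim)
    design, test = [], []
    for _ in range(trials):
        x1 = rng.normal(size=(n, dim)) + mu1
        x2 = np.sqrt(p) * rng.normal(size=(p * n, dim)) + mu2
        d, theta = equal_error_discriminant(x1, x2, sigma, p)
        design.append(np.mean(x1 @ d < theta))      # reuse design samples
        x_new = rng.normal(size=(n_test, dim)) + mu1
        test.append(np.mean(x_new @ d < theta))     # fresh utterances
    return float(np.mean(design)), float(np.mean(test))

eps_design, eps_test = class1_error_rates(n=6, dim=8, p=4, delta=2.0)
print(eps_design, eps_test)
```

With N smaller than L, as in this call, the design-set figure comes out lower than the test-set figure, which is the small N/L bias the section goes on to quantify.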
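Proposition 1: $\epsilon_T$ in (8) may be expressed as the probability of the ratio of two noncentral $\chi^2$ variates $\omega_1$ and $\omega_2$ being greater than the quantity $(1-\rho_1)/(1+\rho_1)$:

$\epsilon_T = \mathrm{pr}[\omega_1/\omega_2 > (1-\rho_1)/(1+\rho_1)]$    (9)

where $\omega_1$ and $\omega_2$ are distributed as $\chi^2(L, \lambda_1)$ and $\chi^2(L, \lambda_2)$.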
X1= [2(l+P1)]
pNjq3+l)- [1+2+p+l)2NiJ
2
x= [2(1-F1)jN
(
+lYi-[l+2(p+i) 2N]} z2
$\Delta^2$ = Mahalanobis squared distance between the two populations = $(\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 - \mu_2)$
—
(3/2..j) (p+l) (l+2+(p÷l) 2N)
Proof: (The proof follows that of Moran [4].) We define two random vectors u and v such that
$u = (pN/(p+1))^{1/2}\, \Sigma^{-1/2}(\hat{\mu}_1 - \hat{\mu}_2)$ and
'
+(+l) 2N)T
2u1+u2
where
Then $\epsilon_T$ can be written as
$\epsilon_T = \mathrm{pr}(u'v > 0) = \mathrm{pr}[(u+v)'(u+v) - (u-v)'(u-v) > 0]$.
$(u+v)$ and $(u-v)$ are distributed independently with dispersion matrices $2(1+\rho_1)I$ and $2(1-\rho_1)I$, where $\rho_1$ is the correlation coefficient between corresponding pairs of elements of u and v, and I is the L x L unit matrix. Thus, with $\omega_1 = (u+v)'(u+v)/[2(1+\rho_1)]$ and $\omega_2 = (u-v)'(u-v)/[2(1-\rho_1)]$, eqn. (9) follows. Detailed proof is given in reference 5.
Proposition 2: It is also possible to express the expected error rate $\epsilon_T$ in a closed form expression
$\epsilon_T = Q(L, \lambda_1, \lambda_2, \rho_1)$    (10)
where
0+02
P1)1-c(e1,o2)-exp(- 2
(eie2) { B;1p)(jL+in,
'm
m=l4L12
e2(1- 1)c(e,)
Here $c(\theta_1, \theta_2)$ is the circular coverage function, $I_m(z)$ is the modified Bessel function of the first kind and $B(p,q)$ the incomplete $\beta$-function. The upper sign is for $m \ge 0$ and the lower sign for $m < 0$.
Design-set error rate: The expected design-set error rate $\epsilon_D$ may be written from eqn. (7) as

$\epsilon_D = \mathrm{pr}[d'X_{1i} < \theta]$    (12)

where $X_{1i}$ is the feature vector corresponding to an arbitrary utterance from the design set of class $C_1$.
Proposition 3: $\epsilon_D$ in (12) may be expressed as the probability of the ratio of two noncentral $\chi^2$ variates $\omega_3$ and $\omega_4$ being greater than the quantity $(1-\rho_2)/(1+\rho_2)$:

$\epsilon_D = \mathrm{pr}[\omega_3/\omega_4 > (1-\rho_2)/(1+\rho_2)]$    (13)

where $\omega_3$ and $\omega_4$ are distributed as $\chi^2(L, \lambda_3)$ and $\chi^2(L, \lambda_4)$.
Proof: The proof is similar to that of proposition 1.
Proposition 4: It is also possible to express the expected error rate $\epsilon_D$ in a closed form expression

$\epsilon_D = Q(L, \lambda_3, \lambda_4, \rho_2)$    (14)

where $Q(\cdot)$ is the same function as defined in (11), with $\lambda_1$, $\lambda_2$ and $\rho_1$ being replaced by $\lambda_3$, $\lambda_4$ and $\rho_2$ respectively.
Proposition 5: The variance $\sigma^2$ of the random variable $\epsilon_D$ is given by

$\sigma^2 = \epsilon_D(1 - \epsilon_D)/[(p+1)N]$    (15)

Proof: The proof follows that of Foley [6] and is given in reference 5.

Fig. 1 - Expected Error Rates as Function of N/L (curves: test set, design set)

In Fig. 1 and 2 the values of $\epsilon_T$ and $\epsilon_D$ are plotted as functions of N/L and of population size. Fig. 1 gives the nature of the biases that creep into the estimates for small N/L. The test-set error rate is pessimistically biased and the design-set error rate is optimistically biased, and neither is reliable. However, it may be seen from (15) that the variance of the design-set error rate is inversely proportional to the number of recorded sample utterances per speaker and to the total number of speakers including the design impostors. Fig. 2 and (15) show that for a small number of customers (M), if a sufficiently large number of design impostors (K) is not used, both $\epsilon_T$ and $\epsilon_D$ are optimistically biased. Fig. 2 shows that for an ASVS the expected error rates become independent of population size for large populations. For a large M, a large K is not essential.
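The dependence of the variance in (15) on the number of design impostors can be made concrete with a short numerical sketch. It assumes $p = M + K - 1$, so that $(p+1)N$ is the total number of design samples; the function name and the values used are hypothetical.

```python
def design_error_variance(eps_d, n_samples, m_customers, k_impostors):
    """Variance of the design-set error rate, following the form of (15)."""
    p = m_customers + k_impostors - 1
    return eps_d * (1.0 - eps_d) / ((p + 1) * n_samples)

# Two customers, ten utterances per speaker, design-set error rate of 10%.
var_no_impostors = design_error_variance(0.1, 10, 2, 0)     # (p+1)N = 20
var_with_impostors = design_error_variance(0.1, 10, 2, 18)  # (p+1)N = 200
print(var_no_impostors, var_with_impostors)
```

Under these assumptions, adding 18 design impostors to a two-customer system reduces the variance tenfold, the same reduction that a tenfold increase in the number of customers would give.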
Fig. 2 - Expected Error Rates as Functions of Population Size
IV PERFORMANCE EVALUATION OF ASIS
An ASIS can be realized as a cascade of (M-1) or M ASVS's as shown in Fig. 3. All the ASVSs are assumed to have identical performance. If there is a reject option at the Mth stage also, the possibility of (M+1) classes, corresponding to M customers and an alien class (as not belonging to the system), can be introduced in an ASIS as well. On the other hand, the decision can be terminated at the (M-1)th stage and speaker $S_M$ can be accepted.
Let p be the probability of error and q the probability of correct decision of an ASVS. Assuming that the jth speaker has tested the system, we can draw the decision tree as shown in Fig. 4. If $D_i$, $i = 1, \ldots, M$ is the decision taken by the system at the ith stage that the speaker is $S_i$, then we can write the probability of correct decision

$P_c = \sum_{j=1}^{M} P(S_j, D_j) = \sum_{j=1}^{M} P(D_j/S_j) P(S_j)$    (16)

Assuming equal a priori probabilities for all speakers belonging to S, and from Fig. 4, we can write

$P_c = (1/M) \sum_{j=1}^{M} q^j = (q/M)(1 - q^M)/(1 - q)$    (17)

Equation (17) shows the effect of population size on the performance of an ASIS, thus corroborating Doddington's results [7].

Fig. 3 - ASIS as a Cascade of ASVS
Fig. 4 - Decision Tree for Speaker j
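The effect of population size on the cascade can be checked numerically. The sketch below assumes that speaker j is correctly identified only if each of the first j stages decides correctly, which gives the term $q^j$, and that all M speakers are equally likely; the function names are hypothetical.

```python
def asis_correct_probability(q, m):
    """Direct sum: P_c = (1/M) * sum over j = 1..M of q**j."""
    return sum(q ** j for j in range(1, m + 1)) / m

def asis_correct_closed_form(q, m):
    """Geometric-series form: P_c = (q/M) * (1 - q**M) / (1 - q)."""
    if q == 1.0:
        return 1.0  # every stage is always correct
    return (q / m) * (1.0 - q ** m) / (1.0 - q)

# A verification stage that is right 99% of the time, for growing populations.
for m in (2, 10, 50):
    print(m, asis_correct_probability(0.99, m), asis_correct_closed_form(0.99, m))
```

For fixed q below one, the probability of correct identification falls as M grows, which is the population-size effect described in the text.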
V DISCUSSION OF RESULTS
The design of an ASVS proceeds in three steps: (i) data base preparation, (ii) feature selection and extraction and (iii) statistical classification and performance evaluation. All the stages are, of course, interrelated. The results of section III provide information on: (i) Preparation of the data set (number of design sample utterances per speaker (N)). (ii) The dimension of the feature vector (L). If the N/L ratio is small there will be wide disparities in the performance estimates that will be obtained if the system is tested on the design set or on an independent test set. (iii) The discriminating ability of a feature depends on the appropriate distance between the classes concerned. If the underlying distributions are Gaussian, the distance between classes itself provides an estimate of error. It should be kept in mind, however, that the distance estimate from a finite number of samples per class is a biased estimate of the true distance between the populations. (iv) The error rates $\epsilon_T$ and $\epsilon_D$ are functions of the population size (Fig. 2). Even if an ASVS is to be designed for a small number of customers M, a sufficiently large number K of impostors should be considered in the design set so as to make the performance estimates reliable. For large M this may not be so important. The design of a verification system is thus equally complex for small or large M because of the presence of the alien class.

REFERENCES
1. V.V.S. Sarma and D. Venugopal, "Performance evaluation of automatic speaker verification systems", IEEE Trans. Acoustics, Speech, Signal Processing (to be published).
2. R.C. Dixon and P.E. Bourdeau, "Mathematical model for pattern verification", IBM J. of R and D, Vol. 13, pp 717-721, Nov. 1969.
3. T.W. Anderson and R.R. Bahadur, "Classification into two multivariate normal distributions with different covariance matrices", Ann. of Math. Statist., Vol. 33, pp 420-431, June 1962.
4. M.A. Moran, "On the expectation of errors of allocation associated with a linear discriminant function", Biometrika, Vol. 62, pp 141-148, Apr. 1975.
5. V.V.S. Sarma and D. Venugopal, "Statistical problems in performance assessment of Automatic Speaker Recognition Systems", CI? Report No. 61, Dept. of ECE, Indian Inst. of Science, India, Jan. 197?.
6. D.H. Foley, "Considerations of sample and feature size", IEEE Trans. Information Theory, Vol. IT-18, pp 618-626, Sept. 1972.
7. A.E. Rosenberg, "Automatic speaker verification: a review", Proceedings IEEE, Vol. 64, pp 475-487, Apr. 1976.