Text-independent Speaker Verification Based on High

University of Fribourg, Switzerland
Department of Computer Science
TEXT-INDEPENDENT SPEAKER VERIFICATION BASED ON
HIGH-LEVEL INFORMATION EXTRACTED WITH
DATA-DRIVEN METHODS
Asmaa EL HANNANI
The focus of this thesis is speaker verification, which aims at authenticating the identity
claimed by the speaker. The speech signal conveys roughly two kinds of information about the
speaker's identity. The first set of information reflects the spectral properties of speech
(low-level), which are related to the physical structure of the vocal apparatus. These parameters
have been used since the beginning of research in automatic speaker recognition. The second set of
information reflects the behavioural traits (high-level) such as prosody, phonetic information,
pronunciation, idiolectal word usage, conversational patterns, topics of conversations, etc.
Recently, various studies have shown that high-level features convey speaker-specific
information and can be added to the low-level features in order to increase the robustness of
the speaker verification system. Usually, these features are extracted by analyzing streams
produced by phonetic speech recognition systems. Two major problems that arise when
phone-based systems are developed are the possible mismatch between the
development and evaluation data and the lack of transcribed databases. To solve these
two problems, we propose in this thesis to use a data-driven approach instead of the phonetic one
to extract such high-level features. In this way, the availability of corpora is much less of an
issue and the training corpus can be chosen to match the working conditions as much as
possible.
The major contribution of this thesis is the use of speech segmentation, grounded on data-driven
techniques that require neither phonetic nor orthographic transcriptions of the speech
data, to extract pseudo-phonetic and high-level information for text-independent speaker
verification. Our data-driven approach is based on Automatic Language Independent Speech
Processing (ALISP) tools. We have shown throughout this thesis that units automatically
acquired from the speech data can be used like phonemes at different levels in order to
extract complementary types of information and to improve the speaker verification accuracy.
All results reported in this thesis have been validated on the NIST Speaker Recognition
Evaluation data.
Jury:
Prof. Béat Hirsbrunner, University of Fribourg, Switzerland (Jury President)
Prof. Rolf Ingold, University of Fribourg, Switzerland (PhD Director)
Dr Dijana Petrovska-Delacrétaz, INT, Evry, France (PhD Co-director)
Prof. Régine André-Obrecht, Université Paul Sabatier, France (External Expert)
Dr Frédéric Bimbot, IRISA, France (External Expert)