University of Fribourg, Switzerland
Department of Computer Science

TEXT-INDEPENDENT SPEAKER VERIFICATION BASED ON HIGH-LEVEL INFORMATION EXTRACTED WITH DATA-DRIVEN METHODS

Asmaa EL HANNANI

The focus of this thesis is speaker verification, which aims at authenticating the identity claimed by a speaker. The speech signal conveys roughly two kinds of information about the speaker's identity. The first kind reflects the spectral properties of speech (low-level), which are related to the physical structure of the vocal apparatus. These parameters have been used since the beginning of research in automatic speaker recognition. The second kind reflects behavioural traits (high-level) such as prosody, phonetic information, pronunciation, idiolectal word usage, conversational patterns, topics of conversation, etc. Recently, various studies have shown that high-level features convey speaker-specific information and can be combined with low-level features to increase the robustness of a speaker verification system. Usually, these features are extracted by analyzing streams produced by phonetic speech recognition systems. Two major problems arise when phone-based systems are developed: possible mismatches between the development and evaluation data, and the lack of transcribed databases. To address these two problems, we propose in this thesis to use a data-driven approach instead of the phonetic one to extract such high-level features. In this way, the availability of corpora is much less of an issue, and the training corpus can be chosen to match the working conditions as closely as possible. The major contribution of this thesis is the use of speech segmentation, grounded on data-driven techniques that require neither phonetic nor orthographic transcriptions of the speech data, to extract pseudo-phonetic and high-level information for text-independent speaker verification.
Our data-driven approach is based on Automatic Language Independent Speech Processing (ALISP) tools. We have shown throughout this thesis that units automatically acquired from the speech data can be used like phonemes at different levels to extract complementary types of information and to improve speaker verification accuracy. All results reported in this thesis have been validated on the NIST Speaker Recognition Evaluation data.

Jury:
- Prof. Béat Hirsbrunner, University of Fribourg, Switzerland (Jury President)
- Prof. Rolf Ingold, University of Fribourg, Switzerland (PhD Director)
- Dr Dijana Petrovska-Delacrétaz, INT, Evry, France (PhD Co-director)
- Prof. Régine André-Obrecht, Université Paul Sabatier, France (External Expert)
- Dr Frédéric Bimbot, IRISA, France (External Expert)