The EURONOUNCE corpus of non-native Polish for ASR-based Pronunciation Tutoring System

Natalia Cylwik, Agnieszka Wagner, Grażyna Demenko
Adam Mickiewicz University, Institute of Linguistics, Department of Phonetics, Poznań, Poland

The increasing interest in teaching second language (L2) pronunciation and prosody has coincided with the development of speech technology, leading to the creation of a number of Computer-Assisted Pronunciation Training (CAPT) systems that try to apply modern techniques such as speech analysis, speech recognition and automatic error detection to non-native speech. These attempts were not always successful, as the systems were trained either on the target language only or on the target language plus non-native speech from speakers with a different native language (L1) background.

In the EURONOUNCE project it was assumed that in order for a system to reliably process non-native speech, detect errors and evaluate the learner, it must be trained and tested on a “three-language” database:
- Source language database
- Target language database
- Interlanguage (non-native) database

As a result, a complex multilingual speech database was created following several steps:
1. Text material collection
2. Speaker selection and recordings
3. Annotation
4. Linguistic and statistical analysis of the database

THE EURONOUNCE PROJECT

Full name: Intelligent Language Tutoring System with multimodal feedback functions

Partners:
- Technical University in Dresden
- Adam Mickiewicz University in Poznań
- Slovak Academy of Sciences in Bratislava
- Russian Academy of Sciences in St. Petersburg
- Voice INTER connect GmbH in Dresden

Aim: Creation of L2 pronunciation and prosody teaching software (AZAR 3.0) with multimodal feedback functions for the following language pairs:
- L1 German (DE) - L2 Polish (PL) / Slovak (SK) / Russian (RU) / Czech (CZ)
- L1 Polish / Slovak / Russian / Czech - L2 German

Feedback and novelty:
- audio and visual modes based on audio (recording) and visual (oscillogram) representation of the reference voice
- analysis of the learner’s speech and HMM-based automatic recognition (recording, oscillogram, segmentation and annotation at the phone level provided)
- automatic error detection at the phone level
- color-scale evaluation of the learner’s pronunciation at the phone level
- visualization of the articulatory movements of the tongue and lips
- text and graphic tutorial on L2 phonetics
- language switch: the software is directed at a clearly defined user, a native speaker of a particular L1 trying to acquire a particular L2

MULTILINGUAL SPEECH DATABASES

Within the EURONOUNCE project a complex multilingual speech database was created for all language pairs according to the following structure:
- L1 database (source language database): speech by 18 native speakers of the source language, serving as a reference for the assessment of non-native pronunciation.
- L2 database (target language database): 50 hours of target language speech provided by over 100 speakers for training the general speech recognizer.
- Reference database: target language speech by one male and one female professional speaker (reference voices) for implementation in the software.
- Non-native database: target language speech by 18 L1 speakers for collecting evidence of errors and for training ASR and automatic error detection.

Annotation system:
- a modified version of Polish SAMPA
- an extended SAMPA for German
- a set of labels to mark intermediate phonemes (neither Polish nor German)
- diacritics to mark insertions, deletions and substitutions, e.g. “-”

NON-NATIVE SPEECH DATABASE ON THE EXAMPLE OF L1 DE - L2 PL

The non-native corpus comprises six tests:
- Accent test: sentences containing Polish sounds and phonetic phenomena that are difficult for a German learner, e.g. Polish [x] in words such as <ich> (Eng. ‘their’), which Germans might pronounce as [C].
- Dialectological test: 124 sentences containing words with alternative pronunciations, e.g. <bank> pronounced as /bank/ or /baNk/, and covering Polish phonetic processes, e.g. assimilation in <bluzka> (Eng. ‘blouse’) pronounced as /bluska/, as well as a full range of Polish phonemes in different contexts and in different word and sentence positions.
- Spontaneous speech test: 4 tasks such as finishing a sentence, e.g. ‘My hobby is…’, and explaining the meaning of a proverb or idiomatic expression commonly known in both Poland and Germany, e.g. Pol. ‘przemoknąć do suchej nitki’, Germ. ‘keinen trockenen Faden (mehr) am Leibe haben’ (Eng. ‘to get soaked to the skin’).
- Continuous speech test: three passages taken from stories by H. Ch. Andersen and the Brothers Grimm.
- Prosody test: 59 sentences which aim at collecting evidence of prosodic errors such as erroneous stress placement, non-native-like vowel duration, or non-native intonation patterns in neutral sentences vs. sentences with focus, in questions vs. statements, commands and requests, etc.
- Phondat corpus: 341 phonetically rich and balanced sentences for ASR training and testing and for collecting mispronunciations of consonant clusters.

SPEAKER SELECTION AND RECORDINGS

36 speakers per language pair were recorded, with a balanced distribution of proficiency level and gender, i.e. 12 speakers (6 males and 6 females) per level (elementary, intermediate, advanced).

ANNOTATION OF DE-PL SPEECH CORPUS

Steps:
1. Automatic segmentation and generation of canonical transcription.
2. Manual verification of the automatic transcription by a phonetician.
3. Marking of the pronunciation errors by three labelers, native speakers of the target language (here Polish), based on subjective evaluation.
4. Verification of the source language (here German) by a native speaker.

LINGUISTIC ANALYSIS

At the phone level three kinds of pronunciation errors are distinguished: substitutions, insertions and deletions. The distribution of the different error types (Fig. 1) is similar across proficiency groups, but the overall number of pronunciation errors (Fig. 2) decreases as the student’s proficiency increases.

[Fig. 1: Distribution of error types (deletion, substitution, insertion) for beginners and advanced learners. Data labels as extracted (beginners / advanced): 2486 / 960, 647 / 287, 573 / 269; the assignment of these values to error types is not recoverable.]

[Fig. 2: Share of pronunciation errors by proficiency level: elementary 42.67%, intermediate 31.35%, advanced 25.99%.]

The most frequent pronunciation errors concern phonemes not present in the student’s L1:
- L1 DE - L2 PL: fricatives (/x/, /v/), affricates (/t^s/, /d^z/, /t^S/, /d^Z/) and palatal consonants (/n’/, /s’/, /z’/, /t^s’/, /d^z’/), the graphemes <ą> and <ę>, whose phonetic mapping depends on the context, and the glides /w/ and /j/.
- L1 PL - L2 DE: diphthongs, vowels, and the fricatives /C/ and /h/.

[Fig. 3: Share of error categories: vowels 13.4%; fricatives 12.1%; depalatalization 11.7%; vowel reduction /@/ 9%; affricates 8.7%; devoicing 8.5%; diphthong 7.1%; <ą> <ę> 5.4%; lenisation 5.4%; vocalization /6/ 2.7%; only closure (plosives) 2.6%; palatalization 2.5%; plosives 2.1%; glides /j/ 1.8%; glides /w/ 1.8%; nasalization 1.8%; voicing 1.6%; /l/, nasals, no nasalization and fortisation <1% each.]

[Fig. 4: The percentage share of the mean number of erroneous realizations of a given vowel type, in total for all proficiency levels. Series: A, B, C; vowel types: tense long, lax long, central, lax short. Data labels as extracted: 30.3, 7.3, 63.9, 75.4, 86.1, 89.1, 16.7, 62.8, 22.9, 85.1, 83.8, 90.5; their assignment to series and vowel types is not recoverable.]
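
The three phone-level error types distinguished in the linguistic analysis (substitutions, insertions, deletions) can be illustrated by comparing a canonical transcription with the phones a learner actually produced. The sketch below is only an illustration under stated assumptions: in the project the errors were marked by human labelers, and the function name `classify_phone_errors`, the use of Python's `difflib.SequenceMatcher` as the aligner, and the second and third example realizations are hypothetical and not taken from the corpus. Only the first example, Polish <ich> realized with [C] instead of [x], comes from the accent test description.

```python
from collections import Counter
from difflib import SequenceMatcher


def classify_phone_errors(canonical, realized):
    """Count substitutions, insertions and deletions between two phone lists.

    canonical: phones of the expected (canonical) transcription, e.g. in SAMPA.
    realized:  phones the learner actually produced.
    Assumption: a generic sequence alignment stands in for manual labeling.
    """
    counts = Counter(substitution=0, insertion=0, deletion=0)
    matcher = SequenceMatcher(a=canonical, b=realized, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        n_canonical, n_realized = i2 - i1, j2 - j1
        if tag == "replace":
            # Pair phones one-to-one as substitutions; any surplus on either
            # side is counted as a deletion or an insertion.
            paired = min(n_canonical, n_realized)
            counts["substitution"] += paired
            counts["deletion"] += n_canonical - paired
            counts["insertion"] += n_realized - paired
        elif tag == "delete":
            counts["deletion"] += n_canonical
        elif tag == "insert":
            counts["insertion"] += n_realized
    return dict(counts)


# <ich> /ix/ realized as [iC]: one substitution (example from the accent test).
ich = classify_phone_errors(["i", "x"], ["i", "C"])

# Hypothetical realizations of <bluzka> /bluska/, for illustration only.
dropped_k = classify_phone_errors(list("bluska"), list("blusa"))
added_schwa = classify_phone_errors(list("bluska"), ["b", "l", "u", "s", "k", "@", "a"])

print(ich)          # {'substitution': 1, 'insertion': 0, 'deletion': 0}
print(dropped_k)    # {'substitution': 0, 'insertion': 0, 'deletion': 1}
print(added_schwa)  # {'substitution': 0, 'insertion': 1, 'deletion': 0}
```

Summing such counts per speaker group would give the kind of totals compared across proficiency levels, although an automatic alignment of this sort cannot capture intermediate phonemes, which the annotation system marks with dedicated labels.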