The EURONOUNCE corpus of non-native Polish for ASR-based Pronunciation Tutoring System
Natalia Cylwik, Agnieszka Wagner, Grażyna Demenko
Adam Mickiewicz University, Institute of Linguistics, Department of Phonetics, Poznań, Poland
The increasing interest in teaching second language (L2) pronunciation and prosody
has coincided with the development of speech technology, leading to the creation of a
number of Computer-Assisted Pronunciation Training (CAPT) systems that try to
apply modern techniques such as speech analysis, speech recognition and automatic
error detection to non-native speech. These attempts were not always successful, as
the systems were trained either on the target language only or on the target language and
non-native speech by speakers with a different native language (L1) background. In the
EURONOUNCE project it was assumed that in order for a system to reliably process
non-native speech, detect errors and evaluate the learner, it must be trained and tested
on a “three-language” database:



Source language database
Target language database
Interlanguage (non-native) database

As a result, a complex multilingual speech database was created following several steps:

Text material collection
Speaker selection and recordings
Annotation
Linguistic and statistical analysis of the database
THE EURONOUNCE PROJECT
Full name:
Intelligent Language Tutoring System with multimodal
feedback functions
Partners:
Technical University in Dresden
Adam Mickiewicz University in Poznan
Slovak Academy of Sciences in Bratislava
Russian Academy of Sciences in St. Petersburg
Voice INTER connect GmbH in Dresden
Aim:
Creation of L2 pronunciation and prosody teaching software
(AZAR 3.0) with multimodal feedback functions for the
following language pairs:
L1 German (DE) - L2 Polish (PL)/Slovak (SK)/Russian (RU)/Czech (CZ)
L1 Polish/Slovak/Russian/Czech - L2 German
Feedback:
audio and visual modes based on:

audio (recording) and visual (oscillogram)
representation of the reference voice

learners’ speech analysis and HMM-based
automatic recognition (recording, oscillogram,
segmentation and annotation at the phone level
provided)

automatic error detection at the phone level

colored-scale evaluation of the learner’s
pronunciation at the phone level

visualization of the tongue and lips articulatory
movements

text and graphic tutorial on the L2 phonetics
Novelty:
Language switch – the software is directed to a clearly
defined user: a native speaker of a particular L1 trying to
acquire a particular L2.
MULTILINGUAL SPEECH DATABASES
Within the EURONOUNCE project a complex multilingual speech database was
created for all language pairs according to the following structure:
L1 database (Source-language database): Speech by 18 native speakers of the source
language serving as a reference for the assessment of non-native pronunciation.
L2 database (Target language database): 50 hours of the target language speech provided by
over 100 speakers for the general speech recognizer training.
Reference database: Target language speech by one male and one female professional
speaker (reference voices) for implementation in the software.
Non-native database: Target speech by 18 L1 speakers for the purpose of collecting
evidence of errors and for ASR and automatic error detection training.
Annotation system:

a modified version of Polish SAMPA

an extended SAMPA for German

a set of labels to mark intermediate phonemes (neither Polish, nor German)

diacritics to mark insertions, deletions and substitutions, e.g. “-”
LINGUISTIC ANALYSIS
At the phone level, three kinds of pronunciation errors are distinguished: substitutions,
insertions and deletions. The distribution of the different types of errors (Fig. 1) is similar
across proficiency groups, but the overall number of pronunciation errors (Fig. 2)
decreases with the student’s proficiency.
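As an illustration of the three error types, the sketch below aligns a canonical phone sequence with a learner's realized sequence using a minimum edit distance alignment and counts substitutions, insertions and deletions. It is not the procedure used in the project, where errors were marked by human labelers; the function name `classify_errors` and the short SAMPA-like example sequences are assumptions made for illustration only.

```python
from collections import Counter


def classify_errors(canonical, realized):
    """Count substitutions, insertions and deletions between two phone lists.

    Hypothetical helper: a plain Levenshtein alignment with unit costs,
    not the annotation method of the corpus.
    """
    n, m = len(canonical), len(realized)
    # cost[i][j] = edit distance between canonical[:i] and realized[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diff = 0 if canonical[i - 1] == realized[j - 1] else 1
            cost[i][j] = min(
                cost[i - 1][j - 1] + diff,  # match or substitution
                cost[i - 1][j] + 1,         # deletion of a canonical phone
                cost[i][j - 1] + 1,         # insertion of an extra phone
            )

    # Walk back through the table to recover one optimal alignment.
    errors = Counter()
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            diff = 0 if canonical[i - 1] == realized[j - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + diff:
                if diff:
                    errors["substitution"] += 1
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            errors["deletion"] += 1
            i -= 1
        else:
            errors["insertion"] += 1
            j -= 1
    return errors


# <ich> with canonical [x] realized as [C]: one substitution
print(classify_errors(["i", "x"], ["i", "C"]))
```

With unit costs several alignments can be equally cheap, so the counts for longer and more divergent sequences depend on the tie-breaking order used in the walk back.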
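Fig. 1: Distribution of error types by proficiency group (beginners, advanced); percentage axis from 0% to 80%. Data labels recovered from the chart: 2486 and 960, 647 and 287, 573 and 269.
Fig. 2: Share of pronunciation errors by proficiency level; recovered label: advanced (25,99%).
NON-NATIVE SPEECH DATABASE ON THE EXAMPLE OF L1 DE - L2 PL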
The non-native corpus comprises six tests:

Accent test – sentences containing Polish sounds and phonetic phenomena
difficult from the point of view of a German learner, e.g. Polish [x] in words
such as <ich> (Eng. ‘their’) which Germans might pronounce as [C].

Dialectological test – 124 sentences containing words with alternative
pronunciations, e.g. <bank> pronounced as /bank/ or /baNk/, and covering
Polish phonetic processes, e.g. assimilation in <bluzka> (Eng. ‘blouse’)
pronounced as /bluska/, and a full range of Polish phonemes in different
contexts and in different word and sentence positions.

Spontaneous speech test – 4 tasks such as finishing a sentence, e.g. ‘My hobby
is…’ and explaining the meaning of a proverb or idiomatic expression
commonly known both in Poland and Germany, e.g. Pol. ‘przemoknąć do
suchej nitki’, Germ. ‘keinen trockenen Faden (mehr) am Leibe haben’ (Eng.
‘to get soaked to the skin’).

Continuous speech test – three passages taken from stories by H. Ch.
Andersen and the Brothers Grimm.

Prosody test – 59 sentences which aim at collecting evidence of prosodic errors
such as erroneous stress placement or non-native-like vowel duration,
intonation patterns in neutral sentences vs. sentences with focus, in questions
vs. statements, commands and requests, etc.

Phondat corpus – 341 phonetically rich and balanced sentences for the
purposes of ASR training and testing and collecting mispronunciations of
consonant clusters.
Fig. 1 legend: deletion, substitution, insertion. Fig. 2 labels: elementary (42,67%), intermediate (31,35%).
The most frequent pronunciation errors concern phonemes not present in the student’s L1:

L1 DE - L2 PL: fricatives (/x/, /v/), affricates (/t^s/, /d^z/, /t^S/, /d^Z/),
palatal consonants (/n’/, /s’/, /z’/, /t^s’/, /d^z’/), the graphemes <ą> and <ę>, whose
phonetic mapping depends on the context, and the glides /w/ and /j/

L1 PL - L2 DE: diphthongs, vowels, and the fricatives /C/ and /h/

Fig. 3: Error categories with their percentage shares, as labeled in the chart:
vowel reduction /@/ (9%)
only closure (plosives, 2,6%)
voicing (1,6%)
vocalization /6/ (2,7%)
plosives (2,1%)
glides - /j/ (1,8%)
/l/ (<1%)
glides - /w/ (1,8%)
vowels (13,4%)
fricatives (12,1%)
affricates (8,7%)
palatalization (2,5%)
depalatalization (11,7%)
nasalization (1,8%)
<ą> <ę> (5,4%)
nasals (<1%)
no nasalization (<1%)
diphthong (7,1%)
devoicing (8,5%)
fortisation (<1%)
lenisation (5,4%)
SPEAKER SELECTION AND RECORDINGS
For each language pair, 36 speakers were recorded, with a balanced distribution of
proficiency level and gender, i.e. 12 speakers (6 males and 6 females) per level
(elementary, intermediate, advanced).
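The balanced design described above (3 proficiency levels × 2 genders × 6 speakers = 36 speakers per language pair) can be checked mechanically against a speaker metadata list. The following is a minimal sketch; the record fields (`id`, `level`, `gender`), the generated speaker IDs and the helper `is_balanced` are hypothetical and do not reflect the actual corpus metadata format.

```python
from collections import Counter
from itertools import product

LEVELS = ("elementary", "intermediate", "advanced")
GENDERS = ("male", "female")


def is_balanced(speakers, per_cell=6):
    """Return True if every (level, gender) cell holds exactly per_cell speakers."""
    counts = Counter((s["level"], s["gender"]) for s in speakers)
    expected_total = per_cell * len(LEVELS) * len(GENDERS)
    cells_ok = all(
        counts[(level, gender)] == per_cell
        for level, gender in product(LEVELS, GENDERS)
    )
    return cells_ok and len(speakers) == expected_total


# Hypothetical metadata for one language pair: 6 speakers per cell.
speakers = [
    {"id": f"{level[:3]}_{gender[0]}{k}", "level": level, "gender": gender}
    for level, gender in product(LEVELS, GENDERS)
    for k in range(1, 7)
]

print(len(speakers), is_balanced(speakers))
```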
ANNOTATION OF DE-PL SPEECH CORPUS
Steps:

Automatic segmentation and generation of canonical transcription

Manual verification of automatic transcription by a phonetician

Marking of the pronunciation errors by three labelers – native speakers of the
target language (here Polish) based on subjective evaluation.

Verification of the source language (here German) by a native speaker.
Fig. 4: The mean percentage share of erroneous realizations of a given vowel type (tense long, lax long, central, lax short), in total for all proficiency levels. Chart legend: A, B, C.