An Audiovisual Feedback System for Acquiring L2 Pronunciation and L2 Prosody
Grażyna Demenko1, Agnieszka Wagner1, Natalia Cylwik1 and Oliver Jokisch2
(1) Adam Mickiewicz University, Institute of Linguistics, Department of Phonetics, Poznań, Poland
(2) TU Dresden, Laboratory of Acoustics and Speech Communication, Dresden, Germany
The increasing application of speech technology to foreign language learning, together with the acknowledgement that proper pronunciation and prosody are as important for non-native speech intelligibility as other linguistic skills, has led to the development of a new discipline known as Computer-Assisted Pronunciation Training (CAPT).
Fig.1
Curriculum
The exercises are devised to test and practice prosody in smaller and larger syntactic units (a sketch of how such exercises could be encoded as data follows this list). At the word level, the suprasegmental exercises are devoted mainly to:
• perception and production of regular and irregular lexical stress and foot structure, as well as types of nuclear accents, duration and intensity
• identification of mono-, di-, tri- and four-syllable words, the prosodic word, enclitics, proclitics and linking
At the level of simple and complex sentences, the exercises consist in:
• production and recognition of different types of sentences, i.e. declaratives, commands, wh-questions, etc.
• identification and production of emphatic stress, relating focus to meaning
• performing communicative functions with focus (e.g. showing emotions, disagreement, calling attention to new information)
• perception and production of contrastive pitch patterns conveying various meanings, e.g. fall (finality, authority), rise (unfinished, insinuating, tentative)
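A minimal sketch (in Python, an assumption, since the poster names no implementation language) of how such a two-level curriculum could be encoded as data; the class, field names and example values are illustrative, not the actual Euronounce exercise format:

from dataclasses import dataclass

# Hypothetical encoding of one curriculum item; all names and example values
# are illustrative, not the real Euronounce exercise format.
@dataclass
class ProsodyExercise:
    level: str           # "word" or "sentence"
    mode: str            # "perception" or "production"
    target_feature: str  # e.g. "irregular lexical stress", "wh-question intonation"
    prompt: str          # text the learner hears and/or repeats

curriculum = [
    ProsodyExercise("word", "production", "irregular lexical stress", "example word"),
    ProsodyExercise("word", "perception", "proclitics and linking", "example phrase"),
    ProsodyExercise("sentence", "production", "wh-question intonation", "example wh-question"),
    ProsodyExercise("sentence", "production", "emphatic (contrastive) focus", "example statement"),
]

# Selecting, e.g., all word-level production exercises for one lesson:
word_production = [e for e in curriculum if e.level == "word" and e.mode == "production"]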
Requirements on an intelligent tutoring system
In order to be effective, CAPT systems should meet the following requirements:
• allow for training of both pronunciation and prosody: poor pronunciation can sometimes preclude full intelligibility of speech, but prosody is important too, because it helps listeners to process the segmental content
• identify precisely the location and type of the error
• provide a score for the learner's utterance that gives immediate information on the overall output quality
• provide effective feedback via different channels (visual, aural, but also descriptive and contrastive feedback); the feedback should be relevant to the type of error made by the learner, easy to interpret and constructive, so that the learner understands how to self-correct and improve
• keep track of the learner's performance, so that the features that need further practice can be identified and the learner's progress can be monitored (a minimal sketch of such tracking follows this list)
• be user-friendly: it should be clear how to interpret the displays and evaluate the results
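The progress-tracking requirement can be illustrated with a minimal Python sketch; the ProgressTracker class, its methods and the example scores are assumptions for illustration, not part of the AzAR or Euronounce software:

from collections import defaultdict

# Hypothetical tracker of per-feature scores in [0, 1]; not the actual system API.
class ProgressTracker:
    def __init__(self):
        # feature -> list of scores from successive attempts
        self.history = defaultdict(list)

    def record(self, feature: str, score: float) -> None:
        """Store the score of one attempt at a given feature (e.g. 'rising nuclear accent')."""
        self.history[feature].append(score)

    def weakest_features(self, n: int = 3):
        """Return the n features with the lowest average score, i.e. those to practice next."""
        averages = {f: sum(s) / len(s) for f, s in self.history.items() if s}
        return sorted(averages, key=averages.get)[:n]

# Example: after a few attempts, suggest what to practice next.
tracker = ProgressTracker()
tracker.record("wh-question intonation", 0.45)
tracker.record("wh-question intonation", 0.55)
tracker.record("lexical stress", 0.85)
print(tracker.weakest_features(1))  # -> ['wh-question intonation']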
Prosody training
Feedback

Fig.2
• an accurate visual representation of the student's and the native speaker's pitch contours in real time, paired with auditory feedback
• pitch contours are stylized using the Pitch Line software to provide a continuous representation and to ensure that only perceptually relevant pitch variations (i.e. the macroprosodic component of the pitch contour) are displayed
• relevant portions of the pitch contour (i.e. those corresponding to accented and phrase-boundary words) are described parametrically with regard to four perceptually significant features: direction, steepness and range of the distinctive pitch movement, and its temporal alignment with the onset of the accented vowel (a sketch of such a description follows this list)
• a higher-level surface-phonological representation of the contour is derived from the parametric description; it is expressed in terms of discrete categories of pitch accents and boundary tones, encodes melodic and functional aspects of prosody and, unlike strictly phonological representations, makes no distinction between the linguistic and paralinguistic functions of prosody; intonation contours that differ at the surface-phonological level convey different meanings
• automatic assessment: qualitative (in terms of pitch accent and boundary tone categories) and quantitative (parametric)
• results are displayed on a color scale (red to green)
• the learner is instructed to compare his/her realization to that of the native speaker; quantitative measurements of both realizations are provided as support
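As an illustration of the parametric description and its mapping to discrete categories, the Python sketch below uses the four features listed above and the accent labels seen in Fig.4 (H*L, HL*, L*H); the thresholds, decision rules and similarity formula are illustrative assumptions, not the Pitch Line or Euronounce algorithms:

from dataclasses import dataclass

# One distinctive pitch movement described by the four perceptual parameters.
@dataclass
class PitchMovement:
    direction: str      # "rise" or "fall"
    steepness: float    # rate of F0 change, e.g. in semitones per second
    pitch_range: float  # extent of the movement in semitones
    alignment: float    # peak/valley delay relative to the accented-vowel onset, in seconds

def accent_label(m: PitchMovement) -> str:
    """Map the parameters to a discrete pitch-accent category (illustrative rules only)."""
    if m.pitch_range < 1.5:                            # movement too small to be distinctive
        return "no accent"
    if m.direction == "fall":
        return "H*L" if m.alignment >= 0.0 else "HL*"  # late vs. early falling peak
    return "L*H"                                       # rising accent

def similarity(learner: PitchMovement, native: PitchMovement) -> float:
    """Quantitative score in [0, 1] for the red-to-green display (1 = identical parameters)."""
    diff = (abs(learner.steepness - native.steepness) / 20.0
            + abs(learner.pitch_range - native.pitch_range) / 12.0
            + abs(learner.alignment - native.alignment) / 0.3)
    return max(0.0, 1.0 - diff / 3.0)

# Example: classify a learner's fall and compare it against the native model.
native = PitchMovement("fall", steepness=12.0, pitch_range=6.0, alignment=0.05)
learner = PitchMovement("fall", steepness=6.0, pitch_range=3.0, alignment=-0.08)
print(accent_label(native), accent_label(learner), round(similarity(learner, native), 2))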

The Euronounce project
The project "Intelligent Language Tutoring System with Multimodal Feedback Functions" (acronym: Euronounce) aims at creating software for teaching L2 pronunciation and prosody. The project focuses on Slavonic (Polish, Slovak, Russian, Czech)-German language pairs.
The Euronounce project was preceded by earlier projects carried out between 2005 and 2007. As a result, the audiovisual software AzAR (German acronym for "Automat for Accent Reduction"), aimed at teaching German pronunciation to Russian learners, was created. The AzAR architecture separates the structure from the content, which enables adaptation of the system to a new language or a new set of exercises.
Following the baseline developed in these projects, the Euronounce project adds new language pairs as well as suprasegmental exercises.


Fig.3

AzAR
AzAR is a knowledge-based system: it focuses on specific language pairs and uses expert knowledge of typical errors made by L2 learners, which are caused by interference from their native language (L1) phonology and phonetics.
AzAR includes an extensive curriculum (defined by an expert) for the production
and perception training of difficult segmental contrasts.
The learner’s task is to listen to the utterance (a minimal pair, sentence or fragment
of a text) produced by the reference voice and to repeat it (in the production scenario) or
to discriminate between words in a minimal pair realized by the reference voice (in the
perception scenario).
In the first case the system gives multimodal (visual and auditory) feedback: the learner's utterance is displayed and scored. An oscillogram of the model utterance is presented simultaneously to allow for comparison. The learner can listen both to his/her own realization of the utterance and to the one produced by the reference voice.
Fig. 1 illustrates the template for the production exercise in which realization of the
vowel contrast (/I/ as in “bitten” vs. /i:/ as in “bieten”) can be practiced.
Features of the feedback system

• the software uses HMM-based speech recognition and speech signal analysis on the learner's input, which makes a visual and aural comparison of the user's own performance with that of the reference voice possible
• automatic error detection at the phonemic level: all uttered phones are marked using a color scale on which green indicates good and red bad pronunciation (a sketch of such a color mapping follows this list)
• an additional visual mode includes an animated visualization of the vocal tract (lip area and articulator movements) and a formant graph for particular phones
• in order to ensure that the information provided by the visual feedback is useful for the learner, a tutorial (Fig.2 & 3) is provided; it gives an introduction to acoustic and articulatory phonetics and explains how to interpret the acoustic displays
For each exercise in the curriculum, a passage containing information on the classification, features and articulation of the phone is provided, as well as a sagittal slice of the vocal tract during the phone's production and pictures of the lip area that also show the tongue position.
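A minimal sketch of the green-to-red phone coloring, assuming a per-phone confidence score in [0, 1] coming from the HMM-based recognizer; the linear interpolation and the example phones/scores are assumptions, not the actual AzAR color mapping:

# Hypothetical color mapping; the real AzAR scoring and display may differ.
def phone_color(score: float) -> str:
    """Interpolate linearly from red (score 0) to green (score 1), returned as an RGB hex string."""
    score = min(1.0, max(0.0, score))
    red = int(255 * (1.0 - score))
    green = int(255 * score)
    return f"#{red:02x}{green:02x}00"

# Example: color each recognized phone of the learner's utterance ("bitten").
for phone, score in [("b", 0.92), ("I", 0.35), ("t", 0.88), ("@", 0.71), ("n", 0.95)]:
    print(phone, phone_color(score))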
Fig.4: A Polish utterance realized by a native speaker (top) and by a learner with L1 German (bottom).
The 2nd tier contains the surface-phonological representation of the intonation contour.