Corpus-based pronunciation training Ulrike Gut, Albert-Ludwigs-University Freiburg

PTLC2005 Ulrike Gut Corpus-based pronunciation training:1
1 Introduction
Language corpora are increasingly used in the classroom and the
recognition of their pedagogical value is growing (e.g. Botley et al. 1996, Ghadessy,
Henry & Roseberry 2001, Kettemann & Marko 2002, Granger et al. 2002, Sinclair
2004). It has been claimed that the application of corpora in the classroom supports
inductive learning processes and the creation of language awareness in language
students. By investigating corpora, students are stimulated to enquire and speculate
about language structures and develop the ability to recognize language patterns. In
corpus-based “data-driven learning”, for example, students have the opportunity to
work as researchers by developing a research question and analysing it with real-language data (e.g. Johns 1991, Leech 1997).
Data-driven learning studies have so far mainly been carried out on native speaker
corpora. However, several researchers have proposed that this method should be
extended to learner corpora, which contain language produced by language learners
(e.g. Granger & Tribble 1998). It has been suggested that the advantage of learner
corpora as opposed to native corpora in the classroom lies in the opportunity they
provide for students to discover typical difficulties of learners of a certain language.
Activities based on a comparison between native and non-native data enable
language learners to focus on negative evidence and typical errors and train their
ability to notice differences between native and non-native language use. By
observing the errors learners typically and most frequently make, students might find
it easier to become aware of the features of their own interlanguage and possibly
stimulate a restructuring of their own language use and knowledge.
Due to the scarcity of learner speech corpora, corpus-based pronunciation training in
a classroom setting has so far been impossible (cf. Nesselhauf 2004). The purpose
of this paper is to introduce the recently completed LeaP corpus (section 2), a
phonetically annotated learner corpus, and to report on its application in
pronunciation training. Section 3 presents its use in a pronunciation training course
for learners of German. Section 4 describes how the corpus was employed in a data-driven learning approach in a university seminar for students of English.
2 The LeaP corpus
The LeaP corpus <http://www.phonetik.uni-freiburg.de/leap/>
was collected in the LeaP (Learning Prosody in a Foreign Language) project at the
University of Bielefeld, Germany, which was concerned with the acquisition of
prosody by non-native speakers of German and English. The corpus consists of a
total of 359 fully text-to-tone aligned recordings adding up to 73,941 words. The total
amount of recording time is more than 12 hours. The LeaP corpus comprises four
different speech styles:
- readings of nonsense word lists
- readings of a passage (about 2 minutes)
- retellings of a story (between 2 and 10 minutes)
- free speech in an interview situation (between 10 and 30 minutes)
The corpus contains recordings with 131 different speakers with a total of 32 different
native languages as well as 18 recordings with native speakers. A number of
different learner groups are represented: native speakers of English and of German,
serving as controls; especially advanced learners (near-natives); learners before and
after a training course in prosody; and learners before and after a stay
abroad. All recordings were annotated both manually and automatically on 8 different
tiers (see Figure 1). Manual annotations included pitch (initial high, final low,
intervening peaks and valleys), intonation (transcribed in modified ToBI), segments,
syllables (transcribed in SAMPA), words and phrasing. Parts-of-speech and lemmata
annotations were added automatically.
Figure 1. Annotation in the LeaP corpus.
3 The LeaP corpus in a pronunciation training course
Recordings from the LeaP
corpus were used in a pronunciation training course from October 2002 to February
2003 at the University of Bielefeld, in which eight female learners of German
participated. The course consisted of thirteen weekly 90-minute lessons on German
pronunciation and prosody and comprised perception exercises, theoretical input and
practical exercises based on the LeaP corpus. It was taught by a university lecturer
in phonetics and three students of linguistics.
It was investigated whether the participants’ pronunciation improved after the course.
For this purpose, they were recorded before and after the course reading a nonsense
word list and retelling a story they had read aloud first. In addition, the participants
took part in two perception experiments on vowel length and word stress. In
experiment I, participants listened to 14 syllables containing either a short (lax) or
long (tense) vowel and were asked to indicate whether they perceived the vowel as
short or long. In experiment II, participants listened to 32 nonsense words of between
two and six syllables read by a native speaker of Standard German and were asked
to indicate which of the syllables they perceived as stressed.
The following additional measurements were taken before and after the course:
- accent rating: 10 native speakers of German, 5 male and 5 female, all students of
linguistics at the University of Osnabrück, were asked to rate the participants’ foreign
accent on a 5-point scale from 1 (very good accent – native-like) to 5 (very
pronounced foreign accent).
- knowledge of German prosody: This was tested by asking the participants to give
either rules or examples of word and sentence stress, intonation and speech rhythm
in German.
- stress placement: 4 trained phoneticians marked the stress placement produced by
the learners in the nonsense word lists. Stress placement by the non-native speakers
was judged as correct or wrong compared to stress placement of five native German
speakers in the same word lists.
- speech rate: The average number of syllables per phrase (a phrase was defined as a
stretch of speech between two pauses of more than 150 ms) was calculated.
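The speech-rate measure defined in the last item can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: it assumes that syllable boundaries are available as (start, end) times in seconds, for instance exported from the syllable tier of the annotation. The function name and the example data are hypothetical.

```python
from typing import List, Tuple

# Pauses longer than 150 ms delimit phrases (threshold taken from the text).
PAUSE_THRESHOLD_S = 0.150


def syllables_per_phrase(
    syllables: List[Tuple[float, float]],
    pause_threshold: float = PAUSE_THRESHOLD_S,
) -> float:
    """Return the mean number of syllables per phrase.

    `syllables` is a list of (start, end) times in seconds, sorted by start
    time. A new phrase begins whenever the silent gap between two consecutive
    syllables exceeds `pause_threshold`.
    """
    if not syllables:
        return 0.0

    phrase_lengths = []
    count = 1  # the first syllable opens the first phrase
    for (_, prev_end), (next_start, _) in zip(syllables, syllables[1:]):
        if next_start - prev_end > pause_threshold:
            phrase_lengths.append(count)  # close the current phrase
            count = 1
        else:
            count += 1
    phrase_lengths.append(count)  # close the final phrase

    return sum(phrase_lengths) / len(phrase_lengths)


# Hypothetical data (not from the LeaP corpus): three phrases of 3, 2 and 1
# syllables, separated by pauses of 300 ms and 500 ms.
example = [
    (0.00, 0.20), (0.20, 0.45), (0.45, 0.70),
    (1.00, 1.25), (1.30, 1.50),
    (2.00, 2.20),
]
print(syllables_per_phrase(example))  # 2.0
```

The 50 ms gap inside the second phrase stays below the threshold and therefore does not split it.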
Table 1 illustrates the mean results of the speakers before and after the
pronunciation training course in terms of accent ratings, prosodic knowledge, stress
placement, speech rate and the perception of vowel length and stress. It can be seen
that the accent ratings did not improve after the course. On average, the learners
were rated as “medium” (3 on a scale from 1 to 5); the range lies between 1.6 and
4.2. The course participants showed significant improvements in their prosodic
knowledge and in correct stress placement. Before the course, on average, they had
no knowledge of German prosody – after the course they were able to name, on
average, more than two rules and give examples. Stress placement improved from
more than seven errors on average to just above three after the course. No significant
changes occurred after the course in the perception of vowel length or stress, or in
speech rate.
               accent       prosodic     stress       vowel        stress       speech
               ratings      knowledge    placement    perception   perception   rate
before course  2.9          0.12         7.4          12.25        14.25        3.98
after course   2.9          2.37         3.2          11.7         17.83        4.93
               n.s.         ** (p<0.01)  ** (p<0.01)  n.s.         n.s.         n.s.
Table 1. Mean values of all measurements before and after the course.
These mean values, however, disguise the fact that individual learners did improve on some
of the variables. Four of the participants, for example, improved their perception of
stress by four or five syllables. Further analyses suggest that more advanced learners
with better accent ratings profited more from the course. Learners with higher initial
accent ratings improved their stress placement more than learners with lower accent
ratings (r=.7; p<0.05). Accent ratings before the course show a moderate but not
significant correlation with prosodic knowledge after the course (r=-.47).
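A correlation such as the one reported above between initial accent rating and improvement in stress placement could be computed as sketched here. The paper does not state which coefficient was used; the sketch assumes Pearson's r. The function name and all numbers are hypothetical and are not data from the study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")

    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)

    # Sums of squared deviations and of cross-products.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")

    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for eight learners (NOT the study's data):
# one score per learner before the course and one improvement score.
initial_scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
improvements = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
print(round(pearson_r(initial_scores, improvements), 2))
```

With only eight participants, as in the course described here, even a fairly large r can fail to reach significance, which is consistent with the non-significant r=-.47 reported above.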
4 The LeaP corpus in a data-driven approach
The LeaP corpus was used as a
tool for inductive learning in a course entitled “Phonetic properties of non-native
speech”, in which 21 students of English at the University of Freiburg in Germany
participated. The course lasted for one semester (October 2004 to February 2005)
and consisted of 15 classes with a mix of lecture, discussion and corpus work. In 13
classes, the students worked with the LeaP corpus, using the speech software Praat,
and solved small tasks such as the measurement of segment lengths and vowel
formants. In three classes, the students carried out a group project on an empirical
research question of their choice. Research questions included, for example, “Final
devoicing in English by German learners” and “Fluency before and after a stay
abroad”.
After the course, the students filled in a questionnaire about their attitudes towards
the corpus work. In questions 1 and 2 they were asked to rate their preferred
teaching method and to estimate where they learned most. Ratings ranged from 1
(best) to 5 (worst). As shown in Table 2, the students, on average, preferred the
discussion and lecture over corpus work, reading and the presentations by students.
They felt they had learned most in the lecture parts, followed by their own reading
and the discussions. Corpus work and the presentations by students were rated
lowest.
                  discussion  lecture  corpus work  reading  presentations by students
preferred method  2.2         2.2      2.5          2.6      3.3
learned most      2.47        1.66     2.66         1.8      3.25
Table 2. Preferred teaching method and self-estimation of where the students
learned most.
In the third question, the students agreed that corpus work was communicative (75%
yes/25% no), interesting (95%/5%), stimulating (86%/14%) and varied (62%/38%).
On the whole, they did not judge it to be boring (11% yes/89% no), too difficult
(0%/100%), too easy (5%/95%) or discouraging (0%/100%). Furthermore, 90%
agreed that they had learned a lot about foreign accent and 81% that they had become
more aware of foreign accents. Only 10% claimed that their own accent had
improved through the corpus work, but 72% believed that their language teaching would
improve.
5 References
Botley, Simon, Glass, Julia, McEnery, Tony and Wilson, Andrew (eds.) (1996)
Proceedings of Teaching and Language Corpora 1996. Lancaster: UCREL technical
papers volume 9.
Ghadessy, Mohsen, Henry, Alex and Roseberry, Robert (2001) Small corpus studies
and ELT. Amsterdam: John Benjamins.
Granger, Sylviane and Tribble, Christopher (1998) Learner corpus data in the foreign
language classroom: form-focused instruction and data-driven learning. In: Granger,
Sylviane (ed.), Learner English on Computer, London: Longman, pp. 199-209.
Granger, Sylviane, Hung, Joseph and Petch-Tyson, Stephanie (eds.) (2002) Computer Learner
Corpora, Second Language Acquisition and Foreign Language Teaching.
Amsterdam: John Benjamins.
Johns, T. (1991) Should you be persuaded – two samples of data-driven learning
materials. In: Johns, T. and King, P. (eds.), Classroom Concordancing. Birmingham: ELR
Journal 4, pp. 1-16.
Kettemann, Bernhard and Marko, Georg (eds.) (2002) Teaching and Learning by
Doing Corpus Analysis. Amsterdam: Rodopi.
Leech, G. (1997) Teaching and language corpora: a convergence. In: Wichmann, A.,
Fligelstone, S., McEnery, A. and Knowles, G. (eds.), Teaching and Language Corpora.
London: Longman, pp. 1-23.
Nesselhauf, Nadja (2004) Learner corpora and their potential for language teaching.
In: Sinclair, John (ed.), How to Use Corpora in Language Teaching. Amsterdam:
John Benjamins, pp. 125-152.
Sinclair, John (ed.) (2004) How to Use Corpora in Language Teaching. Amsterdam:
John Benjamins.