PTLC2005 Ulrike Gut Corpus-based pronunciation training:1

Corpus-based pronunciation training

Ulrike Gut, Albert-Ludwigs-University Freiburg

1 Introduction

Language corpora are increasingly used in the classroom and the recognition of their pedagogical value is growing (e.g. Botley et al. 1996, Ghadessy, Henry & Roseberry 2001, Kettemann & Marko 2002, Granger et al. 2002, Sinclair 2004). It has been claimed that the application of corpora in the classroom supports inductive learning processes and the creation of language awareness in language students. By investigating corpora, students are stimulated to enquire and speculate about language structures and develop the ability to recognize language patterns. In corpus-based “data-driven learning”, for example, students have the opportunity to work as researchers by developing a research question and analysing it with real language data (e.g. Johns 1991, Leech 1997).

Data-driven learning studies have so far mainly been carried out on native speaker corpora. However, several researchers have proposed that this method should be extended to learner corpora, which contain language produced by language learners (e.g. Granger & Tribble 1998). It has been suggested that the advantage of learner corpora over native corpora in the classroom lies in the opportunity they provide for students to discover the typical difficulties of learners of a certain language. Activities based on a comparison between native and non-native data enable language learners to focus on negative evidence and typical errors and train their ability to notice differences between native and non-native language use. By observing the errors learners make most typically and most frequently, students might find it easier to become aware of the features of their own interlanguage, which may in turn stimulate a restructuring of their own language use and knowledge.

Due to the scarcity of learner speech corpora, corpus-based pronunciation training in a classroom setting has so far been impossible (cf. Nesselhauf 2004). The purpose of this paper is to introduce the recently completed LeaP corpus (section 2), a phonetically annotated learner corpus, and to report on its application in pronunciation training. Section 3 presents its use in a pronunciation training course for learners of German. Section 4 describes how the corpus was employed in a data-driven learning approach in a university seminar for students of English.

2 The LeaP corpus

The LeaP corpus <http://www.phonetik.uni-freiburg.de/leap/> was collected in the LeaP (Learning Prosody in a Foreign Language) project at the University of Bielefeld, Germany, which was concerned with the acquisition of prosody by non-native speakers of German and English. The corpus consists of a total of 359 fully text-to-tone aligned recordings adding up to 73,941 words. The total amount of recording time is more than 12 hours. The LeaP corpus comprises four different speech styles:

- readings of nonsense word lists
- a reading passage (about 2 minutes)
- retellings of a story (between 2 and 10 minutes)
- free speech in an interview situation (between 10 and 30 minutes)

The corpus contains recordings of 131 different speakers with a total of 32 different native languages as well as 18 recordings with native speakers. A number of different learner groups are represented: native speakers of English and of German, serving as controls; especially advanced learners (near-natives); learners before and after a training course in prosody; and learners before and after a stay abroad. All recordings were annotated both manually and automatically on 8 different tiers (see Figure 1).
Manual annotations included pitch (initial high, final low, intervening peaks and valleys), intonation (transcribed in modified ToBI), segments, syllables (transcribed in SAMPA), words and phrasing. Part-of-speech and lemma annotations were added automatically.

Figure 1. Annotation in the LeaP corpus.

3 The LeaP corpus in a pronunciation training course

Recordings from the LeaP corpus were used in a pronunciation training course from October 2002 to February 2003 at the University of Bielefeld, in which eight female learners of German participated. The course consisted of thirteen weekly 90-minute lessons on German pronunciation and prosody and comprised perception exercises, theoretical input and practical exercises based on the LeaP corpus. It was taught by a university lecturer in phonetics and three students of linguistics.

The study investigated whether the participants’ pronunciation improved after the course. For this purpose, they were recorded before and after the course reading a nonsense word list and retelling a story they had read aloud first. In addition, the participants took part in two perception experiments on vowel length and word stress. In experiment I, participants listened to 14 syllables containing either a short (lax) or long (tense) vowel and were asked to indicate whether they perceived the vowel as short or long. In experiment II, participants listened to 32 nonsense words of between two and six syllables read by a native speaker of Standard German and were asked to indicate which of the syllables they perceived as stressed. The following additional measurements were taken before and after the course:

- accent rating: 10 native speakers of German, 5 male and 5 female, all students of linguistics at the University of Osnabrück, were asked to rate the participants’ foreign accent on a 5-point scale from 1 (very good accent, native-like) to 5 (very pronounced foreign accent).
- knowledge of German prosody: This was tested by asking the participants to give either rules or examples of word and sentence stress, intonation and speech rhythm in German.
- stress placement: 4 trained phoneticians marked the stress placement produced by the learners in the nonsense word lists. Stress placement by the non-native speakers was judged as correct or incorrect by comparison with the stress placement of five native German speakers in the same word lists.
- speech rate: The average number of syllables per phrase was calculated, where a phrase was defined as a stretch of speech between two pauses of more than 150 ms.

Table 1 shows the mean results of the speakers before and after the pronunciation training course in terms of accent ratings, prosodic knowledge, stress placement, speech rate and the perception of vowel length and stress. The accent ratings did not improve after the course. On average, the learners were rated as “medium” (3 on a scale from 1 to 5); the range lies between 1.6 and 4.2. The course participants showed significant improvements in their prosodic knowledge and in correct stress placement. Before the course, on average, they had no knowledge of German prosody; after the course they were able to name, on average, more than two rules and give examples. Stress placement improved from more than seven errors on average to just above three after the course. No significant changes occurred after the course in the perception of vowel length and stress placement, or in speech rate.

                accent    prosodic     stress       vowel       stress      speech
                ratings   knowledge    placement    perception  perception  rate
before course   2.9       0.12         7.4          12.25       14.25       3.98
after course    2.9       2.37         3.2          11.7        17.83       4.93
significance    n.s.      ** (p<0.01)  ** (p<0.01)  n.s.        n.s.        n.s.

Table 1. Mean values of all measurements before and after the course.

These mean values, however, disguise the fact that individual learners did improve in some of the variables.
Four of the participants, for example, improved their perception of stress by 4 or 5 syllables. Further analyses suggest that more advanced learners with better accent ratings profited more from the course. Learners with better initial accent ratings improved their stress placement more than learners with poorer accent ratings (r=.7; p<0.05). Accent ratings before the course show a moderate but not significant correlation with prosodic knowledge after the course (r=-.47).

4 The LeaP corpus in a data-driven approach

The LeaP corpus was used as a tool for inductive learning in a course entitled “Phonetic properties of non-native speech”, in which 21 students of English at the University of Freiburg in Germany participated. The course lasted for one semester (October 2004 to February 2005) and consisted of 15 classes with a mix of lecture, discussion and corpus work. In 13 classes, the students worked with the LeaP corpus, using the speech software Praat, and solved small tasks such as the measurement of segment lengths and vowel formants. In three classes, the students carried out a group project on an empirical research question of their choice. Research questions included, for example, “Final devoicing in English by German learners” and “Fluency before and after a stay abroad”.

After the course, the students filled in a questionnaire about their attitudes towards the corpus work. In questions 1 and 2 they were asked to rate their preferred teaching method and to estimate where they learned most. Ratings ranged from 1 (best) to 5 (worst). As shown in Table 2, the students, on average, preferred the discussion and lecture over corpus work, reading and the presentations by students. They felt they had learned most in the lecture parts, followed by their own reading and the discussions. Corpus work and the presentations by students were rated lowest.

                   discussion  lecture  corpus work  reading  presentations by students
preferred method   2.2         2.2      2.5          2.6      3.3
learned most       2.47        1.66     2.66         1.8      3.25

Table 2. Preferred teaching method and self-estimation of where the students learned most.

In the third question, the students agreed that corpus work was communicative (75% yes/25% no), interesting (95%/5%), stimulating (86%/14%) and varied (62%/38%). On the whole, they did not judge it to be boring (11% yes/89% no), too difficult (0%/100%), too easy (5%/95%) or discouraging (0%/100%). Furthermore, 90% agreed that they had learned a lot about foreign accent and 81% that they had become more aware of foreign accents. Only 10% claimed that their own accent had improved through the corpus work, but 72% believed that their language teaching would improve.

5 References

Botley, Simon, Glass, Julia, McEnery, Tony and Wilson, Andrew (eds.) (1996) Proceedings of Teaching and Language Corpora 1996. Lancaster: UCREL technical papers volume 9.
Ghadessy, Mohsen, Henry, Alex and Roseberry, Robert (eds.) (2001) Small corpus studies and ELT. Amsterdam: John Benjamins.
Granger, Sylviane and Tribble, Christopher (1998) Learner corpus data in the foreign language classroom: form-focused instruction and data-driven learning. In: Granger, Sylviane (ed.), Learner English on Computer. London: Longman, pp. 199-209.
Granger, Sylviane, Hung, Joseph and Petch-Tyson, Stephanie (eds.) (2002) Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. Amsterdam: John Benjamins.
Johns, T. (1991) Should you be persuaded – two samples of data-driven learning materials. In: Johns, T. and King, P. (eds.), Classroom concordancing. Birmingham: ELR Journal 4, pp. 1-16.
Kettemann, Bernhard and Marko, Georg (eds.) (2002) Teaching and Learning by Doing Corpus Analysis. Amsterdam: Rodopi.
Leech, G. (1997) Teaching and language corpora: a convergence. In: Wichmann, A., Fligelstone, S., McEnery, A. and Knowles, G. (eds.), Teaching and language corpora. London: Longman, pp. 1-23.
Nesselhauf, Nadja (2004) Learner corpora and their potential for language teaching. In: Sinclair, John (ed.), How to Use Corpora in Language Teaching. Amsterdam: John Benjamins, pp. 125-152.
Sinclair, John (ed.) (2004) How to Use Corpora in Language Teaching. Amsterdam: John Benjamins.