The English Pronunciation Teaching in Europe Survey: Selected Results

Research in Language, 2012, vol. 10.1/2
Alice Henderson, Dan Frost, Elina Tergujeff, Alexander Kautzsch, Deirdre Murphy,
Anastazija Kirkova-Naskova, Ewa Waniek-Klimczak, David Levey, Una
Cunningham and Lesley Curnick
Research in Language, 2012, vol. 10.1
DOI 10.2478/v10015-011-0047-4
Université de Savoie1
University of Jyväskylä2
University of Regensburg3
Trinity College Dublin4
University of Skopje5
University of Łódź6
University of Cádiz7
Stockholm University8
Université de Lausanne9
This paper provides an overview of the main findings from a European-wide on-line
survey of English pronunciation teaching practices. Both quantitative and qualitative data
from seven countries (Finland, France, Germany, Macedonia, Poland, Spain and
Switzerland) are presented, focusing on teachers' comments about:
● their own pronunciation,
● their training,
● their learners’ goals, skills, motivation and aspirations,
● their preferences for certain varieties (and their perception of their students'
The results of EPTiES reveal interesting phenomena across Europe, despite shortcomings
in terms of construction and distribution. For example, most respondents are non-native
speakers of English and the majority of them rate their own mastery of English
pronunciation favourably. However, most feel they had little or no training in how to
teach pronunciation, which raises the question of how teachers are coping with this key
aspect of language teaching. In relation to target models, RP remains the variety of
English which teachers claim to use, whilst recognizing that General American might be
preferred by some students. Differences between countries are explored, especially via
replies to open-ended questions, allowing a more nuanced picture to emerge for each
country. Other survey research is also referred to, in order to contextualise the analyses
and implications for teaching English and for training English teachers.
Henderson and Frost are listed first because they did the final editing. Thereafter, authors are
listed in alphabetical order of the country whose data they gathered and analysed. The order of
the other authors thus reflects neither hierarchy nor significance of individual contributions, as
this is a truly collaborative project and article.
1. Introduction
English pronunciation teaching has been the subject of several surveys but mainly in
English-speaking countries, such as Canada (Foote, Holtby & Derwing, 2011;
Breitkreuz, Derwing and Rossiter, 2002), Australia (Macdonald, 2002), and Great
Britain (Bradford and Kenworthy, 1991; Burgess and Spencer, 2000). The attitudes
towards pronunciation and the teaching practices of EFL teachers in Ireland were
examined by D. Murphy (2011). Walker’s survey of teachers in Spain (1999), which
included some questions about training to teach pronunciation, is a relatively rare
example of a study focusing on the issue in another European country. Relevant studies
have been carried out recently in Poland, Serbia and Finland but have tended to focus on
the learner’s perspective.
English pronunciation researchers in Poland have concentrated on two major issues:
firstly, the attitudes of the learners towards native speaker models (e.g. Kul, Janicka and
Weckwerth 2005, Waniek-Klimczak and Klimczak 2005) and secondly, the degree of
success in reaching the models in the learning process (e.g. Gonet, Szpyra-Kozłowska
and Święciński 2010, Nowacka 2010). Although the studies adopt a learner rather than a
teacher perspective, their results may be relevant for both, as the majority of participants
are university students training to become teachers of English. Thus, the fact that
university students recognise the relevance of native speaker models (with a strong
preference for RP), but do not necessarily believe they will be able to reach the goal of
native-like accent (see different views in Kul et al. 2005) may affect their attitudes
towards the specification of goals in pronunciation teaching.
Paunović (2009) presented a similar perspective in the Serbian context. She showed
that complex interactions of sociolinguistic constructs were influential in shaping
trainees’ attitudes and their notion of the EFL teacher. The division between “foreign
and incorrect” and “standard and correct” surfaced as most distinctive in the participants’
responses, which favoured the latter, especially the British and American varieties,
participants dismissing even native speakers as “foreign” if they sounded markedly
In Finland, English pronunciation teaching has not been a frequently researched
topic. Some insights into teaching materials and practices in the classroom are offered by
Tergujeff (2010a, in print) and by two recent works: Lintunen (2004) and Tergujeff et al.
(2011). Both studies include a survey section concentrating on phonetic teaching
methods in English pronunciation teaching, but as opposed to the present study, the
Finnish surveys were aimed at learners, not teachers.
Therefore, to the best of our knowledge, no study has extensively explored and
compared how English pronunciation is taught in several European countries. The
English Pronunciation Teaching in Europe Survey (EPTiES) seeks to fill this gap.
The English Pronunciation Teaching in Europe Survey: Selected Results of a Pilot Study
Teachers from ten European countries created and administered the survey: Finland,
France, Germany, Ireland, Macedonia, The Netherlands, Poland, Spain, Sweden and
Switzerland. The current article explores the survey’s results for seven of these
countries, focusing on the following issues: teacher training; teachers’ views of their
own pronunciation; teachers’ awareness of their students’ goals and skills; teachers’
awareness of students’ motivation to speak English and of their aspiration to achieve
native-like pronunciation.
2. Survey Design & Administration
The survey, designed and administered using the open-source application LimeSurvey,
has 57 questions, requesting for example: participant information; teachers’ views on the
pronunciation-related training they received; information about which varieties and
norms are used in the classroom (for receptive & productive work). Certain questions,
such as “Please list your teaching qualifications”, are formulated to reflect specific
national contexts. Likert scale items are used, as well as several yes - no questions which
are followed by a request for more information. The questions about teacher training are
open questions, whereas others permit several answers to be chosen from a list, such as
the questions about models and norms2.
The survey was open from February 2010 until September 2011 and a total of 843
people replied, with 481 completed surveys. Attempts were made to contact teachers at
all levels of the private and public sectors by several means, including personal contacts
and mailing lists of professional bodies such as teachers’ associations (e.g. SUKOL in
Finland, TESOL-France, ELTAM in Macedonia, ETAS in Switzerland). Educational
institutions and administrative structures were also contacted directly (Finland, France
and Germany). Invitations were distributed internationally via the Linguist List and
“promotional” bookmarks were handed out at various conferences over a two-year period.
The results presented include only countries for which there were at least 12
completed surveys (Table 1), which unfortunately excludes Ireland (8), the Netherlands
(0) and Sweden (1). As may be expected, some questionnaires are incomplete, often with
only a few items left unanswered; therefore, the number of respondents for a given
question is indicated in tables only when it differs from those in Table 1 below.
2 The latter is a potential weakness of the survey design, as will be shown in the discussion of

Table 1. Participants per country, total n° of respondents/n° of completed surveys (columns: N° of respondents per country; N° of records completed).
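The response figures and the inclusion rule described in this section can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the variable names are ours, and the only per-country counts used are the three stated in the text for the excluded countries.

```python
# Figures quoted in the text: total replies and completed surveys.
total_replies = 843
completed = 481

# Inclusion threshold stated in the text: at least 12 completed surveys.
MIN_COMPLETED = 12

# Completed-survey counts given in the text for the excluded countries only.
completed_by_country = {"Ireland": 8, "Netherlands": 0, "Sweden": 1}

# Share of all replies that were completed surveys.
completion_rate = completed / total_replies

# Countries falling below the inclusion threshold.
excluded = sorted(
    country for country, n in completed_by_country.items() if n < MIN_COMPLETED
)

print(f"Completion rate: {completion_rate:.1%}")
print("Excluded:", ", ".join(excluded))
```

On these figures, a little over half of those who replied completed the survey.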
3. Findings: Teachers
This section is divided into two parts. The first provides background information about
the respondents in relation to: gender; average age and average number of years
teaching; level and type of education; native language; teaching context (public/private,
age of learners). The differences among countries will be referred to in the further
analyses where they have an important impact (e.g. on the results related to attitudes and
norms). The second section looks at teachers’ views on the relative importance of
English and of pronunciation, as well as teachers’ self-assessment of their pronunciation.
Respondents were predominantly female (77%) for 6 of the 7 countries but there are
some important differences: 95.1% in Finland, 92.3% in Macedonia, 83% in
Switzerland, 75% in France & Poland, and 72.45% in Germany. It is interesting to note
that in Spain, 65% of those who completed the questionnaire were male. This was
somewhat surprising given that language teaching in Spain, particularly at primary and
secondary school level, has tended to be female dominated. Although it is true that more
men are entering the profession, these figures may not be significant – of the 31 teachers
who initially responded to the questionnaire but did not necessarily complete it, 16 were
male and 15 were female.
The overall average age is 42.95 years, with averages from Poland and Macedonia
well below this. The average age and years of experience are lowest for the Polish
respondents: 17/20 were aged 22-26, with 2-3 years of teaching practice. This is
significantly lower than the overall survey average of 16.13 years’ teaching experience.
Respondents in Macedonia show a slightly higher range of age and experience: average
age 29 (from 28-50 years) and 8 years’ teaching experience (from 3-34 years). The
average age in Finland is 44.6 years (24-67 years) with an average of 16 years’ teaching
experience, with a range of 0 to 44 years. German figures are almost exactly the same in
both average and range: average age 44.68 (ranging from 24 to 66) and 15.99 years
average experience (1-41 years). Even though France and Switzerland have the same
average age (46), the former averaged 21 years’ teaching experience as against 15 years
in Switzerland. This seems to suggest that in France, respondents are mainly career
teachers from the outset, whereas in Switzerland, English teaching is probably not the
participants’ first career. Finally, almost half (45%) of Spanish respondents are over the
age of 45 with more than 15 years’ experience. However, this can largely be explained
by the fact that most work in attractive large or medium-size towns and cities. Jobs in
popular urban centres tend to be taken by candidates with the most years of service.
In terms of level of education, respondents in only two countries tend to hold specific
EFL qualifications: in Switzerland 13/17 described themselves as TEFL-trained3 with
two having PhDs4. The majority (94.2%) of Finnish respondents have finished at least an
MA degree; in Finland, qualified EFL subject teachers hold an MA degree in English
with a teacher training programme/pedagogy as a minor subject in the degree. The young
Polish respondents are recent graduates or are still in the process of doing MA courses.
All Macedonian respondents hold BA degrees, one an MA degree, and one a CPE
certificate. In the case of Spain, all except one of the teachers had a University degree
and 25% had an MA or PhD. In France, over half of the respondents have passed the
CAPES or the Agrégation (the French national competitive exams for recruiting
teachers) and many other different levels and types of qualifications were listed5.
Concerning native language, despite the fact that overall 88.21% of respondents
describe themselves as non-native speakers of English, there are important differences
among the different countries. In Switzerland they were predominantly native English
speakers (83%), but in neighbouring France, three-quarters of respondents were non-native speakers. Most participants in Finland were non-native speakers of English
(99.0%), as was the case in Germany (95.87%) and Spain (74.19%). In Macedonia and
Poland all respondents were non-native English speakers.
Respondents teach predominantly in the public sector (92.2% in Finland, 93.93% in
Germany, 80.65% in Spain), except for Macedonia, where 76.92% of respondents teach
in the private sector. Polish respondents teach in the public sector, with additional
classes taught in private language schools in the evenings or at weekends. Swiss
respondents teach mainly adults in both private (61%) and public (39%) sectors. French
respondents also teach primarily adults, working in tertiary education (76.9%) and high
schools (21.5%). In contrast, Finns are quite evenly distributed across different teaching
contexts: i.e. primary (29.1%), lower secondary (31.1%), and upper secondary (27.2%)
level; only a few respondents teach in other contexts (vocational school, university,
other). Three-quarters of participants in Germany teach 10-18 year olds: 40.50% at
Gymnasium (age 10-18), followed by Realschule (age 10 to 16, 20.94%), and
Grundschule (age 6-10, 16.25%). A slightly smaller proportion (13.50%) teaches
younger pupils (age 10-15) at Hauptschule.
Two items required participants to estimate the relative importance of English and of
pronunciation in relation to other language skills; the averages were relatively high (4.66
and 3.77 respectively, with 5 being “extremely important”). In a third item, teachers self-evaluated their pronunciation (4.17, with 5 being native). This section discusses those
results and begins to address the issue of the status of English.
3 For example, having a DipTEFL, CELTA, MEd in TESOL.
4 One in Entomology and the other in English Linguistics.
5 One cannot take these exams without having completed an undergraduate degree and since 2011,
a 2-year Master’s programme must be completed before being allowed to teach.
Reassuringly, the overall figures for the importance of English are quite high in all
countries, as one would anticipate from English teachers. In Finland, the average rating
for the importance of English in relation to other languages was very high: 4.65. In their
comments, the respondents frequently mentioned the status of English as a global
language, and also issues related to the respondents’ own language use, e.g. “I teach
English, I read in English, I communicate in English with friends abroad”. However,
one respondent pointed out that “English is not the only foreign language people should
learn”. This comment relates to recent debates around language policy and education in
Finland, where foreign language skills are highly valued and vast resources are invested
in language education. In recent years however, pupils have been choosing to study
fewer optional languages than before (see for example Kangasvieri et al. 2011, Sajavaara
et al. 2007). The concern is that Finns’ language skills may diminish as fewer pupils
study German, French and Russian, whereas nearly all study English. The global status
of English is surely one reason behind the new tendency of language choices at school,
and pupils and parents may think that knowing English, in addition to the official
languages Finnish and Swedish, is enough. Moreover, according to a recent survey,
English is widely present in the everyday life of younger Finns in particular.
The results from Macedonia point to a similar situation. Respondents from
Macedonia allotted high values to the importance of English in relation to other
languages (4.69/5 on average). In the open comments for that item, respondents
mentioned the economic relevance of English and the communicative relevance of
English as a world language. The responses given for item n°61, which explored the
importance of English pronunciation in relation to other language skills, echo these
notions. Although 19.2% rated the importance of pronunciation as 5 (extremely
important), in effect signifying that pronunciation is seen as equally important as other
language skills, most of the respondents opted for the lower ratings (52.6% and 26.9%
for 4/5 and 3/5 respectively). In their comments, communication clearly takes priority
over correct pronunciation: English “needs to be learnt” because it is “the language of
global trade” and “all information is in English”. Teachers – much to their credit – seem
to be aware that communication is the goal of learning English for their learners.
Pronunciation as a skill then is viewed through the lens of this aim and is pushed down
on the priority list, i.e. English is solely learnt for communicative purposes.
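The Macedonian percentages quoted above do not sum to 100%, so they fix the average rating for item n°61 only within a narrow band. The sketch below bounds that average under one assumption of ours, which is not stated in the text: that the unreported share of respondents chose a rating of 1 or 2.

```python
# Reported shares (percent) for item n°61, Macedonia: rating -> share.
reported = {5: 19.2, 4: 52.6, 3: 26.9}

# Share not accounted for in the text; ASSUMED here to fall on ratings 1 or 2.
remainder = 100.0 - sum(reported.values())

# Contribution of the reported ratings to the mean.
partial = sum(rating * share for rating, share in reported.items()) / 100.0

# Bounds: all remaining respondents chose 1 (lower) or all chose 2 (upper).
lower = partial + 1 * remainder / 100.0
upper = partial + 2 * remainder / 100.0

print(f"Mean rating between {lower:.2f} and {upper:.2f}")
```

Under that assumption the average lies just below 4, which fits the observation that most respondents chose ratings below the maximum.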
At the opposite end of the spectrum, in Spain practically all informants gave great
importance to pronunciation in relation to other language skills (item n°61). The
necessity to improve English pronunciation skills is widely accepted and the urgent need
for specific teacher training in this area has been advocated for some time (see Donovan
2001; Levey 1999, 2001; Pavon, 2001; Pavon and Rosado, 2003). Analysis of Spanish
data reveals that pronunciation remains a problem and informants recognize that
insufficient time and resources are spent on it. The reasons most commonly cited for not
dedicating more time to it centred on two aspects: first, the difficulty it constitutes for
both students and teachers, and secondly, the fact that teachers felt their hands were tied
by curricular demands and by the need for schools to obtain results:
“Spanish students need help with their pronunciation but in the end we have to be
realistic… unfortunately the truth is that students must pass a written exam at the end of
the year - there is no oral test. So I'm sorry to say oral skills are not the priority”.
The results from Germany raise the question of the nativization of English in Europe, as
well as the possible categorization of English as an “additional” instead of a foreign
language (Hilgendorf, 2007:144). The fact that German respondents rate the overall
importance of English as high (4.67) does not come as a surprise. It nicely mirrors the
status English has gained in primary and secondary education in the last decades
resulting from trends in European and national educational policy, a development which
is systematically documented by the German government's federal office for statistical
analysis (Statistisches Bundesamt 2003a, 2011a, 2011b). In schools of general education
(Grundschule, Hauptschule, Realschule, Gesamtschule, Gymnasium) the percentage of
pupils learning English has increased from 69.1% in the school year 2002/2003 to 86.7%
in the school year 2010/2011 (Statistisches Bundesamt 2003a, 2011a). A similar increase
can be witnessed in vocational schools from 42.1% to 51.7%, respectively (Statistisches
Bundesamt 2003b, 2011b). The development in schools of general education goes hand
in hand with the introduction of English as a compulsory subject in primary schools in
many of the 16 federal states of Germany, which are independent from the federal
government in establishing and implementing educational policies. The increase in the
number of English learners in the vocational sector seems to be a consequence of the
requirements of a globalized workplace. Hilgendorf's (2007:144) suggestion of “a shift
in the status of the language from that of a foreign language to that of an additional
language” obviously holds for other European countries, as well. But if and to what
extent Germany and other European countries find themselves in a process of “ongoing
nativization and acculturation of English” (Hilgendorf 2007: 145) remains to be seen
once this new generation of pupils (who have started learning English at an earlier age)
grows up.
Table 2. Average results from item n°60: For you personally, how important is English in
relation to other languages? Please rate from 1 to 5, with 1 as “not important at all” and 5
“extremely important”. Column: Importance of English. Rows: Finland (n=78), France (n=52),
Germany (n=270), Macedonia (n=14), Poland (n=14), Spain (n=23), Switzerland (n=16).
Table 3. Average results from item n°61: For you personally, how important is
pronunciation in relation to other language skills? Please rate from 1 to 5, where 1 = “the
least important” and 5 = “the most important”. Column: Importance of pronunciation. Rows:
Finland (n=78), France (n=52), Germany (n=270), Macedonia (n=14), Poland (n=14), Spain (n=23),
Switzerland (n=16).
Overall, teachers self-evaluated their pronunciation as being quite good (4.17 on a scale
from 1-5, with 5 being excellent). However, the question was perhaps misinterpreted:
‘Your own pronunciation skills’ could conceivably refer to one’s knowledge of
phonology/phonetics or one’s ability to pronounce English. The fact that German
respondents rate their own pronunciation skills worse (3.99) than teachers from other
European countries (except Poland at 3.92) is matched in the open answers by a high
level of awareness that they are not perfect. The following contribution serves as a case
in point: “I am able to avoid the specific German accent, so native speakers often can't
tell where I'm from, but they certainly can tell that I am not a native speaker of English”.
The Poles’ low average probably reflects a relatively critical self-evaluation with respect
to their own accent. The respondents are young and lacking in experience, and more
importantly they have just graduated from institutions which devote considerable time
and effort to making students aware of how much work they still have ahead of them.
Table 4. Average results from item n°63: How would you rate your own pronunciation
skills? Please rate from 1 to 5, where 1 = “extremely poor” and 5 = “excellent”. Column:
Teachers’ level of pronunciation, self-assessed. Rows: Finland (n=78), France (n=52),
Germany (n=268), Macedonia (n=14), Poland (n=14), Spain (n=22), Switzerland (n=16).
4. Findings: Teacher training
The three questions concerning teacher training were:
● In relation to pronunciation, please rate the teacher training you received from 1
to 5, with 1 as “extremely poor” and 5 as “excellent”.
● Please tell us how much training you received specific to teaching pronunciation.
Feel free to mention any period of time (hours, months, years, etc.).
● Please explain the content and/or style of the training you received. Feel free to
mention types of courses, approaches, etc.
Participants’ comments reveal that many if not most appear to be amateurs when it
comes to teaching pronunciation. By amateurs, we mean not only that the participants
clearly love their subject (from the Latin, amator), but also that they appear to have
received little or no professional training which deals specifically with how to teach
pronunciation. It is surprising that whereas the average self-assessment of pronunciation
skills was quite high (4.17), the average rating of their training in relation to teaching
pronunciation should be so much lower (2.91, where 1 = extremely poor). Moreover, the
average might have been even lower, as participants may have confused “phonetics” and
“pronunciation”, despite the clear formulation of the questions.
When asked about the quality of pronunciation training they had received, only in
Finland was the average score above 3 (3.16 on a Likert scale from 1-5, where 5 was
“excellent”). One of the most frequent follow-up comments (given by nearly half of the
Finnish participants) referred to one or more pronunciation courses or described an
equivalent time spent on pronunciation training. Some respondents clearly pointed out
that they had been taught pronunciation but not how to teach it. Respondents seldom
learnt the skill of pronunciation teaching outside their teacher training (e.g. by studying
phonetics separately), nor was pronunciation teaching intertwined with other topics.
In Switzerland the replies about quantity of training varied dramatically, from “none
at all” (3 respondents), to vague references to training during CELTA courses, to a more
specific description of a 16-week course during a Bachelor’s programme. The latter did
not address the teaching of phonetics, but only “learning the symbols”. Considering the
average age of the teachers and the number of years they have been in service, perhaps it
is not surprising they could not give more precise details about how much pre-service
training they had received. Nevertheless, when asked to explain the content or style of
the training received, there were some more specific comments such as: “watched
teachers on DVDs”, “A speaker comes and then in groups we practice their teaching
methods.” Some people claimed to be self-taught: “Mainly gleaned from workshops and
using course books.” and another “… CELTA required a written paper on teaching it.
The rest has been basically self-taught.” There were also references to specific
universities, books, biographies, and authors.
German English teachers also feel that their training was not particularly satisfactory
with respect to pronunciation teaching (2.86). They provided a plethora of revealing
comments which raise several issues. First, some misunderstood the question in terms of
what counts as teacher training and mainly referred to university classes:
● “one semester of Language Lab exercises, a transcription class (one semester), a
lecture on English Pronunciation (one semester) - but I learned most during 10
months as an exchange student in Scotland (by 'doing')”
● “I studied at times of the former GDR that is why I didn't get much training and
can't express it in hours, etc. But I had an excellent phonetics teacher”
● “Phonetics classes at university consisting of transcription and theory (stress,
pitch etc.) and practical training to improve our own pronunciation”
There is also a widespread opinion that having good pronunciation is sufficient for
teaching pronunciation, however it may be acquired:
● “I went to study abroad, one year in Australia. Best pronunciation training ever”
● “None at all, but I lived in GB for a year”
● “Professors at the university and teacher trainers presumed that if one is able to
pronounce correctly, they will somehow be able to make the children pronounce
correctly, too”
Obviously, neither spending time in an English-speaking country nor having good
pronunciation oneself guarantees that one can teach pronunciation effectively. And while
there may be a lack of quality trainers in certain contexts (“Very little time was devoted
to teaching pronunciation, probably because one of the trainers spoke English with a
very heavy German accent”), in some cases respondents did report on practical
techniques they acquired during teacher training. In this, they were similar to many
respondents in Switzerland:
● “during teacher training: working in a language lab, listen and repeat exercises
(individual or in groups) with teacher or CD, ways of introducing new words and
their pronunciation, ways of controlling the correct pronunciation”
● “instructions on how to teach pronunciation to children in our 'Seminar' (teacher
training group)”
In their comments, 19 of the French respondents said they had very little or no training,
19 mentioned only the phonetics classes they received as undergraduates themselves and
9 mentioned training they had received at conferences, etc. which they had attended
since becoming teachers. It is possible that there was some confusion between
“phonetics” and “pronunciation”, as well as between the education received by
respondents as undergraduates and in their actual teacher training. However, a more
likely explanation is that the paucity of teacher training in pronunciation is so great that
for many respondents, the only experience on which they could draw was often their first
year phonetics and phonology lectures. In fact, very few respondents in France had
anything positive to say about their teacher training regarding the teaching of
pronunciation: “We had a few classes about the pronunciation of English, intonation etc.
but just the theory and no actual demonstration of how to teach them”. However, as one
aptly concluded: “knowing about something is certainly not the same as knowing how to
teach it”.
In Poland, few respondents (18.75%) said they had received formal training in
teaching pronunciation. In Spain, training was largely limited to one-year university
courses, and in one case two years. The quality, content and the practical application of
these courses in phonetics varied from university to university. Only 3 respondents had
received further training or taken subsequent courses after university; 27.77% of the
informants had received no or practically no formal training and a further 22.22%
described themselves as self-taught.
Macedonian teachers gave low ratings with regard to their training to teach
pronunciation, yet in their comments they highlighted the necessity of receiving good
training: “I believe the teacher should be very well trained in order to be good at
teaching pronunciation”. They reported that their first (and sometimes only) explicit
instruction in pronunciation was during their undergraduate course in English Phonetics
and Phonology: theoretical lectures on segmentals and prosody as well as various types
of activities for practicing phonetic symbols and phonemic transcription, English sound
formation and categorization, basic phonetic and phonological rules as well as different
types of intonation patterns. In several responses teachers referred to being self-taught;
additional training which they mentioned was related to English teaching in general and
not specifically pronunciation.
Similarly, when respondents from Finland were asked to describe the content and/or
style of their training, they listed very traditional pronunciation teaching methods:
phonetics and transcription, repetition and drills, discussion exercises, reading aloud, and
listening tasks. Training in the language lab was mentioned frequently, and some
mentioned a theoretical orientation, or that training had mainly consisted of lectures.
To conclude, limited or no specific training in teaching pronunciation seems to be the
norm, but non-native English speaker respondents have usually received training in
improving their own pronunciation.
5. Findings: Learners
This section covers teachers’ perceptions of their students, and more specifically of their
goals, skills, motivations and aspirations. The questions were as follows:
 Rate your awareness of your learners' goals. Please rate from 1 to 5, with 1 as “no
awareness” and 5 as “excellent awareness”.
 Please rate your awareness of your learners' skills. Please rate from 1 to 5, with 1
as “no awareness” and 5 as “excellent awareness”.
 Please rate from 1 to 5 how motivated you feel your learners are to speak English,
with 1 as “totally unmotivated” and 5 as “extremely motivated”.
 To what extent do you feel your students aspire to have native or near native
pronunciation of English? Please rate from 1 to 5, with 1 as “do not aspire to this
at all” and 5 as “aspire to this 100%”.
Interpretation of the results requires caution. For example, German English teachers
have the second lowest awareness of their students' goals (3.36) after France and the
lowest awareness of their students' skills (3.61). An interpretation along the lines of
reduced interest in the students or a reserved teacher-learner relationship would be an
overgeneralization that requires a more representative basis for comparison from the
other countries, as well as additional survey data, preferably learner-centred.
It would seem from the results that teachers in France are marginally more aware of
their students’ skills (3.98 on a scale from 1 to 5) than of their goals (3.77/5). If this is
the case, then the reasons are quite possibly cultural. The French academic grading
system is based on subtracting marks for errors from a total of 20 marks. In this way,
teachers are encouraged to search for weaknesses in their students, rather than for
strengths. As for the relative lack of awareness of learners’ goals, this may be due to
their irrelevance in France’s “top-down” society. France operates a national curriculum
in secondary schools and also in some tertiary institutions, so teachers are generally not
expected to take the needs of the learners into account themselves. Moreover, a certain
distance is maintained between teachers and their students, with the vous form and
Monsieur or Madame being used to address teachers at both secondary and tertiary
levels. Lastly, large class sizes do not help to encourage meaningful interaction between
students and their teachers; in universities modern language class sizes may run to 50 or
more. It would thus seem logical that teachers in France are less aware of their learners’
goals than in other countries – in fact we had expected the average to be even lower.
In Finland the difference between awareness of goals (3.58) and of skills (3.91) was
even more marked, but teachers’ further comments help to explain this, as well as the
fact that many of the Finnish respondents were teaching at primary level (29.1%) or
lower secondary (31.1%) in contrast to France, where the learners tend to be adults. Some
Finnish respondents referred to their own goals for the learners, e.g. “I know what their
goals should be,” but others mentioned learners having varied goals. Some teachers
working in the primary level seem to be of the opinion that young learners do not have
goals. When asked to comment on their awareness of learners’ skills, the most frequently
mentioned aspect by the Finnish respondents was lack of time or big groups. However,
respondents also stated it is the teacher’s duty to be aware of the learners’ skills and
The teachers in Switzerland showed a relatively high awareness of learners’ goals (4)
and claimed to have a slightly lower awareness of learners’ skills (3.75). However, they
felt that their students were highly motivated to speak English (4.25). This was the
highest response to this question and this may reflect the perceived importance of
speaking English in Switzerland today (see Dürmüller (2002), especially in higher
Teachers’ awareness of Learners’ goals
Finland (n=78)
France (n=52)
Germany (n=269)
Poland (n=14)
Spain (n=22)
Switzerland (n=16)
Table 5. Average results from item n°64: Rate your awareness of your learners' goals. Please
rate from 1 to 5, where 1 = “no awareness” and 5 = “excellent awareness”.
Teachers’ awareness of Learners’ skills
Finland (n=78)
France (n=52)
Germany (n=269)
Macedonia (n=14)
Poland (n=14)
Spain (n=21)
Switzerland (n=16)
Table 6. Average results from item n°65: Please rate your awareness of your learners' skills.
Please rate from 1 to 5, where 1 = “no awareness” and 5 = “excellent awareness”.
The two questions on learners’ general motivation to learn English and their aspiration to
achieve native-like pronunciation show that, overall, the former is greater than the latter,
in teachers’ estimations. In Poland, the low aspiration to sound native (2.71) is
understandable as most of the respondents teach children. German teachers' estimation
of the students' motivation to speak English (3.53) is the third lowest after Poland and
France, while their evaluation of the students' aspiration to sound native-like (2.94) is
Even though the Finnish respondents estimate their learners’ motivation to be quite
high (3.88 on average), the comments reveal very clearly that the learners have varied
levels of motivation; some are highly motivated whereas some show little interest. In
terms of aspirations, as indicated by teachers’ comments, learners in Finland opt for
intelligible communication in the target language rather than native-like pronunciation,
and here it seems that famous Finns such as motor sport heroes have shown them the
example: “Formula One drivers have proved to Finnish students it’s not necessary to
pronounce English perfectly to become rich and famous”.
It is not hard to interpret the results for Switzerland, with the highest average for
motivation to learn English (4.25) but a lower aspiration to sound native (3.38). It is a
country with four national languages, with many Swiss using English to communicate
with compatriots who speak a different language from themselves. Several Masters
courses are taught in English and many see English as essential for good job prospects,
but none of these reasons require native or near native pronunciation of English.
It would appear from the results that French learners are among the least motivated to
learn English. Respondents believe that their learners’ aspiration to achieve native or
near native pronunciation is relatively low (2.9/5)⁶. An explanation may lie in
institutional, linguistic and cultural factors. Firstly, many of the French respondents
teach partly or exclusively EAP &/or ESP, as learning a foreign language is a national
requirement in all disciplines at tertiary level in France; motivation and aspirations are
therefore often lower in language classes (Taillefer 2002). Secondly, it must be pointed
out that French and English are so very different phonologically (Hirst & Di Cristo
1998; Blum 1999; Vaissière 2002; Frost 2010), that even the least pragmatic French
learners of English know that native-like fluency is a very difficult goal. Thirdly, the
French traditionally attribute a relatively high importance to written texts, both when
learning their native language and foreign languages (Duchet 1991). This often translates
into difficulties acquiring the phonological system of a foreign language later on. And
finally, the French tend to equate “fluent” with “perfect”; therefore even communicative
competence is a sort of perfection that they might not dare to aspire to. In this way, they
may resemble the Spanish respondents, who had the lowest average (2.6) for aspiration
to sound native-like.
⁶ This would tally with the results for items 60 & 66 on the importance of English in relation to
other languages and motivation to speak English respectively, where French averages (though
above the median on the 1-5 Likert scale) were the lowest (4.48) and second lowest (3.4) of the
seven countries.
Students’ motivation to study English
Finland (n=78)
France (n=52)
Germany (n=269)
Macedonia (n=14)
Poland (n=14)
Spain (n=21)
Switzerland (n=16)
Table 7. Average results from item n°66: Please rate from 1 to 5 how motivated you feel your
learners are to speak English, where 1 = “totally unmotivated” and 5 = “extremely motivated”.
Students’ aspiration to achieve native level
Finland (n=78)
France (n=52)
Germany (n=269)
Macedonia (n=14)
Poland (n=14)
Spain (n=21)
Switzerland (n=16)
Table 8. Average results from item n°67: To what extent do you feel your students aspire to
have native or near native pronunciation of English? Please rate from 1 to 5, where 1 = “do not
aspire to this at all” and 5 = “aspire to this 100%”.
6. Findings: Models of English
At the end of the survey, four items covered models of English:
 For RECEPTIVE work (listening, reading), which variety(ies) or model(s) of
English do you use in your classes? You may choose more than one answer.
 “…” … do your learners generally prefer?
 For PRODUCTIVE work (speaking, writing), which variety(ies) or model(s) of
English do you use in your classes? You may choose more than one answer.
 “…” … do your learners generally prefer?
This was not a ranking item and participants could give multiple answers. For
example, 94.7% of respondents in Finland chose RP (Received Pronunciation) as the
variety they prefer to use for receptive work but this did not exclude them from choosing
other varieties, too. Only the data for the three most frequently selected reference accents
for reception and for production work will be discussed: RP (Received Pronunciation),
GA (General American) and IE (“a type of International English”).
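Because respondents could tick several varieties, each percentage is calculated over the number of respondents rather than over the number of ticks, so the figures for one country need not sum to 100%. The following minimal sketch illustrates that arithmetic. The response data and the function name `selection_rates` are invented for illustration and are not taken from the survey.

```python
from collections import Counter

# Hypothetical answers to a multiple-answer item: each respondent
# may tick one or more varieties. These are NOT survey data.
responses = [
    {"RP", "GA"},
    {"RP"},
    {"RP", "GA", "IE"},
    {"GA", "IE"},
]


def selection_rates(responses):
    """Return, for each variety, the percentage of respondents who ticked it."""
    counts = Counter(variety for answer in responses for variety in answer)
    n = len(responses)
    return {variety: 100 * count / n for variety, count in counts.items()}


rates = selection_rates(responses)
for variety in sorted(rates):
    print(f"{variety}: {rates[variety]:.1f}%")

# The rates can add up to more than 100% because answers overlap.
print(f"Sum of rates: {sum(rates.values()):.1f}%")
```

In this invented example three of the four respondents tick RP and three tick GA, so both varieties reach 75% at the same time. This is why a high percentage for RP does not imply a low percentage for any other variety.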
Throughout the countries, a clear discrepancy was found between which
varieties/models teachers use and which they think their students generally prefer.
Received Pronunciation (RP) is used by most teachers (receptive work RP: 91.63%, GA:
70.73%; productive work RP: 84.2%, GA: 53.84%). On the other hand, teachers indicate
that General American (GA) is preferred by students, but the difference is less clear-cut
(receptive work RP: 64.53%, GA: 66.69%; productive work RP: 55.24%, GA: 63.35%).
A type of “international variety” is also frequently mentioned by respondents for both
types of work, and as a variety they use and which their learners prefer.
Poland is the only country where 100% of teachers chose RP as the variety they use
for receptive and productive work. Anecdotal evidence shows that teachers use RP
materials in class partly because they are aware that their learners are exposed to, and
enjoy, GA through films and music. At university level, only Poznan offers a
choice of target variety; all the others use RP. Similarly, participants in Spain
overwhelmingly chose RP, such that their results are the highest overall (95%, 90% and
85%), except in the variety teachers chose for productive work (75%), where only the
French teachers chose RP less (65.38%).
One of the teachers in Switzerland commented thus on RP: “I don’t like the idea of
propagating the Queen’s English.” This would seem to be a native speaker luxury, as a
non-native teacher of English would probably never authorize themselves to say this. On
the whole, non-native teachers seem to prefer a clear reference point when teaching
English pronunciation and this is logically achieved by favouring one variety over the
other. In particular, this simplifies the assessment process. In Macedonia, for instance,
teachers favour using RP presumably because they were taught/trained in RP, the
reference model they are familiar and comfortable with. However, in the survey it
appears that they feel their students prefer General American. If pronunciation is
stereotypically thought to be the skill that is least prone to modification, it would be
interesting to explore teachers’ willingness (and/or ability?) to adapt their pronunciation
to the various demands of their learners (and not just in Macedonia). Another interesting
observation in the Macedonian data was the preference for so-called “global English”
where emphasis was placed on intelligibility as suggested in: “Global English means
global/ non-native pronunciation, and yet intelligible communication”.
The German data reveals that RP⁷ is still the variety teachers choose, for both receptive
(91.19%) and productive work (91.19%). General American, however, is a respectable
second, at least in reception (80.08%); the same is true in France (RP: 80.77%, GA:
78.85%). In productive work, GA's status clearly lags behind RP (at 67.82% in Germany
and 50% in France), but by comparison with other European countries Germany ranks
second after Macedonia (69.23%), the lowest being Spain (35%). Again, teachers in
Macedonia may be making a nod toward their perceived students’ preference for GA
(100%) in productive work.
In contrast to a clear preference for RP among teachers, the teachers' evaluation of
the students' preferences seems less clear cut in both Germany and France. The survey
suggests that RP and GA are almost equal alternatives for students in Germany, both in
receptive (RP: 72.41%, GA: 73.95%) and in productive (RP: 72.03%, GA: 68.97%)
work. The situation is arguably similar in France: reception (RP: 61.54%, GA: 57.77%)
and production (RP: 51.92%, GA: 44.23%). In addition to other major varieties of
English, the label “a type of international English” surprisingly ranks third with German
respondents for students’ preferences in production and reception, as well as for teachers'
in production work. In terms of teachers' receptive work IE was only rated sixth
(21.46%), close to Scottish English (24.90%) and Irish English (22.61%) but far behind
Australian English (37.93%). This is quite likely to mirror the fact that audio samples of
these varieties often accompany EFL textbooks.
In Finland, a substantial proportion of teachers also use other varieties/models,
particularly for receptive work. For example using Australian English was nearly as
popular as using “a type of international English” which came third most popular after
RP and GA. Irish English, Scottish English and Canadian English were all mentioned by
more than 20% of the Finnish respondents for receptive work. This is perhaps due to the
fact that, as in Germany, EFL textbooks’ audio CDs include different native and
non-native varieties (Tergujeff 2009, 2010b; Kopperoinen 2011).
In Switzerland, for receptive and productive work teachers favoured RP followed by
GA. This corresponded to their perceived preferences for their learners, although one
commented: “I believe learners need a pronunciation model of some description to
promote intelligibility but I am not interested in forcing them to acquire a particular
accent.” The perceived preference for RP among learners was stronger in productive
(81.25%) than in receptive work (68.75%). “A type of international English” was the third
preferred option.
To conclude, based on the results for items 75 to 78, it is clear that although RP is
still the dominant form for both reception and production work in English, GA seems to
be making inroads. The increased use of the Internet both in teaching and at home is
perhaps an important factor here. From the teachers’ point of view, it is easy enough to
find audio and video examples of varieties to use in a formal class setting, but perhaps
more important is on-line informal learning of English (Sockett, 2011). Web 2.0
technologies such as peer-to-peer file transfer and streaming have led to a previously
unthinkable ease of access to media content, such as films and TV series which, given
the American cultural hegemony in these domains, has led to European learners of
English being exposed increasingly to American rather than British varieties of English.
⁷ Many teachers possibly understand RP as being Southern British English, based on their
Therefore, in addition to textbook comparisons and classroom research, the role of such
informal influences surely merits more scrutiny. It is also not clear exactly
what IE refers to but, as the third most frequent choice, it deserves closer investigation.
Item n°75
(% of Ts, receptive work)
Item n°76
(% of Ts, productive work)
Table 9: Results for items n°75 & n°76: Percentage of teachers who chose a variety for
receptive & productive work
Item n°77
(% of Ss, receptive work)
Item n°78
(% of Ss, productive work)
Table 10: Results for items n°77 & n°78: Percentage of teachers who indicated their
students’ preference for a variety, for receptive & productive work
7. Conclusions
The findings of this study have shed light on a cross-section of current themes in
pronunciation teaching across Europe, as well as providing valuable aid for future
studies. The three areas we have focussed on in this paper are teacher training, aims and
objectives, and models/varieties.
Our findings suggest that teacher training in relation to the teaching of English
pronunciation is woefully inadequate, according to the majority of participants. If this is
true, Europe today is similar to the United States in the 1990s, where J.M. Murphy
(1997) found that less than 50% of MA TESOL programmes had modules devoted to
phonology. This lack of training does not match the emphasis placed on English
pronunciation in the Common European Framework of Reference (CEFR), where
‘Phonological Control’ is one of the descriptors in the Language Competence/Linguistic
category. Pronunciation is also considered one of the key elements in the speaking
component of major international English language proficiency tests such as IELTS,
TOEFL and TOEIC. In other words, the apparent lack of teacher training in
pronunciation is not representative of the requirements of English language learning, as
many highly-regarded assessment procedures specifically refer to phonology.
Another crucial issue concerns the choice of objectives: should one aim for
intelligibility and communicative competence and/or native-like pronunciation? The
respondents’ comments showed that the choice necessarily influences what teachers
actually do with learners to achieve those objectives and how they learn to do that. In
relation to such pedagogical dilemmas, the issue of informal learning must be addressed
(Sockett, 2011): if games and online content provide constant, repetitive exposure to
certain accents, what impact does this have on teachers’ choices for classroom time?
In terms of varieties, RP is preferred by teachers though they do recognize that GA
might be more popular amongst students (except in Switzerland). The term
“International English”, a popular choice across the seven countries, also deserves
clarification: what characterizes it? who uses it in which situations? how should this
influence our teaching? and so forth. This issue also raised the importance of locally
produced – or at least relevant – materials, as well as addressing the environment outside
the classroom in ESL/EFL contexts. In her study of adult ESL in Ireland, D. Murphy
(2011) found that while pronunciation was regarded as a valuable element of English
language learning, little innovation in teaching practice was observed. Particularly
problematic was the discrepancy between the model of English pronunciation being used
by teachers, and the model on which materials were based. Arguably in some teaching
contexts there is a parallel mismatch between materials and context when non-native
English speakers, who might feel most comfortable teaching RP, are faced with a set of
youngsters who, obsessed with American games or TV series, have adopted American
accent features.
The survey presented in this paper is a pilot study, and as such, will be improved and
expanded on in further work. Certain items will need reworking, and certain themes will
need developing. Participation levels were sometimes uneven across the countries,
leading to the abandoning of data from Ireland, the Netherlands and Sweden. In
discussions it has become clear that in some contexts a paper-based survey might have
been more successful. Distribution was uneven within countries, with certain areas being
over-represented, e.g. the Francophone areas of Switzerland dominate the Swiss results,
and in France there are few participants from secondary schools or the private sector.
This means that sometimes it has not been possible to make certain cross-country
comparisons as we would not be comparing like with like.
The perspectives for further research are vast. Most importantly, the rest of the data
(e.g. concerning teaching conditions, methodology, technology, etc.) will be analysed
and follow-up phone interviews will be carried out. It would also be useful to compare
the data with learner surveys, shedding light on some of the more ambiguous findings.
Above all, we would like to use the experience we have gained from this collaborative
project by continuing to explore how varieties of English are chosen, taught and
perceived across Europe.
We would like to express heartfelt thanks to all of the teachers who took the time to
participate in this survey, including those in countries which were not covered by this
Donovan, P.J. 2001. Making Pronunciation a Priority for EFL Teachers and Learners. In
Levey, D., Losey, M.A. & González, M.A. (Eds.). English Language Teaching
Changing Perspectives in Context. Cádiz: Universidad de Cádiz (Servicio de
Publicaciones). 245-249.
Duchet, J-L. 1991. Code de l'anglais oral. Paris: Ophrys.
Dürmüller, U. 2002. English in Switzerland: From Foreign Languages to Lingua Franca.
In D.Allerton, P.Skandera, C.Tschichold (Eds), Perspectives on English as a World
Language, Basel: Schwabe. 114-123.
Foote, J.A., Holtby, A.K. & Derwing, T.M. 2011. Survey of the Teaching of
Pronunciation in Adult ESL Programs in Canada, 2010. TESL Canada Journal 29(1),
Frost, D. 2010. Stress cues in English and French: a perceptual study. Journal of the
International Phonetic Association, 41(01): 67-84.
Gonet, W., Szpyra-Kozłowska, J. & Święciński, R. 2010. Clashes with ashes – The
acquisition of vowel reduction by Polish students of English. In Waniek Klimczak,
E. (Ed.) Issues in Accents of English 2. pp. 213-232. Newcastle upon Tyne:
Cambridge Scholars Publishing.
Hilgendorf, Suzanne K. 2007. English in Germany: Contact, Spread, and Attitudes.
World Englishes. 26.2: 131-148.
Hirst, D. & A. Di Cristo (Eds.). 1998. Intonation Systems: A Survey of Twenty
Languages. Cambridge: Cambridge University Press.
Janicka, K., M. Kul, & J. Weckwerth. 2005. Polish students’ attitudes to native English
accents. In Dziubalska-Kołaczyk, K. & Przedlacka, J. (Eds.) English pronunciation
Models: A changing scene. Bern: Peter Lang. pp. 251-292.
Kangasvieri, T., E. Miettinen, P. Kukkohovi & M. Härmälä. 2011. Kielten tarjonta ja
kielivalintojen perusteet perusopetuksessa. Muistiot 2011:3. Helsinki: Finnish
National Board of Education.
Kopperoinen, A. 2011. Accents of English as a lingua franca: a study of Finnish
textbooks. International Journal of Applied Linguistics 21(1), 71–93.
Levey, D. 2001. Stressing Intonation. In Harris, T., Roldan, I., Sanz, I. & Torreblanco,
M. (Eds.) ELT2000: Thinking Back, Looking Forward. Granada: Greta. 35-45.
Levey, D. 1999. Half Truths and White Lies: A Practical Pronunciation Guide for
Spanish Speakers. In Harris, T. & Sanz, I. (Eds.) ELT: Through the Looking Glass.
Granada: Greta. 215-226.
Lintunen, P. 2004. Pronunciation and Phonemic Transcription: A study of advanced
Finnish learners of English. Turku: University of Turku.
Murphy, D. 2011. An investigation of English pronunciation teaching in Ireland. English
Today, 27(4), 10-18.
Murphy, J.M. 1997. Phonology courses offered by MA TESOL programs in the US.
TESOL Quarterly 31, 741-761.
Paunović, T. 2009. Plus ça change... Serbian EFL students’ attitudes towards varieties of
English. Poznań Studies in Contemporary Linguistics, 45(4), 511-533. Retrieved
from http://versita.metapress.com/content/6563h54842u858nm/fulltext.pdf
Pavon, V. 2001. El Papel del Profesor en la Enseñanza de la Pronunciación. In Levey,
D., Losey, M.A. & González, M.A. (Eds.). English Language Teaching Changing
Perspectives in Context. Cádiz: Universidad de Cádiz (Servicio de Publicaciones).
Pavon, V. & Rosado, A. 2003. Guía de Fonética y Fonología para Estudiantes de
Filología Inglesa en el Umbral del Siglo XXI. Granada: Comares
Sajavaara, K., Luukka, M-R. & Pöyhönen, S. 2007. Kielikoulutuspolitiikka Suomessa:
Lähtökohtia, ongelmia ja tulevaisuuden haasteita. In Pöyhönen, S. & Luukka, M-R.
(Eds.), Kohti tulevaisuuden kielikoulutusta. Kielikoulutuspoliittisen projektin
loppuraportti. University of Jyväskylä: Centre for Applied Language Studies.
Sockett, G. 2011. From the cultural hegemony of English to online informal learning:
Cluster frequency as an indicator of relevance in authentic documents, Asp, la revue
du GERAS, 60, 5-20.
Statistisches Bundesamt (www.desatis.de). 2003a. Bildung und Kultur.
Allgemeinbildende Schulen [Education and Culture. Schools of general Education].
Fachserie 11, Reihe1. Wiesbaden.
deSchulen2110100037004,property=file.pdf; retrieved 2012-02-13]
Statistisches Bundesamt (www.desatis.de). 2011a. Bildung und Kultur.
Allgemeinbildende Schulen [Education and Culture. Schools of general Education].
Fachserie 11, Reihe1. Wiesbaden.
deSchulen2110100117004,property=file.pdf; retrieved 2012-02-13]
Statistisches Bundesamt (www.desatis.de). 2003b. Bildung und Kultur. Berufliche
Schulen [Education and Culture. Vocational schools]. Fachserie 11, Reihe1.
n2110200047004,property=file.pdf; retrieved 2012-02-13]
Statistisches Bundesamt (www.desatis.de). 2011b. Bildung und Kultur. Berufliche
Schulen [Education and Culture. Vocational schools]. Fachserie 11, Reihe1.
n2110200117004,property=file.pdf; retrieved 2012-02-13]
Taillefer, G. 2002. L’anglais dans les formations spécialisées à l’Université :un cheveu
sur la soupe? Peut-on rendre le plat plus appétissant ?, ASP 37-38, 155-172.
Tergujeff, E. 2009. Accent addition in Finnish EFL textbooks. Presentation given at the
3rd International conference on native and non-native accents of English, 12.12.2009
Lodz, Poland.
Tergujeff, E. 2010a. Pronunciation teaching materials in Finnish EFL textbooks. In
Henderson, A. (Ed.), English Pronunciation: Issues and Practices (EPIP):
Proceedings of the First International Conference. June 3–5 2009, Université de
Savoie, Chambéry, France. Université de Savoie: Laboratoire LLS.
Tergujeff, E. 2010b. Model pronunciation in Finnish EFL textbooks. Presentation given
at NAES-FINSSE 2010: English in the North, 11.6.2010 Oulu, Finland.
Tergujeff, E. (in print). English pronunciation teaching: Four case studies from Finland.
Journal of Language Teaching and Research, 43.
Tergujeff, E., Ullakonoja, R. & Dufva, H. (2011). Phonetics and Foreign Language
Teaching in Finland. In Werner, S. & Kinnunen, T. (eds.), XXVI Fonetiikan päivät
2010. Joensuu, Finland: University of Eastern Finland. 63–68.
Nowacka, M. 2010. The ultimate attainment of English pronunciation by Polish college
students: A longitudinal study. In Waniek-Klimczak, E. (Ed.) Issues in Accents of
English 2. Newcastle upon Tyne: Cambridge Scholars Publishing. 233-260.
Vaissière, J. 2002. Cross-Linguistic Prosodic Transcription: French versus English.
Problems and Methods in Experimental Phonetics, In honour of the 70th anniversary
of Prof. L.V. Bondarko, N. B. Volskaya, N. D. Svetozarova and P. A. Skrelin. St. Petersburg. 147-164.
Waniek-Klimczak, E. & K. Klimczak. 2005. Target in speech development: learners’
views. In K. Dziubalska-Kołaczyk & J. Przedlacka (Eds.) English pronunciation
Models: A changing scene. Bern: Peter Lang. 229-250.
Research in Language, 2012, vol. 10.1
DOI 10.2478/v10015-011-0043-8
Faculty of Humanities and Social Sciences, University of Zagreb1
Faculty of Economics and Business, University of Zagreb2
This paper deals with the attitudes of Croatian speakers to ELF, in particular to its
pronunciation. Four methods were combined to reach conclusions about the status of ELF
in Croatia: diary study, teacher interviews, a preliminary focus group interview and a
survey. Whilst the first three methods revealed that the subjects regularly disfavour ‘bad
pronunciation’, the survey showed that when it actually comes to talking to either native
or non-native speakers, the subjects turned out to be tolerant of a slight accent. This
clearly suggests a case of what is known as linguistic schizophrenia (B.B. Kachru 1977;
Seidlhofer 2001). However, there are notable differences among groups of participants
depending on variables such as professional profile, gender, degree of ease and success in
learning pronunciation, and national pride. In any case, the combination of these methods
proved to be a very good way to deal with the topic. The diary study is a valuable method
to look into everyday practices and can feed nicely into survey questions. The preliminary
survey highlighted the importance of different groups of participants and the need for
groups of questions focusing around different factors. The preliminary focus group
interview showed that it is crucial to have a single homogenous group of participants, as
well as a trained facilitator. Finally, teacher interviews pointed to the possibility of similar
attitudes being held by university teachers and the students they teach, which suggests that
attitudes may be perpetuated. Overall, triangulation across methods and participants in the
way proposed in the present paper provided a wealth of data, allowing a bottom-up view
and a top-down view on the state of ELF in Croatia.
1. Introduction
Pronunciation of English as a Lingua Franca (ELF) first appeared as a research-based
construct: based on a corpus of International English conversations, Jenkins (e.g. Jenkins 2000;
Jenkins 2002) postulated the existence of core features (those required for intelligibility)
and non-core features (those not required for intelligibility). Intelligibility was defined in
terms of non-native speaker interactions (e.g. Jenkins 1998:121). In other words, in an
international communication setting, features of English pronunciation such as pre-fortis
clipping and aspiration were shown to be crucial in assuring understanding, whereas
features such as qualitative vowel reduction or weakening were shown not to be crucial
in this respect (Jenkins 2002). At a time when the Inner Circle – Outer Circle debate had
just ended (Kachru 1991; Kachru 1996), and the debate about the ownership of English
was still in full swing (Widdowson 1994; Firth and Wagner 1997), this was bound to be
a controversial issue (Jenkins 2002:101; Jenkins 2007; Jenkins 2009). What started as a
fundamentally applied-linguistics concept which was meant to add “an intelligibility
dimension to communicative competence” and promote “accommodation skills”
(Jenkins 2002:101) proved to be highly controversial, primarily because of attitudes
towards pronunciation.
It is hardly any wonder that attitudes are crucial when pronunciation is at issue. We
tend to judge people by their (foreign) accent, as is well known from the famous
matched guise research (Lambert 1967). As listeners we tend to prefer historically
powerful over historically less powerful groups based on their pronunciation
(Lindemann 2005), and we tend to prefer the in-group vs. the out-group (Dailey 2005).
Our self-concept as speakers is correlated with our “objective” pronunciation performance
(Chuming 2004), suggesting that affective factors underlie pronunciation performance.
Different motivations might also be at play: if we learn English because we like how it
sounds (results for Croatia from Mihaljević Djigunović 1991; Mihaljević Djigunović
2007), we might want to learn to sound like native speakers. Some people may want to
keep their national identity, which might be reflected in their English pronunciation
(Stanojević and Josipović Smojver 2011). Others might be simply influenced by their
English teachers, who tend to prefer a native-like pronunciation in various ways (Sifakis
and Sougari 2005; Jenkins 2006; Drljača Margić and Širola 2009; Stanojević and
Josipović Smojver 2011).
Given that attitudes are crucial in ELF, a variety of issues need to be taken into
consideration in order to find out about the state of ELF in a country such as Croatia.
Firstly, potential differences in attitudes towards ELF among different groups of ELF
speakers (e.g. according to age, gender, etc.) should be investigated. Secondly, data
about actual pronunciation practices of these ELF speakers should be included to see
whether (and to what extent) pronunciation practices and attitudes correspond. Thirdly,
we should investigate the attitudes of English teachers towards ELF to see if they
correspond to the attitudes of ELF speakers they teach. All this calls for a research model
which allows top-down confirmatory investigation and bottom-up exploratory research,
as well as using a variety of quantitative and qualitative methodologies to gain a
balanced insight into the issues at hand (cf. e.g. Gorard 2004). In other words, we argue
for a model that allows triangulation across groups of participants and methodologies
(we suggest the following procedures: language diaries by ELF speakers, teacher
interviews, focus group interviews, recordings of ELF speakers, and a questionnaire on
attitudes towards ELF).
In this paper we will provide the rationale behind these procedures and give
preliminary results of combining language diaries, focus group interviews, a pilot
questionnaire and teacher interviews. We will discuss what they reveal about the state of
ELF in Croatia, and how they work together methodologically. The paper starts with a
discussion of the ELF situation in Croatia, and the methodological rationale. The third
section presents the results, followed by a discussion and conclusion.
2. The ELF situation in Croatia and tools for ELF studies
Croatia has a rich tradition of research into Teaching English as a Foreign Language
(TEFL), but only a few studies on the status of ELF. TEFL studies (for an overview cf.
Vilke 2007) into the attitudes of secondary school learners in Croatia suggest that they
are dissatisfied with teacher-centred approaches to teaching (Mihaljević Djigunović
2007:124–125). This coincides with motivation research: secondary school pupils report
that they want to learn English so as to communicate with others (Mihaljević Djigunović
1991:195) in various ways, e.g. via the Internet, talking to foreigners, using email
(Narančić Kovač and Cindrić 2007:71–72). This may mean that secondary school pupils
are indeed willing to be independent users of ELF. The situation with university students
in Croatia seems to be a bit more complex – a recent study (Stanojević and Josipović
Smojver 2011) has found a clear divide between “liberal” students (ones who do not
disfavour a foreign accent when speaking to others, and who do not necessarily want to
work on their pronunciation), and more “traditional” ones (who do). Expectedly, the
more “traditional” students are primarily English majors (cf. also Drljača Margić and
Širola 2009) whereas, for instance, business majors tend to be more liberal (Kabalin
Borenić 2011). However, corresponding differences in the attitudes towards ELF were
also evident among men and women, participants living in urban or rural environments
and participants who assess themselves as more or less proficient pronouncers
(Stanojević and Josipović Smojver 2011). Thus, other factors such as identity
construction may be at play (cf. Josipović Smojver and Stanojević, in press). In order to
find out what these factors might be and how this relates to actual Croglish
pronunciation practices in Croatia (cf. Josipović Smojver 2010), we argue for combining
several different methods. We propose the use of language diaries, teacher
interviews, focus group interviews, a questionnaire and (focus group) recordings. This
selection of methodologies enables triangulation in the sense of a qualitative-quantitative
mix, top-down and bottom-up views, as well as checking for attitudes and actual
pronunciation practices.
Diary studies are a good starting point, because they are exploratory in nature (Bailey
1991:61), provide access to learner introspections (cf. their use in learner strategy
research; Richards 2009:157), and promote reflection (Allwright and Bailey 1991). They
are a good choice at the outset of this study, because they will give us access into a range
of possible attitudes towards ELF, tapping into an emic perspective that might otherwise
be outside our reach as researchers. This should allow us to include the emic perspective
when constructing the questionnaire about ELF attitudes.
Focus group interviews are a way to continue the emic perspective and to move away
from individual attitudes, because they can tap into group meanings and norms (Bloor et
al. 2001:17). They should be conducted with a relatively homogeneous group of
participants discussing a particular topic so as to help understand it. The discussion
should be focused, and led by a skilful moderator (Krueger and Casey 2000:10). The
method has been used in market research for some time (Greenbaum 1998), and has
been gaining momentum in social research as well (Bloor et al. 2001). It has not been
extensively used in studying attitudes of speakers of foreign languages (Ho 2006), or
indeed ELF (cf. Gerritsen and Nickerson 2009:188; one exception is Grau 2009). Focus
groups are well suited for ELF research, because they are a useful interpretative aid
when survey results are available (Bloor et al. 2001:17) and a valuable triangulation tool
(cf. Cohen, Manion, and Morrison 2007:377). Moreover, focus group interviews are
normally recorded, which may be a source of pronunciation data. We envisage a
threefold use of focus groups. Firstly, we hope to tap into group attitudes on ELF
pronunciation and use, which will help us get clearer insights into the trends visible from
the diary studies. Secondly, we will use focus groups to help us understand the results of
the ELF questionnaire, as a way of tapping into the emic perspective. Finally, focus
group recordings will be a source of objective data about English pronunciation. In order
to get a relatively natural setting for speaking English in a relatively monolingual
environment such as Croatia, we plan to use two facilitators who do not speak Croatian.
Three practical issues need to be taken into consideration here: sampling,
training the facilitators and procedures for analyzing the recorded pronunciations. We
cannot go into them in detail here.
The interview is a technique which allows a more in-depth look into individual
factors that may come up (as opposed to focus group interviews which investigate group
attitudes). It has been used time and again in ELF research with ELF speakers (Erling
and Bartlett 2006) and teachers (Jenkins 2005; Jenkins 2007; Jenkins 2009; Trent and
Lim 2010). We propose non-structured interviews with teachers of English at
universities across Croatia. Some recent survey-based studies in Croatia have found that
future teachers of English are not really open to teaching ELF (Drljača Margić and Širola
2009; Josipović Smojver and Stanojević, in press), which is in line with Jenkins’ results
saying that teachers of English are ambivalent towards ELF (Jenkins 2007). By talking
to Croatian teachers of English in academic settings, we hope to gain a deeper insight
into these issues and possible reasons behind them. Importantly, however, we will be
looking at whether teacher attitudes are reflected in the attitudes of ELF speakers.
Finally, we envisage the use of a questionnaire on the attitudes towards ELF, which
will give us quantitative results. There are a number of general and practical issues
involved in questionnaire use in education research – from the way in which a
questionnaire is constructed to its administration (cf. e.g. Cohen, Manion, and Morrison
2007, 317–348; Dörnyei 2010). In this study, we have decided on using a pilot with a
number of closed questions regarding the attitudes to ELF on three groups of
participants. The results of the pilot feed into the focus group interviews (where we ask
for comments on some of its results), as well as the construction of the final
questionnaire (which will again be piloted).
Overall, we believe that this makes for a good mix of methodologies, giving a
reasonably comprehensive view of the state of ELF in Croatia. It provides
methodological triangulation because: (1) it combines attitude research with actual
recordings of ELF; (2) it allows exploration as well as confirmation; (3) it brings
together quantitative as well as qualitative data analysis; and (4) it looks into the
attitudes of teachers as one of the possible “takes” on what is going on with ELF
speakers. In the next section we will present and discuss some of our results in the
application of this research architecture.
3. Results
Diary study
The purpose of the diary study was to explore the attitudes of individual ELF speakers
towards their English use in everyday situations. The participants were asked to keep a
diary for seven days and reflect on the following issues: how they and their
conversational partners used English that day, which aspect of their English use might
have stood out, why (or why not), and how they felt when using English. The
participants were volunteers, who had attended a class on Business English taught by the
second author. They were given a book for participating in the study. We sent out 15
invitations, and got back diaries from four participants (three male and one female, all in
their early or mid twenties). The low return rate was expected – although the participants
were alerted to the possible benefits of using a diary study (e.g. better awareness of their
use of English), they were no longer attending classes, and their internal (and external)
motivation seems not to have been sufficient. We performed a qualitative analysis of the
diary entries.
The results show that English was used as a matter of course in a variety of everyday
situations with native and non-native speakers of English. The results were particularly
enlightening with regard to: the use of English as a code-switching practice, the use of
English with other native and non-native speakers, and their attitudes towards English
pronunciation.
The participants used English as part of their everyday Internet conversations (e.g.
chat), mostly by code switching from Croatian to English (in the words of one of the
participants: “I would use a phrase such as Hello, What’s up or Bye from time to time”).
All of the participants consider this type of code-switching an everyday practice, which
they believe everyone does at their age (“this is an everyday choice – I think in a mixture
of English and Croatian, and I frequently think of an English expression first, plus I am
certain that my conversational partner will understand me”). This is not strictly speaking
an ELF use, but English code switching has been noted in different countries, and in a variety
of registers (McClure 1998). When it comes to computer-mediated communication, it
might be an identity-building practice which affirms group identity and communality (cf.
e.g. Androutsopoulos 2004; Leppänen 2007). Perhaps this is reflected in responses such
as “I believe that most young people use English when communicating via chat” or “it
has become normal to use [English], especially among young people”, where
participants refer to themselves as “young people” which might be the identity they want
to build.
The reported ELF use ranges from online chat with other non-native and native
speakers to speaking English with native speakers face-to-face and to formal writing in
English. When faced with an “unplanned” face-to-face conversation with a non-native
speaker, one of the participants reports that she felt “surprised and taken aback, but later
[her] speech became more fluent”. The participants who used ELF in online chats and
forums do not report such a feeling: “I used English as I do it every day, there was
nothing special about it” or “My choice of English was a matter of course, because for
many people on the forum English is their native tongue”. They may not
report surprise because of increased control (you can choose whether you want to
enter a chat or a forum and when) and familiarity with their conversational partners (they
referred to them as “friends” and “acquaintances”), making the situation less stressful.
The participants do not seem to give much thought to their own English
pronunciation. Three participants constantly report being particularly aware of
“grammatical accuracy”, “syntax” and “spelling, for instance not being careful with
capitalizing when using chat”, and a single participant mentions that he paid attention to
his pronunciation on two occasions. This is not surprising, keeping in mind that most of
their ELF use is written rather than spoken. Still, when reporting on the speech of others,
all four participants mention pronunciation. For instance, when talking to a tourist face-to-face,
one participant noticed that “he pronounced things wrong, because English
wasn’t his mother tongue, which made him difficult to understand”. When pronunciation
is not “incorrect”, it remains unnoticed: “I do not pay attention to the accent and
grammatical accuracy of my acquaintances, because all of them speak English well, the
communication flows without problems, and they pronounce English well”. As for
particular accents, only British and American English are mentioned, American English
being the norm: “British English, ... is not so usual for me; I usually listen to American
English” or “I like the sound of American English much better than British English”.
The results show that English is used in code-switching and in talking to native and
non-native speakers. The participants notice the pronunciation of their conversational
partners when they pose communication problems or are different from what they are
used to. Methodologically, the data suggest that the final questionnaire should include
questions concerning the situations when English is used, and particular English accents.
Still, a larger sample of diaries from a variety of participants would be needed before
the results can be generalized.
Preliminary survey
The preliminary survey was conducted on a sample of 2498 participants from throughout
Croatia, most of whom were university students (58.6%); the rest were
secondary school pupils (25.5%) and employees in a large international company
(15.9%). Most of the participants were female (67.9%). They were given an anonymous
questionnaire in Croatian, which contained 31 items (16 on a 5-point Likert scale and the
remaining offering a selection of several options). Four questions dealt with attitudes to
the regional pronunciation of Croatian. Seventeen questions dealt with attitudes to
English (beliefs about the importance of fluency, grammar and pronunciation, attitudes
towards one’s own pronunciation of English when speaking to native and non-native
speakers, attitudes towards learning English pronunciation, beliefs about the ease of
understanding non-native speakers, and attitudes towards (non-)native teachers of
English). The remaining questions dealt with participant data (for details on the
questionnaire cf. Stanojević and Josipović Smojver 2011 and Josipović Smojver and
Stanojević, in press). The aim of the questionnaire was to explore whether there were
any links between the participants’ characteristics (e.g. pupil vs. student vs. employee;
liberal vs. traditional attitudes towards Croatian; gender; and self-assessed proficiency)
and the way one perceives one’s own accent, the accent of one’s conversational partners
and teaching models.
The results show that most participants find pronunciation important (89% agree or
strongly agree that “correct pronunciation” is important), and 67% of participants agree
that perfecting English pronunciation so as to pass for a native speaker is a worthwhile
endeavour, regardless of the time and effort it would take. Still, most believe that some
foreign accent is okay when talking to native or non-native speakers of English: most
participants would not mind having a strong or slight accent when talking to native
speakers (76.1%) or non-native speakers (82.3%). Native speakers are not preferred as
teachers of English pronunciation (M = 3.24, SD = 1.23), and the pronunciation of non-native
conversational partners is not preferred over that of native conversational partners (M =
2.97, SD = 1.32).
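The descriptive figures above (percentage of agreement, M and SD on a 5-point Likert scale) can be illustrated with a minimal sketch. The responses below are hypothetical and are not data from the survey. The function name `summarize_likert` is ours, and treating answers of 4 or 5 as agreement and reporting the sample standard deviation are assumptions, since the paper does not state how these values were computed.

```python
from statistics import mean, stdev


def summarize_likert(responses):
    """Summarize answers to one 5-point Likert item.

    Assumes 1 = strongly disagree ... 5 = strongly agree, and counts
    answers of 4 or 5 as agreement (an assumption, not the paper's rule).
    """
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("responses must be integers from 1 to 5")
    agree_share = sum(1 for r in responses if r >= 4) / len(responses)
    return {
        "M": round(mean(responses), 2),      # arithmetic mean
        "SD": round(stdev(responses), 2),    # sample standard deviation
        "agree_pct": round(100 * agree_share, 1),
    }


# Hypothetical answers to a single item, for illustration only.
example_item = [5, 4, 4, 3, 2, 5, 1, 3, 4, 2]
summary = summarize_likert(example_item)
print(summary)
```

A mean close to the scale midpoint of 3, as with the two means reported above, indicates that neither agreement nor disagreement prevails on the item.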
As expected, ANOVA showed that there were significant differences between
secondary school pupils, university students and company employees on all six
questions: the attitude towards perfecting their pronunciation so as to pass for a native
speaker (F(2,2480) = 31.94, p < .001), the importance of pronunciation when speaking
(F(2,2353) = 8.76, p < .001), the acceptability of foreign accent when talking to native
speakers (F(2,2476) = 4.67, p = .009) and non-native speakers (F(2,2470) = 7.57, p <
.001), the belief that native speakers are better teachers of pronunciation than non-native
speakers (F(2,2444) = 15.48, p < .001), and the preference for non-native speakers as
conversational partners (F(2,2485) = 8.81, p < .001). Generally, company employees
tend to be on one end of the scale and pupils/students on the other. Scheffe’s post-hoc
test showed that employees scored significantly lower than pupils and students on
wanting to perfect their pronunciation, scored significantly higher on wanting native
speaker teachers, scored significantly higher on disfavouring a foreign accent with non-native
speakers, and scored significantly higher on preference for non-native speakers as
conversational partners. Scheffe showed no differences in disfavouring a foreign accent
when talking to native speakers (all groups score rather high on disfavouring a foreign
accent). Although all participants agree that correct pronunciation is important, Scheffe’s
post hoc test showed that secondary school pupils scored significantly lower than
university students (with employees in the middle).
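To make the notation F(df1, df2) used above more concrete, the following is a minimal sketch of a one-way ANOVA computed by hand. The scores are hypothetical, the function name `one_way_anova` is ours, and the sketch assumes a standard between-groups design; it does not reproduce the software or settings used in the study. It returns only the F statistic and its degrees of freedom; a p-value would additionally require the F distribution (available, for instance, through `scipy.stats.f_oneway`), and the Scheffé post-hoc comparisons are not shown.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k = len(groups)          # number of groups
    n = len(all_scores)      # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical Likert scores for three participant groups.
pupils = [4, 5, 4, 5, 3]
students = [4, 4, 5, 3, 4]
employees = [2, 3, 2, 3, 2]

f_stat, df_b, df_w = one_way_anova([pupils, students, employees])
print(f"F({df_b},{df_w}) = {f_stat:.2f}")
```

Under this design the second degrees-of-freedom value equals the number of valid responses minus the number of groups, so a result such as F(2,2480) would correspond to 2483 valid answers from three groups. This also explains why the second value differs slightly from question to question: not every participant answered every item.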
Participants who strongly disagreed that ideal Croatian pronunciation should be
regionally unmarked (i.e. they have a “liberal” attitude towards Croatian pronunciation),
generally had a more liberal attitude towards English pronunciation. ANOVA showed
that there were significant differences between groups on four of the six questions: the
importance of pronunciation when speaking English (F(4,2344) = 8.35, p < .001), the
acceptability of foreign accent when talking to native (F(4,2468) = 15.08, p < .001) and
non-native speakers (F(4,2461) = 13.99, p < .001), and the attitude towards perfecting
their pronunciation so as to pass for a native speaker (F(4,2470) = 2.73, p = .028).
Scheffe’s post-hoc test showed that participants with a liberal attitude towards Croatian
scored significantly lower than all or most other participants on the importance of
English pronunciation and disfavouring one’s foreign accent when talking to native or
non-native speakers of English. There were no significant differences between groups
with regard to preferring native speakers as pronunciation teachers, and preferring non-native
conversational partners.
Gender differences were found on five of the six questions. Women scored
significantly higher than men on the importance of pronunciation when speaking English
(t(2345) = 4.06, p < .001), the acceptability of foreign accent when talking to native
(t(2469) = 3.53, p < .001) and non-native speakers (t(2462) = 2.71, p = .007), the attitude
towards perfecting their pronunciation so as to pass for a native speaker (t(2472) = 8.71,
p < .001), and preferring non-native conversational partners (t(2475) = 3.70, p < .001).
There were no significant differences between men and women on preferring native
speakers as teachers of pronunciation.
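The gender comparisons are reported as t(df) values. A minimal sketch of an independent-samples t-test with pooled variance is given below. The scores are hypothetical and the function name `pooled_t_test` is ours. That the study used the pooled-variance (equal variances) form is an assumption, although the whole-number degrees of freedom reported above are consistent with it. The same test is available as `scipy.stats.ttest_ind`, which also returns a p-value.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_test(sample_a, sample_b):
    """Return (t, df) for two independent samples, assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = (mean(sample_a) - mean(sample_b)) / standard_error
    return t_stat, df


# Hypothetical Likert scores on one item, for illustration only.
women = [5, 4, 5, 4, 4, 5]
men = [3, 4, 3, 4, 2, 4]

t_stat, df = pooled_t_test(women, men)
print(f"t({df}) = {t_stat:.2f}")
```

A positive t value here means that the first group (women) scored higher on average. With this form of the test, the degrees of freedom equal the combined number of respondents minus two.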
Finally, ANOVA showed that there were significant differences between participants
on all six questions with regard to how they assessed their own pronunciation: the
importance of pronunciation when speaking English (F(4,2337) = 44.09, p < .001), the
acceptability of foreign accent when talking to native (F(4,2460) = 39.48, p < .001) and
non-native speakers (F(4,2454)=28.09, p < .001), the attitude towards perfecting their
pronunciation so as to pass for a native speaker (F(4,2464) = 5.12; p < .001), the belief
that native speakers are better teachers of pronunciation than non-native speakers
(F(4,2429)=8.23, p < .001), and preferring non-native over native conversational
partners (F(4,2468) = 49.66, p < .001). Scheffe’s post-hoc test showed that speakers who
rate their pronunciation as poor score significantly lower than all other groups on the
importance of a correct pronunciation in English, wanting to perfect their pronunciation,
disfavouring a foreign accent when talking to native and non-native speakers, and on
wanting a native speaker to teach them pronunciation. Scheffe showed that participants
who rated their pronunciation as excellent or very good scored significantly lower on
preferring native speakers as conversational partners. Scheffe showed no differences
between groups on the attitude towards perfecting one’s pronunciation so as to pass for a
native speaker.
These results suggest that all of the explored parameters – participant profile,
attitudes towards Croatian, gender, and self-assessed proficiency – may influence the way
in which one perceives the importance of English pronunciation, the acceptability of
foreign accent when talking to native and non-native speakers and the attitude towards
perfecting one’s pronunciation so as to pass for a native speaker. Significant differences
in the attitudes towards native vs. non-native teachers appeared only when learner status
was at issue (i.e. among students/pupils vs. employees, and different groups according to
self-assessed pronunciation proficiency), but not among groups according to gender or
the attitude towards Croatian. Significant differences in the attitudes towards non-native
conversational partners were present only between participants who had different
attitudes towards English, but not between subjects who had different attitudes to
Croatian. This suggests that the attitudes towards ELF may include several components
(e.g. one’s own actual pronunciation practice vs. teaching and learning pronunciation),
and may be related to two different sources of more or less liberal attitudes – those
referring to one’s own status in the learner-speaker continuum, and those referring to
other sociophonetic factors (e.g. gender, attitudes towards one’s native language).
Preliminary results of teacher interviews and focus group interviews
In addition to the two studies reported on above, we conducted three interviews with
university lecturers of English, and three focus group interviews with business majors
attending the Faculty of Economics and Business. They were used to obtain preliminary
results and get the feel for the methods at hand.
Three semi-structured interviews were conducted by the third author with university
lecturers of English as a Foreign Language, one teaching English teacher majors, one
engineering majors, and one business majors. The interviews dealt
with the teachers’ beliefs about teaching pronunciation, appropriate models, ELF
pronunciation, and with how they think their students regard pronunciation. The aim of
the interviews was to see to what extent the attitudes of actual English teachers coincided
with the results of the preliminary questionnaire.
There were differences between the three participants depending on where they
teach. The participant teaching engineering majors believes that, when international
communication is at issue, pronunciation is a “means to an end”, which should be taught
only when serious misunderstanding might occur. The lecturer teaching business majors
believes that pronunciation is important for her students, in the sense that when they
communicate with others they might be judged by their pronunciation. The lecturer
teaching future teachers of English believes that pronunciation is paramount. All three
participants believe that their students hold the same views. When appropriate teaching
models are discussed, native models come to the fore, and all three participants explicitly
mention British and American English. As one of the participants says, British English
has a special status in Croatia, “because it used to be preferred in my education, and my
entire teaching career seems to have been revolving around it, but I am well aware of
American English as well”. All three agree that American English is the model of choice
among their students, and that students in general (at least on the declaratory level)
prefer native models. Finally, all three participants are keenly aware of the ELF
pronunciation as being present to various extents in international communication. They
accept it up to a point: when communication needs to be achieved, ELF might be an
okay choice, but certainly not “when future teachers of English are concerned” (who
should strive towards a native model). All three participants believe that ELF should
certainly not be a teaching model. One of them fears that “language might disintegrate”
because of this.
The results were somewhat expected – lecturers teaching students of different
profiles seem to be in touch with their students’ attitudes which followed from the
questionnaire (e.g. English majors going for native-like pronunciation, or engineering
majors going for understandability). Of course, the issue is whether these attitudes might
be perpetuated by the teachers themselves (cf. Stanojević and Josipović Smojver 2011).
On the methodological level, it is clear that valuable data can be obtained by using this
method, and that, given a larger sample, this data may supplement the data obtained
from the students.
Three focus group interviews were conducted with a group of business majors, as a
part of another unrelated study by the second author. The participants were asked to
comment on two findings: that most business majors prefer to talk to native speakers and
that they want to improve their pronunciation so as to pass for a native speaker. The
results show that the native speaker is seen as an authority figure by the members of the
focus groups, in the words of one participant: “I will learn more from a native speaker,
the non-native speaker’s mistakes might rub off on me”. The authority of native speakers
(American English is preferred by the students) is no doubt connected with the prestige
of native accents: “you might be ashamed of your bad accent, you might be the laughing
stock of others [if you speak] Russian English or French English”. Or: “People perceive
your speech as worse if you have a foreign accent, regardless of correctness or fluency”.
Thus, imitating a native variety might be a point of pride (“It is a challenge”; “I feel
good when I can do it”). Finally, the reasons behind going for native accents might also
be issues of understanding: half of the focus group participants believe that it is easier to
understand native speakers.
As we hoped, the results of the focus group interviews provided a detailed account of
the reasons behind the answers of the business majors on the questionnaire, and
highlighted the need for multiple focus groups for different groups of participants. On a
practical level, this first attempt at using a focus group made it clear that better results
might be expected if training is provided for the facilitator, and if the focus group is
conducted in a more informal atmosphere, which is in accordance with the practical
suggestions from the literature (Krueger and Casey 2001). Moreover, it illustrated the
possible difficulty of creating a relaxed atmosphere vs. the need to make recordings that
are of sufficiently high quality to be phonetically analyzed. These issues still remain to be
addressed.
4. Discussion, conclusion and outlook
The results concerning the state of ELF in Croatia suggest several things. Firstly, there is
a clear case of linguistic schizophrenia (B. B. Kachru 1977; Seidlhofer 2001): one
should take time to study “proper” (i.e. native-like) pronunciation (cf. teacher interviews,
the survey), and a “bad pronunciation” is always noticeable (diaries, teacher interviews,
the survey, the focus group). Still, when it comes to talking to native or non-native
speakers, a slight accent is acceptable (the survey). Attaining a native-like accent may be a good
reason for particular pride (the focus group). Overall, there are clear differences in the
attitudes towards pronunciation between different groups of participants: pupils, students
and employees, men and women, better and worse pronouncers and participants with
different attitudes towards Croatian (the survey). We need to get different groups of
participants to do diaries, take part in focus groups and teacher interviews, which would
shed light on these differences. Our focus on business majors showed that they use
English as part of their everyday life, that they notice the pronunciation of others
(diaries), and that native varieties for them are a source of prestige (the focus group). In a
study which looked into the differences among university students with different majors
in Croatia (Stanojević and Josipović Smojver 2011), business majors tended to be in the
middle of the scale (in between, e.g. students majoring in engineering and English). In
the light of this finding, we expect different results from diary studies and focus groups
with different participants.
It was rewarding to see that the various methods work well together, and that the
proposed triangulation may be a good way to gather extensive data, which will
(eventually) correspond to each other in different ways. We learned that the diary study
is a valuable method to look into the everyday practices, and that it may feed into
questions in the questionnaire concerning the everyday practices (such as internet use,
and communication in English). The preliminary survey highlighted the importance of
different groups of participants, and constructing groups of questions focusing around
different factors. The preliminary focus group interview showed that it is crucial to have
a single homogeneous group of participants, as well as a trained facilitator. We must
consider and try out the idea of recording the focus group to obtain actual pronunciation
data – and confirm the discrepancy between the actual pronunciation and attitudes.
Teacher interviews pointed to the possibility of similar attitudes being held by university
teachers and the students they teach, which may indicate that attitudes are perpetuated.
Overall, triangulation across methods and participants in the way proposed here provided
a wealth of data, allowing a bottom-up view and a top-down view on the state of ELF in
Croatia. What remains to be seen is how these data on attitudes towards ELF will relate
to actual pronunciation practices.
References
Allwright, Dick, and Kathleen M. Bailey. 1991. Focus on the Language Classroom: An
Introduction to Classroom Research for Language Teachers. Cambridge University
Androutsopoulos, Jannis. 2004. Non-native English and Sub-cultural Identities in Media
Discourse. In Den Fleirspråklege Utfordringa, ed. Helge Sandøy, Endre Brunstad,
and Jon Erik Hagen, 83–98. Oslo: Novus.
Bailey, Kathleen M. 1991. Diary Studies of Classroom Language Learning: The
Doubting Game and the Believing Game. In Language Acquisition and the
Second/Foreign Language Classroom, ed. Eugenius Sadtono, 60–102. Singapore:
SEAMEO Regional Language Center.
Bloor, Michael, Jane Frankland, Michelle Thomas, and Kate Robson. 2001. Focus
Groups in Social Research. London, Thousand Oaks, New Delhi: Sage Publications.
Chuming, Wang. 2004. A Study on the Relationship between English Pronunciation
Self-concept and Actual Pronunciation. Foreign Language World (5).
Cohen, Louis, Lawrence Manion, and Keith Morrison. 2007. Research Methods in
Education. London; New York: Routledge.
Dailey, R. 2005. Language Attitudes in an Anglo-Hispanic Context: The Role of the
Linguistic Landscape. Language & Communication 25 (1): 27–38.
Dörnyei, Zoltán. 2010. Questionnaires in Second Language Research: Construction,
Administration, and Processing. 2nd ed. New York: Routledge.
Drljača Margić, Branka, and Dorjana Širola. 2009. (Teaching) English as an
International Language and Native Speaker Norms: Attitudes of Croatian MA and
BA Students of English. English as an International Language Journal 5: 129–136.
Erling, Elizabeth J, and Tom Bartlett. 2006. Making English Their Own: The Use of
ELF Among Students of English at the Free University of Berlin. Nordic Journal of
English Studies 5 (2): 9–40.
Firth, Alan, and Johannes Wagner. 1997. On Discourse, Communication, and (Some)
Fundamental Concepts in SLA Research. The Modern Language Journal 81 (3):
Gerritsen, Marinel, and Catherine Nickerson. 2009. BELF: Business English as a Lingua
Franca. In The Handbook of Business Discourse, ed. Francesca Bargiela-Chiappini,
180–192. Edinburgh: Edinburgh University Press.
Gorard, Stephen. 2004. Combining Methods in Educational and Social Research.
Maidenhead: Open University Press.
Grau, Maike. 2009. Worlds Apart? English in German Youth Cultures and in
Educational Settings. World Englishes 28 (2): 160–174.
Greenbaum, Thomas L. 1998. The Handbook for Focus Group Research. Thousand
Oaks, London, New Delhi: Sage Publications.
Ho, Debbie. 2006. The Focus Group Interview: Rising to the Challenge in Qualitative
Research Methodology. Australian Review of Applied Linguistics 29 (1).
http://www.nla.gov.au/openpublish/index.php/aral/article/view/1914. Accessed on
Feb 23, 2012.
Jenkins, Jennifer. 1998. Which Pronunciation Norms and Models for English as an
International Language? ELT Journal 52 (2): 119–126.
Jenkins, Jennifer. 2000. The Phonology of English as an International Language: New
Models, New Norms, New Goals. Oxford: Oxford University Press.
Jenkins, Jennifer. 2002. A Sociolinguistically Based, Empirically Researched
Pronunciation Syllabus for English as an International Language. Applied Linguistics
23 (1): 83–103.
Jenkins, Jennifer. 2005. Implementing an International Approach to English
Pronunciation: The Role of Teacher Attitudes and Identity. TESOL Quarterly 39 (3):
Jenkins, Jennifer. 2006. Current Perspectives on Teaching World Englishes and English
as a Lingua Franca. TESOL Quarterly 40 (1): 157–181.
Jenkins, Jennifer. 2007. English as a Lingua Franca: Attitude and Identity. Oxford
Applied Linguistics. Oxford University Press.
Jenkins, Jennifer. 2009. English as a Lingua Franca: Interpretations and Attitudes. World
Englishes 28 (2): 200–207.
Josipović Smojver, Višnja. 2010. Foreign Accent and Levels of Analysis: Interference
between English and Croatian. In Issues in Accents of English 2: Variability and
Norm, ed. Ewa Waniek-Klimczak, 23–35. Newcastle: Cambridge Scholars
Josipović Smojver, Višnja, and Mateusz-Milan Stanojević (in press) Stratification of
English as a Lingua Franca: Identity Constructions of Learners and Speakers. In
Teaching and Researching English Accents in Native and Non-native Speakers, ed.
Ewa Waniek-Klimczak and Linda Shockey. Springer.
Kabalin Borenić, Višnja. 2011. Attitudes to English and EFL Motivation in Croatian
University Students of Business - Results of a Pilot Research Study. In UPRT 2010:
Empirical Studies in English Applied Linguistics, ed. Magdolna Lehmann, Réka
Lugossy, and József Horváth, 135–151. Pécs: Lingua Franca Csoport.
Kachru, Braj B. 1977. Linguistic Schizophrenia and Language Census: A Note on the
Indian Situation. Linguistics 15 (186): 17–32.
Kachru, Yamuna. 1991. Speech Acts in World Englishes: Toward a Framework for
Research. World Englishes 10 (3): 299–306.
Kachru, Yamuna. 1996. Kachru Revisits Contrasts. English Today 12 (1): 41–44.
Krueger, Richard A., and Mary Anne Casey. 2000. Focus Groups: a Practical Guide for
Applied Research. Thousand Oaks, London, New Delhi: Sage Publications.
Krueger, Richard A., and Mary Anne Casey. 2001. Designing and Conducting Focus
Group Interviews. In Social Analysis: Selected Tools and Techniques, ed. Richard A.
Krueger, Mary Anne Casey, Jonathan Donner, Stuart Kirsch, and Jonathan N.
Maack, 4–23. Social Development Papers 36. Washington: Social Development
http://siteresources.worldbank.org/SOCIALANALYSIS/11048901120158652972/20566697/SDP-36.pdf#page=10. Accessed on Feb 23, 2012.
Lambert, Wallace E. 1967. A Social Psychology of Bilingualism. Journal of Social
Issues 23 (2): 91–109.
Leppänen, Sirpa. 2007. Youth Language in Media Contexts: Insights into the Functions
of English in Finland. World Englishes 26 (2): 149–169.
Lindemann, Stephanie. 2005. Who Speaks ‘broken English’? US Undergraduates’
Perceptions of Non-native English. International Journal of Applied Linguistics 15
(2): 187–212.
McClure, Erica. 1998. The Relationship between Form and Function in Written National
Language - English Codeswitching: Evidence from Mexico, Spain and Bulgaria. In
Codeswitching Worldwide, ed. Rodolfo Jacobson, 125–150. Berlin, New York:
Mouton de Gruyter.
Mihaljević Djigunović, Jelena. 1991. Nastava engleskog jezika i motivacija za učenje.
Unpublished Ph. D. dissertation, Zagreb: University of Zagreb.
Mihaljević Djigunović, Jelena. 2007. Croatian EFL Learners’ Affective Profile,
Aspirations and Attitudes to English Classes. Metodika 8 (14): 115–126.
Narančić Kovač, Smiljana, and Ivana Cindrić. 2007. English Language Needs of
Croatian Students. Metodika 8 (14): 68–83.
Richards, Keith. 2009. Trends in Qualitative Research in Language Teaching Since
2000. Language Teaching 42 (02): 147–180.
Seidlhofer, Barbara. 2001. Closing a Conceptual Gap: The Case for a Description of
English as a Lingua Franca. International Journal of Applied Linguistics 11 (2): 133–
Sifakis, Nicos C., and Areti-Maria Sougari. 2005. Pronunciation Issues and EIL
Pedagogy in the Periphery: A Survey of Greek State School Teachers’ Beliefs.
TESOL Quarterly 39 (3): 467–488.
Stanojević, Mateusz-Milan, and Višnja Josipović Smojver. 2011. Euro–English and
Croatian National Identity: Are Croatian University Students Ready for English as a
Lingua Franca? Suvremena lingvistika 37 (71): 105–130.
Trent, John, and Jenny Lim. 2010. Teacher Identity Construction in School-university
Partnerships: Discourse and Practice. Teaching and Teacher Education 26 (8): 1609–
Vilke, Mirjana. 2007. English in Croatia - a Glimpse into Past, Present and Future.
Metodika 8 (14): 17–24.
Widdowson, Henry. 1994. The Ownership of English. TESOL Quarterly 28 (2): 377–
Research in Language, 2012, vol. 10.1
DOI 10.2478/v10015-011-0048-3
University of Rzeszów
This article is an attempt to review the most recent phonetic literature on the application
of questionnaires in phonetic studies. In detail, we review the scope of pronunciation
questionnaire-based surveys with respect to Polish and non-Polish students of English. In
addition, this paper aims to examine European students’ beliefs and attitudes towards their
own English pronunciation and is also intended to provide some arguments for or against
the use of foreign-accented rather than native models of pronunciation in phonetic
The data come from three groups of informants, namely: Italian, Spanish and Polish
students of English. With respect to foreign, non-Polish respondents, the study was
conducted at the University of Salento in Italy and the University of Vigo, Spain, within
the framework of the Erasmus Teacher Mobility Programme in two consecutive academic
years: i.e. 2010/2011 and 2011/2012. As regards Polish respondents, our research
involved subjects from six different tertiary schools, i.e. five universities and one college,
located in various parts of Poland.
On balance, the results of our study give an insight into the phonetic preferences of
adult European advanced students of English with reference to the importance of good
native-like pronunciation, the aims of pronunciation study, factors contributing to
phonetic progress and their self-study pronunciation learning strategies. Our findings
point to the fact that students of English wish to speak with good pronunciation, set a high
native-like standard for themselves, report having benefited from their phonetic
instruction and exposure to native English and that they work on their pronunciation by
means of various, mostly cognitive, strategies.
Rather than casting new light on teaching pronunciation, the outcome of this study is
consistent with the findings of other research on foreign students’ choice of preferred
pronunciation model, which is undeniably native rather than foreign-accented.
1. Introduction: the outline of questionnaire-based studies
A common method of eliciting learners’ judgments on various aspects of language
teaching and learning is the use of questionnaires. In the phonetic literature a wide array
of questionnaires concerning pronunciation can be found. Although it is claimed that
they are not reliable, since they present the respondents’ subjective opinions and
judgments about the situation rather than the bare facts themselves, they are a frequently
used assessment tool as they provide valuable feedback to teachers. Anybody willing to
make use of an opinion survey should consult Dörnyei’s (2003) and Presser et al.’s
(2004) publications about the nature, the merits and the shortcomings of questionnaires.
In addition, Dörnyei (2003) discusses their construction and administration and the
processing of questionnaire data. Moreover, Presser et al. (2004), apart from covering
topics of current research, examine practical interests in questionnaire survey
methodology and sampling.
Thus, numerous publications present the results of such surveys of opinions. For the
purpose of this analysis we have examined about fifty questionnaire-based pronunciation
studies and divided them into two groups, i.e. firstly, the surveys that focus on
international informants and then the ones that concern Polish respondents exclusively.
The former studies, conducted on the international scene, concentrate on different
aspects of pronunciation education, researching, for instance:
• attitudes to pronunciation in EFL¹ (Porter and Garvin 1989);
• attitudes to foreign accent or native-likeness in the L2; pronunciation self-evaluation (Hammond 1990);
• the importance of ‘good pronunciation’ (Kenworthy 1990);
• phonology in teacher training courses (Bradford and Kenworthy 1991);
• factors affecting pronunciation learning (Edwards 1992 as cited in Barrera Pardo
• knowledge of English pronunciation, motivation and self-awareness (Celce-Murcia et al. 1996);
• the content of phonology courses in the USA (Murphy 1997);
• motivation in pronunciation (Dalton and Smit 1997);
• students’ awareness of the difficulty and importance of English pronunciation;
influential factors in the acquisition of pronunciation; attitudes towards English
accents (Cenoz and Garcia-Lecumberri 1999);
• teaching intonation among EFL practitioners (Roads 1999);
• proclaimed and perceived wants and needs among Spanish teachers of English
(Walker 1999);
• pronunciation learning styles (Basso 2000);
• the effectiveness of teaching pronunciation to Malaysian TESL students
(Rajadurai 2001);
• pronunciation views and practices of reluctant teachers in Australia (MacDonald
• native speaker norms and International English (Timmis 2002);
• learners’ ethnic group affiliation and L2 pronunciation accuracy; native-like
nonaccented L2 speech (Gatbonton et al. 2005);
• links between pronunciation teaching, EIL and the sociocultural identity of non-native speakers of English; awareness of EIL-related matters (mutual
intelligibility in non-native to non-native communication) (Sifakis and Sougari
• international students’ attitudes towards English pronunciation and the
comparison of Euro-English with the Lingua Franca Core (Bryła 2006);
• students’ evaluation of learner corpora in L2 prosody research and teaching (Gut
• perception of foreign accent by native and non-native speakers (Vishnevskaya
• personality traits (extroversion, empathy etc.) and pronunciation talent in L2
acquisition (Hu and Reiterer 2009);
• musicality and the phonetic language aptitude (Nardo and Reiterer 2010);
• pronunciation preferences for phonological variation among linguistically trained
and untrained respondents (Benrabah 2010);
• native and non-native perception of foreign-accented speech (Nowacka 2010);
• students’ attitude toward pronunciation: the perceived utility of pronunciation,
level of confidence and interest in pronunciation, teachers’ views and practices
with regard to pronunciation instruction (Yeou 2010);
• English pronunciation teaching practices in European countries/survey
(Henderson, in press; Henderson and Frost et al. in press);
• pronunciation identity constructions of learners and speakers among Croatian
students (Josipović Smojver and Stanojević, in press);
• the phonetic needs of French EFL students (Nasser-Eddine 2011);
• students’ metacognitive awareness; pronunciation learning strategies (Murphy, in
• EFL pronunciation attitudes: standard Croatian, self-assessment of English
pronunciation, perceived role in the exchange (Stanojević et al., in press);
• the changing attitudes to accents in professional discourse of learners of ESP
(Tyurina and Koltzova, in press);
• French students’ familiarity with, and attitudes towards, other foreign accents in
English (Scheuer, in press);
• and teaching pronunciation in EFL classes (Luke [nd]), to give some examples of
such studies.
¹ All the abbreviations included in this paragraph are explained here: EFL – English as a
Foreign Language, L2 – second language, TESL – Teaching English as a Second Language, EIL
– English as an International Language, ESP – English for Specific Purposes.
Some questionnaires have been administered solely to Polish students of English in
order to examine their views on different aspects of phonetic instruction. The most
frequently discussed issue concerns the teaching and learning of English phonetics at
schools of higher education, i.e. universities and colleges (Waniek-Klimczak 1997;
Dziubalska-Kołaczyk et al. 1999; Sobkowiak 2002; Wysocka 2003; Wrembel 2005) as
well as at secondary schools (Szpyra-Kozłowska et al. 2002; Wrembel 2002).
Other fields of interest within phonetics comprise:
• students’ attitudes to teaching suprasegmental phonetics on the basis of authentic
texts (Pospieszyńska and Wolski 2003);
• the role of metacompetence in the acquisition of FL phonology (Wrembel 2003);
• phonetic transcription (Ciszewski 2004);
• students’ judgments of the English pronunciation model (Szpyra-Kozłowska
• the goals of L2 pronunciation instruction; subjects’ attitudes to native speaker
varieties and their perception of speech with disturbed rhythm (Janicka 2005);
• phonetic learning preferences in relation to field dependence and independence
(Baran 2006);
• features which condition success in the acquisition of English phonetics (Gonet
• the use of the language laboratory in modern pronunciation pedagogy (Szpyra-Kozłowska et al. 2006);
• pronunciation learning strategies with a focus on advanced students (Pawlak
2006, 2008, 2010, in press);
• attitudes to native English accents as models for EFL pronunciation (Janicka et
al. 2008);
• pronunciation self-evaluation (Nowacka 2008);
• target in speech development: the choice of model, accent preferences, the
attainment of native-like accent, the role/importance of pronunciation as a
subskill in communication (Waniek-Klimczak and Klimczak 2008);
• and attitudes to male and female voices (Szpyra-Kozłowska and Pawlak 2010).
To sum up, a wide array of pronunciation-related topics have been researched with the
use of questionnaires. Thanks to the data collected in them, teachers and researchers can
formulate some generalizations about, for instance, students’ phonetic preferences,
which are the centre of attention in this analysis.
2. Experimental design
In this section the aims of the study, the questionnaire design, respondents and
questionnaire administration are presented.
The aims
Although, as has been demonstrated in the preceding section, students’ views on English
pronunciation have been studied in several questionnaires, for the purpose of this
analysis we felt it necessary to examine international, i.e. Italian, Spanish and Polish,
students’ phonetic preferences. We examine and compare four aspects of their
pronunciation teaching and learning, namely: the importance of good native-like
pronunciation, the aims of pronunciation study, factors contributing to phonetic progress
and self-study pronunciation learning strategies.
There is also a secondary aim to this study, namely to provide an argument in the
discussion about changing models of pronunciation, e.g. EIL/LFC on the basis of
students’ preference for or disregard of native standards. This intention was triggered by
Remiszewski’s (2008) call for such investigations:
“The debate [how to teach pronunciation in the EFL classroom] must embrace the attitudes
and beliefs of the learner. Paradoxically, proposals centered around LFC are claimed to be
designed for the learner’s benefit, but at the same time we still know so little about the
learner’s actual point of view. This must change, as the data which are already available
show that a more thorough analysis of learners’ motivations and beliefs can cast some
new light on the discussed problem. As for now, the picture is far from complete.”
(Remiszewski, 2008: 307)
Questionnaire design
The questionnaire was designed for the purpose of my PhD in the year 2004. It
contained seven questions, both open and closed, which initially the Polish informants
were asked to answer. The results concerning the first four questions are reported below
and then followed by a discussion of several pertinent issues that emerged from their
Respondents and questionnaire administration
A total of 157 students of English from three different countries, i.e. Italy, Spain³ and Poland, took
part in this project. The Polish students made up the majority (58%), the remaining 42%
was shared by the Italians (24%) and the Spanish (18%).
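The composition figures above can be checked with simple arithmetic. The sketch below is an illustration only, not part of the original study: it assumes the group sizes reported in this section (91 Polish, 38 Italian and 28 Spanish respondents), and the function and variable names are hypothetical.

```python
# Illustrative check of the sample composition reported in this section.
# Group sizes come from the text; all names here are hypothetical.

def shares(counts):
    """Return each group's share of the total, rounded to whole percent."""
    total = sum(counts.values())
    return {group: round(100 * n / total) for group, n in counts.items()}

respondents = {"Polish": 91, "Italian": 38, "Spanish": 28}

total = sum(respondents.values())
percentages = shares(respondents)

print(total)        # 157
print(percentages)  # {'Polish': 58, 'Italian': 24, 'Spanish': 18}
```

The rounded shares match the 58%, 24% and 18% given in the text, and the Italian and Spanish groups together account for the remaining 42%.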
The data on the Italian respondents were collected at the University of Salento,
Lecce, in the south of Italy, in April 2011. The informants were all second year students
of the Faculty of Modern Arts (Facoltà di Lettere Moderne). Most of these 38
participants were female (35), with a mean age of 20.5. They had been learning English
for about 11.5 years and their proclaimed level of advancement in English was on the
whole intermediate (87%).
As regards the survey administration in Spain, in October 2011, the questionnaire
was conducted with 28 second year students of the University of Vigo, in the northwestern part of Spain, in the Faculty of Translation. Females constituted the majority
(68%). The students’ mean age was 20. Their declared length of studying English was 14
years and they mostly regarded themselves as upper-intermediate (61%) and advanced
students (36%). Thus, their level of proficiency was one stage higher than that of the Italian respondents.
When it comes to the Polish informants, the data were gathered in the year 2004.
Unlike previous studies of this kind, our research involved subjects from six different
tertiary schools (five universities and one college), located in various parts of Poland in
Kraków, Lublin, Łódź, Poznań and Sosnowiec and also at the college in Rzeszów.⁴ A
total of 91 Polish tertiary school students of English, who were randomly selected at the
respective centres, participated in the study. They are regarded as a homogeneous
group as all of them were final year students of English. University students (62) were in
the majority, constituting 68% of the subject population under study, while college
subjects (29) were in the minority, i.e. 32%. These informants reflect the student
population of English at tertiary schools quite well since female students (70, 77%)
outnumbered their male counterparts (21, 23%) as they usually do.
² The experiment, which consisted of a written questionnaire and a recording of reading and
spontaneous speech, was conducted during the summer term, over a period of two months, from
March to April 2004. In this article only some written data are discussed. The analysis of the
recording, the students’ self-evaluation of their pronunciation, and native and non-native ratings of
the subjects’ phonetics are presented in Nowacka (2008).
³ The visits took place within the Erasmus Teacher Mobility Programme.
⁴ The experiment was held at the following universities: Jagiellonian University in Kraków (14
respondents), Maria Curie-Skłodowska University in Lublin (11), The University of Łódź (10),
Adam Mickiewicz University in Poznań (12) and The University of Silesia in Sosnowiec (15)
along with one college, the Teacher Training College of Foreign Languages in Rzeszów (29).
To recap, we present the results of the questionnaire conducted with 157 subjects of
three nationalities in order to formulate some conclusions on international students’
phonetic wants and needs.
3. Results and discussion
This section discusses the results corresponding to each questionnaire statement in the
order in which they appeared in the survey.
3.1. Importance of good English pronunciation
At first, the respondents were asked to take a stance on the problem expressed in the
following statement “It is important for me to have good English pronunciation.” Their
task was to gauge its importance on a 5-point scale, i.e. “strongly agree – agree –
undecided – disagree – strongly disagree”. The notion of ‘good’ was not defined as it was
the informants’ task to decide what it meant for them. In this respect the survey has
confirmed the obvious, which can be seen in Figure 1.
Figure 1: Statement 1: “It is important for me to have good English pronunciation.”
Almost all respondents (98%) have positive beliefs regarding the importance of speaking
English with good pronunciation. To be more precise, 69% strongly agreed with this
statement; the rest (29%) chose a more moderate option by ticking the answer ‘I agree’
while the remaining 2% chose the ‘undecided’ or ‘disagree’ options.
After choosing an answer, the subjects were to give reasons for their choice. To
justify their opinion, the informants supplied arguments which can be grouped into three
major categories. According to some of them, it is important to have good English
pronunciation in order to: sound like a native/near-native speaker, to be clearly
understood/to communicate successfully/to avoid misunderstandings as well as to be a
good model for students as a teacher, and to clients as an interpreter, in the future.
To conclude this section, it should be stated that in general, a positive picture
emerges from this set of responses since nearly all students of English consider it
important to speak English with good pronunciation. In general, the reasons for such an
opinion are as follows: they wish to sound native-like, want to be clearly understood or
simply feel that good pronunciation should be part and parcel of their professional
3.2. Aims of the pronunciation study
The second questionnaire point sought to obtain the respondents’ opinions on the
following statement: “Students should aim for native English pronunciation.”⁵ Figure 2
shows the obtained results.
Figure 2: Statement 2: “Students should aim for native English pronunciation.”
It is clear that the majority of students (89%) agree with this statement, with 31% opting
for ‘strongly agree’ and 58% for ‘agree.’ The remaining 8% are undecided as to whether
native English should be a goal of pronunciation education, and 3% disagree with such
an idea. Thus, in all likelihood we can predict that most of them would aspire to the
native or near-native model of pronunciation in their speech.
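How the five response options collapse into these summary figures can be sketched as follows. This is an illustration only: the percentages are those reported for Statement 2, the text does not say how the 3% of disagreement divides between ‘disagree’ and ‘strongly disagree’, so the two are kept as a single category here, and all names are hypothetical.

```python
# Percentages reported in the text for Statement 2.
# 'disagree' and 'strongly disagree' are merged because the text gives
# only their combined share (an assumption of this sketch).
statement_2 = {
    "strongly agree": 31,
    "agree": 58,
    "undecided": 8,
    "disagree (incl. strongly disagree)": 3,
}

def summarise(responses):
    """Collapse the response options into agreement, undecided and disagreement."""
    agreement = responses["strongly agree"] + responses["agree"]
    undecided = responses["undecided"]
    disagreement = sum(responses.values()) - agreement - undecided
    return {
        "agreement": agreement,
        "undecided": undecided,
        "disagreement": disagreement,
    }

summary = summarise(statement_2)
print(summary)  # {'agreement': 89, 'undecided': 8, 'disagreement': 3}
```

The three collapsed categories add up to 100%, which confirms that the reported figures are internally consistent.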
Additionally, in order to see whether native-like pronunciation rather than EIL is
favoured by students, we rephrased the aforementioned statement so that it referred
to the informants’ own choice of pronunciation: “I attempt to speak with native
English pronunciation.”⁶ Figure 3 presents the obtained results, which show that the
majority of the students, i.e. 86% of the Spanish and 84% of the Italians, wish to speak with
native pronunciation. No statistically significant differences between the examined
nationalities can be found here.
⁵ As in question 1, the same 5-option continuum was used to obtain responses.
⁶ This statement was tested only with Italian and Spanish subjects.
Figure 3: Statement: “I attempt to speak with native English pronunciation” tested with
Italian and Spanish respondents.
To sum up, it should be noted that the majority of the students in this study maintain that
they aim for native English pronunciation. What we have learnt from the respondents’
justifications is that they assume that native-likeness should be the target for language
specialists, whereas other learners’ pronunciation should be intelligible enough to allow them
to communicate. We have also noted a few voices stating that native-like pronunciation
increases one’s chances of finding a good job in the European Union, and one dissenting
voice saying that accent-free English speech deprives a foreigner of his/her own identity.
3.3. Factors contributing to phonetic progress
Responses to question 3 were to supply information on the factors which have a major
influence on the informants’ pronunciation. Figure 4 summarises the results.
Figure 4. Response to question no. 3: What factors have contributed to improving your English
pronunciation most?
As can be seen from Figure 4, ‘listening to authentic English’ (88%) is claimed to be the
factor which has contributed most to improving students’ pronunciation.
The ranking of the remaining factors, from the most to the least useful, is as follows:
practical phonetics classes (58%), contacts with native speakers (57%), imitating
authentic speech (56%), self-study on pronunciation (33%), a stay in an English-speaking
country (31%), primary/secondary school English teachers’ classes (29%) and
‘Descriptive Grammar’⁷ classes (19%).
3.4. Pronunciation self-study and pronunciation learning strategies (henceforth PLS)
The next questionnaire task, expressed in question 4, Have you ever worked on
improving your pronunciation on your own outside the classes? was intended to reveal
whether or not the respondents have ever made a self-initiated conscious effort at
improving their pronunciation outside phonetic training at their tertiary school. The
obtained figures are encouraging, with ¾ (76%) of the respondents claiming to have
worked on pronunciation on their own, and only ¼ (24%) admitting that they have never
done so.
Those who acknowledge self-practice of pronunciation were further asked to reveal
how they do it. The respondents report having used a wide variety of self-study
techniques since they list as many as 37 different strategies. A lot of these techniques are
very similar and might be grouped into more general categories of the traditional ‘listen
and repeat’ type. Most students specify more than one form of self-practice (averaging
1.6). The most popular PLSs mentioned by students are: reading aloud to oneself (9%),
listening to and imitating authentic speech (8%), drilling difficult words and utterances,
making use of transcription and checking the pronunciation of words in dictionaries.
To classify PLS we found it convenient to follow the taxonomy created by Pawlak
(2010), thanks to which cognitive, metacognitive, social and affective strategic devices
could be distinguished. It turned out that cognitive strategies were the most frequently
applied by our subjects (27 PLSs). According to Pawlak (2010: 195), “(…) the group of
cognitive PLS is by far the most elaborate, both with respect to the sheer number of
strategic devices and their specificity, which is fully warranted by the fact that it contains
actions and thoughts which are directly involved in studying and practising target
language pronunciation, thus constituting the core of the whole classification scheme.”
The cognitive strategies were then followed by 7 metacognitive techniques8, namely:
‘recording oneself’, ‘practising pronunciation of separate words and sounds’, ‘recording
oneself on a tape and then listening and making corrections’, ‘self-monitoring’,
‘listening to pronunciation (paying attention to it while listening to authentic English)’,
‘recording BBC news and then recording oneself and comparing one’s pronunciation
with the original’, ‘writing down a tapescript with a focus on unfamiliar sounds and
words’. Among the responses there were also 3 social strategies such as ‘talking to a
non-native speaker who knows the language and has better pronunciation than me’,
7 By ‘Descriptive Grammar classes’ I meant the theory of phonetics and phonology.
8 Some of these ‘metacognitive’ strategies overlapped to some extent with ‘cognitive’ ones.
‘talking to other students’ and ‘attending conversation classes with American native
speakers’. Not even one respondent pointed to affective strategies which involve such
things as rewarding and/or encouraging oneself or the use of relaxation techniques. The
above-mentioned findings are consistent with the results of other researchers (cf.
Droździał-Szelest 1997; Petersen 2000 as cited in Pawlak 2010:198; Pawlak 2008,
Cognitive PLS were by far the most frequently reported: respondents mentioned 27 different
strategies. These strategies correspond to some extent to the skills of listening, speaking
and reading or the skills combined. The respondents’ pronunciation techniques based on
listening enhanced by other activities are as follows: listening to BBC (on the radio),
authentic English (on TV), English songs and films; listening to and reading (BBC
English) materials; listening to (English) tapes/BBC World on the radio and repeating
after a model/imitating the speaker/authentic speech, as well as watching English
language programmes. The skill of speaking and in particular work on correct
articulation of English could be what our informants had in mind when they reported:
imitating authentic speech (audio books, films etc.)/native speakers; practising along
with films; singing songs in English (simultaneously with the singer on the CD); talking
to British friends/oneself in English; conversing in English with foreign students while
staying abroad; speaking aloud (revision before exams); murmuring to oneself and even
drilling particular words/groups of words “which I found difficult”/repeating certain
words and phrases/authentic utterances. Some responses point to the subjects’ use of
different sources of educational materials, e.g. studying pronunciation with books, tapes
and phonetic transcription of words; checking pronunciation (of unknown words) in a
dictionary and then pronouncing them aloud/working with some pronunciation
dictionaries; using original tapes with English pronunciation/practical phonetics
textbooks /doing some activities; doing pronunciation exercises on the Internet. One of
the respondents mentioned reading aloud (to oneself) and yet another identified staying
in an English-speaking country and ‘absorbing’ the language as one of their
pronunciation learning strategies.
This outcome to some extent confirmed the obvious, as Pawlak (2010: 191-192)
points to their similar ranking: “… in the group of direct PLS, it is cognitive strategies,
such as naturalistic and formal practice or attempts to analyze the sound system that are
likely to play the most significant role. (…) [indirect] strategic devices will probably be
utilized less frequently than direct ones (…) with learners opting mainly for
metacognitive strategies (e.g. planning for a language task or self-evaluation) rather than
social (e.g. asking a classmate to correct one’s pronunciation) or affective ones (e.g.
encouraging oneself to practice new sounds).”9
On the whole, subjects have pointed to some time-consuming but beneficial methods
like recording oneself followed by a detailed analysis of the outcome and self-correction.
They see the importance of pronunciation self-study, realize that formal phonetic
classroom training is insufficient, work on pronunciation on their own, and
report using numerous and varied self-study pronunciation strategies, mostly cognitive
9 Direct PLSs are the ones that require mental processing of language, while indirect PLSs are
those that support learning in general and do not have to involve target language use.
4. Statistical analysis
Pearson's Chi-square Test for Independence was applied to test whether the
dependencies between nationality and the examined variables
concerning pronunciation are statistically significant.10
The results suggest that the respondents’ nationality does not affect the first two
variables, i.e. “It is important for me to have good English pronunciation” (p>α,
p=0.55535) presented in Figure 5 and “Students should aim at native English
pronunciation” (p>α, p=0.52756) shown in Figure 6. In other words, the distribution of
responses to the above-mentioned statements is similar regardless of students’ nationality.
Figure 5. Nationality versus good pronunciation.
Figure 6. Nationality versus native English pronunciation.
10 The significance level selected for this study is α=0.05. It is assumed that when:
a) p < 0.05 there is a statistically significant dependency (marked with*);
b) p < 0.01 there is a highly significant dependency (marked with**);
c) p < 0.001 there is a very highly significant dependency (marked with***).
However, as can be seen in Figure 7, the test indicates that there is a dependency
between the respondents’ nationality and pronunciation self-study (p<α, p=0.01249).
The percentages alone already show that the Italian respondents’ responses
(58% ‘yes’; 42% ‘no’) differ from the ones given by the Spanish (82% ‘yes’; 18% ‘no’)
and Polish (81% ‘yes’; 19% ‘no’) subjects.
Figure 7. Pronunciation self-study.
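The test reported above can be illustrated with a minimal sketch. The group sizes are not given in this passage, so the counts below are HYPOTHETICAL (100 respondents per nationality, chosen only so that they reproduce the reported percentages); the resulting statistic and p-value therefore illustrate the procedure and are not expected to match the reported p=0.01249. For a 3x2 table the test has 2 degrees of freedom, for which the upper-tail probability has the closed form exp(-χ²/2).

```python
import math


def chi_square_independence(table):
    """Return Pearson's chi-square statistic and degrees of freedom
    for a contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


def p_value_two_dof(chi2):
    """Upper-tail probability of the chi-square distribution with
    exactly 2 degrees of freedom (closed form: exp(-chi2 / 2))."""
    return math.exp(-chi2 / 2)


# HYPOTHETICAL counts (yes, no): 100 respondents per group, matching
# only the percentages reported in the text, not the real sample sizes.
observed = [
    [58, 42],  # Italian
    [82, 18],  # Spanish
    [81, 19],  # Polish
]

chi2, dof = chi_square_independence(observed)
p = p_value_two_dof(chi2)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.5f}, significant: {p < 0.05}")
```

With real data the observed counts, not the percentages, must be entered, since the statistic depends on sample size.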
As regards the dependencies between nationality and the factors most influencing the
respondents’ pronunciation, we could observe differences in the case of five out of eight
factors; namely, stays in an English speaking country (p=0.02196*), contacts with native
speakers (p=0.01813*), practical phonetics classes (p=0.00000***), imitating authentic
speech (p=0.00002***), and primary/secondary school English (p=0.00000***).11 The
differences in percentages among the nationalities can be seen in Figure 8.
Figure 8. Nationality versus self-study pronunciation strategies.
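The asterisk convention used for the p-values above can be written out as a small sketch. The function name is hypothetical; the thresholds are those stated for this study (α=0.05, with further cut-offs at 0.01 and 0.001), and the p-values are the ones quoted in the text. The value for ‘listening to authentic English’ is not included because it is missing from the passage.

```python
def significance_marker(p, alpha=0.05):
    """Map a p-value to the asterisk marking used in the study."""
    if p < 0.001:
        return "***"  # very highly significant
    if p < 0.01:
        return "**"   # highly significant
    if p < alpha:
        return "*"    # statistically significant
    return ""         # no significant dependency


# p-values as reported in the text for nationality versus each factor.
reported = {
    "stays in an English speaking country": 0.02196,
    "contacts with native speakers": 0.01813,
    "practical phonetics classes": 0.00000,
    "imitating authentic speech": 0.00002,
    "primary/secondary school English": 0.00000,
    "self-study on pronunciation": 0.40755,
    "'Descriptive Grammar' classes": 0.29020,
}

for factor, p in reported.items():
    print(f"{factor}: p={p:.5f}{significance_marker(p)}")
```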
11 p>α for the remaining three factors was calculated as follows: self-study on pronunciation
(p=0.40755), ‘Descriptive Grammar’ classes (p=0.29020) and listening to authentic English
We have also examined the ranking of these influential factors for individual
nationalities. For all the examined nationalities ‘listening to authentic English’ occupies
the top position, then the ranking of factors differs slightly. For instance, the Italian
respondents regard ‘primary and secondary school education’ as the second most
beneficial aspect. The Spanish value ‘contacts with native speakers’ and ‘practical
phonetics’ next while Poles opt for ‘practical phonetics,’ ‘imitating authentic speech’ as
well as ‘contacts with native speakers’.
5. Conclusions
This article was intended to provide a thorough examination of the nature of
pronunciation preferences of Italian, Spanish and Polish learners of English. The survey
conducted by the present author reveals that most students wish to speak with good
English pronunciation and to sound native-like, which agrees with the findings by Porter
and Garvin (1989), Waniek-Klimczak (1997), Szpyra-Kozłowska (2004), Bryła (2006),
Janicka et al. (2008), and Waniek-Klimczak and Klimczak (2008). The respondents
believe it is important to have good pronunciation in English since they want to be
clearly understood, serve as a good model for students and be perceived as competent
users of English. The majority of informants agree with the statement that students
should aim for native English pronunciation. Those who do not support this claim seem
to regard intelligibility as the main aim of communication and take into account the
needs of people who are not specialists in English.
Students report that their pronunciation has improved most as a result of listening to
authentic English, practical phonetics instruction, imitating authentic speech as well as
through contacts with native speakers. Waniek-Klimczak’s (1997) subjects point to a
slightly different order of factors which most influenced their pronunciation. Among
them there are watching and listening to authentic English, practical phonetics and
listening classes. One of our findings was that the college students favoured practical
pronunciation classes over more academically-oriented ‘Descriptive Grammar’ classes,
which is also consistent with Waniek-Klimczak’s (1997) results. However, in a survey
by Dziubalska-Kołaczyk et al. (1999) a greater preference for ‘Descriptive Grammar’ is
evident, although in general the majority of their subjects indicate a strong correlation
between theoretical and practical classes and the positive influence of the two on their
pronunciation. Furthermore, Cenoz et al. (1999) point to yet another ranking of factors
beneficial for their students’ phonetics, i.e. residence in an English-speaking country,
speaking to natives, specific training through phonetics, listening to radio and TV and
ear training.
The majority of our respondents (76%) claim to study pronunciation on their own by
means of different, mostly cognitive, strategies. This is a markedly higher percentage
than that found by Sobkowiak’s (2002) questionnaire, where only half of the
experimental group claimed pronunciation self-study. The most frequently mentioned
self-study techniques are as follows: reading aloud to themselves, imitating authentic
speech from films, audio books and the media, listening to and repeating after a model,
drilling particularly difficult words and phrases, learning pronunciation with books and
tapes, working with pronunciation dictionaries as well as listening to and watching
English-language programmes.
In line with the findings of Droździał-Szelest (1997), Petersen (2000, as cited in
Pawlak 2010) and Pawlak (2006, 2008, 2010), most respondents tend to use
traditional cognitive strategies such as repetition, and transcription is also mentioned
as a helpful tool in the mastery of pronunciation, which is also confirmed by Sobkowiak
(2002). Unlike in Pawlak’s (2006) research, the respondents in the present study are
aware of the importance of comparing the authentic with the student’s own speech.
Metacognitive strategies such as self-evaluation and self-monitoring are also said to be
To recapitulate, although this description of students of English is based on limited
evidence, it is hoped that it provides a fair and adequate characterization of this group of
learners with reference to their phonetic preferences. The results on students’ wants and
needs with respect to pronunciation point to the fact that learners of English wish to
speak with good pronunciation, set a high native-like standard for themselves, report
having benefited from their phonetic instruction and exposure to native English and that
they work on their pronunciation by means of various, mostly cognitive, strategies.
The outcome of this study can serve as yet another argument for teaching native
models of English to students of English (cf. Remiszewski 2008; Scheuer 2008;
Sobkowiak 2008; Szpyra-Kozłowska 2008). It is consistent with Sobkowiak’s
(2008:139) observation that among European students there is a preference for sounding
native-like: “[q]uestionnaire and experimental research clearly shows that to most
learners, at least in the European context, correct native(-like) pronunciation is not only a
question of communicative pragmatics, but also self-image. And listeners, both native
and non-native, evaluate the speaker on the basis of his pronunciation.”
The present study aims at exploring the under-investigated interface between SA and L2
phonological development by assessing the impact of a 3-month SA programme on the
pronunciation of a group of 23 Catalan/Spanish learners of English (NNSs) by means of
phonetic measures and perceived FA measures. Six native speakers (NSs) in an exchange
programme in Spain provided baseline data for comparison purposes. The participants
were recorded performing a reading aloud task before (pre-test) and immediately after
(post-test) the SA. Another group of 37 proficient non-native listeners, also bilingual in
Catalan/Spanish and trained in English phonetics, assessed the NNSs’ speech samples for
degree of FA. Phonetic measures consisted of pronunciation accuracy scores computed by
counting pronunciation errors (phonemic deletions, insertions and substitutions, and stress
misplacement). Measures of perceived FA were obtained with two experiments. In
experiment 1, the listeners heard a random presentation of the sentences produced by the
NSs and by the NNSs at pre-test and post-test and rated them on a 7-point Likert scale for
degree of FA (1 = “native” , 7 = “heavy foreign accent”). In experiment 2, they heard
paired pre-test/post-test sentences (i.e. produced by the same NNS at pre-test and post-test) and indicated which of the two sounded more native-like. Then, they stated their
judgment confidence level on a 7-point scale (1 = “unsure”, 7 = “sure”). Results indicated
a slight, non-significant improvement in perceived FA after SA. However, a significant
decrease was found in pronunciation accuracy scores after SA. Measures of pronunciation
accuracy and FA ratings were also found to be strongly correlated. These findings are
discussed in light of the often reported mixed results as regards pronunciation
improvement during short-term immersion.
1. Introduction
A large body of research into second language (L2) phonological acquisition has
analysed the phenomenon of foreign accent (FA), which is the result of perceived
differences between the acoustic-phonetic properties of L2 speech and those
characterising native speakers’ norms: “Listeners hear foreign accents when they detect
This research was supported by grants FFI2010-21483-C02-01 and BES-2008-010037 from the Spanish
Government to the SALA project, and by grant 2010 SGR 140 from the Generalitat de Catalunya to the
research group Allencam.
Pilar Avello, Joan Carles Mora and Carmen Pérez-Vidal
divergences from English phonetic norms along a wide range of segmental and
suprasegmental (i.e., prosodic) dimensions” (Flege 1995). Most FA research has
explored the perception of accented speech by native listeners, who have been found to
assess accentedness reliably regardless of training or experience (Brennan and Brennan
1981, Flege and Fletcher 1992). These studies have usually been conducted in learning
contexts of long-term immersion in the L2 community, and in connection with variables
that have been identified as influencing perceived degree of FA, most notably age of
onset of L2 learning and L2 experience.
Despite the traditional use of native listeners, a few studies have also analysed the
perception of accented speech by non-native listeners. For instance, Flege (1988) found
that two groups of Chinese non-native listeners were able to judge degree of FA in
Chinese-accented English sentences following the same response pattern observed for
native listeners, with judgements from the most experienced Chinese group more closely
resembling native listeners’ judgements. Similarly, in Mackey et al. (2006), proficient
Arabic listeners provided FA judgements of Italian-accented speech in English which
were strongly correlated with native listeners’ judgements. These findings extended
those of Flege (1988), as they suggested that non-native listeners are able to reliably
assess accentedness in speech samples from L2 learners even in the absence of a shared
L1 background between listeners and learners. More studies have supported the finding
that listeners with different L1s may share a similar response to accented speech. Munro
et al. (2006) found moderate to high correlations in FA scores, as well as in
comprehensibility and intelligibility scores, provided by four different groups of native
listeners and non-native listeners with varying L1s who assessed English speech samples
with different accents. Derwing and Munro (in press) also obtained high correlations
between native listeners’ ratings of accented English and ratings from a group of
proficient non-native listeners with different L1 backgrounds, concluding that both
groups of listeners may be equally reliable in assessing L2 learners’ speech. The results of
these studies, therefore, indicate that non-native listeners who are proficient enough in
the L2 they are asked to evaluate can provide reliable FA judgements which closely
match those of native listeners. However, these few studies analysing the perception of
accented speech by non-native listeners have usually also been conducted in long-term
immersion contexts, rather than in shorter periods of immersion, such as those typical of
Study Abroad learning contexts.
Study Abroad (SA) is a second language learning context which can be defined as a
combination of language-based and/or content-based classroom instruction together with
out-of-class interaction in the native speech community (Freed 1995). SA programmes
have become very popular, for instance, in Europe and America, due to the common-sense
and long-held assumption that immersion in the L2 community results in
substantially enhanced L2 knowledge, as such immersion is assumed to offer plenty of
opportunities for interaction with native speakers and exposure to a great amount of
quality input. Consequently, SA programmes have been encouraged by language
instructors and academic administrators and have come to play an important role in
governments’ L2 learning policies, as a means to promote multilingualism in response to
an increasingly globalised international context (see, e.g., Kinginger 2009 and Llanes
2011 for a review of official figures and language learning policies). An increasing body
of research has been subsequently devoted to this learning context, in order to account
Perception of FA by Non-native Listeners in a Study Abroad Context
for the nature of the study abroad experience and empirically assess its impact on L2
learners’ linguistic development (see research overviews in DeKeyser 2007; DuFon and
Churchill 2006; Freed 1995). For the most part, research has found evidence for a
positive effect of the study abroad experience on learners’ L2 development, yet actual
linguistic gains appear to be related to individual and context variables, such as contact
patterns while abroad, L1 and L2 use, L2 exposure, onset level of proficiency, or length
of stay, as well as to aspects of programme design (see Pérez-Vidal and Juan-Garau
2011 for a characterisation of SA). A complex picture results from the interaction of all
these factors, with findings sometimes providing inconclusive or conflicting evidence, as
the benefits of SA are not always clear for all language skills, or the gains reported may
fall short of the high expectations arising out of the above-mentioned widespread belief
in the substantial effects of study abroad immersion.
Research has analysed the impact of SA on different linguistic domains, and usually
in contrast with formal instruction (FI) in at home (AH) institutions. Results have
provided consistent evidence of the beneficial role of SA for lexical improvement
(Collentine 2004; Llanes and Muñoz 2009), as well as for writing (Pérez-Vidal and Juan-Garau 2009, 2011). Sociolinguistic skills have been the object of considerable research,
with studies examining, for instance, communication strategies (Lafford 1995) or
pragmatic competence (Barron 2006), and which have also yielded results supporting the
positive effect of SA on these areas. However, mixed results have been found for
grammar. Collentine (2004) reported an advantage for AH learners over those
who went abroad, whereas the opposite was true in Howard (2005). Most SA research
has focused on the development of oral skills, traditionally considered to be the linguistic
domain most likely to improve as a result of SA, and research findings in general have
supported this view. Some studies have analysed the impact of SA on overall L2
speaking proficiency (Brecht et al. 1995, Segalowitz and Freed 2004), and extensive
research has also been carried out to analyse gains in L2 learners’ fluency (Freed et al.
2004, Juan-Garau and Pérez-Vidal 2007; Trenchs-Parera 2009, Valls-Ferrer 2011).
Nevertheless, studies focusing on specific aspects of phonological development in
learners’ speech production are scarce.
Studies of phonological development during SA generally focus on the differential
effects of SA vs FI on production accuracy, and have yielded mixed results. Díaz-Campos (2004) reported a positive effect of both learning contexts on the production of
Spanish plosives in two groups of English students of Spanish, although development
towards native-like patterns was found to be stronger in the FI group. By contrast, Díaz-Campos (2006) observed greater gains in the production of Spanish consonants for the
SA group as compared with the FI group. Mora (2008) examined the production of VOT
in English voiceless plosives by a group of Spanish/Catalan bilingual learners after a
two-term FI period at their home university and after a three-month SA term abroad. He
found no effect of FI on VOT duration, whereas an increase was observed after SA,
although non-significant. However, in a similar study analysing English vowels,
significant improvement in production was found after FI, but not after SA (Pérez-Vidal
et al. 2011). Højen (2003) found better perceived foreign accent scores after SA as a
function of length (average=7.1 months), but production at the segmental level did not
improve significantly. Avello (2011) and Avello et al. (in press) reported minor gains in
perceived FA scores and no significant improvement in segmental production accuracy,
The present study thus explores the under-investigated impact of SA on L2 learners’
phonological development by assessing the impact of a 3-month SA programme on the
pronunciation of a group of 23 bilingual Catalan/Spanish learners of English by means
of both phonetic measures of pronunciation accuracy and perceived FA measures by
non-native listeners. The relationship between both types of measures is also explored.
Our objectives are thus the following:
- To explore the effect of SA on L2 learners’ phonological development (measured by
pronunciation accuracy scores and FA scores).
- To explore the relationship between the phonetic properties of L2 learners’ speech
(objective measures) and perceived degree of FA (subjective measures).
2. Method
2.1. Participants
2.1.1. Speakers
This study is part of a larger, state-funded project called SALA (Study Abroad and
Language Acquisition), which aims at uncovering the effects of a short, 3-month SA
period on the linguistic development of university level L2 English learners. Data were
collected from a group of non-native speakers (NNSs) studying Translation and
Interpreting in Barcelona, Spain (N=23; 20 females and 3 males). Their age ranged from
17 to 21 (M=18.8). At the time of data collection, none of them reported suffering any
speech impairment. They all started to learn English as a foreign language (EFL) in AH
institutions around the same age (8 years), as established by the curriculum in the
Spanish educational system, thus sharing a similar age of onset of L2 learning (AOL).
Their acquisition of English took place basically through classroom instruction (i.e., as a
FL in their native speech community), sharing also a similar exposure to English of
between 700 and 800 hours.
These learners had to certify an advanced level of proficiency in English (equivalent
to a B2 in the Common European Framework of Reference or CEFR) in order to be
admitted to the university where they were studying. As part of their Translation and
Interpreting degree, they had to specialise in two FLs, English being one of them, and
the other language being either French or German. They had thus a similar multilingual
profile, since they were all early bilinguals of both Spanish and Catalan, studying
English and another FL. They all had a compulsory 3-month study abroad term in an
English-speaking country at the beginning of their second academic year.
Speech samples from six native speakers (NSs) of English served as baseline data to
assess the learners’ performance. These NSs were also part of the SALA corpus. None of
them reported any speech dysfunction. They were young university students enrolled in
an exchange programme in Spain (i.e., they were learners of L2 Spanish), with an age
range similar to that of the NNSs. Both groups of speakers had, therefore, a similar
profile, and consequently their data were highly comparable.
2.1.2. Listeners
A group of proficient non-native listeners were recruited as judges (NNJs, N=37) to
assess the NNSs’ degree of FA. Their linguistic profile was similar to that of the NNSs,
i.e., they were also bilingual speakers of Spanish and Catalan studying English as a FL.
They were taking a degree in English Studies in Barcelona, which involved attending
Linguistics and Literature content courses taught in English, and by the time of data
collection they had completed two courses on English phonetics and phonology. These
courses included a comprehensive description of English segmental and suprasegmental
properties, phonetic and phonological transcription, and pronunciation training, as well
as training in the use of speech analysis software (Praat). The courses were designed
specifically to address the problems facing L1 speakers of Spanish/Catalan when learning
English. They had, therefore, a proficient level of English, a sound knowledge of English
phonetics, and were highly familiar with the accented speech they were asked to judge,
as they shared the non-native speakers’ L1s. They performed two listening experiments
(see 2.3.2. below) and completed a questionnaire tapping into their linguistic profile and
their degree of familiarity with different native and non-native English accents. They did
these tasks for course credit.
2.2. Speech samples
Speech samples were elicited by means of a reading aloud (RA) task in which the
participants read the text The North Wind and the Sun (NWS, see Appendix 1). This is a
standard, 114-word text of which different versions exist in different languages (e.g.
French version: Fougeron and Smith 1999; Spanish version: Martínez-Celdrán et al.
2003; RP British English: Roach 2004), and which has been used to document
differences characterising English pronunciation in different dialects or by foreign
speakers (see Schneider et al. 2004). The fact that the text was the same for all the
subjects facilitated contrasting analyses, as the same vowel and consonantal items
appeared in all the speech samples, and in the same contexts. In order to assess the
effects of the 3-month SA, data were collected prior to the students’ departure (pre-test),
and immediately after their return (post-test).
The participants were recorded one at a time. They were instructed to read the text
first silently on their own, and then aloud at a normal speaking rate to be recorded. They
were told that they would be asked a question about the content of the text, which they
were to answer as quickly as possible after reading it aloud. This was done so as to draw
the participants’ attention to the content, in such a way that they were not aware that the
focus of interest was pronunciation, and with the aim of obtaining more natural sounding
data. The participants read the text out loud, and immediately after finishing, they were
asked the following question: Was the North Wind stronger than the sun?, which they
answered by stating yes or no.
Data from the NNSs were recorded in sound-attenuated cabins using analogue tape
recording technology, and were subsequently digitised in .wav format at 22,050 Hz, with
16-bit resolution. Data from the NSs were digitally recorded in professional sound-proof
cabins, using the Pro Tools digital audio platform. The digital files were saved in .wav
format at 44,100 Hz (later downsampled to 22,050 Hz), 16-bit resolution.
A sentence from the RA task was selected (see Appendix 2) which presented several
segmental and suprasegmental properties that were likely to result in accented
pronunciation for our L2 learners (see pronunciation errors in 2.3.1. below). The selected
sentence was extracted from each participant’s recording, and the resulting files were
edited and normalised for intensity at 70.0 dB in order to create the stimuli for the
listening experiments. Data manipulation was carried out with Praat 5.1 (Boersma and
Weenink 2009).
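The intensity normalisation step can be sketched as follows. The text states only that the files were normalised at 70.0 dB in Praat; the sketch below ASSUMES the common RMS-based definition, in which sample values are treated as sound pressure in Pa and intensity is expressed in dB relative to 2·10^-5 Pa. The function names and the synthetic tone are hypothetical and stand in for the actual sentence recordings.

```python
import math

REFERENCE_PRESSURE = 2e-5  # Pa; assumed reference for the dB scale


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def intensity_db(samples):
    """Mean intensity in dB relative to the reference pressure."""
    return 20 * math.log10(rms(samples) / REFERENCE_PRESSURE)


def scale_to_intensity(samples, target_db=70.0):
    """Multiply all samples by one factor so that the mean intensity
    of the whole signal equals target_db."""
    target_rms = REFERENCE_PRESSURE * 10 ** (target_db / 20)
    factor = target_rms / rms(samples)
    return [s * factor for s in samples]


# Synthetic one-second 220 Hz tone at 22,050 Hz (stand-in for a stimulus).
sample_rate = 22050
tone = [0.2 * math.sin(2 * math.pi * 220 * n / sample_rate)
        for n in range(sample_rate)]

normalised = scale_to_intensity(tone)
print(round(intensity_db(tone), 1), round(intensity_db(normalised), 1))
```

Because a single scaling factor is applied to the whole file, relative amplitude differences within the sentence are preserved; only the overall level is equalised across speakers.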
2.3. Data analyses
2.3.1. Pronunciation accuracy scores
The NNSs’ production accuracy was assessed by means of a phonetic analysis (Brennan
and Brennan 1981, Trofimovich et al. 2009), which was conducted by the first author on
the waveform and corresponding spectrogram of each speech sample. Pronunciation
errors were identified and accuracy scores were subsequently computed by counting the
total number of mispronunciations in each NNS’ pre-test and post-test speech samples.
These accuracy scores served as objective, phonetic measures of the NNSs’ speech
production development, and included mispronunciations affecting segmental
articulation (deletions, insertions, and phonological substitutions), as well as stress
misplacement. Presented below are some examples of such pronunciation errors:
a) Deletions:
-deletion of [l] in warm(l)y (one-segment deletion)
-deletion of final syllable in travel(er) (multiple-segment deletion)
b) Insertions:
-insertion of an extra vowel [e] in immediat[e]ly
-insertion of a velar consonant at the beginning of [ɣ]warmly
c) Substitutions:
-substitution of bilabial approximant [β] for labiodental fricative [v] in traveller
-substitution of dental plosive [d] for dental fricative [ð] in then
-substitution of open vowel [a] for open-mid back vowel [ɔ] in warmly
-substitution of dental fricative [ð] for alveolar plosive [d] in immediately
-substitution of velar fricative [x] for glottal fricative [h] in his
d) Stress misplacement:
-stress shift to the penultimate syllable in multisyllabic words: traˈveller for ˈtraveller,
immeˈdiately for iˈmmediately.
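The scoring procedure described above, in which the accuracy score is the total number of mispronunciations in a sample, can be sketched as follows. The annotation format and function names are hypothetical, and the two error lists are INVENTED for illustration (they reuse error types from the examples above but do not describe any actual participant).

```python
from collections import Counter

ERROR_TYPES = ("deletion", "insertion", "substitution", "stress")


def accuracy_score(errors):
    """Total number of annotated mispronunciations in one speech sample.
    A lower score means more accurate pronunciation."""
    for kind, _ in errors:
        if kind not in ERROR_TYPES:
            raise ValueError(f"unknown error type: {kind}")
    return len(errors)


def errors_by_type(errors):
    """Break the score down by error category."""
    counts = Counter(kind for kind, _ in errors)
    return {kind: counts.get(kind, 0) for kind in ERROR_TYPES}


# INVENTED annotations for one hypothetical speaker: (type, description).
pre_test = [
    ("deletion", "[l] in warmly"),
    ("substitution", "[d] for [ð] in then"),
    ("substitution", "[x] for [h] in his"),
    ("stress", "traˈveller for ˈtraveller"),
]
post_test = [
    ("substitution", "[d] for [ð] in then"),
    ("stress", "traˈveller for ˈtraveller"),
]

print(accuracy_score(pre_test), accuracy_score(post_test))
print(errors_by_type(pre_test))
```

Since the score counts errors, a decrease from pre-test to post-test corresponds to an improvement in pronunciation accuracy.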
2.3.2. Perceived FA measures
Perceived FA measures consisted of subjective listeners’ judgements obtained from the
proficient NNJs by means of two listening experiments: a rating task and a paired-comparison task. These experiments provided us with behavioural measures of the
perceived degree of FA in the NNSs’ pronunciation prior to and immediately after SA.
They were self-paced tasks created and run with Praat software (Boersma and Weenink
2009, version 5.1). Both listening experiments were performed during the same session
(equivalent to a class activity within the NNJs’ course on English phonetics). The rating
was conducted first, then the paired-comparison. The whole session lasted around an
a) Experiment 1: Rating
The rating experiment (Munro et al. 2006, Derwing et al. 1998) provided a holistic
measurement of perceived FA changes throughout time. The NNJ heard a randomised
presentation of the speech samples produced by the NNS (pre-test and post-test) and the
NS (baseline). Their task was to rate the degree of FA in the oral samples by means of a
7-point Likert scale, where 1 stood for “native” and 7 stood for “heavy foreign accent”.
They were instructed to make use of the whole scale. Each stimulus was repeated twice
for a total of 104 trials per judge (23 NNSs x 2 times x 2 repetitions + 6 NSs x 2
repetitions), making up a total of 3,848 judgements (104 trials x 37 judges). 10 practice
trials were presented before the actual experiment in order to familiarise the listeners
with the procedure, allowing them also to check the volume level.
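The trial arithmetic above can be checked with a short sketch. The speaker labels, the tuple layout and the fixed random seed are illustrative assumptions, as is the choice to shuffle both repetitions together; the text only states that the presentation was randomised.

```python
import random

N_NNS = 23        # non-native speakers (from the text)
N_NS = 6          # native-speaker baseline (from the text)
N_JUDGES = 37     # non-native judges (from the text)
REPETITIONS = 2   # each stimulus was presented twice


def build_rating_trials(seed=0):
    """Return one judge's randomised trials as (speaker, test_time, repetition).

    Speaker labels such as 'NNS01' are hypothetical placeholders.
    """
    trials = []
    for rep in range(1, REPETITIONS + 1):
        for s in range(1, N_NNS + 1):
            for test_time in ("pre", "post"):
                trials.append((f"NNS{s:02d}", test_time, rep))
        for s in range(1, N_NS + 1):
            trials.append((f"NS{s:02d}", "baseline", rep))
    # Assumption: one shuffle over all trials, not blocked by repetition.
    random.Random(seed).shuffle(trials)
    return trials


trials = build_rating_trials()
print(len(trials))             # trials per judge
print(len(trials) * N_JUDGES)  # judgements across all judges
```

The two printed values correspond to the 104 trials per judge and the 3,848 judgements reported above.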
b) Experiment 2: Paired-Comparison
It was expected that the paired-comparison experiment would provide a more fine-grained
global assessment of the effect of SA on the NNSs’ degree of accentedness,
since this methodology consists of directly comparing two items produced by the same
speaker at two different testing times. Previous research analysing L2 speech production
(Riney and Flege 1998, Bradlow et al. 1999, Højen 2003) has reported it as very
sensitive to slight changes in pronunciation of the kind that are most likely to occur after
a short SA programme.
First, the NNJ had to decide which sentence was more native-like out of two paired
sentences (i.e., produced by the same NNS). Then, they stated their confidence level on a
7-point scale (1 = “unsure”, 7 = “sure”). For each NNS, there was a pre-test/post-test trial
and a post-test/pre-test trial. The order of presentation was counterbalanced across trials,
which were randomised. There were 46 trials per judge (2 orders x 23 NNSs), making up
a total of 1,702 judgements (46 trials x 37 judges). As was the case with experiment 1,
experiment 2 was also preceded by a few practice trials.
Pilar Avello, Joan Carles Mora and Carmen Pérez-Vidal
3. Results and discussion
3.1. Pronunciation accuracy scores
Figure 1 below graphically presents the accuracy scores obtained by the NNSs at pre-test
(M = 3.95, SD = 2.75) and at post-test (M = 3.30, SD = 2.65). The number of
pronunciation errors ranged from 0 to 9 at both testing times, with considerable
inter-subject variability, as indicated by the relatively high standard deviation. A
paired-samples t-test revealed significant gains in pronunciation accuracy after SA
[t(22) = 2.135, p = .044], i.e., the NNSs produced significantly fewer pronunciation errors after
SA than they did before their departure, the eta squared (η2 = .17) indicating a large
effect size. These results suggest that the 3-month SA had a large positive impact on the
NNSs’ phonological production accuracy, allowing them to significantly improve their
segmental articulation and stress production.
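The reported effect size can be recovered from the t statistic alone. The sketch below assumes the commonly used formula for a paired-samples design, eta squared = t² / (t² + df); the text does not state which formula was applied, so this is a consistency check and not a description of the authors' procedure. The function name is hypothetical.

```python
def eta_squared_paired(t, df):
    """Eta squared for a paired-samples t-test: t^2 / (t^2 + df).

    Assumption: this is the formula behind the reported value; by Cohen's
    conventional guidelines, values of about .14 and above count as large.
    """
    return t ** 2 / (t ** 2 + df)


eta = eta_squared_paired(2.135, 22)
print(round(eta, 2))  # 0.17
```

With t(22) = 2.135 the formula gives approximately .17, matching the value reported above.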
Figure 1: Mean number of pronunciation errors produced by the NNS before and after SA.
SD in parentheses.
3.2. Perceived FA scores
3.2.1. Experiment 1: Rating
The NNJ used a 7-point scale to rate the degree of FA in the speech samples presented to
them (1 = “native”, 7 = “heavy foreign accent”). Preliminary reliability analyses were
conducted to explore consistency in the NNJs’ ratings, and they yielded both high
intra-rater and inter-rater coefficients. Regarding intra-rater reliability, a strong correlation
was found in the judge-based FA scores assigned at each of the two rating repetitions (r
= .855, p < .001), which indicates that each judge’s first and second repetition ratings
were very similar. Inter-rater reliability was examined by means of an intra-class
correlation (ICC) analysis which yielded a high Cronbach's Alpha (.996), indicating a
high degree of agreement among the judges.
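For readers unfamiliar with the inter-rater coefficient, the following is a minimal sketch of how Cronbach's alpha can be computed when judges are treated as "items" and speech samples as "cases". The ratings are invented for illustration only; they are not data from this study, and the function name is hypothetical.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a samples-by-judges table of ratings.

    ratings: list of rows, one row per speech sample, one column per judge.
    """
    k = len(ratings[0])  # number of judges
    judge_variances = [variance([row[j] for row in ratings]) for j in range(k)]
    total_variance = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(judge_variances) / total_variance)


# Hypothetical ratings on the 7-point FA scale: 4 samples rated by 3 judges.
example = [
    [1, 1, 2],
    [4, 5, 4],
    [6, 6, 7],
    [3, 3, 3],
]
print(round(cronbach_alpha(example), 3))
```

Alpha approaches 1 when the judges rank the samples in nearly the same way, which is the pattern the reported coefficient of .996 indicates.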
Figure 2 below illustrates the mean FA ratings assigned by the NNJ to the NNSs
(pre-test and post-test) and to the NS (baseline). As expected, the ratings for the NS group
were very close to 1 (M = 1.29, SD = .17), indicating that the NNJ successfully identified
the native speakers of English, and rated them accordingly. The NNSs’ ratings were
considerably outside the range of the NSs’ ratings both at pre-test and post-test, and
significantly differed from them at the two testing times, as shown by
independent-samples t-tests (p < .001). There was a slight improvement in the NNSs’ FA scores after
SA, since the perceived degree of accentedness decreased from pre-test (M = 4.88, SD =
1.28) to post-test (M = 4.68, SD = 1.20). This decrease, however, failed to reach
significance [t(22) = 1.306, p > .05]. These results seem to indicate a positive trend of
development towards less accented speech, suggesting that SA might have had some
impact on the NNSs’ degree of accentedness, although statistically non-significant.
Figure 2: Mean FA ratings (Experiment 1) for NNS (Pre-Test and Post-Test) and NS
(baseline). SD in parentheses.
3.2.2. Experiment 2: Paired-Comparison
The paired-comparison experiment complemented the rating experiment, as it was
assumed to yield more fine-grained measures of the global degree of accentedness
perceived by the NNJ in the learners’ speech samples. The combination of the FA scores
obtained with both experiments was thus expected to provide us with the necessary
information to better evaluate possible changes in the NNSs’ speech production.
In the paired-comparison experiment, the NNJ were asked to directly compare the
learners’ pre-test and post-test speech samples. The NNJ first had to indicate which of
the two versions was better (i.e. more native-like), and then used a 7-point scale to state
their degree of confidence (1 = “unsure”, 7 = “sure”). The data thus obtained were
codified as follows: a negative sign was assigned to the selected confidence scale value
when the pre-test version was chosen as better, and a positive sign was assigned when
the post-test version was preferred. This resulted in scores ranging between -7 and 7,
which were further recoded into values between -6 and 6 (see figure 3 below), with
positive values indicating that a majority of post-test samples had been preferred as more
native-like, and pointing, therefore, to an improvement in speech production after SA.
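The coding scheme just described can be sketched as follows. The signing step follows the text directly. The recoding step is an assumption: the text does not say how the signed values were mapped onto the -6 to 6 range, so the sketch simply shrinks each magnitude by one, so that "unsure" (1) contributes 0. All function names are hypothetical.

```python
def code_response(chosen, confidence):
    """Sign a confidence value (1..7) by which version was judged better.

    Negative if the pre-test sample was chosen, positive if the post-test
    sample was chosen, as described in the text.
    """
    if chosen not in ("pre", "post"):
        raise ValueError("chosen must be 'pre' or 'post'")
    if not 1 <= confidence <= 7:
        raise ValueError("confidence must be between 1 and 7")
    return confidence if chosen == "post" else -confidence


def recode(signed):
    """ASSUMED recoding from the range -7..7 to -6..6 (not given in the text)."""
    sign = 1 if signed > 0 else -1
    return sign * (abs(signed) - 1)


def speaker_score(responses):
    """Mean recoded score over all (chosen, confidence) responses for one speaker."""
    values = [recode(code_response(c, conf)) for c, conf in responses]
    return sum(values) / len(values)


# Hypothetical responses from three judges for one learner.
print(speaker_score([("post", 7), ("pre", 2), ("post", 4)]))
```

Under this assumed recoding, a positive mean indicates that the post-test version tended to be preferred, and with greater confidence, than the pre-test version.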
The mean global FA scores are presented in figure 3. Individual scores ranged between
-2.05 and 4.42, indicating large inter-subject variability. Eleven out of the 23 learners
obtained positive scores (ranging from .29 to 4.42), although in most cases scores were
below 2. The positive group mean (.36), although only slightly above 0, can be
interpreted as a slight improvement in the NNSs’ degree of accentedness, in a similar
way to the results of the rating experiment. There was, therefore, a parallelism between
the two listening experiments, in the sense that they both seemed to point to a positive,
although small effect of SA on the NNSs’ perceived degree of FA. These results also
matched the gains observed in the analysis of the pronunciation accuracy scores.
Figure 3: Individual and group mean FA scores for NNS (Experiment 2). 11 subjects
(highlighted in red) obtained positive scores, signalling improvement after SA. The group
mean (.36) was also positive.
Taken together, these findings suggest that increased experience with the L2 in the
context of SA was beneficial for the learners’ pronunciation development, as measures
of pronunciation accuracy and perceived degree of FA both point towards improved
performance after SA. Such an improvement can be explained on the basis of the
excellent opportunities for oral practice available while abroad, as the learners take
advantage of the exposure to varied and authentic L2 input, and may engage themselves
in meaningful interactions in real communicative situations which may lead to useful
feedback from native speakers. The positive albeit moderate effect of SA on
pronunciation found in this study is also in accordance with the results of most SA
research, which report significant gains in other linguistic skills such as vocabulary
(Collentine 2004), writing (Pérez-Vidal and Juan-Garau 2009) and especially oral
fluency (Juan-Garau and Pérez-Vidal 2007, Trenchs-Parera 2009, Valls-Ferrer 2011).
Interestingly though, when it comes to phonological development, the scant existing
research has not provided consistent evidence supporting a large effect of SA on
improved pronunciation, despite the positive outcomes shown in other oral abilities, and
the fact that oral production is assumed to be one of the most practiced skills while
abroad. Hence, our findings regarding improved accuracy in pronunciation contrast with
previous research which has mostly focused on the analysis of a limited number of
specific vowel and/or consonantal L2 sounds (Avello et al. in press, Díaz-Campos 2004,
Pérez-Vidal et al. 2011), and has failed to show a substantial impact of SA on segmental
production. These divergences may be attributable to the differences in the selected
object of study. Instead of analysing a limited set of discrete units, the present study has
targeted a wider range of phonological features, by looking into various phenomena at
the segmental and suprasegmental level, including phonemic deletions, insertions and
substitutions, as well as stress implementation, which affect not only discrete units, but
also syllable structure.
A slight positive impact of SA was found when analysing perceived degree of
accentedness, although in this case the improvement was non-significant and suggested
no large effect of SA on this domain. This is very much in line with previous FA
research within the context of SA. Højen (2003) found a significant improvement in his
participants' FA ratings after SA, but when exploring individual differences, he obtained
a positive correlation between foreign accent ratings and length of stay (average 7.1
months); improvement was observed for those learners with longer SA (of up to 11
months), whereas learners with 3 to 4 months of SA did not improve significantly. He
concluded that length of stay was an important factor for improvement of perceived FA
to take place. Similarly, Avello (2011) also failed to find significant improvement in FA
scores for a group of participants who had spent a 3-month period abroad. These
findings may be explained by the fact that listeners seem to rate speech samples for
accentedness holistically (Magen 1998), paying attention not only to aspects of
segmental production or stress, but also to other suprasegmental or prosodic properties
of speech, e.g. rhythm, intonation, pauses, or connected speech phenomena. In this
sense, a 3-month programme may be too short for substantial improvement to accrue in
these other areas of pronunciation.
3.3. Relationship accuracy scores/FA ratings
The relationship between the phonetic and FA measures was explored by means of
Pearson correlations. A strong correlation was found between the two measures at
pre-test (r=.814) and post-test (r=.730), and both correlations were significant at the .01 level
(p<.005). This strong correlation points to a relationship between accuracy scores and
FA scores, in such a way that the production by the NNS of fewer pronunciation errors
resulted in the perception by the NNJ of a lower degree of accentedness, whereas the
larger the number of pronunciation errors, the higher the degree of accentedness
perceived. These results are in line with previous research which has established a
correlation between perceived accentedness in L2 speech samples and the phonetic
characteristics of those speech samples in terms of divergences from native-like
pronunciation patterns (Brennan and Brennan 1981, Magen 1998). Despite the fact that
improvement in FA scores did not reach significance, it seems that our non-native
listeners were nonetheless able to perceive the decrease in pronunciation errors between
pre-test and post-test, i.e. they can be considered as “good judges” who correctly
performed their task. Given their ability to perceive these differences in pronunciation,
and the fact that they were also phonetically trained, it is likely that they also focused on
other phonetic-acoustic properties of the speech samples, for instance, at the
suprasegmental level mentioned above, which might not have differed substantially after
SA, resulting in the differences in significance found for the accuracy scores as
compared with the FA ratings.
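The correlation analysis in this section can be illustrated with a small sketch. The paired values below are invented for illustration and are not data from this study; the helper names are hypothetical. The second function converts a correlation coefficient into the t statistic commonly used to test it, assuming n = 23 learners.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


def t_from_r(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


# Hypothetical error counts and mean FA ratings for five learners.
errors = [0, 2, 3, 5, 9]
fa_ratings = [2.1, 3.9, 4.5, 5.2, 6.4]
print(round(pearson_r(errors, fa_ratings), 3))

# t value implied by the reported pre-test correlation, assuming n = 23.
print(round(t_from_r(0.814, 23), 2))
```

A positive coefficient here means that learners with more pronunciation errors also received higher (more accented) FA ratings, which is the direction of the relationship reported above.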
4. Summary and Conclusions
This study aimed at furthering our understanding of the impact SA may have on L2
learners’ pronunciation development. Although the few existing studies suggest that SA
does not substantially change learners’ pronunciation patterns, our findings indicate that
SA may, indeed, result in gains for this specific area, even after a short-term immersion
programme of only 3 months.
Phonetic measures of pronunciation accuracy suggest a large impact of short-term
SA on production at the level of segmental articulation, as well as at the suprasegmental
level of stress implementation, since a significant decrease of pronunciation errors was
found in the learners' speech production after SA. However, there is no evidence of a
large effect of SA on global FA scores; a positive trend seems to emerge towards less
accented speech, but it is not strong and is far from significant.
Despite the differences observed between the two types of scores regarding strength
of the SA impact, phonetic accuracy scores and perceived FA ratings are shown to be
strongly correlated. This strong correlation points to a relationship between both types of
measures, which is interpreted in terms of the proficient non-native listeners' ability to
perceive the phonetic characteristics of the speech samples, namely, the decrease in
pronunciation errors between pre-test and post-test, assigning worse FA ratings to speech
samples containing a larger number of mispronunciations.
To summarise, it seems that SA offers the kind of input and practice that may be
conducive to improvement in pronunciation (as is the case in other linguistic areas,
especially oral performance) for those learners who are able to draw on the contact
opportunities and the exposure to a massive amount of quality input that characterise this
learning context. At least our findings regarding pronunciation accuracy seem to indicate
so. But these results should be taken with caution, as the learners' FA scores fail to
improve significantly or to even approach native-like scores after SA, notwithstanding
the significant decrease in pronunciation errors. This may be an indication that
substantial improvement is more likely to accrue at the segmental level and regarding
stress, but it is possible that other areas of pronunciation not analysed in our study, such
as rhythm or intonation, may not be affected by SA, or may require longer periods of
immersion to benefit from the SA experience.
References
Avello, P., Lara, A.R., Mora, J.C., and Pérez-Vidal, C. In press: The impact of Study
Abroad and Length of Stay on Phonological Development in Speech Production.
Proceedings of the 30th Aesla Conference, Universitat de Lleida.
Avello, P. 2011: Measuring Perceived Pronunciation Gains in Study Abroad:
Methodological Issues. Paper presented at the 29th Aesla Conference, Universidad
de Salamanca.
Barron, A. 2006: Learning to Say 'You' in German: The Acquisition of Sociolinguistic
Competence in a Study Abroad Context. In A. DuFon, and E. Churchill (Eds.) 2006:
Language Learners in Study Abroad Contexts. Clevedon: Multilingual Matters, pp.
Boersma, P., and Weenink, D. 2009: Praat: Doing phonetics by computer, version 5.1.
Bradlow, A. R., Akahane-Yamada, R., Pisoni, D. B., and Tohkura, Y. 1999: Training
Japanese Listeners to Identify English /r/ and /l/: Long-Term Retention of Learning
in Perception and Production. Perception and Psychophysics, 61(5), 977-985.
Brennan, E. M., and Brennan, J. S. 1981: Measurements of Accent and Attitude Toward
Mexican-American Speech. Journal of Psycholinguistic Research, 10(05), 487-501.
Brecht, R., Davidson, D., and Ginsberg, R. 1995: Predictors of Foreign Language Gain
during Study Abroad. In B. Freed (Ed.), Second Language Acquisition in a Study
Abroad Context. Amsterdam: John Benjamins Publishing Company, pp.37-66.
Collentine, J. 2004: The Effects of Learning Contexts on Morphosyntactic and Lexical
Development. Studies in Second Language Acquisition, 26(2), 227-248.
Derwing, T. M., and Munro, M. J. In press: The development of L2 oral language skills
in two L1 groups: A seven-year study. Language Learning.
Derwing, T. M., Munro, M. J., and Wiebe, G. 1998: Evidence in Favor of a Broad
Framework for Pronunciation Instruction. Language Learning, 48(3), 393-410.
Díaz-Campos, M. 2006: The Effect of Style in Second Language Phonology: An
Analysis of Segmental Acquisition in Study Abroad and Regular-Classroom
Students. In C. A. Klee, and T. L. Face (Eds.), Selected Proceedings of the 7th
Conference on the Acquisition of Spanish and Portuguese as First and Second
Languages. Somerville, MA: Cascadilla Proceedings Project, pp.26-39.
Díaz-Campos, M. 2004: Context of Learning in the Acquisition of Spanish Second
Language Phonology. Studies in Second Language Acquisition, 26, 249-273.
DuFon, A., and Churchill, E. (Eds.) 2006: Language Learners in Study Abroad Contexts.
Clevedon: Multilingual Matters.
Flege, J. E. 1995: Second Language Speech Learning Theory, Findings, and Problems.
In W. Strange (Ed.), Speech Perception and Linguistic Experience: Issues in
Cross-Language Research. Timonium, MD: York Press, pp.233-277.
Flege, J. E. 1988: Factors Affecting Degree of Perceived Foreign Accent in English
Sentences. Journal of the Acoustical Society of America, 84(1), 70-79.
Flege, J. E., and Fletcher, K. L. 1992: Talker and Listener Effects on Degree of
Perceived Foreign Accent. Journal of the Acoustical Society of America, 91(1), 370-389.
Fougeron, C., and Smith, C. L. 1999: French. In IPA (Ed.), Handbook of the
International Phonetic Association pp.78-81.
Freed, B. (Ed.) 1995: Second Language Acquisition in a Study Abroad Context.
Amsterdam: John Benjamins Publishing Company.
Freed, B. F., Dewey, D. P., Segalowitz, N., and Halter, R. 2004: The Language Contact
Profile. Studies in Second Language Acquisition, 26(2), 349-356.
Howard, M. 2005: On the role of context in the development of learner language:
Insights from study abroad research. ITL International Journal of Applied
Linguistics, 148, 1-20.
Højen, A. D. 2003: Second-language speech perception and production in adult learners
before and after short-term immersion. Unpublished doctoral dissertation. University
of Aarhus.
Juan-Garau, M., and Pérez-Vidal, C. 2007: The Effect of Context and Contact on Oral
Performance in Students Who Go on a Stay Abroad. VIAL, 4, 117-134.
Kinginger, C. 2009: Language Learning and Study Abroad. Basingstoke: Palgrave
Lafford, B. 1995: Getting Into, Through and Out of a Survival Situation: A Comparison
of Communicative Strategies Used by Students Studying Spanish Abroad and 'At
Home'. In B. Freed (Ed.), Second Language Acquisition in a Study Abroad Context.
Amsterdam: John Benjamins Publishing Company, pp. 97-122.
Llanes, À. 2011: The Many Faces of Study Abroad: An Update on the Research on L2
Gains Emerged during a Study Abroad Experience. International Journal of
Multilingualism, 8(3), 189-215.
Llanes, À., and Muñoz, C. 2009: A Short Stay Abroad: Does it make a Difference?
System, 37(3), 353-365.
MacKay, I. R. A., Flege, J. E., and Imai, S. 2006: Evaluating the Effects of
Chronological Age and Sentence Duration on Degree of Perceived Foreign Accent.
Applied Psycholinguistics, 27, 157-183.
Magen, H. S. 1998: The Perception of Foreign-Accented Speech. Journal of Phonetics,
26(4), 381-400.
Martínez-Celdrán, E., Fernández Planas, A. M., and Carrera-Sabaté, J. 2003: Castilian
Spanish. Journal of the International Phonetic Association, 33(02), 255-259.
Mora, J. C. 2008: Learning Context Effects on the Acquisition of a Second Language
Phonology. In C. Pérez-Vidal, M. Juan-Garau and A. Bel (Eds.), A Portrait of the
Young in the New Multilingual Spain. Clevedon: Multilingual Matters, pp.241-263.
Munro, M. J., Derwing, T. M., and Morton, S. L. 2006: The Mutual Intelligibility of L2
Speech. Studies in Second Language Acquisition, 28(1), 111-131.
Pérez-Vidal, C., and Juan-Garau, M. 2011: The Effect of Context and Input Conditions
on Oral and Written Development: A Study Abroad Perspective. International
Review of Applied Linguistics in Language Teaching, 49(2), 157-185.
Pérez-Vidal, C., and Juan-Garau, M. 2009: The Effect of Study Abroad on Written
Performance. In L. Roberts, D. Véronique, A. Nilsson and M. Tellier (Eds.),
EUROSLA Yearbook Volume 9 (2009). Amsterdam: John Benjamins Publishing
Company, pp.269-295.
Pérez-Vidal, C., Juan-Garau, M., and Mora, J. C. 2011: The Effects of Formal
Instruction and Study Abroad Contexts on Foreign Language Development: The
SALA Project. In C. Sanz, and R. P. Leow (Eds.), Implicit and Explicit Conditions,
Processes and Knowledge in SLA and Bilingualism. Washington D. C.: Georgetown
University Press, pp.115-138.
Riney, T., and Flege, J. E. 1998: Changes Over Time in Global Foreign Accent and
Liquid Identifiability and Accuracy. Studies in Second Language Acquisition, 20,
Roach, P. 2004: British English: Received Pronunciation. Journal of the International
Phonetic Association, 34(02), 239-245.
Schneider, E. W., Burridge, K., Kortmann, B., Mesthrie, R., and Upton, C. (Eds.) 2004:
A Handbook of Varieties of English. Volume 1: Phonology. Berlin: Mouton de
Segalowitz, N., and Freed, B. 2004: Context, Contact and Cognition in Oral Fluency
Acquisition: Learning Spanish in at Home and Study Abroad Contexts. Studies in
Second Language Acquisition, 26(2), 173-199.
Trenchs-Parera, M. 2009: Effects of Formal Instruction and Stay Abroad on the
Acquisition of Native-Like Oral Fluency. The Canadian Modern Language Review,
65(3), 365-393.
Trofimovich, P., Lightbown, P., Halter, R. H., and Song, H. 2009: Comprehension-Based
Practice: The Development of L2 Pronunciation in a Listening and Reading
Program. Studies in Second Language Acquisition, 31, 609-639.
Valls-Ferrer, M. 2011: The development of oral fluency and rhythm during a study
abroad period. Doctoral dissertation. Universitat Pompeu Fabra. Retrieved from
The primary purpose of this paper is to explain the procedure of developing the English
Read by Japanese Phonetic Corpus. A series of preliminary studies (Makino 2007, 2008,
2009) made it clear that a phonetically-transcribed computerized corpus of Japanese
speakers’ English speech was worth making. Because corpus studies on L2 pronunciation
have been very rare, we intend to fill this gap. For the corpus building, the 1,902 sentence
files in the English Read by Japanese speech database scored for their individual sounds
by American English teachers trained in phonetics in Minematsu, et al. (2002b) have been
chosen. The files were pre-processed with the Penn Phonetics Lab Forced Aligner to
generate Praat TextGrids where target English words and phonemes were forced-aligned
to the speech files. Two additional tiers (actual phones and substitutions) were added to
those TextGrids; the actual phones were manually transcribed and the other tiers were
aligned to that tier. Then the TextGrids were imported to ELAN, which has a much better
searching functionality. So far, fewer than 10% of the files have been completed and the
corpus-building is still in its initial stage. The secondary purpose of this paper is to report
on some findings from the small part of the corpus that has been completed. Although it is
still premature to talk of any tendency in the corpus, it is worth noting that we have found
evidence of phenomena which are not readily predicted from L1 phonological transfer,
such as the spirantization of voiceless plosives, which is not considered normal in the
pronunciation of Japanese.
1. Introduction
The purpose of this paper is to explain the procedures in developing the English Read by
Japanese (henceforth ERJ) Phonetic Corpus and to report on some findings from the
small part of the corpus that has been completed.
A series of preliminary studies (Makino 2007, 2008, 2009) made it clear that a
phonetically-transcribed computerized corpus of Japanese speakers’ English speech was
worth making. So the first author began building the ERJ Phonetic Corpus by making
use of the ERJ speech database (Minematsu, et al. 2002a), which he also used in the
preliminary studies.
Corpus studies on L2 pronunciation have been very rare (cf. Gut 2009, Meng, et al.
2009). We intend to fill this gap with this study.
Takehiko Makino and Rika Aoki
2. The ERJ speech database
The ERJ speech database was collected mainly in order to help CALL system
development (Minematsu, et al. 2002a). 807 different sentences and 1,009 different word
sets were read aloud by 100 male and 100 female speakers in 20 different recording sites
in Japan. All of the sites were universities and all the speakers were students in those
universities.
Each sentence and each word were read by approximately 12 speakers and 20
speakers respectively of each sex. In total, the ERJ speech database consists of more than
70,000 speech files: 24,744 sentence files and 45,495 word-set files.
2.1 ERJ recording procedure
The following explanation of the recording procedure of the ERJ speech database is
based on Minematsu, et al. (2002a). Before recording, speakers were asked to practice
pronouncing the sentences and words on the given sheets. In the practice, they were
permitted to refer to the reading sheets with phonemic and prosodic notation.
The phonemic symbols used in the training sheets are based on those of the TIMIT
database and the CMU pronunciation dictionary. The model of the pronunciation is
therefore Mainstream American English. The actual symbols used are shown below with
their IPA equivalents:
Consonants: P /p/, T /t/, K /k/, B /b/, D /d/, G /ɡ/, CH /tʃ/, JH /dʒ/, F /f/, TH /θ/, S /s/, SH
/ʃ/, HH /h/, V /v/, DH /ð/, Z /z/, ZH /ʒ/, M /m/, N /n/, NG /ŋ/, L /l/, R /r/, W /w/, Y /j/
Vowels: IY /i/, IH /ɪ/, EH /ɛ/, EY /eɪ/, AE /æ/, AA /ɑ/, AW /aʊ/, AY /aɪ/, AH /ʌ/, AO /ɔ/,
OY /ɔɪ/, OW /oʊ/, UH /ʊ/, UW /u/, ER /ɝ/, AXR /ɚ/, AX /ə/
Each vowel was specified for degrees of stress: “1” for primary, “2” for secondary
and “0” for unstressed.
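For convenience, the symbol inventory above can be expressed as a lookup table. The mapping itself is taken from the list above; the function names and the choice to return the stress digit as a separate value are illustrative assumptions and not part of the ERJ materials.

```python
# Symbol-to-IPA mapping copied from the inventory listed above.
ERJ_TO_IPA = {
    # Consonants
    "P": "p", "T": "t", "K": "k", "B": "b", "D": "d", "G": "ɡ",
    "CH": "tʃ", "JH": "dʒ", "F": "f", "TH": "θ", "S": "s", "SH": "ʃ",
    "HH": "h", "V": "v", "DH": "ð", "Z": "z", "ZH": "ʒ", "M": "m",
    "N": "n", "NG": "ŋ", "L": "l", "R": "r", "W": "w", "Y": "j",
    # Vowels
    "IY": "i", "IH": "ɪ", "EH": "ɛ", "EY": "eɪ", "AE": "æ", "AA": "ɑ",
    "AW": "aʊ", "AY": "aɪ", "AH": "ʌ", "AO": "ɔ", "OY": "ɔɪ", "OW": "oʊ",
    "UH": "ʊ", "UW": "u", "ER": "ɝ", "AXR": "ɚ", "AX": "ə",
}


def split_symbol(token):
    """Split 'IH1' into ('IH', 1); consonants such as 'DH' yield ('DH', None)."""
    if token[-1].isdigit():
        return token[:-1], int(token[-1])
    return token, None


def to_ipa(token):
    """Return (IPA string, stress level or None) for one ERJ symbol."""
    base, stress = split_symbol(token)
    return ERJ_TO_IPA[base], stress


print(to_ipa("IH1"))   # vowel with primary stress
print(to_ipa("DH"))    # consonant, no stress digit
```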
Since the IPA is used for transcribing pronunciations in English dictionaries in Japan,
the above set of symbols was unfamiliar to the Japanese subjects. In order to ensure that
the speakers understood these symbols correctly, a website was prepared where they
could listen to word examples for each phonemic symbol. On that website, they also
could listen to sample sentences with prosodic notations (explained below) so that they
could understand what those notations meant.
However, the degree to which the speakers made use of the learning materials was
entirely up to them; it is possible that some of the speakers were more influenced by
spelling than by the phonemic notation.
English Read by Japanese Phonetic Corpus: An Interim Report
Examples of sentences in the training sheets are shown below: 1
This was easy for us.
[DH IH1 S] [W AA1 Z] [IY1 Z IY0] [F AO1 R] [AH1 S]
Is this seesaw safe?
[IH1 Z] [DH IH1 S] [S IY1 S AO2] [S EY1 F]
Those thieves stole thirty jewels.
[DH OW1 Z] [TH IY1 V Z] [S T OW1 L] [TH ER1 T IY0] [JH UW1 AX0 L Z]
The phonemic notations were removed in the sheets used in the recording sessions,
because it was inferred that reading sentences with phonemic notation could induce
unnatural pronunciation.
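The bracketed notation in the examples above is regular enough to be parsed mechanically. The following sketch, whose function names are hypothetical, splits a notation line into words and extracts the stress digits of each word; it is offered only as an illustration of the format, not as a tool used in the ERJ project.

```python
import re


def parse_notation(line):
    """Turn '[DH IH1 S] [W AA1 Z]' into [['DH', 'IH1', 'S'], ['W', 'AA1', 'Z']]."""
    return [group.split() for group in re.findall(r"\[([^\]]*)\]", line)]


def stress_pattern(word):
    """Stress digits of the vowels in one word, in order of occurrence."""
    return [int(tok[-1]) for tok in word if tok[-1].isdigit()]


line = "[DH IH1 S] [W AA1 Z] [IY1 Z IY0] [F AO1 R] [AH1 S]"
words = parse_notation(line)
for word in words:
    print(word, stress_pattern(word))
```

Run on the first example sentence, the sketch shows that every word in the notation, including the function words, contains a vowel marked for primary stress.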
Examples of the sentences with rhythmic specifications are shown below:
Come to tea with John.
[K AH1 M] [T UW1] [T IY1] [W IH1 DH] [JH AA1 N]
Come to tea with John and Mary.
@ -/
[K AH1 M] [T UW1] [T IY1] [W IH1 DH] [JH AA1 N] [AE1 N D] [M EH1 R IY0]
“@” stands for nuclear stress, “+” for non-nuclear primary/secondary stress, and “-”
for unstressed syllables. Here again, the phonemic notations were removed from the
reading sheets for the recording sessions, while the rhythmic specifications were retained.
Examples of the sentences specified for their intonation are shown below.
Note that the intonation curves are not based on any particular theoretical frameworks
but only indicate impressionistically what was judged to be important. The last line of each
sentence is an instruction in Japanese about the meanings/attitudes which were
(supposedly) conveyed by the intonation.
It is evident from these examples that different degrees of “sentence accents” and “weak form”
pronunciations of function words were not taken into consideration when preparing the
phonemic notation. The same is true for the “rhythm-specified” and “intonation-specified”
sentences discussed below. This could have led the speakers to pronounce the sentences using
“citation form” for every word.
Again, the phonemic notations were removed from the reading sheet used in the
recording sessions, while the intonation curves were retained.
In the recording sessions, speakers were asked to read aloud sentences and words on
the given sheets repeatedly until they were sure that they pronounced them correctly. If
they made errors on the same sentences three times, they were allowed to skip them and
go on to the next one.
After recordings, each speech file was checked by the technical staff of the recording
site. If they found any technical errors in sentences or words, they were recorded again.
Minematsu, et al. (2002a) claim that with this procedure, the pronunciation errors in
the database are supposed to have been made purely because of the speakers’ lack of
skills in English pronunciation and not because of their lack of knowledge about
phonological forms of individual words or spelling-to-pronunciation correspondences.
3. Corpus building procedure
3.1 Selection of speech files
Obviously, it was impractical to use the whole database for the corpus building because
of its sheer size. Fortunately, however, 9,494 speech files have been selected and given
pronunciation proficiency scores by American teachers trained in phonetics in another
study (Minematsu, et al. 2002b). The selected files are grouped into five sets:
Sentence files scored for their individual sounds: 1,902
Sentence files scored for their rhythm: 952
Sentence files scored for their intonation: 952
Word files scored for their individual sounds: 3,786
Word files scored for their stress pattern: 1,902
In the ERJ Phonetic Corpus, we have chosen to use only the first group, i.e., the 1,902
sentence files scored for their individual sounds for transcription. The reason for this
choice is that the other sentence groups were specified for their rhythm or intonation,
which could have distorted what Japanese speakers normally do when they read English
Word sets have not been chosen because we are not interested in the pronunciation of
individual words.
3.2 Transcription
To reduce the effort of manual transcription, the files were pre-processed by the Penn
Phonetics Lab Forced Aligner (p2fa; http://www.ling.upenn.edu/phonetics/p2fa/), which
produced forced-aligned Praat (Boersma and Weenink 2011) TextGrids for each speech file
with two tiers: the “word” tier and the “phone” (= phoneme) tier.
The p2fa is designed for Mainstream American English speech, so it was inevitable
that the Japanese speakers’ speech resulted in transcriptions with numerous errors.
Figure 1: An example of a TextGrid output from the Penn Phonetics Lab Forced Aligner
shown on Praat.
Then, using Praat software, two tiers (actual phones > “actual” and substitutions >
“subst”) were added to the TextGrids and “word” and “phone” tiers were re-interpreted
as target words and target phonemes respectively.
The actual phones were manually transcribed, and boundaries of target phones and
target words were manually aligned with the actual phones. The second author of this
paper was involved at this very important stage.
The substitution tier is the same as the actual phone tier, except that consecutive
actual phones were merged into one unit if they corresponded to a single target phoneme.
This tier is only necessary for searching purposes; the duration information of each
phone is retained in the actual phone tier.
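The relation between the actual phone tier and the substitution tier can be sketched in a few lines of Python. This is an illustration only: in the corpus the tiers were edited by hand in Praat, whereas the tuple representation of intervals, the midpoint rule used to assign an actual phone to a target phoneme, and the sample labels below are all assumptions made for this sketch.

```python
from typing import List, Tuple

Interval = Tuple[float, float, str]  # (start time, end time, label)

def build_subst_tier(target: List[Interval], actual: List[Interval]) -> List[Interval]:
    """Merge actual phones that fall inside the same target phoneme interval."""
    subst = []
    for t_start, t_end, _ in target:
        # Assumption: an actual phone belongs to a target phoneme
        # if its midpoint lies inside the target interval.
        labels = [
            label
            for a_start, a_end, label in actual
            if t_start <= (a_start + a_end) / 2 < t_end
        ]
        subst.append((t_start, t_end, "".join(labels)))
    return subst

# Hypothetical data: target /b/ realized as [b] followed by an inserted vowel.
target_tier = [(0.00, 0.12, "B"), (0.12, 0.30, "IY1")]
actual_tier = [(0.00, 0.07, "b"), (0.07, 0.12, "ɯ"), (0.12, 0.30, "i")]

subst_tier = build_subst_tier(target_tier, actual_tier)
print(subst_tier)  # [(0.0, 0.12, 'bɯ'), (0.12, 0.3, 'i')]
```

Because the merged unit spans the whole target interval, the durations of the individual phones have to be read from the actual phone tier, as noted above.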
Figure 2: An example of a re-formatted and corrected TextGrid shown on Praat.
The re-formatted and corrected TextGrids were then imported to ELAN software
(Sloetjes and Wittenburg 2008; http://www.lat-mpi.eu/tools/elan/), which has a much
better searching functionality than Praat. The resulting .eaf files and the original .wav
files are the complete individual data of the corpus. So far, fewer than 10% of the files
have been completed and the corpus-building is still in its initial stage.
Figure 3: An example of the Corpus data shown on ELAN.
4. Preliminary findings
In this very small corpus, the following consonantal tendencies, among others, have
been found.
Figure 4: An example of a search result by ELAN.
4.1 Voiced plosives
The voiced plosive phonemes are frequently spirantized (realized as fricatives): 32% for
/b/, 15% for /d/ and 8% for /ɡ/. The equivalent phonemes are often (but not obligatorily
like, for example, in Spanish) spirantized between vowels in Japanese, so this
distribution seems entirely natural. But the situation is not quite so simple. Let’s look at
the individual cases below.
4.1.1 /b/
Count (n=41)
/b/ --> [b]
/b/ --> [β]
/b/ --> [bɨ]
/b/ --> [bɯ]
/b/ --> [b˺]
/b/ --> [b̥]
/b/ --> [p˺]
/b/ --> [v]
/b/ --> [ɸ]
Table 1: ERJ realizations of /b/ and their phonetic contexts
In the above table, shaded cells in the “Realization” column represent spirantized
realizations, and those in the “Contexts” column represent possible spirantizing
conditions. “V” represents target vowels, [approx] approximants (liquids and
semivowels), [nas] nasals, and [obstr] obstruents (plosives, fricatives and affricates). “0”
represents a pause, so “0_” and “_0” correspond to syllable-initial and syllable-final
positions respectively.
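The context notation described above can be made concrete with a small Python sketch that derives a label from the segments on either side of a target phone. The segment inventories and function names are assumptions introduced for illustration; they are not part of the corpus tools.

```python
# Illustrative (incomplete) inventories; assumptions made for this sketch.
VOWELS = {"a", "i", "ɯ", "e", "o", "ə", "æ", "ɪ", "ʊ"}
APPROXIMANTS = {"l", "ɹ", "j", "w"}
NASALS = {"m", "n", "ŋ"}

def segment_class(seg):
    """Map a neighbouring segment (or None for a pause) to its class label."""
    if seg is None:
        return "0"
    if seg in VOWELS:
        return "V"
    if seg in APPROXIMANTS:
        return "[approx]"
    if seg in NASALS:
        return "[nas]"
    return "[obstr]"  # plosives, fricatives and affricates

def context_label(prev_seg, next_seg):
    """Build a label such as 'V_V' or '0_V' from the two neighbours."""
    return f"{segment_class(prev_seg)}_{segment_class(next_seg)}"

print(context_label("a", "o"))   # V_V
print(context_label(None, "a"))  # 0_V
print(context_label("a", "t"))   # V_[obstr]
```

Here "V_V" is the intervocalic context, and "V_[obstr]" is the context after a vowel and before an obstruent.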
The spirantizing condition for Japanese voiced plosives is “between vowels,” but this
does not necessarily result in the spirantization of /b/, as shown in the table. This reflects
the fact that spirantization is a variable process in Japanese.
Other possible spirantizing conditions, from a universal phonetic point of view, which
do not appear in Japanese but do so in English include syllable-final (or “weak”)
positions. “V_[obstr]” (after a vowel and before an obstruent) is a possible context where
the following obstruent is very likely to be the onset of the following syllable. This is
basically an impossible consonantal sequence in Japanese, and the difficulty in
pronunciation can also be resolved by other means than spirantization such as vowel
insertion, which does not occur in the current data. The devoiced ([b̥]) and unreleased
([p˺]) realizations seem to be more English-like resolutions in this condition.
4.1.2 /d/
Count (n=54)
/d/ --> [d]
/d/ --> [d]
/d/ --> [tʰ]
/d/ --> [ð]
/d/ --> [dʰ]
/d/ --> [t]
/d/ --> [t˺]
/d/ --> [z]
/d/ --> [ɾ]
/d/ --> [ʃ]
/d/ --> [θɨ]
Table 2: ERJ realizations of /d/ and their phonetic contexts
/d/ is spirantized rather infrequently in Japanese, much less often than the other voiced
plosives. This is reflected in the table, where shaded conditions correspond to many
cases of non-spirantized realizations.
4.1.3 /ɡ/
Count (n=25)
/ɡ/ --> [ɡ]
/ɡ/ --> [ɡɨ]
/ɡ/ --> [ɡ]
/ɡ/ --> [ŋɡ]
/ɡ/ --> [ɣ]
/ɡ/ --> [x]
Table 3: ERJ realizations of /ɡ/ and their phonetic contexts
Here again, the fricative realizations are infrequent. More cases of non-spirantized [ɡ]
appear in spirantizing conditions than fricative realizations.
The /ɡ/ in Japanese can be realized as a velar nasal [ŋ] as well as a [ɡ] or spirantized
[ɣ] between vowels, but this variant does not appear in the current data.
4.2 Voiceless plosives
The voiceless plosive phonemes are also sometimes spirantized: 14% for /p/, 7% for /t/
and 6% for /k/. This cannot be a case of L1 transfer because this sort of “weakening” is
not considered normal for Japanese speech.
4.2.1 /p/
Count (n=50)
/p/ --> [p]
/p/ --> [pʰ]
/p/ --> [ɸ]
/p/ --> [pɨ]
/p/ --> [pɯ]
/p/ --> [p˺]
Table 4: ERJ realizations of /p/ and their phonetic contexts
Here, we are only concerned with spirantized cases; the phonetic conditions in the
non-spirantized cases ([p, pʰ, p˺]) are too varied, and in any case released [p] is what the
phonetics literature generally reports for this sound in Japanese.
The fact that a spirantized realization [ɸ] does appear (though infrequently) is in itself
notable. It is possible that /p/ is sometimes spirantized in spontaneous Japanese speech
under some conditions, but we do not possess the data necessary to confirm this. All the
conditions where it appears are spirantizing conditions for voiced plosives. There might
be some universal phonetic process at work which can spirantize voiceless plosives in
these conditions.
4.2.2 /t/ and /k/
Count (n=112)
/t/ --> [t]
/t/ --> [tʰ]
/t/ --> [t˺]
/t/ --> [tɨ]
/t/ --> [ts]
/t/ --> [tʲ]
/t/ --> [tθ]
/t/ --> [d]
/t/ --> [s]
/t/ --> [tɯ]
/t/ --> [tʃ]
/t/ --> [θ]
/t/ --> [ð]
/t/ --> [ɾ]
Table 5: ERJ realizations of /t/ and their phonetic contexts
Again, we are only concerned with spirantized cases. It is to be noted that spirantized
realizations are found even in “non-spirantizing” conditions. Much the same can be said
of the spirantization of /k/.
Count (n=73)
/k/ --> [k]
/k/ --> [kʰ]
/k/ --> [x]
/k/ --> [kɨ]
/k/ --> [k˺]
/k/ --> [xk]
Table 6: ERJ realizations of /k/ and their phonetic contexts
4.3 Voiced (inter)dental fricatives
/ð/ is very frequently mispronounced: only 13.5% were canonical [ð]. The most frequent
pronunciation was [d], which accounts for 32.4%, and the next most frequent were [dz]
(27%) and [z] (21.6%).
Count (n=37)
/ð/ --> [d]
/ð/ --> [dz]
/ð/ --> [z]
/ð/ --> [ð]
/ð/ --> [dʰ]
Table 7: ERJ realizations of /ð/ and their phonetic contexts
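The percentages reported for /ð/ can be converted back into token counts, given the total of 37. The following sketch does this arithmetic; the resulting counts are reconstructed from the rounded percentages and are therefore an inference, not figures taken from the table.

```python
n_total = 37
reported_percent = {"[d]": 32.4, "[dz]": 27.0, "[z]": 21.6, "[ð]": 13.5}

# Convert each reported percentage back to a whole-number token count.
counts = {real: round(pct / 100 * n_total) for real, pct in reported_percent.items()}
print(counts)  # {'[d]': 12, '[dz]': 10, '[z]': 8, '[ð]': 5}

# Tokens not covered by the four reported figures
# (presumably the remaining [dʰ] row of the table).
remaining = n_total - sum(counts.values())
print(remaining)  # 2
```

On this reading, the four reported realizations cover 35 of the 37 tokens.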
The different realizations are more or less evenly distributed, and with so little data we
refrain from commenting on the conditions in which they are found, although
plosive realizations [d, dʰ, dz] seem to be preferred in syllable-initial positions.
4.4 /n/
/n/ was found to be pronounced as some sort of nasalized vowel in more than 30% of the
cases. This can be predicted from Japanese phonology, whose moraic nasal /N/ is
regularly realized as a nasalized vowel before a vowel, semivowel, sibilant fricative /s/
(which is usually realized either as [s] or [ʃ]) or /h/.
In the table below, [sib] means “sibilant fricative” and specific sounds in their
contexts are also transcribed where appropriate.
It is to be noted that nasalized vowel realizations appear even before obstruents in
some cases. This again is not predictable from the phonology of Japanese, and cannot be
a case of L1 transfer.
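The prediction from Japanese phonology stated above can be written as a simple predicate. The sketch below is a simplification: the segment sets are illustrative, and the function name is introduced here only for the example.

```python
# Illustrative segment sets; assumptions made for this sketch.
VOWELS = set("aiɯeo")
SEMIVOWELS = {"j", "w"}
NASALIZING_CONSONANTS = {"s", "ʃ", "h"}  # realizations of /s/, and /h/

def predicts_nasalized_vowel(next_seg):
    """True if Japanese /N/ is expected to surface as a nasalized vowel before next_seg."""
    return (
        next_seg in VOWELS
        or next_seg in SEMIVOWELS
        or next_seg in NASALIZING_CONSONANTS
    )

print(predicts_nasalized_vowel("a"))  # True
print(predicts_nasalized_vowel("s"))  # True
print(predicts_nasalized_vowel("t"))  # False: a plosive, so L1 transfer is not predicted
```

Nasalized vowel realizations in contexts for which this predicate returns False are the ones that L1 transfer cannot account for.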
Count (n=138)
/n/ --> [n]
/n/ --> [ə]
/n/ --> [m]
/n/ --> [ĩ]
/n/ --> [ɲ]
/n/ --> [õ]
/n/ --> [ŋ]
V_[p, b]
/n/ --> [ẽ]
/n/ --> [n <silence> n]
/n/ --> [n]
/n/ --> [æ]
/n/ --> [ ]
/n/ --> [ɯ]
/n/ --> [ɲɲ]
/n/ --> [ ]
/n/ --> [ʊ]
Table 8: ERJ realizations of /n/ and their phonetic contexts
5. Remaining problems
5.1 Lack of prosodic notation
The corpus is intended to be a source of all the phonetic characteristics of Japanese
speakers’ English speech. Therefore, prosodic notation is also necessary.
However, L2 prosody is very difficult to describe. Studies such as Gut (2009) and Li,
et al. (2011) use English ToBI (Beckman, et al. 2005) for L2 English, which I believe is
a mistake. An L2 prosodic system is neither that of the L1 nor that of the target language, but
something of a mixture of the two.
The first author of this paper will be addressing this problem and proposing a
notational system of Japanese speakers’ English prosody in Makino (forthcoming).
5.2 Inefficiency of manual transcription
Development of spoken corpora lags far behind that of written corpora for obvious
reasons; that is, transcribed texts are not readily available, although making such texts
can be facilitated by using automatic speech recognition (ASR) technologies.
The development of L2 spoken corpora is even more difficult, because ASR
technologies have not been developed for non-native speech. Even more difficult than
this is a phonetically transcribed L2 corpus such as ours, because narrow
phonetic transcription (independent of any language) is required.
Tsubaki and Kondo (2011) tried using ASR technologies in the development of their
Japanese speakers’ L2 English corpus, with reasonably good results, but this entailed an
enrichment of the dictionary with all the possible pronunciations for each entry that
could be conceived of in terms of contrastive phonetics of the two languages. Unless the
size of the dictionary necessary is very small like theirs (the text they used was “The
North Wind and the Sun”), I do not think it practical.
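The practical difficulty can be illustrated by counting how quickly the number of dictionary entries grows when each phoneme is allowed several realizations. The sketch below is a hypothetical illustration: the variant table uses a few realizations attested in Section 4 and is not the dictionary of Tsubaki and Kondo (2011), whose contents are not described here.

```python
from itertools import product

# Hypothetical variant table built from realizations reported in Section 4.
VARIANTS = {
    "ð": ["ð", "d", "dz", "z"],
    "b": ["b", "β"],
}

def expand(phonemes):
    """Return every pronunciation obtained by choosing one variant per phoneme."""
    options = [VARIANTS.get(p, [p]) for p in phonemes]
    return [" ".join(choice) for choice in product(*options)]

the_entries = expand(["ð", "ə"])
print(the_entries)  # ['ð ə', 'd ə', 'dz ə', 'z ə']

brother_entries = expand(["b", "ɹ", "ʌ", "ð", "ɚ"])
print(len(brother_entries))  # 8
```

With only two phonemes given alternatives, a five-phoneme word already needs eight entries; the number multiplies with every further phoneme that has variants.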
6. Further work
We have decided that a different set of files (800 in total) is to be included in the ERJ
Phonetic Corpus. Those files were selected independently of the study discussed in §3.1
from the ERJ database for another study (Minematsu, et al. 2011), where the recordings
were played over the telephone to Americans who were not familiar with Japanese
speakers’ English. The subjects were asked to repeat the sentences they heard and the
responses were written down orthographically.
With this data, we will be able to explore what sort of actual phones tend to be
misheard or not understood at all. This can be a basis for the study of intelligibility.
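One simple way to quantify how much of a sentence was understood is to compare the listener's written response with the prompt word by word. The sketch below is an assumption made for illustration: it is not the scoring used in Minematsu, et al. (2011), and the prompt and response are invented.

```python
def word_match_rate(prompt, response):
    """Share of prompt words that also occur in the listener's written response."""
    prompt_words = prompt.lower().split()
    response_words = set(response.lower().split())
    matched = sum(1 for w in prompt_words if w in response_words)
    return matched / len(prompt_words)

# Invented prompt/response pair, for illustration only.
rate = word_match_rate("the north wind was strong", "the north win was wrong")
print(rate)  # 0.6
```

Words that fail to match could then be traced back to the actual phone tier to see which realizations they contain.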
The research for this study was supported by Grant-in-Aid for Scientific Research (B)
No.23300067 (project leader: Nobuaki Minematsu) from the Japan Society for the
Promotion of Science, and a Chuo University Grant for Special Research.
While the perception of Polish-accented English by native speakers has been studied
extensively (e.g. Gonet & Pietroń 2004, Scheuer 2003, Szpyra-Kozłowska 2005, in press),
the opposite phenomenon, i.e. the perception of English-accented Polish by Poles, has not,
to our knowledge, been examined so far despite a growing number of Polish-speaking
foreigners, including various celebrities, who appear in the Polish media and whose
accents are often commented on and even parodied.
In this paper we offer a report on a pilot study in which 60 Polish teenagers, all
secondary school learners (aged 15-16) listened to and assessed several samples of
foreign-accented Polish in a series of scalar judgement and open question tasks meant to
examine Poles’ attitudes to English accent(s) in their native language.
More specifically, we aimed at finding answers to the following research questions:
• How accurately can Polish listeners identify foreign accents in Polish?
• How is English-accented Polish, when compared to Polish spoken with a
Russian, Spanish, French, Italian, German and Chinese accent, evaluated by
Polish listeners in terms of the samples’ degree of:
(a) comprehensibility
(b) foreign accentedness
(c) pleasantness?
• What phonetic and phonological features, both segmental and prosodic, are
perceived by Polish listeners as characteristic of English-accented Polish?
• Can Polish listeners identify different English accents (American, English
English and Scottish) in English-accented Polish?
• Does familiarity with a specific foreign language facilitate the recognition and
identification of that accent in foreign-accented Polish?
1. Introduction
While the perception of Polish-accented English by native speakers has been studied
extensively (e.g. Gonet & Pietroń 2004, Scheuer 2003, Szpyra-Kozłowska 2005, in
press), the perception of foreign-accented Polish by Poles has not, to our knowledge,
been examined so far despite a growing number of Polish-speaking foreigners, including
various celebrities, who appear in the Polish media and whose accents are often
commented on and even parodied. They include, for example, an American model of
Polish descent, an Italian dancer, a German actor and comedian, and a French chef with
Polish roots. Apart from such celebrities, more and more foreigners undertake to learn
Polish: students who study in Poland, businessmen representing their firms, citizens of
the former Soviet republics (mostly Ukrainians and Byelorussians) seeking employment
in this country and many others. In recent years Polish has, in fact, become a popular
language to learn, as shown in the growing number of Polish language schools that have
opened in the major Polish cities, such as Warsaw and Cracow. 1 These facts allow us to
claim that Poles have found themselves in a fairly new situation of being increasingly
exposed to many different versions of foreign-accented Polish. It is therefore interesting
to examine how such accents are perceived and evaluated by Polish listeners.
In this paper we offer a report on a study in which 60 Polish teenagers listened to and
assessed several samples of foreign-accented Polish in a series of scalar judgement and
open question tasks meant to examine Poles’ perception of several foreign accents in
their native language, including three English accents.
More specifically, we aimed at finding answers to the following research questions:
• How accurately can Polish listeners identify foreign accents in Polish?
• How is English-accented Polish, when compared to Polish spoken with a Russian,
Spanish, French, Italian, German and Chinese accent, evaluated by Polish
listeners in terms of the samples’ degree of:
(a) comprehensibility
(b) foreign accentedness
(c) pleasantness (acceptability)?
• What phonetic and phonological features, both segmental and prosodic, are
perceived by Polish listeners as characteristic of English-accented Polish?
• Can Polish listeners identify different English accents (American, English
English and Scottish) in English-accented Polish?
• Does familiarity with a specific foreign language facilitate the recognition and
identification of that accent in foreign-accented Polish?
It should be pointed out that as the present study is limited in terms of the number and
quality of the analysed accent samples as well as in employing only one group of
assessors, its results should be regarded as preliminary and subject to future verification. 2
2. Experimental design
In this section we present the relevant details concerning the design of the experiment we
have carried out in order to examine the perception of English-accented Polish. We deal
here first with the samples of Polish subject to evaluation and then with the listening and
assessment procedure.
1 It is worth pointing out that many citizens of the British Isles undertake to learn Polish because
their jobs require contacts with Polish immigrants.
2 After the completion of this paper another experiment of a similar design was carried out by the
authors in which the same speech samples were evaluated by different participants, i.e. 60 Polish
Department students (aged 20-24) of Maria Curie-Skłodowska University in Lublin. The results
obtained in both groups are very similar and support the majority of the conclusions drawn in
this paper. A fuller discussion of the latter experiment can be found in Szpyra-Kozłowska and
Radomski (in press).
The Perception of English-Accented Polish - A Pilot Study
2.1. Samples of foreign-accented Polish
For the purposes of the experiment, 20 foreign speakers of Polish were recorded between
July and November 2011 while performing two tasks: reading a short passage
taken from a Polish coursebook for beginners3 and talking with one of the
experimenters on some everyday topics. Nine samples were then selected for accent
evaluation. The speakers (5 men and 4 women) were citizens of the USA, Scotland,
England, Russia, Germany, Italy, France, Spain and China (speaker of Mandarin), all
staying temporarily in Poland and learning Polish for a variety of personal and
professional reasons and for different periods of time (ranging from several weeks up to
three years). Care was taken to select speech samples with a similar, i.e. average, degree
of foreign-accentedness, that is, those in which a foreign accent was noticeable or
even strong, but which generally did not hinder the intelligibility of utterances. 4 Only
samples of reading were used in the experiment since they were more uniform with
respect to their degree of accentedness than the recordings of spoken Polish in which
numerous grammatical errors made them often incomprehensible. Moreover, as the
focus of this study was on pronunciation problems, grammatically correct written
passages were more appropriate for diagnostic purposes. 5 Each recording was between
1.5 and 2 minutes long.
2.2. Listeners
Nine samples of foreign-accented Polish were presented to a group of 60 Polish boys and
girls, aged 15-16, all attending a junior secondary school (gymnasium) in Lublin, where
one of the experimenters was an English teacher. All the participants had been learning
English for about 5-6 years and, apart from it, also another language, i.e. German,
Spanish, Russian or French. These facts indicate that all of them are acquainted with
English pronunciation (usually in its RP version), but are also familiar with the sounds of
some other languages, which should facilitate accent assessment.
2.3. Listening and assessment procedure
In November 2011, the participants were informed that they would listen to the
recordings of several speech samples of Polish provided by foreign learners of this
language and then would be asked to assess them by completing the prepared answer
sheets. They did it in two sessions (5 samples were evaluated in the first session and 4
samples in the second one), with a one-week interval between them, during their regular
English lessons. Each sample was presented twice and then ample time was given to the
students to provide answers. Whenever necessary, additional explanations were provided
by the experimenter.
3 The texts used in the experiment were adapted from Swan (2005).
4 We were not always successful in this respect and while extreme cases of exceptionally good and
very poor Polish pronunciation were rejected, the experimental samples cannot be claimed to be
uniform in terms of their degree of accentedness.
5 It should be pointed out, however, that there are also drawbacks of employing samples of reading
as many foreign speakers’ pronunciation is heavily influenced by Polish spelling.
The answer sheets contained three scalar judgement tasks concerning the samples’
degree of comprehensibility, foreign-accentedness and pleasantness, as well as three
open questions in which the subjects were asked to identify the speakers’ country of
origin, to list their most striking pronunciation features and to describe a given accent in
impressionistic terms. Finally, the students supplied information on their age, sex and
foreign languages they learnt. Needless to say, the study was anonymous.
3. Results and discussion
The presentation and discussion of the results given below will follow the research
questions provided in section 1.
3.1. Foreign accent recognition
In the first question of our study we asked the participants to identify the country of the
speakers’ origin. They succeeded in completing this task in 37.5% of cases. 6
Below we present the percentage of the correct answers dividing the nine accents into
three groups: those which were (relatively) easy to recognize (above 50% of the correct
responses), those which were difficult to identify (20% of correct responses and less) and
those which were of medium difficulty (between 20% and 50%).
Accents which were easy to identify (over 50%):
Russian – 86%
Chinese – 56%
American English – 36% (83%)
Thus, the absolute winner was a Russian accent in Polish, or, to generalize, the East
Slavic accents, including also Ukrainian and Byelorussian.7 This result can be attributed
not only to very distinct features of this accent, but also to its considerable familiarity to
Polish listeners who are often exposed to it in the media, for example in the news reports
of Polish-speaking reporters from Kiev or Vilnius, and who can also hear it from
(mostly) Ukrainian citizens, particularly numerous in the Lublin region, situated in the
east of Poland, close to the Ukrainian border.
The second accent, recognized by 56% of participants, was Chinese, which is
surprising for two reasons. First, the recorded Chinese woman speaks beautiful, fully
intelligible Polish, with only a few phonetic departures from the original. Secondly,
Polish learners are not often exposed to this accent. Yet, its phonetic properties were
distinct enough to lead to this high result.8
6 In our experiment the identification task was very difficult as the choice was not limited as is
frequently the case in other accent studies, where the participants have to choose from several
provided options, as in Flege and Fletcher (1992) or Mareuil, Brahimi and Gendrot (2010).
7 In fact, these three accents are very similar and cannot be easily told apart.
Finally, an American English accent in Polish was placed in this group although only
36% of the answers were fully correct. It must be added, however, that 47% of the participants
identified it as ‘some kind of English.’ This yields 83% of the responses recognizing this
accent as produced by a native speaker of English. As a matter of fact, the American
English accent in Polish turned out to be the most English-sounding accent of the three
varieties subject to analysis. An explanation of this fact should be sought in the
participants’ frequent exposure to American English, mainly through films and songs.
Accents which were of medium difficulty to identify (20%-50%)
German – 36%
Only one accent, i.e. German, appeared to be of medium difficulty to identify and was
recognized correctly by 36% of the participants only. Two comments are in order. First,
this fairly low result might follow from the young age of the subjects. In the case of
older Poles the success of identifying this accent might be greater due to massive
exposure of the oldest generation to German during World War II, numerous war films
popular in Poland until the 1980s and a considerably larger number of German learners in
Poland in the past than now. Secondly, the pupils who took part in the experiment live in
eastern Poland, with relatively few German visitors. It would therefore be interesting to
find out whether similar results would be obtained in western regions where the ties with
Germany are much stronger.
Accents which were difficult to identify (20% and below):
Italian – 21%
French – 15%
English English – 3.3%
Scottish English – 1.6%
Spanish – 0%
As many as five accents out of nine are placed in the third group as those which
were particularly difficult to recognize for the listeners. Within this set, the Italian and
the French samples were identified correctly by considerably more participants than the
remaining three accents, which include English English, Scottish English and Spanish
(below 4% of the correct responses). Quite surprisingly, both the English English and
Scottish English samples belong here in spite of the fact that all the participants are
learners of English and should thus be familiar with typical phonetic properties of this
language and at least with those features which are common to the majority of its
varieties. Spanish-accented Polish has to be singled out as the accent which failed to be
recognized completely, with no correct responses at all (0%).
To sum up this part of our experiment, of the three English accents presented to the
listeners, only Polish with American English features was relatively easy to recognize.
8 It should be added here that we counted as correct those answers according to which the accent
under discussion was described as Japanese, since in a common, though completely incorrect, view
prevalent in Poland, Chinese and Japanese are regarded as similar languages.
A question that arises in connection with the data above is whether accent
recognition depends on the degree of the samples’ accentedness, as it might be assumed
that the more accented someone’s speech is, the easier it is to identify the speaker’s
origin. In other words, if more phonetic clues are available to the listener, this should
facilitate accent recognition. A comparison of the results provided in this section with
those concerning accentedness in section 3.3. shows, however, that this connection is
only partial. Thus, while the American and Russian samples were regarded as both
strongly accented and easy to identify, the English English recording was considered
strongly accented but difficult to recognize. Moreover, the German speaker had,
according to the participants, the strongest foreign accent of all, yet its correct
identifications amounted to 36% only. On the other hand, there is a high correlation
between the samples’ low level of accentedness and a small degree of their recognition
since the Spanish, French, Italian and Scottish English recordings are found in this
category. To sum up, while ‘the weaker the accent, the more difficult it is to identify’
principle appears to hold true, its opposite does not.
To shed more light on accent perception, it seems also interesting to examine the
erroneous judgements in some detail. Below we present the number of countries
indicated as the place of the speakers’ origin:
German – 26
American English – 11
Scottish English – 24
French – 11
English English – 19
Italian – 11
Spanish – 13
Russian – 4
Chinese – 12
Thus, the German-accented and Russian-accented samples are two extremes in this
evaluation as in the former case as many as 26 different countries were listed (including
such unlikely candidates as Korea, Japan, Canada, Jamaica and Hungary) and in the
latter only 4. Polish pronounced with a Scottish English accent and with an English English accent also caused considerable differences of opinion while the nationality of
the American English speaker was less controversial.
It should also be pointed out that some interesting patterns can be observed in the
incorrect evaluations of English-accented Polish. Thus, to 26% of the participants the
Scottish sample sounded German and to 23% Czech or Slovak, whereas the English
English sample was considered to be uttered by a German speaker by 20% of the
subjects and 18% of them viewed it as produced by someone from Africa.9 This means
that while the recognition of these accents is extremely poor, their Germanic nature is
identified by about one fourth of the listeners.
3.2. Accent recognition versus language learning
Another research question concerned the relation between accent recognition and
familiarity with the specific foreign languages. According to the experimental data, this
correlation is either very weak or nonexistent.
Thus, while all the participants are learners of English, only about 30% of them
recognized the three samples produced by English speakers as uttered by a person from
an English-speaking country. Similarly, of 37 learners of German, only 13 identified this
accent correctly. What is more, none of the 10 learners of Spanish provided the correct
answer and the French accent was properly recognized only by nine pupils who had
never learnt this language.
9 It is interesting to note that in the case of such judgements the name of the whole continent was
provided and not of individual countries. This means that Polish participants either assume that
there is something like one African accent or simply cannot tell these accents apart.
Only in the case of Russian-accented Polish were these two factors correlated in that
all 10 learners of Russian identified the accent correctly, but since a similar decision was
made by numerous other participants who do not know this language, this fact can be
viewed as accidental.
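The figures given in this section for learners of individual languages can be expressed as recognition rates. The following sketch uses only the counts reported above; the variable names are introduced here for illustration.

```python
# (correct identifications, number of learners), as reported in this section.
learners = {
    "German": (13, 37),
    "Spanish": (0, 10),
    "Russian": (10, 10),
}

rates = {language: correct / total for language, (correct, total) in learners.items()}

for language, rate in rates.items():
    print(f"{language}: {rate:.0%}")
# German: 35%
# Spanish: 0%
# Russian: 100%
```

The rate for learners of German (35%) is practically the same as the 36% obtained for the whole group, which is in line with the weak relation between language learning and accent recognition.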
We feel therefore justified in concluding that exposure to foreign accents and their
characteristic phonetic properties plays a greater role in accent identification than foreign
language learning.
3.3. Evaluation of samples’ comprehensibility, foreign-accentedness and pleasantness
Three experimental tasks involved making scalar judgements by the listeners in order to
assess the samples’ degree of comprehensibility, foreign-accentedness and pleasantness.
In the first of them the participants were requested to indicate how difficult it was to
understand a given sample by choosing one of five options ranging from ‘very easy to
understand’ to ‘incomprehensible.’ The results fall roughly into two types: the samples
considered either very difficult or completely incomprehensible by over 45% of the
listeners and those which were viewed as either very easy or rather easy to
understand by over 50% of the subjects. The first category comprises the following:
Accents which were very difficult to understand / incomprehensible
German – 96%
English English – 45%
American English – 55%
Scottish English – 46%
Russian – 50%
As shown above, the German-accented sample was rated as the most incomprehensible
by as many as 96% of the respondents as it was indeed the most heavily accented
recording. It is striking that all three English-accented samples were also placed in this
group in spite of the fact that all the participants learn English, which should facilitate
comprehension.
Let us examine now the second group of samples.
Accents which were very easy / rather easy to understand
Chinese – 76%
Spanish – 58%
Italian – 76%
Scottish English – 53%
French – 71%
Russian – 50%
The Chinese, Italian and the French samples were absolute winners in this category. The
Scottish English speaker was judged by 53% of the subjects as rather easy to understand.
Two contradictory evaluations should be pointed out concerning the Russian and
Scottish recordings which found themselves in both categories. Thus, a similar number
of the subjects maintained that they were easy / rather easy to understand and that they
were difficult to comprehend.
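The two samples that received contradictory evaluations can be found mechanically by intersecting the two lists given above. In the sketch below the percentages are those reported in this section; the dictionary names are chosen here for illustration.

```python
# Percentages of listeners per category, as reported in this section.
difficult = {"German": 96, "English English": 45, "American English": 55,
             "Scottish English": 46, "Russian": 50}
easy = {"Chinese": 76, "Spanish": 58, "Italian": 76,
        "Scottish English": 53, "French": 71, "Russian": 50}

# Accents that appear in both categories.
both = sorted(set(difficult) & set(easy))
print(both)  # ['Russian', 'Scottish English']

for accent in both:
    print(accent, difficult[accent], easy[accent])
# Russian 50 50
# Scottish English 46 53
```

For both samples the listeners were split almost evenly between the two extremes.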
The next task required specifying the degree of samples’ foreign-accentedness. The
participants were provided with five options (from ‘slight’ to ‘very strong foreign
accent’). The results are presented below.
Strong / very strong foreign accent:
German – 85%
English English – 76%
American English – 85%
Chinese – 40%
Russian – 83%
Scottish English – 33%
According to the above figures, the American English and English English samples were
regarded as strongly accented by 85% and 76% of the listeners respectively. The
judgements were less severe in the case of the Scottish English recording, which was
considered strongly accented by 33% of the subjects.
The remaining samples were perceived as pronounced with a very slight or slight
foreign accent.
Very slight / slight foreign accent
Spanish – 26%
Italian – 20%
French – 25%
Scottish English – 18%
Again, we should note the occurrence of the Scottish recording in both categories, which
shows that this particular sample was difficult for the listeners to evaluate.
The third task consisted in deciding how pleasant / unpleasant sounding a given
accent was. As in the previous cases, five options were supplied to choose from. The
relevant figures are given below.
Rather unpleasant / very unpleasant accents:
German – 71%
American English – 50%
Russian – 63%
English English – 50%
Both American English and English English accents in Polish were placed in this group
with about half of the subjects regarding them as either rather unpleasant or very
unpleasant. It is worth pointing out that, apart from Russian, the remaining samples
represent Germanic languages, commonly perceived by Poles as harsh sounding.
The most pleasant accents included the following:
Very pleasant / pleasant accents:
French – 55%
Italian – 48%
Spanish – 33%
It is striking that all the three samples found in the category of pleasant sounding accents
were provided by speakers of Romance languages, in common Polish opinion regarded
as nice and melodious.
The greatest differences of opinion were observed in the case of two accents, i.e.
Chinese and Scottish English, with a similar number of respondents judging them as
pleasant and unpleasant:
Scottish English
As in the remaining instances, the Scottish recording appears to stand apart from the
other ones in triggering contradictory judgements of the listeners.
A closer examination of the above data shows that there is a large degree of
correlation between the three aspects of accent perception analysed in this section. Thus,
the French, Italian and Spanish samples were judged easy to understand, only slightly
foreign-accented and pleasant sounding. On the other hand, German, Russian, American
English and English English samples were assessed as difficult to understand, heavily
accented and unpleasant sounding. Only in the case of the Chinese and Scottish English
samples were the judgements less uniform; both were perceived as strongly accented but
easy to understand and this discrepancy may be the reason why they were evaluated in
two extreme ways in terms of their pleasantness.
To sum up, of the three English accents in Polish, the Scottish English recording was
more highly evaluated by Polish listeners than the American English and English
English samples in terms of its comprehensibility and aesthetic qualities.
The mean results concerning the comprehensibility, accentedness and pleasantness of
the experimental samples are presented in the table below. A five point scale (1-5) was
used, where the higher the figure, the more severe the participants’ judgements.10
Speakers’ native comprehensibility accentedness pleasantness
English English
American English
Scottish English
Table 1. Mean evaluations of the samples’ comprehensibility, accentedness and pleasantness
The data in Table 1 confirm our earlier observations concerning a high degree of
correlation between the listeners’ evaluations of the samples’ comprehensibility,
accentedness and pleasantness. This is in agreement with the findings of previous
research (e.g. Fayer and Krasinski 1987, Munro and Derwing 1995) which indicate that a
lower degree of foreign accent is associated with higher intelligibility and lower
10 It should be pointed out that there are some differences between the results presented earlier and
those in Table 1 due to the already discussed contradictory evaluations of some samples which
influence the mean values in the table.
11 It should be added that in accent evaluations various extralinguistic factors, such as, for
example, the listeners’ attitude towards various ethnic groups, often play an important role. We
address this issue in another experimental study which is now in preparation.
3.4. Perceived phonetic properties of English-accented Polish
The respondents were also requested to enumerate those phonetic properties of the
presented foreign accents which they found particularly striking. They did it either by
listing some words found in the samples and underlining their mispronounced portions
or by making explicit comments on the specific aspects of the speakers’ pronunciation,
such as, for example, “he pronounces ‘r’ in a strange way” or “she puts too much
emphasis on ‘p’ and ‘k’.”
All the participants were unanimous in pointing out the most noticeable features of
all the accents occurring in our experiment. The first of them concerns the pronunciation
of Polish coronals, i.e. the ‘soft’ realization of the postalveolar obstruents as
palatoalveolars. The second problem involves prepalatals, usually pronounced by foreign
learners also as palatoalveolars.12 In other words, Polish listeners often observed in the
experimental samples the lack of distinction between postalveolars and prepalatals,
rarely found in other languages.
The next common difficulty concerns consonant clusters which abound in Polish in
all positions, but are infrequent in other languages. In the case of the English-accented
samples the following word-initial clusters were often underlined as pronounced
incorrectly13: szczupła /ʂtʂupwa/ ‘slim,’ wcześnie /ftʂeɕɲe/ ‘early,’ przygotowuje
/pʂɨgotovuje/ ‘prepares,’ zdolny /zdolnɨ/ ‘talented,’ wstawać /vstavatɕ/ ‘get up,’
śniadanie /ɕɲadaɲe/, etc.
The respondents noted also some characteristic vowel features, i.e. frequent
replacements of the high front centralized vowel, spelt as <y> with its fully front
counterpart [i], e.g. Krystyna > [Kristina] ‘Christine’, medycyna > [medicina] ‘medical
science,’ as well as problems with the correct pronunciation of the so-called nasal
vowels, spelt as <ą> and <ę>, which are realized in several ways depending on the
Three additional features frequently appeared in the assessment of the English-accented samples. Many respondents noted aspiration of voiceless plosives, claiming that
the speakers put too much emphasis or stress on /p/ and /k/, ‘spit them out’ or simply
‘pronounced them in a funny way.’ Also the English rhotics in all three accents attracted
much attention with such comments as ‘she pronounces /r/ differently than we do’ (about
the American speaker), ‘he swallows many r’s’ (about the English English speaker) and
‘his /r/ is blurred / unclear’ (about the Scottish English speaker). Finally, some listeners
observed what they considered an unusual pronunciation of stressed and unstressed
vowels; the former were often lengthened, the latter reduced, as in do domu ‘home’
pronounced as [dədo:mu] and kupili ‘they bought’ rendered as [k’pili].
12 Other realizations of prepalatals, for example, as palatalized dentals / alveolars, were also
13 The incorrect versions contained either modifications of one or two consonants in a cluster, a
deletion of a segment or vowel insertion.
14 A more detailed description of the perceived phonetic properties of foreign-accented Polish can
be found in Szpyra-Kozłowska and Radomski (in press).
Thus, in their evaluations of English-accented Polish, the listeners paid attention almost
exclusively to segmental features, particularly those pertaining to consonants and
consonant clusters. Prosodic aspects of the experimental samples failed to be noticed by
almost all the participants, which contradicts those views (e.g. Jilka 2011) according to
which suprasegmental factors, and intonation in particular, are of primary importance in
the perception of foreign accent.
3.5. Impressionistic evaluation
In the final task the respondents were asked to provide their own descriptions of the
experimental samples’ accents which were, obviously, impressionistic in character. The
most striking observation we have made concerns a large number of negative terms in
comparison with positive evaluations.
Thus, the adjectives that were found in virtually all answer sheets were ‘dziwny’ and
‘śmieszny’, both of which are ambiguous in Polish as they are in English; the former
means both ‘strange’ and ‘difficult to accept, weird,’ the latter both ‘amusing’ and
‘ridiculous.’ Other frequently used terms include irytujący ‘irritating,’ denerwujący
‘annoying,’ żałosny ‘pathetic,’ okropny ‘terrible,’ sepleniący ‘lisping,’ nieporadny
‘clumsy,’ niewyraźny ‘unclear,’ nudny ‘boring,’ mówi jakby miał zatkany nos ‘speaks
through a stuffed nose,’ mówi jak pijany ‘sounds drunk.’
Positive and neutral terms such as interesujący ‘interesting,’ miły ‘nice,’ delikatny
‘delicate,’ fajny ‘cool,’ miękki ‘soft,’ egzotyczny ‘exotic,’ were rarely employed by the
This result of our study is further supported by the fact that in the scalar judgement
task which involved describing the accents’ pleasantness, two extreme options were
selected with strikingly different frequency; of 540 evaluations, only in 30 cases was the
‘very pleasant’ label chosen, while its opposite, i.e. ‘very unpleasant’ over three times
more often (108 times).
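The proportions behind this comparison can be checked with a few lines of arithmetic. The sketch below uses only the three counts reported above (540 evaluations, 30 ‘very pleasant’ and 108 ‘very unpleasant’ choices); the variable names are illustrative assumptions, not part of the original study.

```python
# Counts reported in the text for the pleasantness task.
total_evaluations = 540
very_pleasant = 30
very_unpleasant = 108

# How many times more often the negative extreme was chosen.
ratio = very_unpleasant / very_pleasant

# Share of all evaluations taken up by each extreme option, in per cent.
share_very_pleasant = round(100 * very_pleasant / total_evaluations, 1)
share_very_unpleasant = round(100 * very_unpleasant / total_evaluations, 1)

print(f"ratio: {ratio:.1f}")
print(f"very pleasant: {share_very_pleasant}%")
print(f"very unpleasant: {share_very_unpleasant}%")
```

On these counts the ‘very unpleasant’ label was chosen 3.6 times as often as ‘very pleasant’, i.e. in 20% as against about 5.6% of all evaluations.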
The accents which evoked most negative comments of the participants were German,
described as twardy ‘hard’, szorstki ‘harsh,’ toporny ‘coarse’ and barbarzyński
‘barbaric,’ but also English English and American English, perceived by many subjects
as plujący ‘spitting,’ sepleniący ‘lisping,’ niedbały ‘careless,’ and niechlujny ‘sloppy.’
The samples which received the most positive descriptions comprise Italian
(śmieszny ale fajny ‘funny but cool,’ interesujący ‘interesting’), Spanish (egzotyczny
‘exotic,’ ciekawy ‘interesting’) and Chinese (delikatny ‘delicate’).
The above facts point to a fairly critical attitude towards foreign-accented Polish on the part of the participants, who seem to fail to appreciate the amount of effort required in learning a
difficult language like Polish and who are rather harsh in their judgements. This might
stem from the fact of their insufficient exposure to foreign versions of Polish and the
resulting lack of tolerance towards something that is little known and should therefore be
approached with caution. The teenagers’ predominantly negative perception of foreign-accented Polish may also be attributed to a tendency typical of that particular age group
to express highly critical and frequently extreme and unbalanced opinions. Whether this
intolerance of foreign accents in Polish speech is a more general issue remains to be
investigated in the future.15
4. Conclusions
The present pilot study on the perception of foreign-accented Polish, and Polish spoken
with an English accent in particular, allows us to formulate several tentative conclusions.
1. Of the nine samples employed in our experiment, it was the easiest for the
participants to recognize the Russian, American and Chinese accents. The English
English and Scottish English samples were identified correctly by a few subjects
2. No significant correlation was found between the fact of learning a given foreign
language and the ease of its recognition. Only 30% of the participants, all
learners of English, were able to identify the English-accented samples as
produced by a native-speaker of English.
3. The American English and English English samples were assessed as difficult to
understand, heavily accented and unpleasant sounding. The Scottish English
accent received more favourable opinions on all three counts.
4. While a non-Polish pronunciation of postalveolar and prepalatal consonants as
well as problems with consonant clusters appear to be the most noticeable
properties of all foreign accents, those produced by native-speakers of English are
additionally perceived as having aspirated plosives, a nontrilled pronunciation of
rhotics, as well as lengthened stressed vowels and reduced unstressed vowels.
5. The participants take a critical attitude towards foreign-accented Polish shown,
among other things, in their use of many negative evaluative terms, several of
which were provided in reference to the English-accented samples, with Scottish
English again standing apart as perceived more positively.
As has already been pointed out, further research is needed to find out whether the above
conclusions will retain their validity when more samples of foreign-accented Polish and
different groups of participants are employed in the experimental procedure.
foreign language teaching and learning. Textbooks function in a foreign language
classroom in many capacities (Cunningsworth 1995), one of which is the provision of
text, used as a model for language practice, including practice of pronunciation. The
changing methodological trends in EFL pedagogy over the decades affect EFL textbook
pronunciation treatment in a variety of ways. In this paper a simple feasibility study is
presented whereby a few beginners’ textbooks are compared with respect to their handling
of pronunciation in the first unit of the course. Four textbooks come from about ½ century
ago, and three are sampled from among those currently available. On the descriptive level,
some analysis is offered of the phonetic (and especially phonolapsological) characteristics
of the sampled texts, as they changed through time. On the level of application, it is
claimed that, while the lexico-grammatical and pedagogical limitations on the content of
the first lessons/units in EFL textbooks leave authors little space for phonetic control, a
modicum of such control is feasible if attention is paid to such variables as pronunciation
difficulty and L1 transfer. The Phonetic Difficulty Index (PDI), which is briefly
introduced in the paper, can be used to measure and control some of these variables and
give the textbook authors and users a useful teaching/learning instrument.
1. Introduction
This is a feasibility study for a much larger potential research project into the treatment
of pronunciation in beginners’ EFL textbooks. Part of that project would be a diachronic
analysis of such textbooks over approximately half a century, to see how pronunciation
is introduced to beginning learners, both explicitly and implicitly, in the text, as well as
in the multimedia and online materials accompanying the recent generations of EFL
textbooks. The focus is not on the specifically phonetic resources, i.e. those whose stated
aim it is to teach pronunciation (see Wrembel 2004 for this perspective), but on the
standard materials targeting the general population of learners, with no ESP or other
The desirability of a study like this is dramatically underscored by: (i) the relative
paucity of research on the handling of pronunciation in EFL textbooks on the one hand,
and (ii) the fundamental importance of the textbook as the primary teaching/learning
resource in most EFL classrooms around the globe.
Włodzimierz Sobkowiak
Considering the above factors, as well as the enormity of the EFL resource market, both
synchronically and diachronically, a thorough analysis of EFL textbook phonetics would
be a project of impractically grandiose proportions. In this study I can only take a closer
look at some aspects of the whole issue. Consequently, I decided to concentrate on the
the changing methodological trends in FL pedagogy over the last five decades
affect EFL textbook pronunciation treatment in a variety of ways,
the lexico-grammatical and pedagogical limitations on the content of EFL
textbooks leave authors little space for phonetic control, but...
such control of textual material is feasible if attention is paid to such variables
as pronunciation difficulty and L1 transfer,
the Phonetic Difficulty Index (PDI) can be used to help measure and control
some of these variables and give the textbook authors and users a useful
teaching/learning instrument.
Within the limits of this short paper, I will try to throw some light on the above issues
by taking a comparative look at the treatment of pronunciation in two small samples
from EFL textbooks separated by several decades of time. The first sample comes from
my own first-time EFL experience, which locates it towards the end of the 1960’s. This
sample includes such textbooks as: Nauka angielskiego, English for everyone, Present
day English for foreign students and First things first (see References). I then compare
those old textbooks with a random sample of those which can currently be found on
bookshop shelves: Angielski dla samouków, Angielski nie gryzie! and Korepetycje
domowe 1. In both cases I only look at the contents of the respective “lesson/unit one” in
each of these books, with particular attention paid to how pronunciation is presented and
This methodology allows no pretense of being even close to traditionally conceived
scientific-empirical rigour, of course. On the one hand, for example, the old sample
would probably constitute about half of all EFL textbooks in use in Poland at that time,
Poland being behind the iron curtain, and EFL being discouraged, as opposed to
Russian. The market of EFL resources in contemporary Poland is booming, on the other
hand, and the socio-political situation is entirely different. From this point of view, then,
the two samples are hardly at all comparable. I believe, however, that they can still do
their service of yielding interesting preliminary and tentative insights into the issues here
treated. In the study proper of EFL textbook phonetics the selected empirical textbook
database would obviously need to be substantiated in a more rigorous manner.
2. The importance of the textbook in EFL
That the textbook is of fundamental importance in (formal) EFL teaching and learning,
and that it is in the very centre of almost all EFL classrooms around the world, is hardly
a controversial claim. Indeed, many teachers and educators, as well as researchers and
analysts, have noticed that the status of the textbook may well be too elevated, compared
This sample was actually taken at random from among beginning EFL textbooks available on
Empik shelves in November 2011.
to other available resources. This could be because the textbook plays a number of roles
at the same time. In his monograph entirely devoted to choosing the coursebook for an
EFL course, Cunningsworth lists the following roles:
“a resource for presentation material (spoken and written)
a source of activities for learner practice and communicative interaction
a reference source for learners on grammar, vocabulary, pronunciation, etc
a syllabus (…)
a resource for self-directed learning or self-access work
a support for less experienced teachers (…)” (Cunningsworth op cit:7)
All of these functions can, and normally do, refer to pronunciation work, including the
last one listed. However, as happens to be the case, the ‘support’ a teacher could count
on to obtain from a coursebook with respect to his/her work on pronunciation would in
most cases be negative. That is to say, few general textbooks offer teachers much by way
of methodological help with phonetics. Indeed, as noticed many times in relevant
research, explicit and systematic treatment of pronunciation is by and large absent from
most EFL coursebooks currently available (Szpyra-Kozłowska et al. 2003, Szymańska-Czaplak 2006). Thus, the teacher is ‘supported’ by the textbook in his/her belief that
pronunciation is best left alone: “Teaching English pronunciation is an area of language
Another authority on FLT wrote in 1981: “The importance of the textbook cannot be
overestimated. It will inevitably determine the major part of the classroom teaching and
the students’ out-of-class learning” (Rivers 1981:475). A generation later, and in a
completely different stage of the development of FLT methodology, we can find
surprisingly similar observations: “The heavy reliance on a coursebook in a foreign
language classroom is a crucial issue. The fact that the teachers and learners use the
coursebook and its supporting materials as their basic aid proves the importance of
selecting and evaluating an appropriate coursebook” (İnal 2006:22). İnal’s quote
immediately brings to mind two important issues. First: the work of “selecting and
evaluating” does not stop at the level of the textbook as a whole; once that is selected,
the teacher must often select and evaluate the contents of the textbook at hand on various
levels and from the point of view of various functionalities. For example, “is the
treatment of football vocabulary useful in Unit Six of my textbook, or should I try to find
something better?” Second, and in direct relevance to pronunciation: it would be easy
enough to select and evaluate material on the basis of how pronunciation is explicitly
treated in the given unit/lesson of the course. But how can a teacher evaluate the implicit
handling of pronunciation in the textbook? For example, what is the phonetic profile of
the text contained in the unit? Which words might be particularly troublesome to
learners? Is the phonetic difficulty progression through the coursebook in parallel with
the other gradients, such as those of vocabulary or grammar? Textbooks or methodology
guides bundled with them would not provide this kind of information for a number of
reasons: the overall neglect of EFL pronunciation in most syllabuses, curricula and
courses (cf., for example, Baran-Łucarz 2006), the paucity of relevant research guiding
the materials developers, or the concomitant lack of software support for phonetic
analysis of coursebook text. Later in this contribution the Phonetic Difficulty Index (PDI) is
used to demonstrate that some phonetic control over text is indeed feasible.
3. EFL textbooks then and now
Let us now have a look at some examples of the treatment of pronunciation in
coursebook texts for beginners as it was half a century ago, and as it is now. In order to
do this, I will illustrate my discussion with some facsimiles of authentic textbook pages
Fifty (and more) years ago coursebook authors did not shy away from explicitly
providing phonetic transcription from lesson one. In Figure 1 a snapshot of the very first
lines of unit one is shown of a textbook by MacCallum and Thomas Watson, published
in Poland in 1946.
Figure 1. MacCallum and Watson, 1946
This coursebook was originally published before WW II in London, and shows clear
signs of the grammar-translation method, e.g. the grammatical explanation of articles
right at the very beginning of the text. On the other hand, however, the simplified
phonetic transcription (see Sobkowiak 1997 for an in-depth treatment of L1-sensitive
simplification of phonetic transcription) shows the influence of the new, post-war
paradigm: that of audiolingualism. The learner is expected to try to speak from the very
beginning. Nowadays native-speaker recording would be used instead of transcription, of
course, but the principle is the same. The textbook continues with lesson one by
providing a text for practice; a part of it is reproduced in Figure 2.
Figure 2. MacCallum and Watson, 1946
It is samples of such introductory texts from several textbooks which will be used later
in this paper to make up a mini-corpus for the application of the PDI metric. At this point
let us only notice a few interesting points without going deeper into the text’s phonetic
structure. First, back in 1946 the EFL profession had not yet heard of the communicative
method, which shows in the quality of the text: the sentences are there entirely as
language specimens, rather than to really communicate anything. On the phonetic front
notice the high incidence of the definite article, one of the words hardest to pronounce
for a Polish EFL learner. Notice that in some cases the article could actually be avoided
in this text: “water is cold” is perfectly grammatical, and pragmatically speaking even
better than the original sentence (which water is cold, anyway?).
My own early learning of English was almost entirely based on the course of Frank
Candlin, published in 1963. It was reprinted many times in Poland; my own 1969 Polish
edition is the fourth. Figure 3 shows the very beginning of lesson one and the dialogue
appearing later in the lesson.
Figure 3. Frank Candlin, 1963
As in MacCallum and Watson before, notice the heavy reliance on some of the
phonetically hardest function words in English: this, these, that, those. The didactic
intent of this move is clear, of course, but it is equally evident that no phonetic reflection
went into the preparation of these introductory texts. The dialogue remains completely
wooden, with a pragmatically most infelicitous turn at the very end, doubtless meant to
illustrate a grammatical point, but misfiring badly.
It took a few more years for the communicative method to finally hit the mainstream
coursebooks. In Poland it was ushered in by the immensely popular course of
L. G. Alexander, published for this market in 1973. The opening dialogue in that course
took place at a railway station and went like this: “Excúse me! / Yés? / Ís thís yóur
hándbag? / Párdon? / Ís thís yóur hándbag? / Yés, it ís / Thánk you véry múch”. Notice
that: (i) word stress is indicated explicitly (and sometimes somewhat superfluously), (ii)
sandhi clusters like those in the middle of “is this” are practically unpronounceable well
into intermediate stages of EFL proficiency, even if the phrase is pragmatically very
natural and common. While there is a lot of emphasis on spoken practice in Alexander’s
course, there is little explicit treatment of pronunciation, which word seemed little short
of a four-letter word to many communicatively minded methodologists. Alexander’s
course may well mark this turning point between audio-lingualism and
communicativeness in EFL.
Finally, during my grammar school years (1970-1974) I used the then standard
school textbook, Smólska and Rusiecki’s English for everyone, 1965 edition. It was
unique among the books discussed so far in that it treated pronunciation seriously. In
Figure 4 the beginning of unit one is shown, with (Polish-simplified) phonetic
transcription used throughout.
Figure 4. Smólska and Rusiecki, 1965
The customary “this is…” appears here as well, providing a lot of space for error, for
example for final devoicing and/or sandhi regressive voice assimilation (typical of
Polglish accents in western Poland), as shown in the title of this paper: /zyzys'tom/. In
unit one of the course, i.e. one written for complete beginners, we can also find some
rather sophisticated discussion of the /æ/ vowel and final devoicing, using phonetic
terminology (in Polish), such as: mouth open, lower jaw, front of the tongue, incisors,
tensing, devoicing.
If we now fast-forward half a century, we will find ourselves in a completely
different textbook environment. Not only is there an almost uncountable variety of
coursebooks and accompanying multimedia materials with online support, but the EFL
teaching/learning paradigms have changed dramatically. In effect, we would be hard
pressed to find any explicit treatment of pronunciation in contemporary textbooks at all.
This includes phonetic transcription, too, which is maybe regarded as useless in view of
the easy availability of spoken resources in the form of recordings and video files. In
none of the three textbooks sampled here is there phonetic transcription in the first unit
of the course.
As a representative illustrative example let me use Birkenmajer and Mańko, published in
2004. Figure 5 holds the beginning of the first unit.
Figure 5. Birkenmajer and Mańko, 2004
The notorious “this is…” is gone. The pragmatic quality of the sentences is certainly
higher than it used to be ½ century earlier (with the notable exception of Alexander’s).
Birkenmajer and Mańko are not afraid to attach Polish translations to the target English
sentences. Finally, from the point of view of phonetics, it is striking that there is no
advice whatsoever about the pronunciation of the two morphophonemic variants of the
plural morpheme.
4. PDI analysis of the textbook sample
While the above overview of the EFL textbooks affords some superficial appreciation of
a number of phonetic issues, it would be hard to draw some more far-reaching
conclusions concerning the profiling of pronunciation on the basis of a scan of
introductory pages. This is why I decided, as mentioned above, to compile a mini-corpus
of text, collecting all object-text taken from the seven textbooks under consideration
here: four ‘then’ and three ‘now’ (see References for details). Object-text is here defined
as that which is the teaching target, rather than meta-text used for unit organization,
providing linguistic advice, introducing exercises, etc. Thus, the records collected in the
sample would include the sentences of expository text as well as utterances in dialogues.
There are altogether 77 records in the database, each one tagged with the textbook
identifier, phonetically transcribed and PDI-processed. The database can be
conceptualized and visualized in a number of ways. Figure 6 shows its view in a simple
lister overlay application running under Windows.
Figure 6. A sample of the textbook corpus/database
The highlighted record, The window is open, comes from MacCallum and Watson 1946
(‘cal’). The fourth column contains the PDI difficulty codes, the DIF column shows the
mean word-weighted PDI value of this record, the WORDS column holds the number of
words (four), and the CALY column sums up the global PDI value of the record (non-word-weighted; five in this case). Some phonetic difficulties identified by the PDI
algorithm are listed in their separate fields: thus, for example, J stands for schwa, and Z
stands for a word-final voiced obstruent (prone to erroneous devoicing in Polglish).
There are two occurrences of the former and one of the latter in The window is open.
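The relation between the three numeric columns can be sketched in code. This is a minimal illustration, assuming that word-weighting simply means dividing the summed difficulty points (CALY) by the number of words (WORDS) to obtain DIF. The class and attribute names are hypothetical, and the per-word split of the five points is invented for the example: only the totals (four words, five points) come from the record described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    """One object-text record with its PDI difficulty points per word."""
    text: str
    word_points: List[int]  # hypothetical per-word difficulty counts

    @property
    def words(self) -> int:
        # WORDS column: number of words in the record
        return len(self.word_points)

    @property
    def caly(self) -> int:
        # CALY column: global (non-word-weighted) PDI value of the record
        return sum(self.word_points)

    @property
    def dif(self) -> float:
        # DIF column: mean word-weighted PDI value (assumed to be CALY / WORDS)
        return self.caly / self.words


# The split 2 + 1 + 1 + 1 is an assumption; the text only gives the total of five.
record = Record("The window is open", [2, 1, 1, 1])
print(record.words, record.caly, record.dif)
```

Under this reading, the highlighted record would have a DIF value of 1.25, i.e. five difficulty points spread over four words.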
The PDI metric and algorithm have been introduced, described and analyzed in-depth
in a number of publications by Sobkowiak and Sobkowiak and Ferlacka (see
References). The most concise definition is this: “PDI is a global numerical measure of
the phonetic difficulty of the given English lexical item for Polish learners. The measure
combines (a) the most salient grapho-phonemic difficulties such learners are known to
have reading English, i.e. mostly spelling pronunciation, (b) some commonest phonemic
L1-interference problems known from the literature and my own teaching experience,
finally (c) some of the notorious developmental L2-interference pronunciation errors
observed in all learners of English regardless of their L1 background” (Sobkowiak
1999:214). In its current implementation PDI contains 63 points in its checklist. The
algorithm can be run over a word list or arbitrary text in ordinary spelling; it first
phonetically transcribes the text, and then tags it with identified difficulty points to
produce output shown in Figure 6. All types of phonolapsological statistics and analyses
can be initiated at this point. PDI has been used to study, among others, the
phonolapsological profile of dictionary definitions (Sobkowiak 2006a) and graded
readers (Sobkowiak and Ferlacka 2011).
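The tagging step can be illustrated with a toy example. The sketch below is not the PDI implementation: it covers only the two difficulty codes named earlier (J for schwa, Z for a word-final voiced obstruent), it takes hand-made phoneme lists as input instead of transcribing the text automatically, and the obstruent inventory and function names are simplifying assumptions.

```python
# Simplified set of voiced obstruents (an assumption made for this sketch).
VOICED_OBSTRUENTS = {"b", "d", "g", "v", "z", "ð", "ʒ", "dʒ"}


def tag_word(phonemes):
    """Return the list of difficulty codes found in one transcribed word."""
    # J: one code for every schwa in the word
    codes = ["J" for phoneme in phonemes if phoneme == "ə"]
    # Z: the word ends in a voiced obstruent (prone to devoicing)
    if phonemes and phonemes[-1] in VOICED_OBSTRUENTS:
        codes.append("Z")
    return codes


# Hand-made transcription of the record; diphthongs are single tokens,
# so the first element of "əʊ" is not counted as a schwa.
record = [
    ("The", ["ð", "ə"]),
    ("window", ["w", "ɪ", "n", "d", "əʊ"]),
    ("is", ["ɪ", "z"]),
    ("open", ["əʊ", "p", "ə", "n"]),
]

tags = {word: tag_word(phonemes) for word, phonemes in record}
all_codes = [code for codes in tags.values() for code in codes]

print(tags)
print("J:", all_codes.count("J"), "Z:", all_codes.count("Z"))
```

With this transcription the sketch finds two J codes and one Z code in The window is open, which agrees with the counts given for that record; the full 63-point checklist would of course flag further difficulties.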
The PDI algorithm has been run on the mini-corpus of coursebook text collected in
ways described earlier. Some of the global PDI statistics gleaned from this analysis
appear in Table 1. The ‘then’ column shows data for the four older textbooks, the ‘now’
column – for the three new ones. With this size sample no statistically significant effects
can be obtained, but the observed differences are certainly interesting and promising for
potential further research.
# records
# words
average record length (in words)
average PDI value per record
average PDI value per word
# ‘easy’ words (with PDI=0)
average ‘easy’ words per record
Table 1. Some phonolexical statistics: ‘then’ versus ‘now’
It will be seen that, while the number of records (sentences) is roughly equivalent for
both sub-samples, the number of words differs: apparently the sentences are now longer
than they used to be. This can, of course, be observed if Figure 5 is compared with the
previous ones: gone are the strangely concise “These are walls” entries in favour of more
communicatively felicitous, and longer, sentences. With longer sentences the overall
PDI value per sentence must grow as well, of course; it goes from 5.6 to 7.8 between
‘then’ and ‘now’. In plain language this means that there were almost six points of
pronouncing difficulty in one sentence in the beginning sections of the ‘old’ textbooks,
but there are almost eight such points in the equivalent sample of contemporary
textbooks. In all of my past work with PDI, however, this statistic has been weighted by
the number of words in a record, to avoid the counterintuitive claim that a longer
sentence is ipso facto phonetically harder than a shorter one. If word-weighting is
applied to the data at hand, the average PDI value figures for ‘then’ versus ‘now’ are not
very different, as can be seen in the table². Interestingly, the value seems to have gone
down a bit, an effect which is more dramatically observed in the number of ‘easy’
words per record across the two sub-corpora: this has grown more than five times
between then and now. Should this turn out in further research to be a robust effect, it
could be evidence that textbook writers do tend to make their resources more
phonetically user-friendly than used to be the case half a century ago. This is not to
claim that phonolapsological control is wielded directly; rather that some other editorial
decisions and choices indirectly affect the phonetic profile of the text. Incidentally, this
is also the phenomenon observed in the PDI analysis of pedagogical dictionary
definitions and graded readers.
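The difference between the per-record and the word-weighted statistic can be made concrete with a minimal sketch. The PDI totals below are invented for illustration and are not corpus data; the function names are hypothetical. The sketch also assumes that the per-word average is pooled over the whole sample (all points divided by all words), which is only one plausible reading of the statistic.

```python
from statistics import mean

# Illustrative records only: (sentence, raw PDI total). The totals are invented.
records = [
    ("This is Tom", 3),
    ("I like music", 0),
    ("Peter and John are talking to their wives", 12),
]


def pdi_per_record(recs):
    """Mean raw PDI per record; this figure tends to grow with sentence length."""
    return mean(total for _, total in recs)


def pdi_per_word(recs):
    """Pooled word-weighted PDI: all difficulty points over all words."""
    total_points = sum(total for _, total in recs)
    total_words = sum(len(sentence.split()) for sentence, _ in recs)
    return total_points / total_words


def weighted_record_pdi(sentence, total):
    """Word-weighted PDI of a single record."""
    return total / len(sentence.split())
```

Under this definition a four-word sentence carrying two difficulty points scores 0.5, however long the other sentences in the sample happen to be.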
In a larger study of textbook phonolapsology this would be an entry point to a more
thorough treatment of the collected corpus text. Space restrictions do not allow this here.
But a few more examples can be provided of how PDI can be used not only to analyze
textbooks for the benefit of writers and editors, but also to assist teachers and learners in
their tasks of evaluation and selection, mentioned at the beginning of this paper. Because
the PDI algorithm not only computes the overall PDI value of a word or sentence, but
also tags each word or sentence with the specific phonetic difficulty points it contains, as
exemplified in Figure 6, it is possible to select wanted material from text with a fair
degree of precision. Thus, not only can one obtain sentences with the highest/lowest PDI
value in the sample: “I work with many other teachers men and women” (PDI=16), “Peter
and John are talking to their wives” (PDI=21), “I like music” (PDI=0), “My name is Max”
(PDI=0.5, word-weighted), but one can also request sentences with a high/low incidence
of a given PDI code or code cluster (see Sobkowiak 2006b for so-called PDI
codegrams). If word-final (de)voicing is under study or practice, for example, sentences
with many instances of PDI(Z) can be located: “This is his dog”, “I teach many students
girls and boys”, “I work with many other teachers men and women” (all with 3
occurrences). By contrast, if no word-final voiced obstruents are wished for, it is easy to
use PDI to come up with: “I teach in a school in Coventry”, “They are all very intelligent”, “I
like music”.
² This value is notably lower, by the way, than the mean word-weighted PDI value counted over
the corpora of controlled-vocabulary dictionary definitions (Sobkowiak 2006a) or of simplified
graded readers (Sobkowiak and Ferlacka 2011), where mean PDI=1.79.
Similarly precise queries can be easily formulated for all of the 63 PDI codes.
Likewise, it is possible to combine queries for specific PDI codes with those for PDI
values, e.g.: give me those sentences which are generally phonetically easy, i.e. low PDI,
but with a high proportion of words containing a given phonetic difficulty. All of the
other existing variables, such as sentence word-length, word spelling, or textual
frequency, could be similarly combined into such queries.
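The kinds of query described above might be sketched as simple filters over tagged records. The record structure, the function names and the thresholds are assumptions made for illustration. The PDI(Z) counts and the PDI=0 value for I like music follow the examples quoted earlier, while the remaining PDI values are placeholders. The sketch also treats each code occurrence as falling in a separate word, which is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    text: str
    pdi: float                                 # overall (raw) PDI value of the sentence
    codes: dict = field(default_factory=dict)  # PDI code -> number of occurrences


# PDI values of 6 and 5 are placeholders, not corpus figures.
sample = [
    Record("This is his dog", 6, {"Z": 3}),
    Record("I like music", 0),
    Record("They are all very intelligent", 5),
]


def with_code(records, code, min_count=1):
    """Sentences containing at least `min_count` occurrences of a PDI code."""
    return [r for r in records if r.codes.get(code, 0) >= min_count]


def without_code(records, code):
    """Sentences free of a given PDI code."""
    return [r for r in records if r.codes.get(code, 0) == 0]


def easy_but_dense(records, code, max_pdi_per_word, min_share):
    """Combined query: low word-weighted PDI, high share of words with `code`."""
    selected = []
    for r in records:
        n_words = len(r.text.split())
        is_easy = r.pdi / n_words <= max_pdi_per_word
        is_dense = r.codes.get(code, 0) / n_words >= min_share
        if is_easy and is_dense:
            selected.append(r)
    return selected
```

Further variables such as sentence length or textual frequency could be added as extra fields on the record and combined in the same way.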
5. Conclusions
The underlying theme of this preliminary study is the notion that because textbooks are
of such fundamental pedagogical importance in the foreign language classroom, the
underlying phonetic and phonolapsological profile of the texts used must have a
powerful effect on acquisition of the target language pronunciation. If this hypothesis
sounds prima facie somewhat less plausible than if it applied to the grammatical or
lexical structure of beginners’ coursebooks, it might be due to the current state of the art
when it comes to EFL pronunciation teaching and research, i.e. the general neglect
mentioned at the beginning of this paper. While grammar and vocabulary are under strict
editorial control in beginners’ textbooks, and hence expected to bring targeted
consequences in terms of learning, acquisition, skill and proficiency, pronunciation is
seldom, if at all, treated in this way, at least outside of dedicated phonetic coursebooks,
which are not normally used with beginners anyway.
If this inference generally makes sense, then a thorough phonetic study of EFL
textbooks becomes a necessity. This can be done in a number of ways, of course, and
with a variety of tools. What I have demonstrated in this paper is just one such tool,
namely PDI, and one methodology, namely a contrastive chronological look at textbooks
‘then’ versus ‘now’. Quite apart from the phonetic and phonolapsological study of
textbooks, it would also be extremely interesting to compare the actual effect of
textbooks, one or two (human) generations apart, on the EFL achievement, phonetic and
otherwise, of learners belonging to those generations. This, needless to say, would be a
project of enormous proportions and complexity.
References
Alexander, Louis George [pseud.]. 1973. First things first: an integrated course for
beginners. Warszawa: PWN. [Reprint of the first English edition, 1967, Harlow:
Baran-Łucarz, Małgorzata. 2006. Prosto w oczy – fonetyka jako “Michałek” na studiach
filologicznych?. In Dydaktyka fonetyki języka obcego w Polsce, eds. Włodzimierz
Sobkowiak and Ewa Waniek-Klimczak, 7-17. Konin: Wydawnictwo Państwowej
Wyższej Szkoły Zawodowej.
Birkenmajer, Maria and Elżbieta Mańko. 2004. Korepetycje domowe. Jezyk angielski
(nowa edycja). Warszawa: Langenscheidt Polska.
Candlin, Edwin Frank. 1963. Present day English for foreign students. Warszawa:
Państwowe Wydawnictwo Wiedza Powszechna. [Reprint of the first English edition,
1963, London: Hodder and Stoughton]
Cunningsworth, Alan. 1995. Choosing your coursebook. Oxford: Heinemann.
Dostalova, Iva, Sarka Zelenkova and James Branam. 2011. Angielski dla samouków.
Ożarów Mazowiecki: Firma Księgarska Olesiejuk. [Reprint of the Czech edition,
2003, Praha: Fragment]
İnal, Bülent. 2006. Coursebook selection process and some of the most important criteria
to be taken into consideration in foreign language teaching. Journal of Arts and
Sciences 5: 19-29.
MacCallum and Thomas Watson. 1946. Nauka angielskiego; szybko, łatwo i przyjemnie.
Celle-Unterlüss: Wydawnictwo Antoniego Markiewicza. [Reprint of the English
edition, 1937, Nauka angielskiego. English for Poles. An easy and quick method;
London: Orbis]
Nowak, Agata. 2011. Angielski nie gryzie! Warszawa: Wydawnictwo Edgard.
Rivers, Wilga M. 1981. Teaching foreign-language skills. Chicago: University of Chicago Press.
Smólska, Janina and Jan Rusiecki. 1965. [1st ed. 1963]. English for everyone.
Warszawa: Państwowe Zakłady Wydawnictw Szkolnych.
Sobkowiak, Włodzimierz. 1997. Radically simplified phonetic transcription for Polglish
speakers. In Language history and linguistic modelling. Festschrift for Jacek Fisiak
on his 60th birthday, eds. Raymond Hickey and Stanisław Puppel, 1801-1830.
Berlin: Mouton.
Sobkowiak, Włodzimierz. 1999. Pronunciation in EFL Machine-Readable Dictionaries.
Poznań: Motivex.
Sobkowiak, Włodzimierz. 2004. Phonetic Difficulty Index. In Dydaktyka fonetyki języka
obcego. Zeszyt Naukowy Instytutu Neofilologii Państwowej Wyższej Szkoły
Zawodowej w Koninie nr 3., eds. Włodzimierz Sobkowiak and Ewa Waniek-Klimczak, 102-107. Konin: Wydawnictwo Państwowej Wyższej Szkoły Zawodowej.
Sobkowiak, Włodzimierz. 2006a. Phonetics of EFL dictionary definitions. Poznań:
Wydawnictwo Poznańskie.
Sobkowiak, Włodzimierz. 2006b. PDI revisited: lexical cooccurrence of phonetic
difficulty codes. In Dydaktyka fonetyki języka obcego. Neofilologia VIII. Zeszyty
naukowe Państwowej Wyższej Szkoły Zawodowej w Płocku, eds. Włodzimierz
Sobkowiak and Ewa Waniek-Klimczak, 225-238. Płock: Wydawnictwo Państwowej
Wyższej Szkoły Zawodowej.
Sobkowiak, Włodzimierz and Wiesława Ferlacka. 2011. PDI as a tool of phonetic
enhancements to graded e-readers. In The acquisition of L2 phonology, eds. Janusz
Arabski and Adam Wojtaszek, 138-158. Bristol: Multilingual Matters.
Szpyra-Kozłowska, Jolanta et al. 2003. Komponent fonetyczny w podręcznikach
przygotowujących do egzaminów Cambridge (FCE, CAE, CPE). In Zeszyt Naukowy
Instytutu Neofilologii Państwowej Wyższej Szkoły Zawodowej w Koninie nr 2, eds.
Włodzimierz Sobkowiak and Ewa Waniek-Klimczak, 137-144. Konin:
Wydawnictwo Państwowej Wyższej Szkoły Zawodowej.
Szymańska-Czaplak, Elżbieta. 2006. Nauczanie fonetyki w szkole na poziomie
elementarnym – analiza wybranych podręczników do nauki języka angielskiego. In
Dydaktyka fonetyki języka obcego w Polsce, eds. Włodzimierz Sobkowiak and Ewa
Waniek-Klimczak, 231-238. Konin: Wydawnictwo Państwowej Wyższej Szkoły
Wrembel, Magdalena. 2004. Beyond ‘listen and repeat’ – an overview of English
pronunciation teaching materials. In Dydaktyka fonetyki języka obcego. Zeszyt
Naukowy Instytutu Neofilologii Państwowej Wyższej Szkoły Zawodowej w Koninie nr
3., eds. Włodzimierz Sobkowiak and Ewa Waniek-Klimczak, 171-179. Konin:
Wydawnictwo Państwowej Wyższej Szkoły Zawodowej.
Research in Language, 2012, vol. 10.2
DOI 10.2478/v10015-011-0045-6
Justus Liebig University, Giessen
Focusing on English in Ghana, this paper explores some ways in which early popular
music recordings might be used to reconstruct the phonology of colonial and post-colonial
Englishes in a situation where other recordings are (mostly) absent.
While the history of standard and, to a certain degree, non-standard varieties of “Inner
Circle Englishes” (Kachru 1986) has received linguistic attention, diachronic
investigations of Outer Circle varieties are still the exception. For the most part,
descriptions of the history of post-colonial Englishes are restricted to sociohistorical
outlines from a macro-sociolinguistic perspective with little if any reference to the
linguistic structure of earlier stages of the varieties. One main reason for this lack of
diachronic studies is the limited availability of authentic historical data. In contrast to
spoken material, written sources are more readily available, since early travel accounts,
diaries or memoirs of missionaries, traders and administrators often contain quotes and at
times there are even documents produced by speakers of colonial Englishes themselves
(cf. the diary of Antera Duke, a late 18th century Nigerian slave trader; Behrendt et al.
2010). Such material provides insights into the morphology, syntax and the lexicon of
earlier stages of varieties of English (cf. Hickey 2010), but it is inadequate for the
reconstruction of phonological systems. Obtaining spoken material, which permits
phonological investigation, is far more difficult, since there are comparatively few early
recordings of Outer Circle Englishes. In such cases, popular music recordings can fill the gap.
I will present first results of an acoustic analysis of Ghanaian “Highlife” songs from
the 1950s to 1960s. My results indicate that vowel subsystems in the 1950s and 1960s show
a different kind of variation from that of present-day Ghanaian English. In particular, the
STRUT lexical set is realized as /a, ɔ/ in the Highlife corpus, whereas today it is realized
with three different vowels in Ghanaian English, /a, ɛ, ɔ/ (Huber 2004: 849). Particular
emphasis will also be placed on the way Praat (Boersma and Weenink 2011) can be used to
analyze music recordings.
1. Introduction
The present study is concerned with the structural development of World Englishes.
Focusing on English in Ghana, the former British Gold Coast colony, this paper explores
ways in which early popular music recordings might be used to reconstruct the
phonology of colonial and post-colonial Englishes in a situation where other recordings
are (mostly) absent. The database consists of early 20th century music recordings from
the former Gold Coast colony and the emerging independent post-colonial nation of
Ghana. The recordings contain lyrics that can be regarded as authentic historical near-spoken
data. This source is made accessible as a linguistic database because recordings
of colonial or early post-colonial Englishes are rare. In several pilot studies (cf. Huber &
Schmidt 2011a and 2011b, Schmidt 2011a) the potentials and methodological challenges
of using early popular music lyrics for the analysis of earlier stages of Outer Circle
(Kachru 1986) varieties of English have been explored.
For the present study, a pilot corpus of popular music lyrics from 1950s Gold Coast
Colony and then, from 1957 onwards, post-independence Ghana has been compiled. The
actual recordings and the transcribed lyrics have been subject to an auditory (Huber &
Schmidt 2011a) and an acoustic analysis (Schmidt 2012b). In both studies, the focus is
on sound change and on the differences between RP and Ghanaian English (GhE). The
motivation of the acoustic analysis is first of all to investigate if early music recordings
can be used within the context of an acoustic study at all and secondly to what extent the
results fit in with the findings generated by the auditory analysis. The present paper
brings the two analyses together. It is structured into three major sections, 1.)
background information on the data, 2.) a report on the methods and tools that were
applied, and 3.) a discussion of the findings and an outlook.
2. Background and Data
Kreyer and Mukherjee (2007) worked quantitatively and qualitatively on the style of pop
song lyrics in general. They also investigated vocabulary and lexicogrammatical
routines. In order to do so, the authors compiled the Giessen-Bonn Corpus of Popular
Music (GBoP). The GBoP consists of transcripts of popular music lyrics of various
heterogeneous genres, such as rock music and rap. Kreyer and Mukherjee’s study is based
on the GBoP and focuses on written language. They call for a systematic, corpus-based
approach to popular music lyrics as linguistic data on all descriptive levels. The authors
show that it is worthwhile working with popular music lyrics as a linguistic database.
Furthermore, they suggest that a corpus-based approach to the study of the language in
popular music should be preferred.
Miethaner (2005) uses the BLUR-corpus (Blues Lyrics collected at the University of
Regensburg) to reconstruct earlier stages of African American English (AAE). Applying
corpus-linguistic methodology, he finds BLUR to be an appropriate and valid
representation of earlier AAE and demonstrates that blues lyrics can be used to
reconstruct its morphology, morphosyntax and syntax.
Trudgill (1983) diachronically investigates English pop-singers’ pronunciation.
Among others, rhoticity serves as one linguistic variable. He observes a trend to sing in
an Americanized way in the 1950s and 60s but this trend is weakened at the latest with
the advent of punk-rock in the late 1970s in favour of a local English pronunciation. By
comparing several records of The Beatles and The Rolling Stones from 1963 until 1969,
Trudgill emphasizes the diachronic perspective of his study. As a result, the author
shows the importance of linguistic models and of identity in the context of the language
used in popular music.
In this tradition, Brato and Jansen (2008) focus on both southern and northern English
indie rock bands, such as The Arctic Monkeys and The Kooks. Conducting an auditory
analysis of a selection of songs, they find that the bands they looked at are generally
English in their pronunciation and even exhibit regional accent features, such as
typically marked Sheffield English.
West African popular music lyrics have also been subject to linguistic analysis. Both
Coester (1998) and Culver (2007 and 2008) show an interest in the language of the late
Nigerian musician Fela Anikulapo Kuti. As Coester (1998) points out, Kuti’s lyrics are
characterized by an “intermingling of languages” (Coester 1998). The author shows that
the language in Kuti’s 1970s and 1980s lyrics alternates between Nigerian Pidgin
(NigP), Standard English (StE) and Yoruba, sometimes even within a single song. These
alternations are frequent and lend distinction to Kuti’s style and to the Afro-Beat genre, of
which Kuti is regarded as the founding father.
The present study is a corpus-based, diachronic analysis of phonological details in
early Ghanaian popular music. The selected lyrics stem from a genre called Highlife.
Highlife is a form of dance-music of West African origin which was popular both with
the white minority and the local population (Bender 1985 and 2007, Collins 1986 and
1989, Oti 2009).
The musicologist Collins has worked extensively on Highlife music (cf. Collins 1986
and 1989), which he considers an umbrella term for West African popular music that had
its heyday around the time of Ghana’s independence in 1957 (cf. Collins 1986 and
1989). According to Collins, Highlife is characterized by “fusion” on the levels of
musical styles, cultures and languages (Collins 1989: 221). English and Pidgin lyrics
represent only a fraction of Highlife songs which were recorded in a variety of
languages. Some of the songs contain both local Ghanaian languages and English or
Pidgin English. There are also Pidgin elements within otherwise StE-oriented songs.
Hybridity in terms of stylistic and cultural diversity as well as language fusion are, from
a linguistic point of view, the central characteristics of early West African popular music.
Crucial for the acoustic analysis is the recording situation: According to Collins
(personal communication), the singer stood near or in front of the recording microphone.
The band was placed behind him and thus further away from the microphone. The music
was originally distributed on gramophone records. These were digitized and stored as
.wav files. Particular care was taken not to alter the voice in any way. To sum up,
the voice of the singers is generally ‘in front of the music’ so that the music can be
treated as background noise when vocals are measured.
3. Linguistic Context
Ghanaian English is an Outer Circle variety of English (Kachru 1982, 1986), which was
brought to the territory of modern Ghana through trading contacts and colonisation
(Huber 1999, 2008). Huber and Schmidt (2011a) locate modern GhE between
nativization and the endonormative stabilization stage in Schneider’s evolutionary model
(Schneider 2003, 2007). Currently, GhE is the “de facto official language” (Huber 2008:
72) in Ghana, because the status of English in Ghana is not specified in the constitution
of the country. Nevertheless, GhE is spoken in most public domains such as schools, the
media and in parliament. In contrast to local languages, it “has the advantage of ethnic
neutrality” (Huber 2008: 73), which is an important aspect in a multi-ethnic and multilingual region.
Based on a structural investigation of GhE, as conducted by Huber (2008), “it should
be kept in mind that on all descriptive levels, GhE is a system of tendencies rather than
categorical differences from the British standard” (Huber 2008: 74). Especially in the
public domain, the British standard has overt prestige. Ghanaian speakers of English
often claim to sound RP, while in fact speakers often favour a distinct Ghanaian
pronunciation to dissociate themselves from speakers of other West African varieties of English.
The present-day GhE vowel inventory is characterised by a reduction of the twelve
RP monophthongs to the following seven: /i/, /e/, /ɛ/, /a/, /ɔ/, /o/, /u/ (Huber 2008: 75).
Importantly, the RP central vowel /ʌ/ is not part of the GhE vowel system (Huber 2008:
4. Pilot Study I: Auditory Analysis
The vowel quality of RP /ʌ/ (STRUT; Wells 1982 and 2010) varies considerably in
present day Ghanaian English (cf. Huber 2008). This is why the standard lexical set
STRUT was chosen as the variable for an auditory study of the pronunciation in early
Ghanaian popular music lyrics (cf. Huber & Schmidt 2011a). The STRUT vowel is here
defined as the central monophthong lower than schwa (cf. Ladefoged 2006). Huber and
Schmidt (2011a) compared the /ʌ/ vowel sub-system in the corpus of early Highlife
lyrics with Huber’s (2008) report on contemporary GhE. The song lyrics were
transcribed orthographically by Schmidt and proof-read by Huber and students from the
University of Ghana. Word lists containing all RP STRUT words were extracted from
the Highlife corpus. Then, both authors coded RP STRUT variants as follows:
open vowel = a
half-open back vowel = o
closed back vowel = u
undecided/between ‘a’ and ‘o’ = m
Depending on the actual realization, love would, for example, be coded as love_o, love_a
or love_m. An inter-rater agreement of 97.2% was reached.
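The coding scheme and the agreement figure can be illustrated with a short sketch. It assumes that agreement was computed as simple percent agreement (identical codes divided by all coded tokens); the text does not say whether a chance-corrected measure was used. The function names and the example codings are hypothetical.

```python
# Variant codes as listed above.
VARIANT_CODES = {
    "a": "open vowel",
    "o": "half-open back vowel",
    "u": "closed back vowel",
    "m": "undecided/between 'a' and 'o'",
}


def label(word, code):
    """Build a coded token such as 'love_o'; reject codes outside the scheme."""
    if code not in VARIANT_CODES:
        raise ValueError(f"unknown variant code: {code!r}")
    return f"{word}_{code}"


def percent_agreement(coder_a, coder_b):
    """Share of tokens (in percent) to which both coders assigned the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty token list")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100 * matches / len(coder_a)


# Invented codings of four tokens by two coders (not data from the study).
first_coder = ["a", "o", "a", "m"]
second_coder = ["a", "o", "o", "m"]
```

With these invented codings the two coders agree on three of four tokens, i.e. 75%.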
As expected, variation in the realisation of RP STRUT in the 1950s/1960s songs is
clearly observable. The main variants, though, are /a/ and /ɔ/. /ɛ/, a current GhE variant
of RP STRUT, was not found in the Highlife corpus. Surprisingly, come was
consistently realized as /kʊm/ in the song “Apolonia” by The Builders Brigade Band. To
date, not enough is known about the singer or the band to give a solid explanation,
particularly because love is realized throughout the song as /lav/.
5. Acoustic Analysis
For the acoustic analysis, the transcriptions of the songs were transferred to PRAAT text
grids as required for most PRAAT scripts (Lennes 2003). The selection of songs had to
be revised, though. In the auditory analysis, even rather damaged recordings could be
included, because, after some training, human coders could work well with them.
However, when analysed with PRAAT (Boersma and Weenink 2011), the formants in
these songs could not be measured to a satisfactory degree.
My text grids consist of three tiers, a ‘line’-tier, a ‘word’-tier and a ‘vowel’-tier. All
variants of RP STRUT are marked on tier 3, the vowel-tier. Due to the relatively small
number of data-points this was done manually. A modified version of Lennes’ (2003)
script was used to measure the marked sections on the vowel-tier. The generated output
was normalised using the NORM vowel-normalisation suite by Thomas and Kendall
(2011). The Bark Difference Metric was chosen for normalisation, because this method
works well with vowel sub-systems (Thomas and Kendall 2011). Figure 1 shows the
plotted RP STRUT words in the selected songs. Z3-Z2 represents the front-back
dimension, Z3-Z1 the height dimension in analogy to a standard vowel chart (cf.
Ladefoged 2006). The STRUT words auditorily coded as ‘a’ by Huber and Schmidt
(2011a) are plotted in red. They cluster in the lower half of the diagram whereas the ‘o’
words in blue gather in the upper part. Love coded as ‘m’ (green) falls in between. For
current GhE, Huber (2008) shows that the open vowel /a/ and the half-open vowel /ɔ/ are
typical realisations of RP STRUT. The acoustic analysis confirms that RP /ʌ/ was
already realized as /a/ and /ɔ/ in 1950s/60s GhE.
Figure 1: Vowel plot of RP STRUT words in a selection of early Ghanaian Highlife songs.
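The two dimensions plotted in Figure 1 can be computed from raw formant measurements as in the following sketch. It assumes the Hertz-to-Bark conversion Z = 26.81 / (1 + 1960 / F) - 0.53, which, to my understanding, is the one NORM documents for the Bark Difference Metric. The function names are hypothetical and any formant values passed in are the reader's own; none of the study's measurements are reproduced here.

```python
def hz_to_bark(f_hz):
    """Convert a formant frequency in Hz to the Bark scale (assumed formula)."""
    if f_hz <= 0:
        raise ValueError("formant frequency must be positive")
    return 26.81 / (1 + 1960 / f_hz) - 0.53


def bark_difference(f1, f2, f3):
    """Return (Z3-Z2, Z3-Z1) for one vowel token.

    Z3-Z2 models the front-back dimension and Z3-Z1 the height dimension.
    """
    z1, z2, z3 = (hz_to_bark(f) for f in (f1, f2, f3))
    return z3 - z2, z3 - z1
```

Because the metric works with differences within a single token, it needs no reference to a speaker's other vowels. A lower F1 yields a larger Z3-Z1 value and a lower F2 a larger Z3-Z2 value.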
As the diagram shows, the acoustic pilot study confirms the findings from the auditory
one (Huber and Schmidt 2011a). Apart from cup, all instances of RP STRUT in the
Highlife corpus cluster towards the back-end of Z3-Z2. There is a tendency for words
coded as ‘o’ to display a backer quality than tokens coded ‘a’. There is also a tendency
towards a height divide between ‘a’-tokens and ‘o’-tokens with only Sunday as an
outlier. Figure 1 also confirms the correct application of the ‘m’-code, because the token
love coded as ‘m’ is located between most ‘a’ and ‘o’ tokens.
The results for RP STRUT words in the Highlife corpus encourage further acoustic
analyses aiming at a complete representation of the vowel system of early Highlife
In conclusion, the lyrics of early popular music recordings can be analysed
acoustically. Furthermore, the present study also shows that the results from the
auditory study correspond to a large extent with those of the acoustic analysis. Through the
application of both methods we get a glimpse of the English spoken in Ghana in the
6. Challenges
Early popular music in its original form is stored on various analogue records. Visits to
archives, for example to the African Music Archive (AMA), Mainz, Germany, and
experience from field work show that much depends on the condition of the actual
record. Record here - since we are talking of the 1950s/60s - basically means shellac
gramophone records and vinyl records. Depending on the frequency of use, the
technology used for playing the records and the conditions of the respective archives or
storerooms, the records deteriorate. Deterioration is inevitable due to the material
characteristics of shellac and vinyl. Loss of data quality and sometimes of whole
collections of music has to be taken into consideration. Apart from the purely physical
aspect mentioned above, it is a challenge to contextualize the data.
Although ethno-musicologist Coester is currently working on the biographies of
early Highlife singers, not much is known in detail about them. From a sociolinguistic or
sociophonetic perspective, it would be helpful to know more about the singers, their L1s,
educational background and if they had lived or toured extensively abroad for longer
periods of time, for instance in Nigeria or the USA. Basic information about the singers
can often be retrieved from the labels on the records. For example, in the case of the
song “Awirehow” by E.T. Mensah and His Tempos Band, the vocalist is identified as
Dan Acquaye. In the case of “The Tree and the Monkey”, also by E.T. Mensah and His
Tempos Band, Julie Okine is mentioned, who is so far the only female singer in the
corpus of early Highlife recordings. Due to typical regional and ethnic affiliation,
though, it can be inferred from the names with some certainty to which ethnic group in
Ghana a person belongs. Okine, for instance, is a Ga name (Anderson, personal communication).
Another challenge is the acquisition of data. Highlife recordings are scattered over
various archives all over the world. The Gramophone Library of the Ghana Broadcasting
Corporation (GBC) in Accra for instance holds a vast collection of shellac records from
the 1950s and 1960s that is being digitalised. Recordings made by Decca West Africa
are held at the British Library. For the purpose of linguistic analyses, digital recordings
in high .wav quality are indispensable and so an extensive database needs to be
compiled. Technical issues prove less challenging than legal issues in this respect. It is
often not clear who the copyright owners are and if digital copies of the recordings can
be made available for research purposes.
7. Outlook
In order to generate a vowel system of early Ghanaian Highlife pronunciation, the
methodology outlined above has to be repeated for other standard lexical sets as well,
especially those which exhibit different realizations in RP and GhE (Huber 2008: 74,
81). Some Highlife songs are performed in a more ‘spoken’ way (performed somewhat
similarly to talking blues). In these songs, vowel length merging can also be analysed.
Turning to consonants, /t/-affrication is a variable worth investigating. It is described
by Wells (1982) as “a common allophone of /t/ in a London accent [which] is a heavily
affricated [ts], thus [tsɑɪʔ ~ tsɑɪts] tight, [ˈpʰɑtsi] party” (Wells 1982: 31). As Huber
(2008) observes for GhE, /t/-affrication has currency there, because “in the Fante dialect
of Akan, /t/ has two allophones: [t] before back vowels and affricated [ts] before front
vowels. Speakers of the dialect sometimes transfer this allophony to English and, for
example, pronounce the name Martin [matsin]” (Huber 2008: 84).
Although current GhE is described as non-rhotic (Huber 2008: 87), the pronunciation
of post-vocalic /r/ is a feature of a number of singers. An analysis of rhoticity in the
Highlife corpus could provide empirical evidence of this phenomenon. A hypothesis
that needs to be tested is that rhoticity can be attributed to an orientation towards
American popular music (cf. Trudgill 1983).
In the long run, early popular music recordings from Ghana will be compared to
recordings from other colonial or post-colonial contexts. Nigeria with its extensive
heritage of Highlife and Afro-Beat is an obvious contender for comparative studies. The
same is true for Sierra Leone where Calypso culture brought forth an extensive number
of recordings containing English or Krio lyrics in the 1950s and 1960s.
The advantage for linguists who are interested in diachronic perspectives on popular
music is that there is an ongoing, though not unproblematic (Hassold 2005), recording
tradition in West Africa. From this rich source we are currently compiling a
comprehensive corpus of music lyrics.
References
Behrendt, Stephen D., A.J.H. Latham and David Northrup. 2010. The Diary of Antera
Duke, an Eighteenth-century African Slave Trader. Oxford: OUP.
Bender, Wolfgang. 1985. Sweet Mother - Moderne Afrikanische Musik. München:
Trickster Verlag.
Bender, Wolfgang. 2007. Der Nigerianische Highlife. Musik und Kunst in der populären
Kultur der 50er und 60er Jahre. Wuppertal: Peter Hammer Verlag.
Boersma, Paul and David Weenink. 2011. Praat: Doing Phonetics by Computer. Version
5.2.44. [Computer Programme].
Brato, Thorsten & Sandra Jansen. 2008. “‘You used to gerri’ in yer fishnets, now you
only gerri’ in yer nightdress’: Regional and supraregional accents in English rock
songs”. Presented at The Thirteenth International Conference on Methods in
Dialectology. Leeds, 04 August. http://www.thorsten-brato.de/en/conferencespresentations/. Accessed: 20 February 2012.
Coester, M. 1998. Language as a product of cultural contact. In: ntama Journal of
African Music and Popular Culture. http://www.uni-hildesheim.de/ntama/. Accessed:
5. March 2012.
Collins, John. 1986. E.T. Mensah: King of Highlife. London: Off the Record Press.
Collins, John. 1989. “The early history of West African highlife music”. Popular Music.
Vol. 8, No. 3. 221-230.
Culver, Christopher. 2007. Fela’s Nigerian English.
Culver, Christopher. 2008. A linguistic approach to Fela Kuti’s lyrics.
http://www.christopherculver.com/linguistweblog/2008/01/a-linguistic-approach-tofela-kuti%E2%80%99s-lyrics/. Accessed: 5. March 2012.
Hassold, Finn. 2005. Die Krise des Highlife - Zur Entwicklung der populären Musik in
Ghana. München: GRIN.
Hickey, Raymond (ed.). 2010. Varieties of English in Writing. The Written Word as
Linguistic Evidence. Amsterdam: John Benjamins.
Huber, Magnus. 2004. “Ghanaian English: Phonology.” In: Bernd Kortmann and Edgar
W. Schneider (eds.). A Handbook of Varieties of English. A Multimedia Reference
Tool. Volume 1: Phonology. Berlin: Mouton de Gruyter. 842-865.
Huber, Magnus. 2008. “Ghanaian English: phonology.” In: Rajend Mesthrie (ed.).
Varieties of English 4 - Africa, South and Southeast Asia. Berlin: Mouton de Gruyter.
Huber, Magnus & Sebastian Schmidt. 2011a. “New ways of analysing the history of
varieties of English - Early Highlife recordings from Ghana”. Presented at ISLE 2.
Boston, 17-21 June 2011.
Huber, Magnus & Sebastian Schmidt. 2011b. “Investigating the history of Pidgin
English - Early Highlife Recordings from Ghana”. Presented at The 2011 Summer
Conference of the Society for Pidgin and Creole Linguistics. Accra, Ghana, 2-6
August 2011.
Kachru, Braj (ed.). 1982. The Other Tongue: English across cultures. Urbana:
University of Illinois Press.
Kachru, Braj B. 1986. The Alchemy of English. The Spread, Functions, and Models of
Non-Native Englishes. Chicago: The University of Illinois Press.
Kreyer, Rolf & Joybrato Mukherjee. 2007. “The style of pop song lyrics: a corpus-linguistic pilot study”. Anglia. 125 (1). 31-58.
Ladefoged, Peter. 2006. A course in Phonetics. Boston: Thomson.
Lennes, Mietta. 2003. collect_formant_data_from_files.praat. [Computer Script].
Source: http://www.helsinki.fi/~lennes/praatscripts/public/collect_formant_data_from_files.praat. Accessed: 5. March 2012.
New Ways of Analysing the History of Varieties of English
Oti, Sonny. 2009. Highlife Music in West Africa: Down Memory Lane. Lagos, Nigeria:
Schmidt, Sebastian. 2011. “Tracing the Lyrics – Early Highlife Recordings from Ghana
as Linguistic Data and Cultural Artifacts”. Presented at the GCSC-Workshop Korpus
Kommunikation Kultur: Linguistik als Kulturwissenschaft. Gießen, 4. November
Schneider, Edgar W. 2007. Postcolonial English: Varieties Around the World.
Cambridge: Cambridge University Press.
Schneider, Edgar W. 2003. "The dynamics of New Englishes: From identity construction
to dialect birth". Language 79 (2), pp. 233–281.
Thomas, Erik R. and Tyler Kendall. 2007. NORM: The vowel normalization and
plotting suite. [ Online Resource: http://ncslaap.lib.ncsu.edu/tools/norm/ ].
Trudgill, Peter. 1983. “Acts of conflicting Identity - The sociolinguistics of British pop-song
pronunciation”. In: Peter Trudgill. On Dialect - Social and Geographical
Perspectives. Oxford: Blackwell. 141-160.
Wells, J.C. 1982. Accents of English 1 - An Introduction. Cambridge: CUP.
Wells, J.C. 2010. Standard lexical sets.
http://www.phon.ucl.ac.uk/home/wells/stanlexsets.htm. Accessed: 19 February 2012.
E.T. Mensah and his Tempos Band. 1950s. “Awirehow”. Decca West Africa.
E.T. Mensah and his Tempos Band. 1950s. “Day by Day”. Decca West Africa.
E.T. Mensah and his Tempos Band. 1950s. “Don’t Mind Your Wife”. Decca West Africa.
E.T. Mensah and his Tempos Band. 1950s. “I Want to be Happy”. Decca West Africa.
E.T. Mensah and his Tempos Band. 1950s. “Inflation Calypso”. Decca West Africa.
E.T. Mensah and his Tempos Band. 1950s. “Sunday Mirror”. Decca West Africa.
E.T. Mensah and his Tempos Band. 1950s. “Tea Samba”. Decca West Africa.
Note on the discography: RetroAfric, London, offers reissues of E.T. Mensah’s most
famous recordings.
Research in Language, 2012, vol. 10.2
DOI 10.2478/v10015-011-0037-6
Masaryk University, Brno
Received Pronunciation (RP) is often studied as the pronunciation model in Great Britain
and non-English-speaking countries separately. What my paper focuses on is the duality
with which RP is essentially endowed: the role(s) in which it has to satisfy the needs of
both native and non-native speakers of English.
Whilst the claim that RP has changed recently goes unchallenged, the issue of
reflecting these changes in the preferred transcription models is hotly debated. Upton’s
model of RP is one that does include several new symbols, motivated by an attempt to
‘ensure that the description of a late twentieth century version of the accent […] looks
forward to the new millennium rather than back at increasingly outmoded forms’
(2001:352). I discuss the feasibility of adopting Upton’s model of RP as the pronunciation
model in non-English speaking countries, where it is desirable to resolve the paradox that
‘most of our teaching is aimed at young people, but the model we provide is that of
middle-aged or old speakers’ (Roach 2005: 394).
The observations I make are largely based on my MA research, which is now being
modified for the purposes of my Ph.D. I asked undergraduate students of English in
England and the Czech Republic to evaluate seven voices ranging from the clearly
regional to the unquestionably RP. The objective was to discover which sounds students
in both countries consider to fall within the scope of RP, an approach that
avoids treating RP as though it included only the sounds ‘allowed by a
preconceived model’ (Upton 2000: 78). Further, the respondents were asked to comment
on the most salient features in the recordings: what they opted to comment on reveals a
marked difference in the role of RP as a model accent in the given countries. Societies
which lack a prestigious non-regional accent are often oblivious to the social connotations
RP carries. Whilst it seems technically impossible to replace the model accent in all
teaching materials all over the world, creating awareness of the fact that a rather
outmoded model of RP found in many textbooks may not always be the best option is a
necessary step towards ensuring that non-English speaking students are not only
understood but that their speech will attract no adverse judgements.
1. Introduction
RP, like any other accent, is subject to constant change. However, the transcription
model found in materials for ELT purposes has changed little since Jones’s transcription,
first used in the English Pronouncing Dictionary published in 1917. The reasons are
manifold. Upton (2001: 355) mentions the following as the most prominent ones:
Miroslav Ježek
• in the world of lexicography, phonological matters are not usually given priority
(this is presumably brought about by the fact that most lexicographers are not
phoneticians, hence they do not pay as much attention to matters of
pronunciation as they do to semantics and grammar)
• there is strong conservative pressure in the ELT divisions of publishing houses
• phonological redescription in ELT dictionaries would also entail the revision of a
great number of other non-dictionary texts in which pronunciation is discussed;
this would be rather impractical and, above all, too costly
For the aforementioned reasons it might seem to an outsider (in particular to someone
who does not reside in the UK and whose first language is not English) that RP is an
accent with little, if any, variation. The best testimony to prove that the opposite is true is
the number of labels often attached to RP. The basic division phoneticians make is into
an older, rather conservative, variety and a younger, modern one. The former is labelled
‘traditional RP’ (Upton 2008: 239), ‘U-RP’ (Wells 1982: 279), ‘Refined RP’
(Cruttenden 1994: 80) or ‘marked RP’ (Honey 1991: 38). The latter is called
‘mainstream RP’ (Wells 1982: 279), ‘General RP’ (Cruttenden 1994: 80), ‘unmarked
RP’ (Honey 1991: 38), or there might not be any label at all: Upton (2000: 76) decided
to call this modern variety simply ‘RP’ on the grounds that it is the mainstream variety
and it can therefore ‘legitimately lay claim to the RP label without qualification’.
RP is an accent endowed with both advantages and disadvantages. This has been
well-documented in a wealth of research; cf. for example Giles (1990) and, more
recently, Beal (2008). RP is viewed as competent, persuasive and intelligent, but, at the
same time, as rather unfriendly and dishonest (Beal 2008: 29). This is the reason why I
call RP a ‘double-edged sword’: it may open some doors for you but it may also close
Prof. Clive Upton, currently based at Leeds University, is the only linguist who has
radically altered the transcription model of RP with the aim of providing a transcription
model which avoids ‘slavish imitation of the dictates of self-appointed arbiters of taste or
style in language’ (Upton 2003: viii). Instead, Upton only includes those sounds ‘heard
to be used by educated, non-regionally marked speakers rather than [sounds] “allowed”
by a preconceived model’ (Upton 2000: 78). Ramsaran shrewdly observes that ‘[i]f one
excludes certain non-traditional forms from one’s data, how can one discover the ways
in which the accent is changing?’. In other words, one cannot use the same sieve,
metaphorically speaking, over and over again to see who falls through and who does not.
This is hardly a successful way of detecting linguistic change.
It is now time to turn our attention to the actual description of the model in question.
2. Upton’s model of RP
Upton’s model has been in use for about two decades now and the most notable
publications where this model can be found include the world-famous Oxford English
Dictionary (OED). Other dictionaries using Upton’s model of RP are, for example, The
New Shorter Oxford English Dictionary (from 1993 onwards), The Concise Oxford
Dictionary (from 1995 onwards), and The New Oxford Dictionary of English (1998,
The Double-Edged Sword of RP
2003). Last but not least, The Oxford Dictionary of Pronunciation for Current English
(2001) is also on the list, this being the only dictionary focusing solely on pronunciation.
The call for an updated version of the RP model had been around for some time
before Upton decided to undertake the task of providing one. Gimson, in particular,
insisted that a new set of criteria for redefining RP be found. These ‘will result in a
somewhat diluted form of the traditional standard’ (1984: 53). In the same article
Gimson adds his hope that
the re-defined RP may be expected to fulfil a new and more extensive role in present-day
British society. Its primary function will be that of the most widely understood and
generally acceptable form of speech within Britain […] and more importantly for the
future, this standard form of British speech can function as one of the principal models for
users of English throughout the world
(1984: 53)
2.1. RP Vowels
While most of the vowels employed by Upton are the same as in other (older) models of
RP, there are several salient changes which have made his model a contentious issue.
The following table taken from Upton (2008: 241-2) neatly summarises the differences
between RP and traditional RP:
[Table body not recoverable: the vowel symbols were lost from the source. Surviving header: shared RP/trad-RP.]
Table 1: The vowels of RP and traditional RP (Upton 2008: 241-2)
Whilst some changes seem to be mere transcriptional preferences (e.g. DRESS or
NURSE), others have raised a few eyebrows because they essentially alter the way RP is
perceived and interpreted. Namely it is the TRAP, BATH and PRICE lexical sets that are
discussed here in detail.
Firstly, the TRAP vowel is lowered so that the appropriate symbol is no longer the
ash symbol [æ], but the cardinal vowel no. 4 [a]. Wells (2001) insists that it is not
necessary to make the change as it is enough to retain the original symbol and simply
redefine it. This is, however, hardly possible due to the fact that phonetic symbols are
absolutes, therefore ‘their interpretation cannot be altered to suit the new development,
so that if anything is to change in the interests of accuracy and clarity it must be the label
that is applied to the sound’ (Upton 2008: 240). Upton goes on to argue that because
ELT texts are broadly phonemic ‘their users […] need to be provided with transcriptions
which correspond as honestly as possible to the sounds of the modern accent’ (2008:
Secondly, Upton introduces the short BATH vowel [a], typically associated with the
North of England, as a possible RP alternative to the usual long BATH [ɑː]. The logic
behind this decision is relatively simple: people in the North of England no longer adopt
the southern long BATH vowel; as a result even those who would normally be perfect
RP speakers cannot be labelled thus because they retain the short BATH vowel. If the
older model is taken as the norm, there is not (or soon will not be) a single RP speaker in
the North and, more importantly, RP ceases to be a non-regional accent. Instead, it is
immediately associated with the South of England. Upton then introduces ‘southern’ and
‘northern’ varieties of RP, thereby adhering to the universally accepted principle that
‘RP is not to be considered as exclusively a southern-British phenomenon’ (Upton et al.
2003: xiii).
Thirdly, the PRICE diphthong, changed from trad-RP [aɪ] to RP [ʌɪ], has come in for
a significant amount of criticism. Wells (2001) admits that there is a lot of variation in
the starting point of the diphthong but strictly dismisses Upton’s choice as ‘very
unsuitable [because it] accords with the habits neither of RP nor of southeastern speech’.
It is interesting to ponder a little on why the second element (south-eastern speech) is
added in the previous quote from Wells. I understand why Wells is unhappy about
Upton’s choice of [ʌɪ] if he cannot see it used in RP at all, but adding that it is not
present in south-eastern speech either seems to go against the criterion that RP should
not be associated with any particular region. Incidentally, this is exactly the reason why
Upton’s model of RP comes in for a lot of criticism—his inclusion of the short BATH
allegedly deprives RP of its non-regional basis. Surely, RP should only allow—where
possible, of course—supraregional sounds not associated with any particular region. One
notable exception is the short/long BATH vowel, where both regions stick to their own
varieties. A linguist can then either dismiss one of the two variants as non-standard or
allow both in their model of standard pronunciation.
This idea is far from modern: in 1942 Vilem Mathesius, the founding father of
English Studies in Czechoslovakia, observed that people from the Bohemia region
(centred on Prague) pronounce the initial consonant cluster in the Czech word ‘shoda’
(Eng. ‘agreement’) voicelessly while people in the Moravian region (centred on Brno)
prefer the voiced variant. Although the former, i.e. voiceless, pronunciation had
traditionally been regarded as standard, Mathesius noticed that people from Moravia,
though otherwise perfectly conforming to the standard-speaker model, stick to the voiced
variant. In a dilemma very similar to the one Upton found himself in, Mathesius opts to
accept both variants as standard (1982: 149).
2.2. RP Consonants
RP consonants are nowhere near as variable as its vowels; hence they pose considerably
fewer problems for phoneticians. Many variants found in Upton’s model are RP
universals and are thus not unique to his model. The only consonantal feature worth
mentioning here is the presence of optional intrusive /r/, as in ‘drawing’ [ˈdrɔː(r)ɪŋ]. The
italics mean that the /r/ sound is intrusive rather than linking, which is shown in normal
3. Research
I conducted the research in 2009 for the purposes of my MA thesis. Right now, it is
being modified and, hopefully, improved at Ph.D. level. The whole idea formed in my
mind during my year-long stay at Leeds University in 2006-2007. It was not until then
that I started to realise certain differences in the perception of RP in the UK and the
Czech Rep.
3.1. Research objectives
• to compare the roles RP fulfils in the UK and the Czech Rep.
• to test the extent to which undergraduate students of English in both countries are
aware of recent innovations in RP
• to discover which sounds are considered to fall within the scope of RP by
students in both countries
3.2. Methodology
I set up a simple website which can still be accessed here: www.received
pronunciation.wz.cz. I asked respondents from both the UK and the Czech Rep. to
evaluate seven recordings which ranged from clearly non-RP/regional to trad-RP. All the
UK respondents were, incidentally, English (although I would certainly not have
discarded data from, say, Scottish or Welsh people). They were all aged 19-25 and were
of either working-class or middle-class background. They came from a range of regions
within England—if we take into account the two best-known criteria which separate the
North from the South (namely the BATH and STRUT vowels), then I can say I had 17
southern and 13 northern respondents. The Czech respondents were also aged 19-25;
furthermore, I chose only those who model their speech the British way. Five of the
seven recordings were made by me; the remaining two (including the trad-RP recording)
were taken from Collins and Mees (2003). Each recording was accompanied by a
questionnaire. First, the respondents were asked to indicate, on a scale of 1 to 7 (1-highly
regional, 7-RP), how close to RP the given recording sounded to them. I view RP, like
any other accent, quantitatively (more or less) rather than qualitatively (either…or). This
is something foreign students often seem oblivious to: they think that someone either
speaks RP or does not. This view is mistaken, as Wells’ category of Near-RP
(Wells 1982: 279) testifies. The respondents then answered several write-in questions. I
deemed it extremely important not to ask about any particular sounds so as not to put
ideas into my respondents’ minds. The questions were thus rather vague such as ‘What is
your overall impression of this speaker?’ or ‘Can you comment on any particular details
which helped you make up your mind in the RP score question?’. What the respondents
opted to comment on — regardless of whether their comments were positive or negative
— reveals a marked difference in the role of RP as a model accent in the given countries.
4. Results
It is perhaps not surprising that what I ended up with was a hotchpotch of comments,
which I then classified into categories by common topic. The most salient
categories include the following: intelligibility, regionality, social status, education,
poshness. There were admittedly some more categories, namely euphony, speed,
authenticity, appropriateness, and rhythmicality, but these were found rather awkward to
deal with or useless and will not be taken into account in the Ph.D. research.
A very simple table below illustrates the differences between GB and CZ respondents:
[Table values not recoverable from the source. Rows: GB respondents, CZ respondents; surviving column label: Social status.]
Table 2: observations by topic (measured in index points)
What is immediately observable is the fact that for Czech learners of English the crucial
aspect when they assess English speakers is intelligibility. The remaining four categories
are not nearly as important for them as they are for their British counterparts. This is
obviously perfectly understandable and entirely predictable, but it shows without any
doubt that the roles of RP in native and non-native environments are markedly different
and should therefore be kept separate whenever transcription models are discussed.
Czech university students of English are, of course, told about the regional and social
connotations RP carries but I argue there is a huge gap between knowing something and
feeling it intuitively. Czech learners of English often see RP as the most intelligible
accent and thus consent to learn it almost automatically. Unfortunately, the model of RP
they find in teaching materials is outdated, which is rather startling, for the recordings
found in the very same textbooks often do not correspond with the transcripts. One could
argue that these recordings are not then RP (and unquestionably many of them really are
not), but it would then mean that there are no RP recordings in modern textbooks of
English. The next question then suggests itself: Why are these teaching materials full of
phonetic transcriptions of an accent which does not appear in them at all?
The TRAP vowel is a case in point. While the transcriptions invariably insist on [æ],
the recordings include voices with lowered [æ] for which it seems more appropriate to
choose [a]. Specifically, I am now talking about Maturita Solutions textbooks used
mainly in secondary schools: there are several pronunciation exercises which stress the
importance of distinguishing such minimal pairs as ‘pat’ [pæt] and ‘pet’ [pet]. Sadly,
the TRAP vowel is predominantly realised as [a] in the recordings (this might be so
because the majority of the voices, without any doubt, belong to people in
their twenties, if not younger, which in itself is a very welcome step, of course). It then
takes me a lot of time explaining to my students that there is no need to attempt [æ] and
that [a] is perfectly acceptable. For many Czech learners of English, the adoption of [a]
would certainly help to make the situation easier since they have [a], unlike [æ], in their
The question in which respondents were asked to evaluate the recordings on a scale
of 1 to 7 (1-highly regional, 7-RP) provided some intensely interesting data as well.
Three speakers’ scores are worth looking at in greater detail.
[Table values not recoverable from the source. Columns: Speaker 3 (most regional), Speaker 4 (modern RP), Speaker 6 (trad-RP); rows: GB respondents, CZ respondents.]
Table 3: RP scores for three selected speakers
I have decided to retain the original numbers the speakers had been assigned in the RP
Test so that readers can visit the website and listen to the recordings for themselves.
As we can see, the most regional Speaker 3 (the accent is, by the way, not a
particularly strong one, the voice belongs to a Ph.D. student of the English language
from Middlesbrough) received exactly the same score from Czech respondents as
modern-RP Speaker 4 did. There are two possible explanations: either students in the
Czech Republic failed to spot those regional features which clearly are not RP (e.g.
lowered STRUT and monophthongised GOAT) or their perception of RP is rather
outdated and what is considered modern RP now in the UK is still perceived as non-RP
in the Czech Republic. The latter explanation, however, is made somewhat doubtful in
the light of the next observation: Czech respondents failed to assign the highest RP score
to the trad-RP speaker. Although the score of 5 might appear to be high, it must be kept
in mind that Speaker 6 sits roughly in the middle with the fifth highest score of all.
British respondents, on the other hand, unmistakably and unanimously placed Speaker 6
at the very top of the rank.
The comments Czech respondents made about Speaker 6 reveal that they found the accent not
only ‘weird’ but, according to a number of them, also regional. Crucially though, the
accent was ranked fourth in the intelligibility question for Czech respondents. Generally
speaking, the accent was not popular with either set of respondents. For British
respondents the overwhelming perception of the accent was that of sounding extremely
The comments from both sets of respondents have also shown that while lowered
TRAP and short BATH vowels are RP sounds for English respondents, they are not so
for their Czech counterparts. Intrusive /r/ is most assuredly an RP sound for both sets, as
is, in fact, the glottal stop replacing /t/ in other than intervocalic positions. /t/-glottaling
is not treated here for it has been covered extensively elsewhere (cf. Hannisdal 2006).
The last RP sound I want to discuss here in greater detail is the PRICE diphthong. It is
one of the most contentious issues in Upton’s model of RP and the one for which Wells
(2001) finds the least sympathy. This diphthong did draw some comments from British
respondents, many of whom noticed the backed first element. The decision as to whether
or not this falls within the scope of RP was, however, far from unanimous—about 60%
of those who did comment on it considered [ʌɪ] to be an RP sound.
Most revealing is the conspicuous lack of any comments on the part of Czech
respondents. The reason why they failed to spot any variation here is quite simple: in the
Czech phonological system there are only five monophthongal vowels /a/, /e/, /i/, /o/,
and /u/ and three diphthongs /au/, /eu/, and /ou/ (Dankovicova 1999: 72). As far as the /a/
vowel is concerned, its realisation varies to a large extent ranging from [~~]. The
front vowel is common in Bohemia whereas the back one is typical of Moravia. This
variation is merely allophonic; as a consequence, Czech learners of English have trouble
distinguishing minimal pairs such as fun/fan when these are pronounced by a native
speaker of English whose fan vowel is realised as [a] and not as [æ]. It is then far from
surprising that Czech respondents did not comment on the PRICE diphthong in the RP
Test at all.
5. Conclusion
The results of my research seem to suggest that trad-RP is now such a rarity that it has lost
its function in the ELT field. It appears to be so obsolete that some Czech respondents
mistook it for a regional accent; moreover, it is no longer the most intelligible accent.
This might have been brought about by far greater exposure to a higher number of native
British accents in the past two decades. Learners of English in the Czech Republic rely
less and less on textbook CDs and turn to some more natural/authentic sources (TV
programmes of all sorts are immensely influential in this respect) when trying to
improve their pronunciation.
Upton’s model of RP seems highly suitable for Great Britain since it reconciles the
two opposing tendencies still present in British society—namely the desire to speak
better but, at the same time, to avoid sounding posh and elitist. This is well documented
in Beal who comes to the conclusion that ‘British society today is every bit as
hierarchical as that which spawned the elocution movement of the 18th century, but […]
the models of good pronunciation are no longer the aristocracy but the professional and
entrepreneurial classes who can provide employment’ (2008: 38). But RP is no longer
the automatically preferred accent. Call centres are a case in point—their workers ‘avoid
both the unfriendly connotations of RP, and the uneducated associations of broad
regional accents, and so are acceptable to a wide range of callers’ (Beal 2008: 30-1).
Surely Upton’s model of RP is a step towards a less elitist perception of the accent.
Wells (2001) objects to Upton’s model of RP because he sees it as an unnecessary
threat to the ‘hard-won uniformity’ which had been achieved in the transcription of RP.
He believes that ‘supposed gains did not make up for the sacrifice of an agreed standard’
(2001). What should we do, though, if the agreed standard, albeit so laboriously gained,
does not reflect the true state of affairs any longer?
Introducing Upton’s model to the Czech Republic, however, appears to face many obstacles.
The first and seemingly insuperable obstacle is money. Re-editing and republishing
the vast numbers of teaching materials in which pronunciation is discussed would not
only be highly impractical but also too expensive.
Secondly, for the reasons mentioned in the Introduction there is not enough support
to carry out these changes anyway.
Thirdly, I fear some of the changes would only bring about more confusion for the
overwhelming majority of learners (in particular for those who do not study English at
university, which is the lowest level where phonetic symbols are dealt with properly in
the Czech Republic) for whom phonetic symbols are abstruse and who learn
pronunciation by way of imitation rather than by way of pronouncing dictionaries.
Last but not least, RP in the Czech Republic lacks the social and regional
connotations it has for native speakers in Great Britain. The roles of RP in the two
countries in question are markedly different. What seems necessary in Britain might not
be so in the Czech Republic: whilst updating the model in Britain makes sure that the
accent is rid of the redolence of social privilege, there is no such problem in the Czech
Republic.
It seems, nonetheless, important for Czech learners of English to be aware of the
incessant change RP is subject to (it is not a petrified accent, although it is for obvious
reasons more resistant to change than other accents). Likewise they should take into
account the wealth of connotations this accent is endowed with. They should know that
for many people in Britain RP (particularly in the traditional form) is not the preferred
accent and the reaction to it may not always be positive.
RP is the accent used in the Czech Republic as the model accent. This seems extremely
unlikely to change in the foreseeable future (if a completely radical change is not
undertaken, e.g. replacing RP with the General American accent). I am convinced that it
is eminently desirable to resolve the unhappy situation in which the accent often heard
from CDs is in certain particulars considerably different from the transcription provided.
It is true that CDs often contain recordings with a wide variety of accents; many of them
are (slightly) regional and are also different from the phonetic symbols used in the
teaching materials. These, however, are not presented as the model students should
Beal, Joan. 2008. Shamed by Your English? In Joan Beal et al. (eds) Perspectives on
Prescriptivism. Bern: Peter Lang, 21-40.
Collins, Beverley and Inger Mees. 2003. Practical Phonetics and Phonology. London:
Cruttenden, Alan. 1994. Gimson’s Pronunciation of English, 5th ed. London: Arnold.
Dankovicova, Jana. 1999. Czech. In Handbook of the International Phonetic
Association, Cambridge University Press, pp. 70–74.
Giles, Howard et al. 1990. The Social Meaning of RP. In Susan Ramsaran (ed) Studies in
the Pronunciation of English: A Commemorative Volume in Honour of A. C. Gimson.
London: Routledge, 191-211.
Gimson, A. C. 1984. The RP Accent. In Peter Trudgill (ed) Language in the British
Isles. Cambridge: Cambridge University Press, 45-54.
Hannisdal, Bente Rebecca. 2006. Variability and change in Received Pronunciation,
Ph.D. dissertation. Bergen: University of Bergen.
Honey, John. 1991. Does Accent Matter? London: Faber and Faber.
Mathesius, Vilem. 1982 [1942]. Jazyk, kultura a slovesnost. Praha: Odeon.
Roach, Peter. 2005. Representing the English Model. In Katarzyna Dziubalska-Kołaczyk
& Joanna Przedlacka (eds) English Pronunciation Models: A Changing Scene. Bern:
Peter Lang, 393-9.
Upton, Clive. 2000. Maintaining the Standard. In Robert Penhallurick (ed) Debating
Dialect: Essays on the Philosophy of Dialect Study. Cardiff: University of Wales
Press, 66-83.
Upton, Clive. 2001. Revisiting RP. In Malcolm Jones (ed) Essays in Lore and Language:
Presented to John Widdowson on the Occasion of His Retirement. Sheffield:
National Centre for English, 351-68.
Upton, Clive, William Kretzschmar and Rafal Konopka. 2003. The Oxford Dictionary of
Pronunciation for Current English. Oxford: Oxford University Press.
Upton, Clive. 2008. Received Pronunciation. In Clive Upton & Bernd Kortmann (eds)
Varieties of English: The British Isles. New York: Mouton de Gruyter, 237-52.
Wells, J. C. 1982. Accents of English (3 vols). Cambridge: Cambridge University Press.
Wells, J. C. 2001. IPA Transcription Systems for English.
http://www.phon.ucl.ac.uk/home/wells/ipa-english-uni.htm [accessed February 2012]
Research in Language, 2012, vol. 10.2
DOI 10.2478/v10015-011-0036-7
Stockholm University
This study examines the English pronunciation of a group of Nigerian students at a
university in Sweden from the point of view of their intelligibility to two groups of
listeners: 1) native speakers of English who are teachers at the university; 2) non-native
speakers of English who are teachers at the university. It is found that listeners who are
accustomed to interacting with international students do better than those who are not, and
that native speakers of English do no better or worse than non-native listeners. The
conclusion is drawn that locally useful varieties of Nigerian English may not easily be
used for wider communication and that students preparing to study abroad would find it
useful to gain access to a more widely intelligible variety.
1. Background
Many students from all around the world find their way to universities in Sweden. There
are a number of reasons why Sweden is attractive to international students. The standard
of living is high, and so is the standard of education. In addition, it is well known that
many Swedish people speak English. Swedish universities offer a fair number of
Master’s programmes and a few undergraduate programmes taught through the medium of
English. It is not difficult for international students to study in Sweden, even if they have
no knowledge of the Swedish language. Another, quite compelling, reason for the
interest Swedish universities have attracted from international students is the fact that
Swedish higher education had until recently no tuition fees, not even for students from
outside of Europe. Many students have realised that in Sweden they have the chance of
getting a world-class education without paying fees.
Many Nigerian students who come to Sweden to study English have received most or
all of their previous education through the medium of English. It comes as a shock in
many cases for these students to find that they are not viewed by their teachers in
Sweden as native or near-native speakers of English. They may fail language proficiency
courses and find that their English does not work as well as they expect it to in
communication with their teachers and with other international students, in particular
those who are non-native speakers of English.
The influx of students from other parts of the world has not been entirely without
problems. There are a number of inconsistencies between the Swedish education system
and its counterparts in other countries. One problem we have had is with the way foreign
qualifications are judged by the Swedish National Agency for Service to Universities
and University Colleges, which centrally administers admissions to Swedish universities.
Each year the Agency produces a handbook where the national qualifications in many
countries are listed, explained and compared to the Swedish qualifications upon which
the admissions system is based. Unfortunately it appears that in a number of cases the
Swedish system is overly generous in its conversion of foreign grades. For example, it is
necessary for students who have attended school in Nigeria to achieve a grade 8 (the
lowest pass grade) in English O-level (SSCE/WASSCE). This is deemed as equivalent to
the Swedish upper secondary course English B, which is in turn deemed equivalent to
IELTS level 6.0 with at least 5.5 in each section of the test. This fulfils the English
language prerequisites for any programme of study in any faculty at any Swedish
university. Ironically, it appears that, at least in the past, Nigerian universities did not
accept students to any faculty with less than a credit (grade 6) in O-level English
(SSCE/WASSCE) (Ufomata 1996).
The Swedish system of higher education was designed to cater for the needs and
expectations of Swedish school leavers. For many years this was adequate. When
Sweden entered the EU in 1995 the number of international students increased with
exchange schemes such as the Socrates-Erasmus programme which funds and facilitates
the exchange of students and staff between universities in Europe. Such students stay for
a semester or a year and return to their home universities with their credits to take their
degree there. The influx of students from beyond the EU coincided with the introduction
of degree programmes (as opposed to short courses which the student collects until the
appropriate number of credits and a degree thesis have been achieved allowing the
student to apply for a degree). In the global higher education market, degree programmes
are much more transparent and attractive than the loosely bound selection of courses
which leads to a degree that has been usual, at least in the humanities, in Sweden.
The EU has, through what is known as the Bologna process, attempted to impose a
degree of uniformity on European higher education. It is, in theory, possible for students
to wander from one European university to another, taking their credits where they may.
Of course, in practice, things are not always that simple, but there is at least a level of
understanding of the way the system works in other parts of Europe. When students from
other parts of the world apply to Swedish courses and programmes they may find that
their qualifications are not well regarded. Students from Pakistan, for example, may find
that they need to have completed both a BA and an MA to be deemed to have a
qualification equivalent to a Swedish bachelor’s degree. Students from Russia may find,
to their dismay, that only three of their five years of university education will be
These circumstances lead to a situation where many students are admitted to study at
too high a level due to the prerequisites being inappropriately low. In fact such students
often have a primary problem with insufficient proficiency in the English language. The
University provides an English language needs analysis to discover such cases early on
(in the first week of study) so that students can be offered courses in English for
academic purposes during one or sometimes two semesters, before proceeding to their
planned programme of study. This preparatory study improves students’ proficiency
levels while simultaneously introducing them to the means, methods and models of
learning which shape the student experience at a Swedish university.
Using Nigerian English in an International Academic Setting
English has no official status in Sweden, which means that Sweden is part of Kachru’s
expanding circle (Kachru 1992). In Sweden, the requirement of English language
proficiency for the study of English at university level is set to match that held by
Swedish school-leavers who have taken two years of English at upper secondary school.
Previously such students will have studied English for at least 7 years at primary and
lower secondary school. In addition, they are bombarded by English from TV, cinema,
music and computer games. They have ample opportunity to hear English and most
young people can switch to English with minimal inconvenience when they need to,
which is fairly often, given that Swedes travel extensively and cannot expect to meet
only Swedish speakers outside Sweden and that Sweden has many international visitors.
Swedish young people possess considerable communicative competence in English.
While their speech may be accented and their grasp of the vagaries of English grammar
tenuous, they speak and understand English adeptly. Consequently, university English
courses in Sweden are generally designed to teach grammatical accuracy and academic
reading and writing skills at the initial level, rather than pronunciation and
communication, moving quickly on to the kind of courses in English language,
linguistics, literature and culture that can be found at universities anywhere in the
English-speaking world.
What then do we require of students who are to take part in our courses in terms of
English language proficiency? One criterion for the required level in the needs analysis
is that students need to be able to understand native and non-native speakers of English
speaking clearly and at normal tempo, which is what they need to be able to do if they
are to take part in classes. Another is that they need to be able to express themselves
coherently orally and in a free writing task. Yet another part of the needs analysis is a
grammar test, corresponding to the IELTS levels 5.0 and 6.0 which are the levels
required for our preparatory courses and ordinary undergraduate and graduate courses
respectively.
To address the needs of students who are admitted to the university with less English
proficiency than we require as shown by the needs analysis, we have designed our
preparatory courses in English for academic purposes. In fact we have found that some
Swedish students also appreciate these courses, either because they have only one year
of upper secondary English, or because they have been away from education for some
years and feel that their English needs refreshing before they continue. Even the
occasional native speaker of English turns up on our courses for a number of reasons,
often involving limited educational opportunities. Our needs analysis will pick up these
speakers through their lack of certainty regarding the grammaticality of Standard English
constructions and their written disfluency. For students who have learned English as a
foreign language in their home country in the so-called expanding circle, it may be
disappointing to find that the level achieved is not adequate for study in Sweden, and
some do insist on disregarding the advice of their teachers and continuing on the
programme or course to which they have been admitted, generally with disappointing
results. For Swedish students and for the occasional inner circle speaker from the UK or
the US, the preparatory courses offer a chance to remediate the gaps in their English in a
context which is not face-threatening. The students who find it hardest to accept a
disappointing result in the needs analysis are those who come from the outer circle, those
for whom English is a second language, often the language of their education.
Una Cunningham
There is a serious problem here which is faced by all international academic
environments. While, on the one hand, speakers of what McArthur (2002) calls New
Englishes rightly demand full respect and recognition of these as legitimate varieties of
English, they, like some of the Old Englishes such as my own Northern Irish English, are
not always ideally suited to international communication. As an educated speaker of
Northern Irish English, when I left Northern Ireland to study in Britain, I quickly learned
to modify my pronunciation to facilitate communication with non-Northern Irish
interlocutors. Significantly, this can be done without compromising speaker identity as,
in my case, a person from Northern Ireland. Initially I was not easily understood and my
pronunciation was the object of comment. Failure to change my more “extreme”
pronunciations might eventually have led to those I interacted most with getting used to
my way of speaking, but the social and educational cost would be considerable. The
result is that I, like many speakers of non-standard accents and dialects, switch between
accents depending on my interlocutor and the communicative situation.
There is a significant distinction to be made at this point between English and
Englishes. One of the definitions of a language as opposed to a dialect refers to the
criterion of mutual intelligibility. While I would not like to suggest that the less widely
intelligible accents of English are not English – we are after all talking about accents
rather than syntactic or lexical variation – English is a very special case. We ask more of
English than has ever been asked of any language in our history. Not only is it an
important lingua franca, allowing genuine international communication, it is also a local
living language for millions of everyday speakers in many different countries. But we
are fooling ourselves if we claim that a speaker can wander from one communicative
situation to another without modifying his or her English according to the
communicative situation. Two speakers of any variety of English will be able to speak
together in a different way than if one of them were to converse with a speaker of
another variety in another place, and in yet another way if speaking to an EFL or ESL
learner (even one who has the same variety of English as a target for their learning)
whose proficiency may well be limited.
The problem may arise as a consequence of postcolonial insecurities and
defensiveness regarding the status of New Englishes. If there is indeed a Standard
Nigerian English pronunciation, which seems relatively problematic given the variation
described in e.g. Banjo (1971), Bamgbose (1995) and Ufomata (1996), it may not be
very useful for international communication, just as can be said of Glaswegian and
various Northern Ireland accents, not to mention some kinds of southern US accents, or
broad Australian, or Scouse, Geordie or any other well-defined accent of English. No
linguist would question the legitimacy of any of these inner or outer circle accents. What
happens is that these are not adequate for international or even interregional settings. To
say this is not in any way to denigrate these accents – they are obviously linguistically
adequate and important carriers of sociolinguistic markers. But it is important to separate
the functions of English in local and international communication. English as a language
of international communication is not the same as speaking to your neighbour in
Glasgow, Birmingham, Hong Kong or Lagos. There is little point in insisting on the
right to use the particular forms and phones that mark a speaker as a speaker of a
particular variety if there is a failure to communicate.
It does, of course, take two to communicate. A good deal has been written about the
need for native speakers of inner circle varieties to become more informed and tolerant
listeners such that they might be better prepared to perceive and interpret unfamiliar
accents of English (Phillipson 1992). There is a good deal of individual variation in how
flexible listeners are in their attempts to understand what they are hearing, and probably
the personal language history of the listener will be relevant in how easily they
understand other accents (Cunningham 2009). In addition, experience of the accent in
question will also be significant in how easily an individual can understand a particular
accent (Kirkpatrick, Deterding et al. 2008; Rooy 2009).
There is a difference, however, between attempting to understand a speaker of
English as a foreign language (EFL) who has a foreign accent and attempting to
understand a speaker of a New English who is a speaker of English as a second language
(ESL) and who has an accent associated with that variety of English. The two situations
are similar, but there is a difference in speaker and listener expectations. Both speakers
will have had the experience of being a learner of English and presumably of having
instruction in pronunciation of English. Both will often have learned English from a
teacher with their own language background in a class of others with the same
background. However, the EFL speaker will often have had British or American English
as a model for their learning, while the ESL speaker may well have had the New English
variety in question as their model. Where there is a breakdown in communication these
speakers may behave differently. The speaker of a New English has different
expectations of his or her variety being met with respect and may be extremely reluctant
or unable to offer alternative pronunciations, finding that an intolerable infringement of
their speaker integrity. The EFL speaker may be better prepared to try different
pronunciations and formulations.
Jenkins has led the way in the description of English as a language of international
communication (EIL) e.g. Jenkins (2002, 2005) and where at least one party is not a
native speaker of English, English as a Lingua Franca (ELF) e.g. Jenkins (2006, 2009),
Berns (2008), Seidlhofer (2009) and Watterson (2008). In this context, it is
unproblematic to discuss matters such as an ELF core phonology (Jenkins 2000). Crystal
(2003, 124) and Graddol (1997, 56) discuss the possibility that English might develop
into a number of mutually unintelligible varieties, but that this would be mitigated by a
parallel competence being built in a globally standard English for international
communication, leading to a diglossic situation which is reminiscent of that currently
operating in countries like Sweden where English is used as soon as a non-Swedish
participant is involved while Swedish is used between Swedes. The data presented in this
paper suggests that this may already be a necessity.
Smith and Nelson (1985) teased out the distinction between intelligibility,
comprehensibility and interpretability. Intelligibility is the concern of this paper, and
deals with word or utterance recognition, such that a listener would be able to transcribe
an utterance which he or she finds intelligible.
Intelligibility is not an absolute. Intelligibility is a factor related to a specific
speaker-listener communicative event. An utterance or a speaker cannot be said to be intelligible
or not intelligible in any absolute sense. A speaker can be more or less intelligible to
different listeners in different situations.
A lack of intelligibility is a problem for speaker and listener alike, and a good deal of
work has been done on various aspects of intelligibility, e.g. Smith and Rafiqzad (1979),
Smith and Nelson (1985), Jenkins (2002) and Berns (2008). Smith and Nelson (1985)
point out that there is general agreement that it is unnecessary for every speaker of
English to be intelligible to every other speaker of English, but that we do need to be
intelligible to those with whom we are likely to communicate in English.
Naturally, the time is long past when native inner circle speakers are the only
legitimate judges of what is intelligible, and few would maintain that native speakers are
automatically more intelligible than non-native speakers (e.g. Smith and Rafiqzad 1979).
As the number of speakers for whom English is one of a number of languages grows,
having long ago exceeded the number of so-called native (monolingual?) speakers of
English, the imagined native speaker is not often the implied interlocutor for learners of
English in either EFL or ESL situations.
This study uses data from Nigerian students and thus it is relevant to consider the
role and status of English in Nigeria. A good deal has been written on this topic which is
confounded by the multitude of languages spoken in the country (some 400 in some
sources e.g. Gut and Milde 2002). The colonial history of countries such as Nigeria has
led to a situation where English is retained as a language of business, education and
media as well as interethnic communication (Gut 2007), although Nigerian Pidgin
English also serves for interethnic communication. Due to a complex mesh of factors
including linguistic attitudes and language policies in the outer circle countries in general
and Nigeria in particular, these speakers may not appreciate their first languages,
sometimes referring to them disparagingly as dialects, vernaculars or local languages. A
good deal has been written and will continue to be written about the need for African
languages to take a more prominent role in the lives of the people of Africa (e.g. Prah
2002). The role of English in Nigeria, as elsewhere in Africa, and the attitudes of
Nigerians to English and other Nigerian languages are sensitive topics.
The distinction between second language varieties of English such as Standard
Nigerian English and learner varieties of those with Standard Nigerian English as their
target variety is far from clear cut. The nature of the relationship between English-based
varieties in Nigeria has not, to my knowledge, been fully explored. In other comparable
postcolonial contexts a continuum has been described which spans from a basilect,
perhaps represented here by Nigerian Pidgin English to an acrolect which would be close
to the British English which was the variety once imposed upon Nigeria, as suggested by
Ufomata (1996).
Adamo (2007) writes that “English has itself (to a certain extent at least) become a
Nigerian language”. She points to nativization of English as indexical of its integration
into the culture of the community. Like the Nigerian author, Achebe, she sees Nigerian
English as having “communion with its ancestral home but is altered to suit its new
surrounding” (Achebe 1975). She writes further that “When a people are alienated from
their language(s), as is the case in Nigeria today, they gradually become alienated from
their culture”. She argues that English, however nativized, will not serve as a national
language, and calls for an indigenous language to take that place. At the same time she is
realistic and points to the efforts made to standardize, nativize and codify Nigerian
English to enable it as a carrier of Nigerian culture.
The status of Nigerian English as a variety of English has been questioned (Ajani 2007).
This is certainly a central question if we are to be able to decide whether the English
spoken in Nigeria is a variety of English which can carry a culture or if we are to regard
it as a learner variety. In the words of Kachru, “what is ‘deficit linguistics’ in one context
may be a matter of ‘difference’ which is based on vital sociolinguistic realities of
identity, creativity and linguistic and cultural contact in another context” (Kachru 1991).
Ajani (2007) sets the position of a standardised Nigerian English against the early
position of English teachers in Nigeria who refused to accept any model but the native
British model. Ajani relates this debate to the US Ebonics debate over the rejection of AAVE as a
legitimate variety for use in education. He further questions whether speakers of one of
the 400 languages of Nigeria, e.g. Hausa, will sound the same when speaking English as
will a speaker of another language, e.g. Yoruba or Igbo.
Bamgbose (1982) views the emergence of a Nigerian English as a natural outcome of
the language contact situation in the country. He accounts for three mechanisms at work
in generating usages in Nigerian English: the interference, deviation and creativity
approaches involving “interference” from the mother tongue (or possibly Nigerian
Pidgin English), “deviation” from the native British norm and the creative inclusion of
elements of local languages as well as English to create new items respectively.
Bamgbose rejects the native model for Nigerian learners and suggests that the educated
speaker of Standard English be the model. This standard has not, however been well
Schneider (2003) compares the evolution of postcolonial Englishes in language
contact situations to the acquisition of a second language such that the phonology of
such new varieties will display features that resemble transfer from the phonology of
“indigenous languages”. This view is shared by Hickey (2004:519) who writes on
cluster simplification in Asian and African Englishes that “this is determined largely by
the phonotactics of the background language(s)”. In the case of Nigeria, there are a
multitude of such background, or substrate, languages. It is estimated that almost 400
languages are spoken in Nigeria (Bamgbose 1971, Agheyisi 1984). This does, of course,
depend on how the languages are defined. Prah (2009) claims that the number of
languages, as defined by criteria of mutual unintelligibility, might be far smaller. He states,
“What is not easily recognized by many observers is that most of what in the literature,
and classificatory schemes, on African languages passes as separate languages in an
overwhelming number of cases are actually dialectal variants of “core languages.” In
other words, most African languages can be regarded as mutually intelligible variants
within large clusters (core languages).”
Ufomata (1996) offers an account of the continuum that exists with native-like accents at
one end (deemed essential for a career as newsreader) and “other varieties which can be
defined negatively in relation to these standard accents”. Ufomata goes on to say that the
Nigerian standard is socially accepted and internationally intelligible. Bamgbose (1995)
suggests that this accent should be taught in schools. Ufomata accounts for some of the
main features of Educated Spoken Nigerian English, describing them with reference to
RP phonemes. These are:
• The vowels of ship and sheep are both pronounced [i]
• Food and foot are both pronounced with [u]
• Bath and bag are both pronounced with [a]
• The vowels of play and plough are monophthongized to [e] and [o] respectively
• The initial consonants of thin and then are pronounced [t] and [d] respectively
• Heavy nasalization of vowels preceding nasals and the dropping of word-final
Previous work on the intelligibility of Nigerian English has indicated that rhythm and
intonation are the biggest problem (Stevenson 1965). Syllables that would be unstressed
in other varieties of English may not be reduced in any way in this variety. This study
will add to our knowledge about the intelligibility problems experienced by Nigerian
English speakers and their non-Nigerian interlocutors.
Banjo (1971:169-70), in an often cited account, describes four discrete varieties of
Nigerian English, ranging from what he calls Variety 1, which is marked by wholesale transfer of
phonological, syntactic, and lexical features of Kwa or Niger-Congo to English, spoken
by those whose knowledge of English is very imperfect and neither socially acceptable
in Nigeria nor internationally intelligible, through Variety 2 and Variety 3, which are
described as progressively closer to standard British English in syntax, semantics and
lexis, though still different in phonetic features, with increasing international
intelligibility, to Variety 4, which he describes as identical to standard British English.
This last may correspond to the “newsreader variety” described by Ufomata (1996). It
seems likely that there is in fact a continuum ranging perhaps even from a basilect
represented by Nigerian Pidgin English through Standard Nigerian English to the
British-like acrolect.
2. Material and Methods
The three students who have provided the stimuli for this study are young men aged
between 23 and 34 from Nigeria. They came to Sweden to study a bachelor’s
programme in English language, literature and culture. When they arrived to take up
their studies they took part in the needs analysis mentioned above, and all three of them
were found to have an inadequate level of English proficiency in both their oral skills
(receptive and productive) and their mastery of standard English grammar. The students
involved in this study have been educated in English-medium schools since primary
school. When asked which is their first or native language, all three indicated that
English was their first language. This is in spite of the fact that further enquiry revealed
a) that they did not encounter English until they began primary school, b) that English
was not the language they used to speak to each other, choosing the Nigerian language
Igbo for that purpose in the case of two speakers (the third speaker did not speak or
understand Igbo) and Nigerian Pidgin English otherwise, c) that English was not the
language they used to talk to their families and d) that their English was not a language
they mastered in terms of grammatical consistency, vocabulary size and written or
spoken fluency according to the results of our needs analysis. Their English appears to
all intents and purposes to be a learner variety. The distinction between learner varieties
and New Englishes is, of course, not always easy to draw, and these young men have
presumably had Nigerian English as a model and target for their English learning.
The 21 listeners were recruited from among students and staff at a Swedish university,
both those who regularly come into contact with international students and those who do
not. Seven of the listeners were native speakers of English from England, the US,
Scotland, Ireland and Australia and 14 were non-native speakers of English with French,
Swedish, Russian, Italian, Finnish and German as their first languages. Six of the native
English speakers and five of the non-native English speakers had extensive experience
hearing international Englishes of many kinds through contact with our extremely
international student body. Others had less such contact and experience.
The three speakers each recorded a set of material including a text, a wordlist, a set
of words contrasting high front vowels and postvocalic consonant voicing embedded in
carrier phrases in phrase-final and non-phrase-final position, a set of semantically
meaningful sentences, a set of semantically unpredictable (but still grammatical)
sentences and a set of true/false questions. The last three items on this list are the same
material as used in another study reported in Munro and Derwing (1995). The stimuli
used in this study were selected from the semantically meaningful sentences. These
sentences were designed to include some sounds and sound combinations that are
generally challenging for many ESL and EFL speakers in sentences where the context
is not especially helpful to the listener. In other words, comprehension will not be an aid
to intelligibility, while the sentences are still considerably more natural than the test
words in carrier phrases that were also recorded.
Eight sentences were used in this study, uttered by speaker N1 apart from sentences 5
and 8. Sentences 4 and 8 are the same, but were spoken by two different speakers.
1 A big farmer lifts a large load.
2 A confident guy viewed a natural scene.
3 A fair judge gives a second chance.
4 A hundred sheep took a dangerous trip.
5 My girl climbed a red car. (speaker N2)
6 A pool is better than seventeen orange trees.
7 A thin lady taught a musical language.
8 A hundred sheep took a dangerous trip. (speaker N3)
Listeners were presented with the stimuli using an online test facility built into the
learning platform used at the university. Listeners heard the utterances individually
through headphones and they could listen as many times as they wanted to the utterance
and were then asked to write what they heard. They could take the test online at a time
convenient to themselves.
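The transcription task described above can be illustrated with a minimal sketch of how a written response might be compared, word by word, with the intended sentence. This is not the scoring procedure used in the study, which reports responses descriptively; the function names, the tokenisation and the use of Python's difflib for alignment are all assumptions made purely for illustration. The three responses are taken from Table 1.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens only (drops '?', '...' etc.)."""
    return re.findall(r"[a-z]+", text.lower())


def word_match_score(intended, transcription):
    """Share of the intended words that difflib aligns, in order, with the response.

    Hypothetical measure for illustration only; not a measure defined in the paper.
    """
    target = tokenize(intended)
    heard = tokenize(transcription)
    matcher = SequenceMatcher(a=target, b=heard, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(target)


intended = "A big farmer lifts a large load."
responses = [
    "A big farmer lifts a large load.",
    "A big farmer lives a large loot",
    "It is summer, live the blue life",
]

for response in responses:
    print(f"{word_match_score(intended, response):.2f}  {response}")
```

A word-level score of this kind treats lives for lifts as a complete miss; a phoneme-level comparison would be needed to capture how close such a response is to the target.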
3. Results
Table 1 shows the results provided by the listeners for the first sentence, A big farmer
lifts a large load as uttered by speaker N1. The listener responses are divided into those
obtained from native vs. non-native speakers of English, and those used or not used to
international Englishes. As can be seen, the responses were very varied, from the
imaginative It is summer, live the blue life to two cases, one native and one non-native
listener who heard the utterance as intended by the speaker.
It is summer, live the blue life
A big farmer lifts a large loot
A big farmer leaves a large lodge
The big farmer lives large loge
A big farmer lives a large looge?
A big farmer lifts a large load.
A big farmer leaves a large Luke.
The big farmer lives in large lu???
A big farmer lifts a large load.
A big farmer lives in a large luge
A big farmer lives a large luuk??
A big farmer lives in large louge
A big farmer lives a large look.
A big farmer lives a large loot
The big farmer lives in a large loot
A big farmer lives a large...
A big farmer lives in large ?
A big farmer leaves a large look...
A big farmer lives in a large
The large farmer lives a large
The big farmer leaves/lives a
Table 1. Responses from native and non-native English speakers used and unused to
international Englishes listening to speaker N1 saying A big farmer lifts a large load.
What we see here is that the listeners have difficulty reconstructing the elided /t/ in lifts;
they are unsure whether the intended vowel gives leaves or lives. They are interpreting
the word load, produced with a [u], as loot, look or Luke, to name just a few, and the
speaker’s slightly affricated /d/ in load is interpreted as lodge or large. The listeners are
doing their best to listen with an open mind as they try to make sense of the utterance.
This leads to incomprehensibility as well as unintelligibility in Smith and Nelson’s
(1985) terminology.
Other stimulus sentences produce similarly creative reconstructions as listeners do
their best to comprehend the only sporadically intelligible speech of the speakers. Table
2 summarizes the responses, with the intended word at the left of each row and the
listener perceptions in subsequent columns.
Speaker N1
lifts 3
load 2
leaves 4
loot 3
competent 1
car 3
filled 1
confident 16
guy 4
viewed 17
lives 13
coffee 3
guard 4
etc. 7
girl 3
Other 1
Other 1
Other 5
Other 1
Other 7
Other 3
Using Nigerian English in an International Academic Setting
Speaker N2
Speaker N3
scene 9
sin 2
fair 10
friend 2
judge 5
choice 4
gives/give 15
ship(s) 3
dangerous 18
pool 17
poo 2
orange tree(s) 16
thin 6
teen/team 8
lady/ladies 16
church 1
tin 2
george 6
Other 10
Other 9
Other 5
Other 6
Other 4
Other 3
Other 2
Other 5
Other 5
Other 5
girl 5
climbed 13
red 17
car 14
gate/gay 5
Other 11
Other 8
Other 4
Other 7
sheep(s) 14
ship(s) 5
Other 2
Table 2. Summary of intelligibility issues in all eight stimulus sentences spoken by speakers
N1, N2 and N3 showing numbers of responses
So what we see here is that speaker N1 (like N3) does not distinguish between the
vowels in e.g. sheep and ship, as evidenced by the confusion experienced by listeners in
these words as well as lifts, scene and thin. As mentioned above, his reduction of
consonant clusters or affrication of consonants in the coda leads to misperception of the
words lifts, competent and judge. We can further note that his realisation of postvocalic
nasals as nasalised vowels misleads or confuses the listeners in the words confident,
scene and thin. His monophthongal pronunciation of the vowels in guy and fair causes
many listeners to guess wildly at the speaker’s intention. For speaker N2, the very open
[a] pronunciation of the vowel in car confuses a third of the listeners, while fewer
than a quarter of the listeners could reconstruct girl from what they actually heard.
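The proportions quoted for speaker N2 can be checked against the counts in Table 2. The following is a minimal sketch that assumes a panel of 21 listeners; that figure is inferred here from the 21 responses listed in Table 1 and is not stated explicitly in the text. The helper name `share` is illustrative only.

```python
from fractions import Fraction

# Assumption: 21 listeners, inferred from the 21 responses listed in Table 1.
N_LISTENERS = 21

# Counts for speaker N2 as read from Table 2:
# number of listeners who reported the intended word.
heard_as_intended = {"car": 14, "girl": 5}


def share(count, total=N_LISTENERS):
    """Return the exact share of listeners as a fraction."""
    return Fraction(count, total)


car_misheard = 1 - share(heard_as_intended["car"])
girl_recovered = share(heard_as_intended["girl"])

print(car_misheard)           # share of listeners who did not report "car"
print(float(girl_recovered))  # share of listeners who reported "girl"
```

Under this assumption, a third of the listeners failed to report car and the share who recovered girl falls just below a quarter, which agrees with the wording above.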
The listeners who came closest to hearing the speakers’ intended words were both
native and non-native speakers of English, but they were both quite used to hearing
international Englishes. The listeners who did least well were in one case a native
speaker who does in fact have experience of international Englishes, and the non-native
inexperienced listeners.
4. Discussion
There is nothing unexpected about the results reported above. Jenkins (2000, 2002) has
posited that certain parameters need to be upheld if speech is to be internationally
intelligible. These speakers of Nigerian English, perhaps even Standard Nigerian
English, as described by Ufomata (1996) and Bamgbose (1982) do not maintain the
distinctions outlined by Jenkins, and their speech as elicited for this study is patently not
intelligible to the non-Nigerian native and non-native speakers of English who are
listening to them.
Some descriptions of Nigerian English compare the variety to RP as a target variety,
e.g. Ufomata (1990, 1996). But the question of the status of Nigerian English as a variety
of English or a New English is very relevant here. If Nigerian English is a legitimate
variety of English, there is no reason why it should not be used as a model for Nigerian
learners of English. Eka (2003:35) writes that this is “the variety of world Englishes
spoken and written by Nigerians within the Nigerian environment”. So the question of
whether or not the features of Nigerian English are to be viewed prescriptively as errors
or descriptively as features of Nigerian English depends on the speakers’ intentions. If
they are intending to speak Nigerian English, they are not making errors – they are
succeeding in their intention. But if they are aiming at a more internationally intelligible
variety, then the features of their pronunciation can be seen as errors and may be
corrected if the students take part in classes in English pronunciation (which the speakers
in this study actually did as part of their course in Sweden). This Nigerian English is not a
language of wider communication as defined by Bamgbose (1991).
Smith and Nelson (1985) suggest that if a listener expects to understand a speaker it
is more likely that this will indeed be the case. Nonetheless, the listeners in this study do
appear to expect certain things of the utterances they hear. In line with the ideas
expressed in Jenkins’ Lingua Franca Core (Jenkins 2002), there are some sounds that
should not be elided and some vowel distinctions that should not be neutralised if
intelligibility is to be maintained.
It is not only the pronunciation that is affected by the first language. Listeners will
listen according to the salient cues to vowel and consonant identity, voicing, etc. that
operate in the languages they speak, particularly in their first languages. Native speakers
of English will identify postvocalic voicing in words like bat vs bad according to the
length of the vowel rather than the vocal fold vibration (voicing) during the stop phase of
the postvocalic consonant. In fact, in the speech of many individuals, the stop will be
devoiced, though still lenis (Cruttenden 2008). If a speaker of another variety is
transmitting other cues to postvocalic voicing but failing to shorten the vowel before a
voiceless consonant, the native speaking listener may fail to pick up on the intended
voicing. In any kind of communication involving speakers of different varieties,
listeners need to be as flexible as they are able to be, although, unless they have
considerable experience of listening to a particular speaker, they may not be able to read
the cues transmitted by the speakers.
Levis (2005) explicates the difference between nativeness and intelligibility as
learner targets (see also Cunningham (2009)). Hung (2002) questions the need to
“improve” non-native pronunciation of English. He asks why teachers should modify
learners’ naturally acquired phonology of English and when it is worth the learners’ and
their teachers’ efforts to do so. The answer Kirkpatrick, Deterding et al. (2008) offer to
the question is that intelligibility criteria must be decisive here. The research of
Kirkpatrick, Deterding et al. has taken place in the Hong Kong context. In Nigeria too,
we are dealing with learners of English as a second, not a foreign language and Nigerian
English is a Nigerian language and is used to convey speaker identity. International
intelligibility may not, however, be high on speakers’ lists of priorities. Failure to speak
in a way that is intelligible to a wider circle of listeners than that found in a local
Nigerian context is only problematic if the speech is indeed directed to non-Nigerian
listeners. Even then, it is no more acceptable to insist that a Nigerian English speaker
change his or her pronunciation to suit the listener than it would be to require the same
of a Welsh, Australian or Northern Irish speaker.
There are two ways to go here. The Nigerian (Welsh / Australian / Irish) English
speaker can adjust his or her pronunciation, moving along the continuum to a less
regionally marked pronunciation, if he or she has access to such a variety, or the listener
can learn more about Nigerian (Welsh / Australian / Irish) English in order to become a
more experienced and “in tune” listener, what Catford would have described as
“lowering one’s intelligibility threshold” (Catford 1950). Now in the case of a non-Nigerian listener who is in Nigeria, the latter alternative is reasonable and realistic, but in
the case of a Nigerian English speaker in the diaspora, it is not realistic to expect one’s
listeners to be prepared for perceiving Nigerian English. The speaker must adjust his or
her speech or face having interlocutors miss a good deal of what is said.
In discussion of the use of English as a language of international communication, or
English as a lingua franca, mutual intelligibility is a major concern (Cunningham 2009;
Rooy 2009). Without intelligibility, communication is severely hampered. If speakers of
Nigerian English mean to use their English as a language of wider, or international
communication, they need to move along the continuum that is Nigerian English to a
point where they avoid those features that are least helpful to their listeners such as the
realisation of postvocalic nasals as vowel nasalisation, the elision of postvocalic /l/ and
the mapping of English vowels onto a severely reduced set of vowels. This does not in
any way mean that they need to speak Standard Southern English, or even to sound
anything but Nigerian. It is fully possible to signal one’s identity in accent without
impairing intelligibility. The educated Nigerian speaker, just as the educated Northern
Irish, Scottish or Indian speaker, needs to have access to more than one register. There
are situations when such speakers will want to move in the other direction, back along
the Nigerian English continuum, when for reasons of credibility, integrity, solidarity and
identity it is necessary and desirable to enhance the very pronunciation features that
impair international intelligibility.
To conclude then, it would seem that whatever legitimacy this variety might have in
a national Nigerian context, it is not particularly useful for communication outside the
Nigerian context. If speakers intend to make themselves understood in a pan-African
context or further afield such as is the position of the students who come to Europe to
study, they will need to modify their pronunciation. This is true of all peripheral
varieties, or indeed perhaps all varieties where Jenkins’ Lingua Franca Core features are
not a part of the phonology. Certainly speakers of some Scottish or Northern Irish
varieties of English also need to modify their pronunciation when interregional or
international intelligibility is at stake. Efficient communication is a two-way affair. It
relies upon speakers and listeners meeting in their expectations, and there will usually be
an accommodation of interlocutors to each other (Coupland 1984).
However, it is necessary to balance the phonetic integrity of the speaker with the
needs of the listener. Nigerian English is a member of the family of English languages
(McArthur 2002). But the speaker needs to have access to a point high enough on the
basilect-acrolect continuum that is Nigerian English if international intelligibility is to be
achieved. There is a clear need for teaching in English for international communication
alongside teaching of Standard Nigerian English if Nigerians are not to cut themselves
off from international discourse and the wider international community.
In many parts of Africa parents are reported to be enthusiastically seeking English
medium schooling for their children from an early age, even from preschool in many
cases. A number of African nations have implemented legislation stipulating that
children will be educated through the medium of English either from the start or from a
certain age. This is far from uncontroversial, as both political opinion and research in
bilingual education suggest that children might learn better in the language or languages
they actually speak than in a foreign language (Prah 2002; Garcia, Skutnabb-Kangas et
al. 2006). The empowerment of the languages of Africa is an important issue and the use
of indigenous language in African schools is held by Prah and others to be the only way
forward if more than a small English-speaking elite are to have access to academic
success. One of the reasons why English-medium schooling is sought after by parents is
that they believe it will give the children access to a language of wider communication.
While this is the case in many African nations, it may not be the case in Nigeria. In
Nigeria, children are schooled in English from an early age, but the variety of English
used is naturally Nigerian English. Nigerian English speakers who do not gain access to
a more acrolectal variety of Nigerian English as part of their education will not be
intelligible to either their fellow Africans or to the wider international community. While
the English that is needed as a language of wider communication need not be restricted
to the Lingua Franca Core, Seidlhofer (2009: 243) points out that “ELF and postcolonial
Englishes are very different realities on the ground.” The political desire to view all
varieties of English as mutually intelligible must not be allowed to prevent
speakers of Nigerian English from acquiring a more widely understood pronunciation.
References
Achebe, Chinua. (1975). Morning yet on creation day. London: Heinemann.
Adamo, Grace E. (2007). Nigerian English. English Today 23 (1): 42-47.
Agheyisi, Rebecca N. (1984). Minor languages in the Nigerian context. Word 35: 235-253.
Ajani, Timothy T. (2007). Is there indeed a Nigerian English? Journal of Humanities
and Social Sciences 1:1. Available at
http://www.scientificjournals.org/journals2007/j_of_hum.htm Accessed 27 June 2009.
Bamgbose, Ayo. (1971). The English language in Nigeria. In The English language in
West Africa, ed. J. Spencer. London: Longmans.
Bamgbose, Ayo. (1982). Standard Nigerian English: Issues of identification. In The
other tongue: English across cultures, ed. Braj B. Kachru. Urbana: University of
Illinois Press.
Bamgbose, Ayo. (1991). Language and the nation. Edinburgh: Edinburgh University
Press.
Bamgbose, Ayo. (1995). English in the Nigerian environment. In New Englishes: A West
African perspective, ed. Ayo Bamgbose, Ayo Banjo and Andrew Thomas, 9-26.
Ibadan: Mosuro.
Banjo, Ayo. (1971). On codifying Nigerian English: research so far. In New Englishes: A
West African perspective, ed. Ayo Bamgbose, Ayo Banjo and Andrew Thomas,
1995, 203-231. Ibadan: Mosuro.
Berns, Margie. (2008). World Englishes, English as a lingua franca, and intelligibility,
London: Blackwell Publishing.
Berns, Margie. (2009). English as a lingua franca and English in Europe. World Englishes
28 (2): 192-199.
Catford, John C. (1950). Intelligibility. English Language Teaching 1: 7-15.
Coupland, Nikolas. (1984). Accommodation at work. International Journal of
Sociolinguistics 4-6: 49-70.
Cruttenden, Alan. (2008). Gimson's Pronunciation of English. London, Hodder Arnold.
Crystal, David. (2003). English as a Global Language. Cambridge: Cambridge
University Press.
Cunningham, Una. (2009). Models and targets for English pronunciation in Vietnam and
Sweden. Research in Language University of Lodz, Poland.
Cunningham, Una. (2009). Quality, quantity and intelligibility of vowels in Vietnamese-accented English. In Issues in Accents of English II: Variability and Norm, ed. Ewa
Waniek-Klimczak. Newcastle: Cambridge Scholars Publishing Ltd.
Eka, David. (2003). The English language: changes and chances within the Nigerian
environment. Journal of Nigerian English and Literature 4: 32-41.
Garcia, Ofelia, Tove Skutnabb-Kangas, and Maria E. Torres-Guzmán. (2006). Imagining
Multilingual Schools: Languages in Education and Glocalisation. Clevedon:
Multilingual Matters.
Graddol, David. (1997). The future of English. London: British Council.
Gut, Ulrike. (2007). First language influence and final consonant clusters in the new
Englishes of Singapore and Nigeria. World Englishes 26: 346-359.
Gut, Ulrike and Jan-Torsten Milde. (2002). The prosody of Nigerian English. Available
at http://www.lpl.univ-aix.fr/sp2002/pdf/gut-milde.pdf . Accessed 27 June 2009
Hickey, Raymond. (2004). Englishes in Asia and Africa: origin and structure. In
Legacies of colonial English, ed. Raymond Hickey. Cambridge: Cambridge
University Press: 503-535.
Hung, Tony. T. N. (2002). English as a global language and the issue of international
intelligibility. Asian Englishes 5: 4-17.
Jenkins, Jennifer. (2000). The Phonology of English as an International Language.
Oxford: Oxford University Press.
Jenkins, Jennifer. (2002). A sociolinguistically based, empirically researched
pronunciation syllabus for English as an international language. Applied Linguistics
23 (1): 83-103.
Jenkins, Jennifer. (2005). Implementing an international approach to English
pronunciation: The role of teacher attitudes and identity. Tesol Quarterly 39 (3): 535-543.
Jenkins, Jennifer. (2006). Current perspectives on teaching world Englishes and English
as a lingua franca. Tesol Quarterly 40 (1): 157-181.
Jenkins, Jennifer. (2009). English as a lingua franca: interpretations and attitudes. World
Englishes 28 (2): 200-207.
Kachru, Braj. (1992). Teaching world Englishes. In The Other Tongue, English across
Cultures, ed. Braj B. Kachru. Urbana: University of Illinois Press.
Kachru, Braj. B. (1991). Liberation linguistics and the Quirk concern. English Today 25:
Kirkpatrick, Andy., David Deterding, and Jennie Wong. (2008). The international
intelligibility of Hong Kong English. World Englishes 27 (3-4): 359-377.
Levis, John M. (2005). Changing contexts and shifting paradigms in pronunciation
teaching. Tesol Quarterly 39 (3): 369-377.
McArthur, Tom. (2002). The Oxford Guide to World English. Oxford: Oxford University
Press.
Munro, Murray J. and Tracey M. Derwing. (1995). Processing time, accent, and
comprehensibility in the perception of native and foreign-accented speech. Language
and Speech 38: 289-306.
Phillipson, Robert. (1992). Linguistic Imperialism. Oxford, Oxford University Press.
Prah, Kwesi K., Ed. (2002). Rehabilitating African languages : language use, language
policy and literacy in Africa: selected case studies Cape Town, South Africa Centre
for Advanced Studies of African Society CASAS.
Prah, Kwesi K. (2009). The burden of English in Africa: From colonialism to neo-colonialism. Lecture presented at Mapping Africa in the English-Speaking World.
University of Botswana.
Rooy, Susan. C. V. (2009). Intelligibility and perceptions of English proficiency. World
Englishes 28 (1): 15-34.
Schneider, Edgar. (2003). The dynamics of new Englishes: from identity construction to
dialect birth. Language 79: 233-81.
Seidlhofer, Barbara (2009). Common ground and different realities: world Englishes and
English as a lingua franca. World Englishes 28 (2): 236-245.
Smith, Larry E. and Cecil. L. Nelson (1985). International intelligibility of English:
directions and resources. World Englishes 4(3): 333-342.
Smith, Larry E. and Khalilullah Rafiqzad (1979). English for cross-cultural
communication - question of intelligibility. Tesol Quarterly 13 (3): 371-380.
Stevenson, K. J. (1965). Reflections on the teaching of English in West Africa. Journal
of Nigeria English Studies Association 32: 227-235.
Ufomata, Titi. (1990). Acceptable models for TEFL. In Studies in the pronunciation of
English: A commemorative volume in honour of A.C. Gimson, 212-218, ed. S.
Ramsaran. London & N.Y.: Routledge.
Ufomata, Titi. (1996). Setting priorities in teaching English pronunciation in ESL
contexts. London, UCL. Available at
http://www.phon.ucl.ac.uk/home/shl9/ufomata/titi.htm. Accessed 27 June 2009.
Watterson, Matthew. (2008). Repair of non-understanding in English in international
communication. World Englishes 27 (3-4): 378-406.
Research in Language, 2012, vol. 10.2
DOI 10.2478/v10015-011-0044-7
UAM Poznań
This paper presents an acoustic study of the speech of Polish learners of English. The
experiment was concerned with English sequences of the type George often, in which a
word-final voiced obstruent was followed by a word-initial vowel. Acoustic
measurements indicated the degree to which learners transferred Polish-style
glottalization on word-initial vowels into their L2 speech. Temporal parameters associated
with the production of final voiced obstruents in English were also measured. The results
suggest that initial glottalization may be a contributing factor to final devoicing errors.
Adopting English-style ‘liaison’ in which the final obstruent is syllabified as an onset to
the initial vowel is argued to be a useful goal for English pronunciation syllabi. The
implications of the experiment for phonological theory are also discussed. A hierarchical
view of syllabic structures proposed in the Onset Prominence environment allows for the
non-arbitrary representation of word boundaries in both Polish and English.
1. Introduction
In the development of English pronunciation teaching materials, issues of phonological
representation may lead to conflicting strategies with regard to given aspects of the
target language phonology. For example, the ship-sheep contrast may be described in
terms of a number of different phonological features, including [tense], [ATR] and
[long]. This variety of description can confuse learners and teachers alike, and lead to
undesirable results. I have heard many learners, presumably on the basis of the
descriptions of “long /i:/”, produce unnaturally long vowels in words such as sheep. In
this connection, it may be worthwhile to go back and re-evaluate traditional
phonological descriptions of target-language segments with the goal of increasing
teacher and learner awareness of their most salient properties. For the sheep-ship
contrast, studies such as Kaźmierski (2007) have suggested that the dynamic properties
of English vowels (Strange 1989), in particular diphthongization, are worth focusing on.
With regard to the nasal /ŋ/, Schwartz (2011) found that learners’ tendency for stop
insertion in words such as singer may be alleviated by revising the traditional ‘velar’
description of the sound. Briefly stated, in some cases it may be worth re-evaluating our
descriptions and representations of the most difficult aspects of L2 speech.
The English voice contrast in word-final position may represent another candidate
for this kind of representational refinement. Due to aerodynamic factors, final voiced
obstruents present a phonetic challenge for foreign learners in the acquisition of English
phonology. Final devoicing (FD), a well-known feature of the phonology of many
languages, is one of the more frequently cited contributors to a Polish accent (e.g. Gonet
and Pietroń 2004) in English. Its avoidance is a priority in ESL pronunciation teaching.
Final devoicing frequently occurs in the speech of native speakers of English as well, but without the
neutralization of the laryngeal contrast, the preservation of which may be attributed to
two phonetic parameters. The first, the relative duration of the final consonant and its
preceding vowel, is an example of a well-documented phonetic universal (Chen 1970,
Maddieson 1997) that English has chosen to exaggerate (Port and Dalby 1982). Vowels
are clipped before unvoiced consonants, which are longer in duration. Voiced
consonants are shorter in duration, and the vowels preceding them are longer.
English speakers often employ an additional strategy in overcoming the phonetic
challenge of final voiced obstruents. They have a tendency to ‘liaise’ the final
consonants with the beginning of the following word, especially if that word begins with
a vowel. In other words, phrases such as hold on and tries it are generally pronounced by
native speakers as if they were written whole Don or try zit. As a result of this process,
the final obstruent in question loses its ‘final’ status, and is no longer in the environment
for FD to apply. Liaison is described in most teaching materials I have seen, but it is
usually relegated to descriptions of ‘connected speech phenomena’ that do not comprise
the main focus of textbooks. In the context of Polish instruction in English
pronunciation, final devoicing is emphasized as an area of L1 interference that must be
overcome. While the durational properties discussed above do get some mention, liaison
is rarely mentioned in connection with final voicing.
With regard to liaison, the Polish and English phonological systems are diametrically
opposed. Liaison in English results in a rearrangement of syllabic affiliation – the final
consonant becomes an onset to the first syllable of the following word. This process may
be motivated by an apparent universal preference for syllables with consonantal onsets.
In Polish, resyllabification across word boundaries is impossible (Rubach and Booij
1990). The preference for consonantal onsets is satisfied by means of an alternative
strategy: glottal stop insertion. Glottal stops may be claimed to fill an ‘empty onset’
position to repair a non-optimal syllabic structure. However, glottalization may have
further prosodic implications, underlying the ‘initial’ status of vowels at the start of a
word, and reinforcing the ‘finality’ of the preceding consonant, thereby preserving the
context for final devoicing. As a consequence, although FD in Polish English is
generally described as a simple segmental error, it may have far-reaching phonological
consequences. In particular, the study presented here touches on the question of how
Polish and English differ with respect to the representation of word boundaries.
These phonological considerations suggest that in Polish English we might look for a
correlation between FD and glottal stop insertion - we would predict that speakers who
glottalize initial vowels in English should be more likely to devoice final obstruents. A
preliminary study (Rejniak 2011) of a corpus of Polish English speech suggests that such
a correlation indeed exists. An impressionistic analysis found that the number of
devoicing errors rose in accordance with the number of glottal stop insertions. This
paper will present the results of an acoustic study of Polish English speech that seeks to
investigate this correlation. After some discussion of the phonetic parameters under
study in Section 2, the experimental procedure and results are described in Section 3.
Section 4 offers a new phonological perspective on these issues, and Section 5
2. Phonetic aspects of (de)voicing and glottalization
The phenomenon of final obstruent devoicing is a well-known feature of a large number
of languages, and is particularly prevalent in the languages of Europe, including most
members of the Slavic family, German, Dutch, and Catalan. It may be seen as a
necessary aerodynamic consequence of the final portion of a sequence of speech sounds,
during which airflow through the vocal tract has a natural tendency to diminish. Since
airflow through the glottis is what makes voicing possible, the decrease of airflow is
expected to be accompanied by a lack of vocal fold vibration. As a result of this
challenge, languages that maintain laryngeal contrasts in final position often employ
additional strategies to produce a distinction. For example, in French one may often
observe a short vowel after the release of a final voiced consonant, suggesting that extra
effort has been made to maintain the airflow required for voicing. Vowel intrusion may
be seen as a process that is parallel with the classic liaisons in phrases like les hommes
[le zɔm] ‘the men’. The result is a syllable-initial consonant during which it is easier to
maintain voicing.
Before pauses and consonant-initial words, final obstruent devoicing often occurs in
English, particularly in the case of fricatives. However, the “voice” contrast is preserved
through exploitation of a known phonetic universal (Maddieson 1997): vowels are
longer before voiced consonants than before voiceless ones. The magnitude of this
difference is much greater in English than in other languages (Chen 1970), so we may
assume that English has exaggerated this phonetic property in order to keep the laryngeal
contrast readily perceptible in the face of weak or absent vocal fold vibration in final
consonants. Alongside this difference in vowel duration, we find that consonants too are
marked by universal voice-related durational properties: voiced consonants are
shorter than voiceless ones. English may be claimed to exaggerate this property as well.
While this fact is widely noted with regard to aspirated initial stops, the extended
duration of final voiceless consonants in English, a feature described in experimental
phonetic studies (e.g. Port and Dalby 1982), is not a priority of most English
pronunciation materials.
While Polish is of course known for final obstruent devoicing, the Southern and
Western regions of the country have been observed to exhibit voicing between vowels
across word boundaries. This process, known as Poznań-Cracow voicing, has been found
to neutralize the laryngeal contrast in favor of the voiced variant, so the phrase brat Ewy
‘Eve’s brother’ is pronounced as [bradevɨ]. This voicing process may conceivably be
interpreted as a form of ‘liaison’ that blocks the insertion of glottal stops, which are
voiceless. Our acoustic study includes four speakers from the Wielkopolska region
where this voicing process is attested.
The term glottalization may be associated with two different phonological
phenomena. In the study of English accents and pronunciation, the terms glottalization
and glottaling are frequently associated with a process by which /t/ is replaced by glottal
stops. As an allophone of /t/, the glottal stop is commonly assumed to be the result of a
lenition process in casual speech. By contrast, the glottalization of word-initial vowels
serves as a marker of a prosodic boundary. It represents a form of strengthening, making
the syllable boundary more robust for listeners. Our focus in this paper will be on the
glottalization of word-initial vowels.
In English, initial glottalization has been found to be largely dependent on higher-level
prosodic structures. That is, it most frequently appears on word-initial vowels at phrase
boundaries, but not within a phrase. For example, in a study based on a corpus of radio
announcers’ speech, Dilley et al. (1996) found that glottalization rates for phrase-initial
vowels were around 60%, while word-initial vowels within phrases were glottalized
around 20% of the time. In Polish, glottalization appears to be a syllable-level process,
motivated by the preference for consonantal onsets. The process has been reported to be
present on word-initial vowels (Dukiewicz and Sawicka 1995, Gussmann 2007) without
reference to phrase-level structures. It may even be found within words on morpheme-initial vowels: nauka ‘science’ may surface as [naʔuka]. As a result, although there is
little published data that is comparable to the studies describing English, it is reasonable
to assume that glottalization in Polish is more widespread than in English, which largely
limits the process to phrase-initial position.
One important aspect of glottalization that may be observed in both initial vowels as
well as glottalized allophones of /t/ is phonetic variability. While a canonical glottal stop
is characterized by a full closure, this feature often fails to surface in natural speech. This
is especially true in the case of intervocalic glottal stops, which may be perceived on the
basis of drops in pitch and small irregularities in the periodicity of the vocal wave. The
various irregularities have been described for English in Redi and Shattuck-Hufnagel
(2001) and Ashby and Przedlacka (2011). As it turns out, in the study described in this
paper it will be possible to describe glottalization in terms of the duration of full closure.
This is due to two factors: (1) we will not analyze glottalization at vowel hiatus where
full closure is not often achieved, and (2) we will analyze second language speech, in
which casual speech processes such as the reduction of glottal closure should be
relatively infrequent.
3. Experimental method
This section describes an acoustic study of Polish English speech. Our experiment will
address the following questions.
1. To what extent do Polish speakers transfer initial glottalization into their English
2. What effect does initial glottalization have on the realization of final voiced
obstruents in Polish English?
3. Do speakers from dialect regions associated with Poznań-Cracow voicing show
different behavior in their L2 with regard to these parameters?
3.1. Subjects and Data
Ten first-year students of English at the School of English at Adam Mickiewicz
University in Poznań, Poland participated in the acoustic study. The students were
recorded in a soundproof recording booth. The linguistic materials comprised a
list of English sentences containing sequences of word-final voiced obstruent(s) + word-initial vowel, such as George often, today’s express train, Fred’s aunt. The data set
Initial Glottalization and Final Devoicing in Polish English
included 20 such sequences, as well as additional sentences to control for list reading
effects. The sentence list was read twice by each subject, producing 40 sequences
per speaker × 10 speakers = 400 tokens for analysis. A native speaker of
American English also read the sentence list.
3.2. Acoustic measurements
The acoustic measurements were performed by hand using the Praat program. The
following measurements and calculations were made.
1. Duration in milliseconds of vowel preceding final consonant (VD)
2. Duration in milliseconds of final consonant (CD). For stops and affricates this
measurement combined both closure and noise bursts/frication.
3. V/C ratio: (VD/CD)
4. Duration in milliseconds of glottal closure (GC) from end of consonant noise to
onset of voicing on the vowel.
5. Duration in seconds of each sentence (RATE), allowing for the control of speech rate.
Figure 1 – Illustration of acoustic measurements for the sequence jazz always. VD represents
vowel duration (142ms), CD consonant duration (140ms), and GC indicates glottal closure
duration (92 ms).
Figure 1 presents an illustration of the acoustic measurements on the sequence jazz
always. The following measurements (Rate is not included in this illustration) were
made on this token: VD=142 ms, CD=140 ms, V/C=1.01, GC=92 ms.
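The V/C ratio for this token can be checked with a short calculation. The sketch below is illustrative only: the function name `vc_ratio` and the dictionary layout are assumptions made for this example, not part of the study's procedure, which was carried out by hand in Praat.

```python
def vc_ratio(vowel_ms: float, consonant_ms: float) -> float:
    """Return the vowel-to-consonant duration ratio (VD / CD)."""
    if consonant_ms <= 0:
        raise ValueError("consonant duration must be positive")
    return vowel_ms / consonant_ms


# Durations reported in the text for the token "jazz always" (milliseconds).
jazz_always = {"VD": 142.0, "CD": 140.0, "GC": 92.0}

ratio = vc_ratio(jazz_always["VD"], jazz_always["CD"])
print(f"V/C = {ratio:.2f}")  # prints "V/C = 1.01"
```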
The GC measurement was complicated somewhat by irregularities in the vocal wave
associated with glottalization (Redi and Shattuck-Hufnagel 2001, Ashby and Przedlacka
2011). Figure 2 shows an example of this difficulty in the obstruent-vowel sequence in
the phrase George often. Note that at vowel onset there are two pulses of highly
laryngealized voicing. Since this type of irregularity is associated with the perception of
glottal stops, in such cases the GC measurement was extended to the onset of modal
voicing, characterized by a regular periodic pattern in the waveform. The glottal pulse
trackers in Praat were of assistance in identifying the onset of modal voicing.
For the purposes of the research questions, the V/C ratio and the GC measurements
allow us to characterize the degree of final devoicing and the extent of initial
glottalization. A higher V/C ratio is associated with error-free final voiced obstruents. A
shorter GC measurement indicates that the consonant and vowel have been liaised, while
longer glottal closure is of course associated with glottal stops.
Figure 2 – Glottal closure duration measurement in George often. Measurement includes two
cycles of highly laryngealized glottal pulsing.
3.3. Results - Individual data
The mean results for each individual speaker are provided in Table 1. Three speakers
had Rate measurements that fell outside of the standard deviation for the entire group,
indicated by shading in the appropriate cell. These speakers were excluded from the
group analysis. Note that the GC measurements for the non-native subjects exhibited a
wide range, from just under 9 to over 100 milliseconds. The native speaker
showed measurable glottal closure in just one of 40 tokens, for an average GC
measurement of less than 2 milliseconds. The native speaker’s V/C ratio was 2.9, while
that of the non-natives ranged from 1.33 to 2.72.
Table 1 – Mean results for each individual speaker. Shaded cells denote speakers whose Rate
average fell outside of the group standard deviation. These speakers were excluded from the
group analysis.
3.3 Group Data and interaction of GC with voicing parameters
To investigate the possible effects of GC duration on the voicing parameters, each
individual measurement was placed into one of three categories depending on the value
of the GC measurement. Type 1 comprised GC measurements of less than 40 ms,
and may be described as partially or completely liaised. Type 2 included GC measurements
of 40 to 79 milliseconds, while Type 3 covered glottal closures of over 80 ms. From
the 8 speakers analyzed in the group data, there were 112 tokens of type 1, 119 tokens of
type 2, and 89 tokens of type 3.
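The three-way sorting of tokens can be sketched as a simple threshold function. This is a minimal illustration under stated assumptions: the name `gc_type` is hypothetical, a value of exactly 80 ms is assigned to Type 3 (the text says only "over 80 ms"), and the sample durations are invented rather than taken from the study.

```python
from collections import Counter


def gc_type(gc_ms: float) -> int:
    """Map a glottal closure duration (ms) to token Type 1, 2 or 3."""
    if gc_ms < 0:
        raise ValueError("duration cannot be negative")
    if gc_ms < 40:
        return 1  # partially or completely liaised
    if gc_ms < 80:
        return 2  # intermediate closure
    return 3      # long closure; treating exactly 80 ms as Type 3 is an assumption


# Invented durations, for illustration only (not the study's data).
example_gc_ms = [0.0, 12.5, 39.9, 40.0, 79.0, 92.0, 110.0]
counts = Counter(gc_type(v) for v in example_gc_ms)
print(dict(counts))
```

Applied to the full data set, a tally of this kind would yield the token counts per type reported above.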
Type 1 (0-39 ms; n=112) | Type 2 (40-79 ms; n=119) | Type 3 (>80 ms; n=89)
Table 2 – Voicing parameter means sorted according to three types of glottal closure
The mean results of the measurements sorted according to glottal closure duration are
presented in Table 2. A one-way analysis of variance (ANOVA) was performed to
establish the effects of Glottal Closure token type on the voicing parameters. Significant
effects (p<0.01) were found for both V/C ratio and Consonant Duration. No significant
effect was found for Vowel Duration (p=0.17). Post-hoc Tukey tests were performed on
the pairs of means. For V/C ratio and Consonant Duration all pairwise comparisons were
significant. For Vowel Duration, Type 1 and Type 3 were significantly different, while
the other pairwise comparisons were not significant.
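For readers unfamiliar with the test, the F statistic of a one-way ANOVA can be computed as the ratio of between-group to within-group mean squares. The sketch below uses only the Python standard library; the function name and the V/C values are invented for illustration and do not reproduce the study's data. It covers the F statistic only, not the p-value or the post-hoc Tukey comparisons.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least two non-empty groups and n > k")
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Invented V/C ratios for the three token types (illustration only).
type1 = [2.4, 2.1, 2.6, 2.3]
type2 = [1.9, 1.7, 2.0, 1.8]
type3 = [1.4, 1.2, 1.5, 1.3]
f_value, df_b, df_w = one_way_anova_f([type1, type2, type3])
print(f"F({df_b}, {df_w}) = {f_value:.2f}")
```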
3.4 Effects of dialect
Our study may also raise the question of whether intervocalic voicing across word
boundaries, a feature associated with certain dialect regions, may be found in these
speakers’ L2 English and, if so, what effect, if any, it has on the parameters of final
voicing. Of the 10 subjects recorded for this experiment, four reported that they
were raised in Wielkopolska, an area of Poland associated with intervocalic voicing. The
results of the acoustic study were thus sorted according to dialect background to
investigate any possible effects on the parameters under study. The dialect results are
given in Table 3, which shows the mean values of the voicing and glottal closure
parameters, as well as the percentage of Type 1 (liaised) tokens. A one-way ANOVA
revealed a significant effect of dialect on Glottal Closure duration, which was shorter for
the Wielkopolska speakers. In addition a chi-square test on the percentage of liaised
(Type 1) tokens was significant: Wielkopolska speakers were more likely than the others
to produce ‘liaised’ sequences. No significant effect of dialect was found for the
voicing parameters.
Table 3 – Acoustic measurements sorted for dialect background.
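A chi-square test of this kind compares observed token counts with the counts expected if dialect and liaison were independent. The sketch below computes a Pearson chi-square statistic (without continuity correction) for a 2×2 table. Whether the study used exactly this variant is an assumption, the function name is hypothetical, and the counts are invented for illustration.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table of counts (df = 1)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one token")
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Rows: Wielkopolska speakers, other speakers.
# Columns: Type 1 (liaised) tokens, other tokens. Counts are invented.
invented_counts = [[30, 50], [20, 60]]
chi2 = chi_square_2x2(invented_counts)
# With one degree of freedom, values above 3.841 are significant at p < 0.05.
print(f"chi-square = {chi2:.2f}, significant at 0.05: {chi2 > 3.841}")
```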
3.5 Discussion
The results of the acoustic study support the hypothesis that glottalization of initial
vowels may contribute to final obstruent devoicing in the speech of Polish learners of
English. In this connection it is interesting to observe the results obtained from the native
speaker, who showed almost across-the-board liaison, as well as the highest V/C ratio of
all the recorded subjects. Table 4 shows a comparison of the native speaker with group
mean values of the non-native speakers. For the Polish speakers, the average glottal closure
duration of 60.8 ms fell within the Type 2 range, while the V/C ratio was 1.76, notably
lower than that of the native speaker. Thus, it is reasonable to claim that liaison is a clear
aspect of native-like speech that contributes to the production of ‘final’ voiced
obstruents.
Table 4 – Comparison of native speaker control with group data for Polish learners.
When the analyzed tokens were divided into three types on the basis of Glottal Closure
duration, a clear effect was found for token type on both the V/C ratio and the Consonant
Duration. Importantly, the effect of token type on Vowel Duration was not significant.
This fact suggests that we may rule out speech rate as a factor in the group results. While
one may be inclined to attribute initial glottalization to the fact that the subjects were
speaking more slowly in a foreign language, rate effects should have been equivalent for
each of the parameters involved. This was not the case – only the final consonant was
The effect of Glottal Closure duration on the duration of final consonants may be
attributable to a process of final lengthening by which segments are lengthened at the
end of prosodic constituents (Beckman and Edwards 1990). Final devoicing and final
lengthening should be expected to co-occur. The longer a consonant is, the more likely it
is to be unvoiced, since more effort is required to sustain the glottal airflow required for
voicing over the course of a lengthened consonantal constriction. In other words, we are
witnessing the manifestation of a phonetic universal by which unvoiced consonants are
longer than voiced ones.
When liaison occurs, the context for final lengthening (and final devoicing) is
eliminated; the consonant is no longer final. Thus, although final lengthening does occur
in English (Beckman and Edwards 1990), the native speaker produced liaised
consonants instead of longer final ones that would be more susceptible to devoicing.
These results suggest that Polish and English have somewhat different representations of
“final” and “initial” positions. We will take up this issue in detail in the following
section.
The data from the dialect groups may complicate the picture. The results indicated
that speakers from Wielkopolska produced more ‘liaised’ tokens, but this did not seem
to have a significant effect on the durational patterns associated with voicing. That is,
more liaison did not necessarily imply less devoicing, at least in terms of the temporal
parameters. One possible clue in explaining the discrepancy associated with the
Wielkopolska speakers may be found in the behavior of one speaker, who in many
instances showed an intrusive vowel before a glottalized initial vowel. For example, in
the phrase Today’s express train, the speaker produced a short vocoid after the final /z/,
and then showed full glottal closure on the initial vowel of express, resulting in a
sequence [zəʔɛ]. Since a full glottal stop is produced, we may not claim that liaison has
been acquired. The speaker appears to have adopted a vowel-insertion strategy to
produce fully voiced final obstruents, perhaps diminishing the significance of the
temporal parameters.
4. The phonology of boundaries
The acoustic study described in this paper reflects a fundamental difference in the
phonology of English and Polish with regard to the behavior of speech sounds at word
boundaries. Stated briefly, word boundaries in Polish seem to block many common
phonological processes that might be expected to accompany the concatenation of two
sounds. In English, on the other hand, many such processes are common.
For example, Polish morphology shows a number of palatalization processes that
turn coronal stops into alveolo-palatal affricates before certain grammatical endings.
Thus, the locative of the form /kot/ ‘cat’ is /kotɕe/. The traditional assumption is that it is
the frontness of the vowel in the ending that conditions the alternation – the /t/ is said to
be ‘palatalized’. In a sequence kot jest ‘the cat is’, one might expect the palatal glide in
jest to cause palatalization of the /t/. It does not, so we may assume that the
concatenation process that results in the alternation at the morpheme boundary does not
apply at the word boundary. Conversely, we frequently observe palatalization in an
analogous sequence got you in English, which is often pronounced as gotcha.
These facts are connected with the notion of resyllabification across word boundaries, by which a word-final coda consonant is reinterpreted as the onset to the
following syllable. Thus, for English we may make a generalizing statement that a
sequence /tj/ in a syllable onset results in a post-alveolar affricate. Resyllabification is
impossible in Polish (Rubach and Booij 1990), so the /t/ and the /j/ in kot jest must be
analyzed as belonging to two separate syllabic constituents. Liaison in English may be
interpreted as a form of this type of resyllabification.
The Onset Prominence representational environment (OP; Schwartz 2010) offers a
useful set of materials for analyzing the different behavior in Polish and English at word
boundaries. OP builds on recent insights into the structural nature of segmental
phonology, in particular manner of articulation (Golston and Hulst 1999, Pöchtrager
2006). The basic building block of the OP representational environment, which may be
seen as the functional equivalent of a universal CV structure, is given in Figure 3. The
tree represents the acoustic signal as a hierarchical structure, from which both segmental
representations and prosodic categories such as syllables are derived. Manner is defined
by the layers of structure present in the segmental representation. Figure 3 represents a
stop-vowel sequence.
Figure 3 – Basic building block of syllabic structures in the OP environment.
In the OP environment, syllabic constituents such as the one in Figure 3 are formed from
the concatenation of individual segmental structures. Consider Figure 4, in which the
representation of the stop /k/ contains the top two layers of the structure in Figure 3, the
/w/ contains the Vocalic Onset node, and the vowel and final /k/ represent the Rhyme.
These structures combine to form the English word quick. Such a sequence, since it
proceeds down the hierarchy, is predicted to be contained in a single syllabic constituent.
The basic principle for syllabification is that a tree may be “absorbed” into a higher level
structure to its left, so the three structures in Figure 4 may combine into a single
Figure 4 – Syllabification of English quick in the OP environment
For the representation of initial vowels in Polish, we claim that they contain an
additional layer of structure, namely the Closure node associated with stops. Since it is
not a lower-level structure than the preceding consonant, the vowel may not be absorbed
into the tree to its left. Resyllabification does not occur, and the “final” status of the
consonant is reinforced. This is illustrated in Figure 5, which shows a word-final /d/
followed by an initial /e/ as they would be represented in Polish using OP structures.
Figure 5 – Sequence of word-final /d/ and word-initial /e/ in Polish. The active Closure node
on the vowel blocks the merging of the two trees.
1 The presence of the /k/ in the rhyme is the result of a submersion process for codas that will not
be relevant for the present paper.
Geoffrey Schwartz
In Figure 6 we see an analogous sequence in English. Note that the English vowel does
not contain the extra structure, and the tree on the right may be absorbed into the one on
the left, reflecting liaison and resyllabification. The difference between Polish and
English is captured in terms of the structural properties of the initial vowel. Initial
vowels in Polish are larger structures than they are in English. They might be thought to
contain a “built-in” glottal stop, which blocks resyllabification across word boundaries.
Figure 6 – English sequence of final /d/ and initial /e/ producing liaison.
This representational approach comes with benefits for both phonological theory and
comparative descriptions of Polish and English phonology upon which we may base
teaching materials. The theoretical advantage concerns the representation of phonological
boundaries, which has been a recurring problem in phonology. Symbols (such as + and #)
traditionally used to represent such boundaries are inherently arbitrary in nature (e.g.
Scheer 2008). By contrast, in the Onset Prominence environment, such boundaries may
be constructed using the structure of segments themselves – they are truly ‘phonological’
entities. With regard to teaching materials, the value of OP representations lies in the fact
that they are hierarchical. Unlike a linear string of segmental symbols, this approach
allows for a faithful model of what actually happens in speech.
5. Final remarks
This paper has described an acoustic study of the speech of Polish learners of English.
The results, as well as the ensuing phonological discussion, suggest the need to establish
principled representations of phonological boundaries. Languages appear to show
systematic differences in the behavior of word-initial and word-final segments, which
manifest themselves in a number of processes found in Second Language speech. The
Onset Prominence environment offers a principled explanation of these differences, with
benefits for both phonological theory and second language speech acquisition.
2 In the case of kot jest, resyllabification is prevented by the ‘promotion’ of the structure of the /j/.
Research in Language, 2012, vol. 10.2
DOI 10.2478/v10015-011-0030-0
Institute of Phonetics, Faculty of Arts, Charles University in Prague
The study investigates the impact of glottal elements before word-initial vowels on the
speed of processing of the phrases taken from natural continuous speech. In many
languages a word beginning with a vowel can be preceded by a glottal stop or a short
period of creaky voice. However, languages differ in the extent of use and functions of
this glottalization: it may be used to mark the word boundary, for instance, or to add
special prominence to the word. The aim of the experiment was to find out whether the
presence of the glottal element can influence reaction times in a word-monitoring
paradigm. Users of different languages – Slovak and Czech learners of English, as well as
native speakers of English – participated in perception testing so that the influence
of the mother tongue could be determined. The results confirm the effect of both
glottalization and the L1 of the listeners. In addition, a significant effect of test item
manipulations was found. Although the phrases with added or deleted glottal stops
displayed no obvious acoustic artefacts, they produced longer reaction times than items
with naturally present or absent glottalizations. We believe that this finding underlines the
importance of inherent stress patterns, whose alterations lead to the increase in processing
1. Introduction
Linguists of most methodological backgrounds have a similar concern. Whether they are
generativists, structuralists or constructionalists, they have to establish the inventories of
items that are relevant for language communication. Research into the sound patterns of
languages over the past decades has shown that it is unproductive to remain stuck with
narrowly defined phonemes and ignore the rich symbolic structure provided by other speech
phenomena. Descriptive units, whose distinctive power rightfully draws the attention of
language users, can change lexical meanings, but cannot explain on their own why some
speakers communicate more effectively, are better accepted, and induce more
cooperative behaviour than others (Local 2003; Hawkins 2003).
One of the elements that occur in most languages with non-phonemic status and still
could influence intelligibility of speech and the smooth flow of communication with all
its consequences is the glottalization of word-initial vowels. In this study, the term
glottalization will be used for glottal stops or perceptually equivalent glottal events, e.g.,
creaks, rapid drops in F0 or intensity, etc., which precede words beginning with
onsetless syllables. Languages differ in the extent of use and roles or functions of such
glottalizations (e.g., Przedlacka and Ashby 2011; Gordon and Ladefoged 2001; Redi and
Shattuck-Hufnagel 2001; Kohler 1994). While in some they can be treated as external
juncture signals that indicate an important autosemantic morpheme boundary, in others
they may add special prominence to words with which they are used. In such cases the
prosodic structure or the semantics of the utterance may be reflected. In phonological
terms, the word-initial prevocalic glottalization can be viewed as a specific treatment of
onsetless syllables in critical positions (Schwartz 2011).
While the production of glottal elements is often noted and explored, the perceptual
aspect of the problem remains unclear. It might be hypothesized that speakers who
regularly produce glottalizations would rely on their presence in the speech signal when
they have to process it. By analogy, the greatest sensitivity can be expected in those rare
languages where glottal stops act as phonemes. However, English is described as a
language where word-initial prevocalic glottalization is facultative, and it is only used to
emphasise a word if such an emphasis is contextually appropriate (Wells 1990: 327;
Cruttenden 1994: 155). It is even recommended to foreigners to avoid production of
glottal elements before most of the words beginning with vowels (especially frequently
occurring grammatical words such as of, in, is, are, a, and) to prevent unnatural “choppy” flow of
speech (O’Connor 1980: 101). In such circumstances, inappropriate presence of glottal
elements might even hinder mental processing of speech since it produces unnatural or
unpredictable rhythmic configurations.
As our ultimate concern is English as a foreign language, the matter is even more
complicated. Foreign speakers of English try to model the speech behaviour of native
speakers, yet they struggle with production stereotypes from their own mother tongue.
The extent to which they either benefit or suffer from the presence of glottal elements in
speech can thus differ depending on their native situation.
In our previous study, we found significant differences between Czech and Spanish
speakers of English (Bissiri et al. 2011). Spanish learners of English, in whose L1
glottalization is used infrequently and mostly as a marker of emphasis, benefited less
from the presence of word-initial glottalizations than native speakers of Czech, which
uses glottalization frequently as a signal of juncture. However, these results are difficult
to interpret unambiguously since apart from differences in the general use of
glottalizations, Spanish differs from Czech typologically. The phonotactic patterns and
the prosodic plan of the two languages endow the learners of English with quite different
predispositions. Moreover, the EFL teaching in the two countries seems to draw on
different resources: both the general motivation of students and the teaching methods
may not be comparable.
Therefore, we decided to examine the differences between reaction times to words
with and without glottalization in Slovak speakers of English. Slovak is in many features
similar to Czech (both are West Slavonic languages) and speakers of these
languages are able to communicate reasonably well even without special language
instruction. Also, the EFL methodology is essentially the same in the two countries: the
Czechs and Slovaks lived under one central government until the end of 1992 and they
continue to share many of their social and cultural traditions. On the other hand, the two languages
The Effect of Word-Initial Glottalization on Word Monitoring in Slovak Speakers of English
differ in the exploitation of word-initial glottal stops: the use of glottalization in Slovak
is reportedly low and word-initial vowels regularly cause assimilation of voicing of the
final consonant of the preceding word. This means that rather than providing the
onsetless word-initial syllable with a glottal consonant-like element, the speakers of
Slovak prefer to tie the word-initial vowel quite firmly to the preceding consonant. For
instance, the word tak [tak] in the Slovak phrase
tak ale nie
(in Engl. but not this way)
will be pronounced with [g] due to the tightly adhering [a] of the following word. The
similarly sounding phrase in Czech, on the other hand, will contain glottalization and the
preceding word-final [k] will remain voiceless:
tak ale ne
(in Engl. but not this way)
Even in the case of less careful pronunciation where the glottal element might be
missing, the assimilation of voicing will not happen (again, cf. Schwartz’s (2011) concept
of onsetless syllables).
The objective of our study is thus to investigate the influence of L1 on the perceptual
impact of glottalizations in English while abstracting from profound differences in
phonological systems (Spanish and Czech) and in language instruction. Slovak and
Czech listeners will be compared mutually and against the benchmark performance of
native English listeners.
We have formulated two sets of hypotheses. The first set concerns the influence of
glottalization, and the null hypothesis states that there is no effect of the presence or
absence of a word-initial glottal element on reaction times when monitoring the speech
signal for target words. An alternative hypothesis says that the presence of glottalization
highlights the target word thus facilitating its perception. Reaction times in such a case
should be shorter. Another alternative would argue that the presence of the glottal
segment breaks the natural flow of English (as argued in some pronunciation textbooks)
and produces the effect similar to that reported by Buxton (1983): rhythmically impaired
utterances lead to longer reaction times in word-monitoring experiments.
The second set of hypotheses concerns the mother tongue of the EFL learners. The
null hypothesis would deem it irrelevant. The first alternative would suggest that the
Czech listeners will benefit more from the presence of glottal stops as they use them on a
regular basis in their mother tongue. The second alternative would argue that the Slovak
listeners, who only use glottal stops to highlight words (similarly to the English), will
have shorter reaction times to words with glottal segments than the Czech listeners, to
whom the glottalization of word-initial vowels does not signal anything special.
2. Method
The experiment was based on the word-monitoring paradigm (Kilborn & Moss, 1996).
In this design, respondents are given a target (a word usually printed on a computer
screen) and they listen to auditory stimuli for that target. Their task is to press a
Jan Volín, Mária Uhrinová and Radek Skarnitzl
designated key as soon as they detect the word. Their reaction time (or the so-called
latency) is measured from the acoustic onset of the word to the moment when the key
was pressed. We used the DMDX software – a package developed specifically for
reaction time measurements (Forster & Forster, 2003).
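The latency definition above amounts to a subtraction of two time stamps per trial, followed by averaging within each condition. The sketch below illustrates this in Python. It is not DMDX code or its output format: the function names, the trial tuples and all time values are assumptions invented for this example.

```python
from collections import defaultdict
from statistics import mean


def latency_ms(word_onset_ms: float, keypress_ms: float) -> float:
    """Reaction time measured from the acoustic onset of the target word."""
    return keypress_ms - word_onset_ms


def mean_latency_by_condition(trials):
    """Average latency per condition; trials are (condition, onset, keypress)."""
    by_condition = defaultdict(list)
    for condition, onset, keypress in trials:
        by_condition[condition].append(latency_ms(onset, keypress))
    return {cond: mean(values) for cond, values in by_condition.items()}


# Invented trials: times in ms from the start of each stimulus.
trials = [
    ("glottal", 1200.0, 1620.0),
    ("glottal", 900.0, 1340.0),
    ("linked", 1500.0, 1990.0),
    ("linked", 800.0, 1270.0),
]
print(mean_latency_by_condition(trials))
```

A negative value returned by `latency_ms` would indicate a key press before the target onset, which an analysis would normally treat as an anticipation rather than a valid response.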
Natural continuous speech provided the material for the stimuli. Five native speakers
of southern varieties of British English read news bulletins that had earlier been broadcast on
the BBC World Service. Forty-eight phrases were extracted from the recordings such
that the target words could not be guessed from the semantic cues, i.e., all common
collocations of the target words were avoided. For instance, in the phrase Arafat last
month as partial promised reforms the conjunction as was the target. Clearly, the
extraction of the sequence from a longer sentence does not help the listeners to guess
when the target word might come. Similarly, in the phrase with ten men after the striker
Thierry Henry the listeners were asked to react to the word after. The targets were placed
anywhere between the second and the fifth stress-group. Distractors with the target in the
first stress-group were only used to keep the listeners alert, but were not analyzed. Some
more distractors were prepared with consonants in the word-initial position so that the
listeners would not figure out the nature of the true targets.
One half of the true targets occurred naturally with glottal stops, the other half
without them. These 48 items were processed in sound editing software to create
artificial stimuli with the opposite value of glottalization, i.e., the naturally occurring
glottal stops were deleted and the items without glottal stops were provided with a
spliced one. All possible care was taken to produce items that could not be
recognized as artificial, i.e., the items were without clicks and other discontinuities, with
smooth transitions of formants and the fundamental frequency track. These
manipulations were carried out with the help of Praat, Sound Forge, and Matlab software.
Altogether, 96 targets and 36 fillers were used in the perceptual testing. The listeners
were 90 adults in three equally-sized groups by their L1. Thirty were native English
students and employees of a British university, 30 Czech and 30 Slovak learners of
English. They were tested individually through headphones in a sound treated booth.
3. Results
The results confirm previous findings of the positive effect of glottalizations on the
latencies: the words with pre-glottalized word-initial vowels are spotted faster than
words linked to the preceding words. A repeated-measures ANOVA returned a highly
significant effect of glottalization: F (1, 87) = 481.4; p < 0.001. Figure 1 indicates that
the latencies were about 450 ms and items with glottal stops were spotted about 60 ms
faster than the items without them.
Figure 1: Mean reaction times (ms) of all listeners to words with (on the left) and without (on the
right) the word-initial pre-glottalization.
The main effect of the mother tongue (the between-group factor) was also found to be highly
significant: F (2, 87) = 11.96; p < 0.001. Post-hoc pairwise comparisons revealed that
the English listeners were significantly faster than both the Slovak and Czech listeners,
while Czechs and Slovaks did not differ significantly from each other. Although the
difference between the latter groups was not statistically significant, Figure 2 shows that
the Slovaks were on average faster than Czechs. That, however, does not address the
hypotheses about the influence of glottalization and, therefore, the interaction between
the variables is of interest. Analysis of variance returned a significant interaction between
the mother tongue of the listeners and the glottalization variable: F (2, 87) = 7.26; p =
0.0012. Figure 2 indicates that this result is again caused by the difference between the
English on the one hand, and the Czech and Slovak on the other hand. Although there
are allegedly differences in the production of the word-initial glottalization between
Czech and Slovak, we found no difference in perceptual testing between the speakers of
these two languages.
In addition to this main outcome, we carried out some further analyses to find out
whether the reaction times could have been influenced by any of our captured linguistic
or other variables. These analyses were also based on ANOVA for repeated measures,
but were calculated for individual test items rather than for individual subjects.
Figure 2: Interaction between the variable of the mother tongue and glottalization. Mean
reaction times (ms) of the three listener groups to words with (on the left) and without (on the
right) the word-initial pre-glottalization.
First of all, we found a significant effect of word stress. Reactions to words with stressed
initial vowels were faster: F (1, 3740) = 25.1; p < 0.001. Figure 3 displays the mean
reaction times which suggest that the English listeners benefited more from the presence
of stress than the other two groups, whose behaviour with respect to word stress was
again very similar. There was no significant interaction between stress, mother tongue
and glottalization (p = 0.86).
We also decided to test the effect of the target position in the phrase. The factor of
position had four levels: the items in the second stress-group were labelled early (no first
stress-group targets were tested), the third stress-group was mid, the fourth was late-mid,
and the remaining items were late. Unlike the findings in Buxton (1983), our results did
not show any interesting trend. The early, mid and late-mid positions led to practically
the same result and only the late position produced significantly longer reaction times.
Similarly, we did not find any significant difference between reactions to structural
words (e.g., conjunctions, prepositions) and content words (e.g., nouns, adjectives).
Semantic status obviously did not matter in the word-monitoring task. This may have
been caused by the fact that the test items were extractions from longer sentences and
their semantics was damaged: the price we had to pay to meet the requirement of
unpredictability of the targets.
Figure 3: Mean reaction times of the three listener groups to words with stressed initial
vowel (on the right) and with unstressed initial vowel (on the left).
The last variable we tested was that of manipulation. Our set of 96 items consisted of 48
instances of natural production of glottalization or natural linking (24+24). The other
half of the test items had glottal stop either edited out or added (again 24+24). Although
the manipulated items did not exhibit any consciously perceivable artefacts, we wanted
to know whether there was any difference in reaction times to them. Figure 4 shows that
manipulation indeed matters and there is even a highly significant interaction between this
variable and glottalization: F (1, 3734) = 144.6; p < 0.001. The items in which glottal
stops were edited out behaved in the same way as the analogical natural items, but the
items where the glottal stop was added led to slower reactions compared with items
where glottal stop was naturally present. This result is discussed below.
4. Discussion
The presence or absence of the glottal element before a word-initial vowel influences the
perceptual processes in all three language groups. However, our new group of listeners –
the Slovaks – did not produce results similar to the Spanish sample we investigated
previously. Although the Slovak listeners should differ from the Czech ones in the same
direction as the Spanish, they did not produce a similar effect: they did not differ
significantly from the Czech listeners. A possible explanation is that mutual contacts of
Czechs and Slovaks which are, for instance, reflected in the fact that they do not have to
learn each other’s language and still understand each other without difficulties, overrule
the influence of the native language on the perception of a facultative prosodic marker
like the glottal stop before a word-initial vowel. Perhaps the Spanish, who should be
using glottalization similarly to the Slovaks, interact less with speakers of languages
where glottalization is common. (The French, for example, are known to link words very
consistently without glottalizing the onsetless syllables.) Another explanation could be
that despite the traditional descriptions in grammar books the younger generation of
Slovaks uses more glottal stops than the older generations used to. This possibility is
supported by our informal observation but has to be verified empirically.
Figure 4: Mean reaction times of the listeners to words with (on the left) and without (on the
right) the word-initial pre-glottalization according to the manipulation status of the item.
The general effect of stress confirms the expectations based on the earlier work of other
researchers, but the smaller impact of stress on Czech and Slovak listeners is, to the best of our
knowledge, a new empirical finding. However, the effect of the target position in the
phrase and the effect of the semantic status of the words were not confirmed. As stated
above, we assume that the semantic unpredictability of the carrier phrases could have
caused this result.
On the other hand, we found a significant effect of test item manipulations. Although
the phrases with added glottal stops displayed no obvious acoustic artefacts, they
produced longer reaction times than items with naturally present glottalizations. We
believe that this finding underlines the importance of inherent stress patterns of a
language, whose alteration leads to an increase in processing load (cf. Buxton, 1983).
The research was supported by the internal grant of the Faculty of Arts, Charles
University in Prague. The authors would also like to thank M-P. Bissiri who, as an
intern at the Institute of Phonetics in Prague during the initial stages of the study,
collected some of the data.
Bissiri, M. P., Lecumberri, M. L., Cooke, M. and Volín, J. 2011. The Role of Word-Initial Glottal Stops in Recognizing English Words. In: Proceedings of the 12th
Annual Conference of ISCA Interspeech: 165-168. Florence: ISCA.
Buxton, H. 1983. Temporal predictability in the perception of English speech. In: A.
Cutler and D. R. Ladd (Eds.) Prosody: Models and Measurements: 111-121. Berlin:
Cruttenden, A. 1994. Gimson’s Pronunciation of English. London: Edward Arnold.
Dilley, L., Shattuck-Hufnagel, S. and Ostendorf, M. 1996. Glottalization of word-initial
vowels as a function of prosodic structure. Journal of Phonetics 24: 423-444.
Forster, K.I. and Forster, J.C. 2003. DMDX: A Windows display program with
millisecond accuracy. Behavior Research Methods, Instruments, & Computers, 35/1:
Gordon, M. and Ladefoged, P. 2001. Phonation types: a crosslinguistic overview.
Journal of Phonetics 29: 383-406.
Hawkins, S. 2003. Roles and representations of systematic fine phonetic detail in speech
understanding. Journal of Phonetics 31: 373-405.
Kilborn, K. and Moss, H. 1996. Word Monitoring. Language and Cognitive Processes
11/6: 689-694.
Kohler, K. 1994. Glottal stops and glottalization in German. Phonetica 51: 38-51.
Local, J. 2003. Variable domains and variable relevance: interpreting phonetic
exponents. Journal of Phonetics 31: 321-339.
O’Connor, J.D. 1980. Better English Pronunciation. 2nd Edition. Cambridge: CUP.
Przedlacka, J. and Ashby, M. 2011. Acoustic correlates of glottal articulations in
Southern British English. In: Proceedings of ICPhS XVII: 1642-1645. Hong Kong:
Redi, L. and Shattuck-Hufnagel, S. 2001. Variation in the realization of glottalization in
normal speakers. Journal of Phonetics 29: 407-429.
Schwartz, G. 2011. Final devoicing in Polish English: Segmental or prosodic error?
Presentation at Accents 2011, Lodz: UoL.
Wells, J.C. 1990. Longman Pronunciation Dictionary. Harlow: Longman.
Research in Language, 2012, vol. 10.2
DOI 10.2478/v10015-011-0035-8
UMCS Lublin
In Gonet (2010), one of the present authors found out that English word-final
phonologically voiced obstruents in the voicing-favouring environment exhibit
asymmetrical, if not erratic, behaviour in that voicing in plosives is most often retained
while in fricatives voicing retention concerns only about 1/3 of the cases, with the other
possibilities (partial and complete devoicing) occurring in almost equal proportions. The
present study is an attempt at exploring the intricacies of devoicing in English to examine
to what extent the general tendency towards obstruent devoicing is overridden by voicing
retention triggered by adjacent voiced segments both within words and across word
boundaries. This study is based on a relatively large knowledge base obtained from
recordings of spontaneous R. P. pronunciation.
The present study is a follow-up on Gonet (2010), whose focus was on consonantal
voicing in the word-final position. The paper presented the behaviour of English
obstruents and indicated that the voicing of English word-final obstruents is best
described by referring to the combination of word position and the voicing of the initial
sound in the following word. These combinations fall into two major classes:
phonation-favouring (if they are followed by a vowel or a voiced consonant) and
phonation-impeding (before a pause or before a voiceless sound).
The study reviewed a number of publications, including those by Ball and Rahilly
(1999), Catford (1964, 1977, 1988), Clark and Yallop (1990), Davenport and Hannahs
(1998), Fujimura and Erickson (1999), Gimson (1962, 2001), Gonet (1989, 2001), Gonet
and Stadnicka (2006), Jassem (1983), Ladefoged (1971, 1975), Lisker and Abramson
(1964), Maddieson (1999), Ohala (1999), Port and Rottuno (1979), Raphael et al.
(1975), Roach (1983), Shockey (2003), Szpyra-Kozłowska (2003), Van den Berg
(1958), and was based on a large body of recordings of spoken English by 6 native
speakers. Yet the results exhibited asymmetrical, if not erratic, behaviour; the details are
presented in Table 1 as well as Figures 1 and 2.
Table 1. Voicing in English word-final obstruents (Gonet 2010). Voicing categories: fully voiced, partially devoiced, completely devoiced.
Figure 1 Distribution of voicing in word-final plosives (Gonet 2010). Categories: fully voiced, partially devoiced, completely devoiced; contexts: before a pause, before -V, before +V.
Figure 2 Distribution of voicing in word-final fricatives (Gonet 2010). Categories and contexts as in Figure 1.
More on the Voicing of English Obstruents: Voicing Retention vs. Voicing Loss
Many authors indicate that obstruents have a natural tendency to devoice, especially in
voicing-impeding environments. Hence, for voiced obstruents, hypothetically there
apply 2 opposing forces:
Devoice an obstruent, especially in word-final position
Retain voicing, especially before a voiced sound
In view of the above, the goal of the present study was to explore the question to what
extent the general tendency towards obstruent devoicing is overridden by voicing
retention triggered by adjacent voiced sounds both within words and across word
boundaries.
Design of the experiment
As most of the studies on obstruent voicing in English are based on audio material
elicited in the form of read wordlists or lexical items embedded in sentence-frames, it
appeared imperative that this study should be based on spontaneous speech. For this
reason, the authors extracted audio from 4 high definition video recordings of interviews
with native speakers of English (2 male, 2 female), whose accent features were
characteristic of broadly defined Received Pronunciation.
2.2. Method
The audio recordings were then analyzed with a view to extracting sequences of sounds,
in which (phonologically) voiced obstruents were flanked by other voiced segments.
From each of the recordings, 200 samples were extracted. The selection was not random;
the samples were taken one after another as they appeared in the recording. The
800 tokens of obstruents (X) between voiced sounds (V) thus obtained could generally be
classified into three categories: word-initial (V#XV), word-medial (VXV), and
word-final (VX#V):
have go, my business, editors of
editors, about, budding, suggestion
have go, and I, and er, inside of
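The three-way classification can be written as a small decision rule. The sketch below is illustrative only and is not the authors' procedure: the function name and the two boolean flags are hypothetical, and it assumes each token has already been annotated for whether a word boundary (#) precedes or follows the obstruent.

```python
def classify_position(boundary_before: bool, boundary_after: bool) -> str:
    """Return the word position of an obstruent X flanked by voiced sounds."""
    if boundary_before and boundary_after:
        # A one-segment word (#X#) falls outside the three categories.
        raise ValueError("token does not fit V#XV, VXV or VX#V")
    if boundary_before:
        return "word-initial"   # V#XV
    if boundary_after:
        return "word-final"     # VX#V
    return "word-medial"        # VXV
```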
The waveforms and spectrograms of the samples were then inspected and labelled as
either ‘fully voiced’ or ‘devoiced.’ The analyzed tokens were assigned to the first
category when voicing was maintained throughout the closure and release in the case of
stops, and during the entire period of close approximation in spirants. The segments
were classified as ‘devoiced’ whenever there was loss of voicing in the medial phase of
the stop and/or VOT was positive, and in the period of close approximation in fricatives.
Examples of both cases are shown below. Figures 3 and 4 present voicing maintained
throughout all stages of the plosive’s articulation; a fully voiced fricative is exemplified
in Figure 5, whereas Figures 6 and 7 show devoiced obstruents.
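The labelling criteria described above can be summarised as a binary rule. The following is a minimal sketch under stated assumptions, not the authors' tool: the labels were assigned by visual inspection, whereas here the inputs (a flag for any interruption of voicing and a VOT value in ms) are hypothetical annotations, and the treatment of the affricate, which the text does not specify, is left out.

```python
def label_token(manner: str, voicing_interrupted: bool, vot_ms: float = 0.0) -> str:
    """Label a lenis obstruent as 'fully voiced' or 'devoiced'."""
    if manner == "stop":
        # Devoiced if voicing is lost during the closure and/or VOT is positive.
        devoiced = voicing_interrupted or vot_ms > 0
    elif manner == "fricative":
        # Devoiced if voicing is lost during the period of close approximation.
        devoiced = voicing_interrupted
    else:
        raise ValueError("manner not covered by this sketch: " + manner)
    return "devoiced" if devoiced else "fully voiced"


def devoicing_rate(labels) -> float:
    """Percentage of tokens labelled 'devoiced'."""
    return 100.0 * sum(label == "devoiced" for label in labels) / len(labels)
```

Applied to a list of labels, `devoicing_rate` gives the kind of percentage reported in the results section.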
Figure 3 Full voicing of closure in [edɪ]tors
Figure 4 Full voicing in closure in welc[om#ba]ck
Figure 5 Full voicing of /z/ in edit[əz#ə]f
Figure 6 Devoicing of /z/ in u[s#]at
Figure 7 Devoicing of /z/ in character[s#]f
Overall, 34 per cent of all the tokens were pronounced with voicing loss. The sections
below present a detailed analysis of the results, taking into account the following factors:
phonological category of the examined obstruents
manner of articulation
position in the word
following and preceding context
position in the syllable
lexeme type
If we view the number of devoiced tokens in individual lenis obstruents, it appears that
the differences between particular sound categories are more incremental than radical
(cf. Fig. 8).
Figure 8 The percentage of devoiced tokens in particular sounds.
Although the arrangement of sounds in the sequence looks random and does not indicate
any relationship with place or manner of articulation, there is a statistically significant
difference (p<0.001) between the affricate which tends to be devoiced in more than 60%
of the cases, and plosives and fricatives, in which devoicing occurs, respectively, in 35%
and 30% of the cases (Figure 9). Moreover, the results obtained for obstruents containing
fricative segments are in line with those presented in Haggard’s study (1978) in that
there appears a similar progression of devoiced sounds /v/ - /z/ - /ʤ/, with the
palato-alveolar affricate becoming devoiced most often, and the labio-dental fricative most
frequently retaining its voicing. It should also be noted that the result for the
palato-alveolar fricative /ʒ/ should not be regarded as valid for the whole category of lenis
palato-alveolar fricatives due to the extremely low frequency of the sound; there
occurred only one instance of this consonant in the analyzed material (Asia).
Figure 9 The percentage of devoiced tokens in particular manners of articulation.
In regard to the position in the word, voicing is retained most often word internally
(80%), whereas most devoicing occurs word-initially (44%, Fig. 10), which shows the
relevance of word boundaries in the implementation of voicing as pointed out by
Docherty (1992:32). Similarly, in the case of plosives, the results (Fig. 11) match those
in Flege and Brown (1982) and Westbury (1979) in that the sounds are least frequently
devoiced in word-medial position, namely in 18% and 3.5%, respectively. The more
frequent occurrence of word-medial devoicing in the present study, particularly in
comparison to Westbury’s result, could stem from the fact that the above mentioned
analyses were carried out on elicited disyllabic words, not on spontaneous speech.
Wiktor Gonet and Radosław Święciński
Figure 10 The percentage of devoiced tokens in different word positions.
Figure 11 The percentage of devoiced plosives in different word positions.
Regarding the contexts in which obstruents occur, they are most often devoiced in the
vicinity of an adjacent obstruent: 59% in the preceding, and 54% in the following
context. In the context of preceding and following vowels and sonorants, devoicing is
less frequent (p<0.001, cf. Figures 12 and 13). An analogous observation was made by
Haggard (1978) in a study of words pronounced in isolation, which confirms that the
neighbouring sounds are a relevant factor in the realization of voicing.
Figure 12 The percentage of devoiced tokens as preceded by specific sound categories.
Figure 13 The percentage of devoiced tokens as followed by specific sound categories.
Considering the effect of stress on the voicing of intervocalic lenis obstruents, there is
more devoicing (p<0.001) in stressed, than in unstressed, syllables (Fig. 14), while the
position in the syllable does not exert a statistically significant effect on the whole (Fig.
15). Assigning word-medial obstruents to syllables was performed according to the
Maximal Onset Principle (Goldsmith, 1990:128).
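The Maximal Onset Principle assigns as many intervocalic consonants as possible to the onset of the following syllable, provided the result is a permissible onset. The sketch below illustrates the idea only; the onset inventory is a hypothetical, deliberately incomplete sample and the function name is an assumption, not part of the study.

```python
# Hypothetical, deliberately incomplete inventory of legal English onsets.
LEGAL_ONSETS = {
    ("b",), ("d",), ("g",), ("v",), ("z",), ("t",), ("s",), ("r",), ("l",),
    ("b", "r"), ("d", "r"), ("g", "r"), ("s", "t"), ("s", "t", "r"),
}


def split_cluster(cluster, legal_onsets=LEGAL_ONSETS):
    """Split an intervocalic cluster into (coda, onset), maximising the onset."""
    cluster = tuple(cluster)
    # Try the longest candidate onset first, then progressively shorter ones.
    for i in range(len(cluster)):
        if cluster[i:] in legal_onsets:
            return cluster[:i], cluster[i:]
    # No legal onset found: the whole cluster stays in the coda.
    return cluster, ()
```

Under this rule a single intervocalic obstruent that is a legal onset, as in budding, is always syllabified as an onset.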
Figure 14 The effect of stress on the percentage of devoiced tokens.
Figure 15 The effect of the position in the syllable on the percentage of devoiced obstruents.
When the interaction of stress and syllable position is taken into account, it appears that
the greatest percentage of devoiced obstruents appears in stressed onsets. However, there
is a similar amount of devoicing in the opposing environment, i.e. in unstressed codas,
while significant differences concern the two previously mentioned contexts vs. stressed
codas and vs. unstressed onsets (p between 0.001 and 0.01, Fig. 16). Thus, it cannot be
stated that a particular combination of the position in the syllable and the presence or
absence of stress enhances or hinders devoicing.
Figure 16 The effect of stress and the position in the syllable on the percentage of devoiced
obstruents.
The distinction between function and content words is not reflected in the
amount of devoicing, which was found in 31% and 36% of cases, respectively (Fig. 17).
Figure 17 The percentage of devoiced obstruents in content and function words.
Let us now review the effect of stress in each manner of articulation (Figures 18-20).
Figure 18 The percentage of devoiced affricates in stressed and unstressed syllables.
Figure 19 The percentage of devoiced plosives in stressed and unstressed syllables.
Figure 20 The percentage of devoiced fricatives in stressed and unstressed syllables.
Significant differences between the amount of devoicing in stressed vs. unstressed
syllables were found in the affricate (Fig. 18) and in plosives (Fig. 19), while
in fricatives the differences were not significant (Fig. 20).
Another comparison was done for the position in the syllable. As was observed in the
effect of stress, here, too, the figures for affricates (Fig. 21) are markedly larger than
those for fricatives (Fig. 22) and plosives (Fig. 23).
Figure 21 The percentage of devoiced affricates in the onset and coda of the syllable
Figure 22 The percentage of devoiced fricatives in the onset and coda of the syllable
The relation of devoicing vs. position in the syllable is reversed in plosives, where more
devoicing was noted in onsets than in codas (Fig. 23).
Figure 23 The percentage of devoiced plosives in the onset and coda of the syllable.
Finally, let us observe the interaction of devoicing with the position in the syllable x
stress (cf. Fig. 16 averaged across manner of articulation).
As there appeared no token containing the palato-alveolar affricate in an unstressed
coda, Figure 24 shows only three bars for the contexts available in the study.
Figure 24 The percentage of devoiced affricates in stressed and unstressed codas and onsets
Thus in the affricate, devoicing is significantly stronger (p<0.001) when under stress.
The results in plosives (Fig. 25) are similar to those in fricatives (Fig. 26), with
unstressed onsets and stressed codas favouring devoicing more than the remaining two
contexts (p<0.001).
Figure 25 The percentage of devoiced plosives in stressed and unstressed codas and onsets.
Figure 26 The percentage of devoiced fricatives in stressed and unstressed codas and onsets
Most of the factors considered in the present study appear to affect voicing in
intervocalic obstruents. Regarding particular sound categories and manners of
articulation, the affricate is devoiced twice as frequently as plosives and fricatives, and
of other obstruents, /z/ is most frequently devoiced, probably because its voicing is often
predictable morphologically and does not have to be manifested phonetically, while /v/
and /ð/ were devoiced rarely. Plosives are devoiced still less frequently than /z/.
Considering the position of analyzed sounds in the word, it is interesting to see that
obstruents devoice more frequently when word-initial than when word-final. This shows
that in English the tendency to prolong VOT in stressed syllables exerts a stronger effect
than the reduction of Voicing-Into-Constriction.
Examining voicing in relation to adjacent sounds, it was noted that preceding and
following voiced obstruents do not retain voicing as strongly as one would expect;
vowels and sonorants exert a stronger voicing-retention effect.
Devoicing is also conditioned suprasegmentally, as most frequently devoicing takes
place in stressed syllables.
Ball, M.J. / Rahilly, J. (1999) Phonetics: The Science of Speech, London: Arnold
Catford, J. C. (1988) A Practical Introduction to Phonetics, Oxford: Oxford University
Press, (2001)
Catford, J.C. (1964) “Phonation types: the classification of some laryngeal components
of speech production”, (in:) Abercrombie, D. et al. (eds.) In honour of Daniel
Jones (p. 26-37), London: Longman
Catford, J.C. (1977) Fundamental Problems in Phonetics, Edinburgh: Edinburgh
University Press
Clark, J. / Yallop, C. (1990) An Introduction to Phonetics and Phonology, Oxford: Basil
Davenport, M. / Hannahs, S.J. (1998) Introducing Phonetics and Phonology, London:
Docherty, G.J. 1992. The Timing of Voicing in British English Obstruents. Berlin: Foris
Flege, J. & Brown, W., Jr. (1982). The voicing contrast between English /p/ and /b/ as a
function of stress and position-in-utterance. Journal of Phonetics, 10, 335-345.
Fujimura, O. / Erickson, D. (1999) “Acoustic phonetics”, (in:) Hardcastle, W.J. / Laver,
J. (eds.) The Handbook of Phonetic Sciences (p. 65-115), Oxford: Blackwell
Gimson, A.C. (1962) Gimson’s Pronunciation of English, London: Arnold, (2001)
Goldsmith, J. A. 1990. Autosegmental and Metrical Phonology, Massachusetts: Basil
Blackwell LTD
Gonet, W. (1989) Factorial Analysis of the Duration of R.P. Monophthongs in
Monosyllabic Words, diss., Institute of English, Maria Curie-Skłodowska University,
Lublin; Acoustic Phonetics Department, Polish Academy of Sciences, Poznań
Gonet, W. (2001) “Obstruent Voicing in English and Polish. Pedagogical Perspective”,
International Journal of English Studies, 1, nr 1, 73-92
Gonet, W. (2001) Voicing Control in English and Polish: A Pedagogical perspective.
International Journal of English Studies. Murcia (Spain): Universidad de Murcia, pp.
Gonet, W. 2010. Dispelling the Myth of Word-Final Obstruent Voicing in English: New
Facts and Pedagogical Implications. In: E. Waniek-Klimczak (ed.), Issues in Accents
of English 2. Cambridge: Cambridge Scholars Publishing pp. 361-376.
Gonet, W. and K. Różańska (2003). Voice Onset Time in Word Initial Lenis Plosives in
the Speech of Four BBC Presenters. Speech and Language Technology Vol. 7, pp.
35-52. Poznań: Polish Phonetic Association.
Gonet, W. and L. Stadnicka. (2006). Vowel Clipping in English. Speech and Language
Technology. Vol. 8. Poznań: Polish Phonetic Association.
Haggard, M. (1978). The devoicing of voiced fricatives. Journal of Phonetics 6. 95-102.
Jassem, W. (1983) The Phonology of Modern English, PWN: Warszawa
Ladefoged, P. (1971) Preliminaries to Linguistic Phonetics, Chicago: University of
Chicago Press
Ladefoged, P. (1975) A Course in Phonetics, New York: Harcourt Brace Jovanovich
Lipowska, E.B. (1991) Voicing of word final obstruents in R.P. English, M.A. thesis,
Maria Curie-Skłodowska University, Lublin
Lisker, L. and Abramson, A.S. (1964) “A cross-language study of voicing in initial
stops: Acoustical measurements”, (in:) Language and Speech 10, 1-28
Maddieson, I. (1999) “Phonetic universals” (in:) Hardcastle, W.J. / Laver, J. (eds.) The
Handbook of Phonetic Sciences (p. 619-639), Oxford: Blackwell Publishers
Ohala, J.J. (1999) “The Relation between Phonetics and Phonology”, (in:) Hardcastle,
W.J. / Laver, J. (eds.) The Handbook of Phonetic Sciences (p. 674-694), Oxford:
Blackwell Publishers
Port, R.F. and Rottuno, R. (1979). “Relation between voice-onset time and vowel
duration”, (in:) Journal of the Acoustical Society of America, 66, 654-662
Raphael, L.J. et al. (1975) “Vowel duration as cues to voicing in word-final stop
consonants: spectrographic and perceptual studies”, Journal of Speech and Hearing
Research, 18, 389-400
Roach, P. (1983) English Phonetics and Phonology. A Practical Course, Cambridge:
Cambridge University Press, 1991
Shockey, L. (2003) Sound Patterns of Spoken English, Oxford: Blackwell Publishing
Szpyra-Kozłowska, J. (2003) “The Lingua Franca Core and the Polish learner”, (in:) W.
Sobkowiak, E. Waniek-Klimczak (eds.) Dydaktyka fonetyki języka obcego.
Neofilologia. Zeszyty Naukowe, 5, 193-210, Płock: Państwowa Wyższa Szkoła
Szpyra-Kozłowska, J. et al. (2003) “Priorytety w nauczaniu języka angielskiego”, (in:)
W. Sobkowiak, E. Waniek-Klimczak (eds.) Dydaktyka fonetyki języka obcego.
Neofilologia. Zeszyty Naukowe, 5, 211-223, Płock: Państwowa Wyższa Szkoła
Van den Berg, J. (1958) “Myoelastic-aerodynamic theory of voice production”, Journal
of Speech and Hearing Research 3, 1, 227-244
Westbury, John (1979) Aspects of the Temporal Control of Voicing in Consonant
Clusters in English, Texas Linguistic Forum 14. Department of Linguistics,
University of Texas, Austin.
DOI 10.2478/v10015-011-0034-9
University of Silesia
This paper presents a set of simple statistical measures that illustrate the difference
between native English speakers and Polish learners of English in varying the length of
vocalic segments in read speech. Relative vowel duration and vowel length variation are
widely used as basic criteria for establishing rhythmic differences between languages and
dialects of a language. The parameter of vocalic duration is employed in popular measures
such as ΔV (Ramus et al. 1999), VarcoV (Dellwo 2006, White and Mattys 2007), and PVI
(Low et al. 2000, Grabe and Low 2002). Apart from rhythm studies, the processing of
data concerning vowel duration can be used to establish the level of discrepancy between
native speech and learner speech in investigating other temporal aspects of FL
pronunciation, such as tense-lax vowel distinction, accentual lengthening or the degree of
unstressed vowel reduction, which are often pointed out as serious problems in the
acquisition of English pronunciation by Polish learners. Using descriptive statistics
(relations between personal mean vowel duration and standard deviation), the author
calculates several indices that demonstrate individual learners' (13 subjects) scores in
relation to the native speakers' (12 subjects) score ranges. In some tested aspects, the
results of the two groups of speakers are almost cleanly separated, which suggests not
only the existence of specific didactic problems but also their actual scale.
1. Introduction
Foreign language (FL) pronunciation is traditionally assessed by the teacher on the basis
of immediate subjective impressions. Although in classroom teaching practice this will
probably remain the basic approach, the recent development of PC-operated methods of
speech analysis has made them available to people outside the circle of professional
laboratory phoneticians, including FL teachers, who can now consider the use of
acoustic analysis as an interesting accessory didactic aid.
Not all speech signal parameters can be easily employed for pedagogical purposes,
but speech unit duration measurement is relatively reliable and informative. The
segmentation of the speech chain is not always an easy task even if clear and consistent
criteria are applied, and it is time-consuming, but before the automatic methods are made
fully reliable, manual segmentation gives the researcher a better insight into the data.
Research supported by the Polish Ministry of Science and Higher Education via
The duration of speech units provides a researcher with a lot of useful information.
Vowel length appears to be a particularly interesting aspect of speech timing from the
point of view of the Polish learner of English (cf. Waniek-Klimczak 2005). This is
because relative vocalic duration in English can cue
- tense/lax vowel contrast (as an accessory cue)
- fortis/lenis contrast in coda
- prominence distribution
- prosodic domain boundaries
- rhythm patterns
Polish, however, is characterised by
- no tense/lax vowel distinction
- the voiced/voiceless contrast neutralised in coda
- very little unstressed vowel reduction
- allegedly weaker accentual lengthening.
Moreover, although final lengthening and initial strengthening are said to be universal
phenomena, we may face cross-linguistic discrepancies in the scale of their effects on
prosodic unit duration. Finally, Polish gives the listener a more syllable-timed impression
despite extremely complex consonant clusters.
All these discrepancies may lead to cross-linguistic interference in the process of FL
learning. A number of researchers dealing with English phonetics pedagogy indeed
report problems with insufficient intrinsic vowel length distinction (Sobkowiak 1996,
Szpyra-Kozłowska 2003, Nowacka 2008, Bryła 2010), insufficient unstressed syllable
reduction and too short prominent syllables in Polish learners (Avery and Ehrlich 1996,
Hewings 2004, Dziubalska-Kołaczyk et al. 2006, Gonet et al. 2010) and especially
insufficient vowel reduction in Polish learners of English (Luke and Richards 1982,
Sobkowiak 1996, Hewings 2004, Nowacka 2008, Gonet et al. 2010, Porzuczek 2010).
Most opinions, however, are formulated with reference to auditory assessment and
pedagogical experience.
2. Objectives of the present study
There are two main objectives of the present study:
- to provide evidence for vocalic timing differences between native English speakers
and Polish learners that will illustrate the scale of learners' problems with the
'short'/'long' and stressed/unstressed temporal vocalic contrasts,
- to illustrate the developmental tendencies in the learners' speech by repeating the
testing procedure after 7 months of study including a course of practical phonetics.
The obtained evidence can also be used for further investigations into the rhythmic
patterns of the Polish learner's English speech.
3. Method
The subjects were 13 Polish first-year students of English at a teacher training college.
Their performance (2 recording sessions – October 2006, May 2007), originally recorded
for a more comprehensive study of EFL speech timing (Porzuczek, in press), was
analysed in comparison to the performance of 12 English secondary school students in
Cambridge, downloaded from the IViE database (Grabe et al. 2001). The participants
read the Cinderella passage (Grabe et al. 2001, see Appendix). They had been given time
to practise the reading prior to the recording.
The tested material included 46 vocalic syllable nuclei (see Appendix):
- 20 unstressed reduced vowels (17 non-phrase-final)
- 20 stressed monophthongs (10 non-phrase-final), (5 ‘long’ vowels, 12 ‘short’
vowels, 3 æ’s)
- 6 stressed diphthongs (3 non-phrase-final)
Vowels adjacent to approximants and phrases showing significant interspeaker
differences in prominence distribution were avoided. Stressed syllables were thus
lexically and syntactically determined. This approach helps to reduce the problems
which call for automatic segmentation (e.g. Loukina et al. 2011). The acoustic analysis
for the purposes of the present research was based on manual segmentation and
measurement (standard criteria) from the spectrograms and waveforms using the
PRAAT software (Boersma 2001). The data analysis involved descriptive statistics
including group and personal vowel duration medians, means and standard deviation.
Raw measurements were normalised for speech rate by using proportions of vowel class
mean durations and VarcoV (Dellwo 2006, White and Mattys 2007). VarcoV is
calculated as the standard deviation of vowel duration (SD) expressed as a percentage
of mean vowel duration (VarcoV=SD*100%/meanV, where V=vowel duration).
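The VarcoV formula above can be illustrated with a minimal sketch. The durations are invented for illustration, and the use of the population standard deviation is an assumption, since the text does not say which SD variant was applied.

```python
from statistics import mean, pstdev


def varco_v(durations_ms):
    """VarcoV = SD * 100 / mean vowel duration (population SD assumed)."""
    return pstdev(durations_ms) * 100 / mean(durations_ms)


# Hypothetical vowel durations (ms) for one speaker, not taken from the study.
slow = [60, 90, 120, 150, 180]
# The same speaker at twice the speech rate: every vowel is half as long.
fast = [d / 2 for d in slow]

# The two values coincide, which is why the measure is said to normalise for rate.
print(round(varco_v(slow), 1), round(varco_v(fast), 1))
```

Because SD and mean scale together, a uniform change of tempo leaves the index unchanged, whereas raw SD would halve.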
Acoustic research tools based on duration, such as the recent rhythm measures, yield
results marked with significant individual variation. As Loukina et al. (2011) note, in
cross-linguistic rhythm studies more variation is often found between individual
speakers than between languages. The same problem may therefore appear in comparing
native and non-native speech within one language. This poses a problem of data
interpretation, especially for normative didactic purposes. It seems justified though to
assume that results out of the range of native speakers' scores indicate non-native-like
pronunciation features.
4. Results
Predictably, group means show significant differences between native and non-native
English speech in both investigated aspects. Mean stressed vowel durations are presented
in Table 1.
group\V class    text grand mean
PL1              133 (SD=65=48%)
PL2              122 (SD=58=48%)
ENG              130 (SD=72=55%)
Table 1: Mean durations (ms) of particular vowel classes (D=diphthong, L=long, A=ash,
S=short) in stressed syllables and vowel length variability (Porzuczek, in press).
The general results suggest similar articulatory rates in both groups of subjects, as
indicated by similar mean vowel durations. Stressed vowel duration variability is higher
in native speakers (ENG). After the training (PL2), the learners noticeably accelerate,
but the variability index (SD/mean duration) remains identical. There is also a larger
temporal difference between particular vowel classes in the pronunciation of native speakers.
Table 2 presents more information concerning the performance of individual
speakers, which is important in the context of teaching groups of learners and setting the
group\V class    D:S                L:S                A:S
PL1              1.8-2.25 (2.1)     1.22-1.75 (1.5)    .92-1.51 (1.25)
PL2              1.57-2.33 (1.9)    1.21-1.78 (1.5)    .94-1.59 (1.33)
ENG              1.95-2.82 (2.4)    1.47-2.29 (1.7)    1.12-1.85 (1.69)
Table 2: Vowel class mean length proportions in individual speakers' score ranges.
Group medians in parentheses.
It turns out that the learners' group medians for L:S ratio (1.5) in both recordings
approximate the native speakers' minimum (1.47). However, the ranges largely overlap
and, despite significant group differences, most Polish learners fall within the norms of
native-like performance. Individual speakers' scores are shown in Appendix B.
The results indicate that the duration contrasts between vowel classes are clearer in
native speakers. Still, even though group scores differ significantly, there are a number
of native speakers who show less vowel length variation. This may suggest that either
many Polish learners make a proper distinction between the vowel classes, at least for
the 'long'/'short' vowel contrast, or that the scale of this quantitative distinction is
irrelevant as long as a minimum contrast level is reached, e.g. approximately a 1.5:1
ratio for the present text. In order to account for possible effects of extraneous variables,
we tried to observe the impact of pre-fortis clipping and final lengthening. The relevant
calculations showed 15% shorter vowels in pre-fortis positions in the native
performance. The learners made such vowels 8% shorter in the first recording and 16%
shorter in the second. There was more difference in final lengthening, however, which
made the native vowels three times longer than in non-phrase-final syllables, while the
Polish learners made their vowels in prepausal syllables twice as long (Table 3). The
ratio, which we call FLQ (final lengthening quotient), is obtained by dividing a subject's
mean vowel duration in phrase-final syllables by mean vowel duration in non-phrase-final syllables.
group    FLQ = mean final (N=7) : mean non-final (N=19)
PL1      1.64-2.51 (1.95)
PL2      1.63-2.75 (2.09)
ENG      2.28-3.32 (2.9)
Table 3: Personal final lengthening quotient (FLQ) ranges and group medians (in parentheses).
The same data, illustrating individual subjects' performance, are also presented in Fig. 1.
Fig. 1: Individual final lengthening quotient (FLQ) in English and Polish speakers.
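The FLQ defined above is a simple ratio of two means, as the following sketch shows. The function name and the sample durations are hypothetical and serve only to make the calculation concrete.

```python
from statistics import mean


def final_lengthening_quotient(final_ms, non_final_ms):
    """FLQ = mean vowel duration in phrase-final syllables
    divided by mean vowel duration in non-phrase-final syllables."""
    return mean(final_ms) / mean(non_final_ms)


# Invented durations (ms) for one speaker, not taken from the study.
final_vowels = [240, 300, 270]
non_final_vowels = [90, 100, 80]

# A quotient of 3 means phrase-final vowels are three times as long on average.
print(final_lengthening_quotient(final_vowels, non_final_vowels))
```

Since the quotient is computed within one speaker's recording, it should be largely insensitive to that speaker's overall tempo.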
The strong effect of final lengthening makes it advisable to present the results of the
research with respect to non-phrase-final syllables as well as the overall scores, even
though the process does not seem to have a very strong effect, for instance, on L:S ratios
(Table 4) or general vowel length variability (Table 5), especially in terms of score ranges.
group\V class    L:S (non-final)    L:S (overall)
PL1              1.35-2.18 (1.7)    1.22-1.75 (1.5)
PL2              1.24-1.79 (1.6)    1.21-1.78 (1.5)
ENG              1.5-2.32 (1.7)     1.47-2.29 (1.7)
Table 4: Personal 'long':'short' vowel ratio ranges and group medians.
group    mean V duration (ms)                V length variation
         overall (26)     non-final (19)     overall (26)    non-final (19)
PL1      112-160 (132)    94-127 (108)       39-55 (49)      33-51 (39)
PL2      100-140 (127)    82-119 (106)       39-62 (47)      30-49 (36)
ENG      106-155 (127)    87-121 (100)       44-63 (53)      30-51 (44)
Table 5: Personal mean vowel duration ranges and group medians (duration columns). Personal vowel
length variation (variation columns).
Apart from final lengthening and pre-fortis clipping, there is yet another potential
extraneous variable, viz. the complex and gradient nature of prominence. As mentioned
earlier, because of the lack of a continuous scale that could be used to
measure prominence taking into account all its components and their contribution, we
can only try to control its effects on duration by careful selection of contexts where
structural prominence is unambiguously distributed.
Generally, two conclusions can be formulated with respect to stressed vowel length
variability. Firstly, all native speakers and a majority (2/3) of Polish speakers before
training make the long vowels at least 50% longer than the short ones.
Secondly, final lengthening appears much stronger in the pronunciation of native speakers.
Far more spectacular results are obtained if vowels in both stressed and unstressed
syllables are taken into consideration. The differences can be captured by both VarcoV
and vowel reduction quotient (VRQ), calculated for individuals by dividing their mean
unstressed vowel duration by mean stressed vowel duration. Tables 6 and 7 show the
relevant VarcoV (SD:M) results for non-final contexts and all tested vowels (the figures
are not multiplied by 100 as in the original VarcoV formula). Native
speakers' codes are shown in bold. Polish learners’ codes are followed by "1" (1st
recording) or "2" (second recording).
Table 6: Non-final mean vowel duration (M) and duration variability (SD:M) (19 stressed vowels + 17 schwas)
Table 7: Overall mean vowel duration and duration variability (SD:M) (26 stressed vowels + 20 schwas)
The data from Tables 6 and 7 are also presented as a graph in Figure 2 for a clearer
illustration of cross-group and individual differences.
Figure 2: Vowel duration variability (panels: vowel length variability and non-final vowel length variability; axis: SD / mean Vdur).
VarcoV shows the general vowel length variability, which may be influenced by other
factors, while VRQ focuses on the stressed/unstressed distinction, and shows the scale of
quantitative vowel reduction. It is presented in Table 8 and Figure 3.
The case of subject RM is an outstanding argument for the necessity to normalise the data for
speech rate. Together with CMC, CLH and CLP it may also convince learners that high speed
does not equal proficiency in FL speech performance.
Andrzej Porzuczek
Table 8: Quantitative vowel reduction scale in native English speakers and Polish learners.
S=subject, MstrV=mean stressed vowel duration, Mschwa=mean reduced vowel duration,
VRQ=Mschwa:MstrV. Native speakers' codes in bold. Polish learners’ codes followed by "1"
(1st recording) or "2" (second recording).
Figure 3: Vowel Reduction Quotient (VRQ): personal mean schwa:accentedV ratio (non-phrase-final syllables).
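The VRQ calculation (VRQ=Mschwa:MstrV) can be sketched as follows. The durations are invented, and the function name is a hypothetical label rather than anything used in the study.

```python
from statistics import mean


def vowel_reduction_quotient(schwa_ms, stressed_ms):
    """VRQ = mean reduced (schwa) vowel duration / mean stressed vowel duration."""
    return mean(schwa_ms) / mean(stressed_ms)


# Invented durations (ms) for one speaker, not taken from the study.
schwas = [40, 50, 60]
stressed = [100, 120, 140]

vrq = vowel_reduction_quotient(schwas, stressed)
# A VRQ of 0.5 or less means unstressed vowels are at least 50% shorter
# than stressed ones.
print(round(vrq, 2), vrq <= 0.5)
```

A lower quotient indicates stronger quantitative reduction, so a speaker who merely talks faster, shortening all vowels alike, does not obtain a lower VRQ.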
The VRQ scores suggest that in native English speech the unstressed vowels are at least
50% shorter than the stressed ones. Polish learners, even after pronunciation training,
hardly ever reach this level of vowel reduction. The significant difference between the
Measuring Vowel Duration Variability in Native English Speakers and Polish Learners 211
groups is also reflected in group median differences. Table 9 presents both raw schwa
durations and measures normalised for speech rate (VarcoV, VRQ).
Columns: schwa median (ms), VarcoV median, VRQ median.
Table 9. Group medians for vowel reduction and duration variability measures.
5. Conclusions
Simple descriptive statistics concerning vowel duration which were used in this study
help to provide evidence supporting the following statements:
1. In Polish learners’ read speech, there is less difference between ‘long’ and ‘short’
vowels than in native production (but the evidence is rather weak).
2. Final lengthening is considerably stronger in native speakers.
3. Vowel reduction is a serious problem for Polish learners, who produce unstressed
vowels that are too long in terms of both absolute and relative durations. Despite some
progress, this remains difficult even after training.
4. Considering all duration determinants combined, the Polish learners vary their
vocalic length far less than do native English speakers, even though fluency
problems, typical of learner speech, should probably contribute to more variation.
5. VarcoV and VRQ are efficient measures which show differences between native
and Polish-accented English speech timing.
6. VRQ appears resistant to individual speech rate differences.
7. Because duration statistics are text-dependent, cross-linguistic studies are difficult
to conduct. Useful data about native and non-native speakers can be gathered if
standardised tests are introduced.
The measures presented in this paper show general differences between native English
and Polish learner pronunciation but they can also serve as immediate didactic help in
practical phonetics courses to enhance the learners' awareness of cross-linguistic
differences and similarities and may help set concrete targets for practical pronunciation
References
Avery, P. and S. Ehrlich. 1992. Teaching American English Pronunciation. Oxford:
Oxford University Press.
Boersma, P. 2001. Praat, a system for doing phonetics by computer. Glot International 5
(9/10): 341-345.
Bryła, A. 2010. Phonetic properties of Euro-English – empirical evidence. In Issues in
accents of English 2: Variability and norm, ed. E. Waniek-Klimczak, 37-60.
Newcastle-upon-Tyne: Cambridge Scholars Publishing.
Dellwo, V. 2006. Rhythm and Speech Rate: A Variation Coefficient for ΔC. In
Language and Language-processing, eds. P. Karnowski and I. Szigeti, 231-241.
Frankfurt am Main: Peter Lang.
Dziubalska-Kołaczyk, K., A. Bogacka, D. Pietrala, M. Wypych and G. Krynicki. 2006.
PELT: an English language tutorial system for Polish speakers. MULTILING-2006,
paper 012.
Gonet, W., J. Szpyra-Kozłowska, and R. Święciński. 2010. The acquisition of Vowel
Reduction by Polish students of English. In Issues in accents of English 2:
Variability and norm, ed. E. Waniek-Klimczak, 291-308. Newcastle-upon-Tyne:
Cambridge Scholars Publishing.
Grabe, E., B. Post and F. Nolan. 2001. The IViE Corpus. Department of Linguistics,
University of Cambridge. http://www.phon.ox.ac.uk/IViE. [Retrieved 7 September
Grabe, E. and E. L. Low. 2002. Durational variability in speech and the rhythm class
hypothesis. In Laboratory Phonology 7, eds. C. Gussenhoven and N. Warner, 515-546. Berlin, New York: Mouton de Gruyter.
Hewings, M. 2004. Pronunciation Practice Activities. Cambridge: Cambridge
University Press.
Loukina, A., G. Kochanski, B. Rosner, C. Shih and E. Keane. 2011. Rhythm measures
and dimensions of durational variation in speech. Journal of the Acoustical Society of
America 129/5: 3258-3270.
Low, E. L., E. Grabe and F. Nolan. 2000. Quantitative characterisations of speech rhythm:
syllable-timing in Singapore English. Language and Speech 43: 377-401.
Luke, K.-K. and J. C. Richards. 1982. English in Hong-Kong: Functions and status.
English World-Wide 3: 147-164.
Nowacka, M. 2008. The Phonetic Attainment in Polish University and College Students
of English. A Study in the Productive and Receptive Pronunciation Skills.
Unpublished Ph.D. dissertation. Maria Curie-Skłodowska University, Lublin.
Porzuczek, A. 2010. The weak forms of TO in the pronunciation of Polish learners of
English. In Issues in accents of English 2: Variability and norm, ed. E. Waniek-Klimczak, 309-324. Newcastle-upon-Tyne: Cambridge Scholars Publishing.
Porzuczek, A. (in press). The timing of tone group constituents in the advanced Polish
learner's English pronunciation. Katowice: Wydawnictwo Uniwersytetu Śląskiego.
Ramus, F., M. Nespor and J. Mehler. 1999. Correlates of linguistic rhythm in the speech
signal. Cognition 72: 1-28.
Sobkowiak, W. 1996. English Phonetics for Poles. Poznań: Bene Nati.
Szpyra-Kozłowska, J. 2003. The Lingua Franca Core and the Polish Learner. In
Dydaktyka fonetyki języka obcego, eds. W. Sobkowiak and E. Waniek-Klimczak,
193-210. Płock: Wydawnictwo Naukowe PWSZ w Płocku.
Waniek-Klimczak, E. 2005. Temporal Parameters in Second Language Speech. Łódź:
Wydawnictwo Uniwersytetu Łódzkiego.
White, L. and S. L. Mattys. 2007. Calibrating rhythm: First language and second
language studies. Journal of Phonetics 35: 501-522.
Appendix A
The read text and tested vowels. Unstressed reduced vowels in italics, stressed vowels in
Once upon a time there was a girl called Cinderella. But everyone called her Cinders. Cinders
lived with her mother and two stepsisters called Lily and Rosa. Lily and Rosa were very
unfriendly and they were lazy girls. They spent all their time buying new clothes and going to
parties. Poor Cinders had to wear all their old hand-me-downs! And she had to do the cleaning!
One day, a royal messenger came to announce a ball. The ball would be held at the Royal Palace,
in honour of the Queen's only son, Prince William. Lily and Rosa thought this was divine. Prince
William was gorgeous, and he was looking for a bride! They dreamed of wedding bells!
When the evening of the ball arrived, Cinders had to help her sisters get ready. They were in a bad
mood. They'd wanted to buy some new gowns, but their mother said that they had enough gowns.
So they started shouting at Cinders. 'Find my jewels!' yelled one. 'Find my hat!' howled the other.
They wanted hairbrushes, hairpins and hair spray.
When her sisters had gone, Cinders felt very down, and she cried. Suddenly, a voice said: 'Why
are you crying, my dear?'. It was her fairy godmother!
Appendix B
Individual speakers' vowel class length ratios. Native speakers' codes in bold. Polish
learners' codes followed by "1" (1st recording) or "2" (second recording).
Research in Language, 2012, vol. 10.2
DOI 10.2478/v10015-011-0049-2
University of Gdańsk
It has been generally accepted that greater vowel/syllable duration is a reliable correlate of
stress and that absolute durational differences between vowels underlie phonemic length
contrasts. In this paper we shall demonstrate that duration is not an independent stress
correlate, but rather it is derivative of another stress correlate, namely pitch. Phonemic
contrast, on the other hand, is qualitative rather than quantitative.
These findings are based on the results of an experiment in which four speakers of
SBrE read 162 mono-, di- and trisyllabic target items (made of CV sequences) both in
isolation and in carrier phrases. In the stressed syllables all Southern British English
vowels and diphthongs were represented and each vowel was placed in 3 consonantal
contexts: (a) followed by a voiced obstruent, (b) voiceless obstruent and (c) a sonorant.
Then, all vowels (both stressed and unstressed) were extracted from target items and
measured with PRAAT.
The results indicate that stressed vowels may be longer than unstressed ones. Their
durational superiority, however, is not stress-related, but follows mainly from vowel-intrinsic durational characteristics and, to some extent, from the prosodic context (i.e. the
number of following unstressed vowels) in which it is placed. In CV1CV2 disyllables,
when V1 is phonemically short, the following word-final unstressed vowel is almost
always longer. It is only when V1 is a phonemically long vowel that V2 may be shorter.
As far as diphthongal V1 is concerned, the durational V1~V2 relation is variable.
Interestingly, the V1~V3 relation in trisyllables follows the same durational pattern. In
both types of items the rare cases when a phonemically short V1 is indeed longer than the
word-final vowel involve a stressed vowel which is open, e.g. [], and whose minimal
execution time is longer due to a more extensive jaw movement. These observations
imply that both in acoustic and perceptual terms the realisation of word stress is not based
on the durational superiority of stressed vowels over unstressed ones. When it is, it is only
an epiphenomenon of intrinsic duration of the stressed vowel and extra shortness of non-final unstressed vowels.
As far as phonemic length contrast is concerned, we observe a high degree of
durational overlap between phonemically long and short vowels in monosyllabic CVC
words (which is enforced by a greater pitch excursion), whereas in polysyllables the
differences seem to be perceptually non-salient (<40 ms, cf. Lehiste 1970). This suggests
that the differences in vowel duration are not significant enough to underlie phonological
length contrasts.
1. Introduction
Vowel duration has been given an enormous amount of research attention, both phonetic
and phonological. It has also been generally accepted that duration is one of the major
phonetic correlates of stress (cf. Fry 1955, 1958). In this paper we will concentrate on
how phonemic length contrasts are curtailed by the operation of pre-fortis clipping (PFC)
and the prosodic context (i.e. the number of the following unstressed syllables, or foot
structure) in which the stressed vowel is placed. We will argue that PFC and the size of
the foot obliterate quantitative vowel contrasts.
2. Experiment design
Four male speakers of Southern British English took part in a controlled experiment.
Each subject read 162 target items (54 monosyllables, 54 disyllables and 54 trisyllables).
All items were presented in two contexts: in isolation and phrase-finally (Say the
word...). Target items were selected according to the following criteria: (i) all
monosyllables were of the CVC type, (ii) all di- and trisyllables terminated in [i]
(incidentally schwa), (iii) in the stressed vowel position all RP vowels and diphthongs
were represented, (iv) the post-stress consonants were of three types: voiceless
obstruents, voiced obstruents and sonorants (each vowel and diphthong was placed in all
three consonantal contexts), (v) where possible, the initial C was a voiced obstruent.
Only vowels were measured in the present study. The total number of observations
amounts to 652 (162 vowels x 2 contexts x 4 subjects). The significance of the durational
differences between stressed vowels in isolated vs. phrase-final context was tested for all
vowels in all three groups of target items (mono-, di- and trisyllables) separately. We
hypothesised that both isolated and phrase-final pronunciations are in fact identical by
virtue of being followed by silence. Thus, if the phrase-final lengthening effects occur
(for individual vowels or globally for all vowels within an item in terms of their total
duration), they should be observed in both contexts. One-way ANOVA (with an alpha of
.05) confirms that there is no significant effect of the context on either stressed or
unstressed vowel duration (p>.05). Thus, the two sets of data were combined, which
increased the sensitivity of further statistical tests (n=104 for an individual subject in
each group of items, i.e. 1-, 2- and 3-syllables).
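The isolated vs. phrase-final comparison described above can be sketched as a hand-computed one-way ANOVA. This is only an illustration: the durations are invented, the function name is hypothetical, and only the F statistic and its degrees of freedom are computed here, so the p-value would have to come from a statistics package or an F table.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented stressed-vowel durations (ms) in the two reading contexts.
isolated = [100, 110, 120]
phrase_final = [102, 108, 123]

f_stat, df_b, df_w = one_way_anova_f(isolated, phrase_final)
# A small F means the context explains little of the durational variance.
print(round(f_stat, 3), df_b, df_w)
```

With only two groups the test is equivalent to a two-sample t-test (F equals t squared), which is one reason a non-significant result would justify pooling the two sets of data.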
Vowel duration was measured with PRAAT (Boersma and Weenink 2005) using
waveforms and spectrograms. For vowels followed by consonants, vowel onset was
identified as the point where the target vowel full formant structure was reached and the
end of the vowel corresponded to the beginning of the closure phase. The termination of
word-final vowels was assumed to coincide with the end of periodic wave accompanied
by dispersion of F2/F3.
3. Vowel duration: a problematic stress correlate
Earlier studies have shown that there exist three acoustic correlates of stress, i.e. f0,
duration and intensity. According to Fry (1955, 1958), Bolinger (1958) and Morton and
Stressed Vowel Duration and Phonemic Length Contrast
Jassem (1965) the correlates differ in their contribution to stress perception: f0 provides
the strongest cue, increased duration has a slightly lesser perceptual value and intensity
is the weakest correlate. As argued by Lieberman (1960), however, vowel duration is the
weakest correlate. A different point of view is presented by Cutler, Dahan and Donselaar
(1997: 154) who argue that there is "peculiar redundancy of stress cues in English" and it
is also segmental structure that provides robust information about stress.
In essence, the null hypothesis tested in the present study assumes that there exists a
fixed VSTRESSED>VUNSTRESSED relation that holds for all three phonetic correlates of stress,
duration being one of them. Thus, V1 in polysyllabic items should invariably be longer
than the following unstressed vowels (V2 in 2- and 3-syllable words and V3 in 3-syllable
words). The durational superiority of the stressed vowel over the unstressed ones within
a lexical item, however, is not as obvious as it may seem. Admittedly, in trisyllabic
words V1 was found to be generally longer than the following unstressed vowel (V2).
The mean differences between the two vowels for each subject were as follows: S1=61
ms; S2=69 ms; S3=52 ms and S4=77 ms. However, not in all cases was the difference
between V1 and V2 positive. V1 did happen to be shorter than V2 (S1=5.5%; S2=4.6%;
S3=14.5% and S4=0.6% of items in the sample). Although such instances were
relatively infrequent in each sample, the very fact that they did occur raises doubts about
the validity of VSTRESSED>VUNSTRESSED relation. We do not think, however, that this
provides sufficient arguments for rejecting it. It has to be mentioned that V2 was longer
than V1 only in very specific contexts: (i) when V1 was followed by a coda consonant (e.g.
density, dignity) and/or (ii) when the consonant following V2 was a stop (e.g. Kennedy, Canada).
The former context accounts for the extra shortness of V1 and the latter one for the
lengthening of V2 due to a slightly longer closure phase before the following stop.
Furthermore, since the coda consonant is generally assumed to contribute to the
phonological weight of the syllable rhyme, its duration should also be taken into
account. If added to the pre-coda vowel, the total duration of the CV rhyme would have
certainly eliminated all instances in which V1 alone was shorter than V2 in trisyllables.
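The two per-subject figures reported above (the mean V1-V2 difference and the share of items in which V1 is shorter than V2) can be derived as in the following sketch. The duration pairs are invented and the function name is hypothetical.

```python
def v1_v2_summary(pairs_ms):
    """Return (mean V1-V2 difference in ms, % of items where V1 < V2)."""
    diffs = [v1 - v2 for v1, v2 in pairs_ms]
    mean_diff = sum(diffs) / len(diffs)
    pct_reversed = 100 * sum(d < 0 for d in diffs) / len(diffs)
    return mean_diff, pct_reversed


# Invented (V1, V2) durations in ms for four trisyllabic items.
items = [(120, 60), (110, 55), (70, 80), (130, 50)]

mean_diff, pct_reversed = v1_v2_summary(items)
print(mean_diff, pct_reversed)
```

A clearly positive mean difference can thus coexist with a non-zero share of reversed items, which is the pattern the paragraph describes.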
Much stronger doubts concerning the durational domination of the stressed vowels
over the unstressed ones appear when V1 duration in di- and trisyllables is compared
with that of word-final unstressed vowels (e.g. biddy, bigamy). In disyllables, when V1 is
phonemically short, the following word-final unstressed syllable is almost always
longer. It is only when V1 is a phonemically long vowel that V2 is shorter. As far as
diphthongal V1 is concerned, the durational V1~V2 relation is variable. Interestingly, the
V1~V3 relation in trisyllables follows the same durational pattern. In both types of items
the rare cases when a phonemically short V1 is indeed longer than the word-final vowel
involve a stressed vowel which is open, e.g. [] and whose minimal execution time
(Klatt 1986) is longer due to a more extensive jaw movement. These observations imply
that both in acoustic and perceptual terms the realisation of word stress is not based on
the durational superiority of stressed vowels over unstressed ones. When it is, it is only
an epiphenomenon of intrinsic duration of the stressed vowel and extra shortness of non-final unstressed vowels, as illustrated in Graphs 1 and 2. Hence, to a large extent it
is accidental.
Graph 1: V1-V2 difference in duration (ms) in 2-syllable items (all subjects)
Graph 2: V1-V3 difference in duration (ms) in 3-syllable items (all subjects)
In consideration of the above, we have to reject the idea that stressed vowels are longer
than the unstressed ones within the same item. In terms of duration the
VSTRESSED>VUNSTRESSED relation is neither stable nor does it seem to be stress-related.
4. Pre-fortis clipping effects, or how phonemic contrast gets
In principle the PFC effects should be observed in all vowels which are followed by a
voiceless obstruent regardless of the vowel position and the prosodic context. Thus, it
should affect stressed and unstressed vowels alike as it is a stress-independent
phenomenon. As observed by Kim and Cole (2005), there exists an inversely
proportionate relation between the duration of the stressed syllable and the number of
syllables that follow. However, this regularity is also contextually independent of PFC.
While on average the duration of the stressed vowels is expected to decrease in longer
items, the compression effect may not suspend the operation of PFC. Thus, the mean
difference in milliseconds between the duration of stressed vowels followed by a
voiceless obstruent and those followed by a voiced one is expected to diminish without
threatening the significance of the difference itself.
Thus, according to the null hypothesis, regardless of the durational differences
between the stressed vowels in shorter vs. longer items and stressed vs. unstressed
vowels, the PFC effects, which are merely related to the voicing of the following
consonant, should be constant. If this claim is falsified, i.e. the PFC effects turn out to be
insignificant for some group of items or some prosodic context, the conditioning factor
must be singled out which is responsible for the PFC suspension. An alternative
hypothesis, in our view, must assume that it is caused by the intervocalic durational
relations within polysyllabic items. The existence of such interdependences entails a
postulation of a higher-level constituent which controls the interactions between the total
number of syllables and the degree of stressed vowel shortening before a fortis
consonant. We assume that this constituent is the metrical foot.
First, let us consider the durational differences relating to the PFC in the group of
monosyllables ending in voiced vs. voiceless obstruent. Rather unsurprisingly, the one-way ANOVA test (alpha .05) confirms that PFC has a highly significant effect (S1
p=1.28E-14; S2 p=2.59E-08; S3 p=.0007; S4 p=1.6E-15) on vowel duration for all
subjects regardless of the phonemic length of the vowel.
In disyllables the pre-voiced/pre-voiceless durational difference between stressed
vowels (V1) remains statistically significant, although it has to be emphasised that the p-values are generally higher and the mean differences are smaller (S1 p=.0002; S2
p=.004; S3 p=.0007; S4 p=.02).
As far as trisyllabic items are concerned, however, for all subjects the PFC effects on
V1 duration turn out to be non-significant (S1 p=.02; S2 p=.07; S3 p=.56; S4 p=.23).
Moreover, the mean differences in duration between V1+CVOICELESS and V1+CVOICED are
further reduced, both generally and for an individual subject. Noteworthy is also the fact
that while the mean difference in the duration of pre-voiced vs. pre-voiceless vowels in
monosyllables (53.6 ms~113.7 ms) may be safely assumed to be perceptually salient,
this is not so obvious in the case of di- and trisyllables, where the difference range is
27~33.9 ms and 20.9~26.6 ms, respectively.
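The salience argument can be made explicit by comparing the reported difference ranges with the 10~40 ms non-distinguishable window that the paper attributes to Lehiste (1970). The sketch below uses the ranges quoted above. Treating the upper edge of that window as a hard cut-off is a simplifying assumption, and the function name is hypothetical.

```python
# Assumption: 40 ms (upper edge of the 10~40 ms window) is used as a cut-off.
UPPER_BOUND_MS = 40.0

# Ranges of mean pre-voiced vs. pre-voiceless differences quoted in the text.
pfc_difference_ranges_ms = {
    "monosyllables": (53.6, 113.7),
    "disyllables": (27.0, 33.9),
    "trisyllables": (20.9, 26.6),
}


def exceeds_window(range_ms, upper_bound_ms=UPPER_BOUND_MS):
    """True if even the smallest reported difference lies above the window."""
    low, _high = range_ms
    return low > upper_bound_ms


for item_type, range_ms in pfc_difference_ranges_ms.items():
    print(item_type, exceeds_window(range_ms))
```

Only the monosyllabic range lies wholly above the cut-off, in line with the claim that salience is safe to assume only for monosyllables.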
In conclusion, PFC affects stressed vowels to a different degree depending on the
number of syllables that follow. Thus, the probability of its occurrence is inversely
proportionate to the overall vowel duration of the word.
Let us now pay attention to another surprising fact, namely that phonemically
identical vowels followed by a voiceless obstruent are not necessarily shorter than
those followed by a voiced one. The percentage of cases when the vowel in VCVOICELESS
is longer than in VCVOICED is presented in Table 1 below. For each speaker, the left-hand
column shows the number of items where the pre-voiceless vowel was actually longer than
the phonemically identical pre-voiced one and the right-hand one the percentage of
such occurrences in the data sample (n=36).
Table 1: Number of instances in which a stressed vowel is longer than a phonemically
identical vowel despite the PFC context.
This seems to undermine the very relation between the duration of a vowel and the
voicing of the following consonant. However, this observation does not necessarily falsify PFC.
As argued by Kingston and Diehl (1994), PFC is a feature which enhances phonemic
contrast and as such it facilitates speech perception. As Gussenhoven (2007: 146) puts it
“the implementation of pre-fortis clipping [...] is a concession to the hearer by way of
compensation for the frequent devoicing of the voiced obstruent.” Thus, this
compensation is more likely to occur when the phonemic distinctiveness is threatened.
Its degree observed in experimental conditions will then depend upon the organisation of
the input. Since in our experiment the order of target items was randomised (i.e. items
like bit and bid were never placed consecutively), there was no (or very little) necessity
of contrast enhancement.
Since on the other hand, PFC is aerodynamically conditioned ‘because the transglottal
pressure difference creating the airflow driving vocal fold vibration is hard to maintain
in the face of the impedance by the oral constriction of obstruents’ (Gussenhoven: ibid.),
its effect on vowel duration is likely to be observed even if distinctiveness is not
threatened (e.g. in a randomised experimental input). This does not mean, however, that
it must occur as the aerodynamic conditioning may be successfully counterbalanced by
the prosodic one (which may also be aerodynamic in nature). Pre-fortis clipping, then, is
both an articulatorily motivated and speaker-controllable parameter which may be latent
(i.e. producing statistically and perceptually insignificant differences in vowel duration)
when the vowel contrast is safe. In terms of speech processing, considering the fact that
the perceptual information load is directly proportionate to the number of the syllables
within an item (cf. the cohort theory by Marslen-Wilson and Tyler (1980)), in
monosyllables the number of instances in which a vowel followed by CVOICELESS is
longer than the phonemically identical vowel followed by CVOICED is the lowest.
To sum up, PFC has been shown to have the greatest effect on vowel duration in
monosyllabic items. The degree of durational difference between pre-voiced and pre-voiceless vowels in the stressed position is inversely proportionate to the overall length
of an item, i.e. the effect is lesser on the stressed vowels in disyllables than on those in
monosyllables and it becomes insignificant in trisyllabic items. Pre-fortis clipping
appears to be both an articulatorily motivated and speaker-controllable process which
may be latent (i.e. producing statistically and perceptually insignificant differences in
vowel duration) when the vowel contrast is safe.
A typical context for its activation is the presentation of length contrast in minimal pairs
(beat~bead), e.g. in the process of phonetic instruction.
Stressed Vowel Duration and Phonemic Length Contrast
5. Durational overlap between phonemically long and short vowels
We observed that (i) mean stressed vowel durations systematically decrease as the
number of following unstressed syllables increases and (ii) the differences between
stressed vowel durations in mono- and disyllables are significantly greater (67-97 ms)
than those between di- and trisyllables (15-43 ms).
Graph 3: Mean stressed vowel durations (ms) in mono-, di- and trisyllabic items
Theoretically, one would expect that the systematic decrease in V1 duration in 2- and 3-syllable words should result in a simultaneous obliteration of phonemic length
distinctions and, consequently, pose a potential threat to their perception. However, the
danger of eliminating phonemic length distinction in polysyllabic items is not as serious
as it may seem. Recall that the inter-speaker variation ranges from 18 ms to 43.9 ms,
which does fit neatly in the non-distinguishable window (10~40 ms) established by
Lehiste (1970: 13). The durational deficiency of V1 in polysyllabic items may also be
successfully compensated for by a more robust segmental context. Note that,
paradoxically, as the number of syllables grows, the number of potential
vowel-consonant permutations increases rapidly, which reduces the chances of
generating, for instance, a trisyllabic minimal pair (whose semantic contrast relies
entirely upon V1 quantity) virtually to zero. Thus, the substantially reduced V1
recognition time in di- and trisyllables can hardly impede the recognition of the whole
word. Language economy should, therefore, allow the length contrast requirements to be
loosened where intelligibility is not threatened, i.e. in polysyllabic forms, and
strengthened where the recognition of an item is largely dependent on the recognition of
the vowel, i.e. in monosyllables. So much for the theory. What emerges from our data,
however, is a completely opposite regularity. It is in the monosyllabic items where the
stressed long and short vowels display durational convergence rather than in di- and
trisyllables. This conclusion was arrived at by mapping the mean durations of
phonemically long and short vowels onto the corresponding standard deviation values.
Thus, we have calculated the span of a durational window for the two classes of stressed
vowels in 1-, 2- and 3-syllable words by adding the standard deviation for each group to
its mean duration on the one hand and subtracting the standard deviation from the
corresponding mean duration on the other. The resulting windows for phonemically
short and long vowel durations in each group of items were then compared for each
subject with a view to extracting the degree of overlap, which was calculated in the
following way: $(V_{\text{mean dur.}} + V_{\text{std dev.}}) - (V{:}_{\text{mean dur.}} - V{:}_{\text{std dev.}})$. We assumed that
there is an inversely proportionate relation between the degree of the durational overlap
and the robustness of the phonemic length contrast in a particular group of items.
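The overlap calculation described above can be sketched in a few lines of Python. This is only an illustration of the formula as stated in the text: the function name and the input values are hypothetical and are not taken from the study's data.

```python
def durational_overlap(short_mean, short_sd, long_mean, long_sd):
    """Overlap (ms) between the short-vowel and long-vowel duration windows.

    Each window spans mean - SD to mean + SD. Following the formula in the
    text, the overlap is the upper edge of the short-vowel window minus the
    lower edge of the long-vowel window. A positive value means the two
    windows overlap; zero or a negative value means they do not.
    """
    short_upper = short_mean + short_sd
    long_lower = long_mean - long_sd
    return short_upper - long_lower


# Hypothetical values in ms, chosen for illustration only (not from the study).
mono = durational_overlap(short_mean=150.0, short_sd=30.0,
                          long_mean=190.0, long_sd=40.0)
di = durational_overlap(short_mean=80.0, short_sd=10.0,
                        long_mean=120.0, long_sd=15.0)
print(mono)  # 30.0  (windows overlap by 30 ms)
print(di)    # -15.0 (windows are 15 ms apart)
```

On this reading, a larger positive value corresponds to a less robust length contrast in the group of items concerned.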
It turns out that for all subjects the durational overlap was observed only in
monosyllabic items (S1=60.2 ms; S2=8.2 ms; S3=48 ms and S4=29.8 ms) and not in di- and trisyllables. This is graphically illustrated in Graph 4 below. Mean duration values are
represented by ♦.
Graph 4: Long/short durational overlap in 1-, 2- and 3-syllable items
Thus, despite the (misleading) fact that the differences in mean durations between long
vs. short vowels remain constant for all three groups of target items (cf. the distances
between ♦s in each V/V: pair), the durational overlap between long and short vowels in
monosyllables indicates that the phonemic contrast is, at least to some extent, suspended
in this particular context. Bearing in mind the doubtful perceptual value of the long-short
V1 contrast in polysyllables and a fair amount of durational long-short overlap in
monosyllables, we have to conclude that in general the phonemic contrast, at least in the
dialect of English investigated in this study, is qualitative rather than quantitative in
nature. What follows is that the perception of phonemic length contrast and the
production of phonemically conditioned differences in vowel durations may be two
different phenomena. While the distinctions do have their articulatory manifestation,
their perception (because they fall below the just noticeable difference) is based
on quality rather than quantity.
6. Conclusions
The present findings may be summarised as follows. Duration alone is not an
independent stress correlate. It is rather a derivative of other correlates (pitch in
particular). Stressed vowels may be longer than unstressed ones. Their durational
superiority, however, is not stress-related but follows mainly from vowel-intrinsic
durational characteristics. The operation of PFC obliterates the durational contrasts.
Phonemic contrast is qualitative rather than quantitative. In monosyllables there is a high
degree of durational overlap between phonemically long and short vowels (which is
enforced by a greater pitch excursion), whereas in polysyllables the differences do exist
but are perceptually non-salient.
Bolinger, Dwight. 1958. A theory of pitch accent in English. Word 14: 109-149.
Cutler, Anne, Delphine Dahan and Wilma van Donselaar. 1997. Prosody in the
comprehension of spoken language: a literature review. Language and Speech 40:
Fry, Denis B. 1955. Duration and intensity as acoustic correlates of linguistic stress.
JASA 27: 765-768.
Fry, Denis B. 1958. Experiments in the perception of stress. Language and Speech 1:
Gussenhoven, Carlos. 2007. A vowel height split explained: Compensatory listening and
speaker control. In J. Cole and J. I. Hualde (eds.) Laboratory Phonology 9. Mouton
de Gruyter: 145-172.
Kim, Heejin and Jennifer Cole. 2005. The stress foot as a unit of planned timing:
evidence from shortening in the prosodic phrase. Proceedings of Interspeech 2005,
Lisbon, Portugal: 2365-2368.
Kingston, John and Randy Diehl. 1994. Phonetic knowledge. Language 70(3): 419-454.
Klatt, Dennis H. 1976. Linguistic uses of segmental duration in English: Acoustic and
perceptual evidence. Journal of the Acoustical Society of America 59: 1208-1221.
Lehiste, Ilse. 1970. Suprasegmentals. Cambridge, MA: The MIT Press.
Lieberman, Philip. 1960. Some acoustic correlates of word stress in American English.
JASA 32: 451-454.
Marslen-Wilson, William D. and Loraine K. Tyler. 1980. The temporal structure of
spoken language understanding. Cognition 8: 1-71.
Morton, John and Wiktor Jassem. 1965. Acoustic correlates of stress. Language and
Speech 8: 159-181.
Research in Language, 2012, vol. 10.2
DOI 10.2478/v10015-011-0029-6
University of Regensburg
The current study presents acoustic analyses of non-high back vowels and low central
vowels in the lexical sets LOT, THOUGHT, STRUT, PALM and BATH as pronounced
by German learners of English. The main objective is to show that learners of English at
university level are highly inconsistent in approximating the vowels of their self-chosen
target accents British English (BrE) and American English (AmE). To that end, the
acoustic qualities of the English vowels of learners are compared to their native German
vowels and to the vowels of native speakers of BrE and AmE. In order to facilitate
statements about the effect of increased experience, the study differentiates between
students in their first year at university and in their third year or later. The results obtained
are highly variable: In some cases the learners transfer their L1 vowels to English, other
cases show clear approximations to the target vowels, while other cases again document
the production of new vowels found neither in German nor in English. However, close
approximation to the target vowels only sometimes correlates with higher proficiency.
This might be an indicator of a low level of awareness of systematic differences between
the BrE and AmE vowel systems. But the data also indicate that the more advanced
learners produce more distinct AmE BATH vowels and BrE THOUGHT vowels than the
less advanced learners, which points to a partial increase of awareness resulting from
increased experience. All in all it seems that raising the awareness of differences between
target accents in L2 instruction is necessary if the envisaged goal is for learners to reach
near-native pronunciation.
1. Introduction
In varieties of English around the world, words of the lexical sets LOT, THOUGHT, and
BATH are pronounced in different ways. This leads to different degrees of overlap with
the lexical sets PALM and STRUT.
In the two major varieties of English which German learners aim at, namely BrE and
AmE, these differences manifest themselves as shown in table 1.
1 I would like to thank one anonymous reviewer for helpful comments on this paper. All
remaining shortcomings are my own responsibility.
Table 1: Examples for BrE and AmE pronunciation of the lexical sets LOT, THOUGHT,
The LOT and THOUGHT vowels are less rounded in AmE than in BrE and can be
variable in quantity, as indicated by "()" in table 1 (cf. Wells 1982: 120, 122, 124, 476).
Many native speakers of AmE merge THOUGHT and LOT either to [()] or to [()]
(cf. Wells 1982: 473-476). A short and low AmE pronunciation of LOT / THOUGHT is
very similar to STRUT. The BATH vowel matches with PALM in BrE, with TRAP in
AmE. Similar to LOT and THOUGHT, AmE PALM can be comparatively short (cf.
Wells 1982: 118-124).
It could be hypothesized that for learners of English who are not aware of these
differences between and variability within varieties of English and perceive the language
they are learning as a monolithic whole, these multiple pronunciations of mid and low
back vowels and low central vowels are likely to be interpreted as a highly variable model. As a
result, learners are inconsistent in targeting their self-chosen accent, making use of a
plethora of vowels from different models.2
Along these lines, the present paper studies German learners of English at university
level, mostly students in teacher training. It sets out to describe and interpret their
inconsistencies in the production of vowels in the lexical sets LOT, THOUGHT,
STRUT, PALM and BATH with respect to the learners' self-chosen target accents, in
this case either BrE or AmE. Such an interpretation needs to take into account the
notions of interlanguage, L1 transfer, similarity, and awareness.
In the case of the English vowels in LOT, THOUGHT, STRUT, PALM and BATH
as pronounced by German learners of English, similarity to the German vowels
SOCKEN, BOTEN, HATTEN, and BATEN might be expected to lead to transfer,
especially since these vowels receive little attention in formal instruction.
However, the acoustical analyses presented here show that in many cases learners use
sounds different from both L1 and L2, which can be seen as an empirical manifestation
of interlanguage. The learners' inconsistencies, therefore, cannot be
pinned down to transfer alone; they are also likely to stem from a lack of awareness of the highly
heterogeneous nature of the input around them.
2 Even if this might be, according to the anonymous reviewer, an "unwarranted assumption", it is a
reasonable one. Unfortunately, no previous studies supporting this claim could be discovered.
Transfer, similarity or lack of awareness?
2. English and German non-high back and low central vowels
Figure 1: The vowels of English (RP) and German (taken from Kortmann 2005:182)
Figure 1 provides a contrastive overview of the vowel systems of English (RP) and
German. The vowels to be dealt with in the present study, viz. the non-high back vowels
and the low central vowels, are highlighted by the box.
Figure 2: Non-high back vowels and low central vowels of English (RP) and German (adapted from
Kortmann 2005:182)
Figure 2 zooms into the relevant area and roughly places lexical sets for German, BrE
and AmE at the traditional articulatory locations of vowels.
The four German vowels are represented by BATEN, HATTEN, BOTEN and
SOCKEN. The BATEN and HATTEN vowels are long and short low central vowels,
respectively, with HATTEN being slightly fronter than BATEN. BOTEN and SOCKEN
are long and short mid back vowels, respectively, with BOTEN being considerably
closer than SOCKEN.
American and British STRUT and PALM are close to German HATTEN and
BATEN. British THOUGHT is close to German BOTEN; the American versions are
more open and less rounded, and can be so open as to match PALM. British LOT is,
from an articulatory perspective, the rounded counterpart of [ɑ]; American LOT is less
rounded and its variants can be very similar to those of THOUGHT (LOT-THOUGHT
merger, cf. Wells 1982: 473-476).
In BATH, BrE uses the same vowel as in PALM, while the AmE BATH vowel
equals TRAP and is realized as [æ].
The following section briefly surveys relevant concepts of SLA theory and makes
some predictions of possible problems and routes of transfer in the acquisition of the
vowel systems of BrE and AmE by German learners.
3. SLA theory: L1 transfer, similarity and awareness
On the basis of the differences between German and English vowel space mentioned
above, the present section will briefly discuss the notion of interlanguage in connection
with L1 transfer in L2 phonological acquisition and suggest that the outcome of L2
phonological acquisition is very likely to be connected to the level of awareness learners
have of the details of the sound system of their target accent.
Interlanguage as introduced by Selinker (1972) entails the widely accepted notion
that learners when acquiring a second or foreign language "create a language system",
which is not seen as a "deficit system [...] but as a system of its own with its own
structures" (Gass and Selinker 2008: 14). The elements of the interlanguage are either
from the learner's L1 or from the L2. In addition there are so-called "new forms",
elements that belong neither to the L1 nor to the L2 (ibid.).
The process responsible for L1 elements being present in the L2 is L1 transfer.
Especially in L2 phonological acquisition L1 transfer is, despite its behaviourist roots, a
well accepted concept (cf. Major 2008 for a detailed discussion).
What often goes hand in hand with L1 transfer is the notion of cross-linguistic
similarity in that it addresses "the question which phenomena are more susceptible to
transfer and which are not" (Major 2008: 71). Here most researchers agree that "[t]he
more similar the phenomena the more likely transfer will operate; however, what
constitutes similar is not always clear-cut" and "a more rigorous and universally agreed
upon definition of similarity would seem necessary" (Major 2008: 74).
Along these lines Bohn (2002) states that "[a]rmchair methods and acoustic and
articulatory comparisons can, at best, serve as a starting point" (Bohn 2002:209) and
according to Strange (2007) "[c]ross linguistic similarity is difficult to measure without
perception data" (Strange 2007: 45). In other words, the only reliable way to define
similarity is through perception experiments (cf. Strange 2007, Bohn 2002, Strange and
Shafer 2008). 3
In this vein the acoustic data presented here will serve as a starting point for the
description of L2 phonological acquisition of learners faced with more than one model.
But they will also serve to support the claim that similarity is a highly relative concept.
Equivalence classifications of sounds on the basis of assumed similarities are
subconscious processes of which learners are not aware. It seems to be necessary to see
similarity in Major's (2008: 75) terms as slowing down acquisition, but to different
extents and on an individual basis.
Two examples from L1 German L2 English learners will serve to illustrate this. It
has been shown elsewhere (Kautzsch 2010a) that in the case of the English mid and low
front vowels [ɛ] and [æ] in bed and bad (i.e. in the lexical sets DRESS and TRAP)
German has only one short vowel counterpart, [ɛ] as in BETTEN, which is then, due to
equivalence classification, used in both English contexts. The distinction between [ɛ]
and [æ] develops quite late in German learners. This is very likely due to the fact that
this distinction does not feature prominently in German ESL/EFL classrooms, resulting
in a low level of awareness of this difference. 4
In the case of the dental fricatives /θ/ and /ð/, which due to their absence in German
could be seen as dissimilar and therefore should be easier to acquire, the relativity of
similarity becomes even more apparent. The success rate of German learners here is
much higher, although there remain a considerable number of learners who do not
manage to acquire these sounds and use the alveolar fricatives [s] and [z] instead. From
a similarity perspective this means that for those learners who succeed in the acquisition,
the dental fricatives are dissimilar enough from German sounds as not to be classified as
equivalent. For those who fail, a perceived similarity with alveolar fricatives persists.
Again, the different performances of these learners seem to be connected to awareness.
As soon as one is aware of two sounds being different, they become more dissimilar and
are thus acquired faster. And since much emphasis is placed on the dental fricatives in
ELT in Germany, a higher level of awareness is created and the approximation to the
target sound is on the whole more successful.
The notion that learners can be made aware of similar phenomena is not new in SLA.
It is inherent, for example, in "Focus on Form" (presented for example in ch. 11.5 in
Gass and Selinker 2008), or in the "Noticing Hypothesis" (developed by Schmidt in a
series of articles on attention and awareness: Schmidt 1990, 1994, 1995, 2001, 2010).
Making learners aware of certain structures (or in our case sounds) seems especially
applicable in the classroom, less so in natural, immersive settings (cf. Krashen 1985,
Gass and Selinker 2008).
As far as the vowels under scrutiny in the current study are concerned, they receive
little attention in the ESL/EFL classroom of German learners of English. And when
3 For three popular models which incorporate similarity as a central concept see the Speech
Learning Model (Flege 1995), the Perceptual Assimilation Model (Best and Strange 1992; Best 1994,
1995) and the Native Language Magnet model (Kuhl 1991, 1993).
4 A similar situation is reported upon in Kautzsch (2010b) where German learners of English are
very inconsistent in their realizations of non-prevocalic r when aiming at BrE or AmE.
Alexander Kautzsch
considering Schmidt and Frota's (1986) claim that "a second language learner will begin
to acquire the target like form if and only if it is present in comprehended input and
'noticed' in the normal sense of the word, that is consciously", it must be assumed that
German learners will have difficulties in acquiring the vowels in LOT, THOUGHT,
BATH, STRUT, and PALM; transfer will possibly be at work to some extent.
3.2 Predictions for German learners of English
Based on the cross-linguistic analysis above, the present section will make some
predictions for the acquisition of a BrE and an AmE vowel system by German learners.
For German learners of English aiming at BrE the German and English non-high
back vowels and low mid central vowels are similar in their articulatory properties and in
their relative positions, i.e. the German system has the same contrasts as the BrE system,
namely a pair of short and long mid back vowels, and a pair of short and long low
central vowels. Thus it would be easy for German learners aiming at BrE to apply
their native HATTEN, BATEN, BOTEN and SOCKEN vowels to STRUT, PALM, THOUGHT, and LOT, respectively. In other words, L1 transfer can be
expected, but at the same time few inconsistencies will arise since the two systems
contain the same distinctions.
For students aiming at AmE there are several options to utilise their German vowels
in English. HATTEN and BATEN may be matched with STRUT and PALM, but
BATEN might also be used in THOUGHT and LOT, if pronounced as a very open
vowel [()]. Alternatively, when THOUGHT and LOT are pronounced as [()], the
SOCKEN or BOTEN vowel is likely to occur. However, BOTEN being a rounded close
mid vowel, it is also possible that it is not employed at all. SOCKEN, on the other hand,
may turn out to be too short to be used in LOT and THOUGHT. Thus, if LOT and
THOUGHT are not pronounced similar to [()], learners need to acquire a new sound.
The same applies to the BATH vowel, which needs to be matched with TRAP and
pronounced as [æ], a new sound that does not belong to the German vowel inventory.
In sum, it seems that the acquisition of the AmE system is more inconsistency-prone
than the acquisition of the BrE system.
4. Data
The learners analyzed in this study are 20 students of English from the University of
Regensburg. All have been chosen on the basis of a stable L1 background, i.e. they were
born and raised in two adjacent regions of Bavaria, the south-easternmost federal state
of Germany: the Upper Palatinate (Oberpfalz) and Lower Bavaria (Niederbayern).
10 students each have AmE and BrE as their self-chosen target accent. Each of the
target accent groups contains two proficiency levels: 5 learners each are "beginners"
(Beg), i.e. students of English in their first year at university, and "advanced" students
(Adv), i.e. learners in their 3rd year or later. What matches proficiency in this sample is
the students' average time spent abroad in months: the beginners have spent 0.8 months
in an English-speaking country, while the advanced students have been abroad for 8.3
months.
The analyses below will provide insights into how successful German learners of
English are in approximating their self-chosen target accent and if they become more
successful as proficiency and time spent abroad increase.
5. Method
The acoustical analysis to follow (section 6) will present the learners' English and
German vowels and contrast them to BrE and AmE native speaker control groups.
The learners' vowels were elicited by means of reading two word lists, the one
consisting of all English monophthongs, two from each lexical set, the other containing
all German monophthongs. The present study picks out non-high back vowels and low
central vowels as represented by the words below, the whole database thus totalling 200
English and 80 German monophthongs:
body, cot (LOT)
bawd, caught (THOUGHT)
bud, cut (STRUT)
father, palm (PALM)
bath, dance (BATH)
Socken (SOCKEN)
Boten (BOTEN)
hatten (HATTEN)
baten (BATEN)
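The database totals stated above follow directly from the word lists and the number of speakers. The sketch below checks this arithmetic, assuming (the text does not say so explicitly) that each of the 20 speakers produced each word exactly once; the variable names are illustrative only.

```python
# Word lists as given in the text, keyed by lexical set.
english_words = {
    "LOT": ["body", "cot"],
    "THOUGHT": ["bawd", "caught"],
    "STRUT": ["bud", "cut"],
    "PALM": ["father", "palm"],
    "BATH": ["bath", "dance"],
}
german_words = {
    "SOCKEN": ["Socken"],
    "BOTEN": ["Boten"],
    "HATTEN": ["hatten"],
    "BATEN": ["baten"],
}

n_speakers = 20  # number of learners reported in section 4

# Assumption: one token per word per speaker.
english_tokens = n_speakers * sum(len(words) for words in english_words.values())
german_tokens = n_speakers * sum(len(words) for words in german_words.values())

print(english_tokens, german_tokens)  # 200 80
```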
The wordlists elicit the speakers' most monitored style and therefore provide access to
their idealized targets. The recordings were made in a quiet office setting. Vowels were
measured at the centre using Praat (Boersma and Weenink no date) and plotted by means
of Kendall and Thomas's (2010) "vowels" package for "R" (The R Project for Statistical
Computing no date), applying the auditory-based Bark measure5 for normalization to even
out individual differences across speakers.
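As an illustration of the first step of such a normalization, the sketch below converts formant values from Hertz to the Bark scale. It assumes the conversion formula commonly attributed to Traunmüller, Z = 26.81 / (1 + 1960 / F) - 0.53; whether the plotting package applies exactly this formula, and what further steps it performs on the resulting Z values, is not stated in the text. The formant values are hypothetical.

```python
def hz_to_bark(f_hz):
    """Convert a frequency in Hz to Bark (formula attributed to Traunmüller).

    Assumption: Z = 26.81 / (1 + 1960 / F) - 0.53.
    """
    if f_hz <= 0:
        raise ValueError("frequency must be positive")
    return 26.81 / (1.0 + 1960.0 / f_hz) - 0.53


# Hypothetical formant measurements (Hz) for a single vowel token.
formants_hz = {"F1": 700.0, "F2": 1200.0}
formants_bark = {name: hz_to_bark(value) for name, value in formants_hz.items()}
print(formants_bark)
```

Because the Bark scale compresses higher frequencies, equal distances on it correspond more closely to equal perceptual distances than distances in Hertz do.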
For the comparison with native speaker data, formant values published in two
previous studies are utilised: The AmE vowels are taken from Hillenbrand et al. (1995),
who analyse 45 men and 48 women6 from Michigan (Great Lakes / Midland). The BrE
vowels are taken from Deterding (1990, 1997), who provides the vowels of 8 men and 8
women from the South of England. In both studies, participants read lists of words in the
context h_d7.
5 For further details on Bark normalization and the resulting Z values (cf. figures 3 to 10) see
Thomas and Kendall (2007: "Methods") and Traunmüller (1997).
6 Hillenbrand et al. (1995) also measure the vowels of children, but the present analysis only
adopts the vowels of adults.
7 Some scholars call for stable phonetic contexts when analysing vowels, because of variable coarticulation effects (cf. e.g. Bohn 2002:199). Others only avoid "tokens following the approximants [w], [j] and [r]" and tokens "before [ŋ] and dark [l], as all these sounds have severe coarticulatory effects on the vowel" (Deterding et al. 2008: 162).
6. Results
The results will be presented by means of two vowel plots for each of the learners' self-chosen target varieties, for BrE in 6.1, for AmE in 6.2. The first plot in each section
contains the average locations of the vowels as produced by the beginners and the
advanced students to document differences between the two proficiency groups. In addition
these plots provide the average locations of the native speakers' vowels to illustrate the
learners' degree of approximation to their target.
The second plot in each section adds the average locations of the learners' German
vowels in order to obtain a visual idea of the degree of L1 transfer taking place, i.e. to
show to what extent German learners use their native vowels in English.
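The plots in this section are read visually. One way such a comparison could be quantified, offered here only as a hypothetical sketch and not as the procedure used in this study, is the Euclidean distance between average vowel locations in the normalized F1/F2 space. All values below are invented for illustration.

```python
import math


def mean_point(tokens):
    """Average (F1, F2) location of a list of (F1, F2) tokens."""
    n = len(tokens)
    return (sum(t[0] for t in tokens) / n, sum(t[1] for t in tokens) / n)


def distance(p, q):
    """Euclidean distance between two points in the F1/F2 plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


# Hypothetical normalized values (not from the study).
learner_tokens = [(6.0, 10.0), (7.0, 12.0)]  # a learner group's L2 vowel tokens
native_mean = (6.5, 8.0)    # average location of the native target vowel
german_mean = (6.5, 11.5)   # average location of the closest L1 vowel

learner_mean = mean_point(learner_tokens)
print(distance(learner_mean, native_mean))  # 3.0
print(distance(learner_mean, german_mean))  # 0.5
```

In this invented case the learners' vowel lies closer to the L1 vowel than to the target, which is the kind of pattern that would be read as L1 transfer in the plots.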
6.1 British English Target
Figure 3 shows the results for beginners and advanced learners aiming at BrE in
comparison to BrE native speakers. Advanced students are closer to native THOUGHT
than the beginners (circle 1 in Figure 3). The LOT vowels (circle 2) are very close to
native LOT for both groups. With the lexical sets STRUT, BATH and PALM, beginners
are closer to native vowels (circle 3), while advanced students display a stronger –
somewhat unnecessary – differentiation between these vowels (circle 4).
Figure 3: The BrE non-high back vowels and low central vowels of German beginners and
advanced learners and of native speakers of BrE
Adding the German vowels to the plot (Figure 4) results in the following picture:
German SOCKEN (circle 1) is not used for LOT (circle 2), German BOTEN is very
close to native THOUGHT but the learners produce different vowels (circle 3), the pronunciation of
STRUT and PALM is close to German HATTEN and BATEN for advanced students
(circle 4), while the beginners' pronunciation of STRUT/PALM/BATH is closer to that
of native speakers (circle 5). The advanced group's BATH vowel is considerably higher
(circle 6).
Figure 4: The BrE and German non-high back vowels and low central vowels of German
beginners and advanced learners, and the BrE non-high back vowels and low central vowels
of native speakers of BrE
Summing up, German high proficiency learners acquire the LOT vowel as a close
approximation to native LOT and do not transfer German SOCKEN. Although German
BOTEN is close to BrE THOUGHT, German learners use a different vowel, which is
more open on average. In the case of PALM, BATH and STRUT, beginners are very
close to the BrE target vowels, while advanced students' PALM and STRUT are closer
to their native BATEN and HATTEN.
Thus, the predictions that German high proficiency learners aiming at BrE use their
native vowels BATEN, HATTEN, BOTEN and SOCKEN in English cannot be
confirmed; in other words, the expected L1 transfer takes place to a limited extent only.
In addition, the increased experience of advanced students as opposed to beginners does
not increase their approximation to target vowel sounds; in fact, the beginners are closer
to the target vowels in the case of PALM, BATH and STRUT.
6.2 American English Target
The results for German learners' non-high back and low central vowels with respect to an
AmE target and in comparison to native speakers of AmE are shown in figure 5.
Both the beginners and the advanced students produce a close approximation to
native THOUGHT and LOT, with the beginners being even closer (circles 1 and 2 in
figure 5). The learners' BATH vowel is very different from native BATH; here the
advanced students are closer to native BATH but still at considerable distance (circle 3).
Moreover, German learners produce different vowels for LOT and PALM (circle 4).
Finally, the learners' STRUT vowels are considerably lower than native STRUT (circle
5), with the beginners being close to native LOT. This mismatch between native and
non-native STRUT, however, needs to be interpreted with caution. It cannot be seen as a
failure to approximate a native target on the side of the learners. It rather results from the
control group's origin in the Great Lakes region in the US. This is the area which is
likely to have been in the initial stage of the Northern Cities Vowel Shift (cf. Labov et al.
2006: 187-208) at the time of recording and thus the speakers' pronunciation of STRUT
does not represent a familiar target for learners. What this also illustrates is the
theoretically and practically challenging situation of multiple and heterogeneous target
Figure 5: The AmE non-high back vowels and low central vowels of German beginners and
advanced learners and of native speakers of AmE
Adding the German vowels to the plot (figure 6) once more gives some insight into
possible L1 transfer. BOTEN and SOCKEN are not used in the learners' English (circles
1 and 2). The learners' LOT vowels, as well as native LOT, are similar to
BATEN/HATTEN (circle 3). THOUGHT, on the other hand, is a new native-like sound
(circle 4), while BATH (circle 5), PALM (circle 6) and STRUT (circle 7) are new
sounds which are neither German nor AmE.
Figure 6: The AmE and German non-high back vowels and low central vowels of German
beginners and advanced learners, and the AmE non-high back vowels and low central vowels
of native speakers of AmE
Summing up, the only AmE vowel of German learners in which some degree of L1
transfer can be witnessed is the LOT vowel. It is close to German BATEN/HATTEN
and learners make use of this proximity.
With AmE THOUGHT all learners use a vowel close to the target and different from
German vowels, whereas in the cases of AmE BATH, PALM, and STRUT all learners
use sounds different from German and AmE. In addition both learner groups maintain an
(unnecessary) distinction of PALM and LOT. German BOTEN and SOCKEN, on the
other hand, are not transferred. Similar to the learners aiming at BrE, increased
experience on the side of the advanced students does not increase their approximation to
the target.
6.3 Individual variation
In addition to the average locations of non-high back vowels and low central vowels as
presented above (6.1 and 6.2), this section shows four vowel plots to illustrate variation
across speakers. The plots are again grouped by target accent and each accent group has
one plot for beginners and one for advanced students. The ellipses around the mean
values mark the acoustical ranges of the respective vowels.
Starting with the results for the BrE group, the beginners' vowels (figure 7) overlap
to different extents than the advanced students' vowels (figure 8).
With the beginners, larger areas of STRUT and BATH overlap, PALM almost
completely covers the area of STRUT, BATH and LOT, which results from some
mispronunciations of PALM as []. Both BATH and STRUT overlap slightly with
LOT, and so does THOUGHT with LOT.
Figure 7: Beginners' individual variation in the pronunciation of BrE non-high back vowels
and low central vowels.
The plot for the advanced students again shows the clearer distinction between PALM,
STRUT and BATH mentioned above (6.1, figures 3 and 4). As a consequence, a wider
area of vowel space is covered. This, however, does not result in a clear distinction
between these vowels but rather leads to multiple overlaps of STRUT, PALM, BATH,
and LOT. A noticeable difference between the beginners and the advanced students can
be observed with respect to THOUGHT, which is almost completely distinct from LOT.
Figure 8: Advanced students' individual variation in the pronunciation of BrE non-high
back vowels and low central vowels
Individual variation in the AmE target groups is shown in figures 9 (beginners) and 10
(advanced students). THOUGHT is almost fully distinct in both groups, overlapping to
small extents with LOT in the advanced group and with PALM in both groups, the latter
again being due to mispronunciations of PALM. In addition, both groups share a
considerable overlap of LOT, STRUT and PALM. The evident difference between
beginners and advanced students is that advanced students have a fully fronted version
of BATH, fully distinct from LOT, STRUT, and PALM, whereas beginners' BATH
strongly overlaps with these vowels.
Figure 9: Beginners' individual variation in the pronunciation of AmE non-high back vowels
and low central vowels
Figure 10: Advanced students' individual variation in the pronunciation of AmE non-high
back vowels and low central vowels
In sum, this section has shown that the pronunciation of the vowels under scrutiny by
German learners varies to a considerable extent, indicating that it seems difficult for
learners to acquire a consistent and contrastive system, even after more than 10 years of
instruction and some time spent abroad. In two cases, however, a differentiation of the
vowel systems could be documented with the advanced students: THOUGHT becomes
more distinct from LOT in the BrE system and BATH from STRUT/PALM/LOT in the
AmE system. The overlaps and differentiations in the vowel systems under observation
point to the fact that learners do make some progress in approximating a self-reported
target accent with some vowels as proficiency increases, but fail to do so with others. A
likely explanation might again be awareness. It is easy to picture that with greater
experience in the foreign language, learners perceive a fronted version of BATH and a
rounded and closer version of THOUGHT as symbols of AmE and BrE, respectively,
and start to use these variants. Other characteristics, however, seem to go unnoticed.
7. Summary and Conclusions
German learners of English at university level has yielded the following overarching
1. When targeting AmE or BrE non-high back vowels and low central vowels,
German learners of English at university level make only little use of their
native vowel systems. In other words, they are beyond a stage of strong L1 transfer.
2. The learners produce new vowels which are neither native German nor native
English, which is a clear support for the reality of interlanguage as a system
that, among other things, also contains "elements [...] that do not have their
origin in either the NL or the TL" (Gass and Selinker 2008: 14).
3. Increasing experience is reflected in a closer approximation to the target in only
two cases: BATH is more front in advanced learners aiming at AmE
and THOUGHT is closer in advanced learners aiming at BrE. This might be due
to an increased level of awareness of these vowels as a result of increases
4. In general, however, experienced learners are not more native-like than less
experienced learners with respect to the vowels under discussion after more
than 10 years of learning.
All in all, the data presented here indicate that the learners of English analysed have not
fully acquired an L2 sound system. Having demonstrated that only two very salient
vowels start to be acquired at an advanced stage of proficiency, it seems that near-native
pronunciation can only be acquired or learned, if at all, with attention to and awareness
of the variability of the input.⁸ Even experienced learners have no full awareness of the
systematic differences between the two major accents of English. If near-nativeness in
pronunciation is the envisaged goal of language learning, it is necessary to integrate
awareness of varieties of English into the ESL/EFL classroom, and especially in teacher
training at universities.
Best, C. and W. Strange. 1992. "Effects of phonological and phonetic factors on cross-language perception of approximants". Journal of Phonetics 20: 305-331.
Best, C. 1994. "The emergence of native-language phonological influence in infants: A
perceptual assimilation model." In H. Nusbaum, J. Goodman and C. Howard, eds.
The Transition from Speech Sounds to Spoken Words: The Development of Speech
Perception. Cambridge, Mass.: MIT Press, 167-224.
Best, C. 1995. "A direct realist view of cross-language speech perception." In Strange,
ed. 1995: 171-204.
Boersma, P. and D. Weenink. No date. "Praat: Doing phonetics by computer".
[www.praat.org; accessed 31/06/2011] .
Bohn, O.-S. 2002. "On phonetic similarity". In P. Burmeister, T. Piske and A. Rohde,
eds. An Integrated View of Language Development. Trier: WVT, Wissenschaftlicher
Verlag Trier, 191–216.
Deterding, D. 1990. "Speaker Normalization for Automatic Speech Recognition".
Ph.D.Thesis, Cambridge University.
⁸ I fully agree with the anonymous reviewer that "[j]ust because less salient vowel qualities
weren’t learned it doesn’t follow that near-native pronunciation would be possible even with attention to and awareness of the variability of the input". As usual in science, future research will
shed more light on this issue.
Deterding, D. 1997. "The Formants of Monophthong Vowels in Standard Southern
British English Pronunciation". Journal of the International Phonetic Association 27:
Deterding, David, Jennie Wong and Andy Kirkpatrick. 2008. "The pronunciation of
Hong Kong English". English World-Wide 29(2): 148-175.
Flege, J. E. 1995. "Second language speech learning: Theory, findings, and problems".
In Strange, ed. 1995: 233–277.
Gass, S. M. and Selinker, L. 2008. Second Language Acquisition: An Introductory
Course (3rd ed.). New York and London: Routledge.
Hansen Edwards, J.G. and M.L. Zampini, eds. 2008. Phonology and Second Language
Acquisition. Amsterdam and Philadelphia: John Benjamins.
Hillenbrand, J., L. A. Getty, M. J. Clark and K. Wheeler. 1995. "Acoustic characteristics
of American English vowels". Journal of the Acoustical Society of America
Kautzsch, A. 2010a. "Exploring L1 transfer in German Learners of English: High front
vowels, high back vowels and the BED/BAD distinction." Research in Language
(Special Issue: Proceedings of ACCENTS 2009, Lodz) 8: 63-84.
Kautzsch, A. 2010b. Rhoticity in German Learners of English. Paper presented at World
Englishes 2010 in Vancouver, Canada, August 2010.
Kendall, T. and E.R. Thomas. 2010. "Vowels: Vowel Manipulation, Normalization, and
Resource: http://ncslaap.lib.ncsu.edu/tools/norm/ ]
Kortmann, B. 2005. "Chapter V: Contrastive Linguistics: English and German". In B.
Kortmann. 2005. Linguistics: Essentials. Berlin: Cornelsen, 156-191.
Krashen, S. 1985. The Input Hypothesis: Issues and Implications. New York: Longman.
Kuhl, P.K. 1991. "Human adults and human infants exhibit a perceptual magnet effect
for the prototypes of speech sounds, monkeys do not." Perception and Psychophysics
50: 93-107.
Kuhl, P.K. 1993. "Early linguistic experience and phonetic perception: implications for
theories of developmental speech perception." Journal of Phonetics 21: 125-139.
Major, R. C. 2008. "Transfer in second language phonology. A review." In: Hansen
Edwards and Zampini, eds. 2008: 63-94.
Labov, W., S. Ash and C. Boberg. 2006. The Atlas of North American English. Berlin:
Mouton-de Gruyter.
Schmidt, R. 1990. "The role of consciousness in second language learning." Applied
Linguistics, 11, 129-158.
Schmidt, R. 1994. "Implicit learning and the cognitive unconscious: Of artificial
grammars and SLA." In N. Ellis, ed. Implicit and Explicit Learning of Languages.
London: Academic Press, 165-209.
Schmidt, R. 1995. "Consciousness and foreign language learning: A tutorial on the role
of attention and awareness in learning." In R. Schmidt, ed. Attention and Awareness
in Foreign Language Learning. Honolulu, HI: University of Hawaii, Second
Language Teaching & Curriculum Center, 1-63.
Schmidt, R. 2001. "Attention." In P. Robinson, ed. Cognition and Second Language
Instruction. Cambridge: Cambridge University Press, 3-32.
Schmidt, R. 2010. "Attention, awareness, and individual differences in language
learning". In W. M. Chan, S. Chi, K. N. Cin, J. Istanto, M. Nagami, J. W. Sew, T.
Suthiwan and I. Walker. Proceedings of CLaSIC 2010, Singapore. Singapore:
National University of Singapore, Centre for Language Studies, 721-737.
Schmidt, R. and S.N. Frota. 1986. "Developing basic conversational ability in a second
language: A case study of an adult learner of Portuguese." In R. R. Day, ed. Talking
to Learn: Conversation in Second Language Acquisition. Rowley, MA: Newbury
House, 237-326.
Selinker, L. 1972. "Interlanguage". International Review of Applied Linguistics 10, 209–
Strange, W. 2007. "Cross-language phonetic similarity of vowels. Theoretical and
methodological issues." In O.-S. Bohn and M.J. Munro, eds. 2007. Language
Experience in Second Language Speech Learning. In Honor of James Emil Flege.
Amsterdam and Philadelphia: John Benjamins, 35-56.
Strange, W. and V.L. Shafer. 2008. "Speech perception in second language learners: The
re-education of selective perception." In: Hansen Edwards and Zampini, eds. 2008: 153-192.
Strange, W. ed. 1995. Speech Perception and Linguistic Experience: Issues in Cross-Language Research. Baltimore: York Press.
The R Project for Statistical Computing. No date. "R". [www.r-project.org/, accessed
Thomas, E. R. and T. Kendall. 2007. NORM: The vowel normalization and plotting
suite. [http://ncslaap.lib.ncsu.edu/tools/norm/, accessed: 07/06/2012].
Traunmüller, H. 1997. "Auditory scales of frequency representation".
[http://www2.ling.su.se/staff/hartmut/bark.htm, accessed: 07/06/2012].
Wells, J.C. 1982. Accents of English. Cambridge, Cambridge University Press.
Research in Language, 2012, vol. 10.2
DOI 10.2478/v10015-011-0042-9
Maria Curie-Skłodowska University
The paper is a continuation of the author’s earlier studies in which she argues that it is the
mispronunciation of whole words due to their incorrect phonological storage in the
learners’ phonetic memory that is more detrimental to successful communication via
English than an inaccurate production of individual segments and suprasegmentals.
Consequently, phonetically difficult words deserve to be thoroughly investigated and
pedagogically prioritized.
The present study is a report on an experiment in which 20 English Department
students, all advanced learners of English, were recorded having been asked to read a list
of diagnostic sentences containing 80 words known to be problematic for Poles in terms
of their pronunciation. This has been done in order to isolate and examine the major error
types, to establish a hierarchy of difficulty among 8 sources of pronunciation errors, to
compare the obtained results with the most common error types made by intermediate
learners and to juxtapose the participants’ subjective evaluation of the phonetic difficulty
of words with their actual phonetic performance. The final goal is to draw pedagogical
implications for the phonetic training of advanced students of English.
1. Introduction
A striking feature of foreign-accented English, including Polish-English, is the frequent
occurrence of so-called local errors, i.e. words stored in the learners’ phonetic
memory in an incorrect phonological shape. In a number of studies (Szpyra-Kozłowska
2011, Szpyra-Kozłowska and Stasiak 2010, Szpyra-Kozłowska in press) I argue that the
use of such items is more detrimental to successful communication via English than
inaccurately produced segments and suprasegmentals. In Szpyra-Kozłowska (in press) I
present experimental evidence that local errors significantly decrease Polish learners’
comprehensibility and intelligibility, create the impression of a heavy foreign accent and
are irritating for native English listeners. Consequently, Szpyra-Kozłowska and Stasiak
(2010) conclude that a shift is needed in phonetic instruction from the focus on sounds,
sound contrasts and prosodies to the focus on the pronunciation of problematic words.
To achieve this goal, however, a deeper insight is required into what types of items are
phonetically difficult for learners of different L1 backgrounds and various levels of
language proficiency. Szpyra-Kozłowska (2011) attempts to examine this issue in
relation to intermediate Polish learners and identifies eight major sources of word
pronunciation errors.
The present paper addresses the problem of mispronounced words in the speech of
advanced Polish learners of English. It is a report on an experiment in which 20 English
Department students of Maria Curie-Skłodowska University in Lublin, Poland, were
recorded having been asked to read a list of diagnostic sentences containing 80 words
known to be problematic for Poles in terms of their pronunciation. This has been done in
order:
- to isolate and examine the major types of phonetically difficult words;
- to establish a hierarchy of difficulty among 8 chief sources of pronunciation
errors in the speech of advanced learners;
- to compare the obtained results with those of intermediate learners;
- to compare the experimental results with the predictions of the Phonetic Difficulty
Index (PDI);
- to juxtapose the participants’ subjective evaluation of the phonetic difficulty of
words with their actual phonetic performance;
- to draw pedagogical implications for the phonetic training of advanced students
of English.
It is hoped that although the study is carried out in the Polish context, many of the
observations made here will be relevant for other types of foreign-accented English.
2. Sources of word mispronunciations
Many sources of word pronunciation errors commonly made by Polish learners are well-known and have been identified by previous research.
In this context Sobkowiak’s (1999) work on the Phonetic Difficulty Index (PDI)
should be pointed out as a valuable attempt to deal systematically with phonetically
difficult words in Polish English. PDI (p. 214) “is a global numerical measure of the
phonetic difficulty of the given English lexical item for Polish learners,” meant to be
included in machine-readable EFL dictionaries and thus having mainly lexicographic
applications. It contains phonetic difficulty ratings of English words carried out by the
author on the basis of his observations of Polish learners’ pronunciation problems. The
current list of error sources (personal communication) includes 61 issues which can,
however, be grouped into more general categories. Thus, the largest set (26) concerns
spelling-related problems while the next largest group (24) involves problems with the
pronunciation of individual sounds and combinations of sounds (e.g. vowel hiatus,
consonant clusters). A prominent position is also occupied by stress-related problems
(5). The remaining error sources concern the incorrect application of Polish phonological
rules (such as Word-Final Obstruent Devoicing and Voice Assimilation) to English,
word length (more than 5 syllables) and several others.
It should be added that Sobkowiak’s list and his PDI are of a general nature and do not
specify the relationship between the degree of words’ phonetic difficulty and the
learners’ level of English proficiency. This is a serious drawback of this proposal since
what is difficult for beginners might be fairly trivial for more advanced students. In other
words, the phonetic difficulty of English words should be examined in relation to the
learners’ proficiency level.
Taking this fact into account, in a recent study (Szpyra-Kozłowska 2011) I examined the
sources of phonetic difficulty of English words in intermediate learners’ speech. The
following eight major types of issues have been isolated:
1. Spelling-related problems.
2. Phonetic ‘false friends.’
3. Stress-related problems.
4. Pronunciation of consonant clusters involving interdentals.
5. Pronunciation of long words.
6. Pronunciation of words containing several liquids.
7. Pronunciation of words containing sequences of high front vowels.
8. Pronunciation of words with morphological alternations in related forms.
Since the types of problems listed above will be subject to experimental verification,
some explanation of these issues is in order.
Spelling-related difficulties result from two kinds of interference. The first one
involves interference from Polish spelling-to-pronunciation conventions incorrectly
applied to English words. Thus, typical examples comprise silent letters pronounced in
such items as <t> in nestle and <b> in tomb. Another problem stems from incorrect
overgeneralizations of English letter-to-sound rules, for example, interpreting the
digraph <ea> as the vowel [i:] in steak (as in meat, leaf, teach) or <ace> as [es] in
surface and palace (as in face, lace). As Polish learners have more access to written
than to spoken English, spelling exerts a powerful effect on Polish English.
Phonetic ‘false friends’ are numerous lexical items which occur in both languages in
an identical or a similar orthographic form, but with different pronunciation. In the
majority of cases they are cognates, e.g. E chaos / P chaos, borrowings from English,
e.g. E model / P model or just accidental look-alikes, e.g. E gnat / P gnat ‘bone.’ A large
group of such words are proper nouns which appear in both languages in rather different
phonetic shapes, e.g. Nepal – E [nɪ’po:l] / P [‘nepal] and Sidney – E [‘sɪdnɪ] / P [s’idnej].
Similar or identical spelling suggests to the learners that their pronunciation must be
similar as well.
For speakers of languages with fixed stress, such as Polish, learning the intricacies of
the English stress system with its irregularities and exceptions is a genuine challenge.
Thus, while Polish learners typically apply Polish penultimate stress to English words,
e.g. ‘Japan, in’dustry, demon’strated, they also frequently stress other syllables, i.e.
ultimate, e.g. e’ffort, fe’male and antepenultimate, e.g. ‘successful, ‘computer, ar’bitrary
(for a more detailed discussion of stress errors in Polish English see Waniek-Klimczak
The interdental fricatives, which are absent in Polish, are among the most difficult
sounds for many foreign learners. The degree of difficulty increases when they occur in
combination with other consonants. Intermediate learners who participated in our
previous experiment listed the following difficult words, all of which contain
consonantal clusters with interdentals: three, throw, birthday, maths, healthy, sixth.
The next source of difficulty is the length of words. Longer words are problematic
for the learners because of a variety of factors to be controlled: the placement of stress,
Jolanta Szpyra-Kozłowska
the articulation of many different new sounds and complex sound sequences. The
question that arises concerns the actual length of words which makes their pronunciation
difficult. Below we list some examples, taken from Szpyra-Kozłowska (2011), supplied
by intermediate learners as difficult because of their length:
(a) trisyllables: excitement, adventure, Australia, picturesque
(b) quadrisyllables: relaxation, astonishing, surprisingly
(c) quintisyllables: encyclopaedia, occasionally, exaggeration
According to these data, words marked as problematic because of their length contain
three syllables or more. For intermediate learners the longer a word, the more difficult it
is to pronounce.
One of the most interesting results of our study involving intermediate learners
(Szpyra-Kozłowska 2011) was the discovery that the presence of several liquids, i.e.
rhotics and laterals, contributes to the considerable pronunciation difficulty of a word.
Here are some examples supplied by the participants: appropriate, library, regularly,
particularly, rarely, burglary.
Many such items, apart from articulatory difficulty, are problematic because of their
spelling since <r>, appearing in the word-final and preconsonantal position, is a silent
letter in nonrhotic accents such as RP, generally taught in Poland. Since learners are
often confused as to which r’s to pronounce, many of them attempt to articulate all these
letters, which creates several liquids in a single word.
The collected data also include words regarded as difficult by the respondents due to
the fact that morphological alternations take place in the roots they contain. Since in
English such changes are often highly irregular and idiosyncratic, this fact contributes to
the perceived difficulty of the items in question. Some examples are presented below
with related forms provided in parentheses.
(a) society (social), northern (north), southern (south), anxiety (anxious)
(b) can’t (can), variety (various), breathe (breath), width (wide)
In (a) segments subject to consonant alternations are underlined while in (b) vowel
alternations are indicated. It is likely that pupils first learn the more frequent words given in
parentheses and, when faced with less common related items, transfer the pronunciation
from the former to the latter by analogy or in order to preserve paradigm uniformity. The
degree of difficulty increases due to the fact that in the above forms the alternating segments
are spelt in the same way.
The last category of phonetically difficult words for intermediate Polish learners
comprises items which contain two (a) or three (b) different high front vowels, i.e. [ɪ]
and [i:], e.g.
(a) reading, sleeping, cheating, speedy, greedy, sleepy
(b) believing, receiving, preceding, repeating
Mispronounced Lexical Items in Polish English of Advanced Learners
In such instances learners tend to employ some kind of vowel harmony and pronounce two
[i:] vowels (or rather their shorter and less tense Polish counterpart [i]).
Yet another problem with [ɪ] is created by the following words: innocent, image,
impression, important, industry.
In these items the initial vowel is difficult for Polish learners to pronounce and
is usually replaced with Polish [i]. Apart from the powerful influence of English spelling,
another active factor here seems to be a phonotactic constraint of Polish banning in the
word initial position the occurrence of the Polish front centralized vowel [y], very close
to English [ɪ].
It should be added that, as demonstrated earlier, in many cases more factors than one
contribute to the phonetic difficulty of words. For instance, long words are often
problematic not only due to their length, but also because of the stress placement or
combinations of sounds they contain.
3. Experimental design
In October 2010 twenty randomly selected 4th year students of the English Department
of Maria Curie-Skłodowska University, Lublin, Poland, all advanced learners of English,
took part in the experiment in which they were asked to read aloud a list of sentences
(see Appendix 1) containing 80 phonetically difficult words, with 10 items representing
each of the 8 types of error sources discussed in the preceding section. The students were
then individually recorded. After the recording they were given a short questionnaire to
complete (see Appendix 2). They were provided with a list of 24 words which appeared
in the diagnostic sentences (with 3 items representing each of the 8 categories) and asked
to evaluate the degree of phonetic difficulty they posed for them (easy, medium and
difficult to pronounce). They were also requested to select three particularly difficult
words and comment on the source of the problem. The recordings were next auditorily
assessed by the researcher.
4. Results and discussion
4.1. General results
The experiment yielded 1600 tokens, of which only 655 (41%) were pronounced
correctly and 945 (59%) incorrectly. Graph 1 shows that the results of individual
participants range from 22.6% of correctly pronounced experimental items by the
poorest student to 70% by the best student in this group. The mean result is 36%.
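The token counts and percentages reported above follow directly from the design described in section 3. A minimal sketch of the arithmetic in Python, using only the figures given in the text (the variable names are illustrative):

```python
# Figures taken from sections 3 and 4.1; variable names are illustrative.
participants = 20
categories = 8
items_per_category = 10

items = categories * items_per_category    # 80 diagnostic words
tokens = participants * items               # 1600 recorded tokens

correct = 655
incorrect = tokens - correct                # 945

pct_correct = 100 * correct / tokens        # 40.9375
pct_incorrect = 100 * incorrect / tokens    # 59.0625

print(f"{tokens} tokens: {pct_correct:.1f}% correct, {pct_incorrect:.1f}% incorrect")
```

The pooled rate of 40.9% rounds to the 41% reported above. If every participant contributed all 80 tokens, the pooled rate would also equal the unweighted mean of the individual scores.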
Graph 1. Performance of best and poorest students
The above figures indicate that even for advanced students the experimental items
constitute a serious learning problem which has to be approached and remedied.
4.2. Hierarchy of word pronunciation difficulty factors
The experimental data allowed us to establish the following hierarchy of difficulty of the
8 factors presented in section 2 for advanced students:
Relatively easy types (over 50% of correct responses):
- Clusters of ‘th’ and consonants other than /s/: 70%
- Liquids: 66% of correctly pronounced tokens (with two words being considerably
more difficult, i.e. particularly and regularly, only 7%)
- Stress: 52% (particularly difficult: caricature)
- Long words: 51% (particularly difficult: artificiality, congratulatory,
authoritarian, unintelligibility)
Medium difficulty (25%-40% of correct responses):
- Spelling: 40% (particularly difficult: hideous, haven, thoroughly, Graham)
- Alternating forms: 33.5% (particularly difficult: courteous, advantageous,
managerial, infamous)
Considerable difficulty (below 25% of correct responses):
- Phonetic ‘false friends’: 24% (particularly difficult: algebra, gigantic, Disney)
- Clusters of ‘th’ and /s/: 12% (particularly difficult: strengths, lengths)
- High front vowels: 7.5% (all difficult)
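The banding above can be expressed as a small lookup. The sketch below is in Python and uses the percentages reported in this section. The function name `difficulty_band` is hypothetical, and because the text leaves scores between 40% and 50% unassigned, treating everything from 25% up to 50% as medium difficulty is an assumption made here purely for illustration.

```python
# Percentage of correct responses per factor, as reported in section 4.2.
correct_pct = {
    "th + consonant other than /s/": 70.0,
    "liquids": 66.0,
    "stress": 52.0,
    "long words": 51.0,
    "spelling": 40.0,
    "alternating forms": 33.5,
    "phonetic false friends": 24.0,
    "th + /s/": 12.0,
    "high front vowels": 7.5,
}


def difficulty_band(pct: float) -> str:
    """Map a percentage of correct responses to a difficulty band.

    Assumption: 25-50% counts as medium (the text only names 25-40%).
    """
    if pct > 50:
        return "relatively easy"
    if pct >= 25:
        return "medium difficulty"
    return "considerable difficulty"


# List the factors from most to least difficult.
for factor, pct in sorted(correct_pct.items(), key=lambda kv: kv[1]):
    print(f"{pct:5.1f}%  {difficulty_band(pct):<24} {factor}")
```

Sorting by the percentage of correct responses reproduces the hierarchy in the list, with high front vowels at the difficult end and clusters of ‘th’ with consonants other than /s/ at the easy end.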
The above data require some comment. First of all, within most categories there are
words of differing degrees of pronunciation difficulty for the participants. The set
containing sequences of high front vowels is homogeneous in this respect in that all of
its items proved to be equally problematic. The same is true in the case of clusters of
interdentals and other fricatives. Thus, in these two instances we can talk of truly global
errors, not restricted to any particular lexical items. In the remaining cases there were
both easier and more difficult words, which means that other factors, apart from the ones
discussed here, are also relevant. For example, while the words containing
sequences of liquids do not pose any major difficulty for advanced learners, two items,
i.e. particularly and regularly, are commonly mispronounced by them.
Pedagogical implications of the established hierarchy of word difficulty are obvious.
Advanced students should receive additional training in the pronunciation of words with
sequences of high front vowels, items with clusters of interdentals and /s/ and forms
which are subject to irregular morphological alternations. The next major source of
errors is the existence of phonetic ‘false friends’, whose number runs into hundreds, not
only in Polish but in many other languages as well. We would like to suggest that
in phonetic practice use should be made of such ‘minimal pairs,’ employed in, for
instance, Szpyra-Kozłowska and Sobkowiak (2011). They should include both common
and proper nouns, e.g.
4.3. A comparison of word difficulty for intermediate and advanced learners
A comparison of factors contributing to the difficulty of word pronunciation for
intermediate and advanced learners shows that, statistically, the latter group has learnt to
deal better with the issue of liquids, word stress, long words and spelling-related
problems. Advanced learners also have fewer problems with consonantal clusters
involving interdentals, with the exception of ‘th’ followed by /s/. The most difficult
items turned out to be lengths and strengths, both containing clusters of three
consonants. The issues which are problematic for both groups involve sequences of high
front vowels within single words, phonetic ‘false friends’ and forms displaying irregular
morphological alternations. This means that these difficulties should be given special
attention in the phonetic training of all learners. These observations are summarized
below.
Hierarchy of word pronunciation difficulty factors
Intermediate learners:
- all factors of similar difficulty
Advanced learners:
- most difficult: high front vowels, <th + s>, phonetic ‘false friends’
- less difficult: morphological alternations, spelling-related problems, long words,
stress-related problems, sequences of liquids
4.4. Difficult and easy words
Let us now examine in some detail those words which, among the 80 diagnostic lexical
items, proved to be particularly difficult or easy for the participants. The easiest words to
pronounce, with over 85% of correct realizations, are the following:
(a) rural, literally, burglary, barely
(b) monthly, birthday, hundredth
(c) variety (various), anxiety (anxious), sincerity (sincere)
The examples in (a) contain sequences of liquids while those in (b) clusters in which the
interdentals are combined with nonfricatives. The words in (c) participate in
morphological alternations, as seen in related forms provided in parentheses.
The most problematic items can also be divided into several sets.
(a) cheating, ceiling, greedy, repeating, deceiving
(b) courteous (court), advantageous (advantage), managerial (manager)
(c) algebra (P algebra), caricature (P karykatura)
(d) strengths, lengths
The first and largest set, in (a), contains sequences of high front vowels. The next
one, in (b), involves irregular morphological alternations. The third group, in (c), has
cognates in Polish. Finally, the last one, in (d), comprises clusters of velar nasals
followed by interdentals and /s/.
Moreover, some words might be claimed to cause difficulty because of the complex
relationship between spelling and pronunciation, for instance those with the suffix –ous
added to stems ending in <e>, e.g. courteous, hideous, advantageous.
It should also be noted that some of the most problematic items are fairly long and
contain at least four syllables: advantageous, caricature.
To sum up, the items provided in this section support the observations made earlier
concerning the major sources of word mispronunciations. Yet another question that
naturally arises in connection with the experimental items concerns the relationship
between the phonetic difficulty of these words for the learners and their frequency of
occurrence. We have found no meaningful relationship between these two issues. Thus,
the frequency figures, based on the British National Corpus of Spoken English, are
identical or almost identical for many words belonging to both the difficult and the
easy category, e.g.
Frequency of easy words
hundredth 2
burglary 3
Frequency of difficult words
courteous 2
managerial 1
The problem is that word frequency in the British National Corpus of Spoken English
does not have to be the same as word frequency in foreign students’ English for which
no data are available. This means that the former source is of limited usefulness in
predicting the degree of difficulty involved in word pronunciation.
4.5. Experimental results versus Phonetic Difficulty Index
In this section we examine the accuracy of Sobkowiak’s Phonetic Difficulty Index, with
its 10-point scale, in predicting the degree of difficulty of the experimental items.
It appears that in some cases the PDI values do coincide with the easy/difficult
dichotomy established in this study, e.g.¹
Easy words vs PDI value
sincerity – 0
criticizing – 1
literally – 1
Difficult words vs PDI value
courteous – 8
caricature – 7
advantageous – 7
In the majority of cases, however, no significant correlation between the two evaluations
can be found. Thus, our easy and difficult words are frequently given the same PDI
value, e.g.
Easy words vs PDI value
variety – 2
barely – 2
monthly – 3
Difficult words vs PDI value
haven – 2
ceiling – 2
cheating – 3
In some instances easy words have a higher PDI value than the difficult ones, e.g.
Easy words vs PDI value
rural – 4
burglary – 5
hundredth – 5
Difficult words vs PDI value
strengths – 3
receiving – 2
algebra – 3
¹ I am grateful to W. Sobkowiak for providing me with the PDI values of the experimental items.
The difficulty scale ranges from 0 to 10, where the higher the score, the greater the phonetic
difficulty of words.
An analysis of 24 easy and difficult experimental words and their PDI values shows that
the two evaluations agree in only about 50% of cases. We can conclude that the PDI, in its
present shape, is rather inaccurate as a measure of the phonetic difficulty of words for
the advanced learners who took part in our experiment.
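The comparison between the PDI values and the easy/difficult dichotomy can be made concrete with a short sketch. The Python code below is not part of the original study. It uses only the 18 words and PDI values quoted in this section (not the full set of 24 analysed items) and counts, for every pairing of an easy word with a difficult word, whether the PDI ranks the difficult word higher, equal or lower. The pairwise measure and the name pairwise_comparison are illustrative assumptions, not the procedure behind the 50% figure.

```python
from itertools import product

# PDI values quoted in this section for the listed experimental items.
EASY_PDI = {
    "sincerity": 0, "criticizing": 1, "literally": 1,
    "variety": 2, "barely": 2, "monthly": 3,
    "rural": 4, "burglary": 5, "hundredth": 5,
}
DIFFICULT_PDI = {
    "courteous": 8, "caricature": 7, "advantageous": 7,
    "haven": 2, "ceiling": 2, "cheating": 3,
    "strengths": 3, "receiving": 2, "algebra": 3,
}


def pairwise_comparison(easy, difficult):
    """Count (easy, difficult) word pairs by how the PDI orders them.

    Returns a tuple (higher, ties, lower): the number of pairs in which
    the difficult word has a higher, equal or lower PDI than the easy word.
    This pairwise measure is an illustrative assumption.
    """
    higher = ties = lower = 0
    for easy_value, difficult_value in product(easy.values(), difficult.values()):
        if difficult_value > easy_value:
            higher += 1
        elif difficult_value == easy_value:
            ties += 1
        else:
            lower += 1
    return higher, ties, lower


if __name__ == "__main__":
    higher, ties, lower = pairwise_comparison(EASY_PDI, DIFFICULT_PDI)
    total = higher + ties + lower
    print(f"difficult word has the higher PDI: {higher}/{total}")
    print(f"both words have the same PDI:      {ties}/{total}")
    print(f"easy word has the higher PDI:      {lower}/{total}")
```

For the words listed here, the difficult word receives the higher PDI value in 51 of the 81 pairs, an equal value in 9 and a lower value in 21. An index that separated the two categories perfectly would place all 81 pairs in the first group.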
4.6. Students’ evaluation of word difficulty
In the second part of the experiment the participants were asked to evaluate the degree of
difficulty involved in the pronunciation of 24 experimental items representing eight
types of factors isolated in section 2.
According to the subjects, the following factors make words difficult for them to
pronounce:
- length (e.g. unintelligibility, satisfactorily)
- low frequency of occurrence (e.g. unintelligibility, satisfactorily)
- the presence of <th+s> clusters (e.g. sixths, lengths, maths)
- spelling and specific word endings (such as –eous) (e.g. thoroughly, courteous,
southern, anxiety)
Thus, the most frequently mentioned source of word difficulty was their considerable
length (left unspecified), with two words judged as the most problematic of the tested
items, i.e. unintelligibility and satisfactorily. The low frequency of the same words was
also indicated as a cause of pronunciation problems. Sixth, lengths and maths, all
containing interdental fricatives followed by /s/, were listed as difficult because of such
clusters. Irregular spelling-to-sound correspondences in thoroughly, courteous,
southern and anxiety were blamed for pronunciation problems with these items. The
participants also enumerated some troublesome word endings, e.g. –eous (courteous),
-rily (satisfactorily), -rely (rarely). Interestingly, problems with the correct placement of
stress were not mentioned.
Of the 24 items subject to students’ evaluation, the following five were judged the
most difficult (the figures in parentheses indicate the number of participants who
evaluated these words as such): unintelligibility (15), satisfactorily (10), courteous (5),
thoroughly (5), sixth (5).
It is interesting to examine whether these judgements find confirmation in the
students’ actual performance. Taking into account the 24 items under consideration, the
most difficult words for them to pronounce were as follows (with the percentage of
correct realizations provided in parentheses): courteous (0%), greedy (0%), innocent
(8%), ceiling (8%), thoroughly (16%).
These data show that the participants’ opinions on word difficulty coincide with their
phonetic performance only in the case of two items, i.e. courteous and thoroughly. In the
remaining instances there is no such correlation. Thus, two words claimed to be the most
difficult received the following scores: unintelligibility was pronounced correctly by
30% of the students and satisfactorily by 58%, which makes them items of medium
difficulty. Interestingly, none of the respondents mentioned as problematic the following
words with fairly low scores for correctness: Murphy (20%), lengths (20%),
southerners (25%), Nepal (25%).
The observations reported here indicate that even advanced learners who undergo formal
phonetic training are only partly aware of their pronunciation problems.
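The comparison of the students' judgements with their performance amounts to intersecting two five-item lists. A minimal sketch in Python, using only the figures reported above, is given here. The function name compare_lists and the data layout are illustrative assumptions and not part of the study.

```python
# Number of participants who judged each word to be among the most difficult.
JUDGED_MOST_DIFFICULT = {
    "unintelligibility": 15,
    "satisfactorily": 10,
    "courteous": 5,
    "thoroughly": 5,
    "sixth": 5,
}

# Percentage of correct realizations for the five worst-pronounced items.
LOWEST_SCORES = {
    "courteous": 0,
    "greedy": 0,
    "innocent": 8,
    "ceiling": 8,
    "thoroughly": 16,
}


def compare_lists(judged, performed):
    """Split the items into three alphabetically sorted groups.

    both:           judged difficult and actually among the worst pronounced
    only_judged:    judged difficult but not among the worst pronounced
    only_performed: among the worst pronounced but not judged difficult
    """
    judged_set, performed_set = set(judged), set(performed)
    both = sorted(judged_set & performed_set)
    only_judged = sorted(judged_set - performed_set)
    only_performed = sorted(performed_set - judged_set)
    return both, only_judged, only_performed


if __name__ == "__main__":
    both, only_judged, only_performed = compare_lists(
        JUDGED_MOST_DIFFICULT, LOWEST_SCORES
    )
    print("judged difficult and poorly pronounced:", both)
    print("judged difficult only:", only_judged)
    print("poorly pronounced only:", only_performed)
```

Only courteous and thoroughly appear in both lists, which is the overlap described in the text.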
Let us examine in more detail the relationship between the students’ phonetic
performance and their assessment of word difficulty. We compared the evaluations of the 24
words as easy or difficult to pronounce made by 8 students (4 with good pronunciation and 4
with poor pronunciation) with their actual realizations of these items. We counted
the number of matches and mismatches between the questionnaire answers² and the
participants’ production of a given item. The results are shown below.
                                              good students    poor students
items judged easy and pronounced correctly
items judged easy and mispronounced
It turned out that good students were more accurate in their assessment of word difficulty
than poor students. This means that good students more often consider words as easy
when they can actually pronounce them correctly than poor students who often mark
words as easy and yet mispronounce them.
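The match rule used in this comparison (an item pronounced correctly and marked as easy, or mispronounced and marked as difficult) can be sketched in Python as follows. The sample answers are hypothetical and serve only to show the counting; they are not data from the experiment. How items marked M (medium) were treated is not stated, so the sketch assumes they are left out of the count.

```python
def is_match(judgement, pronounced_correctly):
    """Apply the match rule to one questionnaire answer.

    judgement is 'E' (easy), 'M' (medium) or 'D' (difficult).
    Returns True for a match, False for a mismatch, and None for 'M',
    which the rule does not cover (assumption: such items are skipped).
    """
    if judgement == "E":
        return pronounced_correctly
    if judgement == "D":
        return not pronounced_correctly
    return None


def count_matches(answers):
    """Count matches and mismatches over (judgement, pronounced_correctly) pairs."""
    matches = mismatches = 0
    for judgement, pronounced_correctly in answers:
        result = is_match(judgement, pronounced_correctly)
        if result is None:
            continue  # 'M' answers are excluded under the stated assumption
        if result:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches


if __name__ == "__main__":
    # Hypothetical answers of one student, for illustration only.
    sample_answers = [
        ("E", True),   # judged easy, pronounced correctly: match
        ("E", False),  # judged easy, mispronounced: mismatch
        ("D", False),  # judged difficult, mispronounced: match
        ("M", True),   # medium: not counted
        ("D", True),   # judged difficult, pronounced correctly: mismatch
    ]
    print(count_matches(sample_answers))
```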
5. Conclusions
It is hoped that the presented study has provided some insight into the issue of
phonetically difficult words in the speech of advanced Polish learners. It has allowed us
to make several observations which carry important pedagogical implications.
1. Phonetically difficult words abound in Polish-accented English of learners of
different levels of proficiency, including intermediate and advanced students.
Consequently, this issue should be given due attention in the course of their
phonetic training.
2. The most important sources of word mispronunciations for advanced learners
involve sequences of high front vowels, clusters of interdentals with other
fricatives and phonetic ‘false friends.’
3. Advanced students, when compared with intermediate learners, have fewer
problems with spelling and stress-related issues, sequences of liquids, longer
words, and clusters of interdentals with nonfricatives.
4. Since sets of phonetically difficult words for intermediate and advanced learners
overlap only partially, there can be no one PDI common for all of them.
5. The comparison of students’ evaluation of word difficulty with the experimental
results indicates that even advanced learners are only partly aware of their
pronunciation problems and cannot assess them objectively. Thus, more care
should be taken to develop their skill of self-evaluation.
² A match was declared when an item was pronounced correctly and marked as easy to pronounce
or when an item was mispronounced and marked as difficult to pronounce.
Jolanta Szpyra-Kozłowska
Sobkowiak, W. 1999. Pronunciation in EFL Machine-Readable Dictionaries. Poznań:
Szpyra-Kozłowska, J. 2011. Phonetically difficult words in intermediate learners’
English. In Pawlak, M., E. Waniek-Klimczak & J. Majer (eds). Speaking in contexts
of instructed foreign language acquisition. Bristol: Multilingual Matters. 286-299.
Szpyra-Kozłowska, J. & W. Sobkowiak. 2011. Workbook in English Phonetics. 2nd
edition. Lublin: Wydawnictwo UMCS.
Szpyra-Kozłowska, J. & S. Stasiak. 2010. From focus on sounds to focus on words in
English pronunciation instruction. Research in Language vol. 8, 163-174.
Szpyra-Kozłowska, J. In press. On the irrelevance of sounds and prosody in
foreign-accented English.
Waniek-Klimczak, E. 2002. Akcent wyrazowy w nauczaniu języka angielskiego. In
Sobkowiak, W. & E. Waniek-Klimczak (eds) Dydaktyka Fonetyki Języka Obcego.
Zeszyty Naukowe PWSZ w Płocku, t. III. Płock: Wydawnictwo Naukowe PWSZ w
Płocku. 101-114.
Appendix 1
A list of diagnostic sentences. The tested items are in boldface.
1. His mania for watching Disney cartoons and horror films all night made the
hotel management increasingly uncomfortable.
2. Graham and Murphy went straight from Madrid to Nepal, where they joined
the demonstrators fighting for freedom in Tibet.
3. It is frequently claimed that at school Einstein showed no enthusiasm or
appreciation for algebra, and his maths teacher regularly accused him of
cheating.
4. He went to great lengths to characterize appropriately all his strengths and
weaknesses to prove he was innocent and did not commit this burglary.
5. With mounting curiosity he examined the whole area thoroughly for the sixth
time and decided that the evidence that the infamous murderer was there was
not satisfactorily established and was purely circumstantial.
6. The artificiality and unintelligibility of his explanations created much anxiety
particularly in this rural area where politicians are rarely trusted and their
sincerity is frequently questioned.
7. The rivalry between the southerners and the northerners in this sleepy town
was closely watched by the managerial staff of this industry who thought it
advantageous for a variety of reasons.
8. In this monthly I saw a caricature of this admirable, courteous man whom the
media keep simultaneously praising and criticizing.
9. It was her hundredth birthday and the organizers of the party made every
effort to control the chaos and provide various attractions: a gigantic neon with a
congratulatory message, a champagne geyser and a band of robots playing very
rhythmical music.
10. They considered this place their ultimate haven, with chestnut trees, thyme and
heather in the garden decorated with hideous dwarfs.
11. This greedy, unsophisticated person who kept repeating that he was capable of
deceiving literally everybody held a prestigious administrative post.
12. He could barely wait for receiving an explanatory statement from this
authoritarian official.
13. The width of this ceiling was truly impressive, but the paintings on it were
fairly imitative.
Appendix 2
The questionnaire used in the experiment
Evaluate the difficulty of pronunciation of the following words by marking them as
E – easy to pronounce
M – medium
D – difficult to pronounce
1. thoroughly
2. demonstrators
3. lengths
4. Murphy
5. administrative
6. particularly
7. greedy
8. anxiety
9. curiosity
10. geyser
11. sixth
12. unintelligibility
13. burglary
14. ceiling
15. southern
16. explanatory
17. chestnut
18. Nepal
19. criticizing
20. satisfactorily
21. rarely
22. courteous
23. innocent
24. maths
Now choose three words from the above list which you consider difficult to pronounce
and comment on why you find them difficult.