Report on existing foreign language and oral testing formats

On the Way to Developing a MARTEL Plus Speaking Test
Introduction
Assessing linguistic competence in Maritime English adequately and reliably at internationally recognized levels has emerged in recent years as a major issue because it concerns merchant marine officers, cadets and students as much as Maritime English Training (MET) institutions, maritime administrations, ship owners, etc. Indeed, all the above-mentioned parties have come to recognise the need to develop exam systems that evaluate spoken competence [1] and to conduct Maritime English oral tests to this effect. Furthermore, the necessity to ensure effective communication (in both written and oral form) in its diverse manifestations in various nautical and technical spheres has been explicitly expressed in the Manila amendments (2010) to the STCW Convention 1978/95. [2]
The Testing Context
The IMO requirements for English language competence needed for work in the maritime environment are stipulated in SOLAS, Chapter 5, and in the STCW Convention and Code. Summed up, they can all be expressed as the ability to communicate:
- with other ships and coast stations
- with multilingual crews in a common language
- information relevant to the safety of life at sea, pollution prevention, etc. [3]
The ISM Code, in addition, emphasizes effective communication in the execution of the crew's duties, which in practice is usually conducted in English.
Based on feedback received from different parties, and in response to the need to develop a more comprehensive process for the evaluation of oral competence raised at the 2010 IMO STW 41 meeting, the MarTEL Plus project set as one of its goals enhancing the speaking part of the MarTEL test of Maritime English language proficiency. The latter was created under the EU Leonardo da Vinci funding stream, in combination with the Lifelong Learning Programme [4]. The project partners envisaged this enhancement as a complement to the existing MarTEL standards: a two-tier system comprising the current MarTEL speaking section plus a separate one-to-one oral examination [5]. Conducted with a qualified examiner/interlocutor, such an examination would allow for a structured interview to elicit performance, greater reliability and a fair assessment of a candidate's ability to speak English.
The aims of this report are:
- to review existing foreign language and oral testing formats, to compare them, and to identify the most suitable option for the new MarTEL Plus speaking test, incorporating the specific oral capabilities required by the maritime industry
- to report on developments made so far by the Bulgarian team.
Research and Review of Some Existing English Oral Tests
Instead of delving into the multitude of existing English oral tests, we chose to familiarise ourselves only with those pertinent to a particular target language use situation, namely the RMIT English Language Test for Aviation (RELTA), Trinity Spoken English for Work (SEW), the STANAG 6001 tests and the Oral Proficiency Interview (OPI). We selected them not only as tools providing proficiency rating scales for specific communication skills in a certain sphere of ESP, but also as testing format types, and compared them in terms of:
- purpose
- intended users
- test takers
- target language use situation
- test format and task types
- testing benchmark
There were objective as well as subjective reasons for dwelling on these test formats.
We chose RELTA because it is a test developed for a professional domain with a stress on speaking.
The RMIT English Language Test for Aviation (RELTA) is one of the tests developed to meet the
International Civil Aviation Organization (ICAO) language proficiency requirements for testing. It
is used to evaluate language proficiency of pilots and air traffic controllers for the six ICAO
levels. The RELTA consists of a 25 minute speaking test and a 35 minute listening test, both
completed on a computer. The RELTA is used to assess speaking and listening proficiency in
both phraseology for routine communications, and plain English for non-routine and
emergency communications in radiotelephony and face-to-face situations, as required by the
ICAO Language Proficiency Requirements [6].
Our second choice was based on the contextualized non-native-speaker environment this test format offers. The Trinity College London Spoken English for Work (SEW) examination
measures spoken English in a working context relevant to the chosen profession of the
candidate. It is designed for anyone aged 16 and above who is already working or is preparing
to enter the world of work. The exams benefit non-native speakers from any profession where
English language proficiency is either a requirement or an opportunity for better career
prospects and promotion. It is not sector-specific and can be applied across an organisation to
all employees who need to speak English in their job. The test focuses on the speaking and
listening skills used in everyday working environments. All of the tasks are contextualised
around the world of work including a work-related telephone call between the candidate and
the examiner. The assessment also takes into account a wide range of employment tasks,
allowing each candidate to communicate about their individual work-related experiences. SEW
is available from B1 to C1 in the Common European Framework of Reference for Languages
(CEFR) [7].
The reasons for choosing the Bulgarian STANAG 6001 Speaking Section test format were purely
subjective – all members of our team have had some personal involvement in developing and
administering STANAG tests under the Partnership for Peace (PfP) programme. There is no
official exam for the STANAG 6001 levels and countries which use the scale produce their own
tests. The tests usually consist of four sections dedicated to the four major skills. The Speaking
Section measures skills required in a multinational military environment on personal, public,
and professional topics. The tests may be single-, bi- or multi-level and involve a variety of tasks
to ensure that candidates can:
- communicate in everyday social and routine workplace situations
- participate effectively in formal & informal conversations
- use the language with precision, accuracy, and fluency for all professional purposes, etc. [8]
The Bulgarian team developed a multi-level test which in the long run proved to be the better
choice.
Last but not least, consideration was also given to the so-called Oral Proficiency Interview (OPI), as we have all been OPIed ourselves and this has given us valuable experience of being "on the other side". This test has the unique advantage of being a standardized method of measuring actual proficiency in the language skills required to function in given life- and job-related situations, as well as a testing tool with a low risk of compromise. It is used in many U.S. government institutions and in some NATO countries, with applications in both the business and educational worlds, to ascertain a person's English listening, comprehension and speaking skills. The OPI is conducted via the telephone or face-to-face with one or two interlocutors and
may be recorded in order to ensure an independent rating procedure. The OPI cycle consists of
several stages: warm-up, level checks, probes, and wind-down. The role of the 'warm-up' is to
put the interviewee at ease and to generate topics which can be explored later in the interview.
The level checks allow the test-taker to demonstrate his/her ability to deal with tasks and
contexts at a particular level. Level probes serve to determine ability to perform linguistic tasks
at the next higher base level. The wind-down brings the interview to an end. Each interview
may last from 20 to 40 minutes depending on the time needed to obtain a ratable speech
sample involving multiple language functions, tasks and topics, yet inevitably following the
same standardized procedure irrespective of the test taker’s level [9]. Appendix 1 represents in
brief how the four test formats compare in terms of the criteria already mentioned above.
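For illustration only, the standardized OPI cycle described above can be written out as plain data. The stage names and purposes are taken from the description above; the Python structure itself is merely our sketch, not part of any official OPI toolkit.

```python
# Illustrative sketch of the OPI cycle: four standardized stages and a
# 20-40 minute window for obtaining a ratable speech sample.
from dataclasses import dataclass

@dataclass
class OPIStage:
    name: str
    purpose: str

OPI_CYCLE = [
    OPIStage("warm-up", "put the interviewee at ease and generate topics for later"),
    OPIStage("level checks", "let the test-taker demonstrate ability at a particular level"),
    OPIStage("probes", "determine ability to perform tasks at the next higher base level"),
    OPIStage("wind-down", "bring the interview to an end"),
]

MIN_DURATION_MIN, MAX_DURATION_MIN = 20, 40  # time needed for a ratable sample
```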
What follows from our review is that existing English oral tests present candidates with tasks that resemble as closely as possible what people do with the language in real life, performing a variety of language functions at the same time. The test-taker's performance has to be spontaneous, since in real life we rarely have the chance to prepare what we want to say. Whether the testing method is face-to-face, by telephone or computer-delivered, test takers are evaluated on their use of language in both routine and non-routine (unexpected or complicated) situations.
Taking this into consideration, we agreed that the MarTEL Speaking Test should differ from the large number of modern speaking exams that claim to assess candidates' overall speaking ability in English for no specific purpose. Rather, it should aim at assessing linguistic competence in a Maritime English environment, incorporating specialized maritime and terminological vocabulary, but should refrain from testing professional competency. Moreover, it must differ from Marlins' Test of Spoken English (TOSE) and the Test of Maritime English Competence (TOMEC). Therefore the MarTEL Speaking Test will uniquely address communications by being a 'Maritime Test of English Language' and not an 'English Test of Maritime Knowledge'. [10]
Research and Review of Existing Language Proficiency Descriptors and Frameworks
In order to ensure that language proficiency is understood in similar terms and achievements
can be compared in the European context, the Council of Europe has devised a common
framework for teaching and assessment, which is called ‘The Common European Framework of
Reference for Languages’, or CEFR for short. The CEFR, though not a testing tool proper,
furnished us with some useful ideas pertinent to the maritime environment, like “the
plurilingualism in response to European linguistic and cultural diversity”, the common reference
levels needed “to fulfil the tasks … in the various domains of social existence”, the flexibility of
“common reference levels”, “the various purposes of assessment”, etc. [11]
The International Civil Aviation Organisation (ICAO) has established English language proficiency
requirements for all pilots operating on international routes, and all air traffic controllers who
communicate with foreign pilots. These standards require pilots and air traffic controllers to be
able to communicate proficiently using both ICAO phraseology and plain English. Performance
is graded on a 6-band scale (1 – 6) according to the following criteria: pronunciation, structure,
vocabulary, fluency, comprehension and interactions. These requirements reflect a target language use situation similar to ours, and attempts have already been made at creating a test representative of it. [12]
NATO Standard Agreement (STANAG) 6001 is an international military standard developed to
measure general language proficiency of key personnel being prepared to take part in or
actually participating in peacekeeping missions and performing various duties in NATO-led
operations. There is no official exam for the STANAG 6001 levels and countries which use the
scale produce their own tests. Test-takers are assigned levels on a band scale from 0 to 5,
expressed by whole numbers. Borderline (plus) levels are also used for levels from 0 to 3. [8]
Similar to it is the Interagency Language Roundtable (ILR) scale of oral proficiency which is
usually employed when assessing an OPI. It is a set of descriptions of abilities to communicate
in a language developed originally to assess the language proficiency of federal government and
diplomatic personnel. It is a system of measuring language proficiency on a scale of 0 to 5. A proficiency level of 0 equates to no knowledge of a language, while a proficiency level of 5 equates to that of a highly educated native speaker. Proficiency levels in excess of a whole number, but not reaching the next whole number, are represented with a 'plus' sign; for example, a linguist who speaks at a near-native level might be recorded as having 4+ level proficiency. [13]
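As a small worked example of the 'plus' notation only, and not something drawn from the official ILR or STANAG documentation, borderline ratings can be mapped onto sortable numeric values; the 0.5 increment used below is purely our own illustrative convention.

```python
# Illustrative sketch: convert ILR/STANAG-style ratings such as "3" or "2+"
# into comparable numbers, treating a "plus" level as sitting between two
# base levels (an assumption made only for this example).
def rating_to_value(rating: str) -> float:
    """Convert a rating such as '3' or '2+' to a sortable number."""
    if rating.endswith("+"):
        return int(rating[:-1]) + 0.5
    return float(rating)

assert rating_to_value("2") < rating_to_value("2+") < rating_to_value("3")
assert rating_to_value("4+") < rating_to_value("5")
```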
Outcome
1. Since the IMO Model Course 3.17 on Maritime English presents the IMO requirements on the use of Maritime English in a professional context, we derived from it a list of topics for our testing needs. The topics are divided into social exchanges, job-related and emergency types and are broken down according to the CEFR levels. This is important because we believe that a variety of topics should be explored in the speaking test. Having prepared the list, we found that the number of routine topics progressively increases, in contrast to the non-routine ones, and outweighs them. We consider this quite natural and typical of each proficiency level, as such topics may be qualified as "conversations in relevant situations", as referred to in the CEFR. (See Appendix 2.)
2. Following the CEFR, the topics at each level should encompass those included in the lower levels, but the expected output should differ, based on the idea that this will be a global, multi-level test covering levels A1 through C1 for each MarTEL phase and specialty, as agreed in Work Package (WP) 3 (the sketch at the end of this list illustrates this cumulative principle).
3. There aren't many engineering-related topics in the Model Course so we think we should
expand the list and make some contributions to it based on our teaching experience. This will
provide engineers with equal opportunities to be tested fairly.
4. We find appropriate the idea of linking the different phases of the MarTEL tests to test levels. However, if we have to associate a level with a position on board, it would be better to assign minimum level requirements such as:
Phase R – A2
Phase 1 – not less than B1
Phase 2 – B2
Phase 3 – B2-C1 (to be confirmed at a later date, possibly after piloting)
Thus, when taking the test, if a rating performs at B1, s/he will be assigned B1 in accordance with the rating scale and will still be eligible for the rating position. On the other hand, if s/he performs at A1, the test will show that s/he has a very low level of oral English, but s/he will still meet the requirements for the position of a rating. This specific information may be crucial for future employers (see the sketch at the end of this list).
5. Level C1 is not covered in the Model Course, but we are of the opinion that senior officers take part in formal language communications. Besides, C1 is proof of the level of language needed to work at a managerial or professional level or to follow a course of academic study at university level. Therefore we strongly believe that this justifies the inclusion of C1 in the test specifications. We find the topics suggested in Model Course 3.17, Core 2 as suitable for B2 to be appropriate for C1 as well. C2 may not be applicable to the maritime domain, especially for non-native speakers. Having consulted the CEFR [14], we found that descriptors for C1 and C2 are lacking for some of the functions. Taking into consideration the multilingual crews as well as the different levels of English language competence on board ships, we decided that it would be unrealistic to expect a C2 performance in seafarers' routine duties.
6. The channel of communication should be face-to-face (involving a test-taker and an interlocutor), not computer-delivered, no matter how tempting the latter might appear at this stage. Face-to-face interaction is the natural way of communication and enables the test-taker to demonstrate his/her linguistic abilities in real interaction, in situations close to real life. Moreover, it provides sufficient evidence that the test-taker is able to participate fluently in real communicative events.
7. VHF communications including use of SMCP will not be tested as they are covered in other
sections of the MarTEL test and given due attention.
8. The speaking grid has been designed and is under constant revision. It will hopefully serve as
the basis for developing the two versions of the test specifications (for test-developers and for
public use). It focuses on linguistic and pragmatic competence criteria as well as on interaction. We believe these criteria incorporate the speaking assessment factors relevant to the context. As for sociolinguistic competence, we think it should be given priority in the process of teaching rather than testing.
9. The general features of the test format are being discussed based on the conducted
research. The issues of importance are the test length, number of sections, the number and
type of tasks and the assessment criteria. The tasks will be set in a carefully designed context
and will engage test takers in language performance in such a way that their contributions are
not rehearsed or prepared in advance. The final outcome will be reflected in the test
specifications.
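To make the level structure in points 1, 2 and 4 above more concrete, the following minimal sketch (our own illustration rather than project code) shows topic lists accumulating up the CEFR scale and phases carrying minimum level requirements. The topic names are invented placeholders, the real lists derive from Model Course 3.17 and Appendix 2, and the Phase 3 minimum remains provisional.

```python
# Minimal illustrative sketch (not project code) of the level structure in
# points 1, 2 and 4: topics accumulate up the CEFR scale, and each MarTEL
# phase carries a minimum CEFR level while candidates are reported at the
# level they actually demonstrate.
CEFR_ORDER = ["A1", "A2", "B1", "B2", "C1"]

# Topics newly introduced at each level (invented placeholder examples only).
NEW_TOPICS = {
    "A1": {"social exchanges": ["greetings"], "job-related": ["daily routine on board"], "emergency": []},
    "A2": {"social exchanges": ["small talk with crew"], "job-related": ["watch handover"], "emergency": ["fire drill"]},
    # higher levels would add further job-related and emergency topics
}

MIN_LEVEL_BY_PHASE = {
    "Phase R": "A2",
    "Phase 1": "B1",
    "Phase 2": "B2",
    "Phase 3": "B2",  # provisionally B2-C1, to be confirmed after piloting
}

def topics_for(level: str) -> dict:
    """All topics a test at `level` may draw on: its own plus all lower levels'."""
    covered = {"social exchanges": [], "job-related": [], "emergency": []}
    for lvl in CEFR_ORDER[: CEFR_ORDER.index(level) + 1]:
        for category, items in NEW_TOPICS.get(lvl, {}).items():
            covered[category].extend(items)
    return covered

def meets_phase_minimum(phase: str, demonstrated: str) -> bool:
    """True if the demonstrated CEFR level reaches the minimum set for the phase."""
    return CEFR_ORDER.index(demonstrated) >= CEFR_ORDER.index(MIN_LEVEL_BY_PHASE[phase])

# A candidate for Phase R who performs at B1 is reported as B1 and meets the
# A2 minimum; one performing at A1 is reported as A1 and flagged as below it.
assert meets_phase_minimum("Phase R", "B1")
assert not meets_phase_minimum("Phase R", "A1")
```

The point of the sketch is simply that the level actually demonstrated and the phase minimum are recorded separately, so that future employers see both.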
General Assumptions
The following assumptions are based on research findings in the field of assessing languages for
specific purposes (LSP). We considered them as a starting point in defining the conceptual
framework of all documents and test materials we were tasked to develop.
1. There is a threshold language ability required before test-takers can make effective use
of their background knowledge in a specific context of language use.
2. There should be a link between the theory of test development and the practice of
selecting and using proper field specific materials in the process of designing valid and
reliable LSP tests.
3. In the light of developing communicative tests the construct of specific purpose
language ability is based on the interaction of two components: language knowledge
and strategic competence. [15]
General Considerations
In our efforts to carry out the research into, and review of, the existing formats for testing oral production, we had a few concerns to begin with. The first was related to the description of some central features of the testing context and the test format. This was necessary to help provide the background for the kind of speaking to be assessed, i.e. the construct specification [16]. In our discussions we were trying to identify what speaking means in the
context of maritime English and the kinds of tasks incorporated in the test in order to test that
kind of speaking. Research findings show that it is difficult to find suitable and novel tasks that
test communicative ability alone and not intellectual capacity, educational and general
knowledge or maturity and experience of life.
In addition, the choice of the type of assessment is limited to construct-based and task-based assessment, where the latter is especially used in professional contexts, as the scores give information about the examinee's ability to deal with the demands of the situation. Researchers do not look at the two perspectives as 'conflicting' [17]. Therefore, combining elements of the two appeared to be the approach that would satisfy the needs of our particular context.
We approached the test format and its elements using the definition in the Dictionary of
Language Testing [18]. We focused on some of the elements of the test design namely the task
types and the kinds of responses required of the test-takers. These two are of primary concern
as the flexibility of the task frame guides or limits the response of the test-taker. Specialists in language testing distinguish between open-ended, structured and semi-structured speaking tasks. In the maritime context and for our particular needs, we find open-ended tasks very flexible in terms of language production and indication of speaking skills. The OPI, for example, includes a number of open-ended tasks related to description, instruction, comparison, explanation, justification, prediction and decision. Structured speaking tasks cannot assess the unpredictable and creative elements of speaking. Semi-structured tasks, reacting to situations, for example, tend to be used within a particular culture.
Further on, the most common way of arranging the speaking tasks is the interview format. Its
history dates back to the 1950s when it became a standard tool for testing speaking. Some of
its advantages are:
- it is realistic and resembles real life communication and interaction;
- it is flexible, questions can be adapted to each individual test-taker’s performance;
- it gives the interlocutor a lot of control over the interaction.
This format has been viewed as very time-consuming. Another disadvantage could be the
subjectivity of the assessment as the examiner might be influenced by the test-taker’s
personality or communication style. This is one reason why there should be two examiners –
one to conduct the interview and the other to act as the assessor. In this way the interlocutor can stay focused on the interaction process, helping and encouraging the test-taker to do their best and carefully going through all stages and tasks of the interview. The assessor, on the other hand, will be involved in evaluation and will remain focused on the test-taker's performance. The final score will be discussed by both until an agreement is reached. A team of two examiners has been the most common practice in the testing community, as it is believed to reduce the level of subjectivity. This implies that both examiners should have adequate training and qualifications to use the format as a measurement instrument and to provide fair testing conditions for each test-taker.
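Purely as our own illustration of this arrangement, and not a prescribed MarTEL procedure, the role separation and the agreement requirement can be sketched as follows.

```python
# Illustrative sketch of the two-examiner arrangement: the interlocutor
# conducts the interview, the assessor observes and evaluates, and a final
# score is recorded only once the two ratings have been reconciled.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExaminerRatings:
    interlocutor: str  # rating proposed by the examiner who led the interview
    assessor: str      # rating proposed by the examiner who only assessed

def final_score(ratings: ExaminerRatings) -> Optional[str]:
    """Return the agreed score, or None if the examiners still need to discuss."""
    if ratings.interlocutor == ratings.assessor:
        return ratings.interlocutor
    return None  # disagreement: discuss the performance until agreement is reached

# Example: both examiners rate the candidate at B2, so B2 is recorded.
print(final_score(ExaminerRatings(interlocutor="B2", assessor="B2")))
```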
Our second concern refers to ethics. Being fair to all test-takers is a major matter of interest for
all test developers and examination boards. This is the reason why some formats come with the
accompanying test materials, e.g. sample materials, preparation materials, etc. to provide
conditions for fair testing. The EALTA Guidelines for Good Practice in Language Testing and
Assessment was the document we used to make sure we were on the right track. It is our responsibility as test developers to become familiar with and follow the general principles of good practice. There are also issues we need to clarify for ourselves before we begin our work on the test(s). For example, it is important for us to find answers to questions like:
1. Does the assessment purpose relate to the curriculum?
2. How appropriate are the assessment procedures to the learners/test-takers?
3. What efforts are to be made to ensure that the assessment results will be accurate and
fair?
4. Will there be any preparatory materials?
5. Will markers/examiners be trained for each test administration?
6. Is the test intended to initiate change(s) in the current practice?
7. What evidence is there of the quality of the process followed to link tests and
examinations to the Common European Framework? [19]
To provide answers to these questions we should be engaged in discussions with the decision
makers to ensure that they are aware of both good and bad practice.
The third issue which is sometimes ignored is the washback (or backwash as used in the general
education field) effect. The notion of `washback` refers to the influence that tests have on
teaching and learning. Different aspects of influence have been discussed in different
educational settings at different times in history due to the fact that testing is not an isolated
event [20]. Washback studies investigate the impact of different types of tests on the content
of teaching, teachers’ approaches to methodology and the reasons for their decisions to do
what they do.
Furthermore, researchers suggest that 'high-stakes tests' would have more impact than low-stakes tests [21]. If we consider the new speaking test a high-stakes test, we should then be aware of factors such as the status of the subject (i.e. English) within the curriculum, the prestige of the test, the nature of teaching materials, teacher experience and teacher training, and teacher awareness of the nature of the test, as they would all affect the amount and type of washback.
New tests do not necessarily influence the curriculum in a positive way as changes do not
happen overnight and teachers do not always feel ready to implement changes. In his study on
the washback effect of the Revised Use of English Test, Lam concludes that it is not sufficient to
change exams: “The challenge is to change the teaching culture, to open teachers` eyes to the
possibilities of exploiting the exam to achieve positive and worthwhile educational goals” [22].
Whether intended or unintended, washback effects demonstrate the complexity of the phenomenon. One conclusion based on washback research findings is that there is a complex interaction between tests on the one hand and language teachers, materials writers and syllabus designers on the other, and we should be aware of this.
Conclusion
This report was written to inform decision-makers about the progress made so far by the
Bulgarian team. Some major research findings in the field of testing have been considered in
making decisions about the development of the new test. The beginning has been set and there
is a lot of work ahead of us. We hope we are on the right track.
References
1. Logie, Catherine. Whose culture? The impact of language and culture on safety and compliance at sea. Retrieved February 4, 2011 from http://www.ukpandi.com/fileadmin/uploads/uk-pi/LP%20Documents/Industry%20Reports/Alert/Alert14.pdf.
2. Subcommittee on Standards of Training and Watchkeeping, Report to the Maritime Safety Committee. Retrieved February 24, 2011 from http://www.uscg.mil/imo/stw/docs/stw41report.pdf.
3. STCW Code, Table A-II/1, Table A IV/2
4. www.martel.pro
5. http://www.plus.martel.pro/
6. http://www.relta.org/
7. http://www.trinitycollege.co.uk/site/?id=1521.
8. http://www.md.government.bg/bg/doc/zapovedi/2009/2009_OX626_STANAG_Spec_EN.pdf
9. http://www.dlielc.org/text_only/Language_Testing/opi_test.html.
10. Ziarati, R., Ziarati, M., Çalbaş, B. Improving Safety at Sea and Ports by Developing Standards for Maritime English. Retrieved February 24, 2011 from http://www.healert.org/documents/published/he00845.pdf.
11. Common European Framework of Reference for Languages: Learning, Teaching, Assessment
(CEFR), CUP, 2001.
12. http://www.englishforaviation.com/ICAO-requirements.php
13. http://www.reference.com/browse/wiki/ILR_scale
14. Common European Framework of Reference for Languages: Learning, Teaching, Assessment
(CEFR), CUP, 2001, pp 58-59.
15. Douglas, D. Assessing Languages for Specific Purposes, Cambridge Language Assessment
series, Cambridge University Press, pp.30-36., 2000.
16. Luoma, S. Assessing Speaking, Cambridge Language Assessment Series, Cambridge
University Press, 2004.
17. Luoma, S. Assessing Speaking, Cambridge Language assessment Series, Cambridge
University Press, p.42, 2004.
18. Dictionary of Language Testing, Studies in Language testing 7, Cambridge University Press,
1999.
19. EALTA Guidelines for Good Practice in Language Testing and Assessment, www.ealta.eu.org/guidelines.htm.
20. Shohamy, E. The power of tests: The impact of language tests on teaching and learning.
NFLC Occasional Paper. Washington, DC: National Foreign Language Center, 1993.
21. Alderson, J.C. and Wall, D. ‘Does Washback Exist?’ Applied Linguistics, 14(2): 115-29, 1993.
22. Lam, H.P. Methodology washback – an insider’s view. In D.Nunan, R.Berry, & V.Berry (Eds.),
Bringing about change in Language Education: Proceedings of the International Language in
Education Conference 1994 (83-102). Hong Kong: University of Hong Kong, 1994.
Appendix 1
Comparison of the four test formats

RELTA
- Purpose: to measure specific purpose language proficiency in aviation English
- Intended users: Air Navigation Service Providers, Aircraft Operators and National Regulatory Authorities
- Test takers: pilots, air traffic controllers, aeronautical station operators
- Target language use situation: radiotelephony communications in air navigation and traffic control services / the aviation community
- Test format and task types: computer-delivered speaking and listening test; Sections 1 and 2 test radiotelephony in phraseology and plain English; Section 3 is an interview
- Testing benchmark: ICAO rating scales

SEW
- Purpose: to measure spoken English in a working context for better career prospects and promotion
- Intended users: prospective employers
- Test takers: non-native speakers already working or preparing to enter the world of work
- Target language use situation: non-sector-specific working contexts
- Test format and task types: speaking and listening; role-play phone task (problem solving); topic discussion; presentation
- Testing benchmark: CEFR

STANAG 6001
- Purpose: to assess the overall English language proficiency (not professional competence) of military personnel for career selection and promotion
- Intended users: NATO member states' Ministries of Defence
- Test takers: military and civilian personnel in the armed forces
- Target language use situation: military-related context
- Test format and task types: Speaking section – face-to-face structured interview involving various tasks: conversation, role-play, information-gathering task, topic discussion, etc.
- Testing benchmark: STANAG 6001 Language Proficiency Levels

OPI
- Purpose: to assess listening, comprehension and speaking skills for educational and employment purposes
- Intended users: government institutions in the USA, Canada and other NATO countries
- Test takers: military students/staff and government agency officials
- Target language use situation: life- and job-related context
- Test format and task types: face-to-face or telephone 3-phase interview containing a series of different tasks, topics, elicitation techniques, etc.
- Testing benchmark: Interagency Language Roundtable (ILR) Skill Level Descriptions