Test comparability across languages
Ülle Türk
University of Tartu/Estonian Defence Forces
ulle.turk@ut.ee/ylle.tyrk@mil.ee

Topics to be discussed
1. Issues in comparing examinations in different languages (on the basis of the Estonian Year 12 examinations).
2. Some ways of achieving the comparability of examinations in different languages (on the basis of the Finnish matriculation examinations).
3. Relating language examinations to the CEFR – general principles.
4. Relating reading papers to the CEFR.
5. Relating writing papers to the CEFR.
30-31 March 2007

Issues in comparing examinations in different languages (on the basis of the Estonian Year 12 examinations)

Why compare examinations across languages?
• Needs of test users
– University admissions officers
– Employers
– Teachers, students, parents
• Increasing mobility of the population
• Increasing consumer choice
• A growing emphasis on accountability

Equivalence of examinations
• Equivalent forms =
– Different versions of the same test, which are regarded as equivalent to each other in that they are based on the same specifications and measure the same competence. To meet the strict requirements of equivalence under classical test theory, different forms of a test must have the same mean difficulty, variance, and co-variance, when administered to the same persons.
ALTE. 1998. Multilingual Glossary of Language Testing Terms.
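The classical-test-theory conditions in this definition can be checked directly when the same candidates sit two forms. The sketch below is a minimal illustration, not a procedure taken from the ALTE glossary: it compares the means and variances of two forms and, as one assumed way of operationalising "co-variance", each form's covariance with a third measure taken by the same candidates. The function names, the `criterion` scores and the tolerance `tol` are all hypothetical; the strict definition demands exact equality.

```python
from statistics import mean, pvariance


def covariance(xs, ys):
    """Population covariance of two equally long lists of scores."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def compare_forms(form_a, form_b, criterion, tol=0.5):
    """Compare two test forms taken by the same candidates.

    form_a, form_b and criterion hold scores for the same persons in the
    same order. criterion is a third measure: an assumption used here to
    operationalise "equal co-variance". tol is a hypothetical tolerance,
    since classical test theory requires exact equality.
    Returns (passed, differences), two dicts keyed by statistic.
    """
    differences = {
        "mean": abs(mean(form_a) - mean(form_b)),
        "variance": abs(pvariance(form_a) - pvariance(form_b)),
        "covariance": abs(covariance(form_a, criterion)
                          - covariance(form_b, criterion)),
    }
    passed = {name: diff <= tol for name, diff in differences.items()}
    return passed, differences


if __name__ == "__main__":
    # Invented scores for five candidates, for illustration only.
    form_a = [10, 12, 14, 16, 18]
    form_b = [x + 5 for x in form_a]  # same spread, every score 5 higher
    criterion = [1, 2, 3, 4, 5]
    passed, differences = compare_forms(form_a, form_b, criterion)
    print(passed)  # {'mean': False, 'variance': True, 'covariance': True}
```

In this invented example form B fails only on the mean: it spreads and orders candidates exactly like form A but is uniformly easier, so a raw score on one form does not mean the same as the same raw score on the other.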
Year 12 examinations in Estonia
= national school-leaving examinations established in 1997
– centrally developed, administered and marked
– test the learning outcomes of the National Curriculum (2002)
– contain tasks at different levels of difficulty
– results on a 100-point scale
– offered in 13 subjects
• four foreign languages: English, French, German, Russian
– students required to take three: mother tongue + two more

Foreign language examinations
• National Curriculum for foreign languages
– two compulsory foreign languages
• FL A: grades 1–3; 3 hrs per week / 2 hrs per week
• FL B: grades 4–6; 3 hrs per week / 2 hrs per week
– level B2 in ONE foreign language
• Ministry regulations on the development, administration and grading of examinations and reporting results
– five papers: listening comprehension, reading comprehension, speaking, writing, language structures
– equally weighted (20 points each)
• High-stakes examinations
– results used for university entrance
– foreign language requirement: English, French, German

Questions
• Does 93 in the examination of 2006 reflect the same level of competence as 93 in the examination of 2004 (or 2001)?
• Is 63 in English the same as 63 in German, French or Russian?
• What does ‘the same’ mean?
– The same level of competence? B2
– The level of competence reached after the same amount of work?
• US Foreign Service Institute (Jackson & Kaplan, 1999: 78)
• US Defense Language Institute Foreign Language Center (MacWhinney 1995: 294)
• Threshold Level (Trim)

Year 12 examinations in FLs
Language | 2001: No. of examinees | 2001: Mean % | 2004: No. of examinees | 2004: Mean %
English | 8488 | 64.1 | 9099 | 66.6
German | 1408 | 66.0 | 1076 | 66.7
Russian | 509 | 69.4 | 445 | 72.3
French | 67 | 75.4 | 63 | 79.6

German & English: reading paper
• 50 minutes
• Three texts with tasks
• Length of texts: 1500 words
• No. of tasks:
– German: one task per text
– English: one or two tasks per text
• No. of items
– German: 20
– English: 40

Reading: mean scores
Language | Year | Paper | Text 1 | Text 2 | Text 3
English | 2001 | 61% | 71% | 69%/35% | 57%
English | 2004 | 68% | 78% | 72%/66% | 75%
German | 2001 | 61% | 67% | 67% | 55%
German | 2004 | 60% | 84% | 62% | 46%

Text types
Text | German 2001 & 2004 | English 2001 | English 2004
Text 1 | Short adverts | Newspaper interview | Opinion article
Text 2 | Short news/opinion articles | Short book reviews | Magazine article
Text 3 | Newspaper/magazine article (600–700 words) | Newspaper article | Newspaper article

Task types
German 2001 & 2004 English 2001 matching (question 1) inserting sentences into text + answer) 2) multiple choice: 3 Task 1: matching (person + advert) Task 2: matching (text 1) identifying important + heading) information 2) matching (def + word) 1) matching (text + multiple title) choice: 4 2) summary cloze options Task 3: 2004 identifying important information 1) T/F/NI 2) matching (def + word)

More questions
• The reading paper in English seems more difficult than that in German (B2/C1? B1/B2?)
• Why, then, are the mean scores for the German examination the same as or lower than the mean scores for the English examination?
– Students who take German are less motivated and less bright?
– It takes longer for Estonian students to reach the same level of competence in German than in English.
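The overall means on the "Year 12 examinations in FLs" slide rose between 2001 and 2004 in all four languages. As a small illustration of why such figures cannot by themselves answer the questions above, the sketch below computes the change per language. The dictionary layout and function name are assumptions made for illustration; the numbers are those reported on the slide. A rise is consistent with stronger candidates, easier papers, or both, and the raw means alone cannot tell these apart.

```python
# Mean percentage scores on the Estonian Year 12 foreign-language
# examinations, as reported on the "Year 12 examinations in FLs" slide.
MEANS = {
    "English": {2001: 64.1, 2004: 66.6},
    "German": {2001: 66.0, 2004: 66.7},
    "Russian": {2001: 69.4, 2004: 72.3},
    "French": {2001: 75.4, 2004: 79.6},
}


def change_in_mean(means, start=2001, end=2004):
    """Return the change in mean score (percentage points) per language.

    A positive value only says the raw mean went up; it cannot show
    whether candidates became more able or the papers became easier.
    Results are rounded to one decimal place, as on the slide.
    """
    return {lang: round(years[end] - years[start], 1)
            for lang, years in means.items()}


if __name__ == "__main__":
    for lang, delta in change_in_mean(MEANS).items():
        print(f"{lang}: {delta:+.1f} points")
```

Run as a script, it prints one line per language (for English, "English: +2.5 points"). Answering whether 66.6 in 2004 reflects more competence than 64.1 in 2001 would require the two papers to be equivalent forms in the sense defined earlier.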
Foreign languages in Estonia 2004–5
Language | Percentage of learners | Examinees (total 11033)
English | 82.4% | 9415 (85.3%)
German | 19.2% | 1053 (9.6%)
Russian | 39.2% | 485 (4.4%)
French | 2.7% | 80 (0.7%)

U.S. Government Proficiency Ratings (Jackson & Kaplan, 1999: 73)
Rating | Description
S/R-0 | No Functional Proficiency
S/R-1 | Elementary Proficiency: Able to satisfy routine courtesy and travel needs and to read common signs and simple sentences and phrases.
S/R-2 | Limited Working Proficiency: Able to satisfy routine social and limited office needs and to read short typewritten or printed straightforward texts.
S/R-3 | General Professional Proficiency: Able to speak accurately and with enough vocabulary to handle social representation and professional discussions within special fields of knowledge; able to read most materials found in daily newspapers.
S/R-4 | Advanced Professional Proficiency: Able to speak and read the language fluently and accurately on all levels pertinent to professional needs.
S/R-5 | Functionally Equivalent to an Educated Native Speaker

Approximate Learning Expectations at the Foreign Service Institute (Jackson & Kaplan, 1999: 78)
Language “categories” | Weeks | Class hours
Category I: Languages closely cognate with English. French, Italian, Portuguese, Romanian, Spanish, Swedish, Dutch, Norwegian, Afrikaans, etc. | 23–24 | 575–600
Category II: Languages with significant linguistic and/or cultural differences from English. Albanian, Azerbaijani, Bulgarian, Finnish, Greek, Hebrew, Hindi, Hungarian, Icelandic, Khmer, Latvian, Nepali, Polish, Russian, Serbian, Tagalog, Thai, Turkish, Urdu, Vietnamese, Zulu, etc. | 44 | 1100
Category III: Languages which are exceptionally difficult for native English speakers to learn to speak and/or read. Arabic, Chinese, Japanese, Korean | 88 (2nd year in the country) | 2200

The Defense Language Institute Foreign Language Center (MacWhinney 1995: 294)
• Group I: Western European languages: Roman alphabet, share many cognates with English, greatly simplified grammatical system.
• Group II: More challenging Indo-European languages: Roman alphabet, complex grammatical system (Lithuanian, German, Romanian, Hindi)
• Group III: Indo-European languages: non-Roman writing system, complex grammar (Greek, Russian, Serbian, and Persian); “Easy” non-Indo-European languages (Hungarian, Tagalog, Turkish; Thai or Vietnamese)
• Group IV: Non-Indo-European languages: non-Roman writing system, complex grammar (Arabic, Japanese, and Korean)
• Group V: Exotic languages: Eskimo, Warlpiri, Navajo, Georgian, etc.

Some ways of achieving the comparability of examinations in different languages (on the basis of the Finnish matriculation examinations)

Preparing examinations
• Joint examination board for foreign languages (16 people)
• 2–3 members representing each language
• Language groups
– 5–6 people
• Markers (for English: 20–30 people)
• Collective responsibility: the board as a whole is responsible for the quality of all language examinations

Process of examination preparation
• Language groups
– Agree on text and task types
– Agree on the schedule of work
– Find the texts (2–3 times as many as will be needed)
– Design the materials
• The board discusses all the examination papers
– Constructive criticism
– Suggest changes/improvements
• Language groups make the necessary changes
• All the members of the language group read the test materials to make sure that they contain no mistakes

Post-examination analysis
• 3–4 weeks for marking the papers
• Teachers mark their own students’ papers first using very detailed marking schemes, but their marks do not count.
• If the central marker’s grade differs too greatly from that given by the teacher, a second marker is brought in.
• The whole board analyses the examination results.
• If a question does not ‘work’, all students are awarded a point for it.
• Item difficulty is taken into consideration when awarding the grades.

Grading
• 7 grades based on norm referencing
– Laudatur (5%)
– Eximia (10%)
– Magna cum laude (20%)
– Cum laude (30%)
– Lubenter approbatur (20%)
– Approbatur (10%)
– Improbatur (2–5%)
• Reliability: 0.9–0.95

Relating language examinations to the CEFR – general principles
Relating an examination or test to the CEF is a complex endeavour. The existence of such a relation is not a simple observable fact, but is an assertion for which the examination provider needs to provide both theoretical and empirical evidence. The procedures by which such evidence is put forward can be summarized by the term “validation of the claim.”
Relating Language Examinations to the CEF: Manual (Preliminary Pilot Version), 2003: 1

Procedures (1)
• Familiarisation:
– A selection of activities designed to ensure that participants in the linking process have a detailed knowledge of the CEFR
• Specification:
– A self-audit of the coverage of the examination (content and task types) in relation to the categories presented in CEFR Chapters 4 (Language use and the language user/learner) and 5 (The user/learner’s competences)

Procedures (2)
• Standardisation:
– Suggested procedures to facilitate the implementation of a common understanding of the Common Reference Levels presented in CEFR Chapter 3
• Empirical validation:
– The collection and analysis of test data and ratings from assessments in order to provide evidence that both the examination itself and the linking to the CEFR are sound

Specification of examination content (1)
• Familiarisation with CEFR
– Consideration of a selection of the question boxes printed at the end of relevant sections of CEFR chapters
– Discussion of the CEFR levels as a whole
– Self-assessment of own language level in a foreign language
– Sorting individual CEFR descriptors into levels

Specification of examination content (2)
• Internal validity: Description and analysis of
– general examination content
– process of test development
– marking, grading, results
– test analysis and post-examination review
• External validity: Relate
– general examination description to CEFR scales
– description of communicative activities tested to CEFR scales
– description of aspects of communicative language competence tested to CEFR scales

Standardisation of judgements (1)
• Familiarisation with CEFR as in the Specification stage (2 h)
• Productive skills
– Training in assessing performance in relation to CEFR levels using standardised samples (3–4 h/skill)
– Benchmarking local performance samples to CEFR levels (3–4 h/skill)

Standardisation of judgements (2)
• Receptive skills
– Training in judging the difficulty of test items in relation to CEFR standardised items (3–4 h/skill)
– Judging the difficulty of local items in relation to CEFR levels (3–4 h/skill)

Empirical validation
• Data collection
• Internal validation:
– Confirming the psychometric quality of the test
• External validation:
– Confirming the relationship to the CEFR through an independent measure

Activities 1
• Familiarisation
– Consideration of a selection of the question boxes printed at the end of relevant sections of CEFR chapters 3, 4 and 5
– Discussion of the CEFR levels as a whole
• Table 1. Common Reference Levels: global scale (p 24)
– Sorting individual CEFR descriptors into levels
• Spoken Fluency (p 129)
• General Linguistic Range (p 110)

3 Common Reference Levels
• Users of the Framework may wish to consider and where appropriate state:
– to what extent their interest in levels relates to learning objectives, syllabus content, teacher guidelines and continuous assessment tasks (constructor-oriented);
– to what extent their interest in levels relates to increasing consistency of assessment by providing defined criteria for degree of skill (assessor-oriented);
– to what extent their interest in levels relates to reporting results to employers, other educational sectors, parents and learners themselves (user-oriented).

4 Language use and the language user/learner
• Users of the Framework may wish to consider and where appropriate state:
– in which domains the learner will need/be equipped/be required to operate.
• Users of the Framework may wish to consider and where appropriate state:
– the situations which the learner will need/be equipped/be required to handle;
– the locations, institutions/organisations, persons, objects, events and actions with which the learner will be concerned.
5 The user/learner’s competences
• Users of the Framework may wish to consider and where appropriate state:
– what prior sociocultural experience and knowledge the learner is assumed/required to have;
– what new experience and knowledge of social life in his/her community as well as in the target community the learner will need to acquire in order to meet the requirements of L2 communication;
– what awareness of the relation between home and target cultures the learner will need so as to develop an appropriate intercultural competence.
• Users of the Framework may wish to consider and where appropriate state:
– on which theory of grammar they have based their work;
– which grammatical elements, categories, classes, structures, processes and relations are learners, etc. equipped/required to handle.

B1 Can understand the main points of clear standard input on familiar matters regularly encountered in work, school, leisure, etc. Can deal with most situations likely to arise whilst travelling in an area where the language is spoken. Can produce simple connected text on topics which are familiar or of personal interest. Can describe experiences and events, dreams, hopes & ambitions and briefly give reasons and explanations for opinions and plans.
A1 Can understand and use familiar everyday expressions and very basic phrases aimed at the satisfaction of needs of a concrete type. Can introduce him/herself and others and can ask and answer questions about personal details such as where he/she lives, people he/she knows and things he/she has. Can interact in a simple way provided the other person talks slowly and clearly and is prepared to help.
C2 Can understand with ease virtually everything heard or read. Can summarise information from different spoken and written sources, reconstructing arguments and accounts in a coherent presentation. Can express him/herself spontaneously, very fluently and precisely, differentiating finer shades of meaning even in more complex situations.

Proficient User
C2 Can understand with ease virtually everything heard or read. Can summarise information from different spoken and written sources, reconstructing arguments and accounts in a coherent presentation. Can express him/herself spontaneously, very fluently and precisely, differentiating finer shades of meaning even in more complex situations.
C1 Can understand a wide range of demanding, longer texts, and recognise implicit meaning. Can express him/herself fluently and spontaneously without much obvious searching for expressions. Can use language flexibly and effectively for social, academic and professional purposes. Can produce clear, well-structured, detailed text on complex subjects, showing controlled use of organisational patterns, connectors and cohesive devices.

Independent User
B2 Can understand the main ideas of complex text on both concrete and abstract topics, including technical discussions in his/her field of specialisation. Can interact with a degree of fluency and spontaneity that makes regular interaction with native speakers quite possible without strain for either party. Can produce clear, detailed text on a wide range of subjects and explain a viewpoint on a topical issue giving the advantages and disadvantages of various options.
B1 Can understand the main points of clear standard input on familiar matters regularly encountered in work, school, leisure, etc. Can deal with most situations likely to arise whilst travelling in an area where the language is spoken. Can produce simple connected text on topics which are familiar or of personal interest. Can describe experiences and events, dreams, hopes & ambitions and briefly give reasons and explanations for opinions and plans.
Basic User
A2 Can understand sentences and frequently used expressions related to areas of most immediate relevance (e.g. very basic personal and family information, shopping, local geography, employment). Can communicate in simple and routine tasks requiring a simple and direct exchange of information on familiar and routine matters. Can describe in simple terms aspects of his/her background, immediate environment and matters in areas of immediate need.
A1 Can understand and use familiar everyday expressions and very basic phrases aimed at the satisfaction of needs of a concrete type. Can introduce him/herself and others and can ask and answer questions about personal details such as where he/she lives, people he/she knows and things he/she has. Can interact in a simple way provided the other person talks slowly and clearly and is prepared to help.

C2 B1 A2 B2 A1 B1 C1 B2
Can express him/herself at length with a natural, effortless, unhesitating flow.
Can keep going comprehensibly, even though pausing for grammatical and lexical planning and repair is very evident, especially in longer stretches of free production.
Can construct phrases on familiar topics with sufficient ease to handle short exchanges, despite very noticeable hesitation and false starts.
Can interact with a degree of fluency and spontaneity that makes regular interaction with native speakers quite possible without imposing strain on either party.
Can manage very short, isolated, mainly pre-packaged utterances, with much pausing to search for expressions, to articulate less familiar words, and to repair communication.
Can make him/herself understood in short contributions, even though pauses, false starts and reformulation are very evident.
Can communicate spontaneously, often showing remarkable fluency and ease of expression in even longer complex stretches of speech.
Can produce stretches of language with a fairly even tempo; although he/she can be hesitant as he/she searches for patterns and expressions, there are few noticeably long pauses.

A1 Has a very basic range of simple expressions about personal details and needs of a concrete type.
B2 Has a sufficient range of language to be able to give clear descriptions, express viewpoints and develop arguments without much conspicuous searching for words, using some complex sentence forms to do so.
C1 Can select an appropriate formulation from a broad range of language to express him/herself clearly, without having to restrict what he/she wants to say.
B1 Has enough language to get by, with sufficient vocabulary to express him/herself with some hesitation and circumlocutions on topics such as family, hobbies and interests, work, travel, and current events.
A2 Can use basic sentence patterns and communicate with memorised phrases, groups of a few words and formulae about themselves and other people, what they do, places, possessions etc.
C2 Can exploit a comprehensive and reliable mastery of a very wide range of language to formulate thoughts precisely, give emphasis, differentiate and eliminate ambiguity.
A2 Can produce brief everyday expressions in order to satisfy simple needs of a concrete type: personal details, daily routines, wants and needs, requests for information.
B2 Has a sufficient range of language to describe unpredictable situations, explain the main points in an idea or problem with reasonable precision and express thoughts on abstract or cultural topics such as music and films.

Sources
• Common European Framework of Reference for Languages: Learning, Teaching, Assessment (CEFR): http://www.coe.int/T/DG4/Portfolio/?L=E&M=/documents_intro/common_framework.html
• Hardcastle, Peter. 2004. Test Equivalence and Construct Compatibility across Languages. University of Cambridge ESOL Examinations Research Notes, 17, August, 6–11.
• Jackson, Frederick H. & Kaplan, Marsha A. 1999. Lessons learned from fifty years of theory and practice in government language teaching. In: Georgetown University Round Table on Language and Linguistics. Washington, DC: Georgetown University Press, 71–87.
• MacWhinney, Brian. 1995. Language-Specific Prediction in Foreign Language Learning. Language Testing, 12, 292–320.
• Manual for relating language examinations to the Common European Framework of Reference for Languages – a preliminary pilot version: http://www.coe.int/T/DG4/Portfolio/?L=E&M=/documents_intro/Manual.html
• Taylor, Lynda. 2004. Issues of test comparability. University of Cambridge ESOL Examinations Research Notes, 15, February, 2–5.