CEF and the British National Corpus
Bertil Geurts
25 March, 2011

Vocabulary and receptive skills
  Which words contribute most to listening and reading competence for Dutch learners of English as a secondary language?
  Status: Kick-off of research aiming to label English words ‘A1’, ‘A2’, ‘B1’ or ‘B2+’ for testing and possibly teaching purposes

Incentive
  Revision of VAS 2 test of English
  Reading and vocabulary on three levels: BB, KBGT and HV
  Which words can year-2 students in BB, KBGT and HV be tested on?
  > Aim for ± CEF A2 (BB: A1/A2 … HV: A2/B1-)
  > Waystage (van Ek, revised 1991)

Relevance of knowing words for L2 readers
  Correlation between reading (and listening) and word knowledge, especially at lower levels
  Other correlations:
    familiarity with subject matter (CEF)
    (reading and listening) strategies, probably more so at higher levels
    syntax / grammar (disputed)

What is a word?
  “The majority of the examples in the dictionary are taken word for word from one of the texts in the corpus.”
  21 words (tokens)
  14 different words (types): 5x the, 2x of, 2x in, 2x word
  13 lemmas: the, majority, of, example, in, dictionary, take, word, for, from, one, text, corpus
  13 word families
  12 function words
  9 content words

Word family CONCLUDE
  Form                                            Band
  conclude (concluded, concludes, concluding)     III
  conclusion                                      III
  conclusive                                      V
  conclusively                                    VI
  inconclusive                                    V

  a foregone conclusion   ?
  jump to conclusions     ?
  in conclusion           ?
  conclude from           ?

  His essay had a very weak …, which left a poor final impression on the reader.
  Can we demonstrate … that the factory caused the pollution?
  There was … evidence the two students had committed plagiarism, so they went free.
  The author … the article by suggesting topics for further research.

  From: www.pbs.plymouth.ac.uk/academicwordlistatuop/

  Band   Range             Size
  I      1 – 680           (680 words)
  II     680 – 1720        (1040 words)
  III    1720 – 3300       (1580 words)
  IV     3300 – 6500       (3200 words)
  V      6500 – 14,600     (8100 words)
  VI     14,600 – …        (all 20 million words of the corpus)
  I and II: = 75% of all English usage

Input for labelling words
  Breakthrough, Waystage, Threshold and Key English Test, Preliminary English Test
  CEF(R)
  BNC / Bank of English frequency lists
  Teacher (expert) intuition

  B2+
  B1   1500
  A2   900
  A1   600

Breakthrough, Waystage, Threshold
  Council of Europe Word Indexes for A1, A2, B1 (van Ek, 1978; revised 1991)
  Lots of words to do with post, army, church, but no e-mail, internet, cell phone
  Too bad: poste-restante thingummyjig
  Recent vocabulary needed, frequency relevant
  Cambridge KET (A2) and PET (B1) updated regularly

CEF on vocabulary (Booklet p. 1)
  VOCABULARY RANGE
  B2  Has a good range of vocabulary for matters connected to his/her field and most general topics. Can vary formulation to avoid frequent repetition, but lexical gaps can still cause hesitation and circumlocution.
  B1  Has a sufficient vocabulary to express him/herself with some circumlocutions on most topics pertinent to his/her everyday life such as family, hobbies and interests, work, travel, and current events.
  A2  Has sufficient vocabulary to conduct routine, everyday transactions involving familiar situations and topics.
      Has a sufficient vocabulary for the expression of basic communicative needs.
      Has a sufficient vocabulary for coping with simple survival needs.
  A1  Has a basic vocabulary repertoire of isolated words and phrases related to particular concrete situations.

CEF: vocabulary clues (Booklet p. 1)
  Table 2. Common Reference Levels: self-assessment grid

  Listening
  A1  I can recognise familiar words and very basic phrases concerning myself, my family and immediate concrete surroundings when people speak slowly and clearly.
  A2  I can understand phrases and the highest frequency vocabulary related to areas of most immediate personal relevance (e.g. very basic personal and family information, shopping, local area, employment). I can catch the main point in short, clear, simple messages and announcements.
  B1  I can understand the main points of clear standard speech on familiar matters regularly encountered in work, school, leisure, etc. I can understand the main point of many radio or TV programmes on current affairs or topics of personal or professional interest when the delivery is relatively slow and clear.
  B2  I can understand extended speech and lectures and follow even complex lines of argument provided the topic is reasonably familiar. I can understand most TV news and current affairs programmes. I can understand the majority of films in standard dialect.

  Reading
  A1  I can understand familiar names, words and very simple sentences, for example on notices and posters or in catalogues.
  A2  I can read very short, simple texts. I can find specific, predictable information in simple everyday material such as advertisements, prospectuses, menus and timetables and I can understand short simple personal letters.
  B1  I can understand texts that consist mainly of high frequency everyday or job-related language. I can understand the description of events, feelings and wishes in personal letters.
  B2  I can read articles and reports concerned with contemporary problems in which the writers adopt particular attitudes or viewpoints. I can understand contemporary literary prose.

CEF ± English in Dutch secondary schools (Booklet p. 2)
  [Chart: school types and years placed along the CEF scale A1, A2, B1, B2, for listening and for reading]
  Listening: BB1 KB1 GT1 H1 V1 GT2 KB3 V3 GT4 H4 V5 V6
  Reading:   BB1 KB1 GT1 H1 V1 KB2 GT2 H2 KB3 GT3 BB4 V3 H4 H5 V6

Frequency (1) (Booklet pp. 2, 3)
  http://ucrel.lancs.ac.uk/bncfreq/
  Companion Website for: Word Frequencies in Written and Spoken English: based on the British National Corpus. (2001) pp. 320, Longman, London.

  Word           PoS                  Freq    Ra    Disp
  a              Det    %            21626   100    0.99
  A / a          Lett   :              268   100    0.93
  a bit          Adv    :              119    99    0.87
  a great deal   Adv    :               14    96    0.95
  a little       Adv    :              104   100    0.92
  a lot          Adv    :               40    99    0.93
  abandon        Verb   %               44    99    0.96
  @              @      abandon         12    98    0.94
  @              @      abandoned       26    97    0.96
  @              @      abandoning       5    90    0.93
  @              @      abandons         1    47    0.87
  abbey          NoC    %               20    95    0.90
  @              @      abbey           19    95    0.90
  @              @      abbeys           1    34    0.75
  Aberdeen       NoP    %               14    88    0.80

  http://www.wordfrequency.info/
  Word frequency lists and dictionary from the Corpus of Contemporary American English

Frequency (2)
  Word    PoS    Freq     Word    PoS    Freq     Word    PoS    Freq
  the     Det    61847    which   DetP   3719     its     Det    1632
  of      Prep   29391    or      Conj   3707     then    Adv    1595
  and     Conj   26817    we      Pron   3578     two     Num    1561
  a       Det    21626    's      Verb   3490     out     Adv    1542
  in      Prep   18214    an      Det    3430     time    NoC    1542
  to      Inf    16284    ~n't    Neg    3328     my      Det    1525
  it      Pron   10875    were    Verb   3227     about   Prep   1524
  is      Verb    9982    as      Conj   3006     did     Verb   1434
  to      Prep    9343    do      Verb   2802     your    Det    1383
  was     Verb    9236    been    Verb   2686     now     Adv    1382
  I       Pron    8875    their   Det    2608     me      Pron   1364
  for     Prep    8412    has     Verb   2593     no      Det    1343
  that    Conj    7308    would   VMod   2551     other   Adj    1336
  you     Pron    6954    there   Ex     2532     only    Adv    1298
  he      Pron    6810    what    DetP   2493     just    Adv    1277
  be*     Verb    6644    will    VMod   2470     more    Adv    1275
  with    Prep    6575    all     DetP   2436     these   DetP   1254
  on      Prep    6475    if      Conj   2369     also    Adv    1248
  by      Prep    5096    can     VMod   2354     people  NoC    1241
  at      Prep    4790    her*    Det    2183     know    Verb   1233
  have*   Verb    4735    said    Verb   2087     any     DetP   1220
  are     Verb    4707    who     Pron   2055     first   Ord    1193
  not     Neg     4626    one     Num    1962     see     Verb   1186
  this    DetP    4623    so      Adv    1893     very    Adv    1165
  's      Gen     4599    up      Adv    1795     new     Adj    1145
  but     Conj    4577    as      Prep   1774     may     VMod   1135
  had     Verb    4452    them    Pron   1733     well    Adv    1119
  they    Pron    4332    some    DetP   1712     should  VMod   1112
  his     Det     4285    when    Conj   1712     her*    Pron   1085
  from    Prep    4134    could   VMod   1683     like    Prep   1064
  she     Pron    3801    him     Pron   1649     than    Conj   1033
  that    DetP    3792    into    Prep   1634     how     Adv    1016
                                                  get     Verb    995

Tasks and tries
  See Conference Booklet CEF-BNC:
  A B C D E F
  page 2   pages 2/3   page 4   page 4   5 – 7   page 5   pages 5   5 – 8

D, E, F
  “Pick of the plumbers” is from GLTL 2010 re-examination: B1 most likely
  “Robber tries to hold up closed bank” is in pretest revised VAS 2: A2
  “We’re all speaking Geek” is from VWO 2010 re-examination: B2/B2+

  I speak three languages fluently, am a black belt in karate, play league badminton, sing in a choir and play the organ at a church. I raise money for two charities and help with autistic children. I hope to retire from my plumbing work next year, aged 50. And no, Mr O’Neill, I didn’t have any GCSE’s either.

  But in no area of the culture is the collision more intense than over the English language, for the web has changed English more radically than any invention since paper, and much faster. According to Paul Payack, who runs the Global Language Monitor, there are currently 998,974 words in the English language, with thousands more emerging every month. By his calculation, English will adopt its one millionth word in late November. To put that statistic another way, for every French word, there are now ten in English.
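
Worked example: counting tokens and types
  The token and type counts given for the example sentence under “What is a word?” can be checked with a short script. The sketch below is an illustration only, not part of the research method. It assumes a simple tokeniser (lower-cased runs of letters and apostrophes) and it reads the band ranges I to VI as frequency ranks, with a rank that falls on a boundary (for example 680) assigned to the lower band. The names tokenize, band_for_rank and BAND_LIMITS are hypothetical.

```python
import re
from collections import Counter

# Example sentence from the "What is a word?" slide.
SENTENCE = (
    "The majority of the examples in the dictionary are taken word for word "
    "from one of the texts in the corpus."
)

# Upper limit of bands I to V as listed with the CONCLUDE word family.
# Assumption: the limits are frequency ranks, and a rank on a boundary
# (e.g. 680) belongs to the lower band. Anything above 14,600 is band VI.
BAND_LIMITS = [("I", 680), ("II", 1720), ("III", 3300), ("IV", 6500), ("V", 14600)]


def tokenize(text):
    """Split text into lower-cased word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def band_for_rank(rank):
    """Return the band label (I to VI) for a 1-based frequency rank."""
    if rank < 1:
        raise ValueError("rank must be 1 or higher")
    for label, upper in BAND_LIMITS:
        if rank <= upper:
            return label
    return "VI"


tokens = tokenize(SENTENCE)      # every running word counts as one token
type_counts = Counter(tokens)    # each distinct word form counts as one type
repeated = {word: n for word, n in type_counts.items() if n > 1}

print(len(tokens), "tokens")
print(len(type_counts), "types")
print(repeated)
print(band_for_rank(1000))
```

  Counting lemmas or word families would need a lemmatiser or a word-family list on top of this, which is why those figures are not computed here.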