Formulaic Language in Academic Study
Norbert Schmitt

Single Words vs. Multi-word Units
• Most discussion of vocabulary (including academic vocabulary) has been conceptualized in terms of single words or word families

How Much Vocabulary is Needed in English?
• Nation (CMLR, 2006)
– 6,000 - 7,000 word families for spoken discourse
– 8,000 - 9,000 word families for written discourse

Frequency and Coverage Levels (Nation, 2006)
Level               Approximate written coverage (%)   Approximate spoken coverage (%)
1st 1,000           78–81                              81–84
2nd 1,000           8–9                                5–6
3rd 1,000           3–5                                2–3
4th–5th 1,000       3                                  1.5–3
6th–9th 1,000       2                                  0.75–1
10th–14th 1,000     <1                                 0.5
Proper nouns        2–4                                1–1.5
Not in the lists    1–3                                1

AWL (Coxhead, TQ 2000)
capacity, assistance, abstract, brief, focus, hierarchy, hypothesis, incentive, minimum, diverse, cooperate, funding, enormous, investigation, circumstance, offset, rational, publication, evidence, maintain, invoke, integrity, reverse, manual, sum, scope, entity, item, purchase, revise, spherical, successive, release

Academic Vocabulary
• Successive comes with its own typical phraseology
• What words collocate with successive?

COCA Results
• each successive
• successive generations
• successive governments
• successive administrations
• successive waves
• successive layers
• successive stages

Typical Collocations
• Each successive president chose entanglements and evasion over transparency, legality, and independence.
• Turning schools around could help save successive generations of kids who quit and often end up jobless.
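
The collocation search for successive can be approximated on any text with a simple count of the words that directly follow the node word. The sketch below is only an illustration: the function name right_collocates and the sample sentences are hypothetical, and the toy text stands in for a real corpus such as COCA, which has its own search interface.

```python
import re
from collections import Counter


def right_collocates(text, node, top_n=5):
    """Count the words that immediately follow `node` in `text`."""
    # Lowercase and keep only alphabetic tokens (a deliberately crude tokenizer).
    tokens = re.findall(r"[a-z']+", text.lower())
    # Pair each token with its right-hand neighbour and keep pairs led by the node.
    counts = Counter(nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == node)
    return counts.most_common(top_n)


# Hypothetical toy text, not COCA data.
sample = (
    "Successive governments ignored the problem. "
    "Successive generations of students paid the price, "
    "and successive governments promised reform."
)

print(right_collocates(sample, "successive"))
# [('governments', 2), ('generations', 1)]
```

Raw adjacency counts like these favour frequent words; corpus tools typically also offer association measures, which this sketch does not attempt.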
Phraseology in Language
• There is a great deal of recurrent phraseology in language (including academic language)
• This ‘formulaic language’ is crucial for accurate, appropriate, and fluent language use

What is Formulaic Language?
• Recurrent multi-word lexical items that have a single meaning or function (Schmitt, 2010)
• It is an umbrella term for a number of formulaic categories
– Idioms
– Collocations
– Phrasal verbs
– Lexical bundles
– Lexical phrases
– Phrasal expressions
– etc.

What is Formulaic Language?
• multi-word units, multiword chunks, fixed expressions, frozen phrases, phrasal vocabulary, routine formulas, chunks, prefabricated routines …
• Individual phrasal items will be referred to as formulaic sequences

Why is Formulaic Language Important?
• Formulaic language is one of the most important components of language overall
• The reasons for this are numerous:

Why is Formulaic Language Important?
• Formulaic language is ubiquitous in language use
• Meanings and functions are often realized by formulaic language
• Formulaic language is necessary for appropriate functional language use
• Formulaic language has processing advantages

Why is Formulaic Language Important?
• Formulaic language is an important component of language acquisition
• Formulaic language is a feature of many languages
• The use of formulaic language helps speakers be fluent
• Phraseology is a main feature that distinguishes different synonyms

Ubiquitous in Language Use
• 52-58%                  Erman and Warren (2000)
• 32%                     Foster (2001)
• 48-80% (M=66%)          Oppenheim (2000)
• once every five words   Sorhus (1977)
• 21%, 30%                Biber, et al. (1999)
• 31% - 40%               Howarth (1998)
• 15%                     Rayson (2008)
• Figures depend on the method of measurement, and on whether the discourse is spoken or written

Meanings and Functions
• The more recurrent a language need is (e.g. need to apologize, make a request, explain a particular idea), the more likely there will be a conventionalized expression (i.e. formulaic language) to express it

Meanings and Functions
• Expressing a concept: (get out of Dodge [City] = get out of town quickly, usually in uncomfortable circumstances)
• Stating a commonly believed truth or advice: (Too many cooks spoil the soup = it is difficult to get a number of people to work well together)
• Providing phatic expressions which facilitate social interaction: (Nice weather today is a nonintrusive way to open a conversation)
• Signposting discourse organization: (on the other hand signals an alternative viewpoint)

Meanings and Functions
• Providing technical phraseology which can transact information in a precise and efficient manner: (2-mile final is a specific location in an aircraft landing pattern)
• Maintaining conversations: (How are you?, See you later)
• Realizing the topics necessary in daily conversations: (When is X? (time), How far is X? (location))
• Expressing functions: (I'm (very) sorry to hear about ___ to express sympathy)

Appropriate Language Use
• Formulaic language is expected by the speech community, and so word combinations which do not comply with the norm sound ‘unnatural’

Appropriate Language Use
• gap: Native speaker or learner?
– Betty very skillfully stopped the gap of the mailbox so that birds could not get in.
– … but to bridge the gap between existing …

Appropriate Language Use
• Betty very skillfully stopped the gap of the mailbox so that birds could not get in.
– Meaningful but awkward
• … but to bridge the gap between existing
– Appropriate word (collocation) choice

Appropriate Language Use
• Schmitt (ELIA, 2005-2006)
• Define border
• How is it used?

Appropriate Language Use
                   border     borders    bordering    bordered
BNC frequency      8,011      2,539      367          356
Figurative sense   89 (1%)    84 (3%)    177 (48%)    99 (28%)
X + on: 71%, 75%

Appropriate Language Use
• His passion for self-improvement bordered on the pathological.
• But his approach is unconscionable, bordering on criminal.
• Some other words which occur to the right of bordered/ing on: a slump, a sulk, alcoholic poisoning, antagonism, apathy, arrogance, austerity, bad taste, blackmail, carelessness, chaos, conspiracy, contempt, cruelty, cynicism

Appropriate Language Use
SOMETHING (is/are) bordered/bordering on SOMETHING UNPLEASANT

Processing Advantages
Pawley and Syder (1983)
• Formulaic sequences offer processing efficiency because single memorized units, even if made up of a sequence of words, are processed more quickly and easily than the same sequences of words which are generated creatively.
• The mind uses an abundant resource (long term memory) to store a number of prefabricated chunks of language that can be used ‘ready made’ in language production.
• This compensates for a limited resource (working memory), which can potentially be overloaded when generating language on-line from individual lexical items and syntactic/discourse rules.

Processing Advantages
Figurative: Personally, I think you can have the highest degree from the best university in the world, but at the end of the day it’s your contribution to the society that matters, and not the name of the university you went to at all.
Literal: However, I still had to carry most of my stuff in small boxes from my old room to the new one. I had to make at least 50 trips so at the end of the day I was absolutely exhausted.
Novel: I know that at the end of the war he went on to teach students at the Military Academy.
Processing Advantages
Siyanova, Conklin, and Schmitt (SLR, 2011)
First Pass Reading Time = 3 + 4 (early)
Total Reading Time = 3 + 4 + 6 (late)
Fixation Count = 3 + 4 + 6 (late)

Processing Advantages
Siyanova, Conklin, and Schmitt
                               Figurative        Literal        Novel
First Pass Reading Time (ms)   447          =    454       =    497
Total Reading Time (ms)        514          =    507       <    628
Fixation Count                 2.8          =    2.7       <    3.2

Language Acquisition
• Peters (1983) suggests that formulaic sequences may be decomposed and the individual components extracted through a process of segmentation, to give insights into vocabulary and grammar:
– An hour ago, a year ago, a month ago
– A(n) _____ ago + hour, year, month

Occurs in a Range of Languages
• Formulaic language has been found in a range of languages: English, Russian, French, Spanish, Italian, German, Swedish, Polish, Arabic, Hebrew, Turkish, Greek, and Chinese
• Is it a universal trait of all languages?

Helps Speakers be Fluent
• The largest unit of novel discourse that native speakers are able to process is a single clause of 8-10 words
• When speaking, proficient speakers will speed up and become fluent during these clauses
• But they will then slow down or even pause at the end of these clauses
• Native speakers seldom pause in the middle of a clause, or at least not for long

Helps Speakers be Fluent
• But proficient speakers can fluently say multi-clause utterances:
– You can lead a horse to water, but you can’t make him drink.
• Kuiper (2004) shows that speakers who operate under severe time constraints (play-by-play sports announcers, auctioneers) use a great deal of formulaic language in their speech
• So, formulaic language helps speakers be more fluent

Distinguishes Synonyms (Stubbs, 1994)
How are the following (near) synonyms used?
• WORK
• JOB
• CAREER
• LABOR
• EMPLOYMENT

Distinguishes Synonyms (Stubbs, 1994)
WORK: workaholic, workforce, workload, workplace; aid worker, factory worker, office worker, social worker
neutral? (frequent word = many contexts)

Distinguishes Synonyms (Stubbs, 1994)
JOB: botched, crummy, bad, hatchet, menial
negative?

Distinguishes Synonyms (Stubbs, 1994)
CAREER: brilliant, distinguished, glittering, acting, director, film, international, literary
positive?

Distinguishes Synonyms (Stubbs, 1994)
LABOR: casual, cheap, deskilling, manual, unproductive
negative?

Distinguishes Synonyms (Stubbs, 1994)
EMPLOYMENT: conditions, contract, discrimination, rights
legal?
Learner Use of Formulaic Language
• Learners don’t use many idioms
• Learners do use many high-frequency collocations (nice day)
• Learners don’t use many lower-frequency but tightly-bound collocations (preconceived notions)

Learner Use of Formulaic Language
• But learners often do not use the collocations they know appropriately
• Inappropriate collocations are a leading problem in learner language
• Learners often use words with their correct meanings, but do not understand the correct context of use (collocation, register, frequency)

Learner Use of Formulaic Language
• Learners consistently overestimate their comprehension of reading texts that contain formulaic sequences that they either fail to identify or misunderstand, even at high levels of proficiency (Martinez and Murphy, TQ 2011)

Learner Acquisition of Formulaic Language
• Boers & Lindstromberg (ARAL 2012) reviewed acquisition research:
– Learning from exposure requires repetition (frequency)
– Intentional learning produced better results
– Raising awareness of formulaic language is not a powerful accelerator of learning
– Knowing the component words makes learning a formulaic sequence easier
– Providing learning strategies (dictionaries, concordance lines) produced mixed results

Learner Acquisition of Formulaic Language
• Does learner use of formulaic language (e.g. collocations) improve just from studying in an academic environment?
• Incidental acquisition
• Li and Schmitt (JSLW, 2009)

Learner Acquisition of Formulaic Language
• We followed a Chinese MA student at Nottingham over one academic year and compiled a learner corpus from all of her essays and dissertation
• We then analyzed all of her assignments and dissertation for formulaic language

Learner Acquisition of Formulaic Language
• Would the student produce more formulaic language over the year?
• Would the student produce better formulaic language over the year?
• Would the student become more confident in producing formulaic language over the year?

Amount Produced
[Chart: Average Tokens Per 700 Words, Essay 1 to Essay 8 and Dissertation]

Appropriateness
[Chart: Inappropriateness Rate, Less Appropriateness Rate, and Appropriateness Rate (%), Essay 1 to Essay 8 and Dissertation]

Confidence
[Chart: Less Confident, Confident, and Very Confident (%), Essay 1 to Essay 8 and Dissertation]

Learner Acquisition of Formulaic Language
• Does learner use of formulaic language (e.g. collocations) improve from explicit teaching?
• Focused instruction
• Jones and Haywood (2004, in Schmitt (Ed.) Formulaic Sequences)

Learner Acquisition of Formulaic Language
• Learners had better awareness of formulaic language after 10 weeks and could identify a greater number of sequences in a text
• Some learners made some progress in producing more formulaic sequences in a C-test (He suspected that too much of th__ ki__ o__ chemical might encourage the immune system…)
• Most learners made no noticeable improvement in the number of formulaic sequences produced in their essays over 2 weeks

Necessity of Formulaic Language
Cowie (1992:10) goes so far as to say: “It is impossible to perform at a level acceptable to native users, in writing or in speech, without controlling an appropriate range of multiword units.”

Pedagogical Implications
• Meunier review (ARAL, 2012)
• If formulaic sequences are so important:
– They need to be included in teaching syllabuses and materials
– We can’t assume they will just be learned from exposure
– They need to be incorporated into language tests to a greater extent

Pedagogical Implications
• But what formulaic sequences?
• Vincent (JEAP, 2013) proposes a 6-stage process for identifying academic phraseology
• Martinez (ELTJ, 2013) suggests a selection framework based on frequency and transparency
• In order to incorporate formulaic sequences into their teaching and testing, most practitioners need a list of formulaic sequences to address

An Academic Formulas List (1-24)
Simpson-Vlach & Ellis (AL, 2010)
• in terms of
• at the same time
• from the point of view
• in order to
• as well as
• part of the
• the fact that
• in other words
• the point of view of
• there is a
• as a result of
• this is a
• on the basis of
• a number of
• there is no
• point of view
• the number of
• the extent to which
• as a result
• in the case of
• whether or not
• the same time
• with respect to
• point of view of

An Academic Formulas List (1-24)
• The table shows the first 24 formulas on the core list (written and spoken), ranked by a combination of frequency and MI scores
• All component words of these formulas come from the 1st 1,000 frequency band

An Academic Formulas List
Written 177-200
• even though the
• this does not
• was based on
• the nature of the
• in the course of
• degree to which
• be argued that
• in terms of a
• for this reason
• are based on
• in a number of
• two types of
• the total number
• is more likely
• which can be
• are able to
• be considered as
• be used to
• b and c
• depend on the
• is that it is
• is affected by (AWL)
• should also be
• if they are

An Academic Formulas List
• Top 200 from written texts
• 1st 1,000: 127 different words
• 2nd 1,000: 2 different words
• AWL: 16 different words

An Academic Formulas List
• To learn formulas from the AFL, learners must either:
– Know the high frequency component words already
– This makes the learning easier
• Or
– Learn the AFL formulas as wholes even if some component words are not known
– Less efficient
• Knowing AWL words would not help much
• Knowing the 1st 1,000 words is key

An Academic Formulas List
• Many of the AFL formulas are structural components of meaningful sentences, but may not carry a clear meaning in their own right:
– is that it is
– is affected by
– should also be
– if they are

An Academic Formulas List
• The AFL is based around functions:
• Framing attributes
– the idea that
– the change in
• Quantity specification
– a series of
• Identification and focus
– different types of
– such as a

An Academic Formulas List
• Identification and focus
– exactly the same
– (the) difference between (the)
• Locatives
– in the real world
• Vagueness markers
– and so forth
• Hedges
– to some extent

An Academic Formulas List
• Obligation and directive
– I want you to
• Expressions of ability and possibility
– allows us to
– are able to
• Evaluation
– an important role in
– is consistent with
• Discourse markers
– even though the
– in conjunction with

Formulaic Framework (Martinez, ELTJ, 2013)
Frequency: take credit 27, take issue 121, take time 910, take place 10,556

Formulaic Framework (Martinez, ELTJ, 2013)
              Transparent        Opaque
Frequent      take time (2)      take place (1)
Infrequent    take credit (4)    take issue (3)

PHRASE List (Martinez & Schmitt, AL, 2012)
• PHRASE List (PHRASal Expressions)
• Some formulaic sequences are very frequent
• 500 phrasal expressions within 5,000 BNC frequency level
• Based on same frequency as individual BNC words
• Phrases which are opaque and not easily guessable

PHRASE List
• LEAD TO (CAUSE) 13,555 (1st 1,000 frequency level)
Excessive smoking can lead to heart disease.
• HAVE GOT TO (must) 12,270 (2nd 1,000 frequency level)
You have got to try this salad.
• BY THE TIME (when) 3,607 (3rd 1,000 frequency level)
By the time dinner started there were none left.

PHRASE List
Integrated   Phrase               Frequency           Spoken    Written   Written    Example
List Rank                         (per 100 million)   general   general   academic
107          HAVE TO              83,092              ***       **        *          I exercise because I have to.
463          GOING TO (FUTURE)    28,259              ***       **        x          I’m going to think about it.
894          WAS TO               14,366              x         ***       **         The message was to be transmitted worldwide.
5502         MAKE UP ONE’S MIND   788                 ***       **        x          You’d better make up your mind.
5503         AT WORK              787                 x         ***       ***        There were strange forces at work.

Experimental PHRASE Test
Inclusion in the Vocabulary Levels Test
1 take place
2 have got to
3 seek to
4 find out
5 make sure
6 carry out
_____ do
_____ try
_____ must

Experimental PHRASE Test
Inclusion in the Vocabulary Levels Test
1 take place
2 have got to
3 seek to
4 find out
5 make sure
6 carry out
__6__ do
__3__ try
__2__ must
X Didn’t work well – learners needed context to make sense of many phrasal expressions

Experimental PHRASE Test
turn out: It turned out different.
a. started
b. seemed
c. became
d. did not look

Experimental PHRASE Test
at least: At least it is warm.
a. other things may be bad, but
b. many days have passed and now
c. I cannot believe that
d. the least important thing is

Experimental PHRASE Test
• Seems to work much better
• Still in piloting
• Ron Martinez (San Francisco State University)

Vocabulary Website Resource
Most Norbert Schmitt (& co-author) publications and other vocabulary resources can be accessed at his personal website: www.norbertschmitt.co.uk
• This PowerPoint presentation is available
• The PHRASE List is available
• Link to COCA Corpus BYU web site