Pushing back the limits of phraseology: How far can we go?

Sylviane Granger
Centre for English Corpus Linguistics, Université catholique de Louvain – Belgium

1. Phraseology wide and narrow

Phraseology is pervasive in all language fields and yet despite this fact – or perhaps precisely because of it – it has only relatively recently become established as a discipline in its own right. The phraseology literature represents it as a subfield of lexicology dealing with the study of word combinations rather than single words. These multi-word units (MWUs) are classified into a range of subtypes in accordance with their degree of semantic non-compositionality, syntactic fixedness, lexical restrictions and institutionalization. However, as phraseology has strong links but fuzzy borders with several other fields of linguistics, notably morphology, syntax, semantics and discourse, linguists vary in their opinions as to which subsets of these MWUs should be included in the field of phraseology. Compounds and grammatical collocations are cases in point. This difficulty in establishing what exactly falls under the umbrella of phraseology is compounded by the fact that phraseology is a dynamic phenomenon and displays both synchronic and diachronic variation (Moon 1998; Giegerich 2004).

Although there is still considerable discrepancy between linguists as regards the terminology and typology of word combinations and the limits of phraseology itself, there is general agreement that phraseology constitutes a continuum along which word combinations are situated, with the most opaque and fixed ones at one end and the most transparent and variable ones at the other (Cowie 1998: 4-7; Howarth 1998: 168-171; Gross 1996: 78). One of the main preoccupations of linguists working within this tradition has been to find linguistic criteria to distinguish one type of phraseological unit from another (e.g. collocations vs. idioms or full idioms vs.
semi-idioms) and especially to distinguish the most variable and transparent multi-word units from free combinations, which only have syntactic and semantic restrictions and are therefore considered as falling outside the realm of phraseology (Cowie 1998: 6). As Cowie (1998) points out, it is this approach, itself greatly indebted to the Russian tradition, which deserves much of the credit for having established phraseology as a discipline in its own right. It has provided linguists with a set of discrete criteria which can be used to categorize and analyze word combinations as well as provide thorough descriptions of phraseological units. At the same time though, establishing non-compositionality and fixedness as key indices of phraseology has placed the focus firmly on units such as proverbs, idioms or phrasal verbs to the detriment of more variable combinations, which, because they are considered less ‘core’, tend to be dealt with less in reference and teaching tools, a state of affairs which is reflected in the large number of books devoted to idioms or phrasal verbs currently on the market.

A more recent approach to phraseology, which originated with Sinclair’s pioneering lexicographic work (Sinclair 1987) and is usually referred to as the statistical or frequency-based approach (Nesselhauf 2005), has turned phraseology on its head. Instead of resorting to a top-down approach which identifies phraseological units on the basis of linguistic criteria, it uses a bottom-up corpus-driven approach to identify lexical co-occurrences. This inductive approach generates a wide range of word combinations, which do not all fit predefined linguistic categories (Moon 1998: 39). It has opened up a “huge area of syntagmatic prospection” (Sinclair 2004: 19) encompassing sequences like frames and colligations as well as institutionalized phrases, which are “syntactically and semantically compositional, but occur with markedly high frequency (in a given context)” (Sag et al. 2002).
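As an illustration only, the bottom-up identification of lexical co-occurrences can be sketched in a few lines of Python. The sketch below scores adjacent word pairs by pointwise mutual information (PMI), one of several association measures used in corpus work; the function name `bigram_pmi`, the `min_count` threshold and the toy corpus are assumptions made for this example and are not taken from any of the studies cited here.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper: pairs seen fewer than `min_count` times are
    skipped, because PMI is unreliable for very rare pairs.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI compares the observed pair probability with the probability
        # expected if the two words occurred independently of each other.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy corpus, invented for illustration.
toy_corpus = (
    "she made a decision and he made a decision "
    "but they took a risk and we took a risk"
).split()

scores = bigram_pmi(toy_corpus)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 2))
```

No linguistic category is specified in advance: the procedure simply returns whatever pairs recur, and the sorting of the output into collocations, frames or free combinations is left to later analysis.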
Such units, traditionally considered as peripheral or falling outside the limits of phraseology, have recently revealed themselves to be pervasive in language, while many of the most restricted units have proved to be highly infrequent.

Unlike proponents of the classical approach to phraseology, Sinclair and his followers are much less preoccupied with distinguishing between different categories and subcategories of word combinations or, more generally, with setting clear boundaries to phraseology. In Sinclair’s framework, phraseology is central: phraseological items take precedence over lexical items. This radical view has been criticized. Gaatone (1997: 168), for instance, welcomes the growing importance attached to multi-word units but warns against considering everything as phraseological. However, there is now strong support for the ubiquity and centrality of phraseology, both from corpus-based linguistic studies and from recent psycholinguistic studies, such as Wray’s (this volume), which present holistic storage as the default type of processing.

2. Reconciling the two approaches

If phraseology is to be successfully integrated into both theoretical second language acquisition (SLA) studies and pedagogical applications, the most promising avenue would seem to be one that combines the benefits of the two approaches: the fine-grained linguistic analysis of the traditional approach and the heuristic value of the statistical approach.

The traditional approach provides SLA researchers with a keener awareness of the different categories of MWUs. Current studies either make do with one overarching notion of ‘formulaic sequence’ or completely disregard the impact of phraseology on speakers’ word knowledge scores. Fresh light would be shed on the results of Wolter’s (2001) study of the syntagmatic vs. paradigmatic organization of the L1 and L2 mental lexicon if the phraseological profile of the prompt words were taken into account.
The traditional approach also has much to contribute to pedagogical research as “different kinds of MWUs suggest different kinds of learning” (Grant & Bauer 2004: 51). While it is neither realistic nor desirable to expect materials designers, teachers and learners to master the full range of fine-grained categories and subcategories of MWUs, all these groups would benefit from a good understanding of the major categories and accessible terminology (Lewis 2000: 129-130).

This said, gaining a good grasp of the contextual use of words involves much more than the traditional bona fide categories of multi-word units. From an applied perspective, the frequency-based approach has an undeniable advantage as it covers the whole range of co-occurrence patterns with no a priori exclusions. Even so-called free combinations have a place. While they are often presented as predictable and hence not worthy of attention, they have been reinstated by recent studies of learner language which have shown that what is felt to be predictable by native speakers of the language may in fact present problems for foreign learners (Lea & Runcie 2002: 823-824). Nesselhauf’s (2005) study of V + N combinations has demonstrated that free combinations are not always used correctly by learners: she identified an error rate that was lower than for collocations but by no means negligible (17% vs. 25%).

The frequency-based approach has also highlighted the importance of another category of MWU, what Biber (2004) calls ‘lexical bundles’, compositional recurrent sequences which he describes as “the most important textual building blocks used in spoken and written discourse.” Similar studies based on learner corpora of academic writing (De Cock 2003 and this volume; Paquot 2005 and this volume; Flowerdew 1998 and 2003; Granger & Paquot 2005) have shown that it is precisely these building blocks which cause learners difficulty.
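By way of illustration, recurrent sequences of this kind can be extracted with a simple n-gram count. The sketch below keeps word sequences of length `n` that reach a minimum frequency and occur in a minimum number of texts; the function name `lexical_bundles`, the threshold values and the three toy texts are assumptions chosen for this example and do not reproduce the operationalization used in any of the studies cited above.

```python
from collections import Counter, defaultdict


def lexical_bundles(texts, n=4, min_freq=3, min_texts=2):
    """Return recurrent n-word sequences with their frequencies.

    Hypothetical helper: a sequence is kept only if it occurs at least
    `min_freq` times overall and in at least `min_texts` different texts.
    """
    freq = Counter()
    spread = defaultdict(set)  # sequence -> ids of texts containing it
    for text_id, text in enumerate(texts):
        tokens = text.lower().split()
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            freq[gram] += 1
            spread[gram].add(text_id)
    return {
        " ".join(gram): count
        for gram, count in freq.items()
        if count >= min_freq and len(spread[gram]) >= min_texts
    }


# Toy texts, invented for illustration.
texts = [
    "on the other hand the results are clear",
    "on the other hand the data are mixed",
    "the results on the other hand are mixed",
]

bundles = lexical_bundles(texts)
print(bundles)  # {'on the other hand': 3}
```

The dispersion condition (`min_texts`) is there so that a sequence repeated by a single writer is not mistaken for a recurrent building block; running the same function over a native and a learner corpus would give two lists that can then be compared.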
It follows that if learners are to become more fluent speakers and writers, these types of unit have to be included in any course or textbook alongside fully-fledged idioms and other traditionally recognized units.

What we need, then, is a combination of the two approaches. While it is advisable to start from a very wide notion of phraseology, the frequency-based information should be complemented with insights drawn from other disciplines as not all units identified by quantitative methods are pedagogically valuable. Traditional phraseological theory is essential here as it provides the necessary apparatus to break down the statistical units into linguistically-defined categories, an essential step towards optimal pedagogical integration. In fact, statistical multi-word units should be viewed as raw material which needs to be refined using a series of filters: linguistic (types of MWUs), cognitive (notions of salience, animacy, etc.), cross-linguistic (degree of congruence with learner’s L1) and didactic (teaching objective).

3. Conclusion

The existence of two widely different approaches to phraseology is an undeniable asset for a field whose importance is now universally acknowledged. SLA theoreticians and practitioners have all the necessary ingredients – a solid theoretical apparatus, large native and learner corpora and powerful extraction tools – to integrate phraseology more solidly into SLA theory and teaching practice. It is to be hoped that they will avail themselves of these resources so that phraseology can at long last have the place it deserves in language education.

References

Biber D. (2004) Lexical bundles in academic speech and writing. In Lewandowska-Tomaszczyk B. (ed.) Practical Applications in Languages and Computers. Frankfurt: Peter Lang, 165-178.

Cowie A.P. (1998) Introduction. In Cowie A.P. (ed.) Phraseology: Theory, Analysis and Applications. Oxford: OUP, 1-20.

De Cock S. (2003) Recurrent Sequences of Words in Native Speaker and Advanced Learner Spoken and Written English: a Corpus-driven Approach. Unpublished PhD dissertation. Louvain-la-Neuve: Université catholique de Louvain.

Flowerdew L. (1998) Integrating ‘Expert’ and ‘Interlanguage’ Computer Corpora Findings on Causality: Discoveries for Teachers and Students. English for Specific Purposes 17(4): 329-345.

Flowerdew L. (2003) A Combined Corpus and Systemic-Functional Analysis of the Problem-Solution Pattern in a Student and Professional Corpus of Technical Writing. TESOL Quarterly 37(3): 489-511.

Gaatone D. (1997) La locution : analyse interne et analyse globale. In Martins-Baltar M. (ed.) La locution entre langue et usages. Langages. Fontenay-Saint Cloud: ENS éditions, 165-177.

Giegerich H. J. (2004) Compound or phrase? English noun-plus-noun constructions and the stress criterion. English Language and Linguistics 8: 1-24.

Granger S. & M. Paquot (2005) The phraseology of EFL academic writing: Methodological issues and research findings. Paper presented at ICAME 26 – AAACL6 (International Computer Archive of Modern and Medieval English - American Association of Applied Corpus Linguistics), 12-15 May 2005, University of Michigan, USA.

Grant L. & L. Bauer (2004) Criteria for Re-defining Idioms: Are we Barking up the Wrong Tree? Applied Linguistics 25(1): 38-61.

Gross G. (1996) Les expressions figées en français. Noms composés et autres locutions. Paris: Ophrys.

Howarth P. (1998) The Phraseology of Learners’ Academic Writing. In Cowie A.P. (ed.) Phraseology: Theory, Analysis and Applications. Oxford: OUP, 161-186.

Lea D. & M. Runcie (2002) Blunt instruments and Fine Distinctions: a Collocations Dictionary for Students of English. In Braasch A. & C. Povlsen (eds) Proceedings of the Tenth EURALEX International Congress. Copenhagen: Center for Sprogteknologi, 819-829.

Lewis M. (2000) Teaching collocation. Further Developments in the Lexical Approach. Boston: Heinle.

Moon R. (1998) Fixed expressions and idioms in English. Oxford: Clarendon Press.

Nesselhauf N. (2005) Collocations in a Learner Corpus. Amsterdam & Philadelphia: Benjamins.

Paquot M. (2005) Towards a productively-oriented academic word list. Paper presented at Practical Applications in Language and Computers, 7-9 April 2005, Łódź, Poland.

Sag I. A., T. Baldwin, F. Bond, A. Copestake & D. Flickinger (2002) Multiword Expressions: A Pain in the Neck for NLP. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics (CICLING 2002), Mexico City, 1-15.

Sinclair J. (1987) Looking Up. An account of the COBUILD Project in lexical computing. London: Collins ELT.

Sinclair J. (2004) Trust the Text – Language, corpus and discourse. London: Routledge.

Wolter B. (2001) Comparing the L1 and L2 Mental Lexicon. A Depth of Individual Word Knowledge Model. Studies in Second Language Acquisition 23: 41-69.