ADAM MICKIEWICZ UNIVERSITY IN POZNAŃ
Faculty of English

Extracting neologisms from a corpus using NeoDet
Marta Grochocka
martag@wa.amu.edu.pl
wa.amu.edu.pl

The development of a lexical item (Bauer 1983)
1. Nonce formation
   Neologism (Fischer 1998):
   • certain frequency over a certain period of time
   • distribution in different contexts and domains
2. Institutionalization
3. Lexicalization

Types of neologisms
• formal: a new word, including acronyms and affixes, e.g. PC, e-, -gate (Metcalf 2002)
• syntactic: a new expression or grammatical construction
• semantic: a new meaning of an already existing word
• borrowing

Methodology
Aims of the study: to examine productive morphological processes in English by means of studying formal neologisms
PART 1: Formal classification
PART 2: Semantic classification

Neologism detector tool
Functions:
1. compilation of the study corpus
2. neologism extraction based on the exclusion principle
3. neologism management

Neologism extraction process
• Study corpus
• Exclusion sources
• Neologism candidates
• Manual verification
• Neologism management

Study corpus size and content
• 14.3 million words
• newspaper articles and blogs published between 1st Jan. 2009 and 26th Oct. 2010
• daily broadsheets: The Daily Telegraph, The Times, The Guardian
• tabloids: The Sun, The Daily Mail
• almost 9,000 neologism candidates analyzed (out of ca. 73,000)
• 121 neologisms extracted (without borrowings)

Exclusion sources
Corpus:
• The British National Corpus (1991-1994)
General dictionaries:
• Oxford Advanced Learner’s Dictionary 7th Edition, OALD7 (2005)
• Merriam-Webster's Collegiate Dictionary 11th Edition, MW11 (2006)
• Macmillan English Dictionary 2nd Edition, MEDAL2 (2007)
• Cambridge Advanced Learner's Dictionary 3rd Edition, CALD3 (2008)
• Chambers 21st Century Dictionary, CH11 (2008)
Google:
• COBUILD
• Longman Dictionary of Contemporary English 5th Edition, LDOCE5 (2009)
• Dictionary.com
Slang dictionaries:
• The Oxford Dictionary of New Words (1991)
• The Probert Encyclopaedia of Slang (2004)
• The Concise New Partridge Dictionary of Slang and Unconventional English (2007)
• The Dictionary of Contemporary Slang (2007)
Word lists:
• proper names, geographical names

Neologism candidates analysis

Search engine

Neologism management 1

Neologism management 2

Neologism management 3

Formal classification of neologisms

Blends
• Twitterati (Twitter + glitterati)
• welectricity (wellingtons + electricity)
• retrotastic (retro + fantastic)
• girlicious (girl + delicious)
• Frankenfish (Frankenstein + fish)
• Obamarita (Obama + margarita)
• Holohoax (Holocaust + hoax)
• zeroflation (zero + inflation)

Semantic classification of neologisms

Semantic classification – examples
IT and communications technology: beatblogger, cyber-locker, datablog, Facebooker, gamification, iPad, to liveblog, to retweet
Politics and current affairs: Af-Pak, Muslimist, Obamanomics
Business and finance: infocapitalism, micro-employment, zeroflation
Entertainment: celebdom, fabby, lip-syncher, pet-set, retrotastic
Food and dieting: frankenfish, orthorexic

Problems
• impossible to detect semantic and syntactic neologisms
• alternative spelling, e.g. micro-blog, G & T
• items provided as examples in the exclusion sources not analyzed by NeoDet
• failure of the online exclusion sources to respond to the queries made by NeoDet
• overrepresentation of the Entertainment and News section in the study corpus

Conclusions
• formal neologisms as indicators of productive word formation processes
• confirmation of the status of affixation and compounding as the most popular methods of extending the lexicon
• blends as an important source of neologisms coined with the purpose of being witty, amusing and memorable
• the largest number of neologisms in the area of IT and communications technology

Thank you!
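
Appendix: the exclusion principle, sketched
The extraction step described in the slides (study corpus checked against exclusion sources to yield neologism candidates) can be illustrated with a minimal Python sketch. This is not NeoDet's actual code, which the slides do not show: the function names, the tokenization rule, and the toy data below are all assumptions made for illustration, and the exclusion sources are reduced to a single in-memory word set.

```python
import re
from collections import Counter

# Hypothetical tokenization rule (an assumption, not NeoDet's):
# alphabetic words, with internal hyphens kept as part of the token.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")


def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return TOKEN_RE.findall(text.lower())


def neologism_candidates(corpus_texts, exclusion_words):
    """Count every token that is absent from the exclusion word set."""
    known = {word.lower() for word in exclusion_words}
    counts = Counter()
    for text in corpus_texts:
        for token in tokenize(text):
            if token not in known:
                counts[token] += 1
    return counts


# Toy data standing in for the study corpus and the merged exclusion sources.
corpus = [
    "The Twitterati love to retweet every datablog.",
    "Her micro-blog covered zeroflation; the Twitterati noticed.",
]
exclusion = {"the", "love", "to", "every", "her", "covered", "noticed", "microblog"}

candidates = neologism_candidates(corpus, exclusion)
for word, freq in candidates.most_common():
    print(word, freq)
```

Everything that survives the filter is only a candidate and would still go through manual verification. In this toy run "micro-blog" is reported even though "microblog" is in the exclusion set, which mirrors the alternative-spelling problem listed under Problems.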