Annual IVACS One-Day Symposium. School of Education, Queen’s University Belfast. Thursday, January 23rd, 2014 The Applications of Corpus Linguistics for research in Education, Teaching and Learning Provisional Programme From 9.00 Arrival (Tea and Coffee) 9.15-9.30 Welcome Address 9.30-10.00 Responding to task: chaotic uniqueness in philosophy undergraduate essays James Binchy 10.00-10.30 Dealing with Dirty Data Ivor Timmis 10.30-11.00 The Project of the Corpus of Learner English in Malta (CLEM): compiling the corpus and decisions on the error-tagging system. Odette Vassallo COFFEE BREAK: 11.00-11.30 11.30-12.00 Collocation and the learner: wading into the depths Michael McCarthy 12.00-12.30 A comparison of Native and Non-native speaker undergraduate presentations in the CLAS Corpus Margaret Healy, Kristin Horan, Anne O’Keeffe 12.30-1.00 ‘When some pieces are missing’: The construction of non-native spoken discourse with a limited set of pragmatic markers Òscar Bladas and Aisling O’Boyle LUNCH BREAK: 1.00-2.30 (Sandwich lunch provided) 2.30-3.00 Confidence through Corpora: Supporting native speaker EAP with a corpus of student-generated texts Megan Bruce 3.00-3.30 Focus group data investigating ESOL speakers’ talk Róisín Ní Mhocháin 3.30 Round up Abstracts Responding to task: chaotic uniqueness in philosophy undergraduate essays James Binchy, Mary Immaculate College Essay writing is used as a common form of assessment for students engaged in third-level education. It is often the only form of assessment in a module. In the writing that students produce for their assessments, they are often expected by those correcting the essays to appropriate the norms of both academic writing and the norms of the particular discipline. However, at the same time, each essay must be unique and were it not to be, it would be considered plagiarism. This must be done while responding to a task set by the lecturer. This paper uses a corpus of 94 undergraduate student essays collected throughout the philosophy degree programme of one cohort of students, including both the first and the final essay assignment in the degree programme. The corpus shows that each student responds to the task, yet does so in a unique manner. The corpus also shows that for each pattern there is successful student who did not follow that pattern. This paper will examine the corpus both quantitatively and qualitatively to show that individual writer choices with regard to lexis can seem chaotic despite similarity of task. ************ Dealing with Dirty Data Ivor Timmis, Leeds Metropolitan University In this paper I will discuss some of the challenges involved in working with historical data, with a particular focus on historical sociolinguistics. The data I will discuss is taken from the Bolton/Worktown corpus, a corpus of conversations which took place between working class people in Bolton, an industrial town in the North of England, between 1937 and 1940. I will consider two specific aspects of the challenge this kind of data poses: 1) How far can we judge whether the data, for the most transcribed ‘live’ by nonlinguists, is ‘authentic’? 2) Is it possible to apply social categories such as ‘working class’ retrospectively? In relation to the question of authenticity, I will ask whether it is legitimate to speak of degrees of authenticity, and whether professionally and locally informed intuition has a legitimate role to play in making judgements on authenticity. In relation to social categorisation, I will argue that social history can come to the aid of corpus linguistics to help us meet this aspect of the challenge of historical sociolinguistics. ************ The Project of the Corpus of Learner English in Malta (CLEM): compiling the corpus and decisions on the error-tagging system. Odette Vassallo, University of Malta Malta, where Maltese and English have official status, has long been considered a bilingual country. There is a growing national concern based on anecdotal evidence that the standard of English is in decline, especially among learners. To date, no attempt has been made to substantiate or refute this perception through more concrete evidence. This paper presents a work-in-progress of a project set up to address this information gap. It outlines the compilation of the Corpus of Learner English in Malta (CLEM) which is a project funded by the University of Malta (ENGRP02-02/03) launched to create a baseline study for present and future investigation of learner English in Malta. To date, CLEM contains 447,925 words and consists of 985 language essays in English written by Maltese learners collected from examination scripts representing two national benchmarks. The web-based corpus analysis system CQPweb was selected for CLEM. Examination scripts were keyed in and the corpus was tagged for parts-of-speech (POS) adopting the Penn Treebank tagset. CLEM has now secured more texts and is set to expand and include the national benchmark of younger learners and of university students. The paper also presents the dilemma the project team currently faces in the selection of the right error-tagging system. ************ Collocation and the learner: wading into the depths Michael McCarthy In most vocabulary teaching, there is understandably an emphasis on increasing the size of the learner’s lexicon, to help learners over the 2-3,000 word threshold that facilitates successful everyday communication. While the size (breadth) of a learner’s vocabulary is crucial, what learners know about the behaviour of the words in their lexicon (depth of vocabulary knowledge) becomes more and more important as they strive to achieve more natural-sounding communication. However, collocation can remain problematic even for higher-level learners. I use evidence from the error-coded segment of the Cambridge Learner Corpus to examine three persistent problem areas under the general heading of collocation: (1) binomial ordering, where problems with the ordering of fixed binomial expressions (e.g. safe and sound, peace and quiet) persist as learners move up the CEFR levels, (2) tautological collocations, where (near-)synonymous words are collocated in unexpected ways (e.g. a stench smell, urban cities), which also persist up the levels and (3) delexical verb collocations (verbs such as make, take, get and their complements), where progress in depth of knowledge can be observed as learners move up the levels but where interesting problems remain, even at the highest levels. ************ A comparison of Native and Non-native speaker undergraduate presentations in the CLAS Corpus Margaret Healy, Mary Immaculate College Kristin Horan, Shannon College of Hotel Management Anne O’Keeffe, Mary Immaculate College This paper will use the Cambridge Limerick and Shannon Corpus (CLAS), a one-million word database of spoken academic English collected at the Shannon College of Hotel Management. These data are set in an interesting language context where roughly half of the cohort is native speakers of English and half is non-native speakers of English. In this paper, we will isolate the context of student presentations, in one subject area, Hotel Management Information Systems, and we will examine in detail the presentations of native and non-native speakers of English. Our analysis will begin by comparing word frequency lists and patterns of collocation but we will also look at the differing identities which belie what is said in different ways. ************ When some pieces are missing: The construction of non-native spoken discourse with a limited set of pragmatic markers Òscar Bladas, University of Barcelona Aisling O’Boyle, Queen’s University Belfast Pragmatic markers are key pieces of spoken discourse. Literature in second language acquisition shows that non-native speakers tend to underuse these particles or to use them differently from native speakers. However, most studies in the field pay little or no attention to the consequences of underusing pragmatic markers in terms of discourse organisation. Hence it remains unclear how non-native speakers manage to build coherent second language discourse with a lower number of these particles. This presentation aims to shed light on this issue by comparing both quantitatively and qualitatively L3 English discourse produced by L1 Catalan and L1 Spanish speakers with discourse produced by native English speakers. The analysis suggests that non-native speakers may, in fact, use a significant amount of PMs, though in a different way to that of native speakers. ************ Confidence through Corpora: Supporting native speaker EAP with a corpus of studentgenerated texts Megan Bruce, Durham University Foundation Centre Durham University Foundation Centre has a student cohort comprising both home and international students. Over the years, we have started to deliver EAP modules to the home students alongside their international counterparts and have found that we need to adopt some different strategies in order to support our home students in acquiring the academic writing skills they need. This talk explores one initiative that we have established to induct our students into the Community of Practice of their progressing department: a Foundation Corpus known as the FOCUS project. This project began with us building a corpus of student-generated texts from different academic disciplines which students can search to help them learn how language is really used in their subject area. Alongside the development of the corpus, we are designing a suite of self-access activities to allow students to make the best use of the corpus data to improve their own writing and use of language. This talk focuses in particular on how we have decided to teach some EAP skills through the searching of corpus data rather than explicit use of grammatical metalanguage, which is a less threatening approach for our home students who have little or no background grammatical knowledge, and illustrates it with examples of the approach. ************ Focus group data investigating ESOL speakers’ talk Róisín Ní Mhocháin, Bath Spa University This paper involves an initial analysis of focus group data of ESOL speakers in Ireland. The participants were resident in Ireland and were not at the time enrolled in language classes. The focus group discussion revolved around their experiences of conversation in Ireland and how previous language education classes benefited, or not, in conversational situations. This initial analysis, carried out using Wmatrix as a starting point, aims to explore the resultant talk by quantifying recurring linguistic features with the added expectation of identifying areas for further in-depth qualitative analysis. The learning from the initial Wmatrix analysis and its relevance for trainee and in-service language teachers will be discussed. ************