Using English Language Corpora in the ESL Classroom
I-TESOL Conference
October 12th, 2012
Brent A. Green
Salt Lake Community College

Introduction
• Personal and professional interests in using corpora in language teaching
• Goals of Workshop: Participants will learn how to access and use on-line written and spoken English corpora to help them prepare course materials and assessments, increase understanding of English language structures, and engage students in data-driven learning tasks.

Basics
• What is a corpus?
– A large database of language
• What is a concordancer?
– A software program that allows you to search the database for particular words or phrases
• What is classroom concordancing?
– A teaching approach in which concordance data are used in the language classroom to help learners notice and practice language patterns and use. This teaching approach is sometimes referred to as Data-driven Learning (DDL). Learners are driven by authentic language data, presented in the form of concordance lines, to act as “linguistic detectives” and find answers to their linguistic queries (Johns 1988; 1991 a, b)

Basics
• What are concordance lines?
– Examples of words or phrases presented so that the words or phrases under investigation are aligned in the middle of the page with their left and right contexts (often referred to as KWIC format).
Key Word in Context (KWIC)

KWIC
• Example of KWIC from the Corpus of Contemporary American English (COCA)
– www.americancorpus.org

Three-Dimensional Framework
Form (Syntax)
Meaning (Semantics)
Use (Pragmatics)
Larsen-Freeman 1991

What do we look for?
• Lexicography
– What are the meanings associated with a particular word?
– What is the frequency of a word relative to other related words?
– What non-linguistic association patterns does a particular word have (e.g. to registers, historical periods, dialects)?
– What words commonly co-occur with a particular word, and what is the distribution of these “collocational” sequences across registers?
– How are the senses and uses of a word distributed?
– How are seemingly synonymous words used and distributed in different ways?
(Biber et al., 1998)

What do we look for?
• Grammatical structures (if or that clauses, causatives, etc.)
• Discourse functions (making suggestions, introducing a speaker, etc.)

How does one begin examining corpus data?
• You need the following:
1. A language-related question which arises out of the text, your own observations or curiosity, or the observations and curiosity of your students.
2. A corpus of language that contains contexts which are similar to your learners’ target language learning domains.
3. Pedagogically sound principles for accessing and applying corpus data.

Corpus-based Research
1. Research question
2. Extensive review of the literature
3. Summary of experts across form, meaning, and use categories
4. Comparison of experts against spoken and written corpora
5. Reformulation and expansion of existing frameworks

Corpus-based Teaching
• Syllabus design and evaluation
– Student-based corpora
– Student texts
• Material preparation
• Teacher-student collaboration
• Student research
• Assessments

THE CORPUS OF CONTEMPORARY AMERICAN ENGLISH (COCA)
• What is it?
– The Corpus of Contemporary American English (COCA) is the largest freely-available on-line corpus of English
• Who created it?
– It was created by Mark Davies of Brigham Young University in 2008
• How many words does it contain?
– The corpus contains more than 450 million words of text and is equally divided among spoken, fiction, popular magazines, newspapers, and academic texts.
» Information adapted from http://corpus.byu.edu/coca/

THE CORPUS OF CONTEMPORARY AMERICAN ENGLISH (COCA)
• What type of searches can I do with COCA?
– The interface allows you to search for exact words or phrases, wildcards, lemmas, part of speech, or any combination of these. You can search for surrounding words (collocates) within a ten-word window.
– The corpus also allows you to easily limit searches by frequency and compare the frequency of words, phrases, and grammatical constructions.
– Information adapted from http://corpus.byu.edu/coca/

THE CORPUS OF CONTEMPORARY AMERICAN ENGLISH (COCA)
What else can you do?
You can also easily carry out semantically-based queries of the corpus. For example, you can contrast and compare the collocates of two related words to determine the difference in meaning or use between these words. You can find the frequency and distribution of synonyms for nearly 60,000 words, compare their frequency in different genres, and use these word lists as part of other queries. Finally, you can easily create your own lists of semantically related words, and then use them directly as part of the query.
Information adapted from http://corpus.byu.edu/coca/

Corpus-based Practice
• Before you look for the collocates of each of the words deep, run, smile, and fairly, what would you guess are the best collocates (in other words, the surrounding words that really help to "define" these words)?
• Are there any surprises in what you see in the corpus?

Corpus-based Practice
• Compare the collocates of the two words democrats and republicans, according to these texts (from newspapers, magazines, TV talk shows, etc.).
• Any possible media bias here?

Corpus-based Practice
• Compare the frequency of second vs. secondly in academic texts. Which one would you guess is more frequent?
• What issues do we have when we make this comparison?

Corpus-based Practice
• Compare the adjectives used to describe women and men.
• Does this reflect biases in contemporary American culture?
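
Corpus-based Practice: trying it offline
• The collocate and frequency comparisons above can also be tried on a small text of your own. The Python sketch below is only an illustration under stated assumptions: the function names (tokenize, collocates, per_million, kwic) and the sample text are invented for this example, the tokenizer is deliberately crude, and nothing here reproduces how the COCA interface computes its results. It counts the words within a window around a node word, normalizes frequencies per million words so that texts of different sizes can be compared, and prints KWIC lines.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a text and split it into rough word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def collocates(tokens, node, window=4):
    """Count the words found within `window` tokens on either side of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            counts.update(left + right)
    return counts

def per_million(tokens, word):
    """Frequency of `word` normalized per million tokens."""
    return tokens.count(word) / len(tokens) * 1_000_000

def kwic(tokens, node, width=3):
    """Build KWIC lines: right-aligned left context, node word, right context."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>25}  {tok}  {right}")
    return lines

# Invented sample text (an assumption for illustration, not corpus data).
sample = (
    "She gave a deep sigh and took a deep breath. The river runs deep here. "
    "Second, we review the data. Secondly, we compare results. "
    "Second, we report findings."
)
tokens = tokenize(sample)

# Immediate neighbours of "deep" (one word to the left and right).
print(collocates(tokens, "deep", window=1).most_common(3))

# Normalized frequencies of the two connectors.
print(round(per_million(tokens, "second")), round(per_million(tokens, "secondly")))

# Concordance lines with the node word aligned in the middle.
for line in kwic(tokens, "deep"):
    print(line)
```

• One issue raised by the second vs. secondly task shows up here too: a plain word match counts every use of second (ordinal, unit of time, verb), not only the sentence-initial connector, so part-of-speech or position information is needed for a fair comparison.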
Corpus-based Practice
• Using the web interface, you can search by
– Words: malignant
– Phrases: nooks and crannies or faint + noun (faint [n*])
– Lemmas (all forms of a word, like sing ([sing]) or tall ([tall]))
– Wildcards (un*ly or r?n*)
– More complex searches: un-X-ed adjectives (un*ed.[j*]) or verb + any word + a form of ground ([vv*] * [ground])

Types of Concordance-based Tasks
| Teacher-centered | Collaborative | Learner-centered |
|---|---|---|
| The teacher selects words or phrases to be investigated, usually taken from observations or information presented in the course text. | The teacher and the learners agree on the language to be studied. | The learners form their own questions. |
| The teacher retrieves and selects concordance lines, and designs concordance-based tasks with different degrees of control. | The teacher and the learners browse the corpus and examine the language data together. | The learners browse the corpus independently. There is no structured or controlled task. |
| The teacher provides clues and hints to help learners complete concordance tasks, or guides learners to a generalization or conclusion. | The teacher comments on and helps refine the learners' generalizations. | There is very little interference from the teacher in the generalization process. |
Adapted from Sripicharn 2003

Teacher-Centered Tasks
• Example #1
– Used to and would in the habitual past
• On Your Own
– Hedges (kind of, sort of, like)
– Say, talk, tell

Erades (1943)
• It may be safely said that in language a difference of form always corresponds to a difference in meaning and whenever more than one construction is, theoretically, possible, they never wholly and under all circumstances denote the same thing. The first axiom of all valid linguistic thinking is that in language nothing can serve as a substitute for something else.

Would vs. Used to Example
• Briefly discuss the differences between the two sentences with a partner
(a) My father used to exercise every morning
(b) My father would exercise every morning
• One difference is that (a) can signal only habitual past action whereas (b) can also be conditional given appropriate context (e.g. “If he had time”).

Would vs. Used to Example
• Steps
– Think about the context in which the structure occurs
  • personal narrative
– Find corpus data that matches that context
  • American Dreams (Studs Terkel)
  • Switchboard
– Search for target structures using a concordancing program
  • MonoConc

Would vs. Used to Example
• Steps cont.
– Look for patterns in form, meaning, and use
  • In what ways, if any, are the forms the same or different?
  • In what ways, if any, are the meanings different or similar? (look carefully at surrounding context)
  • In what ways, if any, are the structures used differently? (look carefully at surrounding context)
– Create sample worksheets or tests for students

MICASE Corpus
• How many words?
– Approximately 1.8 million words (190 hours)
• What is the focus?
– Contemporary university speech within the University of Michigan, in Ann Arbor, Michigan.
• Who are the speakers?
– Speakers represented in the corpus include faculty, staff, and all levels of students, and both native and non-native speakers.

MICASE Corpus
• What are the speech events?
– The speech events included in the corpus are: small and large lectures (62), public interdisciplinary or departmental colloquia (13), discussion sections (9), student presentations (11), seminars (8), undergraduate lab sessions (8), lab group and other meetings (6), one-on-one tutorials (3), office hours (8), advising consultations (5), dissertation defenses (4), study groups (8), interviews (3), campus/museum tours (2), and service encounters (2).

On Your Own: Teacher-centered Task
• Say, Talk, or Tell
– Characteristics
  • Transitive vs. intransitive vs. ditransitive
  • Used in spoken language
  • Idiomatic expressions
– Tasks
  • Search MICASE for tokens of these forms
    – Cut and paste example sentences from MICASE into MS Word.
      » Ask learners to examine the forms
      » Assess learners' ability to get the forms correct
  • Search for idiomatic expressions
    – Cut and paste examples of idiomatic forms
    – Ask learners key questions about the examples

Example of Teacher-centered Tasks
Sample sentences and Idioms

Example of a collaborative task from MICASE
(Hartmann & Blass, 2000)

MICASE Search
• Click on the link below to begin your search
MICASE
• Using the form, meaning, and use handout, take notes on our discussion of softening phrases such as I think, In my opinion, It seems to me. Others?

Learner-centered
• The learners form their own questions
• The learners browse the corpus independently; there is no structured or controlled task
• There is very little interference from the teacher in the generalization process
• Now it is your turn to answer those structure-related questions that have been bothering you for years!
• MICASE
• Corpus of Contemporary American English (COCA)

Other tasks
• Utilizing the audio features
• Browsing the corpus to find specific speech events
• MICASE activities for learners

Two examples of a student with a TA during office hours
S1: okay
S2: you feel th- as though you're in a lab or, [S1: yeah almost ] <LAUGH> it's a little a little bit a little bit odd. okay. uh, the reason i asked you to come in is that, i- i'm looking at the grades and i'm looking at at this paper and, you're at the point where i don't want you to, fall off the edge. uh and and get a grade that's not gonna be, supportive. it seems to me that you know that you've been in touch with things in the class and that i, i liked what you did with your poem to change it which wasn't_ which must have involved a fair amount of work.
[S1: (i don't know) ] to, you know to get that in a different order and to get the system ba- was it a lot of work?
S1: mm, it wasn't too much it didn't take me too long to just, use the same word i just, i'd say the hardest part yeah was changing the sentences. trying to make 'em all fit again. [S2: okay ] but it wasn't too bad.
S2: okay. but the rhythm seemed to work right and, [S1: mhm ] it it really did, come out to be a sus- sestina and one of the effects of the sestina is that, since you're using those words over and over again they they tend to acquire different meanings they tend to to just, they sound different in different combinations [S1: mhm ] and and they mean something. but let's look at this [S1: kay ] um, because i think that that part of what's happening here, is that is that you're using a lot of words where few words would work. where you don't really need that that many words to say what you want to to say. and there are some cases where you're where you're looking, or where you seem to be saying something um, and i think i know what i know what you want to say, but because you've sort of, you've given me more than than i need you're really disguising the meaning [S1: mkay ] rather than bringing the meaning out. so that, if y- if you look at this sentence and if you just r- read that sentence aloud.
(R. C. Simpson, S. L. Briggs, J. Ovens, and J. M. Swales, 2002)

Victor: Do you have a few minutes?
Pam: Sure, I’m Pam.
Victor: I’m Victor
Pam: Hi Victor. Have a seat. How can I help you?
Victor: Well I’m in Dr. Sears’ American Lit class…and I’m having a lotta trouble with that poetry unit. I’m thinking of dropping the class
Pam: Oh. I hate to tell you, but Friday was the last day to drop.
Victor: Oh no. I knew I should have dropped last week.
Pam: Well, it’s all right. Let’s see what we can do to get you through the class. Guess literature isn’t your thing, huh?
Victor: It’s just this unit on poetry. I did okay with short stories.
Pam: What’s giving you problems?
Victor: I just don’t get a lot of this modern stuff. It just doesn’t seem like poetry to me.
Pam: What exactly bothers you?
Victor: I understood the poems by Robert Frost and Maya Angelou, but the poems in last night’s homework don’t rhyme or have rhythm or anything.
(Hartmann & Blass, 2000)

Favorite Corpus Web Site
• Michael Barlow’s Corpus Linguistics Site
http://www.athel.com/corpus.html

Other Links
• Spoken Corpora
– MICASE: R. C. Simpson, S. L. Briggs, J. Ovens, and J. M. Swales. (2002). The Michigan Corpus of Academic Spoken English. Ann Arbor, MI: The Regents of the University of Michigan.
– Linguistic Data Consortium, University of Pennsylvania
– The Corpus of Contemporary American English, Mark Davies, Brigham Young University
– American National Corpus
– British National Corpus, also available through Mark Davies' corpus website

References
• Spoken Language Resources
– Bygate, M. (1998). Theoretical perspectives on speaking. Annual Review of Applied Linguistics, 18, 20-42.
– Burns, A. (1998). Teaching speaking. Annual Review of Applied Linguistics, 18, 102-123.
– Burns, A. & Joyce, H. (2002). Focus on speaking. Sydney: National Center for English Language Teaching and Research.
– McCarthy, M. (1998). Spoken language & applied linguistics. Cambridge: Cambridge University Press.
– Celce-Murcia, M., & Larsen-Freeman, D. (1999). The grammar book: An ESL/EFL teacher's course (2nd ed.). Boston, MA: Heinle & Heinle.

References
• Corpus Linguistics Texts
– Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: Investigating language structure and use. Cambridge: Cambridge University Press.
– Partington, A. (1998). Patterns and meanings: Using corpora for English language research and teaching. John Benjamins.
– Tribble, C. & Jones, G. (1997). Concordances in the classroom: Using corpora. A resource guide for teachers [new edition]. Houston, TX: Athelstan.

References
• MICASE Tips and Tutorials
• Other References
– Erades, P. A. (1943). The case against provisional It. English Studies, 25, 169-176.
– Hartmann, P. & Blass, L. (2000). Quest: Listening and speaking in the academic world, Book 3. New York: McGraw Hill.
– Johns, T. F. (1988). Whence and whither classroom concordancing? In T. Bongaerts, P. de Haan, S. Lobbe, & H. Wekker (eds.), Computer applications in language learning (p. 9-27). USA: Foris Publications.
– Johns, T. F. (1991). Should you be persuaded: Two examples of data-driven learning. In T. F. Johns & P. King (eds.), ELR Journal Vol. 4: Classroom concordancing (p. 27-46). Birmingham CESL: The University of Birmingham Press.
– Johns, T. F. (1997). Contexts: The background, development, and trialling of a concordance-based CALL program. In A. Wichmann, S. Fligelstone, T. McEnery, & G. Knowles (eds.), Teaching and language corpora. London: Longman.
– Riggenbach, H. (1999). Discourse analysis in the language classroom: Vol. 1. The spoken language. Ann Arbor, MI: University of Michigan Press.
– Sripicharn, P. (2003). Implementing collaborative concordancing between teacher and learners in the writing class. Paper presented at the 5th CULI International Conference, Bangkok, Thailand.