What is a corpus

Przemysław Kaszubski
IFA UAM Poznań
kprzemek@ifa.amu.edu.pl
The English Day (10 Dec., 2002)
Instytut Neofilologii
Państwowa Wyższa Szkoła Zawodowa w Koninie
The use of electronic text, or corpora, in the teaching of English
What is a corpus?
"a collection of pieces of language, selected and ordered according to explicit linguistic criteria in order to be used as a sample of the language" (Sinclair 1996 - EAGLES96).
naturally-occurring / authentic text/discourse (NOT citations)
usually machine-readable / electronic / computer-stored and -processable
compiled according to design criteria
(ideally) representative of the sampled language variety
Why bother with a corpus?
Expert speakers have only partial knowledge
Expert speakers think of what is possible
Expert speakers cannot quantify their knowledge
Expert speakers cannot make up natural examples
Corpus is more comprehensive and balanced
Corpus shows us what is common and typical
Corpus can give us fairly accurate statistics
Corpus can give us many natural examples
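The claim that a corpus can supply statistics can be illustrated with a minimal sketch. It assumes a tiny invented sample text and a deliberately crude tokenizer; the names SAMPLE_TEXT, tokenize and relative_frequencies are hypothetical, and a real study would load a large collection of authentic texts instead.

```python
import re
from collections import Counter

# Hypothetical mini-sample; a real corpus would hold many authentic texts.
SAMPLE_TEXT = (
    "The cat sat on the mat. The dog sat on the rug. "
    "A cat and a dog sat together."
)

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def relative_frequencies(tokens, per=1000):
    """Return each word's frequency normalised per `per` tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count * per / total for word, count in counts.items()}

tokens = tokenize(SAMPLE_TEXT)
counts = Counter(tokens)

# Raw counts show what is common in this sample.
print(counts.most_common(3))

# Normalised frequencies allow comparison between corpora of different sizes.
print(round(relative_frequencies(tokens)["the"], 1))
```

Normalising per 1,000 (or per million) tokens is what makes counts from a small corpus comparable with counts from a large one.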
Some basic types of (monolingual) corpora
written / spoken
general/reference
special(ised)
sample
monitor
Other corpora
bilingual and multilingual (comparable & parallel)
special / non-standard (e.g. child language)
(non-native) learner (or interlanguage)
Representativeness. Why are the design criteria important? (Meyer 2002)
whose language (range of text sources; time-frame; sociolinguistic variables: gender, age, education, dialect, social relationships)
production or reception
spoken / written medium
genre / text-type
general or specialised
size vs balance
sample size vs whole texts
Useful & reliable corpus annotation
POS-tagging
lemmatisation
ELT: some questions asked of corpora
does an item exist (in general; in a genre; typicality/variation)
are the synonyms interchangeable
typical lexical/grammatical context(s) for an item
teaching grammar through lexis
false-friends or true friends (bilingual corpora)
study of literary texts through concordancing
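The question of whether synonyms are interchangeable can be approached by comparing the words that typically follow each one. The sketch below assumes a few invented sentences and counts only the immediately following word; SAMPLE_TEXT, tokenize and right_collocates are hypothetical names, and the simple tokenizer ignores sentence boundaries.

```python
import re
from collections import Counter

# Invented sentences for illustration; not taken from a real corpus.
SAMPLE_TEXT = (
    "He made a big mistake. A large number of people came. "
    "It was a big mistake to leave. A large number of cars passed. "
    "She has a big surprise for you."
)

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def right_collocates(tokens, node):
    """Count the word immediately following each occurrence of `node`."""
    return Counter(
        tokens[i + 1]
        for i, tok in enumerate(tokens[:-1])
        if tok == node
    )

tokens = tokenize(SAMPLE_TEXT)
print("big:  ", right_collocates(tokens, "big").most_common())
print("large:", right_collocates(tokens, "large").most_common())
```

In this toy sample the two adjectives share no right-hand collocates, which is the kind of pattern that suggests near-synonyms are not freely interchangeable. A real investigation would need far more data and a wider context window.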
Pedagogical approaches to using corpora
teacher-controlled use of corpus-based resources (dictionaries, coursebooks, exercises)
data-driven-learning / classroom-concordancing
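Classroom concordancing relies on key-word-in-context (KWIC) lines, which learners inspect for patterns. The following is a minimal sketch of such a display, assuming an invented sample text; the names tokenize, kwic and format_kwic are hypothetical and do not reproduce how any of the concordancers listed in this handout work.

```python
import re

# Invented sentences for illustration; a class would use authentic texts.
SAMPLE_TEXT = (
    "I depend on my friends. Results depend on the sample. "
    "Do not depend upon luck."
)

def tokenize(text):
    """Split the text into simple word tokens, keeping the original case."""
    return re.findall(r"[A-Za-z']+", text)

def kwic(tokens, node, width=3):
    """Return (left context, node, right context) for every hit of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

def format_kwic(lines):
    """Right-align the left contexts so the node words form one column."""
    pad = max((len(left) for left, _, _ in lines), default=0)
    return [f"{left:>{pad}}  [{node}]  {right}" for left, node, right in lines]

tokens = tokenize(SAMPLE_TEXT)
lines = kwic(tokens, "depend", width=2)
for row in format_kwic(lines):
    print(row)
```

Aligning the node word in a single column lets learners scan the right-hand context and notice, for example, which prepositions follow "depend".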
Advantages of small (<0.5M) over large (>100M) corpora (Aston 1997)
easier to manage
more fully analysable
easier to become familiar with
easier to interpret
easier to construct
easier to reconstruct
more clearly patterned
limits are clearer
Where are the corpora? Where are the tools? [Some demos.]
Free online corpora access
English
BNC Online Sampler: http://sara.natcorp.ox.ac.uk/lookup.html
COBUILD Concordance & Collocation Sampler: http://titania.cobuild.collins.co.uk/form.html
WebCorp: http://www.webcorp.org.uk/
Polish
Korpus PWN: http://korpus.pwn.pl/
Pseudo-korpus IPI-PAN: http://www.ipipan.waw.pl/~corpus/
Free online text resources for offline research
English:
Project Gutenberg: http://www.promo.net/pg/
Oxford Text Archive: http://ota.ahds.ac.uk/
Miscellaneous online sources (press, encyclopedias)
Polish
PWN links: http://slowniki.pwn.pl/korpus/linki.php
Polish-English
Bilingual documents: http://www.zbiordokumentow.pl/
Other affordable text resources
CD-ROMs (encyclopedias, newspapers)
Free tools for offline use
concordancer: Concordancer for Windows: http://www.linglit.tu-darmstadt.de/wconcord.htm
other:
XCLOZE & CONTEXTS: http://web.bham.ac.uk/johnstf/timcall.htm
TestBuilder: <soon available, e-mail me for info>
Conclusions: contributions of corpora worth remembering
Lexical and lexicogrammatical access to text
Context
Supersede human intuition about commonness/variation
Anyone can do research
WWW info on corpora (selection):
David Lee's Bookmarks for Corpus-Based Linguists: http://devoted.to/corpora
M. Barlow's Corpus Linguistics page: http://www.ruf.rice.edu/~barlow/corpus.html
Tim Johns CALL Page: http://web.bham.ac.uk/johnstf/timcall.htm
P. Kaszubski's (Learner) corpora page: http://main.amu.edu.pl/~przemka
PELCRA Home Page (Polish-English Language Corpora for Research and Applications): http://www1.uni.lodz.pl/pelcra/index.htm
Recommended books & articles:
Kennedy, G. 1992. "Preferred ways of putting things with implications for language teaching". In: Svartvik, J. (ed.). 1992. Directions in corpus linguistics. Proceedings of the Nobel Symposium 82. The Hague: Mouton de Gruyter. 335-373.
Partington, A. 1998. Patterns and meaning: using corpora for English language research and teaching. Amsterdam: John Benjamins Publishing Company.
Tribble, C. & Jones, G. 1997. Concordances in the classroom: using corpora. A resource guide for teachers [new edition]. Houston, TX: Athelstan.
Wichmann, A., Fligelstone, S., McEnery, T. & Knowles, G. (eds). 1997. Teaching and Language Corpora. London & New York: Longman.