Pedagogic uses of a corpus of student writing and their implications for sampling and annotation
Alois Heuboeck
University of Reading, UK

The British Academic Written English (BAWE) corpus of student writing
Project in progress at the universities of Reading, Warwick and Oxford Brookes
Funded by the Economic and Social Research Council (project no. RES-000-23-0800)

Outline
• Corpora in LT: uses and purposes
• Accessing corpus information: interfaces
• Building corpora: requirements and decisions (the BAWE corpus)

Using corpora in language pedagogy
Pedagogic uses: classroom materials; description
Purposes: “motivational”; “linguistic”

Interfaces (1): the concordance
Typical query options:
• word form
• lemma
• wildcards (e.g. “investigat*”)
• grammatical (e.g. POS)
• patterns

Information & interfaces (2)
Statistics:
• frequencies, ratios
• e.g. word list, key words
• ad hoc statistics
Corpus items:
• macrostructural properties and choices
• generic types, e.g. CARS model (Swales 1990)

Requirements: a “good corpus” for language pedagogy
• Representative: target variety
• Relevant: information, annotation
• Usable: e.g. interface, size

Representativeness
The corpus as a representative sample should reflect:
• distribution and quantitative relations (quantitative representativeness)
• range of features (qualitative representativeness)
These two are conflicting principles.

Figure: disciplines (24): Linguistics, Classics, Archaeology, History of Art, Physics, Business, Politics, Anthropology, Publishing, Medicine, Meteorology, Mathematics, Computer Science, Engineering, Biochemistry, Agriculture, Food Sciences, Health & Social Care, Chemistry, History, Law, Biological Sciences, English, Sociology

A trade-off: stratified sampling
• Frame 1: the university: corpus total Σ = 3,072 assignments
• Frame 2: 4 disciplinary groups (AH, PS, LS, SS), 768 assignments each
• Frame 3: 4 × 6 disciplines, 128 assignments per discipline
• Frame 4: 4 levels, 32 assignments each

Representativeness (2): the BAWE corpus

Relevance
• Relevant information in corpus
• Significant query
• Corpus annotation
• Features: lexicogrammatical, structural, etc.

Relevance (2): features annotated in the BAWE corpus
• “grammatical”
• textual: structure of “running text”
• typographical (layout)
• metatextual: numbering
• other “interesting” features

Corpus size
“For the pedagogical analysis of many common grammatical phenomena a full-size research corpus is much too large.” (Osborne 2000)
• Modularity: subcorpora
• Specialised corpora

Conclusion: 3 views
• Qualitative vs. quantitative representation: the corpus as a representation of a (set of) target variety/varieties
• Corpus annotation and interfaces: querying instances of lexicogrammatical (etc.) features and phenomena
• Corpus size: modularity; balanced samples of target variety/varieties

Contact: Alois Heuboeck, a.heuboeck@reading.ac.uk
The British Academic Written English corpus: http://www.warwick.ac.uk/go/BAWE/overview
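The concordance query options listed under "Interfaces (1)" can be illustrated with a minimal keyword-in-context (KWIC) sketch. It covers only word-form and wildcard queries such as “investigat*”; lemma, POS and pattern queries would need an annotated corpus. The tokenizer, the function names and the sample sentence are hypothetical and do not describe any actual BAWE interface.

```python
import re
from typing import List, Tuple


def wildcard_to_regex(query: str) -> "re.Pattern[str]":
    # "*" stands for any run of word characters; everything else is literal.
    parts = [re.escape(part) for part in query.split("*")]
    return re.compile(r"\w*".join(parts), re.IGNORECASE)


def tokenize(text: str) -> List[str]:
    # Crude tokenizer (an assumption): runs of word characters, or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)


def concordance(text: str, query: str, width: int = 3) -> List[Tuple[str, str, str]]:
    """Return (left context, hit, right context) for every token matching query."""
    tokens = tokenize(text)
    pattern = wildcard_to_regex(query)
    lines = []
    for i, token in enumerate(tokens):
        if pattern.fullmatch(token):  # the whole token must match the query
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


if __name__ == "__main__":
    sample = ("We investigated the samples. The investigation was thorough, "
              "and investigators agreed.")
    for left, hit, right in concordance(sample, "investigat*"):
        print(f"{left:>20}  [{hit}]  {right}")
```

On the sample sentence the query “investigat*” retrieves “investigated”, “investigation” and “investigators”, each centred with up to three tokens of context on either side, which is the layout learners usually see in a concordance.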
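For the statistics listed under "Information & interfaces (2)", a word list is a frequency count, and key words are usually found by comparing a study corpus against a reference corpus. The slide does not name a keyness measure. The sketch below assumes the log-likelihood statistic G² = 2 Σ O ln(O/E), one common choice, where O is a word's observed frequency in each corpus and E its expected frequency if both corpora used it at the same rate. The function names are illustrative.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def word_list(text: str) -> Counter:
    # Frequency list of lower-cased alphabetic word forms (digits and "_" excluded).
    return Counter(re.findall(r"[^\W\d_]+", text.lower()))


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 for a word with frequency a in a study corpus of c tokens
    and frequency b in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    if a > 0:  # the term 0 * ln(0) is taken as 0
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def key_words(study: Counter, reference: Counter, top: int = 10) -> List[Tuple[str, float]]:
    """Words relatively more frequent in study than in reference, ranked by G2."""
    c, d = sum(study.values()), sum(reference.values())
    scores = {}
    for word, a in study.items():
        b = reference.get(word, 0)
        if a / c > b / d:  # keep "positive" key words only
            scores[word] = log_likelihood(a, b, c, d)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


if __name__ == "__main__":
    study = word_list("data data data model result")
    reference = word_list("the result of the study the data")
    print(key_words(study, reference))
```

A word used at exactly the same relative rate in both corpora scores 0; the score grows as the two rates diverge. Filtering on `a / c > b / d` keeps only words that are overused in the study corpus, which is what a key-word list for learners normally shows.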
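The stratified sampling figures are internally consistent. 3,072 assignments divide into 4 disciplinary groups of 768, each group into 6 disciplines of 128, and each discipline into 4 levels of 32, so every frame uses equal allocation (4 × 6 × 4 × 32 = 3,072). The short sketch below derives the per-stratum quotas top-down and raises an error if a frame does not divide evenly. The function name is hypothetical.

```python
from typing import List, Tuple

# Sampling frames below Frame 1 (the university), as (name, number of strata).
FRAMES = [
    ("disciplinary groups", 4),
    ("disciplines per group", 6),
    ("levels per discipline", 4),
]


def stratified_quotas(total: int, frames: List[Tuple[str, int]]) -> List[Tuple[str, int, int]]:
    """Split total equally at each frame; return (frame, strata, quota per stratum)."""
    quotas = []
    size = total
    for name, n_strata in frames:
        if size % n_strata:
            raise ValueError(f"{size} cannot be split evenly into {n_strata} {name}")
        size //= n_strata
        quotas.append((name, n_strata, size))
    return quotas


if __name__ == "__main__":
    for name, n, quota in stratified_quotas(3072, FRAMES):
        print(f"{n} {name}: {quota} assignments each")
```

Run as a script, it prints "4 disciplinary groups: 768 assignments each", "6 disciplines per group: 128 assignments each" and "4 levels per discipline: 32 assignments each". Equal allocation gives every discipline and level the same weight (qualitative coverage) rather than mirroring how many assignments each actually produces, which is the trade-off the slide refers to.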
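The "Modularity: subcorpora" point under "Corpus size" can be sketched as drawing a smaller sample that stays balanced: up to k texts from each discipline-and-level cell, optionally restricted to some disciplinary groups. The `Assignment` record, its field names and the level numbering are hypothetical and do not reflect the BAWE corpus's actual metadata scheme. The group codes are taken from the sampling slide.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Assignment:
    # Hypothetical metadata record; field names are illustrative only.
    text_id: str
    group: str       # disciplinary group code, e.g. "AH", "PS", "LS", "SS"
    discipline: str
    level: int       # level index (assumed 1-4)


def balanced_subcorpus(
    items: Iterable[Assignment],
    per_cell: int,
    groups: Optional[Set[str]] = None,
    seed: int = 0,
) -> List[Assignment]:
    """Draw up to per_cell texts from each (discipline, level) cell."""
    cells: Dict[Tuple[str, int], List[Assignment]] = defaultdict(list)
    for item in items:
        if groups is None or item.group in groups:
            cells[(item.discipline, item.level)].append(item)
    rng = random.Random(seed)  # a fixed seed makes the subcorpus reproducible
    sample: List[Assignment] = []
    for key in sorted(cells):
        pool = cells[key]
        sample.extend(rng.sample(pool, min(per_cell, len(pool))))
    return sample
```

Sampling within each cell, rather than from the whole pool, keeps the subcorpus balanced in the same way as the full corpus. If a cell holds fewer than `per_cell` texts, all of them are kept.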
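The features listed under "Relevance (2)" (textual structure, typography, metatextual numbering) become queryable only once they are marked up. The minimal sketch below uses Python's `xml.etree.ElementTree` on a hypothetical TEI-like fragment. The element names (`div`, `head`, `num`, `p`, `hi`), the `rend` values and the sample text are assumptions chosen for illustration, not the BAWE corpus's documented markup.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical annotated fragment: section structure, heading numbers, typography.
SAMPLE = """<text>
  <div type="section">
    <head><num>1</num> Introduction</head>
    <p>This essay examines <hi rend="italic">laissez-faire</hi> policy.</p>
  </div>
  <div type="section">
    <head><num>2</num> Method</head>
    <p>Data were collected in <hi rend="bold">two phases</hi>.</p>
  </div>
</text>"""


def headings(root: ET.Element) -> List[Tuple[Optional[str], str]]:
    """(number, heading text) for every <head>; the number comes from <num>, if present."""
    out = []
    for head in root.iter("head"):
        num = head.find("num")
        if num is not None:
            # Heading text is taken as the text directly following <num>.
            out.append((num.text, (num.tail or "").strip()))
        else:
            out.append((None, "".join(head.itertext()).strip()))
    return out


def typography(root: ET.Element) -> Counter:
    """Count typographically highlighted spans by their rend value."""
    return Counter(hi.get("rend") for hi in root.iter("hi"))


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(headings(root))
    print(typography(root))
```

Keeping numbering in its own element, apart from the heading text, lets a query separate metatextual numbering from the wording of headings, which is the distinction the slide draws between textual and metatextual features.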