This is a list used in my previous seminars. The questions for 2003-4 may change, and new ones will be added systematically.
Based on sources:
Flowerdew, J. 1993 [concordancing and language learning]
Granger (ed.) 1998 [learner corpora]
Hunston 2002 [corpora & lg learning/teaching: techniques and methodologies]
Kennedy & Kjellmer 1992 [historical overview of corpora + examples of pedagogically useful applications of corpora + general debate on the principles of corpus use in ELT]
McEnery & Wilson 1996 [historical overview of corpus linguistics + corpus annotation]
Partington 1998 [corpora & synonymy, translation, semantic prosody, style]
Tribble & Jones 1997 [examples of DDL exercises]
1. QUESTIONS
1.
Define a corpus. How is it different from a text archive?
2.
Corpus compilation criteria: briefly characterise any 5 criteria, and illustrate them with a specific corpus.
3.
Briefly characterise at least 5 different types of corpora.
4.
Corpus annotation: define, enumerate and briefly characterise at least 3 types.
5.
What information can be gathered from corpora? Give at least 3 examples [frequency; meaning, patterns & phraseology; collocation; colligation; semantic prosody].
6.
In what ways can a POS-tagged corpus be used that a non-tagged corpus cannot?
7.
How can linguistic information be obtained from a corpus? [wordlist, collocation statistics, concordancing, etc.]
8.
Why are collocation statistics useful?
9.
How can a corpus be applied in a study of: a) synonyms; b) translation equivalents; c) word meaning; d) phraseology and patterns; e) connotation.
10.
What kinds of words usually top a frequency wordlist?
11.
What can high-frequency lexical words in a word list indicate?
12.
What can a major difference in the frequency of a grammatical word between two corpora indicate? [e.g. a different text type; a high frequency of 'this' may imply academic writing]
13.
Corpus 1 contains many more occurrences of 'surprisingly' than 'incredibly'; in Corpus 2 the opposite holds. What can be (tentatively) said about the difference between the two corpora?
14.
What does corpus evidence suggest about synonymy?
15.
Compare/contrast: a) general vs specialist corpus; b) parallel vs comparable corpora; c) plain-text vs annotated corpus; d) monitor corpus vs finite corpus; e) learner corpus vs reference corpus; f) learner corpus vs pedagogic corpus.
16.
Areas of application of corpora - briefly discuss 3 (except language learning).
17.
The strengths and limitations of corpora:
[+ judgements about collocations; + judgements about frequency; + semantic prosody and pragmatic meaning; + details of phraseology]
[- what is frequent, not what is impossible in lg; - provide data about corpus, NOT directly about language; - corpus data MUST be interpreted by intuition; - (esp. large) corpus presents language out of context]
18.
What are some of the largest corpora of English? Briefly characterise at least 3.
19.
Online corpora of English: mention 2, briefly characterise them, and state their usefulness.
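As a revision aid for questions 7, 10 and 11, the following is a minimal sketch of how a frequency wordlist, a normalised frequency and a KWIC concordance might be computed. The function names and the toy sentence are assumptions made for illustration only; real concordancers tokenise far more carefully.

```python
import re
from collections import Counter

def tokenize(text):
    # Naive tokeniser (an assumption): lower-case runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def wordlist(tokens):
    # Frequency wordlist: (word-form, raw frequency), most frequent first.
    return Counter(tokens).most_common()

def normalised_frequency(raw, corpus_size, per=1_000_000):
    # Raw frequency scaled to a common base (default: per million tokens),
    # so that corpora of different sizes can be compared.
    return raw * per / corpus_size

def kwic(tokens, node, span=3):
    # Key Word in Context: each hit with `span` tokens of co-text per side.
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            lines.append((left, tok, right))
    return lines

# Toy example (invented text, not from any real corpus).
tokens = tokenize("The cat sat on the mat. The dog sat too.")
top_word, top_freq = wordlist(tokens)[0]
```

Even in this toy text a grammatical word ('the') tops the wordlist, which is the pattern question 10 asks about.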
SLIGHTLY HARDER QUESTIONS
1.
Should a corpus contain full texts or textual samples? How can this affect its representativeness?
2.
Corpora vs introspection vs elicitation vs experimentation.
3.
Frequency vs salience [frequency vs psychological or cultural prominence]
4.
Typicality vs prototypicality in language [frequency vs psychological expectation] in the context of corpora.
5.
How have corpus studies affected linguistic theory? [grammar and lexis; word meaning and patterns; idiom principle vs open choice principle]
6.
Present a brief historical overview of corpus linguistics. Make sure you also relate it to Chomsky's views of language.
7.
Characterise Contrastive Interlanguage Analysis.
8.
The use of a large corpus vs small corpus: discuss any methodological ramifications.
Is a large corpus always better than a small corpus?
2. Explain a term/concept:
Term to briefly define/describe/illustrate:
corpus; absolute / raw frequency; alignment of parallel corpora; annotated corpus; annotation; archive (of texts); corpus design criteria; corpus-based; corpus-driven; co-text; data-driven learning;
Bank of English; balanced corpus;
BNC;
Brown; classroom concordancing; cluster / n-gram;
Cobuild;
(collocational) frame; collocation; collocation statistics; comparable corpora; parallel corpus; concordance; concordancer; concordancing; context; co-occurrence; diachronic corpus; pedagogic corpus; general corpus; monitor corpus; span; elicitation; error tagging; finite corpus; frequency; frequency list; hapax (legomena);
ICAME;
ICLE; idiomatic principle; interlanguage; introspection; keyword;
Korpus Języka Polskiego PWN;
KWiC; learner corpus; lemma; lexical approach; lexical syllabus; lexicogrammar;
LOB;
London-Lund Corpus;
Longman Corpus Network;
Longman Learners Corpus; mutual information; t-score; node word; normalised frequency; occurrence; online concordancer; open-choice principle;
OTA; parser; parsing; relative frequency; representativeness; salience/salient feature; search (= query); semantic prosody; semantic annotation; serendipity;
SEU [Survey of English Usage];
SGML;
Sinclair; slot-and-filler; span (of words); subcorpus; tag; tagger; tagset;
Thorndike & Lorge; token; transcribe a corpus; type (e.g. word type); typical / typicality; unannotated corpus;
PICLE;
POS tagging;
Project Gutenberg; prosodic information;
West, Michael; word [as 'seen' by computer]; word-form; wordlist;
Zipf's law; quantitative data; query (= search); query syntax / search pattern
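For the terms 'mutual information' and 't-score', a minimal sketch of the two collocation statistics may help. It uses the commonly cited simplified formulas (expected frequency = f(node) x f(collocate) / N, with no adjustment for span size), which is an assumption; individual concordancers may calculate them slightly differently. All counts below are invented for illustration.

```python
import math

def expected(f_node, f_coll, n):
    # Co-occurrences expected by chance if the two words were independent.
    return f_node * f_coll / n

def mutual_information(observed, f_node, f_coll, n):
    # MI: log2 of observed over expected co-occurrence (strength of association).
    return math.log2(observed / expected(f_node, f_coll, n))

def t_score(observed, f_node, f_coll, n):
    # t-score: (observed - expected) / sqrt(observed) (confidence in the association).
    return (observed - expected(f_node, f_coll, n)) / math.sqrt(observed)

# Hypothetical counts: node occurs 100 times, collocate 10 times,
# they co-occur 8 times, in a corpus of 1,000 tokens.
mi = mutual_information(8, 100, 10, 1000)
t = t_score(8, 100, 10, 1000)
```

MI tends to favour rare but exclusive pairings, while the t-score favours frequent ones, so the two measures are usually read side by side.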
3. Discuss in several sentences an example of a corpus-based finding that you find especially appealing. Cite the source from which it is taken.
Make sure you can provide background information for your example from: methodology of teaching; historical linguistics; English syntax; English phonology; history of English/American literature.
4. True/false question types:
Corpus linguistics tells us what is systemically possible.
Corpus linguistics tells us what is likely to occur in the language in general or in particular contexts.
For most of the 1960s-1980s the conventional wisdom about how languages are learned and should be taught did not draw heavily on the fruits of corpus research.
Between about 1920 and the mid-1950s there was a close relationship between corpus research on vocabulary and second and foreign language teaching.