corpus

This is the list used in my previous seminars. The questions for 2003-4 may change, and new ones will be added regularly.

Based on sources:

Flowerdew, J. 1993 [concordancing and language learning]

Granger (ed.) 1998 [learner corpora]

Hunston 2002 [corpora & lg learning/teaching: techniques and methodologies]

Kennedy & Kjellmer 1992 [historical overview of corpora + examples of pedagogically useful applications of corpora + general debate on the principles of corpus use in ELT]

McEnery & Wilson 1996 [historical overview of corpus linguistics + corpus annotation]

Partington 1998 [corpora & synonymy, translation, semantic prosody, style]

Tribble & Jones 1997 [examples of DDL exercises]

1. QUESTIONS

1.

Define a corpus. How is it different from a text archive?

2.

Corpus compilation criteria: briefly characterise any 5 criteria, and illustrate them with a specific corpus.

3.

Briefly characterise at least 5 different types of corpora.

4.

Corpus annotation: define it, then enumerate and briefly characterise at least 3 types.

5.

What information can be gathered from corpora? Give at least 3 examples [frequency; meaning, patterns & phraseology; collocation; colligation; semantic prosody].

6.

In what ways can a POS-tagged corpus be used that a non-tagged corpus cannot?

7.

How can linguistic information be obtained from a corpus? [wordlist, collocation statistics, concordancing, etc.]

8.

Why are collocation statistics useful?

9.

How can a corpus be applied in a study of: a) synonyms; b) translation equivalents; c) word meaning; d) phraseology and patterns; e) connotation?

10.

What kinds of words usually top a frequency wordlist?

11.

What can high-frequency lexical words in a word list indicate?

12.

What can a major difference in the frequency of a grammatical word between two corpora indicate? [e.g. a different text type: a high frequency of 'this' may imply academic writing]

13.

Corpus 1 contains many more occurrences of 'surprisingly' than of 'incredibly'; in corpus 2 the opposite holds. What can (tentatively) be said about the difference between the two corpora?

14.

What does corpus evidence suggest about synonymy?

15.

Compare/contrast: a) general vs specialist corpus; b) parallel vs comparable corpora; c) plain-text vs annotated corpus; d) monitor corpus vs finite corpus; e) learner corpus vs reference corpus; f) learner corpus vs pedagogic corpus.

16.

Areas of application of corpora - briefly discuss 3 (except language learning).

17.

The strengths and limitations of corpora:

[+ judgements about collocations; + judgements about frequency; + semantic prosody and pragmatic meaning; + details of phraseology]

[- what is frequent, not what is impossible in lg; - provide data about corpus, NOT directly about language; - corpus data MUST be interpreted by intuition; - (esp. large) corpus presents language out of context]

18.

What are some of the largest corpora of English? Briefly characterise at least 3.

19.

Online corpora of English: mention 2, briefly characterise them, and state their usefulness.
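Questions 7, 10 and 11 above refer to wordlists and concordancing. As a revision aid, here is a minimal Python sketch of both techniques. It is an illustration only, not taken from the listed sources: the function names, the crude tokenisation rule and the toy text are all assumptions made for the example, and real concordancers handle tokenisation and display far more carefully.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into word tokens (a deliberately crude rule)."""
    return re.findall(r"[a-z']+", text.lower())

def wordlist(tokens):
    """Return (word, count) pairs sorted by descending frequency."""
    return Counter(tokens).most_common()

def kwic(tokens, node, width=3):
    """Return KWIC lines: up to `width` tokens of co-text on each side of the node word."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Toy text (hypothetical, far too small to count as a corpus)
sample = "The cat sat on the mat. The dog sat on the log."
tokens = tokenize(sample)

print(wordlist(tokens)[:3])      # the most frequent word-forms come first
for line in kwic(tokens, "sat"):
    print(line)                  # one concordance line per occurrence of the node
```

Even in this toy text the grammatical word 'the' tops the wordlist, which is the pattern question 10 asks about.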

SLIGHTLY HARDER QUESTIONS

1.

Should a corpus contain full texts or text samples? How can this affect its representativeness?

2.

Corpora vs introspection vs elicitation vs experimentation.

3.

Frequency vs salience [frequency vs psychological or cultural prominence]

4.

Typicality vs prototypicality in language [frequency vs psychological expectation] in the context of corpora.

5.

How have corpus studies affected linguistic theory? [grammar and lexis; word meaning and patterns; idiom principle vs open choice principle]

6.

Present a brief historical overview of corpus linguistics. Make sure you also relate it to Chomsky's views of language.

7.

Characterise Contrastive Interlanguage Analysis.

8.

The use of a large corpus vs small corpus: discuss any methodological ramifications.

Is a large corpus always better than a small corpus?

2. Explain a term/concept:

Terms to briefly define/describe/illustrate:

corpus; absolute / raw frequency; alignment of parallel corpora; annotated corpus; annotation; archive (of texts); corpus design criteria; corpus-based; corpus-driven; co-text; data-driven learning; Bank of English; balanced corpus; BNC; Brown; classroom concordancing; cluster / n-gram; Cobuild; (collocational) frame; collocation; collocation statistics; comparable corpora; parallel corpus; concordance; concordancer; concordancing; context; co-occurrence; diachronic corpus; pedagogic corpus; general corpus; monitor corpus; span; elicitation; error tagging; finite corpus; frequency; frequency list; hapax (legomena); ICAME; ICLE; idiomatic principle; interlanguage; introspection; keyword; Korpus Języka Polskiego PWN; KWiC; learner corpus; lemma; lexical approach; lexical syllabus; lexicogrammar; LOB; London-Lund Corpus; Longman Corpus Network; Longman Learners Corpus; mutual information; t-score; node word; normalised frequency; occurrence; online concordancer; open-choice principle; OTA; parser; parsing; relative frequency; representativeness; salience/salient feature; search (= query); semantic prosody; semantic annotation; serendipity; SEU [Survey of English Usage]; SGML; Sinclair; slot-and-filler; span (of words); subcorpus; tag; tagger; tagset; Thorndike & Lorge; token; transcribe a corpus; type (e.g. word type); typical / typicality; unannotated corpus; PICLE; POS tagging; Project Gutenberg; prosodic information; West, Michael; word [as 'seen' by computer]; word-form; wordlist; Zipf's law; quantitative data; query (= search); query syntax / search pattern
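Several of the quantitative terms above (token, type, hapax legomena, normalised frequency, mutual information, t-score) are easier to revise with a small worked example. The Python sketch below uses commonly cited formulations. It is an assumption that these match the versions in the seminar sources. In particular, the expected co-occurrence frequency here ignores the size of the span, which some formulations include. All function names and figures are hypothetical.

```python
import math
from collections import Counter

def normalised_frequency(raw_count, corpus_size, base=1_000_000):
    """Raw frequency rescaled to occurrences per `base` tokens (per million by default)."""
    return raw_count * base / corpus_size

def expected_cooccurrence(f_node, f_collocate, corpus_size):
    """Co-occurrences expected by chance (simplified: span size is ignored)."""
    return f_node * f_collocate / corpus_size

def mutual_information(observed, f_node, f_collocate, corpus_size):
    """MI score: log2 of observed over expected co-occurrence frequency."""
    return math.log2(observed / expected_cooccurrence(f_node, f_collocate, corpus_size))

def t_score(observed, f_node, f_collocate, corpus_size):
    """t-score: (observed - expected) divided by the square root of observed."""
    expected = expected_cooccurrence(f_node, f_collocate, corpus_size)
    return (observed - expected) / math.sqrt(observed)

# Tokens, types and hapax legomena in a toy text (hypothetical data)
tokens = "the cat sat on the mat the dog sat on the log".split()
counts = Counter(tokens)
n_tokens = len(tokens)                                    # running words
n_types = len(counts)                                     # distinct word-forms
hapaxes = sorted(w for w, c in counts.items() if c == 1)  # forms occurring once

print(n_tokens, n_types, hapaxes)
print(normalised_frequency(50, 200_000))                  # 50 hits in 200,000 tokens, per million
print(mutual_information(30, 100, 60, 1_000_000))
print(t_score(30, 100, 60, 1_000_000))
```

Normalising makes counts from corpora of different sizes comparable. MI tends to highlight rare but exclusive pairings, whereas the t-score tends to favour frequent ones, which is one reason the two are often reported side by side.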

3. Discuss in several sentences an example of a corpus-based finding that you find especially appealing. Cite the source from which it is taken.

Make sure you can provide background information for your example from: methodology of teaching; historical linguistics; English syntax; English phonology; history of English/American literature.

4. True/false question types:

Corpus linguistics tells us what is systemically possible.

Corpus linguistics tells us what is likely to occur in the language in general or in particular contexts.

For most of the 1960s-1980s, the conventional wisdom about how languages are learned and should be taught did not draw heavily on the fruits of corpus research.

Between about 1920 and the mid-1950s, there was a close relationship between corpus research on vocabulary and second and foreign language teaching.
