Dr Maciej Machniewski
1. What are corpora?
1.1 Why do we need corpora? What for?
1.2 Corpus-text difference; corpus representativeness
1.3 Limitations of corpora
2. Are corpora enough?
2.1 What can we use to ‘support’ corpus research?
2.2 CA – which one? Types of CA
2.3 Contrastive Functional Analysis (CFA) as a meaning-based, ‘contextual’ CA
2.4 Equivalence in CA and TS
3. What can we learn about translation through corpora?
3.1 Translation ‘universals’
3.2 Are translated texts different from original texts? How and why?
4. How do we discover things through corpora?
4.2 Word lists (frequency-based / alphabetical)
4.3 Type/token ratio, lexical density
4.4 Sentence length
“the study of language based on examples of ‘real life’ language use” (McEnery & Wilson 1996: 1)
“in principle any collection of more than one text can be called a corpus” (McEnery & Wilson 1996: 21)
available in electronic format
present-day corpora: The Bank of English – 200 million words; Korpus IPI PAN – 300 million words
Empirical linguistics: study of language based on real-life data
TEXT: read as a whole; read horizontally; read for content; read as a unique event; read as an individual act of will; a coherent communicative event
(Cf. Tognini-Bonelli 2001: 3)
selection of texts vs. corpus representativeness
CORPUS: read in fragments; read vertically; read for formal patterning; read for repeated events; read as a sample of social practice; not a coherent communicative event
no corpus can ever account for all utterances in a given (variety of a) language
certain words, types of sentences or grammatical constructions are more common than others
falsificationist (‘skewed’) data
what can we / can’t we discover through corpora?
“I have two main observations to make. The first is that I don’t think there can be any corpora, however large, that contain information about all of the areas of English lexicon and grammar that I want to explore; all that I have seen are inadequate. The second observation is that every corpus that I’ve had a chance to examine, however small, has taught me facts that I couldn’t imagine finding out about in any other way.” (Fillmore 1992: 35)
Contrastive Analysis (CA) as a method of linguistic analysis seen as most related to TS
2.2 various approaches to CA (many related to generative grammar, most language-learning centred)
Contrastive Functional Analysis (CFA):
starts from perceived similarities of meaning across two or more languages
asks how these meanings are expressed, working from meaning to form
looks at the ways meanings are expressed (Chesterman 1998: 1)
(See also Appendix 3, Page 3)
Simplification: “the idea that translators subconsciously simplify the language or message or both” (Baker 1996:
Explicitation: “a stylistic translation technique which consists of making explicit in the target language what remains implicit in the source language because it is apparent from either the context or the situation” (Vinay & Darbelnet 1995: 342)
Normalisation: a tendency “to exaggerate features of the target language and to conform to its typical patterns” (Baker 1996: 183)
Comparison of original texts and translations in one language:
lexical level:
source language patterns (collocations) carried over onto target language
source language ‘reflected’ in the target language
text level:
explicitation (e.g. use of conjunctions)
simplification (type/token ratio)
See Appendix 1 (Page 3)
See Appendix 2 (Page 3)
Type/token ratio: “the ratio of the number of different words in a text to the number of running words” (Laviosa 2002: 22); for example, a corpus of 10,000 running words (tokens) containing 4,000 different words (types) has a type/token ratio of 4,000 / 10,000 = 40%
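The definition above can be tried out on a short text. What follows is only a minimal sketch: the tokenisation rule (lowercasing and splitting on non-word characters) and the function names `tokenize` and `type_token_ratio` are assumptions made for illustration, and the sample sentence is invented. Real corpus tools make their own decisions about what counts as a word.

```python
import re


def tokenize(text):
    """Lowercase the text and return its running words (tokens).

    Assumption: a 'word' is any unbroken run of letters, digits or underscores.
    """
    return re.findall(r"\w+", text.lower())


def type_token_ratio(tokens):
    """Number of different words (types) divided by number of running words (tokens)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Invented sample sentence, for illustration only.
sample = "The cat sat on the mat and the dog sat on the rug"
tokens = tokenize(sample)

print("tokens:", len(tokens))
print("types:", len(set(tokens)))
print("type/token ratio:", f"{type_token_ratio(tokens):.1%}")
```

The raw ratio tends to fall as a text gets longer, because common words keep repeating, so it is safest to compare texts of similar size.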
Sentence length: the average number of words per sentence
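Average sentence length can be sketched in the same way. The sentence splitter below is a deliberately crude assumption (a sentence ends at `.`, `!` or `?` followed by whitespace), so abbreviations such as "e.g." would be split incorrectly; the function names and the sample text are likewise invented for illustration.

```python
import re


def split_sentences(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def average_sentence_length(text):
    """Average number of words per sentence."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    word_counts = [len(re.findall(r"\w+", sentence)) for sentence in sentences]
    return sum(word_counts) / len(sentences)


# Invented sample text, for illustration only.
sample = "Corpora are large. They are stored electronically! Do they tell us everything?"

print("sentences:", len(split_sentences(sample)))
print("average sentence length:", average_sentence_length(sample))
```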
Baker, M. (1996) ‘Corpus-based Translation Studies: the challenges that lie ahead’, in: Somers, H. (ed.), 175-186.
Chesterman, A. (1998) Contrastive Functional Analysis. Amsterdam and Philadelphia: Benjamins.
Fillmore, C. (1992) ‘“Corpus linguistics” or “Computer-aided armchair linguistics”’, in: Svartvik, J. (ed.), 35-60.
Laviosa, S. (2002) Corpus-based Translation Studies. Theory, Findings, Applications. Amsterdam and New York: Rodopi.
McEnery, T. & Wilson, A. (1996) Corpus Linguistics. Edinburgh: Edinburgh University Press.
Somers, H. (ed.) (1996) Terminology, LSP and Translation: Studies in Language Engineering in Honour of Juan C. Sager. Amsterdam and Philadelphia: Benjamins.
Svartvik, J. (ed.) (1992) Directions in Corpus Linguistics. Berlin and New York: Mouton de Gruyter.
Tognini-Bonelli, E. (2001) Corpus Linguistics at Work. Amsterdam and Philadelphia: Benjamins.
Vinay, J. P. & Darbelnet, P. (1995) Comparative Stylistics of French and English. A Methodology for Translation. Amsterdam and
Appendix 1. Concordancing.
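As a rough illustration of what a concordancer does, the sketch below prints every occurrence of a search word with a fixed amount of context on either side (a key-word-in-context, or KWIC, display). The function name `concordance`, the `width` parameter and the sample text are assumptions made for this example; it is not the output of any particular concordancing program.

```python
import re


def concordance(text, node, width=30):
    """Return one KWIC line per occurrence of `node` (case-insensitive, whole words)."""
    lines = []
    pattern = rf"\b{re.escape(node)}\b"
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so that the search word lines up vertically.
        lines.append(f"{left:>{width}} [{match.group()}] {right:<{width}}")
    return lines


# Invented sample text, for illustration only.
sample = (
    "She wrote in the hope that he would reply. "
    "He was hoping that she would write. "
    "They left early, hoping to avoid the traffic."
)

for line in concordance(sample, "hoping"):
    print(line)
```

Reading the aligned lines from top to bottom is the 'vertical' reading of a corpus, as opposed to reading a text from start to finish.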
Appendix 2. Word list (frequency-based).
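A word list of the two kinds mentioned in section 4.2 (frequency-based and alphabetical) can be sketched with Python's standard `collections.Counter`. The tokenisation rule, the function name `word_list` and the sample sentence are assumptions made for illustration only.

```python
import re
from collections import Counter


def word_list(text):
    """Count how often each word form occurs in the text (case-insensitive)."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens)


# Invented sample sentence, for illustration only.
sample = "The cat sat on the mat and the dog sat on the rug"
counts = word_list(sample)

# Frequency-based list: most frequent first, ties broken alphabetically.
by_frequency = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Alphabetical list: sorted by word form.
alphabetical = sorted(counts.items())

for word, frequency in by_frequency:
    print(f"{word:<5} {frequency}")
```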
Appendix 3. Google exact phrase search, UK pages only, May 17, 2005.
Phrase: ‘in the hope that’, ‘hoping that’, ‘hoping to’
Number of hits