

Development prospects in Translation Studies: Corpus-based Translation Studies (CTS)

Dr Maciej Machniewski

1. What are corpora?

1.1 Why do we need corpora? What for?

1.2 Corpus-text difference; corpus representativeness

1.3 Limitations of corpora

2. Are corpora enough?

2.1 What can we use to ‘support’ corpus research?

2.2 CA – which one? Types of CA

2.3 Contrastive Functional Analysis (CFA) as a meaning-based, ‘contextual’ CA

2.4 Equivalence in CA and TS

3. What can we learn about translation through corpora?

3.1 Translation ‘universals’

3.2 Are translated texts different from original texts? How and why?

4. How do we discover things through corpora?

4.1 Concordance

4.2 Word lists (frequency-based / alphabetical)

4.3 Type/token ratio, lexical density

4.4 Sentence length


Corpus linguistics:

 “the study of language based on examples of ‘real life’ language use” (McEnery & Wilson 1996: 1)


 “in principle any collection of more than one text can be called a corpus” (McEnery & Wilson 1996: 21)

 available in electronic format

 machine readable

 present-day corpora: The Bank of English – 200 million words; Korpus IPI PAN – 300 million words

Empirical linguistics: study of language based on real-life data


TEXT                                CORPUS
read as a whole                     read in fragments
read horizontally                   read vertically
read for content                    read for formal patterning
read as a unique event              read for repeated events
read as an individual act of will   read as a sample of social practice
coherent communicative event        not a coherent communicative event

(Cf. Tognini-Bonelli 2001: 3)

 selection of texts vs. corpus representativeness

 no corpus can ever account for all utterances in a given (variety of a) language

 certain words, types of sentences or grammatical constructions are more common than others

 falsificationist (‘skewed’) data

 what can we / can’t we discover through corpora?

“I have two main observations to make. The first is that I don’t think there can be any corpora, however large, that contain information about all of the areas of English lexicon and grammar that I want to explore; all that I have seen are inadequate. The second observation is that every corpus that I’ve had a chance to examine, however small, has taught me facts that I couldn’t imagine finding out about in any other way.” (Fillmore 1992: 35)



Contrastive Analysis (CA) as a method of linguistic analysis seen as most related to TS

2.2 various approaches to CA (many related to generative grammar, most language-learning centred)


Contrastive Functional Analysis (CFA):

 starts from perceived similarities of meaning across two or more languages

 seeks how these are expressed in the meaning-to-form-perspective

 looks at the ways meanings are expressed (Chesterman 1998: 1)



Nasze władze zgodziły się na porozumienie w nadziei, że pozwoli ono zwielokrotnić nasz eksport do krajów "15".

Translation 1:

Our authorities accepted the agreement in the hope that it would enable increasing our export to EU 15.

Translation 2:

The Polish authorities accepted the agreement hoping to increase Poland’s import into the EU.

(See also Appendix 3, Page 3)


Simplification: “the idea that translators subconsciously simplify the language or message or both” (Baker 1996:


Explicitation: “a stylistic translation technique which consists of making explicit in the target language what remains implicit in the source language because it is apparent from either the context or the situation” (Vinay & Darbelnet 1995: 342)

Normalisation: a tendency “to exaggerate features of the target language and to conform to its typical patterns” (Baker 1996: 183)


Comparison of original texts and translations in one language:

lexical level:

 source language patterns (collocations) carried over onto target language

 source language ‘reflected’ in the target language

text level:

 explicitation (e.g. use of conjunctions)

 simplification (type/token ratio)


See Appendix 1 (Page 3)
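A concordance lists every occurrence of a search word together with a fixed amount of context on either side (the keyword-in-context, or KWIC, layout). The sketch below is only an illustration of the idea, not the tool used for Appendix 1; the function name `concordance`, the `width` parameter and the sample sentences are assumptions made for this example.

```python
import re

def concordance(text, keyword, width=30):
    """Return keyword-in-context lines: left context, [keyword], right context."""
    lines = []
    # \b restricts matches to whole words, so "hope" does not match "hoping";
    # IGNORECASE also catches sentence-initial occurrences.
    pattern = r"\b" + re.escape(keyword) + r"\b"
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up vertically.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

# Hypothetical sample text, for illustration only.
sample = ("Our authorities accepted the agreement in the hope that it would help. "
          "They signed, hoping to increase exports. Hope is not a plan.")

hits = concordance(sample, "hope")
for line in hits:
    print(line)
```

Aligning the keyword in a single column is what makes the corpus readable "vertically": recurring patterns to the left and right of the node word become visible at a glance.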


See Appendix 2 (Page 3)
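A word list simply counts how often each word form occurs and then sorts the result either by frequency or alphabetically. The following is a minimal sketch under simple assumptions: tokens are runs of word characters, case is ignored, and the names `tokenize` and `word_lists` as well as the sample sentence are invented for this example.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lower-cased word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())

def word_lists(text):
    """Return (frequency-ordered list, alphabetical list) of (word, count) pairs."""
    counts = Counter(tokenize(text))
    # Most frequent first; ties are broken alphabetically for a stable order.
    by_frequency = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    alphabetical = sorted(counts.items())
    return by_frequency, alphabetical

# Hypothetical sample sentence, for illustration only.
sample = "the cat sat on the mat and the dog sat too"
by_frequency, alphabetical = word_lists(sample)

print(by_frequency[:3])
print(alphabetical[:3])
```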


Type/token ratio: “the ratio of the number of different words in a text to the number of running words” (Laviosa 2002: 22); for example, the type/token ratio for a corpus of 10,000 running words (tokens) containing 4,000 different words (types) is 4,000 / 10,000 = 40%
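The calculation can be sketched in a few lines. This is an illustrative sketch only: the tokenizer is naive (runs of word characters, lower-cased), and the function name `type_token_ratio` and the sample sentence are assumptions made for this example.

```python
import re

def type_token_ratio(tokens):
    """Number of different words (types) divided by number of running words (tokens)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# Hypothetical sample sentence, for illustration only.
tokens = re.findall(r"\w+", "the cat sat on the mat and the dog sat too".lower())
ttr = type_token_ratio(tokens)

print(len(tokens), len(set(tokens)))  # tokens, types
print(f"{ttr:.1%}")

# The 10,000-word example above: 4,000 types out of 10,000 tokens.
print(f"{4000 / 10000:.0%}")
```

Because longer texts repeat words more often, the raw ratio tends to fall as text length grows, so it is safest to compare samples of similar size.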


Sentence length: the average number of words per sentence
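A rough way to compute this measure is shown below. It is a sketch under stated assumptions: sentences are taken to end in ".", "!" or "?" followed by whitespace, so abbreviations such as "Dr." would be split incorrectly; the function name `average_sentence_length` and the sample text are invented for this example.

```python
import re

def average_sentence_length(text):
    """Average number of words per sentence, using a naive sentence splitter."""
    # Split after sentence-final punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return 0.0
    word_counts = [len(re.findall(r"\w+", s)) for s in sentences]
    return sum(word_counts) / len(sentences)

# Hypothetical sample text, for illustration only.
sample = ("Our authorities accepted the agreement. "
          "They hoped to increase exports! Did it work?")
average = average_sentence_length(sample)

print(round(average, 2))
```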


Baker, M. (1996) ‘Corpus-based Translation Studies: the challenges that lie ahead’, in: Somers, H. (ed.), 175-186.

Chesterman, A. (1998) Contrastive Functional Analysis. Amsterdam and Philadelphia: Benjamins.

Fillmore, C. (1992) ‘“Corpus linguistics” or “Computer-aided armchair linguistics”’, in: Svartvik, J. (ed.), 35-60.

Laviosa, S. (2002) Corpus-based Translation Studies. Theory, Findings, Applications. Amsterdam and New York: Rodopi.

McEnery, T. & Wilson, A. (1996) Corpus Linguistics. Edinburgh: Edinburgh University Press.

Somers, H. (ed.) (1996) Terminology, LSP and Translation Studies in Language Engineering: In Honour of Juan C. Sager. Amsterdam and Philadelphia: Benjamins.

Svartvik, J. (ed.) (1992) Directions in Corpus Linguistics. Berlin and New York: Mouton de Gruyter.

Tognini-Bonelli, E. (2001) Corpus Linguistics at Work. Amsterdam and Philadelphia: Benjamins.

Vinay, J. P. & Darbelnet, P. (1995) Comparative Stylistics of French and English. A Methodology for Translation. Amsterdam and Philadelphia: Benjamins.


Appendix 1. Concordancing.

Appendix 2. Word list (frequency-based).

Appendix 3. Google exact phrase search, UK pages only, May 17, 2005.

Phrase              Number of hits
in the hope that
hoping that
hoping to