Parallel corpora and contrastive studies

Using corpora in contrastive
studies
Hilde Hasselgård
University of Oslo
Contrastive analysis
“Contrastive analysis is the systematic comparison of two or more
languages, with the aim of describing their similarities and
differences.” (Johansson 2007: 1)
CA [contrastive analysis] is a linguistic enterprise aimed at
producing inverted (i.e. contrastive, not comparative) two-valued
typologies (a CA is always concerned with a pair of languages),
and founded on the assumption that languages can be compared.
James (1980: 3)
Corpora in linguistic analysis
Corpus: a (large) structured, machine-readable collection of texts,
prepared for use in linguistic research.
Benefits of corpora:
•Empirical basis for claims → material for studying language in
use (“parole”)
•(Relatively) easy access to material
•(Usually) shared resource
– Enhances scientific quality, in that studies can be replicated and
claims can be validated
Some characteristics of
corpus-linguistic studies
Insistence on authentic material as basis of research.
Strong empirical and descriptive focus: Attention to patterns of use
rather than grammaticality and acceptability.
Quantitative investigations (often to back up qualitative ones)
– Frequency and distribution are seen as important features of words and
constructions.
– Corpus studies often aim to be exhaustive of the material investigated, i.e. to
account for all the occurrences of a particular word / construction in the
corpus.
– Therefore both precision and recall are important in searching the corpus:
precision means limiting the output of the search to relevant constructions,
and recall means casting the net wide enough to catch all of them.
Example of precision and recall
Topic: English N + N combinations involving the noun head.
Search for head: The ENPC (fiction) returned 332 hits
– Good recall – will certainly catch all relevant phrases.
– Bad precision – most of the hits will not be relevant.
Precision can be improved if the corpus has PoS-information, so
we can search for head preceded or followed by a noun.
– The ENPC (fiction) returned 12 hits of head + N: head waiter; head
Aristotle; head, bit by bit; head, dad; head teacher; head , soup; head
trimmer; head curtain; head honcho; head office; head floor nurse
(twice). 5 hits of N + head, two of which are compounds: section
head, deputy head.
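A minimal sketch of how the two measures can be computed from a set of retrieved hits and a set of hits judged relevant. This is not part of the ENPC software; the function name, the hit identifiers and the relevance judgements are hypothetical and serve only to show the arithmetic.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: identifiers of the hits returned by the search
    relevant:  identifiers of the hits the analyst judges relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = retrieved & relevant
    precision = len(true_positives) / len(retrieved) if retrieved else 0.0
    recall = len(true_positives) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: four relevant items exist in the corpus.
relevant_hits = {"h1", "h2", "h3", "h4"}
# A broad search returns ten hits, including all four relevant ones.
broad_search = {f"h{i}" for i in range(1, 11)}
# A narrow search returns three hits, all relevant, but misses one.
narrow_search = {"h1", "h2", "h3"}

print(precision_recall(broad_search, relevant_hits))   # good recall, poor precision
print(precision_recall(narrow_search, relevant_hits))  # good precision, lower recall
```

The broad search corresponds to searching for the bare word form, the narrow one to a PoS-restricted query.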
Corpora in contrastive analysis
Monolingual corpora = corpora that contain texts in one language
only
Bilingual/Multilingual corpora = corpora that contain texts in two or
more languages.
For a collection of texts in different languages to be called a
”parallel corpus”, the texts should be in some way related to each
other:
– Translation corpora (through translation)
– Comparable corpora (through text comparability)
– Bidirectional translation corpora
Translation corpus
A corpus that contains the ‘same’ texts in more than one language; in
other words a corpus with both original and translated texts.
Original text(s) → Translation, language 1
                 → (Translation, language 2)
                 → (Translation, language 3)
Comparable corpus
a corpus that contains original texts in more than one language and
where the texts in each language have been selected according to the
same criteria (genre, content, publication date etc.)
Language 1    Language 2    Language 3
Genre A       Genre A       Genre A
Genre B       Genre B       Genre B
Genre C       Genre C       Genre C
Genre D       Genre D       Genre D
Bidirectional translation corpus
(ENPC model)
Combination of translation and comparable corpus
The original texts are comparable (genre, number of words)
The translations go in both directions – a truly parallel corpus
The English-Norwegian Parallel Corpus
(ENPC) – Some facts
Started as a research project at the University of Oslo in 1994 and completed in
1997. Prof. Stig Johansson initiated and directed the project.
Original texts with authentic translations (English-Norwegian and Norwegian-English); fictional and non-fictional texts.
Compiled for use in applied and theoretical linguistic research
Development of software for alignment of the texts (Knut Hofland, UiB) and for
searching the corpus (Jarle Ebeling, UiO)
Sister projects: The English-Swedish Parallel Corpus (Lund/Göteborg), English-Finnish Parallel Corpus (Jyväskylä/Savonlinna/Tampere) – same principle of
compilation; to some extent also shared texts.
Later developments: The French-Norwegian Parallel Corpus; the German-Norwegian Parallel Corpus.
Other corpora built on the ENPC model in Germany (Chemnitz), France/Belgium
(Poitiers/Louvain-la-Neuve: the PLECI corpus), Spain (University of León).
Important features of the ENPC
The originals and the translations are aligned at sentence level
(”s-unit”).
– Thus, searches in one language will return hits with the linked-up sentences
in the other language.
A browser for searching a bilingual corpus was developed
alongside the corpus.
Searches can be made in both originals and translations.
Searches are made in fiction and non-fiction separately.
– Thus, findings on the basis of translated language can always be checked
against originals within the same genre.
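A minimal sketch of what sentence-level alignment makes possible: a search in one language returns each hit together with its linked sentence in the other language. The dictionary layout and the function name are assumptions made for this sketch and say nothing about how the ENPC browser is implemented; the sentence pairs are corpus examples quoted elsewhere in this presentation.

```python
import re

# Each aligned unit links an original sentence to its translation.
# This layout is an assumption made for the sketch.
aligned_units = [
    {
        "text_id": "TB1",
        "no": "Dessuten skulle jeg slukke hvert øyeblikk...",
        "en": "Besides, I was just about to put out the light anyway....",
    },
    {
        "text_id": "HW2",
        "no": "Til gjengjeld fikk hun flere timer på seg.",
        "en": "On the other hand, she got several extra hours to do the work.",
    },
]


def search_aligned(units, word, lang):
    """Return every aligned unit whose sentence in `lang` contains `word`."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [unit for unit in units if pattern.search(unit[lang])]


# Searching the Norwegian side also gives access to the English side.
for hit in search_aligned(aligned_units, "dessuten", "no"):
    print(hit["text_id"], "|", hit["no"], "|", hit["en"])
```

Because the units are linked in both directions, the same function can start from the English side, e.g. `search_aligned(aligned_units, "besides", "en")`.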
Searching the parallel corpus
[Figure not reproduced]
Output of the search
[Figure not reproduced]
From ENPC to OMC under the
SPRIK umbrella (SPRåk I Kontrast)
New languages were added, first (mainly) German, then French
Focus on English – Norwegian – German in the first phase of the SPRIK-project:
original texts in each language with translations into the other two.
Same principles for text selection, text sampling and preparation as for the ENPC
(exception: even more biased towards fiction because of the lack of translated
non-fiction). Same (or later versions of same) software for alignment, searching
etc.
Expanded search facilities and research possibilities:
– Three-way comparison of translations and originals
– Possibilities of investigating two different translations of the same text (translation
strategies, translationese)
Latest development at UiO (not part of OMC): Russian-Norwegian (RuN)
Trilingual parallel corpus model
[Figure not reproduced]
Searching in the OMC (En-Ge-No):
dessuten in Norwegian originals &
translations, im übrigen in German originals
Dessuten skulle jeg slukke hvert øyeblikk... (TB1)
Außerdem wollte ich gerade das Licht ausmachen... (TB1TD)
Besides, I was just about to put out the light anyway.... (TB1TE)
And anyway, Mathilda had been taking them for years, they were commonly
prescribed once. (MW1)
Og dessuten hadde Mathilda tatt dem i årevis, det var i sin tid vanlig å foreskrive
dem. (MW1TN)
Im übrigen nahm Mathilda sie seit Jahren, sie wurden früher allgemein
verschrieben. (MW1TD)
Im übrigen gerät sie nach ihrem Vater. (ERH1)
She took after her father, at any rate. (ERH1TE)
For øvrig lignet hun på sin far. (ERH1TN)
Translation corpus with four
languages: No-En-Fr-Ge
Norwegian originals → English translation
                    → French translation
                    → German translation
Examples of search output
Dessuten hadde hun fått høvelig opplæring i historie, både den nye og den
gamle tid. (HW2)
Außerdem habe sie eine gute Unterweisung in Geschichte, sowohl in der
alten wie auch in der neuen, bekommen. (HW2TD)
It stated that Dina had received suitable instruction in both modern and
ancient history. (HW2TE)
En plus, elle avait reçu un enseignement convenable en histoire, ancienne
comme moderne. (HW2TF)
En contrepartie, elle eut plusieurs heures en plus devant elle. (HW2TF)
Til gjengjeld fikk hun flere timer på seg. (HW2)
Dafür hatte sie ein paar Stunden für sich. (HW2TD)
On the other hand, she got several extra hours to do the work. (HW2TE)
Methodology: Classifying
correspondences
Correspondence
– expressed
   – congruent: same realisation type
   – divergent: different realisation type
– zero
Example: French correspondences of however in a small En-Fr translation corpus:
However, this is likely to be a gross underestimate. (WHO1) Toutefois, leur nombre pourrait
être largement sous-estimé.
However, this essential function of social integration is today under threat ...(LB1)
Mais, aujourd'hui, cette fonction essentielle est menacée ...
However, because of the cultural and social setting, … (WHO1) En raison du contexte
culturel et social …
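The classification can be sketched as a small function. This is an illustration only: the function name is hypothetical, and the word-class labels assigned to the three however examples are assumptions made here, not annotations from the corpus.

```python
def classify_correspondence(source_category, target_category):
    """Classify one translation correspondence.

    target_category is None when the source item has no counterpart
    in the translation (zero correspondence, Ø).
    """
    if target_category is None:
        return "zero"
    if source_category == target_category:
        return "congruent"   # same realisation type
    return "divergent"       # different realisation type


# The three renderings of 'however'; category labels are assumed.
examples = [
    ("however", "adverb", "toutefois", "adverb"),
    ("however", "adverb", "mais", "conjunction"),
    ("however", "adverb", None, None),
]

for source, source_cat, target, target_cat in examples:
    label = classify_correspondence(source_cat, target_cat)
    print(source, "->", target or "Ø", ":", label)
```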
What can multilingual corpora
contribute?
They give insights into the languages compared – insights that are
likely to be unnoticed in studies of monolingual corpora.
They can be used for a range of comparative purposes and increase
our understanding of language-specific, typological and cultural
differences, as well as of universal features.
They illuminate differences between source texts and translations,
and between native and non-native texts.
They can be used for a number of practical applications, e.g. in
lexicography, language teaching, and translation.
(Aijmer & Altenberg 1996: 12)
Filipović’s arguments for using
corpora in contrastive analysis
a) a valid contrastive project cannot be considered complete
before its results have been verified and completed with the
help of some representative corpus;
b) only a corpus can verify certain cases of doubtful
grammaticality;
c) frequency and distribution can be established only on the basis
of a corpus;
d) without a corpus we could not analyse the stylistic value, i.e.
stylistic levels and registers, of certain forms;
e) the corpus is necessary for the component of ‘use’ (1984: 114)
Methodology in contrastive
analysis
A CA presupposes a tertium comparationis, i.e. a measure by
which we can be fairly certain we are comparing like with like.
The items to be compared across languages are selected on the
basis of perceived similarity (Chesterman 1998), such as
translation equivalence, semantic/etymological similarity,
grammatical or functional categories.
A frequently suggested tertium comparationis is translation
equivalence, which implies that the items in the two languages
convey (more or less) the same meaning.
Chesterman’s suggestion
(1998: 60)
1. Collecting primary data against which hypotheses are to be
tested. Primary data involve all instances of language use,
utterances that speakers of the languages in question produce.
2. Establishing a comparability criterion based on a perceived
similarity of any kind.
3. Defining the nature of similarity and formulating the initial
hypothesis.
4. Hypothesis testing: determining the conditions under which the
initial hypothesis can be accepted or rejected.
5. Formulating the revised hypothesis.
6. Testing of the revised hypothesis, and so on.
Some benefits of a bidirectional
translation corpus such as the
ENPC
Comparable original and translated texts in both languages →
Control for translation bias
In-built tertium comparationis through translation equivalence and
text comparability
“…with the help of a corpus we get unprecedented opportunities
to study and contrast languages in use, including frequency
distributions and stylistic preferences. Corpora are absolutely
essential for macrolinguistic studies, but they will also enrich
studies of lexical and grammatical patterns.” (Johansson 2000)
And an important drawback: such corpora will always be limited
– In size, because of the work involved (copyrights, processing…)
– In coverage, because not all types of texts are translated.
Limitations
(As with corpus linguistics in general:) you can only search for
something that is explicit in the text
The size of the corpus restricts studies of less frequent lexis /
constructions
The corpus has not been parsed (syntactically annotated), i.e. it is not
possible to search for grammatical constructions, patterns of word
order etc.
Faulty and less successful translations
Tagging errors
How can we retrieve the relevant
constructions from the corpus?
The answer depends on the research question and on the corpus,
i.e. whether the corpus has annotation (information about PoS,
morphology, etc.)
– Research questions that can take lexical words as a starting point: relatively
easy; all you need to remember is to search for all possible forms of a word.
– Research questions that focus on a grammar feature (e.g. progressive
aspect) are trickier:
– Can be solved by identifying one or more lexical starting points
– Can be carried out if the corpus is tagged.
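A minimal sketch of the lexical route: when the corpus is not lemmatised, all forms of the word have to be listed in the query. The example uses the verb take; the list of forms is supplied by hand and the sample sentences are invented for illustration.

```python
import re

# Inflected forms are listed by hand; a lemmatised corpus would make this unnecessary.
TAKE_FORMS = ["take", "takes", "took", "taken", "taking"]

# \b keeps the match to whole words, so 'mistake' is not a hit.
pattern = re.compile(r"\b(?:" + "|".join(TAKE_FORMS) + r")\b", re.IGNORECASE)


def find_forms(sentences):
    """Return (sentence index, matched form) for every hit."""
    hits = []
    for index, sentence in enumerate(sentences):
        for match in pattern.finditer(sentence):
            hits.append((index, match.group(0).lower()))
    return hits


# Invented sample sentences.
sample = [
    "She took the bus.",
    "Taking notes takes time.",
    "It was a mistake.",
]

print(find_forms(sample))
```

Leaving out a form (e.g. took) lowers recall, while dropping the word-boundary markers lowers precision.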
How do we know what
words/constructions to compare?
Not as trivial as it sounds
– For a suggestion of a method for discovering semantic relations across
languages on the basis of a bidirectional translation corpus, see DYVIK,
Helge: Translations as Semantic Mirrors: From Parallel Corpus to Wordnet.
(2004)
– Basically, the method takes advantage of the bidirectional
translation corpus: starting from a word in one of the languages, we find its
translations and, in the next step, see how these are translated in the other
direction.
Example: The Norwegian connector dessuten and its English
and French correspondences.
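The first two steps of this procedure can be sketched as follows. This is a toy illustration, not Dyvik's implementation: the function name and data layout are assumptions, and the counts are a subset of the ENPC fiction figures reported later in this presentation.

```python
# Step 1 data: translations of the Norwegian word (subset of reported counts).
no_to_en = {
    "dessuten": {"besides": 27, "also": 14, "what's more": 8},
}

# Step 2 data: how those English words are translated back into Norwegian.
en_to_no = {
    "besides": {"dessuten": 15, "forresten": 3, "også": 1},
    "also": {"også": 129, "og": 9, "dessuten": 6, "heller ikke": 5},
    "what's more": {"attpåtil": 2, "dessuten": 2},
}


def mirror(word, forward, backward):
    """Map each translation of `word` to its own translations in the
    opposite direction, most frequent first."""
    result = {}
    for translation in forward.get(word, {}):
        back = backward.get(translation, {})
        result[translation] = sorted(back, key=back.get, reverse=True)
    return result


for english, norwegian in mirror("dessuten", no_to_en, en_to_no).items():
    print(english, "->", ", ".join(norwegian))
```

The words reached in the second step (forresten, også, attpåtil and so on) are candidates for the wider semantic field around the starting word.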
DESSUTEN in Norwegian originals
– a paradigm of correspondences
English translation, ENPC fiction (N=132): moreover, besides, even, what BE more, also, and, in addition, as well, too, nor, Ø, others (only once)
French translation, FNPC (N=35): en outre, d’ailleurs, aussi, de plus, en plus, de même, par ailleurs, et puis
… so which ones of these should be selected
for further contrastive study?
DESSUTEN – frequencies of
correspondences

ENPC fiction (N=132)
moreover              7
besides               27 (20.5%)
even                  2
what be more          8
also                  14
and                   1
in addition           7
as well               2
too                   2
nor                   2
Ø                     19
others (only once)    11

FNPC (N=35)
en outre              8 (22.9%)
d’ailleurs            2
aussi                 4
de plus               6
en plus               1
de même               1
par ailleurs          1
et puis               1
Ø                     11 (31.4%)

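Mutual correspondence (MC)
(Altenberg 1999)
The frequency with which different (grammatical, semantic and
lexical) expressions are translated into each other.
Calculated and expressed as a percentage by means of the
formula
$$\mathrm{MC} = \frac{(A_t + B_t) \times 100}{A_s + B_s}$$
where $A_t$ and $B_t$ are the frequencies of the compared items in the translations, and $A_s$ and $B_s$ their frequencies in the source texts.
The MC of dessuten and besides in the ENPC (fiction) is thus
$$\frac{(15 + 27) \times 100}{107 + 24} = 32.1$$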
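The calculation can be sketched in a few lines. The function and parameter names are assumptions made for this sketch; the figures are those of the dessuten/besides example.

```python
def mutual_correspondence(a_t, b_t, a_s, b_s):
    """Mutual correspondence as a percentage: (At + Bt) * 100 / (As + Bs).

    a_t, b_t: frequencies of the compared items in the translations
    a_s, b_s: frequencies of the compared items in the source texts
    """
    return (a_t + b_t) * 100 / (a_s + b_s)


# dessuten and besides in the ENPC (fiction), figures as given in the example
mc = mutual_correspondence(15, 27, 107, 24)
print(round(mc, 1))  # 32.1
```

An MC of 100 would mean that the two items are always translated by each other; 0 would mean that they never are.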
Using the ENPC/OMC for
research
Particularly well suited for studies of lexis / lexico-grammar (or
phenomena that can take lexis as their starting point)
A broad range of phenomena have been (are being) investigated, e.g.
the use of individual verbs (bli, få, take, give, see), modality, particular
syntactic constructions, connectives, sentence openings and other
discourse phenomena. (For a good overview up to 2007, see
Johansson 2007).
The methodology is not tied to any particular theoretical approach
“The material from the corpus serves first to verify the conclusions
based on the theory and second, to provide a means to collect
data in areas where the theory is inadequate.” (Filipović 1984:
113)
Lexicogrammar
Paradigms of correspondence highlight the fuzzy borderlines
between lexis and grammar and grammar and discourse.
Example: A modal verb will have a wide range of
correspondences
Norwegian kan (Løken 2007)
Modal aux: can, could, may, might, ‘ll, will, would, should
Other verbs: know, enable, have, have to, had better
Adjectives: possible, able, capable
Adverb: perhaps
Suffix: -able
Mette seier han kan gå igjen, kaféen er alt stengd. (EH1)
Metta tells him to go away; the cafe is already closed.
Formulating research
questions in corpus-based CA
The research question needs to be one that can be investigated
with the material (and the annotation) in the corpus.
We may need to be prepared to either discard or reformulate the
original research question following initial corpus searches.
– Example: contrastive study of the Norwegian verb gjøre and English make.
The translation correspondences suggested that they were not each other’s
main correspondences.
Important to know the corpus in order to identify fruitful research
questions:
– ENPC translations carried out by professional translators → probably not
good material for studying errors (e.g. wrong use of “false friends”)
– Limited size and range of text types in the corpus, which limits the types of claims that can be
made.
Case study 1: lexis
RQ: To what extent can the following words be
characterized as false friends?
– Norwegian eventuell(e)/eventuelt
– English eventual/eventually
– French éventuel(le)/éventuellement
Material: OMC (No-En-Fr) + FNPC + ENPC
Hypothesis: Norwegian and French are good friends,
but English eventual(ly) has a different meaning.
For more studies of false friends, see Languages in Contrast
10:2 (2010), Special issue: Pragmatic markers and
pragmaticalization: Lessons from false friends.
Basis for hypothesis: a
dictionary of false friends
Éventuel (F) vs Eventual (E)
Éventuel (F) means possible: le résultat éventuel - the possible
outcome.
Eventual (E) describes something that will happen at some
unspecified point in the future; it can be translated by a relative clause
like qui s'ensuit or qui a résulté or by an adverb like finalement.
Éventuellement (F) vs Eventually (E)
Éventuellement (F) means possibly, if need be, or even: Vous pouvez
éventuellement prendre ma voiture - You can even take my car / You
can take my car if need be.
Eventually (E) indicates that an action will occur at a later time; it can
be translated by finalement, à la longue, or tôt ou tard : I will
eventually do it - Je le ferai finalement / tôt ou tard.
(http://french.about.com/od/vocabulary/a/fauxamis-e_2.htm)
Some corpus examples
…for å hindre å vekke oppmerksomhet til eventuelle naboer (KF1)
…in order not to alert the neighbors
…pour ne pas éveiller l'attention d’éventuels voisins
…før vi eventuelt innledet nye forhold. (JG3)
…before embarking on any new relationships.
…avant de nous engager éventuellement dans une autre relation.
Renderings of eventuell/
eventuelt (N=11 in No-En-Fr)

             English                   French
Adjective    possible: 1               éventuel(le/s): 2
             Ø: 2                      Ø: 1
Adverb       no congruent rendering    éventuellement: 1
                                       plutôt: 1
                                       Ø: 6

English: All cases of eventuelt are either rephrased or omitted in the
translation. Traces of the modal meaning of eventuelt appear in the use of
modal verbs (might, would), two cases of any, and one or (where eventuelt is
used almost as a conjunction).
French: Traces of the modal meaning of eventuelt appear in the choice of verb
forms: il aurait été, il faudrait savoir… (One case of éventuellement in the
French translation does not come from Norwegian eventuelt.)
Éventuel(le) / éventuellement
in French originals (FNPC)
Only 3 examples found in fiction and non-fiction; 2 adjectives, 1
adverb.
1. En réalité, c'est moins la satisfaction d'un besoin réel qui peut
faire la beauté d'une chose utile, que la satisfaction possible
d'un besoin éventuel. (JLA1)
   I virkeligheten er det mindre tilfredsstillelsen av et reelt behov
som ser skjønnheten i en nyttig ting, enn den mulige
tilfredsstillelse av et eventuelt behov.
2. …ce qu'ils auraient éventuellement à nous reprocher, …(CC1)
   …det de måtte ha å bebreide oss,… (Lit: ‘what they might have
to reproach us’)
Any conclusions?
1. The French and Norwegian meanings seem to be closely similar, with
no overlap with the meaning of English eventual/eventually.
2. The Norwegian and French adjectives EVENTUEL* seem to
correspond closely to each other in meaning and use.
3. The Norwegian and French adverbs eventuelt/éventuellement seem
to differ in frequency, with the Norwegian word being more frequent.
Difference of style level??
4. The Norwegian and French adverbs, in spite of similar meanings,
thus seem to have different distributional patterns.
5. (Possible translation effect: eventuelt/éventuellement may be
perceived as redundant and therefore omitted by the translator.)
6. More material is needed! E.g. from comparable corpora. (Monolingual
corpora seem to confirm the distributional difference: French
newspapers had c. 21 éventuellement per million words, a Norwegian
corpus had c. 154 eventuelt pmw.)
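Frequencies from corpora of different sizes are only comparable once they are normalised, here to occurrences per million words (pmw). A minimal sketch of the arithmetic follows; the function name is an assumption, and the raw count and corpus size in the first call are hypothetical, since only the normalised figures are reported above.

```python
def per_million(raw_count, corpus_size_in_words):
    """Normalise a raw frequency to occurrences per million words."""
    return raw_count * 1_000_000 / corpus_size_in_words


# Hypothetical raw figures, chosen only to show the arithmetic.
print(per_million(300, 2_500_000))  # 120.0

# Ratio of the two normalised frequencies reported above
# (c. 154 pmw for eventuelt, c. 21 pmw for éventuellement).
print(round(154 / 21, 1))  # 7.3
```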
Case study 2: grammar
Presentative constructions in English and Norwegian (Ebeling 1999),
see also Ebeling (1998).
1. Common basic formula:
Dummy subject    Presentative verb    ‘existent’    Place adjunct
There            was                  a client      at the counter
Det              var                  en kunde      ved skranken.
Alternative expressions:
2. No dummy subject: A client was at the counter; En kunde var ved skranken.
3. Verb other than BE: Det stod en kunde ved skranken. ?There stood a client at
the counter. ?A client stood at the counter. En kunde stod ved skranken.
4. No adjunct: There’s been a robbery / Det har vært et bankran.
5. With definite NP: There was the radio in the kitchen / Det var radioen på
kjøkkenet.(AT1)
Some of Ebeling’s findings
Presentative constructions with det/there are much more frequent in
Norwegian than in English.
While the English there-construction is virtually restricted to the verb be, the
Norwegian det-construction can include a number of intransitive verbs, e.g.
finnes (‘exist’), bli (‘become’), bo (‘live’), gå (‘go’), komme (‘come’) and verbs
of posture.
Lexical verbs in the passive, e.g. ... det var blitt begått et mord like i
nærheten. (FC1). [‘there had been committed a murder just nearby’]
In translating a Norwegian det-clause with a verb other than være or finnes,
constructions without there are often chosen. (Presentatives with posture
verbs are often rendered by there + BE.)
Presentative constructions with fronted adjunct and without there (Behind the
shed was a bicycle) occur almost exclusively in written, often literary, texts,
and they introduce something perceivable or concrete.
Case study 3: discourse
RQ: What are the correspondences of the Norwegian connective
dessuten in English and French? In what contexts are they used?
Can we create a semantic map of this type of discourse relation?
Material: ENPC + FNPC
Search procedure: starting from Norwegian dessuten to map out
correspondences, then choosing the top four correspondences (except
Ø) and investigating their correspondences in the other direction.
DESSUTEN – frequencies of
correspondences

ENPC fiction (N=132)
besides               27 (20.5%)
also                  14 (10.6%)
what be more          8 (6.1%)
moreover              7 (5.3%)
in addition           7 (5.3%)
even                  2
as well               2
too                   2
nor                   2
and                   1
Ø                     19
others (only once)    11

FNPC (N=35)
en outre              8 (22.9%)
de plus               6 (17.1%)
aussi                 4 (11.4%)
d’ailleurs            2 (5.7%)
en plus               1
de même               1
par ailleurs          1
et puis               1
Ø                     11 (31.4%)

Besides, also, what’s more, in
addition and moreover in Eng. orig.
fiction
•Besides (adv.): 19 hits → dessuten (15), forresten (3), også (1), Ø (1)
•Also: 186 hits → også (129), Ø (26), og (9), dessuten (6), heller ikke (5),
så (3), og så (3), other (5)
•What’s more: 6 hits → attpåtil (2), dessuten (2), Ø (1)
•In addition: 3 hits → i tillegg (2), Ø (1)
•Moreover: 1 hit → til og med
[Chart not reproduced. Bars for besides, also, what's more, in addition and moreover; legend: “tr from dessuten” and “orig”; scale 0–100 %.]
En outre, de plus, aussi and
d’ailleurs in French original fiction &
non-fiction
En outre (8 hits): dessuten (4), i tillegg (1), også (1), Ø (2)
Aussi (134 hits): også (93), dessuten (5), og (2), i tillegg (2), Ø (25), other
(7: ikke minst, selv, ikke-heller, både-og, samt, likevel, dertil) [only fiction;
only connector]
De plus (8 hits): dessuten (2), i tillegg (3), enn videre (1), også (1), Ø (1)
D’ailleurs (51 hits): for øvrig (16), forresten (11), dessuten (9), other (3)
[egentlig, faktisk, heller ikke]
[Chart not reproduced. Bars for en outre, aussi, de plus and d'ailleurs; legend: “tr from dessuten” and “orig”; scale 0–100 %.]
Some observations
Dessuten is more frequent than any of its correspondences except
also/aussi. Stylistically neutral; some of the English correspondences are
not. French??
Both also and aussi have high percentages of Ø correspondences →
apparently often perceived by translators to be redundant.
Besides, moreover, and in addition are all more frequent in translations of
dessuten than they are in original English fiction. A translation effect?
De plus and en outre have about the same frequencies as translations of
dessuten as they have in French originals.
En outre in translation also has other sources, thus is slightly more
frequent in translation from Norwegian than in original French.
The correspondences of d’ailleurs suggest that this connector signals the
addition of something slightly more peripheral (more like incidentally)
Translations into and/et and also/aussi are a kind of ‘normalization’: the
choice of a more general term in the target language. For English, a wish
to avoid formal connectives in fiction?
A map of additive relations
emerging from dessuten
[Figure not reproduced. Labels in the map: Norwegian dessuten, også, heller ikke, i tillegg, attpåtil, forresten, for øvrig; English also, in addition, moreover, besides, what’s more; French aussi, en outre, de plus, d’ailleurs.]
Conclusions
The recurrent correspondences, except aussi/also/også, mark an additive
relation that is explicitly “on top of”; some of them even incidental.
The most “literal” expressions of this meaning are probably in addition to,
i tillegg til, de plus.
The strongest mutual correspondences of dessuten are with besides for
English (32.1) and en outre for French (33.3). Both MCs are
asymmetrical; en outre and besides are translated into dessuten more
often than the other way round.
The most general expressions of addition (aussi/also) get high
percentages of Ø correspondences (19% for aussi; 14% for also)
A hypothesis about discourse relations that needs to be tested further: Of
the three languages, French seems to mark the additive relation most
frequently.
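The Ø percentages quoted for aussi and also can be recomputed from the hit counts given for the original texts (25 Ø out of 134 hits for aussi; 26 Ø out of 186 hits for also). A minimal sketch, with a helper name chosen for this illustration:

```python
def share(part, total):
    """Return `part` as a percentage of `total`, rounded to one decimal."""
    return round(part * 100 / total, 1)


# Zero correspondences in the original-text searches reported above
print(share(25, 134))  # aussi: 18.7, i.e. about 19%
print(share(26, 186))  # also: 14.0
```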
Summing up
Parallel corpora enhance contrastive studies in a number of ways
– by ensuring that observations are based on authentic language use
– by yielding paradigms of correspondences
– thus often revealing meanings and nuances we might not have thought of
– and showing how the same meaning may be expressed by means of different
linguistic categories
– by providing quantitative data
– … thus also giving insights into ‘preferred ways of putting things’
– (if the corpus is bidirectional) by providing control for translation bias and by
allowing for ‘reverse’ investigations
– (if the corpus is representative) by controlling for the idiosyncrasies of
individual authors/translators
Why undertake corpus-based
contrastive investigations?
The importance of multilingual corpora
extends beyond contrastive studies. It is
up to the user to define fruitful research
questions and use the corpora creatively.
In this process we learn not only about
individual languages and their
relationships, about translation and
foreign-language acquisition, but also
about language in general – provided that
the study becomes truly multilingual.
Seeing through corpora we can see
through language.
Stig Johansson (2007: 316)
Information on the ENPC /
OMC
About the corpora:
ENPC: http://www.hf.uio.no/ilos/english/services/omc/enpc/
www.helsinki.fi/varieng/CoRD/corpora/ENPC/
OMC: www.hf.uio.no/ilos/english/services/omc/
About publications based on the OMC (up to 2006):
www.hf.uio.no/ilos/forskning/prosjekter/sprik/english/publications/
Very small, freely available translation corpus:
http://khnt.hit.uib.no/webtce.htm
References
Aijmer, K. & B. Altenberg. 1996. Introduction. In K. Aijmer, B. Altenberg, M. Johansson
(eds.) Languages in Contrast. Lund University Press, 11-16.
Altenberg, B. 1999. Adverbial connectors in English and Swedish: Semantic and lexical
correspondences. In Hasselgård & Oksefjell (eds.) Out of Corpora. Amsterdam: Rodopi,
249-268.
Chesterman, A. 1998. Contrastive Functional Analysis. Amsterdam/Philadelphia: John
Benjamins Publishing Company.
Dyvik, H. 2004. Translations as Semantic Mirrors: From Parallel Corpus to Wordnet. In
K. Aijmer & B. Altenberg (eds.) Advances in Corpus Linguistics. Amsterdam/New York:
Rodopi, 311-326.
(www.ingentaconnect.com/content/rodopi/lang/2004/00000049/00000001/art00018)
Ebeling, J. 1998. Contrastive Linguistics, Translation, and Parallel Corpora. Meta 43:4, 602-615. http://www.erudit.org/revue/META/1998/v43/n4/002692ar.pdf
Ebeling, J. 1999. Presentative Constructions in English and Norwegian: A corpus-based
contrastive study. Acta Humaniora 68. Oslo: Unipub forlag.
Filipović, R. 1984. What are the primary data for contrastive analysis? In J. Fisiak (ed.)
Contrastive linguistics. Prospects and Problems. Berlin/New York/Amsterdam: Mouton
Publishers, 107-118.
James, C. 1980. Contrastive Analysis. London: Longman.
Johansson, S. 2000. Contrastive Linguistics and Corpora. University of Oslo, SPRIK
reports 3: www.hf.uio.no/ilos/forskning/prosjekter/sprik/docs/pdf/sj/johansson2.pdf
Johansson, S. 2007. Seeing through multilingual corpora. Amsterdam: Benjamins.