Words of desire in English and French

00000000 MODL5007 Corpus Linguistics

1. Introduction

1.1. Terms

For this project the English terms desire, intend and wish were chosen because they overlap semantically to a certain degree, yet remain distinct enough in scope to warrant further exploration of their usage. Three French terms, avoir l’intention de, désirer and souhaiter, were chosen as reasonable equivalents; they too share a large degree of semantic content, both with the corresponding English terms and with each other.

2. Methodology

2.1. Corpora and Engines

2.1.1. English

For the English data, the BNC (2005) was chosen. It is a large corpus of over 100 million words, so a large number of hits can be obtained from a wide variety of sources, which allows reliable analysis of linguistic data. It is therefore possible to discover a wide range of senses for a given item whilst ensuring a representative sample. The engine used to query the BNC was Kilgarriff and Rychly’s Word Sketch Engine (Kilgarriff and Rychly 2005, hereafter WSE), which allows a detailed analysis of concordances and gives information on collocates and other statistics. Its output can also be compared with data obtained from the Leeds BNC engine (Sharoff 2005b) to reveal any discrepancies.

2.1.2. French

For the French data, a corpus compiled from Internet pages (Sharoff 2005c) was used, accessed via Sharoff’s interface (2005a). Because it provides collocation statistics for a given item, this interface allows a fuller analysis of concordance lines than the other interfaces examined, such as the Corpus Lexicaux Québécois (Gouvernement du Québec 2005) or the Corpus Concordance French (Cobb 2005). A potential problem with this corpus is that its range of sources is not as wide as that of the BNC, but in the absence of a more suitable resource it is necessary to work with what is available.

2.2. Method of Analysis

For the English data, a concordance was created in WSE (Kilgarriff and Rychly 2005) by searching for the relevant lemma. One problem that immediately appears is the tendency for a word to appear repeatedly in one text, so that the style of a particular author or a usage specific to a given domain could skew the interpretation of the concordance. WSE makes it possible to avoid this by taking a random sample of a specified size from the concordance lines. This is not possible for the French data, as WSE only works with data from the BNC and Susanne (Sampson 2005). The internet corpus interface (Sharoff 2005a) does not offer this function; however, as it uses a smaller corpus there are fewer hits for each word, which gives a workable number of concordance lines from a wide range of sources. For both languages the aim was to work with a sample of 150 concordance lines, which should be large enough to be reasonably representative. For the French data the number of hits sometimes falls below 150, but in all cases it is close to that figure.

To examine the collocates of the verbs, a different strategy is again needed for each language. WSE can produce a word sketch for English lemmas, which lists the subjects, objects and other types of segment that frequently appear with a given lemma. This is not possible for French, but the internet corpus interface can compute collocation statistics for a query. It calculates log-likelihood (LL), mutual information (MI) and T scores for collocates; for dictionary writing by this method, the most useful measures tend to be the LL and T scores, which produce very similar results. With both of these, the collocates with the highest scores are the strongest collocations, and these are the ones used in this project.
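As a rough illustration of the three measures named above, the following Python sketch computes MI, T and LL scores from four frequency counts using their common textbook definitions. It is not the implementation used by the internet corpus interface, which may, for instance, adjust for the size of the collocation window; the function name, the placeholder collocates and all counts are invented for illustration.

```python
import math


def association_scores(joint, f_node, f_coll, n):
    """Return (MI, T score, log-likelihood) for one node/collocate pair.

    joint  -- number of times node and collocate co-occur
    f_node -- corpus frequency of the node word
    f_coll -- corpus frequency of the collocate
    n      -- corpus size in tokens
    """
    if joint <= 0:
        raise ValueError("joint frequency must be positive")

    # Co-occurrence frequency expected if the two words were independent.
    expected = f_node * f_coll / n

    mi = math.log2(joint / expected)
    t_score = (joint - expected) / math.sqrt(joint)

    # Log-likelihood (G2) over the 2x2 contingency table.
    observed = [
        [joint, f_node - joint],
        [f_coll - joint, n - f_node - f_coll + joint],
    ]
    rows = [sum(row) for row in observed]
    cols = [observed[0][j] + observed[1][j] for j in range(2)]
    ll = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:
                e = rows[i] * cols[j] / n
                ll += o * math.log(o / e)
    return mi, t_score, 2 * ll


# Invented figures, for illustration only.
F_NODE = 1200            # hypothetical frequency of the node verb
CORPUS_SIZE = 1_000_000  # hypothetical corpus size in tokens
# (placeholder collocate, joint frequency, collocate frequency)
CANDIDATES = [("adv_a", 30, 400), ("adv_b", 200, 90_000), ("adv_c", 12, 150)]

ranked = sorted(
    ((word, *association_scores(joint, F_NODE, f_coll, CORPUS_SIZE))
     for word, joint, f_coll in CANDIDATES),
    key=lambda row: row[3],  # rank by log-likelihood, highest first
    reverse=True,
)
for word, mi, t, ll in ranked:
    print(f"{word:8s} MI={mi:6.2f}  T={t:6.2f}  LL={ll:8.2f}")
```

Changing the sort key to `row[2]` ranks the same candidates by T score instead, which is one way to check how far the two measures agree for a given list.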
It is nevertheless useful to read through the full lists to note any other collocations worth including. It is also possible to search for collocates by part of speech. For verbs, it is particularly useful to search for nouns to the left or right in order to discover collocation patterns for subjects and objects, and for adverbs to the left or right in order to determine the sorts of adverb used with these verbs.

Avoir l’intention de is a special case: as this construction is always followed by a verb, it is not useful to look at the right context. There are, however, two places in which an adverb can occur: to the left of the verb group or to the right of avoir. It is therefore necessary to search for collocates of both avoir le intention de and le intention de. Adverbs and adverbial phrases are classed according to Mel’čuk’s lexical functions (Wanner 1996, p.22); the lexical functions most commonly found with these verbs are Magn (intense) and Ver (genuine).

3. Analysis

3.1. Desire

When searching for concordances of desire, it is important to exclude the noun desire, which would give false hits. WSE makes it possible to select the part of speech for a given lemma, thus ensuring that only verb forms are found. Among the words chosen here, this problem arises only with desire and wish; had it arisen with any of the French verbs, the noun forms could have been excluded using CQP syntax. A search of the form [lemma="x" & pos="V.*"], where x is the required verb, filters the results so that the corpus query processor displays only verb forms (assuming the corpus has been correctly tagged).

When the concordance lines for desire are sorted by right context, one pattern is immediately obvious.
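The part-of-speech restriction and the right-context sort described in this section can be imitated on a small scale in plain Python. The sketch below is not CQP and does not query either corpus: the tagged tokens are invented, the tags merely follow the style of the BNC's C5 tagset, and the function names are hypothetical.

```python
import re

# Invented tagged tokens: (word form, lemma, part-of-speech tag).
TOKENS = [
    ("They", "they", "PNP"), ("desire", "desire", "VVB"), ("to", "to", "TO0"),
    ("leave", "leave", "VVI"), (".", ".", "PUN"),
    ("His", "his", "DPS"), ("desire", "desire", "NN1"), ("for", "for", "PRP"),
    ("peace", "peace", "NN1"), (".", ".", "PUN"),
    ("We", "we", "PNP"), ("desired", "desire", "VVD"), ("peace", "peace", "NN1"),
    (".", ".", "PUN"),
]


def concordance(tokens, lemma, pos_pattern, width=3):
    """Collect (left context, keyword, right context) for matching tokens.

    A token matches when its lemma equals `lemma` and its whole tag matches
    `pos_pattern`, in the spirit of [lemma="x" & pos="V.*"].
    """
    pos_re = re.compile(pos_pattern)
    lines = []
    for i, (word, tok_lemma, pos) in enumerate(tokens):
        if tok_lemma == lemma and pos_re.fullmatch(pos):
            left = [w for w, _, _ in tokens[max(0, i - width):i]]
            right = [w for w, _, _ in tokens[i + 1:i + 1 + width]]
            lines.append((left, word, right))
    return lines


def sort_by_right_context(lines):
    """Order concordance lines alphabetically by the words after the keyword."""
    return sorted(lines, key=lambda line: [w.lower() for w in line[2]])


verb_lines = sort_by_right_context(concordance(TOKENS, "desire", "V.*"))
for left, keyword, right in verb_lines:
    print(f"{' '.join(left):>20}  [{keyword}]  {' '.join(right)}")
```

Because the tag pattern must match the whole tag, the noun reading of desire (tagged NN1 here) is dropped, and sorting on the right context groups lines that share the same following words.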
By far the most common pattern is desire followed by a verb in the infinitive. The noun phrases that appear in object position (to the right of the verb) are of two types: abstract and concrete. The abstract nouns that appear most frequently with desire are nouns like cooperation and peace; indeed, the WSE word sketch lists peace as the most common object of this verb. Another pattern that emerges from the concordance lines is the construction to leave a lot/a great deal/much to be desired. This has a corresponding French expression that will be discussed below.

Comparing the modifiers of this verb found in the BNC through WSE with those found through the Leeds BNC engine (Sharoff 2005b), we see that WSE returns fewer results. According to WSE, the adverbs associated with desire are so, much and really. The Leeds engine allows us to search for adverbs to the left and right of the lemma, which made it possible to find further adverbs and adverbial phrases such as a great deal and earnestly.

3.2. Intend

One thing that is immediately clear upon querying the BNC with either engine is the higher level of formality of intend compared with desire. One sense of intend, which evokes a decision, seems to be found in legal and political discourse and is not found with desire.

The adverbs and adverbial phrases among the collocates of this verb tell us something about its nature. Originally and primarily are among the collocates, which shows us something about the semantic content of intend. We see that intend is used to talk about an individual’s or a group’s plans with reference to the extent to which the plan was carried out: using originally with intend shows that the outcome differs in some way from the plan, whereas primarily implies that there are secondary effects.
Other common collocates are fully and clearly, which again give information about the extent and scope of a plan. The same applies to the constructions used with this verb: no pun intended signals a divergence from a plan, in the form of an unintentional slip of the tongue. In a similar vein, WSE gives joke as one of the collocates, showing that intend is used when talking about offence. There seems to be a degree of negative semantic prosody here, though not as strong as that described by Hunston (2002, p.119), in that this verb conveys the idea of the failure of a plan, usually to someone’s detriment.

The subject collocates of this verb also evoke the legal and political domain. WSE gives parliament and defendant as two of the main collocates, which are clearly linked to this type of discourse.

3.3. Wish

As with desire, it is necessary to specify that the lemma being searched for is a verb, to avoid any confusion with the noun forms of this word. This works in most cases, assuming the texts being queried have been correctly tagged.

Several senses of the verb wish can be discerned in the concordance lines from the BNC. Wish is generally used to refer to a desire that is unattainable for some reason given the current situation. This may be because the event that precludes the desired outcome has already passed, or because the outcome is simply an impossibility (particularly in the usage that expresses a wish: I wish to be rich). This meaning is shown particularly well in the usage that expresses regret (I wish I could do that).

One meaning that clearly distinguishes wish from the other verbs examined is that in which it replaces want in formal styles. This is a more concrete usage than the others, as it refers not to an abstract plan or desire but to a more immediate want.
Upon examining the collocates of wish, we find very specific uses that involve special occasions, such as to wish someone a happy birthday/merry Christmas, and more general uses that involve hoping someone will be lucky or healthy.

3.4. Avoir l’intention de

As previously explained, this construction poses a problem, as it is the only item that had to be queried as a phrase. The query has to search for each element by its own lemma. With regard to grammatical gender, it is interesting to note that although intention is feminine, the article la is lemmatised under its masculine form le. Searching for avoir le intention de therefore finds all forms of this verb and allows patterns to be discerned.

Like its English equivalent to intend, this item carries a strong sense of a difference between a planned event or situation and the outcome. This is shown in the specific meaning that is translated by to mean to do sth or to not intend to do sth. Another meaning with a clear English equivalent is the academic sense to set out, which is used to describe the intentions of a writer in a journal, paper or essay.

3.5. Désirer

Désirer is perhaps the item with the closest mapping between English and French, with senses very similar to those of the English desire. It may be noted, however, that désirer can be considered somewhat less formal than its English counterpart, as it can be translated by a more generic verb such as want. The high degree of semantic overlap between these items is not particularly surprising: like a high proportion of English and French words, the two forms share a Latin base, dēsīderāre. Fewer meanings could be ascertained from the sample of concordance lines, perhaps because of the more restricted nature of the internet corpus.

3.6. Souhaiter

As well as the meanings discussed for wish above, souhaiter clearly has some additional related meanings for which English uses different verbs: souhaiter la bienvenue and souhaiter bonsoir. In English we are much more likely to bid someone good evening, for example when parting, and to use a different verb when greeting; in French souhaiter can be used in both cases. Souhaiter la bienvenue likewise requires a different verb in English, where the meaning is conveyed by to welcome. It is interesting to note that French also has a specific verb for this, accueillir, but it seems that both verbs can be used.

It is also interesting to note that souhaiter cannot convey the sense of wish that refers to someone’s heart’s desire; used in this way, souhaiter carries only the weaker meaning.

4. Conclusion

In this limited study of only a few items, it is clear that the meanings discovered are very similar across the two languages. However, examining collocates and other phenomena through different corpus interfaces has revealed differences in usage restrictions that may prove useful to translators and lexicographers.

2179 Words

5. References

British National Corpus (2005) British National Corpus [Internet]. Oxford, Oxford University Computing Services. Available from <http://www.natcorp.ox.ac.uk> [Accessed 11/12/05].

Cobb, T. (2005) Corpus Concordance French [Internet]. Montréal, Université du Québec à Montréal. Available from <http://www.lextutor.ca/concordancers/concord_f.html> [Accessed 14/12/05].

Gouvernement du Québec (2005) Secrétariat à la politique linguistique – Corpus Lexicaux Québécois [Internet]. Québec, Gouvernement du Québec. Available from <http://www.spl.gouv.qc.ca/corpus/index.html> [Accessed 14/12/05].

Hunston, S. (2002) Corpora in Applied Linguistics. Cambridge, Cambridge University Press.

Kilgarriff, A. and Rychly, P. (2005) Word Sketch Engine [Internet]. Lexical Computing Ltd. Available from <http://www.sketchengine.co.uk> [Accessed 11/12/05].

Sampson, G. (2005) Geoffrey Sampson: Susanne Scheme [Internet]. Available from <http://www.grsampson.net/RSue.html> [Accessed 11/12/05].

Sharoff, S. (2005a) A Query to Internet Corpora [Internet]. Leeds, Leeds University Information Systems Services. Available from <http://corpus.leeds.ac.uk/internet.html> [Accessed 1/12/05].

Sharoff, S. (2005b) A Query to English Corpora [Internet]. Leeds, Leeds University Information Systems Services. Available from <http://corpus.leeds.ac.uk/protected/> [Accessed 1/12/05].

Sharoff, S. (2005c) Creating general-purpose corpora using automated search engine queries. In: Baroni, M. and Bernardini, S. eds. Web as corpus.

Wanner, L. ed. (1996) Lexical Functions in Lexicography and Natural Language Processing. Philadelphia/Amsterdam, John Benjamins Publishing Company.