Polishing Papers for Publication: Palimpsests or Procrustean Beds?¹

John McKenny
University of Nottingham, Ningbo China

Karen Bennett
Centre for Comparative Studies
University of Lisbon
Portugal

Abstract

Portuguese academic discourse of the humanities is notoriously difficult to render into English, given the prevalence of rhetorical and discourse features that are largely alien to English academic style. The aim of this study was to test the hypothesis that some of those features might find their way into the English texts produced by Portuguese scholars through a process of pragmalinguistic and sociopragmatic transfer. If so, this would have important practical and ideological implications, not only for the academics concerned, but also for editors, revisers, teachers of EAP, translators, writers of academic style manuals and all the other gatekeepers of the globalized culture. The study involved a corpus of some 113,000 running words of English academic prose written by established Portuguese academics in the Humanities, which had been presented to a native speaker of English (professional translator and specialist in academic discourse) for revision prior to submission for publication. After correction of superficial grammatical and spelling errors, the texts were made into a corpus, which was tagged for Part of Speech (CLAWS7) and discourse markers (USAS) using Wmatrix2 (Rayson 2003). The annotated corpus was then interrogated for the presence of certain discourse features using Wmatrix2 and Wordsmith 5 (Scott 2006), and the findings compared with those of a control corpus, Controlit, of published articles written by L1 academics in the same or comparable journals. The results reveal significant overuse of certain features by Portuguese academics, and a corresponding underuse of others, suggesting marked differences in the value attributed to those features by the two cultures.
Keywords: academic discourse, humanities, Portuguese, English, research articles corpus

1 This is a postprint of an article published in English Text Construction, 2.2. 2009. 228-245. Reproduced with the kind permission of John Benjamins Publishing. The complete article may be accessed at: http://benjamins.com/#catalog/journals/etc.2.2.06mck/details

Introduction

English academic discourse, which emerged in the 17th century as a vehicle for the new rationalist/scientific paradigm (Halliday & Martin, 1993:2-21, 54-68; Martin, 1998), now holds hegemonic status on the world stage, and mastery of it is essential for any scholar wishing to pursue an international career (Tardy 2004). However, it may not be taken for granted that all cultures construe knowledge in the same way (Canagarajah 2002:1-5). In Portugal, which did not experience a Scientific Revolution as such, an older humanities-based tradition was perpetuated by an education system grounded on Scholastic and Rhetorical principles. As a result, Portuguese academic discourse in the humanities contains features that are markedly different from the hegemonic English style - so much so, in fact, that they may even reflect a whole different underlying epistemology (Bennett, 2006, 2007a, b). Determining the extent to which these features intrude upon the English writing produced by Portuguese academics wishing to publish abroad is the main aim of this paper.

The possibility that there may exist cultural differences in discursive or expository writing patterns was first raised by Robert B. Kaplan in a seminal paper published in 1966. In it, he suggested that many of the errors of text organisation and cohesion made by foreign students in their academic writing may be due to different cultural conventions and indeed ‘thought patterns’ encoded in their mother tongues.

Logic (in the popular, rather than the logician’s sense of the word), which is the basis of rhetoric, is evolved out of a culture; it is not universal.
Rhetoric, then, is not universal either, but varies from culture to culture and even from time to time within a given culture. It is affected by canons of taste within a given culture at a given time. (Kaplan, 1980:400)

He went on to assert that the typical linear development of the expository English paragraph may in fact be quite alien to other cultures, and even suggested a series of diagrammatic representations of how a paragraph might develop according to Semitic, Oriental, Romance and Russian styles (Idem:403-411). Although this initial approach was overly simplistic, Kaplan’s work gave rise to a multitude of similar studies that explored discourse differences from a variety of cultural perspectives (e.g. Smith, 1987; Ventola & Mauranen, 1996; Duszak, 1997), eventually culminating in the formal constitution of the discipline that is today known as Contrastive Rhetoric (Connor, 1996). Thus, English academic writing has been compared to ‘teutonic, gallic and nipponic’ styles (Galtung, 1981), German (Clyne, 1987a, 1987b, 1988), Indian languages (Kachru, 1987), Czech (Cmejrková, 1996, 1997), Finnish (Mauranen, 1993), Polish (Duszak, 1994), Norwegian (Dahl, 2004) and Russian/Ukrainian (Yakhontova, 2002, 2006), to name but a few.

Unfortunately, Portuguese academic discourse has been somewhat neglected amidst this plethora of contrastive rhetorical studies. There has been some investigation into other Romance languages, particularly Spanish, which has a certain relevance: for example, Kaplan (1980:408), in his initial article, observed that ‘there is much greater freedom to digress or to introduce extraneous material in French, or in Spanish, than English’, while Grabe & Kaplan (1996:194), summarizing the work of several different researchers, report that Spanish writers prefer a more ‘elaborated’ style of writing, use longer sentences and have a penchant for subordination.
More recently, Martín Martín (2003) has investigated rhetorical variation between social science abstracts in Spanish and English; Moreno (1997) has looked at the use of causal metatext (or text about text) in the same two languages, and Mur Dueñas (2007b) has examined pronoun use and self-mention. Salager-Meyer (2003) also explores the differences between Spanish, English and French in her work on medical discourse, while, within pragmatics, Cuenca (2003) examines reformulation markers in English, Spanish and Catalan.

As regards Portuguese in particular, McKenny (2005) examines epistemic stance and dogmatism in the argumentative writing of Portuguese advanced learners using Porticle, the Portuguese subcorpus of ICLE, the International Corpus of Learner English, and, in a later work (2007), discusses the implications of differing rhetorical conventions and traditions for the teaching of EAP writing.

Bennett’s work on Portuguese academic writing (2006, 2007a, b) differs from the Contrastive Rhetoric studies described above in that it is not oriented towards the teaching (EAP) profession. Instead, it took place within the sphere of Translation Studies (TS) and involved the systematic analysis of a corpus of Portuguese academic texts that had been submitted for translation. The aim was to determine some of the problems raised by differences between source text features and target culture expectations, extending beyond the merely technical to take in the ethical and ideological implications of 'domestication' (i.e. the systematic refashioning of the source text to bring it into line with target culture norms) (Venuti, 1995). The present paper to some extent represents a continuation of that project, in that it deals with a parallel corpus of texts also written by Portuguese academics, though this time in English, as they were submitted for revision rather than translation.
Revision is thus considered here as paratranslational activity, and the language reviser is perceived as one of the many 'literacy brokers' that typically intervene in a text in order to prepare it for publication in the English-speaking world (Lillis & Curry, 2006).

Much Portuguese academic writing in the humanities displays characteristics that are diametrically opposed to those valued by English Academic Discourse writing manuals (Bennett, 2009). It is characterised by a taste for ‘copiousness’ (manifested by a general ‘wordiness’ and redundancy); a preference for a high-flown erudite register over the demotic (evident in both syntactical structure and lexical choices), and a tendency towards abstraction and figurative language. Cohesion is frequently achieved through elaborate synonyms and cataphora, rather than by ellipsis or anaphoric pronouns as might be preferred in English (Halliday & Hasan, 1976; Mateus et al. 1989: 146); and there are also important differences as regards textual organization: a propensity for indirectness means that the main idea is often embedded, adorned or deferred at all ranks. Some of these features are illustrated in the extract of Portuguese academic prose presented below²:

O ensaísmo trágico de Lourenço, [sic] parece em parte decorrer da sua própria tragicidade de ensaísta, malgré lui,
Lourenço’s tragic essayism seems in part to arise out of his own tragicity as an essayist, ‘malgré lui’,

como se esta posição de metaxu do pensamento português, entre o mythos e logos, projectada no papel do crítico
as if this position of ‘metaxu’ of Portuguese thought, between ‘mythos’ and ‘logos’, projected onto the role of critic

que tragicamente parece assumir, entre o sistema impossível e a poiesis estéril, o guindasse para um lugar / não lugar
which he tragically seems to assume, between the impossible system and the sterile ‘poiesis’, hoists him to a place / non-place

de indecibilidade trágica, ao mesmo tempo que, inserido no fechamento de um pensar saudoso, na clausura
of tragic undecidability, at the same time as, inserted into the closure of a yearning thought, in the confinement

de uma historicidade filomitista, mais do que logocêntrica, se debate na paradoxia de uma portugalidade sem mito,
of a philomitist historicity, more than logocentric, struggles in the paradoxalness of a Portugalness without myth,

atada à pós-história de si mesmo, simultaneamente dentro e fora dela.
bound to the post-history of itself, simultaneously inside and outside it.

Fig. 1: Varela, M.H. 2000. ‘Rasura e reinvenção do trágico no pensamento português e brasileiro. Do ensaísmo lúdico ao ensaísmo trágico’ in Revista Portuguesa de Humanidades, Vol.4 (UCP, Braga)

2 Clarity of exposition and logical reasoning are clearly not objectives here, for the text revels in ambiguity, deliberately setting up paradoxes and analogical relations and using language in a non-referential way. The syntax is incredibly complex, with a meandering main clause that is constantly being interrupted by circumstantial information; and there is also a high degree of abstraction that is scarcely digestible by the English language (e.g. ‘tragicity’, ‘Portugalness’, ‘messianity’). There are also very few of the material processes that are so predominant in English academic prose, and instead most are relational or existential. See Bennett (2006) for a more detailed analysis of this passage.

Hence, this study is designed to test the hypothesis that some of the discourse features typical of Portuguese writing in the humanities may manifest themselves in the English-language texts produced by Portuguese scholars, over and above the kind of cross-linguistic transfer that is expected on the level of grammar and lexis (Odlin 1989).

Certain epistemological issues had to be taken into consideration from the outset of this experiment. If the researcher has strong intuitions as to why a group of writers write in a certain way based on long experience of teaching EAP, translating and polishing papers, should these intuitions be brought to bear a priori on the corpus analysis? Such a method seems to run counter to the position of Sinclair (2004) and Tognini-Bonelli (2001), who each recommend approaching the data without presuppositions and going where the data lead. As researchers, however, we did not feel impelled to choose between corpus-based or corpus-driven linguistics (Ooi 1998). When we uploaded our two corpora to Wmatrix2 (Rayson 2003), information about distinctive features of the corpora was registered automatically by the software. The resultant data did not necessarily accord with our predictions or perceptions. At this stage, the phase of POS and semantic tagging and automatic corpora comparison, our investigation was corpus-driven. When we brought our intuitions to bear on the resultant data, in order to sort out features worth investigating further, we were doing a corpus-based analysis of our corpora. At different stages we were doing different kinds of corpus linguistics. We assume that scholars are capable of periods of epoché when their most firmly held beliefs are suspended, questioned or submitted to empirical tests. Indeed, one of the suggestions made in this paper is that corpus-based critical discourse analysis is a potentially fruitful approach to the study of intercultural rhetoric.

Corpus and Methods

The research was based on the comparison of two corpora each of around 113,000 words.
The corpus under investigation, dubbed Portac, consists of a sample of articles from the area of the Humanities or Arts written by a group of senior Portuguese academics aiming to publish their work in English-language journals. The control corpus (Controlit) was a collection of articles already published by L1 academics in the same or comparable journals. The Portac corpus was basically opportunistic or self-selecting, as it consisted of draft papers intended for publication and written by individual academics willing to allow their texts to be used as data for linguistic investigation. This data resulted from the work of one of the authors as a language reviser, that is to say, a professional translator and specialist in academic discourse who undertakes to revise a manuscript prior to its submission for publication. When the agreement of the Portac authors had been obtained, we set out to compile a control corpus of comparable overall size made up of texts with similar communicative purposes. A list was drawn up of the English language journals in which the Portuguese authors wished to be or had been published. This list was subsequently narrowed down to four journals, chosen because articles published in them were available electronically from university library databases, and a census was made of the articles published in these four journals between 2005 and 2008. Two filters were applied in selecting articles as candidates for inclusion in Controlit. Firstly, only those articles written by single authors were retained. Secondly, an attempt was made to ensure that we selected only articles written by native speakers of English. Using surnames as a guide, only the texts of authors with Anglo-Celtic names were considered (e.g. Richardson, Saunders, Newlyn, Groves, Neill, Ricks) and a further check was made on first names. 
Admittedly this method is far from infallible but it at least minimizes the likelihood of including L2 writers in the Controlit corpus, which was designed to represent L1 writing. The result of this filtering left a set of articles which we chose from according to theme: those articles which dealt with subjects of interest to our Portac writers, broadly considered, were selected to make up the Controlit corpus.

Two software suites were used for this study in a complementary fashion. Wmatrix2¹ (Rayson 2003), available to scholars online, enables the investigator to compare two corpora and continually shift focus as trends become apparent; that is to say, researchers may quickly compare lexical and grammatical dimensions from the perspective of one or other of the corpora. Wordsmith Tools 5 (Scott 1999) was used to carry out searches which are not available on Wmatrix2, such as the creation of frequency counts of word clusters, or word or N-gram searches using a wild card (for example, for polysyllabic noun forms, a frequency list of all words ending in *ion).

Results of corpus comparison in Wmatrix2 and in Wordsmith Tools are expressed in terms of Log Likelihood² (henceforth LL), which measures the likelihood that a difference between the observed frequency of an item and its expected frequency is not random. The higher the LL value, the more significant is the difference between two frequency scores. An LL value of 3.8 or higher is significant at the level of p < 0.05 and an LL of 6.6 or higher is significant at p < 0.01.

Results

Probably the most significant finding was the high degree of nominalization present in the writing of Portuguese academics compared to the control corpus. This was manifested in a number of ways. At the level of individual words, there was an overuse of nouns, both singular (LL 25.17) and plural (LL 69.81), and, as might be expected in such a context, a greater use of indefinite and definite articles (LL 43.81 and LL 36.13 respectively).
Concomitant with this, there was also a massive underuse of pronouns in Portac, 6,154 (6.11% of all text) vs. 8,671 in Controlit (8.49%), giving an astonishing Log Likelihood of 394.98. This may represent a straightforward consequence of nominalization; for, as Biber et al. (1999:92) conclude, from analyzing various written corpora totalling 40 million words, ‘a high frequency of nouns/…/corresponds to a low density of pronouns’. However, the Portac writers also seem to be selective about the pronouns they avoid: he (LL -232), she (LL -104), him (LL -96), I (LL -39), me (LL -37) and it (LL -25.74) were all underused, while we (LL 39.41) and us (LL 16.85) were overused. This overuse of the plural pronouns in Portac cannot be attributed to multiple authorship, as all the articles in Portac and also in Controlit were written by a single author. There seems to be some other mechanism at work, as we discuss below.

Of the nouns employed, Portuguese authors appear to have a penchant for polysyllabic abstract nouns of Latinate origin. Using Wordsmith 5 to search on *ion, 2,184 instances of this suffix were obtained in Portac compared to only 1,458 in Controlit (the Log Likelihood of such a difference is 163), while the results for -icity, -ization and -ation gave LL 7.07, LL 14.16 and LL 50.71 respectively. Hofland and Johansson (1982:22) suggest that the high frequency of the indefinite article an found in written informative prose indicates a high proportion of Latinate vocabulary. The Portuguese writers’ overuse of an (LL 18.65) may thus be a direct consequence of their greater use of Latinate word tokens, consonant with their mother tongue’s close filiation with Latin. Adjectives were also more prevalent in Portac (LL 46.58), which once again indicates a heavy concentration of semantic content in the noun phrase.
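The Log Likelihood comparison behind these figures can be sketched in a few lines of Python. This is only an illustrative sketch, assuming the two-term log-likelihood formula commonly used for comparing an item's frequency across two corpora; it is not the code of Wmatrix2 or Wordsmith Tools. The function names are hypothetical, and the corpus sizes are not given in the text: they are back-calculated here from the reported pronoun percentages (6.11% and 8.49%), so the result only approximates the reported value.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Signed log-likelihood for one item counted in two corpora.

    Positive: relatively more frequent in corpus A (overuse in A).
    Negative: relatively more frequent in corpus B (underuse in A).
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected frequencies if the item were spread evenly over both corpora.
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    g2 *= 2
    overused_in_a = freq_a / size_a >= freq_b / size_b
    return g2 if overused_in_a else -g2


def significance(ll):
    """Map an LL value onto the thresholds quoted in the text."""
    value = abs(ll)
    if value >= 6.6:
        return "p < 0.01"
    if value >= 3.8:
        return "p < 0.05"
    return "not significant"


# Pronoun counts are from the text; the corpus sizes are ASSUMED,
# derived from the rounded percentages reported alongside the counts.
portac_size = round(6154 / 0.0611)
controlit_size = round(8671 / 0.0849)
ll = log_likelihood(6154, portac_size, 8671, controlit_size)
print(round(ll, 2), significance(ll))
```

Under these assumed sizes the result is negative (underuse in Portac) and close in magnitude to the reported figure; the small discrepancy is to be expected, since the percentages from which the sizes were derived are rounded.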
Perhaps also related to the tendency for nominalization was a truly startling overuse of the genitive, both singular and plural (’s and s’) (LL 211.64), and also the alternative construction using of to express the same relationship (LL 34.03). In some cases, this may simply reflect the difficulty that non-native speakers have with English compound nouns (examples from Portac include the world’s population, where a native speaker might prefer the world population, or Luanda’s slums instead of the Luanda slums). Elsewhere, however, it seems to derive directly from the tendency to overnominalize. In the following example, the genitive in the noun phrase a comment on the possibilities of the play’s staging was reconstrued by the reviser using a clausal form (i.e. a comment upon how the play might be staged).

Wmatrix was used on the POS tagged versions of the corpora to search for subordinating conjunctions (e.g. if, because, unless, so, for, although, while) and coordinating conjunctions (and, or, nor). The Portac writers, at first glance, appeared to underuse subordination (LL -8.16) and greatly overuse coordination (LL 26.17) in comparison with the writers in Controlit. Although this automated measure of subordination seems to suggest that the Portuguese academics use fewer subordinate clauses, it needs to be remembered that the POS tag CS, which stands for subordinating conjunction, does not include occurrences of that used as a relative pronoun. Clearly, relative clauses are subordinate clauses par excellence. A non-computerized search was needed to distinguish the uses of that as a relative pronoun in the two corpora. The greater frequency of that relatives in Portac produced a Log Likelihood of 10.15, so this kind of subordination, at least, was more frequent in the English academic discourse of the Portuguese writers.

The second most frequent use of that in the two corpora was to introduce clauses embedded in matrix structures.
This structure allows a writer to thematize attitudinal meanings and offers an explicit statement of evaluation by presenting the ‘evaluative that’ clause embedded within a matrix clause:

I should say from the start that my aim here is not to address the problem of translation itself - although obviously, in this context, some issues relating to it have to be considered. (from Portac)

It has long been recognised that Shakespeare read and borrowed from the Geneva translation of the Bible. (from Controlit)

As mentioned in the introduction, there is a rich mosaic of contrastive work done by scholars on different aspects of academic discourse, a good deal of which focuses closely on one particular syntactic or rhetorical feature. One such investigation is that of Hyland and Tse (2005), who looked at the frequency and function of ‘evaluative that’ clauses in academic abstracts. Drawing on this work, every example of that in both our corpora was examined to eliminate all cases where that was used to perform other grammatical functions, such as where it acted as demonstrative or relative pronoun. The result of this non-computerized search is given in Table 1. It is clear that this type of subordination is used more frequently by the Portac authors.

Portac    Controlit    Log Likelihood
753       662          5.7

Table 1 Evaluative that

Closer inspection of evaluative that clauses showed that Portac writers make greater use of certain kinds of embedding or matrix structures (such as We can see that…; It should be pointed out that...), particularly to carry epistemic stance. Table 2 shows the results of concordance searches using Wordsmith Tools on four variable structures. The choice of these sentence initial frames was guided by the intuitions of the authors.
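A wildcard search over sentence frames of this kind can be approximated with regular expressions. The sketch below is a hypothetical illustration in Python, not a description of how Wordsmith Tools implements its searches: it assumes that each free-standing * in a frame stands for exactly one word, and the names frame_to_regex and count_frame, as well as the sample sentences, are invented for the example.

```python
import re

# One word, allowing internal apostrophes or hyphens (e.g. "well-known").
WORD = r"\w+(?:['’-]\w+)*"


def frame_to_regex(frame):
    """Turn a frame such as 'It * * that' into a compiled regex.

    ASSUMPTION: every free-standing '*' matches exactly one word.
    Matching is case-sensitive, so 'It' only matches a capitalised 'It'.
    """
    parts = [WORD if token == "*" else re.escape(token)
             for token in frame.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b")


def count_frame(text, frame):
    """Count non-overlapping occurrences of the frame in the text."""
    return len(frame_to_regex(frame).findall(text))


# Invented sample text, for illustration only.
sample = ("It is clear that A holds. It should be pointed out that B follows. "
          "We can see that C applies. We believe that D matters.")

for frame in ("It * * that", "We * that", "We * * that"):
    print(frame, count_frame(sample, frame))
```

Because each wildcard is tied to exactly one word, a longer structure such as It should be pointed out that is not counted under It * * that; capturing it would need a frame with more slots or a wildcard allowed to span several words.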
Search words with wild cards (*)    Portac    Controlit
It * * that                         42        28
It is * * that                      27        22
We * that                           17        4
We * * that                         15        3
TOTAL                               101       57

Table 2 Frequency of use of some embedding clauses

Another suggestive work from which we could obtain ‘intuitions’ to test empirically was the research done by Cuenca (2003). Cuenca carried out a contrastive analysis of the usage of reformulation markers in academic English compared to similar writing in Spanish and Catalan. We suspected that the usage of reformulation markers in the Portac corpus might resemble that found by Cuenca in the two cognate Romance languages. A comparison of our two corpora revealed a higher occurrence of reformulation markers in the writing of the Portuguese academics (LL 67.76). To the list of such markers provided by Wmatrix (namely, i.e. and e.g.) were added two other parenthetical connectives analyzed by Cuenca (2003): that is and in other words. The reformulation marker or was found to have the same distribution in the two corpora and was not studied further. Wordsmith Tools Concord was used to calculate the frequency of occurrence of each reformulation marker in the two corpora. This overuse of reformulation markers is examined further in the Discussion section below.

Reformulation marker    Portac    Controlit
namely                  32        1
that is                 22        9
i.e.                    20        2
in other words          15        12
e.g.                    10        -
Total                   99        24

Table 3 Occurrences of reformulation markers in the corpora

In the initial Wmatrix contrast of the two corpora, significant overuse of prepositions by the Portac writers was apparent (LL 46.32). As noted above, of was a main contributor to this overuse (LL 31.31). A closer scrutiny revealed that multi-word prepositions (Granger & Meunier 2008) also contributed to this difference between the two corpora (LL 13.19).

Multi-word preposition    No. of occurrences in Portac    No. of occurrences in Controlit
with_regard_to            23                              -
by_means_of               11                              1
with_reference_to         8                               1
in_spite_of               8                               4
in_view_of                7                               -
in_connection_with        5                               1
by_way_of                 5                               1
in_front_of               4                               1
in_conjunction_with       3                               -
in_common_with            3                               1

Table 4 Comparison of most frequently used multi-word prepositions

The multi-word prepositions overused by the Portuguese authors and listed in Table 4 bear a fairly close resemblance to compound prepositions frequently used in Portuguese. One question worth examining is whether there are two kinds of transfer: (1) cross-linguistic transfer, which takes place at the lexical level and seems likely in relation to these multi-word prepositions; and (2) the transfer of discourse conventions, which might be the more likely explanation of the variation in use of the reformulation markers recorded in Table 3.

This initial glimpse of the supra-lexical patterns in the multi-word prepositions and the multi-word expressions led us to believe that in a further exploration of our two corpora we should do a comparative study of the phraseology of our two groups of writers. Table 5 shows the results of the automated contrast between the two groups of writers in their use of multi-word expressions (MWEs), as measured by the semantic tagger of Wmatrix. It should be noted that all MWEs in Table 5 have LL values higher than 6.6 and are therefore significant at p < 0.01. Also, in this list of the twenty highest log likelihoods there are positive and negative LL values: + means that the Portuguese academics are using the expression more frequently, while - indicates that the L1 authors in Controlit are using the expression in question more. It is noteworthy that, of the first five MWEs overused by Portac writers, three (with regard to; according to; as regards) perform a textual function in the sense of Halliday and Hasan (1989: 29): i.e.
they are not so much used to express ideas or interpersonal relations but rather as a means of ensuring that what is written is relevant and relates to its context. The prepositional phrase in fact, which is the MWE most frequently used by Portac writers, did not feature in the discussion of reformulation markers above or in Table 3. Nevertheless, the important role that this discourse marker of reformulation plays in realizing epistemic stance would repay further study.

Although Table 5 shows only the 20 MWEs most frequently used by Portac writers, very interesting results were obtained when all of the frequencies of the more than 3,000 MWE types detected in both corpora were compared. Portac had 5,756 tokens of MWEs as opposed to 4,772 in Controlit. The Log Likelihood of 92.10 obtained for this comparison suggests that there are significant differences in the balance between novel and formulaic language in the two groups of writers. The provision for customizing the USAS semantic tagger in Wmatrix by extending the dictionaries means that De Cock’s (2000) pioneering work on formulaicity in EFL speech and writing can now be applied more easily in cross-cultural rhetoric studies.
MWE               Portac    Controlit    Log Likelihood
in_fact           87        14           + 55.40
with_regard_to    23        0            + 30.83
according_to      44        11           + 19.70
as_much_as        22        2            + 18.59
as_regards        13        0            + 17.43
in_the_picture    11        0            + 14.75
carried_out       10        0            + 13.41
his_own           22        51           - 13.23
in_question       17        2            + 12.87
due_to            14        1            + 12.85
out_of            10        30           - 11.41
brought_about     7         0            + 9.38
in_view_of        7         0            + 9.38
made_up           7         0            + 9.38
still_life        7         0            + 9.38
white_man         7         0            + 9.38
by_means_of       11        1            + 9.29
in_order_to       44        19           + 9.08
her_own           6         20           - 8.62
in_the_end        10        1            + 8.14

Table 5 Comparison of the most frequently used multi-word expressions (MWEs) extracted automatically by Wmatrix

Table 6 below contains a summary of the main findings of the analysis of the two corpora using Wmatrix, supplemented by Concord in Wordsmith Tools when searching for word clusters. All the Log Likelihood values refer to overuse or underuse of expressions by the writers in Portac. The positive values of LL refer to overuse of such expressions while the negative values register underuse by the same writers.

overuse of nouns, articles, adjectives in Portac
underuse of pronouns in Portac (LL -394.98)
underuse: he (LL -232), she (LL -104), him (LL -96), I (LL -39), me (LL -37), it (LL -25.74)
overuse: we (LL 39.41) and us (LL 16.85)
words ending *ion in Portac (LL 163); -icity (LL 7.07); -ization (LL 14.16); -ation (LL 50.71)
overuse of an (LL 18.65)
overuse of the genitive, singular and plural (’s and s’) (LL 211.64); of to express the same relationship (LL 34.03)
underuse of subordinating conjunctions (LL -8.16)
overuse of coordination (LL 26.17)
that as a relative pronoun in Portac (LL 10.15)
evaluative that clauses (LL 5.7)
overuse of reformulation markers (LL 67.76)

Table 6 Summary of main findings

Discussion

All these features together make the English prose of Portuguese academics seem very dense and abstract in relation to that of their native speaker counterparts, and this may ultimately affect their chances of getting their work published.
However, before looking at solutions to this problem, let us first discuss possible reasons for these differences. Although nominalization has been a central feature of English academic discourse since the emergence of scientific writing in the 17th century (see Halliday and Martin 1993; Martin & Veel 1998), there is evidence to suggest that the ‘historic drift towards thinginess' (Halliday 1998:211) may have gone into reverse in recent years. Certainly, public English generally seems to be becoming more 'conversational' and informal (Fairclough 1994, 1997), and one of the ways in which this is manifested is by a new preference for clausal structures above nominalizations (see Leech et al. 2001:294). It could be the case that Portuguese academic writers are somewhat lagging behind in this respect, reluctant to accompany such innovation or less able to respond to the trend. If this were all there were to it, then the problem would seem to be easily solvable through effective teaching, designed to raise L2 writers’ awareness of nominalization and encourage a more clausal-based style. If, however, there are cultural reasons for the markedly different style employed by Portuguese authors, as we suspect, the issue becomes ideologically more complex. It is reasonable to assume that many of the differences between Portac and Controlit may be accounted for by a tendency on the part of Portuguese academics to transfer stylistic and rhetorical features that are valued in their own culture into their English writing. For example, the habit in Portuguese of using synonyms rather than anaphoric pronouns to achieve textual cohesion (Mateus et al. 1989: 146) may contribute to the generally low pronoun count in Portac. However, further analysis is needed to substantiate these intuitions. To our knowledge, the relative frequencies of different cohesive devices have not yet been systematically counted in either English or Portuguese academic discourse. 
A corpus investigation of this area would provide a very useful contribution to research in Contrastive Rhetoric and Translation Studies.

A similar process of L1 transfer may account for the frequent use of reformulation markers. Cuenca (2003) concludes that academic writing in Spanish and Catalan displays much more frequent use of reformulation markers than is found in English, and the Portuguese approach to reformulation is likely to be much closer to Spanish and Catalan than to English. Clearly, then, contrastive rhetoric research carried out on other Romance languages might provide the student of Portuguese L1 and L2 writing with clues as to which linguistic features to investigate.

Although it has not been possible to test for the presence in Portac of all the differentiating discourse features identified by Bennett (2006, 2007a, 2007b), the findings listed above would seem to point to the persistence of some Portuguese academic writing conventions in these English texts. For example, the heavy nominalization not only makes the prose sound more ‘learned’ and ‘literary’, it also has the effect of turning contingent observations into abstractions, a quality that is reinforced by the prevalence of polysyllabic Latinate words and lexical abstractions (i.e. nouns ending in -ion, -icity, -ization). The long sentences and embedding structures reproduce the copiousness and indirectness of Portuguese prose, while the proliferation of adjectives and appositional structures also serves to ‘pad out’ the discourse, creating an impression of abundance. Finally, the overuse of the first-person plural pronoun may be a direct transposition of the Portuguese authorial ‘we’, used systematically even when the text has been penned by a single author (as was the case with all the Portac texts) in the belief that this creates an effect of modesty by implying collective rather than individual thought (Estrela et al.
2006:47; Eco, 1997:168) Conclusion If the differences between Portac and Controlit can indeed be explained by the intrusion of Portuguese discourse features into the English prose produced by Portuguese academics, this raises important questions of both a practical and an ideological nature. Firstly, to what extent does this transfer jeopardize the chances of Portuguese academics being published in international journals? We know that verbosity, unnecessary complexity, abstraction and ‘pomposity’ are generally eschewed by arbitrators of style in English academic prose; but are editors and referees aware that other cultures may value these qualities differently? Would such an awareness alter their perception of the quality of the work submitted and therefore affect the international status of the authors in question? Secondly, to what extent should texts like these be domesticated in order to bring them into line with the Procrustean norms imposed by the hegemonic culture? Are revisers, editors and proofreaders at liberty to erase or alter discourse features that transmit value and are therefore profoundly bound up with questions of identity? Or might this constitute a form of cultural imperialism, or even ‘epistemicide’ (Santos, 2005; Bennett, 2007b), all the more insidious because it undermines the very conceptual framework upon which the author’s worldview is based? And what of the alternative, the ‘palimpsest’, that allows the thought patterns of the original version to be glimpsed beneath the surface structure? Can we guarantee that this will find a readership, even if it gets past the editors and referees? It is, after all, so much more tiring for readers to process sentences that do not fall in the way that one expects them to. Corpus Linguistics may have a useful role to play in this debate. 
Communication is now understood to be far more complex than theoretical notions of ‘standard English’ would have us believe, and there have already been moves towards adopting more realistic language models within corpus-enabled learning environments. By raising awareness of some of the differences existing between the discourses produced at the centre and margins of the system (Kachru, 1988; Canagarajah, 2002), Corpus Linguistics can make a useful contribution to work currently being pursued in fields such as Critical Discourse Analysis and Ethnomethodology, where issues of value and power take centre stage. Corpus tools may also be used by EAP teachers in the preparation of didactic materials and by learners who wish to orient their own progress autonomously. Hopefully, this will not only empower those on the periphery who wish to make their voice heard, but also encourage the conservatives at the centre to question the basic premises upon which the whole concept of Western knowledge is based.

In this paper, the reader may detect the tension between a top-down, theory-led approach and a bottom-up, data-driven approach to discourse. The work began as an attempt to see whether two different paradigms of linguistics could converge fruitfully on the same issues. On the way, we were sometimes reminded of the Hedgehog and the Fox (Berlin 1953). We leave it to the reader to decide who is who.

Endnotes

1 Available online at <http://ucrel.lancs.ac.uk/wmatrix2.html>
2 Log Likelihood information and calculator available online at <http://ucrel.lancs.ac.uk/llwizard.html>

References

Bennett, K. 2006. ‘Critical language study and translation: the case of academic discourse’. In Translation Studies at the Interface of Disciplines, J.F. Duarte, A.A. Rosa & T. Seruya (Eds.). Amsterdam & Philadelphia: John Benjamins. 111-127.
Bennett, K. 2007a. ‘Galileo’s revenge: ways of construing knowledge and translation strategies in the era of globalization’. In Social Semiotics, 17(2), M. Salama-Carr (Ed.). Abingdon: Taylor & Francis. 171-193.
Bennett, K. 2007b. ‘Epistemicide! The tale of a predatory discourse’. In The Translator, 13(2). Manchester: St Jerome. 151-169.
Bennett, K. 2009. ‘English Academic Style Manuals: A Survey’. In Journal of English for Academic Purposes, 8(1). 43-54.
Berlin, I. 1953. The Hedgehog and the Fox: An Essay on Tolstoy’s View of History. London: Weidenfeld and Nicolson.
Biber, D., Johansson, S., Leech, G., Conrad, S. and Finegan, E. 1999. Longman Grammar of Spoken and Written English. London: Longman.
Canagarajah, A. S. 2002. A Geopolitics of Academic Writing. Pittsburgh, PA: University of Pittsburgh Press.
Clyne, M. 1987a. ‘Cultural Differences in the Organization of Academic Texts in English and German’. In Journal of Pragmatics, 11. 211-247. North Holland: Elsevier.
Clyne, M. 1987b. ‘Discourse Structures and Discourse Expectations: Implications for Anglo-Germanic Academic Communication in English’. In Discourse Across Cultures: Strategies in World Englishes, Larry E. Smith (Ed). Prentice Hall.
Clyne, M. 1988. ‘Cross-Cultural Responses to Academic Discourse Patterns’. In Folia Linguistica 22.
Čmejrková, S. 1996. ‘Academic Writing in Czech and English’. In Eija Ventola and Anna Mauranen (Eds). Pragmatics and Beyond. Amsterdam: John Benjamins. 137-153.
Connor, U. 1996. Contrastive Rhetoric: Cross-Cultural Aspects of Second-Language Writing. Cambridge: Cambridge University Press.
Cuenca, M.-J. 2003. ‘Two ways to reformulate: a contrastive analysis of reformulation markers’. Journal of Pragmatics 35: 1069-1093.
Dahl, T. 2004. ‘Textual metadiscourse in research articles: A marker of national culture or of academic discipline?’ Journal of Pragmatics 36: 1807-1825.
De Cock, S. 2000. ‘Repetitive phrasal chunkiness and advanced EFL speech and writing’. In Mair, C. and Hundt, M. (Eds). Corpus linguistics and linguistic theory: papers from the twentieth International Conference on English Language Research on Computerized Corpora (ICAME 20), Freiburg im Breisgau 1999. Amsterdam: Rodopi.
Duszak, A. 1994. ‘Academic discourse and intellectual styles’. Journal of Pragmatics 21: 291-313.
Duszak, A. (Ed). 1997. Cultural Styles of Academic Discourse. Berlin & New York: Mouton de Gruyter.
Eco, U. 1997 [1977]. Como Se Faz uma Tese em Ciências Humanas. Lisbon: Editorial Presença.
Estrela, E., Soares, M.A. & Leitão, M.J. 2006. Saber Escrever uma Tese e Outros Textos. Lisbon: D. Quixote.
Fairclough, N. 1994. ‘Conversationalization of public discourse and the authority of the consumer’. In R. Keat, N. Whiteley and N. Abercrombie (Eds). The authority of the consumer. London: Routledge.
Fairclough, N. 1997. ‘Critical discourse analysis’. In T. A. van Dijk (Ed). Discourse studies: a multidisciplinary approach. Vol 2 Discourse as social action. London: Sage. 258-284.
Galtung, J. 1981. ‘Structure, culture, and intellectual style: an essay comparing Saxonic, Teutonic, Gallic and Nipponic Approaches’. In Social Science Information 20, 6. London and Beverly Hills: Sage.
Giannoni, D. S. 2002. ‘Worlds of gratitude: A contrastive study of acknowledgement texts in English and Italian’. Applied Linguistics 23: 1-31.
Grabe, W. & Kaplan, R. 1996. Theory and Practice of Writing: An Applied Linguistics Perspective. London & New York: Longman.
Granger, S. and Meunier, F. 2008. Phraseology: An Interdisciplinary Perspective. Amsterdam & Philadelphia: John Benjamins.
Halliday, M.A.K. 1998. ‘Things and relations: Regrammaticising experience as technical knowledge’. In Reading Science: Critical and Functional Perspectives on Discourses of Science, Jim R. Martin & Robert Veel (Eds). London and New York: Routledge.
Halliday, M.A.K. and Hasan, R. 1976. Cohesion in English. London and New York: Longman.
Halliday, M.A.K. and Hasan, R. 1989. Language, Context and Text: Aspects of Language in a Social-semiotic Perspective. Second Edition. Oxford: Oxford University Press.
Halliday, M.A.K. and Martin, J.R. (Eds). 1993. Writing Science: Literacy and Discursive Power. Pittsburgh & London: University of Pittsburgh Press.
Hyland, K. and Tse, P. 2005. ‘Hooking the reader: A corpus study of evaluative that in abstracts’. English for Specific Purposes 24 (2): 123-139.
Kachru, B. J. 1988. ‘The sacred cows of English’. English Today 4 (4): 3-8.
Kachru, Y. 1987. ‘Cross-Cultural Texts, Discourse Strategies and Discourse Interpretation’. In Discourse Across Cultures: Strategies in World Englishes, Larry Smith (Ed). UK: Prentice Hall.
Kaplan, R. B. 1980 [1966]. ‘Cultural Thought Patterns in Inter-Cultural Education’. In Readings on English as a Second Language, Kenneth Croft (Ed). Massachusetts: Winthrop. 399-418.
Lillis, T. & Curry, M-J. 2006. ‘Professional Academic Writing by Multilingual Scholars: Interactions with Literacy Brokers in the Production of English-Medium Texts’. In Written Communication, Vol. 23 No. 1. London, California & New Delhi: Sage. 3-35.
Martín Martín, P. 2003. ‘A genre analysis of English and Spanish research paper abstracts in experimental social sciences’. English for Specific Purposes 22: 25-43.
Martin, J.R. 1998. ‘Discourses of science: Recontextualisation, genesis, intertextuality and hegemony’. In Martin, J. R. & Veel, R. (Eds). Reading Science: Critical and Functional Perspectives on Discourses of Science. London & New York: Routledge. 3-14.
Martin, J. R. and Veel, R. (Eds). 1998. Reading Science: Critical and Functional Perspectives on Discourses of Science. London & New York: Routledge.
Mateus, M.H.M., Brito, A.M., Duarte, I. and Faria, I. H. 1989. Gramática da Língua Portuguesa. Lisbon: Caminho.
Mauranen, A. 1993a. Cultural Differences in Academic Rhetoric. Frankfurt, Berlin, New York, Bern, Paris, Vienna: Peter Lang.
Mauranen, A. 1993b. ‘Contrastive ESP rhetoric: Metatext in Finnish-English economics texts’. English for Specific Purposes 12: 3-22.
McKenny, J. 2005. ‘Content analysis of dogmatism compared with corpus analysis of epistemic stance in student essays’. Information Design Journal + Document Design, 13 (1).
McKenny, J. 2007. A corpus-based investigation of the phraseology in various genres of written English with applications to the teaching of English for academic purposes. Unpublished Ph.D. thesis, School of English, Leeds University.
Moreno, A. I. 1997. ‘Genre constraints across languages: Causal metatext in Spanish and English RAs’. English for Specific Purposes 16 (3): 161-179.
Mur Dueñas, P. 2007a. ‘I/We focus on…’: A cross-cultural analysis of self-mentions in business management research articles’. Journal of English for Academic Purposes 6: 143-162.
Mur Dueñas, P. 2007b. ‘Same genre, same discipline; however, there are differences: A cross-cultural analysis of logical markers in academic writing’. ESP Across Cultures 4: 37-53.
Odlin, T. 1989. Language Transfer: Cross-linguistic influence in Language learning. Cambridge: Cambridge University Press.
Ooi, V. 1998. Computer Corpus Lexicography. Edinburgh: Edinburgh University Press.
Rayson, P. 2003. Matrix: a statistical method and software tool for linguistic analysis through corpus comparison. Unpublished Ph.D. thesis. Lancaster University.
Salager-Meyer, F. et al. 2003. ‘The scimitar, the dagger and the glove: Intercultural differences in the rhetoric of criticism in Spanish, French and English medical discourse (1930-1995)’. English for Specific Purposes 22 (3): 223-247.
Santos, B.S. 2005. ‘General Introduction’ to Reinventing Social Emancipation. Toward New Manifestos. In Santos, B.S. (Ed), Vol. 1. Democratizing Democracy: Beyond the Liberal Democratic Canon. London: Verso. xvii-xxxiii.
Scott, M. 1999. Wordsmith Tools. Oxford: Oxford University Press.
Sinclair, J. 2004. Trust the Text: Language, Corpus and Discourse. London: Routledge.
Smith, L. (Ed). 1987. Discourse Across Cultures: Strategies in World Englishes. UK: Prentice Hall.
Tardy, C. 2004. ‘The role of English in scientific communication: Lingua franca or Tyrannosaurus Rex?’ Journal of English for Academic Purposes 3: 247-269.
Tognini-Bonelli, E. 2001. Corpus Linguistics at Work. Amsterdam: John Benjamins.
Ventola, E. & Mauranen, A. (Eds). 1996. Academic Writing: Intercultural and Textual Issues. Amsterdam & Philadelphia: John Benjamins.
Venuti, L. 1995. The Translator’s Invisibility: A History of Translation. London & New York: Routledge.
Yakhontova, T. 2002. ‘“Selling” or “Telling”? The issue of cultural variation in research genres’. In Academic Discourse, John Flowerdew (Ed). London & New York: Longman.
Yakhontova, T. 2006. ‘Cultural and disciplinary variation in academic discourse: The issue of influencing factors’. Journal of English for Academic Purposes 5: 153-167.

John McKenny

John McKenny teaches at the Division of English Studies at the University of Nottingham, Ningbo China and is Head of the Centre for Research in Applied Linguistics, Ningbo. He previously worked as a Professor Adjunto at Viseu Polytechnic (Portugal) for twelve years and as Senior Lecturer at Northumbria University for five years. His Ph.D. thesis at Leeds University was entitled A corpus-based investigation of the phraseology in various genres of written English with applications to the teaching of English for academic purposes. He is currently co-editing with Tometro Hopkins a volume entitled Englishes of the British Isles to be published this year by Continuum International. This book is the first of a 15-volume series on World Englishes.

Karen Bennett

Karen Bennett is a member of the Centre for Comparative Studies, University of Lisbon, where she researches in Translation Studies.
Her PhD in English Academic Discourse: its hegemonic status and implications for translation (University of Lisbon) was based upon her extensive experience as a translator and teacher of Academic Discourse with the Catholic University of Portugal, University of Coimbra, and elsewhere. She has published a number of articles on this subject and others, including ‘Galileo’s revenge: Ways of construing knowledge and translation strategies in the era of globalization’ in Social Semiotics, Vol. 17, No. 2 (2007), and ‘Epistemicide! The tale of a predatory discourse’ in The Translator, Vol. 13, No. 2 (2007).