Processing the language of predicting and forecasting in an Italian corpus of economic reports

Maria Teresa Musacchio
SSLMIT, University of Trieste, Trieste, Italy
musacchi@sslmit.units.it

Abstract

Predicting and forecasting play a central role in economic theory and practice. Economic agents rely on them when setting policies and taking decisions. This paper presents an attempt to extract predictions from an Italian corpus of economic reports using morphological rather than lexical information in KWIC concordances. The method, as applied to Italian, makes it possible to identify grammatical and lexical information that can be used to supplement the data in terminology collections. The lexical and grammatical data thus acquired can be fed back into the system to refine the search for predictions and forecasts in a further stage of research.

1. Introduction

Economists study the economy in much the same way as other scientists approach their disciplines. They observe phenomena, formulate problems, develop theories or models to capture the essence of the phenomena, and test their predictions against economic data in an attempt to verify or refute them. Although economists use theory and observation much like other scientists, their task becomes particularly challenging when it comes to testing, because experiments are often difficult in economics. In many cases, economists cannot run experiments, such as inducing inflation, to generate useful data and put forward proposals or recommendations to control the economy. To compensate for the limited scope for laboratory experiments, they carefully study the natural experiments offered by history (Mankiw, 1998: 18-19). Thus, prediction based on theories or models and forecasting based on data analysis acquire special relevance in economics compared to other areas of science and technology.
Economic agents devise their policies and make their decisions partly referring to economic predictions and forecasts and this accounts for the attention given to reports containing this type of information and regularly published by authoritative economic and financial institutions such as the IMF and the OECD at international level and central banks, research institutes and institutes of statistics at national level. The scientific method used by economists is also reflected in their daily practice and in their language as claims to knowledge can be made only by using language (Backhouse et al., 1993: 1). At the level of textual structure, this implies that in many economic texts a sequence of three parts – analysis, prediction or forecast, and proposal – can be identified (Sobrero, 1993: 255). Predicting and forecasting are thus central to economic discourse and in particular to argumentation. Here then lies the problem: if knowledge can only be expressed through language, how are predictions and forecasts formulated and can they be easily identified and tracked through linguistic analysis? In economic discourse predicting and forecasting are forms of evaluation, that is assessments of the possible consequences of current events, descriptions of future developments. It is therefore to be expected that they will be expressed using verbs in tenses of future action such as future tenses and conditionals indicating the probability of an event occurring in the future. Given the centrality of predictions and forecasts in economics, one can easily imagine that economists take great care to formulate them. They use devices to limit the validity of their claims, that is they use words to make meanings fuzzier or less fuzzy. In linguistics this process is called hedging (Lakoff, 1973: 195). Hedges qualify nouns, verbs, adjectives or whole predications and – as Brown and Levinson (1987) pointed out – also include epistemic modal expressions. 
In her research on speech acts in English economic discourse, Merlini (1983: 8-16) identified four different types of prediction¹, which she classified as shown in Table 1 below. Two criteria are used to identify and classify predictions: epistemic and inferential gradients. The former indicates the degree of the speaker/writer’s commitment to the truth value of the whole proposition (shields), the latter the truth conditions of propositions (approximators). Shields can be modals (epistemic will, would, could, should, may, might), modal expressions (to be [or seem/appear] + likely, possible, bound, due, destined) and modifiers such as adjectives or adverbs, either downtoners (likely, possible, probable, possibly, probably, presumably, perhaps) or intensifiers like certainly and well. Further, the value of claims can be altered using modifiers such as obviously, surely, clearly, conceivably and equivalent paraphrases (this shows that, this indicates that, suggests that, one may reasonably expect, etc.). Approximators are expressions like almost, nearly, at least, more or less, about, virtually, etc. (Bloor & Bloor, 1993: 153-154). Predictions often take an implicit form, that is, they are seldom introduced by verbs like to predict, to forecast, to expect and so on (Merlini, 1983: 23).

¹ Merlini clearly states (1983: 8) that her analysis concerns what in English is generally referred to as ‘prediction’, that is, a statement of the likely course of future economic events based on economic theory. It is my contention, however, that the third and fourth types of prediction indicated by Merlini are often based on data as well, so that the distinction between prediction and forecast is not clear-cut. Since this paper focuses on ‘applied’ and ‘instrumental’ predictions because of their greater interest to market operators, in what follows prediction will be used as an umbrella term to refer to both predictions and forecasts.
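The shield and approximator inventories quoted in this section can be read as a small lexicon. The following is a minimal sketch, in Python, of how such a lexicon might be used to flag candidate hedges in an English sentence. It is an illustration only, not part of Merlini's or Bloor and Bloor's analyses: the names `SHIELDS`, `APPROXIMATORS` and `tag_hedges` are hypothetical, only single-word shields from the lists above are included, and the test sentence is invented. Plain word matching cannot tell epistemic from non-epistemic uses (for instance, about in "about to"), so every hit would still need manual checking.

```python
import re

# Single-word shields taken from the lists quoted above (modals and modifiers).
# "well" is left out on purpose: as a bare word it is too ambiguous to match.
SHIELDS = {
    "will", "would", "could", "should", "may", "might",
    "likely", "possible", "probable", "possibly", "probably",
    "presumably", "perhaps", "certainly",
}

# Approximators quoted above; some are multi-word expressions.
APPROXIMATORS = ("almost", "nearly", "at least", "more or less", "about", "virtually")


def tag_hedges(sentence):
    """Return candidate shields and approximators found in a sentence."""
    lowered = sentence.lower()
    # Shields are reported in order of appearance in the sentence.
    shields = [tok for tok in re.findall(r"[a-z]+", lowered) if tok in SHIELDS]
    # Approximators are reported in lexicon order, matched as whole words/phrases.
    approximators = [
        phrase for phrase in APPROXIMATORS
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
    ]
    return {"shields": shields, "approximators": approximators}


if __name__ == "__main__":
    # Invented sentence, used only to exercise the lookup.
    print(tag_hedges("Inflation will probably fall to almost 2 per cent."))
```

A lookup of this kind only proposes candidates; deciding whether a sentence is a prediction, and of which type, remains a matter for the analyst.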
| Type of prediction | Examples |
|---|---|
| Interpretive or hypothetical prediction = it creates a link between two phenomena: if phenomenon p is observed, then phenomenon q will occur | If consumers want more of any good, the price will rise, sending a signal to producers that more supply is needed. (Samuelson & Nordhaus, 2001: 26) |
| Illustrative or speculative prediction = it creates a simplified model of reality for the sake of clarity. From this relationships are engendered and consequences are drawn. | Suppose you own a football club. Before the season begins you have to set the price of football tickets for the season. Your sole aim is to maximize your revenue from ticket sales so you can afford to buy some better players for the next season. (…) If the quantity demanded is insensitive to the price it will require a very low price to fill the ground and total revenue will collapse. If, however, small reductions in ticket sales lead to large increases in the quantity sold, it makes more sense to charge a price that will fill the ground. (Begg et al., 1997: 56) |
| Applied or realistic prediction = The past/current situation is analysed and economic models and tools are used to predict or forecast future events. | If the experiences of the 1994 bond-market decline are a guide to the future, even a substantial Wall Street correction (…) will have a relatively small effect on the pace of Main Street activity. (Tyson, 1997: 58) |
| Instrumental prediction = it is used to prove the validity of: a) a warning or b) a proposal. | a) (…) the economy would be capable of a much better performance if only the Federal reserve and the bond market would relax the interest-rate brakes. (Tyson 1997: 58) b) I hope that by the time this book comes out the European Central Bank will have moved aggressively to cut rates and stimulate growth; if it does not, the liquidity trap could be about to claim another victim. (Krugman, 1999: 162) |

Table 1: A classification of predictions with examples

2.
Predictions and forecasts in Italian

This paper investigates the possibility of extracting predictions, with special reference to trends in the business cycle affecting the markets, from an Italian corpus of economic reports. As the ups and downs in an economy can be expressed through language in many different ways, research on hedging, and especially Merlini’s observations on predictions, is applied here to Italian. In this first stage of research, epistemic modality is used as a working hypothesis or ‘knowledge probe’ (Ahmad & Fulford, 1992) to identify predictions in Italian economic texts. Despite the noise one may find, an attempt is made to see whether, in an untagged corpus, concordancing of prediction probes is a speedier way to identify the ups and downs in economies than a search for all the terms expressing change or no change.

The data analysed in this paper come from a corpus of Italian economic texts jointly developed by the University of Surrey and the University of Trieste. It contains approximately 190,000 tokens and includes the Annual Report and an issue of the Economic Bulletin of the Bank of Italy, a fragment on the Italian business cycle from the Annual Report of the Italian Institute of Statistics (ISTAT) and a number of speeches delivered by the Governor of the Bank of Italy. The composition of the corpus, with the dates of publication of the texts, is shown in Table 2.

| Texts | Tokens |
|---|---|
| Economic Bulletin of the Bank of Italy No. 32, February 1999 | 36,188 |
| Notes to the Economic Bulletin of the Bank of Italy (No. 32, Feb. 1999) | 5,500 |
| Annual Report 1998 of ISTAT, fragment (Ch. 1) | 33,354 |
| Annual Report 1998 of the Bank of Italy | 77,593 |
| Considerazioni finali (31st May 1999) | 11,172 |
| Dpef 1999-2001 (22nd April 1998) | 8,794 |
| Speeches (December 1998-January 1999) | 15,079 |
| Total | 187,680 |

Table 2: Composition of the Surrey-Trieste Corpus of Italian economic reports

Compared to English, Italian has fewer modals and relies on other devices to express varying degrees of confidence about future states and events. For instance, epistemic will indicating confident prediction has an Italian equivalent in the simple future indicative. A lower degree of confidence, expressed in English through modals such as would, could, should, may and might, is indicated in Italian by the simple conditional. The epistemic modals in Italian are potere, indicating possibility, and dovere, expressing higher probability. Unlike their English counterparts, potere and dovere can be conjugated in all tenses and moods (Serianni 1991: 396).

As is known, Italian verbs are marked for simple tenses and for moods by suffixes attached to the verb root and inflected for person. The number of irregular verbs in Italian is relatively high. These features of Italian verbs would make retrieval of predictions based on simple future and conditional tenses highly complex. However, a number of restrictions can be applied that make retrieval much easier. First, special languages make extensive use of impersonal forms and passives, so that a search for forms inflected in the third person singular and plural, active or passive, can suffice. Second, problems of information retrieval due to morphological irregularities in tense formation can be avoided for the simple future and conditional by querying the corpus for suffixes without their thematic vowels (‘e’, ‘i’ or nil) (Serianni, 1991: 433).
For the simple future the search string will therefore be *rà/*ranno (3rd pers. sing./pl.): compare aumentare (to increase) aumenterà/aumenteranno and diminuire (to decrease) diminuirà/diminuiranno, both regular verbs, with potere potrà/potranno or andare (to go) andrà/andranno. Similarly, the search string for the simple conditional will be *rebbe/*rebbero (cf. aumenterebbe/aumenterebbero but dovrebbe/dovrebbero).

Besides limiting the number of searches, an advantage of the method used at this stage of research lies in the possibility of retrieving all predictions regardless of the verb or noun groups indicating them. For example, an upward trend may take the form ‘la domanda crescerà nel secondo trimestre’ (N + V + A ‘demand will grow in the second quarter’), where information on the trend is carried by the verb, or a form where information is carried by the subject (the noun crescita or ‘growth/increase’ in the N + prep + N collocation crescita della domanda): ‘si verificherà una crescita della domanda nel secondo trimestre’ (V + N + prep + N + A, lit. ‘An increase in demand will take place in the second quarter’). Information extracted in this way on the company a word keeps is both grammatical (colligation) and lexical (collocation) (Sinclair 1998).

KWIC concordances are performed using SystemQuirk (Ahmad, 1998). Occurrences of the simple future (3rd person sing./pl. = 223) and the simple conditional (3rd person sing./pl. = 307) in our corpus were manually checked to separate predictions from other propositions. A total of 98 predictions in the future (43.95% of future occurrences) and 213 predictions in the conditional (69.38% of conditional occurrences) were found. Predictions were then analysed to see if any structural patterns emerged. It soon became clear that predicted changes exhibit patterns where core information about ups and downs or no change is carried either by the verb or by the noun, in the latter case usually in subject position.
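The suffix-based search described above can be sketched with ordinary regular expressions. The code below is a minimal illustration in Python and is not SystemQuirk, the tool actually used for the concordances; the names `PROBES`, `kwic` and `prediction_share` are hypothetical. The wildcard strings *rà/*ranno and *rebbe/*rebbero are rendered as word-final patterns, the text is normalised to NFC so that accented characters such as à are single code points, and a small helper reproduces the percentages reported for the manual check.

```python
import re
import unicodedata

# Wildcard searches from the text, rendered as word-final regex patterns:
# *rà / *ranno (simple future) and *rebbe / *rebbero (simple conditional).
PROBES = {
    "future": re.compile(r"\b\w+(?:rà|ranno)\b"),
    "conditional": re.compile(r"\b\w+(?:rebbe|rebbero)\b"),
}


def kwic(text, pattern, width=40):
    """Return (left context, node, right context) for every match of pattern."""
    # Compose accents first so that "à" is one character and counts as a letter.
    text = unicodedata.normalize("NFC", text)
    rows = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()].strip()
        right = text[match.end():match.end() + width].strip()
        rows.append((left, match.group(0), right))
    return rows


def prediction_share(predictions, occurrences):
    """Percentage of retrieved occurrences that were judged to be predictions."""
    return round(100 * predictions / occurrences, 2)


if __name__ == "__main__":
    sample = "La domanda crescerà nel secondo trimestre."
    for left, node, right in kwic(sample, PROBES["future"]):
        print(f"{left:>40} | {node} | {right}")
    # Counts reported in the text for the manually checked concordances.
    print(prediction_share(98, 223), prediction_share(213, 307))
```

As in the study itself, such patterns also retrieve forms that are not predictions (any word ending in these suffixes), which is why the concordance lines still have to be checked by hand.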
Table 3 shows examples of sentences in which prediction information is carried by verbal groups. Examples 1-5 concern increases, examples 6-7 stability, while examples 8-10 highlight decreases. Examples in the table also show typical approximators used in Italian in the context of predictions. Table 4 presents similar predictions, but their core information is carried by the noun group, while main verbs are general-purpose verbs, that is, verbs that can collocate with a wide range of nouns and thus take on different meanings (provocare, comportare, segnare, indicare, contribuire). In Table 4 examples 1 to 7 designate upward trends, examples 8 and 9 indicate no change and examples 10 to 13 show downward trends. Further examples of approximators are found in these sentences and can be added to a list to be used in a second stage of work, when this simple method to extract predictions is refined by combining searches for grammatical and lexical items.

Taken together, the two tables show the wide variety of verbs and phrases used to convey the idea of increase, decrease or no change of some economic indicators, though the list is by no means complete and indeed many more examples were found in our corpus. This may suggest that, in Italian and possibly other Romance languages, retrieval of predictions relying on morphology or syntax is a more productive method than the simple search for modals and modal expressions. Further indication of the relevance of retrieval based on verb tense inflection is provided in Table 5, which lists occurrences of all explicit predictions (si prevede [che]) and some shields (presumibilmente, probabilmente) to be found in our corpus.

| No. | LH co(n)text | Node | RH co(n)text | Text |
|---|---|---|---|---|
| 1 | soprattutto le aree meridionali del Paese | accrescerà | la produttività del sistema | speeches\dpef.txt |
| 2 | Consumatori mutano con il variare dell’età. | tenderà a crescere | la domanda di alcuni beni | speeches\interv~1.txt |
| 3 | questo processo; l’introduzione dell’euro | potrà accelerarlo | Contribuendo alla diversificazione | reports\releco~1.txt |
| 4 | Lo sviluppo del commercio mondiale | salirebbe | marginalmente, dal 3,3 al 4,4 per cento; | reports\bollett~1.txt |
| 5 | dei Bot a 12 mesi. L’avanzo primario | dovrebbe innalzarsi | dal 4,6 per cento del 1999 al 5,2 nel medio termine | speeches\dpef.txt |
| 6 | l’avanzo primario | dovrà essere | elevato e stabile. L’andamento | speeches\consfi~1.txt |
| 7 | l’indice di questo mese di gennaio | rimarrebbe | sul livello di dodici mesi prima. Le aspettative | speeches\interv~1.txt |
| 8 | nei paesi industriali; in Europa, lo sviluppo | scenderà | al 2 per cento. L’economia mondiale | speeches\consfi~1.txt |
| 9 | dal 1998 al 2002, l’incidenza delle spese sul PIL | dovrebbe diminuire | di oltre tre punti percentuali (dal 4 | reports\bollett~1.txt |
| 10 | Secondo le aspettative degli operatori, l’inflazione | si ridurrebbe | a valori di poco superiori all’1 per cento, in | reports\bollett~1.txt |

Table 3: A concordance of economic predictions. Information is carried by the verb in column three. Hedging lexicalised through inflection (conditionals), modals (potrà, dovrebbe) or modal expressions (tenderà a) is highlighted in bold, approximators are marked in italics.

| No. | LH co(n)text | Node | RH co(n)text | Text |
|---|---|---|---|---|
| 1 | margine di interesse. Un contributo positivo | potrà derivare | dalla crescita dei ricavi da servizi | reports\releco~1.txt |
| 2 | che il complesso dei provvedimenti esaminati | provocherà | un aumento del reddito disponibile | reports\istat.txt |
| 3 | ad accentuarsi in Europa nei prossimi anni; | comporterà | la tendenza alla concentrazione delle | speeches\interv~1.txt |
| 4 | prospettive della nostra economia | sarà possibile | in autunno, con la definizione | speeches\dpef.txt |
| 5 | quelli dello scorso anno. La domanda interna | dovrebbe trovare | sostegno nella dinamica dei consumi | reports\bollett~1.txt |
| 6 | (0,3 punti percentuali del PIL); l’avanzo primario | segnerebbe | un aumento modesto. L’incidenza del | reports\bollett~1.txt |
| 7 | già all’opera lo scorso anno. La capacità di spesa | trarrebbe | alimento dalla crescita del reddito disponibile | reports\bollett~1.txt |
| 8 | del 2000 (/fig. 28). La crescita dei prezzi in Italia | si manterrebbe | pressoché costante attorno a questo livello | reports\bollett~1.txt |
| 9 | La crescita dei prezzi al consumo nel nostro paese | risulterebbe | lievemente inferiore ai valori medi dell’area | reports\bollett~1.txt |
| 10 | del 1995 al 6,3 del 1998. La riduzione | dovrebbe proseguire | con il venire a scadenza dei titoli | reports\releco~1.txt |
| 11 | dal 1965. L’assenza di pressioni inflazionistiche, | potrebbero determinare | un restringimento delle condizioni che | reports\istat.txt |
| 12 | per l’anno in corso. Al suo ridimensionamento | dovrebbero contribuire, | oltre ai più favorevoli risultati previsti | speeches\dpef.txt |
| 13 | invariati nel terzo (tav. 4); i dati di fonte doganale | indicherebbero | una flessione nei mesi successivi. Il ristagno | reports\bollett~1.txt |

Table 4: A concordance of economic predictions. Information is carried by the underlined noun (usu. the subject), found in column two or four, in a N [+ prep. + N] + V + A construction. Hedging lexicalised through inflection (conditionals), modals (potrà, dovrebbe) or modal expressions (sarà possibile) is highlighted in bold, approximators are marked in italics.

| No. | LH co(n)text | Node | RH co(n)text | Text |
|---|---|---|---|---|
| 1 | dinamica dei salari nominali, i quali | si prevede | non aumenteranno più del 3% nel 1999 e nel 2000 | reports\istat.txt |
| 2 | brasiliana, benché non ancora quantificabili, | si prevede | saranno evidenti soprattutto nei paesi dell’America latina | reports\istat.txt |
| 3 | L’avvio dell’euro determinerà | probabilmente | un’integrazione del | reports\releco~1.txt |
| 4 | di crescita. Le negoziazioni, che partiranno | presumibilmente | entro l’estate, riguarderanno | reports\releco~1.txt |

Table 5: Examples of explicit predictions and predictions containing shields in the Surrey-Trieste corpus.

3.
Discussion and future work

This paper presents the first stage of research in the extraction of predictions from an untagged Italian corpus of economic reports. Drawing on linguistic research on modality and economic predictions in English, a simple method based on Italian morphology was used to identify predictions. Despite the noise (about 56% of simple future occurrences and about 31% of simple conditional occurrences, or roughly 41% of all occurrences, are not predictions), this method can help identify a) verbs lexicalising predicted change or no change in economies and/or economic indicators and b) phrases (collocations) and patterns (colligations) pointing to predictions. The co-texts of sentences also provide data on approximators and shields. Information acquired via morphology can then be used to refine the method for the extraction of predictions. The challenge is to compile lists of verbs and phrases which, combined with grammatical patterns, can supplement the information provided in terminology collections, as predictions and their language are central to economic discourse. For economists this can also prove a quick way to scan long Italian economic texts in search of predictions.

In the second stage of research, ways to reduce noise and thus improve the efficiency of extraction will be explored. In particular, attempts will be made to combine grammatical and lexical information to refine the search. In my presentation Italian examples will also be compared and contrasted with predictions drawn from an English corpus of economic reports developed at the University of Trieste. Reference will further be made to what little data on economic predictions can be gathered from the large written Italian corpora CORIS (general language) and CODIS (special languages) of the University of Bologna.

4. References

Ahmad, K. and H. Fulford, 1992. “Knowledge Processing 4. Semantic relations and their use in elaborating terminology”, Computing Sciences Report CS-92-07. Guildford: University of Surrey.
Ahmad, K., 1998. Specialist texts and their Quirks. In TAMA ’98. Terminology in Advanced Microcomputer Applications. Proceedings of the 4th TermNet Symposium. Tools for Multilingual Communication. Vienna: TermNet, 141-157.
Backhouse, R., T. Dudley-Evans and W. Henderson, 1993. Exploring the language and rhetoric of economics. In W. Henderson, T. Dudley-Evans and R. Backhouse (eds.), Economics and Language. London: Routledge, 1-20.
Begg, D., S. Fischer and R. Dornbusch, 1997. Economics. 5th ed. Maidenhead: McGraw-Hill.
Bloor, M. and T. Bloor, 1993. How economists modify propositions. In W. Henderson, T. Dudley-Evans and R. Backhouse (eds.), Economics and Language. London: Routledge, 153-169.
Brown, P. and S.C. Levinson, 1987. Politeness: Some Universals in Language Usage. Cambridge: Cambridge University Press.
Krugman, P., 2000. The Return of Depression Economics. London: Penguin.
Lakoff, G., 1973. Hedges: A study in meaning criteria and the logic of fuzzy concepts. Journal of Philosophical Logic, 2(4): 458-503.
Mankiw, G., 1998. Principles of Economics. Fort Worth (TX): The Dryden Press.
Merlini, L., 1983. Gli atti del discorso economico: la previsione. Status illocutorio e modelli linguistici nel testo inglese. Parma: Zara.
Samuelson, P.A. and W.D. Nordhaus, 2001. Economics. New York: McGraw-Hill Irwin.
Serianni, L., 1991. Grammatica italiana. Italiano comune e lingua letteraria, 2a ed. Torino: UTET.
Sinclair, J.M., 1998. The lexical item. In E. Weigand (ed.), Contrastive Lexical Semantics. Amsterdam/Philadelphia: John Benjamins, 1-24.
Sobrero, A.A., 1993. Lingue speciali. In A.A. Sobrero (ed.), Introduzione all’italiano contemporaneo. La variazione e gli usi. Roma-Bari: Laterza, vol. II: 237-277.
Tyson, L., 1997. How bright is bright? In The World in 1998. London: The Economist Publications, 58-59.