Paper - University of Surrey

Processing the language of predicting and forecasting in an Italian corpus of
economic reports
Maria Teresa Musacchio
SSLMIT, University of Trieste, Trieste, Italy
musacchi@sslmit.units.it
Abstract
Predicting and forecasting play a central role in economic theory and practice. Economic agents base their policies and
decisions on them. This paper presents an attempt to extract predictions from an Italian corpus of economic reports using morphological
rather than lexical information in KWIC concordances. The method, as applied to Italian, makes it possible to identify grammatical and lexical
information that can be used to supplement data in terminology collections. The lexical and grammatical data thus acquired can be fed back
into the system to refine the search for predictions and forecasts in a further stage of research.
1. Introduction
Economists study the economy in much the same way
as other scientists approach their disciplines. They observe
phenomena, formulate problems, develop theories or
models to capture the essence of the phenomena, and they
test their predictions against economic data in an attempt
to verify or refute them. Although economists use theory
and observation much like other scientists, their task
becomes particularly challenging when it comes to testing
because experiments are often difficult in economics. In
many cases, economists cannot conduct experiments, such as
inducing inflation, to generate useful data and put forward
proposals or recommendations to control the economy. To
compensate for the limitations on laboratory experiments,
they carefully study the natural experiments offered by
history (Mankiw, 1998: 18-19). Thus, prediction based on
theories or models and forecasting based on data analysis
acquire special relevance in economics compared to other
areas of science and technology. Economic agents devise
their policies and make their decisions partly with reference to
economic predictions and forecasts. This accounts for
the attention given to reports that contain this type of
information and are regularly published by authoritative
economic and financial institutions, such as the IMF and
the OECD at international level and central banks,
research institutes and institutes of statistics at national
level.
The scientific method used by economists is also
reflected in their daily practice and in their language as
claims to knowledge can be made only by using language
(Backhouse et al., 1993: 1). At the level of textual
structure, this implies that in many economic texts a
sequence of three parts – analysis, prediction or forecast,
and proposal – can be identified (Sobrero, 1993: 255).
Predicting and forecasting are thus central to economic
discourse and in particular to argumentation. Here then
lies the problem: if knowledge can only be expressed
through language, how are predictions and forecasts
formulated and can they be easily identified and tracked
through linguistic analysis?
In economic discourse predicting and forecasting are
forms of evaluation, that is, assessments of the possible
consequences of current events and descriptions of future
developments. It is therefore to be expected that they will
be expressed using verb forms that refer to future action, such as
future tenses and conditionals indicating the probability of
an event occurring in the future. Given the centrality of
predictions and forecasts in economics, one can easily
imagine that economists take great care in formulating them.
They use devices to limit the validity of their claims, that
is, they use words to make meanings fuzzier or less fuzzy.
In linguistics this process is called hedging (Lakoff, 1973:
195). Hedges qualify nouns, verbs, adjectives or whole
predications and – as Brown and Levinson (1987) pointed
out – also include epistemic modal expressions. In her
research on speech acts in English economic discourse,
Merlini (1983: 8-16) identified four different types of
predictions (see note 1), which she classified as shown in Table 1 below.
Two criteria are used to identify and classify predictions:
epistemic and inferential gradients. The former indicates
the degree of the speaker’s or writer’s commitment to the truth
value of the whole proposition (shields); the latter concerns the
truth conditions of propositions (approximators). Shields
can be modals (epistemic will, would, could, should, may,
might), modal expressions (to be [or seem/appear] +
likely, possible, bound, due, destined) and modifiers such
as adjectives or adverbs, either downtoners (likely,
possible, probable, possibly, probably, presumably,
perhaps) or intensifiers like certainly and well. Further,
the value of claims can be altered using modifiers such as
obviously, surely, clearly, conceivably and equivalent
paraphrases (this shows that, this indicates that, suggests
that, one may reasonably expect, etc). Approximators are
expressions like almost, nearly, at least, more or less,
about, virtually, etc. (Bloor & Bloor, 1993: 153-154).
Predictions often take an implicit form, that is, they are
seldom introduced by verbs like to predict, to forecast, to
expect and so on (Merlini, 1983: 23).
Note 1. Merlini clearly states (1983: 8) that her analysis concerns what
in English is generally referred to as ‘prediction’, that is, a
statement of the likely course of future economic events based
on economic theory. It is my contention, however, that the third
and fourth types of prediction indicated by Merlini are often
based on data as well, so that the distinction between prediction
and forecast is not clear-cut. Since this paper focuses on
‘applied’ and ‘instrumental’ predictions because of their greater
interest to market operators, in what follows prediction will be
used as an umbrella term to refer to both predictions and
forecasts.
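The shields and approximators listed above lend themselves to a simple lexical lookup. The Python sketch below is illustrative only and is not part of the analyses cited: the function name find_hedges and the grouping of the word lists are assumptions made here. The lookup is purely lexical, so it cannot tell epistemic will from other uses and will overgenerate on words such as well or about.

```python
import re

# Word lists copied from the paragraph above; the grouping into a
# dictionary is an assumption made for this sketch.
SHIELDS = {
    "modal": ["will", "would", "could", "should", "may", "might"],
    "downtoner": ["likely", "possible", "probable", "possibly",
                  "probably", "presumably", "perhaps"],
    "intensifier": ["certainly", "well"],
}
APPROXIMATORS = ["almost", "nearly", "at least", "more or less",
                 "about", "virtually"]


def _contains(expression, lowered_sentence):
    """True if the expression occurs as a whole word or phrase."""
    return re.search(r"\b" + re.escape(expression) + r"\b", lowered_sentence) is not None


def find_hedges(sentence):
    """Return (kind, expression) pairs for every listed hedge in the sentence."""
    lowered = sentence.lower()
    found = []
    for kind, expressions in SHIELDS.items():
        for expression in expressions:
            if _contains(expression, lowered):
                found.append(("shield/" + kind, expression))
    for expression in APPROXIMATORS:
        if _contains(expression, lowered):
            found.append(("approximator", expression))
    return found


print(find_hedges("Output may rise by almost 2 per cent"))
```

A lookup of this kind only flags candidate hedges; deciding whether a flagged sentence is a prediction still requires manual checking.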
| Type of prediction | Examples |
| --- | --- |
| Interpretive or hypothetical prediction = it creates a link between two phenomena: if phenomenon p is observed, then phenomenon q will occur | If consumers want more of any good, the price will rise, sending a signal to producers that more supply is needed. (Samuelson & Nordhaus, 2001: 26) |
| Illustrative or speculative prediction = it creates a simplified model of reality for the sake of clarity. From this relationships are engendered and consequences are drawn. | Suppose you own a football club. Before the season begins you have to set the price of football tickets for the season. Your sole aim is to maximize your revenue from ticket sales so you can afford to buy some better players next season. (…) If the quantity demanded is insensitive to the price it will require a very low price to fill the ground and total revenue will collapse. If, however, small reductions in ticket sales lead to large increases in the quantity sold, it makes more sense to charge a price that will fill the ground. (Begg et al., 1997: 56) |
| Applied or realistic prediction = The past/current situation is analysed and economic models and tools are used to predict or forecast future events. | If the experiences of the 1994 bond-market decline are a guide to the future, even a substantial Wall Street correction (…) will have a relatively small effect on the pace of Main Street activity. (Tyson, 1997: 58) |
| Instrumental prediction = it is used to prove the validity of: a) a warning or b) a proposal. | a) (…) the economy would be capable of a much better performance if only the Federal reserve and the bond market would relax the interest-rate brakes. (Tyson 1997: 58) b) I hope that by the time this book comes out the European Central Bank will have moved aggressively to cut rates and stimulate growth; if it does not, the liquidity trap could be about to claim another victim. (Krugman, 1999: 162) |

Table 1: A classification of predictions with examples
2. Predictions and forecasts in Italian
This paper investigates the possibility of extracting
predictions – with special reference to trends in the
business cycle affecting the markets – from an Italian
corpus of economic reports. As the ups and downs in an
economy can be expressed through language in many
different ways, in this paper research on hedging and
especially Merlini’s observations on predictions are
applied to Italian. In this first stage of research, epistemic
modality is used as a working hypothesis or ‘knowledge
probe’ (Ahmad & Fulford, 1992) to identify predictions in
Italian economic texts. Despite the noise one may find, an
attempt is made to see if – in an untagged corpus –
concordancing of prediction probes is a speedier way to
identify the ups and downs in the economies than a search
for all the terms expressing change or no change.
Data analysed in this paper come from a corpus of
Italian economic texts jointly developed by the University
of Surrey and the University of Trieste. It contains
approximately 190,000 tokens and includes the Annual
Report, an issue of the Economic Bulletin of the Bank of
Italy, a fragment on the Italian business cycle from The
Annual Report of the Italian Institute of Statistics (ISTAT)
and a number of speeches delivered by the Governor of
the Bank of Italy. Corpus composition, with dates of
publication of the texts, is shown in Table 2.
| Texts | Tokens |
| --- | --- |
| Economic Bulletin of the Bank of Italy No. 32 – February 1999 | 36,188 |
| Notes to the Economic Bulletin of the Bank of Italy (No. 32 – Feb. 1999) | 5,500 |
| Annual Report 1998 of ISTAT – fragment (Ch. 1) | 33,354 |
| Annual Report 1998 of the Bank of Italy | 77,593 |
| Considerazioni finali (31st May 1999) | 11,172 |
| Dpef 1999-2001 (22nd April 1998) | 8,794 |
| Speeches (December 1998-January 1999) | 15,079 |
| Total | 187,680 |

Table 2: Composition of the Surrey-Trieste Corpus of
Italian economic reports
Compared to English, Italian has a lower number of
modals and relies on other devices to express varying
degrees of confidence about future states and events. For
instance, epistemic will indicating confident prediction has
an Italian equivalent in the simple future indicative. A
lower degree of confidence expressed in English through
modals such as would, could, should, may and might is
indicated in Italian by the simple conditional. The
epistemic modals in Italian are potere indicating
possibility and dovere expressing higher probability.
Unlike their English counterparts, potere and dovere can
be conjugated in all tenses and moods (Serianni 1991:
396). As is known, Italian verbs are marked for simple
tenses and for moods by suffixes attached to the verb root
and inflected for person. The number of irregular verbs in
Italian is relatively high. These features of Italian verbs
would make retrieval of predictions based on simple
future and conditional tenses highly complex. However, a
number of restrictions can be applied that make retrieval
much easier. First, special languages make extensive use of
impersonal forms and passives, so that a search for
tenses inflected in the third person singular and plural,
active or passive, can suffice. Second, problems of
information retrieval due to morphological irregularities
in tense formation can be avoided in the simple
future and conditional by querying the corpus for suffixes
without their thematic vowels (‘e’, ‘i’ or nil) (Serianni,
1991: 433). For the simple future the search string will
therefore be *rà/*ranno (3rd pers. sing./pl.): compare
aumentare (to increase) → aumenterà/aumenteranno and
diminuire (to decrease) → diminuirà/diminuiranno, both
regular verbs, and potere → potrà/potranno or andare
(to go) → andrà/andranno. Similarly, the search string for
the simple conditional will be *rebbe/*rebbero (cf.
aumenterebbe/aumenterebbero but dovrebbe/dovrebbero).
Besides limiting the number of searches, an advantage of
the method used at this stage of research lies in the
possibility of retrieving all predictions regardless of the verb
or noun groups indicating them. For example, an upward
trend may take the form ‘la domanda crescerà nel
secondo trimestre’ (N + V + A → ‘demand will grow in
the second quarter’) – where information on the trend is
carried by the verb – or a form where information is
carried by the subject (the noun crescita or
‘growth/increase’ in the N+prep+N collocation crescita
della domanda): ‘si verificherà una crescita della domanda
nel secondo trimestre’ (V + N + prep + N + A → lit. ‘An
increase in demand will take place in the second quarter’).
Information extracted in this way on the company a word
keeps is both grammatical (colligation) and lexical
(collocation) (Sinclair 1998). KWIC concordances are
produced using SystemQuirk (Ahmad, 1998).
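The suffix search described above can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not a reproduction of SystemQuirk: the function name kwic, the 30-character context window and the sample text (adapted from examples in this paper) are all chosen here for illustration.

```python
import re
import unicodedata

# Search strings from the text: *rà/*ranno (simple future) and
# *rebbe/*rebbero (simple conditional), 3rd person singular/plural.
SEARCH_SUFFIXES = {
    "future": ("rà", "ranno"),
    "conditional": ("rebbe", "rebbero"),
}


def kwic(text, tense, width=30):
    """Return (left context, keyword, right context) for each matching token."""
    # Normalise so that an accented 'à' is a single code point.
    text = unicodedata.normalize("NFC", text)
    suffixes = SEARCH_SUFFIXES[tense]
    lines = []
    for match in re.finditer(r"\w+", text):
        if match.group().lower().endswith(suffixes):
            left = text[max(0, match.start() - width):match.start()].strip()
            right = text[match.end():match.end() + width].strip()
            lines.append((left, match.group(), right))
    return lines


sample = ("La domanda crescerà nel secondo trimestre. "
          "L'inflazione si ridurrebbe a valori di poco superiori all'1 per cento.")
for left, keyword, right in kwic(sample, "future") + kwic(sample, "conditional"):
    print(f"{left:>30} | {keyword} | {right}")
```

Because the match is purely orthographic, any word with these endings is returned, for instance the noun tiranno or the past form crebbe, which is one source of the noise that manual checking has to remove.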
Occurrences of the simple future (3rd person sing./pl. =
223) and the simple conditional (3rd person sing./pl. = 307)
in our corpus were manually checked to distinguish
predictions from other propositions. In all, 98 predictions in the
future (43.95% of occurrences) and 213 predictions in the
conditional (69.38% of occurrences) were found. Predictions
were then analysed to see if any structural patterns
emerged. It soon became clear that predicted changes
exhibit patterns where core information about ups and
downs or no-change is carried either by the verb or by the
noun – in this case usually in subject position. Table 3
shows examples of sentences in which prediction
information is carried by verbal groups. Examples 1-5
concern increases, examples 6-7 stability, while examples
8-10 highlight decreases. Examples in the table also show
typical approximators used in Italian in the context of
predictions. Table 4 presents similar predictions, but their
core information is carried by the noun group, while main
verbs are general-purpose verbs, that is verbs that can
collocate with a wide range of nouns and thus take on
different meanings (provocare, comportare, segnare,
indicare, contribuire). In Table 4 examples 1 to 7
designate upward trends, examples 8 and 9 indicate no
change and examples 10 to 13 show downward trends.
Further examples of approximators are found in these
sentences and can be added to a list to be used in a second
stage of work, when this simple method of extracting
predictions is refined by combining searches for grammatical
and lexical items.
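The proportions of predictions among the retrieved forms can be checked with a short calculation. The sketch below uses only the counts reported in this section; the helper name share is an assumption made here, and the combined figure for both probes is derived from those counts rather than stated in the text.

```python
# (occurrences, predictions) for 3rd person singular/plural forms,
# as reported in the paragraph above.
counts = {
    "simple future": (223, 98),
    "simple conditional": (307, 213),
}


def share(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


for tense, (occurrences, predictions) in counts.items():
    print(tense, share(predictions, occurrences))

# Derived totals over both probes.
total_occurrences = sum(o for o, _ in counts.values())
total_predictions = sum(p for _, p in counts.values())
print("both probes", share(total_predictions, total_occurrences))
```

On these counts the conditional is the more reliable probe, while more than half of the simple future forms retrieved are not predictions.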
Taken together, the two tables show the wide variety
of verbs and phrases used to convey the idea of increase,
decrease or no change of some economic indicators,
though the list is by no means complete and indeed many
more examples were found in our corpus. This may
suggest that – in Italian and possibly other Romance
languages – retrieval of predictions relying on
morphology or syntax is a more productive method than
the simple search for modals and modal expressions.
Further indication of the relevance of retrieval based on
verb tense inflection is provided in Table 5, which lists
occurrences of all explicit predictions (si prevede [che])
and some shields – presumibilmente, probabilmente – to
be found in our corpus.
| No. | LH co(n)text | Verb | RH co(n)text | Text |
| --- | --- | --- | --- | --- |
| 1 | soprattutto le aree meridionali del Paese | accrescerà | la produttività del sistema | speeches\dpef.txt |
| 2 | Consumatori mutano con il variare dell’età. | tenderà a crescere | la domanda di alcuni beni | speeches\interv~1.txt |
| 3 | questo processo; l’introduzione dell’euro | potrà accelerarlo | Contribuendo alla diversificazione | reports\releco~1.txt |
| 4 | Lo sviluppo del commercio mondiale | salirebbe | marginalmente, dal 3,3 al 4,4 per cento; | reports\bollett~1.txt |
| 5 | dei Bot a 12 mesi. L’avanzo primario | dovrebbe innalzarsi | dal 4,6 per cento del 1999 al 5,2 | speeches\dpef.txt |
| 6 | nel medio termine l’avanzo primario | dovrà essere elevato e stabile. | L’andamento | speeches\consfi~1.txt |
| 7 | l’indice di questo mese di gennaio | rimarrebbe sul livello | di dodici mesi prima. Le aspettative | speeches\interv~1.txt |
| 8 | nei paesi industriali; in Europa, lo sviluppo | scenderà | al 2 per cento. L’economia mondiale | speeches\consfi~1.txt |
| 9 | dal 1998 al 2002, l’incidenza delle spese sul PIL | dovrebbe diminuire | di oltre tre punti percentuali (dal 4 | reports\bollett~1.txt |
| 10 | Secondo le aspettative degli operatori, l’inflazione | si ridurrebbe | a valori di poco superiori all’1 per cento, in | reports\bollett~1.txt |

Table 3: A concordance of economic predictions. Information is carried by the verb in column three. Hedging
lexicalised through inflection (conditionals), modals (potrà, dovrebbe) or modal expressions (tenderà a) is highlighted in
bold, approximators are marked in italics.
| No. | LH co(n)text | Verb | RH co(n)text | Text |
| --- | --- | --- | --- | --- |
| 1 | margine di interesse. Un contributo positivo | potrà derivare | dalla crescita dei ricavi da servizi | reports\releco~1.txt |
| 2 | che il complesso dei provvedimenti esaminati | provocherà | un aumento del reddito disponibile | reports\istat.txt |
| 3 | ad accentuarsi in Europa nei prossimi anni; | comporterà | la tendenza alla concentrazione | speeches\interv~1.txt |
| 4 | delle prospettive della nostra economia | sarà possibile | in autunno, con la definizione | speeches\dpef.txt |
| 5 | quelli dello scorso anno. La domanda interna | dovrebbe trovare | sostegno nella dinamica dei consumi | reports\bollett~1.txt |
| 6 | (0,3 punti percentuali del PIL); l’avanzo primario | segnerebbe | un aumento modesto. L’incidenza del | reports\bollett~1.txt |
| 7 | già all’opera lo scorso anno. La capacità di spesa | trarrebbe | alimento dalla crescita del reddito disponibile | reports\bollett~1.txt |
| 8 | del 2000 (/fig. 28). La crescita dei prezzi in Italia | si manterrebbe | pressoché costante attorno a questo livello | reports\bollett~1.txt |
| 9 | La crescita dei prezzi al consumo nel nostro paese | risulterebbe | lievemente inferiore ai valori medi dell’area | reports\bollett~1.txt |
| 10 | del 1995 al 6,3 del 1998. La riduzione | dovrebbe proseguire | con il venire a scadenza dei titoli | reports\releco~1.txt |
| 11 | dal 1965. L’assenza di pressioni inflazionistiche, che | potrebbero determinare | un restringimento delle condizioni | reports\istat.txt |
| 12 | per l’anno in corso. Al suo ridimensionamento | dovrebbero contribuire, | oltre ai più favorevoli risultati previsti | speeches\dpef.txt |
| 13 | invariati nel terzo (tav. 4); i dati di fonte doganale | Indicherebbero | una flessione nei mesi successivi. Il ristagno | reports\bollett~1.txt |

Table 4: A concordance of economic predictions. Information is carried by the underlined noun (usu. the subject) in a
N [+ prep. + N] + V + A construction in column two or four. Hedging lexicalised through inflection (conditionals),
modals (potrà, dovrebbe) or modal expressions (sarà possibile) is highlighted in bold, approximators are marked in
italics.
| No. | LH co(n)text | Keyword | RH co(n)text | Text |
| --- | --- | --- | --- | --- |
| 1 | dinamica dei salari nominali, i quali | si prevede | non aumenteranno più del 3% nel 1999 e nel 2000 | reports\istat.txt |
| 2 | brasiliana, benché non ancora quantificabili, | si prevede | Saranno evidenti soprattutto nei paesi dell’America latina | reports\istat.txt |
| 3 | L’avvio dell’euro determinerà | probabilmente | un’integrazione del | reports\releco~1.txt |
| 4 | di crescita. Le negoziazioni, che partiranno | presumibilmente | entro l’estate, riguarderanno | reports\releco~1.txt |

Table 5: Examples of explicit predictions and predictions containing shields in the Surrey-Trieste corpus.
3. Discussion and future work
This paper presents the first stage of research in the
extraction of predictions from an untagged Italian corpus
of economic reports. Drawing on linguistic research on
modality and economic predictions in English, a simple
method based on Italian morphology was used to identify
predictions. Despite the noise – about 56% of simple future occurrences and about 31% of
conditional occurrences are not predictions – this method can help identify a)
verbs lexicalising predicted change or no change in
economies and/or economic indicators and b) phrases
(collocations) and patterns (colligations) pointing to
predictions. The co-texts of sentences also provide data on
approximators and shields. Information acquired via
morphology can then be used to refine the method for the
extraction of predictions. The challenge is to compile lists
of verbs and phrases which, combined with grammatical
patterns, can supplement the information provided in
terminology collections, since predictions and their language
are central to economic discourse. For economists this can
also prove a quick way to scan long Italian economic texts
in search of predictions. In the second stage of research,
ways to reduce noise and thus improve the efficiency of
extraction will be explored. In particular, attempts will be
made to combine grammatical and lexical information to
refine the search.
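One way in which grammatical and lexical information might be combined is sketched below in Python. It is a hypothetical illustration of the planned second stage, not a description of work carried out: the names classify, PROBE_SUFFIXES and CHANGE_STEMS are assumptions, and the stem lists are drawn only from forms that appear in the concordance tables (for example accrescerà, aumento, rimarrebbe, scenderà, flessione).

```python
import re
import unicodedata

# Morphological probes: simple future and simple conditional,
# 3rd person singular/plural.
PROBE_SUFFIXES = ("rà", "ranno", "rebbe", "rebbero")

# Hypothetical stem lists, assembled for illustration from verbs and
# nouns found in the concordance tables; they are far from complete.
CHANGE_STEMS = {
    "increase": ("cresc", "accresc", "aument", "salir", "innalz"),
    "no change": ("rimarr", "manterr", "stabil", "costant", "invariat"),
    "decrease": ("scend", "diminu", "ridu", "fless", "restring"),
}


def classify(sentence):
    """Return the direction of a candidate prediction, or None if no probe is found."""
    sentence = unicodedata.normalize("NFC", sentence)
    words = [w.lower() for w in re.findall(r"\w+", sentence)]
    # Grammatical filter: at least one future or conditional form.
    if not any(w.endswith(PROBE_SUFFIXES) for w in words):
        return None
    # Lexical filter: first direction whose stems match any word.
    for direction, stems in CHANGE_STEMS.items():
        if any(w.startswith(stems) for w in words):
            return direction
    return "unclassified"


print(classify("La domanda crescerà nel secondo trimestre"))
print(classify("L'inflazione si ridurrebbe a valori di poco superiori all'1 per cento"))
```

The first matching direction wins, so a sentence that mentions both crescita and costante would be labelled an increase; resolving such cases would require the colligational patterns discussed above.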
In my presentation Italian examples will also be
compared and contrasted with predictions drawn from an
English corpus of economic reports developed at the
University of Trieste. Reference will further be made to
what little data on economic predictions can be gathered
from the large written Italian corpora CORIS (general
language) and CODIS (special languages) of the
University of Bologna.
4. References
Ahmad, K. and H. Fulford, 1992. “Knowledge Processing
4. Semantic relations and their use in elaborating
terminology”, Computing Sciences Report CS-92-07.
Guildford: University of Surrey.
Ahmad, K., 1998. Specialist texts and their Quirks. In
TAMA ’98. Terminology in Advanced Microcomputer
Applications. Proceedings of the 4th TermNet
Symposium. Tools for Multilingual Communication.
Vienna: TermNet, 141-157.
Backhouse, R., T. Dudley-Evans and W. Henderson,
1993. Exploring the language and rhetoric of
economics. In W. Henderson, T. Dudley-Evans, and R.
Backhouse (eds.), Economics and Language. London:
Routledge, 1-20.
Begg, D., S. Fischer and R. Dornbusch, 1997. Economics.
5th ed. Maidenhead: McGraw-Hill.
Bloor, M. and T. Bloor, 1993. How economists modify
propositions. In W. Henderson, T. Dudley-Evans, and
R. Backhouse (eds.), Economics and Language.
London: Routledge, 153-169.
Brown, P. and S.C. Levinson, 1987. Politeness: Some
Universals in Language Usage. Cambridge: Cambridge
University Press.
Krugman, P., 2000. The Return of Depression Economics.
London: Penguin.
Lakoff, G., 1973. Hedges: A study in meaning criteria and
the logic of fuzzy concepts. Journal of Philosophical
Logic, 2(4): 458-508.
Mankiw, G., 1998. Principles of Economics. Fort Worth
(TX): The Dryden Press.
Merlini, L., 1983. Gli atti del discorso economico: la
previsione. Status illocutorio e modelli linguistici nel
testo inglese. Parma: Zara.
Samuelson, P.A. and W.D. Nordhaus, 2001. Economics. New
York: McGraw-Hill Irwin.
Serianni, L., 1991. Grammatica italiana. Italiano comune
e lingua letteraria, 2nd ed. Torino: UTET.
Sinclair, J.M., 1998. The lexical item. In E. Weigand
(ed.), Contrastive Lexical Semantics, Amsterdam/
Philadelphia: John Benjamins, 1-24.
Sobrero, A.A., 1993. Lingue speciali. In A.A. Sobrero
(ed.), Introduzione all’italiano
contemporaneo. La variazione e gli usi. Roma-Bari:
Laterza, vol. II:237-277.
Tyson, L., 1997. How bright is bright? In The World in
1998. London: The Economist Publications, 58-59.