Society Grids
Khurshid Ahmad, Lee Gillam and David Cheng
Dept. of Computing, University of Surrey
Abstract
A grid implementation is described that can deal with large volumes of streaming free natural
language text in conjunction with large sets of time series data. Processing speed-ups on a cluster
of 24 machines (81 CPUs) for dealing with texts in excess of 100 million words are
reported. The application area is econometrics, specifically the behaviour of financial markets, and
the methodology reported can extend the scope of Surrey's Society Grid to strategically
important areas of crime science and social anthropology. The data and compute requirements
identified in the three areas compare well with the traditional concerns in grid computing. Our
studies indicate problems of scalability especially when dealing with multi-modal data – texts and
numbers.
1. Introduction
A large number of demonstrator projects in eScience focus on ‘real-time’ data, paradoxically
stored in large data archives, referred to by
some as data tombs (Fayyad and Uthurusamy
2002). Terabytes of data emanate from quark-hunting expeditions and from engines of jet
aircraft across time zones. These data are,
reportedly, efficiently stored, rapidly retrieved,
intelligently processed, and visualized. Neither
the particle physicists nor the structural-safety
engineers will receive any more data, unless
new data are sought, and relatively well
established models, motivated within a crisply
defined framework, will be used to find the
elusive quark or the potentially damaging
vibrations.
Much real-world data does not come from a
single source, is contaminated in ways that
cannot readily be eliminated, and is never
‘complete’ - the data has both time- and
frequency-domain components and involves
both global and local influences that are
difficult to quantify. Dynamic, real-world data
influences, and is influenced by, other
seemingly independent data sets. Models used
to analyze such data are based on overlapping
theoretical frameworks: there is seldom, if ever, a
unified theory or the opportunity to make
universal approximations.
In practical terms, the logistics used in
conveying information to and from real-world
systems includes a range of modes: natural
language, images, and (sets of) numbers are
prominent. A computational grid can facilitate
capturing and processing of data, and
visualizing the processed data, in one modality,
and synthesizing results across modalities. The
construction of such a multi-modal grid, that
processes real-world data, will help in
understanding problems of building such
systems and how to scale up the results of
existing mono-modal grid systems.
Multi-modal data invariably brings with it
the question of how to fuse the results of
computations over data articulated in
different modes.
The mode (numbers or
language for instance) imposes methodological
constraints: typically, quantitative methods are
used directly on numerical data for summarizing
a time series; qualitative methods are used to
process texts such that one ends up with a set of
statistics summarising the contents of one or
more text documents.
In this paper we describe the construction
and testing of a multi-modal grid that can
process numerical and textual data. The data
sets are composed from continuous live-streaming data captured from dedicated
datafeeds, and are processed using up to 24 machines. We motivate the discussion of a
multi-modal grid by outlining one application
area – econometric analysis of markets that
includes the analysis of the value of transactions
as well as the effect of news on such
transactions. The numeric data used is a time-ordered record of financial transactions; the text
data is the financial news that relates to such
transactions and to economic and political
events of world interest that impact on financial
decisions. The method for converting streaming
news, articulated in language, into time-ordered
signals is described. This is
followed by the description of a grid and its
performance. Discussion includes a typology of
data and methods that can be adapted to a
number of other societal issues including the
perception of crime and racial violence.
Note that we have reported the results of
time series analysis and Monte Carlo
simulations elsewhere (Ahmad, Gillam and
Cheng 2005); here we focus
exclusively on the problem of qualitative data
(news texts) and its analysis.
2. Motivation
Consider econometric data – movements of the
values of financial instruments sometimes
integrated with the economic/financial value of
enterprises - that are collected in real-time and
rendered as time-series. The data are artifacts
of how humans interact: rises and falls in value
represent some form of consensus; human
investment activities produce data, analysis of
which results in further investments producing
even more data, but by the time analysis of data
is complete, the data set may already have
changed. Time series analysis involving Monte
Carlo simulations and stochastic models, with
which the e-Science community is familiar, are
used for predicting the value of the financial
instruments at a certain time, based on such
data. But the analysis is found to be flawed
under different economic conditions, for
example during boom or bust periods. Scholars
have discovered that investors and traders can
suspend rational reasoning, and that their
sentiments interfere with their decision-making.
This may involve use of covert information,
greed, false expectations, herd instincts and so
on.
The Yale University Center for
International Finance publishes a monthly
survey of both investor and trader sentiment
related to performance of the US economy – the
variance in the expectations of the two groups is
very clear (Shiller 2003).
A study of individual or group (human)
behaviour requires a conjunctive analysis of
rational and other behaviour.
Such work
necessitates consideration of the notion of
bounded rationality. Human decision making, it
appears, is bounded: neither always rule
governed nor always informed by prior
empirical knowledge (Simon 1992, Kahneman
2002). Econometricians have argued that the
hopes, fears, aspirations and disappointments of
the investors and traders in a financial market
manifest themselves as changes in the price of
one or more financial instruments. The effect of
exaggerated hopes and aspirations on the part of
naïve or greedy or confused investors/traders
may lead to a short term erratic upwards
movement in prices, but the proponents of
efficient market hypothesis (EMH) have argued
that other (more) rational traders and investors
work to counter the bounded rationality of the
greedy; similarly fears and disappointments of
some may lead to erratic downwards behaviour
of prices and again the rational agents intervene
to check behaviour that is not based on rational
analysis. It has been noticed, however, that
whilst the prices may consolidate in the short
term, the order flow remains erratic for much
longer and, in turn, presents a challenge for
EMH.
A considerable body of literature exists
where econometricians have analysed the
‘impact of news’ on the prices, and indirectly on
the order flows. The tradition here is to use
methods and techniques of time-series analysis - more precisely, generalized autoregressive
conditional heteroskedasticity (GARCH) - and
to isolate the unexpected changes (shocks) that
may occur in the price over a period of time.
Processing an aggregate of such records (over a
1, 15 or 30 minute, daily, weekly, monthly or
yearly interval), especially for finding autocorrelation within the time series of one
instrument or the cross-correlation amongst
many (see, Percival & Walden 2000), requires
distributed systems with fairly short response
times. The challenge here is intellectual and has
concomitant strategic and commercial import.
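The GARCH machinery referred to above can be illustrated briefly. The following Python sketch implements a plain GARCH(1,1) variance recursion; the parameter values, the seeding of the variance and the function name are our own illustrative assumptions and are not taken from the studies cited.

def garch_11_variance(returns, omega=1e-6, alpha=0.1, beta=0.85):
    """Conditional variance series h_t of a GARCH(1,1) model for a list of
    returns; standardised residuals r_t / sqrt(h_t) highlight the 'shocks'
    (unexpected changes) mentioned in the text. Parameter values are
    illustrative assumptions, not estimates from market data."""
    h = [sum(r * r for r in returns) / len(returns)]  # seed with the sample variance
    for r in returns[:-1]:
        h.append(omega + alpha * r * r + beta * h[-1])
    return h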
Engle (1993), the co-winner of the 2003
Nobel Prize in Economics who has developed
and used GARCH, has argued that the impact of
news may be positive or negative, but the effect
of negative news lasts much longer than that of
positive news.
This asymmetry has been
observed empirically and Engle has shown how
to model the asymmetry. Computing the ‘news
impact’ is a computationally intensive task
requiring, on the one hand, fast and efficient
calculations, and on the other, substantial
volumes of data storage and handling
capabilities – a large number of key financial
instruments are each bought and sold many
times within one second, and in some cases in a
variety of financial markets.
The news impact analysis, however, does
not explicitly use linguistic data. The
econometricians, by and large, use information
proxies. For instance, Engle’s past and current
work uses the timings of various announcements
from financial and monetary authorities as a
placeholder or proxy for the details of the news
itself. Andersen et al. (2002)
have noted that: News announcements matter,
and quickly; the timing of announcement
matters; the rate of change adjusts to news
gradually; the effect on traded volume persists
longer than on prices. Other authors have used
sentiment proxies: the changes in the values of
economic variables, for example, turn-over of a
stock exchange or number of initial public
offerings, are subjected to factor analysis, and a
novel sentiment index is created (Baker and
Wurgler 2004).
Some researchers pre-select keywords that
indicate change in the value of a financial
instrument – including metaphorical terms like
above, below, up and down – and use them to
‘represent’ positive/negative news stories.
Others use the frequency of collocational
patterns for assigning a ‘feel-good/bad’ score to
the story (see, for example, DeGennaro and
Shrieves 1997, and Koppel and Shtrimberg
2004). Table 1 shows such sentiment proxies - frequent metaphorical or literal keywords that
can be used as placeholders for investor/trader
sentiment.
Sentiment in news stories    Lexical Content
‘Good’ news stories          appear to comprise collocates like revenues rose, share rose;
‘Bad’ news stories           may contain profit warning, poor expectation;
‘Neutral’ stories            usually contain collocates such as announces product, alliance made.

Table 1: Lexical content of a news story and the implied sentiment.
The ‘sentiment’ of the story is then
correlated with that of a financial instrument
cited in the stories and inferences made.
Sentiments are difficult to detect, but there may
be evidence of sentiment detectable in financial
news (comprising facts as well as rumours),
company reports (containing a rosier view of
the world), and speeches of key players
(bringing glad tidings or otherwise). There is a
body of literature emerging that includes the
description of algorithms and programs that can
detect sentiment in texts, ranging from film
and holiday resort reviews to restaurant reviews,
based on the mutual information metric. This
metric has been used to compute the joint
probability distribution of one arbitrary word
(w1) with another arbitrary word (w2):
MI(w1, w2) = p(w1, w2) / (p(w1) * p(w2))
where p(w) is the probability of a word
occurring and p(w1, w2) is the probability of the two
words occurring together. Turney (2002) has
used the metric to detect the semantic
orientation (SemOr) of individual phrases used
by a reviewer in conjunction with the sentiment
words ‘excellent’ and ‘poor’:
SemOr(phrase) = MI(‘excellent’, phrase) – MI(‘poor’, phrase)
The sum of SemOr for all pre-selected phrases
is computed: if the sum is negative then the
given review is deemed negative, otherwise it is
deemed positive.
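To make the computation concrete, the following Python sketch estimates the mutual information and the semantic orientation of a phrase from corpus counts; the dictionaries counts and cooc, the function names, and the use of the conventional logarithmic form of the metric are assumptions of this sketch rather than details of Turney's implementation.

import math

def pmi(co_count, count_a, count_b, n):
    """Pointwise mutual information estimated from counts over n tokens.
    (The text quotes the probability ratio; the log form is the usual metric.)"""
    return math.log2((co_count / n) / ((count_a / n) * (count_b / n)))

def semantic_orientation(phrase, counts, cooc, n):
    """SemOr(phrase) = MI(phrase, 'excellent') - MI(phrase, 'poor'),
    following Turney (2002). 'counts' maps items to frequencies and
    'cooc' maps (item, anchor) pairs to co-occurrence frequencies;
    both are assumed to have been computed in advance."""
    return (pmi(cooc[(phrase, "excellent")], counts[phrase], counts["excellent"], n)
            - pmi(cooc[(phrase, "poor")], counts[phrase], counts["poor"], n))

def classify_review(phrases, counts, cooc, n):
    """A review is deemed positive if the summed SemOr of its phrases is >= 0."""
    total = sum(semantic_orientation(p, counts, cooc, n) for p in phrases)
    return "positive" if total >= 0 else "negative"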
3. Method
In the above-mentioned methods of sentiment
analysis, either the sentiment ‘variables’ and
metrics use information proxies, or they rely
on pre-selected keywords and phrases. The
implicit and explicit methods are designed to
avoid ambiguity that is inherent in natural
language based communication. However, it is
important to explore whether or not these
sentiments can be extracted with a minimum of
ambiguity where the premium is on avoiding
false positives.
We have discussed, elsewhere, the
amenability of special language texts for
automatic analysis - the authors of special
language texts are trained to avoid ambiguity.
This is not to say that specialist writers or their
readers always succeed, but the chances of a
writer confusing a reader are lower than those of
writers of non-specialist texts. It has been
shown that a pre-selected collection of texts has
a lexical profile – a set of single words that are
characteristic of a specialism. The profile
dominates most compound words and indeed a
large number of meaning-bearing phrases. The
texts not only have a restricted vocabulary,
albeit profusely used, but appear also to have
syntactic restrictions that result in largely
unambiguous phrases (see Ahmad, Gillam and
Cheng 2005 and references therein).
We adopt a text-driven and bottom-up
method: starting from a collection of texts in a
specialist domain, together with a representative
general language corpus. We describe a five-step
algorithm for identifying discourse patterns
with more or less unique meanings, without any
overt access to an external knowledge base:
I. Select training corpora
General Language: The British National
Corpus (BNC), which contains 100 million tokens
distributed over 4,124 texts (Aston and Burnard 1998);
Special Language: Reuters Corpus Volume
1 (RCV1), comprising news texts produced in
1996-1997, which contains 181 million tokens
distributed over 806,791 texts. (To describe
how our method works we will use a randomly
selected component of the corpus – the output
of February 1997, comprising 14,244,349
tokens, henceforth referred to as the RCV1-Feb97 corpus.)
II. Extract key words: The frequencies of
individual words in the RCV1-Feb97 were
computed using System Quirk. The frequency
of the words in the RCV1-Feb97 corpus is
compared with the frequency of the same words
in the BNC. A word that is used more
frequently in RCV1-Feb97 than in the BNC,
according to a statistical criterion referred to as
weirdness, is regarded as a candidate keyword
(Ahmad 1995). The grammatical words (the, a,
an, and, but..), usually described as a stop list,
have a very similar distribution, but subject
specific words have a rather different
distribution (see Table 2):
Word       fR (a)   fR/NR (b)   fG (c)   fG/NG (d)   Weirdness (b)/(d)
percent    65763    0.46%       2928     0.00%       157.84
market     36349    0.26%       30078    0.03%       8.49
company    29058    0.20%       40118    0.04%       5.09
bank       28041    0.20%       17932    0.02%       10.99
shares     23352    0.16%       8412     0.01%       19.51

Table 2: Occurrences of the most frequent words in RCV1-Feb97 (fR) compared with their frequency in the BNC (fG). NR = 14.24 million; NG = 100 million.
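A minimal sketch of the weirdness computation follows; the function name and the small smoothing constant used to avoid division by zero are our own illustrative choices.

def weirdness(freq_special, n_special, freq_general, n_general):
    """Ratio of a word's relative frequency in the special corpus to its
    relative frequency in the general corpus (Ahmad 1995). The tiny additive
    constant on the general count is an assumption of this sketch that guards
    against division by zero for words absent from the general corpus."""
    return (freq_special / n_special) / ((freq_general + 1e-9) / n_general)

# Using the figures reported for 'percent' in Table 2:
# weirdness(65763, 14_244_349, 2928, 100_000_000) is roughly 158.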
III. Extract key collocates:
Collocation
patterns, combinations of words that occur
together frequently, are considered as indicators
of meaning and intent of the author.
Techniques have been developed recently that
attempt to compute the statistically significant
patterns. In our method, the focus is on the
collocates of the most frequently used single
words – selection based on frequency can then
be readily programmed. System Quirk has
modules to do just that (Ahmad, Gillam &
Cheng 2005).
The key collocates of the most frequent
word in RCV1-Feb – percent - are up, rose,
rise, down and fell. This results in the patterns
rose X percent, X percent rise, and up [by] X
percent. Our method automatically selects these
collocates and, from them, computes the
collocates of these collocates.
The collocation patterns suggest that the
metaphorical words rose, fell, up and down,
usually used to refer to movement of objects in
physical space, have been transferred (the origin
of the word metaphor) over to the change in the
value of the rather abstract financial
instruments.
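A frequency-based collocation pass of the kind described in step III can be sketched as follows; the window size, the function names and the omission of any significance test are illustrative assumptions of this sketch rather than a description of the System Quirk modules.

from collections import Counter

def collocates(tokens, node, window=5, top=10):
    """Count the words that co-occur with 'node' within +/- 'window' tokens
    and return the most frequent ones (frequency-based selection only)."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(tokens[lo:i] + tokens[i + 1:hi])
    return counts.most_common(top)

def collocates_of_collocates(tokens, node, **kw):
    """Second-order collocation: the collocates of the collocates of 'node'."""
    return {w: collocates(tokens, w, **kw) for w, _ in collocates(tokens, node, **kw)}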
IV. Extract local grammar using collocation
and relevance feedback: The frequent
collocates have an unambiguous interpretation,
and the avoidance of ambiguity is the
cornerstone of modern information retrieval.
The frequent collocates of collocates have an
even less ambiguous interpretation. This has
helped us write the programs that recognize
these patterns automatically and is important for
dealing with the deluge of texts – c. 100,000
words per hour.
The ambiguities typically occur because a
pivotal verb (or noun) in a sentence can be
replaced by other verbs (or nouns). The
specialist nature of financial news restricts the
use of such verbs (nouns) to a very small subset
of such words in the language and thereby
minimizes ambiguity. This approach is contrary
to the current paradigm of natural language
processing that is grounded in universal
grammar – where many words can be used
interchangeably. The approach used in Society
Grids is called local grammar. Figure 1 shows
a local grammar pattern that is amongst the
most frequent ones in our training corpus – for
down. The patterns were then tested on the bulk
of the RCV1 corpus and the precision of the
local grammar patterns was considerably
higher than the precision of single-word
retrievals.
Figure 1: A finite state automaton for
recognising negative sentiment sentences
comprising ‘down’
The local grammar is used to unambiguously
identify sentences that contain sentiment-bearing
phrases and to automatically annotate the phrases.
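The flavour of such pattern recognition can be conveyed with a small Python sketch; the regular expressions below are illustrative stand-ins for the finite state automaton of Figure 1, and the particular verbs and pattern shapes are assumptions rather than the patterns actually learnt from the corpus.

import re

# Illustrative renderings of local-grammar patterns around 'percent';
# the automaton of Figure 1 is richer than this sketch.
PATTERNS = {
    "negative": [
        re.compile(r"\b(fell|dropped|eased|was down)\s+(by\s+)?\d+(\.\d+)?\s+percent\b", re.I),
        re.compile(r"\bdown\s+\d+(\.\d+)?\s+percent\b", re.I),
    ],
    "positive": [
        re.compile(r"\b(rose|gained|climbed|was up)\s+(by\s+)?\d+(\.\d+)?\s+percent\b", re.I),
        re.compile(r"\bup\s+\d+(\.\d+)?\s+percent\b", re.I),
    ],
}

def annotate(sentence):
    """Return the sentiment labels whose patterns match the sentence."""
    return [label for label, pats in PATTERNS.items()
            if any(p.search(sentence) for p in pats)]

# annotate("Shares in the company fell 3.2 percent on the news") -> ['negative']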
Figure 2. The results of ‘filtering’ raw news for a 48 hour period (top line) for differentiating words
that carry sentiment information within (bottom line) and without (middle line) the local grammar
patterns.
Figure 3. A differentiated view of the positive and negative sentiment within the local grammar
patterns for the data described above.
Figure 2 shows the filtering power of the
local grammar patterns: the patterns identify
between 1,000 and 10,000 sentiment words in a
corpus of between 10,000 and 100,000 tokens
arriving per hour, to find between 10 and 100
‘true’ sentiment-bearing sentences.
The system differentiates between ‘negative’
and ‘positive’ sentiments (Figure 3). Here some
user intuition is used to suggest whether a word
or phrase has a negative or a positive connotation.
The positive and negative
sentiment time-series can then be correlated
with the time series of financial data.
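A minimal sketch of this correlation step is given below; the hourly binning, the plain Pearson coefficient and all function names are our own illustrative choices, not the wavelet- and GARCH-based analyses used in the project.

import math
from collections import Counter

def hourly_sentiment_series(annotated_items):
    """Bin (timestamp, label) pairs into hourly counts per sentiment label;
    each timestamp is assumed to be a datetime object."""
    series = Counter()
    for ts, label in annotated_items:
        series[(ts.replace(minute=0, second=0, microsecond=0), label)] += 1
    return series

def pearson(x, y):
    """Plain Pearson correlation between two equally long numeric series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# pearson(negative_counts_per_hour, hourly_returns) gives one crude measure of
# how negative sentiment co-moves with price changes.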
4. The Society Grid demonstrator
The first prototype of our Society Grid
demonstrator was developed under the aegis of
the ESRC e-Social Science Programme
(FINGRID project). We demonstrated how
Grid technologies could support novel research
activities in financial economics that involve the
rapid processing and combination of large
volumes of time-varying qualitative and
quantitative data. We used Globus (GT3) with
the Java CoG Kit to integrate:
• Live financial data: news, historical time series data and tick data provided by Reuters (Reuters SSL SDK).
• Time series analysis: a FORTRAN bootstrap algorithm, and the MATLAB toolkit for Wavelet Analysis (via JMatLink).
• News/Sentiment analysis: System Quirk components for terminology extraction, ontology learning and local grammar analysis.
• Visualisation and fusion: System Quirk components for corpus visualisation, financial charting, and data fusion.
The Society Grid demonstrator enables the
extraction of patterns of language from large
collections of text that indicate changes in
events or values of objects, and the correlation
of these with movements in financial markets.
These patterns are extracted semi-automatically
using methods of corpus linguistics, pioneered
and tested in the System Quirk framework, to
discover keywords, a select group of verbs, and
orthographic markers. We discovered the local
grammar that governs the ordering of these
keywords, verbs and markers in sentiment-bearing sentences.
We have shown how
econometricians and empirical and financial
economists could use Grid technologies to
facilitate research and collaboration.
4.1 Design and Performance of the Society Grid
The Society Grid comprises 24 machines1 in
addition to a financial datafeed provided by
Reuters Financial Services (c. 25 MB or 6000
news items on average per day; one year is
around 2 GB of text). We have developed
programs using Reuters SSL Developer Kit
(Java) to capture the news, historical time series
data and tick data. Reuters supply news with
categories, authorships and date information.
We have followed Hughes and Bird's (2003)
word frequency counting approach to evaluate
the performance of our implementation. The
corpora used in our experiments are the Brown
Corpus and the Reuters RCV1 Corpus: see
Table 3 for details. The computational power of
our grid implementation has been reported in
Ahmad et al. (2004), with an 8-node
configuration.
             Brown    RCV1
Files        500      806,791
Size (MB)    5.2      2,576.8
Words (M)    1.0      169.9

Table 3: Size of the corpora.
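The word frequency counting task itself parallelises naturally. The sketch below uses a local process pool as a stand-in for the grid nodes; the whitespace tokenisation, the function names and the pool-based scheduling are assumptions of this sketch, whereas the actual implementation distributes files to machines via Globus and GridFTP.

from collections import Counter
from concurrent.futures import ProcessPoolExecutor

def count_words(path):
    """Map step: word frequency count for one file (whitespace tokenisation
    stands in here for the real tokeniser)."""
    with open(path, encoding="utf-8", errors="ignore") as fh:
        return Counter(fh.read().lower().split())

def distributed_count(paths, workers=8):
    """Reduce step: merge the per-file counts. In the Society Grid the map
    step runs on remote nodes reached via Globus/GridFTP rather than on a
    local process pool."""
    total = Counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_words, paths):
            total.update(partial)
    return total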
Theoretically, the performance gain in a
Grid environment is proportional to the number
of machines being used. In parallel processing,
the overall execution time of the task is
determined by the process that finishes last. To
account for this factor, we explored the use of
8-CPU, 16-CPU, 32-CPU, 48-CPU and 64-CPU
configurations. Each experiment was repeated
10 times, and the average was recorded. Figure
4 below shows the time taken in seconds to
complete the word frequency counting on the
Reuters RCV1 corpus.
[Figure 4: line chart of time in seconds against number of CPUs (0-80).]
Figure 4. Time taken to perform word frequency counting on the Reuters RCV1 corpus with different numbers of CPUs.
We have observed a performance gain of 47%
in using a 16-CPU grid rather than an 8-CPU
grid; a gain of 33% in moving from a 16-CPU
grid to a 32-CPU grid; a gain of 16% in moving
from a 32-CPU grid to a 48-CPU grid; and a
mere gain of 7% in moving from a 48-CPU grid
to a 64-CPU grid configuration. To investigate
degradation of performance, we decomposed
the execution time of the word frequency
counting process into four parts: preparation
time (time required to allocate the task),
GridFTP upload time (time required to upload the
necessary files to each machine), processing time (time
required to perform the word frequency
counting) and GridFTP download time (time
required to download the results). Figure 5
shows the decomposition of time taken to
complete the word frequency counting on the
Reuters RCV1 corpus.
[Figure 5: line chart, on a log scale, of the time in ms for preparation, GridFTP upload, processing and GridFTP download against the number of CPUs (0-80).]
Figure 5. Decomposition of the time taken to perform word frequency counting on the Reuters RCV1 with different numbers of CPUs.
1 We have 19 Dell PowerEdge 2650 machines with 1 GB memory and dual processors, and 5 Dell Optiplex GX150s with 256 MB memory and a single processor; 81 CPUs are available across these 24 machines.
The bulk of computation is in the word
frequency counting – where the actual
processing occurs. Considering this alone, the
performance gains were 49%, 39%, 22% and
10% respectively for the successive moves from
an 8-CPU grid to a 64-CPU grid.
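The figures above can be reproduced from the raw timings with elementary arithmetic; the sketch below makes the two conventions explicit, and the function names and tuple layout are illustrative assumptions.

def overall_time(per_machine_stage_times):
    """The overall execution time of the parallel task is set by the machine
    that finishes last; each entry is assumed to be a tuple
    (preparation, upload, processing, download) of times for one machine."""
    return max(sum(stages) for stages in per_machine_stage_times)

def percentage_gain(t_smaller_grid, t_larger_grid):
    """Relative reduction in wall-clock time when moving to a larger
    configuration, e.g. roughly 47% from the 8-CPU to the 16-CPU grid."""
    return 100.0 * (t_smaller_grid - t_larger_grid) / t_smaller_grid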
5. Society Grids – What next?
The methods and techniques developed in
our prototype can be used to investigate how a
person’s perception of his or her own well-being,
at different times and in different places,
and in various facets - social, political and
economic - can be the same or at variance with,
say for example, crime statistics, economic
indicators, achievements or failures of (other)
ethnic/racial categories.
Evidence of such
bounded rationality
includes:
(i) the
reassurance gap: the difference between crime
rates and the public perception of crime
(Fielding 1995, Fielding, Innes and Fielding,
2002); and (ii) internal war (Kaldor 1999):
where emotional/affective responses are needed
to compensate for the limitations of the rational
action model. The reassurance gap and internal
war are both mediated by a discourse pattern
that fuels bounded rationality – racist web sites,
minority community newspapers, inflammatory
speeches, all publicly accessible and all laden
with sentiments.
The data in the three different disciplines
have significant overlaps and a number of
differences as well. All three fields have
discrete data and continuous data, quantitative
data and qualitative data. The text types range
from informative text, that are primarily
designed to convey information from some
knowledgeable person to a less knowledgeable
person, to expressive texts, where a
knowledgeable person is seeking or transferring
knowledge to those with equal knowledge. A
new data type we have recently identified is that
of appellative texts where somebody is
competing to transfer their knowledge. Each of
the text types contains sentiment-bearing
sentences, and once extracted this sentiment-related information can be used in conjunction
with the quantitative data.
Data Mode / Type (Freq)   Financial Economics          Crime Science                 Social Anthropology
Numerical (D);            Macro-micro Economic Indicators; Census Statistics; Survey of Social
Quantitative              Attitudes; Life-style/Well-being Statistics
Numerical (C);            MARKET MOVEMENT              CRIME STATISTICS              ETHNICITY DATA
Quantitative
Qualitative;              General News Reports and Editorials
Informative               FINANCIAL NEWS;              Police Forces/Home Office     ETHNO-CULTURAL
                          POLITICAL NEWS;              Reports; Crime Reports        NEWS
                          Financial/Monetary
                          Regulators’ Reports
Qualitative;              LETTERS TO THE EDITOR; RUMOUR-LADEN E-MAILS; COMMENTARIES
Appellative
Qualitative;              Semi-structured interviews (Traders, Citizens)
Expressive                INVESTOR SURVEYS             CITIZEN SURVEYS

Table 4: Data and Mode typology for e-Social Science; ‘D’ indicates discrete data and ‘C’ continuous data.
We are currently seeking support to extend
the methods, tools and techniques of e-Science
and of Society Grids, to fuse the quantitative
and qualitative data in a mono-discipline
(econometrics, sociology of crime, and social
anthropology) and to fuse the analysis across
the disciplines. The experts are looking at
different facets of the same reality and we aim
to integrate this analysis. We believe that the
methods, techniques and prototypes we have
developed, using leading-edge techniques to
analyse large data sets, both quantitative and
qualitative, with reference to market sentiment
and to sentiment analysis at large, can contribute
to understanding crime, conflict, and the
economy. Grid technologies can benefit
traditional social science analysis that begins
with an attempt to find correlation between the
onset of a crisis – for example racial violence,
insecurity amongst citizens or a stock market
crash – and variables related to system attributes
(local, national, or market systems), social
divisions, economic-activity data, types of
systems involved and external context. Grids
for Social Scientists will have matured when
social scientists can quickly and easily explore
such phenomena through combinations of
methods of textual, historical, theoretical, and
numeric analyses: when social scientists can
focus on the science, not on the technology
required to undertake the science.
Acknowledgements
The work described was part-funded by the ESRC
(FINGRID: RES-149-25-0028), EPSRC (SOCIS:
GR/M89041/01, REVEAL: GR/S98450/01) and EU
(LIRICS: eContent EDC-22236). In particular we
would like to thank our colleagues at Surrey: Prof
Nigel Fielding (Sociology), Prof John Eade
(Anthropology), and Dr M Rogers (Linguistics) and
we are grateful to Prof John Nankervis (Essex) and
Prof Yorick Wilks (Sheffield) for discussions on
econometrics and linguistics.
References
Ahmad, K. (1995). “Pragmatics of Specialist
Terms and Terminology Management” In (Ed.) Petra
Steffens. Machine Translation and the Lexicon.
(LNAI, Vol. 898) Heidelberg: Springer. pp.51-76
Ahmad, K., Gillam, L., and Cheng, D. (2005)
“Textual and Quantitative Analysis: Towards a new,
e-mediated Social Science”.
Proc. of the 1st
International Conference on e-Social Science
(Manchester, June 2005).
Ahmad, K., Taskaya-Temizel, T., Cheng D.,
Gillam, L., Ahmad, S., Traboulsi, H. and Nankervis,
J. (2004). Financial Information Grid – an ESRC e-Social Science Pilot. Proceedings of the Third UK e-Science Programme All Hands Meeting (AHM 2004),
Nottingham, United Kingdom. © EPSRC Sept 2004
(ISBN 1-904425-21-6).
Andersen, T. G., Bollerslev, T., Diebold, F. X.,
& Vega, C. (2002). Micro effects of macro
announcements: Real time price discovery in foreign
exchange. National Bureau of Economic Research
Working Paper 8959, http://www.nber.org/papers/w8959.
Aston, G., and Burnard, L. (1998). The BNC
Handbook. Edinburgh: Edinburgh University Press.
Baker, M., and Wurgler, J. (2004). Investor
Sentiment and the Cross-Section of Stock Returns.
NBER Working Paper 10449, Cambridge, Mass.:
National Bureau of Economic Research, Inc.
Hughes, B., and Bird, S. (2003). “Grid-Enabling
Natural Language Engineering by Stealth”, In Proc.
of HLT-NAACL 2003 (Workshop on SEALTS), pp.
31-38, Association for Computational Linguistics,
2003.
DeGennaro, R., and R. Shrieves (1997): ‘Public
information releases, private information arrival and
volatility in the foreign exchange market’. Journal of
Empirical Finance Vol. 4, pp 295–315.
Engle, R. F. and Ng, V. K. (1993). Measuring
and testing the impact of news on volatility. Journal
of Finance, Vol. 48, pp. 1749–1777.
Fayyad, U. and Uthurusamy, R. (2002) “Evolving
data mining into solutions for insights”.
Communications of the ACM 45(8), pp. 28-31.
Fielding Nigel. (1995). Community Policing.
Oxford: Oxford University Press
Fielding, N., Innes, M., and Fielding, J. (2002).
‘Reassurance Policing and the Visual Environmental
Crime Audit in Surrey Police: a Report’. Guildford:
Univ. of Surrey Department of Sociology.
Kahneman, D. (2002). Maps of Bounded
Rationality: A Perspective on Intuitive Judgment and
Choice (A Nobel Prize Lecture December 8, 2002).
Kaldor, M. (1999). New and Old Wars:
Organised Violence in a Global Era. Polity Press:
Cambridge.
Koppel, M and Shtrimberg, I. (2004). “Good
News or Bad News? Let the Market Decide”. In
AAAI Spring Symposium on Exploring Attitude and
Affect in Text. Palo Alto: AAAI Press. pp. 86-88.
Percival, D. B., and A. T. Walden. 2000. Wavelet
Methods for Time Series Analysis. Cambridge
University Press.
Shiller R. J. (2003). The New Financial Order:
Risk in the 21st Century. Princeton: Princeton
University Press.
Simon, H. (1992). ‘Rational Decision Making in
Business Organisations (A Nobel Memorial Lecture,
8 December, 1978).’ In (Ed.) Assar Lindbeck. Nobel
Lectures in Economic Sciences 1969-1980.
Singapore: World Scientific Publishing Company.
(Available at http://nobelprize.org/economics/laureates/1978/simon-lecture.pdf)
Turney, Peter D. (2002). “Thumbs Up or Thumbs
Down? Semantic Orientation Applied to
Unsupervised Classification of Reviews”. Proc. 40th
Ann. Meeting of the Ass.n for Comp.l Ling. (ACL).
Philadelphia, July 2002, pp. 417-424. (Available at
http://acl.ldc.upenn.edu/P/P02/P02-1053.pdf)