Proceedings of 8th Annual London Business Research Conference

Imperial College, London, UK, 8 - 9 July, 2013, ISBN: 978-1-922069-28-3
A Comparative Study of Exchange Rates of Currencies Using
Time Series Techniques
Anushree Goutam Ringne, Durga Toshniwal and Siddha Maloo
Data mining is the process of discovering novel, interesting and useful patterns
in large data sets and deducing new relationships among them. A time
series is a sequence of data points, typically measured at successive
points in time spaced at uniform intervals. Any data collected over time,
such as weather records, sales statistics, stock price histories or
currency exchange rates, forms a time series. Studying time series is crucial for
understanding the underlying principles; it provides valuable insights
into the data and hence aids decision-making. Several data
mining techniques have been developed for studying large amounts of
data, and several techniques have been proposed for measuring the
similarity of time series. Euclidean distance has proven to be one of the
most effective and accurate of these measures. In this paper, we apply data mining
techniques to real-world datasets of the exchange rates of different currencies per
US Dollar. Our aim is to study the similarities and trends in the exchange rate
curves of these currencies. We use Euclidean
distance as a similarity measure to discover similarities and spot trends in
the data. The analysis of the exchange rate curves using the Euclidean
similarity measure after normalization gives new insight into the
correlations between currencies. We were able to spot currencies that
tended to move together, and also some that were very different from
each other. This knowledge may also be useful for predicting future
behavior in currency exchange rates.
Field: Finance
1. Introduction
With all the advances in technology, the amount of data stored has reached a critical
mass, with more companies, governments and individuals choosing to store their
information on digital platforms. A large proportion of these datasets comprise time
series data. Tasks commonly undertaken by the time series data
mining community include indexing, clustering, similarity search, classification, prediction
and anomaly detection. Han & Kamber (2006) explore these topics in detail. Rafiei &
Mendelzon (1997) have worked on similarity search, and Warren Liao (2005) has carried out an
extensive survey of the clustering of time series data. Indexing of time series data aims to find the
most similar time series in a database, given a query time series and a similarity or
dissimilarity measure.
Clustering finds natural groupings of time series in a database under some similarity or
dissimilarity measure, whereas classification assigns unlabelled time series to
predefined classes. Prediction involves forecasting the value at a future time from the values
observed at earlier times. Anomaly detection finds interesting or unexpected sections of a
time series, given a normal reference time series and an unannotated time series.
Anushree Goutam Ringne, Indian Institute of Technology Roorkee, India, Email : anushreeringne@gmail.com
Dr. Durga Toshniwal, Indian Institute of Technology Roorkee, India, Email: durgafec@iitr.ernet.in
Siddha Maloo, Mohanlal Sukhadia University, India, Email: siddhamaloo@yahoo.in
Extracting valuable information from such huge datasets is challenging for
several reasons. First, many of these datasets have very high dimensionality. To deal
efficiently with such massive data, dimensionality reduction is essential; it lies at the heart
of time series data mining. Second, such large datasets are computationally
expensive to process. Hence data compression has become a prerequisite for
efficient analysis: it not only reduces the cost of storing the data but also speeds
up subsequent processing. Papadimitriou et al. (2007) have done notable
work on time series compressibility and privacy. Privacy is itself a concern, since some companies worry about
the privacy of the data they release. There has been much research in this area,
and various data mining algorithms have been developed. Methods such as data
perturbation, encryption and masking, k-anonymity and association rule mining have been
proposed. Samarati & Sweeney (1998) worked on privacy protection using k-anonymity.
A crucial step in analyzing any time series data is to determine how similar or different the series are,
and one way to do so is to define a distance between two time series. This distance can then be
used to decide the extent of similarity or dissimilarity of the series. Various
methods are used for computing the similarity measure; the most common and most
widely used is the Euclidean distance, which we describe in detail in Section 3.
In this paper our aim is to apply data mining techniques to analyze the exchange rates of
currencies. To search for patterns in the datasets, Euclidean distance has been used as a
similarity measure. We first normalize the datasets for the different currencies and then find
the Euclidean distances between them. These distances give us a way to compare the
similarities and dissimilarities among the curves. We have used the exchange rates of
fifteen different currencies with respect to the US dollar in this study. This is an extension
of previous work by Ringne & Toshniwal (2013), in which six currencies were
studied for trends. Section 2 deals with related work in this area,
Section 3 explains the Euclidean distance in detail, Section 4 presents the case study
and results, and Section 5 presents conclusions.
2. Related Work
The analysis of time series data is crucial in understanding the underlying principles and
developing models useful for predicting future values. Several data mining techniques
have been developed for studying large amounts of data. Westphal & Blaxton (1998)
describe methods and tools for solving real-world problems. Many algorithms focus on
compressing the datasets so as to make the analysis more efficient. Efficient similarity search was
studied in detail by Agrawal et al. (1993). Struzik & Siebes (1999) studied the Haar wavelet
transform in the time series similarity paradigm. Toshniwal & Joshi (2005) developed a
new method based on time-weighted moments for measuring the similarity between two time series.
For analyzing a set of time series it is important to know how similar or dissimilar they are.
Euclidean distance is the most common technique used to measure similarity. Dynamic
time warping is another technique that can be used to measure the similarity between two
time series; Berndt & Clifford (1994) describe it in detail. An
important problem that data mining researchers have to overcome is that of
preserving privacy. To address this problem, methods like k-anonymity have been used.
The randomization method is another technique for privacy-preserving data mining, in
which noise is added to the data in order to mask the attribute values of records. The
noise added is sufficiently large that individual record values cannot be recovered.
Therefore, techniques are designed to derive aggregate distributions from the perturbed
records. Subsequently, data mining techniques can be developed to work with
these aggregate distributions. Santini & Jain (1999) presented a survey of
similarity measures. Gavrilov et al. (2000) provided insight into the best
measure to use for mining stock market data. Das et al. (1997) also worked on finding
similar time series.
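The randomization idea described above can be illustrated with a minimal Python sketch. It assumes zero-mean Gaussian noise and made-up toy records; the helper name `randomize` and the noise level are hypothetical choices, not taken from the cited work. The point is that individual values are masked while aggregates such as the mean and variance remain estimable.

```python
import random
import statistics
from typing import List, Sequence


def randomize(values: Sequence[float], noise_sd: float, seed: int = 0) -> List[float]:
    """Hypothetical helper: release each value with independent N(0, noise_sd^2) noise added."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, noise_sd) for v in values]


# Toy records for illustration only (not real data).
data_rng = random.Random(42)
original = [data_rng.uniform(10.0, 20.0) for _ in range(50_000)]
noise_sd = 5.0
perturbed = randomize(original, noise_sd)

# A single released value says little about the true value...
print(original[0], perturbed[0])

# ...but aggregates can still be estimated, because the noise has mean 0
# and known variance noise_sd**2 (assumed known to the analyst here).
est_mean = statistics.fmean(perturbed)
est_var = statistics.pvariance(perturbed) - noise_sd ** 2
print(est_mean, statistics.fmean(original))
print(est_var, statistics.pvariance(original))
```

The noise level trades privacy against accuracy: a larger `noise_sd` hides individual records better but makes the aggregate estimates noisier for a fixed number of records.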
3. Similarity search
In this section we briefly explain the data mining techniques that we apply to
the datasets. We search for patterns and similarities in the exchange
rates of fifteen different currencies using the Euclidean distance.
3.1 Similarity Measure using Euclidean Distance
Given two points (x1, y1) and (x2, y2), the distance between these points is given by
equation (1).
$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \qquad (1)$$
This distance, known as the Euclidean distance, is also used to find the 'distance' between
two time series (Wikipedia). Given two time series Q = q1…qn and C = c1…cn, the
distance between them is given by equation (2).
$$D(Q, C) = \sqrt{\sum_{i=1}^{n} (q_i - c_i)^2} \qquad (2)$$
But if we naively measure the distance between two "raw" time series, we may get
very unintuitive results, because Euclidean distance is very sensitive to some
"distortions" in the data. For most problems these distortions are not meaningful, and thus
we can and should remove them.
3.2 Data Normalization
Offset Translation: This is done when two time series are similar but differ only by a constant
offset; removing the offset (by subtracting each series' mean) makes it meaningful to compare the distances.
Amplitude Scaling: This refers to normalizing the amplitudes so that series measured on very different
scales become comparable. It is done using the mean and the standard deviation of the time
series, as given by equations (3) and (4); after this step each series has zero mean and unit standard deviation.
$$Q' = \frac{Q - \mathrm{mean}(Q)}{\mathrm{std}(Q)} \qquad (3)$$
$$C' = \frac{C - \mathrm{mean}(C)}{\mathrm{std}(C)} \qquad (4)$$
Noise: Another transformation that should be performed before any analysis of time
series data is to remove noise, i.e. smooth the time series, so that we achieve better
accuracy in our results.
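The transformations above can be sketched in Python as follows. `z_normalize` implements equations (3) and (4) and uses the population standard deviation, which is an assumption since the text does not say which estimator is meant. `moving_average` is only one possible smoothing step, chosen here for illustration; the paper does not specify a smoothing method. The toy series `c` differs from `q` only by offset and amplitude, so its raw distance is large while its normalized distance is essentially zero.

```python
import statistics
from typing import List, Sequence


def z_normalize(series: Sequence[float]) -> List[float]:
    """Equations (3)/(4): subtract the mean (offset), divide by the std (amplitude)."""
    mu = statistics.mean(series)
    sigma = statistics.pstdev(series)  # population std: an assumption
    if sigma == 0:
        # A flat series has no amplitude to rescale; just centre it at zero.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def moving_average(series: Sequence[float], window: int = 3) -> List[float]:
    """Trailing moving average: one possible smoothing step (assumed, not from the paper)."""
    if window < 1:
        raise ValueError("window must be >= 1")
    return [statistics.mean(series[i - window + 1 : i + 1])
            for i in range(window - 1, len(series))]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


q = [1.0, 2.0, 3.0, 2.0, 1.0]
c = [10.0 + 5.0 * x for x in q]  # same shape, different offset and amplitude

print(sq_dist(q, c))                            # 1524.0 (dominated by offset/scale)
print(sq_dist(z_normalize(q), z_normalize(c)))  # close to 0 after normalization
print(moving_average([1.0, 2.0, 3.0, 4.0], 2))  # [1.5, 2.5, 3.5]
```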
Computing the Euclidean distance requires taking a square root. The squared
Euclidean distance omits this square root, which saves one operation per comparison and
makes it cheaper to compute than the Euclidean distance. Because the square root is
monotonically increasing, the squared distance ranks pairs of series in exactly the same order
as the Euclidean distance. In this paper we have used the squared Euclidean distance as a similarity
measure. It must be noted that the squared Euclidean distance is not a metric, as it does not
satisfy the triangle inequality, but it is fast and is used frequently when distances
only have to be compared.
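Both properties mentioned above, preserved ranking and failure of the triangle inequality, can be checked with a short Python sketch. The candidate series and the helper name `squared_euclidean` are made up for illustration.

```python
import math
from typing import Sequence


def squared_euclidean(q: Sequence[float], c: Sequence[float]) -> float:
    """Equation (2) without the final square root."""
    if len(q) != len(c):
        raise ValueError("time series must have the same length")
    return sum((qi - ci) ** 2 for qi, ci in zip(q, c))


# Since sqrt is monotonically increasing, ranking candidates by the squared
# distance yields the same order as ranking by the true Euclidean distance.
query = [0.0, 1.0, 2.0]
candidates = {"A": [0.0, 1.0, 3.0], "B": [1.0, 2.0, 3.0], "C": [0.0, 0.0, 0.0]}
by_squared = sorted(candidates, key=lambda k: squared_euclidean(query, candidates[k]))
by_euclid = sorted(candidates, key=lambda k: math.sqrt(squared_euclidean(query, candidates[k])))
print(by_squared == by_euclid)  # True

# It is not a metric: for x = 0, y = 1, z = 2 on a line, d(x, z) > d(x, y) + d(y, z).
x, y, z = [0.0], [1.0], [2.0]
print(squared_euclidean(x, z), squared_euclidean(x, y) + squared_euclidean(y, z))  # 4.0 2.0
```

For comparing and ranking currency pairs, as in Section 4, only the order of the distances matters, so dropping the square root loses nothing.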
4. Case Study and Results
In this section we first describe the data used in the case study. This is
followed by the general observations seen in the data and, finally, by a
discussion of the results.
4.1 Case Data
A time series is a sequence of data points taken over time. Gold prices,
exchange rates, a company's sales figures and the data from a nuclear power plant
are therefore all time series, and hence we apply time series data mining
techniques to study these datasets. We run the algorithm on real, time-varying
datasets.
Table 1: The countries whose currency exchange rates are studied
S. No   Continent       Country
1       Africa          South Africa
2       Asia            Malaysia
3       Asia            India
4       Asia            Japan
5       Asia            China
6       Australia       Australia
7       Australia       New Zealand
8       Europe          Switzerland
9       Europe          Sweden
10      Europe          Denmark
11      Europe          Norway
12      Europe          United Kingdom
13      North America   Mexico
14      North America   Canada
15      South America   Brazil
Table 1 shows the countries under consideration, and Figure 1 shows the graphs of
the conversion rates of the fifteen countries, viz. Switzerland, Australia, Malaysia, Brazil,
Sweden, New Zealand, Denmark, Norway, India, Great Britain, Canada, South
Africa, Mexico, Japan and China, from 1922 to 2010. The graph shows the conversion
rates of the currencies per US Dollar.
4.2 General observations and Discussions
Each country is first considered individually. The franc, the currency of Switzerland, shows
a generally declining graph, which indicates that the currency is becoming stronger with
respect to the US Dollar. The price of 1 US dollar was around 5 Francs in the 1920s, but it is
now close to 1 Franc. The rate was stable from the 1940s to the 1970s, after which the Franc started
gaining strength against the US Dollar. In the case of Australia, the currency was the
Australian Pound until February 1966, when it was replaced by the Australian Dollar as Australia left
the pre-decimal pound system and moved to a decimal currency. The graph shows an
increasing trend until 2001, the weakest year for the Australian Dollar. It regained its
strength slowly after that and is currently roughly at par with the US Dollar.
Figure 1. The conversion rates of different currencies from 1920s till 2010
Malaysia has had several currencies, in this order: the Straits Settlements
Dollar, the Malayan Dollar, the Malaysian Dollar and the Ringgit. The general trend is
steady for Malaysia, and the currency is particularly strong compared with the Indian
Rupee. Brazil has seen considerable fluctuations in its currency, particularly in
1992, and the situation worsened in 1993. Through the 1980s and 1990s, the Brazilian economy
suffered from rampant inflation that subdued economic growth. The economy started to
stabilize after the introduction of the Plano Real, a set of measures to stabilize the
economy that included a new currency. The Swedish krona has also stayed fairly stable,
with minor fluctuations; the mean price is around 6 kronor for 1 US Dollar. The peak was
in 2001, which coincides with the recession that hit many countries. The New Zealand
conversion rate graph shows an increasing trend until around 2001, after which it declines to reach
a price of 1.5 NZ dollars for 1 US Dollar. New Zealand changed its currency from the New
Zealand Pound to the New Zealand Dollar in 1967, following Australia's decision to
abandon the pre-decimal pound in 1966.
The similarity between the currencies of Denmark, Norway and Sweden cannot be overlooked.
All three currencies show the same trend at the same times, with similar peaks and troughs.
Though the currencies are named differently, the countries' geographical proximity
and the resulting similarity of their economies are a likely reason why these currencies move together. These
three Scandinavian countries each have their own version of the krone, and these
appear to be of similar value. The conversion rates of the Indian Rupee and the British Pound per
US Dollar show an increasing tendency in Figure 1. For the Indian Rupee the rise is
very sharp, beginning around the 1990s, whereas the increase for the British Pound is gradual.
It may also be noted that the 1990s were the period when India set out on the path of globalization.
This paper in no way concludes that globalization caused the rise; it merely notes
that the two coincide in time. Both the British Pound and the
Indian Rupee show an increasing graph, which reflects the weakening of these two
currencies.
While for the British Pound this weakening is small in absolute terms, as the conversion rate changed
only from 0.2 to 0.7, for India the weakening is an issue that raises concern about the
economy of the world's largest democracy. In the 1920s the price of 1 US Dollar was 4
Indian Rupees; it rose to 10 rupees in the 1980s, after which it grew tremendously,
touching the 50 mark. The Canadian Dollar is relatively strong and roughly equal in value to
the US dollar. Though the graph shows many ups and downs, on closer inspection they are
largely insignificant. There is a peak in 2001, as in many other
currencies, after which the Canadian Dollar regained strength and was almost equal to 1 US dollar in 2010.
South Africa, which like India was under British rule, has a
history of using two different currencies: the South African Pound was used until 1960
and was replaced by the national currency, the Rand, in 1961. The Rand shows a peak in
2001. Mexico also had an economic crisis similar to Brazil's, at around the
same time. Old pesos were used until 1992, but action was taken in 1993 to stabilize the
economy, and new pesos were then introduced as the currency of Mexico.
The exchange rate of the Japanese Yen was low until the 1940s. It has shown a negative slope
since the 1960s. For China, no data are available from 1941 to 1980. In the 1980s and 1990s
the Chinese Yuan vs. US dollar conversion rate graph shows an increasing slope. It is
noteworthy that the recession that hit many countries around 2001 left China unaffected.
The economy has been stable since 1995.
Before we turn to the Euclidean distance results, let us focus on
some of the features evident in Figure 1.
Global crisis of 1985:
It must be noted that the year 1985 was one of the world's worst economic crisis years.
The growth rate in the USA was around 1%. We can see evidence of this economic crisis
in the peaks of the exchange rate curves: Switzerland, Australia, Brazil, Sweden, Norway,
Denmark, New Zealand, Great Britain, Canada and Japan show a peak in the graph in
1985. On the other hand, India, South Africa and China seem to be unaffected by this
global crisis, suggesting that their economies were not strongly dependent on the US economy
at that time. This changed with the recession of 2001: India was affected by that
crisis, though less than the other countries, but the effect was significant and
conspicuous.
Global crisis of 2001:
The Canadian Dollar, the Norwegian krone, the Indian Rupee, the Danish krone, the
British Pound, the Australian Dollar, the Swedish krona, the New Zealand Dollar and the South African
Rand all show a peak around 2001. This suggests that their economies were affected
by the recession of 2001, which coincided with the 9/11 attack on the World
Trade Center in the United States.
Japan’s economy
An interesting phenomenon that we see in Figure 1 is the price of the Japanese Yen per
US Dollar. After the end of World War II in 1945 and the dropping of the atomic
bombs on Hiroshima and Nagasaki, the economy of Japan was disrupted. This led to the
steep rise in the exchange rate from 1945 and the large, stable value until
1965. As the economy began to settle and Japan started to develop, the exchange rate of
the Japanese Yen per US Dollar started showing a decreasing trend.
4.3 Results from the Euclidean distance similarity metrics
For this case study we normalized the data so that the values would lie between 0 and 1
for each period of interest (POI). We use a POI of 20 years, and our aim is to
spot the trends within each POI. The first POI is from 1961 to 1980: we
used the values from 1961 to 1980 for all fifteen currencies and calculated the squared Euclidean
distance between every pair of them. The subsequent POIs are
1971 to 1990, 1981 to 2000 and 1991 to 2010. For each POI and each pair of currencies we computed
the squared Euclidean distance, and the results are tabulated in Figures 2, 3, 4 and 5 respectively.
We discuss the results for each POI individually.
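The per-POI procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the data layout (currency mapped to a dict of year to units per USD), the helper names `min_max` and `poi_distances`, and the toy currencies "X", "Y" and "Z" are all hypothetical, and "values lie between 0 and 1" is read as min-max scaling within each POI (the z-normalization of equations (3) and (4) could be substituted).

```python
from itertools import combinations
from typing import Dict, List, Tuple

# The four 20-year periods of interest used in the case study.
POIS = [(1961, 1980), (1971, 1990), (1981, 2000), (1991, 2010)]


def min_max(series: List[float]) -> List[float]:
    """Rescale a series to [0, 1] (assumed reading of the normalization in the text)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0] * len(series)
    return [(x - lo) / (hi - lo) for x in series]


def poi_distances(rates: Dict[str, Dict[int, float]],
                  start: int, end: int) -> Dict[Tuple[str, str], float]:
    """Squared Euclidean distance for every currency pair over years start..end inclusive."""
    years = range(start, end + 1)
    windows = {name: min_max([by_year[y] for y in years]) for name, by_year in rates.items()}
    return {
        (a, b): sum((p - q) ** 2 for p, q in zip(windows[a], windows[b]))
        for a, b in combinations(sorted(windows), 2)
    }


# Hypothetical toy data (NOT real exchange rates): currency -> {year: units per USD}.
rates = {
    "X": {y: 1.0 + 0.1 * (y - 1961) for y in range(1961, 2011)},
    "Y": {y: 5.0 + 0.5 * (y - 1961) for y in range(1961, 2011)},
    "Z": {y: 3.0 - 0.05 * (y - 1961) for y in range(1961, 2011)},
}
for start, end in POIS:
    print(start, end, poi_distances(rates, start, end))
```

Because each window is rescaled on its own, "X" and "Y" (both rising linearly) come out essentially identical in every POI, while "Z" (falling) is far from both, regardless of the currencies' absolute levels.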
Period of Interest 1961-1980
For the first POI, i.e. from 1961 to 1980, the ten pairs of currencies that behaved most
similarly were South Africa-India, Mexico-Brazil, Denmark-Malaysia, South Africa-Sweden,
South Africa-Canada, Norway-Malaysia, Britain-New Zealand, India-Sweden, India-Canada and Australia-New Zealand. It is noteworthy that Japan and Switzerland have a
high squared Euclidean distance to the other currencies. The graph for Japan shows a
decreasing slope during this POI, i.e. Japan is recovering from the bombings of
1945 and its economy is growing well, while Switzerland's economy also seems to be
quite strong. The two currencies are, however, highly similar to each other, as is evident from the
squared Euclidean distance between them.
Figure 2. Squared Euclidean distance between exchange rates of currencies for
POI 1 (1961-1980)
Figure 3. Squared Euclidean distance between exchange rates of currencies for
POI 2 (1971-1990)
Figure 4. Squared Euclidean distance between exchange rates of currencies for
POI 3 (1981-2000)
Figure 5. Squared Euclidean distance between exchange rates of currencies for
POI 4 (1991-2010)
Period of Interest 1971-1990
In the POI 1971 to 1990, the currency pairs with similar trends were South Africa-India and
Australia-New Zealand; both pairs had a squared Euclidean distance
of less than 3. In this POI too, Japan was on the path of strengthening its economy, and
Switzerland's economy also seems to have been strong. The most dissimilar currency
pair was Denmark-Malaysia: while the slope of the Malaysian exchange rate was
almost zero, the Danish krone had an increasing slope until 1985.
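Reading off the most similar pairs, the pairs below a distance threshold (3 in the discussion above) and the most dissimilar pair from one POI's distance table amounts to sorting and filtering. A minimal Python sketch follows; the helper names and the distances are hypothetical and made up for illustration, not the values in Figures 2 to 5.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def most_similar(dist: Dict[Pair, float], k: int = 10) -> List[Tuple[Pair, float]]:
    """The k pairs with the smallest squared Euclidean distance, most similar first."""
    return sorted(dist.items(), key=lambda item: item[1])[:k]


def below_threshold(dist: Dict[Pair, float], threshold: float = 3.0) -> List[Pair]:
    """All pairs whose squared Euclidean distance is below the threshold."""
    return sorted(pair for pair, d in dist.items() if d < threshold)


def most_dissimilar(dist: Dict[Pair, float]) -> Tuple[Pair, float]:
    """The pair with the largest squared Euclidean distance."""
    return max(dist.items(), key=lambda item: item[1])


# Made-up distances for illustration only.
dist = {("A", "B"): 0.8, ("A", "C"): 5.2, ("B", "C"): 2.9,
        ("A", "D"): 11.4, ("B", "D"): 7.0, ("C", "D"): 3.3}
print(most_similar(dist, k=3))  # [(('A', 'B'), 0.8), (('B', 'C'), 2.9), (('C', 'D'), 3.3)]
print(below_threshold(dist))    # [('A', 'B'), ('B', 'C')]
print(most_dissimilar(dist))    # (('A', 'D'), 11.4)
```

With fifteen currencies each POI table has 15 × 14 / 2 = 105 pairs, so calling `most_similar` with k = 10 would return the ten most similar pairs listed for the first POI.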
Period of Interest 1981-2000
In the POI 1981 to 2000, the most similar currency pair was Japan-Switzerland, both
currencies rapidly gaining strength and showing a negative slope in the graph. Interestingly, it is
the only pair with a squared Euclidean distance below 3. The countries whose currencies
behaved differently, or rather interestingly, in this POI are Brazil, China, Japan
and Mexico. Brazil and Mexico were both battling inflation, corruption and a failing
economy. Nevertheless, the similarity index for Brazil and Mexico does not suggest that they
behaved similarly: Brazil has a peak in 1993, while Mexico's economy was at its worst
in 1991, after which it recovered due to government interventions and schemes. Japan
continues to grow in this POI and to strengthen its currency. China seems indifferent: it
gives the impression that its economy is unaffected by other countries or events
and is completely independent. The Chinese exchange rate curve is similar to those of India, South Africa and
Malaysia. This becomes more evident in the next POI, in which the Chinese exchange rate is
non-fluctuating and constant at 6 Yuan per US dollar.
Period of Interest 1991-2010
In the POI 1991 to 2010, the most similar currency pair is again Japan-Switzerland.
New Zealand-Australia and Sweden-Australia are two other pairs that behave
very similarly in this POI. The most dissimilar pair in this POI is Japan-India. While the graph for Japan is a slow decline, which is good for the country, the
exchange rate of the Indian Rupee per US dollar has increased significantly. As mentioned
earlier, the effect of the global economy on India started with the beginning of liberalization
and globalization in the 1980s and 1990s. Starting from 2.74 Indian Rupees per US
dollar in 1928, the rate reached 45.65 Indian Rupees per US dollar in 2008, an increase
of about 1566%.
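The percentage increase quoted above follows directly from the two rates given in the text:

$$\frac{45.65 - 2.74}{2.74} \times 100\% = \frac{42.91}{2.74} \times 100\% \approx 1566\%$$

Equivalently, the 2008 rate is roughly 45.65 / 2.74 ≈ 16.66 times the 1928 rate.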
Table 2: Possibly correlated currencies
Sr. No   Countries whose currencies are correlated
1        South Africa-India
2        Sweden-Norway-Denmark
3        New Zealand-Australia
Currencies which behaved differently from the others:
• Japan
• Switzerland
• China
When we consider the entire period from 1960 to 2010, the three most similar currency
pairs are South Africa-India, Japan-Switzerland and New Zealand-Australia, with
similarity decreasing in that order. Japan-New Zealand was the most dissimilar currency pair,
followed by Japan-India and Japan-Australia.
5. Conclusion
In this paper, data mining techniques have been used to study the exchange rates of the currencies of fifteen different
countries. The analysis of the exchange rate curves using the Euclidean similarity
measure after normalization gives new insight into the correlations between
currencies. We conclude that the decline of the Japanese Yen after World War II can be attributed
to the poor state of the post-war economy. Switzerland, Australia, Sweden, Denmark,
Norway, New Zealand, India, Britain, Canada, South Africa and Japan show a peak
around the year 2001, which suggests that their economies were affected by the recession
of 2001, which coincided with the 9/11 attack on the World Trade Center in the
United States. The year 1985 was also one of the world's worst economic
crisis years; the growth rate in the USA was around 1%. Evidence of this
economic crisis can be seen in the peaks of the exchange rate curves: Switzerland, Australia, Brazil,
Sweden, Norway, Denmark, New Zealand, Great Britain, Canada and Japan show a peak
in 1985. On the other hand, India, South Africa and China seem to have been
unaffected by this global crisis, suggesting that their economies were not strongly dependent
on the US economy at that time. This changed with the recession of 2001: India
was affected by that crisis, though less than the other countries, but the effect
was significant and conspicuous. When we consider the entire period from 1960 to
2010, the three most similar currency pairs are South Africa-India, Japan-Switzerland
and New Zealand-Australia, with similarity decreasing in that order. Japan-New Zealand
was the most dissimilar currency pair, followed by Japan-India and Japan-Australia.
References
Han, Jiawei, Micheline Kamber, and Jian Pei. Data mining: concepts and techniques.
Morgan Kaufmann, 2006.
http://en.wikipedia.org/wiki/Euclidean_distance
Agrawal, Rakesh, Christos Faloutsos, and Arun Swami. Efficient similarity search in
sequence databases. Springer Berlin Heidelberg, 1993.
Struzik, Zbigniew R., and Arno Siebes. "The Haar wavelet transform in the time series
similarity paradigm." Principles of Data Mining and Knowledge Discovery. Springer
Berlin Heidelberg, 1999. 12-22.
Toshniwal, Durga, and Ramesh C. Joshi. "Finding similarity in time series data by method
of time weighted moments." Proceedings of the 16th Australasian database
conference-Volume 39. Australian Computer Society, Inc., 2005.
Berndt, Donald, and James Clifford. "Using dynamic time warping to find patterns in time
series." KDD workshop. Vol. 10. No. 16. 1994.
Ringne A., Toshniwal D., "Interpreting Financial Datasets using Time Series Data Mining
Techniques: a search for similarities and features", 3rd IIM Ahmedabad
International Conference on Advanced Data Analysis, Business Analytics and
Intelligence, April 2013.
http://en.wikipedia.org/wiki/Euclidean_distance#Squared_Euclidean_distance
Westphal, Christopher, and Teresa Blaxton. "Data mining solutions: methods and tools for
solving real-world problems." 1998.
Rafiei, Davood, and Alberto Mendelzon. "Similarity-based queries for time series data."
ACM SIGMOD Record 26.2, 1997: 13-25.
Warren Liao, T. "Clustering of time series data—a survey." Pattern Recognition 38.11,
2005: 1857-1874.
http://en.wikipedia.org/wiki/First_moment_of_area
Santini, Simone, and Ramesh Jain. "Similarity measures." Pattern analysis and machine
intelligence, IEEE transactions on 21.9 (1999): 871-883.
Gavrilov, Martin, et al. "Mining the stock market (extended abstract): which measure is
best?" Proceedings of the sixth ACM SIGKDD international conference on
Knowledge discovery and data mining. ACM, 2000.
Das, Gautam, Dimitrios Gunopulos, and Heikki Mannila. "Finding similar time series."
Principles of Data Mining and Knowledge Discovery. Springer Berlin Heidelberg,
1997. 88-100.