Proceedings of 8th Annual London Business Research Conference Imperial College, London, UK, 8 - 9 July, 2013, ISBN: 978-1-922069-28-3

A Comparative Study of Exchange Rates of Currencies Using Time Series Techniques

Anushree Goutam Ringne, Durga Toshniwal and Siddha Maloo

Anushree Goutam Ringne, Indian Institute of Technology Roorkee, India, Email: anushreeringne@gmail.com
Dr. Durga Toshniwal, Indian Institute of Technology Roorkee, India, Email: durgafec@iitr.ernet.in
Siddha Maloo, Mohanlal Sukhadia University, India, Email: siddhamaloo@yahoo.in

Data mining is the process of discovering novel, interesting and useful patterns in large data sets and deducing new relationships between them. A time series is a sequence of data points, typically measured at successive points in time spaced at uniform intervals. Any data collected over time is a time series, such as weather records, sales statistics, stock price histories and currency exchange rates. The study of time series is crucial to understanding the underlying principles; it provides valuable insights into the data and hence aids decision-making. Several data mining techniques have been developed for studying large amounts of data, and several techniques have been proposed for measuring the similarity of time series data. Euclidean distance is among the most common and widely used of these measures. In this paper, we apply data mining techniques to real-world datasets of the exchange rates of different currencies per US Dollar. Our aim is to study the similarities and trends in the graphs of the exchange rates. We use Euclidean distance as a similarity measure to discover similarities and spot trends in the data. The analysis of the exchange rate curves using the Euclidean similarity measure after normalization gives new insight into the correlations between currencies. We were able to spot currencies that tended to move together, and also some that behaved very differently from each other. This knowledge can also be useful for predicting future behavior of currency exchange rates.

Field: Finance

1. Introduction

With advances in technology, the amount of stored data has reached a critical mass, with more companies, governments and individuals choosing to store their information digitally. A large proportion of these datasets consists of time series data. Tasks commonly undertaken by the time series data mining community include indexing, clustering, similarity search, classification, prediction and anomaly detection. Han & Kamber (2006) explore these topics in detail. Rafiei & Mendelzon (1997) have worked on similarity search, and Warren Liao (2005) has written an extensive survey on clustering of time series data. Indexing finds the most similar time series in a database, given a query time series and some similarity/dissimilarity measure. Clustering finds natural groupings of time series in a database under some similarity/dissimilarity measure, whereas classification assigns an unlabelled time series to one of a set of predefined classes. Prediction estimates the value at a future time given the values at earlier times. Anomaly detection finds interesting or unexpected sections in an un-annotated time series, given a time series that is known to be normal.

Extracting valuable information from such huge datasets is a challenge for several reasons. First, many of them have very high dimensionality. Dimensionality reduction is therefore central to time series data mining, as it makes massive data tractable.
Another reason is that such huge datasets are computationally expensive to process. Data compression has therefore become a prerequisite for efficient analysis: it not only reduces the cost of storing the data but also speeds up post-processing. Papadimitriou et al. (2007) have done notable work on time series compressibility and privacy.

Some companies are concerned about the privacy of the data they release. There has been a lot of research in this area, and various methods such as data perturbation, encryption and masking, k-anonymity and association rule mining have been proposed. Samarati & Sweeney (1998) worked on privacy protection using k-anonymity.

A crucial step in analyzing time series data is to determine how similar or different two series are, and one way to do so is to define a distance between them. This distance can then be used to quantify the extent of their similarity or dissimilarity. Various methods are used for computing such a similarity measure; the most common and widely used is the Euclidean distance, which is described in detail in Section 3.

In this paper our aim is to apply data mining techniques to analyze the exchange rates of currencies. To search for patterns in the datasets, Euclidean distance is used as the similarity measure. We first normalize the datasets for the different currencies and then find the Euclidean distances between them. These distances give us a way to compare the similarities and dissimilarities among the curves. We use the exchange rates of fifteen different currencies with respect to the US dollar in this study. This is an extension of the previous work by Ringne & Toshniwal (2013), in which six currencies were studied to analyze their trends.
Section 2 deals with related work in this area, Section 3 explains the Euclidean distance in detail, Section 4 presents the case study and results, and Section 5 gives the conclusions.

2. Related Work

The analysis of time series data is crucial in understanding the underlying principles and developing models useful for predicting or collecting such data. Several data mining techniques have been developed for studying large amounts of data. Westphal & Blaxton (1998) describe methods and tools for solving real-world problems. Many algorithms focus on compressing the datasets so as to make the analysis more efficient. Efficient similarity search was studied in detail by Agrawal et al. (1993). Struzik & Siebes (1999) studied the Haar wavelet transform in the time series similarity paradigm. Toshniwal & Joshi (2005) developed a new method based on moments for measuring the similarity between two time series.

For analyzing a set of time series it is important to know how similar or dissimilar they are. Euclidean distance is the most common technique used to measure similarity. Dynamic time warping is another technique that can be used to measure the similarity between two time series; Berndt & Clifford (1994) describe it in detail.

An important problem that data mining researchers have to overcome is that of preserving privacy. To address it, methods like k-anonymity have been used. The randomization method is another technique for privacy-preserving data mining, in which noise is added to the data in order to mask the attribute values of records. The noise added is sufficiently large that individual record values cannot be recovered. Techniques are therefore designed to derive aggregate distributions from the perturbed records.
Subsequently, data mining techniques can be developed to work with these aggregate distributions. Santini & Jain (1999) presented a survey of similarity measures. Gavrilov et al. (2000) examined which measure is best for mining stock market data. Das et al. (1997) also worked on finding similar time series.

3. Similarity search

In this section we briefly explain the data mining techniques that we apply to the datasets. We search for patterns and similarities in the exchange rates of fifteen different currencies using the Euclidean distance.

3.1 Similarity Measure using Euclidean Distance

Given two points (x1, y1) and (x2, y2), the distance between them is given by equation (1).

$$ d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \tag{1} $$

This distance, known as the Euclidean distance, is also used to find the "distance" between two time series (Wikipedia). Given two time series Q = q1…qn and C = c1…cn, the distance between them is given by equation (2).

$$ D(Q, C) = \sqrt{\sum_{i=1}^{n} (q_i - c_i)^2} \tag{2} $$

But if we naively measure the distance between two "raw" time series, we may get very unintuitive results. This is because Euclidean distance is very sensitive to some "distortions" in the data. For most problems these distortions are not meaningful, and thus we can and should remove them.

3.2 Data Normalization

Offset Translation: This is done when two time series are similar but differ by an offset, so that it becomes meaningful to compare the distances with the offset removed.

Amplitude Scaling: This refers to normalization of the amplitudes so that they range between 0 and 1.
It is done based upon the standard deviation and the mean of the time series data, as given by equations (3) and (4).

$$ Q' = \frac{Q - \mathrm{mean}(Q)}{\mathrm{std}(Q)} \tag{3} $$

$$ C' = \frac{C - \mathrm{mean}(C)}{\mathrm{std}(C)} \tag{4} $$

Noise: Another transformation that must be done before any analysis of time series data is to remove noise, or smooth the time series, so that we achieve better accuracy in our results.

Calculating the Euclidean distance involves taking a square root, which is relatively expensive. The squared Euclidean distance omits the square root, which makes it faster to compute. In this paper we use squared Euclidean distance as the similarity measure. It must be noted that squared Euclidean distance is not a metric, as it does not satisfy the triangle inequality, but it is fast and is frequently used when distances only have to be compared, because squaring preserves the ordering of non-negative distances.

4. Case Study and Results

In this section we first describe the data used in the case study. This is followed by general observations on the data and finally by a discussion of the results.

4.1 Case Data

A time series is a sequence of data recorded over time. Gold prices, exchange rates, the sales of a company and the data from a nuclear power plant are all time series. Hence we apply time series data mining techniques to study the datasets. We run the algorithm on real time-varying datasets.

Table 1 The countries whose currency exchange rates are to be studied

S. No   Continent       Country
1       Africa          South Africa
2       Asia            Malaysia
3       Asia            India
4       Asia            Japan
5       Asia            China
6       Australia       Australia
7       Australia       New Zealand
8       Europe          Switzerland
9       Europe          Sweden
10      Europe          Denmark
11      Europe          Norway
12      Europe          United Kingdom
13      North America   Mexico
14      North America   Canada
15      South America   Brazil

Table 1 shows the countries under consideration and Figure 1 shows the graphs of the conversion rates of the fifteen countries, viz. Switzerland, Australia, Malaysia, Brazil, Sweden, New Zealand, Denmark, Norway, India, Great Britain, Canada, South Africa, Mexico, Japan and China, from 1922 to 2010. The graph shows the conversion rates of the currencies per US Dollar.

4.2 General observations and Discussions

Each country has been considered individually. The Franc, the currency of Switzerland, shows a generally declining graph, which indicates that the currency is becoming stronger with respect to the US Dollar. The price of 1 US dollar was around 5 Francs in the 1920s but is now close to 1 Franc. It was stable from the 1940s to the 1970s but then started gaining strength against the US Dollar.

In the case of Australia, the currency was the Australian Pound until February 1966, when Australia left the pre-decimal pound system and moved to a decimal currency, the Australian Dollar. The graph shows an increasing trend till 2001, which is the weakest year for the Australian Dollar. It slowly regained strength after that and is currently at par with the US Dollar.

Figure 1. The conversion rates of different currencies from 1920s till 2010

Malaysia has had several currencies, in this order: the Straits Settlement Dollar, the Malayan Dollar, the Malaysian Dollar and the Ringgit.
The general trend is steady for Malaysia and the currency is particularly strong as compared to the Indian Rupee.

Brazil has seen considerable fluctuations in its currency, particularly in the year 1992, which worsened in 1993. Through the 1980s and 1990s, the Brazilian economy suffered from rampant inflation that subdued economic growth. The economy started to stabilize after the introduction of the Plano Real, a set of measures to stabilize the economy that included the introduction of a new currency.

The Swedish Krona has also stayed fairly stable, with minor fluctuations. The mean price is around 6 Kronor for 1 US Dollar. The peak was in 2001, which matches the recession that hit many countries.

The New Zealand conversion rate graph shows an increasing trend till 200, after which it declines to reach a price of 1.5 NZ dollars for 1 US Dollar. New Zealand changed its currency from the New Zealand Pound to the New Zealand Dollar in 1967, following the decision by Australia to abandon the pre-decimal pound in 1966.

We cannot overlook the similarity between the currencies of Denmark, Norway and Sweden. All these currencies show the same trend at the same times, with similar dips and troughs. Though the currencies are named differently, the countries' geographical proximity and the similar economies that result from it are the reason why these currencies move similarly. These three countries form Scandinavia and have their own versions of the krone, which seem to be of similar value.

The conversion rates for the Indian Rupee and the British Pound per US Dollar show an increasing tendency in Figure 1. While for the Indian Rupee the rise is very sharp, beginning around the 1990s, the increase for the British Pound is gradual. It may also be noted that the 1990s was the time when India started on the path of globalization.
This paper in no way concludes that globalization is the cause of the rise, but just states the fact that the time periods of both events coincide. Both the British Pound and the Indian Rupee show an increasing graph, which reflects the weakening of these two currencies. While for the British Pound this weakening is insignificant, as the conversion rate changed only from 0.2 to 0.7, for India the weakening is an issue which draws concern towards the economy of the biggest democracy in the world. In the 1920s the price of 1 US Dollar was 4 Indian Rupees, which rose to 10 Rupees in the 1980s, after which it grew tremendously, touching the 50 mark.

The Canadian Dollar is more or less strong and is valued almost equally with the US dollar. Though the graph shows many ups and downs, on closer inspection they are largely insignificant. There is a peak in the year 2001, as in many other currencies, after which it regained strength and was almost equal to 1 US dollar in 2010.

South Africa, a country which, like India, was once under British rule, has a history of using two different currencies. The South African Pound was used till 1960 and was replaced by the national currency, the Rand, in 1961. The Rand shows a peak in the year 2001.

Mexico also had an economic crisis similar to the one in Brazil, at around the same time. Old pesos were used till 1992, but action was taken in 1993 to stabilize the economy and new pesos were then introduced as the currency of Mexico.

The exchange rate of the Japanese Yen was low till the 1940s. It shows a negative slope since the 1960s.

For China we do not have data from 1941 till 1980. In the 1980s and 1990s the Chinese Yuan vs. US dollar conversion rate graph shows an increasing slope. It is noteworthy that the recession that hit many countries around 2001 left China unaffected. The economy has been stable since 1995.
Before we move on to the Euclidean distance similarity measure, let us focus on some of the features evident in the graph.

Global crisis of 1985: It must be noted that the year 1985 was one of the world's worst economic crisis years. The growth rate in the USA was around 1%. We can see the evidence of this economic crisis in the peaks of the exchange rate curves. Switzerland, Australia, Brazil, Sweden, Norway, Denmark, New Zealand, Great Britain, Canada and Japan show a peak in the graph in the year 1985. On the other hand, India, South Africa and China seem to be unaffected by this global crisis, suggesting that their economies were not so dependent on the US economy at that time. This changed with the recession of 2001: India was affected by that crisis, though not as much as the other countries, and the effect was significant and conspicuous.

Global crisis of 2001: The Canadian Dollar, the Norwegian Krone, the Indian Rupee, the Danish Krone, the British Pound, the Australian Dollar, the Swedish Krona, the New Zealand Dollar and the South African Rand all show a peak around the year 2001. This signifies that their economies were affected by the recession of 2001 that followed the 9/11 attack on the World Trade Centre in the United States.

Japan's economy: An interesting phenomenon in Figure 1 is the price of the Japanese Yen per US Dollar. After the end of World War II in 1945 and the dropping of the atomic bombs on Hiroshima and Nagasaki, the economy of Japan was disturbed. This led to the steep rise in the exchange rate from 1945 and the large stable value till 1965. As the economy began to settle and Japan started to develop, the exchange rate of the Japanese Yen per US Dollar started showing a decreasing trend.
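As a bridge to the distance-based results, the similarity measure of Section 3 can be sketched in a few lines. This is a minimal illustration and not the authors' code: the function names `z_normalize` and `squared_euclidean` and the three toy series are assumptions made for the example. It applies the mean and standard deviation normalization of equations (3) and (4), then sums the squared point-wise differences without taking the square root.

```python
import math


def z_normalize(series):
    """Subtract the mean and divide by the population standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        # A constant series has no shape to compare; map it to all zeros.
        return [0.0] * n
    return [(x - mean) / std for x in series]


def squared_euclidean(q, c):
    """Sum of squared point-wise differences (equation (2) without the root)."""
    if len(q) != len(c):
        raise ValueError("series must have the same length")
    return sum((qi - ci) ** 2 for qi, ci in zip(q, c))


# Toy series (made up): b is a scaled and shifted copy of a,
# while c moves in the opposite direction to a.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [12.0, 14.0, 16.0, 18.0, 20.0]
c = [5.0, 4.0, 3.0, 2.0, 1.0]

raw_ab = squared_euclidean(a, b)
norm_ab = squared_euclidean(z_normalize(a), z_normalize(b))
norm_ac = squared_euclidean(z_normalize(a), z_normalize(c))
```

On the raw values, `a` and `b` look far apart even though they have the same shape; after normalization their distance is essentially zero, while `a` and `c` remain far apart. This is the offset and amplitude distortion that Section 3.2 sets out to remove.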
4.3 Results from the Euclidean distance similarity metrics

For this case study we normalized the data so that the values lie between 0 and 1 for each period of interest. We use a period of interest (POI) of 20 years. Our aim is to spot the trends in each POI. The first POI is from 1961 to 1980: we used the values from 1961 to 1980 for all the currencies and calculated the Euclidean distance between them. The results are tabulated in Figure 2. The next POIs are from 1971 till 1990, 1981 till 2000 and 1991 till 2010. For each pair of currencies we computed the Euclidean distance, and the results are tabulated in Figures 2, 3, 4 and 5. We discuss the results for each period of interest individually.

Period of Interest 1961-1980

For the first POI, i.e. from 1961 to 1980, the ten pairs of currencies which behaved most similarly were South Africa-India, Mexico-Brazil, Denmark-Malaysia, South Africa-Sweden, South Africa-Canada, Norway-Malaysia, Britain-New Zealand, India-Sweden, India-Canada and Australia-New Zealand. It is noteworthy that Japan and Switzerland have a high squared Euclidean distance to all the other currencies. The graph for Japan shows a decreasing slope during this period, i.e. it is recovering from the bombings of 1945 and the economy is growing well, and Switzerland's economy also seems to be quite strong. These two countries are highly similar to each other, which is evident from the value of their squared Euclidean distance with respect to each other.

Figure 2. Squared Euclidean distance between exchange rates of currencies for POI 1 1961-1980

Figure 3. Squared Euclidean distance between exchange rates of currencies for POI 2 1971-1990

Figure 4. Squared Euclidean distance between exchange rates of currencies for POI 3 1981-2000

Figure 5. Squared Euclidean distance between exchange rates of currencies for POI 4 1991-2010

Period of Interest 1971-1990

In the POI 1971 to 1990 the currency pairs with similar trends were South Africa-India and Australia-New Zealand. These two pairs had squared Euclidean distances of less than 3. In this POI too, Japan was on the path of strengthening its economy, and Switzerland's economy also seems to be good. The most dissimilar currency pair was Denmark-Malaysia: while the slope of the Malaysian exchange rate was almost zero, the Danish Krone has an increasing slope till 1985.

Period of Interest 1981-2000

In the POI 1981 to 2000, the most similar currency pair was Japan-Switzerland, both rapidly gaining strength and showing a negative slope in the graph. Interestingly, it is the only pair with a squared Euclidean distance below 3. The countries whose currencies behaved differently, or rather interestingly, in this period are Brazil, China, Japan and Mexico. Brazil and Mexico were both battling inflation, corruption and a dying economy. Nevertheless, the similarity index for Brazil and Mexico does not suggest that they behaved similarly: Brazil has a peak in 1993, while Mexico's economy was at its worst in 1991, after which it recovered due to government interventions and schemes. Japan continues to grow in this POI and strengthen its currency. China seems indifferent and gives the impression that its economy is not affected by other countries or events and is completely independent. The Chinese economy is similar to those of India, South Africa and Malaysia. This becomes more evident in the next POI, in which the exchange rate is non-fluctuating and constant at 6 Yuan per US dollar.
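The per-POI comparison used in this section can be sketched as follows. This is an illustrative outline under stated assumptions rather than the procedure actually run for the paper: it assumes min-max scaling within each window (because the text says the values lie between 0 and 1 for each POI), the helper names `min_max`, `poi_window` and `rank_pairs` are hypothetical, and the `rates` dictionary holds made-up numbers, not the real exchange rates.

```python
from itertools import combinations


def min_max(values):
    """Rescale values to [0, 1]; a constant window maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def poi_window(series, start_year, end_year):
    """Pick the yearly values for start_year..end_year inclusive, in order."""
    return [series[year] for year in range(start_year, end_year + 1)]


def rank_pairs(rates, start_year, end_year):
    """Return (distance, name_a, name_b) tuples, most similar pair first."""
    normalized = {
        name: min_max(poi_window(series, start_year, end_year))
        for name, series in rates.items()
    }
    results = []
    for a, b in combinations(sorted(normalized), 2):
        # Squared Euclidean distance: no square root is taken.
        dist = sum((x - y) ** 2 for x, y in zip(normalized[a], normalized[b]))
        results.append((dist, a, b))
    return sorted(results)


# Made-up yearly rates per US dollar, for illustration only.
rates = {
    "A": {2001: 1.0, 2002: 2.0, 2003: 3.0, 2004: 4.0},
    "B": {2001: 10.0, 2002: 20.0, 2003: 30.0, 2004: 40.0},
    "C": {2001: 8.0, 2002: 6.0, 2003: 4.0, 2004: 2.0},
}
ranking = rank_pairs(rates, 2001, 2004)
```

In this toy data the pair A-B comes first because both series rise in the same way once rescaled, while C falls over the window. A series with missing years, such as the gap in the Chinese data, would raise a `KeyError` in `poi_window`; how such gaps were handled is not described in the paper, so the sketch leaves that open.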
Period of Interest 1991-2010

In the POI 1991 to 2010, the most similar currency pair is again Japan-Switzerland. New Zealand-Australia and Sweden-Australia are the two other pairs that behave very similarly in this period of interest. The most dissimilar currency pair in this POI is Japan-India. While the graph for Japan declines slowly, which is good for the country, the exchange rate of the Indian Rupee per US dollar has increased significantly. As mentioned earlier, the effect of the global economy on India started with the beginning of liberalization and globalization in the 1980s and 1990s. Beginning at 2.74 Indian Rupees per US dollar in 1928, the rate stood at 45.65 Indian Rupees per US dollar in 2008, an increase of about 1566%.

Table 2: Possibly correlated currencies

Sr. No   Countries whose currencies are correlated   Currencies which behaved differently than the others
1        South Africa-India                          Japan
2        Sweden-Norway-Denmark                       Switzerland
3        New Zealand-Australia                       China

When we consider the entire time period from 1960 to 2010, we find these three currency pairs to be the most similar: South Africa-India, Japan-Switzerland and New Zealand-Australia. The pairs are listed in order of decreasing similarity. Japan-New Zealand was the most dissimilar currency pair, followed by Japan-India and Japan-Australia.

5. Conclusion

Data mining is used to study the exchange rates of the currencies of fifteen different countries. The analysis of the exchange rate curves using the Euclidean similarity measure after normalization gives new insight into the correlations between currencies. We concluded that the decline of the Japanese Yen can be attributed to the bad economy after the end of World War II.
Switzerland, Australia, Sweden, Denmark, Norway, New Zealand, India, Britain, Canada, South Africa and Japan show a peak around the year 2001. This signifies that their economies were affected by the recession of 2001 that followed the 9/11 attack on the World Trade Centre in the United States.

It must be noted that the year 1985 was one of the world's worst economic crisis years. The growth rate in the USA was around 1%. We can see the evidence of this economic crisis in the peaks of the exchange rate curves. Switzerland, Australia, Brazil, Sweden, Norway, Denmark, New Zealand, Great Britain, Canada and Japan show a peak in the graph in the year 1985. On the other hand, India, South Africa and China seem to be unaffected by this global crisis, suggesting that their economies were not so dependent on the US economy at that time. This changed with the recession of 2001: India was affected by that crisis, though not as much as the other countries, and the effect was significant and conspicuous.

When we consider the entire time period from 1960 to 2010, we find these three currency pairs to be the most similar: South Africa-India, Japan-Switzerland and New Zealand-Australia. The pairs are listed in order of decreasing similarity. Japan-New Zealand was the most dissimilar currency pair, followed by Japan-India and Japan-Australia.

References

Han, Jiawei, Micheline Kamber, and Jian Pei. Data mining: concepts and techniques. Morgan Kaufmann, 2006.

http://en.wikipedia.org/wiki/Euclidean_distance

Agrawal, Rakesh, Christos Faloutsos, and Arun Swami. Efficient similarity search in sequence databases. Springer Berlin Heidelberg, 1993.

Struzik, Zbigniew R., and Arno Siebes. "The Haar wavelet transform in the time series similarity paradigm." Principles of Data Mining and Knowledge Discovery. Springer Berlin Heidelberg, 1999. 12-22.

Toshniwal, Durga, and Ramesh C. Joshi. "Finding similarity in time series data by method of time weighted moments." Proceedings of the 16th Australasian database conference-Volume 39. Australian Computer Society, Inc., 2005.

Berndt, Donald, and James Clifford. "Using dynamic time warping to find patterns in time series." KDD workshop. Vol. 10. No. 16. 1994.

Ringne A., Toshniwal D., "Interpreting Financial Datasets using Time Series Data mining Techniques: a search for similarities and features", 3rd IIM Ahmedabad International Conference on Advanced Data Analysis, Business Analytics and Intelligence, April 2013.

http://en.wikipedia.org/wiki/Euclidean_distance#Squared_Euclidean_distance

Westphal, Christopher, and Teresa Blaxton. "Data mining solutions: methods and tools for solving real-world problems." 1998.

Rafiei, Davood, and Alberto Mendelzon. "Similarity-based queries for time series data." ACM SIGMOD Record 26.2, 1997: 13-25.

Warren Liao, T. "Clustering of time series data - a survey." Pattern Recognition 38.11, 2005: 1857-1874.

http://en.wikipedia.org/wiki/First_moment_of_area

Santini, Simone, and Ramesh Jain. "Similarity measures." Pattern analysis and machine intelligence, IEEE transactions on 21.9 (1999): 871-883.

Gavrilov, Martin, et al. "Mining the stock market (extended abstract): which measure is best?." Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2000.

Das, Gautam, Dimitrios Gunopulos, and Heikki Mannila. "Finding similar time series." Principles of Data Mining and Knowledge Discovery. Springer Berlin Heidelberg, 1997. 88-100.