Proceedings of 7th Asia-Pacific Business Research Conference 25 - 26 August 2014, Bayview Hotel, Singapore ISBN: 978-1-922069-58-0

BigData Processing Using MapReduce Foreign Exchange (EUR/USD Currency Pair)

Say Er Lim, Hui Kim Law, Saeed Aghabozorgi and Ying Wah Teh

This paper describes how to use Hadoop MapReduce to process big data. The big data used in this project is the foreign exchange rate of the EUR/USD currency pair, recorded day by day at one-minute intervals. First, the foreign exchange data is loaded, using Hadoop MapReduce, into a Linux environment provided by Ubuntu that has been set up on a desktop computer. Next, the required data is extracted from Hadoop once it has been loaded successfully. The extracted data is then used to plot a time series and to predict the future foreign exchange rate (e.g. for the next day).

JEL Codes: F34, G21 and G24

1. Introduction

Advances in technology and social networks have produced a large amount of data. Data is growing in volume, becoming more complex, arriving at high velocity, and varying in type. Big data may reach petabytes in size and consist of billions to trillions of records collected from millions of people. Furthermore, big data comes from a variety of sources such as social media, the web, sales, customer information and others. Such large and complex data sets are difficult and slow to process with traditional data processing applications. The challenges include capturing, storing, transferring, processing and analyzing the data sets.

The big data used in this project is EUR/USD foreign exchange data. Foreign exchange is the conversion of one currency into another.
The Cambridge Advanced Learner’s Dictionary & Thesaurus defines foreign exchange as the system by which the type of money used in one country is exchanged for another country’s money, making international trade easier. The foreign exchange market enables currency conversion to assist international trade and investment. The US dollar (USD), euro (EUR), Japanese yen (JPY), British pound (GBP) and Australian dollar (AUD) are the major currencies in the foreign exchange market. EUR/USD is a widely traded currency pair in the world (Bekiros & Diks, 2008). The foreign exchange market represents the largest asset class in the world, which leads to high liquidity; it is unique in its huge trading volume. The market operates continuously, 24 hours per day. Thus, exchange rates are not constant; they may rise or fall every minute of every day. The foreign exchange rate is among the most important economic indices in the international monetary market.

In foreign exchange markets, there are normally two sets of price data: the bid price and the ask price. The ask price is the price at which a broker will sell you the position you require, while the bid price is the price at which a broker will buy your current trading position from you. Brokers use the bid and ask prices to buy current trading positions or to sell trading positions to intended buyers. In addition, a foreign exchange chart has two further sets of data, the opening price and the closing price, which refer to the start and the end of the period respectively.

Dr. Saeed Aghabozorgi, Department of Information Systems, University of Malaya, Malaysia
Many factors can affect the ask and bid prices in the foreign exchange market, such as the volatility of the trading market, interest rate differentials, inflation differentials and others. Foreign exchange fluctuations can affect the profitability of an organization’s business and expose it to exchange risk. Because the foreign exchange market trades every day, its data is large and fluctuates at a high rate. Therefore, the data needs to be processed, stored, analyzed and used for prediction in order to reveal the trend of the foreign exchange rate and to help buyers and sellers identify and make profitable trades.

In this paper, we explain the installation of Ubuntu and the configuration of Hadoop to store and retrieve data. We then explain the Moving Average approach, which is used to predict the foreign exchange rate. The rest of this paper is organized as follows. Section 2 describes related work. MapReduce, the installation of Ubuntu and the configuration of Hadoop to provide a Linux environment for processing big data are briefly discussed in Sections 3 and 4. Section 5 outlines the Moving Average algorithm applied to the foreign exchange time series data sets, together with the system architecture. The Graphical User Interface (GUI) of the user module is described in Section 6. Section 7 draws conclusions and discusses future perspectives.

2. Literature Review

Many authors have tried to predict the foreign exchange market (Christiansen, 2011; Du & Hu, 2014; Dueker & Neely, 2007; Evans, Pappas, & Xhafa, 2013; Gradojevic, 2013; Hutson & Laing, 2014; Kiani, 2013; Kóbor & Székely, 2004; Narayan, 2013; Ranaldo, 2009; Sarno, Schneider, & Wagner, 2012; Sewell & Shawe-Taylor, 2012; Talebi, Hoang, & Gavrilova, 2014).
Among these works, Meese and Rogoff showed that a naïve random walk benchmark model is better than conventional linear models at forecasting future exchange rates (Abhyankar, Sarno, & Valente, 2005). Lye, Chan, and Hooy (2011) employed artificial neural networks (ANNs) and an unconditional Vector Autoregressive (VAR) model to predict Yuan/USD exchange rates using monetary fundamentals. Their results show that ANNs performed better in market rate forecasts and are supported by monetary fundamentals. Besides that, some researchers have used order flow in exchange rate prediction. They found that order flow can provide powerful information that allows the public to forecast the daily exchange rate.

Mahdavi (1997) used the loss function approach of Bayesian statistics to forecast foreign exchange rates. The author proposed a loss function in the forecasting model, and the Bayesian forecasts slightly outperformed the classical forecasts of foreign exchange. In the paper "Forecasting of foreign exchange rates of Taiwan’s major trading partners by novel nonlinear Grey Bernoulli model NGBM", the authors studied the feasibility and effectiveness of a novel Grey model based on the Bernoulli differential equation for foreign exchange prediction. In the preliminary results of that paper, the Novel Nonlinear Grey Bernoulli Model (NGBM) improved on the precision of the traditional Grey forecasting model, and it was successfully applied to forecasting the annual foreign exchange rates of 13 countries for the year 2005 (Chen, Chen, & Chen, 2008).

Furthermore, another study used a relative purchasing power parity (PPP) model based on the consumer price index (CPI) or the traded-goods price index (TPI), together with a linear forecasting technique, to determine Yen/US Dollar exchange rates over a short-term horizon.
The TPI-based PPP model outperformed the pure random walk by more than the CPI-based PPP model did (Grossmann & Simpson, 2010). However, the CPI-based PPP model also produced a lower forecast error than a random walk model. Rout, Majhi, Majhi, and Panda (2014) studied an adaptive autoregressive moving average (ARMA) model combined with differential evolution (DE) based training and showed that the proposed ARMA-DE exchange rate prediction model has superior prediction potential over both short and long ranges compared with other models.

Neural networks are another forecasting model for the foreign exchange market. Yao and Tan state that neural network techniques are prime candidates for prediction in market environments with high volatility, complexity and noise (Yao & Tan, 2000). Neural network models can use fundamental and technical indicators as inputs to simulate fundamental and technical analysis, and they can also decrease prediction risks (Yao & Tan, 2000). In addition, an adaptive fuzzy network with a parallel genetic algorithm is also a good choice for predicting foreign exchange rates. A fuzzy inference system has the ability to approximate any non-linear mapping (Kosko, 1993). The genetic algorithm and the adaptive fuzzy network system optimize the network to approximate the mapping.

The AutoRegressive Integrated Moving-Average (ARIMA) model is also used by many researchers for forecasting in the foreign exchange market. ARIMA models are often referred to as Box-Jenkins models and were first popularized by Box and Jenkins. An ARIMA model combines a series' own past values, past errors, and the current and past values of other time series to predict a value in the time series. ARIMA modeling consists of three stages: the identification stage, the estimation and diagnostic checking stage, and the forecasting stage.

3. MapReduce

MapReduce is a computing model used to process large data sets efficiently by distributing the work over a cluster of computers. Hadoop is an open-source Java programming framework that implements the MapReduce computational paradigm for processing large data sets in a distributed computing environment. MapReduce is a programming model and software framework proposed by Google (Dean & Ghemawat, 2008). Hadoop MapReduce is inspired by Google’s MapReduce, introduced in 2004, in which an application is broken down into numerous small parts. Hadoop MapReduce is a popular big data processing engine dedicated to scalable, distributed, data-intensive computing.

A MapReduce program in Hadoop consists of two separate user-defined functions, map and reduce. First, the data sets are split into smaller chunks, which are distributed as input to the map process. The map process breaks the individual elements down into tuples (key/value pairs). After that, the Hadoop MapReduce framework sorts the outputs of the maps, which then become the input to the reduce process. The reduce job combines those data tuples into a smaller set of tuples to form the output.

4. Setup

Before the foreign exchange data can be stored and processed, Hadoop MapReduce must be installed and configured on the personal computer (a stand-alone system). According to the literature (Daneshyar & Patel, 2012), Hadoop MapReduce is more suitable for installation in a Linux environment than in a Windows environment, because the Windows environment had problems connecting to the distributed cluster.
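The map, sort and reduce phases described in Section 3 can be illustrated with a small in-memory sketch. This is plain Python rather than Hadoop's Java API, and it is not the job used in this project: the function names, the choice of the date as the key, and the reduce rule (keep the last closing ask of each date) are assumptions made for illustration. The four input records are rows taken from Table 1.

```python
from collections import defaultdict

# Four per-minute records from Table 1: (date, time, closing ask).
RECORDS = [
    ("11-09-2012", "23:58:00", 1.28616),
    ("11-09-2012", "23:59:00", 1.28620),
    ("12-09-2012", "00:00:00", 1.28625),
    ("12-09-2012", "00:01:00", 1.28625),
]


def map_record(record):
    """Map step: turn one input record into (key, value) tuples.

    Assumption: the date is used as the key.
    """
    date, time, close_ask = record
    yield date, (time, close_ask)


def shuffle(pairs):
    """Sort/group step: collect the values that share a key, in key order.

    Keys are compared as plain strings here, as a stand-in for the
    framework's own sort between the map and reduce phases.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_group(key, values):
    """Reduce step: shrink each group to one tuple.

    Assumption: keep the closing ask with the latest time of the date.
    """
    _last_time, last_close = max(values)  # tuples compare by time first
    return key, last_close


def run_job(records):
    """Run map, then shuffle, then reduce over all records."""
    pairs = [pair for record in records for pair in map_record(record)]
    return [reduce_group(key, values) for key, values in shuffle(pairs)]


DAILY_CLOSE = run_job(RECORDS)
print(DAILY_CLOSE)
```

In a real Hadoop job the map and reduce functions run on different machines and the framework performs the grouping; the sketch only shows how key/value tuples flow between the phases.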
By default the personal computer uses the Windows environment, so it is highly recommended to install the Ubuntu operating system on the personal laptop in order to run Hadoop MapReduce. Ubuntu is a complete Linux-based desktop operating system. Installed on a Windows computer, it allows Linux applications to be compiled and run securely, so that files and data stay protected, and it loads quickly on any computer. Installing Ubuntu enables Hadoop MapReduce to run on the Windows laptop through Ubuntu. After Ubuntu is installed, Hadoop MapReduce must be configured by executing the configuration commands before it can be used. Then, the foreign exchange rate data for the EUR/USD currency pair can be loaded into Hadoop MapReduce, and the user writes Java code to extract the desired data, such as the date, time and closing ask of the EUR/USD foreign exchange rate, as the output.

5. Moving Average Technique

Time series data is ordered by time. The exchange rate is time series data, collected at specific points in time. The quantity being measured (here, the exchange rate) is referred to as the variable. Commonly, time series data is observed at annual, quarterly, monthly, weekly or daily frequencies. In this project, the exchange rate is observed at a daily frequency. Time series analysis comprises methods for analyzing time series data in order to extract useful and meaningful statistics and other characteristics of the data. The techniques of time series analysis may be parametric or non-parametric. Time series prediction is the use of a model to predict future values based on previously observed values.
There exist many time series prediction techniques that use previously observed values as the basis for estimating future outcomes, such as the moving average, weighted moving average, exponential smoothing, autoregressive moving average (ARMA), autoregressive integrated moving average (ARIMA), linear prediction, trend estimation, growth curves and other techniques. In this paper, the Moving Average technique is used to analyze the data and perform the prediction. The output extracted from Hadoop MapReduce is passed to the Moving Average model for further analysis, which performs a series of calculations on the closing ask of the foreign exchange rate in order to predict the future exchange rate.

The moving average is also called the rolling average or running average in statistics. The moving average model is a simple and common technique used with time series data to analyze a set of data points; it smooths out fluctuations and highlights longer-term trends. It is often used in the technical analysis of financial data such as stock prices, exchange rates or trading volumes, and it can also be used in economics to examine macroeconomic time series. Moreover, the moving average is one of the most widely used indicators in the foreign exchange market (FOREX). A moving average formula is used to predict the foreign exchange rate after the necessary data has been identified and extracted from Hadoop MapReduce. The following example illustrates Moving Average modeling and prediction using a simulated data set containing time series data.

The Moving Average model was chosen for big data analytics and prediction of the foreign exchange rate because the EUR/USD exchange rate data forms a per-minute time series within each day and the analysis focuses only on the closing ask.
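As a rough illustration of the simple (unweighted) moving average described above, the following sketch predicts the next closing ask as the mean of the N most recent closing asks and also computes the smoothed series. It is written in plain Python rather than the Java used for the system in this paper, and the function names are illustrative assumptions. The input values are the ten most recent closing asks listed in Table 1 (00:00 to 00:09 on 12-09-2012), reordered from oldest to newest.

```python
def moving_average_forecast(rates, n):
    """Predict the next rate as the mean of the n most recent rates.

    `rates` must be ordered from oldest to newest.
    """
    if n <= 0 or n > len(rates):
        raise ValueError("n must be between 1 and len(rates)")
    window = rates[-n:]
    return sum(window) / n


def rolling_means(rates, n):
    """Smoothed series: the mean of every run of n consecutive rates."""
    if n <= 0 or n > len(rates):
        raise ValueError("n must be between 1 and len(rates)")
    return [sum(rates[i - n:i]) / n for i in range(n, len(rates) + 1)]


# Closing asks from Table 1, 00:00 to 00:09 on 12-09-2012, oldest first.
CLOSING_ASKS = [
    1.28625, 1.28625, 1.28625, 1.28622, 1.28627,
    1.28616, 1.28618, 1.28620, 1.28617, 1.28617,
]

forecast = moving_average_forecast(CLOSING_ASKS, 10)
print(f"Ten-period forecast for the next closing ask: {forecast:.6f}")
print("Five-period smoothed series:",
      [round(value, 6) for value in rolling_means(CLOSING_ASKS, 5)])
```

With these ten values the ten-period forecast is approximately 1.286212. A shorter window follows recent movements more closely, while a longer window smooths more strongly; the choice of N is left to the user in this sketch.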
The closing ask is the focus because closing asks are the most representative data of the day and this ask rate is carried over to the next day’s opening ask. Furthermore, people mostly use this ask rate to buy a trading position from a broker or to exchange currency. In addition, the moving average is used for analysis and prediction of the foreign exchange rate because forecasting needs to rely on previously observed exchange rates.

The analysis performed by Moving Average modeling is divided into two stages. The “Identification” and “Prediction” stages are summarized below.

A. Identification Stage

The first step in the identification stage is to specify the input data set, which is the foreign exchange rate of the EUR/USD currency pair. An identify statement is then used to read the EUR/USD foreign exchange rate data. After that, the wanted parameters are extracted from Hadoop MapReduce as output in order to plot a time series graph for the date entered by the user (as input). Table 1 shows an example of the EUR/USD foreign exchange rate data set, and the plotted time series of the EUR/USD foreign exchange rate is shown in Fig. 1 below. The system architecture is shown in Fig. 2 below.

TABLE 1. EUR/USD Foreign Exchange Rate Data Sets

Date          Time        EUR/USD (Close, Ask)
12-09-2012    00:09:00    1.28617
12-09-2012    00:08:00    1.28617
12-09-2012    00:07:00    1.28620
12-09-2012    00:06:00    1.28618
12-09-2012    00:05:00    1.28616
12-09-2012    00:04:00    1.28627
12-09-2012    00:03:00    1.28622
12-09-2012    00:02:00    1.28625
12-09-2012    00:01:00    1.28625
12-09-2012    00:00:00    1.28625
11-09-2012    23:59:00    1.28620
11-09-2012    23:58:00    1.28616
11-09-2012    23:57:00    1.28615
11-09-2012    23:56:00    1.28632
11-09-2012    23:55:00    1.28607
11-09-2012    23:54:00    1.28611
11-09-2012    23:53:00    1.28604
11-09-2012    23:52:00    1.28602
11-09-2012    23:51:00    1.28625
11-09-2012    23:50:00    1.28619
11-09-2012    23:49:00    1.28624
11-09-2012    23:48:00    1.28625
11-09-2012    23:47:00    1.28626
11-09-2012    23:46:00    1.28621
11-09-2012    23:45:00    1.28624
11-09-2012    23:44:00    1.28622
11-09-2012    23:43:00    1.28613
11-09-2012    23:42:00    1.28605
11-09-2012    23:41:00    1.28622
11-09-2012    23:40:00    1.28633

Figure 1. Time Series of EUR/USD Foreign Exchange Rate (From Sept 11, 2012 to Sept 12, 2012).

Figure 2. System Architecture

B. Prediction Stage

Once the outputs have been extracted and the time series has been plotted, the next step is to use a formula to predict the future exchange rate. For example, if the exchange rates are $R_t, R_{t-1}, R_{t-2}, \ldots, R_{t-(N-1)}$ for N days, then the formula is:

$$R_{t+1} = \frac{R_t + R_{t-1} + R_{t-2} + \cdots + R_{t-(N-1)}}{N}$$

where
$R_{t+1}$ = Predicted Closing Ask Rate for Period t+1
$R_{t-1}$ = Closing Ask Rate for Period t-1
$N$ = Number of Periods in the Moving Average

For example, a ten-period moving average would be:

$$R_{t+1} = \frac{R_t + R_{t-1} + R_{t-2} + \cdots + R_{t-9}}{10}$$

6. Graphical User Interface (GUI)

The user module used in this paper is a Java Graphical User Interface (GUI).
This module provides an interface that lets the user select the date of the exchange rate graph they want and then predicts the next closing ask exchange rate accordingly. The GUI is shown in Fig. 3, Fig. 4 and Fig. 5 below.

Figure 3. The user interface of EUR/USD Currency Prediction System

Figure 4. The user interface that lets the user make a selection based on their desired date

Figure 5. Time Series of EUR/USD Foreign Exchange Rate that is generated based on the user selection.

7. Summary and Conclusions

In this paper we have proposed using Hadoop MapReduce for processing foreign exchange data. The programming language used in the user module is Java. A simple and clear technique (Moving Average) is used to forecast the exchange rate of the EUR/USD currency pair. Besides that, we found that Hadoop MapReduce is suitable for processing a variety of big data sets; it can reduce processing time and deliver accurate output quickly. Using other algorithms to predict the exchange rate and processing the big data in the least possible time are opportunities for further work.

Acknowledgment

The authors would like to thank the reviewers for their comments on earlier versions of this paper. This research is funded by University of Malaya Research Grant (UM.C/625/1/HIR/MOHE/SC/13/2).

References

Abhyankar, A., Sarno, L., & Valente, G. (2005). Exchange rates and fundamentals: Evidence on the economic value of predictability. Journal of International Economics.

Bekiros, S., & Diks, C. (2008). The nonlinear dynamic relationship of exchange rates: Parametric and nonparametric causality testing. Journal of Macroeconomics.

Chen, C.-I., Chen, H. L., & Chen, S.-P. (2008). Forecasting of foreign exchange rates of Taiwan’s major trading partners by novel nonlinear Grey Bernoulli model NGBM (1, 1). Communications in Nonlinear Science and Numerical Simulation, 13(6), 1194–1204. doi:10.1016/j.cnsns.2006.08.008

Christiansen, C. (2011). Intertemporal risk-return trade-off in foreign exchange rates. Journal of International Financial Markets, Institutions and Money, 21(4), 535–549. doi:10.1016/j.intfin.2011.02.001

Daneshyar, S., & Patel, A. (2012). Evaluation of data processing using MapReduce framework in cloud and standalone computing. International Journal of Distributed & Parallel Systems, 3(6), 51–63.

Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51, 107–113.

Du, D., & Hu, O. (2014). The long-run component of foreign exchange volatility and stock returns. Journal of International Financial Markets, Institutions and Money, 31, 268–284. doi:10.1016/j.intfin.2014.04.005

Dueker, M., & Neely, C. J. (2007). Can Markov switching models predict excess foreign exchange returns? Journal of Banking & Finance, 31(2), 279–296. doi:10.1016/j.jbankfin.2006.03.002

Evans, C., Pappas, K., & Xhafa, F. (2013). Utilizing artificial neural networks and genetic algorithms to build an algo-trading model for intra-day foreign exchange speculation. Mathematical and Computer Modelling, 58(5-6), 1249–1266. doi:10.1016/j.mcm.2013.02.002

Gradojevic, N. (2013). Foreign exchange customers and dealers: Who’s driving whom? Finance Research Letters. doi:10.1016/j.frl.2013.11.005

Grossmann, A., & Simpson, M. (2010). Forecasting the Yen/US Dollar exchange rate: Empirical evidence from a capital enhanced relative PPP-based model. Journal of Asian Economics, 21(5), 476–484.

Hutson, E., & Laing, E. (2014). Foreign exchange exposure and multinationality. Journal of Banking & Finance, 43, 97–113. doi:10.1016/j.jbankfin.2014.03.002

Kiani, K. M. (2013). Can signal extraction help predict risk premia in foreign exchange rates. Economic Modelling, 33, 926–939. doi:10.1016/j.econmod.2013.06.005

Kóbor, Á., & Székely, I. P. (2004). Foreign exchange market volatility in EU accession countries in the run-up to Euro adoption: Weathering uncharted waters. Economic Systems, 28(4), 337–352. doi:10.1016/j.ecosys.2004.02.001

Lye, C., Chan, T., & Hooy, C. (2011). Forecasting Chinese foreign exchange with monetary fundamentals using artificial neural networks. 3rd Int Conf Inf Finance Eng.

Mahdavi, M. (1997). A Bayesian approach to foreign exchange forecasting. Global Finance Journal.

Narayan, S. (2013). Foreign exchange markets and oil prices in Asia. Journal of Asian Economics, 28, 41–50. doi:10.1016/j.asieco.2013.06.003

Ranaldo, A. (2009). Segmentation and time-of-day patterns in foreign exchange markets. Journal of Banking & Finance, 33(12), 2199–2206. doi:10.1016/j.jbankfin.2009.05.019

Rout, M., Majhi, B., Majhi, R., & Panda, G. (2014). Forecasting of currency exchange rates using an adaptive ARMA model with differential evolution based training. Journal of King Saud University-Computer and ….

Sarno, L., Schneider, P., & Wagner, C. (2012). Properties of foreign exchange risk premiums. Journal of Financial Economics, 105(2), 279–310. doi:10.1016/j.jfineco.2012.01.005

Sewell, M., & Shawe-Taylor, J. (2012). Forecasting foreign exchange rates using kernel methods. Expert Systems with Applications, 39(9), 7652–7662. doi:10.1016/j.eswa.2012.01.026

Talebi, H., Hoang, W., & Gavrilova, M. L. (2014). Multi-scale Foreign Exchange Rates Ensemble for Classification of Trends in Forex Market. Procedia Computer Science, 29, 2065–2075. doi:10.1016/j.procs.2014.05.190

Yao, J., & Tan, C. (2000). A case study on using neural networks to perform technical forecasting of forex. Neurocomputing, 34(1), 79–98.