International Journal of Advancements in Research & Technology, Volume 2, Issue 4, April-2013, page 380
ISSN 2278-7763

STOCK EXCHANGE FORECASTING USING HADOOP MAP-REDUCE TECHNIQUE

KUSHAGRA SAHU, REVATI PAWAR, SONALI TILEKAR, RESHMA SATPUTE
DEPARTMENT OF COMPUTER, AISSMS’S IOIT, PUNE, INDIA (all authors)
Email: kushagr007@gmail.com

ABSTRACT
This article presents cloud-based stock forecasting that uses a neural network, with Hadoop serving as the cloud platform. The stock market combines high profit with high risk, so stock market analysis and prediction have attracted considerable research attention. The stock price trend is a complex nonlinear function, so the price has a certain degree of predictability. This article mainly applies an improved BP neural network (BPNN) to stock price prediction, and the results show that the method predicts stock prices well. With the approach in this paper, one can predict the future stock trend of multiple companies, using the MapReduce technique for parallelism and to achieve accurate results. The features of the article are the SMA graph, the EMA graph, the OBV graph, and stock prediction.

KEYWORDS: MapReduce; Hadoop; Stock market Forecasting; Technical Indicators; Graphical indicators; BP Neural Network.

1 INTRODUCTION
Hadoop MapReduce is a recent framework designed specifically for processing large datasets on distributed sources; Apache Hadoop is an implementation of MapReduce. This article proposes to use the parallel and distributed processing capability of Hadoop MapReduce to manage heterogeneous query execution on large datasets. The Map and Reduce techniques are used to break down the parsing and execution stages for parallel and distributed processing. In this paper, Section 2 presents the Hadoop MapReduce framework. Section 3 presents stock market forecasting.
Section 4 presents the technical indicators, Section 5 the graphical indicators, and Section 6 the BP neural network, followed by the conclusion.

2 HADOOP MAPREDUCE
MapReduce is a programming model for expressing distributed computations on massive amounts of data and an execution framework for large-scale data processing on clusters of commodity servers [1] [2]. It was originally developed by Google and built on well-known principles of parallel and distributed processing. Hadoop is the open-source implementation of MapReduce [4][1][2], written in Java, which provides reliable, scalable, and fault-tolerant distributed computing. Setting up a Hadoop environment involves a great number of parameters that are crucial to achieving good performance. Hadoop allows programmers to develop distributed applications without detailed knowledge of distributed systems.

The key-value pair is the basic data structure in MapReduce. Keys and values may be primitives such as integers, floating-point values, strings, and raw bytes, or they may be arbitrarily complex structures (lists, tuples, associative arrays, etc.). Programmers typically need to define their own custom data types. The map function takes an input record and generates intermediate key-value pairs. The reduce function takes an intermediate key and the set of values for that key and merges them into a smaller set of values. Typically the reducer produces just zero or one output value. In MapReduce, the programmer defines a mapper and a reducer with the following signatures:

\[ \text{Map}(k_1, v_1) \rightarrow [(k_2, v_2)] \]
\[ \text{Reduce}(k_2, [v_2]) \rightarrow [(k_3, v_3)] \]

where [...] denotes a list.

The MapReduce framework is responsible for automatically splitting the input, distributing each chunk to mappers on multiple machines, grouping and sorting all intermediate values associated with the same intermediate key, and passing these values to reducers on multiple machines, as shown in Fig. 1. The master monitors the execution of the mappers and reducers and re-executes them when failures are detected.
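The map, group, and reduce stages described above can be illustrated with a small in-memory sketch. It is not Hadoop code and makes no claim about how the authors implemented their job: the helper `run_mapreduce`, the functions `close_mapper` and `mean_reducer`, the `symbol,close` record layout, and the ticker symbols are all hypothetical, chosen only to show the shape of the Map and Reduce signatures.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def run_mapreduce(records: Iterable[Tuple[int, str]],
                  mapper: Callable[[int, str], Iterator[Tuple[str, float]]],
                  reducer: Callable[[str, List[float]], Iterator[Tuple[str, float]]]
                  ) -> List[Tuple[str, float]]:
    """Single-process stand-in for the framework: map, group by key, reduce."""
    # Map phase: each (k1, v1) record yields zero or more (k2, v2) pairs.
    grouped: Dict[str, List[float]] = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in mapper(k1, v1):
            grouped[k2].append(v2)
    # Shuffle/sort phase: keys are visited in sorted order.
    # Reduce phase: each (k2, [v2]) group yields zero or more (k3, v3) pairs.
    output: List[Tuple[str, float]] = []
    for k2 in sorted(grouped):
        output.extend(reducer(k2, grouped[k2]))
    return output


def close_mapper(line_no: int, line: str) -> Iterator[Tuple[str, float]]:
    """Hypothetical mapper: parse 'symbol,close' and emit (symbol, close)."""
    symbol, close = line.split(",")
    yield symbol, float(close)


def mean_reducer(symbol: str, closes: List[float]) -> Iterator[Tuple[str, float]]:
    """Hypothetical reducer: emit the mean closing price for one symbol."""
    yield symbol, sum(closes) / len(closes)


# Made-up input records keyed by line number (not real market data).
records = [(0, "AAA,10.0"), (1, "BBB,20.0"), (2, "AAA,14.0")]
result = run_mapreduce(records, close_mapper, mean_reducer)
print(result)  # [('AAA', 12.0), ('BBB', 20.0)]
```

In a real cluster the grouping step is carried out by the framework across machines; here a dictionary plays that role so that the data flow stays visible.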
It is not uncommon for MapReduce jobs to have thousands of individual tasks that need to be assigned to nodes in the cluster. In large jobs, the total number of tasks may exceed the number that can run on the cluster concurrently, so the scheduler must maintain a task queue and track the progress of running tasks, assigning waiting tasks to nodes as they become available.

Fig. 1. Simplified view of MapReduce

Copyright © 2013 SciResPub. IJOART

3 STOCK MARKET FORECASTING
The stock market reflects fluctuations in the market economy and has attracted the attention of ten million investors since it opened. The stock market is characterized by high risk and high yield, so investors pay close attention to stock market analysis and try to forecast the market trend. However, the stock market is affected by politics, the economy, and many other factors, and its internal laws are complex: price changes are nonlinear and share data is highly noisy. As a result, traditional mathematical and statistical forecasting techniques have not yielded suitable results. Neural networks can approximate any complex nonlinear relation and are robust and fault-tolerant. They are therefore well suited to the analysis of stock data.

C. On Balance Volume (OBV)
1) OBV is a technical analysis indicator used to relate price and volume in the stock market.
2) A steadily increasing OBV indicates an upward trend.
3) A steadily decreasing OBV indicates a downward trend.
4) If today’s close > yesterday’s close, then OBV = OBV (yesterday) + Volume (today).
5) If today’s close < yesterday’s close, then OBV = OBV (yesterday) - Volume (today).
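The three technical indicators used in this article (SMA, EMA, and OBV, Section 4) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. The function names are hypothetical, and three details are assumptions because the text does not state them: the EMA is seeded with the first closing price, the OBV series starts at 0, and OBV is left unchanged when the close equals the previous close.

```python
from typing import List, Sequence


def sma(closes: Sequence[float], period: int) -> List[float]:
    """Simple moving average of every full window of `period` closing prices."""
    if period <= 0:
        raise ValueError("period must be positive")
    return [sum(closes[i:i + period]) / period
            for i in range(len(closes) - period + 1)]


def ema(closes: Sequence[float], period: int) -> List[float]:
    """EMA(cur) = (Price(cur) - EMA(prev)) * multiplier + EMA(prev)."""
    if period <= 0:
        raise ValueError("period must be positive")
    if not closes:
        return []
    multiplier = 2 / (period + 1)
    values = [float(closes[0])]  # assumption: seed with the first close
    for price in closes[1:]:
        values.append((price - values[-1]) * multiplier + values[-1])
    return values


def obv(closes: Sequence[float], volumes: Sequence[float]) -> List[float]:
    """On Balance Volume: add volume on an up day, subtract it on a down day."""
    if not closes:
        return []
    values = [0.0]  # assumption: the running total starts at zero
    for i in range(1, len(closes)):
        if closes[i] > closes[i - 1]:
            values.append(values[-1] + volumes[i])
        elif closes[i] < closes[i - 1]:
            values.append(values[-1] - volumes[i])
        else:
            values.append(values[-1])  # assumption: unchanged on an equal close
    return values


# The SMA example from Section 4: closing prices 11..17 with a 5-day window.
print(sma([11, 12, 13, 14, 15, 16, 17], 5))  # [13.0, 14.0, 15.0]
# Made-up prices and volumes, for illustration only.
print(ema([10, 12, 14], 3))                  # [10.0, 11.0, 12.5]
print(obv([10, 11, 10, 10], [100, 200, 300, 400]))  # [0.0, 200.0, -100.0, -100.0]
```

The first call reproduces the moving averages 13, 14, and 15 from the worked SMA example in Section 4.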
4 TECHNICAL INDICATORS
These indicators are used for analysis: by choosing one of the following features and entering a period as input, users can view the corresponding graph for a company.

A. Simple Moving Average (SMA)
1) SMA is the most basic moving average used for trading.
2) It is based on the closing price.
Example: daily closing prices 11, 12, 13, 14, 15, 16, 17. The 5-day moving average is:
1st day: (11+12+13+14+15)/5 = 13
2nd day: (12+13+14+15+16)/5 = 14
3rd day: (13+14+15+16+17)/5 = 15
and so on.

B. Exponential Moving Average (EMA)
1) EMA tries to reduce lag by applying more weight to recent prices.
2) The current EMA is computed from the previous one:
\[ \text{EMA}_{\text{current}} = (\text{Price}_{\text{current}} - \text{EMA}_{\text{previous}}) \times \text{Multiplier} + \text{EMA}_{\text{previous}}, \qquad \text{Multiplier} = \frac{2}{\text{Time period} + 1} \]

5 GRAPHICAL INDICATORS
Here we display various graphs, such as bar charts and line charts, using the Java charting library JFreeChart.

6 BACK PROPAGATION NEURAL NETWORK
The BP neural network algorithm [6] [7] is a supervised learning algorithm. Its main idea is to feed in the training samples and use the back-propagation algorithm to adjust the weights and biases of the network through repeated training, so that the output vector is as close as possible to the expected vector. The back-propagation neural network consists of three main steps:

A. Perform a forward pass on the network.
1. Output of a neuron:
\[ \text{Output of neuron} = \sum (\text{input} \times \text{weight}) \]

B. Perform a reverse pass (training).
1. Error for the output layer:
\[ \delta_{\text{output\_layer}} = (\text{Target value} - \text{Output value})\,(1 - \text{Output value})\,\text{Output value} \]
2. New weights for the output layer:
\[ W_n = W_n + (\delta_{\text{output\_layer}} \times \text{input}) \]
3. Error for the hidden layer:
\[ \delta_{\text{hidden\_layer}} = (\delta_{\text{output\_layer}} \times \text{weight})\,(1 - \text{Output value})\,\text{Output value} \]
4. New weights for the hidden layer:
\[ W_n = W_n + (\delta_{\text{hidden\_layer}} \times \text{input}) \]

C. Perform a further forward pass and comment on the result.

When building an artificial functional model of the biological neuron, we must take into account three basic components. First, the synapses of the biological neuron are modelled as weights.
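The forward and reverse passes listed in Section 6 can be sketched for a tiny network with two inputs, two hidden neurons, and one output. This is an illustration under stated assumptions rather than the authors' code. A sigmoid activation is assumed, since the (1 - Output) × Output factor in the error formulas is the sigmoid derivative. Biases are omitted. The hidden-layer error is computed from the output weights as they were before the update. The `rate` parameter is an addition that defaults to 1 so the updates match the formulas as written. The function names and the numeric weights are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    """Assumed activation function; squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs: List[float], w_hidden: List[List[float]],
            w_output: List[float]) -> Tuple[List[float], float]:
    """Step A: weighted sums followed by the activation, layer by layer."""
    hidden = [sigmoid(sum(x * w for x, w in zip(inputs, row)))
              for row in w_hidden]
    output = sigmoid(sum(h * w for h, w in zip(hidden, w_output)))
    return hidden, output


def train_step(inputs: List[float], target: float,
               w_hidden: List[List[float]], w_output: List[float],
               rate: float = 1.0) -> Tuple[List[List[float]], List[float]]:
    """Step B: one reverse pass, returning the updated weights."""
    hidden, output = forward(inputs, w_hidden, w_output)
    # B.1: error for the output layer.
    delta_output = (target - output) * (1.0 - output) * output
    # B.3: error for each hidden neuron (assumption: pre-update output weights).
    delta_hidden = [delta_output * w * (1.0 - h) * h
                    for w, h in zip(w_output, hidden)]
    # B.2: new output weights; the "input" here is the hidden activation.
    new_w_output = [w + rate * delta_output * h
                    for w, h in zip(w_output, hidden)]
    # B.4: new hidden weights; the "input" here is the network input.
    new_w_hidden = [[w + rate * d * x for w, x in zip(row, inputs)]
                    for row, d in zip(w_hidden, delta_hidden)]
    return new_w_hidden, new_w_output


# Made-up sample and starting weights, for illustration only.
inputs, target = [0.35, 0.9], 0.5
w_hidden = [[0.1, 0.8], [0.4, 0.6]]
w_output = [0.3, 0.9]

_, output_before = forward(inputs, w_hidden, w_output)
new_w_hidden, new_w_output = train_step(inputs, target, w_hidden, w_output)
# Step C: a further forward pass to see whether the error shrank.
_, output_after = forward(inputs, new_w_hidden, new_w_output)
print(abs(target - output_before), abs(target - output_after))
```

Repeating `train_step` over many samples is what "repeated training" refers to. A single step is shown here so that each formula maps to one line of code.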
Recall that the synapse of the biological neuron is what interconnects the neural network and determines the strength of each connection. For an artificial neuron, the weight is a number that represents the synapse. A negative weight reflects an inhibitory connection, while positive values designate excitatory connections. The next component of the model represents the actual activity of the neuron cell: all inputs are modified by the weights and summed together. This operation is referred to as a linear combination. Finally, an activation function controls the amplitude of the output. For example, an acceptable output range is usually between 0 and 1, or it could be between -1 and 1.

Mathematically, this process is described in Fig. 2.

Fig. 2. BPNN

From this model, the internal activity of the neuron, \(v_k\), can be shown to be the weighted sum of its inputs:

\[ v_k = \sum_{j} w_{kj}\, x_j \]

where \(x_j\) are the inputs and \(w_{kj}\) the corresponding weights.

MapReduce in Hadoop is used to set up virtual data nodes that process the data in parallel with the neural network, giving time-efficient forecasts of the future trend of the stock market. Technical indicators such as SMA, EMA, and OBV make the analysis of esoteric stock data more understandable. The neural network algorithm predicts with greater accuracy. In this way we eliminate the problem of false predictions that may result from inadequate methods that require more human observation of the data. Hadoop enables us to deal with huge amounts of data and to process the input data in parallel, which makes the result available in a finite amount of time and with more accuracy. Because the approach is defined at the application level, it imposes low execution overhead on the runtime environment.

8 ACKNOWLEDGEMENT
We would like to extend our sincere, heartfelt gratitude to our Head of Department, Prof. Mrs. Sarika Zaware, and our internal guide, Prof. Mrs. S.
Pimpalkar, AISSMS IOIT, under whose guidance we had the privilege of working and learning, and whose constant inspiration at all phases of the article led to the successful completion of our work.

The output of the neuron, \(y_k\), would therefore be the outcome of some activation function applied to the value of \(v_k\).

7 CONCLUSION
This paper presented a novel, highly automated approach to stock exchange forecasting using the Hadoop cloud. Our approach consists of identifying the company selected by the user and downloading its data from Yahoo Finance, then normalizing the data and running the BPNN algorithm on the data received from the Yahoo server.

9 REFERENCES
[1] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. Google Research Publication, 2004.
[2] Ralf Lämmel. Google's MapReduce Programming Model Revisited. Science of Computer Programming, Volume 68, 2008.
[3] Apache Hadoop, http://Hadoop.apache.org.
[4] Hammoud, S., Maozhen Li, Yang Liu, Alham N.K., Zelong Liu. MRSim: A discrete event based MapReduce simulator. Seventh International IEEE Conference on Fuzzy Systems and Knowledge Discovery (FSKD), 2010.
[5] Tom White. Hadoop: The Definitive Guide. O’Reilly, Sebastopol, California, 2009.
[6] Yong Liao. Based on Gene Expression Programming and the time-series analysis of the price of stock. Chinese dissertation database, 2005.
[7] Zhou Yixin, Jie Zhang. Stock data analysis based on BP neural network. 2010 Second International Conference on Communication Software and Networks.