International Journal of Advancements in Research & Technology, Volume 2, Issue 4, April‐2013 380 ISSN 2278‐7763
STOCK EXCHANGE FORECASTING USING HADOOP MAP-REDUCE TECHNIQUE
KUSHAGRA SAHU, REVATI PAWAR, SONALI TILEKAR, RESHMA SATPUTE
DEPARTMENT OF COMPUTER, AISSMS’S IOIT, PUNE, INDIA; DEPARTMENT OF COMPUTER, AISSMS’S IOIT, PUNE, INDIA; DEPARTMENT OF
COMPUTER, AISSMS’S IOIT, PUNE, INDIA; DEPARTMENT OF COMPUTER, AISSMS’S IOIT, PUNE, INDIA.
Email: kushagr007@gmail.com
ABSTRACT This article describes cloud-based stock forecasting using a neural network, with Hadoop as the cloud platform. Because the stock market combines high profit with high risk, stock market analysis and prediction have attracted considerable research attention. The stock price trend is a complex nonlinear function, yet the price has a certain degree of predictability. This article mainly uses an improved BP neural network (BPNN) for stock price prediction, and the results show that the method predicts stock prices well. With this approach, one can predict the future stock trends of multiple companies, using the MapReduce technique for parallelism and to achieve accurate results. Features of the system include SMA, EMA and OBV graphs and stock prediction.
KEYWORDS : MapReduce; Hadoop; Stock market Forecasting; Technical Indicators; Graphical indicators; BP Neural Network.
1 INTRODUCTION
Hadoop MapReduce is a recent framework designed specifically for processing large datasets on distributed resources. Apache Hadoop is an implementation of MapReduce. This article proposes to use the parallel and distributed processing capability of Hadoop MapReduce to manage heterogeneous query execution on large datasets. The Map and Reduce techniques are used to break down the parsing and execution stages for parallel and distributed processing.
The rest of this paper is organized as follows. Section 2 presents the Hadoop MapReduce framework, Section 3 presents stock market forecasting, Section 4 presents technical indicators, Section 5 presents graphical indicators and Section 6 presents the BP neural network, followed by the conclusion.
2 HADOOP MAPREDUCE
MapReduce is a programming model for expressing distributed computations on massive amounts of data and an execution framework for large-scale data processing on clusters of commodity servers [1] [2]. It was originally developed by Google and built on well-known principles in parallel and distributed processing. Hadoop is an open-source implementation of MapReduce [4][1][2], written in Java, which provides reliable, scalable and fault-tolerant distributed computing. Setting up a Hadoop environment involves a large number of parameters that are crucial for achieving good performance. Hadoop allows programmers to develop distributed applications without detailed knowledge of the underlying distributed system. Key-value pairs form the basic data structure in MapReduce. Keys and values may be primitives such as integers, floating-point values, strings and raw bytes, or they may be arbitrarily complex structures (lists, tuples, associative arrays, etc.). Programmers typically need to define their own custom data types. The map function takes an input record and generates intermediate key-value pairs. The reduce function takes an intermediate key and the set of values associated with it and merges them into a smaller set of values. Typically, the reducer produces just zero or one output value. In MapReduce, the programmer defines a mapper and a reducer with the following signatures:
Map (k1, v1) → [(k2, v2)]
Reduce (k2, [v2]) → [(k3, v3)]
where [...] denotes a list.
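The signatures above can be made concrete with a small in-memory simulation. This is not Hadoop code: it is a single-process Python stand-in, with hypothetical names (`map_fn`, `reduce_fn`, `run_mapreduce`) and made-up sample records, that computes the average closing price per company. The mapper emits (ticker, close) pairs, the "framework" groups values by key, and the reducer collapses each group to one value.

```python
from collections import defaultdict

def map_fn(k1: int, v1: str) -> list[tuple[str, float]]:
    """Map(k1, v1) -> [(k2, v2)]: emit (ticker, closing price) for one CSV line."""
    ticker, _date, close = v1.split(",")
    return [(ticker, float(close))]

def reduce_fn(k2: str, values: list[float]) -> list[tuple[str, float]]:
    """Reduce(k2, [v2]) -> [(k3, v3)]: emit the average closing price for one ticker."""
    return [(k2, sum(values) / len(values))]

def run_mapreduce(records, mapper, reducer):
    """Single-process stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for k1, v1 in records:                 # map phase
        for k2, v2 in mapper(k1, v1):
            groups[k2].append(v2)          # shuffle: collect values per intermediate key
    output = []
    for k2 in sorted(groups):              # reducers see keys in sorted order
        output.extend(reducer(k2, groups[k2]))
    return output

# Hypothetical input: (byte offset, "TICKER,date,close") lines, as a text input format would supply them.
records = [
    (0, "AAA,2013-04-01,10.0"),
    (20, "BBB,2013-04-01,50.0"),
    (40, "AAA,2013-04-02,12.0"),
    (60, "BBB,2013-04-02,54.0"),
]
print(run_mapreduce(records, map_fn, reduce_fn))  # [('AAA', 11.0), ('BBB', 52.0)]
```

In a real cluster the map calls run on many machines and each reducer receives only a partition of the keys; the grouping logic is the same.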
The MapReduce framework is responsible for automatically splitting the input, distributing each chunk to mappers on multiple machines, grouping and sorting all intermediate values associated with each intermediate key, and passing these values to reducers on multiple machines, as shown in Fig. 1. The master monitors the execution of mappers and reducers so that it can re-execute them when failures are detected. It is not uncommon for MapReduce jobs to have thousands of individual tasks that need to be assigned to nodes in the cluster. In large jobs, the total number of tasks may exceed the number of tasks that can run on the cluster concurrently, which makes it necessary for the scheduler to maintain a task queue and track the progress of running tasks, so that waiting tasks can be assigned to nodes as they become available. Fig. 1 shows a simplified view of MapReduce.
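The scheduling behavior described above (more tasks than free slots, a queue of waiting tasks, and re-execution after a failure) can be illustrated with a toy round-based simulation. This is a sketch under simplifying assumptions, not Hadoop's actual scheduler: `slots` (how many tasks can run at once) and `fail_once` (tasks that fail on their first attempt) are hypothetical parameters.

```python
from collections import deque

def schedule(tasks, slots, fail_once=frozenset()):
    """Toy master: run at most `slots` tasks per round; re-queue a task once if it fails."""
    queue = deque(tasks)        # waiting tasks
    failed_before = set()       # tasks that have already failed once
    completed = []
    rounds = 0
    while queue:
        rounds += 1
        # Assign waiting tasks to the free slots (every slot frees up at the end of a round).
        running = [queue.popleft() for _ in range(min(slots, len(queue)))]
        for task in running:
            if task in fail_once and task not in failed_before:
                failed_before.add(task)
                queue.append(task)      # failure detected: re-execute later
            else:
                completed.append(task)
    return completed, rounds

done, rounds = schedule(["m1", "m2", "m3", "m4", "m5"], slots=2, fail_once={"m2"})
print(done, rounds)  # ['m1', 'm3', 'm4', 'm5', 'm2'] 3
```

Here the failed task `m2` goes to the back of the queue and finishes in the last round.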
Fig 1. View of MapReduce
Copyright © 2013 SciResPub. IJOART
C. On Balance Volume (OBV)
1) It is a technical analysis indicator used to relate price and volume in the stock market.
2) If OBV increases steadily, it indicates an upward trend.
3) If OBV decreases steadily, it indicates a downward trend.
4) If today's close > yesterday's close, then OBV = OBV(yesterday) + Volume(today).
5) If today's close < yesterday's close, then OBV = OBV(yesterday) - Volume(today).
6) If today's close equals yesterday's close, OBV is unchanged.
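The OBV rules translate directly into code. A minimal sketch, assuming `closes` and `volumes` are equal-length daily lists (hypothetical names), that OBV starts at 0 on the first day, and following the usual convention that OBV is left unchanged when the close does not change:

```python
def on_balance_volume(closes: list[float], volumes: list[int]) -> list[int]:
    """Return the OBV series, one value per day, starting from 0 on the first day."""
    if len(closes) != len(volumes):
        raise ValueError("closes and volumes must have the same length")
    if not closes:
        return []
    obv = [0]  # assumed starting value
    for i in range(1, len(closes)):
        if closes[i] > closes[i - 1]:
            obv.append(obv[-1] + volumes[i])   # up day: add today's volume
        elif closes[i] < closes[i - 1]:
            obv.append(obv[-1] - volumes[i])   # down day: subtract today's volume
        else:
            obv.append(obv[-1])                # unchanged close: OBV unchanged
    return obv

print(on_balance_volume([10, 11, 11, 9, 12], [100, 200, 150, 300, 250]))  # [0, 200, 200, -100, 150]
```

Only the direction of OBV matters for trend reading, so the arbitrary starting value of 0 does not affect the interpretation.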
3 STOCK MARKET FORECASTING
The stock market reflects variations in the market economy and has attracted the attention of ten million investors since it opened. The stock market is characterized by high risk and high yield, so investors are concerned with analyzing it and trying to forecast its trend. However, the stock market is affected by politics, the economy and many other factors, and its internal behavior is complex: price changes are nonlinear and share data contain a high level of noise. Therefore, traditional statistical techniques have not yielded satisfactory forecasts of the stock market. Neural networks can approximate arbitrarily complex nonlinear relationships and are robust and fault-tolerant, which makes them well suited to the analysis of stock data.
5 GRAPHICAL INDICATORS
In this module, various graphs such as bar charts and line charts are displayed using the Java charting library JFreeChart.
6 BACK PROPAGATION NEURAL NETWORK
The BP neural network algorithm [6] [7] is a supervised learning algorithm. Its main idea is to feed in the training samples and use the back-propagation algorithm to adjust the weights and biases of the network through repeated training, so that the output vector is as close as possible to the expected vector.
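A Back Propagation Neural Network is trained in three main steps:
A. Perform a forward pass on the network.
1. Output of neuron = f(∑(input*weight)), where f is the activation function (a sigmoid, which is what the (1 - Output value) * Output value terms in the reverse pass assume).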
B.
4 TECHNICAL INDICATORS
Technical indicators are used for analysis: using one of the following features, users can view the indicator graph of a company by entering a period as input.
A. Simple Moving Average (SMA)
1) SMA is the basic form of moving average used for trading.
2) It is based on the closing price.
Ex. Daily closing prices: 11, 12, 13, 14, 15, 16, 17. The 5-day MA is:
1st day: (11+12+13+14+15)/5 = 13
2nd day: (12+13+14+15+16)/5 = 14
3rd day: (13+14+15+16+17)/5 = 15, and so on.
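The worked example can be checked with a short sliding-window implementation. `sma` and its `period` parameter are illustrative names, not from the original system; the function returns one average per full window, so a 7-day series with a 5-day period yields three values.

```python
def sma(prices: list[float], period: int) -> list[float]:
    """Average of each consecutive window of `period` closing prices (one value per full window)."""
    if period <= 0:
        raise ValueError("period must be positive")
    return [sum(prices[i:i + period]) / period
            for i in range(len(prices) - period + 1)]

print(sma([11, 12, 13, 14, 15, 16, 17], 5))  # [13.0, 14.0, 15.0]
```

A series shorter than the period returns an empty list, since no full window exists yet.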
B. Exponential Moving Average (EMA)
1) It tries to reduce lag by applying more weight to recent prices.
2) EMA(current) = ((Price(current) - EMA(previous)) * Multiplier) + EMA(previous)
where Multiplier = 2 / (Time period + 1).
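A minimal EMA sketch follows. The text does not say how the first EMA value is obtained; this sketch assumes the common convention of seeding it with the SMA of the first `period` prices. The names `ema` and `period` are illustrative.

```python
def ema(prices: list[float], period: int) -> list[float]:
    """EMA series with Multiplier = 2 / (period + 1), seeded with the first window's SMA."""
    if period <= 0 or len(prices) < period:
        return []
    multiplier = 2 / (period + 1)
    values = [sum(prices[:period]) / period]   # assumed seed: SMA of the first window
    for price in prices[period:]:
        prev = values[-1]
        # EMA(current) = ((Price(current) - EMA(previous)) * Multiplier) + EMA(previous)
        values.append((price - prev) * multiplier + prev)
    return values

print(ema([10, 10, 10, 10, 10, 20], 5))
```

With a 5-day period the multiplier is 2/6, so after a jump from 10 to 20 the EMA moves one third of the way toward the new price (to about 13.33), while a 5-day SMA would move only one fifth of the way (to 12).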
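Perform a reverse pass (training).
1. Error for the output layer: $\delta_{\text{output}} = (\text{Target value} - \text{Output value})(1 - \text{Output value})\,\text{Output value}$
2. New weights for the output layer: $W_n = W_n + \delta_{\text{output}} \cdot \text{input}$, where input is the value arriving along that weight
3. Error for the hidden layer: $\delta_{\text{hidden}} = (\delta_{\text{output}} \cdot \text{weight})(1 - \text{Output value})\,\text{Output value}$, where Output value is the hidden neuron's output and weight is its connection to the output neuron
4. New weights for the hidden layer: $W_n = W_n + \delta_{\text{hidden}} \cdot \text{input}$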
C. Perform a further forward pass and comment on
the result.
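Steps A to C can be traced numerically on a tiny network. The sketch below implements plain back-propagation as written in the steps, not the article's "improved" variant, which is not specified. It assumes a 2-input, 2-hidden, 1-output network with sigmoid activations and no biases (consistent with the (1 - Output value) * Output value terms), an implicit learning rate of 1 as in the update rules, and made-up initial weights; all names are hypothetical. Hidden-layer errors use the output weights from the forward pass, before they are updated, which is the usual choice in gradient-based back-propagation.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_hidden, w_output):
    """Step A: each neuron outputs sigmoid(sum(input * weight))."""
    hidden = [sigmoid(sum(x * w for x, w in zip(inputs, ws))) for ws in w_hidden]
    output = sigmoid(sum(h * w for h, w in zip(hidden, w_output)))
    return hidden, output

def train_step(inputs, target, w_hidden, w_output):
    """Step B: one reverse pass; returns updated copies of the weights."""
    hidden, out = forward(inputs, w_hidden, w_output)
    # 1. Output-layer error: (Target - Output) * (1 - Output) * Output
    delta_out = (target - out) * (1 - out) * out
    # 3. Hidden-layer errors, using the output weights from the forward pass
    delta_hidden = [delta_out * w * (1 - h) * h for w, h in zip(w_output, hidden)]
    # 2. New output weights: W = W + delta_output * input (input = hidden neuron output)
    new_w_output = [w + delta_out * h for w, h in zip(w_output, hidden)]
    # 4. New hidden weights: W = W + delta_hidden * input
    new_w_hidden = [[w + d * x for w, x in zip(ws, inputs)]
                    for ws, d in zip(w_hidden, delta_hidden)]
    return new_w_hidden, new_w_output

# Hypothetical training sample and initial weights.
inputs, target = [0.35, 0.9], 0.5
w_hidden = [[0.1, 0.8], [0.4, 0.6]]   # one row of incoming weights per hidden neuron
w_output = [0.3, 0.9]

_, before = forward(inputs, w_hidden, w_output)
w_hidden, w_output = train_step(inputs, target, w_hidden, w_output)
_, after = forward(inputs, w_hidden, w_output)   # Step C: further forward pass
print(f"output before: {before:.4f}, after: {after:.4f}, target: {target}")
```

The further forward pass in step C should show the output moving toward the target; repeating `train_step` over many samples and epochs is what "repeated training" refers to.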
When building an artificial functional model of the biological neuron, we must take into account three basic components. First, the synapses of the biological neuron are modelled as weights. Recall that the synapse of the biological neuron is what interconnects the neural network and gives the strength of the connection. For an artificial neuron, the weight is a number that represents the synapse. A negative weight reflects an inhibitory connection, while positive values designate excitatory connections.
The remaining components of the model represent the actual activity of the neuron cell. All inputs are modified by the weights and summed together; this activity is referred to as a linear combination. Finally, an activation function controls the amplitude of the output. For example, an acceptable range of output is usually between 0 and 1, or it could be between -1 and 1.
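The three components (weights, linear combination, activation function) can be shown on a single artificial neuron. In this sketch the inputs and weights are illustrative, and the choice of activations is an assumption, since the text does not name one: the logistic sigmoid keeps the output between 0 and 1, while `math.tanh` keeps it between -1 and 1, matching the two ranges mentioned above.

```python
import math
from typing import Callable

def neuron(inputs: list[float], weights: list[float],
           activation: Callable[[float], float]) -> float:
    """Artificial neuron: weighted linear combination of inputs, then an activation."""
    # Positive weights act as excitatory connections, negative ones as inhibitory.
    v = sum(x * w for x, w in zip(inputs, weights))   # linear combination
    return activation(v)                                # controls output amplitude

def logistic(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))   # output in (0, 1)

x = [1.0, 0.5, -1.0]
w = [0.8, -0.4, 0.3]   # hypothetical weights; -0.4 is an inhibitory connection
print(neuron(x, w, logistic), neuron(x, w, math.tanh))
```

Both calls share the same linear combination (0.3 here); only the activation function, and therefore the output range, differs.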
MapReduce in Hadoop is used to create virtual data nodes for parallel processing of the data by the neural network, enabling time-efficient forecasting of the future trend of the stock market.
Technical indicators such as SMA, EMA and OBV make the analysis of complex stock data more understandable.
The neural network algorithm can predict with greater accuracy. In this way we eliminate the problem of false predictions that may result from inadequate methods requiring more human observation of the data. Hadoop enables us to deal with huge amounts of data and to process the input data in parallel, which makes the result available in less time and with more accuracy. Because the parallelism is defined at the application level, only a low execution overhead is imposed on the runtime environment.
8 ACKNOWLEDGEMENT
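Mathematically, this process is described in Fig. 2.
Fig 2. BPNN
From this model, the internal activity of the neuron can be shown to be:
$v_k = \sum_{j} w_{kj} x_j$, where $x_j$ are the inputs and $w_{kj}$ are the corresponding weights of neuron $k$.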
We would like to extend our sincere, heartfelt gratitude to our Head of Department, Prof. Mrs. Sarika Zaware, and our internal guide, Prof. Mrs. S. Pimpalkar, AISSMS IOIT, under whose guidance we had the privilege of working and learning, and whose constant inspiration at all phases of the article led to the successful completion of our work.
9 REFERENCES
[1] Jeffrey Dean and Sanjay Ghemawat. MapReduce:
Simplified Data Processing on Large Clusters. Google
Research Publication (2004).
[2] Ralf Lämmel. Google's MapReduce Programming Model Revisited. Science of Computer Programming, Volume 68, (2008).
The output of the neuron, $y_k$, would therefore be the outcome of some activation function applied to the value of $v_k$.
[3] Apache Hadoop, http://Hadoop.apache.org.
7 CONCLUSION
[4] Hammoud, S., Maozhen Li, Yang Liu, Alham N.K., Zelong
Liu. MRSim: A discrete event based MapReduce
simulator. Seventh International IEEE Conference on
Fuzzy Systems and Knowledge Discovery (FSKD), 2010.
This paper presented a novel, highly automated approach to stock exchange forecasting using a Hadoop cloud.
[5] Tom White. Hadoop: The Definitive Guide. O'Reilly, Sebastopol, California, 2009.
Our approach consists of:
Identifying the company selected by the user and downloading its data from Yahoo Finance.
Normalizing the data and running the BPNN algorithm on the data received from the Yahoo server.
[6] Yong Liao. Based on Gene Expression Programming and the time series analysis of the price of stock. Chinese dissertation database, 2005.
[7] Zhou Yixin, Jie Zhang. "Stock data analysis based on BP neural network." 2010 Second International Conference on Communication Software and Networks.