Financial Forecasting using Neural Networks

Black box modeling for highly nonlinear systems
Skolidis G., Souriadakis M., Georgikopoulou A., Hatzopoulos P., Nikolaou G., Tseles D.I.
1. Introduction
Neural networks are interconnected systems that can be considered simplified mathematical models functioning like the neuron patterns of the human brain. In contrast to traditional computing techniques, which are programmed with rules to perform a specific task, neural networks must be taught or trained on a training data set, discovering by themselves the patterns and rules governing the data. Although computers perform better than artificial neural networks for tasks based on precise and fast arithmetic operations, neural networks can be used in problems where the associations and patterns between the input variables are unknown; it is also worth mentioning that the method does not require continuous relationships between the data being evaluated in order to identify key events or patterns.
While neural networks have many applications in Finance and Economics, such as stock selection, evaluation of mortgage applicants, bankruptcy forecasting, real estate appraisal and forecasting stock or index prices from time series, their utility in practice is often limited. On the other hand, the unique ability of neural networks to learn any nonlinear relationship from data, without prior knowledge of the system, makes them an excellent tool for forecasting applications.
In this paper two case studies are presented. The first is a financial application in which neural networks are used to predict the value of houses in Boston, the Boston Housing Project as it is called, a typical benchmark problem for neural networks. The second case study uses a neural network model for the prediction of meteorological parameters.
2. Neural Networks as Forecasting Models
Neural networks can be used as forecasting tools in many different areas. They
also have the ability to classify nonlinear systems and can approximate any nonlinear
function to some level of accuracy. In economic and financial applications the most
basic and commonly used neural
network is the multilayer
feedforward network. Figure 1 illustrates the architecture of a neural network with one hidden layer containing two neurons, three input variables {x_i}, i = 1, 2, 3, and one output y. All the nodes at each layer are connected to every node of the layer above by interconnection strengths called weights. A training algorithm is used to obtain a set of weights that minimizes the difference between the target and the output produced by the simulation of the network.

[Figure 1. Architecture of a neural network: inputs x1, x2, x3, hidden neurons N1, N2, output y.]
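To make the notation concrete, the sketch below shows the forward pass of the 3-2-1 network of Figure 1 in Matlab, using illustrative random weights and the tansig/purelin transfer functions employed by the models later in the paper.

    % Forward pass of the 3-2-1 network of Figure 1 (illustrative weights only)
    x  = [0.2; -0.5; 0.8];            % input variables x1, x2, x3
    W1 = rand(2,3);  b1 = rand(2,1);  % hidden-layer weights and biases (neurons N1, N2)
    W2 = rand(1,2);  b2 = rand(1,1);  % output-layer weights and bias
    h  = tansig(W1*x + b1);           % hidden-layer activations
    y  = purelin(W2*h + b2);          % network output y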
2.1 Learning Algorithms
In our study we used several variations of the backpropagation training algorithm, each with different computation and storage requirements. The table below summarizes the training algorithms used in the search for the model with the highest level of accuracy.
Algorithm                                            Description
Gradient Descent (GD)                                Slow response; can be used in incremental training mode
Gradient Descent with Momentum (GDM)                 Faster training than GD; can be used in incremental training mode
Gradient Descent with Adaptive Learning Rate (GDX)   Faster training than GD, but can only be used in batch training mode
Resilient Backpropagation (RP)                       Simple batch-mode algorithm with fast convergence and minimal storage requirements
Polak-Ribiere Conjugate Gradient (CGP)               Slightly larger storage requirements and faster convergence on some problems
Levenberg-Marquardt (LM)                             Fast training for moderately sized networks, with a memory-reduction option for large training data sets
Bayesian Regularization (BR)                         Modification of the Levenberg-Marquardt algorithm that produces networks with improved generalization and reduces the difficulty of determining the optimum network architecture
2.3 Techniques for Improving Generalization
In our research we studied two techniques for improving the generalization ability of the network: Early Stopping and Bayesian Regularization.
Early Stopping
This technique requires the data set to be divided into three subsets: a training set, a validation set and a test set. The training set is used for computing the gradient and updating the network weights and biases. The training procedure monitors the error on the validation set and, as soon as this error starts to increase, training stops and the weights from the epoch where the validation error was at its minimum are returned. This is the stage at which training should cease in order to avoid the over-fitting problem.
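As a rough illustration, the sketch below sets this up with the Matlab Neural Network Toolbox used in this work, assuming the older train interface that accepts the validation data as a structure VV; ptr/ttr and pval/tval are illustrative names for the training and validation subsets.

    % Minimal early-stopping sketch (assumed toolbox interface, see note above)
    net = newff(minmax(ptr), [5 1], {'tansig','purelin'}, 'traingdx');
    net.trainParam.epochs = 5000;
    VV.P = pval;                              % validation inputs monitored during training
    VV.T = tval;                              % validation targets
    net  = train(net, ptr, ttr, [], [], VV);  % stops once the validation error starts to rise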
Bayesian Regularization
This technique involves a modification of the performance function. A typical performance function used for training feedforward neural networks is the mean sum of squares of the network errors:
mse = \frac{1}{N} \sum_{i=1}^{N} e_i^2 = \frac{1}{N} \sum_{i=1}^{N} (t_i - a_i)^2

where t_i is the i-th target and a_i the corresponding network output.
It is possible to improve generalization if we modify the performance function by adding a term that consists of the mean of the sum of squares of the network weights and biases:
msereg = \gamma \cdot mse + (1 - \gamma) \cdot msw

where \gamma is the performance ratio, and

msw = \frac{1}{n} \sum_{j=1}^{n} w_j^2
Using this performance function will cause the network to have smaller weights and biases, forcing the network's response to be smoother and less likely to over-fit.
Of these two techniques we chose Bayesian Regularization, because we observed a significant increase in the accuracy of the models tested.
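As an illustration of the modified performance function, the sketch below computes msereg directly from an error vector e = t - a, a vector w of network weights and biases, and a performance ratio gamma (illustrative variable names); in the toolbox this roughly corresponds to selecting the msereg performance function and its ratio parameter.

    % Minimal sketch of the regularized performance function
    mse    = mean(e.^2);                   % mean squared network error
    msw    = mean(w.^2);                   % mean squared weights and biases
    msereg = gamma*mse + (1 - gamma)*msw;  % regularized performance value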
2.4 Performance Metrics
For both case studies, two statistical criteria are used to estimate the performance of each neural network: an in-sample criterion and an out-of-sample criterion, which are based on tests of significance.
Specifically, for the in-sample criterion we evaluate the regression, since we want to know how well a model fits the actual data. The goodness of fit is best measured through the multiple correlation coefficient, also known as the R-squared coefficient: the ratio of the variance of the output predicted by the model relative to the variance of the true or observed output:
R^2 = \frac{\sum_{t=1}^{T} (\hat{y}_t - \bar{y})^2}{\sum_{t=1}^{T} (y_t - \bar{y})^2}

where \hat{y}_t is the model output, y_t the observed output and \bar{y} the mean of the observed output over the T in-sample observations.
The out-of-sample criterion evaluates how well competing models generalize beyond the data set used for estimation. To evaluate the performance of a model out of sample, we begin by dividing the data into an in-sample estimation (training) set, from which the model coefficients are obtained, and an out-of-sample test set. The most commonly used statistic for evaluating out-of-sample fit is the root mean squared error (rmsq):
*
rmsq 
( y


1

yˆ ) 2
*
3. Case studies
In the following section, the two case studies are presented and we compare the
simulation results of the models using the performance metrics described in the
previous section.
3.1 The Boston Housing Project
In our application we developed a number of neural networks for forecasting the value of houses in Boston (the Boston Housing Project), a typical benchmark problem for neural networks. The application was developed in the Matlab 6.5 programming environment, using the neural network toolbox it provides. Initially, the inputs and the targets were preprocessed so that they fall in the range [-1, 1]. Secondly, we created the network with the specific architecture (number of layers, neurons in each layer, transfer function of each layer) that we wanted to test. The network was then trained using one of the algorithms described above. After training, the neural network was simulated with the data on which it had been trained in order to evaluate its performance, that is, how well the model fits the actual data; this performance was measured by computing the R-squared coefficient and the root mean squared error. Lastly, the network was tested with new inputs to measure its ability to generalize, using the rmsq statistical criterion. After the first tests we set aside the models with the best performance and tried to improve their ability to forecast new data, using Bayesian Regularization with the modified performance function.
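A minimal sketch of this workflow, assuming the interface of the Neural Network Toolbox shipped with Matlab 6.5 (premnmx, newff, train, sim) and illustrative variable names p for the inputs and t for the targets, is given below.

    % Minimal sketch of the Boston Housing workflow described above
    [pn, minp, maxp, tn, mint, maxt] = premnmx(p, t);                  % scale inputs/targets to [-1, 1]
    net = newff(minmax(pn), [5 1], {'tansig','purelin'}, 'traingdx');  % e.g. an 11-5-1 architecture
    net.trainParam.epochs = 5000;                                      % length of the training cycle
    net = train(net, pn, tn);                                          % train with one of the algorithms above
    an  = sim(net, pn);                                                % simulate on the training data
    a   = postmnmx(an, mint, maxt);                                    % map outputs back to house values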
3.2 Experimental Results
Every model that was developed had eleven (11) inputs, one hidden layer and one output unit predicting the value of the house. The final set of weights to which a network settles down depends on a number of factors, e.g. the initial weights chosen, the learning parameters and the number of hidden neurons. The number of hidden neurons varied between 5 and 14, and training was completed after the training cycle reached 5000 iterations. The experiment was conducted ten (10) times for each model constructed, and to compare the models we used the average error on the test data. The method used to select the neural network giving the forecast with the highest level of accuracy was, first, a trial-and-error technique in which we compared several architectures and selected the best ones, and subsequently the optimization of the models selected by the trial-and-error method.
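The repetition and averaging step can be sketched as follows, with ptr/ttr and ptst/ttst as illustrative names for the preprocessed training and test sets:

    % Each architecture is trained ten times and compared by its average test error
    nRuns = 10;
    err   = zeros(nRuns, 1);
    for k = 1:nRuns
        net = newff(minmax(ptr), [13 1], {'tansig','purelin'}, 'traingdx');
        net.trainParam.epochs = 5000;
        net  = train(net, ptr, ttr);             % result depends on the random initial weights
        yhat = sim(net, ptst);
        err(k) = sqrt(mean((ttst - yhat).^2));   % rmsq on the test data
    end
    avgErr = mean(err);                          % average error used to compare the models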
3.2.1 Optimization
After the first selection, we tried to optimize those networks using Bayesian Regularization with the modified performance function. When the regularization technique is used, the user has to determine the optimum value of the learning rate so that the network adequately fits the training data without becoming over-fitted: if the learning rate is too large the network may become overtrained, but if it is too small the network will not fit the data. After experimenting with several values of the learning rate, we settled on the values listed in the table below, which also indicates the level of accuracy of each model.
Architecture   Transfer Functions (layers)   Training Algorithm   Learning Rate   Epochs   Average Error of Test Data (rmsq)
11-5-1         tansig-purelin                gda                  default         5000     3.058
11-5-1         tansig-purelin                gdx                  default         5000     2.924
11-7-1         tansig-purelin                gdx                  default         5000     2.856
11-9-1         tansig-purelin                gda                  default         5000     3.001
11-9-1         tansig-purelin                gdx                  default         5000     2.87
11-13-1        tansig-purelin                gdx                  default         5000     2.832
11-14-1        tansig-purelin                gdx                  default         5000     2.87
11-5-1         tansig-purelin                gda                  0.8             5000     3.456
11-5-1         tansig-purelin                gdx                  0.8             5000     3.178
11-7-1         tansig-purelin                gdx                  0.8             5000     2.997
11-9-1         tansig-purelin                gda                  0.8             5000     3.212
11-9-1         tansig-purelin                gdx                  0.8             5000     2.907
11-13-1        tansig-purelin                gdx                  0.8             5000     2.862
11-14-1        tansig-purelin                gdx                  0.8             5000     2.834
11-5-1         tansig-purelin                gda                  0.75            5000     3.419
11-5-1         tansig-purelin                gdx                  0.75            5000     3.257
11-7-1         tansig-purelin                gdx                  0.75            5000     3.13
11-9-1         tansig-purelin                gda                  0.75            5000     3.219
11-9-1         tansig-purelin                gdx                  0.75            5000     2.941
11-13-1        tansig-purelin                gdx                  0.75            5000     2.932
11-14-1        tansig-purelin                gdx                  0.75            5000     2.913
11-5-1         tansig-purelin                gda                  0.70            5000     3.547
11-5-1         tansig-purelin                gdx                  0.70            5000     3.312
11-7-1         tansig-purelin                gdx                  0.70            5000     3.235
11-9-1         tansig-purelin                gda                  0.70            5000     3.3
11-9-1         tansig-purelin                gdx                  0.70            5000     3.115
11-13-1        tansig-purelin                gdx                  0.70            5000     2.914
11-14-1        tansig-purelin                gdx                  0.70            5000     2.9
11-5-1         tansig-purelin                gda                  0.85            5000     3.304
11-5-1         tansig-purelin                gdx                  0.85            5000     3.029
11-7-1         tansig-purelin                gdx                  0.85            5000     2.932
11-9-1         tansig-purelin                gda                  0.85            5000     3.309
11-9-1         tansig-purelin                gdx                  0.85            5000     2.895
11-13-1        tansig-purelin                gdx                  0.85            5000     2.826
11-14-1        tansig-purelin                gdx                  0.85            5000     2.876
3.2.3 Model Selection
The model which gave the best forecast was the one with the 11-13-1 architecture, trained with the gdx algorithm for 5000 iterations and with a learning rate of 0.85.
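A sketch of how this selected model could be configured is given below; mapping the reported learning rate onto net.trainParam.lr of the gdx algorithm and the regularization onto the msereg performance function is our assumption about how the settings translate into toolbox parameters.

    % Configuration of the selected 11-13-1 model (assumed parameter mapping, see note above)
    net = newff(minmax(pn), [13 1], {'tansig','purelin'}, 'traingdx');
    net.performFcn        = 'msereg';  % Bayesian Regularization (section 3.2.1)
    net.trainParam.lr     = 0.85;      % learning rate reported for the best model
    net.trainParam.epochs = 5000;      % training cycle length
    net = train(net, pn, tn);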
[Figures: training performance curve of the selected model (performance 0.0239781 with goal 0 over 5000 epochs); real and simulated outputs of the training data; error of the test data; real and simulated outputs of the test data.]
4. Weather forecasting using neural networks
The purpose of the neural network models in this case study is to predict the ambient temperature using past measurements. After a thorough literature review and analysis, it was decided that all models would have three input parameters: previous temperature measurements, atmospheric pressure and relative humidity. After experimentation with different network architectures, it turned out that for the given modeling data the best prediction was given by a network with one hidden layer of five neurons. For this architecture, the results for a number of training algorithms and activation functions are given in the two tables below, after a short sketch of how the inputs can be formed.
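This is a minimal sketch, assuming column vectors temp, press and hum holding the past measurements (hypothetical names) and a one-step-ahead temperature target:

    % Assemble the three-input data set and the 3-5-1 network for temperature prediction
    T = length(temp);
    p = [temp(1:T-1)'; press(1:T-1)'; hum(1:T-1)'];                   % previous temperature, pressure, humidity
    t = temp(2:T)';                                                   % next-step ambient temperature targets
    net = newff(minmax(p), [5 1], {'tansig','purelin'}, 'trainbr');   % one hidden layer with five neurons
    net = train(net, p, t);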
Best results over the activation-function pairs tested:

         MSE (min)               MAE (min)               R (max)
TRAIN    4.5348  Tansig-Purelin  1.6356  Tansig-Purelin  0.9685  Tansig-Purelin
TEST     5.6598  Tansig-Tansig   1.7490  Tansig-Tansig   0.9555  Tansig-Tansig

Best results over the training algorithms tested:

         MSE (min)           MAE (min)           R (max)
TRAIN    4.6510  TRAINBR     1.6616  TRAINBR     0.9677  TRAINBR
TEST     5.7135  TRAINBR     1.7554  TRAINBR     0.9553  TRAINBR

5. Conclusion
The purpose of this paper was to show that neural networks can be used for time-series forecasting. We have presented and compared many different neural network models for a typical benchmark problem from the financial sector, as well as an application modeling a highly nonlinear problem. It was shown that, given a large set of consistent data, neural networks can be used as time-series forecasters, providing accuracy comparable, if not superior, to traditional forecasting techniques.