Predicting Daily Returns for the IBM Stock

M. Oldemiro Fernandes
Luis Torgo
mofer@liacc.up.pt
ltorgo@liacc.up.pt
LIACC, University of Porto
www.liacc.up.pt
Abstract. The goal of the work described in this paper is to predict the daily returns of the closing
prices for the IBM stock. From the original data of IBM daily quotes a new data set was built using
technical indicators as predictor variables. Using this new data set, two modelling approaches were
tried: regression and classification. Early analysis and experiments suggested that this prediction
problem has some specific properties that make it difficult for standard learning algorithms. Based on
this analysis we propose a two-step approach to overcome these difficulties. Initial experimental
analysis shows that this approach is promising. However, the actual results are still far from the ideal
performance achievable by our proposed methodology. Our analysis of these results shows that further
work must be done, namely in improving the performance of the classification stage of our approach.
1. Introduction
During market hours, having an accurate prediction of the last price of the day allows one to
make profitable intraday trades. This is the main motivation behind our work, where we try to
predict the daily returns of one particular stock.
Daily stock returns are defined as the percentage change between two successive closing
prices:
R_{t+1} = (Close_{t+1} / Close_t - 1) × 100        (1)
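The return computation of Equation 1 can be sketched as follows (an illustrative Python fragment with assumed names, not part of the original study):

```python
# Illustrative sketch of Equation 1.
def daily_returns(closes):
    """R_{t+1} = (Close_{t+1} / Close_t - 1) * 100 for consecutive closes."""
    return [(closes[t + 1] / closes[t] - 1.0) * 100.0
            for t in range(len(closes) - 1)]

# A close of 100 followed by 102 corresponds to a +2% daily return.
returns = daily_returns([100.0, 102.0, 96.9])
```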
For each period we set the prediction time at the moment right after the Open price (Open_{t+1}) is
known, as shown in Figure 1. This means that this value can be used to predict the Close price of
that day, which is our main objective.
Figure 1 - Prediction moment (timeline from past to future: Open_t, Close_t, Open_{t+1} = prediction moment, Close_{t+1}).
2. Data Presentation and Pre Processing Methodology
IBM daily quotes were collected from finance.yahoo.com. The data set consists of 7891
observations of the Open, High, Low and Close prices and the Volume traded, for each day from
14/07/1970 to 12/10/2001.
From this base data we have generated a dataset whose predictor variables are several
technical indicators. Namely, we have used 9 attributes. The technical indicators were chosen
among those most used by traders: Moving Averages, Aroon Indicator, Relative Strength Index,
Chaikin Money Flow, Stochastic Oscillator, Average True Range.
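As an illustration of how such indicators can be derived from the price series, the following sketch computes a simple moving average and the Relative Strength Index using their common textbook definitions; the exact parameterisation used in this study is not specified, so these are assumptions:

```python
# Textbook-style sketches of two of the indicators named above.
def moving_average(closes, n):
    """Simple n-day moving average, defined from day n-1 onwards."""
    return [sum(closes[i - n + 1:i + 1]) / n for i in range(n - 1, len(closes))]

def rsi(closes, n=14):
    """Relative Strength Index over the last n price changes (simple averages)."""
    changes = [closes[i + 1] - closes[i] for i in range(len(closes) - 1)]
    gains = [max(c, 0.0) for c in changes[-n:]]
    losses = [max(-c, 0.0) for c in changes[-n:]]
    avg_gain, avg_loss = sum(gains) / n, sum(losses) / n
    if avg_loss == 0.0:
        return 100.0  # no losing days in the window
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)
```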
The variation between the last Close (t) and the last Open (t+1) was also incorporated in the
data set as an attribute. This attribute is very important when the aim is to predict the
closing price (t+1) (Zirilli, 1997).
The target variable is the daily return of the Closing prices, which was calculated according
to the formula given before. This results in a regression data set. Additionally, using the same
attributes we have created a classification data set by classifying the target variable into 4
classes, using quartile information to determine the bin boundaries. The reason for using quartiles
to divide the target variable into classes was to obtain a classification problem with balanced
classes. The resulting classification has an easy interpretation. The first quartile represents large
negative moves (sell opportunities), the last quartile represents large positive moves (buy
opportunities), while the two middle quartiles represent very small moves in either direction
(insufficient to compensate trading costs if any action is taken).
In summary, we have created two different data sets from the original data, which represent
two different views of the same problem: one with the original numeric returns and the other
with the returns discretised into four classes.
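The quartile-based discretisation can be sketched as follows (an illustrative fragment; the class labels -2, -1, 1, 2 are an assumption, following the coding used later in Figure 5):

```python
# Illustrative quartile discretisation into 4 balanced classes.
def quartile_classes(train_returns):
    """Return a classifier mapping a return to one of 4 balanced classes."""
    s = sorted(train_returns)
    n = len(s)
    q1, q2, q3 = s[n // 4], s[n // 2], s[3 * n // 4]

    def classify(r):
        if r <= q1:
            return -2  # large negative move (sell opportunity)
        if r <= q2:
            return -1  # small negative move
        if r <= q3:
            return 1   # small positive move
        return 2       # large positive move (buy opportunity)

    return classify

classify = quartile_classes([-3.0, -0.5, 0.4, 2.5, -2.0, -0.1, 0.2, 3.1])
```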
After the pre-processing steps described above we have obtained a data set with 7837
observations. These observations were divided into a training set (with the first 5173 cases) and
a testing set (with the remaining 2664 cases).
3. Exploratory Data Analysis
We have carried out a simple analysis of the statistical properties of the target variable,
separately for the training and testing cases (c.f. Figure 2).
Figure 2 - Statistical Measures of the Data (histograms of the daily returns for the training and testing sets; only the summary statistics are reproduced below).

                 Training Set   Testing Set
Count            5172           2664
Maximum          11.38          13.16
Mean             0.03           0.07
Median           0.00           0.00
Mode             0.00           0.00
Minimum          -23.52         -15.54
Range            34.90          28.71
Std. Deviation   1.42           2.15
Variance         2.02           4.61
Skewness         -0.47          0.22
Kurtosis         17.55          5.64
The distribution of the returns target variable is non-normal: it has longer tails and a higher
concentration of occurrences around the central value than a normal distribution. This situation
is more evident in the training set, which has a larger range of values, but it is also present in
the test set, as can be seen from the kurtosis measure. Comparing the two data sets, the test set
has a larger variance. Regarding measures of centrality, the means differ slightly while the
medians are the same.
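For reference, skewness and (excess) kurtosis can be computed from the population moment formulas, as sketched below; statistical packages often apply small-sample corrections, so values may differ slightly from those reported in Figure 2:

```python
import statistics

# Population-moment formulas (illustrative, not the package used by the authors).
def skewness(xs):
    m, s, n = statistics.mean(xs), statistics.pstdev(xs), len(xs)
    return sum((x - m) ** 3 for x in xs) / (n * s ** 3)

def excess_kurtosis(xs):
    m, s, n = statistics.mean(xs), statistics.pstdev(xs), len(xs)
    return sum((x - m) ** 4 for x in xs) / (n * s ** 4) - 3.0
```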
4. Construction of Prediction Models
Using the two data sets described before, some models were obtained using different
learning algorithms.
4.1 Regression Data Set
Several experiments were done using the system RT4.1¹ (Torgo, 1999). The results
obtained in the testing set are summarised in Figure 3. Results are presented using the Mean
Square Error (MSE), Mean Absolute Deviation (MAD) and Normalised Mean Square Error
(NMSE).

¹ www.liacc.up.pt/~ltorgo/RT

Algorithm           MSE     MAD     NMSE
Model of Mean       4.608   1.526   1.000
RT4.1 (default)     4.152   1.487   0.901
RT4.1 lr            3.376   1.338   0.733
RT4.1 lr -tlm be    3.321   1.331   0.721

Figure 3 - Results with RT.
The Model of Mean represents the simplest model one can have: it always predicts the mean
value of returns (calculated with the training data). This model is used as a reference to evaluate
the relative gain in performance obtained with more complex models.
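The evaluation metrics used above can be sketched as follows; in particular, NMSE normalises the model's MSE by that of the Model of Mean baseline, which therefore scores exactly 1.0 (illustrative code, not the RT4.1 implementation):

```python
# Illustrative definitions of the evaluation metrics.
def mse(y, yhat):
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)

def mad(y, yhat):
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def nmse(y_test, yhat, train_mean):
    """MSE divided by the MSE of always predicting the training mean,
    so the Model of Mean scores exactly 1.0."""
    return mse(y_test, yhat) / mse(y_test, [train_mean] * len(y_test))
```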
RT4.1 with default parameters grows a tree with 43 nodes, with averages in its 22 leaves.
This more complex model has little advantage over always predicting the average.
The best model was a standard linear regression with attribute selection (3 attributes selected
from the original 9), that performs slightly better than linear regression with all attributes.
These experiments with the regression data set highlighted some specific characteristics of
this domain. In fact, the most interesting observations of this problem (from a trading
perspective) are a few extreme cases, usually considered outliers. Most learning algorithms
will ignore such cases, because they are biased towards reducing the prediction error, which is
best achieved when the most common (non-outlier) cases are modelled. This is why some of the
models we obtained simply produced the mean as the prediction. Thus, the main problem with
this data set is that the cases that are most interesting from a trading perspective are not
sufficiently representative, from a statistical point of view, to be considered relevant by the
learning algorithms. This empirical observation has led us to develop a methodology to try to
overcome this difficulty. This methodology is presented in Section 5.
4.2 Classification Data Set
We have tried different classification learning algorithms with the classification data set,
namely: C5.0² (Quinlan, 1993), Ltree³ (Gama, 2001) and a back-propagation Neural Network.
The results obtained with these systems are summarised in Figure 4.
Algorithm                   Error Rate (%)
C5.0 (default)              65.1%
C5.0 -r (decision rules)    62.6%
C5.0 -t20 (boosting)        61.6%
Neural Network              61.9%
Ltree                       60.8%

Figure 4 - Results with Classification Data Set.
The default version of C5.0 produces one large decision tree, with 920 nodes, which has an
error rate of 65.1%. Using C5.0 to generate decision rules, 87 rules are obtained, with a smaller
error rate. Using boosting with 20 trials, the error rate was smaller still.
Ltree with pre-pruning (-m15) and the univariate parameter (-U) produced the best
classification model, with a very small tree.
We have also tried to train a neural net for this classification task. The target variable was
decomposed into 4 binary variables (each of them takes the value 1 if the case belongs to the
corresponding class, and 0 otherwise). The best result (shown in Figure 4) was obtained with an
architecture of 10 input neurons, 8 hidden neurons and 4 output neurons.
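The decomposition into binary target variables can be sketched as follows (illustrative; the class label set is an assumption, following the quartile coding used elsewhere in the paper):

```python
# Illustrative 1-of-4 decomposition of the class label for the 4 output neurons.
def one_hot(label, classes=(-2, -1, 1, 2)):
    return [1 if label == c else 0 for c in classes]
```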
In spite of all the efforts we have made to reduce the error rate, the best results still achieve an
error that is too high, meaning that more work needs to be done.
² www.rulequest.com
³ www.liacc.up.pt/~jgama
5. Suggested Methodology
As a result of our regression experiments, we have noticed that the most interesting
observations, from a trading perspective, were being disregarded by the regression models that
were obtained (c.f. Section 4.1).
To overcome this difficulty, we propose an approach based on a two-stage learning process.
In the first stage we try to obtain a model that is able to correctly identify the type of
observation, according to the classes defined before (large and small increases or decreases
of the closing price returns). Based on this classification of the training cases, we develop a
regression model for each class of observations. In this way, each regression model is
obtained using only similar cases.
During prediction, our approach follows a similar two-step method. In the first step, the
classification model obtained is used to classify the case for which we want a prediction into
one of the four classes. Then, given the predicted class, in the second step the respective
regression model is used to predict the closing price return.
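The two-step prediction procedure can be sketched as follows, with stand-in callables in place of the actual Ltree classifier and RT4.1 regression models (the 'open_var' attribute is purely hypothetical):

```python
# Sketch of the two-step prediction; classifier and regressors are stand-ins,
# not the actual Ltree / RT4.1 models.
def two_stage_predict(case, classifier, regressors):
    """First classify the case, then apply the class-specialised regressor."""
    label = classifier(case)
    return regressors[label](case)

# Toy stand-ins keyed on a hypothetical 'open_var' attribute.
classifier = lambda case: 2 if case["open_var"] > 1.0 else 1
regressors = {1: lambda c: 0.0, 2: lambda c: c["open_var"] * 1.5}
```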
5.1 Ideal Results (benchmarking model)
The sources of error of this proposed methodology are two-fold. On one hand, we have
the classification error of our first-level model. On the other hand, we have the regression error
of our second-level models. Ideally, our first-level model would have a classification error of
zero. With the goal of understanding the limits of our proposed methodology, we have simulated
this situation: we looked at the test set and obtained the correct classification for each test
case (thus "cheating"). Given this ideal classification, we observed the error of the
respective regression model. The regression algorithm used in this experiment was RT4.1 with
default parameters (regression trees with averages at the leaves). The results obtained are
presented in Figure 5.
Case Class         Size   MSE     MAD     NMSE
-2                 13     2.041   0.887   0.873
-1                 3      0.073   0.236   0.995
1                  3      0.058   0.207   1.021
2                  7      2.176   0.959   0.728
All observations   ---    1.277   0.638   ---

Figure 5 - Results with Benchmark Model.
These results can be regarded as the ideal performance we can aim at with our two-stage
methodology. They show a much lower prediction error than the one obtained without the
two-stage methodology. One can see that in the cases with lower absolute returns (classes 1
and -1) predicting the mean return seems to be a good compromise. However, in the extreme
cases, with bigger movements (classes 2 and -2), more complex models are necessary.
Obviously, this experiment only serves to give an idea of how far one can expect to go with
the proposed methodology because, in the real world, the true class of the test cases is not
known. Still, one can try to predict it using the classification models obtained as described
previously. This is the subject of the next section.
5.2 The results of the two-stages method
We now present the results obtained with the method we have proposed, including both
prediction stages (classification and regression). The classification model was obtained using
Ltree (the best classifier in our first experiments) to classify each new observation into one of
the four classes. Within each class, a regression model was constructed using RT4.1 with the
default parameter values.
For each case of the test set, the classifier was used to obtain a probabilistic classification.
This means that for each test case, Ltree produced a class distribution probability.
With this set-up, two different experiments were carried out. In the first, each test case was
assigned to the class with the largest probability, and the respective regression model was used
to obtain the prediction.
In the second approach, all regression models were used, and the final prediction was
obtained as a weighted average of the four predictions, with the class probabilities used as
weights. Results for both experiments are presented in Figure 6 and show that the second
approach is better.
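The weighted-average combination of the second experiment can be sketched as follows (again with stand-in models; `class_probs` is the class probability distribution produced by the classifier):

```python
# Sketch of the weighted-average combination of class-specialised predictions.
def weighted_prediction(case, class_probs, regressors):
    return sum(p * regressors[label](case)
               for label, p in class_probs.items())
```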
Case Class         MSE     MAD
Maximum            4.579   1.593
Weighted Average   3.760   1.394

Figure 6 - Results with Two Stages Method.
These results show that the error of the classification stage was so high (namely 60.8%, c.f.
Figure 4) that the overall results of our two-stage methodology were worse than those obtained
with the simpler approaches (c.f. Figure 3). However, the results of Figure 5 indicate that if the
classification error is reduced, the overall results can become significantly better than those of
Figure 3. Thus, the main conclusion of these experiments is that further work needs to be
carried out on the classification stage for this two-stage approach to pay off.
Namely, an idea that should provide better results is to explore misclassification costs within
the classification task. In effect, rather than optimising the error rate, we would like to avoid
certain types of error more than others (for instance, classifying a class -2 case as class 2).
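The cost-sensitive idea can be sketched as follows; the cost matrix is purely illustrative (not from the paper), making opposite-extreme confusions the most expensive:

```python
# Illustrative cost-sensitive class choice; the cost matrix is an assumption.
CLASSES = (-2, -1, 1, 2)
COST = {  # COST[true_class][predicted_class]
    -2: {-2: 0, -1: 1, 1: 4, 2: 8},
    -1: {-2: 1, -1: 0, 1: 2, 2: 4},
     1: {-2: 4, -1: 2, 1: 0, 2: 1},
     2: {-2: 8, -1: 4, 1: 1, 2: 0},
}

def min_cost_class(class_probs):
    """Pick the class with the lowest expected misclassification cost."""
    def expected_cost(pred):
        return sum(p * COST[true][pred] for true, p in class_probs.items())
    return min(CLASSES, key=expected_cost)
```

With probabilities {-2: 0.6, 2: 0.4}, the maximum-probability choice would be class -2, but the minimum expected cost is attained by the safer class -1, illustrating how costs bias predictions away from expensive mistakes.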
6. Conclusions and Future Work
In this study, we found that predicting daily returns is not an easy task. We believe this
difficulty is increased because learning algorithms are error oriented instead of profit oriented.
As such, these algorithms ignore the most interesting cases to model and concentrate their
efforts on those without significant moves. This was the main motivation for the development of
a method that separates observations by class (type of market movement) and then uses class-
specialised regression models.
We constructed an ideal scenario to obtain a benchmark model. This benchmark shows that
by using specialised regression models with an accurate classification of the cases, lower error
rates are possible.
When we used our best classifier to obtain class distributions for each observation, the
results deteriorated. We believe that better results can be achieved with better classification.
We intend to explore the idea of misclassification costs as a means to bias the classification
stage towards more trading-oriented performance goals.
References
Gama, J. (2001), “Functional Trees for Classification”, IEEE International conference in Data Mining, Published by IEEE
Computer Society.
Quinlan, J. (1993), "C4.5: Programs for Machine Learning", Morgan Kaufmann Publishers.
Torgo, L. (1999), “Inductive Learning of Tree-based Regression Models”, PhD thesis, Faculty of Sciences, University of Porto.
Zirilli, Joseph S. (1997), “Financial Prediction using Neural Networks”, International Thomson Computer Press, London, UK.