City University of Hong Kong
Department of Computer Science
BSCCS/BSCS Final Year Project Report 2004-2005
(04CS023)
Stock Market Index Forecast using CAS Algorithm
(Volume 1 of 1)
Student Name: Wan Kwok Wai Winsen
Student No.: 50323718
Programme Code: BScCS
Supervisor: Dr. Andy Chun
1st Reader: Dr. Victor Lee
2nd Reader: Mr. C H Lee
For Official Use Only
Stock Market Index Prediction using CAS Algorithm
Section I
Abstract
Pattern matching techniques have been found to be explanatory in time series analysis, and
their importance in this area has been increasing. However, reinforcement learning has
not been used as the learning algorithm for the pattern matching process. In this project,
the Stock Market Index Forecast System (SMIFS) was built to make stock market
forecasts. The Ant Algorithm was applied to a two-way path environment for the system to
learn how patterns should be matched. The time series was divided into sub-time-series
based on a predefined “ant to sub-time-series” ratio. Each sub-time-series was
then divided into segments for pattern matching. We also studied the performance of
various formulas for calculating the future index point after determining the future
movement direction. Experiments were done using different segment sizes. The
results indicated that SMIFS worked better with a larger segment size. This suggests that
pattern matching systems based on different learning algorithms may work best with
very extreme segment sizes. The performance of SMIFS could be further increased by
optimizing the segment size and the “ant to sub-time-series” ratio.
Section II
Acknowledgements
Winsen Wan would like to thank Dr. Andy Chun for his valuable advice and supervision.
He would like to thank Ms. Charis Hui for her assistance in obtaining the financial data set.
He would also like to thank all his classmates, friends and family for their
kind support.
Section III
Table Of Contents
Abstract
Acknowledgements
Table Of Contents
Introduction
    1.1 Objectives
    1.2 Scope Of This Project
    1.3 Problem Domain
Background Information
    2.1 Rosennean Complexity
    2.2 Elliott Wave Principle
    2.3 Related Works
Method
    3.1 Data Set
    3.2 Pattern Matching
        3.2.1 Segments
        3.2.2 Sub-time-series
        3.2.3 Ant Algorithm On Shortest Path Problem
        3.2.4 Ant Algorithm On This Project
        3.2.5 Similarity Determination
        3.2.6 Discounted-F
    3.3 Efficiency Enhancement
    3.4 Weight Adjustment
    3.5 Index Calculation
Results
Discussion
Conclusion
References
Appendix A - Experience & Difficulty
    8.1 Use Of Ant Algorithm
    8.2 Weighting Function Design
    8.3 Weight Adjustment Optimization
    8.4 Project Management
Appendix B - Database Design
Appendix C - System Design
    10.1 Class Description
    10.2 UML Design
    10.3 Design Special
Appendix D - GUI Design
    11.1 Main Interface
    11.2 Agent and Index Management Interface
Section 1
Introduction
There are a number of stock market indices in the US financial market. Each of these
indices tracks the performance of a group of stocks which is considered to
represent a particular industry, sector or market in the US economy. This group of
stocks is called a basket.
For example, the Standard & Poor's (S&P) 500 Composite Stock Price Index is a
capitalization-weighted index of 500 leading companies, considered to be a representative
sample, from the major industries of the US economy. Many investors whose
investment objective is to track the performance of a particular index
participate in index trading.
Because of the attractive returns from investment, many tools and techniques have been
developed for making financial forecasts. Technical Analysis (TA) is one of the most
commonly used tools. TA makes use of many financial indicators to forecast the time
series, and holds that particular patterns appearing in the time series indicate the
future trend.
As Artificial Intelligence (AI) continued to grow, AI techniques and algorithms were
applied to financial forecasting. Artificial Neural Networks (ANNs) were commonly used
to tackle financial forecasting problems, and were said to provide relatively good results
compared to other methods. Nevertheless, an ANN does not tell us how it makes its
forecast; at the least, we do not know which factors affect the forecast.
Some people have started to apply machine learning to pattern matching
techniques for making financial forecasts, which is a relatively new approach to this
problem. Enriching our knowledge of pattern-matching-based financial forecasting may help
us figure out the real factors which determine the trend and movement of a time series.
1.1 Objectives
This project primarily aims at designing an algorithm based on pattern matching
techniques which makes stock market forecasts using historical data, and at constructing
formulas for calculating future index points. The secondary objective is to
analyze potential relationships between the stock index and some of its attributes.
1.2 Scope Of This Project
Originally, the plan was to build a stock market model and an automatic trading agent
in this project, with the agent capable of participating in security trading by
itself. However, after studying the problem in detail, along with some related works,
I realized that this project scope would be too large to achieve. As a result,
the project scope was reduced, and it now includes:
• Study techniques used in computer-aided financial forecast
• Study related works in making financial forecast
• Design a pattern matching algorithm based on Ant Algorithm for forecasting
  the trend of a time series
• Construct formulas for calculating future index points
• Evaluate the performance of the pattern matching algorithm and formulas
1.3 Problem Domain
Given a set of historical data in a time series, and a target segment, Ant Algorithm is
applied to match the target segment with the historical data to determine the future
trend. A learning procedure is designed for training ants. Several formulas are
developed for calculating future index points with the predicted trend.
Section 2
Background Information
Complex systems are complex in the sense that they have at least one nonsimulable
model. No single dynamical description can successfully model a complex
system; otherwise the behavior of that system could always be predicted correctly [1]. One
example of a complex system is the financial market [2].
2.1 Rosennean Complexity
According to Robert Rosen’s definitions of “simple” and “complex” systems, a system
is considered simple if all of its models are simulable. On the other hand, a system
with a nonsimulable model is considered complex.
Robert Rosen (1934 – 1998) was a theoretical biologist who dedicated his life to
answering the question “What is Life?”. As a professor emeritus of biophysics at
Dalhousie University, he realized that the Newtonian model was inadequate to describe
biological systems. The Newtonian model of physics did not provide him with an answer to
the question. This suggested that biological systems are more complex than the
Newtonian model allows.
We usually use modeling to understand the world: a congruence is established between
the elements and structures of two systems. One of the two systems is a real
system in the world, while the other is our model. When the real system acts
unexpectedly, in a way that is not consistent with our model, we regard the real system as
complex.
Rosennean complexity suggests that if a single dynamical description is capable of
describing a system successfully, then the behavior of that system can always be
predicted correctly. This kind of system does not have any complexity. On the other
hand, if even multiple partial dynamical descriptions are insufficient to describe a system
successfully, it is complex.
Financial markets are certainly complex systems: no one can describe them correctly
and completely. Hence, no one can predict financial market behavior; otherwise
there would be no financial market. Some people believe that price and index
movements obey the random walk theory [3], which says that prices move independently at
any point in time, implying that historical data cannot be used to predict future
movement. This view is understandable, because a financial market responds to many
factors every day, and some of these factors, such as news, are not predictable [4].
2.2 Elliott Wave Principle
In 1939, Ralph Nelson Elliott proposed his Elliott Wave Principle in Financial World. The
basis of the Principle is that regular patterns always exist in the natural world. He
realized that many things in the world actually repeat, for example, morning and
night tides, day and night, life and death. All of these cycles consist of two types of
movement: rise and fall [5].
Elliott combined his observations with the Fibonacci sequence and suggested that in the
financial market, a cycle consists of 5 rises and 3 falls, 8 waves in total (3, 5 and 8 are
Fibonacci numbers). He called the longest cycle the Grand Supercycle, which can be
divided into 8 Supercycles. Similarly, each Supercycle can be divided into 8 cycles, and
a cycle can be further divided into smaller units.
In this project, we do not actually make use of the Principle in making financial forecasts.
However, we build on the Principle in assuming that the time series always repeats
itself. We believe that, given a target segment, there exists a matched segment somewhere
in the historical data of the time series. We make use of the matched segment to
predict the future movement of the target segment by asserting that history will
happen again in the same way. We believe that there are cycles in financial markets
which are difficult to detect using statistical methods.
2.3 Related Works
Despite the difficulties of making financial forecasts, much research has been done in this
area for its commercial and academic value. This research includes stock market
index, stock price and foreign exchange rate prediction. However, an effective algorithm
for making financial forecasts has yet to be found. Techniques such as statistical
forecasting and artificial neural networks [6], genetic programming [7], and pattern
matching [8] have been applied to the problem. The accuracy of predicting the
trend using these techniques varies from 20% to 70%. There is no doubt that financial
forecasting is not an easy task.
Tsang and Li applied financial genetic programming to stock-market forecasting [7].
The research used the Hang Seng Index data from 25-May-1991 to 16-Oct-1993
for the Hong Kong stock market. It used 9 experts to predict the index movement
in the following week by giving one of the following descriptions: bullish (the index rises
by over 1.3%), bearish (the index falls by over 1.3%), sluggish (the index is neither bullish
nor bearish) or uncertain (the expert did not make any prediction). The mean accuracy
achieved by the experts was 50.39%.
A method using local approximation by pattern modeling and pattern recognition
techniques was also used for stock-market forecasting in Singh’s research [8].
The research used the S&P index monthly data from 1988 to 1996 for the
US financial market; series from other domains were also used. The whole time-series
is chopped into segments, and each segment is then represented as a pattern using a set
of tag values, where 0 represents a fall, and 1 represents a rise or a zero movement.
For a target segment, a prediction of the movement direction is made by identifying the
nearest historical match, which has the minimum difference in magnitude compared with
the target segment. The matched pattern is then used to predict the index in the next state.
The mean direction prediction accuracy achieved for the data sets used was 73.17%.
Singh encouraged further research to be done to enhance the pattern recognition
based tools for forecasting.
Section 3
Method
Pattern matching using Ant Algorithm is employed to generalize similar segments. This
helps us predict the future movement direction of a target segment. After determining
the future movement direction, several formulas are applied to calculate the index point
for the next trade date. The logical flow of the whole financial forecast process is shown
in Fig. 1.
Fig. 1 Logical flow of the whole financial forecast process
3.1 Data Set
S&P 500 Composite Index historical data is used for training and testing the system. Ten
years of S&P 500 Composite Index data, from 15-Sept-1994 to 15-Sept-2004,
2610 observations in total, were obtained from DataStream using terminals in the
Department of Economics and Finance of the City University of Hong Kong.
The data from 15-Sept-1994 to 12-Oct-1994 are unused, because segments formed
within this period would have undefined attributes due to inadequate historical data
(e.g. percentage difference between index and average index of last 20 days). The data
from 13-Oct-1994 to 19-Mar-2003, 2220 observations in total, are used for training
the system. The data from 20-Mar-2003 to 14-Sept-2004, 389 observations in total,
are used for testing, i.e. making actual forecasts. Reserving the test data is
essential, because it prevents the same segment from being found in the learnt data, which
may lead to over-optimistic results.
3.2 Pattern Matching
Generalization of segments is done according to some of the segment attributes
stated in Table 1. Segments which are too large or too small cannot be generalized
well. In other words, the segment size affects prediction accuracy. As the ants learn
how segments should be generalized, the optimal segment size can be determined
by picking the size with which the best prediction accuracy was achieved.
3.2.1 Segments
The time-series will be chopped into segments. Each segment is associated with a
number of attributes, which are sets of values for each index point in the segments. The
segment attributes are shown in Table 1.
Segment attribute                    Description
Absolute index                       The exact index point
Movement magnitude                   The change in index value
Movement percentage                  The percentage change in index value
Movement direction                   A rise, a fall or zero index movement
Percentage difference between        The percentage difference between index and
index and averaged index             averaged index
Table 1 Segment attributes
3.2.2 Sub-time-series
Sub-time-series are formed in order to reduce system loads and, at the same time,
provide a superior environment for ants to operate.
A time-series contains a number of sub-time-series, and a sub-time-series consists of
segments as shown in Fig. 2. A sub-time-series is actually a logical structure only,
which allows us to process the time-series group by group. The size of sub-time-series
is measured by the number of segments formed within that sub-time-series, and it is
proportional to the number of ants used for training or prediction. This “ant to
sub-time-series size” ratio is set to 1 in this project. The prediction accuracy may be
affected when this ratio changes.
[Figure omitted; labels: Index, Time, Time Series, Sub-time-series, Segment, Index Point]
Fig. 2 Relationships between time-series, sub-time-series, segment and index point
3.2.3 Ant Algorithm On Shortest Path Problem
The Ant algorithm is used to find the matched segments. Ant algorithms are very good at
solving shortest-path problems, for example, the traveling salesman problem [9]. Ants move
from the start to the destination, and then move back to the start using the same path.
As they walk along the path, they leave pheromone, a substance that attracts
other ants to choose that path instead of the others. An ant which chose the shortest
path returns to the start more quickly than other ants. Hence, the path it chose has a higher
concentration of pheromone than the other paths, which makes other ants more likely to
choose this path. However, pheromone evaporates over time, making the path less
attractive than before.
Fig. 3 shows how the Ant algorithm works for a shortest-path problem in a network.
Assuming all edges have the same length, the red path has a length of 4, while the blue
path has a length of 5. If two ants move together from the start to the destination, and
then back to the start using the same path, the ant which chooses the red path takes 8
steps to finish its journey, while the ant which chooses the blue path takes 10. So when
another ant starts its journey at step 8, it faces a situation in which the red path has
double the concentration of pheromone of the blue one, and hence the red
path is more attractive. It is more likely to choose the red path than the blue path.
[Figure omitted; labels: Node, Start, Destination]
Fig. 3 Shortest-path finding in a network using Ant Algorithm
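The step counting in the red/blue example can be checked with a minimal sketch. It assumes that an ant moves one edge per step, that one unit of pheromone is credited per completed one-way traversal, and that evaporation is ignored; the function name `completed_traversals` is hypothetical and not part of SMIFS.

```python
def completed_traversals(path_length: int, step: int) -> int:
    """Number of one-way traversals an ant has finished by `step`.

    Assumption: the ant moves one edge per step and makes a single
    round trip (out, then back), so at most two traversals are counted.
    """
    return min(step // path_length, 2)


red_length, blue_length = 4, 5
step = 8  # the moment at which another ant starts its journey

# Red ant has gone out and come back; blue ant has only finished the outward leg.
red_pheromone = completed_traversals(red_length, step)
blue_pheromone = completed_traversals(blue_length, step)

print(red_pheromone, blue_pheromone)  # 2 1
```

Under these assumptions the red path carries twice the pheromone of the blue path at step 8, which matches the "doubled" concentration described in the text.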
3.2.4 Ant Algorithm On This Project
To apply the Ant algorithm to our pattern matching problem, some of the listed
segment attributes are used to find the matched segment. Different ants give
different weights to these segment attributes, and they also give a weight to the other
ants’ choices, which simulates the effect of pheromone. A segment and a sub-time-series
to be evaluated are given, and different ants start from different, randomized locations
in the sub-time-series. An ant moves along the sub-time-series,
inspects its left-hand side and right-hand side, and stops on each side at the point where
it believes the matched pattern has been found. It then chooses whichever of the two has
the lower difference. The track it went through attracts other ants’ choices,
to a degree influenced by the weight given by the other ants. In Fig. 4, by
combining all individual decisions, the shaded part of the sub-time-series is found to be
matched with the target segment.
Fig. 4 Aggregate decision from individual ants
Our application of the Ant algorithm differs from the general application in that our
action space is not a network, but a two-way path. A sub-time-series can be treated as
a two-way path, since the number of edges connected to a node must be 1 or 2.
Pheromone left by ants can attract other ants to move forward or stop. In Fig. 4, when
Ant 4 reaches the track of Ant 1, it is encouraged to move forward by the pheromone
left by Ant 1. After it moves forward and reaches the track of Ant 2, it is encouraged to
move forward by the pheromone left by both Ant 1 and Ant 2. On the other hand, when Ant 4
reaches the end of Ant 1’s track, it is encouraged to stop, because the concentration of
pheromone in the next state is lower than in the current state.
3.2.5 Similarity Determination
A segment S of size n has a number of segment attributes:
Segment attribute                     Denoted by
Absolute indices                      $Y = \{y_1, y_2, \dots, y_{n-1}, y_n\}$
Movement magnitudes                   $V = \{v_2, v_3, \dots, v_{n-1}, v_n\}$, where $v_n = y_n - y_{n-1}$
Movement percentages                  $P = \{p_2, p_3, \dots, p_{n-1}, p_n\}$, where $p_n = (v_n / y_{n-1}) \times 100$
Movement directions                   $D = \{d_2, d_3, \dots, d_{n-1}, d_n\}$, where
                                      $d_n = 1/n$ when $v_n > 0$,
                                      $d_n = -1/n$ when $v_n < 0$,
                                      $d_n = 0$ when $v_n = 0$
Percentage difference between         $A(m) = \{a_1, a_2, \dots, a_{n-1}, a_n\}$, where
index and average index of last m     $a_n = \left( \left[ y_n - \left( \sum_{i=n-m}^{n} y_i \right) / m \right] / y_n \right) \times 100$
days
Table 2 Segment attribute notations
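The first three derived attributes in Table 2 can be illustrated with a minimal sketch. The function names are hypothetical, the input is assumed to be a plain list of index points for one segment, and A(m) is left out because it needs index points that precede the segment.

```python
from typing import List


def movement_magnitudes(y: List[float]) -> List[float]:
    """V: change from the previous index point, for the 2nd to n-th points."""
    return [y[k] - y[k - 1] for k in range(1, len(y))]


def movement_percentages(y: List[float]) -> List[float]:
    """P: each movement as a percentage of the previous index point."""
    return [(y[k] - y[k - 1]) / y[k - 1] * 100 for k in range(1, len(y))]


def movement_directions(y: List[float]) -> List[float]:
    """D: +1/n for a rise, -1/n for a fall, 0 for no movement (n = segment size)."""
    n = len(y)
    directions = []
    for v in movement_magnitudes(y):
        if v > 0:
            directions.append(1 / n)
        elif v < 0:
            directions.append(-1 / n)
        else:
            directions.append(0.0)
    return directions


segment = [100.0, 102.0, 101.0, 101.0]  # made-up index points, not real S&P data
print(movement_magnitudes(segment))  # [2.0, -1.0, 0.0]
print(movement_directions(segment))  # [0.25, -0.25, 0.0]
```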
The weighting function F consists of 6 weighted components: Movements in
percentage, movement directions, indices over average index of the last 5, 10, 15 and
20 days. All of these will be used in F to match two segments. Any ant determines the
level of similarity of two segments Sg and Sh by applying its weights to F:
$$F = w_1 (P_g - P_h) + w_2 (D_g - D_h) + \sum_{i=1}^{4} w_{i+2} \{ A_g(5i) - A_h(5i) \}, \quad \text{where } \sum_{i=1}^{n} w_i = 1$$
Any two identical segments generate a zero F value. It means when F approaches
zero, the similarity increases.
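A minimal sketch of how an ant might evaluate F for two segments follows. The report does not spell out how the difference between two attribute sets (for example Pg - Ph) is reduced to a number, so this sketch assumes the sum of absolute element-wise differences; the names `component_distance`, `weighting_function` and `COMPONENTS` are hypothetical.

```python
from typing import Dict, List, Sequence

# Order matches the six weights w1..w6: P, D, then A(5), A(10), A(15), A(20).
COMPONENTS = ("P", "D", "A5", "A10", "A15", "A20")


def component_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed distance between two attribute sets: sum of absolute differences."""
    if len(a) != len(b):
        raise ValueError("segments must have the same size")
    return sum(abs(x - y) for x, y in zip(a, b))


def weighting_function(weights: Sequence[float],
                       seg_g: Dict[str, List[float]],
                       seg_h: Dict[str, List[float]]) -> float:
    """F: weighted sum of the six component distances; 0 means identical segments."""
    if len(weights) != len(COMPONENTS):
        raise ValueError("one weight per component is required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * component_distance(seg_g[name], seg_h[name])
               for w, name in zip(weights, COMPONENTS))


weights = [0.3, 0.2, 0.125, 0.125, 0.125, 0.125]  # illustrative values only
g = {name: [1.0, 2.0] for name in COMPONENTS}
h = {name: list(values) for name, values in g.items()}
h["P"] = [1.5, 2.0]  # differs from g only in the first movement percentage

print(weighting_function(weights, g, g))  # 0.0
print(round(weighting_function(weights, g, h), 6))  # 0.15
```

With the absolute-difference assumption, F is never negative, so "F approaches zero" and "similarity increases" mean the same thing.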
3.2.6 Discounted-F
In prediction mode, the F value will be discounted to simulate the effect of pheromone.
The default maximum discount was set to 20% in this research. An F value can be
discounted using the following formulas:
$F' = F \left[ 1 - D'_M \, (P / T) \right]$    (1)
$D'_M = D_M \left[ 1 - (R_s - R_{sm}) / (R_{sM} - R_{sm}) \right]$    (2)
$R_s = S / (S + F)$    (3)
In the first formula, F’ is the discounted-F, D’M is the actual maximum discount, P
is the number of ants that passed by the segment, and T is the total number of working
ants. The formula discounts the F value by a portion of the actual maximum discount,
according to the percentage of ants that passed by the segment.
In the second formula, DM is the default maximum discount, Rs is the direction success
rate of the ant, Rsm is the minimum direction success rate among all ants, and RsM
is the maximum direction success rate among all ants. The formula sets the actual
maximum discount according to the ant’s learning performance compared to the
extremes. For example, the best-performing ant would have no discount, and the
worst-performing ant would have an actual maximum discount equal to the default one.
In the last formula, S is the number of direction successes in learning for the ant, and F is
the number of direction failures in learning for the ant (here F denotes a count, not the
F value).
[Figure omitted; labels: Next Segment, Chosen Segment, Previous Segment; F = 1.22, F = 1.25, F = 1.24, F = 1.32; Ant is moving towards the left]
Fig. 5 The ant should have taken one more move
Fig. 5 shows the F value of the next segment being discounted from 1.25 to 1.22. Let us
denote “next segment” as n, “chosen segment” as c, “previous segment” as p and
“target segment” as t. Assume that there are 100
working agents, of which 25 passed by n. The direction success rate of the ant is
54%, while the minimum among all ants is 38% and the maximum among all ants is
66%. The discounted F can be calculated as follows:
Rs = 0.54
D’M = 0.2 * [1 – (0.54 – 0.38) / (0.66 – 0.38)] = 0.085714
F’ = 1.25 * [1- 0.085714 * (25 / 100)] = 1.223214
The ant will choose n instead of c, since n now has a smaller F value than c. The
discounting effect is mainly determined by the number of ants that passed by the segment
and the ant’s learning performance.
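The worked example can be reproduced with a short sketch of formulas (1) to (3). The function and parameter names are hypothetical; the case where all ants share the same success rate (a zero denominator in formula (2)) is not covered by the report and is not handled here.

```python
def direction_success_rate(successes: int, failures: int) -> float:
    """Formula (3): share of correct direction predictions during learning."""
    return successes / (successes + failures)


def actual_max_discount(default_max: float, rate: float,
                        rate_min: float, rate_max: float) -> float:
    """Formula (2): scale the default maximum discount by the ant's relative rank."""
    return default_max * (1 - (rate - rate_min) / (rate_max - rate_min))


def discounted_f(f_value: float, actual_max: float,
                 ants_passed: int, total_ants: int) -> float:
    """Formula (1): discount F by the share of ants that passed by the segment."""
    return f_value * (1 - actual_max * (ants_passed / total_ants))


# Values from the example: default discount 20%, success rates 54% / 38% / 66%,
# 25 of 100 working ants passed by the next segment, whose F value is 1.25.
d = actual_max_discount(0.2, 0.54, 0.38, 0.66)
f_next = discounted_f(1.25, d, 25, 100)

print(round(d, 6), round(f_next, 6))  # 0.085714 1.223214
```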
3.3 Efficiency Enhancement
The time-series is stored in a database. Hence, agents query the database frequently
when learning or forecasting. This leads to a large number of I/O operations in the
system, which increases system load and decreases system performance. However,
a large amount of memory is required if we load the whole time-series into main
memory. This could also decrease system performance, depending on the time-series
size.
To facilitate efficient learning and forecasting, one sub-time-series is cached (i.e.
loaded into memory) at a time for ants to work with, so as to reduce the number
of I/O operations and keep memory consumption under control. After all ants have finished
working on the cached sub-time-series, it is replaced by the next one. There is
an overlapped period between two consecutive sub-time-series, in order to ensure all
patterns are evaluated. The size of the overlapped period equals (s – 1), where s is the
size of the segment being learnt. Fig. 6 shows the sub-time-series replacement and the
overlapped period in between.
[Figure omitted; labels: Time-series, Sub-time-series, Overlapped period, Target segment]
Fig. 6 Sub-time-series replacement
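The replacement scheme can be sketched as a generator of index ranges. This is an illustration only: it assumes the time-series is addressed by integer positions and that a sub-time-series holding k segments of size s spans k + s - 1 index points; the name `sub_time_series_windows` is hypothetical.

```python
from typing import Iterator, Tuple


def sub_time_series_windows(n_points: int, segment_size: int,
                            segments_per_window: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) ranges, end exclusive, for consecutive sub-time-series.

    Consecutive windows overlap by segment_size - 1 points, so every
    possible segment falls completely inside exactly one window.
    """
    window_len = segments_per_window + segment_size - 1
    start = 0
    while start + segment_size <= n_points:
        end = min(start + window_len, n_points)
        yield (start, end)
        if end == n_points:
            break
        start += segments_per_window  # leaves an overlap of segment_size - 1


windows = list(sub_time_series_windows(n_points=20, segment_size=3,
                                       segments_per_window=5))
print(windows)  # [(0, 7), (5, 12), (10, 17), (15, 20)]
```

Each pair of neighbouring windows above shares 2 points, which is s - 1 for a segment size of 3.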
3.4 Weight Adjustment
During the learning process, each ant finds the segment which it thinks is matched.
After all ants have made their decisions, the ants’ choices are compared to the target
segment. A choice is considered incorrect if the next movement direction of the chosen
segment is different from that of the target segment. In this case, the neighbours of the
chosen segment are inspected to determine how the ant should adjust its weights.
Fig. 7 The ant should have taken one more move
Let us denote “next segment” as n, “chosen segment” as c, “previous segment” as p
and “target segment” as t. In Fig. 7, the ant was moving towards the left, and it decided
not to move further after reaching c, because Fn is higher than Fc. As we mentioned
earlier, the F value consists of 6 components (named F-components), and they
make different contributions to the F value. We assume that the weight associated with
the maximum F-component should be adjusted in a mismatch. The reason for
this assumption is that adjusting the weight associated with the minimum F-component
could be ineffective, because the non-weighted component could be zero.
The ant chose an incorrect segment in Fig. 7, because the next movement direction of
c is different from that of t. After inspecting the neighbours of c (i.e. p and n), it is known
that the ant should have taken one more move to choose n, since the next segment
has the correct next movement direction. Based on our assumption, it is believed that
the maximum F-component of Fn inhibited the ant from moving forward to n. So, the
corresponding weight adjustment is to lower the weight associated with the maximum
F-component.
The same logic of weight adjustment is applied to the other cases, as presented in Table 3:
Correct neighbour     Correct ant action     Explanation for failure          Weight adjusted
Next segment          Take one more move     The effect of the maximum        Lower the weight
                                             F-component was too large,       associated with the
                                             and inhibited the ant from       maximum F-component
                                             moving forward
Previous segment      Do not take the        The effect of the maximum        Raise the weight
                      last move              F-component was too weak,        associated with the
                                             and could not inhibit the        maximum F-component
                                             ant from moving forward
No correct            Nil                    Nil                              Nil
neighbour
Table 3 Logic for adjusting weight for different situations
However, when neither neighbour is correct, it is assumed that the ant has already
made the optimal choice in its locality.
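The logic of Table 3 can be sketched as a single update function. The report does not state the size of an adjustment or how the weights are kept summing to 1, so the fixed `step` and the renormalization below are assumptions, and the name `adjust_weights` is hypothetical.

```python
from typing import List, Optional


def adjust_weights(weights: List[float], f_components: List[float],
                   correct_neighbour: Optional[str],
                   step: float = 0.05) -> List[float]:
    """Return new weights after a mismatch, following Table 3.

    correct_neighbour is "next", "previous", or None when neither
    neighbour has the correct next movement direction.
    """
    if correct_neighbour is None:
        return list(weights)  # the ant already made the locally optimal choice

    # Index of the maximum F-component (the first one wins on a tie).
    target = max(range(len(f_components)), key=lambda i: f_components[i])
    adjusted = list(weights)
    if correct_neighbour == "next":
        adjusted[target] = max(adjusted[target] - step, 0.0)  # lower the weight
    elif correct_neighbour == "previous":
        adjusted[target] += step  # raise the weight
    else:
        raise ValueError("correct_neighbour must be 'next', 'previous' or None")

    total = sum(adjusted)  # assumed renormalization so the weights sum to 1
    return [w / total for w in adjusted]


old = [0.5, 0.3, 0.2]
new = adjust_weights(old, f_components=[2.0, 1.0, 0.5], correct_neighbour="next")
print([round(w, 4) for w in new])
```

In this example the first weight is lowered because its F-component is the largest, and the other weights grow slightly through renormalization.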
3.5 Index Calculation
A number of formulas are used to calculate the predicted index point for the next trade
date after the movement direction has been determined, and their performance is
assessed by comparing the predictions with the actual index point. Table 4 lists these
formulas.
No  Formula and description
1   $y_{n+1} = y_n \pm \left( \sum_{i=1}^{n} v_i \right) / n$
    Take the averaged movement of the segment as the next movement
2   $y_{n+1} = y_n \pm \sum_{i=1}^{n} \left\{ i / \left( \sum_{j=1}^{n} j \right) \right\} v_i$
    Take the weighted averaged movement of the segment as the next movement,
    while the more recent the index point, the higher the weight applied
3   $y_{n+1} = y_n \pm y_n \left\{ \left( \sum_{i=1}^{n} p_i \right) / n \right\}$
    Take the averaged movement percentage of the segment as the next movement
    percentage
4   $y_{n+1} = y_n \pm y_n \left[ \sum_{i=1}^{n} \left\{ i / \left( \sum_{j=1}^{n} j \right) \right\} p_i \right]$
    Take the weighted averaged movement percentage of the segment as the next
    movement percentage, while the more recent the index point, the higher the
    weight applied
Table 4 Formulas used for index calculation
The predicted index point is the last index point plus a movement. The predicted
movement direction determines the sign of the movement.
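The four formulas of Table 4 can be sketched in a few lines. This is a minimal sketch under stated assumptions: $v_i$ and $p_i$ are taken to be the movement and the movement percentage of the i-th index point in the segment, both used as magnitudes (so the sign comes only from the predicted direction), and percentages are expressed as fractions. The function name `predict_next_index` is hypothetical.

```python
def predict_next_index(y_n, movements, pct_movements, direction):
    """Apply the four formulas of Table 4 and return {formula_no: prediction}.

    y_n           -- last index point of the segment
    movements     -- v_1..v_n, oldest first (magnitudes are used; assumption)
    pct_movements -- p_1..p_n as fractions, e.g. 0.01 for 1% (assumption)
    direction     -- +1 for Rise, -1 for Fall, 0 for Same
    """
    n = len(movements)
    v = [abs(x) for x in movements]
    p = [abs(x) for x in pct_movements]

    # Weight i / (1 + 2 + ... + n): more recent points get higher weights.
    total_rank = n * (n + 1) / 2
    w = [i / total_rank for i in range(1, n + 1)]

    mean_v = sum(v) / n
    weighted_v = sum(wi * vi for wi, vi in zip(w, v))
    mean_p = sum(p) / n
    weighted_p = sum(wi * pi for wi, pi in zip(w, p))

    return {
        1: y_n + direction * mean_v,
        2: y_n + direction * weighted_v,
        3: y_n + direction * y_n * mean_p,
        4: y_n + direction * y_n * weighted_p,
    }
```

With `y_n = 100`, movements `[1, -2, 3]`, percentage movements `[0.01, -0.02, 0.03]` and a predicted rise, formulas 1 and 3 both give 102, while the recency-weighted formulas 2 and 4 give about 102.33.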
Section 4
Results
Two experiments were done using an “ant to sub-time-series size” ratio of 1, meaning
that the number of ants used equals the number of segments formed within a
sub-time-series. This ratio affects the coverage of the ants’ inspection in a
sub-time-series. If the ratio is too small, some parts of the sub-time-series may not be
inspected by any ant: the ants would be over-dispersed, and few of them would choose
the same segment. On the other hand, if the ratio is too large, the sub-time-series may
be crowded with ants. In this situation, too many ants may start their inspections at the
same location, and they are likely to choose the same segment. Hence, the degree of
collaborative decision making will be lower.
From the data between 20-Mar-2003 and 14-Sept-2004, 35 samples were randomly
selected to test our system. 200 ants were used for prediction.
In the first experiment, the segment size was set to 9. The system predicted the next
movement direction successfully for 22 out of 35 samples, an accuracy of 62.86%.
The result for each prediction is shown in Table 5.
No | Predicted on | Predicted for | Actual next movement direction | Predicted next movement direction | Correct
--- | --- | --- | --- | --- | ---
1 | 15-Apr-2003 | 16-Apr-2003 | Fall | Fall | Yes
2 | 02-May-2003 | 05-May-2003 | Fall | Rise | No
3 | 12-May-2003 | 13-May-2003 | Fall | Fall | Yes
4 | 16-Jun-2003 | 17-Jun-2003 | Rise | Fall | No
5 | 25-Jun-2003 | 26-Jun-2003 | Rise | Fall | No
6 | 30-Jun-2003 | 01-Jul-2003 | Rise | Rise | Yes
7 | 08-Aug-2003 | 11-Aug-2003 | Rise | Rise | Yes
8 | 11-Aug-2003 | 12-Aug-2003 | Rise | Rise | Yes
9 | 18-Aug-2003 | 19-Aug-2003 | Rise | Rise | Yes
10 | 22-Aug-2003 | 25-Aug-2003 | Rise | Rise | Yes
11 | 29-Aug-2003 | 01-Sept-2003 | Same | Fall | No
12 | 19-Sept-2003 | 22-Sept-2003 | Fall | Fall | Yes
13 | 29-Sept-2003 | 30-Sept-2003 | Fall | Fall | Yes
14 | 03-Oct-2003 | 06-Oct-2003 | Rise | Rise | Yes
15 | 13-Oct-2003 | 14-Oct-2003 | Rise | Fall | No
16 | 15-Oct-2003 | 16-Oct-2003 | Rise | Rise | Yes
17 | 17-Oct-2003 | 20-Oct-2003 | Rise | Rise | Yes
18 | 26-Nov-2003 | 27-Nov-2003 | Same | Fall | No
19 | 02-Dec-2003 | 03-Dec-2003 | Fall | Fall | Yes
20 | 15-Dec-2003 | 16-Dec-2003 | Rise | Rise | Yes
21 | 02-Jan-2004 | 05-Jan-2004 | Rise | Fall | No
22 | 27-Jan-2004 | 28-Jan-2004 | Fall | Rise | No
23 | 27-Feb-2004 | 01-Mar-2004 | Rise | Rise | Yes
24 | 16-Mar-2004 | 17-Mar-2004 | Rise | Same | No
25 | 24-Mar-2004 | 25-Mar-2004 | Rise | Rise | Yes
26 | 30-Apr-2004 | 03-May-2004 | Rise | Rise | Yes
27 | 11-May-2004 | 12-May-2004 | Rise | Fall | No
28 | 26-May-2004 | 27-May-2004 | Rise | Fall | No
29 | 18-Jun-2004 | 21-Jun-2004 | Fall | Fall | Yes
30 | 30-Jun-2004 | 01-Jul-2004 | Fall | Fall | Yes
31 | 12-Jul-2004 | 13-Jul-2004 | Rise | Rise | Yes
32 | 30-Jul-2004 | 02-Aug-2004 | Rise | Rise | Yes
33 | 26-Aug-2004 | 27-Aug-2004 | Rise | Rise | Yes
34 | 03-Sept-2004 | 06-Sept-2004 | Same | Rise | No
35 | 09-Sept-2004 | 10-Sept-2004 | Rise | Fall | No
Table 5 SMIFS direction forecast result with a segment size of 9
The average weights of the 200 ants which worked with a segment size of 9 are shown
in Table 6.
Weight | Average of 200 Ants
--- | ---
1 | 0.08512
2 | 0.006535
3 | 0.250665
4 | 0.209735
5 | 0.30236
6 | 0.145585
Table 6 Average weights of ants that worked with a segment size of 9
For the 22 successful predictions, we compared the performance of the index calculation
formulas. Formula 2 performed the best of the four. The result is
presented in Table 7.
Formula No. | Average Error
--- | ---
1 | 0.52%
2 | 0.48%
3 | 0.52%
4 | 0.49%
Average | 0.5%
Table 7 SMIFS index forecast result for experiment 1
In the second experiment, the segment size was set to 3 and the same 35 samples
were used. However, the result was not better than that of the previous
experiment: the system predicted the next movement direction successfully for 16 out
of 35 samples, an accuracy of 45.71%. The result for each prediction is shown in
Table 8.
No | Predicted on | Predicted for | Actual next movement direction | Predicted next movement direction | Correct
--- | --- | --- | --- | --- | ---
1 | 15-Apr-2003 | 16-Apr-2003 | Fall | Fall | Yes
2 | 02-May-2003 | 05-May-2003 | Fall | Fall | Yes
3 | 12-May-2003 | 13-May-2003 | Fall | Fall | Yes
4 | 16-Jun-2003 | 17-Jun-2003 | Rise | Fall | No
5 | 25-Jun-2003 | 26-Jun-2003 | Rise | Rise | Yes
6 | 30-Jun-2003 | 01-Jul-2003 | Rise | Rise | Yes
7 | 08-Aug-2003 | 11-Aug-2003 | Rise | Fall | No
8 | 11-Aug-2003 | 12-Aug-2003 | Rise | Fall | No
9 | 18-Aug-2003 | 19-Aug-2003 | Rise | Fall | No
10 | 22-Aug-2003 | 25-Aug-2003 | Rise | Fall | No
11 | 29-Aug-2003 | 01-Sept-2003 | Same | Fall | No
12 | 19-Sept-2003 | 22-Sept-2003 | Fall | Fall | Yes
13 | 29-Sept-2003 | 30-Sept-2003 | Fall | Rise | No
14 | 03-Oct-2003 | 06-Oct-2003 | Rise | Fall | No
15 | 13-Oct-2003 | 14-Oct-2003 | Rise | Rise | Yes
16 | 15-Oct-2003 | 16-Oct-2003 | Rise | Fall | No
17 | 17-Oct-2003 | 20-Oct-2003 | Rise | Rise | Yes
18 | 26-Nov-2003 | 27-Nov-2003 | Same | Rise | No
19 | 02-Dec-2003 | 03-Dec-2003 | Fall | Fall | Yes
20 | 15-Dec-2003 | 16-Dec-2003 | Rise | Rise | Yes
21 | 02-Jan-2004 | 05-Jan-2004 | Rise | Fall | No
22 | 27-Jan-2004 | 28-Jan-2004 | Fall | Fall | Yes
23 | 27-Feb-2004 | 01-Mar-2004 | Rise | Fall | No
24 | 16-Mar-2004 | 17-Mar-2004 | Rise | Fall | No
25 | 24-Mar-2004 | 25-Mar-2004 | Rise | Rise | Yes
26 | 30-Apr-2004 | 03-May-2004 | Rise | Rise | Yes
27 | 11-May-2004 | 12-May-2004 | Rise | Rise | Yes
28 | 26-May-2004 | 27-May-2004 | Rise | Rise | Yes
29 | 18-Jun-2004 | 21-Jun-2004 | Fall | Rise | No
30 | 30-Jun-2004 | 01-Jul-2004 | Fall | Rise | No
31 | 12-Jul-2004 | 13-Jul-2004 | Rise | Rise | Yes
32 | 30-Jul-2004 | 02-Aug-2004 | Rise | Fall | No
33 | 26-Aug-2004 | 27-Aug-2004 | Rise | Fall | No
34 | 03-Sept-2004 | 06-Sept-2004 | Same | Fall | No
35 | 09-Sept-2004 | 10-Sept-2004 | Rise | Fall | No
Table 8 SMIFS direction forecast result with a segment size of 3
The average weights of the 200 ants which worked with a segment size of 3 are shown
in Table 9.
Weight | Average of 200 Ants
--- | ---
1 | 0.166155
2 | 0.00443
3 | 0.306105
4 | 0.22251
5 | 0.203905
6 | 0.096895
Table 9 Average weights of ants that worked with a segment size of 3
For the 16 successful predictions, we compared the performance of the index calculation
formulas. Formula 1 performed the best of the four. The result is
presented in Table 10.
Formula No. | Average Error
--- | ---
1 | 0.41%
2 | 0.43%
3 | 0.42%
4 | 0.44%
Average | 0.42%
Table 10 SMIFS index forecast result for experiment 2
Section 5
Discussion
Our best direction prediction result (62.86%, with a segment size of 9) was better than
that of genetic algorithm forecasting, which had a direction success rate of 50.39% [7].
However, Singh’s pattern matching method performed better, with a direction success
rate of 76% at a segment size of 3, which he suggested to be the optimal segment size [8].
Our system achieved 62.86% accuracy with a segment size of 9, but accuracy fell to 45.71%
when the segment size was 3. The difference between Singh’s findings and ours
indicates that different pattern matching implementations may have different
optimal segment sizes; a system applying a reinforcement learning algorithm and one
applying another machine learning algorithm need not share the same optimum. Possibly
the optimal segment size was not reached in this project, and finding it could further
improve the system’s direction success rate.
Apart from the segment size, the “ant to sub-time-series size” ratio also affects system
performance. Throughout this project, the ratio was set to 1. With a ratio of 1, the
sub-time-series may still be overcrowded, so a smaller ratio, for example 0.5, might give
a better result. This suggests that the “ant to sub-time-series” ratio should be well
controlled when applying Ant Algorithm to two way path environments.
The performance of the index calculation formulas was acceptable: every formula had an
average error between 0.41% and 0.52% across the two experiments. These formulas made
use only of the predicted movement direction and the target segment data; the
matched segment data were not taken into account. Further research could try to make
use of the matched segment data as well.
Finally, the average weights showed a high importance for weight3, weight4 and
weight5. During the pattern matching process, these weights were applied to three of the
F-components: the percentage differences in the average index of the last 5 days, last
10 days and last 15 days respectively. With a segment size of 9, the three weights
contributed 0.76276 of the total weight of 1. Similarly, with a segment size of 3, they
contributed 0.73252. The result suggests that these three factors
are considerably more important than the other factors for finding a matched segment,
regardless of the segment size. The movement direction, however, was not
considered important: it contributed less than 0.01 to the total weight in both
experiments.
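The shares quoted above can be reproduced directly from the average weights in Table 6 and Table 9. The short check below is only a convenience sketch; the names `avg_weights` and `moving_average_share` are hypothetical.

```python
# Average weights copied from Table 6 (segment size 9) and Table 9 (segment size 3).
avg_weights = {
    9: [0.08512, 0.006535, 0.250665, 0.209735, 0.30236, 0.145585],
    3: [0.166155, 0.00443, 0.306105, 0.22251, 0.203905, 0.096895],
}

def moving_average_share(weights):
    """Sum of weight3, weight4 and weight5 (list positions 2 to 4)."""
    return sum(weights[2:5])

for seg_size, weights in avg_weights.items():
    print(seg_size, round(moving_average_share(weights), 5), weights[1])
```

In both experiments the six average weights sum to 1, so the sum of weight3 to weight5 can be read directly as a share of the total weight.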
Section 6
Conclusion
Applying Ant Algorithm to a two way path environment is a relatively new idea. More
research is needed to explore the optimal settings in order to make the
algorithm perform better. Such an algorithm accommodates a flexible and dynamic
pattern matching task in time series analysis.
ANN has been widely used in solving this kind of problem, and gives
considerably good results. Nonetheless, it does not provide any information about its
decision criteria, so we may question where the highly accurate results come from. This
is why searching for a new method for solving this kind of problem is
worthwhile. By applying a pattern matching technique, we can fully understand
why a certain pattern is considered a match, and it is also clear how the matched
pattern affects the result. This helps us learn more about the hidden structure inside
the time series.
The pattern matching method based on Ant Algorithm developed in this project worked
better with a larger segment size. Matching a larger segment is more difficult
than matching a smaller one, because larger segments are harder to generalize. This
may also explain why a larger segment size led to a better result in this project.
For this reason, we suggest a larger segment size if further research
takes the matched segment into account in the index calculation. However, the
size of the data set may have to increase, since a larger segment may not repeat itself
within a relatively short period according to the Elliott Wave Principle.
Section 7
References
[1] Gwinn, T. (2004). Robert Rosen – Complexity in a Nutshell. [Online]. Available:
http://www.panmere.com/rosen/faq_complex1.htm [2004, Sept. 21]
[2] Johnson, N. F., Paul Jefferies, and Pak Ming Hui. (2003). Financial Market Complexity.
New York: Oxford University Press.
[3] Investopedia.com. Financial Concepts – Random Walk Theory. [Online]. Available:
http://www.investopedia.com/university/concepts/concepts5.asp [2004, Sept. 21]
[4] Lippi, M., and Daniel Thornton. (2004). A Dynamic Factor Analysis of the Response of
U.S. Interest Rates to News. [Online]. Research Division of the Federal Reserve
Bank of St. Louis. Available:
http://research.stlouisfed.org/wp/2004/2004-013.pdf [2004, Sept. 22]
[5] 任若恩, 馬向前, 沈沛龍, 劉莉亞, 及鄧雲勝. (2003). 技術分析 [Technical Analysis]. 北京 [Beijing]:
中國財政經濟出版社 [China Financial and Economic Publishing House].
[6] Wu, Shaun-inn, and Ruey-Ping Lu. (1993). Combining Artificial Neural Networks and
Statistics for Stock-Market Forecasting. Proceedings of the 1993 ACM conference on
Computer science, 257-64.
[7] Tsang, Edward, and Jin Li. (2000). Combining Ordinal Financial Predictions with Genetic
Programming. Proceedings of the 2nd International Conference on Intelligent Data
Engineering and Automated Learning, Data Mining, Financial Engineering, and
Intelligent Agents, 532-37.
[8] Singh, S., and Paul McAtackney. (1998). Dynamic Time-Series Forecasting using Local
Approximation. Proceedings of the 10th IEEE International Conference on Tools with
AI, 392-99.
[9] Bonabeau, E., and Guy Théraulaz. (2000). Swarm Smarts. [Online]. Scientific American,
Inc. Available:
http://dsp.jpl.nasa.gov/members/payman/swarm/sciam_0300.pdf [2004, Sept. 23]
Section 8
Appendix A - Experience & Difficulty
8.1 Use of Ant Algorithm
Designing how Ant algorithm could be used to solve the pattern matching problem in this
project was a challenging task. We considered putting segments into a
network and using Ant algorithm to find the first matched segment, just like
solving the shortest-path problem. However, this requires criteria for determining
where to put a given segment in the network, and such criteria are difficult to define.
This made the method infeasible.
Another alternative was to loop through each segment in the time-series and compare
it with the target segment using some formulas, so that no agent
is used. This method is relatively hardwired, i.e. the system always behaves the
same way, because only one party evaluates segment similarity, always using
the same formula.
We also considered using reinforcement learning algorithms other than Ant
algorithm. A traditional reinforcement learning algorithm uses agents for decision
making, but one agent cannot influence another’s choice. Each agent therefore
works on its own, as if no other agent existed. The only advantage is that
more than one suggested matched segment is available for reference, but the
agents’ decisions may be unrelated to one another.
Finally we decided to use the current method, i.e. applying Ant algorithm to a 2-way
path environment. We allow ants to start from a random position on a path, and they
inspect their own locality to decide which segment is the best match in that
locality. Our method has advantages over the previous methods. First, we need not
consider categorizing segments. Second, ants apply their own sets of weights to the F
function and so generate different results. Lastly, it employs ants for collaborative
decision making, in which the choices of different ants are correlated.
8.2 Weighting Function Design
When setting the criteria for pattern matching, we originally planned to include
absolute indices and movement magnitudes in the similarity determination. This is
meaningful for prediction if people’s decisions differ for the same
segment pattern at different index levels. For example, different index levels may reflect
different economic prosperity and investment environments, which affect people’s
willingness to invest and ultimately the financial index movement.
However, since absolute indices and movement magnitudes have no lower or upper limits,
they cannot be normalized. They may give an incorrect indication of the level of similarity
if used as F-components.
Even though these criteria are meaningful, we failed to normalize their values and
finally chose not to include them in the weighting function.
8.3 Weight Adjustment Optimization
Another difficult task was to optimize the weight adjustment method. We realized that,
for each correct neighbour (as presented in Table 3), there are two possible
interpretations, which lead to different adjustment rules. Table 11 shows the
two explanations for pattern matching failure:
Correct neighbour | Adopted explanation for failure | Another explanation for failure
--- | --- | ---
Next segment | The effect of the maximum F-component was too large, and inhibited the ant from moving forward | The effect of the minimum F-component was too small (i.e. its value was too big), and could not push the ant to move one extra step
Previous segment | The effect of the maximum F-component was too weak, and could not inhibit the ant from moving forward | The effect of the minimum F-component was too strong (i.e. its value was too small), and pushed the ant to move one extra step
No correct neighbour | Nil | Nil
Table 11 Different explanations for pattern matching failure
Weight adjustment rules could be set out by combining the two types of explanations
for the two cases, giving 2 × 2 = 4 possible rule sets.
Originally we applied another adjustment rule: we raised the weight associated
with the minimum F-component when the previous neighbour was correct. The
underlying assumption was that the value of the minimum F-component was too
small, so the ant made the last move.
This assumption seemed to be correct, and the method was also attractive because it does
not lead to weight bias, i.e. a certain weight always increasing or always decreasing. After
the weight associated with the minimum F-component rises to a certain level, that
component is no longer the minimum F-component, and the adjustment is then made on
another weight.
However, the value of the minimum F-component can be zero, when the two
segments are exactly the same on that criterion. In that case, no matter how much we
adjust the weight, the adjustment is ineffective. Therefore, this method is not
practical.
Since adjusting the weight associated with the minimum F-component could be
ineffective, we decided to adopt the current method, adjusting the weight associated
with the maximum F-component for both cases.
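The problem with a zero-valued minimum F-component can be seen in a small numeric sketch. It assumes, for illustration only, that F is a weighted sum of its components; the function name `weighted_f` and the values used are made up.

```python
def weighted_f(weights, components):
    """F as a weighted sum of F-components (an assumed form of the F function)."""
    return sum(w * c for w, c in zip(weights, components))

# The first component is 0: the two segments are identical on that criterion.
components = [0.0, 0.4, 0.7]

base = weighted_f([0.2, 0.3, 0.5], components)
# Raising the weight of the minimum (zero) component leaves F unchanged.
raised_min = weighted_f([0.9, 0.3, 0.5], components)
# Raising the weight of the maximum component does change F.
raised_max = weighted_f([0.2, 0.3, 0.6], components)
```

Here `base` and `raised_min` are identical because any weight multiplied by a zero component contributes nothing, whereas `raised_max` is larger than `base`.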
8.4 Project Management
From this project, we realized the importance of good project planning and of making the
effort to keep up with intermediate deadlines. They ensure progress and help us
get things done on time.
It is sometimes tough to keep up with deadlines in intermediate stages, because they
seem unimportant. Of course, this is not the case. By keeping up with them, the
final product can be completed with less effort at the end of the project. Conversely, if
one deadline is missed, team members find it difficult and discouraging to keep up
when the second one approaches. As a result, the team falls further and further behind
the target as each deadline passes. Ultimately the team has to spend a large amount
of effort to complete the project at the end, only to produce low-quality work.
Section 9
Appendix B - Database Design
Five tables were created in SMIFS; they are shown in Fig. 8:
Fig. 8 Relations in Stock Market Index Forecast System (SMIFS)
1. Table Overview
Table Name | Description
--- | ---
IDX_VAL | Stock market index valuation per date
IDX | Stock market index
AGENT | Ants’ attributes
AGENT_WEIGHT | Weights applied to weighting function for each ant
MATCHING_RESULT | Temporary storage for summarizing chosen segments
2. Table Details
IDX_VAL table
Field Name | Data type & Size | Description
--- | --- | ---
IID | NUMERIC(6,0) | Instrument ID
DATE | NUMERIC(8,0) | Trade date
VAL | NUMERIC(7,2) | Index point
MOV | NUMERIC(7,2) | Movement
MOV_PCT | NUMERIC(5,2) | Movement in percentage
MOV_DIR | NUMERIC(1,0) | Movement in direction
AVG_LST5 | NUMERIC(8,3) | Averaged index of the last 5 trade dates
AVG_LST10 | NUMERIC(8,3) | Averaged index of the last 10 trade dates
AVG_LST15 | NUMERIC(8,3) | Averaged index of the last 15 trade dates
AVG_LST20 | NUMERIC(8,3) | Averaged index of the last 20 trade dates
IDX table
Field Name | Data type & Size | Description
--- | --- | ---
IID | NUMERIC(6,0) | Instrument ID
CODE | VARCHAR(8) | Instrument abbreviation
NAME | VARCHAR(30) | Instrument full name
AGENT table
Field name | Data type & Size | Description
--- | --- | ---
AID | NUMERIC(6,0) | Agent ID
IID | NUMERIC(6,0) | Instrument ID
SEG_SIZE | NUMERIC(3,0) | Segment size
GRP | NUMERIC(3,0) | Group
DATE_CRET | NUMERIC(8,0) | Creation date
DATE_INIT | NUMERIC(8,0) | Initialization date
NUM_LEARN | NUMERIC(8,0) | Number of learnings done
NUM_PREDICT | NUMERIC(8,0) | Number of predictions made
DIR_SUCCESS | NUMERIC(8,0) | Number of successes in predicting movement direction
DIR_FAIL | NUMERIC(8,0) | Number of failures in predicting movement direction (for consistency checking)
AGENT_WEIGHT table
Field name | Data type & Size | Description
--- | --- | ---
AID | NUMERIC(6,0) | Agent ID
WEIGHT1 | NUMERIC(4,3) | Weight for movement percentage match
WEIGHT2 | NUMERIC(4,3) | Weight for movement direction match
WEIGHT3 | NUMERIC(4,3) | Weight for average index (last 5) match
WEIGHT4 | NUMERIC(4,3) | Weight for average index (last 10) match
WEIGHT5 | NUMERIC(4,3) | Weight for average index (last 15) match
WEIGHT6 | NUMERIC(4,3) | Weight for average index (last 20) match
MATCHING_RESULT table
Field name | Data type & Size | Description
--- | --- | ---
DATE_BEGIN | NUMERIC(8,0) | Date begin of the chosen segment
DATE_END | NUMERIC(8,0) | Date end of the chosen segment
F | NUMERIC(10,8) | F value
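As an illustration of how the AGENT_WEIGHT table relates to the average weights reported in Table 6 and Table 9, the sketch below builds the table in an in-memory SQLite database and averages each weight column. The use of SQLite, the primary key on AID, the helper names and the two sample rows are all assumptions made for the example; the report does not state which database engine SMIFS uses.

```python
import sqlite3

WEIGHT_COLUMNS = ["WEIGHT%d" % i for i in range(1, 7)]

def create_agent_weight_table(conn):
    """Create AGENT_WEIGHT with the field names and types listed above."""
    columns = ", ".join(c + " NUMERIC(4,3)" for c in WEIGHT_COLUMNS)
    # The primary key on AID is an assumption, not stated in the report.
    conn.execute(
        "CREATE TABLE AGENT_WEIGHT (AID NUMERIC(6,0) PRIMARY KEY, " + columns + ")"
    )

def average_weights(conn):
    """Return the average of each weight column over all agents."""
    averages = ", ".join("AVG(%s)" % c for c in WEIGHT_COLUMNS)
    return conn.execute("SELECT " + averages + " FROM AGENT_WEIGHT").fetchone()

conn = sqlite3.connect(":memory:")
create_agent_weight_table(conn)
# Two made-up agents, purely for illustration.
conn.executemany(
    "INSERT INTO AGENT_WEIGHT VALUES (?, ?, ?, ?, ?, ?, ?)",
    [(1, 0.1, 0.0, 0.3, 0.2, 0.3, 0.1), (2, 0.2, 0.0, 0.2, 0.2, 0.3, 0.1)],
)
averages = average_weights(conn)
```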
Section 10
Appendix C - System Design
10.1 Class Description
Table 12 lists the classes defined in this project:
Class Name | Description
--- | ---
SMIFS | Stock Market Index Forecast System. It encapsulates all system components
AgentManager | Agent Management Component. It provides functions for maintaining agents (ants)
IndexManager | Index Management Component. It provides functions for maintaining stock market indices
PatternMatcher | Pattern Matching Component. It matches segments in prediction mode, and improves ants in learning mode
FutureIndexCal | Future Index Calculation Component. It calculates the next index valuation for a segment in prediction mode after PatternMatcher finds a matched segment. It uses various formulas in the calculation and records the performance of different formulas
Agent | An individual participating in the pattern matching process. It decides which segment in the time-series matches the target segment. Every agent has its own set of weights, which are applied to the weighting function F
Segment | A set of index data in a given sub-time-series with a predefined size
SubTsPeriod | A data structure that holds details of a sub-time-series
IndexPt | An index point in a time-series. It records details of financial data on a particular date
Difference | A data structure contained by a Segment for holding each component that forms the F value
JourneyGroup | A data structure that specifies a group of agents managed by one thread during the pattern matching process
Table 12 Classes defined in the system
10.2 UML Design
1. Use Case Diagram
[Use case diagram: system SMIFS; actor User; use cases Train, Predict, ManageAgent and ManageIndex.]
2. Class Diagram (Accessors / Attributes are not shown)
[Class diagram; the classes and members recoverable from the figure are listed below.]
SMIFS
  Attributes: agentMan : AgentManager; indexMan : IndexManager; sysStatus : SystemStatus; patternMatcher : PatternMatcher
  Operations: startOp(); manageAgent(); manageIndex()
SystemStatus
  Attributes: indexCode : String; opMode : Integer; opStartTime : DateTime; subTsSize : Integer; cntWorkingAgent : Integer; cntSubTsProcessed : Integer; cntSegProcessed : Integer; cntTotalSubTs : Integer; tsDateBegin : Date; tsDateEnd : Date; subTsDateBegin : Date; subTsDateEnd : Date; segDateBegin : Date; segDateEnd : Date
AgentManager
  Attributes: cntAgent : Integer; agent : ArrayList
  Operations: createAgent(); deleteAgent(); modifyAgent(); countAgent(); listAgent()
IndexManager
  Attributes: cntIndex : Integer; index : ArrayList
  Operations: createIndex(); deleteIndex(); modifyIndex(); countIndex(); listIndex()
PatternMatcher
  Attributes: journeyGrp : JourneyGroup; cretAgent : Boolean; cntCret : Integer; indexID : Integer; indexCode : String; opMode : Integer; dateBegin : String; dateEnd : String; datePredict : String; segSize : Integer; subTsSize : Integer; agentGrp : Integer; cntAgent : Integer; cntSeg : Integer; cntSubTs : Integer; indexPt : ArrayList; agent : ArrayList; seg : ArrayList; subTsPeriod : ArrayList; sysStat : SystemStatus; targetSeg : Segment; matchedSeg : Segment
  Operations: start(); match(); manageJourney(); allJourneyGroupFinish(); determineMatchedSegment(); adjustWeight(); applyAdjustment(); buildTargetSeg(); partitionSubTs(); prepareAgent(); loadIndexData(); buildSegment(); createAgent(); indexCountRec(); agentCountRec(); getMinSuccessRate(); getMaxSuccessRate()
FutureIndexCal
  Attributes: matchedSeg : Segment; targetSeg : Segment; indexCode : String; actual : IndexPt; predictMovDir : Integer
  Operations: calulationMethod1(); calulationMethod2(); calulationMethod3(); calulationMethod4()
Agent
  Attributes: agentID : Integer; indexID : Integer; segSize : Integer; agentGrp : Integer; dateCreated : String; dateInit : String; cntLearn : Integer; cntPredict : Integer; cntDirSuccess : Integer; cntDirFail : Integer; weight[n] : Double; idxChosenSeg : Integer; journeyDir : Integer; chosenSegStartDate : String; chosenSegEndDate : String; chosenF : Double; chosenNextMovDir : Integer
  Operations: reset(); haveChosenSeg(); resetChosenF()
Segment
  Attributes: isTarget : Boolean; size : Integer; dateBegin : String; dateEnd : String; indexID : Integer; idxIndex[size] : Integer; idxNextIndex : Integer; index[size] : IndexPt; cntVisited : Integer; diff : Difference
  Operations: CloneWithIdxData(); CloneWithIdxRef()
Difference
  Attributes: movPct : Double; movDir : Double; avgLst5 : Double; avgLst10 : Double; avgLst15 : Double; avgLst20 : Double
  Operations: WeightedF(); MinIndividualDiff(); MaxIndividualDiff()
IndexPt
  Attributes: indexID : Integer; date : String; indexValue : Double; movValue : Double; movPct : Single; movDir : Double; avgLst5 : Double; avgLst10 : Double; avgLst15 : Double; avgLst20 : Double
JourneyGroup
  Attributes: beingManaged : Boolean; allFinished : Boolean; idxFirstAgent : Integer; cntAgentManaged : Integer
3. Sequence Diagram - Train
[Sequence diagram. Participants: User, SMIFS, PatternMatcher, SystemStatus. Messages, in order of appearance:]
  1. startOp(opMode, indexId, indexCode, startDate, endDate, predictDate, segSize, agentGrp, cntAgent, sysStat)
  2. start()
  3. partitionSubTs(indexId, dateStart, dateEnd)
  4. prepareAgent(indexId, segSize, agentGrp, cntAgent, cretAgent, cntCret)
  5. match()
  6. (Update system runtime status)
  7. loadIndexData(opMode, indexId, subTsDateBegin, subTsDateEnd, segSize)
  8. buildSegment(opMode, indexId, segSize)
  9. buildTargetSeg()
  10. manageJourney()
  11. determineMatchedSegment()
  12. adjustWeight()
  13. applyAdjustment()
4. Sequence Diagram - Predict
[Sequence diagram. Participants: User, SMIFS, PatternMatcher, SystemStatus, FutureIndexCal. Messages, in order of appearance:]
  1. startOp(opMode, indexId, indexCode, startDate, endDate, predictDate, segSize, agentGrp, cntAgent, sysStat)
  2. start()
  3. partitionSubTs(indexId, dateStart, dateEnd)
  4. prepareAgent(indexId, segSize, agentGrp, cntAgent, cretAgent, cntCret)
  5. match()
  6. (Update system runtime status)
  7. loadIndexData(opMode, indexId, subTsDateBegin, subTsDateEnd, segSize)
  8. buildSegment(opMode, indexId, segSize)
  9. buildTargetSeg(predictDate)
  10. manageJourney()
  11. determineMatchedSegment()
  12. calIndexMethod1()
  13. calIndexMethod2()
  14. calIndexMethod3()
  15. calIndexMethod4()
5. Sequence Diagram - Manage Agent
[Sequence diagram. Participants: User, SMIFS, AgentManager. Messages, in order of appearance:]
  1. manageAgent()
  2. listAgent()
  3. createAgent(indexCode, segSize, agentGrp, count)
  4. listAgent()
  5. deleteAgent(rowIndex)
  6. listAgent()
  7. modifyAgent(rowIndex, segSize, agentGrp)
  8. listAgent()
6. Sequence Diagram - Manage Index
[Sequence diagram. Participants: User, SMIFS, IndexManager. Messages, in order of appearance:]
  1. manageIndex()
  2. listIndex()
  3. createIndex(indexCode, indexName)
  4. listIndex()
  5. deleteIndex(rowIndex)
  6. listIndex()
  7. modifyIndex(rowIndex, indexCode, indexName)
  8. listIndex()
10.3 Design Special
1. Overall Design
The idea of using a pattern matching technique with index calculation for forecasting
a stock market index was borrowed from Singh’s research [8]; however, the criteria for
matching a pattern were changed. In addition, the system is given reinforcement learning
capabilities, and it compares the performance of various formulas during the index
calculation process.
The system design is data-oriented. For example, the IndexPt class represents the
financial figures of a particular date. This design facilitates systematic and effective
data handling.
2. Divided Time-series
To facilitate efficient learning and prediction, the data set is divided into
sub-time-series. At any point in time, one sub-time-series is cached for agents to
work with. This prevents agents from querying the database individually and
concurrently, which would lead to very high I/O activity in the system. The Pattern
Matching Component is responsible for determining how the time-series should be
divided according to the “ant to sub-time-series” ratio.
3. IndexPt Indices in Segment class
An integer array idxIndex is an attribute of the Segment class (see Class Diagram). It
stores the ArrayList indices of indexPt in the PatternMatcher class instead of the actual
index data; the actual index data are stored in indexPt (except for the target segment).
The reason for this design is that storing actual index data consumes significantly more
memory than storing array indices. Assuming the segment size is 6, two consecutive
segments have 5 overlapping indexPt (see Fig. 9). By storing array indices, the
actual index data are stored only once. The design aims at controlling memory
consumption.
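The idea of storing positions rather than copies can be sketched as follows. This is a simplified sketch, not the project's code: plain floats stand in for IndexPt objects, the helper names `build_segments` and `segment_values` are hypothetical, and the index values are made up.

```python
def build_segments(n_points, seg_size):
    """Return every segment as a list of positions into the shared point list."""
    return [list(range(start, start + seg_size))
            for start in range(n_points - seg_size + 1)]

def segment_values(index_points, segment):
    """Resolve a segment's stored positions to the actual index data."""
    return [index_points[i] for i in segment]

# Shared store of index points (made-up values); each point is kept only once.
index_points = [100.0, 101.5, 99.8, 102.2, 103.0, 102.7, 104.1]

segments = build_segments(len(index_points), seg_size=6)
overlap = set(segments[0]) & set(segments[1])
```

With a segment size of 6, the two consecutive segments built here share 5 positions, yet the underlying values exist only in `index_points`.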
[Figure labels: Sub-time-series; Segment 1; Segment 2; An index point; Overlapped index points]
Fig. 9 Overlapped index points of two consecutive segments
4. calculationMethodN() in FutureIndexCal class
Each calculation function uses a different formula to calculate the next index for a
segment. The results obtained with these formulas are written to a CSV file for
analysis.
5. PatternMatcher class
PatternMatcher is a core component of the system. Because the time-series is divided,
its structure is complex. Fig. 10 shows the relations between its attributes in detail.
Note that when the segment size is 2 and the sub-time-series size is 14,
13 segments are formed in the sub-time-series.
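The segment count in this example follows from sliding a window of the segment size over the sub-time-series one index point at a time. The sketch below states that arithmetic and, as a further assumption, derives the number of ants from the “ant to sub-time-series size” ratio by rounding; both function names are hypothetical.

```python
def segments_in_sub_ts(sub_ts_size, seg_size):
    """Number of overlapping segments formed in one sub-time-series."""
    if seg_size < 1 or sub_ts_size < seg_size:
        return 0
    return sub_ts_size - seg_size + 1

def ants_for_ratio(ratio, sub_ts_size, seg_size):
    """Ants needed for a given ratio (rounding is an assumption)."""
    return round(ratio * segments_in_sub_ts(sub_ts_size, seg_size))

print(segments_in_sub_ts(14, 2))
print(ants_for_ratio(1, 208, 9))
```

The first call reproduces the 13 segments mentioned above. The second uses a made-up sub-time-series size of 208 to show a ratio of 1 matching 200 ants to 200 segments of size 9.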
Fig. 10 Relations between PatternMatcher attributes
Section 11
Appendix D - GUI Design
11.1 Main Interface
Fig. 11 shows the Graphical User Interface (GUI) of SMIFS, which is displayed after
launching the program. Some information must be supplied before SMIFS can start
learning or predicting. The information can be divided into 3 categories: i) operational
data, ii) index data, and iii) agent-related data. Information belonging to the same
category is grouped together in the user interface, so that it is well-organized and
easy to use.
Fig. 11 Main screen of SMIFS
There are 3 groups on the user interface, which separate the 3 categories of
inputs. In the Operation group, the Mode dropdown box allows the user to specify the
operation type, which can be Learn or Predict. The Start button is used to start the
operation after the user has provided all information through the user interface.
In the Index Data group, the user can specify which index to operate on. The Code
dropdown box lists all available index data by their Reuters code. When the user
selects another index code, the Name textbox is updated automatically
to show the index’s full name. The user can decide how many index points
form a segment with the Segment Size textbox. The Start Date and End Date
textboxes together define the period used for learning in Learn mode, or for
segment pattern matching in Predict mode. The Predict On textbox specifies
the date on which the prediction is made. Start Date must be earlier than End Date in any
operation mode, and End Date must be earlier than Predict On in Predict mode.
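The date constraints described above amount to a small validation routine. The sketch below is illustrative only: the function name `validate_dates`, the error messages and the use of Python `date` objects are assumptions (the database stores dates as NUMERIC(8,0)), and the sample dates are made up.

```python
from datetime import date

def validate_dates(start, end, predict_on=None, mode="Learn"):
    """Return a list of error messages; an empty list means the input is valid."""
    errors = []
    if not start < end:
        errors.append("Start Date must be earlier than End Date")
    if mode == "Predict":
        if predict_on is None:
            errors.append("Predict On is required in Predict mode")
        elif not end < predict_on:
            errors.append("End Date must be earlier than Predict On")
    return errors

ok = validate_dates(date(2003, 3, 20), date(2003, 4, 14),
                    date(2003, 4, 15), mode="Predict")
bad = validate_dates(date(2003, 3, 20), date(2003, 4, 15),
                     date(2003, 4, 15), mode="Predict")
```

Here `ok` is empty, while `bad` reports that End Date is not earlier than Predict On.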
In the Agent group, the user can specify how many ants to use in the Count textbox, and
which group of ants to use in the Group textbox. A group number is associated with each
ant, so that ants working on the same index with the same segment size can be
separated into different groups. This grouping strategy makes it possible to compare ants
that work with the same parameters but under different system settings.
Lastly, the Create When Necessary checkbox indicates what SMIFS should do if there are
fewer ants than specified. If the checkbox is checked, SMIFS creates enough
ants to meet the Count given by the user; if it is not checked, SMIFS does not start
the operation with an insufficient number of ants.
After the operation starts, SMIFS tries to partition the index data between Start Date
and End Date into sub-time-series. If the last sub-time-series does not obey the
“ant to sub-time-series” ratio, the user is shown the number of segments formed in
the last sub-time-series and asked whether to continue the operation. This check is
important. For example, if only 2 segments are formed within the last sub-time-series,
hundreds of ants would each be choosing 1 of just 2 segments. This is not meaningful
in our algorithm, because the starting points of the ants are not randomized enough,
and the number of segments for the ants to choose from is not large enough.
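The check on the last sub-time-series can be sketched as below. The sketch rests on two assumptions that are not stated in this section: segments are taken as non-overlapping blocks of Segment Size index points, and the “ant to sub-time-series” ratio is represented by a fixed number of segments per sub-time-series (`segments_per_sub_series`). All names are hypothetical.

```python
from typing import Tuple


def split_into_sub_series(n_points: int, segment_size: int,
                          segments_per_sub_series: int) -> Tuple[int, int]:
    """Return (full sub-time-series, segments left for a short last one).

    Assumes non-overlapping segments and a fixed segment count per
    sub-time-series; both are simplifications made for this sketch.
    """
    if segment_size <= 0 or segments_per_sub_series <= 0:
        raise ValueError("sizes must be positive")
    total_segments = n_points // segment_size
    full, leftover = divmod(total_segments, segments_per_sub_series)
    return full, leftover


def needs_confirmation(leftover_segments: int) -> bool:
    """A short last sub-time-series should trigger the prompt to the user."""
    return leftover_segments > 0
```

Under these assumptions, 1020 index points with a segment size of 10 and 50 segments per sub-time-series give 2 full sub-time-series and a last one with only 2 segments, which is the situation the prompt is meant to catch.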
By grouping every input into one of these three categories, the main interface stays simple, clear and well-organized.
11.2 Agent and Index Management Interface
Fig. 12 shows the menu items used to launch the agent and index management
interfaces from the main interface. Clicking either of these menu items launches
another user interface for managing the corresponding type of data.
Fig. 12 Menu items for launching interfaces for Manage Agent and Manage Index
After the user clicks Manage Agent, the Agent Manager interface is launched, as shown
in Fig. 13. Inputs for each function are grouped together. This is why some fields are
duplicated; for instance, there are two Index Code dropdown boxes in the interface.
However, this does not confuse the user. On the contrary, because the inputs for a
particular function are grouped together, the user can understand the interface more
quickly. As shown in Fig. 13, it is clear that one of the Index Code dropdown boxes
filters the agents listed by index code, while the other is used when creating agents
that work on that index data.
The Agent Master grid lists all available agents once the interface is launched. The
user may enter filtering criteria in Search Filter to limit the listed agents. The criteria
may be Index Code, Segment Size, or a combination of the two. To create agents, the
user has to provide a set of inputs comprising Index Code, Segment Size, Group, and
Count, where Count specifies how many agents to create with the given parameters.
To update or delete an agent, the user highlights that agent in Agent Master and
provides the necessary input.
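The filtering and creation functions described above can be sketched as follows. The `Agent` record, the two function names and the index codes used in the usage note are hypothetical; only the fields Index Code, Segment Size, Group and Count come from the interface itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Agent:
    # Hypothetical record; agent_id is an assumed unique identifier.
    agent_id: int
    index_code: str
    segment_size: int
    group: int


def filter_agents(agents: List[Agent],
                  index_code: Optional[str] = None,
                  segment_size: Optional[int] = None) -> List[Agent]:
    """Keep agents matching every criterion that was supplied."""
    return [
        a for a in agents
        if (index_code is None or a.index_code == index_code)
        and (segment_size is None or a.segment_size == segment_size)
    ]


def create_agents(next_id: int, index_code: str, segment_size: int,
                  group: int, count: int) -> List[Agent]:
    """Create `count` agents that share the same parameters."""
    return [Agent(next_id + i, index_code, segment_size, group)
            for i in range(count)]
```

Calling `filter_agents(agents, index_code="AAA", segment_size=10)`, with "AAA" as a placeholder code, corresponds to combining both criteria in Search Filter, while omitting both arguments lists every agent.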
When an agent is highlighted in Agent Master, the information associated with it is
displayed in the Agent Information group, including the set of weights that the agent
applies in pattern matching. These weights are applied to the corresponding
F-components.
Fig. 13 Interface for Manage Agent
The Index Manager interface works very similarly to the Agent Manager interface. It is
shown in Fig. 14.
Fig. 14 Interface for Manage Index