City University of Hong Kong
Department of Computer Science
BSCCS/BSCS Final Year Project Report 2004-2005

(04CS023)
Stock Market Index Forecast using CAS Algorithm
(Volume 1 of 1)

Student Name: Wan Kwok Wai Winsen
Student No.: 50323718
Programme Code: BScCS
Supervisor: Dr. Andy Chun
1st Reader: Dr. Victor Lee
2nd Reader: Mr. C H Lee

For Official Use Only

Stock Market Index Prediction using CAS Algorithm

Section I Abstract

Pattern matching techniques have been found to be explanatory in time series analysis, and their importance in this area has been increasing. However, reinforcement learning has not been used as the learning algorithm for the pattern matching process. In this project, the Stock Market Index Forecast System (SMIFS) was built to make stock market forecasts. Ant Algorithm was applied to a two-way path environment so that the system could learn how patterns should be matched. The time series was divided into sub-time-series based on a predefined “ant to sub-time-series” ratio. Each sub-time-series was then divided into segments for pattern matching. We also studied the performance of various formulas for calculating the future index point after determining the future movement direction. Experiments were done using different segment sizes. The results indicated that SMIFS worked better with a larger segment size. This suggests that pattern matching systems based on different learning algorithms may work best with very extreme segment sizes. The performance of SMIFS could be further increased by optimizing the segment size and the “ant to sub-time-series” ratio.

Section II Acknowledgements

Winsen Wan would like to thank Dr. Andy Chun for his valuable advice and supervision. He would like to thank Ms. Charis Hui for her assistance in obtaining the financial data set. He would also like to thank all his classmates, friends and family for their kind support.
Section III Table Of Contents

Abstract
Acknowledgements
Table Of Contents
Introduction
    1.1 Objectives
    1.2 Scope Of This Project
    1.3 Problem Domain
Background Information
    2.1 Rosennean Complexity
    2.2 Elliott Wave Principle
    2.3 Related Works
Method
    3.1 Data Set
    3.2 Pattern Matching
        3.2.1 Segments
        3.2.2 Sub-time-series
        3.2.3 Ant Algorithm On Shortest Path Problem
        3.2.4 Ant Algorithm On This Project
        3.2.5 Similarity Determination
        3.2.6 Discounted-F
    3.3 Efficiency Enhancement
    3.4 Weight Adjustment
    3.5 Index Calculation
Results
Discussion
Conclusion
References
Appendix A - Experience & Difficulty
    8.1 Use Of Ant Algorithm
    8.2 Weighting Function Design
    8.3 Weight Adjustment Optimization
    8.4 Project Management
Appendix B - Database Design
Appendix C - System Design
    10.1 Class Description
    10.2 UML Design
    10.3 Design Special
Appendix D - GUI Design
    11.1 Main Interface
    11.2 Agent and Index Management Interface

Section 1 Introduction

There are a number of stock market indices in the US financial market. Each of these indices tracks the performance of a group of stocks which is considered to represent a particular industry, sector or market in the US economy. This group of stocks is called a basket. For example, the Standard & Poor's (S&P) 500 Composite Stock Price Index is an index of 500 stocks from major industries of the US economy.
It is a capitalization-weighted index of 500 leading companies, considered to be a representative sample, in leading industries within the US economy. Many investors whose investment objective is to track the performance of a particular index participate in index trading.

Because of the attractive returns from investment, many tools and techniques have been developed for making financial forecasts. Technical Analysis (TA) is one of the most commonly used tools. TA makes use of many financial indicators to forecast the time series, and holds that special patterns appearing in the time series indicate the future trend.

As Artificial Intelligence (AI) continued to grow, AI techniques and algorithms were applied to financial forecasting. In general, Artificial Neural Networks (ANN) were used to tackle financial forecasting problems. ANN was said to provide relatively good results compared to other methods. Nevertheless, an ANN does not tell us how it makes its forecast; at the least, we do not know which factors affect the forecast.

Some people started to think of applying machine learning to pattern matching for financial forecasting, which is a relatively new method for this problem. Enriching our knowledge of forecasting based on pattern matching may help us figure out the real factors which determine the trend and movement of a time series.

1.1 Objectives

This project primarily aims at designing an algorithm based on pattern matching which makes stock market forecasts using historical data, and at constructing formulas for calculating future index points. The secondary objective is to analyze potential relationships between a stock index and some attributes of the index.
1.2 Scope Of This Project

Originally, a stock market model and an automatic trading agent were planned for this project, with the agent capable of participating in security trading by itself. However, after studying the problem in detail, along with some related works, I realized that this scope would be too large to achieve. As a result, the project scope was reduced, and it now includes:

• Study techniques used in computer-aided financial forecast
• Study related works in making financial forecast
• Design a pattern matching algorithm based on Ant Algorithm for forecasting the trend of a time series
• Construct formulas for calculating future index points
• Evaluate the performance of the pattern matching algorithm and formulas

1.3 Problem Domain

Given a set of historical data in a time series and a target segment, Ant Algorithm is applied to match the target segment against the historical data to determine the future trend. A learning procedure is designed for training ants. Several formulas are developed for calculating future index points from the predicted trend.

Section 2 Background Information

Complex systems are complex in the sense that they have at least one nonsimulable model. No single dynamical description can successfully model a complex system; otherwise the behavior of that system could always be predicted correctly [1]. One example of a complex system is the financial market [2].

2.1 Rosennean Complexity

According to Robert Rosen’s definitions of “simple” and “complex” systems, a system is simple if all of its models are simulable, and complex if it has a nonsimulable model. Robert Rosen (1934-1998) was a theoretical biologist who dedicated his life to answering the question “What is Life?”.
As a professor emeritus of biophysics at Dalhousie University, he realized that the Newtonian model was inadequate to describe biological systems. The Newtonian model of physics did not provide him with an answer to the question. This suggested that biological systems are more complex than the Newtonian model allows.

We usually use modeling to understand the world: a congruence is established between the elements and structures of two systems. One of the two systems is a real system in the world, while the other is our model. When the real system acts unexpectedly, in a way that is not consistent with our model, we regard the real system as complex.

Rosennean complexity suggests that if a single dynamical description is capable of describing a system successfully, then the behavior of that system can always be predicted correctly. This kind of system does not have any complexity. On the other hand, if multiple partial dynamical descriptions are insufficient to describe a system successfully, it is complex.

Financial markets are certainly complex systems: no one can describe the system correctly and completely. Hence, no one can predict financial market behavior; otherwise there would be no financial market. Some people believe that price and index movements obey the random walk theory [3], which says that prices move independently at any point in time, implying that historical data cannot be used to predict future movement. This view is understandable, because a financial market responds to many factors every day, and some of these factors, such as news, are not predictable [4].

2.2 Elliott Wave Principle

In 1939, Ralph Nelson Elliott proposed his Elliott Wave Principle in Financial World. The basis of the Principle is that regular patterns always exist in the natural world. He observed that many things in the world repeat, for example morning and night tides, day and night, life and death.
All of these cycles consist of two types of movement: rise and fall [5]. Elliott combined his observations with the Fibonacci sequence and suggested that, in the financial market, a cycle consists of 5 rises and 3 falls, 8 waves in total (3, 5 and 8 are Fibonacci numbers). He called the longest cycle the Grand Supercycle, which can be divided into 8 Supercycles. Similarly, each Supercycle can be divided into 8 cycles, and a cycle can be further divided into smaller units.

In this project, we do not actually use the Principle in making financial forecasts. However, we draw on the Principle in assuming that the time series always repeats itself. We believe that, given a target segment, there exists a matched segment somewhere in the historical data of the time series. We use the matched segment to predict the future movement of the target segment by asserting that history will happen again in the same way. We believe that there are cycles in financial markets which are difficult to detect using statistical methods.

2.3 Related Works

Despite the difficulty of making financial forecasts, much research has been done in this area for its commercial and academic value. This research covers stock market index, stock price and foreign exchange rate prediction. However, no effective algorithm for financial forecasting has yet been found. Techniques such as statistical forecasting and artificial neural networks [6], genetic programming [7], and pattern matching [8] have been applied to the problem. The accuracy of predicting the trend using these techniques varies from 20% to 70%. There is no doubt that financial forecasting is not an easy task.

Tsang and Li applied financial genetic programming to stock-market forecasting [7]. The research used Hang Seng Index data from 25-May-1991 to 16-Oct-1993 for the Hong Kong stock market.
It used 9 experts to predict the index movement in the next week by giving one of the following descriptions: bullish (the index rises by over 1.3%), bearish (the index falls by over 1.3%), sluggish (the index is neither bullish nor bearish) or uncertain (the expert did not make any prediction). The mean accuracy achieved by the experts was 50.39%.

Local approximation by pattern modeling and pattern recognition techniques was also used for stock-market forecasting in Singh’s research [8]. The research used S&P index monthly data from 1988 to 1996 for the US financial market; series from other domains were also used. The whole time-series is chopped into segments, and each segment is then represented as a pattern using a set of tag values, where 0 represents a fall and 1 represents a rise or a zero movement. For a target segment, the movement direction is predicted by identifying the nearest historical match, i.e. the one with the minimum difference in magnitude compared with the target segment. The matched pattern is then used to predict the index in the next state. The mean direction prediction accuracy achieved for the data sets used was 73.17%. Singh encouraged further research to enhance pattern-recognition-based tools for forecasting.

Section 3 Method

Pattern matching using Ant Algorithm is employed to generalize similar segments. This helps us predict the future movement direction of a target segment. After determining the future movement direction, several formulas are applied to calculate the index point for the next trade date. The logical flow of the whole financial forecast process is shown in Fig. 1.

Fig. 1 Logical flow of the whole financial forecast process

3.1 Data Set

S&P 500 Composite Index historical data is used for training and testing the system.
Ten years of S&P 500 Composite Index data, from 15-Sept-1994 to 15-Sept-2004, 2610 observations in total, were obtained from DataStream using terminals in the Department of Economics and Finance of the City University of Hong Kong. The data from 15-Sept-1994 to 12-Oct-1994 are unused, because segments formed within this period would have undefined attributes due to inadequate historical data (e.g. the percentage difference between the index and the average index of the last 20 days). The data from 13-Oct-1994 to 19-Mar-2003, 2220 observations in total, are used for training the system. The data from 20-Mar-2003 to 14-Sept-2004, 389 observations in total, are used for testing, i.e. making actual forecasts. Holding out the test data is essential, because it prevents the same segment from being found in the learnt data, which could lead to over-optimistic results.

3.2 Pattern Matching

Generalization of segments is done according to some of the segment attributes stated in Table 1. Segments which are too large or too small cannot be generalized well. In other words, the segment size affects prediction accuracy. As the ants learn how segments should be generalized, the optimal segment size can be determined by picking the size with which the best prediction accuracy is achieved.

3.2.1 Segments

The time-series is chopped into segments. Each segment is associated with a number of attributes, which are sets of values, one for each index point in the segment. The segment attributes are shown in Table 1.
Segment attribute | Description
Absolute index | The exact index point
Movement magnitude | The change in index value
Movement percentage | The percentage change in index value
Movement direction | A rise, a fall or zero index movement
Percentage difference between index and averaged index | The percentage difference between the index and the averaged index

Table 1 Segment attributes

3.2.2 Sub-time-series

Sub-time-series are formed in order to reduce system load and, at the same time, provide a better environment for ants to operate in. A time-series contains a number of sub-time-series, and a sub-time-series consists of segments, as shown in Fig. 2. A sub-time-series is a logical structure only, which allows us to process the time-series group by group. The size of a sub-time-series is measured by the number of segments formed within it, and it is proportional to the number of ants used for training or prediction. This “ant to sub-time-series size” ratio is set to 1 in this project. The prediction accuracy may be affected when this ratio changes.

Fig. 2 Relationships between time-series, sub-time-series, segment and index point (index plotted against time)

3.2.3 Ant Algorithm On Shortest Path Problem

Ant Algorithm is used to find the matched segments. Ant Algorithm is very good at solving shortest-path problems, for example the traveling salesman problem [9]. Ants move from the start to the destination, and then move back to the start using the same path. As they walk along the path, they leave pheromone, a substance that attracts other ants to choose that path instead of the others. An ant which chose the shortest path returns to the start sooner than other ants. Hence, the path it chose has a higher concentration of pheromone than the other paths, which makes other ants more likely to choose this path.
However, pheromone evaporates over time, making the path less attractive than before.

Fig. 3 shows how Ant Algorithm works for a shortest-path problem in a network. Assuming all edges have the same length, the red path has a length of 4, while the blue path has a length of 5. If two ants set off together from the start to the destination, and then back to the start using the same path, the ant which chooses the red path takes 8 steps to finish its journey, while the ant which chooses the blue path takes 10. So when another ant starts its journey at step 8, it finds that the red path carries twice the pheromone concentration of the blue one, and hence the red path is more attractive. It is more likely to choose the red path than the blue path.

Fig. 3 Shortest-path finding in a network using Ant Algorithm (labels: Node, Start, Destination)

3.2.4 Ant Algorithm On This Project

To apply Ant Algorithm to our pattern matching problem, some of the listed segment attributes are used to find the matched segment. Different ants give different weights to these segment attributes, and they also give a weight to the other ants’ choices, which simulates the effect of pheromone. Given a target segment and a sub-time-series to be evaluated, different ants start from different, randomized locations in the sub-time-series. An ant moves along the sub-time-series, inspects its left-hand side and right-hand side, and stops on each side at the point where it believes the matched pattern has been found. It then chooses whichever of the two has the lower difference. The track it went through attracts other ants’ choices, to a degree determined by the weight given by the other ants. In Fig. 4, by combining all individual decisions, the shaded part of the sub-time-series is found to match the target segment.
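One way to picture how individual tracks combine into an aggregate decision is to count, for every position of a sub-time-series, how many ants passed over it. The sketch below is only an illustration under stated assumptions: the report does not define a track formally, so a track is taken here to be a pair of start and stop positions, and the names pass_counts and most_visited, as well as the example tracks, are hypothetical.

```python
from collections import Counter


def pass_counts(tracks):
    """Count how many ants passed each position of a sub-time-series.

    tracks: list of (start, stop) index pairs, one per ant. An ant is
    assumed to have passed every position between its start and stop,
    inclusive, whichever direction it walked in.
    """
    counts = Counter()
    for start, stop in tracks:
        lo, hi = min(start, stop), max(start, stop)
        for pos in range(lo, hi + 1):
            counts[pos] += 1
    return counts


def most_visited(tracks):
    """Return the sorted positions passed by the largest number of ants."""
    counts = pass_counts(tracks)
    top = max(counts.values())
    return sorted(pos for pos, c in counts.items() if c == top)


# Hypothetical tracks for four ants; the third ant walks towards the left.
tracks = [(2, 6), (5, 9), (12, 7), (3, 8)]
print(most_visited(tracks))
```

The per-position counts play the role of pheromone concentration: positions covered by many tracks are the ones an ant is encouraged to move onto, and a drop in the count marks a place where it is encouraged to stop.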
Fig. 4 Aggregate decision from individual ants

Our application of Ant Algorithm differs from the general application in that our action space is not a network but a two-way path. A sub-time-series can be treated as a two-way path, since the number of edges connected to a node must be 1 or 2. Pheromone left by ants can attract other ants to move forward or to stop. In Fig. 4, when Ant 4 reaches the track of Ant 1, it is encouraged to move forward by the pheromone left by Ant 1. After it moves forward and reaches the track of Ant 2, it is encouraged to move forward by the pheromone left by both Ant 1 and Ant 2. On the other hand, when Ant 4 reaches the end of Ant 1’s track, it is encouraged to stop, because the concentration of pheromone on the next state is lower than on the current state.

3.2.5 Similarity Determination

A segment S of size n has a number of segment attributes:

Segment attribute | Denoted by
Absolute indices | $Y = \{y_1, y_2, \ldots, y_{n-1}, y_n\}$
Movement magnitudes | $V = \{v_2, v_3, \ldots, v_{n-1}, v_n\}$, where $v_n = y_n - y_{n-1}$
Movement percentages | $P = \{p_2, p_3, \ldots, p_{n-1}, p_n\}$, where $p_n = (v_n / y_{n-1}) \times 100$
Movement directions | $D = \{d_2, d_3, \ldots, d_{n-1}, d_n\}$, where $d_n = 1/n$ when $v_n > 0$, $d_n = -1/n$ when $v_n < 0$, $d_n = 0$ when $v_n = 0$
Percentage difference between index and average index of last m days | $A(m) = \{a_1, a_2, \ldots, a_{n-1}, a_n\}$, where $a_n = \left( \left[ y_n - \left( \sum_{i=n-m}^{n} y_i \right) / m \right] / y_n \right) \times 100$

Table 2 Segment attribute notations

The weighting function F consists of 6 weighted components: movement percentages, movement directions, and the percentage differences between the index and the average index of the last 5, 10, 15 and 20 days. All of these are used in F to match two segments.
Any ant determines the level of similarity of two segments $S_g$ and $S_h$ by applying its weights to F:

$$F = w_1 (P_g - P_h) + w_2 (D_g - D_h) + \sum_{i=1}^{4} w_{i+2} \{A_g(5i) - A_h(5i)\}, \quad \text{where } \sum_{i=1}^{n} w_i = 1$$

Here the second sum runs over the ant's six weights. Any two identical segments generate a zero F value. This means that as F approaches zero, the similarity increases.

3.2.6 Discounted-F

In prediction mode, the F value is discounted to simulate the effect of pheromone. The default maximum discount was set to 20% in this research. An F value is discounted using the following formulas:

$$F' = F \left[ 1 - D'_M \cdot (P / T) \right] \quad (1)$$

$$D'_M = D_M \left[ 1 - (R_s - R_{sm}) / (R_{sM} - R_{sm}) \right] \quad (2)$$

$$R_s = S / (S + F) \quad (3)$$

In formula (1), $F'$ is the discounted-F, $D'_M$ is the actual maximum discount, P is the number of ants that passed by the segment, and T is the total number of working ants. The formula discounts the F value by a portion of the actual maximum discount, according to the percentage of ants that passed by the segment. (P here is a count of ants, not the set of movement percentages in Table 2.)

In formula (2), $D_M$ is the default maximum discount, $R_s$ is the direction success rate of the ant, $R_{sm}$ is the minimum direction success rate among all ants, and $R_{sM}$ is the maximum direction success rate among all ants. The formula sets the actual maximum discount according to the ant’s learning performance relative to the extremes. For example, the best performing ant has no discount, and the worst performing ant has an actual maximum discount equal to the default one.

In formula (3), S is the number of direction successes in learning for the ant, and F is the number of direction failures in learning for the ant (not the weighting function F).

Fig. 5 The ant should have taken one more move (labels: Next Segment, Chosen Segment, Previous Segment; F values shown: 1.22, 1.25, 1.24, 1.32; the ant is moving towards the left)

Fig. 5 shows that the F value of the next segment is discounted from 1.25 to 1.22. Let us denote “next segment” as n, “chosen segment” as c, “previous segment” as p and “target segment” as t.
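Formulas (1) to (3) translate directly into a few lines of Python. The sketch below is illustrative rather than the system's actual implementation: the function and parameter names are assumptions, and the demonstration values are those of the Fig. 5 worked example that follows (100 working ants, 25 of which passed by n, and success rates of 54%, 38% and 66%).

```python
def success_rate(successes, failures):
    """Formula (3): direction success rate of an ant during learning."""
    return successes / (successes + failures)


def actual_max_discount(default_max, rate, rate_min, rate_max):
    """Formula (2): scale the default maximum discount by the ant's rank.

    The best ant (rate == rate_max) gets no discount and the worst ant
    (rate == rate_min) gets the full default discount. The formula is
    undefined when rate_max == rate_min; the report does not cover that
    case, so it is left unhandled here.
    """
    return default_max * (1 - (rate - rate_min) / (rate_max - rate_min))


def discounted_f(f, max_discount, ants_passed, total_ants):
    """Formula (1): discount F by the share of ants that passed the segment."""
    return f * (1 - max_discount * (ants_passed / total_ants))


d = actual_max_discount(0.2, 0.54, 0.38, 0.66)
f_prime = discounted_f(1.25, d, 25, 100)
print(round(d, 6), round(f_prime, 6))
```

Keeping the three formulas as separate functions makes the two influences on the discount visible: the ant's own learning performance enters only through formula (2), and the behaviour of the other ants only through the ratio P / T in formula (1).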
The following example shows the details of the discounted-F calculation. Assume that there are 100 working agents, of which 25 passed by n. The direction success rate of the ant is 54%, while the minimum among all ants is 38% and the maximum among all ants is 66%. The discounted F can be calculated as:

$$R_s = 0.54$$

$$D'_M = 0.2 \times [1 - (0.54 - 0.38) / (0.66 - 0.38)] = 0.085714$$

$$F' = 1.25 \times [1 - 0.085714 \times (25 / 100)] = 1.223214$$

The ant will choose n instead of c, since n now has a smaller F value than c. The discounting effect is mainly determined by the number of ants that passed by the segment and by the ant’s learning performance.

3.3 Efficiency Enhancement

The time-series is stored in a database. Hence, agents query the database frequently when learning or forecasting. This leads to a large number of I/O operations in the system, which increases system load and decreases system performance. However, a large amount of memory is required if we load the whole time-series into main memory. This could also decrease system performance, depending on the time-series size.

To allow efficient learning and forecasting, one sub-time-series is cached (i.e. loaded into memory) at a time for ants to work with, so as to reduce the number of I/O operations while keeping memory consumption under control. After all ants have finished working on the cached sub-time-series, it is replaced by the next one. There is an overlapped period between two consecutive sub-time-series, in order to ensure that all patterns are evaluated. The size of the overlapped period equals (s - 1), where s is the size of the segment being learnt. Fig. 6 shows the sub-time-series replacement and the overlapped period in between.

Fig. 6 Sub-time-series replacement (labels: Sub-time-series, Overlapped period, Target segment, Time-series)

3.4 Weight Adjustment

During the learning process, each ant finds the segment which it thinks is matched.
After all ants have made their decisions, their choices are compared with the target segment. A choice is considered incorrect if the next movement direction of the chosen segment differs from that of the target segment. In this case, the neighbours of the chosen segment are inspected to determine how the ant should adjust its weights.

Fig. 7 The ant should have taken one more move

Let us denote “next segment” as n, “chosen segment” as c, “previous segment” as p and “target segment” as t. In Fig. 7, the ant was moving towards the left, and it decided not to move further after reaching c, because $F_n$ is higher than $F_c$. As mentioned earlier, the F value consists of 6 components (called F-components), which make different contributions to the F value. We assume that, in a mismatch, the weight associated with the maximum F-component should be adjusted. The reason for this assumption is that adjusting the weight associated with the minimum F-component could be ineffective, because the non-weighted component could be zero.

The ant chose an incorrect segment in Fig. 7, because the next movement direction of c is different from that of t. After inspecting the neighbours of c (i.e. p and n), it is known that the ant should have taken one more move to choose n, since the next segment has the correct next movement direction. Based on our assumption, it is believed that the maximum F-component of $F_n$ inhibited the ant from moving forward to n. So the corresponding weight adjustment is to lower the weight associated with the maximum F-component.
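The adjustment rule just described, together with its mirror case for the previous segment listed in Table 3, might be sketched as follows. Several details are assumptions made only for illustration, since the report does not state them here: the step size of 0.01, the renormalisation that keeps the weights summing to 1, the use of the absolute weighted term as the "F-component", and the name adjust_weights.

```python
def adjust_weights(weights, components, correct_neighbour, step=0.01):
    """Return adjusted weights after an incorrect choice.

    weights: the ant's six weights, summing to 1.
    components: the six non-weighted differences that enter F.
    correct_neighbour: "next", "previous", or None when neither
        neighbour has the correct next movement direction.
    step: hypothetical adjustment size (not given in the report).
    """
    if correct_neighbour is None:
        # The choice is assumed to be locally optimal: no adjustment.
        return list(weights)
    if correct_neighbour not in ("next", "previous"):
        raise ValueError("correct_neighbour must be 'next', 'previous' or None")

    # Locate the maximum F-component, taken here as the largest weighted term.
    weighted = [w * abs(c) for w, c in zip(weights, components)]
    k = max(range(len(weighted)), key=weighted.__getitem__)

    # Lower the weight if the ant should have moved on, raise it otherwise.
    delta = -step if correct_neighbour == "next" else step
    adjusted = list(weights)
    adjusted[k] = max(0.0, adjusted[k] + delta)

    # Renormalise so that the weights still sum to 1 (an assumption).
    total = sum(adjusted)
    return [w / total for w in adjusted]


weights = [0.1, 0.1, 0.2, 0.2, 0.2, 0.2]
components = [1.0, 0.5, 3.0, 0.2, 0.1, 0.4]  # hypothetical values
print(adjust_weights(weights, components, "next"))
```

Because only one weight is moved per mismatch, the renormalisation spreads the opposite change thinly over the other five weights, which keeps each learning step small.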
The same logic of weight adjustment is applied to the other cases, as presented in Table 3:

Correct neighbour | Correct ant action | Explanation for failure | Weight adjusted
Next segment | Take one more move | The effect of the maximum F-component was too large, and inhibited the ant from moving forward | Lower the weight associated with the maximum F-component
Previous segment | Do not take the last move | The effect of the maximum F-component was too weak, and could not inhibit the ant from moving forward | Raise the weight associated with the maximum F-component
No correct neighbour | Nil | Nil | Nil

Table 3 Logic for adjusting weight for different situations

When neither neighbour is correct, it is assumed that the ant has already made the optimal choice in its locality.

3.5 Index Calculation

A number of formulas are used to calculate the predicted index point for the next trade date once the movement direction has been determined, and their performance is assessed by comparing the predicted index point against the actual index point. Table 4 lists these formulas.

No | Formula | Description
1 | $y_{n+1} = y_n \pm \left( \sum_{i=1}^{n} v_i \right) / n$ | Take the averaged movement of the segment as the next movement
2 | $y_{n+1} = y_n \pm \sum_{i=1}^{n} \left\{ i / \left( \sum_{j=1}^{n} j \right) \right\} v_i$ | Take the weighted averaged movement of the segment as the next movement, where the more recent the index point, the higher the weight applied
3 | $y_{n+1} = y_n \pm y_n \left\{ \left( \sum_{i=1}^{n} p_i \right) / n \right\}$ | Take the averaged movement percentage of the segment as the next movement percentage
4 | $y_{n+1} = y_n \pm y_n \left[ \sum_{i=1}^{n} \left\{ i / \left( \sum_{j=1}^{n} j \right) \right\} p_i \right]$ | Take the weighted averaged movement percentage of the segment as the next movement percentage, where the more recent the index point, the higher the weight applied

Table 4 Formulas used for index calculation

The predicted index point is basically the last index point plus a movement.
The predicted movement direction determines the sign of the movement.

Section 4 Results

Two experiments were done using an “ant to sub-time-series size” ratio of 1, which means that the number of ants used equals the number of segments formed within a sub-time-series. This ratio affects the coverage of the ants’ inspection in a sub-time-series. If the ratio is too small, some parts of the sub-time-series may not be inspected by any ant. Ants would be over-dispersed, and few ants would choose the same segment. On the other hand, if the ratio is too large, the sub-time-series may be crowded with ants. In this situation, too many ants may start their inspections at the same location, and they are likely to choose the same segment. Hence, the degree of collaborative decision making will be lower.

From the data between 20-Mar-2003 and 14-Sept-2004, 35 samples were randomly selected to test our system. 200 ants were used for prediction. In the first experiment, the segment size was set to 9. The system predicted the next movement direction correctly for 22 out of 35 samples, an accuracy of 62.86%. The result for each prediction is shown in Table 5.
| No | Predicted on | Predicted for | Actual next movement direction | Predicted next movement direction | Correct |
|---|---|---|---|---|---|
| 1 | 15-Apr-2003 | 16-Apr-2003 | Fall | Fall | Yes |
| 2 | 02-May-2003 | 05-May-2003 | Fall | Rise | No |
| 3 | 12-May-2003 | 13-May-2003 | Fall | Fall | Yes |
| 4 | 16-Jun-2003 | 17-Jun-2003 | Rise | Fall | No |
| 5 | 25-Jun-2003 | 26-Jun-2003 | Rise | Fall | No |
| 6 | 30-Jun-2003 | 01-Jul-2003 | Rise | Rise | Yes |
| 7 | 08-Aug-2003 | 11-Aug-2003 | Rise | Rise | Yes |
| 8 | 11-Aug-2003 | 12-Aug-2003 | Rise | Rise | Yes |
| 9 | 18-Aug-2003 | 19-Aug-2003 | Rise | Rise | Yes |
| 10 | 22-Aug-2003 | 25-Aug-2003 | Rise | Rise | Yes |
| 11 | 29-Aug-2003 | 01-Sept-2003 | Same | Fall | No |
| 12 | 19-Sept-2003 | 22-Sept-2003 | Fall | Fall | Yes |
| 13 | 29-Sept-2003 | 30-Sept-2003 | Fall | Fall | Yes |
| 14 | 03-Oct-2003 | 06-Oct-2003 | Rise | Rise | Yes |
| 15 | 13-Oct-2003 | 14-Oct-2003 | Rise | Fall | No |
| 16 | 15-Oct-2003 | 16-Oct-2003 | Rise | Rise | Yes |
| 17 | 17-Oct-2003 | 20-Oct-2003 | Rise | Rise | Yes |
| 18 | 26-Nov-2003 | 27-Nov-2003 | Same | Fall | No |
| 19 | 02-Dec-2003 | 03-Dec-2003 | Fall | Fall | Yes |
| 20 | 15-Dec-2003 | 16-Dec-2003 | Rise | Rise | Yes |
| 21 | 02-Jan-2004 | 05-Jan-2004 | Rise | Fall | No |
| 22 | 27-Jan-2004 | 28-Jan-2004 | Fall | Rise | No |
| 23 | 27-Feb-2004 | 01-Mar-2004 | Rise | Rise | Yes |
| 24 | 16-Mar-2004 | 17-Mar-2004 | Rise | Same | No |
| 25 | 24-Mar-2004 | 25-Mar-2004 | Rise | Rise | Yes |
| 26 | 30-Apr-2004 | 03-May-2004 | Rise | Rise | Yes |
| 27 | 11-May-2004 | 12-May-2004 | Rise | Fall | No |
| 28 | 26-May-2004 | 27-May-2004 | Rise | Fall | No |
| 29 | 18-Jun-2004 | 21-Jun-2004 | Fall | Fall | Yes |
| 30 | 30-Jun-2004 | 01-Jul-2004 | Fall | Fall | Yes |
| 31 | 12-Jul-2004 | 13-Jul-2004 | Rise | Rise | Yes |
| 32 | 30-Jul-2004 | 02-Aug-2004 | Rise | Rise | Yes |
| 33 | 26-Aug-2004 | 27-Aug-2004 | Rise | Rise | Yes |
| 34 | 03-Sept-2004 | 06-Sept-2004 | Same | Rise | No |
| 35 | 09-Sept-2004 | 10-Sept-2004 | Rise | Fall | No |

Table 5 SMIFS direction forecast result with a segment size of 9

The average weights of the 200 ants that worked with a segment size of 9 are shown in Table 6.

| Weight | Average of 200 Ants |
|---|---|
| 1 | 0.08512 |
| 2 | 0.006535 |
| 3 | 0.250665 |
| 4 | 0.209735 |
| 5 | 0.30236 |
| 6 | 0.145585 |

Table 6 Average weights of ants worked with a segment size of 9

For the 22 successful predictions, we compared the performance of each index calculation formula.
Formula 2 performed the best among the formulas. The result is presented in Table 7.

| Formula No. | Average Error |
|---|---|
| 1 | 0.52% |
| 2 | 0.48% |
| 3 | 0.52% |
| 4 | 0.49% |
| Average | 0.5% |

Table 7 SMIFS index forecast result for experiment 1

In the second experiment, the segment size was set to 3 and the same 35 samples were used. However, the result was not better than that of the previous experiment. The system predicted the next movement direction correctly for 16 out of 35 samples, an accuracy of 45.71%. The result for each prediction is shown in Table 8.

| No | Predicted on | Predicted for | Actual next movement direction | Predicted next movement direction | Correct |
|---|---|---|---|---|---|
| 1 | 15-Apr-2003 | 16-Apr-2003 | Fall | Fall | Yes |
| 2 | 02-May-2003 | 05-May-2003 | Fall | Fall | Yes |
| 3 | 12-May-2003 | 13-May-2003 | Fall | Fall | Yes |
| 4 | 16-Jun-2003 | 17-Jun-2003 | Rise | Fall | No |
| 5 | 25-Jun-2003 | 26-Jun-2003 | Rise | Rise | Yes |
| 6 | 30-Jun-2003 | 01-Jul-2003 | Rise | Rise | Yes |
| 7 | 08-Aug-2003 | 11-Aug-2003 | Rise | Fall | No |
| 8 | 11-Aug-2003 | 12-Aug-2003 | Rise | Fall | No |
| 9 | 18-Aug-2003 | 19-Aug-2003 | Rise | Fall | No |
| 10 | 22-Aug-2003 | 25-Aug-2003 | Rise | Fall | No |
| 11 | 29-Aug-2003 | 01-Sept-2003 | Same | Fall | No |
| 12 | 19-Sept-2003 | 22-Sept-2003 | Fall | Fall | Yes |
| 13 | 29-Sept-2003 | 30-Sept-2003 | Fall | Rise | No |
| 14 | 03-Oct-2003 | 06-Oct-2003 | Rise | Fall | No |
| 15 | 13-Oct-2003 | 14-Oct-2003 | Rise | Rise | Yes |
| 16 | 15-Oct-2003 | 16-Oct-2003 | Rise | Fall | No |
| 17 | 17-Oct-2003 | 20-Oct-2003 | Rise | Rise | Yes |
| 18 | 26-Nov-2003 | 27-Nov-2003 | Same | Rise | No |
| 19 | 02-Dec-2003 | 03-Dec-2003 | Fall | Fall | Yes |
| 20 | 15-Dec-2003 | 16-Dec-2003 | Rise | Rise | Yes |
| 21 | 02-Jan-2004 | 05-Jan-2004 | Rise | Fall | No |
| 22 | 27-Jan-2004 | 28-Jan-2004 | Fall | Fall | Yes |
| 23 | 27-Feb-2004 | 01-Mar-2004 | Rise | Fall | No |
| 24 | 16-Mar-2004 | 17-Mar-2004 | Rise | Fall | No |
| 25 | 24-Mar-2004 | 25-Mar-2004 | Rise | Rise | Yes |
| 26 | 30-Apr-2004 | 03-May-2004 | Rise | Rise | Yes |
| 27 | 11-May-2004 | 12-May-2004 | Rise | Rise | Yes |
| 28 | 26-May-2004 | 27-May-2004 | Rise | Rise | Yes |
| 29 | 18-Jun-2004 | 21-Jun-2004 | Fall | Rise | No |
| 30 | 30-Jun-2004 | 01-Jul-2004 | Fall | Rise | No |
| 31 | 12-Jul-2004 | 13-Jul-2004 | Rise | Rise | Yes |
| 32 | 30-Jul-2004 | 02-Aug-2004 | Rise | Fall | No |
| 33 | 26-Aug-2004 | 27-Aug-2004 | Rise | Fall | No |
| 34 | 03-Sept-2004 | 06-Sept-2004 | Same | Fall | No |
| 35 | 09-Sept-2004 | 10-Sept-2004 | Rise | Fall | No |

Table 8 SMIFS direction forecast result with a segment size of 3

The average weights of the 200 ants that worked with a segment size of 3 are shown in Table 9.

| Weight | Average of 200 Ants |
|---|---|
| 1 | 0.166155 |
| 2 | 0.00443 |
| 3 | 0.306105 |
| 4 | 0.22251 |
| 5 | 0.203905 |
| 6 | 0.096895 |

Table 9 Average weights of ants worked with a segment size of 3

For the 16 successful predictions, we compared the performance of each index calculation formula. Formula 1 performed the best among the formulas. The result is presented in Table 10.

| Formula No. | Average Error |
|---|---|
| 1 | 0.41% |
| 2 | 0.43% |
| 3 | 0.42% |
| 4 | 0.44% |
| Average | 0.42% |

Table 10 SMIFS index forecast result for experiment 2

Section 5 Discussion

Our direction prediction result was better than that of genetic algorithm forecasting, which had a direction success rate of 50.39% [7]. However, Singh’s pattern matching method performed better, with a direction success rate of 76% at a segment size of 3, which he suggested to be the optimal segment size [8]. Our system had an accuracy of 62.86% with a segment size of 9, but this fell to 45.71% when the segment size was 3.

The difference between Singh’s findings and those of this project indicates that different pattern matching implementations might have different optimal segment sizes. Systems applying a reinforcement learning algorithm and systems applying other machine learning algorithms may have different optimal segment sizes. Possibly the optimal segment size was not reached in this project, and finding it could further improve the system’s direction success rate.

Apart from the segment size, the “ant to sub-time-series size” ratio also affects the system performance. Throughout this project, the ratio was set to 1. Reducing this ratio further might give a better result. With a ratio of 1, the sub-time-series may still be overcrowded.
For example, 0.5 could be a better choice. This indicates that the “ant to sub-time-series” ratio should be well controlled when applying Ant Algorithm to two way path environments.

The performance of the index calculation formulas was acceptable. Most of the formulas had an average error within 0.5%. These formulas only made use of the predicted movement direction and the target segment data; the matched segment data were not taken into account. Further research could try to make use of the matched segment data as well.

Finally, the average weights showed a high importance for weight3, weight4 and weight5. During the pattern matching process, these weights were applied to 3 of the F-components: the percentage difference between the index and the average index of the last 5 days, last 10 days and last 15 days respectively. With a segment size of 9, the 3 weights contributed 0.76276 to the total weight of 1. Similarly, with a segment size of 3, they contributed 0.73252 to the total weight of 1. The result suggests that these 3 factors are considerably more important than the other factors for finding a matched segment, regardless of the segment size. The movement direction, however, was not considered important: it contributed less than 0.01 to the total weight in both experiments.

Section 6 Conclusion

Applying Ant Algorithm to a two way path environment is a relatively new idea. More research is needed to explore the optimal settings in order to make the algorithm perform better. Such an algorithm accommodates a flexible and dynamic pattern matching task in time series analysis.

Artificial neural networks (ANN) have been widely used in solving this kind of problem and give considerably good results. Nonetheless, they do not provide any information about their decision criteria. We would question where the highly accurate results come from.
This is the reason why searching for a new method for solving this kind of problem is worthwhile. By applying a pattern matching technique, we can fully understand why a certain pattern is considered a match. It is also clear how the matched pattern affects the result. This helps us know more about what is hidden inside the time series.

The pattern matching method based on Ant Algorithm developed in this project worked better with a larger segment size. Matching a larger segment would be more difficult than matching a smaller segment, because larger segments are harder to generalize. This may also explain why a larger segment size led to a better result in this project. For this reason, we suggest a larger segment size if further research takes the matched segment into account in the index calculation. However, the size of the data set may have to increase, since, based on the Elliot Wave Principle, a larger segment may not repeat itself within a relatively short period.

Section 7 References

[1] Gwinn, T. (2004). Robert Rosen - Complexity in a Nutshell. [Online]. Available: http://www.panmere.com/rosen/faq_complex1.htm [2004, Sept. 21]
[2] Johnson, N. F., Paul Jefferies, and Pak Ming Hui. (2003). Financial Market Complexity. New York: Oxford University Press.
[3] Investopedia.com. Financial Concepts - Random Walk Theory. [Online]. Available: http://www.investopedia.com/university/concepts/concepts5.asp [2004, Sept. 21]
[4] Lippi, M., and Daniel Thornton. (2004). A Dynamic Factor Analysis of the Response of U.S. Interest Rates to News. [Online]. Research Division of the Federal Reserve Bank of St. Louis. Available: http://research.stlouisfed.org/wp/2004/2004-013.pdf [2004, Sept. 22]
[5] 任若恩, 馬向前, 沈沛龍, 劉莉亞, 及鄧雲勝. (2003). 技術分析 [Technical Analysis]. 北京 [Beijing]: 中國財政經濟出版社 [China Financial and Economic Publishing House].
[6] Wu, Shaun-inn, and Ruey-Ping Lu. (1993). Combining Artificial Neural Networks and Statistics for Stock-Market Forecasting. Proceedings of the 1993 ACM conference on Computer science, 257-64.
[7] Tsang, Edward, and Jin Li. (2000). Combining Ordinal Financial Predictions with Genetic Programming. Proceedings of the 2nd International Conference on Intelligent Data Engineering and Automated Learning, Data Mining, Financial Engineering, and Intelligent Agents, 532-37.
[8] Singh, S., and Paul McAtackney. (1998). Dynamic Time-Series Forecasting using Local Approximation. Proceedings of the 10th IEEE International Conference on Tools with AI, 392-99.
[9] Bonabeau, E., and Guy Théraulaz. (2000). Swarm Smarts. [Online]. Scientific American, Inc. Available: http://dsp.jpl.nasa.gov/members/payman/swarm/sciam_0300.pdf [2004, Sept. 23]

Section 8 Appendix A - Experience & Difficulty

8.1 Use Of Ant Algorithm

Designing how Ant algorithm could be used to solve the pattern matching problem in this project was a challenging task. We considered putting segments into a network and using Ant algorithm to find the first matched segment, just like solving the shortest-path problem. However, this requires criteria for deciding where to put a given segment in the network, and such criteria are difficult to define. This made the method infeasible.

Another alternative was to loop through each segment in the time-series and compare it with the target segment based on some formulas. This means that no agent would be used. This method is relatively hardwired, i.e. the system always behaves the same, because there is only one party evaluating segment similarity, always using the same formula.

We also considered using other reinforcement learning algorithms apart from Ant algorithm. Traditional reinforcement learning algorithms use agents for making decisions, but one agent cannot influence another agent’s choice. Each agent therefore works on its own, as if there were no other agent.
The only advantage would be that more than one suggested matched segment is available for reference, but there might be no relation between the agents’ decisions.

Finally we decided to use the current method, i.e. applying Ant algorithm to a 2-way path environment. We allow ants to start from a random position on a path, and they inspect their own locality to decide which segment is the best match in that locality. Our method has advantages over the previous methods. First, we do not need to consider categorizing segments. Second, ants apply their own sets of weights to the F function and generate different results. Lastly, it employs ants for collaborative decision making, in which the choices of different ants are correlated.

8.2 Weighting Function Design

When we were setting the criteria for pattern matching, we originally planned to include absolute indices and movement magnitudes in the similarity determination. This is meaningful for prediction if people’s decisions differ for the same segment pattern at different index levels. For example, different index levels may reflect different levels of economic prosperity and different investment environments, which affect people’s willingness to invest. These in turn affect the movement of the financial index.

However, since absolute indices and movement magnitudes have no lower and upper limits, they cannot be normalized. They may give an incorrect indication of the level of similarity when used as F-components. Even though these criteria are meaningful, we failed to normalize their values and finally chose not to include them in the weighting function.

8.3 Weight Adjustment Optimization

Another difficult task was to optimize the weight adjustment method. We realized that, for each correct neighbour (as presented in Table 3), there are two different interpretations, which lead to different adjustment rules.
Table 11 shows the differences between the two explanations for pattern matching failure:

| Correct neighbour | Adopted explanation for failure | Another explanation for failure |
|---|---|---|
| Next segment | The effect of the maximum F-component was too large, and inhibited the ant from moving forward | The effect of the minimum F-component was too small (i.e. its value was too big), and could not push the ant to move one extra step |
| Previous segment | The effect of the maximum F-component was too weak, and could not inhibit the ant from moving forward | The effect of the minimum F-component was too strong (i.e. its value was too small), and pushed the ant to move one extra step |
| No correct neighbour | Nil | Nil |

Table 11 Different explanations for pattern matching failure

Choosing one of the two explanations for each of the two cases gives 2 × 2 = 4 possible weight adjustment rules. Originally we applied another adjustment rule: we tried to raise the weight associated with the minimum F-component when the previous neighbour was correct. The assumption behind this was that the value of the minimum F-component was too small, so that the ant made the last move. This assumption seemed to be correct, and the method also appeared good because it does not lead to weight bias, i.e. a certain weight always increasing or always decreasing. Once the weight associated with the minimum F-component has been raised to a certain level, that component is no longer the minimum F-component, and the adjustment is made on another weight. However, the value of the minimum F-component could be zero when the two segments are exactly the same on that criterion. In that case, no matter how much we adjust the weight, the adjustment will be ineffective. This method is therefore not practical.
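The zero-component problem can be illustrated with a small numeric sketch. It assumes, for illustration only, that the F value is a plain weighted sum of the six F-components and that weights are not renormalised after a change; the helper names and the numbers are made up.

```python
from typing import List, Sequence


def weighted_f(weights: Sequence[float], f_components: Sequence[float]) -> float:
    """Assumed form of the F value: a weighted sum of the F-components."""
    return sum(w * c for w, c in zip(weights, f_components))


def with_weight(weights: Sequence[float], k: int, value: float) -> List[float]:
    """Copy of weights with the k-th weight replaced by value."""
    changed = list(weights)
    changed[k] = value
    return changed


weights = [0.2, 0.1, 0.3, 0.2, 0.1, 0.1]
# The second component is zero: the two segments agree exactly on it.
f_components = [0.10, 0.00, 0.50, 0.20, 0.10, 0.05]

k_min = f_components.index(min(f_components))
k_max = f_components.index(max(f_components))

base = weighted_f(weights, f_components)
# Raising the weight of the zero (minimum) component leaves F unchanged.
after_min = weighted_f(with_weight(weights, k_min, 0.9), f_components)
# Raising the weight of the maximum component changes F.
after_max = weighted_f(with_weight(weights, k_max, 0.9), f_components)
```

Whatever value is given to the weight of the zero component, its contribution stays at zero, so the ant's comparison of F values cannot change; the same change applied to the maximum component does move the F value.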
Since adjusting the weight associated with the minimum F-component could be ineffective, we decided to adopt the current method: adjusting the weight associated with the maximum F-component in both cases.

8.4 Project Management

From this project, we realized the importance of good project planning and of making the effort to keep up with intermediate deadlines. They ensure progress and help us get things done on time. It is sometimes tough to keep up with deadlines in the intermediate stages, because they do not seem important. This is of course not the case. By keeping up with them, the final product can be completed with less effort at the end of the project. Conversely, if one deadline is missed, team members find it difficult and discouraging to keep up when the next one approaches. As a result, the team falls further and further behind the target as each deadline comes. Ultimately the team has to spend a large amount of effort to complete the project at the end, and still produces low-quality work.

Section 9 Appendix B - Database Design

5 tables were created in SMIFS. They are shown in Fig. 8:

Fig. 8 Relations in Stock Market Index Forecast System (SMIFS)

1. Table Overview

| Table Name | Description |
|---|---|
| IDX_VAL | Stock market index valuation per date |
| IDX | Stock market index |
| AGENT | Ants’ attributes |
| AGENT_WEIGHT | Weights applied to weighting function for each ant |
| MATCHING_RESULT | Temporary storage for summarizing chosen segments |

2. Table Details

IDX_VAL table

| Field Name | Data type & Size | Description |
|---|---|---|
| IID | NUMERIC(6,0) | Instrument ID |
| DATE | NUMERIC(8,0) | Trade date |
| VAL | NUMERIC(7,2) | Index point |
| MOV | NUMERIC(7,2) | Movement |
| MOV_PCT | NUMERIC(5,2) | Movement in percentage |
| MOV_DIR | NUMERIC(1,0) | Movement in direction |
| AVG_LST5 | NUMERIC(8,3) | Averaged index of the last 5 trade dates |
| AVG_LST10 | NUMERIC(8,3) | Averaged index of the last 10 trade dates |
| AVG_LST15 | NUMERIC(8,3) | Averaged index of the last 15 trade dates |
| AVG_LST20 | NUMERIC(8,3) | Averaged index of the last 20 trade dates |

IDX table

| Field Name | Data type & Size | Description |
|---|---|---|
| IID | NUMERIC(6,0) | Instrument ID |
| CODE | VARCHAR(8) | Instrument abbreviation |
| NAME | VARCHAR(30) | Instrument full name |

AGENT table

| Field Name | Data type & Size | Description |
|---|---|---|
| AID | NUMERIC(6,0) | Agent ID |
| IID | NUMERIC(6,0) | Instrument ID |
| SEG_SIZE | NUMERIC(3,0) | Segment size |
| GRP | NUMERIC(3,0) | Group |
| DATE_CRET | NUMERIC(8,0) | Creation date |
| DATE_INIT | NUMERIC(8,0) | Initialization date |
| NUM_LEARN | NUMERIC(8,0) | Number of learnings done |
| NUM_PREDICT | NUMERIC(8,0) | Number of predictions made |
| DIR_SUCCESS | NUMERIC(8,0) | Number of successes in predicting movement direction |
| DIR_FAIL | NUMERIC(8,0) | Number of failures in predicting movement direction (for consistency checking) |

AGENT_WEIGHT table

| Field Name | Data type & Size | Description |
|---|---|---|
| AID | NUMERIC(6,0) | Agent ID |
| WEIGHT1 | NUMERIC(4,3) | Weight for movement percentage match |
| WEIGHT2 | NUMERIC(4,3) | Weight for movement direction match |
| WEIGHT3 | NUMERIC(4,3) | Weight for average index (last 5) match |
| WEIGHT4 | NUMERIC(4,3) | Weight for average index (last 10) match |
| WEIGHT5 | NUMERIC(4,3) | Weight for average index (last 15) match |
| WEIGHT6 | NUMERIC(4,3) | Weight for average index (last 20) match |

MATCHING_RESULT table

| Field Name | Data type & Size | Description |
|---|---|---|
| DATE_BEGIN | NUMERIC(8,0) | Begin date of the chosen segment |
| DATE_END | NUMERIC(8,0) | End date of the chosen segment |
| F | NUMERIC(10,8) | F value |

Section 10 Appendix C - System Design

10.1 Class Description

Table 12 lists the classes defined in this project:

| Class Name | Description |
|---|---|
| SMIFS | Stock Market Index Forecast System. It encapsulates all system components |
| AgentManager | Agent Management Component. It provides functions for maintaining agents (ants) |
| IndexManager | Index Management Component. It provides functions for maintaining stock market indices |
| PatternMatcher | Pattern Matching Component. It matches segments in prediction mode, and improves ants in learning mode |
| FutureIndexCal | Future Index Calculation Component. It calculates the next index valuation for a segment in prediction mode after PatternMatcher finds a matched segment. It uses various formulas in the calculation and records the performance of the different formulas |
| Agent | An individual that participates in the pattern matching process. It decides which segment in the time-series matches the target segment. Every agent has its own set of weights, which are applied to the weighting function F |
| Segment | A set of index data in a given sub-time-series with a predefined size |
| SubTsPeriod | A data structure that holds the details of a sub-time-series |
| IndexPt | An index point in a time-series. It records the details of the financial data on a particular date |
| Difference | A data structure contained by a Segment for holding each component that forms the F value |
| JourneyGroup | A data structure that specifies a group of agents managed by one thread during the pattern matching process |

Table 12 Classes defined in the system

10.2 UML Design

1. Use Case Diagram

The actor User interacts with the SMIFS system through four use cases: Train, Predict, ManageAgent and ManageIndex.

2. Class Diagram (Accessors / Attributes are not shown)

- SMIFS. Attributes: agentMan : AgentManager, indexMan : IndexManager, sysStatus : SystemStatus, patternMatcher : PatternMatcher. Operations: startOp(), manageAgent(), manageIndex().
- AgentManager. Attributes: cntAgent : Integer, agent : ArrayList. Operations: createAgent(), deleteAgent(), modifyAgent(), countAgent(), listAgent().
- IndexManager. Attributes: cntIndex : Integer, index : ArrayList. Operations: createIndex(), deleteIndex(), modifyIndex(), countIndex(), listIndex().
- SystemStatus. Attributes: indexCode : String, opMode : Integer, opStartTime : DateTime, subTsSize : Integer, cntWorkingAgent : Integer, cntSubTsProcessed : Integer, cntSegProcessed : Integer, cntTotalSubTs : Integer, tsDateBegin : Date, tsDateEnd : Date, subTsDateBegin : Date, subTsDateEnd : Date, segDateBegin : Date, segDateEnd : Date.
- PatternMatcher. Attributes: journeyGrp : JourneyGroup, cretAgent : Boolean, cntCret : Integer, indexID : Integer, indexCode : String, opMode : Integer, dateBegin : String, dateEnd : String, datePredict : String, segSize : Integer, subTsSize : Integer, agentGrp : Integer, cntAgent : Integer, cntSeg : Integer, cntSubTs : Integer, indexPt : ArrayList, agent : ArrayList, seg : ArrayList, subTsPeriod : ArrayList, sysStat : SystemStatus, targetSeg : Segment, matchedSeg : Segment. Operations: start(), match(), manageJourney(), allJourneyGroupFinish(), determineMatchedSegment(), adjustWeight(), applyAdjustment(), buildTargetSeg(), partitionSubTs(), prepareAgent(), loadIndexData(), buildSegment(), createAgent(), indexCountRec(), agentCountRec(), getMinSuccessRate(), getMaxSuccessRate().
- FutureIndexCal. Attributes: matchedSeg : Segment, targetSeg : Segment, indexCode : String, actual : IndexPt, predictMovDir : Integer. Operations: calulationMethod1(), calulationMethod2(), calulationMethod3(), calulationMethod4().
- Agent. Attributes: agentID : Integer, indexID : Integer, segSize : Integer, agentGrp : Integer, dateCreated : String, dateInit : String, cntLearn : Integer, cntPredict : Integer, cntDirSuccess : Integer, cntDirFail : Integer, weight[n] : Double, idxChosenSeg : Integer, journeyDir : Integer, chosenSegStartDate : String, chosenSegEndDate : String, chosenF : Double, chosenNextMovDir : Integer. Operations: reset(), haveChosenSeg(), resetChosenF().
- Segment. Attributes: isTarget : Boolean, size : Integer, dateBegin : String, dateEnd : String, indexID : Integer, idxIndex[size] : Integer, idxNextIndex : Integer, index[size] : IndexPt, cntVisited : Integer, diff : Difference. Operations: CloneWithIdxData(), CloneWithIdxRef().
- Difference. Attributes: movPct : Double, movDir : Double, avgLst5 : Double, avgLst10 : Double, avgLst15 : Double, avgLst20 : Double. Operations: WeightedF(), MinIndividualDiff(), MaxIndividualDiff().
- IndexPt. Attributes: indexID : Integer, date : String, indexValue : Double, movValue : Double, movPct : Single, movDir : Double, avgLst5 : Double, avgLst10 : Double, avgLst15 : Double, avgLst20 : Double.
- JourneyGroup. Attributes: beingManaged : Boolean, allFinished : Boolean, idxFirstAgent : Integer, cntAgentManaged : Integer.

3. Sequence Diagram - Train

Participants: User, SMIFS, PatternMatcher, SystemStatus. Messages, in order:

1. startOp(opMode, indexId, indexCode, startDate, endDate, predictDate, segSize, agentGrp, cntAgent, sysStat)
2. start()
3. partitionSubTs(indexId, dateStart, dateEnd)
4. prepareAgent(indexId, segSize, agentGrp, cntAgent, cretAgent, cntCret)
5. match()
6. (Update system runtime status)
7. loadIndexData(opMode, indexId, subTsDateBegin, subTsDateEnd, segSize)
8. buildSegment(opMode, indexId, segSize)
9. buildTargetSeg()
10. manageJourney()
11. determineMatchedSegment()
12. adjustWeight()
13. applyAdjustment()

4. Sequence Diagram - Predict

Participants: User, SMIFS, PatternMatcher, SystemStatus, FutureIndexCal. Messages, in order:

1. startOp(opMode, indexId, indexCode, startDate, endDate, predictDate, segSize, agentGrp, cntAgent, sysStat)
2. start()
3. partitionSubTs(indexId, dateStart, dateEnd)
4. prepareAgent(indexId, segSize, agentGrp, cntAgent, cretAgent, cntCret)
5. match()
6. (Update system runtime status)
7. loadIndexData(opMode, indexId, subTsDateBegin, subTsDateEnd, segSize)
8. buildSegment(opMode, indexId, segSize)
9. buildTargetSeg(predictDate)
10. manageJourney()
11. determineMatchedSegment()
12. calIndexMethod1()
13. calIndexMethod2()
14. calIndexMethod3()
15. calIndexMethod4()

5. Sequence Diagram - Manage Agent

Participants: User, SMIFS, AgentManager. Messages, in order:

1. manageAgent()
2. listAgent()
3. createAgent(indexCode, segSize, agentGrp, count)
4. listAgent()
5. deleteAgent(rowIndex)
6. listAgent()
7. modifyAgent(rowIndex, segSize, agentGrp)
8. listAgent()

6. Sequence Diagram - Manage Index

Participants: User, SMIFS, IndexManager. Messages, in order:

1. manageIndex()
2. listIndex()
3. createIndex(indexCode, indexName)
4. listIndex()
5. deleteIndex(rowIndex)
6. listIndex()
7. modifyIndex(rowIndex, indexCode, indexName)
8. listIndex()

10.3 Design Special

1. Overall Design

The idea of using a pattern matching technique with index calculation for forecasting a stock market index was borrowed from Singh’s research [8]; however, the criteria for matching a pattern are changed. In addition, the system is given reinforcement learning capabilities. It also compares the performance of various formulas during the index calculation process.

The system design is data-oriented. For example, the IndexPt class represents the financial figures of a particular date. This design facilitates systematic and effective data handling.

2. Divided Time-series

In order to facilitate efficient learning and prediction, the data set is divided into sub-time-series.
At any point in time, one sub-time-series is cached for the agents to work with. This prevents agents from querying the database individually and concurrently, which would lead to very high I/O activity in the system. The Pattern Matching Component is responsible for determining how the time-series should be divided according to the “ant to sub-time-series” ratio.

3. IndexPt Indices in Segment class

An integer array idxIndex is an attribute of the Segment class (see Class Diagram). It stores the ArrayList indices of indexPt in the PatternMatcher class instead of the actual index data; the actual index data are stored in indexPt (except for the target segment). The reason for this design is that storing the actual index data consumes significantly more memory than storing array indices. Assume the segment size is 6: two consecutive segments will then have 5 overlapping indexPt (see Fig. 9). By storing array indices, the actual index data are only stored once. The design aims at controlling memory consumption.

Fig. 9 Overlapped index points of two consecutive segments

4. calculationMethodN() in FutureIndexCal class

Each calculation function uses a different formula to calculate the next index for a segment. The results obtained with these formulas are written to a CSV file for analysis.

5. PatternMatcher class

PatternMatcher is a core component of the system. Because of the divided time-series, its structure is complex. Fig. 10 shows the relations between its attributes in detail. Note that when the segment size is 2 and the sub-time-series size is 14, 13 segments are formed in the sub-time-series.

Fig. 10 Relations between PatternMatcher attributes

Section 11 Appendix D - GUI Design

11.1 Main Interface

Fig. 11 is the Graphical User Interface (GUI) of SMIFS, which is shown after launching the program. Some information must be supplied before SMIFS can start learning or predicting. The information can be divided into 3 categories: i) operational data, ii) index data, and iii) agent-related data. Information belonging to the same category is grouped together in the user interface, so that it is well organized and easy to use.

Fig. 11 Main screen of SMIFS

There are 3 groups on the user interface, which separate the 3 categories of inputs. In the Operation group, the Mode dropdown box allows the user to specify the operation type, which can be Learn or Predict. The Start button is used to start the operation after the user has provided all the information through the user interface.

In the Index Data group, the user can specify which index to operate on. The Code dropdown box lists all available index data by their Reuters code. When the user selects another index code, the Name textbox is updated automatically to show the index’s full name. The user can decide how many index points form a segment with the Segment Size textbox. The Start Date and End Date textboxes together define the period used for learning in the Learn mode, or for segment pattern matching in the Predict mode. The Predict On textbox specifies the date on which the prediction is made. Start Date must be earlier than End Date in any operation mode, and End Date must be earlier than Predict On in Predict mode.

In the Agent group, the user can decide how many ants to use in the Count textbox, and which group of ants to use in the Group textbox. A group number is associated with each ant, so that ants working on the same index with the same segment size can be separated into different groups. With this grouping strategy, it is possible to compare ants working with the same parameters but under different system settings.
Lastly, the Create When Necessary checkbox indicates what SMIFS should do if there are fewer ants than specified. If the checkbox is checked, SMIFS will create enough ants to meet the Count given by the user; if it is not checked, SMIFS will not start the operation with an insufficient number of ants.

After the operation starts, SMIFS tries to partition the index data within the period between Start Date and End Date into sub-time-series. If the last sub-time-series does not obey the “ant to sub-time-series” ratio, the user is shown the number of segments formed in the last sub-time-series and asked whether to continue the operation. This is important: for example, if only 2 segments are formed within the last sub-time-series, hundreds of ants are actually choosing 1 of the 2 segments within that sub-time-series. This is not really meaningful in our algorithm, because the starting points of the ants are not randomized enough, and the number of segments for the ants to choose from is not large enough.

Our main interface is simple, clear and well organized.

11.2 Agent and Index Management Interface

Fig. 12 shows the menu items used to launch the agent and index management interfaces from the main interface. Clicking either of these menu items launches another user interface for managing the corresponding type of data.

Fig. 12 Menu items for launching interfaces for Manage Agent and Manage Index

After clicking Manage Agent, the Agent Manager interface is launched as shown in Fig. 13. Inputs for each function are grouped together. This is why there may be duplicated fields; for instance, there are two Index Code dropdown boxes in the interface. However, this should not confuse the user. On the contrary, since the inputs for a particular function are grouped together, the user can understand the interface more quickly. As shown in Fig. 13, it is obvious that one of the Index Code dropdown boxes is for filtering the listed agents by index code, while the other is for creating agents working on that index data.

The Agent Master grid lists all available agents once the interface is launched. The user may input filtering criteria in Search Filter to limit the listed agents. The criteria may be Index Code, Segment Size or a combination of them. For creating agents, the user has to provide a set of inputs including Index Code, Segment Size, Group and Count, where Count specifies how many agents to create using the given parameters. For updating or deleting an agent, the user needs to highlight that particular agent in Agent Master and provide the necessary input. When an agent is highlighted in Agent Master, the information associated with the agent is displayed in the Agent Information group, including the set of weights that the agent applies in pattern matching. These weights are applied to the corresponding F-components.

Fig. 13 Interface for Manage Agent

The Index Manager interface works very similarly to the Agent Manager interface. The interface is shown in Fig. 14.

Fig. 14 Interface for Manage Index