Journal of Information, Control and Management Systems, Vol. 10 (2012), No. 2

FAST ADAPTIVE LEARNING ALGORITHM FOR CLASSIFICATION OF TIME SERIES

Vanya MARKOVA, Ventseslav SHOPOV, Velko ILTCHEV
Bulgarian Academy of Sciences – Institute of System Engineering and Robotics,
Technical University Sofia – Branch Plovdiv, Bulgaria
e-mail: markovavanya@yahoo.com

Abstract

This article presents a new fast algorithm for classification. The algorithm can process either symbolic data or shape subsequences. Knowledge is stored as the weights of the arcs of a tree in which all possible combinations of facts are represented. The data structure used to store the weights is an indexed linear array, instead of a dynamic tree or an incidence matrix. An algorithm for processing this new data structure has been developed and compared with existing ones.

Keywords: machine learning, adaptive algorithm, decision making, classification, performance evaluation

1 INTRODUCTION

Induction rule learning is a simple yet powerful learning and classification model. Tree learning algorithms offer tools for the discovery of relationships, patterns and knowledge in databases and time series [3]. An induction rule tree is a classifier in the form of a tree structure that contains decision nodes and leaves; it assigns a class value to an instance. The decision tree is a special kind of induction rule tree whose construction takes polynomial time in the number of attributes and inputs, as no backtracking is required [1].

The approaches used in WEKA are precise, but time consuming [1], [2]. One application of these methods for decision making is the area of autonomous agents [4], [5]. Induction rule trees are used in the meta-learning part of the Autonomous Mobile Sensor Agent (AMSA) [6], [7]. Hence every improvement has a significant impact on the overall performance of the agents' behaviour: a faster approach for the inference of induction rules speeds up the whole decision-making process of such agents.

In this paper a new approach is proposed for storing the weights of an induction rule tree in an indexed linear array, instead of using a dynamic tree or an incidence matrix. A suitable algorithm is developed and compared with the two existing cases, i.e. the dynamic tree and the incidence matrix. The accuracy of prediction and the execution performance are used as comparison measures.

2 THE APPROACH

We decided to store knowledge in the form of weights of the arcs of a tree in which all possible combinations of facts are represented. The number of layers λ of this tree is λ = ω + 2, where ω is the size of the window. The number of outgoing arcs from a node corresponds to the number of fact types ξ. Hence, the number of nodes η in this "tree of knowledge" is

    η = ξ^0 + ξ^1 + … + ξ^(λ-1) = (ξ^λ - 1) / (ξ - 1),

and the number of arcs μ is

    μ = η - 1 = ξ^1 + ξ^2 + … + ξ^(λ-1).

This tree learns new knowledge in the following way: when a sequence of facts occurs, the weight of each arc the algorithm passes through increases. The value of the increase will hereinafter be called the "coefficient of learning" γ. Prediction and learning happen simultaneously!

Figure 1 Example of a tree which stores information about 3 facts. The window encloses 3 facts, i.e. the system can predict the 4th fact.

When a sequence of facts has not occurred for a long time, the weight of each arc the algorithm passes through decreases. The value of the decrease will hereinafter be called the "coefficient of amnesia" ρ. The tree in Figure 1 has just learned the sequences a1, a1, a2, a3; a3, a2, a2, a1; and a3, a2, a2, a3 (follow the arcs in bold).
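To make the sizing formulas concrete, the following minimal Python sketch computes λ, η and μ for a given window size and number of fact types. The function name tree_size is illustrative and not part of the original implementation:

    def tree_size(window_size, fact_types):
        """Compute layers, nodes and arcs of the full 'tree of knowledge'.

        window_size -- omega, the number of facts the window encloses
        fact_types  -- xi, the number of distinct fact types
        """
        layers = window_size + 2                             # lambda = omega + 2 layers of nodes
        nodes = sum(fact_types ** i for i in range(layers))  # eta = sum of xi^i, i = 0..lambda-1
        arcs = nodes - 1                                     # mu: every node except the root has one incoming arc
        return layers, nodes, arcs

    # The tree of Figure 1: xi = 3 fact types, a window of omega = 3 facts
    print(tree_size(3, 3))   # (5, 121, 120)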
3 WELL-KNOWN DATA STRUCTURES FOR STORING TREES

3.1 Structure with dynamic memory allocation

It stores only those sequences which have occurred at least once. This saves memory at the beginning of the learning process. But as the knowledge grows, the memory consumption grows as well, and eventually reaches the volume needed to store a full tree with all nodes and arcs. Of course, an additional algorithm can be developed to prune the less informative branches, but such an algorithm will work slowly and will increase memory fragmentation.

3.2 Incidence matrix

The rows of this matrix contain the nodes of the current layer, and the columns the nodes of the next one. The weight of the connecting arc lies at the cross point. The advantage of this data structure is the fast search. The disadvantages are:
- the matrix is sparse;
- a separate matrix is necessary for each layer, so a whole set of matrices must be defined;
- the number of rows and columns of each layer depends both on the number of facts and on the size of the window.

One possible solution to the second problem is to include the number of the layer in the number of the node, but this slows down the search. Even in this case, the single remaining matrix is still sparse.

3.3 Static table

Such a table may have, for instance, the following 4 attributes: CurrentLayer, CurrentNode, Arc (which contains the weight of this arc) and NextNode. It is not necessary to store NextLayer, because it can always be calculated as CurrentLayer + 1. The advantage of this data structure is the fast search, which can be sped up further if the table is ordered on CurrentLayer and CurrentNode; in that case the algorithm can perform a binary search. The disadvantage is the large memory consumption.

4 THE NEW DATA STRUCTURE

The main idea behind the new structure is: "Only the arcs shall be stored, because they are the only elements which carry the information about the analysed sequences of facts." For this purpose only one single array of integers is necessary. Its elements contain the arc weights, which can vary, for example, from 0 to 100. The position of each arc in this array can be calculated as Segment + Offset, where:

- The segment ζ is the total number of arcs in all layers before the current layer ς. Hereinafter the notion "layer" stands for a layer of arcs instead of a layer of nodes. The segment can be calculated through the following formula:

    ζ = ξ^1 + ξ^2 + … + ξ^(ς-1)

- The offset τ points to the position of the arc within the current layer. With top_of_stack = α, bottom_of_stack = β and fact_in_stack = φ (the algorithm described below uses a stack to store the facts of the analysed sequence), the offset can be calculated through the following formula:

    τ = Σ_{i=β..α} (φ_i - 1) · ξ^(α-i)

The advantages of this data structure are:
- very fast search, because the position of an arc can be calculated quickly and easily via a simple formula;
- significantly lower memory consumption in comparison with the matrix and with the static table.

The structure with dynamic memory allocation has lower memory consumption only at the beginning of the learning process. When the knowledge stored in the dynamic tree exceeds a certain volume, the amount of memory needed exceeds the amount needed by the proposed new data structure, because the dynamic tree must store not only the weights of the arcs but also the tree structure itself, represented through the pointers between the nodes.
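The addressing scheme can be illustrated with a short sketch. It assumes facts are numbered 1..ξ and follows the formulas reconstructed above; the helper name arc_index is hypothetical:

    def arc_index(stack, fact_types):
        """Position of the arc reached after consuming the facts in `stack`.

        stack      -- the facts phi_beta..phi_alpha seen so far, each numbered 1..xi
        fact_types -- xi, the number of distinct fact types
        """
        # Segment: total number of arcs in all layers before the current one
        segment = sum(fact_types ** l for l in range(1, len(stack)))
        # Offset: base-xi encoding of the path taken through the current layer
        offset = 0
        for fact in stack:
            offset = offset * fact_types + (fact - 1)
        return segment + offset

    weights = [0] * 120          # one cell per arc: mu = 120 arcs for the tree of Figure 1
    i = arc_index([1, 1, 2], 3)  # arc reached after the sequence a1, a1, a2
    weights[i] += 5              # learning step: raise the weight by gamma = 5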
5 THE ALGORITHM

A tree is a recursive data structure, so it would seem natural to traverse it with a recursive algorithm. Such an algorithm, however, would use the machine stack, which is inconvenient: the stack stores the analysed sequence, and the machine stack is not easy to manipulate. We therefore decided to use an iterative algorithm and to manage our own stack with two sections. The first section contains the facts of the analysed sequence and the second one the weights of the corresponding arcs. The advantage of this approach is that the probability of the analysed sequence of facts can be computed quickly and easily, because the facts of this sequence already lie in the stack, together with the weights of the corresponding arcs.

    for each fact in the window, starting with the earliest one, do
    begin
        // calculate the segment and the offset
        if this is the first layer then
            Segment = 0;
            Offset = Fact - 1;
        else // for each next layer
            Segment += FactTypesNumber ^ Layer;
            Offset = Offset * FactTypesNumber + (Fact - 1);
        push (Fact, Arrows[Segment + Offset]) into the stack;
        // learning, which goes simultaneously with prediction!
        if (Arrows[Segment + Offset] + LearningCoefficient) < 101 then
            Arrows[Segment + Offset] += LearningCoefficient;
        else
            Arrows[Segment + Offset] = 100;
    end

    // explore the bunch of arrows grown from the last layer of window facts
    Segment += FactTypesNumber ^ Layer;
    Offset = Offset * FactTypesNumber;
    for each fact do
    begin
        push this fact into the stack;
        calculate the probability of the fact sequence in the stack;
        if this probability lies over a threshold then
            send a signal for ...
        pull the fact from the stack and try the next one;
    end

    // learning step for the last layer, i.e. for the fact
    // for which the prediction has just been made
    get the last fact;
    if (Arrows[Segment + Offset + Fact - 1] + LearningCoefficient) < 101 then
        Arrows[Segment + Offset + Fact - 1] += LearningCoefficient;
    else
        Arrows[Segment + Offset + Fact - 1] = 100;

    // fulfill amnesia
    for each arrow in the array of arrows do
        if Arrows[ArrowsPtr] > (AmnesiaCoefficient + 1) then
            Arrows[ArrowsPtr] -= AmnesiaCoefficient;

6 EXPERIMENTAL RESULTS

We built two applications based on the classification algorithm: the first one uses the dynamic tree and the second one uses the indexed array as its primary data structure. The first application is denoted App I and the second one App II. The results of AMSA simulations are used as the data set. AMSA operates with time series as its primary data, so these time series are used as training and test sets in the comparative analysis of the two applications. In the pre-processing stage the time series are first sliced into subsequences. The data in the resulting patterns are then normalised, which makes the distance measures comparable and allows us to use the k-means algorithm to cluster the subsequences into a discrete set of unordered facts.
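As an illustration of the described pre-processing, the following sketch slices a series into subsequences, z-normalises them and clusters them with k-means. It assumes scikit-learn's KMeans; all names and parameter values are illustrative rather than taken from the original AMSA pipeline:

    import numpy as np
    from sklearn.cluster import KMeans

    def series_to_facts(series, subseq_len, n_fact_types):
        """Slice a time series into subsequences, normalise them and
        cluster them into a discrete alphabet of facts numbered 1..xi."""
        # slice into overlapping subsequences of fixed length
        subseqs = np.array([series[i:i + subseq_len]
                            for i in range(len(series) - subseq_len + 1)])
        # z-normalise each subsequence so that only its shape matters
        subseqs = (subseqs - subseqs.mean(axis=1, keepdims=True)) / \
                  (subseqs.std(axis=1, keepdims=True) + 1e-8)
        # k-means clusters the shapes into xi unordered fact types
        labels = KMeans(n_clusters=n_fact_types, n_init=10).fit_predict(subseqs)
        return labels + 1   # facts are numbered from 1

    facts = series_to_facts(np.sin(np.linspace(0, 20, 500)),
                            subseq_len=16, n_fact_types=3)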
In the first experiment both applications process the experimental data, and the accuracy δ is used for the comparative analysis. The accuracy is the ratio of successfully classified samples to all samples:

    δ = κ / θ,

where κ is the number of correctly classified samples and θ is the total number of samples. In this experiment the impact of the window size ω is studied; the results are shown in Table 1.

    window size   App I   App II   ZeroR   J48
    3             0.45    0.45     0.35    0.46
    4             0.55    0.55     0.33    0.56
    5             0.60    0.60     0.38    0.61
    6             0.55    0.55     0.37    0.55
    7             0.52    0.52     0.33    0.53
    8             0.43    0.43     0.27    0.45
    9             0.36    0.36     0.22    0.38
    10            0.33    0.33     0.20    0.37
    overall       0.47    0.47     0.31    0.49

Table 1 Impact of the window size on accuracy

Figure 2 Impact of the window size on accuracy

In the second experiment both algorithms process the experimental data, and the execution performance, measured in milliseconds, is compared. The results are shown in Table 2.

    window size   App I   App II   ZeroR   J48
    3             318     180      75      440
    4             391     196      79      520
    5             470     199      82      643
    6             623     207      85      882
    7             738     213      88      976
    8             841     206      90      1201
    9             890     219      93      1270
    10            996     221      96      1376
    11            1209    223      99      1540

Table 2 Impact of the window size on execution performance (times in ms)

Figure 3 Impact of the window size on execution performance

The presented results show that both applications of the new algorithm achieve equal accuracy, which is almost the same as the accuracy of J48. The execution performance, on the other hand, differs significantly: the application with the indexed array is considerably faster than J48, and as the window size ω grows, the newly proposed algorithm decisively outperforms the existing ones.
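Both comparison measures can be reproduced with a small harness such as the sketch below. The predict function it expects is a hypothetical stand-in for any of the compared classifiers and is not part of the original applications:

    import time

    def evaluate(predict, facts, window_size):
        """Slide a window over the fact sequence and predict the next fact each time.

        Returns the accuracy delta = correct / total and the elapsed time in ms.
        predict -- hypothetical classifier: maps a list of window facts to a fact
        """
        correct, total = 0, 0
        start = time.perf_counter()
        for i in range(len(facts) - window_size):
            window = facts[i:i + window_size]
            if predict(window) == facts[i + window_size]:
                correct += 1
            total += 1
        elapsed_ms = (time.perf_counter() - start) * 1000
        return correct / total, elapsed_ms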
CONCLUSION

The newly proposed algorithm shows better execution performance than the existing ones. In addition, it consumes a significantly smaller amount of memory. These advantages are observed at the same accuracy ratio as in the cases of the dynamic tree and the incidence matrix. The new approach can be used both in the simultaneous manner described above, where prediction and learning happen at the same time, and in the classic way, with explicit learning and classification phases.

In future studies the influence of the learning and amnesia coefficients on accuracy and execution performance could be of interest. The learning performance of the newly proposed algorithm may also prove better than that of other classification approaches, such as decision trees and artificial neural networks; this will be the subject of future studies.

REFERENCES

[1] BARNAGHI, P. M., ALIZADEH SAHZABI, V., ABU BAKAR, A.: A Comparative Study for Various Methods of Classification. 2012 International Conference on Information and Computer Networks (ICICN 2012), IPCSIT Vol. 27, IACSIT Press, 2012, pp. 62-68
[2] RAJPUT, A. et al.: J48 and JRIP Rules for E-Governance Data. International Journal of Computer Science and Security (IJCSS), Vol. 5, Issue 2, 2011, pp. 202-217
[3] QUINLAN, J. R.: Induction of Decision Trees. Machine Learning 1, 1986, pp. 81-106
[4] JOHNSON, D. E., OLES, F. J., ZHANG, T., GOETZ, T.: A decision-tree-based symbolic rule induction system for text categorization. IBM Systems Journal, Vol. 41, No. 3, 2002
[5] BRAZIER, F. M. T., KEPHART, J. O., VAN DYKE PARUNAK, H., HUHNS, M. N.: Agents and Service-Oriented Computing for Autonomic Computing: A Research Agenda. IEEE Internet Computing, Vol. 13, Issue 3, 2009, pp. 82-87
[6] MARKOVA, V., SHOPOV, V., ONKOV, K.: An Approach for Developing the Behaviour Models of Autonomous Mobility Sensor Agent. Proc. of the International Conference "Information Technologies", 19-20 September, Varna, Bulgaria, 2008, pp. 97-105
[7] MARKOVA, V., SHOPOV, V.: Comparative Analysis of Algorithms for Learning in Autonomous Mobile Sensor Agent. Proceedings of the International Conference on Information Technologies, 2011, pp. 255-263