FAST ADAPTIVE LEARNING ALGORITHM FOR
CLASSIFICATION OF TIME SERIES
Vanya MARKOVA, Ventseslav SHOPOV, Velko ILTCHEV
Bulgarian Academy of Sciences – Institute of System Engineering and Robotics,
Technical University Sofia – Branch Plovdiv
Bulgaria
e-mail: markovavanya@yahoo.com
Abstract
This article presents a new fast algorithm for classification. The algorithm can process either symbol data or shape subsequences. The knowledge is stored as weights of the arcs of a tree in which all possible combinations are present. The data structure used to store the weights is an indexed linear array instead of a dynamic tree or an incidence matrix. An algorithm for processing this new data structure has been developed and compared with existing ones.
Keywords: machine learning, adaptive algorithm, decision making, classification, performance evaluation
1 INTRODUCTION
Induction rule learning is a simple yet powerful learning and classification model. Tree learning algorithms offer tools for discovering relationships, patterns and knowledge in databases and time series [3].
Induction rule trees are classifiers in the form of a tree structure that contains decision nodes and leaves; such a tree assigns a class value to an instance. The decision tree is a special kind of induction rule tree whose construction takes time polynomial in the number of attributes and inputs, as no backtracking is required [1]. The approaches used in WEKA are precise, but time-consuming [1], [2].
One application of these methods for decision making is the area of Autonomous Agents [4], [5]. Induction rule trees are used in the meta-learning part of the Autonomous Mobile Sensor Agent (AMSA) [6], [7], so every improvement has a significant impact on the overall performance of the agents' behaviour.
A faster approach for inferring induction rules will speed up the overall process of decision making of such agents. In this paper a new approach is proposed: the weights of an induction rule tree are stored in an indexed linear array instead of a dynamic tree or an incidence matrix. A suitable algorithm is developed and compared with the two existing cases, i.e. the dynamic tree and the incidence matrix. Prediction accuracy and execution performance are used as comparison measures.
2 THE APPROACH
We decided to store the knowledge in the form of weights of the arcs of a tree in which all possible combinations of facts are present.
The number of layers λ of this tree is λ = ω + 2, where ω is the size of the window. The number of outgoing arcs of a node corresponds to the number of fact types ξ. Hence, the number of nodes η in this "tree of knowledge" is

η = 1 + ξ + ξ^2 + … + ξ^(λ-1) = (ξ^λ - 1)/(ξ - 1),

where number_of_fact_types = ξ, and the number of arcs μ is

μ = η - 1 = ξ + ξ^2 + … + ξ^(λ-1).
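For the tree in Figure 1, for example, ξ = 3 and ω = 3, hence λ = 5, η = (3^5 - 1)/(3 - 1) = 121 nodes and μ = 120 arcs.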
This tree learns new knowledge in the following way:
When a sequence of facts occurs, the weights of the arcs the algorithm goes through increase. The value of the increase will hereinafter be called the "coefficient of learning" γ. Prediction and learning happen simultaneously!
Figure 1 Example of a tree which stores information about 3 facts. The window encloses 3 facts, i.e. the system can predict the 4-th fact.
When a sequence of facts has not occurred for a long time, the weights of the arcs the algorithm goes through decrease. The value of the decrease will hereinafter be called the "coefficient of amnesia" ρ. The values of these coefficients are parameters of the algorithm.
The tree in Figure 1 has just learned the sequences:
a1, a1, a2, a3
a3, a2, a2, a1
a3, a2, a2, a3 (follow the arcs in bold, please).
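To illustrate the two coefficients with values chosen purely for this example: with γ = 5, an arc of weight 42 that lies on an occurring sequence grows to 47, saturating at the maximal weight (100 in the algorithm below); with ρ = 1, an arc that stays unused decays by 1 per step until it reaches the floor of ρ + 1.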
3 WELL-KNOWN DATA STRUCTURES FOR STORING TREES
3.1 Structure with dynamic memory allocation
It stores only those sequences which have occurred at least once. This saves memory at the beginning of the learning process, but as the knowledge grows, the memory consumption grows as well and finally reaches the volume needed to store the full tree with all nodes and arcs.
Of course, an additional algorithm can be developed to prune the less informative branches, but such an algorithm will work slowly and will increase memory fragmentation.
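A minimal C sketch of what one node of such a dynamically allocated tree could look like (the declarations are our illustration, not taken from the paper); the child pointers are exactly the structural overhead discussed in Section 4:

    #define XI 3                      /* number of fact types */

    struct node {
        int          weight[XI];      /* weights of the outgoing arcs   */
        struct node *child[XI];       /* NULL until the sequence occurs */
    };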
3.2 Incidence matrix
The rows of this matrix contain the nodes of the current layer and the columns the nodes of the next one. The weight of the binding arc lies at the crossing point.
The advantage of this data structure is the fast search.
The disadvantages are:
- the matrix is sparse;
- a separate matrix for each layer is necessary, so a whole set of matrices must be defined;
- the number of rows and columns for each layer depends both on the number of facts and on the size of the window.
One possible solution to the second problem is to include the number of the layer in the number of the node, but this will slow down the search. Even in this case, the single remaining matrix stays sparse.
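As a worked example of the sparsity: with ξ = 3 fact types, the matrix between node layers 2 and 3 has ξ^2 = 9 rows and ξ^3 = 27 columns, i.e. 243 cells, but each node has only ξ outgoing arcs, so only 27 cells carry a weight, i.e. about 11% of the matrix.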
3.3 Static table
Such a table may have, for instance, the following 4 attributes: CurrentLayer, CurrentNode, Arc (which contains the weight of this arc) and NextNode.
It is not necessary to store NextLayer, because it can be calculated each time as NextLayer = CurrentLayer + 1.
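One row of such a table might be declared as follows (a sketch; the attribute names come from the text, the integer types are our assumption):

    struct table_row {
        int CurrentLayer;
        int CurrentNode;
        int Arc;           /* the weight of this arc               */
        int NextNode;      /* NextLayer is always CurrentLayer + 1 */
    };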
The advantage of this data structure is the fast search, which can be additionally sped up if the table is ordered by CurrentLayer and CurrentNode. In this case the algorithm can perform a dichotomic (binary) search.
The disadvantage is the large memory consumption.
4 THE NEW DATA STRUCTURE
The main idea behind this new structure is: "Only the arcs shall be stored, because they are the only elements which carry the information about the analysed sequences of facts."
For this purpose one single array of integers is sufficient. Its elements contain the arc weights, which can vary from 0 to 100 (for example).
The position of each arc in this array can be calculated as Segment + Offset, where:
- the segment ζ is the total number of arcs in all layers preceding the current layer ς. Hereinafter the notion layer will stand for a "layer of arcs" instead of a "layer of nodes". The segment can be calculated through the following formula:

ζ = ξ + ξ^2 + … + ξ^ς (with ζ = 0 for the first layer ς = 0);

- the offset τ points to the position of the arc within the current layer. With top_of_stack = α, bottom_of_stack = β and fact_in_stack = φ, the offset can be calculated through the following formula:

τ = Σ_{i=β..α} (φ_i - 1)·ξ^(α-i),

because the algorithm, described below, uses a stack to store the facts from the analysed sequence.
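The computation of this index can be sketched in C as follows (the function and variable names are ours; facts are coded 1..ξ, as in the pseudocode of Section 5):

    /* position of the arc reached after consuming facts[0..n_facts-1] */
    static int arc_index(const int *facts, int n_facts, int xi)
    {
        int segment = 0, offset = 0, xi_pow = 1;
        for (int layer = 0; layer < n_facts; ++layer) {
            if (layer > 0) {
                xi_pow *= xi;        /* xi^layer                         */
                segment += xi_pow;   /* zeta: arcs of all earlier layers */
            }
            offset = offset * xi + (facts[layer] - 1);   /* tau */
        }
        return segment + offset;     /* index into the array of weights */
    }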
The advantages of this data structure are:
- a very fast search, because the position of an arc can be calculated quickly and easily via a simple formula;
- significantly lower memory consumption in comparison with the matrix and with the static table. The structure with dynamic memory allocation has a lower memory consumption only at the beginning of the learning process. When the knowledge stored in the dynamic tree exceeds a certain volume, the amount of memory needed exceeds the amount needed for the proposed new data structure, because the dynamic tree must store not only the weights of the arcs but also the tree structure itself, represented through the pointers between the nodes.
5 THE ALGORITHM
A tree is a recursive data structure, hence it seems natural to traverse it with a recursive algorithm. Such an algorithm, however, would use the machine stack, which is inconvenient: the stack has to store the analysed sequence, and the machine stack is not easy to manipulate. Therefore we decided to use an iterative algorithm and to manage our own stack with two sections: the first contains the facts from the analysed sequence, and the second the weights of the corresponding arcs.
The advantage of this approach is that we can compute the probability of the analysed sequence of facts quickly and easily, because the facts from this sequence already lie in the stack, together with the weights of the corresponding arcs.
for each fact in the window, starting with the earliest one, do
begin
    // calculate the segment and the offset
    if this is the first layer then
        Segment = 0; Offset = Fact - 1;
    else // for each next layer
        Segment += FactTypesNumber ^ Layer;
        Offset = Offset * FactTypesNumber + (Fact - 1);
    push (Fact, Arrows[Segment + Offset]) onto the stack;
    // learning, which goes simultaneously with prediction!
    if (Arrows[Segment + Offset] + LearningCoefficient) < 101 then
        Arrows[Segment + Offset] += LearningCoefficient;
    else Arrows[Segment + Offset] = 100;
end
// explore the bunch of arrows grown from the last layer of window facts
Segment += FactTypesNumber ^ Layer;
Offset = Offset * FactTypesNumber;
for each fact type do
begin
    push this fact onto the stack;
    calculate the probability of the fact sequence in the stack;
    if this probability lies above a threshold then
        send a signal for ...
    pop the fact from the stack and try the next one;
end
// learning process for the last layer
get the last fact, i.e. the fact for which the prediction has just been made;
if (Arrows[Segment + Offset + Fact - 1] + LearningCoefficient) < 101 then
    Arrows[Segment + Offset + Fact - 1] += LearningCoefficient;
else Arrows[Segment + Offset + Fact - 1] = 100;
// fulfil amnesia
for each arrow in the array of arrows do
    if (Arrows[ArrowsPtr] > (AmnesiaCoefficient + 1)) then
        Arrows[ArrowsPtr] -= AmnesiaCoefficient;
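The pseudocode above can be condensed into a small self-contained C program. The sketch below rests on several assumptions of ours, not prescribed by the paper: the constants (ξ = 3, ω = 3, γ = 5, ρ = 1), all identifiers, the probability estimate (the weight of the strongest candidate arc normalised by the sum of the weights of its bunch), and the replacement of the "send a signal" step by a printout.

    #include <stdio.h>

    #define XI     3      /* number of fact types              */
    #define OMEGA  3      /* size of the window                */
    #define MAX_W  100    /* saturation value of an arc weight */
    #define GAMMA  5      /* coefficient of learning           */
    #define RHO    1      /* coefficient of amnesia            */

    enum { N_ARCS = 3 + 9 + 27 + 81 };  /* xi + xi^2 + ... + xi^(omega+1) */

    static int weights[N_ARCS];         /* the indexed linear array */

    static void learn_arc(int idx)      /* saturated weight increase */
    {
        weights[idx] = (weights[idx] + GAMMA > MAX_W) ? MAX_W
                                                      : weights[idx] + GAMMA;
    }

    /* window[] holds OMEGA facts coded 1..XI; next is the fact that
       actually followed.  Returns the predicted next fact (1..XI). */
    static int predict_and_learn(const int window[OMEGA], int next)
    {
        int segment = 0, offset = 0, xi_pow = 1;

        /* walk down the tree, learning while predicting */
        for (int layer = 0; layer < OMEGA; ++layer) {
            if (layer > 0) {
                xi_pow *= XI;            /* xi^layer                    */
                segment += xi_pow;       /* arcs of all previous layers */
            }
            offset = offset * XI + (window[layer] - 1);
            learn_arc(segment + offset);
        }

        /* explore the bunch of XI arcs grown from the last window fact */
        xi_pow *= XI;
        segment += xi_pow;
        offset *= XI;

        int best = 1, sum = 0;
        for (int f = 1; f <= XI; ++f) {
            sum += weights[segment + offset + f - 1];
            if (weights[segment + offset + f - 1] >
                weights[segment + offset + best - 1])
                best = f;
        }
        if (sum > 0)                     /* stands in for "send a signal" */
            printf("P(a%d) ~ %.2f\n", best,
                   (double)weights[segment + offset + best - 1] / sum);

        learn_arc(segment + offset + next - 1);  /* learn the last layer */

        for (int i = 0; i < N_ARCS; ++i)         /* fulfil amnesia */
            if (weights[i] > RHO + 1)
                weights[i] -= RHO;
        return best;
    }

    int main(void)
    {
        int window[OMEGA] = { 1, 1, 2 };       /* a1, a1, a2 from Figure 1 */
        printf("predicted: a%d\n", predict_and_learn(window, 3));
        return 0;
    }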
6 EXPERIMENTAL RESULTS
We built two applications based on the classification algorithm: the first one uses a dynamic tree and the second one uses the indexed array as its primary data structure. The first application is denoted App I and the second one App II.
The results of AMSA simulations are used as the data set. AMSA operates on time series as its primary data, so these time series are used as training and test sets in the comparative analysis of the two applications.
At the pre-processing stage the time series are first sliced into subsequences. After that the data in the patterns are normalised to provide equidistant measures. This allows us to use the k-means algorithm to cluster the subsequences into a discrete set of unordered facts, as sketched below.
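This pre-processing chain can be sketched in C as follows. It is a minimal sketch under stated assumptions: the subsequence length SUBLEN, the function names and the use of z-normalisation are our choices, and the centroids are assumed to come from an ordinary k-means run over the training subsequences.

    #include <math.h>

    #define SUBLEN 8                  /* length of one sliced subsequence */

    /* z-normalise a subsequence so that distance measures are comparable */
    static void normalise(double s[SUBLEN])
    {
        double mean = 0.0, sd = 0.0;
        for (int i = 0; i < SUBLEN; ++i) mean += s[i];
        mean /= SUBLEN;
        for (int i = 0; i < SUBLEN; ++i) sd += (s[i] - mean) * (s[i] - mean);
        sd = sqrt(sd / SUBLEN);
        for (int i = 0; i < SUBLEN; ++i)
            s[i] = (sd > 0.0) ? (s[i] - mean) / sd : 0.0;
    }

    /* map a normalised subsequence to the nearest k-means centroid; the
       returned label 1..k is the discrete, unordered "fact" fed to the tree */
    static int to_fact(const double s[SUBLEN],
                       const double centroid[][SUBLEN], int k)
    {
        int best = 0;
        double best_d = INFINITY;
        for (int c = 0; c < k; ++c) {
            double d = 0.0;
            for (int i = 0; i < SUBLEN; ++i)
                d += (s[i] - centroid[c][i]) * (s[i] - centroid[c][i]);
            if (d < best_d) { best_d = d; best = c; }
        }
        return best + 1;
    }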
                        accuracy
window size    app I    app II    ZeroR    J48
     3         0,45     0,45      0,35     0,46
     4         0,55     0,55      0,33     0,56
     5         0,6      0,6       0,38     0,61
     6         0,55     0,55      0,37     0,55
     7         0,52     0,52      0,33     0,53
     8         0,43     0,43      0,27     0,45
     9         0,36     0,36      0,22     0,38
    10         0,33     0,33      0,2      0,37
  overall      0,47     0,47      0,31     0,49
Table 1 Impact of the window size on accuracy
Figure 2 Impact of the window size on accuracy
In the first experiment both applications process the experimental data, and the accuracy δ is used for the comparative analysis. The accuracy is the number of successfully classified samples as a ratio to all samples:

δ = σ/θ,

where σ is the number of correctly classified samples and θ the number of all samples. In this experiment the impact of the window size ω is studied; the results are shown in Table 1.
In the second experiment both algorithms process the experimental data, and the execution performance is compared. The execution performance is measured in milliseconds.
                       time in ms
window size    app I    app II    ZeroR    J48
     3          318      180       75      440
     4          391      196       79      520
     5          470      199       82      643
     6          623      207       85      882
     7          738      213       88      976
     8          841      206       90     1201
     9          890      219       93     1270
    10          996      221       96     1376
    11         1209      223       99     1540
Table 2 Impact of the window size on execution performance
Figure 3 Impact of the window size on execution performance
From the presented results it is obvious that both applications of the new algorithm show equal accuracy, which is almost the same as the accuracy of J48. On the other hand, the execution performance differs significantly: the application with the indexed array is significantly faster than J48. Moreover, as the window size ω grows, the newly proposed algorithm decisively outperforms the existing ones.
CONCLUSION
The newly proposed algorithm shows better execution performance than the existing ones. In addition, it consumes a significantly smaller amount of memory. These advantages of the newly proposed algorithm are observed at the same accuracy as in the case of the dynamic tree and the incidence matrix. The new approach can be used both in the simultaneous manner and in the classic way, with explicit learning and classification parts.
In future studies the influence of the learning and amnesia rates on accuracy and execution performance could be of interest. The learning performance of the newly proposed algorithm could probably turn out better than the learning performance of other classification approaches, such as decision trees and artificial neural networks; this will be the matter of future studies.
REFERENCES
[1] BARNAGHI P. M., ALIZADEH SAHZABI V., ABU BAKAR A., A Comparative Study for Various Methods of Classification, 2012 International Conference on Information and Computer Networks (ICICN 2012), IPCSIT vol. 27, IACSIT Press, 2012, pp. 62-68
[2] RAJPUT A. et al., J48 and JRIP Rules for E-Governance Data, International Journal of Computer Science and Security (IJCSS), Volume 5, Issue 2, 2011, pp. 202-217
[3] QUINLAN J. R., Induction of Decision Trees, Machine Learning 1, 1986, pp. 81-106
[4] JOHNSON D. E., OLES F. J., ZHANG T., GOETZ T., A decision-tree-based symbolic rule induction system for text categorization, IBM Systems Journal, Vol. 41, No. 3, 2002
[5] BRAZIER F. M. T., KEPHART J. O., VAN DYKE PARUNAK H., HUHNS M. N., Agents and Service-Oriented Computing for Autonomic Computing: A Research Agenda, IEEE Internet Computing, Volume 13, Issue 3, pp. 82-87
[6] MARKOVA V., SHOPOV V., ONKOV K., An Approach for Developing the Behaviour Models of Autonomous Mobility Sensor Agent, Proc. of the International Conference "Information Technologies", 19-20 September, Varna, Bulgaria, 2008, pp. 97-105
[7] MARKOVA V., SHOPOV V., Comparative Analysis of Algorithms for Learning in Autonomous Mobile Sensor Agent, Proceedings of the International Conference on Information Technologies, 2011, pp. 255-263