An object-oriented C++ library for multi-layer feed-forward neural networks based on the Decorator design pattern

Michal Czardybon (1) and Lars Nolle (2)

(1) Silesian University of Technology, Faculty of Automatic Control, Electronics and Computer Science (Wydzial Automatyki, Elektroniki i Informatyki), ul. Akademicka 16, 44-100 Gliwice, Poland
(2) School of Computing and Informatics, The Nottingham Trent University, Burton Street, Nottingham, NG1 4BU, UK, lars.nolle@ntu.ac.uk

Summary

This paper describes the design of an object-oriented C++ library for multi-layer feed-forward neural networks following the Decorator design pattern. One design objective was to take full advantage of the object-oriented paradigm, i.e. ease of use, ease of understanding and ease of extension, with no impact on efficiency. The greatest problem encountered was the requirement of atomicity of classes, which required decoupling the neural network classes from the learning algorithms, in this case from the error-backpropagation learning algorithm, which relies strongly on knowledge about the internal structure of a network. Three alternative design patterns were evaluated for this project: the Decorator design pattern, the Extension Objects design pattern, and the External Map design pattern. The Decorator design pattern was then chosen for the design of the library. The design decisions are presented and justified, including the overall design and the choice of the underlying data structures, with notes on efficient and concise ways of implementation, all done in a purely object-oriented way.

KEYWORDS: neural networks, object-oriented design, decorator design pattern

1 Introduction

During a research project, an efficient object-oriented C++ library for multi-layer feed-forward neural networks was needed. Object-oriented programming (OOP) is widely believed to improve the modularity and reliability of software products; it increases maintainability and encourages reuse of existing code. These advantages result from the general design rules of binding concepts from the world being modelled with classes, and pairs of concepts satisfying the specialisation-generalisation relation with inheritance. Unfortunately, in some circumstances it turns out not to be straightforward to design a set of classes that allows high flexibility and extensibility. This problem has already been explored, which resulted in the development of design patterns [1] that provide general solutions to frequently encountered design problems in object-oriented programming. These general rules are becoming more and more popular. The design of the library presented in this article applies some of these rules to the task of modelling multi-layer feed-forward neural networks, which is believed to have resulted in a significantly better design with respect to flexibility and extensibility compared with existing libraries. As a consequence, new learning algorithms can easily be added without any changes to the existing code.

1.1 Requirements identified

The initial purpose of the NTUNE library (Nottingham Trent University Neural Environment) was to capture the model of multi-layer feed-forward neural networks in an object-oriented paradigm and to provide a generic, reusable toolkit for various applications. The most important factor to be taken into account during the design process of any library is its reusability.
The success of a library can be measured by the number of applications it has been used in. To be successful, a library has to be easy to use, to understand and to extend, and efficient at the same time. The object-oriented paradigm was used for the NTUNE library, as it is widely believed to be the best-known choice to meet these goals. To take full advantage of this paradigm, some rules had to be followed carefully. During the design of the NTUNE library, some of the explicit guidelines from [2] were taken into account. The numbers in brackets denote the corresponding guideline in [2]:

1. High-level abstractions (13) – the higher the abstraction, the closer it is to the way people think. Algorithms can be expressed more elegantly when they are based on high-level data structures.
2. Atomicity of classes (14) – when the classes are well encapsulated, so that each of them models exactly one well-separated concept or task, it is easier to modify or exchange the implementation of one of them without worrying about the rest.
3. Do not repeat yourself (17) – every concept should be described only once; every repetition not only doubles the programmer's effort but, much worse, is a threat to consistency.

Translated into the field of multi-layer feed-forward networks, these guidelines express some explicit requirements for the library:

1. High-level abstractions – vectors and matrices with dot-product-like high-level operations are better than low-level tables operated on in explicit loops.
2. Atomicity of classes – the design should be modular, so that parts of it are interchangeable with alternatives. In particular, as there are many different algorithms known for the task of learning, they should be interchangeable. Moreover, a trained network can be used in applications where it only has to compute the function learned; thus there should be a "core" computing layer class, which knows nothing about any learning algorithm.
3. Do not repeat yourself – in particular, no learning algorithm should repeat any part of the common functionality of a neural network or duplicate the common structure. Similarly, what is done in the "core" network should not be repeated in the learning algorithm classes.

2 Artificial Neural Networks

The first models of Artificial Neural Networks (ANNs) were introduced in 1943 by Warren McCulloch and Walter Pitts [3], but it took another 20 years to overcome the problems of their first approach and to find appropriate learning methods. Since then, applications of artificial neural networks have grown rapidly in a wide area of science and industry. Artificial neural networks simulate the biological structure of neural networks in brains in a simplified way in order to endow computers with the very considerable abilities of biological neural networks: the ability to learn from examples, pattern classification, generalisation and prediction. Both artificial and biological neural networks consist of a large number of simple processing elements, called units or nodes, which are connected to other processing elements. The great performance of neural networks stems from this massive parallelism [4]. It was proven by Hornik et al. [5] that "... standard multilayer feedforward networks with as few as one hidden layer using arbitrary squashing functions are capable of approximating any Borel measurable function from one finite dimensional space to another to any desired degree of accuracy ...". Hornik also provided evidence that such networks "...
are capable of arbitrarily accurate approximation to a function and its derivatives" [6]. Therefore, feed-forward networks can be used to approximate a given process by presenting suitable examples to it, provided a deterministic relationship exists between the input parameters and the output parameters. Because of this, the feed-forward architecture is the most commonly used type of artificial neural network, and an object-oriented C++ library was required for this network type.

2.1 Feed-forward networks

A feed-forward network can be seen as a black box with defined and directed inputs and outputs (Figure 1). That means the information is fed through the net in only one direction, without any feedback loops. Figure 2 shows a single processing element of a feed-forward network. The output of the processing element is calculated by adding a bias value to the sum of all inputs multiplied with their associated weights (Equation 1) and passing the result through a non-linear transfer function, usually a sigmoid (Equation 2) or hyperbolic tangent function.

$x = bias + \sum_i w_i x_i$   (1)

$y = s(x) = \frac{1}{1 + e^{-x}}$   (2)

The sigmoid function $s(x)$ can be considered a "smooth" binary step function, so that the first derivative $s'(x)$ exists and is easy to compute (Equation 3).

$s'(x) = s(x)\,(1 - s(x))$   (3)

The processing elements are arranged in layers: one input layer, one or more hidden layers – so called because they are 'invisible' from the outside – and one output layer. Each output of each unit in one layer is usually connected with every unit in the next layer. Units within the same layer are not connected to each other (Figure 3). The networks work in two modes: the training mode and the testing mode. In training mode, a set of training input patterns together with the demanded output patterns is used to adjust the weights in order to minimise the error between the produced network output and the demanded network output. In test mode, new input patterns can be presented to the network and the network output is used in the application.

2.2 Error-backpropagation learning algorithm

What makes neural networks really interesting is their ability to learn and generalise a function from examples. The classical learning algorithm for the multi-layer feed-forward architecture – error backpropagation – is based on the gradient descent rule. Its derivation can be found in [4]. This section briefly presents how it works. The training starts with a random initialisation of the weights and biases. Then many learning cycles (epochs) are performed. In each epoch all the training patterns (the examples) are presented in sequence to the network and the weights of the neural network are updated. The algorithm for updating the weights for one data pattern is as follows (a minimal code sketch of these update rules is given at the end of this section):

1. Compute the output of the network for the given input as usual, but additionally remember the outputs of all the neurons.
2. Compute the delta values for all the neurons in the output layer:
   $\delta_i = output_i \,(1 - output_i)\,(target_i - output_i)$
   where $i$ is the index of a neuron in the output layer, $output_i$ is the obtained value on the $i$th output and $target_i$ is the expected value on the $i$th output.
3. Compute the delta values for all the remaining neurons in all the hidden layers (this is the error-backpropagation rule):
   $\delta_i^{(k)} = output_i^{(k)} \left(1 - output_i^{(k)}\right) \sum_{j=1}^{n_{k+1}} w_{ji}^{(k+1)} \,\delta_j^{(k+1)}$
   where $i$ is the index of a neuron in the current layer, $j$ is the index of a neuron from the next layer, $k$ is the index of the current layer and $n_{k+1}$ is the number of neurons in the next layer.
4. Update all weights in all the layers:
   $w_{ij} \leftarrow w_{ij} + learning\_rate \cdot \delta_i \cdot input_j$
   where $w_{ij}$ is the weight corresponding to the $j$th input of the $i$th neuron.
5. Update all biases in all the layers:
   $b_i \leftarrow b_i + learning\_rate \cdot \delta_i$

The learning process is performed iteratively until a stop condition occurs, for example when the error has not been decreasing over some specified number of iterations. From the C++ library designer's point of view, the important thing is that for this algorithm the internal structure of a network cannot be hidden completely (as suggested by the encapsulation rule of the object-oriented paradigm) and some additional data are needed for every neuron. These issues will be discussed later in this article.
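The following minimal, self-contained sketch illustrates the update rules above for a single output layer. It is not the NTUNE implementation; the names (SimpleLayer, compute, updateOutputLayer, learning_rate) and the use of std::vector are chosen for this example only.

#include <cmath>
#include <vector>

// Minimal illustration of equations (1)-(3) and of steps 2, 4 and 5 of the
// algorithm for a single output layer. Not the NTUNE code.
struct SimpleLayer
{
    std::vector< std::vector<double> > weights; // weights[i][j]: jth input of ith neuron
    std::vector<double> biases;                 // biases[i]: bias of ith neuron

    static double sigma(double x) { return 1.0 / (1.0 + std::exp(-x)); } // equation (2)

    // Feed-forward pass, equations (1) and (2). The caller keeps the returned
    // outputs, because backpropagation needs them later (step 1).
    std::vector<double> compute(const std::vector<double>& input) const
    {
        std::vector<double> output(biases.size());
        for (std::size_t i = 0; i < biases.size(); ++i)
        {
            double sum = biases[i];
            for (std::size_t j = 0; j < input.size(); ++j)
                sum += weights[i][j] * input[j];
            output[i] = sigma(sum);
        }
        return output;
    }

    // Steps 2, 4 and 5 for the output layer:
    // delta_i = output_i (1 - output_i)(target_i - output_i),
    // w_ij += learning_rate * delta_i * input_j,  b_i += learning_rate * delta_i.
    void updateOutputLayer(const std::vector<double>& input,
                           const std::vector<double>& output,
                           const std::vector<double>& target,
                           double learning_rate)
    {
        for (std::size_t i = 0; i < biases.size(); ++i)
        {
            double delta = output[i] * (1.0 - output[i]) * (target[i] - output[i]);
            for (std::size_t j = 0; j < input.size(); ++j)
                weights[i][j] += learning_rate * delta * input[j];
            biases[i] += learning_rate * delta;
        }
    }
};

The hidden-layer case (step 3) differs only in how the delta value is obtained: instead of the term $(target_i - output_i)$, the weighted sum of the delta values of the next layer is used.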
2.3 Other algorithms

Error-backpropagation is the classic algorithm, but it is not the only one. In recent years many improvements to the algorithm and alternative algorithms have been proposed. For example, one may consider general-purpose optimisation methods such as simulated annealing [8] or genetic algorithms [9]. It was important to take this into account during the design phase, so that other learning algorithms can easily be added to the library in the future.

3 Object-Oriented Analysis

This section describes the identification of the main objects in the system. The most obvious candidate classes are the nouns from the informal description [10]: neural network, neuron, layer, connection (link), weight, example (pattern), error, signal, vector, input, output, transfer function. Some of them denote primitive values rather than types (weight, error, signal, input, output). A vector is an abstract data type present in many libraries, e.g. in the STL [11]. Inputs and outputs are examples of data of vector type (hence they do not denote classes themselves). Similar to the vector is the matrix type. Although it is not present in the description explicitly, it was found to be suitable for some implicit requirements of the implementation. Abstract data types provide a higher level of abstraction, so they were preferred over primitive values (more details will follow). The transfer function is a method rather than an object, but it might be of benefit to represent it by a function object if many different interchangeable transfer functions were needed. A sample, or data pattern, is an auxiliary class of objects.

After this initial analysis, the remaining candidates for the main classes are: neural network, neuron, layer and connection (link). The relationships are: a neural network consists of several layers, each layer consists of several neurons, and the neurons from consecutive layers are connected with connections (or links). The most primitive object – the neuron – is fully described by a vector of weights and a bias value. At this point, the final set of classes may seem obvious: NeuralNetwork, Layer, Neuron and Link. In fact, this set can be found in [12]. It is very flexible, but it might be argued that it is too flexible for the purpose of a library for multi-layer feed-forward neural networks.
For this particular architecture many significant simplifications can be made that result in a simpler design (fewer concepts) and better efficiency (less granularity). In fact, it was decided to discard the neuron class and to choose the layer class as the smallest component of a network. The second simplification was to discard the link class as well, as it is in fact not needed at all: the links store no data and, as only fully connected layers are considered, the connections exist implicitly between every two neurons of any two consecutive layers. Additionally, it was observed that a neural network has a basic interface similar to a layer, i.e. the ability to compute vector operations. Moreover, there might be other objects with that functionality for applications other than neural networks, e.g. fuzzy-logic controllers. For this reason a VectorMapping class was introduced as a general common interface of the layer, the neural network and potentially other black-box-like vector functions.

Up to this point, no potential classes for learning algorithms have been mentioned. They will be discussed separately later, as it was an important design problem to express them independently from the structure of the network. This design is consistent with the second requirement identified (the structure should know nothing about the learning algorithms). In conclusion, the NeuralNetwork and Layer classes, as specialisations of the VectorMapping class, were chosen to constitute the main part of the design. For reasons of flexibility (explained below) the Layer class is only an interface and the concrete class is ComputingLayer (see Figure 4).

4 Design and implementation

This section explains the details of the design decisions made and gives a rationale for them. As in the C++ language the design phase is strongly dependent on the implementation possibilities, some implementation details are presented here as well. It is important to note that the software development process used was iterative, while this article describes the final product, so the decisions described here were not necessarily reached in the first iteration or in the presented order.

4.1 Underlying data structures

The first fundamental problem to solve was the need for high-level abstract data types for vectors and matrices. The problem was that there was no standard matrix class in the C++ standard library, while a vector class existed. An appropriate matrix class had to be written or found in some free (preferably GNU/GPL) library. The requirement for this class was to allow access to sequences of elements of rows and columns uniformly, in the same way as to sequences of elements of vectors. The reason for this requirement was that it would allow expressing the dot-product-like operations generally. A decision was made to write both the vector and the matrix class from scratch. The uniformity of access to sequences was achieved by introducing an additional class – the Sequence – which is an additional level of abstraction between data structures and iterators. The Sequence class is a unified view on vectors, rows of matrices and columns of matrices that provides an iterator for read and write access to its elements. The implementation of this concept in the NTUNE library is simple and not parametrised by type, so it is not general enough to be used as a general-purpose tool; nevertheless, a general implementation could be made.
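To illustrate the idea (this is not the NTUNE implementation, whose internal representation is not shown here), such a unified view can be realised as a non-owning sequence described by a start pointer, a length and a stride: a vector or a matrix row exposes a sequence with stride 1, a matrix column a sequence with a stride equal to the row length. The sketch below is written against the usage visible in the later listings (operator(), getSize(), dot_product); its data members are assumptions made for the example.

// Illustrative sketch of the Sequence idea; the internal representation
// (pointer, size, stride) is an assumption made for this example.
class Sequence
{
public:
    Sequence(double* start, int size, int stride)
        : start_(start), size_(size), stride_(stride) { }

    int getSize() const { return size_; }

    // Element access used like a subscript: seq(i)
    double& operator()(int i)       { return start_[i * stride_]; }
    double  operator()(int i) const { return start_[i * stride_]; }

    // Dot product of two sequences of equal length, e.g. a matrix row
    // and an input vector.
    static double dot_product(const Sequence& a, const Sequence& b)
    {
        double sum = 0.0;
        for (int i = 0; i < a.getSize(); ++i)
            sum += a(i) * b(i);
        return sum;
    }

private:
    double* start_;
    int size_;
    int stride_;
};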
4.2 Layer class

A layer is a computing component containing several neurons. In this library, however, for the purpose of efficiency, the neurons are not present explicitly. Instead, the layer is parametrised by a matrix of weights and by a vector of biases. The outputs are computed according to the following equation:

$output_i = \sigma\left( weights.row(i) \cdot input + biases_i \right)$   (4)

where $weights.row(i)$ denotes the $i$th row of the matrix of weights, which is the first argument of the dot-product operation. The $i$th row of the matrix contains the weights of the $i$th "virtual" neuron in the layer (see Figure 5). The bias vector contains the biases of all the neurons in the layer. The interface of the Layer class can be divided into two sub-interfaces: the computational one, and the one for insight, i.e. the methods for reading and modifying the weights and the biases. The insight is provided via the Sequence class.

class Layer : public VectorMapping
{
public:
    Layer(int n_inputs, int n_outputs) : VectorMapping(n_inputs, n_outputs) { }
    virtual ~Layer() { }

    /******* VECTOR-MAPPING (interface) ************************/
    virtual void compute(Sequence input, Sequence output) = 0;

    /******* INSIGHT (interface) *******************************/
    virtual double   getBiasForNeuron(int n) = 0;
    virtual void     setBiasForNeuron(int n, double v) = 0;
    virtual Sequence getWeightSequenceOfNeuron(int n) = 0;
    virtual Sequence getWeightSequenceForInput(int k) = 0;

    int getNeuronCount() { return getOutputCount(); }
};

This representation proved to be very elegant, as many operations could be expressed at a higher level using dot-product operations on sequences. As already shown, the dot product between a row and an input vector is used to compute the output of the layer; an explicit low-level loop is avoided. In consequence, the whole implementation of the basic functionality of the layer class occupies about 10 lines of code (see below) and is easy to read and to understand.

void ComputingLayer::compute(Sequence input, Sequence output)
{
    assert(output.getSize() == this->getNeuronCount());
    for (int k = 0; k < getNeuronCount(); k++)
    {
        double sum = Sequence::dot_product( input, getWeightSequenceOfNeuron(k) )
                   + biases(k);
        output(k) = sigma( sum );
    }
}

4.3 Network class

The neural network class is not complicated in its construction – it simply collects several layers and is responsible for the propagation of signals between consecutive layers in the compute() method. The next listing shows the declaration of the network class, neglecting some unimportant details:

class NeuralNetwork : public VectorMapping
{
public:
    /******** CREATION *********************************************/
    NeuralNetwork(int n_inputs, int n_outputs);
    ~NeuralNetwork();

    /// Add a hidden layer (insert it just before the output layer)
    void addHiddenLayer(int size);

    /******** VECTOR-MAPPING (overridden) **************************/
    void compute(Sequence input, Sequence output);

    /******** INSIGHT **********************************************/
    int getLayerCount();
    Layer* getLayer(int k);
    void setLayer(int k, Layer* lay);

private:
    std::vector<Layer*> layers;
};

A small problem was encountered with the process of network construction. From a high-level point of view it might seem obvious that the constructor should take a list containing the sizes of the different layers, for example:

net = new NeuralNetwork( {3, 5, 2} );

but this is not possible in C++.
The passed array would have to be pre-prepared:

int sizes[] = {3, 5, 2};
net = new NeuralNetwork( sizes, 3 );

Alternatively, a variable-length parameter list could be used, which is not type-safe and generally not elegant. If the number of arguments can be limited to some constant value, a parameter list could be employed with default 0 values for the parameters that are not used:

NeuralNetwork::NeuralNetwork(int size1, int size2,
                             int size3 = 0, int size4 = 0, int size5 = 0);

Then, at the cost of the inelegant definition and the hard-wired limitation of the number of possible layers (to four in the above example), the usage would be very natural:

net = new NeuralNetwork( 3, 5, 2 );

Eventually another solution was chosen as a compromise between conciseness and elegance: a network with no hidden layers is created with the constructor, and an arbitrary number of hidden layers is then added with the addHiddenLayer() method. The constructor takes only the number of inputs and outputs as its parameters.

net = new NeuralNetwork(3, 2);
net->addHiddenLayer(5);

4.4 Auxiliary classes

Some auxiliary classes are common to all the learning algorithms. There is a class for data patterns (Pattern) and one for a collection of data patterns (PatternSet). The NetworkSerializer class is provided with static methods for saving networks to files and loading them back. There are also some functions for general vector-mapping operations that compute the errors produced for a specified set of data patterns.

5 Selecting a design pattern

Having the basic functionality of neural networks designed separately from the learning algorithms, the next task was to find a way of adding learning algorithms to a network dynamically at run-time. This section describes the general problem and discusses different candidate solutions. The final solution is presented together with the rationale behind the decision.

5.1 Problem description

The problem encountered was that the error-backpropagation algorithm is strongly connected with the internal structure of a neural network. The main difficulty to overcome was that the learning algorithm requires additional information spread among all the neurons of a network. In fact, the values of all the outputs of all the neurons have to be stored during the feed-forward phase, since they are used later in the error-backpropagation phase, as well as the delta values, which are conveniently computed before any weight updates are made. When using the momentum learning method [13], a history of weight changes has to be remembered too. Moreover, some additional methods connected with learning seem more appropriately placed in the layer class than in the learning algorithm class, e.g. for random weight initialisation (initRandom) or for delta value computation (computeOutputDelta, computeHiddenDelta).

The above problem can be considered an instance of the more general problem of adding new methods and new fields to an existing class. The C++ language has a feature which can be employed for this purpose – inheritance. However, it is a static feature and does not allow objects to be modified dynamically, i.e. during program execution. Unfortunately, the static case is very inflexible. For example, it is not possible to create a neural network, train it with one algorithm first and then continue with another one, which may be very appropriate (hybrid methods).
If only the static feature is available, every time an algorithm is changed new objects have to be created and their contents need to be copied. Another argument for a more dynamic approach is elegance. It is typical to train a network and then, when the training is finished, use it in an application. The additional data needed during training should be removed when they are no longer necessary, for efficiency, e.g. if the trained network is to be used in a system with memory constraints, such as a mobile device. It was therefore decided that a method for adding and removing fields and methods dynamically had to be developed as a better approach. The three consecutive sections describe the three alternative solutions that were considered, from which one was chosen.

5.2 Solution 1 – The Decorator design pattern

For the purpose of efficiency, the C++ programming language does not allow methods and fields to be added or removed dynamically. A quite elegant solution for this general problem has already been found – the Decorator design pattern [1]. It was concluded to be appropriate for the problem of learning algorithms for neural networks, as [1] suggests its use (in two cases out of three):

- to add responsibilities to individual objects dynamically and transparently, that is, without affecting other objects;
- for responsibilities that can be withdrawn.

In the Decorator design pattern the extended object is not created by static inheritance but by dynamic composition (a "has-a" relationship instead of "is-a"). There are two objects constituting the whole: the core object, i.e. the object being extended, and the object of the extension – the decorator (see Figure 6). Besides the additional fields and methods, the decorator implements the interface of the core, simply redirecting the common messages to the core object it contains, so that it may replace the core wherever it was used (the need to implement the same interface is the reason why the abstract interface of the layer is separated). An example of a redirected (or delegated) message is the query for the bias value of a neuron:

double LearningLayer::getBiasForNeuron(int n)
{
    return core->getBiasForNeuron(n);
}

In cases where the functionality of an old method has to be extended, it can be implemented as in the example below, where the outputs of all the neurons have to be stored during the feed-forward phase:

void LearningLayer::compute(Sequence input, Sequence output)
{
    core->compute(input, output);
    // store the result; it will be needed for backpropagation
    Sequence::copy( output, this->output.getSequence() );
}

The Decorator design pattern can be illustrated as a box that the core can be placed in and replaced by, and that allows new features to be added to its content. An important property of this solution is that the extension can be removed when it is no longer needed and that it imposes no requirements on the object being extended. On the other hand, unlike with static inheritance, there is a need to explicitly redirect all the messages intended for the core object.
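For completeness, a skeleton of such a decorator is sketched below. It combines the two snippets above with the additional data listed in Section 5.1; the exact field names (core, output, delta), the Vector type and the getInputCount() accessor are assumptions made for this illustration, not the actual NTUNE declaration.

// Illustrative skeleton of the layer decorator. Field names, the Vector type
// and getInputCount() are assumed for this sketch.
class LearningLayer : public Layer
{
public:
    explicit LearningLayer(Layer* core)
        : Layer(core->getInputCount(), core->getOutputCount()),
          core(core),
          output(core->getOutputCount()),
          delta(core->getOutputCount()) { }

    // Extended behaviour: compute and remember the outputs (defined as shown above).
    void compute(Sequence input, Sequence output);

    // Delegated interface: the common messages are simply forwarded to the
    // core object, as in the getBiasForNeuron() example above.
    double   getBiasForNeuron(int n);
    void     setBiasForNeuron(int n, double v);
    Sequence getWeightSequenceOfNeuron(int n);
    Sequence getWeightSequenceForInput(int k);

private:
    Layer* core;    // the decorated ComputingLayer
    Vector output;  // outputs remembered during the feed-forward phase
    Vector delta;   // delta values computed during backpropagation
};

Because LearningLayer implements the Layer interface, it can be substituted for the original ComputingLayer inside the NeuralNetwork object, which is exactly what the learning algorithm does in Section 6.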
5.3 Solution 2 – The Extension Objects design pattern

The Decorator design pattern is not the only possible solution to the problem discussed. There is already a simple and quite popular solution to the problem of additional data that have to be added to an existing class, so-called "user data". It can be implemented, for example, in the form of a special field such as:

void* user_data;

Unfortunately this approach is not elegant: it is untyped, and the "user data" member exists even if it is not used. Also, the object has to be specially prepared for an extension, i.e. it "has to know" that it will be extended. The above solution seems to support additional data only, but additional methods can be supported as well. To achieve this, the "user data" object should contain a pointer to the "core" object (as in the Decorator design pattern, see Figure 7 b, c) and should be of a class with methods operating on both the "core" and the "user data" (own) objects. This can be interpreted as the "core" object containing the "inherited" members and the "user data" object containing the added members (both fields and methods). This approach could be extended even further to support multiple extensions at the same time (although this is beyond the needs of the learning algorithms), e.g. there would be a set of extending objects ("user data" objects) instead of just a single one.

This solution was described in [14] as the Extension Objects design pattern, which is suggested to be used when "you need to support the addition of new or unforeseen interfaces to existing classes and you don't want to impact clients that don't need this new interface. Extension Objects lets you keep related operations together by defining them in a separate class." The described applicability concentrates on extensions of the interface (additional methods), and it is not stated there that it also makes it possible to store additional data; however, this is clearly the case. It could be a solution to the problem of adding learning algorithms to neural networks dynamically. However, the "core" objects have to know that they may be extended, i.e. the interface for extensions has to be there, even if in some applications it is not used. Therefore, it was concluded that the Decorator design pattern is more suitable, as it imposes fewer requirements on the class to be extended. The Extension Objects pattern also has some advantages, e.g. the interface of the class being extended does not need to be preserved.

5.4 Solution 3 – The External Map

There is yet another solution, which does not require the "core" class to know that it will be extended and at the same time does not need to reimplement the common interface. This is the External Map pattern (see Figure 7d), which introduces a map, i.e. an associative container, mapping pointers to "core" objects to the associated extension objects:

std::map<Subject*, Extension*> extensions_map;

which can be translated into the area of neural networks as:

std::map<Layer*, BackpropExtension*> extensions_map;

The extensions are implemented exactly as in the Extension Objects and Decorator design patterns, i.e. they store pointers to the "core" object. The map can be placed anywhere – for example in the class of a learning algorithm. A drawback of this solution is that the operation of accessing the extension can be more expensive. However, as there are usually only a few layers in an ANN application, the overhead would not be significant. The External Map pattern also introduces more additional data than the previously described solutions. In some way it actually introduces a parallel duplication of the set of objects being extended, which was the main reason why the simpler Decorator design pattern was preferred.
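To make the difference from the Decorator concrete, the following fragment sketches how a learning algorithm could manage such a map. It is an illustration only (this pattern is not used in NTUNE); the BackpropExtension class and the getExtension() helper are hypothetical.

#include <map>

// Hypothetical use of the External Map pattern (not used in NTUNE).
// BackpropExtension would hold the per-layer data needed by the algorithm.
class ExternalMapBackprop
{
public:
    // Return the extension associated with a layer, creating it on first access.
    BackpropExtension* getExtension(Layer* layer)
    {
        std::map<Layer*, BackpropExtension*>::iterator it = extensions_map.find(layer);
        if (it != extensions_map.end())
            return it->second;
        BackpropExtension* ext = new BackpropExtension(layer);
        extensions_map[layer] = ext;
        return ext;
    }

    // Destroy all extensions when the training is finished.
    ~ExternalMapBackprop()
    {
        std::map<Layer*, BackpropExtension*>::iterator it;
        for (it = extensions_map.begin(); it != extensions_map.end(); ++it)
            delete it->second;
    }

private:
    std::map<Layer*, BackpropExtension*> extensions_map;
};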
5.5 Selecting a design pattern

There are pros and cons for each of the analysed solutions, so for different applications different solutions may be appropriate. Table 1 provides a comparison of the alternative solutions evaluated in this work; it might also be of help in selecting a specific pattern for other applications. The Decorator design pattern was chosen as the most appropriate solution for the problem of pluggable learning algorithms for neural networks, because it was concluded to offer the best compromise between simplicity and efficiency. In this application domain only one extension is used at a time; for other applications another alternative might be better. For example, the Extension Objects design pattern could be successfully applied in the design of a computer game, where characters gain many new abilities when they find artefacts and lose them when the artefacts are lost. On the other hand, the External Map is irreplaceable when the extension concerns objects from a given library that is closed for changes, when absolutely no assumptions can be made about them while the library is developed. Figure 8 shows the final class structure derived from the evaluation.

6 Implementation of the error-backpropagation algorithm

This section describes the implementation of the classic error-backpropagation learning algorithm. The error-backpropagation algorithm (with momentum) is implemented in the BackpropAlgorithm class. This class has a method named attachNetwork(), which has to be invoked with the network that is to be trained. Inside this method the Decorator design pattern is applied, i.e. the layer objects of the network are replaced by their decorated versions. Then the initNetwork() method should be called to set the weights and biases to random values. After this the training can be performed using the train() method. When the training is finished, the added algorithm can be removed from the network with a detachNetwork() method call; a sketch of this workflow is given below. The decorated layer class has some additional methods that the learning algorithm calls. Unfortunately, the layers are accessible via the NeuralNetwork class only through their basic interface (the Layer class), while what is needed is the extended interface (the LearningLayer class). Thus, for the purpose of code clarity and to avoid explicit down-casting, the exact pointers to the extended layer objects are remembered in the algorithm class. This is clearly a repetition, so it breaks the "Do not repeat yourself" rule, but it was concluded that the better clarity justifies this.
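The workflow just described can be summarised in code roughly as follows. The method names are those given above; the constructor arguments of BackpropAlgorithm, the exact parameters of train() and the way the data sets and sequences are prepared are assumptions made for the sake of the example.

// Sketch of the typical training workflow (parameter details are assumed).
NeuralNetwork* net = new NeuralNetwork(3, 2);   // 3 inputs, 2 outputs
net->addHiddenLayer(5);                         // one hidden layer with 5 neurons

BackpropAlgorithm algorithm;                    // construction details (e.g. learning rate) omitted
algorithm.attachNetwork(net);                   // layers are replaced by their decorated versions
algorithm.initNetwork();                        // random weights and biases
algorithm.train(trainingSet);                   // trainingSet: a PatternSet prepared beforehand
algorithm.detachNetwork();                      // the learning extension is removed again

// The plain network can now be used as a pure computing structure.
net->compute(input, output);                    // input and output are Sequence objects prepared by the caller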
7 NeuralLab facade

The guidelines of good object-oriented design imply that an application programmer has to manage several classes. This is a flexible solution, but not necessarily the easiest one for the programmer in every situation. It is expected that most programmers will at first want to experiment with the library to check its capabilities or to quickly check whether neural networks are able to solve their problem. For this purpose, a simplified, easy-to-use, all-in-one NeuralLab wrapper class was introduced. It is intended for the situation where one wants to experiment with a neural network and the error-backpropagation algorithm using two data sets: a training set and a testing set. There are also methods for direct error and correlation retrieval.

8 Discussion

Other object-oriented libraries for neural networks are available. In this section they are compared with the NTUNE library, with special emphasis on the design of the general structure and the separation of the learning algorithms.

In the approach presented in [12], the smallest object is the neuron, not the layer, and there is an explicit class for links. This makes the design much more general, i.e. neural networks of very different architectures can be created, because various types of neurons and links can be derived. However, many more objects need to be maintained, and even the link objects, which could have been expressed implicitly, have to be created explicitly. To implement various architectures and various learning algorithms, new classes for nodes (neurons) and links are derived from more general ones. In this way the learning algorithms can to some extent be separated from the structure (the structure is in the base class, the algorithm in the derived one), but this is an example of a statically oriented solution, i.e. the learning algorithm cannot be dynamically added, removed or exchanged. Once chosen for a network, an algorithm (and its data) cannot easily be changed or removed after the training.

JOONE [15] is an object-oriented neural network library written in Java. It introduces the layer as the smallest object, similarly to NTUNE. An interesting solution is used for representing links: the links are grouped into synapses exactly as the neurons are grouped into layers. This allows networks of various architectures to be constructed, not only multi-layer feed-forward ones. Although similar (if more general) in overall design to NTUNE, JOONE does not separate learning algorithms from the structure of the networks, i.e. the learning procedures are built into the layer objects. Furthermore, the design is oriented towards gradient-based methods and does not allow other types of methods, such as simulated annealing, to be implemented. This is clearly visible even in the library itself: in the WinnerTakeAll layer class the backward method, intended to be overridden for learning purposes, is "not implemented / not required", showing that the design of the basic abstract layer class is appropriate for only one of several classes of learning algorithms. Of course, arbitrary learning methods can be implemented by extending general classes, but it cannot be done elegantly; there will always be some methods and even some data inherited that are intended for the gradient-based methods.

NEURObjects [16] is a library written in C++ which uses a completely different class hierarchy. Its main inheritance scheme assumes that a two-layer network is a specialisation of a one-layer network, a three-layer network is a specialisation of a two-layer network, and so on. Such an approach was avoided in the NTUNE library, because it was observed that the number of layers is only a parameter of a network, and a three-layer network is not a specialisation of a two-layer one. Besides the structure, an important feature of a neural network library is the way it deals with learning algorithms. The NEURObjects library tries to separate them from the structure, but it does so only partially. In fact, a backprop() method is present in the classes of multi-layer feed-forward neural networks, so the design is oriented towards gradient-based methods only.
Moreover, although the learning algorithm classes are separated, much of the common data and functionality of the gradient-based methods is stored in the LayerNetTrain class, whose objects are a static internal part of the neural network objects; hence the learning algorithms are not truly separated.

9 Conclusion

The most important factor that distinguishes the NTUNE library from other neural network libraries is that it fully and effectively separates learning algorithms from the structure of the networks, allowing neural network objects to be used as pure computing structures and trained with various independent learning algorithms, which integrate with the internal structure of the networks only temporarily. This was achieved by discarding statically oriented features where dynamic behaviour was needed (inheritance was used mainly for interface sharing, not for extensions of objects' features, which may be dynamic) and by breaking the rule of full encapsulation of objects, i.e. "insight" interfaces were introduced where the learning algorithms needed access to the internal structure of the objects constituting neural networks. This approach was chosen instead of friend classes, because what was required was access to the abstract structure, not to the concrete implementation.

The NTUNE library was intended for multi-layer feed-forward neural networks only, as this was the initial requirement of the project. However, general design patterns were examined in practice, shedding light on the way they can be employed to separate additional pluggable features of objects, such as learning algorithms. This experience can be used to implement neural networks of other architectures elegantly as well. NTUNE and an NTUNE-based graphical network tool are released under the GNU General Public Licence and are hence freely available. Installation files for Linux and Windows are available at http://www.mczard.republika.pl/ntune.html.

10 References

[1] Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, 1995
[2] Eckel, B.: Thinking in C++: Volume 1, 2nd ed., Prentice-Hall, 2000
[3] McCulloch, W.S., Pitts, W.: A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biophysics, Vol. 5, 1943, pp. 115-133
[4] Rumelhart, D.E., McClelland, J.L.: Parallel Distributed Processing, MIT Press, 1986
[5] Hornik, K., Stinchcombe, M., White, H.: Multilayer Feedforward Networks are Universal Approximators, Neural Networks, Vol. 2, 1989, pp. 359-366
[6] Hornik, K.: Approximation Capabilities of Multilayer Feedforward Networks, Neural Networks, Vol. 4, 1991, pp. 251-257
[7] Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors, Nature, Vol. 323, 1986, pp. 533-536
[8] Kirkpatrick, S., Gelatt, C.D., Jr., Vecchi, M.P.: Optimization by Simulated Annealing, Science, Vol. 220, No. 4598, 13 May 1983, pp. 671-680
[9] Goldberg, D.E.: Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, 1989
[10] Abbott, R.J.: Program design by informal English descriptions, Communications of the ACM, Vol. 26, No. 11, 1983, pp. 882-894
[11] Stepanov, A., Lee, M.: The Standard Template Library, Hewlett-Packard, 1996
[12] Rogers, J.: Object-Oriented Neural Networks in C++, Morgan Kaufmann, 1996
[13] Qian, N.: On the Momentum Term in Gradient Descent Learning Algorithms, Neural Networks, Vol.
12, 1999, pp. 145-151
[14] Gamma, E.: The Extension Objects Pattern, Technical Report WUCS-97-07, Washington University, 1996
[15] Marrone, P.: JOONE - Java Object Oriented Neural Engine, http://www.jooneworld.com
[16] Valentini, G., Masulli, F.: NEURObjects: an object-oriented library for neural network development, http://www.disi.unige.it/person/ValentiniG/NEURObjects

Table 1 – Comparison of design patterns.

Feature                                                          Decorator   Extension Objects   External Map
New data members can be added                                    yes         yes                 yes
New methods can be added                                         yes         yes                 yes
The core need not be prepared for extension                      yes         no                  yes
The abstract interface does not have to be separated
  and reimplemented in the extension                             no          yes                 yes
Access to the extension is efficient                             yes         yes (1)             no
Multiple extensions are possible                                 yes (2)     yes                 yes
Explicit down-casting is not needed to retrieve the extension    no          no                  yes
Number of additional pointers (3)                                1           2                   3

Notes:
1. Only in the case of a single extension.
2. Limited to a few extensions because of decreasing efficiency.
3. This number can be considered a measure of structural complexity.

Figure 1 – Feed-forward network as a black box.
Figure 2 – Processing element.
Figure 3 – Feed-forward network with one hidden layer.
Figure 4 – General class diagram.
Figure 5 – A matrix of weights for a layer with 5 inputs and 3 outputs.
Figure 6 – Decorator design pattern applied to the layer object.
Figure 7 – Basic model (a) and three alternative solutions to the dynamic extension problem (b, c, d). Arrows denote pointers.
Figure 8 – Class structure of the ANN library.