An object-oriented C++ library for
multi-layer feed-forward neural networks based on the
Decorator design pattern
Michal Czardybon1 and Lars Nolle2 
1 Silesian University of Technology
Faculty of Automatic Control, Electronics and Computer Science
Wydzial Automatyki, Elektroniki i Informatyki
ul. Akademicka 16, 44-100 Gliwice, Poland
2 School of Computing and Informatics
The Nottingham Trent University
Burton Street, Nottingham, NG1 4BU, UK
lars.nolle@ntu.ac.uk
Summary
This paper describes the innovative design of an object-oriented C++ library for multi-layer feed-forward neural networks following the Decorator design pattern. One design objective was to take full advantage of the object-oriented paradigm, i.e. ease of use, of understanding and of extension of the library, with no impact on efficiency. The greatest problem encountered was the requirement of atomicity of classes, which demanded decoupling the neural network classes from the learning algorithms, in this case from the error-backpropagation learning algorithm, which relies strongly on knowledge of the internal structure of a network. Three alternative design patterns were evaluated for this project: the Decorator design pattern, the Extension Objects design pattern, and the External Map design pattern. The Decorator design pattern was then chosen for the design of the library. The design decisions are presented and justified, including the overall design and the choice of the underlying data structures, with notes on efficient and concise ways of implementation, all done in a purely object-oriented way.
KEYWORDS:
neural networks, object-oriented design, decorator design pattern
1 Introduction
During a research project, an efficient object-oriented C++ library for multi-layer feed-forward neural networks was needed. Object-oriented programming (OOP) is widely believed to improve the modularity and reliability of software products. OOP increases maintainability and encourages reuse of existing code.
These advantages result from the general design rules that bind concepts from the world being modelled with classes, and pairs of concepts satisfying the specialization-generalization relation with inheritance. Unfortunately, in some circumstances it turns out not to be straightforward to design a set of classes that allows high flexibility and extensibility. This problem has already been explored, resulting in the development of design patterns [1], which provide general solutions to frequently encountered design problems in object-oriented programming. These general rules are becoming more and more popular.
The design of the library presented in this article applies some of these rules to the task of modelling multi-layer feed-forward neural networks, which is believed to have resulted in a design that is significantly more flexible and extensible than existing libraries. As a consequence, new learning algorithms can easily be added without any changes to the existing code.
1.1 Requirements identified
The initial purpose of the NTUNE library (Nottingham Trent University Neural
Environment) was to capture the model of the multi-layer feed-forward neural networks in an
object-oriented paradigm to provide a generic reusable toolkit for various applications.
The most important factor to be taken into account during the design process of any library is its reusability. The success of a library can be measured by the number of applications it is used in. To be successful, a library has to be easy to use, to understand and to extend, and efficient at the same time. The object-oriented paradigm was used for the NTUNE library, as it is widely believed to be the best-known approach for meeting these goals. To take full advantage of this paradigm, some rules had to be carefully followed. During the design of the NTUNE library, some of the explicit guidelines from [2] were taken into account. The numbers in brackets denote the corresponding guideline in [2]:
1. High level abstractions (13)
- the higher the abstraction, the closer it is to the way people think. Algorithms can be expressed more elegantly when they are based on high-level data structures.
2. Atomicity of classes (14)
- when classes are well encapsulated, so that each of them models exactly one well-separated concept or task, it is easier to modify or exchange the implementation of one of them without worrying about the rest.
3. Do not repeat yourself (17)
- every concept should be described only once; every repetition not only doubles the programmer's effort but, much worse, is a threat to consistency.
These guidelines, translated into the field of multi-layer feed-forward networks, can be expressed as explicit requirements for the library:
1. High level abstractions – vectors and matrices with dot-product-like high-level operations are better than low-level arrays operated on in explicit loops.
2. Atomicity of classes – the design should be modular, so that some parts of it are interchangeable with alternatives. In particular, as many different algorithms are known for the task of learning, they should be interchangeable. Moreover, a trained network can be used in applications where it only has to compute the function learned, so there should be a "core" computing layer class which knows nothing about any learning algorithm.
3. Do not repeat yourself – in particular, no learning algorithm should repeat any part of the common functionality of a neural network or duplicate the common structure. Similarly, what is done in the "core" network should not be repeated in the learning algorithm classes.
2 Artificial Neural Networks
The first models of Artificial Neural Networks (ANNs) were introduced in 1943 by Warren McCulloch and Walter Pitts [3], but it took another 20 years to overcome the problems of their first approach and to find appropriate learning methods. Since then, applications of artificial neural networks have grown rapidly in many areas of science and industry. Artificial neural networks simulate the biological structure of neural networks in brains in a simplified way, in order to endow computers with the very considerable abilities of biological neural networks: the ability to learn from examples, pattern classification, generalization and prediction.
Both artificial and biological neural networks consist of a large number of simple
processing elements called units or nodes which are connected to other processing elements. The
great performance of neural networks stems from this massive parallelism [4]. It was proven by Hornik et al. [5] that “... standard multilayer feedforward networks with as few as one hidden layer using arbitrary squashing functions are capable of approximating any Borel measurable function from one finite dimensional space to another to any desired degree of accuracy ...”. Hornik also provided evidence that such networks “... are capable of arbitrarily accurate approximation to a function and its derivatives” [6]. Therefore, feed-forward networks can be used
to approximate a given process by presenting suitable examples to it, provided that a deterministic relationship exists between the input parameters and the output parameters. Because of this, the feed-forward architecture is the most commonly used type of artificial neural network. Therefore, an object-oriented C++ library was required for this network type.
2.1 Feed-forward networks
A feed-forward network can be seen as a black box with defined and directed inputs and
outputs (Figure 1). That means that information is fed through the net in only one direction, without any feedback loops. Figure 2 shows a single processing element of a feed-forward network. The output of the processing element is calculated by adding a bias value to the sum of all inputs multiplied by their associated weights (Equation 1) and passing the result through a non-linear transfer function, usually a sigmoid (Equation 2) or hyperbolic tangent function.
x = bias + Σ_i x_i · w_i        (1)

y = s(x) = 1 / (1 + e^(-x))        (2)
The sigmoid function s(x) can be considered a "smooth" binary step function, so that the first derivative s'(x) exists and is easy to compute (Equation 3).

s'(x) = s(x) · (1 - s(x))        (3)
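For illustration, the transfer function and its derivative translate directly into code. The following is a minimal stand-alone sketch (the function names are illustrative, not necessarily those used in the library):

#include <cmath>

// Sigmoid transfer function s(x) = 1 / (1 + e^(-x))  (Equation 2)
double sigma(double x)
{
    return 1.0 / (1.0 + std::exp(-x));
}

// Its derivative expressed in terms of the already computed output value:
// s'(x) = s(x) * (1 - s(x))  (Equation 3)
double sigma_derivative(double s_of_x)
{
    return s_of_x * (1.0 - s_of_x);
}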
The processing elements are arranged in layers: one input layer, one or more hidden layers - so called because they are 'invisible' from the outside - and one output layer. Each output of each unit in one layer is usually connected to every unit in the next layer. Units within the same layer are not connected to each other (Figure 3). The networks work in two modes: the training mode and the testing mode. In training mode, a set of training input patterns, together with the desired output patterns, is used to adjust the weights in order to minimize the error between the produced network output and the desired network output. In test mode, new input patterns can be presented to the network and the network output is used in the application.
2.2 Error-backpropagation learning algorithm
What makes neural networks really interesting is their ability to learn and generalize a function from examples. The classical learning algorithm for the multi-layer feed-forward architecture, error backpropagation, is based on gradient descent. Its derivation can be found in [4]. This section briefly presents how it works.
The training starts with random initialization of weights and biases. Then many learning
cycles (epochs) are performed. In each epoch all the training patterns (the examples) are presented
in sequence to the network and the weights of the neural network are updated. The algorithm for
updating the weights for one data pattern is as follows:
1. Compute an output of the network for the given input as usual, but additionally
remember the outputs of all the neurons.
2. Compute the delta values for all the neurons in the output layer:

   delta_i = output_i · (1 - output_i) · (target_i - output_i)

   where:
   i is the index of a neuron in the output layer,
   output_i is the value obtained on the ith output,
   target_i is the expected value on the ith output.
3. Compute the delta values for all the remaining neurons in all the hidden layers (this is the error-backpropagation rule):

   delta_i^k = output_i^k · (1 - output_i^k) · Σ_{j=1..n} w_{j,i}^(k+1) · delta_j^(k+1)

   where:
   i is the index of a neuron in the current layer,
   j is the index of a neuron from the next layer,
   k is the index of the current layer,
   n is the number of neurons in the next layer.
4. Update all the weights in all the layers:

   w_{i,j} ← w_{i,j} + learning_rate · delta_i · input_j

   where w_{i,j} is the weight corresponding to the jth input of the ith neuron.
5. Update all the biases in all the layers:

   b_i ← b_i + learning_rate · delta_i
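These update steps map onto code quite directly. The following stand-alone sketch of steps 2-5 uses plain std::vector data structures for illustration only; it is not NTUNE code and makes no use of the classes described later.

#include <cstddef>
#include <vector>

typedef std::vector<double> Vec;
typedef std::vector<Vec>    Mat;   // weights[i][j]: the jth input weight of the ith neuron

// Step 2: delta values for the output layer.
Vec outputDeltas(const Vec& output, const Vec& target)
{
    Vec delta(output.size());
    for (std::size_t i = 0; i < output.size(); ++i)
        delta[i] = output[i] * (1.0 - output[i]) * (target[i] - output[i]);
    return delta;
}

// Step 3: delta values for a hidden layer, given the weights and deltas of the next layer.
Vec hiddenDeltas(const Vec& output_k, const Mat& weights_next, const Vec& delta_next)
{
    Vec delta(output_k.size());
    for (std::size_t i = 0; i < output_k.size(); ++i) {
        double sum = 0.0;
        for (std::size_t j = 0; j < delta_next.size(); ++j)
            sum += weights_next[j][i] * delta_next[j];
        delta[i] = output_k[i] * (1.0 - output_k[i]) * sum;
    }
    return delta;
}

// Steps 4 and 5: update the weights and biases of one layer for one data pattern.
void updateLayer(Mat& weights, Vec& biases, const Vec& delta,
                 const Vec& input, double learning_rate)
{
    for (std::size_t i = 0; i < weights.size(); ++i) {
        for (std::size_t j = 0; j < input.size(); ++j)
            weights[i][j] += learning_rate * delta[i] * input[j];
        biases[i] += learning_rate * delta[i];
    }
}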
The learning process is performed iteratively until a stop condition occurs. The stop condition can be, for example, that the error has not decreased over some specified number of iterations. From the C++ library designer's point of view, the important thing is that for this algorithm the internal structure of a network cannot be hidden completely (as suggested by the encapsulation rule of the object-oriented paradigm) and some additional data are needed for every neuron. These issues will be discussed later in this article.
2.3 Other algorithms
Error-backpropagation is the classic algorithm, but it is not the only one. In recent years many improvements to the algorithm and alternative algorithms have been proposed. For example, one may consider general-purpose optimization methods such as simulated annealing [8] or genetic algorithms [9]. It was important to take this into account during the design phase, so that other learning algorithms can easily be added to the library in the future.
3 Object-Oriented Analysis
This section describes the identification of the main objects in the system. The most
obvious candidate classes are the nouns from the informal description [10]: neural network,
neuron, layer, connection (link), weight, example (pattern), error, signal, vector, input, output,
transfer function. Some of them denote primitive values rather than types (weight, error, signal,
input, output). A vector is an abstract data type present in many libraries, e.g. in STL [11]. Inputs
and outputs are examples of data of vector type (hence they do not denote classes themselves).
Similar to the vector is the matrix type. Although it is not present in the description explicitly, it was found to be suitable for some implicit requirements of the implementation. Abstract data types provide a higher level of abstraction, so they were preferred over primitive values (more details will follow). The transfer function is a method rather than an object, but it might be beneficial to represent it by a function object if many different interchangeable transfer functions were needed. A sample, or data pattern, is an auxiliary class of objects.
After this initial analysis, the remaining candidates for the main classes are: neural
network, neuron, layer, connection (link). The relationships are: a neural network consists of several layers, each layer consists of several neurons, and the neurons of consecutive layers are connected by connections (or links). The most primitive object, the neuron, is fully described by a vector of weights and a bias value.
At this point, the final set of classes may seem obvious – the classes should be:
NeuralNetwork, Layer, Neuron and Link. In fact, this set can be found in [12]. It is very flexible, but it might be argued that it is too flexible for the purpose of a library for multi-layer feed-forward neural networks. For this particular architecture many significant simplifications can be made that result in a simpler design (fewer concepts) and better efficiency (lower granularity). In fact, it was decided to discard the neuron class and choose the layer class as the smallest component of a network. The second simplification was to discard the link class as well, as it is in fact not
needed at all: the links store no data and, as only fully connected layers are considered, the connections exist implicitly between every two neurons from any two consecutive layers. Additionally, it was observed that a neural network has a basic interface similar to that of a layer, i.e. the ability to compute vector operations. Moreover, there might be other objects with this functionality for applications other than neural networks, e.g. fuzzy-logic controllers. For this reason a VectorMapping class was introduced as a general common interface of the layer, the neural network and, potentially, other black-box-like vector functions.
Up to this point, no potential classes for learning algorithms have been mentioned. They will be discussed separately later, as it was an important design problem to express them independently of the structure of the network. This design is consistent with the second requirement identified (the structure should know nothing about the learning algorithms).
In conclusion, the NeuralNetwork and Layer classes, as specializations of the VectorMapping class, were chosen to constitute the main part of the design. For reasons of flexibility (explained below), the Layer class is only an interface and the concrete class is ComputingLayer (see Figure 4).
4 Design and implementation
This section explains the details of the design decisions made and gives a rationale for them. As in C++ the design phase strongly depends on the implementation possibilities, some implementation details are presented here as well. It is important to note that the software development process used was iterative, while this article describes the final product, so the decisions described here were not necessarily reached in the first iteration or in the presented order.
4.1 Underlying data structures
The first fundamental problem to solve was the need for the high-level abstract data types of vectors and matrices. The problem was that there was no standard matrix class in the C++ standard library, while a vector class existed. An appropriate matrix class had to be written or found in some free (preferably GNU GPL) library. The requirement for this class was to allow access to sequences of elements of rows and columns uniformly, in the same way as to sequences of elements of vectors. The reason for this requirement was that it would allow dot-product-like operations to be expressed generally.
A decision was made to write both vector and matrix classes from scratch. The uniformity
of the access to sequences was achieved by introducing an additional class – the Sequence, which
is an additional level of abstraction between data structures and iterators. The Sequence class is a unified view of vectors, rows of matrices and columns of matrices that provides an iterator for read and write access to its elements. The implementation of this concept in the NTUNE library is simple and not parametrized by type, so it is not general enough to be used as a general-purpose tool. Nevertheless, a more general implementation could be made.
4.2 Layer class
A layer is a computing component containing several neurons. In this library, however, for the purpose of efficiency, the neurons are not present explicitly. Instead, the layer is parametrized by a matrix of weights and by a vector of biases. The outputs are computed according to the following equation:
output_i = sigma( weights.row(i) · input + biases_i )        (4)

where weights.row(i) denotes the ith row of the matrix of weights, which is the first argument of the dot-product operation. The ith row of the matrix contains the weights of the ith "virtual" neuron in the layer (see Figure 5). The bias vector contains the biases of all the neurons in the layer.
The interface of the Layer class can be divided into two sub-interfaces: the computational one and the one for insight, i.e. the methods for reading and modifying the weights and the biases. The insight is provided via the Sequence class.
class Layer : public VectorMapping
{
public:
    Layer(int n_inputs, int n_outputs)
        : VectorMapping(n_inputs, n_outputs)
    {
    }

    virtual ~Layer() { }

    /******* VECTOR-MAPPING (interface) ************************/
    virtual void compute(Sequence input, Sequence output) = 0;

    /******* INSIGHT (interface) *******************************/
    virtual double   getBiasForNeuron(int n)           = 0;
    virtual void     setBiasForNeuron(int n, double v) = 0;
    virtual Sequence getWeightSequenceOfNeuron(int n)  = 0;
    virtual Sequence getWeightSequenceForInput(int k)  = 0;

    int getNeuronCount() { return getOutputCount(); }
};
This representation proved to be very elegant, as many operations could be expressed at a higher level using dot-product operations on sequences. As already shown, the dot-product between a row and an input vector is used to compute the output of the layer. An
explicit low-level loop is avoided. In consequence, the whole implementation of the basic functionality of the layer class occupies about 10 lines of code (see below) and is easy to read and to understand.
void ComputingLayer::compute(Sequence input, Sequence output)
{
    assert(output.getSize() == this->getNeuronCount());
    for (int k = 0; k < getNeuronCount(); k++) {
        double sum = Sequence::dot_product(input, getWeightSequenceOfNeuron(k))
                   + biases(k);
        output(k) = sigma(sum);
    }
}
4.3 Network class
A neural network class is not complicated in its construction – it simply collects several
layers and is responsible for the propagation of signals between consecutive layers in the
compute() method. The next listing shows the declaration of the network class neglecting some
unimportant details:
class NeuralNetwork : public VectorMapping
{
public:
    /******** CREATION *********************************************/
    NeuralNetwork(int n_inputs, int n_outputs);
    ~NeuralNetwork();

    /// Add hidden layer (insert it just before the output layer)
    void addHiddenLayer(int size);

    /******** VECTOR-MAPPING (overridden) **************************/
    void compute(Sequence input, Sequence output);

    /******** INSIGHT **********************************************/
    int    getLayerCount();
    Layer* getLayer(int k);
    void   setLayer(int k, Layer* lay);

private:
    std::vector<Layer*> layers;
};
A little problem was encountered with the process of network construction. It might seem
obvious from the high-level point of view that the constructor should take a list containing the
sizes of the different layers, for example:
net = new NeuralNetwork( {3, 5, 2} );
but this is not possible in C++. The passed array would have to be pre-prepared:
int sizes[] = {3, 5, 2};
net = new NeuralNetwork( sizes, 3 );
or alternatively a variable-length parameter list could be used, which is not type-safe and generally
not elegant. If the number of arguments can be limited to some constant value, a parameter list
could be employed with default 0 values for the parameters that are not used:
NeuralNetwork::NeuralNetwork(int size1, int size2, int size3 = 0,
int size4 = 0, int size5 = 0);
Then, at the cost of an inelegant definition and a hard-wired limitation of the number of possible layers (to four in the above example), the usage would be very natural:
net = new NeuralNetwork( 3, 5, 2 );
Yet another solution was chosen, which offers a compromise between conciseness and elegance: a network with no hidden layers is created with the constructor, and then an arbitrary number of hidden layers is added with the addHiddenLayer method. The constructor takes only the number of inputs and outputs as its parameters.
net = new NeuralNetwork(3, 2);
net->addHiddenLayer(5);
4.4 Auxiliary classes
Some auxiliary classes are common to all the learning algorithms. There is a class for data patterns (Pattern) and one for collections of data patterns (PatternSet). The NetworkSerializer class provides static methods for saving networks to files and loading them back. There are also some functions for general vector-mapping operations that compute the errors produced for a specified set of data patterns.
5 Selecting a design pattern
Having designed the basic functionality of neural networks separately from the learning algorithms, the next task was to find a way of adding learning algorithms to a network dynamically, at run-time. This section describes the general problem and discusses different candidate solutions. The final solution is presented together with the rationale behind the decision.
5.1 Problem description
The problem encountered was that the error-backpropagation algorithm is strongly coupled to the internal structure of a neural network. The main difficulty to overcome was that the learning algorithm requires additional information spread among all the neurons of a network. In fact, the values of all the outputs of all the neurons have to be stored during the feed-forward phase, since they are used later in the error-backpropagation phase, as are the delta values, which are conveniently computed before any updates of the weights are made. When using
the momentum learning method [13], a history of weight changes has to be remembered too. Moreover, some additional methods connected with learning seem more appropriately placed in the layer class than in the learning algorithm class, e.g. for random initialization of the weights (initRandom) or for the computation of delta values (computeOutputDelta, computeHiddenDelta).
The above problem can be considered an instance of the more general problem of adding new methods and new fields to an existing class. The C++ language has a feature that can be employed for this purpose: inheritance. However, it is a static feature and does not allow objects to be modified dynamically, i.e. during program execution. Unfortunately, the static case is very inflexible. For example, it is not possible to create a neural network, train it with one algorithm first and then continue with another one, which may be very appropriate (hybrid methods). If only the static feature is available, every time an algorithm is changed new objects have to be created and their contents copied. Another argument for a more dynamic approach is elegance. It is typical to train a network and then, when the training is finished, use it in an application. The additional data needed during training should be removed when they are no longer necessary, for efficiency, e.g. if the trained network is to be used in a system with memory constraints, such as a mobile device.
It was decided that a better approach, allowing fields and methods to be added and removed dynamically, had to be developed. The following three sections describe the three alternative solutions that were considered, from which one was chosen.
5.2 Solution 1 – The Decorator design pattern
For the purpose of efficiency, the C++ programming language does not allow methods and fields to be added or removed dynamically. A quite elegant solution to this general problem has already been found: the Decorator design pattern [1]. It was concluded to be appropriate for the problem of learning algorithms for neural networks, as [1] suggests using it (in 2 cases out of 3):
- to add responsibilities to individual objects dynamically and transparently, that is, without affecting other objects;
- for responsibilities that can be withdrawn.
In the Decorator design pattern the extended object is created not by static inheritance but by dynamic composition (a "has-a" relationship instead of "is-a"). There are two objects constituting the whole: the core object, i.e. the object being extended, and the object of the extension, the decorator (see Figure 6). Besides the additional fields and methods, the decorator
implements the interface of the core, simply redirecting the common messages to the core object it contains, so that it may replace the core wherever it was used (the need to implement the same interface is the reason why the abstract interface of the layer is separated). An example of a redirected (or delegated) message is the query for the bias value of a neuron:
double LearningLayer::getBiasForNeuron(int n)
{
    return core->getBiasForNeuron(n);
}
In cases where the functionality of an existing method has to be extended, it can be implemented as in the example below, where the outputs of all the neurons have to be stored during the feed-forward phase:
void LearningLayer::compute(Sequence input, Sequence output)
{
    core->compute(input, output);
    // store the result; it will be needed for backpropagation
    Sequence::copy( output, this->output.getSequence() );
}
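A skeleton of such a decorating layer might look as follows. This is a simplified sketch, not the actual NTUNE class: it assumes a Vector helper class constructible from a size and a getInputCount() accessor on VectorMapping, and it omits the remaining delegated Layer methods.

// Sketch of a decorating LearningLayer: it wraps a core Layer, re-exposes the
// Layer interface by delegation and adds the data needed only during training.
class LearningLayer : public Layer
{
public:
    explicit LearningLayer(Layer* core)
        : Layer(core->getInputCount(), core->getOutputCount()),   // assumed accessors
          core(core),
          output(core->getOutputCount()),
          delta(core->getOutputCount())
    {
    }

    // Delegated interface (compute() and the remaining Layer methods are
    // delegated analogously; see the examples above).
    double getBiasForNeuron(int n)           { return core->getBiasForNeuron(n); }
    void   setBiasForNeuron(int n, double v) { core->setBiasForNeuron(n, v); }

    // Additional state used only while the learning algorithm is attached.
    Layer* core;     // the wrapped "core" layer
    Vector output;   // outputs stored during the feed-forward phase
    Vector delta;    // delta values computed during backpropagation
};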
The Decorator design pattern can be illustrated as a box into which the core can be placed and by which it can be replaced, and which allows new features to be added to its content.
An important property of this solution is that the extension can be removed when it is no longer needed and that it imposes no requirements on the object being extended. On the other hand, unlike with static inheritance, there is a need to explicitly redirect all the messages intended for the core object.
5.3 Solution 2 – The Extension Objects design pattern
The Decorator design pattern is not the only possible solution to the problem discussed. There is already a simple and quite popular solution to the problem of additional data that has to be added to an existing class, so-called "user data". It can be implemented, for example, in the form of special fields like:
void* user_data;
Unfortunately, this approach is not elegant, because it is untyped and the "user data" member exists even if it is not used. Also, the object has to be specially prepared for an extension, i.e. it "has to know" that it will be extended. The above solution seems to support additional data only, but additional methods can be supported as well. To achieve this, the "user data" object should contain a pointer to the "core" object (as in the Decorator design pattern, see Figure 7 b, c) and should be of a class with methods operating on both the "core" and the "user data" (own) objects. It can be interpreted as the "core" object containing the "inherited" members and the "user data" object containing
the added members (both fields and methods).
This approach could be extended even further to support multiple extensions at the same
time (however this is beyond the needs of the learning algorithms), e.g. there would be a set of
extending objects (“user data” objects) instead of just a single one. Actually, this solution was
described in [14] as the Extension Objects design pattern and is suggested to be used when:
“you need to support the addition of new or unforeseen interfaces to existing classes
and you don't want to impact clients that don't need this new interface. Extension
Objects lets you keep related operations together by defining them in a separate
class.”
The described applicability concentrates on extensions of the interface (additional methods), and it is not stated there that it also makes it possible to store additional data. However, this is clearly the case. It could be a solution to the problem of adding learning algorithms to neural networks dynamically. However, the "core" objects have to know that they may be extended, i.e. the interface for extensions has to be there, even if in some applications it is never used. Therefore, it was concluded that the Decorator design pattern is more suitable, as it imposes fewer requirements on the class to be extended. The Extension Objects pattern also has some advantages, e.g. the interface of the class being extended does not need to be preserved.
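In C++ the idea can be sketched as follows. This is only an illustrative skeleton, not NTUNE code: the subject exposes a slot for its extension object, and the extension keeps a back-pointer to the subject.

// Sketch of the Extension Objects idea with generic names (Subject, Extension).
class Subject;

class Extension
{
public:
    explicit Extension(Subject* subject) : subject(subject) { }
    virtual ~Extension() { }
protected:
    Subject* subject;            // back-pointer to the "core" object being extended
};

class Subject
{
public:
    Subject() : extension(0) { }
    // The subject has to know that it may be extended: it owns the extension slot.
    void       setExtension(Extension* e) { extension = e; }
    Extension* getExtension() const       { return extension; }
private:
    Extension* extension;
};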
5.4 Solution 3 – The External Map
There is yet another solution, which does not require the "core" class to know that it will be extended and, at the same time, does not require the common interface to be reimplemented. This is the External Map pattern (see Figure 7d), which introduces a map, i.e. an associative container, mapping pointers to "core" objects to their associated extension objects:
std::map<Subject*, Extension*> extensions_map;
which can be translated into the domain of neural networks as:
std::map<Layer*, BackpropExtension*> extensions_map;
The extensions are implemented exactly as in the Extension Objects and Decorator design patterns, i.e. they store pointers to the "core" object. The map can be placed anywhere, for example in the class of a learning algorithm.
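Retrieving the extension associated with a layer then amounts to a map lookup, as in this illustrative, self-contained fragment (the class declarations are repeated here only for completeness):

#include <map>

class Layer;                 // the "core" class, closed for changes
class BackpropExtension;     // the per-layer learning data

// The map lives in the learning algorithm; the Layer objects stay untouched.
std::map<Layer*, BackpropExtension*> extensions_map;

BackpropExtension* extensionOf(Layer* layer)
{
    std::map<Layer*, BackpropExtension*>::iterator it = extensions_map.find(layer);
    return it != extensions_map.end() ? it->second : 0;
}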
A drawback of this solution is that the operation of accessing the extension can be more
expensive. However, as there are usually only a few layers in an ANN application, the overhead
would not be significant. The External Map pattern also introduces more additional data than the
previously described solutions. In some way it actually introduces a parallel duplication of the set
of objects being extended, which was the main reason why the simpler solution of the Decorator
design pattern was preferred.
5.5 Selecting a design pattern
There are pros and cons for each of the analysed solutions, so for different applications different ones may be appropriate. Table 1 provides a comparison of the alternative solutions evaluated in this work. It may also be of help in selecting a specific pattern for other applications.
The Decorator design pattern was chosen as the most appropriate solution for the problem of pluggable learning algorithms for neural networks, because it was concluded to offer the best compromise between simplicity and efficiency. In this application domain only one extension is used at a time; for other applications another alternative might be better. For example, the Extension Objects design pattern could be successfully applied in the design of a computer game, where characters gain many new abilities when they find certain artefacts and lose them when the artefacts are lost. On the other hand, the External Map is irreplaceable when the extension concerns objects from a given library that is closed to changes, when absolutely no assumptions can be made about them while the library is being developed. Figure 8 shows the final class structure derived from the evaluation.
6 Implementation of the error-backpropagation algorithm
This section describes the implementation of the classic error-backpropagation learning algorithm. The error-backpropagation algorithm (with momentum) is implemented in the BackpropAlgorithm class. This class has a method named attachNetwork(), which has to be invoked with the network that is to be trained. Inside this method the Decorator design pattern is applied, i.e. the layer objects of the network are replaced by their decorated versions. Then the initNetwork() method should be called to set the weights and biases to random values. After this, the training can be performed using the train() method. When the training is finished, the added algorithm can be removed from the network with a detachNetwork() method call.
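Putting this together, a typical training session might look like the following sketch. Only the method names attachNetwork, initNetwork, train and detachNetwork come from the description above; the train() arguments and the helper data are assumptions.

// Illustrative training session (the signature of train() is assumed).
void trainExample(PatternSet& trainingSet, Sequence input, Sequence output)
{
    NeuralNetwork* net = new NeuralNetwork(3, 2);
    net->addHiddenLayer(5);

    BackpropAlgorithm algorithm;
    algorithm.attachNetwork(net);   // layers are replaced by their decorated versions
    algorithm.initNetwork();        // weights and biases are set to random values
    algorithm.train(trainingSet);   // training proper (signature assumed)
    algorithm.detachNetwork();      // decorators are removed, the plain network remains

    // The trained network can now be used as a pure computing structure.
    net->compute(input, output);
    delete net;
}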
The decorated layer class has some additional methods that the learning algorithm calls. Unfortunately, the layers are accessible via the NeuralNetwork class only through their basic interface (the Layer class), while what is needed is the extended interface (the LearningLayer class). Thus, for the purpose of code clarity and to avoid explicit down-casting, the exact pointers to the extended layer objects are remembered in the algorithm class. This is clearly a repetition, so it breaks the “Do not
repeat yourself” rule, but it was concluded that the better clarity justifies this.
7 NeuralLab facade
The guidelines of good object-oriented design imply that an application programmer has to manage several classes. This is a flexible solution, but not necessarily the easiest one for the programmer in some situations. It is expected that most programmers will at first want to experiment with the library to check its capabilities or to quickly check whether neural networks are able to solve their problems. For this purpose, a simplified, easy-to-use, all-in-one NeuralLab wrapper class was introduced. It is intended for the situation when one wants to experiment with a neural network and the error-backpropagation algorithm using two data sets: a training set and a testing set. There are also methods for direct error and correlation retrieval.
8 Discussion
Other object-oriented libraries for neural networks are available. In this section they are compared with the NTUNE library, with special emphasis on the design of the general structure and the separation of learning algorithms.
In the approach presented in [12], the smallest object is the neuron, not the layer, and there is an explicit class for links. This makes the design much more general, i.e. neural networks of very different architectures can be created, because various types of neurons and links can be derived. However, many more objects have to be maintained, and even the link objects, which could have been expressed implicitly, have to be created explicitly. To implement various architectures and various learning algorithms, new classes for nodes (neurons) and links are derived from more general ones. In this way the learning algorithms can be somewhat separated from the structure (the structure is in the base class, the algorithm is in the derived one), but this is an example of a statically oriented solution, i.e. the learning algorithm cannot be dynamically added, removed or exchanged. Once chosen for a network, an algorithm (and its data) cannot easily be changed or removed after the training.
JOONE [15] is an object-oriented neural network library written in Java. Similarly to NTUNE, it introduces the layer as the smallest object. An interesting solution is used for representing links: the links are grouped into synapses, exactly as the neurons are grouped into layers. This allows networks of various architectures to be constructed, not only multi-layer feed-forward ones. While similar (although more general) in overall design to NTUNE, JOONE does not separate learning algorithms from the structure of the networks, i.e. the learning procedures are built into the layer objects. Furthermore, the design is oriented towards gradient-based
methods, not allowing other types of methods, such as simulated annealing, to be implemented. This is clearly visible in the library itself: in the WinnerTakeAll layer class, the backward method intended to be overridden for learning purposes is "not implemented / not required", showing that the design of the basic abstract layer class is appropriate only for one of several classes of learning algorithms. Of course, arbitrary learning methods can be implemented by extending general classes, but this cannot be done elegantly; there will always be some inherited methods and even some inherited data that are intended for the gradient-based methods.
NEURObjects [16] is a library written in C++ which uses a completely different class hierarchy. Its main inheritance schema assumes that a two-layer network is a specialisation of a one-layer network, a three-layer network is a specialisation of a two-layer network, and so on. Such an approach was avoided in the NTUNE library, because it was observed that the number of layers is only a parameter of a network, and a three-layer network is not a specialisation of a two-layer one. Besides the structure, an important feature of a neural network library is the way it deals with learning algorithms. The NEURObjects library tries to separate them from the structure, but does so only partially. In fact, a backprop() method is present in the classes of multi-layer feed-forward neural networks, so the design is oriented towards gradient-based methods only. Moreover, although the learning algorithm classes are separated, much of the common data and functionality of the gradient-based methods is stored in the LayerNetTrain class, whose objects are a static internal part of the neural network objects; hence the learning algorithms are not truly separated.
9 Conclusion
The most important factor that distinguishes the NTUNE library from other neural network libraries is that it fully and effectively separates the learning algorithms from the structure of the networks, allowing neural network objects to be used as pure computing structures and to be trained with various independent learning algorithms, which integrate with the internal structure of the networks only temporarily. This was achieved by discarding statically oriented features where dynamic behaviour was needed (inheritance was used mainly for the purpose of interface sharing, not for extending objects' features, which may be dynamic) and by breaking the rule of full encapsulation of objects, i.e. "insight" interfaces were introduced where the learning algorithms needed access to the internal structure of the objects constituting neural networks. This approach was chosen instead of friend classes, because what was required was access to the abstract structure, not to the concrete implementation.
The NTUNE library was intended for multi-layer feed-forward neural networks only, as it
was the initial requirement of this project. However, general design patterns were examined in practice, shedding light on the way they can be employed to separate additional pluggable features of objects, such as learning algorithms. This experience can also be used to implement neural networks of other architectures in an equally elegant way.
NTUNE and an NTUNE-based graphical network tool are released under the GNU General Public Licence and hence are freely available. Installation files for Linux and Windows are available at http://www.mczard.republika.pl/ntune.html.
10 References
[1] Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, 1995
[2] Eckel, B.: Thinking in C++, Volume 1, 2nd ed., Prentice-Hall, 2000
[3] McCulloch, W.S., Pitts, W.: A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biophysics, Vol. 5, 1943, pp 115-133
[4] Rumelhart, D.E., McClelland, J.L.: Parallel Distributed Processing, MIT Press, 1986
[5] Hornik, K., Stinchcombe, M., White, H.: Multilayer Feedforward Networks are Universal Approximators, Neural Networks, Vol. 2, 1989, pp 359-366
[6] Hornik, K.: Approximation Capabilities of Multilayer Feedforward Networks, Neural Networks, Vol. 4, 1991, pp 251-257
[7] Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors, Nature, Vol. 323, 1986, pp 533-536
[8] Kirkpatrick, S., Gelatt, C.D., Jr., Vecchi, M.P.: Optimization by Simulated Annealing, Science, Vol. 220, No. 4598, 13 May 1983, pp 671-680
[9] Goldberg, D.E.: Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, 1989
[10] Abbott, R.J.: Program design by informal English descriptions, Communications of the ACM, Vol. 26, No. 11, 1983, pp 882-894
[11] Stepanov, A., Lee, M.: The Standard Template Library, Hewlett-Packard, 1996
[12] Rogers, J.: Object Oriented Neural Networks in C++, Morgan Kaufmann, 1996
[13] Qian, N.: On the Momentum Term in Gradient Descent Learning Algorithms, Neural Networks, Vol. 12, 1999, pp 145-151
[14] Gamma, E.: The Extension Objects Pattern, Technical Report WUCS-97-07, Washington University, 1996
[15] Marrone, P.: Joone - Java Object Oriented Neural Engine, http://www.jooneworld.com
[16] Valentini, G., Masulli, F.: NEURObjects: an object-oriented library for neural network development, http://www.disi.unige.it/person/ValentiniG/NEURObjects
Table 1 – Comparison of design patterns.
Feature                                                          Decorator   Extension Objects   External Map
New data members can be added                                    yes         yes                 yes
New methods can be added                                         yes         yes                 yes
The core need not be prepared for extension                      yes         no                  yes
The abstract interface does not have to be
separated and reimplemented in the extension                     no          yes                 yes
Access to the extension is efficient                             yes         yes (1)             no
Multiple extensions are possible                                 yes (2)     yes                 yes
Explicit down-casting is not needed to
retrieve the extension                                           no          no                  yes
Number of additional pointers (3)

Notes:
1. Only in the case of a single extension.
2. Limited to a few extensions because of decreasing efficiency.
3. This number can be considered a measure of structural complexity.
Figure 1 – Feed-forward network as black box.
Figure 2 – Processing element.
Figure 3 – Feed-forward network with one hidden layer.
Figure 4 – General class diagram.
Figure 5 – A matrix of weights from a layer with 5 inputs and 3 outputs.
Figure 6 – Decorator design pattern applied to the layer object.
Figure 7 – Basic model (a) and three alternative solutions to the dynamic extension problem (b, c,
d). Arrows denote pointers.
Figure 8 – Class structure of the ANN library.