Lecture 5 and 6 on Neural Networks

Lecture 11 Neural Networks
Learning outcomes
You will know the two main learning paradigms
You will know some problem types and the appropriate paradigm to use
You will know some further detail of the practical techniques and problems
encountered when setting up networks
NN Learning Strategies
What is learning?
Learning is a process by which the parameters of a NN are adapted through stimulation
by the environment. The type of learning is determined by the manner in which the
parameter change takes place [adapted from Haykin, Neural Networks, 1994, Macmillan].
So the NN is stimulated by the environment.
The NN changes as a result.
The NN responds to the environment in a new way.
We have met Perceptrons and Feed Forward NNs and a training algorithm for these types
of NN. This algorithm is an example of an error correction learning algorithm. The
learning paradigm is that of supervised learning.
The NN learns from examples which are given to it. The NN responds and, if necessary, is
then changed because the supervisor (the person who wrote the training program) has
decided that the response was not (sufficiently) correct. The particular algorithm we have
seen makes the largest changes to the weights that are likely to have the most effect on the error.
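The error correction idea above can be sketched for a single perceptron as follows. This is a minimal illustration, not the training program used in the labs: the function names, the threshold activation, the learning rate and the choice of logical AND as training data are all assumptions made for the example.

```python
def step(net_input):
    """Threshold activation: 1 if the net input is non-negative, else 0."""
    return 1 if net_input >= 0 else 0


def predict(weights, bias, inputs):
    """Response of the perceptron to one input pattern."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_perceptron(examples, learning_rate=0.1, max_epochs=100):
    """Error correction training (hypothetical helper for this sketch).

    examples: list of (inputs, target) pairs with targets 0 or 1.
    Returns the learned (weights, bias).
    """
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in examples:
            # The supervisor compares the response with the known target.
            error = target - predict(weights, bias, inputs)
            if error != 0:
                mistakes += 1
                # Weights attached to larger inputs are changed by more.
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if mistakes == 0:  # every example answered correctly: stop
            break
    return weights, bias


# Assumed training data: the logical AND function.
and_examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_examples)
for inputs, target in and_examples:
    print(inputs, "target", target, "response", predict(weights, bias, inputs))
```

The network is only changed when the supervisor finds an error, which is what makes this supervised, error correction learning.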
We will see in the next lecture that there are kinds of NN which learn from data in an
unsupervised way using a different algorithm (competitive learning).
Problem Type and suitable Learning Paradigms
Function Approximation
Input-output mapping needed: supervised learning.
Association
Need to learn some patterns, then later on recall a pattern from partial information or
noisy data: unsupervised learning.
Pattern classification
If the patterns are known in advance, this reduces to an input-output mapping and hence
supervised learning. If there is no knowledge of the expected patterns, then unsupervised
learning can be used to detect them.
Time series prediction
Sequence of data: we want to guess the next value (for example, stock market prediction).
View this as an input-output mapping: supervised learning with error correction.
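One common way to view a sequence as an input-output mapping is a sliding window: the last few values are the input and the value that follows is the target. The sketch below assumes a window of three values and a made-up series; the function name is hypothetical.

```python
def make_windows(series, window_size):
    """Turn a sequence into (inputs, target) pairs for supervised learning."""
    pairs = []
    for start in range(len(series) - window_size):
        inputs = series[start:start + window_size]  # the recent history
        target = series[start + window_size]        # the value to predict
        pairs.append((inputs, target))
    return pairs


series = [10, 12, 11, 13, 15, 14]  # made-up data for illustration
pairs = make_windows(series, window_size=3)
for inputs, target in pairs:
    print(inputs, "->", target)
# [10, 12, 11] -> 13
# [12, 11, 13] -> 15
# [11, 13, 15] -> 14
```

Each pair can then be given to a supervised training algorithm exactly like any other input-output example.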
Further issues to do with Neural Network design
Training methods
We have discussed the use of learning rates, and the problem of local minima of the
error function, which a momentum term can help the training to escape. These approaches
date from the earlier days of neural networks, when the training methods were pretty
primitive. Early backpropagation algorithms were slow and cumbersome. Subsequent
work by mathematicians has refined training methods, so that there are now many
variants using sophisticated specialist techniques. They are difficult to describe without
technical mathematical jargon, but they are still accessible in MATLAB for our use. The
Levenberg-Marquardt algorithm, for example, works extremely quickly and reliably on
small and medium-sized networks and is probably the algorithm of choice within MATLAB.
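As a reminder of how a learning rate and a momentum term enter the weight update, here is a minimal sketch on a single weight. The error function E(w) = (w - 3)^2, the parameter values and the function names are assumptions chosen only to make the example runnable; this is plain gradient descent with momentum, not the Levenberg-Marquardt algorithm.

```python
def gradient_descent_momentum(gradient, w0, learning_rate=0.1,
                              momentum=0.9, steps=300):
    """Repeat: change = -learning_rate * gradient + momentum * previous change."""
    w = w0
    delta_w = 0.0  # previous weight change, zero before the first step
    for _ in range(steps):
        delta_w = -learning_rate * gradient(w) + momentum * delta_w
        w += delta_w
    return w


def error_gradient(w):
    """Gradient of the assumed error function E(w) = (w - 3)**2."""
    return 2.0 * (w - 3.0)


w_final = gradient_descent_momentum(error_gradient, w0=0.0)
print("final weight:", w_final)  # close to the minimum at w = 3
```

The momentum term carries part of the previous change forward, so the weight can keep moving through flat regions or small dips in the error. It makes escaping a local minimum more likely but does not guarantee it.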
One of the problems which can occur is that our network learns the data too well – it
specialises to the data input and it is no good at generalisation. This is what happened
with the curve fitting backprop demo in the laboratory when the difficulty index was 1
but we used 9 neurons to fit the data.
The difficulty is that we are training and always looking for a smaller error value – how
do we know that reducing the error on the training set is a bad thing from the point of
view of generalising?
One answer is early stopping, which uses a separate cross-validation set to decide when to stop training.
Recall the process we have settled on to train a network:
Split data into train set and test set (randomly);
Train network using training data – then test network on test set. If we get a sufficiently
good result then we will use the net on new data and trust the results.
However, when we overtrain, the test set gives bad results, so we can't use the network.
When we use early stopping we split the original data into three sets: a training set, a test
set and a cross-validation set.
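A random three-way split might look like the sketch below. The 60/20/20 proportions, the fixed seed and the function name are assumptions for illustration; the notes do not prescribe particular set sizes.

```python
import random


def split_three_ways(data, train_frac=0.6, val_frac=0.2, seed=0):
    """Randomly split data into training, validation and test sets."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever is left over
    return train, validation, test


train, validation, test = split_three_ways(range(100))
print(len(train), len(validation), len(test))
```

Every example lands in exactly one of the three sets, so the validation and test sets stay unseen by the weight updates.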
We train the network on the training set, but we keep checking the error of the
net on the validation set as well as on the training set. As long as the error on both sets is
reducing we keep training, but only the training data is used in changing the network
weights. When the error on the validation data starts to increase we stop training the
net and test it on the test data. If it works well on the test data then we are willing to use it
on new data. We haven't covered this in labs, but MATLAB can build this into the training
with the NN tool.
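The early stopping procedure can be sketched as a training loop. This is an illustration under stated assumptions, not how the MATLAB NN tool implements it: a polynomial model stands in for the network, the data is synthetic (a noisy sine curve), and a "patience" counter (stop after the validation error has failed to improve for a set number of epochs) is an assumed way of deciding that the validation error has started to increase.

```python
import math
import random


def predict(weights, x):
    """Polynomial stand-in for a network: w0 + w1*x + w2*x**2 + ..."""
    return sum(w * x ** k for k, w in enumerate(weights))


def mean_squared_error(weights, data):
    return sum((predict(weights, x) - y) ** 2 for x, y in data) / len(data)


def train_one_epoch(weights, train_set, learning_rate):
    """One gradient descent step. Only the training set changes the weights."""
    grads = [0.0] * len(weights)
    for x, y in train_set:
        error = predict(weights, x) - y
        for k in range(len(weights)):
            grads[k] += 2.0 * error * x ** k / len(train_set)
    return [w - learning_rate * g for w, g in zip(weights, grads)]


def train_with_early_stopping(train_set, val_set, n_weights=10,
                              learning_rate=0.05, max_epochs=2000, patience=50):
    weights = [0.0] * n_weights
    best_weights = list(weights)
    best_val_error = mean_squared_error(weights, val_set)
    epochs_without_improvement = 0
    for _ in range(max_epochs):
        weights = train_one_epoch(weights, train_set, learning_rate)
        val_error = mean_squared_error(weights, val_set)  # checked, never trained on
        if val_error < best_val_error:
            best_val_error = val_error
            best_weights = list(weights)  # remember the best network so far
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation error is no longer improving: stop
    return best_weights, best_val_error


# Synthetic data (an assumption for this sketch): noisy samples of sin(3x).
rng = random.Random(0)
data = []
for _ in range(40):
    x = rng.uniform(-1.0, 1.0)
    data.append((x, math.sin(3.0 * x) + rng.gauss(0.0, 0.2)))
train_set, val_set, test_set = data[:20], data[20:30], data[30:]

best_weights, best_val_error = train_with_early_stopping(train_set, val_set)
print("validation error:", best_val_error)
print("test error:", mean_squared_error(best_weights, test_set))
```

The weights kept are those with the lowest validation error, and the test set is used only once, at the end, to decide whether the result can be trusted on new data.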