Lecture 11: Neural Networks

Learning outcomes

You will know the two main learning paradigms.
You will know some problem types and the appropriate paradigm to use for each.
You will know some further details of the practical techniques and problems encountered when setting up networks.

NN Learning Strategies

What is learning? Learning is a process by which the parameters of a NN are adapted through stimulation by the environment. The type of learning is determined by the manner in which the parameter change takes place [adapted from Haykin, Neural Networks, 1994, Macmillan]. So the NN is stimulated by the environment, the NN changes as a result, and the NN then responds to the environment in a new way.

We have met perceptrons and feed-forward NNs, and a training algorithm for these types of NN. This algorithm is an example of an error-correction learning algorithm; the learning paradigm is that of supervised learning. The NN learns from examples which are given to it: the NN responds and, if necessary, is then changed, because the supervisor (the person who wrote the training program) has decided that the response was not (sufficiently) correct. The particular algorithm we have seen preferentially changes the weights that are likely to have the most effect. We will see in the next lecture that there are kinds of NN which learn from data in an unsupervised way, using a different algorithm (competitive learning). A sketch of error-correction learning in its simplest form is given below.
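The following is a minimal sketch of error-correction learning for a single threshold perceptron, written in plain MATLAB. The AND data set, the learning rate and the number of epochs are illustrative choices rather than values from the lecture; the point is the update rule, in which each weight change is proportional to the error signal supplied by the supervisor.

    % Error-correction (perceptron) learning: a minimal sketch.
    X   = [0 0; 0 1; 1 0; 1 1];   % input patterns, one per row
    t   = [0; 0; 0; 1];           % supervisor's targets (logical AND)
    w   = zeros(2, 1);            % weights
    b   = 0;                      % bias
    eta = 0.1;                    % learning rate

    for epoch = 1:20
        for i = 1:size(X, 1)
            y = double(X(i,:) * w + b > 0);   % threshold response
            e = t(i) - y;                     % error: target minus response
            w = w + eta * e * X(i,:)';        % change weights in proportion
            b = b + eta * e;                  % to the error and the input
        end
    end

    disp([double(X * w + b > 0), t])          % learned responses vs targets

Note that the weights only change when the response disagrees with the target: the environment stimulates the network, the network changes, and it then responds to the environment in a new way.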
Problem Types and Suitable Learning Paradigms

Function approximation: an input-output mapping is needed, so supervised learning.

Association: the network needs to learn some patterns and then later recall a pattern from partial information or noisy data; unsupervised learning.

Pattern classification: if the patterns are known in advance this reduces to input-output mappings and hence supervised learning; if there is no knowledge of the expected patterns then unsupervised learning can be used to detect them.

Time series prediction: given a sequence of data we want to guess the next value [stock market prediction, for example]; viewed as an input-output mapping, this is supervised learning with error correction.

Further issues to do with neural network design

Training methods

We have discussed the use of learning rates, and the problem of local minima in the error function, which can be addressed by the use of a momentum term. These approaches date from the earlier days of neural networks, when training methods were fairly primitive; early backpropagation algorithms were slow and cumbersome. Subsequent work by mathematicians has refined training methods, so there are now many variants using sophisticated specialist techniques. They are difficult to describe without technical mathematical jargon, but they are still accessible in MATLAB for our use. The Levenberg-Marquardt algorithm, for example, works extremely quickly and reliably, and is probably the algorithm of choice within MATLAB.

Overfitting

One of the problems which can occur is that our network learns the data too well: it specialises to the input data and is no good at generalisation. This is what happened with the curve-fitting backprop demo in the laboratory when the difficulty index was 1 but we used 9 neurons to fit the data. [Figure: the overfitted curve fit from the laboratory demo.]

The difficulty is that during training we are always looking for a smaller error value, so how do we know when reducing the error on the training set is a bad thing from the point of view of generalising? One answer is early stopping, also known as cross-validation.

Recall the process we have settled on to train a network: split the data (randomly) into a training set and a test set; train the network using the training data; then test the network on the test set. If we get a sufficiently good result then we will use the net on new data and trust the results. However, when we overtrain, the test set gives bad results, so we cannot use the network.

When we use early stopping we split the original data into three sets: a training set, a test set and a validation set. We train the network on the training set, but we keep checking the accuracy of the net on the validation set as well as on the training set. As long as the error on both sets is falling we keep training, although only the training data is used in changing the network weights. When the error on the validation data starts to rise we stop training the net and test it on the test data. If it works well with the test data then we are willing to use it on new data. We have not covered this in the labs, but MATLAB can build it into the training with the NN tool, as sketched below.
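Here is a minimal sketch of the three-way split and early stopping, assuming MATLAB's Neural Network Toolbox is available. The noisy sine data, the 9-neuron hidden layer (the size that overfitted in the laboratory demo) and the split ratios are illustrative choices, and exact property names may vary between toolbox versions.

    % Early stopping (validation stop) with Levenberg-Marquardt: a sketch.
    x = linspace(-1, 1, 200);                % illustrative 1-D data
    t = sin(2*pi*x) + 0.1*randn(size(x));    % noisy target function

    net = feedforwardnet(9, 'trainlm');      % 9 hidden neurons, Levenberg-Marquardt

    net.divideFcn = 'dividerand';            % random three-way split of the data
    net.divideParam.trainRatio = 0.70;       % used to change the weights
    net.divideParam.valRatio   = 0.15;       % watched for rising error
    net.divideParam.testRatio  = 0.15;       % held back for the final check
    net.trainParam.max_fail    = 6;          % stop after 6 validation failures in a row

    [net, tr] = train(net, x, t);            % halts when validation error keeps rising

    tr.best_epoch                            % epoch with the lowest validation error
    y = net(x);                              % simulate the trained network

The training record tr also holds the training, validation and test error curves (tr.perf, tr.vperf and tr.tperf), which is a convenient way to see the validation error turning upwards while the training error is still falling.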