Lecture 9: Feed-Forward Neural Networks

Learning outcomes

You will be able to:
- describe the common feed-forward neural network architecture;
- describe various common transfer functions;
- know how to code different types of data for input to a network.

We have met the hardlim transfer function. We do, however, allow other transfer functions. In fact, multilayer networks need them: hardlim is a step function, so its gradient is zero everywhere except at the step, and gradient-based training such as backpropagation has nothing to work with. The most common transfer functions are:

- hardlim: hard limit, outputs 0 or 1
- hardlims: symmetric hard limit, outputs -1 or 1
- purelin: linear, output equals input
- logsig (or sigmoid): smooth S-shaped curve with outputs between 0 and 1
- tansig: hyperbolic tangent sigmoid, outputs between -1 and 1

How do we use feed-forward neural networks? (Or: the five-second guide to neural network modelling)

Feed-forward nets are used to classify patterns, recognise things or calculate functions. We get them to do this by supervised training: we present examples to them and say "do this thing when you see something like this other thing". After training (if we get it right) the network not only knows how to behave with the data we trained it on but also acts correctly with completely new data.

Sometimes this very simple idea gets lost in the detail, because the things we want the net to recognise don't come ready-made to put into a computer. We have to do some work to get them into the right form.

Description of the feed-forward neural network used for function approximation

The standard feed-forward network that we will use attempts to model a function. It has three layers:

Layer           Transfer function
input layer     none
hidden layer    logsig
output layer    purelin

How many neurons in each layer?

Layer     Number of neurons
input     determined by the function
hidden    ????? (choose to get a good fit)
output    determined by the function (well, almost)

E.g. credit scoring. The input size depends on the data you possess. The output size depends on what you are after: 1 if a credit score, 1 if yes/no, 3 if £x at y% paid back over z months.

An example problem

We have some data on irises: numerical values and the classification of the iris the data was taken from.
Can we tell what kind of iris we are looking at just from the data (and so do away with the need for botanists to do this job for us)? See the data in iris.data. [We will make this available in the labs.] A sample of this data is:

6, 2.7, 5.1, 1.6, Iris-versicolor
6.7, 3.1, 4.7, 1.5, Iris-versicolor
4.3, 3, 1.1, 0.1, Iris-setosa
6, 2.2, 5, 1.5, Iris-virginica
5.8, 2.6, 4, 1.2, Iris-versicolor

The data is interpreted as follows: 6 2.7 5.1 1.6 are the observations of 4 features of the iris, and this iris is of type Iris-versicolor. 4.3 3 1.1 0.1 are measurements on an iris of type Iris-setosa.

We need to get this into a form we can classify with a neural net, so we use a numeric coding. See the data in irisnumeric.data. A sample of this data is:

6 2.7 5.1 1.6 1
6.7 3.1 4.7 1.5 1
4.3 3 1.1 0.1 2
6 2.2 5 1.5 0
5.8 2.6 4 1.2 1

The data is interpreted as follows: 6 2.7 5.1 1.6 are the observations of 4 features of the iris, and this iris is of type 1, or Iris-versicolor. 4.3 3 1.1 0.1 are measurements on an iris of type 2, or Iris-setosa.

We want to train a NN to recognise such data, i.e. we want to plug in four values for the features and have the network say "That was an iris-versicolor" (or rather "That was a number 1", or rather output 1). So we create a network with four inputs, some number of neurons in the hidden layer (3, say) and one neuron in the output layer, and use the default transfer functions (logsig in the hidden layer, purelin in the output layer).

[picture]

We give it lots of samples to train on, and if it works, fine. If not, we try altering the hidden layer size or using other transfer functions.

How to code information into the inputs

Number input, e.g. weight, height, salary and age data (assuming a credit score output). These are probably left as numerical inputs, one neuron each.

[picture]

However, beware of data over time. Take inflation as an example: don't use the raw numbers, but categorise them as low, medium or high, for example.
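
Returning to the iris example, the network described there (four inputs, three hidden neurons, one output) can be sketched in Python roughly as follows. This is only an illustration of a single forward pass, not the lab implementation: the weights and biases are made-up, untrained values, and the helper names (parse_row, forward) are hypothetical. The class codes are the ones shown in the irisnumeric.data sample.

```python
import math

# Numeric class codes as they appear in the irisnumeric.data sample.
CLASS_CODES = {"Iris-virginica": 0, "Iris-versicolor": 1, "Iris-setosa": 2}


def logsig(x):
    """Logistic sigmoid: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def purelin(x):
    """Linear transfer function: returns its input unchanged."""
    return x


def parse_row(line):
    """Split 'f1, f2, f3, f4, class-name' into (four floats, numeric code)."""
    parts = [p.strip() for p in line.split(",")]
    return [float(p) for p in parts[:4]], CLASS_CODES[parts[4]]


def forward(features, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass: logsig hidden layer, then a single purelin output."""
    hidden = [
        logsig(sum(w * f for w, f in zip(weights, features)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return purelin(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical, untrained parameters (assumption: chosen only for illustration).
HIDDEN_WEIGHTS = [
    [0.1, -0.2, 0.3, 0.1],
    [-0.3, 0.2, 0.1, -0.1],
    [0.2, 0.1, -0.2, 0.3],
]
HIDDEN_BIASES = [0.0, 0.1, -0.1]
OUTPUT_WEIGHTS = [0.5, -0.4, 0.9]
OUTPUT_BIAS = 0.2

features, target = parse_row("6, 2.7, 5.1, 1.6, Iris-versicolor")
prediction = forward(features, HIDDEN_WEIGHTS, HIDDEN_BIASES, OUTPUT_WEIGHTS, OUTPUT_BIAS)
print("features:", features, "target code:", target, "untrained output:", prediction)
```

With untrained weights the output has no reason to be close to the target code; training adjusts the weights and biases until it is.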
Non-Numeric Input

More data: weight, height, salary, age, gender. Gender (male/female) is categorical data, not a number. Convert it to numeric: 0 for male, say, and 1 for female.

Even more data: ethnic origin (white european, black british, black african, asian sub continent etc.). Here there is a choice. A single neuron could take the values 0, 0.25, 0.5, 0.75, 1.0, say. Alternatively, use a bit map approach: give ethnic origin a group of neurons and code it as a bit pattern.

0 0 0  white
0 0 1  black british
etc.

[picture]

There are three main types of coding data:

- Linear or Local: suitable for numeric or ordered categorical data (e.g. income on a categorised scale: low income 1, middle 2, high 3).
- Binary coding: for categorical data where no relationship between the categories is expected (e.g. ethnic origin or the iris data). Only used to reduce dimension (the number of neurons needed); try to avoid it, since the bit patterns impose a hidden order on the categories.
- One-of-n or Distributed: for categorical data where no relationship between the categories is expected (e.g. ethnic origin or the iris data). Preferred over binary, but may be unwieldy if n is big.

NB: the coding can affect how easy it is to find a network which recognises the data.

How to code information into the outputs

There are similar choices for the output neurons. If you want to train a network to recognise categories, you need to code them somehow. There is the additional problem of decoding the output you actually get (this applies whatever you do).

Suppose we expect numerical output. Then if the neuron outputs 25, fine, it's 25.

Suppose we expect categorical output: 0 for male, 1 for female. The neuron outputs 0.4. What now? Well, it's nearer to 0, so male. [You might want to know what values the NN produced for the training data before deciding to put the cut-off at 0.5.]

[picture]

Similarly with distributed output: is 0.2 0.3 0.7 really 0 0 1? The usual rule is to take the winning neuron (the one with the largest output) as a 1 and the rest as 0.
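
The one-of-n coding and the two decoding rules above (cut-off for a single neuron, winning neuron for a group) can be sketched in Python as follows. The function names are hypothetical, and the order of the categories in CATEGORIES is an assumption made for the example (it follows the numeric iris codes 0, 1, 2).

```python
# Assumed category order for the example: position i corresponds to neuron i.
CATEGORIES = ["Iris-virginica", "Iris-versicolor", "Iris-setosa"]


def one_of_n(category, categories=CATEGORIES):
    """Code a category as a pattern with a single 1, one neuron per category."""
    return [1 if c == category else 0 for c in categories]


def winner_bits(outputs):
    """Winner-takes-all: the largest output becomes 1, all others 0.

    Ties go to the first of the tied neurons (a choice made for this sketch).
    """
    winner = max(range(len(outputs)), key=lambda i: outputs[i])
    return [1 if i == winner else 0 for i in range(len(outputs))]


def decode_winner(outputs, categories=CATEGORIES):
    """Return the category whose neuron produced the largest output."""
    return categories[winner_bits(outputs).index(1)]


def decode_cutoff(output, cutoff=0.5, low="male", high="female"):
    """Decode a single output neuron: below the cut-off is `low`, else `high`."""
    return high if output >= cutoff else low


print(one_of_n("Iris-versicolor"))
print(winner_bits([0.2, 0.3, 0.7]))
print(decode_winner([0.2, 0.3, 0.7]))
print(decode_cutoff(0.4))
```

The cut-off is a parameter rather than a fixed 0.5 so that it can be moved after looking at what the network produced for the training data.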
How many neurons

There is no definite rule, but there is a "rule of thumb" about the relationship between the number of weights and the number of training values needed to give a certain level of performance:

number of training examples > number of weights / error proportion

E.g. you want at most 10% errors and design a network with 20 weights: 20/0.1 = 200, so you need about 200 data points in the training set.

Alternatively, if you have a network with 15 weights and only 50 data points: error = number of weights / number of training examples = 15/50 = 0.3, roughly 1/3. So the network won't be very good.
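
The arithmetic of this rule of thumb can be checked with a short Python sketch. The function names are hypothetical. The weight-counting helper is an addition for convenience and assumes a fully connected three-layer network; whether bias terms count as weights for the purposes of the rule is not stated in these notes, so it is left as a switch.

```python
import math


def count_weights(n_inputs, n_hidden, n_outputs, include_biases=True):
    """Count adjustable parameters in a fully connected three-layer network.

    Assumption: one bias per hidden and output neuron when include_biases is True.
    """
    weights = n_inputs * n_hidden + n_hidden * n_outputs
    if include_biases:
        weights += n_hidden + n_outputs
    return weights


def training_examples_needed(n_weights, error_proportion):
    """Rule of thumb: training set size of about weights / error proportion."""
    # round() guards against floating-point noise before taking the ceiling.
    return math.ceil(round(n_weights / error_proportion, 9))


def expected_error(n_weights, n_training):
    """Rule of thumb turned around: error proportion ~ weights / training size."""
    return n_weights / n_training


print(training_examples_needed(20, 0.1))  # the 20-weight, 10% error example
print(expected_error(15, 50))             # the 15-weight, 50-point example
print(count_weights(4, 3, 1))             # the 4-3-1 iris network, with biases
```

Since the number of weights grows with the hidden layer size, the rule also puts a rough ceiling on how many hidden neurons a given amount of training data can support.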