Neural Networks

Biological Neural Networks
[Figure: a biological neuron (dendrites, collaterals, cell body, axon) showing the direction of the electrical signal, and a biological network (axon, synapse, vesicles, neurotransmitters, presynaptic membrane, synaptic gap, postsynaptic membrane, dendrite) showing the electrical signal passing from one neuron to the next.]
Image source: http://pharmacyebooks.com/2010/10/artifitial-neural-networks-hot-topic-pharmaceutical-research.html

The Perceptron
The perceptron was developed by Frank Rosenblatt in 1957. It is a simple feed-forward network that can solve (create a decision function for) linearly separable problems. Its input data range over (−∞, +∞); its output is −1 or +1.

Inside the Perceptron
[Figure: inputs x_0 ... x_(N−1) are multiplied by weights w_0 ... w_(N−1) and summed (the sigma-pi stage); the sum is passed through a step function to produce the perceptron output.]

When is a Problem Linearly Separable?
[Figure: two sets of RED vs BLUE points — one linearly separable (a single straight line divides the classes) and one not linearly separable.]
Image source: http://dynamicnotions.blogspot.com/2008/09/single-layer-perceptron.html

A Practical Application: Classification
The Iris Data - This is one of the most famous datasets used to illustrate the classification problem. From four characteristics of the flower (the length of the sepal, the width of the sepal, the length of the petal, and the width of the petal), the objective is to classify a sample of 150 irises into three species: versicolor, virginica, and setosa.
Source: R.A. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, 7(2), 179-188 (1936).
Data from: UCI Machine Learning Repository - http://archive.ics.uci.edu/ml/

Training a 4-2-1 Network for the Iris Data
One fifth of the Iris data was selected uniformly, 10 samples per class, for a total of 30 training-set pairs. The four iris characteristics (sepal length, sepal width, petal length, petal width) are the inputs to the 4-2-1 network, which contains a total of 10 weights: 8 between the input and hidden layers and 2 between the hidden layer and the output. The outputs for the three classes were set to 0.0 (Iris-setosa), 0.5 (Iris-versicolor), and 1.0 (Iris-virginica).

The full Iris data has 3 classes with 50 samples each. The first five samples of each class:

sepal length   sepal width   petal length   petal width   class
5.1            3.5           1.4            0.2           Iris-setosa
4.9            3.0           1.4            0.2           Iris-setosa
4.7            3.2           1.3            0.2           Iris-setosa
4.6            3.1           1.5            0.2           Iris-setosa
5.0            3.6           1.4            0.2           Iris-setosa
7.0            3.2           4.7            1.4           Iris-versicolor
6.4            3.2           4.5            1.5           Iris-versicolor
6.9            3.1           4.9            1.5           Iris-versicolor
5.5            2.3           4.0            1.3           Iris-versicolor
6.5            2.8           4.6            1.5           Iris-versicolor
6.3            3.3           6.0            2.5           Iris-virginica
5.8            2.7           5.1            1.9           Iris-virginica
7.1            3.0           5.9            2.1           Iris-virginica
6.3            2.9           5.6            1.8           Iris-virginica
6.5            3.0           5.8            2.2           Iris-virginica

Trained network specification:
input layer nodes    4
hidden layer nodes   2
output layer nodes   1
learning rate        0.28
error limit          0.01
max runs             10000
# training sets      30
ihweights            0.1835137273718    -1.52185484488147   1.06085392071769    -10.1057086709985
                     -1.53328697751333   4.0131689222145    -1.63759087701708    10.741961194748
howeights            -6.01331593454728   6.66056158141261

Classifier Performance
Sample count (columns: actual class, rows: class assigned by the network):
       1    2    3
1     50    0    0
2      0   46    1
3      0    4   49

Performance fraction:
       1      2      3
1     1.00   0.00   0.00
2     0.00   0.92   0.02
3     0.00   0.08   0.98

A Demonstration

Typical Feed-Forward Neural Network
[Figure: input data in (−∞, +∞) is presented to the input layer, passes through the hidden layer, and emerges from the output layer as output data in (−1, +1).]

Inside an Artificial Neuron
[Figure: outputs o_0 ... o_(N−1) from the previous layer are multiplied by weights w_0 ... w_(N−1) and summed (the sigma-pi stage); the sum is passed through a sigmoid function, and the neuron output is distributed to the next layer.]
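Before moving on to back propagation, here is a minimal C# sketch of the single-unit computation shown in the two diagrams above: a weighted sum (the sigma-pi stage) followed by an activation function, either the perceptron's step function or the sigmoid used in the feed-forward network. The class and member names (SingleUnit, Activate, useStep) are illustrative only and are not part of the network code presented later; a practical perceptron would normally also include a bias (threshold) weight, omitted here to match the diagrams.

using System;

// Illustrative sketch of one unit: weighted sum followed by an activation function.
public class SingleUnit
{
    public double[] wts;     // one weight per incoming value
    public bool useStep;     // true = perceptron step output, false = sigmoid output

    public SingleUnit(double[] weights, bool step)
    {
        wts = weights;
        useStep = step;
    }

    public double Activate(double[] inputs)
    {
        // Sigma-pi stage: weighted sum of the incoming values.
        double net = 0.0;
        for (int i = 0; i < wts.Length; i++)
            net += wts[i] * inputs[i];

        if (useStep)
            return net >= 0.0 ? 1.0 : -1.0;       // step function: output in {-1, +1}
        return 1.0 / (1.0 + Math.Exp(-net));      // sigmoid: output in (0, 1)
    }
}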
Backward Error Propagation
1. Initialize the network with small random weights.
2. Present an input pattern to the input layer of the network.
3. Feed the input pattern forward through the network to calculate its activation value.
4. Take the difference between the desired output and the activation value to calculate the network's activation error.
5. Adjust the weights feeding the output neurons to reduce the activation error for this input pattern.
6. Propagate an error value back to each hidden neuron that is proportional to its contribution to the network activation error.
7. Adjust the weights feeding each hidden neuron to reduce its contribution of error for this input pattern.
8. Repeat steps 2 to 7 for each input pattern in the training set ensemble.
9. Repeat step 8 until the network is suitably trained.

Implementing a Neural Network
The network is trained on t training sets, each pairing m input values with p target output values. The m input-layer nodes feed the n hidden-layer nodes through an m x n array of input-to-hidden weights, and the n hidden-layer nodes feed the p output-layer nodes through an n x p array of hidden-to-output weights.

Neural Network Data Structure & Components

public static double learn = 0.28;        // learning rate
public static double error = 0.01;        // error limit
public static int npairs = 0;             // number of training pairs
public static int maxnumruns = 10000;     // maximum number of training runs
public static int numinput = 1;           // input layer nodes
public static int numhidden = 1;          // hidden layer nodes
public static int numoutput = 1;          // output layer nodes
public static double[,] inTrain;          // training inputs
public static double[,] outTrain;         // training (target) outputs
public static neuron[] iLayer;            // input layer
public static neuron[] hLayer;            // hidden layer
public static neuron[] oLayer;            // output layer
public static weight[,] ihWeight;         // input-to-hidden weights
public static weight[,] hoWeight;         // hidden-to-output weights
public static int pxerr;
public static double Scalerr;
public static bool showtoterr = true;

public class neuron
{
    public double input;
    public double output;
    public double error;

    public neuron()
    {
        input = 0.0;
        output = 0.0;
        error = 0.0;
    }
}

public class weight
{
    public double wt;
    public double delta;

    public weight(double wght)
    {
        wt = wght;
        delta = 0.0;
    }
}

Generalized Delta Rule
For the pth training pattern, the correction applied to the weight connecting unit i to unit j is

    Δ_p w_ji = η δ_pj o_pi

where η is the learning rate, δ_pj is the error in the jth unit, and o_pi is the output of the ith unit (for the input layer, the pth training-set input).

Quantifying Error for Back Propagation
Let f(net_pj) be the neuron output function for the pth presentation. For the jth unit in the output layer, where t_pj is the pth training-set (target) output,

    δ_pj = f'(net_pj) (t_pj − o_pj)        error for jth unit in output layer

For the jth unit in the hidden layer, the error is the weighted sum of the output-layer errors it feeds:

    δ_pj = f'(net_pj) Σ_k δ_pk w_kj        error for jth unit in hidden layer

The Sigmoid Function

    f(x) = (1 − e^(−2x)) / (1 + e^(−2x))        sigmoid, output in (−1, +1)
    f'(x) = 1 − f(x)²                           derivative of the sigmoid

Another Sigmoid Function

    f(x) = 1 / (1 + e^(−x))                     sigmoid, output in (0, 1)
    f'(x) = f(x) (1 − f(x))                     derivative of the sigmoid

This second form is the one used in the code below.

Running the Neural Network

public void calcInputLayer(int p)
{
    // The input layer simply passes the pth training input through.
    for (int i = 0; i < iLayer.Length; i++)
    {
        iLayer[i].output = inTrain[i, p];
    }
}

public void calcHiddenLayer()
{
    for (int h = 0; h < hLayer.Length; h++)
    {
        hLayer[h].input = 0.0;
        for (int i = 0; i < iLayer.Length; i++)
            hLayer[h].input += ihWeight[i, h].wt * iLayer[i].output;
        hLayer[h].output = f(hLayer[h].input);
    }
}

public void calcOutputLayer()
{
    for (int o = 0; o < oLayer.Length; o++)
    {
        oLayer[o].input = 0.0;
        for (int h = 0; h < hLayer.Length; h++)
            oLayer[o].input += hoWeight[h, o].wt * hLayer[h].output;
        oLayer[o].output = f(oLayer[o].input);
    }
}

public double f(double x) { return 1.0 / (1.0 + Math.Exp(-x)); }      // sigmoid
public double df(double x) { return f(x) * (1.0 - f(x)); }            // derivative of the sigmoid

Running the network is a feed-forward process. Input data is presented to the input layer. The activation (input) of each hidden-layer node is computed and used to compute the output of the hidden-layer nodes; the activation (input) of each output-layer node is then computed and used to compute the output of the network.
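The listing above does not include a routine that ties the three layer computations together, so here is a small sketch of how a single feed-forward pass might be driven. The RunNetwork wrapper is an assumed helper, not part of the original program; it simply calls the three routines in order for training pattern p and collects the output-layer values.

// Hypothetical wrapper (not in the original listing): one feed-forward pass
// through the network for training pattern p, using the routines defined above.
public double[] RunNetwork(int p)
{
    calcInputLayer(p);     // present the pth input pattern to the input layer
    calcHiddenLayer();     // weighted sums + sigmoid for the hidden layer
    calcOutputLayer();     // weighted sums + sigmoid for the output layer

    double[] result = new double[oLayer.Length];
    for (int o = 0; o < oLayer.Length; o++)
        result[o] = oLayer[o].output;
    return result;
}

For the 4-2-1 Iris network, the single value returned here would then be compared against the class codes 0.0, 0.5, and 1.0 (for example, by taking the nearest value); the notes do not spell out the exact decision rule used to produce the performance tables.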
Training the Network
In backward error propagation, the difference between the actual output and the goal (or target) output provided in the training set is used to compute the error in the network. This error is then used to compute the delta (change) in weight values for the weights between the hidden layer and the output layer.

public void calcOutputError(int p, int r)
{
    // Error for each output node: derivative of the sigmoid times (target - actual output).
    for (int o = 0; o < oLayer.Length; o++)
        oLayer[o].error = df(oLayer[o].input) * (outTrain[o, p] - oLayer[o].output);

    // Adjust the hidden-to-output weights to reduce the output error for this pattern.
    for (int h = 0; h < hLayer.Length; h++)
        for (int o = 0; o < oLayer.Length; o++)
            hoWeight[h, o].wt += learn * oLayer[o].error * hLayer[h].output;
}

public void calcHiddenError(int p, int r)
{
    // Error for each hidden node: derivative of the sigmoid times the weighted
    // sum of the output-layer errors that node feeds.
    for (int h = 0; h < hLayer.Length; h++)
    {
        double err = 0.0;
        for (int o = 0; o < oLayer.Length; o++)
            err += oLayer[o].error * hoWeight[h, o].wt;
        hLayer[h].error = df(hLayer[h].input) * err;
    }

    // Adjust the input-to-hidden weights to reduce each hidden node's error contribution.
    for (int i = 0; i < iLayer.Length; i++)
        for (int h = 0; h < hLayer.Length; h++)
            ihWeight[i, h].wt += learn * hLayer[h].error * iLayer[i].output;
}

These new weight values are then used to distribute the output error to the hidden-layer nodes. These node errors are, in turn, used to compute the changes in value for the weights between the input layer and the hidden layer of the network. (A sketch of a complete training loop built from these routines appears at the end of this section.)

In the demonstration program:
1. Set the number of neurons in each layer.
2. Select the learning rate, error limit, and maximum number of training runs.
3. Give the number of training pairs and include them in the left-hand text window, with input/output pairs listed sequentially:
       input 1
       output 1
       input 2
       output 2
       :
       input n
       output n

The total training-set ensemble error is plotted during the training process; the training rate depends on the initial values of the random weights. The user can monitor the rate of error correction in each weight during training as a weight color (large delta versus small delta). Small or zero changes in each weight do not necessarily mean that the network is trained; training could be hung up in a local minimum. When running the network, place the input values in the text window and click Run; the answer(s) appear on the next line(s).

How Many Nodes?
The number of input-layer nodes matches the number of input values, and the number of output-layer nodes matches the number of output values. But what about the hidden layer? Too few hidden-layer nodes and the network can't learn the patterns; too many hidden-layer nodes and the network doesn't generalize.

When Should We Use Neural Networks?
Neural networks need lots of data (example solutions) for training. They are a good fit when the functional relationships of the problem/solution are not well understood, when the problem/solution is not suited to a rule-based solution, and when "similar" input data sets generate "similar" outputs. Neural networks perform general pattern recognition; they are particularly good as decision-support tools, and also good for modeling the behavior of living systems.
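As referenced above, here is a sketch of a complete training loop built from the routines already defined. The TrainNetwork wrapper and the squared-error measure used for the total ensemble error are assumptions made for this illustration; the original program may organize its loop (and compute the plotted error) differently.

// Hypothetical training loop (not in the original listing): repeat backward error
// propagation over all training pairs until the total ensemble error falls below
// the error limit or the maximum number of training runs is reached.
public void TrainNetwork()
{
    for (int r = 0; r < maxnumruns; r++)
    {
        double totalError = 0.0;

        for (int p = 0; p < npairs; p++)
        {
            // Feed-forward pass for training pair p.
            calcInputLayer(p);
            calcHiddenLayer();
            calcOutputLayer();

            // Accumulate the squared output error for this pair (assumed error measure).
            for (int o = 0; o < oLayer.Length; o++)
            {
                double diff = outTrain[o, p] - oLayer[o].output;
                totalError += diff * diff;
            }

            // Backward error propagation: adjust the hidden-to-output weights,
            // then distribute the error and adjust the input-to-hidden weights.
            calcOutputError(p, r);
            calcHiddenError(p, r);
        }

        if (totalError < error)    // error limit reached; the network is considered trained
            return;
    }
}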
Can a Neural Network do More than a Digital Computer?
Clearly a simulation of a Neural Network running on a digital computer cannot be more powerful than the computer on which it is being executed. The question is, "Can a computational system such as a Neural Network be built that can do something that a digital computer cannot?" A digital computer is the physical embodiment of a Turing Machine, which is defined as a universal computer of all computable functions. An artificial Neural Network is loosely modeled on the human brain. Rather than using a software simulation of neurons, we can build electronic circuits that closely mimic the activities of human brain cells. Can we build a physical system of any kind (based on electronics, chemistry, etc.) that does everything a human brain can do? Can you think of something human brains do that, so far, has not been accomplished or, at least, approximated by a computer or any other physical (man-made) system?

Consciousness

What is the Computational Power of Consciousness?
Since we can't quantify consciousness, it is not likely that we can determine the level of computational power necessary to manifest it. However, we can establish a relative measure of computational power for systems that do and (so far) do not exhibit consciousness.

[Figure: a series of diagrams ranking the human mind/brain, the Turing Machine, the digital computer, the Neural Network, and the physical system/model by relative computational power, annotated with the contrasts Dualism vs Materialism, the Revised Turing Test, Symbolism vs Connectionism, and Engineering and Technology, and noting that the digital computer has finite storage and finite precision.]

Due to the limitations of finite storage and the related issue of finite-precision arithmetic, a Turing Machine can exhibit greater computational power than a digital computer.