Artificial Neural Network Overview

An artificial neural network is a collection of connected neuron models. Taken one at a time, each neuron is rather simple. As a collection, however, a group of neurons is capable of producing complex results. In the following sections I will briefly summarize a mathematical model of a neuron, a neuron layer, and a neural network before discussing the types of behavior achievable from a neural network. Finally, I will conclude with a short description of the program included in this lesson so you can form networks that are tailored to your class.

Models

The models presented in this section appear fairly difficult mathematically. However, they eventually boil down to just multiplication and addition. The use of matrices and vectors simplifies the notation but is not absolutely required for this application.

Neuron Model

A model of a neuron has three basic parts: input weights, a summer, and an output function. The input weights scale values used as inputs to the neuron, the summer adds all the scaled values together, and the output function produces the final output of the neuron. Often, one additional input, known as the bias, is added to the system. If a bias is used, it can be represented by a weight with a constant input of one. This description is laid out visually below.

[Figure: a neuron with inputs I1, I2, and I3 scaled by weights W1, W2, and W3, a bias B with a constant input of 1, a summer producing the intermediate value x, and an output function f(x) producing the final output a.]

Here I1, I2, and I3 are the inputs, W1, W2, and W3 are the weights, B is the bias, x is an intermediate output, and a is the final output. The equation for a is given by

a = f(W1*I1 + W2*I2 + W3*I3 + B)

where f could be any function. Most often, f is the sign of the argument (i.e. 1 if the argument is positive and -1 if the argument is negative), linear (i.e. the output is simply the input times some constant factor), or some complex curve used in function matching (not needed here). For this model we will use the first case, where f is the sign of the argument, for two reasons: it closely matches the 'all or nothing' property seen in biological neurons and it is fairly easy to implement.

When artificial neurons are implemented, vectors are commonly used to represent the inputs and the weights, so the first of two brief reviews of linear algebra is appropriate here. The dot product of two vectors x = (x1, x2, ..., xn) and y = (y1, y2, ..., yn) is given by

x · y = x1*y1 + x2*y2 + ... + xn*yn

Using this notation the output simplifies to a = f(W · I + B), where all the inputs are contained in I and all the weights are contained in W.

Neuron Layer

In a neuron layer each input is tied to every neuron and each neuron produces its own output. This can be represented mathematically by the following series of equations:

a1 = f1(W1 · I + B1)
a2 = f2(W2 · I + B2)
a3 = f3(W3 · I + B3)
...

NOTE: In general these functions may be different; however, I will take them all to be the sign of the argument from now on.

Now we take our second digression into linear algebra. Recall that to perform matrix multiplication you take each column of the second matrix and perform the dot product operation with each row of the first matrix to produce each element of the result. Specifically, the dot product of the ith column of the second matrix with the jth row of the first matrix gives the (j,i) element of the result. If the second matrix is only one column, then the result is also one column. Keeping matrix multiplication in mind, we arrange the weights so that each row of a matrix holds the weights of one neuron. Representing the input vector and the biases as one-column matrices, we can simplify the above notation to

a = f(W*I + B)

which is the final form of the mathematical representation of one layer of artificial neurons.
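As a quick check of the layer equation, here is a minimal Matlab sketch. The weights, input, and biases are made-up values chosen only for illustration. One caution: Matlab's sign function returns 0 when its argument is exactly zero, while the model above only ever produces 1 or -1.

```matlab
% Minimal sketch of one layer of three neurons with a sign output
% function.  All values below are made up for illustration.
W = [ 1 -1  0;         % row 1: weights of neuron 1
      0  1  1;         % row 2: weights of neuron 2
     -1  0  1 ];       % row 3: weights of neuron 3
I = [1; 0; 1];         % one input pattern as a column vector
B = [0.5; -0.5; -0.5]; % one bias per neuron

x = W*I + B;           % summer: weighted sums plus biases
a = sign(x)            % output function: gives a = [1; 1; -1] here
```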
Neural Network

A neural network is simply a collection of neuron layers where the output of each layer becomes the input to the next layer. So, for example, the inputs to layer two are the outputs of layer one. In this exercise we are keeping it relatively simple by not having feedback (i.e. output from layer n being input for some previous layer). To represent the neural network mathematically we only have to chain the equations together. The finished equation for the three-layer network in this lesson is given by

a = f(W3*f(W2*f(W1*I + B1) + B2) + B3)
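To make the chaining concrete, the three-layer equation can be evaluated in Matlab as below. The layer sizes and random weights are illustrative assumptions, not the values used in the lesson's program.

```matlab
% Minimal sketch of a three-layer network: each layer's output is the
% next layer's input.  Sizes and random values are for illustration.
I  = [1; 0; 1];                    % input pattern (3 elements)
W1 = randn(4,3);  B1 = randn(4,1); % layer 1: 4 neurons, 3 inputs each
W2 = randn(4,4);  B2 = randn(4,1); % layer 2: 4 neurons
W3 = randn(3,4);  B3 = randn(3,1); % layer 3: 3 output neurons

out1 = sign(W1*I    + B1);  % output of layer 1
out2 = sign(W2*out1 + B2);  % feeds layer 2
a    = sign(W3*out2 + B3)   % final network output, a column of +/-1
```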
Neural Network Behavior

Although transistors now switch in as little as 0.000000000001 seconds while biological neurons take about 0.001 seconds to respond, we have not been able to approach the complexity or the overall speed of the brain. This is due, in part, to the large number of neurons (approximately 100,000,000,000) that are highly connected (approximately 10,000 connections per neuron). Although not as advanced as biological brains, artificial neural networks still perform many important functions in a wide range of applications including sensing, controls, pattern recognition, and categorization.

Generally, networks (including our brains) are trained to achieve a desired result. The training mechanisms and rules are beyond the scope of this paper; however, it is worth mentioning that, generally, good behavior is rewarded while bad behavior is punished. That is to say, when a network performs well it is modified only slightly (if at all), and when it performs poorly larger modifications are made.

As a final thought on neural network behavior, it is worth noting that if the output functions of the neurons are all linear, the network is reducible to a one-layer network, because a composition of linear functions is itself linear. In other words, to have a useful network of more than one layer we must use a function like the sigmoid (an s-shaped curve), the sign function we used above, a linear function that saturates, or any other nonlinear curve.

Matlab Code

This section covers the parameters in my Matlab code that you might choose to modify if you decide to create a network with inputs and outputs other than those already documented in this lesson. Before using my code you should be aware that it was not written to solve general neural network problems, but rather to find a network by randomly trying values. This means that it could loop forever even if a solution for your inputs and outputs exists. If you do not get a good result after a few minutes, you may want to stop the execution and change your parameters. Finally, I will not claim that I have worked all the bugs out of this program, so you should check your results carefully before using them in a classroom setting.

p1, p2, and p3 are input patterns for three different inputs. Each input pattern consists of three elements pertaining to different attributes of the input. For example, in my lesson I used redness, roundness, and softness; a one in the first position means that an object is red, while a zero indicates that it is not red.

a1, a2, and a3 are output patterns. They need to be initialized to be incorrect (that way the program enters the loop rather than bypassing it). The second argument of the conditionals for the loop should be the desired results. In my case, I chose to have one neuron in the last layer be an indicator for each object. When that object was used as an input to the network, that neuron would end up being a one while the other neurons in the last layer would be negative one (if everybody did their math correctly). More explicitly, when the first element of a1 is not a positive one, then it is wrong and I want to do the loop again. In a similar manner, when the second element of a1 is not a negative one, it is wrong and I want to do the loop again. The same goes for the rest of the outputs. A sketch of this loop structure is given below.

Note that there is one known bug involving the representation of non-terminating decimals (in binary, 0.1 is non-terminating): a value that displays as 0.0000 may actually be a tiny positive number, so it can be taken to be positive rather than zero.
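To tie the pieces together, here is a hedged sketch of the random-search loop just described, not the lesson's actual program: the pattern names p1-p3 and output names a1-a3 follow the lesson, but the pattern values, the target vectors t1-t3, and the use of a single layer in place of the full network are all assumptions made to keep the example short.

```matlab
% Hedged sketch of the random-search idea: keep drawing random
% weights until each input pattern turns on only its own indicator
% neuron.  Patterns, targets, and sizes are illustrative assumptions.
p1 = [1; 1; 0];   t1 = [ 1; -1; -1];  % e.g. red, round, not soft
p2 = [0; 1; 1];   t2 = [-1;  1; -1];
p3 = [1; 0; 0];   t3 = [-1; -1;  1];

a1 = -t1;  a2 = -t2;  a3 = -t3;       % start wrong so the loop runs
while any(a1 ~= t1) || any(a2 ~= t2) || any(a3 ~= t3)
    W = 2*rand(3,3) - 1;              % random weights in [-1, 1]
    B = 2*rand(3,1) - 1;              % random biases in [-1, 1]
    a1 = sign(W*p1 + B);              % a single layer stands in for
    a2 = sign(W*p2 + B);              % the whole network here, to
    a3 = sign(W*p3 + B);              % keep the sketch short
end
```

As warned above, a loop like this can run for a long time (or forever) if the random search never finds a working set of weights, so in practice it is worth adding an iteration cap before using it with your own patterns.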