Neural Network
Off campus: 210.70.101.21
On campus: ftp://ai@10.10.101.21:2005
Password: ai201

** The Perceptron **

The perceptron is a program that learns concepts: by repeatedly "studying" examples presented to it, it learns to respond with True (1) or False (0) to the inputs we present.

The perceptron is a single-layer neural network whose weights and biases can be trained to produce the correct target vector when presented with the corresponding input vector. The training technique used is called the perceptron learning rule. The perceptron generated great interest due to its ability to generalize from its training vectors and to work with randomly distributed connections. Perceptrons are especially suited to simple problems in pattern classification.

Our perceptron network consists of a single neuron connected to two inputs through a set of 2 weights, with an additional bias input. The perceptron calculates its output using the following equation:

    P * W + b > 0

where P is the input vector presented to the network, W is the vector of weights and b is the bias. The output is 1 when the inequality holds and 0 otherwise.

The Learning Rule

The perceptron is trained to respond to each input vector with a corresponding target output of either 0 or 1. The learning rule has been proven to converge on a solution in finite time if a solution exists. It can be summarized in the following two equations:

    For all i: W(i) = W(i) + [ T - A ] * P(i)
    b = b + [ T - A ]

where W is the vector of weights, P is the input vector presented to the network, T is the correct output that the neuron should have produced, A is the actual output of the neuron, and b is the bias.

Training

Vectors from a training set are presented to the network one after another. If the network's output is correct, no change is made. Otherwise, the weights and biases are updated using the perceptron learning rule. An entire pass through all of the input training vectors is called an epoch.
When such an entire pass of the training set has occurred without error, training is complete. At this point any input training vector may be presented to the network and it will respond with the correct output vector. If a vector P not in the training set is presented to the network, the network will tend to exhibit generalization by responding with an output similar to the target vectors for input vectors close to the previously unseen input vector P.

Limitations

Perceptron networks have several limitations. First, the output of a perceptron can take only one of two values (True or False). Second, perceptrons can only classify linearly separable sets of vectors. If a straight line or plane can be drawn to separate the input vectors into their correct categories, the input vectors are linearly separable and the perceptron will find the solution. If the vectors are not linearly separable, learning will never reach a point where all vectors are classified properly. The most famous example of the perceptron's inability to solve problems with linearly nonseparable vectors is the boolean exclusive-or problem.

Our Implementation

We implemented a single-neuron perceptron with 2 inputs. The input for the neuron can be taken from a graphical user interface, by clicking on points on a board. A click with the left mouse button generates a '+' sign on the board, marking a point where the perceptron should respond with 'True'. A click with the right mouse button generates a '-' sign on the board, marking a point where the perceptron should respond with 'False'. When enough points have been entered, the user can click on 'Start', which presents these points as inputs to the perceptron, has it learn these input vectors, and shows a line corresponding to the linear division of the plane into regions of opposite neuron response.
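The learning rule and the epoch-based training loop described above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation behind the graphical board: the function names `predict` and `train_perceptron`, the zero initial weights, and the `max_epochs` cap are assumptions made for the example.

```python
def predict(weights, bias, point):
    """Return 1 if P * W + b > 0, else 0."""
    total = sum(w * p for w, p in zip(weights, point)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, max_epochs=100):
    """Train on (point, target) pairs; return (weights, bias, converged).

    max_epochs is an assumed safety cap so that training also stops
    on sets that are not linearly separable.
    """
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs   # assumed starting point: all zeros
    bias = 0.0
    for _ in range(max_epochs):
        errors = 0
        for point, target in samples:
            actual = predict(weights, bias, point)
            delta = target - actual              # the [ T - A ] term
            if delta != 0:
                errors += 1
                # W(i) = W(i) + [ T - A ] * P(i)
                weights = [w + delta * p for w, p in zip(weights, point)]
                # b = b + [ T - A ]
                bias += delta
        if errors == 0:                          # an error-free epoch
            return weights, bias, True
    return weights, bias, False


# Boolean AND is linearly separable; exclusive-or is not.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
xor_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w_and, b_and, and_ok = train_perceptron(and_samples)
w_xor, b_xor, xor_ok = train_perceptron(xor_samples)
```

With these samples, `and_ok` is True because an error-free epoch is reached, while `xor_ok` is False because no line separates the exclusive-or points and the loop runs until the cap.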
Here's an example of a screen shot:

[Figure: screen shot of the perceptron board]

*** Related web pages ***
http://www.imt.ntou.edu.tw/Lab/aiwww/neural.html

** Back propagation **

During training, information is propagated back through the network and used to update the connection weights. How? Different neural network architectures use different algorithms to calculate the weight changes. Backpropagation (BP) is a commonly used (but inefficient) algorithm in multilayer perceptrons (MLPs). We know the errors at the output layer, but not at the hidden-layer elements. BP solves the problem of how to calculate the hidden-layer errors: it propagates the output errors back to the previous layer using the output element weights.

The mathematics of this algorithm is given in several textbooks and on-line tutorials. For a detailed explanation of the back propagation algorithm, see Carling, Alison (1992) Introducing Neural Networks, Wilmslow: Sigma Press, pp. 147-154.

It helps to know some features of BP when training neural networks.

1. Internally, most BP networks work with values between 0 and 1. If your inputs have a different range, NN simulators like Neural Planner will scale each input variable so that its minimum maps to 0 and its maximum to 1.
2. They change the weights each time by some fraction of the change needed to completely correct the error. This fraction, β, is the learning rate.
   a. High learning rates cause the learning algorithm to take large steps on the error surface, with the risk of missing a minimum or of oscillating unstably across the error minimum ('sloshing').
   b. Small steps, from a low learning rate, eventually find a minimum, but they take a long time to get there.
   c. Some NN simulators can be set to reduce the learning rate as the error decreases.
   d. Sloshing can also be reduced by mixing into the weight change a proportion of the last weight change, which smooths out small fluctuations. This proportion is the momentum term.
3. The algorithm finds the nearest local minimum, not always the lowest minimum.
   One solution commonly used in backpropagation is to:
   1. restart learning every so often from a new set of random weights (i.e. somewhere else in the weight space);
   2. find the local minimum from each new start;
   3. keep track of the best minimum found.
4. Overfitting is when the NN learns the specific details of the training set instead of the general pattern found in all present and future data. There can be two causes:
   a. Training for too long. Solution?
      1. Test against a separate test set every so often.
      2. Stop when the results on the test set start getting worse.
   b. Too many hidden nodes.
      One node can model a linear function.
      More nodes can model higher-order functions, or more input patterns.
      Too many nodes model the training set too closely, preventing generalisation.

*** Related web pages ***
http://www.gc.ssr.upm.es/inves/neural/ann1/supmodel/MLP.htm#backprop

** Self-organizing feature map **

The basic idea of the self-organizing feature map (SFM) is to incorporate into the competitive learning rule some degree of sensitivity with respect to the neighborhood or history. This provides a way to avoid totally unlearned neurons, and it helps enhance certain topological properties that should be preserved in the feature mapping.

Suppose that an input pattern has N features and is represented by a vector x in an N-dimensional pattern space. The network maps the input pattern to an output space. The output space is supposed to be a one-dimensional or two-dimensional array of output nodes, which possess a certain topological ordering. The question is how to train a network so that the ordered relationship can be preserved. Kohonen proposed to allow the output nodes to interact laterally, leading to the self-organizing feature map.

The most prominent feature is the concept of excitatory learning within a neighborhood around the winning neuron. The size of the neighborhood decreases with each iteration. The training phase is as follows:

1.
First, a winning neuron is selected as the one with the shortest Euclidean distance between its weight vector and the input vector,

   $$ i^* = \arg\min_i \lVert x - w_i \rVert , $$

   where $w_i$ denotes the weight vector corresponding to the ith output neuron.
2. Let i* denote the index of the winner and let I* denote the set of indices corresponding to a defined neighborhood of the winner i*. Then the weights associated with the winner and its neighboring neurons are updated by

   $$ w_i \leftarrow w_i + \eta \, (x - w_i) $$

   for all indices $i \in I^*$, where $\eta$ is a small positive learning rate. The amount of updating may be weighted according to a preassigned "neighborhood function" $\Lambda(j, i^*)$:

   $$ w_j \leftarrow w_j + \eta \, \Lambda(j, i^*) \, (x - w_j) $$

   for all j. For example, a neighborhood function may be chosen as

   $$ \Lambda(j, i^*) = \exp\!\left( -\frac{\lVert r_j - r_{i^*} \rVert^2}{2\sigma^2} \right) , $$

   where $r_j$ represents the position of neuron j in the output space and $\sigma$ sets the width of the neighborhood. The convergence of the feature map depends on a proper choice of $\eta$. One choice is $\eta = 1/t$, where t is the iteration number. The size of the neighborhood should decrease gradually, as depicted in the next figure:

   [Figure: neighborhood around the winning neuron shrinking over successive iterations]
3. The weight update should be immediately followed by normalization.

In the retrieving phase, all the output neurons calculate the Euclidean distance between their weights and the input vector, and the winning neuron is the one with the shortest distance.

Competitive Learning with History Sensitivity

Incorporating some history/frequency sensitivity into the competitive learning rule provides another way to alleviate the problem of totally unlearned neurons or prejudiced training. There are two approaches:

1. Modulate the selection of a winner by the frequency sensitivity.
2. Modulate the learning rate by the frequency sensitivity.

The rate of training can also be modulated by frequency sensitivity. As an example, we present the following competitive learning rule:

1. Select a winner i* as the neuron with, for example, the smallest Euclidean distance.
2. Update the weights associated with the winner.

This technique is called frequency-sensitive competitive learning, where the learning rate parameter is a function of how frequently the i*-th node is selected as the winner.
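The feature-map training steps and the frequency-sensitive rule above can be sketched as follows. This is only an illustration under stated assumptions: the output space is a 1-D chain in which neuron j sits at position j, the neighborhood is Gaussian, the learning rate and neighborhood width shrink linearly, the normalization step is left out, and the frequency-sensitive rate is taken to be `eta0 / wins`. All function and parameter names (`winner`, `neighborhood`, `train_sfm`, `fscl_step`, `eta0`, `sigma0`) are hypothetical.

```python
import random
from math import exp


def winner(weights, x):
    """Step 1: index of the neuron whose weight vector is closest to x."""
    def sq_dist(w):
        # Squared Euclidean distance gives the same winner as the distance.
        return sum((wk - xk) ** 2 for wk, xk in zip(w, x))
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i]))


def neighborhood(j, i_star, sigma):
    """Assumed Gaussian neighborhood on a 1-D chain of neurons."""
    return exp(-((j - i_star) ** 2) / (2.0 * sigma ** 2))


def train_sfm(data, n_neurons=8, n_iters=500, eta0=0.5, sigma0=2.0, seed=0):
    """Train a 1-D feature map; eta and sigma shrink linearly (assumption)."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_neurons)]
    for t in range(n_iters):
        frac = 1.0 - t / n_iters             # falls from 1 towards 0
        eta = eta0 * frac
        sigma = max(sigma0 * frac, 0.1)      # floor avoids a vanishing width
        x = rng.choice(data)
        i_star = winner(weights, x)
        # Step 2: move every neuron towards x, weighted by the neighborhood.
        for j, w in enumerate(weights):
            step = eta * neighborhood(j, i_star, sigma)
            weights[j] = [wk + step * (xk - wk) for wk, xk in zip(w, x)]
    return weights


def fscl_step(weights, wins, x, eta0=0.5):
    """One frequency-sensitive update: the rate drops as a neuron wins more."""
    i_star = winner(weights, x)
    wins[i_star] += 1
    eta = eta0 / wins[i_star]                # assumed rate schedule
    weights[i_star] = [wk + eta * (xk - wk)
                       for wk, xk in zip(weights[i_star], x)]
    return i_star


# Hypothetical data: 200 random points in the unit square.
data_rng = random.Random(1)
data = [(data_rng.random(), data_rng.random()) for _ in range(200)]
trained = train_sfm(data)
```

Because every update moves a weight vector part of the way towards an input, the trained weights stay inside the range covered by the initial weights and the data. In the retrieving phase, `winner(trained, x)` returns the index of the responding neuron for a new input x.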
*** Related web pages ***
http://citeseer.nj.nec.com/context/109316/0

The end