
Neural Network
Off-campus: 210.70.101.21
On-campus: ftp://ai@10.10.101.21:2005
Password: ai201
** The Perceptron **
The perceptron is a program that learns concepts, i.e. it can learn to respond
with True (1) or False (0) to inputs we present to it, by repeatedly "studying"
examples presented to it.
The Perceptron is a single layer neural network whose weights and biases could be
trained to produce a correct target vector when presented with the corresponding
input vector. The training technique used is called the perceptron learning rule.
The perceptron generated great interest due to its ability to generalize from its
training vectors and work with randomly distributed connections. Perceptrons are
especially suited for simple problems in pattern classification.
Our perceptron network consists of a single neuron connected to two inputs
through a set of 2 weights, with an additional bias input.
The perceptron calculates its output using the following rule: it responds with
1 (True) if

P * W + b > 0

and with 0 (False) otherwise, where P is the input vector presented to the
network, W is the vector of weights, and b is the bias.
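As a minimal sketch (Python, not part of the original page; the weights and bias below are arbitrary illustrative values), the output rule can be written as:

```python
def perceptron_output(P, W, b):
    """Return 1 (True) if the weighted sum P * W + b exceeds zero, else 0 (False)."""
    s = sum(p * w for p, w in zip(P, W)) + b
    return 1 if s > 0 else 0

# Example with arbitrary weights: 1*0.4 + 0.5*(-0.2) + 0.1 = 0.4 > 0 -> 1
print(perceptron_output([1.0, 0.5], [0.4, -0.2], 0.1))  # -> 1
```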
The Learning Rule
The perceptron is trained to respond to each input vector with a corresponding
target output of either 0 or 1. The learning rule has been proven to converge on
a solution in finite time if a solution exists.
The learning rule can be summarized in the following two equations:
For all i:
W(i) = W(i) + [ T - A ] * P(i)
b = b + [ T - A ]
where W is the vector of weights, P is the input vector presented to the
network, T is the correct result that the neuron should have shown, A is the
actual output of the neuron, and b is the bias.
Training
Vectors from a training set are presented to the network one after another. If
the network's output is correct, no change is made. Otherwise, the weights and
biases are updated using the perceptron learning rule. An entire pass through
all of the input training vectors is called an epoch. When such an entire pass of
the training set has occurred without error, training is complete. At this time
any input training vector may be presented to the network and it will respond
with the correct output vector. If a vector P not in the training set is presented
to the network, the network will tend to generalize, responding with an output
similar to the target outputs of the training vectors closest to the new input
vector P.
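The learning rule and the epoch-based training procedure above can be sketched in Python (an illustrative implementation, not the original one; logical AND is used as a stand-in training set because it is linearly separable):

```python
def train_perceptron(samples, epochs=100):
    """Train a 2-input perceptron with the rule W(i) += (T - A) * P(i), b += (T - A).

    samples: list of (input_vector, target) pairs with targets 0 or 1.
    Returns (W, b) once an entire epoch passes without error (or epochs run out).
    """
    W = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for P, T in samples:
            A = 1 if sum(p * w for p, w in zip(P, W)) + b > 0 else 0
            if A != T:
                errors += 1
                for i in range(len(W)):
                    W[i] += (T - A) * P[i]
                b += (T - A)
        if errors == 0:  # an error-free epoch means training is complete
            break
    return W, b

# Logical AND is linearly separable, so the rule converges.
and_set = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
W, b = train_perceptron(and_set)
```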
Limitations
Perceptron networks have several limitations. First, the output values of a
perceptron can take on only one of two values (True or False). Second,
perceptrons can only classify linearly separable sets of vectors. If a straight
line or plane can be drawn to separate the input vectors into their correct
categories, the input vectors are linearly separable and the perceptron will find
the solution. If the vectors are not linearly separable learning will never reach
a point where all vectors are classified properly.
The most famous example of the perceptron's inability to solve problems with
linearly nonseparable vectors is the Boolean exclusive-or (XOR) problem.
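This can be demonstrated with a small sketch (Python, illustrative and not from the original page): because XOR is not linearly separable, running the perceptron learning rule on its truth table never reaches an error-free epoch.

```python
def epoch_errors(samples, epochs=200):
    """Run the perceptron learning rule and return the error count of the final epoch."""
    W, b = [0.0, 0.0], 0.0
    errors = 0
    for _ in range(epochs):
        errors = 0
        for P, T in samples:
            A = 1 if sum(p * w for p, w in zip(P, W)) + b > 0 else 0
            if A != T:
                errors += 1
                W = [w + (T - A) * p for w, p in zip(W, P)]
                b += (T - A)
    return errors

xor_set = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
# The last epoch still contains errors, no matter how long we train.
print(epoch_errors(xor_set))
```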
Our Implementation
We implemented a single-neuron perceptron with 2 inputs. The input for the
neuron can be taken from a graphical user interface, by clicking on points on a
board. A click with the left mouse button generates a '+' sign on the board,
marking that it's a point where the perceptron should respond with 'True'. A
click with the right mouse button generates a '-' sign on the board, marking
that it's a point where the perceptron should respond with 'False'. When
enough points have been entered, the user can click on 'Start', which will
introduce these points as inputs to the perceptron, have it learn these input
vectors and show a line which corresponds to the linear division of the plane
into regions of opposite neuron response.
Here's an example of a screenshot (image not included in this text version):
*** Related pages ***
http://www.imt.ntou.edu.tw/Lab/aiwww/neural.html
Back propagation

During training, information is propagated back through the network and used to
update connection weights. How?

Different neural network architectures use different algorithms to calculate the
weight changes.

Backpropagation (BP) is a commonly used (but relatively inefficient) training algorithm for multilayer perceptrons (MLPs).

We know the errors at the output layer, but not at the hidden layer elements.

BP solves the problem of how to calculate the hidden layer errors (it propagates
the output errors back to the previous layer using the output element weights).
The mathematics of this algorithm are given in several textbooks and on-line tutorials.
For a detailed explanation of the back propagation algorithm, see Carling, Alison
(1992) Introducing Neural Networks, Wilmslow: Sigma Press, pp. 147-154.
It helps to know some features of it when training neural networks.
1. Internally most BP networks work with values between 0 and 1. If your inputs
have a different range, NN simulators like Neural Planner will scale each input
variable so that its minimum becomes 0 and its maximum becomes 1.
2. They change the weights each time by some fraction of the change needed to
completely correct the error. This fraction, β, is the learning rate.
a. High learning rates cause the learning algorithm to take large steps on
the error surface, with the risk of missing a minimum, or unstably
oscillating across the error minimum ('sloshing')
b. Small steps, from a low learning rate, eventually find a minimum, but
they take a long time to get there.
c. Some NN simulators can be set to reduce the learning rate as the error
decreases.
d. Also, sloshing can be reduced by mixing into the weight change a proportion
of the last weight change, so smoothing out small fluctuations. This
proportion is the momentum term.
3. The algorithm finds the nearest local minimum, not always the lowest minimum.
One solution commonly used in backpropagation is to:
1. restart learning every so often from a new set of random weights (i.e.
somewhere else in the weight space).
2. find the local minimum from each new start
3. keep track of the best minimum found
4. Overfitting is when the NN learns the specific details of the training set, instead
of the general pattern found in all present and future data
There can be two causes:
a. Training for too long. Solution?
1. Test against a separate test set every so often.
2. Stop when the results on the test set start getting worse.
b. Too many hidden nodes
- One node can model a linear function
- More nodes can model higher-order functions, or more input patterns
- Too many nodes model the training set too closely, preventing
generalisation.
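Two of the points above, the momentum term and stopping when the test-set error turns up, can be sketched in Python (illustrative only; the curve values and parameter defaults below are invented for the example):

```python
def update_weight(w, gradient, prev_delta, learning_rate=0.1, momentum=0.9):
    """One weight update that mixes in a proportion of the last change (momentum)."""
    delta = -learning_rate * gradient + momentum * prev_delta
    return w + delta, delta

def early_stopping_epoch(test_errors, patience=1):
    """Return the epoch to stop at: when the test-set error starts getting worse."""
    best, best_epoch, bad = float("inf"), 0, 0
    for epoch, err in enumerate(test_errors):
        if err < best:
            best, best_epoch, bad = err, epoch, 0
        else:
            bad += 1
            if bad >= patience:
                break
    return best_epoch

# Momentum smooths successive updates (here prev_delta is 0, so delta = -0.1 * 0.2):
w, prev = update_weight(0.5, gradient=0.2, prev_delta=0.0)
# A simulated test-error curve that turns up as the network starts to overfit:
stop = early_stopping_epoch([0.9, 0.6, 0.4, 0.35, 0.4, 0.5])  # stops at the minimum, epoch 3
```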
*** Related pages ***
http://www.gc.ssr.upm.es/inves/neural/ann1/supmodel/MLP.htm#backprop
Self-organizing feature map
The basic idea of the self-organizing feature map (SFM) is to incorporate into the
competitive learning rule some degree of sensitivity with respect to the
neighborhood or history. This provides a way to avoid totally unlearned neurons,
and it helps enhance certain topological properties which should be preserved in
the feature mapping.
Suppose that an input pattern has N features and is represented by a vector x in an
N-dimensional pattern space. The network maps the input pattern to an output space. The
output space is supposed to be a one-dimensional or two-dimensional array of output nodes,
which possess a certain topological ordering. The question is how to train a network so
that the ordered relationship can be preserved. Kohonen proposed to allow the output nodes
to interact laterally, leading to the self-organizing feature map.
The most prominent feature is the concept of excitatory learning within a neighborhood
around the winning neuron. The size of the neighborhood decreases with each iteration.
The training phase is provided here:
1. First a winning neuron is selected as the one with the shortest Euclidean
distance ||x - w_i|| between its weight vector and the input vector, where w_i
denotes the weight vector corresponding to the i-th output neuron.
2. Let i* denote the index of the winner and let I* denote a set of indexes
corresponding to a defined neighborhood of winner i*. Then the weights
associated with the winner and its neighboring neurons are updated by

w_j = w_j + n * (x - w_j)

for all the indices j in I*, where n is a small positive learning rate. The amount of
updating may be weighted according to a preassigned "neighborhood function"
L(j, i*):

w_j = w_j + n * L(j, i*) * (x - w_j)

for all j. For example, a neighborhood function may be chosen as the Gaussian

L(j, i*) = exp( -||r_j - r_i*||^2 / (2 * s^2) )

where r_j represents the position of neuron j in the output space. The
convergence of the feature map depends on a proper choice of this function; one
common choice is to let its width s decrease over time. The size of the
neighborhood should decrease gradually with each iteration (figure omitted).
3. The weight update should be immediately followed by normalization.
In the retrieving phase, all the output neurons calculate the Euclidean distance
between the weights and the input vector and the winning neuron is the one with the
shortest distance.
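A minimal sketch of one training step (Python, illustrative only; the weights, positions, and rate values are invented, and the normalization step is omitted):

```python
import math

def som_step(weights, positions, x, rate=0.1, sigma=1.0):
    """One self-organizing feature map update on a 1-D array of output neurons.

    weights:   list of weight vectors, one per output neuron
    positions: position of each neuron in the output space
    x:         input vector
    """
    # 1. Winner: neuron whose weight vector is closest (Euclidean) to x.
    dists = [math.dist(w, x) for w in weights]
    winner = dists.index(min(dists))
    # 2. Update winner and neighbors, weighted by a Gaussian neighborhood function.
    for j, w in enumerate(weights):
        h = math.exp(-((positions[j] - positions[winner]) ** 2) / (2 * sigma ** 2))
        weights[j] = [wi + rate * h * (xi - wi) for wi, xi in zip(w, x)]
    return winner

# Three output neurons on a line, two-feature inputs (illustrative values).
weights = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]]
winner = som_step(weights, positions=[0, 1, 2], x=[1.0, 0.0])
```

In the retrieving phase the same winner computation is used: the output is simply the neuron with the shortest distance to the input.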
Competitive Learning with History Sensitivity
Incorporating some history/frequency sensitivity into the competitive learning rule
provides another way to alleviate the problem of totally unlearned neurons or
prejudiced training. There are two approaches:
1. Modulate the selection of a winner by the frequency sensitivity.
2. Modulate the learning rate by the frequency sensitivity.
The rate of training can also be modulated by frequency sensitivity. As an example,
we present the following competitive learning rule:
1. Select a winner i* as the neuron with, for example, the smallest Euclidean
distance.
2. Update the weights associated with the winner.
This technique is called frequency-sensitive competitive learning, where the
learning rate is a function of how frequently the i*-th node has been selected
as the winner.
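One way to realize this (a sketch in Python; the shrinking-rate formula and values below are illustrative assumptions, not the page's own): divide a base learning rate by the neuron's win count, so frequent winners move less and less.

```python
import math

def fscl_step(weights, counts, x, base_rate=0.5):
    """One frequency-sensitive competitive learning update.

    The more often a neuron has won, the smaller its learning rate, so
    frequently-winning neurons stop monopolizing the training.
    """
    # Winner by smallest Euclidean distance (selection could also be biased by counts).
    winner = min(range(len(weights)), key=lambda i: math.dist(weights[i], x))
    counts[winner] += 1
    rate = base_rate / counts[winner]  # rate falls as the win count grows
    weights[winner] = [w + rate * (xi - w) for w, xi in zip(weights[winner], x)]
    return winner

weights = [[0.0, 0.0], [1.0, 1.0]]
counts = [0, 0]
w = fscl_step(weights, counts, [0.9, 0.8])
```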
*** Related pages ***
http://citeseer.nj.nec.com/context/109316/0
………The end …….