Lecture 09: Artificial Intelligence - The Perceptron

In 1957, Frank Rosenblatt, a scientist at Cornell University, created
an artificial neural network consisting of a single neuron, with
multiple inputs and one output. He christened it the perceptron.
The weighted inputs (meaning input × weight) and the bias are
summed at the node. This sum is then passed through an activation
function (i.e. function(sum)) to get the final output. The bias is a
way to control the amount of stimulation needed before the neuron
gets excited. And if you read the first post about neural
networks, you'd know that learning is possible because of the
strength of the synaptic connections. The weights of each input are
precisely that - the strength of that input.
This perceptron has two very important uses - the classification of
linearly separable patterns, and the fitting of data to a linear function.
Pattern Classification
I shall illustrate this using the example of a pattern with 2
dimensions (i.e. two inputs), but note that the perceptron can extend
to many inputs. In my example, we want to classify someone as
being male or female, given two pieces of information, namely
whether or not the person is wearing a skirt and whether or not the
person is carrying a handbag. Let's just assume that we're still living
in the '50s, so anyone wearing a skirt OR carrying a handbag is
female, and everyone else is male. Here is a truth table to
summarise these conditions (1 = yes/female, 0 = no/male):

    skirt   handbag   output
      0        0        0 (male)
      0        1        1 (female)
      1        0        1 (female)
      1        1        1 (female)
What we want to do is give the perceptron the two inputs (skirt and
handbag) and let it tell us whether the person is male (0) or female
(1). I've no choice but to get mathematically involved now.
To calculate the weighted sum at the node, we have:

    sum = (skirt × w1) + (handbag × w2) + bias
The activation function is there to format the output to either 0 or 1
(i.e. to prevent getting outputs like 5.2324 etc.). The perceptron uses
a hard-limiter as the activation function, which works like this:

    output = 1 if sum ≥ 0
    output = 0 if sum < 0
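In Python, that forward computation might look like this (a minimal
sketch; the function names weighted_sum and hardlim are my own, not
from any library):

    def weighted_sum(inputs, weights, bias):
        # sum of (input x weight) for every input, plus the bias
        return sum(x * w for x, w in zip(inputs, weights)) + bias

    def hardlim(s):
        # hard-limiter: clamp any sum to a clean 0 or 1
        return 1 if s >= 0 else 0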
The only thing left to do now is to let the neural network learn what
weights and bias to use. Luckily, I prepared a batch earlier today
(actually it was prepared by the master chef, so that it would look
nicer than the one I prepared on camera, and also so I didn't have to
come in to work so early), so we don't have to wait for the baking to
be done.
So the bias is -1 and both weights are 1.
I can show you that the above values work by testing all four inputs to
see if we get the correct output:

    skirt=0, handbag=0:  sum = 0 + 0 − 1 = −1  →  output 0 (male)
    skirt=0, handbag=1:  sum = 0 + 1 − 1 =  0  →  output 1 (female)
    skirt=1, handbag=0:  sum = 1 + 0 − 1 =  0  →  output 1 (female)
    skirt=1, handbag=1:  sum = 1 + 1 − 1 =  1  →  output 1 (female)
So the neuron that can tell us whether the person is a guy or gal is:

    output = hardlim((skirt × 1) + (handbag × 1) − 1)
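As a quick sanity check, here's a sketch in Python that runs all four
input combinations through this neuron (the weights and bias are the
values above):

    def classify(skirt, handbag):
        s = (skirt * 1) + (handbag * 1) - 1   # weighted sum plus the bias of -1
        return 1 if s >= 0 else 0             # hard-limiter: 1 = female, 0 = male

    for skirt in (0, 1):
        for handbag in (0, 1):
            print(skirt, handbag, "->", classify(skirt, handbag))
    # Prints 0 for (0, 0) and 1 for the other three cases,
    # matching the truth table.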
Training the Neuron
OK, how did the master chef get the weights and the bias? He was
kind enough to come all the way back from home to tell us his
secret procedure, because he works for us.
Hi. Uhh... yeah, OK, so here's how to train the neuron to classify your
pattern, whatever. First you just randomly choose the weights and
bias. Then you create a set of training data, feed it to the neuron,
and if he gives the wrong output, you tell him it's wrong and he'll
adjust the weights and try the training data again.
The training data we use for the example above is:

    inputs (skirt, handbag)   desired output
          (0, 0)                    0
          (0, 1)                    1
          (1, 0)                    1
          (1, 1)                    1

Yeah, I know it's the same as the actual data. That's because there
aren't many patterns to choose from in this example.
First, think of the bias as an input of 1 with an unknown weight. That
means the structure of the neuron is like this:

    sum = (1 × w0) + (skirt × w1) + (handbag × w2)

where the weight w0 plays the role of the bias.
Now, while training, if you get the correct output, don't adjust the
weights. Otherwise, if you get the wrong output, adjust every weight
according to this rule:

    new weight = old weight + (desired output − actual output) × input
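As a sketch in Python (I'm treating the learning rate as 1, which is
what the worked example below effectively uses):

    def adjust(weights, inputs, desired, actual):
        # perceptron learning rule:
        # new weight = old weight + (desired - actual) * input
        # weights[0] is the bias weight; its input is always 1
        error = desired - actual
        return [w + error * x for w, x in zip(weights, inputs)]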
So we randomly choose our weights to be all 1, i.e. [w0 w1 w2] = [1 1 1].
Let's try it with one round of training data:

    inputs [1 skirt handbag]   weights    sum   output   desired   new weights
    [1 0 0]                    [ 1 1 1]    1      1         0        [ 0 1 1]
    [1 0 1]                    [ 0 1 1]    1      1         1        no change
    [1 1 0]                    [ 0 1 1]    1      1         1        no change
    [1 1 1]                    [ 0 1 1]    2      1         1        no change
Each round of training data is called an epoch, so we have come to
the end of our first epoch. The weights seem to have settled quite
nicely to values of [0 1 1]. But we're still not sure whether these
weights work for the first set of training data (0 skirt, 0 handbag),
because that is the case that gave the wrong output. So we try a second
epoch:

    inputs [1 skirt handbag]   weights    sum   output   desired   new weights
    [1 0 0]                    [ 0 1 1]    0      1         0        [-1 1 1]
    [1 0 1]                    [-1 1 1]    0      1         1        no change
    [1 1 0]                    [-1 1 1]    0      1         1        no change
    [1 1 1]                    [-1 1 1]    1      1         1        no change
During the second epoch, we again had to adjust the weights, so we
still can't be sure they work for all the training data. We've got to
go for a third epoch:

    inputs [1 skirt handbag]   weights    sum   output   desired   adjustment
    [1 0 0]                    [-1 1 1]   -1      0         0       none
    [1 0 1]                    [-1 1 1]    0      1         1       none
    [1 1 0]                    [-1 1 1]    0      1         1       none
    [1 1 1]                    [-1 1 1]    1      1         1       none
As you can see, the weights we arrived at during the 2nd epoch work
for all cases during the 3rd epoch. Thus, we can be sure that
these weights are correct, i.e. the weights have converged.
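Putting the pieces together, here is a sketch of the whole training
procedure in Python; starting from weights of all 1 it reproduces the
run above, converging to [-1 1 1] (i.e. bias -1 and both weights 1):

    # Train on the skirt/handbag data until an epoch passes with no adjustments.
    training_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
    weights = [1, 1, 1]                      # [bias weight, skirt weight, handbag weight]

    converged = False
    while not converged:
        converged = True                     # assume a clean epoch until proven wrong
        for (x1, x2), desired in training_data:
            inputs = (1, x1, x2)             # the bias is an input of 1
            s = sum(w * x for w, x in zip(weights, inputs))
            actual = 1 if s >= 0 else 0      # hard-limiter
            if actual != desired:
                error = desired - actual
                weights = [w + error * x for w, x in zip(weights, inputs)]
                converged = False            # had to adjust, so run another epoch

    print(weights)                           # [-1, 1, 1]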
And that is how you train a perceptron. Can I go now? k thanks
Linear Data Fitting
The fitting of data points to a linear function is another use of the
perceptron. If you studied science in secondary school, you'd know
about this. The teacher tells you to do an experiment, record the
data, then plot the graph. You plot the graph and find yourself with
some data points that seem to be scattered randomly. Then you're told
to draw a "best-fit" line to approximate all the points, and then to
get the equation of the line (y = mx + c).
So we have some data points (input) and we want to get the
equation of the line (y = mx + c). The neuron model we'd use for this
is a single neuron with one input, x, plus the bias (again treated as
an input of 1). Notice that this neuron has no activation function, so
the weighted sum at the node leads straight to the output. That is to
say:

    output = (x × w) + c

which is exactly the form y = mx + c, with the weight w playing the
role of m and the bias playing the role of c.
So all you need to get the line equation is w and c, innit? And you
know how to do that, innit? The master chef taught you already: just
train the neuron with the raw data points. How to adjust the
weights? Same rule as before:

    new weight = old weight + (desired output − actual output) × input
The desired output in this case is the y-coordinate of the data point.
Will the weights converge? Most likely they never will, because they
can only converge if your data points really do lie exactly on a line.
So you have to decide how many epochs to train the neuron for. The
more you train, the better the fitting line gets, supposedly.
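Here's a sketch of the fitting procedure in Python. The data points
are made up for illustration (roughly y = 2x + 1 with some noise), and
the learning rate of 0.01 and the 1000 epochs are my own arbitrary
choices; the raw update rule above can overshoot when the outputs
aren't just 0 or 1, so scaling it by a small learning rate is the
usual fix:

    # Fit y = w*x + c to some data points with the same update rule.
    points = [(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8), (4, 9.1)]

    w, c = 0.0, 0.0
    rate = 0.01                              # learning rate (my assumption)
    for epoch in range(1000):                # you decide how long to train
        for x, desired in points:
            actual = w * x + c               # no activation function here
            error = desired - actual
            w += rate * error * x            # adjust the weight for x...
            c += rate * error                # ...and the bias (its input is 1)

    print(w, c)                              # roughly 2 and 1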
Death
There is a flaw in the perceptron - it can only classify linearly
separable patterns, and as a result, it can only fit data to straight
lines (no curves).
Linearly separable patterns are patterns that can be classified into
two classes that can be separated by a line (or for 3D patterns, a
plane, or for higher-dimensional patterns, a hyperplane).
The separating line (or plane) is called the decision boundary. The
equation of the decision boundary is of the form:

    (w1 × x1) + (w2 × x2) + ... + bias = 0

If there are only two inputs, it will be a line (rearrange the
equation and you get the familiar form y = mx + c).
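For example, taking the guy/gal classifier from earlier (weights 1 and
1, bias -1), the decision boundary is:

    (1 × x1) + (1 × x2) − 1 = 0
    i.e.  x2 = −x1 + 1

a straight line with slope −1 and intercept 1. Inputs on or above the
line make the weighted sum ≥ 0 and get classified as 1 (female);
inputs below it get classified as 0 (male).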
As it turns out, there are tons of patterns that are not linearly
separable, like the exclusive or (XOR) truth table:

    x1   x2   output
    0    0      0
    0    1      1
    1    0      1
    1    1      0

The data points sit at the four corners of a square, with the two
1-outputs at opposite corners, (0, 1) and (1, 0), and the two
0-outputs at the other pair, (0, 0) and (1, 1). There is no way to
separate the two output values with a single decision boundary line,
and thus the perceptron fails.
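You can watch the failure happen with a small sketch in Python (the
same training loop as before, capped at an arbitrary 100 epochs
because convergence never comes):

    # Try to train the perceptron on XOR: the weights never converge.
    xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    weights = [1, 1, 1]

    for epoch in range(100):                 # cap the training; XOR never converges
        mistakes = 0
        for (x1, x2), desired in xor_data:
            inputs = (1, x1, x2)
            s = sum(w * x for w, x in zip(weights, inputs))
            actual = 1 if s >= 0 else 0
            if actual != desired:
                error = desired - actual
                weights = [w + error * x for w, x in zip(weights, inputs)]
                mistakes += 1
        if mistakes == 0:
            break                            # never happens for XOR

    print(epoch, mistakes)                   # every epoch still has at least one mistake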
That wasn't the end of the perceptron, however...