In 1957, Frank Rosenblatt, a scientist at Cornell University, created an artificial neural network consisting of a single neuron with multiple inputs and one output. He christened it the perceptron.

The weighted inputs (meaning input × weight) and the bias are summed at the node. This sum is then passed through an activation function (i.e. function(sum)) to get the final output. The bias is a way to control the amount of stimulation needed before the neuron gets excited. And if you read the first post about neural networks, you'd know that learning is possible because of the strength of the synaptic connections. The weights of the inputs are precisely that - the strength of each input.

The perceptron has two very important uses - classification of linearly separable patterns, and fitting of data to a linear function.

Pattern Classification

I shall illustrate this using the example of a pattern with 2 dimensions (i.e. two inputs), but note that the perceptron can extend to many inputs. In my example, we want to classify someone as male or female, given two pieces of information: whether or not the person is wearing a skirt, and whether or not the person is carrying a handbag. Let's just assume that we're still living in the 50's, so anyone wearing a skirt OR carrying a handbag is female, and everyone else is male. Here is a truth table to summarise these conditions:

skirt | handbag | output (0 = male, 1 = female)
  0   |    0    |   0
  0   |    1    |   1
  1   |    0    |   1
  1   |    1    |   1

What we want to do is give the perceptron the two inputs (skirt and handbag) and let it tell us whether the person is male or female (0 or 1). I've no choice but to get mathematically involved now. To calculate the weighted sum at the node, we have:

sum = (skirt × weight1) + (handbag × weight2) + bias

The activation function is there to format the output to either 0 or 1 (i.e. to prevent getting outputs like 5.2324 etc.). The perceptron uses a hard limiter as the activation function, which goes like this:

output = 1 if sum >= 0
output = 0 if sum < 0

The only thing left to do now is to let the neural network learn what weights and bias to use. Luckily, I prepared a batch earlier today (actually it was prepared by the master chef so that it will look nicer than the one I prepared on camera, and also so I didn't have to come in to work so early), so we don't have to wait for the baking to be done. The bias is -1 and both weights are 1. I can show you that these values work by testing our inputs to see if we get the correct output:

skirt 0, handbag 0: sum = 0 + 0 - 1 = -1, output 0 (male) - correct
skirt 0, handbag 1: sum = 0 + 1 - 1 = 0, output 1 (female) - correct
skirt 1, handbag 0: sum = 1 + 0 - 1 = 0, output 1 (female) - correct
skirt 1, handbag 1: sum = 1 + 1 - 1 = 1, output 1 (female) - correct

So the neuron that can tell us whether the person is a guy or gal is one with a weight of 1 on each input, a bias of -1, and a hard limiter at the output.

Training the Neuron

Ok, how did the master chef get the weights and the bias? He was kind enough to come all the way back from home to tell us his secret procedure, because he works for us.

Hi. Uhh... yeah ok, so here's how to train the neuron to classify your pattern whatever. First you just randomly choose the weights and bias. Then you create a set of training data, feed it to the neuron, and if he gives the wrong output, you tell him it's wrong and he'll adjust the weights and try the training data again.

The training data we use for the example above is the same truth table from before. Yeah I know it's the same as the actual data. That's coz there are not many patterns to choose from for this example.

First, think of the bias as an input that is always 1, with an unknown weight. That means the neuron now has three inputs - the bias input (always 1), skirt, and handbag - each with its own weight.

Now, while training, if you get the correct output, don't adjust the weights. Otherwise, if you get the wrong output, you adjust the weights according to this rule:

new weight = old weight + (desired output - actual output) × input

So we randomly choose our weights to be all 1.
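Before we grind through the epochs by hand, here's roughly what that whole procedure looks like in code. This is only a minimal sketch - the function names, the epoch cap and the convergence check are mine, and the adjustment is just the plain rule above (i.e. a step of (desired - actual) × input on each weight), which is also what we'll use in the hand calculations.

# A minimal sketch of perceptron training (names and structure are my own).
# The bias is treated as an extra input that is always 1, as described above.

def hard_limiter(s):
    # Hard limiter: 1 if the weighted sum is >= 0, otherwise 0.
    return 1 if s >= 0 else 0

def train_perceptron(training_data, weights, max_epochs=100):
    # training_data: list of (inputs, desired_output) pairs,
    # where inputs already includes the constant 1 for the bias.
    for epoch in range(max_epochs):
        converged = True
        for inputs, desired in training_data:
            s = sum(w * x for w, x in zip(weights, inputs))
            actual = hard_limiter(s)
            if actual != desired:
                # Wrong output: adjust every weight by (desired - actual) * input.
                weights = [w + (desired - actual) * x for w, x in zip(weights, inputs)]
                converged = False
        if converged:
            break  # a whole epoch with no adjustments: the weights have converged
    return weights

# The skirt/handbag truth table, with the bias input (always 1) stuck in front.
data = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]
print(train_perceptron(data, [1, 1, 1]))   # starts from all-1 weights, ends at [-1, 1, 1]

Now let's do the same thing by hand.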
Let's try it with one round of training data. With weights of [1 1 1] (bias, skirt, handbag):

bias 1, skirt 0, handbag 0: sum = 1, output 1, desired 0 - wrong, so adjust the weights to [0 1 1]
bias 1, skirt 0, handbag 1: sum = 1, output 1, desired 1 - correct
bias 1, skirt 1, handbag 0: sum = 1, output 1, desired 1 - correct
bias 1, skirt 1, handbag 1: sum = 2, output 1, desired 1 - correct

Each round of training data is called an epoch, so we have come to the end of our first epoch. It seems like after one round of training data, the weights have settled quite nicely to values of [0 1 1]. But we're still not sure whether these weights work for the first set of training data (0 skirt, 0 handbag), because that is the case that gave the wrong output. So we try a second epoch:

bias 1, skirt 0, handbag 0: sum = 0, output 1, desired 0 - wrong, so adjust the weights to [-1 1 1]
bias 1, skirt 0, handbag 1: sum = 0, output 1, desired 1 - correct
bias 1, skirt 1, handbag 0: sum = 0, output 1, desired 1 - correct
bias 1, skirt 1, handbag 1: sum = 1, output 1, desired 1 - correct

During the second epoch, we again had to adjust the weights, so we still can't be sure that they work for all the training data. We gotta go for a third epoch:

bias 1, skirt 0, handbag 0: sum = -1, output 0, desired 0 - correct
bias 1, skirt 0, handbag 1: sum = 0, output 1, desired 1 - correct
bias 1, skirt 1, handbag 0: sum = 0, output 1, desired 1 - correct
bias 1, skirt 1, handbag 1: sum = 1, output 1, desired 1 - correct

As you can see, the weights we arrived at during the 2nd epoch work for all cases during the 3rd epoch. Thus, we can be sure that these weights are correct, i.e. the weights have converged. And that is how you train a perceptron. Can I go now? k thanks

Linear Data Fitting

The fitting of data points to a linear function is another use of the perceptron. If you studied science in secondary school, you'd know about this. The teacher tells you to do an experiment, record the data, then plot the graph. You plot the graph and find yourself with some data points that seem to be scattered randomly. Then you're told to draw a "best-fit" line to approximate all the points and then get the equation of the line (y = mx + c).

So we have some data points (input) and we want to get the equation of the line (y = mx + c). The neuron model we'd use for this is a neuron with a single input x, a weight w on that input, and a bias c. Notice that there's no activation function, so the weighted sum at the node leads straight to the output. That is to say:

y = wx + c

So all you need to get the line equation is w and c, innit? And you know how to do that, innit? The master chef taught you already - just train the neuron with the raw data points. How to adjust the weights? Same thing:

new weight = old weight + (desired output - actual output) × input

The desired output in this case is the y-coordinate of the data point. Will the weights converge? Most likely they never will, because they only converge if your data points really do lie exactly on a line. So you have to decide how many epochs to train the neuron for. The more you train, the better fitting line you will get, supposedly. (There's a small code sketch of this at the end of the post if you want to try it yourself.)

Death

There is a flaw in the perceptron - it can only classify linearly separable patterns, and as a result, it can only fit data to straight lines (no curves). Linearly separable patterns are patterns whose two classes can be separated by a line (or, for 3D patterns, a plane; for higher dimensions, a hyperplane). The separating line (or plane) is called the decision boundary. The equation of the decision boundary is of the form:

(input1 × weight1) + (input2 × weight2) + ... + bias = 0

If there are only two inputs, it will be a line (you can see that the equation can be rearranged into the form y = mx + c).

As it turns out, there are tons of patterns that are not linearly separable, like the exclusive or (XOR) truth table:

input 1 | input 2 | output
   0    |    0    |   0
   0    |    1    |   1
   1    |    0    |   1
   1    |    1    |   0

Plot those four points on a plane and the two outputs of 1 sit at opposite corners from the two outputs of 0. There is no way to separate the two output values with a single decision boundary line, and thus the perceptron fails.

That wasn't the end of the perceptron, however...

....................
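As promised, here's a rough sketch of the line-fitting neuron in Python. Everything below is my own illustration - the made-up data points, the epoch count, and especially the small learning rate of 0.01 that scales each adjustment (with a full-sized step per point, the weights tend to blow up on real-valued data, so treat the rate as an assumption on top of the rule above).

# A rough sketch of fitting y = wx + c with the no-activation neuron above.
# The data points, epoch count and learning rate are all made up for illustration.

def fit_line(points, epochs=500, rate=0.01):
    w, c = 1.0, 1.0          # start with arbitrary weights, as before
    for _ in range(epochs):
        for x, desired in points:
            actual = w * x + c        # no activation function here
            error = desired - actual
            w += rate * error * x     # adjust the input weight
            c += rate * error * 1     # the bias's "input" is always 1
    return w, c

# Some noisy points that roughly follow y = 2x + 1.
points = [(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8), (4, 9.1)]
w, c = fit_line(points)
print("y = %.2fx + %.2f" % (w, c))   # should land close to the best-fit line, roughly y = 2x + 1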