University of Babylon / College of IT, S/W Dept.        3rd Class / Applications of AI

Artificial Neural Networks (ANNs)

1. The Nervous System
The human nervous system can be broken down into three stages that may be represented in block diagram form as:

   Stimulus → Receptors → Neural Network → Effectors → Response

The receptors collect information from the environment – e.g. photons on the retina. The effectors generate interactions with the environment – e.g. activate muscles. The flow of information/activation is represented by arrows – feed-forward and feedback. Naturally, in this module we will be primarily concerned with the neural network in the middle.

2. Basic Components of Biological Neurons
1. The majority of neurons encode their activations or outputs as a series of brief electrical pulses (i.e. spikes or action potentials).
2. The neuron's cell body (soma) processes the incoming activations and converts them into output activations.
3. Dendrites are fibres which emanate from the cell body and provide the receptive zones that receive activation from other neurons.
4. Axons are fibres acting as transmission lines that send activation to other neurons.
5. The junctions that allow signal transmission between the axons and dendrites are called synapses. The process of transmission is by diffusion of chemicals called neurotransmitters across the synaptic cleft.

In summary:
■ Dendrites receive activation from other neurons.
■ The soma processes the incoming activations and converts them into output activations.
■ Axons act as transmission lines that send activation to other neurons.
■ Synapses are the junctions that allow signal transmission between the axons and dendrites.
■ The process of transmission is by diffusion of chemicals called neurotransmitters.

3. What are Neural Networks?
1. Neural Networks (NNs) are networks of neurons, for example, as found in real (i.e. biological) brains.
2. Artificial Neurons are crude approximations of the neurons found in brains. They may be physical devices, or purely mathematical constructs.
3. Artificial Neural Networks (ANNs) are networks of Artificial Neurons, and hence constitute crude approximations to parts of real brains. They may be physical devices, or simulated on conventional computers.
4. From a practical point of view, an ANN is just a parallel computational system consisting of many simple processing elements connected together in a specific way in order to perform a particular task.
5. One should never lose sight of how crude the approximations are, and how over-simplified our ANNs are compared to real brains.

4. What are Artificial Neural Networks used for?
As with the field of AI in general, there are two basic goals for neural network research:

Brain modeling: the scientific goal of building models of how real brains work. This can potentially help us understand the nature of human intelligence, formulate better teaching strategies, or better remedial actions for brain-damaged patients.

Artificial system building: the engineering goal of building efficient systems for real-world applications. This may make machines more powerful, relieve humans of tedious tasks, and may even improve upon human performance.

These should not be thought of as competing goals. We often use exactly the same networks and techniques for both. Frequently progress is made when the two approaches are allowed to feed into each other. There are fundamental differences though, e.g. the need for biological plausibility in brain modeling, and the need for computational efficiency in artificial system building.
5. Why are Artificial Neural Networks worth studying?
1. They are extremely powerful computational devices.
2. Massive parallelism makes them very efficient.
3. They can learn and generalize from training data – so there is no need for enormous feats of programming.
4. They are particularly fault tolerant – this is equivalent to the "graceful degradation" found in biological systems.
5. They are very noise tolerant – so they can cope with situations where normal symbolic systems would have difficulty.

6. Architecture of ANNs
1- The Single-Layer Feed-forward Network consists of a single layer of weights, where the inputs are directly connected to the outputs via a series of weights. The synaptic links carrying the weights connect every input to every output, but not the other way round. This is why it is considered a network of the feed-forward type.

2- The Multi-Layer Feed-forward Network, as the name suggests, consists of multiple layers. The architecture of this class of network, besides having the input and the output layers, also has one or more intermediate layers called hidden layers. The computational units of the hidden layers are known as hidden neurons.
- The hidden layer does intermediate computation before directing the input to the output layer.
- The input layer neurons are linked to the hidden layer neurons; the weights on these links are referred to as input-hidden layer weights.
- The hidden layer neurons are likewise linked to the output layer neurons; the weights on these links are referred to as hidden-output layer weights.
- A multi-layer feed-forward network with ℓ input neurons, m1 neurons in the first hidden layer, m2 neurons in the second hidden layer, and n output neurons in the output layer is written as (ℓ – m1 – m2 – n). A network with a single hidden layer of m neurons, for example, has the configuration (ℓ – m – n); a small numerical sketch of such a network follows at the end of this section.

3- The Recurrent Networks differ from the feed-forward architecture: a recurrent network has at least one feedback loop. There can be neurons with self-feedback links, i.e. the output of a neuron is fed back into itself as input.
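To make the (ℓ – m1 – … – n) notation concrete, here is a minimal sketch of one forward pass through a (3 – 4 – 2) feed-forward network. It is only an illustration: the layer sizes, the random weights, the sample input, and the use of a sigmoid (unipolar continuous) activation are assumptions, not values taken from the lecture.

```python
# Minimal sketch of a forward pass through an (l - m - n) feed-forward network.
# Layer sizes, the sigmoid activation and the random weights are illustrative assumptions.
import numpy as np

def sigmoid(net):
    """Unipolar continuous activation."""
    return 1.0 / (1.0 + np.exp(-net))

rng = np.random.default_rng(0)

l, m, n = 3, 4, 2                           # input, hidden and output layer sizes
W_ih = rng.normal(scale=0.5, size=(m, l))   # input-hidden layer weights
W_ho = rng.normal(scale=0.5, size=(n, m))   # hidden-output layer weights

x = np.array([0.5, -1.0, 0.25])             # one input pattern

hidden = sigmoid(W_ih @ x)                  # hidden neurons do the intermediate computation
output = sigmoid(W_ho @ hidden)             # output layer response

print("network configuration:", (l, m, n))
print("output activations  :", output)
```

A recurrent network, by contrast, would feed some of these output (or hidden) activations back in as inputs at the next step through its feedback loops.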
7. Learning in Neural Networks
There are many forms of neural networks. Most operate by passing neural 'activations' through a network of connected neurons. One of the most powerful features of neural networks is their ability to learn and generalize from a set of training data. They adapt the strengths/weights of the connections between neurons so that the final output activations are correct. There are three broad types of learning:
1. Supervised learning (i.e. learning with a teacher)
2. Reinforcement learning (i.e. learning with limited feedback)
3. Unsupervised learning (i.e. learning with no help)
This module will study in some detail the most common learning algorithms for the most common types of neural network.

1- Supervised Learning
- A teacher is present during the learning process and presents the expected output.
- Every input pattern is used to train the network.
- The learning process is based on a comparison between the network's computed output and the correct (expected) output, generating an "error".
- The "error" generated is used to change the network parameters, resulting in improved performance.

2- Unsupervised Learning
- No teacher is present.
- The expected or desired output is not presented to the network.
- The system learns on its own by discovering and adapting to the structural features in the input patterns.

3- Reinforced Learning
- A teacher is present but does not present the expected or desired output; it only indicates whether the computed output is correct or incorrect.
- The information provided helps the network in its learning process.
- A reward is given for a correct computed answer and a penalty for a wrong answer.

Note: Supervised and unsupervised learning are the most popular forms of learning compared to reinforced learning.

8. The McCulloch-Pitts Neuron
This vastly simplified model of real neurons is also known as a Threshold Logic Unit:
1. A set of synapses (i.e. connections) brings in activations from other neurons.
2. A processing unit sums the inputs, and then applies an activation function.
3. An output line transmits the result to other neurons.
Using the above notation, we can now write down a simple equation for the output out of a McCulloch-Pitts neuron as a function of its n inputs in_i:

   out = step( in_1 + in_2 + ... + in_n − θ )

where θ is the neuron's activation threshold. We can easily see that:

   out = 1  if  Σ_i in_i ≥ θ,     out = 0  if  Σ_i in_i < θ.

Note that the McCulloch-Pitts neuron is an extremely simplified model of real biological neurons. Some of its missing features include non-binary inputs and outputs. Nevertheless, McCulloch-Pitts neurons are computationally very powerful: one can show that assemblies of such neurons are capable of universal computation.

9. General Procedure for Building Neural Networks
Formulating neural network solutions for particular problems is a multi-stage process:
1. Understand and specify your problem in terms of inputs and required outputs, e.g. for classification the outputs are the classes, usually represented as binary vectors.
2. Take the simplest form of network you think might be able to solve your problem, e.g. a simple Perceptron.
3. Try to find appropriate connection weights (including neuron thresholds) so that the network produces the right outputs for each input in its training data.
4. Make sure that the network works on its training data, and test its generalization by checking its performance on new testing data.
5. If the network doesn't perform well enough, go back to stage 3 and try harder.
6. If the network still doesn't perform well enough, go back to stage 2 and try harder.
7. If the network still doesn't perform well enough, go back to stage 1 and try harder.
8. Problem solved – move on to the next problem.

10. Artificial Neuron - Basic Elements
A neuron consists of three basic components: weights, a threshold, and a single activation function. In practice, neurons generally do not fire (produce an output) unless their total input goes above a threshold value.

Activation Functions
An activation function f performs a mathematical operation on the neuron's net input to produce its output signal. The activation functions are chosen depending upon the type of problem to be solved by the network. The most common activation functions are:

   f(net) = 2 / (1 + exp(−λ·net)) − 1     and     f(net) = sgn(net) = +1 if net > 0, −1 if net < 0

which are called the bipolar continuous and bipolar binary functions, respectively (λ > 0 sets the steepness of the continuous function). The word "bipolar" is used to point out that both positive and negative responses of neurons are produced for this definition of the activation function. Similarly,

   f(net) = 1 / (1 + exp(−λ·net))     and     f(net) = 1 if net > 0, 0 otherwise

are called the unipolar continuous and unipolar binary functions, respectively.
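As a small illustration of the four activation functions just defined, and of how the hard threshold relates to the McCulloch-Pitts neuron of Section 8, here is a minimal sketch. The steepness value λ = 1, the sample net values, and the choice of threshold θ = 2 are assumptions for illustration only.

```python
# Sketch of the four common activation functions defined above.
# The steepness lam, sample inputs and threshold theta are illustrative assumptions.
import numpy as np

def bipolar_continuous(net, lam=1.0):
    # f(net) = 2 / (1 + exp(-lam*net)) - 1, output in (-1, +1)
    return 2.0 / (1.0 + np.exp(-lam * net)) - 1.0

def bipolar_binary(net):
    # f(net) = +1 if net > 0 else -1
    return np.where(net > 0, 1.0, -1.0)

def unipolar_continuous(net, lam=1.0):
    # f(net) = 1 / (1 + exp(-lam*net)), output in (0, 1)
    return 1.0 / (1.0 + np.exp(-lam * net))

def unipolar_binary(net):
    # f(net) = 1 if net > 0 else 0
    return np.where(net > 0, 1.0, 0.0)

net = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (bipolar_continuous, bipolar_binary, unipolar_continuous, unipolar_binary):
    print(f.__name__, f(net))

# The McCulloch-Pitts neuron of Section 8 uses the same hard-threshold idea:
# with unit weights and threshold theta = 2, the unit computes a logical AND of two binary inputs.
theta = 2
for in1, in2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    out = 1 if in1 + in2 >= theta else 0
    print((in1, in2), "->", out)
```

Lowering the threshold to θ = 1 in the same sketch turns the unit into a logical OR, which is one way to see why assemblies of such threshold units can implement arbitrary logic.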
11. Neural Network Learning Rules
Our focus in this section will be artificial neural network learning rules. A neuron is considered to be an adaptive element: its weights are modifiable depending on the input signal it receives, its output value, and the associated teacher response. In some cases the teacher signal is not available and no error information can be used, so the neuron will modify its weights based only on the input and/or output. This is the case for unsupervised learning.

Let us study the learning of the weight vector w_i, or its components w_ij connecting the j'th input with the i'th neuron. In general, the j'th input can be an output of another neuron or it can be an external input. In each learning step the weight vector increases in proportion to the product of the input x and a learning signal r, i.e.

   Δw_i = c · r(w_i, x, d_i) · x

where c is a positive learning constant. Our discussion in this section will cover single-neuron and single-layer network supervised learning and simple cases of unsupervised learning. Under different learning rules, the form of the neuron's activation function may be different. Note that the threshold parameter may be included in learning as one of the weights; this would require fixing one of the inputs, say x_n. We will assume here that x_n, if fixed, takes the value of −1.

11.1. Hebbian Learning Rule
For the Hebbian learning rule the learning signal is equal simply to the neuron's output (Hebb 1949):

   r = o_i = f(w_i · x),   so that   Δw_i = c · o_i · x   and   Δw_ij = c · o_i · x_j.

This learning rule requires the weights to be initialized at small random values around w_i = 0 prior to learning. The Hebbian learning rule represents purely feedforward, unsupervised learning.

11.2. Perceptron Learning Rule
For the perceptron learning rule, the learning signal is the difference between the desired and the actual neuron's response (Rosenblatt 1958). Thus, learning is supervised and the learning signal is equal to:

   r = d_i − o_i,   where   o_i = sgn(w_i · x),   so that   Δw_i = c · (d_i − o_i) · x.

Example: the perceptron learning rule applied to a single-neuron network, trained on a given set of input training vectors with specified desired responses and initial weights.

11.3. Delta Learning Rule
The delta learning rule is only valid for continuous activation functions, as defined before, and in the supervised training mode. The learning signal for this rule is called delta and is defined as follows:

   r = [d_i − f(w_i · x)] · f′(w_i · x),   so that   Δw_i = c · [d_i − f(net_i)] · f′(net_i) · x.

Example: the delta learning rule applied to the same network; the training input vectors, desired responses, and initial weights are identical to those in the previous example. Delta learning requires that the value f′(net) be computed at each step.

11.4. Widrow-Hoff Learning Rule
The Widrow-Hoff learning rule (Widrow 1962) is applicable for the supervised training of neural networks. It is independent of the activation function of the neurons used, since it minimizes the squared error between the desired output value d_i and the neuron's activation value net_i = w_i · x. The learning signal for this rule is defined as follows:

   r = d_i − w_i · x.

The weight vector increment under this learning rule is

   Δw_i = c · (d_i − w_i · x) · x.
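All four rules above are instances of the general update Δw_i = c · r · x and differ only in the learning signal r. The sketch below performs a single learning step under each rule. The input vector, desired response, initial weights, learning constant, and the choice of the bipolar continuous activation are assumptions for illustration, not the data of the worked examples.

```python
# One weight-update step for the Hebbian, perceptron, delta and Widrow-Hoff rules.
# All four follow the general form  delta_w = c * r * x,  differing only in the
# learning signal r.  Input x, desired response d, initial weights w and the
# learning constant c are illustrative assumptions.
import numpy as np

def f(net, lam=1.0):
    """Bipolar continuous activation."""
    return 2.0 / (1.0 + np.exp(-lam * net)) - 1.0

def f_prime(net, lam=1.0):
    """Its derivative: f'(net) = (lam/2) * (1 - f(net)**2)."""
    return 0.5 * lam * (1.0 - f(net, lam) ** 2)

def sgn(net):
    return 1.0 if net > 0 else -1.0

x = np.array([1.0, -2.0, 0.0, -1.0])   # input (last component could be the fixed -1 input)
w = np.array([1.0, -1.0, 0.0, 0.5])    # initial weight vector
d = -1.0                               # desired response (used by the supervised rules)
c = 0.1                                # learning constant

net = w @ x

r_hebbian     = f(net)                        # unsupervised: r = o_i = f(w.x)
r_perceptron  = d - sgn(net)                  # supervised, binary output: r = d_i - o_i
r_delta       = (d - f(net)) * f_prime(net)   # supervised, continuous output
r_widrow_hoff = d - net                       # supervised, independent of the activation

for name, r in [("Hebbian", r_hebbian), ("Perceptron", r_perceptron),
                ("Delta", r_delta), ("Widrow-Hoff", r_widrow_hoff)]:
    print(f"{name:12s} r = {r:+.4f}   delta_w = {c * r * x}")
```

Training a network simply repeats such steps over the whole set of training vectors until, for the supervised rules, the error becomes acceptably small.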
11.5. Correlation Learning Rule
By substituting r = d_i into the general learning rule we obtain the correlation learning rule. The adjustments for the weight vector and for the single weights, respectively, are

   Δw_i = c · d_i · x   and   Δw_ij = c · d_i · x_j.

11.6. Winner-Take-All Learning Rule
This learning rule differs substantially from any of the rules discussed so far in this section. It can only be demonstrated and explained for an ensemble of neurons, preferably arranged in a layer of p units. This rule is an example of competitive learning, and it is used for unsupervised network training. Typically, winner-take-all learning is used for learning the statistical properties of inputs (Hecht-Nielsen 1987). Learning is based on the premise that one of the neurons in the layer, say the m'th, has the maximum response to the input x; this neuron is declared the winner. As a result of this winning event, the weight vector w_m of the winning neuron is the only one adjusted in the given unsupervised learning step. Its increment is computed as follows:

   Δw_m = α · (x − w_m),   where α > 0 is a small learning constant.

The winner selection is based on the following criterion of maximum activation among all p neurons participating in the competition:

   w_m · x = max over i = 1, ..., p of (w_i · x).

11.7. Outstar Learning Rule
The outstar learning rule is another learning rule that is best explained when neurons are arranged in a layer. This rule is designed to produce a desired response d of the layer of p neurons (Grossberg 1974, 1982). The rule is used to provide learning of repetitive and characteristic properties of input/output relationships. This rule is concerned with supervised learning; however, it is supposed to allow the network to extract statistical properties of the input and output signals. The weight adjustments in this rule are computed as follows:

   Δw_j = β · (d − w_j),

where w_j is the vector of weights fanning out from the j'th input node to the layer of p neurons, and β > 0 is a small learning constant.

Exercise: analyze a two-layer feedforward network whose neurons use the bipolar binary activation function defined earlier. The purpose is to find the output o5 for a given network and input pattern.

12. Applications of Neural Networks
Applications can be grouped into the following categories:

Clustering: a clustering algorithm explores the similarity between patterns and places similar patterns in a cluster. Best known applications include data compression and data mining.

Classification/Pattern recognition: the task of pattern recognition is to assign an input pattern (such as a handwritten symbol) to one of many classes. This category includes algorithmic implementations such as associative memory.

Function approximation: the task of function approximation is to find an estimate of an unknown function subject to noise. Various engineering and scientific disciplines require function approximation.

Prediction systems: the task is to forecast some future values of time-sequenced data. Prediction has a significant impact on decision support systems. Prediction differs from function approximation by considering the time factor: the system may be dynamic and may produce different results for the same input data based on the system state (time).
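The clustering category above is commonly realized with the winner-take-all rule of Section 11.6. Below is a minimal sketch of that idea; the two-dimensional data, the number of competing units p, the learning constant α, and the normalization step are illustrative assumptions rather than part of the lecture.

```python
# Competitive (winner-take-all) learning used as a simple clustering procedure.
# The data points, number of cluster units p and learning constant alpha are
# illustrative assumptions.  Inputs and weight vectors are kept normalized so
# that the "maximum activation w.x" winner criterion behaves sensibly.
import numpy as np

rng = np.random.default_rng(1)

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Two rough groups of 2-D points (assumed data).
data = normalize(np.vstack([
    rng.normal([ 2.0, 2.0], 0.3, size=(20, 2)),
    rng.normal([-2.0, 1.0], 0.3, size=(20, 2)),
]))

p = 2                                    # number of competing neurons (clusters)
alpha = 0.2                              # learning constant
W = normalize(rng.normal(size=(p, 2)))   # one weight vector per neuron

for epoch in range(10):
    for x in rng.permutation(data):
        m = int(np.argmax(W @ x))        # winner: maximum activation w_i . x
        W[m] += alpha * (x - W[m])       # only the winner's weights are adjusted
        W[m] = normalize(W[m])           # renormalize the winning weight vector

print("learned weight vectors (cluster prototypes):")
print(W)
```

After training, each weight vector acts as a prototype of one cluster, and a new pattern can be assigned to the cluster whose neuron wins the competition.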