University of Babylon/College of IT S/W Dept.
3rd Class/Applications of AI
Artificial Neural Networks (ANNs)
1. The Nervous System
The human nervous system can be broken down into three stages, which may be represented in block-diagram form as: receptors → neural network → effectors.
The receptors collect information from the environment – e.g. photons on
the retina.
The effectors generate interactions with the environment – e.g. activate
muscles.
The flow of information/activation is represented by arrows – feed-forward and feedback.
Naturally, in this module we will be primarily concerned with the neural
network in the middle.
2. Basic Components of Biological Neurons
1. The majority of neurons encode their activations or outputs as a series
of brief electrical pulses (i.e. spikes or action potentials).
2. The neuron’s cell body (soma) processes the incoming activations and
converts them into output activations.
3. Dendrites are fibres which emanate from the cell body and provide the
receptive zones that receive activation from other neurons.
4. Axons are fibres acting as transmission lines that send activation to
other neurons.
5. The junctions that allow signal transmission between the axons and dendrites are called synapses. The process of transmission is by diffusion of chemicals called neurotransmitters across the synaptic cleft.
■ Dendrites receive activation from other neurons.
■ Soma processes the incoming activations and converts them into output
activations.
■ Axons act as transmission lines to send activation to other neurons.
■ Synapses are the junctions that allow signal transmission between the axons and dendrites.
■ The process of transmission is by diffusion of chemicals called neurotransmitters.
3. What are Neural Networks?
1. Neural Networks (NNs) are networks of neurons, for example, as
found in real (i.e. biological) brains.
2. Artificial Neurons are crude approximations of the neurons found in
brains. They may be physical devices, or purely mathematical constructs.
3. Artificial Neural Networks (ANNs) are networks of Artificial
Neurons,
and hence constitute crude approximations to parts of real brains. They
may be physical devices, or simulated on conventional computers.
4. From a practical point of view, an ANN is just a parallel computational
system consisting of many simple processing elements connected
together
in a specific way in order to perform a particular task.
5. One should never lose sight of how crude the approximations are, and
how over-simplified our ANNs are compared to real brains.
4. What are Artificial Neural Networks used for?
As with the field of AI in general, there are two basic goals for neural
network research:
Brain modeling: The scientific goal of building models of how real brains work. This can potentially help us understand the nature of human intelligence, formulate better teaching strategies, or design better remedial actions for brain-damaged patients.
Artificial System Building: The engineering goal of building efficient
systems for real world applications. This may make machines more
powerful, relieve humans of tedious tasks, and may even improve upon
human performance. These should not be thought of as competing goals.
We often use exactly the same networks and techniques for both.
Frequently progress is made when the two approaches are allowed to feed
into each other. There are fundamental differences though, e.g. the need
for biological plausibility in brain modeling, and the need for
computational efficiency in artificial system building.
5. Why are Artificial Neural Networks worth studying?
1. They are extremely powerful computational devices.
2. Massive parallelism makes them very efficient.
3. They can learn and generalize from training data – so there is no need
for enormous feats of programming.
4. They are particularly fault tolerant – this is equivalent to the “graceful
degradation” found in biological systems.
5. They are very noise tolerant – so they can cope with situations where
normal symbolic systems would have difficulty.
6. Architecture of ANNs
1-The Single Layer Feed-forward Network consists of a single layer of weights, where the inputs are directly connected to the outputs via a series of weights. The synaptic links carrying weights connect every input to every output, but not vice versa. For this reason it is considered a feed-forward type of network.
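As a minimal illustration in Python (the weight values, input vector, and step activation below are arbitrary assumptions, not values from the lecture), a single-layer feed-forward network maps its inputs directly to its outputs through one weight matrix:

import numpy as np

# Hypothetical single-layer feed-forward network: 3 inputs, 2 outputs.
# Every input is connected to every output through the weight matrix W.
W = np.array([[0.2, -0.5, 0.1],    # weights into output neuron 1
              [0.7,  0.3, -0.2]])  # weights into output neuron 2

def single_layer_forward(x):
    net = W @ x                      # net input of each output neuron
    return np.where(net >= 0, 1, 0)  # simple step activation

x = np.array([1.0, 0.5, -1.0])
print(single_layer_forward(x))       # prints [0 1] for this particular input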
2-Multi Layer Feed-forward Network
As the name suggests, it consists of multiple layers. The architecture of this class of network, besides having the input and the output layers, also has one or more intermediary layers called hidden layers. The computational units of a hidden layer are known as hidden neurons.
- The hidden layer does intermediate computation before directing the input to the output layer.
- The input layer neurons are linked to the hidden layer neurons; the weights on these links are referred to as input-hidden layer weights.
- The hidden layer neurons are linked to the output layer neurons; the weights on these links are referred to as hidden-output layer weights.
- A multi-layer feed-forward network with ℓ input neurons, m1 neurons in the first hidden layer, m2 neurons in the second hidden layer, and n output neurons in the output layer is written as (ℓ - m1 - m2 - n).
The figure above illustrates a multi-layer feed-forward network with a configuration (ℓ - m - n).
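A minimal Python sketch of such a network follows; the (3 - 4 - 2) configuration, the random weights, and the sigmoid activation are assumptions made only for illustration:

import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

# Hypothetical (l - m - n) = (3 - 4 - 2) configuration.
W_input_hidden = np.random.randn(4, 3)    # input-hidden layer weights
W_hidden_output = np.random.randn(2, 4)   # hidden-output layer weights

def forward(x):
    hidden = sigmoid(W_input_hidden @ x)      # intermediate computation in the hidden layer
    return sigmoid(W_hidden_output @ hidden)  # result passed on to the output layer

print(forward(np.array([1.0, 0.0, -1.0])))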
3-The Recurrent Networks differ from the feed-forward architecture: a recurrent network has at least one feedback loop. There can be neurons with self-feedback links, that is, the output of a neuron is fed back into itself as input.
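A minimal sketch of a single neuron with a self-feedback link; the weights, the tanh activation, and the input sequence are arbitrary assumptions:

import numpy as np

w_in, w_self = 0.8, 0.5   # hypothetical input weight and self-feedback weight

def run(inputs):
    y = 0.0                                  # previous output, initially zero
    for x in inputs:
        y = np.tanh(w_in * x + w_self * y)   # feedback loop: output depends on its own past value
    return y

print(run([1.0, 0.2, -0.5]))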
7. Learning in Neural Networks
There are many forms of neural networks. Most operate by passing neural
‘activations’ through a network of connected neurons.
One of the most powerful features of neural networks is their ability to
learn and generalize from a set of training data. They adapt the
strengths/weights of the connections between neurons so that the final
output activations are correct.
There are three broad types of learning:
1. Supervised Learning (i.e. learning with a teacher)
2. Reinforcement learning (i.e. learning with limited feedback)
3. Unsupervised learning (i.e. learning with no help)
This module will study in some detail the most common learning
algorithms for the most common types of neural network.
1-Supervised Learning
- A teacher is present during the learning process and presents the expected output.
- Every input pattern is used to train the network.
- The learning process is based on comparison between the network's computed output and the correct expected output, generating an "error".
- The "error" generated is used to change the network parameters, which results in improved performance.
2- Unsupervised Learning
- No teacher is present.
- The expected or desired output is not presented to the network.
- The system learns on its own by discovering and adapting to the structural features in the input patterns.
3- Reinforced learning
- A teacher is present but does not present the expected or desired output; it only indicates whether the computed output is correct or incorrect.
- The information provided helps the network in its learning process.
- A reward is given for a correct answer and a penalty for a wrong one.
Note: Supervised and unsupervised learning are the most popular forms of learning compared to reinforced learning.
8. The McCulloch-Pitts Neuron
This vastly simplified model of real neurons is also known as a Threshold Logic Unit:
1. A set of synapses (i.e. connections) brings in activations from other
neurons.
2. A processing unit sums the inputs, and then applies an activation function.
3. An output line transmits the result to other neurons.
Using the above notation, we can now write down a simple equation for the output out of a McCulloch-Pitts neuron as a function of its n inputs in_i:

out = 1 if in_1 + in_2 + ... + in_n >= θ, and out = 0 otherwise,

where θ is the neuron's activation threshold. We can easily see that, with suitable choices of θ, such a unit can implement simple logic functions such as AND and OR.
Note that the McCulloch-Pitts neuron is an extremely simplified model of real biological neurons. Some of its missing features include non-binary inputs and outputs. Nevertheless, McCulloch-Pitts neurons are computationally very powerful: one can show that assemblies of such neurons are capable of universal computation.
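As a direct illustration of the threshold logic unit defined above, a short Python sketch (the example inputs and thresholds are arbitrary choices):

def mcculloch_pitts(inputs, theta):
    # Sum the n binary inputs and compare the sum with the activation threshold theta.
    return 1 if sum(inputs) >= theta else 0

# With suitable thresholds the same unit acts as different logic gates:
print(mcculloch_pitts([1, 1], theta=2))  # behaves like AND of two inputs -> 1
print(mcculloch_pitts([0, 1], theta=1))  # behaves like OR of two inputs  -> 1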
9. General Procedure for Building Neural Networks
Formulating neural network solutions for particular problems is a multistage process:
1. Understand and specify your problem in terms of inputs and required
outputs, e.g. for classification the outputs are the classes usually
represented as binary vectors.
2. Take the simplest form of network you think might be able to solve
your problem, e.g. a simple Perceptron.
3. Try to find appropriate connection weights (including neuron
thresholds) so that the network produces the right outputs for each input
in its training data.
4. Make sure that the network works on its training data, and test its
generalization by checking its performance on new testing data.
5. If the network doesn’t perform well enough, go back to stage 3 and try
harder.
6. If the network still doesn’t perform well enough, go back to stage 2 and
try harder.
7. If the network still doesn’t perform well enough, go back to stage 1 and
try harder.
8. Problem solved – move on to next problem.
10. Artificial Neuron - Basic Elements
A neuron consists of three basic components: weights, thresholds, and a single activation function.
In practice, neurons generally do not fire (produce an output) unless their
total input goes above a threshold value.
Activation Functions
An activation function f performs a mathematical operation on the neuron's net input to produce its output signal. The activation functions are chosen depending on the type of problem to be solved by the network. The most common activation functions are described below.
Two of these are called the bipolar continuous and bipolar binary functions, respectively. The word "bipolar" is used to point out that both positive and negative responses of neurons are produced by this definition of the activation function.
The corresponding functions whose outputs are restricted to non-negative values are called the unipolar continuous and unipolar binary functions, respectively.
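The exact formulas appear in the lecture figures and are not reproduced above; the Python definitions below are the commonly used forms of these four functions and should be read as assumptions (lam is a steepness parameter):

import numpy as np

def bipolar_continuous(net, lam=1.0):
    return 2.0 / (1.0 + np.exp(-lam * net)) - 1.0   # outputs in (-1, 1)

def bipolar_binary(net):
    return np.where(net >= 0, 1, -1)                # outputs in {-1, +1}

def unipolar_continuous(net, lam=1.0):
    return 1.0 / (1.0 + np.exp(-lam * net))         # outputs in (0, 1)

def unipolar_binary(net):
    return np.where(net >= 0, 1, 0)                 # outputs in {0, 1}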
11. NEURAL NETWORK LEARNING RULES
Our focus in this section will be artificial neural network learning rules.
A neuron is considered to be an adaptive element. Its weights are
modifiable depending on the input signal it receives, its output value, and
the associated teacher response. In some cases the teacher signal is not
available and no error information can be used, thus the neuron will
modify its weights based only on the input and/or output. This is the case
for unsupervised learning. Let us study the learning of the weight vector w_i, or its components w_ij connecting the j'th input with the i'th neuron. In general, the j'th input can be an output of another neuron or it can be an external input. Our discussion in this section will cover single-neuron and single-layer network supervised learning and simple cases of unsupervised learning. Under different learning rules, the form of the neuron's activation function may be different. Note that the threshold parameter may be included in learning as one of the weights; this requires fixing one of the inputs, say x_j. We will assume here that x_j, if fixed, takes the value of -1.
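The general rule used later in this section increments the weight vector by c·r·x, where c is the learning constant and r is the rule-specific learning signal; a minimal sketch follows, with the threshold folded in as an extra weight on a fixed input of -1 (the numbers are placeholders):

import numpy as np

def general_update(w, x, r, c=0.1):
    # general learning rule: delta_w = c * r * x, with r the learning signal
    return w + c * r * x

# Folding the threshold into the weights: append a fixed input of -1,
# so the last weight component plays the role of the threshold.
x = np.array([0.5, -1.0, 2.0])
x_aug = np.append(x, -1.0)
w = np.zeros_like(x_aug)
print(general_update(w, x_aug, r=1.0))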
11.1. Hebbian Learning Rule
For the Hebbian learning rule, the learning signal is simply equal to the neuron's output (Hebb 1949). This learning rule requires the weights to be initialized to small random values around w_i = 0 prior to learning. The Hebbian learning rule represents purely feedforward, unsupervised learning.
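A minimal Python sketch of this rule: since the learning signal is the neuron's own output, the increment is c·o·x. The sgn activation, learning constant, initial weights, and inputs are arbitrary choices for illustration:

import numpy as np

def hebbian_step(w, x, c=0.1):
    o = np.sign(w @ x)      # learning signal r = the neuron's output o = f(w.x)
    return w + c * o * x    # delta_w = c * o * x  (feedforward, unsupervised)

w = 0.01 * np.random.randn(3)   # small random values around zero
for x in [np.array([1.0, -2.0, 1.5]), np.array([1.0, -0.5, -1.0])]:
    w = hebbian_step(w, x)
print(w)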
11.2. Perceptron Learning Rule
For the perceptron learning rule, the learning signal is the difference between the desired and actual neuron's response (Rosenblatt 1958). Thus, learning is supervised and the learning signal is equal to r = d_i - o_i, where d_i is the desired response and o_i = sgn(w_i^T x) is the actual bipolar binary response. An accompanying example illustrates the perceptron learning rule for the network shown in the figure, using a given set of input training vectors.
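A sketch of a single perceptron learning step; the training vector, desired response, and learning constant below are hypothetical stand-ins, not the values of the worked example:

import numpy as np

def perceptron_step(w, x, d, c=0.1):
    o = 1 if w @ x >= 0 else -1     # actual bipolar binary response
    return w + c * (d - o) * x      # r = d - o; weights change only when o differs from d

w = np.array([1.0, -1.0, 0.0, 0.5])   # hypothetical initial weights
x = np.array([1.0, -2.0, 0.0, -1.0])  # hypothetical (augmented) training vector
print(perceptron_step(w, x, d=-1))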
11.3. Delta Learning Rule
The delta learning rule is valid only for continuous activation functions, as defined before, and in the supervised training mode. The learning signal for this rule is called delta and is defined as r = (d_i - o_i) f'(net_i), where o_i = f(net_i) = f(w_i^T x) and f'(net_i) is the derivative of the activation function.
An accompanying example discusses the delta learning rule as applied to the network shown in the figure; the training input vectors, desired responses, and initial weights are identical to those in the previous example. Delta learning requires that the value f'(net) be computed at each step.
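A sketch of one delta-rule step with a bipolar continuous activation; the vectors, desired response, and learning constant are hypothetical stand-ins for the example's values:

import numpy as np

def f(net):                          # bipolar continuous activation
    return 2.0 / (1.0 + np.exp(-net)) - 1.0

def f_prime(net):                    # its derivative, needed at each step
    return 0.5 * (1.0 - f(net) ** 2)

def delta_step(w, x, d, c=0.1):
    net = w @ x
    o = f(net)
    return w + c * (d - o) * f_prime(net) * x   # r = (d - o) * f'(net)

w = np.array([1.0, -1.0, 0.0, 0.5])
x = np.array([1.0, -2.0, 0.0, -1.0])
print(delta_step(w, x, d=-1.0))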
11.4. Widrow-Hoff Learning Rule
The Widrow-Hoff learning rule (Widrow 1962) is applicable to the supervised training of neural networks. It is independent of the activation function of the neurons used, since it minimizes the squared error between the desired output value d_i and the neuron's activation value net_i = w_i^T x. The learning signal for this rule is defined as r = d_i - w_i^T x, and the weight vector increment under this learning rule is Δw_i = c (d_i - w_i^T x) x.
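A sketch of the Widrow-Hoff update; the error is taken on the activation value net = w·x itself, so no activation function appears. The numbers are hypothetical:

import numpy as np

def widrow_hoff_step(w, x, d, c=0.05):
    net = w @ x                     # the neuron's activation value
    return w + c * (d - net) * x    # r = d - net, reduces the squared error (d - w.x)^2

w = np.zeros(3)
x = np.array([1.0, 0.5, -1.0])
print(widrow_hoff_step(w, x, d=1.0))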
11.5. Correlation Learning Rule
By substituting r = d_i into the general learning rule we obtain the correlation learning rule. The adjustments for the weight vector and for the single weights are, respectively, Δw_i = c d_i x and Δw_ij = c d_i x_j.
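A sketch of the correlation update, which simply adds c·d·x to the weight vector; the values are hypothetical:

import numpy as np

def correlation_step(w, x, d, c=0.1):
    return w + c * d * x    # delta_w = c * d * x; component-wise delta_w_j = c * d * x_j

w = np.zeros(3)
print(correlation_step(w, np.array([1.0, -1.0, 0.5]), d=1.0))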
11.6. Winner-take-all Learning Rule
This learning rule differs substantially from any of the rules discussed so far in this section. It can only be demonstrated and explained for an ensemble of neurons, preferably arranged in a layer of p units. This rule is an example of competitive learning, and it is used for unsupervised network training. Typically, winner-take-all learning is used for learning the statistical properties of inputs (Hecht-Nielsen 1987). Learning is based on the premise that one of the neurons in the layer, say the m'th, has the maximum response due to input x, as shown in the figure; this neuron is declared the winner. As a result of this winning event, the weight vector w_m containing the weights highlighted in the figure is the only one adjusted in the given unsupervised learning step. Its increment is computed as Δw_m = α (x - w_m), where α is the learning constant. The winner selection is based on the following criterion of maximum activation among all p neurons participating in the competition: w_m^T x = max(w_i^T x) over i = 1, ..., p.
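A sketch of one winner-take-all step for a layer of p competing neurons; the weight matrix, input vector, and learning constant alpha are hypothetical:

import numpy as np

def winner_take_all_step(W, x, alpha=0.2):
    # Each row of W holds the weight vector of one of the p competing neurons.
    m = np.argmax(W @ x)               # winner: the neuron with maximum activation w_i . x
    W[m] = W[m] + alpha * (x - W[m])   # only the winning weight vector is adjusted
    return W, m

W = np.random.randn(4, 3)              # p = 4 neurons, 3 inputs
x = np.array([1.0, 0.0, -1.0])
W, winner = winner_take_all_step(W, x)
print(winner)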
11.7. Outstar Learning Rule
The outstar learning rule is another learning rule that is best explained when neurons are arranged in a layer. This rule is designed to produce a desired response d of the layer of p neurons shown in the figure (Grossberg 1974, 1982). The rule is used to provide learning of repetitive and characteristic properties of input/output relationships. This rule is concerned with supervised learning; however, it is supposed to allow the network to extract statistical properties of the input and output signals. The weight adjustments in this rule are computed as Δw_j = β (d - w_j), where w_j is the vector of weights fanning out from the j'th node to the layer of p neurons and β is the learning constant.
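A sketch of the outstar update for the weights fanning out to the layer of p neurons; the dimensions, desired response, and learning constant beta are hypothetical:

import numpy as np

def outstar_step(w_col, d, beta=0.2):
    # w_col: the p weights fanning out from one node to the layer of p neurons
    # d:     the desired response of that layer
    return w_col + beta * (d - w_col)   # the weights move toward the desired response

w_col = np.zeros(3)                      # p = 3 neurons in the layer
d = np.array([1.0, -1.0, 1.0])
print(outstar_step(w_col, d))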
Exercise: Analyze the two-layer feedforward network using neurons having the bipolar binary activation function defined above. The purpose is to find the output o5 for a given network and input pattern.
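The weights of the exercise network are given only in the lecture figure, so the sketch below uses hypothetical values; it shows only the shape of the computation: two feed-forward layers of bipolar binary neurons, with o5 as the output of the second layer:

import numpy as np

def bipolar_binary(net):
    return np.where(net >= 0, 1, -1)

x = np.array([1.0, -1.0])          # hypothetical input pattern
W1 = np.array([[ 1.0, -0.5],       # hypothetical first-layer weights (neurons 3 and 4)
               [-1.0,  2.0]])
w2 = np.array([0.5, 1.0])          # hypothetical second-layer weights (neuron 5)

o_hidden = bipolar_binary(W1 @ x)  # outputs o3 and o4 of the first layer
o5 = bipolar_binary(w2 @ o_hidden) # output of the second layer
print(o5)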
11.9. Applications of Neural Networks
Applications can be grouped in the following categories:
■ Clustering:
A clustering algorithm explores the similarity between patterns and places similar patterns in a cluster. Best known applications include data compression and data mining.
■ Classification/Pattern recognition:
The task of pattern recognition is to assign an input pattern (like a handwritten symbol) to one of many classes. This category includes algorithmic implementations such as associative memory.
■ Function approximation:
The task of function approximation is to find an estimate of an unknown function subject to noise. Various engineering and scientific disciplines require function approximation.
■ Prediction systems:
The task is to forecast some future values of time-sequenced data. Prediction has a significant impact on decision support systems. Prediction differs from function approximation by considering the time factor: the system may be dynamic and may produce different results for the same input data depending on the system state (time).