Neural Networks
Biological Neural Networks
[Figure: Biological Neuron, labeled with dendrites, collaterals, cell body, axon, and the direction of signal flow]
[Figure: Biological Network at the synapse, showing the electrical signal traveling down the axon to the synapse, where vesicles release neurotransmitters across the presynaptic membrane, synaptic gap, and postsynaptic membrane to a dendrite, where it continues as an electrical signal]
http://pharmacyebooks.com/2010/10/artifitial-neural-networks-hot-topic-pharmaceutical-research.html
The Perceptron
The perceptron was developed by Frank Rosenblatt in 1957. It is a simple
feed-forward network that can solve (create a decision function for) linearly
separable problems.
[Diagram: input data in (−∞, +∞) enters the perceptron, which produces an output in {−1, +1}]
Inside the Perceptron
π‘ΊπŸŽ
weights
𝝎𝟎
π‘ΊπŸ
𝝎𝟏
sigma-pi
...
perceptron
output
...
π‘Ίπ’Š
πŽπ’Š
π‘Άπ’Œ
πšΊπŽπ’Š π‘Ίπ’Š
πŽπ‘΅−𝟐
𝑺𝑡−𝟐
𝑺𝑡−𝟏
πŽπ‘΅−𝟏
step
function
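As a concrete reading of the diagram above, here is a minimal C# sketch (not part of the original slides) of the perceptron computation: the sigma-pi sum of weighted inputs followed by a step function that yields −1 or +1.

// Illustrative perceptron sketch: weighted sum followed by a step function.
public static double PerceptronOutput(double[] S, double[] w)
{
    double sum = 0.0;                   // sigma-pi: sum of w[i] * S[i]
    for (int i = 0; i < S.Length; i++)
        sum += w[i] * S[i];
    return sum >= 0.0 ? 1.0 : -1.0;     // step function: output in {-1, +1}
}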
When is a Problem Linearly Separable?
[Figure: two scatter plots of RED vs BLUE points, one linearly separable, the other not linearly separable]
http://dynamicnotions.blogspot.com/2008/09/single-layer-perceptron.html
A Practical Application
Classification
The Iris Data - This is one of the most famous datasets used to illustrate the classification problem. From
four characteristics of the flower (the length of the sepal, the width of the sepal, the length of the petal and
the width of the petal), the objective is to classify a sample of 150 irises into three species: versicolor,
virginica and setosa.
Source: R.A. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, 7(2), 179–188 (1936)
Data from: UCI Machine Learning Repository - http://archive.ics.uci.edu/ml/
Training a 4-2-1 Network for the Iris Data
1/5 of the Iris Data was selected uniformly, 10 samples per class, for a total of 30 training set pairs. The 4-2-1 network comprises a total of 10 weights: 8 between the input and hidden layers, and 2 between the hidden layer and the output.
[Diagram: the four inputs (sepal length, sepal width, petal length, petal width) feed the 4-2-1 net, whose single output is mapped to 0.0 Iris-setosa, 0.5 Iris-versicolor, 1.0 Iris-virginica]
Iris Data - 3 classes, 50 samples each
iris characteristics (first five samples of each class shown):

sepal length  sepal width  petal length  petal width  class
5.1           3.5          1.4           0.2          Iris-setosa
4.9           3.0          1.4           0.2          Iris-setosa
4.7           3.2          1.3           0.2          Iris-setosa
4.6           3.1          1.5           0.2          Iris-setosa
5.0           3.6          1.4           0.2          Iris-setosa
:             :            :             :            :
7.0           3.2          4.7           1.4          Iris-versicolor
6.4           3.2          4.5           1.5          Iris-versicolor
6.9           3.1          4.9           1.5          Iris-versicolor
5.5           2.3          4.0           1.3          Iris-versicolor
6.5           2.8          4.6           1.5          Iris-versicolor
:             :            :             :            :
6.3           3.3          6.0           2.5          Iris-virginica
5.8           2.7          5.1           1.9          Iris-virginica
7.1           3.0          5.9           2.1          Iris-virginica
6.3           2.9          5.6           1.8          Iris-virginica
6.5           3.0          5.8           2.2          Iris-virginica
:             :            :             :            :

The outputs for the three classes were set to 0, 0.5 and 1.0.

trained network specification:

input layer       4
hidden layer      2
output layer      1
learning rate     0.28
error limit       0.01
max runs          10000
# training sets   30
ihweights         0.1835137273718, -1.52185484488147, 1.06085392071769, -10.1057086709985,
                  -1.53328697751333, 4.0131689222145, -1.63759087701708, 10.741961194748
howeights         -6.01331593454728, 6.66056158141261
Classifier Performance
(rows: assigned class 1-3; columns: actual class 1-3)

Sample Count      1      2      3
1                50      0      0
2                 0     46      1
3                 0      4     49

Perf. Fraction    1      2      3
1               1.0    0.0    0.0
2               0.0   0.92   0.02
3               0.0   0.08   0.98
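To relate these tables to the network's single output: since the class targets were set to 0.0, 0.5 and 1.0, a sample can be assigned to whichever target the output is nearest, that is, with thresholds at the midpoints 0.25 and 0.75. A minimal, illustrative C# sketch of that mapping (the Classify helper is not part of the original program):

// Map a single network output in [0, 1] to the nearest class target
// (0.0 = Iris-setosa, 0.5 = Iris-versicolor, 1.0 = Iris-virginica).
public static string Classify(double output)
{
    if (output < 0.25) return "Iris-setosa";
    if (output < 0.75) return "Iris-versicolor";
    return "Iris-virginica";
}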
A Demonstration
Typical Feed-Forward Neural Network
[Diagram: input data in (−∞, +∞) is presented to the input layer, passes through the hidden layer, and emerges from the output layer as output data in (−1, +1)]
Inside an Artificial Neuron
π‘ΆπŸŽ
weights
sigma-pi
neuron
output
πŽπ’Š
π‘Άπ’Œ
...
π‘Άπ’Š
𝝎𝟏
...
outputs from previous layer
π‘ΆπŸ
πšΊπŽπ’Š π‘Άπ’Š
πŽπ‘΅−𝟐
𝑢𝑡−𝟐
𝑢𝑡−𝟏
πŽπ‘΅−𝟏
sigmoid
function
distribution to next layer
𝝎𝟎
Backward Error Propagation
1. Initialize the network with small random weights.
2. Present an input pattern to the input layer of the network.
3. Feed the input pattern forward through the network to calculate its
activation value.
4. Take the difference between desired output and the activation value to
calculate the network’s activation error.
5. Adjust the weights feeding the output neurons to reduce the activation
error for this input pattern.
6. Propagate an error value back to each hidden neuron that is proportional
to its contribution to the network activation error.
7. Adjust the weights feeding each hidden neuron to reduce its contribution
of error for this input pattern.
8. Repeat steps 2 to 7 for each input pattern in the training set ensemble.
9. Repeat step 8 until the network is suitably trained.
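A minimal C# sketch of this loop, using the fields and routines defined later in these slides (calcInputLayer, calcHiddenLayer, calcOutputLayer, calcOutputError, calcHiddenError); the train name and the stopping test on the summed absolute error are assumptions, not part of the original program:

// Outer training loop sketch: repeat the feed-forward / back-propagation
// cycle over every training pair until the total error falls below the
// error limit or the maximum number of runs is reached.
public void train()
{
    for (int r = 0; r < maxnumruns; r++)
    {
        double totalError = 0.0;
        for (int p = 0; p < npairs; p++)
        {
            calcInputLayer(p);          // step 2: present input pattern p
            calcHiddenLayer();          // step 3: feed forward
            calcOutputLayer();
            calcOutputError(p, r);      // steps 4-5: output error and hidden-to-output weight update
            calcHiddenError(p, r);      // steps 6-7: propagate error, update input-to-hidden weights
            for (int o = 0; o < oLayer.Length; o++)
                totalError += Math.Abs(outTrain[o, p] - oLayer[o].output);
        }
        if (totalError < error) break;  // step 9: suitably trained
    }
}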
Implementing a Neural Network
[Diagram: t input training sets, each with m values, are presented to m input layer nodes; an m × n matrix of weights connects the input layer to the n hidden layer nodes; an n × p matrix of weights connects the hidden layer to the p output layer nodes, whose outputs correspond to t output training sets, each with p values]
Neural Network Data Structure & Components
public static double learn = 0.28;        // learning rate
public static double error = 0.01;        // error limit for stopping training
public static int npairs = 0;             // number of training set pairs
public static int maxnumruns = 10000;     // maximum number of training runs
public static int numinput = 1;           // input layer nodes
public static int numhidden = 1;          // hidden layer nodes
public static int numoutput = 1;          // output layer nodes
public static double[,] inTrain;          // input training sets
public static double[,] outTrain;         // output training sets
public static neuron[] iLayer;            // input layer
public static neuron[] hLayer;            // hidden layer
public static neuron[] oLayer;            // output layer
public static weight[,] ihWeight;         // input-to-hidden weights
public static weight[,] hoWeight;         // hidden-to-output weights
public static int pxerr;
public static double Scalerr;
public static bool showtoterr = true;     // show total error during training
public class neuron
{
    public double input;     // activation (weighted sum of incoming values)
    public double output;    // output after the sigmoid is applied
    public double error;     // back-propagated error for this node

    public neuron()
    {
        input = 0.0;
        output = 0.0;
        error = 0.0;
    }
}

public class weight
{
    public double wt;        // current weight value
    public double delta;     // most recent change in the weight value

    public weight(double wght)
    {
        wt = wght;
        delta = 0.0;
    }
}
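Step 1 of the back-propagation procedure (initialize the network with small random weights) is not shown in the slides' code; the following is a minimal sketch of how the layers and weight matrices above might be allocated, with the weight range [-0.5, 0.5] chosen purely for illustration:

// Allocate the layers and weight matrices and seed the weights with small random values.
public void buildNetwork()
{
    Random rand = new Random();

    iLayer = new neuron[numinput];                 // m input layer nodes
    hLayer = new neuron[numhidden];                // n hidden layer nodes
    oLayer = new neuron[numoutput];                // p output layer nodes
    for (int i = 0; i < numinput; i++) iLayer[i] = new neuron();
    for (int h = 0; h < numhidden; h++) hLayer[h] = new neuron();
    for (int o = 0; o < numoutput; o++) oLayer[o] = new neuron();

    ihWeight = new weight[numinput, numhidden];    // m x n weights, input to hidden
    hoWeight = new weight[numhidden, numoutput];   // n x p weights, hidden to output
    for (int i = 0; i < numinput; i++)
        for (int h = 0; h < numhidden; h++)
            ihWeight[i, h] = new weight(rand.NextDouble() - 0.5);
    for (int h = 0; h < numhidden; h++)
        for (int o = 0; o < numoutput; o++)
            hoWeight[h, o] = new weight(rand.NextDouble() - 0.5);
}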
Generalized Delta Rule
$\Delta_p w_{ij} = \eta \, \delta_{pj} \, E_{pi}$

where $\Delta_p w_{ij}$ is the correction to the weight value $w_{ij}$, $\delta_{pj}$ is the error in the jth unit, $\eta$ is the learning rate, and $E_{pi}$ is the pth training set input.
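For a hypothetical example (values chosen only for illustration, apart from the learning rate, which is the one used later in the demo): with $\eta = 0.28$, $\delta_{pj} = 0.1$, and $E_{pi} = 0.5$, the weight correction is $\Delta_p w_{ij} = 0.28 \times 0.1 \times 0.5 = 0.014$.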
Quantifying Error for Back Propagation
$f(a_{pj})$ is the neuron output function for the pth presentation of a training pattern.

Error for the jth unit in the output layer: $\delta_{pj} = f'(a_{pj}) \, (t_{pj} - o_{pj})$, where $t_{pj}$ is the pth training set (target) output.

Error for the jth unit in the hidden layer: $\delta_{pj} = f'_j(a_{pj}) \sum_k \delta_{pk} w_{jk}$ (summed over all k output units).

[Diagram: each hidden-layer error $\delta_{pj}$ is assembled from the output-layer errors $\delta_{p1}, \delta_{p2}, \ldots, \delta_{pk}$, which are computed from the targets $t_{p1}, t_{p2}, \ldots, t_{pk}$ and weighted by $w_{j1}, w_{j2}, \ldots, w_{jk}$]
The Sigmoid Function
sigmoid:
$f(x) = \dfrac{2}{1 + e^{-2x}} - 1$
derivative of the sigmoid:
$f'(x) = 1 - f(x)^2$
Another Sigmoid Function
sigmoid:
$f(x) = \dfrac{1}{1 + e^{-x}}$
derivative of the sigmoid:
$f'(x) = f(x)\,(1 - f(x))$
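Both activation functions translate directly into C#. The following is a small illustrative sketch (the fBipolar and fLogistic names are placeholders; the implementation used later in these slides is the logistic form, named f and df):

// Bipolar sigmoid (range -1 to +1) and its derivative.
public static double fBipolar(double x)  { return 2.0 / (1.0 + Math.Exp(-2.0 * x)) - 1.0; }
public static double dfBipolar(double x) { double y = fBipolar(x); return 1.0 - y * y; }

// Logistic sigmoid (range 0 to 1) and its derivative.
public static double fLogistic(double x)  { return 1.0 / (1.0 + Math.Exp(-x)); }
public static double dfLogistic(double x) { double y = fLogistic(x); return y * (1.0 - y); }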
Running the Neural Network
public void calcInputLayer(int p)
{
    // Present the pth input training pattern to the input layer.
    for (int i = 0; i < iLayer.Length; i++)
    {
        iLayer[i].output = inTrain[i, p];
    }
}

public void calcHiddenLayer()
{
    // Compute the activation (input) and output of each hidden layer node.
    for (int h = 0; h < hLayer.Length; h++)
    {
        hLayer[h].input = 0.0;
        for (int i = 0; i < iLayer.Length; i++)
            hLayer[h].input += ihWeight[i, h].wt * iLayer[i].output;
        hLayer[h].output = f(hLayer[h].input);
    }
}

public void calcOutputLayer()
{
    // Compute the activation (input) and output of each output layer node.
    for (int o = 0; o < oLayer.Length; o++)
    {
        oLayer[o].input = 0.0;
        for (int h = 0; h < hLayer.Length; h++)
            oLayer[o].input += hoWeight[h, o].wt * hLayer[h].output;
        oLayer[o].output = f(oLayer[o].input);
    }
}

// Logistic sigmoid and its derivative.
public double f(double x)
{
    return 1.0 / (1.0 + Math.Exp(-x));
}

public double df(double x)
{
    return f(x) * (1.0 - f(x));
}
Running the network is a feed-forward process. Input data is presented to the input layer. The activation (input) is computed for each node of the hidden layer and then used to compute the output of the hidden layer nodes. Finally, the activation (input) of each output node is computed and used to compute the output of the network.
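Putting the three routines together, running the network on the pth training input reduces to the sketch below; the runPattern name and the single-output assumption are illustrative, not part of the original code:

// Feed training pattern p forward through the network and return the
// output of the (single) output-layer node.
public double runPattern(int p)
{
    calcInputLayer(p);     // copy the pth input values onto the input layer
    calcHiddenLayer();     // activation and output of each hidden node
    calcOutputLayer();     // activation and output of the network
    return oLayer[0].output;
}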
Training the Network
In backward error propagation, the difference between the actual output and the goal (or target)
output provided in the training set is used to compute the error in the network. This error is
then used to compute the delta (change) in weight values for the weights between the hidden
layer and the output layer.
public void calcOutputError(int p, int r)
{
    // Error at each output node: derivative of the sigmoid at its activation
    // times the difference between the target and the actual output.
    for (int o = 0; o < oLayer.Length; o++)
        oLayer[o].error = df(oLayer[o].input) * (outTrain[o, p] - oLayer[o].output);

    // Adjust the hidden-to-output weights to reduce the output error.
    for (int h = 0; h < hLayer.Length; h++)
        for (int o = 0; o < oLayer.Length; o++)
            hoWeight[h, o].wt += learn * oLayer[o].error * hLayer[h].output;
}

public void calcHiddenError(int p, int r)
{
    // Propagate the output error back to each hidden node in proportion
    // to its contribution, summed over all output nodes.
    for (int h = 0; h < hLayer.Length; h++)
    {
        double err = 0.0;
        for (int o = 0; o < oLayer.Length; o++)
            err += oLayer[o].error * hoWeight[h, o].wt;
        hLayer[h].error = df(hLayer[h].input) * err;
    }

    // Adjust the input-to-hidden weights to reduce each hidden node's error.
    for (int i = 0; i < iLayer.Length; i++)
        for (int h = 0; h < hLayer.Length; h++)
            ihWeight[i, h].wt += learn * hLayer[h].error * iLayer[i].output;
}
These new weight values are then used to distribute the output error to the hidden layer nodes.
These node errors are, in turn, used to compute the changes in value for the weights between the input layer and the hidden layer of the network.
1. Set the number of neurons in each layer.
2. Select the learning rate, error limit and max training runs.
3. Give the number of training pairs and include them in the left-hand text window, with input/output pairs listed sequentially:
input 1
output 1
input 2
output 2
:
input n
output n
The Total Training Set Ensemble Error is displayed during the training process.
The training rate depends on the initial values of the random weights.
The user can monitor the rate of error correction in each weight during training by its color (large delta vs. small delta).
Small or zero changes in each weight do not necessarily mean that the network is trained; training could be hung in a local minimum.
When running the network, place input values in the text window and click run; the answer(s) appear on the next line(s).
How Many Nodes?
Number of Input Layer Nodes matches number of input values
Number of Output Layer Nodes matches number of output values
But what about the hidden layer?
Too few hidden layer nodes and the NN can't learn the patterns.
Too many hidden layer nodes and the NN doesn't generalize.
When Should We Use Neural Networks?
Neural Networks need lots of data (example solutions) for training.
The functional relationships of the problem/solution are not well understood.
The problem/solution is not applicable to a rule-based solution.
"Similar input data sets generate "similar" outputs.
Neural Networks perform general Pattern Recognition.
Neural Networks are particularly good as Decision Support tools.
Also good for modeling behavior of living systems.
Can a Neural Network do More than a Digital Computer?
Clearly a simulation of a Neural Network running on a digital computer cannot be more powerful than
the computer on which it is being executed.
The question is, "Can a computational system such as a Neural Network be built that can do
something that a digital computer cannot?"
A digital computer is the physical embodiment of a Turing Machine which is defined as a universal
computer of all computable functions.
An artificial Neural Network is loosely modeled on the human brain.
Rather than using a software simulation of neurons, we can build electronic circuits that closely mimic
the activities of human brain cells.
Can we build a physical system of any kind (based on electronics, chemistry, etc...) that does
everything a human brain can do?
Can you think of something human brains do that, so far, has not been accomplished or, at least,
approximated by a computer or any other physical (man-made) system?
Consciousness
What is the Computational Power of Consciousness?
Since we can't quantify consciousness, it is not likely that we can determine the level of
computational power necessary to manifest it.
However, we can establish a relative measure of computational power for systems that do and
(so far) do not exhibit consciousness.
Human Mind/Brain
Turing Machine
Digital Computer
Neural Network
Physical System/Model
Relative Computational Power
[Figure: the relative computational power of the Mind/Brain, Turing Machine, Digital Computer, Physical Model, and Neural Network]
Relative Computational Power
[Figure: the same hierarchy annotated with the issues separating the systems: Dualism vs Materialism, the Revised Turing Test, Finite Storage and Finite Precision, Symbolism vs Connectionism, and Engineering and Technology]
Due to the limitations of finite storage and the related issue of finite-precision arithmetic, a Turing Machine can exhibit greater computational power than a digital computer.