Supervised learning network
G.Anuradha
Architecture
• Earlier attempts to build intelligent and self-learning systems used simple components
• Used to solve simple classification problems
• Used by Rosenblatt to explain the pattern-recognition abilities of biological visual systems
[Figure: perceptron architecture — sensory unit → associator unit (binary activation function) → response unit (activations +1, 0, -1)]
Quiz
• Which of the features would probably not be useful for classifying handwritten digits from binary images?
  – Raw pixels from the images
  – A set of strokes that can be combined to form various digits
  – The day of the year on which the digits were drawn
  – The number of pixels set to one
Perceptron networks-Theory
1. Single-layer feed-forward networks
2. It has 3 units:
   1. input (sensory) unit
   2. hidden (associator) unit
   3. output (response) unit
3. Input-to-hidden weights are fixed at -1, 0, or 1, assigned at random; the hidden unit uses a binary activation function
4. The output unit uses a (1, 0, -1) activation, a binary step function with threshold θ
The output of the perceptron is y = f(yin), where

f(yin) = 1,  if yin > θ
       = 0,  if -θ ≤ yin ≤ θ
       = -1, if yin < -θ
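As a quick illustration, this three-valued step function can be written directly from the definition above; the following is a minimal Python sketch, and the function name and the sample θ = 0.2 are illustrative choices, not values from the notes.

def perceptron_activation(y_in, theta=0.2):
    # Binary step with threshold theta: +1 above theta, -1 below -theta, else 0
    if y_in > theta:
        return 1
    elif y_in < -theta:
        return -1
    else:
        return 0

print(perceptron_activation(0.5))   # 1  (net input above theta)
print(perceptron_activation(0.1))   # 0  (net input inside the [-theta, theta] band)
print(perceptron_activation(-0.3))  # -1 (net input below -theta)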
Perceptron theory
5. Weight updation takes place between the hidden and output unit
6. The network checks for error between the hidden and output layer
7. Error = target - calculated output
8. Weights are adjusted in case of error:
   wi(new) = wi(old) + α t xi
   b(new) = b(old) + α t
   where α is the learning rate and t is the target, which is -1 or 1.
   If there is no error, there is no weight change and training is stopped.
Single classification perceptron network
[Figure: single-output perceptron — inputs x1 … xi … xn and a bias input x0 = 1 feed the output unit Y through weights w1 … wi … wn and bias b, producing output y]
Perceptron training algo for single output classes
• Step 0: Initialize weights, bias, and learning rate α (between 0 and 1)
• Step 1: Perform steps 2-6 until the final stopping condition is false
• Step 2: Perform steps 3-5 for each training pair indicated by s:t
• Step 3: The input layer is applied with the identity activation function:
  – xi = si
• Step 4: Calculate the net input yin = b + Σ xi wi and the output y = f(yin), where
  f(yin) = 1,  if yin > θ
         = 0,  if -θ ≤ yin ≤ θ
         = -1, if yin < -θ
Perceptron training algo for single output classes
• Step 5: Weight and bias adjustment: compare the actual output with the desired (target) value.
  If y ≠ t:
    wi(new) = wi(old) + α t xi
    b(new) = b(old) + α t
  else:
    wi(new) = wi(old)
    b(new) = b(old)
• Step 6: Train the network until there is no weight change. This is the stopping condition for the network. If it is not met, start again from Step 2.
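Putting Steps 0-6 together, the sketch below is one way to code the single-output training rule in Python; it assumes bipolar targets and a fixed threshold θ, and the function name, epoch cap, and convergence check are illustrative rather than part of the original algorithm statement.

def train_perceptron(samples, targets, alpha=1.0, theta=0.0, max_epochs=100):
    # samples: list of input vectors s; targets: list of targets t
    n = len(samples[0])
    w = [0.0] * n                     # Step 0: initialize weights ...
    b = 0.0                           # ... and bias

    def f(y_in):                      # binary step activation with threshold theta
        if y_in > theta:
            return 1
        if y_in < -theta:
            return -1
        return 0

    for epoch in range(max_epochs):               # Step 1: repeat until stopping condition
        changed = False
        for x, t in zip(samples, targets):        # Step 2: for each training pair s:t
            # Step 3: identity activation on the input layer, xi = si
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))   # Step 4: net input
            y = f(y_in)
            if y != t:                            # Step 5: adjust only on error
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                b = b + alpha * t
                changed = True
        if not changed:                           # Step 6: no weight change, stop
            break
    return w, b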
EXAMPLE
[Flowchart: perceptron training for a single output class — Start; initialize weights and bias; set α (0 to 1); for each training pair s:t, activate the input units (xi = si), calculate the net input, and apply the activation function y = f(yin); if y ≠ t, update wi(new) = wi(old) + α t xi and b(new) = b(old) + α t, otherwise keep w(new) = w(old) and b(new) = b(old); stop when an epoch produces no weight change.]
Perceptron training algo for multiple output classes
• Step 0: Initialize the weights, biases, and learning rate suitably
• Step 1: Check for the stopping condition; if it is false, perform steps 2-6
• Step 2: Perform steps 3 to 5 for each bipolar or binary training vector pair s:t
• Step 3: Set the activation (identity) of each input unit, i = 1 to n: xi = si
Perceptron training algo for multiple output classes
• Step 4: Calculate the output response of each output unit, j = 1 to m:
  yinj = bj + Σ (i = 1 to n) xi wij
  Activations are applied over the net input to calculate the output response:
  f(yinj) = 1,  if yinj > θ
          = 0,  if -θ ≤ yinj ≤ θ
          = -1, if yinj < -θ
Perceptron training algo for multiple output classes
• Step 5: Make adjustments in the weights and bias for j = 1 to m and i = 1 to n.
  If tj ≠ yj then
    wij(new) = wij(old) + α tj xi
    bj(new) = bj(old) + α tj
  else
    wij(new) = wij(old)
    bj(new) = bj(old)
• Step 6: Check for the stopping condition. If there is no change in the weights, stop the training process.
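For several output classes the same rule can be written with a weight matrix wij and a bias vector bj; the NumPy sketch below is an illustrative rendering of Steps 0-6 (the array shapes and variable names are assumptions, not from the slides).

import numpy as np

def train_multi_perceptron(X, T, alpha=1.0, theta=0.0, max_epochs=100):
    # X: (p, n) array of input vectors; T: (p, m) array of bipolar targets
    n, m = X.shape[1], T.shape[1]
    W = np.zeros((n, m))                  # Step 0: weights wij
    b = np.zeros(m)                       # biases bj

    def f(y_in):                          # Step 4 activation, applied elementwise
        return np.where(y_in > theta, 1, np.where(y_in < -theta, -1, 0))

    for _ in range(max_epochs):           # Step 1
        changed = False
        for x, t in zip(X, T):            # Steps 2-3
            y = f(b + x @ W)              # Step 4: yinj = bj + sum_i xi wij
            wrong = (y != t)              # Step 5: update only the erring outputs
            if wrong.any():
                W[:, wrong] += alpha * np.outer(x, t)[:, wrong]
                b[wrong] += alpha * t[wrong]
                changed = True
        if not changed:                   # Step 6
            break
    return W, b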
Example of AND
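The worked example itself is not reproduced here, so the following is a hedged sketch of training a single-output perceptron on the AND function; the bipolar inputs and targets, α = 1, and θ = 0.2 are common textbook choices assumed for illustration.

# Bipolar AND: inputs and targets in {-1, +1}
samples = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
targets = [1, -1, -1, -1]
alpha, theta = 1.0, 0.2
w, b = [0.0, 0.0], 0.0

def f(y_in):                         # step activation with threshold theta
    return 1 if y_in > theta else (-1 if y_in < -theta else 0)

for epoch in range(1, 11):
    changed = False
    for (x1, x2), t in zip(samples, targets):
        y = f(b + x1 * w[0] + x2 * w[1])
        if y != t:                   # update weights and bias only on error
            w[0] += alpha * t * x1
            w[1] += alpha * t * x2
            b += alpha * t
            changed = True
    print(f"epoch {epoch}: w = {w}, b = {b}")
    if not changed:                  # stop once an epoch makes no change
        break

# With these choices the run settles at w = [1, 1], b = -1, so that
# x1 + x2 - 1 exceeds theta only for the input (1, 1).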
Linear separability
• The perceptron network is used to illustrate the concept of linear separability.
• The separating line is based on the threshold θ.
• The condition for separating the region of positive response from the region of zero response is
  w1x1 + w2x2 + b > θ
• The condition for separating the region of zero response from the region of negative response is
  w1x1 + w2x2 + b < -θ
What binary threshold neurons cannot do
• A binary threshold output unit cannot even tell if two single-bit features are the same!
  Positive cases (same): (1,1) → 1; (0,0) → 1
  Negative cases (different): (1,0) → 0; (0,1) → 0
• The four input-output pairs give four inequalities that are impossible to satisfy:
  w1 + w2 ≥ θ,  0 ≥ θ
  w1 < θ,  w2 < θ
[Figure: a binary threshold unit with inputs x1, x2, weights w1, w2, and a bias input 1 with weight -θ]
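The contradiction can also be checked by brute force; this short sketch (an illustration, not from the slides) scans a grid of weights and thresholds and finds no single binary threshold unit that outputs 1 for (1,1) and (0,0) but 0 for (1,0) and (0,1).

# Exhaustive search: no single threshold unit solves the "same / different" task
cases = [((1, 1), 1), ((0, 0), 1), ((1, 0), 0), ((0, 1), 0)]

def unit(x1, x2, w1, w2, theta):
    return 1 if w1 * x1 + w2 * x2 >= theta else 0

grid = [i / 2 for i in range(-20, 21)]       # weights and thresholds in [-10, 10]
solutions = [(w1, w2, th) for w1 in grid for w2 in grid for th in grid
             if all(unit(x1, x2, w1, w2, th) == t for (x1, x2), t in cases)]
print(solutions)                             # prints [] -- no assignment satisfies all four cases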
A geometric view of what binary threshold neurons cannot do
Imagine a "data-space" in which the axes correspond to the components of an input vector.
– Each input vector is a point in this space.
– A weight vector defines a plane in data-space.
– The weight plane is perpendicular to the weight vector and misses the origin by a distance equal to the threshold.
[Figure: the four points (0,0), (0,1), (1,0), (1,1) in data-space — the positive and negative cases cannot be separated by a plane]
Discriminating simple patterns under translation with wrap-around
• Suppose we just use the pixels as the features.
• Can a binary threshold unit discriminate between different patterns that have the same number of on pixels?
  – Not if the patterns can translate with wrap-around!
[Figure: three translated copies of pattern A and three translated copies of pattern B, each with the same number of on pixels]
Learning with hidden units
• For such linearly non-separable problems we require an additional layer, called the hidden layer.
• Networks without hidden units are very limited in the input-output mappings they can learn to model.
• We need multiple layers of adaptive, non-linear hidden units.
Solution to the EX-OR problem
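One standard construction (the particular weights below are an illustrative hand-picked solution, not taken from the slides) uses two hidden threshold units, one detecting x1 AND NOT x2 and the other x2 AND NOT x1, with an output unit that ORs them; this computes EX-OR even though no single-layer unit can.

def step(z):                         # binary threshold activation
    return 1 if z >= 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 - x2 - 0.5)         # hidden unit 1: x1 AND NOT x2
    h2 = step(x2 - x1 - 0.5)         # hidden unit 2: x2 AND NOT x1
    return step(h1 + h2 - 0.5)       # output unit: h1 OR h2

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))
# Prints 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0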
ADALINE
• A network with a single linear unit is called an ADALINE (ADAptive LINear Neuron).
• The input-output relationship is linear.
• It uses bipolar activation for its input signals and its target output.
• The weights between the input and the output are adjustable, and the network has only one output unit.
• It is trained using the delta rule, also called the least mean square (LMS) or Widrow-Hoff rule.
Architecture
• Delta rule for a single output unit
  – Minimize the error over all training patterns.
  – This is done by reducing the error for each pattern, one at a time.
• The delta rule for adjusting the weight of the ith input (i = 1 to n) is
  Δwi = α (t - yin) xi
• The delta rule in case of several output units, for adjusting the weight from the ith input unit to the jth output unit, is
  Δwij = α (tj - yinj) xi
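A minimal Adaline training sketch based on these delta-rule updates is shown below; the summed-squared-error stopping test per epoch follows the flowchart later in the notes, while the function name, learning rate, and tolerance are illustrative assumptions.

def train_adaline(samples, targets, alpha=0.1, tol=1e-3, max_epochs=1000):
    # Single-output Adaline trained with the delta (LMS / Widrow-Hoff) rule
    n = len(samples[0])
    w = [0.0] * n
    b = 0.0
    for epoch in range(max_epochs):
        sq_error = 0.0
        for x, t in zip(samples, targets):
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))    # linear output unit
            err = t - y_in
            # Delta rule: delta wi = alpha (t - yin) xi, delta b = alpha (t - yin)
            w = [wi + alpha * err * xi for wi, xi in zip(w, x)]
            b += alpha * err
            sq_error += err ** 2
        if sq_error <= tol:          # stop once the epoch error is within tolerance
            break
    return w, b

# Illustration: bipolar AND targets cannot be matched exactly by a linear unit,
# so the squared error never reaches the tolerance; the weights instead settle
# near the least-squares solution w1 = w2 = 0.5, b = -0.5.
w, b = train_adaline([(1, 1), (1, -1), (-1, 1), (-1, -1)], [1, -1, -1, -1])
print(w, b)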
Difference between Perceptron and Delta Rule
Perceptron learning rule:
• Originates from the Hebbian assumption
• Stops after a finite number of learning steps
Delta rule:
• Derived from the gradient-descent method
• Continues forever, converging asymptotically to the solution
• Minimizes the error over all training patterns
Architecture
[Figure: Adaline architecture — inputs x1 … xn and a bias input x0 = 1 feed a summing unit through adjustable weights w1 … wn and bias b; the net input yin = b + Σ xi wi passes through f(yin) to give the output, while an output-error generator compares yin with the target t and the error e = t - yin drives the adaptive (LMS) algorithm that adjusts the weights.]
[Flowchart: Adaline training — Start; initialize weights, bias, and α; input the specified tolerance error Es; for each training pair s:t, activate the input units (xi = si), calculate the net input yin = b + Σ xi wi, and update wi(new) = wi(old) + α (t - yin) xi and b(new) = b(old) + α (t - yin); calculate the error Ei = Σ (t - yin)²; stop when Ei = Es.]
Madaline
• Two or more Adaline units are integrated to develop the Madaline model.
• It is used for nonlinearly separable logic functions such as the EX-OR function.
• It is also used for adaptive noise cancellation and adaptive inverse control.
• In noise cancellation the objective is to filter out an interference component by identifying a linear model of a measurable noise source and the corresponding immeasurable interference.
• Applications include ECG filtering and echo elimination from long-distance telephone transmission lines.