Lecture 2: Simple Perceptron

CS532 Neural Networks
Dr. Anwar Majid Mirza
anwar.m.mirza@nu.edu.pk
Lecture No. 2
January 17th, 2008
National University of Computer and Emerging Sciences
NU-FAST, A.K. Brohi Road, H11, Islamabad, Pakistan
• Pattern Classification
• Characteristics in Pattern Recognition
• Perceptron
• Simple Perceptron for Pattern Classification
• Perceptron: Algorithm
• Perceptron: Example
• Perceptron: Matlab code
• Assignment 1
Pattern Recognition
• What is a Pattern?
• Patterns can be representations of scenes,
objects, temporal sequences, and descriptions
of anything physical or abstract.
• Visual patterns can be pixels of various colors and
intensities.
• Speech patterns can be measurements of acoustical
waveforms.
• Patterns in medical diagnosis may be strings of
symptoms detected from patients.
• Regardless of the origin of these patterns, they
can all be represented as vectors of various
dimensions.
• These vectors are the input patterns to the
pattern recognition systems.
Pattern Recognition
• Classes and Classification
• The goal of the pattern recognition process is
to identify, or classify, input patterns.
• Character recognition may involve the task of
identifying an image scanned from a written page of
text as one of 26 possible characters of the alphabet.
• In speech recognition, pattern recognition may
involve the discrimination of vectors representing
spoken words.
• The pattern being recognized is classified as
belonging to one of a small number of classes.
• For example, the 26 characters may be the classes in
character recognition.
• The scanned input image is classified as one of the
alphabet characters.
Characteristics in Pattern Recognition
1. Data Space Partitioning
• Pattern recognition results in a partitioning and
labeling of regions in the input space.
• If the input to a pattern recognition system is an
N-dimensional vector, the input space is the space
spanned by all input vectors (each input vector can
be viewed as a point, or pattern, in the N-dimensional
input space).
• Let X be the set of all input patterns, and C be the set
of classes.
• Formally, a pattern recognition system P performs
a mapping from the input space to the set of possible
classes:
P : X → C
Characteristics in Pattern Recognition
1. Data Space Partitioning
[Figure: input data space partitioning for two-dimensional data. Regions are
labeled A, B and C; an estimated decision boundary encloses class C, with some
points correctly classified as class C and other points misclassified as class B.]
Characteristics in Pattern Recognition
2. Adaptivity
• Adaptation occurs during training, when examples of
various classes from a training set are presented to
the pattern recognition system along with their
correct classifications.
• The system "learns" from the training process by
associating an input example in the training set with
the correct class.
• After being trained with a class label, the system can
often correctly classify the same pattern later and
avoid "similar" mistakes.
• Thus the performance of a pattern recognition
system improves as more training data is
presented, and it becomes more and more capable
of correctly classifying a wider range of inputs.
Characteristics in Pattern Recognition
3. Generalization
• If an adaptive system is trained using a set of
patterns, T, and can correctly classify patterns not in
T, the process behind this behavior is called
generalization.
• Generalization based on adaptation is recognized as
being the essence of intelligent systems.
• Usually, the training set T available to a pattern
recognition system is very small compared to the
total number of examples in the input space.
• By using a well-selected training set that captures most
of the variation expected in the inputs, a well-designed
pattern recognition system should be able to "learn"
the properties that are important for distinguishing
patterns of different classes.
Characteristics in Pattern Recognition
4. Input Space Complexity
• Minimum-error boundaries between classes
in pattern recognition can be extremely
complex and difficult to characterize.
• Since the unknown boundaries can be
extremely irregular, adaptive systems are
unlikely to find them through "learning"
based merely on a small set of examples.
Perceptron
• Perceptrons had perhaps the most far-reaching
impact of any of the early neural nets.
• A number of different types of Perceptrons
have been used and described by various
workers.
• The original perceptrons had three layers of
neurons – sensory units, associator units and a
response unit – forming an approximate model
of a retina.
• Under suitable assumptions, its iterative
learning procedure can be proved to converge
to the correct weights, i.e., the weights that
allow the net to produce the correct output
value for each of the training input patterns.
Simple Perceptron for Pattern Classification
• The architecture of a simple perceptron for
performing a single classification is shown
in the figure.
• The goal of the net is to classify each input
pattern as belonging, or not belonging, to a
particular class.
• Belonging is signified by the output unit
giving a response of +1; not belonging is
indicated by a response of -1.
• A zero output means that the net is not
decisive.
[Figure: a single output unit Y receives the bias b (from a unit clamped to 1)
and the weighted inputs w1, w2, ..., wn from input units X1, X2, ..., Xn.]
The activation function is:

    f(x) = +1  if x > θ
    f(x) =  0  if -θ ≤ x ≤ θ
    f(x) = -1  if x < -θ
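This piecewise activation can be sketched in Python as follows (a minimal sketch; the function name and default θ are illustrative):

```python
def activation(x, theta=0.2):
    """Bipolar activation with an undecided band of half-width theta."""
    if x > theta:
        return 1
    elif x < -theta:
        return -1
    else:
        return 0
```

Any net input falling inside the band [-θ, θ] is mapped to 0, the indecisive response.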
Perceptron: Algorithm
1. Initialize the weights, the bias and the threshold θ. Also
   set the learning rate α such that 0 < α ≤ 1.
2. While the stopping condition is false, do steps 3 to 7.
3. For each training pair s:t, do steps 4 to 6.
4. Set the activations of the input units: xi = si.
5. Compute the response y of the output unit.
6. Update the weights and bias if an error occurred for
   this pattern:
   If y is not equal to t then
       wi(new) = wi(old) + α xi t   for i = 1 to n
       b(new) = b(old) + α t
   end if
7. Test the stopping condition: if no weight changed in
   step 6, stop; else continue.
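The steps above can be sketched as a short Python training loop (variable names are illustrative; the data and parameters follow the worked example on the next slides):

```python
def perceptron_train(samples, targets, alpha=1.0, theta=0.2, max_epochs=100):
    """Simple perceptron training (steps 1-7); returns weights, bias, epochs used."""
    n = len(samples[0])
    w = [0.0] * n                                   # step 1: initialize weights
    b = 0.0                                         # ... and bias
    for epoch in range(1, max_epochs + 1):          # step 2
        changed = False
        for x, t in zip(samples, targets):          # steps 3-4
            yin = b + sum(wi * xi for wi, xi in zip(w, x))      # step 5
            y = 1 if yin > theta else (-1 if yin < -theta else 0)
            if y != t:                              # step 6
                w = [wi + alpha * xi * t for wi, xi in zip(w, x)]
                b = b + alpha * t
                changed = True
        if not changed:                             # step 7
            return w, b, epoch
    return w, b, max_epochs

# Training data from the example: only (1,1) belongs to the class
samples = [(1, 1), (1, 0), (0, 1), (0, 0)]
targets = [1, -1, -1, -1]
w, b, epochs = perceptron_train(samples, targets)
```

For this data the loop reproduces the result derived later in the lecture: w1 = 2, w2 = 3, b = -4.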
Perceptron: Example
• Let us consider the following training data:

    Inputs        Target
    x1    x2      t
    1     1        1
    1     0       -1
    0     1       -1
    0     0       -1
• We initialize the weights to w1 = 0, w2 = 0
and b = 0. We also set α = 1 and θ = 0.2.
• The following table shows the sequence in which
the net is presented with the inputs one by one and
checked against the required target.
    Epoch  Input      Bias  Net input  Output  Target  Weights
           x1   x2          yin        y       t       w1   w2   b
                                                       0    0    0
    1      1    1     1     0          0        1      1    1    1
           1    0     1     2          1       -1      0    1    0
           0    1     1     1          1       -1      0    0   -1
           0    0     1    -1         -1       -1      0    0   -1
    2      1    1     1    -1         -1        1      1    1    0
           1    0     1     1          1       -1      0    1   -1
           0    1     1     0          0       -1      0    0   -2
           0    0     1    -2         -1       -1      0    0   -2
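The rows of the table can be checked programmatically. The sketch below (illustrative helper name) runs one epoch of the update rule and records the weights after each pattern; starting from zero weights it reproduces the epoch-1 rows:

```python
def epoch_trace(samples, targets, w, b, alpha=1.0, theta=0.2):
    """Run one training epoch; record (w1, w2, b) after each pattern."""
    trace = []
    for x, t in zip(samples, targets):
        yin = b + w[0] * x[0] + w[1] * x[1]
        y = 1 if yin > theta else (-1 if yin < -theta else 0)
        if y != t:
            w = [w[0] + alpha * x[0] * t, w[1] + alpha * x[1] * t]
            b = b + alpha * t
        trace.append((w[0], w[1], b))
    return trace

samples = [(1, 1), (1, 0), (0, 1), (0, 0)]
targets = [1, -1, -1, -1]
# Epoch 1 from zero weights matches the table:
# (1,1,1), (0,1,0), (0,0,-1), (0,0,-1)
trace = epoch_trace(samples, targets, [0, 0], 0)
```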
After further epochs the weights stop changing. Epoch 10 gives:

    Epoch  Input      Bias  Net input  Output  Target  Weights
           x1   x2          yin        y       t       w1   w2   b
    10     1    1     1     1          1        1      2    3   -4
           1    0     1    -2         -1       -1      2    3   -4
           0    1     1    -1         -1       -1      2    3   -4
           0    0     1    -4         -1       -1      2    3   -4

Thus the positive response is given by all the points
such that
    2 x1 + 3 x2 - 4 > 0.2
And the negative response is given by all the points
such that
    2 x1 + 3 x2 - 4 < -0.2
• We note that the output of the Perceptron is
+1 (positive) if the net input yin is greater than θ.
• Also, the output is -1 (negative) if the net input
yin is less than -θ.
• We also know that, in this case,
    yin = b + x1*w1 + x2*w2
• Thus we have
    2*x1 + 3*x2 - 4 > 0.2    (+ve output)
    2*x1 + 3*x2 - 4 < -0.2   (-ve output)
• Replacing the inequalities with equalities and simplifying
gives us the following two equations:
    2*x1 + 3*x2 = 4.2
    2*x1 + 3*x2 = 3.8
• These two equations are equations of straight lines in
the x1,x2-plane.
• These lines geometrically separate the input
training data into two classes.
• Hence they act as decision boundaries.
• The two lines have the same slope, but their
y-intercepts are different. So essentially, we have two
parallel straight lines acting as decision boundaries
for our input data.
• The parameter θ determines the separation between
the two lines (and hence the width of the indecisive
region).
• Graphically:
[Figure: the parallel lines 2 x1 + 3 x2 = 4.2 and 2 x1 + 3 x2 = 3.8 plotted in
the x1,x2-plane, together with the training points (0,0), (1,0), (0,1) and
(1,1). The strip between the two lines is the undecided region.]
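These boundary conditions are easy to verify numerically. The sketch below (hypothetical helper name) classifies each training point with the converged weights:

```python
def classify(x1, x2, theta=0.2):
    """Classify a point using the converged boundary 2*x1 + 3*x2 - 4."""
    yin = 2 * x1 + 3 * x2 - 4
    if yin > theta:
        return 1
    if yin < -theta:
        return -1
    return 0

# Only (1,1) lies on the positive side of both lines
results = {p: classify(*p) for p in [(1, 1), (1, 0), (0, 1), (0, 0)]}
```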
Perceptron: Matlab Code
• The following slides contain the
Matlab source code for the
simple Perceptron studied in
the class.
• The first assignment is also
given in these slides.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Lecture 2: Artificial Neural Networks
%   By Dr. Anwar M. Mirza
%   Department of Computer Science
%   NUCES-FAST, Islamabad, PAKISTAN
%   Date: December 30, 2001
%   Last Modified: February 09, 2007
% Purpose:
%   This Matlab code is based on the Perceptron learning rule
%   Part 1: Training data and various control parameters are defined
%   Part 2: Perceptron learning rule is used for training of the net
%   Part 3: The resulting network is tested on the input data
%   Part 4: Some exercises
%%%%%%%%%%%%%%%%%%%%%%%%% Part 1 %%%%%%%%%%%%%%%%%%
clear all
% Initialize various parameters
w(1) = 0;                 % Initialize weights; w(1) is the bias weight
w(2) = 0; w(3) = 0;
alpha = 1;                % Learning rate
theta = 0.2;              % Half-width of the non-decisive region
stopFlag = -1;            % Stopping flag: -1 = do not stop, +1 = stop
wtChange = 1;             % Weight-change flag: -1 = no change, +1 = change
epsilon = 0.001;          % Termination criterion
epoch = 0;                % Epoch counter

% Define training patterns (first column is the bias input)
noOfTrainingPatterns = 4;
s(1,:) = [1 1 1]; t(1,1) =  1;
s(2,:) = [1 1 0]; t(2,1) = -1;
s(3,:) = [1 0 1]; t(3,1) = -1;
s(4,:) = [1 0 0]; t(4,1) = -1;

%%%%%%%%%%%%%%%%%% Part 2 %%%%%%%%%%%%%%%%%%
% Step 2: While stopping condition is false, do steps 3 to 7
while stopFlag < 0
    % Step 3: For each training pair s:t, do steps 4 to 6
    wtChange = -1;
    for i = 1:noOfTrainingPatterns
        % Step 4: Set activations of the input units
        x = s(i,:); outDesired = t(i,:);
        % Step 5: Compute the activation of the output unit
        yin = x*w';
        if yin > theta
            y = 1.;
        elseif yin < -theta
            y = -1.;
        else
            y = 0.;
        end
        % Step 6: Update bias and weights if an error occurred
        if abs(outDesired - y) > epsilon
            w = w + (alpha*outDesired).*x;
            wtChange = 1.;
        end
        epoch, x, yin, y, outDesired, w
    end
    % Step 7: Test the stopping condition
    if wtChange < 0
        stopFlag = 1.;
    end
    epoch = epoch + 1;
end % of while loop

%%%%%%%%%% Part 3 %%%%%%%%%%%%
% Pick one pattern at random from the training data
% (ceil ensures every pattern index 1..noOfTrainingPatterns can be chosen)
p = ceil(rand*noOfTrainingPatterns)
x = s(p,:)
outDesired = t(p,:)
% For testing, give this pattern as input to the trained net
% and find the output
yin = x*w';
if yin > theta
    y = 1.;
elseif yin < -theta
    y = -1.;
else
    y = 0.;
end
% Print y and the desired output
y, outDesired
Assignment No. 1
Note: Please submit your assignment (both in hard- and soft-copy form) by
4:00pm, Thursday, January 24th, 2008. Please carry out the following
exercises by making appropriate modifications to the above code.

Exercise 1: Consider the following training data for the Perceptron used above:

    Inputs                   Target
    x1    x2    x3    bias   t
    1     1     1     1       1
    1     1     0     1      -1
    1     0     1     1      -1
    0     1     1     1      -1

(i.e. there is one more input neuron now). Verify that the iterations converge
after 26 epochs for a learning rate equal to 1.0 and theta equal to 0.1, and
that the converged set of weights and bias is w1 = 2, w2 = 3, w3 = 4 and b = -8.

Exercise 2: Plot the changes in the separating lines as they occur in Exercise 1.

Exercise 3: Using the above code, find the weights required to perform the
following classification:
    Vectors (1,1,1,1) and (-1,1,-1,-1) are members of the class (and
    therefore have target value 1);
    Vectors (1,1,1,-1) and (1,-1,-1,1) are not members of the class (and
    have target value -1).
Use a learning rate of 1 and starting weights of 0. Using each of the training
vectors as input, test the response of the net. Display the number of epochs
taken to reach convergence. Also plot the maximum error in each epoch.