DEPARTMENT OF ELECTRICAL & COMPUTER ENGINEERING
CAP 6615
Neural Networks
Rajesh Pydipati
Introduction
The objective of taking this course was to gain a clear understanding of the concepts involved in neural network computing, so that the technology can be applied to a wide range of real-world problems in the fields mentioned below. The emphasis was mainly on programming, in order to observe the algorithms at work.
Course Description
Objectives: Understand the concepts and learn the techniques of neural network computing.
Prerequisites: A familiarity with basic concepts in calculus, linear algebra, and probability theory. Calculus requirements include: differentiation, chain rule, integration. Linear algebra requirements include: matrix multiplication, inverse, pseudo-inverse.
Main topics: Introduction to neural computational models including classification, association, optimization, and self-organization. Learning and discovery. Knowledge-based neural network design and algorithms.
Applications include: pattern recognition, expert systems, control, signal analysis, and computer vision.
Syllabus
Basic neural computational models
Feedforward networks
Learning / back propagation
Association networks
Classification
Self-Organization
Radial Basis Function networks
Support Vector Machines
Networks based on lattice computation
Applications
Projects:
A set of four projects was completed as part of this course. A detailed description of each project, along with the approach to its solution, is presented next.
Project 1:
Problem statement:
Project 1a: Implement the SLP learning algorithm. Implement the algorithm yourselves; do not use any ANN package. Train your SLP to classify the capital letter patterns A, B, C, and D into two classes, C1 and -C1, as follows: A belongs to C1; B, C, and D all belong to -C1. After training, test whether your SLP correctly classifies the same four patterns. You may use either the unipolar or the bipolar version of the patterns.
Approach:
This problem was intended to build an understanding of the basic working of a neural network circuit, the ‘perceptron’. The task was to identify four different letter patterns, each fed to the network as a stream of ‘0’s and ‘1’s. After the network architecture was constructed, it was trained on the patterns; once training was complete, the patterns were presented again to test the efficacy of the algorithm. The algorithms were written in MATLAB.
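For reference, a minimal sketch of the perceptron learning rule as applied to this task is given below. It is only an illustration, not the submitted code: random bipolar vectors stand in for the flattened ‘A’, ‘B’, ‘C’, ‘D’ bitmaps so that the sketch runs on its own, and the parameter values are arbitrary.

% Minimal single-layer perceptron (SLP) sketch with bipolar patterns.
% Random vectors stand in for the flattened letter bitmaps A, B, C, D.
X = sign(randn(25, 4));  X(X == 0) = 1;   % columns: stand-ins for A, B, C, D
t = [1 -1 -1 -1];                         % A -> C1 (+1); B, C, D -> -C1 (-1)
w = zeros(25, 1);  b = 0;  eta = 1;
changed = true;
while changed                             % repeat until nothing is misclassified
    changed = false;
    for k = 1:4
        if sign(w.' * X(:, k) + b) ~= t(k)
            w = w + eta * t(k) * X(:, k); % perceptron update on a mistake
            b = b + eta * t(k);
            changed = true;
        end
    end
end
disp(sign(w.' * X + b))                   % should reproduce t after training

With the actual letter bitmaps substituted for the random columns of X, the same loop implements the training described above.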
Results:
The network was able to classify the four letter patterns perfectly.
Project 2a: Implement an SLP to solve the following problem:
1. Randomly choose 1000 points on either side of the line y = 0.5x + 2. Do not choose points exactly on the line. Also, pick the points between fixed bounds b1 <= x <= b2 such that b2 - b1 < 100, as shown in the figure below.
2. Train the SLP to discriminate between the two classes of points. Use a sequential application of the points. Then pick 5 test points (not from the training set!) from either side of the line to test your SLP on. Do they classify correctly? (The difficulty will be when the test points are close to the line.) Print out the equation of the line obtained from the final weights. If the 5 points were not correctly classified, use them as additional training points and retrain the network. Then pick another 5 points and test again.
Approach:
This problem was intended to build an understanding of how a single-layer perceptron forms decision surfaces to separate clusters of data. The problem involved classifying the data points into two clusters by forming a hyperplane as the decision boundary. After the network architecture was constructed, it was trained on the points; once training was complete, further data points were presented to test the efficacy of the algorithm. The algorithms were written in MATLAB.
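A rough sketch of how such an experiment can be set up is shown below: labelled points are generated on either side of y = 0.5x + 2, the SLP is trained sequentially, and the separating line is read back from the final weights. The bounds, offsets, and learning rate are illustrative choices, not necessarily those used in the project.

% Sketch: points about y = 0.5x + 2, sequential SLP training, and the
% line equation recovered from the final weights.
b1 = -10;  b2 = 10;                          % fixed bounds with b2 - b1 < 100
N  = 1000;
x  = b1 + (b2 - b1) * rand(N, 1);
off = (2 * (rand(N, 1) > 0.5) - 1) .* (1 + 5 * rand(N, 1));   % never zero
y  = 0.5 * x + 2 + off;                      % points strictly off the line
t  = sign(off);                              % +1 above the line, -1 below
w  = zeros(3, 1);  eta = 0.01;               % w = [w1; w2; bias]
for epoch = 1:5000                           % sequential presentation
    mistakes = 0;
    for k = 1:N
        xk = [x(k); y(k); 1];
        if sign(w.' * xk) ~= t(k)
            w = w + eta * t(k) * xk;         % perceptron update
            mistakes = mistakes + 1;
        end
    end
    if mistakes == 0, break; end             % all training points classified
end
% Decision boundary w1*x + w2*y + w3 = 0, i.e. y = -(w1/w2)*x - w3/w2
fprintf('learned line: y = %.3f * x + %.3f\n', -w(1)/w(2), -w(3)/w(2));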
Results:
The network was able to perfectly classify the data points according to the decision boundary.
Project 2
Problem statement:
Project 2 is more of a research project and consists of implementing the backpropagation training algorithm
for multilayer perceptrons. Use Fisher's Iris dataset to train and test a multiclass MLP using the
backpropagation method. The dataset comprises 3 groups (classes) of 50 patterns each. One group
corresponds to one species of Iris flower. Every pattern has 4 real-valued features. The number of input and
output neurons is known; the number of hidden layers and hidden neurons are your choice.
Train your network using 13 exemplar patterns from each class (roughly 25% of the patterns) picked at random. Then use the remaining patterns in the dataset to test the network and report the results.
Next, train your network using 25 exemplar patterns from each class (i.e., 50% of the patterns) picked at random. Use the remaining patterns in the set to test the network and report the results.
Next, train your network using 38 exemplar patterns from each class (roughly 75% of the patterns) picked at random. Use the remaining patterns to test the network and report the results.
Finally, train your network on the entire pattern set. Then use the same patterns to test the network and report the results. Note that this last experiment may pose serious convergence problems.
Explore techniques such as momentum to increase convergence speed, try various network architectures
(number of hidden layers and neurons in the hidden layers), investigate various stopping criteria and ways
to adjust learning rate and other parameters you might have.
Fisher's Iris Dataset
R.A. Fisher's Iris dataset is often referenced in the field of pattern recognition. It consists of 3 groups
(classes) of 50 patterns each. One group corresponds to one species of Iris flower: Iris Setosa (class C1), Iris
Versicolor (class C2), and Iris Virginica (class C3). Every pattern has 4 features (attributes), representing
petal width, petal length, sepal width, and sepal length (expressed in centimeters). The dataset file contains
one pattern per line, starting with the class number, followed by the 4 features. Lines are terminated with
single LF characters. Patterns are grouped by class.
Irises by Vincent van Gogh (oil on canvas 1889)
Approach:
A network based on the back-propagation principle was constructed to train on the dataset and classify the classes mentioned above. The material below explains the architecture in detail.
Basic functions implemented in the neural network algorithm: a one-hidden-layer feed-forward MLP network, with different, randomly selected training and test data sets.
Software used: MATLAB (run on the Windows platform)
Important considerations: Number of layers, Number of Processing elements in each layer, Randomizing the training and test data sets, Expressive power, Training error, Activation function, Scaling input, Target values, Initializing weights, Learning rate, Momentum learning, Stopping criterion, Criterion function
1. Number of layers:
Multilayer neural networks implement linear discriminants in a space where the inputs have been mapped non-linearly. Non-linear multilayer networks have greater computational or expressive power than simple 2-layer networks (input and output layers only) and can implement more functions. Given a sufficient number of hidden units, essentially any continuous function can be represented. For this project, a one-hidden-layer MLP was chosen in order to limit the complexity of the decision surface.
2. Number of Processing elements in each layer:
The number of PE’s in the input and output layers follows from the key features of the input and output spaces. A careful look at each feature set reveals the principal components that can be used as distinguishing features between the three classes of Iris flowers to be classified. Every pattern has 4 features (attributes), representing petal width, petal length, sepal width, and sepal length (expressed in centimeters). An attempt was made to reduce the input space in order to reduce the overall complexity of the classifier. However, it should not be
forgotten that neglecting any of the features without proper reasoning might amount to losing key information and hence reduce the accuracy of the classifier. The input space was therefore analyzed for its principal components using the PCA algorithm, which was also implemented in the source code developed for the neural-network solution in this project. The number of output PE’s is 3, since each input pattern must be assigned to one of the three classes given in the problem definition. Choosing the number of PE’s in the hidden layer is, however, a more intuitive task. It was found that 3 hidden PE’s give the best results; varying the number of hidden PE’s within a reasonable range did not alter the accuracies much.
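The PCA step mentioned above can be sketched as follows. This is only an outline: the dataset file name is assumed, and the variable names are illustrative rather than those of the submitted code.

% Illustrative PCA on the 4 Iris features. The file is assumed to hold one
% pattern per line: the class number followed by the 4 feature values.
raw      = load('iris.dat');                  % hypothetical file name
irisData = raw(:, 2:5);                       % 150 x 4 feature matrix
Xc = irisData - repmat(mean(irisData, 1), size(irisData, 1), 1);  % zero-mean
C  = (Xc.' * Xc) / (size(Xc, 1) - 1);         % 4 x 4 covariance matrix
[V, D] = eig(C);                              % eigenvectors and eigenvalues
[lambda, order] = sort(diag(D), 'descend');   % rank components by variance
V = V(:, order);
explained = lambda / sum(lambda);             % fraction of variance per component
scores = Xc * V;                              % patterns in principal-component axes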
3. Randomizing the training and test data sets:
For most practical purposes, randomizing the training and test data sets is important. A total of 13, 25, 38, and 50 training patterns per class, respectively, were used in the four parts of the project; correspondingly, 37, 25, 12, and 50 test patterns per class were used. The data were randomly permuted before being fed forward through the network.
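One possible way of drawing such a random per-class split is sketched below; classData stands for the 50 x 4 block of patterns of a single class (here a placeholder), and nTrain takes the value 13, 25, 38, or 50 depending on the experiment.

% Random train/test split for one class; classData is a placeholder here.
classData = rand(50, 4);                      % stands in for one class's 50 patterns
nTrain    = 13;                               % 13, 25, 38, or 50 per class
perm      = randperm(50);                     % random order of the 50 patterns
trainSet  = classData(perm(1:nTrain), :);
testSet   = classData(perm(nTrain+1:end), :);
% The three per-class training sets are then concatenated and permuted once
% more before being presented sequentially to the network.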
4. Expressive power:
Although there can be cause to use networks with a different activation function for each layer, or even for each unit within a layer, identical non-linear activation functions were used here to simplify the mathematical analysis.
5. Training error:
The training error on a pattern is the sum over the output units of the squared difference between the desired output d_k and the actual output y_k:
J(w) = (1/2) * ||d - y||^2
The training error for the hidden layer is obtained by back-propagating the output-layer errors.
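A minimal sketch of one forward and backward pass for the one-hidden-layer network described here, with sigmoid units in both layers, is given below. The layer sizes follow the choices discussed in this report, while the learning rate, the small random initial weights, and the placeholder pattern and target are illustrative.

% One forward/backward pass of a 4-3-3 MLP with sigmoid units (sketch).
sig = @(a) 1 ./ (1 + exp(-a));                % logistic activation
nIn = 4;  nHid = 3;  nOut = 3;  eta = 0.1;
W1 = 0.1 * randn(nHid, nIn);   b1 = zeros(nHid, 1);
W2 = 0.1 * randn(nOut, nHid);  b2 = zeros(nOut, 1);
x = rand(nIn, 1);                             % placeholder input pattern
d = [1; 0; 0];                                % placeholder target (class 1)
% Forward pass
h = sig(W1 * x + b1);                         % hidden activations
y = sig(W2 * h + b2);                         % network outputs
J = 0.5 * norm(d - y)^2;                      % J(w) = (1/2) * ||d - y||^2
% Backward pass: output deltas, then hidden deltas by back-propagation
delta2 = (y - d) .* y .* (1 - y);
delta1 = (W2.' * delta2) .* h .* (1 - h);
% Gradient-descent weight updates
W2 = W2 - eta * delta2 * h.';   b2 = b2 - eta * delta2;
W1 = W1 - eta * delta1 * x.';   b1 = b1 - eta * delta1;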
6. Activation function:
The important constraint is that the activation function should be continuous and differentiable. The sigmoid is a smooth, differentiable, non-linear, saturating function. A minor benefit is that its derivative can be expressed easily in terms of the function itself, which is why it was chosen.
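The property referred to here is that, for the logistic sigmoid sigma(a) = 1/(1 + exp(-a)), the derivative is sigma'(a) = sigma(a) * (1 - sigma(a)); a short numerical check:

sig  = @(a) 1 ./ (1 + exp(-a));               % logistic sigmoid
dsig = @(a) sig(a) .* (1 - sig(a));           % derivative written via sig itself
a = linspace(-5, 5, 11);
max(abs(dsig(a) - exp(-a) ./ (1 + exp(-a)).^2))   % analytic derivative: ~0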
7. Scaling input:
To avoid difficulties due to differences in scale between the inputs, the input patterns should be shifted so that the average of each feature over the training set is zero. Since online (sequential) protocols do not have the full data set available at any one time, scaling of the inputs was not found to be necessary here.
8. Target values:
For a finite value of net_k, the output can never reach the saturation value, and thus there would always be some error. Full training would never terminate, because the weights would grow extremely large as net_k was driven to plus or minus infinity. Target values corresponding to 2*(desired - 1) were therefore used here.
9. Initializing weights:
For uniform learning, i.e., for all weights to reach their equilibrium values at roughly the same time, the initialization of the weights is very important. With non-uniform learning, one category is learned well before the others, and the overall error rate is typically higher than necessary due to the redistribution of error. To encourage uniform learning, the weights were randomly initialized within each layer.
10. Learning rate:
The optimal step size is given by eta_opt = (d^2 J / d w^2)^(-1). The proper setting of this parameter greatly affects the convergence as well as the classification accuracy. After many trials, it was found that this parameter should be very small (a value in the range 1/10 to 1/1000) to obtain accurate results.
11. Momentum learning:
Error surfaces often have multiple minima in which dJ(w)/dw is very small. These arise when there are too many weights, so that the error depends only weakly upon any one of them. Momentum allows the network to learn more quickly. In the narrow, steep regions of the weight space, the effect of the momentum term is to focus the movement in a downhill direction by averaging out the components of the gradient that alternate in sign [Gupta, Homma et al]. After many trials, it was found that the momentum parameter should be less than 1 for this particular application. Varying this parameter between 0 and 1 did not adversely affect the performance; it should be noted, however, that increasing its value significantly reduces the convergence speed.
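The momentum-augmented update can be sketched on a toy one-dimensional error surface so that the snippet runs on its own; in the MLP the same rule is applied per layer (for example dW2 = alpha*dW2 - eta*delta2*h', followed by W2 = W2 + dW2). The parameter values below are illustrative.

% Gradient descent with momentum on a toy error surface J(w) = 0.5 * w^2.
eta = 0.1;  alpha = 0.9;                      % learning rate and momentum (< 1)
w = 5;  dw = 0;
for it = 1:200
    gradJ = w;                                % dJ/dw for J(w) = 0.5 * w^2
    dw = alpha * dw - eta * gradJ;            % previous step folded into the new one
    w  = w + dw;
end
disp(w)                                       % w has been driven towards the minimum at 0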
12. Stopping criterion:
Usually the stopping criterion is that the error falls below the error achieved on a separate validation set (one containing no patterns from the training set). Here, however, after training and testing each class individually, it was found that the error at convergence is typically 0.4. This value was therefore used as the stopping criterion, both to avoid over-fitting and for its simplicity of implementation.
13. Criterion function:
The squared error has been used as the criterion function for this project.
© 2003 Rajesh Pydipati
9
Fall 2003
CAP 6615: Neural Networks
Results:
The results of the neural-network classifier are presented in the form of a confusion matrix, whose columns represent the classification results and whose rows represent the class numbers. The diagonal terms give the correct results, and the off-diagonal terms in each column give the misclassifications for that particular class. Various experiments were conducted to maximize the classifier's performance. A sample result is shown below.
trainConf: classification results on the training data
testConf: classification results on the test data
Step size change
a) Initial step size = 0.0400
Number of processing elements in the hidden layer = 3
Momentum factor = 0.9000
trainConf =
    13     0     0
     0    13     0
     0     0    13
testConf =
    37     0     0
     0    33     0
     0     4    37
Project 3
Problem statement
Project 3 consists of implementing a training algorithm for a morphological perceptron with dendritic
structures (MPDS). Write a program that trains a two-input, two-output MPDS to solve the embedded
spirals problem.
The parametric equations of the two spirals are:
x1(theta) = theta * cos(theta) * 2 / pi
y1(theta) = theta * sin(theta) * 2 / pi
and
x2(theta) = – x1(theta)
y2(theta) = – y1(theta)
where theta = 0*pi/16 + pi/2, 1*pi/16 + pi/2, 2*pi/16 + pi/2, ..., 64*pi/16 + pi/2.
The spirals are initially sampled in 65 points, at angles ranging from pi/2 to 4*pi+pi/2 in uniform
increments of pi/16. These 2*65 points are provided for your convenience as a dataset file in the
Datasets section and will represent the first training set. The program will run in stages. At each stage, it
will train the SLMP, then double the number of points by subsampling the spirals (substituting theta),
and then test the SLMP on the entire set (consisting of the original training points together with the
intermediate test points). The stages are repeated until either correct classification occurs for all points,
or the number of points per spiral reaches 1025. The figure below illustrates the two spirals, each with
the initial 65 training points depicted as solid dots and the first test set of 64 intermediate points as
empty circles.
To summarize, your implementation must perform the following tasks:
constructs an SLMP and trains it on the initial training set;
generates 64 intermediate points per spiral, each point being on the spiral (and not on the edge connecting two points), dividing an arc piece in two halves, resulting in 129 points per spiral (as in the above figure);
tests the SLMP on the entire, 2*129 point, set and reports the results;
if recognition is 100% correct, then the program stops; otherwise, it retrains the SLMP on the new set of data and continues;
doubles the number of points by generating a new set of intermediate points on each spiral; thus, the new set will consist of 2*257 points;
tests the SLMP on the entire, 2*257 point, set and reports the results;
repeats this procedure (retraining, doubling the number of points, and then testing) until either recognition is 100% accurate, or the total number of points per spiral has reached 1025; reports the classification results on the last test set and then stops.
Approach:
As described in the problem statement, an algorithm was written in MATLAB to implement the morphological perceptron with dendritic structures. This algorithm loosely mimics the way the brain grows and retracts dendrites as learning progresses, hence its name.
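A short sketch of how the spiral training points and the intermediate test points follow from the parametric equations in the problem statement is given below (the SLMP training code itself is omitted).

% Generate the two spirals at the initial sampling density (65 points each).
theta = (0:64) * pi/16 + pi/2;            % pi/2 to 4*pi + pi/2 in steps of pi/16
x1 =  theta .* cos(theta) * 2/pi;         % first spiral
y1 =  theta .* sin(theta) * 2/pi;
x2 = -x1;                                 % second spiral: point reflection of the first
y2 = -y1;
% Intermediate test points lie on the spirals halfway between training
% angles (offset pi/32), giving 129 points per spiral after the first stage.
thetaMid = (0:63) * pi/16 + pi/2 + pi/32;
xm1 =  thetaMid .* cos(thetaMid) * 2/pi;   ym1 =  thetaMid .* sin(thetaMid) * 2/pi;
xm2 = -xm1;                                ym2 = -ym1;
plot(x1, y1, 'k.', x2, y2, 'r.', xm1, ym1, 'ko', xm2, ym2, 'ro');  % visual check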
Results:
The algorithm was able to classify the spirals accurately.
Project 4
Problem statement:
Project 4 consists of implementing several types of associative memories.
Project 4a: Implement the algorithm to create a Hopfield auto-associative memory that stores the capital
letter patterns A, B, C, E, and X. Test the memory on all of the following patterns:
o perfect (undistorted) A, B, C, E, and X;
o corrupted A5%, B5%, C5%, E5%, and X5% with 5% dilative, erosive, and random noise;
o corrupted A10%, B10%, C10%, E10%, and X10% with 10% dilative, erosive, and random noise.
Remember, however, that the Hopfield network requires bipolar data, so be sure to make the
necessary conversions.

Project 4b: Implement the algorithm to create a pair of morphological auto-associative memories
M and W that store the same capital letter patterns A, B, C, E, and X. As before, test the memory on
all of the following patterns:
o perfect (undistorted) A, B, C, E, and X;
o corrupted A5%, B5%, C5%, E5%, and X5% with 5% dilative, erosive, and random noise;
o corrupted A10%, B10%, C10%, E10%, and X10% with 10% dilative, erosive, and random noise.
Approach:
Hopfield associative memory
The code implementing the Hopfield associative memory was written in MATLAB.
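A minimal sketch of the storage and recall steps of a Hopfield auto-associative memory is shown below. Random bipolar vectors stand in for the letter bitmaps so that the sketch runs on its own (in the project, the unipolar bitmaps are first converted to bipolar form, as the problem statement requires); the sizes and noise level are illustrative.

% Hopfield auto-associative memory sketch with bipolar (+1/-1) patterns.
P = sign(randn(64, 5));  P(P == 0) = 1;   % stand-ins for the letters A, B, C, E, X
[n, m] = size(P);
W = (P * P.') / n;                        % outer-product (Hebbian) storage
W(1:n+1:end) = 0;                         % zero the self-connections
x = P(:, 1);  x(1:3) = -x(1:3);           % probe: first pattern with a few flipped bits
for it = 1:20                             % synchronous recall iterations
    xNew = sign(W * x);
    xNew(xNew == 0) = 1;                  % resolve ties
    if isequal(xNew, x), break; end       % stop at a fixed point
    x = xNew;
end
isequal(x, P(:, 1))                       % 1 if the stored pattern was recalled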
Observations:
1) In all the cases (undistorted, as well as 5% and 10% dilative, erosive, and random noise), the letter patterns ‘A’ and ‘X’ were recalled correctly. The main reason may be that these two patterns are largely dissimilar to the others, with a wide disparity in letter form between them. The other patterns were not recalled correctly, most likely because the letter patterns ‘B’, ‘C’, and ‘E’ are somewhat similar to one another.
The results are plotted using the ‘imshow’ function in MATLAB; the Image Processing Toolbox must be available in your version of MATLAB, otherwise the results cannot be displayed. The code was written in MATLAB on the Windows platform using version 6.12. No significant problems were encountered while executing the code. The results are evident from the figures that appear after executing the code, and the relevant insights regarding those results are given above.
Results:
Some results are shown here for the case of patterns with 5% dilative noise added:
Pattern ‘A’ recalled
Pattern ’B’ not recalled
Pattern ‘C’ not recalled
Pattern ‘E’ not recalled
Pattern ‘X’ recalled
Final Observation:
With the Hopfield associative memory, we observe that recall is successful only when the stored patterns are not very similar to one another. When the patterns are similar, the memory is confused and produces garbage results.
Morphological Associative memories using matrices ‘W’ and ‘M’
Observations:
In these memories, all of the letter patterns (undistorted, and with 5% dilative, erosive, and random noise) were recalled correctly by both the W and the M associative memories.
In the 10% noise case we observe that:
1) W is robust in the presence of erosive noise.
2) M is robust in the presence of dilative noise.
Even this is subtle to observe, since the recalls are perfect for all patterns with both the W and M associative memories, except for the pattern ‘E’ in the 10% erosive noise case, where W performs better than M, reinforcing the observations in (1) and (2) above. If more noise were added, the effects noted in (1) and (2) would probably be easier to appreciate.
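For completeness, a sketch of the min/max construction of the W and M auto-associative memories (in the Ritter style covered in the course) and of their recall step is given below; the binary stand-in patterns and the erosive corruption are illustrative only.

% Morphological auto-associative memories W and M (min / max construction).
X = double(rand(64, 5) > 0.5);             % stand-ins for the letter bitmaps
[n, m] = size(X);
W = inf(n);  M = -inf(n);
for k = 1:m                                % W(i,j) = min over patterns of x_i - x_j,
    D = X(:, k) * ones(1, n) - ones(n, 1) * X(:, k).';  % M(i,j) = max of x_i - x_j
    W = min(W, D);
    M = max(M, D);
end
x = X(:, 1);                               % probe: first pattern, slightly eroded
x(find(x, 3)) = 0;                         % erosive noise: a few 1s become 0s
yW = max(W + repmat(x.', n, 1), [], 2);    % max-plus recall with W
yM = min(M + repmat(x.', n, 1), [], 2);    % min-plus recall with M
[isequal(yW, X(:, 1)), isequal(yM, X(:, 1))]   % compare each recall with the original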
Results:
For the case of 10% erosive noise added patterns the following recalls were obtained.
This is Pattern ‘E’ (not clear on the white background)
Final Observation:
In general morphological memories performed better than Hopfield associative memories.
Additional patterns that were generated and their recalls using the W and M memories:
Results of additional patterns with Hopfield associative memory
References:
1) Madan M. Gupta, Liang Jin, and Noriyasu Homma, Static and Dynamic Neural Networks: From Fundamentals to Advanced Theory.
2) Simon S. Haykin, Neural Networks: A Comprehensive Foundation.
3) Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification.
4) Class notes of Prof. Gerhard Ritter, CAP 6615.