A Brief Introduction to Netlab
Ian Nabney's Neural Network Toolbox for Matlab
Todd Leen
April 2005
The Netlab toolbox is available online at http://www.ncrg.aston.ac.uk/netlab/. The
toolbox is installed at OGI/CSEE and at PSU/CS. In order to use it, you must be sure that
Netlab is in the search path for your Matlab session. You can do this from within Matlab
by entering the command
>> path(path,'/usr/local/matlab-r14/toolbox/netlab')
if you're an OGI student. Melanie has the correct path for the PSU installation.
There is an overview and several examples at the class website. The toolbox consists of
a set of Matlab m-files, and so is compatible with Matlab running on any platform.
There are help comments in the function files, but no extended online documentation.
The companion volume Netlab – Algorithms for Pattern Recognition, Ian T. Nabney,
Springer, 2001 contains detailed documentation.
My goal here is to provide enough of an introduction to allow you to use some of the
functions in the toolbox. This is not intended to be comprehensive, and I’ll just give you
the simplest pieces to get you started.
If you are unfamiliar with Matlab, there are several good on-line tutorials including those
at
http://spicerack.sr.unh.edu/~mathadm/tutorial/software/matlab/
http://www.engin.umich.edu/group/ctm/
http://web.usna.navy.mil/~mecheng/DESIGN/CAD/MATLAB/usna.html
and the OGI/CSE 15-minute tutorial at
http://www.cse.ogi.edu/CFST/tut/matlab.html .
1. Network Examples and Demos
There are several training examples for both regression and classification problems in the
demonstration program demnlab. There’s a GUI for training your own data in the
demonstration program demtrain.
2. Neural Network Setup and Parameter Optimization
Netlab uses a data structure for neural network parameters rather than a simple vector of
weights. Hence one requires a wrapper function to interface the optimization algorithms
to the network data structure. The function netopt provides this wrapper.
The data structures for the network are established by the call
>> net = mlp(2,20,1,'linear');
which specifies a multilayer perceptron with 2 input nodes, 20 hidden nodes, and 1 output
node. The output node is further specified to have a linear response function. Other
choices are 'logistic' or 'softmax', both of which are used for classification.
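For example, either of the following sets up a classification network (the layer sizes here are just illustrative):
>> net = mlp(2, 10, 1, 'logistic');    % two-class problem: a single logistic output unit
>> net = mlp(2, 10, 3, 'softmax');     % three-class problem: softmax over 3 output units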
The network weights are trained by
>> [net,options] = netopt(net, options, xtrain, ytrain, 'conjgrad');
which trains the network specified in the data structure net on the input/target data in
xtrain/ytrain using conjugate gradient descent. The call to netopt above returns
the optimized weights in the data structure net. Various algorithms are controlled by
the 18 values in the vector options. To get an idea of what these are, enter
>> help foptions
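For example, the two fields used later in this note can be set as follows (the values are just illustrative):
>> options = zeros(1,18);    % start with all fields zero
>> options(1) = 1;           % display the error value during training
>> options(14) = 100;        % maximum number of training iterations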
Optimization methods that can be used in the function netopt include
conjgrad  - conjugate gradient descent
scg       - scaled conjugate gradient descent
quasinew  - quasi-Newton methods
graddesc  - gradient descent optimization
The above are all batch-mode optimization routines. The advantage of the conjugate
gradient, scaled conjugate gradient, and quasi-Newton optimizers is that you don't need
to specify the learning rate. These algorithms use line searches instead.
You can also train networks with online, or stochastic, gradient descent. For this, the call
to netopt is replaced with a call to the function olgd (for on-line gradient descent).
For on-line gradient descent, you can choose between cyclic sampling of the training data
(options(5)=0) and random sampling with replacement (options(5)=1).
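A minimal sketch of an on-line gradient descent run follows. The use of options(14) as the number of passes through the data, options(17) as the momentum, and options(18) as the learning rate is my reading of the usual Netlab conventions, so check help olgd before relying on it:
>> options = zeros(1,18);
>> options(1)  = 1;       % display the error value during training
>> options(5)  = 1;       % sample training patterns at random, with replacement
>> options(14) = 5;       % number of passes through the training data (see help olgd)
>> options(17) = 0.9;     % momentum (assumed convention; see help olgd)
>> options(18) = 0.01;    % learning rate (assumed convention; see help olgd)
>> [net, options] = olgd(net, options, xtrain, ytrain);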
We can develop an entire training and testing script (for a regression problem) with just a
few more statements:
% Short neural network training and testing script
net = mlp(1,20,1,'linear');                       % Specify the network architecture
options = zeros(1,18);
options(1) = 1;                                   % Display the error while running
options(14) = 500;                                % Maximum number of training iterations

% Train the network (using conjugate gradient descent here)
[net,options] = netopt(net, options, xtrain, ytrain, 'conjgrad');

% Plot the training data and the network function
xplot = -5:0.05:5;                                % Input values at which to evaluate the network
netvals = mlpfwd(net, xplot');                    % Network output for the inputs in xplot
plot(xplot', netvals, xtrain, ytrain, 'o');

% Error on training and test sets. Note that the function mlperr
% returns one-half the sum squared error.
Etrain = 2*mlperr(net, xtrain, ytrain)            % Training error
Etest  = 2*mlperr(net, xtest, ytest)              % Test set error
3. Weight Decay Regularization
Netlab implements regularization by (zero mean) Gaussian priors on each network
parameter – i.e. by weight decay. The weight decay parameters are set by a call to
mlpprior
>> prior = mlpprior(nin, nhidden, nout, aw1, ab1, aw2, ab2)
where aw1 is the weight decay (hyper)parameter for the first layer weights, ab1 the
weight decay parameter for the first layer biases, aw2 the weight decay parameter for
the second layer weights, and ab2 the weight decay parameter for the second layer of
biases. (All of these are scalars – the same value is applied to each weight or bias in a
group.) To make the search over the regularization parameter simple, you should use the
same value for all four weight decay parameters.
The prior returned by mlpprior is passed as an argument to mlp, and the weight decay
parameters are then stored in the network data structure. So the complete statements for network setup and training
are
>> prior = mlpprior(nin, nhidden, nout, aw1, ab1, aw2, ab2);
>> net = mlp(nin, nhidden, nout, outfunc, prior);
>> [net,options] = netopt(net, options, trainIn, trainOut, method);
where outfunc is the output activation string (e.g. 'linear') and method is the name of the
optimizer (e.g. 'conjgrad').
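For example, using a single (hypothetical) value alpha for all four weight decay parameters and the conjugate gradient optimizer from above:
>> alpha = 0.01;                                   % hypothetical weight decay value
>> prior = mlpprior(1, 20, 1, alpha, alpha, alpha, alpha);
>> net = mlp(1, 20, 1, 'linear', prior);
>> options = zeros(1,18); options(1) = 1; options(14) = 500;
>> [net, options] = netopt(net, options, xtrain, ytrain, 'conjgrad');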
4. Single Layer Networks
Single layer networks can be set up using the function glm (for generalized linear model),
trained using the function glmtrain, and evaluated in forward mode using the function
glmfwd. The error is reported by the function glmerr. To set weight decay for a
single layer network, set the prior argument in the call to glm equal to the weight decay
parameter.
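A sketch of a single-layer regression fit along these lines, assuming glm, glmtrain, glmfwd, and glmerr follow the same calling pattern as their mlp counterparts (check the help for each function):
>> alpha = 0.01;                              % hypothetical weight decay value
>> net = glm(1, 1, 'linear', alpha);          % 1 input, 1 output, linear response, weight decay alpha
>> options = zeros(1,18); options(1) = 1; options(14) = 100;
>> net = glmtrain(net, options, xtrain, ytrain);
>> ypred = glmfwd(net, xtest);                % forward evaluation on the test inputs
>> Etest = 2*glmerr(net, xtest, ytest)        % assumed, like mlperr, to return half the sum squared error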