A Brief Introduction to Netlab
Ian Nabney's Neural Network Toolbox for MatLab
Todd Leen
April 2005

The Netlab toolbox is available online at http://www.ncrg.aston.ac.uk/netlab/.
The toolbox is installed at OGI/CSEE and at PSU/CS. In order to use it, you must
make sure that Netlab is in the search path for your Matlab session. You can do
this from within Matlab by entering the command

   >> path(path,'/usr/local/matlab-r14/toolbox/netlab')

if you are an OGI student. Melanie has the correct path for the PSU installation.
There is an overview, and several examples, at the class website.

The toolbox consists of a set of Matlab m-files, and so is compatible with MatLab
running on any platform. There are help comments in the function files, but no
extended online documentation. The companion volume

   Netlab - Algorithms for Pattern Recognition, Ian T. Nabney, Springer, 2001

contains detailed documentation.

My goal here is to provide enough of an introduction to allow you to use some of
the functions in the toolbox. This is not intended to be comprehensive; I will
just give you the simplest pieces to get you started.

If you are unfamiliar with Matlab, there are several good on-line tutorials,
including those at

   http://spicerack.sr.unh.edu/~mathadm/tutorial/software/matlab/
   http://www.engin.umich.edu/group/ctm/
   http://web.usna.navy.mil/~mecheng/DESIGN/CAD/MATLAB/usna.html

and the OGI/CSE 15-minute tutorial at

   http://www.cse.ogi.edu/CFST/tut/matlab.html

1. Network Examples and Demos

There are several training examples for both regression and classification
problems in the demonstration program demnlab. There is a GUI for training your
own data in the demonstration program demtrain.

2. Neural Network Setup and Parameter Optimization

Netlab uses a data structure for neural network parameters rather than a simple
vector of weights. Hence one requires a wrapper function to interface the
optimization algorithms to the network data structure. The function netopt
provides this wrapper.

The data structures for the network are established by the call

   >> net = mlp(2,20,1,'linear');

which specifies a multilayer perceptron with 2 input nodes, 20 hidden nodes, and
1 output node. The output node is further specified to have a linear response
function. Other choices are 'logistic' or 'softmax'; both are used for
classification.

The network weights are trained by

   >> [net,options] = netopt(net, options, xtrain, ytrain, 'conjgrad');

which trains the network specified in the data structure net on the input/target
data in xtrain/ytrain using conjugate gradient descent. The call to netopt above
returns the optimized weights in the data structure net.

The various algorithms are controlled by the 18 values in the vector options. To
get an idea of what these are, type

   >> help foptions

Optimization methods that can be used in the function netopt include

   conjgrad  - conjugate gradient descent
   scg       - scaled conjugate gradient descent
   quasinew  - quasi-Newton methods
   graddesc  - gradient descent optimization

The above are all batch-mode optimization routines. The advantage of the
conjugate gradient, scaled conjugate gradient, and quasi-Newton optimizers is
that you do not need to specify the learning rate; these algorithms use line
searches instead.

You can also train networks with online, or stochastic, gradient descent. For
this, the call to netopt is replaced with a call to the function olgd (for
on-line gradient descent). For on-line gradient descent, you can choose between
cyclic sampling of the training data (options(5)=0) or random sampling with
replacement (options(5)=1). A minimal sketch of an olgd call appears below.
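The following sketch of an olgd call is not from the original handout. The option
indices used for the momentum (options(17)) and learning rate (options(18)) are
assumptions based on Netlab's gradient descent conventions; check help olgd to
confirm them.

   % Sketch: on-line (stochastic) gradient descent with olgd, assuming the
   % network net and the data xtrain/ytrain have already been set up.
   options = zeros(1,18);
   options(1)  = 1;      % display error values while training
   options(5)  = 1;      % random sampling with replacement (0 for cyclic)
   options(14) = 500;    % maximum number of training iterations
   options(17) = 0.5;    % momentum (assumed index and value; see help olgd)
   options(18) = 0.01;   % learning rate (assumed index and value; see help olgd)
   [net, options] = olgd(net, options, xtrain, ytrain);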
We can develop an entire training and testing script (for a regression problem)
with just a few more statements:

   % Short neural network training and testing script

   net = mlp(1,20,1,'linear');    % Specify the network architecture
   options = zeros(1,18);
   options(1) = 1;                % Display the error while running
   options(14) = 500;             % Maximum number of training iterations

   % Train the network (using conjugate gradient descent here)
   [net,options] = netopt(net, options, xtrain, ytrain, 'conjgrad');

   % Plot the training data AND the network function
   xplot = -5:0.05:5;
   netvals = mlpfwd(net,xplot');  % Evaluate the network output for the
                                  % input values in xplot
   plot(xplot, netvals, xtrain, ytrain, 'o');

   % Error on training and test sets. Note that the function mlperr
   % returns one-half the sum squared error.
   Etrain = 2*mlperr(net,xtrain,ytrain)   % Returns the training error
   Etest  = 2*mlperr(net,xtest,ytest)     % Returns the test set error

3. Weight Decay Regularization

Netlab implements regularization by (zero mean) Gaussian priors on each network
parameter, i.e. by weight decay. The weight decay parameters are set by a call
to mlpprior

   >> prior = mlpprior(nin, nhidden, nout, aw1, ab1, aw2, ab2)

where aw1 is the weight decay (hyper)parameter for the first layer weights, ab1
the weight decay parameter for the first layer biases, aw2 the weight decay
parameter for the second layer weights, and ab2 the weight decay parameter for
the second layer biases. (All of these are scalars; the same value is applied to
each weight or bias in a group.) To make the search over the regularization
parameter simple, you should use the same value for all four weight decay
parameters.

The prior returned by mlpprior is passed as an argument to mlp, and the weight
decay parameters appear in the network data structure. So the complete
statements for network setup and training are

   >> prior = mlpprior(nin, nhidden, nout, aw1, ab1, aw2, ab2);
   >> net = mlp(nin, nhidden, nout, function, prior);
   >> [net,options] = netopt(net, options, trainIn, trainOut, method);

4. Single Layer Networks

Single layer networks can be set up using the function glm (for generalized
linear model), trained using the function glmtrain, and evaluated in the forward
mode using the function glmfwd. The error is reported using the function glmerr.
To set weight decay for a single layer network, set the prior argument in the
call to glm equal to the weight decay parameter.
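As an illustration (not part of the original handout), a single layer regression
network might be set up, trained, and evaluated as follows. The variable names
nin, nout, xtrain, ytrain, xtest, ytest and the weight decay value alpha are
assumptions.

   % Sketch: single layer network with weight decay, assuming nin inputs,
   % nout outputs, and data in xtrain/ytrain and xtest/ytest.
   alpha = 0.01;                            % weight decay parameter (assumed value)
   net = glm(nin, nout, 'linear', alpha);   % set up the generalized linear model
   options = zeros(1,18);
   options(1) = 1;                          % display error values while training
   options(14) = 100;                       % maximum number of training iterations
   net = glmtrain(net, options, xtrain, ytrain);
   yvals = glmfwd(net, xtest);              % network outputs on the test inputs
   Etest = 2*glmerr(net, xtest, ytest)      % test set error; like mlperr, glmerr
                                            % returns half the sum squared error
                                            % (plus the prior term when weight
                                            % decay is used)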