CSE 581 Machine Learning

Machine Learning
Homework 2-1
Due Tuesday, May 3, 2005
1. Non-linear auto-regression – the Mackey-Glass equation.
In this exercise you will use a neural net to predict the dynamics of a chaotic time
series generated by the Mackey-Glass delay differential equation. The data are in
the files mack2_tr.dat , mack2_val.dat, and mack2_tst.dat on
the class web page. There is a description of the data in the file mack2.txt. To
get a look at the behavior of this time series, plot the first column of data.
For your experiments, use the Netlab toolbox for Matlab. Both Matlab and the
Netlab package are on the OGI-CSEE and PSU-CS education compute servers.
The idea behind prediction of univariate time series is that their past behavior
provides the information needed to predict future behavior. For linear systems
and models, this is the basis for autoregressive (AR) time series models. For
nonlinear systems, the notion of predictability from the past is formally captured
by Takens’ theorem.
Takens’ theorem tells us that we can represent the dynamical state of the system
that generated the observed time series x(t) by embedding the time series in a
vector
v(t )  [ x(t ), x(t  ), x(t  2 ), x(t  3 ), ... x(t  (m  1)) ]
where  is the embedding lag, and m is the embedding dimension.
The data in the files are already embedded for you, using a lag and dimension known to
provide decent results for this data set.
The data consist of five columns. The first four are the lagged values used as inputs,
x(t), x(t − 6), x(t − 12), x(t − 18); the last column is the value to be predicted,
x(t + 85).
Thus we use the current value x(t), and three previous values to predict the value in the
future x(t+85).
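To see how such a five-column file could be produced from a raw series (the homework data are already embedded for you, so this is purely illustrative), here is a small sketch. It is written in Python rather than Matlab, and the function name embed and the toy series are my own, not part of the assignment:

```python
def embed(series, lags=(0, 6, 12, 18), horizon=85):
    """Build (input, target) pairs from a univariate series.

    Each input row is [x(t), x(t-6), x(t-12), x(t-18)] and the
    target is x(t+85), matching the five-column layout described
    above. `series` is a plain list of floats.
    """
    rows = []
    max_lag = max(lags)
    for t in range(max_lag, len(series) - horizon):
        inputs = [series[t - lag] for lag in lags]
        rows.append((inputs, series[t + horizon]))
    return rows

# Toy check on the series x(t) = t, where x(t+85) = x(t) + 85:
series = [float(t) for t in range(200)]
pairs = embed(series)
# first usable row is t = 18: inputs [18, 12, 6, 0], target 103
```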
Your assignment is to apply several neural nets to this prediction problem and obtain the
best results you can. You are to use mack2_tr.dat to actually optimize the network
weights. When you have arrived at what you think is your best network architecture
(size, optimization …), then run the test data mack2_tst.dat through your models.
(This is noise-free data, so you will not see overfitting.)
First fit a single linear node (linear AR model) to the problem. You can prepare, train,
and evaluate the one-node linear model using the NetLab functions glm, glmtrain,
and glmfwd. Plot the prediction results (on mack2_tst.dat) and also the data on the
same plot so you can compare the shapes visually. Measure and report the mean square
error on the test data as well.
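The mean square error is just the average squared difference between predictions and targets (in Matlab, mean((y - t).^2)). A minimal Python sketch with a made-up toy example:

```python
def mean_squared_error(predictions, targets):
    # (1/N) * sum of (y_i - t_i)^2 over all N test points.
    assert len(predictions) == len(targets)
    return sum((y - t) ** 2 for y, t in zip(predictions, targets)) / len(predictions)

# Toy example: pointwise errors are 0, 0, 2, so MSE = 4/3.
mse = mean_squared_error([1.0, 2.0, 4.0], [1.0, 2.0, 2.0])
```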
Then move on to a network with a single layer of sigmoidal hidden nodes and a linear
output. Experiment with the number of hidden nodes, and with different optimizers. Try
the scaled conjugate gradient (scg) and the quasi-Newton (quasinew) optimizers.
You can try quite large networks (up to 50 hidden units), although they will take a bit
longer to train.
As for the linear node, plot the network predictions along with the data. Does the
nonlinear neural net do significantly better than the linear model? Report the best result
you’ve achieved on the test data. Plot the prediction along with the data and compare it
visually with the linear node prediction.
2. Classification – Iris Data
For this exercise, you will train a neural network to classify the three different iris
species in the famous Fisher iris data. The data are in the files IrisDev.dat (the
development data) and IrisTest.dat (the test data). The files contain the input
features in columns one through four, and in the last three columns, the class of each
example encoded in a one-of-three representation:
T = (0 0 1) for class 1
T = (0 1 0) for class 2
T = (1 0 0) for class 3
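The encoding above, and the usual winner-take-all decoding of network outputs (largest component wins), can be sketched as follows. This is a Python illustration with hypothetical helper names, not part of Netlab:

```python
def encode_class(k):
    # One-of-three target for class k in {1, 2, 3}, using the
    # convention above: class 1 -> (0 0 1), class 3 -> (1 0 0).
    t = [0, 0, 0]
    t[3 - k] = 1
    return t

def decode_output(y):
    # Winner-take-all: find the position of the largest output,
    # then map that position back to the class label.
    return 3 - max(range(3), key=lambda i: y[i])
```

For example, a network output of (0.1, 0.2, 0.7) decodes to class 1, since the largest component sits in the class-1 position.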
You will need to construct a network with four inputs and three outputs. Use a
logistic unit in the output layer. You can compute the classification accuracy with the
Netlab function confmat, which computes both the overall classification accuracy
(expressed as percent), and writes out a confusion matrix.
The rows of a confusion matrix contain the true class labels, while the columns are
the network assignments. For example, suppose we have a three-class problem and a
classifier that generates the following confusion matrix:
        C1   C2   C3
  C1    44    5    1
  C2     1   39   10
  C3     0    9   41
For this example, 44 of 50 C1 examples were correctly classified, 5 were mislabeled
C2, and 1 was mislabeled C3. The total number of misclassified examples is the sum
of the off-diagonal elements, i.e. 26. The error rate is 0.173.
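The bookkeeping above (Netlab's confmat reports these numbers for you) comes down to a diagonal sum; a short Python sketch reproducing the worked example:

```python
def confusion_stats(C):
    # True classes in rows, network assignments in columns.
    total = sum(sum(row) for row in C)               # all examples
    correct = sum(C[i][i] for i in range(len(C)))    # diagonal
    wrong = total - correct                          # off-diagonal sum
    return total, wrong, wrong / total

# The confusion matrix from the example above.
C = [[44, 5, 1],
     [1, 39, 10],
     [0, 9, 41]]
total, wrong, error_rate = confusion_stats(C)
# 150 examples, 26 misclassified, error rate 0.173
```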
Segment the development data set into five segments of 15 examples, and use 5-fold
cross-validation to pick the network size. Do you see any overfitting? When
you have what you believe is the best network, run the test data through the network
and report the classification accuracy, and include the confusion matrix.
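One way to generate the five train/validation splits is sketched below, in Python for illustration. It assumes 75 development examples split into contiguous segments of 15, as the assignment implies; the helper name five_fold_splits is mine:

```python
def five_fold_splits(n=75, k=5):
    # k-fold cross-validation on n examples using k contiguous
    # segments (here: 5 segments of 15). Each fold holds one
    # segment out for validation and trains on the remaining four.
    size = n // k
    splits = []
    for i in range(k):
        val = list(range(i * size, (i + 1) * size))
        held = set(val)
        train = [j for j in range(n) if j not in held]
        splits.append((train, val))
    return splits
```

Each fold trains a network of a given size on the 60 training examples and scores it on the 15 held-out examples; averaging the five validation scores gives the cross-validation estimate for that network size.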