CSE 581 Machine Learning

Machine Learning
Homework 2-1
Due Tuesday, May 3, 2005
1. Non-linear auto-regression – the Mackey-Glass equation.
In this exercise you will use a neural net to predict the dynamics of a chaotic time
series generated by the Mackey-Glass delay differential equation. The data are in
the files mack2_tr.dat , mack2_val.dat, and mack2_tst.dat on
the class web page. There is a description of the data in the file mack2.txt. To
get a look at the behavior of this time series, plot the first column of data.
For your experiments, use the Netlab toolbox for Matlab. Both Matlab and the
Netlab package are on the OGI-CSEE and PSU-CS education compute servers.
The idea behind prediction of univariate time series is that their past behavior
provides the information needed to predict future behavior. For linear systems
and models, this is the basis for autoregressive (AR) time series models. For
nonlinear systems, the notion of predictability from the past is formally captured
by Takens’ theorem.
Takens’ theorem tells us that we can represent the dynamical state of the system
that generated the observed time series x(t) by embedding the time series in a
vector
v(t )  [ x(t ), x(t  ), x(t  2 ), x(t  3 ), ... x(t  (m  1)) ]
where  is the embedding lag, and m is the embedding dimension.
The data in the files are already embedded for you, using a lag and dimension known to
provide decent results for this data set.
The data consist of five columns. The first four are the lagged values used as inputs,
x(t), x(t − 6), x(t − 12), x(t − 18); the last column is the value to be predicted,
x(t + 85).
Thus we use the current value x(t), and three previous values to predict the value in the
future x(t+85).
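To see how such a five-column file could be produced from a raw series (the homework data are already embedded for you, so this is purely illustrative), here is a small sketch. It is written in Python rather than Matlab, and the function name embed and the toy series are my own, not part of the assignment:

```python
def embed(series, lags=(0, 6, 12, 18), horizon=85):
    """Build (input, target) pairs from a univariate series.

    Each input row is [x(t), x(t-6), x(t-12), x(t-18)] and the
    target is x(t+85), matching the five-column layout described
    above. `series` is a plain list of floats.
    """
    rows = []
    max_lag = max(lags)
    for t in range(max_lag, len(series) - horizon):
        inputs = [series[t - lag] for lag in lags]
        rows.append((inputs, series[t + horizon]))
    return rows

# Toy check on the series x(t) = t, where x(t+85) = x(t) + 85:
series = [float(t) for t in range(200)]
pairs = embed(series)
# first usable row is t = 18: inputs [18, 12, 6, 0], target 103
```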
Your assignment is to apply several neural nets to this prediction problem and obtain the
best results you can. You are to use mack2_tr.dat to actually optimize the network
weights. When you have arrived at what you think is your best network architecture
(size, optimization …), then run the test data mack2_tst.dat through your models.
(This is noise-free data, so you will not see overfitting.)
First fit a single linear node (linear AR model) to the problem. You can prepare, train,
and evaluate the one-node linear model using the NetLab functions glm, glmtrain,
and glmfwd. Plot the prediction results (on mack2_tst.dat) and also the data on the
same plot so you can compare the shapes visually. Measure and report the mean square
error on the test data as well.
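The mean square error is just the average squared difference between predictions and targets (in Matlab, mean((y - t).^2)). A minimal Python sketch with a made-up toy example:

```python
def mean_squared_error(predictions, targets):
    # (1/N) * sum of (y_i - t_i)^2 over all N test points.
    assert len(predictions) == len(targets)
    return sum((y - t) ** 2 for y, t in zip(predictions, targets)) / len(predictions)

# Toy example: pointwise errors are 0, 0, 2, so MSE = 4/3.
mse = mean_squared_error([1.0, 2.0, 4.0], [1.0, 2.0, 2.0])
```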
Then move on to a network with a single layer of sigmoidal hidden nodes and a linear
output. Experiment with the number of hidden nodes, and with different optimizers. Try
the scaled conjugate gradient (scg) and the quasi-Newton (quasinew) optimizers.
You can try quite large networks (up to 50 hidden units), although they will take a bit
longer to train.
As for the linear node, plot the network predictions along with the data. Does the
nonlinear neural net do significantly better than the linear model? Report the best result
you’ve achieved on the test data. Plot the prediction along with the data and compare it
visually with the linear node prediction.
2. Classification – Iris Data
For this exercise, you will train a neural network to classify the three different iris
species in the famous Fisher iris data. The data are in the files IrisDev.dat (the
development data) and IrisTest.dat (the test data). The files contain the input
features in columns one through four, and in the last three columns, the class of each
example encoded in a one-of-three representation:
T = (0 0 1) for class 1
T = (0 1 0) for class 2
T = (1 0 0) for class 3
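The encoding above, and the usual winner-take-all decoding of network outputs (largest component wins), can be sketched as follows. This is a Python illustration with hypothetical helper names, not part of Netlab:

```python
def encode_class(k):
    # One-of-three target for class k in {1, 2, 3}, using the
    # convention above: class 1 -> (0 0 1), class 3 -> (1 0 0).
    t = [0, 0, 0]
    t[3 - k] = 1
    return t

def decode_output(y):
    # Winner-take-all: find the position of the largest output,
    # then map that position back to the class label.
    return 3 - max(range(3), key=lambda i: y[i])
```

For example, a network output of (0.1, 0.2, 0.7) decodes to class 1, since the largest component sits in the class-1 position.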
You will need to construct a network with four inputs and three outputs. Use a
logistic unit in the output layer. You can compute the classification accuracy with the
Netlab function confmat, which computes both the overall classification accuracy
(expressed as percent), and writes out a confusion matrix.
The rows of a confusion matrix contain the true class labels, while the columns are
the network assignments. For example, suppose we have a three-class problem and a
classifier that generates the following confusion matrix:
        C1   C2   C3
  C1    44    5    1
  C2     1   39   10
  C3     0    9   41
For this example, 44 of 50 C1 examples were correctly classified, 5 were mislabeled
C2, and 1 was mislabeled C3. The total number of misclassified examples is the sum
of the off-diagonal elements, i.e. 26. The error rate is 0.173.
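The bookkeeping above (Netlab's confmat reports these numbers for you) comes down to a diagonal sum; a short Python sketch reproducing the worked example:

```python
def confusion_stats(C):
    # True classes in rows, network assignments in columns.
    total = sum(sum(row) for row in C)               # all examples
    correct = sum(C[i][i] for i in range(len(C)))    # diagonal
    wrong = total - correct                          # off-diagonal sum
    return total, wrong, wrong / total

# The confusion matrix from the example above.
C = [[44, 5, 1],
     [1, 39, 10],
     [0, 9, 41]]
total, wrong, error_rate = confusion_stats(C)
# 150 examples, 26 misclassified, error rate 0.173
```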
Segment the development data set into five segments of 15 examples, and use 5-fold
cross-validation to pick the network size. Do you see any overfitting? When
you have what you believe is the best network, run the test data through the network
and report the classification accuracy, and include the confusion matrix.
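One way to generate the five train/validation splits is sketched below, in Python for illustration. It assumes 75 development examples split into contiguous segments of 15, as the assignment implies; the helper name five_fold_splits is mine:

```python
def five_fold_splits(n=75, k=5):
    # k-fold cross-validation on n examples using k contiguous
    # segments (here: 5 segments of 15). Each fold holds one
    # segment out for validation and trains on the remaining four.
    size = n // k
    splits = []
    for i in range(k):
        val = list(range(i * size, (i + 1) * size))
        held = set(val)
        train = [j for j in range(n) if j not in held]
        splits.append((train, val))
    return splits
```

Each fold trains a network of a given size on the 60 training examples and scores it on the 15 held-out examples; averaging the five validation scores gives the cross-validation estimate for that network size.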