Machine Learning in Practice / Applied Machine Learning
Week 12 Quiz
Name:________________________________________
(1) Assume algorithm X has two parameters, A and B, each of which has 2 settings, so you can create 4 alternative
versions of algorithm X: {X1-1, X1-2, X2-1, X2-2}. Note that X1-1 means algorithm X with setting 1 for parameter A and
setting 1 for parameter B, and X1-2 means algorithm X with setting 1 for parameter A and setting 2 for parameter B. Assume
you are doing 5-fold cross-validation, so your data set Y can be divided into 5 segments {Y1, Y2, Y3, Y4, Y5}. A set such as
{Y2,Y3,Y4} denotes the subset of your data created by concatenating segments Y2, Y3, and Y4.
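To make the notation concrete, here is a minimal Python sketch that enumerates the four versions of algorithm X and builds a subset such as {Y2,Y3,Y4} by concatenating segments. The names A_SETTINGS, B_SETTINGS, VERSIONS, concatenate, and toy_data are hypothetical, and each segment holds placeholder IDs rather than real instances.

```python
from itertools import product

# Hypothetical encoding: parameters A and B each have settings 1 and 2.
A_SETTINGS = (1, 2)
B_SETTINGS = (1, 2)

# "Xa-b" = algorithm X with setting a for parameter A and setting b for B.
VERSIONS = [f"X{a}-{b}" for a, b in product(A_SETTINGS, B_SETTINGS)]

SEGMENTS = ("Y1", "Y2", "Y3", "Y4", "Y5")


def concatenate(names, data_by_segment):
    """Build a subset such as {Y2,Y3,Y4} by concatenating its segments in order."""
    subset = []
    for name in names:
        subset.extend(data_by_segment[name])
    return subset


# Toy data: each segment holds two placeholder instance IDs.
toy_data = {seg: [f"{seg}-a", f"{seg}-b"] for seg in SEGMENTS}
subset = concatenate(["Y2", "Y3", "Y4"], toy_data)
```

Because itertools.product varies the last parameter fastest, VERSIONS comes out in the same order as the list {X1-1, X1-2, X2-1, X2-2}.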
Interpret the table below as follows, using the first data row as an example. If you train on {Y2,Y3,Y4} and test on
{Y1} with algorithm X1-1, you will get .55 kappa. If you use X1-2 instead, you will get .5 kappa, and so on. You can refer
to this table when you are stepping through your optimization to compare the performance of alternative versions of the
algorithm on different folds. Use the optimization approach in which, on each fold, you create a train set, a validation
set, and a test set.
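The three-way split on a single fold can be sketched as follows. This is a minimal sketch assuming the segment names Y1 to Y5; three_way_split and retrain_segments are hypothetical helpers. With test segment Y5 and validation segment Y1 it yields the training set {Y2,Y3,Y4}, which matches the first row of the table below. The final retraining step on all four non-test segments is an assumption about how the four-segment rows of the table are meant to be used.

```python
SEGMENTS = ("Y1", "Y2", "Y3", "Y4", "Y5")


def three_way_split(test, validation, segments=SEGMENTS):
    """Return (train, validation, test) segment names for one fold.

    The test segment is never touched during selection; the validation
    segment is used only to compare versions of algorithm X; the remaining
    segments form the training set.
    """
    if test not in segments or validation not in segments:
        raise ValueError("unknown segment name")
    if test == validation:
        raise ValueError("validation and test segments must differ")
    train = tuple(s for s in segments if s not in (test, validation))
    return train, (validation,), (test,)


def retrain_segments(test, segments=SEGMENTS):
    """Assumed final step: retrain the chosen version on all non-test segments."""
    return tuple(s for s in segments if s != test)


# Test on Y5, validate on Y1: train on Y2, Y3, Y4 (the table's first row).
train, val, test = three_way_split("Y5", "Y1")
final_train = retrain_segments("Y5")
```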
Train           | Validation or Test | X1-1 | X1-2 | X2-1 | X2-2
----------------|--------------------|------|------|------|------
{Y2,Y3,Y4}      | {Y1}               | .55  | .5   | .68  | .4
{Y3,Y4,Y5}      | {Y2}               | .6   | .66  | .68  | .58
{Y1,Y2,Y5}      | {Y3}               | .55  | .63  | .6   | .52
{Y1,Y3,Y5}      | {Y4}               | .62  | .53  | .72  | .55
{Y1,Y2,Y4}      | {Y5}               | .66  | .55  | .6   | .54
{Y2,Y3,Y4,Y5}   | {Y1}               | .7   | .66  | .72  | .56
{Y1,Y3,Y4,Y5}   | {Y2}               | .66  | .55  | .68  | .6
{Y1,Y2,Y4,Y5}   | {Y3}               | .5   | .6   | .7   | .59
{Y1,Y2,Y3,Y5}   | {Y4}               | .55  | .66  | .58  | .6
{Y1,Y2,Y3,Y4}   | {Y5}               | .53  | .6   | .59  | .55
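One way to step through the table mechanically is sketched below. It is a minimal sketch under an assumed reading of the table, not an answer key: for each test segment, the validation kappas are taken from the three-segment row that leaves that test segment out entirely, and the test kappas from the four-segment row that evaluates on it. ROWS and optimize are hypothetical names, and ties on the validation set are broken by taking the first version listed.

```python
# Kappa values copied from the table: (training segments, evaluated segment,
# kappa for X1-1, X1-2, X2-1, X2-2).
SEGMENTS = ("Y1", "Y2", "Y3", "Y4", "Y5")
VERSIONS = ("X1-1", "X1-2", "X2-1", "X2-2")
ROWS = [
    (("Y2", "Y3", "Y4"), "Y1", (.55, .5, .68, .4)),
    (("Y3", "Y4", "Y5"), "Y2", (.6, .66, .68, .58)),
    (("Y1", "Y2", "Y5"), "Y3", (.55, .63, .6, .52)),
    (("Y1", "Y3", "Y5"), "Y4", (.62, .53, .72, .55)),
    (("Y1", "Y2", "Y4"), "Y5", (.66, .55, .6, .54)),
    (("Y2", "Y3", "Y4", "Y5"), "Y1", (.7, .66, .72, .56)),
    (("Y1", "Y3", "Y4", "Y5"), "Y2", (.66, .55, .68, .6)),
    (("Y1", "Y2", "Y4", "Y5"), "Y3", (.5, .6, .7, .59)),
    (("Y1", "Y2", "Y3", "Y5"), "Y4", (.55, .66, .58, .6)),
    (("Y1", "Y2", "Y3", "Y4"), "Y5", (.53, .6, .59, .55)),
]


def optimize(rows=ROWS, segments=SEGMENTS, versions=VERSIONS):
    """For each test segment, pick the version with the best validation kappa.

    Returns {test: (validation segment, chosen version, chosen test kappa,
    best test kappa of any version)}.
    """
    results = {}
    for test in segments:
        # Assumed validation row: a three-segment training set that leaves the
        # test segment out entirely and evaluates on some other segment.
        val_row = next(r for r in rows
                       if len(r[0]) == 3 and test not in r[0] and r[1] != test)
        val_scores = dict(zip(versions, val_row[2]))
        # max() keeps the first version listed if two are tied.
        chosen = max(versions, key=val_scores.__getitem__)
        # Assumed test row: the four-segment training set evaluated on `test`.
        test_row = next(r for r in rows if len(r[0]) == 4 and r[1] == test)
        test_scores = dict(zip(versions, test_row[2]))
        results[test] = (val_row[1], chosen, test_scores[chosen],
                         max(test_scores.values()))
    return results


if __name__ == "__main__":
    results = optimize()
    for test, (val, chosen, kappa, best) in results.items():
        note = "" if kappa == best else " (another version scores higher)"
        print(f"test {test}: validate on {val}, pick {chosen}, "
              f"test kappa {kappa}{note}")
    mean = sum(r[2] for r in results.values()) / len(results)
    print(f"average test kappa: {mean:.3f}")
```

Comparing the chosen version's test kappa with the best test kappa of any version is what exposes folds where the validation choice does not carry over to the test set.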
(A) Describe your optimization procedure.
(B) What would be your test set performance on each fold, as well as the average performance, for your optimized algorithm?
Were there any folds where selecting the algorithm with the best performance on the validation set led to sub-optimal
performance on the test set? If so, which?
(C) Based on your optimization procedure, which version of algorithm X would you pick for test set {Y2}, and why?
(D) Would you pick the same version of algorithm X for test sets {Y1} and {Y3}? Why or why not?