Summary of paper 'Pruning Adaptive Boosting' by D. Margineantu and T. Dietterich: Samantha Rouse (1071890)
One of the problems with ADABOOST is that it accumulates an exceptionally large number of classifiers, all of which must be stored, and this takes up a great deal of memory. Sometimes these classifiers take up more space than the data set itself, which often dissuades practitioners from choosing this method of classification despite its effectiveness. The paper aims to determine whether all of these classifiers are necessary for good performance, or whether they can be 'pruned' down to a smaller subset whilst retaining the effectiveness and performance of full ADABOOST.
The paper proposes five pruning algorithms which cut down the number of classifiers and reduce the memory requirements of ADABOOST. All five algorithms take in a training set, an ADABOOST algorithm including the weak learning algorithm to be boosted (in the paper, the C4.5 algorithm), and a maximum memory size for the final group of classifiers. The goal of each algorithm is to construct the best possible selection of classifiers without exceeding the number of classifiers that fits within the specified memory limit.
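To make this shared interface concrete, here is a minimal Python stub; the names (prune, budget and so on) are my own illustration rather than the paper's, and the sketches for the individual strategies below all follow this shape.

    # Hypothetical common signature for the five pruning methods (my naming,
    # not the paper's). `classifiers` and `weights` come from a finished
    # ADABOOST run; `budget` is the memory limit as a classifier count.
    def prune(classifiers, weights, X_train, y_train, budget):
        """Return a subset of at most `budget` classifiers."""
        raise NotImplementedError  # each strategy below gives a concrete version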
Early stopping pruning runs ADABOOST and simply keeps the first x classifiers that fit in memory. This method relies on the assumption that the ADABOOST algorithm produces classifiers in decreasing order of quality, or at least that all classifiers are roughly equally accurate.
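As an illustration, early stopping reduces to a one-line slice; this is a minimal sketch assuming the classifiers are stored in the order ADABOOST produced them.

    # Early-stopping pruning: keep the first `budget` classifiers in the
    # order ADABOOST generated them.
    def early_stopping_prune(classifiers, budget):
        return classifiers[:budget]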
KL-Divergence pruning selects diverse classifiers. It focuses on classifiers that were trained on very different probability distributions over the training examples, and it measures how different those distributions are using the Kullback-Leibler (KL) distance. The method seeks the set of classifiers for which the total sum of pairwise KL distances (which represents how far apart the distributions are) is maximised. It utilises a greedy algorithm that repeatedly adds to the current set the candidate classifier whose summed KL distance to the classifiers already selected is largest.
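A rough sketch of this greedy selection, assuming each classifier comes with the training-set weight distribution it was trained under (weight_dists); seeding with the first classifier and the exact gain rule are my reconstruction, not the paper's pseudocode.

    import numpy as np

    def kl_distance(p, q, eps=1e-12):
        # Kullback-Leibler divergence KL(p || q) between two training-set
        # weight distributions (non-negative vectors summing to one).
        p, q = np.clip(p, eps, None), np.clip(q, eps, None)
        return float(np.sum(p * np.log(p / q)))

    def kl_prune(classifiers, weight_dists, budget):
        # Greedily grow the selected set, each time adding the classifier
        # whose weight distribution has the largest summed KL distance to
        # the distributions of the classifiers already selected.
        selected = [0]  # seed with ADABOOST's first classifier (my choice)
        while len(selected) < min(budget, len(classifiers)):
            gains = {i: sum(kl_distance(weight_dists[i], weight_dists[j])
                            for j in selected)
                     for i in range(len(classifiers)) if i not in selected}
            # pick the most "different" remaining classifier
            selected.append(max(gains, key=gains.get))
        return [classifiers[i] for i in selected]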
Kappa pruning uses a different method of choosing diverse classifiers: it selects them based on how much their classification decisions differ, using the kappa statistic. For each pair of classifiers a value k is computed; pairs are then chosen starting with the lowest k (where the classification decisions are least in agreement) and added in increasing order of k until the number of classifiers that fills the memory space is met.
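A sketch of this pairwise selection, assuming scikit-learn-style classifiers with a predict method; the kappa formula used here is the standard agreement statistic, which I believe matches the paper's usage.

    import numpy as np

    def kappa(preds_a, preds_b, labels):
        # Standard kappa agreement statistic between two prediction vectors:
        # (observed agreement - chance agreement) / (1 - chance agreement).
        theta1 = np.mean(preds_a == preds_b)
        theta2 = sum(np.mean(preds_a == c) * np.mean(preds_b == c)
                     for c in labels)
        return (theta1 - theta2) / (1.0 - theta2)

    def kappa_prune(classifiers, X, y, budget):
        # Compute kappa for every pair, then walk the pairs in increasing
        # kappa order (least agreement first), adding both members of each
        # pair until `budget` classifiers have been chosen.
        preds = [np.asarray(c.predict(X)) for c in classifiers]
        labels = np.unique(y)
        pairs = sorted((kappa(preds[i], preds[j], labels), i, j)
                       for i in range(len(classifiers))
                       for j in range(i + 1, len(classifiers)))
        chosen = []
        for _, i, j in pairs:
            for idx in (i, j):
                if idx not in chosen and len(chosen) < budget:
                    chosen.append(idx)
            if len(chosen) >= budget:
                break
        return [classifiers[k] for k in chosen]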
The fourth algorithm, Kappa-error convex hull pruning, takes both diversity and accuracy into account. On a scatterplot called the Kappa-error diagram, each pair of classifiers is plotted with its k (as above) on the x axis and the pair's average error on the y axis. The convex hull of the points in the diagram is then constructed; this can be thought of as a "summary" of the diagram. The classifiers chosen are those that appear in at least one pair lying on the hull. One downside is that the user cannot control the number of chosen classifiers to match the specified memory space, although the method still prunes the number of classifiers down to a smaller amount.
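A simplified sketch using SciPy's ConvexHull (and reusing kappa() from the previous sketch); it keeps every classifier appearing in a pair on the full hull, whereas the paper works from the Kappa-error diagram and may use only part of the hull, so treat this as illustrative.

    import numpy as np
    from scipy.spatial import ConvexHull

    def convex_hull_prune(classifiers, X, y):
        # Note: no budget parameter -- the hull itself fixes how many
        # classifiers are kept, which is the downside mentioned above.
        preds = [np.asarray(c.predict(X)) for c in classifiers]
        errors = [np.mean(p != y) for p in preds]
        labels = np.unique(y)
        pairs, points = [], []
        for i in range(len(classifiers)):
            for j in range(i + 1, len(classifiers)):
                pairs.append((i, j))
                points.append((kappa(preds[i], preds[j], labels),
                               (errors[i] + errors[j]) / 2.0))
        hull = ConvexHull(np.asarray(points))  # hull of the kappa-error diagram
        keep = sorted({idx for v in hull.vertices for idx in pairs[v]})
        return [classifiers[k] for k in keep]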
Finally, Reduce-error pruning with backfitting divides the training set into a sub-training set and a pruning set. ADABOOST is trained on the sub-training set, and the pruning set is then used to decide which classifiers to keep: the classifiers chosen are those which give the best voted performance on the pruning set. Backfitting is used while adding classifiers to the final set, revisiting earlier choices and replacing them when a swap improves the voted performance.
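A sketch of the greedy selection with one backfitting pass after each addition; the exact backfitting schedule is my guess at the idea rather than the paper's precise procedure, and weights are the classifiers' ADABOOST voting weights.

    import numpy as np

    def voted_error(indices, preds, weights, y):
        # Error of the weighted-vote ensemble restricted to `indices`.
        labels = np.unique(y)
        votes = np.zeros((len(y), len(labels)))
        for i in indices:
            for li, c in enumerate(labels):
                votes[:, li] += weights[i] * (preds[i] == c)
        return np.mean(labels[np.argmax(votes, axis=1)] != y)

    def reduce_error_prune(classifiers, weights, X_prune, y_prune, budget):
        # Greedy forward selection on the held-out pruning set, followed by
        # a backfitting pass after each addition: revisit every chosen slot
        # and swap in whichever classifier minimises the voted error.
        preds = [np.asarray(c.predict(X_prune)) for c in classifiers]
        chosen = []
        while len(chosen) < min(budget, len(classifiers)):
            best = min((i for i in range(len(classifiers)) if i not in chosen),
                       key=lambda i: voted_error(chosen + [i], preds,
                                                 weights, y_prune))
            chosen.append(best)
            for pos in range(len(chosen)):  # backfitting: reconsider earlier picks
                rest = chosen[:pos] + chosen[pos + 1:]
                chosen[pos] = min(
                    (i for i in range(len(classifiers)) if i not in rest),
                    key=lambda i: voted_error(rest + [i], preds,
                                              weights, y_prune))
        return [classifiers[i] for i in chosen]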
The paper finds that Reduce-error pruning and Kappa pruning perform best regardless of how many classifiers need to be removed, whilst Kappa-error convex hull pruning is also highly competitive, although its inability to target a given number of classifiers made it unsuitable in some circumstances. The paper notes that ADABOOST has a tendency towards overfitting, and that on certain data sets Reduce-error pruning actually eliminated this overfitting, thus improving the performance of ADABOOST. In some cases Reduce-error pruning could remove up to 80% of the classifiers and still give results comparable to full ADABOOST; in others, however, pruning even 20% of the classifiers could substantially harm the output, so the effect was very data set dependent. Kappa pruning improved results for all but one data set, which had a reputation for being unstable, and some data sets could be pruned by up to 60% without harming results. Convex hull pruning was better in a notable number of cases. The paper then goes on to state that, due to ADABOOST's weakness of overfitting in some scenarios, some form of pruning should be used whenever ADABOOST is.
My opinion is that whilst the algorithms appear to be highly effective, the paper has a tendency to make sweeping generalisations, assuming that despite the relatively small number of tested cases its results are final and representative of pruning ADABOOST on all possible data sets. The authors tried only ten data sets, a relatively small number, and in a few cases pruning decreased the effectiveness of ADABOOST (for example, when Reduce-error pruning removed anything more than a modest 10% of the classifiers). On top of this, the authors simply state that in all scenarios "some form of pruning" should be used, despite the fact that Early stopping proved to be a very ineffective method of pruning ADABOOST and should not be used; this should have been made clearer. Even though overfitting can be an issue with ADABOOST, I believe that where there is enough memory to store the full ensemble, unpruned ADABOOST should still be considered, at least until a larger collection of data sets, with varying numbers of examples, dimensions and factors, has been tested.