Summary of paper 'Pruning Adaptive Boosting' by D. Margineantu and T. Dietterich: Samantha Rouse (1071890)

One of the problems with ADABOOST is that it accumulates an exceptionally large number of classifiers, all of which must be stored, and this takes up a great deal of memory. These classifiers can sometimes occupy more space than the data set itself, which often dissuades users from choosing this method of classification despite its effectiveness. The paper aims to determine whether all of these classifiers are necessary for good performance, or whether they can be 'pruned' down to a smaller subset whilst retaining the effectiveness and performance of the full ADABOOST ensemble.

The paper proposes five pruning algorithms that cut down the number of classifiers and so reduce the memory requirements of ADABOOST. All five take as input a training set, the ADABOOST algorithm together with a weak learning algorithm to be boosted (in the case of the paper, the C4.5 algorithm), and a maximum memory size for the final group of classifiers. The goal of each algorithm is to construct the best possible selection of classifiers without exceeding the number of classifiers that fits within the specified memory limit. Minimal sketches of each method, under stated assumptions, follow this overview.

Early stopping pruning simply runs ADABOOST and keeps the first M classifiers that fit in memory. This relies on the assumption that ADABOOST generates classifiers in roughly decreasing order of quality.

KL-divergence pruning selects diverse classifiers: it favours classifiers that were trained on very different probability distributions over the training examples. The difference between two such distributions is measured with the Kullback-Leibler (KL) divergence, and the aim is to find the set of classifiers for which the total sum of pairwise KL distances is maximised. In practice a greedy algorithm is used that repeatedly adds the candidate classifier whose summed KL distance to the already-selected classifiers is largest.

Kappa pruning uses a different method of choosing diverse classifiers: it selects them based on how much their classification decisions differ, measured with the kappa statistic. For each pair of classifiers a value κ is computed, and pairs are then taken starting with the lowest κ (the pair whose classification decisions agree least) and added in increasing order of κ until the number of classifiers that fills the memory limit is met.

The fourth algorithm, Kappa-error convex hull pruning, takes both diversity and accuracy into account. On a scatterplot called the kappa-error diagram, each pair of classifiers is plotted with its κ (as above) on the x-axis and the average of the two classifiers' errors on the y-axis. The convex hull of the points in the diagram is then constructed; this can simply be thought of as a "summary" of the diagram. The classifiers chosen are those that appear in a classifier pair lying on the hull. One downside is that the user cannot control the number of chosen classifiers to match a specified memory limit, although the method still prunes the ensemble down to a much smaller size.

Finally, Reduce-error pruning with backfitting divides the training set into a sub-training set and a pruning set. ADABOOST is trained on the sub-training set, and the pruning set is then used to decide which classifiers to keep: the chosen classifiers are those that give the best voted performance on the pruning set, with backfitting used to revise earlier choices as classifiers are added to the final set.
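As a concrete illustration, early stopping needs almost no machinery. A minimal Python sketch (the function name and arguments are hypothetical, not from the paper):

```python
def early_stopping_pruning(classifiers, max_classifiers):
    """Keep the first M classifiers AdaBoost produced, on the
    assumption that earlier boosting rounds yield the better ones."""
    return classifiers[:max_classifiers]
```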
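The greedy KL-divergence selection might look roughly as follows. This is a sketch under the assumption that each classifier's training-time weight distribution over the examples was recorded during boosting, and that the selected set is seeded with the first (uniform-weight) classifier; all names are illustrative.

```python
import numpy as np

def kl_distance(p, q, eps=1e-12):
    """KL divergence between two weight distributions over the
    training examples; eps guards against zero weights."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    return float(np.sum(p * np.log(p / q)))

def kl_pruning(weight_dists, max_classifiers):
    """weight_dists[i] is the distribution classifier i was trained on.
    Greedily add the classifier whose summed KL distance to the
    already-selected distributions is largest."""
    selected = [0]  # assumption: seed with the first classifier
    while len(selected) < max_classifiers:
        remaining = [u for u in range(len(weight_dists)) if u not in selected]
        best = max(remaining,
                   key=lambda u: sum(kl_distance(weight_dists[u], weight_dists[s])
                                     for s in selected))
        selected.append(best)
    return selected
```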
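Kappa pruning can be sketched as below. The kappa statistic here is the standard agreement-beyond-chance measure computed from a pairwise contingency table, and the selection loop follows the description above (pairs in increasing order of κ until the memory limit is met); names and signatures are assumptions, not the paper's code.

```python
import numpy as np
from itertools import combinations

def kappa(preds_a, preds_b, n_classes):
    """Agreement between two classifiers' predictions, corrected for
    the agreement expected by chance."""
    m = len(preds_a)
    table = np.zeros((n_classes, n_classes))
    for a, b in zip(preds_a, preds_b):
        table[a, b] += 1
    theta1 = np.trace(table) / m            # observed agreement
    marg_a = table.sum(axis=1) / m
    marg_b = table.sum(axis=0) / m
    theta2 = float(np.dot(marg_a, marg_b))  # chance agreement
    return (theta1 - theta2) / (1 - theta2)

def kappa_pruning(all_preds, n_classes, max_classifiers):
    """all_preds[i] holds classifier i's training-set predictions.
    Take pairs in increasing order of kappa (least agreement first)
    and add both members until the limit is reached."""
    scored = sorted((kappa(all_preds[i], all_preds[j], n_classes), i, j)
                    for i, j in combinations(range(len(all_preds)), 2))
    chosen = []
    for _, i, j in scored:
        for c in (i, j):
            if c not in chosen and len(chosen) < max_classifiers:
                chosen.append(c)
        if len(chosen) == max_classifiers:
            break
    return chosen
```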
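Kappa-error convex hull pruning could be sketched with SciPy's ConvexHull, reusing the kappa function from the previous sketch; errors[i] is assumed to be classifier i's training error. As noted above, the ensemble size falls out of the hull rather than being set by the user.

```python
import numpy as np
from itertools import combinations
from scipy.spatial import ConvexHull

def convex_hull_pruning(all_preds, errors, n_classes):
    """Build the kappa-error diagram (x = kappa of the pair,
    y = mean error of the pair) and keep every classifier that
    appears in a pair lying on the convex hull of the points."""
    idx_pairs = list(combinations(range(len(all_preds)), 2))
    points = np.array([[kappa(all_preds[i], all_preds[j], n_classes),
                        (errors[i] + errors[j]) / 2.0]
                       for i, j in idx_pairs])
    chosen = set()
    for v in ConvexHull(points).vertices:
        i, j = idx_pairs[v]
        chosen.update((i, j))
    return sorted(chosen)
```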
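Finally, a sketch of Reduce-error pruning with backfitting, assuming the classifiers have already been boosted on the sub-training set and that preds, weights, and labels refer to the held-out pruning set (with AdaBoost's vote weights available). The backfitting pass here revisits each earlier slot once per addition, a simplification of the paper's procedure.

```python
import numpy as np

def voted_error(chosen, preds, weights, labels, n_classes):
    """Weighted-vote error of the chosen subset on the pruning set."""
    votes = np.zeros((len(labels), n_classes))
    for i in chosen:
        votes[np.arange(len(labels)), preds[i]] += weights[i]
    return float(np.mean(votes.argmax(axis=1) != labels))

def reduce_error_pruning(preds, weights, labels, n_classes, max_classifiers):
    """Greedy forward selection on the pruning set, with a backfitting
    pass after each addition that re-optimises every earlier choice."""
    err = lambda subset: voted_error(subset, preds, weights, labels, n_classes)
    chosen, candidates = [], list(range(len(preds)))
    while len(chosen) < max_classifiers and candidates:
        best = min(candidates, key=lambda c: err(chosen + [c]))
        chosen.append(best)
        candidates.remove(best)
        for slot in range(len(chosen) - 1):       # backfitting
            rest = chosen[:slot] + chosen[slot + 1:]
            repl = min([chosen[slot]] + candidates,
                       key=lambda c: err(rest + [c]))
            if repl != chosen[slot]:
                candidates.append(chosen[slot])
                candidates.remove(repl)
                chosen[slot] = repl
    return chosen
```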
The paper finds that Reduce-error pruning and Kappa pruning perform best regardless of how many classifiers need to be removed, whilst Kappa-error convex hull pruning is also highly competitive, although its fixed number of selected classifiers made it unsuitable in some circumstances. The paper notes that ADABOOST has a tendency towards overfitting, and that on certain data sets Reduce-error pruning actually eliminated this overfitting, thereby improving on the performance of full ADABOOST. In other cases, Reduce-error pruning was able to remove up to 80% of the classifiers and still give results comparable to full ADABOOST. However, on other data sets, pruning even 20% with Reduce-error could substantially harm performance, so the outcome was very data-set dependent. Kappa pruning improved results on all but one data set, which had a reputation for being unstable; some data sets could be pruned by up to 60% without harming results, and convex hull pruning was better in a notable number of cases. The paper then goes on to state that, because of ADABOOST's weakness of overfitting in some scenarios, some form of pruning should be used whenever ADABOOST is.

My opinion is that whilst the algorithms appear to be highly effective, the paper has a tendency to make sweeping generalisations, assuming that despite the relatively small number of tested cases its results are final and representative of pruning ADABOOST on all possible data sets. The authors tried only ten data sets, a relatively tiny number, and in a few cases pruning decreased the effectiveness of ADABOOST (for example, when Reduce-error pruning removed anything more than the smallest 10% of classifiers). On top of this, the authors simply state that "some form of pruning" should be used in all scenarios, despite the fact that Early stopping proved to be a very ineffective method of pruning ADABOOST and should not be used; this should have been made clearer. Even though overfitting can be an issue with ADABOOST, I believe that if the memory space is available, unpruned ADABOOST should still be considered until a larger number of data sets, with varying numbers of examples, dimensions, and factors, can be tested.