RACOG and wRACOG: Two Probabilistic Oversampling Techniques

Abstract:
As machine learning techniques mature and are used to tackle complex
scientific problems, challenges arise such as the imbalanced class
distribution problem, where one of the target class labels is underrepresented in comparison with other classes. Existing oversampling
approaches for addressing this problem typically do not consider the
probability distribution of the minority class while synthetically generating
new samples. As a result, the minority class is not represented well, which
leads to high misclassification error. We introduce two probabilistic
oversampling approaches, namely RACOG and wRACOG, to synthetically
generate and strategically select new minority class samples. The
proposed approaches use the joint probability distribution
of data attributes and Gibbs sampling to generate new minority class
samples. While RACOG selects samples produced by the Gibbs sampler
based on a predefined lag, wRACOG selects those samples that have the
highest probability of being misclassified by the existing learning model.
We validate our approaches using nine UCI data sets that were carefully
modified to exhibit class imbalance and one new application
domain data set with inherent extreme class imbalance. In addition, we
compare the classification performance of the proposed methods with three
other existing resampling techniques.
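
As a rough illustration of the lag-based selection described above, the following minimal sketch runs a Gibbs sampler over discrete minority class attributes and keeps every `lag`-th sample after a burn-in period. It is not the authors' implementation. The chain-structured dependence model (each attribute depends only on the previous one), the Laplace smoothing, and all names and parameters (`fit_chain`, `gibbs_step`, `racog_like`, `burn_in`, `lag`, `n_iter`) are assumptions made only to keep the example short.

```python
import random
from collections import Counter


def fit_chain(rows, n_values, alpha=1.0):
    """Estimate smoothed P(x_0) and P(x_i | x_{i-1}) from discrete minority rows.

    ASSUMPTION: a simple chain model stands in for the joint distribution.
    """
    d = len(rows[0])
    c0 = Counter(r[0] for r in rows)
    first = [(c0[v] + alpha) / (len(rows) + alpha * n_values[0])
             for v in range(n_values[0])]
    trans = [None]  # trans[i][u][v] = P(x_i = v | x_{i-1} = u) for i >= 1
    for i in range(1, d):
        pair = Counter((r[i - 1], r[i]) for r in rows)
        prev = Counter(r[i - 1] for r in rows)
        trans.append([[(pair[(u, v)] + alpha) / (prev[u] + alpha * n_values[i])
                       for v in range(n_values[i])]
                      for u in range(n_values[i - 1])])
    return first, trans


def gibbs_step(x, first, trans, n_values, rng):
    """Resample each attribute once from its full conditional under the chain model."""
    d = len(x)
    for i in range(d):
        weights = []
        for v in range(n_values[i]):
            # Term involving the left neighbour (or the prior for attribute 0).
            w = first[v] if i == 0 else trans[i][x[i - 1]][v]
            # Term involving the right neighbour, if there is one.
            if i + 1 < d:
                w *= trans[i + 1][v][x[i + 1]]
            weights.append(w)
        x[i] = rng.choices(range(n_values[i]), weights=weights)[0]
    return x


def racog_like(rows, n_values, burn_in=50, lag=10, n_iter=150, seed=0):
    """Start one chain per minority row; keep every lag-th sample after burn-in."""
    rng = random.Random(seed)
    first, trans = fit_chain(rows, n_values)
    synthetic = []
    for start in rows:
        x = list(start)
        for t in range(1, n_iter + 1):
            gibbs_step(x, first, trans, n_values, rng)
            if t > burn_in and (t - burn_in) % lag == 0:
                synthetic.append(tuple(x))
    return synthetic


# Toy minority class: 4 rows, 3 discrete attributes with 2, 2 and 3 values.
minority = [(0, 1, 2), (0, 1, 1), (1, 0, 2), (0, 0, 2)]
samples = racog_like(minority, n_values=[2, 2, 3])
```

A wRACOG-style variant would replace the fixed `lag` test with a check against the current classifier, keeping the generated samples that the model is most likely to misclassify; that selection step is not shown here.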