
Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence
Learning with Imprecise Classes,
Rare Instances, and Complex Relationships
Srinath Ravindran
Department of Computer Science
North Carolina State University
Raleigh, North Carolina 27695
Copyright © 2011, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract
In applications including chemoinformatics, bioinformatics, information retrieval, text classification, computer vision, and others, a variety of common issues have been identified involving the frequency of occurrence of instances, variation and similarity among instances, and the lack of precise class labels. These issues remain important hurdles for machine intelligence, and my doctoral thesis focuses on developing robust machine learning models that address them.

Problem Description
A variety of machine learning approaches work well for problems involving IID data, with enough examples to represent the variation in the data. In many practical applications, however, we may not have sufficient examples, the data may not be drawn from identical distributions, and there may be complex relations among individual data points.
The problems that make the learning task difficult, and often error prone, typically fall into the following categories:
1. Within-class variation: instances within one category or class have different properties.
2. Inter-class similarity: instances from different categories or classes may have similar properties.
3. Rare instances: instances may be unseen, or may be the only representatives of their kind, and could be present in either the training or the test data.
4. Lack of labeled instances: instances either belong to an unknown class or are assigned an imprecise class label.
Over the past few years, a variety of approaches have been developed to address each of these issues. Existing approaches to prediction, however, most often do not consider these problems together, treating them instead as separate problems, even though many applications exhibit some combination of them.
An example application is chemical toxicity prediction. Chemicals belong to various congeneric classes such as acids, alcohols, and amines. Often the chemicals in one class exhibit properties similar to chemicals in another congeneric class while differing from chemicals in their own class. The former phenomenon is an example of inter-class similarity and the latter of within-class variation.
Moreover, the occurrence of some chemicals may be rare or unseen: a chemical may be the only representative of its class in the training data, or a chemical absent from the training set may be presented only later, in the test set, for prediction.
Finally, the label assigned to each chemical is determined experimentally, a task that is costly and, as in many other applications, error prone.
The most important challenge when considering rare instances, and cases with a lack of examples, is handling noisy data: the foremost task is to build a system that can tell the difference between a noisy instance and a rare instance. Moreover, many applications deal with large amounts of data, both in dimensionality and in the number of instances, which poses additional challenges in terms of memory and time complexity.
The problems discussed above span all aspects of machine learning: classification, regression, and reinforcement learning. My thesis focuses on supervised learning tasks.

Research Questions
Many approaches have been demonstrated to successfully address each of the four problems discussed above. The more prominent ones include multilevel models (Bach 2008; Gelman and Hill 2006), active learning (Cohn, Atlas, and Ladner 1994; Settles 2010), mixture or ensemble models (Bahler and Navarro 2000; Jordan and Jacobs 1994; Bishop 2007), multiple instance learning (Zhou 2004), and transfer learning techniques (Pan and Yang 2008).
Multilevel modeling, also referred to as hierarchical modeling, is an increasingly popular approach to modeling data and is known to outperform classical regression in predictive accuracy (Gelman 2005). Multilevel models have been shown to provide improved generalization in prediction tasks across various application domains, and have been studied under various guises, including Bayesian models (Gelman and Hill 2006), computer vision and visual cortex simulation (Sudderth et al. 2005; Bouvrie 2009), and linear models (Gelman and Hill 2006).
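To make the partial pooling that underlies such multilevel models concrete, the sketch below (my own illustrative example, not the model proposed in this thesis; the data, the shared-variance assumption, and the `tau2` between-group variance are all assumptions) estimates per-class means by shrinking each class's sample mean toward the grand mean, so that a rare class with a single example borrows strength from the population:

```python
import numpy as np

def partial_pool(values, groups, tau2=1.0):
    """Shrink each group's sample mean toward the grand mean.

    Groups with few observations are pulled strongly toward the
    grand mean (borrowing strength); well-populated groups keep
    estimates close to their own data. tau2 is an assumed
    between-group variance, sigma2 an assumed shared noise level.
    """
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    grand = values.mean()
    sigma2 = values.var() + 1e-12      # within-group noise (assumed shared)
    est = {}
    for g in set(groups.tolist()):
        x = values[groups == g]
        n = len(x)
        w = (n / sigma2) / (n / sigma2 + 1.0 / tau2)  # precision weighting
        est[g] = w * x.mean() + (1 - w) * grand
    return est

# The singleton "amine" class is shrunk strongly toward the grand
# mean, while the well-represented "acid" class stays near its own
# sample mean.
vals = [1.0, 1.2, 0.9, 1.1, 5.0]
grps = ["acid", "acid", "acid", "acid", "amine"]
print(partial_pool(vals, grps))
```

Under this weighting, a class with many examples keeps an estimate close to its own mean, while a singleton class is pulled toward the population: the mechanism by which multilevel models can handle rare instances without memorizing noise.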
However, multilevel models have found limited use in prediction problems in the presence of rare instances. The foremost question to address is: given the power of multilevel models, can they be applied to prediction in the presence of the problems discussed earlier? As described under Current Progress below, our work has shown promising results in this direction.
Much multilevel modeling work has focused on batch learning and supervised learning. However, some applications require online learning, which often poses all four of the problems mentioned in the previous section. We must investigate the performance of multilevel models in such applications and, if required, identify an alternative approach for online learning in the presence of the four problems.
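As a sketch of what an online variant might look like (an illustrative assumption, not the approach investigated in the thesis; the class name and the `strength` pooling parameter are hypothetical), one can maintain running per-group sufficient statistics and pool each group's estimate toward the running grand mean after every observation:

```python
from collections import defaultdict

class OnlinePooledMean:
    """Incrementally pooled per-group means (illustrative sketch).

    Keeps running counts and sums per group plus global totals;
    each group's estimate is a count-weighted blend of its own
    running mean and the global running mean, so estimates can be
    produced after every single observation.
    """
    def __init__(self, strength=2.0):
        self.strength = strength           # assumed pooling strength
        self.n = defaultdict(int)
        self.s = defaultdict(float)
        self.total_n = 0
        self.total_s = 0.0

    def update(self, group, x):
        self.n[group] += 1
        self.s[group] += x
        self.total_n += 1
        self.total_s += x

    def estimate(self, group):
        grand = self.total_s / self.total_n
        n, s = self.n[group], self.s[group]
        # rare or entirely unseen groups fall back toward the grand mean
        return (s + self.strength * grand) / (n + self.strength)

m = OnlinePooledMean()
for g, x in [("acid", 1.0), ("acid", 1.2), ("amine", 5.0)]:
    m.update(g, x)
print(m.estimate("amine"))    # pulled toward the grand mean
print(m.estimate("alcohol"))  # unseen group: falls back to the grand mean
```

An unseen group yields exactly the grand mean, and a singleton group yields a compromise between its one observation and the grand mean, so the model degrades gracefully as data streams in.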
While each of the first three problems has received attention for a relatively long time, the fourth problem ("lack of labeled instances") has attracted interest only recently. Although active and semi-supervised learning models have been proposed, open questions remain concerning the quality of the labels in the data. Labeling data instances is not only costly but also error prone, for a variety of reasons such as labeler inexperience and fatigue: an instance could be assigned a wrong label or an imprecise label, or be left unlabeled. Apart from noisy or imprecise labels, class imbalance, within-class variation, and inter-class similarity remain major concerns in active learning.
My thesis aims to address both imprecise labels and class imbalance in active learning. The major challenge is establishing a trade-off between the cost of labeling and the classification error. At present, we are working on a solution to learning in the presence of imprecise class labels.
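This cost-versus-error trade-off can be illustrated with a toy noisy-labeler model (the flip probability and the repeated-query strategy are illustrative assumptions, not the thesis's method): querying each selected instance several times and taking a majority vote multiplies labeling cost but lowers the effective label-noise rate.

```python
import random

def majority_label(true_label, flip_p, k, rng):
    """Query a noisy labeler k times and take the majority vote.

    flip_p is the assumed chance that a single binary annotation is
    wrong; k repeated queries raise labeling cost k-fold but reduce
    the chance that the recorded label is wrong.
    """
    votes = [true_label if rng.random() > flip_p else 1 - true_label
             for _ in range(k)]
    return int(sum(votes) > k / 2)

def label_error_rate(flip_p, k, trials=20000, seed=0):
    """Monte Carlo estimate of the effective label-noise rate."""
    rng = random.Random(seed)
    wrong = sum(majority_label(1, flip_p, k, rng) != 1 for _ in range(trials))
    return wrong / trials

# One query per instance vs. three: triple the labeling cost, but a
# markedly lower effective error rate (analytically, 0.3 drops to
# 0.3**3 + 3 * 0.3**2 * 0.7 = 0.216).
print(label_error_rate(0.3, 1))  # ~0.30
print(label_error_rate(0.3, 3))  # ~0.22
```

Any active learner that pays per annotation therefore faces exactly the trade-off described above: spend budget on labeling more instances once, or on labeling fewer instances more reliably.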
Current Progress
We have developed a multilevel model for supervised prediction tasks that achieves better performance than existing models in the presence of within-class variation, inter-class similarity, and rare instances, without compromising the overall error rate. Across multiple domains, the model generalizes at least as well as most existing models while correctly predicting more rare instances. A paper describing this work is currently under review.
In previous work, we addressed the issue of detecting interesting patterns in data, especially determining whether a less frequent or rare pattern is interesting. Existing methods suffer from a variety of shortcomings. To begin with, their output depends on the choice of a threshold for the value of support. If the support threshold is low, they tend to generate a large number of patterns, many of which are "uninteresting". Even if the support is sufficiently large, some patterns generated may already be known to the user as ground truth, while some interesting but infrequent patterns may be mistakenly overlooked. Our approach, being subjective, uses the relationships between the entities of a pattern as a factor in determining the pattern's interestingness. The major limitation of research in this direction is the lack of datasets or standards that define "interest", since interest is context dependent and subjective.

Future Work
Feature extraction and selection play a vital role in various machine learning tasks; in particular, the presence of some features may help in predicting rare cases. It is important to study the effect of both feature selection and feature extraction on improving prediction in the presence of the four problems discussed earlier.
We believe that sampling interesting instances could help improve the performance of active learning, especially in the presence of class imbalance. This would be a natural extension of my work on interestingness.
Finally, there is increasing interest in multitask learning and in learning from instances with multiple labels. All the problems addressed in my work exist in these two learning tasks as well, and they will be an important direction for future research.

References
Bach, F. 2008. Exploring large feature spaces with hierarchical multiple kernel learning. CoRR abs/0809.1493.
Bahler, D., and Navarro, L. 2000. Methods for combining heterogeneous sets of classifiers. In Proceedings of the Seventeenth National Conference on Artificial Intelligence, Workshop on New Research Problems for Machine Learning. AAAI Press / The MIT Press.
Bishop, C. 2007. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, 1st ed. 2006, corrected 2nd printing.
Bouvrie, J. V. 2009. Hierarchical Learning: Theory with Applications in Speech and Vision. Ph.D. Dissertation, Massachusetts Institute of Technology.
Cohn, D.; Atlas, L.; and Ladner, R. 1994. Improving generalization with active learning. Machine Learning 15:201–221.
Gelman, A., and Hill, J. 2006. Data Analysis Using Regression and Multilevel/Hierarchical Models. Analytical Methods for Social Research. Cambridge University Press.
Gelman, A. 2005. Multilevel (hierarchical) modeling: What it can and can't do. Technometrics.
Jordan, M. I., and Jacobs, R. A. 1994. Hierarchical mixtures of experts and the EM algorithm. Neural Computation 6(2):181–214.
Pan, S. J., and Yang, Q. 2008. A survey on transfer learning. Technical Report HKUST-CS08-08, Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China.
Settles, B. 2010. Active learning literature survey. Technical report, University of Wisconsin–Madison.
Sudderth, E. B.; Torralba, A.; Freeman, W. T.; and Willsky, A. S. 2005. Learning hierarchical models of scenes, objects, and parts. In ICCV, 1331–1338. IEEE Computer Society.
Zhou, Z.-H. 2004. Multi-instance learning: A survey. Technical report, Department of Computer Science and Technology, Nanjing University, China.