Regression-based Active Learning for Fraud Detection

Brian Mac Namee
[email protected]
Applied Intelligence Research Centre
Project Summary: Automated fraud detection systems based on inductive machine learning are
best treated as regression problems, and they rely on large collections of historical data labelled
with known outcomes. Adding known outcomes to such collections can be an expensive process,
but active learning can reduce this cost. The key issue in active learning research is the design
of selection strategies that pick only the most informative examples from a larger collection
for expert review. Active learning research primarily focuses on classification
problems and there remain opportunities to improve selection strategies for regression
problems. The first part of this project will investigate novel hybrid selection strategies for active
learning for regression problems. This work will leverage existing work by the applicants focused
on the development of active learning selection strategies based on intrinsic properties of a
dataset, and the use of prediction model outputs for novelty detection. The second part of the
project will investigate the use of visualisation techniques to allow analysts to guide the active
learning process using their own insights.
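
To make the idea of a selection strategy for regression concrete, the following is a minimal sketch of one well-known approach, query-by-committee with bootstrap resampling. It is not one of the hybrid strategies proposed in this project. The function name `select_most_informative`, its parameters, and the use of simple linear models as committee members are all assumptions made for illustration.

```python
import numpy as np


def select_most_informative(X_labelled, y_labelled, X_pool,
                            n_queries=5, n_models=10, seed=0):
    """Return indices of the pool examples on which a bootstrap
    committee of linear regressors disagrees most (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Append a bias column so each committee member fits slope(s) plus intercept.
    A_lab = np.column_stack([X_labelled, np.ones(len(X_labelled))])
    A_pool = np.column_stack([X_pool, np.ones(len(X_pool))])

    predictions = []
    for _ in range(n_models):
        # Each member is trained on a bootstrap resample of the labelled data.
        idx = rng.integers(0, len(A_lab), size=len(A_lab))
        coef, *_ = np.linalg.lstsq(A_lab[idx], y_labelled[idx], rcond=None)
        predictions.append(A_pool @ coef)

    # Variance of the committee's predictions is used as the informativeness score.
    disagreement = np.var(np.stack(predictions), axis=0)
    # Highest-disagreement examples first; these would be sent for expert review.
    return np.argsort(disagreement)[::-1][:n_queries]
```

In a full active learning loop, the selected examples would be labelled by an expert, moved from the pool into the labelled set, and the selection step repeated until the labelling budget is spent.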
