Field Testing of An Autonomous Machine Learning System

AUTOLEARN – AN AUTOMATED MACHINE LEARNING SYSTEM
A SHORT DESCRIPTION OF ITS CAPABILITIES
Asim Roy
1. What is the problem with existing pattern recognition/machine learning technology?
The pattern recognition problem is addressed by a wide variety of fields, from statistics with its traditional
techniques to neural networks and artificial intelligence (AI) with their modern methods based on ideas about
how the human brain performs it. The main obstacle to wider use of this technology, however, has been the need
to involve experts to develop an application. Almost all pattern recognition techniques, whether from
statistics, neural networks or AI, rely on arcane mathematical algorithms that only a few specialists
understand. To develop an application with one of these techniques, one therefore has to hire one or more
specialists, sometimes with a Ph.D., who arrive at a working solution by trial and error.
This dependence on experts has held back wider use of the technology in products and systems. There has thus
long been a need for pattern recognition methods that do not require such “human baby-sitting” by experts.
Developing such methods has not been easy, however, because it poses major theoretical challenges. Section 2
below describes how this situation has changed dramatically with the development of a new system called AutoLearn,
a general-purpose machine learning system that requires no baby-sitting by human experts. The pattern
recognition technique embedded in AutoLearn is based on major theoretical breakthroughs in this field.
2. General features and capabilities of AutoLearn
AutoLearn is a pattern recognition/classification system that “automates” the process of learning by a machine.
The system has a spreadsheet-like user interface for data input. In this preliminary version, data from other
sources (databases, spreadsheet systems) can be brought into the system simply by writing it to a text file that
AutoLearn then reads; data can also be entered directly into AutoLearn. Once the data is available within
AutoLearn, literally “a simple click of a button” is all that is needed for the system to learn the rules of
pattern recognition from the data. There is thus no need for an expert who knows how the underlying algorithm
works, nor any need to set the parameters of the learning algorithm. Here are some of AutoLearn’s other
powerful capabilities:
a) Error minimization: One of the primary tasks of a pattern recognition system is to learn rules that
produce minimum error when applied to previously unseen examples. AutoLearn’s underlying algorithm
likewise learns rules so that its error on new cases is minimized.
b) Generalization: The essence of human learning is generalization, the ability to describe the
phenomenon at hand using a minimal set of rules. No current method can perform this task well without
the intervention of an expert; AutoLearn performs it extremely well without any outside intervention.
For example, on a well-known breast cancer dataset, AutoLearn derived a “single rule” for detecting
benign cases, something no other system has been able to do. Deriving a minimal set of rules leads to a
better understanding of the phenomenon.
c) Derives rules in a form that humans can understand: Neural networks and statistical pattern
recognition techniques cannot derive rules in a human-understandable form. Some AI techniques can,
but only with a lot of outside intervention. The rules derived by AutoLearn are in the
well-accepted IF-THEN form that humans can easily understand.
d) Feature selection: For a given problem, it is often not known which features/attributes are important
for the pattern recognition task. Datasets therefore generally include many features, and that creates a
problem for methods that cannot weed out inappropriate features/attributes: including them results in
higher error rates for these methods. AutoLearn, on the other hand, can weed out inappropriate features
“automatically,” which further helps humans understand the phenomenon. For example, the breast cancer
dataset mentioned earlier has nine (9) features (nine measurements taken of patients’ breast cells), and
AutoLearn derived two rules for detecting malignant cases. These two rules use only six (6) of the nine (9)
features/attributes, which should lead to a better understanding of which features (measurements) are
important for detecting malignancy.
e) Can generate neural networks too: One of the goals of neural network science is to exploit parallel
processing in computation, similar to the mechanisms of the brain. Parallel computation is important in
many applications (e.g., defense systems, robotics), and AutoLearn can indeed “design and train” a
certain kind of neural network “automatically.” This powerful capability has never been available before.
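Items a), c) and d) can be made concrete with a small sketch. The Python below assumes a hypothetical rule format: each rule is a conjunction of threshold tests on numbered features followed by a class label, rules are checked in order, and a default class applies when no rule fires. It is not AutoLearn’s internal representation or its learning algorithm, neither of which this description gives; it only shows how IF-THEN rules are applied to a case, how to list the features a rule set actually uses, and how the error rate on previously unseen cases is measured. The names Rule, classify, features_used and error_rate, and all numbers, are invented for illustration and are not the breast cancer data or the rules AutoLearn derived.

```python
import operator
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical rule format for illustration only; it is not AutoLearn's
# internal representation. A condition is (feature index, operator, threshold).
Condition = Tuple[int, str, float]
OPS = {"<=": operator.le, ">": operator.gt}


@dataclass
class Rule:
    conditions: List[Condition]  # IF part: every condition must hold (AND)
    label: str                   # THEN part: class assigned when the rule fires

    def fires(self, x: Sequence[float]) -> bool:
        return all(OPS[op](x[i], t) for i, op, t in self.conditions)


def classify(rules: List[Rule], x: Sequence[float], default: str) -> str:
    """Return the label of the first rule that fires, else the default class."""
    for rule in rules:
        if rule.fires(x):
            return rule.label
    return default


def features_used(rules: List[Rule]) -> List[int]:
    """Indices of the features referenced by at least one rule."""
    return sorted({i for rule in rules for i, _, _ in rule.conditions})


def error_rate(rules: List[Rule],
               examples: Sequence[Tuple[Sequence[float], str]],
               default: str) -> float:
    """Fraction of (features, label) pairs that the rule set misclassifies."""
    wrong = sum(1 for x, y in examples if classify(rules, x, default) != y)
    return wrong / len(examples)


# Toy rules and toy unseen cases with six features; the numbers are made up.
rules = [
    Rule([(0, ">", 5.0), (2, ">", 3.0)], "malignant"),
    Rule([(4, ">", 7.0)], "malignant"),
]
held_out = [
    ([6, 1, 4, 1, 2, 1], "malignant"),
    ([2, 1, 1, 1, 8, 1], "malignant"),
    ([1, 1, 1, 1, 2, 1], "benign"),
    ([7, 2, 2, 1, 3, 1], "malignant"),
]

print(features_used(rules))                   # [0, 2, 4]: 3 of the 6 features
print(error_rate(rules, held_out, "benign"))  # 0.25: the last case is missed
```

In this made-up example the two rules read “IF feature 0 > 5 AND feature 2 > 3 THEN malignant” and “IF feature 4 > 7 THEN malignant.” Because they never test features 1, 3 or 5, those features are effectively weeded out, which is the kind of information item d) describes, and the error rate is computed only on cases the rules were not built from, as item a) requires.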
“Automated learning” has the potential to generate major, high-volume applications in the future, in areas
ranging from military and medical to manufacturing, business and e-commerce. Without “automated” machine
learning technology, these potential applications remain only a dream.