From Cognitive Science to Data Mining: The first intelligence amplifier
Tom Khabaza, July 2011
tom.khabaza@btinternet.com
Paper for “From Animals to Robots and Back: reflections on hard problems in the study of cognition - A
Symposium in Honour of Aaron Sloman”, School of Computer Science, University of Birmingham
12-13 September 2011
Introduction: Intelligence Amplifiers & Data Mining
“Intelligence Amplification” ([1,2,3]) refers to the idea that the products of Artificial Intelligence will be used
initially not to create fully intelligent machines, but to amplify or increase the power of human intelligence.
Data mining [4,5] is one such intelligence amplifier; data mining algorithms form the core of a process which
amplifies our ability to detect and act upon patterns in large quantities of data.
Whether data mining is really the first intelligence amplifier is open to debate; perhaps it is the first intelligence
amplifier in widespread use. The purpose of this claim is to emphasise that data mining enhances our mental
abilities in a way which is much closer to the idea of intelligence amplification than most widespread uses
of IT.
Historical Background: Poplog, Clementine & CRISP-DM
During the 1980s, the Poplog AI programming environment [6] (developed at Sussex University under the
leadership of Aaron Sloman) was sold in the non-academic market by Systems Designers Ltd, which later
became SD-Scicon. A management buyout from SD-Scicon in 1989 created Integral Solutions Ltd (ISL),
whose core business was initially Poplog. At this stage, ISL’s product range included two machine learning
modules based on decision trees and neural networks, and ISL’s early business included a series of projects
which applied machine learning to extract useful patterns from customers’ data – that is, data mining projects
[7]. Based on the experience of these projects, Colin Shearer invented the Clementine data mining workbench
[8].
Despite being the first practitioner to execute ISL’s commercial data mining projects, I was initially sceptical
about the prospects for data mining and the Clementine workbench. Clearly the machine learning techniques
used for data mining could not in themselves solve business problems of any significance; how then could data
mining technology be of practical use?
The answer, which emerged from successive projects, lay in the data mining process. Clementine had the then
unique property of making data mining algorithms (at that time synonymous with machine learning algorithms)
accessible to non-technologists. This meant that the process of understanding and preparing the data, applying
the algorithms, and interpreting and using the results, could be executed by or in close collaboration with people
whose primary knowledge was in the business domain [9]. This in turn meant that business knowledge and
understanding could be closely integrated with data mining technology in the process of business problem-solving, without falling foul of the limitations of machine knowledge representation.
The design of Clementine, and the business-oriented data mining process which it enabled, were highly
influential, and could be said to have shaped modern data mining practice and tools. The business-oriented
process was later standardised in the data mining methodology CRISP-DM [10].
Data Mining
Data mining is the use of business knowledge to create new knowledge in natural or artificial form by
discovering and interpreting patterns in data. The term “business” is used here to emphasise the use of data
mining for practical purposes, but the definition would be equally correct if “business” were replaced with
“domain”. At heart, data mining is a business process, and is used in a wide variety of applications, such as
customer analytics, fraud detection, risk management, law enforcement, science and medicine.
The more recent term “Predictive Analytics” usually refers to complete solutions in which data mining is
embedded. Data mining is distinguished from other forms of data analysis by the use of “data mining
algorithms”, also sometimes called “predictive modelling algorithms”. “Knowledge in artificial form” refers to
the output of these algorithms, “predictive models” or “data mining models”, which are used to increase
information locally on the basis of generalisation, and are often embedded in Predictive Analytics solutions.
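As an illustration of what it means for a predictive model to “increase information locally on the basis of generalisation”, the following minimal Python sketch generalises from a handful of hypothetical historical customer records and then adds a predicted field to a new record whose outcome is not yet known. The field names, the data and the simple segment-majority model are all assumptions made for illustration; they do not represent any particular data mining algorithm or product.

```python
from collections import Counter, defaultdict

# Hypothetical historical records: (tenure_band, contract, churned)
HISTORY = [
    ("short", "monthly", True),
    ("short", "monthly", True),
    ("short", "annual", False),
    ("long", "monthly", False),
    ("long", "annual", False),
    ("short", "monthly", False),
]


def fit_segment_model(records):
    """Generalise from history: the most common outcome in each (tenure, contract) segment."""
    counts = defaultdict(Counter)
    for tenure, contract, churned in records:
        counts[(tenure, contract)][churned] += 1
    return {segment: c.most_common(1)[0][0] for segment, c in counts.items()}


def predict(model, record, default=False):
    """Return a copy of the record enriched with a predicted outcome field."""
    enriched = dict(record)
    segment = (record["tenure_band"], record["contract"])
    # Segments never seen in the history fall back to a hypothetical default.
    enriched["predicted_churn"] = model.get(segment, default)
    return enriched


model = fit_segment_model(HISTORY)
new_customer = {"id": 101, "tenure_band": "short", "contract": "monthly"}
print(predict(model, new_customer))
# → {'id': 101, 'tenure_band': 'short', 'contract': 'monthly', 'predicted_churn': True}
```

The new record gains a field it did not previously have. Reading “locally” as “for the individual case being scored” is an interpretive assumption made for this sketch.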
The industry-standard data mining methodology is called CRISP-DM
[10] (which stands for CRoss-Industry Standard Process for
Data Mining), and is depicted in figure 1.
CRISP-DM was created by a research consortium, based on consultation
with a wide circle of practising data miners; during this consultation
process, it emerged that all of the practising data miners consulted had
independently arrived at approximately the same process for successful
data mining.
Figure 1: CRISP-DM diagram
CRISP-DM provides an accurate picture of how data mining is carried
out, but omits some key properties of the data mining process, and does
not explain why the process has the form that it does.
9 Laws of Data Mining
Attempting to answer some nagging questions about data mining, I have recently published the “9 laws of data
mining” [11], listed below:
1. Business objectives are the origin of every data mining solution (Business Goals Law)
2. Business knowledge is central to every step of the data mining process (Business Knowledge Law)
3. Data preparation is more than half of every data mining process (Data Preparation Law)
4. The right model for a given application can only be discovered by experiment
or “There is No Free Lunch for the Data Miner” (NFL-DM)
5. There are always patterns (Watkins’ Law)
6. Data mining amplifies perception in the business domain (Insight Law)
7. Prediction increases information locally by generalisation (Prediction Law)
8. The value of data mining results is not determined by the accuracy or stability of predictive models
(Value Law)
9. All patterns are subject to change (Law of Change)
These laws address many aspects of the data mining process, but in this paper I will focus on the 6th law: “Data
mining amplifies perception in the business domain”. This is also called the “Insight Law” because in data
mining the creation of new knowledge in natural form (“knowledge in the head”) is often described as
producing “insight”; insight is one of the two types of result from data mining, the other being predictive
models.
From Intelligence to Perception
How and why does the data mining process produce new knowledge? The data mining process is essentially
one of problem-solving; the business expert works out how to achieve an objective in the business domain.
Business problems are solved by humans, not by algorithms, so how does data mining play a part in this?
The key issue addressed by data mining is that useful information may be buried in volumes of data too large
for patterns to be seen unaided. (Watkins’ Law indicates that such information is always present.) A
conventional view of data mining would suggest that business goals are
translated into data mining goals, then the algorithms are applied to the data, producing predictive models; these
models are used to make predictions and help guide business decision-making in such a way as to help achieve
the business goal. However, this view omits two crucial factors – one is the pervasive role of business
knowledge (as per the 2nd law) and the other is the production of insight, or new knowledge. It is on this second
shortcoming that I will now focus.
While data mining may indeed produce predictive models to aid decision-making, both the models themselves
and the process that produces them can also tell us new things about the business or domain. The process of
understanding and preparing the data means examining the data in a great deal of detail, and new facts often
emerge from this process; the data themselves have no intrinsic meaning, but when interpreted in the light of
business knowledge the data often reveal important new information about the business, even before data
mining algorithms are applied. When predictive models are produced, these will also often tell us important
information about the business – this may be revealed by the behaviour of the model, or by the model itself,
such as the readable rules in a decision-tree model, or by the relative importance of different input variables in
unreadable models. Again this information has no intrinsic importance, but can be seen to be important when
interpreted in the light of business knowledge.
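The “readable rules in a decision-tree model” mentioned above can be illustrated with a deliberately tiny example. The Python sketch below searches for the single best split (a one-level decision tree, or “decision stump”) over some hypothetical campaign-response records. The data, the field names and the simple error-count criterion are assumptions for illustration; real decision-tree algorithms grow deeper trees and use more refined split criteria.

```python
# Hypothetical campaign data: two numeric inputs and a 0/1 outcome.
ROWS = [
    {"age": 23, "spend": 120, "responded": 1},
    {"age": 35, "spend": 40, "responded": 0},
    {"age": 29, "spend": 150, "responded": 1},
    {"age": 52, "spend": 30, "responded": 0},
    {"age": 41, "spend": 200, "responded": 1},
    {"age": 60, "spend": 20, "responded": 0},
]


def majority(labels):
    """Majority 0/1 label; ties go to 0."""
    return 1 if 2 * sum(labels) > len(labels) else 0


def best_stump(rows, inputs, target):
    """Find the single (input, threshold) split that misclassifies the fewest rows."""
    best = None  # (errors, var, threshold, left_label, right_label)
    for var in inputs:
        for t in sorted({r[var] for r in rows}):
            left = [r[target] for r in rows if r[var] <= t]
            right = [r[target] for r in rows if r[var] > t]
            if not left or not right:
                continue  # a split must put rows on both sides
            lp, rp = majority(left), majority(right)
            errors = sum(y != lp for y in left) + sum(y != rp for y in right)
            if best is None or errors < best[0]:
                best = (errors, var, t, lp, rp)
    return best


def as_rule(stump, target):
    """Render the stump as a human-readable IF/THEN/ELSE rule."""
    errors, var, t, lp, rp = stump
    return f"IF {var} <= {t} THEN {target} = {lp} ELSE {target} = {rp} (errors: {errors})"


stump = best_stump(ROWS, ["age", "spend"], "responded")
print(as_rule(stump, "responded"))
# → IF spend <= 40 THEN responded = 0 ELSE responded = 1 (errors: 0)
```

The rule is readable, but whether “spend <= 40” matters can only be judged by someone who knows what spend means in the business, which is the point the paragraph above makes.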
It is a characteristic of these processes that they take place in the business domain; every piece of data and every
action has a business meaning. The data miner works, not in the realm of bits, bytes and algorithms, but in the
domain of enquiry. The data mining process enables the data miner to see things which would not be visible
unaided. We know that perception is an active, knowledge-based process. The data miner sees things in the
business domain by knowing what they are looking at.
My first hypothesis in this paper is that data mining amplifies perception in the following way: data mining
algorithms can detect patterns in data which are not visible to the naked eye, but the algorithms themselves have
no domain knowledge. The business expert has the business knowledge but cannot see the patterns unaided.
The data mining process (as described by CRISP-DM) enables the business expert to incorporate the pattern
discovery capabilities of the algorithms into their own perceptual process. There is nothing mysterious about
this – the process is mostly a codification of common sense – but it explains why data miners have the
experience of seeing things in the data. This is because data mining is like a perceptual process.
I have always wondered why machine learning algorithms (from the field of AI) seem to work better for data
mining than those originating in the field of statistics. My second hypothesis in this paper is that machine
learning algorithms work well for data miners because they are designed to be part of a cognitive system.
Machine learning systems tend to be based on intuitively plausible models of knowledge. For the purposes of
the data miner, it matters little whether these models are correct descriptions of human cognition; what makes
them helpful for data miners is the plausible nature of the knowledge they create or the patterns they discover.
This makes the algorithms easier to use as an extension of one’s own cognition.
Conclusion: The Impact of Cognitive Science
A bird’s-eye view of the activities of data miners in organisations would not immediately reveal anything to do
with cognition. A data miner appears to (and does in fact) work in the domain of application – they would seem
like marketeers, or fraud detection operatives, or police intelligence officers, or geneticists, or medics. They are
exactly this, but with their perceptual abilities, within their domain of operation, enhanced by
the ability to see meaningful patterns in data. For data miners, data mining acts as an intelligence
amplifier.
This kind of intelligence amplifier does not provide the expanded human intellect envisioned by Ashby [12];
nevertheless, the expanded perceptual abilities of data miners can be used to make the world a better place (e.g.
[13,14,15,16,17]).
If my second hypothesis is correct, then this ability of data mining to enhance the perception of domain workers
is a product of Cognitive Science research. By focussing on cognition, we have produced tools
which can become part of cognition.
References
[1] Ashby, W. R., An Introduction to Cybernetics, Chapman and Hall, 1956.
[2] Licklider, J. C. R., “Man-Computer Symbiosis”, IRE Transactions on Human Factors in Electronics, vol.
HFE-1, pp. 4-11, March 1960.
[3] Engelbart, D. C., Augmenting Human Intellect: A Conceptual Framework, Summary Report AFOSR-3233,
Stanford Research Institute, Menlo Park, CA, Oct 1962.
[4] Berry, M. J. A. & Linoff, G., Data Mining Techniques: For Marketing, Sales and Customer Support, Wiley,
1997.
[5] Helberg, C., Data Mining with Confidence, SPSS Inc., Chicago, 2002.
[6] du Boulay, J. B. H., Khabaza, T., Elsom-Cook, M. & Taylor, J., Poplog and the Learner: An Artificial
Intelligence Environment Used in Education, in Directory of Computer Training 1986, Badegmore part
Enterprises for Hoskyns Education.
[7] Fitzsimons, M., Khabaza, T. & Shearer, C. “The Application of Rule Induction and Neural Networks for
Television Audience Prediction”, in Proceedings of ESOMAR/EMAC/AFM Symposium on Information
Based Decision Making in Marketing, Paris, November 1993, pp 69-82.
[8] Khabaza, T. & Shearer, C., “Data Mining with Clementine”, IEE Colloquium on Knowledge Discovery in
Databases, Digest No 1995/021(B), London, February 1995.
[9] Shearer, C. & Khabaza, T., Data Mining by Data Owners, Intelligent Data Analysis, Baden-Baden,
Germany, August 1995.
[10] Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T., Shearer, C., & Wirth, R., CRISP-DM 1.0:
Step-by-step data mining guide, http://www.crisp-dm.org, 1999.
[11] Khabaza, T., Nine Laws of Data Mining, www.khabaza.com/9laws, also published as a discussion group on
LinkedIn and on Twitter.
[12] Asaro, P. M., “From Mechanisms of Adaptation to Intelligence Amplifiers: The Philosophy of W. Ross
Ashby”, in Husbands, P., Holland, O. & Wheeler, M. (eds.) The Mechanical Mind in History, MIT Press,
2008.
[13] Van, J., “SPSS tools unravel secrets of disease”, Chicago Tribune, 11th January 2003.
[14] Piatetsky-Shapiro, G., Khabaza, T. & Ramaswamy, S., “Capturing Best Practice for Microarray Gene
Expression Analysis”, SIGKDD 2003, August 2003, Washington.
[15] Adderley, R. & Musgrove, P. B., “Data mining at the West Midlands Police: A study of bogus official
burglaries”, BCS Special Group Expert Systems, 1999, London, Springer-Verlag, pp191-203.
[16] McCue, C., Data Mining and Predictive Analysis: Intelligence Gathering and Crime Analysis,
Butterworth-Heinemann, 2006.
[17] Chang, C.-J. & Shyue, S.-W., “A study on the application of data mining to disadvantaged social classes in
Taiwan’s population census”, Expert Systems with Applications 36, pp 510–518, Elsevier, 2009.