Using Computational Creativity to Guide Data-Intensive Scientific Discovery

Kazjon Grace and Mary Lou Maher
{k.grace,m.maher}@uncc.edu
The University of North Carolina at Charlotte

Abstract

The generation of plausible hypotheses from observations is a creative process. Scientists looking to explain phenomena must invent hypothetical relationships between their dependent and independent variables and then design methods to verify or falsify them. Data-driven science is expanding both the role of artificial intelligence in this process and the scale of the observations from which hypotheses must be abduced. We adapt methods from the field of computational creativity – which seeks to model and understand creative behaviour – to the generation of scientific hypotheses. We argue that the generation of new insights from data is a creative process, and that a search for new hypotheses can be guided by evaluating those insights as creative artefacts. We present a framework for data-driven hypothesis discovery that is based on a computational model of creativity evaluation.

Introduction

Machine learning, for decades considered a standard tool throughout the sciences, is now being applied to discovery. Traditional approaches used AI to support hypotheses, building models that verify expected relationships between variables. AI systems are now regularly used to generate novel hypotheses from data (Colton and Steel 1999).

Another recent development in machine learning has been practical meta-optimisation: the use of optimisation approaches to tune the hyper-parameters of other optimisation problems. These sequential decision-making methods – such as Bayesian optimisation (Snoek, Larochelle, and Adams 2012) – permit machine learning to be adopted by non-experts, essentially substituting required computational time for required expertise. Practical meta-optimisation (for which maturing tools are now freely available) facilitates machine learning as a hypothesis confirmation tool, simplifying the process of constructing effective predictive models of well-defined problems.
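To make this concrete, a minimal sketch of such meta-optimisation follows. The choice of the scikit-optimize library, and of an SVM on a toy dataset as the inner problem, are our illustrative assumptions; the paper does not name a specific tool.

```python
# Minimal sketch of Bayesian optimisation for hyper-parameter tuning,
# in the spirit of Snoek, Larochelle, and Adams (2012). scikit-optimize
# and the SVM inner problem are illustrative assumptions, not the
# paper's method.
from skopt import gp_minimize
from skopt.space import Real
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

def objective(params):
    C, gamma = params
    model = SVC(C=C, gamma=gamma)
    # Negate accuracy because gp_minimize minimises its objective.
    return -cross_val_score(model, X, y, cv=3).mean()

search_space = [Real(1e-3, 1e3, prior="log-uniform", name="C"),
                Real(1e-4, 1e1, prior="log-uniform", name="gamma")]
result = gp_minimize(objective, search_space, n_calls=30, random_state=0)
print("best hyper-parameters:", result.x, "accuracy:", -result.fun)
```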
This paper investigates the analogous problem: what would an automated tool look like for hypothesis discovery, rather than confirmation? How can methods like Bayesian optimisation be extended to exploratory search, and what would be the objective function for such a tool? This paper seeks answers to those questions in research on creativity, particularly in a domain of artificial intelligence called computational creativity: the computational modelling of behaviours that, if observed in humans, would be considered creative (Wiggins 2006). We argue that the exploration of possible hypotheses is a creative act, and that good hypotheses are creative artefacts. Building on research that defines a creative artefact as one that is novel, valuable and surprising (Maher and Fisher 2012; Grace et al. 2014a; 2014b), we define a framework for computational hypothesis discovery based on the search for hypotheses with the union of those three factors as the objective.

Creativity and the exploration of data

Creativity has long been identified as at the heart of the scientific endeavour (Taylor and Barron 1963). Hypothesis construction involves creatively transforming data in such a way that a new correlation emerges, and is as much a problem-framing activity as a problem-solving one. We propose that computational methods for evaluating the creativity of objects can be applied to hypothesis discovery, as good hypotheses are innovative, disruptive and require novel approaches to the problem – all hallmarks of creativity.

The use of AI methods for creative knowledge discovery dates back (to the authors’ knowledge) to AM and EURISKO (Lenat 1983), which generated new conjectures in mathematics, science and strategic wargames, among other disciplines. These programs possessed a model of interestingness that guided their decisions about what to explore next. One descendant of these programs, HR, expanded on interestingness, explicitly connecting interest-motivated discovery to the notion of creativity (Colton and Steel 1999). Innovation analytics shares a goal with these early AI systems.

Computationally evaluating creativity

What motivates humans to deem artefacts creative? Given that we humans are responsible for constructing the notion of creativity, that question is analogous to defining creativity itself. A universally accepted definition is not readily available. Taylor (1988) gives dozens of attempts at defining creativity, from those that emphasise the creative process and aim to define it as a mental phenomenon to those that emphasise the social grounding of creative acts and aim to define it in terms of how society responds to creations. While these definitions were mostly founded with respect to artistic creativity, the same distinctions hold in science. Newell, Shaw, and Simon (1959) define creative problem solving (an activity highly related to hypothesis generation) as occurring where one or more of the following conditions are met: the result has novelty and value for the solver or their culture, the thinking is unconventional, the thinking requires high motivation and persistence, or the problem is ill-defined, requiring significant problem formulation.

Koestler (1964) and Boden (1990) offer related definitions based on the creative process rather than the resulting artefacts. Both are centred on the idea of a mental context for the domain in which the creative problem exists, a representation which Koestler calls a “matrix” and Boden a “conceptual space”. Koestler’s matrices are “patterns of ordered behaviour governed by fixed rules”, while Boden’s spaces are dimensioned by the “organising principles that unify and give structure to a given domain of thinking”. Boden describes the pinnacle of creative thinking as the transformation of the space of what is possible to include artefacts that were previously outside that space, while Koestler describes creativity as the blending, or “bisociation”, of elements from two distinct mental frames of reference. The transformative notions of the creative process postulated by Boden and Koestler are similar to those of unconventionality and reformulation raised by Newell. Each involves the relaxation of constraints and the adoption of elements – of process or product – considered outside the “norm”. The idea of transformation as a necessary component of creativity has attracted criticism for being difficult to operationalise (Wiggins 2006).

The notions of novelty and value first articulated by Newell are common among many of the definitions of creativity found in the literature (Taylor 1988). This duality has recently been criticised by computational creativity researchers (Maher and Fisher 2012) as insufficient, with the argument that most computational models of novelty are based solely on the difference between artefacts. Such objective measures do not capture the complexity and subtlety of expectations about artefacts, or the way that violating those expectations interacts with creativity. To address this criticism we adopt the definition of a creative hypothesis as one that is novel, valuable and surprising, based on the framework described in Maher and Fisher (2012). The additional surprise criterion integrates the ideas of reformulation, transformation and bisociation raised in other definitions with the novelty/value duality of earlier ones. Novelty (the degree to which an artefact is different from similar artefacts that already exist) and value (the degree to which an artefact is useful for its intended purpose) are near-universally accepted components of creativity. Surprise (the degree of unexpectedness of an artefact) is distinct from novelty: computational models of the latter capture only originality relative to the domain, and do not model notions of unexpectedness or the violation of trends. We adapt this three-factor measure of creativity for use in computational problem framing for analytics, where the “artefact” being evaluated for creativity is a relationship discovered within the data.

Innovation Analytics: A framework for computational hypothesis discovery

We define innovation analytics generally as a search for hypotheses about the dependent variables expressed in terms of the independent variables, where the objective function is the union of novelty, value and surprise. We will investigate several solutions for combining the three factors of novelty, value and surprise into a scalar value, one candidate for which is sketched below.
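As one illustrative candidate – our assumption for illustration, not a commitment of the framework – the three factors could be combined as a weighted geometric mean, which prevents the search from ignoring any single factor:

```python
# Illustrative sketch: one possible way to collapse novelty, value and
# surprise into a single objective. The weighted geometric mean is an
# assumption for illustration; the paper leaves the combination open.
def creativity_score(novelty: float, value: float, surprise: float,
                     weights=(1.0, 1.0, 1.0)) -> float:
    """All inputs are assumed normalised to [0, 1]."""
    factors = (novelty, value, surprise)
    total = sum(weights)
    # Geometric mean: a hypothesis scoring zero on any factor scores
    # (near-)zero overall, so no factor can be traded away entirely.
    score = 1.0
    for f, w in zip(factors, weights):
        score *= max(f, 1e-9) ** (w / total)
    return score
```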
When the dataset becomes large and diverse, as is the case in any “big data” context, the problem of model selection rapidly becomes computationally prohibitive (Burnham and Anderson 2002). It is necessary to first select a subset of the overall data to relate to the desired dependent variables. Then it is often necessary to pre-process the selected variables in some way, by quantization, dimensionality reduction, or some other transformation. Only then can a supervised learning algorithm be applied to predict a relationship between the student attributes and the success attribute(s). At each step there are a number of hyperparameters to tune, dependent on the data and algorithms selected. These steps combine to produce a very large space of potential models to explore, with each point in that space representing a potential relationship between student data and student success. We propose a computational method for exploring that space, using methods derived from computational creativity evaluation. Automated model selection is a rich and ongoing area of research (Bozdogan 1987; Burnham and Anderson 2002), from which this project is differentiated by its goal of scientific hypothesis discovery and its accompanying objective function of creativity.

We outline the innovation analytics framework in four parts: the computational models of value, novelty, and surprise which make up the creativity evaluation function, and the search process by which we propose to maximise them. We refer to independent variables, which make up the available data from which we intend to build a model, and dependent variables, about which we intend to form hypotheses conditional on the independent variables.

Generating influence factors

Innovation analytics is a search-based approach to hypothesis discovery. The space of hypotheses that explain the dependent variables will be explored for candidates which are deemed creative. Hypotheses will be represented as tree structures composed of selections over data, transformations of data, and predictive models applied to data, with each operation likely to incorporate several hyperparameters; a sketch of one such representation follows below. We are currently investigating approaches for the search algorithm, and are considering Genetic Programming (Koza, Bennett III, and Stiffelman 1999), Bayesian Optimisation (Snoek, Larochelle, and Adams 2012) and Particle Swarm Optimisation (Poli, Kennedy, and Blackwell 2007), among others. Innovation analytics defines a non-stationary search, with the objective function defined by novelty, value and surprise changing with the discovery of each new hypothesis. This drift in the objective function will guide the search trajectory of the system towards new, creative hypotheses.
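To make the tree representation concrete, a minimal sketch follows. The node types, field names and the example data columns are hypothetical illustrations of the “selections, transformations and predictions” description, not the paper’s actual encoding.

```python
# Minimal sketch of a hypothesis represented as a tree of operations.
# Node types and fields are hypothetical illustrations of the paper's
# "selections, transformations and predictions" description.
from dataclasses import dataclass, field

@dataclass
class Select:
    """Choose a subset of the independent variables."""
    columns: list

@dataclass
class Transform:
    """Pre-process selected data, e.g. quantization or PCA."""
    name: str                      # e.g. "pca", "quantize"
    hyperparams: dict = field(default_factory=dict)
    child: object = None           # node whose output is transformed

@dataclass
class Predict:
    """Fit a supervised model relating the data to a dependent variable."""
    model: str                     # e.g. "logistic_regression"
    hyperparams: dict = field(default_factory=dict)
    child: object = None

# One candidate hypothesis: predict course completion from swipe-card
# and forum-activity data (hypothetical columns), reduced to two
# dimensions first.
hypothesis = Predict(
    model="logistic_regression", hyperparams={"C": 1.0},
    child=Transform(name="pca", hyperparams={"n_components": 2},
                    child=Select(columns=["swipe_card_visits",
                                          "forum_posts"])))
```

Each operation’s hyperparameters live on its node, so a search algorithm can mutate structure (which operations compose the tree) and parameters (their settings) independently.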
Assessing the value of a hypothesis

A creative artefact’s value is the degree to which it is recognised as useful, and of the three aspects of creativity it parallels traditional effectiveness measures most closely – it is the union of value with novelty and surprise that defines creativity. Two major ways of assessing value have been investigated in the literature: assessing recognition socially (Sarkar and Chakrabarti 2011; Grace et al. 2014b) or assessing utility directly (Maher and Fisher 2012). For this project we adopt the latter approach, constructing a measure of the utility of a hypothesis: the strength of the predicted correlation between the data and a dependent variable.

Our value measure is based on the idea that a useful influence factor explains variance in the data that existing influence factors do not. Some factors may be highly predictive for a small subset of the data, while others may be only slightly better than random chance but apply near-universally – i.e. value is multiobjective, incorporating both predictive power and effect size. A valuable hypothesis is not just one that accurately explains many data points, but one which explains data that few other hypotheses do. Value is not assessed by summing over performance dimensions, because there is no objective weighting function by which to do so. We operationalise this notion by positing a “performance space” where each dimension is the explained variance of a single entry in the test set. In this space each test point’s explained variance is a dimension of performance, and valuable hypotheses exist in unpopulated parts of that space. Hypotheses can be clustered in this space, and a distance measure applied between the candidate hypothesis and the nearest cluster. This represents how different the correlation is from similar correlations: a measure of its uniqueness in the space of explained variance of the dependent variable. Hypotheses with high value will offer either stronger explanations of academic success than current hypotheses, or explanations for groups of students that current hypotheses do not account for. As more hypotheses are discovered, the clusters and distance measures will adapt, reflecting the system’s changing knowledge of the space. Hypotheses that would previously have been of high value will reduce in value if many highly similar hypotheses are discovered.
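A minimal sketch of this value measure follows, assuming per-test-point explained variance has already been computed for each hypothesis; k-means and Euclidean distance are placeholder choices, not the paper’s commitment.

```python
# Sketch of value as uniqueness in "performance space". Each hypothesis
# is a vector of per-test-point explained variance; k-means and the
# Euclidean distance are placeholder choices for illustration.
import numpy as np
from sklearn.cluster import KMeans

def value(candidate: np.ndarray, existing: np.ndarray, k: int = 5) -> float:
    """candidate: (n_test,) explained variance per test point.
    existing: (n_hypotheses, n_test) matrix for known hypotheses."""
    k = min(k, len(existing))
    clusters = KMeans(n_clusters=k, n_init=10).fit(existing)
    # Distance from the candidate to the nearest cluster centroid:
    # a large distance means the candidate explains data that few
    # existing hypotheses explain.
    dists = np.linalg.norm(clusters.cluster_centers_ - candidate, axis=1)
    return float(dists.min())
```

Because the clustering is refit as hypotheses accumulate, the same candidate scores lower once many similar hypotheses populate its region of the space, matching the adaptive behaviour described above.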
Assessing the novelty of a hypothesis

A creative artefact’s novelty is the degree to which it differs from other artefacts of the same kind. Each hypothesis to be evaluated for creativity is represented by both the transformation of the data necessary to construct it and the correlation(s) predicted after that transformation is applied. Applied to identifying influence factors in data, this means that novel hypotheses transform the data in ways unlike other existing hypotheses, and in doing so find different kinds of correlations within the data.

Maher and Fisher (2012) evaluate novelty by clustering observed artefacts and calculating the distance between an object and its nearest cluster. They distinguish this from value in that novelty is calculated in the artefact description space, while value is calculated in the artefact performance space. An innovation analytics hypothesis is represented by a tree structure composed of selections, transformations and predictions about data. Using an edit-distance-based similarity metric we can define a space of possible hypotheses and cluster the known hypotheses within that space. We will adopt a hierarchical clustering approach to novelty evaluation, which is advantageous as the domain of possible hypotheses – like any domain of creative artefacts – is expected to exhibit structure at multiple levels. A hierarchical clustering approach requires fewer assumptions about the number and scale of the clusters comprising the domain than would single-level clustering. Novelty will be evaluated as the weighted sum of the distances between the new hypothesis and each of the clusters it is placed in, with an exponentially decreasing weight dependent on the depth of each cluster within the tree. This reflects that being an outlier in a more general cluster indicates significantly more novelty than being an outlier in a more specific one.
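A minimal sketch of this depth-weighted novelty score follows, assuming the hierarchical clustering step has already assigned the hypothesis to one cluster per level of the hierarchy; the exponential decay rate is an illustrative assumption.

```python
# Sketch of novelty as a depth-weighted sum of distances to the clusters
# a hypothesis is assigned to, one per level of the hierarchy (root = 0).
# The decay rate of 0.5 is an illustrative assumption.
def novelty(dist_per_level: list, decay: float = 0.5) -> float:
    """dist_per_level[d] = distance from the hypothesis to its assigned
    cluster at depth d, computed with the tree edit-distance metric."""
    return sum((decay ** depth) * dist
               for depth, dist in enumerate(dist_per_level))

# Example: an outlier near the root contributes more than an equally
# distant outlier deep in the hierarchy.
print(novelty([0.8, 0.3, 0.1]))   # 0.8*1 + 0.3*0.5 + 0.1*0.25 = 0.975
```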
Assessing the surprisingness of a hypothesis

Grace et al. (2014a) describe two kinds of creativity-relevant surprise: artefacts that violate observed relationships, and artefacts that require conceptual restructuring to comprehend. Both are contingent on the violation of strongly-held expectations about future events, but differ in the kinds of expectation being violated. Relational surprise arises from violating expectations about the value some attribute of an artefact will take based on the values of other attributes. Structural surprise arises from the expectation that existing knowledge about how a domain is structured is complete and correct, and is violated when that knowledge must be restructured to accommodate new observations.

Structural surprise can be assessed via a hypothesis’s impact on the hierarchical clustering used for novelty evaluation. When a new hypothesis is evaluated it perturbs the structure of the online hierarchical clustering of observed hypotheses, affecting how other hypotheses are clustered in addition to itself. The scale of this perturbation can be measured using a delta function that measures the difference between the cluster hierarchy before and after the new hypothesis is observed.

Relational surprise models the interactions between the independent and dependent attributes. Each independent attribute will initially be assumed to be statistically independent of each dependent attribute, and these assumptions will be updated with each iteration of the system to reflect the degree of interaction predicted by all available hypotheses. The expected influence between variables in future iterations can then be modelled using regression. Sudden increases in the relationship between variables – those that violate the trend predicted by past observations – will elicit surprise; a sketch of this trend-violation measure follows at the end of this section.

In both of these models, surprising hypotheses about relationships between data are those that force the system to reconsider its understanding of the problem domain. Structural surprise occurs when expectations about the different kinds of hypothesis that exist are violated, causing the system to restructure its clustering of what is observed. Relational surprise occurs when expectations about the relationships between attributes and success are violated, causing the system to re-evaluate their expected future interdependence. A hypothesis which violates expectations in either or both cases is considered surprising by the system.
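A minimal sketch of the relational-surprise idea follows, assuming a recorded history of interaction strengths between one attribute pair across past iterations; the linear trend model and the normalisation by past variability are our illustrative assumptions.

```python
# Sketch of relational surprise: fit a trend to the past interaction
# strengths between an attribute pair, then score a new observation by
# how far it departs from that trend. The linear fit is an illustrative
# assumption; any regression model could stand in.
import numpy as np

def relational_surprise(history: np.ndarray, new_strength: float) -> float:
    """history: interaction strengths from past iterations, oldest first."""
    t = np.arange(len(history))
    slope, intercept = np.polyfit(t, history, deg=1)   # expected trend
    predicted = slope * len(history) + intercept       # next-step forecast
    spread = history.std() + 1e-9
    # Surprise = size of the trend violation, in units of past variability.
    return abs(new_strength - predicted) / spread

# A sudden jump after a flat history is highly surprising:
print(relational_surprise(np.array([0.10, 0.12, 0.11, 0.09]), 0.55))
```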
Proposed domain: Learning analytics

Large-scale educational institutions, whether they are predominantly campus-based like many comprehensive state universities, or predominantly online like MOOCs, graduate far fewer students than they enrol. The US national average graduation-within-six-years rate for public four-year universities is 46% (statistics from the National Center for Education Statistics, nces.ed.gov). Completion rates among MOOC courses are lower still, with rates of less than 10% being the norm. Those figures are far in excess of what can be explained by attrition of the incapable or unmotivated. Educational organisations are increasingly in possession of overwhelming amounts of data about student activities, in both traditional campus-based and large-scale online learning environments. Most educational analytics research (Campbell, DeBlois, and Oblinger 2007; Ferguson 2012) has focused on predicting overall academic success from intuitively related sources, primarily other academic data. Innovation analytics permits an exploration of heterogeneous data that is not intuitively related to academic performance, with potential sources including online activity, campus purchasing habits, swipe-card logs and club/society activity. In this domain it is necessary not only to predict student success, but to construct scrutable, verifiable hypotheses on which policy can be based.

Conclusion

We frame the act of discovering new hypotheses as a creative process, and propose a novel method of representing that act computationally based on that framing. The approach is inspired by theories and models developed for evaluating creative products, providing a theoretical basis for generating and evaluating creative hypotheses. Creativity is operationalised as the intersection of novelty, value and surprise: good designs are novel, unexpected and valuable to their users, while good influence factors are novel, unexpected and valuable to the research community. Novel influence factors promote progress, unexpected influence factors challenge established norms, and valuable influence factors explain the data. When the computational system identifies a correlation it evaluates as being creative, it communicates to its users the discovery, the interpretation of the data needed to discover it, and a justification for its discovery. This project posits a new approach to data-intensive analytics in scientific research that goes beyond a focus on finding strong correlations in large datasets. This project will develop a framework for discovering and communicating creative hypotheses about educational data.

References

Boden, M. A. 1990. The Creative Mind: Myths and Mechanisms. Routledge.
Bozdogan, H. 1987. Model selection and Akaike’s information criterion (AIC): The general theory and its analytical extensions. Psychometrika 52(3):345–370.
Burnham, K. P., and Anderson, D. R. 2002. Model Selection and Multi-model Inference: A Practical Information-theoretic Approach. Springer.
Campbell, J. P.; DeBlois, P. B.; and Oblinger, D. G. 2007. Academic analytics: A new tool for a new era. Educause Review 42(4):40.
Colton, S., and Steel, G. 1999. Artificial intelligence and scientific creativity. Artificial Intelligence and the Study of Behaviour Quarterly 102.
Ferguson, R. 2012. Learning analytics: drivers, developments and challenges. International Journal of Technology Enhanced Learning 4(5):304–317.
Grace, K.; Maher, M. L.; Fisher, D.; and Brady, K. 2014a. Modeling expectation for evaluating surprise in design creativity. In Design Computing and Cognition 2014 (to appear).
Grace, K.; Maher, M. L.; Fisher, D.; and Brady, K. 2014b. A data-intensive approach to predicting creative designs based on novelty, value and surprise. International Journal of Design, Creativity and Innovation (to appear).
Koestler, A. 1964. The Act of Creation. London: Hutchinson.
Koza, J. R.; Bennett III, F. H.; and Stiffelman, O. 1999. Genetic Programming as a Darwinian Invention Machine. Springer.
Lenat, D. B. 1983. EURISKO: A program that learns new heuristics and domain concepts: The nature of heuristics III: Program design and results. Artificial Intelligence 21(1):61–98.
Maher, M. L., and Fisher, D. H. 2012. Using AI to evaluate creative designs. In 2nd International Conference on Design Creativity (ICDC), 17–19.
Newell, A.; Shaw, J.; and Simon, H. A. 1959. The Processes of Creative Thinking. Rand Corporation.
Poli, R.; Kennedy, J.; and Blackwell, T. 2007. Particle swarm optimization. Swarm Intelligence 1(1):33–57.
Sarkar, P., and Chakrabarti, A. 2011. Assessing design creativity. Design Studies 32(4):348–383.
Snoek, J.; Larochelle, H.; and Adams, R. P. 2012. Practical Bayesian optimization of machine learning algorithms. In NIPS, 2960–2968.
Taylor, C. W., and Barron, F. E. 1963. Scientific Creativity: Its Recognition and Development. John Wiley.
Taylor, C. W. 1988. Various approaches to the definition of creativity. In Sternberg, R. J., ed., The Nature of Creativity. Cambridge University Press. 99–124.
Wiggins, G. A. 2006. A preliminary framework for description, analysis and comparison of creative systems. Knowledge-Based Systems 19(7):449–458.