From: AAAI Technical Report SS-95-03. Compilation copyright © 1995, AAAI (www.aaai.org). All rights reserved. Generic Tasks of Scientific Discovery Radl E. Vald~s-P~rez Computer Science Department and Center for Light Microscope Imaging and Biotechnology Carnegie Mellon University Pittsburgh, PA 15213- USA Abstract Theaim of this paper is conceptualand theoretical: to proposethe conceptof generic scientific task as an organizing principle for research on scientific discovery, or morebroadly, scientific inference. Generictasks cut across various scientific fields, but involverather specific inferences; in any case, their generality is found only by surveying science as a whole. Computerprograms that carry out these generic tasks maydiffer somewhatfrom one science to another, but they all share similar computational mechanisms:hence they represent generic tasks. Weassemble a broad array of previous work as evidence to support this concept, and work out the research plan and methodsthat are implied by it. Introduction Most of the AI work on scientific discovery (including our own) has been carried out under one of two goals: to apply AI techniques fruitfully to science, or to show that computerized human-like search mechanisms can approximate human performance on scientific and mathematical tasks. Wecite just two examples of each: Firstly, the DENDRAL project applied AI methods to the chemical problem of elucidating molecular structure on the basis of mass-spectrometer observations (Lindsay el al. 1993), and the GRAFFITIproject has resulted in a program that makes interesting, plausible conjectures in graph theory (Fajtlowicz 1988; Chung 1988). In pursuit.of the second goal, Langley, Simon, Bradshaw, and Zytkow have described several programs that employ productions systems in a human-like manner to provide simple scenarios for prominent discoveries of the last century (Langley et al. 1987). Earlier, the AMprogram was claimed to make use of humanoid mechanisms to make conjectures in number theory (Lenat 1982). Manyother programs implemented under either banner could be cited. Of course, the two goals are not wholly separate: instead, they reflect a special emphasis. For example, an emphasis on AI performance (or expert) systems serves nevertheless to confirm the machine-centered half of 23 the theory of heuristic search (Newell &Simon 1976), whereas research having a cognitive emphasis serves to open up new scope for expert systems in science. The above overview leads to an important observation: most AI work on scientific discovery has viewed discovery either as a domain, i.e., as an outlet for AI techniques, or as human phenomena to be understood using humanoid mechanisms, e.g., production systems. These are worthy goals that merit further work, but there exist complementaryalternatives. Here, our goal is to propose such an alternative that views scientific discovery as abstract phenomenathat can be studied with the usual methods of empirical and theoretical science. The term "abstract" signifies that these phenomena are not the same as the reallife psychological phenomenaof humanscientists making inferences, but are instead the phenomenaof, for example, the logical and heuristic relations that hold between a hypothesis and evidence, or the degree of analogy between two scientific theories. Moreover, we use the term "phenomena" to emphasize the empirical nature of this alternative viewpoint: these abstract phenomenaconsist of the abstract inferential problems that arise within real-life science, regardless of howhuman scientists deal with them. In science, the presence of rich and interesting phenomenais a cue for developing scientific fields (Norwood Hanson (Hanson 1958) quotes Isaac Newton: "the task of natural philosophy is to argue from phenomena"). The phenomenaof scientific inference are rich: the number of records added to the Science Citation Index is upwards of 728,000 per year. Their interestingness we take as self-evident. It is unlikely that the inferences that arise within these papers are all the same, hence there is scope for seeking inferential patterns that impose order amongvariety. Any scientific field needs some theory, which can be as modest as a preliminary conceptual structure; quoting from (Hanson 1958) again: "Theories make phenomenainto systems" (his reference is not to computer systems, but to systematic knowledge; however, we are content with both senses). The main purpose of this paper is theoretical: to propose an organizing concept Comput~tiona, l t~iolo~), Computational Psycholosy Computational Petrolos~ Computation~ Mineralogy Comput~tlonal Cherniatr~ Computational As{ronomy Computational Zoology Computational Volcanology Comput~tionM Fhysics Computational G’eology Computational Oceanography Comput gtionM Physiology Computational Materials Science Computational Paleontology Computatl ~nM Seismolo ~y Computational Meteorology Figure 1: Usual View of Computation in Science that can be a fruitful basis for developing a theory of scientific discovery as abstract phenomena. The concept is that of a 1generic task of scientific discovery. The rest of this paper explains this concept, argues why it is promising, and examines related views (due to the broad scope of this subject, this paper will make more citations than is customary, but as necessary to make a convincing case). Generic Tasks of Scientific Discovery Our proposal is that research into the abstract phenomena of scientific discovery should he organized around the concept of generic tasks. Rather than slice up Science and Computation conventionally into computational biology, physics, volcanology, etc. as in Figure 1, we propose slices that are determined by tasks, which can include disparate sciences such as biology and physics. The next few sections will develop our meaning by examining in some detail numerous generic tasks from science. Each generic task will involve a single, specific type of inference that occurs in more than one science. We selected tasks by identifying programs that automate specific inferences to a convincing degree, that have a good link to scientific practice, and that show good promise of generality within science. For example, we have excluded simulation, experiment planning, and theory revision as generic tasks because they are too broad. Undoubtedly one could identify further generic tasks within these areas, but that is left for future development of the concept. The more good instances that can be added to the provisional list below, the more promising is the concept of generic task of scientific discovery. More generic tasks can certainly be drawn from applied statistics, but we emphasize here those tasks with origins in AI. Wemake no claim of crisp boundaries between these tasks; even the possibility that one task wholly contains another is not ruled out. As with any systematic classification (or typology), knowledgeof the best classes, and of relations amongthem, emerge over time. Solving differential equations A very familiar, generic task in science is solving for a differential equation. An equation of the same form 1Chandrasekaranhas previously developed the concept of generic tasks in knowledge-basedexpert systems (Chandrasekaran 1986), and J. McDermotthas expressed similar views (McDermott1988). Our focus is on science. 24 is solved identically, whether it arises in mechanics, population biology, or elsewhere. Existing symbolic computation systems can solve the differential equation y’ -- f(x) over restricted classes of elementary functions. This is, of course, also knownas indefinite integration: y-- ff(x)dx. Solving indefinite integrals is not an isolated case: there also exist algorithms for solving higher-order differential equations, finding roots of polynomials symbolically, and factoring multivariate polynomials over various coefficient domains. (Kaltofen 1987) and (Davenport, Siret, & Tournier 1988) provide overviews of such computational tasks, grouped into the field of computer algebra. Optimization A second example of a generic task in science is optimization. In the case of combinatorial optimization, there already exists the algorithmic theory and implemented software to solve linear programs, nonlinear programs, mixed-integer linear programs, and even mixed-integer nonlinear programs. The scope of combinatorial optimization is not limited to engineeringoriented or experiment-planning aspects of science; theoretical model-building itself can at least sometimes be formulated as a problem of combinatorial optimization by minimizingthe size of the model (e.g., (ValdesPerez 1994a)), which in some sense corresponds seeking the simplest theory. Another case is continuous optimization; similarly, such tasks are generic in the sense that they cut across sciences. A typical application of continuous optimization arises when theories or models contain free parameters to be optimized on the basis of experimental data (Box & Hunter 1965). Inferring laws from data Both AI and statistics have addressed the problem of finding algebraic laws hidden in data, a task of evident generality and of both historical and current relevance. Statistics, for example, has developed methods of regression that lead to algebraic relations amongmeasured variables (Mandel 1984). The BACON program from cognitive science emphasized the creation of new terms from data using a heuristic search that noticed correlations between two terms; psychological plausibility was also a motivating factor in the design of the program. Much work in machine discovery after BACONhas attempted improvements. These first three cases of generic tasks derive their generality from the following fact: their starting points consist of context-free equations, other mathematical expressions, or numeric data. One may begin to suspect that only such mathematical or numerical problems can be generic. To dispel this suspicion, next we will consider in some detail a task that involves substantial reasoning specific to a particular science, but which nevertheless is of a promising degree of generality. {S,,,} ~ {&,,} - {&,r} tutoring system that has been designed to test Anderson’s ACT*(now ACT-R) theory of cognition, which is based on the formalism of production systems. A crucial part of the tutor involves model tracing, which is the task of inferring the production system in the student’s head from his interaction with the tutor. Now, a production system is formally an instance of the schemein Figure 2, and the tutor/student interaction is analogous to the scientist/nature relation in the laboratory. There is not yet any hard evidence that these similarities will be exploited at the algorithmic level, but these and other links among "pathway elucidation" problems in science have been explored elsewhere in some detail (Valdes-Perez, Simon, & Murphy 1992). The basis for the generic character of pathwayelucidation is that, typically, intermediate products of the "reaction" are observed experimentally, all of which must be linked via a pathway. Usually there is a rich hierarchy of constraint on pathways, ranging from what are plausible participants, plausible "reactants" and "products" of a step, plausible steps, and plausible combinations of steps (Valdes-Perez 1994c). Sometimes pathway-like models are called other names: paths, assemblies, nucleations, channels, mechanisms, etc.; the main issue is whether they consist of discrete steps. There is a formal analogy between the pathways of Figure 2 and a proof in logic, in which ’---*’ is logical implication. Indeed, there is no restricted meaning to the symbol ’---’ as used above, which allows for its adaptation here both to the ’--*’ of cognitive psychology and the ’--+’ of chemistry; a logical proof certainly fits the pathway schema also. However, proofs are not typically "elucidated" on the basis of peeks at the intermediate steps. If they were, or if proofs were planned in advance by stipulating lemmas through which the proof must pass, then then proof "elucidation" could be another instance of the generic task. {&,r} {&,,} --+ {&,r} Figure 2: Pathways Pathway elucidation There are many problems across science whose solutions take the form of a pathway (or, a set of steps), as illustrated in Figure 2, in which Si,t and Si,r refer to a set of items on the left and right (respectively) of the ith step. Typically, a pathway describes some physical process that consists of multiple steps, and a typical means to elucidate the multi-step process is by collecting experimental evidence on its various manifestations, as allowed by current experimental technique. For example, an experimental biochemist’s study of a multi-step reaction may lead to the hypothesis that the underlying reaction pathwaylooks like this: ornithine + ammonia + carbon dioxide ammonia "4- citrulline arainine "1" water ~ ~ ~ water + citrttlline arfinine + water urea + ornithine Hans Krebs proposed this very pathway in 1932 to describe the formation of urea in vivo (Holmes 1980), and stages leading to this discovery were modelled in the KEKADA program (Kulkarni & Simon 1988). But not all chemical pathways are this simple. Many common cases involve a dozen or so steps, and some important reactions are suspected to consist of more than a hundred steps. A computer program named MECHEM automates a significant part of this task: it can propose highly plausible pathwaysof moderate size on the basis of experimental evidence (Valdes-Perez 1994b; 1994d; 1995). Chemistry is not the only source of pathwayelucidation problems. Determining multi-step processes on the basis of experimental data is a problem that arises in nuclear physics (Hodgson 1988), petrology (James B. Thompson, Laird, & Thompson 1982), and most especially biology. A biological problem of great current importance is protein folding; the experimentalist’s role is to elucidate the pathways that describe how a protein chain folds into its natural state from a denatured state (Levinthal 1968; Richards 1991). We add that numerous problems in biology involve the attempt to reconstruct some sort of pathway on the basis of experimental evidence. We have reported preliminary results on one such problem: endocytic pathways in cell biology (Valdes-Perez, Simon, & Murphy 1992). To further support the idea that there is a rich set of generic tasks that cuts across the sciences, we can adduce the surprising link of pathway elucidation with the science of psychology, specifically cognitive psychology. (Anderson el al. 1990) discuss an interactive Discrete model building Recently we identified a generic task of building discrete models in science by analyzing six computer programs that modelled or automated different aspects of scientific reasoning in biology, chemistry, and physics (Valdes-Perez, Zytkow, & Simon 1993). The commonframework extracted from these systems involves search in spaces of matrices (in which simpler models correspond to smaller matrices) constrained by domain laws such as conservation and analogous constraints. In parallel with the work on matrix-space search, we developed PAULI(Valdes-Perez 1994a; Valdes-Perez & Erdmann 1994; Valdes-Perez c) as an improvement on Kocabas’s BR-3 program (Kocabas 1991), borrowing representational ideas from MECHEM, which was one of the six systems that were initially included in the matrix-space search concept. Wethen selected the task addressed by one of the systems (GELL-MANN 25 (Fischer & Zytkow 1990)) and demonstrated a new tomation (Valdes-Perez a) that has some advantages over the original program. This body of work constitutes evidence for the generic character of what we are provisionally calling discrete model building. Detecting patterned behavior A commontask in experimental science is to look for patterns within phenomena. For example, (Oliver 1991) advises the reader to "Go for the spatial pattern" as a powerful heuristic of discovery in the earth sciences, but which is generally applicable to other fields that study complex phenomena. Of course, spatial patterns do not typically come into being out of whole cloth, rather they are woven by behavioral patterns: looms make patterned cloths not by weaving randomly, but by means of a patterned behavior on the part of the loom. A recent paper described a systematic technique (PENCHANT)for finding complex patterned behavior within typical processes that are studied in sciences such as cell biology and cognitive psychology (ValdesPerez & Perez 1994). The biological process consisted of manycells undergoing concurrent divisions, and the psychological process involved a chess master reconstructing a board position from memory. Since that paper, we have begun projects to apply PENCHANT to other scientific processes, such as particles impinging on an astronomical detector, a subject’s eye movements in psychology, and other, more complex processes in biology. The PENCHANT method began with "field work" in developmental biology in which we discovered unsuspected patterned behavior in time-lapse images of developing fruitfly embryos (Valdes-Perez & Minden). Wethen proceeded to propose a model that governed these processes and showed that the model accounted well for the observations. By introspection, later we were able to reconstruct our initial inference of patterned behavior as follows: ¯ If cell divisions were truly random(as was believed), then the manycells would, by chance, lead to some crowding;. ¯ Time-lapse images show the cells to be well spaced. ¯ Therefore, the cell division process cannot be random. From this specific inference, we extracted a generic method by generalizing the underlined terms: cell divisions became the process(es), and crowding became a (variety of) spatial and geometric features. The PENCHANTmethod can be viewed as a complex permutation test (Edgington 1980), but we have not so far encountered any similar method in our diverse contact with scientists. These applications are convincing evidence that detecting patterned behavior is a generic task of scientific discovery. While PENCHANT has not yet stumbled 26 on a convincing discovery, it has made some modest ones, and we are in hot pursuit of others. Moreover, PENCHANT follows closely on the heels of a human discovery whose key step the program can reproduce. Elucidating topological/spatial structure A significant number of tasks in natural science consist of proposing discrete, structural modelsof objects. The earliest significant work that involved computer scientists was the DENDRAL project (Lindsay et al. 1993), which started with the discovery by J. Lederberg at Stanford of an algorithm to generate the possible isomeric structures of a given molecular formula. That algorithm led to a sustained effort to automate the elucidation of molecular structure on the basis of mass-spectrometric and other data. The subsequent PROTEAN project (Hayes-Roth et al. 1986) at Stanford attempted to elucidate protein structure, which was considerably more difficult, since protein molecules are muchlarger than the molecules typical of organic chemistry that DENDRAL was confronted with. DENDRALand PROTEAN both used constraint-based exclusion paradigms, so perhaps the elucidation of topological/spatial structure can be counted tentatively as a generic task. Related Views The previous sections have explicated the concept of generic tasks at some length. In brief, we are proposing an organizational concept that provides a sustainable principled basis for research into the relations of computation and scientific inference. Wehave emphasized tasks of discovery, although not exclusively, but the scope of the concept extends to nowadays more routine tasks of scientific inference, such as solving a differential equation. The inspiration for this concept derives part!y from the book by Langley, Simon, Bradshaw, and Zytkow (Langley et al. 1987), who took broad view of science and examined to what extent the programs they described could be applicable elsewhere in science. This organizational concept contrasts with the "orthodox" view of the computational sciences depicted previously in Figure 1. Figure 3 illustrates our proposal, supplemented with other generic tasks that cannot be elaborated here (Valdes-Perez b). A Research Plan One fruitful research plan follows simply from the concept of generic scientific task. Given the current state of understanding, any of the following steps will advance the subject: ¯ Identify a new task of scientific inference that seems susceptible to automation. ¯ Automate it, or at least provide a computer aid. ¯ Seek applications, to show that the automation is credible. Biolo~ Chemistry’ - - ¯ to see where in science a process-oriented phenomenon had been reported as random on the basis of experimental data. (In this case, PENCHANT found new patterns, but they turned out to be artifacts of the assumption that certain objects having a nonzero area could be modelled as point-like objects). Finally, howdoes one automate the scientific task? Wesuggest the use of all available tools: the concepts and techniques of AI, operations research, theoretical computer science, applied mathematics and statistics, and so on. Wepropose to study the abstract phenomena of scientific inference, not to apply AI and only AI. An analogy: (Hanson 1958, p. 72) remarks that "Physics is not applied mathematics. It is a natural science in which mathematics can be applied." Similarly, a science of the abstract phenomenaof scientific inference is not applied AI, it is a science in which AI can be applied. Nevertheless, it should remain a part of AI, just as game-playing and natural language processing have remained in AI. The commonphenomena of all these areas is inference of somesort, so they will always have a degree of cohesion. solv.in~ differential equations opt~m~zatmn inferrin$ algebraic law~ pat hwa~elucidation discrete model buildin$ detectin~ patterned behavior elucidatin[~ topolosical structure findin~ differential ec~uation models reconstructin~ evolutionary trees uncoverin$ causal relations qualitative analysis of diff eqs )tual clusterin lnvartants Jroblems Figure 3: SomeGeneric Tasks of Scientific Discovery ¯ Generalize within science: Look for analogous tasks from another science, or from another subfield within the same science. ¯ If there is evidence that the task is generic within science, add the task to the list of knowngeneric tasks. ¯ Compare the new task to other generic tasks in search of further generalizations. One might generate more research steps by considering generically what is being proposed: a typology (or even a beginning taxonomy) of scientific inference founded on computation. Typologies require the development of concepts, demonstration of their worth, hierarchical organization in the more structured case of taxonomies, etc. Perhaps even the scientific task (carried out here in this section by the author) of proposing research plan on the basis of a typological viewpoint could be studied systematically and generically. What tools or heuristics can help to carry out this research plan? For example, how does one "identify a new task of scientific inference that seems susceptible to automation"? From our experience, a good step is to insert oneself personally within scientific contexts, e.g., becoming a memberof an interdisciplinary center, reading scientist’s (especially experimentalist’s) papers, attending seminars in other departments, and so on. One can then ask such fruitful questions as: Whatis the space of hypotheses that is implicitly being considered by such and such inference? Given a successful automation or computer aid for a task X, how does one "look for analogous tasks from another science"? We have found that one fruitful method is to browse databases of scientific abstracts (e.g., INSPEC, PERIODICALS,MEDLINE,etc.) using synonyms for the terms that describe X. For example, we have sought analogues to chemical pathway elucidation elsewhere, e.g., by first using "elucidat?" to find what other process-oriented phenomenaare elucidated in science, thus generating synonymsfor "pathway". We have also used this method to seek other tasks of detecting patterned behavior. For example, we worked with a biophysicist on crystal growth after finding a relevant paper in INSPEC(Durbin, Carlson, & Saros 1993) by searching on the keyword "randomly" Conclusion Wehave proposed a new view of scientific discovery that is based on the concept of generic tasks. Each generic task in science is general because it is present in various sciences, but at the same time is specific because it involves one mode of inference, or a small number of them. This view of scientific inference houses disparate pieces of work from computer algebra, computational statistics, operations research, theoretical computer science, and artificial intelligence under a single roof whose commonpurpose is the principled development of systematic methods of scientific inference. Wehave identified a number of credible generic tasks: almost every task is represented by at least one computer program, and for each task there is at least some evidence of genericity. There is no doubt that others that we have missed could be added to the initial list, but this would serve to strengthen our proposal, not vitiate it. Wehave pointed out a research plan that follows naturally from the generic task concept, and have suggested research heuristics and tools in support of the plan. Acknowledgments.This work was supported in part by a Science and TechnologyCenter grant from the National Science Foundation, #MCB-8920118, and by the Keck Center for AdvancedTraining in Computational Biology. References Anderson,J.; Boyle, C.; Corbett, A.; and Lewis, M. 1990. Cognitivemodelingand intelligent tutoring. Artificial Intelligence 42(1):7-49. Box, G., and Hunter, W. 1965. The experimental study of physical mechanisms.Technometrics7(1):23-42. (Reprinted in: The Collected Worksof George E.P. Box). 27 Chandrasekaran, B. 1986. Generic tasks in knowledgebased reasoning: high-level building blocks for expert system design. IEEE Expert 1:23-30. Chung, F. 1988. The average distance and the independence number. Journal of Graph Theory 12(2):229-235. Davenport, J. H.; Siret, Y.; and Tournier, E. 1988. Computer Algebra -- Systems and Algorithms for Algebraic Computation. London: Academic Press. Durbin, S.; Carlson, W.; and Saros, M. 1993. In situ studies of protein crystal growth by atomic force microscopy. Journal of Physics D (Applied Physics) 26(8B):B128-32. Edgington, E. S. 1980. Randomization Tests. NewYork: Marcel Dekker. Fajtlowicz, S. 1988. On conjectures of Graffiti. Discrete Mathematics 72:113-118. Fischer, P., and Zytkow, J. M. 1990. Discovering quarks and hidden structure. In Ras, Z.; Zemankova, M.; and Emrich, M., eds., Proceedings of the Fifth International Symposiumon Methodologies for Intelligent Systems, 362370. North Holland. Hanson, N. R. 1958. Patterns of discovery: An inquiry into the conceptual foundatons of science. Cambridge, New York: Cambridge University Press. Hayes-Roth, B.; Buchanan, B.; Lichtarge, O.; Hewett, M.; Altman, R.; Brinldey, J.; Cornelius, C.; Duncan, B.; and Jardetzky, O. 1986. Protean: Deriving protein structure from constraints. In Proc. of AAAI, 904-909. Hodgson, P. 1988. Multistep processes in nuclear reactions. Contemporary Physics 29(5):457-476. Holmes, F. L. 1980. Hans Krebs and the discovery of the ornithine cycle. In Proc. of 63rd Annual Meeting of the Federation of American Societies for Experimental Biology, volume 39. Symposiumon Aspects of the History of Biochemistry. James B. Thompson, J.; Laird, J.; and Thompson, A.B. 1982. Reactions in amphibolite, greenschist and blueschist. Journal of Petrology 23:1-27. Kaltofen, E. 1987. Computer algebra algorithms. Annual Review of Computer Science 2:91-118. Kocabas, S. 1991. Conflict resolution as discovery in particle physics. Machine Learning 6(3):277-309. Kulkarni, D., and Simon, H. 1988. The processes of scientific discovery: The strategy of experimentation. Cognitive Science 12:139-175. Langley, P.; Simon, H.; Bradshaw, G.; and Zytkow, 3. 1987. Scientific Discovery: Computational Explorations of the Creative Processes. Cambridge, Mass.: MIT Press. Lenat, D. B. 1982. AM: discovery in mathematics as heuristic search. In Davis, R., and Lenat, D. B., eds., Knowledge-based systems in artificial intelligence. New York, N.Y.: McGrawHill. Levinthal, C. 1968. Are there pathways for protein folding? J. Chim. Phys. 65:44-45. Lindsay, R.; Buchanan, B.; Feigenbaum, E.; and Lederberg, J. 1993. DENDRAL: a case study of the first expert system for scientific hypothesis formation. Artificial InteUigence 61(2):209-261. Mandel, J. 1984. The Statistical Analysis of Experimental Data. New York: Dover. McDermott, J. 1988. Preliminary steps toward a taxonomy of problem-solving methods. In Marcus, S., ed., Automating Knowledge Acquisition for Expert Systems. Boston, MA: Kluwer Academic Publishers. Newell, A., and Simon, H. A. 1976. Computer science as empirical inquiry: Symbols and search. Communications of the ACM19:111-126. Turing Award Lecture. Oliver, J. E. 1991. The Incomplete Guide to the Art of Discovery. NewYork: Columbia University Press. Richards, F. M. 1991. The protein folding problem. Scientific American 264(1):54-63. Valdes-Perez, R. E. Algebraic representation in scientific model building: Discovery of quark structure in physics. Artificial Intelligence. submitted for publication. Valdes-Perez, R. E. Generic tasks of scientific discovery. submitted to IJCAI-95. Valdes-Perez, R. E. A new theorem in particle physics enabled by machine discovery. Machine Learning. Submitted for publication. Valdes-Perez, R. E., and Erdmann, M. 1994. Systematic induction and parsimony of phenomenological conservation laws. Computer Physics Communications 83:171180. Valdes-Perez, R. E., and Minden, J.S. Drosophila melanogaster syncytial nuclear divisions are patterned: Time-lapse images, hypothesis, and computational evidence. Journal of Theoretical Biology. Accepted for publication. Valdes-Perez, R. E., and Perez, A. 1994. A powerful heuristic for the discovery of complex patterned behavior. In Machine Learning: Proceedings of the Eleventh International Conference. San Francisco, CA: Morgan Kaufmann. Valdes-Perez, R. E.; Simon, H. A.; and Murphy, R. F. 1992. Discovery of pathways in science. In Proceedings of the ICML-92 Workshop on Machine Discovery, 51-57. Also: National Institute for Aviation Research Report 9212, Wichita, KS 67208. Valdes-Perez, R. E.; Zytkow, J. M.; and Simon, H. A. 1993. Scientific model-building as search in matrix spaces. In Proceedings of 11th National Conference on Artificial Intelligence, 472-478. Menlo Park, CA: AAAIPress. Valdes-Perez, R. E. 1994a. Algebraic reasoning about reactions: Discovery of conserved properties in particle physics. Machine Learning 17(1):47-68. Valdes-Perez, R. E. 1994b. Conjecturing hidden entities via simplicity and conservation laws: Machine discovery in chemistry. Artificial Intelligence 65(2):247-280. Valdes-Perez, R. E. 1994c. Heuristics for systematic elucidation of reaction pathways. Journal of Chemical Information and Computer Sciences 34(4):976-983. Valdes-Perez, R. E. 1994d. Human/computer interactive elucidation of reaction mechanisms: Application to catalyzed hydrogenolysis of ethane. Catalysis Letters 28(1):79-87. Valdes-Perez, R. E. 1995. Machine discovery in chemistry: Newresults. Artificial Intelligence. in press. 28