Chapter 1

Overview

1.1 Method of Presentation

Connectionism challenges many of the assumptions of artificial intelligence, cognitive science and epistemology. Unfortunately, like most assumptions, these reside in a “background” of which we’re hardly aware. Therefore, to better understand the significance of connectionism we must first bring these assumptions out into the light.

The Psychoanalysis of Computation

Our goal will be a kind of “psychoanalysis” of the traditional theories of cognition and computation. As in psychoanalysis, we will expose long repressed beliefs and desires. The object is not necessarily to eliminate these beliefs and desires, but rather, by bringing them into conscious awareness, to open them up for critical scrutiny.

The Archaeology of Computation

Another metaphor for our task is an archaeological investigation, which in fact it is, since archaeology means the science of beginnings. Thus, just as the hill of Troy has been peeled back, stratum by stratum, to reveal its history, so we may dig below the edifices of the cognitive and computational sciences so that we may understand their historical supports (Fig. 1.1).

Figure 1.1: Archaeology of Computation. (Strata, from top to bottom: modern cognitive and computational sciences; computability theory; constructivism; symbolic logic; formal logic; ancient philosophy.)

------------------------------------------------------------------------

Additional Information

Occasionally you will find supplementary text, such as this, separated from the main text by horizontal rules. This supplementary text takes many forms, including definitions, explanations, digressions, technical details, amplifications, and pointers into the literature. You may read this material or not, as your interests, background and time permit.

------------------------------------------------------------------------

Advantage of Historical Presentation

Although the psychoanalytical and archaeological metaphors may help us understand our task, it is in fact better to present the material in its historical order. It’s easier to trace from the root out to the leaves, than from the leaves back to the root. The reason is that when we begin with the leaves there are too many branches to be followed simultaneously, and their ultimate destination is unclear. On the other hand, if we begin at the root then we’ll always be following just one branch at a time. When we want to explore another branch we will go back to the branch from which it grows, and this will be a branch with which we’re already familiar. The historical approach, by mimicking the genetic development of the ideas, keeps their relationships clear.

Disadvantage of Historical Presentation

On the other hand, the historical approach has a disadvantage. Since the roots of the modern cognitive and computational sciences go very deep indeed, it will not be obvious how these roots are connected to the issues central to these sciences. Of course, I could simply assert that they are, and that you will see the relevance when we finally get to the leaves, but that’s not a very satisfactory approach. Instead I’ll very briefly go from the top down, showing you the connections between the cognitive and computational sciences and some perennial philosophical and scientific concerns. This will give you a kind of prophetic insight when we then turn around and follow the historical process.
1.2 Artificial Intelligence and Cognitive Science

Artificial Intelligence and Natural Intelligence

Artificial intelligence (AI) began about 1960 as the attempt to get computers to act with human-like intelligence, although from the beginning many AI researchers were also interested in natural intelligence. It has always seemed likely that understanding the processes of natural intelligence would facilitate their emulation in our artifacts, and conversely that the ability to create intelligent artifacts would inform our theories of natural intelligence. As a result, developments in artificial intelligence and cognitive science have proceeded in parallel, with many fruitful exchanges of ideas.

Central to both artificial intelligence and cognitive science has been the use of logic-based knowledge representation languages and rule-based descriptions of cognitive processes, which are based on the idea of a calculus, that is, a set of symbolic structures and rules for the mechanical manipulation of these structures. This is the core idea of computer science, since the digital computer is nothing more than a mechanical rule manipulator, a physical device for implementing calculi. Although the idea of a calculus is very old — we will see that it comes from ancient Greek philosophy — it was not well understood until the theory of computation was developed in the first decades of the twentieth century.

1.3 Computation and Calculi

The Theory of Computation

Theory of Computation Predates Computers

Surprisingly, the theory of computation predates by a decade or more the invention of the first electronic digital computers in the 1940s. Why were mathematicians and logicians in the 1930s worried about the theory of computation when there weren’t any computers? To understand this we must investigate some of the issues that concerned mathematicians and philosophers in the late nineteenth and early twentieth centuries.

Cantor’s Set Theory

In the late nineteenth century Georg Cantor developed a remarkable theory of sets that included a series of infinite numbers of greater and greater magnitude.¹ As we’ll see, mathematicians and philosophers have always been uncomfortable with the infinite (both the infinitely large and the infinitesimal, or infinitely small); this attitude is rooted in ancient philosophy. As a result, many mathematicians doubted whether Cantor’s theory was meaningful. What did it mean to claim that these infinite sets and numbers exist? Nevertheless, Cantor’s powerful methods were eventually widely adopted by mathematicians, and by now have become an accepted part of mainstream mathematics.

¹ To avoid clutter, I am omitting the references to the literature, which will be found in the later chapters where these topics are discussed in detail.

Constructivism

There were — and are — many mathematicians who do not accept the proof techniques used by Cantor and most other mathematicians. One of the first of these was L. E. J. Brouwer (1882–1966), who founded a school of mathematics known as intuitionism. His ideas were further developed by a group of mathematicians who are known as constructivists, because they believe that it makes sense to talk about the existence of a mathematical object only if it can, at least in principle, be constructed. Thus, a constructivist would not accept a proof of the existence of a number satisfying an equation, unless that proof showed how to construct (i.e., calculate) that number.
The goal of the constructivists was to prove as much as possible constructively; what could not be so proved was to be discarded as chicken scratches on paper.

Constructivism and Computation

What do these disputes in the philosophy of mathematics have to do with computers? The constructivists had a problem. Before they could offer a definite alternative to nonconstructive mathematics they would have to be able to say precisely what they meant by a number (or any other mathematical object) being constructible. To this end they developed the theory of effective calculability, or, as it’s now more commonly called, computability theory. Roughly, something is effectively calculable or computable if it can be calculated by the strictly mechanical application of formal rules in a finite number of steps and using finite resources. Specifically, it is effectively calculable just when there’s a calculus for calculating it. It is no coincidence that all these words contain the root calcul-; the exact significance is discussed in detail later (p. 24). Thus, the efforts of these mathematicians and logicians led to the theory of computation, which is the core theory of computer science, and provides many of the techniques and central insights of traditional AI and cognitive science.

The Preference for Calculi

We have traced briefly the cognitive and computational sciences back to the emergence of constructivist ideas in the beginning of this century. These ideas made precise the notion of the mechanical manipulation of symbolic structures. But the roots of the preference for calculi — the preference for rules that can be applied mechanically and for the representation of knowledge in symbolic structures — go much deeper, and manifest themselves in several themes that run through Western philosophy’s view of knowledge.

1.4 Themes in the History of Knowledge

Formalization of Knowledge

The first theme is that true, scientific knowledge can be formalized, that is, expressed in finite formulas constructed from discrete symbols (such as the letters, numbers and other signs of mathematical formulas). A complementary part of this theme is that scientific inference can be expressed in a finite number of precise, definite rules for manipulating these formulas, and that these rules depend only on the shape and arrangement of the symbols (think of algebraic processes such as solving a simple equation). This theme began with Aristotle’s identification of formal laws of deduction, continued with the extensive logical investigations of the Middle Ages (by Ockham et al.), showed its potential through attempts by Lull, Leibniz and others to devise a universal calculus for knowledge representation and inference, and reached maturity in the nineteenth century in the symbolic logics developed by philosophers and mathematicians such as Boole, Frege, and Russell.

Logical Positivism

In the first decades of the twentieth century the movement to formalize scientific knowledge developed into an extreme form called logical positivism (or logical empiricism). This philosophy of science is based on Russell’s logic and Wittgenstein’s theory of logical atomism, which views items of scientific knowledge as logical constructions of elementary propositions, such as sense data or measurements. The philosophy was explicitly formulated in the “Vienna Circle,” whose members included Moritz Schlick, Rudolf Carnap and Otto Neurath, but rapidly spread to become the dominant philosophy of science in the first half of the twentieth century.
Many major movements in the individual sciences, such as the Copenhagen interpretation in quantum mechanics (Bohr), functionalism in sociology (Durkheim) and behaviorism in psychology (Skinner), are simply applications of logical positivism to that science. Indeed, although it began to lose popularity about 1960, logical positivism still forms much of the background of our view of “real science.” In particular, although contemporary cognitive science has rejected behaviorism, a positivist view of method, it has not completely broken away from rule-based knowledge representation, a positivist view of cognition.

Rationalization of the Continuum

The second theme is usually called the arithmetization of geometry, but is more accurately called the rationalization of the continuum. It stems from the discovery by the ancient Pythagoreans that certain continuous magnitudes (what we now call irrational numbers) could not be expressed as ratios of whole numbers; that is, they could not be rationalized, or expressed as finite, discrete formulas. Since the ancient Greeks assumed that true knowledge was by its nature expressible in finite, discrete formulas, there seemed to be something inherently irrational (incomprehensible) about continuous magnitudes. Thus arose the problem of arithmetizing geometry, that is, of reducing the troubling continuous magnitudes, studied in geometry, to the more comfortable discrete numbers (integers), studied in arithmetic. The ancient Greeks were not able to accomplish this — Euclid axiomatizes arithmetic and geometry separately — and the problem of arithmetizing geometry remained open until the nineteenth century, when Dedekind and Weierstrass showed how real numbers could be constructed from sequences of rational numbers. These accomplishments meshed into the development of formal logic, so that by the first decades of the twentieth century it seemed that mathematical knowledge and reasoning would soon be reduced to a calculus.

Limitations of Calculi

Unfortunately, just when it seemed that the rationalization of mathematics was at hand, disturbing and seemingly irreparable defects were discovered in the foundations; mathematical logicians such as Skolem, Löwenheim, Gödel and Turing discovered fundamental limitations in the ability of calculi to represent knowledge and inference. In particular, these limitations call into question the apparent success of the arithmetization of geometry. Although most of these discoveries were made before 1940, their implications for science and mathematics and for cognition and computation are still unclear and a topic of continuing debate. Part of the difficulty is that the assumptions of discrete knowledge representation are so embedded in our view of knowledge that we find it difficult to see any alternative to viewing knowledge as a calculus.

Limitations of Symbolic AI

The development of electronic digital computers in the 1940s provided the first opportunity to directly test the hypothesis that knowledge was representable by finite, discrete formulas and that it could be processed by discrete symbol-manipulation operations. Early experiments were very promising, and led in the early 1970s to a surge of interest in rule-based systems.
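To make “rule-based system” concrete, here is a minimal forward-chaining production system sketched in Python. It is an illustration only, not a reconstruction of any system mentioned in this chapter; the working memory, the rules, and the helper functions (match, substitute, run) are all hypothetical. What it shows is the essential idea of a calculus: knowledge stored as discrete symbolic structures, and inference carried out by rules that attend only to the shape and arrangement of those symbols.

```python
# A minimal forward-chaining production system (illustrative sketch only;
# the facts, rules, and helper functions are hypothetical).

# Working memory: a set of discrete symbolic structures (tuples of symbols).
working_memory = {("bird", "tweety"), ("cat", "sylvester")}

# Each rule is a (condition, conclusion) pair of symbolic patterns;
# a pattern element beginning with "?" is a variable.
rules = [
    (("bird", "?x"), ("can-fly", "?x")),       # IF (bird ?x) THEN (can-fly ?x)
    (("can-fly", "?x"), ("has-wings", "?x")),  # IF (can-fly ?x) THEN (has-wings ?x)
]

def match(pattern, fact):
    """Return variable bindings if the pattern matches the fact, else None.
    Matching depends only on the arrangement of the symbols."""
    if len(pattern) != len(fact):
        return None
    bindings = {}
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if bindings.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return bindings

def substitute(pattern, bindings):
    """Replace variables in a pattern with the symbols bound to them."""
    return tuple(bindings.get(term, term) for term in pattern)

def run(memory, rules):
    """Apply the rules mechanically until no new facts can be derived."""
    changed = True
    while changed:
        changed = False
        for condition, conclusion in rules:
            for fact in list(memory):
                bindings = match(condition, fact)
                if bindings is not None:
                    new_fact = substitute(conclusion, bindings)
                    if new_fact not in memory:
                        memory.add(new_fact)
                        changed = True
    return memory

print(run(working_memory, rules))
# Derives ("can-fly", "tweety") and ("has-wings", "tweety") by rule application alone.
```

Everything such a system “knows” is carried by explicit, discrete rules of this sort, and extending its competence means writing more of them.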
Unfortunately, more ambitious goals in both artificial intelligence and cognitive science began to expose the weaknesses of the calculus as a model of cognition; these include the inability to handle novelty, minor exceptions, uncertainty and context-sensitivity without an exponential explosion in the number or size of the rules, and thus in the required computational resources. Further evidence of the inadequacy of the calculus model comes from the “cognitive inversion” of rule-based approaches: what they seem to do well are higher cognitive activities (formal logic, chess, mathematics); what they seem to do poorly are lower cognitive activities (perception, coordination, pattern recognition, image processing). This inversion suggests that natural intelligence is not rule-based. Thus, by the early 1980s researchers were being driven to the conclusion that the calculus was impractical as an implementation of artificial intelligence and implausible as a model of biological intelligence. But what could be an alternative?

1.5 Alternative Views of Cognition

An alternative is connectionism, but to see its importance it is necessary to understand it in the context of alternative theories of knowledge.

The Traditional View: A Calculus

We have seen that the dominant view of knowledge in the Western philosophical tradition has reflected a pervasive aversion to the continuous and the infinite, and a preference for definite rules and finite, discrete, symbolic structures. Thus the calculus becomes the ideal model of a knowledge structure, and the goal of any science becomes the axiomatization of its knowledge. Further, since the essence of a calculus is in its formal structure rather than its material embodiment, there has been a tendency to think of knowledge as independent of its biological context.

Philosophical Alternatives

There have always been philosophers who have disagreed with this view of knowledge. For example, in ancient Greece Heraclitus stressed the continuity of knowledge, both in its structure and its evolution; the Sophists discussed the dynamics of knowledge and its relation to society; and Aristotle investigated analogical knowledge structures and the role of knowledge in its biological context. Positivist theories of knowledge were also criticized in the eighteenth and nineteenth centuries by philosophers such as Kant, Hegel, Dilthey and Nietzsche. The early twentieth century brought the development of two important alternatives to positivism: phenomenology, especially as formulated by Husserl and Heidegger, and ordinary language philosophy, as investigated by Austin, Wittgenstein and others. The phenomenologists stressed the importance of everyday understanding, including the social context, as a foundation for all knowledge; the ordinary language philosophers showed the impossibility of defining the terms of everyday language in the way demanded by the logical positivists, and the inadequacy of a positivist model of conceptual knowledge. The latter half of the twentieth century has also seen a renewed attempt to understand knowledge in both its biological (Lakoff, Johnson, Millikan, et al.) and social contexts (Kuhn, Lakatos, Popper, Dreyfus, et al.).
Psychological Alternatives

There have also been minority views in psychology, most notably Gestalt psychology (Wertheimer, Koffka, Köhler, et al.), which in the early twentieth century opposed a continuous, holistic view of knowledge to the discrete, atomistic models of behaviorism. However, before it was able to develop an empirically verified theory of the mechanisms of cognition, it fell before the logical positivist juggernaut. Nevertheless, the decline of positivism has brought a renewed interest in Gestalt ideas, as well as investigations into mental imagery (Shepard, Miller, Paivio, Kosslyn, et al.) and the inadequacy of discrete, symbolic models of concepts and categories (Rosch et al.). There has also been a trend toward putting cognition in its biological context (Gibson, Wason, Johnson-Laird, et al.).

Neuroscientific Alternatives

By 1950 the view that the brain stores information like a computer memory, with discrete items of information stored in discrete locations, had come under attack from neuroscientists such as Karl Lashley. He proposed that each “item” of information was distributed over an entire region of the brain, with many different items being superimposed in the same region. Although Lashley was unable to explain the mechanism of distributed memory, such a mechanism was proposed in the 1960s by Karl Pribram, who suggested that memory operated on principles analogous to an optical hologram, that is, the interference of continuous fields. This idea forms a bridge between the theories of the Gestalt psychologists and modern connectionist theories.

1.6 Origins of Connectionism

Connectionism

In the middle decades of this century, before the sequential, digital computer had completely dominated our view of computation, there were many exciting applications of artificial neural networks, parallel analog computer systems with principles of operation patterned after the brain, such as Rosenblatt’s “perceptrons” and Selfridge’s “Pandemonium.” For a variety of reasons, both philosophical — the unquestioned acceptance of logical positivism working its way through computer science — and technological — the rapid progress in digital electronic technology — most neural network research had been abandoned by the late 1960s. Although a few researchers continued in the area (e.g., Grossberg, Longuet-Higgins, Anderson, Willshaw), the 1970s were the decade of rule-based models of intelligence, whether natural or artificial.

What Computers Can’t Do

Hubert Dreyfus’ 1972 book, What Computers Can’t Do, is best known for its apparent negative conclusion, the impossibility of artificial intelligence, but it is also important for opening the way for connectionism since, in addition to showing the limitations of traditional AI, it reviewed some of the psychological evidence against symbolic knowledge representation, showed the importance of the phenomenological view of knowledge, and pointed to holographic models of cognition as an alternative to rule-based computation.

By the early 1980s growing dissatisfaction with the calculus model was causing some researchers in cognitive science and artificial intelligence to explore alternative models based on the cooperative and competitive interaction of large numbers of simple elements that represent and process information at the subsymbolic level (Feldman, Ballard, Rumelhart, McClelland, Hinton, Smolensky, Sejnowski, Sutton, Barto, Kohonen, Hopfield, Hofstadter, and many others).
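As a point of contrast with the rule-based sketch given earlier, here is an equally minimal Python sketch of the kind of computation a connectionist model performs: many simple units, each computing a weighted sum of continuous activations and passing it through a smooth nonlinearity. The layer sizes, the random weights, and the input pattern are arbitrary illustrations, not taken from any of the models cited above.

```python
# A minimal connectionist layer: simple units whose collective behavior is
# determined by continuous connection weights rather than explicit rules.
# (Illustrative sketch; sizes, weights, and the input pattern are arbitrary.)
import math
import random

random.seed(0)

def sigmoid(x):
    """A smooth, continuous activation function."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """Each unit sums its weighted inputs and squashes the result;
    its activation is a continuous value, not a discrete symbol."""
    return [
        sigmoid(sum(w * a for w, a in zip(unit_weights, inputs)) + bias)
        for unit_weights, bias in zip(weights, biases)
    ]

# A tiny two-layer network: 4 input units -> 3 hidden units -> 2 output units.
n_in, n_hidden, n_out = 4, 3, 2
w_hidden = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [[random.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
b_out = [0.0] * n_out

# An input pattern is a vector of continuous values; its "meaning" is
# distributed over all of the units rather than stored in any one location.
pattern = [0.9, 0.1, 0.3, 0.7]
hidden = layer(pattern, w_hidden, b_hidden)
output = layer(hidden, w_out, b_out)
print(hidden, output)
```

Here there are no explicit rules and no discrete symbols to point to; whatever the network “knows” is carried by the continuous weights, spread across all of the connections, and is changed by adjusting those weights rather than by editing rules.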
Contemporary Connectionism

Although some of the earlier connectionist models still used discrete (digital) symbols stored locally in a network, the trend has been toward representations based on continuous (analog) values distributed throughout the network. The result is both psychologically and biologically more plausible. Psychologically, because connectionist systems exhibit many of the same strengths and weaknesses as natural systems; for instance, they do not exhibit the cognitive inversion. Biologically, because it is much easier to conceive of an implementation in nervous tissue of a connectionist system than of a rule-based system. Since the mid-1980s there has been an efflorescence of connectionist research, both basic and applied, but its implications for our understanding of knowledge, language and cognition — both natural and artificial — are still on the horizon.²

² The terms connectionism, parallel distributed processing (PDP), and neural networks are used nearly synonymously. I prefer connectionism to the more awkward (but perhaps more accurate) parallel distributed processing, but both refer to the same alternative theory of knowledge and cognition. I will reserve the term neural networks (and neurocomputing) for the new approach to computing inspired by biological neural systems. The difference, however, is more of intent than of content.

Significance of Connectionism

The contribution of connectionism is that it integrates these alternative theories of knowledge into the body of scientific theory by providing access to the mechanisms of cognition. That is, although many philosophers and psychologists were aware of the inadequacy of positivist theories of knowledge, their alternatives seemed unscientific because they did not show how they related to the commonly accepted theories and methods of the other sciences.

The Homunculus

The problem is to “banish the homunculus” — a colorful description of the goal of a scientific psychology. The problem is shown quite literally in some popular presentations of the brain. We may see the eyes and other senses wired to dials and TV monitors within a control room in the brain. The muscles, internal organs, etc. are in turn controlled by wires connected to control levers and switches in this same control room. And there, in the middle of it all — looking at the dials and manipulating the levers — is a little man (homunculus), usually wearing an engineer’s cap! It is apparent that nothing has been accomplished by explaining our cognitive capabilities in terms of a little man in our heads. How are his cognitive capabilities to be explained? By reduction to another, smaller homunculus?

Banishing the Homunculus

Although few scientists would postulate a homunculus in such an explicit form, it is remarkably hard to keep homunculi out of theories of the mind. For example, if a neuroscientist identifies a visual projection area in the cortex, the question immediately arises: Who (or what) is “looking” at the image in that area? No one proposes tiny TV screens in our brains, but a visual projection area is not so different unless we can explain mechanistically how the image is processed. Thus the only way to be sure we have banished the homunculus is to give a complete in-principle mechanistic (i.e., non-homuncular) account of every step from sensation to action; only then can we be sure we haven’t begged the central question. Part of the importance of AI to cognitive science is that it promises to do exactly this.
Since, we suppose, there is no “ghost in the machine,” successful artificial intelligence is ipso facto a non-homuncular theory of intelligence. Unfortunately, traditional AI was not successful, since it was based on an inadequate theory of knowledge. The promise of the new AI is that, by virtue of a better epistemology, it will succeed in banishing the homunculus.