Chapter 1
Overview
1.1 Method of Presentation
Connectionism challenges many of the assumptions of artificial intelligence,
cognitive science and epistemology. Unfortunately, like most assumptions,
these reside in a “background” of which we’re hardly aware. Therefore, to
better understand the significance of connectionism we must first bring these
assumptions out into the light.
Our goal will be a kind of “psychoanalysis” of the traditional theories
of cognition and computation. As in psychoanalysis, we will expose long
repressed beliefs and desires. The object is not necessarily to eliminate these
beliefs and desires, but rather, by bringing them into conscious awareness,
to open them up for critical scrutiny.
Another metaphor for our task is an archaeological investigation, which
in fact it is, since archaeology means the science of beginnings. Thus, just
as the hill of Troy has been peeled back, stratum by stratum, to reveal its
history, so we may dig below the edifices of the cognitive and computational
sciences so that we may understand their historical supports (Fig. 1.1).
Although the psychoanalytical and archaeological metaphors may help
us understand our task, it is in fact better to present the material in its
historical order. It’s easier to trace from the root out to the leaves than
from the leaves back to the root. The reason is that when we begin with the
leaves there are too many branches to be followed simultaneously, and their
ultimate destination is unclear. On the other hand, if we begin at the root
then we’ll always be following just one branch at a time. When we want to
explore another branch we will go back to the branch from which it grows, and this will be a branch with which we’re already familiar. The historical approach, by mimicking the genetic development of the ideas, keeps their relationships clear.

[Figure 1.1: Archaeology of Computation. Strata, from the surface downward: modern cognitive and computational sciences; computability theory; constructivism; symbolic logic; formal logic; ancient philosophy.]

Additional Information

Occasionally you will find supplementary text, such as this, separated from the main text by horizontal rules. This supplementary text takes many forms, including definitions, explanations, digressions, technical details, amplifications, and pointers into the literature. You may read this material or not, as your interests, background and time permit.
On the other hand, the historical approach has a disadvantage. Since
the roots of the modern cognitive and computational sciences go very deep
indeed, it will not be obvious how these roots are connected to the issues
central to these sciences. Of course, I could simply assert that they are, and
that you will see the relevance when we finally get to the leaves, but that’s
not a very satisfactory approach. Instead I’ll very briefly go from the top
down, showing you the connections between the cognitive and computational
sciences and some perennial philosophical and scientific concerns. This will
give you a kind of prophetic insight when we then turn around and follow
the historical process.
1.2 Artificial Intelligence and Cognitive Science
Artificial intelligence (AI) began about 1960 as the attempt to get computers to act with human-like intelligence, although from the beginning many
AI researchers were also interested in natural intelligence. It has always
seemed likely that understanding the processes of natural intelligence would
facilitate their emulation in our artifacts, and conversely that the ability to
create intelligent artifacts would inform our theories of natural intelligence.
As a result developments in artificial intelligence and cognitive science have
proceeded in parallel, with many fruitful exchanges of ideas.
Central to both artificial intelligence and cognitive science has been the
use of logic-based knowledge representation languages and rule-based descriptions of cognitive processes, which are based on the idea of a calculus, that
is, a set of symbolic structures and rules for the mechanical manipulation of
these structures. This is the core idea of computer science, since the digital
computer is nothing more than a mechanical rule manipulator, a physical
device for implementing calculi. Although the idea of a calculus is very old
— we will see that it comes from ancient Greek philosophy — it was not
well understood until the theory of computation was developed in the first
decades of the twentieth century.
1.3 The Theory of Computation
Surprisingly, the theory of computation predates by a decade or more the
invention of the first electronic digital computers in the 1940s. Why were
mathematicians and logicians in the 1930s worried about the theory of computation when there weren’t any computers? To understand this we must
investigate some of the issues that concerned mathematicians and philosophers in the late nineteenth and early twentieth centuries.
In the late nineteenth century Georg Cantor developed a remarkable theory of sets that included a series of infinite numbers of greater and greater magnitude.[1] As we’ll see, mathematicians and philosophers have always been uncomfortable with the infinite (both the infinitely large and the infinitesimal, or infinitely small); this attitude is rooted in ancient philosophy. As a result, many mathematicians doubted whether Cantor’s theory was meaningful. What did it mean to claim that these infinite sets and numbers exist? Nevertheless, Cantor’s powerful methods were eventually widely adopted by mathematicians, and by now have become an accepted part of mainstream mathematics.

[1] To avoid clutter, I am omitting the references to the literature, which will be found in the later chapters where these topics are discussed in detail.
There were — and are — many mathematicians who do not accept the
proof techniques used by Cantor and most other mathematicians. One of
the first of these was L. E. J. Brouwer (1882–1966), who founded a school
of mathematics known as intuitionism. His ideas were further developed by
a group of mathematicians who are known as constructivists, because they
believe that it makes sense to talk about the existence of a mathematical object only if it can, at least in principle, be constructed. Thus, a constructivist
would not accept a proof of the existence of a number satisfying an equation,
unless that proof showed how to construct (i.e., calculate) that number. The
goal of the constructivists was to prove as much as possible constructively;
what could not be so proved was to be discarded as chicken scratches on
paper.
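A standard example, not worked out in the text, shows the kind of argument a constructivist rejects: it proves that there exist irrational numbers a and b with a^b rational, yet it never tells us which pair actually works.

    % A classically valid but nonconstructive existence proof (standard
    % example, not from the text): it never exhibits the pair it asserts.
    Let $x = \sqrt{2}^{\sqrt{2}}$. Either $x$ is rational or it is not.
    \begin{itemize}
      \item If $x$ is rational, take $a = b = \sqrt{2}$ (both irrational).
      \item If $x$ is irrational, take $a = x$ and $b = \sqrt{2}$; then
        \[ a^b = \bigl(\sqrt{2}^{\sqrt{2}}\bigr)^{\sqrt{2}} = \sqrt{2}^{2} = 2, \]
        which is rational.
    \end{itemize}
    In either case such a pair exists, but the proof never determines which
    case holds, and so never constructs (calculates) the numbers in question.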
What do these disputes in the philosophy of mathematics have to do with computers? The constructivists had a problem. Before they could offer
a definite alternative to nonconstructive mathematics they would have to be
able to say precisely what they meant by a number (or any other mathematical object) being constructible. To this end they developed the theory
of effective calculability, or, as it’s now more commonly called, computability
theory. Roughly, something is effectively calculable or computable if it can be
calculated by the strictly mechanical application of formal rules in a finite
number of steps and using finite resources. Specifically, it is effectively calculable just when there’s a calculus for calculating it. It is no coincidence that
all these words contain the root calcul-; the exact significance is discussed in
detail later (p. 24).
Thus, the efforts of these mathematicians and logicians led to the theory
of computation, which is the core theory of computer science, and provides
many of the techniques and central insights of traditional AI and cognitive
science.
We have traced briefly the cognitive and computational sciences back to
the emergence of constructivist ideas in the beginning of this century. These
ideas made precise the notion of the mechanical manipulation of symbolic
structures. But the roots of the preference for calculi — the preference for
rules that can be applied mechanically and for the representation of knowledge in symbolic structures — go much deeper, and manifest themselves in
several themes that run through Western philosophy’s view of knowledge.
1.4 Themes in the History of Knowledge
The first theme is that true, scientific knowledge can be formalized, that
is, expressed in finite formulas constructed from discrete symbols (such as
the letters, numbers and other signs of mathematical formulas). A complementary part of this theme is that scientific inference can be expressed in a
finite number of precise, definite rules for manipulating these formulas, and
that these rules depend only on the shape and arrangement of the symbols
(think of algebraic processes such as solving a simple equation). This theme
began with Aristotle’s identification of formal laws of deduction, continued
with the extensive logical investigations of the Middle Ages (by Ockham et
al.), showed its potential through attempts by Lull, Leibniz and others to
devise a universal calculus for knowledge representation and inference, and
reached maturity in the nineteenth century in the symbolic logics developed
by philosophers and mathematicians such as Boole, Frege, and Russell.
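As a small illustration (mine, not the text’s) of rules that depend only on the shape and arrangement of the symbols, consider solving a simple equation by two purely formal rules, applied without any appeal to what the symbols mean:

    % Illustrative derivation (not from the text): each step applies a formal
    % rule to the symbols of the previous line.
    \begin{align*}
      3x + 5 &= 20  && \text{given}\\
      3x     &= 15  && \text{add $-5$ to both sides}\\
      x      &= 5   && \text{divide both sides by $3$}
    \end{align*}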
In the first decades of the twentieth century the movement to formalize
scientific knowledge developed into an extreme form called logical positivism
(or logical empiricism). This philosophy of science is based on Russell’s logic
and Wittgenstein’s theory of logical atomism, which views items of scientific
knowledge as logical constructions of elementary propositions, such as sense
data or measurements. The philosophy was explicitly formulated in the “Vienna Circle,” whose members included Moritz Schlick, Rudolf Carnap and
Otto Neurath, but rapidly spread to become the dominant philosophy of science in the first half of the twentieth century. Many major movements in
the individual sciences, such as the Copenhagen interpretation in quantum
mechanics (Bohr), functionalism in sociology (Durkheim) and behaviorism
in psychology (Skinner), are simply applications of logical positivism to that
science. Indeed, although it began to lose popularity about 1960, logical
positivism still forms much of the background of our view of “real science.”
In particular, although contemporary cognitive science has rejected behaviorism, a positivist view of method, it has not completely broken away from
rule-based knowledge representation, a positivist view of cognition.
The second theme is usually called the arithmetization of geometry, but
is more accurately called the rationalization of the continuum. It stems from
the discovery by the ancient Pythagoreans that certain continuous magnitudes (what we now call irrational numbers) could not be expressed as ratios
of whole numbers; that is, they could not be rationalized, or expressed as
finite, discrete formulas. Since the ancient Greeks assumed that true knowledge was by its nature expressible in finite, discrete formulas, there seemed
to be something inherently irrational (incomprehensible) about continuous
magnitudes. Thus arose the problem of arithmetizing geometry, that is, of
reducing the troubling continuous magnitudes, studied in geometry, to the
more comfortable discrete numbers (integers), studied in arithmetic. The
ancient Greeks were not able to accomplish this — Euclid axiomatizes arithmetic and geometry separately — and the problem of arithmetizing geometry
remained open until the nineteenth century, when Dedekind and Weierstrass showed how the real numbers could be constructed from the rational numbers.
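For instance (a standard construction, sketched here rather than in the text), Dedekind identified each real number with a “cut” that partitions the rationals; the troublesome magnitude, the square root of 2, then becomes nothing more than a pair of sets of ordinary rational numbers:

    % The Dedekind cut corresponding to the square root of 2 (standard
    % construction, not reproduced from the text).
    \[
      \sqrt{2} \;\longleftrightarrow\;
      \bigl(\{\, q \in \mathbb{Q} : q \le 0 \text{ or } q^2 < 2 \,\},\;
            \{\, q \in \mathbb{Q} : q > 0 \text{ and } q^2 > 2 \,\}\bigr)
    \]

Both halves of the pair are sets of rationals, so the continuous magnitude has been reduced to the discrete numbers of arithmetic.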
These accomplishments meshed into the development of formal logic, so
that by the first decades of the twentieth century it seemed that mathematical knowledge and reasoning would soon be reduced to a calculus. Unfortunately, just when it seemed that the rationalization of mathematics was
at hand, disturbing and seemingly irreparable defects were discovered in the
foundations; mathematical logicians such as Skolem, Löwenheim, Gödel and
Turing discovered fundamental limitations in the ability of calculi to represent knowledge and inference. In particular, these results call into question the
apparent success of the arithmetization of geometry. Although most of these
discoveries were made before 1940, their implications for science and mathematics and for cognition and computation are still unclear and a topic of
continuing debate. Part of the difficulty is that the assumptions of discrete
knowledge representation are so embedded in our view of knowledge that we
find it difficult to see any alternative to viewing knowledge as a calculus.
The development of electronic digital computers in the 1940s provided the
first opportunity to directly test the hypothesis that knowledge was representable by finite, discrete formulas and that it could be processed by discrete
symbol-manipulation operations. Early experiments were very promising,
and led in the early 1970s to a surge of interest in rule-based systems. Unfortunately, more ambitious goals in both artificial intelligence and cognitive science began to expose the weaknesses of the calculus as a model of cognition;
these include the inability to handle novelty, minor exceptions, uncertainty
and context-sensitivity without an exponential explosion in the number or
size of the rules, and thus in the required computational resources. Further
evidence of the inadequacy of the calculus model comes from the “cognitive
inversion” of rule-based approaches: what they seem to do well are higher
cognitive activities (formal logic, chess, mathematics); what they seem to do
poorly are lower cognitive activities (perception, coordination, pattern recognition, image processing). This inversion suggests that natural intelligence
is not rule-based. Thus, by the early 1980s researchers were being driven to
the conclusion that the calculus was impractical as an implementation of artificial intelligence and implausible as a model of biological intelligence. But
what could be an alternative?
1.5 Alternative Views of Cognition
An alternative is connectionism, but to see its importance it is necessary
to understand it in the context of alternative theories of knowledge. We
have seen that the dominant view of knowledge in the Western philosophical
tradition has reflected a pervasive aversion to the continuous and the infinite,
and a preference for definite rules and finite, discrete, symbolic structures.
Thus the calculus becomes the ideal model of a knowledge structure, and the
goal of any science becomes the axiomatization of its knowledge. Further,
since the essence of a calculus is in its formal structure rather than its material
embodiment, there has been a tendency to think of knowledge as independent
of its biological context.
There have always been philosophers who have disagreed with this view
of knowledge. For example, in ancient Greece Heraclitus stressed the continuity of knowledge, both in its structure and its evolution; the Sophists
discussed the dynamics of knowledge and its relation to society; and Aristotle investigated analogical knowledge structures and the role of knowledge in
its biological context. Positivist theories of knowledge were also criticized in
the eighteenth and nineteenth centuries by philosophers such as Kant, Hegel,
Dilthey and Nietzsche. The early twentieth century brought the development of two important alternatives to positivism, phenomenology, especially
as formulated by Husserl and Heidegger, and ordinary language philosophy,
as investigated by Austin, Wittgenstein and others. The phenomenologists
stressed the importance of everyday understanding, including the social context, as a foundation for all knowledge; the ordinary language philosophers
showed the impossibility of defining the terms of everyday language in the
way demanded by the logical positivists, and the inadequacy of a positivist
model of conceptual knowledge. The latter half of the twentieth century has
also seen a renewed attempt to understand knowledge in both its biological (Lakoff, Johnson, Millikan, et al.) and social contexts (Kuhn, Lakatos,
Popper, Dreyfus, et al.).
There have also been minority views in psychology, most notably Gestalt
psychology (Wertheimer, Koffka, Köhler, et al.), which in the early twentieth century opposed a continuous, holistic view of knowledge to the discrete,
atomistic models of behaviorism. However, before it was able to develop
an empirically verified theory of the mechanisms of cognition, it fell before
the logical positivist juggernaut. Nevertheless, the decline of positivism has
brought a renewed interest in Gestalt ideas, as well as investigations into
mental imagery (Shepard, Miller, Paivio, Kosslyn, et al.) and the inadequacy of discrete, symbolic models of concepts and categories (Rosch et al.).
There has also been a trend toward putting cognition in its biological context
(Gibson, Wason, Johnson-Laird, et al.).
By 1950 the view that the brain stores information like a computer memory, with discrete items of information stored in discrete locations, had come
under attack from neuroscientists such as Karl Lashley. He proposed that
each “item” of information was distributed over an entire region of the brain,
with many different items being superimposed in the same region. Although
Lashley was unable to explain the mechanism of distributed memory, such a
mechanism was suggested in the 1960s by Karl Pribram, who proposed that
memory operated on principles analogous to an optical hologram, that is,
the interference of continuous fields. This idea forms a bridge between the
theories of the Gestalt psychologists and modern connectionist theories.
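A minimal sketch (my illustration, using a simple linear associative memory rather than Pribram’s holographic model itself) shows how many items can be superimposed in one region of storage, with each item distributed over all of it: associations are stored as summed outer products in a single weight matrix, and any stored key retrieves its value, plus a little crosstalk from the others. The dimensions and the use of NumPy are assumptions for the sake of the example.

    # A sketch of distributed, superimposed memory (illustrative, not a model
    # from the text): every association is spread across the whole matrix,
    # and many associations share the same matrix.
    import numpy as np

    rng = np.random.default_rng(0)
    dim = 256          # dimensionality of the patterns (arbitrary choice)
    n_items = 10       # number of key/value pairs stored on top of one another

    # Random high-dimensional keys are nearly orthogonal, which is what lets
    # the superimposed traces be separated again at recall time.
    keys = rng.standard_normal((n_items, dim))
    keys /= np.linalg.norm(keys, axis=1, keepdims=True)
    values = rng.standard_normal((n_items, dim))

    # Storage: a sum of outer products. No single entry of W "holds" any item.
    W = sum(np.outer(v, k) for k, v in zip(keys, values))

    # Recall: multiplying by a stored key retrieves (approximately) its value,
    # plus crosstalk from the other superimposed items.
    recalled = W @ keys[3]
    similarity = np.dot(recalled, values[3]) / (
        np.linalg.norm(recalled) * np.linalg.norm(values[3]))
    print(f"cosine similarity of recalled and stored value: {similarity:.3f}")

Because every pair is spread across the whole matrix, damaging part of W tends to degrade all of the stored associations slightly rather than erase any one of them outright.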
1.6 Connectionism
In the middle decades of this century, before the sequential, digital computer
had completely dominated our view of computation, there were many exciting
applications of artificial neural networks, parallel analog computer systems
with principles of operation patterned after the brain, such as Rosenblatt’s
“perceptrons” and Selfridge’s “Pandemonium.” For a variety of reasons, both
philosophical — the unquestioned acceptance of logical positivism working
its way through computer science — and technological — the rapid progress
in digital electronic technology — most neural network research had been
abandoned by the late 1960s. Although a few researchers continued in the
area (e.g., Grossberg, Longuet-Higgins, Anderson, Willshaw), the 1970s were
the decade of rule-based models of intelligence, whether natural or artificial.
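For the flavor of these early networks, here is a minimal sketch in the spirit of Rosenblatt’s perceptron (my reconstruction, not code or an algorithm given in the text): a single threshold unit whose weights are nudged by an error-correction rule whenever it misclassifies an input. The toy task (logical OR), the learning rate, and the NumPy rendering are illustrative choices.

    # A single threshold unit trained by error correction (illustrative sketch
    # in the spirit of the perceptron, not taken from the text).
    import numpy as np

    def train_perceptron(inputs, targets, epochs=20, lr=0.1):
        """Learn weights and bias so that sign(w.x + b) matches +/-1 targets."""
        rng = np.random.default_rng(0)
        w = rng.standard_normal(inputs.shape[1]) * 0.01
        b = 0.0
        for _ in range(epochs):
            for x, t in zip(inputs, targets):
                y = 1.0 if np.dot(w, x) + b > 0 else -1.0
                if y != t:              # update only on a mistake
                    w += lr * t * x
                    b += lr * t
        return w, b

    # A linearly separable toy problem: logical OR with +/-1 coding.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([-1, 1, 1, 1], dtype=float)

    w, b = train_perceptron(X, T)
    print([1 if np.dot(w, x) + b > 0 else -1 for x in X])   # -> [-1, 1, 1, 1]

A single unit of this kind cannot learn a task such as exclusive-or, one of the limitations that contributed to the decline of neural network research noted above.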
Hubert Dreyfus’ 1972 book, What Computers Can’t Do, is best known for its apparent negative conclusion, the impossibility of artificial intelligence,
but it is also important for opening the way for connectionism since, in
addition to showing the limitations of traditional AI, it reviewed some of the
psychological evidence against symbolic knowledge representation, showed
the importance of the phenomenological view of knowledge, and pointed to
holographic models of cognition as an alternative to rule-based computation.
By the early 1980s growing dissatisfaction with the calculus model was
causing some researchers in cognitive science and artificial intelligence to explore alternative models based on the cooperative and competitive interaction
of large numbers of simple elements that represent and process information
at the subsymbolic level (Feldman, Ballard, Rumelhart, McClelland, Hinton, Smolensky, Sejnowski, Sutton, Barto, Kohonen, Hopfield, Hofstadter,
and many others). Although some of the earlier connectionist models still
used discrete (digital) symbols stored locally in a network, the trend has
been toward representations based on continuous (analog) values distributed
throughout the network. The result is both psychologically and biologically
more plausible. Psychologically, because connectionist systems exhibit many
of the same strengths and weaknesses as natural systems; for instance, they
do not exhibit the cognitive inversion. Biologically, because it is much easier
to conceive of an implementation in nervous tissue of a connectionist system
than of a rule-based system. Since the mid-1980s there has been an efflorescence of connectionist research, both basic and applied, but its implications
for our understanding of knowledge, language and cognition — both natural and artificial — are still on the horizon.[2]

[2] The terms connectionism, parallel distributed processing (PDP), and neural networks are used nearly synonymously. I prefer connectionism to the more awkward (but perhaps more accurate) parallel distributed processing, but both refer to the same alternative theory of knowledge and cognition. I will reserve the term neural networks (and neurocomputing) for the new approach to computing inspired by biological neural systems. The difference, however, is more of intent than of content.

The contribution of connectionism is that it integrates these alternative theories of knowledge into the body of scientific theory by providing access
to the mechanisms of cognition. That is, although many philosophers and
psychologists were aware of the inadequacy of positivist theories of knowledge, their alternatives seemed unscientific because they did not show how
they related to the commonly accepted theories and methods of the other
sciences.
The problem is to “banish the homunculus” — a colorful description of
the goal of a scientific psychology. This problem is depicted quite literally in
some popular presentations of the brain. We may see the eyes and other
senses wired to dials and TV monitors within a control room in the brain.
The muscles, internal organs, etc. are in turn controlled by wires connected
to control levers and switches in this same control room. And there, in the
middle of it all — looking at the dials and manipulating the levers — is a
little man (homunculus), usually wearing an engineer’s cap!
It is apparent that nothing has been accomplished by explaining our cognitive capabilities in terms of a little man in our heads. How are his cognitive
capabilities to be explained? By reduction to another, smaller homunculus?
Although few scientists would postulate a homunculus in such an explicit
form, it is remarkably hard to keep homunculi out of theories of the mind. For
example, if a neuroscientist identifies a visual projection area in the cortex,
the question immediately arises: Who (or what) is “looking” at the image
in that area? No one proposes tiny TV screens in our brains, but a visual
projection area is not so different unless we can explain mechanistically how
the image is processed.
Thus the only way to be sure we have banished the homunculus is to give
a complete in-principle mechanistic (i.e., non-homuncular) account of every
step from sensation to action; only then can we be sure we haven’t begged
the central question.
Part of the importance of AI to cognitive science is that it promises to do
exactly this. Since, we suppose, there is no “ghost in the machine,” successful
artificial intelligence is ipso facto a non-homuncular theory of intelligence.
Unfortunately, traditional AI was not successful, since it was based on an
inadequate theory of knowledge. The promise of the new AI is that, by
virtue of a better epistemology, it will succeed in banishing the homunculus.