Stages of conceptual change in humans and robots

Jonathan Scholz

15 December 2010

Abstract
There is strong empirical and theoretical evidence in psychology
that human development follows a stage-like progression throughout
the lifespan. Development is marked by relative stability, punctuated
by periods of rapid and profound conceptual change. However, the
driving forces behind these bursts of change are poorly understood. In
this paper I examine a growing body of evidence that developmental
change can emerge through a process of rational, statistical inference
as a learner interacts with his or her environment. I briefly introduce
the main idea behind statistical (bayesian) inference over structured
hypothesis spaces, and then explore the specific domain of number
learning, which has unique implications for the emergence of stages in human
development.
1 Introduction
Dating back at least to the work of Piaget [17], psychologists have distinguished between the ideas of learning and development. Whereas
learning was conceived to be a basic process of association and reaction, development was viewed as a fundamental reorganization of the
machinery of cognition. Development proceeded in a series of discrete
stages, between which humans made profound conceptual leaps. The
critical need for such a process in any artificial system approaching
human intelligence has been appreciated for some time, but until recently the basic requirements for a developmental learner were poorly
understood at the computational level. However, recent progress has
been made towards formalizing conceptual change in children using
the tools of bayesian statistics. This paper will consider how a recent model of one particular developmental process, numerical concept
learning, has provided the first computational account of a stage-like
learning phenomenon.
This paper is organized as follows. First I review the findings in
the psychology literature that suggest general stage-like processes of
development. I then describe the number learning problem in greater
detail, including both the basic empirical results and theoretical accounts in the developmental literature. In Section 4, I introduce the key
ideas behind the use of bayesian statistics in psychology, and describe
several results with particular relevance to developmental learning in
both humans and robots. Finally, I explain a bayesian model of the
bootstrapping of number concepts designed to address a gap in the
developmental literature. I conclude by discussing the implications of
this work, along with some directions for future research.
2 Stages of development
Since at least as far back as the work of Piaget in the 1950s, most psychologists have considered human development to be a procession through a
series of discrete stages. Far from being an arbitrary characterization
of child development, Piaget’s “accommodation” hypothesis commits
to a strong empiricist view that children modify their mental machinery to understand novel stimuli [19, 18]. While contrary to the views
of some of his leading contemporaries such as Chomsky and Fodor,
Piaget’s view has featured prominently in modern accounts of development such as the “Theory-Theory” [11].
Although Piaget’s constructivist ideas were initially inspired by observations of his own children, they have accumulated a great deal of empirical support over the past several decades. He was the first to propose the idea
of object permanence as a key developmental milestone, in which children come to understand objects as existing beyond their immediate
sensory environment [17]. Object permanence was initially thought
to emerge at between 3 and 4 years of age, but more recent studies using preferential-looking time methods have pushed the age down to 2 years, and even to 5 months [2]. In any case, researchers agree that it is an important conceptual accomplishment for young children.
The identification of object permanence at younger and younger
ages was made possible by removing the dependence on language, itself another well-studied, stage-like developmental process. Carol Chomsky identified five specific stages in the development of the language faculty [5].
Such qualitative shifts in children’s abilities have also been observed in
Theory of Mind reasoning [13], multimodal perception [1], and number
comprehension [24]. It is the last of these, reasoning about number,
that will be of particular interest as an instance of conceptual bootstrapping.
3 Learning the concept of number
One of the most striking examples of sudden conceptual change can be
found in children’s apprehension of number words. Children proceed
through a series of stages in their understanding of the relationship
between numerosity and number words. Wynn identified four discrete stages of number competence, as well as a general window in
which these phases progress [24]. Notably, researchers and parents
alike observe the progression to the final phase as a “quantum leap”
in understanding, in which children switch from a rote to a systematic
understanding of number [20]. It is this leap that is of deepest interest
from a modeling point of view.
3.1 The subset-knower phenomenon
At around 2 years of age, as children begin to learn basic words in their
native language, they first gain the ability to reason about number in
a limited way. If asked to retrieve one fish, a 2-year-old can successfully
remove a single fish from a bowl containing many toy fish. If asked
to retrieve two or more, however, children at the one-knower stage
will simply provide a handful of fish [24]. This is striking, since the
same children exhibit proficiency in counting up to 10 or more, and
will spontaneously utter number words in correct order. They later
proceed through a two-knower and even a short three-knower period in
which their number comprehension tops out at two or three. Eventually, though, children display a sudden shift in understanding during
which they discover the underlying logic relating number words and numerosity. That is, rather than progressing on through four-knower and
five-knower stages, they transition to being cardinal-principle (CP)
knowers, able to grasp the abstract rules of number.
In addition to the seismic shift in competence, the manner in which
children provide answers to number questions is revealing. Theories of
CP development posit a sudden jump from an association-based ability to a logical one. Consistent with this idea, children before the CP
stage virtually never use counting to give large numbers, whereas children shortly after the CP-transition almost always do [24]. Younger
children instead seem to have directly mapped particular small numerosities onto the correct number words, thereby succeeding at small
numerosities with little effort, but lacking any general method for other
numbers.
3.2 Conceptual bootstrapping
Carey argues that the development of number competence can be explained by a process of conceptual bootstrapping [4]. By contrast with
associationist theories, bootstrapping argues for a representation of
number using mental models of small sets. For instance, one-knowers
might have a model of “one” as {X}, and “two” as {X,X}. These representations serve as the input to a domain-general capacity referred to as enriched parallel individuation. Le Corre and Carey argue that such a capacity is fundamental for working with sets and individuating objects, and can explain subset-knowers’ ability to put a set labeled “two” into 1-1 correspondence with their mental model of two, {X,X} [16].
In the bootstrapping account, the transition to CP-knower happens
when children observe a pattern between their memorized list of count
words and the successively larger sets they map to. By discovering the
rule that moving one step on the count list corresponds to increasing
the set size by one, children have an abstract definition for bootstrapping arbitrarily large sequences of number words onto their appropriate
meaning. This account explains the dramatic CP-transition, but says
little about how children discover the rule in the first place. Indeed, critics of the bootstrapping explanation cite vagueness and informality as key weaknesses [8]. Perhaps more interestingly, Rips et al. argue that bootstrapping is fundamentally circular, in that it presupposes some form of innate successor function to support reasoning with sets [21]. A successor function is one that, given a set of size k, returns next(k), a set one element larger. Such a dependence would be a problem for Carey’s account because it moves the work of relating count words to sets from a learnable representation onto an innate one. As will be discussed in Section 5, Piantadosi et al. present a model which demonstrates that all three criticisms are unfounded. First, however, we must provide some background on the general methods they employ.
4 Bayesian models of cognition
One of the exciting trends in cognitive science over the past decade
has been the unification of symbolic and statistical theories of learning under a common framework. The approach is broadly referred
to as “bayesian modeling”, and technically includes any method which
employs Bayes’ rule to formalize inference under uncertainty. In the
developmental literature, however, it has taken a more specific form, as
a normative account of learning from data. Just over the past four
years, bayesian models have been proposed to explain children’s patterns in the diverse areas of causal reasoning [10], social goal inference
[3], word learning [7], sensorimotor integration [15], and the learning of structured representations in general [14].
The common thread through each of these research programs is
the formulation of learning as a process of scoring hypotheses from a
hypothesis space based on how well they explain observed data. For
example, a bayesian model for learning to recognize animals might
select the hypothesis that “horses are the ones with hooves” over “horses
are the ones that are brown” when trained on a collection of images of
horses and bears. The insight which the model makes formal is that
the first hypothesis, “hooves”, is more useful for identifying horses than
“brownness” (at least when horses and bears are involved). Bayes’ rule
is a powerfully general tool which can be applied to any situation in
which a learner must select among competing explanations for some
event or observation.
The bayesian approach initially became popular in the artificial
intelligence community for handling measurement uncertainty, but has
come to play a more critical role in psychology as a mechanism for formalizing the “inductive” learning of new concepts. Whereas the
learning of concepts such as “horse” or “sunrise” from a small number of
examples is considered an inductive leap, selecting among hypotheses is
fundamentally deductive. Thus, bayesian inference stands to solve the
“riddle of induction” by transforming an ill-defined inductive problem
into a well-defined deductive one [9]. Formally, Bayes’ rule can be
stated as
$$P(h \mid x) = \frac{P(x \mid h)\,P(h)}{\sum_{h' \in H} P(x \mid h')\,P(h')} \qquad (1)$$
The term P (x|h) is referred to as the likelihood, and indicates how
likely some observation x is given a hypothesis h. P (h) is the prior
probability of the hypothesis h, before any data is observed. Finally,
P (h|x) is the term of interest, and gives the probability of hypothesis h given that data x were observed. Each of these terms in fact refers to a full probability distribution, and Bayes’ rule gives a way of updating our estimate P (h|x) in light of some observation x. The denominator in equation (1) is simply a normalizing term that sums the total probability of the data over all hypotheses, thus ensuring that P (h|x) is a legal distribution. From this general rule we can derive arbitrarily
complicated models of data, from visual features to social behavior.
The important property of a bayesian model is that it describes the
statistically optimal way to update one’s beliefs in light of new data.
5 A computational model of number concept bootstrapping
The debate over the bootstrapping theory of number learning rested on arguments about what computations the core systems of number are capable of. Carey et al. would have us believe that knowledge of sets
alone was sufficient to develop a full system for numerical reasoning,
which would spontaneously develop as children gained experience with
number words. Rips et al. argued instead that the core system must
include a successor function to make cardinal-principle reasoning possible. Without a clear alternative to satisfy the role of next(k), the bootstrapping theory fails to address Rips’ concern. However, the bayesian
approach offers a solution by casting number learning as search over
a suitable hypothesis space. Using the machinery for inductive inference described above, Piantadosi et al. illustrate not
only that bootstrapping for number words is possible, but that the
developmental progression observed in children is consistent with an
optimal bayesian learner.
The core ingredient for Piantadosi’s model is a hypothesis space for
language, or what Carey might refer to as the core conceptual primitives that we can assume humans are born with. Carey’s bootstrapping theory is not alone in presuming sophisticated innate capacities
in infants. Spelke has also argued for the existence of core systems
of number to explain young infants’ apparent facility in number
tasks [6]. Thus, Piantadosi et al. are not straying far from the status quo in employing the lambda calculus as a core representational
system for language. The choice of lambda calculus is not unique
to this work, and is in fact quite common as a formal language for
compositional semantics [12, 22].
Lambda calculus is a rich yet convenient language for expressing a
wide range of computational problems. It enjoys a long history within
linguistics and computer science, and was the basis for Lisp, one of
the oldest and most successful computer languages. The main activity
of lambda calculus is specifying how to build larger expressions out of
smaller parts. For example,
λx. (not (singleton? x))        (2)
describes a function for determining whether a set is not a singleton (i.e., does not contain exactly one element).
The syntax of a lambda expression is as follows. To the left of a
period, there is a “λx”. This denotes that the argument to the function
is the variable named x. On the right hand side of the period, the
lambda expression specifies how the expression evaluates its arguments.
Expression (2) returns the value of not applied to (singleton? x). In turn,
(singleton? x) is the function singleton? applied to the argument x.
Since this lambda expression represents a function, it can be applied
to arguments (sets) to yield return values. For instance, Expression (2) applied to {Bob, Joan} would yield TRUE, but {Carolyn} would yield FALSE, since
only the former is not a singleton set [20].
In addition to simple expressions as explained above, the lambda
calculus is capable of defining rich, arbitrarily complicated functions.
In using the primitives in lambda calculus as a hypothesis space, a
learner has a rich universe of concepts in which to search for a model
that best fits the data. For such a learner, the data consists of a
set of events, much as a child might experience, in which a group of
some number of objects is presented, along with a word indicating the
numerosity. One example of a more sophisticated concept which the
learner might select is shown below.
λS. (if (singleton? S)
        “one”
        (next (L (select S))))        (3)
In this example “next” returns the next word in the counting list
(NOT the next largest set, as required by Rips et al.’s successor function), and “select”
returns a set containing a single element from S (a singleton). Overall, then, the expression immediately returns “one” if the set contains one element; if not, it selects a single element from the set and calls itself (via L) on that singleton. The recursive call returns “one”, and next then maps “one” to “two”, so the expression labels every non-singleton set “two”. This is an example of a somewhat
complicated expression, in that it involves recursion, and one which is
not obviously of use. However, it helps underscore the large space of
potential functions which can be represented with a few simple predicates. For additional examples of the predicates used in the model,
and functions for working with numbers, see [20].
To evaluate hypotheses in the form of lambda expressions given
data as described above, Piantadosi et al. introduce a bayesian model
which defines a probability distribution over expressions given observed
words W, containing a target type T (e.g. “cats”), in a context C (e.g.
cat, horse, dog, dog, cat, dog). By Bayes’ rule, we have:
"
#
Y
P (L|W, T, C) ∝ P (W |T, C, L)P (L) =
P (wi |ti , ci , L) P (L) (6)
i
The authors define a likelihood function for a word given an expression and a context which, intuitively, assigns higher probability to
words that the expression can generate, and lower probability to words that it cannot. Finally, a prior term P (L) assigns greater probability
to shorter expressions, and expressions without recursion, which biases the learner towards compact and simple concepts. This form of
bayesian “Occam’s razor” has been shown to be an important inductive
bias in learning concepts that fit observed data well but also generalize
to new situations. One of the great advantages of the bayesian approach is that these well-known but abstract scientific principles are
trivial to codify in a model.
After specifying the model, all that remains is to provide data
and “turn the crank”. The question then becomes: what hypotheses will the learner prefer as a function of the data it has been given? In
this experiment, the data provided was obtained from the CHILDES
database, which contains examples with word frequencies that approximate the naturalistic word probabilities in child-directed speech. As
can be seen in Figure 1, the model progressed through four distinct phases
of development as the amount of experience increased. Like children,
the model initially preferred simple direct mappings for one, two and
even three objects. However, at four objects an interesting effect emerged: rather than continuing to discover concepts for higher-order words on the count list, the model made a sudden transition to a representation equivalent to CP-knower status in children. Thus, the learner
provides a computational verification of Carey’s initial suspicion: that
at some point it is more advantageous to learn a computational system
than to memorize by rote the correspondence between words and numerosity.
[Figure 1: Learning results on the CHILDES dataset. The x-axis indicates the amount of data provided to the model; the y-axis gives the probability of each hypothesis given the data.]

[Figure 2: Comparison of learners. Grey traces depict the large number of low-probability hypotheses as the learner gains experience.]
However, unlike Rips’ account, Piantadosi et al. presented a hypothesis space in which the CP-knower was but one latent concept in an infinite space. While simple to codify, the lambda calculus presented here is a highly expressive “language of thought”, and contains sufficient richness to model the developmental progression observed in humans. Furthermore, the bayesian model for searching this space correctly chose the CP-knower hypothesis over other possibilities equally consistent with the data (such as a MOD-10 system [20]).
6 Conclusion
While the model described above may be of only passing interest outside the scope of the bootstrapping debate, it has profound implications for theories of development in artificial agents and robots. Over the
past decade, a significant contingent of the robotics community has
sought inspiration for humanoid robots from the developmental psychology literature. The intuition behind this shift is perhaps obvious:
after decades of watching robots repeatedly fail in non-laboratory environments, a natural question is what makes human infants so adept at
coping with novel situations. For these communities of developmental
and epigenetic robotics, the question changed from “how can we design
behavior X into our robot” to “how can we design learner L into our
robot such that it will learn behavior X (and Y,Z) as it gains experience”. Unfortunately, the theoretical foundation for learner L does not
really exist, and certainly not to a level commensurate with human
achievement.
Undeterred, robotics researchers such as Stoytchev [23] identified a series of developmental principles through rigorous analysis, and went on to impose these principles on a humanoid robot. In the
case of Stoytchev, principles included “gradual exploration” and “verifiability”, abstract ideas that governed his approach to robot control
architecture. This approach is clearly insightful and forward-thinking, but it does not truly break from the tradition of designing behaviors for robots manually. To the extent that this is a shortcoming, however, it can be excused by the lack of any real method for providing
domain-general developmental constraints. This is what makes the
work of Piantadosi et al. so compelling from a robotics perspective.
The important thing about their model is not what it seeks to explain,
but the fact that the driving force behind its human-like performance
is a general-purpose constraint imposed by the bayesian machinery.
Contrary to much of the work in developmental robotics, in which a stage-like progression is imposed explicitly or implicitly by the programmer, the model described above came upon its stages organically.
The observed transition from three-knower to CP-knower was driven by
an interaction between a likelihood term that maintained the CP hypothesis as new data came in, and a prior term that preferred it over
competitors which became increasingly complicated in order to explain
all the observed data.
In the end, models like these will probably gain and lose favor as
new results emerge in the literature, but the current climate marks a
turning point towards a more principled approach to computational
learning theory. As Kemp observes,
    We have other interesting computational (bayesian) models of cognitive processes, but they often assume significant in-place representational and reasoning abilities. The direction of the field is clearly towards an overarching theory of how all of these problems are learnable from a core set of domain-general and domain-specific faculties.
Piantadosi’s model can be seen as a step in this direction, and with
any luck, robots will soon begin taking these steps as well.
References
[1] L.E. Bahrick. Infants’ perception of substance and temporal synchrony in multimodal events. Infant Behavior and Development, 6(4):429–451, 1983.
[2] R. Baillargeon, E.S. Spelke, and S. Wasserman. Object permanence in five-month-old infants. Cognition, 20(3):191–208, 1985.
[3] C. L. Baker, R. Saxe, and J. B. Tenenbaum. Action understanding
as inverse planning. Cognition, 2009.
[4] S. Carey. The Origin of Concepts. Oxford University Press, 2009.
[5] C. Chomsky. Stages in language development and reading exposure. Harvard Educational Review, 42(1):1–33, 1972.
[6] L. Feigenson, S. Dehaene, and E. Spelke. Core systems of number.
Trends in cognitive sciences, 8(7):307–314, 2004.
[7] M. C. Frank, N. D. Goodman, and J. B. Tenenbaum. Using speakers’ referential intentions to model early cross-situational word
learning. Psychological Science, 2009.
[8] C.R. Gallistel. Commentary on Le Corre & Carey. Cognition,
105(2):439–445, 2007.
[9] N. Goodman. The new riddle of induction. Philosophy of Science:
An Historical Anthology, page 424, 2009.
[10] A. Gopnik, C. Glymour, D.M. Sobel, L.E. Schulz, T. Kushnir,
and D. Danks. A theory of causal learning in children: Causal
maps and Bayes nets. Psychological review, 111(1):3–31, 2004.
[11] A. Gopnik and H.M. Wellman. The theory theory. In L.A. Hirschfeld and S.A. Gelman, editors, Mapping the Mind: Domain Specificity in Cognition and Culture. Cambridge University Press, 1994.
[12] I. Heim and A. Kratzer. Semantics in generative grammar. Wiley-Blackwell, 1998.
[13] H. Wimmer and J. Perner. Beliefs about beliefs: Representation and constraining function of wrong beliefs in young children’s understanding of deception. Cognition, 13(1):103–128, 1983.
[14] C. Kemp and J. B. Tenenbaum. The discovery of structural form.
Proceedings of the National Academy of Sciences, 2008.
[15] K. Kording and J. B. Tenenbaum. Causal inference in sensorimotor integration. Advances in Neural Information Processing
Systems 19, 2007.
[16] M. Le Corre and S. Carey. One, two, three, four, nothing more:
An investigation of the conceptual sources of the verbal counting
principles. Cognition, 105(2):395–438, 2007.
[17] J. Piaget and E. Duckworth. Genetic epistemology. American
Behavioral Scientist, 13(3):459, 1970.
[18] J. Piaget and B. Inhelder. The Psychology of the Child. Basic Books, New York, 1969.
[19] J. Piaget. Part I: Cognitive development in children: Piaget
development and learning. Journal of research in science teaching,
2(3):176–186, 1964.
[20] S.T. Piantadosi, J.B. Tenenbaum, and N.D. Goodman. Bootstrapping in a language of thought: A formal model of
numerical concept learning. Cognition, 2012.
[21] L.J. Rips, J. Asmuth, and A. Bloomfield. Giving the boot to
the bootstrap: How not to learn the natural numbers. Cognition,
101(3):B51–B60, 2006.
[22] M. Steedman. The syntactic process, volume 131. MIT Press,
2000.
[23] A. Stoytchev. Robot Tool Behavior: A Developmental Approach
to Autonomous Tool Use. PhD thesis, Georgia Institute of Technology, College of Computing, 2007.
[24] K. Wynn. Children’s acquisition of the number words and the counting system. Cognitive Psychology, 24(2):220–251, 1992.