Can Statistical Learning Bootstrap the Integers?

Lance J. Rips (a), Jennifer Asmuth (b), Amber Bloomfield (c)

(a) Psychology Department, Northwestern University, 2029 Sheridan Road, Evanston, IL 60208 USA
(b) Psychology Department, Susquehanna University, 514 University Avenue, Selinsgrove, PA 17870 USA
(c) Center for Advanced Study of Language, University of Maryland, 7005 52nd Ave., College Park, MD 20742 USA

Corresponding author:
Lance Rips
Psychology Department
Northwestern University
2029 Sheridan Road
Evanston, IL 60208 USA
847.491.5947
Fax: 847.491.7859
Email: rips@northwestern.edu

Reply to Piantadosi et al. / 2

Abstract

This paper examines Piantadosi, Tenenbaum, and Goodman's (2012) model for how children learn the relation between number words ("one" through "ten") and cardinalities (sizes of sets with one through ten elements). This model shows how statistical learning can induce this relation, reorganizing its procedures as it does so in roughly the way children do. We question, however, Piantadosi et al.'s claim that the model performs "Quinian bootstrapping," in the sense of Carey (2009). The concept it learns is not discontinuous with the concepts it starts with. Instead, the model learns by recombining its primitives into hypotheses and confirming them statistically. As such, it accords better with earlier theories (Fodor, 1975, 1981) that learning does not increase expressive power. We also question the relevance of the simulation for children's learning. The model starts with a small, preselected set of primitives, and the procedure it learns differs from children's method. Finally, the knowledge of the positive integers that the model attains is consistent with an infinite number of nonstandard meanings—for example, that the integers stop after ten or loop from ten back to one.

Keywords: Bootstrapping, Number knowledge, Number learning, Statistical learning

1.
Introduction

According to the now standard theory of number development, children gradually learn to recognize and produce collections of one, then two, and then three objects in response to requests, such as "Give me one [two, three] cups" or "Point to the picture of one [two, three] elephants." At this point, they rapidly extend their success to larger collections—up to those named by larger numerals on their list of number terms, for example, "ten" (Wynn, 1992). (The largest numeral for which they are successful can vary, but let's say "ten" for concreteness.) The standard theory sees this last achievement as the result of the children figuring out how to count objects: They learn a general rule for how to pair the numerals on their list with the objects in a collection in order to compute the total.

No one doubts that children in Western cultures learn enumeration as a technique for determining the cardinality (i.e., the total size) of a collection. However, debates exist about how children make this discovery (see, e.g., Carey, 2009; Leslie, Gelman, & Gallistel, 2008; Piantadosi, Tenenbaum, & Goodman, 2012; and Spelke, 2000, 2011) and about its significance for their knowledge of number (e.g., Margolis & Laurence, 2008; Rey, 2011; Rips, Asmuth, & Bloomfield, 2006, 2008; Rips, Bloomfield, & Asmuth, 2008). Our aim in the present article is to examine a recent theory of learning enumeration—one by Piantadosi et al.—and to compare it to an earlier proposal by Carey. In doing so, we are motivated by Piantadosi et al.'s claim to have shown that difficulties we identified in Carey's theory disappear on closer inspection.

1.1. Carey's bootstrap proposal

Carey (2004, 2009) has given a detailed account of learning to enumerate as an instance of a process she calls Quinian bootstrapping. In brief, children start with a short memorized list of numerals in order from "one" to "ten," although these numerals are otherwise uninterpreted.
Over an approximately one-year period, children successively attach the numeral "one" to a mental representation of an arbitrary one-member set (e.g., {o1}), the numeral "two" to a representation of a two-member set (e.g., {o1, o2}), and the numeral "three" to a representation of a three-member set (e.g., {o1, o2, o3}). Children next realize that a parallel exists between the order of the numeral list ("one" then "two" then "three") and the set representations ordered by the addition of one element ({o1} then {o1, o2} then {o1, o2, o3}). They infer that the meaning of the next element on the numeral list is the set size given by adding one element to the set size named by the preceding numeral (e.g., the meaning of "five" is the cardinality one greater than that named by "four"). This inference allows them to determine the correct cardinal value for the remaining items on their count list (up to "ten"). In what follows, we refer to the conclusion of this inference as the bootstrap conclusion.

This interesting proposal raises many empirical and theoretical questions (for a sample of these, see the commentaries to Carey, 2011, and other sources cited later in this article). Two of these issues, however, are important for the present discussion. The first is whether learning to enumerate creates a fundamentally new way of representing the positive integers. According to Carey, Quinian bootstrapping provides a child with new primitive concepts of number, concepts that the child's old conceptual vocabulary can't express, even in principle:

Quinian bootstrapping mechanisms underlie the learning of new primitives, and this learning does not consist of constructing them from antecedently available concepts (they are definitional/computational primitives, after all) using the machinery of compositional semantics alone (Carey, 2009, p. 514, emphasis in the original).
No translation is possible between the old number concepts and the new ones:

To translate is to express a proposition stated in the language of [Conceptual System 2] in the language one already has ([Conceptual System 1])… In cases of discontinuity in which Quinian bootstrapping is required, this is impossible. Bootstrapping is not translation; what is involved is language construction, not translation. That is, drawing on resources from within CS1 and elsewhere, one constructs an incommensurable CS2 that is not translatable into CS1 (Carey, 2011, p. 157).

In the last quotation, CS1 is the child's conceptual system prior to an episode of Quinian bootstrapping and CS2 is the conceptual system that results from bootstrapping. As the first of these quotations makes clear, Quinian bootstrapping is a kind of learning, usually an extended process that takes months or years to complete. Children don't acquire the new number concepts by mere maturation. Likewise, external causal forces don't merely stamp them in. A challenge in understanding Quinian bootstrapping is how to reconcile the claim about learning with the claim about discontinuity between old and new concepts.

A second question about Quinian bootstrapping is the scope of the concepts it produces. Enumeration pairs the elements on the child's count list with collections of objects. But we have argued in earlier work (Rips et al., 2006) that learning the meanings of "one" through "ten" via Quinian bootstrapping does not pin down even the cardinal meanings of these terms to their correct values. This raises one of the questions mentioned earlier: How important is Quinian bootstrapping (and enumeration in general) to children's understanding of the integers?

1.2. An overview

Piantadosi et al.
(2012) present an explicit model of how children learn to enumerate objects, and the clarity of their theory provides a chance to assess the issues just mentioned.1 Although this model differs from Carey's (2004, 2009) in important details, as Piantadosi et al. note, they nevertheless claim it exemplifies bootstrapping. In the following sections, we argue, first, that because the model learns to enumerate by straightforward concept combination, it does not create a new discontinuous conceptual system. Instead, it illustrates Fodor's (1975, 1981) hypothesis that learning elaborates old concepts: It cannot produce a new language that increases the child's expressive power. Because of this limitation, the model is incapable of bootstrapping in Carey's sense and does little to clear up the issues surrounding bootstrapping. Of course, this doesn't mean that the model is incorrect. It could provide a correct account of enumeration even if Quinian bootstrapping has no role in its procedure. However, the model's method of enumerating differs from that of children, and these differences raise questions about the relevance of the model to children's actual abilities. Finally, Piantadosi et al.'s model maps sets of one to ten elements to the terms "one" to "ten," but has no implications for the structure of the integers. So the same difficulties about scope that beset Quinian bootstrapping carry over to this new proposal.

1 Unless we otherwise attribute them, all page references are to Piantadosi et al. (2012).

2. Is the Piantadosi et al. model a form of bootstrapping?

Carey (2004, 2009) introduced "Quinian bootstrapping" as a term for procedures in which people learn new concepts that are discontinuous from their old ones. Thus, we take the central claims of Quinian bootstrapping to be these (see Beck, submitted for publication):

Learning: In Quinian bootstrapping, an agent learns a new conceptual system, CS2, in terms of an old system, CS1.
Discontinuity: After Quinian bootstrapping, CS2 is conceptually discontinuous from CS1.

These properties are explicit in the quotations in Section 1.1 and in many other places in Carey's (2009) presentation. Important for the present discussion is the fact that Carey introduces her chapter on how children acquire representations of the positive integers by setting herself two challenges: "to establish discontinuities in cognitive development by providing analyses of successive conceptual systems, CS1 and CS2, demonstrating in what sense CS2 is qualitatively more powerful than CS1" and "to characterize the learning mechanism(s) that get us from CS1 to CS2" (p. 288).

Because of the Discontinuity claim, Quinian bootstrapping opposes the view that all forms of learning derive new concepts by recombining (or translating from) old ones. According to this alternative view (Fodor, 1975, 1981), learning is a form of hypothesis formation and confirmation in which the hypotheses are spelled out in the old conceptual vocabulary (i.e., the concepts that the person possesses prior to learning). Confirmation merely stamps them in without producing anything fundamentally new. To have a general term for such non-bootstrapping forms of learning, we'll use concept recombination. Proponents of bootstrapping agree that mundane forms of learning use recombination. Bootstrapping occurs only in special cases. So the question that divides theorists is whether any actual instances of learning are examples of bootstrapping.

Piantadosi et al.'s clearest statement about this issue suggests that their approach is much closer to recombination than to Quinian bootstrapping (p. 214):

One of the basic mysteries of development is how children could get something fundamentally new… Our answer to this puzzle is that novelty results from compositionality.
Learning may create representations from pieces that the learner has always possessed, but the pieces may interact in wholly novel ways… This means that the underlying representational system which supports cognition can remain unchanged throughout development, though the specific representations learners construct may change.

This approach, whatever its merits, has nothing to do with creating new primitives (or new conceptual systems) that are discontinuous with old ones. As such, it jettisons a central part of Carey's bootstrapping theory, the Discontinuity claim. "Quinian bootstrapping" was introduced as a technical term, so there is limited room for redefining it while simultaneously claiming to defend it.

We note that developmentalists have used the term "bootstrapping" in ways that differ from Carey's "Quinian bootstrapping." In research on language acquisition, syntactic bootstrapping is a hypothetical process in which children use syntactic properties of sentences to determine the referents of component words, and semantic bootstrapping is a process in which children use the referents of words to determine their syntactic category (see Bloom & Wynn, 1997, for a discussion of these possibilities in the context of number). Neither of these forms of bootstrapping qualifies as Quinian bootstrapping, according to Carey (2009, p. 21), since neither creates a conceptual system discontinuous with earlier ones. Piantadosi et al.'s proposal for number learning is not an example of either syntactic or semantic bootstrapping, since syntactic categories play no role in it. In fact, it might be possible to contend that their proposal fails to qualify as bootstrapping in an even wider sense, but we won't argue for that conclusion here.2 Our concern, instead, is to show that the Piantadosi et al.
model is not a type of Quinian bootstrapping—it does not satisfy both the Learning and Discontinuity criteria—and from here on, we will use "bootstrapping" to mean Quinian bootstrapping.

2.1. Bootstrapping's central features

To make the distinction between bootstrapping and recombination a little more precise, let c* represent a new concept that is created in learning, and let c1, c2, …, ck represent old concepts. Proponents of recombination believe that learning is a function taking the old concepts into the new one:

c* = f(c1, c2, …, ck).

It matters very much, however, what the function f is like. Advocates of bootstrapping agree that the input to learning is a set of old concepts and the output a new concept. As Carey (2011, p. 157) remarks, "Clearly, if we learn or construct new representational resources, we must draw on those we already have." But Carey would maintain that in examples of bootstrapping the function is not mere recombination, as the first quotation in Section 1.1 makes explicit. To distinguish between the positions, then, we need some restrictions on f or on its arguments in order to spell out the difference between recombination and bootstrapping (Rips & Hespos, 2011).

As a first possibility, proponents of bootstrapping could insist that bootstrapping algorithms are so complex that they go beyond what could reasonably be considered recombination. For example, Carey's (2009) theory of how children learn to enumerate includes an analogical inference that maps the first few numerals ("one," "two," "three") to corresponding representations of cardinalities (see the sketch in Section 1.1). If analogical inference is too complicated to be recombination, then learning to enumerate may be a form of bootstrapping.

A second potential way to distinguish bootstrapping and recombination is to hold that the input concepts (c1, c2, …, ck) in bootstrapping come from a broader domain of knowledge than is possible in recombination.2

2 As D. Barner has suggested (personal communication, June 14, 2012), all prior bootstrapping theories appear to require that earlier representational stages be psychologically necessary steps in the acquisition of later ones, whereas earlier representations of number in Piantadosi et al.'s theory (e.g., their Two-knower function) play no role in producing its later representations (e.g., the CP-knower function).

In the case of number learning, the input concepts to bootstrapping may belong to two or more distinct cognitive modules. For example, c1, c2, …, ci may come from a module devoted to natural language quantifiers (e.g., some or all), whereas ci+1, ci+2, …, ck may come from a module for representing small sets of physical objects. Or the input concepts may include some that don't appear in the child's earlier number representations. For example, the old number representations may include only concepts c1, c2, …, ci, whereas input to the new representations may also include concepts ci+1, ci+2, …, ck.

It is unclear to us whether either of these strategies suffices to show that bootstrapping and recombination differ in kind. Sheer complexity of a process doesn't seem inconsistent with recombination. The individual steps in learning may be lengthy or difficult without creating anything fundamentally new. If so, advocates of bootstrapping owe us an explanation of what aspects of learning cause it to go beyond recombination. Similarly, why must recombination respect limits on the domain of its input? Why shouldn't recombination be allowed to draw on all old concepts in the learner's repertoire? Fodor's (2010) response to Carey's theory is to deny any such limits. We are not claiming that proponents of bootstrapping have explicitly adopted either of these strategies. Nor do we claim that the strategies are exhaustive.
In looking at Piantadosi et al.'s proposal, however, let's keep these options temporarily open, since they may help us see why these authors believe their model performs a type of bootstrapping. The underlying issue with bootstrapping is that advocates have to reconcile the Learning and Discontinuity claims. But doing so is tricky because these claims seem to pull in opposite directions, with Learning suggesting continuity rather than discontinuity. If Learning and Discontinuity cannot be reconciled, bootstrapping is incorrect and concept recombination is correct as a theory of human concept acquisition. A main point of interest, then, in Piantadosi et al.'s model is that it purports to furnish a working example of bootstrapping and may therefore demonstrate bootstrapping's viability.

3 The two strategies just described are examples of what Beck (submitted for publication) calls deflationary theories for reconciling the bootstrapping claims about learning and discontinuity. Neither creates anything totally new to the child's conceptual system, but either could bring to light concepts that were only latent within this system. More radical strategies are also possible for making sense of the Learning and Discontinuity claims (Beck, submitted for publication, and Shea, 2011). But these are farther removed from Piantadosi et al.'s theory, and we therefore don't discuss them here.

2.2. The Piantadosi et al. model

The Piantadosi et al. model receives as input a series of sets of different sizes, ranging from one to ten elements, with the frequency of each set size determined by the corpus frequency of the associated number words ("one" to "ten").
It evaluates its current stock of hypotheses about the appropriate number word for a given set, increasing the probabilities of hypotheses that give the right answer (e.g., labeling a set of four items with "four") and decreasing the probabilities of hypotheses that give an incorrect or null answer. After sufficient training, the model converges on a hypothesis—the Cardinal Principle (CP-) knower function—that correctly labels sets of one to ten elements:

CP-knower function: λS. (if (singleton? S) "one" (next (L (set-difference S (select S)))))

This function tests whether the input set of objects S is a singleton (i.e., one-element set), and if it is, labels it "one." If not, it removes an element from S (i.e., set-difference S (select S)) and recursively applies the same CP-knower function to the reduced set (L accomplishes this recursion). If the reduced set is a singleton, it labels it next("one") or "two." And so on.

More interesting is the order in which hypotheses emerge as the most likely candidate. Early in training, the model labels one-element sets with "one" and all other set sizes as unknown. It then switches to labeling one- and two-element sets correctly (using its primitive singleton and doubleton predicates), then one-, two-, and three-element sets (using singleton, doubleton, and tripleton), and finally reaches a more complicated rule (the CP-knower function) that correctly labels one- to ten-element sets. This behavior is extensionally similar to the progress children make in acquiring words for set sizes (see Section 3 for qualifications).

The learning sequence is a result of several design choices: First, the model starts with primitive predicates that:

(a) directly recognize set sizes of one, two, and three elements (e.g., the singleton? predicate);
(b) carry out logic and set operations (e.g., set-difference);
(c) traverse the sequence of number words "one," "two," …, "ten" (the next predicate); and
(d) perform recursion (L).

(See Piantadosi et al.'s Table 1 for the full list of primitives.) Second, the model constructs hypotheses from these primitives in a way that gives lower prior probabilities to lengthier hypotheses and to hypotheses that include recursion (depending on a free parameter, γ). Thus, the model starts by considering simple and inaccurate non-recursive hypotheses (e.g., singleton sets are labeled "one" and all other sets are undefined) and ends with a more complex, but correct, recursive hypothesis as the result of feedback about the correct labeling.

2.3. Does the Piantadosi et al. model employ bootstrapping?

Piantadosi et al. try to make the case that the discovery of the correct number hypothesis is a form of bootstrapping, though not quite of the variety Carey described in introducing this term. But on the face of it, their model looks like a perfect example of recombination. It starts with a small stock of primitives, and it combines them into hypotheses according to syntactic rules (a probabilistic context-free grammar). The model learns which of these hypotheses is best by Bayesian adjustment through feedback. Thus, all the primitive concepts that the model uses to frame its final hypothesis are already present in its initial repertoire. The only missing element is the correct assembly of these primitives by the grammar. These restrictions would seem to leave the model with little room for innovation of the sort that bootstrapping requires. Why should we regard the process as bootstrapping rather than as translating one system into another?

In setting out the bootstrap idea, we mentioned two possible strategies to discriminate it from recombination (see Section 2.1). Of these possibilities, the first one—that bootstrapping involves a learning process more complex than standard recombination—is out for the Piantadosi et al. model.
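To see how little machinery the final hypothesis requires, here is a minimal sketch of the CP-knower function from Section 2.2 in Python. This is our own rendering, not Piantadosi et al.'s code: the names NUMERALS, next_word, and cp_knower are stand-ins for the memorized count list and the primitives next and L, and Python's own recursive call plays the role of L.

```python
# The memorized count list ("one" through "ten").
NUMERALS = ["one", "two", "three", "four", "five",
            "six", "seven", "eight", "nine", "ten"]

def next_word(word):
    # Stand-in for the `next` primitive: step forward one position
    # in the memorized numeral list.
    return NUMERALS[NUMERALS.index(word) + 1]

def cp_knower(s):
    # (if (singleton? S) "one" ...)
    if len(s) == 1:
        return "one"
    # (next (L (set-difference S (select S)))):
    # remove an arbitrary element and recurse on the smaller set.
    element = next(iter(s))
    return next_word(cp_knower(s - {element}))

print(cp_knower({"cup1", "cup2", "cup3"}))  # -> "three"
```

Each hypothesis the model entertains, including this final one, is just such a composition of primitives licensed by the grammar, which is why the complexity strategy gains no purchase here.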
The model's grammar composes its hypotheses by assembling them from a previously existing base. The model employs no inference more complex than the Bayesian conditioning that updates the hypotheses' probability.

Piantadosi et al. have a better chance, then, of defending their bootstrapping claim by adopting the second strategy. Perhaps the model's discovery of the pairing between numerals and cardinalities incorporates concepts that aren't available in its initial state. Here the obvious candidate is recursion. The model's initial hypotheses make no use of recursion, whereas the final CP-knower hypothesis does. The recursive predicate (L) confers greater computational power on this last hypothesis than is present in the earlier ones. So perhaps bootstrapping occurs when the model introduces recursion. (The model ensures that this introduction happens relatively late in learning by handicapping all hypotheses containing the recursive predicate, as we mentioned earlier.) This accords with Piantadosi et al.'s statement that "the model bootstraps in the sense that it recursively defines the meaning for each number word in terms of the previous number word. This is representational change much like Carey's theory since the CP-knower uses primitives not used by subset knowers, and in the CP-transition, the computations that support early number word meanings are fundamentally revised" (p. 212). The idea seems to be that the model's representations for cardinalities before bootstrapping aren't extendible to the representations it uses after. Something is missing from the early representations that's necessary for a more adult-like understanding.

But in thinking about whether the CP-transition is a form of bootstrapping, we should keep in mind that in many mundane instances of learning—in discrimination learning, for example—people add primitives that do not figure in earlier hypotheses.
In learning to distinguish poisonous from edible mushrooms, people may have to take into account new properties, like the color of the mushrooms' spores, that were not parts of their original mushroom representations. No one would claim, though, that including spore color in the new concept is a discontinuous conceptual change. Likewise, merely including a previously unused predicate in a new hypothesis about number meaning doesn't by itself imply that the hypothesis is discontinuous with old ones. Stretching the concept of Quinian bootstrapping to include such simple property additions would trivialize this concept. So if adding the recursive predicate L does produce a big conceptual change, that must be because L is special—perhaps because it significantly increases the hypothesis's computational power—not because it is new.

However, adding recursion still doesn't conform to bootstrapping as Carey describes it in the quotations of Section 1.1. The model's grammar prior to adopting the CP-knower hypothesis is identical to its grammar after adopting it, as is its primitive conceptual vocabulary. So a translation manual could easily express the new hypothesis in the old vocabulary, precisely as is done in Piantadosi et al.'s definition of the CP-knower function, which we displayed earlier. This undermines the idea that bootstrapping does not reduce to translation. For much the same reason, the CP-knower hypothesis in Piantadosi et al.'s version does not involve the creation of new primitives, and it certainly doesn't create them in a way that goes beyond "the machinery of compositional semantics" (see the first of the quotations from Carey in Section 1.1). Of course, Piantadosi et al. could position their model as a non-Quinian type of "bootstrapping" that allows translation and dispenses with the need to create new primitives. But this move would abandon the important and arresting ideas that Carey had in mind in introducing this concept.
Bootstrapping in this revised sense would not create a conceptual system that is discontinuous with the earlier one, and hence, it would not implement a method that bears the same intellectual interest. This revision would not simply raise the "semantic" issue of how we should use the term "bootstrapping," but it would discard one of bootstrapping's essential properties.

In short, the Piantadosi et al. model could have made bootstrapping plausible by showing how the Learning and Discontinuity claims can be joined. Instead, it either jettisons Discontinuity or trivializes it. The reasonable conclusion from the Piantadosi et al. model is not that it provides a rigorous form of bootstrapping, but that it shows bootstrapping to be unnecessary. Children have no need for bootstrapping, since they can learn a correct method of enumeration through ordinary recombination.

3. How realistic is the model?

The points raised in the preceding section do not show that the model is incorrect as a theory of how children learn to label cardinalities. Even if the model doesn't learn by bootstrapping, it could still be the right explanation of this learning process. However, three aspects of the model's behavior deserve comment and suggest that it does not learn enumeration in the way children do.

3.1. The model's choice of primitives

First, the model draws on a relatively small set of handpicked primitives—the predicates in Piantadosi et al.'s Table 1, which we described in Section 2.2. The model forms all its hypotheses as combinations of these predicates. Piantadosi et al. believe these predicates "may be the only ones which are most relevant" to number learning (p. 202). But this restriction raises the question of whether children also limit their hypotheses in the same convenient way.
We don't dispute the importance of these predicates to knowledge of number, but how do children know prior to learning that these predicates are the most relevant ones? Piantadosi et al. claim that their theory "can be viewed as a partial implementation of the core knowledge hypothesis (Spelke, 2003)," but also deny that the primitive predicates are part of an encapsulated core domain devoted to number: "These primitives—especially the set-based and logical operations—are likely useful much more broadly in cognition and indeed have been argued to be necessary in other domains" (p. 202). If this particular set of primitives does not come cognitively prepackaged, however, then children must search for them among a much larger group of mental predicates, and many candidates exist in this larger set that carry numerical information. Although the model includes a few primitives (e.g., set intersection) that are not necessary for its hypotheses, the model excludes, by fiat, analog magnitudes, mental models, and explicit quantifiers (e.g., some), which according to many theories are relevant parts of children's number knowledge prior to their mastery of enumeration.4

Consider analog magnitudes. According to this idea, people have access to a continuous mental measure that varies positively with the number of physical objects in the perceptual array. People can therefore use this measure as an approximate guide to cardinality.

4 For analog magnitudes, see, for example, Dehaene (1997), Gallistel and Gelman (1992), and Wynn (1992). For mental models, Mix, Huttenlocher, and Levine (2002). For quantifiers, Barner and Bachrach (2010), Carey (2009), and Sarnecka, Kamenskaya, Yamana, Ogura, and Yudovina (2007).
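To make this alternative concrete, here is a hypothetical analog-magnitude labeling hypothesis of the kind such a model could entertain, sketched in Python. Everything here is our own illustration, not part of Piantadosi et al.'s model: the Gaussian noise model of scalar variability, the particular Weber fraction value, and the function names are all assumptions made for the example.

```python
import random

NUMERALS = ["one", "two", "three", "four", "five",
            "six", "seven", "eight", "nine", "ten"]

# Illustrative Weber fraction; empirical estimates vary with age.
WEBER_FRACTION = 0.2

def analog_estimate(s):
    # A noisy continuous measure of set size: mean n, with a standard
    # deviation proportional to n (scalar variability).
    n = len(s)
    return random.gauss(n, WEBER_FRACTION * n)

def analog_label(s):
    # Label the set with the numeral nearest the noisy estimate.
    m = analog_estimate(s)
    nearest = min(range(1, 11), key=lambda k: abs(k - m))
    return NUMERALS[nearest - 1]

# Usually right for small sets, increasingly error-prone for larger ones:
print(analog_label({"a", "b"}))  # most often "two"; sometimes "one" or "three"
```

Because its labels are mostly correct for small sets, a hypothesis along these lines would earn substantial posterior probability and would compete with the hypotheses of the present version of the model.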
A model like Piantadosi et al.'s could easily build a hypothesis that makes use of analog magnitudes to label set sizes, and such a hypothesis would compete with those of the present version of the model, presumably gaining relatively high posterior probability. It could therefore slow or alter the course of number learning by delaying the success of the CP-knower hypothesis. On the one hand, if Piantadosi et al.'s predicates are truly the only ones children use in forming their hypotheses, then we need to know what enables the children to restrict their attention to these items and exclude information like analog magnitudes. On the other hand, if children consider a wider set of predicates, what's the evidence that the model can converge on the right CP-knower procedure and do so in a realistic amount of time? The quantitative results from Piantadosi et al.'s simulations become irrelevant under this second possibility.

3.2. The model's method of enumeration

A second question about the model's fidelity is whether the CP-knower procedure is similar enough to children's actual enumeration to back the claim that the model learns what children do. Children match numerals one-one to objects in an iterative way (Gelman & Gallistel, 1978). In counting a set of three cups {cup1, cup2, cup3}, they label a first object (e.g., cup1) "one" and remove it from further consideration. They then label the second object (e.g., cup2) "two," and so on. As Piantadosi et al. point out, however, "the model makes no reference to the act of counting (pointing to one object after another while producing successive number words)" (p. 213). As we mentioned earlier, what the model does instead is recurse through the set of items, taking set differences until it arrives at a singleton, and then it unwinds through the list of numerals to arrive at the total. For example, if the model is enumerating the set of three cups, it first tests to see if the set is a singleton.
Since it’s not, the model recursively applies the procedure to the result of removing one item from the set (e.g., {cup2, cup3}). Since this new set is still not a singleton, it again recursively applies the procedure to the set (e.g., {cup3}) formed by removing another item from the set. Having at last found a singleton, it labels the set “one,” then labels the next-larger set “two,” and finally labels the original set “three.” We can put this point about the difference between children’s behavior and the model’s in a second way: When older children count “one, two, three…three cups,” the first three number words do not label the size of sets of cups. Instead, these words mark positions in the count sequence. The children then rely on the principle—Gelman and Gallistel’s (1978) Cardinal Principle—that the final word in the enumeration sequence is the cardinality of the set, and they thus infer that there are “three cups.” Thus, only the second “three” in the earlier phrase denotes a number of cups. By contrast, the model always uses numerals as labels for set sizes. In their discussion (pp. 214-215), Piantadosi et al. claim that children’s actual counting behavior is a metacognitive effort to keep their place in the recursive routine. But what reason could there be for not taking the children’s simpler (and equally accurate) enumeration algorithm at face value? The model’s inability to arrive at the right procedure suggests that something is wrong with its architecture and calls into question Piantadosi et al.’s claims (p. 200) that “all assumptions made are computationally and developmentally plausible.” 3.3. The model’s knowledge of the sequence of cardinalities A third difference between children’s behavior and the model’s behavior is the extent of children’s knowledge at the time they become CP-knowers.
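The contrast drawn in Section 3.2 between children’s counting and the model’s routine can be made concrete. The sketch below is ours, not Piantadosi et al.’s language-of-thought code; the function names and data structures are illustrative.

```python
def count_iteratively(objects, count_list):
    """Children's procedure (Gelman & Gallistel, 1978): pair numerals with
    objects one-one, setting each labeled object aside; by the Cardinal
    Principle, the last numeral used gives the cardinality."""
    remaining = list(objects)
    last_word = None
    for word in count_list:
        if not remaining:
            break
        remaining.pop()      # label one object and remove it from play
        last_word = word
    return last_word

def count_recursively(objects, count_list):
    """The model's routine as we read it: recurse through set differences
    down to a singleton, then unwind, pairing successively larger sets
    with successive numerals. Assumes a non-empty set of objects."""
    if len(objects) == 1:
        return count_list[0]             # a singleton is labeled "one"
    smaller = set(objects)
    smaller.pop()                        # set difference: drop one element
    label = count_recursively(smaller, count_list)
    return count_list[count_list.index(label) + 1]   # unwind to next numeral
```

Both functions return “three” for the set of three cups, but only the first mirrors children’s observed pointing-and-reciting behavior; the second uses every numeral along the way as a label for a set size.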
The usual test of CP knowledge is that children can correctly produce sets of up to ten objects when asked to “Give me n.” For example, when asked to “Give me eight beads,” they can produce eight from a larger pile of beads. Recent evidence by Davidson, Eng, and Barner (2012), however, shows that children who are able to perform this task are often unable to say whether a single bead added to a box of five results in six beads rather than seven. This is the case even though children at the same stage can correctly say that the numeral that follows “five” is “six” rather than “seven.” Davidson et al. (2012, p. 166) note that their analysis “reveals that many CP-knowers do not have knowledge of the successor principle for even the smallest numbers… These data suggest that knowledge of the successor principle does not arise automatically from becoming a CP-knower, but that this semantic knowledge may be acquired later in development.” The Piantadosi et al. model is limited to determining the cardinality of a given set of objects. So no logical inconsistency arises between possessing this skill and not being able to tell that one object added to a set of five yields a set of six. Still, Piantadosi et al.’s CP-knower function, in the course of determining that “six” labels a six-item set, also determines that “five” labels a set with one fewer element. Given this procedure, children’s difficulty in figuring out that a six-item set is one greater than a five-item set is mysterious and again suggests that the model’s CP-knower function is more complex than the routine children actually use at this point in their number development. Piantadosi et al. (p. 206) describe their theory as a computational-level model, in the sense of Marr (1982). So perhaps we should discount these deviations between the model’s behavior and children’s, since they concern particular methods of pairing number words and cardinalities.
The crucial claims of the Piantadosi et al. paper, however, depend on more than computational description. For example, whether the model learns by bootstrapping depends on whether the procedures the model employs before becoming a CP-knower are qualitatively different from the procedure it employs later. This difference requires a comparison of the algorithms before and after learning, and it limits how abstractly we can view the model when we come to evaluate it (see Jones & Love, 2011, for general criticisms along these lines of Bayesian learning theories). 4. How much does the model know about the positive integers? An intriguing aspect of bootstrapping is that it is supposed to produce the child’s first true representation of the positive integers. According to Carey (2004, p. 65), “coming to understand how the count list represents numbers reflects a qualitative change in the child’s representational capacities; I would argue that it does nothing less than create a representation of the positive integers where none was available before.” Similarly, according to Piantadosi et al. (p. 201): Bootstrapping explains why children’s understanding of number seems to change so drastically in the CP-transition and what exactly children acquire that’s “new”: they discover the simple recursive relationship between their memorized list of words and the infinite system of numerical concepts. In Section 2, we examined the issue of whether Piantadosi et al.’s model effects a qualitative change in representations. But setting that issue aside here, how much does the model know about the positive integers—the “infinite system of numerical concepts”? 4.1. Does bootstrapping capture the meaning of the first few numerals? One thing seems clear. The model never learns the full set of positive integers.
It simply learns to associate set sizes with the correct number words “one” to “ten” (or to whatever word is the last term on the model’s count list). Presented with a set of ten items, the model correctly labels it “ten,” but presented with a set of eleven items, it cannot make a correct response if “ten” is its largest count term. In another sense, however, the model does possess a general rule for relating number words and cardinalities: the CP-knower function, shown in Section 2.2. Piantadosi et al. write, “bootstrapping has been criticized for being incoherent or logically circular, fundamentally unable to solve the critical problem of inferring a discrete infinity of novel numerical concepts (Rips, Asmuth, & Bloomfield, 2006, 2008; Rips, Bloomfield, & Asmuth, 2008). We show that this critique is unfounded…” (p. 200). But although we do believe that bootstrapping is unable to solve the “problem of inferring a discrete infinity of novel numerical concepts,” we did not criticize bootstrapping as inconsistent or circular.5 Moreover, the conclusion itself is a correct generalization about number word-cardinality pairs, as we noted in our earlier papers.

5 A threat of circularity looms, however, if you read too much into the bootstrapped conclusion. You may be tempted to think that the conclusion fixes the cardinal meaning of the numerals if you understand “next term on the count list” as involving the full, infinite list for the positive integers. The full list does, of course, fix the numeral’s meaning since it is isomorphic to the positive integers. But at the time children perform the bootstrap inference, they have no knowledge of the full list; so assuming this structure as part of the bootstrapping process does lead to circularity. We note, too, that logical difficulties with bootstrapping’s inference procedures are not the same as the difficulties we surveyed in Section 1. The former concern problems in getting from the meanings of “one,” “two,” and “three” to the meanings of the terms in the rest of the child’s count list. The latter difficulties concern the more abstract problem of combining the Learning and Discontinuity theses.

What is learned is a correlation between advancing one step in the number word sequence (e.g., from “four” to “five”) and increasing the cardinality of a set by one. (In the Piantadosi et al. model, this correlation is implicit in the CP-knower procedure rather than declaratively represented, but the effect is the same.) This is an important discovery for children, and any theory that explains how they do it is praiseworthy. The trouble with this principle, however, is that, at the time children learn it, it fails to specify the meaning of the terms for the positive integers (Rips et al., 2006; Rips, Asmuth, & Bloomfield, 2008; Rips, Bloomfield, & Asmuth, 2008). After adopting the CP-knower function, the Piantadosi et al. model has a way to connect the word “one” to cardinality one, “two” to cardinality two, …, and “ten” to cardinality ten. But the same function is equally extendible to either of the mappings in (1) and (2), as well as an infinite number of others:

(1) “one” denotes only cardinality one.
“two” denotes only cardinality two.
…
“ten” denotes only cardinality ten.

(2) “one” denotes cardinalities one, eleven, twenty-one, …
“two” denotes cardinalities two, twelve, twenty-two, …
…
“ten” denotes cardinalities ten, twenty, thirty, …

That is, the CP-knower function doesn’t constrain the cardinal meanings of the number words on the child’s list to their ordinary meanings. Proponents of bootstrapping now appear to agree with us that the CP-knower function and its equivalents don’t give children the meanings for numerals beyond those on their list of count terms.
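The indeterminacy between mappings (1) and (2) can be stated concretely. In the sketch below (ours, with illustrative names), two labeling functions embody the standard and the cyclic readings, and no set that a ten-term count list lets the learner observe can tell them apart.

```python
COUNT_LIST = ["one", "two", "three", "four", "five",
              "six", "seven", "eight", "nine", "ten"]

def standard_label(n, count_list=COUNT_LIST):
    """Mapping (1): each numeral denotes exactly one cardinality;
    undefined beyond the count list."""
    return count_list[n - 1] if 1 <= n <= len(count_list) else None

def cyclic_label(n, count_list=COUNT_LIST):
    """Mapping (2): numerals label cardinalities cyclically, so "one"
    names 1, 11, 21, ... and "ten" names 10, 20, 30, ..."""
    return count_list[(n - 1) % len(count_list)]

# The two hypotheses agree on every cardinality the learner ever sees:
assert all(standard_label(n) == cyclic_label(n) for n in range(1, 11))
```

The two functions diverge only at cardinality eleven and beyond, precisely the data the model never receives.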
But it doesn’t give them the correct meanings for numerals on their count lists either, as (1) and (2) reveal. Knowing that a correlation exists between the numerals and the cardinalities is of no help in picking out the positive integers from among their rivals unless the child knows either the structure of the numerals or the structure of the cardinalities. However, the numeral sequence, as given by the next predicate in Piantadosi et al.’s model, does not continue beyond “ten,” and as Piantadosi et al. emphasize (p. 212), their model does not build in a successor relation for cardinalities. Because the structure of the positive integers is well understood, we can be quite specific about what the CP-knower function fails to convey. It does not enforce the ideas that the correct structure is one that has: (a) a unique first element, (b) a unique immediate successor for each element, (c) a unique immediate predecessor for each element except the first, and (d) no element apart from those dictated by (a)-(c). 4.2. Can the model exclude rival meanings for the integers? Results from their simulations show that Piantadosi et al.’s model learns the standard pairing for the first ten integers rather than an alternative pairing in which “one” is mapped to sets with one or six elements, “two” to sets with two or seven elements, …, and “five” to sets with five or ten elements. This latter Mod-5 hypothesis (see their Figure 1) fails for two reasons: First, the model receives feedback that disconfirms the Mod-5 pairings, and second, the Mod-5 hypothesis is more complex than the correct alternative, given the choice of primitives. When feedback supports the Mod-5 hypothesis, however, the model eventually learns it. From these facts, Piantadosi et al. (p.
211) conclude: This work was motivated in part by an argument that Carey’s formulation of bootstrapping actually presupposes natural numbers, since children would have to know the structure of the natural numbers in order to avoid other logically plausible generalizations of the first few number word meanings. In particular, there are logically possible modular systems which cannot be ruled out given only a few number word meanings (Rips et al., 2006; Rips, Asmuth, & Bloomfield, 2008; Rips, Bloomfield, & Asmuth, 2008). Our model directly addresses one type of modular system along these lines: in our version of a Mod-N knower, sets of size k are mapped to the k mod Nth number word. We have shown that these circular systems of meaning are simply less likely hypotheses for learners. The model therefore demonstrates how learners might avoid some logically possible generalizations from data… The problem for theories of number learning, however, is not eliminating hypotheses that the data directly disconfirm, such as Piantadosi et al.’s Mod-5 hypothesis. Instead, the difficulty lies in selecting from the infinitely many hypotheses that have not been disconfirmed. For the simulations in Piantadosi et al., these would include Mod-11, Mod-12, Mod-13, …. The model can’t decide among these hypotheses because its list of numerals stops at “ten” and because it has no information about cardinalities greater than ten. Hypotheses like Mod-11 might seem syntactically complex relative to the CP-knower function. If so, the model would prefer CP-knower to Mod-11, even without training on sets of eleven, due to the model’s assignment of higher prior probabilities to simpler hypotheses. But this is not the case. How simple or complex a function must be to capture Mod-11 depends entirely on the structure of the numeral list beyond “ten” (which we are assuming is the child’s highest count term).
If the list continued, “one,” “two,”…, “ten,” “one,” “two,”…, “ten,” “one,” “two,” …, “ten,”…, then the CP-knower function would respond exactly in accord with Mod-11. Since neither children nor the Piantadosi et al. model knows how the count list continues, syntactic complexity can’t decide between Mod-11 and the standard meanings of the numerals; that is, it can’t discriminate between (1) and (2), above. (This is a variation of Goodman’s (1955) famous point about the role of syntactic complexity in induction.) The message in our earlier papers was that the bootstrap conclusion does nothing to settle the question of whether the cardinal meaning of the first few numerals is given by their usual (adult) meaning or by Mod-11, Mod-12, and so on. The same is true of Piantadosi et al.’s CP-knower function. As Rey (2011) has pointed out in connection with Carey’s proposal, this issue is closely related to classic poverty-of-the-stimulus arguments for learning natural language (e.g., Chomsky, 1965). Proponents of bootstrapping could contend that children’s knowledge of the meaning of the numerals suffers from the same problem that the bootstrap conclusion does. Adults clearly know that (1), and not (2), represents the correct meaning, but children may distinguish them only at a later point in their number development. However, this conclusion, if it is true, places a stark limit on how much children learn about the numerals from the bootstrap’s conclusion. Piantadosi et al. begin to acknowledge this difficulty in noting that “the present work does not directly address what may be an equally interesting inductive problem relevant to a full natural number concept: how children learn that next always yields a new number word” (pp. 211-212). They believe that “similar methods to those that we use to solve the inductive problem of mapping words to functions could also be applied to learn that next always maps to a new word.
It would be surprising if next mapped to a new word for 50 examples, but not for the 51st” (ibid.). But this conjecture is not obviously correct: Most lists that children learn—the alphabet, the months of the year, the notes of the musical scale—don’t have the structure of the natural numbers. Next for the English alphabet ends at the 26th item, and next for the sequence of U.S. Presidents currently ends at the 44th. The crucial difficulty, as we’ve emphasized, is that learning the mapping between the numerals and the cardinalities for one to ten can’t eliminate nonstandard sequences, such as Mod-11, unless children can somehow induce the correct structure. The structure could come from the cardinalities for the positive integers, or it could come from the structure of the numerals for these integers, since these structures are isomorphic. But it has to come from somewhere. Bootstrapping allows children to exploit the numeral sequence to determine the labels for cardinalities. But this strategy can’t pick out the right cardinal meanings—it merely passes the buck—unless the problem about how “next always yields a new number word” is resolved. 5. Conclusions On our view, the Piantadosi et al. model doesn’t bootstrap. It therefore doesn’t help vindicate bootstrapping as a cognitive process. What the model does is form hypotheses by recombining its primitives and confirming them statistically. So should we conclude that children can learn to enumerate through this (non-bootstrapping) sort of hypothesis formation and confirmation? Perhaps, although accepting this conclusion depends on ignoring the facts that the model (a) learns by combining a preselected set of primitives but gives no account of how they are singled out, (b) finishes with a procedure that differs in important ways from children’s, and (c) has a firmer grasp of the sequence of cardinalities than children have.
But even if the model is a correct description of how children learn to enumerate, the model still faces the problem that it leaves an unlimited set of possibilities for the meanings of the first few count terms.

Acknowledgements

We thank David Barner, Jacob Beck, Jacob Dink, Brian Edwards, Emily Morson, James Negen, Steven Piantadosi, and Barbara Sarnecka for comments on an earlier draft of this article. IES grant R305A080341 helped support work on this paper.

References

Barner, D., & Bachrach, A. (2010). Inference and exact numerical representation in early language development. Cognitive Psychology, 60, 40-62. doi: 10.1016/j.cogpsych.2009.06.002
Beck, J. (submitted for publication). Can bootstrapping explain concept learning?
Bloom, P., & Wynn, K. (1997). Linguistic cues in the acquisition of number words. Journal of Child Language, 24, 511-533. doi: 10.1017/s0305000997003188
Carey, S. (2004). Bootstrapping and the origin of concepts. Daedalus, 133, 59-68.
Carey, S. (2009). The origin of concepts. New York, NY: Oxford University Press.
Carey, S. (2011). Concept innateness, concept continuity, and bootstrapping. Behavioral and Brain Sciences, 34, 152-161. doi: 10.1017/s0140525x10003092
Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Davidson, K., Eng, K., & Barner, D. (2012). Does learning to count involve a semantic induction? Cognition, 123, 162-173. doi: 10.1016/j.cognition.2011.12.013
Dehaene, S. (1997). The number sense: How mathematical knowledge is embedded in our brains. New York, NY: Oxford University Press.
Fodor, J. A. (1975). The language of thought: A philosophical study of cognitive psychology. New York, NY: Crowell.
Fodor, J. A. (1981). The present status of the innateness controversy. In Representations: Philosophical essays on the foundations of cognitive science (pp. 257-316). Cambridge, MA: MIT Press.
Fodor, J. A. (2010, October 8). Woof, woof [Review of the book The origin of concepts, by S. Carey]. Times Literary Supplement, pp. 7-8.
Gallistel, C. R., & Gelman, R. (1992). Preverbal and verbal counting and computation. Cognition, 44, 43-74. doi: 10.1016/0010-0277(92)90050-r
Gelman, R., & Gallistel, C. R. (1978). The child's understanding of number. Cambridge, MA: Harvard University Press.
Goodman, N. (1955). Fact, fiction and forecast. Cambridge, MA: Harvard University Press.
Jones, M., & Love, B. C. (2011). Bayesian Fundamentalism or Enlightenment? On the explanatory status and theoretical contributions of Bayesian models of cognition. Behavioral and Brain Sciences, 34, 169-188. doi: 10.1017/s0140525x10003134
Leslie, A. M., Gelman, R., & Gallistel, C. R. (2008). The generative basis of natural number concepts. Trends in Cognitive Sciences, 12, 213-218. doi: 10.1016/j.tics.2008.03.004
Margolis, E., & Laurence, S. (2008). How to learn the natural numbers: Inductive inference and the acquisition of number concepts. Cognition, 106, 924-939. doi: 10.1016/j.cognition.2007.03.003
Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. San Francisco, CA: W. H. Freeman.
Mix, K. S., Huttenlocher, J., & Levine, S. C. (2002). Quantitative development in infancy and early childhood. New York, NY: Oxford University Press.
Piantadosi, S. T., Tenenbaum, J. B., & Goodman, N. D. (2012). Bootstrapping in a language of thought: A formal model of numerical concept learning. Cognition, 123, 199-217. doi: 10.1016/j.cognition.2011.11.005
Rey, G. (2011). Learning, expressive power, and mad dog nativism: The poverty of stimuli (and analogies), yet again. Paper presented at the Society for Philosophy and Psychology, Montreal.
Rips, L. J., Asmuth, J., & Bloomfield, A. (2006). Giving the boot to the bootstrap: How not to learn the natural numbers. Cognition, 101, B51-B60. doi: 10.1016/j.cognition.2005.12.001
Rips, L. J., Asmuth, J., & Bloomfield, A. (2008). Do children learn the integers by induction? Cognition, 106, 940-951. doi: 10.1016/j.cognition.2007.07.011
Rips, L. J., Bloomfield, A., & Asmuth, J. (2008). From numerical concepts to concepts of number. Behavioral and Brain Sciences, 31, 623-642. doi: 10.1017/s0140525x08005566
Rips, L. J., & Hespos, S. J. (2011). Rebooting the bootstrap argument: Two puzzles for bootstrap theories of concept development. Behavioral and Brain Sciences, 34, 145-146. doi: 10.1017/s0140525x10002190
Sarnecka, B. W., Kamenskaya, V. G., Yamana, Y., Ogura, T., & Yudovina, Y. B. (2007). From grammatical number to exact numbers: Early meanings of 'one', 'two', and 'three' in English, Russian, and Japanese. Cognitive Psychology, 55, 136-168. doi: 10.1016/j.cogpsych.2006.09.001
Shea, N. (2011). New concepts can be learned. Biology & Philosophy, 26, 129-139. doi: 10.1007/s10539-009-9187-5
Spelke, E. S. (2000). Core knowledge. American Psychologist, 55, 1233-1243. doi: 10.1037/0003-066x.55.11.1233
Spelke, E. S. (2011). Quinean bootstrapping or Fodorian combination? Core and constructed knowledge of number. Behavioral and Brain Sciences, 34, 149-150.
Wynn, K. (1992). Children's acquisition of the number words and the counting system. Cognitive Psychology, 24, 220-251. doi: 10.1016/0010-0285(92)90008-p