Shreesh P. Mysore 08/24/2003 1

Why Let Networks Grow?

Thomas R. Shultz
Department of Psychology and School of Computer Science, McGill University
1205 Penfield Ave., Montreal, Quebec, Canada H3A 1B1
E-mail: thomas.shultz@mcgill.ca

Shreesh P. Mysore
Control and Dynamical Systems Program, California Institute of Technology
1200 E. California Blvd., Pasadena, CA 91125, U.S.A.
E-mail: shreesh@cds.caltech.edu

Steven R. Quartz
Division of the Humanities and Social Sciences, and Computation and Neural Systems Program, California Institute of Technology
1200 E. California Blvd., Pasadena, CA 91125, U.S.A.
E-mail: steve@hss.caltech.edu

1. Introduction

Beyond the truism that to develop is to change, the process of developmental change itself has been relatively neglected in developmental science. Methodologically, behavioral research has traditionally relied on cross-sectional studies, which have revealed a great deal about how certain behavioral and cognitive abilities differ at various developmental times, but tend to reveal less about the developmental processes that operate to transform those cognitive and behavioral abilities. A variety of reasons account for this lack of explanations of developmental change, ranging from the methodological challenges their study entails to principled, learning-theoretic arguments against the very existence of such processes (Macnamara, 1982; Pinker, 1984).
Among the most influential arguments against developmental change was Chomsky's (1980) instantaneity argument, which holds that, from a learning standpoint, developmental change adds nothing to an account of development: if one starts with the null hypothesis that children are qualitatively similar learners to adults and supposes that development is instantaneous (in the sense that there are no time dependencies in development), then departing from this by supposing that children are initially more restricted learners only weakens their acquisition properties. The upshot is that less is less: adding developmental change to an account of development only reduces acquisition properties. Thus, as a sort of charity principle, one ought at least to start with the hypothesis that children do not differ substantially from adults in their acquisition capacity, since explaining development is hard enough without having to do so with greatly diminished learning. Such arguments led to the widespread assumption that there is little intrinsic theoretical insight to be gained from studying processes of developmental change. Indeed, even with the advent of connectionist models and their adoption in developmental science, most of the models used were qualitatively similar to models of adult learning. That is, most models used a fixed feedforward architecture in which the main free parameter was connection strength. Such models therefore implicitly adopted Chomsky's argument and began with the assumption that the immature and mature states are qualitatively identical. Yet how well founded is this assumption, and is it the case that the main free parameter of learning is akin to the connection strengths of a fixed architecture?
Viewed from the perspective of developmental cognitive neuroscience, it is now well established that the structural features of neural circuits undergo substantial alterations throughout development (e.g., Quartz and Sejnowski, 1997). Do such changes add anything of interest to the explanation of development and developmental change? If so, does a better understanding of the processes of developmental change undermine Chomsky's argument, and thereby demonstrate that an understanding of developmental change is crucial for understanding the nature of cognitive development? In this chapter, we ask: why let networks grow? We begin by reviewing the wealth of accumulating data from neuroscience suggesting that network growth is a much more central feature of learning than traditionally assumed. Indeed, the assumption that the main modifiable parameter underlying learning is change in connection strength within an otherwise fixed network appears to be in need of revision. In section 2, we revisit this question with an eye toward the evidence for neurogenesis, the postnatal birth and functional integration of new neurons, as an important component of learning throughout the lifespan. In section 3, we turn to the computational implications of learning-directed growth. There, we consider whether less really is less, or whether, by breaking down the traditional distinction between intrinsic maturation and cognitive processes of learning, the learning mechanism underlying cognitive development becomes substantially different from, and more powerful than, a fixed architecture in its learning properties. In section 4, we present a case in which activity-dependent network growth underlies important acquisition properties in a model system, auditory localization in the barn owl.
Our motivation for presenting this model system is that it is very well characterized in terms of its biological underpinnings, which provide important clues to the likely biological mechanisms underlying human cognitive development and give rise to general computational principles of activity-dependent network growth. In section 5, we present evidence from modeling cognitive development in children. There, we explore a number of computational simulations utilizing the Cascade Correlation (CC) algorithm. In contrast to the back-propagation algorithm, CC starts with a minimal network and adds new units as a function of learning. CC can thus be viewed, at a high level, as a rule for neurogenesis, and is therefore a means of exploring the computational and learning properties of such developmental processes. In section 6, we present the main conclusions from this work and point to areas of future research and open research questions.

2. Experience-dependent neurogenesis in adult mammalian brains

Two broad categories of neural plasticity exist, based on the nature of expression and encoding of change in the nervous system: "synaptic efficacy change" and "structural plasticity". Whereas synaptic efficacy change has been the predominantly studied form of plasticity (Martin and Morris, 2002), neuroscience research also contains abundant evidence of structural plasticity: dendritic spine morphology change in response to stimuli, synaptogenesis, and reorganization of neural circuits (see Lendvai et al., 2000, on experience-dependent spine plasticity, and Zito and Svoboda, 2002, for a review of synaptogenesis). The one form of adult structural plasticity that was not part of mainstream neuroscience until recently is neurogenesis, the birth or generation of new neurons; adult neurogenesis, as the name suggests, is the production of new neurons postnatally.
We briefly discuss the state of current research in this area.

2.1 Background and status of current research in neurogenesis

As early as the turn of the twentieth century, researchers such as Koelliker and His had described the development of the mammalian central nervous system in great detail and had found that the complex architecture of the brain appears to remain fixed from soon after birth (see Gross, 2000, for references). Hence, the idea that neurons may be added continually was not taken seriously. Cajal and others described the different phases of neuronal development, and as neither mitotic figures nor these phases of development were seen in adult brains, they suggested that neurogenesis stops soon after birth. Although there were sporadic reports raising the possibility of mammalian adult neurogenesis, the predominant view opposed this idea. In the 1950s, the 3H-dT autoradiography1 technique was introduced to neuroscience research, and in 1961 Smart (Smart, 1961) used it for the first time to study neurogenesis in the postnatal brain (3-day-old mice). Subsequently, Altman (Altman, 1962; Altman and Das, 1965) published a series of papers in the 1960s reporting autoradiographic evidence for new neurons in the neocortex, olfactory bulb (OB), and dentate gyrus (DG) of the young and adult rat. He also reported new neurons in the neocortex of the adult cat. Unambiguous evidence testifying to the neuronal (as opposed to glial) nature of these proliferating cells was not available. From 1977 through 1985, Kaplan (Kaplan, 1984; Kaplan, 1985) published a series of articles that used electron microscopy along with 3H-dT labeling to confirm the results of Altman and colleagues from 15 years earlier. He was able to verify the neuronal nature of the proliferating cells in the DG, OB, and cerebral cortex of adult rats. In parallel, from 1983 through 1985, Nottebohm et al. (Goldman and Nottebohm, 1983; Nottebohm, 1985; Paton and Nottebohm, 1984) published results showing (using EM, 3H-dT autoradiography, and electrophysiology) that neurogenesis occurs in adult songbirds in areas that correspond to the mammalian cerebral cortex and hippocampus. Meanwhile, in 1985, Rakic (Rakic, 1985) published an authoritative 3H-dT study on adult rhesus monkeys which concluded, to the contrary, that all neurons of the adult rhesus are generated during prenatal and early postnatal life. Further, based on subsequent studies with electron microscopy and an immunocytochemical marker for astroglia (and not neurons), Eckenhoff and Rakic (Eckenhoff and Rakic, 1988) suggested that a stable neuronal population in adult brains may be a biological necessity in an organism whose survival relies on learned behavior acquired over time, and the traditional view against adult neurogenesis continued to hold. Starting in 1988, it was finally established beyond doubt that neurogenesis occurs in the dentate gyrus of the adult rat. Stanfield and Trice (Stanfield and Trice, 1988) showed that new cells in the rat DG extend axons into the mossy fiber pathway (the DG to CA3 projection).

1 3H-thymidine (3H-dT) contains tritium, the radioactive isotope of hydrogen. Neurogenesis is, by definition, the creation of new cells, and this requires DNA synthesis. 3H-dT is readily incorporated into a cell during the S-phase (synthesis phase) of its cell cycle. In essence, 3H-dT serves as a marker of DNA synthesis, and its presence is visualized by capturing its radioactive emissions on a photographic emulsion (autoradiography), a process similar to taking a photograph.
More studies that followed the fate of a cell from cell division to differentiation in its final site of residence (Kornack and Rakic, 1999) confirmed adult neurogenesis in the DG and OB of the macaque monkey. While DG neurogenesis is now widely accepted in both rodents and primates, neocortical neurogenesis is still debated. In primates, cortical neurogenesis has stirred controversy in the past few years. Gould et al. (Gould et al., 1999) recently reported that the macaque monkey neocortex (prefrontal, parietal, and temporal lobes) acquires large numbers of new neurons throughout adulthood. Further, Shankle et al. (Shankle et al., 1998) independently reported that large additions of neurons, alternating with comparable losses, occur in the human neocortex in the first years of postnatal life. The potentially broad biomedical implications, including the impact on the development of replacement therapies for neurological disorders, have added to the attention these reports have received. In their study, Gould et al. (Gould et al., 2001) used the BrdU immunohistochemical labeling2 technique to detect cell proliferation. Other researchers, such as Rakic (Nowakowski and Hayes, 2000; Rakic, 2002), are skeptical about the results for several reasons: the high doses of BrdU used; the argued non-specificity of the cell-type markers; potential misinterpretation of endothelial cells as migrating neurons; ambiguity in the identity of migrating bipolar-shaped cells (neurons, immature oligodendrocytes, and immature astrocytes can all adopt bipolar shapes); the large numbers of neurons in the migratory stream (approximately 2500 new neurons per day; Gould et al., 2001); and false positives due to optical problems (potential superposition of small BrdU-labeled satellite glial cells on unlabeled neuronal somata, owing to the use of an insufficient number of optical planes in confocal microscopy).
2 BrdU (bromodeoxyuridine) is also incorporated into a cell during its S-phase and hence is also a marker of DNA synthesis. However, unlike 3H-dT, its incorporation is not stoichiometric, which makes quantification more difficult. Low doses of BrdU are advocated (Rakic, 2002) to prevent artifacts that may arise from the signal amplification methods used, along with the use of independent means to detect cell proliferation.

In summary, incontrovertible evidence now exists that adult neurogenesis occurs in the DG and OB of mammals, including primates (Gage, 2002). The research community is at present not unanimous in its view of neocortical neurogenesis in mammals, especially primates. We now turn to the effects of psychological and environmental factors on DG neurogenesis.

2.2 Modulation of hippocampal neurogenesis

Several studies (see Gross, 2000) show that the lifespan of a newly generated hippocampal neuron is positively affected by living in an enriched environment, both in the wild and under laboratory conditions: in birds (Barnea and Nottebohm, 1994; Nottebohm et al., 1994), in mice (Kempermann et al., 1997; Kempermann et al., 1998), and in rats (Nilsson et al., 1999). Oestrogen is known to stimulate the production of new immature neurons in the DG. Interestingly, simply running in a wheel enhances the number of BrdU-labeled cells in the DG, although this could be for several reasons (enhanced environmental stimulation, reduced stress, or exercise effects such as increased blood flow). Stressful experiences have been shown to decrease the number of new neurons in the DG by downregulating cell proliferation. Studies suggest that this effect is caused by alterations in the hypothalamic-pituitary-adrenal (HPA) axis.
The influence of stress has been induced through exposure to predator odor in adult rats and in 1-week-old rat pups, social stress in tree shrews and marmosets, and developmental stress in rats (with effects that persist into adulthood). Adrenal steroids probably underlie this effect, as stress increases adrenal steroid levels and glucocorticoids decrease the rate of neurogenesis. Further, it has been shown that the decreased rate of neurogenesis associated with ageing is due to an increase in glucocorticoid levels (see Gould and Gross, 2002, for a summary of these effects and for references). Neurogenesis has also been correlated with hippocampally dependent learning experiences. For instance, trace eye-blink conditioning and spatial learning (both hippocampally dependent tasks) lead to an increase in the number of new neurons; this happens through an extension of neuronal survival rather than an increase in production. There appears to be a critical period following cell production such that learning occurring in this period increases neuronal lifespan. All the above factors affecting adult neurogenesis in the dentate, together with the evidence for the continual addition of neurons in several areas of the vertebrate brain, are interpreted as indicating that adult neurogenesis is functionally significant and not a mere evolutionary relic.

2.3 Function of neurogenesis

An important question in the context of adult neurogenesis is whether it subserves a valid function or is simply a vestige of development (Gross, 2000). We will first look at some evidence suggesting that it has a useful role to play, and then at what exactly that role may be (Kempermann, 2002; Nottebohm, 2002) and how it is achieved. New hippocampal neurons are suspected of having a role in learning and memory. Several pieces of evidence have been suggested (Gross, 2000) in support of this argument. New neurons are
added to structures that are important for learning and memory (such as the hippocampus, lateral prefrontal cortex, inferior temporal cortex, and posterior parietal cortex). Several conditions that are detrimental to the proliferation of granule cells in the dentate (stress, increased glucocorticoid levels, etc.) are also implicated in lower performance on hippocampally dependent learning tasks, suggesting a causal link between the two. Several conditions that increase proliferation (enriched environments, increased oestrogen levels, wheel running, etc.) also enhance performance. Increases in social complexity have been found to enhance the survival of new neurons in birds (Lipkind et al., 2002). It should be noted, however, that for both positive and negative modulators of neurogenesis, other factors, such as changes in dendritic structure and synapses, can contribute to changes in learning performance. Finally, adult-generated neurons may share some properties with embryonic and early postnatal neurons in their ability to extend axons, to form new connections more readily, and to make more synapses. Granule cells in the dentate show LTP of greater duration in younger rats than in older ones, and adult-generated granule cells appear to have a lower threshold for LTP. More speculatively, Gross (Gross, 2000) suggests that the transient nature of many adult-generated neurons could mediate the conversion of short-term memory (encoded in their activity patterns) to long-term memory (involving a transfer of these activity patterns to older circuits, followed by the death of the new cells). A curious fact is that the lifetimes of the new adult-generated cells in the rat and macaque dentate are three and nine weeks respectively, which corresponds to the approximate duration of hippocampal storage in each species.
Thus, learning and memory may involve the construction of entirely new circuits with new or unused elements (structural plasticity through neurogenesis (discussed here), spine motility (Lendvai et al., 2000), and synaptogenesis (Zito and Svoboda, 2002)), as well as the modulation of existing circuits (synaptic efficacy change; Martin and Morris, 2002).

3. Computational importance of learning-directed growth

Formal learning theory (FLT) deals with the ability of an algorithm, or learner, to arrive at a target concept (i.e., to learn) on the basis of examples of the concept. Three key aspects are that the algorithm searches through a predefined space of candidate hypotheses (concepts), that it is expected to learn the concept exactly, and that no restrictions are placed on the time taken by the learner to arrive at the target concept. FLT is therefore interested in "exact" and "in-principle" learnability. This process is also referred to as inductive inference (see Shavlik, 1990), and the expectation is that generalization is achieved. Generalization is, roughly, a measure of the performance of the learned concept on novel examples (not seen during learning) generated by the target concept; if the target concept is indeed learned, then this performance should be perfect. The classical formulation of inductive inference is in language learning (Gold, 1967). A quantification of generalization, or equivalently of generalization error, is achieved statistically by defining it to be the sum of two components: the bias squared and the variance (Geman et al., 1992). The bias can be defined roughly as the distance of the best hypothesis in the space from the target concept, whereas the variance reflects how much the learned hypothesis varies with the particular training sample. The traditional bias-variance tradeoff is as follows.
Restricting the hypothesis space generally tends to reduce the variance while increasing the likelihood that the bias is very large (the target concept may be far from the few hypotheses in the space). On the other hand, expanding the hypothesis space increases the variance while increasing the likelihood that the target hypothesis is included in the space (low bias). Clearly, while the objective is to minimize generalization error, its two components compete: a highly biased learner is useful only if meticulously tailored to the problem at hand, while a more general learner requires larger resources in terms of training data. In a sense, this tradeoff is a consequence of the three key aspects of formal learning theory discussed above. PAC learning (probably-approximately-correct learning), proposed by Valiant (Valiant, 1984) and subsequently accepted as the standard model of inductive inference in machine learning (Dietterich, 1990), addressed two of the three key aspects of formal learning theory: the impractical assumption of infinite time horizons was remedied, and the stringent restriction of exact learning was relaxed. In short, PAC learning deals not with in-principle learnability but with learnability in polynomial time, and not with exact learning but with approximate learning. Formally, a concept is PAC-learnable if, with probability at least 1-δ, the learner arrives at a hypothesis that classifies new examples correctly with probability at least 1-ε. Nevertheless, the hypothesis space is still fixed a priori. A fixed hypothesis space yields such problematic theoretical issues as Fodor's paradox (Fodor, 1980), which states, in essence, that nothing can be learned that is not already known, and hence that nothing is really learned. The idea here is that no hypothesis more complex than the ones in the given hypothesis space can be evaluated, and hence none can be achieved.
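For reference, the bias-variance decomposition invoked above (Geman et al., 1992) can be written explicitly. For squared-error loss, with f(x; D) denoting the hypothesis learned from training set D and E[y | x] the target, the expected generalization error at an input x decomposes as

```latex
\mathrm{E}_{D}\!\left[\bigl(f(x;D)-\mathrm{E}[y\mid x]\bigr)^{2}\right]
  \;=\;
  \underbrace{\bigl(\mathrm{E}_{D}[f(x;D)]-\mathrm{E}[y\mid x]\bigr)^{2}}_{\text{bias}^{2}}
  \;+\;
  \underbrace{\mathrm{E}_{D}\!\left[\bigl(f(x;D)-\mathrm{E}_{D}[f(x;D)]\bigr)^{2}\right]}_{\text{variance}}
```

A small hypothesis space pins down f(x; D) almost regardless of D (low variance) but may place E_D[f(x; D)] far from the target (high bias); a rich space does the reverse.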
Therefore, all concepts need to be available in the hypothesis space before the search begins. Constructive learning (Piaget, 1970; Piaget, 1980; Quartz, 1993; Quartz and Sejnowski, 1997) addresses this issue of a fixed hypothesis space and hence deals satisfactorily with issues like Fodor's paradox. The central idea of Piagetian constructivism is the construction of more complex hypotheses from simpler ones, and a building up of representation. Unfortunately, Piaget did not specify the mechanisms by which such a process could be neurally implemented. This issue is dealt with more formally in Quartz (1993) and Quartz and Sejnowski (1997). Constructivist learning deals directly with the issue of increasing hypothesis complexity as learning progresses, and computational models of constructive learning that implement such upward changes in complexity exist (Shultz et al., 1995; Westermann, 2000). Activity-dependent structural plasticity is viewed as the mechanism that implements constructivist learning. Baum (Baum, 1988) showed that networks with the ability to add structure during learning can learn, in polynomial time, any learning problem that can be solved in polynomial time by any algorithm whatsoever (Quartz and Sejnowski, 1997), thus conferring a kind of computational universality on this paradigm. Interestingly, constructivist learning can break the bias-variance tradeoff by adding hypotheses to the space incrementally, in such a way as to keep the variance low while reducing the bias. Further, in the context of neurobiology, the burden of innate knowledge on the network is greatly relaxed: given a basic set of primitives (in the form of mechanisms and physical substrate), the construction of the network, and hence of representations, occurs under the guidance of activity and within genetic constraints.

4.
Structural plasticity and auditory localization in barn owls

An excellent example of experience-driven structural plasticity resulting in representational change is seen in the auditory localization system (ALS) of barn owls. The "task" in question here is the formation of associations between particular values of auditory cues, such as interaural time differences (ITD), interaural level differences (ILD), and amplitude spectra, and specific locations in space. Whereas many animals can localize sounds soon after birth (Field et al., 1980), indicating the hard-wiring of at least part of this system (Brainard and Knudsen, 1998; Knudsen et al., 1991), experience plays an important role in shaping and modifying the auditory localization pathway (ALP). The need for experience dependence can be deduced qualitatively for several reasons (Knudsen, 2002). For instance, there are individual differences in head and ear sizes and shapes, and in the course of its lifetime an animal faces changing conditions, both external and those imposed on it by aging. Hence, the ability of the ALS to adapt to experience could be very useful to the animal. Indeed, behavioral experiments readily attest to the existence of such adaptability. These studies (called visual displacement studies) involve disruption of the auditory-location association by displacing the visual input along the azimuth through the use of prismatic spectacles (Brainard and Knudsen, 1998). The experimental evidence from these studies is summarized below.

4.1 Experimental evidence

Young owls (<200 days old) that are fitted with prisms (23° shift) learn in about 5 weeks to modify their orienting behavior appropriately, so that they are able to localize sound sources. Removal of the prisms results in a gradual return to normal orienting behavior. On the other hand, adult owls (>200 days old) adapt poorly (Knudsen, 1998).
Interestingly, when shifted-back owls (juvenile owls that were initially fitted with prisms that were later removed) were refitted with prisms as adults, they displayed adaptability very close to that displayed originally as juveniles. Finally, recent experiments (Linkenhoker and Knudsen, 2002) show that the capacity for behavioral plasticity in adult owls can be significantly increased with incremental training, in which the adults experience prismatic shifts in small increments. Among the several interesting findings of these studies is that behavioral adaptation is mediated by structural plasticity. Figure 1 schematically shows the key brain areas involved in auditory localization plasticity in owls. Studies showed that when owls are reared with displacing prisms, the ITD tuning curves of neurons in the OT and the ICX, but not in the ICC, adapt appropriately (Brainard and Knudsen, 1993; Brainard and Knudsen, 1995).

----------------------Figure 1 goes here -----------------------

This suggested that the site of plasticity lies after the ICC in the midbrain pathway. Anterograde (DeBello et al., 2001) and retrograde (Feldman and Knudsen, 1997) tracer experiments that studied the topography of the axonal projections from the ICC to the ICX as a function of experience showed that prism experience expands axonal arborization and causes synaptogenesis, in a manner that results in adaptive reorganization of the anatomical circuit (see figure 2). Compared to those of normal juveniles, the axonal projection fields of normal adults are sparser and more restricted. Relative to the projection fields of normal adults, those of prism-reared owls are denser and broader, with a large net increase of axons in the adaptive portion of the field. The shifted ITD tuning of the ICX neurons matched the tuning of the ICC neurons from which the new projections were formed.
Axonal projections corresponding to normal behavior remained in the prism-reared owls, showing the co-existence of two different auditory space encoding circuits. Further, in owls (called shifted-back owls) that were re-exposed to normal visual fields after having been prism-reared for a while, the new axonal projections persist alongside the normal projections, although the behavioral response does not reveal their existence (Knudsen, 1998).

----------------------Figure 2 goes here ----------------------

These results all indicate that structural change between the ICC and ICX is largely responsible for the experience-driven adjustment of the auditory space map. The intriguing presence of two different encoding circuits in prism-reared owls, but only one overt behavior at any time, led to the hypothesis that anatomical projections were being appropriately functionally inhibited through GABAergic circuits. Indeed, this is the case (Zheng and Knudsen, 1999; Zheng and Knudsen, 2001). One major question that remains unanswered relates to the nature of the error signal (or instructive signal) that guides plasticity. Results from Hyde and Knudsen (2001) show that a visually based topographic template signal is primarily responsible for calibration. As the visual information arriving from the retina is topographically arranged in the OT, and as a point-to-point feedback projection was found from the OT to the ICX (Hyde and Knudsen, 2000), it was suggested that the OT provides a visually based calibration signal to the ICX (Gelfland and Pearson, 1988; Hyde and Knudsen, 2001). Experimental support for this argument was found when a restricted lesion in the optic tectum eliminated plasticity only in the corresponding portion of the auditory space map in the ICX. It was not known until very recently (Gutfreund et al., 2002) whether this topographic signal was a simple visual template or a processed version of it reflecting an alignment error.
In Gutfreund et al. (2002), the authors found that ICX neurons have well-defined visual receptive fields, that the projection of this information from the OT to the ICX is precise, and that the auditory receptive fields (locations corresponding to the ITD tuning) of ICX neurons overlap and covary with their vRFs; hence, the instructive signal is simply a visual topographic template. Further, presentation of only an auditory stimulus (one with the appropriate, best-tuned ITD) to an ICX neuron produces a maximal, 40-ms-long firing response with a latency of about 10 ms. When just a visual stimulus is presented at the center of the neuron's vRF, the neuron produces a maximal, 40-ms-long firing response, but this time with a latency of approximately 90 ms (see figure 3). When both auditory and visual stimuli are presented, with the ITD of the auditory stimulus corresponding to the vRF location, the ICX neuron shows the auditory response (firing with 10-ms latency) but not the visual response (firing with 90-ms latency). However, if a bimodal stimulus is presented with an ITD not equal to that predicted by the vRF location (i.e., if there is a mismatch), then only the visual response is observed (firing with 90-ms latency). Thus, the dynamics of the ICX response following a bimodal stimulus (in particular, the presence of a visual response) differ in the misaligned case from the normal case, and this serves as the error signal between the spatial representations of the auditory and visual stimuli. We now present a computational model of auditory localization plasticity.
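The latency-based gating just described can be sketched in a few lines of code. This is a deliberately minimal illustration, not the cited model: the stimulus encoding, the exact suppression mechanism, and all function and parameter names are our simplifying assumptions. The key idea is that a neuron that has just fired is assumed unable to fire again for 80 ms, so an aligned bimodal stimulus yields only the early auditory response, while a misaligned one yields only the late visual response.

```python
# Minimal sketch of the latency-based error signal (illustrative names/values).
AUD_LATENCY_MS = 10   # latency of the auditory response
VIS_LATENCY_MS = 90   # latency of the visual response
REFRACTORY_MS = 80    # firing at time t blocks firing until t + 80 ms

def icx_response(auditory_itd, visual_loc, best_itd, vrf_loc):
    """Return a list of (latency_ms, source) response events for one ICX neuron.

    auditory_itd / visual_loc may be None (stimulus absent); best_itd and
    vrf_loc are the neuron's preferred ITD and visual receptive field center.
    """
    events = []
    last_spike = None
    # Auditory stimulus at the best ITD drives an early response.
    if auditory_itd is not None and auditory_itd == best_itd:
        events.append((AUD_LATENCY_MS, "auditory"))
        last_spike = AUD_LATENCY_MS
    # Visual stimulus in the vRF drives a late response, unless suppressed
    # by the refractory period following the auditory response.
    if visual_loc is not None and visual_loc == vrf_loc:
        if last_spike is None or VIS_LATENCY_MS - last_spike > REFRACTORY_MS:
            events.append((VIS_LATENCY_MS, "visual"))
    return events

# Aligned bimodal stimulus: only the early auditory response survives.
print(icx_response(auditory_itd=0, visual_loc=5, best_itd=0, vrf_loc=5))
# Misaligned bimodal stimulus: only the delayed visual response, the error signal.
print(icx_response(auditory_itd=3, visual_loc=5, best_itd=0, vrf_loc=5))
```

Reading off the presence of the 90-ms event is then exactly the mismatch detection used in the model described below.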
----------------------Figure 3 goes here ----------------------

4.2 Modeling plasticity in auditory localization behavior

We have implemented a simple connectionist model of the auditory localization network in barn owls using firing-rate coding neurons (the output of each neuron is a nonlinear function of the weighted sum of its inputs). Past attempts (Pouget et al., 1995; Rosen et al., 1994; Rucci et al., 1997) have primarily used reward-based learning schemes, about which little is known in the owl's auditory localization system. We move away from reward-based methods and, further, implement a template-based teaching signal in our scheme (while Gelfland and Pearson (1988) did not use a reward-based scheme, they implemented learning via the back-propagation algorithm, which is acknowledged as not being biophysically plausible and hence as needing modification). The driving assumption in our modeling effort is that the default structures and pathways already exist, and that the behavior of interest is the plasticity between the ICC and ICX in response to prismatic shifts in the visual field (see Kempter et al., 2001, for a plausible spike-time-dependent Hebbian scheme for the ab initio development of an ITD map in the laminar nucleus of barn owls). The default pathways are modeled by projections from one layer to the next, where the layers are the ICC, ICXI, ICXE, OTD, OTM, and OTS. This scheme is summarized in figure 4. We further make the simplifying assumption that the ITD tuning curves of the neurons are sharp (delta distributions). As mentioned before, the plasticity observed occurs between the neurons in the ICC and the ICX layers, comprising both structural plasticity (axo- and synaptogenesis) and synaptic efficacy change (ICC to ICXI and ICC to ICXE). In order to implement the activity-dependent plasticity, we have used a learning rule that is inspired by the classical Hebbian rule.

----------------------Figure 4 goes here ----------------------
The error signal is encoded in the selective, delayed firing of an ICXE neuron in response to a visual stimulus presented at its visual receptive field when the simultaneously presented auditory input is mismatched. Under normal conditions (figure 4), when an audio-visual (AV) stimulus is presented, the green spikes representing the auditory response appear, but not the faded-red spikes representing the visual response. This is implemented through adaptation in the ICXE neurons, such that if an ICXE neuron fires at instant t, it is unable to fire for another 80ms. Under mismatched conditions (with prisms), an AV stimulus now elicits only the (delayed) visual response (red spike). This is seen in figure 5 in the activity plot of the ICXE neuron.

----------------------Figure 5 goes here----------------------

In our model, we detect the error signal by checking for ICX activity 90ms after the presentation of an AV stimulus. Weight change is then implemented such that weights are reduced between ICC neurons that do not fire and ICXE (and ICXI) neurons that fire soon after. If the error occurs repeatedly for a sufficiently long period of time (which is taken to imply a systematic error in encoding information), then new "synapses" are formed; that is, the ICC-ICXI and ICC-ICXE weight matrices are enlarged. Once this happens, the learning rule above increases the weights between ICC neurons that fire and ICXE (and ICXI) neurons that also fire soon after. The time course of weight evolution is implemented to be sigmoidal. The results of this scheme match the experiments, in that the ITD tuning curves of the ICX neurons shift in the adaptive direction after training. Figures 6 (a) and (b) show the tuning curve of ICXE neuron #1 before and after training, with a shift corresponding to 20 μs (a shift by one unit, i.e., by 1°).
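The error-driven weight changes described above might be sketched as follows (a simplified, rate-based reading of our rule; the function name, learning rate, and binary firing vectors are illustrative assumptions, and the sigmoidal time course is omitted):

```python
import numpy as np

def update_weights(w, icc, icxe, mismatch, new_synapses, lr=0.05):
    """One step of the Hebbian-inspired rule. w[j, i] connects ICC
    neuron i to ICXE neuron j; icc and icxe are 0/1 firing vectors.
    On a mismatch (delayed visual response 90ms after the AV stimulus),
    weights from silent ICC neurons to ICXE neurons that fired soon
    after are weakened; once repeated errors have triggered new
    synapses, co-active ICC-ICXE pairs are strengthened."""
    w = w.copy()
    if mismatch:
        w -= lr * np.outer(icxe, 1.0 - icc)   # depress silent-pre, active-post
        if new_synapses:
            w += lr * np.outer(icxe, icc)     # potentiate co-active pairs
    return np.clip(w, 0.0, 1.0)

w0 = np.full((2, 2), 0.5)
w1 = update_weights(w0, icc=np.array([1.0, 0.0]),
                    icxe=np.array([0.0, 1.0]),
                    mismatch=True, new_synapses=True)
```

Repeated over trials, this shifts drive onto the ICXE units whose firing follows the visually instructed location, which is the adaptive direction.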
With this model, we are thus able to capture the essence of the learning exhibited by barn owls. Future work will look into adding further details.

4.3 Structural plasticity and representation construction

While the "how" of plasticity is answered through mechanisms of structural change, in order to get at principles of brain design, the more intriguing and inherently nebulous question – "why structural plasticity?" – is interesting. Is it true that, given the constraints of the biology in question, structural plasticity is the most plausible, if not the only, way? Interestingly, one can argue that given a topographic arrangement of information, with this information being propagated in a point-to-point manner, drastic shifts in the map of magnitude greater than can be supported by the existing projection spread can only be incorporated through the formation of new projection pathways (and concomitant new synapses). In contrast, a diffuse network architecture with an all-to-all projection could implement plasticity simply by changing synaptic efficacies. One quantification of the "ability" or "complexity" of a network is in terms of its memory capacity3 (Poirazi and Mel, 2001). Applying a similar function-counting approach to our problem (by labeling the inputs to the ICC and OTS as xi, and treating the output of the network as the vector of ICXE activities), we see that the existing network architecture implements one possible function, whereas the architecture post-learning implements a qualitatively different function. Post-natal structural plasticity has, in effect, permitted an increase in the repertoire of functions that the network can compute (memory capacity), subject to the topographic projection constraints, from 1 to more than 2 (the actual number depends on the extent of the new axonal growth). Here, we are assuming a simple one-neuron-per-concept representation, as shown in figures 4 and 5. For instance, if the shift is by 1 unit (20 μs), then the capacity is 2; if the shift is by 2 units, the capacity is 3; and so on. In this sense, a maximum capacity increase from 1 to 41 is possible with a 40° prism shift. Note that with age the ability for structural plasticity diminishes, and beyond 200 days it is almost nonexistent. In essence, this means that if the capacity of the network was N before 200 days, then it remains N (or very close to it) even when subject to a maximal prism shift, thus precluding further increase in complexity. If the memory capacity measure is applied to the Cascade Correlation scheme from the earlier section, one sees immediately that at the end of every input phase the complexity of the network, in terms of the functions it can compute (memory capacity), is augmented. In general, one expects structural plasticity to achieve an abrupt and qualitative change in the representational ability of a network, and typically to increase its functional complexity. Another view of this is that if the network is viewed as a representation of the hypothesis space that is explored while learning from examples, then structural plasticity enlarges the hypothesis space. This is fundamentally different from synaptic weight change, which only implements a search within a given hypothesis space; structural plasticity is hence a powerful and indispensable ally in representation construction.

----------------------------------Figures 6 (a) and (b) go here----------------------------------

3 An alternate quantifier of complexity is the Vapnik-Chervonenkis dimension, or VC dimension (see Bishop, 1995), a measure of the complexity of a concept class; in our case the network can be treated as a generator of various concepts, i.e., a concept class.

5.
Evidence from modeling cognitive development in children

One way to assess the importance of constructive growth in cognitive development is to compare models whose networks are allowed to grow with models containing static networks. This comparison is facilitated by the fact that the principal difference between two of the most popular neural algorithms for modeling cognitive development is that one of them incorporates growth and the other does not (Shultz, 2003). Both back-propagation (BP) and cascade-correlation (CC) involve multi-layered feed-forward networks that learn by adjusting their connection weights to reduce error, defined as the discrepancy between actual and target outputs. BP networks are typically designed by a programmer and retain their initial structure throughout learning. In contrast, CC networks begin with a minimal topology containing only the inputs and outputs defined by a programmer, and then proceed to add hidden units as needed to master the training problem. There are two phases in CC learning – the input phase and the output phase. Training typically begins in output phase by using the so-called delta rule to adjust the weights entering output units in order to reduce network error. When error reduction stagnates for a certain number of epochs,4 or when the problem has not been mastered within another, larger number of epochs, CC switches to input phase. In input phase, the task is no longer to reduce error, but rather to reconceptualize the problem by recruiting a useful hidden unit downstream of the existing units but upstream of the output units. A pool of typically eight candidate hidden units, usually equipped with sigmoid activation functions, is evaluated for recruitment. This evaluation consists of training randomly initialized weights from the inputs and any existing hidden units to these candidates, with the goal of maximizing a modified correlation between candidate activation and network error. In other words, the CC algorithm in input phase is looking for a new hidden unit whose activation patterns track the network's current error, becoming either very active or very inactive with error fluctuations – in short, a candidate unit that is sensitive to the major difficulties the network is currently having. The same delta rule is used here in input phase, but instead of minimizing error as in output phase, it is used to maximize correlations. When these correlations stagnate over a certain number of input-phase epochs, the candidate with the highest absolute correlation with network error is selected and installed into the network, and the less successful candidates are discarded. This implements a kind of proliferation and pruning of units and connections that is characteristic of selection models of brain development (e.g., Changeux & Danchin, 1976). The new recruit is initialized with randomized output weights of positive values if the correlation with error was negative, or with negative values if the correlation with error was positive. At this point, CC returns to output phase to determine how best to utilize the new conceptualization of the problem afforded by its latest recruit. In this way, new and more complex interpretations build on top of older, simpler interpretations. Cycling between input and output phases continues until the training problem is mastered. CC can be interpreted as implementing the processes of synaptogenesis and neurogenesis that have recently been documented to be at least partly under the control of learning. As well, CC implements the hierarchical cascaded progression now thought to be characteristic of cortical development. There are presently three domains of cognitive development to which both BP and CC networks have been applied.

4 An epoch is a pass through the training patterns.
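The input-phase objective can be sketched as a covariance-magnitude score of the kind CC candidates are trained to maximize (a simplified, unnormalized form; the variable names and example data are illustrative):

```python
import numpy as np

def candidate_score(cand_act, net_err):
    """Input-phase score for one candidate hidden unit: the summed
    magnitude, across output units, of the covariance over training
    patterns between the candidate's activation and the residual
    network error. Candidates are trained to maximize this."""
    v = cand_act - cand_act.mean()        # centered activations, shape (patterns,)
    e = net_err - net_err.mean(axis=0)    # centered errors, shape (patterns, outputs)
    return float(np.abs(v @ e).sum())

# Illustrative data: 4 patterns, 2 output units. A candidate whose
# activity rises and falls with the error on output 0 scores highly.
act = np.array([0.9, 0.1, 0.8, 0.2])
err = np.array([[0.5, 0.0], [-0.4, 0.1], [0.6, 0.0], [-0.5, 0.1]])
print(round(candidate_score(act, err), 2))   # 0.76
```

Note the absolute value: a candidate that is very inactive when error is high is just as useful as one that is very active, since the sign is absorbed into its output weights at installation.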
Here we briefly review these “bakeoff” competitions in order to examine the effects of growth on simulation success. The three relevant domains are the balance scale; the integration of velocity, time, and distance cues for moving objects; and age changes in sensitivity to correlations vs. features in infant category learning. We consider each of these domains in turn. Balance-scale stages The balance-scale problem involves presenting a child with a rigid beam balanced on a fulcrum (Siegler, 1976, 1981). The beam is equipped with several pegs spaced at regular intervals to the left and right of the fulcrum. The experimenter typically places some number of identical weights on one peg on the left side and some number of weights on one peg on the right side. While supporting blocks prevent the beam from moving, the child is asked to predict which side of the beam will descend, or whether the scale will remain balanced, once the supporting blocks are removed. The “rules” that children use to make balance-scale predictions can be diagnosed by presenting six different types of problems. Three problem types are relatively simple in the sense that one cue (either weight or distance from the fulcrum) predicts the outcome because the other cue does not differ from one side of the scale to the other. Three other problem types are relatively complex in that the two relevant cues conflict with each other, with weight predicting one outcome and distance predicting the opposite outcome. For simple or complex types, there are three possible outcomes: the direction of the tip is predicted by weight information, or by distance information, or the scale remains balanced. The pattern of a child’s predictions across these six problem types can be used to diagnose the “rule” the child is apparently using to make predictions.
Of course, in a connectionist network a rule is not implemented as a symbolic if-then structure, but is rather an emergent epiphenomenon of the network’s connection-weight values. There is a consensus in the psychological literature that children progress through four different stages of balance-scale performance (Siegler, 1976, 1981). In stage 1, children use weight information only, predicting that the side with greater weight will descend or that the scale will balance when the two sides have equal weights. In stage 2, children continue to use weight information but begin to use distance information when the weights are equal on each side. In stage 3, weight and distance information are used about equally, but the child guesses when weight and distance information conflict on the more complex problems. Finally, in stage 4, children respond correctly on all types of problems, whether simple or complex.5 In one of the first connectionist simulations of cognitive development, McClelland and Jenkins (1991) found that a static BP network with two groups of hidden units segregated for either weight or distance information progressed through the first three stages of the balance scale and even into the fourth stage. However, the network never settled into stage 4 performance, instead cycling between stage 3 and stage 4 functioning for as long as the programmer’s patience lasted. Further simulations indicated that BP networks could settle into stage 4, but only at the cost of missing stages 1 and 2 (Schmidt & Shultz, 1991). There seems to be no way for BP networks to capture both consistently mature stage 4 performance and the earlier progression through stages 1 and 2. In contrast, the first CC model of cognitive development naturally captured all four stages of balance-scale performance, and did so without requiring any hand-designed segregation of hidden units (Shultz, Mareschal, & Schmidt, 1994).
Capturing balance-scale stages is not a trivial matter, as witnessed by the shortcomings of earlier symbolic rule-learning algorithms (Langley, 1987; Newell, 1990). Both BP and CC models are able to naturally capture the other psychological regularity of balance-scale development, the torque-difference effect. This is the tendency for problems with large absolute torque differences (from one side of the scale to the other) to be easier for children to solve, regardless of their current stage (Ferretti & Butterfield, 1986).6 Connectionist feedforward networks produce perceptual effects like the torque-difference effect whenever they learn to convert quantitative input differences into a binary output judgment. The bigger the input differences (in this case, from one side of the fulcrum to the other), the clearer the hidden-unit activations, and the more decisive the output decisions. Symbolic rule-based models are unable to naturally capture the torque-difference effect because symbolic rules typically care only about the direction of differences, not their size. The ability of neural-network models to capture stages 1 and 2 on the balance scale is due to a bias towards equal-distance problems in the training patterns. These are problems in which the weights are placed equally distant from the fulcrum on each side. This bias forces the network to first emphasize weight information at the expense of distance information, because weight information is much more relevant to reducing prediction errors. Once weight information is successfully dealt with, the network can turn its attention to distance information, particularly on those problems where distance varies from one side of the fulcrum to the other.
In effect, the network must find the particular region of connection-weight space that allows it to emphasize the numbers of weights on the scale, and then move to another region of weight space that focuses on the multiplication of equally important weight and distance information. It requires a powerful learning algorithm to make this move in connection-weight space. Apparently, a static BP network, once committed to using one source of information in stage 1, cannot easily find its way downhill to the stage-4 region merely by continuing to reduce error. A constructive algorithm, such as CC, has an easier time with this move because each newly recruited hidden unit effectively changes the shape of connection-weight space by adding a new dimension. This new dimension affords a path on which to move downhill in error reduction towards the stage-4 region. More colloquially, BP cannot have its cake and eat it too, but CC, due to its dynamic increases in computational power, can both have and eat its cake.

5 Some degree of latitude is typically incorporated into stage diagnosis. For example, with four problems of each of the six types, a child could make up to four deviant responses and still be diagnosed into one of these four stages (Siegler, 1976, 1981).

6 Torque is the product of weight and distance on one side of the scale.

Integration of velocity, time, and distance cues In Newtonian physics, the velocity of a moving object is the ratio of distance traveled to the time of the journey: velocity = distance / time. Algebraically then, distance = velocity x time, and time = distance / velocity. Some of the cleanest evidence on children's acquisition of these relations was collected by Wilkening (1981), who asked children to predict one dimension (e.g., time) from knowledge of the other two (e.g., velocity and distance).
For example, three levels of velocity information were represented by the locomotion of a turtle, a guinea pig, and a cat. These three animals were described as fleeing from a barking dog, and a child participant was asked to imagine these animals running while the dog barked. The child's task was to infer how far an animal would run given the length of time the dog barked, an example of inferring distance from velocity and time cues. CC networks learning similar tasks typically progress through an equivalence stage (e.g., velocity = distance), followed by an additive stage (e.g., velocity = distance - time), and finally the correct multiplicative stage (e.g., velocity = distance / time) (Buckingham & Shultz, 2000). Some of these stages had been previously found with children, and others were subsequently confirmed as predictions of our CC simulations. As with children in psychological experiments, our CC networks learned to predict the value of one dimension from knowledge of values on the other two dimensions. We diagnosed rules based on correlations between network outputs and the various algebraic rules observed in children. To be diagnosed, an algebraic rule had to correlate positively with network responses, account for most of the variance in network responses, and account for more variance than any other rules did. As with rules for solving balance-scale problems, these rules are epiphenomena emerging from patterns of network connection weights. For velocity and time inferences, CC networks first acquired an equivalence rule, followed by a difference rule, followed in turn by the correct ratio rule. Results were similar for distance inferences, except that there was no equivalence rule for distance inferences. In making distance inferences, there is no reason a network should favor either velocity or time information because both velocity and time vary proportionally with distance. 
A shift from linear to nonlinear performance occurred with continued recruitment of hidden units. Linear rules include the equivalence (e.g., time = distance), sum (e.g., distance = velocity + time), and difference (e.g., velocity = distance - time) rules, whereas nonlinear rules include the product (e.g., distance = velocity x time) and ratio (e.g., time = distance / velocity) rules. Because the sum and difference rules in the second stage are linear, one might wonder why they require a hidden unit. The reason is that networks without a hidden unit are unable to simultaneously encode the relations among the three dimensions for all three inference types. In distance inferences, distance varies directly with both velocity and time. But in velocity inferences, distance and time vary inversely, and in time inferences, distance and velocity vary inversely. The first-recruited hidden unit differentiates distance information from velocity and time information, essentially by learning connection weights of one sign (positive or negative) from the former input and of the opposite sign from the latter inputs. This weight pattern enables the network to consolidate the different directions of relations across the different inference types. In contrast to constructive CC networks, static BP networks seem unable to capture these stage sequences (Buckingham & Shultz, 1996). If a static BP network has too few hidden units, it fails to reach the correct multiplicative rules. If it has too many hidden units, it fails to capture the intermediate additive stages on velocity and time inferences. Our extensive exploration of a variety of network topologies and variation in critical learning parameters suggests that there is no static BP network topology that can capture all three types of stages in this domain.
Even the use of cross-connections that bypass hidden layers, a standard feature of CC, failed to improve the stage performance of BP networks. There is no apparent way to get BP to cover all three stages, because the difference between an underpowered and an overpowered BP network is a single hidden unit. Such a situation would surely frustrate Goldilocks if she were a BP advocate – there is either too little or too much computational power, and no apparent way to have just the right amount of static network power. Thus, we conclude that the ability to grow in computational power is essential in simulating stages in the integration of velocity, time, and distance cues. Age changes in infant category learning Our final simulation bakeoff concerns age changes in sensitivity to correlations vs. features in category learning by infants. Using a stimulus-familiarization-and-recovery procedure to study categorization, Younger and Cohen (1983, 1986) found that 4-month-olds process information about independent features of visual stimuli, whereas 10-month-olds are able to abstract relations among those features. Such findings relate to a longstanding controversy about the extent to which perceptual development involves integration (Hebb, 1949) or differentiation (Gibson, 1969) of stimulus information. A developing ability to understand relations among features suggests that perceptual development involves information integration, a view compatible with constructivism. Infants are assumed to construct representational categories for repeated stimuli, ignoring novel stimuli that are consistent with a category while concentrating on novel stimuli that are not members of a category (Cohen & Younger, 1983; Younger & Cohen, 1985). After repeated presentation of visual stimuli with correlated features, 4-month-olds recovered attention to stimuli with novel features more than to stimuli with either correlated or uncorrelated familiar features (Younger & Cohen, 1983, 1986).
In contrast, 10-month-olds recovered attention to both stimuli with novel features and stimuli with familiar uncorrelated features more than to stimuli with familiar correlated features. Mean fixation time in seconds is shown for the key interaction between age and correlated vs. uncorrelated test stimuli, plotted in solid lines in Figure x against the left-hand y-axis. This pattern of recovery of attention indicates that young infants learned about the individual stimulus features, but not about the relationships among features, whereas older infants in addition learned about how these features correlate with one another.

----------------------Figure 7 goes here----------------------

These infant experiments were recently simulated with CC encoder networks (Shultz & Cohen, 2003). Encoder networks have the same number of input units as output units, and their job is to reproduce their input values on their output units. They learn to do this reproduction by encoding an input representation onto their hidden units and then decoding that representation onto the output units. In this fashion, encoder networks develop a recognition memory for the stimuli they are exposed to, and network error can be used as an index of stimulus novelty. Both BP and CC encoders were applied to the infant data (Shultz & Cohen, 2003). In CC networks, age was implemented by varying the score-threshold parameter: 0.25 for 4-month-olds and 0.15 for 10-month-olds. This parameter governs how much learning occurs, because learning stops only when all network outputs are within score-threshold of their target values for all training patterns. It was assumed that older infants would learn more from the same exposure time than would younger infants. Mean network error for the key interaction between age and correlated vs. uncorrelated test stimuli is plotted with dashed lines in Figure x against the right-hand y-axis.
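The encoder error and the score-threshold stopping rule just described can be sketched as follows (the weights, sizes, and error values here are illustrative assumptions, not the trained model):

```python
import numpy as np

def encoder_error(x, w_enc, w_dec):
    """Encoder network: reproduce the input on the output by encoding
    it onto hidden units and decoding back. The summed squared
    discrepancy between input and output serves as a novelty index."""
    h = 1.0 / (1.0 + np.exp(-w_enc @ x))   # encode input onto hidden units
    y = 1.0 / (1.0 + np.exp(-w_dec @ h))   # decode back onto outputs
    return float(np.sum((x - y) ** 2))

def learning_done(output_errors, score_threshold):
    """CC-style stopping rule: training ends only when every output
    is within score-threshold of its target on every pattern."""
    return bool(np.all(np.abs(output_errors) < score_threshold))

errs = np.array([0.10, 0.20])
print(learning_done(errs, 0.25))   # True: the 4-month-old setting stops here
print(learning_done(errs, 0.15))   # False: the 10-month-old setting keeps learning
```

With the same residual errors, the larger threshold (modeling younger infants) halts training earlier, yielding shallower learning from the same exposure.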
On the assumption that network error reflects stimulus novelty (and thus infant interest), the CC networks closely matched the infant data – more error to the uncorrelated test stimulus than to the correlated test stimulus only at the smaller score-threshold of 0.15. In sharp contrast, a wide range of static BP networks could not capture this key interaction. Our BP simulator was modified to use a score-threshold parameter to decide on learning success, just as in CC. The same coding scheme used in the CC simulations was also used here. A wide variety of score-threshold values were explored in a systematic attempt to cover the infant data. BP networks were equipped with three hidden units, the typical number recruited by CC networks. BP network topology was varied to explore the possible roles of network depth and the presence of cross connections in simulation success. Both flat (6-3-6) and deep (6-1-1-1-6) BP networks were tried, both with and without the cross-connection weights native to CC.7 There was never lower error on correlated than on uncorrelated test items, the signature of effective correlation detection. It is difficult to prove that any algorithm cannot in principle cover a set of phenomena, but we certainly gave BP a fair shake, running 9 networks in each of 80 simulations (2 familiarization sets x 10 score-threshold levels x 4 network topologies). Unlike previous psychological explanations that postulated unspecified qualitative shifts in processing with age, our computational explanation based on CC networks focused on quantitatively deeper learning with increasing age, a principle with considerable empirical support over a wide range of ages and experiments (e.g., Case, Kurland, & Goldberg, 1982). 
CC networks also generated a crossover prediction, with deep learning showing a correlation effect and more superficial learning showing the opposite – a similarity effect, in the form of more error to (or interest in) the correlated test item than to the uncorrelated test item. This is called a similarity effect because the uncorrelated test item was most similar to those in the familiarization set. This simulation predicted that within a single, suitable age group, say 10-month-olds, there would be a correlation effect under optimal learning conditions and a similarity effect with less than optimal familiarization learning. Tests of this prediction found that 10-month-olds who habituated to the training stimuli looked longer at the uncorrelated than the correlated test stimulus, but those who did not habituate did just the opposite, looking longer at the correlated than the uncorrelated test stimulus (Cohen & Arthur, unpublished). In addition to covering the essential age x correlation interaction, CC networks covered two other important features of the infant data: discrimination of the habituated from the correlated test stimulus via the presence of two additional noncorrelated features, and more recovery to test stimuli having novel features than to test stimuli having familiar features. Although the BP simulations did not capture the infant data, they were useful in establishing why CC networks were successful. Namely, the growth of cascaded networks with cross-connection weights seems critical to capturing the detection of correlations between features and the developmental shift to this ability from earlier feature-value learning.

7 There were six input and six output units in BP networks, just as there were in CC networks.

6. Conclusions The work reviewed here indicates that it is important to let networks grow as a way of satisfying both biological and computational constraints.
Biological constraints clearly favour the use of neural networks that grow as they learn. Implementing network growth in simulations of psychological data produces superior coverage of the data when contrasted with comparable static networks. Network growth offers a number of computational advantages, including the building of more complex knowledge on top of simpler knowledge, a superior ability to escape from local minima during error reduction, and deeper learning of training problems. Such advantages translated into better coverage of the developmental stage progressions seen in the psychological literature. It is noteworthy that these computational constraints can operate in both supervised and unsupervised learning paradigms, including both reward-based Hebbian learning and multi-layered feed-forward CC networks. As used here, the differences between these learning paradigms became somewhat blurred. Typically considered an unsupervised learner, the Hebb rule was here modified to include the discrepancy between predicted and actual reward. Typically considered supervised learning, CC can often be construed as not requiring a human teacher to supply the required target outputs. For example, CC networks learn about balance-scale problems by computing the discrepancy between what they predict will happen and what actually happens as a scale is allowed to tip. Similarly, they learn about velocity, time, and distance relations by predicting and then noting how these cues are related as objects move. In these and in many other cases, observations of the natural environment supply the target outputs used in error computation. Likewise, CC encoder networks use the stimulus itself as the target output, computing error as the discrepancy between input and output activations. In this sense, encoders constitute a case of unsupervised learning about stimuli from mere stimulus exposure.
Although we emphasized network growth in this chapter, it is also clear from neuroscience research that brains discard unused neurons and synapses. In a drive towards even greater biological realism, it would be worth experimenting with networks that both grow and prune. Some preliminary work along these lines has found that pruning of CC networks during growth and learning improves learning speed, generalization, and network interpretability (Thivierge, Rivest, & Shultz, 2003). This would appear to be yet another case of a biological constraint offering computational advantages.

References (SPM and SRQ)

Altman, J. (1962). Are new neurons formed in the brains of adult mammals? Science 135, 1127--1128.

Altman, J., and Das, G. D. (1965). Autoradiographic and histological evidence of postnatal hippocampal neurogenesis in rats. Journal of Comparative Neurology 124, 319--335.

Barnea, A., and Nottebohm, F. (1994). Seasonal recruitment of hippocampal neurons in adult free-ranging black-capped chickadees. Proc Natl Acad Sci U S A 91, 11217--11221.

Baum, E. B. (1988). Complete representations for learning from examples. In Complexity in information theory, Y. Abu-Mostafa, ed.

Brainard, M. S., and Knudsen, E. I. (1993). Experience-dependent plasticity in the inferior colliculus: a site for visual calibration of the auditory space map in the barn owl. The Journal of Neuroscience 13, 4589--4608.

Brainard, M. S., and Knudsen, E. I. (1995). Dynamics of visually guided auditory plasticity in the optic tectum of the barn owl. Journal of Neurophysiology 73, 595--614.

Brainard, M. S., and Knudsen, E. I. (1998). Sensitive periods for visual calibration of the auditory space map in the barn owl optic tectum. Journal of Neuroscience 18, 3929--3942.

Chomsky, N. (1980). Rules and representations. Behavioral and Brain Sciences 3, 1--61.

DeBello, W. M., Feldman, D. E., and Knudsen, E. I. (2001).
Adaptive axonal remodeling in the midbrain auditory space map. Journal of Neuroscience 21, 3161-3172.
Dietterich, T. G. (1990). Machine learning. Annual Review of Computer Science 4, 255-306.
Eckenhoff, M. F., and Rakic, P. (1988). Nature and fate of proliferative cells in the hippocampal dentate gyrus during the life span of the rhesus monkey. Journal of Neuroscience 8, 2729-2747.
Feldman, D. E., and Knudsen, E. I. (1997). An anatomical basis for visual calibration of the auditory space map in the barn owl's optic tectum. Journal of Neuroscience 17, 6820-6837.
Field, J., Muir, D., Pilon, R., Sinclair, M., and Dodwell, P. (1980). Infants' orientation to lateral sounds from birth to three months. Child Development 51, 295-298.
Fodor, J. (1980). On the impossibility of learning "more powerful" structures. In The debate between Jean Piaget and Noam Chomsky, M. Piatelli-Palmarini, ed.
Gage, F. H. (2002). Neurogenesis in the adult brain. Journal of Neuroscience 22, 612-613.
Gelfland, J. J., and Pearson, J. C. (1988). Multisensor integration in biological systems.
Geman, S., Bienenstock, E., and Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation 4, 1-58.
Gold, E. M. (1967). Language identification in the limit. Information and Control 10, 447-474.
Goldman, S. A., and Nottebohm, F. (1983). Neuronal production, migration, and differentiation in a vocal control nucleus of the adult female canary brain. Proceedings of the National Academy of Sciences, USA 80, 2390-2394.
Gould, E., and Gross, C. G. (2002). Neurogenesis in adult mammals: some progress and problems. Journal of Neuroscience 22, 619-623.
Gould, E., Reeves, A. J., Graziano, M. S. A., and Gross, C. G. (1999). Neurogenesis in the neocortex of adult primates. Science 286, 548-552.
Gould, E., Vail, N., Wagers, M., and Gross, C. G. (2001). Adult-generated hippocampal and neocortical neurons in macaques have a transient existence.
Proceedings of the National Academy of Sciences, USA 98, 10910-10917.
Gross, C. G. (2000). Neurogenesis in the adult brain: death of a dogma. Nature Reviews Neuroscience 1, 67-73.
Gutfreund, Y., Zheng, W., and Knudsen, E. I. (2002). Gated visual input to the central auditory system. Science 297, 1556-1559.
Hyde, P. S., and Knudsen, E. I. (2000). Topographic projection from the optic tectum to the auditory space map in the inferior colliculus of the barn owl. Journal of Comparative Neurology 421, 146-160.
Hyde, P. S., and Knudsen, E. I. (2001). A topographic instructive signal guides the adjustment of the auditory space map in the optic tectum. Journal of Neuroscience 21, 8586-8593.
Kaplan, M. S. (1984). Mitotic neuroblasts in 9-day-old and 11-month-old rodent hippocampus. Journal of Neuroscience 4, 1429-1441.
Kaplan, M. S. (1985). Formation and turnover of neurons in young and senescent animals: an electron microscopic and morphometric analysis. Annals of the New York Academy of Sciences 457, 173-192.
Kempermann, G. (2002). Why new neurons? Possible functions for adult hippocampal neurogenesis. Journal of Neuroscience 22, 635-638.
Kempermann, G., Kuhn, H. G., and Gage, F. H. (1997). More hippocampal neurons in adult mice living in an enriched environment. Nature 386, 493-495.
Kempermann, G., Kuhn, H. G., and Gage, F. H. (1998). Experience-dependent neurogenesis in the senescent dentate gyrus. Journal of Neuroscience 18, 3206-3212.
Kempter, R., Leibold, C., Wagner, H., and van Hemmen, J. L. (2001). Formation of temporal-feature maps by axonal propagation of synaptic learning. Proceedings of the National Academy of Sciences, USA 98, 4166-4171.
Knudsen, E. I. (1998). Capacity for plasticity in the adult owl auditory system expanded by juvenile experience. Science 279, 1531-1533.
Knudsen, E. I. (2002). Instructed learning in the auditory localization pathway of the barn owl. Nature 417, 322-328.
Knudsen, E. I., Esterly, S. D., and du Lac, S. (1991).
Stretched and upside-down maps of auditory space in the optic tectum of blind-reared owls: acoustic basis and behavioral correlates. Journal of Neuroscience 11, 1727-1747.
Kornack, D., and Rakic, P. (1999). Continuation of neurogenesis in the hippocampus of the adult macaque monkey. Proceedings of the National Academy of Sciences, USA 96, 5768-5773.
Lendvai, B., Stern, E. A., Chen, B., and Svoboda, K. (2000). Experience-dependent plasticity of dendritic spines in the developing rat barrel cortex in vivo. Nature 404, 876-881.
Linkenhoker, B. A., and Knudsen, E. I. (2002). Incremental training increases the plasticity of the auditory space map in adult barn owls. Nature 419, 293-296.
Lipkind, D., Nottebohm, F., Rado, R., and Barnea, A. (2002). Social change affects the survival of new neurons in the forebrain of adult songbirds. Behavioural Brain Research 133, 31-43.
Macnamara, J. (1982). Names for Things: A Study of Child Language. Cambridge, MA: MIT Press.
Martin, S. J., and Morris, R. G. M. (2002). New life to an old idea: the synaptic plasticity and memory hypothesis revisited. Hippocampus 12, 609-636.
Nilsson, M., Perfilieva, E., Johansson, V., Orwar, O., and Eriksson, P. S. (1999). Enriched environment increases neurogenesis in the adult rat dentate gyrus and improves spatial memory. Journal of Neurobiology 39, 569-578.
Nottebohm, F. (1985). Neuronal replacement in adulthood. Annals of the New York Academy of Sciences 457, 143-161.
Nottebohm, F. (2002). Why are some neurons replaced in adult brain? Journal of Neuroscience 22, 624-628.
Nottebohm, F., O'Loughlin, B., Gould, K., Yohay, K., and Alvarez-Buylla, A. (1994). The life span of new neurons in a song control nucleus of the adult canary brain depends on time of year when these cells are born. Proceedings of the National Academy of Sciences, USA 91, 7849-7853.
Nowakowski, R. S., and Hayes, N. L. (2000). New neurons: extraordinary evidence or extraordinary conclusions? Science 288, 771a.
Paton, J. A., and Nottebohm, F. N. (1984).
Neurons generated in the adult brain are recruited into functional circuits. Science 225, 1046-1048.
Piaget, J. (1970). Genetic epistemology.
Piaget, J. (1980). Adaptation and intelligence: organic selection and phenocopy.
Poirazi, P., and Mel, B. W. (2001). Impact of active dendrites and structural plasticity on the memory capacity of neural tissue. Neuron 29, 779-796.
Pouget, A., Defayet, C., and Sejnowski, T. J. (1995). Reinforcement learning predicts the site of plasticity for auditory remapping in the barn owl. In Advances in Neural Information Processing Systems 7, G. Tesauro, D. S. Touretzky, and T. K. Leen, eds. (Cambridge, MA: MIT Press), pp. 125-132.
Quartz, S. R. (1993). Neural networks, nativism and the plausibility of constructivism. Cognition 48, 223-242.
Quartz, S. R., and Sejnowski, T. J. (1997). The neural basis of cognitive development: a constructivist manifesto. Behavioral and Brain Sciences 20, 537-596.
Rakic, P. (1985). Limits to neurogenesis in primates. Science 227, 1054-1056.
Rakic, P. (2002). Neurogenesis in the adult primate neocortex: an evaluation of the evidence. Nature Reviews Neuroscience 3, 65-71.
Rosen, D. J., Rumelhart, D. E., and Knudsen, E. I. (1994). A connectionist model of the owl's sound localization system. In Advances in Neural Information Processing Systems 6, J. D. Cowan, G. Tesauro, and J. Alspector, eds.
Rucci, M., Tononi, G., and Edelman, G. M. (1997). Registration of neural maps through value-dependent learning: modeling the alignment of auditory and visual maps in the barn owl's optic tectum. Journal of Neuroscience 17, 334-352.
Shankle, W. R., Landing, B. H., Rafii, M. S., Schiano, A., Chen, J. M., and Hara, J. (1998). Evidence for a postnatal doubling of neuron number in the developing human cerebral cortex between 15 months and 6 years. Journal of Theoretical Biology 191, 115-140.
Shavlik, J. W. (1990). Inductive learning from preclassified training examples: introduction. In Readings in machine learning, J.
W. Shavlik and T. G. Dietterich, eds.
Shultz, T. R., Schmidt, W. C., Buckingham, D., and Mareschal, D. (1995). Modeling cognitive development with a generative connectionist algorithm. In Developing cognitive competence: New approaches to process modeling, T. J. Simon and G. S. Halford, eds.
Smart, I. (1961). The subependymal layer of the mouse brain and its cell production as shown by radioautography after thymidine-H3 injection. Journal of Comparative Neurology 116, 325-347.
Stanfield, B. B., and Trice, J. E. (1988). Evidence that granule cells generated in the dentate gyrus of adult rats extend axonal projections. Experimental Brain Research 72, 399-406.
Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM 27, 1134-1142.
Westermann, G. (2000). Constructivist neural network models of cognitive development.
Zheng, W., and Knudsen, E. I. (1999). Functional selection of adaptive auditory space map by GABAA-mediated inhibition. Science 284, 962-965.
Zheng, W., and Knudsen, E. I. (2001). GABAergic inhibition antagonizes adaptive adjustment of the owl's auditory space map during the initial phase of plasticity. Journal of Neuroscience 21, 4356-4365.
Zito, K., and Svoboda, K. (2002). Activity-dependent synaptogenesis in the adult mammalian cortex. Neuron 35, 1015-1017.

References (TRS)

Buckingham, D., & Shultz, T. R. (1996). Computational power and realistic cognitive development. Proceedings of the Eighteenth Annual Conference of the Cognitive Science Society (pp. 507-511). Mahwah, NJ: Erlbaum.
Buckingham, D., & Shultz, T. R. (2000). The developmental course of distance, time, and velocity concepts: A generative connectionist model. Journal of Cognition and Development, 1, 305-345.
Case, R., Kurland, D. M., & Goldberg, J. (1982). Operational efficiency and the growth of short-term memory span. Journal of Experimental Child Psychology, 33, 386-404.
Changeux, J. P., & Danchin, A. (1976).
Selective stabilisation of developing synapses as a mechanism for the specification of neuronal networks. Nature, 264, 705-712.
Cohen, L. B., & Arthur, A. E. (unpublished). The role of habituation in 10-month-olds' categorization.
Cohen, L. B., & Younger, B. A. (1983). Perceptual categorization in the infant. In E. Scholnick (Ed.), New trends in conceptual representation (pp. 197-220). Hillsdale, NJ: Lawrence Erlbaum Associates.
Ferretti, R. P., & Butterfield, E. C. (1986). Are children's rule assessment classifications invariant across instances of problem types? Child Development, 57, 1419-1428.
Gibson, E. J. (1969). Principles of perceptual learning and development. New York: Appleton-Century-Crofts.
Gould, E., Tanapat, P., Hastings, N. B., & Shors, T. J. (1999). Neurogenesis in adulthood: a possible role in learning. Trends in Cognitive Sciences, 3, 186-192.
Hebb, D. O. (1949). The organization of behavior. New York: Wiley.
Langley, P. (1987). A general theory of discrimination learning. In D. Klahr, P. Langley, & R. Neches (Eds.), Production system models of learning and development (pp. 99-161). Cambridge, MA: MIT Press.
Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.
Quartz, S. R. (1993). Neural networks, nativism, and the plausibility of constructivism. Cognition, 48, 223-242.
Quartz, S. R., & Sejnowski, T. J. (1997). The neural basis of cognitive development: A constructivist manifesto. Behavioral and Brain Sciences, 20, 537-596.
Schmidt, W. C., & Shultz, T. R. (1991). A replication and extension of McClelland's balance scale model. Technical Report No. 91-10-18, McGill Cognitive Science Centre, McGill University, Montreal.
Shultz, T. R. (2001). Connectionist models of development. In N. J. Smelser & P. B. Baltes (Eds.), International Encyclopedia of the Social and Behavioral Sciences (Vol. 4, pp. 2577-2580). Oxford: Pergamon.
Shultz, T. R. (2003). Computational developmental psychology. Cambridge, MA: MIT Press.
Shultz, T. R., & Cohen, L. B. (2003, in press). Modeling age differences in infant category learning. Infancy.
Shultz, T. R., & Mareschal, D. (1997). Rethinking innateness, learning, and constructivism: Connectionist perspectives on development. Cognitive Development, 12, 563-586.
Shultz, T. R., Mareschal, D., & Schmidt, W. C. (1994). Modeling cognitive development on balance scale phenomena. Machine Learning, 16, 57-86.
Shultz, T. R., & Rivest, F. (2001). Knowledge-based cascade-correlation: Using knowledge to speed learning. Connection Science, 13, 1-30.
Siegler, R. S. (1976). Three aspects of cognitive development. Cognitive Psychology, 8, 481-520.
Siegler, R. S. (1981). Developmental sequences within and between concepts. Monographs of the Society for Research in Child Development, 46, Serial No. 189.
Thivierge, J. P., Rivest, F., & Shultz, T. R. (2003). A dual-phase technique for pruning constructive networks. IEEE International Joint Conference on Neural Networks.
Wilkening, F. (1981). Integrating velocity, time, and distance information: A developmental study. Cognitive Psychology, 13, 231-247.
Younger, B. A., & Cohen, L. B. (1983). Infant perception of correlations among attributes. Child Development, 54, 858-867.
Younger, B. A., & Cohen, L. B. (1985). How infants form categories. In G. Bower (Ed.), The psychology of learning and motivation: Advances in research and theory, Vol. 19 (pp. 211-247). New York: Academic Press.
Younger, B. A., & Cohen, L. B. (1986). Developmental change in infants' perception of correlations among attributes. Child Development, 57, 803-815.

Figure 1: Schematic representation of key anatomical areas.
Abbreviations for Figure 1: ICC: inferior colliculus, central nucleus (auditory tonotopic map); ICX: inferior colliculus, external nucleus (auditory topographic map); OTD: optic tectum, deep layers (auditory topographic map); OTM: optic tectum, middle layers (site of visual-auditory integration); OTS: optic tectum, superficial layers (visual topographic map).

Figure 2: Axonal projections comparison. Reproduced from DeBello et al. (2001).

Figure 3: The influence of auditory stimuli on visual responses in the ICX. (A and B) Response of an ICX site to an auditory stimulus as a function of ITD; negative values indicate left-ear-leading. (C and D) Responses at the same site to light flashes. (E and F) Responses to the simultaneous presentation of auditory and visual stimuli. (A), (C), and (E): multiunit recordings; (B), (D), and (F): single-unit recordings. The single unit was isolated from the same site. The arrows indicate the ITD value corresponding to the location of the visual RF. Reproduced from Gutfreund et al. (2002).

Figure 4: Information flow before mounting prisms.

Figure 5: Information flow immediately after mounting prisms.

Figure 6: (a) Tuning curve of ICX neuron #1 before training. (b) After training.

Figure 7: Mean fixation of key test stimuli in infants (solid lines) from Younger and Cohen (1986) and mean CC network error at two levels of score-threshold (dashed lines) from Shultz and Cohen (2003).
[Figure 7 comprises two panels, 4 months and 10 months, plotting mean fixation (sec) and mean error against test stimulus (correlated vs. uncorrelated).]