A Functional MRI Study of the Distributed Neural Circuitry of Learning and Reward

by Alexandra F. Awai

Submitted to the Department of Nuclear Science and Engineering in Partial Fulfillment of the Requirements for the Degrees of Bachelor of Science and Master of Science at the Massachusetts Institute of Technology

May 20, 2005

© Massachusetts Institute of Technology. All Rights Reserved.

Signature of Author: Department of Nuclear Science and Engineering, May 2005

Certified by: Matthew Colonnese, Postdoctoral Associate, McGovern Institute for Brain Research, Thesis Reader

Certified by: Alan Jasanoff, Assistant Professor, Department of Nuclear Science and Engineering, Thesis Supervisor

Accepted by: Jeffrey Coderre, Chair, Department Committee on Graduate Students

Abstract

The aim of this research project was to study the neural substrates involved in processing rewarding stimuli. Evaluation of the magnitudes of reward is one of the fundamental aspects of goal-directed behavior, and studies have shown that this process involves the midbrain dopamine system. Work by C. R. Gallistel has shown that the reward magnitude of electrical stimulation to structures within this system increases with increasing current and frequency.
In this study, operant conditioning with intracranial self-stimulation of the medial forebrain bundle (MFB) was used to correlate the rewarding quality of a stimulus with variations of its current amplitude (Part 1) or electrical pulse frequency (Part 2). For Part 2, a saturation frequency, which is the point at which increasing stimulus frequency does not elicit a more vigorous operant response, was established for each of the responsive subjects. Functional magnetic resonance imaging (fMRI) with blood oxygenation level dependent (BOLD) contrast was then used to evaluate the brain activation in response to behaviorally characterized electrical stimuli. High resolution anatomical images revealed that subjects with electrode tips positioned within 1 mm of the midline of the MFB tended to demonstrate reward-seeking behavior. Timecourses were plotted for imaging voxels in areas exhibiting BOLD responses in Part 1 and Part 2. In Part 1, the BOLD timecourse in the striatum/orbital cortex region - which has been implicated in reward processing - had different time-evolution characteristics than the central sinus, which is thought to reflect general hemodynamic responses to stimuli. Additionally, the activated regions were qualitatively similar for varying currents, but lower current amplitude led to a smaller percentage of active voxels. In Part 2, the somatosensory/motor cortex and the striatum with adjacent ventral forebrain, which are both thought to comprise important reward processing circuitry, showed similar BOLD responses at saturation and above-saturation frequencies, but lower responses at below-saturation frequencies. These results show that BOLD imaging can be utilized to isolate regions that code for the rewarding quality of MFB stimulation, rather than its sensory aspects.

Thesis Supervisor: Alan Jasanoff, Ph.D.
Title: Assistant Professor, Department of Nuclear Science and Engineering

Acknowledgements

To Alan Jasanoff, for welcoming me into his lab and helping me to make a contribution to this research project, for which he wrote many macros including Operate, Operatesequencer, and MATLAB functions used in analyzing the behavioral and imaging data. His thoughtful guidance was invaluable in my last year at the Institute.

To Matthew Colonnese, for his technical contributions in terms of fMRI data acquisition and analysis, as well as his unflagging patience with my efforts to learn and write about neuroscience. His support was a vital part of my research experience. It has been my joyful privilege to learn from and work with these scientists.

To the Department of Nuclear Science and Engineering, for allowing me to explore a broad range of disciplines while helping me to develop my knowledge of core engineering principles. I am truly grateful for the academic and financial support the Department has given me.

I wish to express my sincere thanks.

Table of Contents

1. Background
1.1 Motivation
1.2 Reward and Associative Learning
1.3 Neuroanatomy of Reward
1.4 The Dopamine Hypothesis of Reward
1.5 Imaging the Brain
1.6 Insights from fMRI into Brain Function
2. Methods
2.1 Implantation of Stimulating Electrode
2.2 Operant Training
2.2.1 Operant Training: Part 1
2.2.2 Operant Training: Part 2
2.3 Imaging
2.3.1 Functional Imaging: Part 1
2.3.2 Functional Imaging: Part 2
2.4 Imaging Data Analysis
3. Results
3.1 Part 1
3.1.1 Operant Training Results
3.1.2 Imaging Results
3.1.3 Part 1 Figures
3.2 Part 2
3.2.1 Operant Training Results
3.2.1.1 Shaping Sessions
3.2.1.2 Variable Sessions
3.2.2 Imaging Results
3.2.3 Part 2 Figures
4. Discussion

Chapter 1 Background

Animals are motivated to pursue behaviors for which they are rewarded. To discover the neural basis of reward, as well as the physiological mechanism of motivation, is therefore a primary goal of neuroscience.
The neural circuitry involved in goal-directed behavior is complex. Understanding it requires parsing and analyzing numerous functions, such as perceiving stimuli as rewarding, evaluating the rewarding or aversive quality of stimuli, recalling these values, and reacting in a context-appropriate fashion based on instinct or cognition. The body is regulated by neuroendocrine, autonomic and motivational mechanisms, the last being the most difficult to quantify, as motivation is an inferred internal state that is used to explain behavioral variability. Motivational states are regulated by the perceived needs of tissues (e.g. thirst and hunger), as well as anticipatory mechanisms, hedonic factors and ecological constraints.1 The holistic physiological mechanism that underlies the behavioral manifestations of motivation and reward must be readily influenced by endocrine, visceral and somatic afferent stimuli, and have outputs that enable discrete behavioral control, for instance, ensuring that a hungry individual eats rather than drinks. This mechanism is the integration of myriad neural processes - some better understood than others - that are mediated by various regions in the brain by virtue of the specialized cells of which they are composed.

Neurophysiological studies have implicated many brain structures in learning, motivation and reward. These biological substrates are believed to work together in a distributed neural network of reward and motivation processes. The neural substrates for the different aspects of goal-directed behavior have been identified by lesion, drug and histological studies, as well as metabolic mapping and fMRI. The physiology of this circuitry, in terms of synaptic organization and the topography of neural projections, is well understood. However, there remain fundamental questions pertaining to the precise functional significance of each substrate within this complex system.
fMRI is uniquely positioned to reveal pivotal insights into these questions because it is relatively noninvasive, and it allows researchers to observe instantaneous changes in brain activity while experimenter-regulated behavior or stimulation is occurring.

1.1 Motivation

Motivation is defined by its effect, which is to incite context-specific responses, such as flight in the face of danger, or consuming food when the body recognizes a nutrient deficit. Because the precise physiological mechanisms of motivated behavior are not completely understood, motivational theories are an important aspect of neuroscience models for explaining animals' interactions with their environment. Concepts of motivation are used to interpret individuals' variable reactions to constant stimuli - especially affectively important stimuli - and the directedness of goal-seeking behavior.

One of the earliest and most celebrated conceptions is that of homeostasis, the maintenance of a stable internal state.2 Homeostasis requires a regulatory system that uses a setpoint, a predetermined value of some physiological parameter, to maintain a stable internal state. The purpose of this regulatory mechanism is to avoid dangerous or unwanted deviations from the setpoint, or from a narrow range about the setpoint. Although many biological processes involve such a system, including error detectors and a negative feedback mechanism to correct errors (e.g. insulin to modulate blood sugar concentration), learned behavior patterns are often more complicated. It would seem that a truly homeostatic system only exists where deviations from the setpoint are immediately lethal, or at least severely detrimental.2 For instance, warm-blooded animals' bodies work to maintain a relatively stable internal temperature, as the proteins and mechanical systems that sustain life do require a particular environment.
However, most physiological parameters can vary a great deal over long periods, allowing individuals to survive under extremely variable circumstances. Despite the seeming incompleteness of the homeostasis theory, much of the behavioral neuroscience of hunger, thirst, and other ingestive behavior has involved searching for physiological setpoints and deficit signals.

Homeostatic outcomes can arise without homeostatic mechanisms if stability is maintained by anticipatory motivation or by a balance achieved between opposing neural, hormonal and behavioral forces. Anticipatory motivation refers to conditioned responses or otherwise preemptive mechanisms that are essentially reactions to predictions of a deficit. The aforementioned balance of forces refers to settling-point regulation. Biopsychologist Robert Bolles claimed that the homeostatic concept of body weight was simply a plausible fiction, and that weight is kept relatively stable by opposing neuroendocrine and psychological reflex mechanisms that are in balance at settling points. It is clear that there is no body weight setpoint, as this characteristic almost always changes throughout adulthood. However, if the compulsion to eat is determined by internal satiety, the availability and palatability of food, social circumstances and psychology, then one can consider that a changing body weight is transiently settling around a point that is only moderately stable. Originally, the homeostasis concept did not explicitly involve setpoints, and so it seems that the artificial joining of homeostasis, setpoints and error detection may be an unnecessary semantic complication on the part of behavioral neuroscientists. Considering homeostasis and pseudo-homeostasis is only the first step to forming a theory of motivation.
Undesirable states (such as hunger), unstable interactions of competing internal forces (such as stress-related overeating), or the availability of a superior state (such as physical and mental satiation), can motivate goal-seeking behavior. The intervening variable concept of drive simplifies causal stimulus-response (S-R) relationships. That is, an undesirable state such as water deprivation can be considered a stimulus, or independent variable, and the response (dependent variable) to this stimulus may be to drink more water than usual, work harder for the water, or tolerate contaminated water. However, water deprivation is not the only stimulus for these behaviors. Rather than considering each S-R relationship as a separate mechanism, it makes more sense to include the intervening variable, or drive, that can link many independent variables with the dependent variables attributable to a common condition caused by the stimuli (Figure 1.1). Furthermore, this drive concept is validated when predictions regarding the dependent variables it may cause are confirmed in experiments.

Figure 1.1 DRIVES SIMPLIFY S-R RELATIONSHIPS. Drives connect stimuli (independent variables) and responses (dependent variables). Diagram labels: independent variables (water deprivation, NaCl injection); dependent variables (amount of water drunk, work for sip, quinine tolerance, distance for water).

The existence of causal relationships mediated by intervening variables is obviously a very basic concept of motivation, and in order to avoid oversimplification, researchers have posited that truly motivated behavior requires some additional criteria. Epstein suggested that such behavior requires that the individual demonstrate flexible learning and coordinated appetitive behavior, actions that indicate prediction of a goal or reward expectation, and hedonic reactions.
Various models of drive's involvement in motivation have been put forth, such as drive reduction by reward delivery. However, after the advent of intracranial self-stimulation experiments, wherein experimental subjects could obtain free rewards via direct electrical brain stimuli, incentive motivation concepts superseded the simpler drive-based models of motivation. These studies, as well as others focused on hedonic rewards, showed that stimuli thought to satisfy cravings would actually reinforce goal-seeking behavior, and so the focus of research into motivation shifted to classical and operant conditioning.

1.2 Reward, Reinforcement and Associative Learning

Rewards are stimuli administered to individuals following a correct or desired response that increase the probability of occurrence of the response. Associative learning is the process by which discrete percepts or ideas, such as the reward and the response, are linked to one another.

The concept of hedonic reward is central to most motivation theories. Before the 1960s, essentially all explanations of reinforcement behavior involved drive concepts. However, in a paper titled The Pleasures of Sensation, Pfaffmann reinterpreted behavioral studies involving hedonic rewards using physiological evidence. This evidence regarding the neural encoding of hedonic sensations suggested that those sensations were rewarding and motivating in and of themselves, not requiring the existence of dependent variables to lead to motivated behavior. Bolles proposed that cognitive predictions, or learned expectations, of reward were the actual source of motivation. These expectancies are what Pavlovian psychologists would consider conditioned stimulus (CS) and unconditioned stimulus (US) interactions. In order to explain why CS-US expectancies caused motivation, a psychologist named Dalbir Bindra asserted that the CS itself is eventually perceived as a hedonic reward.
Frederick Toates qualified this developing theory by suggesting that physiological depletion states could direct and enhance the incentive value of rewards. This explanation readily incorporates the logical aspects of homeostasis-like motivation concepts into modern theories of associative learning.2

Associative learning is studied using Pavlovian (classical) and operant (instrumental) conditioning. In the former, the experimenter provides both the stimulus and reward, allowing him or her to control when each learning episode occurs, thereby gaining insights into reward prediction - or errors in reward prediction - as well as qualitative reactions to the rewards. In contrast, operant conditioning requires that the experimental subject adjust decision making based on experience in order to minimize negative stimuli or maximize positive stimuli. That is, the subject determines the number, magnitude or quality of its rewards by its performance. Pavlovian behavior can be considered adaptive in a limited sense for a static environment, but a dynamic environment requires the acquisition of new behavioral strategies. This capacity for learning is typically studied with operant conditioning procedures. However, both human experience and comparative research reveal that when instrumental behavior passes a certain repetitiveness threshold, it often fails to adjust to new situations or the omission of an expected reward.

In an article written for the Annual Review of Psychology, John Pearce and Mark Bouton discuss the relative merits of currently influential theories of associative learning. These theories address different aspects of learning, and lead to different implications for understanding how individuals form causal judgments. The current understanding is that the magnitude of a conditioned response depends upon the strength of the perceived connection, or associativeness, between the CS and US for a given trial.
This concept, which is based on the Rescorla-Wagner (1972) theory, explains a far wider range of experimental findings because it includes the assumption that the change in associative strength of a stimulus is determined by the difference between the magnitude of the US and the sum of the associative strengths of all stimuli present, as opposed to considering only the primary stimulus. The theory is a metric for describing CS-US associations, but does not address what these relationships are, how they are formed or how they influence behavior. One way to gain insights into these issues is to study how associability changes with context and conditioning.4

Wagner argued that the salience and associability of a CS can only be changed when the elements it excites are at the focus of attention. Furthermore, the state of attention is dependent on the context and time course of an individual's exposure to the CS. Evidence supporting this theory is provided by studies pairing flavor and symptoms of illness. As one might expect, associations between the flavor and the symptoms were not as strong when brief access to the flavor sample was provided long before the symptoms were apparent, as opposed to only a few hours before the symptoms manifested. Mackintosh proposed that the associability of a stimulus can be determined by how accurately it predicts reinforcement. Evidence from pattern discrimination studies (George and Pearce 1999, Mackintosh and Little 1969, and Shepp and Eimas 1964) seems to support this theory's prediction that attention to a stimulus increases if it is the best available predictor of reinforcement, while electrical shock experiments (Pearce and Hall 1980) contradict it. Based on this shock stimulus study and others, Pearce and Hall proposed that the associability of a stimulus is high when it is followed by an unexpected US, but low when it is followed by a familiar or expected US.
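The Rescorla-Wagner update described in this section can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the analysis used in this thesis: the function names, the stimulus labels ("light", "tone"), the single learning-rate parameter (standing in for the product of the CS and US rate parameters) and the two-phase "blocking" schedule are all hypothetical choices made for the example.

```python
from typing import Dict, Sequence


def rescorla_wagner_trial(
    strengths: Dict[str, float],
    present: Sequence[str],
    us_magnitude: float,
    learning_rate: float = 0.2,  # assumed value; stands in for alpha * beta
) -> float:
    """Apply one Rescorla-Wagner update in place and return the prediction error.

    The error is the US magnitude minus the SUM of the associative strengths
    of every stimulus present on the trial, not just the primary stimulus.
    """
    total_prediction = sum(strengths.get(cs, 0.0) for cs in present)
    error = us_magnitude - total_prediction
    for cs in present:
        strengths[cs] = strengths.get(cs, 0.0) + learning_rate * error
    return error


def run_blocking_demo(n_trials: int = 50) -> Dict[str, float]:
    """Hypothetical two-phase schedule: light alone, then light + tone."""
    strengths: Dict[str, float] = {}
    for _ in range(n_trials):  # phase 1: the light alone predicts the US
        rescorla_wagner_trial(strengths, ["light"], us_magnitude=1.0)
    for _ in range(n_trials):  # phase 2: the tone is added to the light
        rescorla_wagner_trial(strengths, ["light", "tone"], us_magnitude=1.0)
    return strengths


def run_compound_only(n_trials: int = 50) -> Dict[str, float]:
    """Control schedule: light + tone are paired with the US from the start."""
    strengths: Dict[str, float] = {}
    for _ in range(n_trials):
        rescorla_wagner_trial(strengths, ["light", "tone"], us_magnitude=1.0)
    return strengths


if __name__ == "__main__":
    print("pretrained light, then compound:", run_blocking_demo())
    print("compound only:", run_compound_only())
```

In this sketch the tone gains almost no strength when the light already predicts the US, because the summed prediction leaves little error to learn from, whereas the two stimuli share the available strength when they are trained together from the start. That dependence on the summed prediction is the property the text attributes to the theory.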
On the Pearce-Hall account, more attention is paid to a stimulus while the subject is still learning about its significance.4 These conditioning studies were performed primarily to form plausible hypotheses about the internal process of learning. However, connecting learning with biological mechanisms requires both behavioral evidence of reward and motivation and a quantitative understanding of neural function.

1.3 Neuroanatomy of Reward

Researchers have tried to connect structure and function for decades by observing the behavior of intracranial self-stimulating and brain damaged animals, for example, noting that decerebrate rats do not learn taste aversion, and that intracranial electrical stimulation of the hypothalamus and associated structures can reinforce operant conditioning. More recently, microscopic processes such as cerebral glucose utilization during electrical stimulation of the ventral tegmental area and the effect of dopamine depletion in the nucleus accumbens have been observed, building the body of knowledge concerning the inner workings of the neuroanatomy of reward. The phenomenon of reinforcement is thought to involve the integration of myriad neural processes, each with its own physiological substrate or system of substrates. The basal ganglia and limbic system are especially important in mediating reward perception and adaptive behavioral response (Figures 1.2 and 1.3).

Figure 1.2 CIRCUITRY MEDIATING PERCEPTION OF REWARD and initiation of adaptive responses to reward. Legend: GABA, dopamine, glutamate. Arrows connecting the ventral pallidum and various structures, as well as the connection between the nucleus accumbens and ventral tegmental area, indicate GABA pathways.8

Figure 1.3 SCHEMATIC DIAGRAM OF NEURAL SUBSTRATE INTERACTIONS. This limbic-striatal-pallidal circuitry is implicated in reward processes.
The basal ganglia consist of the globus pallidus and caudate-putamen (striatum), as well as the subthalamic nucleus and substantia nigra. Although the caudate nucleus and putamen are separated in many mammals, poor development of the rat's internal capsule makes them hard to distinguish. The subthalamic nucleus and the substantia nigra are brainstem structures, but they are closely related to the striatopallidal neuronal circuitry, and so are generally considered part of the basal ganglia. The rat's striatum is a large, gray mass occupying the deepest part of the cerebral hemisphere. It serves as a recipient of topographically organized cortical and amygdaloid inputs, subcortical afferents from thalamic nuclei, as well as monoaminergic neurons and other cell groups. The ventral striatum, which can be regarded as a continuation of the dorsal striatum, is composed of the ventromedial parts of the caudate-putamen, the nucleus accumbens and the olfactory tubercle.10

The substantia nigra and adjacent nuclei in the ventral tegmental area have been strongly implicated in the generation of hedonic reward. The substantia nigra is named "dark substance" for the pigmented cells of the SN pars compacta (SNc), which overlie the SN pars reticulata (SNr). The SNr is a pallidal structure composed of GABA-ergic (GABA is γ-aminobutyric acid) neurons that project to the thalamus and surrounding structures. The caudate-putamen (Figure 1.4) is innervated by dense dopaminergic projections arising from the SNc and ventro-lateral VTA. The rat's substantia nigra-ventral tegmental area (SN-VTA) lies in the mesencephalon and is 2.5 mm long and 3 mm wide. The SN contains 10,000 to 12,000 neurons on each side, while the VTA contains 27,000 neurons on each side.11

Figure 1.4 THE BASAL GANGLIA OF THE RAT, as well as the hypothalamus and sublenticular extended amygdala (SLEA).
ac- anterior commissure, Acb- accumbens nucleus, CPu- caudate-putamen, Tu- olfactory tubercle, VP- ventral pallidum, GP- globus pallidus, EP- entopeduncular nucleus, STh- subthalamic nucleus, SNC- substantia nigra, pars compacta, SNR- substantia nigra, pars reticulata.

The limbic system comprises the amygdala, hypothalamus, cingulate cortex, anterior thalamus, mammillary body and hippocampus (Figure 1.5). The hypothalamus is connected to the ventral tegmental area via the medial forebrain bundle (MFB), a tract of nerves whose stimulation has been shown to produce strong reinforcement of conditioned behavior. The limbic system is thought to integrate emotional information from various parts of the nervous system. It is believed that the hippocampus helps to register memories, the amygdala helps in assessing experiences, and the forebrain assists in decision-making. The ventral striatum and adjacent nucleus accumbens receive afferents from these limbic structures, and deliver outputs to structures such as the ventral pallidum. The MFB distributes impulses to various limbic structures from dopaminergic neurons in the SN-VTA, which together with glutamatergic inputs determine the output of ventral striatal GABA-ergic spiny neurons projecting into the ventral pallidum.

Figure 1.5a THE STRUCTURES OF THE LIMBIC SYSTEM, a medial view of the left hemisphere of the human brain. Labeled structures: cingulate cortex, hippocampus, amygdala.

Figure 1.5b THE THREE-DIMENSIONAL ORGANIZATION OF THE HIPPOCAMPAL FORMATION IN THE RAT BRAIN. (A) The C-shaped hippocampus, where f indicates the fornix. (B) Three horizontal sections at different dorsoventral levels. (C) The surface of the hippocampal formation, where s is the septal pole and t is the temporal pole. The three coronal sections are shown at different rostrocaudal levels.
DG- dentate gyrus, fi- fimbria, and S- subiculum.

While both the amygdala and prefrontal cortex supply excitatory input to the nucleus accumbens, the amygdala is thought to be more involved in regulating responses to conditioned rewards, and the prefrontal cortex integrates short-term memories with the behavioral responses (Figure 1.2). There is considerable evidence that interactions between structures in the limbic system and dopamine-related functions in the ventral striatum influence the effects of conditioned stimuli on goal-directed behavior. One example of such evidence implicates N-methyl-D-aspartate (NMDA) glutamate receptors in the amygdala in the ability to learn to approach stimuli that indicate food rewards. It is believed that this associative learning requires that information be relayed to the ventral striatum, where the activity of ascending mesencephalic dopamine projections helps to determine the selection of appropriate responses. Infusions of NMDA and non-NMDA receptor antagonists into the core region of the nucleus accumbens, as well as lesions in that same area, are known to impair rats' foraging behavior. The performance of rodents in spatial memory tests is sensitive to intra-accumbens infusions of haloperidol, a dopamine receptor antagonist. Optimization of foraging, which clearly involves spatial memories, is thought to be dependent on projections of the hippocampal formation. These connections suggest that the ventral striatum is a point of convergence for complex spatial information and conditioned stimuli (Figure 1.3).9

The activity of spiny projection neurons and cholinergic interneurons in the striatum has been implicated in several aspects of learning. Spiny projection neurons are relatively quiescent cells that fire phasically in association with learned movements. The cholinergic neurons show sensory activity during sensorimotor learning.
The learned response, which is dopamine dependent, is an excitation in the activity of the cholinergic interneurons followed by a transient suppression. Essentially, neural responses occurring in the striatum in association with learning are influenced by dopamine-dependent modulation of thalamostriatal and corticostriatal pathways. Spiny projection neurons receive excitatory synaptic inputs from the thalamus and cortex (Figure 1.6). Phasic dopamine release during reward-related learning occurs at the excitatory corticostriatal and thalamostriatal synapses, and changes in the afferent activity of these synapses presumably underlie the aforementioned changes in striatal activity.11

Figure 1.6 THE STRIATAL NEURAL CIRCUITRY INVOLVED IN REWARD-RELATED LEARNING. Open circles, red circles, black triangles and blue triangles refer to glutamatergic, dopaminergic, GABA-ergic and cholinergic synaptic contacts respectively.

Spiny projection neurons in the nucleus accumbens also play a role in the initiation of behavioral responses to environmental stimuli. These neurons receive excitatory input from cortical and thalamic structures which often leads to dopamine release, indicating that this neural substrate is mediating an information gating function. However, the effects of dopamine on neural activity in the nucleus accumbens and adjacent striatum are state-dependent. That is, studies have shown that in the presence of a depolarizing excitatory tone, the stimulation of dopamine receptors prolongs excitatory responses, while in the absence of this tone the stimulation is inhibitory. In effect, dopamine transmission in the nucleus accumbens serves as a gating function that also augments sustained excitatory input to spiny cells while inhibiting their output.
Characterization of dopamine's role in the nucleus accumbens therefore requires that studies employ conscious subjects, as a spontaneous excitatory tone is not present in vitro or in anesthetized animals.

The ventral pallidum is recognized as the primary terminal field for spiny neurons of the nucleus accumbens, and its role in regulating motor behavior is well established. Studies of the mechanisms of drug reward have demonstrated that it also belongs in the motive circuit. Topographic analysis of the anatomical and functional connections between the nucleus accumbens, ventral pallidum, mediodorsal thalamus, and prefrontal cortex has revealed that a series of contacts extends from the nucleus accumbens shell region outward to the core. The thalamic portion of this circuit only permits the flow of information from the ventral pallidum to the prefrontal cortex. Because it involves the prefrontal cortex, this thalamic subcircuit may provide access to short-term memory functions to aid in context-appropriate behavioral responses to reward.

1.4 The Dopamine Hypothesis of Reward

Learning is driven by deviations between the predicted time and quality of rewards and actual reward delivery. Adjusting expectations leads to adaptive behavior that maximizes rewards and minimizes aversive stimuli. Multiple lines of evidence support the theory that dopamine neurons of the SN-VTA project to structures involved in goal-directed behavior and motivation. For example, investigations of the role played by individual midbrain dopaminergic neurons showed that changes in dopaminergic activity correlate with the animals' transfer of behavioral reaction from the unconditioned stimulus to the conditioned stimulus.
Studies utilizing bioelectrical sensors have shown that dopamine neurons emit a positive signal when the reward is better than predicted, and a negative signal when the reward is worse than expected, or not delivered when it is expected.12

Dopamine has also been linked to motivation, reinforcement, addiction and memory. This link was first identified when damage to nigrostriatal dopamine fibers caused feeding and drinking deficits, and damage to mesolimbic dopamine fibers decreased forward locomotion, which is strongly associated with goal-seeking behavior. Neuroleptics, drugs that block the effects of dopamine by antagonistically binding to dopamine receptors, have been shown to attenuate or block the rewarding effects of lateral hypothalamic electrical stimulation. Immediate dopaminergic activation is not required for motivation, as experienced animals often continue to perform previously rewarded actions until they have had considerable experience under the influence of the neuroleptic. In addition, studies have revealed that predictable rewards do not significantly increase Fos-like immunoreactivity in dopamine neurons, while the prefrontal cortex displays marked Fos-like activation. This evidence suggests that dopamine plays a role in imprinting the rewarding quality of stimuli, which is essential for control of goal-directed behavior by expectations based on past experience. This is not dopamine's only physiological function, and there are other substances, such as glutamate acting on NMDA receptors, that enable conditioned self-stimulation, but given these caveats, the dopamine hypothesis of motivation and reinforcement is well established.13

This preliminary review of the neural substrates of reward has shown that the amygdala, nucleus accumbens and VTA work together to process rewarding stimuli. Dopaminergic inputs to the VTA seem to provide prediction error signals.
Dopamine, as well as glutamatergic afferents from the thalamus, plays an important role in prefrontal cortex-mediated short-term memories. In addition, the effects of conditioned reinforcers may be mediated by interactions between glutamatergic afferents and midbrain dopamine neurons. The physiological evidence presented in the aforementioned articles informs models of neurotransmitter activity and our understanding of goal-directed behavior. By combining observations of conditioned or goal-directed behavior and physiological observation strategies such as lesion studies, drug administration and fMRI, the mechanisms of reward and learning can be spatially, temporally and chemically described.

1.5 Imaging the Brain

A wide range of rewarding stimuli have been shown to modulate BOLD signal responses in a variety of brain structures depending on the behavioral task or type of stimulus. But how does fMRI work, and how does BOLD imaging reflect brain function? Although fMRI has become the dominant technique for the study of the functional organization of the human brain during perceptual, cognitive and motor tasks, neuroscientists are still developing the theory as to how BOLD signal changes represent brain function. Positron emission tomography, which exploits 15O-labeled water as a tracer to reflect parenchymal blood flow, has shown that increases in blood flow are linearly proportional to increases in neuronal activity. In a given voxel, the volume element determined by MRI pixel size, there may be hundreds of thousands of these neurons. The BOLD effect reflects changes in blood flow, blood volume and blood oxygenation in the arteriole, capillary and venous vascular beds in an intricate and variable combination.
However, it has excellent spatial and temporal resolution properties, and offers considerable flexibility in paradigm design.[14] A well defined, qualitative relationship between BOLD signal changes and neuronal activation has not been demonstrated, but it is possible to obtain a quantitative understanding of physiological changes as they relate to the observed dynamics of the magnetic resonance signal. Signal changes arise from differences in the magnetic susceptibility between blood vessels and the surrounding tissue due to the presence of paramagnetic deoxyhemoglobin in the blood. This variation in magnetic susceptibility manifests as differences in relaxation rate, particularly transverse relaxation rate. Spin-echo and gradient-echo imaging, which depend on the relaxation rates R2 (1/T2) and R2* (1/T2*) respectively, are used for different applications owing to the magnetic field variations of the latter and the lower degree of BOLD contrast observed with the former.[15] When Ogawa et al. published their groundbreaking results of brain MRI with BOLD contrast in 1990, they actually stated that BOLD contrast was not observed in spin-echo images.[16] This conclusion was probably drawn because the spin-echo (SE) approach yields significantly smaller fractional signal changes compared to the gradient-echo (GE) method.[17] To better explain the different consequences of SE and GE imaging, it is worthwhile at this point to review the physical bases and mathematical representations of both methods. While GE imaging generates an echo by reversing the gradient polarity, SE imaging produces an echo by applying a π pulse after TE/2. The relaxation rate 1/T2* is the sum of 1/T2 and 1/T2': T2 is the time constant for relaxation caused by the local magnetic fields of the nearby protons, and T2' represents the effect of macroscopic and microscopic magnetic field heterogeneities. The π pulse applied in SE eliminates the effect of T2', but at the same time reduces the fractional signal change.
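Stated in terms of rates, the relation between these time constants and the resulting echo amplitudes can be written as follows. This is a sketch under stated assumptions: the symbols S_0 (signal at zero echo time) and the monoexponential decay model are standard notation assumed here rather than taken from the text, and longitudinal (T1) saturation effects are ignored.

```latex
\begin{align}
R_2^{*} = \frac{1}{T_2^{*}} &= \frac{1}{T_2} + \frac{1}{T_2'} \\
S_{\mathrm{SE}}(TE) &= S_0 \, e^{-TE / T_2} \\
S_{\mathrm{GE}}(TE) &= S_0 \, e^{-TE / T_2^{*}} \\
\frac{\Delta S_{\mathrm{GE}}}{S_{\mathrm{GE}}} &\approx -TE \, \Delta R_2^{*}
  \qquad \text{for small } \Delta R_2^{*}
\end{align}
```

Because the refocusing pulse removes the T2' contribution, a spin-echo signal responds only to the change in R2, which is smaller than the corresponding change in R2* and so gives the smaller fractional signal change noted above.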
Deoxyhemoglobin, with its iron-containing heme groups, is primarily T2-influencing, which is why T2-weighted (repetition time, TR >> longitudinal relaxation time, T1) imaging makes most sense for BOLD fMRI.[15]

1.6 Insights from fMRI into Brain Function

Brain regions in which rewarding stimuli consistently increase activity include the orbitofrontal cortex (OFC), amygdala, striatum/nucleus accumbens and dopaminergic midbrain. In order to answer questions concerning the precise role these structures play, either as members of a neural circuit or independently, in sensing, predicting and valuing rewards, fMRI research has employed myriad primary and conditioned rewards in both human and animal studies. These rewards include food and water, appetitive smells, sexual stimuli, rewarding electrical stimulation, and even social rewards like money and positive feedback. Our understanding of how these rewards are processed is developing through an iterative process of experimentation and theorizing.[18]

The OFC receives direct inputs from the taste and olfactory cortices as well as higher-order visual and somatosensory areas. It is ideally located to store the reward value of sensory stimuli, and in fact OFC neurons in rats have been shown to respond preferentially to different tastes. Furthermore, these neurons decrease their firing when consumption of rewards induces satiation and the stimulus becomes less rewarding. One clever approach to this research involved scanning hungry subjects who were exposed to two food-related stimuli. The subjects subsequently consumed one of the corresponding foods until they were satisfied, and were scanned again. Responses in the OFC showed that activity related to the food that was eaten decreased, but activity for the other food-related stimulus did not.
Studies such as this indicate that the OFC codes for the rewarding quality of stimuli, rather than their sensory aspects.[19]

fMRI studies have shown that the amygdala is involved in sensing aversive stimuli; for example, it is preferentially activated by images of frightening or angry faces. However, the amygdala is also activated following positively reinforcing stimuli, and so it seems that activity in the amygdala is really related to how arousing the stimulus is, rather than whether it is rewarding or aversive. In one study, BOLD signal changes were shown for subjects exposed to unpleasant (valeric acid [Val]) and pleasant (citral [Cit]) odors (Figure 1.7). The BOLD timecourse is remarkably similar for both stimuli at the same concentration, but there is a marked difference between timecourses for stimuli at different concentrations. These results, and the findings of similar studies, seem to contradict evidence from animal lesion and human neuropsychology research. In fact, these past studies have been reevaluated based on the idea that reward value is an interaction between valence and intensity. That is, the concentration or arousing qualities of a stimulus may have an effect on its affective character.[20]

Figure 1.7 TIMECOURSE OF AMYGDALA BOLD RESPONSE. Presented are the time course (line plots) and peak hemodynamic responses (bar graphs) to the high and low intensity presentations of valeric acid and citral in the left (a) and right (b) amygdala.[20]

After neural substrates have perceived the reward, assessed its value and valence and committed this information to short-term memory, the organism must still have a way to generate reward-directed behaviors. Research has shown that electrical stimulation of the ventral striatum is highly rewarding due to phasic dopamine release, and fMRI studies of reward processing have found that BOLD signal changes correspond to changes in reward amplitude.
The timing of these changes supports the hypothesis that these responses in the ventral striatum signal reward prediction errors. In one study involving human subjects, fMRI was used to correlate prediction errors in reward delivery with BOLD changes in the human striatum. The quantitative basis for the strategy of using fMRI to observe metabolic activity during reward delivery is that dopamine neurons give transient responses to deviations in expectations about reward delivery. Goal-directed behavior requires that the individual demonstrate flexible learning and coordinated appetitive behavior, actions that indicate prediction of a goal or reward expectation and hedonic reactions. A positive prediction error (nothing expected, reward delivered) can cause increased activity in the left putamen (Figure 1.8). Negative prediction errors had the opposite effect.[21]

Figure 1.8 LEARNING AND THE HUMAN STRIATUM. An illustration of the results of fMRI experiments involving positive prediction errors.[21] [Panel title: Positive prediction error]

Neuroimaging studies have also implicated the amygdala and OFC. In the same way that these structures contribute to the processing of new rewards, representation of the predictive value of familiar rewards requires different types of characterizing information. In one fMRI study, predictive reward values were coded by BOLD activity in the OFC, amygdala and striatum (Figure 1.8). Arbitrary visual cues were paired with two food-related odors in a classical conditioning paradigm, and after subjects were fed to satiety with one of the foods, responses to the predictive cue for that devalued odor decreased. Additionally, the striatum is known to respond to aversive stimuli and 'non-rewarding' salient events, such as random distractor stimuli.
This data, combined with studies that show preferential activation in the striatum during active reward tasks as opposed to passive tasks where no action on the subject's part is required, suggests that the striatum is involved with coding stimulus saliency. More research is required to characterize the expanding list of functions posited to be mediated by the striatum.[19]

REWARD VALUE CODING in the (a) striatum. (d) is relative difference in activity pre to post satiety.

Neuroimaging has been vitally important in identifying and roughly localizing important stages of reward processing. However, the specific functions of the neural substrates that have been implicated in the reward circuit have been incompletely, and possibly erroneously, described. Given the current evidence, it seems that the occurrence of salient stimuli is signaled by the amygdala, which also initially codes their predictive value. The OFC assesses the value of the stimulus, and the ventral striatum then further integrates this information so that behavioral responses are context-appropriate. However, it is not clear whether the OFC is guiding behavior, or modulating the mental representation of the information itself. In addition, complex behaviors are not implemented through individual structures alone, but rather by interactions between many neural substrates. Furthermore, studies of reward-related neural responses have revealed BOLD activity in the parietal cortex, posterior and anterior cingulate, and dorsolateral prefrontal cortex.[19]

fMRI is also being used to answer questions as to how the brain and behavior are influenced by hormones and other poorly understood sensory stimuli. This, and other work regarding how neurons obtain representations of reality that enable the individual to exert influence through goal-directed behavior, will contribute to our understanding of the neurobiology of social interactions.
fMRI is the only tool that enables researchers to investigate neural activations over large portions of the brain with minimal extraneous or adverse effects on the subject. It will therefore continue to play a crucial role in discovering the neural mechanisms of reward.

The motivation for this project is to distinguish loci of reward magnitude processing from other substrates of goal-directed learning. Using animal subjects provides for the flexibility of intracranial self-stimulation and extended fMRI sessions. Electrical stimulation of the MFB is a robust substitute for natural reinforcers in the context of animal conditioning paradigms. The parameters of this stimulation are controlled by the experimenter and its results are thought to be unadulterated by appetitive or emotional factors. Evaluation and recall of rewards are complex processes involving many substrates, and by examining the real-time BOLD signal changes in anesthetized and eventually in awake subjects, we hope to better understand the physiological basis of reward processing in terms of the behavioral responses it influences.

References

1. Principles of Neural Science. Ed. E. R. Kandel, J. H. Schwartz, T. M. Jessell. 3rd ed. Appleton and Lange. Norwalk, Connecticut, 1991.
2. Berridge, Kent C. "Motivation concepts in behavioral neuroscience." Physiol Behav. 2004 Apr; 81(2): 179-209.
3. Dayan, P., Balleine, B. W. "Reward, motivation and reinforcement learning." Neuron. 2002 Oct 10; 36(2): 285-98.
4. Pearce, J. M., Bouton, M. E. "Theories of associative learning in animals." Annu Rev Psychol. 2001; 52: 111-39.
5. Stellar, E., J. R. Stellar. The Neurobiology of Motivation and Reward. Springer-Verlag New York, Inc. New York, NY, 1985.
6. Banich, M. T. Neuropsychology: The Neural Bases of Mental Function. Houghton Mifflin Company. New York, NY, 1997.
7. Demarest, R. J., C. R. Noback. The Human Nervous System: Basic Principles of Neurobiology. McGraw-Hill, Inc. 1981.
8. Kalivas, Peter W. and Mitsuo Nakamura. "Neural systems for behavioral activation and reward." Current Opinion in Neurobiology. 1999, 9:223-227.
9. Robbins, T. W. et al. "Neurobehavioral mechanisms of reward and motivation." Current Opinion in Neurobiology. 1996, 6:228-236.
10. The Rat Nervous System. Ed. George Paxinos. 2nd ed. Academic Press. San Diego, CA, 1995.
11. Wickens, Jeffery R. "Neural mechanisms of reward-related motor learning." Current Opinion in Neurobiology. 2003, 13:685-690.
12. McClure, S. M. et al. "Temporal Prediction Errors in a Passive Learning Task Activate Human Striatum." Neuron. Vol. 38, 339-346, April 24, 2003.
13. Schultz, W. et al. "A Neural Substrate of Prediction and Reward." Science. Vol. 275, 1593-1598, 14 March 1997.
14. Menon, Ravi S. "Imaging function in the working brain with fMRI." Current Opinion in Neurobiology. 2001, 11:630-636.
15. Cho, Jones and Singh. Foundations of Medical Imaging. John Wiley & Sons, Inc. 1993.
16. Ogawa et al. "Brain magnetic resonance imaging with contrast dependent on blood oxygenation." Proc. Natl. Acad. Sci. USA; 87:9868-9872 (1990).
17. Jezzard, P. Computerized Medical Imaging and Graphics; 20(6):467-481 (1996).
18. McClure, S. M. et al. "The Neural Substrates of Reward Processing in Humans: The Modern Role of fMRI." Neuroscientist. Volume 10, Number 3, 2004.
19. O'Doherty, John P. "Reward representations and reward-related learning in the human brain: insights from neuroimaging." Current Opinion in Neurobiology. 2004, 14:769-776.
20. Anderson, A. K., et al. "Dissociated neural representations of intensity and valence in human olfaction." Nature Neuroscience. 6: 196-202.
21. McClure, S. M. et al. "Temporal Prediction Errors in a Passive Learning Task Activate Human Striatum." Neuron. Vol. 38, 339-346, April 24, 2003.

Chapter 2 Methods

2.1 Implantation of Stimulating Electrode

Lewis rats between 250 and 275 grams were chosen for both Part 1 and Part 2.
The silver unipolar electrodes consist of a 0.10 mm diameter stimulating wire and 0.05 mm diameter grounding wire, both Teflon-coated, housed in polyethylene tubing and connected by a plug. The coating was stripped from the grounding wire, but not from the electrode. The rats were anesthetized and the stimulating electrodes were lowered toward the left MFB by stereotaxic surgery using standard anatomical co-ordinates (2.2 mm caudal, 1.5 mm lateral to the bregma, 8.5 mm ventral to the dura). Six beryllium copper screws were implanted in the skull to provide purchase for the dental cement that secured the electrode. One additional screw was placed over the right visual cortex for current return. The rats were allowed at least five days to recover from surgery before behavioral training.

2.2 Operant Training

The configuration of the stimulation set-up is diagrammed in Figure 2.3. For Part 1, the operant chamber was outfitted with a single 2.5 cm nose-poke device from Med Associates. Stimulation was delivered and current was regulated by an Iso-Stim stimulus generator that received parameter-modulating input from a computer running custom-written software called Operate and Operatesequencer (Metrowerks CodeWarrior) for Part 1 and Part 2 respectively. A second nose poke was added on the same panel as the first for Part 2 in order to implement a choice-based experimental protocol.

Figure 2.3 SCHEMATIC REPRESENTATION OF STIMULATION SETUP. [Diagram labels, partly recoverable: the stimulus generator provides current amplitude control, generates the stimulus according to the operant program's parameters and sends it to the subject; a second component (label unreadable) sets and controls the stimulus and tells the generator to send when a response is registered by the nose poke device.]

2.2.1 Operant Training: Part 1

Rats were exposed to stimuli consisting of 500 ms trains of 0.2 ms pulses at 50, 100 and 200 Hz.
One stimulus was delivered for each nose poke, and self-stimulation amplitudes of 0.12 to 0.30 mA were chosen to match the current threshold for minimal induced motion responses for relatively sustained poking. The rats were allowed at least three training sessions, at one session per day, prior to imaging, and at this point they had learned to nose poke vigorously, albeit at varying rates, for reward with intermittent periods of inaction. The following figure is an example of the relevant stimulation parameters printed to the screen by the Operate software during its execution.

USratio: 1
Usinterval: 5000000 microseconds
Usduration: 900000 microseconds
pulsewidth: 500 microseconds
pulseinterval: 10000 microseconds

Figure 2.4 OPERATE PROGRAM PARAMETERS. US stands for unconditioned stimulus, and 1 is the number of responses required to obtain a reward.

2.2.2 Operant Training: Part 2

With the addition of the second nose poke device, a new program called Operatesequencer was introduced. It identified the nose pokes as Input 1 and Input 2 and assigned them alternating roles as a reference frequency that remained constant throughout the trial, and a variable comparison frequency. The functionality of either device would be randomly chosen at the beginning of each trial and would reverse halfway through the trial. The pulse frequencies of the variable stimulation were determined by a spacing function that randomly chose frequencies above and below a set mean comparison frequency at 0.05 log unit (or 12%) intervals. The parameters of the stimulation, including pulse interval, train duration and train interval, were also controlled through the program. However, current and the pulse length of 0.1 msec were determined by the stimulus generator. The following figure shows examples of the stimulation parameters printed to the screen by the Operatesequencer software during its execution of a variable trial.
trainduration: 1.0 sec
traininterval: 1.5 sec
reffrequency: 200 Hz
compfrequency: 200 Hz
compspacing: 1.1200
numtrials: 11
trialduration: 300 sec
trialinterval: 320 sec
randomtrials: 1
repeatfirst: 1
midtrialswitch: 1
primingtrain: 1
seed: 2530
compfreqlist: [1x11 double]
compidentity: [0 0 1 1 1 0 1 1 0 1 1]

Figure 2.5 OPERATESEQUENCER PARAMETERS. "1's" refer to activation of the parameters they apply to. This means that the variable comparison frequency trials are randomized, that the first of the 11 trials is repeated, that during each trial the functionality of the nose poke devices switches mid-trial, and that at the beginning of each trial, the rat is given a free reward. Compidentity determines which of the inputs (0 or 1) is defined as the reference, and the seed is the random number that determines this vector.

Operant conditioning entailed shaping sessions followed by training sessions. Data was collected for one session per day. Subjects were stimulated at various current amplitudes to determine the highest value that would not cause an involuntary movement. During shaping sessions, 200 Hz was used for the mean comparison value as well as the reference, and the spacing ratio was set at 1. Shaping sessions consisting of six trials of 600 second duration at 620 second intervals were then conducted at this current to gauge the responsiveness of the subjects to the reward. Rats that poked at an average rate lower than 15 responses per minute were deemed unresponsive, or not rewarded.

Rats that maintained adequate response rates began a series of training sessions with reference frequencies that were iteratively varied in order to determine the saturation frequency. This required that the spacing factor be set to 1.12 (Figure 2.5). At this point, the train interval was increased to 1.5 seconds while the train duration remained at 1 second, effectively adding a blackout period limiting the number of rewards the rat could obtain.
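The spacing function and the blackout period described above can be sketched as follows. This is a minimal illustration, not the Operatesequencer code: the function names are hypothetical, the list is assumed to be centred geometrically on the mean comparison frequency, and the train interval is assumed to be measured from train onset to train onset. The source's ordering under a given seed is not reproduced here.

```python
import random


def comparison_frequencies(mean_hz, spacing=1.12, num_trials=11):
    """Frequencies spaced by a constant ratio and centred on mean_hz.

    A ratio of 1.12 is roughly 0.05 log10 units (10 ** 0.05 is about 1.122).
    Centring the list on the mean is an assumption made for this sketch.
    """
    half = (num_trials - 1) / 2
    return [mean_hz * spacing ** (k - half) for k in range(num_trials)]


def randomized(frequencies, seed):
    """Shuffled copy of the list; Python's generator stands in for the
    original program's random number source, so orderings will differ."""
    shuffled = list(frequencies)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def max_rewards(trial_duration_s, train_interval_s):
    """Upper bound on rewards per trial, assuming one train per interval."""
    return int(trial_duration_s // train_interval_s)


if __name__ == "__main__":
    freqs = comparison_frequencies(200.0)
    print([round(f, 1) for f in freqs])
    print([round(f, 1) for f in randomized(freqs, seed=2530)])
    print(max_rewards(trial_duration_s=300, train_interval_s=1.5))
```

Under these assumptions, eleven frequencies at a 1.12 ratio span a factor of about 3.1 from lowest to highest, and a 300 second trial with a 1.5 second train interval caps the rat at 200 rewards.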
Sessions consisted of eleven 10 minute trials at first in order to completely acclimatize the rats to the two-nose poke system and to the training environment. After at least 3 trials, or until behavior became reasonably reproducible, trials were shortened to 5 minutes.

2.3 Imaging

In preparation for imaging, animals were anesthetized with 1.5-2.0% isoflurane or halothane, tracheostomized, and placed on mechanical ventilation (Harvard Apparatus). Body temperature was maintained at 37 °C with a heated water pad (Gaymar). Each rat was secured in a custom-made holder including a bitebar and earbars. Anesthesia was adjusted to 1% for imaging. Throughout each trial, animals were monitored using a transcutaneous blood-gas analyzer (Radiometer TCM3) or pulse oximeter (Nonin 8600MV). Imaging was done with a 4.7 T, 30 cm inner bore diameter horizontal magnet, which was controlled by an AVANCE console (Bruker Instruments) with Paravision Imaging Software and was equipped with a 12 cm ID triple axis gradient set (26 G/cm maximum). Signal was transmitted and received with a surface coil consisting of a copper wire loop and etched circuit board that was positioned over the head, around the electrode, of each rat. High resolution anatomical images of the forebrain, including the electrode implantation site, were acquired for each rat using a gradient echo FLASH sequence with TE and TR of 15 and 2000 ms respectively, 256 x 256 matrix, 3 x 3 cm field of view, and a slice thickness of 1 mm. The sessions consisted of cycles of stimulation periods followed by longer rest periods, during which single-shot gradient echo EPI sequences were used for standard BOLD imaging. Image matrices of 64 x 48 pixels were acquired using TE/TR of 20/2000 ms, bandwidth 100 kHz, field of view 3.2 x 2.4 cm and slice thickness of 1 mm. Imaging volumes consisted of 8-12 consecutive slices centered over the somatosensory cortex.
2.3.1 Functional Imaging: Part 1

During image acquisition, animals were stimulated with the same pulse train parameters that were used to determine response rates during training, both at full and at half current amplitudes, in separate experiments. Rest periods of 30 seconds alternated with stimulation periods lasting 20 seconds, with pulse trains delivered at a frequency of 1 Hz. Eight complete cycles of stimulation and rest periods were delivered per imaging trial, for a total of 200 images.

2.3.2 Functional Imaging: Part 2

During image acquisition, the responsive rats were stimulated at 0.5 Hz at the same current amplitude and pulse train parameters used during training. Each rat received 100 second long cycles with a 10 second stimulation period and 90 seconds allowed for decay of residual effects of the stimulation. These cycles were performed at the saturation frequency, and frequencies above and below that value as presented in the following table. In order to confirm that the brain was responding to electrical stimulation despite anesthesia, 2 mA shocks to the paw were delivered while functional images were acquired. Unresponsive rats also received this paw shock, in addition to being imaged while receiving 200 Hz, 0.5 second trains of 0.1 millisecond pulses at 1 Hz and a current that did not elicit involuntary movements.

2.4 Imaging Data Analysis

Analysis of imaging data was performed with Matlab v.6 (Mathworks) running in-house processing routines. Regions of significant activation were identified by correlation with stimuli, using a t-test criterion (uncorrected p < 10^-5), and superimposed on corresponding EPI anatomical maps. For Part 1, areas of reproducibly signal-correlated BOLD activation were selected and plotted as functions of time. For Part 2, areas of signal-correlated activation at the saturation frequency were selected. The percent signal change for these voxels was then plotted for frequencies at, above and below the saturation value.
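The Part 1 timing and the correlation analysis described above can be sketched in a few lines. The in-house Matlab routines are not shown in the text, so everything here is an assumption made for illustration: the function names, the choice of a plain boxcar regressor with no hemodynamic delay, the rest-first ordering within each cycle, and the synthetic voxel signal. Python is used in place of Matlab only to keep the sketch self-contained.

```python
import math


def boxcar(n_cycles=8, rest_s=30, stim_s=20, tr_s=2):
    """Stimulus regressor: 0 during rest, 1 during stimulation, one value
    per image. Rest-first ordering is an assumption of this sketch."""
    rest_n, stim_n = rest_s // tr_s, stim_s // tr_s
    return ([0] * rest_n + [1] * stim_n) * n_cycles


def correlation_t(signal, regressor):
    """Pearson correlation r and its t statistic, t = r*sqrt((n-2)/(1-r^2))."""
    n = len(signal)
    mean_s = sum(signal) / n
    mean_r = sum(regressor) / n
    cov = sum((s - mean_s) * (x - mean_r) for s, x in zip(signal, regressor))
    var_s = sum((s - mean_s) ** 2 for s in signal)
    var_r = sum((x - mean_r) ** 2 for x in regressor)
    r = cov / math.sqrt(var_s * var_r)
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return r, t


def percent_signal_change(signal, regressor):
    """Percent change of each sample relative to the mean rest signal."""
    rest = [s for s, x in zip(signal, regressor) if x == 0]
    baseline = sum(rest) / len(rest)
    return [100.0 * (s - baseline) / baseline for s in signal]


if __name__ == "__main__":
    design = boxcar()
    print(len(design))  # number of images in one imaging trial
    # Synthetic voxel: baseline 100, +3 during stimulation, small ripple.
    voxel = [100 + 3 * x + 0.5 * math.sin(i) for i, x in enumerate(design)]
    r, t = correlation_t(voxel, design)
    print(round(r, 3), round(t, 1))
```

With the stated timing (eight cycles of 30 seconds rest and 20 seconds stimulation, TR of 2000 ms) the regressor has 200 samples, matching the 200 images per trial reported above. Converting the t statistic to the p < 10^-5 criterion would need a t-distribution function, which is left out of this sketch.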
Additional data processing and visualization were performed using Matlab and Adobe Creative Suite.

Chapter 3 Results

Data relating to both behavioral training and functional MRI experiments reveal insights into the effectiveness and rewarding character of intracranial self-stimulation of the rat MFB. Electrical stimulation, whose parameters are determined by operant conditioning in both Part 1 and Part 2, elicits a distributed hemodynamic response that is found reproducibly for areas implicated in neural reward processing.

3.1 Part 1

3.1.1 Operant Training Results

After at least five days of recovery from surgery, MFBL9, MFBL11 and MFBL12 were subjected to behavioral training in a Skinner-type box, which was outfitted with one nose poke device as described in the Methods. Current amplitude was iteratively adjusted to just under the threshold for inducing involuntary movements upon stimulation, and various current amplitudes lower than this maximum value were also used during conditioning for response (nose poke) rate comparison. Electrode placements were determined to be rewarding if the rat acquired nose poking behavior at the selected current amplitude. Plots of response total as a function of time show that the general trend is for increased responding at higher amplitudes (Figure 3.1). For MFBL9 and MFBL11, whose response rates were two orders of magnitude greater than that of MFBL12, the relation between response rate and current amplitude is decidedly nonlinear, with response rate increasing by an order of magnitude for fractional increases in current amplitude. MFBL12 did not respond as much or for as long as the other subjects, but was still considered to have acquired nose poking behavior. Although there is no clear trend of responses vs. time, the highest current stimulus did elicit the most sustained nose poking.
Factors that may have contributed to these disparate response rates include the placement of the electrode and the varying sensitivity of the individual subjects to the 100 Hz frequency that was consistently applied.

3.1.2 Imaging Results

Following behavioral training, BOLD fMRI under isoflurane anesthesia was used to characterize the neural response to pulse trains of electrical stimulation with parameters similar to those used during operant conditioning. In addition, high resolution gradient echo imaging showed that the electrode tips are positioned slightly outside of the dorsal extent of the MFB (Figure 3.3E, plate 3). Maps of the voxels containing statistically significant (p-value < 10^-5) signal increases due to stimulus modulation were then superimposed over the EPI slices acquired through the rostral half of the brain during functional imaging for localization of activity (Figures 3.2 and 3.3). Three brain regions showed consistent activation in all of the subjects to varying degrees. By correlation with a stereotaxic atlas and high resolution images, these were determined to be: 1) the somatosensory and motor cortex (S1/M1), 2) the striatum and orbital cortex (St/OC) and 3) the central, or sagittal, sinus (SS). The BOLD activation results, in terms of percentage of voxel activation, correlate with the behavioral trends. Significantly fewer voxels are identified as statistically significant when the lower stimulation amplitude is used (Figure 3.2). The regions significantly modulated by the lower current stimulus are the most prominent at the higher current stimulus as well. These results mirror the behavioral data, which showed less robust responding at lower current amplitudes. The timecourses of stimulus response vary between the three regions (Figure 3.4). Modulation of signal in the central sinus is commonly observed in rat fMRI studies, and may reflect an overall increase in blood perfusion due to the presence of a stimulus.
BOLD signal changes in this region are relatively uniform for the eight stimulation cycles. Like the response in the S1/M1, on average the signal changes in the SS peak more quickly after stimulation begins, and decay to baseline more slowly after stimulus offset, when compared with the St/OC. However, percent signal changes from baseline for the S1/M1 are greater for the first four stimulation epochs than the last four. The average peak signal change for all three regions is about 3%.

3.1.3 Part 1 Figures

[Plot panels: MFBL9 Responses to Various Current Amplitudes; MFBL11 Responses to Various Currents; x-axis: Time (milliseconds)]

Figure 3.1 RESPONSE TOTALS FOR (A) MFBL9, (B) MFBL11 AND (C) MFBL12 for various current levels over the time that the subject was responding with nose pokes during 1 hour sessions. Plots with smaller lengths and regions of plots that are essentially horizontal lines indicate periods of unresponsiveness.

Figure 3.2 (A) PERCENT ACTIVATED VOXELS (p < 10^-5) across the brain averaged for all three rats. Example of one EPI slice stimulated at full current (B) and half current levels (C). Activation increases with current as a nonlinear function which is spatially qualitatively similar.

Figure 3.3 (A-C) BOLD ACTIVATION P-VALUE MAPS. The red, blue and green arrows indicating loci of reproducibly stimulus-correlated BOLD activation refer to the sagittal, or central, sinus (SS), the striatal/orbital cortex area (St/OC) and the somatosensory and motor cortex (S1/M1) respectively. The locations of these regions of interest are further illustrated by the color-coded encircled areas in the rat brain atlas plates of panel E. (D) High resolution scan of the rat in panel A. The numbered slices correspond with the atlas plates in E. (E) Atlas plates with color-coded indicators of activated regions and electrode tip positions.
[Plot axes: time; peristimulus time]

Figure 3.4 PERCENT SIGNAL CHANGE TIMECOURSES for all cycles for one subject (left). Corresponding time points within these cycles were then averaged to find the mean percent signal change (right). Blue, green and red traces correspond to the colored arrows of Figure 3.3. The gray regions indicate stimulation periods, and the white are rest periods.

3.2 Part 2

3.2.1 Operant Training Results

Work by Peter Shizgal and Charles R. Gallistel has shown that the rewarding quality of intracranial electrical stimulation increases with increasing frequency, but tends to saturate after a certain point.[1] We hypothesized that by investigating the BOLD response to stimuli with frequencies below, at and above saturation values, we could distinguish between activation resulting from the rewarding aspect of the stimulus and other, secondary effects. In order to determine the saturation frequencies of the subjects in this second phase of the study, a second nose poke device was added to the operant training chamber. The nose pokes were assigned alternating roles as the reference and comparison stimulus. The current amplitude was set at the beginning of training by iteration to be just below the threshold for inducing involuntary movements.

3.2.1.1 Shaping Sessions

As stated in the Methods, the behavioral protocol for Part 2 involved shaping trials to determine receipt of rewards, followed by variable trials to ascertain saturation frequencies for individual subjects. By the third shaping trial, there was at least an order of magnitude difference between response rates for rats, which were thereafter classified as rewarded or not rewarded (Table 3.1).
For the last shaping session, the average response rates of rewarded and not rewarded rats were 39.58 responses per minute and 0.87 responses per minute respectively. High resolution anatomical scans produced with gradient echo imaging showed that, in general, the electrode tips of rewarded rats were closer and more dorsal to the MFB (Figure 3.5). In contrast, electrode tips for not rewarded rats were positioned more medially (Figure 3.6).

3.2.1.2 Variable Sessions

Rats that demonstrated reward seeking behavior were advanced to variable training, which began with the reference set at 200 Hz. The reference was increased until the response rates for the reference frequency stimuli and for comparison stimuli above the reference frequency were approximately equivalent (see Figures 3.9, 3.10 and 3.11). MFBL17's response and reward receipt totals for sessions with the reference set to 200 Hz show that over time the subject learns to differentiate more consistently between higher and lower frequencies, seeking more of the more rewarding stimuli. Its preference for the higher frequency, that is, its ability to differentiate between the rewarding quality of the two frequencies, weakened as the difference between the higher and lower frequency decreased. Its preference, or the ratio of reference frequency rewards to comparison frequency rewards, continues to change for the entire range (159 Hz - 495 Hz) of comparison frequencies (Figure 3.7). In contrast, for the session where the reference was set above 297 Hz, the reference to comparison preference ratio is essentially 1:1 for comparison frequencies above the reference value, and is very close to 1:1 for the entire range (Figure 3.8, bottom row). Variable training proceeded in a similar pattern for MFBL19 and MFBL24, although the rats performed at different but self-consistent response rates.
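The preference ratio described above, and the stopping rule of raising the reference until the ratio is roughly 1:1 for comparison frequencies above it, can be sketched as follows. This is only an illustrative sketch: the function names, the data layout and the 15% tolerance are assumptions made for the example, and the reward totals in the usage note are invented, not taken from the study.

```python
from typing import Dict, Tuple

# Hypothetical data layout (not from the thesis):
# comparison frequency in Hz -> (reference rewards, comparison rewards)
RewardTotals = Dict[float, Tuple[float, float]]


def preference_ratios(totals: RewardTotals) -> Dict[float, float]:
    """Ratio of reference rewards to comparison rewards per comparison frequency."""
    return {
        freq: ref / comp
        for freq, (ref, comp) in totals.items()
        if comp > 0  # skip frequencies with no comparison rewards (ratio undefined)
    }


def reference_is_saturated(
    totals: RewardTotals, reference_hz: float, tolerance: float = 0.15
) -> bool:
    """True if the ratio stays within `tolerance` of 1:1 for every comparison
    frequency above the reference. The tolerance is an assumed value."""
    ratios = preference_ratios(totals)
    above = [r for freq, r in ratios.items() if freq > reference_hz]
    return bool(above) and all(abs(r - 1.0) <= tolerance for r in above)
```

With invented totals such as `{250.0: (80.0, 40.0), 350.0: (60.0, 58.0), 400.0: (61.0, 60.0)}` and a 300 Hz reference, the ratios above the reference stay close to 1 and the check passes. A session in which the subject still favors higher comparison frequencies would fail it, suggesting the reference should be raised further.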
Parameters for the imaging procedure (Table 3.5) were determined by the subjects' performance in five averaged trials at the determined saturation frequency (Figures 3.9, 3.10 and 3.11).

3.2.3 Imaging Results

As in Part 1, high resolution anatomical images were used to determine the positions of subjects' electrode tips. For rats that did not demonstrate reward-seeking behavior, electrode tips tended to be more medially positioned than those of rewarded rats. Rewarded rats' electrode tips were all within 1 mm of the midline of the MFB. The fMRI sessions consisted of cycles of stimulation periods followed by longer rest periods, during which single-shot gradient echo EPI sequences were used for standard BOLD imaging. Imaging volumes consisted of 8-12 consecutive slices centered over the somatosensory cortex. All three subjects had stimulation-induced BOLD activation; however, larger areas of more statistically significant signal changes occurred in MFBL17 and MFBL24 than in MFBL19. Both MFBL17 and MFBL24 had activation in the central sinus (CS), the somatosensory/motor cortex (S1) and the caudate putamen (or striatum) with surrounding ventral forebrain (CPu), but MFBL24's BOLD response was more robust. MFBL17's scans (Figure 3.12) showed a greater response in terms of percentage of voxel activation at 300 Hz and 500 Hz than at 175 Hz, while the responses at 300 Hz and 500 Hz were similar in extent to each other. In Part 1, it was shown that a higher current amplitude could create greater overall BOLD activation, and these images suggest that pulse frequency - the only variable parameter for this experiment - can achieve a qualitatively similar effect. Thus the BOLD response appears to match the behavior in that it increases up to the saturation point, but not beyond. MFBL24 was chosen to examine this phenomenon in more detail because its p-value maps reveal specific areas with consistent and exceptional BOLD activation over time (Figure 3.14).
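The "percentage of voxel activation" used to compare stimulation frequencies can be illustrated with a short sketch. It assumes that a voxel counts as activated when its p-value falls below a fixed threshold; the default of 10^-5 is taken from the threshold that appears to be quoted in the Figure 3.2 caption, and the function name, the optional brain mask and the flat list layout are hypothetical choices for this example rather than the analysis code used in the study.

```python
from typing import Optional, Sequence


def percent_active_voxels(
    p_values: Sequence[float],
    threshold: float = 1e-5,  # assumed significance cutoff
    brain_mask: Optional[Sequence[bool]] = None,
) -> float:
    """Percentage of (masked) voxels whose p-value is below `threshold`."""
    if brain_mask is None:
        # Without a mask, every voxel is treated as lying inside the brain.
        brain_mask = [True] * len(p_values)
    if len(brain_mask) != len(p_values):
        raise ValueError("brain_mask and p_values must have the same length")

    in_brain = [p for p, keep in zip(p_values, brain_mask) if keep]
    if not in_brain:
        raise ValueError("no voxels inside the mask")

    active = sum(1 for p in in_brain if p < threshold)
    return 100.0 * active / len(in_brain)
```

Applying the same function to the p-value maps from each stimulation frequency would give one percentage per condition, which is the kind of comparison described above for the 175 Hz, 300 Hz and 500 Hz scans.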
Signal change as a function of time was plotted for voxels significantly activated in MFBL24's CS, S1 and CPu regions at 265 Hz, and then averaged over eight stimulus-rest cycles (Figures 3.16 - 3.18). These voxels were then used to create corresponding signal change timecourses for the 155 Hz and 400 Hz stimulation sessions. For these voxels, there is a clear increase in the percent signal change during the 10 second period when trains of electrical pulses are being applied, with a subsequent decrease to near baseline within 12 seconds. Signal changes for the CS are an order of magnitude larger than for the S1 or the CPu, which is to be expected since the volumetric blood flow in this large vessel is much greater than in brain regions where blood vessels are fine and diffuse. Unlike the timecourse results of Part 1, decay of signal post-stimulus is about the same for the three regions. The more striking difference between the CS plots and the S1 or CPu plots is the relative peak signal change of the timecourses. For both the S1 and CPu, the 265 Hz (reference) and 400 Hz plots are similar, and notably larger than the 155 Hz response. The CS response shows roughly equal peak signal changes for the 400 Hz and 155 Hz stimuli, and smaller relative differences between the timecourse plots for the three stimulus types. These results correlate with the behavioral studies, which showed that MFBL24 did not prefer 400 Hz stimuli over 265 Hz stimuli, but preferred both to 155 Hz stimuli.

3.2.2 Part 2 Figures

Table 3.1 TOTAL RESPONSE (NOSE POKE) PER TRIAL VALUES of each of the experimental subjects for the three shaping sessions averaged over the 6 trials that comprised each session. Response rates for trainable rats (highlighted in blue) are orders of magnitude larger than those for untrainable rats.

Figure 3.6 ELECTRODE TIPS OF UNRESPONSIVE RATS: MFBL18 (blue), MFBL20 (purple), MFBL21 (orange), MFBL22 (yellow), and MFBL23 (red).
The positions of these electrode tips are clearly more medial than those of the trainable subjects.

Figure 3.5 ELECTRODE TIPS OF REWARDED RATS: MFBL17 (red), MFBL19 (yellow), and MFBL24 (blue). The positions of these electrode tips are all within 1 mm of the midline of the MFB.

[Figure 3.7 plots. Panels are titled by session date; x-axis: Comparison Frequency.]

Figure 3.7 MFBL17's RESPONSE AND REWARD TOTALS FOR 200 Hz REFERENCE SESSIONS over three consecutive days. The subject's nose poking is so vigorous that it is responding faster than the 1.5 second reward stimulus interval. Over time, MFBL17 more consistently chooses the reference or comparison stimulus in numbers proportional to the difference between the comparison and reference frequencies.

[Figure 3.8 plots. Panel titles (reference frequency): 211 Hz, 265.5 Hz, 297.4 Hz, 333.1 Hz; x-axis: Comparison Frequency.]

Figure 3.8 MFBL17's VARIABLE TRIAL RESPONSE AND REWARD TOTALS for sessions with the reference set at the frequencies indicated. As the reference frequency increases, the difference between comparison and reference rewards or responses decreases, especially for comparison frequencies greater than the reference. For instance, with the highest reference (333.1 Hz), the reference-comparison ratio does not change at all for values above the reference, and is about 1 for almost the whole range.
[Figure 3.9 plot. Title: "MFBL17: 297.4 Reference Trial Average"; x-axis: Comparison Frequency.]

Figure 3.9 MFBL17's TRIAL REWARD TOTALS averaged over five sessions.

Table 3.2 VALUES USED TO CREATE FIGURE 3.9.
[Column headers: frequency, comparison, comparison error, reference, reference error. Legible entries: 105.4 | 6.50; 200 | 39.8 | 6.55.]

[Figure 3.10 plot. Title: "MFBL19: 333.1 Reference Trial Average"; x-axis: Comparison Frequency.]

Figure 3.10 MFBL19's TRIAL REWARD TOTALS averaged over five sessions.

Table 3.3 VALUES USED TO CREATE FIGURE 3.10.
[Column headers: frequency, comparison, comparison error, reference, reference error. Legible entries, in extraction order:
10.78, 5.24, 4.63, 3.40, 5.16, 8.33, 6.70, 10.04, 4.02, 4.17;
149.0, 103.2, 84.2, 139.2, 79.2, 87.6, 123.2, 107.2, 105.6, 89.8;
11.10, 5.07, 21.43, 3.78, 19.38, 7.91, 6.26, 9.61, 4.32, 4.05;
395 | 92.2.]

[Figure 3.11 plot. Title: "MFBL24: 265.5 Reference Trial Average"; x-axis: Comparison Frequency.]

Figure 3.11 MFBL24's TRIAL REWARD TOTALS averaged over five sessions.

Table 3.4 VALUES USED TO CREATE FIGURE 3.11.
[Column headers: frequency, comparison, comparison error, reference, reference error. Legible entries, in extraction order:
15.27, 5.55, 6.43, 3.60, 5.18, 5.81, 5.14, 7.88, 9.57, 2.78;
13.08, 7.27, 3.96, 11.33, 2.66, 6.84, 5.13, 10.14, 5.56, 6.00;
58.4; 442 | 78.0.]

Table 3.5 STIMULATION PARAMETERS THAT VARIED BETWEEN SUBJECTS. Trainable subjects were stimulated at 0.5 Hz with trains of frequencies approximating the saturation value, and two values below and above this value. Untrainable subjects were all stimulated at 200 Hz.

Figure 3.12 ANATOMICAL AND P-VALUE MAPS OF MFBL17. The 300 Hz and 500 Hz p-value maps have a greater percentage of activated voxels. (*) denotes the electrode tip position.

Figure 3.13 ANATOMICAL AND P-VALUE MAPS OF MFBL19. High resolution gradient echo imaging was used to generate the anatomical image, while EPI was used for the BOLD fMRI. (*) denotes the electrode tip position.

Figure 3.14 ANATOMICAL AND P-VALUE MAPS OF MFBL24.
[Figure 3.16 plot; scale bar: 20 sec; x-axis: Peristimulus Time.]

Figure 3.16 CENTRAL SINUS AVERAGED TIMECOURSES FOR MFBL24. Signal changes of voxels significantly activated by the 265 Hz stimulus are averaged over all eight stimulation cycles and plotted as a function of time. The gray and white areas represent the stimulation period and rest period respectively.

Figure 3.17 SOMATOSENSORY AND MOTOR CORTEX AVERAGED TIMECOURSES FOR MFBL24.

Figure 3.18 CAUDATE PUTAMEN AND SURROUNDING VENTRAL FOREBRAIN AVERAGED TIMECOURSES FOR MFBL24.

Reference

1. Gallistel, C. R. "The role of the dopaminergic projections in MFB self-stimulation." Behav Brain Res. 1986 June; 20(3):313-21.

Chapter 4 Discussion

We have shown that electrical stimulation of the rat's medial forebrain bundle (MFB) elicits hemodynamic responses in neural substrates that have been implicated in reward processing, as well as in the central sinus. Analysis of stimulus-correlated signal changes in these reward-related regions reveals relationships between relative peak activation and stimulus frequency. No such pattern is discernible in the signal changes due to bulk hemodynamic responses to the stimuli in the central sinus. Data from the reward-related regions therefore suggest that they are not simply responding to the perception of a stimulus, but rather that they are processing the stimulus's rewarding quality.

Both Part 1 and Part 2 of this study were conducted as preliminary studies in preparation for the imaging of awake rats performing a task for rewarding MFB stimulation. In Part 1, three rats were trained in an operant conditioning chamber to poke their noses into a device, triggering the stimulus generator to send an electrical stimulus of predetermined parameters to their implanted electrode. Part 2 involved the addition of a second device to implement a choice-based experimental design, for the purpose of determining a saturation frequency above which subjects could not detect changes in the reward magnitude of stimuli.
The electrodes were lowered toward the MFB of each rat with varying levels of success. The subjects - both in Part 1 and Part 2 - tended to be receptive to rewards in terms of behavior and BOLD imaging when the electrode tip was positioned within a millimeter of the midline of the MFB. The MFB connects the hypothalamus with the ventral tegmental area, where dense dopaminergic projections arise to innervate the striatum.1 MFB stimulation is thought to be rewarding because of the phasic dopamine release activated by externally introduced electrical currents. This artificial stimulation utilizes the native physiology of the dopamine system, which plays an important role in goal-directed learning.

Early behavioral experimentation showed that higher pulse frequencies (assuming a square wave stimulus) and current amplitudes make for more rewarding stimuli, up to a point.2 At a certain frequency, the dopaminergic neurons reach their maximum response level, and above a particular current amplitude, the brain cannot withstand the electrical potential without damage or undesired, involuntary movements. Furthermore, in a study of variable frequencies combined with variable current levels, Gallistel showed that reward magnitudes depend on the rate at which action potentials are generated and on the size of the population of reward-relevant axons in which they are generated.3 Therefore, the maximum current magnitude and pulse frequency for a given experimental subject, as well as the rewarding quality of the stimulus that the subject evidences through its behavior, strongly depend on the unique placement of its stimulation electrode. fMRI can then be used to isolate areas of BOLD activation caused by stimuli whose rewarding qualities have already been determined from individual-specific behavioral data, allowing the experimenter to make correlations between activated regions and reward processing.

Figure 5.1 SAGITTAL SECTION OF THE RAT BRAIN WITH INPUTS (A) AND OUTPUTS (B) OF THE MFB.
Relevant substrates include the amygdala (AMYG), caudate putamen (CPU), frontal cortex (FC), substantia nigra (SN) and ventral tegmental area (VTA).4

Operant conditioning results for this study showed that response rates for rats were generally greatest for higher currents and frequencies. Results in Part 1 established that rewarding MFB stimulation elicits a distributed hemodynamic response in the anesthetized rats that is reduced for lower current stimuli. During imaging stimulation cycles, reproducibly activated loci included the striatum and orbital cortex and the somatosensory/motor cortex. It is thought that the somatosensory/motor cortex sends corticostriatal afferents to the striatum, and that the orbitofrontal cortex plays a role in perceiving or evaluating reward magnitudes. The striatum-orbital cortex region (St/OC) is ideally located to store the reward value of sensory stimuli, and in fact St/OC neurons in rats have been shown to respond preferentially to different tastes. Such gustatory studies (see Figure 1.8) indicate that this region codes for the rewarding quality of stimuli, rather than their sensory aspects.5

In Part 2, the striatum, or caudate putamen, with adjacent ventral forebrain (CPu) and the somatosensory cortex were also exceptionally stimulus-modulated. Analysis of the timecourses for these areas showed that peak mean signal changes were similar for saturation and above saturation frequencies, but lower for below saturation frequencies. In contrast, the timecourse for the central sinus, which reflects generic hemodynamic responses to the stimuli, did not follow the same pattern. Studies such as Samuel McClure's fMRI experiments with temporal prediction errors (see Figure 1.7)6 have implicated the striatum (the CPu in rodents) in reward evaluation processes. In that human study, positive prediction errors caused increased activity in the left putamen.
This evidence adds to a body of knowledge asserting that the striatum plays a pivotal role in associating varying levels of reward with sensory information.

Although the areas with BOLD activation in Part 1 and Part 2 contain neural substrates that have been consistently implicated in reward processing, the regions activated in Part 1 tended to be more ventral than those in Part 2. The distributed neural network of reward includes many substrates that interact in a complex fashion unique to stimulus type, and the resolution of the BOLD imaging results is not fine enough to pinpoint the individual structures responsible for stimulus-correlated responses. Given a much larger sample of reward-seeking subjects with robust BOLD responses, a more complete and accurate map of spatial and temporal activation profiles during stimulation could be constructed. BOLD fMRI could then be combined with substrate- or process-specific contrast agents, and activation profiles could be directly correlated with individual brain structures.

Using the operant conditioning protocol applied for this study, rats can be trained to seek rewards, and their optimal stimulation parameters can be determined. These experiments have therefore shown that it is reasonable to expect that the surgical, behavioral and imaging procedures used here will contribute to reliable results in studies with awake rodents. Results of awake imaging will then be compared with these studies of anesthetized subjects, hopefully leading to insights into the real-time spatial activation of experimenter-administered stimulation versus intracranial self-stimulation, as well as the physiological basis of reward-related learning.

References

1. Wise, R. A. "Dopamine, Learning and Motivation." Nature Reviews Neuroscience. Vol. 5, June 2004.
2. Gallistel, C. R. and Matthew Leon. "Measuring the Subjective Magnitude of Brain Stimulation Reward by Titration with Rate of Reward." Behavioral Neuroscience. 1991, Vol. 105, No. 6, 913-925.
3. Gallistel, C. R. "The role of the dopaminergic projections in MFB self-stimulation." Behav Brain Res. 1986 June; 20(3):313-21.
4. Stellar, James R. and Eliot Stellar. The Neurobiology of Motivation and Reward. © 1985 by Springer-Verlag New York, Inc.
5. O'Doherty, John P. "Reward representations and reward-related learning in the human brain: insights from neuroimaging." Current Opinion in Neurobiology. 2004, 14:769-776.
6. McClure, S. M. et al. "Temporal Prediction Errors in a Passive Learning Task Activate Human Striatum." Neuron. Vol. 38, 339-346, April 24, © 2003 by Cell Press.