A Functional MRI Study of the Distributed Neural
Circuitry of Learning and Reward
Alexandra F. Awai
Submitted to the
Department of Nuclear Science and Engineering
In Partial Fulfillment of the Requirements for the Degrees of
Bachelor of Science and Master of Science
at the
Massachusetts Institute of Technology
May 20, 2005
© Massachusetts Institute of Technology
All Rights Reserved
Signature of Author: Department of Nuclear Science and Engineering, May 2005
Certified by: Matthew Colonnese
Postdoctoral Associate, McGovern Institute for Brain Research, Thesis Reader
Certified by: Alan Jasanoff
Assistant Professor, Department of Nuclear Science and Engineering, Thesis Supervisor
Accepted by: Jeffrey Coderre
Chair, Department Committee on Graduate Students
A Functional MRI Study of the Distributed Neural
Circuitry of Learning and Reward
by
Alexandra F. Awai
Submitted to the Department of Nuclear Science and Engineering
May 20, 2005
In Partial Fulfillment of the Requirements for the Degrees of
Bachelor of Science and Master of Science
Abstract
The aim of this research project was to study the neural substrates involved in processing
rewarding stimuli. Evaluation of the magnitudes of reward is one of the fundamental
aspects of goal-directed behavior, and studies have shown that this process involves the
midbrain dopamine system. Work by C. R. Gallistel has shown that the reward magnitude
of electrical stimulation to structures within this system increases with increasing current
and frequency. In this study, operant conditioning with intracranial self-stimulation of the
medial forebrain bundle (MFB) was used to correlate the rewarding quality of a stimulus
with variations of its current amplitude (Part 1) or electrical pulse frequency (Part 2). For
Part 2, a saturation frequency, the point beyond which increasing stimulus frequency
does not elicit a more vigorous operant response, was established for each of the
responsive subjects. Functional magnetic resonance imaging (fMRI) with blood
oxygenation level dependent (BOLD) contrast was then used to evaluate the brain
activation in response to behaviorally characterized electrical stimuli. High resolution
anatomical images revealed that subjects with electrode tips positioned within 1 mm of
the midline of the MFB tended to demonstrate reward-seeking behavior. Timecourses
were plotted for imaging voxels in areas exhibiting BOLD responses in Part 1 and Part 2.
In Part 1, the BOLD timecourse in the striatum/orbital cortex region - which has been
implicated in reward processing - had different time-evolution characteristics than the
central sinus, which is thought to reflect general hemodynamic responses to stimuli.
Additionally, the activated regions were qualitatively similar for varying currents, but
lower current amplitude led to a smaller percentage of active voxels. In Part 2, responses
in the somatosensory/motor cortex and striatum with adjacent ventral forebrain, which
are both thought to comprise important reward processing circuitry, were
similar for saturation and above saturation frequencies, but lower at below
saturation frequencies. These results show that BOLD imaging can be utilized to isolate
regions that code for the rewarding quality of MFB stimulation, rather than its sensory
aspects.
Thesis Supervisor: Alan Jasanoff, Ph.D.
Title: Assistant Professor, Department of Nuclear Science and Engineering
Acknowledgements
To Alan Jasanoff,
for welcoming me into his lab and helping me to make a
contribution to this research project, for which he wrote
many macros including Operate, Operatesequencer, and
MatLab functions used in analyzing the behavioral and
imaging data. His thoughtful guidance was invaluable in
my last year at the Institute.
To Matthew Colonnese,
for his technical contributions in terms of fMRI data
acquisition and analysis, as well as his unflagging patience
with my efforts to learn and write about neuroscience. His
support was a vital part of my research experience.
It has been my joyful privilege to learn from and work
with these scientists.
To the Department of Nuclear Science and Engineering,
for allowing me to explore a broad range of disciplines
while helping me to develop my knowledge of core
engineering principles. I am truly grateful for the academic
and financial support the Department has given me.
I wish to express my sincere thanks.
Table of Contents
1. Background
1.1 Motivation
1.2 Reward and Associative Learning
1.3 Neuroanatomy of Reward
1.4 The Dopamine Hypothesis of Reward
1.5 Imaging the Brain
1.6 Insights from fMRI into Brain Function
2. Methods
2.1 Implantation of Stimulating Electrode
2.2 Operant Training
2.2.1 Operant Training: Part 1
2.2.2 Operant Training: Part 2
2.3 Imaging
2.3.1 Functional Imaging: Part 1
2.3.2 Functional Imaging: Part 2
2.4 Imaging Data Analysis
3. Results
3.1 Part 1
3.1.1 Operant Training Results
3.1.2 Imaging Results
3.1.3 Part 1 Figures
3.2 Part 2
3.2.1 Operant Training Results
3.2.1.1 Shaping Sessions
3.2.1.2 Variable Sessions
3.2.2 Imaging Results
3.2.3 Part 2 Figures
4. Discussion
Chapter 1
Background
Animals are motivated to pursue behaviors for which they are rewarded. To
discover the neural basis of reward, as well as the physiological mechanism of
motivation, is therefore a primary goal of neuroscience. The neural circuitry involved in
goal-directed behavior is complex. Understanding it requires parsing and analyzing
numerous functions, such as perception of stimuli as rewarding, evaluating the rewarding
or aversive quality of stimuli, recalling these values, and reacting in a context-appropriate
fashion based on instinct or cognition.
The body is regulated by neuroendocrine, autonomic and motivational
mechanisms, the last being the most difficult to quantify, as motivation is an inferred
internal state that is used to explain behavioral variability. Motivational states are
regulated by the perceived needs of tissues (e.g. thirst and hunger), as well as anticipatory
mechanisms, hedonic factors and ecological constraints.' The holistic physiological
mechanism that underlies the behavioral manifestations of motivation and reward must
be readily influenced by endocrine, visceral and somatic afferent stimuli, and have
outputs that enable discrete behavioral control, for instance, ensuring that a hungry
individual eats rather than drinks. This mechanism is the integration of myriad neural
processes - some better understood than others - that are mediated by various regions in
the brain by virtue of the specialized cells they are composed of.
Neurophysiological studies have implicated many brain structures in learning,
motivation and reward. These biological substrates are believed to work together in a
distributed neural network of reward and motivation processes. The neural substrates for
the different aspects of goal-directed behavior have been identified by lesion, drug and
histological studies, as well as metabolic mapping and fMRI. The physiology of this
circuitry, in terms of synaptic organization and the topography of neural projections, is
well understood. However, there remain fundamental questions pertaining to the precise
functional significance of each substrate within this complex system. fMRI is uniquely
positioned to reveal pivotal insights into these questions because it is relatively
noninvasive, and it allows researchers to observe instantaneous changes in brain activity
while experimenter-regulated behavior or stimulation is occurring.
1.1 Motivation
Motivation is defined by its effect, which is to incite context-specific responses,
such as flight in the face of danger, or consuming food when the body recognizes a
nutrient deficit. Because the precise physiological mechanisms of motivated behavior are
not completely understood, motivational theories are an important aspect of neuroscience
models for explaining animals' interactions with their environment. Concepts of
motivation are used to interpret individuals' variable reactions to constant stimuli -
especially affectively important stimuli - and the directedness of goal-seeking behavior.
One of the earliest and most celebrated conceptions is that of homeostasis, the
maintenance of a stable internal state.2
Homeostasis requires a regulatory system that uses a setpoint, a predetermined
value of some physiological parameter, to maintain a stable internal state. The purpose of
this regulatory mechanism is to avoid dangerous or unwanted deviations from the
setpoint, or a narrow range about the setpoint. Although many biological processes
involve such a system, including error detectors and a negative feedback mechanism to
correct errors (e.g. insulin to modulate blood sugar concentration), learned behavior
patterns are often more complicated. It would seem that a truly homeostatic system only
exists where deviations from the setpoint are immediately lethal, or at least severely
detrimental.2 For instance, warm-blooded animals' bodies work to maintain a relatively
stable internal temperature, as the proteins and mechanical systems that sustain life do
require a particular environment. However, most physiological parameters can vary a
great deal over long periods, allowing individuals to survive under extremely variable
circumstances. Despite the seeming incompleteness of the homeostasis theory, much of
behavioral neuroscience of hunger, thirst, and other ingestive behavior has involved
searching for physiological setpoints and deficit signals.
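The setpoint scheme described above (an error detector plus a negative feedback correction) can be illustrated with a minimal sketch. The function name, the proportional gain and the temperature values are illustrative assumptions and are not taken from this study.

```python
def regulate(value, setpoint, gain, steps):
    """Proportional negative feedback toward a setpoint (illustrative only).

    At each step the error detector measures the deviation from the
    setpoint and the correction removes a fixed fraction of it.
    """
    trajectory = [value]
    for _ in range(steps):
        error = setpoint - value       # error detector
        value = value + gain * error   # negative feedback correction
        trajectory.append(value)
    return trajectory


# Hypothetical example: a body temperature of 39.0 returning toward 37.0.
trajectory = regulate(value=39.0, setpoint=37.0, gain=0.5, steps=10)
```

With a gain of 0.5 the remaining deviation halves at every step, so the regulated value approaches the setpoint geometrically.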
Homeostatic outcomes can arise without homeostatic mechanisms if stability is
maintained by anticipatory motivation or a balance achieved by opposing neural,
hormonal and behavioral forces. Anticipatory motivation refers to conditioned responses
or otherwise preemptive mechanisms that are essentially reactions to predictions of a
deficit. The aforementioned balance of forces refers to settling-point regulation.
Biopsychologist Robert Bolles claimed that the homeostatic concept of body weight was
simply a plausible fiction, and that weight is kept relatively stable by opposing
neuroendocrine and psychological reflex mechanisms that are in balance at settling
points. It is clear that there is no body weight setpoint, as this characteristic almost
always changes throughout adulthood. However, if the compulsion to eat is determined
by internal satiety, the availability and palatability of food, social circumstances and
psychology, then one can consider that a changing body weight is transiently settling
around a point that is only moderately stable. Originally, the homeostasis concept did not
explicitly involve setpoints, and so it seems that the artificial joining of homeostasis,
setpoints and error detections may be an unnecessary semantic complication on the part
of behavioral neuroscientists.
Considering homeostasis and pseudo-homeostasis is only the first step to forming
a theory of motivation. Undesirable states (such as hunger), unstable interactions of
competing internal forces (such as stress-related overeating), or the availability of a
superior state (such as physical and mental satiation), can motivate goal-seeking
behavior. The intervening variable concept of drive simplifies causal stimulus-response
(S-R) relationships. That is, an undesirable state such as water deprivation, can be
considered a stimulus or independent variable and the response (dependent variable) to
this stimulus may be to drink more water than usual, work harder for the water, or
tolerate contaminated water. However, water deprivation is not the only stimulus for
these behaviors. Rather than considering each S-R relationship as a separate mechanism,
it makes more sense to include the intervening variable, or drive, that can link many
independent variables with the dependent variables attributable to a common condition
caused by the stimuli (Figure 1.1). Furthermore, this drive concept is validated when
predictions regarding the dependent variables it may cause are proven in experiments.
[Figure 1.1 diagram: independent variables (water deprivation, NaCl injection)
linked to dependent variables (amount of water drunk, work for sip, quinine
tolerance, distance for water).]
Figure 1.1 DRIVES SIMPLIFY S-R RELATIONSHIPS. Drives
connect stimuli (independent variables) and responses
(dependent variables).
The existence of causal relationships mediated by intervening variables is
obviously a very basic concept of motivation, and in order to avoid oversimplification,
researchers have posited that truly motivated behavior requires some additional criteria.
Epstein suggested that such behavior requires that the individual demonstrate flexible
learning and coordinated appetitive behavior, actions that indicate prediction of a goal or
reward expectation and hedonic reactions. Various models of drive's involvement in
motivation have been put forth, such as drive reduction by reward delivery. However,
after the advent of intracranial self-stimulation experiments, wherein experimental subjects
could obtain free rewards via direct electrical brain stimuli, incentive motivation concepts
superseded the simpler drive-based models of motivation. These studies, as well as others
focused on hedonic rewards, showed that stimuli thought to satisfy cravings would
actually reinforce goal-seeking behavior, and so the focus of research into motivation
shifted to classical and operant conditioning.
1.2 Reward, Reinforcement and Associative Learning
Rewards are stimuli administered to individuals following a correct or desired
response that increase the probability of occurrence of the response. Associative learning
is the process by which discrete percepts or ideas, such as the reward and the response,
are linked to one another. The concept of hedonic reward is central to most motivation
theories. Before the 1960s, essentially all explanations of reinforcement behavior
involved drive concepts. However, in a paper titled The Pleasures of Sensation,
Pfaffmann reinterpreted behavioral studies involving hedonic rewards using
physiological evidence. This evidence regarding the neural encoding of hedonic
sensations suggested that those sensations were rewarding and motivating in and of
themselves, not requiring the existence of dependent variables to lead to motivated
behavior. Bolles proposed that cognitive predictions, or learned expectations, of reward
were the actual source of motivation. These expectancies are what Pavlovian
psychologists would consider conditioned stimulus (CS) and unconditioned stimulus
(US) interactions. In order to explain why CS-US expectancies caused motivation, a
psychologist named Dalbir Bindra asserted that the CS itself is eventually perceived as a
hedonic reward. Frederick Toates qualified this developing theory by suggesting that
physiological depletion states could direct and enhance the incentive value of rewards.
This explanation readily incorporates the logical aspects of homeostasis-like motivation
concepts into modern theories of associative learning.2
Associative learning is studied using Pavlovian (classical) and operant
(instrumental) conditioning. In the former, the experimenter provides both the stimulus
and reward, allowing him or her to control when each learning episode occurs, thereby
gaining insights into reward prediction - or errors in reward prediction - as well as
qualitative reactions to the rewards. In contrast, operant conditioning requires that the
experimental subject adjust decision making based on experience in order to minimize
negative stimuli or maximize positive stimuli. That is, the subject determines the number,
magnitude or quality of its rewards by its performance. Pavlovian behavior can be
considered adaptive in a limited sense for a static environment, but a
dynamic environment requires the acquisition of new behavioral strategies. This capacity
for learning is typically studied with operant conditioning procedures. However, both
human experience and comparative research reveal that when instrumental behavior
passes a certain repetitiveness threshold, it often fails to adjust to new situations or the
omission of an expected reward.
In an article written for the Annual Review of Psychology, John Pearce and Mark
Bouton discuss the relative merits of currently influential theories of associative learning.
These theories address different aspects of learning, and lead to different implications for
understanding how individuals form causal judgments. The current understanding is that
the magnitude of a conditioned response depends upon the strength of perceived
connection, or associativeness, between the CS and US for a given trial. This concept,
which is based on the Rescorla-Wagner (1972) theory, explains a far wider range of
experimental findings because it includes the assumption that the change in associative
strength of a stimulus is determined by the difference between the magnitude of the US
and the sum of associative strengths of stimuli present, as opposed to considering only the
primary stimulus. The theory is a metric for describing CS-US associations, but does not
address what these relationships are, how they're formed or how they influence behavior.
One way to gain insights into these issues is to study how associability changes with
context and conditioning.4
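The Rescorla-Wagner update rule described above can be sketched in a few lines. On each trial, every stimulus that is present changes its associative strength in proportion to the difference between the US magnitude and the summed strength of all stimuli present. The function and parameter names (`salience`, `beta`, `lam`) follow the conventional notation for the model and are illustrative assumptions; the trial sequence is a hypothetical blocking design and is not data from this study.

```python
def rescorla_wagner(trials, salience, beta=1.0, lam=1.0):
    """Simulate Rescorla-Wagner learning (illustrative sketch).

    trials   -- list of (cues_present, reinforced) pairs
    salience -- dict mapping each cue to its learning-rate parameter
    beta     -- learning-rate parameter associated with the US
    lam      -- US magnitude on reinforced trials (0 when the US is omitted)
    """
    strengths = {cue: 0.0 for cue in salience}
    for cues_present, reinforced in trials:
        us_magnitude = lam if reinforced else 0.0
        # The error term uses the SUM of strengths of all stimuli present,
        # not only the strength of the primary stimulus.
        summed = sum(strengths[cue] for cue in cues_present)
        error = us_magnitude - summed
        for cue in cues_present:
            strengths[cue] += salience[cue] * beta * error
    return strengths


# Hypothetical blocking design: cue A alone is reinforced, then A and B together.
trials = [(("A",), True)] * 20 + [(("A", "B"), True)] * 20
blocking = rescorla_wagner(trials, salience={"A": 0.3, "B": 0.3})
```

Because cue A already predicts the US by the time cue B is introduced, the summed error is close to zero and B acquires almost no associative strength. A rule that considered only the primary stimulus would not produce this blocking effect.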
Wagner argued that the salience and associability of a CS can only be changed
when the elements it excites are at the focus of attention. Furthermore, the state of
attention is dependent on the context and time course of an individual's exposure to the
CS. Evidence supporting this theory is provided by studies pairing flavor and symptoms
of illness. As one might expect, associations between the flavor and the symptoms were
not as strong when brief access to the flavor sample was provided long before the
symptoms were apparent, as opposed to only a few hours before the symptoms
manifested. Mackintosh proposed that the associability of a stimulus is determined
by how accurately it predicts reinforcement. Evidence from pattern
discrimination studies (George and Pearce 1999, Mackintosh and Little 1969, and Shepp
and Eimas 1964) seems to support this theory's prediction that attention to a stimulus
increases if it is the best available predictor of reinforcement, while electrical shock
experiments (Pearce and Hall 1980) contradict it. Based on this shock stimulus study and
others, Pearce and Hall proposed that the associability of a stimulus is high when it is
followed by an unexpected US, but low when it is followed by a familiar or expected US.
That is, more attention is paid to the stimulus while the subject is still learning about its
significance.4 These conditioning studies were performed primarily to form plausible
hypotheses about the internal process of learning. However, connecting learning with
biological mechanisms requires combining behavioral evidence of reward and
motivation with a quantitative understanding of neural function.
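The Pearce-Hall proposal, that associability is high while the US is still surprising and low once it is expected, can be illustrated with a simplified sketch. The version below is a hybrid chosen for brevity: associability tracks a running average of the absolute prediction error, and associative strength is updated with an error-correcting term. The parameter names (`salience`, `gamma`, `alpha0`) and their values are assumptions made for illustration; this is not the exact formulation of the 1980 model.

```python
def pearce_hall(n_trials, lam=1.0, salience=0.5, gamma=0.5, alpha0=1.0):
    """Track associability (alpha) and associative strength (V) over trials.

    Simplified hybrid sketch: alpha follows recent surprise |lam - V|,
    and V is updated by an error-correcting rule scaled by alpha.
    """
    strength, alpha = 0.0, alpha0
    alphas, strengths = [], []
    for _ in range(n_trials):
        alphas.append(alpha)                   # associability entering this trial
        surprise = abs(lam - strength)         # how unexpected the US is
        strength += salience * alpha * (lam - strength)
        alpha = gamma * surprise + (1.0 - gamma) * alpha
        strengths.append(strength)
    return alphas, strengths


alphas, strengths = pearce_hall(n_trials=30)
```

In this sketch associability starts high and declines as the US becomes predicted, while associative strength rises toward the US magnitude.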
1.3 Neuroanatomy of Reward
Researchers have tried to connect structure and function for decades by observing
the behavior of intracranial self-stimulating and brain damaged animals, for example,
noting that decerebrate rats do not learn taste aversion, and that intracranial electrical
stimulation of the hypothalamus and associated structures can reinforce operant
conditioning. More recently, microscopic processes such as cerebral glucose utilization
during electrical stimulation of the ventral tegmental area and the effect of dopamine
depletion in the nucleus accumbens have been observed, building the body of knowledge
concerning the inner workings of the neuroanatomy of reward.' The phenomenon of
reinforcement is thought to involve the integration of myriad neural processes, each with
its own physiological substrate or system of substrates. The basal ganglia and limbic
system are especially important to mediating reward perception and adaptive behavioral
response (Figures 1.2 and 1.3).
[Figure 1.2 diagram; legend: GABA, dopamine, glutamate.]
Figure 1.2 CIRCUITRY MEDIATING PERCEPTION OF
REWARD and initiation of adaptive responses to reward. Arrows
connecting the ventral pallidum and various structures, as well
as the connection between the nucleus accumbens and ventral
tegmental area indicate GABA pathways.8
Figure 1.3 SCHEMATIC DIAGRAM OF NEURAL SUBSTRATE
INTERACTIONS. This limbic-striatal-pallidal circuitry is implicated
in reward processes.
The basal ganglia consist of the globus pallidus and caudate-putamen (striatum),
as well as the subthalamic nucleus and substantia nigra. Although the caudate nucleus
and putamen are separated in many mammals, poor development of the rat's internal
capsule makes them hard to distinguish. The subthalamic nucleus and the substantia nigra
are brainstem structures, but they are closely related to the striatopallidal neuronal
circuitry, and so are generally considered as part of the basal ganglia. The rat's striatum is
a large, gray mass occupying the deepest part of the cerebral hemisphere. It serves as a
recipient of topographically organized cortical and amygdaloid inputs, subcortical
afferents from thalamic nuclei, as well as monoaminergic neurons and other cell groups.
The ventral striatum, which can be regarded as a continuum of the dorsal striatum, is
composed of ventromedial parts of the caudate-putamen, nucleus accumbens and
olfactory tubercle.10
The substantia nigra and adjacent nuclei in the ventral tegmental area have been
strongly implicated in the generation of hedonic reward. The substantia nigra ("dark
substance") is named for the pigmented cells of the SN pars compacta (SNc), which overlie
the SN pars reticulata (SNr). The SNr is a pallidal structure composed of GABA-ergic
(GABA is γ-aminobutyric acid) neurons that project to the thalamus and surrounding
structures. The caudate-putamen (Figure 1.4) is innervated by dense dopaminergic
projections arising from the SNc and ventro-lateral VTA. The rat's substantia nigra-ventral
tegmental area (SN-VTA) lies in the mesencephalon and is 2.5 mm long and 3
mm wide. The SN contains 10,000 to 12,000 neurons on each side, while the VTA
contains 27,000 neurons on each side.11
Figure 1.4 THE BASAL GANGLIA OF THE RAT, as well as the
hypothalamus and sublenticular extended amygdala (SLEA). ac- anterior
commissure, Acb- accumbens nucleus, CPu- caudate-putamen, Tu- olfactory
tubercle, VP- ventral pallidum, GP- globus pallidus, EP- entopeduncular
nucleus, STh- subthalamic nucleus, SNC- substantia nigra, pars compacta,
SNR- substantia nigra, pars reticulata.
The limbic system is comprised of the amygdala, hypothalamus, cingulate
cortex, anterior thalamus, mammillary body and hippocampus (Figure 1.5). The
hypothalamus is connected to the ventral tegmental area via the medial forebrain bundle
(MFB), a tract of nerves whose stimulation has been shown to produce strong
reinforcement of conditioned behavior. The limbic system is thought to integrate
emotional information from various parts of the nervous system. It is believed that the
hippocampus helps to register memories, the amygdala helps in assessing experiences,
and the forebrain assists in decision-making. The ventral striatum and adjacent nucleus
accumbens receive afferents from these limbic structures, and deliver outputs to
structures such as the ventral pallidum. The MFB distributes impulses to various limbic
structures from dopaminergic neurons in the SN-VTA, which together with glutamatergic
inputs determine the output of ventral striatal GABA-ergic spiny neurons projecting into
the ventral pallidum.
[Figure 1.5a diagram; labeled structures: cingulate cortex, hippocampus, amygdala.]
Figure 1.5a THE STRUCTURES OF THE LIMBIC SYSTEM, a
medial view of the left hemisphere of the human brain.
Figure 1.5b THE THREE-DIMENSIONAL ORGANIZATION OF
THE HIPPOCAMPAL FORMATION IN THE RAT BRAIN. (A)
The C-shaped hippocampus where f indicates the fornix. (B)
Three horizontal sections at different dorsoventral levels. (C) The
surface of the hippocampal formation where s is the septal pole
and t is the temporal pole. The three coronal sections are shown
at different rostrocaudal levels. DG- dentate gyrus, fi- fimbria,
and S- subiculum.
While both the amygdala and prefrontal cortex supply excitatory input to the
nucleus accumbens, the amygdala is thought to be more involved in regulating responses
to conditioned rewards, and the prefrontal cortex integrates short-term memories with the
behavioral responses (Figure 1.2). There is considerable evidence that interactions
between structures in the limbic system and dopamine-related functions in the ventral
striatum influence the effects of conditioned stimuli on goal-directed behavior. One
example of such evidence implicates N-methyl-D-aspartate (NMDA) glutamate receptors
in the amygdala in the ability to learn to approach stimuli that indicate food rewards. It is
believed that this associative learning requires that information be relayed to the ventral
striatum where the activity of ascending mesencephalic dopamine projections helps to
determine the selection of appropriate responses. Infusions of NMDA and non-NMDA
receptor antagonists into the core region of the nucleus accumbens, as well as lesions in
that same area, are known to impair rats' foraging behavior. The performance of rodents
in spatial memory tests is sensitive to intra-accumbens infusions of haloperidol, a
dopamine receptor antagonist. Optimization of foraging, which clearly involves spatial
memories, is thought to be dependent on projections of the hippocampal formation. These
connections suggest that the ventral striatum is a point of convergence for complex
spatial information and conditioned stimuli (Figure 1.3).9
The activity of spiny projection neurons and cholinergic interneurons in the
striatum has been implicated in several aspects of learning. Spiny projection neurons are
relatively quiescent cells that phasically fire in association with learned movements. The
cholinergic neurons show sensory activity during sensorimotor learning. The learned
response, which is dopamine dependent, is an excitation in the activity of the cholinergic
interneurons followed by a transient suppression. Essentially, neural responses occurring
in the striatum in association with learning are influenced by dopamine-dependent
modulation of thalamostriatal and corticostriatal pathways. Spiny projection neurons
receive excitatory synaptic inputs from the thalamus and cortex (Figure 1.6). Phasic
dopamine release during reward-related learning occurs at the excitatory corticostriatal
and thalamostriatal synapses, and changes in the afferent activity of these synapses
presumably underlie the aforementioned changes in striatal activity.
Figure 1.6 THE STRIATAL NEURAL CIRCUITRY INVOLVED IN
REWARD-RELATED LEARNING. Open circles, red circles,
black triangles and blue triangles refer to glutamatergic,
dopaminergic, GABA-ergic and cholinergic synaptic contacts
respectively.
Spiny projection neurons in the nucleus accumbens also play a role in the
initiation of behavioral responses to environmental stimuli. These neurons receive
excitatory input from cortical and thalamic structures which often leads to dopamine
release, indicating that this neural substrate is mediating an information gating function.
However, the effects of dopamine on neural activity in the nucleus accumbens and
adjacent striatum are state-dependent. That is, studies have shown that in the presence of
a depolarizing excitatory tone, the stimulation of dopamine receptors prolongs excitatory
responses, while in the absence of this tone the stimulation is inhibitory. In effect,
dopamine transmission in the nucleus accumbens serves as a gating function that also
augments sustained excitatory input to spiny cells while inhibiting their output.
Characterization of dopamine's role in the nucleus accumbens therefore requires that
studies employ conscious subjects, as an excitatory tone is not spontaneously
present in vitro or in anesthetized animals.
The ventral pallidum is recognized as the primary terminal field for spiny neurons
of the nucleus accumbens, and its role in regulating motor behavior is well established.
Studies of the mechanisms of drug reward have demonstrated that it also belongs in the
motive circuit. Topographic analysis of the anatomical and functional connections between
the nucleus accumbens, ventral pallidum, mediodorsal thalamus, and prefrontal cortex
has revealed that a series of contacts extends from the nucleus accumbens shell region
outward to the core. The thalamic portion of this circuit only permits the flow of
information from the ventral pallidum to the prefrontal cortex. Because it involves the
prefrontal cortex, this thalamic subcircuit may provide access to short-term memory
functions to aid in context-appropriate behavioral responses to reward.
1.4 The Dopamine Hypothesis of Reward
Learning is driven by deviations between predicted time and quality of rewards
and actual reward delivery. Adjusting expectations produces adaptive behavior that
maximizes rewards and minimizes aversive stimuli. Multiple lines of evidence
support the theory that dopamine neurons of the SN-VTA project to structures involved
in goal-directed behavior and motivation. For example, investigations of the role played
by individual midbrain dopaminergic neurons showed that changes in dopaminergic
activity correlate with the animals' transfer of behavioral reaction from the unconditioned
stimulus to the conditioned stimulus. Studies utilizing bioelectrical sensors have shown
that dopamine neurons emit a positive signal when the reward is better than predicted,
and a negative signal when the reward is worse than expected, or not delivered when it is
expected.12
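The sign convention of this reward prediction error can be made concrete with a minimal sketch: the error is the received reward minus the predicted reward, and the prediction is nudged toward each outcome. The function names, the learning rate and the reward sequence are illustrative assumptions and do not model the recordings cited above.

```python
def prediction_error(received, predicted):
    """Positive when the reward is better than predicted, negative when worse."""
    return received - predicted


def learn_prediction(rewards, learning_rate=0.2):
    """Return the sequence of prediction errors as the prediction adapts."""
    predicted = 0.0
    errors = []
    for received in rewards:
        delta = prediction_error(received, predicted)
        errors.append(delta)
        predicted += learning_rate * delta   # move the prediction toward the outcome
    return errors


# Hypothetical schedule: 30 rewarded trials followed by one omitted reward.
errors = learn_prediction([1.0] * 30 + [0.0])
```

In this schedule the first reward is unpredicted and yields a large positive error, the error shrinks toward zero as the reward becomes expected, and omitting the expected reward on the final trial yields a negative error.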
Dopamine has also been linked to motivation, reinforcement, addiction and
memory. This link was first identified when damage to nigrostriatal dopamine fibers caused
feeding and drinking deficits, and damage to mesolimbic dopamine fibers decreased
forward locomotion, which is strongly associated with goal-seeking behavior. Neuroleptics,
drugs that block the effects of dopamine by antagonistically binding to dopamine
receptors, have been shown to attenuate or block the rewarding effects of lateral
hypothalamic electrical stimulation. Immediate dopaminergic activation is not required
for motivation, as experienced animals often continue to perform previously rewarded actions
until they have had considerable experience under the influence of the neuroleptic. In addition,
studies have revealed that predictable rewards do not significantly increase Fos-like
immunoreactivity in dopamine neurons, while the prefrontal cortex displays marked Fos-like
activation. This evidence suggests that dopamine plays a role in imprinting the
rewarding quality of stimuli, which is essential for control of goal-directed behavior by
expectations based on past experience. This is not dopamine's only physiological
function, and there are other substances, such as glutamate acting on NMDA receptors,
that enable conditioned self-stimulation, but given these caveats, the dopamine
hypothesis of motivation and reinforcement is well established.13
This preliminary review of the neural substrates of reward has shown that the
amygdala, nucleus accumbens and VTA work together to process rewarding stimuli.
Dopaminergic inputs to the VTA seem to provide prediction error signals. Dopamine, as
well as glutamatergic afferents from the thalamus, plays an important role in
prefrontal-cortex-mediated short-term memories. In addition, the effects of conditioned reinforcers
may be mediated by interactions between glutamatergic afferents and midbrain dopamine
neurons. The physiological evidence presented in the aforementioned articles informs
models of neurotransmitter activity and our understanding of goal-directed behavior. By
combining observations of conditioned or goal-directed behavior and physiological
observation strategies such as lesion studies, drug administration and fMRI, the
mechanisms of reward and learning can be spatially, temporally and chemically
described.
1.5 Imaging the Brain
A wide range of rewarding stimuli have been shown to modulate BOLD signal
responses in a variety of brain structures depending on the behavioral task or type of
stimulus. But how does fMRI work, and how does BOLD imaging reflect brain function?
Although fMRI has become the dominant technique for the study of the functional
organization of the human brain during perceptual, cognitive and motor tasks,
neuroscientists are still developing the theory as to how BOLD signal changes represent
brain function. Positron emission tomography, which exploits 15O-labeled water as a
tracer to reflect parenchymal blood flow, has shown that increases in blood flow are
linearly proportional to increases in neuronal activity. In a given voxel, the volume
element determined by MRI pixel size, there may be hundreds of thousands of these
neurons. The BOLD effect reflects changes in blood flow, blood volume and blood
oxygenation in the arteriole, capillary and venous vascular beds in an intricate and
variable combination. However, it has excellent spatial and temporal resolution
properties, and offers considerable flexibility in paradigm design.14
A well defined, qualitative relationship between BOLD signal changes and
neuronal activation has not been demonstrated, but it is possible to obtain a quantitative
understanding of physiological changes as they relate to the observed dynamics of
magnetic resonance signal. Signal changes arise from differences in the magnetic
susceptibility between blood vessels and the surrounding tissue due to the presence of
paramagnetic deoxyhemoglobin in the blood. This variation in magnetic susceptibility
manifests as differences in relaxation rate, particularly transverse relaxation rate. Spin-echo and gradient-echo imaging, which depend on the relaxation rates R2 (1/T2) and R2*
(1/T2*) respectively, are used for different applications owing to the sensitivity of the
latter to magnetic field variations and the lower degree of BOLD contrast observed with the former.15
When Ogawa et al. published their groundbreaking results of brain MRI with
BOLD contrast in 1990, they actually stated that BOLD contrast was not observed in
spin-echo images.16 This conclusion was probably drawn because the spin-echo (SE)
approach yields significantly smaller fractional signal changes compared to the gradient-echo (GE) method.17 To better explain the different consequences of SE and GE imaging,
it is worthwhile at this point to review the physical bases and mathematical
representations of both methods. While GE imaging generates an echo by reversing the
gradient polarity, SE imaging produces an echo by applying a π pulse after TE/2. The
relaxation rate 1/T2* is the sum of 1/T2 and 1/T2': T2 is the time constant for relaxation caused by the local magnetic
fields of the nearby protons and T2' represents the effect of macroscopic and microscopic
magnetic field heterogeneities. The π pulse applied in SE eliminates the effect of T2', but
at the same time reduces the fractional signal change. Deoxyhemoglobin, with its iron-containing heme groups, is primarily T2-influencing, which is why T2-weighted (repetition
time, TR >> longitudinal relaxation time, T1) imaging makes most sense for BOLD
fMRI.15
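The relationship between the relaxation rates and the echo signal can be sketched numerically. The example below assumes simple mono-exponential decay, S = S0 exp(-TE x R), and uses illustrative values: T2, T2' and the activation-induced rate changes are hypothetical numbers, not measurements from this study.

```python
import math


def r2_star(t2_ms, t2_prime_ms):
    """Relaxation rates add: 1/T2* = 1/T2 + 1/T2'. Returns R2* in 1/ms."""
    return 1.0 / t2_ms + 1.0 / t2_prime_ms


def echo_signal(s0, te_ms, rate_per_ms):
    """Mono-exponential echo amplitude S = S0 * exp(-TE * R)."""
    return s0 * math.exp(-te_ms * rate_per_ms)


def fractional_change(te_ms, delta_rate_per_ms):
    """Fractional signal change caused by a change in the relaxation rate."""
    return math.exp(-te_ms * delta_rate_per_ms) - 1.0


# Hypothetical values for illustration only.
T2, T2_PRIME, TE = 60.0, 80.0, 20.0        # ms
t2_star = 1.0 / r2_star(T2, T2_PRIME)      # always shorter than T2

# Assume activation lowers R2* more than R2, because the refocusing
# pulse in SE removes the T2' contribution.
d_r2_star, d_r2 = -1.0e-3, -0.3e-3         # 1/ms, assumed
ge_change = fractional_change(TE, d_r2_star)
se_change = fractional_change(TE, d_r2)
print(f"T2* = {t2_star:.1f} ms, GE change = {ge_change:.2%}, SE change = {se_change:.2%}")
```

Under these assumptions the gradient-echo signal change is several times larger than the spin-echo change at the same TE, which is the qualitative point made in the text.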
1.6 Insights from fMRI into Brain Function
Brain regions in which rewarding stimuli consistently increase activity include the
orbitofrontal cortex (OFC), amygdala, striatum/nucleus accumbens and dopaminergic
midbrain. In order to answer questions concerning the precise role these structures play,
either as members of a neural circuit or independently, in sensing, predicting and valuing
rewards, fMRI research has employed myriad primary and conditioned rewards in both
human and animal studies. These rewards include food and water, appetitive smells,
sexual stimuli, rewarding electrical stimulation, and even social rewards like money and
positive feedback. Our understanding of how these rewards are processed is developing
through an iterative process of experimentation and theorizing.18
The OFC receives direct inputs from the taste and olfactory cortices as well as
higher-order visual and somatosensory areas. It is ideally located to store the reward
value of sensory stimuli, and in fact OFC neurons in rats have been shown to respond
preferentially to different tastes. Furthermore, these neurons decrease their firing when
consumption of rewards induces satiation, and the stimulus becomes less rewarding. One
clever approach to this research involved scanning hungry subjects who were exposed to
two food-related stimuli. The subjects subsequently consumed one of the corresponding
foods until they were satisfied, and were scanned again. Responses in the OFC showed
that activity related to the food that was eaten decreased, but activity for the other food-related stimulus did not. Studies such as this indicate that the OFC codes for the
rewarding quality of stimuli, rather than their sensory aspects.19
fMRI studies have shown that the amygdala is involved in sensing aversive
stimuli; for example, it is preferentially activated by images of frightening or angry faces.
However, the amygdala is also activated following positively reinforcing stimuli, and so
it seems that activity in the amygdala is really related to how arousing the stimulus is,
rather than whether it is rewarding or aversive. In one study, BOLD signal changes were
shown for subjects exposed to unpleasant (valeric acid [Val]) and pleasant (citral [Cit])
odors (Figure 1.7). The BOLD timecourse is remarkably similar for both stimuli at the
same concentration, but there is a marked difference between timecourses for stimuli at
different concentrations. These results, and the findings of similar studies, seem to
contradict evidence from animal lesion and human neuropsychology research. In fact,
these past studies have been reevaluated based on the idea that reward value is an
interaction between valence and intensity. That is, the concentration or arousing qualities
of a stimulus may have an effect on its affective character.20
Figure 1.7 TIMECOURSE OF AMYGDALA BOLD RESPONSE.
Presented are the time course (line plots) and peak
hemodynamic responses (bar graphs) to the high and low
intensity presentations of valeric acid and citral in the left (a) and right
(b) amygdala.20
After neural substrates have perceived the reward, assessed its value and valence
and committed this information to short-term memory, the organism must still have a
way to generate reward-directed behaviors. Research has shown that electrical
stimulation of the ventral striatum is highly rewarding due to phasic dopamine release,
and fMRI studies of reward processing have found that BOLD signal changes correspond
to changes in reward amplitude. The timing of these changes supports the hypothesis that
these responses in the ventral striatum signal reward prediction errors.
In one study involving human subjects, fMRI was used to correlate prediction
errors in reward delivery with BOLD changes in the human striatum. The quantitative
basis for the strategy of using fMRI to observe metabolic activity during reward delivery
is that dopamine neurons give transient responses to deviations in expectations about
reward delivery. Goal-directed behavior requires that the individual demonstrate flexible
learning and coordinated appetitive behavior, actions that indicate prediction of a goal or
reward expectation and hedonic reactions. A positive prediction error (nothing expected,
reward delivered) can cause increased activity in the left putamen (Figure 1.8). Negative
prediction errors had the opposite effect.21
[Figure panel: Positive prediction error]
Figure 1.8 LEARNING AND THE HUMAN STRIATUM. An
illustration of the results of fMRI experiments involving positive
prediction errors.21
Neuroimaging studies have also implicated the amygdala and OFC. In the same
way that these structures contribute to the processing of new rewards, representation of
the predictive value of familiar rewards requires different types of characterizing
information. In one fMRI study, predictive reward values were coded by BOLD
activity in the OFC, amygdala and striatum (Figure 1.8). Arbitrary visual cues were
paired with two food-related odors in a classical conditioning paradigm, and after
subjects were fed to satiety with one of the foods, responses to the predictive cue for that
devalued odor decreased. Additionally, the striatum is known to respond to aversive
stimuli and 'non-rewarding' salient events, such as random distractor stimuli. These data,
combined with studies that show preferential activation in the striatum during active
reward tasks as opposed to passive tasks where no action on the subject's part is required,
suggest that the striatum is involved with coding stimulus saliency. More research is
required to characterize the expanding list of functions posited to be mediated by the
striatum.19
REWARD VALUE CODING in the (a)
striatum. (d) is
relative difference in
activity pre to post satiety.
Neuroimaging has been vitally important in identifying and roughly localizing
important stages of reward processing. However, the specific functions of the neural
substrates that have been implicated in the reward circuit have been incompletely, and
possibly erroneously, described. Given the current evidence, it seems that the occurrence
of salient stimuli is signaled by the amygdala, which also initially codes their predictive
value. The OFC assesses the value of the stimulus, and the ventral striatum then further integrates
this information so that behavioral responses are context-appropriate. However, it is not
clear whether the OFC is guiding behavior, or modulating the mental representation of
the information itself. In addition, complex behaviors are implemented not only through
involvement of individual structures, but also by interactions between many neural
substrates. Furthermore, studies of reward-related neural responses have revealed BOLD
activity in the parietal cortex, posterior and anterior cingulate, and dorsolateral
prefrontal cortex.19
fMRI is also being used to answer questions as to how the brain and behavior are
influenced by hormones and other poorly understood sensory stimuli. This, and other
work regarding how neurons obtain representations of reality that enable the individual to
exert influence through goal-directed behavior, will contribute to our understanding of the
neurobiology of social interactions. fMRI is the only tool that enables researchers to
investigate neural activations over large portions of the brain with minimal extraneous or
adverse effects on the subject. It will therefore continue to play a crucial role in discovering
the neural mechanisms of reward.
The motivation for this project is to distinguish loci of reward magnitude
processing from other substrates of goal-directed learning. Using animal subjects
provides for the flexibility of intracranial self-stimulation and extended fMRI sessions.
Electrical stimulation of the MFB is a robust substitute for natural reinforcers in the
context of animal conditioning paradigms. The parameters of this stimulation are
controlled by the experimenter and its results are thought to be unadulterated by
appetitive or emotional factors. Evaluation and recall of rewards are complex processes
involving many substrates, and by examining the real-time BOLD signal changes in
anesthetized and eventually in awake subjects, we hope to better understand the
physiological basis of reward processing in terms of the behavioral responses it
influences.
References
1. Principles of Neural Science. Ed. E. R. Kandel, J. H. Schwartz, T. M. Jessell. 3rd
ed. Appleton and Lange. Norwalk, Connecticut, 1991.
2. Berridge, Kent C. "Motivation concepts in behavioral neuroscience." Physiol
Behav. 2004 Apr; 81(2): 179-209.
3. Dayan, P., Balleine, B. W. "Reward, motivation and reinforcement learning."
Neuron. 2002 Oct 10; 36(2): 285-98.
4. Pearce, J. M., Bouton, M. E. "Theories of associative learning in animals." Annu
Rev Psychol. 2001; 52: 111-39.
5. Stellar, E., J. R. Stellar. The Neurobiology of Motivation and Reward. Springer-Verlag New York, Inc. New York, NY, 1985.
6. Banich, M. T., Neuropsychology: The Neural Bases of Mental Function.
Houghton Mifflin Company. New York, NY, 1997.
7. Demarest, R. J., C. R. Noback. The Human Nervous System: Basic Principles of
Neurobiology. McGraw-Hill, Inc. 1981.
8. Kalivas, Peter W. and Mitsuo Nakamura. "Neural systems for behavioral
activation and reward." Current Opinion in Neurobiology. 1999, 9:223-227.
9. Robbins, T. W. et al. "Neurobehavioral mechanisms of reward and motivation."
Current Opinion in Neurobiology. 1996, 6:228-236.
10. The Rat Nervous System. Ed. George Paxinos. 2nd ed. Academic Press. San Diego,
CA, 1995.
11. Wickens, Jeffery R. "Neural mechanisms of reward-related motor learning."
Current Opinion in Neurobiology. 2003, 13:685-690.
12. McClure, S. M. et al. "Temporal Prediction Errors in a Passive Learning Task
Activate Human Striatum." Neuron. Vol. 38, 339-346, April 24, 2003. Cell
Press.
13. Schultz, W. et al. "A Neural Substrate of Prediction and Reward." Science. Vol.
275, 1593-1598, 14 March 1997.
14. Menon, Ravi S. "Imaging function in the working brain with fMRI." Current
Opinion in Neurobiology. 2001, 11:630-636.
15. Cho, Jones and Singh. Foundations of Medical Imaging. John Wiley & Sons, Inc.
1993.
16. Ogawa et al. "Brain magnetic resonance imaging with contrast dependent on
blood oxygenation." Proc. Natl. Acad. Sci. USA; 87:9868-9872 (1990).
17. Jezzard, P. Computerized Medical Imaging and Graphics; 20(6):467-481 (1996).
18. McClure, S. M. et al. "The Neural Substrates of Reward Processing in Humans:
The Modern Role of fMRI." Neuroscientist. Volume 10, Number 3, 2004.
19. O'Doherty, John P. "Reward representations and reward-related learning in the
human brain: insights from neuroimaging." Current Opinion in Neurobiology.
2004, 14:769-776.
20. Anderson, A. K., et al. "Dissociated neural representations of intensity and
valence in human olfaction." Nature Neuroscience. 6: 196-202.
21. McClure, S. M. et al. "Temporal Prediction Errors in a Passive Learning Task
Activate Human Striatum." Neuron. Vol. 38, 339-346, April 24, 2003. Cell
Press.
Chapter 2
Methods
2.1 Implantation of Stimulating Electrode
Lewis rats between 250 and 275 grams were chosen for both Part 1 and Part 2.
The silver unipolar electrodes consist of a 0.10 mm diameter stimulating wire and 0.05
mm diameter grounding wire, both Teflon-coated, housed in polyethylene tubing and
connected by a plug. The coating was stripped from the grounding wire, but not from the
electrode. The rats were anesthetized and the stimulating electrodes were lowered toward
the left MFB by stereotaxic surgery using standard anatomical co-ordinates (2.2 mm
caudal, 1.5 mm lateral to the bregma, 8.5 mm ventral to the dura). Six beryllium copper
screws were implanted in the skull to provide purchase for the dental cement that secured
the electrode. One additional screw was placed over the right visual cortex for current
return. The rats were allowed at least five days to recover from surgery before behavioral
training.
2.2 Operant Training
The configuration of the stimulation set-up is diagrammed in Figure 2.3. For Part
1, the operant chamber was outfitted with a single 2.5 cm nose-poke device from Med
Associates. Stimulation was delivered and current was regulated by an Iso-Stim stimulus
generator that received parameter-modulating input from a computer running custom-written software called Operate and Operatesequencer (Metrowerks CodeWarrior) for
Part 1 and Part 2 respectively. A second nose poke was added on the same panel as the
first for Part 2 in order to implement a choice-based experimental protocol.
[Diagram labels: Stimulus Generator: provides current amplitude control, generates the stimulus according to the
operant program's parameters and sends it to the subject. Computer: tells the generator to send when a
response is registered by the nose poke device.]
Figure 2.3 SCHEMATIC REPRESENTATION OF STIMULATION
SETUP.
2.2.1 Operant Training: Part 1
Rats were exposed to stimuli consisting of 500 ms trains of 0.2 ms pulses at 50,
100 and 200 Hz. One stimulus was delivered for each nose poke, and self-stimulation
amplitudes of 0.12 to 0.30 mA were chosen to match the current threshold for minimal
induced motion responses for relatively sustained poking. The rats were allowed at least
three training sessions prior to imaging at one session per day, and at this point they had
learned to nose poke vigorously, albeit at varying rates, for reward with intermittent
periods of inaction. The following figure is an example of the relevant stimulation
parameters printed to the screen by the Operate software during its execution.
USratio: 1
Usinterval: 5000000 microseconds
Usduration: 900000 microseconds
pulsewidth: 500 microseconds
pulseinterval: 10000 microseconds
Figure 2.4 OPERATE PROGRAM PARAMETERS. US stands for
unconditioned stimulus, and 1 is the number of responses
required to obtain a reward.
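The number of pulses in a train follows directly from the train duration and the pulse frequency. The sketch below works this out for the 500 ms trains described above and for the example Operate output in Figure 2.4. The function names are illustrative and are not part of the Operate software.

```python
def pulse_interval_us(frequency_hz):
    """Interval between pulse onsets, in microseconds, for a given pulse frequency."""
    return round(1_000_000 / frequency_hz)


def pulses_per_train(train_duration_us, interval_us):
    """Number of pulses that fit in one train when a pulse starts every interval_us."""
    return train_duration_us // interval_us


# 500 ms trains at the three training frequencies.
for freq in (50, 100, 200):
    n = pulses_per_train(500_000, pulse_interval_us(freq))
    print(freq, "Hz:", n, "pulses")

# Example parameters printed by Operate (Figure 2.4):
# Usduration 900000 microseconds, pulseinterval 10000 microseconds.
print(pulses_per_train(900_000, 10_000), "pulses at", 1_000_000 // 10_000, "Hz")
```

The 500 ms trains contain 25, 50 and 100 pulses at 50, 100 and 200 Hz respectively, while the example parameters in Figure 2.4 correspond to 90 pulses delivered at 100 Hz.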
2.2.2 Operant Training: Part 2
With the addition of the second nose poke device, a new program called
Operatesequencer was introduced. It identified the nose pokes as Input 1 and Input 2 and
assigned them alternating roles as a reference frequency that remained constant
throughout the trial, and a variable comparison frequency. The functionality of either
device would be randomly chosen at the beginning of each trial and would reverse
halfway through the trial. The pulse frequencies of the variable stimulation were
determined by a spacing function that randomly chose frequencies above and below a set
mean comparison frequency at 0.05 log unit (or 12%) intervals. The parameters of the
stimulation, including pulse interval, train duration and train interval, were also
controlled through the program. However, current and the pulse length of 0.1 msec were
determined by the stimulus generator. The following figure shows examples of the
stimulation parameters printed to the screen by the Operatesequencer software during its
execution of a variable trial.
trainduration: 1.0 sec
traininterval: 1.5 sec
reffrequency: 200 Hz
compfrequency: 200 Hz
compspacing: 1.1200
numtrials: 11
trialduration: 300 sec
trialinterval: 320 sec
randomtrials: 1
repeatfirst: 1
midtrialswitch: 1
primingtrain: 1
seed: 2530
compfreqlist: [11x1 double]
compidentity: [0 0 1 1 1 0 1 1 0 1 1]
Figure 2.5 OPERATESEQUENCER PARAMETERS. "1's" refer
to activation of the parameters they apply to. This means that the
variable comparison frequency trials are randomized, that the
first of the 11 trials is repeated, that during each trial the
functionality of the nose poke devices switches mid-trial, and that
at the beginning of each trial, the rat is given a free reward.
Compidentity determines which of the inputs (0 or 1) is defined
as the reference, and the seed is the random number that
determines this vector.
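The 0.05 log unit spacing corresponds to a multiplicative step of 10^0.05, or about 1.12. As a rough sketch of such a spacing function, the code below generates eleven comparison frequencies placed symmetrically around a mean value and optionally shuffles them with a seed. The symmetric placement, the function name and the use of Python's random module are assumptions; the actual Operatesequencer routine is not described in detail here.

```python
import random


def comparison_frequencies(mean_hz, n_trials=11, log_step=0.05, seed=None):
    """Frequencies spaced log_step log10 units apart, centered on mean_hz (assumed layout)."""
    half = (n_trials - 1) / 2
    freqs = [mean_hz * 10 ** (log_step * (i - half)) for i in range(n_trials)]
    if seed is not None:
        random.Random(seed).shuffle(freqs)   # randomized trial order
    return freqs


step = 10 ** 0.05                            # multiplicative step, about 12%
freqs = comparison_frequencies(200.0)
print(round(step, 3))
print([round(f) for f in freqs])
```

With eleven values, the highest and lowest frequencies differ by a factor of 10^0.5, or roughly 3.2, regardless of the mean chosen.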
Operant conditioning entailed shaping sessions followed by training sessions.
Data was collected for one session per day. Subjects were stimulated at various current
amplitudes to determine the highest value that would not cause an involuntary movement.
During shaping sessions, 200 Hz was used for the mean comparison value as well as the
reference, and the spacing ratio was set at 1. Shaping sessions consisting of six trials of
600 second duration at 620 second intervals were then conducted at this current to
gauge the responsiveness of the subjects to the reward. Rats that poked at an average rate
lower than 15 responses per minute were deemed unresponsive, or not rewarded.
Rats that maintained adequate response rates began a series of training sessions
with reference frequencies that were iteratively varied in order to determine the saturation
frequency. This required that the spacing factor be set to 1.12 (Figure 2.5). At this point,
the train interval was increased to 1.5 seconds while the train duration remained at 1
second, effectively adding a blackout period limiting the number of rewards the rat could
obtain. Sessions consisted of eleven 10 minute trials at first in order to completely
acclimatize the rats to the two-nose poke system, and to the training environment. After
at least 3 trials, or until behavior became reasonably reproducible, trials were shortened
to 5 minutes.
2.3 Imaging
In preparation for imaging, animals were anesthetized with 1.5-2.0% isoflurane or
halothane, tracheostomized, and placed on mechanical ventilation (Harvard Apparatus).
Body temperature was maintained with a heated water pad at 37 °C (Gaymar). Each rat
was secured in a custom-made holder including a bitebar and earbars. Anesthesia was
adjusted to 1% for imaging. Throughout each trial, animals were monitored using a
transcutaneous blood-gas analyzer (Radiometer TCM3) or pulse oximeter (Nonin
8600MV).
Imaging was done with a 4.7 T, 30 cm inner bore diameter horizontal
magnet, which was controlled by an AVANCE console (Bruker Instruments) with
Paravision Imaging Software and was equipped with a 12 cm ID triple axis gradient set
(26 G/cm maximum). Signal was transmitted and received with a surface coil consisting
of a copper wire loop and etched circuit board that was positioned over the head, around
the electrode, of each rat. High resolution anatomical images of the forebrain, including
the electrode implantation site, were acquired for each rat using a gradient echo FLASH
sequence with TE and TR of 15 and 2000 ms respectively, 256 x 256 matrix, 3 x 3 cm
field of view, and a slice thickness of 1 mm. The sessions consisted of cycles of stimulation
periods followed by longer rest periods, during which single-shot gradient echo EPI
sequences were used for standard BOLD imaging. Image matrices of 64 x 48 pixels were
acquired using TE/TR 20/2000 ms, bandwidth 100 kHz, field of view 3.2 x 2.4 cm and
slice thickness of 1 mm. Imaging volumes consisted of 8-12 consecutive slices centered
over the somatosensory cortex.
2.3.1 Functional Imaging: Part 1
During image acquisition, animals were stimulated with the same pulse train
parameters that were used to determine response rates during training, both at full and at
half current amplitudes, in separate experiments. Rest periods of 30 seconds alternated
with stimulation periods lasting 20 seconds, with pulse trains delivered at a frequency of
1 Hz. Eight complete cycles of stimulation and rest periods were delivered per imaging
trial, for a total of 200 images.
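The image count follows from the cycle timing and a TR of 2 s, as used for the EPI acquisition. The sketch below builds the corresponding boxcar stimulus vector. The assumption that each cycle starts with the rest period is ours, since the order within a cycle is not stated.

```python
def boxcar(rest_s, stim_s, n_cycles, tr_s):
    """One value per image: 0 during rest, 1 during stimulation (rest first, assumed)."""
    rest_imgs = int(rest_s / tr_s)
    stim_imgs = int(stim_s / tr_s)
    return ([0] * rest_imgs + [1] * stim_imgs) * n_cycles


design = boxcar(rest_s=30, stim_s=20, n_cycles=8, tr_s=2.0)
print(len(design))   # 200 images in total
print(sum(design))   # 80 images acquired during stimulation
```

A vector of this kind is what a voxel timecourse would be correlated against when identifying stimulus-related activation.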
2.3.2 Functional Imaging: Part 2
During image acquisition, the responsive rats were stimulated at 0.5 Hz at the
same current amplitude and pulse train parameters used during training. Each rat
received 100 second long cycles with a 10 second stimulation period and 90 seconds
allowed for decay of residual effects of the stimulation. These cycles were performed at
the saturation frequency, and frequencies above and below that value as presented in the
following table. In order to confirm that the brain was responding to electrical stimulation
despite anesthesia, 2 mA shocks to the paw were delivered while functional images were
acquired. Unresponsive rats also received this paw shock, in addition to being imaged
while receiving 200 Hz, 0.5 second trains of 0.1 millisecond pulses at 1 Hz and a current
that did not elicit involuntary movements.
2.4 Imaging Data Analysis
Analysis of imaging data was performed with Matlab v.6 (Mathworks) running
in-house processing routines. Regions of significant activation were identified by
correlation with stimuli, using a t-test criterion (uncorrected p < 10^-5), and superimposed
on corresponding EPI anatomical maps. For Part 1, areas of reproducibly signal-correlated BOLD activation were selected and plotted as functions of time. For Part 2,
areas of signal-correlated activation at the saturation frequency were selected. The
percent signal change for these voxels was then plotted for frequencies at, above and
below the saturation value. Additional data processing and visualization were performed
using Matlab and Adobe Creative Suite.
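The in-house Matlab routines are not reproduced here. As an illustration of the general approach, the following Python sketch correlates a voxel timecourse with a boxcar stimulus vector and converts the correlation coefficient to a t statistic. The synthetic data, the function names and the large-sample normal approximation to the p-value are assumptions for illustration only.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic with n - 2 degrees of freedom for a correlation coefficient."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def approx_p(t):
    """One-sided p-value from a normal approximation (reasonable for large n)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


# Synthetic example: 8 cycles of 15 rest + 10 stimulation images, 3% signal increase.
rng = random.Random(0)
stimulus = ([0] * 15 + [1] * 10) * 8
voxel = [100.0 + 3.0 * s + rng.gauss(0.0, 1.0) for s in stimulus]

r = pearson_r(stimulus, voxel)
t = t_from_r(r, len(stimulus))
print(f"r = {r:.2f}, t = {t:.1f}, active = {approx_p(t) < 1e-5}")
```

A voxel would be marked as activated when its p-value falls below the chosen threshold. An exact analysis would use the t distribution with n - 2 degrees of freedom rather than the normal approximation used here.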
Chapter 3
Results
Data relating to both behavioral training and functional MRI experiments reveal
insights into the effectiveness and rewarding character of intracranial self-stimulation of
the rat MFB. Electrical stimulation, whose parameters are determined by operant
conditioning in both Part 1 and Part 2, elicits a distributed hemodynamic response that is
found reproducibly for areas implicated in neural reward processing.
3.1 Part 1
3.1.1 Operant Training Results
After at least five days of recovery from surgery, MFBL9, MFBL11 and MFBL12
were subjected to behavioral training in a Skinner-type box, which was outfitted with one
nose poke device as described in the Methods. Current amplitude was iteratively adjusted
to just under the threshold for inducing involuntary movements upon stimulation, and
various current amplitudes lower than this maximum value were also used during
conditioning for response (nose poke) rate comparison. Electrode placements were
determined to be rewarding if the rat acquired nose poking behavior at the selected
current amplitude.
Plots of response total as a function of time show that the general trend is for
increased responding at higher amplitudes (Figure 3.1). For MFBL9 and MFBL11, whose
response rates were two orders of magnitude greater than that of MFBL12, the response rate to
current amplitude relation is a decidedly nonlinear function, with response rate increasing
an order of magnitude for fractional increases in current amplitude. MFBL12 did not
respond as much or for as long as the other subjects, but was still considered to have
acquired nose poking behavior. Although there is no clear trend of responses vs. time, the
highest current stimulus did elicit the most sustained nose poking. Factors that may have
contributed to these disparate response rates include the placement of the electrode and
the varying sensitivity of the individual subjects to the 100 Hz frequency that was
consistently applied.
3.1.2 Imaging Results
Following behavioral training, BOLD fMRI under isoflurane anesthesia was used
to characterize the neural response to pulse trains of electrical stimulation with
parameters similar to those used during operant conditioning. In addition, high resolution
gradient echo imaging showed that the electrode tips are positioned slightly outside of the
dorsal extent of the MFB (Figure 3.3E, plate 3). Maps of the voxels containing
statistically significant (p-value < 10^-5) signal increases due to stimulus modulation were
then superimposed over the EPI slices acquired through the rostral half of the brain
during functional imaging for localization of activity (Figures 3.2 and 3.3). Three brain
regions showed consistent activation in all of the subjects to varying degrees. By
correlation with a stereotaxic atlas and high resolution images, these were determined to
be: 1) the somatosensory and motor cortex (S1/M1), 2) the striatum and orbital cortex
(St/OC) and 3) the central, or sagittal, sinus (SS).
The BOLD activation results, in terms of percentage of voxel activation, correlate
with the behavioral trends. Significantly fewer voxels are identified as statistically
significant when the lower stimulation amplitude is used (Figure 3.2). The regions
significantly modulated by the lower current stimulus are the most prominent at the
higher current stimulus as well. These results mirror the behavioral data, which showed
less robust responding at lower current amplitudes.
The timecourses of stimulus response vary between the three regions (Figure 3.4).
Modulation of signal in the central sinus is commonly observed in rat fMRI studies, and
may reflect an overall increase in blood perfusion due to the presence of a stimulus.
BOLD signal changes in this region are relatively uniform for the eight stimulation
cycles. Like the response in the S1/M1, on average the signal changes in the SS peak
more quickly after stimulation begins, and decay to baseline more slowly after stimulus
offset when compared with the St/OC. However, percent signal changes from baseline for
the S1/M1 are greater for the first four stimulation epochs than the last four. The average
peak signal change for all three regions is about 3%.
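The cycle averaging behind such timecourses can be sketched as follows. The baseline definition (the mean of the rest images) and the function names are assumptions, since the exact normalization is not stated.

```python
def percent_signal_change(signal, baseline):
    """Express each sample as percent change from a baseline value."""
    return [100.0 * (s - baseline) / baseline for s in signal]


def peristimulus_average(signal, images_per_cycle):
    """Average corresponding time points across consecutive cycles."""
    n_cycles = len(signal) // images_per_cycle
    return [
        sum(signal[c * images_per_cycle + i] for c in range(n_cycles)) / n_cycles
        for i in range(images_per_cycle)
    ]


# Toy timecourse: 2 cycles of 3 rest images followed by 2 stimulation images.
raw = [100.0, 100.0, 100.0, 103.0, 103.0,
       100.0, 100.0, 100.0, 105.0, 101.0]
baseline = sum(raw[i] for i in (0, 1, 2, 5, 6, 7)) / 6   # mean of rest images
psc = percent_signal_change(raw, baseline)
print(peristimulus_average(psc, images_per_cycle=5))      # [0.0, 0.0, 0.0, 4.0, 2.0]
```

Averaging across cycles in this way suppresses cycle-to-cycle variability and makes differences in peak time and decay between regions easier to compare.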
3.1.3 Part 1 Figures
[Plots: MFBL9 Responses to Various Current Amplitudes; MFBL11 Responses to Various Currents; x-axis: Time (milliseconds)]
Figure 3.1 RESPONSE TOTALS FOR (A) MFBL9, (B) MFBL11
AND (C) MFBL12 for various current levels over the time that the
subject was responding with nose pokes during 1 hour sessions.
Plots with smaller lengths and regions of plots that are
essentially horizontal lines indicate periods of unresponsiveness.
[Plot axis label: thres. current]
Figure 3.2 (A) PERCENT ACTIVATED VOXELS (p < 10^-5) across
the brain averaged for all three rats. Example of one EPI slice
stimulated at full current (B) and half current levels (C).
Activation increases with current as a nonlinear function which is
spatially qualitatively similar.
Figure 3.3 (A-C) BOLD ACTIVATION P-VALUE MAPS. The red,
blue and green arrows indicating loci of reproducibly stimulus-correlated BOLD activation refer to the sagittal, or central, sinus
(SS), the striatal/orbital cortex area (St/OC) and the
somatosensory and motor cortex (S1/M1) respectively. The
locations of these regions of interest are further illustrated by the
color-coded encircled areas in the rat brain atlas plates of panel
E. (D) High resolution scan of the rat in panel A. The numbered
slices correspond with the atlas plates in E. (E) Atlas plates with
color-coded indicators of activated regions and electrode tip
positions.
[Plots: x-axes: time (left) and peristimulus time (right)]
Figure 3.4 PERCENT SIGNAL CHANGE TIMECOURSES for all
cycles for one subject (left). Corresponding time points within
these cycles were then averaged to find the mean percent signal
change (right). Blue, green and red traces correspond to the
colored arrows of Figure 3.3. The gray regions indicate
stimulation periods, and the white are rest periods.
3.2 Part 2
3.2.1 Operant Training Results
Work by Peter Shizgal and Charles R. Gallistel has shown that the rewarding
quality of intracranial electrical stimulation increases with increasing frequency, but
tends to saturate after a certain point.1 We hypothesized that by investigating the BOLD
response to stimuli with frequencies below, at and above saturation values, we could
distinguish between activation resulting from the rewarding aspect of the stimulus and
other, secondary effects. In order to determine the saturation frequencies of the subjects
in this second phase of the study, a second nose poke device was added to the operant
training chamber. The nose pokes were assigned alternating roles as the reference and
comparison stimulus. The current amplitude was set at the beginning of training by
iteration to be just below the threshold for inducing involuntary movements.
3.2.1.1 Shaping Sessions
As stated in the Methods, the behavioral protocol for Part 2 involved shaping
trials to determine receipt of rewards, followed by variable trials to ascertain saturation
frequencies for individual subjects. By the third shaping trial, there was at least an order
of magnitude difference between response rates for rats, which were thereafter classified
as rewarded or not rewarded (Table 3.1). For the last shaping session, the average
response rates of rewarded and not rewarded rats were 39.58 responses per minute and
0.87 responses per minute respectively. High resolution anatomical scans produced with
gradient echo imaging showed that, in general, the electrode tips of rewarded rats were
closer and more dorsal to the MFB (Figure 3.5). In contrast, electrode tips for not
rewarded rats were positioned more medially (Figure 3.6).
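The size of the gap between the two groups can be checked directly from the averages reported above. The short Python sketch below computes the ratio of the last-session response rates. The function name and the tenfold criterion are illustrative assumptions drawn from the "order of magnitude" wording; they are not a procedure specified in the Methods.

```python
import math

# Average response rates for the last shaping session, as reported in the text
# (responses per minute).
REWARDED_RATE = 39.58
NOT_REWARDED_RATE = 0.87


def orders_of_magnitude_apart(high: float, low: float) -> float:
    """Return log10(high / low); a value of 1 or more means at least a tenfold gap."""
    if high <= 0 or low <= 0:
        raise ValueError("response rates must be positive")
    return math.log10(high / low)


ratio = REWARDED_RATE / NOT_REWARDED_RATE
gap = orders_of_magnitude_apart(REWARDED_RATE, NOT_REWARDED_RATE)
print(f"ratio = {ratio:.1f}, log10 gap = {gap:.2f}")
```

With the reported averages the ratio is roughly 45, so the groups differ by more than one order of magnitude but less than two.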
3.2.1.2 Variable Sessions
Rats that demonstrated reward-seeking behavior were advanced to variable
training, which began with the reference set at 200 Hz. The reference was increased until
the response rates for the reference frequency stimuli and for comparison stimuli above
the reference frequency were approximately equivalent (see Figures 3.9, 3.10 and 3.11).
MFBL17's response and reward receipt totals for sessions with the reference set to 200
Hz show that over time the subject learned to differentiate more consistently between
higher and lower frequencies, seeking more of the more rewarding stimuli. For higher
frequencies, the rat's preference, or ability to differentiate between the rewarding quality
of the different frequencies, decreased as the difference between the
higher and lower frequency decreased. Its preference, measured as the ratio of reference
frequency rewards to comparison frequency rewards, continued to change over the entire
range (159 Hz - 495 Hz) of comparison frequencies (Figure 3.7). In contrast, for the session where
the reference was set above 297 Hz, the reference to comparison preference ratio was
essentially 1:1 for comparison frequencies above the reference value, and was very close to
1:1 for almost the entire range (Figure 3.8, bottom row). Variable training proceeded in a similar
pattern for MFBL19 and MFBL24, although the rats performed at different but self-consistent
response rates. Parameters for the imaging procedure (Table 3.5) were
determined by the subjects' performance in five averaged trials at the determined
saturation frequency (Figures 3.9, 3.10 and 3.11).
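The preference measure used above, the ratio of reference frequency rewards to comparison frequency rewards, and the saturation criterion can be sketched in Python. The reward counts, the tolerance of 0.15 and the function names below are hypothetical placeholders chosen for illustration; they are not data or thresholds from this study.

```python
from typing import Dict, Tuple

# Comparison frequency (Hz) -> (reference rewards, comparison rewards).
Session = Dict[float, Tuple[float, float]]


def preference_ratios(session: Session) -> Dict[float, float]:
    """Ratio of reference rewards to comparison rewards at each comparison frequency."""
    return {freq: ref / comp for freq, (ref, comp) in session.items() if comp > 0}


def is_saturated(session: Session, reference_hz: float, tol: float = 0.15) -> bool:
    """True if the ratio stays within tol of 1:1 for every comparison frequency
    above the reference, i.e. the subject no longer prefers the higher frequency."""
    above = [r for f, r in preference_ratios(session).items() if f > reference_hz]
    return bool(above) and all(abs(r - 1.0) <= tol for r in above)


# Made-up reward totals, for illustration only.
example: Session = {
    250.0: (60.0, 30.0),  # comparison below the reference: reference preferred
    300.0: (48.0, 45.0),
    400.0: (50.0, 49.0),  # comparison above the reference: roughly 1:1
    500.0: (47.0, 50.0),
}
print(preference_ratios(example))
print(is_saturated(example, reference_hz=333.1))
```

In this sketch a reference counts as saturated only when every comparison frequency above it yields a ratio near 1:1, which mirrors the stopping rule described for raising the reference.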
3.2.3 Imaging Results
As in Part 1, high resolution anatomical images were used to determine the
positions of subjects' electrode tips. For rats that did not demonstrate reward-seeking
behavior, electrode tips tended to be more medially positioned than those of rewarded
rats. Rewarded rats' electrode tips were all within 1 mm of the midline of the MFB. The
fMRI sessions consisted of cycles of stimulation periods followed by longer rest periods,
during which single-shot gradient echo EPI sequences were used for standard BOLD
imaging. Imaging volumes consisted of 8-12 consecutive slices centered over the
somatosensory cortex. All three subjects had stimulation-induced BOLD activation;
however, larger areas of more statistically significant signal changes occurred in
MFBL17 and MFBL24 than in MFBL19. Both MFBL17 and MFBL24 had activation in the
central sinus (CS), the somatosensory/motor cortex (S1) and the caudate putamen (or
striatum) with surrounding ventral forebrain (CPu), but MFBL24's BOLD response was
more robust.
MFBL17's scans (Figure 3.12) showed a greater response in terms of percentage
of voxel activation at 300 Hz and 500 Hz than at 175 Hz, but the 300 Hz and 500 Hz
responses were similar in extent to each
other. In Part 1, it was shown that a higher current amplitude could create greater overall
BOLD activation, and these images suggest that pulse frequency - the only variable
parameter for this experiment - can achieve a qualitatively similar effect. Thus the BOLD
response appears to match the behavior in that it increases up to the saturation point, but
not beyond.
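The comparison of "percentage of voxel activation" across frequencies can be illustrated with a small Python sketch. The p-values below are synthetic, and the significance threshold of 0.05 and the function name are assumptions made for the example; the threshold actually used for the p-value maps is not stated in this section.

```python
from typing import Dict, Sequence


def percent_activated(p_values: Sequence[float], alpha: float = 0.05) -> float:
    """Percentage of voxels whose p-value falls below alpha."""
    if not p_values:
        raise ValueError("no voxels supplied")
    n_active = sum(1 for p in p_values if p < alpha)
    return 100.0 * n_active / len(p_values)


# Synthetic p-values for eight voxels at each stimulation frequency (Hz).
p_maps: Dict[float, Sequence[float]] = {
    175.0: [0.001, 0.20, 0.60, 0.90, 0.30, 0.04, 0.50, 0.95],
    300.0: [0.001, 0.01, 0.03, 0.90, 0.30, 0.04, 0.50, 0.95],
    500.0: [0.002, 0.02, 0.03, 0.80, 0.20, 0.04, 0.60, 0.90],
}

for hz, p_values in p_maps.items():
    print(f"{hz:.0f} Hz: {percent_activated(p_values):.1f}% of voxels activated")
```

The synthetic values were chosen to reproduce the qualitative pattern described above: a lower percentage below saturation and similar percentages at and above it.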
MFBL24 was chosen to examine this phenomenon in more detail because its p-value
maps reveal specific areas with consistent and exceptional BOLD activation over
time (Figure 3.14). Signal change as a function of time was plotted for voxels
significantly activated in MFBL24's CS, S1 and CPu regions at 265 Hz, and then
averaged over eight stimulus-rest cycles (Figures 3.16 - 3.18). These voxels were then
used to create corresponding signal change timecourses for the 155 Hz and 400 Hz
stimulation sessions. For these voxels, there is a clear increase in the percent signal
change during the 10 second period when trains of electrical pulses are being applied,
with a subsequent decrease to near baseline within 12 seconds. Signal changes for the CS
are an order of magnitude larger than for the S1 or the CPu, which is to be expected since
the volumetric blood flow in this large vessel is much greater than in brain regions where
blood vessels are fine and diffuse. Unlike the timecourse results of Part 1, decay of signal
post-stimulus is about the same for the three regions.
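The cycle averaging described above can be sketched as follows. This is a minimal Python illustration, not the analysis code used for the study: the choice of baseline (the mean of the last rest samples of each cycle), the function name and the sample values are all assumptions, and the example uses two short cycles rather than eight for brevity.

```python
from statistics import mean
from typing import List, Sequence


def average_cycles(timecourse: Sequence[float], n_cycles: int, n_rest_tail: int) -> List[float]:
    """Split a timecourse into equal stimulus-rest cycles, convert each cycle to
    percent signal change relative to the mean of its last n_rest_tail samples,
    and average corresponding time points across cycles."""
    if n_cycles <= 0 or len(timecourse) % n_cycles != 0:
        raise ValueError("timecourse length must be a multiple of n_cycles")
    cycle_len = len(timecourse) // n_cycles
    if not 0 < n_rest_tail <= cycle_len:
        raise ValueError("n_rest_tail must be between 1 and the cycle length")
    cycles = []
    for i in range(n_cycles):
        cycle = timecourse[i * cycle_len:(i + 1) * cycle_len]
        baseline = mean(cycle[-n_rest_tail:])  # assumed baseline: end of the rest period
        cycles.append([100.0 * (s - baseline) / baseline for s in cycle])
    # zip(*cycles) groups the k-th sample of every cycle together.
    return [mean(samples) for samples in zip(*cycles)]


# Two made-up cycles of four samples each (arbitrary signal units).
raw = [102.0, 104.0, 100.0, 100.0, 101.0, 103.0, 100.0, 100.0]
print(average_cycles(raw, n_cycles=2, n_rest_tail=2))
```

For the sessions described here the same function would be called with n_cycles=8 on the mean signal of the significantly activated voxels in each region.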
The more striking difference between the CS plots and the S1 or CPu plots is the
relative peak signal change of the timecourses. For both the S1 and CPu, the 265 Hz
(reference) and 400 Hz plots are similar, and notably larger than the 155 Hz response.
The CS response shows roughly equal peak signal changes for the 400 Hz and 155 Hz
stimuli, and smaller relative differences between the timecourse plots for the three
stimulus types. These results correlate with the behavioral studies, which showed that
MFBL24 did not prefer 400 Hz stimuli over 265 Hz stimuli, but preferred both to 155 Hz
stimuli.
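One way to make the "relative peak signal change" comparison concrete is to divide the peak of each averaged timecourse by the peak at the reference frequency. The Python sketch below does this for invented timecourses; the values and the function name are hypothetical and serve only to illustrate the pattern described for S1 and CPu.

```python
from typing import Dict, Sequence


def relative_peaks(timecourses: Dict[float, Sequence[float]], reference_hz: float) -> Dict[float, float]:
    """Peak percent signal change at each frequency divided by the peak at the reference."""
    ref_peak = max(timecourses[reference_hz])
    if ref_peak <= 0:
        raise ValueError("reference peak must be positive")
    return {hz: max(tc) / ref_peak for hz, tc in timecourses.items()}


# Hypothetical averaged timecourses (percent signal change), for illustration only.
s1_like = {
    155.0: [0.0, 0.5, 1.0, 0.4, 0.0],
    265.0: [0.0, 1.2, 2.0, 0.8, 0.0],
    400.0: [0.0, 1.1, 1.9, 0.7, 0.0],
}
print(relative_peaks(s1_like, reference_hz=265.0))
```

A relative peak near 1 at 400 Hz together with a clearly smaller value at 155 Hz corresponds to the saturating pattern reported for S1 and CPu.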
3.2.2 Part 2 Figures
Table 3.1 TOTAL RESPONSE (NOSE POKE) PER TRIAL
VALUES of each of the experimental subjects for the three
shaping sessions averaged over the 6 trials that comprised each
session. Response rates for trainable rats (highlighted in blue)
are orders of magnitude larger than those for untrainable rats.
Figure 3.5 ELECTRODE TIPS OF UNRESPONSIVE RATS:
MFBL18 (blue), MFBL20 (purple), MFBL21 (orange), MFBL22
(yellow), and MFBL23 (red). The positions of these electrode tips
are clearly more medial than those of the trainable subjects.
Figure 3.5 ELECTRODE TIPS OF REWARDED RATS: MFBL17
(red), MFBL19 (yellow), and MFBL24 (blue). The positions of
these electrode tips are all within 1 mm of the midline of an MFB.
[Figure 3.7 plots: panels titled by session date; x-axis: Comparison Frequency]
Figure 3.7 MFBL17's RESPONSE AND REWARD TOTALS FOR
200 Hz REFERENCE SESSIONS over three consecutive days.
The subject's nose poking is so vigorous that it is responding
faster than the 1.5 second reward stimulus interval. Over time,
MFBL17 more consistently chooses the reference or comparison
stimulus in numbers proportional to the difference between the
comparison and reference frequencies.
[Figure 3.8 plots: paired panels for reference frequencies of 211 Hz, 265.5 Hz, 297.4 Hz
and 333.1 Hz; x-axis: Comparison Frequency]
Figure 3.8 MFBL17's VARIABLE TRIAL RESPONSE AND
REWARD TOTALS for sessions with the reference set at the
frequencies indicated. As the reference frequency increases, the
difference between comparison and reference rewards or
responses decreases, especially for comparison frequencies
greater than the reference. For instance, with the highest
reference (333.1 Hz), the reference-comparison ratio does not
change at all for values above the reference, and is about 1 for
almost the whole range.
[Figure 3.9 plot, titled "MFBL17: 297.4 Reference Trial Average"; x-axis: Comparison Frequency]
Figure 3.9 MFBL17's TRIAL REWARD TOTALS averaged over
five sessions.
[Table 3.2 is largely lost in the source. Column headings: frequency, comparison,
comparison error, reference, reference error. Legible values, in extraction order:
105.4, 6.50, 200, 39.8, 6.55]
Table 3.2 VALUES USED TO CREATE FIGURE 3.9.
[Figure 3.10 plot, titled "MFBL19: 333.1 Reference Trial Average"; x-axis: Comparison Frequency]
Figure 3.10 MFBL19's TRIAL REWARD TOTALS averaged over
five sessions.
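[Table 3.3 is flattened in the source and the column assignment of the surviving values
is not recoverable. Column headings: frequency, comparison, comparison error, reference,
reference error. Values, in extraction order:
10.78, 5.24, 4.63, 3.40, 5.16, 8.33, 6.70, 10.04, 4.02, 4.17
149.0, 103.2, 84.2, 139.2, 79.2, 87.6, 123.2, 107.2, 105.6, 89.8
11.10, 5.07, 21.43, 3.78, 19.38, 7.91, 6.26, 9.61, 4.32, 4.05
395, 92.2]
Table 3.3 VALUES USED TO CREATE FIGURE 3.10.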
[Figure 3.11 plot, titled "MFBL24: 265.5 Reference Trial Average"; x-axis: Comparison Frequency]
Figure 3.11 MFBL24's TRIAL REWARD TOTALS averaged over
five sessions.
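[Table 3.4 is flattened in the source; the frequency, comparison and reference columns are
largely lost. Column headings: frequency, comparison, comparison error, reference,
reference error. Values, in extraction order:
Following the "comparison error" heading: 15.27, 5.55, 6.43, 3.60, 5.18, 5.81, 5.14, 7.88, 9.57, 2.78
Following the "reference error" heading: 13.08, 7.27, 3.96, 11.33, 2.66, 6.84, 5.13, 10.14, 5.56, 6.00
Remaining values: 58.4, 442, 78.0]
Table 3.4 VALUES USED TO CREATE FIGURE 3.11.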
Table 3.5 STIMULATION PARAMETERS THAT VARIED
BETWEEN SUBJECTS. Trainable subjects were stimulated at 0.5
Hz with trains of frequencies approximating the saturation value,
and two values below and above this value. Untrainable subjects
were all stimulated at 200 Hz.
Figure 3.12 ANATOMICAL AND P-VALUE MAPS OF MFBL17.
The 300 Hz and 500 Hz p-value maps have a greater percentage
of activated voxels. (*) Denotes the electrode tip position.
[Panel labels: gradient echo, EPI; orientation label: Caudal]
Figure 3.13 ANATOMICAL AND P-VALUE MAPS OF MFBL19.
High resolution gradient echo imaging was used to generate
the anatomical image, while EPI was used for the BOLD fMRI. (*)
denotes the electrode tip position. [Orientation label: Caudal]
Figure 3.14 ANATOMICAL AND P-VALUE MAPS OF MFBL24.
[Annotation text illegible in the source; orientation label: Caudal]
[Plot; x-axis: Peristimulus Time; scale bar: 20 sec]
Figure 3.16 CENTRAL SINUS AVERAGED TIMECOURSES FOR MFBL24.
Signal changes of voxels significantly activated
by 265 Hz stimulus are averaged over all eight stimulation cycles
and plotted as a function of time. The gray and white areas
represent stimulation period and rest period respectively.
[Plot; x-axis: Peristimulus Time]
Figure 3.17 SOMATOSENSORY AND MOTOR CORTEX
AVERAGED TIMECOURSES FOR MFBL24.
[Plot; x-axis: Peristimulus Time]
Figure 3.18 CAUDATE PUTAMEN AND SURROUNDING
VENTRAL FOREBRAIN AVERAGED TIMECOURSES FOR
MFBL24.
Reference
1. Gallistel, C. R. "The role of the dopaminergic projections in MFB self-stimulation."
Behav Brain Res. 1986 June; 20(3):313-21.
Chapter 4
Discussion
We have shown that electrical stimulation of the rat's medial forebrain bundle
(MFB) elicits hemodynamic responses in neural substrates that have been implicated in
reward processing, as well as the central sinus. Analysis of stimulus-correlated signal
changes in these reward-related regions reveals relationships between relative peak
activation and stimulus frequency. No such pattern is discernible in the signal changes
due to bulk hemodynamic responses to the stimuli in the central sinus. Data from the
reward-related regions therefore suggest that they are not simply responding to the
perception of a stimulus, but rather that they are processing the stimulus's rewarding quality.
Both Part 1 and Part 2 of this study were conducted as preliminary studies in
preparation for the imaging of awake rats performing a task for rewarding MFB
stimulation. In Part 1, three rats were trained in an operant conditioning chamber to poke
their noses into a device, initiating the stimulus generator to send an electrical stimulus of
predetermined parameters to their implanted electrode. Part 2 involved the addition of a
second device, to implement the new choice-based experimental design for the purpose
of determining a saturation frequency above which subjects could not detect changes in
reward magnitude of stimuli. The electrodes were lowered toward the MFB of each rat
with varying levels of success. The subjects - both in Part 1 and Part 2 - tended to be
receptive to rewards in terms of behavior and BOLD imaging when the electrode tip was
positioned within a millimeter of the midline of the MFB.
The MFB connects the hypothalamus with the ventral tegmental area, where
dense dopaminergic projections arise to innervate the striatum.1 MFB stimulation is
thought to be rewarding because of the phasic dopamine release activated by externally
introduced electrical currents. This artificial stimulation utilizes the native physiology of
the dopamine system, which plays an important role in goal-directed learning. Early
behavioral experimentation showed that higher pulse frequencies (assuming a square
wave stimulus) and current amplitudes make for more rewarding stimuli, up to a point.2
At a certain frequency, the dopaminergic neurons reach their maximum response level,
and above a particular current amplitude, the brain cannot withstand the electrical
potential without damage or undesired, involuntary movements. Furthermore, in a study
of variable frequencies combined with variable current levels, Gallistel showed that
reward magnitudes depend on the rate at which action potentials are generated and on the
size of the population of reward-relevant axons they are generated in.3 Therefore, the
maximum current magnitude and pulse frequency for a given experimental subject, as
well as the stimulus's rewarding quality that the subject evidences through its behavior,
strongly depend on the unique placement of its stimulation electrode. fMRI can then be
used to isolate areas of BOLD activation caused by stimuli whose rewarding qualities
have already been determined from individual-specific behavioral data, allowing the
experimenter to make correlations between activated regions and reward processing.
Figure 5.1 SAGITTAL SECTION OF THE RAT BRAIN WITH
INPUTS (A) AND OUTPUTS (B) OF THE MFB. Relevant
substrates include the amygdala (AMYG), caudate putamen
(CPU), frontal cortex4(FC), substantia nigra (SN) and ventral
tegmental area (VTA).
Operant conditioning results for this study showed that response rates for rats
were generally greatest for higher currents and frequencies. Results in Part 1 established
that rewarding MFB stimulation elicits a distributed hemodynamic response in the
anesthetized rats that is reduced for lower current stimuli. During imaging stimulation
cycles, reproducibly activated loci included the striatum and orbital cortex and
somatosensory/motor cortex. It is thought that the somatosensory/motor cortex sends
corticostriatal afferents to the striatum, and that the orbitofrontal cortex plays a role in
perceiving or evaluating reward magnitudes. The striatum-orbital cortex region (St/OC)
is ideally located to store the reward value of sensory stimuli, and in fact St/OC neurons
in rats have been shown to respond preferentially to different tastes. Such gustatory
studies (see Figure 1.8) indicate that this region codes for the rewarding quality of
stimuli, rather than their sensory aspects.5
In Part 2, the striatum, or caudate putamen, with adjacent ventral forebrain (CPu)
and somatosensory cortex were also exceptionally stimulus-modulated, and analysis of
the timecourses for these areas showed peak mean signal changes were similar for
saturation and above saturation frequencies, but signal changes were lower for below
saturation frequencies. In contrast, the timecourse for the central sinus, which reflects
generic hemodynamic responses to perception of reward, did not follow the same pattern.
Studies such as Samuel McClure's fMRI experiments with temporal prediction errors
(see Figure 1.7)6 have implicated the striatum, which corresponds to the rodent CPu, in
reward evaluation processes. In that study of human subjects, positive prediction errors
caused increased activity in the left putamen. This
evidence adds to a body of knowledge asserting that the striatum plays a pivotal role in
associating varying levels of reward with sensory information.
Although the areas with BOLD activation in Part 1 and Part 2 contain neural
substrates that have been consistently implicated in reward processing, the regions
activated in Part 1 tended to be more ventral than those in Part 2. The distributed neural
network of reward includes many substrates that interact in a complex fashion unique to
stimulus type, and the resolution of the BOLD imaging results is not fine enough to
precisely pinpoint the individual structures responsible for stimulus-correlated responses.
Given a much larger sample of reward-seeking subjects with robust BOLD responses, a
more complete and accurate map of spatial and temporal activation profiles during
stimulation could be constructed. BOLD fMRI could then be combined with substrate- or
process-specific contrast agents, and activation profiles could be directly correlated with
individual brain structures.
Using the operant conditioning protocol applied for this study, rats can be trained
to seek rewards, and their optimal stimulation parameters can be determined. These
experiments have therefore shown that it is reasonable to expect that the surgical,
behavioral and imaging procedures used here will contribute to reliable results in studies
with awake rodents. Results of awake imaging will then be compared with these studies
of anesthetized subjects, hopefully leading to insights into the real-time spatial activation
of experimenter-administered stimulation versus intracranial self-stimulation, as well as
the physiological basis of reward-related learning.
References
1. Wise, R. A. "Dopamine, Learning and Motivation." Nature Reviews
Neuroscience. Vol. 5, June 2004.
2. Gallistel, C. R. and Matthew Leon. "Measuring the Subjective Magnitude of
Brain Stimulation Reward by Titration with Rate of Reward." Behavioral
Neuroscience. 1991, Vol. 105, No. 6, 913-925.
3. Gallistel, C. R. "The role of the dopaminergic projections in MFB self-stimulation."
Behav Brain Res. 1986 June; 20(3):313-21.
4. Stellar, James R. and Eliot Stellar. The Neurobiology of Motivation and Reward.
© 1985 by Springer-Verlag New York, Inc.
5. O'Doherty, John P. "Reward representations and reward-related learning in the
human brain: insights from neuroimaging." Current Opinion in Neurobiology.
2004, 14:769-776.
6. McClure, S. M. et al. "Temporal Prediction Errors in a Passive Learning Task
Activate Human Striatum." Neuron. Vol. 38, 339-346, April 24, © 2003 by Cell
Press.