W. Schultz is in the Institute of Physiology of the University of Fribourg, CH-1700 Fribourg, Switzerland.
| Abstract |
|---|
| Introduction |
|---|
Rewards have three basic functions (1). First, they elicit approach and consummatory behavior and serve as goals of voluntary behavior. In doing so, they interrupt ongoing behavior and change the priorities of behavioral actions. Second, rewards have positive reinforcing effects. They increase the frequency and intensity of behavior leading to such objects (learning) and maintain learned behavior by preventing extinction. This function constitutes the essence of "coming back for more" and relates to the notion of receiving rewards for having done something useful. Learning proceeds when rewards occur unpredictably and slows as rewards become more and more predicted (8). Thus, reward-driven learning depends on the discrepancy or "error" between the prediction of reward and its actual occurrence. In their third function, rewards induce subjective feelings of pleasure (hedonia) and positive emotional states. This function is difficult to investigate in animals.
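The idea that learning is driven by the discrepancy between predicted and obtained reward, and slows as the reward becomes predicted, can be sketched as a simple delta rule. This is a minimal illustration under stated assumptions, not the specific formulation of the learning theories cited here: the learning rate `alpha`, the reward magnitude of 1, and the function name `run_trials` are hypothetical choices.

```python
def run_trials(n_trials: int, alpha: float = 0.3, reward: float = 1.0):
    """Delta-rule sketch (assumed form): the reward prediction v moves toward
    the obtained reward by a fraction alpha of the prediction error."""
    v = 0.0          # no reward is predicted before learning starts
    errors = []      # prediction error recorded on each trial
    for _ in range(n_trials):
        error = reward - v   # discrepancy between reward and its prediction
        v += alpha * error   # learning step proportional to that discrepancy
        errors.append(error)
    return v, errors


if __name__ == "__main__":
    v, errors = run_trials(10)
    for trial, err in enumerate(errors, start=1):
        print(f"trial {trial:2d}: prediction error {err:.3f}")
    print(f"prediction after 10 trials: {v:.3f}")
```

With `alpha = 0.3`, each trial's error is 70% of the previous one, so the learning steps shrink as the reward becomes predicted.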
Reduced dopamine neurotransmission in parkinsonian patients and experimentally lesioned animals is associated with severe deficits in movement, motivation, attention, and cognition. One consistent motivational deficit concerns the use of reward information for learning and maintaining approach and consummatory behavior (2, 9). The deficits occur mainly with destruction of projections from midbrain dopamine neurons to the nucleus accumbens and, to a lesser extent, to frontal cortex and striatum (caudate nucleus and putamen). These systems are also involved in the addictive properties of major drugs of abuse, such as cocaine, amphetamine, heroin, and nicotine.
| Responses of dopamine neurons in behaving primates |
|---|
Dopamine neurons show phasic activations when animals touch a morsel of hidden food or when drops of liquid are delivered to their mouth, whether outside of behavioral tasks or during learning (Fig. 2, top). Dopamine neurons distinguish rewards from nonreward objects but do not appear to discriminate between different food objects or liquid rewards. Only a few of them show phasic activations after primary aversive stimuli, such as nonnoxious air puffs to the hand or hypertonic saline to the mouth (7). These stimuli are aversive because they disrupt behavior and induce active avoidance reactions.
Concurrently with the development of the dopamine response to reward-predicting stimuli during learning, the response to the predicted reward itself is lost, as if the response were transferred from the reward to the reward-predicting stimulus (Fig. 2, top vs. middle). This is observed both when free rewards delivered outside of behavioral tasks become predicted by conditioned stimuli through learning and when rewards occur surprisingly during individual learning phases and become predicted once a phase is fully acquired. Thus rewards activate dopamine neurons only when they are not predicted by phasic stimuli.
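The apparent transfer of the response from the reward to the reward-predicting stimulus can be reproduced qualitatively with a tabular temporal difference (TD) sketch. The sketch rests on assumptions that are not taken from the text: time after stimulus (CS) onset is split into discrete steps, each with its own learned prediction; the CS itself arrives at an unpredictable time, so the state before it carries no prediction; and `simulate_td`, `n_steps`, `alpha`, and `gamma` are hypothetical names and settings.

```python
def simulate_td(n_trials=200, n_steps=5, alpha=0.2, gamma=1.0, reward=1.0):
    """Tabular TD(0) sketch of a conditioning trial: a CS at step 0 and a
    reward at the last step. Returns per-trial prediction errors at CS onset
    and at the time of reward."""
    # V[t]: predicted future reward t steps after CS onset.
    # V[n_steps] is the post-trial state and stays at 0.
    V = [0.0] * (n_steps + 1)
    cs_errors, reward_errors = [], []
    for _ in range(n_trials):
        # Assumption: the CS arrives at an unpredictable time, so the state
        # before it carries no prediction and the error at CS onset is
        # simply the discounted value of the first post-CS state.
        cs_errors.append(gamma * V[0])
        for t in range(n_steps):
            r = reward if t == n_steps - 1 else 0.0
            delta = r + gamma * V[t + 1] - V[t]   # TD prediction error
            if t == n_steps - 1:
                reward_errors.append(delta)
            V[t] += alpha * delta
    return cs_errors, reward_errors


if __name__ == "__main__":
    cs, rw = simulate_td()
    for trial in (0, 10, 30, 199):
        print(f"trial {trial + 1:3d}: error at CS {cs[trial]:+.2f}, "
              f"error at reward {rw[trial]:+.2f}")
```

Early in training the positive error occurs at the reward and none at the CS; after training the error appears at the CS and the fully predicted reward produces almost none.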
Dopamine neurons have a limited capacity to discriminate between appetitive and neutral or aversive stimuli. Only stimuli that are sufficiently dissimilar physically are well discriminated. Stimuli that do not explicitly predict rewards but physically resemble reward-predicting stimuli induce small activations followed by depressions in a limited fraction of neurons.
Dopamine neurons are depressed at the habitual time of reward when a predicted reward fails to occur, whether because of an error by the animal, withholding by the experimenter, or delayed delivery (Fig. 2, bottom). The depression occurs even though no stimulus immediately precedes the omitted reward. This reflects an expectation process based on an internal clock that tracks the precise time of the predicted reward. Conversely, an activation follows the reward when it is presented at a different time than predicted (Fig. 3). These data suggest that the prediction influencing dopamine neurons concerns both the occurrence and the time of reward.
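The same kind of tabular TD sketch shows why an error model tied to reward timing produces a depression at the habitual time of an omitted or delayed reward and an activation when the reward arrives late. The "internal clock" is assumed here to be a set of discrete time-since-CS states; `train_values`, `probe_errors`, and all parameter values are hypothetical, and the sketch covers only the omission and delay cases described in the text.

```python
def train_values(n_trials=300, n_steps=5, alpha=0.2, gamma=1.0):
    """Tabular TD(0) training with reward at step n_steps - 1 after the CS.
    V[t] is the prediction t steps after CS onset; the last two entries are
    post-reward states that are never updated and so stay at 0."""
    V = [0.0] * (n_steps + 2)
    for _ in range(n_trials):
        for t in range(n_steps):
            r = 1.0 if t == n_steps - 1 else 0.0
            V[t] += alpha * (r + gamma * V[t + 1] - V[t])
    return V


def probe_errors(V, reward_step, gamma=1.0):
    """Prediction errors on a test trial without learning. reward_step is
    the step at which reward arrives, or None if it is omitted."""
    errors = []
    for t in range(len(V) - 1):
        r = 1.0 if t == reward_step else 0.0
        errors.append(r + gamma * V[t + 1] - V[t])
    return errors


if __name__ == "__main__":
    V = train_values()
    for label, step in (("as predicted", 4), ("omitted", None), ("delayed", 5)):
        errs = probe_errors(V, step)
        print(f"{label:>12}: " + " ".join(f"{e:+.2f}" for e in errs))
```

An omitted reward yields a negative error at step 4 only, and a reward delayed to step 5 yields a negative error at step 4 followed by a positive error at step 5, although no external stimulus marks step 4.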
Taken together, most dopamine neurons show phasic activations after food and fluid rewards and after conditioned, reward-predicting stimuli. They show biphasic activation-depression responses after stimuli that resemble reward-predicting stimuli or are novel or particularly salient. However, phasic activations follow aversive stimuli in only a few neurons. Thus dopamine neurons label environmental stimuli with an appetitive "tag," predict and detect rewards, and signal alerting and motivating events.
All responses to rewards and reward-predicting stimuli depend on event predictability, including the precise time of reward. The more tonic reward-predicting environmental context in which a reward occurs does not appear to influence dopamine neurons. The dopamine reward response appears to indicate the extent to which a reward differs from its prediction, termed an "error" in the prediction of reward. Thus dopamine neurons report rewards relative to their prediction rather than signaling rewards unconditionally. They appear to be feature detectors for the goodness of environmental events relative to prediction: they are activated by rewarding events that are better than predicted, remain uninfluenced by events that are as good as predicted, and are depressed by events that are worse than predicted (Fig. 2). However, they fail to discriminate between different rewards and thus appear to emit an alerting message about the surprising presence or absence of rewards without indicating the particular nature of each reward. In short, they process the time and prediction of rewards but not the nature of the particular reward.
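The three response types, and the insensitivity to reward identity, can be summarized as a scalar error that ignores which reward was delivered. This sketch assumes, for illustration only, that every reward can be reduced to one scalar value; the `Reward` type, its labels, and the functions `prediction_error` and `classify` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reward:
    kind: str     # identity of the reward (hypothetical labels such as "juice")
    value: float  # scalar magnitude; the only property the error signal uses


def prediction_error(reward: Optional[Reward], predicted: float) -> float:
    """Scalar error: obtained value minus predicted value (None = omitted)."""
    obtained = reward.value if reward is not None else 0.0
    return obtained - predicted


def classify(error: float, tol: float = 1e-9) -> str:
    """Map the error onto the three response types described in the text."""
    if error > tol:
        return "activation"   # better than predicted
    if error < -tol:
        return "depression"   # worse than predicted
    return "no change"        # as good as predicted


if __name__ == "__main__":
    print(classify(prediction_error(Reward("juice", 1.0), predicted=0.0)))
    print(classify(prediction_error(Reward("apple", 1.0), predicted=1.0)))
    print(classify(prediction_error(None, predicted=1.0)))
    # Rewards of equal value but different kind give identical errors.
    print(prediction_error(Reward("juice", 1.0), 0.5)
          == prediction_error(Reward("apple", 1.0), 0.5))
```

Because `kind` never enters `prediction_error`, the signal says that something better or worse than predicted happened, but not what it was.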
| Potential use of the reward prediction error signal |
|---|
The basic arrangement of synaptic influences of dopamine neurons on striatal and frontal cortex neurons consists of a triad comprising dendritic spines, excitatory cortical terminals at the tips of dendritic spines, and dopamine varicosities contacting the same dendritic spines (Fig. 4). Every medium-sized striatal spiny neuron receives ~1,000 dopaminergic synapses at its dendritic spines and ~5,000 cortical synapses. This arrangement would allow dopamine neurons to influence the synaptic effects of cortical inputs to striatal neurons. The released dopamine may act on striatal and cortical neurons in several ways. 1) The immediate effect may consist of a change in corticostriatal neurotransmission. This would modify information circulating in cortico-basal ganglia loops and influence neurons in cortical structures involved in structuring behavioral output. 2) The relatively slow time course of dopamine membrane action may leave a short trace of the reward event and influence all subsequent activity for a short while. 3) Potential dopamine-dependent plasticity in the striatum, together with the observed forms of dopamine responses, may induce plastic changes in striatal and cortical synapses concurrently activated by the events leading to reward.
The synapses A→I and B→I onto striatal neuron I are modifiable in a Hebbian manner, over the short or long term. The same spines are indiscriminately contacted by the global reward prediction error signal from dopamine input X. Both neuron X and neuron A, but not neuron B, are activated when a reward-related signal is encountered. Neuron X transmits the message that a rewarding event has occurred without giving specific details, whereas neuron A sends a message about one of several detailed aspects of the reward-related event, such as the color, texture, position, or surroundings of the stimulus, or may code a movement leading to the reward. The weights of striatal synapses could be modified according to the learning rule

$$\Delta\omega = \varepsilon \, r \, i \, o$$

where ω is synaptic weight, ε is a learning constant, r is the dopamine prediction error signal, i is input activation, and o is activation of the striatal neuron. Thus, through the simultaneity or near simultaneity of activity in A and X, the activity of neuron X may induce a change in neurotransmission at the active A→I synapse but leave the inactive B→I neurotransmission unchanged. If the changes in synaptic transmission are lasting, subsequent input from neuron A would lead to an increased response in neuron I, whereas input from neuron B would lead to an unchanged response. Thus, the synaptic changes of A→I and B→I neurotransmission are conditional on dopamine neuron X being conjointly active with A or B.

The dopamine response coding an error in the prediction of reward resembles in all major aspects the reinforcement signal of a particularly effective class of reinforcement models that incorporate temporal difference algorithms (6, 13, 15). These models are based on behavioral learning theories that assume that learning depends crucially on the discrepancy or error between the prediction of reinforcement and its actual occurrence (1, 8). In these models, a critic module generates a global reinforcement signal and sends it to the actor module, which learns and executes behavioral output. The critic-actor architecture closely resembles the connectivity of the basal ganglia, including the dopamine projection to the striatum and the reciprocal striatonigral projection. Models using temporal difference algorithms learn a wide variety of behavioral tasks, ranging from balancing a pole on a cart to playing world-class backgammon (for references, see Ref. 11). Robots using temporal difference algorithms learn to move about two-dimensional space and avoid obstacles, reach and grasp, or insert a peg into a hole. Neurobiologically inspired temporal difference models replicate the foraging behavior of honeybees, simulate human decision making, and learn orienting reactions, eye movements, sequential movements, and spatial delayed-response tasks. Notably, teaching signals based on prediction errors result in faster and more complete learning than unconditional reinforcement signals.
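The three-factor learning rule Δω = ε·r·i·o (weight change proportional to the dopamine error r, input activity i, and striatal output o) and the critic-actor arrangement can be illustrated together. Part 1 replays the A/B/X example with one rewarded pairing. Part 2 is a minimal one-step actor-critic in which the critic's error also trains the actor through the same rule; because the task has a single step, the TD error reduces to r − V. Everything beyond the rule itself is an assumption for illustration: the initial weights and activities, the two-choice task with reward probabilities 0.8 and 0.2, the softmax choice rule, and the names `three_factor_update` and `actor_critic` are hypothetical, and the sketch does not reproduce the specific models cited in refs. 6, 13, and 15.

```python
import math
import random


def three_factor_update(w, r, i, o, eps=0.1):
    """Learning rule delta_w = eps * r * i * o: the weight changes only when
    the dopamine error r, input activity i and output activity o are all
    nonzero."""
    return w + eps * r * i * o


# Part 1: inputs A and B onto striatal neuron I; dopamine input X supplies r.
weights = {"A": 0.5, "B": 0.5}        # hypothetical initial weights
activity = {"A": 1.0, "B": 0.0}       # A fires with the reward event, B is silent
o_I = 1.0                             # assumed activation of neuron I
r_X = 1.0                             # unpredicted reward: positive error
for name in weights:
    weights[name] = three_factor_update(weights[name], r_X, activity[name], o_I)


# Part 2: critic-actor loop on a two-choice task using the same rule.
def actor_critic(n_trials=500, alpha=0.1, eps=0.1, p_reward=(0.8, 0.2), seed=0):
    """Critic learns the expected reward V; actor learns one preference per
    action. Both are driven by the single global error delta = r - V."""
    rng = random.Random(seed)
    V = 0.0
    prefs = [0.0, 0.0]
    for _ in range(n_trials):
        # Softmax choice between the two actions (an assumed policy form).
        p0 = 1.0 / (1.0 + math.exp(prefs[1] - prefs[0]))
        action = 0 if rng.random() < p0 else 1
        r = 1.0 if rng.random() < p_reward[action] else 0.0
        delta = r - V              # critic's global prediction error
        V += alpha * delta         # critic update
        for a in range(2):
            # Actor update: i = 1 (task cue present), o = 1 only for the
            # chosen action, so unchosen preferences are left unchanged.
            o = 1.0 if a == action else 0.0
            prefs[a] = three_factor_update(prefs[a], delta, 1.0, o, eps)
    return V, prefs


if __name__ == "__main__":
    print("weights after one rewarded pairing:", weights)
    V, prefs = actor_critic()
    print(f"critic prediction {V:.2f}, actor preferences {prefs}")
```

Because unchosen actions have o = 0, only the preference of the action actually taken changes, the same conditionality that leaves the inactive B→I synapse untouched; and once a reward is fully predicted (r = 0), no weight changes further.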
| Conclusions and extensions |
|---|
Information concerning food and fluid rewards is also processed in brain structures other than dopamine neurons, such as the dorsal and ventral striatum, subthalamic nucleus, amygdala, dorsolateral prefrontal cortex, orbitofrontal cortex, and anterior cingulate cortex. However, these structures do not appear to emit a global reward prediction error signal similar to that of dopamine neurons. They show 1) transient responses after the delivery of rewards, 2) transient responses to reward-predicting cues, 3) sustained activations during the expectation of rewards, and 4) modulations of behavior-related activity by predicted rewards (for references, see Ref. 11). Many of these neurons discriminate well between different food or fluid rewards and thus may process the specific nature of the rewarding event. Some reward responses depend on reward unpredictability, being reduced or absent when the reward is predicted by a conditioned stimulus, although it is unclear whether they signal prediction errors as dopamine neurons do. It thus appears that the processing of specific rewards for learning and maintaining approach behavior would benefit strongly from cooperation between dopamine neurons, which signal the unpredicted occurrence or omission of reward, and neurons in the other structures, which simultaneously indicate the specific nature of the reward.
Impaired dopamine neurotransmission with Parkinson's disease, experimental lesions, or neuroleptic treatment is associated with many behavioral deficits in movement (akinesia, tremor, rigidity), cognition (attention, bradyphrenia, planning, learning), and motivation (reduced emotional responses, depression). Most deficits are considerably ameliorated by systemic therapy with dopamine precursors or receptor agonists, which cannot in any simple manner restore the phasic information transmitted by neuronal impulses. It appears that dopamine neurotransmission serves two separate functions in the brain: the phasic processing of appetitive and alerting information, and the tonic enabling of a large variety of motor, cognitive, and motivational processes without temporal coding (11). The tonic dopamine function is based on low, sustained extracellular dopamine concentrations in the striatum (5-10 nM) and other dopamine-innervated areas. The ambient dopamine concentration is regulated locally within a narrow range by spontaneous impulses, synaptic overflow, reuptake transport, metabolism, autoreceptor-controlled release and synthesis, and presynaptic transmitter interaction. For optimal function of a given brain region, the tonic stimulation of dopamine receptors should be neither too low nor too high. Other neurotransmitters exist in similarly low ambient concentrations, such as glutamate in striatum, cerebral cortex, hippocampus, and cerebellum; aspartate and GABA in striatum and frontal cortex; and adenosine in hippocampus. Neurons in many brain structures are apparently bathed in a "soup" of neurotransmitters that have powerful, specific physiological effects on neuronal excitability.
Given the general importance of tonic extracellular concentrations of neurotransmitters, it appears that the wide range of parkinsonian symptoms would not be caused by deficient transmission of reward information by dopamine neurons but would reflect a malfunction of striatal and cortical neurons caused by impaired enabling by reduced ambient dopamine. Dopamine neurons would not be actively involved in the wide range of processes deficient in parkinsonism but would provide the important background concentration of dopamine necessary to maintain proper functioning of striatal and cortical neurons involved in these processes.
| Acknowledgments |
|---|
W. Schultz was awarded the 1997 Theodore Ott Prize of the Swiss Academy of Medical Sciences for the work reviewed in this article.
Reference citations are limited because of editorial restrictions.
| References |
|---|