Research Article
COGNITIVE NEUROSCIENCE

Signal dynamics of midbrain dopamine neurons during economic decision-making in monkeys

Science Advances 01 Jul 2020:
Vol. 6, no. 27, eaba4962
DOI: 10.1126/sciadv.aba4962

Abstract

When we make economic choices, the brain first evaluates available options and then decides whether to choose them. Midbrain dopamine neurons are known to reinforce economic choices through their signal evoked by outcomes after decisions are made. However, although critical internal processing is executed while decisions are being made, little is known about the role of dopamine neurons during this period. We found that dopamine neurons exhibited dynamically changing signals related to the internal processing while rhesus monkeys were making decisions. These neurons encoded the value of an option immediately after it was offered and then gradually changed their activity to represent the animal’s upcoming choice. Similar dynamics were observed in the orbitofrontal cortex, a center for economic decision-making, but the value-to-choice signal transition was completed earlier in dopamine neurons. Our findings suggest that dopamine neurons are a key component of the neural network that makes choices from values during ongoing decision-making processes.

INTRODUCTION

When we make economic choices, the brain first evaluates available options and then decides whether to choose them. To understand the neural mechanism underlying the decision-making process, previous studies have measured neuronal activity while decisions are being made and have found that neurons in prefrontal and striatal regions encode information necessary to make economic decisions (1–5). In particular, neurons in the orbitofrontal cortex (OFC) encode multiple decision variables associated with internal processing executed while decisions are being made, such as the evaluation of available options, comparison between the options, and identification of a chosen option (6–9). These OFC signals dynamically change from information about available options to those about a chosen option as the decision-making process progresses. Accordingly, the OFC is considered a key component of the neural network that decides whether to choose available options based on the value information.

Midbrain dopamine neurons also play a crucial role in making economic decisions. These neurons are known to encode a reward-related signal called “reward prediction error” that indicates a discrepancy between obtained and expected reward values (10). This dopamine signal has been proposed to reinforce choices that lead to better-than-expected outcomes (11, 12). Note that the reward prediction error signal, which produces the reinforcement effect on choices, is evoked by outcomes after decisions are made. Although critical internal processing is executed while decisions are being made, little is known about the role of dopamine neurons during this period. Several studies have reported that dopamine neurons encode the value of a chosen option while animals are making economic decisions (13, 14), but the way in which this dopamine signal contributes to the decision-making process remains elusive.

In a separate line of research, it has become increasingly clear that dopamine neurons are divided into multiple subgroups encoding distinct signals related not only to value but also to punishment, salience, body movement, and cognitive processes (15–22). These observations raise the possibility that, within the economic decision-making framework, dopamine neurons encode decision variables other than value-related information. In the present study, we investigated whether dopamine neurons encode signals involved in internal processing executed while decisions are being made. To this end, we recorded single unit activity from dopamine neurons in monkeys performing an economic decision-making task. This task was designed to continuously monitor neuronal activity as the monkey evaluated an option, decided whether to choose it, and expressed the choice with a motor action. We found that dopamine neurons encoded the value of an option immediately after it was offered and then gradually changed their activity to represent whether the monkey would decide to choose or not to choose the option. To place these dopamine signals in context, we also recorded single unit activity from the OFC, which is a key cortical substrate of economic decision-making. We found that OFC neurons exhibited similar signal dynamics, but the value-to-choice signal transition was completed earlier in dopamine neurons than in OFC neurons. Our findings extend current knowledge about the role of dopamine neurons in economic decision-making by highlighting their dynamically changing signals associated with the ongoing decision-making process.

RESULTS

Monkeys made decisions based on an option’s value in our economic decision-making task

We designed an economic decision-making task in which a monkey decided whether to choose an offered option (Fig. 1A). The monkey gazed at a central fixation point and pressed a button at the beginning of each trial, following which two of six possible visual objects were sequentially presented. The six visual objects were associated with different amounts of a liquid reward (0.12 ml, value 1; 0.18 ml, value 2; 0.24 ml, value 3; 0.30 ml, value 4; 0.36 ml, value 5; and 0.42 ml, value 6). The same set of visual objects had been used throughout training (more than 6 months) and recording sessions in both monkeys. The first object was presented as an option, and the monkey was required to decide to choose or not to choose the first object within its presentation. Releasing the button was regarded as the decision to choose the first object, while keeping the button pressed down was regarded as the decision not to choose it. After the decision had been made, the second object was presented, and the outcome of the trial was delivered. If the monkey had decided to choose the first object, then the animal obtained the reward associated with the first object. If the monkey had decided not to choose the first object, then the animal obtained the reward associated with the second object by releasing the button within the presentation of the second object (i.e., by simply responding to the appearance of the second object). This task design enabled us to continuously monitor neuronal activity while decisions were being made during the presentation of the first object, that is, as the monkey evaluated the first object, decided whether to choose it, and expressed the choice with the button release.
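The reward schedule and trial logic described above can be summarized in a short Python sketch. The names `REWARD_ML` and `trial_outcome` are hypothetical; the volumes are those listed in the text, which step by 0.06 ml per value unit, and treating a trial with no button release in either window as unrewarded is our assumption based on the task description.

```python
# Reward volume (ml) associated with each object value, as listed in the text.
# Consecutive values differ by 0.06 ml (value v -> 0.06 * (v + 1) ml).
REWARD_ML = {1: 0.12, 2: 0.18, 3: 0.24, 4: 0.30, 5: 0.36, 6: 0.42}


def trial_outcome(first_value, second_value, released_during_first, released_during_second):
    """Return (decision, reward_ml) for one trial of the sequential-offer task.

    Releasing the button while the first object is shown counts as choosing it.
    Keeping the button pressed counts as rejecting it; the second object's
    reward is then earned only by releasing while the second object is shown.
    """
    if released_during_first:
        return "chose_first", REWARD_ML[first_value]
    if released_during_second:
        return "rejected_first", REWARD_ML[second_value]
    # Assumption: no release in either window means no reward on this trial.
    return "no_release", 0.0


if __name__ == "__main__":
    print(trial_outcome(4, 2, True, False))   # ('chose_first', 0.3)
    print(trial_outcome(4, 6, False, True))   # ('rejected_first', 0.42)
```

Encoding the trial this way makes explicit that only a release during the first object reflects an economic choice; a release during the second object is a simple response to its appearance.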

Fig. 1 Economic decision-making task, monkeys’ behavior, and recording sites.

(A) Economic decision-making task. ITI, intertrial interval. (B) Choice rate of the first object in monkey A (n = 216 sessions) (left) and monkey E (n = 165 sessions) (right). (C) Rate of trials in which the monkey did not release the button within the second object presentation among trials in which the animal did not choose the first object. (D) Latency of the button release to choose the first object (circles) and to respond to the appearance of the second object (squares). Double asterisks indicate a significant difference between the latencies for the first and second objects (P < 0.01, two-tailed Wilcoxon signed-rank test). (E) Effects of the first (green) and second (orange) object values in the previous trial (t−1) and the first object value in the current trial (t) (purple) on the monkey’s choice. Double asterisks indicate a significant logistic regression coefficient (P < 0.01). Error bars in (B) to (E) indicate SEM, which are very small and hidden in most cases. (F and G) Recording sites shown on the images obtained by an MRI scan, in which the position of electrodes targeting the left SNc/VTA (red) (F) and the right OFC (yellow) (G) in monkey E is displayed.

We trained two monkeys (monkeys A and E) to perform this task. Both monkeys decided whether or not to choose the first object based on its value; the higher the value of the first object became, the more likely they were to choose it (i.e., the logistic regression slope between the value and the choice rate was significantly larger than zero; monkey A: n = 216 sessions, mean ± SD = 3.5 ± 0.9, z = 11.70, P = 1.3 × 10^−31; monkey E: n = 165 sessions, mean ± SD = 3.4 ± 1.0, z = 10.12, P = 4.6 × 10^−24; two-tailed Wilcoxon signed-rank test) (Fig. 1B). When the monkey decided not to choose the first object, the animal needed to release the button during the presentation of the second object. However, the animals sometimes did not release the button within the presentation of the second object, especially when the value of the second object was low (i.e., the regression slope between the value and the rate of not releasing the button within the presentation of the second object was significantly smaller than zero; monkey A: n = 216 sessions, b = −0.73, P = 3.7 × 10^−38; monkey E: n = 165 sessions, b = −2.74, P = 1.4 × 10^−44) (Fig. 1C). Furthermore, the latency of the button release was significantly longer when the monkey decided to choose the first object than when the monkey simply responded to the appearance of the second object (monkey A: n = 216 sessions; value 4: z = 12.68, P = 7.3 × 10^−37; value 5: z = 12.73, P = 3.9 × 10^−37; and value 6: z = 12.66, P = 9.5 × 10^−37; monkey E: n = 165 sessions; value 4: z = 11.17, P = 5.4 × 10^−29; value 5: z = 11.17, P = 5.4 × 10^−29; and value 6: z = 11.17, P = 5.4 × 10^−29; two-tailed Wilcoxon signed-rank test) (Fig. 1D). The latency significantly decreased as the object value increased for both the first object (i.e., the regression slope between the value and the latency was significantly smaller than zero; monkey A: n = 216 sessions, mean ± SD = −24.8 ± 8.2, z = −12.74, P = 3.4 × 10^−37; monkey E: n = 165 sessions, mean ± SD = −50.3 ± 16.8, z = −11.17, P = 5.5 × 10^−29; two-tailed Wilcoxon signed-rank test) and the second object (monkey A: n = 216 sessions, mean ± SD = −11.0 ± 5.1, z = −12.72, P = 4.9 × 10^−37; monkey E: n = 165 sessions, mean ± SD = −5.3 ± 6.2, z = −8.78, P = 1.7 × 10^−18; two-tailed Wilcoxon signed-rank test).
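The per-session “logistic regression slope” used above can be illustrated with a minimal Python sketch that fits the probability of choosing the first object as a logistic function of its value by Newton-Raphson iteration. This is an assumed formulation (an intercept plus a single value term, fitted separately for each session); the study's own fitting software is not specified here, and the sketch does not handle sessions in which choices are perfectly separated by value.

```python
import math


def fit_logistic_slope(values, choices, max_iter=100, tol=1e-10):
    """Fit P(choose first) = 1 / (1 + exp(-(b0 + b1 * value))) by Newton-Raphson.

    values  : first-object value on each trial (e.g., 1..6)
    choices : 1 if the first object was chosen on that trial, otherwise 0
    Returns (b0, b1); b1 plays the role of the per-session logistic slope.
    Perfectly separable sessions have no finite maximum and are not handled.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for v, y in zip(values, choices):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * v)))
            w = p * (1.0 - p)
            # Gradient of the log-likelihood ...
            g0 += y - p
            g1 += (y - p) * v
            # ... and the information matrix (negative Hessian).
            h00 += w
            h01 += w * v
            h11 += w * v * v
        det = h00 * h11 - h01 * h01
        # Newton step: solve [[h00, h01], [h01, h11]] @ [d0, d1] = [g0, g1].
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


if __name__ == "__main__":
    # 1 of 4 value-1 offers and 3 of 4 value-2 offers chosen.
    vals = [1, 1, 1, 1, 2, 2, 2, 2]
    chose = [1, 0, 0, 0, 1, 1, 1, 0]
    print(fit_logistic_slope(vals, chose))  # close to (-3 ln 3, 2 ln 3)
```

With one of four value-1 offers and three of four value-2 offers chosen, the fitted slope is 2 ln 3 (about 2.2). Slopes from all sessions of one monkey would then be tested against zero with a two-tailed Wilcoxon signed-rank test, as reported above.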

Notably, the monkey’s choice behavior was influenced not only by the value of the first object in the ongoing trial but also by the values of the first and second objects presented in the previous trial. A logistic regression analysis showed significantly negative regression coefficients between the values of the previous first and second objects and the choice rate of the current first object (n = 82,855 trials; previous first object, b = −0.13, P = 1.2 × 10^−48; previous second object, b = −0.12, P = 2.8 × 10^−45) (Fig. 1E), indicating that the higher the values of these previous objects, the less likely the monkeys were to choose the first object in the ongoing trial. This suggests that, when deciding whether to choose the first object, the monkeys did not simply refer to the first object but also took their recent experience into account.
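The trial-history analysis above regresses the current choice on the current first-object value and on both object values from the previous trial. One way to assemble such a lagged design matrix is sketched below in Python; representing each trial as a hypothetical `(first_value, second_value, chose_first)` tuple and pairing trials only within a session are assumptions.

```python
def history_design(sessions):
    """Build a lagged design matrix for the trial-history logistic regression.

    sessions : list of sessions; each session is a list of hypothetical
               (first_value, second_value, chose_first) tuples in trial order.
    Returns (X, y) with rows [1, first(t-1), second(t-1), first(t)] and
    y = chose_first(t). Trials are paired only within a session, so the
    first trial of every session has no predecessor and is skipped.
    """
    X, y = [], []
    for trials in sessions:
        for prev, cur in zip(trials, trials[1:]):
            X.append([1.0, prev[0], prev[1], cur[0]])
            y.append(cur[2])
    return X, y


if __name__ == "__main__":
    X, y = history_design([[(4, 2, 1), (3, 6, 0), (5, 1, 1)], [(2, 2, 0), (6, 3, 1)]])
    for row, target in zip(X, y):
        print(row, target)
```

The rows could then be passed to a standard logistic-regression routine such as statsmodels' `Logit`; the coefficients on the second and third columns would correspond to the previous-trial effects plotted in Fig. 1E.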

Dopamine and OFC neurons represented value and/or choice

We recorded single unit activity from 96 dopamine neurons (60 and 36 neurons in monkeys A and E, respectively) in the substantia nigra pars compacta (SNc) and the ventral tegmental area (VTA) (Fig. 1F) and 285 OFC neurons (156 and 129 neurons in monkeys A and E, respectively) (Fig. 1G) and focused our analysis on their activity during the presentation of the first object when the monkey was required to make the decision based on the object’s value. We found that not only OFC neurons but also dopamine neurons exhibited multiple activity patterns that were modulated by the option’s value and/or the monkey’s choice behavior (Fig. 2). Some dopamine and OFC neurons encoded the value of the first object regardless of whether the monkey decided to choose the object (chosen trials) or not to choose it (unchosen trials) (see Fig. 2, A and D for example neurons). In contrast, the activity of some dopamine and OFC neurons represented the monkey’s choice behavior (see Fig. 2, C and F for example neurons). These example neurons were more strongly activated in chosen trials than in unchosen trials, even when the same object was offered (i.e., the object with value 4, which the monkey sometimes chose and sometimes did not choose). In addition, the activity of some dopamine and OFC neurons was influenced by both the value and choice (see Fig. 2, B and E for example neurons). These example neurons encoded the value of the first object, but only when the monkey decided to choose the object. These neurons seemed to reflect a decision variable called “chosen value” (6) because they encoded the value of the “chosen” first object.

Fig. 2 Dopamine and OFC neurons representing value and/or choice during economic decision-making.

(A to F) Activity of six example neurons [(A to C) dopamine neurons; (D to F) OFC neurons]. Top: Spike density functions (SDFs) aligned at the onset of the first object. The SDFs are shown for each object value (red, value 6; pink, value 5; yellow, value 4; light blue, value 3; blue, value 2; dark blue, value 1) and for chosen (solid curves) and unchosen trials (dotted curves). Gray horizontal bars indicate the time window used to calculate the magnitude of neuronal activity. Bottom: Magnitude of neuronal activity plotted against the object value for chosen (filled circles) and unchosen trials (open circles). Gray plots show the baseline activity (−500 to 0 ms) for each value condition. Error bars indicate SEM.

To statistically characterize signals encoded by dopamine and OFC neurons, we fitted the activity of each neuron with two models that depended on the “pure value” and “pure choice” (value and choice models, respectively, shown in Fig. 3A) and compared their coefficients of determination (R2) (Fig. 3B). The activity of neurons with a significantly better fit by the value model was considered to be more strongly modulated by the value of the first object (P < 0.05, two-tailed bootstrap test) (red area in Fig. 3B, hereafter called “value-modulated” neurons), while the activity of neurons with a significantly better fit by the choice model was considered to be more strongly modulated by the monkey’s choice behavior (P < 0.05, two-tailed bootstrap test) (blue area in Fig. 3B, hereafter called “choice-modulated” neurons). If neurons did not show a significantly better fit by either model (P > 0.05, two-tailed bootstrap test) but both models fitted their activity significantly (P < 0.05, two-tailed F test), then these neurons were considered to exhibit an intermediate modulation between the value and the choice models (white area in Fig. 3B, hereafter called “intermediate” neurons). We conducted this model comparison analysis throughout the presentation of the first object using a 100-ms sliding window with a 1-ms step. Of the 285 OFC neurons, 22 were excluded from this analysis because they exhibited no discharge during the analysis period, so the R2 used for the statistical procedure could not be calculated (see Materials and Methods for details of the analysis). We then obtained the temporal profile of the neuronal modulations evoked by the value and choice, that is, the R2 difference between the value and the choice models, for each dopamine neuron (n = 96; Fig. 3C) and each OFC neuron (n = 263; Fig. 3D).
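The model comparison above can be sketched in Python for a single analysis window, under stated assumptions: the value model is taken as a straight-line fit of firing rate on first-object value and the choice model as a fit on a 0/1 chosen indicator (Fig. 3A shows the actual models), the bootstrap resamples trials with replacement, and the F-test results are supplied from elsewhere. All function names are hypothetical.

```python
import random


def r_squared(x, y):
    """R^2 of a least-squares line y ~ a + b * x (equal to Pearson r squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        # e.g., a neuron that never fired in the window: R^2 is undefined.
        raise ValueError("R^2 undefined: zero variance")
    return sxy * sxy / (sxx * syy)


def r2_difference(values, chosen, rates, idx):
    """Value-model R^2 minus choice-model R^2 on the trials listed in idx."""
    r = [rates[i] for i in idx]
    return r_squared([values[i] for i in idx], r) - r_squared([chosen[i] for i in idx], r)


def compare_models(values, chosen, rates, n_boot=1000, seed=0):
    """Return (observed R^2 difference, two-tailed bootstrap p-value).

    values : first-object value per trial; chosen : 1/0 per trial;
    rates  : firing rate per trial in one analysis window.
    """
    n = len(rates)
    observed = r2_difference(values, chosen, rates, range(n))
    rng = random.Random(seed)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample trials
        try:
            diffs.append(r2_difference(values, chosen, rates, idx))
        except ValueError:
            continue  # degenerate resample (e.g., only chosen trials): redraw
    frac_le = sum(d <= 0 for d in diffs) / n_boot
    frac_ge = sum(d >= 0 for d in diffs) / n_boot
    return observed, min(1.0, 2 * min(frac_le, frac_ge))


def classify(diff, p_boot, value_fit_sig, choice_fit_sig, alpha=0.05):
    """Label one neuron in one window following the rule described in the text.

    value_fit_sig / choice_fit_sig: whether each model's F test was significant
    (computed elsewhere, not in this sketch).
    """
    if p_boot < alpha:
        return "value-modulated" if diff > 0 else "choice-modulated"
    if value_fit_sig and choice_fit_sig:
        return "intermediate"
    return None
```

In the study this comparison is repeated in a 100-ms window stepped by 1 ms, giving the R2-difference time courses in Fig. 3, C and D.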

Fig. 3 Model comparison analysis.

(A) Value (left) and choice (right) models. (B) Schematic diagram illustrating the procedure to identify value-modulated (red), intermediate (white), and choice-modulated neurons (blue). R2 is compared between the value (y axis) and the choice models (x axis). Horizontal and vertical dotted lines indicate the significance level (P < 0.05, two-tailed F test) of the value- and choice-model fits, respectively. Diagonal dotted lines indicate the significance level (P < 0.05, two-tailed bootstrap test) of the R2 difference between the models. Gray area indicates neurons that exhibited neither a significant R2 difference between the models (P > 0.05, two-tailed bootstrap test) nor a significant fit to either model (P > 0.05, two-tailed F test). (C and D) Temporal profile of the R2 difference between the models for each dopamine (n = 96) (C) and OFC neuron (n = 263) (D). The color of pixels represents the normalized magnitude of the R2 difference (red, better fit by the value model; white, intermediate fit between the value and the choice models; blue, better fit by the choice model). Open yellow rectangles show choice-modulated neurons. Red, gray, and blue triangles indicate the example neurons shown in Fig. 2, A and D, Fig. 2, B and E, and Fig. 2, C and F, respectively.

On the basis of the above temporal profile, we identified dopamine and OFC neurons with the value, intermediate, or choice modulation that stably continued for a certain period during the presentation of the first object (see Materials and Methods for details of the identification procedure). Of the 96 dopamine neurons, 38 were identified as value-modulated neurons, 52 were identified as intermediate neurons, and 32 were identified as choice-modulated neurons (Fig. 4A). We observed no significant differences in electrophysiological properties or recording location among the three groups of dopamine neurons (fig. S1, A to E). Of the 263 OFC neurons, 101 were identified as value-modulated neurons, 106 were identified as intermediate neurons, and 64 were identified as choice-modulated neurons (Fig. 4B). Note that some dopamine and OFC neurons were identified as belonging to two or three groups because these neurons represented distinct signals during different periods of the object presentation. Among the identified neurons, the proportion of each neuron group was not significantly different between dopamine and OFC neurons (value-modulated neurons, P = 0.16; intermediate neurons, P = 0.38; choice-modulated neurons, P = 0.57; two-tailed Fisher’s exact test) (Fig. 4C).
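The proportion comparisons above use two-tailed Fisher’s exact tests on 2 × 2 tables. A minimal, dependency-free Python sketch of that test is shown below; it uses the common two-sided rule of summing the probabilities of all tables with the same margins that are no more probable than the observed one, which is an assumption about the exact variant used in the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table. Integer arithmetic is
    used so that ties in probability are compared exactly.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Unnormalized probability of each admissible top-left cell value k.
    weight = {k: comb(col1, k) * comb(n - col1, row1 - k) for k in range(lo, hi + 1)}
    observed = weight[a]
    extreme = sum(w for w in weight.values() if w <= observed)
    return extreme / comb(n, row1)


if __name__ == "__main__":
    print(fisher_exact_two_sided(6, 2, 1, 4))  # 132/1287, about 0.103
```

For Fig. 4C, each table would cross dopamine versus OFC with membership versus non-membership in one neuron group among the identified neurons; those per-group denominators are not all stated in this section, so no reported P value is reproduced here.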

Fig. 4 Proportions and averaged activities of value-modulated, intermediate, and choice-modulated neurons.

(A and B) Left: Proportions of identified neurons (i.e., value-modulated, intermediate, and choice-modulated neurons) and non-identified neurons. Right: Proportions of value-modulated, intermediate, and choice-modulated neurons among all the identified neurons. These proportions are shown for dopamine neurons (n = 96) (A) and OFC neurons (n = 263) (B). (C) Comparison of the proportions of value-modulated, intermediate, and choice-modulated neurons between dopamine (open bars) and OFC neurons (filled bars). n.s. indicates no significant difference (P > 0.05, two-tailed Fisher’s exact test). (D and E) Averaged magnitudes of value-modulated (left), intermediate (middle), and choice-modulated neuron activities (right) shown for dopamine neurons (n = 38, 52, and 32, respectively) (D) and OFC neurons (n = 54, 54, and 34, respectively) (E). Note that only the OFC neurons that positively represented the option’s value and/or monkey’s choice were used in this analysis (see fig. S2 for OFC neurons that negatively represented the value and/or choice). Conventions are as in the bottom panels of Fig. 2 (A to F).

We calculated the averaged activities of the value-modulated, intermediate, and choice-modulated neurons (see Fig. 4D for dopamine neurons and Fig. 4E and fig. S2 for OFC neurons). We analyzed the averaged activities of OFC neurons separately for neurons that positively represented the value and/or choice (Fig. 4E) and those that negatively represented the value and/or choice (fig. S2). The activities of both dopamine and OFC neurons represented the signals corresponding to the models used for their identification. In particular, a marked feature of choice-modulated neurons was the difference in activity between chosen and unchosen trials in their response to the object with value 4 (dopamine neurons: n = 32, z = 4.54, P = 5.8 × 10^−6; OFC neurons: n = 34, z = 4.24, P = 2.2 × 10^−5; two-tailed Wilcoxon signed-rank test), which was also observed in their responses to the objects with values 3 and 5 (fig. S3). Notably, when the second object was presented, not only the value-modulated neurons but also the intermediate and choice-modulated neurons represented the value of the second object, to which the monkeys simply responded without economic decision-making (fig. S4). This suggests that the intermediate and choice-modulated activity patterns do not simply reflect general or categorical information about the value of the first object.
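The chosen-versus-unchosen comparison at value 4 pairs each neuron’s mean response in the two trial types. A minimal Python sketch of the paired Wilcoxon signed-rank test with the large-sample normal approximation follows; the tie handling shown is standard, but statistical packages differ in continuity and tie corrections, so the z values it returns need not match the reported ones exactly.

```python
import math


def wilcoxon_signed_rank_z(x, y):
    """Paired Wilcoxon signed-rank test, normal approximation.

    Returns (z, two-tailed p). Zero differences are dropped; tied |differences|
    get average ranks, and the variance includes the usual tie correction.
    No continuity correction is applied (an assumption; software differs).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of equal |differences| starting at sorted position i.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j + 2) / 2.0  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4.0
    var = n * (n + 1) * (2 * n + 1) / 24.0 - tie_term / 48.0
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # = 2 * (1 - Phi(|z|))
    return z, p
```

Here x would hold each neuron’s mean rate in chosen value-4 trials and y its mean rate in unchosen value-4 trials, so a positive z indicates stronger activation in chosen trials.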

Dopamine and OFC signals dynamically changed as the decision-making process progressed

When making economic choices, animals first evaluate available options and then decide whether to choose them. We found that the value-modulated, intermediate, and choice-modulated neuron signals appeared in the order corresponding to the time course of the decision-making process in both dopamine and OFC neurons (Fig. 5, A and B). The proportion of value-modulated neurons abruptly increased immediately after the onset of the first object, and the proportion of intermediate neurons then increased. Last, choice-modulated neurons gradually appeared. This temporal profile was also confirmed by the signal latency of each neuron group (Fig. 5, C and D). The latency of the value-modulated signal was significantly shorter than those of the intermediate and choice-modulated signals in both dopamine and OFC neurons (dopamine neurons: value versus intermediate, z = −1.99, P = 0.046 and value versus choice, z = −4.76, P = 1.9 × 10^−6; OFC neurons: value versus intermediate, z = −3.23, P = 0.001 and value versus choice, z = −3.84, P = 1.2 × 10^−4; two-tailed Wilcoxon rank-sum test). The latency of the intermediate signal was shorter than that of the choice-modulated signal in both dopamine and OFC neurons, although the difference was significant only in dopamine neurons (dopamine neurons: intermediate versus choice, z = −3.44, P = 5.9 × 10^−4; OFC neurons: intermediate versus choice, z = −1.09, P = 0.28; two-tailed Wilcoxon rank-sum test). In addition, the onset of each population signal appeared in the same order in both dopamine neurons (value, 131 ms; intermediate, 145 ms; choice, 205 ms) and OFC neurons (value, 116 ms; intermediate, 140 ms; choice, 268 ms) (arrowheads in Fig. 5, A and B). These results suggest that dopamine neurons, as well as OFC neurons, encoded the value of the first object immediately after it was offered and then gradually changed their activity to represent the animal’s choice behavior.
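Latency comparisons between neuron groups are unpaired, which is why a rank-sum test is used. A minimal Python sketch of the Wilcoxon rank-sum test with a normal approximation and tie-corrected variance is given below as an illustration, not as the study’s code; with this sign convention, a negative z means the first sample (e.g., value-modulated latencies) tends to be shorter, matching the signs of the values reported above.

```python
import math


def rank_sum_z(x, y):
    """Wilcoxon rank-sum test (normal approximation, tie-corrected variance).

    Returns (z, two-tailed p); z < 0 means x tends to be smaller than y.
    """
    # Tag each observation with its sample (0 = x, 1 = y) and sort the pool.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j + 2) / 2.0  # average 1-based rank of this tie group
        rank_sum_x += avg * sum(1 for k in range(i, j + 1) if pooled[k][1] == 0)
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    mean = n1 * (n + 1) / 2.0
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (rank_sum_x - mean) / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2.0))
```

Calling it with value-modulated latencies as x and choice-modulated latencies as y would give a negative z when the value signal tends to arrive first.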

Fig. 5 Temporal dynamics of the dopamine and OFC signals corresponding to the time course of the decision-making process.

(A and B) Time-varying proportions of value-modulated (red), intermediate (gray), and choice-modulated neurons (blue) shown for dopamine neurons (n = 96) (A) and OFC neurons (n = 263) (B). Arrowheads represent the onsets of the value-modulated (red), intermediate (gray), and choice-modulated signals (blue). (C and D) Cumulative histograms of the latencies of the value-modulated (red), intermediate (gray), and choice-modulated signals (blue) shown for dopamine neurons (n = 96) (C) and OFC neurons (n = 263) (D). Vertical dotted lines indicate mean latencies, and numbers are means ± SD. Single and double asterisks indicate a significant difference between the latencies (P < 0.05 and 0.01, respectively, two-tailed Wilcoxon signed-rank test). (E and F) Comparison of the R2 between the value model (y axis) and the choice model (x axis) in dopamine neurons (n = 96) (E) and OFC neurons (n = 263) (F). Each panel indicates the R2 for each 100-ms time bin. Pink lines indicate linear regression lines. Red, gray, and blue circles indicate the example neurons shown in Fig. 2 (A and D, B and E, and C and F, respectively). (G and H) Regression slopes calculated for each time bin in dopamine (G) and OFC neurons (H).

To further confirm the above temporal profile, we examined how the degree of fit (i.e., R2) with the value and choice models changed over time at the population level (Fig. 5, E and F). Specifically, for each time bin we plotted the R2s of the value and choice models for each dopamine neuron (n = 96) and each OFC neuron (n = 263) as a scatter plot and calculated the slope of the regression line relating the value-model R2 (y axis) to the choice-model R2 (x axis). A smaller slope indicates that, across the population, the choice model fits relatively better than the value model. We found that the regression slope gradually decreased after the first object was offered in both dopamine and OFC neurons (regression coefficient of the slope against the elapsed time: dopamine neurons: b = −0.05, F = 67.96, P = 7.5 × 10^−5; OFC neurons: b = −0.04, F = 107.57, P = 1.7 × 10^−5) (Fig. 5, G and H), indicating that the degree of fit with the choice model increased over time.
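The population analysis above involves two regressions: within each time bin, value-model R2 (y axis) on choice-model R2 (x axis) across neurons, and then the resulting slopes on elapsed time. A minimal Python sketch follows; ordinary least squares with an intercept is assumed for both steps, and the function names are hypothetical.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x (model y ~ a + b * x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def slope_time_course(r2_value_by_bin, r2_choice_by_bin, bin_times):
    """Per-bin population slope, then the trend of those slopes over time.

    r2_value_by_bin[k][i]  : value-model R^2 of neuron i in time bin k (y axis)
    r2_choice_by_bin[k][i] : choice-model R^2 of neuron i in time bin k (x axis)
    bin_times[k]           : time of bin k relative to first-object onset
    """
    slopes = [ols_slope(xc, yv) for yv, xc in zip(r2_value_by_bin, r2_choice_by_bin)]
    trend = ols_slope(bin_times, slopes)  # negative: choice fit gains over time
    return slopes, trend
```

A negative `trend` corresponds to the decreasing slopes in Fig. 5, G and H.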

The above analyses demonstrated that the temporal profile of the dopamine and OFC signals corresponded to the time course of the decision-making process in which the monkey first evaluated the first object and then decided whether to choose it. We observed the same temporal profile even in individual dopamine and OFC neurons. Specifically, many dopamine and OFC neurons identified as choice-modulated neurons encoded the value-modulated and/or intermediate signals, particularly before encoding the choice-modulated signal (see neurons surrounded by open yellow rectangles in Fig. 3, C and D; see also fig. S5).

We have so far examined the temporal profile of dopamine and OFC signals. We next compared the temporal profiles between them and found that, in particular, the choice-modulated signal appeared earlier in dopamine neurons than in OFC neurons. The latency of the choice-modulated signal was significantly shorter in dopamine neurons (mean ± SD = 329 ± 141 ms) than in OFC neurons (mean ± SD = 398 ± 173 ms) (z = −2.25, P = 0.024, two-tailed Wilcoxon rank-sum test) (Fig. 5, C and D). This tendency was maintained across monkeys (fig. S6). Furthermore, the population onset of the signal was also earlier in dopamine neurons (205 ms) than in OFC neurons (268 ms) (Fig. 5, A and B). These data suggest that the signal transition from value to choice was completed earlier in dopamine neurons.

Choice-modulated signal started earlier than the motor expression of choice

If the choice-modulated signal influenced the monkey’s choice, the signal would be expected to start earlier than the motor expression of the choice (i.e., the button release). We therefore compared the onset of the choice-modulated signal with that of the monkey’s button release (Fig. 6). For this comparison, the activity of the choice-modulated neurons was realigned at the onset of the button release for chosen and unchosen trials, and the onset of the choice-modulated signal was defined as the time at which these neurons started to show a significant modulation between chosen and unchosen trials (Fig. 6, B and D; see also fig. S7, A and B for OFC neurons that negatively represented the choice). Note that we used here the neuronal activity in trials in which the value of the first object was 4. Because the monkeys sometimes chose the first object and sometimes did not choose it in these trials (Fig. 1B), we could collect enough data to compare the neuronal activity between chosen and unchosen trials under the same value condition. We found that the choice-modulated signal started earlier than the onset of the monkey’s button release in both dopamine neurons (226 ms before the button release onset) and OFC neurons (177 ms before the button release onset) (Fig. 6, B and D). This suggests that these neurons could influence the monkey’s final motor expression to choose the first object, at least with respect to the time course. In addition, the choice-modulated signal of dopamine neurons appeared earlier than that of OFC neurons, consistent with the comparison of their latencies aligned at the onset of the first object (Fig. 5, C and D).
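The onset of the choice-modulated signal is defined above as the time at which the chosen-versus-unchosen difference becomes significant. Given per-bin P values (for example, from one-tailed signed-rank tests on release-aligned activity), the onset could be read out as in the Python sketch below; the optional requirement of `min_run` consecutive significant bins is a hypothetical parameter added for robustness, not a criterion stated in the text.

```python
def signal_onset(times, pvalues, alpha=0.05, min_run=1):
    """Return the first time at which P < alpha holds for min_run consecutive
    bins (the start of that run), or None if this never happens.

    times   : bin times, e.g., ms relative to button-release onset
    pvalues : per-bin P values for chosen vs. unchosen activity
    min_run : hypothetical persistence requirement (1 = first significant bin)
    """
    run_start = None
    run_len = 0
    for t, p in zip(times, pvalues):
        if p < alpha:
            if run_len == 0:
                run_start = t
            run_len += 1
            if run_len >= min_run:
                return run_start
        else:
            run_len = 0
    return None
```

With `min_run=1` this reproduces the plain “first significant bin” rule; larger values trade earlier detection for fewer false onsets from isolated significant bins.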

Fig. 6 Onsets of the choice-modulated signal and monkey’s choice behavior.

(A and C) Averaged SDFs of choice-modulated dopamine neurons (n = 31) (A) and OFC neurons (n = 34) (C) aligned at the onset of the first object shown for chosen trials (blue) and unchosen trials (gray) under the condition in which “object value = 4”. One choice-modulated dopamine neuron was excluded from this analysis because the monkey chose the first object associated with the value 4 in all trials during the recording session, and, consequently, we were unable to collect data in unchosen trials. (B and D) Averaged SDFs of the same choice-modulated dopamine neurons (n = 31) (B) and OFC neurons (n = 34) (D) aligned at the onset of the button release. Shaded areas around the curves indicate SEM. Vertical dotted lines and numbers indicate the time when the difference in the averaged firing rate between chosen and unchosen trials became significant (P < 0.05, one-tailed Wilcoxon signed-rank test).

Choice-modulated dopamine neurons were not activated by the motor action itself

Choice-modulated dopamine neurons were more strongly activated when the monkey decided to choose the first object (i.e., when the animal released the button) than when the monkey decided not to choose the first object (i.e., when the animal did not release the button). Therefore, it seemed possible that the choice-related activation could be involved in the motor process to execute the button release rather than the decision-making process to choose the first object. However, contrary to this possibility, we found that choice-modulated dopamine neurons were not activated when the monkey executed the same motor action (i.e., the button release) in a control task in which the animal was required to simply release the button without economic decision-making (Fig. 7, A and B) (see Materials and Methods for details of the control task). Of the 32 choice-modulated dopamine neurons, we recorded the activity of 20 neurons during the control task. These neurons did not show a significant modulation around the onset of the button release as a population (z = −1.13, P = 0.26, two-tailed Wilcoxon signed-rank test) (Fig. 7, C and D; see also fig. S8 for the activity of the 20 dopamine neurons in the economic decision-making task). This suggests that the choice-modulated signal of dopamine neurons was not simply caused by the motor action itself.
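The control-task comparison uses, for each neuron, a baseline firing rate (−400 to −200 ms) and a firing rate around the button release (−200 to 200 ms), the windows given in the Fig. 7 caption. A minimal Python sketch of computing those rates from release-aligned spike times is shown below; the half-open windows, averaging over trials, and the function names are assumptions.

```python
def window_rate(spike_times_ms, start_ms, end_ms):
    """Firing rate (spikes/s) in the half-open window [start_ms, end_ms)."""
    count = sum(1 for t in spike_times_ms if start_ms <= t < end_ms)
    return count / ((end_ms - start_ms) / 1000.0)


def release_modulation(trials):
    """Trial-averaged baseline and peri-release rates for one neuron.

    trials : list of spike-time lists (ms), each aligned to button-release
             onset at 0 ms.
    Returns (baseline_rate, peri_release_rate) in spikes/s.
    """
    base = sum(window_rate(s, -400, -200) for s in trials) / len(trials)
    peri = sum(window_rate(s, -200, 200) for s in trials) / len(trials)
    return base, peri
```

The per-neuron baseline and peri-release rates would then be compared across the 20 neurons with a two-tailed Wilcoxon signed-rank test.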

Fig. 7 Neuronal modulation evoked by the button release in the control task.

(A) Control task. (B) Latency of the button release in monkey A (n = 182 sessions) and monkey E (n = 165 sessions) in the control task. (C and E) Averaged SDFs of the choice-modulated dopamine neurons (n = 20) (C) and OFC neurons (n = 34) (E) aligned at the onset of the button release in the control task. Shaded areas around the curves indicate SEM. Horizontal white and blue bars indicate the time windows used to calculate the baseline firing rate (−400 to −200 ms) and the firing rate around the onset of the button release (−200 to 200 ms) of each neuron, respectively. (D and F) Comparison between the baseline firing rate and the firing rate around the onset of the button release in the control task for the 20 choice-modulated dopamine neurons (D) and the 34 choice-modulated OFC neurons (F). Each gray line indicates the data obtained from each neuron. Double asterisk indicates a significant difference between the firing rates (P < 0.01, two-tailed Wilcoxon signed-rank test). n.s. indicates no significant difference (P > 0.05, two-tailed Wilcoxon signed-rank test). Error bars indicate SEM.

We also recorded the activity of the 34 choice-modulated OFC neurons during the control task. In contrast to the dopamine neurons, the OFC neurons as a population exhibited a significant increase in activity around the onset of the button release (z = 3.53, P = 4.2 × 10^−4, two-tailed Wilcoxon signed-rank test) (Fig. 7, E and F; see also fig. S7, C and D for OFC neurons that negatively represented the choice).

DISCUSSION

Dopamine neurons are known to play a crucial role in economic decision-making by reinforcing choices that lead to better-than-expected outcomes (11, 12). This effect is thought to be implemented by a dopamine signal called reward prediction error, which is evoked when animals obtain reward outcomes after decisions are made. In the present study, by contrast, we focused on dopamine neuron activity while decisions were being made, that is, while a monkey evaluated an available option (i.e., the first object), decided whether to choose it, and expressed the choice with a motor action (i.e., the button release). We found that dopamine neurons encoded multiple decision variables during this period, including not only the value of the first object but also the animal’s upcoming choice. These signals displayed a temporal profile corresponding to a transition from value to choice. Furthermore, the signal transition was completed earlier in dopamine neurons than in the OFC, which has been proposed to be a cortical center for economic decision-making. Our findings extend knowledge about the role of dopamine neurons in economic decision-making by highlighting their dynamically changing signals associated with the ongoing decision-making process.

A major finding of the present study is that a group of dopamine neurons (choice-modulated dopamine neurons) represented whether the monkey decided to choose or not to choose the first object. These neurons were more strongly activated when the monkey decided to choose the first object than when the animal decided not to choose it. It is possible, however, that their activity pattern reflects other attributes accompanying the monkey’s choice behavior. One possibility is that the activity pattern reflects a neuronal modulation evoked by the motor process that executed the button release rather than a modulation involved in the decision-making process to choose the first object. Consistent with this possibility, previous studies in rodents have shown that a subgroup of dopamine neurons increases its activity when animals simply initiate a body movement and that stimulation of these neurons facilitates the movement (16, 17). However, we found that choice-modulated dopamine neurons were not activated when the monkey executed the same motor action (i.e., the button release) in the control task, in which the animal was not required to make economic decisions. Thus, it is unlikely that choice-modulated dopamine neurons participate in the simple motor process. Consistent with this view, these neurons exhibited almost the same activity profiles in trials with shorter and longer button-release latencies (fig. S9A), again suggesting that the activity pattern of choice-modulated dopamine neurons did not reflect the motor process that executed the button release.

Another possibility is that the activity pattern of choice-modulated dopamine neurons reflects the monkey’s expectation of the upcoming second object. For instance, when the monkey decided to choose the first object, the animal was likely to expect that the upcoming second object would be worse than the first object. When the monkey decided not to choose the first object, the animal was likely to expect that the upcoming second object would be better than the first object. Thus, even if choice-modulated dopamine neurons represented the monkey’s expectation of the upcoming second object (i.e., better or worse) rather than the monkey’s choice behavior, their activity could appear binary. However, we observed that choice-modulated dopamine neurons were more strongly activated when the monkey decided to choose the first object (i.e., when the animal was likely to expect a worse second object) than when the monkey decided not to choose the first object (i.e., when the animal was likely to expect a better second object). Because dopamine neurons are thought to be more strongly activated when animals expect better events, their choice-related activation, which was stronger when the second object was expected to be worse, cannot be accounted for by a neuronal modulation evoked by the expectation of the upcoming second object. To further test whether choice-modulated dopamine neurons represented the expectation of the upcoming second object, we examined the effects of the values of the first and second objects presented in the previous trial on the activity of choice-modulated dopamine neurons (see Materials and Methods for details of the analysis). We observed that the higher the values of these previous objects, the less likely the monkeys were to choose the first object in the ongoing trial (Fig. 1E), suggesting that the monkeys estimated the value of the upcoming second object from the values of the previous objects.
We found that only one and two of the 32 choice-modulated dopamine neurons exhibited a significant effect of the previous first and second object values, respectively, on their activity (P < 0.05, regression coefficient between the neuronal activity and the previous first or second object value). These proportions were not significantly larger than expected by chance (previous first object value: 1 of 32, P = 0.80; previous second object value: 2 of 32, P = 0.44; one-tailed bootstrap test) (see Materials and Methods for details of the analysis). Thus, choice-modulated dopamine neurons did not appear to represent the previously offered values from which the monkey could estimate the value of the upcoming second object. These data suggest that the activity pattern of choice-modulated dopamine neurons did not reflect the monkey’s expectation of the upcoming second object. It remains necessary, however, to consider whether the choice-related modulation of dopamine neurons can be accounted for by other attributes accompanying the monkey’s choice behavior.
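The chance comparison above asks how often k or more of 32 neurons would reach P < 0.05 if none were truly modulated. The authors used a one-tailed bootstrap test; the sketch below swaps in an analytic binomial tail instead, assuming independent neurons and a 5% false-positive rate per neuron. The function name is hypothetical.

```python
from math import comb


def chance_tail_probability(k, n, alpha=0.05):
    """P(X >= k) for X ~ Binomial(n, alpha).

    Probability of observing k or more 'significant' neurons out of n when each
    neuron independently has a false-positive rate alpha (a simplifying assumption).
    """
    return sum(comb(n, i) * alpha ** i * (1 - alpha) ** (n - i) for i in range(k, n + 1))


print(round(chance_tail_probability(1, 32), 2))  # previous first object value (1 of 32)
print(round(chance_tail_probability(2, 32), 2))  # previous second object value (2 of 32)
```

This gives about 0.81 and 0.48, close to but not identical to the reported bootstrap values (0.80 and 0.44), as expected because the bootstrap resamples the recorded data rather than assuming independence.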

As discussed above, choice-modulated dopamine neurons were unlikely to be activated by the simple motor action. However, these neurons might directly regulate the motor expression of decisions, that is, whether to release the button or to keep it pressed down. These motor expressions can be regarded as “go” and “no-go” responses. Because single dopamine neurons form widely spread axonal arborizations in the striatum (23), choice-modulated dopamine neurons might share downstream structures (e.g., a motor region of the striatum) with dopamine neurons signaling movement initiation, and might trigger the button release to choose the first object via the same mechanism by which dopamine neurons regulate movement initiation. Choice-modulated dopamine neurons might also be involved in “withholding” the button release (i.e., keeping the button pressed down) so as not to choose the first object. Compared with their baseline firing rate, these neurons not only increased their activity when the monkey decided to choose the first object but also decreased their activity when the animal decided not to choose it (Fig. 4D), suggesting that they also signaled the monkey’s decision not to choose the first object. Ogasawara et al. (21) found that dopamine signals transmitted to the striatum regulate the withholding of a motor action. Together, these observations suggest that choice-modulated dopamine neurons may have the potential to directly regulate the motor expression of decisions not only by triggering motor actions (i.e., go responses) but also by withholding them (i.e., no-go responses).

In addition to choice-modulated dopamine neurons, we found another group of dopamine neurons (intermediate dopamine neurons) that exhibited an intermediate activity pattern between the value and the choice models. These neurons encoded the value of the first object primarily when the monkey decided to choose the first object. The activity of these neurons may reflect a decision variable called chosen value (6), because they encoded the value of the chosen first object. Consistent with this finding, previous studies have shown that dopamine neurons encode the value of a chosen option (13, 14) [see (24) for contradictory observations]. We found that dopamine neurons, as well as OFC neurons, started to encode the chosen value later than the value of the offered first object (i.e., the value-modulated signal). Then, dopamine and OFC neurons came to represent whether the monkey decided to choose or not to choose the object (i.e., the choice-modulated signal). Thus, the calculation of chosen value seems to occur during the signal transition from value to choice. These results raise the possibility that the calculation of the chosen value is a preliminary step toward generating a final choice command that signals which option to choose.

Our findings suggest that dopamine neurons are a key component of the neural network that makes choices from values during ongoing decision-making processes. In particular, because dopamine neurons exhibited the signal transition from value to choice via the intermediate state, these neurons may play a crucial role in transforming an option’s value information into choice commands, a process thought to be implemented primarily by prefrontal regions (9, 25, 26). While dopamine neurons are connected with brain areas that process reward information (27), they also innervate the cortico-basal ganglia circuitry (28) and cortical areas (29) related to motor processing. Therefore, dopamine neurons seem to be well positioned as a site of transition between the reward system and the motor system.

Note that, although we found that the value-to-choice signal transition was completed earlier in dopamine neurons than in OFC neurons, this finding does not simply suggest that dopamine neurons make decisions earlier than the OFC. Previous studies reporting the roles of the OFC in economic decision-making usually used behavioral tasks in which subjects chose one among multiple available options (two options in many cases) (6–9). In our economic decision-making task, by contrast, one option (i.e., the first object) was presented first, and the monkey decided to choose or not to choose this option before seeing the other option (i.e., the second object). The monkey then immediately expressed the decision with a motor action (i.e., the button release). Some previous studies also presented two options one after the other, as in our task, but subjects were required to wait until both options had been presented before expressing their decision (2). In such designs, it is difficult to estimate when the subjects made the decision. For instance, the subjects may have decided immediately after the presentation of the first option if its value was large but may have waited until the second option was presented if the value of the first option was small. Our task design enabled us to estimate the period during which the monkey made the decision and to monitor neuronal activity during the decision-making process. A concern with our task, however, is the degree to which the OFC is involved in this type of economic decision-making (i.e., decisions of whether to choose one available option). Although the OFC has been shown to regulate decisions about which option to choose among multiple options, this cortical structure might be less involved in decisions of whether to choose a single available option. Instead, subcortical systems including dopamine neurons might govern this type of economic decision-making.
This possibility could account for why the value-to-choice signal transition completed earlier in dopamine neurons than in OFC neurons. Future studies are called for to determine to what degree the OFC is involved in decisions of whether to choose one available option, for instance, by testing the causality between the OFC and this type of economic decision-making.

A major difference in the proposed role of dopamine neurons in economic decision-making between the present study and the reinforcement learning theory is whether dopamine signals influence an “ongoing” choice behavior or “later” ones. The reinforcement learning theory supposes that the reward prediction error signal of dopamine neurons reinforces choices that lead to better-than-expected outcomes in later trials, while we propose that the dopamine signals influence internal processing executed during ongoing decision-making processes. It should be noted here that the effect of released dopamine on postsynaptic activity is mediated by G protein–coupled dopamine receptors, which are generally regarded as receptors that signal with slow speed (30). This slow effect of dopamine aligns with the reinforcement learning theory, because the theory supposes that the dopamine signal reinforces choices that are executed a second or more later than the onset of the dopamine signal. On the other hand, it is not clear whether the slow effect of dopamine can influence the ongoing decision-making process. For instance, we found that the choice-modulated dopamine signal started only 226 ms before the onset of the motor action to choose the first object. However, in accord with our proposal, recent studies have found that optogenetic stimulation of a dopamine neuron subgroup facilitates a motor action immediately after the onset of the stimulation (16, 17). Such a fast effect might be mediated by coreleased glutamate rather than dopamine (31). The synaptic mechanism underlying the ability of dopamine neurons to affect postsynaptic activity so quickly remains to be determined.

As discussed above, the reinforcement learning theory supposes that the reward prediction error signal of dopamine neurons influences decisions in later trials. By contrast, another theory, called “incentive salience theory,” proposes that dopamine neurons affect ongoing decision-making processes by assigning incentive values to the goals or actions of those processes (32). According to this theory, dopamine neurons motivate actions aimed at acquiring rewards. McClure et al. (33) attempted to capture the different dopamine functions derived from the two theories in a single framework. They modeled decision-making processes as multiple states consisting of the start, goal, and several intermediate states. The model stores the estimated value of each state. By taking actions to advance toward the goal state, animals move from the start state to the goal state through the intermediate ones and obtain the values of the states that they reach. In this model, dopamine neurons play two different roles. First, these neurons directly influence ongoing action selection by signaling the estimated value of each state. This dopamine signal motivates actions leading to states with higher values. Second, dopamine neurons indirectly influence action selection through their role in learning the value of each state. This role is achieved by the reward prediction error signal of dopamine neurons. Our findings might fit into the first role proposed by McClure et al. That is, the three types of dopamine neurons that we found in the present study (i.e., value-modulated, intermediate, and choice-modulated dopamine neurons) might represent the values of different states of the decision-making process, such as the evaluation state and the selection state, and might directly regulate the monkey’s choice behavior by motivating actions that lead to states with higher values.
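The two roles in the McClure et al. framework can be illustrated with a generic temporal-difference (TD) learning sketch. This is not their published model: it assumes a deterministic chain of states from start to goal, a single reward on reaching the goal, and hypothetical parameters (`alpha`, `gamma`). The learned state values correspond to the first role (values that could bias ongoing action selection toward higher-valued states), and the TD error `delta` used to learn them corresponds to the second role (the reward prediction error).

```python
def learn_chain_values(n_states=4, reward=1.0, alpha=0.1, gamma=0.9, episodes=500):
    """TD(0) learning of state values along a deterministic start -> goal chain.

    States are 0 (start) ... n_states - 1 (goal). Each action advances one state;
    entering the goal yields `reward` and ends the episode (goal value fixed at 0).
    Returns the learned values and, per episode, the prediction error at the
    final (rewarded) transition.
    """
    values = [0.0] * n_states
    history = []
    for _ in range(episodes):
        s = 0
        while s < n_states - 1:
            s_next = s + 1
            at_goal = s_next == n_states - 1
            r = reward if at_goal else 0.0
            v_next = 0.0 if at_goal else values[s_next]
            delta = r + gamma * v_next - values[s]  # reward prediction error
            values[s] += alpha * delta               # value update driven by delta
            s = s_next
        history.append(delta)
    return values, history


values, history = learn_chain_values()
print([round(v, 3) for v in values])
```

With the defaults, the learned values approach 0.81, 0.9, and 1.0 for states 0, 1, and 2, and the prediction error at the rewarded transition starts at 1.0 and shrinks toward zero as the reward becomes predicted.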

We have so far not discussed the relationship between our dopamine data and reward prediction error. Neuronal modulations evoked during the presentation of the second object might provide insight into this issue. For example, when the monkey decided to choose the first object, the animal was likely to expect that the upcoming second object would be worse than the first object. If the actual second object was better than the first object that had been chosen, then a negative reward prediction error could have arisen. In addition, when the monkey decided not to choose the first object, the animal was likely to expect that the upcoming second object would be better than the first object. If the actual second object was worse than the first object that had not been chosen, then a negative reward prediction error could have arisen. However, we observed no significant neuronal modulation correlated with such reward prediction errors in dopamine neurons or OFC neurons (fig. S10). This result may be explained by the characteristics of our decision-making task. For example, if the monkey had chosen the first object, then the animal no longer needed to take the second object into account because the second object was unavailable. If the monkey had not chosen the first object, then the animal no longer needed to care about the first object because the monkey had no chance to “rechoose” it. Therefore, although the first and second object values, which are necessary to calculate the reward prediction error, were retained even into the next trial (Fig. 1E), the monkey may not have calculated a reward prediction error, which was not required to perform our decision-making task, at least during the presentation of the second object.

Compared with dopamine neurons, the OFC has attracted much more attention as a neural substrate of economic decision-making. Our observations on the OFC are mostly consistent with previous studies reporting that OFC neurons encode multiple decision variables related to the evaluation of available options, comparison between the options, and identification of a chosen option (6–9). It has also been shown that subregions of the OFC (e.g., areas 11 and 13, which are anterior and posterior components of the OFC, respectively) play different roles in economic behavior. Murray et al. (34) found that inactivation of area 13 alters value updating, whereas inactivation of area 11 impairs goal selection. We did not observe a difference in location along the anterior-posterior axis between value-modulated, intermediate, and choice-modulated OFC neurons (fig. S1G); this may be because our OFC recording sites were mostly in area 13m (see Materials and Methods).

An important remaining question is the relationship between the dopamine and the OFC signals. We found that dopamine and OFC neurons exhibited similar signal dynamics associated with the ongoing decision-making process. However, it remains unclear whether and how their signals interact with each other. With respect to their circuitry, dopamine neurons send projections to the OFC (35) and are reciprocally connected with the ventral striatum (36), which is innervated by the OFC (37). In addition, lesioning the OFC alters the reward-related signal of dopamine neurons in rodents (38). According to these studies, dopamine neurons and the OFC are thought to form a functional network that also includes the ventral striatum, which plays a crucial role in reward-oriented behavior (39). Further studies are required to determine how dopamine neurons cooperate with other components of the neural network underlying economic decision-making, including the OFC and the ventral striatum.

MATERIALS AND METHODS

Animals

Two adult rhesus monkeys (Macaca mulatta; monkey A, male, 8.6 kg, 6 years old; monkey E, male, 10.1 kg, 12 years old) were used for the experiments. All procedures for animal care and experimentation were approved by the University of Tsukuba Animal Experiment Committee (permission number 14-137) and were carried out in accordance with the guidelines described in Guide for the Care and Use of Laboratory Animals published by the Institute for Laboratory Animal Research.

Behavioral tasks

Behavioral tasks and data collection were controlled by the TEMPO system (Reflective Computing, WA, USA). The monkeys sat in a primate chair facing a computer monitor in a sound-attenuated and electrically shielded room. Eye movements were monitored using an infrared eye-tracking system (EyeLink, SR Research, Ontario, Canada) sampled at 500 Hz.

The monkeys were trained to perform an economic decision-making task (Fig. 1A). Six visual objects were associated with different amounts of a liquid reward (water; 0.12, 0.18, 0.24, 0.3, 0.36, and 0.42 ml). For monkey E, the visual objects were monochrome fractal images (width, 5.2°; height, 5.2°); for monkey A, they were bar stimuli (width, 5.3°; height, 2.3°) consisting of green and magenta areas whose relative proportions predicted the amount of the liquid reward. The same set of visual objects had been used throughout training (more than 6 months) and recording sessions in both monkeys. Each trial began with the presentation of a central fixation point (diameter, 0.5°) on the monitor, and the monkey was required to fixate on the point and press a button with the right hand. After the monkey had maintained fixation and kept the button pressed down for 750 ms, the fixation point disappeared, and one of the six visual objects was randomly presented as the “first object” at the center of the monitor for 1000 ms. This first object was offered as an option, and the monkey was required to decide to choose or not to choose it during its presentation. Releasing the button was regarded as the decision to choose the first object, while keeping the button pressed down was regarded as the decision not to choose it. When the monkey released the button, a red open rectangle (width, 6.3°; height, 6.3°) was presented around the chosen object as feedback. The first object and red rectangle disappeared 1000 ms after the onset of the first object, followed by a 400-ms fixation period. Then, one of the six visual objects was randomly presented as the “second object” for 1000 ms. If the monkey had decided to choose the first object, then the animal obtained the reward associated with the first object after the presentation of the second object.
If the monkey had decided not to choose the first object, then the animal obtained the reward associated with the second object by releasing the button during the presentation of the second object. When the monkey released the button, the red open rectangle was presented around the second object as feedback. The monkey was required to maintain fixation until the offset of the second object. Correct behavior was signaled by a tone (1 kHz), and the reward associated with the chosen object was simultaneously delivered. Trials were aborted immediately if the monkey (i) did not start the central fixation or press the button within 4000 ms after the onset of the fixation point, (ii) broke the central fixation, (iii) released the button during inappropriate periods (i.e., during the first and second fixation periods), or (iv) pressed the button twice. These errors were signaled by a beep tone (100 Hz) and excluded from all analyses. If the monkey decided not to choose the first object but did not release the button during the presentation of the second object, then the beep tone was also presented at the offset of the second object and the monkey received no reward. These trials were excluded from analyses of neuronal activity. Trials were separated by a random intertrial interval (ITI) ranging from 2000 to 3000 ms.
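The reward rule of the sequential-offer task can be summarized in a few lines. The sketch below encodes only the outcome of a completed trial (fixation and timing errors are not modeled) and adds a hypothetical threshold agent, which accepts the first object when its value reaches a fixed threshold, purely to exercise the rule; the agent, its threshold, and the function names are illustrative assumptions, not a model of the monkeys' behavior.

```python
import random

# Reward volume (ml of water) for each object value, as listed in the task description.
REWARD_ML = {1: 0.12, 2: 0.18, 3: 0.24, 4: 0.30, 5: 0.36, 6: 0.42}


def trial_reward(first_value, second_value, released_during_first, released_during_second):
    """Reward (ml) delivered in one trial of the sequential-offer task."""
    if released_during_first:      # chose the first object
        return REWARD_ML[first_value]
    if released_during_second:     # rejected the first object, took the second
        return REWARD_ML[second_value]
    return 0.0                     # rejected both: beep tone, no reward


def simulate_threshold_agent(n_trials, threshold=4, seed=0):
    """Hypothetical agent: accept the first object iff its value >= threshold."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        first, second = rng.randint(1, 6), rng.randint(1, 6)  # objects drawn at random
        take_first = first >= threshold
        total += trial_reward(first, second, take_first, not take_first)
    return total / n_trials  # mean reward per trial (ml)


print(round(simulate_threshold_agent(1000), 3))
```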

The monkeys were also trained to perform a control task (Fig. 7A). Each trial began with the presentation of a central fixation point (diameter, 0.5°). The monkey was required to fixate on the point and press the button with the right hand. After the monkey had maintained the fixation and kept the button pressed down for 1000 ms, the animal was required to release the button within 2000 ms without any external go cue. After a 500-ms delay, correct behavior was signaled by a tone (1 kHz), and the liquid reward was simultaneously delivered. The reward amount was the same as value 3 or 4 in the economic decision-making task. Trials were aborted immediately if the monkey (i) did not start the central fixation or press the button within 4000 ms after the onset of the fixation point, (ii) broke the central fixation or button press during the 1000-ms fixation period, (iii) failed to release the button within 2000 ms, or (iv) broke the central fixation during the 500-ms delay. These errors were signaled by a beep tone (100 Hz) and excluded from all analyses. All trials were presented with a random ITI ranging from 2500 to 3500 ms.

Electrophysiology

A plastic head holder and three recording chambers were fixed to the skull under general anesthesia and sterile surgical conditions. Two of the recording chambers were placed over the frontoparietal lobes, one in each hemisphere, tilted laterally by 35°, and aimed at the SNc and the VTA. The third recording chamber was placed over the midline of the frontal lobes and aimed at the OFC in both hemispheres. The head holder and recording chambers were embedded in dental acrylic that covered the top of the skull and were firmly anchored to the skull by plastic screws. After the surgery, the monkeys underwent a magnetic resonance imaging (MRI) scan to determine the position of the recording electrode.

Single unit recordings were performed using tungsten electrodes with impedances of 1.2 to 2.5 MΩ (Frederick Haer, ME, USA) that were introduced into the brain through a stainless steel guide tube by an oil-driven micromanipulator (MO-97-S, Narishige, Tokyo, Japan). Recording sites were determined using a grid system, which allowed electrode penetrations at 1-mm intervals. For finer mapping of neurons, we also used a complementary grid that allowed electrode penetrations between the holes of the original grid.

Single unit potentials were amplified and band-pass filtered (100 Hz to 8 kHz) using a multichannel processor (MCP Plus 8, Alpha Omega, Nazareth, Israel) and isolated online using a voltage-time window discrimination system (ASD, Alpha Omega, Nazareth, Israel). The time of occurrence of each action potential was stored with 1-ms resolution.

Localization of recording regions

We recorded single unit activity from dopamine neurons in the SNc and VTA and from neurons in the OFC. To localize the recording regions, we inserted a tungsten electrode into a representative recording track in the SNc/VTA or the OFC and conducted an MRI scan (Fig. 1, F and G). The OFC region of interest ranged from A33 to A35 mm (−0.5 to +1.5 mm from the genu of the corpus callosum) in monkey A and from A33 to A34 mm (±0 to +1 mm from the genu of the corpus callosum) in monkey E along the anterior-posterior axis, which corresponded to area 13m and might include the posterior end of area 11m based on the Saleem and Logothetis atlas (40).

Identification of dopamine neurons

Putative dopamine neurons were identified on the basis of their well-established electrophysiological signatures: a low background firing rate at around 5 Hz, a broad spike potential in clear contrast to neighboring neurons with a high background firing rate in the substantia nigra pars reticulata (fig. S1A), and a phasic excitation in response to free reward.

Statistical analysis

For null hypothesis testing, a significance level of P < 0.05 was used in all analyses. To evaluate the effect of the value of the first object on the monkey’s decision of whether to choose the first object (Fig. 1B), the choice rate of the first object was fitted by the following logistic function:

$$P = \frac{1}{1 + \exp[-(\beta_0 + \beta_1 \times V)]} \qquad (1)$$

where P indicates the choice rate of the first object, V indicates the value of the first object, and β0 and β1 indicate the coefficients determined by logistic regression.
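The logistic model of Eq. 1 can be fitted by maximum likelihood. The sketch below uses Newton-Raphson (equivalently, iteratively reweighted least squares) on trial-level binary choices in stdlib Python; the solver, the fixed iteration count, and the name `fit_logistic` are assumptions, since the text does not state which software performed the regression. A positive β1 means that higher-valued first objects are more likely to be chosen.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(values, chosen, iters=50):
    """Maximum-likelihood fit of P = 1 / (1 + exp(-(b0 + b1 * V))) by Newton-Raphson.

    values: first-object value per trial; chosen: 1 if the first object was chosen, else 0.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for v, y in zip(values, chosen):
            p = _sigmoid(b0 + b1 * v)
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * v
            h00 += w               # Fisher information matrix entries
            h01 += w * v
            h11 += w * v * v
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # Newton step: solve H * step = gradient
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: 100 trials per value with symmetric choice counts.
counts = [1, 5, 27, 73, 95, 99]
values = [v for v in range(1, 7) for _ in range(100)]
chosen = [1 if i < k else 0 for k in counts for i in range(100)]
print(fit_logistic(values, chosen))
```

Because these made-up counts are symmetric around V = 3.5, the fitted curve crosses P = 0.5 at V = 3.5, that is, β0 = −3.5 × β1.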

To evaluate the effects of the first and second object values in the previous trial on the monkey’s decision of whether to choose the first object in the ongoing trial (Fig. 1E), the choice rate of the first object was fitted by the following logistic function:

$$P_t = \frac{1}{1 + \exp[-(\beta_0 + \beta_1 \times V1_{t-1} + \beta_2 \times V2_{t-1} + \beta_3 \times V1_t)]} \qquad (2)$$

where $P_t$ indicates the choice rate of the first object in trial t, $V1_{t-1}$ and $V2_{t-1}$ indicate the values of the first and second objects in trial t − 1, respectively, $V1_t$ indicates the value of the first object in trial t, and β0 to β3 indicate the coefficients determined by logistic regression.
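Fitting Eq. 2 requires pairing each trial t with the first- and second-object values of trial t − 1. The helpers below show one way to build those rows and to evaluate the model probability for a given coefficient vector. They assume the input lists hold consecutive completed trials (how aborted trials were handled is not specified here), and the function names are hypothetical. The resulting rows could be passed to any logistic-regression routine.

```python
import math


def lagged_rows(first_values, second_values, chose_first):
    """Build ((V1[t-1], V2[t-1], V1[t]), choice[t]) rows for Eq. 2.

    Inputs are parallel lists over consecutive trials; the first trial has no
    predecessor and is therefore dropped.
    """
    rows = []
    for t in range(1, len(first_values)):
        predictors = (first_values[t - 1], second_values[t - 1], first_values[t])
        rows.append((predictors, chose_first[t]))
    return rows


def eq2_probability(betas, predictors):
    """Model choice probability P_t for coefficients (b0, b1, b2, b3)."""
    b0, b1, b2, b3 = betas
    v1_prev, v2_prev, v1_now = predictors
    z = b0 + b1 * v1_prev + b2 * v2_prev + b3 * v1_now
    return 1.0 / (1.0 + math.exp(-z))


rows = lagged_rows([3, 6, 1, 4], [2, 5, 5, 1], [0, 1, 0, 1])
print(rows)
```

Under this parameterization, the behavioral result described for Fig. 1E (higher previous values, lower probability of choosing the first object) would correspond to negative β1 and β2.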

To analyze neuronal activity, we combined the data obtained from the two monkeys because they were qualitatively identical. To calculate spike density functions (SDFs), each spike was replaced by a Gaussian curve (σ = 30 ms).
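A minimal sketch of the Gaussian-kernel SDF (σ = 30 ms), sampled at 1-ms resolution to match the 1-ms spike-time resolution described under Electrophysiology. Each kernel is scaled so that one spike integrates to one spike, giving output in spikes per second; this normalization, the trial averaging, and the function names are assumptions rather than details given in the text.

```python
import math


def spike_density(spike_times_ms, t_start, t_stop, sigma_ms=30.0):
    """Gaussian-kernel SDF (spikes/s) sampled every 1 ms on [t_start, t_stop)."""
    # Each kernel integrates to one spike; the factor 1000 converts per-ms to per-s.
    scale = 1000.0 / (math.sqrt(2.0 * math.pi) * sigma_ms)
    return [
        sum(scale * math.exp(-0.5 * ((t - s) / sigma_ms) ** 2) for s in spike_times_ms)
        for t in range(t_start, t_stop)
    ]


def average_sdf(trials_spike_ms, t_start, t_stop, sigma_ms=30.0):
    """Average single-trial SDFs across trials (spike times aligned to an event)."""
    per_trial = [spike_density(tr, t_start, t_stop, sigma_ms) for tr in trials_spike_ms]
    return [sum(col) / len(per_trial) for col in zip(*per_trial)]


sdf = spike_density([0], -100, 100)
print(round(max(sdf), 3))  # peak rate for a single spike at t = 0
```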

To statistically characterize signals encoded by dopamine and OFC neurons, we fitted the activity of each neuron with the value and choice models (Fig. 3A). The value model is expressed by the following equation:

$$F = \beta_0 + \beta_1 \times V \qquad (3)$$

where F indicates the firing rate of each neuron, V indicates the value of the first object (1 to 6), and β0 and β1 indicate the coefficients determined by linear regression. The choice model is expressed by the following equation:

$$F = \beta_0 + \beta_1 \times C \qquad (4)$$

where C indicates whether the monkey decided to choose or not to choose the first object (1 for chosen trials and 0 for unchosen trials). The firing rate was calculated using a 100-ms sliding window with a 1-ms step. We compared the coefficient of determination (R²) between the two models for each window and for each neuron, and we determined whether the R² was significantly different between the models using a bootstrap procedure. For each neuron, we shuffled the firing rate of each trial and assigned it to another trial at random to form a shuffled dataset. We fitted the baseline firing rate (0 to 100 ms before the onset of the first object) of the shuffled dataset with the value and choice models and calculated the baseline R² difference between the models (i.e., the R² of the value model minus that of the choice model). We compared the R² difference of the original dataset with the baseline R² difference of the shuffled dataset. This shuffle and comparison process was repeated 1000 times. For each calculation window, if the R² difference of the original dataset was larger than the baseline R² difference of the shuffled dataset in more than 975 repetitions, that is, the R² of the value model was significantly larger than that of the choice model (P < 0.05, two-tailed bootstrap test), and the value model also fitted the activity significantly (P < 0.05, two-tailed F test), then the activity of the neuron was considered to be more strongly modulated by the value of the first object in that window (red area in Fig. 3B, value-modulated neurons).
Conversely, if the R² difference of the original dataset was smaller than the baseline R² difference of the shuffled dataset in more than 975 repetitions, that is, the R² of the choice model was significantly larger than that of the value model (P < 0.05, two-tailed bootstrap test), and the choice model also fitted the activity significantly (P < 0.05, two-tailed F test), then the activity of the neuron was considered to be more strongly modulated by the monkey’s choice (blue area in Fig. 3B, choice-modulated neurons). If the R² was not significantly different between the models (P > 0.05, two-tailed bootstrap test) but both models fitted the activity significantly (P < 0.05, two-tailed F test), then the neuron was considered to exhibit an intermediate modulation between the value and the choice models (white area in Fig. 3B, intermediate neurons). Of the 285 OFC neurons, 22 were excluded from this analysis because they exhibited no discharge in any 100-ms sliding window during the analysis period and the baseline R² difference could not be calculated. No dopamine neuron in our dataset exhibited an absence of discharge during the analysis period.
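The per-window comparison of the value and choice models can be sketched as below. This is a simplified, hypothetical implementation: it computes R² for each simple linear regression (Eqs. 3 and 4), builds the null distribution by shuffling baseline firing rates (0 to 100 ms before first-object onset) across trials 1000 times, and applies the 975-of-1000 rule. It omits the accompanying F tests for a significant model fit, so on its own it cannot separate intermediate neurons from unmodulated ones; `classify_window` and its arguments are illustrative names.

```python
import random


def r_squared(x, y):
    """Coefficient of determination of a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # fit undefined (e.g., no spikes in the window); treated as zero here
    return sxy * sxy / (sxx * syy)


def classify_window(window_rates, baseline_rates, values, choices, n_boot=1000, seed=0):
    """Compare value (Eq. 3) and choice (Eq. 4) models for one 100-ms window.

    Returns 'value', 'choice', or 'no difference'. The F tests for a significant
    fit that the full procedure also requires are deliberately omitted.
    """
    rng = random.Random(seed)
    observed = r_squared(values, window_rates) - r_squared(choices, window_rates)
    shuffled = list(baseline_rates)
    greater = smaller = 0
    for _ in range(n_boot):
        rng.shuffle(shuffled)  # break the pairing of baseline rates with value and choice
        null = r_squared(values, shuffled) - r_squared(choices, shuffled)
        if observed > null:
            greater += 1
        elif observed < null:
            smaller += 1
    cutoff = 0.975 * n_boot  # 975 of 1000 repetitions
    if greater > cutoff:
        return "value"
    if smaller > cutoff:
        return "choice"
    return "no difference"
```

In use, `classify_window` would be called once per 100-ms window and per neuron, with `values` in 1 to 6 and `choices` coded 1 for chosen and 0 for unchosen trials.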

To display the temporal profile of the R² difference between the value and the choice models for each neuron (Fig. 3, C and D), we normalized the R² difference based on the baseline R² difference of the shuffled dataset described above. Specifically, the R² difference was divided by the baseline R² difference corresponding to the significance level (P < 0.05, two-tailed bootstrap test). Thus, if the normalized R² difference was larger than 1, then the R² of the value model was significantly larger than that of the choice model (P < 0.05, two-tailed bootstrap test), indicating that the activity fitted better with the value model. If the normalized R² difference was smaller than −1, then the R² of the choice model was significantly larger than that of the value model (P < 0.05, two-tailed bootstrap test), indicating that the activity fitted better with the choice model.
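One way to implement this normalization is sketched below, under stated assumptions: positive R² differences are divided by the upper critical value of the shuffled (null) distribution and negative ones by the magnitude of the lower critical value, so that values beyond ±1 coincide with the 975-of-1000 bootstrap criterion. The text does not say exactly how the two tails were handled, so this two-sided scaling and the function name are assumptions.

```python
def normalized_r2_difference(observed, null_diffs):
    """Scale an R² difference (value minus choice) by bootstrap critical values.

    Assumes the upper critical value is positive and the lower one negative.
    With 1000 null samples these are the 976th and 25th smallest values, so
    |result| > 1 matches 'more than 975 of 1000 repetitions'.
    """
    s = sorted(null_diffs)
    n = len(s)
    upper = s[(975 * n) // 1000]      # observed > upper  <=>  value model wins
    lower = s[(25 * n) // 1000 - 1]   # observed < lower  <=>  choice model wins
    if observed >= 0:
        return observed / upper
    return observed / abs(lower)


null = [(i - 500) / 1000 for i in range(1000)]  # hypothetical null distribution
print(normalized_r2_difference(0.95, null))
```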

To calculate the averaged activities of the value-modulated, intermediate, and choice-modulated neurons (Fig. 4, D and E, and fig. S2), we selected neurons that were identified as belonging to these groups for a sustained period during the presentation of the first object (Fig. 4, A and B). Specifically, we first calculated the temporal profile of the R² difference between the value and the choice models for each neuron using a 100-ms sliding window with a 1-ms step as described above (Fig. 3, C and D). If a neuron was identified as a value-modulated, intermediate, or choice-modulated neuron in at least 45 of 50 consecutive 1-ms steps during the presentation of the first object, then the neuron was included in the averaged activity of the corresponding group. Because 50 consecutive 100-ms sliding windows with a 1-ms step span about 150 ms, only neurons with a modulation that was stable for at least 150 ms were assigned to a group. Some dopamine and OFC neurons were assigned to two or three groups because they represented distinct signals during different periods of the first-object presentation; these neurons contributed to the averaged activities of each of those groups. The calculation time window began at the start of the first set of 50 consecutive 1-ms steps satisfying the criterion, slid forward until the “at least 45 of 50 consecutive 1-ms steps” criterion was no longer fulfilled, and ended at the end of the last set of 50 consecutive 1-ms steps that satisfied it.
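The 45-of-50 selection and the resulting calculation time windows can be sketched as follows. This is an illustrative reading of the text, not the authors' implementation: the input is assumed to be one boolean per 1-ms step indicating whether the neuron was assigned to a given group at that step, and the function name `criterion_windows` is hypothetical.

```python
def criterion_windows(in_group, window=50, min_hits=45):
    """Return (start, end) step indices (end exclusive) of periods meeting the criterion.

    in_group[t] is True if the neuron was assigned to the group at 1-ms step t.
    A period starts at the first block of `window` steps with at least `min_hits`
    assigned steps and ends at the end of the last such block before the
    criterion fails.
    """
    n = len(in_group)
    if n < window:
        return []
    qualifies = []
    hits = sum(in_group[:window])  # running count of assigned steps in the block
    for start in range(n - window + 1):
        if start > 0:
            hits += in_group[start + window - 1] - in_group[start - 1]
        qualifies.append(hits >= min_hits)

    periods, first = [], None
    for s, ok in enumerate(qualifies):
        if ok and first is None:
            first = s
        elif not ok and first is not None:
            periods.append((first, s - 1 + window))  # end of the last qualifying block
            first = None
    if first is not None:
        periods.append((first, len(qualifies) - 1 + window))
    return periods


# Usage: a neuron assigned to a group for 60 consecutive steps (synthetic data).
labels = [False] * 10 + [True] * 60 + [False] * 30
print(criterion_windows(labels))
```

Because each period is built from the start of the first qualifying block to the end of the last one, a neuron can yield several disjoint periods, which matches the text's note that some neurons belonged to more than one group.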

As mentioned above, we used a criterion of at least 45 of 50 consecutive 1-ms steps to identify each neuron group. With a different criterion, “at least 27 of 30 consecutive 1-ms steps,” which requires a shorter stable period, the total number of identified neurons increased compared with the 45-of-50 criterion (45-of-50 criterion: 38 value-modulated, 52 intermediate, and 32 choice-modulated dopamine neurons and 101 value-modulated, 106 intermediate, and 64 choice-modulated OFC neurons; 27-of-30 criterion: 43 value-modulated, 67 intermediate, and 41 choice-modulated dopamine neurons and 122 value-modulated, 135 intermediate, and 91 choice-modulated OFC neurons). Because dopamine and OFC neurons often exhibited phasic modulations of short duration, it is expected that the 27-of-30 criterion identified more neurons than the 45-of-50 criterion. However, the ratio of neurons identified as each type (value:intermediate:choice) was similar between the two criteria (dopamine neurons: 1:1.4:0.84 with the 45-of-50 criterion and 1:1.6:0.95 with the 27-of-30 criterion; OFC neurons: 1:1:0.63 and 1:1.1:0.75, respectively). This suggests that, although the criterion changes the threshold for identification, it does not alter the composition of the identified neuron population.
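The ratios quoted above follow directly from the group sizes. The short check below recomputes them from the counts given in the text; the dictionary layout and the helper name `ratio` are illustrative only.

```python
# Group sizes reported above: (value-modulated, intermediate, choice-modulated).
counts = {
    ("dopamine", "45-of-50"): (38, 52, 32),
    ("dopamine", "27-of-30"): (43, 67, 41),
    ("OFC", "45-of-50"): (101, 106, 64),
    ("OFC", "27-of-30"): (122, 135, 91),
}


def ratio(value, intermediate, choice):
    """Express group sizes relative to the value-modulated group."""
    return (1.0, intermediate / value, choice / value)


for (area, criterion), n in counts.items():
    _, inter, choice = ratio(*n)
    print(f"{area:8s} {criterion}: 1:{inter:.2f}:{choice:.2f}")
```

Printed to two decimals, these agree with the rounded ratios in the text (for example, 52/38 ≈ 1.37 and 32/38 ≈ 0.84 for dopamine neurons under the 45-of-50 criterion).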

We calculated the onsets of value-modulated, intermediate, and choice-modulated signals using the time-varying proportions of these neuron groups (Fig. 5, A and B). We compared the proportion of each neuron group at each 1-ms step with the median of the baseline proportion (0 to 200 ms before the onset of the first object). The onset was defined as the beginning of 20 consecutive 1-ms steps that were significantly different from the median (P < 0.05, two-tailed Fisher’s exact test).

For each neuron, we defined the latencies of value-modulated, intermediate, and choice-modulated signals as the start point of the first 50 consecutive 1-ms steps of which at least 45 steps exhibited these signals (i.e., the start point of the calculation time window described above) (Fig. 5, C and D).

We compared the activity of the choice-modulated neurons between trials in which the monkey decided to choose the first object (chosen trials) and trials in which the monkey decided not to choose the first object (unchosen trials) under the same value condition (i.e., when the object value was 4) (Fig. 6). One choice-modulated dopamine neuron was excluded from this analysis because, during its recording session, the monkey chose the first object associated with value 4 in every trial, so no unchosen trials were available. To calculate the release-aligned SDFs in unchosen trials (i.e., trials in which the monkey did not release the button) (Fig. 6, B and D), we randomly sampled, with replacement, button-release onsets from chosen trials and assigned them to the unchosen trials.
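The assignment of surrogate release onsets to unchosen trials amounts to sampling with replacement. The sketch below shows that step and the subsequent re-alignment of spike times; times are assumed to be in milliseconds from first-object onset, and the helper names are hypothetical.

```python
import random


def assign_release_onsets(chosen_onsets, n_unchosen, seed=None):
    """Draw one release onset per unchosen trial, with replacement, from chosen trials."""
    rng = random.Random(seed)
    return rng.choices(chosen_onsets, k=n_unchosen)


def align_spikes(spike_times, onset):
    """Re-express spike times (ms) relative to an actual or assigned release onset."""
    return [t - onset for t in spike_times]


# Usage: release onsets (ms) in chosen trials; illustrative values only.
chosen = [412, 455, 398, 510]
assigned = assign_release_onsets(chosen, n_unchosen=6, seed=1)
aligned = [align_spikes([100, 300, 600], onset) for onset in assigned]
```

Sampling from the empirical distribution of chosen-trial release times keeps the timing of the surrogate alignment points comparable between chosen and unchosen trials.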

To calculate the onset of the choice-modulated signal aligned at the onset of the button release (Fig. 6, B and D), we compared the activity of the choice-modulated neurons between chosen and unchosen trials using a 100-ms sliding window with a 1-ms step. The onset was defined as the beginning of 20 consecutive 1-ms steps that exhibited a significantly higher or lower activity in chosen trials than in unchosen trials for neurons with the “positive” or “negative” choice-modulated signal, respectively (P < 0.05, one-tailed Wilcoxon signed-rank test).
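The onset search can be sketched as below. The sketch assumes that, at each step, every neuron contributes one paired observation (its mean firing rate in the 100-ms window in chosen and in unchosen trials), and it uses the normal approximation to the signed-rank statistic without a tie correction of the variance, so P values will differ somewhat from an exact test for small samples. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def wilcoxon_greater_p(x, y):
    """One-tailed Wilcoxon signed-rank P value for H1: x > y (normal approximation)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        return 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:  # assign average ranks to tied absolute differences
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return 1 - NormalDist().cdf((w_plus - mean) / sd)


def choice_signal_onset(chosen, unchosen, positive=True, run_len=20, alpha=0.05):
    """First step of `run_len` consecutive significant steps, or None.

    chosen[t][n], unchosen[t][n]: mean rate of neuron n in the window at step t.
    positive=True tests chosen > unchosen; False tests chosen < unchosen.
    """
    run = 0
    for t, (c, u) in enumerate(zip(chosen, unchosen)):
        p = wilcoxon_greater_p(c, u) if positive else wilcoxon_greater_p(u, c)
        run = run + 1 if p < alpha else 0
        if run == run_len:
            return t - run_len + 1
    return None
```

Testing the two signal polarities separately, as in the text, lets a one-tailed test be used in each group.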

We examined differences in location among the value-modulated, intermediate, and choice-modulated dopamine neurons. The dorsolateral-ventromedial location of each neuron was defined as its recording depth measured from a reference depth (the recording depth of the shallowest dopamine neuron in each hemisphere) (fig. S1D). The posterior-anterior location was measured from the position of the most posterior dopamine neurons in each hemisphere (fig. S1E). We also examined differences in location among the value-modulated, intermediate, and choice-modulated OFC neurons; their posterior-anterior location was measured from the position of the most posterior OFC neurons in each hemisphere (fig. S1G).
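The per-hemisphere referencing can be sketched as a subtraction of the minimum coordinate within each hemisphere. This assumes that recording depth increases ventrally and that the posterior-anterior coordinate increases anteriorly, so the minimum picks the shallowest and the most posterior neuron; the field names and the helper `relative_to_reference` are hypothetical.

```python
from collections import defaultdict


def relative_to_reference(neurons, coord, reference=min):
    """Subtract a per-hemisphere reference from one coordinate of each neuron.

    neurons: list of dicts with a 'hemisphere' key and a numeric coordinate.
    reference=min selects the shallowest depth or the most posterior position
    under the axis-direction assumptions stated above.
    """
    by_hemi = defaultdict(list)
    for nrn in neurons:
        by_hemi[nrn["hemisphere"]].append(nrn[coord])
    ref = {h: reference(vals) for h, vals in by_hemi.items()}
    return [nrn[coord] - ref[nrn["hemisphere"]] for nrn in neurons]


# Usage with illustrative coordinates (mm).
neurons = [
    {"hemisphere": "L", "depth_mm": 31.2, "ap_mm": 10.0},
    {"hemisphere": "L", "depth_mm": 32.0, "ap_mm": 11.5},
    {"hemisphere": "R", "depth_mm": 30.5, "ap_mm": 9.0},
]
depth = relative_to_reference(neurons, "depth_mm")
ap = relative_to_reference(neurons, "ap_mm")
```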

To evaluate the effects of the first and second object values in the previous trial on the response of each choice-modulated dopamine neuron to the first object in the current trial, we conducted a multiple regression analysis using the following equation:

$$F_t = \beta_0 + \beta_1 \times V1_{t-1} + \beta_2 \times V2_{t-1} \qquad (5)$$

where $F_t$ indicates the firing rate of each choice-modulated dopamine neuron in response to the first object in trial $t$ (the firing rate was calculated using the time window during which the choice-modulated signal of that neuron was detected), $V1_{t-1}$ and $V2_{t-1}$ indicate the values of the first and second objects, respectively, in trial $t-1$, and $\beta_0$ to $\beta_2$ indicate the coefficients determined by multiple regression. To test whether the proportion of neurons showing a significant regression coefficient was significantly larger than chance, we used a bootstrap procedure. For each neuron, we shuffled the firing rates across trials, assigning the firing rate of each trial to another trial at random, to form a new dataset. Then, the proportion of neurons showing a significant regression coefficient was calculated for the shuffled dataset and compared with the original proportion. We repeated this shuffle and comparison 1000 times. If the proportion of the original dataset was larger than that of the shuffled dataset in more than 950 repetitions, then the original proportion was considered to be significantly larger than chance (P < 0.05, one-tailed bootstrap test).
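The regression and the shuffle test can be sketched without third-party packages as follows. This is a minimal sketch under stated assumptions: ordinary least squares is solved through the normal equations, coefficient P values use a normal approximation to the t distribution (reasonable only with many trials), and each neuron is supplied as a `(rates, v1_prev, v2_prev)` tuple. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        scale = a[col][col]
        a[col] = [v / scale for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def coefficient_pvalues(rates, v1_prev, v2_prev):
    """Fit F_t = b0 + b1*V1_{t-1} + b2*V2_{t-1}; return two-tailed P per coefficient."""
    x = [(1.0, a, b) for a, b in zip(v1_prev, v2_prev)]
    n, k = len(x), 3
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, rates)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    rss = sum((y - sum(b * v for b, v in zip(beta, r))) ** 2 for r, y in zip(x, rates))
    s2 = rss / (n - k)  # residual variance
    pvals = []
    for i in range(k):
        se = math.sqrt(s2 * inv[i][i])
        z = abs(beta[i]) / se if se > 0 else math.inf
        pvals.append(2.0 * (1.0 - NormalDist().cdf(z)))  # normal approximation
    return pvals


def proportion_significant(neurons, coef, alpha=0.05):
    """Fraction of neurons whose coefficient `coef` (1 = V1, 2 = V2) has P < alpha."""
    hits = sum(coefficient_pvalues(*nrn)[coef] < alpha for nrn in neurons)
    return hits / len(neurons)


def bootstrap_proportion_test(neurons, coef, n_rep=1000, seed=0):
    """Is the observed proportion larger than the shuffled one in >95% of repetitions?"""
    rng = random.Random(seed)
    observed = proportion_significant(neurons, coef)
    wins = 0
    for _ in range(n_rep):
        shuffled = []
        for rates, v1, v2 in neurons:
            perm = list(rates)
            rng.shuffle(perm)  # break the link between each rate and its previous-trial values
            shuffled.append((perm, v1, v2))
        wins += observed > proportion_significant(shuffled, coef)
    return observed, wins > (95 * n_rep) // 100  # n_rep = 1000 -> more than 950
```

Shuffling firing rates within each neuron preserves each neuron's rate distribution while destroying any dependence on the previous trial's values, which is what the chance level should reflect.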

SUPPLEMENTARY MATERIALS

Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/6/27/eaba4962/DC1

https://creativecommons.org/licenses/by-nc/4.0/

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.

REFERENCES AND NOTES

Acknowledgments: We thank K. Amemori, J. Izawa, K. Mimura, and T. Minamimoto for valuable comments on our data analyses, E.S. Bromberg-Martin for comments on an earlier version of the manuscript, and K. Bunzui for animal care. Funding: This research was supported by MEXT KAKENHI grant number JP16H06567 (to M.M.) and JST CREST grant number JPMJCR1853 (to M.M.). Author contributions: M.Y., T.K., and M.N. performed the experiments. M.Y. analyzed the data. M.Y. and M.M. wrote the manuscript. All authors discussed the results and manuscript. M.M. organized this project. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors.