Research Article, Neuroscience

Common coding of expected value and value uncertainty memories in the prefrontal cortex and basal ganglia output

Science Advances  12 May 2021:
Vol. 7, no. 20, eabe0693
DOI: 10.1126/sciadv.abe0693

Abstract

Recent evidence implicates both basal ganglia and ventrolateral prefrontal cortex (vlPFC) in encoding value memories. However, comparative roles of cortical and basal nodes in value memory are not well understood. Here, single-unit recordings in vlPFC and substantia nigra reticulata (SNr), within macaque monkeys, revealed a larger value signal in SNr that was nevertheless correlated with and had a comparable onset to the vlPFC value signal. The value signal was maintained for many objects (>90) many weeks after reward learning and was resistant to extinction in both regions and to repetition suppression in vlPFC. Both regions showed comparable granularity in encoding expected value and value uncertainty, which was paralleled by enhanced gaze bias during free viewing. The value signal dynamics in SNr could be predicted by combining responses of vlPFC neurons according to their value preferences consistent with a scheme in which cortical neurons reached SNr via direct and indirect pathways.

INTRODUCTION

Our interactions with objects are governed by a variety of factors, including their history of being associated with reward, an effect referred to as object value memory (1, 2). In primates, and in monkeys in particular, these value memories have been shown to be retained across long periods of time, to be maintained for a large number of objects (high object capacity), and to be resistant to extinction by repeated exposure without reward (1, 3–5).

Systematic electrophysiological investigations of the circuitry involved in representation and storage of value memories have implicated posterior basal ganglia including caudate tail (CDt) (6, 7), ventral putamen (vPut) (8), caudoventral globus pallidus (cvGPe) (9), and caudo-dorsolateral substantia nigra reticulata (cdlSNr) (4) as participating nodes. Among cortical areas, the temporal-prefrontal (TP) cluster that includes ventral superior temporal sulcus and ventrolateral prefrontal cortex (vlPFC) has been found to be sensitive to reward memories and to retain those memories for many months (1). Single-cell recordings have revealed robust signatures of value memory among vlPFC neurons (5). Notably, vlPFC is known to have anatomical (10–12) and functional (1) connectivity to posterior striatum including CDt and vPut, which project directly and indirectly to the basal ganglia output, cdlSNr. In turn, cdlSNr is also known to target PFC indirectly via the mediodorsal nucleus of the thalamus (13, 14). This positions vlPFC and cdlSNr as two nodes in the well-known cortico-basal ganglia loops (15).

Representation of value memory across the cortical and basal nodes raises questions about the mechanism of value memory storage and the dynamics of its expression within the corticobasal circuitry. In particular, it is unknown whether value memory is primarily represented in one region and relayed to the second one or whether it is independently represented in cortical and in basal nodes. Furthermore, the value memory resolution may be different in cortical and basal nodes (e.g., one region may be more sensitive to incremental reward while the other may represent it rather crudely). In addition to reward amount, reward uncertainty is also commonplace in life and has been shown to affect neuronal responses and behavior (16, 17). It is unknown whether and to what extent reward uncertainties may modulate reward memory in the corticobasal circuitry.

To address these issues and to allow for direct comparison between the cortical and basal nodes in the expression and retention of value memories, neuronal activity was recorded in vlPFC and cdlSNr in the same monkeys and with the same visual objects with well-established reward memories. This approach minimizes unwanted variations by allowing within-subject and within-stimulus comparisons between the two regions. Various aspects of value memory including its onset, capacity, persistence in time, its granularity, and its modulation by uncertainty were found to be similar in both regions. The value memory across objects was highly correlated between the two regions, pointing to a collaborative mechanism for encoding of value memory across the cortical and basal nodes. Comparison of neural responses is consistent with a model in which vlPFC signals reach cdlSNr via direct and indirect basal ganglia pathways based on their value preference for low or high value objects.

RESULTS

Two monkeys (monkeys B and R) were trained to associate abstract fractal objects with preassigned reward contingencies in a standard value training task for at least 10 days (Fig. 1A). Fractals were trained in sets belonging to different contingency types (Fig. 1B): (i) good/bad (GB) sets: consisting of sets of four fractals associated with low reward (bad objects) and four fractals associated with high reward (good objects); (ii) amount sets: consisting of sets of five objects with linearly increasing reward amount from low to high reward; and (iii) probability sets: consisting of five objects with linearly increasing high reward probability. The amount sets allowed us to gauge the granularity of value memory, while the probability sets allowed us to measure the effect of added reward uncertainty compared to amount sets.

Fig. 1 Value memory paradigm and stimuli.

(A) Value training: After central fixation, monkeys were shown a single target object in a random location. They received the reward associated with that object according to a fixed contingency schedule once they fixated the target. (B) Fractal objects were trained in sets of objects. In the GB category (104 fractals), the sets consisted of four objects with small reward (bad objects) and four objects with large reward (good objects). Relative juice amount in large to small reward was 3 to 1. In the amount category (40 fractals), the reward amount was increased linearly from small to large reward across the five objects in the set. In the probability category (30 fractals), rewards were probabilistically either small or large with large reward probability increasing linearly from 0 to 1 across the five objects in the set. Thus, probability and amount sets were matched by expected value. (C) Value memory: Monkeys kept central fixation while objects from a given set were shown randomly and sequentially within the neuron’s RF in a passive viewing task. (D) Acute single-unit recording sessions were done either in vlPFC or cdlSNr in the same animal and with the same fractal objects shown in (B).

To examine the representation of object value memory in cortico-basal circuitry, acute neural recordings from vlPFC (areas 8Av, 46v, and 45) and from cdlSNr were done in separate sessions in each monkey (Fig. 1D and fig. S1, 349 vlPFC neurons and 140 SNr neurons in total). The neural signature of value memory was recorded during a passive viewing procedure in which objects in a set were shown pseudorandomly in the neuron’s receptive field (RF) (Fig. 1C). During the memory test, no object contingent reward was provided and monkeys were rewarded for maintaining central fixation after variable intervals. About 150 fractals across the three set types were used for measuring object value memory per monkey. The large number of objects used was, in part, to ensure that neural responses could not be attributed to idiosyncratic features in each stimulus and also to gauge the value memory capacity of corticobasal circuitry for increasing number of objects.

Value coding types and value signal onset in the corticobasal circuitry

Figure 2A shows representative example neurons from each monkey that were recorded with the same GB object set in vlPFC and cdlSNr in the passive viewing task. As can be seen in both regions, there was a clear difference in firing to good compared to bad objects but with opposite polarity in vlPFC versus cdlSNr. The example vlPFC neurons showed stronger excitation to good objects while the example cdlSNr neurons showed excitation to bad objects and inhibition to good objects consistent with previous reports (4, 5). In both regions, a smaller proportion of neurons had the opposite value coding (i.e., responding more to good in cdlSNr and responding more to bad in vlPFC, for examples, see figs. S2 and S8).

Fig. 2 Good versus bad object discrimination in vlPFC and cdlSNr.

(A) Example neurons recorded in vlPFC and cdlSNr responding to the same objects. Top row shows average peristimulus time histograms (PSTHs) of firing to good and bad objects. Bottom row shows raster plot of firing to each object. Actual fractals used are shown to the left and grouped into good and bad. Insets show average firing rate in 100 to 400 ms after onset (gray shading) for each good or bad object. Nrn, neuron. Horizontal line in the PSTH indicates object-on duration (400 ms). (B) Distribution of good versus bad object discrimination (value AUC) across all neurons in vlPFC (top) and in cdlSNr (bottom). Arrows mark population average of value AUC (vlPFC: AUC = 0.59, t(348) = 9.8, P < 1 × 10^−19; cdlSNr: AUC = 0.35, t(115) = −7.5, P < 1 × 10^−10; AUC > 0.5 indicates higher firing to good objects). Bad-preferring (Bp; significant AUC < 0.5), good-preferring (Gp; significant AUC > 0.5), and nonsignificant (NS) neurons are color coded in the histogram (value types). Inset shows the percentage of Bp and Gp neurons. (C) Population average PSTH to good and bad objects separately for Bp, NS, and Gp neurons in vlPFC (top row) and cdlSNr (bottom row). (D) Average firing difference between good and bad objects (value signal: good-bad) across value types in vlPFC (top) and in cdlSNr (bottom). Inset shows cumulative distribution of value signal onset in Bp and Gp neurons in vlPFC (t(200) = 1.8, P = 0.08) and in cdlSNr (t(59) = 3.2, P = 2 × 10^−3). (E) Average onset of value signal across the population in vlPFC and cdlSNr for Bp and Gp neurons (all effects F(1,259) > 5.9, P < 0.05, significant and nonsignificant post hocs marked). (F) Average value signal in vlPFC (good-bad) and in cdlSNr (bad-good) across population (left) and cumulative distribution of value signal onset across all value significant neurons (Bp and Gp) in vlPFC versus cdlSNr (t(261) = 0.12, P = 0.9).

To quantify the strength of object value memory, we calculated areas under receiver operating characteristic curve (AUCs) for discrimination of good and bad objects (value AUCs) for each neuron. The overall distribution of value AUCs across all neurons showed a significant shift toward preference for good objects in vlPFC (higher firing to good compared to bad) and a significant shift toward preference for bad objects in cdlSNr (higher firing to bad compared to good; Fig. 2B). In both monkeys, vlPFC had a larger percentage of neurons with value AUC significantly above 0.5 (46% good-preferring or Gp neurons) and a smaller percentage with value AUC significantly below 0.5 (13% bad-preferring or Bp neurons). The opposite was true in cdlSNr (49% Bp and 7% Gp neurons). In both regions, some neurons did not show significant value memory [nonsignificant (NS) neurons, 41% vlPFC, and 44% cdlSNr]. Figure 2C shows the average response of neurons in each value type (Bp, NS, and Gp) across both regions (also see fig. S3 for heatmap of all neurons recorded in both regions and in both subjects using GB sets).
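The value AUC described above can be sketched as follows. This is a minimal illustration that assumes single-trial firing rates in the 100 to 400 ms window are already available; the function name `value_auc` and the example rates are hypothetical, and the significance test used to label neurons as Gp, Bp, or NS is not reproduced here.

```python
import numpy as np

def value_auc(good_rates, bad_rates):
    """ROC area for good-vs-bad discrimination from single-trial firing rates.

    Equals the probability that a randomly drawn good-object trial has a
    higher rate than a randomly drawn bad-object trial (ties count as 0.5).
    """
    good = np.asarray(good_rates, dtype=float)[:, None]
    bad = np.asarray(bad_rates, dtype=float)[None, :]
    wins = (good > bad).sum() + 0.5 * (good == bad).sum()
    return wins / (good.size * bad.size)

# Hypothetical firing rates (spikes/s), not recorded data
good_trials = [22.0, 30.0, 27.0, 35.0]
bad_trials = [15.0, 20.0, 24.0, 18.0]
auc = value_auc(good_trials, bad_trials)   # > 0.5: higher firing to good objects
```

With this convention, swapping the two inputs gives 1 minus the AUC, so a bad-preferring neuron yields a value below 0.5.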

To check whether neurons in each value type show different electrophysiological characteristics, properties such as spike shape, interspike interval (ISI), and baseline firing were contrasted between the neuron categories. The average spike shape in cdlSNr neurons was narrower than vlPFC neurons, but there was no clear difference in the average spike shape across value types within a region (fig. S4A). In vlPFC, there was no significant difference in ISI distribution and basal firing rate across neuron types (fig. S4, B and C). However, as evident in Fig. 2C, for cdlSNr, there was a significant decrease in basal firing rate from Bp neurons to NS neurons to Gp neurons (fig. S4, B and C). In terms of value signal, vlPFC Gp and Bp neurons had comparable magnitude and value onset times. In cdlSNr, the value signal in Bp neurons was much stronger and had an earlier onset time compared to Gp neurons (Fig. 2D). Further comparison showed that average value onsets were not significantly different between vlPFC Gp and Bp neurons and cdlSNr Bp neurons (Fig. 2E). However, for the cdlSNr Gp neurons, the value onset was significantly later (>150 ms). This difference in basal firing rate between Bp and Gp neurons in cdlSNr along with the fact that the value onset in cdlSNr Gp neurons is relatively late suggests that cdlSNr Gp neurons may have different cellular and connectivity characteristics compared to cdlSNr Bp neurons.

Notably, while the average value signal magnitude (good minus bad object firing) was stronger overall in cdlSNr compared to vlPFC, the onset of value signal was comparable across the neural populations in vlPFC and cdlSNr (Fig. 2F). Given that the recordings were done in the same setup with the same stimuli and in the same monkeys, the comparable value onset in vlPFC and cdlSNr argues that value memory is represented in cortical and basal nodes in a distributed fashion rather than being relayed from one of the two nodes to the other [in contrast, see a similar analysis that revealed a delayed value onset in MD thalamus compared to cdlSNr (14)]. Note that because neuronal responses were obtained during a passive viewing task, we expect nonsimultaneity of recordings to have negligible impact on our onset comparisons. A permutation analysis showed that value onsets in both vlPFC and cdlSNr for neurons recorded across sessions were not significantly different from value onsets recorded within the same session (monkey B, P > 0.09; monkey R, P > 0.2; see Materials and Methods).

Correlated object value coding in the corticobasal circuitry

Despite having the same reward association history, the strength of value memory showed some variation across objects in vlPFC and in cdlSNr. If the vlPFC and cdlSNr nodes encoded object values independently of each other, then one would expect these variations to be uncorrelated across the two regions. On the other hand, if the value signals were shared by the corticobasal loop, then one would expect a significant correlation. To address this point, we calculated and compared the value memory of the same objects in cdlSNr versus vlPFC. Value memory of a given object (e.g., one good object) was taken as its AUC versus the objects from the other category in the set (e.g., four bad objects in the same set; see Fig. 1B). To allow comparison between regions and between GB objects, the absolute value of AUC [i.e., abs(AUC − 0.5)] was used. Consistent with the average value signal (Fig. 2F), the value AUC for almost all objects was stronger in cdlSNr compared to vlPFC with most points falling above the unity line. Results showed a significant positive correlation for the absolute value signal across the two regions (Fig. 3A). In other words, the higher the absolute value of AUC for an object in vlPFC, the higher it would be in cdlSNr and vice versa. The strength of correlation and linear slope was similar for good and bad objects (Fig. 3B). This analysis also revealed the high object capacity memory mechanism in both regions because value AUC was far from 0.5 for almost all individual objects (104 and 88 total objects in monkeys B and R, respectively). Notably, the vlPFC-cdlSNr correlation was not observed in the surrogate data if the object labels were shuffled for cdlSNr neurons (Fig. 3C, showing results for one such shuffle). The observed correlation (Fig. 3A, R = 0.34) was significantly larger than the mean correlation of the null distribution (1 × 10^4 permutations, R = 0.09) with shuffled object labels (P = 0.025).
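The logic of the label-shuffling test can be illustrated with a short sketch. It assumes one absolute value AUC per object and per region (already averaged across neurons) and shuffles object labels freely, which may differ from the shuffling scheme actually used here; the function name `permutation_corr_test` and the synthetic inputs are hypothetical.

```python
import numpy as np

def permutation_corr_test(x, y, n_perm=10_000, seed=0):
    """Pearson r between x and y, plus a one-sided permutation P value.

    The null distribution is built by shuffling the object labels of y,
    which breaks the object-by-object pairing between the two regions.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r_obs = np.corrcoef(x, y)[0, 1]
    r_null = np.array([np.corrcoef(x, rng.permutation(y))[0, 1]
                       for _ in range(n_perm)])
    # Add-one correction keeps the estimated P value strictly above zero
    p = (np.sum(r_null >= r_obs) + 1) / (n_perm + 1)
    return r_obs, r_null, p

# Synthetic per-object abs(AUC - 0.5) values (not the recorded data)
rng = np.random.default_rng(1)
shared = rng.uniform(0.05, 0.25, size=100)      # common per-object component
vlpfc = shared + rng.normal(0, 0.03, size=100)
cdlsnr = 1.5 * shared + rng.normal(0, 0.03, size=100)
r_obs, r_null, p = permutation_corr_test(vlpfc, cdlsnr, n_perm=2000)
```

If the two regions encoded object values independently, the observed correlation would be expected to fall inside the shuffled distribution rather than in its upper tail.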

Fig. 3 Correlated value signal for objects across vlPFC and cdlSNr.

(A) Absolute AUC (|AUC|) of all good and bad objects for both monkeys in vlPFC versus cdlSNr. Each dot represents the |AUC| for one object averaged across all neurons in cdlSNr (y axis) and in vlPFC (x axis) color-coded for each monkey. Regression line is overlaid and slope is noted along with Pearson’s correlation (Deming regression used to account for noise in both the x and y axes). (B) Same format as (A) but showing the correlation separately for good and bad objects. (C) Same format as (A) but for object set shuffled (rotational shift of set labels by one) within monkeys. (D) Correlation dynamics time locked to object onset for absolute value AUC between vlPFC and cdlSNr calculated in a sliding 300-ms window for monkey B (top row) and monkey R (bottom row) separately for good and bad objects and combined. Significant correlations (P < 0.05) are marked by solid lines. (E) Same as in (A) but for shuffled monkey labels across cdlSNr neurons.

Next, the temporal dynamics and emergence of value signal correlation between vlPFC and cdlSNr were explored using a sliding window during the passive viewing trials (Fig. 3D). Results show that the correlation becomes significant, peaking around 350 ms after object onset in both monkeys. For monkey R, the correlation was stronger and lasted over a longer period (100 to 400 ms), while in monkey B, the significant correlation period was narrower (300 to 400 ms).

The correlated variation in value AUC between the two regions may simply reflect the variation in visual features between the fractals rather than a shared value coding mechanism across the cortex and basal ganglia. However, two pieces of evidence argue against this possibility. First, if the variation is due to visual features, then one predicts that shuffling the monkey labels should minimally affect the observed correlation between vlPFC and cdlSNr because the same objects were used in both monkeys. However, this shuffling removes the correlations between the two regions (Fig. 3E). Second, the object selectivity of vlPFC and cdlSNr was low for fractal objects. Both regions showed relatively low object selectivity as measured by sparsity (fig. S5A) compared to object selectivity in inferotemporal cortex or in CDt [for comparison, see (18, 19)]. Sparsity in cdlSNr was significantly lower than in vlPFC. However, sparsity is highly sensitive to additive shifts in firing rate (i.e., neurons with higher mean firing tend to show lower sparsity). To account for firing differences between vlPFC and cdlSNr, we constructed a novel object selectivity measure (termed nonuniformity or NU, see Materials and Methods). Briefly, NU is related to the difference between the observed distribution of responses to objects and a uniform distribution and has a support from 0 to 1. For neural responses tested with a limited number of objects, the support will have a smaller range depending on the number of objects. As can be seen in fig. S5B, NU was comparable between vlPFC and cdlSNr and was close to the minimum possible NU for four objects (four good and four bad objects in each set). This suggests that both vlPFC and cdlSNr have limited, if any, object selectivity for the fractals used in this experiment.
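The sensitivity of sparsity to additive firing-rate shifts can be seen in a small numerical sketch. The formula below is one commonly used sparseness index and is an assumption: the exact sparsity definition used for fig. S5A is not given in this section, and the NU measure itself is described only in Materials and Methods, so it is not reproduced. The response values are hypothetical.

```python
import numpy as np

def sparsity(rates):
    """Sparseness index in [0, 1]; 0 means identical response to all objects."""
    r = np.asarray(rates, dtype=float)
    n = r.size
    activity_ratio = r.mean() ** 2 / np.mean(r ** 2)
    return (1.0 - activity_ratio) / (1.0 - 1.0 / n)

responses = np.array([2.0, 4.0, 6.0, 8.0])   # hypothetical rates, spikes/s
low_baseline = sparsity(responses)
high_baseline = sparsity(responses + 50.0)   # same tuning, higher mean rate
```

Both response profiles differ between objects by exactly the same amounts, yet the index drops sharply once a constant is added, which is why a high-firing structure such as cdlSNr can appear less selective on this measure.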

Graded coding of object value memories in the corticobasal circuitry

To explore the granularity of the corticobasal code for object value memory, objects with linearly increasing reward association from low to high reward (same reward amounts as bad and good objects, respectively) during training (amount set objects; Fig. 1B) were used in the passive viewing test. Alternatively, one may linearly change the expected value across objects using a probabilistic reward schedule between low and high rewards (probability set objects; Fig. 1B). Choice trials done after training confirmed that monkeys learned and differentiated the graded amounts and probabilities as they often chose the object with higher value (fig. S6).

Figure 4 shows representative neuronal responses to amount and probability sets in both regions and both monkeys. Neurons in both regions mostly followed the changes in the expected value of objects in a granular fashion. For the example vlPFC neurons, this meant a gradual increase in firing rate for increasing values, and for the example cdlSNr neurons, this meant a gradual decrease in firing for increasing values. However, the rate of increase for probability objects tended to be faster compared to amount objects for objects matched for expected value. One may also note that while the polarity of response in amount and probability sets is the same as in GB sets, the dynamic range of responses in amount and probability sets seems larger than for the GB objects in these examples.

Fig. 4 Example responses to GB, amount, and probability sets in vlPFC and cdlSNr.

Example neurons in vlPFC and cdlSNr recorded in GB (left column), amount (middle column), and probability (right column) sets in monkey B (top two rows) and monkey R (bottom two rows). In each panel, top row shows average PSTH to objects in the set. Bottom row shows raster plot of firing to each object in the set with dots indicating spike times. PSTH and spike raster are color-coded according to object value. Insets show average firing rate in 100 to 400 ms after object onset for each object (gray shading). For probability and amount sets, the x axis of the inset is ordered by each object's expected value.

Consistent with the examples shown, the average neural responses in both vlPFC and cdlSNr showed a graded response to increasing reward amounts and probabilities associated with each object (Fig. 5, A and B). In vlPFC, the average responses showed ever increasing excitation across the five levels of reward associations. In cdlSNr, the average response showed a transition from excitation to inhibition going from low to high rewards across the five levels.

Fig. 5 Population average of reward amount and probability memories in vlPFC and cdlSNr and free viewing bias in amount and probability sets.

(A) Population average PSTH of neurons responding to objects with varying reward amounts in vlPFC (top) and in cdlSNr (bottom). (B) Same format as in (A) and for the same neurons but for objects with varying reward probabilities in both regions. (C) Average firing of the same neurons to objects from amount, probability, and GB categories as a function of expected value in vlPFC (top) and cdlSNr (bottom). (D) Average firing difference for objects with the same expected value from probability sets compared to amount sets in vlPFC (top, F(4,145) = 3.5, P = 9 × 10^−3) and cdlSNr (bottom, F(4,140) = 9.6, P = 6 × 10^−7). (E) Free viewing task: The subject freely viewed four objects from probability sets or from amount sets. (F) Gaze bias measures (percentage of first saccade and object scanning frequency) for objects in probability and amount sets as a function of their expected value (first saccade: F(4,395) > 4.8, P < 8 × 10^−4, main effect of value and interaction; object scanning: F(4,395) > 3.2, P < 2 × 10^−2, main effect of value and interaction; total of 38 amount and 43 probability sessions; / and X indicate main effect of value and interaction, respectively). (G) Difference in gaze bias for probability sets compared to amount sets (first saccade: F(4,15) = 7.2, P = 1 × 10^−3; object scanning: F(4,15) = 2.3, P = 9 × 10^−2).

To aid response comparison between amount and probability sets, the neural responses to both set types are overlaid as a function of learned expected values (Fig. 5C). As can be seen, the probability responses tended to be higher than amount responses when matched for expected value in vlPFC. A similar pattern was observed in cdlSNr but with the opposite polarity. In both regions, the firing rate to objects in probability and amount sets exceeded expectations when compared with firing to good and bad objects. This deviation was, in general, stronger to objects in probability sets compared to value matched objects in amount sets (fig. S7A). Furthermore, despite having the same dynamic range in expected reward in GB sets versus amount and probability sets, the firing rates in both vlPFC and cdlSNr showed an expanded dynamic range in firing for the amount and probability sets (Fig. 5C). In addition, objects with small and large certain rewards in probability sets tended (not significant) to have lower response compared to their counterparts in the amount sets.

The average response of all neurons recorded in amount or probability sets (but not necessarily with all set types) is also consistent with what is seen in the population that was recorded in all set types (fig. S7C). The graded coding is also observed among neurons with minority value coding in vlPFC or cdlSNr. However, because of the small number of such neurons (three in vlPFC and one in cdlSNr), we cannot draw conclusions about differences among value types (fig. S8).

The key difference between the amount and probability sets is the reward uncertainty for objects with 25, 50, and 75% reward in the probability set. Notably, if neural responses to objects with matching expected value are subtracted between probability and amount sets for individual neurons, then one ends up with a roughly bell-shaped curve similar to what is expected from uncertainty coding (Fig. 5D, difference from zero seen for objects with 25, 50, and 75% reward probability). Thus, coding of value uncertainty seems to be additively superimposed on responses of vlPFC and cdlSNr neurons to expected value.
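The design logic of the two set types can be made concrete with a short calculation. The sketch assumes relative reward sizes of 1 and 3 (the 3 to 1 ratio stated for Fig. 1B) and uses reward variance as the uncertainty measure, which is one common choice rather than a quantity defined in the text.

```python
import numpy as np

small, large = 1.0, 3.0                  # relative juice amounts (3 to 1, Fig. 1B)
p_large = np.linspace(0.0, 1.0, 5)       # 0, 0.25, 0.5, 0.75, 1

# Probability set: large reward with probability p, small reward otherwise
ev_probability = p_large * large + (1 - p_large) * small
variance = p_large * (1 - p_large) * (large - small) ** 2

# Amount set: certain rewards rising linearly from small to large
ev_amount = np.linspace(small, large, 5)
```

The expected values of the two sets coincide object by object, whereas the variance is zero at 0 and 100%, intermediate at 25 and 75%, and maximal at 50%, giving the bell-shaped profile that an uncertainty signal would be expected to follow.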

It has been proposed that uncertainty coding and risk seeking can be explained by a convex utility function (20). In both vlPFC and cdlSNr, the firing rate to the objects with increasing values looks convex. In this case, if firing to probability objects falls on a line connecting the firing to the lowest and highest value objects in the amount set, the uncertainty boost observed can be explained in terms of the convex firing rate. In cdlSNr, the response to probability objects goes beyond what can be expected from the convex response to amount objects (fig. S7B). However, in vlPFC, one cannot reject the null hypothesis that the probability responses simply arise from the convex responses to amount objects (fig. S7B). Thus, at least in cdlSNr, uncertainty coding beyond predictions from a convex utility function is observed. Such uncertainty coding can arise from probability distortions as suggested by prospect theory and as previously reported in monkeys engaged in choice between probabilistic rewards. However, unlike the previous reports, rather than seeing an inverted S-shaped distortion (21, 22) (overestimation of lower probabilities and underestimation of higher probabilities), we observed a boost to all uncertain probabilities.
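The null hypothesis in this argument can be illustrated numerically. The quadratic mapping below is purely hypothetical and stands in for any convex relation between value and firing rate; the reward sizes of 1 and 3 follow the 3 to 1 ratio stated for Fig. 1B.

```python
import numpy as np

def convex_rate(value):
    """Hypothetical convex value-to-firing-rate mapping (illustration only)."""
    return 5.0 + 2.0 * value ** 2

small, large = 1.0, 3.0
p_large = np.linspace(0.0, 1.0, 5)
expected_value = p_large * large + (1 - p_large) * small

# Amount objects: certain reward equal to the expected value
rate_amount = convex_rate(expected_value)
# Probability objects under the null: the rate is the probability-weighted mix
# of the rates for the two certain outcomes (the chord between the endpoints)
rate_chord = p_large * convex_rate(large) + (1 - p_large) * convex_rate(small)

boost = rate_chord - rate_amount   # nonnegative for any convex mapping
```

By Jensen's inequality the chord lies on or above a convex curve, so an apparent boost for uncertain objects arises without any dedicated uncertainty signal. Only responses that exceed the chord, as reported for cdlSNr, call for uncertainty coding beyond convexity.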

The boost in neuronal response to objects with probabilistic reward memory tempts one to predict an enhanced behavioral bias to the probability objects compared to amount objects with matching value. To test this prediction, behavioral gaze bias was examined in a free viewing task. Differential gaze bias during free viewing has been previously used as an index of memory strength (1, 23). In each trial, four objects from one object set were chosen randomly and were presented simultaneously on the screen (Fig. 5E). Amount and probability sets were used in different blocks of trials. Figure 5F shows two behavioral measures of gaze bias, namely, first saccade and object scanning rates averaged across blocks of free viewing with amount or probability sets [parts of these free viewing data were reported previously in (17) but are reanalyzed and presented to allow for comparison with neural data]. As expected, increased expected value resulted in increased gaze bias overall (Fig. 5F). The strength of gaze bias was boosted for probability objects. Subtraction of gaze bias for corresponding objects in probability and amount sets revealed a bell-shaped curve similar to what was observed neuronally, with the boost observed mainly for objects with 25, 50, and 75% reward probability (Fig. 5G). The lower response to certain objects (0 and 100%) in the probability set compared to the amount set is also seen behaviorally. The response boost to probabilistic objects may also explain the difficulty of 75% versus 100% choice trials (fig. S6). Overall, these results implicate both vlPFC and cdlSNr in value- and uncertainty-based behavioral biases.

Persistence of object value memories in the corticobasal circuitry

Different types of memories show various degrees of volatility. Overtrained value memory was previously shown to last for many months (1, 4, 24). To address the longevity of value memory in the corticobasal circuitry, persistence and stability of value memory in vlPFC versus cdlSNr across time were explicitly compared. Value memory for objects was tested hours, days, weeks, or months after last reward training in different neural recording sessions (Fig. 6A). Moreover, during these memory periods, the animal was actively learning object reward associations for a large number of other objects not used in this study. This arrangement allows one to examine the persistence of object reward memory representation in the corticobasal circuitry despite passage of time and despite interference from value learning with other objects. This stability of memory across time epochs was previously reported for vlPFC (5), but these data are now shown alongside cdlSNr to aid comparison between regions.

Fig. 6 Stability of value AUC in vlPFC and cdlSNr against time from last value association.

(A) Acute neural recordings were performed during passive viewing (blue block) to test value memory in objects previously trained with reward (black block). Reward memory was measured hours, days, weeks, or months after last reward association for a given object, during which monkeys were still trained with many other objects. (B) Population average of value AUCs for vlPFC and cdlSNr neurons with objects that were last trained hours, days, weeks, or months ago (left: main effect of region, F(1,484) = 11, P = 8 × 10^−4; main effect of time period and interaction, F(2,484) < 0.2, P > 0.6) and population average of value AUCs within the neurons that were recorded with at least two sets of objects, one with shorter and one with longer passage of time from last reward training (right: main effect of region, F(1,228) = 52, P = 6 × 10^−12; main effect of period and interaction, F(1,228) < 1.9, P > 0.17). The dashed lines are value AUC of surrogate data with two good and two bad object labels shuffled, which show no significant difference from chance in either region (P > 0.2) (= indicates main effect of region).

As can be seen in Fig. 6B, the average value AUC persisted across memory periods up to many weeks in both regions [for cdlSNr, there were not sufficient neurons recorded in the months period, but for the recorded neurons, we observed a high month-long AUC of 0.85, consistent with previous reports of months-long value memory in cdlSNr (4)]. This retention of value memory was also observed within neurons that were tested with at least two object sets, each with a different memory gap from its last reward learning. The change in value AUC was not significant in either region despite a slight decreasing trend (Fig. 6B). In both cases, the AUC of shuffled data in which labels of two good and two bad objects were switched remained nonsignificant and close to 0.5, indicating that the sustained AUCs are not spurious and arise directly from object value memories.

Stability of value memories can also be assessed by their resistance to extinction. Value extinction results from repeated exposure to reward-predicting cues without reward (25). Because the passive viewing task consisted of repeated presentation of objects without contingent reward, the stability of value signals against extinction in the corticobasal circuitry could be examined. Notably, both vlPFC and cdlSNr showed preserved value signal despite repeated exposure across trials (Fig. 7A and fig. S9). In vlPFC, the constant value signal was observed despite a reduction of overall firing to good and bad objects as a function of number of trials (repetition suppression). Such repetition suppression has been previously reported in various cortical areas (26). However, this effect was not observed in cdlSNr, which showed relatively constant firing to good and bad objects as well as a constant value signal across trials. In addition, repetition suppression predicts response recovery to a new set of stimuli. For vlPFC but not cdlSNr neurons that were recorded with more than one GB set during a session, one sees a strong and significant recovery at the time of set switch (Fig. 7B).

Fig. 7 Stability of value AUC in vlPFC and cdlSNr against trials and repetition suppression.

(A) Population average firing to good and bad objects (left axis) and their firing difference (right axis) across all neurons in vlPFC (top: main effect of type, F1,6423 = 65, P = 6 × 10−16; main effect of trial, F9,6423 = 2.8, P = 2 × 10−3; interaction, F9,6423 = 0.1, P = 0.9; main effect of difference, F9,3189 = 0.9, P = 0.5) and the same format for neurons in cdlSNr (bottom: main effect of type, F1,2152 = 115, P = 2 × 10−26; main effect of trial and interaction, F9,2152 < 0.4, P > 0.9; main effect of difference, F9,1068 = 0.5, P = 0.9). The presentations of a given object were not in consecutive trials and were interleaved with other object presentations during passive viewing but are plotted consecutively for each object (= and \ indicate main effect of GB and trial, respectively). (B) Population average firing in two consecutive blocks of passive viewing done with two different object sets showing repetition suppression and its recovery in vlPFC (top: main effect of trial, F9,2547 = 10, P = 2 × 10−9; last trial of first block versus first trial of next block, t141 = 3.6, P = 3 × 10−4; other pairwise tests shown, t141 > 2.4, P < 2 × 10−2) but not in cdlSNr [bottom: main effect of trial, F9,801 = 1.3, P = 0.18; last trial of first block versus first trial of next block, t44 = −0.45, P = 0.6; other pairwise tests shown, abs(t141) < 0.9, P > 0.4].

Proposed circuit model of vlPFC to cdlSNr connectivity

It is known that vlPFC and cdlSNr are indirectly connected with each other through the cortico-basal ganglia loops (11, 13). Cortical projection neurons provide excitatory input to the striatum. Striatal GABAergic projection neurons then project either directly (direct pathway) or indirectly (indirect pathway) via external globus pallidus (GPe) to SNr. Last, SNr sends inhibitory projections to thalamus, which sends excitatory input back to cortex, closing the cortico-basal-thalamo-cortical loop. In our case, vlPFC is known to project to CDt (11, 12) (a striatal subregion) and CDt is shown to target cvGPe (9). Both direct and indirect pathways of this circuitry reach cdlSNr, which projects back to vlPFC via medial thalamus (27). The anatomical connectivity and the strong correlation between the AUC values in these regions posit that the temporal pattern of neural responses to objects in one region may be predictable by responses in the other. If we were to predict the average responses of one region to good and bad objects by combining responses of recorded neurons in the other region (using linear regression with neuron numbers matched between regions, see Materials and Methods), then one can see a good mutual prediction of cdlSNr response by vlPFC neurons and vice versa (Fig. 8A). However, the neuronal weights (beta coefficients of regression model) in vlPFC but not cdlSNr showed a significant correlation with the neurons value signal. In vlPFC, there was a negative correlation between value AUC and weights. The average weight for Bp neurons was significantly positive, and for Gp neurons, it was significantly negative in vlPFC (Fig. 8B, bottom). No such relation was observed in cdlSNr (Fig. 8B, top). The same pattern of weights is observed if individual neuron responses in one region to a given GB set were regressed against responses of all neurons tested with the same set (neuron numbers matched between regions; fig. S10). 
Thus, while responses in either region can be predicted by the response of the other, the feedforward projections from vlPFC to cdlSNr but not vice versa seem to be organized with respect to the value preference of the neurons.

Fig. 8 Proposed corticobasal circuit model and mutual response predictions.

(A) Actual and predicted (pred) average PSTH in one region using linear combination of all neurons PSTHs in the other for vlPFC (top) and cdlSNr (bottom). Cross-validated R2 is shown. (B) Beta coefficients of neurons as a function of their value AUC (left column) and average beta coefficient for Bp and Gp neurons (right column) (right top: Bp versus Gp, t63 = −0.6, P = 0.5; Bp and Gp versus 0, t56 = −1.2 and t7 = 0.3, P > 0.2; and right bottom: Bp versus Gp, t64 = 4.3, P = 6 × 10−5; Bp versus 0, t12 = 3.3, P = 6 × 10−3; and Gp versus 0, t52 = −2, P = 4 × 10−2). (C) Same format as in Fig. 7A but for the predicted vlPFC firing (top: main effect of type, F1,6960 = 22, P = 2 × 10−6; main effect of trial and interaction, F < 0.51, P > 0.8; main effect of difference, F9,3480 = 1.4, P = 0.18) and predicted cdlSNr firing (bottom: main effect of type, F1,2300 = 77; P = 2 × 10−18; main effect of trial and interaction, F < 0.1, P > 0.9; main effect of difference, F9,1150 = 0.45, P = 0.9) using weights for fitting individual vlPFC and cdlSNr neurons (see fig. S10). (D) Beta coefficients to predict average PSTHs in one region based on only three regressors: Average PSTHs of Gp, NS, and Bp neurons in the other region (insets show the firing pattern of each regressor). (E) Actual and predicted vlPFC and cdlSNr responses from regression described in (D). (F) vlPFC projects to CDt that also receives dense projections from IT cortex. Responses from Gp vlPFC neurons may reach cdlSNr via direct pathway (negative weights), and responses from Bp vlPFC neurons may reach cdlSNr via indirect pathway (positive weights) on average. Both vlPFC and cdlSNr have direct projections to superior colliculus (SC) to control attention and gaze.

The observed regression beta coefficients are consistent with a model in which vlPFC Gp neurons connect to cdlSNr via the direct pathway (thus negative projection weights) and vlPFC Bp neurons connect to cdlSNr via the indirect pathway (thus positive projection weights; Fig. 8F). If this simple model is true, then using a single beta coefficient for averaged vlPFC Gp, Bp, and NS neuron responses should fit average cdlSNr responses well and should show the same pattern of beta coefficient depending on the neuron’s value preference. This is observed to be the case. On the other hand, predicting vlPFC responses based on averaged cdlSNr Gp, Bp, and NS neurons should not result in a good fit, which is also found to be the case (Fig. 8, D and E).

The circuit model proposed in Fig. 8F provides a mechanism for cancellation of repetition suppression going from the cortex to SNr. As can be seen in Fig. 7 and fig. S9, repetition suppression is seen in vlPFC across the population for preferred and nonpreferred values as well as separately in Gp and Bp neurons but not in cdlSNr (significant in vlPFC Gp and trending in vlPFC Bp). If vlPFC Gp and Bp neurons reach cdlSNr via direct and indirect pathways as suggested by the model proposed in Fig. 8F, then repetition suppressions will have opposite signs and can cancel while value signal will be strengthened through addition of value signal in Gp and Bp neurons. Predicted cdlSNr firing based on previously obtained beta coefficients from regression of individual cdlSNr neurons on vlPFC neurons (fig. S10) showed constant firing to good and bad objects as a function of trials (Fig. 8C, bottom). Note that this is not a trivial consequence of regression, as the regression coefficients were obtained for fitting the averaged responses of neurons and did not have access to the trial-by-trial changes in firing in vlPFC and cdlSNr. On the other hand, predicting the trial-by-trial responses in vlPFC by cdlSNr (Fig. 8C, top) did not successfully capture the repetition suppression observed in vlPFC, suggesting that the observed suppression in vlPFC is independent of the feedback from cdlSNr.
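The cancellation argument can be illustrated with a toy calculation. The sketch below is not fitted to any recorded data: the baseline rate, value signal, suppression slope, and the weights of −1 and +1 are hypothetical numbers chosen only to show the arithmetic. It assumes that repetition suppression is a common additive decrement in Gp and Bp neurons and that the two populations reach cdlSNr with weights of opposite sign.

```python
def toy_snr_prediction(n_trials=10, base=20.0, value=5.0, decay=0.5,
                       w_gp=-1.0, w_bp=1.0):
    """Weighted sum of hypothetical vlPFC Gp and Bp responses per trial.

    All parameters are illustrative assumptions, not measured values.
    Outputs are relative changes (no intercept), so they can be negative.
    """
    rows = []
    for trial in range(n_trials):
        s = decay * trial  # common repetition suppression, grows with trial
        gp = {"good": base + value - s, "bad": base - s}   # good-preferring
        bp = {"good": base - s, "bad": base + value - s}   # bad-preferring
        # Opposite-signed weights: the shared term (base - s) cancels,
        # while the value terms add up.
        rows.append({k: w_gp * gp[k] + w_bp * bp[k] for k in ("good", "bad")})
    return rows


if __name__ == "__main__":
    for trial, row in enumerate(toy_snr_prediction()):
        print(trial, row["good"], row["bad"])
```

With these numbers the predicted output is flat across trials, and the bad-minus-good difference is twice the value signal carried by either input population.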

Last, the general organization of RFs in cdlSNr compared to vlPFC was also examined (fig. S11). On average, the cdlSNr RFs are larger than vlPFC RFs and show more fragmentation (disconnected RF) compared to vlPFC (fig. S11C). The RF in both regions was mostly biased toward the contralateral hemifield; however, cdlSNr RF extends well into the ipsilateral hemifield. The source of ipsilateral RF may be from sparse contralateral cortical input to striatum (28, 29). The large size and fragmentation of cdlSNr RFs are also suggestive of additive combination of smaller and more localized cortical RFs that are relayed via striatal and pallidal pathways.

DISCUSSION

The corticobasal loop that includes cdlSNr and vlPFC was previously shown to encode value memories for a large number of objects (1, 4). The value memory can be expressed initially within a specific node in the loop and then propagate to other nodes (local coding) or can be represented in a distributed fashion across multiple nodes (distributed coding). If the code is local, then it can be stored in cortex and then relayed to basal ganglia or it can be stored in basal ganglia and then relayed to cortex via thalamus. In either scenario, delayed expression of value signal in the secondary node compared to the primary node should be observed. A recent study comparing the latency of value signals showed delayed expression of value memory signal in medial thalamus compared to cdlSNr by ~10 ms (14). To examine the value coding scheme in vlPFC and cdlSNr, neural recordings were performed in both regions using the same visual objects and within the same monkeys. In addition to the within-subject and within-stimulus design, minimization of behavioral variations across sessions using the passive viewing task provided favorable conditions to compare value signal dynamics and onsets across both regions. Unexpectedly, our results showed equivalent value signal dynamics and onsets in vlPFC and cdlSNr, which suggests that both nodes have comparable roles in value memory coding and expression (Fig. 2F). Despite this fact, value memory code in cortical and basal nodes was not independent. The value memory for individual objects was highly correlated across both regions (Fig. 3).

It may be argued that the observed correlations in value coding between vlPFC and cdlSNr merely reflect variations in visual features of fractals. However, there are multiple observations that argue against this possibility. First, both vlPFC and cdlSNr showed minimal object selectivity (fig. S5). Second, if the value memory correlation was mainly due to visual features rather than actual corticobasal connectivity, then one predicts that swapping neurons between subjects should minimally affect the correlation because the same objects were used for both monkeys. However, this shuffling removed correlation between regions (Fig. 3E). Thus, value coding is not based on visual features common to different subjects; instead, value coding varies across objects in correlated fashion between vlPFC and cdlSNr. Third, in many respects, the expression of value memory was similar in vlPFC and cdlSNr. Both regions incrementally changed responses for incremental changes in object value with equivalent resolution. Both regions were sensitive to value uncertainty and encoded expected value and uncertainty in an additive fashion (Figs. 4 and 5). Furthermore, the persistence of value memory in time and its stability against extinction were comparable in both regions (Figs. 6 and 7). These results suggest that value memory retention and expression are tightly linked across both regions.

Despite these similarities, the responses to objects in cortical and basal nodes had differences in certain aspects. The first obvious difference was the opposite polarity of dominant value signal (dominant Gp coding in vlPFC and Bp coding in cdlSNr) and second was the much stronger value signal in cdlSNr compared to vlPFC. There was also a visual repetition suppression that was observed in vlPFC but not in cdlSNr. All of these differences including the stronger value signal in cdlSNr were found to be accounted for by an ordered mixing of vlPFC response in which Bp and Gp neurons reach cdlSNr by positive and negative weights, respectively (Fig. 8, A, B, D, and E). Furthermore, the same ordered mixing also predicted the lack of visual repetition suppression in cdlSNr despite significant repetition suppression in vlPFC (Fig. 8C).

We note however that despite the intuitive appeal and consistency of the regression results with direct/indirect cortico-striatal pathways and the cross-validation success in predicting lack of repetition suppression in cdlSNr, they are only correlational in nature and can have serious limitations in proving the circuit model proposed in Fig. 8F. A strong test of the proposed connectivity pattern requires careful anatomical or causal techniques that can tag vlPFC neurons based on their value preference with respect to direct/indirect pathways, which is not achieved by the current study.

Our results show that vlPFC and cdlSNr are both sensitive to and encode memory of reward uncertainty in addition to memory of expected reward (Figs. 4 and 5 and figs. S7 and S8). In vlPFC, uncertainty resulted in stronger excitation, and in cdlSNr, it resulted in stronger inhibition in the population average. Notably, free viewing gaze bias showed a similar boost for objects with uncertain reward history. Thus, the memory of reward uncertainty enhances attention to objects. Given the anatomical and functional projection from vlPFC and cdlSNr to superior colliculus (Fig. 8F), it is very likely that the observed neuronal enhancements underlie the behavioral bias observed in free viewing (7, 11, 30, 31). Reward uncertainty is shown to be encoded in an additive fashion to expected reward or independent from it in various regions including anterodorsal septum, basal forebrain, anterior cingulate cortex, and caudate head (16, 32–34) in tasks with active reward anticipation. In these regions, the activity normally starts from the cue onset in a ramp-up or sustained fashion to the time of reward delivery. In contrast, the current result is the first demonstration of the co-coding of expected value and value uncertainty memories within the same neuron during passive viewing. This co-coding was especially evident in cdlSNr and could not be accounted for by the convex response curve to objects with increasing expected value (fig. S7B). In addition, a response range expansion for graded amount and probability objects was observed compared to GB sets, which had only two reward levels in both regions (Fig. 5C and fig. S7A). The range expansion can accentuate the differences between objects when there are multiple levels of rewards.

Note that the uncertainty boost observed in our data and especially in cdlSNr can arise from probability distortion as proposed by the prospect theory of choice. However, the probability distortion in primates during choice is often observed to follow an inverted s-shape with overestimation of low probabilities and underestimation of high probabilities (21, 22). On the other hand, our data do not show underestimation of high probabilities. A key source of such difference can be the fact that the neural responses here are recorded during passive viewing of objects presented one at a time while the s-shaped distortion was observed during active choice. It has previously been shown that such inconsistencies can arise from failure of procedure invariance (35) and framing effect (36). Consistent with the current study, when humans were involved in evaluating single bets (minimum selling price) rather than choice, overestimation of probabilities was observed even for high probabilities (37). It remains to be seen whether the responses of vlPFC and cdlSNr during choice trials show the inverted s-shape of probability distortion. Furthermore, the significant range expansion in amount and probability sets compared to GB sets and the trend for lower firing to certain objects (0 and 100%) in probability set compared to amount set are not easily explained by normative economic theories and call for further investigation and descriptive modelling.

vlPFC projection to cdlSNr is indirect and is via posterior striatum including CDt. CDt is one of the key nodes known to express object value memories (11). CDt receives the majority of its cortical input from inferotemporal (IT) cortex, from which it acquires detailed object information and selectivity (10, 12, 38). Despite the large RFs of IT neurons, CDt neurons show selective spatial responses as well (19). One possibility is that spatial information is relayed to CDt by vlPFC neurons (fig. S11). In such a case, a multiplicative combination of IT and vlPFC can result in concurrent spatial and object selectivity observed in CDt neurons. vlPFC can also provide value memory information via glutamatergic projections. Previous results show that value (possibly from vlPFC) and object selectivity (possibly from IT) are encoded in a multiplicative fashion in the CDt neurons (6). Another key source of value information to CDt is the recently discovered “sustain” dopaminergic neurons in caudo-dorsolateral substantia nigra compacta (cdlSNc) (39, 40). While the role of dopamine (DA) in value learning in basal ganglia is well established, it is not yet known how the value information from vlPFC and DA neurons interacts to drive CDt responses. PFC neurons are known to encode various kinds of information concurrently (41–45). Thus, one possibility is that vlPFC inputs provide rich contextual information that shapes value learning and memory in posterior basal ganglia.

In summary, our results are consistent with a distributed and collaborative coding of value memory in cortex and basal ganglia. In this coding scheme, vlPFC and cdlSNr both not only encode and express value memories but also communicate with each other via the corticobasal loop. The value memory in both regions can influence attention and gaze via the connections to superior colliculus. The relative influence of each region on gaze bias and value-driven attention requires further investigation within a comparative framework. In addition, it is still possible that value learning and storage are done via a third common input to both vlPFC and cdlSNr. Although such a scheme requires precise timing delays from the common input to both vlPFC and SNr to justify the similar value onsets observed, it cannot be completely ruled out without further investigation of all other candidate regions involved in object value memory [e.g., lateral intraparietal area also shows activation to value memory in a previous functional magnetic resonance imaging (fMRI) study (46)]. Furthermore, explicit comparison of value learning in both regions allows one to examine how this collaborative value memory emerges and unfolds across learning trials. Collectively, these studies may help to reveal the neural substrates of “object skill” (2) phenomena and value-based maladaptive memories observed in substance addiction. Last, while our current result shows comparable coding of value memory in vlPFC and cdlSNr, whether this similarity extends to other domains of ecological salience including novel or aversive stimuli (17) needs further investigation.

MATERIALS AND METHODS

Subjects and surgery

Two male adult rhesus monkeys (Macaca mulatta) were used in all tasks (monkeys B and R ages 7 and 10, respectively). All animal care and experimental procedures were approved by the National Eye Institute Animal Care and Use Committee and complied with the Public Health Service Policy on the humane care and use of laboratory animals. Both animals underwent surgery under general anesthesia during which a head holder and recording chambers were implanted on the head and scleral search coils for eye tracking were inserted in the eyes. The prefrontal recording chamber was tilted laterally and was placed over the left and right PFC for monkeys B and R, respectively (25° tilt for B and 35° tilt for R). The SNr chamber was tilted posteriorly and was centered on midline, allowing access to cdlSNr on both hemispheres (40° tilt in both monkeys). After confirming the position of the recording chamber using MRI, craniotomies over PFC were performed during a second surgery. The craniotomies for cdlSNr were performed on the same hemisphere as the PFC recording chamber (left and right cdlSNr for monkeys B and R, respectively). In monkey R, the craniotomy was extended to allow recordings on the left cdlSNr as well. The recording was done through grids placed over the chamber with 1-mm spacing.

Recording localization

Substantia nigra (SN) recording localization in both subjects was done using T1- and T2-weighted MRI (4.7T, Bruker). T2-weighted MRI is especially useful because T2 relaxation times are much shorter in the SNr area due to higher iron content (47). Recording chambers were equipped with a grid with 2-mm-hole spacing and were filled with gadolinium for better contrast (fig. S1, A and C). The location of SN in each monkey was further verified using the standard monkey atlas [D99 atlas (48)], which was brought into each monkey's native space using the National Institute of Mental Health Macaque Template (NMT) toolbox (49), and the projection of SN as reachable through the posterior recording chamber was visualized and confirmed to coincide with the recording locations (fig. S1, B and D). vlPFC recordings were localized and reconstructed using the same scans per monkey and using the D99 atlas [refer to figure S1 in (5)].

Stimuli

Visual stimuli with fractal geometry were used as objects (50). One fractal was composed of four point-symmetrical polygons that were overlaid around a common center such that smaller polygons were positioned more toward the front. The parameters that determined each polygon (size, edges, color, etc.) were chosen randomly. Fractal diameters were, on average, ~7° (ranging from 5° to 10°). Monkeys saw many fractals across three reward contingency groups (monkey B saw 96 in GB sets, 20 objects in amount set, and 20 objects in probability sets; monkey R saw 104 objects in GB sets, 30 objects in amount sets, and 20 objects in probability sets; Fig. 1B).

Task control and neural recording

All behavioral tasks and recordings were controlled by custom-written VC++-based software (Blip; www.robilis.com/blip/). Data acquisition and output control were performed using National Instruments PCIe-6353. During the experiment, head-fixed monkeys sat in a primate chair and viewed stimuli rear-projected on a screen in front of them (~30 cm) by an active-matrix liquid crystal display projector (PJ550, ViewSonic). Eye position was sampled at 1 kHz using scleral search coils. Diluted apple juice (33 and 66% for monkeys B and R, respectively) was used as reward.

Activity of single isolated neurons was recorded with acute penetrations of glass-coated tungsten electrodes (AlphaOmega, 250 μm total thickness). The dura was punctured with a sharpened stainless steel guide tube, and the electrode was inserted into the brain through the guide tube by an oil-driven micromanipulator (MO-972, Narishige) until neural background or multiunit activity was encountered for vlPFC recording sessions or until firing pattern characteristics of SNr were observed in cdlSNr recording sessions after passing through cortical and subcortical structures guided by MRI. The electric signal from the electrode was amplified and filtered (2 Hz to 10 kHz; BAK amplifier and preamplifiers) and was digitized at 1 kHz. Neural spikes were isolated online using voltage-time discrimination windows. Spike shapes were digitized at 40 kHz and recorded for 4.5 ms (average spike shape of 300 spikes per neuron). An attempt was made to record all well-isolated and visually responsive neurons (visual response to neutral familiar fractals using RF mapper or passive viewing tasks or to flashing white dots in various locations). The results reported are from a total of 158 and 191 vlPFC neurons in monkeys B and R, respectively, and 60 and 80 cdlSNr neurons in monkeys B and R, respectively. Parts of the PFC results were previously reported in (5).

Neural data analysis

Responses were time-locked to object onset for analysis in all tasks (passive viewing tasks and receptive field mapping). The analysis epoch was from 100 to 200 ms in RF mapping and 100 to 400 ms after object onset in passive viewing task. Average firing to good and bad objects and their difference were calculated during the analysis epoch. The discriminability based on learned values was measured from average firing during analysis epoch across trials using AUC. Wilcoxon rank sum test was used for AUC significance for individual neurons (Fig. 2B). The preferred value for each neuron (fig. S9, A and B) was determined using a cross-validation method by using odd and even trials to determine preference in even and odd trials, respectively.
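As a minimal sketch of the value AUC described above, the function below computes the area under the ROC curve directly from per-trial firing rates as the probability that a good-object trial exceeds a bad-object trial. The function name and the example rates are hypothetical, and the convention that AUC above 0.5 indicates a good-preferring neuron is an assumption made for illustration.

```python
def value_auc(bad_rates, good_rates):
    """Area under the ROC curve for discriminating good from bad objects.

    Equals the probability that a randomly chosen good-object trial has a
    higher firing rate than a randomly chosen bad-object trial, with ties
    counted as one half. A value of 0.5 means no value discrimination.
    """
    wins = 0.0
    for g in good_rates:
        for b in bad_rates:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    return wins / (len(good_rates) * len(bad_rates))


if __name__ == "__main__":
    # Hypothetical firing rates (spikes/s) in the analysis epoch.
    bad = [12, 15, 11, 14]
    good = [20, 18, 25, 13]
    print(value_auc(bad, good))
```

This pairwise form is equivalent to the rank sum statistic normalized by the number of good-bad trial pairs, which is why a rank sum test is a natural significance test for it.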

Onset detection procedure. Custom-written MATLAB functions were used to detect response onset (value onsets in each neuron; Fig. 2, D and E). Briefly, for each neuron, the average firing difference between good and bad objects was transformed to z scores using −200 to 30 ms after object onset as the baseline. First, the response peak after object onset was detected using MATLAB findpeaks with a minimum peak height of 1.64 corresponding to 95% confidence interval. The onset was determined as the first valley before this peak (valleys within baseline) using findpeaks on the inverted response.
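The onset procedure can be sketched as follows. This is not the authors' MATLAB code and does not reproduce findpeaks; it is a simplified Python stand-in that assumes the difference trace is oriented so that the value signal is positive, takes the first local maximum after object onset with z ≥ 1.64 as the peak, and takes the nearest local minimum before that peak as the onset. The function name and arguments are hypothetical.

```python
from statistics import mean, stdev


def detect_onset(times_ms, diff_rate, baseline=(-200, 30), min_peak_z=1.64):
    """Return the onset time (ms) of a value signal, or None if none is found.

    times_ms  : sample times relative to object onset
    diff_rate : good-minus-bad firing difference at each sample
    """
    base = [d for t, d in zip(times_ms, diff_rate)
            if baseline[0] <= t <= baseline[1]]
    mu, sd = mean(base), stdev(base)
    if sd == 0:
        return None
    z = [(d - mu) / sd for d in diff_rate]

    # First local maximum after object onset that exceeds the threshold.
    peak = None
    for i in range(1, len(z) - 1):
        if (times_ms[i] > 0 and z[i] >= min_peak_z
                and z[i] > z[i - 1] and z[i] >= z[i + 1]):
            peak = i
            break
    if peak is None:
        return None

    # Walk backward from the peak to the nearest local minimum (valley).
    for i in range(peak - 1, 0, -1):
        if z[i] < z[i - 1] and z[i] <= z[i + 1]:
            return times_ms[i]
    return None
```

On smoothed traces the nearest preceding valley is a reasonable proxy for the moment the response leaves baseline; on noisy traces some smoothing before peak detection would likely be needed.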

Regression procedure for response prediction of one region by the other. Regression results reported in Fig. 8 (A and B) are done by using all neurons recorded in one region to predict population average response to good and bad in the other region, according to the following formula (regression 1): $\bar{y}(t) = \sum_{j=1}^{n} \beta_j x_j(t) + \varepsilon$, where $\bar{y}(t) \equiv \frac{\sum_{i=1}^{m} y_i(t)}{m}$, $y_i(t) \equiv [\mathrm{psth\_bad}_i(t), \mathrm{psth\_good}_i(t)]$, and $x_j(t) \equiv [\mathrm{psth\_bad}_j(t), \mathrm{psth\_good}_j(t)]$.

In this case, n is the minimum number of neurons between two regions in a given monkey and is used to determine the number of neurons in the first region (regressors). m is the number of neurons used for making the average peristimulus time histogram (PSTH) in the second region [dependent variable y¯(t)]. t represents the peristimulus time. psth_badi and psth_goodi are average responses to bad and good objects, respectively, for neuron i (similar to examples shown in Fig. 2A). The region with the larger number of neurons was subsampled using value AUC histogram matching (see below). In this case, monkey B had 158 and 48 neurons in vlPFC and cdlSNr, respectively, so n = 48, and monkey R had 191 and 68 neurons in vlPFC and cdlSNr, respectively, so n = 68. The equal number of regressors ensures a fair comparison between vlPFC and cdlSNr. The fitting was first done with the even trials in both dependent and independent variables. The resulting betas were then applied to odd trials in independent variables to predict responses from odd trials in dependent variables. This process was then repeated by regressing over odd and cross-validating over even. The cross-validated results from both cycles were then combined as the final report of the fit. The beta coefficients were also combined over both cycles and reported in Fig. 8B (cross-validated R2 is reported in Fig. 8A).
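The even/odd cross-validation scheme for regression 1 can be sketched as below. This is an illustrative pure-Python implementation, not the authors' code: the function names are hypothetical, the least-squares solution is obtained from the normal equations (a linear algebra library would normally be used), and the way the two cycles are "combined" is assumed here to be a simple average of the betas and of the two R² values.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_betas(x_rows, y):
    """Least-squares betas without intercept: y(t) ~ sum_j beta_j * x_j(t).

    x_rows[j] is regressor neuron j's concatenated [bad, good] PSTH.
    """
    n = len(x_rows)
    gram = [[sum(p * q for p, q in zip(x_rows[i], x_rows[j]))
             for j in range(n)] for i in range(n)]
    xty = [sum(p * q for p, q in zip(x_rows[i], y)) for i in range(n)]
    return solve_linear(gram, xty)


def predict(x_rows, betas):
    return [sum(b * x[t] for b, x in zip(betas, x_rows))
            for t in range(len(x_rows[0]))]


def r_squared(y, y_hat):
    mu = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mu) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def split_half_cv(x_even, y_even, x_odd, y_odd):
    """Fit on one half of the trials, test on the other half, then swap."""
    b_even = fit_betas(x_even, y_even)
    b_odd = fit_betas(x_odd, y_odd)
    r2 = 0.5 * (r_squared(y_odd, predict(x_odd, b_even))
                + r_squared(y_even, predict(x_even, b_odd)))
    betas = [0.5 * (a + b) for a, b in zip(b_even, b_odd)]  # assumed average
    return betas, r2
```

Because the betas are always evaluated on trials that were not used to fit them, a high R² here cannot be produced by overfitting trial-specific noise in the regressor PSTHs.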

To create plots of population responses with trial-by-trial changes in firing, a separate regression was done by using all neurons in a region (independent region) that were recorded with a particular set as regressors to predict the response PSTH of individual neurons in the other region (dependent region) recorded with the same set. The regression was then repeated for all neurons in the dependent region individually. The beta coefficients of Gp and Bp neurons were averaged to make fig. S10. The predicted responses were also plotted on a trial-by-trial basis (Fig. 8C) for comparison with trial-by-trial change in actual firing in each region (Fig. 7A). Again, in this case, in each regression, subsampling was done using value AUC histogram matching to make sure that both regions have an equal number of regressors for a given set. Regression was done according to the following formula (regression 2): $y_i(t) = \sum_{j=1}^{n} \beta_{ij} x_j(t) + \varepsilon$

In this case, n is equal to the number of neurons in region x that were recorded with the same set as the neuron i in the region y [which has its PSTH marked by yi(t)]. To make results from both regions comparable, n is equal to the minimum number of neurons that were recorded in each region with a given set. For example, if SNr and vlPFC had 10 and 15 neurons recorded with set 103, then n = 10. In this case, 10 neurons with value AUC histogram matching were chosen for the regression (see the “Value AUC histogram matching” section). Thus, both vlPFC->SNr and SNr->vlPFC regressions had the exact same number of predictors.

Last, predicting average response in one region by average responses of Bp, NS, and Gp in the other (Fig. 8, D and E) was done according to the following formula (regression 3)

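$\bar{y}(t) = \sum_{k=1}^{3} \beta_k \bar{x}_k(t) + \varepsilon$, where $\bar{y}(t) \equiv \frac{\sum_{i=1}^{m} y_i(t)}{m}$ and $\bar{x}_k(t) \equiv \frac{\sum_{l=1}^{q_k} x_l(t)}{q_k}$, with $q_k$ being the number of neurons in each Bp, NS, and Gp category in each region.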

In this case, the number of predictors is three corresponding to three value categories Bp, NS, and Gp. All regressions were done within each monkey, and results were combined across the two monkeys.

Value AUC histogram matching. The histogram matching was done by sorting the vlPFC and cdlSNr regressor neurons with ascending value AUC. Then, the region with higher number of neurons was subsampled uniformly across the sorted population to have equal number of regressors as the other region.
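One way to read the subsampling step is sketched below: neurons of the larger population are sorted by value AUC, and evenly spaced positions along the sorted list are kept. The exact spacing rule is not specified in the text, so the index formula here (always including the lowest and highest AUC neurons) is an assumption, and the function name is hypothetical.

```python
def histogram_matched_subsample(aucs, n_keep):
    """Indices of n_keep neurons spread uniformly over the AUC-sorted population."""
    order = sorted(range(len(aucs)), key=lambda i: aucs[i])  # ascending AUC
    if n_keep >= len(order):
        return order
    if n_keep == 1:
        return [order[len(order) // 2]]
    step = (len(order) - 1) / (n_keep - 1)  # spacing along the sorted list
    return [order[int(k * step + 0.5)] for k in range(n_keep)]


if __name__ == "__main__":
    # Hypothetical value AUCs for five neurons; keep three of them.
    aucs = [0.9, 0.1, 0.5, 0.3, 0.7]
    kept = histogram_matched_subsample(aucs, 3)
    print(kept, [aucs[i] for i in kept])
```

Sampling evenly along the sorted population keeps the spread of value preferences in the subsample close to that of the full population, rather than biasing it toward strongly or weakly value-coding neurons.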

Quantifying object selectivity. Sparsity of responses to objects was determined for each neuron (at least eight objects per neuron, half good) using the following formula separately for good and bad objects: $S = (1 - A)/(1 - \frac{1}{n})$, where $A = \left(\sum_{i}^{n} r_i/n\right)^2 / \sum_{i}^{n} (r_i^2/n)$ (51). Sparsity as a measure of object selectivity is highly sensitive to additive shifts in firing, decreasing for neurons with higher basal firing. This explains the lower sparsity in cdlSNr compared to vlPFC neurons. To arrive at a measure that can capture the presence of object selectivity and that is not sensitive to additive shifts or multiplicative scaling, we devised a novel measure called response NU. This metric compares the measured responses to objects with a uniform distribution with a support from the smallest to the largest responses observed. The uniform distribution is the maximum entropy distribution for a variable with limited support. The idea is that if a neuron is not really object selective and its variation is driven only by noise, then it should show a uniform distribution in responding. On the other hand, deviations from uniformity and concentration of responses anywhere along the support show a degree of response preference to objects. The NU is defined as $\frac{2\int_{r_{\min}}^{r_{\max}} \left(\mathrm{CDF}(r) - \mathrm{CDF}(\mathrm{uni}_r)\right)\,dr}{r_{\max} - r_{\min}}$, where CDF(unir) is the cumulative density for the uniform distribution from rmin to rmax and CDF(r) is the cumulative density for the observed distribution. In the limit that the number of tested objects goes to infinity, the NU support is from 0 to 1. However, for a small number of objects, the NU support shrinks and has to be numerically calculated (fig. S5B, dashed lines).
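The sensitivity of sparsity to additive shifts can be checked with a short calculation. The sketch below implements only the sparsity measure S as described above, with A taken as the squared mean response divided by the mean squared response; the response values are hypothetical and the NU measure is not implemented here.

```python
def sparsity(rates):
    """Sparsity S = (1 - A) / (1 - 1/n), with A = mean(r)^2 / mean(r^2)."""
    n = len(rates)
    a = (sum(rates) / n) ** 2 / (sum(r * r for r in rates) / n)
    return (1 - a) / (1 - 1 / n)


if __name__ == "__main__":
    selective = [0, 0, 0, 8]        # responds to one object only
    shifted = [10, 10, 10, 18]      # same tuning plus 10 spikes/s baseline
    flat = [5, 5, 5, 5]             # no selectivity
    print(sparsity(selective), sparsity(shifted), sparsity(flat))
```

The second example has the same response differences between objects as the first but a much lower sparsity, which is the behavior expected for neurons with high basal firing such as those in SNr.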

Correlated value memory analysis. To compare value memory for all objects across both regions, absolute value AUC of one object versus AUC of objects from the opposite category in the same set was calculated in both regions and plotted as a single point in Fig. 3A. Because there are four good and four bad objects in a set, this resulted in eight AUC values for each set. However, the eight AUCs are not independent as we lose 2 degrees of freedom (df) for each set (equivalent to using mean good and mean bad responses). This df loss is considered for testing the significance of the correlation coefficient between the two regions. The Fisher z transform was used for significance of the correlation coefficients using the corrected df.
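A minimal sketch of a Fisher z test with a reduced effective sample size is given below. How the authors applied the df correction is not spelled out, so this assumes that the effective number of points is the number of AUC points minus 2 per set, and that the usual standard error 1/sqrt(N − 3) is applied to that effective N. The function name and arguments are hypothetical.

```python
import math


def fisher_z_pvalue(r, n_sets, points_per_set=8, df_lost_per_set=2):
    """Two-sided p value for a correlation r using the Fisher z transform.

    The effective sample size is reduced by df_lost_per_set for each set
    (an assumed reading of the df correction described in the text).
    """
    n_eff = n_sets * (points_per_set - df_lost_per_set)
    z = math.atanh(r) * math.sqrt(n_eff - 3)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


if __name__ == "__main__":
    # Hypothetical correlation of 0.5 across 10 object sets.
    print(fisher_z_pvalue(0.5, n_sets=10))
    print(fisher_z_pvalue(0.5, n_sets=10, df_lost_per_set=0))
```

Comparing the two calls shows that the correction makes the test more conservative: the same correlation yields a larger p value once the lost degrees of freedom are accounted for.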

Testing for variations of value onsets across sessions. To test whether value onsets in passive viewing task were different for neurons that were recorded within a session versus neurons from different sessions, SD of value onset for neurons recorded within a session for each monkey was calculated and averaged (observed SD). Then, a null distribution of SDs was built by 10,000 permutations in which recording dates of neurons were shuffled. The observed SD was checked against this null distribution to test for the significant effect of within session recording on response onset variations.
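The permutation test described above can be sketched as follows. The function names are hypothetical, and two details are assumptions: sessions with a single neuron are skipped when computing SDs, and the p value is one-sided, counting permutations whose mean within-session SD is at most the observed one (the direction expected if onsets cluster within a session).

```python
import random
from statistics import mean, stdev


def mean_within_session_sd(onsets, sessions):
    """Average, over sessions, of the SD of value onsets within each session."""
    groups = {}
    for onset, session in zip(onsets, sessions):
        groups.setdefault(session, []).append(onset)
    return mean(stdev(g) for g in groups.values() if len(g) >= 2)


def session_permutation_test(onsets, sessions, n_perm=10000, seed=0):
    """Shuffle session labels to build a null distribution of the mean SD."""
    rng = random.Random(seed)
    observed = mean_within_session_sd(onsets, sessions)
    labels = list(sessions)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if mean_within_session_sd(onsets, labels) <= observed + 1e-12:
            count += 1
    return observed, (count + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical onsets (ms) that cluster strongly by session.
    onsets = [100, 102, 101, 150, 151, 149, 200, 199, 201]
    sessions = [1, 1, 1, 2, 2, 2, 3, 3, 3]
    print(session_permutation_test(onsets, sessions, n_perm=2000))
```

A large p value from this test indicates that neurons recorded in the same session do not have more similar onsets than neurons drawn from different sessions.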

RF mapping task

In this task, the animal had to keep fixating a central white dot (2°) while fractal objects were shown in one of 33 locations spanning eight radial directions and eccentricities from 0° to 20° in 5° steps. Fractals covered the fixation dot when shown at the center. To measure visual responses unaffected by value and to reduce effects of object selectivity, multiple (at least eight) familiar objects not used in value training were used in this task. Objects were shown sequentially with a 400-ms on and 200-ms off schedule. The central fixation dot remained on between object presentations. Animals were rewarded for fixating after each object with probability 0.125, after which an intertrial interval (ITI) of 1 to 1.5 s with a black screen ensued. Locations were visited once in order along radial directions, once in order along the eccentricity circles, and once randomly, resulting in 99 object presentations in one block of mapping. For some neurons, more than one block of mapping was performed. While mapping was done for all neurons in this study, the data from the mapping task itself were saved and analyzed for 220 vlPFC and 29 cdlSNr neurons (fig. S11).
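The location count can be checked with a short sketch. It assumes, as the description implies but does not state outright, that the 33 locations are the center plus eight directions at 5°, 10°, 15°, and 20°. The function name and the choice of 0° as the first direction are hypothetical.

```python
import math

def rf_mapping_locations(n_directions=8, eccentricities=(5, 10, 15, 20)):
    """Return (x, y) positions in degrees: center plus a polar grid."""
    locations = [(0.0, 0.0)]  # central location (0 deg eccentricity)
    for ecc in eccentricities:
        for k in range(n_directions):
            theta = 2 * math.pi * k / n_directions  # 45 deg divisions
            locations.append((ecc * math.cos(theta), ecc * math.sin(theta)))
    return locations

locations = rf_mapping_locations()
# three passes per block: by direction, by eccentricity, and random
presentations_per_block = 3 * len(locations)
```

Under these assumptions the grid has 1 + 8 × 4 = 33 locations, and three passes give the 99 presentations per block.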

Value training: Saccade task

Each session of training was performed with one set of fractals. The GB sets consisted of eight fractals (four good/four bad fractals). Bad fractals were paired with low reward (0.07 and 0.11 ml in monkeys R and B, respectively), and good fractals were paired with high reward (0.21 and 0.35 ml for monkeys R and B, respectively). The juice amounts were customized for each monkey based on his water motivation to ensure satisfactory cooperation. The ratio of high to low juice amount was about 3 to 1 in both subjects. The amount sets consisted of five fractals with linearly increasing reward from low to high reward for each monkey. The probability sets consisted of five fractals associated probabilistically with low or high rewards but with a linearly increasing high reward probability. Note that in this case, objects 1 and 5 in amount and probability sets had the exact same reward size as good and bad objects in GB sets, respectively, and had no reward uncertainty. A trial started after central fixation on a white dot (2°), after which one object appeared on the screen at one of the five peripheral locations (10° to 15° eccentricity) or at the center. In some sessions, fractals were shown on eight radial directions (45° divisions). After an overlap period of 400 ms, the fixation dot disappeared and the animal was required to make a saccade to the fractal. After 500 ± 100 ms of fixating the fractal, a large or small reward was delivered. Diluted apple juice (33 to 66%) was used as reward. The displayed fractal was then turned off, followed by an ITI of 1 to 1.5 s with a black screen. Breaking fixation or making a premature saccade to the fractal during the overlap period resulted in an error tone (<7% of trials). A correct tone was played after a correct trial. Normally, a training session consisted of 80 trials with objects presented in pseudorandom order. Each object set was trained for at least 10 sessions before the test of long-term value memory.
To check the behavioral learning of object values, choice trials with two objects with different reward associations were included randomly in one of five trials (20% choice trials). The location and identity of fractals were randomized across choice trials. During the choice, the two fractals were shown in diametrically opposite locations, and the monkey was required to choose one by looking at a fractal and holding gaze on it for 500 ± 100 ms, after which both fractals were turned off and the corresponding reward was delivered. Only a single saccade was allowed in choice trials. Choice rate was >99% in GB sets, and for amount and probability sets, the average pairwise choice was 90 and 96%, respectively. For pairwise choice in amount and probability sets, see fig. S6.
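The reward structure of the amount and probability sets described above can be illustrated with a short sketch. It assumes five equally spaced steps (amounts from low to high, and high-reward probabilities of 0, 0.25, 0.5, 0.75, and 1), which follows from the linear increase and the certain end points but is not spelled out in the text. Uncertainty is expressed here as the SD of the reward, which is one possible choice. The function names are hypothetical.

```python
def amount_set(low, high, n=5):
    """Reward sizes (ml) increasing linearly from low to high."""
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]

def probability_set(low, high, n=5):
    """(p_high, expected value, reward SD) for each fractal, with the
    probability of the high reward increasing linearly from 0 to 1."""
    rows = []
    for i in range(n):
        p = i / (n - 1)
        expected = p * high + (1 - p) * low
        sd = (high - low) * (p * (1 - p)) ** 0.5  # two-outcome SD
        rows.append((p, expected, sd))
    return rows

# Monkey R: low = 0.07 ml, high = 0.21 ml (values from the text)
amounts_r = amount_set(0.07, 0.21)
probabilities_r = probability_set(0.07, 0.21)
```

Under these assumptions the two set types have matched expected values object by object, while only the probability set carries reward uncertainty, which peaks for the middle fractal and vanishes at the two ends.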

Value memory: Passive viewing task

A passive viewing trial started after central fixation on a white dot (2°). The animal was required to hold central fixation while objects from a given set were displayed randomly with a 400-ms on and 400-ms off schedule. The animal was rewarded for continued fixation after a random number of two to four objects were shown. Objects were shown close to the location with maximal visual response for each neuron, as determined by the RF mapping task. When this maximal location was close to the center (<5°), passive viewing was sometimes done by showing objects at the center. A block of passive viewing consisted of five to six presentations per object. In most cases, more than one block was acquired for a given set. For GB sets, passive viewing was done either on the same day (minutes to hours after) or 1 to 6 days, 1 to 4 weeks (7 to 29 days), or 1 to 4 months (30 to 142 days) after the last value training session to allow examination of the stability of memory across time. For amount/probability sets, most passive viewing sessions were done within days after the last value training session.

Behavioral memory: Free viewing task

Each free viewing session consisted of 15 trials with fractals from a given set. In any given trial, four fractals were randomly chosen from the set and shown at the four corners of an imaginary diamond or square around the center (15° from display center; Fig. 5E). Fractals were displayed for 3 s, during which the subjects could look at (or ignore) the displayed fractals. There was no behavioral outcome for free viewing behavior. After 3 s of viewing, the fractals disappeared. After a delay of 0.5 to 0.7 s, a white fixation dot appeared in one of nine random locations on the screen (center or eight radial directions). Monkeys were rewarded for fixating the fixation dot. This reward was not contingent on free viewing behavior. The next display onset with four fractals was preceded by an ITI of 1 to 1.5 s with a black screen. Behavioral data in this task are from four monkeys, B, R, D, and U, who did 12, 18, 6, and 7 sessions with probability sets, respectively, and 12, 12, 7, and 7 sessions with amount sets, respectively.

Free viewing analysis

Gaze locations were analyzed using custom-written MATLAB functions in an automated fashion, and saccades (displacements > 2.5°) versus stationary periods were separated in each trial (17, 24). Two behavioral measures of gaze bias that were previously shown to be affected by value memory were used: (i) percentage of first saccades to a given object following the display onset (chance level, 25% in our free viewing with four objects); (ii) object scanning rate: the rate of saccades within an object over the viewing duration of that object.
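The first of these measures can be sketched as follows. This is a simplified illustration in Python, not the original MATLAB analysis: it assumes that saccade detection has already produced, for each trial, the set of displayed objects and the object targeted by the first saccade, and the function name and data layout are hypothetical.

```python
from collections import Counter

def first_saccade_percentage(trials):
    """Percentage of trials in which the first saccade landed on each
    object, out of the trials in which that object was displayed.

    trials: iterable of (displayed_objects, first_target) pairs, where
    first_target is None if no saccade was made to any object."""
    shown = Counter()
    first = Counter()
    for displayed, target in trials:
        shown.update(displayed)
        if target is not None:
            first[target] += 1
    return {obj: 100.0 * first[obj] / shown[obj] for obj in shown}
```

Normalizing by the number of trials in which an object was displayed matters here because only four of the fractals in a set are shown on any given trial.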

Statistical tests and significance levels

One-way and two-way analyses of variance (ANOVAs) were used to test main effects and interactions for neural responses and for behavior. Error bars in all plots show SEM. All post hoc tests were done using Tukey’s post hoc test. Significance threshold for all tests in this study was P < 0.05. ns, not significant; *P < 0.05, **P < 0.01, and ***P < 0.001 (two-sided).

SUPPLEMENTARY MATERIALS

Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/7/20/eabe0693/DC1

https://creativecommons.org/licenses/by-nc/4.0/

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.

REFERENCES AND NOTES

Acknowledgments: We thank D. Parker, I. Bunea, G. Tansey, A.M. Nichols, and T.W. Ruffner for technical assistance in monkey care and electrophysiology and Hikosaka lab for helpful discussions. We also thank D. Yu, C. Zhu, and F. Ye for assistance with anatomical MRI scanning, which was carried out in the Neurophysiology Imaging Facility Core (National Institute of Mental Health, National Institute of Neurological Disorders and Stroke, and National Eye Institute). Funding: This research was supported by the Intramural Research Program at the NIH, National Eye Institute (grant no. EY000415-15). Author contributions: A.G. and O.H. designed the experiment. A.G. collected and analyzed the data. A.G. and O.H. wrote the paper. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors.
