Research Article | PSYCHOLOGY

Reward associations do not explain transitive inference performance in monkeys

Science Advances  31 Jul 2019:
Vol. 5, no. 7, eaaw2089
DOI: 10.1126/sciadv.aaw2089
  • Fig. 1 Experimental and analytic procedure.

    (A) Subjects learned the ordering of seven-item lists, consisting of images (A, B, C, etc.). The correct item is always the item that occurs earlier in the list (e.g., B is correct for the pair BC). (B) Trial structure. After touching a start stimulus, subjects see two images. Touching the correct stimulus yields rewards of varying magnitude. Incorrect responses yield no reward and a brief time-out period. (C) Stimuli were presented in pairs. Training sessions presented only adjacent pairs, outlined in blue. Testing sessions presented all pairs. Reward amounts depended on the rank of the correct stimulus. The “reverse gradient” delivered one drop of water for correct responses to A, two drops for correct responses to B, etc. This gradient is labeled “reverse” because the overall expected value of F exceeds that of E, although choosing F when the EF pair is presented results in no reward. Thus, expected value cannot be used to guide which choice is correct. The “concordant gradient” delivered six drops for correct choices of A, five for B, etc. Therefore, the stimulus with the higher expected value is also the correct choice. (D) Bayesian model for estimating stimulus position from observed response accuracy. Subjects are presumed to make use of a linear representation with uncertain stimulus positions. We assume that this representation takes the form of a normal distribution with some mean and SD for each stimulus. To infer these parameters, we estimated p(correct) for each pair and transformed this to the area above zero of a z distribution. Inferring the parameters in our representation is then done as a simultaneous estimation problem, implemented using Stan. Stimuli adapted from images in the public domain.
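
The expected-value argument in panel (C) can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' code. It assumes adjacent-pair training in which every pair is shown equally often, and it defines a stimulus's expected value as the mean payoff from always choosing it whenever it appears. The names `reward_for_correct` and `expected_values` are hypothetical.

```python
ITEMS = "ABCDEFG"  # seven-item list; A is the first (always correct) item


def reward_for_correct(rank, gradient):
    """Drops of water for a correct response to the item at 0-based rank."""
    if gradient == "reverse":
        return rank + 1      # A -> 1 drop, B -> 2 drops, ..., F -> 6 drops
    if gradient == "concordant":
        return 6 - rank      # A -> 6 drops, B -> 5 drops, ..., F -> 1 drop
    raise ValueError(f"unknown gradient: {gradient}")


def expected_values(gradient):
    """Mean payoff per item over the adjacent pairs that contain it.

    Assumption (not from the source): the item is always chosen when shown,
    and each adjacent pair is presented equally often.
    """
    values = {}
    for rank, item in enumerate(ITEMS):
        payoffs = []
        if rank > 0:
            payoffs.append(0)  # paired with its earlier neighbor: incorrect
        if rank < len(ITEMS) - 1:
            payoffs.append(reward_for_correct(rank, gradient))  # later neighbor
        values[item] = sum(payoffs) / len(payoffs)
    return values


reverse = expected_values("reverse")
concordant = expected_values("concordant")
print(reverse)
print(concordant)
```

Under these assumptions F has a higher expected value than E in the reverse gradient even though F is the wrong choice in the EF pair, whereas in the concordant gradient the expected values fall monotonically from A to G.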

  • Fig. 2 Subjects retained the same list of stimuli for each set of training and testing sessions and then were presented with a new list at the start of the next training session.

    Vertical dashed lines indicate a break lasting 1 year. (A) Population estimates of response accuracy (black) for each session. Chance is indicated by the dotted lines. Red circles correspond to the exploratory Q-learner, fit to the observed data. Blue circles correspond to the exploitative Q-learner, based on a previous study (12). (B) Peaks of inferred position distributions of each stimulus in subjects’ representations. Red, A; orange, B; green, C; cyan, D; blue, E; violet, F; and black, G. Subjects reconstructed the stimulus order in the concordant gradient condition, and did so approximately in the reverse gradient condition, with the exception of stimulus A. (C) Average number of the 21 stimulus pairs that fell in the correct order, based on the model estimates in panel (B). The red dotted line indicates how many pairs would be ordered correctly if subjects used expected values as the basis for ordering. (D) Strength of evidence that the positions in the inferred representation follow a strictly linear ordering (red) rather than the expected values (blue), according to a Bayesian information criterion (BIC) analysis. Subjects tended to be equivocal, or to favor the expected value, during the first few sessions of training. However, late in training, and throughout the testing phase, the inferred representations more closely resembled a linear ordering of stimuli with uniform spacing. (E to H) Same as (A) to (D), respectively, but based on pooling data across the six lists.
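
Panels (B) and (C) rest on the position model described in Fig. 1D. As a rough sketch, not the authors' Stan model, the code below treats each stimulus position as an independent normal distribution, computes p(correct) for a pair as the area above zero of the difference distribution, and counts how many of the 21 pairs fall in the correct order. The independence assumption, the convention that earlier items sit at lower positions, and all numeric values are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist


def p_correct(mu_early, sd_early, mu_late, sd_late):
    """Probability that the later item's position exceeds the earlier item's.

    Assumes independent normal positions, so the difference is also normal.
    """
    diff = NormalDist(mu_late - mu_early, sqrt(sd_early ** 2 + sd_late ** 2))
    return 1.0 - diff.cdf(0.0)  # area above zero


def count_ordered_pairs(means):
    """Number of item pairs whose position means respect the list order."""
    return sum(
        1 for i, j in combinations(range(len(means)), 2) if means[i] < means[j]
    )


# Hypothetical position means for A..G (not values from the paper).
linear = [0, 1, 2, 3, 4, 5, 6]
a_displaced = [5.5, 1, 2, 3, 4, 5, 6]  # stimulus A placed near the end

print(p_correct(0.0, 1.0, 1.0, 1.0))
print(count_ordered_pairs(linear), count_ordered_pairs(a_displaced))
```

With a perfectly linear ordering all 21 pairs are in order, while misplacing a single stimulus such as A removes every pair in which it now sits on the wrong side.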

  • Fig. 3 Results of session-by-session binomial regression estimation of response accuracy.

    Shaded intervals represent the 95% credible interval for the population estimate. (A) Overall estimated response accuracy for each session (black circles), as well as subject-level estimates of performance (white diamonds). (B) Estimated regression parameters for the effects of symbolic distance (blue) and reward magnitude (red), plotted on the logit scale. Distance effects could only be estimated for testing phases, as all stimulus pairs during training had the same symbolic distance.
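
To read the logit-scale values in panel (B), it can help to convert a linear predictor back to a probability. The following is a minimal sketch assuming a linear predictor with an intercept plus symbolic distance and reward magnitude terms, with both predictors centered so that the intercept reflects overall accuracy. The coefficient values and the function names are hypothetical.

```python
import math


def inv_logit(x):
    """Map a value on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def predicted_accuracy(intercept, b_distance, b_reward, distance=0.0, reward=0.0):
    """Accuracy implied by a linear predictor on the logit scale.

    distance and reward are assumed to be centered predictors.
    """
    return inv_logit(intercept + b_distance * distance + b_reward * reward)


# Hypothetical coefficients, chosen only to illustrate the conversion.
print(predicted_accuracy(0.0, 0.0, 0.0))                  # logit 0 is chance
print(predicted_accuracy(1.0, 0.3, -0.1, distance=2.0))   # larger distance
```

A logit of 0 corresponds to an accuracy of 0.5, so positive distance or reward coefficients indicate accuracy rising with that predictor.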

  • Fig. 4 Parameter estimates for binomial regression of response accuracy, pooled across the six lists learned in a given phase, plotted on the logit scale.

    Boxes represent the 80% credible interval for the population estimate, while whiskers represent the 95% credible interval. (A) Population estimates of the session intercepts, representing overall response accuracy (i.e., a value of 0.0 denotes chance performance). (B) Population estimates of the effect of symbolic distance. (C) Population estimates of the effect of reward magnitude. (D to F) Same as (A) to (C), respectively, but subject-level parameters estimated for each of the four subjects. (G to I) Posterior differences between the population-level parameters of the concordant gradient and reverse gradient conditions. (J to L) Posterior differences between the subject-level parameters.
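
The boxes, whiskers, and posterior differences can be summarized directly from posterior draws. The sketch below assumes equal-tailed (quantile) intervals and paired draws for the two conditions; the source does not state which interval type was used, and the simulated draws are hypothetical stand-ins for sampler output.

```python
import random


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of pre-sorted values, 0 <= q <= 1."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def credible_interval(draws, mass):
    """Equal-tailed interval containing `mass` of the posterior draws."""
    ordered = sorted(draws)
    tail = (1.0 - mass) / 2.0
    return quantile(ordered, tail), quantile(ordered, 1.0 - tail)


# Hypothetical posterior draws for one parameter in each condition.
rng = random.Random(0)
concordant = [rng.gauss(0.8, 0.2) for _ in range(4000)]
reverse = [rng.gauss(0.5, 0.2) for _ in range(4000)]

# Posterior difference, computed draw by draw.
difference = [c - r for c, r in zip(concordant, reverse)]
box = credible_interval(difference, 0.80)       # box extent
whiskers = credible_interval(difference, 0.95)  # whisker extent
print(box, whiskers)
```

If the 95% interval of the difference excludes zero, the two conditions differ credibly on that parameter under this summary.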

  • Fig. 5 Response accuracy for all pairs presented during the last session of training and the first session of testing, based on the parameters reported in Fig. 4.

    Circles represent the raw empirical estimates, independent of the model fit. Boxes are centered at the mean population accuracy, with their upper and lower extent representing the 80% credible interval for the population estimate. Whiskers represent the 95% credible interval. Pairs are first organized by symbolic distance (red, distance 1; blue, distance 2; etc.) and then alphabetically (AB, BC, CD, etc.).
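
The pair ordering used on the horizontal axis can be reproduced as follows. This is a small illustrative sketch; the function name `ordered_pairs` is hypothetical.

```python
from itertools import combinations

ITEMS = "ABCDEFG"


def ordered_pairs(items=ITEMS):
    """All item pairs, sorted by symbolic distance and then alphabetically."""
    pairs = [a + b for a, b in combinations(items, 2)]
    return sorted(
        pairs,
        key=lambda p: (items.index(p[1]) - items.index(p[0]), p),
    )


pairs = ordered_pairs()
print(pairs)
```

The first six entries are the adjacent (distance 1) training pairs, and the final entry is the single distance 6 pair, AG.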

  • Fig. 6 Estimated response accuracy for critical pairs at the first trial of testing.

    Boxes are centered at the mean population accuracy, with their upper and lower extent representing the 80% credible interval for the population estimate. Whiskers represent the 95% credible interval. (A) Population estimates of critical pair accuracy in the reverse gradient condition based on a binomial regression using only the first presentation of each pair. In addition, mean accuracy across all critical pairs, with corresponding uncertainty, is plotted in black. (B) Population estimates in the reverse gradient condition based on logistic regressions, extrapolating performance at the intercept (trial zero). Each pair was fit separately in this analysis, and the mean of the critical pairs was pooled across those estimates. (C) Population estimates of trial zero performance in the reverse gradient condition based on a logistic regression that incorporated data from all six pairs in both conditions, fitting both distance and reward effects. (D to F) Same as (A) to (C), respectively, but associated with the concordant gradient condition.
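
The idea of extrapolating to trial zero, as in panel (B), can be illustrated with an ordinary maximum-likelihood logistic regression of accuracy on trial number. This is a simplified stand-in for the Bayesian analysis in the paper: it fits one pair, has no population or subject levels, and uses made-up outcomes. The function `fit_logistic` and the data are hypothetical.

```python
import math


def inv_logit(x):
    """Map a value on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def fit_logistic(trials, outcomes, iterations=25):
    """Fit logit(p) = b0 + b1 * trial by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for t, y in zip(trials, outcomes):
            p = inv_logit(b0 + b1 * t)
            w = p * (1.0 - p)
            g0 += y - p            # gradient with respect to b0
            g1 += (y - p) * t      # gradient with respect to b1
            h00 += w               # entries of the 2x2 information matrix
            h01 += w * t
            h11 += w * t * t
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical outcomes (1 = correct) for one pair on testing trials 1..12.
trials = list(range(1, 13))
outcomes = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(trials, outcomes)
trial_zero_accuracy = inv_logit(b0)  # fitted accuracy extrapolated to trial 0
print(b0, b1, trial_zero_accuracy)
```

The intercept `b0` is the fitted log-odds at trial zero, so `inv_logit(b0)` estimates accuracy before any testing feedback has been received for that pair.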
