Research Article | SOCIAL SCIENCES

The limits of human predictions of recidivism


Science Advances, 14 Feb 2020:
Vol. 6, no. 7, eaaz0652
DOI: 10.1126/sciadv.aaz0652

  • Fig. 1 Sample vignettes.

    The top and bottom panels provide examples of streamlined and enriched vignettes, respectively. Participants assessed the likelihood of re-arrest on a 30-point scale, as shown in each panel.

  • Fig. 2 Classification accuracy of human predictions, statistical models, and existing tools.

    Classification accuracy is shown for (i) human predictions, with and without immediate feedback; (ii) a logistic regression model that we trained using the same information provided to study participants; and (iii) the existing tools, COMPAS or LSI-R. In the feedback condition, only the last 10 responses from each participant were used, to account for learning effects. Error bars represent 95% confidence intervals; for the logistic regression models, they are typically smaller than the height of the red squares.
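As a rough sketch of the "last 10 responses" rule in the Fig. 2 caption, the Python below scores each participant's 10 most recent answers and pools them. It assumes each response has already been reduced to a yes/no prediction of re-arrest (the caption does not say how the 30-point ratings were dichotomized), uses a hypothetical tuple layout for the data, and swaps in a simple normal-approximation 95% interval that treats responses as independent; the study's own interval method may differ.

```python
import math
from collections import defaultdict


def last_k_accuracy(responses, k=10):
    """Pooled accuracy over each participant's last k responses, with a 95% CI.

    `responses` is a hypothetical list of (participant_id, position, predicted, actual)
    tuples, where `position` orders a participant's answers in time and
    `predicted`/`actual` are booleans (True = re-arrested).
    """
    by_participant = defaultdict(list)
    for pid, position, predicted, actual in responses:
        by_participant[pid].append((position, predicted == actual))

    # Keep only the k most recent responses per participant (learning-effect filter).
    hits = []
    for rows in by_participant.values():
        rows.sort(key=lambda row: row[0])
        hits.extend(hit for _, hit in rows[-k:])

    n = len(hits)
    if n == 0:
        raise ValueError("no responses to score")
    acc = sum(hits) / n
    # Normal-approximation interval; assumes responses are independent.
    half_width = 1.96 * math.sqrt(acc * (1 - acc) / n)
    return acc, (acc - half_width, acc + half_width)
```

Because this interval ignores the clustering of responses within participants, it will tend to be narrower than one computed by resampling participants.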

  • Fig. 3 Ranking accuracy of human predictions, statistical models, and existing tools.

    Ranking accuracy, as measured by AUC, is shown for (i) human predictions without feedback, (ii) logistic regression models that use the same information provided to study participants, and (iii) the existing LSI-R tools. Error bars indicate 95% confidence intervals.
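The AUC in Fig. 3 can be read as the probability that a randomly chosen person who was re-arrested receives a higher risk score than a randomly chosen person who was not. A minimal Python sketch of that pairwise definition follows; the function name is illustrative, and the quadratic loop is chosen for clarity rather than speed.

```python
def auc(scores, labels):
    """Pairwise (Mann-Whitney) AUC: the chance that a recidivist outscores a non-recidivist.

    `scores` are risk ratings (higher = riskier); `labels` are truthy for people
    who were re-arrested. Ties count as half a win. Runs in O(n_pos * n_neg).
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one re-arrested and one non-re-arrested case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Giving ties half credit matters for coarse inputs such as ratings on a 30-point scale, where many people share the same score.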

  • Fig. 4 An alternative measure of ranking accuracy.

    Proportion of recidivists identified when individuals are ranked by the risk assessments of humans in the no-feedback condition, a logistic regression model, and existing tools (COMPAS or LSI-R). For each value p on the horizontal axis, the vertical axis shows the proportion of all recidivists who are included among the p percent of the population deemed riskiest. Human performance was generally comparable to that of the algorithmic tools in the streamlined condition (top), but the algorithmic tools outperformed humans when more information was made available [enriched condition (bottom)].
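Each point on the curves in Fig. 4 is a recall-at-top-p value. The Python sketch below computes one such point; rounding the cutoff up to a whole person and breaking ties by input order are assumptions, not details given in the caption.

```python
import math


def recall_at_top(scores, labels, p):
    """Share of all recidivists found among the p percent of people rated riskiest.

    Assumptions: the cutoff is rounded up to a whole person, and ties in `scores`
    keep their input order (sorted() stays stable even with reverse=True).
    """
    total = sum(1 for y in labels if y)
    if total == 0:
        raise ValueError("no recidivists in the sample")
    n = len(scores)
    k = math.ceil(n * p / 100)  # size of the riskiest-p-percent group
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    found = sum(1 for i in ranked[:k] if labels[i])
    return found / total
```

Sweeping p from 0 to 100 and plotting the results traces out a curve of the kind shown in the figure.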

  • Table 1 Characteristics of the four datasets that we considered.

    BR, base rate.

    Dataset                            COMPAS balanced BR  COMPAS low BR  LSI-R balanced BR     LSI-R low BR
    Number of cases                    1000                1000           3111                  954
    BR of recidivism                   48%                 11%            29%                   9%
    Features                           Streamlined         Streamlined    Streamlined/enriched  Streamlined/enriched
    Number of responses (no feedback)  2700                2400           2850/2500             2850/2900
    Number of responses (feedback)     2400                3000           2700/2400             3000/2550

    For the LSI-R datasets, values of the form x/y refer to the streamlined/enriched conditions, respectively.
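The base rates in Table 1 matter when reading classification accuracy: a trivial rule that predicts no one will be re-arrested is correct for every non-recidivist, so its accuracy is 1 − BR. The short Python sketch below works this out for the four reported base rates. The baseline is an arithmetic illustration only, not a model evaluated in the study.

```python
# Base rates (BR) of recidivism as reported in Table 1.
BASE_RATES = {
    "COMPAS balanced BR": 0.48,
    "COMPAS low BR": 0.11,
    "LSI-R balanced BR": 0.29,
    "LSI-R low BR": 0.09,
}


def always_no_accuracy(base_rate):
    """Accuracy of the trivial rule 'predict that no one is re-arrested'."""
    return 1.0 - base_rate


for name, br in BASE_RATES.items():
    print(f"{name}: {always_no_accuracy(br):.0%}")
```

This gives 52%, 89%, 71%, and 91%, respectively, which is why raw accuracy on the low-BR datasets is best compared against such a baseline.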

Supplementary Materials

  • Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/6/7/eaaz0652/DC1

    Fig. S1. Ranking performance of human predictions, statistical models, and existing tools.

    Fig. S2. A comparison between the classification accuracy of humans and existing tools.

    Fig. S3. Average classification accuracy over time with feedback.

    Fig. S4. Calibration plot for human responses.

    Table S1. Relative classification accuracy of humans without feedback.

    Table S2. Relative classification accuracy of humans with feedback.

    Table S3. Relative classification accuracy of humans with and without feedback.

    Table S4. Relative ranking accuracy of humans without feedback.

    Table S5. Relative performance of humans and models in the streamlined and enriched conditions.

    Table S6. Relative recall of humans without feedback.
