Fig. 2 Classification accuracy of human predictions, statistical models, and existing tools. Classification accuracy is shown for (i) human predictions, with and without immediate feedback; (ii) a logistic regression model that we trained using the same information provided to study participants; and (iii) the existing tools, COMPAS or LSI-R. For participants in the feedback condition, only the last 10 responses for each participant were used, to account for the effects of learning. Error bars represent 95% confidence intervals and are typically smaller than the height of red squares for the logistic regression models.
Fig. 3 Ranking accuracy of human predictions, statistical models, and existing tools. Ranking accuracy, as measured by AUC, is shown for (i) human predictions without feedback, (ii) logistic regression models that use the same information provided to study participants, and (iii) the existing LSI-R tools. Error bars indicate 95% confidence intervals.
Fig. 4 An alternative measure of ranking accuracy. Proportion of recidivists identified when ranking by the risk assessments of humans in the no-feedback condition, a logistic regression model, and existing tools (COMPAS or LSI-R). For each value p on the horizontal axis, the vertical axis shows the proportion of all recidivists who are included among the p percent of the population deemed riskiest. Human performance was generally comparable to that of algorithmic tools in the streamlined condition (top), but algorithmic tools outperformed humans when more information was made available (enriched condition, bottom).
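As an illustration of the quantity plotted in Fig. 4, the following is a minimal sketch of how recall among the riskiest p percent could be computed. It is not the authors' code: the function name `recall_at_top_percent`, the convention of rounding the cutoff down, and the tie-breaking by original order are all assumptions made for this example.

```python
def recall_at_top_percent(risk_scores, recidivated, p):
    """Share of all recidivists found among the p percent ranked riskiest.

    risk_scores: one risk assessment per person (higher means riskier).
    recidivated: 1 if the person recidivated, 0 otherwise.
    p: percentage of the population to flag, between 0 and 100.
    """
    if len(risk_scores) != len(recidivated):
        raise ValueError("risk_scores and recidivated must have equal length")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")

    n = len(risk_scores)
    total_recidivists = sum(recidivated)
    if n == 0 or total_recidivists == 0:
        return 0.0

    # Assumption: the cutoff is rounded down to a whole number of people.
    k = int(n * p / 100)

    # Rank people from riskiest to least risky. Python's sort is stable,
    # so ties keep their original order (an arbitrary choice made here).
    order = sorted(range(n), key=lambda i: risk_scores[i], reverse=True)

    found = sum(recidivated[i] for i in order[:k])
    return found / total_recidivists
```

Evaluating this function over a grid of p values from 0 to 100 traces a curve of the kind described in the caption; a ranking that places every recidivist ahead of every non-recidivist reaches 1.0 as soon as p covers the base rate.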
Table 1 Characteristics of the four datasets that we considered. BR, base rate.

| | COMPAS balanced BR | COMPAS low BR | LSI-R balanced BR | LSI-R low BR |
| --- | --- | --- | --- | --- |
| Number of cases | 1000 | 1000 | 311 | 1954 |
| BR of recidivism | 48% | 11% | 29% | 9% |
| Features | Streamlined | Streamlined | Streamlined/enriched | Streamlined/enriched |
| Number of responses (no feedback) | 2700 | 2400 | 2850/2500 | 2850/2900 |
| Number of responses (feedback) | 2400 | 3000 | 2700/2400 | 3000/2550 |
Supplementary Materials
Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/6/7/eaaz0652/DC1
Fig. S1. Ranking performance of human predictions, statistical models, and existing tools.
Fig. S2. A comparison between the classification accuracy of humans and existing tools.
Fig. S3. Average classification accuracy over time with feedback.
Fig. S4. Calibration plot for human responses.
Table S1. Relative classification accuracy of humans without feedback.
Table S2. Relative classification accuracy of humans with feedback.
Table S3. Relative classification accuracy of humans with and without feedback.
Table S4. Relative ranking accuracy of humans without feedback.
Table S5. Relative performance of humans and models in the streamlined and enriched conditions.
Table S6. Relative recall of humans without feedback.