Tables
- Table 1 Human versus COMPAS algorithmic predictions from 1000 defendants.
Overall accuracy is specified as percent correct, AUC-ROC, and criterion sensitivity (d′) and bias (β). See also Fig. 1.
| | (A) Human (no race) | (B) Human (race) | (C) COMPAS |
|---|---|---|---|
| Accuracy (overall) | 67.0% | 66.5% | 65.2% |
| AUC-ROC (overall) | 0.71 | 0.71 | 0.70 |
| d′/β (overall) | 0.86/1.02 | 0.83/1.03 | 0.77/1.08 |
| Accuracy (black) | 68.2% | 66.2% | 64.9% |
| Accuracy (white) | 67.6% | 67.6% | 65.7% |
| False positive (black) | 37.1% | 40.0% | 40.4% |
| False positive (white) | 27.2% | 26.2% | 25.4% |
| False negative (black) | 29.2% | 30.1% | 30.9% |
| False negative (white) | 40.3% | 42.1% | 47.9% |

- Table 2 Algorithmic predictions from 7214 defendants.
Logistic regression with 7 features (A) (LR7), logistic regression with 2 features (B) (LR2), a nonlinear SVM with 7 features (C) (NL-SVM), and the commercial COMPAS software with 137 features (D) (COMPAS). The results in columns (A), (B), and (C) correspond to the average testing accuracy over 1000 random 80%/20% training/testing splits. The values in the square brackets correspond to the 95% bootstrapped [columns (A), (B), and (C)] and binomial [column (D)] confidence intervals.
| | (A) LR7 | (B) LR2 | (C) NL-SVM | (D) COMPAS |
|---|---|---|---|---|
| Accuracy (overall) | 66.6% [64.4, 68.9] | 66.8% [64.3, 69.2] | 65.2% [63.0, 67.2] | 65.4% [64.3, 66.5] |
| Accuracy (black) | 66.7% [63.6, 69.6] | 66.7% [63.5, 69.2] | 64.3% [61.1, 67.7] | 63.8% [62.2, 65.4] |
| Accuracy (white) | 66.0% [62.6, 69.6] | 66.4% [62.6, 70.1] | 65.3% [61.4, 69.0] | 67.0% [65.1, 68.9] |
| False positive (black) | 42.9% [37.7, 48.0] | 45.6% [39.9, 51.1] | 31.6% [26.4, 36.7] | 44.8% [42.7, 46.9] |
| False positive (white) | 25.3% [20.1, 30.2] | 25.3% [20.6, 30.5] | 20.5% [16.1, 25.0] | 23.5% [20.7, 26.5] |
| False negative (black) | 24.2% [20.1, 28.2] | 21.6% [17.5, 25.9] | 39.6% [34.2, 45.0] | 28.0% [25.7, 30.3] |
| False negative (white) | 47.3% [40.8, 54.0] | 46.1% [40.0, 52.7] | 56.6% [50.3, 63.5] | 47.7% [45.2, 50.2] |
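
As an illustration of the binomial confidence interval reported for column (D), the following minimal sketch recomputes the interval for the overall COMPAS accuracy (65.4% of 7214 defendants). The caption does not say which binomial interval was used, so the normal (Wald) approximation here is an assumption, and `binomial_ci` is a hypothetical helper name rather than anything from the original analysis.

```python
import math


def binomial_ci(p_hat, n, z=1.959964):
    """Normal-approximation (Wald) interval for a proportion.

    Assumption: the caption does not specify the interval method;
    this approximation is used purely for illustration.
    z = 1.959964 is the two-sided 95% standard normal quantile.
    """
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Overall COMPAS accuracy and sample size, taken from Table 2, column (D).
low, high = binomial_ci(0.654, 7214)
print(f"{100 * low:.1f}% to {100 * high:.1f}%")  # prints "64.3% to 66.5%"
```

Under this assumption the result agrees with the bracketed values [64.3, 66.5] in the table. The subgroup intervals cannot be checked the same way from the table alone, because the per-group sample sizes are not listed here.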