Research Article | RESEARCH METHODS

The accuracy, fairness, and limits of predicting recidivism

Science Advances  17 Jan 2018:
Vol. 4, no. 1, eaao5580
DOI: 10.1126/sciadv.aao5580

Tables

  • Table 1. Human versus COMPAS algorithmic predictions from 1000 defendants.

    Overall accuracy is specified as percent correct, AUC-ROC, and criterion sensitivity (d′) and bias (β). See also Fig. 1.

                              (A) Human (no race)   (B) Human (race)   (C) COMPAS
    Accuracy (overall)        67.0%                 66.5%              65.2%
    AUC-ROC (overall)         0.71                  0.71               0.70
    d′/β (overall)            0.86/1.02             0.83/1.03          0.77/1.08
    Accuracy (black)          68.2%                 66.2%              64.9%
    Accuracy (white)          67.6%                 67.6%              65.7%
    False positive (black)    37.1%                 40.0%              40.4%
    False positive (white)    27.2%                 26.2%              25.4%
    False negative (black)    29.2%                 30.1%              30.9%
    False negative (white)    40.3%                 42.1%              47.9%
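
The measures listed in Table 1 can all be derived from a confusion matrix. The sketch below is illustrative only and is not taken from the article: it assumes the standard equal-variance signal-detection formulas, d′ = z(H) − z(FA) and β = exp((z(FA)² − z(H)²)/2), where H is the hit rate and FA the false-alarm rate. The function names and the example counts are hypothetical.

```python
import math
from statistics import NormalDist


def rates(tp, fp, tn, fn):
    """Return (accuracy, false positive rate, false negative rate).

    Here "positive" means "predicted to recidivate", so a false positive
    is a defendant predicted to recidivate who did not.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    false_positive_rate = fp / (fp + tn)
    false_negative_rate = fn / (fn + tp)
    return accuracy, false_positive_rate, false_negative_rate


def dprime_beta(hit_rate, false_alarm_rate):
    """Sensitivity d' and bias beta under the equal-variance Gaussian model.

    Assumption: this is the textbook formulation; the article's exact
    computation is not shown in this section.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit, z_fa = z(hit_rate), z(false_alarm_rate)
    d_prime = z_hit - z_fa
    beta = math.exp((z_fa ** 2 - z_hit ** 2) / 2)
    return d_prime, beta


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the functions.
    acc, fpr, fnr = rates(tp=40, fp=20, tn=30, fn=10)
    d_prime, beta = dprime_beta(hit_rate=1 - fnr, false_alarm_rate=fpr)
    print(acc, fpr, fnr, d_prime, beta)
```

Under this formulation, a β of 1 indicates no bias toward either response, and values above 1 indicate a more conservative criterion for predicting recidivism.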
  • Table 2. Algorithmic predictions from 7214 defendants.

    Logistic regression with 7 features (A) (LR7), logistic regression with 2 features (B) (LR2), a nonlinear SVM with 7 features (C) (NL-SVM), and the commercial COMPAS software with 137 features (D) (COMPAS). The results in columns (A), (B), and (C) correspond to the average testing accuracy over 1000 random 80%/20% training/testing splits. The values in the square brackets correspond to the 95% bootstrapped [columns (A), (B), and (C)] and binomial [column (D)] confidence intervals.

                              (A) LR7              (B) LR2              (C) NL-SVM           (D) COMPAS
    Accuracy (overall)        66.6% [64.4, 68.9]   66.8% [64.3, 69.2]   65.2% [63.0, 67.2]   65.4% [64.3, 66.5]
    Accuracy (black)          66.7% [63.6, 69.6]   66.7% [63.5, 69.2]   64.3% [61.1, 67.7]   63.8% [62.2, 65.4]
    Accuracy (white)          66.0% [62.6, 69.6]   66.4% [62.6, 70.1]   65.3% [61.4, 69.0]   67.0% [65.1, 68.9]
    False positive (black)    42.9% [37.7, 48.0]   45.6% [39.9, 51.1]   31.6% [26.4, 36.7]   44.8% [42.7, 46.9]
    False positive (white)    25.3% [20.1, 30.2]   25.3% [20.6, 30.5]   20.5% [16.1, 25.0]   23.5% [20.7, 26.5]
    False negative (black)    24.2% [20.1, 28.2]   21.6% [17.5, 25.9]   39.6% [34.2, 45.0]   28.0% [25.7, 30.3]
    False negative (white)    47.3% [40.8, 54.0]   46.1% [40.0, 52.7]   56.6% [50.3, 63.5]   47.7% [45.2, 50.2]
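
The two kinds of confidence interval named in the Table 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the caption does not say which binomial interval was used, so the normal-approximation (Wald) interval here is one common choice, and the percentile interval is one common way to summarize accuracies collected over repeated resampled splits. Both function names are hypothetical.

```python
import math
from statistics import NormalDist, quantiles


def wald_interval(p_hat, n, level=0.95):
    """Normal-approximation (Wald) interval for a binomial proportion.

    Assumption: the caption does not specify the binomial interval used.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def percentile_interval(values):
    """95% percentile interval over resampled statistics.

    `values` would hold one accuracy per random train/test split;
    the 2.5th and 97.5th percentiles bound the interval.
    """
    cuts = quantiles(values, n=40, method="inclusive")  # cut points at 2.5% steps
    return cuts[0], cuts[-1]


if __name__ == "__main__":
    # Overall COMPAS accuracy and sample size as reported in Table 2.
    low, high = wald_interval(0.654, 7214)
    print(f"{100 * low:.1f}% to {100 * high:.1f}%")
```

With the overall COMPAS accuracy of 65.4% and n = 7214, this sketch gives roughly 64.3% to 66.5%, which agrees with the bracketed interval in column (D). The subgroup sizes are not stated here, so the other rows cannot be checked the same way.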