Research Article | Research Methods

How to detect high-performing individuals and groups: Decision similarity predicts accuracy


Science Advances  20 Nov 2019:
Vol. 5, no. 11, eaaw9011
DOI: 10.1126/sciadv.aaw9011
  • Fig. 1 Analytical prediction: Decision similarity is tightly associated with decision accuracy in binary decision problems.

    (A) When compared with a benchmark individual with $p > 0.5$, the expected decision similarity $E(S_i)$ of individual $i$ increases with its accuracy level $p_i$; the different lines correspond to benchmark individuals with different levels of accuracy. (B) When comparing the decisions of individuals to the decisions of all other individuals in a pool of candidate decision makers, the expected average decision similarity of individuals is tightly correlated with individual accuracy, as long as the average accuracy of individuals in the pool is above 0.5 (more precisely, as long as the average accuracy of individuals remains above 0.5 after excluding every possible pair of individuals, see Eq. 4). The panel illustrates this for a pool of 10 decision makers with decision accuracies of 0.55, 0.60, 0.65, …, 1.0, respectively. The expected decision similarity is calculated using Eq. 3.
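The pool example in (B) can be checked numerically. The sketch below assumes that Eq. 3 (not reproduced in this section) takes the standard form for two independent binary decision makers, in which two individuals agree when both are correct or both are wrong, so that the expected agreement is $p_i p_j + (1 - p_i)(1 - p_j)$. The function names are illustrative and do not come from the article.

```python
def expected_similarity(p_i, p_j):
    """Expected agreement of two independent binary decision makers.

    Assumption: each is correct independently with probability p_i and p_j,
    so they agree when both are right or both are wrong.
    """
    return p_i * p_j + (1 - p_i) * (1 - p_j)


def average_expected_similarity(accuracies):
    """Average expected similarity of each individual to all others in the pool."""
    n = len(accuracies)
    return [
        sum(expected_similarity(p_i, p_j)
            for j, p_j in enumerate(accuracies) if j != i) / (n - 1)
        for i, p_i in enumerate(accuracies)
    ]


# Pool from panel (B): accuracies 0.55, 0.60, ..., 1.0
pool = [0.55 + 0.05 * k for k in range(10)]
similarities = average_expected_similarity(pool)
for p, s in zip(pool, similarities):
    print(f"accuracy={p:.2f}  expected average similarity={s:.3f}")
```

Under this assumption, the printed similarity values rise with accuracy across the whole pool, which is the pattern panel (B) describes.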

  • Fig. 2 Numerical simulations: The similarity-accuracy relationship is observed when similarity is calculated from a few samples and the decisions of different individuals are correlated with each other.

    (A) For the numerical simulations, we sampled decision makers from a wide range of populations of decision makers differing in their performance distribution (x axis, individual accuracy; y axis, probability density). We created those by systematically varying the two shape parameters α (values on top) and β (values on the right) of the beta distribution. Dashed vertical lines indicate the chance level of raters (i.e., accuracy of 0.5). (B) Average correlation coefficient between decision similarity and accuracy for 10 raters making 10, 25, and 100 decisions (subpanel columns) and for different degrees of correlations (0, 0.5, and 1; subpanel rows) between the decisions of different individuals (see Materials and Methods). Within each subpanel, the tiles correspond to raters drawn from the population (i.e., accuracy distribution) of the associated α-β combination in (A). Tiles below (above) the diagonal correspond to populations with an average individual accuracy above (below) 0.5; increasingly red (blue) colors indicate increasingly positive (negative) correlations. All results are averages over 2500 random samples. Whenever average individual accuracy is above 0.5, we find a positive correlation between similarity and accuracy. While this correlation can be observed even in the most extreme scenarios with maximum correlation between the decisions of different decision makers (bottom row), generally, the strength of this correlation increases as the correlation between the decisions of decision makers decreases.
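A minimal sketch of one tile in panel (B) follows, covering only the case of uncorrelated decisions (top row). It is not the authors' R code: the procedure for inducing correlated decisions is described in Materials and Methods, which is not reproduced here, so it is left out. The function names, the default of 200 samples (rather than 2500), and the use of the sampled rather than the realized accuracy are assumptions made for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient; returns NaN if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def simulate_once(rng, alpha, beta, n_raters=10, n_decisions=25):
    """Similarity-accuracy correlation for one random sample of raters."""
    # Draw each rater's accuracy from a beta(alpha, beta) population.
    accuracies = [rng.betavariate(alpha, beta) for _ in range(n_raters)]
    # correct[i][k] is True when rater i decides case k correctly. In a binary
    # task, two raters agree exactly when both are right or both are wrong.
    correct = [[rng.random() < p for _ in range(n_decisions)] for p in accuracies]
    similarity = []
    for i in range(n_raters):
        agree = sum(
            sum(a == b for a, b in zip(correct[i], correct[j]))
            for j in range(n_raters) if j != i
        )
        similarity.append(agree / ((n_raters - 1) * n_decisions))
    return pearson(similarity, accuracies)


def mean_correlation(alpha, beta, n_samples=200, seed=1):
    """Average correlation over repeated samples, skipping undefined (NaN) ones."""
    rng = random.Random(seed)
    values = [simulate_once(rng, alpha, beta) for _ in range(n_samples)]
    values = [v for v in values if v == v]  # NaN is the only value not equal to itself
    return sum(values) / len(values)


print("population mean accuracy 0.8:", mean_correlation(8, 2))
print("population mean accuracy 0.2:", mean_correlation(2, 8))
```

The two calls contrast a population whose mean accuracy is above 0.5 with one below 0.5, matching the tiles below and above the diagonal.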

  • Fig. 3 Decision similarity tightly correlates with decision accuracy in breast and skin cancer diagnostics, geopolitical forecasting, and a general knowledge task.

    (A to D) In all four datasets, we find, as predicted (Figs. 1 and 2), a positive relationship between individuals’ average decision similarity (i.e., average percentage of agreement with others) and accuracy. In (A) and (B), accuracy is expressed as balanced accuracy, and in (C) and (D), as proportion correct (see Materials and Methods). Lines are robust linear regression lines.
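The two quantities plotted here can be sketched as follows. The article's own definitions are in Materials and Methods, which is not reproduced in this section, so the code assumes the standard definition of balanced accuracy (the mean of sensitivity and specificity) and treats decision similarity as the proportion of cases on which two raters give the same decision. Labels are assumed to be coded as 1 (positive) and 0 (negative).

```python
def balanced_accuracy(decisions, truth):
    """Mean of sensitivity and specificity for binary decisions coded 0/1."""
    positives = [d for d, t in zip(decisions, truth) if t == 1]
    negatives = [d for d, t in zip(decisions, truth) if t == 0]
    if not positives or not negatives:
        raise ValueError("balanced accuracy needs both positive and negative cases")
    sensitivity = sum(d == 1 for d in positives) / len(positives)
    specificity = sum(d == 0 for d in negatives) / len(negatives)
    return (sensitivity + specificity) / 2


def proportion_agreement(decisions_a, decisions_b):
    """Proportion of cases on which two raters make the same decision."""
    if len(decisions_a) != len(decisions_b):
        raise ValueError("both raters must judge the same cases")
    return sum(a == b for a, b in zip(decisions_a, decisions_b)) / len(decisions_a)
```

A rater who always answers "positive" scores a balanced accuracy of 0.5 regardless of prevalence, which is why this measure suits the diagnostic datasets, where positive and negative cases are unlikely to be equally common.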

  • Fig. 4 Decision similarity robustly permits the identification of high-performing individuals.

    (A to D) The average performance of individuals in a test set selected on the basis of their decision similarity in a training set, for different decision similarity thresholds (e.g., the top 25% corresponds to the 25% of raters with the highest decision similarity in the training set) and different numbers of training images (i.e., number of decisions used to calculate decision similarity). As can be seen, in all four datasets, selecting individuals based on decision similarity substantially increases the average performance in the pool of decision makers. As predicted, when increasing the size of the training set and/or applying a stricter selection criterion, the average accuracy of the selected individuals increases. In (A) and (B), accuracy is expressed as balanced accuracy, and in (C) and (D), as proportion correct.
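The selection procedure can be sketched as below: similarity is computed on a training set without using the ground truth, the most similar raters are selected, and their accuracy is then evaluated on a separate test set. The data are synthetic and generated only for illustration (40 hypothetical raters, 200 cases, accuracies drawn uniformly between 0.55 and 0.95); none of the numbers come from the article, and accuracy is measured as proportion correct.

```python
import random


def average_similarity(decisions, cases):
    """Average proportion of agreement of each rater with all others on `cases`."""
    n = len(decisions)
    sims = []
    for i in range(n):
        agree = sum(
            sum(decisions[i][k] == decisions[j][k] for k in cases)
            for j in range(n) if j != i
        )
        sims.append(agree / ((n - 1) * len(cases)))
    return sims


def select_top(sims, fraction):
    """Indices of the raters with the highest similarity (top `fraction`)."""
    n_keep = max(1, round(fraction * len(sims)))
    ranked = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
    return ranked[:n_keep]


def accuracy(decisions, truth, raters, cases):
    """Proportion of correct decisions of `raters` on `cases`."""
    hits = sum(decisions[i][k] == truth[k] for i in raters for k in cases)
    return hits / (len(raters) * len(cases))


# Synthetic data (assumption, for illustration only).
rng = random.Random(0)
truth = [rng.randint(0, 1) for _ in range(200)]
true_accuracy = [rng.uniform(0.55, 0.95) for _ in range(40)]
decisions = [
    [t if rng.random() < p else 1 - t for t in truth] for p in true_accuracy
]

train, test = range(0, 50), range(50, 200)
sims = average_similarity(decisions, train)      # ground truth is not used here
selected = select_top(sims, 0.25)                # top 25% by decision similarity
acc_selected = accuracy(decisions, truth, selected, test)
acc_all = accuracy(decisions, truth, range(len(decisions)), test)
print(f"selected raters: {acc_selected:.3f}   all raters: {acc_all:.3f}")
```

Varying the `fraction` argument or the size of `train` corresponds to the thresholds and training-set sizes compared in the panels.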

  • Fig. 5 The decision similarity within a group is tightly associated with that group’s collective accuracy under the majority vote.

    (A) Illustrative example of the relationship between the expected average decision similarity, the average individual accuracy of group members (open dots), and the expected performance of the majority rule (filled dots), for eight groups of three identical decision makers with accuracies of 0.60, 0.65, …, 0.95, respectively. Decision similarity is calculated using Eq. 3, and the expected accuracy of the majority rule is calculated from the binomial distribution as $p^3 + 3p^2(1 - p)$. (B to E) As predicted, in all datasets, we find a strong positive correlation between the average decision similarity among group members and their collective performance under the majority rule (filled dots and solid robust regression lines). This pattern is driven by a strong positive relationship between the average decision similarity among group members and the average individual performance of group members (open dots, dashed robust regression lines). In (B) and (C), accuracy is expressed as balanced accuracy, and in (D) and (E), as proportion correct.
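The illustrative example in (A) can be reproduced with a few lines. The majority-rule accuracy is the binomial probability that at least two of three independent members are correct, which reduces to the expression given in the caption. For the similarity of identical members, the sketch assumes that Eq. 3 (not reproduced here) has the independent-agreement form $p^2 + (1 - p)^2$; function names are illustrative.

```python
from math import comb


def majority_accuracy(p, n=3):
    """Probability that a majority of n independent members, each correct with
    probability p, reaches the correct decision (n must be odd)."""
    if n % 2 == 0:
        raise ValueError("n must be odd so that a strict majority always exists")
    k_min = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


def group_similarity(p):
    """Expected agreement between two identical, independent members
    (assumed form of Eq. 3)."""
    return p * p + (1 - p) * (1 - p)


# Eight groups of three identical decision makers: p = 0.60, 0.65, ..., 0.95
for p in [0.60 + 0.05 * k for k in range(8)]:
    print(f"p={p:.2f}  similarity={group_similarity(p):.3f}  "
          f"majority={majority_accuracy(p):.3f}")
```

For accuracies above 0.5, both the expected similarity and the majority-rule accuracy increase with $p$, and the majority outperforms the individual members.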

  • Fig. 6 Decision similarity permits identification of high-performing groups of individuals.

    (A to D) The average collective performance of groups in a test set, using the majority rule, when groups of individuals are selected on the basis of decision similarity in a training set, for different similarity thresholds (e.g., the top 25% corresponds to groups containing individuals with the 25% highest decision similarity values) and different numbers of training images (i.e., number of decisions used to calculate decision similarity). As can be seen, in all datasets, selecting groups of individuals based on decision similarity substantially increases the average collective performance. As predicted, the performance of the selected groups of individuals in the test set increases with the number of training images as well as with a stricter threshold value. The purple line (“Average”) refers to the average majority rule performance of all groups in the test set. In (A) and (B), accuracy is expressed as balanced accuracy, and in (C) and (D), as proportion correct.

Supplementary Materials

  • Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/5/11/eaaw9011/DC1

    Supplementary text

    Fig. S1. Distribution of individuals’ level of accuracy and correlated decisions in the four datasets.

    Fig. S2. High-performing individuals are similar to each other, while low-performing individuals tend to make dissimilar decisions.

    Fig. S3. Decision similarity performs well for cases in which the majority decided correctly but breaks down for cases in which the minority decided correctly.

    Fig. S4. The similarity-accuracy relationship is also present when using the continuous probability forecasts.

    Fig. S5. Decision similarity permits identification of low-performing individuals.

    Fig. S6. Decision similarity permits identification of high-performing (and low-performing) individuals in small groups.

    Fig. S7. The relationship between decision similarity of a group of nine individuals and their individual and collective accuracy.

    Fig. S8. Decision similarity permits identification of low-performing groups.

    Fig. S9. In each of the four datasets, the average decision similarity to others tightly correlates with the decision similarity to the majority judgment.

    Fig. S10. Decision similarity to the majority tightly correlates with decision accuracy in breast and skin cancer diagnostics, geopolitical forecasting, and a general knowledge task.

    Skin cancer data set

    R Code numerical simulations (Fig. 2B)

