Research Article | SOCIAL SCIENCES

Cross-disciplinary evolution of the genomics revolution

Science Advances  15 Aug 2018:
Vol. 4, no. 8, eaat4211
DOI: 10.1126/sciadv.aat4211
  • Fig. 1 Construction of the F network.

    Schematic network illustrating our method for classifying the faculty Fi, their pollinator coauthors, and the links between them. The network corresponds to the table on the right. Two types of links connect the faculty nodes: a direct link if two faculty are coauthors of at least one publication together, and a mediated link if at least one pollinator has coauthored separately with each of the two faculty, thereby mediating a triadic closure between the two F. We classified each Fi according to her/his main discipline, biology or computing, unless she/he has collaborated with at least one faculty member from the other discipline, in which case the cross-disciplinary (XD) classification supersedes the original disciplinary classification. We classified the non-F coauthors as bridge pollinators if they coauthored with two or more faculty; otherwise, they are classified as leaf pollinators. Among the bridge pollinators, we classified those who coauthor with faculty from both biology and computing as cross-pollinators. Thus, the solid link connecting A-B represents a direct cross-disciplinary link, the dashed link connecting C-A represents a mediated cross-disciplinary link, and pollinators 7 and 8 are cross-pollinators because they have collaborated with faculty from each discipline. N/A, not applicable.
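The classification rules in the Fig. 1 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the article's implementation: the toy publications below are hypothetical (they are not the table from Fig. 1), a pollinator is taken to mediate a link whenever it has coauthored with both faculty, and a faculty member is relabeled XD based on direct coauthorship only.

```python
from itertools import combinations

# Hypothetical toy data (NOT the table from Fig. 1): faculty are letters,
# non-faculty coauthors (pollinators) are digit strings.
home_discipline = {"A": "BIO", "B": "CS", "C": "CS", "D": "BIO"}
publications = [
    {"A", "B", "1"},  # A and B coauthor directly
    {"A", "7"},       # pollinator 7 works with A ...
    {"C", "7"},       # ... and separately with C
    {"D", "2"},       # pollinator 2 works only with D
    {"A", "D", "3"},
]

def build_links(pubs, faculty):
    """Return (direct links, mediated links, pollinator -> faculty coauthors)."""
    direct = set()
    coauthors_of = {}
    for authors in pubs:
        fac = sorted(a for a in authors if a in faculty)
        direct.update(combinations(fac, 2))
        for p in authors - set(fac):
            coauthors_of.setdefault(p, set()).update(fac)
    mediated = set()
    for fac in coauthors_of.values():
        # Assumption: any pollinator shared by two faculty mediates a link.
        mediated.update(combinations(sorted(fac), 2))
    return direct, mediated, coauthors_of

def classify_faculty(direct, home):
    """Relabel faculty as XD if they share a direct link across disciplines."""
    label = dict(home)
    for i, j in direct:
        if home[i] != home[j]:
            label[i] = label[j] = "XD"
    return label

def classify_pollinators(coauthors_of, home):
    """Leaf: fewer than 2 faculty coauthors; bridge: 2 or more; cross: both disciplines."""
    kinds = {}
    for p, fac in coauthors_of.items():
        if len(fac) < 2:
            kinds[p] = "leaf"
        elif len({home[f] for f in fac}) > 1:
            kinds[p] = "cross"
        else:
            kinds[p] = "bridge"
    return kinds

direct, mediated, coauthors_of = build_links(publications, set(home_discipline))
faculty_label = classify_faculty(direct, home_discipline)
pollinator_kind = classify_pollinators(coauthors_of, home_discipline)
```

In this toy network, A and B become XD through their direct link, C is tied to A only through pollinator 7, and pollinator 3 is a bridge (but not a cross-pollinator) because both of its faculty coauthors are in biology.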

  • Fig. 2 Growth of cross-disciplinary social capital.

    (A) Evolution of the giant component in the U.S. biology-computing network. Green and magenta nodes represent faculty Fi with biology and computing affiliation, respectively; black nodes represent faculty Fi who, by time t, had published at least one cross-disciplinary publication and joined the cross-disciplinary (XD) group; node size is proportional to the logarithm of the degree centrality of Fi at time t. (B) Evolution of the fraction of collaboration links in the F network that are cross-disciplinary. We calculated f⋅,XD(t) using either direct links between faculty (blue line) or association links mediated by pollinators (red line). For comparison, the black line shows the evolution of cross-disciplinary links in the human genomics literature per Web of Science (WoS); these values are divided by two to facilitate trend comparison. The orange area marks the HGP period.
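As a rough illustration of the quantity plotted in panel (B), the sketch below computes the fraction of links that are cross-disciplinary up to each year. It assumes that links accumulate over time and that a link counts as cross-disciplinary when it joins faculty from different home disciplines; the caption does not spell out either choice, and the link list is hypothetical.

```python
def xd_fraction_by_year(links, home, years):
    """links: iterable of (i, j, first_year). Returns {year: fraction or None}."""
    out = {}
    for t in years:
        # Assumption: a link stays in the network once it has appeared.
        active = [(i, j) for i, j, y in links if y <= t]
        if not active:
            out[t] = None  # no links yet, fraction undefined
            continue
        xd = sum(home[i] != home[j] for i, j in active)
        out[t] = xd / len(active)
    return out

# Hypothetical toy data for illustration only.
home = {"A": "BIO", "B": "CS", "C": "CS", "D": "BIO"}
links = [("A", "D", 1995), ("B", "C", 1998), ("A", "B", 2001), ("C", "D", 2004)]
f_xd = xd_fraction_by_year(links, home, [1990, 1995, 2000, 2005])
```

The same function can be applied to a list of direct links or to a list of mediated links to obtain the two curves separately.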

  • Fig. 3 Descriptive statistics for the career data set.

    Vertical lines indicate distribution means for the corresponding subsets. (A) Probability distribution of the year of first publication of Fi. (B) Probability distribution of Ki, the total number of collaborators for a given Fi. (C) Probability distribution of χi, the fraction of the collaborators of Fi who are cross-disciplinary. (D) Probability distribution of the PageRank centrality of Fi, scaled by the number of faculty F so that the mean value of this scaled quantity across all Fi, independent of the discipline subset, is 1. (E) Probability distribution of the mean impact factor of the publication record of Fi. (F) Probability distribution of the total citations log10 Ci of Fi.
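The scaling in panel (D) works because PageRank values sum to 1 over all nodes, so multiplying each value by the number of nodes gives a mean of exactly 1. A minimal pure-Python sketch follows; the damping factor 0.85, the undirected toy graph, and the plain power iteration are assumptions for illustration and are not taken from the article.

```python
def pagerank(adj, damping=0.85, iters=100):
    """Power-iteration PageRank; assumes every node has at least one neighbor."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            share = damping * rank[v] / len(adj[v])
            for w in adj[v]:
                new[w] += share
        rank = new
    return rank

# Hypothetical undirected collaboration graph (each edge listed both ways).
adj = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "B"],
    "D": ["A"],
}
rank = pagerank(adj)
n_faculty = len(adj)
scaled = {v: n_faculty * r for v, r in rank.items()}  # mean of scaled values is 1
mean_scaled = sum(scaled.values()) / n_faculty
```

With this scaling, a value above 1 marks a node that is more central than average, regardless of how large the network is.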

  • Fig. 4 Career cross-sectional regression model.

    OLS parameter estimates for the linear regression model in Eq. 1. The coefficients for the relevant covariates are shown split into two categories, according to whether the information could be found in the researcher’s CV or only by analyzing her/his collaboration network. To facilitate comparison of the relative strength of the parameter estimates, the standardized beta coefficients are shown, representing the change in the dependent variable ln Ci that corresponds to a 1-SD shift in a given covariate. See table S2 for the complete list of parameter estimates. The levels of statistical significance are as follows: ***P ≤ 0.001.
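To make the "1-SD shift" reading concrete, the sketch below fits a one-covariate OLS slope and rescales it by the covariate's standard deviation. It is a simplified illustration with hypothetical numbers: Eq. 1 contains many covariates, and only the covariate (not ln Ci) is standardized here, following the wording of the caption.

```python
from statistics import mean, pstdev

def slope(x, y):
    """OLS slope of y on a single covariate x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical data: one covariate and the dependent variable ln C.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
ln_c = [2.1, 2.9, 4.2, 4.8, 6.0]

b = slope(x, ln_c)              # change in ln C per unit of x
effect_per_sd = b * pstdev(x)   # change in ln C per 1-SD shift in x

# Equivalent route: standardize x first, then refit.
z = [(a - mean(x)) / pstdev(x) for a in x]
refit = slope(z, ln_c)
```

Both routes give the same number, which is why coefficients expressed this way can be compared across covariates measured in different units.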

  • Fig. 5 Career panel regression model.

    (A and B) Parameter estimates for the three principal explanatory variables included in the fixed effects F career model defined in Eq. 2; see table S4 for the complete list of parameter estimates. (C and D) Robustness check of the panel regression model. To test whether spurious correlations could produce the significant estimates for the cross-disciplinary variables in the panel model (table S4), we ran this model using a randomized cross-disciplinary indicator variable, implemented by shuffling just that variable across the observations without replacement. (C) For n = 1000 shuffled data sets, we do not observe any (0%) coefficient estimates as large as the empirical value βI = 0.145 corresponding to the dashed vertical blue line [solid vertical blue lines indicate 95% confidence interval (CI); see table S4, third column cluster]. (D) We repeated the same shuffling method for the panel model applied to only the 1247 Fi classified with cross-disciplinary (XD) orientation, and again, we do not observe any (0%) coefficient estimates as large as the empirical value βI reported in table S5 (third column cluster). The levels of statistical significance are as follows: **P ≤ 0.01, ***P ≤ 0.001.
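The shuffling check in panels (C) and (D) can be sketched as a permutation test. The example below is a deliberately reduced stand-in: instead of the fixed effects panel model of Eq. 2, it uses the OLS coefficient on a single 0/1 indicator (which equals the difference in group means), and the data are hypothetical. Only the shuffling logic, permuting the indicator without replacement and counting how often the shuffled coefficient reaches the empirical one, mirrors the caption.

```python
import random
from statistics import mean

def indicator_coef(y, ind):
    """OLS slope of y on a 0/1 indicator, i.e., the difference in group means."""
    ones = [v for v, d in zip(y, ind) if d == 1]
    zeros = [v for v, d in zip(y, ind) if d == 0]
    return mean(ones) - mean(zeros)

def shuffle_test(y, ind, n_shuffles=1000, seed=0):
    """Return (empirical coefficient, fraction of shuffles at least as large)."""
    rng = random.Random(seed)
    observed = indicator_coef(y, ind)
    shuffled = list(ind)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # permutation: sampling without replacement
        if indicator_coef(y, shuffled) >= observed:
            hits += 1
    return observed, hits / n_shuffles

# Hypothetical data: 20 observations with the indicator set, 20 without.
ind = [1] * 20 + [0] * 20
y = [1.0 + 0.01 * k for k in range(20)] + [0.01 * k for k in range(20)]
observed, frac_as_large = shuffle_test(y, ind)
```

Because shuffling keeps the number of ones and zeros fixed, any association that survives in the shuffled data reflects chance alone, which is what makes the empirical coefficient comparable against this distribution.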

  • Fig. 6 The knowledge transfer story behind the numbers.

    Interactions of the HGP scholars with other faculty in the F network during the 2000s, and some of the landmark publications they produced, powering the genomics revolution. Scholar nodes are labeled with name initials. In the left panels, one can recognize some well-known HGP scholars, such as Eric Lander (EL) and Bruce Birren (BB). “d” stands for the network degree of a scholar and controls the size of her/his node. “h” stands for the h-index of a scholar. Magenta nodes denote faculty affiliated with computing departments, while green nodes denote faculty affiliated with biology departments.

  • Fig. 7 Cross-disciplinarity beyond the faculty network F.

    (A) Cross-disciplinarity XDg as mixed authorship in the human genomics literature: Cross-disciplinarity is measured using the combinations of departmental affiliations on the set of Human Genome publications reported in the WoS. The mean value, weighted according to the publication volume each year, is Embedded Image. (B) Cross-disciplinarity XDe as mixed methods in NB: Cross-disciplinarity is measured by analyzing the combinations of computational and biological methods used within articles from the journal NB. The mean value, weighted according to the publication volume each year, is Embedded Image. In both panels, blue dots represent the respective rc(t), calculated using real data to measure the additional citation impact of XD publications. The curves correspond to the respective null model test statistic distribution P(rc,RND(t)), estimated from 1 million bootstrap randomizations, in which the expected value rc,RND(t) ≡ 1 (that is, no difference between the mean citation impact of the subsets). The red curve and shaded region correspond to the 90% confidence interval for the respective randomized rc,RND(t) ≡ 1, and the outer black curves correspond to the 95% (solid) and 99% (dashed) confidence intervals. Thus, empirical data above (or below) the null model confidence intervals are significantly different from the expected value rc = 1 at the given significance level, indicating that such large values are highly unlikely to arise by chance alone.
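The null model comparison in Fig. 7 can be illustrated with a small randomization sketch. Several choices here are assumptions rather than details from the caption: rc is taken as the ratio of the mean citations of XD publications to the mean citations of the remaining publications, the null distribution is generated by randomly reassigning the XD labels while keeping the subset sizes fixed, and 2000 draws are used instead of 1 million to keep the example fast. The citation counts are hypothetical.

```python
import random
from statistics import mean

def citation_ratio(cites, is_xd):
    """Mean citations of XD publications divided by the mean of the rest."""
    xd = [c for c, f in zip(cites, is_xd) if f]
    rest = [c for c, f in zip(cites, is_xd) if not f]
    return mean(xd) / mean(rest)

def null_interval(cites, is_xd, n_draws=2000, level=0.90, seed=0):
    """Central interval of the ratio under random reassignment of XD labels."""
    rng = random.Random(seed)
    labels = list(is_xd)
    draws = []
    for _ in range(n_draws):
        rng.shuffle(labels)  # assumption: labels permuted, subset sizes fixed
        draws.append(citation_ratio(cites, labels))
    draws.sort()
    lo = draws[round((1 - level) / 2 * n_draws)]
    hi = draws[round((1 + level) / 2 * n_draws) - 1]
    return lo, hi

# Hypothetical citation counts: 30 non-XD and 10 XD publications in one year.
cites = [10 + k % 5 for k in range(30)] + [30 + k for k in range(10)]
is_xd = [False] * 30 + [True] * 10

rc = citation_ratio(cites, is_xd)
lo, hi = null_interval(cites, is_xd)
```

An empirical rc above `hi` would then be read as a citation premium for XD publications that is unlikely under the null at the chosen level; repeating the calculation per year gives a curve comparable in spirit to the one in the figure.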

Supplementary Materials

  • Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/4/8/eaat4211/DC1

    Appendix S1. Author name disambiguation.

    Appendix S2. Connectivity of the F network.

    Fig. S1. Robustness of the F network with respect to link removal.

    Fig. S2. F network distributions for direct and mediated associations.

    Fig. S3. Three perspectives on the centrality of Fi in the direct collaboration network.

    Fig. S4. Evolution of the nongiant components in the F network.

    Fig. S5. Distribution of normalized citation impact by departmental affiliation and time period.

    Table S1. Set of 155 biology and computing departments in the United States.

    Table S2. Career data set: Pooled cross-sectional model.

    Table S3. Career data set: Pooled cross-sectional model—robustness check.

    Table S4. Career data set: Panel model on all faculty F.

    Table S5. Career data set: Panel model on the XDF faculty.

    Table S6. Career data set: Panel model on the XDF faculty with matched pairs.

    References (62–64)
