Research Article | SOCIAL NETWORKS

The nearly universal link between the age of past knowledge and tomorrow’s breakthroughs in science and technology: The hotspot


Science Advances  19 Apr 2017:
Vol. 3, no. 4, e1601315
DOI: 10.1126/sciadv.1601315
  • Fig. 1 Knowledge hotspot predicts high-impact science.

    Papers in the hotspot are, on average, more than twice as likely to be hits as the background rate would predict (data shown are for the year 1995, N = 546,912 papers). The hotspot is the overrepresented concentration of “hit” papers, shown in green, that cite prior knowledge with a low mean age, Dμ, and a high age coefficient of variation (COV), Dθ. Notably, 75% of papers are outside the hotspot, and their likelihood of being a hit is no greater than expected by chance. Solid lines and dotted lines are population means and medians of Dμ and Dθ. The background rate is the likelihood that a paper chosen at random is in the top 5% of citations for papers in its field.
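A minimal sketch of how the two quantities in this caption could be computed for a single paper, assuming that the age of a reference is the citing paper's publication year minus the reference's publication year. The function names, the use of the population standard deviation, and the cutoff arguments are illustrative assumptions; the caption states only that the hotspot combines a low Dμ with a high Dθ.

```python
from statistics import mean, pstdev


def reference_age_stats(paper_year, reference_years):
    """Return (d_mu, d_theta): mean reference age and its coefficient of variation."""
    ages = [paper_year - year for year in reference_years]
    d_mu = mean(ages)
    # COV = standard deviation / mean; undefined when the mean age is zero.
    d_theta = pstdev(ages) / d_mu if d_mu > 0 else float("nan")
    return d_mu, d_theta


def in_hotspot(d_mu, d_theta, mu_cutoff, theta_cutoff):
    """Hypothetical rule: mean age below one cutoff, COV above another."""
    return d_mu < mu_cutoff and d_theta > theta_cutoff


# A 1995 paper citing mostly recent work plus two much older references.
d_mu, d_theta = reference_age_stats(1995, [1994, 1994, 1993, 1992, 1975, 1960])
```

The cutoffs are left as arguments because they depend on the field-level population of papers, which this sketch does not model.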

  • Fig. 2 Distributions of the age of references.

    The plot shows the characteristic age distributions that correspond to the four quadrants shown in Fig. 1, taking the average distribution for each category among all papers in the WOS published in 1995. The central tendency of the low Dμ and high Dθ distribution, “the knowledge hotspot,” includes very recent work with a long, slowly sloping tail into past knowledge. By contrast, the central tendency of the low Dμ and low Dθ distribution is recent work, the central tendency of the high Dμ and low Dθ distribution is relatively old work, and the central tendency of the high Dμ and high Dθ distribution is to cite relatively evenly over past knowledge.

  • Fig. 3 Increasing dominance of the knowledge hotspot for predicting hit papers in science.

    Examining scientific papers over time shows that papers referencing work in the “low Dμ and high Dθ” distribution (that is, the knowledge hotspot) have consistently had the highest impact during the past 55 years. The probability of being a hit paper is more than twice the expected background rate, and the gap in citation impact between papers in the hotspot and those outside the hotspot is growing over time. After 1960, only papers that referenced work with certain age distributions, that is, papers belonging to the hotspot, were associated with high-impact research at a rate that exceeded the rate expected by chance.

  • Fig. 4 Knowledge hotspot dominates high-impact science on a field-by-field basis.

    Science is disaggregated into 171 separate science and engineering fields, 54 social science fields, and 27 humanities fields; the histograms indicate the fraction of all fields in which the knowledge hotspot predicts hit papers. In 1990–2000, almost 90% of the 252 fields showed the hotspot-hit link (P < 0.0001, two-tailed binomial test).

  • Fig. 5 Probability of a hit paper and combinations of Dμ and Dθ.

    Estimates are from Table 1 for 1990–2000 with 95% confidence intervals. Combinations of Dμ and Dθ above the dashed line have a probability greater than the 5% background rate expected by chance.

  • Fig. 6 The dominance of the hotspot for predicting hit patents.

    (A) Knowledge hotspot predicts high-impact technology. Patents that are in the hotspot are more than two times more likely to be hits than the background rate of 5% (data shown are for the year 1995, N = 103,700 patents). These patents cite prior work that has a low mean age, Dμ, and a high age variance, Dθ, relative to other patents in their field. Notably, 75% of patents are outside the hotspot and display a probability of being a hit that is no greater than expected by chance. Solid lines and dotted lines are population means and medians of Dμ and Dθ. (B) Increasing dominance of the knowledge hotspot in patenting. Examining patents on a year-by-year basis shows that patents in the hotspot have consistently had the highest probability of a hit during the past 50 years. (C) Knowledge hotspot dominates high-impact patenting on a field-by-field basis. Across 95% of patent subfields, patents in the hotspot are more likely to be hits than those based on other ages of information. Between 1990 and 1999, patents in the top 5% of the citation distribution are in the hotspot in more than 95% of subfields (P < 0.0001, two-tailed binomial test).

  • Fig. 7 Collaboration predicts the increased probability of referencing knowledge in the hotspot.

    Each entry on the x axis indicates a different Fields Medalist in mathematics in chronological order of receiving the prize. Values above zero on the y axis indicate the difference in the probability of being in the hotspot when a Fields Medalist coauthors versus authors alone. For 26 of 31 Fields Medalists, coauthorship is positively and significantly associated with the authors’ chances of being in the hotspot (P < 0.0009, binomial test).
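As a rough check on the binomial test reported in this caption, the sketch below computes an exact tail probability for 26 successes out of 31. The caption does not state the null probability or whether the test is one- or two-sided, so the null of 0.5 (coauthorship equally likely to help or not) and the doubling rule are assumptions, and the function name is illustrative.

```python
from math import comb


def binomial_tail(successes, trials, p=0.5):
    """Exact P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


one_sided = binomial_tail(26, 31)
# Doubling the tail is a valid two-sided p-value here because p = 0.5 is symmetric.
two_sided = min(1.0, 2 * one_sided)
```

Under these assumptions both values fall below the 0.0009 bound quoted in the caption.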

  • Table 1 Probability of being in the top 5% of citations for scientific papers.

    Logit regression estimates for three time periods indicate that the strong negative predictive relationship between Dμ and H and the strong positive relationship between Dθ and H shown in Figs. 1 and 5 hold across time, fields, paper, and reference characteristics. BIC model fit statistics “very strongly” indicate that models with Dμ and Dθ significantly and substantively fit the data better than control variable models (see Materials and Methods) [(25), p. 139]. Variance inflation factor statistics are 1.25 or 1.21, depending on the decade, and indicate no multicollinearity among the independent variables. ***P < 0.0001, **P < 0.001.

                              1980–1989                                       1990–2000                                       1950–2000
                              β (SE)                  β (SE)                  β (SE)                  β (SE)                  β (SE)                  β (SE)
    Dμ                        −0.195*** (0.0008)      −0.185*** (0.001)       −0.203*** (0.0006)      −0.179*** (0.001)       −0.157*** (0.0004)      −0.179*** (0.0006)
    Dθ                        1.691*** (0.007)        1.329*** (0.010)        1.559*** (0.0056)       1.410*** (0.007)        1.776*** (0.004)        1.367*** (0.10)
    Reference-level controls
      P (Interdisciplinarity)                         1.954*** (0.024)                                1.892*** (0.021)                                1.909*** (0.030)
      A (Novelty)                                     0.185*** (0.006)                                0.177*** (0.005)                                0.186*** (0.003)
      C (Conventionality)                             0.239*** (0.002)                                0.255*** (0.002)                                0.229*** (0.001)
      M (Reference quality)                           0.001*** (10^−4)                                0.0006*** (10^−4)                               0.001*** (6.5 × 10^−6)
    Paper fixed effects
      N (#Authors)                                    Y                                               Y                                               Y
      Y (Year)                                        Y                                               Y                                               Y
      R (#References)                                 Y                                               Y                                               Y
      S (Subfield)                                    Y                                               Y                                               Y
    Obs.                      3,792,038               3,627,624               6,298,005               6,099,788               13,950,691              13,387,366
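Logit coefficients are easier to read as odds ratios. The sketch below exponentiates the second pair of 1990–2000 estimates listed in Table 1 (β = −0.179 for Dμ and β = 1.410 for Dθ). It assumes the standard logit interpretation, in which exp(β) is the multiplicative change in the odds of being a hit per one-unit increase in the regressor as scaled in the table; the dictionary keys are illustrative names.

```python
from math import exp

# Coefficients copied from Table 1 (1990–2000, second estimate for each variable).
coefficients = {"D_mu": -0.179, "D_theta": 1.410}

# exp(beta) converts a logit coefficient into an odds ratio.
odds_ratios = {name: exp(beta) for name, beta in coefficients.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: odds of a hit multiply by {ratio:.3f} per one-unit increase")
```

An odds ratio below 1 for Dμ and above 1 for Dθ matches the negative and positive relationships described in the caption. Converting these into probabilities would also require the model intercept, which the table does not report.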
  • High-impact papers
    Top 5% of citations in a paper’s field
      1950–2000: BIC drops from 4,502,525 to 4,257,045 when Dμ and Dθ are added in the model
      1990–2000: BIC drops from 2,001,497 to 1,884,133 when Dμ and Dθ are added in the model
      1980–1989: BIC drops from 1,213,883 to 1,147,347 when Dμ and Dθ are added in the model
    Top 1% of citations in a paper’s field
      1950–2000: BIC drops from 1,427,176 to 1,141,414 when Dμ and Dθ are added in the model
      1990–2000: BIC drops from 542,014 to 508,325 when Dμ and Dθ are added in the model
      1980–1989: BIC drops from 324,438 to 305,732 when Dμ and Dθ are added in the model
    High-impact patents
    Top 5% of citations in a patent’s field
      1980–2000: BIC drops from 589,218 to 582,978 when Dμ and Dθ are added in the model
      1990–2000: BIC drops from 399,672 to 395,771 when Dμ and Dθ are added in the model
      1980–1989: BIC drops from 225,631 to 223,843 when Dμ and Dθ are added in the model
    Top 1% of citations in a patent’s field
      1980–2000: BIC drops from 160,756 to 159,132.6 when Dμ and Dθ are added in the model
      1990–2000: BIC drops from 105,391 to 104,316 when Dμ and Dθ are added in the model
      1980–1989: BIC drops from 68,556 to 66,992 when Dμ and Dθ are added in the model
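The size of each BIC drop can be tabulated directly from the rows above. The sketch below does this for the top 5% paper models. The grading thresholds (a difference above 10 read as "very strong" evidence) follow a commonly used convention and are an assumption here, since the rows report only the raw BIC values; the names `PAPER_TOP5` and `evidence_grade` are illustrative.

```python
# (period, BIC of the control-variable model, BIC after adding D_mu and D_theta),
# copied from the top 5% rows for papers.
PAPER_TOP5 = [
    ("1950–2000", 4_502_525, 4_257_045),
    ("1990–2000", 2_001_497, 1_884_133),
    ("1980–1989", 1_213_883, 1_147_347),
]


def evidence_grade(delta_bic):
    """Assumed grading convention for a BIC difference between two models."""
    if delta_bic > 10:
        return "very strong"
    if delta_bic > 6:
        return "strong"
    if delta_bic > 2:
        return "positive"
    return "weak"


# A positive drop means the model with D_mu and D_theta has the lower BIC.
drops = {period: before - after for period, before, after in PAPER_TOP5}
grades = {period: evidence_grade(delta) for period, delta in drops.items()}
```

Under this convention every listed drop falls in the highest grade, which is consistent with the “very strongly” wording in the Table 1 caption.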

Supplementary Materials

  • Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/3/4/e1601315/DC1

    section S1. Data sets

    section S2. Dμ and Dθ distributions

    section S3. Alternative measures of Dθ produce equivalent results

    section S4. Simple null model of referencing with respect to the age of a publication or patent

    section S5. Demonstration case

    section S6. Robustness checks for alternative measures of being a hit beyond the top 5% (top 1, 10, 25, and 50%) for papers

    section S7. Further robustness checks

    section S8. Standardized coefficients of papers

    section S9. Regression analyses of patents

    section S10. Predicting referencing in the hotspot

    section S11. BIC statistics for supplemental regression analyses

    fig. S1. Reference age distributions.

    fig. S2. Switching references in the null model.

    fig. S3. Expected and observed distributions of reference age distributions.

    table S1. Alternative measures of Dθ produce consistent results.

    table S2. Demonstration case of search and impact.

    table S3. Probability of being in the top 1% of citations for papers.

    table S4. Probability of being in the top 10% of citations for papers.

    table S5. The probability of being in the top 25% of citations for papers.

    table S6. The probability of being in the top 50% of citations for papers.

    table S7. The probability of being a sleeping beauty paper at different levels of citations.

    table S8. Fixed-effects ordinary least squares regression estimating the relationship between Dμ and Dθ and the citations acquired in the first 8 years after publication and for all citations over a paper’s lifetime.

    table S9. Approximate PageRank analysis.

    table S10. Standardized coefficients of the probability of being in the top 5% of papers.

    table S11. Probability of being in the top 5% of citations for patents.

    table S12. Probability of being in the top 1% of citations for patents.

    table S13. Probability of a paper referencing work in the knowledge hotspot for coauthors versus solo authors.

