Neural embeddings of scholarly periodicals reveal complex disciplinary organizations


Science Advances  23 Apr 2021:
Vol. 7, no. 17, eabb9004
DOI: 10.1126/sciadv.abb9004
  • Fig. 1 Model validation.

    (A to C) The distribution of cosine similarities for four groups of 100,000 journal pairs, calculated with the citation vector (cv) model, the Jaccard similarity matrix (jac), and our dense periodical embeddings (p2v). The four labels (random, cross-disc., discipline, and subdiscipline) denote random pairs, cross-discipline pairs, within-discipline pairs, and within-subdiscipline pairs. The two sparse representations (cv and jac) place most pairs at 0. They are therefore less informative than our dense embedding, which better captures both journal similarities and their differences. Relative to random pairs, both the means and the distributions of the other three groups shift more strongly under p2v than under either cv or jac. (D) Average rank correlation coefficient between algorithms and experts in ranking topically similar journals. Only target journals with an average pairwise expert agreement above 0.2 are used in the evaluation. The label “disc.” denotes the method that ranks journals in the same discipline by their PageRank scores. (E) F1 score of the classification task of predicting the discipline category for 12,751 journals (excluding 29 interdisciplinary journals) using the three vector-space models, based on five-fold cross-validation. The label “citation” denotes the method that predicts a journal's discipline to be that of its most cited neighbor in the undirected journal citation network. Error bars indicate 95% confidence intervals.
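
Two ideas in this caption can be made concrete with a small sketch. The first is why sparse representations put most pairs at a similarity of 0. The second is how the “citation” baseline in (E) assigns a label. This is an illustrative sketch, not the authors' code. It assumes that a citation vector is a sparse map from cited journals to citation counts, and that citation edges arrive as hypothetical (citing, cited, count) triples. The names `sparse_cosine` and `label_citation_predict` are made up for this example.

```python
from collections import defaultdict


def sparse_cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {index: weight} dicts."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = sum(w * w for w in a.values()) ** 0.5
    norm_b = sum(w * w for w in b.values()) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_citation_predict(journal, edges, labels):
    """Predict the discipline of `journal` as that of its most cited neighbor.

    edges: iterable of (citing, cited, count) triples (assumed format). Both
        directions are summed, so the citation network is treated as undirected.
    labels: {journal: discipline} for journals with known (training) labels.
    Returns None if the journal has no labeled neighbor.
    """
    weight = defaultdict(int)
    for u, v, count in edges:
        if u == v:
            continue  # ignore self-citations
        if u == journal and v in labels:
            weight[v] += count
        elif v == journal and u in labels:
            weight[u] += count
    if not weight:
        return None
    # Ties are broken by first occurrence in `edges` (an arbitrary choice).
    best = max(weight, key=weight.get)
    return labels[best]
```

Consider two journals whose citation vectors share no cited journal, such as `{"J1": 5, "J2": 3}` and `{"J3": 4, "J4": 1}`. Their `sparse_cosine` is exactly 0.0, however closely related the journals are. A dense embedding avoids this collapse.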

  • Fig. 2 Periodical embeddings reveal complex disciplinary organizations.

    (A) Two-dimensional (2D) projection of 12,780 journals obtained using t-distributed stochastic neighbor embedding (t-SNE) (52). Each dot represents a journal, and its color denotes its discipline as designated in the UCSD map (the 29 multidisciplinary journals are colored black). (B) Archaeology and anthropology journals, classified as “Earth Sciences,” form a distinct cluster whose center is closer to “Social Sciences” than to the major “Earth Sciences” cluster (verified by cosine distances). (C) A group of medical imaging journals drawn from “Brain Research,” “Medical Specialties,” and “EE & CS,” highlighting the key role of computer science and engineering in the study of brain imaging. (D) A set of parasite-focused journals spanning many disciplines, including “Social Sciences” (Ecohealth), “Biology” (Parasites), “Infectious Diseases” (Malaria Journal), and “Chemistry” (Journal of Natural Toxins), revealing the multifaceted, highly interdisciplinary nature of parasite research. (E) The same map, with grayscale indicating the level of disagreement between the clustering in our embedding space and the discipline categories in the UCSD map. Red rectangles highlight the regions shown in (B) to (D). (F) Agreement between the UCSD classifications and our survey. The top (bottom) portion shows journals with high (low) similarity between the UCSD catalog and a clustering based on our periodical embeddings.
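
Panel (B) is checked with cosine distances in the embedding space, not with the 2D t-SNE layout. The sketch below shows one way such a check could work. It assumes that a cluster's “center” is the arithmetic mean of its member vectors. The helper names and the toy 2D vectors are hypothetical, and the paper's exact procedure may differ.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine_distance(u, v):
    """1 - cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def nearest_cluster(group, clusters):
    """Name of the cluster whose centroid is closest (in cosine distance)
    to the centroid of `group`. `clusters` maps names to lists of vectors."""
    center = centroid(group)
    return min(clusters, key=lambda name: cosine_distance(center, centroid(clusters[name])))


# Toy 2D vectors, made up for illustration (not real journal embeddings).
archaeology = [[0.9, 0.1], [0.8, 0.3]]
clusters = {
    "Social Sciences": [[1.0, 0.0], [0.9, 0.2]],
    "Earth Sciences": [[0.1, 1.0], [0.0, 0.9]],
}
print(nearest_cluster(archaeology, clusters))  # Social Sciences
```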

  • Fig. 3 Analogy graphs between periodicals.

    (A) Starting from KDD (or ICWSM), we iteratively apply two poles (ASR, JMLR) to find the most similar periodical at each step via the vector analogy v(X) − v(ASR) + v(JMLR) ≈ v(?) (blue edges) or v(X) − v(JMLR) + v(ASR) ≈ v(?) (orange edges). Each node has two outgoing edges (one blue, one orange) representing the two opposite analogies. (B) We apply (Cell, PRL) to ASR and expand only periodicals that are one step away from ASR to keep the graph concise. (C) Graph obtained by applying (ASR, PRL) to Blood. (D) Similar to (C), for seeds in different disciplines, including “Brain Research” (Cognition, Brain), “Earth Sciences” (Journal of Climate), “Humanities” (Mind), “Medical Specialties” (Cancer), and “Social Sciences” (Quarterly Journal of Economics). (E) Average fraction of acyclic edges per analogy graph that satisfy the author overlap criterion, for all 1800 analogy graphs (produced by our periodical embeddings, p2v) in each of the 78 discipline pairs. (F) Same as (E), but for the differences between these mean values and those from the analogy graphs produced by cv. For all discipline pairs, the difference is positive and statistically significant (P < 0.001).
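
The analogy step and the graph expansion can be sketched in a few lines. In this sketch, each query v(X) − v(minus) + v(plus) is answered by the periodical with the highest cosine similarity to the resulting vector. Excluding the three query periodicals from the candidates is an assumption borrowed from common word-analogy practice. The function names and the toy embedding are hypothetical, and this is not presented as the authors' implementation.

```python
import math


def _cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(emb, x, minus, plus):
    """Periodical whose vector is most cosine-similar to v(x) - v(minus) + v(plus).

    The three query periodicals are excluded from the candidates (assumed here)."""
    target = [a - b + c for a, b, c in zip(emb[x], emb[minus], emb[plus])]
    candidates = [p for p in emb if p not in (x, minus, plus)]
    return max(candidates, key=lambda p: _cos(emb[p], target))


def analogy_graph(emb, seed, pole_a, pole_b, max_steps=2):
    """Grow an analogy graph from `seed`; every expanded node gets two edges.

    "a->b" edges use v(X) - v(pole_a) + v(pole_b); "b->a" edges the reverse.
    Only nodes first reached within `max_steps - 1` hops are expanded.
    """
    edges, frontier, seen = [], [seed], {seed}
    for _ in range(max_steps):
        next_frontier = []
        for node in frontier:
            for label, minus, plus in (("a->b", pole_a, pole_b), ("b->a", pole_b, pole_a)):
                target = analogy(emb, node, minus, plus)
                edges.append((node, target, label))
                if target not in seen:
                    seen.add(target)
                    next_frontier.append(target)
        frontier = next_frontier
    return edges
```

With a toy 2D embedding such as `{"ASR": [1.0, 0.0], "JMLR": [0.0, 1.0], "KDD": [0.2, 0.9], "ICWSM": [0.7, 0.7], "SocMethods": [0.9, 0.3]}`, the call `analogy_graph(emb, "KDD", "ASR", "JMLR", max_steps=1)` gives KDD one edge toward JMLR and one toward ASR. Setting `max_steps=1` roughly mirrors the one-step expansion used in (B). Real graphs would use the learned p2v vectors.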

  • Fig. 4 Two spectra of scholarship.

    (A) Spectrum of soft and hard sciences, operationalized by defining $S_{+} = \{v(p) \mid p \in \text{Math \& Physics}\}$ and $S_{-} = \{v(p) \mid p \in \text{Social Sciences} \cup \text{Humanities}\}$. Each disciplinary journal (12,751 in total) is represented by a vertical line inside the box. The color indicates the discipline category, and the position reflects the cosine similarity between the periodical vector and the axis $v_{\text{soft} \to \text{hard}}$. We also annotate several journals and proceedings, whose background colors are proportional to their projection values. At the bottom, journals in each discipline category are shown separately, and the black vertical line in each discipline marks the mean projection value of its journals. (B) Spectrum along the axis between social sciences and (biological) life sciences, operationalized by defining $S_{+} = \{v(p) \mid p \in \text{Biology} \cup \text{Biotechnology} \cup \text{Infectious Diseases} \cup \text{Health Professionals} \cup \text{Medical Specialties}\}$ and $S_{-} = \{v(p) \mid p \in \text{Social Sciences} \cup \text{Humanities}\}$. The ordering of the 13 disciplines changes dramatically from (A), reflecting the complex organization of scholarly periodicals along different scientific axes in the embedding space.
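
The projection described in this caption can be sketched as follows. The caption does not say how the axis is built from $S_{+}$ and $S_{-}$. This sketch assumes the axis is the mean of $S_{+}$ minus the mean of $S_{-}$, a common construction for embedding axes, and scores each periodical by its cosine similarity with that axis. The helper names and toy vectors are hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def spectrum_axis(pos_vectors, neg_vectors):
    """Axis pointing from the mean of S- toward the mean of S+ (assumed definition)."""
    mean_pos, mean_neg = mean_vector(pos_vectors), mean_vector(neg_vectors)
    return [a - b for a, b in zip(mean_pos, mean_neg)]


def project_onto_axis(emb, axis):
    """Position of every periodical on the spectrum: cosine similarity with the axis."""
    return {p: cosine(v, axis) for p, v in emb.items()}


# Toy example: S+ plays the role of "Math & Physics", S- of "Social Sciences ∪ Humanities".
axis = spectrum_axis([[0.0, 1.0], [0.2, 1.0]], [[1.0, 0.0], [1.0, 0.2]])
scores = project_onto_axis({"PRL": [0.0, 1.0], "ASR": [1.0, 0.0]}, axis)
```

In this toy case, "PRL" lands on the positive (hard) end and "ASR" on the negative (soft) end. Swapping in the $S_{+}$ set from (B) changes only the `pos_vectors` argument, which is why the same periodicals can reorder across the two spectra.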

Supplementary Materials


    Neural embeddings of scholarly periodicals reveal complex disciplinary organizations

    Hao Peng, Qing Ke, Ceren Budak, Daniel M. Romero, Yong-Yeol Ahn


    This PDF file includes:

    • Tables S1 to S3
    • Figs. S1 to S26
    • Annotated maps of journals in each discipline
