Research Article | VIROLOGY

A phylogenomic data-driven exploration of viral origins and evolution

Science Advances  25 Sep 2015:
Vol. 1, no. 8, e1500527
DOI: 10.1126/sciadv.1500527
  • Fig. 1 FSF sharing patterns and makeup of cellular and viral proteomes.

    (A) Numbers in parentheses indicate the total number of proteomes that were sampled from Archaea, Bacteria, Eukarya, and viruses. (B) Barplots comparing the proteomic composition of viruses infecting the three superkingdoms. Numbers in parentheses indicate the total number of viral proteomes in each group. Numbers above bars indicate the total number of proteins in each of the three classes of proteins. VSFs are listed in Table 1. (C and D) FSF use and reuse for proteomes in each viral subgroup and in the three superkingdoms. Values given in logarithmic scale. Important outliers are labeled. Shaded regions highlight the overlap between parasitic cells and giant viruses.
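As a rough illustration of the use and reuse quantities plotted in panels C and D, the sketch below assumes that "use" counts the distinct FSFs in a proteome and that "reuse" counts all FSF domain occurrences, repeats included. That reading of the terms, the function name `fsf_use_and_reuse`, and the toy proteome are assumptions made for illustration, not the authors' pipeline.

```python
from collections import Counter
import math


def fsf_use_and_reuse(domain_assignments):
    """Return (use, reuse) for one proteome.

    domain_assignments: iterable of SCOP css strings, one per detected domain.
    Assumed definitions (not confirmed by the legend):
      use   = number of distinct FSFs in the proteome
      reuse = total number of FSF occurrences, repeats included
    """
    counts = Counter(domain_assignments)
    use = len(counts)
    reuse = sum(counts.values())
    return use, reuse


# Hypothetical toy proteome: five detected domains from three FSFs.
toy_proteome = ["c.37.1", "c.37.1", "e.8.1", "c.55.3", "c.37.1"]
use, reuse = fsf_use_and_reuse(toy_proteome)

# The legend states that values are shown on a logarithmic scale.
log_use, log_reuse = math.log10(use), math.log10(reuse)
```

Under these assumptions, a proteome whose reuse is much larger than its use repeats a small set of FSFs many times.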

  • Fig. 2 Spread of viral FSFs in cellular proteomes.

    (A) Violin plots comparing the spread (f value) of FSFs shared and not shared with viruses in archaeal, bacterial, and eukaryal proteomes. (B) Violin plots comparing the spread (f value) of FSFs shared with each viral subgroup in archaeal, bacterial, and eukaryal proteomes. Numbers on top indicate the total number of FSFs involved in each comparison. White circles in each boxplot represent group medians. Density trace is plotted symmetrically around the boxplots.
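The spread (f value) compared in these violin plots can be sketched as follows, assuming that f is the fraction of sampled proteomes in which an FSF is detected at least once. The function name `f_values` and the toy data are hypothetical.

```python
def f_values(proteomes):
    """Compute the spread (f value) of every FSF across a set of proteomes.

    proteomes: dict mapping a proteome name to the set of FSF css ids it encodes.
    Assumption: f = (number of proteomes containing the FSF) / (total proteomes).
    """
    total = len(proteomes)
    if total == 0:
        raise ValueError("at least one proteome is required")
    counts = {}
    for fsfs in proteomes.values():
        for fsf in set(fsfs):  # count each FSF once per proteome
            counts[fsf] = counts.get(fsf, 0) + 1
    return {fsf: n / total for fsf, n in counts.items()}


# Hypothetical toy data: four proteomes.
toy = {
    "proteome_1": {"c.37.1", "e.8.1"},
    "proteome_2": {"c.37.1"},
    "proteome_3": {"c.37.1", "c.55.3"},
    "proteome_4": {"c.37.1", "c.55.3"},
}
spread = f_values(toy)
```

With this definition f ranges from just above 0 (present in a single proteome) to 1 (present in every proteome).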

  • Fig. 3 Virus-host preferences and FSF distribution in viruses infecting different hosts.

    (A) The abundance of each viral replicon type that is capable of infecting Archaea, Bacteria, and Eukarya and major divisions in Eukarya. Virus-host information was retrieved from the National Center for Biotechnology Information Viral Genomes Project (119). Hosts were classified into Archaea, Bacteria, Protista (animal-like protists), Fungi, Plants (all plants, blue-green algae, and diatoms), Invertebrates and Plants (IP), and Metazoa (vertebrates, invertebrates, and humans). Host information was available for 3440 of the 3660 viruses that were sampled in this study. Two additional ssDNA archaeoviruses were added from the literature (129, 130). Numbers on bars indicate the total virus count in each host group. (B) Venn diagram shows the distribution of 715 (of 716) FSFs that were detected in archaeoviruses, bacterioviruses, and eukaryoviruses. Host information on the Circovirus-like genome RW_B virus encoding the “Satellite viruses” FSF (b.121.7) was not available. (C) Mean f values for FSFs corresponding to each of the seven Venn groups defined in (B) in archaeal, bacterial, and eukaryal proteomes. Values were averaged for all FSFs in each of the seven Venn groups. Text above bars indicates how many different viral subgroups encoded those FSFs.

  • Fig. 4 FSF distribution in the viral supergroup.

    (A) Total number of FSFs that were either shared or uniquely present in each viral subgroup. A seven-set Venn diagram makes explicit the 127 (2^7 − 1) combinations that are possible with seven groups. (B) Ariadne’s threads give the most parsimonious solution to encase all highly shared FSFs between different viral subgroups. Threads were inferred directly from the seven-set Venn diagram. FSFs identified by SCOP css. (C) Number of FSFs shared in each viral subgroup with every other subgroup. Pie charts are proportional to the size of the FSF repertoire in each viral subgroup.
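The count of 127 regions in a seven-set Venn diagram is the number of non-empty subsets of seven groups. A minimal sketch that enumerates them is given below; the subgroup labels follow the "Distribution" columns of the tables, while the helper names `nonempty_combinations` and `venn_region` are hypothetical.

```python
from itertools import combinations

# The seven viral subgroups, labeled as in the Distribution columns.
SUBGROUPS = [
    "dsDNA", "dsRNA", "ssDNA", "dsDNA-RT",
    "ssRNA-RT", "minus-ssRNA", "plus-ssRNA",
]


def nonempty_combinations(groups):
    """List every non-empty combination of the given groups."""
    return [
        combo
        for size in range(1, len(groups) + 1)
        for combo in combinations(groups, size)
    ]


def venn_region(fsf_subgroups):
    """Map the set of subgroups encoding an FSF to a canonical region key."""
    return tuple(g for g in SUBGROUPS if g in fsf_subgroups)


regions = nonempty_combinations(SUBGROUPS)
n_regions = len(regions)  # equals 2**7 - 1
```

Each FSF falls into exactly one region, so counting FSFs per `venn_region` key gives the occupancy of the diagram.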

  • Fig. 5 Phylogenomic analysis of FSF domains.

    (A) ToD describes the evolution of 1995 FSF domains (taxa) in 5080 proteomes (characters) (tree length = 1,882,554; retention index = 0.74; g1 = −0.18). The bar on top of ToD is a simple representation of how FSFs appeared in its branches, which correlates with their age (nd). FSFs were labeled blue for cell-only and red for those either shared with or unique to viruses. The boxplots identify the most ancient and derived Venn groups. Two major phases in the evolution of viruses are indicated in different background colors. Patterned area highlights the appearances of AV, BV, and EV soon after A, B, and E, respectively. FSFs are identified by SCOP css. (B) Viral FSFs plotted against their spread in viral proteomes (f value) and evolutionary time (nd). FSFs identified by SCOP css. (C) Distribution of ABEV FSFs in each viral subgroup along evolutionary time (nd). Numbers in parentheses indicate the total number of ABEV FSFs in each viral subgroup. White circles indicate group medians. Density trace is plotted symmetrically around the boxplots.

  • Fig. 6 Ancient history of RNA viral proteomes.

    (A) The length of Ariadne’s threads (colored lines) identifies FSFs that were shared by more than three viral subgroups. Filled circles indicate FSFs shared between two or three viral subgroups. Numbers next to each circle give the mean nd of FSFs shared by each combination. Numbers in parentheses give the range between the most ancient and the most recent FSFs that were shared by each combination. (B) Distribution of the most ancient (nd < 0.3) ABEV FSFs in evolutionary timeline (nd) for each viral subgroup. Numbers in parentheses indicate the total FSFs in each viral subgroup. White circles indicate group medians. A density trace is plotted symmetrically around the boxplots.

  • Fig. 7 Evolutionary relationships between cells and viruses.

    (A) ToP describing the evolution of 368 proteomes (taxa) that were randomly sampled from cells and viruses and were distinguished by the abundance of 442 ABEV FSFs (characters) (tree length = 45,935; retention index = 0.83; g1 = −0.31). All characters were parsimony informative. Differently colored branches represent BS support values. Major groups are identified. Viral genera names are given inside parentheses. The viral order “Megavirales” is awaiting approval by the ICTV and is hence written inside quotes. Viral families that form largely unified or monophyletic groups are labeled with an asterisk. Virion morphotypes were mapped to ToP and illustrated with images from the ViralZone Web resource (131). No picture was available for Turriviridae. Footnote a: Actinobacteria, Bacteroidetes/Chlorobi, Chloroflexi, Cyanobacteria, Fibrobacter, Firmicutes, Planctomycetes, and Thermotogae. (B) A distance-based phylogenomic network reconstructed from the occurrence of 442 ABEV FSFs in 368 randomly sampled proteomes (uncorrected P distance; equal angle; least-squares fit = 99.46). Numbers on branches indicate BS support values. Taxa were colored for easy visualization. Important groups are labeled. Footnote b: Actinobacteria, Bacteroidetes/Chlorobi, Chloroflexi, Cyanobacteria, Deinococcus-Thermus, Fibrobacter, Firmicutes, and Planctomycetes. Footnote c: Amoebozoa and Chromalveolata.
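The uncorrected P distance named in panel B is the proportion of characters at which two taxa differ. The sketch below applies it to FSF occurrence profiles, assuming each proteome is encoded as a presence/absence vector over the same ordered list of FSFs. That encoding, the function name `p_distance`, and the toy vectors are assumptions for illustration.

```python
def p_distance(profile_a, profile_b):
    """Uncorrected p-distance: fraction of characters that differ.

    profile_a, profile_b: equal-length sequences of character states,
    here assumed to be 0/1 flags for FSF absence/presence.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same characters")
    if len(profile_a) == 0:
        raise ValueError("profiles must not be empty")
    differences = sum(1 for a, b in zip(profile_a, profile_b) if a != b)
    return differences / len(profile_a)


# Hypothetical occurrence profiles over four FSFs.
proteome_x = [1, 0, 1, 1]
proteome_y = [1, 1, 0, 1]
distance_xy = p_distance(proteome_x, proteome_y)
```

A full analysis would compute this value for every pair of the 368 proteomes to fill the distance matrix from which a network or tree is built.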

  • Fig. 8 Evolutionary history of proteomes inferred from numerical analysis.

    (A) Plot of the first three axes of evoPCO portrays evolutionary distances between cellular and viral proteomes. The percentage of variability explained by each coordinate is given in parentheses on each axis. The proteome of the last common ancestor of modern cells (57) was added as an additional sample to infer the direction of evolutionary splits. Footnotes: a, Ignicoccus hospitalis; b, Lactobacillus delbrueckii; c, Caenorhabditis elegans. (B) A distance-based NJ tree reconstructed from the occurrence of 442 ABEV FSFs in 368 randomly sampled proteomes. Each taxon was given a unique tree ID (tables S1 and S2). Taxa were colored for quick visualization.

  • Table 1 VSFs and their distribution in the viral supergroup.

    FSFs in boldface could be potential VSFs based on the criterion described in the text. FSFs were referenced by either SCOP ID or css. For example, the P-loop containing NTP hydrolase FSF is c.37.1, where “c” is the α/β class of secondary structure present in the protein domain, “37” is the fold, and “1” is the FSF.

    SCOP ID | SCOP css | Venn group | FSF description | Distribution
    69070 | a.150.1 | V | Anti-sigma factor AsiA | dsDNA
    55064 | d.58.27 | V | Translational regulator protein regA | dsDNA
    48493 | a.120.1 | V | Gene 59 helicase assembly protein | dsDNA
    89433 | b.127.1 | V | Baseplate structural protein gp8 | dsDNA
    69652 | d.199.1 | V | DNA binding C-terminal domain of the transcription factor MotA | dsDNA
    56558 | d.182.1 | V | Baseplate structural protein gp11 | dsDNA
    49894 | b.28.1 | V | Baculovirus p35 protein | dsDNA
    160957 | e.69.1 | V | Poly(A) polymerase catalytic subunit–like | dsDNA
    51289 | b.85.5 | V | Tlp20, baculovirus telokin-like protein | dsDNA
    88648 | b.121.6 | V | Group I dsDNA viruses | dsDNA
    161240 | g.92.1 | V | T-antigen–specific domain–like | dsDNA
    118208 | e.58.1 | V | Viral ssDNA binding protein | dsDNA
    54957 | d.58.8 | V | Viral DNA binding domain | dsDNA
    51332 | b.91.1 | V | E2 regulatory, transactivation domain | dsDNA
    56548 | d.180.1 | V | Conserved core of transcriptional regulatory protein vp16 | dsDNA
    90246 | h.1.24 | V | Head morphogenesis protein gp7 | dsDNA
    47724 | a.54.1 | V | Domain of early E2A DNA binding protein, ADDBP | dsDNA
    57917 | g.51.1 | V | Zn binding domains of ADDBP | dsDNA
    49889 | b.27.1 | V | Soluble secreted chemokine inhibitor, VCCI | dsDNA
    89428 | b.126.1 | V | Adsorption protein p2 | dsDNA
    82046 | b.116.1 | V | Viral chemokine binding protein m3 | dsDNA
    158974 | b.170.1 | V | WSSV envelope protein-like | dsDNA
    47852 | a.62.1 | V | Hepatitis B viral capsid (hbcag) | dsDNA-RT
    111379 | f.47.1 | V | VP4 membrane interaction domain | dsRNA
    48345 | a.115.1 | V | A virus capsid protein alpha-helical domain | dsRNA
    69908 | e.35.1 | V | Membrane penetration protein mu1 | dsRNA
    75347 | d.13.2 | V | Rotavirus NSP2 fragment, C-terminal domain | dsRNA
    69903 | e.34.1 | V | NSP3 homodimer | dsRNA
    75574 | d.216.1 | V | Rotavirus NSP2 fragment, N-terminal domain | dsRNA
    58030 | h.1.13 | V | Rotavirus nonstructural proteins | dsRNA
    49818 | b.19.1 | V | Viral protein domain | dsRNA, minus-ssRNA, plus-ssRNA
    88650 | b.121.7 | V | Satellite viruses | ssDNA
    48045 | a.84.1 | V | Scaffolding protein gpD of bacteriophage procapsid | ssDNA
    50176 | b.37.1 | V | N-terminal domains of the minor coat protein g3p | ssDNA
    75404 | d.213.1 | V | VSV matrix protein | Minus-ssRNA
    118173 | d.293.1 | V | Phosphoprotein M1, C-terminal domain | Minus-ssRNA
    69922 | f.12.1 | V | Head and neck region of the ectodomain of NDV fusion glycoprotein | Minus-ssRNA
    101089 | a.8.5 | V | Phosphoprotein XD domain | Minus-ssRNA
    58034 | h.1.14 | V | Multimerization domain of the phosphoprotein from Sendai virus | Minus-ssRNA
    50012 | b.31.1 | V | EV matrix protein | Minus-ssRNA
    48145 | a.95.1 | V | Influenza virus matrix protein M1 | Minus-ssRNA
    143021 | d.299.1 | V | Ns1 effector domain–like | Minus-ssRNA
    161003 | e.75.1 | V | Flu NP-like | Minus-ssRNA
    160453 | d.361.1 | V | PB2 C-terminal domain–like | Minus-ssRNA
    101156 | a.30.3 | V | Nonstructural protein ns2, Nep, M1 binding domain | Minus-ssRNA
    160892 | d.378.1 | V | Phosphoprotein oligomerization domain–like | Minus-ssRNA
    56983 | f.10.1 | V | Viral glycoprotein, central and dimerization domains | Plus-ssRNA
    101257 | a.190.1 | V | Flavivirus capsid protein C | Plus-ssRNA
    103145 | d.255.1 | V | Tombusvirus P19 core protein, VP19 | Plus-ssRNA
    89043 | a.178.1 | V | Soluble domain of poliovirus core protein 3a | Plus-ssRNA
    110304 | b.148.1 | V | Coronavirus RNA binding domain | Plus-ssRNA
    101816 | b.140.1 | V | Replicase NSP9 | Plus-ssRNA
    140367 | a.8.9 | V | Coronavirus NSP7–like | Plus-ssRNA
    143076 | d.302.1 | V | Coronavirus NSP8–like | Plus-ssRNA
    144246 | g.86.1 | V | Coronavirus NSP10–like | Plus-ssRNA
    103068 | d.254.1 | V | Nucleocapsid protein dimerization domain | Plus-ssRNA
    117066 | b.1.24 | V | Accessory protein X4 (ORF8, ORF7a) | Plus-ssRNA
    143587 | d.318.1 | V | SARS receptor binding domain–like | Plus-ssRNA
    159936 | d.15.14 | V | NSP3A-like | Plus-ssRNA
    160099 | d.346.1 | V | SARS Nsp1–like | Plus-ssRNA
    140506 | a.30.8 | V | FHV B2 protein–like | Plus-ssRNA
    144251 | g.87.1 | V | Viral leader polypeptide zinc finger | Plus-ssRNA
    141666 | b.164.1 | V | SARS ORF9b–like | Plus-ssRNA
    55671 | d.102.1 | V | Regulatory factor Nef | ssRNA-RT
    56502 | d.172.1 | V | gp120 core | ssRNA-RT
    57647 | g.34.1 | V | HIV-1 VPU cytoplasmic domain | ssRNA-RT
    49749 | b.121.2 | EV | Group II dsDNA viruses VP | dsDNA
    103417 | e.48.1 | EV | Major capsid protein VP5 | dsDNA
    140713 | a.251.1 | EV | Phage replication organizer domain | dsDNA
    161008 | e.76.1 | EV | Viral glycoprotein ectodomain–like | dsDNA, minus-ssRNA
    110132 | b.147.1 | EV | BTV NS2-like ssRNA binding domain | dsRNA
    82856 | e.42.1 | EV | L-A virus major coat protein | dsRNA
    140809 | a.260.1 | EV | Rhabdovirus nucleoprotein–like | Minus-ssRNA
    101399 | a.206.1 | EV | P40 nucleoprotein | Minus-ssRNA
    55405 | d.85.1 | EV | RNA bacteriophage capsid protein | Minus-ssRNA
    68918 | a.140.4 | BV | Recombination endonuclease VII, C-terminal and dimerization domains | dsDNA
    50017 | b.32.1 | BV | gp9 | dsDNA
    58046 | h.1.17 | BV | Fibritin | dsDNA
    56826 | e.27.1 | BV | Upper collar protein gp10 (connector protein) | dsDNA
    161234 | g.91.1 | BV | E7 C-terminal domain–like | dsDNA
    140919 | a.263.1 | BV | DNA terminal protein | dsDNA
    89064 | a.179.1 | BV | Replisome organizer (g39p helicase loader/inhibitor protein) | dsDNA
    160570 | d.368.1 | BV | YonK-like | dsDNA
    51327 | b.90.1 | BV | Head binding domain of phage P22 tailspike protein | dsDNA
    141658 | b.163.1 | BV | Bacteriophage trimeric proteins domain | dsDNA
    64210 | d.186.1 | BV | Head-to-tail joining protein W, gpW | dsDNA
    51274 | b.85.2 | BV | Head decoration protein D (gpD, major capsid protein D) | dsDNA
    159865 | d.186.2 | BV | XkdW-like | dsDNA
    101059 | a.159.3 | BV | B-form DNA mimic Ocr | dsDNA
    58091 | h.4.2 | BV | Clostridium neurotoxins, “coiled-coil” domain | dsDNA
    47681 | a.49.1 | BV | C-terminal domain of B transposition protein | dsDNA
    58059 | h.2.1 | BV | Tetramerization domain of the Mnt repressor | dsDNA
    54328 | d.15.5 | BV | Staphylokinase/streptokinase | dsDNA
    64465 | d.196.1 | BV | Outer capsid protein sigma 3 | dsRNA
    57987 | h.1.4 | BV | Inovirus (filamentous phage) major coat protein | ssDNA
    160940 | e.66.1 | BEV | Api92-like | dsDNA
    160459 | d.362.1 | BEV | BLRF2-like | dsDNA
    109859 | a.214.1 | BEV | NblA-like | dsDNA
    54334 | d.15.6 | BEV | Superantigen toxins, C-terminal domain | dsDNA
    51225 | b.83.1 | BEV | Fiber shaft of virus attachment proteins | dsDNA, dsRNA
    49835 | b.21.1 | BEV | Virus attachment protein globular domain | dsDNA, dsRNA
    50203 | b.40.2 | BEV | Bacterial enterotoxins | dsDNA, ssDNA
    111474 | h.3.3 | BEV | Coronavirus S2 glycoprotein | dsDNA, plus-ssRNA
    56831 | e.28.1 | BEV | Reovirus inner layer core protein p3 | dsRNA
    109801 | a.30.5 | AV | Hypothetical protein D-63 | dsDNA
    161229 | g.90.1 | ABV | E6 C-terminal domain–like | dsDNA
    74748 | a.154.1 | ABV | Variable surface antigen VlsE | dsDNA
    143602 | d.321.1 | ABEV | STIV B116-like | dsDNA
    58064 | h.3.1 | ABEV | Influenza hemagglutinin (stalk) | dsDNA, minus-ssRNA
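The Table 1 caption explains the SCOP css notation with the example c.37.1 (class "c", fold 37, FSF 1). A minimal parsing sketch follows; the `Css` type and `parse_css` function are hypothetical names, and the validation rules are an assumption based only on the identifiers listed in the tables.

```python
from typing import NamedTuple


class Css(NamedTuple):
    """A SCOP concise classification string split into its three levels."""
    scop_class: str   # e.g. "c" for the alpha/beta class
    fold: int         # e.g. 37
    superfamily: int  # e.g. 1 (the FSF)


def parse_css(css):
    """Parse a three-level css such as 'c.37.1' into a Css record."""
    parts = css.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected class.fold.superfamily, got {css!r}")
    scop_class, fold, superfamily = parts
    if len(scop_class) != 1 or not scop_class.isalpha() or not scop_class.islower():
        raise ValueError(f"unexpected class label in {css!r}")
    return Css(scop_class, int(fold), int(superfamily))


p_loop = parse_css("c.37.1")  # P-loop containing NTP hydrolase FSF
```

Parsing the identifiers this way makes it easy to group FSFs by fold, for example the b.121 entries that recur among the capsid-related FSFs.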
  • Table 2 Significantly enriched “biological process” GO terms in (66 + 43) VSFs (FDR < 0.01).
    GO ID | GO term | Z score | P | FDR
    GO:0044415 | Evasion or tolerance of host defenses | 14.56 | 4.01 × 10^−6 | 3.00 × 10^−5
    GO:0050690 | Regulation of defense response to virus by virus | 14.56 | 4.01 × 10^−6 | 3.00 × 10^−5
    GO:0044068 | Modulation by symbiont of host cellular process | 13.8 | 5.72 × 10^−6 | 3.00 × 10^−5
    GO:0052572 | Response to host immune response | 13.14 | 7.86 × 10^−6 | 3.02 × 10^−5
    GO:0002832 | Negative regulation of response to biotic stimulus | 12.57 | 1.05 × 10^−5 | 3.02 × 10^−5
    GO:0052255 | Modulation by organism of defense response of other organism involved in symbiotic interaction | 12.57 | 1.05 × 10^−5 | 3.02 × 10^−5
    GO:0051805 | Evasion or tolerance of immune response of other organism involved in symbiotic interaction | 12.57 | 1.05 × 10^−5 | 3.02 × 10^−5
    GO:0019048 | Modulation by virus of host morphology or physiology | 12.06 | 1.36 × 10^−5 | 3.53 × 10^−5
  • Table 3 FSFs involved in capsid/coat assembly processes in viruses.

    FSFs that are completely absent in cellular proteomes are presented in boldface. Several other FSFs also have negligible f values in cells.

    SCOP ID | SCOP css | FSF description | Viral lineage | f value in cells
    82856 | e.42.1 | L-A virus major coat protein | BTV-like | 0.00025
    56831 | e.28.1 | Reovirus inner layer core protein p3 | BTV-like | 0.00019
    48345 | a.115.1 | A virus capsid protein alpha-helical domain | BTV-like | 0
    56563 | d.183.1 | Major capsid protein gp5 | HK97-like | 0.2352
    103417 | e.48.1 | Major capsid protein VP5 | HK97-like | 0.00006
    88633 | b.121.4 | Positive stranded ssRNA viruses | Picornavirus-like | 0.00364
    88645 | b.121.5 | ssDNA viruses | Picornavirus-like | 0.00099
    88650 | b.121.7 | Satellite viruses | Picornavirus-like | 0
    88648 | b.121.6 | Group I dsDNA viruses | Picornavirus-like | 0
    49749 | b.121.2 | Group II dsDNA viruses VP | PRD1/adenovirus-like | 0.00031
    47353 | a.28.3 | Retrovirus capsid dimerization domain–like | Other/unclassified | 0.00407
    47943 | a.73.1 | Retrovirus capsid protein, N-terminal core domain | Other/unclassified | 0.00123
    47195 | a.24.5 | TMV-like viral coat proteins | Other/unclassified | 0.00099
    57987 | h.1.4 | Inovirus (filamentous phage) major coat protein | Other/unclassified | 0.00068
    51274 | b.85.2 | Head decoration protein D (gpD, major capsid protein D) | Other/unclassified | 0.00049
    64465 | d.196.1 | Outer capsid protein sigma 3 | Other/unclassified | 0.00006
    55405 | d.85.1 | RNA bacteriophage capsid protein | Other/unclassified | 0.00006
    48045 | a.84.1 | Scaffolding protein gpD of bacteriophage procapsid | Other/unclassified | 0
    47852 | a.62.1 | Hepatitis B viral capsid (hbcag) | Other/unclassified | 0
    101257 | a.190.1 | Flavivirus capsid protein C | Other/unclassified | 0
    50176 | b.37.1 | N-terminal domains of the minor coat protein g3p | Other/unclassified | 0
    103068 | d.254.1 | Nucleocapsid protein dimerization domain | Other/unclassified | 0
  • Table 4 FSFs shared by different viral subgroups.
    SCOP ID | SCOP css | FSF description | Distribution
    56672 | e.8.1 | DNA/RNA polymerases | dsDNA, dsRNA, dsDNA-RT, ssRNA-RT, minus-ssRNA, plus-ssRNA
    52540 | c.37.1 | P-loop containing nucleoside triphosphate hydrolases | dsDNA, dsRNA, ssDNA, plus-ssRNA
    53335 | c.66.1 | S-Adenosyl-L-methionine–dependent methyltransferases | dsDNA, dsRNA, ssDNA, minus-ssRNA, plus-ssRNA
    53098 | c.55.3 | Ribonuclease H–like | dsDNA, ssRNA-RT, ssDNA, minus-ssRNA
    88633 | b.121.4 | Positive stranded ssRNA viruses | dsDNA, dsRNA, minus-ssRNA, plus-ssRNA
    57850 | g.44.1 | RING/U-box | dsDNA, minus-ssRNA, plus-ssRNA
    51283 | b.85.4 | dUTPase-like | dsDNA, dsDNA-RT, ssRNA-RT
    56112 | d.144.1 | Protein kinase–like (PK-like) | dsDNA, dsRNA, ssRNA-RT
    54768 | d.50.1 | dsRNA binding domain–like | dsDNA, dsRNA, plus-ssRNA
    54001 | d.3.1 | Cysteine proteinases | dsDNA, minus-ssRNA, plus-ssRNA
    52266 | c.23.10 | SGNH hydrolase | dsDNA, minus-ssRNA, plus-ssRNA
    58100 | h.4.4 | Bacterial hemolysins | dsDNA, dsRNA, ssDNA
    49818 | b.19.1 | Viral protein domain | dsRNA, minus-ssRNA, plus-ssRNA
    57756 | g.40.1 | Retrovirus zinc finger–like domains | dsDNA, dsDNA-RT, ssRNA-RT
    50044 | b.34.2 | SH3 domain | dsDNA, dsRNA, ssRNA-RT
    57924 | g.52.1 | Inhibitor of apoptosis (IAP) repeat | dsDNA, plus-ssRNA
    50249 | b.40.4 | Nucleic acid binding proteins | dsDNA, ssDNA
    53041 | c.53.1 | Resolvase-like | dsDNA, ssDNA
    55550 | d.93.1 | SH2 domain | dsDNA, ssRNA-RT
    55464 | d.89.1 | Origin of replication binding domain, RBD-like | dsDNA, ssDNA
    56399 | d.166.1 | ADP ribosylation | dsDNA, ssDNA
    100920 | b.130.1 | Heat shock protein 70 kD (HSP70), peptide binding domain | dsDNA, plus-ssRNA
    47413 | a.35.1 | Lambda repressor–like DNA binding domains | dsDNA, ssDNA
    69065 | a.149.1 | RNase III domain–like | dsDNA, plus-ssRNA
    46785 | a.4.5 | Winged helix DNA binding domain | dsDNA, ssDNA
    53448 | c.68.1 | Nucleotide-diphospho-sugar transferases | dsDNA, dsRNA
    57997 | h.1.5 | Tropomyosin | dsDNA, dsRNA
    54236 | d.15.1 | Ubiquitin-like | dsDNA, ssRNA-RT
    47954 | a.74.1 | Cyclin-like | dsDNA, ssRNA-RT
    90229 | g.66.1 | CCCH zinc finger | dsDNA, minus-ssRNA
    103657 | a.238.1 | BAR/IMD domain–like | dsDNA, ssRNA-RT
    53067 | c.55.1 | Actin-like ATPase domain | dsDNA, plus-ssRNA
    47794 | a.60.4 | Rad51 N-terminal domain–like | dsDNA, ssDNA
    143990 | d.336.1 | YbiA-like | dsDNA, plus-ssRNA
    55811 | d.113.1 | Nudix | dsDNA, dsRNA
    51197 | b.82.2 | Clavaminate synthase–like | dsDNA, plus-ssRNA
    53756 | c.87.1 | UDP-glycosyltransferase/glycogen phosphorylase | dsDNA, dsRNA
    81665 | f.33.1 | Calcium ATPase, transmembrane domain M | dsDNA, plus-ssRNA
    52949 | c.50.1 | Macro domain–like | dsDNA, plus-ssRNA
    53955 | d.2.1 | Lysozyme-like | dsDNA, dsRNA
    49899 | b.29.1 | Concanavalin A–like lectins/glucanases | dsDNA, dsRNA
    48371 | a.118.1 | ARM repeat | dsDNA, plus-ssRNA
    51126 | b.80.1 | Pectin lyase–like | dsDNA, plus-ssRNA
    47598 | a.43.1 | Ribbon-helix-helix | dsDNA, ssDNA
    50494 | b.47.1 | Trypsin-like serine proteases | dsDNA, plus-ssRNA
    55144 | d.61.1 | LigT-like | dsDNA, plus-ssRNA
    81296 | b.1.18 | E set domains | dsDNA, plus-ssRNA
    161008 | e.76.1 | Viral glycoprotein ectodomain–like | dsDNA, minus-ssRNA
    90257 | h.1.26 | Myosin rod fragments | dsDNA, dsRNA
    57501 | g.17.1 | Cystine-knot cytokines | dsDNA, ssRNA-RT
    54117 | d.9.1 | Interleukin 8–like chemokines | dsDNA, dsRNA
    58069 | h.3.2 | Virus ectodomain | ssRNA-RT, minus-ssRNA
    50630 | b.50.1 | Acid proteases | dsDNA-RT, ssRNA-RT
    47459 | a.38.1 | HLH, helix-loop-helix DNA binding domain | dsDNA, ssRNA-RT
    50939 | b.68.1 | Sialidases | dsDNA, minus-ssRNA
    55166 | d.65.1 | Hedgehog/DD peptidase | dsDNA, ssDNA
    51225 | b.83.1 | Fiber shaft of virus attachment proteins | dsDNA, dsRNA
    49835 | b.21.1 | Virus attachment protein globular domain | dsDNA, dsRNA
    111474 | h.3.3 | Coronavirus S2 glycoprotein | dsDNA, plus-ssRNA
    55658 | d.100.1 | L9 N-domain–like | dsDNA, dsDNA-RT
    55895 | d.124.1 | Ribonuclease Rh–like | dsDNA, plus-ssRNA
    52972 | c.51.4 | ITPase-like | dsDNA, plus-ssRNA
    57959 | h.1.3 | Leucine zipper domain | dsDNA, ssRNA-RT
    50203 | b.40.2 | Bacterial enterotoxins | dsDNA, ssDNA
    48208 | a.102.1 | Six-hairpin glycosidases | dsDNA, ssDNA
    50022 | b.33.1 | ISP domain | dsDNA, ssRNA-RT
    58064 | h.3.1 | Influenza hemagglutinin (stalk) | dsDNA, minus-ssRNA

Supplementary Materials

  • Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/1/8/e1500527/DC1

    Text S1. Phylogenetic assumptions and models.

    Fig. S1. FSF use and reuse for proteomes in each viral subgroup and for free-living cellular organisms.

    Fig. S2. Distribution of FSFs in each of the seven Venn groups defined in Fig. 3B along the evolutionary timeline (nd).

    Fig. S3. Spread of abe core FSFs in viral subgroups.

    Fig. S4. Evolutionary relationships within the viral subgroup.

    Fig. S5. Evolutionary relationships between cells and viruses.

    Table S1. List of viruses sampled in this study.

    Table S2. List of cellular organisms sampled in this study.

    Table S3. VSFs and their spread in cellular (X) proteomes.

    Table S4. FSF use and reuse values for all proteomes.

    Table S5. List of FSFs corresponding to each of the seven Venn groups defined in Fig. 3B.

    Table S6. FSFs mapped to structure-based viral lineages.

    Table S7. Significantly enriched “biological process” GO terms in EV FSFs (FDR < 0.01).

    References (132–137)

  • Supplementary Materials

    This PDF file includes:

    • Text S1. Phylogenetic assumptions and models.
    • Fig. S1. FSF use and reuse for proteomes in each viral subgroup and for free-living cellular organisms.
    • Fig. S2. Distribution of FSFs in each of the seven Venn groups defined in Fig. 3B along the evolutionary timeline (nd).
    • Fig. S3. Spread of abe core FSFs in viral subgroups.
    • Fig. S4. Evolutionary relationships within the viral subgroup.
    • Fig. S5. Evolutionary relationships between cells and viruses.
    • Legends for tables S1 to S7
    • References (132–137)

    Other Supplementary Material for this manuscript includes the following:

    • Table S1 (Microsoft Excel format). List of viruses sampled in this study.
    • Table S2 (Microsoft Excel format). List of cellular organisms sampled in this study.
    • Table S3 (Microsoft Excel format). VSFs and their spread in cellular (X) proteomes.
    • Table S4 (Microsoft Excel format). FSF use and reuse values for all proteomes.
    • Table S5 (Microsoft Excel format). List of FSFs corresponding to each of the seven Venn groups defined in Fig. 3B.
    • Table S6 (Microsoft Excel format). FSFs mapped to structure-based viral lineages.
    • Table S7 (Microsoft Excel format). Significantly enriched “biological process” GO terms in EV FSFs (FDR < 0.01).