Research ArticleEVOLUTIONARY BIOLOGY

Genome-scale phylogeny and contrasting modes of genome evolution in the fungal phylum Ascomycota

See allHide authors and affiliations

Science Advances  04 Nov 2020:
Vol. 6, no. 45, eabd0079
DOI: 10.1126/sciadv.abd0079

Abstract

Ascomycota, the largest and most well-studied phylum of fungi, contains three subphyla: Saccharomycotina (budding yeasts), Pezizomycotina (filamentous fungi), and Taphrinomycotina (fission yeasts). Despite its importance, we lack a comprehensive genome-scale phylogeny or understanding of the similarities and differences in the mode of genome evolution within this phylum. By examining 1107 genomes from Saccharomycotina (332), Pezizomycotina (761), and Taphrinomycotina (14) species, we inferred a robust genome-wide phylogeny that resolves several contentious relationships and estimated that the Ascomycota last common ancestor likely originated in the Ediacaran period. Comparisons of genomic properties revealed that Saccharomycotina and Pezizomycotina differ greatly in their genome properties and enabled inference of the direction of evolutionary change. The Saccharomycotina typically have smaller genomes, lower guanine-cytosine contents, lower numbers of genes, and higher rates of molecular sequence evolution compared with Pezizomycotina. These results provide a robust evolutionary framework for understanding the diversity and ecological lifestyles of the largest fungal phylum.

INTRODUCTION

The fungal phylum Ascomycota is one of most diverse phyla of eukaryotes with at least 83,000 described species that represent more than half of all described species of fungi (1). The Ascomycota is divided into three subphyla. The Saccharomycotina subphylum is a lineage of more than 1000 known species and 12 major clades (2), commonly referred to as budding yeasts. Species in this lineage include the model organism Saccharomyces cerevisiae (3) and several notable pathogens, such as the opportunistic human pathogen Candida albicans (4) and the multidrug-resistant emerging pathogen Candida auris (5). The Pezizomycotina subphylum contains more than 82,000 described species in 16 classes (1), commonly referred to as filamentous fungi. This subphylum contains several major plant and animal pathogens belonging to diverse genera, such as Fusarium and Aspergillus (6, 7). Last, the Taphrinomycotina subphylum contains ~140 described species in five classes (1), commonly referred to as fission yeasts. This subphylum contains the model organism Schizosaccharomyces pombe (8).

To better understand the evolution of species diversity and ecological lifestyles in the Ascomycota, a robust framework of phylogenetic relationships and divergence time estimates is essential (911). In the last two decades, several studies have aimed to infer the phylogeny of the Ascomycota either using a handful of gene markers from hundreds of taxa (1216) or using hundreds of gene markers from tens of taxa (1720). To date, the most comprehensive “few-markers-from-many-taxa” phylogeny used a 6-gene, 420-taxon (8 Taphrinomycotina, 16 Saccharomycotina, and 396 Pezizomycotina) data matrix (12), whereas the most comprehensive genome-scale phylogeny used a 238-gene, 496-taxon (12 Taphrinomycotina, 76 Saccharomycotina, and 408 Pezizomycotina) data matrix (21) but was inferred using FastTree, a program that is faster but typically yields phylogenies that have much lower likelihood scores than those obtained by IQ-TREE and RAxML/RAxML-NG (22). Key relationships supported by these studies include the monophyly of each subphylum and class and the sister group relationship of subphyla Saccharomycotina and Pezizomycotina. In contrast, relationships among classes are contentious between studies, particularly with respect to relationships between the 16 classes in Pezizomycotina (1). For example, there is disagreement whether the sister group to the rest of classes in the Pezizomycotina is class Pezizomycetes (13), class Orbiliomycetes (16), or a clade composed of both (18).

Previous molecular clock-based estimates of divergence times for Ascomycota have all been based on few-markers-from-many-taxa data matrices (13, 14, 2325), resulting in age estimates for key events in Ascomycota evolution that have wide intervals. For example, analysis of a 6-gene, 121-taxon (1 Saccharomycotina, 118 Pezizomycotina, and 2 Taphrinomycotina) data matrix inferred that the origin of the phylum Ascomycota took place 531 million years (Ma) ago [95% credibility interval (CI), 671 to 410 Ma ago) (see their scenario 4) (14), while analysis of a 4-gene, 145-taxon (12 Saccharomycotina, 129 Pezizomycotina, and 4 Taphrinomycotina) data matrix inferred that the phylum originated 588 Ma ago (95% CI, 773 to 487 Ma ago) (13). The sparser taxon sampling of previous studies has prevented estimation of divergence times of several key divergence events of higher taxonomic ranks (2325) and stymied our understanding of their evolutionary pace. While these studies have substantially advanced our understanding of Ascomycota evolution, a comprehensive, genome-scale phylogeny and timetree stemming from the sampling of hundreds of genes from thousands of taxa from the phylum are still lacking.

A robust phylogenomic framework would also facilitate comparisons of genome evolution across the subphyla of Ascomycota. For example, the three subphyla differ in their genome sizes, with the genomes of Pezizomycotina species being notably larger (~42 Mb) than those of Saccharomycotina (~13 Mb) and Taphrinomycotina (~14 Mb) (26). While several recent studies have analyzed major lineages within the two largest subphyla (in terms of number of described species), Saccharomycotina (2, 19, 27) and Pezizomycotina (28, 29), comparisons of genome evolution across the two subphyla are lacking. For example, a recent analysis of the tempo and mode of genome evolution in 332 Saccharomycotina found evidence of high evolutionary rates and reductive evolution across this subphylum (2), but whether budding yeasts are faster evolving than filamentous fungi remains unknown. However, a recent analysis of 71 Ascomycota genomes showed that Pezizomycotina has much higher levels of gene order divergence than Saccharomycotina (20). Similarly, genome-wide examinations of horizontal gene transfer events in dozens to more than a hundred Ascomycota genomes have revealed that Pezizomycotina acquired substantially higher numbers of genes from prokaryotic donors than did Saccharomycotina (30). Although these studies have contributed to our understanding of certain evolutionary processes in the phylum, we still know relatively little about the evolution of Ascomycota genomes and their properties.

There are currently more than 1000 publicly available genomes from Ascomycota species, which span the diversity of Saccharomycotina (332 genomes representing all 12 major clades), Pezizomycotina (761 genomes representing 9 of 16 classes), and Taphrinomycotina (14 genomes representing 4 of 5 classes) (1107 genomes as of 14 December 2018). These 1107 genomes represent a much larger and representative source of genomic data across the Ascomycota phylum than previously available, providing a unique opportunity to infer a genome-scale phylogeny and timetree for the phylum and compare the mode of genome evolution across its subphyla.

RESULTS AND DISCUSSION

A genome-scale phylogeny of the fungal phylum Ascomycota

To infer a genome-scale phylogeny of the Ascomycota, we used 1107 publicly available genomes from species belonging to Ascomycota (Saccharomycotina, 332; Pezizomycotina, 761; and Taphrinomycotina, 14) and six out-groups from the sister fungal phylum Basidiomycota. The numbers of species examined represent ~332 of 1000 (~30%) of all described species of Saccharomycotina, ~761 of 82,000 (~1%) of Pezizomycotina, and ~14 of 140 (~10%) of Taphrinomycotina. All genomes were retrieved from the National Center for Biotechnology Information (NCBI) GenBank database, ensuring that only one genome per species was included (tables S1 and S2). Analysis of genome assembly completeness reveals that 1021 of 1113 (~92%) genomes have more than 90% of the 1315 full-length Benchmarking Universal Single-Copy Orthologs (BUSCO) genes (fig. S1) (31).

The 1315 BUSCO genes from 1107 Ascomycota species and six out-groups were used to construct a phylogenomic data matrix (see Materials and Methods). After constructing the multiple amino acid sequence alignment and trimming ambiguous regions for each of these 1315 BUSCO genes, we kept only the 815 BUSCO genes that had taxon occupancy of ≥50% for each subphylum (i.e., ≥7 Taphrinomycotina, ≥166 Saccharomycotina, and ≥381 Pezizomycotina) and whose amino acid sequence alignments were ≥300 sites in length. In the final set of 815 BUSCO genes, alignment lengths range from 300 to 4585 amino acid sites (average, 690), and numbers of taxa range from 851 to 1098 (average, 1051) (table S3). The final data matrix contains 1107 taxa, 815 genes, and 562,376 amino acid sites.

Inference using concatenation- and coalescent-based approaches yielded a robust, comprehensive phylogeny of the Ascomycota phylum (Fig. 1). The vast majority of internodes in both the concatenation-based (1103 of 1110, 99%) and the coalescent-based phylogeny (1076 of 1110, 97%) received strong (≥95%) support and were congruent between the phylogenies inferred using the two approaches; only 46 of 1110 (4%) internodes were incongruent between the two phylogenies (figs. S2 and S3).

Fig. 1 Maximum likelihood (ML) phylogeny of 1107 taxa in the fungal phylum Ascomycota.

The concatenation-based ML phylogeny (lnL = −269,043,834.145) was inferred from a set of 815 BUSCO amino acid genes (total of 562,376 sites) under a single LG + G4 substitution model using IQ-TREE multicore version 1.5.1. The number of species sampled in each subphylum is given in parentheses. Internal branch labels are acronyms for 12 major clades in the subphylum Saccharomycotina and nine classes in the subphylum Pezizomycotina. S. cerevisiae, Saccharomyces cerevisiae; H. vineae, Hanseniaspora vineae; D. hansenii, Debaryomyces hansenii; O. sinensis, Ophiocordyceps sinensis; P. hyperparasitica, Pseudovirgaria hyperparasitica; A. nidulans, Aspergillus nidulans. Images representing taxa were drawn by hand, taken from PhyloPic (http://phylopic.org), or modified from Google Images. The bar next to each species indicates the guanine-cytosine (GC) content. On average, lineages in the subphylum Saccharomycotina have significantly lower GC content (49.6% versus 40.6%; Wilcoxon rank-sum test; P = 3.07 × 10−103) but higher evolutionary rate (1.80 substitutions per site versus 1.12 substitutions per site; Wilcoxon rank-sum test; P = 6.57 × 10−126) compared with lineages in the subphylum Pezizomycotina. The complete phylogenetic relationships of 1107 taxa are given in fig. S2 and in the Figshare repository. For easy determination of the relationships among any subset of taxa, the phylogeny is also available through Treehouse (63).

Our higher-level phylogeny of Ascomycota is generally more congruent with previous genome-scale phylogenies (2, 17, 18) than with few-genes-from-many-taxa phylogenies (1215), particularly with respect to relationships among the nine classes in the subphylum Pezizomycotina. For example, genome-scale studies, including ours, consistently favor a clade consisting of Pezizomycetes and Orbiliomycetes as the sister group to the rest of the Pezizomycotina (17, 18), while studies based on a few genes recovered either Orbiliomycetes (1416) or Pezizomycetes (13) as the sister class to the rest of the Pezizomycotina (Fig. 2A). Our phylogeny also strongly supported the placement of the class Schizosaccharomycetes, which includes the model organism S. pombe, as the sister group to the class Pneumocystidomycetes, which contains the human pathogen Pneumocystis jirovecii (Fig. 2B). A recent genome-scale study of 84 fungal genomes showed that our result is consistent with the phylogeny inferred using an alignment-free composition vector approach but not with the phylogeny inferred using maximum likelihood (ML), which instead recovered Schizosaccharomycetes as the sister group to Taphrinomycetes (17). Last, both concatenation- and coalescent-based approaches supported the placement of the subphylum Saccharomycotina as the sister group to the subphylum Pezizomycotina (Fig. 2C). This result is consistent with most previous studies that analyze multiple sequence alignment data (15, 17, 19, 32, 33), but not with a recent study that analyzed genomic data with an alignment-free method and placed the subphylum Saccharomycotina as the sister group to the subphylum Taphrinomycotina (34).

Fig. 2 Distribution of phylogenetic signal for three historically contentious relationships within Ascomycota.

For each relationship/internal branch [(A) Which class(es) is the sister group to the rest of the Pezizomycotina? (B) What is the relationship among three classes Schizosaccharomycetes, Pneumocystidomycetes, and Taphrinomycetes in the subphylum Taphrinomycotina? (C) What is the relationship among three subphyla Pezizomycotina, Saccharomycotina, and Taphrinomycotina in the phylum Ascomycota?], we applied the framework presented by Shen et al. (35) to examine proportions of genes (left) and sites (right) supporting each of three competing hypotheses (topology 1 or T1 in red, topology 2 or T2 in green, and topology 3 or T3 in yellow). Note that both concatenation- and coalescent-based approaches supported T1 in our study. Dashed horizontal lines on one-third y axis value denote expectation of proportion of genes/sites under a polytomy scenario. The G-test was used to test whether the sets of three values are significantly different (***P ≤ 0.001). All values are given in tables S4 and S5. Input and output files associated with phylogenetic signal estimation are also deposited in the Figshare repository.

To evaluate whether our genome-scale data matrix robustly resolved the three historically contentious branches discussed in the previous paragraph, we quantified the distribution of phylogenetic signal for alternative topologies of these three phylogenetic hypotheses at the level of genes and sites using an ML framework presented by Shen et al. (35). First, we found that the phylogenetic support for each of the three branches stemmed from many genes, i.e., it was not dominated by a small number of genes with highly disproportionate influence (table S4). Second, we found that the topology recovered by both concatenation- and coalescent-based approaches in our study had significantly higher frequencies of supporting genes and supporting sites (G-test) than two alternative competing hypotheses, ranging from 0.45 to 0.65, in all three branches examined (Fig. 2, A to C, and tables S4 and S5). Neither of the two alternative conflicting phylogenetic hypotheses for each of the three branches received frequencies of supporting genes and supporting sites that were equal to or greater than one-third (0.33), the value expected if the relationships among the taxa were represented by a polytomy. In summary, we found that very few branches (<5%) were incongruent between concatenation- and coalescent-based phylogenies (figs. S2 and S3). We also found that individual genes and sites exhibited robust phylogenetic signal for specific, historically contentious branches (Fig. 2, A to C). These results suggest that genome-scale amounts of data from a comprehensive sampling of taxa will robustly resolve major lineages of the tree of life.

A genome-scale evolutionary timetree of the fungal phylum Ascomycota

We next used the robust phylogeny, a relaxed molecular clock approach, and six widely accepted time calibration nodes (see Materials and Methods) to infer the time scale of evolution of Ascomycota. We inferred the origin of the phylum to have taken place 563 Ma ago (95% CI, 631 to 495 Ma ago), the origin of the subphylum Saccharomycotina 438.4 Ma ago (CI, 590 to 304 Ma ago), the origin of the subphylum Pezizomycotina 407.7 Ma ago (CI, 631 to 405 Ma ago), and the origin of Taphrinomycotina crown group 530.5 Ma ago (CI, 620 to 417 Ma ago). Notably, the taxonomic placement of all budding yeasts into a single class, Saccharomycetes, whose origin coincides with the origin of the subphylum Saccharomycotina, means that the last common ancestor of this sole class of budding yeasts is much more ancient than those of any of the nine classes (based on current taxon sampling) in the subphylum Pezizomycotina (fig. S4 and table S6). For example, the most ancient class in Pezizomycotina is Pezizomycetes, whose origin is dated 247.7 Ma ago (CI, 475 to 193 Ma ago) (fig. S4 and table S6). The other outlier, albeit with much larger confidence intervals, is class Neolectomycetes in Taphrinomycotina, which we estimate to have originated 480.4 Ma ago (CI, 607 to 191 Ma ago) (fig. S4 and table S6).

Comparison of our inferred dates of divergence to recent studies using a few genes (11, 13, 14) shows that our estimates are generally younger. For example, our estimate for the phylum Ascomycota is around 563 Ma ago, whereas the origin of the Ascomycota was dated at least 533 Ma ago by Hyde et al. (9). However, recent studies that used 18S and 28S ribosomal RNA markers from 111 fungal taxa (11) and four gene markers from 145 Ascomycota taxa (13) inferred the origin of the Ascomycota to have taken place around 606 and 588 Ma ago, respectively. This result is consistent with findings of previous studies [e.g., (36)], where inclusion of large numbers of genes was found to also result in younger estimates of divergence times, perhaps because of the influence of larger amounts of data in decreasing the stochastic error involved in date estimation. In summary, generation of a genome-scale timetree for more than 1000 ascomycete species spanning the diversity of the phylum provides a robust temporal framework for understanding and exploring the origin and diversity of Ascomycota lifestyles.

Contrasting modes of genome evolution in fungal phylum Ascomycota

To begin to understand the similarities and differences in the modes of genome evolution between subphyla, we assessed seven different genomic properties for each species. Because the number of genomes from Taphrinomycotina taxa is substantially smaller (14) than Saccharomycotina (332 genomes) and Pezizomycotina (761) (Fig. 3), we focused on examining the evolution of different genomic properties between Saccharomycotina and Pezizomycotina. Specifically, we found that Saccharomycotina exhibited a 1.6-fold higher evolutionary rate (on average, 1.80 substitutions per site in Saccharomycotina versus 1.12 substitutions per site in Pezizomycotina), 1.24-fold lower guanine-cytosine (GC) content (40% versus 50%), 3-fold smaller genome size (13 Mb versus 39 Mb), 1.9-fold lower number of protein-coding genes (5734 versus 10,847), 1.3-fold lower number of DNA repair genes (41 versus 54), 1.2-fold higher number of tRNA genes (179 versus 146), and 1.3-fold smaller estimates of nonsynonymous to synonymous substitution rate ratio (dN/dS) (0.053 versus 0.063), compared with Pezizomycotina (Fig. 3A, Table 1, and table S7). In addition, we found that gene density (that is, the number of genes divided by genome size) in Saccharomycotina is much higher than that in Pezizomycotina (on average, 451 genes per megabase in Saccharomycotina versus 290 genes per megabase in Pezizomycotina), which is in agreement with previous work showing that gene density is inversely correlated with genome size (37).

Fig. 3 Contrasting patterns for seven genomic properties between Pezizomycotina and Saccharomycotina.

As the subphylum Taphrinomycotina (no. of species = 14) has a much smaller number of species than the subphylum Saccharomycotina (no. of species = 332) and the subphylum Pezizomycotina (no. of species = 761) in our dataset, we focused our analyses on the comparisons of seven genome properties between Saccharomycotina and Pezizomycotina. (A) For each species in Pezizomycotina (colored in red, n = 761) and Saccharomycotina (colored in green, n = 332), we calculated evolutionary rate, GC content, genome size, number of protein-coding genes, number of DNA repair genes, number of tRNA genes, and dN/dS (see Materials and Methods for details). The Wilcoxon rank-sum test was used to test whether the sets of values in two subphyla are significantly different. (B) Pairwise standard Pearson’s correlation coefficient among pairs of the seven genomic properties was conducted using R 3.4.2 for Pezizomycotina (lower diagonal) and Saccharomycotina (upper diagonal), respectively. For each cell, the top value corresponds to the P value (NS, P > 0.05; *P ≤ 0.05; **P ≤ 0.01; ***P ≤ 0.001), whereas the bottom value corresponds to Pearson’s coefficient value. Orange cells denote instances where correlation trends in Pezizomycotina and Saccharomycotina are in opposite directions, whereas blue cells denote instances where the trends are in the same direction. The detailed values of all seven properties in Pezizomycotina and Saccharomycotina are given in table S7. The correlations among these seven properties are largely consistent before (i.e., standard Pearson’s correlations) and after (i.e., phylogenetically independent contrasts) accounting for correlations due to phylogeny (see table S8).

Table 1 Summary of values for seven genomic properties in extant Saccharomycotina and Pezizomycotina and in the last common ancestors of Saccharomycotina and Pezizomycotina.

View this table:

Analysis of standard Pearson’s correlations among the seven genomic properties revealed that two pairs exhibited statistically significant contrasting patterns between Saccharomycotina and Pezizomycotina. Specifically, evolutionary rate shows negative correlation with GC content in Saccharomycotina but positive correlation in Pezizomycotina, and GC content shows negative correlation with number of DNA repair genes in Saccharomycotina but positive correlation in Pezizomycotina (Fig. 3B). These correlations are largely consistent before (i.e., standard Pearson’s correlations) and after (i.e., phylogenetically independent contrasts) accounting for correlations due to phylogeny (table S8).

For each of the seven properties, we used our genome-scale phylogeny (Fig. 1) to infer the ancestral character states and reconstruct their evolution in the Saccharomycotina ancestor and the Pezizomycotina ancestor. Comparison of ancestral states along branches on the Saccharomycotina part of the phylogeny to those on the Pezizomycotina part of the phylogeny showed that all genomic properties, except the number of tRNA genes, exhibited different modes of evolution (Fig. 4 and Table 1). For example, most Saccharomycotina branches exhibit evolutionary rates of at least 1.0 amino acid substitutions per site, whereas those of Pezizomycotina exhibit evolutionary rates between 0.7 and 1.4 substitutions per site (Fig. 4A). However, the inferred values for these properties of the last common ancestors of Saccharomycotina and Pezizomycotina are quite similar. For example, the inferred state values for the Saccharomycotina last common ancestor and the Pezizomycotina last common ancestor are 1.1 and 0.9 substitutions per site for evolutionary rate and 43 and 47% for GC content (Table 1), respectively. The same trends are also observed across lineages, such as Lipomycetaceae, which is the sister group to the rest of the Saccharomycotina, and the clade consisting of Pezizomycetes and Orbiliomycetes, which is the sister group to the rest of the Pezizomycotina (Fig. 4, A and B).

Fig. 4 Contrasting modes of genome evolution in Pezizomycotina and Saccharomycotina.

(A) For each of the seven genomic properties examined (see Materials and Methods for details), we reconstructed them as continuous traits on the species phylogeny (Fig. 1) and visualized their ancestral states with the R package phytools v0.6.44 (62). Heatmap bars denote ancestral state values from small (blue) to large (red). Three ancestral state values next to three red dots are shown for the ancestor of the subphyla Pezizomycotina and Saccharomycotina, the ancestor of the subphylum Pezizomycotina, and the ancestor of the subphylum Saccharomycotina, respectively. Sub., substitution. (B) Phylogeny key showing the placement of the 21 nodes representing the last common ancestors of the 12 major clades in the subphylum Saccharomycotina and of the nine classes in the subphylum Pezizomycotina; the 21 nodes are indicated by the red dots. The orders of branches in (A) are identical to those in (B).

Comparison of the trait values for the seven genome properties between extant Saccharomycotina and Pezizomycotina branches to those of the Saccharomycotina and Pezizomycotina last common ancestors showed that evolutionary rate, GC content, genome size, and number of protein-coding genes were the properties with the highest amounts of evolutionary change (Figs. 3 and 4 and Table 1). Ancestral state reconstruction also enabled inference of the direction of evolutionary change for each of the evolutionary properties. For example, the Saccharomycotina and Pezizomycotina last common ancestors, as well as branches in Lipomycetaceae and branches across Pezizomycotina, exhibit similar evolutionary rates, whereas the rest of the nodes and branches in the Saccharomycotina part of the phylogeny exhibit much higher evolutionary rates. This pattern suggests that the higher levels of genomic diversity in Saccharomycotina stem from an acceleration of evolutionary rate that occurred within the subphylum, after the divergence of Lipomycetaceae from the rest of the Saccharomycotina (Fig. 4, A and B).

Why do Saccharomycotina exhibit higher evolutionary rates compared with Pezizomycotina? Studies in other lineages, such as vertebrate animals (38), have previously shown that evolutionary rate is positively associated with generation time. Assuming that mutation rates are equal, species with shorter generation times will replicate their genomes more frequently, accruing more mutations per unit time. While the generation times of most fungi in our phylogeny are unknown, the generation times of model organisms in Saccharomycotina are thought to be shorter than those in Pezizomycotina. For example, the doubling time of the budding yeasts S. cerevisiae and C. albicans under optimal conditions is 90 min (39), while that of the filamentous fungi Aspergillus nidulans and Neurospora crassa is between 2 and 3 hours (40, 41). It is also well established that absence or loss of DNA repair genes can lead to an increase in genomes’ mutation rates (42). We found that Saccharomycotina have, on average, fewer DNA repair genes (41) than Pezizomycotina (54) (Fig. 3 and Table 1). Thus, an alternative but not mutually exclusive explanation for the higher evolutionary rates in Saccharomycotina may stem at least partially from the presence of a smaller set of genes involved in DNA repair in their genomes than in the genomes of Pezizomycotina. Last, other life history traits (e.g., smaller cell size, faster metabolism, and larger population size) that have been associated with variation in the rate of molecular evolution (43) might also contribute to higher evolutionary rates of the Saccharomycotina.

Variation in genomic GC content has historically been of broad interest in biology. Average GC content values of different genomic regions (e.g., intergenic regions and protein-coding regions) in Saccharomycotina are consistently lower than those in Pezizomycotina (figs. S5 and S6A). Similarly, gene-wise average estimates of GC content showed that all 815 BUSCO genes in Saccharomycotina have lower GC content values than those in Pezizomycotina (fig. S6B). Moreover, we found that the frequencies of amino acids encoded by GC-rich codons in Saccharomycotina are much lower than those of amino acids encoded by GC-rich codons in Pezizomycotina (fig. S7). Ancestral state reconstruction of genomic GC content along branches on the phylogeny shows that the last common ancestors of the Saccharomycotina and Pezizomycotina, as well as branches in Lipomycetaceae and branches in classes Pezizomycetes and Orbiliomycetes, exhibit intermediate GC content of around 45%. In contrast, GC content of most branches within the rest of Saccharomycotina (i.e., all major clades of Saccharomycotina, including extant taxa, except Lipomycetaceae) evolved toward 40%, while GC content within the rest of Pezizomycotina (i.e., all classes, including extant taxa, except Pezizomycetes and Orbiliomycetes) evolved toward 50%. This pattern suggests that the evolution of lower overall GC content in Saccharomycotina occurred after the divergence of Lipomycetaceae from the rest of Saccharomycotina and that the evolution of higher overall GC content in Pezizomycotina occurred after the divergence of the clade consisting of Pezizomycetes and Orbiliomycetes from the rest of Pezizomycotina (Fig. 4, A and B).

Why are Pezizomycotina genomes more GC rich compared with Saccharomycotina genomes? There are two possible explanations. The first one is that mutational biases have skewed the composition of Saccharomycotina genomes toward AT (adenine-thymine) content (44). For example, Steenwyk et al. showed that Hanseniaspora budding yeasts with higher AT content lost a greater number of DNA repair genes than those with lower AT content (42), suggesting that the loss of DNA repair genes is associated with AT richness. Consistent with these results, we found that Pezizomycotina genomes contain a higher number of DNA repair genes than Saccharomycotina (Fig. 3 and Table 1). The second potential, not necessarily mutually exclusive, explanation is that mutational biases have skewed Pezizomycotina genomes toward GC richness. It was recently shown that increasing GC-biased gene conversion (gBGC), a process associated with recombination that favors the transmission of GC alleles over AT alleles, can result in a systematic underestimate of dN/dS in birds (45). If this is true for Ascomycota, because of the higher GC content of Pezizomycotina genomes, we would expect that their dN/dS would be underestimated due to the higher levels of gBGC compared with Saccharomycotina. Consistent with this expectation, by calculating differences in dN/dS before and after accounting for gBGC across 815 codon-based BUSCO genes, we found that the underestimate of dN/dS in Pezizomycotina is twofold higher than that in Saccharomycotina (Pezizomycotina: average of differences in dN/dS = 0.004; Saccharomycotina: average of differences in dN/dS = 0.002) (fig. S8).

CONCLUSION AND FUTURE WORK

In this study, we took advantage of the recent availability of the genome sequences of 1107 Ascomycota species from Saccharomycotina (332), Pezizomycotina (761), and Taphrinomycotina (14) to infer a genome-scale phylogeny and timetree for the phylum and compare the mode of genome evolution across its subphyla. Although we collected publicly available genomes for the greatest possible taxonomic sampling as of 14 December 2018, we note that the genome sequences of the 1107 taxa sampled are unevenly distributed across the three subphyla of Ascomycota. Furthermore, our sample of taxa did not contain novel species found in environmental samples (46). Similarly, while our data matrix includes 815 genes, thousands of additional homologous genes could potentially be added. Therefore, it is possible that both the inference of the species phylogeny and the estimation of divergence times of Ascomycota may change as more taxa and genes are added.

These caveats notwithstanding, leveraging genome-scale amounts of data from the most comprehensive taxon set to date enabled us to test the robustness of our inference for several contentious branches, potentially resolving controversies surrounding key higher-level relationships within the Ascomycota phylum. For example, our study robustly supported Saccharomycotina as the sister group to Pezizomycotina, and a clade composed of classes Pezizomycetes and Orbiliomycetes as the sister group to the rest of the Pezizomycotina. Our first genome-scale timetree suggests the last common ancestor of Ascomycota likely originated in the Ediacaran period. Examination of mode of genome evolution revealed that Saccharomycotina, which contains the single currently described class Saccharomycetes, and Pezizomycotina, which contains 16 classes, exhibited greatly contrasting evolutionary processes for seven genomic properties, in particular for evolutionary rate, GC content, and genome size. Our results provide a robust evolutionary framework for understanding the fundamental genetic and ecological processes that have generated the biodiversity of the largest fungal phylum.

MATERIALS AND METHODS

Data collection

To collect the greatest possible set of genome representatives of the phylum Ascomycota as of 14 December 2018, we first retrieved the 332 publicly available Saccharomycotina yeast genomes (https://doi.org/10.6084/m9.figshare.5854692) from a recent comprehensive genomic study of the Saccharomycotina yeasts (2). We then used “Pezizomycotina” and “Taphrinomycotina” as search terms in NCBI’s Genome Browser (www.ncbi.nlm.nih.gov/genome/browse#!/eukaryotes/Ascomycota) to obtain the basic information of strain name, assembly accession number, assembly release date, assembly level (e.g., contig and scaffold), and GenBank File Transfer Protocol (FTP) access number for draft genomes from the subphyla Pezizomycotina and Taphrinomycotina, respectively. For species with multiple isolates sequenced, we only included the genome of the isolate with the highest assembly level and the latest release date. We next downloaded genome assemblies from GenBank data via FTP access number (https://ftp.ncbi.nlm.nih.gov/genomes/). Collectively, we included 332 species representing 77 genera and all 12 major clades of the subphylum Saccharomycotina, 761 species representing 257 genera and nine classes of the subphylum Pezizomycotina, and 14 species representing 6 genera and four classes of the subphylum Taphrinomycotina. Last, we used the genomes of six representatives of the phylum Basidiomycota as out-groups. To check whether our sample of taxa were biased toward particular ecological lifestyles, we obtained the trophic categories for the 1107 Ascomycota species used in this study [symbiotrophs (e.g., ectomycorrhizae and lichens), pathotrophs (e.g., biotrophs, parasites, and pathogens), and saprotrophs (e.g., wood rotters and litter rotters)] from the FUNGuild database (47). We found that Saccharomycotina and Pezizomycotina, the two subphyla with the largest numbers of taxa sampled, contained taxa from all three trophic categories in similar proportions. Specifically, the numbers of Saccharomycotina species classified as symbiotrophs, pathotrophs, saprotrophs, and at least two trophic categories are 2 of 154 (1%), 45 of 154 (29%), 47 of 154 (30%), and 60 of 154 (40%), respectively; the numbers of Pezizomycotina species classified as symbiotrophs, pathotrophs, saprotrophs, and at least two trophic categories are 43 of 702 (6%), 183 of 702 (26%), 176 of 702 (25%), and 300 of 702 (43%), respectively. Detailed information on taxonomy and source of the 1107 Ascomycota and six out-group genomes in our study are provided in tables S1, S2, and S7.

Assessment of genome assemblies and phylogenomic data matrix construction

To assess the quality of each of the 1113 genome assemblies, we used the BUSCO version 3.0.2 (31). Each assembly’s completeness was assessed on the basis of the presence/absence of a set of 1315 predefined orthologs (referred to as BUSCO genes) from 75 genomes in the OrthoDB version 9 database from the Ascomycota database, as described previously (27). In brief, for each BUSCO gene, its consensus orthologous protein sequence among the 75 reference genomes was used as query in a tBLASTn search against each genome to identify up to three putative genomic regions, and the gene structure of each putative genomic region was predicted by AUGUSTUS v3.2.2 (48). Next, the sequences of these predicted genes were aligned to the hidden Markov model profile of the BUSCO gene. BUSCO genes in a given genome assembly were considered as single copy, “full-length” if there was only one complete predicted gene present in the genome; duplicated, “full-length” if there were two or more complete predicted genes present in the genome; “fragmented” if the predicted gene was shorter than 95% of the aligned sequence lengths from the 75 reference species; and “missing” if there was no predicted gene present in the genome.

To construct the phylogenomic data matrix, we started with the set of 1315 single-copy, full-length BUSCO genes from 1107 representatives of the phylum Ascomycota and six out-groups. For each BUSCO gene, we first translated nucleotide sequences into amino acid sequences, taking into account the different usage of the CUG (cytosine-uracil-guanine) codon in Saccharomycotina (2). Next, we aligned the amino acid sequences using Multiple Alignment using Fast Fourier Transform (MAFFT) v7.299b (49) with the options “--thread 4 --auto --maxiterate 1000” and trimmed amino acid alignments using the trimAl v1.4.rev15 (50) with the options “-gappyout -colnumbering.” We mapped the nucleotide sequences on the trimmed amino acid alignment based on the column numbers in the original alignment and to generate the trimmed codon-based nucleotide alignment. Last, we removed BUSCO gene alignments whose taxon occupancy (i.e., percentage of taxa whose sequences were present in the trimmed amino acid alignment) was <50% for each subphylum (i.e., <7 Taphrinomycotina, <166 Saccharomycotina, and <381 Pezizomycotina) or whose trimmed alignment length was <300 amino acid sites. These filters resulted in the retention of 815 BUSCO gene alignments, each of which had ≥50% taxon occupancy for each subphylum and alignment length ≥300 amino acid sites.

Phylogenetic analysis

For each of the 815 BUSCO genes, we first inferred its best-fitting amino acid substitution model using IQ-TREE multithread version 1.6.8 (51) with options “-m TEST -mrate G4” with the Bayesian information criterion. We then inferred best-scoring ML gene tree under 10 independent tree searches using IQ-TREE. The detailed parameters for running each gene were kept in log files (see the Figshare repository). We inferred the concatenation-based ML tree using IQ-TREE on a single node with 32 logical cores under a single “LG + G4” model with the options “-seed 668688 -nt 32 -mem 220G -m LG+G4 -bb 1000”, as 404 of 815 genes favored “LG + G4” as best-fitting model (table S3). We also inferred the coalescent-based species phylogeny with ASTRAL-III version 4.10.2 (52) using the set of 815 individual ML gene trees. The reliability of each internal branch was evaluated using 1000 ultrafast bootstrap replicates and local posterior probability in the concatenation- and coalescence-based species trees, respectively. We visualized phylogenetic trees using the R package ggtree v1.10.5 (53).

We used the RelTime method, as implemented in the command line version of MEGA7 (54), to estimate divergence times. The concatenation-based ML tree with branch lengths was used as the input tree. Six time calibration nodes, which were retrieved from the TimeTree database (55), were used for molecular dating analyses: the S. cerevisiaeSaccharomyces uvarum split (14.3 to 17.94 Ma ago), the S. cerevisiaeKluyveromyces lactis split (103 to 126 Ma ago), the S. cerevisiaeC. albicans split (161 to 447 Ma ago), the origin of the subphylum Saccharomycotina (304 to 590 Ma ago), the S. cerevisiaeSaitoella complicata split (444 to 631 Ma ago), and the origin of the subphylum Pezizomycotina (at least 400 Ma ago) based on the Paleopyrenomycites devonicus fossil (10, 56).

To explore whether divergence time estimates were robust when using different methods, we also inferred divergence times using a Bayesian MCMCTree analysis (57). As the MCMCTree approach is computationally intractable for very large phylogenomic data matrices like the one we assembled for this study, we pruned the full data matrix to retain 48 representatives of 25 major groups (four classes in Taphrinomycotina, nine classes in Pezizomycotina, and 12 major clades in Saccharomycotina) so that we can estimate the divergence times of 47 nodes on the backbone of the Ascomycota phylogeny. We first estimated branch lengths under a single LG + G4 model with codeml in the PAML 4.9e package (57) and obtained a rough mean of the overall mutation rate. Next, we applied the approximate likelihood method to estimate the gradient vector and Hessian matrix with Taylor expansion (option usedata = 3). Last, we assigned (i) the Gamma-Dirichlet prior for the overall substitution rate (option rgene_gamma) as G(1, 8.36), with a mean of 0.11 (meaning 11 × 10−10 amino acid substitutions per site per year), (ii) the Gamma-Dirichlet prior for the rate-drift parameter (option sigma2 gamma) as G(1, 4.5), and (iii) the parameters for the birth-death sampling process with birth and death rates λ = μ = 1 and sampling fraction ρ = 0. We used the autocorrelated-rate model (option clock = 3) to account for the rate variation across different lineages and used soft bounds (left and right tail probabilities equal 0.025) to set minimum and maximum values for the four calibration splits mentioned above. The Markov chain Monte Carlo (MCMC) run was first run for 100,000 iterations as burn-in, then sampled every 500 iterations until a total of 3000 samples were collected. Our results showed that divergence time estimates for the 47 backbone nodes between the RelTime and MCMCTree methods are very similar, and their correlation is highly significant (Pearson’s correlation coefficient r = 0.9907; P < 0.00001) (fig. S9). Furthermore, the average percentage of deviation (that is the average percentage of the absolute deviation from a mean point) between the two methods is small (∼3%) (table S6).

Examination of seven genome properties

As the subphylum Taphrinomycotina (no. of species = 14) has a much smaller number of species than the subphylum Saccharomycotina (no. of species = 332) and the subphylum Pezizomycotina (no. of species = 761) in our dataset, we focused our analyses on the comparisons of seven genome properties (evolutionary rate, GC content, genome size, number of genes, number of DNA repair genes, number of tRNA genes, and dN/dS) between Saccharomycotina and Pezizomycotina. Specifically, for a given taxon, (i) evolutionary rate is a sum of path distances from the most common ancestor of the subphyla Saccharomycotina and Pezizomycotina to its tip on the concatenation-based ML tree (Fig. 1); (ii) GC content is the percentage of GC nucleotides in genome; (iii) genome size is the total number of base pairs in genome in megabases; (iv) number of genes is the number of protein-coding genes in genome [the gene structure was predicted with AUGUSTUS v3.3.1 (48) on Aspergillus fumigatus and S. cerevisiae S288C trained models for Pezizomycotina and Saccharomycotina, respectively]; (v) number of DNA repair genes was estimated by counting the number of unique protein-coding genes with Gene Ontology terms related to DNA repair using InterProscan version 5 (58); (vi) the number of tRNA genes is the number of tRNA genes inferred to be present using the tRNAscan-SE 2.0 program (59); and (vii) dN/dS was estimated by calculating the average of the ratio of the expected numbers of nonsynonymous (dN) and synonymous substitutions (dS) along the external branch (tip) of a given taxon across 815 trimmed codon-based BUSCO gene alignments and the concatenation-based ML tree. Note that the concatenation-based ML tree was pruned to match the taxon set in the codon-based alignment if the concatenation-based ML tree contained taxa that were absent from the codon-based alignment. We used the YN98 (F3X4) codon model and the free ratio model using bppml and MapNH in the bio++ libraries, following the study by Bolívar et al. (45).

Statistical analyses

All statistical analyses were performed in R v. 3.4.2 (R core team 2017). Pearson’s correlation coefficient was used to test for correlations among seven variables. To account for phylogenetic relationships of species in correlation analysis, we used the R package ape v5.1 (60) to compute phylogenetically independent contrasts following the method described by Felsenstein (61).

Ancestral state reconstruction

To reconstruct ancestral character states for each of seven continuous properties, we used the R package phytools v0.6.44 function contMap (62) to infer ancestral character states across internal nodes using the ML method with the function fastAnc and to interpolate the states along each edge using Eq. 2 of Felsenstein (61). The input tree was derived from the concatenation-based ML with branch lengths, which was then pruned to keep the 1093 taxa from the subphyla Pezizomycotina and Saccharomycotina.

SUPPLEMENTARY MATERIALS

Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/6/45/eabd0079/DC1

https://creativecommons.org/licenses/by-nc/4.0/

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.

REFERENCES AND NOTES

Acknowledgments: We thank members of the Rokas and Hittinger laboratories, especially members of the Y1000+ Project for constructive feedback. Funding: This work was conducted in part using the resources of the Advanced Computing Center for Research and Education (ACCRE) at Vanderbilt University. X.-X.S. was supported by a grant from the National Natural Science Foundation of China (No. 32071665) and the Fundamental Research Funds for the Central Universities (no. 2020QNA6019). X.Z. was supported by the National Key Project for Basic Research of China (973 Program, no. 2015CB150600) and the open fund from Key Laboratory of Ministry of Education for Genetics, Breeding and Multiple Utilization of Crops, College of Crop Science, Fujian Agriculture and Forestry University (GBMUC-2018-005). M.G. was supported by the Royal Netherlands Academy of Arts and Sciences. C.T.H. was supported by the National Science Foundation (DEB-1442148), the USDA National Institute of Food and Agriculture (Hatch Project No. 1020204), in part by the DOE Great Lakes Bioenergy Research Center (DOE BER Office of Science No. DE-SC0018409), the Pew Charitable Trusts (Pew Scholar in the Biomedical Sciences), and the Office of the Vice Chancellor for Research and Graduate Education with funding from the Wisconsin Alumni Research Foundation (H. I. Romnes Faculty Fellow). J.L.S. and A.R. were supported by the Howard Hughes Medical Institute through the James H. Gilliam Fellowships for Advanced Study program. A.R. was supported by the National Science Foundation (DEB-1442113), the Guggenheim Foundation, and the Burroughs Wellcome Fund. Author contributions: Study conception and design: X.-X.S., C.T.H., and A.R. Acquisition of data: X.-X.S. Analysis and interpretation of data: X-X.S., J.L.S., A.L.L., D.A.O., X.Z., J.K., Y.L., M.G., C.T.H., and A.R. Drafting of manuscript: X.-X.S. and A.R. Critical revision: all authors. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All genome assemblies and proteomes are publicly available in the Zenodo repository: https://doi.org/10.5281/zenodo.3783970. Multiple sequence alignments, phylogenetic trees, trait ancestral character state reconstructions, log files, R codes, and custom Perl scripts are available on the Figshare repository: https://doi.org/10.6084/m9.figshare.12196149.
View Abstract

Stay Connected to Science Advances

Navigate This Article