Research Article | EVOLUTIONARY BIOLOGY

Whole-genome sequencing of the blue whale and other rorquals finds signatures for introgressive gene flow

Science Advances  04 Apr 2018:
Vol. 4, no. 4, eaap9873
DOI: 10.1126/sciadv.aap9873
  • Fig. 1 MSC tree.

    (A) An MSC species tree was constructed from 34,192 individual GFs. Internal branches within Balaenopteridae are numbered 1 to 7. All branches receive maximal support (P = 1.0, ASTRAL analysis). Branch lengths were calculated from an ML analysis. Gray whales, family Eschrichtiidae, are placed inside Balaenopteridae as a sister group to fin and humpback whales. (B) ASTRAL quartet-score analyses for branches 1 to 7 (A). Quartet scores were calculated for the three possible arrangements (q1 to q3) for the respective branch. The principal quartet trees are depicted, with q1 representing the species tree. Branch nos. 2 and 3 receive only limited quartet scores, and no quartet can be significantly rejected.

  • Fig. 2 Median network of 34,192 GF ML trees with 11% threshold.

    Conflicting evolutionary signals characterize the center of the network, which is equivalent to branch no. 3 in the species tree (Fig. 1). The placement of the minke whale also shows some conflicting signal, but the elongated rectangle indicates a higher degree of resolution. The number of supporting GFs is shown for selected splits. Colored circles indicate taxonomic classification. Blue, Balaenoptera; red, Megaptera; yellow, Eschrichtius; green, Balaena and Eubalaena.

  • Fig. 3 Gene flow signals for baleen whales inferred by the D statistic, DFOIL, and PhyloNet.

    (A) The species tree of baleen whales with gene flow signals detected by the D statistic and DFOIL indicated by dashed lines. Signals I to IV were inferred by the D statistic, and signals V, VI, and VII were detected by DFOIL and were partially corroborated by the D statistic. Note that DFOIL cannot infer gene flow involving the minke whale. (B to D) Rooted networks for the Balaenopteridae sensu lato phylogeny with reticulations inferred from PhyloNet based on 34,192 20-kbp GFs. Reticulations are shown as blue arrows with inheritance probability denoted above or below. Log-likelihood scores are shown below the networks. Notably, an inheritance probability of around 33% resembles the distribution of quartet scores and the phylogenetic signals from GFs (Fig. 1). (B) The three best networks indicated a reticulation originating at one of the three circled branches and leading to the minke whale. Similar likelihood scores do not allow the identification of a single origin of gene flow; therefore, the networks were merged, and a range of inheritance probabilities is given. (C) The fourth best network has only a marginally poorer likelihood score and indicates a reticulation between the ancestor of the fin and humpback whale and that of the minke whale. (D) The fifth best network has the same likelihood as (C) and finds an alternative placement of the gray whale (blue branch) and a reticulation from the ancestor of the blue and sei whale to that of the minke whale.
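The D statistic named in the Fig. 3 caption is commonly defined from counts of two discordant site patterns, ABBA and BABA, in a four-taxon arrangement (P1, P2, P3, outgroup). The following is a minimal Python sketch of that standard definition and not the authors' pipeline. The function names and the toy sequences are hypothetical, and significance testing is omitted.

```python
VALID_BASES = set("ACGT")


def count_patterns(p1: str, p2: str, p3: str, outgroup: str) -> tuple[int, int]:
    """Count ABBA and BABA sites in four aligned sequences of equal length.

    The outgroup allele is treated as ancestral ("A"); the other allele
    at a biallelic site is treated as derived ("B").
    """
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        # Skip gaps, Ns, and other ambiguity codes.
        if not {a, b, c, o} <= VALID_BASES:
            continue
        if a == o and b == c and a != b:
            abba += 1  # P2 and P3 share the derived allele
        elif b == o and a == c and a != b:
            baba += 1  # P1 and P3 share the derived allele
    return abba, baba


def d_statistic(abba: int, baba: int) -> float:
    """D = (ABBA - BABA) / (ABBA + BABA)."""
    total = abba + baba
    if total == 0:
        raise ValueError("no ABBA or BABA sites found")
    return (abba - baba) / total


# Toy alignment (invented for illustration only).
abba, baba = count_patterns("AGAT", "GACT", "GGCT", "AAAT")
print(abba, baba, d_statistic(abba, baba))
```

Under this sign convention, a positive D indicates excess allele sharing between P2 and P3, and a negative D indicates excess sharing between P1 and P3. A value near zero is what incomplete lineage sorting alone would be expected to produce.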

  • Fig. 4 Demographic history and genome-wide heterozygosity.

    (A) Genome-wide heterozygosity estimated from genomic 100-kbp windows. (B) Historical Ne using the PSMC analyses for all baleen whale genomes. The x axis shows the time, and the y axis shows Ne. Plots were scaled using a mutation rate (μ) of 1.39 × 10^−8 substitutions nucleotide^−1 generation^−1 and species-specific generation times (g). Generation times are noted next to the species names. Light brown shading indicates interglacials (IG) in the Pleistocene and Holocene, and gray shading indicates the MPT and the PPT.
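The scaling step mentioned for Fig. 4B can be illustrated with a short Python sketch. PSMC reports time and population size in scaled units, and converting them to years and Ne requires μ and g. The sketch assumes the conventional PSMC relations (θ0 = 4·N0·μ per bin of s base pairs, years = 2·N0·t·g, Ne = N0·λ) and the default bin size of 100 bp. The function name, the input values, and the generation time of 25 years are hypothetical and are not taken from the article.

```python
MU = 1.39e-8    # substitutions per nucleotide per generation (from the caption)
BIN_SIZE = 100  # bp per PSMC bin; assumed default, not stated in the caption


def scale_psmc(theta0, scaled_times, scaled_sizes, gen_time,
               mu=MU, bin_size=BIN_SIZE):
    """Convert scaled PSMC output to years and effective population size.

    theta0       -- per-bin theta estimated by PSMC
    scaled_times -- interval boundaries t_k, in units of 2*N0 generations
    scaled_sizes -- relative population sizes lambda_k, in units of N0
    gen_time     -- generation time g in years
    """
    n0 = theta0 / (4 * mu * bin_size)
    years = [2 * n0 * t * gen_time for t in scaled_times]
    ne = [n0 * lam for lam in scaled_sizes]
    return years, ne


# Invented example values, chosen so that N0 is about 10,000.
years, ne = scale_psmc(theta0=0.0556,
                       scaled_times=[0.0, 0.5],
                       scaled_sizes=[1.0, 2.0],
                       gen_time=25.0)
print(years, ne)
```

Because both axes scale with 1/μ and the time axis also scales with g, a different choice of mutation rate or generation time shifts the curves without changing their shape.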

  • Fig. 5 Divergence time tree of Cetancodonta (56) including the newly sequenced baleen whales, estimated from 234,947 amino acid sites (2778 orthologs).

    Rorquals diverged in the late Miocene, 10.5 to 7.5 Ma ago. Four other cetartiodactyl species were also included but not shown due to space constraints; the dog (Canis lupus familiaris) was used as an outgroup. Five calibration points were used for dating (table S8) (29, 56–60).

Supplementary Materials

  • Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/4/4/eaap9873/DC1

    fig. S1. Possible tree topologies for baleen whales that were evaluated by the AU test.

    fig. S2. Phylogenetic content of GFs.

    fig. S3. AU test for increasing GF sizes.

    fig. S4. MSC-based species trees generated by ASTRAL using 34,192 GFs, with each GF being 20 kbp long.

    fig. S5. Phylogenetic tree from mitochondrial genomes for baleen whales.

    fig. S6. A majority-rule consensus tree from 34,192 individual GF ML trees (table S6) calculated with the program CONSENSE of the PHYLIP package.

    fig. S7. Consensus networks for baleen whales from 34,192 gene trees (10-kbp GF) at different minimum thresholds of gene trees to form an edge.

    fig. S8. ML estimates of genome-wide heterozygosity estimated with mlRho.

    fig. S9. Blue whale heterozygosity for different sequencing depth.

    fig. S10. Demographic histories for each individual whale genome with 100 bootstrap replicates.

    table S1. Sequencing and mapping statistics.

    table S2. Occurrences of repetitive elements in the bowhead whale genome.

    table S3. Number of called substitutions for each whale genome.

    table S4. Library and sequencing information for the hippopotamus genome assembly.

    table S5. Summary of repetitive elements in the hippopotamus genome.

    table S6. A majority-rule consensus analysis of 34,192 individual GF ML trees.

    table S7. Common names, scientific names, accession numbers, and source database of additional genomes that were included in the divergence time analyses.

    table S8. Calibration points used for the divergence time tree, node age estimates in million years ago, and references.

    table S9. Divergence time estimates for Artiodactyla and Cetacea for nodes in the divergence time tree (Fig. 5).

    data S1 (Microsoft Excel format). D statistics results.

    data S2 (Microsoft Excel format). DFOIL results.
