Research Article
EVOLUTIONARY BIOLOGY

Improved de novo genomic assembly for the domestic donkey


Science Advances, 04 Apr 2018:
Vol. 4, no. 4, eaaq0392
DOI: 10.1126/sciadv.aaq0392
  • Fig. 1 Distribution of the cumulative scaffold length compared to previously published genome assemblies.

    The red line represents the genome assembly obtained in this work using the Chicago HiRise technology. It shows that the greater N50 of our new assembly, relative to two previously reported assemblies, is not simply due to a few longer scaffolds. Mbp, million base pairs.
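A cumulative scaffold-length curve like the one in Fig. 1 is typically built by sorting scaffolds from longest to shortest and plotting the running total of their lengths. The minimal Python sketch below assumes that construction; the helper name and the toy lengths are hypothetical and are not taken from this study.

```python
from itertools import accumulate


def cumulative_scaffold_curve(lengths):
    """Return (rank, cumulative length) pairs, longest scaffold first.

    Hypothetical helper: plotting rank on the x-axis against the
    cumulative length gives a curve of the kind shown in Fig. 1.
    """
    ordered = sorted(lengths, reverse=True)
    return list(enumerate(accumulate(ordered), start=1))


if __name__ == "__main__":
    # Toy scaffold lengths in bp; illustrative only, not data from this study.
    toy_lengths = [500, 3_000, 1_200, 800]
    for rank, total in cumulative_scaffold_curve(toy_lengths):
        print(rank, total)
```

A curve that stays above the others across many ranks, rather than only over the first few scaffolds, indicates that the higher N50 is not driven by a handful of outliers.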

  • Fig. 2 Heterozygosity rates for various equine species.

    The heterozygosity estimates were computed from the same data, aligned both to the horse genome (EquCab2.0), as reported in a previous study, and to the donkey reference presented in this study.
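Conceptually, a heterozygosity rate is the number of heterozygous sites divided by the number of sites that could be called. The sketch below assumes genotype calls are already available as allele pairs; the input format and function name are hypothetical, and this is not the study's pipeline. Because the reference used for alignment can change which sites are callable and which heterozygous calls are made, the same reads can yield different rates against the horse and donkey references.

```python
def heterozygosity_rate(genotypes):
    """Fraction of callable sites that are heterozygous.

    genotypes: iterable of (allele1, allele2) tuples, or None for sites
    that could not be called (for example, low coverage or unmapped).
    """
    callable_sites = 0
    heterozygous_sites = 0
    for genotype in genotypes:
        if genotype is None:
            # Uncallable sites are excluded from the denominator.
            continue
        callable_sites += 1
        if genotype[0] != genotype[1]:
            heterozygous_sites += 1
    if callable_sites == 0:
        raise ValueError("no callable sites")
    return heterozygous_sites / callable_sites


if __name__ == "__main__":
    # Toy genotype calls; None marks a site with no usable call.
    calls = [("A", "A"), ("A", "G"), None, ("C", "C"), ("T", "C")]
    print(heterozygosity_rate(calls))  # 0.5 (2 heterozygous / 4 callable)
```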

  • Fig. 3 Demographic trajectories of zebras and asses during the last ~2.5 million years (Ma).

    (A and B) PSMC reconstruction of the effective population size over time, for different ass species (A) and zebra species (B). The first 100 ka are highlighted for the ass and zebra species.
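PSMC reports population sizes and times in scaled coalescent units; converting them to individuals and years requires an assumed per-generation mutation rate and generation time. The sketch below follows the scaling used by the `psmc_plot.pl` utility distributed with PSMC, as we understand it: N0 = θ0 / (4 μ s), time = 2 N0 t_k g, and size = N0 λ_k, where s is the bin size in bases. The mutation rate and generation time in the example are placeholders, not the values used in this study, and the function name is hypothetical.

```python
def psmc_to_real_units(theta0, t_k, lambda_k, mu, generation_years, bin_size=100):
    """Scale one PSMC time interval to (years before present, effective size).

    theta0: scaled mutation rate per bin reported by PSMC.
    t_k, lambda_k: interval start time and relative size in coalescent units.
    mu: assumed mutation rate per site per generation (placeholder).
    generation_years: assumed generation time in years (placeholder).
    bin_size: bases per PSMC bin (100 is the common default).
    """
    n0 = theta0 / (4 * mu * bin_size)
    years = 2 * n0 * t_k * generation_years
    effective_size = n0 * lambda_k
    return years, effective_size


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    years, ne = psmc_to_real_units(theta0=0.004, t_k=0.5, lambda_k=2.0,
                                   mu=1e-8, generation_years=8)
    print(round(years), round(ne))  # 8000 2000
```

Because both axes scale with the assumed μ and generation time, changing either assumption shifts the whole trajectory rather than its shape.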

  • Fig. 4 Dot plot showing the correspondence of unique 101-nucleotide oligomers from the donkey scaffolds to their location on the horse genome, using exact matches.

    Because the orientation of the donkey scaffolds is unknown a priori, they were oriented using the strand that minimized the number and size of inversions with respect to the horse chromosomes. The large inversions on the donkey scaffolds aligning to ECA7, ECA28, and ECA31 are enlarged for clarity. In the enlarged alignment to ECA7, donkey scaffold ScCGjx6_197 is not reverse-complemented, for consistency with the figures in the Supplementary Materials.
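The dot plot in Fig. 4 pairs exact matches of unique oligomers between a donkey scaffold and a horse chromosome. The minimal Python sketch below illustrates that idea, assuming k-mers must be unique in both sequences (the caption states uniqueness only for the donkey scaffolds). The orientation step is a deliberately crude stand-in: it picks the strand whose matches more often increase along the chromosome, which is not necessarily the criterion the authors used to minimize the number and size of inversions. All function names are hypothetical, and a genome-scale run would need an indexed data structure rather than Python dictionaries.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def unique_kmers(seq, k):
    """Map each k-mer that occurs exactly once in seq to its 0-based position."""
    counts = {}
    first_pos = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
        first_pos.setdefault(kmer, i)
    return {kmer: first_pos[kmer] for kmer, n in counts.items() if n == 1}


def dot_plot_points(query, target, k=101):
    """(query_pos, target_pos) pairs for k-mers unique in both sequences that match exactly."""
    q = unique_kmers(query, k)
    t = unique_kmers(target, k)
    return sorted((q_pos, t[kmer]) for kmer, q_pos in q.items() if kmer in t)


def collinear_score(points):
    """Crude collinearity proxy: consecutive points whose target position increases."""
    return sum(1 for (_, a), (_, b) in zip(points, points[1:]) if b > a)


def orient_scaffold(scaffold, chromosome, k=101):
    """Pick the strand of the scaffold that gives the more collinear dot plot."""
    forward = dot_plot_points(scaffold, chromosome, k)
    reverse = dot_plot_points(revcomp(scaffold), chromosome, k)
    if collinear_score(reverse) > collinear_score(forward):
        return "-", reverse
    return "+", forward


if __name__ == "__main__":
    chromosome = "ATGCGTACCTTAGGACTG"  # toy "horse" sequence
    scaffold = "CCTAAGGTACGC"  # toy scaffold: reverse complement of chromosome[2:14]
    strand, points = orient_scaffold(scaffold, chromosome, k=4)
    print(strand, points)
```

The toy example uses k = 4 so that the sequences stay readable; the figure itself uses k = 101.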

  • Fig. 5 Genetic distance of the donkey scaffold to ECA28.

    The middle part of the scaffold (~20 Mb) is a good candidate for an inversion in either lineage and shows inflated levels of divergence at the breakpoints. The dotted lines in the bottom panel represent the genome-wide average divergence and the upper and lower bounds of its 95% confidence interval.
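Fig. 5 tracks divergence along a scaffold in windows and compares it with a genome-wide background. The sketch below uses the raw proportion of differing sites (p-distance) in non-overlapping windows of two aligned sequences; fig. S7 reports Nei's genetic distance in 30-kb windows, which this simpler measure does not implement. The caption also does not say how the 95% interval was derived, so the empirical 2.5th and 97.5th percentiles across windows are an assumption. Function names are hypothetical.

```python
import statistics


def windowed_divergence(seq_a, seq_b, window):
    """Proportion of differing sites in non-overlapping windows of two aligned sequences.

    Sites where either sequence has 'N' or a gap '-' are skipped. Windows
    with no comparable sites are reported as NaN.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    values = []
    for start in range(0, len(seq_a), window):
        a = seq_a[start:start + window].upper()
        b = seq_b[start:start + window].upper()
        compared = differing = 0
        for x, y in zip(a, b):
            if x in "N-" or y in "N-":
                continue
            compared += 1
            differing += x != y
        values.append(differing / compared if compared else float("nan"))
    return values


def background_band(values):
    """Mean divergence plus empirical 2.5th and 97.5th percentiles across windows."""
    clean = [v for v in values if v == v]  # drop NaN windows (NaN != NaN)
    cuts = statistics.quantiles(clean, n=40, method="inclusive")
    return statistics.mean(clean), cuts[0], cuts[-1]
```

Windows near candidate breakpoints whose values exceed the upper bound would correspond to the inflated divergence described in the caption.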

  • Table 1 Quality metrics for this assembly compared to previous donkey genome assemblies.

    The number of annotated genes is lower than in previous assemblies but shows better homologous correspondence with the horse gene set (see Gene annotation).

    Metric                                            This study    Huang et al. (15)    Orlando et al. (2)
    N50 contigs                                       140.3 kb      66.7 kb              6.38 kb
    N50 scaffolds                                     15.4 Mb       3.8 Mb               100.94 kb
    Coverage                                          61.2×         42.4×                12.4×
    Total bases                                       2.320 Gb      2.391 Gb             2.293 Gb
    Largest scaffold                                  84.20 Mb      17.06 Mb             1.09 Mb
    Unresolved bases per 100 kb                       1121.61       1384.93              4128.43
    Total number of predicted protein-coding genes    18,984        23,850*              24,156

    *Calculated using one isoform per gene and 42,247 total transcripts.
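Several Table 1 metrics can be computed directly from scaffold sequences. The sketch below is a minimal illustration under stated assumptions: contigs are obtained by splitting scaffolds at runs of N, "Total bases" counts gap Ns, and "Unresolved bases per 100 kb" is the number of N bases divided by total length, scaled to 100 kb. These definitions are our assumptions, not necessarily the exact ones behind the table, and the function names are hypothetical.

```python
import re


def n50(lengths):
    """Length of the piece at which the running total, longest first, reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no sequences given")


def contig_lengths(scaffolds):
    """Split each scaffold at runs of N (unresolved gaps) and return contig lengths."""
    return [len(c) for seq in scaffolds.values() for c in re.split(r"[Nn]+", seq) if c]


def assembly_stats(scaffolds):
    """Table 1-style metrics for a dict mapping scaffold name -> sequence."""
    lengths = [len(seq) for seq in scaffolds.values()]
    total = sum(lengths)
    unresolved = sum(seq.upper().count("N") for seq in scaffolds.values())
    return {
        "N50 contigs": n50(contig_lengths(scaffolds)),
        "N50 scaffolds": n50(lengths),
        "Total bases": total,  # assumption: gap Ns are counted
        "Largest scaffold": max(lengths),
        "Unresolved bases per 100 kb": unresolved / total * 100_000,
    }


if __name__ == "__main__":
    toy = {"s1": "ACGTNNNNACGT", "s2": "ACG"}  # hypothetical toy scaffolds
    for metric, value in assembly_stats(toy).items():
        print(metric, value)
```

Splitting at gaps is why contig N50 is always at most scaffold N50: scaffolding joins contigs without adding resolved sequence.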

    Supplementary Materials

    • Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/4/4/eaaq0392/DC1

      Supplementary Materials and Methods

      section S1. Supplementary Methods

      fig. S1. Venn diagram of the protein-coding genes that were annotated in the donkey assembly versus the protein-coding gene annotation for the horse.

      fig. S2. Venn diagram of the protein-coding genes that were annotated in the donkey assembly published by Huang et al. (15) versus the protein-coding gene annotation for the E. caballus genome (version EquCab2.0) using Ensembl genes (version 86).

      fig. S3. Alignment of horse chromosomes to six donkey scaffolds with putative signs of translocations.

      fig. S4. Alignment of donkey scaffolds to corresponding horse chromosomes.

      fig. S5. Genetic distance between scaffolds spanning the gap on ECA12 versus the background.

      fig. S6. Measured heterozygosity rates for the donkey scaffolds aligned to the various horse chromosomes.

      fig. S7. Nei’s genetic distance by windows of 30 kb between donkey and horse chromosomes for scaffolds with signs of inversions.

      fig. S8. Effective population size over time by aligning to the horse reference.

      fig. S9. Measured heterozygosity rates for the African wild ass using the donkey scaffolds aligned to the horse chromosomes.

      table S1. Translocations found between the donkey and horse scaffolds.

      table S2. Gene ontologies of biological processes and enriched Reactome pathways associated with genes found in donkey scaffolds with signs of inversions when compared to the horse genome.

      table S3. Human phenotypes, human diseases, and pathways associated with genes enriched in detected ROHs.

      table S4. Horse sequences used for the detection of donkey scaffolds pertaining to the Y chromosome.

      table S5. Heterozygosity rates for various species of asses and zebras computed when aligning to the donkey reference described in this study and recomputed on the basis of the data reported by Jónsson et al. (9), which were aligned to the horse reference.

      table S6. List of missing proteins in complete and partially complete Eukaryotic Orthologous Groups from the Core Eukaryotic Genes Mapping Approach.

      table S7. Repeat elements and low-complexity DNA sequences masked in the donkey genome using RepeatMasker.

      table S8. Repeat elements and low-complexity DNA sequences masked in the donkey genome using a second round of RepeatMasker, run on the previously masked genome with the model generated from RepeatModeler as custom library input.

      table S9. Statistics of the completeness of the different versions of the donkey genome based on 248 Core Eukaryotic Genes.

      References (44–62)
