Research Article | GENETICS

De novo assembly of the goldfish (Carassius auratus) genome and the evolution of genes after whole-genome duplication


Science Advances  26 Jun 2019:
Vol. 5, no. 6, eaav0547
DOI: 10.1126/sciadv.aav0547
  • Fig. 1 Basic statistics for the goldfish genome in comparison to grass carp, common carp, and zebrafish.

    (A) The gynogenetic goldfish used for sequencing before sacrifice. (B) Transposable element distribution for goldfish (GF) and zebrafish (ZF). (C) Distribution of orthologous/ohnologous gene pairs by synonymous substitution among four species: zebrafish, grass carp (GC), common carp (CC), and goldfish. Numbers are a count of the homologous genes shared among zebrafish, common carp, and goldfish. (D) Rate of synonymous base changes (dS) for various species comparisons. (E) The phylogenetic tree shows the time of divergence of grass carp from goldfish and common carp (green circle), the WGD (red triangle), and the divergence of common carp and goldfish (cyan square). Each genome from the duplication was analyzed separately (chromosomes randomly assigned) and is denoted with _1 or _2 for both common carp and goldfish. (Photo credit: Yoshihiro Omori, Osaka University).

  • Fig. 2 Chromosome collinearity is stable from zebrafish to goldfish.

    (A) Reciprocal BLAST best gene pair counts for each pair of chromosomes between common carp and goldfish. Color from yellow to red indicates low to high counts, respectively. (B) Reciprocal BLAST best gene pair counts for each pair of chromosomes between goldfish and zebrafish. Color from yellow to red indicates low to high counts, respectively. Goldfish to common carp results in 50 bivalents, and goldfish to zebrafish shows a clear 1:2 relationship. (C) Chain alignment along zebrafish chromosome six and the two duplicated chromosomes from goldfish and common carp. Very large stretches of collinearity are readily visible between zebrafish and goldfish, as are simple intrachromosomal inversions. The more fragmented relationship with common carp (e.g., chr12) may be the result of a more fragmented common carp assembly.
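The reciprocal-best-hit counting behind panels (A) and (B) can be illustrated with a minimal sketch. It assumes the best BLAST hit in each direction has already been extracted into a dictionary; the gene IDs, chromosome names, and function names below are hypothetical and are not taken from the article.

```python
from collections import Counter

def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Return (a, b) pairs where b is a's best hit and a is b's best hit."""
    return [(a, b) for a, b in best_a_to_b.items() if best_b_to_a.get(b) == a]

def count_by_chromosome_pair(pairs, chrom_of_a, chrom_of_b):
    """Count reciprocal best-hit pairs in each (chromosome A, chromosome B) cell."""
    return Counter((chrom_of_a[a], chrom_of_b[b]) for a, b in pairs)

# Hypothetical toy input: best hit of each goldfish gene in zebrafish, and back.
best_gf_to_zf = {"gf1": "zf1", "gf2": "zf1", "gf3": "zf2"}
best_zf_to_gf = {"zf1": "gf1", "zf2": "gf3"}
gf_chrom = {"gf1": "LG6", "gf2": "LG31", "gf3": "LG6"}
zf_chrom = {"zf1": "chr6", "zf2": "chr6"}

pairs = reciprocal_best_hits(best_gf_to_zf, best_zf_to_gf)
counts = count_by_chromosome_pair(pairs, gf_chrom, zf_chrom)
print(counts)  # one heatmap cell per chromosome pair
```

Each entry of `counts` corresponds to one cell of a chromosome-by-chromosome heatmap; gf2 is dropped because zf1's best hit is gf1, not gf2.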

  • Fig. 3 The evolutionary relationships between zebrafish, grass carp, common carp, and goldfish can be used to study the dynamics of gene loss after WGD events.

    (A) Using zebrafish as the reference, the tree tracks gene and CNE loss at different evolutionary branch points. Numbers on nodes or leaves indicate retained genes (pink) or CNEs (skyblue). Negative numbers on the branches indicate the number of lost genes (pink) or CNEs (skyblue) on the corresponding branch. The red triangle represents the carp WGD event at 14.4 Ma ago. The blue square marks the speciation of common carp and goldfish at 11.0 Ma ago. A maximum-likelihood phylogenetic tree was constructed by using the third position of all codons of ohnologous genes. (B) Decay curve of gene loss. The rates of gene loss accelerated after the genome duplication event (i.e., thick gray line between the red triangle and blue square). We assume that most cases where both copies of a gene were lost in either goldfish or carp occurred after separation from grass carp but before the WGD.

  • Fig. 4 Gene expression is affected by changes in sequence, exon loss, and CNE loss.

    (A) Histogram of expression correlation (x axis) and expression Euclidean distance (y axis) between WGD ohnolog gene pairs. Each box lists the number of ohnolog pairs (×2 for total genes) and the percentage of the total number of pairs this group represents. Most of the genes (70.3%) had a correlation of 0.6 or better. (B) Expression distance distribution in different cDNA identity groups. The more closely related the cDNA sequence, the more closely correlated gene expression was. (C) Boxplot of expression distance in gene groups with different numbers of lost exons. The more exons lost, the less related gene expression becomes. Asterisks mark statistically significant differences. (D) Boxplot of tissue expression SD in gene groups with different numbers of CNEs lost. Similar to exons, loss of CNEs correlates with loss of concordant expression, but the effect size is smaller. Asterisks denote significant differences. (E) Gene expression clustered into 20 groups for the 19,500 ohnologous genes. Heatmap and the keys indicate the value of log2(TPM + 1). Left color bar indicates different clusters. Right bars show the number and percentage of the gene pairs in the same cluster. Colored links indicate the number of gene pairs split between different clusters; only numbers larger than 100 were plotted, and thicker links indicate larger counts.
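The two measures plotted in panel (A), expression correlation and expression Euclidean distance, can be sketched for a single ohnolog pair as follows. This is an illustration only: it assumes both measures are computed on log2(TPM + 1) values across tissues, which the caption does not state, and the TPM values and function names are hypothetical.

```python
import math

def log2_tpm(values):
    """Transform raw TPM values to log2(TPM + 1)."""
    return [math.log2(v + 1.0) for v in values]

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (NaN if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")
    return cov / (sd_x * sd_y)

def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical TPM values for the two copies of one gene in seven tissues.
copy1 = log2_tpm([0, 1, 3, 7, 15, 31, 63])
copy2 = log2_tpm([1, 3, 7, 15, 31, 63, 127])

r = pearson(copy1, copy2)
d = euclidean(copy1, copy2)
print(r, d)
```

In this toy pair the second copy is expressed at a uniformly higher level, so the correlation is perfect while the distance is nonzero, which is why the two measures are informative together.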

  • Fig. 5 Systematic analysis of gene expression changes between duplicated genes can detect gene extinction, sub-F, and neo-F events.

    (A) Genes clustered into 20 groups for the 8483 zebrafish-goldfish gene triplets. Heatmap and the keys indicate the normalized value (z score) of log2(FPKM + 1). The left color bar indicates different clusters, the text next to the cluster color bar indicates major zebrafish-expressed tissue in each cluster, and unlabeled ones are expressed in all zebrafish tissues. B, brain; E, eye; H, heart; G, gill; M, muscle; T, tail fin. (B) Example of expression of subfunctionalized (left) and neofunctionalized (right) genes. Gray bar, zebrafish; red and blue bar, two goldfish orthologs. Asterisks indicate tissue(s) associated with sub-F or neo-F. (C) Cumulative sum of triplets in different zebrafish-goldfish nucleotide identity groups (left) and exon gain/loss groups (right). Genes in non-F, neo-F, and sub-F triplets have low nucleotide identity and higher exon gain/loss than the coexpressed group. Genes in sub-F and neo-F triplets have intermediate exon gain/loss.

  • Table 1 Assembly statistics.

    Statistic              CarAur01 (Canu + genetic map)
    Longest scaffold       37,185 kbp
    N10                    30,202 kbp (n = 10)
    N50                    22,763 kbp (n = 14)
    N90                    86.8 kbp (n = 1506)
    Total length           1,820,635,051 bp
    No. of LGs             50
    Total length of LGs    1,246,641,604 bp
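For readers unfamiliar with the N10/N50/N90 statistics, the following minimal sketch shows the usual definition: sort sequences from longest to shortest and report the length (and count, n) at which the running total first reaches x% of the assembly. The scaffold lengths and the function name `nx` are hypothetical, and this is not the tool used to produce the table.

```python
def nx(lengths, x):
    """Return (Nx length, number of sequences needed) for a percentage x."""
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        # Integer comparison avoids floating-point error at the threshold.
        if running * 100 >= total * x:
            return length, count
    raise ValueError("no sequence lengths given")

# Hypothetical scaffold lengths (arbitrary units), total 300.
scaffolds = [100, 80, 60, 40, 20]

n10 = nx(scaffolds, 10)
n50 = nx(scaffolds, 50)
n90 = nx(scaffolds, 90)
print(n10, n50, n90)
```

With these toy lengths the two longest scaffolds already cover half of the total, so N50 is the length of the second scaffold with n = 2.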
  • Table 2 Annotations statistics.

    CNE, conserved noncoding element (i.e., potential enhancers/promoters); GO, gene ontology.

    Species columns: Goldfish | Common carp | Zebrafish (danRer10)

    Assembly size (bp): 1,820,635,051 | 1,713,641,436 | 1,371,719,383
    GC content: 37.48% | 36.99% | 36.64%
    Repeats (bp): 721,087,053 (39.6%) | 672,246,354 (39.2%) | 745,150,642 (54.3%)
    Protein-coding genes: 70,324 | 66,999 | 25,600 (Ensembl release 85)
    Genes with GO: 49,272 | 18,779*
    Exons: 556,731 | 547,164 | 276,021
    Genes with InterPro: 49,272 | 44,845* | 24,204*
    miRNA: 1,037 | 769
    ncRNA (noncoding RNA): 11,820
    4-way CNE counts: 486,767 | 484,139* | 237,891*
    4-way CNE bp: 95,815,233 | 97,818,440* | 44,090,004*
    Missing BUSCOs (of 3023): 167 | 330* | 0 (used for original BUSCO set)

    *Data generated from this study.
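The percentages in Table 2 follow directly from the listed counts. The short check below recomputes the repeat fractions from the assembly sizes and derives the share of the 3023 BUSCOs that are not missing; the variable and function names are illustrative, and the "found" percentage is a derived quantity, not a figure reported in the table.

```python
# Values copied from Table 2.
assembly_bp = {
    "goldfish": 1_820_635_051,
    "common carp": 1_713_641_436,
    "zebrafish": 1_371_719_383,
}
repeat_bp = {
    "goldfish": 721_087_053,
    "common carp": 672_246_354,
    "zebrafish": 745_150_642,
}
missing_buscos = {"goldfish": 167, "common carp": 330, "zebrafish": 0}
BUSCO_SET_SIZE = 3023

def percent(part, whole):
    """Percentage of whole represented by part, rounded to one decimal."""
    return round(100.0 * part / whole, 1)

repeat_pct = {sp: percent(repeat_bp[sp], assembly_bp[sp]) for sp in assembly_bp}
busco_found_pct = {
    sp: percent(BUSCO_SET_SIZE - missing, BUSCO_SET_SIZE)
    for sp, missing in missing_buscos.items()
}

print(repeat_pct)       # {'goldfish': 39.6, 'common carp': 39.2, 'zebrafish': 54.3}
print(busco_found_pct)  # {'goldfish': 94.5, 'common carp': 89.1, 'zebrafish': 100.0}
```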

    Supplementary Materials

    • Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/5/6/eaav0547/DC1

      Supplementary Methods and Analysis

      Table S1. PacBio read statistics.

      Table S2. Assembly statistics for different coverage groups.

      Table S3. Repeated DNA statistics.

      Table S4. Core eukaryotic genes using BUSCOs.

      Table S5. Statistics of exon gain/loss.

      Table S6. Statistics of CNE gain/loss.

      Table S7. Triplets with different number of coexpressed tissues.

      Table S8. Number and percentage of ohnolog clusters in evolutionary fate categories.

      Table S9. Comparison of features between ZF-GF1 and ZF-GF2, where “Mean1” and “Mean2” are the mean of features between ZF-GF1 and ZF-GF2, respectively.

      Table S10. Comparison of features between different gene evolutionary fate.

      Fig. S1. Twenty-five–nucleotide oligomer occurrence distribution from 2 × 125 bp Illumina paired-end reads.

      Fig. S2. Screenshot of the UCSC Genome Browser implementation of the carAur01 assembly.

      Fig. S3. Distribution of exon and intron lengths.

      Fig. S4. RBH gene counts between zebrafish and common carp chromosomes.

      Fig. S5. RBH gene counts between grass carp and goldfish chromosomes.

      Fig. S6. RBH gene counts between goldfish whole-genome duplicated chromosomes.

      Fig. S7. Chain-net alignment between each zebrafish chromosome (middle light blue bars) and two corresponding whole-genome duplicated goldfish chromosomes (green bars), and goldfish to common carp (blue bars).

      Fig. S8. GO terms prone to retaining both gene copies (blue rectangle) or losing one copy (blue rectangle) after WGD in goldfish.

      Fig. S9. GO molecular function comparison among zebrafish (ZF), grass carp (GC), common carp (CC), goldfish (GF).

      Fig. S10. Example of neo-F.

      Fig. S11. Expression of ohnolog gene pairs in seven tissues.

      Fig. S12. Number of ohnolog gene pairs in the same cluster (diagonal) or between each of the 20 clusters (top triangle).

      Fig. S13. Function enrichment and reduction in divergent expressed gene pairs.

      Fig. S14. Sequence divergence among zebrafish-goldfish triplets.

      Fig. S15. Pearson’s correlation coefficient between zebrafish ortholog (ZF)–goldfish ohnolog (GF) and goldfish ohnolog-ohnolog (GF1-GF2).

      Fig. S16. Definition of non-F, sub-F, and neo-F.

      Fig. S17. Correlation between different classes of gene expression changes and gain/loss of CNEs.

      Fig. S18. Function enrichment (red) or reduction (blue) of genes in coexpressed groups.

      Fig. S19. Function enrichment (red) or reduction (blue) of genes in nonfunctionalized groups.

      Fig. S20. Function enrichment (red) or reduction (blue) of genes in subfunctionalized groups.

      Fig. S21. Function enrichment (red) or reduction (blue) of genes in neofunctionalized groups.

      List of members of the NISC Comparative Sequencing Program.

      References (61–79)
