Research Article | ANTHROPOLOGY

The comparative genomics and complex population history of Papio baboons


Science Advances  30 Jan 2019:
Vol. 5, no. 1, eaau6947
DOI: 10.1126/sciadv.aau6947
  • Fig. 1 Papio baboon species.

    (A) Appearance and current distribution of each baboon species, with the locations of three well-documented active hybrid zones. x1: hybrid zone between P. hamadryas and P. anubis (19, 28); x2: hybrid zone between P. cynocephalus and P. anubis (17, 26); x3: hybrid zone between P. kindae and P. ursinus (18). Drawings of each species by S. Nash. (B) Distinguishing features of Papio species. Body mass data from (16, 59) and unpublished data from J.P.-C., J.R., and C.J.J.

  • Fig. 2 Comparison of Alu mobilization rates in selected primate genomes.

    Only Alu elements specific to each lineage are included. The size of the circle corresponds to the number of near full-length lineage-specific AluY elements in that species. The bars on the right show the estimated number of insertions per million years for each lineage. For baboon (Panu_3.0), rhesus macaque (Mmul_8.0.1), African green monkey (chlSab2), chimpanzee (Pan_tro3), and human (GRCh38/hg38), AluY sequences were retrieved computationally by cross comparisons using the most recent available assemblies. Orangutan estimates are from P_pygmaeus2.0.2 (60). The number of lineage-specific AluY elements is similar in rhesus macaque and baboon, and more than twice that in the African green monkeys, despite a longer period of independent evolution for the African green monkeys.
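The per-lineage rate bars in Fig. 2 come down to a simple ratio: lineage-specific insertions divided by the time that lineage has evolved independently. The sketch below illustrates only that arithmetic. The function name, the lineage labels, and all counts and durations are hypothetical placeholders, not values from the article.

```python
def insertions_per_myr(lineage_specific_count, independent_myr):
    """Average number of lineage-specific insertions per million years."""
    if independent_myr <= 0:
        raise ValueError("independent_myr must be positive")
    return lineage_specific_count / independent_myr


# Hypothetical placeholder inputs: (insertion count, million years of
# independent evolution). These are NOT the article's numbers.
toy_lineages = {
    "lineage_a": (1000, 8.0),
    "lineage_b": (500, 10.0),
}

toy_rates = {
    name: insertions_per_myr(count, myr)
    for name, (count, myr) in toy_lineages.items()
}
```

With these placeholders, lineage_b has evolved independently for longer yet accumulated fewer insertions, so its rate is lower. That is the kind of pattern the caption describes for African green monkeys relative to baboon and rhesus macaque.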

  • Fig. 3 Phylogenetic relationships among baboon species.

    (A) Phylogeny generated using the polymorphism-aware phylogenetic method (PoMo) (23, 24). This topology for the three northern species is also supported by maximum likelihood (ML) analysis of concatenated SNVs and by 43.9% of informative gene trees filtered to exclude coding sequences [scaled concordance factor (CF) of 0.439, greater than that of either alternative]. The topology shown for the three southern clade species is supported by the PoMo analysis and has a scaled CF of 0.332. (B) One alternative topology for the northern species, supported by a scaled CF of 0.241. (C) One alternative topology for the southern species, supported by ML analysis of concatenated SNVs and a scaled CF of 0.513, i.e., a larger proportion of gene trees devoid of coding sequences than either of the other two topologies.
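If the scaled CFs for the three possible resolutions of each three-species clade are assumed to sum to 1 (an assumption made here, not stated in the caption), the support left for the topology not shown in any panel follows by subtraction. The helper name below is hypothetical, and the result is only as good as that assumption.

```python
def remaining_scaled_cf(reported_cfs):
    """Support left for the unreported topology, ASSUMING scaled CFs sum to 1."""
    remainder = 1.0 - sum(reported_cfs)
    if remainder < 0:
        raise ValueError("reported CFs exceed 1; the sum-to-1 assumption fails")
    return round(remainder, 3)


# Scaled CFs quoted in the Fig. 3 caption.
northern_reported = [0.439, 0.241]  # panel A topology, panel B alternative
southern_reported = [0.332, 0.513]  # panel A topology, panel C alternative

# Implied support for the third topology in each clade (assumption-dependent).
northern_third = remaining_scaled_cf(northern_reported)
southern_third = remaining_scaled_cf(southern_reported)
```

Under that assumption the third northern topology would carry about 0.320 and the third southern topology about 0.155. Both fall below the leading values of 0.439 and 0.513, which is consistent with the caption describing those as the largest in their clades.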

  • Fig. 4 Evolutionary and demographic history for Papio baboons.

    (A) Analyses using f-statistics indicate that P. kindae was formed via input from both a southern clade lineage and a northern clade lineage, with contributions estimated to be 52 and 48%. P. papio is inferred to have been produced through 10% introgression from an unidentified ancient northern lineage into a population related to P. anubis. Dates for divergence and admixture events were inferred through CoalHMM, and internal nodes representing those divergence or admixture events are labeled A through K. Our analyses of asymmetric haplotype sharing also inferred admixture from P. cynocephalus into P. anubis approximately 21 generations ago. (B) Reconstruction of baboon demographic history using PSMC methods. A prolonged bottleneck was observed in the lineage ancestral to P. papio beginning ~400 thousand years (ka) ago, while the populations ancestral to P. hamadryas and P. anubis increased between ~280 and ~160 ka ago. After diverging, P. anubis followed an upward trend whereas P. hamadryas declined. At ~400 ka ago, Ne for P. ursinus diverged from estimates for the populations ancestral to P. cynocephalus and P. kindae, and underwent a species-specific prolonged bottleneck. At ~300 ka ago, the Ne reconstructed for both P. cynocephalus and P. kindae increased, peaking ~150 ka ago before experiencing a subsequent decline. PSMC methods are not always reliable for the most recent time periods.
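The caption reports admixture proportions from f-statistics without giving the computation. As a generic illustration, and not the authors' pipeline, the sketch below computes an f4 statistic from per-site allele frequencies and uses an f4-ratio to recover a mixing fraction. The function names, population labels, and all frequencies are made-up toy values; the mixing fraction 0.48 is chosen only to echo one of the proportions quoted in the caption.

```python
def f4(p_a, p_b, p_c, p_d):
    """f4(A, B; C, D): mean over sites of (a - b) * (c - d)."""
    n = len(p_a)
    if n == 0 or not (n == len(p_b) == len(p_c) == len(p_d)):
        raise ValueError("frequency lists must be non-empty and equal in length")
    return sum((a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d)) / n


def f4_ratio(p_a, p_o, p_x, p_b, p_c):
    """Fraction of X's ancestry related to B: f4(A,O;X,C) / f4(A,O;B,C)."""
    return f4(p_a, p_o, p_x, p_c) / f4(p_a, p_o, p_b, p_c)


# Toy allele frequencies at four sites (hypothetical, not real data).
freq_a = [0.9, 0.1, 0.8, 0.3]  # population related to source B
freq_o = [0.1, 0.1, 0.2, 0.5]  # outgroup
freq_b = [0.8, 0.2, 0.9, 0.2]  # source population B
freq_c = [0.2, 0.1, 0.1, 0.6]  # source population C

# Build X as an exact mixture of B and C so the estimator can be checked.
true_alpha = 0.48
freq_x = [true_alpha * b + (1 - true_alpha) * c for b, c in zip(freq_b, freq_c)]

alpha_hat = f4_ratio(freq_a, freq_o, freq_x, freq_b, freq_c)
```

Because X is constructed as an exact mixture, the ratio recovers the mixing fraction up to floating-point error. Real data add drift and sampling noise, so estimates of this kind are normally reported with confidence intervals (compare fig. S11).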

  • Table 1 Summary of diverse data types and analytical approaches used to investigate the phylogeny of baboon species.
    Data type | Analytical method | Primary clustering | Northern clade result | Southern clade result
    SNVs | ML phylogeny | North versus south | hamad-[papio-anubis] | kinda-[cyno-ursinus]
    SNVs | Bayesian phylogeny | North plus kinda versus two south | kinda-[hamad-[papio-anubis]] | [cyno-ursinus]
    SNVs | Polymorphism-aware phylogeny | North versus south | hamad-[papio-anubis] | cyno-[kinda-ursinus]
    SNVs | f-statistics | North versus south | hamad-[papio-anubis] | cyno-[kinda-ursinus]
    Alu insertion polymorphisms | Majority-rule Dollo parsimony | North versus south | Unresolved phylogeny with north-south split | Unresolved phylogeny with north-south split
    Locus-specific trees for segments without coding sequences | Concordance factors | North versus south | hamad-[papio-anubis] | kinda-[cyno-ursinus]

Supplementary Materials

  • Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/5/1/eaau6947/DC1

    Section S1. Rationale for baboon taxonomy and nomenclature

    Section S2. Rationale for mutation rate used in PSMC analyses

    Section S3. Sequencing and assembly of olive baboon genome

    Section S4. Annotation and gene content of the baboon genome

    Section S5. Identification of SNVs and small indels within baboon species

    Section S6. Validation of species identity within the diversity panel

    Section S7. Lineage-specific Alu insertion in OWMs and hominoids

    Section S8. Alternative methods for constructing phylogenetic trees

    Section S9. Identification of admixture through asymmetric allele sharing

    Section S10. Polymorphic AluY insertions across Papio species

    Section S11. Bayesian concordance analyses of gene trees devoid of coding sequences

    Section S12. CoalHMMs of admixture trees and events

    Section S13. Locus-specific phylogenetic trees for chromosomal segments containing annotated genes

    Fig. S1. Panu3.0 genome assembly process.

    Fig. S2. Workflow used in variant calling pipeline.

    Fig. S3. Details regarding SNV calls.

    Fig. S4. Maximum likelihood and Bayesian phylogenetic trees based on SNV data.

    Fig. S5. Test of phylogeny reconstruction using PoMo.

    Fig. S6. Identification of admixture using f-statistics.

    Fig. S7. Evidence for admixture from haplotype sharing.

    Fig. S8. A cladogram of Papio individuals from the diversity panel.

    Fig. S9. Bayesian concordance analysis.

    Fig. S10. Bootstrap analysis of timing of divergence events.

    Fig. S11. Confidence intervals for baboon admixture proportions.

    Fig. S12. Results for simulated admixture analysis.

    Fig. S13. Results for correction factor adjustment of admixture history.

    Fig. S14. Model used to estimate specific divergence and admixture history.

    Fig. S15. Unbiased estimates dating divergences and admixture events.

    Fig. S16. Phylogeny representing cluster 1 genic regions.

    Fig. S17. Phylogeny representing cluster 2 genic regions.

    Fig. S18. Phylogeny representing cluster 3 genic regions.

    Table S1. Assembly statistics.

    Table S2. Annotation of baboon genome assemblies.

    Table S3. DNA samples used for diversity analysis.

    Table S4. SNV variation among 15 Papio baboons and a gelada.

    Table S5. Full-length AluY insertions and lineage-specific insertions in primate genomes.

    Table S6. Effect of admixture on branch lengths measured in substitutions per site.

    Table S7. Bayes factors comparing alternate phylogenies.

    Table S8. Divergence time estimates across triplets.

    Table S9. Admixture proportion estimates across triplets.

    Table S10. GO terms associated with genes falling in clusters 1 to 3 of genic regions.

    References (61–78)

    Table S10 (GO terms associated with genes falling in clusters 1 to 3 of genic regions) is provided as a separate file in Microsoft Excel format; the other supplementary items are in the PDF file.
