Research Article | HUMAN GENETICS

Recovering signals of ghost archaic introgression in African populations

Science Advances  12 Feb 2020:
Vol. 6, no. 7, eaax5097
DOI: 10.1126/sciadv.aax5097
  • Fig. 1 Demography relating known and proposed archaic lineages to modern human populations.

    (A) Basic demographic model with conditional site frequency spectrum (CSFS) fit. W Afr, West Africans; Eur, European; N, Neanderthal; D, Denisovan; UA, unknown archaic [see (18)]. Below, we show the CSFS in the West African YRI when restricting to SNPs where a randomly sampled allele from the high-coverage Vindija Neanderthal was observed to be derived [Neanderthal (data)], as well as where a randomly sampled allele from the high-coverage Denisovan genome was observed to be derived [Denisovan (data)]. We also show the CSFS under the proposed model [Neanderthal (model) and Denisovan (model)]. Migration between Europe and West Africa introduces an excess of low-frequency variants but does not capture the decrease in intermediate-frequency variants and increase in high-frequency variants. (B) Newly proposed model involving introgression into the modern human ancestor from an unknown hominin that separated from the human ancestor before the split of modern humans and the ancestors of Neanderthals and Denisovans. Below, we show the CSFS fit from the proposed model, which captures the U-shape observed in the data.
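The conditioning step described in this caption can be illustrated with a minimal sketch. It assumes each SNP has already been reduced to a derived-allele count in the modern sample plus a flag for whether the randomly sampled archaic allele is derived; the function name and the toy input are hypothetical and are not taken from the article.

```python
from collections import Counter


def conditional_sfs(derived_counts, archaic_is_derived, n_chromosomes):
    """Proportion of SNPs at each derived-allele count 1..n-1,
    restricted to SNPs where the sampled archaic allele is derived."""
    if len(derived_counts) != len(archaic_is_derived):
        raise ValueError("inputs must have one entry per SNP")
    # Keep only segregating sites that carry the derived allele in the archaic.
    tally = Counter(
        count
        for count, archaic in zip(derived_counts, archaic_is_derived)
        if archaic and 0 < count < n_chromosomes
    )
    total = sum(tally.values())
    if total == 0:
        return [0.0] * (n_chromosomes - 1)
    return [tally[count] / total for count in range(1, n_chromosomes)]


# Toy input (illustrative only): seven SNPs typed in four chromosomes.
counts = [1, 1, 3, 2, 3, 0, 4]
archaic = [True, True, True, False, True, True, True]
spectrum = conditional_sfs(counts, archaic, n_chromosomes=4)
print(spectrum)
```

In this toy input the mass sits at the lowest and highest counts with none in between, which is the kind of U-shape the caption refers to.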

  • Fig. 2 ABC estimates of the demographic parameters of the archaic ghost population across four West African populations (YRI, ESN, GWD, and MSL).

    Posterior means are denoted by diamonds, and 95% credible intervals are denoted by lines. (A) The admixture time ta, (B) the admixture fraction α, (C) the split time of the introgressing population ts, and (D) the effective population size of the introgressing population Ne are shown. The parameter estimates are largely consistent across the African populations: We estimate split times of 360 ka to 1.02 Ma B.P., admixture times of 0 to 124 ka B.P., admixture fractions that range from 0.02 to 0.19, and effective population sizes that range from 22,000 to 28,000.
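How a posterior mean and a 95% credible interval can come out of ABC is sketched below. The caption does not say which ABC variant was used, so this assumes plain rejection ABC with a single summary statistic; the simulator, the uniform prior on the admixture fraction, and all function names are hypothetical stand-ins rather than the demographic simulations behind the figure.

```python
import random
import statistics


def abc_rejection(observed, simulate, sample_prior, n_draws, keep_fraction, rng):
    """Keep the prior draws whose simulated statistic lies closest to the data."""
    scored = []
    for _ in range(n_draws):
        theta = sample_prior(rng)
        scored.append((abs(simulate(theta, rng) - observed), theta))
    scored.sort(key=lambda pair: pair[0])
    n_keep = max(1, int(n_draws * keep_fraction))
    return [theta for _, theta in scored[:n_keep]]


def posterior_summary(accepted):
    """Posterior mean and an equal-tailed 95% credible interval."""
    ordered = sorted(accepted)
    last = len(ordered) - 1
    lower = ordered[round(0.025 * last)]
    upper = ordered[round(0.975 * last)]
    return statistics.mean(ordered), (lower, upper)


def sample_prior(rng):
    # Hypothetical prior: admixture fraction uniform on [0, 0.5].
    return rng.uniform(0.0, 0.5)


def simulate(alpha, rng, n_sites=200):
    # Toy simulator: fraction of sites that trace to the introgressing lineage.
    return sum(rng.random() < alpha for _ in range(n_sites)) / n_sites


rng = random.Random(1)
accepted = abc_rejection(0.10, simulate, sample_prior, 2000, 0.05, rng)
mean, (lower, upper) = posterior_summary(accepted)
print(f"alpha: mean={mean:.3f}, 95% interval=({lower:.3f}, {upper:.3f})")
```

A real analysis would replace the toy simulator with coalescent simulations and compare full spectra rather than a single number.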

  • Fig. 3 Analysis of segments of archaic ghost ancestry found in the Yoruba and Mende populations.

    (A) Inference of segments of archaic ancestry was performed with ArchIE. ArchIE proceeds by simulating data under a model of archaic introgression, calculating population genetic summary statistics, and training a model to predict the probability that a 50-kb window in an individual comes from an archaic population. We apply the resulting predictor to genome sequences from the Yoruba and Mende populations. (B) Comparison of TMRCA between inferred archaic and nonarchaic segments to the TMRCA of a pair of nonarchaic segments in the Yoruba. On average, archaic segments are 1.69× older than nonarchaic segments. (C) Estimates of the divergence times of archaic segments inferred in Yoruba from KhoeSan, Ju|'hoan, two modern human pygmy genomes (Mbuti and Biaka), and Neanderthal and Denisovan genomes compared to divergence times of nonarchaic segments. P values are computed via block jackknife. Archaic segments are more diverged from all six genomes than nonarchaic segments.
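The simulate, summarize, train, and predict loop in panel (A) can be sketched as follows. The caption says only that "a model" is trained, so the logistic regression here is an assumption, as are the single toy feature, the labels, and every function name; this is not the ArchIE code.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features, labels, learning_rate=1.0, n_steps=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(n_steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / len(features)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(features)
    return weights, bias


def predict_archaic_probability(weights, bias, x):
    """Probability that a window with summary statistics x is archaic."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical training set: one scaled summary statistic per simulated
# window, labelled 1 if the window was simulated as archaic, else 0.
train_x = [[0.0], [0.1], [0.2], [0.3], [0.7], [0.8], [0.9], [1.0]]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]
weights, bias = train_logistic(train_x, train_y)

# Apply the trained predictor to windows from (hypothetical) real genomes.
for window in ([0.9], [0.1]):
    print(window, round(predict_archaic_probability(weights, bias, window), 3))
```

Training on simulated windows and then scoring observed ones is what lets the predictor run without a reference genome from the introgressing population.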

  • Table 1 Genes harboring a high frequency of archaic segments in the Yoruba and Mende populations.

    Genes were selected by ranking the union of the set of putative archaic segments by frequency in either the Mende or Yoruba population and selecting the top 10 genes. Genes in bold denote frequencies greater than 50% in the respective population.

    Chromosome  Gene name      Frequency (Yoruba)  Frequency (Mende)  Gene type
    chr1        RP11-286M16.1  0.84                0.81               lincRNA
    chr4        KCNIP4         0.73                0.69               Protein coding
    chr6        MTFR2          0.67                0.78               Protein coding
    chr8        TRPS1          0.71                0.75               Protein coding
    chr12       RP11-125N22.2  0.12                0.88               Pseudogene
    chr16       HSD17B2        0.74                0.68               Protein coding
    chr17       NF1            0.83                0.85               Protein coding
    chr17       KRT18P61       0.84                0.36               Pseudogene
    chr21       MIR125B2       0.76                0.64               MicroRNA
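The selection rule in the table caption can be sketched with the frequencies listed above. Reading "frequency in either the Mende or Yoruba population" as the larger of the two frequencies is an assumption, and the type and function names are hypothetical.

```python
from typing import NamedTuple


class GeneFrequency(NamedTuple):
    chromosome: str
    gene: str
    yoruba: float
    mende: float


# Values copied from Table 1.
TABLE_1 = [
    GeneFrequency("chr1", "RP11-286M16.1", 0.84, 0.81),
    GeneFrequency("chr4", "KCNIP4", 0.73, 0.69),
    GeneFrequency("chr6", "MTFR2", 0.67, 0.78),
    GeneFrequency("chr8", "TRPS1", 0.71, 0.75),
    GeneFrequency("chr12", "RP11-125N22.2", 0.12, 0.88),
    GeneFrequency("chr16", "HSD17B2", 0.74, 0.68),
    GeneFrequency("chr17", "NF1", 0.83, 0.85),
    GeneFrequency("chr17", "KRT18P61", 0.84, 0.36),
    GeneFrequency("chr21", "MIR125B2", 0.76, 0.64),
]


def top_genes(rows, k):
    """Rank genes by their higher frequency across the two populations."""
    ranked = sorted(rows, key=lambda r: max(r.yoruba, r.mende), reverse=True)
    return ranked[:k]


def high_frequency_populations(row, threshold=0.5):
    """Populations in which the archaic segment frequency exceeds the threshold."""
    populations = []
    if row.yoruba > threshold:
        populations.append("Yoruba")
    if row.mende > threshold:
        populations.append("Mende")
    return populations


for row in top_genes(TABLE_1, 3):
    print(row.gene, max(row.yoruba, row.mende), high_frequency_populations(row))
```

The 50% threshold mirrors the caption's rule for bolding; two of the listed genes exceed it in only one population.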

Supplementary Materials

  • Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/6/7/eaax5097/DC1

    Section S1. Current demographic models cannot explain the CSFS

    Section S2. The CSFS cannot be explained by departures from panmixia in the ancestor of archaics and modern humans

    Section S3. Exploration of models of introgression into the ancestors of present-day Africans

    Section S4. Parameter exploration of model A

    Section S5. Estimating parameters for the best-fit model of archaic introgression

    Section S6. Continuous migration versus a single pulse

    Section S7. Local ancestry inference

    Section S8. Extended discussion

    Section S9. ms command lines

    Fig. S1. Demographic model from Prüfer et al. (15) (see section S1 for details).

    Fig. S2. Demographic model topologies for the simulations of introgression into the ancestors of present-day Africans in figs. S20, S22, S24, S26, S28, and S30.

    Fig. S3. Demographic model topologies for mathematical results.

    Fig. S4. CSFS from 1000 Genomes Phase 3 data across all African populations included in the dataset.

    Fig. S5. CSFS from 1000 Genomes Phase 3 data in the Luhya population.

    Fig. S6. CSFS from 1000 Genomes Phase 3 data in the CEU and CHB.

    Fig. S7. Robustness of CSFS in YRI across mutation types and the Phase 1 1000 Genomes dataset.

    Fig. S8. Robustness of CSFS in YRI to genotype quality thresholds in archaic genomes.

    Fig. S9. CSFS in YRI when using alternate sources for the ancestral allele.

    Fig. S10. CSFS in YRI when controlling for biased gene conversion and background selection.

    Fig. S11. Simulations of the baseline model (section S1) with both ancestral misidentification (e1) and genotyping error in the archaic (e2).

    Fig. S12. Mutation rate and recombination rate variation.

    Fig. S13. Simulations of the demographic model inferred from Hsieh et al. (19) relating the Yoruba, Baka, and Biaka populations.

    Fig. S14. Simulations of a demographic model with structure and gene flow in Africa.

    Fig. S15. Models with continuous migration (m in units of migrants per generation) since the introgressing lineage splits.

    Fig. S16. Current demographic models from the literature cannot explain the shape of the CSFS observed in fig. S4.

    Fig. S17. Models involving structure in the ancestor of modern humans and archaics cannot explain the observed CSFS.

    Fig. S18. Models involving ancestral structure from the literature cannot explain the observed CSFS.

    Fig. S19. Model A.1: Gene flow from the modern human ancestor branch back into the modern human ancestor before the out-of-Africa event.

    Fig. S20. Model sA.1: Simplified model of gene flow from the modern human ancestor branch back into the modern human ancestor before the out-of-Africa event.

    Fig. S21. Model A.2: Gene flow from the modern human ancestor branch into the African branch after the out-of-Africa event.

    Fig. S22. Model sA.2: Simplified model of gene flow from the modern human ancestor branch into the African branch after the out-of-Africa event.

    Fig. S23. Model B.1: Gene flow from the archaic branch into the modern human ancestor before the out-of-Africa event.

    Fig. S24. Model sB.1: Simplified model of gene flow from the archaic branch into the modern human ancestor before the out-of-Africa event.

    Fig. S25. Model B.2: Gene flow from the archaic branch into the African branch after the out-of-Africa event.

    Fig. S26. Model sB.2: Simplified model of gene flow from the archaic branch into the African branch after the out-of-Africa event.

    Fig. S27. Model C.1: Gene flow from an unknown archaic branch into the modern human ancestor before the out-of-Africa event.

    Fig. S28. Model sC.1: Simplified model of gene flow from an unknown archaic branch into the modern human ancestor before the out-of-Africa event.

    Fig. S29. Model C.2: Gene flow from an unknown archaic branch into the African branch after the out-of-Africa event.

    Fig. S30. Model sC.2: Simplified model of gene flow from an unknown archaic branch into the African branch after the out-of-Africa event.

    Fig. S31. Simulations of the best-fitting parameters for models A, B, and C (section S3).

    Fig. S32. Model A.2 with a population size of 0.01 Na in the introgressing population.

    Fig. S33. Model A.2 with a population size of 1 × 10^−4 Na in the introgressing population.

    Fig. S34. Model A.2 with a population size of 1 × 10^−4 Na in the introgressing population and migration between CEU and YRI over the last 20 ka B.P.

    Fig. S35. Model A.2 with a population size of 1 × 10^−5 Na in the introgressing population, which branches off 200 ka B.P.

    Fig. S36. Model A.2 where the introgressing population splits at the same time as the archaic population (550 ka B.P.) with a population size of 0.01 Na.

    Fig. S37. Model A.2 where the introgressing population splits at the same time as the archaic population, 765 ka B.P.

    Fig. S38. Parameter estimates using ABC for model A.1 including ancestral misidentification (e1) and genotyping error in the archaic (e2).

    Fig. S39. Parameter estimates using ABC for model A.2 including ancestral misidentification (e1) and genotyping error in the archaic (e2).

    Fig. S40. Marginalized joint CSFS of YRI and CEU from simulations.

    Fig. S41. Distribution of allele frequencies for neutral archaic SNPs from model C with 13% introgression and an introgression time of 42 ka B.P.

    Fig. S42. Archaic segment frequency map for MSL and YRI.

    Fig. S43. CSFS from the baseline model allowing for recurrent mutations.

    Table S1. Description of the models examined in this work.

    Table S2. We simulated data from the Prüfer et al. (15) model and added in ancestral misidentification error and genotyping error in the archaic.

    Table S3. Model fits for null models including structure and departures from panmixia in the Modern Human (MH) ancestor.

    Table S4. Model fits for alternate models including admixture from other lineages.

    Table S5. Model fits for alternate models using a simplified demography.

    Table S6. Model fits for variations of model A.

    Table S7. Best-fitting parameter values for all populations using ABC.

    Table S8. P values of a test of goodness of fit for the best-fitting parameters for each class of demographic models.

    Appendix A. The CSFS is uniform under structure in the archaic population.

    Appendix B. The CSFS is symmetric under model A.

    References (43–55)
