Research Article | HUMAN GENETICS

Population structure of modern-day Italians reveals patterns of ancient and archaic ancestries in Southern Europe


Science Advances  04 Sep 2019:
Vol. 5, no. 9, eaaw3492
DOI: 10.1126/sciadv.aaw3492
  • Fig. 1 Genetic structure of the Italian populations.

    (A) Simplified dendrogram of 3057 Eurasian samples clustered by the fS algorithm using the CP output (complete dendrogram in fig. S1C). Each leaf represents a cluster of individuals with similar copying vectors. Clusters with more than five individuals are labeled in black, and Italian clusters are color coded. Gray labels of the form <<NAME>>_D denote clusters containing fewer than five individuals, or individuals of uncertain origin, that were removed from subsequent analyses. (B) Principal components analysis (PCA) based on the CP chunkcount matrix [colors as in (A)]. For each non-Italian cluster, the label marks the centroid of the individuals belonging to that cluster. The plot was rotated 90° to the left to highlight the correspondence with the geography of the Italian samples. (C) Pie charts summarizing the relative proportions of inferred fS genetic clusters in each of the 20 Italian administrative regions [colors as in (A)]. (D) Between-cluster FST estimates within European groups. Clusters were generated using only individuals belonging to the population analyzed (see Materials and Methods and the Supplementary Materials). The number of genetic clusters analyzed for each population is reported in brackets. For the comparisons across Europe, cluster NEurope1, which contains almost exclusively Finnish individuals, was excluded (FST estimates for Italian and European clusters are in data file S2). FST distributions significantly lower than the Italian one are shown in colors other than green. (E) Estimated effective migration surfaces (EEMS) analysis in Southern Europe. Colors represent the effective migration rate on a log10 scale, from low (red) to high (yellow).
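Panel (D) compares distributions of between-cluster FST values. The caption does not say which estimator was used, so the sketch below assumes Hudson's estimator, combined across SNPs as a ratio of averages. Inputs are per-SNP allele frequencies for two clusters and the number of sampled chromosomes in each (assumed constant across SNPs, which ignores missing data). The function name `hudson_fst` is hypothetical.

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson F_ST between two groups, as a ratio of averages over SNPs.

    p1, p2: per-SNP allele frequencies in group 1 and group 2.
    n1, n2: number of sampled chromosomes per group (assumed constant per SNP).
    """
    if len(p1) != len(p2):
        raise ValueError("frequency vectors must have equal length")
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        # Numerator: squared frequency difference minus within-group sampling noise
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Denominator: probability that one allele drawn from each group differs
        den += a * (1 - b) + b * (1 - a)
    if den == 0.0:
        raise ValueError("no informative SNPs (all monomorphic and identical)")
    return num / den
```

Summing numerators and denominators separately, rather than averaging per-SNP ratios, keeps rare variants from dominating the estimate. Closely related clusters can yield slightly negative values because the sampling correction is subtracted.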

  • Fig. 2 Ancient ancestries in Western Eurasian modern-day clusters and Italian ancient samples.

    CP/NNLS analysis of all Italian and European clusters, using as donors different sets of ancient samples and two modern clusters (NAfrica1, North Africa; EAsia2, East Asia) [full results in fig. S5 (A and B)]. (A) Ultimate sources: AN, Anatolian Neolithic (Bar8); WHG, western hunter-gatherer (Bichon); CHG, Caucasus hunter-gatherer (KK1); EHG, Eastern hunter-gatherer (I0061); IN, Iranian Neolithic (WC1). (B) EHG and (C) CHG ancestry contributions in Western Eurasia, as inferred in (A) and figs. S8A and S5A. (D) Same as in (A), using proximate sources: WHG, western hunter-gatherer (Bichon); EEN, European Early Neolithic (Stuttgart); SBA, Bronze Age from steppe (I0231); ABA, Bronze Age from Anatolia (I2683). (E) SBA and (F) ABA ancestry contributions, as inferred in (D) and fig. S5B. Triangles mark the locations of the ancient samples used as sources (data file S1). (G) Ratio of the NNLS residuals (see Materials and Methods and the Supplementary Materials) for all Italian and European clusters when ABA was excluded from versus included in the set of proximate sources. (H) As in (G), but excluding/including SBA instead of ABA. (I) Ancient Italian and other selected ancient samples projected onto the components inferred from modern European individuals. Labels are placed at the centroid of the individuals belonging to the indicated clusters.
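The CP/NNLS step above models a target cluster's copying vector as a non-negative combination of donor copying vectors, and panels (G) and (H) compare residuals with and without one source. The sketch below is a minimal illustration under stated assumptions: it solves plain NNLS by cyclic coordinate descent and rescales the weights to proportions afterwards, whereas published CP/NNLS implementations may impose the sum-to-one constraint differently. It also assumes the panel (G) ratio is residual-without-source divided by residual-with-source. All function names are hypothetical.

```python
import math


def nnls(donors, target, max_iter=10_000, tol=1e-12):
    """Non-negative least squares by cyclic coordinate descent.

    donors: list of donor copying vectors (one per candidate source), each of length K.
    target: copying vector of the target cluster, length K.
    Returns (weights, residual_norm).
    """
    m = len(donors)
    # Precompute D^T D and D^T t so each coordinate update is cheap
    gram = [[sum(a * b for a, b in zip(donors[i], donors[j])) for j in range(m)]
            for i in range(m)]
    corr = [sum(a * b for a, b in zip(d, target)) for d in donors]
    w = [0.0] * m
    for _ in range(max_iter):
        max_step = 0.0
        for j in range(m):
            if gram[j][j] == 0.0:
                continue  # an all-zero donor vector cannot contribute
            grad = sum(gram[j][k] * w[k] for k in range(m)) - corr[j]
            new = max(0.0, w[j] - grad / gram[j][j])  # exact 1-D minimizer, clipped at 0
            max_step = max(max_step, abs(new - w[j]))
            w[j] = new
        if max_step < tol:
            break
    fitted = [sum(w[j] * donors[j][k] for j in range(m)) for k in range(len(target))]
    resid = math.sqrt(sum((f - t) ** 2 for f, t in zip(fitted, target)))
    return w, resid


def ancestry_proportions(donors, target):
    """NNLS weights rescaled to sum to one (an assumed post-hoc normalization)."""
    w, _ = nnls(donors, target)
    total = sum(w)
    return [x / total for x in w] if total > 0 else w


def residual_ratio(donors, target, drop_index):
    """Residual without donor `drop_index` divided by the residual with all donors."""
    _, with_all = nnls(donors, target)
    reduced = [d for i, d in enumerate(donors) if i != drop_index]
    _, without = nnls(reduced, target)
    return without / with_all if with_all > 0 else math.inf
```

Because removing a donor can only shrink the feasible set, the ratio is at least 1; values well above 1 indicate that the dropped source is needed to explain the target.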

  • Fig. 3 Admixture events inferred by GT.

    (A) Dates of the events inferred in the GT noItaly analysis for all Italian clusters (labels as in Fig. 1A and data file S1; full results in fig. S8 and data file S5; see Materials and Methods and the Supplementary Materials); lines span the 95% confidence interval. GT events were classified as “one date” (black squares; 1D in data file S5) or “one date multiway” (white squares; 1 MW). (B) Correlation values between the copying vectors of the first source(s) identified by GT and the best proxy in the noItaly analysis (circles) or the best proxy among Italian clusters (diamonds). (C) Same as in (B), for the copying vectors of the second source(s). Empty symbols refer to additional first (B) and second (C) sources detected in multiway events. In (B), the African best proxies for clusters SItaly1 and SItaly2 are plotted on the 0.90 boundary for visualization only; their actual correlation values are 0.78 and 0.87, respectively. Symbols for the best Italian proxies of the African sources identified for clusters SItaly1, SItaly2, Sicily1, and Sardinia3 in (B) are not shown, because their correlation values are lower than those of the African proxies and below the threshold used in the figure. Symbol colors indicate the ancestry to which each proxy was assigned (see Materials and Methods and the Supplementary Materials).
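Panels (B) and (C) report, for each GT source, the correlation between its copying vector and that of its best proxy cluster, with low values drawn on the 0.90 axis boundary. The sketch below assumes Pearson correlation and picks the proxy with the highest value; the cluster names and vectors are hypothetical, and `clamp_for_plot` only mimics the display convention described in the caption.

```python
import math


def pearson(x, y):
    """Pearson correlation between two copying vectors of equal length."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def best_proxy(source_vector, candidates):
    """Return (name, r) for the candidate cluster whose copying vector correlates best."""
    return max(((name, pearson(source_vector, vec)) for name, vec in candidates.items()),
               key=lambda item: item[1])


def clamp_for_plot(r, floor=0.90):
    """Place values below the axis floor on the boundary, for display only."""
    return max(r, floor)


# Hypothetical example: a GT source and two candidate proxy clusters
source = [0.5, 0.3, 0.2, 0.0]
candidates = {"ClusterA": [0.45, 0.35, 0.2, 0.0], "ClusterB": [0.0, 0.2, 0.3, 0.5]}
name, r = best_proxy(source, candidates)
```

The raw correlation is kept separately from the plotted value, which is how the caption can report 0.78 and 0.87 for points drawn at 0.90.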

  • Fig. 4 Neanderthal ancestry distribution in Eurasian populations.

    (A) Neanderthal allele counts in individuals from Eurasian populations, computed on 3969 LD-pruned Neanderthal tag-SNPs and sorted by median value. CEU, Utah Residents with Northern and Western European ancestry; GBR, British in England and Scotland; FIN, Finnish in Finland; IBS, Iberian Population in Spain; TSI, Tuscans from Italy; ITN, Italians from Northern Italy; ITC, Italians from Central Italy; ITS, Italians from Southern Italy; SAR, Italians from Sardinia; CHB, Han Chinese. (B) Significance matrix based on Wilcoxon rank sum tests between pairs of populations, including (lower triangle) and excluding (upper triangle) outliers (see Materials and Methods and the Supplementary Materials; dark blue, adjusted P < 0.05; light blue, adjusted P > 0.05). Colored squares at the sides of the heatmap indicate the populations compared, as in Fig. 4A. (C) Correlation between Neanderthal ancestry proportions and the amount of Basal Eurasian ancestry in European clusters (see Materials and Methods and the Supplementary Materials). (D and E) Neanderthal allele frequency (AF) for selected SNPs within the indicated genes: (D) alleles at high frequency in Europe and (E) alleles divergent between Northern and Southern Europe. (F) Comparisons between Northern European and Italian populations (excluding Sardinia). Bars refer to comparisons between the reported pairs of populations; the number of NTT SNPs is reported within each bar. Each section of the circos plot represents a tested chromosome; points refer to NTT SNPs. Colors are the same as for the bars; igr, intergenic region variant.
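Panel (B) tests every pair of populations with a Wilcoxon rank sum test and colors cells by adjusted P. The caption does not name the multiple-testing correction, so the sketch below assumes Benjamini-Hochberg. It uses the normal approximation to the rank sum (Mann-Whitney U) statistic with average ranks for ties and no tie or continuity correction, so it is rough for small samples. Population labels and helper names are hypothetical, and outlier removal (upper triangle) is not modeled.

```python
import math
from itertools import combinations


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test via the normal approximation; returns (U1, p)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean) / sd
    return u1, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P downward
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def pairwise_significance(groups, alpha=0.05):
    """Map each population pair to (adjusted P, adjusted P < alpha)."""
    pairs = list(combinations(sorted(groups), 2))
    raw = [rank_sum_test(groups[a], groups[b])[1] for a, b in pairs]
    adj = benjamini_hochberg(raw)
    return {pair: (p, p < alpha) for pair, p in zip(pairs, adj)}
```

For real data, an exact or tie-corrected implementation would be preferable to this approximation.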

Supplementary Materials

  • Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/5/9/eaaw3492/DC1

    Fig. S1. Geographic location of populations included in FMD and HDD, and fineStructure dendrogram for all the 4,852 (FMD) and 1,641 (HDD) samples.

    Fig. S2. Allele frequency PCA (genotype based) and individual-level ADMIXTURE of modern samples.

    Fig. S3. “Cluster self-copy” analysis and PCA with admixed Italian individuals.

    Fig. S4. Results of the EEMS analysis on Italy-only populations.

    Fig. S5. CP/NNLS and qpAdm results for different sets of ancient sources for all modern clusters.

    Fig. S6. D-statistics analyses.

    Fig. S7. PCA and ADMIXTURE analyses of 63 ancient samples.

    Fig. S8. GT and MALDER analyses for all the Eurasian and North African clusters.

    Fig. S9. Neanderthal ancestry distribution in Eurasian populations and its relationship with African admixture and Basal Eurasian ancestry.

    Data file S1. Modern and ancient samples used in this study.

    Data file S2. Cluster self-copy analysis.

    Data file S3. Weighted jackknife bootstraps.

    Data file S4. qpAdm results.

    Data file S5. GT and MALDER results.

    Data file S6. NTT SNPs (Neanderthal-Tag SNPs within the top 1% of the genome-wide distributions of each of the 55 pairwise population comparisons).
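Data file S6 defines NTT SNPs as Neanderthal tag-SNPs falling in the top 1% of the genome-wide distribution for each pairwise population comparison. The statistic being ranked is not stated in this listing; the sketch below assumes the absolute difference in Neanderthal allele frequency between the two populations. SNP IDs, population labels, and function names are hypothetical.

```python
import math
from itertools import combinations


def top_fraction_snps(freq_a, freq_b, fraction=0.01):
    """SNP IDs whose |AF_a - AF_b| lies in the top `fraction` of shared SNPs.

    All SNPs tied at the cutoff are kept, so slightly more than the nominal
    fraction may be returned.
    """
    shared = [s for s in freq_a if s in freq_b]
    if not shared:
        return set()
    diffs = {s: abs(freq_a[s] - freq_b[s]) for s in shared}
    ranked = sorted(diffs.values(), reverse=True)
    k = max(1, math.ceil(len(ranked) * fraction))  # number of SNPs in the top tail
    cutoff = ranked[k - 1]
    return {s for s, d in diffs.items() if d >= cutoff}


def ntt_candidates(pop_freqs, fraction=0.01):
    """Apply top_fraction_snps to every unordered pair of populations."""
    return {
        (a, b): top_fraction_snps(pop_freqs[a], pop_freqs[b], fraction)
        for a, b in combinations(sorted(pop_freqs), 2)
    }
```

With n populations this produces n(n - 1)/2 comparisons, each with its own genome-wide cutoff.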

    The Supplementary Materials PDF contains figs. S1 to S9 and the legends for data files S1 to S6; data files S1 to S6 are provided separately in Microsoft Excel format.
