Research Article | GENETICS

Ancient Xinjiang mitogenomes reveal intense admixture with high genetic diversity


Science Advances  31 Mar 2021:
Vol. 7, no. 14, eabd6690
DOI: 10.1126/sciadv.abd6690


Xinjiang is a key region in northwestern China, connecting East and West Eurasian populations and cultures for thousands of years. To understand the genetic history of Xinjiang, we sequenced 237 complete ancient human mitochondrial genomes from the Bronze Age through Historical Era (41 archaeological sites). Overall, the Bronze Age Xinjiang populations show high diversity and regional genetic affinities with Steppe and northeastern Asian populations along with a deep ancient Siberian connection for the Tarim Basin Xiaohe individuals. In the Iron Age, in general, Steppe-related and northeastern Asian admixture intensified, with North and East Xinjiang populations showing more affinity with northeastern Asians and South Xinjiang populations showing more affinity with Central Asians. The genetic structure observed in the Historical Era of Xinjiang is similar to that in the Iron Age, demonstrating genetic continuity since the Iron Age with some additional genetic admixture with populations surrounding the Xinjiang region.


The Xinjiang region in northwestern China has served as a crossroads for East and West Eurasian migrations for thousands of years. As early as the Bronze Age (BA), Xinjiang hosted diverse cultures influenced by ancient Steppe, Siberian, Central Asian, and Northeast Asian populations (1–6). As suggested by many archaeological findings, these cultural influences on Xinjiang varied by region and time period (1–6). Archaeological studies of northern Xinjiang have revealed connections with the Afanasievo (~3300 to 2500 BCE) and Chemurchek (~2750 to 1900 BCE) cultures present in the Altai Mountains (1–3). BA cemeteries in western Xinjiang contain materials associated with mobile transportation and advanced metallurgy, which were likely derived from the Andronovo culture (~1700 to 1500 BCE) in the western Steppe and Tianshan region (7–11). There was a Central West Asian connection with Xinjiang in the BA through the Inner Asian Mountain Corridor (IAMC), which likely introduced agriculturally important crops, such as wheat and barley, and an East Asian connection through the Hexi Corridor, which introduced broomcorn millet in Xinjiang (12–14). In addition, BA populations in eastern Xinjiang share a cultural connection with East Asians from the Gansu and Qinghai (Gan-Qing) region of northern China (3, 13–17). Past studies on two BA sites (Xiaohe and Tianshanbeilu), using a limited number of Y-chromosomal single-nucleotide polymorphisms (SNPs) and the hypervariable region of mitochondrial DNA (mtDNA), could not resolve the past genetic history of Xinjiang (18–20).

During the Iron Age (IA; ~800 to 200 BCE), nomadic groups from the Eurasian Steppe affected different regions of Xinjiang. One such group was the Scythians, who were a confederation of several populations, such as the Tagar, Pazyryk, and Sakas (18, 21). Some burial custom records suggest that the Middle BA southern Siberian Okunevo culture, which had a limited amount of Steppe ancestry, also influenced northern Xinjiang (22–25).

A recent genomic study of a single IA site reported Steppe-related ancestry in eastern Xinjiang (26). Ancient genomic studies in regions around Xinjiang, mostly in the Steppe region, further support the widespread population movement and admixture of western Steppe–related ancestry in the IA (21). However, the extent of Steppe-related ancestry across Xinjiang is unknown without more ancient DNA. Later, after 200 BCE, Xinjiang was dominated by many important nomadic confederations (such as the Xiongnu and Turkic Khaganates) and Han (21, 27). These groups notably influenced this region, and the frequent transitions of power suggest that the Historical Era (HE) was also a culturally mixed period, but whether Xinjiang population structure was affected by these cultural shifts cannot be determined without ancient DNA. Overall, the population genetic structure of ancient Xinjiang remains uncharacterized, as do the genetic changes from the BA to the IA and into the HE. Linguistically, the presence of the Tocharian and Khotanese languages is also an important question to explore (28). Genomic studies on present-day Xinjiang populations indicate a complex genetic structure with frequent migration and genetic admixture (29, 30). However, with ancient DNA from only a few Xinjiang sites, our ability to reconstruct a complete picture of past population structure and admixture is limited. Therefore, obtaining ancient genetic data from BA, IA, and Historical populations is critical for characterizing the spatiotemporal changes in Xinjiang’s population structure.


Overview of data

We captured mtDNA from 237 ancient individuals from 41 archaeological sites across Xinjiang (listed in table S1), with samples dating to ~4962 to 500 years before the present (BP). Of these 237 mtDNA samples, 235 have low contamination rates (≤4%); for the other two, which have higher contamination rates (>4%), we used only fragments carrying the damage signatures characteristic of ancient DNA to limit the effect of contamination. In total, we obtained complete mtDNA genomes from 237 individuals sequenced to between 31 and 5515× coverage (table S1).

For the BA, we collected 63 samples spanning the time period of ~4962 to 2900 BP (table S1), including 24 individuals from North Xinjiang in the Early and Middle BA (EMBA; ~4800 to 4000 BP); among them, 18 individuals were associated with the Chemurchek culture (NChemur_EMBA, 4811 to 3965 BP), and two were associated with the Afanasievo culture (NAfana_EMBA; 4570 to 4426 BP). We obtained three samples associated with the Afanasievo culture from the Ili Valley in western Xinjiang (WAfana_EMBA; ~4962 to 4840 BP) and combined them with samples associated with the Afanasievo from North Xinjiang to form the NWAfana_EMBA group. The other sites, Songshugou (NSSG_EMBA; n = 3, 4237 to 4087 BP) and Habahe (n = 1, ~4500 BP), with no known information on their archaeological cultures were analyzed separately. From eastern Xinjiang, we collected one individual from the BA Nanwan site (E_BA; ~3600 to 3000 BP) and an additional 25 samples dating to the Late BA (E_LBA; ~3000 to 2900 BP). We merged the haplogroup information of our single Nanwan individual with 29 previously published Tianshanbeilu individuals to comprise the E_BA group, given the close archaeological relationship between the two sites (18). In addition, we collected 10 samples from the fourth to fifth layers of the Xiaohe archaeological site in the eastern Tarim Basin, southeastern Xinjiang (SEXiaohe_BA; ~3929 to 3572 BP) (figs. S1 and S2) (20). These groupings represent the eastern, western, northern, and southeastern Xinjiang populations during the BA, as indicated by the first letters of their labels (Fig. 1).

Fig. 1 Geographic locations and haplogroups of ancient Xinjiang samples.

(A) Geographic locations and haplogroups of BA and IA/HE samples from eastern (E), western (W), southern (S), and northern (N) Xinjiang (XJ). The different colored and shaped symbols overlaying Xinjiang represent different sites (group names in the map are consistent with those in the manuscript). We merged haplogroup information of our one Nanwan (NW) individual with 29 previously published Tianshanbeilu (TSBL) individuals to comprise the E_BA group. Colored bar plots represent the frequencies of various haplogroups, shown on the bottom right. (B) Timeline of Xinjiang sites with the number of individuals in parentheses. NS, NC, and NWA_EMBA are the abbreviations of NSSG_EMBA, NChemur_EMBA, and NWAfana_EMBA, respectively.

For the IA, we collected 128 samples (~2900 to 2000 BP) across Xinjiang (Fig. 1), of which 27 were from eastern Xinjiang (E_IA), 15 from northern Xinjiang (N_IA), 55 from the Ili region in western Xinjiang (W_IA), and 31 from southern Xinjiang (S_IA). We did not merge the different S_IA archaeological sites into one large group because of their high cultural heterogeneity and wide geographic distribution (31–36); we analyzed them as separate groups. Both S_IA groups, SZGLK_IA (n = 19) and SWJEZKL_IA (n = 12), were from the Tarim Basin; SWJEZKL_IA came from the high-altitude region of southwestern Xinjiang, adjacent to the Pamir Plateau (Fig. 1). We also collected and analyzed the haplogroup information of 10 published IA individuals (~2200 BP) from the Shirenzigou site of eastern Tianshan (grouped as SRZG_IA) (26).

In addition to the BA and IA mitogenomes, we also sequenced mtDNA from 46 HE (~2000 to 500 BP) individuals. Considering the sample size and highly mixed archaeological cultures of HE sites, we first grouped the samples based on their geographical locations and then additionally subdivided geographical groups based on their archaeological cultures. There are five groups in total, one from western Xinjiang (W_HE; n = 11), three from southern Xinjiang (SBZL_HE, n = 15; SSPL_HE, n = 10; and SHetian_HE, n = 9), and one individual from the Baiyanghe site in eastern Xinjiang (E_HE). In addition, we obtained the published ancient (n = 738) and present-day (n = 7085) mtDNA data from regions surrounding Xinjiang and grouped all these populations into several subgroups based on previous genetic studies (see Box 1 and table S1 for additional details).

Box 1

Group composition and new findings in this study.

Ancient groups for published individuals

➤ European populations (Euro): Baltic_BA, BellBeaker_LN, Hungary_BA, Unetic_LNBA, Scandi_LNBA; early European farmer: LBKH_EN, Iberia_EN, Iberia_Chal, and Anatolia_N; western Steppe: WSteppe_EMBA (Yamnaya_EBA, Afanasievo_EBA, Poltavka_BA, and Potapovka_BA), WSteppe_MLBA (CordedWare_BA, Sintashta_MLBA, Srubnaya_MLBA, Andronovo_MLBA, and WKazakh_MLBA); Anatolia_BA.

➤ Turan populations: Turkmen_EN, Turkmen_MLBA, Uzbek_MLBA, Iran_C, Iran_EMBA.

➤ Bronze Age central Steppe populations (CS_BA): Okunevo_EMBA, CSteppe_EMBA (Kazakh_EBA), CSteppe_MLBA (Kazakh_MLBA and Krasn_MLBA), Karasuk_MLBA.

➤ Iron Age central Steppe populations (CS_IA): Sarmatian_IA, CenSaka_IA, TianSaka_IA, TianHun_HE, Nomad_IA, Nomad_Med, WuKang_HE, Tagar_IA.

➤ Northeastern Asian populations (NEA): Siberian: Baikal_EN (Shamanka_EN), Baikal_EBA (Shamanka_EBA and Ustlda_EBA); northern East Asian: Khovsgol_LBA, GQMajiaY_EBA, GQQijia_BA, GQKayue_LBA, LTP_IA, Xiongnu_HP.

➤ Southeastern Asian populations (SEA): HTP_IA.

Present-day groups for published individuals

➤ Present-day northeastern Asian populations (pdNEA): Siberian: Yakut, Even, Evenk, Koryak, Yukaghir, Chukchi; northern East Asia: Japanese, Korean, Barghut, Daur, Mongolian, Tu, Oroqen, Hezhen, Xibo, NHan.

➤ Present-day southeastern Asian populations (pdSEA): Southern East Asian: Dai, Miao, She, Yi, Lahu, Naxi, Tujia, SHan, TibetC, TaiwanC; Southeast Asia: Cambodia, India, Laos, Myanmar, Nepal, Philippines, Thailand, Vietnam.

➤ Present-day populations in and around Xinjiang (pdCA/XJ): Buryat, AltaianSib, KazSib, UygurXJ, KyrgyzXJ, SarTajikXJ, WakTajikXJ, LowTajik, PamirTajik, KazTajik, Pakistan.

➤ Present-day Europeans and West Asian populations (pdEurWA): Caucasus, European, WestAsia.

Ancient groups for samples in four regions of Xinjiang

➤ Northern and western Xinjiang

NChemur_EMBA: EMBA individuals (n = 18, ~4811–3965 BP) associated with Chemurchek culture from northern Xinjiang show a connection with western Steppe, central Steppe, and NEA populations.

NSSG_EMBA: EMBA individuals (n = 3, ~4237–4087 BP), with unknown culture, show a connection with western Steppe and NEA populations.

NWAfana_EMBA: EMBA individuals (n = 5, ~4962–4426 BP) associated with Afanasievo culture from northern and western Xinjiang show an affinity with western Steppe populations.

N_IA: IA individuals (n = 15, ~2900–2000 BP) from northern Xinjiang are closely related to NEA.

W_IA: IA individuals (n = 55, ~3000–2000 BP) from western Xinjiang are closely related to both NEA and European lineages and share lineages with Turan.

W_HE: HE samples (n = 11, ~2000–500 BP) from western Xinjiang are closely related to both NEA and European lineages and share lineages with Turan.

➤ Eastern Xinjiang

E_BA: One BA sample (~3600–3000 BP) from eastern Xinjiang.

E_LBA: 25 LBA samples (~3000–2900 BP) from eastern Xinjiang show high affinity with NEA.

E_IA: 27 IA individuals (~2800–2000 BP) from eastern Xinjiang share ancestry with both NEA and European lineages but have more affinity with NEA.

E_HE: One individual (~1200–1000 BP) from Historical Era.

➤ Southern Xinjiang in Tarim Basin

SEXiaohe_BA: 10 BA samples from the 4th to 5th layers of the Xiaohe site in southeastern Xinjiang (~3929–3572 BP) are closely related to NEA (Siberian) populations.

S_IA: Two IA sites, SZGLK_IA (~2488–1968 BP, n = 19) and SWJEZKL_IA (~2680–2348 BP, n = 12). SZGLK_IA has an affinity with MLBA individuals in western Steppe and Turan; SWJEZKL_IA, from southwestern Xinjiang, near the Pamir Plateau at a high-elevation region, shows a more NEA connection.

S_HE: Three HE sites, SSPL_HE (~1876–1724 BP, n = 10), SHetian_HE (~1812–1605 BP, n = 9), and SBZL_HE (~1529–1412 BP, n = 15), show an affinity with western Steppe and Turan populations.

In general, for all the sampled ancient Xinjiang populations, we find a positive and significant correlation between maternal genetic distance (FST) and geographic distance (R² ≈ 0.0327, P ≈ 0.018) (fig. S3), using the Mantel test (37, 38). Although the correlation is significant, geographic distance explains only a small share of the variance in genetic distance. Thus, ancient Xinjiang populations were likely highly admixed and had low geographic structuring. The genetic comparison of BA, IA, and HE Xinjiang populations also revealed variation in nucleotide diversity (π) (fig. S4). IA and HE populations generally showed higher π than BA populations, indicative of elevated population migration and admixture during the IA and HE relative to the BA. Among the IA and HE populations, W_HE (0.0024 ± 0.0013) showed the highest diversity, with the lowest diversity observed in southern Xinjiang populations (0.0015 ± 0.0008 to 0.0020 ± 0.0010) (fig. S4).
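The logic of a Mantel test can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy matrices, and the one-sided permutation scheme below are assumptions made for illustration only. The sketch correlates the off-diagonal entries of a genetic distance matrix with those of a geographic distance matrix and estimates significance by permuting population labels; squaring the returned r gives a value comparable to the reported R².

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mantel_test(genetic, geographic, permutations=999, seed=0):
    """One-sided Mantel test (hypothetical helper, not from the study).

    Rows and columns of the genetic matrix are permuted together, which
    shuffles population labels while keeping the matrix symmetric.
    """
    n = len(genetic)
    geo_flat = upper_triangle(geographic)
    r_obs = pearson(upper_triangle(genetic), geo_flat)
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[genetic[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        if pearson(upper_triangle(permuted), geo_flat) >= r_obs:
            hits += 1
    # Count the observed arrangement as one of the permutations.
    p_value = (hits + 1) / (permutations + 1)
    return r_obs, p_value


# Toy data (invented): four populations placed along a line, with
# genetic distance exactly proportional to geographic distance.
toy_geographic = [[0, 1, 2, 3],
                  [1, 0, 1, 2],
                  [2, 1, 0, 1],
                  [3, 2, 1, 0]]
toy_genetic = [[0.01 * d for d in row] for row in toy_geographic]

r, p = mantel_test(toy_genetic, toy_geographic, permutations=199, seed=1)
r_squared = r ** 2
```

With real data, a small R² combined with a low P value, as reported above, would indicate a detectable but weak association between geography and maternal genetic distance.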

Genetic origins and complexity of BA Xinjiang populations

To determine the genetic differences and affinities among the BA Xinjiang groups, we first conducted a discriminant analysis of principal components (DAPC) based on haplogroup frequency. PC1 explains population variation from east to west geographically, and PC2 explains the variation from north to south (Fig. 2A and fig. S5). In general, all the populations were divided into four main clusters: northeastern Asian (NEA: Siberian and northern East Asian), southeastern Asian (SEA: southern East Asian and Southeast Asian), central Steppe, and European (Turan and European). All the ancient Xinjiang samples lie on a cline extending from the NEA populations to the central Steppe and European clusters (Fig. 2A and fig. S5), suggesting that these ancient Xinjiang populations had varying degrees of relatedness to NEA, central Steppe, and European populations.
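The input to a frequency-based analysis such as DAPC is a table of haplogroup frequencies per group. The following sketch shows one way such a table could be built. It is an illustration under stated assumptions: the helper names and group labels are hypothetical, and collapsing each label to its leading capital letters (for example, D4b2b4 to D, HV18 to HV) is a simplification that may not match the haplogroup resolution used in the study.

```python
from collections import Counter


def macro_haplogroup(label):
    """Collapse a haplogroup label to its leading uppercase letters.

    Assumption for illustration: "D4b2b4" -> "D", "HV18" -> "HV",
    "C4a1a+195" -> "C".
    """
    prefix = ""
    for ch in label:
        if ch.isalpha() and ch.isupper():
            prefix += ch
        else:
            break
    return prefix


def haplogroup_frequencies(samples):
    """Return {group: {macro_haplogroup: frequency}}.

    `samples` is an iterable of (group_name, haplogroup_label) pairs.
    """
    counts = {}
    for group, label in samples:
        counts.setdefault(group, Counter())[macro_haplogroup(label)] += 1
    return {
        group: {h: c / sum(counter.values()) for h, c in counter.items()}
        for group, counter in counts.items()
    }


# Invented example records; group names are placeholders, not real data.
toy_samples = [
    ("groupA", "D4j"), ("groupA", "D4b2b4"),
    ("groupA", "U4"), ("groupA", "H15"),
    ("groupB", "HV18"), ("groupB", "C4a1a+195"),
]
toy_freqs = haplogroup_frequencies(toy_samples)
```

Each row of the resulting table (one per group) can then serve as a feature vector for ordination or clustering methods.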

Fig. 2 The genetic results for the ancient Xinjiang samples and other ancient and present-day Eurasians.

(A) DAPC based on haplogroup frequency; the eigenvalues of the first and second PCs are shown on the top right. The different colors on the plot represent different groups made under unsupervised classification. The shapes with black frames represent the ancient Xinjiang samples with the BA samples in dark red and the IA/HE samples in dark blue. The populations plotted as triangles are ancient populations, and the circles are present-day humans. NEA, northern East Asian (upward pointing triangles), and Siberia (triangles pointing to the right); SEA, southeastern Asia; Euro, Europe; CS, central Steppe populations in BA (CS_BA; upward pointing triangles) and IA (CS_IA; triangles pointing to the right); pdCA/XJ, present-day populations in and around Xinjiang. (B) The genetic distance (FST) heatmap plot of ancient Xinjiang samples and ancient Eurasians. The different labels represent genetically distinctive groups corresponding to those in the DAPC. Values with FST ≈ 0.00 are in white, representing a close genetic relationship. SEXiaohe_BA was removed from the FST heatmap, considering the significantly large genetic distances (FST > 0.10) between them and other groups.

We find that the EMBA Xinjiang individuals from northern and western Xinjiang, associated with the Afanasievo culture (NWAfana_EMBA), are surrounded by western Steppe–related populations (WSteppe_EMBA and WSteppe_MLBA) (Fig. 2A and fig. S5). In contrast, the EMBA individuals associated with the Chemurchek culture (NChemur_EMBA) and individuals from the Songshugou site in northern Xinjiang (NSSG_EMBA) form their own separate cluster surrounded by other central Steppe populations (Fig. 2A and fig. S5). High proportions of U, H, and R haplogroups are observed, which were reported primarily in BA Steppe populations (table S1). Although there is no significant genetic differentiation among these EMBA individuals (FST ≈ 0.00, P > 0.05) (Fig. 2B, fig. S6, and table S2), we find that only NChemur_EMBA shows significant genetic differentiation from both western Steppe populations (WSteppe_EMBA: FST ≈ 0.045, P ≈ 0.005; WSteppe_MLBA: FST ≈ 0.042, P ≈ 0.006) (Fig. 2B and table S2). NChemur_EMBA also shows significant genetic differentiation from CSteppe_MLBA (FST ≈ 0.057, P ≈ 0.002) but not from CSteppe_EMBA (FST ≈ 0.029, P ≈ 0.16) (Fig. 2B and table S2), consistent with its position on the DAPC plot (fig. S5C). Moreover, in the median-joining networks, the WSteppe_EMBA individuals cluster with NWAfana_EMBA in haplogroups U4, U5, and H15 (Fig. 3D and fig. S7B); with NChemur_EMBA in U4, U5, H2, H6a, and W3 (Fig. 3D and fig. S7, B and D); and with NSSG_EMBA in U4 (Fig. 3D and fig. S7B) (39, 40). The two haplogroups H2 and H5 are present in the western Steppe–related populations, and H6a appears in the populations related to the Okunevo culture present in the Altai region (fig. S7B and table S1) (40). 
We also find an NEA connection (Baikal_EBA) with NChemur_EMBA supported by the appearance of haplogroup D4j (table S1), which differed in these two populations by only four mutations in the network analysis and appeared in northern East Asia, including the southern Siberian region (Fig. 3E and fig. S7A) (41). The one EMBA sample from HBH has the haplogroup U5 (table S1), suggesting a more western Steppe–related connection. Therefore, we demonstrate that both western and northern Xinjiang populations have considerable western Steppe–related ancestry during the EMBA.
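To make the FST comparisons above more concrete, the sketch below computes a simple frequency-based differentiation statistic for a pair of populations. It is not the estimator used in the study, which works from complete mitogenome sequences and tests significance by permutation. Here a Nei-style ratio, (HT − HS)/HT, computed from haplogroup frequencies, is swapped in for illustration, and the function names are hypothetical.

```python
def gene_diversity(freqs):
    """Expected diversity 1 - sum(p^2) for a {haplogroup: frequency} dict."""
    return 1.0 - sum(p * p for p in freqs.values())


def pairwise_fst(freqs_a, freqs_b):
    """Nei-style differentiation between two populations (illustrative).

    H_S is the mean within-population diversity; H_T is the diversity of
    the pooled population, weighting both populations equally.
    """
    haplogroups = set(freqs_a) | set(freqs_b)
    pooled = {
        h: (freqs_a.get(h, 0.0) + freqs_b.get(h, 0.0)) / 2
        for h in haplogroups
    }
    h_s = (gene_diversity(freqs_a) + gene_diversity(freqs_b)) / 2
    h_t = gene_diversity(pooled)
    if h_t == 0:
        # Both populations are fixed for the same haplogroup.
        return 0.0
    return (h_t - h_s) / h_t


# Invented frequencies: identical populations versus fully distinct ones.
fst_identical = pairwise_fst({"D": 0.5, "U": 0.5}, {"D": 0.5, "U": 0.5})
fst_disjoint = pairwise_fst({"D": 1.0}, {"U": 1.0})
```

Values near 0 correspond to populations with similar haplogroup composition, and values near 1 to populations that share almost no lineages. Significance in practice would require a permutation test over individuals, which this sketch omits.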

Fig. 3 Median-joining haplogroup networks.

The median-joining network of the haplogroups C4 (A), R1b (B), HV (C), U4 (D), and D4 (E) related to ancient northeastern Asian (NEA), Botai/Dali, European (Euro), Turan, and BA and IA populations from the central Steppe (CS_BA and CS_IA). The Euro group consists mainly of western Steppe–related individuals, and the CS_IA group contains the Saka, Hun, and Nomad populations. The size of the circles represents the proportion of each haplotype. The lengths of lines between nodes represent the number of mutations between two haplotypes. The different population groups are shown by different colors that are consistent with those groups in DAPC and FST heatmap. Some ancient Xinjiang samples are annotated with black labels.

We find that eastern groups (E_BA and E_LBA) cluster separately from the EMBA individuals in northern and western Xinjiang. Both of the eastern groups cluster with ancient and present-day NEA in the DAPC (Fig. 2A and fig. S5). E_BA and E_LBA harbor high proportions of the haplogroup D (~36.70 and 32.00%, respectively), which is a common lineage in ancient and present-day NEA populations (42, 43) including northern Chinese (18.20 to 44.80%) and ancient Mongolians (31.20%) (Figs. 1A and 3E and table S3). E_LBA also shows nonsignificant genetic distances to some of these NEA populations, specifically two ancient Gan-Qing populations (GQQijia_BA and GQKayue_LBA; FST < 0.05, P > 0.05) and four present-day populations (Japanese, Mongolian, Tu, and Oroqen; FST < 0.03, P > 0.05) (fig. S6, B and C, and table S2). Although both E_BA and E_LBA have the western Steppe–related haplogroup U, they show a higher proportion of lineages from NEA than from Europe, with more European lineages appearing in later samples (20% in E_BA and 36% in E_LBA) (table S3). This pattern is consistent with DAPC in which E_LBA plots closer to West Eurasians compared to E_BA (Fig. 2A). Moreover, haplogroup D4b2b4 is found in both the Xiongnu and E_LBA (Fig. 3E), which suggests a direct relationship between E_LBA and Xiongnu populations due to the presence of shared NEA ancestry. Thus, E_BA and E_LBA populations show more NEA connections, but the presence of western Steppe–related lineages (U, 16.7% in E_BA and 8% in E_LBA) also supports additional connections to the western Steppe–related populations (table S3).

Although SEXiaohe_BA clusters into the NEA groups in DAPC, which is similar to E_BA and E_LBA, they show more affinity for populations with ancient and present-day Siberian ancestry (Fig. 2A and fig. S5). SEXiaohe_BA has a high proportion of the C4 haplogroup (six of seven individuals) present in ancient and present-day Siberian populations, including NEA and Shamanka populations from near the Lake Baikal region of South Siberia (Fig. 3A and fig. S7A). This population is unique in yielding significant genetic distances compared to all other ancient and present-day populations (FST > 0.11, most P values < 0.04), including other BA Xinjiang groups, but it has the lowest genetic distances with three present-day populations from Siberia (Even, Evenk, and Yakut: FST < 0.13) (table S2). These results are consistent with previous studies on Xiaohe (19, 20).

We also find the mtDNA haplogroup R1b in BA Xinjiang samples (NChemur_EMBA, n = 2; NWAfana_EMBA, n = 1; NSSG_EMBA, n = 1; SEXiaohe_BA, n = 1) and in IA and HE populations from eastern and western Xinjiang (E_IA, n = 1; W_IA, n = 1; W_HE, n = 2) (table S1), which was reported not only in East European Hunter-Gatherers (Karelia) (44) but also in Botai (40) and Dali (28) individuals from Kazakhstan. Moreover, the haplogroup K1b2 was shared among the Botai (40) and western Steppe–related populations as well as our LBA samples from eastern Xinjiang (E_LBA) (table S1). The R1b median–joining network shows that the EMBA sample (3012 to 2890 cal BCE) from northern and western Xinjiang, associated with the Afanasievo (NWAfana_EMBA), plots in the center of the network and was separated from Botai by only a single mutation (Fig. 3B). This branch, in turn, is associated with NSSG_EMBA and another branch that includes an individual from the Dali site (Fig. 3B). This may suggest either a deep ancestry connection with an Ancient North Eurasian (ANE) population or some genetic connections with geographically proximal populations from Kazakhstan (Dali and Botai) (28, 40). We also find the R1b haplogroup in one of the individuals from the Xiaohe population (Fig. 3B), which may also suggest a North Xinjiang connection with Xiaohe people (11). Thus, during the BA, the northwestern Xinjiang populations showed a high genetic affinity for western Steppe–related cultures, such as the Afanasievo and Chemurchek, and the southeastern Xinjiang populations for NEA and South Siberian populations (Fig. 4A), suggesting a scenario of complex interactions with the neighboring populations and communities of diverse cultural backgrounds.
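Statements such as "separated by only a single mutation" refer to the number of differing sites between two haplotypes, which sets the edge lengths in a median-joining network. The sketch below shows only that pairwise count for two aligned sequences. It is a simplified illustration: the function name and toy sequences are invented, positions with missing data are skipped by assumption, and constructing the network itself (including inferred median nodes) is beyond this example.

```python
def mutation_distance(seq_a, seq_b, missing="N-"):
    """Count differing sites between two aligned, uppercase sequences.

    Sites where either sequence has a missing or gap character are
    ignored (an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a not in missing and b not in missing
    )


# Invented toy haplotypes, not real mitogenome data.
one_step = mutation_distance("ACGTACGT", "ACGTTCGT")
with_missing = mutation_distance("ACGT", "ANGA")
```

Two haplotypes at distance 1 would be direct neighbors in the network, as described for the NWAfana_EMBA and Botai R1b haplotypes.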

Fig. 4 Inferred maternal population movements around Xinjiang.

Depictions of the main population movements in the BA (A), IA (B), HE (C), and present day (D). The dashed line in (A) represents the connection between SEXiaohe_BA and ancient Siberian (Sib) populations. The shades of green, blue, brown, and turquoise in (A) to (C) represent the ancient European-, NEA-, central Steppe–, and Turan-specific ancestries, respectively. The shades of light green, blue, and yellow represent lineages found in present-day Europeans and West Asians, NEA, and populations in and around Xinjiang (pdCA/XJ). NS, NC, and NWA in (A) are the abbreviations of NSSG_EMBA, NChemur_EMBA, and NWAfana_EMBA, respectively.

Greater cultural and haplogroup diversity in IA Xinjiang

To better understand the population movements and changes in Xinjiang during the IA, we analyzed 128 samples across Xinjiang (Fig. 1). In general, all the Xinjiang IA groups except N_IA show higher genetic haplogroup diversity than those in the BA (Fig. 1A), suggesting more migrations and communication from East and West Eurasia. The genetic differentiation (FST) between the IA groups is mostly nonsignificant (P > 0.05), suggesting high levels of admixture in the IA (Fig. 2B and table S2). In the DAPC analysis (Fig. 2A and fig. S5C), we find that the N_IA clusters close to NEA populations (such as ancient GQMajiaY_EBA and Khovsgol_LBA in northern China, present-day Xibo in northern China, and present-day Yakut in Siberia), which is contrary to the EMBA northern Xinjiang populations who cluster close to those with western Steppe–related ancestry. This high affinity for NEA is also represented by two major EA haplogroups, D (53.30%) and F (13.30%) (Fig. 1A and table S3), and low FST values with North China populations, such as GQMajiaY_EBA, GQQijia_BA, GQKayue_LBA, and LTP_IA (table S2). The median-joining networks further suggest a connection between N_IA and NEA individuals (haplogroups D4e and D5a), along with some Saka individuals (TianSaka_IA, D4j8) (Fig. 3E and fig. S7E). In addition, we also observe western Steppe–related haplogroups U4 (6.70%) and H5 (6.7%) and the Turan-related haplogroup U7 (6.7%) (Fig. 3D and fig. S7, B and C) (45, 46), suggesting a genetic connection with these populations.

Eastern Xinjiang in the IA is represented by two groups: our E_IA samples and the published samples from SRZG_IA. We observe that both E_IA and SRZG_IA have very low genetic distances with GQQijia_BA from northern China (FST < 0.05, P > 0.05), consistent with the BA samples from the same region (E_LBA) (Fig. 2B and table S2). In DAPC, although both E_IA and SRZG_IA plot in the central Steppe cluster, E_IA shows a slightly greater affinity to the NEA populations, whereas the published samples from SRZG_IA group closer to those with western Steppe–related ancestry (Fig. 2A and fig. S5). We also observe higher proportions of the western Steppe–related U haplogroup in SRZG_IA (40%) compared to E_IA (22.20%), consistent with the DAPC plot (Fig. 1A and table S3). In addition, we find the NEA haplogroup D (14.8%) in E_IA, represented by D4b2b (in the Xiongnu) (Fig. 3E, fig. S7A, and table S1). We also observe Central Asian haplogroups. The Turan-specific haplogroup T2d1 (Uzbek_MLBA) in E_IA likely reflects additional affinities shared with Turan populations (fig. S7C). Therefore, E_IA populations not only show both NEA and European haplogroups (Fig. 1 and table S3) but also share additional genetic affinity with Turan populations.

In DAPC, the W_IA populations lie on the cline of NEA and European populations and group close to the IA populations in the central Steppe (e.g., Saka and Hun) (Fig. 2A and fig. S5, A and C). This East and West Eurasian affinity could be seen in the diverse array of East and West Eurasian haplogroups compared to the other three IA geographical regions (Fig. 1A). The W_IA includes the two major European haplogroups U (20.40%) and H (18.50%) and the NEA haplogroups C (14.80%) and D (11.10%) (Fig. 1A and table S3). The connection to the WSteppe_MLBA lineages is also observed with sub-haplogroups T2b34 and U5a2a1 but not in the BA samples from the same region (fig. S7B and table S1). The median-joining network of T2b34 also shows a W_IA and WSteppe_MLBA connection (fig. S7B). This high haplogroup diversity is further reflected in the FST values whereby many of the NEA and European populations show very low FST values with W_IA (table S2). In addition, the presence of the Turan-specific haplogroup HV (HV18 and HV20) and W (W3b) (Fig. 3C and fig. S7, C and D) indicates the possible migration of people from the Turan through IAMC to the Ili region. We also observe some important outliers in W_IA, such as the presence of haplogroups G3a3 (Xiongnu_HP), C4a1a+195 and C4+152 (TianHun_HE), and H101 (CenSaka_IA), suggesting a possible connection with Steppe nomadic groups (table S1). Also, the phylogenetic networks of C4, G3a3, and H101 show direct connections with few differences between the W_IA and the IA Steppe nomadic Saka, Xiongnu, and Hun populations (Fig. 3A and fig. S7D), suggesting the potential role of the Ili region in the migration of the Saka, Xiongnu, and Hun in ancient times.

Last, the SZGLK_IA from South Xinjiang shows a predominant Steppe and Central Asian connection as they lie close to the Steppe and Central Asian populations in DAPC, whereas SWJEZKL_IA lies close to populations with NEA and central Steppe ancestry (Fig. 2A and fig. S5) with high frequencies of NEA haplogroups C (25.00%) and D (25.00%) (Fig. 1A and table S3). Haplogroup analysis shows that SZGLK_IA has a relatively higher frequency of H (26.3%) and U (5.3%) haplogroups (table S3), which are present in ancient West Eurasian populations, such as WSteppe_MLBA (fig. S7B); SZGLK_IA also has low FST values (<0.05, P = 0.005) with WSteppe_MLBA (table S2). The connection to the Steppe_MLBA is further augmented by the presence of haplogroups T2b34, H5a1, U5a2a1, and N1a1a1a, present in WSteppe_MLBA populations but not in WSteppe_EMBA (table S1 and fig. S7B). Moreover, the haplogroup N1a1a1a has a high frequency (about 6 of 73) in early European farmers (LBK_EN and Anatolia_N) but not in European Hunter-Gatherers (table S1 and fig. S7B). The presence of the HV12 and R2+13500 haplogroups in SZGLK_IA also reveals a Central Asian connection from Turan (Fig. 3C and fig. S7C). A close affinity with Central Asians was further found with lower FST values comparing SZGLK_IA with Iran_C (=0.00) and Turkmen_EN (≈0.01) (Fig. 2B and table S2). In addition, the SWJEZKL_IA too had a Turan-related lineage (H13a2a), indicating some connections with Turan (table S1 and fig. S7D).

Overall, the IA populations show a distinct population structure across Xinjiang despite intense admixture. The N_IA and high-elevation SWJEZKL_IA populations show more NEA ancestry, whereas the W_IA and E_IA populations show both NEA and European ancestries, and SZGLK_IA shows more connections with the western Steppe–related populations in the MLBA (Fig. 4B). In addition, all the IA populations show a genetic connection with Turan populations; among these IA samples, E_IA, W_IA, and SZGLK_IA shared more affinity with the Turan populations compared to N_IA (table S2). This further suggests the important role played by the IAMC, which probably led to the increased migration of Turan people with ancestries similar to the Bactria Margiana Archaeological Complex culture prevalent in this region (11, 47).

Genetic continuity between the IA and HE in Xinjiang

In addition to the BA and IA individuals, to further assess the genetic diversity after the IA time period, we sequenced 46 samples from the HE with a range of archaeological dates between ~2000 and 500 BP. In general, we find that the HE individuals show the lowest FST values with ancient Xinjiang populations compared to other ancient populations outside of Xinjiang (table S2). In DAPC, the W_HE clusters close to other IA individuals of W_IA, SRZG_IA, SBZL_HE, and SZGLK_IA along with ancient central Steppe individuals (Fig. 2A and fig. S5). The W_HE populations also show the same NEA lineages (C, 27.30%; G, 9.10%) and European haplogroups (H, 9.10%; U, 18.20%) present in the IA (W_IA) individuals (Fig. 1A and table S3). They additionally have the Hun (C4a1a+195), Saka (H101), and Turan mtDNA lineages that were also found in the W_IA (table S1). The close affinity between the W_HE and the W_IA is further shown by low FST values (FST ≈ 0.002, P > 0.05) (table S2). The one historical sample from East Xinjiang has the haplogroup D4 (table S1), suggesting a more NEA connection.

The South HE (S_HE) individuals from the three sites (SBZL_HE, SSPL_HE, and SHetian_HE) yielded varied affinities. SBZL_HE is unique among the S_HE in having haplogroup D (13.3%), while SSPL_HE contains a high proportion of the western Steppe–related haplogroup H (60%) (Fig. 1A and table S3). Haplogroup frequencies show the same pattern, whereby the SBZL_HE individuals show a relatively higher NEA affinity (33.3%, haplogroups A, C, and D) compared to the SHetian_HE (C, 11.1%) and SSPL_HE (0.00%) individuals (Fig. 1A and table S3), which show higher affinities for western Steppe–related populations (60% H, 20% U, 10% T, and 10% W in SSPL_HE) (Fig. 1A and table S3), and group close to the western Steppe–related populations in DAPC (Fig. 2A and fig. S5). The three HE populations show low FST values with western Steppe–related and Turan populations (FST < 0.05) (table S2), similar to the IA population from the same region (SZGLK_IA). However, we find the Hun-related haplogroup (D4j5) in some of the S_HE samples (SBZL_HE) and in W_IA (Fig. 3E). The D4 haplogroup network shows that W_IA and SBZL_HE are separated by fewer mutations with TianHun_HE (Fig. 3E), which may reflect a connection between IA western Xinjiang and HE southern Xinjiang, and a southward movement of the Hun from the Ili region into southern Xinjiang.

Mitogenomic comparisons of BA, IA, and HE Xinjiang individuals

Above all, haplogroup comparisons between different Xinjiang geographical regions across the BA and IA suggest the occurrence of multiple migrations and admixture events (Fig. 1A). The EMBA populations in northern Xinjiang show higher affinity with western Steppe–related populations, suggested by haplogroups U and H, whereas the IA population (N_IA) shares more connections with East Eurasians, especially those found in NEA with high proportions of haplogroup D4 (such as D4c1b1, D4e1, and D4o2a that are specific in IA/HE) (Fig. 2A and table S1). We found consistent results in our FST analysis where EMBA in northern Xinjiang populations shows a small genetic distance with the western Steppe–related populations (FST = 0.00, P > 0.05 in NWAfana_EMBA and NSSG_EMBA) and the N_IA with the ancient NEA (FST = 0.00, P > 0.05 in GQQijia_BA) (Fig. 2B and table S2). This shift toward more NEA-related ancestry from the BA to IA period suggests more frequent migrations and admixture between NEA and North Xinjiang populations.

The western Xinjiang populations show fairly consistent haplogroup compositions across the BA and IA periods, with the presence of western Steppe–related haplogroups (U5 and H15), while W_IA shared some NEA haplogroups (C4, G2, and D4) (table S1). W_IA plotted between NEA and European populations and clustered with central Steppe populations in DAPC, which was distinct from the EMBA individuals (NWAfana_EMBA) associated with the Afanasievo (Fig. 2A and fig. S5). W_IA also shows a Turan connection (FST < 0.03, most P > 0.05 in Turkmen_EN, Iran_C, and Iran_EMBA) (table S2). The genetic comparison from the BA to IA thus supports the important role played by the Ili region, where populations with Steppe-related and NEA ancestries thrived for a very long time.

In eastern Xinjiang, both the BA and IA groups show more affinity to NEA populations, but we also find haplogroups from European populations (Fig. 1A and table S3), further suggesting the presence of NEA and European-admixed populations in this region. The South IA population shows a Turan connection (SZGLK_IA, FST < 0.03, P > 0.05 in Iran_C and Iran_EMBA), similar to the W_IA (Fig. 2B and table S2), which may reflect the migration of populations from the West (Ili) to South Xinjiang. These highly admixed ancestries in the IA populations continued into the HE. The Historical period in Xinjiang saw varied migrations and settlements of both NEA and European populations, thus reflecting a civilization founded and settled by both East and West Eurasian populations.

To test how different classifications would affect the variance among different Xinjiang groups, we conducted analysis of molecular variance (AMOVA) tests among the ancient Xinjiang populations (table S4). When compared to other groups, the significantly highest FCT value (variance among groups; FCT = 0.054, P = 0.001) was observed when Xinjiang samples were classified into four groups: WXJ (NChemur_EMBA/NWAfana_EMBA/NSSG_EMBA/W_IA/W_HE/SZGLK_IA/SSPL_HE/SHetian_HE/SBZL_HE), EXJ (N_IA/SWJEZKL_IA/E_LBA), EIA (E_IA/SRZG_IA), and SEXiaohe_BA (table S4). The EXJ group shared more affinity with NEA populations, WXJ groups clustered with western Steppe–related populations, and the EIA group shared more affinity with central Steppe and Europeans, as shown by the FST heatmap and DAPC plot (Fig. 2).

Ancient Xinjiang comparisons with present-day populations

To determine the relationship that ancient Xinjiang populations share with present-day Eurasians, we compared Xinjiang individuals to four subgroups based on geographical locations: present-day NEA populations (pdNEA: Siberian and northern East Asian); SEA populations (pdSEA: southern East Asian and Southeast Asian); populations in and around Xinjiang (pdCA/XJ); and a combined European, Caucasus, and West Asian group (“pdEurWA”) (figs. S5C and S6A). Among the BA populations, SEXiaohe_BA shows the highest affinity for pdNEA-Siberians (FST = 0.12, P = 0.00) and E_LBA for pdNEA–northern East Asian (FST < 0.02, P > 0.05 with Tu, Japanese, and Mongolian), while the northern and western Xinjiang EMBA shows the highest affinity for pdCA/XJ and pdEurWA (FST < 0.05, P > 0.05 with most populations), which are also observed in DAPC and haplogroup analyses (Fig. 2A and table S2). The IA and HE Xinjiang populations also show, in general, a high affinity for the populations in and around the Xinjiang region (pdCA/XJ) with very low FST values, demonstrating genetic connections between the IA, HE, and present-day populations (Fig. 4 and table S2). In addition, we see significant pdNEA connections with Xinjiang N_IA (FST < 0.02, P > 0.05 with Japanese, Tu, Oroqen, and Han populations) and SWJEZKL_IA (FST < 0.02, P > 0.05 in Tu and Mongolian) (fig. S6, B and C, and table S2).

The present-day Xinjiang populations from southwestern Xinjiang (29) include four populations: UygurXJ, KyrgyzXJ, SarTajikXJ, and WakTajikXJ. In DAPC, UygurXJ and KyrgyzXJ plot closer to the populations around Xinjiang (pdCA/XJ), and SarTajikXJ and WakTajikXJ cluster with pdEurWA populations (fig. S5, B and C). Our ancient Xinjiang samples clustered close to the UygurXJ and KyrgyzXJ, suggesting more East Asian affinities compared to the SarTajikXJ and WakTajikXJ, who show more affinity with Europeans and Iranians (fig. S5, B and C). In FST analysis, we also generally observe that Xinjiang EMBA and IA individuals show high genetic affinity with UygurXJ and KyrgyzXJ compared to the SarTajikXJ and WakTajikXJ, while Xinjiang HE samples show similar genetic affinity for both of these groups (table S2). Thus, in summary, we find a genetic continuity from the ancient to present-day time periods whereby all the major ancestries from Siberia, Europe, East Asia, and West Central Asia are present in both ancient and present-day Xinjiang populations (28).


Archaeological findings from Xinjiang have raised curiosity about its past population structure and the peopling of this region. Among competing hypotheses, most archaeologists support the view that ancient Xinjiang was a mixed confederation of people from both East and West Eurasia. The high mobility and admixture of people during the BA and IA periods around Xinjiang are supported by previous analyses with limited uniparental markers (18–20). However, the lack of ancient DNA from multiple regions and time periods has left Xinjiang’s past population structure shrouded in mystery. Through our analyses, we have characterized the ancient population structure of Xinjiang and how it changed from the BA through the present day. These new data and results allowed us to propose a much more detailed and expansive admixture scenario.

The BA around Xinjiang was predominantly represented by western Steppe–related ancestries, which included the EMBA Yamnaya/Afanasievo cultures (28, 40, 42, 48). For instance, we find that the WSteppe_EMBA populations cluster with individuals from the Songshugou site in northern Xinjiang (NSSG_EMBA) in multiple haplogroups (U4, U5, H2, H6a, and W3). This is consistent with the Afanasievo-style relics at Songshugou (SSG) and the physical anthropology of an individual from this site (tomb M15) who shows European-like characteristics (49). We also find evidence for the influence of Chemurchek culture in BA Xinjiang, as suggested by the archaeological records of standing stone pillars with anthropomorphic figures around different cemeteries (50). The BA populations within Xinjiang were quite mixed genetically, as we found the presence of both East (NEA) and West (western Steppe–related) Eurasian mitochondrial haplogroups. Despite high admixture among the BA Xinjiang people, some unique genetic affinities are still observed. For example, western Steppe–related populations appear to have affected the northern and western Xinjiang populations (NWAfana_EMBA and NChemur_EMBA) more notably than the eastern Xinjiang groups (E_BA and E_LBA), which showed more NEA connections. The NEA connection is consistent with the archaeologically hypothesized formation of the earliest known culture in BA eastern Xinjiang, the Tianshanbeilu culture (~3900 BP). It has been suggested that the Majiayao/Machang culture in Gansu Province, east of Xinjiang, formed the Tianshanbeilu culture in eastern Xinjiang and formed the Siba culture in the Hexi Corridor (51). Individuals from BA eastern Xinjiang (E_BA; Tianshanbeilu culture) also showed physical similarities to populations from the Gan-Qing region (52). Consistent with these reports, we found that later LBA populations in eastern Xinjiang also had genetic connections to ancient Gan-Qing populations (GQQijia_BA and GQKayue_LBA). 
The presence of some West Eurasian–related haplogroups in eastern Xinjiang (both E_BA and E_LBA) is further consistent with the presence of burial forms, ornaments, and tools at the Tianshanbeilu site that share some West Eurasian features (53, 54), as well as some individuals that had European-like physical characteristics (55). SEXiaohe_BA populations, however, showed more connections with ancient and present-day Siberian populations (Fig. 4A). The archaeological findings from the Xiaohe site include wheat and millet grains, suggesting probable connections and exchanges from both West Asian along the IAMC and northern Chinese cultures (56, 57). This scenario likely suggests that migrations from both NEA and West Eurasia considerably affected the BA Xinjiang populations. All these western Steppe–related cultures, such as the Afanasievo, Chemurchek, and Botai, can represent a form of Altaic culture, which probably used the IAMC route to establish their presence in Xinjiang. In addition, the affinity of East and South Xinjiang BA populations with NEA and Siberians also suggests a greater influence from NEA and Siberia during the BA in Xinjiang.

The IA samples in Xinjiang continued to be more admixed with both East and West Eurasian lineages, but, similar to the BA, certain geographical regions showed different affinities for NEA and European populations. The IA populations from northern Xinjiang showed more ancient NEA connections, whereas the western and eastern populations (W_IA and E_IA) had an affinity for both NEA and European populations (Turan and western-related Steppe) and clustered with IA central Steppe populations (Fig. 4B). In IA western Xinjiang, the mixed affinity to East and West Eurasians may reflect the mixed cultural and physical similarities. The burial structure of the Early IA Suodunbulake culture in western Xinjiang is similar to the Central Asian Sapali and Wakeshi cultures in Amu Darya. The painted potteries of the Suodunbulake culture are more reminiscent of East Eurasian cultures, whereas individuals associated with the Suodunbulake have more European-like characteristics (58). The occupation of the Ili region by the Scythians in the early IA may also explain the shared affinity to both East and West Eurasian populations in IA western Xinjiang. The continued NEA connection observed in E_IA reflects the continuity between the BA Tianshanbeilu and IA Yanbulake cultures in eastern Xinjiang, of which the latter was further affected by the Xindian culture from Gansu Province as well as the Chemurchek and Xintala cultures in Xinjiang (59).

Populations from southern Xinjiang (SZGLK_IA) show more WSteppe_MLBA and Turan connections, while the population from the high-elevation region of southwestern Xinjiang (SWJEZKL_IA) shows more of an NEA connection. The regional preferences of some Xinjiang populations, particularly the differentiation between southwest and northeast Xinjiang, suggest that the IA was a very interactive time period. From 200 BCE, the Silk Road passing through Xinjiang became influential in facilitating population migrations across Eurasia (60). We found that IA samples in northern Xinjiang (N_IA) differed from the EMBA samples from this region in showing a closer affinity with NEA populations from the Hexi Corridor of the Gan-Qing region. Moreover, the IA samples from South Xinjiang (SZGLK_IA) also had NEA lineages (C7b, D4i, and D4j1b), demonstrating a connection between northern China and the Tarim Basin, consistent with the influence of population movements along the Silk Road toward the Tarim Basin.

The HE populations continued to show both NEA and European lineages, reflecting the sustained high movement and admixture of people in this region. These HE populations are highly admixed, with societies of different cultural affinities living alongside one another (Fig. 4C). We also found that populations in both the N_IA and W_IA had Saka-related lineages, suggesting that the Saka people may have admixed with the IA populations in northern and western Xinjiang. The Xiongnu lineages (Fig. 3E and fig. S7D) found in the population from both eastern and western Tianshan (E_IA and W_IA) coincide with the westward expansion of the Xiongnu population around the second or third century BCE (21). Hun mitochondrial lineages were observed in HE (but not IA) samples of southern Xinjiang [SBZL_HE; 421 (95.4%) to 538 CE], coinciding with the invasion of the Hun into Scythian populations and the formation of the Hun-Scythians in the fourth to fifth century CE (21), suggesting a southward movement of this Hun tradition from the Ili region into southern Xinjiang in HE.

Xinjiang is associated with the extinct Indo-European Tocharian language, which was present from 500 to 900 CE in central Xinjiang based on ancient manuscripts (28, 61). In general, archaeologists view this language as being associated with Afanasievo-related people in Xinjiang (28). Our results for the BA sites suggest a complex scenario whereby the Xiaohe site in the Tarim Basin has a deep ancestral connection with ancient Siberian populations, whereas other Xinjiang EMBA populations from the north and west show a more Steppe EMBA (Afanasievo) connection. Thus, the Tocharian language probably came into Xinjiang with populations associated with Steppe-related ancestry, such as the Afanasievo. However, this will require more sampling, as Xiaohe (3900 to 3500 BP), which dates later than the other Xinjiang EMBA (~4500 BP), instead had a deep ANE connection with ancient Siberians. Khotanese, another ancient language, associated with the Indo-Iranian language family, was first observed in ancient documents at the Niya site (200 to 500 CE), Khotan, southern Tarim Basin (62), which is contemporaneous with our samples from this region (SSPL_HE, 74 to 226 cal CE; SHetian_HE, 138 to 345 cal CE; SBZL_HE, 421 to 538 cal CE). The Khotanese language is associated with the expansion of the Sakas around 200 BCE into the Xinjiang region (63). We also observed genetic affinity between many IA and HE Xinjiang populations and the Sakas, suggesting their widespread presence in Xinjiang.

Therefore, the mitogenomic history of Xinjiang was heavily marked by western Steppe–related, central Steppe, northeastern Asian, and Turan introgression, and a confederation of different ancient populations is quite visible from the BA to HE periods. This admixture formed the foundation of present-day populations in Xinjiang, and future studies with ancient genomic data will reveal more admixture patterns in this region.


Ancient sample collection

We collected samples from a total of 237 ancient human individuals from 41 sites (table S1). Their archaeological details are provided in the Supplementary Materials. The co-authors obtained approval for the use of these samples, with permission from the respective provincial archaeology institutes or universities that managed them. The institutional review board at the Institute of Vertebrate Paleontology and Paleoanthropology of the Chinese Academy of Sciences also provided permission and oversight for the study of their ancient genomes.

Ancient DNA (aDNA) extraction and library preparation

A total of 237 DNA extractions were obtained from the skeletal or dental remains of ancient Xinjiang individuals. Human remains were surface-cleaned and drilled for less than 100 mg of bone powder. Extraction was conducted in the aDNA clean room at the Institute of Vertebrate Paleontology and Paleoanthropology following strict aDNA standards (64). Single-stranded or double-stranded DNA was used to generate 237 libraries. We identified and retained the cytosine-deaminated terminal sequences seen in typical aDNA and constructed a uracil-DNA glycosylase (“UDG”) partially treated, double-stranded (denoted as “DS_half” in table S1) library (65). For library enrichment, we used an AccuPrime Pfx DNA enzyme for 35 cycles. P5 and P7 adapters were added to limit the contamination rate. We also used a NanoDrop 2000 spectrometer to monitor the DNA concentration.

In-solution capture of mtDNA

The in-solution capture was accomplished by overlapping probes with DNA fragments and enriching the resulting libraries (66). The probes were synthesized on the basis of the human mitochondrial genome (67). It yielded an average mtDNA coverage of 460-fold (table S1).

Sequencing and read alignment

The library pools were sequenced using an Illumina MiSeq instrument with paired 2 × 76 base pair (bp) reads. We used leeHom to trim adapters and merge sequences, with paired-end reads overlapping by at least 11 bp (68). Sequenced and merged reads with lengths of at least 30 bp were then mapped to the revised Cambridge Reference Sequence version 17 (rCRS) for mtDNA, using the samse command in the BWA v0.6.1 aligner (arguments used: -n 0.01 and -l16500) (68, 69). We removed duplicate sequences and retained the one with the highest mapping quality. After removing sequences with a mapping quality below 30, we constructed the whole mitochondrial sequence. We used Haplogrep2 built on Phylotree Build 17 to call haplogroups for each sample (70, 71). We also sampled sequences randomly for each SNP covered at least once to recover allele information for each locus (table S1).
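The filtering order described above (length cutoff, duplicate removal that keeps the best-mapped copy, then a mapping-quality cutoff) can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the `Read` record and the rule that duplicates share the same start, end, and strand are assumptions made only for illustration.

```python
from dataclasses import dataclass

MIN_LENGTH = 30  # minimum merged-read length (bp), as stated in the text
MIN_MAPQ = 30    # minimum mapping quality, as stated in the text


@dataclass(frozen=True)
class Read:
    """Hypothetical, simplified record for one mapped read."""
    name: str
    start: int   # leftmost mapped position on the reference
    end: int     # rightmost mapped position on the reference
    strand: str  # "+" or "-"
    mapq: int    # mapping quality
    length: int  # read length in bp


def filter_reads(reads):
    """Apply length, duplicate, and mapping-quality filters in that order."""
    best = {}
    for read in reads:
        if read.length < MIN_LENGTH:
            continue
        # Assumption: reads with identical coordinates and strand are duplicates.
        key = (read.start, read.end, read.strand)
        if key not in best or read.mapq > best[key].mapq:
            best[key] = read
    kept = [read for read in best.values() if read.mapq >= MIN_MAPQ]
    return sorted(kept, key=lambda r: (r.start, r.end, r.strand))
```

In practice these steps are performed on BAM files by dedicated tools; the sketch only makes the order of operations explicit.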

Test for contamination

We used the C to T transition rates estimated by MapDamage2.0 (72) to check that each library showed the damage pattern expected of authentic ancient DNA. We also expect the mtDNA reads to match the consensus sequence better than 311 worldwide mtDNA genomes; we, therefore, used contamMix to detect present-day human mtDNA contamination (66). To avoid damage patterns being detected as contamination, we ignored the first and last five positions of the fragments. If more than 4% of the fragments in a library matched other sequences better than the consensus, then we treated the library as contaminated. However, for libraries Hetian_M16A and Hetian_M16B with substantial contamination but high coverage, a consensus was still retrieved by only using sequences with retained C to T substitutions on the first three positions at the 5′-end and the last three positions at the 3′-end of each fragment.
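The two numerical rules in this paragraph, masking the terminal five positions and flagging a library above the 4% cutoff, can be written out as a short Python sketch. This is only the decision rule: contamMix itself produces a model-based estimate, and the function names and count-based inputs here are hypothetical.

```python
CONTAMINATION_THRESHOLD = 0.04  # 4% cutoff stated in the text
MASKED_TERMINAL_POSITIONS = 5   # first and last five positions are ignored


def mask_terminal_positions(fragment, n=MASKED_TERMINAL_POSITIONS):
    """Return the fragment without its first and last n bases.

    Fragments too short to retain any base after masking yield "".
    """
    if len(fragment) <= 2 * n:
        return ""
    return fragment[n:-n]


def is_contaminated(n_better_elsewhere, n_total, threshold=CONTAMINATION_THRESHOLD):
    """Flag a library when the share of fragments that match another
    sequence better than the consensus exceeds the threshold."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_better_elsewhere / n_total > threshold
```

For example, a library with 5 of 100 such fragments would be flagged, whereas one with exactly 4 of 100 would not, because the rule is "more than 4%".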

Haplogroup analysis

The complete sequences of mtDNA were aligned with rCRS, based on PhyloTree v17, using MUSCLE software (version 3.3.31) (70, 73). Haplogroups were called with Haplogrep2 (70). We additionally used the DAPC package in R to conduct a haplogroup frequency–based DAPC and plotted PC1 and PC2 to illustrate haplogroup differences among populations (74). A heatmap based on the correlation of haplogroup frequencies was made using the heatmap function in R.
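As an illustration of the inputs to the frequency-based analyses, the following Python sketch computes haplogroup frequencies for a population and the Pearson correlation between two frequency profiles. It is a simplified stand-in, not the R workflow used here: truncating a label to its first character is an assumption that only approximates major clades (it would, for instance, merge HV with H), and the correlation measure used for the published heatmap is not specified beyond "correlation".

```python
from collections import Counter
from math import sqrt


def haplogroup_frequencies(calls, level=1):
    """Fraction of individuals per haplogroup, with each label truncated
    to its first `level` characters (a crude proxy for the major clade)."""
    counts = Counter(call[:level] for call in calls)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def pearson(freq_a, freq_b):
    """Pearson correlation between two haplogroup frequency profiles.

    Haplogroups missing from one profile are treated as frequency 0.
    """
    groups = sorted(set(freq_a) | set(freq_b))
    x = [freq_a.get(g, 0.0) for g in groups]
    y = [freq_b.get(g, 0.0) for g in groups]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)
```

A matrix of such pairwise correlations across populations is the kind of input a heatmap function takes.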

Genetic distance analysis

We used the Arlequin software package to calculate genetic distances (FST) between populations, which were visualized with the vegan package’s metaMDS function in R (75). Heatmaps were also used to illustrate the statistical significance of clusters based on FST.
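To make the meaning of a pairwise FST value concrete, the sketch below computes a simple estimate from aligned sequences as one minus the ratio of mean within-population to mean between-population pairwise differences. This Hudson-style formulation is an assumption chosen for brevity; it is not necessarily the estimator Arlequin applies, and the function names are hypothetical.

```python
from itertools import combinations, product


def n_differences(seq_a, seq_b):
    """Number of mismatching positions between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def mean_within(pop):
    """Mean pairwise difference among sequences of one population."""
    if len(pop) < 2:
        raise ValueError("each population needs at least two sequences")
    pairs = list(combinations(pop, 2))
    return sum(n_differences(a, b) for a, b in pairs) / len(pairs)


def mean_between(pop_a, pop_b):
    """Mean pairwise difference between sequences of two populations."""
    pairs = list(product(pop_a, pop_b))
    return sum(n_differences(a, b) for a, b in pairs) / len(pairs)


def pairwise_fst(pop_a, pop_b):
    """Simple FST estimate: 1 - (mean within) / (mean between)."""
    within = (mean_within(pop_a) + mean_within(pop_b)) / 2
    between = mean_between(pop_a, pop_b)
    if between == 0:
        return 0.0  # identical sequences everywhere: no differentiation
    return 1 - within / between
```

Values near 0 indicate that two populations are about as different from each other as their members are among themselves, which is how the low FST values reported in table S2 are read. Estimates of this kind can fall slightly below 0 for very similar samples.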

Geographic distance and Mantel test

The geosphere package was used with R software to construct a geographic distance matrix from the geographic coordinates of each ancient Xinjiang group (37), excluding two extreme sites (SEXiaohe_BA and NSSG_EMBA). To estimate the correlation between the FST and geographic distance matrices, the lm function was used in R to fit the linear models (38). The regression results were plotted in R with its P and R2 values.
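The two computations described here, a great-circle distance between site coordinates and a linear fit of FST against distance, can be sketched in plain Python. The haversine formula and the fixed Earth radius are assumptions, since the text does not say which geosphere distance function was used; the fit is ordinary least squares, as `lm` performs for a single predictor.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def linear_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared
```

To use it, the upper triangles of the geographic and FST matrices would be flattened into the `x` and `y` lists. Because pairwise entries of a distance matrix are not independent, a regression P value computed this way is only approximate; a Mantel test addresses this by permuting one of the matrices.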

AMOVA analysis

The Arlequin software package was also used to analyze the molecular genetic variance among groups, within each group, and among populations to assess population genetic structure (75). P values were calculated under 20,000 permutations to test whether the variance was statistically significant.
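The permutation logic behind such P values can be sketched as follows. This is a single-level simplification under stated assumptions, not Arlequin's hierarchical AMOVA: it partitions the sum of squared pairwise distances into within-group and among-group parts, then shuffles group labels to ask how often a random grouping explains at least as much variance. The function names, the seed, and the "+1" correction in the P value are illustrative choices.

```python
import random


def among_group_share(dist, labels):
    """Share of the total sum of squared distances that lies among groups."""
    n = len(labels)
    total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    if total == 0:
        return 0.0
    within = 0.0
    for group in set(labels):
        idx = [i for i, label in enumerate(labels) if label == group]
        ss = sum(dist[i][j] ** 2 for pos, i in enumerate(idx) for j in idx[pos + 1:])
        within += ss / len(idx)
    return (total - within) / total


def permutation_p_value(dist, labels, n_perm=20_000, seed=0):
    """Return (observed share, P value) from label permutations.

    The default of 20,000 permutations follows the number stated in the text.
    """
    rng = random.Random(seed)
    observed = among_group_share(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if among_group_share(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Here `dist` is a symmetric matrix of pairwise distances between individuals (for example, numbers of differing sites) and `labels` assigns each individual to a group.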


Supplementary material for this article is available at

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.


Acknowledgments: We would like to thank M. A. Yang, E. A. Bennett, and Y. Liu for comments as well as archaeological teams from Xinjiang and Xi’an. We would also like to thank the reviewers, which included V. M. Narasimhan and two anonymous reviewers, who provided valuable suggestions that helped to further improve the manuscript. Funding: This work was supported by National Key R&D Program of China (2016YFE0203700), the Chinese Academy of Sciences (CAS, XDB26000000), National Natural Science Foundation of China (91731303, 41925009, 41672021, and 41630102), CAS (XDA1905010 and QYZDB-SSW-DQC003), Tencent Foundation through the XPLORER PRIZE, and the Howard Hughes Medical Institute (grant no. 55008731). Author contributions: Q.F. designed the research project. Q.F., V.K., W.L., and J.M. managed the project. X.H., Q.R., W.G., Xinhua Wu, J.Y., B.W., Z.T., N.A., J.Z., X.C., Y.T., M.R., Xiaohong Wu, M.Z., W.H., and T.W. assembled archaeological materials and dating. Q.F., P.C., R.Y., F.L., X.F., Q.D., and W.P. performed or supervised wet laboratory work. Q.F. and X.F. did the data processing and quality control. W.W. and M.D. analyzed the data. W.W., M.D., J.D.G., Y.W., B.M., V.K., and Q.F. wrote the manuscript. All authors discussed, critically revised, and approved the final version of the manuscript. Competing interests: The authors declare that they have no competing interests. Data and materials availability: The mitochondrial genome consensus reported in this paper has been deposited in the Genome Warehouse in the National Genomics Data Center (76), Beijing Institute of Genomics (China National Center for Bioinformation), Chinese Academy of Sciences, under BioProject accession number PRJCA002948 that is publicly accessible at All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors.
