Research ArticleEVOLUTIONARY BIOLOGY

Hyena paleogenomes reveal a complex evolutionary history of cross-continental gene flow between spotted and cave hyena

See allHide authors and affiliations

Science Advances  13 Mar 2020:
Vol. 6, no. 11, eaay0456
DOI: 10.1126/sciadv.aay0456

Abstract

The genus Crocuta (African spotted and Eurasian cave hyenas) includes several closely related extinct and extant lineages. The relationships among these lineages, however, are contentious. Through the generation of population-level paleogenomes from late Pleistocene Eurasian cave hyena and genomes from modern African spotted hyena, we reveal the cross-continental evolutionary relationships between these enigmatic hyena lineages. We find a deep divergence (~2.5 Ma) between African and Eurasian Crocuta populations, suggesting that ancestral Crocuta left Africa around the same time as early Homo. Moreover, we find discordance between nuclear and mitochondrial phylogenies and evidence for bidirectional gene flow between African and Eurasian Crocuta after the lineages split, which may have complicated prior taxonomic classifications. Last, we find a number of introgressed loci that attained high frequencies within the recipient lineage, suggesting some level of adaptive advantage from admixture.

INTRODUCTION

The late Quaternary was characterized by a substantial number of global extinction events (1). Despite this, many lineages still persist today, albeit with marked local extinctions, culminating in reductions in distribution and diversity. The study of ancient DNA and, more recently, paleogenomics are powerful approaches enabling direct comparisons between surviving and extinct lineages. The relationships between our own species (Homo sapiens) and extinct archaic hominins (Neanderthals and Denisovans) are arguably the best known example of this. The inclusion of paleogenomic data from nuclear genomes proved to be an invaluable tool for studying their evolution and most notably uncovered gene flow between archaic and modern humans that could not be detected using mitochondrial DNA (mtDNA) alone. The discovery of this gene flow markedly changed commonly accepted views of human evolution and highlighted the ability of paleogenomics to provide insights into the relationships between extinct and extant lineages (2).

The genus Crocuta (spotted and cave hyenas) includes several closely related extinct and extant lineages and is one of only two large-bodied African carnivores (the other being the saber-toothed cat genus Megantereon) whose migratory and evolutionary history has been compared to that of the genus Homo (3). Crocuta crocuta, the spotted hyena from sub-Saharan Africa, represents the only extant species. Spotted hyenas are the most common large carnivore in Africa today. They are a highly adaptable and opportunistic species, often living in large matriarchal clans and displaying complex social behaviors. Spotted hyenas are opportunistic feeders, and both hunting and scavenging play important roles in obtaining food (4). Females tend to stay within their natal clan, whereas males typically disperse to achieve reproductive success (5). Although now restricted to sub-Saharan Africa, the genus once had a much more extensive range, occupying most of Eurasia, from the British Isles to the far east of Asia (6). Initially, because of distinct morphologies, Eurasian cave and African spotted hyena lineages were considered distinct taxa. Cave hyenas had shorter distal limb elements than the extant species, indicating less cursorial ability. They also had a less trenchant check tooth morphology, indicating less active hunting and meat eating and a greater reliance on scavenging for their nutritional needs (7). Eurasian cave hyenas have further been split into European (C. crocuta spelaea) and Asian (C. crocuta ultima) subspecies (6). These classifications have, however, met with some resistance, with differences being attributed to phenotypic plasticity caused by different climates (8). This skepticism was further supported by a study using short fragments of mtDNA (9), which found African spotted hyenas to be intermingled within Eurasian cave hyenas’ mitochondrial haplogroups. This result indicated that, when only considering these mtDNA fragments, cave and spotted hyena appear to be inseparable taxa.

RESULTS AND DISCUSSION

Using paleogenomic data from several late Pleistocene cave hyenas from across Eurasia and population-level genomic data from sub-Saharan spotted hyenas, we investigated the evolutionary history of the genus Crocuta. We first assembled the mitochondrial genomes for 7 cave hyenas and 12 spotted hyenas. We then constructed a Bayesian phylogenetic tree using these assembled mitochondrial genomes, three previously published Crocuta mitochondrial genomes, and three outgroup mitochondrial genomes from Hyaena, Parahyaena, and Proteles (Fig. 1 and fig. S1). The tree shows the same major clades as previously found using short mitochondrial fragments and a similar topology, with the two taxa being polyphyletic with respect to each other, i.e., African spotted hyenas being intermingled within Eurasian cave hyenas’ mitochondrial haplogroups (9). However, the relationships between the major clades are slightly different. We find the East Asian clade D diverging first, followed by the purely European clade B, and a bifurcation of the African clade C and the African/Eurasian clade A. Moreover, the full mitochondrial genomes show the European and African sequences within clade A to form reciprocally monophyletic clades. In contrast to this picture for mtDNA, a principal components analysis (PCA) using nuclear genomic data from 7 cave hyenas and 10 spotted hyenas reveals a clear differentiation between cave and spotted hyenas along the PC1 axis (Fig. 2A). The PC2 axis then separates cave hyenas by geographical location, i.e., European versus East Asian individuals. As typical ancient DNA damage most commonly causes C-to-T transitions, this analysis only considered differences between individuals caused by transversions and should therefore not be biased by a mixture of modern and ancient samples.

Fig. 1 Sampling distribution and mitogenomic timetree of the hyaenidae family.

(A) Map showing the geographic origins of our spotted and cave hyena samples. The color of the dot represents the age of the sample. (B) Dated Bayesian phylogenetic tree constructed using complete mitochondrial genomes and a strict molecular clock. Haplogroups are those previously defined in Rohland et al. (9). Red-colored labels show Pleistocene cave hyena and blue-colored labels show modern spotted hyena. The yellow star represents a putative mitochondrial introgression event of unknown direction. The Hyaena/Parahyaena node was fixed for fossil calibration (as indicated by the bone image), and Proteles was set as the outgroup. All major nodes had posterior probability values of 1 (fig. S1). Dark blue node bars show the 95% credibility interval of the divergence dating.

Fig. 2 Population structure analyses comparing nuclear genomic information from Pleistocene cave hyena and modern spotted hyena.

(A) PCA based on genome-wide SNPs. Red shaded area encompasses cave hyenas and blue shaded area encompasses spotted hyenas. (B) Densitree constructed using 2 Mbp sliding windows and a maximum likelihood approach. Light gray lines represent single phylogenetic trees produced from each window. Dark black lines represent the root canal as defined by Densitree. Sample name colors represent the previously defined mitochondrial haplogroups.

To further investigate this distinct relationship, we constructed 467 maximum likelihood phylogenetic trees from 2–million base pair (Mbp) nonoverlapping windows along the nuclear genome and visualized the outputs simultaneously using Densitree (Fig. 2B) (10). We also combined each individual tree to produce a single consensus tree using PHYLogeny Inference Package (PHYLIP) (11) to investigate node support from the individual trees (fig. S2). Both the PHYLIP consensus tree and the Densitree root canal (consensus tree with the highest clade support) are consistent with the PCA results, again showing a separation between cave and spotted hyenas. Individual trees are highly concordant with respect to the major groupings, with 465 (99.6%) supporting the monophyly of cave hyenas and 462 (98.9%) supporting the monophyly of spotted hyenas. Tree topologies within each clade are, however, highly variable, likely reflecting the effects of both incomplete lineage sorting (ILS) and gene flow (fig. S2). A PCA of cave hyenas alone suggests three distinct clusters (fig. S3). The PC1 axis separates individuals by continent (European and East Asian), whereas PC2 separates the European cave hyenas into two groups, corresponding to their assignment to previously described mitochondrial haplogroups A and B and without any obvious correspondence with temporal or geographic proximity (9). This suggests an external barrier between geographically close populations, which may have hindered the successful dispersal of males to clans of different cave hyena matrilines. An alternative possible explanation for this finding has been suggested for European bison (12). In European bison, one mitochondrial haplogroup was dominant in individuals >50 thousand years old (ka). This haplogroup was then replaced by another between 50 and 32 ka, before returning to the original haplogroup. A similar phenomenon could explain our data with geographically close individuals displaying different mitochondrial haplogroups and clustering separately in the PCA. However, as Ccsp040 could not be dated owing to low collagen content, and the previously published Cc8 and Cc9 could not be reliably dated either, this hypothesis is purely speculative until more genetic information from reliably dated samples is available. A PCA of spotted hyenas alone did not show any clear clustering but indicates that there could be some level of isolation by distance (fig. S4). Demographic analyses provide an explanation for these contrasting patterns. Pairwise sequentially Markovian coalescent (PSMC) analyses indicate a large bottleneck in spotted hyenas over the late Pleistocene (Fig. 3). Furthermore, despite originating from markedly different areas on the African continent (Ghana, Namibia, and Somalia), all three spotted hyenas used within this analysis produced near-identical demographic histories, suggesting that these three individuals belonged to a single population or their demographics have been shaped by very similar drivers. While the exact causes behind this bottleneck remain unknown, population bottlenecks over a similar time period have also been reported in a large cohort of ruminant genomes (13). These bottlenecks coincided with increasing human effective population size, leading to the speculation that the decline might have somewhat been associated with human activities. An increasing human population size coupled with decreasing prey availability would have increased competition between spotted hyenas and other carnivorous species, potentially leading to the decrease in population size seen in the PSMC analysis. Furthermore, although our PSMC analyses only investigated African individuals, on the basis of the demographic findings by Chen et al. in European and Asian ruminants, Eurasian cave hyenas may have also faced a similar decline, ultimately leading to their demise at the end of the Pleistocene.

Fig. 3 PSMC demographic analyses of three modern spotted hyena nuclear genomes.

The y axis represents effective population size (×10,000), and the x axis represents millions of years before present. Faded lines show bootstrap values.

The contrast between the nuclear and mtDNA results suggests either extensive ILS or gene flow between spotted and cave hyena. We therefore investigated for the presence and directionality of gene flow using a previously published five-taxon phylogenetic sliding window analysis (14). This analysis revealed several instances of gene flow between cave and spotted hyenas after the split of the two lineages (tables S1 to S3). In regard to gene flow from spotted hyena into cave hyena, the Russian individual (Ccsp041—Geographical Society Cave, Russia) had the least putatively introgressed genomic regions, whereas the German individuals (Ccsp040—Aufhausener Höhle and Ccsp042—Lindenthaler Höhle) shared a similar introgression signal. This pattern occurred irrespective of which spotted hyenas were included in the comparison. The similarity between the German samples was further confirmed by nonsignificant D statistics results (z score between −3 and 3; tables S4 and S5). This result suggests that gene flow into cave hyenas occurred after the split between European and Asian populations, but before the split between the two European lineages. Conversely, gene flow from cave hyenas into spotted hyenas appears more complicated, and levels of admixture did not appear to correlate with either mitochondrial haplogroup or geographical proximity to Eurasia. The Namibian individual (NamCrocuta) contained the fewest windows associated with gene flow from cave hyenas, whereas the Kenyan individual (Kenya_795) contained the most. All other spotted hyenas in the comparison had similar levels. There are two explanations for the differential admixture patterns across spotted hyenas: either there were multiple gene flow events into spotted hyenas or a single admixture event was followed by random assortment and differential diffusion of the cave hyena loci. To investigate these two possibilities and infer the relative timing of the admixture events, we computed the cumulative lengths of windows showing signs of admixture (14) (fig. S4). Ccsp042 and Ccsp040 share similar cumulative lengths, again suggesting gene flow before the split of the two European lineages. The cumulative length of the least admixed spotted hyena (NamCrocuta) was similar to that of Ccsp042 and Ccsp040, suggesting that bidirectional admixture occurred at a similar time. This then appears to have been followed by a secondary gene flow event into Northern Africa (Ghana, Kenya, and Somalia), which may have diffused into Zambia and, possibly, even Namibia. However, as this is a relative test, it is difficult to separate gene flow from ILS in the individuals showing the least admixture. To further investigate these signs of admixture, we ran a fastsimcoal analysis (15). Results were generally consistent with the previous analyses and suggested bidirectional gene flow between spotted hyena and European cave hyenas. Moreover, fastsimcoal suggested substantially more gene flow in the direction of Europe from Africa (table S6), with the presence of directional gene flow being stronger in recent times (between 120,000 years before present and the sampling time of the cave hyenas) than before 120,000 years before present. The five-taxon sliding window analysis, on the other hand, showed more admixed windows from Eurasia into Africa than vice versa, potentially indicating more gene flow in the opposite direction to what was uncovered using fastsimcoal. However, these seemingly contradictory results can be explained by some caveats in assessing levels of gene flow using the sliding window analysis. First, as it is a relative test, i.e., places the least admixed individual as a baseline “nonadmixed” individual, total levels of admixture could be underestimated if this baseline individual is highly admixed. For example, if the Asian individual, Ccsp041, contained a substantial amount of admixture from Africa, the total number of windows recovered showing gene flow in the other cave hyena individuals would be relatively reduced. Second, the timing of gene flow can influence the number of admixed windows recovered. The more recent the gene flow event, the less time for consecutive windows to be broken through recombination. More recent gene flow could therefore be misinterpreted as more total gene flow simply because the total number of windows showing gene flow would be higher. This seems a likely explanation for the discordance as we find longer consecutive admixture windows in spotted hyena compared to cave hyena, suggesting more recent gene flow into Africa than vice versa.

Thus, on the basis of both the five-taxon phylogenetic sliding window analysis, D statistics results, and fastsimcoal (tables S1 to S6), we suggest a bidirectional gene flow event between cave and spotted hyenas after the split of cave hyenas into the European and Asian lineages and a subsequent unidirectional gene flow event into northern spotted hyenas, followed by differential diffusion of the admixed loci within the other spotted hyena lineages (Fig. 4). Furthermore, on the basis of the divergence time between the cave and spotted hyena in mitochondrial haplogroup A (Fig. 1), we suggest that gene flow occurred sometime before ~475 ka (95% CI, 388 to 570 ka), either from spotted hyena into cave hyena or vice versa. However, the fact that the spotted hyena mitochondrial haplogroup reached such a high frequency in some cave hyena populations, but not in others, or that one of the cave hyena haplogroups reached such a high frequency in northern Africa was most likely by chance (i.e., ILS and genetic drift) as opposed to selection.

Fig. 4 D statistics results and a schematic overview of the divergence of and post-divergence gene flow between spotted and cave hyenas.

(A) D statistics results using two cave hyenas, one spotted hyena, and the striped hyena as outgroup. (B) D statistics results using two spotted hyenas, one cave hyena, and the striped hyena as outgroup. Filled circles show significant D scores while nonfilled circles show nonsignificant D scores. (C) Phylogenetic tree presenting the admixture events found between African spotted and Eurasian cave hyena. Blue color represents African inhabitancy and red color represents Eurasian inhabitancy. Arrows show the direction of gene flow events. (Illustration credit: Binia De Cahsan, https://sites.google.com/view/decahsanillustrations.)

To investigate whether this admixture may have had adaptive consequences, we determined the frequency and distribution of the different tree topologies across the genome recovered from the 100-kb five-taxon sliding window analysis. The most common topology within these comparisons was the monophyly of spotted and cave hyenas as 63 to 73% of all windows represented this topology with the rest assumed to have arisen due to ILS or admixture. However, we noted that some windows consistently showed an aberrant topology regardless of which individuals were used in the comparison and therefore were unlikely to have arisen by a stochastic process such as neutral drift. We further investigated these regions for the presence of protein-coding genes, the protein classes these genes belong to, and the biological functions of the genes. When investigating regions of introgression into spotted hyenas, we found 75 putative genes (table S7). These genes corresponded to 54 unique protein classes and 135 biological processes (tables S8 and S9). We further investigated Gene Ontology (GO) term enrichment in these genes using GOrilla (16) and found a significant enrichment in the glutamate receptor signaling pathway (false discovery rate q = 0.05). This pathway is involved in a variety of biological processes and plays a key role in the central nervous system. In humans, abnormalities in this pathway have been linked to both chronic disabling brain disorders (e.g., schizophrenia) and neurodegenerative diseases (e.g., Alzheimer’s, Parkinson’s, and multiple sclerosis) (17). On the other hand, when investigating regions of introgression into cave hyenas, we found only 12 putative genes (table S10) corresponding to 11 unique protein classes and 28 biological processes (tables S11 and S12). GO term enrichment analysis did not find any significantly enriched terms.

To better understand the various events in the evolutionary history of the genus Crocuta, we estimated the divergence time between the cave and spotted hyena lineages using a genome-wide distance-based approach based on the method from Heintzman et al. (18) with some adjustments. To remove any biases that admixture or ILS may incur, we only used windows from the above admixture analysis that showed cave and spotted hyenas as monophyletic. This resulted in 3993 100-kb windows for our analysis. With this method, we estimated the time to the most recent common ancestor (tMRCA) of cave and spotted hyenas to be 2.52 million years (Ma) (CI 2.21 to 2.83 Ma). This deep divergence was further strengthened through a fastsimcoal analysis, which specifically models both lineage sorting and gene flow. This analysis indicated a tMRCA between cave and spotted hyenas of 2.71 Ma (CI 2.15 to 2.84 Ma; table S6). As the earliest presence of Crocuta outside of Africa is Crocuta honanensis in the Longdan basin of China from ~2 Ma (19) and the oldest Crocuta fossils in Africa (Crocuta dietrichi) are from the early Pliocene (3.63 to 3.85 Ma) (20), we propose a dispersal from Africa into Eurasia, most likely into Asia, shortly after the divergence of the two lineages. This timing closely coincides with the oldest fossil of Homo found outside of Africa (Homo georgicus), found in Georgia and estimated at ~1.8 Ma (21), and the oldest Homo artifacts outside of Africa dated to ~2.1 Ma (22). This was most likely followed by subsequent dispersal into Europe from Asia as the earliest confirmed European occurrences of Crocuta are from Sierra de Atapuerca (Spain) [~0.9 Ma (23)] and middle Pleistocene Italy (24). Several other mammalian species (e.g., Theropithecus, Pachycrocuta, Panthera, and Hippopotamus) have also shown an Africa-to-Eurasia dispersal in the late Pliocene/early Pleistocene (25), with an overrepresentation of carnivores, most likely owing to their good dispersal abilities and broad ecological tolerances. However, as most of these dispersals only roughly correlate with hominin dispersals, any connection between these mammals would be speculative based on current evidence.

Beyond the timing of dispersal into Eurasia, in contrast to our own genus, very little information is available with regard to the evolutionary history of spotted hyena. Our analyses combining both modern and Pleistocene nuclear genomes reveal some notable similarities and provide new evidence to the theory that the evolutionary and migratory history of Crocuta is similar to that of Homo, first proposed decades ago (26). In addition to the similar migratory pattern and timing, we see highly similar discordances between the mitochondrial and nuclear genome trees for modern and archaic humans on the one hand and spotted and cave hyena on the other. In both cases, the deepest divergence in the mitochondrial tree is formed by an East Asian lineage, while the nuclear genomes show a sister group relationship for East and West Eurasian populations (Neanderthals versus Denisovans and European versus East Asian cave hyena, respectively), with the African populations forming the basal diverging lineage. Although these topological similarities are remarkable, since the underlying causes are largely unknown for both humans and hyenas, it remains to be determined if the similarities are due to a common cause or simply coincidence. However, replacements of forested to savannah-like terrain ~2.6 to 1.8 Ma outside of Africa have been suggested as a catalyst to early hominin dispersal out of Africa and may have played a similar role in Crocuta dispersal (27).

Despite all of these similarities, it should be noted that there are also a number of differences. Although humans and hyenas may have left Africa around the same time, on a nuclear genomic level, this first emigration did not leave any descendants in humans, whereas it may have done so in hyenas. Also, the population trajectories in spotted hyenas and modern humans are notably different, when considering both the PSMC analyses and the demographic outcome, with modern humans populating the entire globe, whereas spotted hyenas are today restricted to sub-Saharan Africa. Thus, certain similarities in the evolutionary history of species groups should not distract from the fact that the overall population history of an individual species or species group is almost certainly unique in most aspects.

MATERIALS AND METHODS

Wet laboratory work

Sample information can be found in tables S13 and S14.

Modern samples

We extracted DNA from eight spotted hyena tissue samples using a Qiagen DNeasy blood and tissue extraction kit following the manufacturer’s protocol. We then fragmented the DNA into ~500–base pair (bp) fragments using a Covaris sonicator. We constructed fragmented extracts into Illumina sequencing libraries using a modified version of the protocol set out by Meyer and Kircher (28, 29). Library molecules from 400 to 900 bp were then selected using a Pippin Prep instrument (Sage Science). All modern individuals sequenced to low coverage were sequenced on an Illumina NextSeq 500 at Potsdam University, Germany, using 2 × 150 bp paired-end sequencing. The Namibian individual was sequenced on a single lane on an Illumina HiSeq X using 2 × 150 bp paired-end sequencing at the National Genomics Infrastructure (NGI) in Stockholm. The Ghana sample was extracted using a Qiagen DNeasy blood and tissue extraction kit and then sent to the NGI in Stockholm for Illumina library construction. The extract was built into a polymerase chain reaction (PCR)–free TruSeq Illumina sequencing library using a 350-bp insert size by the NGI in Stockholm. This library was sequenced on a single lane of an Illumina HiSeq X using 2 × 150 bp paired-end sequencing. DNA was extracted from the Somalian C. crocuta sample, built into Illumina sequencing libraries, and sequenced on a HiSeq 2000 Illumina platform using 2 × 100 bp paired-end sequencing by the BGI (www.bgi.com). We additionally included previously published mitochondrial genomes from spotted hyena, cave hyena, striped hyena, brown hyena, and aardwolf (3032) and the nuclear genomes of the brown and striped hyena (32). The Somalian hyena sample came from the Frozen Zoo (Frozen Zoo ID no. KB4526) at the San Diego Zoo Institute for Conservation Research and was provided by O. Ryder. The Namibian spotted hyena sample was collected on 12 October 2017. This individual was anesthetized via dart injection with a combination of medetomidine (Domitor, Pfizer Animal Health, Exton, PA, USA; 2.5 mg intramuscularly), tiletamine hydrochloride/zolazepam hydrochloride, and ketamine (Ketaset, Zoetis, NJ, USA; 250 mg intramuscularly). The hyena was given a physical examination for a general health assessment, and body measurements were taken. Blood samples were taken from the femoral vein, subsequently kept cold, and frozen at −16°C not more than 24 hours later. The hyena was reversed with atipamezole (Antisedan, Pfizer Animal Health, Exton, PA, USA; five times the milligram dose of medetomidine induction intramuscularly). Capture and handling procedures were approved and permitted through an annually reviewed research permit issued to I.W. by the Namibian Ministry of Environment and Tourism (MET) (permit no. 1134/2007). All captures were done under veterinary supervision, and reports were annually submitted to the relevant authorities. The export permit for samples was issued by the Namibian MET on 26 February 2016 (export permit no. 104801). The remaining spotted hyena samples were a mixture of blood samples and skin biopsies collected from the field between 1991 and 1998 and presently stored at the University of Copenhagen. Relevant permits were secured, and all legislation has been observed in collecting and exporting the samples.

Ancient samples

For the extraction of DNA from the late Pleistocene cave hyena petrous bones, we ground approximately 50 mg of bone from samples Ccsp040, Ccsp041, Ccsp042, and Ccsp043 to powder using a mortar and pestle and extracted DNA following the protocol described in Dabney et al. (33). We ground approximately 50 mg from each tooth sample from samples Ccsp014, Ccsp015, and Ccsp026, which we then pretreated with 1 ml of 0.5% bleach for 15 min at room temperature before DNA extraction in an attempt to increase endogenous DNA content (34). After pretreatment, the extraction method followed that described in Dabney et al. We built all DNA extracts into individually barcoded Illumina sequencing libraries using a method based on single-stranded DNA specifically developed for highly degraded ancient samples (35). Production and sequencing of library and extraction blanks were included to check for the presence of contamination. We sequenced all ancient libraries using 75-bp single-end reads on an Illumina NextSeq 500 sequencing platform at the University of Potsdam, Germany. Approximately 1 g of bone from individuals that had not been previously dated either contextually or directly (Ccsp026, Ccsp040, Ccsp042, and Ccsp043) was sent to the Curt-Engelhorn Centre for Archaeometry to be dated by 14C. The collagen was extracted from the bone/teeth samples, purified by ultrafiltration (fraction >30 kDa), and freeze-dried. Collagen was then combusted to CO2 in an elemental analyzer. CO2 was converted catalytically to graphite to be dated using the mini radiocarbon dating system (MICADAS)–AMS of the Klaus-Tschira-Archäometrie-Zentrum. Ages were taken from previous publications for individuals Ccsp014 and Ccsp041 (9, 36, 37).

Data processing

Modern individuals. We trimmed Illumina adapter sequences and removed reads shorter than 30 bp from the raw reads of the spotted hyena samples using Cutadapt v1.8.1 (38) and merged overlapping reads using FLASH v1.2.1 (39). We mapped the trimmed and merged data to the striped hyena assembly (32) (PEQU00000000) using Burrows-Wheeler Aligner (BWA) v0.7.15 (40) using the mem algorithm and parsed the output using SAMtools v1.3.1 (40). We removed potential PCR duplicates using SAMtools. Mapping information and statistics can be found in table S15.

Ancient individuals. We trimmed Illumina adapter sequences and removed short reads from the raw reads of the spotted hyena samples using Cutadapt v1.8.1 (38). We used a custom Perl script to determine the optimal read length cutoff for our samples. Our approach assumes that short spurious alignments contain more mismatches, and we selected, empirically for each dataset, the read length at which we see a relatively constant number of variants for the same total number of bases. Trimmed reads were mapped to the striped hyena assembly using BWA v0.7.15 (40) using the aln algorithm (-n 0.01, -l 999) and parsed using SAMtools v1.3.1 (40). We also removed potential PCR duplicates using SAMtools. Mapping information and statistics can be found in table S16. Ancient DNA authenticity was then measured using Mapdamage2.0 using default parameters (41). All samples showed significant C-to-T transitions at the ends of the reads, indicating authentic ancient DNA (fig. S6).

Population structure. We implemented three independent PCAs using three different datasets: cave hyena only, cave and spotted hyena, and spotted hyena only. Spotted hyena individuals Hyena2071 and Hyena375 were excluded from this analysis because of unknown population provenance. We carried out PCA using single read identity by state (IBS) analyses in analysis of next generation sequencing data (ANGSD) v0.913 to avoid any biases differential coverage may cause in base calling (42). The following filters were applied to all three analyses: only consider reads with a mapping quality over 25 (-minMapQ 25), only consider bases with a base quality over 25 (-minQ 25), only consider reads that map uniquely to one location (-unique_only 1), and remove “bad” reads as deemed by ANGSD (secondary alignments) (-remove_bads 1). We also implemented additional filtering specific to each dataset. The spotted hyena PCA only considered sites at which at least five individuals had coverage (-minInd 5) and only considered single-nucleotide polymorphisms (SNPs) that occurred in at least two individuals (-minFreq 0.2). The cave hyena PCA only considered sites where at least five individuals had coverage, only considered transversions, and only considered SNPs that occurred in at least two individuals. The cave and spotted hyena low-coverage genome PCA only considered sites where at least 11 individuals had coverage, only considered transversions, and only considered SNPs that occurred in at least two individuals.

For the maximum likelihood phylogenetic analyses, we created pseudohaploidized consensus sequences using a random single-base selection method (-doFasta 1) in ANGSD while also applying the same filters mentioned above. These consensus sequences were aligned together with the striped hyena and split into nonoverlapping windows of 2 Mbp, only considering scaffolds with a length greater than 2 Mbp. We then further filtered the data by only including sites where at least 11 of the 17 ingroup individuals (Crocuta) had data, converted to binary to only score the transversions (A or G: 0, C or T: 1), removed invariant positions, and only considered sites where at least two of the ingroup individuals contained an SNP using a custom Perl script. We performed maximum likelihood analyses with RAxML v8.2.10 (43), specifying the striped hyena as outgroup and using the BINGAMMA substitution model. Output trees were then visualized simultaneously using Densitree (10) and combined into a single consensus tree with node support values using PHYLIP v3.6 (11).

Mitochondrial analyses. We mapped the trimmed and merged reads from our spotted hyena shotgun data and the trimmed reads for the cave hyena to an available spotted hyena reference sequence (GenBank accession: JF894377.1) using BWA v0.7.15 and parsed the mapped files using SAMtools v1.3.1. We then constructed the mitochondrial consensus sequences using ANGSD v0.913, only considering mapped reads and bases with quality scores greater than 25. We aligned the resultant consensus sequences together with three previously published C. crocuta (GenBank accessions: JF894378.1, JF894379.1, and JF894377.1) mitochondrial genomes (30) and the mitochondrial genomes of a striped hyena (Hyaena hyaena) (GenBank accession: NC_020669.1) (30), an aardwolf (Proteles cristata) (GenBank accession: MH662445.1) (31), and a brown hyena (Parahyaena brunnea) (GenBank accession: NC_038159.1) (32) using MAFFT v7.271 (44). We excluded the control region from further analyses because of its high interspecific variation and manually annotated the alignment for tRNAs, ribosomal RNAs, and protein-coding regions to be used as input for Partitionfinder2 to find the optimal genomic feature partitions and substitution models (45). We constructed a phylogenetic tree using BEAST v1.8.4 (46) on the Cipres server (47) from this alignment and specified the partitions and substitution models determined via Partitionfinder2 (table S17), a strict clock model, and a birth-death speciation process (48). We used a fossil calibration point at the Hyaena/Parahyaena divergence, specifying a normal distribution with a mean of 4.625 Ma and a standard deviation of 0.25. This date was based on the earliest Parahyaena fossil (Parahyaena howelli) first described from Kanapoi, Kenya (dated between 4.17 ± 0.03 and 4.07 ± 0.02 Ma), and later from the Lower Laetolil Beds, Tanzania (4.36 to 3.85 Ma) (4951), and the most recent putative Hyaena/Parahyaena ancestor, Ikelohyaena abronia from Langebaanweg, South Africa (~5.2 Ma) (52). We ran the Markov Chain Monte Carlo chain for 100 million generations, sampling every 100,000 generations, and assessed convergence of the posteriors [effective sample size (ESS) > 200] using Tracer v1.6 (53). We extracted the maximum clade credibility tree with node heights scaled to the median of the posterior sample. We then visualized the resultant consensus tree using FigTree v1.4.2 (54).

Gene flow. To evaluate directional gene flow between spotted and cave hyenas, we performed a five-taxon (four ingroup and one outgroup) sliding window phylogenetic test, previously implemented in the analysis of admixture between cave and brown bears (14). For this analysis, we selected three cave hyena representatives, one from each major clade found in fig. S2 (Ccsp040, Ccsp041, and Ccsp042), and five spotted hyena representatives (Somalia_crocuta, NamCrocuta, Ghana_crocuta, Kenya_793, and Zambia_2518). The five taxa consistently contained two cave hyena individuals, two spotted hyena individuals, and the striped hyena as outgroup. We then applied every possible taxon combination while using this as a prior, leading to a total of 30 independent analyses. As input for this analysis, we created pseudohaploidized consensus sequences for each individual by a random single-base selection method (-doFasta 1) in ANGSD while also applying the following filters: only consider reads with a map quality over 25, only consider bases with a base quality of over 25, only consider reads that map uniquely to one location, and remove bad mapping reads. The consensus genomes of the selected individuals were then aligned and divided into nonoverlapping windows of 100 kbp (kilo–base pairs). Any windows for which one of the five individuals contained more than 50% gaps were removed from further analyses. We then recorded sites into binary characters to only score transversions and computed a maximum likelihood phylogeny under the BINGAMMA model, with the striped hyena as outgroup using RAxML (43). The topology of each phylogeny was evaluated using a custom Perl script that made use of the ETE3 software (55).

To estimate the number and relative timing of admixture events, we investigated the cumulative lengths of the 100-kb genomic regions showing signs of admixture from five independent five-taxon sliding window phylogenetic tests. The lengths of genomic regions showing signs of admixture were estimated from the complete dataset by counting the number of consecutive blocks returning the respective tree topology. As this is a relative test, we kept the individuals with the least amount of admixture (NamCrocuta and Ccsp041) consistent throughout the analyses while changing the cave hyena or spotted hyena these were compared against. When investigating admixture into the cave hyena, we compared against Kenya and Namibia. When investigating admixture into the spotted hyena, we compared against Ccsp040 and Ccsp041. We used the following combinations to investigate the cumulative windows: (Ccsp041,Ccsp040,NamCrocuta, Ghana), (Ccsp041,Ccsp040,NamCrocuta,Kenya_793), (Ccsp041, Ccsp040,NamCrocuta,Zambia_2518), (Ccsp041,Ccsp040,NamCrocuta,Somalia), and (Ccsp041,Ccsp042,NamCrocuta,Kenya_793).

To investigate any sites that may be associated with adaptive introgression, we extracted windows in which at least 90% of all comparisons showed introgression into the cave hyena, regardless of which individuals were used. Moreover, we also investigated windows in which at least 90% of all comparisons showed introgression into the spotted hyena, regardless of which individuals were used. We then investigated these windows for the presence of genes by blasting them against the Felis catus transcriptome (GCF_000181335.2) using Blastn v2.2.31+ (56). We then classified GO using PANTHER (57) and the F. catus database, investigating both the biological processes the genes are involved in and which protein classes these genes belong to. We further investigated whether there was any GO enrichment in the genes found within these windows using GOrilla (16).

We further performed a test for admixture using D statistics. For this analysis, we used the same three cave hyena representatives (Ccsp040, Ccsp041, and Ccsp042), five spotted hyena representatives (Somalia_crocuta, NamCrocuta, Ghana_crocuta, Kenya_793, and Zambia_2518), and the striped hyena as outgroup. We ran D statistics in ANGSD using the random base option (-doAbbaBaba 1), removing transitions (-rmTrans 1), and applying the following filters: only consider reads with a map quality over 25, only consider bases with a base quality of over 25, only consider reads that map uniquely to one location, and remove bad reads as deemed by ANGSD. Significance was evaluated using a jackknife approach. Any D score that was more than 3 standard errors (SE) away from 0 was considered significant (z score > 3 or < −3).

Demographic inference. We calculated the demographic history of the three high-coverage spotted hyena originating from Namibia, Somalia, and Ghana using the PSMC model (58), considering only the autosomal chromosomes. Scaffolds previously found to represent the X chromosomes of the striped hyena were removed along with any scaffold less than 1 Mbp in length (32). A PSMC diploid sequence was constructed using SAMtools. One hundred bootstrap analyses were undertaken for each analysis. When plotting, we used a generation time of 6 years and a mutation rate of 4.44 × 10−9 per generation for autosomes. To estimate the mutation rate, we carried out a pairwise distance analysis on the striped hyena and the three high-coverage spotted hyena’s autosomes using a consensus base IBS approach in ANGSD v0.913 and took the average distance of all three spotted hyena from the striped hyena (0.012735). The average per generation mutation rate was then calculated assuming a divergence date of the two species to be 8.6 Ma (59), a genome-wide strict molecular clock, and a generation time of 6 years (5).

Nuclear genome divergence dating. We constructed a distance matrix between the nuclear genomes of the three cave hyena individuals with over 1× coverage (Ccsp040, Ccsp041, and Ccsp042), the three high-coverage spotted hyena genomes (Ghana_crocuta, NamCrocuta, and Somalia_crocuta), the striped hyena, and the brown hyena (SAMN07431164) in ANGSD using the consensus base call (-doIBS 2) and the -MakeMatrix functions while applying the following filters: only consider reads with a mapping quality over 25, only consider bases with a base quality of over 25, only consider reads that map uniquely to one location, remove bad reads as deemed by ANGSD, only consider sites for which all individuals had a base, and remove transitions. We further filtered by only considering windows for which no signs of admixture were found in any of the five-taxon phylogenetic tests above, regardless of individuals used. We then calculated the divergence time from the resultant distance matrix (table S18). We assumed that the long terminal branches in the cave hyena individuals came from sequencing errors due to the low coverage of the dataset or errors caused by short-read mismapping. Therefore, we reduced the cave hyena terminal branches to be equal in length to the high-coverage spotted hyena (assumably error free) individuals. To calibrate the tree, similar to the mitochondrial timetree, we used the Hyaena/Parahyaena divergence date. However, for this analysis, we specified the minimum age of 4.05 Ma based on the earliest Parahyaena fossil (P. howelli) (4951) as the minimum age, the maximum age of 5.2 Ma based on the most recent putative Hyaena/Parahyaena ancestor, I. abronia (52), and the mean between these two values (4.625 Ma) as the mean divergence time value. We further assumed a strict genome-wide molecular clock. To investigate the validity of this method, we estimated the Crocuta and Hyaena/Parahyaena divergence date using the same dataset. We dated this divergence date to be 11.2 Ma (CI 9.8 to 12.7 Ma). This result is in concordance with previous findings using both molecular-based approaches (31, 59) and the fossil record (6) and our own mitochondrial genome timetree, giving us confidence in this method.

We further applied a coalescence-based method to estimate the divergence time between the cave and spotted hyena. To do this, we first created a three-dimensional site frequency spectrum (3DSFS) specifying the spotted hyena (all individuals), European cave hyena (Ccsp040, Ccsp042, and Ccsp043), and Asian cave hyena (Ccsp041) using ANGSD. We used the same filtering parameters as in the previous PCA analyses (-minMapQ 25 -minQ 25 -remove_bads 1, -uniqueOnly 1, -rmtrans 1) with the following additional parameters: perform multisample genotype likelihoods estimation (-doSaf 1) and specify the striped hyena as the ancestral allele (-anc). We also uniquely specified the minInd parameter for each population (Spotted hyena -minInd 5, European cave hyena -minInd 2, and Asian cave hyena -minInd 1). To ensure compatibility between populations, we also downsampled the high-coverage spotted hyena to ~5× using SAMtools. The resultant ANGSD allele frequency output was then run through ANGSD realSFS. We used fastsimcoal v.2.6 to estimate parameters from a historical scenario based on the PSMC and gene flow analyses (15). We used the expectation conditional maximization procedure described by Excoffier et al. to estimate 11 model parameters running 50 replicates of the optimization procedure, each with 50 expectation conditional maximization cycles and 100,000 simulations per cycle. The replicate with the highest estimated likelihood was used as the maximum likelihood parameter estimate. To estimate parameter uncertainty, we performed block bootstrapping (nSites of 200,000,000) of the 3DSFS using realSFS in ANGSD. The resulting bootstrapped SFSs were used as input in fastsimcoal, using the same procedure as outlined above, with the exception that only a single replicate estimation was performed for each bootstrapped dataset. Eight bootstrap parameter estimations showed an exceptionally large difference between the estimated and observed likelihood value as well as outlying values for some parameters, indicating poor convergence or optimization due to the single optimization run for each bootstrap dataset. These were rerun and subsequently yielded better optimization results. Divergence times were calibrated using the same mutation rate and generation time as used for the PSMC analysis.

SUPPLEMENTARY MATERIALS

Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/6/11/eaay0456/DC1

Table S1. Results from the five-taxon topology test for admixture when considering only Ccsp040 and Ccsp041 as C1 and C2 and all combinations of spotted hyena.

Table S2. Results from the five-taxon topology test for admixture when considering only Ccsp040 and Ccsp042 as C1 and C2 and all combinations of spotted hyena.

Table S3. Results from the five-taxon topology test for admixture when considering only Ccsp041 and Ccsp042 as C1 and C2 and all combinations of spotted hyena.

Table S4. D statistics results with spotted hyena as H1 and H2 and cave hyena as H3.

Table S5. D statistics results with cave hyena as H1 and H2 and spotted hyena as H3.

Table S6. Parameter estimates from fastsimcoal.

Table S7. List of orthologous domestic cat genes found in windows showing signs of admixture from cave hyena into spotted hyena.

Table S8. List of protein classes of the orthologous domestic cat genes found in windows showing signs of admixture from cave hyena into spotted hyena.

Table S9. List of biological processes of the orthologous domestic cat genes found in windows showing signs of admixture from cave hyena into spotted hyena.

Table S10. List of orthologous domestic cat genes found in windows showing signs of admixture from spotted hyena into cave hyena.

Table S11. List of protein classes of the orthologous domestic cat genes found in windows showing signs of admixture from spotted hyena into cave hyena.

Table S12. List of biological processes of the orthologous domestic cat genes found in windows showing signs of admixture from spotted hyena into cave hyena.

Table S13. Modern spotted hyena sample information.

Table S14. Pleistocene cave hyena sample information.

Table S15. Spotted hyena mapping values.

Table S16. Cave hyena mapping values.

Table S17. Genomic features and substitution model within each partition determined using Partionfinder2 on the Hyaenidae mitochondrial genomes.

Table S18. Distance matrix used to calculate divergence time between spotted and cave hyena.

Fig. S1. Bayesian dated phylogenetic tree of the Hyaenidae family constructed using complete mitochondrial genomes.

Fig. S2. Spotted and cave hyena cladogram consensus tree constructed using 467 maximum likelihood phylogenetic trees from 2 Mbp nonoverlapping windows along the nuclear genome.

Fig. S3. PCA constructed using only Pleistocene cave hyena individuals.

Fig. S4. PCA constructed using only modern spotted hyena individuals.

Fig. S5. Cumulative size distributions of regions returning putative admixture topologies, determined by counting the number of consecutive 100-kb blocks showing a putative admixture topology.

Fig. S6. MapDamage results for cave hyena genome data.

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.

REFERENCES AND NOTES

Acknowledgments: We thank F. Hrouda from the Museum für Naturkunde Gera for supplying some cave hyena samples. We also thank R. Slade for helping produce the phylogenetic tree in Fig. 4 and B. De Cahsan for drawing the illustrations of the cave and spotted hyena in Fig. 4. We thank G. Zhang for supplying the genomic data for the Somalian spotted hyena individual. Last, we thank the three reviewers for constructive feedback on the manuscript. Funding: This work was supported by the European Research Council (consolidator grant GeneFlow no. 310763 to M.H.). The authors also acknowledge support from Science for Life Laboratory, the Knut and Alice Wallenberg Foundation, the National Genomics Infrastructure funded by the Swedish Research Council, and Uppsala Multidisciplinary Center for Advanced Computational Science for assistance with massively parallel sequencing and access to the UPPMAX computational infrastructure. L.D. acknowledges support from the Swedish Research Council and FORMAS. R.H. was supported by a Villum Foundation Young Investigator grant (VKR023447) and a Sapere Aude Research Leader grant (8049-00098B) from Independent Research Fund Denmark. Author contributions: M.H. conceived the project. M.V.W. and M.P. performed the laboratory work. M.V.W., R.H., and S.H. performed the bioinformatic analyses. M.V.W., M.H., A.B., L.D., and R.H. interpreted the analyses. B.R., D.N., T.R., R.Z., G.B., G.S., A.L., I.W., and R.H. provided samples and sample information. F.B. and L.W. provided the paleontological context and interpretations. M.V.W. wrote the manuscript with input from M.H. All coauthors read and agreed on the final manuscript. Competing interests: The authors declare that they have no competing interests. Data and materials availability: The raw sequencing reads can be found under the GenBank BioProject IDs PRJNA559045 and PRJNA554753. New mitochondrial genomes can be found under the GenBank Accession codes MN320449 to MN320467. All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors.
View Abstract

Stay Connected to Science Advances

Navigate This Article