Research ArticleGENETICS

A climate-associated multispecies cryptic cline in the northwest Atlantic

See allHide authors and affiliations

Science Advances  28 Mar 2018:
Vol. 4, no. 3, eaaq0929
DOI: 10.1126/sciadv.aaq0929


The spatial genetic structure of most species in the open marine environment remains largely unresolved. This information gap creates uncertainty in the sustainable management, recovery, and associated resilience of marine communities and our capacity to extrapolate beyond the few species for which such information exists. We document a previously unidentified multispecies biogeographic break aligned with a steep climatic gradient and driven by seasonal temperature minima in the northwest Atlantic. The coherence of this genetic break across our five study species with contrasting life histories suggests a pervasive macroecological phenomenon. The integration of this genetic structure with habitat suitability models and climate forecasts predicts significant variation in northward distributional shifts among populations and availability of suitable habitat in future oceans. The results of our integrated approach provide new perspective on how cryptic intraspecific diversity associated with climatic variation influences species and community response to climate change beyond simple poleward shifts.


Growing evidence demonstrates cryptic population structure as a prevalent phenomenon among marine species (13). Phylogeographic breaks (4) or genetic structure associated with adaptation (5) appear common and point to areas of contemporary or historical restriction in gene flow that are consistent with genetic incompatibilities or selection against nonnative alleles (68). In contrast, few studies report intraspecific genetic breaks or clines showing concordant patterns across multiple species [notable exceptions (9, 10)]. Even fewer studies document consistent multispecies genetic clines caused by adaptation to ocean climatic conditions. Previous rare examples of congruent multispecies genetic discontinuities often occur where land masses or divergent ocean current systems create obvious barriers to gene flow (11). When present, congruent multispecies spatial patterns highlight particularly important macroecological determinants of marine connectivity and community composition (4) and provide a window into past and future population structuring forces in the marine environment.

The dearth of congruent discontinuities in the marine environment is surprising given the role of climate as the primary determinant of species distributions, driving both expansion and contraction in species ranges (12), influencing ecosystems, and altering the landscape of resources and services they provide (13). Although ubiquitous, the influence of climate can vary spatially across environmental or biogeographic gradients (14, 15) and temporally among population-specific phenologies (16). Most studies examining the influence of climate on species distributions focus on range limits (17) and in the context of physiological limitations (14) and changing biotic communities (18). Although changes in distributions of species provide a useful metric of the influence of climate and climate change [for example, northward distributional shifts in demersal fishes (17)], these species-level observations may miss important shifts that can occur within species, involving changes in distributions of locally adapted forms. Particularly for species that span strong environmental gradients, the influence of climate change can be more complex. Examination of intraspecific genetic variation across climate gradients may clarify the role of selection and adaptive variation as determinants of population structure and contextualize the impact of climate change on marine systems (6, 19).

We examined genetic and genomic evidence for latitudinal and climate-associated clines in both native and introduced species in the northwest (NW) Atlantic, from North Carolina, United States to Labrador, Canada. This region spans a ~16° latitudinal range and a strong environmental gradient, with an abrupt transition on the Eastern Scotian Shelf (Fig. 1). Previous genetic studies over this range suggest a broad north-south clinal pattern within individual species (table S1), and the potential for adaptive divergence (5), but whether these patterns recur across taxa and at what ecological scale remains unknown.

Fig. 1 Spatial distribution of sampling, genetic structure, and climate in the NW Atlantic.

Here, panels depict (A) species sampling distribution overlaid on the average winter bottom temperature and (B) average of binary clustering analyses to north (blue) and south (red) overlaid on the average spring sea surface temperature (SST). Solid and dashed lines (means ± SD) denote the clinal inflection point. Temperatures on both panels represent an aggregate of seasons from 2002 to 2012. Sampling summary provided for each species in (A) (number of sites, number of samples). Inset in (B) shows range of map panels in reference to the western Atlantic coast. Refer to table S1 for species data source references.

We modeled clinal population structure for Atlantic cod (Gadus morhua), American lobster (Homarus americanus), sea scallop (Placopecten magellanicus), northern shrimp (Pandalus borealis), and the invasive European green crab (Carcinus maenas) (hereafter cod, lobster, scallop, shrimp, and green crab) using available genetic and genomic data. These species vary greatly in habitat utilization, spawning characteristics, larval durations, and adult movement patterns. We used three genetic clustering approaches for each species to estimate clinal patterns and inflection points for the ensemble output. Evaluating the clinal variation as an ensemble provides a unique, multispecies perspective of clinal variation, modeled as a function of geography (latitude and along-shore distance) and environment (annual and seasonal averages for surface and bottom temperature and salinity at each sampling location, in addition to topographic variables describing slope and depth).


Analyses of genetic variation and clinal structure revealed remarkable consistency in broad-scale, north-south delineation among clustering approaches, life histories (species), and across latitude. Logistic mixed models fit to the cluster coefficients revealed a significant relationship with latitude (P < 0.002, mean r2 = 0.6), documenting a multispecies clinal transition point at 44.61°N (±0.25, means ± SD, among clustering methods) along the eastern coast of Nova Scotia (Fig. 1B), spanning less than 100 km of coastline (Fig. 2 and fig. S2). This break occurs near a steep environmental gradient that potentially mediates both connectivity and adaptive divergence (20). Corresponding gradients of productivity (21) and delineations among ecoregions (22) further highlight the ecological significance of this transition area. Refining the position and width of the genetic cline and better understanding of its taxonomic scope require focused sampling across this transition area.

Fig. 2 Spatial clustering analysis using STRUCTURE, discriminant analysis of principal components (DAPC), and spatial principal components analysis (sPCA) modeled as a function of latitude.

Here, assignment probability and admixture are relative to the southern population, and sPCA scores have been rescaled between 0 and 1 to align with other coefficients. Model fits are presented with standard error (dashed lines). Dashed vertical lines denote modeled intercepts.

Multivariate redundancy analyses (RDAs) identified environmental variables that correlated significantly (global P < 0.001, adjusted r2 = 0.80) with clinal genetic clustering. To account for geography, we used partial ordination, constraining the RDA using a Cartesian reprojection of sampling, accounting for land as a barrier to marine dispersal. Environment explained 40% of genetic variance, geography explained 7%, and the interaction between geography and environment explained 53%, suggesting that environment strongly influenced the clinal structure across all species. Seasonal temperature measurements from the spring, fall, and winter correlated most strongly with clinal structure, with average RDA adjusted r2 values of 0.64 at the bottom and 0.52 at the surface. Adjusted r2 values for salinity (0.12) and summer temperatures (0.18) explained less variance on average. Overall, a model that included spring bottom temperature and winter sea-surface temperature best predicted concordant clinal population structure across the five species (see relationships; Fig. 3).

Fig. 3 Environmental relationships for bottom temperature and assignment probability (DAPC, solid line), admixture (STRUCTURE, short dashed line), and lagged sPCA first axis score (sPCA, long dashed line).

Here, assignment probability and admixture are relative to the southern population, and sPCA scores have been rescaled between 0 and 1 to align with other coefficients. Points colored by latitude up to 50°N. Insets show distribution of temperature variables partitioned between southern and northern (N) and southern (S) groups as defined by the cline. Temperature measurements were taken at the bottom for winter and fall and at the surface for spring and the annual minimum.

Most studies attribute congruent multispecies genetic structure in marine systems to one, or a combination, of three driving mechanisms (7): (i) oceanographic gradients influencing biogeographic selection; (ii) ocean currents driving consistent dispersal/retention patterns; and (iii) past divergence associated with vicariant separation (for example, glaciation) that remains evident in contemporary genetic structure. Our observation of an abrupt genetic break among species characterized by a variety of pelagic larval life histories (for example, timing, duration, and water column position) and adult movement characteristics (that is, spanning migratory to near-sedentary species) suggests that current systems and associated larval dispersal-retention mechanisms are unlikely the primary determinant of this large-scale spatial structure. Full evaluation of this supposition would require further studies to evaluate the strength of natural selection as a function of dispersal distance and clinal shape. Nonetheless, our observation that gradients in seasonal thermal minima best align with observed genetic structure suggests that the biogeographic barrier could result from differential postsettlement mortality associated with adaptive divergence spanning the environmental gradient. Pleistocene glaciation has significantly affected the historical genetic structure of many NW Atlantic taxa (7). The divergence we observe could, in part, result from this historical vicariance (5, 9, 10) and/or from landscape-independent endogenous genetic incompatibilities, sensu Bierne et al. (8), in addition to contemporary adaptive divergence in response to the climatic gradient. For instance, population structure of the European green crab aligns with our observations, wherein range expansions and secondary contact following two independent colonization events (20) correspond with the estimated genetic break observed for the other four native species. Previous studies proposed that both adaptive divergence carried over from European green crab source populations (23) and the NW Atlantic’s steep environmental gradient (20) might drive North American green crab population structure.

Temperature plays a significant role in structuring marine populations. Cold stress limits species’ ranges (15) and community diversity (24), and temperature gradients are important evolutionary drivers in the NW Atlantic (25). Our analysis provides a coherent multispecies link between environmental factors and population structure (Fig. 3). To evaluate the interplay between environmental dynamics and population genetics, we developed spatially disaggregated habitat suitability models for each species reflecting the estimated multispecies genetic break point at ~44.61°N. Species distribution models (SDMs) were developed using maximum entropy modeling based on species observational data and seasonal environmental correlates. We investigated the potential impact of climate change on species’ distributions by comparing their contemporary distributions to those distributions projected using downscaled climate projections for our study region in the year 2075 (26).

Our model predicts a northward shift (224.6 ± 96.6 km, means ± SD) in the distribution of all species and population structures (Fig. 4, A and B), mirroring shifts already observed in the northeast Atlantic (17). In the NW Atlantic, the asymmetrical influence of changing climatic conditions leads to more pronounced distributional shifts and changes suitable habitat availability, on average, in northern subpopulations (Fig. 4), where seasonal thermal minima largely define suitable habitat, consistent with our observations of environmental relationships with the genetic clinal structure (Fig. 3). Predicted distributional shifts were particularly pronounced in northern populations of lobster, green crab, and shrimp (Fig. 4A), with average shifts of 359 km (±115). Only cod demonstrated a significantly higher northern shift in the southern population, echoing fluctuations in abundance across their range (27, 28) and significant reductions in the availability of suitable habitat (Fig. 4C). For northern shrimp, declines in landings over the last decade (29) are consistent with predictions of declining habitat availability and a shift to higher latitudes over the coming decades. Predicted northern distributional shifts in lobster, green crab, and scallop were associated with habitat expansions in the northern part of their respective ranges, whereas southern ranges were not associated with expanded habitat, except for green crab (Fig. 4). Predicted species distributional changes will also influence the position and shape of the multispecies genetic cline, suggesting variation in genetic structure, and could potentially offer a sensitive indicator for ecological change in warming oceans.

Fig. 4 Mean distributional shifts ± 95% confidence intervals (CIs) for northern and southern groups of each species.

(A) Northward shifts in the centroid of each population group, (B) northward shifts in the clinal breakpoint between population groups, and (C) net change in the spatial extent of suitable habitat for each population group.

Predicting how climate and potential climate change influence population demographic processes and understanding causal relationships (14, 15) across taxa remains elusive. Studies such as ours that integrate genomic and oceanographic data to evaluate multispecies genetic clinal pattern provide strong inference regarding the relationship among environment, local adaptive population structure, and projected species and community distributional shifts. Our observation of a relatively sharp multispecies genetic discontinuity provides strong evidence of a new, cryptic, marine biogeographic boundary and offers a new perspective of population demographic and connectivity processes in the NW Atlantic.


Data collection

To evaluate climate-associated genetic variation among species, we used available genome-wide data extracted from five genetic censuses conducted along the coast of the NW Atlantic (table S1). Data sets were composed of microsatellites, expressed sequence tags, derived single-nucleotide polymorphisms (SNPs), and restriction site–associated DNA sequencing (RAD-seq) (table S1). This suite of data sets encompassed 1979 individuals, with samples spanning 48 locations between Delaware Bay, United States and Labrador, Canada (Fig. 1A). We prepared RAD-seq libraries with Sbf1 digestion enzyme and SNP markers identified de novo using the Stacks pipeline (30). The individual studies (table S1) include detailed methods on sampling, DNA extraction, sample processing, and data filtering/archiving.

To evaluate potential associations between environment and genetic structure, we considered ecologically relevant and readily available variables, such as temperature, salinity, and bottom topography. Temperature and salinity data were aggregated to seasonal climatological data (averaged across 2002 to 2012) layers, corresponding with winter (January to March), spring (April to June), summer (July to September), and fall (October to December). Seasonal SST and sea surface salinity (SSS) were calculated at nominal 1 × 1–km grid cell resolution from AMSR-E Level 3 SST climatological satellite data, including Advanced Very High Resolution Radiometer data (AVHRR Atlantic; compiled by Fisheries and Oceans Canada) and global oceanographic climatological SSS composites (31). Benthic temperature and benthic salinity climatological data layers were interpolated at nominal 8 × 8–km grid cell resolution from a numerical climatological model (GLORYS2V1 at 1/4 degree resolution) adapted to the study area by Fisheries and Oceans Canada. Topographic complexity of the seafloor at nominal 1 × 1–km grid cell resolution was represented by east-west and north-south components of aspect, slope, plan and profile curvature, and rugosity (32). These topographic data layers were used as predictive surfaces for each of the five species native to the range (table S1). Point estimates of each environmental variable were extracted from these topographic data layers for each genetic sample location to explore potential genetic-environmental relationships [point estimate data archived online (33)].

We projected forecasts of climatological seasonal temperature and salinity data layers to the year 2075, matching the resolution of either the surface (1 km2) or bottom layer (8 km2) using a numerical model of projected monthly anomalies in the NW Atlantic [NEMO RCP 85 2075 at nominal 5 × 5–km grid cell resolution (26)]. All resulting contemporary and prognostic climatological data layers were converted to the ASCII (American Standard Code for Information Interchange) grid with WGS84 global stereographic projection, and a uniform land mask was applied. We collated occurrence data encompassing the NW Atlantic extent of each species distribution from Fisheries and Oceans Canada databases, which included assessment surveys and commercial landings.

Clinal variation

Our analysis evaluated genetic spatial patterns across an environmental gradient. In particular, we evaluated congruence among species along a latitudinal north-south split as observed independently in previous studies (5, 34, 35). Analyses were conducted using all available genetic data, which encompass a variety of genetic and genomic sources (table S1). Noting the availability of a wide variety of methods to evaluate genetic clustering and spatial structure from genotypic data, we chose both model-STRUCTURE– (36) and nonmodel-DAPC–based (37) approaches to measure spatial clustering and the north-south divide for each species. These approaches differ in their computational basis: STRUCTURE uses Bayesian hierarchical clustering and all genetic variations, whereas DAPC partitions genetic variation to specific allele-based linear combinations that maximize between-cluster differences but minimize within-cluster differences. Although the output from each analysis differs, combining these outputs provides computationally independent models of clinal structure. The results from both analyses provide a standard relative scale (0, very unlikely; 1, very likely) defining the assignment of an individual or population to a genetic cluster. We ran both analyses individually for each species. STRUCTURE runs had 100,000 burn-in iterations and 500,000 Markov chain Monte Carlo iterations using the admixture model and no prior structuring information.

STRUCTURE and DAPC methods provide useful metrics of genotypic clustering, but neither approach explicitly uses spatial information. sPCA (38) incorporates spatial variability directly by optimizing the variance of the principal components while accounting for spatial autocorrelation among entities (individuals or populations). We performed an sPCA analysis to incorporate spatial information when characterizing the multilocus geographic cline. Latitude and longitude do not incorporate spatial information; therefore, they are not surrogates of Cartesian space. Moreover, straight line distances do not represent dispersal distance among sample locations realistically given the inclusion of land masses in our study area (Fig. 1). To account for these land barriers, we calculated pairwise Euclidian least-coast distances between each sample location, treating land as an impermeable barrier to dispersal using the R package marmap (39). Using this distance matrix, we reprojected the coordinates into two-dimensional Cartesian space, accounting for land constrained distances, using a nonmetric multidimensional scaling function in the R package vegan (40). This reprojection produced low stress, averaging 0.01 (±0.007) among species, and pairwise Cartesian to least-cost distances yielded a highly coherent, near 1-1, relationship (slope, 1.003; intercept, −12.65; P < 0.001; r2 = 0.98; fig. S2). sPCA analyses used the neighborhood-by-distance connection network based on these Cartesian coordinates. For each species, we retained the first positive eigenvalue from the sPCA analyses as an indication of global structure. To match this output with coefficients from the STRUCTURE and DAPC analyses, we standardized the lagged sPCA score range between 0 and 1 for each species. We then used Monte Carlo post hoc permutation tests to test for evidence of “global” (large scale, clinal) versus “local” (small scale) structure for each species.

To evaluate a multispecies cline, we combined clustering coefficients and standardized sPCA axis scores among species and modeled them as a function of latitude and along-shore, least-cost distance (distance from each station to the southernmost sampled location—Mid-Atlantic Bight 38.8°N; see fig. S2). We fit genetic spatial clines using generalized logistic mixed models with a logit transformation and a beta distribution and the R package glmmTMB (41). Models were fit using a random intercept to account for species dependency, while fitting an average relationship with latitude and along-shore distance. Separation between the presumptive “north” and “south” populations for each statistical method, among species, was assigned on the basis of the latitude associated with the inflection point of the cline.

We modeled the relationships between environment and clinal structure using an RDA in the R package vegan (40), with the cluster coefficients and sPCA axis scores, pooled among species, as the response variables and the point estimates of seasonal temperature and salinity as the environmental predictors. To evaluate which combination of environmental variables best predicted clinal structure, we used stepwise model selection by permutation tests in constrained ordination. We used partial ordination, conditioning for space, to partition the explained variance in clinal structure among environment, space, and their interaction. We conditioned space within the partial ordination using the Cartesian reprojections for each sample site.

Species distribution modeling

Species distributions were predicted by maximum entropy modeling (MaxEnt) (42), a machine learning algorithm that mathematically predicts the potential distribution of a species occurrence using climatic and geographic variables. We created spatially disaggregated SDMs using MaxEnt to correlate species occurrence data with environmental predictors on either side of the presumptive phylogeographic break point. From these clinal-partitioned occurrence layers, MaxEnt extracted a sample of background locations (n = 100,000) and contrasted them against our presence locations to delineate the multidimensional range of environmental conditions that determine relative occurrence of each species. To allow for model convergence, we allowed model runs of 1000 iterations. After spatially rarefying occurrence data and selecting background sample locations according to a “bias surface” (43), we used replicated, geographically structured cross validation using the ENMeval package in R (44) followed by bootstrap replication (30 iterations) to generate CIs around prediction surfaces in MaxEnt. These methods help mitigate potential issues associated with spatial autocorrelation and inflated estimates of model performance (45). We conducted further tests on the applicability accuracy of each model, including clamping and multivariate environmental similarity surfaces analysis (46), according to methods outlined in Lowen and DiBacco (47).

On the basis of predicted ecological tolerances estimated from the environmental predictors, we estimated species distributions on either side of the phylogeographic break by applying each model to all cells in the study region using a logistic link function to yield a relative probability of occurrence between 0 and 1. We applied this method using both contemporary and projected environmental data to produce current and future probability surfaces.

We reclassified continuous probability MaxEnt output surfaces (mean and 95% CIs) for both present and future climate scenarios to binary SDMs (presence or absence) using values obtained from the 10th percentile training presence threshold. Distributional changes (mean ± 95% CI) between pairs of binary SDMs (that is, current and future for each species) were subsequently estimated using the ENM toolkit (43). Outputs include geographical summaries (mean ± 95% CI km2) of range contraction, expansion, and stability as well as distributional centroid shifts for each species and clinal grouping. In addition, we quantified latitudinal shifts (mean ± 95% CIs) in the delineation between northern and southern environmental conditions (sensu the inflection point of the cline) for each species.


Supplementary material for this article is available at

fig. S1. Pairwise comparisons of least-cost path distances among species sample stations and the Cartesian projection distances using nonmetric multidimensional scaling.

fig. S2. Spatial clustering analysis using STRUCTURE, DAPC, and sPCA as a function of along-shore distance.

table S1. Sources and metadata for combined genetic analysis.

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.


Acknowledgments: We thank L. C. Hamilton and members of the Aquatic Biotechnology Laboratory at the Bedford Institute of Oceanography for their assistance with bioinformatics and data assembly. We also thank S. Heaslip, B. Dempson, and C. Grant for their comments and input on an earlier version of the manuscript. We are grateful to the many graduate students and technical staff who contributed to the sample collection during preliminary studies. Funding: This research was supported by funding from Fisheries and Oceans Canada’s Strategic Program for Ecosystem-Based Research and Advice (SPERA), Aquatic Climate Change Adaptation Services Program (ACCASP), and the Government of Canada’s Genomic Research Development Initiative (GRDI). Additional postdoctoral support for R.R.E.S. was provided by the National Sciences and Engineering Research Council of Canada’s Canadian Healthy Oceans Network (CHONe II). Author contributions: I.R.B., R.B., and C.D. conceived the study. R.R.E.S. and I.R.B. conceived and developed the genetic data analysis with contributions from P.B., N.W.J., and B.F.W. All authors contributed to data interpretation. Habitat suitability modeling was conducted and led by B.L. with contributions from R.R.E.S. and C.D. Z.W. developed and ran the ocean circulation model with contribution from D.B., who provided and analyzed the model output. Genetic and environmental data contributions were made by N.W.J., M.V.W., L. Benestan, and L. Bernatchez. R.R.E.S., I.R.B., C.D., P.B., and P.V.R.S. wrote the manuscript with all the other authors contributing to subsequent drafts. Competing interests: The authors declare that they have no competing interests. Data and materials availability: Genomic data are available online with associated studies listed in table S1. Environmental point data corresponding to sample locations are available online (33). All other data needed to evaluate the conclusions of this paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors.
View Abstract

Navigate This Article