Research ArticleECOLOGY

The commonness of rarity: Global and future distribution of rarity across land plants

See allHide authors and affiliations

Science Advances  27 Nov 2019:
Vol. 5, no. 11, eaaz0414
DOI: 10.1126/sciadv.aaz0414


A key feature of life’s diversity is that some species are common but many more are rare. Nonetheless, at global scales, we do not know what fraction of biodiversity consists of rare species. Here, we present the largest compilation of global plant diversity to quantify the fraction of Earth’s plant biodiversity that are rare. A large fraction, ~36.5% of Earth’s ~435,000 plant species, are exceedingly rare. Sampling biases and prominent models, such as neutral theory and the k-niche model, cannot account for the observed prevalence of rarity. Our results indicate that (i) climatically more stable regions have harbored rare species and hence a large fraction of Earth’s plant species via reduced extinction risk but that (ii) climate change and human land use are now disproportionately impacting rare species. Estimates of global species abundance distributions have important implications for risk assessments and conservation planning in this era of rapid global change.


Why some species are common and others are rare has intrigued ecologists (1, 2), at least, since Darwin (3). Rare species are orders of magnitude more likely to go extinct (4, 5), making it puzzling how so many rare species can be maintained (6). Understanding rarity and the maintenance of rare species is also central to conservation biology [e.g., (7)] and to understanding current and future changes in biodiversity due to global change (8). Despite this importance, we know unexpectedly little about the causes of commonness and rarity and their maintenance at a global scale (9, 10).

Most quantifications of species abundance use abundances in local communities because estimates of global taxon abundance are difficult to obtain. However, there are two major limitations to focusing solely on local abundance. First, most species tend to be simultaneously common in a few parts of their ranges and rare in most of their ranges (11, 12), making estimates of local abundance a noisy and a poor measure of how truly rare a species is globally. Second, at a global scale, a measure of rarity results from a combination of the average local abundance and the number of sites occupied throughout the species geographic range. Local species abundance and species occupancy across the geographic range tend to be correlated (1214), so locally rare species tend to also show up in only a few local communities. This makes it likely that estimates of global abundance will be more skewed to the rare, but this has rarely been tested (15). A global estimate of rarity can therefore minimize the potential problems associated with assessing whether a species is rare. Fortunately, with the rapid development of biodiversity databases and networks in the past decade, it is becoming increasingly possible to quantify continental and global patterns of biodiversity and test competing models for the origin and maintenance of these patterns at a global scale (16).

Here, we use a global botanical database of unprecedented coverage to (i) assess global patterns of plant rarity, (ii) test several proposed hypotheses underlying the generation and persistence of rare species, (iii) identify regions that harbor hotspots of rare species and explore the drivers of these spatial patterns, and (iv) assess how current patterns of human impact and future climate change scenarios may affect plant diversity via impacts on rare species. In the past, quantification of global patterns of abundance and rarity has been hampered by the many limitations of global biodiversity data. These issues have made the use of these data in comprehensive biodiversity analyses difficult (17, 18). Here, we take a novel approach that overcomes many of these limitations. For all known land plants (Embryophyta), we have compiled a global database of standardized botanical observation records—the integrated Botanical Information and Ecology Network (BIEN) [Fig. 1, BIEN v4.1;; see the Supplementary Materials; (19)]. The BIEN data are mainly composed of herbarium collections, ecological plots and surveys, and trait observations. Together, these data constitute more than 200 million observations of plant species occurrences. Assembling these data involves overcoming numerous challenges of taxonomy, data quality, data exchange, provenance, interoperability, and scaling (Fig. 1) (20). After correcting misspelled or synonymous taxon names and removing records with invalid or suspect geocoordinates, incomplete or unresolvable taxon names, and observations of non-native species and cultivated plants, the final dataset consists of 34,902,348 observation records of 434,934 land plant species from herbarium and ecological plot data (see Fig. 1 and the Supplementary Materials for details of data cleaning and validation).

Fig. 1 Computational workflow for creating gSADs.

TNRS, Taxonomic Name Resolution Service; GNRS, Geographic Name Resolution Service.

We quantified the distribution of global abundance for all land plant species (hereafter, plant species) on Earth using a metric of global relative abundance, the total number of unique observations of a species ever recorded in global databases. The distribution of the total number of global observations per species [the global species abundance distribution (gSAD)] is an estimate of global abundance and is still a sample, as a count of all individuals on the planet is impossible. Nonetheless, quantifying the functional form of gSADs has a substantial practical advantage over other estimates of abundance. First, we can combine data from different datasets including plots and surveys, and herbarium specimens to increase sampling coverage. These datasets all share the common attribute of observing an individual of a given species in a given location and time. Second, comparing and integrating estimates of gSADs from different datasets (e.g., plots versus herbarium specimens) provide a way to assess potential biases in estimating species global abundance. For example, gSADs can be estimated by compiling only plot or ecological survey data. In plot data, a global estimate of species abundance is quantified directly, as each individual of that species is summed within and across plots. As we discuss, our approach is less biased than local plot-based abundance data that samples only a tiny fraction of Earth’s surface.

Traditionally, measures of rarity have been based on a multidimensional concept. For example, Rabinowitz (21) identified three major axes on which a species can be common or rare: local abundance, extent of the geographic range, and habitat specificity. Although conceptually these three dimensions are independent, they are often strongly positively correlated (22). Four of the five criteria the International Union for Conservation of Nature uses to evaluate extinction risk for their Red List (23) directly involve measurement of rarity via absolute levels of or declines in abundance and geographic distribution, while the fifth involves computer simulations, which are likely to incorporate population size and range size as well. These criteria all point to the importance of measuring rarity at global scale (in contrast to local rarity). A species may be globally rare because it has few individuals at many sites or many individuals at few sites.


We generated three gSAD distributions based on summing individual observations of species across all ecological plots, by summing all observations across all other botanical observation records, and by summing the non-plot observation records found within 100 km distance from each ecological plot. Our analyses reveal that a large fraction of the plant species on Earth are rare (Figs. 2 and 3). Analyzing the distribution of the number of observations per species reveals that the global-scale distribution is highly skewed and lacking a central tendency (i.e., the mode of the gSAD is at N = 1; Fig. 2). The total number of land plant species is ~435,000 (the number of species before geovalidation based on 66,334,188 observation records), a large fraction of these species, 36.5% or 158,535, are rare, with just five observations or fewer, while 28.3% or 123,149 have just three observations or fewer. The large number of rare species is consistent with past claims that when biodiversity observations are compiled at increasingly larger spatial (15) and temporal scales (24), rare species should comprise an increasing majority of species.

Fig. 2 The gSAD for all plant species.

(A) Schematic illustration of the predicted gSAD based on expectations from theory (see main text) (28). In the inset, we list several differing predictions for the shape of the gSAD. (B) Two estimates of the gSAD for all land plant species. The first distribution (green) is the observed number of observations per species for all species found in ecological plots. Each data point represents the total number of individuals observed for a given species. The second distribution (red) is all botanical specimens collected within 100 km of each plot. The third distribution (light purple) is all botanical specimens per species. Each distribution is strongly modal at the lowest abundance, showing that most species have only been observed a very small number of times and only a few species are common. The distributions are shown on log10-transformed axes. Comparing the shape of the distributions of the competing fits of differing proposed gSAD distributions allows us to test differing hypotheses for the origin of the gSAD.

Fig. 3 Does using the number of observations in botanical datasets provide a reliable measure of rarity?

Assessments of rarity by taxonomic specialists at the Missouri Botanical Garden and the New York Botanical Garden for a random sample of 300 species with three observations or fewer in the BIEN database. Most species (72.7%) identified as “rare” based on the number of unique occurrences within the BIEN database are also recognized as rare by experts. Approximately 7.3% of these species appear to be incorrectly characterized as rare, as they are recognized by experts as abundant or having large ranges. The apparent scarcity of approximately 7.5% of these taxa may reflect recent taxonomic splits or old names no longer used. Moreover, 10.3% are non-native species (which may or may not be rare). In sum, we estimate that between 72 and 90% of plant taxa (recognized as rare + recent name + unresolved + old name) identified by BIEN as being rare would be recognized as rare by other measures.

Global species abundance distribution

We tested several long-standing hypotheses concerning the processes creating and maintaining large-scale patterns of commonness and rarity. Specifically, we assessed whether the number of observed rare species follows predictions from biodiversity theory by comparing several proposed statistical distributions for the gSAD. First, we assessed two contrasting sets of predictions for the distribution of commonness and rarity of species (Fig. 2). Specifically, at increasingly larger geographic scales, both the unified neutral theory of biogeography (UNTB) (25) and the k-niche model (26) predict that the gSAD will converge on Fisher’s log-series distribution (27)f̂=αxnn(1)where f̂ is the expected number of species, n is the total number of observations per species, α is the diversity parameter, and x is a nuisance parameter that is defined by α and the total number of individuals sampled, N, x = N/(N − α). The UNTB further makes two predictions: (i) At increasingly large spatial scales (such as continental and global scales), the Fisher’s log-series distribution will also increasingly converge to approximate a “power law” (or a Pareto distribution) over most of the range of the distribution (see Fig. 2A) (28), wherefˆ=β1n0(nn0)β(2)where (ii) the value of β, the scaling exponent or slope on a log-log plot, will equal −1.0. For the continuous Pareto or power law distribution, n0 is the minimum scale of the distribution, and β is the scaling exponent (29). For the BIEN data, the minimum number of observations for a species is 1, so it was set at 1.

The UNTB predicts that the gSAD (called the regional pool in neutral theory) will follow a log-series distribution. Pueyo (28) notes that the log-series distribution consists of two parts multiplied together: a Pareto distribution with exponent β = −1 that is the result of neutral dynamics and an exponential “bend” that takes effect at very high abundances due to the finite size assumption. Pueyo (28) also suggests a generalization of the Pareto and log series that incorporates a Pareto where the exponent β is allowed to vary combined with an exponential finite size term, which we call here “Pareto with exponential finite adjustment.” Thus, testing whether the gSAD is best fit by a log series (where β = −1), a Pareto distribution (where β is allowed to vary), or a Pareto with exponential finite adjustment (where β is also allowed to vary) provides a test of neutral dynamics. In sum, both the UNTB and k-niche model predict that the log-series distribution will best fit gSADs, but at large geographic scales, this distribution will also converge to a Pareto distribution. Thus, fitting the Pareto or the Pareto with exponential finite adjustment provides a simultaneous test of whether neutral or niche dynamics are consistent with the data (28). A poor fit or a value of β ≠ 1.0 rejects neutral theory. A poor fit of the Pareto regardless of the value of β further rejects the k-niche model (28). In addition, the value of β is then a useful ecological and evolutionary indicator of whether Earth has more rare species (β < −1; the slope of the function is steeper) or fewer (β > −1; the slope of the function is flatter) rare species than expected under zero-sum neutral evolutionary dynamics (28, 30).

In contrast to the predictions from the UNTB and k-niche model, the central limit theorem (CLT) predicts that gSADs will be characterized by a lognormal distribution. If the abundance of a species is the result of several multiplicative processes acting together (31), then lognormal distributions are expected. Because of the CLT, a lognormal distribution is expected any time many variables interact multiplicatively to influence abundance, such as many differing biotic and abiotic factors [see references in (32)]. Common processes in ecology and evolution are known to interact multiplicatively to influence species abundance (see Supplementary Document) (32). One context in which random variables are multiplied (yielding a lognormal) is consecutive annual population growth rates, although the applicability of this across species (i.e., to generate SADs) is controversial (33), relying on subtle philosophical interpretations of exchangeability. Some authors such as May (34) and MacArthur (35) say it can, while others such as Pielou (36) (see page 48) say it cannot produce a lognormal. This debate, however, is a red herring because many other biological processes in ecology and evolution also interact multiplicatively and can influence variation in interspecific abundance. For example processes that lead to niche partitioning, stochastic density-dependent differential equation models (37), models of rates of fixation of favorable alleles (35), or hurdle models (15) can generate lognormal SADs. Note that in the case of discrete abundances sampled from a continuous lognormal, we have a Poisson lognormal (38).

Next, we fit several additional models and statistical distributions that have been proposed to describe the distribution of commonness and rarity [see (39, 40) and the Supplementary Materials]. Using maximum likelihood estimations (MLEs), we fit each distribution to three ways to assess empirical gSAD: (i) for all of the species observation records within the BIEN database, (ii) for all species recorded only from ecological plots, and (iii) for all specimens found within 100 km around each ecological plot. Comparing the goodness of fit of various models for each of these gSADs allows us to compare potential sampling biases in botanical data.

The best model varied with which measure of goodness of fit was used, as well as with the dataset used (Tables 1 and 2 and tables S1 and S2). However, in general, the truncated Pareto (i.e., a modified Pareto distribution that adds an additional parameter to allow the right tail to drop down because of finite sample size (28)] and the Poisson lognormal (41) both fit well. These models have strong skew on a log scale, indicative of many rare species. All three models (at the estimated parameter values) show the mode at species with one individual. The log series, while also showing a mode at one individual in a log plot, markedly underestimates the number of extremely rare species, and the remaining models fit the distribution even more poorly and have an interior mode, incorrectly predicting that the most common abundances will be intermediate.

Table 1 Three different measures of goodness of fit (r2 or percentage of variance explained in the cumulative distribution function, χ2 on log2 bins, and Akaike’s information criterion) are shown for six different species abundance models [see (40)].

All distributions shown have two parameters except the log-series and power distributions, which have one. Distributions were fitted for the number of observations per species across all species found (i) within ecological plots only and (ii) across all datasets within the BIEN database. Sampling species found only in plots standardizes for sampling influences, as all individuals within ecological plots are sampled and identified to species. Thus, the species abundance distribution from ecological plots is expected to more accurately describe the species abundance distribution. As predicted by the CLT, the Poisson lognormal distribution provides the best fit to both gSADs. Nonetheless, Pareto and truncated Pareto also all fit well. The log-series distribution, predicted by the k-niche model and neutral theory, falls behind these distributions across the different goodness-of-fit measures. AIC, Akaike’s information criterion; CDF, cumulative distribution function.

View this table:
Table 2 Parameter fits for each of the fitted statistical distributions.

The estimated slope values, β, of the gSAD are given in bold by fits of the Pareto and Truncated Pareto distributions. Note that the estimated slope values differ from −1.0 expected from the unified neutral theory of biodiversity. Instead, the observed fitted slope, β, is steeper than expected from neutral theory (with fitted exponents more negative than −1.0). The steeper exponents indicate that of all of the observed plant species on Earth, proportionally more of them are rare and that there are more rare species than expected by demographic and evolutionary neutral processes. Thus, the processes creating and maintaining rare species on Earth generate proportionally more rare species.

View this table:

The UNTB predicts that log-series distribution will be approximated by the fit of the Pareto power distribution, with an exponent, β = −1.0 (28). However, our fit of the log-series distribution shows that it was not the best fit and the fitted scaling exponent is steeper than −1.0 (β MLE = −1.41 for all of the BIEN observations and β MLE = −1.43 for the observations from ecological plot data; Fig. 2). Thus, a Pareto power distribution needs an exponent less than −1 to generate the number of rare species actually observed.

Together, these results underscore that, at continental to global scales, only a few species abundance distribution models are capable of producing sufficient numbers of rare species to match the observed data and that neutral dynamics under the UNTB is not one of them. The observed value of β for embryophytes is similar to what has been reported for an extensive dataset for other taxa including animals and marine phytoplankton (28), suggesting that the shape of the SAD at increasingly larger spatial scales may converge to a similar distribution across disparate taxa. In sum, the Poisson lognormal is best fit, the Pareto exponent is markedly steeper than −1.0, and the Pareto distribution is the second best fit on two of the three metrics.

Assessment of sampling or taxonomic bias

Next, an obvious question is whether the observed number of rare species is the result of sampling or taxonomic bias. Data from herbarium records are known to exhibit biases in collection and sampling (17, 18). However, do these biases influence our identification of whether a species is rare or not?

We conducted two tests: First, in Fig. 2, we compared the distributions of global abundance in (i) the total BIEN database (including plot surveys and herbarium records), (ii) only the plot datasets, and (iii) the subset of herbarium records that reflect the same geographic distribution as the survey data (e.g., all records within 100 km of any plot location) (Fig. 2). Ecological plots and surveys, in contrast to herbarium data, contain less sampling bias, as a robust effort is made to ensure all individuals within the sampling design are surveyed within a given area. In many cases, repeated visits ensure accurate identification to species. Thus, assessing whether the gSAD from plot data is different from the gSAD from all botanical observations based on sampling herbarium data at the globe or around plots enables us to assess potential bias and sampling effectiveness. As discussed below, both empirical gSADs are described by similar statistical distributions (e.g., the shape of gSADs in Fig. 2B are similar to each other and to Fig. 2A; Tables 1 and 2 and tables S1 and S2), indicating that sampling issues do not greatly influence conclusions regarding gSADs.

Next, to further assess whether rare species are truly rare or artifactual, we randomly sampled 300 rare species with three observations or fewer from the Americas. The Americas were chosen because our taxonomic expertise was focused on these two continents. For each species selected, we consulted taxonomic experts at the Missouri Botanical Garden and the New York Botanical Garden to sort each species into several classifications (Fig. 3 and see the Supplementary Materials). Taxonomic experts largely confirm that the majority of rare species identified by BIEN are rare, with only 7.3% that were clearly erroneous and recognized as abundant or large-ranged species. We conclude that our results are not driven by taxonomic and sampling biases.

Our results from Fig. 3 allow us to estimate the total number of native land plant species currently observed across the globe with estimates for taxonomic uncertainty. After correcting and standardizing data, we estimate that the total number of extant embryophyte (land plant) species on Earth is between ~358,000 and ~435,000. The lower limit stems from subtracting 17.8% from the total [10.3% from the remaining presence of naturalized non-native species +7.5% caused by the inflation of names due to “old names” (basionyms) not yet corrected for by taxonomic data cleaning; see Fig. 3]. Our estimates are consistent with previous estimates of the total number of embryophytes in the world of approximately 391,000 [see (42)] or 403,911 (43) (see the Supplementary Materials). However, now, we can quantify that ~36% of these species are highly rare with very little distributional information for each species. In sum, our results from Fig. 2 show that rarity is commonplace across the land plants. Little botanical information exists across the world’s herbaria and ecological collections for between 11.2 and 13.6% (species with one observation) or between 30.0 and 36.5% (species with fewer than or equal to five total observations) of all vascular plant species.

“Hotspots” of rare species

To identify the regions that harbor hotspots of rare species, in Fig. 4, we mapped the locations of rare species across the world. We controlled for variation in sampling effort by calculating both the Menhinick and Margalef indices (see Materials and Methods). Plotting the sampling-corrected number of rare species reveals several patterns. Rare species cluster in the Americas in (i) mountainous regions (particularly along the thin strip along the western flank of the Andean Mountains, Central America, and the southern Sierra Madre of Mexico), (ii) the Guiana shield in northern South America, and (iii) relatively small climatic regions that are strongly distinct from surrounding areas (the Atlantic Forest or Mata Atlântica in Brazil, the southern region of the California Floristic Province, and the Caribbean); in Africa in (iv) the Cape Floristic Region of South Africa, (v) mountainous regions of Madagascar, (vi) the coastal mountains of Cameroon, and (vii) the Ethiopian highlands and the Somali peninsula; and in Asia in (viii) southwestern China and the border regions of Myanmar, Laos, and Thailand, (ix) Malaysia, (x) New Guinea, and (xi) the mountainous strip from Iran through Turkey. In Europe, there are several regions of notably high diversity of rarity in and around (xii) the Mediterranean, including the Pyrenees and Caucasus.

Fig. 4 Where are rare species distributed geographically?

Plotting the geographic coordinates for all the observations for species with three observations or fewer at a coarse, 1° resolution reveals several patterns. The sampling background is shown (grey cells are areas with no georeferenced botanical sampling records, while yellow cells indicate regions with observation records but no rare species). Colored cells correspond to areas with rare species (species with three observations or fewer) rarified to the sampling intensity using the Margalef index (see the Supplementary Materials). Areas with a proportionally high number of rare species are dark brown (“hotspots of rarity”), while areas with relatively low numbers of rare species are yellow to orange. Areas with a high number of rare species tend to be clustered in a small number of locations including mountainous tropical and subtropical regions including New Guinea, Indonesia, southwestern China, Madagascar, the Andes (in Ecuador, Columbia, and Peru), Central America (Costa Rica and Panama), and southern Mexico. In addition, several notable temperate zone locations including the Fynbos in South Africa and southwest Australia, Northern Iran/Georgia/Turkey, and the Iberian Peninsula.

There is a relative dearth of rare species throughout the Amazon basin, confirming past claims that the Amazon flora consists largely of widespread and relatively abundant species (44). The areas identified by our methods show some overlap with areas independently identified as biodiversity hotspots (45) (e.g., Mesoamerican highlands, the Andes, Southeast Asia, and New Guinea) but differ in other areas.

Drivers of the spatial distribution of rarity

To assess the drivers of the spatial distribution of rarity, we conducted ordinary least squares (OLS) linear regression and simultaneously autoregressive (SAR) models to analyze the relationship between rarity index and environmental variables, including present climate, glacial-interglacial climatic velocity or instability of climate, and topography. Our OLS models showed that all the variables (annual mean temperature, annual precipitation, temperature seasonality, precipitation seasonality, temperature velocity, precipitation velocity, elevation, and heterogeneity of elevation) have significant relationships with both the Menhinick rarity index (tables S3 to S5 and fig. S2) and the Margalef rarity index (tables S6 to S8 and fig. S3), with the largest effects from temperature velocity and heterogeneity of elevation. In comparing the group models [present climate (annual mean temperature, annual precipitation, temperature seasonality, and precipitation seasonality), stability of climate (temperature and precipitation velocity), and topography (elevation and heterogeneity of elevation)], the model with instability of climate tended to outperform models with current climate and topography, while the full model showed the lowest Akaike’s information criterion (AIC). The exhaustively selected model did not include elevation as a predictor, although it had minor differences in model performance compared with the full model.

A Moran’s I test showed high spatial autocorrelation in the residuals of the OLS models, while we found no significant spatial autocorrelation in the residuals of the SAR models (tables S3 to S8). The coefficients of the SAR models were generally similar to those from OLS models, with the exceptions that signs of annual mean temperature, precipitation seasonality, and precipitation velocity switched from positive to negative in the SAR model. Temperature velocity remains the largest negative effect, and heterogeneity of elevation remains the largest positive effect in the SAR models (see figs. S2 and S3 and tables S3 to S8). Models incorporating climate stability and topography outperformed the model with current climate, while the full model remains the best-performing SAR model. The modeling results based on Menhinick and Margalef rarity index showed comparable results (tables S3 to S8 and figs. S2 and S3).

To summarize, areas that contain a higher number of rare species have had a more stable climate. The best predictor of plant rarity is the historical temperature velocity. Climate velocity describes climate instability with ecologically relevant units (distance/time; see discussion in Supplementary Document). In addition, mountainous area, as measured by the SD of elevation, is also a predictor with positive effect (tables S3 to S8 and figs. S2 and S3). Adding short-term annual variation (annual seasonality) in temperature and precipitation and mountainous conditions in addition to climate velocity does improve the explanation of the current spatial distribution of rarity (e.g., the proportion of variation explained, R2, increased to 0.193 for the OLS model and to 0.518 for the SAR model of Menhinick rarity index but less so for Margalef rarity index, 0.176 for the OLS model and 0.263 for the SAR model; tables S3 to S8). Together, these results are consistent with previous results [see (46, 47) and references therein], indicating that increased rates of climate change velocity negatively affect the retention of rare species, presumably because of increased rates of extinction during times of rapid climate change.

The overlaps between future climate velocity and human footprint and rarity indices

Our environment is facing rapid human changes at the global scale, so we quantified the intensity of human impact on the area with rare species (48). Regions with rare species are currently characterized by higher human impact and will experience faster rates of future climate change under representative concentration pathway 8.5 (RCP8.5) (Fig. 5). Areas with rare species have human footprint values of 8.5 ± 5.8, which is ~1.6 times higher (P < 0.001, Wilcoxon test) than that of the globe on average (5.2 ± 5.8). Furthermore, on average, areas with rare species are predicted to experience ~200 (±58) times greater rates of temperature velocity than those same areas experienced historically in terms of the overall glacial-interglacial climate shift across the past 21,000 years [from the last glacial maximum (LGM) to the present]. The ratio between future temperature velocity and this long-term overall historical temperature velocity is ~1.2 times greater (P < 0.001, Wilcoxon test) for areas with rare species than the globe will experience on average (170 ± 77) (Fig. 5). This is because areas with concentrations of rare species have previously been characterized by relatively more stable climates, but under the predicted climate change under RCP8.5, they will now experience velocities as high as the rest of the globe (see fig. S5).

Fig. 5 Regions that currently have high numbers of rare species are also characterized by higher human impact and will experience faster rates of future climate change.

(A) Density plot of human footprint index in areas with rare species (light gray) and the global map (dark gray). Areas with rare species have, on average, human footprint values of 8.5 ± 5.8, which is ~1.6 times higher (P < 0.001, Wilcoxon test) human impact than on the globe on average (5.2 ± 5.8). (B) Density plot of the ratio of future climate (temperature) velocity versus historical climate velocity. On average, areas with rare species will experience ~200 (±58) times greater rates of temperature velocity than those same areas experienced historically and will experience ~1.2 times greater (P < 0.001, Wilcoxon test) rates of temperature velocity change than the globe will experience on average (170 ± 77). (C) Global variation in the human footprint index. Areas with high human footprint are in brown. Areas with low human footprint are dark green. (D) Global map of the ratio between future (baseline climate to late century, 1960–1990 to 2060–2080, under RCP8.5) and historical rates of temperature change [LGM to baseline climate (~21 ka ago to 1960–1990)]. Future temperatures will increase across the globe. However, in comparison with historical rates of climate change, some areas will experience relatively faster (ratio values greater than 1; yellow to red values) or slower (ratio values less than 1; green to blue values) rates of change. Note that many of the regions of rarity hotspots are found in regions that will be experiencing relatively faster rates of climate change compared to historical rates of change.

Predicted changes of rarity indices

With the previously calibrated OLS and SAR full models, we made predictions of rarity indices under future projected climate. These showed worldwide decreases in rarity indices (Fig. 6), with the southern Andes and Southeast Asia predicted to experience the largest decreases. These decreases were likely due to the accelerated future climate velocities under RCP8.5, which are two orders of magnitude higher than those experienced from LGM [~21 thousand years (ka) ago] to the present day (see fig. S5). Note, however, that future velocities are estimated over a shorter time frame, which will tend to produce higher estimates.

Fig. 6 What will happen to rare species diversity with climate change?

(A) The predicted change in Margalef SAR rarity index under climate change from the autoregressive models (SAR). The rarity indices are log-transformed. Large decreases in climate suitability for rare species are in red to orange, whereas smaller reductions in climate suitability are given in green to blue colors. Note the large decreases in climate suitability for rare species in the Andes and Mesoamerica, African highlands, New Guinea, southwestern China, Indonesia, Nepal, and New Zealand. (B) The diagonal 1:1 line (red) represents situations of no difference between the predicted current and future rarity index from SAR and OLS models. All points in the scatter plot are below the diagonal line, indicating a reduction of rare species diversity across all the areas where they currently occur.


Our dataset represents the most comprehensive assembly of global plant diversity data to date, comprising both plots and herbarium specimens, from far more sources than previously available. Large quantities of primary biodiversity data have still not been mobilized, and those data that are available are subject to various forms of collection bias (17, 18). Thus, it is possible that the patterns we observe may change with additional data. However, comparison between plot and all observation gSADs (Fig. 2 and Table 1) indicates that both sampling methods yield similar results. Furthermore, the notable efforts we made in data cleaning and curation assure that our analyses represent, by far, the best window yet into global commonness and rarity in plants.

Our results indicate that hotspots of plant biodiversity largely reflect the accumulation of very rare species. Assessing the predictions of the Unified Theory of Neutral Biogeography [UTNB; (25)] for the distribution of commonness and rarity across species enables us to reveal likely drivers of rarity. The UTNB assumes that species overlap in their niches and are equivalent in their rates of speciation, extinction, and dispersal (25). It implies that biodiversity arises at random, as the abundance of each species follows a random walk so that the distribution of abundances across species is given by a dynamic equilibrium of speciation and extinction. Our results show that β ≈ −1.4, indicating that the proportion of plant species that are rare is higher than expected from neutral processes. Given that rare species are orders of magnitude more likely to go extinct (4, 5) than more abundant species, this begs the question: Why do we observe a larger proportion of observed rare species than expected from neutral theory?

Our analyses (tables S1 to S8 and figs. S1 to S5) suggest two primary reasons. First, current hotspots of rare species (Fig. 4) likely reflect areas with lowered risk of historical extinction. Rare species are often found in geographic localities that have had more stable climates that have likely lowered the probability of extinction [see (4, 5)]. Models that include relative climate stability better explain both the locations of hotspots of rarity and the shape of rarity distributions. The finding that climate (in)stability is important in non-neutral models has important real-world implications for ecology and conservation.

Second, rare species are spatially clumped in ways that support mechanisms for generating and maintaining rare species articulated by early theorists, who proposed roles for mountains and climate stability in influencing both rates of speciation and dispersal. In 1964, Simpson (49) hypothesized that “Small population ranges and numerous barriers against the spread and sympatry of related populations would therefore tend to increase density of species.” Janzen’s 1967 (50) “Why mountain passes are higher in the tropics” extended Simpson’s hypothesis to predict that mountainous regions in the tropics will harbor proportionally more rare species than temperate mountains or even topographically uniform tropical regions due to less variability in climate. Our findings of disproportionate numbers of rare species in mainly tropical mountains and more isolated regions support these ideas. More recent studies have also documented the importance of tropical mountains as harbors of biodiversity (51, 52), which supports our findings.

Our results have important implications for conservation in the face of climate change and other human impacts. If ~36% of species are rare and threatened (Figs. 5 and 6), then ~158,000 plant species are at risk of extinction. Although not all primary biodiversity data have been digitized, it is still remarkable that ~36% of all plant species known are only documented a very small number of times. In addition, our analyses show that rapid rates of current human impact and projected future climate change appear to disproportionately affect regions that harbor most of these rare species (Fig. 5), whereas the rare species likely have been in relatively more stable climates through their evolutionary history.

Ultimately, rare species, by definition, are more prone to reductions in population size and extinction and should be high priorities for conservation (4, 5) . Our results suggest that redoubling global efforts to conserve rare species is needed and that we have a closing window to do so. The tools to ensure that these rare species are maintained are area-based conservation and solutions to climate change (53). The Convention on Biological Diversity should recognize these areas as critical to conserving all life on Earth and important focal areas for expansion of conserved areas after 2020 (54). The climate convention seeks to avoid extinctions due to the exceedance of species’ natural ability to adapt to climate change, making these areas with high numbers of rare species and very high future-to-historic velocities of climate change yet another reason the world should move quickly to curb greenhouse gas emissions (55). Joint climate and biodiversity efforts should be made to ensure that these numerous but little-known species, living in unusual climatic circumstances, persist into the future.


Competing different hypothesized gSADs

As we described in the Supplementary Materials, we fit several additional hypothesized univariate distributions to the gSAD using the following proposed biological and statistical distributions. Most theories produced SADs that were so similar to each other that it was difficult to distinguish them given the noisy data and the fact that the differences were most pronounced in the tails, which were, by definition, infrequently observed (40). In Table 1, we provided several different goodness-of-fit measures. Each emphasizes different aspects of fit (chi-square on log-binned data emphasizes the fit of each statistical distribution to rare species, calculating an r2 on the predicted versus empirical cumulative distribution function); cumulative distribution function [describes the probability that a random variable, X, drawn from f(x) is ≤x] emphasizes the abundances with the most species (usually intermediate abundances), while likelihood emphasizes avoidance of extreme outliers. As previously noted, it is common for different measures of fit to select different SAD theories as providing the best fit to a single dataset (32). As a result, any claim of a superior fit must be robust by being superior on multiple measures.

Rarity indices

Because the sampling intensity for plants across the globe is not uniform, we assessed the rarified species diversity. For each 1° grid cell, we calculated the total number of observations or samples, N, as well as the total number of observed rare species, S; for mapping rarity across the globe, we focused on the rarist species - those species having three observation records or fewer. We calculated two separate rarified diversity measures for each 1° grid cell:

1) Margalef diversity (SMargalef), which assumes that species richness increases with sampling intensity N and, in particular, increases nonlinearly and approximately logarithmically with N.SMargalef=(S1)/ln N(3)

2) Menhinick diversity (SMenhinick). In a similar vein, the Menhinick diversity measure assumes that species richness also increases nonlinearly with sampling intensity, N, but according to a square root functionSMenhinick=S/N(4)

As the Menhenick index assumes a square root rarefaction function and the Margalef assumes a logarithmic rarefaction function, they represent both a more liberal and more conservative estimate of higher estimates of richness, respectively. Comparing both measures of SMargalef and SMenhinick revealed similar spatial maps, indicating that both measures result in identical conclusions.

Methods for regression models

As described in the Supplementary Materials, we conducted OLS linear regression models to analyze the relationship between environmental variables and rarity index. We included three groups of environmental variables that portray present climate (annual mean temperature, annual precipitation, temperature seasonality, and precipitation seasonality), stability of climate (temperature velocity and precipitation velocity), and topology (elevation and heterogeneity of elevation). We also calculated the SD of elevations within each one by 1° window and considered this as the heterogeneity of elevation. We performed log transformation of rarity index, temperature and precipitation velocity, elevation, and heterogeneity of elevation to get normally distributed residuals in the regression models. We standardized all variables to zero mean and 1 SD to make the regression coefficients comparable. With 4571 records (for Menhinick index, or 2940 for Margalef index) we conducted OLS linear regression models to explore the bivariate relationship between rarity index and each environmental variable.

We also constructed multiple regression models using each group of variables (present climate, stability of climate, and topology) and using all variables (full model). We conducted multiple regression models through exhaustive model selection based on AIC values using all environmental predictors. Last, to account for spatial autocorrelation in climate data, we performed Moran’s I test and SAR models for all the OLS models mentioned above.

Climate change and future predicted changes in rarity indices

With the previously calibrated full models (OLS and SAR models), we made predictions of rarity indices under future projected climate. We used the full models, as they outperformed individual models or subgroup models and had comparable performances with the exhaustively selected model. We obtained future climatic variables from WorldClim ( (56). We used the future climate in 2070 constructed by the Community Climate System Model (CCSM4) under RCP8.5 scenario, which has comparatively high greenhouse gas emissions (57). To match the resolution of the rarity map, we sampled the environmental variables (annual temperature, annual precipitation, temperature seasonality, and precipitation seasonality) to 1° cells. We further calculated the temperature and precipitation velocity between present and future following (46). The two topological variables (elevation and heterogeneity of elevation) were kept the same as the present. After making the predictions, we compared the differences between predicted rarity indices under present and future climate.

Rarity and climate velocity

Using data sources and methods described above in regression model methods, we derived velocity of temperature change and velocity of precipitation change over the following periods: LGM to baseline climate (~21 ka ago to 1960–1990) and baseline climate to late century (1960–1990 to 2060–2080) ( under RCP8.5 (see Supplementary Document). Velocity was calculated using the neighborhood statistic approach originally described by Sandel et al. (46); see also (58).We note that our calculation of velocity of historical climate change and future climate change must be interpreted with caution, as they were calculated over different time intervals (46). We compared velocity values at locations where (i) there are rare species observations and (ii) there are no rare species observations and to (iii) background sampled locations. This comparison was conducted for both historical change since LGM and projected future change.

Rarity and the human footprint

We downloaded global human footprint data (48) and resampled to the resolution of the rarity map. We extracted the values of human footprint where rare species exist (i.e., 1° by 1° spatial windows where one or more rare species are observed) and compared the mean of those values with that of the global human footprint map using the Wilcoxon test.


Supplementary material for this article is available at

Supplementary Document

Table S1. As in Table 1 but for specimen data found within 1° proximity to each plot.

Table S2. As with Table 2 but for specimens near plots.

Table S3. Summary statics of OLS linear regression models and SAR models for predicting the Menhinick rarity index.

Table S4. Summary statics of OLS linear regression models for predicting the Menhinick rarity index.

Table S5. Summary statics of SAR models for predicting the Menhinick rarity index.

Table S6. Summary statics of OLS linear regression models and SAR models for predicting the Margalef rarity index.

Table S7. Summary statics of OLS linear regression models for predicting the Margalef rarity index.

Table S8. Summary statics of SAR models for predicting the Margalef rarity index.

Fig. S1. Sampling density for different data types in BIEN.

Fig. S2. Scatter plots showing the relationships between bivariate relationship between Menhinick rarity index and environmental variables.

Fig. S3. Scatter plots showing the relationships between bivariate relationship between Margalef rarity index and environmental variables.

Fig. S4. Predicted changes of Margalef rarity index using either the OLS or the SAR models.

Fig. S5. Historical and future global temperature velocities.

References (59108)

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.


Acknowledgments: This work was conducted as a part of the BIEN Working Group, 2008–2012. We thank all the data contributors and numerous herbaria who have contributed their data to various data compiling organizations (see the Supplementary Materials) for the invaluable data and support provided to BIEN. We thank the New York Botanical Garden; Missouri Botanical Garden; Utrecht Herbarium; the UNC Herbarium; and GBIF, REMIB, and SpeciesLink. The staff at CyVerse provided critical computational assistance. We thank the more than 50 scientists who participated in our various BIEN working group and subgroup meetings since 2008 including B. Blonder, K. Engemann, E. Fegraus, J. Cavender-Bares, B. Dobrin, K. Gendler, R. Jorgensen, G. Lopez-Gonzalez, L. Zhenyuan, S. McKay, O. Phillips, J. Pickering, N. Swenson, C. Vriesendorp, and K. Woods, who participated in a working group meeting, and D. Ackerly, E. Garnier, R. Guralnick, W. Jetz, J. Macklin, N. Matasci, S. Ramteke, and A. Zanne who participated in subgroup meetings. We also acknowledge the critical support of the University of Arizona high-performance computing resources via the Research Data Center as well as iPlant and CyVerse support from R. Jorgensen, S. Goff, N. Matasci, N. Merchant, M. Narrow, and R. Walls. Furthermore, the long-term vision, encouragement, and computational support of F. Davis, S. Hampton, M. Jones, N. Outin, and the ever-helpful staff at NCEAS were critical for the completion of this first stage of the BIEN working group. Special thanks to K. Koenig for cartographic support. We acknowledge the herbaria that contributed data to this work: A, AAH, AAS, AAU, ABH, ACAD, ACOR, AD, AFS, AK, AKPM, ALCB, ALTA, ALU, AMD, AMES, AMNH, AMO, ANGU, ANSM, ANSP, AQP, ARAN, ARIZ, AS, ASDM, ASU, AUT, AV, AWH, B, BA, BAA, BAB, BABY, BACP, BAF, BAFC, BAI, BAJ, BAL, BARC, BAS, BBB, BBS, BC, BCMEX, BCN, BCRU, BEREA, BESA, BG, BH, BHCB, BIO, BISH, BLA, BM, BOCH, BOL, BOLV, BONN, BOON, BOTU, BOUM, BPI, BR, BREM, BRI, BRIT, BRLU, BRM, BSB, BUT, C, CALI, CAN, CANB, CANU, CAS, CATA, CATIE, CAY, CBM, CDA, CDBI, CEN, CEPEC, CESJ, CGE, CGMS, CHAM, CHAPA, CHAS, CHR, CHSC, CIB, CICY, CIIDIR, CIMI, CINC, CLEMS, CLF, CMM, CMMEX, CNPO, CNS, COA, COAH, COCA, CODAGEM, COFC, COL, COLO, CONC, CORD, CP, CPAP, CPUN, CR, CRAI, CRP, CS, CSU, CSUSB, CTES, CTESN, CU, CUVC, CUZ, CVRD, DAO, DAV, DBG, DBN, DES, DLF, DNA, DPU, DR, DS, DSM, DUKE, DUSS, E, EA, EAC, EAN, EBUM, ECON, EIF, EIU, EMMA, ENCB, ER, ERA, ESA, ETH, F, FAA, FAU, FAUC, FB, FCME, FCO, FCQ, FEN, FHO, FI, FLAS, FLOR, FM, FR, FRU, FSU, FTG, FUEL, FULD, FURB, G, GAT, GB, GDA, GENT, GES, GH, GI, GLM, GMDRC, GMNHJ, GOET, GRA, GUA, GZU, H, HA, HAC, HAL, HAM, HAMAB, HAO, HAS, HASU, HB, HBG, HBR, HCIB, HEID, HGM, HIB, HIP, HNT, HO, HPL, HRCB, HRP, HSC, HSS, HU, HUA, HUAA, HUAL, HUAZ, HUCP, HUEFS, HUEM, HUFU, HUJ, HUSA, HUT, HXBH, HYO, IAA, IAC, IAN, IB, IBGE, IBK, IBSC, IBUG, ICEL, ICESI, ICN, IEA, IEB, ILL, ILLS, IMSSM, INB, INEGI, INIF, INM, INPA, IPA, IPRN, IRVC, ISC, ISKW, ISL, ISTC, ISU, IZAC, IZTA, JACA, JBAG, JBGP, JCT, JE, JEPS, JOTR, JROH, JUA, JYV, K, KIEL, KMN, KMNH, KOELN, KOR, KPM, KSC, KSTC, KSU, KTU, KU, KUN, KYO, L, LA, LAGU, LBG, LD, LE, LEB, LIL, LINC, LINN, LISE, LISI, LISU, LL, LMS, LOJA, LOMA, LP, LPAG, LPB, LPD, LPS, LSU, LSUM, LTB, LTR, LW, LYJB, LZ, M, MA, MACF, MAF, MAK, MARS, MARY, MASS, MB, MBK, MBM, MBML, MCNS, MEL, MELU, MEN, MERL, MEXU, MFA, MFU, MG, MGC, MICH, MIL, MIN, MISSA, MJG, MMMN, MNHM, MNHN, MO, MOL, MOR, MPN, MPU, MPUC, MSB, MSC, MSUN, MT, MTMG, MU, MUB, MUR, MVFA, MVFQ, MVJB, MVM, MW, MY, N, NA, NAC, NAS, NCU, NE, NH, NHM, NHMC, NHT, NLH, NM, NMB, NMNL, NMR, NMSU, NSPM, NSW, NT, NU, NUM, NY, NZFRI, O, OBI, ODU, OS, OSA, OSC, OSH, OULU, OWU, OXF, P, PACA, PAMP, PAR, PASA, PDD, PE, PEL, PERTH, PEUFR, PFC, PGM, PH, PKDC, PLAT, PMA, POM, PORT, PR, PRC, PRE, PSU, PY, QCA, QCNE, QFA, QM, QRS, QUE, R, RAS, RB, RBR, REG, RELC, RFA, RIOC, RM, RNG, RSA, RYU, S, SACT, SALA, SAM, SAN, SANT, SAPS, SASK, SAV, SBBG, SBT, SCFS, SD, SDSU, SEL, SEV, SF, SFV, SGO, SI, SIU, SJRP, SJSU, SLPM, SMDB, SMF, SNM, SOM, SP, SPF, SPSF, SQF, SRFA, STL, STU, SUU, SVG, TAES, TAI, TAIF, TALL, TAM, TAMU, TAN, TASH, TEF, TENN, TEPB, TEX, TFC, TI, TKPM, TNS, TO, TOYA, TRA, TRH, TROM, TRT, TRTE, TU, TUB, U, UADY, UAM, UAMIZ, UB, UBC, UC, UCMM, UCR, UCS, UCSB, UCSC, UEC, UESC, UFG, UFMA, UFMT, UFP, UFRJ, UFRN, UFS, UGDA, UH, UI, UJAT, ULM, ULS, UME, UMO, UNA, UNB, UNCC, UNEX, UNITEC, UNL, UNM, UNR, UNSL, UPCB, UPEI, UPNA, UPS, US, USAS, USF, USJ, USM, USNC, USP, USZ, UT, UTC, UTEP, UU, UVIC, UWO, V, VAL, VALD, VDB, VEN, VIT, VMSL, VT, W, WAG, WAT, WELT, WFU, WII, WIN, WIS, WMNH, WOLL, WS, WTU, WU, XAL, YAMA, Z, ZMT, ZSS, and ZT. Funding: The BIEN working group was supported by the National Center for Ecological Analysis and Synthesis, a center funded by NSF EF-0553768 at the University of California, Santa Barbara and the State of California. Additional support for the BIEN working group was provided by iPlant/CyVerse via NSF DBI-0735191. B.J.E., B.J.M., and C.M. were supported by NSF ABI-1565118. B.J.E. and C.M. were supported by NSF HDR-1934790. B.J.E., L.H., R.T.C., G.M., W.F., and P.R.R. were supported by the Global Environment Facility SPARC project grant (GEF-5810). N.M.-H. was supported by the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement no. 746334 and acknowledges the Danish National Research Foundation for support to the Center for Macroecology, Evolution and Climate (grant no. DNRF96). C.V. was supported by a Marie Curie International Outgoing Fellowship within the 7th European Community Framework Program (DiversiTraits project no. 221060) and by the European Research Council (ERC) Starting Grant Project (grant ERC-StG-2014-639706-CONSTRAINTS). B.J.E., B.M., N.J.B.K., C.V., and B.J.M. acknowledge the FREE group funded by the synthesis center CESAB of the French Foundation for Research on Biodiversity (FRB) and EDF. J.-C.S. and B.J.E. acknowledge support from the Center for Informatics Research on Complexity in Ecology (CIRCE), funded by the Aarhus University Research Foundation under the AU Ideas program. J.-C.S. also considers this work a contribution to his VILLUM Investigator project “Biodiversity dynamics in a changing world” funded by VILLUM FONDEN (grant 16549). X.F., D.S.P., and E.A.N. were supported by the University of Arizona Bridging Biodiversity and Conservation Science program. C.M. acknowledges funding from NSF Grant DBI-1913673. S.K.W. acknowledges funding from the Strategic Science Investment Fund to Crown Research Institutes from the New Zealand Ministry of Business, Innovation and Employment. I.Š. was supported by the Charles University (UNCE 204069). T.L.P.C., G.D., and J.J.W. acknowledge the French FRB and the Provence-Alpes-Côte d’Azur région (PACA) through the Centre for Synthesis and Analysis of Biodiversity (CESAB) data program, as part of the RAINBIO research project. Author contributions: B.J.E. and B.J.M. designed the study. B.B., X.F., B.M., and D.S.P. integrated and cleaned data. P.M.J., B.M.T., T.L.P.C., G.D., D.M.N., A.T.O.-F., R.K.P., J.M.S.-D., J.J.W., W.F., and S.K.W., contributed data. B.J.E., B.J.M., X.F., B.M., B.B., and P.R.R. performed the analyses. B.J.E., B.J.M., J.M.S.-D., L.H., P.M.J., B.M.T., X.F., B.B., B.S., E.A.N., P.M.J., P.R.R., B.M.T., J.R.B., R.T.C., T.L.P.C., J.C.D., J.C.L., P.A.M., C.M., G.M., N.M.-H., N.J.B.K., D.S.P., R.K.P., M.P., J.M.S.-D., B.S., M.S., I.S. C.V., S.K.W., and J.-C.S. helped interpret and analyze results. All authors helped collect and assemble data. B.J.E. wrote the first draft of the manuscript, and all authors contributed to revisions. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. All data and code used in this study are available via GitHub, The MATLAB code used for fitting gSADs are available from B.J.M. upon request. The public version of the BIEN database is accessible via the BIEN R package, Additional data related to this paper may be requested from the authors.
View Abstract

Navigate This Article