## Abstract

The fossil record of the origins of major groups such as animals and birds has generated considerable controversy, especially when it conflicts with timings based on molecular clock estimates. Here, we model the diversity of “stem” (basal) and “crown” (modern) members of groups using a “birth-death model,” the results of which qualitatively match many large-scale patterns seen in the fossil record. Typically, the stem group diversifies rapidly until the crown group emerges, at which point its diversity collapses, followed shortly by its extinction. Mass extinctions can disturb this pattern and create long stem groups such as the dinosaurs. Crown groups are unlikely to emerge either cryptically or just before mass extinctions, in contradiction to popular hypotheses such as the “phylogenetic fuse”. The patterns revealed provide an essential context for framing ecological and evolutionary explanations for how major groups originate, and strengthen our confidence in the reliability of the fossil record.

## INTRODUCTION

The fossil record shows many notable patterns (*1*) that have been addressed by hypotheses such as mass extinctions (*2*), diversity-dependent diversification (DDD) (*3*), and competitive displacement (*4*, *5*). In particular, the timing and nature of the emergence of modern groups such as animals (*6*), birds (*7*, *8*), and flowering plants (*9*) have been of intense interest: Molecular clock estimates for these groups have often been at odds with estimates based solely on the fossil record (*9*–*13*). Understanding this discrepancy has, however, been hampered by the lack of a model of how both basal (“stem group”) and modern (“crown group”) members (*6*, *14*) of a particular group diversify and go extinct in the fossil record. Quantification of the fossil record, rooted especially in the work of Raup and others (*15*, *16*), allowed explicit models of diversification to be tested, although problems remained about estimation of speciation and extinction rates from the fossil record (*17*–*19*). This modeling also failed to take into account the important survivorship and selection biases of the “push of the past” and large clade effects (*20*) or the systematic differences that exist between extinct and extant taxa (*6*, *14*). Any clade can theoretically be divided into two components: the last common ancestor of all the living forms and all of its descendants (the crown group) and the extinct organisms more closely related to a particular crown group than to any other living group (the stem group): Together, they make up the “total group” (Fig. 1A). Here, we also define the provisional crown group (pCG): the crown group as it would have appeared in the past (e.g., to a Silurian paleontologist). As basal members of the pCG go extinct through time, the node defining it moves upward to subtend the next pair of (then) living sister groups, defining a new pCG. 
When the last member of the stem group goes extinct, the node defining the crown group will be fixed until the present. Nevertheless, the definitive (i.e., modern) crown group will have emerged some time before this, as a subclade of the then pCG, and therefore, the definitive crown group will temporarily coexist with its corresponding stem group. Diversifications can thus be divided into three sequential phases: (1) only the stem group, (2) both stem and crown groups, and (3) only the crown group (Fig. 1A). We note that stem and crown groups are hierarchical concepts, so that smaller stem and crown groups are nested within any particular crown group.

The origins and fates of crown groups are of particular interest as they comprise modern diversity. However, understanding their origins has been hampered by the lack of any analysis of combined stem and crown group dynamics; the plausibility of models to explain the origin of modern diversity (e.g., Fig. 1B) cannot therefore be easily assessed. An absence of crown group taxa after a time of origin based on a molecular clock estimate could be explained by a variety of means, for example, by invoking poor fossil preservation (*21*). However, observed stem group diversity potentially offers a means of assessing this sort of claim if we have a model for relative stem and crown group diversity together. If such a model suggests high crown group diversity when only stem group taxa are observed (e.g., as is the case for birds discussed below), then the molecular clock estimate for the crown group’s origin may be incorrect. Other possibilities to explain the discrepancy include the following: There is a strong preservation bias between the stem and crown groups that may, on the basis of the model, be further assessed for plausibility; the stem taxa are misidentified; or the model itself requires modification. First, however, it is important to understand what the null model prediction of such a diversity model might be (*22*).

Here, then, we derive explicit expressions for both stem and crown group diversity and their ratio, using a birth-death model (*23*) (i.e., our calculations do not rely on simulations): Birth-death models stand at the heart of most modern approaches to modeling diversification patterns (*19*). We first consider how stem and crown groups evolve under homogeneous conditions (i.e., with a fixed “background” probability of extinction and speciation, with the background referring to the long-term average excluding mass extinctions) and then how mass extinctions perturb the process, followed by considering how these patterns would be affected by more complex birth-death models. We can capture the concepts of stem and crown groups in a birth-death model by imposing the respective conditions of necessary death before and necessary survival to the present on appropriate probability distributions. In the following, we condition the process on the observation that the clade survives until the present day to form a crown group of at least two species (so that it can also have a defined stem group).

The mathematical models that we use are entirely general so that they apply to any time scale or rate of diversification (see Methods for details). We further note that because of this generality, they also apply at all scales of stem and crown groups, i.e., the behaviors of a particular stem and crown group are automatically aligned with those of the stem and crown groups contained hierarchically within them [c.f. (*20*)]. Nevertheless, to illustrate the salient features of the model, we take, as a model example, a total group that emerged 500 million years (Ma) ago and whose crown group emerged c. 410 Ma ago and with diversification parameters speciation λ = 0.5107 and extinction μ = 0.5 (both per Ma) [c.f. figure 1 of (*20*)]. These numbers would, on average, generate 10,000 species in the recent, assuming no mass extinctions. Although these numbers are arbitrary, we note that the clade size and time of origin are approximately similar to those of the bivalves (*24*), and an extinction rate of 0.5 per species per Ma, with the implication of species lasting 2 Ma on average, is defensible [see discussion in (*20*)]. We note that whenever there are substantial extinction rates, long-term survivorship of any clade is unlikely [c.f. (*20*)]. To further expand our exploration of the implications of the model, we also consider similarly sized clades generated with high turnover (μ = 1, λ = 1.0090) and low turnover (μ = 0.1, λ = 0.1143). We consider how unusually large clades (*20*, *25*) (given their diversification parameters) behave and discuss the relevance of this behavior to more complex models such as DDD. We also introduce mass extinctions (*26*) of various sizes. 
In all cases, we calculate for 1–million year intervals the expected size of the stem group, the probability that it has gone extinct, the expected proportion of total diversity contained within the stem at any given time (averaged over all clades with these parameters), the probability that the stem contains a certain proportion of diversity, and the expected absolute diversity in the stem (Fig. 2). The spindle diagrams of Fig. 1 (C and D) show typical to-scale shapes and sizes of the stem and crown groups under these conditions: Note that because of the high stochasticity of the process, a wide range of less likely outcomes is also possible (*20*).
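As a rough check on the quoted parameter sets, the expected clade size can be sketched in a few lines. This assumes that the "10,000 species" figure is the expected diversity conditioned on the clade surviving to the present, and it uses the standard homogeneous birth-death expression for the geometric parameter. The function name and scenario labels are ours, introduced only for illustration.

```python
import math


def expected_size_given_survival(lam, mu, age):
    """Expected number of living species after `age` Ma, given survival.

    Assumption: diversity conditioned on survival is geometric with success
    parameter (1 - a), so its mean is 1 / (1 - a). Requires lam != mu.
    """
    decay = math.exp(-(lam - mu) * age)
    a = lam * (1.0 - decay) / (lam - mu * decay)
    return 1.0 / (1.0 - a)


# (speciation, extinction) rates per Ma as quoted in the text; labels are ours.
SCENARIOS = {
    "model example": (0.5107, 0.5),
    "high turnover": (1.0090, 1.0),
    "low turnover": (0.1143, 0.1),
}

sizes = {
    name: expected_size_given_survival(lam, mu, 500.0)
    for name, (lam, mu) in SCENARIOS.items()
}

for name, size in sizes.items():
    print(f"{name}: about {size:,.0f} species expected after 500 Ma")
```

Under these assumptions, all three scenarios come out within a few percent of 10,000 species, which is consistent with their description as similarly sized clades.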

## RESULTS

In our model example, after the emergence of the crown group, the average proportion of diversity contained within the stem group drops exponentially; the stem group makes up half of the expected total diversity only c. 15 Ma after the origin of the crown group. By c. 80 Ma after, it is expected to make up only 10% of total group diversity. In addition, the probability of the stem group having gone entirely extinct climbs sharply in the same interval: to 50% after 60 Ma and 75% after 100 Ma. With high turnover, the stem group grows faster initially, lasts longer, and generates a larger crown group; however, it also declines extremely quickly once the crown group is established. The converse of all these is true for the low turnover example, except that here, too, the stem group declines quickly after the crown group origin. These quantitative results illustrate an important insight. Whenever there are substantial levels of extinction, crown groups and the stem group before the crown group emerge rapidly as a necessary condition for survival to the present, and stem groups after the emergence of the crown group cease expansion and rapidly decline as the necessary condition for their extinction. The net effect is that a pCG always diversifies rapidly. The rapidity of the transition from a stem-dominated to a crown-dominated regime is sharpened when the crown group emerges more quickly than expected, as would be the case for an unusually large clade (*20*, *25*). One would expect the crown group of a clade (such as the arthropods) that ends up perhaps five times larger than expected over the same 500 Ma to emerge at only c. 20 Ma after the total group (Fig. 1B) and for its stem to dwindle very quickly. pCGs quickly become converted to the definitive crown group, and the crown group node (Fig. 1A) then becomes stable. 
In our example, the definitive crown group forms after an average of 90 Ma, and the stem dies an average of 110 Ma later, implying that the pCG stops changing from this point and that the identity of the crown group node remains consistent until the present. This stability explains why the fossil record is dominated by long blocks of time with a very similar biota (*2*) (e.g., the “Age of Reptiles”). Once a pCG has sufficiently diversified, it becomes very unlikely to go extinct by stochastic variation alone. These pCGs and their shared characteristics thus become stable features that are disrupted only by the largest mass extinctions.

Long-lived groups have been through one or more mass extinctions. The effect of a very large (96%) mass extinction at c. 250 Ma ago on our typical clade is shown in Figs. 1D and 2 (C and D), with the three possibilities of crown group emergence after the mass extinction (63%), crown group emergence before the mass extinction with stem group survival until the mass extinction (27%), or crown group emergence before the mass extinction with the stem group becoming extinct before the mass extinction (10%) being illustrated. The probability of the crown group emerging after a mass extinction is only high when the absolute number of surviving species is small (Fig. 3, B and C). For our example clade, the tipping point mass extinction size (when the crown group becomes more likely to emerge after than before the extinction) is as much as 93%. Well-established (i.e., relatively long-lived) pCGs, such as crown groups themselves, are also robust even to large mass extinctions; the loss of famous clades such as trilobites seems to be the result of successive biotic crises, not a single event (*27*).

In Fig. 3 (B and C), we show the fates of both species and lineage counts through time as they pass through a mass extinction of varying size (for that particular clade) and the probability of the emergence of the crown group against time. The bifurcating effect of mass extinctions on crown group origins implies that the least likely time for a crown group to emerge is just before a mass extinction (Fig. 3C). For example, if crown Aves really evolved at around 70 Ma ago (*8*) and *Vegavis* (*28*) is an anseriform, then at least five crown group bird lineages independently crossed the Cretaceous/Paleogene (K/Pg) boundary (*28*) and, at the same time, had an origin very close to it in time. Our modeling shows this to be an unlikely possibility [c.f. (*29*)] and that the controversial “explosive” post-Cretaceous model (*10*) of crown group birds is more likely. Similarly, this bifurcation is useful in explaining (at least qualitatively) the very different times of origin (at least as implied by the fossil record) of crown group angiosperms (c. 140 Ma ago) and crown group gymnosperms (c. 325 Ma ago) within the seed plants (*12*, *30*). The successive crises of the end-Permian, end-Triassic, and Toarcian (*30*–*32*) all seem to have considerably affected turnover in floristic diversity and may have preferentially affected plants with specialized reproduction (*33*). The net effect was, in this case, to push crown group angiosperm origins late but retain a deep origin for gymnosperms (Fig. 4). Thus, in this case, our model predicts a relatively short and low-diversity stem to the angiosperms that emerged after this period of floral reorganization and that would have existed through the Jurassic from about 180 to 140 Ma ago, which seems in accord with the fossil record.

It is also possible to calculate directly the effect of multiple mass extinctions (*26*), such as the canonical “big five” of the Phanerozoic (*34*). For a set of these extinctions across Phanerozoic time (Fig. 3, D and E), it can be seen that the probability of the crown group of a Cambrian total group emerging is sharply higher in periods of time just after mass extinctions, apart from the continued high likelihood of a Cambrian origin. The suggestive resemblance of Fig. 3E to plots of Phanerozoic high-level taxonomic origination rates (*3*, *17*) may not be coincidental, with higher-level taxa such as families that seem to appear preferentially after mass extinctions corresponding partly to crown groups.

### More complex models

Exploration of the features of birth-death models has led to more complex versions of them being described, of which the most important are the various types of DDD (*35*–*38*). What would be the effect on our results of these models? We have elsewhere noted that DDD closely resembles the effect of our “large clade” example given here [c.f. figure 7 of (*20*)]. Here (see Methods), we show that the large clade effect closely resembles DDD. Thus, in general, DDD-type models are likely to have the effect of making stem groups die more quickly, and crown groups expand more rapidly, compared to a homogeneous model with the same diversification rate as in the recent. More qualitatively, our results illustrate the point that stem groups, whenever there are substantial extinction rates, are most likely to go extinct either when they are small (i.e., soon after the crown group forms), or at or soon after a mass extinction.

We have assumed in our model that the fossil record can be accurately interpreted in terms of stem- and crown-group diversity. However, it is clear that assignment of particular fossils to one or other stem or crown group is not always correct. If there were to be large-scale misinterpretation of the fossil record in this way [(*39*) but see also (*40*) for a counterexample], then this might be another way in which the mismatch between the fossil record and molecular clock estimates of origins might be explicable. While we accept the constant possibility of misinterpretation of the fossil record, this kind of mistake seems most likely to occur close to the crown node. For example, whether trilobites are stem- or crown-group arthropods remains uncertain, but anomalocaridids are clearly in the stem, and although the position of *Vegavis* remains unclear, that of *Archaeopteryx* (or even *Tyrannosaurus*) is not. Thus, we do not believe that these misassignments are sufficient to explain the sometimes large discrepancies between molecular clock estimates and the fossil record. One further way in which models based on the fossil record and molecular clocks might become misaligned might be through different estimates of speciation and extinction processes in the fossil record compared to those recovered from phylogenetic methods (*19*). Although we have not investigated such a possibility here, this is clearly an area of interesting further research.

## DISCUSSION

Our results reveal much about the dynamics of stem and crown groups and have an important bearing on how we view the fossil record. In particular, as the crown group forms, the stem group quickly collapses into first obscurity and then extinction unless the total group was affected by a very large mass extinction. This rapid phase transition from stage 1 to stage 3 of the diversification process might be mistaken for a mass extinction itself, such as has been suggested for the end of the Ediacaran period (*41*). Qualitative models (*42*) wherein crown groups form “silently” and remain as low diversity components of the background while the stem group diversifies (Fig. 1B) can be seen to be unlikely under the model; rather, the reverse is always true. Our model thus suggests that notable mismatches between the fossil record and the results of molecular clock analyses [such as the origin of the bilaterians (*21*)] should not, at least, be explained in this way.

Although our results are not conditioned on the diversification parameters of any particular clade, we note that they do appear to align with many large-scale patterns seen in the fossil record. In particular, the rapid loss of stem group diversity such as that of arthropods [which seems to have largely, although not entirely, vanished by the early Ordovician (*43*)] and tetrapods [early Permian, e.g., (*44*)] as well as that of the Ediacaran biota (*45*) seems well explained by our model, as well as the persistence of large stem groups up to mass extinctions. In addition, major groups seem, at least in the fossil record, to appear abruptly and rapidly expand. Thus, our model is consonant with and strengthens interpretations of the fossil record that view it as recording relatively true-to-life patterns of diversification. This is particularly important in clades where molecular clocks suggest a deep origin of the crown group during an interval of time when the stem group is seen to be diversifying in the fossil record, with classic examples being animals as a whole, angiosperms, birds, and mammals (Fig. 1B) (*9*, *11*–*13*, *21*, *46*). This broad correspondence between the patterns of the fossil record and our model is another reason to be cautious about interpreting molecular clock results when they are markedly at odds with the fossil record [c.f. (*9*, *20*, *46*)], although the possibility that the simple homogeneous birth-death model is inadequate for explaining patterns of diversity is also considered above.

The famous debates in the 1970s in the pages of Paleobiology and elsewhere (*47*) were essentially between viewing the fossil record as an archive of discernible evolutionary processes [such as DDD (*3*)] or as the result of inevitable stochastic patterns that could be captured with simple statistical models (*16*). We have previously extended the latter view by modeling heterogeneity that results from survivorship and selection bias (*20*). Here, we show that other types of heterogeneity in the record, such as the rapid takeover of diversity by the crown group and preferential crown group emergence after large mass extinctions, also emerge from imposing retrospective structure on a homogeneous process and should therefore also be part of any null hypothesis. A homogeneous model is not, of course, intended to capture the many true heterogeneities that exist in the evolutionary process: It is a model, not a description. Nevertheless, it shows that strong patterns of heterogeneity emerge even under homogeneous background conditions when simple conditions such as survival to the present are imposed. Furthermore, the “large clade effect” is similar to a general DDD model, suggesting that even under more heterogeneous conditions of diversification, the qualitative aspects of the model are likely to hold. Hypotheses that strongly transgress against the general predictions of the model should thus not simply be accepted without scrutiny, especially when these transgressions are themselves generalized [e.g., the “phylogenetic fuse” (*42*) concept]. Last, fuller understandings of the behavior of the fossil record relative to molecular clock predictions are likely to come from combined models of preservation and diversification [see, e.g., (*48*, *49*)].

## METHODS

The fundamental mathematical results concerning the distribution of species and lineage numbers in a birth-death process (BDP) are given by Nee *et al.* (*23*) and have been much elaborated and applied subsequently, especially for the reconstructed process (i.e., models for lineages leading to extant species) (*26*, *50*, *51*). Here, we used those results to derive properties of the stem and crown groups, both in homogeneous processes and those experiencing mass extinctions that occur at singular points in time, with references to related approaches for the reconstructed process where appropriate. Our notation follows that used by Budd and Mann (*20*), but we strived to make clear the connection to the notation of Nee *et al.* (*23*) and others.

### Glossary of mathematical terminology

Note that the mathematics has time increasing forward but that, in our diagrams, we used the usual convention of showing time at the present day to be 0.

λ: The speciation probability per unit time of a BDP, assumed to be constant.

μ: The extinction probability per unit time of a BDP, assumed to be constant.

*T*: The time from the origin of a BDP to the present day (equivalently the age of the total group).

*n*_{t,t′}: The number of species alive at time *t*′ in a BDP that originated at time *t*.

*m*_{t,t′}: The number of species alive at time *t* with descendants at time *t*′, in a BDP that originated at time 0.

*a*_{t,t′}: Parameter controlling the distribution of abundance such that (1 − *a*_{t,t′}) is the “success” parameter in the geometric distribution of *n*_{t,t′} (see Eq. 1).

*S*_{t,t′}: The probability that a BDP originating at time *t* has any descendants at time *t*′.

*f*_{i}: The fraction of species that survive mass extinction event *i*.

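Crown group abundance: The number of species alive at time *t* in the crown group of a BDP that originated at time 0.

Stem group abundance: The number of species alive at time *t* in the stem group of a BDP that originated at time 0.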

### The birth-death process

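The number of living species, *n*_{t,t′}, at time *t*′ in a process that originates at time *t*, conditioned on the survival of the process to *t*′, is geometrically distributed with success parameter 1 − *a*_{t,t′}

$$P(n_{t,t'} = n \mid n_{t,t'} > 0) = (1 - a_{t,t'})\, a_{t,t'}^{\,n-1}, \qquad n = 1, 2, \ldots \tag{1}$$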

The parameter *a*_{t,t′} is analogous to *u*_{t} of Nee *et al.* (*23*) and also corresponds to the lower tail distribution of the coalescent point process (CPP) *P*(*H* < *T*) described by Lambert and Stadler (*26*), where *H* is the coalescence time and *T* is as above. In a homogeneous process without mass extinctions, *a*_{t,t′} is a function of the background speciation rate λ and extinction rate μ

$$a_{t,t'} = \frac{\lambda\left(1 - e^{-(\lambda-\mu)(t'-t)}\right)}{\lambda - \mu\, e^{-(\lambda-\mu)(t'-t)}} \tag{2}$$

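Nee *et al.* (*23*) showed that the number of species alive at time *t* with descendants at time *t*′, *m*_{t,t′}, is also geometrically distributed [note that Nee *et al.* (*23*) use the notation η rather than *m*]

$$P(m_{t,t'} = m \mid m_{t,t'} > 0) = \left(1 - a_{0,t}\frac{S_{0,t'}}{S_{0,t}}\right)\left(a_{0,t}\frac{S_{0,t'}}{S_{0,t}}\right)^{m-1}, \qquad m = 1, 2, \ldots \tag{3}$$

where *S*_{t,t′} is the probability for a BDP originating with one species at time *t* to survive to time *t*′ [analogous to *P*(*t*,*t*′) of Nee *et al.* (*23*)]. In a homogeneous process, this survival probability is given as

$$S_{t,t'} = \frac{\lambda - \mu}{\lambda - \mu\, e^{-(\lambda-\mu)(t'-t)}} \tag{4}$$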
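The geometric form of the lineage count can be checked by thinning the species count at time *t*. The following is a sketch of that argument under the definitions above, not the derivation given by Nee *et al.* (*23*). The shorthand *a*, *s*, and *b* and the generating function *G* are introduced here only for illustration.

```latex
\begin{align*}
a &= a_{0,t}, \qquad s = S_{t,t'} \\
G(z) &= \frac{(1-a)\,z}{1 - a z}
  && \text{(generating function of } n_{0,t} \text{ given survival to } t\text{)} \\
G(1-s+sz) &= \frac{(1-a)(1-s+sz)}{(1-a+as) - a s z}
  && \text{(each species leaves descendants at } t' \text{ with probability } s\text{)} \\
b &= \frac{a s}{1-a+as}
  && \text{(ratio of successive terms: } m_{t,t'} \mid m_{t,t'} > 0 \text{ is geometric)} \\
S_{0,t'} &= S_{0,t}\,\bigl[1 - G(1-s)\bigr] = \frac{S_{0,t}\, s}{1-a+as} \\
b &= a_{0,t}\,\frac{S_{0,t'}}{S_{0,t}}
\end{align*}
```

The last line expresses the geometric parameter of the lineage count using only the species-count parameter and two survival probabilities, so no separate calculation is needed for the reconstructed process.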

### Mass extinctions

As shown by Nee *et al.* (*23*), the distributions of *n*_{t,t′} and *m*_{t,t′} in terms of *a*_{t,t′} and *S*_{t,t′} continue to hold when the process is not homogeneous, for example, when mass extinctions occur. However, the forms of *a*_{t,t′} and *S*_{t,t′} must change to account for this heterogeneity. The parameter controlling species diversity, *a*_{t,t′}, can be given more generally in terms of time-dependent λ(*s*) and μ(*s*) as

$$a_{t,t'} = 1 - S_{t,t'} \exp\left(-\int_{t}^{t'} \left[\lambda(s) - \mu(s)\right] ds\right) \tag{5}$$

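In the case of a set of mass extinctions occurring instantaneously at times between *t* and *t*′, with survival fractions *f*_{1}, *f*_{2}, …, *f*_{N}, in an otherwise homogeneous process, this reduces to

$$a_{t,t'} = 1 - \frac{S_{t,t'}\, e^{-(\lambda-\mu)(t'-t)}}{\prod_{i=1}^{N} f_{i}} \tag{6}$$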

What remains is to determine the survival probability *S*_{t,t′} of the process through the mass extinctions. First, we considered survival through one mass extinction and then identified a recursive process for calculating the survival probability through multiple mass extinctions. Consider a BDP starting with one species at time *t*. If each species has an instantaneous and independent probability *f* to survive a mass extinction at time *t*_{m}, what is the probability that the process has descendants at time *t*′ beyond the mass extinction?

First, the distribution of the number of descendants it has at time *t*_{m}, from Eq. 1, is

$$P(n_{t,t_m} = n \mid n_{t,t_m} > 0) = (1 - a_{t,t_m})\, a_{t,t_m}^{\,n-1} \tag{7}$$

The probability that any one of these descendants at time *t*_{m} will leave a descendant at time *t*′ is given by the product of the (independent) probability that it survives the mass extinction (*f*) and the probability that its lineage survives another time *t*′ − *t*_{m}. Therefore, the probability that at least one will survive is

$$P(n_{t,t'} > 0 \mid n_{t,t_m} = n) = 1 - \left(1 - f S_{t_m,t'}\right)^{n} \tag{8}$$

To obtain the probability *S*_{t,t′} that any species at time *t* will have some descendants at time *t*′, we must marginalize over the unknown value of *n*_{t,tm}

$$S_{t,t'} = S_{t,t_m} \sum_{n=1}^{\infty} (1 - a_{t,t_m})\, a_{t,t_m}^{\,n-1} \left[1 - \left(1 - f S_{t_m,t'}\right)^{n}\right] = \frac{S_{t,t_m}\, f S_{t_m,t'}}{1 - a_{t,t_m}\left(1 - f S_{t_m,t'}\right)} \tag{9}$$

In the case of one mass extinction, *a*_{t,tm}, *S*_{t,tm}, and *S*_{tm,t′} are all given by Eqs. 2 and 4, because these periods contain no further heterogeneities. However, in the case of multiple mass extinctions, Eq. 9 can be applied recursively along with Eq. 6 to obtain these values, until all mass extinctions have been incorporated.
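The recursion through successive mass extinctions can be sketched as below. This is a minimal illustration rather than the authors' implementation. It assumes the standard homogeneous expressions for the geometric parameter and survival probability between events, and it marginalizes over the geometric species count at each extinction in closed form. All function names are hypothetical.

```python
import math


def a_hom(lam, mu, dt):
    """Geometric parameter a over a homogeneous interval dt (requires lam != mu)."""
    decay = math.exp(-(lam - mu) * dt)
    return lam * (1.0 - decay) / (lam - mu * decay)


def s_hom(lam, mu, dt):
    """Survival probability of one species over a homogeneous interval dt."""
    decay = math.exp(-(lam - mu) * dt)
    return (lam - mu) / (lam - mu * decay)


def survival(lam, mu, t, t_end, extinctions=()):
    """Probability that one species at time t has descendants at t_end.

    `extinctions` is an iterable of (time, surviving_fraction) pairs; only
    events strictly between t and t_end are used.
    """
    events = sorted((tm, f) for tm, f in extinctions if t < tm < t_end)
    if not events:
        return s_hom(lam, mu, t_end - t)
    tm, f = events[0]
    a = a_hom(lam, mu, tm - t)          # homogeneous up to the first event
    s_before = s_hom(lam, mu, tm - t)
    # Chance that one species alive at tm survives the event and then
    # leaves descendants at t_end (recursing through any later events).
    x = f * survival(lam, mu, tm, t_end, events[1:])
    # Closed form of the sum over the geometric species count at tm.
    return s_before * x / (1.0 - a * (1.0 - x))


LAM, MU, AGE = 0.5107, 0.5, 500.0
s_plain = survival(LAM, MU, 0.0, AGE)
s_with_event = survival(LAM, MU, 0.0, AGE, [(250.0, 0.04)])
print(f"no mass extinction: {s_plain:.4f}")
print(f"96% extinction at 250 Ma after origin: {s_with_event:.4f}")
```

With the model-example parameters, a 96% extinction placed halfway through the history lowers the survival probability of the clade from roughly 2% to a little under 1%. Setting every surviving fraction to 1 recovers the homogeneous result, which is a convenient sanity check on the recursion.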

Using these adapted expressions for *a*_{t,t′} and *S*_{t,t′}, the geometric distributions specified in Eqs. 1 and 3 continue to hold. We note that Lambert and Stadler (*26*) considered the reconstructed evolutionary process as a CPP and concisely derived a full distribution of trees through an arbitrary set of mass extinctions, showing that the process remains a CPP. This previous result thus provides an alternative derivation for our results concerning the number of lineages that survive to the present, *m*_{t,T}, as these constitute the reconstructed phylogeny. However, the reconstructed process by definition excludes the extinct stem lineages.

### When does the crown group form?

The probability that the crown group has formed by time *t* is the probability that the number of species alive at time *t* that have descendants in the present, *m*_{t,T}, is greater than one. We took a crown group here to consist of at least two species so that both the crown and stem group can be defined. Because *m*_{T,T} > 1 (i.e., we observed that the crown group does form before the present), this implies, writing *t*_{c} for the time at which the crown group emerges,

$$P(t_c < t) = P(m_{t,T} > 1 \mid m_{T,T} > 1) = \frac{P(m_{t,T} > 1)}{P(m_{T,T} > 1)} \tag{10}$$

Using Eq. 3, we have

$$P(t_c < t) = \frac{a_{0,t}\, S_{0,T}}{a_{0,T}\, S_{0,t}} \tag{11}$$

We can further condition on a known limit to the time of the crown group. For example, if we wish to consider cases where the crown group emerges before the time of a mass extinction, *t*_{m}, then the above gives an updated expression

$$P(t_c < t \mid t_c < t_m) = \frac{P(t_c < t)}{P(t_c < t_m)}, \qquad t \le t_m \tag{12}$$

This result can also straightforwardly be derived by the CPP approach of Lambert and Stadler (*26*) because the crown group node subtends all extant species of a clade.
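The timing of crown group emergence can be explored numerically with a short sketch. It rests on two assumptions that are ours rather than quoted from the text. First, the probability that at least two lineages alive at time *t* have living descendants is taken as the geometric parameter *a*_{0,t}*S*_{0,T}/*S*_{0,t}. Second, the species-count parameter through a single mass extinction is obtained from the identity that mean diversity given survival equals unconditional mean diversity divided by the survival probability. The function names are hypothetical.

```python
import math


def a_hom(lam, mu, dt):
    """Geometric parameter a over a homogeneous interval dt (requires lam != mu)."""
    decay = math.exp(-(lam - mu) * dt)
    return lam * (1.0 - decay) / (lam - mu * decay)


def s_hom(lam, mu, dt):
    """Survival probability of one species over a homogeneous interval dt."""
    decay = math.exp(-(lam - mu) * dt)
    return (lam - mu) / (lam - mu * decay)


def prob_crown_by(lam, mu, t, T):
    """P(crown group formed by time t), given >= 2 living species at time T."""
    p_two_lineages = a_hom(lam, mu, t) * s_hom(lam, mu, T) / s_hom(lam, mu, t)
    return p_two_lineages / a_hom(lam, mu, T)


def mean_crown_time(lam, mu, T, steps=5000):
    """Mean emergence time, integrating 1 - CDF with the trapezoid rule."""
    h = T / steps
    tail = [1.0 - prob_crown_by(lam, mu, i * h, T) for i in range(steps + 1)]
    return h * (sum(tail) - 0.5 * (tail[0] + tail[-1]))


def prob_crown_after_extinction(lam, mu, t_m, T, f):
    """P(crown group emerges after one mass extinction at t_m with survival f)."""
    a1, s1 = a_hom(lam, mu, t_m), s_hom(lam, mu, t_m)
    x = f * s_hom(lam, mu, T - t_m)            # one species at t_m reaches T
    s_total = s1 * x / (1.0 - a1 * (1.0 - x))  # survival through the event
    p_two_before = a1 * s_total / s1           # >= 2 surviving lineages at t_m
    # Assumed form of the species-count parameter through the extinction.
    a_total = 1.0 - s_total * math.exp(-(lam - mu) * T) / f
    return 1.0 - p_two_before / a_total


LAM, MU, AGE = 0.5107, 0.5, 500.0
mean_time = mean_crown_time(LAM, MU, AGE)
p_after = prob_crown_after_extinction(LAM, MU, 250.0, AGE, 0.04)
print(f"mean crown emergence: {mean_time:.1f} Ma after total group origin")
print(f"P(crown emerges after 96% extinction): {p_after:.2f}")
```

For the model example, this gives a mean emergence time near 91 Ma after the origin of the total group. It also gives a probability of about 0.63 that the crown group emerges after a 96% extinction placed 250 Ma after that origin. Both figures are in line with the values quoted in Results.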

### Dynamics of stem and crown group abundance

In this section, we derived the new mathematics required to model stem and crown group abundance together. We assumed that the process starts at time *t* = 0 and that the time at which the crown group emerges, *t*_{c}, is known, and derived expressions for abundance in the stem and crown groups conditioned on this.

### Before the crown group forms

Before the crown group has formed, all diversity is stem diversity. We can calculate the distribution of this diversity conditioned on *m*_{t,T} = 1, i.e., that the crown group has not yet formed, revealing a negative binomial distribution

### After the crown group forms

*The crown group*. The crown group is defined by a speciation at time *t*_{c}, creating exactly two species that will give rise to descendants in the present. This implies two independent BD processes that will survive to time *T*, starting at time *t*_{c}, thus implying for each that *X* and *Y*, with

Because the crown group consists of two such independent BD processes, we can use the fact that the sum of two independent and identically distributed geometric random variables follows a negative binomial distribution to express the crown group diversity at time *t*
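The step from two geometric variables to a negative binomial can be written out as a short worked identity. Here *p* stands in for whichever geometric parameter applies to each of the two processes, and *X* and *Y* denote their diversities. The symbol *p* is a placeholder introduced for illustration, not the paper's notation.

```latex
\begin{align*}
P(X = j) &= (1-p)\,p^{\,j-1}, \qquad j \ge 1 \quad \text{(and likewise for } Y\text{)} \\
P(X + Y = k) &= \sum_{j=1}^{k-1} (1-p)\,p^{\,j-1}\,(1-p)\,p^{\,k-j-1}
             = (k-1)\,(1-p)^{2}\,p^{\,k-2}, \qquad k \ge 2 \\
E[X + Y] &= \frac{2}{1-p}
\end{align*}
```

The factor *k* − 1 counts the ways of splitting *k* species between the two sister lineages with at least one species in each, which is what distinguishes the negative binomial from a single geometric distribution.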

*The stem group*. Let *t*. We know that *T*. Conditioning on this, we have

We can identify the quantity *t*_{c} before the crown group speciation that have descendants in the stem at time *t*, because one species at time *t*_{c} only has descendants in the crown group. The total stem diversity at time *t* is therefore the sum of *t* − *t*_{c} but also on becoming extinct before time *T* (by definition, the stem is extinct at time *T*). First, we considered the dynamics of one such BD process. From Eq. 1, conditioning only on survival to time *t* from a start of *t*_{c}, we have

Now, we also condition on the process not surviving to time *T*

With normalization, this reveals another geometric distribution

We can now see the total stem abundance at time *t* as the sum of *t* > *t*_{c}*q* = 1 − *S*_{t,T}) and *r* = 1 − *a*_{0,t}(*S*_{0,t}/*S*_{t,T}/*t* is *r*^{2}.

### The relationship between DDD and large clades

Here, we evaluated the dynamics of stem and crown groups under a homogeneous BDP in which the speciation and extinction rates are fixed. A natural question is how these dynamics would be affected by a nonhomogeneous process. A commonly supposed source of heterogeneity in speciation and/or extinction rates is DDD (*35*–*37*), in which the net rate of diversification (and thus the rate of production of lineages) depends on current diversity. Here, we showed that a large clade (i.e., a clade that is larger than expected, conditioned on its diversification parameters) closely resembles a clade produced under DDD.

Consider the probability that the number of lineages increases by one over a short period of time, Δ*t*, conditioned on knowing both the current number of lineages, *m*_{t,T}, and the eventual size of the clade, *m*_{T,T}. By Bayes' rule, we have

The prior probability that a lineage is created over a short time interval Δ*t* is given in terms of the current number of lineages, the speciation rate, λ, and the probability that a new species has surviving descendants at time *T*, *S*_{t,T}

The other terms on the right-hand side simplify by virtue of the Markovian property of the model and are given by negative binomial probabilities

Putting these together, we have

Taking the limit as Δ*t* → 0 gives

We can further clarify this expression by recognizing that

This reveals a DDD relative to the prior expected rate of (*m*_{t,T}λ*S*_{t,T}), with a carrying capacity of *m*_{T,T} (the eventual clade size). For large clades, *k* is initially lower than *m*_{T,T}, inducing higher expected diversification. If this greater probability of lineage creation actually results in greater lineage abundance (recalling that these events are stochastic), then *m*_{T,T}/*k* will decrease, reducing the expected rate of new lineage creation. Conversely, if new lineages are not created, this will increase *m*_{T,T}/*k*, further elevating the future expected rate. When *k* = *m*_{T,T}, the rate converges to the prior expectation. Along the expected course of diversification in a large clade, *m*_{T,T}/*k* declines through an initial burst of high diversification and rapidly approaches one, closely resembling a DDD with an increasing carrying capacity [see, for example, figure 7 of (*20*) compared to figure 4 of (*52*)].

## SUPPLEMENTARY MATERIALS

Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/6/8/eaaz1626/DC1

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is **not** for commercial advantage and provided the original work is properly cited.

## REFERENCES AND NOTES

**Acknowledgments:** We thank S. Porter, S. Slater, and D. Field for useful discussions.

**Funding:** This work was funded by the Swedish Research Council (VR grant no. 621-2011-4703 to G.E.B.).

**Author contributions:** G.E.B. and R.P.M. contributed equally to the study design and conceptual development of this paper. R.P.M. performed the mathematical analysis, and G.E.B. drafted the manuscript. Both authors contributed to revisions of the paper.

**Competing interests:** The authors declare that they have no competing interests.

**Data and materials availability:** All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors.

- Copyright © 2020 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC).