## Abstract

We propose a novel, accurate, low-cost method to estimate the incubation-period distribution of COVID-19 through a cross-sectional and forward follow-up study. We identified presymptomatic individuals at the time of their departure from Wuhan and followed them until symptom onset. We adopted a renewal process, treating the incubation period as the renewal interval and the duration between departure and symptom onset as the forward time. This method improves the accuracy of estimation because it reduces recall bias and uses readily available data. The estimated median incubation period was 7.76 days [95% confidence interval (CI): 7.02 to 8.53], and the 90th percentile was 14.28 days (95% CI: 13.64 to 14.90). When we allowed for the possibility that a small proportion of patients contracted the disease on their way out of Wuhan, the estimated probability that the incubation period exceeds 14 days was between 5 and 10%.

## INTRODUCTION

The Chinese Center for Disease Control and Prevention (China CDC) and the World Health Organization are closely monitoring the current outbreak of coronavirus disease 2019 (COVID-19). As of 22 February 2020, the National Health Commission of China had confirmed a total of 76,936 cases of COVID-19 in mainland China, including 2442 fatalities and 22,888 recoveries (*1*). China has implemented various containment measures, including travel restrictions, isolation, and quarantine, to minimize virus transmission through human-to-human contact (*2*). Quarantining individuals who have been exposed to infectious pathogens has long been an effective way to contain contagious diseases. Determining the optimal quarantine for presymptomatic individuals depends critically on a good understanding of the incubation period, and this understanding has been lacking for COVID-19.

The incubation period of an infectious disease is the time elapsed between infection and the appearance of the first symptoms and signs of disease. Precise knowledge of the incubation period helps determine an optimal quarantine length for disease control. It is also essential for investigating the mechanism of transmission and for developing treatments. For example, the distribution of the incubation period is used to estimate the reproductive number *R*, that is, the average number of secondary infections produced by a primary case. The reproductive number is a key quantity that affects the potential size of an epidemic. Despite the importance of the incubation period, it is often poorly estimated because data are limited.

To the best of our knowledge, only a handful of studies have estimated the incubation period of COVID-19, among them Li *et al*. (*3*), Zhang *et al*. (*4*), Guan *et al*. (*5*), Backer *et al*. (*6*), Linton *et al*. (*7*), and Lauer *et al.* (*8*). Table 1 lists the estimates from these six studies, together with results for two other coronavirus diseases, severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS). Li *et al.* (*3*) included the first 425 laboratory-confirmed cases reported as of 22 January 2020, but the exact dates of exposure could be identified for only 10 of these cases. They then approximated the distribution of the incubation period by fitting a lognormal distribution to these 10 data points. This gave a mean incubation period of 5.2 days [95% confidence interval (CI): 4.1 to 7.0] and a 95th percentile of 12.5 days. Similarly, Zhang *et al.* (*4*) fitted a lognormal distribution to 49 cases with no travel history who were identified by prospective contact tracing, obtaining a mean incubation period of 5.2 days (1.8 to 12.4). However, with such limited sample sizes, it is difficult to draw solid inferences about the distribution of the incubation period. Guan *et al*. (*5*) reported a different result. On the basis of 291 patients with clear information on the specific date of exposure as of 29 January 2020, they found a median incubation period of 4.0 days (interquartile range, 2 to 7). However, such estimates can be strongly influenced by patients' recall bias. They can also be shaped by interviewers' judgment of the possible, rather than actual, dates of exposure, which may not have been accurately monitored or determined. Both sources can lead to substantial error.

Backer *et al*. (*6*) used 88 confirmed cases detected outside Wuhan to estimate the distribution of the incubation period. For each selected case, the travel history and the date of symptom onset give a right-censored observation of the incubation period. The distribution can then be estimated by fitting a Weibull, Gamma, or lognormal distribution to the censored data. However, this method is subject to two types of sampling bias. (i) Among patients who resided in Wuhan, those with longer incubation periods are more likely to develop symptoms outside Wuhan and therefore to be observed. A patient with a shorter incubation period would develop symptoms before the planned trip and might cancel it, so such a case would not be observed. This leads to overestimation. (ii) If the follow-up time (from infection to the end of the study) is short, only shorter incubation periods are observed, which leads to underestimation. For example, suppose that confirmed cases were collected from days 1 to 10 and that two patients, A and B, were both infected on day 5. Patient A has an incubation period of 2 days, and patient B has one of 8 days. Only patient A is included in the data, because patient B develops symptoms after day 10. Linton *et al*. (*7*) used an approach similar to that of Backer *et al*. (*6*) with a larger sample of 152 cases and also corrected the second sampling bias. However, the first sampling bias remains unresolved.

Lauer *et al.* (*8*) used pooled data on 181 cases to estimate the incubation period. All cases in the pooled data had identifiable exposure and symptom onset windows. Of these, 161 had a known recent history of travel to or residence in Wuhan, the same kind of data used in Backer *et al*. (*6*) and Linton *et al*. (*7*). The others had evidence of contact with travelers from Hubei or with persons with known infection. Lauer *et al*. (*8*) used an approach similar to that of Backer *et al*. (*6*), so both sampling biases described above remain unresolved. They reported that 2.5% of patients developed symptoms after 11.5 days and argued that further symptomatic infections were highly unlikely to go undetected after 14 days. However, the same coauthors reported in Bi *et al.* (*9*) that 5% of patients had symptom onset after 14 days.
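Sampling bias (ii) can be made concrete with a small simulation. The following is a minimal Python sketch, not a reproduction of any cited analysis. It assumes a hypothetical Weibull incubation period (shape 2, scale 9 days; illustrative values only) and infections spread uniformly over days 1 to 10, and it records a case only if symptom onset falls by day 10. The recorded cases then have a much shorter mean incubation period than the full population, and the `is_recorded` helper replays the patient A and patient B example from the text.

```python
import random
import statistics


def truncation_bias_demo(n=100_000, first_day=1.0, last_day=10.0,
                         shape=2.0, scale=9.0, seed=2020):
    """Right truncation: a case enters the data only if symptom onset occurs by
    `last_day`. Returns (mean of all incubation periods, mean among recorded cases).
    The Weibull parameters are hypothetical, chosen only for illustration."""
    rng = random.Random(seed)
    all_periods, recorded = [], []
    for _ in range(n):
        infection_day = rng.uniform(first_day, last_day)
        # random.weibullvariate(alpha, beta): alpha is the scale, beta is the shape
        incubation = rng.weibullvariate(scale, shape)
        all_periods.append(incubation)
        if infection_day + incubation <= last_day:
            recorded.append(incubation)
    return statistics.fmean(all_periods), statistics.fmean(recorded)


def is_recorded(infection_day, incubation, last_day=10.0):
    """Patients A and B from the text: only onsets by `last_day` are observed."""
    return infection_day + incubation <= last_day


true_mean, recorded_mean = truncation_bias_demo()
print(f"all cases: {true_mean:.2f} days; recorded cases: {recorded_mean:.2f} days")
print(is_recorded(5, 2), is_recorded(5, 8))  # patient A, patient B -> True False
```

Lengthening the collection window (raising `last_day`) shrinks the gap between the two means, which is why the length of follow-up matters for this kind of data.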

To overcome these problems, we propose a novel method that estimates the incubation period of COVID-19 using the well-known renewal theory in probability (*10*). The method improves the accuracy of estimation in two ways. It reduces recall bias, and it draws on abundant, readily available forward-time data, giving a large sample size of 1084. To the best of our knowledge, our study of the distribution of the incubation period involves the largest number of samples to date. We estimate a median incubation period of 7.76 days (95% CI: 7.02 to 8.53) and a mean of 8.29 days (95% CI: 7.67 to 8.90). The 90th percentile is 14.28 days (95% CI: 13.64 to 14.90), and the 99th percentile is 20.31 days (95% CI: 19.15 to 21.47). Furthermore, when we allow for the possibility that a small proportion of patients contracted the disease on their way out of Wuhan, the estimated tail probability that the incubation period exceeds 14 days is between 5 and 10%. The proportion of incubation periods longer than 14 days is generally difficult to estimate when the sample size is small. Because our sample is much larger than those of other studies published to date, we are confident in the robustness of our findings. Our estimated incubation period for COVID-19 is longer than the estimates from the previous studies on SARS, MERS, and COVID-19 listed in Table 1.

## METHODS

### Motivations

As described in the previous section, most studies describe the distribution of the incubation period either with a parametric model or with its empirical distribution, based on incubation periods observed in contact tracing data. However, contact tracing data are difficult and expensive to obtain, and their accuracy can be strongly affected by recall bias. A low-cost, high-accuracy method for estimating the incubation distribution is therefore needed. In this study, we use confirmed cases detected outside Wuhan with known histories of travel to or residence in Wuhan to estimate the distribution of incubation times. We apply renewal theory by treating the incubation period of a prevalent case as a renewal process. Section S1 gives more details of the renewal process and the corresponding assumptions.

### Data collection and justification

We retrieved publicly available data on 12,963 confirmed cases outside Hubei Province, as of 15 February 2020, from provincial and municipal health commissions in China and from the ministries of health of other countries. Detailed information on confirmed cases includes region, gender, age, date of symptom onset, date of confirmation, history of travel to or residence in Wuhan, and date of departure from Wuhan. In these data, the date of symptom onset is the date, reported by the patient, on which clinical symptoms first appeared. These symptoms include fever, cough, nausea, vomiting, diarrhea, and others. Among the 12,963 confirmed cases, 6345 had their dates of symptom onset recorded, 3169 had histories of travel to or residence in Wuhan, 2514 had their dates of departure recorded, and 1922 had both dates of departure from Wuhan and dates of symptom onset. However, not all 1922 cases were suitable for the analysis. After examining the data, we identified 1084 cases that met the criteria described in section S2, and these were followed forward.

Figure 1 shows the design of the cross-sectional and forward follow-up study. The dot at the left end of each segment marks a date of infection, and the square at the right end marks a date of symptom onset. The date of departure from Wuhan falls between them. Only cases drawn as solid lines were followed in our cohort. Cases drawn as dashed lines were not followed, because their date of departure from Wuhan was not between 19 January 2020 and 23 January 2020.
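The selection step in Figure 1 can be sketched as a filter that turns case records into forward times. This is a minimal Python sketch under stated assumptions. The record layout (`CaseRecord`) is hypothetical. The rule that symptom onset must fall strictly after departure and no later than the 15 February 2020 data cutoff is also an assumption. Only the 19 to 23 January 2020 departure window comes from the text, and the complete criteria are in section S2.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Departure window stated in the text; the full criteria are in section S2.
DEPARTURE_START = date(2020, 1, 19)
DEPARTURE_END = date(2020, 1, 23)
DATA_CUTOFF = date(2020, 2, 15)  # assumption: onset must be observed by the data cutoff


@dataclass
class CaseRecord:
    """Hypothetical record layout; the real data fields may differ."""
    case_id: str
    departure: Optional[date]  # date of departure from Wuhan
    onset: Optional[date]      # date of symptom onset


def forward_times(cases: List[CaseRecord]) -> List[int]:
    """Return forward times V (days from departure to onset) for eligible cases."""
    times = []
    for case in cases:
        if case.departure is None or case.onset is None:
            continue  # both dates are required
        if not DEPARTURE_START <= case.departure <= DEPARTURE_END:
            continue  # departure outside the window (dashed lines in Figure 1)
        # Assumption: "presymptomatic at departure" means onset strictly after departure.
        if not case.departure < case.onset <= DATA_CUTOFF:
            continue
        times.append((case.onset - case.departure).days)
    return times


cases = [
    CaseRecord("a", date(2020, 1, 20), date(2020, 1, 27)),  # kept: V = 7 days
    CaseRecord("b", date(2020, 1, 15), date(2020, 1, 25)),  # departed before the window
    CaseRecord("c", date(2020, 1, 22), date(2020, 1, 21)),  # symptomatic before departure
    CaseRecord("d", date(2020, 1, 23), None),               # onset date missing
]
print(forward_times(cases))  # [7]
```

Forward times here have day-level resolution, matching how dates are reported in the source data.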

Among the 1084 cases with gender information in the study, 468 (43.30%) were female. The mean age of patients was 41.31 years, and the median age was 40 years. More than 80% of the cases were aged between 20 and 60 years. The youngest confirmed case in our cohort was 6 months old, and the oldest was 86 years old. Table 2 shows the demographic characteristics of patients with COVID-19 in the Wuhan departure cohort and in the entire dataset collected as of 15 February 2020. There are slight differences between the selected cases and all cases. We therefore examined the correlation between forward time and age, which was only −0.0309. Hence, there is no evidence that the incubation time depends on age in this dataset, and the observed forward times should be representative of those in the general population. Section S2 summarizes further demographic characteristics of the patients.
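The text reports the age correlation (−0.0309) but not which coefficient was used. Assuming it is the ordinary Pearson coefficient between each case's age and forward time, a minimal Python sketch of the computation follows. The toy ages and forward times are hypothetical and are not taken from the study data.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy values: ages in years, forward times in days
ages = [25, 40, 55, 70]
forward = [6, 3, 8, 4]
print(round(pearson(ages, forward), 4))
```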

### Estimation of incubation period distribution of COVID-19

Let *Y* be the incubation period of an infected case, with probability density function *f*(*y*), *y* > 0. Let *A* be the duration from infection in Wuhan to departure from Wuhan, which can be regarded as the backward time in a renewal process. Let *V* denote the duration between departure from Wuhan and the onset of symptoms, which can be regarded as the forward time in a renewal process. Then *V* has the density

$$
g(v) = \frac{\int_v^{\infty} f(y)\,dy}{\mu}, \qquad v > 0, \tag{1}
$$

where $\mu = \int_0^{\infty} y f(y)\,dy$ is the mean of *f*( ∙ ). *A* and *V* have the same marginal density, and the aforementioned sampling bias can be corrected by using Eq. 1. See section S3 for more technical details.
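To make Eq. 1 concrete, here is a minimal Python sketch for a Weibull incubation period. For a Weibull *f*, the survival function is exp{−(v/λ)^α} and μ = λΓ(1 + 1/α). The shape 2 and scale 9 days are hypothetical values, not the study's estimates. The sketch does two checks. First, it confirms numerically that *g* integrates to 1. Second, it simulates infections at a constant rate over time (a stationarity assumption) and keeps those infected before an observation time whose onset comes after it. The mean of these simulated forward times should match the renewal-theory identity E[V] = E[Y²]/(2E[Y]).

```python
import math
import random


def forward_density(v, shape, scale):
    """Eq. 1 for a Weibull(shape, scale) incubation period:
    g(v) = S(v) / mu, S(v) = exp(-(v/scale)**shape), mu = scale * Gamma(1 + 1/shape)."""
    mu = scale * math.gamma(1.0 + 1.0 / shape)
    return math.exp(-(v / scale) ** shape) / mu


def forward_mean(shape, scale):
    """Renewal-theory identity E[V] = E[Y^2] / (2 E[Y]) for a Weibull Y."""
    ey = scale * math.gamma(1.0 + 1.0 / shape)
    ey2 = scale ** 2 * math.gamma(1.0 + 2.0 / shape)
    return ey2 / (2.0 * ey)


def simulate_forward_times(shape, scale, n_infections=200_000, horizon=500.0, seed=1):
    """Infections at uniform times on [0, horizon] (constant incidence, an assumption).
    Keep cases infected before `horizon` whose onset falls after it; return V = onset - horizon."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_infections):
        infected = rng.uniform(0.0, horizon)
        onset = infected + rng.weibullvariate(scale, shape)  # stdlib order: (scale, shape)
        if onset > horizon:
            out.append(onset - horizon)
    return out


def trapezoid(fn, a, b, n=20_000):
    """Composite trapezoidal rule on [a, b] with n panels."""
    h = (b - a) / n
    return h * (0.5 * fn(a) + sum(fn(a + k * h) for k in range(1, n)) + 0.5 * fn(b))


shape, scale = 2.0, 9.0  # hypothetical illustration values
total = trapezoid(lambda v: forward_density(v, shape, scale), 0.0, 60.0)  # should be close to 1
sim = simulate_forward_times(shape, scale)
print(total, forward_mean(shape, scale), sum(sim) / len(sim))
```

The simulation also shows why the forward-time density differs from *f*. Long incubation periods are more likely to straddle the observation time, and the observation time then falls uniformly within each selected period.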

We assume that the incubation period in our cohort of COVID-19 cases follows a Weibull distribution. The Weibull parameters are estimated by maximizing the likelihood of the observed forward times, whose density is given by Eq. 1. The mean and percentiles of the incubation period are then calculated from the fitted Weibull distribution. The CIs in this study are obtained with the bootstrap method using *B* = 1000 resamples. We also fitted Gamma and lognormal distributions to the incubation period, and both gave quantile estimates similar to those from the Weibull model.
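The fitting step can be sketched in Python. This is not the authors' code. For a Weibull *f*, the log of Eq. 1 is −(v/λ)^α − log λ − log Γ(1 + 1/α). For a fixed shape α, setting the derivative with respect to λ to zero gives a closed-form scale, λ̂(α) = {α · mean(v_i^α)}^(1/α). The sketch therefore maximizes the resulting profile likelihood over α by golden-section search, which is our choice of optimizer. It then attaches percentile-bootstrap CIs; the text states *B* = 1000 but not the CI type, so the percentile method is an assumption. The simulator draws forward times as V = U · Y\*, where Y\* is a length-biased Weibull draw, using the fact that (Y\*/λ)^α ~ Gamma(1 + 1/α, 1).

```python
import math
import random


def profile_scale(v, shape):
    """Closed-form MLE of the scale for a fixed shape under Eq. 1 with Weibull f."""
    return (shape * sum(x ** shape for x in v) / len(v)) ** (1.0 / shape)


def profile_loglik(v, shape):
    """Forward-time log likelihood at (shape, profile_scale(v, shape)).
    At the profiled scale, sum((v_i / scale) ** shape) equals n / shape."""
    n = len(v)
    scale = profile_scale(v, shape)
    return -n / shape - n * math.log(scale) - n * math.lgamma(1.0 + 1.0 / shape)


def fit_forward_weibull(v, lo=0.2, hi=20.0, iters=60):
    """Maximize the profile log likelihood over log(shape) by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log(lo), math.log(hi)
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = profile_loglik(v, math.exp(c)), profile_loglik(v, math.exp(d))
    for _ in range(iters):
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = profile_loglik(v, math.exp(c))
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = profile_loglik(v, math.exp(d))
    shape = math.exp((a + b) / 2.0)
    return shape, profile_scale(v, shape)


def weibull_quantile(shape, scale, p):
    """p-th quantile of the incubation period Y ~ Weibull(shape, scale)."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


def bootstrap_ci(v, statistic, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for statistic(shape, scale); the CI type is an assumption."""
    rng = random.Random(seed)
    reps = sorted(statistic(*fit_forward_weibull(rng.choices(v, k=len(v))))
                  for _ in range(n_boot))
    return reps[int((1.0 - level) / 2.0 * n_boot)], reps[int((1.0 + level) / 2.0 * n_boot) - 1]


def simulate_forward_times(shape, scale, n, seed=0):
    """V = U * Y*, with Y* length-biased Weibull: (Y*/scale)**shape ~ Gamma(1 + 1/shape, 1)."""
    rng = random.Random(seed)
    return [rng.random() * scale * rng.gammavariate(1.0 + 1.0 / shape, 1.0) ** (1.0 / shape)
            for _ in range(n)]


# Usage on simulated (hypothetical) data, not the study data
v = simulate_forward_times(2.0, 9.0, n=500, seed=3)
shape, scale = fit_forward_weibull(v)
median_ci = bootstrap_ci(v, lambda s, c: weibull_quantile(s, c, 0.5), n_boot=100)
print(shape, scale, weibull_quantile(shape, scale, 0.5), median_ci)
```

Profiling out the scale reduces the problem to a one-dimensional search, which keeps the 1000 refits needed for the bootstrap cheap.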

### Sensitivity analysis

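It is plausible that some people who left Wuhan were infected on the day of departure. As cases were increasing, they had a higher chance of exposure to this highly contagious, human-to-human–transmitted virus in crowded environments. For such people, the duration between departure from Wuhan and symptom onset is not a forward time but the incubation period itself. The observed durations therefore follow a mixture of the incubation period and the forward time. Unfortunately, it is unclear who was infected before departure and who was infected at the time of departure. Hence, we propose a mixture forward-time model for sensitivity analysis:

$$
h(v) = (1 - \pi)\, g(v) + \pi f(v).
$$

Here π is the proportion of individuals infected at the time of departure, and *f* is the Weibull incubation density with shape α and scale λ,

$$
f(y) = \frac{\alpha}{\lambda}\left(\frac{y}{\lambda}\right)^{\alpha - 1} \exp\left\{-\left(\frac{y}{\lambda}\right)^{\alpha}\right\}.
$$

The term *g* is the corresponding forward-time density from Eq. 1, $g(v) = \exp\{-(v/\lambda)^{\alpha}\} / \{\lambda\, \Gamma(1 + 1/\alpha)\}$. If α ≠ 1, it is possible to identify all underlying parameters. When α = 1, *f* and *g* coincide and π cannot be identified. We explore the sensitivity of the estimated incubation period by assuming a range of values, π = 0, 0.05, 0.1, and 0.2. For each π, we estimate α and λ by maximizing the likelihood

$$
L(\alpha, \lambda) = \prod_{i=1}^{I} \left\{ (1 - \pi)\, g(v_i) + \pi f(v_i) \right\}, \tag{2}
$$

where $v_i$ is the observed forward time of the *i*th individual and *I* is the sample size of the study cohort.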

## RESULTS

We fitted the likelihood function (Eq. 2) to the observed forward times $v_i$ of the 1084 cases in our cohort and found that π = 0 gives the largest log likelihood. We therefore set π = 0 as the reference scenario. Table 3 gives the maximum likelihood estimates of α and λ for this scenario.

Table 3 summarizes the parameter estimates and the mean and percentiles of the incubation period. The estimated mean and percentiles decrease as π, the proportion of people infected at the time of departure, increases. However, the results vary by only about 1 day between π = 0 and π = 0.2, which we believe is within an acceptable range.
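Table 3 is not reproduced in this text. As a consistency check, and not as the estimation procedure, the Weibull parameters implied by the reported (rounded) median and 90th percentile for π = 0 can be backed out in closed form from the Weibull quantile function Q(p) = λ{−ln(1 − p)}^(1/α). Below is a minimal Python sketch. It then recomputes the mean, the 99th percentile, and P(Y > 14 days) from those implied parameters.

```python
import math


def weibull_from_quantiles(p1, q1, p2, q2):
    """Solve Q(p) = scale * (-ln(1 - p))**(1/shape) for (shape, scale) from two quantiles."""
    shape = math.log(math.log(1.0 - p2) / math.log(1.0 - p1)) / math.log(q2 / q1)
    scale = q1 / (-math.log(1.0 - p1)) ** (1.0 / shape)
    return shape, scale


# Reported median (7.76 days) and 90th percentile (14.28 days), reference scenario pi = 0
shape, scale = weibull_from_quantiles(0.5, 7.76, 0.9, 14.28)
mean = scale * math.gamma(1.0 + 1.0 / shape)
p99 = scale * (-math.log(0.01)) ** (1.0 / shape)
tail_14 = math.exp(-(14.0 / scale) ** shape)  # P(Y > 14 days) under pi = 0
print(f"shape {shape:.3f}, scale {scale:.3f}, mean {mean:.2f}, "
      f"99th {p99:.2f}, P(Y>14) {tail_14:.3f}")
```

The implied mean and 99th percentile come out close to the reported 8.29 and 20.31 days. This suggests that the reported summaries are mutually consistent with a single Weibull fit, up to rounding.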

## DISCUSSION

A sound estimate of the distribution of the incubation period plays a vital role in epidemiology. It informs decisions on the length of quarantine for prevention and control, supports dynamic models that accurately predict the course of the disease, and helps identify the source of contamination in foodborne outbreaks. Here, we propose a novel method to estimate the incubation distribution that requires only information on travel histories and dates of symptom onset. This method improves the accuracy of estimation because it reduces recall bias and uses abundant, readily available forward-time data. To the best of our knowledge, this study of the incubation period involves the largest number of samples to date. In addition, this is the first article to model the incubation period of COVID-19 as a renewal process, a well-studied methodology with a solid theoretical foundation. The estimated incubation period has a median of 7.76 days (95% CI: 7.02 to 8.53) and a mean of 8.29 days (95% CI: 7.67 to 8.90). The 90th percentile is 14.28 days (95% CI: 13.64 to 14.90), and the 99th percentile is 20.31 days (95% CI: 19.15 to 21.47). When we allow for the possibility that a small proportion of patients contracted the disease on their way out of Wuhan, the estimated tail probability that the incubation period exceeds 14 days is between 5 and 10%. The incubation period estimated in our study is notably longer than those reported in Li *et al*. (*3*), Guan *et al*. (*5*), Backer *et al*. (*6*), and Linton *et al*. (*7*). The following evidence may support our finding of a long incubation period:

1) In the study of Guan *et al.* (*5*) on behalf of the China Medical Treatment Expert Group for COVID-19, the incubation period had a reported median of 4 days, a first quartile of 2 days, and a third quartile of 7 days. By fitting a commonly used Weibull distribution to these quartiles, we can obtain … In Guan *et al*. (*5*), it was also reported that the incubation period of one patient in each of the severe and nonsevere groups was as long as 24 days. In addition, 13 cases (12.7%) had an incubation period longer than 14 days, and 8 cases (7.3%) had one longer than 18 days. These results are close to what we found in our study (*11*).

2) The Yibin municipal health commission in China reported the case of a 64-year-old woman who was diagnosed with COVID-19 on 11 February 2020 in Yibin, Sichuan Province, 20 days after returning from Wuhan. She had been under self-quarantine at home with her family for 18 days, from 23 January to 9 February. On 8 February, she developed mild symptoms of cough with sputum production (*12*).

3) Bai *et al*. (*13*) reported that the incubation period of patient 1 was 19 days. However, this 19-day period was the time between departure from Wuhan and symptom onset, that is, the forward time in our study. The actual incubation period should therefore be longer than 19 days.

On the basis of the incubation distribution estimated in this study, about 10% of patients with COVID-19 would develop symptoms more than 14 days after infection. This may be a public health concern given the current 14-day quarantine period. Our approach requires that certain assumptions be met, which we detail below.

1) The collection of forward times depends on the follow-up time. If follow-up is not long enough, only cases with shorter incubation periods would be included in the Wuhan departure cohort, which may lead to underestimation of the incubation period. The same limitation also applies to Backer *et al*. (*6*) and Linton *et al.* (*7*). However, as explained earlier, this study included only cases who left Wuhan before 23 January, which leaves an average follow-up time of 25 days. The longest incubation period reported in Guan *et al*. (*5*) was 24 days, so it is relatively unlikely that we missed patients with longer incubation periods. Note that Guan *et al*. (*5*) reported this 24-day incubation period as an outlier.

2) We assume that the individuals in our cohort were infected either in Wuhan or on the way from Wuhan to their destination. A violation of this assumption leads to overestimation of the incubation period. The same limitation also applies to Backer *et al*. (*6*), Linton *et al*. (*7*), and Lauer *et al*. (*8*). However, because the cohort was carefully selected as described in Methods, the chance that an individual in the Wuhan departure cohort was infected outside Wuhan should be relatively small. Nonetheless, we acknowledge that this possibility exists. For example, a family member could have been uninfected when leaving Wuhan but then been infected by other family members or outside contacts. We also conducted a sensitivity analysis that removed all cases who left Wuhan with their families from the Wuhan departure cohort. This resulted in only a small change in the estimated distribution of the incubation period.

3) Individuals in our selected cohort were infected in the early days of the outbreak and were likely first- or second-generation cases. Our results do not apply to higher-generation cases if the virus mutates.

## SUPPLEMENTARY MATERIALS

Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/6/33/eabc1202/DC1

This is an open-access article distributed under the terms of the Creative Commons Attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## REFERENCES AND NOTES

**Acknowledgments:** We thank D. Follmann from the National Institute of Allergy and Infectious Diseases for comments that improved the manuscript, W. Zhou from U.S. CDC, and M. Thompson from the University of Waterloo for many helpful comments and suggestions. We also thank B. Snow, ELS, from Leidos Biomedical Research Inc. for providing a technical review of the manuscript.

**Funding:** This research is supported by the National Natural Science Foundation of China grant 8204100362 and the Zhejiang University special scientific research fund for COVID-19 prevention and control.

**Author contributions:** J.Q.: Study design, writing, and data interpretation. C.Y.: Writing, literature search, and data interpretation. Q.L.: Writing, data analysis, and data collection. T.H.: Data analysis and data collection. S.Y.: Data interpretation. X.-H.Z.: Study design, writing, and data interpretation.

**Competing interests:** The authors declare that they have no competing interests.

**Data and materials availability:** All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Data and codes are now available from https://github.com/johnnyhu149/estimating_incubation_period. Additional data related to this paper may be requested from the authors.

- Copyright © 2020 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution License 4.0 (CC BY).