## Abstract

Earthquake forecasting is the ultimate challenge for seismologists, because it condenses the scientific knowledge about the earthquake occurrence process, and it is an essential component of any sound risk mitigation planning. It is commonly assumed that, in the short term, trustworthy earthquake forecasts are possible only for typical aftershock sequences, where the largest shock is followed by many smaller earthquakes that decay with time according to the Omori power law. We show that the current Italian operational earthquake forecasting system issued statistically reliable and skillful space-time-magnitude forecasts of the largest earthquakes during the complex 2016–2017 Amatrice-Norcia sequence, which is characterized by several bursts of seismicity and a significant deviation from the Omori law. This capability to deliver statistically reliable forecasts is an essential component of any program to assist public decision-makers and citizens in the challenging risk management of complex seismic sequences.

## INTRODUCTION

On 24 August 2016, a magnitude 6 (M6) earthquake hit close to the city of Amatrice, causing severe damage and killing about 300 inhabitants. This earthquake was not anticipated by any increase in seismic activity, and it initiated a very complex seismic sequence that had several bursts of seismicity and a complex spatial pattern (Fig. 1), with more than 50,000 earthquakes recorded. After 2 months of a typical aftershock sequence characterized by many smaller earthquakes, in late October, the northern portion of the sequence was hit by M5.9 (on October 26) and M6.5 (on October 30) earthquakes that caused significant damage to a vast area around the city of Norcia, destroying the famous 14th century basilica of San Benedetto that had withstood earthquakes for more than six centuries. Later in January, four M5+ earthquakes occurred close in time and space in the southern part of the sequence, raising concerns for the potential impact on the nearby Campotosto dam that forms the second largest man-made lake in Europe.

This complex behavior is significantly different from the typical mainshock-aftershock sequences, where one large earthquake is followed by many smaller aftershocks. The statistics of typical mainshock-aftershock sequences is successfully described by the model of Reasenberg and Jones (R&J) (*1*), which has been widely used in aftershock forecasting since the 1989 Loma Prieta earthquake (*1*, *2*). The R&J model is rooted in the idea that aftershocks from a single mainshock are distributed according to the Omori-Utsu (*3*, *4*) and Gutenberg-Richter (*5*) empirical laws, thus contemplating the possibility that aftershocks can be larger than the mainshock. Although the R&J model gained wide acceptance in the seismological community, it cannot forecast the evolution of more complex seismic sequences, such as the Amatrice-Norcia sequence, because it does not include spatial information, and it assumes that there is only one major quake that triggers the whole aftershock sequence. The limited applicability of the R&J model was illustrated during the recent sequence that hit Kumamoto region, Japan, in 2016. After the first M6.5 earthquake on April 14, the Japan Meteorological Agency (JMA) released aftershock forecasts for smaller earthquakes. A few days later, on April 16, an M7.3 earthquake hit the same region, killing more than 50 inhabitants, and JMA stopped issuance of forecasts because the sequence did not conform to typical mainshock-aftershock sequence (*6*).

More recent clustering models, such as the epidemic-type aftershock sequence (ETAS) (*7*) and the short-term earthquake probability (STEP) (*8*) models, are based on the same empirical laws used by R&J, but they introduce two important improvements: the spatial component of the forecasts and the assumption that every earthquake above a certain magnitude can generate other earthquakes, not only the mainshock. The latter precludes any meaningful distinction among foreshocks, mainshocks, and aftershocks (*9*, *10*), because they can be attributed, at best, only a posteriori when the sequence is over. Despite these conceptual improvements, the increase in forecasting skill with respect to R&J and the capability of these models to deliver statistically reliable real-time forecasts during complex seismic sequences are still mostly unknown. With this specific aim, we analyze here the prospective real-time forecasting performance of ETAS and STEP models during the Amatrice-Norcia complex seismic sequence.

In the last few years, the Seismic Hazard Center at Istituto Nazionale di Geofisica e Vulcanologia (INGV) introduced an operational earthquake forecasting (OEF) system (OEF_Italy) (*11*, *12*) that provides continuous authoritative information about time-dependent seismic hazards to the Italian Civil Protection. Currently, the system is still in a pilot phase. OEF_Italy consists of an ensemble of three different clustering models, two different flavors of ETAS (*13*, *14*), and one STEP model (*15*). The ensemble modeling weighs the forecasts of each single model according to its past forecasting performance (*16*) and allows proper estimation of the epistemic uncertainty that is of paramount importance for testing (*17*) and for communicating uncertainties to the decision-makers. Here, we analyze the weekly earthquake forecasts for the complex seismic sequence still ongoing in Central Italy (Fig. 1). The weekly forecasts are updated every week or after any earthquake of M4.5 or above. Figure 2 shows the forecasts after (Fig. 2A) and before (Fig. 2, B to D) the most important earthquakes of the sequence. Notably, although each earthquake has an isotropic triggering spatial kernel in OEF_Italy models, the sum of the spatial triggering of all earthquakes yields a more heterogeneous spatial pattern. (All weekly forecasts used in this analysis are available upon request to the corresponding author.)

To evaluate the scientific reliability of the forecasts, we test whether the forecasts satisfactorily describe the space-time-magnitude distribution of the 40 target earthquakes (1 of M ≥ 6, 6 of 5 ≤ M < 6, and 33 of 4 ≤ M < 5), which are the largest earthquakes that occurred in the period between 24 August 2016 to 31 January 2017. We do not consider as target earthquakes the few events with M ≥ M4 that occurred in the first few hours after the two largest Amatrice and Norcia earthquakes. (We come back to this point later.) To test the space and time distribution, we use the *S*- and *N*-tests proposed in the Collaboratory for the Study of Earthquake Predictability (CSEP) framework (*18*, *19*). The *S*-test (*20*) verifies whether the spatial occurrence of target earthquakes is consistent with the forecasts of the model, that is, if the empirical spatial distribution of the target earthquakes is not statistically different from the spatial distribution of the forecasts; the *N*-test (*21*) verifies whether the overall number of earthquakes predicted by the model is consistent with the number of earthquakes observed. More technical details on the *S*- and *N*-tests are provided in Materials and Methods. For magnitude distribution, we test whether the magnitude of the observed target earthquakes fits the Gutenberg-Richter law that is used by all models in OEF_Italy. Specifically, we test whether the magnitudes have an exponential distribution, that is, if *x* ~ exp(Λ), where *x* = *M* − *M*_{t} + ε, with *M* as the magnitude of the observed target earthquakes, *M*_{t} = 3.95 as the minimum magnitude for the target earthquakes, Λ as the average of *x*, and ε as a random uniform noise to avoid the binning problem of the magnitudes. This goodness of fit is carried out using the Lilliefors test (*22*).

## RESULTS

Figure 3A shows the results of the *S*-test for the different forecasts. The spatial probability of each target earthquake is given by the probability of the last time window that begins before the occurrence time of the target earthquake. The time of each forecast is reported in Table 1. The length of these forecasts is not an issue for the *S*-test, because the spatial probability of each target earthquake is normalized to the number of earthquakes that occurred in the time window. The results show good agreement between spatial forecasts and locations of the target earthquakes; the quantile score ξ, which mimics the *P* value of the test (Eq. 1), is always larger than 0.2 and often close to 1, which indicates optimal spatial forecasting.

Figure 3B shows the expected number of target earthquakes for each forecast. Owing to the well-known fact that clustering models based on ETAS severely underestimate the number of earthquakes in the hours immediately after a large shock because of the significant incompleteness of the seismic catalog (*23*), the statistical tests include neither the forecasts nor any target earthquakes immediately after the M6 Amatrice earthquake and after the M6.5 Norcia earthquake (forecast numbers 1, 2, and 19). These forecasts were corrected in real time with empirical factors before sending them to the Italian Civil Protection. From the figure, we note that there is one significant underestimate for forecast 18, which corresponds to the first forecast after the M5.9 earthquake in Visso on October 26. Being less than M6, we did not correct the OEF_Italy forecast; therefore, an underestimate is not a surprise, for the same reasons discussed above. The forecasts of Fig. 3B overlap in time, posing challenges for statistical testing. To avoid this problem, we rescale each weekly forecast for the real-time length of the forecast using a function proportional to 1/(*t* + *c*) (see Materials and Methods for more details). Figure 3C shows the results of the *N*-test, that is, that the cumulative number of observed target earthquakes (40) lies within the 95% confidence interval (7 to 46). Figure 3D shows the agreement between the magnitudes of the target earthquakes and the Gutenberg-Richter frequency-magnitude distribution of the forecasts; the *P* value of the Lilliefors test is larger than 0.05, which is the significance level usually adopted by the CSEP experiments (*19*–*21*).

In summary, this prospective testing phase shows a good agreement between the forecasts and the observed space-time-magnitude distribution of the largest earthquakes that occurred during the sequence. The skill and reliability with which OEF_Italy tracks the spatial evolution of the sequence can be qualitatively grasped by looking at Fig. 1C; besides showing the spatial pattern of the evolution of the sequence, the plot shows that the largest earthquakes of the sequence (marked by stars) occurred in areas that are characterized by spatial clustering of the preceding seismicity.

## DISCUSSION

Statistically reliable and skillful OEF forecasts are a prerequisite for any practical use or dissemination of this information (*12*). However, the conversion of OEF forecasts in terms of risk reduction strategies is challenging and is still a matter of discussion (*24*, *25*). To illustrate this general problem for the Central Italy sequence in Fig. 4, we show the time evolution of the weekly exceedance probability of macroseismic intensities VII and VIII in a circle of 10 km around Norcia before the M6.5 earthquake on October 30 that severely damaged this historical city. Figure 4 (top) shows the long-term background probability, including the increase in probability due to the nearby M6.2 L’Aquila earthquake that occurred in 2009 at about 55 km from Norcia. In Fig. 4 (bottom), we show that weekly earthquake probability increases to 100 to 1000 times that of background before the M6.5 earthquake, achieving weekly probabilities of about 5% (for intensity VII) and slightly less than 1% (for intensity VIII) a few days before the M6.5 event. Although the probability of such a ground shaking occurrence is a few percent, the associated risk can be large, very likely above any a priori–defined acceptable risk threshold (*26*). As is the case for any kind of low-probability high-impact events, management of this risk poses a great challenge to the wide range of possible decision-makers (*27*); but despite its unpredictability, the risk can hardly be considered negligible or irrelevant. For this reason, we foresee in the near future a clear urgency to improve communication strategies to make OEF messages comprehensible to any possible interested stakeholders.

Besides the practical implications of seismic risk reduction for society, the capability to accurately forecast the evolution of the natural phenomena is the distinctive feature of science with respect to all other human enterprises (*28*). OEF advances a valuable scientific approach, because it describes what we really know about the earthquake occurrence process, and it represents a benchmark for measuring any scientific improvement through a prospective statistical testing phase. Prospective statistical testing is the scientific gold standard for validating models (*17*), showing where models can be improved, and allowing scientists to quantify the forecasting ability of a new model compared to that of models currently in use. This framework modifies the way in which the earthquake predictability problem has been approached by scientists and perceived by society, moving from the tacit dichotomy for which earthquakes either can be predicted at a probability close to 1 or cannot be predicted at all toward a more incremental continuum of probabilities between 0 and 1, traditionally termed forecasts.

Although the results reported here demonstrate that current OEF models can provide statistically reliable and skillful forecasts even during complex seismic sequences, OEF is still in a nascent stage. The steady accumulation of scientific knowledge, which is implicit in the OEF’s brick-by-brick approach (*18*), may eventually pave the way to a “quiet revolution” in earthquake forecasting as in weather forecasting (*29*). Without pretending to be exhaustive, we foresee that significant improvement may come from including complete geodetic and/or geological information (*30*), a better physical description of the earthquake-generating process and fault interaction, and the identification of possible physical and/or empirical features that may shed light on the so-far elusive distinction between the preparatory phase of small and large earthquakes.

## MATERIALS AND METHODS

*S*- and *N*-tests

The *S*-test (*20*) is a likelihood test applied to a forecast that has been normalized to the observed number of target earthquakes, thereby isolating the spatial distribution of the forecast. After normalizing each forecast, the *S*-test is summarized by a quantile score*S*_{i} is the *i*th simulated spatial likelihood according to the model, *S*_{s} is the whole set of simulated spatial likelihoods, *S* is the likelihood of the spatial forecast relative to the observed seismicity, and *n*{*A*} indicates the number of elements in a set {*A*}. The score ζ indicates the percentage of times in which the simulated spatial likelihoods are less or equal to the spatial likelihood observed, mimicking a statistical *P* value. If ζ is below the critical threshold value, say 0.05, the spatial forecast is deemed inconsistent; values close to 1 indicate optimal spatial forecasts.

The *N*-test (*21*) verifies whether *N*_{obs} ~ *F*(*N*_{forecast}), that is, if the observed number of target earthquakes (*N*_{obs}) over the entire region samples the distribution *F*(*N*_{forecast}) that describes the expected total number of target earthquakes predicted by the OEF model (*N*_{forecast}). We assumed that the target earthquakes follow a generalized Poisson distribution with the time-varying parameter λ that has a gamma distribution to describe the variability among the rates produced by each one of the three models in OEF_Italy. This gamma-Poisson mixture produces an NBD for the expected number of target earthquakes.

### NBD of the number of target earthquakes

The mass function of the NBD can be written as*r*) is the gamma function, *n* is a positive integer random variable (the number of target earthquakes), and *r* > 0, *p* ∈ (0, 1) are the two parameters of the distribution. NBD can be derived from the gamma-Poisson mixture distribution*f*_{1}(*n*;λ) is the Poisson distribution of the positive integer random variable *n*, and the parameter λ of the Poisson distribution is assumed to follow a gamma distribution*k*, θ > 0 are the parameters of the gamma distribution that are related to the parameters *r* and *p* of NBD through

The models included in OEF_Italy provide a set of different rates for each forecast with different weights, describing the past forecasting performance of each model. From these rates, we estimated the parameters of the gamma distribution using the method of moments. In particular, we computed the weighted mean *k* and θ of the gamma distribution, that is

Once the *k* and *θ* parameters are obtained, it is possible to estimate the parameters *r* and *p* of the NBD through Eqs. 5 and 6. More information on the derivation of the NBD as mixture of Poisson distribution can be found in the study of Greenwood and Yule (*31*).

### Correcting the weekly forecasts to account for shorter forecasting time windows

Weekly forecasts (that is, the expected number of target earthquakes in 1 week) have been updated every week or every earthquake of M4.5 or larger. This means that some weekly forecasts have been superseded by a new forecast before the end of the established 1-week forecasting time window (see Table 1). Thus, when testing the expected number of target earthquakes for the entire sequence (*N*-test, Fig. 3C), we had to correct each weekly forecast accounting for the real extent of the corresponding forecasting time window. For this purpose, we assumed that the earthquake rate λ decays with time within the forecasting window by following a simple power law equation*c* = 0.015 day^{−1} (*13*). The expected number of target earthquakes in a generic forecasting time window *T* is

Using Eq. 10, we can calculate the expected number of target earthquakes *n*_{T} for a generic forecasting time window of length *T* (shorter than or equal to 7 days) from the weekly expected number of target earthquakes *n*_{week} given by the model

We have verified that replacing Eq. 9 with the more complex function*p* estimated by Lombardi and Marzocchi (*13*).

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is **not** for commercial advantage and provided the original work is properly cited.

## REFERENCES AND NOTES

**Acknowledgments:**This work has been carried out within the Seismic Hazard Center (Centro di Pericolosità Sismica) at the INGV.

**Funding:**This paper has been funded by the Presidenza del Consiglio dei Ministri – Dipartimento della Protezione Civile. The authors are responsible of the content of this paper, which does not necessarily reflect the official position and policies of the Department of Civil Protection.

**Author contributions:**W.M. conceptualized the study, performed the research, analyzed the data, and wrote the manuscript. M.T. performed the research, analyzed the data, and revised the manuscript. G.F. performed the research, analyzed the data, revised the manuscript, and formatted the figures.

**Competing interests:**The authors declare that they have no competing interests.

**Data and materials availability:**All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the corresponding author.

- Copyright © 2017 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC).