## Abstract

Slow earthquakes, like regular earthquakes, result from unstable frictional slip. They produce little slip and can therefore repeat frequently. We assess their predictability using the slip history of the Cascadia subduction between 2007 and 2017, during which slow earthquakes have repeatedly ruptured multiple segments. We characterize the system dynamics using embedding theory and extreme value theory. The analysis reveals a low-dimensional (<5) nonlinear chaotic system rather than a stochastic system. We calculate properties of the underlying attractor like its correlation and instantaneous dimension, instantaneous persistence, and metric entropy. We infer that the system has a predictability horizon of the order of days weeks. For the better resolved segments, the onset of large slip events can be correctly forecasted by high values of the instantaneous dimension. Longer-term deterministic prediction seems intrinsically impossible. Regular earthquakes might similarly be predictable but with a limited predictable horizon of the order of their durations.

## INTRODUCTION

There are typically two ways to forecast earthquakes. The first one considers that earthquakes result from a stochastic process: Mainshock interevent times are assumed to follow a specific distribution. This distribution is typically Poissonian and stationary (*1*), but it can be nonstationary if we consider a variable stress history (*2*) or it can be based on some renewal time model mimicking the elastic rebound theory (*3*). The probability of the next large event in a given time window can be estimated on the basis of the past seismicity using a probabilistic approach. The second way hypothesizes a deterministic process, for example assuming that earthquakes result from stick-slip frictional sliding as observed in laboratory friction experiments (*4*). The deterministic predictability of earthquakes is a matter of debate. Friction is a nonlinear phenomenon (*5*) and hence can result in deterministic chaos (*6*) with only limited predictability. Which of these two approaches is more relevant boils down to the interpretation of earthquakes as a stochastic process or as a deterministic chaotic process (i.e., a process governed by deterministic laws but highly sensitive to initial conditions). Theoretical studies and numerical simulations based on deterministic equations suggest that stick-slip sliding can be chaotic (*7*–*11*). However, we do not know whether natural earthquakes result from deterministic chaos, and, if so, what would be their predictability horizon. Detecting chaotic behavior in geophysical time series is a difficult task, mainly because of the short and noisy data generally available (*12*). While historical catalogs (*13*) or particular earthquake sequences (*14*) have been used to argue for chaos, characterizing the chaotic behavior of regular earthquakes has been a challenge because of the short observation time period, making it difficult to record multiple ruptures of a same fault segment. This limitation does not apply to laboratory earthquakes and slow earthquakes, slip events that produce very little slip at slow rates and radiate very little seismic waves (*15*).

Recent laboratory experiments have shown that both slow and fast ruptures are preceded by crackling that can be detected and used to forecast the time of failure, implying that the stick-slip behavior observed in these experiments is governed by some deterministic dynamics (*16*). In nature, a possible chaotic behavior has been inferred in the particular case of low-frequency earthquakes (*17*), a class of small slow earthquakes that can be detected and characterized with seismology. Geodetic position time series capture the whole deformation independently of radiated seismic waves and can thus give a more comprehensive view of fault slip, in particular during slow earthquakes. Here, we focus on slow slip events (SSEs) imaged from geodetic time series. SSEs are a stick-slip phenomenon, which presents a recurrence time much shorter than that of typical earthquakes, of the order of months or years instead of decades or centuries. They have been documented along major subduction megathrusts (*15*), reaching moment magnitudes [*M*_{w}] comparable to that of large earthquakes [*M*_{w} > 7]. SSEs are notably similar to regular earthquakes. They evolve into large pulse-like ruptures (*18*). They follow similar scaling laws and exhibit systematic along-strike segmentation (*18*–*19*). These characteristics make them a most suitable system to study the dynamics of frictional sliding at scale of the order of hundreds or thousands of kilometers. Our goal is to characterize whether the observed slip time series irregular behavior is emerging from an underlying deterministic dynamics or from a stochastic nature of the source.

## RESULTS

We focus on the Cascadia subduction using the slip history on the megathrust derived from the inversion of geodetic records over more than 10 years, from 2007.0 to 2017.632 (*20*). SSEs in Cascadia show a first-order segmentation consisting of 13 major segments, which produced repeating ruptures over the studied time period (*18*) (Fig. 1). We low-pass filter the slip time series to calculate the slip rate, and then integrate it over the area of a given segment to obtain the slip potency rate time series for all the 13 segments *20*), i.e., an equiripple filter (EF) with normalized passband and stopband frequencies of 35^{−1} and 21^{−1} *21*) and the extreme value theory (EVT) (*22*) to study the dynamics of the system (see Methods). The system that we study is limited to the slow slipping belt on the megathrust, at a depth of about 30 to 40 km (Fig. 1). We treat it as an isolated dynamical system, thus neglecting the potential interactions with other parts either of the megathrust or of the surrounding medium. Since we are focusing on the slow slipping region, it follows that we are dealing with only a part of the whole phase space. The phase space is that space where all possible states of the system can be represented, i.e., that space having for axes all the variables needed to explain the dynamics of the system. Because of the assumption of isolated system, we refer to the region representing the slow slipping phenomenon as the phase space, and to the trajectories in it, which represent how the state of the dynamical system evolves with time, as the attractor of the system. The slip rate (and thus the slip potency rate, which is directly proportional to it) is considered to be the main observable variable governing the evolution of friction in laboratory experiments (*5*). To apply ET and EVT, we need the temporal evolution of at least one of the variables needed to explain the system’s dynamics, and the slip potency rate is therefore a good candidate. It shows episodic bursts with the most extreme events occurring quite regularly (Fig. 2 and figs. S1 to S7).

A quantity of interest to characterize a dynamical system is its dimension, a geometrical property of the underlying attractor. There are multiple definitions of an attractor’s dimension, but they should agree in telling how many effective degrees of freedom (dof) are needed to explain the dynamical system (*23*). The ET methods indicate an embedding dimension *m* ≃ 10 for all the segments, consisting in a correlation dimension of the attractor ν ≲ 5 (figs. S8 to S12 and “Embedding theory” section). The application of EVT for the calculation of the average attractor dimension (*D*) (see “Extreme value theory” section) confirms low average dimension, for which we get a value <4 for all the segments (3.1 < *D* < 3.5; green dots in Fig. 3). However, the signal-to-noise ratio (SNR) is high enough only for segments located in the northern section of the megathrust (segments #1 and #2). This is evident when surrogate data (*24*) are used to test the significance of the observed low dimensionality of the system (Fig. 3, “Surrogate data” section, and figs. S13 and S14).

The dimension, *D*, being an average in the phase space, does not capture the information about transient instabilities. Recent advances in EVT applied to dynamical systems theory have proven that it is possible to characterize these transient instabilities via two instantaneous properties (*25*): the instantaneous dimension (*d*) and the instantaneous extremal index (θ). These quantities refer to the state of the system in a given location of the phase space and, for this reason, are also referred as local. As in (*25*), here we name them “instantaneous,” because they depend on the specific time, and we want to avoid confusion with “local” in a geographical sense. Their distribution and temporal evolution are shown in figs. S15 to S19. The instantaneous dimension, *d*, indicates the number of variables needed to explain the dynamics of the system in a specific phase space location. We expect to find high values of *d* when a metastable state is approached. In this situation, the segment is transitioning from a stable to an unstable regime, or vice versa, and more variables are then needed to explain the state of the system. This is apparent from Fig. 4A, where a section of the attractor is plotted with *d* color coded, with high values of *d* marking the onset and end of large SSEs.

The instantaneous extremal index (θ) measures the inverse of the average persistence time around a given state in a region of the phase space (Fig. 4B). In this sense, it is a local and instantaneous quantity. The extremal index (Θ), a parameter in the range [0,1], measures the degree of clustering of extremes in a stationary process *26*). A relationship between Θ and the metric entropy *H* of a system has been recently demonstrated (*27*)

The relationship between Θ and θ is not as straightforward as the one between *D* and *d*, but we must have Θ ∈ [θ_{min}, θ_{max}]. We can, thus, deduce a range of values for *H*. Given the values of θ that we have found (figs. S15 to S19) and the fact that the metric entropy is equal to the sum of all the positive Lyapunov exponents, we deduce that there is at least one positive Lyapunov exponent in our system. The positive Lyapunov exponents tell us how rapidly two trajectories in the phase space exponentially diverge. We can thus derive a predictability time *t** as the inverse of the metric entropy, and we find *t** ranging from a few days to about 2 months (Table 1). A similar conclusion, with predictability horizons of the order of days to weeks, is reached by applying nonlinear forecasting analysis (NFA) techniques (based on ET, see Methods) to estimate *H* (figs. S20 and S21) (*28*–*29*). The NFA provides only a lower bound estimate of *H* (*29*). It follows that the NFA estimate of *t** is an upper bound, which, for all the segments, is in the range estimated from EVT (Table 1).

## DISCUSSION

According to the EVT, the dynamics revealed by the filtered data requires at least four variables, which is the first integer above the fractal value of *D*. The description of stick-slip friction in the laboratory can be schematized by a spring-slider system and, in a fully inertial model, must involve slip, slip rate, and at least one additional “state” variable to allow fault healing after slip events so that they can repeat. The underlying physics and the number of state variables needed to explain laboratory friction remain matters of debate (*30*). It is important to clarify here the difference between the state variables coming from the rate-and-state description of friction (RS-state variables) and the state variables that determine the state of the system. Typically, a point in the phase space is defined by a state vector, with elements given by the variables needed to explain the dynamics of the system. An RS-state variable is only one among all the state variables belonging to a state vector.

In the absence of coupling with other spring-slider systems, the expected number of dof is 2 (because the system is described by a second-order differential equation) plus the number of RS-state variables, and the dimension should be between dof-1 and dof. For a noninertial single spring-slider, adding a second RS-state variable is a viable way to enact chaotic behavior, and a Kaplan-Yorke dimension of 2.119 ± 0.001 has been derived (*8*), consistent with the fact that we can drop 1 dof because of the noninertial assumption. A one-dimensional, fluid-infiltrated, rate-and-state friction spring-slider has 5 dof, i.e., its dynamics can be described by five variables: loading shear stress (which is proportional to slip in case of constant loading rate), slip rate, one RS-state variable (potentially related to the effective contact area) (*31*), porosity, and pore pressure (*32*). To describe SSEs, we can assume a quasi-static condition, and under this hypothesis, the number of state variables drops to four, consistent with the number of dof recovered from our analysis in Cascadia where pressurized fluid has been invoked to justify SSEs’ existence (*33*).

In a coupled system, the loading term would also depend on the variables associated with neighboring fault segments, which need to be introduced in the state vector to describe the system’s evolution. In the case of a one-dimensional system, similar to the one observed in Cascadia with along-strike segmentation, we would expect dimensions as high as 12, with the extra 8 dof carried in by the two adjacent segments. The neighbors’ effect may be particularly relevant in starting or stopping a slipping event. The maximum retrieved instantaneous dimension is in the range between 11 and 18 for the segments with high SNR. Considering the first-order segmentation here adopted and the available noisy data, we consider this result in good agreement with the maximum expected dimension of 12, as in a one-dimensional spring-and-slider chain. Potentially, an along-dip segmentation would also increase the number of neighbors, i.e., the number of dof. The proposed description considers only short-range elastic interactions, like in the Burridge-Knopoff model. Long-range interactions (i.e., interactions with segments further than the adjacent neighbors) would further increase the instantaneous dimension of the attractor.

Since the filter parameters may affect our findings, we perform the same analyses on the unfiltered time series as well as on time series filtered with various parameters and different filters. The surrogate data test on unfiltered time series suggests that we cannot reject the null hypothesis for which data can be described via a linear stochastic model (fig. S22). Furthermore, for unfiltered time series, we find extremal index values close to 1, with predictability horizons smaller than the data sampling rate, indicating a random system (fig. S26 for an example relative to segment #1). A similar conclusion would be reached by ET-based techniques (figs. S20, S21, and S27). This shows that, if not filtered, the noise in the data is dominating, masking the SSE dynamical structure. A more detailed discussion of the filter effects (and the tested filters) is reported in the “Filter” and “Surrogate data” sections. We stress here that filters applied to pure noise can potentially produce the spurious identification of finite correlation dimension (*34*), and for this reason, we further test the effects of the filters on the predictability of the time series generating (pseudo-)random time series and applying the same filters to them. The number of generated random time series is equal to the number of subfaults in segment #1, which is the segment with the largest number of subfaults. We then generate surrogate data and calculate the average dimension on both the filtered random time series and the surrogate data. The result shows that we would not be able to reject the null hypothesis according to which the time series were generated by a random process (fig. S28). We thus consider unlikely that the filters here adopted are introducing an apparent chaotic dynamics, and we notice that, for the cases that pass the surrogate data test, *D* is typically between 3 and 4 for all segments, independently of the chosen filter parameters (figs. S22 and S23).

Not all segments pass the surrogate data test. We conclude that SSEs on the northernmost segments of the Cascadia subduction zone (segments #1 and #2) result from deterministic chaos, while the central part (segments #3 to #5) shows ambiguous results, and for the southernmost portion of the fault (segments #6 to #13), we cannot infer deterministic chaos with the data at our disposal (figs. S13, S14, and S22 to S25). Our analysis implies that it should be possible to forecast the onset of large SSE ahead of time based on an explicit deterministic representation of the system dynamics or some machine learning algorithm that would implicitly capture it. Along that line, it is notable that the increase in instantaneous dimensionality seems to constitute a reliable precursor of the large SSEs (segments #1 and #2; Fig. 2 and fig. S1). The causal filter adopted here introduces a group delay larger than the predictability horizon time, implying that this approach cannot be used for real-time forecasting. Alternative noise reduction or forecasting techniques need to be investigated for this purpose, or more accurate data would be needed to avoid the filtering step.

In conclusion, we have shown that continuous spatiotemporal models of SSE evolution (*20*) provide insight into fault dynamics because, where the SNR is sufficiently high, we have shown that SSEs in Cascadia manifest chaotic dynamics. In other words, SSEs can be described as a deterministic, albeit chaotic system rather than as a random or stochastic process. The estimated predictability horizon is of the order of days or weeks that is equivalent to a fraction of the typical large SSE duration in Cascadia, which is the representative time scale of the instability. If the dynamics derived from the filtered time series is representative of the true underlying dynamics, then long-term prediction of SSEs (i.e., over a time horizon much longer than their duration) seems intrinsically impossible. This implies that for long-term predictions, a stochastic approach remains the best tool at our disposal, while short-term deterministic predictions should be achievable. As SSEs might be regarded as earthquakes in slow motion (*18*), regular earthquakes might be similarly chaotic and predictable. If the relation between predictability horizon and event duration holds also for regular earthquakes, then this would imply that long-term predictions of earthquakes are intrinsically impossible, and the predictable horizon would be only a fraction of the regular earthquakes’ typical duration (10 to 100 s for *M*_{w} > 6 earthquakes). Furthermore, it might be possible to augment the capabilities of early warning systems with a forecast of the final magnitude while the event is still ongoing.

## METHODS

### Slip potency rate

Given a subfault (*n*) of a given segment (*s* = 1, …,13), we apply a low-pass filter to the slip history *20*) to obtain a smoothed slip history [*u*_{sn}(*t*)]. Since chaotic systems are extremely sensitive to initial conditions, we test several low-pass filters (details are provided in the “Filters” section). We then multiply the smoothed slip history by the area of each subfault (*A*_{sn}), getting the slip potency for each subfault

The total slip potency for the *s*-th segment is calculated as the slip potency integral over the entire slipping area. In a discretized case where there are *N _{s}* subfaults belonging to the

*s*-th segment, the total slip potency is calculated as

We also construct slip potency rate curves for each subfault*v*_{0sn} is the long-term reference loading velocity for a given patch belonging to a specific segment as derived from (*20*).

The choice of the variable to use for the dynamics reconstruction is important because it can bias the correlation dimension estimation. In particular, the correlation dimension can be lowered when using a variable strongly correlated only with a few variables of the system (*35*). Having in mind the rate-and-state formalism for friction (*5*), we hypothesize that the slip rate, and thus the slip potency rate, is not only more strongly coupled than the slip potency to the other variables of the system, but it is also coupled with more variables (e.g., the state variables). We find smaller values for the correlation dimension of the attractor ν when using the slip potency as observable (figs. S8 to S12 and S29 to S33 and “Embedding theory” section), confirming our intuition that slip potency rate is a more suitable observable.

### Filters

The effect of low-pass filters on dynamical systems’ dimension estimation has been extensively studied. Ideal linear low-pass filters introduce further equations in the dynamical system, leading to an increase in dimensionality (*36*). Finite order nonrecursive filters instead do not modify quantities estimated via ET, but this is not strictly proven for practical cases where finite time series are available (*37*). Proper low-pass filtering data can improve the dynamical system characterization (*38*), and we thus restrain ourselves to finite order nonrecursive filters, which implies that the filtered value *u*_{sn}(*t*) is obtained from the current and past *N _{b}* nonfiltered values,

*t*is the data sampling rate, i.e., 1 day. First, we test an EF with fixed passband ripple and stopband attenuation of 1 and 60 dB, respectively, and we test 10 combinations of normalized stopband and passband frequencies:

*u*(

_{sn}*t*). With the tested combinations, the order of the filter (i.e.,

*N*) takes the following values when increasing

_{b}*i*: 48, 125, 230, 363, 484, 664, 871, 1106, 1368, and 1658. For each designed filter, we show the filter response in figs. S34 and S35. The results shown in the main text are those obtained using an EF and the parameters adopted in (

*20*) to automatically pick 64 SSEs from the dataset here studied, i.e.,

*N*) and its normalized frequency cutoff (

_{b}*f*

_{cutoff}). We test both a variable

*f*

_{cutoff}fixing

*N*to 60 (i.e., we use 2 months of data before the current epoch to smooth the signal; figs. S36 and S37) and a variable

_{b}*N*fixing

_{b}*f*

_{cutoff}to 28

^{−1}(figs. S38 and S39). In the first case, we vary

*t*and the 6 epochs before, 13 epochs before, and so on until using a total of 10 weeks of data. We use the MATLAB built-in function fir1, which adopts a normalized gain of the filter at

*f*

_{cutoff}of −6 dB. We observe that reducing the cutoff frequency lets the underlying dynamics emerge, and we consistently observe a finite fractal dimension significantly smaller than what would be calculated from surrogate data (fig. S22). We refer the reader to the “Surrogate data” section for more details.

All tested filters have a constant group delay (figs. S34 to S39). The information on the group delay is relevant when discussing about the predictability horizon of the system (see Discussion). The effects of the EF filters on the time series are shown in figs. S40 to S44. For readability, the filtered time series have been shifted back in time of the corresponding group delay to be in phase with the original time series. To test whether there is any difference between causally and noncausally filtered time series, for the filter

### Embedding theory

We apply two methodologies based on ET to determine the correlation dimension of the strange attractor, and the NFA, also based on ET, to estimate the metric entropy associated with the time series.

The correlation dimension is defined as (*39*)*C*(*r*, *T*) is the correlation integral, *r* is a variable threshold distance, *T* is the time series length, and we use the base-10 logarithm Log.

The construction of the correlation integral typically involves two parameters: the delay time (τ) and the embedding dimension (*m*). Given a scalar time series *x*(*t*), for example *m*-dimensional vector delaying in time the time series of an amount τ for *m* − 1 times

We can now define the correlation integral as (*40*)*w* > 1 to improve the convergence of the classical algorithm (*w* = 1) toward its limit *T* → ∞, and H is the Heaviside function. Values of *w*~τ_{corr} are recommended, where τ_{corr} is the autocorrelation time for the specific scalar time series under examination (*40*). We calculate τ_{corr} using the batch means method (*41*).

Basically, the algorithm counts how many *m*-dimensional vectors are closer than *r* at different times. If *m* ≥ 2ν + 1, then **F** is an embedding function of the strange attractor (*42*), and we expect the correlation dimension to be independent of *m*. Figures S8 to S12 and S29 to S33 show the ν versus Log(*r*) curves for different *m*. The selected delay times τ are chosen to emphasize the scaling region. For every segment, we have tested values of τ = 2*i* + 1 days, with *i* from 1 to 6, and *m* = 4*j*, with *j* from 1 to 5. In the case of a stochastic signal, we might expect that a scaling relationship between *C*(*r*) and *r* does not hold, and larger values of ν are calculated when increasing *m*, i.e., ν does not saturate. Nonetheless, autocorrelated noise in short time series can fool the described algorithm (*24*). For this reason, surrogate data are typically introduced, but instead of evaluating the plateau for ν, which is quite subjective, we prefer to apply the methodology from EVT for this calculation. Given the time delays retrieved from ET (see figs. S8 to S12), one accurate slip rate data per week might be sufficient to reconstruct an embedding for SSEs.

The second methodology derived from ET needs only the definition of the delay time to determine the minimum embedding dimension (*43*). The algorithm still uses the embedding function **F**, but it exploits the nearest neighbors counting in the reconstructed phase space to detect false neighbors when changing the embedding dimension *m*.

The following two quantities are defined*n*(*t*, *m*) ≤ *T* − *m*τ is an integer such that **F**_{n(t, m)}(*t*, τ, *m*) is the nearest neighbor of **F**(*t*, τ, *m*). From these two quantities, the ratio at two subsequent embedding dimensions is calculated

The quantity *E*_{1} is relevant because if the studied time series is the result of a dynamic process, i.e., it comes from an attractor, then *E*_{1}(*m*) saturates (*43*). In other words, we reach a value *E*_{1} stops changing increasing *m* above *E*_{2}, is introduced to check for randomness in the data. In theory, for stochastic time series, *E*_{1} should never saturate, but in practical cases, it can be unclear whether *E*_{1} is slowly increasing with increasing *m* or not. If the time series of interest is the result of a stochastic process, then we expect future data points to be independent from the previous ones. This means that *E*_{2}(*m*) = 1 for each *m*. If the studied time series is instead the result of a deterministic process, then *E*_{2} is not constant, and there must exist some *m* values such that *E*_{2}(*m*) ≠ 1.

We show in fig. S27 the results on both causally filtered (i.e., present values depend only on the past and the present, such that the statistic *E*_{2} can be used) with *m* ≃ 10 is detected from filtered data, implying a correlation dimension ν ≤ (*m* − 1)/2 ≲ 4.5, while for nonfiltered data, *E*_{2} remains almost constant at unitary values. This result is consistent with the results from EVT, for which a θ close to 1 is calculated for the nonfiltered time series (e.g., fig. S26).

Another quantity of interest to characterize a chaotic system is its metric entropy *H*. NFA also works using nearest neighbors in the phase space, and it is a powerful approach not only to estimate *H* from the short time series but also to assess the chaotic or stochastic nature of the time series under study (*28*–*29*). We apply here a first-order NFA approach (*28*). Being based on ET, we first embed a time series in a phase space of dimension *m* using delay coordinates with delay time τ. The idea then is to use the first half of the time series (training set) to learn a nonlinear map to make a prediction *T _{p}* time steps in the future using a local linear approximation in the phase space. We then use the second half of the data (test set) to test the learned prediction and assess the chaoticity of the time series. In practice, the following are the steps of the procedure:

1)Embed the time series *x*(*t*) like in Eq. 8 to create the observation **F**(*t*, τ, *m*) at time *t*.

2)For a given point in the test set **F**(*t*, τ, *m*), find the *k* nearest neighbors *k* = *m* + 1, but, as recommended in (*28*), to ensure stability of the solution, it is better to take *k* > *m* + 1. Here, we use *k* = 2*m* + 1.]

3)Use the *k* nearest neighbors in the training set [for which *x*(*t* + *T*_{p}) is known] to find the best local linear predictor *A*.

4)Use *A* to predict the value of *x* at time *t* + *T _{p}* in the test set:

5)Compare *x*(*t* + *T _{p}*) and

6)Normalize the prediction error by the root mean square deviation (RMSD) of the data in the test set: *x*(*t*) − 〈*x*(*t*)〉]^{2}〉^{1/2}.

If ϵ = 0, then the prediction is perfect; if instead ϵ = 1, then the prediction is as good as a constant predictor *T _{p}*, while for a chaotic system, we expect ϵ to increase when increasing

*T*. We have tested various embedding dimensions

_{p}*m*and delay times τ, obtaining similar results. In figs. S20 and S21, we show the results relative to

*m*= 9 (i.e., we assume an average dimension between 3 and 4 for the underlying attractor, as estimated via EVT) and τ = 7 days. We show the evolution of ϵ versus

*T*for all the segments using the same filter adopted in (

_{p}*20*) (fig. S20), and for segment #1 varying the filter parameters (fig. S21). The black lines indicate the value calculated for the unfiltered time series, while the blue and red lines correspond to ϵ for the various segments or filter parameters. We find that for the unfiltered time series, ϵ ≈ 1, confirming the dominant role of stochastic noise, while for sufficiently effective filters, ϵ increases with

*T*, as expected for a chaotic system. These graphs can be used to estimate the metric entropy

_{p}*H*using the following equation, valid in the limit ϵ ≪ 1

*N*is the number of data points and

*D*is the (average) attractor dimension (for which we use the values estimated from EVT). It follows that

*C*and

*H*are estimated via least square using points for which ϵ ≤ 0.3 (green dashed line in figs. S20 and S21) to fit Eq. 15.

Another estimate of *H*, which does not depend on a particular forecasting technique, can be obtained using the predictions to calculate the rate of loss of information from the time series (*29*). We calculate the correlation coefficient (ρ) between *x*(*t* + *T _{p}*) in the testing set, which will be a function of

*T*. After some approximations, the following relation can be derived (

_{p}*29*)

*c*and

*H*. The number of points to use for the estimate is not strictly defined. Previous studies have used 2 (

*29*) or 6 (

*44*) points for example. The resulting values of

*t** for all segments using the same EF adopted in the main text are reported also in Table 1.

### Extreme value theory

We use recent results of EVT to overcome some of the issues encountered when using ET algorithms. In particular, we would like (i) a method to rigorously calculate a particular statistic (for example, the attractor’s dimension) and, consequently, test whether the calculated quantity is the result of a random process or not, and (ii) a method to calculate also instantaneous properties of the attractor. The main idea behind the usage of EVT for the characterization of a dynamical system is to connect the statistics of extreme events to the Poincaré recurrence theorem (*25*, *45*).

Let us consider a generic point ζ on a strange attractor. The instantaneous dimension is a quantity that measures the density of neighbors in the phase space around ζ. To calculate this density, we can ask ourselves the following question: What is the probability to visit again a region of the phase space close to (i.e., in an arbitrary small radius ε from) ζ? If we had access to all the possible states of the system (*z*), then we could calculate the distances from the actual state under study, δ(*z*, ζ). Then, we would like to construct a random variable related to δ(*z*, ζ) and use the second theorem of EVT (*46*) to be able to gain information about the density of neighbors around ζ. The second theorem of EVT states that, given a random variable *Z* with nonvanishing probability distribution, we can set a threshold value *q* such that, for *q* sufficiently large, values of *Z* that exceed *q* (or exceedances) follow a generalized Pareto distribution (GPD).

For real case scenarios, we might only have one scalar time series, and here, we consider the univariate case. Similarly to what is done to characterize atmospheric flows (*25*), we assume that our observed scalar time series (i.e., the slip potency rate) approximates possible states of the system. The only requirement to apply this methodology is the observed time series to be sampled from an underlying ergodic system. A possible improvement would be to consider a multivariate case with slip potency rates and/or slip potencies from adjacent segments, but we defer this complication to future investigations. For our case of interest, to generate the pool of all possible *z*, we consider the slip potency rates of all subfaults belonging to the segment under examination (Eq. 4). We then select ζ to be equal to the slip potency rate at a certain epoch *t* and calculate the distance from all other possible states that we have recorded at different times. Recent studies (*45*, *47*) have shown that if we construct the random variable given by the negative log distances, *Z* = − ln (δ(*z*, ζ)), then we can use the GPD shape parameter (σ) to calculate the instantaneous dimension as

This assumes that the exceedances follow a GPD. We visually inspected the Q-Q plots of the exceedances versus a GPD, and discrepancies between the observed quantiles and those predicted by a GPD are noticed especially in the high quantile of the distribution. This probably reflects the fact that not many extreme events have been observed and a longer slip history may provide better results. Repeating this calculation for different values of ζ extracted from the pool of values *z*, we can then estimate the attractor dimension by simply averaging over *d*: *D* = 〈*d*(ζ)〉. This is a very powerful result for two reasons. We can calculate the attractor dimension (i) without the need of an embedding and (ii) simply setting one parameter: a threshold *q* on the negative log distances. Here, we have used *q* = 0.98 percentile and *q* = 0.99 percentile of the negative log distance, and we have tested both L-1 and L-2 norm distances. The results are overall similar (green dots in figs. S13 and S14). We can now apply the method of surrogate data to each segment and compare the reconstructed attractor dimensions with the one calculated from real data. We refer to the “Surrogate data” section for more details. We notice that using a threshold *q* = 0.99 reduces the average dimension to values <3. The threshold quantile *q* does not affect the conclusion that a chaotic deterministic dynamics can be successfully detected only for the northernmost segments. It affects though the interpretation of the total number of dof needed to explain the system. Following the spring-and-slider interpretation, if 3 < *D* < 4, then a second RS-state variable may be needed (see Discussion). Longer time series will help to resolve this ambiguity, since the value of *q* = 0.99 may be too high for the amount of available data.

Another quantity of interest that can be derived is the instantaneous extremal index of the system θ(ζ). In the main text, we have described the extremal index (Θ) as the reciprocal of the mean cluster size (*26*). The instantaneous extremal index can instead be defined as the inverse of the persistence in the phase space, where the persistence tells us how long the trajectory sticks in the proximity of a certain point in the phase space. In other words, while *d* is related to the density of points in a certain neighborhood of the phase space, i.e., how many times a certain region of the phase space is visited, the persistence time indicates for how long the system stays in a region in the neighborhood of a given state. If a state ζ is a fixed point, then we expect an infinite persistence, i.e., null θ. On the other hand, if we are studying a stochastic process, then we expect the persistence to tend to 0, i.e., unitary θ. In other words, we expect that at a certain epoch *t*, the system will be in a state ζ, and then at a subsequent epoch *t* + Δ*t* to be in a region of the phase space far from the one occupied at time *t*. Looking at the extremal index from this angle gives us the intuition that it can be related to the predictability of the system. As mentioned in the main text, in a recent study (*27*) this connection has been made explicit (see Eq. 1).

We calculate θ via a maximum likelihood estimator (*48*). When we look at the calculated θ (figs. S15 to S19 for noncausal filter and fig. S26 for causal and noncausal filter relative to segment #1), we see that both causally and noncausally filtered data show values far from 1. This already indicates that, no matter the causality or not of the filter, the system at the selected frequencies shows predictability features. We observe a different situation when performing the analysis on nonfiltered data (fig. S26). The values of θ are now very close to 1, implying a predictability horizon shorter than the sampling time. This is consistent with the fact that, in such nonfiltered slip potency rate time series, the high-frequency noise dominates, which is a random process.

### Surrogate data

The concept behind surrogate data techniques is rooted in statistical hypothesis testing. The method requires to state a null hypothesis, and, using the words in (*24*), “The idea is to test the original time series *against* the null hypothesis by checking whether the discriminating statistic computed for the original time series differs significantly from the statistics computed for each of the surrogate sets.” The necessity to use surrogate data derives from the intrinsic finiteness of the available data: It is always possible to generate the observations with a particular random process.

The null hypothesis that we test consists in assuming that the data can be described via a linear stochastic model. Surrogate data should be generated before any filtering (*49*); thus, we first generate the surrogate data from the original slip potency time series, then filter both the actual and surrogate data, and lastly calculate the slip potency rate. Since we are using slip potency rates on multiple subfaults, when shuffling the signal phases, we want to preserve not only the autocorrelation of each slip potency rate time series but also the cross correlations between subfaults, and we thus use a generalization of the phase-randomized Fourier transform algorithm (*50*). Despite the fact that filtering should not compromise the actual chaotic nature of the system (*37*), the estimate of *D* might depend on the applied filter, and we consequently test the method on both causally and noncausally filtered time series for the *f*_{cutoff}), we increase its order and we witness a decrease in *D* (figs. S22 and S45). We notice that among the tested EFs, those that do not remove relatively high frequencies *N _{b}* ≳ 365 to 425, i.e., 12 to 14 months) is unable to pass the surrogate data test. The

*f*

_{cutoff}, and we observe a clear stabilization of

*D*at values between 3 and 4 for all segments when decreasing

*f*

_{cutoff}(figs. S22 to S25 and S45). Moreover, we notice that when applying a 60th-order HWF, we would reject the null hypothesis here tested for the segments from #1 to #5 and not only for the two northernmost.

## SUPPLEMENTARY MATERIALS

Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/6/27/eaaz5548/DC1

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is **not** for commercial advantage and provided the original work is properly cited.

## REFERENCES AND NOTES

**Acknowledgments:**We thank K. V. Hodges, M. Ritzwoller, M. Shirzaei, and an anonymous reviewer for comments that helped improve the manuscript.

**Funding:**This study was partially supported by the National Science Foundation through award EAR-1821853. A.G. presented the results of this work at various conferences thanks to an appointment with the NASA Postdoctoral Program at the Jet Propulsion Laboratory, California Institute of Technology, administered by the Universities Space Research Association under contract with NASA, in collaboration with the Agenzia Spaziale Italiana (ASI). S.M. is currently supported by a postdoc fellowship from the French Centre National d’Études Spatiales (CNES). The calculations were performed using MATLAB. For the ET calculations, we have used and modified scripts developed by M. Kizilkaya (function cao_deneme.m), A. Leontitsis (Cahotic Systems Toolbox), and C. Gias (function phaseran.m), all available through the MathWorks portal.

**Author contributions:**A.G. conceived of the presented idea and performed the calculations. A.G. wrote the manuscript with support from J.-P.A. and feedback from S.M. and D.F. S.M. provided help with the data. D.F. supervised the application of EVT. All authors contributed to the discussion and interpretation of the results.

**Competing interests:**The authors declare that they have no competing interests.

**Data and materials availability:**The slip model in (

*20*), which is used as input in this study is available at ftp://ftp.gps.caltech.edu/pub/avouac/Cascadia_SSE_Nature/Data_for_Nature/. All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors.

- Copyright © 2020 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC).