Research Article | Research Methods

Detecting and quantifying causal associations in large nonlinear time series datasets


Science Advances  27 Nov 2019:
Vol. 5, no. 11, eaau4996
DOI: 10.1126/sciadv.aau4996
  • Fig. 1 Causal discovery problem.

    Consider a large-scale time series dataset (A) from a complex system such as the Earth system, of which we try to estimate the underlying causal dependencies (B), accounting for linear and nonlinear dependencies and including their time lags (link labels). Pairwise correlations yield spurious dependencies due to common drivers (e.g., X1 ← X2 → X3) or transitive indirect paths (e.g., X2 → X3 → X4). Causal discovery aims to identify and exclude such spurious dependencies, leading to estimated causal networks that are, therefore, much sparser than correlation networks.
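The common-driver effect in (A) can be reproduced in a few lines. The following is a minimal, self-contained simulation (pure Python; the model, coefficients, and variable names are illustrative, not taken from the paper): two variables driven by a shared parent are strongly correlated, while their partial correlation given the driver is near zero.

```python
import math
import random

random.seed(0)
n = 2000

# Common driver X2 drives both X1 and X3; X1 and X3 share no direct link.
x2 = [random.gauss(0.0, 1.0) for _ in range(n)]
x1 = [0.8 * v + random.gauss(0.0, 0.5) for v in x2]
x3 = [0.8 * v + random.gauss(0.0, 0.5) for v in x2]

def mean(a):
    return sum(a) / len(a)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def corr(a, b):
    return cov(a, b) / math.sqrt(cov(a, a) * cov(b, b))

def residual(y, z):
    # residual of the least-squares regression of y on z
    beta = cov(y, z) / cov(z, z)
    my, mz = mean(y), mean(z)
    return [yi - my - beta * (zi - mz) for yi, zi in zip(y, z)]

r_raw = corr(x1, x3)                                   # spurious, large
r_partial = corr(residual(x1, x2), residual(x3, x2))   # near zero

print(r_raw, r_partial)
```

Conditioning on the common driver removes the association, which is exactly what distinguishes the partial-correlation-based methods in Fig. 2 from plain correlation.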

  • Fig. 2 Motivational climate example.

    Correlation, FullCI partial correlation, and PCMCI partial correlation between the monthly climate index Niño 3.4 (27) and land air temperature over British Columbia (BCT) (28) (A) for 1979–2017 (T = 468 months), as well as artificial variables [Z and Wi in (B and C)]. Node colors depict autocorrelation strength, edge colors the partial correlation effect size, and edge widths the detection rate estimated from 500 realizations of the artificial variables Z and Wi at a significance level of 5%. The maximum lag is τmax = 6. Correlation does not allow for a causal interpretation, leading to spurious correlations (gray edges) (A). FullCI identifies the correct direction Niño 3.4 → BCT but loses power because of smaller effect size (B) and higher dimensionality (C) if more variables are added. PCMCI avoids conditioning on irrelevant variables, leading to larger effect size, lower dimensionality, and, hence, higher detection power. See fig. S2 for more details.

  • Fig. 3 Proposed causal discovery method.

    (A) Time series graph (32–34) representing the time-lagged causal dependency structure underlying the data. FullCI tests the presence of a causal link by X^i_{t−τ} ⫫ X^j_t | X^−_t \ {X^i_{t−τ}}, where ⫫ denotes (conditional) independence and X^−_t \ {X^i_{t−τ}} the past of all N variables up to a maximum time lag τmax, excluding X^i_{t−τ} (gray boxes). (B) Illustration of the PC1 condition selection algorithm for the variables X1 (top) and X3 (bottom): The algorithm starts by initializing the preliminary parents P̂(X^j_t) = X^−_t. In the first iteration (p = 0), variables without even an unconditional dependency (e.g., uncorrelated ones) are removed from P̂(X^j_t) (lightest shade of red and blue, respectively). In the second iteration (p = 1), variables are removed that become independent conditional on the driver in P̂(X^j_t) with the largest dependency in the previous iteration. In the third iteration (p = 2), variables are removed that are independent conditional on the two strongest drivers, and so on, until there are no more conditions to test in P̂(X^j_t). In this way, PC1 adaptively converges to typically only a few relevant conditions (dark red/blue) that include the causal parents P with high probability and potentially some false positives (marked with a star). (C) These low-dimensional conditions are then used in the MCI conditional independence test: For testing X^1_{t−2} → X^3_t, the conditions P̂(X^3_t) (blue boxes) are sufficient to establish conditional independence, while the additional conditions on the parents P̂(X^1_{t−2}) (red boxes) account for autocorrelation and make MCI an estimator of causal strength. (D) Both the PC1 and the MCI stage can be flexibly combined with linear (ParCorr) or nonlinear (GPDC and CMI) independence tests (see section S4 and table S1). ParCorr assumes linear additive noise models, and GPDC assumes only additivity. The gray scatter plots illustrate regressions of X and Y on Z, and the black scatter plots the residuals. The red cubes in CMI illustrate the data-adaptive, model-free k-nearest-neighbor test (44), which does not require additivity.
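The iterative removal scheme in (B) can be sketched abstractly. The toy code below is not the paper's Algorithm S1: the conditional independence test is a hand-built oracle, candidate names are illustrative, and the dependency strengths are held fixed, whereas the full algorithm re-estimates them in each iteration. It only illustrates the control flow: in iteration p, each candidate is tested conditional on the p strongest remaining candidates and removed if found independent.

```python
def pc1_select(candidates, strengths, is_independent):
    """Simplified PC1 condition selection (cf. Fig. 3B).

    candidates     : list of candidate parents, e.g. (variable, lag) pairs
    strengths      : dict candidate -> dependency strength (held fixed here;
                     the full algorithm re-estimates these each iteration)
    is_independent : conditional independence test, called as
                     is_independent(candidate, conditions_tuple)
    """
    parents = list(candidates)
    p = 0
    while p < len(parents):  # stop when no more conditions to test
        # strongest-first ordering of the remaining candidates
        parents.sort(key=lambda c: strengths[c], reverse=True)
        for cand in list(parents):
            # condition on the p strongest other candidates
            conds = tuple(c for c in parents if c != cand)[:p]
            if len(conds) < p:
                continue  # not enough conditions left for this candidate
            if is_independent(cand, conds):
                parents.remove(cand)
        p += 1
    return set(parents)

# Toy oracle: X1 at lag 1 is a true parent; X2 at lag 2 is only
# spuriously dependent and becomes independent once X1 is conditioned on.
true_parent = ("X1", 1)
spurious = ("X2", 2)

def oracle(cand, conds):
    return cand == spurious and true_parent in conds

selected = pc1_select(
    [true_parent, spurious],
    {true_parent: 0.9, spurious: 0.4},
    oracle,
)
print(selected)  # the spurious candidate is removed
```

At p = 0 both candidates survive (both are unconditionally dependent); at p = 1 the spurious candidate becomes independent given the true parent and is dropped, mirroring the shading sequence in the figure.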

  • Fig. 4 Real-world applications.

    (A) Tropical climate example of dependencies between monthly surface pressure anomalies for 1948–2012 (T = 780 months) in the West Pacific (WPAC; regions depicted as shaded boxes below nodes), as well as surface air temperature anomalies in the Central (CPAC) and East Pacific (EPAC), and tropical Atlantic (ATL) (65). The left panel shows correlation (Corr), and the right panel shows PCMCI in the ParCorr implementation with τmax = 7 months to also capture long time lags. Significance was assessed at a strict 1% level. (B) Cardiovascular example of links between heart rate (B) and diastolic (D) and systolic (S) blood pressure (T = 600) of 13 healthy pregnant women. The left panel shows MI, and the right panel shows PCMCI in the CMI implementation with τmax = 5 heart beats and default parameters kCMI = 60 and kperm = 5 (see table S1). The graphs are obtained by analyzing PCMCI separately for the 13 datasets and showing only links that are significant at the 1% level in at least 80% of the subjects. In all panels, node colors depict autodependency strength and edge colors the cross-link strength at the lag with maximum absolute value. See lag functions in fig. S3 and Materials and Methods for more details on the datasets. Note the different scales in colorbars.

  • Fig. 5 Numerical experiments for causal network estimation.

    (A) The full model setup is described in section S6 and table S2. In total, 20 coupling topologies for each network size N were randomly created, where all cross-link coefficients are fixed while the variables have different autocorrelations. An example network for N = 10 with node colors denoting autocorrelation strength is shown, and the arrow colors denote the (positive or negative) coefficient strength. The arrow width illustrates the detection rate of a particular method. As indicated here, the boxplots in the figures below show the distribution of detection rates across individual links with the left (right) boxplot depicting links between weakly (strongly) autocorrelated variable pairs, defined by the average autocorrelation of both variables being smaller or larger than 0.7. (B) Example time series realization of a model depicting partially highly autocorrelated variables. Each method’s performance was assessed on 100 such realizations for each random network model. (C) Performance of different methods for models with linear relationships with time series length T = 150. Table S3 provides implementation details. The bottom row shows boxplot pairs (for weakly and strongly autocorrelated variables) of the distributions of false positives, and the top row shows the distributions of true positives for different network sizes N along the x axis in each plot. Average runtime and its SD are given on top. (D) Numerical experiments for nonlinear GPDC implementation with T = 250, where dCor denotes distance correlation. (E) Results for CMI implementation with T = 500, where MI denotes mutual information. In both panels, we differentiate between linear and two types of nonlinear links (top three rows). See table S2 for model setups and section S4 and table S1 for a description of the nonlinear conditional independence tests.
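The detection rates summarized by the boxplots reduce to simple per-link frequencies across realizations. A minimal sketch (illustrative names; not the paper's evaluation code) that turns per-realization sets of detected links into true-positive rates per true link and an overall false-positive rate:

```python
def link_detection_rates(true_links, estimated_runs, all_links):
    """true_links     : set of links present in the ground-truth network
    estimated_runs : list of sets, one set of detected links per realization
    all_links      : set of all links that could possibly be detected
    Returns (true-positive rate per true link, overall false-positive rate).
    """
    n_runs = len(estimated_runs)
    tpr = {
        link: sum(link in est for est in estimated_runs) / n_runs
        for link in true_links
    }
    absent = all_links - true_links
    false_hits = sum(len(est & absent) for est in estimated_runs)
    fpr = false_hits / (n_runs * len(absent)) if absent else 0.0
    return tpr, fpr

# Toy check: one true link detected in 2 of 3 runs, one false alarm in 1 run.
true_links = {("X1", "X2")}
runs = [{("X1", "X2")}, {("X1", "X2"), ("X3", "X2")}, set()]
all_links = {("X1", "X2"), ("X3", "X2")}
tpr, fpr = link_detection_rates(true_links, runs, all_links)
print(tpr, fpr)
```

The boxplots in (C) to (E) then simply aggregate such per-link rates, split by whether the pair of variables is weakly or strongly autocorrelated.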

  • Fig. 6 Numerical experiments for causal effect estimation.

    Shown is detection power (top row) and causal effect size (bottom row) as given by univariate linear regression (CE-Corr), multivariate regression on the whole past of the multivariate process (CE-Full), and multivariate regression on just the parents obtained with PCMCI (CE-PCMCI) for different link coefficient strengths c along the x axis in each plot. The last column denotes the regression on the true parents (CE-True). In the bottom row, the orange shades give the 1, 25, 75, and 99% quantiles and the median of the respective causal effects of all links (mean over 100 realizations for each link), and the black line denotes the true causal effect strength |c|, which is the same for all links in a model.
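The gap between CE-Corr and regression on the parents can be illustrated with a small simulation (pure Python; the model, coefficients, and names are illustrative, not the paper's setup): when a second, correlated parent is omitted, the univariate slope is biased away from the true coefficient c, while the multivariate regression on both parents recovers it.

```python
import random

random.seed(1)
T, burn = 5000, 100
a, e, c, d = 0.7, 0.7, 0.5, 0.5  # autocoeff, cross-coeff, true effects

# X is autocorrelated; Z is driven by X; Y has two parents, X and Z.
x, z, y = [0.0], [0.0], [0.0]
for _ in range(1, T + burn):
    x.append(a * x[-1] + random.gauss(0, 1))
    z.append(e * x[-2] + random.gauss(0, 1))          # z_t = e*x_{t-1} + noise
    y.append(c * x[-2] + d * z[-2] + random.gauss(0, 1))  # two lagged parents

x, z, y = x[burn:], z[burn:], y[burn:]
Y, X, Z = y[1:], x[:-1], z[:-1]   # regress y_t on x_{t-1}, z_{t-1}

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / len(u)

# CE-Corr analogue: univariate regression, omitting the parent Z
b_uni = cov(Y, X) / cov(X, X)

# CE on the parents: multivariate regression on X and Z (2x2 normal equations)
sxx, szz, sxz = cov(X, X), cov(Z, Z), cov(X, Z)
sxy, szy = cov(X, Y), cov(Z, Y)
det = sxx * szz - sxz * sxz
b_multi = (sxy * szz - szy * sxz) / det  # coefficient on X

print(b_uni, b_multi)  # b_uni is inflated; b_multi is close to c = 0.5
```

In this setup the univariate slope converges to c + d·a·e ≈ 0.745 rather than c, which is the kind of overestimation the CE-Corr panels display.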

Supplementary Materials

  • Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/5/11/eaau4996/DC1

    Section S1. Time series graphs

    Section S2. Alternative methods

    Section S3. Further PCMCI variants

    Section S4. Conditional independence tests

    Section S5. Theoretical properties of PCMCI

    Section S6. Numerical experiments

    Algorithm S1. Pseudo-code for condition selection algorithm.

    Algorithm S2. Pseudo-code for MCI causal discovery stage.

    Algorithm S3. Pseudo-code for adaptive Lasso regression.

    Table S1. Overview of conditional independence tests.

    Table S2. Model configurations for different experiments.

    Table S3. Overview of methods compared in numerical experiments.

    Table S4. Summarized ANOVA results for high-dimensionality ParCorr experiments.

    Table S5. Summarized ANOVA results for high-density ParCorr experiments.

    Table S6. Summarized ANOVA results for high-dimensionality GPDC and CMI experiments.

    Table S7. Summarized ANOVA results for sample size experiments.

    Table S8. Summarized ANOVA results for noise and nonstationarity experiments.

    Table S9. ANCOVA results for FullCI.

    Table S10. ANCOVA results for Lasso.

    Table S11. ANCOVA results for PC.

    Table S12. ANCOVA results for PCMCI.

    Fig. S1. Illustration of notation.

    Fig. S2. Motivational climate example.

    Fig. S3. Real climate and cardiovascular applications.

    Fig. S4. Experiments for linear models with short time series length.

    Fig. S5. Experiments for linear models with longer time series length.

    Fig. S6. Experiments for dense linear models with short time series length.

    Fig. S7. Experiments for dense linear models with longer time series length.

    Fig. S8. Experiments for different method parameters.

    Fig. S9. Experiments for linear methods with different sample sizes.

    Fig. S10. Experiments for nonlinear models (part 1).

    Fig. S11. Experiments for nonlinear models with different sample sizes (part 1).

    Fig. S12. Experiments for nonlinear models (part 2).

    Fig. S13. Experiments for nonlinear models with different sample sizes (part 2).

    Fig. S14. Experiments for observational noise models.

    Fig. S15. Experiments for nonstationary models.

    Fig. S16. Runtimes for numerical experiments.

    Fig. S17. ANCOVA interaction plots.

    Fig. S18. Comparison of PCMCI and CCM on logistic maps.

    References (66–80)
