Causality#

Utilities for Liang and Granger causality analysis in atmospheric and climate data

This module provides causality analysis tools for time series data commonly used in atmospheric and climate research.

Utilities for Liang and Granger causality analysis LinkedEarth/Pyleoclim_util

Overview#

The skyborn.causality module provides methods for analyzing causal relationships between time series in atmospheric and climate data. This module implements both Granger causality and Liang-Kleeman information flow methods with comprehensive significance testing.

Key Features#

  • Granger Causality: Statistical test for causality based on prediction improvement

  • Liang Information Flow: Physically-based causality measure using information theory

  • Significance Testing: Multiple methods for statistical significance assessment

  • AR(1) Modeling: Autoregressive model fitting for red noise generation

  • Phase Randomization: Surrogate data generation for null hypothesis testing

Methods Available#

Causality Analysis#

granger_causality(y1, y2, maxlag=1, addconst=True, verbose=True)[source]#

Granger causality tests

Four tests for the Granger non-causality of 2 time series.

All four tests give similar results. params_ftest and ssr_ftest are equivalent: both are based on the F test and match lmtest::grangertest in R.

Wrapper for the functions described in statsmodels (https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.grangercausalitytests.html)

Parameters:
  • y1 (array) – vectors of (real) numbers with identical length, no NaNs allowed

  • y2 (array) – vectors of (real) numbers with identical length, no NaNs allowed

  • maxlag (int or int iterable, optional) – If an integer, computes the test for all lags up to maxlag. If an iterable, computes the tests only for the lags in maxlag.

  • addconst (bool, optional) – Include a constant in the model.

  • verbose (bool, optional) – Print results

Returns:

All test results as a dictionary keyed by the number of lags. Each value is a tuple: the first element is a dictionary of test statistics, p-values and degrees of freedom; the second element holds the OLS estimation results for the restricted model, the OLS estimation results for the unrestricted model, and the restriction (contrast) matrix for the parameter f_test.

Return type:

dict

Notes

The null hypothesis for Granger causality tests is that y2 does NOT Granger cause y1. Granger causality means that past values of y2 have a statistically significant effect on the current value of y1, taking past values of y1 into account as regressors. We reject the null hypothesis that y2 does not Granger cause y1 if the p-values are below a desired threshold (e.g. 0.05).

The null hypothesis for all four tests is that the coefficients corresponding to past values of the second time series are zero.

‘params_ftest’, ‘ssr_ftest’ are based on the F distribution

‘ssr_chi2test’, ‘lrtest’ are based on the chi-square distribution

See also

skyborn.causality.liang_causality

Information flow estimated using the Liang algorithm

skyborn.causality.signif_isopersist

Significance test with AR(1) with same persistence

skyborn.causality.signif_isospec

Significance test with surrogates with randomized phases

References

Granger, C. W. J. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37(3), 424-438.

Granger, C. W. J. (1980). Testing for causality: A personal viewpoint. Journal of Economic Dynamics and Control, 2, 329-352.

Granger, C. W. J. (1988). Some recent development in a concept of causality. Journal of Econometrics, 39(1-2), 199-211.

liang_causality(y1, y2, npt=1, signif_test='isospec', nsim=1000, qs=[0.005, 0.025, 0.05, 0.95, 0.975, 0.995])[source]#

Liang-Kleeman information flow

Estimate the Liang information transfer from series y2 to series y1, with significance estimated using either AR(1) surrogates that have the same persistence as the original series or surrogates with randomized phases.

Parameters:
  • y1 (array) – vectors of (real) numbers with identical length, no NaNs allowed

  • y2 (array) – vectors of (real) numbers with identical length, no NaNs allowed

  • npt (int >=1) – time advance in performing Euler forward differencing, e.g., 1, 2. Unless the series are generated with a highly chaotic deterministic system, npt=1 should be used

  • signif_test (str; {'isopersist', 'isospec'}) – the method used for the significance test; see signif_isospec and signif_isopersist for details.

  • nsim (int) – the number of surrogates for the significance test

  • qs (list) – the quantiles for significance test

Returns:

res – A dictionary of results including:

  • T21 : float - information flow from y2 to y1 (Note: not y1 -> y2!)

  • tau21 : float - the standardized information flow from y2 to y1

  • Z : float - the normalizer used to obtain tau21 (tau21 = T21 / Z)

  • dH1_star : float - dH*/dt (Liang, 2016)

  • dH1_noise : float

  • signif_qs : the quantiles for significance test

  • T21_noise : list - the quantiles of the information flow from noise2 to noise1 for significance testing

  • tau21_noise : list - the quantiles of the standardized information flow from noise2 to noise1 for significance testing

Return type:

dict

See also

skyborn.causality.liang

Information flow estimated using the Liang algorithm

skyborn.causality.granger_causality

Granger causality tests

skyborn.causality.signif_isopersist

Significance test with AR(1) with same persistence

skyborn.causality.signif_isospec

Significance test with surrogates with randomized phases

References

Liang, X.S. (2013) The Liang-Kleeman Information Flow: Theory and Applications. Entropy, 15, 327-360, doi:10.3390/e15010327

Liang, X.S. (2014) Unraveling the cause-effect relation between time series. Physical Review E, 90, 052150

Liang, X.S. (2015) Normalizing the causality between time series. Physical Review E, 92, 022126

Liang, X.S. (2016) Information flow and causality as rigorous notions ab initio. Physical Review E, 94, 052201

Significance Testing#

signif_isopersist(y1, y2, method, nsim=1000, qs=[0.005, 0.025, 0.05, 0.95, 0.975, 0.995], **kwargs)[source]#

Significance test using AR(1) surrogates with the same persistence as the original series

Parameters:
  • y1 (array) – vectors of (real) numbers with identical length, no NaNs allowed

  • y2 (array) – vectors of (real) numbers with identical length, no NaNs allowed

  • method (str; {'liang'}) – estimates for the Liang method

  • nsim (int) – the number of AR(1) surrogates for significance test

  • qs (list) – the quantiles for significance test

Returns:

res_dict

A dictionary with the following information:

  • T21_noise_qs : list - the quantiles of the information flow from noise2 to noise1 for significance testing

  • tau21_noise_qs : list - the quantiles of the standardized information flow from noise2 to noise1 for significance testing

Return type:

dict

See also

skyborn.causality.liang_causality

Information flow estimated using the Liang algorithm

skyborn.causality.granger_causality

Granger causality tests

skyborn.causality.signif_isospec

Significance test with surrogates with randomized phases

signif_isospec(y1, y2, method, nsim=1000, qs=[0.005, 0.025, 0.05, 0.95, 0.975, 0.995], **kwargs)[source]#

Significance test using surrogates with randomized phases

Parameters:
  • y1 (array) – vectors of (real) numbers with identical length, no NaNs allowed

  • y2 (array) – vectors of (real) numbers with identical length, no NaNs allowed

  • method (str; {'liang'}) – estimates for the Liang method

  • nsim (int) – the number of surrogates for significance test

  • qs (list) – the quantiles for significance test

  • kwargs (dict) – keyword arguments for the causality method (e.g. npt for Liang-Kleeman)

Returns:

res_dict

A dictionary with the following information:
  • T21_noise_qs : list - the quantiles of the information flow from noise2 to noise1 for significance testing

  • tau21_noise_qs : list - the quantiles of the standardized information flow from noise2 to noise1 for significance testing

Return type:

dict

See also

skyborn.causality.liang_causality

Information flow estimated using the Liang algorithm

skyborn.causality.granger_causality

Granger causality tests

skyborn.causality.signif_isopersist

Significance test with AR(1) with same persistence

Utility Functions#

ar1_fit_evenly(y)[source]#

Returns the lag-1 autocorrelation from AR(1) fit.

Uses statsmodels.tsa.arima.model.ARIMA to calculate the lag-1 autocorrelation.

MARK FOR DEPRECATION once uar1_fit is adopted

Parameters:

y (array) – Vector of (float) numbers as a time series

Returns:

g – Lag-1 autocorrelation coefficient

Return type:

float

phaseran(recblk, nsurr)[source]#

Simultaneous phase randomization of a set of time series

It creates blocks of surrogate data with the same second-order properties as the original time series dataset by transforming the original data into the frequency domain, randomizing the phases simultaneously across the time series, and converting the data back into the time domain.

Written by Carlos Gias for MATLAB

http://www.mathworks.nl/matlabcentral/fileexchange/32621-phase-randomization/content/phaseran.m

Parameters:
  • recblk (numpy array) – 2D array with rows as time samples and columns as recordings. An odd number of time samples (rows) is expected; if the number is even, recblk is shortened by one sample before the surrogate data is created. The array must be dense and of floating-point (double) type.

  • nsurr (int) – the number of surrogate blocks to generate.

Returns:

surrblk – 3D array containing the surrogate datasets along the third dimension

Return type:

numpy array

See also

skyborn.causality.liang_causality

Liang-Kleeman information flow analysis

skyborn.causality.granger_causality

Granger causality analysis

References

  • Prichard, D., & Theiler, J. (1994). Generating surrogate data for time series with several simultaneously measured variables. Physical Review Letters, 73(7), 951-954.

  • Carlos Gias (2020). Phase randomization, MATLAB Central File Exchange

Theoretical Background#

Granger Causality#

Granger causality tests whether past values of one time series help predict another time series beyond what can be predicted from the target series alone. The null hypothesis states that the second series does NOT Granger-cause the first.

Mathematical Foundation:

For time series X and Y, Y Granger-causes X if:

\[\sigma^2(X_t | X_{t-1}, X_{t-2}, \ldots) > \sigma^2(X_t | X_{t-1}, X_{t-2}, \ldots, Y_{t-1}, Y_{t-2}, \ldots)\]

where \(\sigma^2\) denotes the prediction error variance.
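The inequality can be checked numerically by fitting two least-squares regressions, one on the past of X only and one on the past of both X and Y, and comparing their residual variances. The sketch below is illustrative only: the helper names `lagged_design` and `residual_variance` and the synthetic data are assumptions made for this example and are not part of skyborn.causality, which wraps statsmodels instead.

```python
import numpy as np

def lagged_design(target, driver, maxlag, include_driver):
    """Build the regression matrix of lagged values for predicting target[t]."""
    n = len(target)
    cols = [np.ones(n - maxlag)]  # constant term
    for k in range(1, maxlag + 1):
        cols.append(target[maxlag - k:n - k])      # X_{t-k}
    if include_driver:
        for k in range(1, maxlag + 1):
            cols.append(driver[maxlag - k:n - k])  # Y_{t-k}
    return np.column_stack(cols)

def residual_variance(target, driver, maxlag, include_driver):
    """Least-squares prediction error variance of target[t]."""
    A = lagged_design(target, driver, maxlag, include_driver)
    b = target[maxlag:]
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    resid = b - A @ coef
    return resid @ resid / (len(b) - A.shape[1])

# Synthetic example: y is white noise and drives x with a one-step lag
rng = np.random.default_rng(0)
n = 2000
y = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + 0.8 * y[t - 1] + rng.standard_normal()

var_restricted = residual_variance(x, y, maxlag=2, include_driver=False)
var_unrestricted = residual_variance(x, y, maxlag=2, include_driver=True)
print(f"restricted: {var_restricted:.3f}, unrestricted: {var_unrestricted:.3f}")
```

Because y drives x in this synthetic setup, adding the past of y should reduce the prediction error variance noticeably. The formal tests additionally convert this reduction into an F or chi-square statistic.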

Liang-Kleeman Information Flow#

The Liang method quantifies information flow between time series using rigorous information theory principles. It measures the rate of information transfer from one series to another.

Key Metrics:

  • T21: Information flow from series 2 to series 1

  • tau21: Normalized information flow, T21 divided by the normalizer Z

  • Z: Normalizer, the sum of the absolute values of T21, dH1_star and dH1_noise (Liang, 2015)

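\[T_{2 \rightarrow 1} = \frac{C_{12}}{C_{11}} \cdot \frac{-C_{12}\,C_{1,d1} + C_{11}\,C_{2,d1}}{|C|}\]

where \(C_{ij}\) is the sample covariance between series \(i\) and series \(j\), \(|C| = C_{11}C_{22} - C_{12}^2\) is the determinant of the covariance matrix, and \(C_{i,d1}\) is the sample covariance between series \(i\) and the Euler forward difference of series 1 (Liang, 2014).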
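As a minimal sketch of how this estimator can be evaluated, the function below computes the information flow from the sample covariances of the two series and the covariances of each series with the Euler forward difference of the first one (Liang, 2014). The name `liang_t21` and the synthetic data are assumptions for illustration; the sketch omits the normalization and significance testing that liang_causality provides, and it assumes a unit time step.

```python
import numpy as np

def liang_t21(y1, y2, npt=1):
    """Estimate the information flow from y2 to y1 (illustrative sketch)."""
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    # Euler forward difference of y1 with unit time step and advance npt
    d1 = (y1[npt:] - y1[:-npt]) / npt
    y1, y2 = y1[:-npt], y2[:-npt]
    C = np.cov(y1, y2)              # 2x2 sample covariance matrix
    c1_d1 = np.cov(y1, d1)[0, 1]    # cov(y1, d1)
    c2_d1 = np.cov(y2, d1)[0, 1]    # cov(y2, d1)
    det_c = C[0, 0] * C[1, 1] - C[0, 1] ** 2
    return C[0, 1] / C[0, 0] * (-C[0, 1] * c1_d1 + C[0, 0] * c2_d1) / det_c

# Synthetic example: x1 evolves on its own, x2 is driven by x1
rng = np.random.default_rng(1)
n = 5000
x1 = np.zeros(n)
x2 = np.zeros(n)
for i in range(1, n):
    x1[i] = 0.8 * x1[i - 1] + rng.standard_normal()
    x2[i] = 0.5 * x2[i - 1] + 0.4 * x1[i - 1] + rng.standard_normal()

flow_x1_to_x2 = liang_t21(x2, x1)  # second argument is the source series
flow_x2_to_x1 = liang_t21(x1, x2)
print(f"x1 -> x2: {flow_x1_to_x2:.4f}, x2 -> x1: {flow_x2_to_x1:.4f}")
```

With one-way coupling from x1 to x2, the flow from x1 to x2 should be clearly nonzero while the reverse flow stays close to zero. Note the argument order: as in liang_causality, the flow is from the second argument to the first.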

Usage Examples#

Basic Granger Causality Test#

import skyborn.causality as scaus
import numpy as np

# Generate sample atmospheric time series
np.random.seed(42)
n_samples = 1000

# Temperature-like series
temp = np.cumsum(np.random.randn(n_samples)) * 0.1

# Pressure-like series with some dependence on temperature
pressure = np.zeros(n_samples)
for i in range(1, n_samples):
    pressure[i] = 0.7 * pressure[i-1] + 0.3 * temp[i-1] + np.random.randn()

# Test if temperature Granger-causes pressure
gc_result = scaus.granger_causality(pressure, temp, maxlag=5)

# Extract p-values for different lags
for lag in gc_result:
    f_stat = gc_result[lag][0]['ssr_ftest'][0]
    p_value = gc_result[lag][0]['ssr_ftest'][1]
    print(f"Lag {lag}: F-statistic = {f_stat:.3f}, p-value = {p_value:.3f}")

Liang Information Flow Analysis#

import skyborn.causality as scaus
import numpy as np

# Generate coupled time series
np.random.seed(123)
n = 500

# Series 1: autonomous dynamics
x1 = np.zeros(n)
for i in range(1, n):
    x1[i] = 0.8 * x1[i-1] + np.random.randn()

# Series 2: driven by series 1
x2 = np.zeros(n)
for i in range(1, n):
    x2[i] = 0.5 * x2[i-1] + 0.4 * x1[i-1] + np.random.randn()

# Calculate Liang causality with significance testing
result = scaus.liang_causality(x2, x1, signif_test='isospec', nsim=1000)

print(f"Information flow (T21): {result['T21']:.4f}")
print(f"Normalized flow (tau21): {result['tau21']:.4f}")
print(f"Normalizer (Z): {result['Z']:.4f}")

# Check significance
sig_level = 0.05
lower_bound = result['T21_noise'][1]  # 2.5th percentile
upper_bound = result['T21_noise'][-2]  # 97.5th percentile

if result['T21'] > upper_bound or result['T21'] < lower_bound:
    print(f"Causality is significant at the {(1 - sig_level) * 100:.0f}% level")
else:
    print("Causality is not significant")

Atmospheric Science Application#

import skyborn.causality as scaus
import numpy as np

# Simulate ENSO-like and temperature-like indices
def generate_enso_temp_data(n_years=50):
    n_months = n_years * 12
    t = np.arange(n_months)

    # ENSO-like oscillation (irregular ~3-7 year cycle)
    enso_base = np.sin(2 * np.pi * t / 42) + 0.5 * np.sin(2 * np.pi * t / 84)
    enso_noise = np.random.randn(n_months) * 0.5
    enso = enso_base + enso_noise

    # Temperature anomaly influenced by ENSO with lag
    temp = np.zeros(n_months)
    for i in range(3, n_months):
        temp[i] = 0.6 * temp[i-1] + 0.3 * enso[i-3] + np.random.randn() * 0.3

    return enso, temp

# Generate data
enso_index, temp_anomaly = generate_enso_temp_data(40)

# Test causality: Does ENSO cause temperature changes?
liang_result = scaus.liang_causality(
    temp_anomaly, enso_index,
    signif_test='isopersist',
    nsim=2000
)

print("ENSO → Temperature Analysis:")
print(f"Information Flow: {liang_result['T21']:.4f}")
print(f"Normalized Flow: {liang_result['tau21']:.4f}")

# Compare with Granger causality
granger_result = scaus.granger_causality(temp_anomaly, enso_index, maxlag=6)

print("\nGranger Causality Results:")
for lag in [1, 3, 6]:
    if lag in granger_result:
        p_val = granger_result[lag][0]['ssr_ftest'][1]
        print(f"Lag {lag}: p-value = {p_val:.4f}")

Significance Testing Methods#

Isopersistent Testing (signif_isopersist)#

Tests significance using AR(1) surrogates with the same persistence (autocorrelation) as the original data. This method preserves the red noise characteristics while removing any causal relationships.

When to use:

  • When your data shows significant autocorrelation

  • For testing against a red noise null hypothesis

  • When computational efficiency is important
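The idea behind isopersistent surrogates can be sketched with NumPy alone. The helpers `lag1_autocorr` and `ar1_surrogates` below are hypothetical names used for illustration; they estimate persistence with a simple sample autocorrelation, whereas the module's ar1_fit_evenly relies on a statsmodels ARIMA fit, so the numbers may differ slightly.

```python
import numpy as np

def lag1_autocorr(y):
    """Sample lag-1 autocorrelation (simple estimator)."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    return float(np.sum(y[1:] * y[:-1]) / np.sum(y * y))

def ar1_surrogates(y, nsim, rng):
    """Generate nsim AR(1) series with the persistence and variance of y."""
    n = len(y)
    g = lag1_autocorr(y)
    sigma = np.std(y) * np.sqrt(1.0 - g ** 2)  # innovation std matching var(y)
    surr = np.empty((n, nsim))
    surr[0] = rng.standard_normal(nsim) * np.std(y)
    for t in range(1, n):
        surr[t] = g * surr[t - 1] + sigma * rng.standard_normal(nsim)
    return surr

# Synthetic red-noise series with lag-1 autocorrelation near 0.7
rng = np.random.default_rng(2)
y = np.zeros(1000)
for t in range(1, y.size):
    y[t] = 0.7 * y[t - 1] + rng.standard_normal()

surrogates = ar1_surrogates(y, nsim=200, rng=rng)
print(surrogates.shape, round(lag1_autocorr(y), 3))
```

Each column of `surrogates` is an independent red-noise series, so any causality estimated between pairs of surrogates reflects chance alone. The quantiles of those estimates form the null distribution.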

Isospectral Testing (signif_isospec)#

Tests significance using phase-randomized surrogates that preserve the power spectrum of the original data. This method maintains spectral properties while destroying phase relationships.

When to use:

  • When preserving spectral characteristics is important

  • For more conservative significance testing

  • When dealing with complex periodic behaviors
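A minimal sketch of simultaneous phase randomization follows. The function `phase_randomize` is a hypothetical helper written for this example, not the module's phaseran: it returns a single surrogate and handles even lengths by keeping the Nyquist term fixed, whereas phaseran expects an odd number of samples and trims the input otherwise.

```python
import numpy as np

def phase_randomize(block, rng):
    """Return one surrogate of block (rows: time samples, columns: series)."""
    block = np.asarray(block, dtype=float)
    n = block.shape[0]
    spectrum = np.fft.rfft(block, axis=0)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.shape[0])
    phases[0] = 0.0        # keep the zero-frequency (mean) term unchanged
    if n % 2 == 0:
        phases[-1] = 0.0   # keep the Nyquist term unchanged for even lengths
    # The same phase shift is applied to every column at each frequency
    shifted = spectrum * np.exp(1j * phases)[:, None]
    return np.fft.irfft(shifted, n=n, axis=0)

rng = np.random.default_rng(3)
block = rng.standard_normal((256, 2))
surrogate = phase_randomize(block, rng)

# Amplitude spectra are preserved column by column
same_amplitudes = np.allclose(
    np.abs(np.fft.rfft(block, axis=0)),
    np.abs(np.fft.rfft(surrogate, axis=0)),
)
print(surrogate.shape, same_amplitudes)
```

Applying the same random phase to all columns at a given frequency leaves both the power spectra and the cross-spectrum between the columns unchanged, while the time-domain waveform is scrambled.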

Interpretation Guidelines#

Granger Causality Interpretation#

  • p-value < 0.05: Reject null hypothesis; evidence for Granger causality

  • p-value ≥ 0.05: Fail to reject null hypothesis; insufficient evidence for Granger causality at this threshold

  • Consider multiple lag values to capture different timescale relationships

  • Be aware that Granger causality tests statistical precedence, not physical causation

Liang Information Flow Interpretation#

  • T21 > 0: Series 2 tends to increase the uncertainty (entropy) of series 1

  • T21 < 0: Series 2 tends to reduce the uncertainty of series 1, i.e. it has a stabilizing effect

  • |tau21| → 1: Strong relative causality

  • |tau21| → 0: Weak relative causality

  • Compare against significance bounds from surrogate testing
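The comparison against surrogate bounds can be wrapped in a small helper that looks up the bounds by quantile value rather than by list position. The function `is_significant` is a hypothetical name, and the dictionary below uses made-up numbers that only mimic the keys documented for liang_causality (T21, signif_qs, T21_noise).

```python
def is_significant(result, lower_q=0.025, upper_q=0.975, key="T21"):
    """Check whether result[key] falls outside the surrogate quantile bounds."""
    qs = list(result["signif_qs"])
    noise = result[key + "_noise"]
    lower = noise[qs.index(lower_q)]
    upper = noise[qs.index(upper_q)]
    return bool(result[key] < lower or result[key] > upper)

# Made-up values for illustration only, not output from a real run
mock_result = {
    "T21": 0.12,
    "signif_qs": [0.005, 0.025, 0.05, 0.95, 0.975, 0.995],
    "T21_noise": [-0.06, -0.04, -0.03, 0.03, 0.04, 0.06],
}

print(is_significant(mock_result))
```

Looking up the bounds by quantile value avoids silent errors when a custom qs list is passed. Passing key="tau21" would apply the same check to the normalized flow.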

Best Practices#

Data Preparation#

  1. Stationarity: Ensure time series are stationary or properly detrended

  2. Length: Use sufficiently long time series (typically > 100 points)

  3. Sampling: Ensure consistent and appropriate sampling rates

  4. Missing Data: Handle gaps appropriately before analysis
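A minimal preprocessing sketch for the checklist above is given below. The helper `prepare_series` is a hypothetical name; it only rejects NaNs and removes a least-squares linear trend, which is one common choice and does not by itself guarantee stationarity.

```python
import numpy as np

def prepare_series(y):
    """Check for NaNs, then remove a least-squares linear trend."""
    y = np.asarray(y, dtype=float)
    if np.isnan(y).any():
        raise ValueError("series contains NaNs; fill or drop gaps first")
    t = np.arange(y.size)
    slope, intercept = np.polyfit(t, y, deg=1)
    return y - (slope * t + intercept)

# Synthetic series with a linear trend plus noise
rng = np.random.default_rng(4)
raw = 0.05 * np.arange(300) + rng.standard_normal(300)
detrended = prepare_series(raw)
print(f"mean after detrending: {detrended.mean():.2e}")
```

Both series passed to the causality functions would be prepared the same way, so that a shared trend is not mistaken for a causal link.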

Statistical Considerations#

  1. Multiple Testing: Apply correction for multiple hypothesis testing

  2. Lag Selection: Test multiple lag values for Granger causality

  3. Surrogate Count: Use adequate number of surrogates (≥ 1000) for significance testing

  4. Cross-Validation: Validate results on independent data when possible
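When several lags are tested, the resulting p-values can be adjusted before they are compared with a threshold. The sketch below applies the Holm step-down correction, chosen here only as one standard option; the helper `holm_adjust` and the p-values are made up for illustration.

```python
import numpy as np

def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Multiply the k-th smallest p-value by (m - k), k = 0..m-1
    scaled = p[order] * (m - np.arange(m))
    # Enforce monotonicity and cap at 1
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted

# Made-up p-values for four lags, for illustration only
p_by_lag = {1: 0.004, 2: 0.030, 3: 0.020, 4: 0.400}
adjusted = holm_adjust(list(p_by_lag.values()))
for lag, p_adj in zip(p_by_lag, adjusted):
    print(f"Lag {lag}: adjusted p-value = {p_adj:.3f}")
```

In this made-up case only lag 1 remains below 0.05 after adjustment, although three of the raw p-values were below that threshold.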

Physical Interpretation#

  1. Mechanism: Consider physical mechanisms that could explain causal relationships

  2. Timescales: Match analysis timescales to relevant physical processes

  3. Confounding: Be aware of potential confounding variables

  4. Bidirectionality: Test causality in both directions

References#

Granger Causality:

  1. Granger, C. W. J. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37(3), 424-438.

  2. Granger, C. W. J. (1980). Testing for causality: A personal viewpoint. Journal of Economic Dynamics and Control, 2, 329-352.

Liang-Kleeman Information Flow:

  1. Liang, X. S. (2013). The Liang-Kleeman Information Flow: Theory and Applications. Entropy, 15, 327-360.

  2. Liang, X. S. (2014). Unraveling the cause-effect relation between time series. Physical Review E, 90, 052150.

  3. Liang, X. S. (2016). Information flow and causality as rigorous notions ab initio. Physical Review E, 94, 052201.

Surrogate Methods:

  1. Prichard, D., & Theiler, J. (1994). Generating surrogate data for time series with several simultaneously measured variables. Physical Review Letters, 73(7), 951-954.