Causality#
Utilities for Liang and Granger causality analysis in atmospheric and climate data
This module provides causality analysis tools for time series data commonly used in atmospheric and climate research.
Utilities for Liang and Granger causality analysis LinkedEarth/Pyleoclim_util
Overview#
The skyborn.causality module provides methods for analyzing causal relationships between time series in atmospheric and climate data. This module implements both Granger causality and Liang-Kleeman information flow methods with comprehensive significance testing.
Key Features#
Granger Causality: Statistical test for causality based on prediction improvement
Liang Information Flow: Physically-based causality measure using information theory
Significance Testing: Multiple methods for statistical significance assessment
AR(1) Modeling: Autoregressive model fitting for red noise generation
Phase Randomization: Surrogate data generation for null hypothesis testing
Methods Available#
Causality Analysis#
- granger_causality(y1, y2, maxlag=1, addconst=True, verbose=True)
Granger causality tests
Four tests for the Granger non-causality of 2 time series.
All four tests give similar results. params_ftest and ssr_ftest are equivalent F tests, and both are identical to lmtest::grangertest in R.
Wrapper for the functions described in statsmodels (https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.grangercausalitytests.html)
- Parameters:
y1 (array) – vector of real numbers (the series being predicted); same length as y2, no NaNs allowed
y2 (array) – vector of real numbers (the candidate cause); same length as y1, no NaNs allowed
maxlag (int or int iterable, optional) – If an integer, computes the test for all lags up to maxlag. If an iterable, computes the tests only for the lags in maxlag.
addconst (bool, optional) – Include a constant in the model.
verbose (bool, optional) – Print results
- Returns:
All test results, dictionary keys are the number of lags. For each lag the values are a tuple, with the first element a dictionary with test statistic, pvalues, degrees of freedom, the second element are the OLS estimation results for the restricted model, the unrestricted model and the restriction (contrast) matrix for the parameter f_test.
- Return type:
dict
Notes
The null hypothesis for Granger causality tests is that y2 does NOT Granger-cause y1. Granger causality means that past values of y2 have a statistically significant effect on the current value of y1, taking past values of y1 into account as regressors. We reject the null hypothesis that y2 does not Granger-cause y1 if the p-values are below a desired threshold (e.g. 0.05).
The null hypothesis for all four tests is that the coefficients corresponding to past values of the second time series are zero.
‘params_ftest’, ‘ssr_ftest’ are based on the F distribution
‘ssr_chi2test’, ‘lrtest’ are based on the chi-square distribution
See also
skyborn.causality.liang_causality: Information flow estimated using the Liang algorithm
skyborn.causality.signif_isopersist: Significance test with AR(1) with same persistence
skyborn.causality.signif_isospec: Significance test with surrogates with randomized phases
References
Granger, C. W. J. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37(3), 424-438.
Granger, C. W. J. (1980). Testing for causality: A personal viewpoint. Journal of Economic Dynamics and Control, 2, 329-352.
Granger, C. W. J. (1988). Some recent development in a concept of causality. Journal of Econometrics, 39(1-2), 199-211.
- liang_causality(y1, y2, npt=1, signif_test='isospec', nsim=1000, qs=[0.005, 0.025, 0.05, 0.95, 0.975, 0.995])
Liang-Kleeman information flow
Estimate the Liang information transfer from series y2 to series y1, with significance estimated using either AR(1) surrogates with the same persistence as the data or surrogates with randomized phases.
- Parameters:
y1 (array) – vector of real numbers (the series receiving the information flow); same length as y2, no NaNs allowed
y2 (array) – vector of real numbers (the series the information flows from); same length as y1, no NaNs allowed
npt (int >=1) – time advance in performing Euler forward differencing, e.g., 1, 2. Unless the series are generated with a highly chaotic deterministic system, npt=1 should be used
signif_test (str; {'isopersist', 'isospec'}) – the method for the significance test; see signif_isospec and signif_isopersist for details.
nsim (int) – the number of surrogates for the significance test
qs (list) – the quantiles for significance test
- Returns:
res – A dictionary of results including:
T21 : float - information flow from y2 to y1 (Note: not y1 -> y2!)
tau21 : float - the standardized information flow from y2 to y1
Z : float - the normalizer used to standardize the information flow (tau21 = T21 / Z)
dH1_star : float - dH*/dt (Liang, 2016)
dH1_noise : float
signif_qs : list - the quantiles for significance test
T21_noise : list - the quantiles of the information flow from noise2 to noise1 for significance testing
tau21_noise : list - the quantiles of the standardized information flow from noise2 to noise1 for significance testing
- Return type:
dict
See also
skyborn.causality.liang: Information flow estimated using the Liang algorithm
skyborn.causality.granger_causality: Information flow estimated using the Granger algorithm
skyborn.causality.signif_isopersist: Significance test with AR(1) with same persistence
skyborn.causality.signif_isospec: Significance test with surrogates with randomized phases
References
Liang, X. S. (2013). The Liang-Kleeman Information Flow: Theory and Applications. Entropy, 15, 327-360. doi:10.3390/e15010327
Liang, X. S. (2014). Unraveling the cause-effect relation between time series. Physical Review E, 90, 052150.
Liang, X. S. (2015). Normalizing the causality between time series. Physical Review E, 92, 022126.
Liang, X. S. (2016). Information flow and causality as rigorous notions ab initio. Physical Review E, 94, 052201.
Significance Testing#
- signif_isopersist(y1, y2, method, nsim=1000, qs=[0.005, 0.025, 0.05, 0.95, 0.975, 0.995], **kwargs)
Significance test using AR(1) surrogates with the same persistence as the input series
- Parameters:
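y1 (array) – vector of real numbers (the series receiving the information flow); same length as y2, no NaNs allowed
y2 (array) – vector of real numbers (the series the information flows from); same length as y1, no NaNs allowed
method (str; {'liang'}) – the causality method whose estimates are tested; 'liang' is the only listed option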
nsim (int) – the number of AR(1) surrogates for significance test
qs (list) – the quantiles for significance test
- Returns:
res_dict – A dictionary with the following information:
T21_noise_qs : list - the quantiles of the information flow from noise2 to noise1 for significance testing
tau21_noise_qs : list - the quantiles of the standardized information flow from noise2 to noise1 for significance testing
- Return type:
dict
See also
skyborn.causality.liang_causality: Information flow estimated using the Liang algorithm
skyborn.causality.granger_causality: Information flow estimated using the Granger algorithm
skyborn.causality.signif_isospec: Significance test with surrogates with randomized phases
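The isopersistent idea can be sketched with NumPy alone. The block below is an illustration, not the module's implementation: the helper names `lag1_autocorr` and `ar1_surrogates` are hypothetical, and the persistence is taken from the sample lag-1 autocorrelation rather than from an ARIMA fit. Each surrogate column is an independent AR(1) series with roughly the same persistence and variance as the input; the significance test would then evaluate the causality estimate on surrogate pairs and take quantiles of the results.

```python
import numpy as np

def lag1_autocorr(y):
    """Sample lag-1 autocorrelation (a simple stand-in for an AR(1) fit)."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    return float(np.sum(y[1:] * y[:-1]) / np.sum(y * y))

def ar1_surrogates(y, nsim, rng):
    """Return an (n, nsim) array of AR(1) series matching y's persistence and variance."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    g = lag1_autocorr(y)
    sigma = np.std(y)
    sigma_eps = sigma * np.sqrt(1.0 - g ** 2)   # innovation std that keeps the variance at sigma**2
    surr = np.empty((n, nsim))
    surr[0] = sigma * rng.standard_normal(nsim)
    for t in range(1, n):
        surr[t] = g * surr[t - 1] + sigma_eps * rng.standard_normal(nsim)
    return surr

# Synthetic red-noise series used only for illustration
rng = np.random.default_rng(1)
n = 1000
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + rng.standard_normal()

surr = ar1_surrogates(y, nsim=200, rng=rng)
surr_g = np.array([lag1_autocorr(surr[:, j]) for j in range(surr.shape[1])])
print(f"lag-1 autocorrelation: data = {lag1_autocorr(y):.3f}, surrogate mean = {surr_g.mean():.3f}")
```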
- signif_isospec(y1, y2, method, nsim=1000, qs=[0.005, 0.025, 0.05, 0.95, 0.975, 0.995], **kwargs)
Significance test using surrogates with randomized phases
- Parameters:
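y1 (array) – vector of real numbers (the series receiving the information flow); same length as y2, no NaNs allowed
y2 (array) – vector of real numbers (the series the information flows from); same length as y1, no NaNs allowed
method (str; {'liang'}) – the causality method whose estimates are tested; 'liang' is the only listed option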
nsim (int) – the number of surrogates for significance test
qs (list) – the quantiles for significance test
kwargs (dict) – keyword arguments for the causality method (e.g. npt for Liang-Kleeman)
- Returns:
res_dict – A dictionary with the following information:
T21_noise_qs : list - the quantiles of the information flow from noise2 to noise1 for significance testing
tau21_noise_qs : list - the quantiles of the standardized information flow from noise2 to noise1 for significance testing
- Return type:
dict
See also
skyborn.causality.liang_causality: Information flow estimated using the Liang algorithm
skyborn.causality.granger_causality: Information flow estimated using the Granger algorithm
skyborn.causality.signif_isopersist: Significance test with AR(1) with same persistence
Utility Functions#
- ar1_fit_evenly(y)
Returns the lag-1 autocorrelation from an AR(1) fit.
Uses statsmodels.tsa.arima.model.ARIMA to calculate the lag-1 autocorrelation.
Marked for deprecation once uar1_fit is adopted.
- Parameters:
y (array) – Vector of (float) numbers as a time series
- Returns:
g – Lag-1 autocorrelation coefficient
- Return type:
float
- phaseran(recblk, nsurr)
Simultaneous phase randomization of a set of time series
It creates blocks of surrogate data with the same second-order properties as the original time series dataset by transforming the original data into the frequency domain, randomizing the phases simultaneously across the time series, and converting the data back into the time domain.
Written by Carlos Gias for MATLAB
http://www.mathworks.nl/matlabcentral/fileexchange/32621-phase-randomization/content/phaseran.m
- Parameters:
recblk (numpy array) – 2D array; rows are time samples and columns are recordings. An odd number of time samples (rows) is expected. If that is not the case, recblk is reduced by 1 sample before the surrogate data is created. The array must be of floating-point type and dense (not sparse).
nsurr (int) – the number of surrogate blocks to generate.
- Returns:
surrblk – 3D array holding the surrogate datasets along the third dimension
- Return type:
numpy array
See also
skyborn.causality.liang_causality: Liang-Kleeman information flow analysis
skyborn.causality.granger_causality: Granger causality analysis
References
Prichard, D., & Theiler, J. (1994). Generating surrogate data for time series with several simultaneously measured variables. Physical Review Letters, 73(7), 951-954.
Gias, C. (2020). Phase randomization. MATLAB Central File Exchange.
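A minimal NumPy sketch of simultaneous phase randomization follows. It is not the phaseran code itself: the function name `phase_randomize` is hypothetical, and instead of trimming the input to an odd length it uses the real FFT and keeps the zero-frequency (and, for even lengths, Nyquist) term real. The same random phase is added to every column at each frequency, so both the amplitude spectra and the cross-spectra between columns are preserved.

```python
import numpy as np

def phase_randomize(recblk, nsurr, rng):
    """Return an (n, k, nsurr) array of phase-randomized surrogates of an (n, k) array."""
    x = np.asarray(recblk, dtype=float)
    n, k = x.shape
    spectrum = np.fft.rfft(x, axis=0)            # one-sided spectrum, shape (n // 2 + 1, k)
    nfreq = spectrum.shape[0]
    surr = np.empty((n, k, nsurr))
    for s in range(nsurr):
        # One phase per frequency, shared by all columns (simultaneous randomization)
        phases = rng.uniform(0.0, 2.0 * np.pi, size=(nfreq, 1))
        phases[0] = 0.0                          # leave the mean untouched
        if n % 2 == 0:
            phases[-1] = 0.0                     # the Nyquist term must stay real
        surr[:, :, s] = np.fft.irfft(spectrum * np.exp(1j * phases), n=n, axis=0)
    return surr

# Random two-column input used only for illustration
rng = np.random.default_rng(2)
data = rng.standard_normal((101, 2))
surrogates = phase_randomize(data, nsurr=5, rng=rng)

amp_data = np.abs(np.fft.rfft(data, axis=0))
amp_surr = np.abs(np.fft.rfft(surrogates[:, :, 0], axis=0))
print("amplitude spectrum preserved:", np.allclose(amp_data, amp_surr))
```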
Theoretical Background#
Granger Causality#
Granger causality tests whether past values of one time series help predict another time series beyond what can be predicted from the target series alone. The null hypothesis states that the second series does NOT Granger-cause the first.
Mathematical Foundation:
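For time series X and Y, Y Granger-causes X if:
\[\sigma^2(X_t \mid X_{t-1}, X_{t-2}, \ldots, Y_{t-1}, Y_{t-2}, \ldots) < \sigma^2(X_t \mid X_{t-1}, X_{t-2}, \ldots)\]
where \(\sigma^2\) denotes the prediction error variance of \(X_t\) given the listed past values.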
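The comparison of prediction error variances can be sketched with ordinary least squares in NumPy. This is an illustrative sketch, not the statsmodels routine that granger_causality wraps: the helper names (`lagged_design`, `ssr`, `granger_f`) are hypothetical. It fits a restricted model (own lags only) and an unrestricted model (own lags plus lags of the candidate cause) and forms the F statistic from the two sums of squared residuals.

```python
import numpy as np

def lagged_design(target, source, p):
    """Return the response and the restricted / unrestricted design matrices for p lags."""
    n = len(target)
    own = np.column_stack([target[p - k:n - k] for k in range(1, p + 1)])
    other = np.column_stack([source[p - k:n - k] for k in range(1, p + 1)])
    const = np.ones((n - p, 1))
    return target[p:], np.hstack([const, own]), np.hstack([const, own, other])

def ssr(y, X):
    """Sum of squared residuals of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def granger_f(target, source, p=1):
    """F statistic for 'source does not Granger-cause target', plus both error variances."""
    y, X_r, X_u = lagged_design(target, source, p)
    ssr_r, ssr_u = ssr(y, X_r), ssr(y, X_u)
    df_resid = len(y) - X_u.shape[1]
    f_stat = ((ssr_r - ssr_u) / p) / (ssr_u / df_resid)
    return f_stat, ssr_r / len(y), ssr_u / len(y)

# Synthetic pair in which x drives y but not the other way round
rng = np.random.default_rng(0)
n = 500
x = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    x[t] = 0.6 * x[t - 1] + rng.standard_normal()
    y[t] = 0.5 * y[t - 1] + 0.5 * x[t - 1] + rng.standard_normal()

f_xy, var_r, var_u = granger_f(y, x, p=2)   # does x help predict y?
f_yx, _, _ = granger_f(x, y, p=2)           # does y help predict x?
print(f"x -> y: F = {f_xy:.1f} (error variance {var_r:.3f} -> {var_u:.3f})")
print(f"y -> x: F = {f_yx:.1f}")
```

A large F statistic in one direction and a small one in the other is the pattern expected for a one-way coupling; a p-value would come from the F distribution with (p, df_resid) degrees of freedom.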
Liang-Kleeman Information Flow#
The Liang method quantifies information flow between time series using rigorous information theory principles. It measures the rate of information transfer from one series to another.
Key Metrics:
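T21: Information flow from series 2 to series 1
tau21: Normalized information flow, tau21 = T21 / Z
Z: Normalizer, the sum of the absolute values of T21 and of the other two contributions to the marginal entropy rate of series 1 (dH1_star and dH1_noise)
The maximum-likelihood estimator of T21 (Liang, 2014) is:
\[T_{2 \to 1} = \frac{C_{11} C_{12} C_{2,d1} - C_{12}^2 C_{1,d1}}{C_{11}^2 C_{22} - C_{11} C_{12}^2}\]
where \(C_{ij}\) are covariance matrix elements of the two series and \(C_{i,d1}\) is the covariance between series \(i\) and the Euler forward difference of series 1.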
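The metrics above can be sketched in NumPy as follows. This is a hedged illustration under stated assumptions, not the code behind liang_causality: the function name `liang_flow` is hypothetical, the estimator is the maximum-likelihood form of Liang (2014), and the normalization assumes Z = |T21| + |dH1_star| + |dH1_noise| as in Liang (2015).

```python
import numpy as np

def liang_flow(y1, y2, npt=1, dt=1.0):
    """Estimate the information flow from y2 to y1; returns (T21, tau21, Z)."""
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    x1, x2 = y1[:-npt], y2[:-npt]
    dx1 = (y1[npt:] - y1[:-npt]) / (npt * dt)    # Euler forward difference of y1
    n = len(x1)

    C = np.cov(x1, x2)                            # sample covariance matrix of (x1, x2)
    C11, C12, C22 = C[0, 0], C[0, 1], C[1, 1]
    C1d1 = np.cov(x1, dx1)[0, 1]                  # covariance of x1 with dx1
    C2d1 = np.cov(x2, dx1)[0, 1]                  # covariance of x2 with dx1

    det = C11 * C22 - C12 ** 2
    a11 = (C22 * C1d1 - C12 * C2d1) / det         # fitted dependence of dx1 on x1
    a12 = (C11 * C2d1 - C12 * C1d1) / det         # fitted dependence of dx1 on x2
    T21 = a12 * C12 / C11                         # algebraically equal to the closed-form ratio

    # Residual of the fitted linear model gives the noise contribution
    f1 = dx1.mean() - a11 * x1.mean() - a12 * x2.mean()
    resid = dx1 - (f1 + a11 * x1 + a12 * x2)
    b1_sq = np.sum(resid ** 2) * dt / n
    dH1_star = a11
    dH1_noise = b1_sq / (2.0 * C11)

    Z = abs(T21) + abs(dH1_star) + abs(dH1_noise)  # assumed normalizer (Liang, 2015)
    return T21, T21 / Z, Z

# Synthetic pair in which x1 drives x2 but not the other way round
rng = np.random.default_rng(3)
n = 2000
x1 = np.zeros(n)
x2 = np.zeros(n)
for i in range(1, n):
    x1[i] = 0.8 * x1[i - 1] + rng.standard_normal()
    x2[i] = 0.5 * x2[i - 1] + 0.4 * x1[i - 1] + rng.standard_normal()

t_fwd, tau_fwd, _ = liang_flow(x2, x1)   # flow from x1 to x2
t_rev, tau_rev, _ = liang_flow(x1, x2)   # flow from x2 to x1
print(f"x1 -> x2: T = {t_fwd:.3f}, tau = {tau_fwd:.3f}")
print(f"x2 -> x1: T = {t_rev:.3f}, tau = {tau_rev:.3f}")
```

Note the argument order: the first argument is the series receiving the flow, matching the y1, y2 convention of liang_causality.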
Usage Examples#
Basic Granger Causality Test#
import skyborn.causality as scaus
import numpy as np

# Generate sample atmospheric time series
np.random.seed(42)
n_samples = 1000

# Temperature-like series (a scaled random walk)
temp = np.cumsum(np.random.randn(n_samples)) * 0.1

# Pressure-like series with some dependence on temperature
pressure = np.zeros(n_samples)
for i in range(1, n_samples):
    pressure[i] = 0.7 * pressure[i-1] + 0.3 * temp[i-1] + np.random.randn()

# Test if temperature Granger-causes pressure (y1 = pressure, y2 = temp)
gc_result = scaus.granger_causality(pressure, temp, maxlag=5)

# Extract the F-statistic and p-value for each lag
for lag in gc_result:
    f_stat = gc_result[lag][0]['ssr_ftest'][0]
    p_value = gc_result[lag][0]['ssr_ftest'][1]
    print(f"Lag {lag}: F-statistic = {f_stat:.3f}, p-value = {p_value:.3f}")
Liang Information Flow Analysis#
import skyborn.causality as scaus
import numpy as np

# Generate coupled time series
np.random.seed(123)
n = 500

# Series 1: autonomous dynamics
x1 = np.zeros(n)
for i in range(1, n):
    x1[i] = 0.8 * x1[i-1] + np.random.randn()

# Series 2: driven by series 1
x2 = np.zeros(n)
for i in range(1, n):
    x2[i] = 0.5 * x2[i-1] + 0.4 * x1[i-1] + np.random.randn()

# Calculate Liang causality with significance testing (flow from x1 to x2)
result = scaus.liang_causality(x2, x1, signif_test='isospec', nsim=1000)
print(f"Information flow (T21): {result['T21']:.4f}")
print(f"Normalized flow (tau21): {result['tau21']:.4f}")
print(f"Total information (Z): {result['Z']:.4f}")

# Check significance (indices refer to the default qs list)
sig_level = 0.05
lower_bound = result['T21_noise'][1]   # 2.5th percentile
upper_bound = result['T21_noise'][-2]  # 97.5th percentile
if result['T21'] > upper_bound or result['T21'] < lower_bound:
    print(f"Causality is significant at {(1-sig_level)*100}% level")
else:
    print("Causality is not significant")
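The significance check above picks the bounds by position, which only works for the default `qs`. A small helper can look the bounds up by quantile value instead. This sketch assumes the result dictionary carries the `signif_qs` and `T21_noise` keys documented for liang_causality; the helper names and the numbers in the mock dictionary are hypothetical and serve only to exercise the logic.

```python
def quantile_bound(qs, values, q, tol=1e-9):
    """Return the entry of `values` whose quantile level in `qs` equals q."""
    for level, value in zip(qs, values):
        if abs(level - q) <= tol:
            return value
    raise ValueError(f"quantile {q} was not computed; available levels: {list(qs)}")

def flow_is_significant(result, alpha=0.05):
    """Two-sided check of T21 against the surrogate quantiles at level alpha."""
    lower = quantile_bound(result["signif_qs"], result["T21_noise"], alpha / 2)
    upper = quantile_bound(result["signif_qs"], result["T21_noise"], 1 - alpha / 2)
    return result["T21"] < lower or result["T21"] > upper

# Mock result with made-up numbers, shaped like the documented return value
mock_result = {
    "T21": 0.21,
    "signif_qs": [0.005, 0.025, 0.05, 0.95, 0.975, 0.995],
    "T21_noise": [-0.03, -0.02, -0.015, 0.015, 0.02, 0.03],
}
print("significant at 5%:", flow_is_significant(mock_result, alpha=0.05))
print("significant at 1%:", flow_is_significant(mock_result, alpha=0.01))
```

Raising an error when the requested level is missing avoids silently testing against the wrong bound if a custom `qs` list was used.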
Atmospheric Science Application#
import skyborn.causality as scaus
import numpy as np

# Fix the seed so the example is reproducible
np.random.seed(0)

# Simulate ENSO-like and temperature-like indices
def generate_enso_temp_data(n_years=50):
    n_months = n_years * 12
    t = np.arange(n_months)

    # ENSO-like oscillation (irregular ~3-7 year cycle)
    enso_base = np.sin(2 * np.pi * t / 42) + 0.5 * np.sin(2 * np.pi * t / 84)
    enso_noise = np.random.randn(n_months) * 0.5
    enso = enso_base + enso_noise

    # Temperature anomaly influenced by ENSO with a 3-month lag
    temp = np.zeros(n_months)
    for i in range(3, n_months):
        temp[i] = 0.6 * temp[i-1] + 0.3 * enso[i-3] + np.random.randn() * 0.3
    return enso, temp

# Generate data
enso_index, temp_anomaly = generate_enso_temp_data(40)

# Test causality: Does ENSO cause temperature changes?
liang_result = scaus.liang_causality(
    temp_anomaly, enso_index,
    signif_test='isopersist',
    nsim=2000
)
print("ENSO → Temperature Analysis:")
print(f"Information Flow: {liang_result['T21']:.4f}")
print(f"Normalized Flow: {liang_result['tau21']:.4f}")

# Compare with Granger causality
granger_result = scaus.granger_causality(temp_anomaly, enso_index, maxlag=6)
print("\nGranger Causality Results:")
for lag in [1, 3, 6]:
    if lag in granger_result:
        p_val = granger_result[lag][0]['ssr_ftest'][1]
        print(f"Lag {lag}: p-value = {p_val:.4f}")
Significance Testing Methods#
Isopersistent Testing (signif_isopersist)#
Tests significance using AR(1) surrogates with the same persistence (autocorrelation) as the original data. This method preserves the red noise characteristics while removing any causal relationships.
When to use:
- When your data shows significant autocorrelation
- For testing against a red noise null hypothesis
- When computational efficiency is important
Isospectral Testing (signif_isospec)#
Tests significance using phase-randomized surrogates that preserve the power spectrum of the original data. This method maintains spectral properties while destroying phase relationships.
When to use:
- When preserving spectral characteristics is important
- For more conservative significance testing
- When dealing with complex periodic behaviors
Interpretation Guidelines#
Granger Causality Interpretation#
p-value < 0.05: Reject the null hypothesis; evidence that y2 Granger-causes y1
p-value ≥ 0.05: Fail to reject the null hypothesis; insufficient evidence for Granger causality (this does not show that causality is absent)
Consider multiple lag values to capture different timescale relationships
Be aware that Granger causality tests statistical precedence, not physical causation
Liang Information Flow Interpretation#
T21 > 0: Series 2 makes series 1 more uncertain (it increases the marginal entropy of series 1)
T21 < 0: Series 2 makes series 1 more predictable (it reduces the marginal entropy of series 1)
|tau21| → 1: Strong relative causality
|tau21| → 0: Weak relative causality
Compare against significance bounds from surrogate testing
Best Practices#
Data Preparation#
Stationarity: Ensure time series are stationary or properly detrended
Length: Use sufficiently long time series (typically > 100 points)
Sampling: Ensure consistent and appropriate sampling rates
Missing Data: Handle gaps appropriately before analysis
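The preparation steps above can be sketched as a small preprocessing helper. This is one possible approach, assuming a linear trend is an adequate model of the non-stationarity; the name `prepare_series` is hypothetical and other detrending choices (differencing, high-pass filtering) may suit a given dataset better.

```python
import numpy as np

def prepare_series(y):
    """Reject NaNs, remove a linear trend, and scale to unit variance."""
    y = np.asarray(y, dtype=float)
    if np.isnan(y).any():
        raise ValueError("series contains NaNs; fill or remove gaps before the analysis")
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)       # least-squares linear trend
    detrended = y - (slope * t + intercept)
    return detrended / detrended.std()

# Synthetic trending series used only for illustration
rng = np.random.default_rng(4)
t = np.arange(600)
raw = 0.05 * t + rng.standard_normal(600)
clean = prepare_series(raw)
print(f"residual slope: {np.polyfit(t, clean, 1)[0]:.2e}, std: {clean.std():.3f}")
```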
Statistical Considerations#
Multiple Testing: Apply correction for multiple hypothesis testing
Lag Selection: Test multiple lag values for Granger causality
Surrogate Count: Use adequate number of surrogates (≥ 1000) for significance testing
Cross-Validation: Validate results on independent data when possible
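When several lags or variable pairs are tested, the p-values need a multiple-testing correction. The sketch below uses the Holm step-down procedure as one common choice; the source does not prescribe a particular correction, the function name `holm_reject` is hypothetical, and the p-values are made-up numbers for illustration.

```python
import numpy as np

def holm_reject(p_values, alpha=0.05):
    """Holm step-down correction: boolean array, True where the null is rejected."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    reject = np.zeros(m, dtype=bool)
    # Visit p-values from smallest to largest with a progressively looser threshold
    for rank, idx in enumerate(np.argsort(p)):
        if p[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break                                # all larger p-values are retained too
    return reject

# Hypothetical p-values, e.g. one per tested lag
lag_p_values = [0.001, 0.04, 0.03, 0.2]
print(holm_reject(lag_p_values).tolist())
```

In this example only the first p-value survives: 0.03 and 0.04 would pass an uncorrected 0.05 threshold but not the corrected ones.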
Physical Interpretation#
Mechanism: Consider physical mechanisms that could explain causal relationships
Timescales: Match analysis timescales to relevant physical processes
Confounding: Be aware of potential confounding variables
Bidirectionality: Test causality in both directions
References#
Granger Causality:
Granger, C. W. J. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37(3), 424-438.
Granger, C. W. J. (1980). Testing for causality: A personal viewpoint. Journal of Economic Dynamics and Control, 2, 329-352.
Liang-Kleeman Information Flow:
Liang, X. S. (2013). The Liang-Kleeman Information Flow: Theory and Applications. Entropy, 15, 327-360.
Liang, X. S. (2014). Unraveling the cause-effect relation between time series. Physical Review E, 90, 052150.
Liang, X. S. (2016). Information flow and causality as rigorous notions ab initio. Physical Review E, 94, 052201.
Surrogate Methods:
Prichard, D., & Theiler, J. (1994). Generating surrogate data for time series with several simultaneously measured variables. Physical Review Letters, 73(7), 951-954.