def get_shuffle_significance(self, array, xyz, value, return_null_dist=False):
    """Returns p-value for the shuffle significance test."""
    null_dist = self._get_shuffle_dist(array, xyz,
                                       self.get_dependence_measure,
                                       sig_samples=self.sig_samples,
                                       sig_blocklength=self.sig_blocklength,
                                       verbosity=self.verbosity)
    # Sort the null distribution; the p-value is the fraction of shuffled
    # statistics at least as large as the observed value
    null_dist.sort()
    pval = (null_dist >= value).mean()
    if return_null_dist:
        return pval, null_dist
    return pval
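Conceptually, the shuffle test above permutes only the X-rows of the (dim, T) data array and recomputes the dependence measure each time; the p-value is the fraction of shuffled statistics at least as large as the observed one. A minimal standalone sketch of that idea (the name shuffle_pvalue and the plain per-sample permutation are illustrative assumptions; tigramite's _get_shuffle_dist additionally supports block-shuffling via sig_blocklength):

import numpy as np

def shuffle_pvalue(array, xyz, dependence_measure, sig_samples=500, seed=None):
    # array has shape (dim, T); xyz marks rows as X (0), Y (1), or Z (2)
    rng = np.random.default_rng(seed)
    value = dependence_measure(array, xyz)
    x_rows = np.where(xyz == 0)[0]
    null_dist = np.zeros(sig_samples)
    for i in range(sig_samples):
        shuffled = array.copy()
        # Permute the time indices of the X-rows only, destroying any
        # dependence of X on Y while preserving the marginals
        perm = rng.permutation(array.shape[1])
        shuffled[x_rows] = shuffled[np.ix_(x_rows, perm)]
        null_dist[i] = dependence_measure(shuffled, xyz)
    null_dist.sort()
    return (null_dist >= value).mean(), null_dist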
class CMIsymb(CondIndTest):
r"""Conditional mutual information test based on discrete estimator.
Conditional mutual information is the most general dependency measure
coming from an information-theoretic framework. It makes no assumptions
about the parametric form of the dependencies by directly estimating the
underlying joint density. The test here is based on directly estimating
the joint distribution assuming symbolic input, combined with a
shuffle test to generate the distribution under the null hypothesis of
independence. This estimator is suitable only for discrete variables.
For continuous variables, either pre-process the data using the functions
in data_processing or, better, use the CMIknn class.
Notes
-----
CMI and its estimator are given by

.. math:: I(X;Y|Z) &= \sum_z p(z) \sum_{x,y} p(x,y|z) \log
                \frac{ p(x,y|z)}{p(x|z)\cdot p(y|z)}
"""
def get_shuffle_significance(self, array, xyz, value, return_null_dist=False):
    """Returns p-value for the shuffle significance test."""
    null_dist = self._get_shuffle_dist(array, xyz,
                                       self.get_dependence_measure,
                                       sig_samples=self.sig_samples,
                                       sig_blocklength=self.sig_blocklength,
                                       verbosity=self.verbosity)
    pval = (null_dist >= value).mean()
    if return_null_dist:
        return pval, null_dist
    return pval
class RCOT(CondIndTest):
r"""Randomized Conditional Correlation Test.
Tests conditional independence in the fully non-parametric setting based on
Kernel measures. For all but very small sample sizes, the test can utilize an
analytic approximation of the null distribution, making it very fast. Based
on the R package ``rcit``. This test is described in [5]_.
Notes
-----
RCOT is a fast variant of the Kernel Conditional Independence Test (KCIT)
utilizing random Fourier features. Kernel tests measure conditional
independence in the fully non-parametric setting. In practice, RCOT tests
scale linearly with sample size and return accurate p-values much faster
than KCIT in the large sample size context. To use the analytical null
approximation, the sample size should be at least ~1000.
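The random-feature trick behind this speed-up can be sketched generically: draw random frequencies for an RBF kernel and map the data through cosines, so that inner products of the feature vectors approximate kernel evaluations (Rahimi & Recht). This is an illustration of the idea, not the rcit code; the names below are assumptions:

import numpy as np

def random_fourier_features(x, num_f=25, sigma=1.0, seed=0):
    # Map (T, d) data through num_f random cosine features of an RBF kernel
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    w = rng.normal(scale=1.0 / sigma, size=(x.shape[1], num_f))
    b = rng.uniform(0.0, 2 * np.pi, size=num_f)
    return np.sqrt(2.0 / num_f) * np.cos(x @ w + b)

x = np.random.default_rng(1).standard_normal((1000, 1))
phi = random_fourier_features(x)
# phi @ phi.T approximates the Gram matrix exp(-||xi - xj||^2 / (2 sigma^2)),
# so (partial) correlations of features stand in for kernel measures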
def get_analytic_significance(self, value, T, dim):
    """Returns p-value from the null distribution for df = T - dim
    degrees of freedom."""
    df = T - dim
    if df < 1:
        pval = np.nan
    else:
        # Generate the null distribution on the fly if it has not been
        # precomputed for these degrees of freedom
        if int(df) not in list(self.gauss_pr.null_dists):
            if self.verbosity > 0:
                print("Null distribution for GPDC not available "
                      "for deg. of freed. = %d." % df)
            self.generate_nulldist(df)
        null_dist_here = self.gauss_pr.null_dists[int(df)]
        pval = np.mean(null_dist_here > np.abs(value))
    return pval
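The role of generate_nulldist can be emulated by simulation: draw independent samples of the required size, compute their distance correlation, and keep the sorted statistics. Below is a plain-numpy sketch using the classic double-centered V-statistic of Szekely et al.; tigramite's estimator is Cython-based and first maps residuals to uniform marginals, hence the uniform draws. All function names here are illustrative assumptions.

import numpy as np

def distance_correlation(x, y):
    # Sample distance correlation via double-centered distance matrices
    # (O(n^2) memory; fine for moderate sample sizes)
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    return np.sqrt((A * B).mean() / np.sqrt((A * A).mean() * (B * B).mean()))

def simulate_nulldist(df, null_samples=1000, seed=0):
    # Null statistics: distance correlation of independent uniform samples
    rng = np.random.default_rng(seed)
    return np.sort([distance_correlation(rng.uniform(size=df),
                                         rng.uniform(size=df))
                    for _ in range(null_samples)])

# The p-value then follows as in the method above:
# null_dist = simulate_nulldist(df)
# pval = np.mean(null_dist > np.abs(value))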
class CMIknn(CondIndTest):
r"""Conditional mutual information test based on nearest-neighbor estimator.
Conditional mutual information is the most general dependency measure coming
from an information-theoretic framework. It makes no assumptions about the
parametric form of the dependencies by directly estimating the underlying
joint density. The test here is based on the estimator in S. Frenzel and B.
Pompe, Phys. Rev. Lett. 99, 204101 (2007), combined with a shuffle test to
generate the distribution under the null hypothesis of independence first
used in [3]_. The knn-estimator is suitable only for variables taking a
continuous range of values. For discrete variables use the CMIsymb class.
Notes
-----
CMI is given by
.. math:: I(X;Y|Z) &= \int p(z) \iint p(x,y|z) \log
                \frac{ p(x,y|z)}{p(x|z)\cdot p(y|z)} \,dx dy dz
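A compact nearest-neighbor estimator in the Frenzel-Pompe/KSG form (Chebyshev balls and digamma corrections) can be sketched with scipy; cmi_knn is a name made up for this illustration, and the sketch omits tigramite's optimizations and its permutation significance scheme:

import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def cmi_knn(x, y, z, k=5):
    # Distance to the k-th neighbor in the joint (X, Y, Z) space (max-norm)
    xyz = np.column_stack([x, y, z])
    eps = cKDTree(xyz).query(xyz, k=k + 1, p=np.inf)[0][:, -1]

    def neighbor_counts(data):
        # Points strictly within eps of each point, in a marginal subspace
        tree = cKDTree(data)
        return np.array([len(tree.query_ball_point(pt, r - 1e-12, p=np.inf)) - 1
                         for pt, r in zip(data, eps)])

    k_xz = neighbor_counts(np.column_stack([x, z]))
    k_yz = neighbor_counts(np.column_stack([y, z]))
    k_z = neighbor_counts(np.asarray(z).reshape(len(z), -1))
    # Frenzel-Pompe: psi(k) - < psi(k_xz+1) + psi(k_yz+1) - psi(k_z+1) >
    return digamma(k) + np.mean(digamma(k_z + 1) - digamma(k_xz + 1)
                                - digamma(k_yz + 1))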
def _trafo2uniform(self, x):
    """Transforms input array to uniform marginals via the empirical CDF.
    Assumes x.shape = (dim, T)."""
    def trafo(xi):
        # Interpolate each value onto its empirical CDF (normalized rank)
        xisorted = np.sort(xi)
        yi = np.linspace(1. / len(xi), 1, len(xi))
        return np.interp(xi, xisorted, yi)

    if np.ndim(x) == 1:
        u = trafo(x)
    else:
        # Transform each variable (row) separately
        u = np.empty(x.shape)
        for i in range(x.shape[0]):
            u[i] = trafo(x[i])
    return u
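As a quick standalone check (the comparison against scipy's rankdata is an assumption for illustration), the inner transform is just the empirical CDF, i.e. normalized ranks for distinct values:

import numpy as np
from scipy.stats import rankdata

xi = np.random.default_rng(0).standard_normal(1000)
u = np.interp(xi, np.sort(xi), np.linspace(1.0 / len(xi), 1.0, len(xi)))
assert np.allclose(u, rankdata(xi) / len(xi))   # normalized ranks in (0, 1]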
class ParCorr(CondIndTest):
r"""Partial correlation test.
Partial correlation is estimated through linear ordinary least squares (OLS)
regression and a test for non-zero linear Pearson correlation on the
residuals.
Notes
-----
To test :math:`X \perp Y | Z`, first :math:`Z` is regressed out from
:math:`X` and :math:`Y` assuming the model
.. math:: X & = Z \beta_X + \epsilon_{X} \\
        Y & = Z \beta_Y + \epsilon_{Y}
using OLS regression. Then the dependency of the residuals is tested with
the Pearson correlation test.
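This two-step procedure reads almost directly as code; the sketch below (parcorr_test is a hypothetical name, and scipy's Pearson test stands in for tigramite's analytic significance machinery) regresses Z out of both variables with an intercept and correlates the residuals:

import numpy as np
from scipy import stats

def parcorr_test(x, y, z):
    # x, y: (T,) arrays; z: (T,) or (T, dim_z) conditioning set
    zc = np.column_stack([np.ones(len(x)), z])   # add intercept column
    # OLS residuals of X and Y after regressing out Z
    resid_x = x - zc @ np.linalg.lstsq(zc, x, rcond=None)[0]
    resid_y = y - zc @ np.linalg.lstsq(zc, y, rcond=None)[0]
    # Pearson correlation test on the residuals: (statistic, p-value)
    return stats.pearsonr(resid_x, resid_y)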
# Construct the (dim, T) data array for X, Y, Z (call head restored
# from context)
array, xyz = self.dataframe.construct_array(X, Y, Z,
                                            tau_max=tau_max,
                                            mask_type=self.cond_ind_test.mask_type,
                                            return_cleaned_xyz=False,
                                            do_checks=False,
                                            verbosity=self.verbosity)
dim, T = array.shape
# The negative log-likelihood of the fitted model serves as the
# model-selection score (an Akaike-style criterion modulo constants)
_, logli = self._get_single_residuals(array,
                                      target_var=1,
                                      return_likelihood=True)
score = -logli
return score
class GPDC(CondIndTest):
r"""GPDC conditional independence test based on Gaussian processes and
distance correlation.
GPDC is based on a Gaussian process (GP) regression and a distance
correlation test on the residuals [2]_. The GP is estimated with
scikit-learn, which allows one to flexibly specify kernels and
hyperparameters or to let them be optimized automatically. The distance
correlation test is implemented in Cython. Here the null distribution is
not analytically available, but it can be precomputed with the function
generate_and_save_nulldists(...), which saves a \*.npz file containing the
null distribution for different sample sizes. This file can then be
supplied as null_dist_filename.
Notes
-----
GPDC is based on a Gaussian process (GP) regression and a distance
correlation test on the residuals [2]_.
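Under these assumptions the GPDC recipe can be outlined with scikit-learn; gpdc_residuals is a made-up helper, the RBF-plus-noise kernel is just one choice, and the resulting statistic would be compared against the precomputed null distribution as in get_analytic_significance above:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def gpdc_residuals(v, z):
    # GP-regress v on z; hyperparameters are optimized by maximizing the
    # marginal likelihood, as in the automatic option described above
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(),
                                  normalize_y=True)
    z2d = np.asarray(z).reshape(len(z), -1)
    gp.fit(z2d, v)
    return v - gp.predict(z2d)

# Test statistic: distance correlation of the two residual series, e.g.
# value = distance_correlation(gpdc_residuals(x, z), gpdc_residuals(y, z))
# (distance_correlation as in the sketch further above), then compare
# against the null distribution for df = T - dim.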