ballet.validation.entropy module

ballet.validation.entropy.estimate_conditional_information(x, y, z)

Estimate the conditional mutual information of x and y given z.

Conditional mutual information is the mutual information of two datasets, given a third:

\[I(x;y|z) = H(x,z) + H(y,z) - H(x,y,z) - H(z)\]

where \(H(X)\) is the Shannon entropy of dataset \(X\). For continuous datasets, this function adapts the KSG estimator [1] for mutual information.

Eq. 8 from [1] holds because the epsilon terms cancel out. Let \(d_x\) represent the dimensionality of the continuous portion of \(x\). Then we see that:

\begin{align} d_{xz} + d_{yz} - d_{xyz} - d_z &= (d_x + d_z) + (d_y + d_z) - (d_x + d_y + d_z) - d_z \\ &= 0 \end{align}
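
The identity above can also be composed directly from calls to estimate_entropy, documented below. This is a schematic sketch of the formula, not necessarily how the library evaluates it internally; the 2-d array shapes are an assumption of the example:

>>> import numpy as np
>>> from ballet.validation.entropy import estimate_entropy
>>> def cmi_from_entropies(x, y, z):
...     """I(x;y|z) = H(x,z) + H(y,z) - H(x,y,z) - H(z)."""
...     h_xz = estimate_entropy(np.hstack([x, z]))
...     h_yz = estimate_entropy(np.hstack([y, z]))
...     h_xyz = estimate_entropy(np.hstack([x, y, z]))
...     return h_xz + h_yz - h_xyz - estimate_entropy(z)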
Parameters
  • x (ndarray) – An array with shape (n_samples, n_features_x)

  • y (ndarray) – An array with shape (n_samples, n_features_y)

  • z (ndarray) – An array with shape (n_samples, n_features_z). This is the dataset being conditioned on.

Return type

float

Returns

Conditional mutual information of x and y given z.

References:

[1] A. Kraskov, H. Stögbauer, and P. Grassberger, “Estimating mutual information”. Phys. Rev. E 69, 2004.
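
A minimal usage sketch (the synthetic data is an illustrative assumption, not taken from the library's documentation). Because x and y below depend on each other only through z, conditioning on z should drive the estimate toward zero:

>>> import numpy as np
>>> from ballet.validation.entropy import estimate_conditional_information
>>> rng = np.random.default_rng(0)
>>> z = rng.normal(size=(1000, 1))
>>> x = z + 0.1 * rng.normal(size=(1000, 1))  # depends on z
>>> y = z + 0.1 * rng.normal(size=(1000, 1))  # depends on z only
>>> estimate_conditional_information(x, y, z)  # expected to be near zero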

ballet.validation.entropy.estimate_entropy(x)

Estimate dataset entropy.

This function can take datasets of mixed discrete and continuous features, and uses a set of heuristics to determine which estimator to apply to each feature. Discrete (Shannon) entropy is estimated via the empirical probability mass function. Continuous (differential) entropy is estimated via the KSG estimator [1].

Let x be made of continuous features c and discrete features d. To deal with both kinds of features, we use the following decomposition of the entropy:

\begin{align} H(x) &= H(c,d) \\ &= H(d) + H(c | d) \\ &= \sum_{v \in d} p(v) H(c(v)) + H(d), \end{align}

where \(c(v)\) is the dataset formed from the rows of the continuous features \(c\) whose corresponding discrete features \(d\) take the value \(v\) in the original dataset.
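
To make the decomposition concrete, here is a schematic sketch of the conditioning step for a single 1-d discrete column d alongside continuous features c. The helper and its continuous_entropy argument (a stand-in for a KSG-style differential entropy estimator) are illustrative assumptions, not part of the library's API:

>>> import numpy as np
>>> def entropy_by_conditioning(c, d, continuous_entropy):
...     """Evaluate H(d) + sum over v of p(v) H(c(v)) from the formula above."""
...     values, counts = np.unique(d, return_counts=True)
...     p = counts / d.shape[0]
...     h_d = -np.sum(p * np.log(p))  # discrete Shannon entropy H(d)
...     h_c_given_d = sum(
...         p_v * continuous_entropy(c[d == v])  # H(c(v)), weighted by p(v)
...         for v, p_v in zip(values, p))
...     return h_d + h_c_given_d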

Parameters

x (ndarray) – Dataset with shape (n_samples, n_features) or (n_samples, )

Return type

float

Returns

Entropy of the dataset x.

References:

[1] A. Kraskov, H. Stögbauer, and P. Grassberger, “Estimating mutual information”. Phys. Rev. E 69, 2004.
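
A brief usage sketch (synthetic data, illustrative only). Whether a column is treated as discrete or continuous is decided by the library's heuristics, and differential entropy estimates can be negative:

>>> import numpy as np
>>> from ballet.validation.entropy import estimate_entropy
>>> rng = np.random.default_rng(0)
>>> disc = rng.integers(0, 4, size=(1000, 1))  # discrete feature
>>> cont = rng.normal(size=(1000, 1))          # continuous feature
>>> estimate_entropy(np.hstack([disc, cont]))  # mixed dataset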

ballet.validation.entropy.estimate_mutual_information(x, y)

Estimate the mutual information of two datasets.

Mutual information is a measure of dependence between two datasets and is calculated as:

\[I(x;y) = H(x) + H(y) - H(x,y)\]

where \(H(x)\) is the Shannon entropy of \(x\). For continuous datasets, this function adapts the KSG estimator [1] for mutual information.

Parameters
  • x (ndarray) – An array with shape (n_samples, n_features_x)

  • y (ndarray) – An array with shape (n_samples, n_features_y)

Return type

float

Returns

Mutual information of x and y.

References:

[1] A. Kraskov, H. Stögbauer, and P. Grassberger, “Estimating mutual information”. Phys. Rev. E 69, 2004.
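
A brief usage sketch (synthetic data, illustrative only). A noisy copy of x should score much higher than an independent draw:

>>> import numpy as np
>>> from ballet.validation.entropy import estimate_mutual_information
>>> rng = np.random.default_rng(0)
>>> x = rng.normal(size=(1000, 1))
>>> y = x + 0.1 * rng.normal(size=(1000, 1))  # noisy copy of x
>>> estimate_mutual_information(x, y)                           # large
>>> estimate_mutual_information(x, rng.normal(size=(1000, 1)))  # near zero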