Utilities

Utility functions and classes.

Periodic Container

class eryn.utils.PeriodicContainer(periodic)

Bases: object

Perform operations for periodic parameters

Parameters:

periodic_in (dict) – Keys are branch_names. Each value is a dictionary whose keys are parameter indexes and whose values are the associated periods.

distance(p1, p2, xp=None)

Move from p1 to p2 with periodic distance control

Parameters:

p1 (dict) – If dict, keys are branch_names and values are positions with parameters along the final dimension.
p2 (dict) – If dict, keys are branch_names and values are positions with parameters along the final dimension.
xp (object, optional) – numpy or cupy. If None, use numpy. (default: None)

Returns:

Distances accounting for periodicity. Keys are branch names and values are distance arrays.

Return type:

dict

wrap(p, xp=None)

Wrap p with periodic distance control

Parameters:

p (dict) – If dict, keys are branch_names and values are positions with parameters along the final dimension.
xp (object, optional) – numpy or cupy. If None, use numpy. (default: None)
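The idea behind wrap and distance can be illustrated with a minimal NumPy sketch. It is not the eryn implementation: the function names wrap_sketch and distance_sketch, the branch name "gauss", the wrapping range [0, period), and the sign convention p2 - p1 are all assumptions made for illustration. Only the layout of the periodic dictionary follows the description above.

```python
import numpy as np

# Hypothetical input in the documented layout: {branch_name: {parameter_index: period}}
periodic = {"gauss": {1: 2 * np.pi}}


def wrap_sketch(p, periodic):
    """Wrap periodic parameters into [0, period) (assumed convention)."""
    out = {}
    for name, coords in p.items():
        coords = np.array(coords, dtype=float)  # copy so the input is untouched
        for ind, period in periodic.get(name, {}).items():
            coords[..., ind] = np.mod(coords[..., ind], period)
        out[name] = coords
    return out


def distance_sketch(p1, p2, periodic):
    """Difference p2 - p1, taking the shortest way around each periodic parameter."""
    out = {}
    for name in p1:
        diff = np.asarray(p2[name], dtype=float) - np.asarray(p1[name], dtype=float)
        for ind, period in periodic.get(name, {}).items():
            # Map the raw difference into [-period/2, period/2)
            diff[..., ind] = np.mod(diff[..., ind] + period / 2, period) - period / 2
        out[name] = diff
    return out


p1 = {"gauss": np.array([[0.0, 0.1]])}
p2 = {"gauss": np.array([[1.0, 2 * np.pi - 0.1]])}
d = distance_sketch(p1, p2, periodic)
w = wrap_sketch({"gauss": np.array([[0.0, 2 * np.pi + 0.5]])}, periodic)
```

In this sketch the second parameter of p2 is only 0.2 away from that of p1 once the period is taken into account, so the periodic distance is -0.2 rather than roughly 6.08.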

TransformContainer

class eryn.utils.TransformContainer(parameter_transforms=None, fill_dict=None)

Bases: object

Container for helpful transformations

Parameters:

parameter_transforms (dict, optional) – Keys are int or tuple of int that contain the indexes into the parameters that correspond to the transformation added as the Values to the dict. If using fill_values, you must be careful with making sure parameter transforms properly comes before or after filling values. int indicate single parameter transforms. These are performed first. tuple of int indicates multiple parameter transforms. These are performed after single-parameter transforms. (default: None)
fill_dict (dict, optional) – Keys must contain 'ndim_full', 'fill_inds', and 'fill_values'. 'ndim_full' is the full last dimension of the final array after fill_values are added. 'fill_inds' and 'fill_values' are np.ndarray[number of fill values] that contain the indexes and corresponding values for filling. (default: None)

Raises:

ValueError – Input information is not correct.

transform_base_parameters(params, copy=True, return_transpose=False, xp=None)

Transform the base parameters

Parameters:

params (np.ndarray[..., ndim]) – Array with coordinates. This array is transformed according to the self.base_transforms dictionary.
copy (bool, optional) – If True, copy the input array. (default: True)
return_transpose (bool, optional) – If True, return the transpose of the array. (default: False)
xp (object, optional) – numpy or cupy. If None, use numpy. (default: None)

Returns:

Transformed params array.

Return type:

np.ndarray[…, ndim]

fill_values(params, xp=None)

Fill fixed parameters

Parameters:

params (np.ndarray[..., ndim]) – Array with coordinates. This array is filled with values according to the self.fill_dict dictionary.
xp (object, optional) – numpy or cupy. If None, use numpy. (default: None)

Returns:

Filled params array.

Return type:

np.ndarray[…, ndim_full]
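A minimal sketch of what filling fixed parameters amounts to, using only NumPy. The helper name fill_values_sketch and the example numbers are hypothetical, and the assumption that the sampled parameters occupy the remaining indexes in their original order is made for illustration. The fill_dict keys are the ones documented above.

```python
import numpy as np

# Hypothetical fill_dict in the documented layout
fill_dict = {
    "ndim_full": 5,
    "fill_inds": np.array([1, 3]),
    "fill_values": np.array([10.0, 20.0]),
}


def fill_values_sketch(params, fill_dict):
    """Insert fixed values into params along the last dimension."""
    params = np.asarray(params, dtype=float)
    ndim_full = fill_dict["ndim_full"]
    fill_inds = np.asarray(fill_dict["fill_inds"])
    # Indexes of the sampled (non-fixed) parameters in the full array
    free_inds = np.delete(np.arange(ndim_full), fill_inds)
    out = np.empty(params.shape[:-1] + (ndim_full,))
    out[..., free_inds] = params
    out[..., fill_inds] = fill_dict["fill_values"]
    return out


sampled = np.array([[1.0, 2.0, 3.0]])  # shape (..., ndim) with ndim = 3
full = fill_values_sketch(sampled, fill_dict)  # shape (..., ndim_full)
```

Here the three sampled values land at indexes 0, 2, and 4 of the full array, and the fixed values land at indexes 1 and 3.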

both_transforms(params, copy=True, return_transpose=False, reverse=False, xp=None)

Transform the parameters and fill fixed parameters

This fills the fixed parameters and then transforms all of them. Therefore, the user must be careful with the indexes input.

This is generally the direction recommended because fixed parameters may change non-fixed parameters during parameter transformations. This can be reversed with the reverse kwarg.

Parameters:

params (np.ndarray[..., ndim]) – Array with coordinates. This array is transformed according to the self.base_transforms dictionary.
copy (bool, optional) – If True, copy the input array. (default: True)
return_transpose (bool, optional) – If True, return the transpose of the array. (default: False)
reverse (bool, optional) – If True perform the filling after the transforms. This makes indexing easier, but removes the ability of fixed parameters to affect transforms. (default: False)
xp (object, optional) – numpy or cupy. If None, use numpy. (default: None)

Returns:

Transformed and filled params array.

Return type:

np.ndarray[…, ndim]
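The ordering described above (fill first, then apply single-parameter transforms, then multi-parameter transforms) can be sketched as follows. This is an illustration under stated assumptions, not the library code: transform_sketch and to_cartesian are hypothetical names, and the calling convention for tuple keys (one input array per index, one output array per index) is assumed.

```python
import numpy as np


def to_cartesian(r, phi):
    """Hypothetical joint transform of two parameters."""
    return r * np.cos(phi), r * np.sin(phi)


# int keys: single-parameter transforms; tuple keys: multi-parameter transforms
parameter_transforms = {
    0: np.exp,
    (1, 2): to_cartesian,
}


def transform_sketch(params, parameter_transforms):
    out = np.array(params, dtype=float)  # work on a copy
    # Single-parameter transforms are applied first
    for key, func in parameter_transforms.items():
        if isinstance(key, int):
            out[..., key] = func(out[..., key])
    # Multi-parameter transforms are applied afterwards
    for key, func in parameter_transforms.items():
        if isinstance(key, tuple):
            results = func(*[out[..., i] for i in key])
            for i, res in zip(key, results):
                out[..., i] = res
    return out


# Sampled parameters are (log_amp, r); phi is fixed and filled in at index 2
sampled = np.array([[0.0, 2.0]])
filled = np.insert(sampled, 2, np.pi / 2, axis=-1)
result = transform_sketch(filled, parameter_transforms)
```

Because the fixed value is filled in before the transforms run, the indexes in parameter_transforms refer to positions in the filled array, and the fixed phi can take part in the joint transform.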

Update functions

Update Base Class

class eryn.utils.Update

Bases: ABC, object

Update the sampler.

classmethod __call__(iter, last_sample, sampler)

Call update function.

Parameters:

iter (int) – Iteration of the sampler.
last_sample (obj) – Last state of sampler (eryn.state.State).
sampler (obj) – Full sampler object (eryn.ensemble.EnsembleSampler).

Implemented Update Functions

class eryn.utils.AdjustStretchProposalScale(target_acceptance=0.22, supression_factor=0.1, max_change=0.5, verbose=False)

Bases: Update

__call__(iter, last_sample, sampler)

Call update function.

Parameters:

iter (int) – Iteration of the sampler.
last_sample (obj) – Last state of sampler (eryn.state.State).
sampler (obj) – Full sampler object (eryn.ensemble.EnsembleSampler).

Stopping functions

Stopping Base Class

class eryn.utils.Stopping

Bases: ABC, object

Base class for stopping.

Stopping checks are only performed every thin_by iterations.

classmethod __call__(iter, last_sample, sampler)

Call stopping function.

Parameters:

iter (int) – Iteration of the sampler.
last_sample (obj) – Last state of sampler (eryn.state.State).
sampler (obj) – Full sampler object (eryn.ensemble.EnsembleSampler).

Returns:

Value of stop. If True, stop sampling.

Return type:

bool

Implemented Stopping Functions

class eryn.utils.SearchConvergeStopping(n_iters=30, diff=0.1, start_iteration=0, verbose=False)

Bases: Stopping

Stopping function based on convergence to a maximum Likelihood.

Stopping checks are only performed every thin_by iterations. Therefore, n_iters consecutive stopping checks span n_iters * thin_by sampler iterations.

All arguments are stored as attributes.

Parameters:

n_iters (int, optional) – Number of iterative stopping checks that need to pass in order to stop the sampler. (default: 30)
diff (float, optional) – Change in the Likelihood needed to fail the stopping check. In other words, if the new maximum Likelihood is more than diff greater than the old, all iterative checks reset. (default: 0.1).
start_iteration (int, optional) – Iteration of sampler to start checking to stop. (default: 0)
verbose (bool, optional) – If True, print information. (default: False)

iters_consecutive

Number of consecutive passes of the stopping check.

Type: int

past_like_best

Previous best Likelihood. The initial value is -np.inf.

Type: float

__call__(iter, sample, sampler)

Call stopping function.

Parameters:

iter (int) – Iteration of the sampler.
last_sample (obj) – Last state of sampler (eryn.state.State).
sampler (obj) – Full sampler object (eryn.ensemble.EnsembleSampler).

Returns:

Value of stop. If True, stop sampling.

Return type:

bool
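The convergence logic described for SearchConvergeStopping can be sketched without eryn as a plain callable with the same call signature. This is a hedged illustration, not the library code: the class name SearchConvergeSketch is hypothetical, and reading the current log-Likelihoods from a log_like attribute of last_sample is an assumption (a SimpleNamespace stands in for the sampler state here).

```python
import numpy as np
from types import SimpleNamespace


class SearchConvergeSketch:
    """Stop once the best log-Likelihood has not improved by more than diff
    for n_iters consecutive checks."""

    def __init__(self, n_iters=30, diff=0.1, start_iteration=0):
        self.n_iters = n_iters
        self.diff = diff
        self.start_iteration = start_iteration
        self.iters_consecutive = 0
        self.past_like_best = -np.inf

    def __call__(self, iter, last_sample, sampler):
        if iter < self.start_iteration:
            return False
        like_best = np.max(last_sample.log_like)  # assumed attribute
        if like_best - self.past_like_best > self.diff:
            # Significant improvement: remember it and reset the counter
            self.past_like_best = like_best
            self.iters_consecutive = 0
        else:
            self.iters_consecutive += 1
        return self.iters_consecutive >= self.n_iters


stop = SearchConvergeSketch(n_iters=3, diff=0.1)
history = [-5.0, -1.0, -0.95, -0.97, -0.96]  # made-up best log-Likelihood values
decisions = [
    stop(i, SimpleNamespace(log_like=np.array([v])), None)
    for i, v in enumerate(history)
]
```

With these made-up values, the first two checks reset the counter because the best log-Likelihood jumps by more than diff, and the following three checks each count as a pass, so the last call requests a stop.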

Sampler Model Container

The sampler model container (eryn.model.Model) is a named tuple that carries around some of the most important objects in the sampler. These are then passed into proposals for usage. The model container has the following keys:

log_like_fn – The log Likelihood function in the form of the function wrapper in ensemble.py.
compute_log_like_fn – The log Likelihood function from the sampler.
compute_log_prior_fn – The log prior function from the sampler.
temperature_control – The temperature controller.
map_fn – The map function where pool objects can be found.
random – The random generator.

After initializing the eryn.ensemble.EnsembleSampler object, the model container tuple can be accessed with the eryn.ensemble.EnsembleSampler.get_model() method. If you store this in a variable model, you can access each member as an attribute, e.g. model.compute_log_like_fn.
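For readers unfamiliar with named tuples, the following sketch builds a stand-in with the same keys using the standard library. ModelSketch and the placeholder callables are hypothetical and only illustrate attribute access; in practice the tuple comes from the sampler's get_model() method.

```python
from collections import namedtuple

# Stand-in with the same keys as the documented model container
ModelSketch = namedtuple(
    "ModelSketch",
    [
        "log_like_fn",
        "compute_log_like_fn",
        "compute_log_prior_fn",
        "temperature_control",
        "map_fn",
        "random",
    ],
)

model = ModelSketch(
    log_like_fn=None,                          # placeholder
    compute_log_like_fn=lambda x: -0.5 * x**2,  # placeholder Likelihood
    compute_log_prior_fn=lambda x: 0.0,         # placeholder prior
    temperature_control=None,
    map_fn=map,
    random=None,
)

# Members are accessed as attributes, as described above
value = model.compute_log_like_fn(2.0)
```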

Other Utility Functions

eryn.utils.utility.groups_from_inds(inds)

Convert inds to group information

Parameters:

inds (dict) – Keys are branch_names and values are inds np.ndarrays[ntemps, nwalkers, nleaves_max] that specify which leaves are used in this step.

Returns:

Dictionary with group information. Keys are branch_names and values are np.ndarray[total number of used leaves]. The array is flat.

Return type:

dict
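One plausible reading of "group information" is that every used leaf is labelled with the flattened index of the (temperature, walker) pair it belongs to. The sketch below implements that reading. It is an assumption for illustration and may differ from what groups_from_inds actually returns; the function name and the branch name "gauss" are hypothetical.

```python
import numpy as np


def groups_from_inds_sketch(inds):
    """Label each used leaf with its flattened (temperature, walker) index."""
    groups = {}
    for name, mask in inds.items():
        ntemps, nwalkers, nleaves_max = mask.shape
        walker_ids = np.arange(ntemps * nwalkers).reshape(ntemps, nwalkers, 1)
        walker_ids = np.broadcast_to(walker_ids, mask.shape)
        groups[name] = walker_ids[mask]  # flat array, one entry per used leaf
    return groups


# One temperature, two walkers, up to three leaves
inds = {"gauss": np.array([[[True, False, True], [False, False, True]]])}
groups = groups_from_inds_sketch(inds)
```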

eryn.utils.utility.get_acf(x, axis=0, fast=False)

Estimate the autocorrelation function of a time series using the FFT.

Parameters:

x – The time series. If multidimensional, set the time axis using the axis keyword argument and the function will be computed for every other axis.
axis – (optional) The time axis of x. Assumed to be the first axis if not specified.
fast – (optional) If True, only use the largest 2^n entries for efficiency. (default: False)
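A minimal one-dimensional sketch of an FFT-based autocorrelation estimate, using NumPy only. It illustrates the general technique and is not the get_acf implementation: acf_sketch is a hypothetical name, and the axis and fast options are left out.

```python
import numpy as np


def acf_sketch(x):
    """Normalized autocorrelation function of a 1D series via the FFT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Zero-pad to a power of two of at least 2n to avoid circular wrap-around
    nfft = 2 ** int(np.ceil(np.log2(2 * n)))
    f = np.fft.fft(x - x.mean(), n=nfft)
    acf = np.fft.ifft(f * np.conjugate(f))[:n].real
    return acf / acf[0]


rng = np.random.default_rng(0)
white = rng.normal(size=4096)
acf = acf_sketch(white)
```

For white noise the estimate equals one at lag zero and stays close to zero at all other lags.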

eryn.utils.utility.get_integrated_act(x, axis=0, window=50, fast=False, average=True)

Estimate the integrated autocorrelation time of a time series. See Sokal’s notes on MCMC and sample estimators for autocorrelation times.

Parameters:

x – The time series. If multidimensional, set the time axis using the axis keyword argument and the function will be computed for every other axis.
axis – (optional) The time axis of x. Assumed to be the first axis if not specified.
window – (optional) The size of the window to use. (default: 50)
fast – (optional) If True, only use the largest 2^n entries for efficiency. (default: False)
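A windowed estimate of the integrated autocorrelation time sums the normalized autocorrelation function up to the window, tau = 1 + 2 * sum of acf over lags 1 to window - 1. The sketch below applies this to a synthetic AR(1) chain. It is a hedged illustration, not the get_integrated_act code: the helper names are hypothetical, the exact window bounds are an assumption, and the axis, fast, and average options are omitted.

```python
import numpy as np


def acf_1d(x):
    """Normalized autocorrelation function of a 1D series via the FFT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    nfft = 2 ** int(np.ceil(np.log2(2 * n)))
    f = np.fft.fft(x - x.mean(), n=nfft)
    acf = np.fft.ifft(f * np.conjugate(f))[:n].real
    return acf / acf[0]


def integrated_act_sketch(x, window=50):
    """Windowed integrated autocorrelation time of a 1D series."""
    acf = acf_1d(x)
    return 1.0 + 2.0 * np.sum(acf[1:window])


# Synthetic AR(1) chain: x[t] = phi * x[t-1] + noise[t]
rng = np.random.default_rng(42)
phi = 0.5
noise = rng.normal(size=50_000)
chain = np.empty_like(noise)
chain[0] = noise[0]
for t in range(1, len(noise)):
    chain[t] = phi * chain[t - 1] + noise[t]

tau = integrated_act_sketch(chain)
```

For an AR(1) process the exact integrated autocorrelation time is (1 + phi) / (1 - phi), which is 3 for phi = 0.5, so the estimate should land near that value up to sampling noise.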

eryn.utils.utility.thermodynamic_integration_log_evidence(betas, logls)

Thermodynamic integration estimate of the evidence.

This function originated in ptemcee.

Parameters:

betas (np.ndarray[ntemps]) – The inverse temperatures to use for the quadrature.
logls (np.ndarray[ntemps]) – The mean log-Likelihoods corresponding to betas to use for computing the thermodynamic evidence.

Returns:

(logZ, dlogZ): An estimate of the log-evidence and the error associated with the finite number of temperatures at which the posterior has been sampled.

Return type:

tuple

The evidence is the integral of the un-normalized posterior over all of parameter space:

\[
Z \equiv \int d\theta \, l(\theta) p(\theta)
\]

Thermodynamic integration is a technique for estimating the evidence integral using information from the chains at various temperatures. Let

\[
Z(\beta) = \int d\theta \, l^\beta(\theta) p(\theta)
\]

Then

\[
\frac{d \log Z}{d \beta}
= \frac{1}{Z(\beta)} \int d\theta \, l^\beta(\theta) p(\theta) \log l(\theta)
= \left\langle \log l \right\rangle_\beta
\]

so

\[
\log Z(1) - \log Z(0)
= \int_0^1 d\beta \left\langle \log l \right\rangle_\beta
\]

By computing the average of the log-likelihood at the different temperatures, the sampler can approximate the above integral.
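The integral over beta can be approximated with the trapezoid rule on the temperature ladder. The sketch below does this with NumPy. It is an illustration under assumptions, not the library routine: the function names are hypothetical, the ladder is assumed to span beta = 0 to beta = 1, and the error estimate (difference to a coarser grid that keeps every other temperature plus the endpoint) is one common choice assumed here.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoid rule for samples y at increasing positions x."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))


def ti_log_evidence_sketch(betas, logls):
    """Thermodynamic integration estimate (logZ, dlogZ)."""
    order = np.argsort(betas)  # integrate over increasing beta
    b = np.asarray(betas, dtype=float)[order]
    y = np.asarray(logls, dtype=float)[order]
    logZ = trapezoid(y, b)
    # Coarser grid: every other temperature, always keeping the last one
    idx = np.unique(np.r_[np.arange(0, len(b), 2), len(b) - 1])
    logZ_coarse = trapezoid(y[idx], b[idx])
    return logZ, abs(logZ - logZ_coarse)


# Made-up mean log-likelihoods that are linear in beta: <log l> = 2 * beta
betas = np.linspace(0.0, 1.0, 5)
logls = 2.0 * betas
logZ, dlogZ = ti_log_evidence_sketch(betas, logls)
```

For this linear test case the trapezoid rule is exact, so the estimate equals the analytic integral of 2 * beta from 0 to 1.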

eryn.utils.utility.stepping_stone_log_evidence(betas, logls, block_len=50, repeats=100)

Stepping stone approximation for the evidence calculation.

Based on a. https://arxiv.org/abs/1810.04488 and b. https://pubmed.ncbi.nlm.nih.gov/21187451/.

Parameters:

betas (np.ndarray[ntemps]) – The inverse temperatures to use for the quadrature.
logls (np.ndarray[ntemps]) – The mean log-Likelihoods corresponding to betas to use for computing the thermodynamic evidence.
block_len (int) – The length of each chain block to compute the evidence from. Useful for computing the error-bars.
repeats (int) – The number of repeats to compute the evidence (using the block above).

Returns:

(logZ, dlogZ): An estimate of the log-evidence and the error associated with the finite number of temperatures at which the posterior has been sampled.

Return type:

tuple
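The stepping stone idea writes the evidence as a product of ratios Z(beta_{k+1}) / Z(beta_k), each estimated as the average of l^(beta_{k+1} - beta_k) over samples drawn at beta_k. The sketch below illustrates that identity with NumPy. It makes assumptions that go beyond the description above: it takes per-sample log-likelihoods of shape (nsamples, ntemps) rather than means, assumes the ladder spans beta = 0 to beta = 1, and leaves out the block_len / repeats error estimate. The function name is hypothetical.

```python
import numpy as np


def stepping_stone_sketch(betas, logl_samples):
    """Stepping stone estimate of log Z from per-sample log-likelihoods."""
    order = np.argsort(betas)  # increasing beta
    b = np.asarray(betas, dtype=float)[order]
    ll = np.asarray(logl_samples, dtype=float)[:, order]
    dbeta = np.diff(b)
    # Exponent of each ratio estimator, shape (nsamples, ntemps - 1)
    x = dbeta[None, :] * ll[:, :-1]
    # Numerically stable log of the mean of exp(x) for each temperature step
    xmax = x.max(axis=0)
    log_ratios = xmax + np.log(np.mean(np.exp(x - xmax), axis=0))
    return np.sum(log_ratios)


# Made-up check: a constant log-likelihood of -3 gives log Z = -3
betas = np.linspace(0.0, 1.0, 6)
logl_samples = np.full((100, len(betas)), -3.0)
logZ = stepping_stone_sketch(betas, logl_samples)
```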

eryn.utils.utility.psrf(C, ndims, per_walker=False)

The Gelman-Rubin convergence diagnostic. A general approach to monitoring convergence of MCMC output of multiple walkers. The function compares the within-chain and between-chain variances. A large deviation between these two variances indicates non-convergence, and the output [Rhat] deviates from unity.

By default, it combines the MCMC chains for all walkers, and then computes the Rhat for the first and last 1/3 parts of the traces. This can be tuned with the per_walker flag.

Based on:

a. Brooks, SP. and Gelman, A. (1998) General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7, 434-455.
b. Gelman, A and Rubin, DB (1992) Inference from iterative simulation using multiple sequences, Statistical Science, 7, 457-511.

Parameters:

C (np.ndarray[nwalkers, ndim]) – The parameter traces. The MCMC chains.
ndims (int) – The dimensions
per_walker (bool, optional) – If True, perform the test on each of the walkers separately instead of on the combined chains. (default: False)

Returns:

(Rhat, neff): An estimate of the Gelman-Rubin convergence diagnostic Rhat, and the effective number of samples neff.

Return type:

tuple
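The comparison of within-chain and between-chain variances can be sketched with the textbook Gelman-Rubin formula. This is an illustration, not the psrf implementation: the function name is hypothetical, the input layout (nchains, nsteps, ndim) is an assumption that differs from the signature above, and the effective sample size neff is not computed.

```python
import numpy as np


def gelman_rubin_sketch(chains):
    """Textbook Rhat per dimension for chains of shape (nchains, nsteps, ndim)."""
    chains = np.asarray(chains, dtype=float)
    m, n, _ = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean(axis=0)  # within-chain variance
    B = n * chain_means.var(axis=0, ddof=1)      # between-chain variance
    var_hat = (n - 1) / n * W + B / n            # pooled variance estimate
    return np.sqrt(var_hat / W)


rng = np.random.default_rng(1)
mixed = rng.normal(size=(4, 2000, 2))        # chains sampling the same target
stuck = mixed + np.arange(4)[:, None, None]  # chains offset from each other
rhat_mixed = gelman_rubin_sketch(mixed)
rhat_stuck = gelman_rubin_sketch(stuck)
```

Chains that sample the same distribution give Rhat close to one, while the offset chains give a clearly larger value because the between-chain variance dominates.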

Code taken from https://joergdietrich.github.io/emcee-convergence.html