Ensemble

The eryn.ensemble.Ensemble is the main driver and container for MCMC runs.

class eryn.ensemble.EnsembleSampler(nwalkers, ndims, log_like_fn, priors, provide_groups=False, provide_supplemental=False, tempering_kwargs={}, branch_names=None, nbranches=1, nleaves_max=1, nleaves_min=0, pool=None, moves=None, rj_moves=None, dr_moves=None, dr_max_iter=5, args=None, kwargs=None, backend=None, vectorize=False, blobs_dtype=None, plot_iterations=-1, plot_generator=None, plot_name=None, periodic=None, update_fn=None, update_iterations=-1, stopping_fn=None, stopping_iterations=-1, fill_zero_leaves_val=-1e+300, num_repeats_in_model=1, num_repeats_rj=1, track_moves=True, info={})

Bases: object

An ensemble MCMC sampler

The class controls the entire sampling run. It can handle everything from a basic non-tempered MCMC to a parallel-tempered, global fit containing multiple branches (models) and a variable number of leaves (sources) per branch. See here for a basic explainer.

Parameters related to parallelization can be controlled via the pool argument.

Parameters:

nwalkers (int) – The number of walkers in the ensemble per temperature.
ndims (int, list of ints, or dict) – The number of dimensions for each branch. If dict, keys should be the branch names and values the associated dimensionality.
log_like_fn (callable) –
A function that returns the natural logarithm of the likelihood for that position. The inputs to log_like_fn depend on whether the function is vectorized (kwarg vectorize below), if you are using reversible jump, and how many branches you have.

In the simplest case where vectorize == False, no reversible jump, and only one type of model, the inputs are just the array of parameters for one walker, so shape is (ndim,).

If vectorize == True, no reversible jimp, and only one type of model, the inputs will be a 2D array of parameters of all the walkers going in. Shape: (num positions, ndim).

If using reversible jump, the leaves that go together in the same Likelihood will be grouped together into a single function call. If vectorize == False, then each group is sent as an individual computation. With N different branches (N > 1), inputs would be a list of 2D arrays of the coordinates for all leaves within each branch: ([x0, x1,...,xN]) where xi is 2D with shape (number of leaves in this branch, ndim). If N == 1, then a list is not provided, just x0, the 2D array of coordinates for the one branch considered.

If using reversible jump and vectorize == True, then the arrays of parameters will be output with information as regards the grouping of branch and leaf set. Inputs will be ([X0, X1,..XN], [group0, group1,...,groupN]) where Xi is a 2D array of all leaves in the sampler for branch i. groupi is an index indicating which unique group that sources belongs. For example, if we have 3 walkers with (1, 2, 1) leaves for model i, respectively, we wil have an Xi = array([params0, params1, params2, params3]) and groupsi = array([0, 1, 1, 2]). If N == 1, then the lists are removed and the inputs become (X0, group0).

Extra args and kwargs for the Likelihood function can be added with the kwargs args and kwargs below.

Please see the tutorial for more information.
priors (dict) – The prior dictionary can take four forms. 1) A dictionary with keys as int or tuple containing the int or tuple of int that describe the parameter number over which to assess the prior, and values that are prior probability distributions that must have a logpdf class method. 2) A eryn.prior.ProbDistContainer object. 3) A dictionary with keys that are branch_names and values that are dictionaries for each branch as described for (1). 4) A dictionary with keys that are branch_names and values are eryn.prior.ProbDistContainer objects. If the priors dictionary has the specific key "all_models_together" in it, then a special prior must be input by the user that can produce the prior logpdf information that can depend on all of the models together rather than handling them separately as is the default. In this case, the user must input as the item attached to this special key a class object with a logpdf method that takes as input two arguments: (coords, inds), which are the coordinate and index dictionaries across all models with shapes of (ntemps, nwalkers, nleaves_max, ndim) and (ntemps, nwalkers, nleaves_max) for each individual model, respectively. This function must then return the numpy array of logpdf values for the prior value with shape (ntemps, nwalkers).
provide_groups (bool, optional) – If True, provide groups as described in log_like_fn above. A group parameter is added for each branch. (default: False)
provide_supplemental (bool, optional) –
If True, it will provide keyword arguments to the Likelihood function: supps and branch_supps. Please see the Tutorial and eryn.state.BranchSupplemental for more information.
tempering_kwargs (dict, optional) – Keyword arguments for initialization of the tempering class: eryn.moves.tempering.TemperatureControl. (default: {})
branch_names (list, optional) – List of branch names. If None, models will be assigned names as f"model_{index}" based on nbranches. (default: None)
nbranches (int, optional) – Number of branches (models) tested. Only used if branch_names is None. (default: 1)
nleaves_max (int, list of ints, or dict, optional) – Maximum allowable leaf count for each branch. It should have the same length as the number of branches. If dict, keys should be the branch names and values the associated maximal leaf value. (default: 1)
nleaves_min (int, list of ints, or dict, optional) – Minimum allowable leaf count for each branch. It should have the same length as the number of branches. Only used with Reversible Jump. If dict, keys should be the branch names and values the associated maximal leaf value. If None and using Reversible Jump, will fill all branches with zero. (default: 0)
pool (object, optional) – An object with a map method that follows the same calling sequence as the built-in map function. This is generally used to compute the log-probabilities for the ensemble in parallel.
moves (list or object, optional) – This can be a single move object, a list of moves, or a “weighted” list of the form [(eryn.moves.StretchMove(), 0.1), ...]. When running, the sampler will randomly select a move from this list (optionally with weights) for each proposal. If None, the default will be StretchMove. (default: None)
rj_moves (list or object or bool or str, optional) – If None or False, reversible jump will not be included in the run. This can be a single move object, a list of moves, or a “weighted” list of the form [(eryn.moves.DistributionGenerateRJ(), 0.1), ...]. When running, the sampler will randomly select a move from this list (optionally with weights) for each proposal. If True, it defaults to DistributionGenerateRJ. When running just the DistributionGenerateRJ with multiple branches, it will propose changes to all branhces simultaneously. When running with more than one branch, useful options for rj_moves are "iterate_branches", "separate_branches", or "together". If rj_moves == "iterate_branches", sample one branch by one branch in order of the branch names. This occurs within one RJ proposal, for each RJ proposal. If rj_moves == "separate_branches", there will be one RJ move per branch. During each individual RJ move, one of these proposals is chosen at random with equal propability. This is generally recommended when using multiple branches. If rj_moves == "together", this is equivalent to rj_moves == True. (default: None)
dr_moves (bool, optional) – If None ot False, delayed rejection when proposing “birth” of new components/models will be switched off for this run. Requires rj_moves set to True. Not implemented yet. Working on it. (default: None)
dr_max_iter (int, optional) – Maximum number of iterations used with delayed rejection. (default: 5)
args (optional) – A list of extra positional arguments for log_like_fn. log_like_fn will be called as log_like_fn(sampler added args, *args, sampler added kwargs, **kwargs).
kwargs (optional) – A dict of extra keyword arguments for log_like_fn. log_like_fn will be called as log_like_fn(sampler added args, *args, sampler added kwargs, **kwargs).
backend (optional) – Either a backends.Backend or a subclass (like backends.HDFBackend) that is used to store and serialize the state of the chain. By default, the chain is stored as a set of numpy arrays in memory, but new backends can be written to support other mediums.
vectorize (bool, optional) – If True, log_like_fn is expected to accept an array of position vectors instead of just one. Note that pool will be ignored if this is True. See log_like_fn information above to understand the arguments of log_like_fn based on whether vectorize is True. (default: False)
periodic (dict, optional) – Keys are branch_names. Values are dictionaries that have (key: value) pairs as (index to parameter: period). Periodic parameters are treated as having periodic boundary conditions in proposals.
update_fn (callable, optional) – eryn.utils.updates.AdjustStretchProposalScale object that allows the user to update the sampler in any preferred way every update_iterations sampler iterations. The callable must have signature: (sampler iteration, last sample state object, EnsembleSampler object).
update_iterations (int, optional) – Number of iterations between sampler updates using update_fn. Updates are only performed at the thinning rate. If thin_by>1 when EnsembleSampler.run_mcmc() is used, the sampler is updated every thin_by * update_iterations iterations.
stopping_fn (callable, optional) – eryn.utils.stopping.Stopping object that allows the user to end the sampler if specified criteria are met. The callable must have signature: (sampler iteration, last sample state object, EnsembleSampler object).
stopping_iterations (int, optional) – Number of iterations between sampler attempts to evaluate the stopping_fn. Stopping checks are only performed at the thinning rate. If thin_by>1 when EnsembleSampler.run_mcmc() is used, the sampler is checked for the stopping criterion every thin_by * stopping_iterations iterations.
fill_zero_leaves_val (double, optional) – When there are zero leaves in a given walker (across all branches), fill the likelihood value with fill_zero_leaves_val. If wanting to keep zero leaves as a possible model, this should be set to the value of the contribution to the Likelihood from the data. (Default: -1e300).
num_repeats_in_model (int, optional) – Number of times to repeat the in-model step within in one sampler iteration. When analyzing the acceptance fraction, you must include the value of num_repeats_in_model to get the proper denominator.
num_repeats_rj (int, optional) – Number of time to repeat the reversible jump step within in one sampler iteration. When analyzing the acceptance fraction, you must include the value of num_repeats_rj to get the proper denominator.
track_moves (bool, optional) – If True, track acceptance fraction of each move in the backend. If False, no tracking is done. If True and run is interrupted, it will check that the move configuration has not changed. It will not allow the run to go on if it is changed. In this case, the user should declare a new backend and use the last state from the previous backend. Warning: If the order of moves of the same move class is changed, the check may not catch it, so the tracking may mix move acceptance fractions together. (default: ``True’’)
info (dict, optional) – Key and value pairs reprenting any information the user wants to add to the backend if the user is not inputing their own backend.

Raises:

ValueError – Any startup issues.

property random_state: The state of the internal random number generator. In practice, it’s the result of calling get_state() on a numpy.random.mtrand.RandomState object. You can try to set this property but be warned that if you do this and it fails, it will do so silently.

property priors: Return the priors in the sampler.

reset(**kwargs)

Reset the backend.

Parameters:: **info (dict, optional) – information to pass to backend reset method.

get_model()

Get Model object from sampler

The model object is used to pass necessary information to the proposals. This method can be used to retrieve the model used in the sampler from outside the sampler.

Returns:: Model object used by sampler.
Return type:: Model

sample(initial_state, iterations=1, tune=False, skip_initial_state_check=True, thin_by=1, store=True, progress=False)

Advance the chain as a generator

Parameters:

initial_state (State or ndarray[ntemps, nwalkers, nleaves_max, ndim] or dict) – The initial State or positions of the walkers in the parameter space. If multiple branches used, must be dict with keys as the branch_names and values as the positions. If betas are provided in the state object, they will be loaded into the temperature_control.
iterations (int or None, optional) – The number of steps to generate. None generates an infinite stream (requires store=False). (default: 1)
tune (bool, optional) – If True, the parameters of some moves will be automatically tuned. (default: False)
thin_by (int, optional) – If you only want to store and yield every thin_by samples in the chain, set thin_by to an integer greater than 1. When this is set, iterations * thin_by proposals will be made. (default: 1)
store (bool, optional) – By default, the sampler stores in the backend the positions (and other information) of the samples in the chain. If you are using another method to store the samples to a file or if you don’t need to analyze the samples after the fact (for burn-in for example) set store to False. (default: True)
progress (bool or str, optional) – If True, a progress bar will be shown as the sampler progresses. If a string, will select a specific tqdm progress bar - most notable is 'notebook', which shows a progress bar suitable for Jupyter notebooks. If False, no progress bar will be shown. (default: False)
skip_initial_state_check (bool, optional) – If True, a check that the initial_state can fully explore the space will be skipped. If using reversible jump, the user needs to ensure this on their own (skip_initial_state_check``is set to ``False in this case. (default: True)

Returns:

Every thin_by steps, this generator yields the State of the ensemble.

Return type:

State

Raises:

ValueError – Improper initialization.

run_mcmc(initial_state, nsteps, burn=None, post_burn_update=False, **kwargs)

Iterate sample() for nsteps iterations and return the result.

Parameters:

initial_state (State or ndarray[ntemps, nwalkers, nleaves_max, ndim] or dict) – The initial State or positions of the walkers in the parameter space. If multiple branches used, must be dict with keys as the branch_names and values as the positions. If betas are provided in the state object, they will be loaded into the temperature_control.
nsteps (int) – The number of steps to generate. The total number of proposals is nsteps * thin_by.
burn (int, optional) – Number of burn steps to run before storing information. The thin_by kwarg is ignored when counting burn steps since there is no storage (equivalent to thin_by=1).
post_burn_update (bool, optional) – If True, run update_fn after burn in.

Other parameters are directly passed to sample().

Returns:: This method returns the most recent result from sample().
Return type:: State
Raises:: ValueError – If initial_state is None and run_mcmc has never been called.

compute_log_prior(coords, inds=None, supps=None, branch_supps=None)

Calculate the vector of log-prior for the walkers

Parameters:

coords (dict) – Keys are branch_names and values are the position np.arrays[ntemps, nwalkers, nleaves_max, ndim]. This dictionary is created with the branches_coords attribute from State.
inds (dict, optional) – Keys are branch_names and values are the inds np.arrays[ntemps, nwalkers, nleaves_max] that indicates which leaves are being used. This dictionary is created with the branches_inds attribute from State. (default: None)

Returns:

Prior Values

Return type:

np.ndarray[ntemps, nwalkers]

compute_log_like(coords, inds=None, logp=None, supps=None, branch_supps=None)

Calculate the vector of log-likelihood for the walkers

Parameters:

coords (dict) – Keys are branch_names and values are the position np.arrays[ntemps, nwalkers, nleaves_max, ndim]. This dictionary is created with the branches_coords attribute from State.
inds (dict, optional) – Keys are branch_names and values are the inds np.arrays[ntemps, nwalkers, nleaves_max] that indicates which leaves are being used. This dictionary is created with the branches_inds attribute from State. (default: None)
logp (np.ndarray[ntemps, nwalkers], optional) – Log prior values associated with all walkers. If not provided, it will be calculated because if a walker has logp = -inf, its likelihood is not calculated. This prevents evaluting likelihood outside the prior. (default: None)

Returns:

Carries log-likelihood and blob information.: First entry is np.ndarray[ntemps, nwalkers] with values corresponding to the log likelihood of each walker. Second entry is blobs.

Return type:

tuple

Raises:: ValueError: Infinite or NaN values in parameters.

property acceptance_fraction: The fraction of proposed steps that were accepted

property rj_acceptance_fraction: The fraction of proposed reversible jump steps that were accepted

property swap_acceptance_fraction: The fraction of proposed temperature swaps that were accepted

get_chain(**kwargs)

Get the stored chain of MCMC samples

Parameters:

thin (int, optional) – Take only every thin steps from the chain. (default: 1)
discard (int, optional) – Discard the first discard steps in the chain as burn-in. (default: 0)
slice_vals (indexing np.ndarray or slice, optional) – This is only available in eryn.backends.hdfbackend. If provided, slice the array directly from the HDF5 file with slice = slice_vals. thin and discard will be ignored if slice_vals is not None. This is particularly useful if files are very large and the user only wants a small subset of the overall array. (default: None)
temp_index (int, optional) – Integer for the desired temperature index. If None, will return all temperatures. (default: None)

Returns:

MCMC samples: The dictionary contains np.ndarrays of samples across the branches.

Return type:

dict

get_blobs(**kwargs)

Get the chain of blobs for each sample in the chain

Parameters:

thin (int, optional) – Take only every thin steps from the chain. (default: 1)
discard (int, optional) – Discard the first discard steps in the chain as burn-in. (default: 0)
slice_vals (indexing np.ndarray or slice, optional) – This is only available in eryn.backends.hdfbackend. If provided, slice the array directly from the HDF5 file with slice = slice_vals. thin and discard will be ignored if slice_vals is not None. This is particularly useful if files are very large and the user only wants a small subset of the overall array. (default: None)
temp_index (int, optional) – Integer for the desired temperature index. If None, will return all temperatures. (default: None)

Returns:

The chain of blobs.

Return type:

double np.ndarray[nsteps, ntemps, nwalkers, nblobs]

get_log_like(**kwargs)

Get the chain of log Prior evaluated at the MCMC samples

Parameters:

thin (int, optional) – Take only every thin steps from the chain. (default: 1)
discard (int, optional) – Discard the first discard steps in the chain as burn-in. (default: 0)
slice_vals (indexing np.ndarray or slice, optional) – This is only available in eryn.backends.hdfbackend. If provided, slice the array directly from the HDF5 file with slice = slice_vals. thin and discard will be ignored if slice_vals is not None. This is particularly useful if files are very large and the user only wants a small subset of the overall array. (default: None)
temp_index (int, optional) – Integer for the desired temperature index. If None, will return all temperatures. (default: None)

Returns:

The chain of log prior values.

Return type:

double np.ndarray[nsteps, ntemps, nwalkers]

get_log_prior(**kwargs)

Get the chain of log Prior evaluated at the MCMC samples

Parameters:

thin (int, optional) – Take only every thin steps from the chain. (default: 1)
discard (int, optional) – Discard the first discard steps in the chain as burn-in. (default: 0)
slice_vals (indexing np.ndarray or slice, optional) – This is only available in eryn.backends.hdfbackend. If provided, slice the array directly from the HDF5 file with slice = slice_vals. thin and discard will be ignored if slice_vals is not None. This is particularly useful if files are very large and the user only wants a small subset of the overall array. (default: None)
temp_index (int, optional) – Integer for the desired temperature index. If None, will return all temperatures. (default: None)

Returns:

The chain of log prior values.

Return type:

double np.ndarray[nsteps, ntemps, nwalkers]

get_log_posterior(**kwargs)

Get the chain of log posterior values evaluated at the MCMC samples

Parameters:

temper (bool, optional) – Apply tempering to the posterior values. (default: False)
thin (int, optional) – Take only every thin steps from the chain. (default: 1)
discard (int, optional) – Discard the first discard steps in the chain as burn-in. (default: 0)
slice_vals (indexing np.ndarray or slice, optional) – This is only available in eryn.backends.hdfbackend. If provided, slice the array directly from the HDF5 file with slice = slice_vals. thin and discard will be ignored if slice_vals is not None. This is particularly useful if files are very large and the user only wants a small subset of the overall array. (default: None)
temp_index (int, optional) – Integer for the desired temperature index. If None, will return all temperatures. (default: None)

Returns:

The chain of log prior values.

Return type:

double np.ndarray[nsteps, ntemps, nwalkers]

get_inds(**kwargs)

Get the stored chain of MCMC samples

Parameters:

thin (int, optional) – Take only every thin steps from the chain. (default: 1)
discard (int, optional) – Discard the first discard steps in the chain as burn-in. (default: 0)
slice_vals (indexing np.ndarray or slice, optional) – This is only available in eryn.backends.hdfbackend. If provided, slice the array directly from the HDF5 file with slice = slice_vals. thin and discard will be ignored if slice_vals is not None. This is particularly useful if files are very large and the user only wants a small subset of the overall array. (default: None)
temp_index (int, optional) – Integer for the desired temperature index. If None, will return all temperatures. (default: None)

Returns:

The inds associated with the MCMC samples.: The dictionary contains np.ndarrays of inds across the branches indicated which leaves were used at each step.

Return type:

dict

get_nleaves(**kwargs)

Get the number of leaves for each walker

Parameters:

thin (int, optional) – Take only every thin steps from the chain. (default: 1)
discard (int, optional) – Discard the first discard steps in the chain as burn-in. (default: 0)
slice_vals (indexing np.ndarray or slice, optional) – This is only available in eryn.backends.hdfbackend. If provided, slice the array directly from the HDF5 file with slice = slice_vals. thin and discard will be ignored if slice_vals is not None. This is particularly useful if files are very large and the user only wants a small subset of the overall array. (default: None)
temp_index (int, optional) – Integer for the desired temperature index. If None, will return all temperatures. (default: None)

Returns:

nleaves on each branch.

The number of leaves on each branch associated with the MCMC samples: within each branch.

Return type:

dict

get_last_sample(**kwargs)

Access the most recent sample in the chain

Returns:: eryn.state.State object containing the last sample from the chain.
Return type:: State

get_betas(**kwargs)

Get the chain of inverse temperatures

Parameters:

thin (int, optional) – Take only every thin steps from the chain. (default: 1)
discard (int, optional) – Discard the first discard steps in the chain as burn-in. (default: 0)
slice_vals (indexing np.ndarray or slice, optional) – This is only available in eryn.backends.hdfbackend. If provided, slice the array directly from the HDF5 file with slice = slice_vals. thin and discard will be ignored if slice_vals is not None. This is particularly useful if files are very large and the user only wants a small subset of the overall array. (default: None)
temp_index (int, optional) – Integer for the desired temperature index. If None, will return all temperatures. (default: None)

Returns:

The chain of temperatures.

Return type:

double np.ndarray[nsteps, ntemps]

get_value(name, **kwargs): Get a specific value

get_autocorr_time(**kwargs)

Compute an estimate of the autocorrelation time for each parameter

Parameters:

discard (int, optional) – Discard the first discard steps in the chain as burn-in. (default: 0)
thin (int, optional) – Use only every thin steps from the chain. The returned estimate is multiplied by thin so the estimated time is in units of steps, not thinned steps. (default: 1)
all_temps (bool, optional) – If True, calculate autocorrelation across all temperatures. If False, calculate autocorrelation across the minumum temperature chain (usually T=1). (default: False)
multiply_thin (bool, optional) – into the autocorrelation length. (default: True)

Other arguments are passed directly to emcee.autocorr.integrated_time().

Returns:

autocorrelation times: The dictionary contains autocorrelation times for all parameters as 1D double np.ndarrays as values with associated branch_names as keys.

Return type:

dict