Ensemble
The eryn.ensemble.Ensemble is the main driver and container for MCMC runs.
- class eryn.ensemble.EnsembleSampler(nwalkers, ndims, log_like_fn, priors, provide_groups=False, provide_supplemental=False, tempering_kwargs={}, branch_names=None, nbranches=1, nleaves_max=1, nleaves_min=0, pool=None, moves=None, rj_moves=None, dr_moves=None, dr_max_iter=5, args=None, kwargs=None, backend=None, vectorize=False, blobs_dtype=None, plot_iterations=-1, plot_generator=None, plot_name=None, periodic=None, update_fn=None, update_iterations=-1, stopping_fn=None, stopping_iterations=-1, fill_zero_leaves_val=-1e+300, num_repeats_in_model=1, num_repeats_rj=1, track_moves=True, info={})
- Bases: - object- An ensemble MCMC sampler - The class controls the entire sampling run. It can handle everything from a basic non-tempered MCMC to a parallel-tempered, global fit containing multiple branches (models) and a variable number of leaves (sources) per branch. See here for a basic explainer. - Parameters related to parallelization can be controlled via the - poolargument.- Parameters:
- nwalkers (int) – The number of walkers in the ensemble per temperature. 
- ndims (int, list of ints, or dict) – The number of dimensions for each branch. If - dict, keys should be the branch names and values the associated dimensionality.
- log_like_fn (callable) – - A function that returns the natural logarithm of the likelihood for that position. The inputs to - log_like_fndepend on whether the function is vectorized (kwarg- vectorizebelow), if you are using reversible jump, and how many branches you have.- In the simplest case where - vectorize == False, no reversible jump, and only one type of model, the inputs are just the array of parameters for one walker, so shape is- (ndim,).- If - vectorize == True, no reversible jimp, and only one type of model, the inputs will be a 2D array of parameters of all the walkers going in. Shape:- (num positions, ndim).- If using reversible jump, the leaves that go together in the same Likelihood will be grouped together into a single function call. If - vectorize == False, then each group is sent as an individual computation. With- Ndifferent branches (- N > 1), inputs would be a list of 2D arrays of the coordinates for all leaves within each branch:- ([x0, x1,...,xN])where- xiis 2D with shape- (number of leaves in this branch, ndim). If- N == 1, then a list is not provided, just x0, the 2D array of coordinates for the one branch considered.- If using reversible jump and - vectorize == True, then the arrays of parameters will be output with information as regards the grouping of branch and leaf set. Inputs will be- ([X0, X1,..XN], [group0, group1,...,groupN])where- Xiis a 2D array of all leaves in the sampler for branch- i.- groupiis an index indicating which unique group that sources belongs. For example, if we have 3 walkers with (1, 2, 1) leaves for model- i, respectively, we wil have an- Xi = array([params0, params1, params2, params3])and- groupsi = array([0, 1, 1, 2]). If- N == 1, then the lists are removed and the inputs become- (X0, group0).- Extra - argsand- kwargsfor the Likelihood function can be added with the kwargs- argsand- kwargsbelow.- Please see the tutorial for more information. 
- priors (dict) – The prior dictionary can take four forms. 1) A dictionary with keys as int or tuple containing the int or tuple of int that describe the parameter number over which to assess the prior, and values that are prior probability distributions that must have a - logpdfclass method. 2) A- eryn.prior.ProbDistContainerobject. 3) A dictionary with keys that are- branch_namesand values that are dictionaries for each branch as described for (1). 4) A dictionary with keys that are- branch_namesand values are- eryn.prior.ProbDistContainerobjects. If the priors dictionary has the specific key- "all_models_together"in it, then a special prior must be input by the user that can produce the prior- logpdfinformation that can depend on all of the models together rather than handling them separately as is the default. In this case, the user must input as the item attached to this special key a class object with a- logpdfmethod that takes as input two arguments:- (coords, inds), which are the coordinate and index dictionaries across all models with shapes of- (ntemps, nwalkers, nleaves_max, ndim)and- (ntemps, nwalkers, nleaves_max)for each individual model, respectively. This function must then return the numpy array of logpdf values for the prior value with shape- (ntemps, nwalkers).
- provide_groups (bool, optional) – If - True, provide groups as described in- log_like_fnabove. A group parameter is added for each branch. (default:- False)
- provide_supplemental (bool, optional) – - If - True, it will provide keyword arguments to the Likelihood function:- suppsand- branch_supps. Please see the Tutorial and- eryn.state.BranchSupplementalfor more information.
- tempering_kwargs (dict, optional) – Keyword arguments for initialization of the tempering class: - eryn.moves.tempering.TemperatureControl. (default:- {})
- branch_names (list, optional) – List of branch names. If - None, models will be assigned names as- f"model_{index}"based on- nbranches. (default:- None)
- nbranches (int, optional) – Number of branches (models) tested. Only used if - branch_names is None. (default:- 1)
- nleaves_max (int, list of ints, or dict, optional) – Maximum allowable leaf count for each branch. It should have the same length as the number of branches. If - dict, keys should be the branch names and values the associated maximal leaf value. (default:- 1)
- nleaves_min (int, list of ints, or dict, optional) – Minimum allowable leaf count for each branch. It should have the same length as the number of branches. Only used with Reversible Jump. If - dict, keys should be the branch names and values the associated maximal leaf value. If- Noneand using Reversible Jump, will fill all branches with zero. (default:- 0)
- pool (object, optional) – An object with a - mapmethod that follows the same calling sequence as the built-in- mapfunction. This is generally used to compute the log-probabilities for the ensemble in parallel.
- moves (list or object, optional) – This can be a single move object, a list of moves, or a “weighted” list of the form - [(eryn.moves.StretchMove(), 0.1), ...]. When running, the sampler will randomly select a move from this list (optionally with weights) for each proposal. If- None, the default will be- StretchMove. (default:- None)
- rj_moves (list or object or bool or str, optional) – If - Noneor- False, reversible jump will not be included in the run. This can be a single move object, a list of moves, or a “weighted” list of the form- [(eryn.moves.DistributionGenerateRJ(), 0.1), ...]. When running, the sampler will randomly select a move from this list (optionally with weights) for each proposal. If- True, it defaults to- DistributionGenerateRJ. When running just the- DistributionGenerateRJwith multiple branches, it will propose changes to all branhces simultaneously. When running with more than one branch, useful options for- rj_movesare- "iterate_branches",- "separate_branches", or- "together". If- rj_moves == "iterate_branches", sample one branch by one branch in order of the branch names. This occurs within one RJ proposal, for each RJ proposal. If- rj_moves == "separate_branches", there will be one RJ move per branch. During each individual RJ move, one of these proposals is chosen at random with equal propability. This is generally recommended when using multiple branches. If- rj_moves == "together", this is equivalent to- rj_moves == True. (default:- None)
- dr_moves (bool, optional) – If - Noneot- False, delayed rejection when proposing “birth” of new components/models will be switched off for this run. Requires- rj_movesset to- True. Not implemented yet. Working on it. (default:- None)
- dr_max_iter (int, optional) – Maximum number of iterations used with delayed rejection. (default: 5) 
- args (optional) – A list of extra positional arguments for - log_like_fn.- log_like_fnwill be called as- log_like_fn(sampler added args, *args, sampler added kwargs, **kwargs).
- kwargs (optional) – A dict of extra keyword arguments for - log_like_fn.- log_like_fnwill be called as- log_like_fn(sampler added args, *args, sampler added kwargs, **kwargs).
- backend (optional) – Either a - backends.Backendor a subclass (like- backends.HDFBackend) that is used to store and serialize the state of the chain. By default, the chain is stored as a set of numpy arrays in memory, but new backends can be written to support other mediums.
- vectorize (bool, optional) – If - True,- log_like_fnis expected to accept an array of position vectors instead of just one. Note that- poolwill be ignored if this is- True. See- log_like_fninformation above to understand the arguments of- log_like_fnbased on whether- vectorizeis- True. (default:- False)
- periodic (dict, optional) – Keys are - branch_names. Values are dictionaries that have (key: value) pairs as (index to parameter: period). Periodic parameters are treated as having periodic boundary conditions in proposals.
- update_fn (callable, optional) – - eryn.utils.updates.AdjustStretchProposalScaleobject that allows the user to update the sampler in any preferred way every- update_iterationssampler iterations. The callable must have signature:- (sampler iteration, last sample state object, EnsembleSampler object).
- update_iterations (int, optional) – Number of iterations between sampler updates using - update_fn. Updates are only performed at the thinning rate. If- thin_by>1when- EnsembleSampler.run_mcmc()is used, the sampler is updated every- thin_by * update_iterationsiterations.
- stopping_fn (callable, optional) – - eryn.utils.stopping.Stoppingobject that allows the user to end the sampler if specified criteria are met. The callable must have signature:- (sampler iteration, last sample state object, EnsembleSampler object).
- stopping_iterations (int, optional) – Number of iterations between sampler attempts to evaluate the - stopping_fn. Stopping checks are only performed at the thinning rate. If- thin_by>1when- EnsembleSampler.run_mcmc()is used, the sampler is checked for the stopping criterion every- thin_by * stopping_iterationsiterations.
- fill_zero_leaves_val (double, optional) – When there are zero leaves in a given walker (across all branches), fill the likelihood value with - fill_zero_leaves_val. If wanting to keep zero leaves as a possible model, this should be set to the value of the contribution to the Likelihood from the data. (Default:- -1e300).
- num_repeats_in_model (int, optional) – Number of times to repeat the in-model step within in one sampler iteration. When analyzing the acceptance fraction, you must include the value of - num_repeats_in_modelto get the proper denominator.
- num_repeats_rj (int, optional) – Number of time to repeat the reversible jump step within in one sampler iteration. When analyzing the acceptance fraction, you must include the value of - num_repeats_rjto get the proper denominator.
- track_moves (bool, optional) – If - True, track acceptance fraction of each move in the backend. If- False, no tracking is done. If- Trueand run is interrupted, it will check that the move configuration has not changed. It will not allow the run to go on if it is changed. In this case, the user should declare a new backend and use the last state from the previous backend. Warning: If the order of moves of the same move class is changed, the check may not catch it, so the tracking may mix move acceptance fractions together.
- info (dict, optional) – Key and value pairs reprenting any information the user wants to add to the backend if the user is not inputing their own backend. 
 
- Raises:
- ValueError – Any startup issues. 
 - property random_state
- The state of the internal random number generator. In practice, it’s the result of calling - get_state()on a- numpy.random.mtrand.RandomStateobject. You can try to set this property but be warned that if you do this and it fails, it will do so silently.
 - property priors
- Return the priors in the sampler. 
 - reset(**info)
- Reset the backend. - Parameters:
- **info (dict, optional) – information to pass to backend reset method. 
 
 - get_model()
- Get - Modelobject from sampler- The model object is used to pass necessary information to the proposals. This method can be used to retrieve the - modelused in the sampler from outside the sampler.- Returns:
- Modelobject used by sampler.
- Return type:
- Model
 
 - sample(initial_state, iterations=1, tune=False, skip_initial_state_check=True, thin_by=1, store=True, progress=False)
- Advance the chain as a generator - Parameters:
- initial_state (State or ndarray[ntemps, nwalkers, nleaves_max, ndim] or dict) – The initial - Stateor positions of the walkers in the parameter space. If multiple branches used, must be dict with keys as the- branch_namesand values as the positions. If- betasare provided in the state object, they will be loaded into the- temperature_control.
- iterations (int or None, optional) – The number of steps to generate. - Nonegenerates an infinite stream (requires- store=False). (default: 1)
- tune (bool, optional) – If - True, the parameters of some moves will be automatically tuned. (default:- False)
- thin_by (int, optional) – If you only want to store and yield every - thin_bysamples in the chain, set- thin_byto an integer greater than 1. When this is set,- iterations * thin_byproposals will be made. (default: 1)
- store (bool, optional) – By default, the sampler stores in the backend the positions (and other information) of the samples in the chain. If you are using another method to store the samples to a file or if you don’t need to analyze the samples after the fact (for burn-in for example) set - storeto- False. (default:- True)
- progress (bool or str, optional) – If - True, a progress bar will be shown as the sampler progresses. If a string, will select a specific- tqdmprogress bar - most notable is- 'notebook', which shows a progress bar suitable for Jupyter notebooks. If- False, no progress bar will be shown. (default:- False)
- skip_initial_state_check (bool, optional) – If - True, a check that the initial_state can fully explore the space will be skipped. If using reversible jump, the user needs to ensure this on their own (- skip_initial_state_check``is set to ``Falsein this case. (default:- True)
 
- Returns:
- Every - thin_bysteps, this generator yields the- Stateof the ensemble.
- Return type:
- Raises:
- ValueError – Improper initialization. 
 
 - run_mcmc(initial_state, nsteps, burn=None, post_burn_update=False, **kwargs)
- Iterate - sample()for- nstepsiterations and return the result.- Parameters:
- initial_state (State or ndarray[ntemps, nwalkers, nleaves_max, ndim] or dict) – The initial - Stateor positions of the walkers in the parameter space. If multiple branches used, must be dict with keys as the- branch_namesand values as the positions. If- betasare provided in the state object, they will be loaded into the- temperature_control.
- nsteps (int) – The number of steps to generate. The total number of proposals is - nsteps * thin_by.
- burn (int, optional) – Number of burn steps to run before storing information. The - thin_bykwarg is ignored when counting burn steps since there is no storage (equivalent to- thin_by=1).
- post_burn_update (bool, optional) – If - True, run- update_fnafter burn in.
 
 - Other parameters are directly passed to - sample().
 - compute_log_prior(coords, inds=None, supps=None, branch_supps=None)
- Calculate the vector of log-prior for the walkers - Parameters:
- coords (dict) – Keys are - branch_namesand values are the position np.arrays[ntemps, nwalkers, nleaves_max, ndim]. This dictionary is created with the- branches_coordsattribute from- State.
- inds (dict, optional) – Keys are - branch_namesand values are the- indsnp.arrays[ntemps, nwalkers, nleaves_max] that indicates which leaves are being used. This dictionary is created with the- branches_indsattribute from- State. (default:- None)
 
- Returns:
- Prior Values 
- Return type:
- np.ndarray[ntemps, nwalkers] 
 
 - compute_log_like(coords, inds=None, logp=None, supps=None, branch_supps=None)
- Calculate the vector of log-likelihood for the walkers - Parameters:
- coords (dict) – Keys are - branch_namesand values are the position np.arrays[ntemps, nwalkers, nleaves_max, ndim]. This dictionary is created with the- branches_coordsattribute from- State.
- inds (dict, optional) – Keys are - branch_namesand values are the inds np.arrays[ntemps, nwalkers, nleaves_max] that indicates which leaves are being used. This dictionary is created with the- branches_indsattribute from- State. (default:- None)
- logp (np.ndarray[ntemps, nwalkers], optional) – Log prior values associated with all walkers. If not provided, it will be calculated because if a walker has logp = -inf, its likelihood is not calculated. This prevents evaluting likelihood outside the prior. (default: - None)
 
- Returns:
- Carries log-likelihood and blob information.
- First entry is np.ndarray[ntemps, nwalkers] with values corresponding to the log likelihood of each walker. Second entry is - blobs.
 
- Return type:
- tuple - Raises:
- ValueError: Infinite or NaN values in parameters. 
 
 
 - property acceptance_fraction
- The fraction of proposed steps that were accepted 
 - property rj_acceptance_fraction
- The fraction of proposed reversible jump steps that were accepted 
 - property swap_acceptance_fraction
- The fraction of proposed temperature swaps that were accepted 
 - get_chain(**kwargs)
- Get the stored chain of MCMC samples - Parameters:
- thin (int, optional) – Take only every - thinsteps from the chain. (default:- 1)
- discard (int, optional) – Discard the first - discardsteps in the chain as burn-in. (default:- 0)
- slice_vals (indexing np.ndarray or slice, optional) – This is only available in - eryn.backends.hdfbackend. If provided, slice the array directly from the HDF5 file with slice =- slice_vals.- thinand- discardwill be ignored if slice_vals is not- None. This is particularly useful if files are very large and the user only wants a small subset of the overall array. (default:- None)
- temp_index (int, optional) – Integer for the desired temperature index. If - None, will return all temperatures. (default:- None)
 
- Returns:
- MCMC samples
- The dictionary contains np.ndarrays of samples across the branches. 
 
- Return type:
- dict 
 
 - get_blobs(**kwargs)
- Get the chain of blobs for each sample in the chain - Parameters:
- thin (int, optional) – Take only every - thinsteps from the chain. (default:- 1)
- discard (int, optional) – Discard the first - discardsteps in the chain as burn-in. (default:- 0)
- slice_vals (indexing np.ndarray or slice, optional) – This is only available in - eryn.backends.hdfbackend. If provided, slice the array directly from the HDF5 file with slice =- slice_vals.- thinand- discardwill be ignored if slice_vals is not- None. This is particularly useful if files are very large and the user only wants a small subset of the overall array. (default:- None)
- temp_index (int, optional) – Integer for the desired temperature index. If - None, will return all temperatures. (default:- None)
 
- Returns:
- The chain of blobs. 
- Return type:
- double np.ndarray[nsteps, ntemps, nwalkers, nblobs] 
 
 - get_log_like(**kwargs)
- Get the chain of log Prior evaluated at the MCMC samples - Parameters:
- thin (int, optional) – Take only every - thinsteps from the chain. (default:- 1)
- discard (int, optional) – Discard the first - discardsteps in the chain as burn-in. (default:- 0)
- slice_vals (indexing np.ndarray or slice, optional) – This is only available in - eryn.backends.hdfbackend. If provided, slice the array directly from the HDF5 file with slice =- slice_vals.- thinand- discardwill be ignored if slice_vals is not- None. This is particularly useful if files are very large and the user only wants a small subset of the overall array. (default:- None)
- temp_index (int, optional) – Integer for the desired temperature index. If - None, will return all temperatures. (default:- None)
 
- Returns:
- The chain of log prior values. 
- Return type:
- double np.ndarray[nsteps, ntemps, nwalkers] 
 
 - get_log_prior(**kwargs)
- Get the chain of log Prior evaluated at the MCMC samples - Parameters:
- thin (int, optional) – Take only every - thinsteps from the chain. (default:- 1)
- discard (int, optional) – Discard the first - discardsteps in the chain as burn-in. (default:- 0)
- slice_vals (indexing np.ndarray or slice, optional) – This is only available in - eryn.backends.hdfbackend. If provided, slice the array directly from the HDF5 file with slice =- slice_vals.- thinand- discardwill be ignored if slice_vals is not- None. This is particularly useful if files are very large and the user only wants a small subset of the overall array. (default:- None)
- temp_index (int, optional) – Integer for the desired temperature index. If - None, will return all temperatures. (default:- None)
 
- Returns:
- The chain of log prior values. 
- Return type:
- double np.ndarray[nsteps, ntemps, nwalkers] 
 
 - get_log_posterior(**kwargs)
- Get the chain of log posterior values evaluated at the MCMC samples - Parameters:
- temper (bool, optional) – Apply tempering to the posterior values. (default: - False)
- thin (int, optional) – Take only every - thinsteps from the chain. (default:- 1)
- discard (int, optional) – Discard the first - discardsteps in the chain as burn-in. (default:- 0)
- slice_vals (indexing np.ndarray or slice, optional) – This is only available in - eryn.backends.hdfbackend. If provided, slice the array directly from the HDF5 file with slice =- slice_vals.- thinand- discardwill be ignored if slice_vals is not- None. This is particularly useful if files are very large and the user only wants a small subset of the overall array. (default:- None)
- temp_index (int, optional) – Integer for the desired temperature index. If - None, will return all temperatures. (default:- None)
 
- Returns:
- The chain of log prior values. 
- Return type:
- double np.ndarray[nsteps, ntemps, nwalkers] 
 
 - get_inds(**kwargs)
- Get the stored chain of MCMC samples - Parameters:
- thin (int, optional) – Take only every - thinsteps from the chain. (default:- 1)
- discard (int, optional) – Discard the first - discardsteps in the chain as burn-in. (default:- 0)
- slice_vals (indexing np.ndarray or slice, optional) – This is only available in - eryn.backends.hdfbackend. If provided, slice the array directly from the HDF5 file with slice =- slice_vals.- thinand- discardwill be ignored if slice_vals is not- None. This is particularly useful if files are very large and the user only wants a small subset of the overall array. (default:- None)
- temp_index (int, optional) – Integer for the desired temperature index. If - None, will return all temperatures. (default:- None)
 
- Returns:
- The indsassociated with the MCMC samples.
- The dictionary contains np.ndarrays of - indsacross the branches indicated which leaves were used at each step.
 
- The 
- Return type:
- dict 
 
 - get_nleaves(**kwargs)
- Get the number of leaves for each walker - Parameters:
- thin (int, optional) – Take only every - thinsteps from the chain. (default:- 1)
- discard (int, optional) – Discard the first - discardsteps in the chain as burn-in. (default:- 0)
- slice_vals (indexing np.ndarray or slice, optional) – This is only available in - eryn.backends.hdfbackend. If provided, slice the array directly from the HDF5 file with slice =- slice_vals.- thinand- discardwill be ignored if slice_vals is not- None. This is particularly useful if files are very large and the user only wants a small subset of the overall array. (default:- None)
- temp_index (int, optional) – Integer for the desired temperature index. If - None, will return all temperatures. (default:- None)
 
- Returns:
- nleaves on each branch.
- The number of leaves on each branch associated with the MCMC samples
- within each branch. 
 
 
- Return type:
- dict 
 
 - get_last_sample(**kwargs)
- Access the most recent sample in the chain - Returns:
- eryn.state.Stateobject containing the last sample from the chain.
- Return type:
 
 - get_betas(**kwargs)
- Get the chain of inverse temperatures - Parameters:
- thin (int, optional) – Take only every - thinsteps from the chain. (default:- 1)
- discard (int, optional) – Discard the first - discardsteps in the chain as burn-in. (default:- 0)
- slice_vals (indexing np.ndarray or slice, optional) – This is only available in - eryn.backends.hdfbackend. If provided, slice the array directly from the HDF5 file with slice =- slice_vals.- thinand- discardwill be ignored if slice_vals is not- None. This is particularly useful if files are very large and the user only wants a small subset of the overall array. (default:- None)
- temp_index (int, optional) – Integer for the desired temperature index. If - None, will return all temperatures. (default:- None)
 
- Returns:
- The chain of temperatures. 
- Return type:
- double np.ndarray[nsteps, ntemps] 
 
 - get_value(name, **kwargs)
- Get a specific value 
 - get_autocorr_time(**kwargs)
- Compute an estimate of the autocorrelation time for each parameter - Parameters:
- discard (int, optional) – Discard the first - discardsteps in the chain as burn-in. (default:- 0)
- thin (int, optional) – Use only every - thinsteps from the chain. The returned estimate is multiplied by- thinso the estimated time is in units of steps, not thinned steps. (default:- 1)
- all_temps (bool, optional) – If True, calculate autocorrelation across all temperatures. If False, calculate autocorrelation across the minumum temperature chain (usually - T=1). (default:- False)
- multiply_thin (bool, optional) – into the autocorrelation length. (default: - True)
 
 - Other arguments are passed directly to - emcee.autocorr.integrated_time().- Returns:
- autocorrelation times
- The dictionary contains autocorrelation times for all parameters as 1D double np.ndarrays as values with associated - branch_namesas keys.
 
- Return type:
- dict