Emulator

class prism.Emulator(pipeline_obj, modellink_obj)

Defines the Emulator class of the PRISM package.

Description

The Emulator class is the backbone of the PRISM package, holding all tools necessary to construct, load, save and evaluate the emulator of a model. It performs many checks to see if the provided ModelLink object is compatible with the current emulator, advises the user on alternatives when certain operations are requested, automatically takes care of distributing emulator systems over MPI ranks and more.

The purpose of the Emulator class is to hold only information about the emulator, so it does not require any details about the provided ModelLink object. Nevertheless, it keeps track of changes made to it. This allows the user to modify the properties of the ModelLink subclass without accidentally causing desynchronization problems.

The Emulator class must be linked to an instance of the Pipeline class and will automatically attempt to do so when initialized. By default, this class should only be initialized from within a Pipeline object.

__init__(pipeline_obj, modellink_obj)

Initialize an instance of the Emulator class.

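Parameters:
  • pipeline_obj (Pipeline object) – Instance of the Pipeline class that this Emulator instance is linked to.
  • modellink_obj (ModelLink object) – Instance of the ModelLink class that links the emulated model to this Pipeline object.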
_assign_data_idx(emul_i)

Determines the emulator system each data point in the provided emulator iteration emul_i should be assigned to, in order to make sure that recurring data points have the same emulator system index as in the previous emulator iteration. If multiple options are possible, data points are assigned such that they are spread out as much as possible.

Parameters:emul_i (int) – Number indicating the requested emulator iteration.
Returns:
  • data_to_emul_s (list of int) – The index of the emulator system that each data point should be assigned to.
  • n_emul_s (int) – The total number of active and passive emulator systems there will be in the provided emulator iteration.

Examples

If the number of data points is less than the previous iteration:

>>> emul_i = 2
>>> self._data_idx[emul_i-1]
['A', 'B', 'C', 'D', 'E']
>>> self._modellink._data_idx
['B', 'F', 'G', 'E']
>>> self._assign_data_idx(emul_i)
([1, 3, 2, 4], 5)

If the number of data points is more than the previous iteration:

>>> emul_i = 2
>>> self._data_idx[emul_i-1]
['A', 'B', 'C', 'D', 'E']
>>> self._modellink._data_idx
['B', 'F', 'G', 'E', 'A', 'C']
>>> self._assign_data_idx(emul_i)
([1, 5, 3, 4, 0, 2], 6)

If there is no previous iteration:

>>> emul_i = 1
>>> self._data_idx[emul_i-1]
[]
>>> self._modellink._data_idx
['B', 'F', 'G', 'E', 'A', 'C']
>>> self._assign_data_idx(emul_i)
([5, 4, 3, 2, 1, 0], 6)
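
The three examples above are consistent with a simple rule: recurring data points keep their previous index, and new data points fill the remaining free indices. The following is a minimal sketch of that rule, not PRISM's implementation. The standalone function `assign_data_idx`, taking `n_emul_s` as the larger of the two list lengths, and filling free indices from the highest down are all assumptions inferred only from the example outputs.

```python
def assign_data_idx(prev_data_idx, new_data_idx):
    """Hypothetical stand-alone sketch of the index assignment rule."""
    # Assumption: enough systems to hold either the old or the new data set
    n_emul_s = max(len(prev_data_idx), len(new_data_idx))
    free = list(range(n_emul_s))
    data_to_emul_s = [None] * len(new_data_idx)

    # Recurring data points keep the index they had previously
    for i, idx in enumerate(new_data_idx):
        if idx in prev_data_idx:
            emul_s = prev_data_idx.index(idx)
            data_to_emul_s[i] = emul_s
            free.remove(emul_s)

    # Assumption: new data points take the free indices in descending order
    free.sort(reverse=True)
    free_iter = iter(free)
    for i, emul_s in enumerate(data_to_emul_s):
        if emul_s is None:
            data_to_emul_s[i] = next(free_iter)

    return data_to_emul_s, n_emul_s
```

Called with the previous and new identifier lists from each example, this sketch returns the same tuples as shown above.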
_assign_emul_s(emul_i)

Determines which emulator systems (files) should be assigned to which MPI rank in order to balance the number of active emulator systems on every rank for every iteration up to the provided emulator iteration emul_i. If multiple choices can achieve this, the emulator systems are automatically spread out such that the total number of active emulator systems on a single rank is also balanced as much as possible.

Parameters:emul_i (int) – Number indicating the requested emulator iteration.
Returns:emul_s_to_core (list of lists) – A list containing the emulator systems that have been assigned to the corresponding MPI rank by the controller.

Notes

Currently, this function only uses high-level MPI. Additional speed can be obtained by also implementing low-level MPI, which will potentially be done in the future.
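
The balancing idea described for this method can be illustrated with a greedy heuristic. This is only a sketch under stated assumptions, not the algorithm PRISM uses: the function name `assign_emul_s`, the input format (one set of active iterations per emulator system) and the tie-breaking order are all hypothetical.

```python
def assign_emul_s(active_iters, n_ranks):
    """Greedy sketch: active_iters[s] is the set of iterations in which
    emulator system s is active (hypothetical input format)."""
    n_iter = max((max(it) for it in active_iters if it), default=0)
    # loads[rank][i] counts the active systems on that rank in iteration i
    loads = [[0] * (n_iter + 1) for _ in range(n_ranks)]
    emul_s_to_core = [[] for _ in range(n_ranks)]

    # Place the systems that are active most often first
    order = sorted(range(len(active_iters)),
                   key=lambda s: -len(active_iters[s]))
    for s in order:
        def cost(rank):
            # Prefer the rank that is least busy in the iterations where s is
            # active, then the rank with the lowest total load
            per_iter = sum(loads[rank][i] for i in active_iters[s])
            return (per_iter, sum(loads[rank]), rank)

        rank = min(range(n_ranks), key=cost)
        emul_s_to_core[rank].append(s)
        for i in active_iters[s]:
            loads[rank][i] += 1

    return emul_s_to_core
```

The result has the same shape as emul_s_to_core: one list of emulator system indices per MPI rank.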

_cleanup_emul_files(emul_i)

Opens all emulator HDF5-files and removes the provided emulator iteration emul_i and subsequent iterations from them. Also removes any related projection figures that have default names. If emul_i == 1, all emulator HDF5-files are removed instead.

Parameters:emul_i (int) – Number indicating the requested emulator iteration.
_construct_iteration(emul_i)

Constructs the emulator iteration corresponding to the provided emul_i, by performing the given emulation method and pre-calculating the prior expectation and variance values of the used model evaluation samples.

Parameters:emul_i (int) – Number indicating the requested emulator iteration.

Generates

All data sets that are required to evaluate the emulator at the constructed iteration.

_create_new_emulator()

Creates a new master HDF5-file that holds all the information of a new emulator and writes all important emulator details to it. Afterwards, resets all loaded emulator data and prepares the HDF5-file and emulator for the construction of the first emulator iteration.

Generates

A new master HDF5-file ‘prism.hdf5’ contained in the working directory specified in the Pipeline instance, holding all information required to construct the first iteration of the emulator.

_do_regression(emul_i, emul_s_seq)

Performs a forward stepwise linear regression for all requested emulator systems emul_s_seq in the provided emulator iteration emul_i. Calculates the expectation values of all polynomial coefficients. The polynomial order that is used in the regression depends on poly_order.

Parameters:
  • emul_i (int) – Number indicating the requested emulator iteration.
  • emul_s_seq (list of int) – List of numbers indicating the requested emulator systems.

Generates (for every emulator system)

rsdl_var : float
Residual variance of the regression function.
regr_score : float
Fit-score of the regression function.
poly_coef : 1D ndarray object
Array containing the expectation values of the non-zero polynomial coefficients.
poly_powers : 2D ndarray object
Array containing the powers of the non-zero polynomial terms in the regression function.
poly_idx : 1D ndarray object
Array containing the indices of the non-zero polynomial terms in the regression function.
poly_coef_cov : 1D ndarray object (if use_regr_cov is True)
Array containing the covariance values of the non-zero polynomial coefficients.
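
To make the idea of forward stepwise regression concrete, the sketch below greedily adds the polynomial term that most reduces the residual sum of squares and stops when no term helps. It is a plain NumPy illustration, not PRISM's implementation; the function name `forward_stepwise`, the absolute tolerance `tol` and the stopping rule are assumptions.

```python
import numpy as np


def forward_stepwise(X, y, tol=1e-10):
    """X holds one candidate polynomial term per column; y holds the
    model outputs. Returns selected column indices, their coefficients
    and the residual sum of squares of the final fit."""
    selected = []
    remaining = list(range(X.shape[1]))
    best_rss = float(np.sum(y ** 2))  # RSS of the empty model
    coef = np.zeros(0)

    while remaining:
        trials = []
        for j in remaining:
            cols = selected + [j]
            c, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss = float(np.sum((y - X[:, cols] @ c) ** 2))
            trials.append((rss, j, c))
        rss, j, c = min(trials, key=lambda t: t[0])

        # Assumed stopping rule: stop when the best candidate no longer helps
        if best_rss - rss <= tol:
            break
        selected.append(j)
        remaining.remove(j)
        best_rss, coef = rss, c

    return selected, coef, best_rss
```

In this sketch, `selected` plays the role of poly_idx and `coef` that of poly_coef, in the order in which the terms were added.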
_evaluate(emul_i, emul_s_seq, par_set)

Evaluates the emulator systems emul_s_seq at iteration emul_i for given par_set.

Parameters:
  • emul_i (int) – Number indicating the requested emulator iteration.
  • emul_s_seq (list of int) – List of numbers indicating the requested emulator systems.
  • par_set (1D ndarray object) – Model parameter value set to evaluate the emulator at.
Returns:

  • adj_exp_val (1D ndarray object) – Adjusted emulator expectation value for all requested emulator systems on this MPI rank.
  • adj_var_val (1D ndarray object) – Adjusted emulator variance value for all requested emulator systems on this MPI rank.
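
For a single emulator system, the adjusted values can be sketched with the standard Bayes linear update. The block below assumes that update applies here and that the quantities are passed in explicitly; the function name `bayes_linear_adjust` and its argument list are hypothetical, and how PRISM organizes this internally is not shown in this reference.

```python
import numpy as np


def bayes_linear_adjust(prior_exp_par, prior_var_par, cov_vec,
                        cov_mat_inv, mod_set, prior_exp_sam):
    """Adjusted expectation and variance at one parameter set.

    cov_vec       : covariances between par_set and all samples, shape (n_sam,)
    cov_mat_inv   : inverse covariance matrix of the samples, (n_sam, n_sam)
    mod_set       : model outputs at the samples, shape (n_sam,)
    prior_exp_sam : prior expectation values at the samples, shape (n_sam,)
    """
    # This product does not depend on par_set, so it can be pre-calculated
    exp_dot_term = cov_mat_inv @ (mod_set - prior_exp_sam)

    adj_exp_val = prior_exp_par + cov_vec @ exp_dot_term
    adj_var_val = prior_var_par - cov_vec @ cov_mat_inv @ cov_vec
    return adj_exp_val, adj_var_val
```

A useful sanity check of this form is that evaluating at a known sample reproduces the model output there with zero adjusted variance.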

_get_active_par(emul_i, emul_s_seq)

Determines the active parameters to be used for every emulator system listed in emul_s_seq in the provided emulator iteration emul_i. Uses backwards stepwise elimination to determine the set of active parameters. The polynomial order that is used in the stepwise elimination depends on poly_order.

Parameters:
  • emul_i (int) – Number indicating the requested emulator iteration.
  • emul_s_seq (list of int) – List of numbers indicating the requested emulator systems.

Generates (for every emulator system)

active_par_data : 1D ndarray object
Array containing the indices of all the parameters that are active in the emulator iteration emul_i.
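
Backward stepwise elimination can be sketched as follows: start with all parameters and repeatedly drop the one whose removal barely worsens a least-squares fit. This is a simplified illustration that uses linear terms only and an absolute tolerance; the function name `backward_elimination`, the tolerance `tol` and the stopping rule are assumptions, not PRISM's actual criteria.

```python
import numpy as np


def _rss(X, y):
    """Residual sum of squares of a least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ coef) ** 2))


def backward_elimination(sam_set, mod_set, tol=1e-8):
    """Return the indices of the parameters that are kept (active)."""
    n_sam, n_par = sam_set.shape
    ones = np.ones((n_sam, 1))  # constant term is always included
    active = list(range(n_par))
    cur_rss = _rss(np.hstack([ones, sam_set[:, active]]), mod_set)

    while active:
        # Try removing each remaining parameter in turn
        trials = []
        for q in active:
            keep = [p for p in active if p != q]
            trials.append((_rss(np.hstack([ones, sam_set[:, keep]]), mod_set), q))
        rss, q = min(trials)

        # Stop once every possible removal noticeably worsens the fit
        if rss - cur_rss > tol:
            break
        active.remove(q)
        cur_rss = rss

    return np.array(active, dtype=int)
```

The returned array corresponds in spirit to active_par_data for one emulator system.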
_get_adj_exp(emul_i, emul_s_seq, par_set, cov_vec)

Calculates the adjusted emulator expectation values for requested emulator systems emul_s_seq at a given emulator iteration emul_i for specified parameter set par_set and corresponding covariance vector cov_vec.

Parameters:
  • emul_i (int) – Number indicating the requested emulator iteration.
  • emul_s_seq (list of int) – List of numbers indicating the requested emulator systems.
  • par_set (1D ndarray object) – Model parameter value set to calculate the adjusted emulator expectation for.
  • cov_vec (2D ndarray object) – Covariance vector corresponding to par_set.
Returns:

adj_exp_val (1D ndarray object) – Adjusted emulator expectation value for all requested emulator systems on this MPI rank.

_get_adj_var(emul_i, emul_s_seq, par_set, cov_vec)

Calculates the adjusted emulator variance values for requested emulator systems emul_s_seq at a given emulator iteration emul_i for specified parameter set par_set and corresponding covariance vector cov_vec.

Parameters:
  • emul_i (int) – Number indicating the requested emulator iteration.
  • emul_s_seq (list of int) – List of numbers indicating the requested emulator systems.
  • par_set (1D ndarray object) – Model parameter value set to calculate the adjusted emulator variance for.
  • cov_vec (2D ndarray object) – Covariance vector corresponding to par_set.
Returns:

adj_var_val (1D ndarray object) – Adjusted emulator variance value for all requested emulator systems on this MPI rank.

_get_cov(emul_i, emul_s_seq, par_set1, par_set2)

Calculates the full emulator covariances for requested emulator systems emul_s_seq at emulator iteration emul_i for given parameter sets par_set1 and par_set2. The contributions to these covariances depend on method.

Parameters:
  • emul_i (int) – Number indicating the requested emulator iteration.
  • emul_s_seq (list of int) – List of numbers indicating the requested emulator systems.
  • par_set1, par_set2 (1D ndarray object or None) – If par_set1 and par_set2 are both not None, calculate covariances for par_set1 with par_set2. If par_set1 is not None and par_set2 is None, calculate covariances for par_set1 with sam_set (covariance vector). If par_set1 and par_set2 are both None, calculate covariances for sam_set (covariance matrix). When not None, par_set is the model parameter value set to calculate the covariances for.
Returns:

cov (1D, 2D or 3D ndarray object) – Depending on the arguments provided, a covariance value, vector or matrix for requested emulator systems.
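
The three argument combinations can be illustrated for a single emulator system and a purely Gaussian covariance. The sketch assumes one common squared-exponential form, sigma² · exp(−Σ (Δx / l_corr)²), which may differ from the expression PRISM uses; the function names and the explicit `sam_set`, `sigma` and `l_corr` arguments are hypothetical. With only one system, the results are a value, a 1D vector or a 2D matrix.

```python
import numpy as np


def _kernel(a, b, sigma, l_corr):
    """Assumed squared-exponential covariance between rows of a and b."""
    a = np.atleast_2d(a)
    b = np.atleast_2d(b)
    diff = (a[:, None, :] - b[None, :, :]) / l_corr
    return sigma ** 2 * np.exp(-np.sum(diff ** 2, axis=-1))


def gaussian_cov(sam_set, sigma, l_corr, par_set1=None, par_set2=None):
    if par_set1 is not None and par_set2 is not None:
        # Covariance value of par_set1 with par_set2
        return _kernel(par_set1, par_set2, sigma, l_corr)[0, 0]
    if par_set1 is not None:
        # Covariance vector of par_set1 with every sample in sam_set
        return _kernel(par_set1, sam_set, sigma, l_corr)[0]
    if par_set2 is None:
        # Covariance matrix of sam_set with itself
        return _kernel(sam_set, sam_set, sigma, l_corr)
    # This combination is not described above, so the sketch rejects it
    raise ValueError("par_set2 was given without par_set1")
```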

_get_cov_matrix(emul_i, emul_s_seq)

Calculates the (inverse) matrix of covariances between known model evaluation samples for requested emulator systems emul_s_seq at emulator iteration emul_i.

Parameters:
  • emul_i (int) – Number indicating the requested emulator iteration.
  • emul_s_seq (list of int) – List of numbers indicating the requested emulator systems.

Generates

cov_mat : 3D ndarray object
Matrix containing the covariances between all known model evaluation samples for requested emulator systems.
cov_mat_inv : 3D ndarray object
Inverse of covariance matrix for requested emulator systems.
_get_default_parameters()

Generates a dict containing default values for all emulator parameters.

Returns:par_dict (dict) – Dict containing all default emulator parameter values.
_get_emul_i(emul_i, cur_iter)

Checks if the provided emulator iteration emul_i can be requested or replaces it if None was provided. This method requires all MPI ranks to call it simultaneously.

Parameters:
  • emul_i (int or None) – Number indicating the requested emulator iteration.
  • cur_iter (bool) – Bool determining whether the current (True) or the next (False) emulator iteration is requested.
Returns:

emul_i (int) – The requested emulator iteration that passed the check.

_get_exp_dot_term(emul_i, emul_s_seq)

Pre-calculates the second expectation adjustment dot-term for requested emulator systems emul_s_seq at a given emulator iteration emul_i for all model evaluation samples and saves it for later use.

Parameters:
  • emul_i (int) – Number indicating the requested emulator iteration.
  • emul_s_seq (list of int) – List of numbers indicating the requested emulator systems.

Generates

exp_dot_term : 2D ndarray object
2D array containing the pre-calculated values for the second adjustment dot-term of the adjusted expectation for requested emulator systems.
_get_inv_matrix(matrix)

Calculates the inverse of a given matrix. Currently only uses the inv() function.

Parameters:matrix (2D array_like) – Matrix to be inverted.
Returns:matrix_inv (2D ndarray object) – Inverse of the given matrix.
_get_prior_exp(emul_i, emul_s_seq, par_set)

Calculates the prior expectation value for requested emulator systems emul_s_seq at a given emulator iteration emul_i for specified parameter set par_set. This expectation depends on method.

Parameters:
  • emul_i (int) – Number indicating the requested emulator iteration.
  • emul_s_seq (list of int) – List of numbers indicating the requested emulator systems.
  • par_set (1D ndarray object or None) – If None, calculate the prior expectation values of sam_set. If not None, calculate the prior expectation value for the given model parameter value set.
Returns:

prior_exp (1D or 2D ndarray object) – Prior expectation values for either sam_set or par_set for requested emulator systems.

_get_regr_cov(emul_i, emul_s_seq, par_set1, par_set2)

Calculates the covariances of the regression function for requested emulator systems emul_s_seq at emulator iteration emul_i for given parameter sets par_set1 and par_set2.

Parameters:
  • emul_i (int) – Number indicating the requested emulator iteration.
  • emul_s_seq (list of int) – List of numbers indicating the requested emulator systems.
  • par_set1, par_set2 (1D ndarray object or None) – If par_set1 and par_set2 are both not None, calculate regression covariances for par_set1 with par_set2. If par_set1 is not None and par_set2 is None, calculate regression covariances for par_set1 with sam_set (covariance vector). If par_set1 and par_set2 are both None, calculate regression covariances for sam_set (covariance matrix). When not None, par_set is the model parameter value set to calculate the regression covariances for.
Returns:

regr_cov (1D, 2D or 3D ndarray object) – Depending on the arguments provided, a regression covariance value, vector or matrix for requested emulator systems.

_load_data(emul_i)

Loads in all the important emulator data up to emulator iteration emul_i into memory.

Parameters:emul_i (int) – Number indicating the requested emulator iteration.

Generates

All relevant emulator data up to emulator iteration emul_i is loaded into memory.

_load_emulator(modellink_obj)

Checks if the provided working directory contains a constructed emulator and loads in the emulator data accordingly.

Parameters:modellink_obj (ModelLink object) – Instance of the ModelLink class that links the emulated model to this Pipeline object.
_prepare_new_iteration(emul_i)

Prepares the emulator for the construction of a new iteration emul_i. Checks if this iteration can be prepared or if it has been prepared before, and acts accordingly.

Parameters:emul_i (int) – Number indicating the requested emulator iteration.
Returns:reload (bool) – Bool indicating whether or not the controller rank of the Pipeline instance needs to reload its data.

Generates

A new group in the master HDF5-file with the emulator iteration as its name, containing subgroups corresponding to all emulator systems that will be used in this iteration.

Notes

Preparing an iteration that has been prepared before causes that iteration and all subsequent iterations of the emulator to be deleted. A check is carried out to see if it was necessary to reprepare the requested iteration, and a warning is given if this check fails.

_read_data_idx(emul_s_group)

Reads in and combines the parts of the data point identifier that is assigned to the provided emul_s_group.

Parameters:emul_s_group (Group object) – The HDF5-group from which the data point identifier needs to be read in.
Returns:data_idx (tuple of {int, float, str}) – The combined data point identifier.
_read_parameters()

Reads in the Emulator parameters from the provided PRISM parameter file and saves them in the current Emulator instance.

_retrieve_parameters()

Reads in the emulator parameters from the provided working directory and saves them in the current Emulator instance.

_save_data(emul_i, lemul_s, data_dict)

Saves a given data dict {keyword: data} at the given emulator iteration emul_i and local emulator system lemul_s to the HDF5-file and as a data attribute to the current Emulator instance.

Parameters:
  • emul_i (int) – Number indicating the requested emulator iteration.
  • lemul_s (int or None) – Number indicating the requested local emulator system. If None, use the master emulator file instead.
  • data_dict (dict) – Dict containing the data that needs to be saved to the HDF5-file.
Keyword Arguments:
 
  • keyword ({‘active_par’, ‘active_par_data’, ‘cov_mat’, ‘exp_dot_term’, ‘mod_real_set’, ‘regression’}) – String specifying the type of data that needs to be saved.
  • data ({int, float, str, array_like} or dict) – The actual data that needs to be saved at data keyword keyword. If dict, save every item individually.

Generates

The specified data is saved to the HDF5-file.

_set_mock_data()

Loads previously used mock data into the ModelLink object, overwriting the parameter estimates, data values, data errors, data spaces and data identifiers with their mock equivalents.

Generates

Overwrites the corresponding ModelLink class properties with the previously used values (taken from the first emulator iteration).

_set_modellink(modellink_obj, modellink_loaded)

Sets the ModelLink object that will be used for constructing this emulator. If a constructed emulator is present, checks if the provided modellink_obj argument matches the ModelLink subclass used to construct it.

Parameters:
  • modellink_obj (ModelLink object) – Instance of the ModelLink class that links the emulated model to this Pipeline object. The provided ModelLink object must match the one used to construct the loaded emulator.
  • modellink_loaded (str or None) – If str, the name of the ModelLink subclass that was used to construct the loaded emulator. If None, no emulator is loaded.
_write_data_idx(emul_s_group, data_idx)

Splits a given data_idx up into individual parts and saves it as an attribute to the provided emul_s_group.

Parameters:
  • emul_s_group (Group object) – The HDF5-group to which the data point identifier needs to be saved.
  • data_idx (tuple of {int, float, str}) – The data point identifier to be saved.
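
The split-and-combine round trip performed by _write_data_idx and _read_data_idx can be sketched with a plain dict standing in for the attributes of an HDF5 group. Storing each part separately lets an identifier mix int, float and str parts without forcing them into one type. The attribute names `data_idx_len` and `data_idx_<i>` and both function names are hypothetical; the names PRISM actually writes are not given in this reference.

```python
def write_data_idx(attrs, data_idx):
    """Store every part of data_idx under its own (hypothetical) key."""
    # Accept a single-part identifier as well as a tuple of parts
    if not isinstance(data_idx, tuple):
        data_idx = (data_idx,)
    attrs['data_idx_len'] = len(data_idx)
    for i, part in enumerate(data_idx):
        attrs['data_idx_%i' % i] = part


def read_data_idx(attrs):
    """Read the parts back in order and combine them into one tuple."""
    n_parts = attrs['data_idx_len']
    return tuple(attrs['data_idx_%i' % i] for i in range(n_parts))


# Usage: a dict plays the role of the group's attributes here
attrs = {}
write_data_idx(attrs, (1, 2.5, 'A'))
data_idx = read_data_idx(attrs)
```

Because each part is stored as-is, the types of the individual parts survive the round trip in this sketch.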
active_emul_s

The indices of the emulator systems on this MPI rank that are active.

Type:list of int
active_par

The model parameter names that are considered active. Only available on the controller rank.

Type:list of str
active_par_data

The model parameter names that are considered active for every emulator system on this MPI rank.

Type:list of str
ccheck

The emulator system components that are still required to complete the construction of an emulator iteration on this MPI rank. The controller rank additionally lists the required components that are emulator iteration specific (‘mod_real_set’ and ‘active_par’).

Type:list of str
cov_mat_inv

The inverses of the covariance matrices for every emulator system on this MPI rank.

Type:list of ndarray
emul_i

The last emulator iteration that is fully constructed for all emulator systems on this MPI rank.

Type:int
emul_load

Whether or not a previously constructed emulator is currently loaded.

Type:bool
emul_s

The indices of the emulator systems that are assigned to this MPI rank.

Type:list of int
emul_s_to_core

List of the indices of the emulator systems that are assigned to every MPI rank. Only available on the controller rank.

Type:list of lists
emul_type

The type of emulator that is currently loaded.

Type:str
exp_dot_term

The second expectation adjustment dot-term values of all model evaluation samples for every emulator system on this MPI rank.

Type:list of ndarray
l_corr

The Gaussian correlation lengths for all model parameters. Each is defined as the maximum distance between two values of a specific model parameter within which the Gaussian contribution to the correlation between the values is still significant.

Type:ndarray
method

The emulation method to use for constructing the emulator. Possible values are ‘gaussian’, ‘regression’ and ‘full’.

Type:str
mod_set

The model outputs corresponding to the samples in sam_set for every emulator system on this MPI rank.

Type:list of ndarray
n_cross_val

Number of (k-fold) cross-validations that are used for determining the quality of the regression process. It is set to zero if cross-validations are not used. If method == ‘gaussian’ and do_active_anal is False, this number is not required.

Type:int
n_emul_s

Number of emulator systems assigned to this MPI rank.

Type:int
n_emul_s_tot

Total number of emulator systems assigned to all MPI ranks combined. Only available on the controller rank.

Type:int
n_sam

Number of model evaluation samples that have been/will be used to construct an emulator iteration.

Type:int
poly_coef

The non-zero coefficients for the polynomial terms in the regression function for every emulator system on this MPI rank. Empty if method == ‘gaussian’.

Type:list of ndarray
poly_coef_cov

The covariances for all coefficients in poly_coef for every emulator system on this MPI rank. Empty if method == ‘gaussian’ or use_regr_cov is False.

Type:list of ndarray
poly_idx

The indices for all polynomial terms with non-zero coefficients in the regression function for every emulator system on this MPI rank. Empty if method == ‘gaussian’.

Type:list of ndarray
poly_order

Polynomial order that is considered for the regression process. If method == ‘gaussian’ and do_active_anal is False, this number is not required.

Type:int
poly_powers

The powers for all polynomial terms with non-zero coefficients in the regression function for every emulator system on this MPI rank. Empty if method == ‘gaussian’.

Type:list of ndarray
rsdl_var

The residual variances for every emulator system on this MPI rank. They are obtained from the regression process and replace the Gaussian sigma. Empty if method == ‘gaussian’.

Type:list of float
sam_set

The model evaluation samples that have been/will be used to construct the specified emulator iteration.

Type:ndarray
sigma

Value of the Gaussian sigma. If method != ‘gaussian’, this value is not required, since it is obtained from the regression process instead.

Type:float
use_mock

Whether or not mock data has been used for the construction of this emulator instead of actual data. If True, changes made to the data in the provided ModelLink object are ignored.

Type:bool
use_regr_cov

Whether or not to take into account the regression covariance when calculating the covariance of the emulator, in addition to the Gaussian covariance. If method == ‘gaussian’, this bool is not required. If method == ‘regression’, this bool is always set to True.

Type:bool