Emulator¶

class prism.emulator.Emulator
(pipeline_obj, modellink_obj)[source]¶ Defines the Emulator base class of the PRISM package.
Description
The Emulator class is the backbone of the PRISM package, holding all tools necessary to construct, load, save and evaluate the emulator of a model. It performs many checks to see if the provided ModelLink object is compatible with the current emulator, advises the user on alternatives when certain operations are requested, automatically takes care of distributing emulator systems over MPI ranks and more.
Even though the purpose of the Emulator class is to hold only information about the emulator, and it therefore does not require any details about the provided ModelLink object, it keeps track of changes made to it. This allows the user to modify the properties of the ModelLink subclass without accidentally causing desynchronization problems.
The Emulator class must be linked to an instance of the Pipeline class and will automatically attempt to do so when initialized. By default, this class should only be initialized from within a Pipeline object.
__init__
(pipeline_obj, modellink_obj)[source]¶ Initialize an instance of the Emulator class.
Parameters:

_assign_data_idx
(emul_i)[source]¶ Determines the emulator system each data point in the provided emulator iteration emul_i should be assigned to, in order to make sure that recurring data points have the same emulator system index as in the previous emulator iteration. If multiple options are possible, data points are assigned such that they are spread out as much as possible.
Parameters: emul_i (int) – Number indicating the requested emulator iteration.
Returns:  data_to_emul_s (list of int) – The index of the emulator system that each data point should be assigned to.
 n_emul_s (int) – The total number of active and passive emulator systems there will be in the provided emulator iteration.
Examples
If the number of data points is less than the previous iteration:
>>> emul_i = 2
>>> self._data_idx[emul_i-1]
['A', 'B', 'C', 'D', 'E']
>>> self._modellink._data_idx
['B', 'F', 'G', 'E']
>>> self._assign_data_idx(emul_i)
([1, 3, 2, 4], 5)
If the number of data points is more than the previous iteration:
>>> emul_i = 2
>>> self._data_idx[emul_i-1]
['A', 'B', 'C', 'D', 'E']
>>> self._modellink._data_idx
['B', 'F', 'G', 'E', 'A', 'C']
>>> self._assign_data_idx(emul_i)
([1, 5, 3, 4, 0, 2], 6)
If there is no previous iteration:
>>> emul_i = 1
>>> self._data_idx[emul_i-1]
[]
>>> self._modellink._data_idx
['B', 'F', 'G', 'E', 'A', 'C']
>>> self._assign_data_idx(emul_i)
([5, 4, 3, 2, 1, 0], 6)
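The three examples above are consistent with a simple rule: recurring data points keep their previous index, and new data points take the remaining free indices from the highest one down. The following is a minimal sketch of that rule, not the package's implementation. The function name and its two list arguments are hypothetical, and the real method may also take the MPI distribution into account when spreading new data points.

```python
def assign_data_idx(prev_data_idx, new_data_idx):
    """Hypothetical stand-alone sketch of the index assignment rule."""
    # Enough emulator systems are needed for the larger of the two iterations
    n_emul_s = max(len(prev_data_idx), len(new_data_idx))

    # Index that every data point had in the previous iteration
    prev_pos = {idx: i for i, idx in enumerate(prev_data_idx)}

    # Indices claimed by recurring data points are not available to new ones
    taken = {prev_pos[idx] for idx in new_data_idx if idx in prev_pos}
    free = [i for i in range(n_emul_s) if i not in taken]

    data_to_emul_s = []
    for idx in new_data_idx:
        if idx in prev_pos:
            # Recurring data point: keep the same emulator system index
            data_to_emul_s.append(prev_pos[idx])
        else:
            # New data point: take the highest free index
            data_to_emul_s.append(free.pop())
    return data_to_emul_s, n_emul_s


print(assign_data_idx(['A', 'B', 'C', 'D', 'E'], ['B', 'F', 'G', 'E']))
```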

_assign_emul_s
(emul_i)[source]¶ Determines which emulator systems (files) should be assigned to which MPI rank in order to balance the number of active emulator systems on every rank for every iteration up to the provided emulator iteration emul_i. If multiple choices can achieve this, the emulator systems are automatically spread out such that the total number of active emulator systems on a single rank is also balanced as much as possible.
Parameters: emul_i (int) – Number indicating the requested emulator iteration.
Returns: emul_s_to_core (list of lists) – A list containing the emulator systems that have been assigned to the corresponding MPI rank by the controller.
Notes
Currently, this function only uses high-level MPI. Additional speed can be obtained by also implementing low-level MPI, which will potentially be done in the future.
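To illustrate the kind of balancing described here, the sketch below uses a plain greedy heuristic: emulator systems that are active in the most iterations are placed first, each on the rank with the lowest load so far. This is an assumption made for illustration only. The function name, the `n_active` input (number of iterations in which each system is active) and the tie-breaking rule are hypothetical and do not reflect the package's actual algorithm.

```python
def assign_emul_s(n_active, n_ranks):
    """Greedy sketch: balance the number of active systems over MPI ranks."""
    emul_s_to_core = [[] for _ in range(n_ranks)]
    load = [0] * n_ranks

    # Place the most frequently active systems first (stable for ties)
    order = sorted(range(len(n_active)), key=lambda s: -n_active[s])
    for s in order:
        # Pick the rank with the lowest load; break ties on number of systems
        rank = min(range(n_ranks),
                   key=lambda r: (load[r], len(emul_s_to_core[r])))
        emul_s_to_core[rank].append(s)
        load[rank] += n_active[s]
    return emul_s_to_core


print(assign_emul_s([3, 3, 2, 2, 1, 1], 2))
```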

_check_future_compat
(req_version, dep_version)[source]¶ Checks if the version of this emulator is compatible with the provided req_version. If not, raises a FutureWarning, indicating that the given dep_version will no longer support this emulator.
Parameters:  req_version (str) – The version in which an incompatible change was introduced.
 dep_version (str) – The version in which the backward compatibility for this change will be removed.
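A check of this kind can be sketched with the standard `warnings` module. The version of the emulator is passed in explicitly here as `emul_version`, which is a hypothetical argument, and versions are assumed to be plain dotted integers such as "1.0.3". The message text is illustrative.

```python
import warnings


def _parse_version(version):
    # Assumes a plain dotted numeric version string, e.g. "1.0.3"
    return tuple(int(part) for part in version.split("."))


def check_future_compat(emul_version, req_version, dep_version):
    """Warn if an emulator predates the incompatible change in req_version."""
    if _parse_version(emul_version) < _parse_version(req_version):
        warnings.warn(
            "Emulator was constructed with version %s, which is older than "
            "v%s. Support for it will be removed in v%s."
            % (emul_version, req_version, dep_version),
            FutureWarning, stacklevel=2)
        return False
    return True
```

Comparing tuples of integers avoids the pitfall of comparing version strings directly, where "1.10.0" would sort before "1.9.0".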

_cleanup_emul_files
(emul_i)[source]¶ Opens all emulator HDF5-files and removes the provided emulator iteration emul_i and subsequent iterations from them. Also removes any related projection figures that have default names. If emul_i == 1, all emulator HDF5-files are removed instead.
Parameters: emul_i (int) – Number indicating the requested emulator iteration.

_construct_iteration
(emul_i)[source]¶ Constructs the emulator iteration corresponding to the provided emul_i, by performing the given emulation method and precalculating the prior expectation and variance values of the used model evaluation samples.
Parameters: emul_i (int) – Number indicating the requested emulator iteration.
Generates
All data sets that are required to evaluate the emulator at the constructed iteration.

_create_new_emulator
()[source]¶ Creates a new master HDF5-file that holds all the information of a new emulator and writes all important emulator details to it. Afterwards, resets all loaded emulator data and prepares the HDF5-file and emulator for the construction of the first emulator iteration.
Generates
A new master HDF5-file ‘prism.hdf5’ contained in the working directory specified in the Pipeline instance, holding all information required to construct the first iteration of the emulator.

_do_regression
(emul_i, emul_s_seq)[source]¶ Performs a forward stepwise linear regression for all requested emulator systems emul_s_seq in the provided emulator iteration emul_i. Calculates what the expectation values of all polynomial coefficients are. The polynomial order that is used in the regression depends on poly_order.
Parameters:  emul_i (int) – Number indicating the requested emulator iteration.
 emul_s_seq (list of int) – List of numbers indicating the requested emulator systems.
Generates (for every emulator system)
 rsdl_var : float
 Residual variance of the regression function.
 regr_score : float
 Fit-score of the regression function.
 poly_coef : 1D ndarray object
 Array containing the expectation values of the non-zero polynomial coefficients.
 poly_powers : 2D ndarray object
 Array containing the powers of the non-zero polynomial terms in the regression function.
 poly_idx : 1D ndarray object
 Array containing the indices of the non-zero polynomial terms in the regression function.
 poly_coef_cov : 1D ndarray object (if use_regr_cov is True)
 Array containing the covariance values of the non-zero polynomial coefficients.
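As an illustration of forward stepwise linear regression in general, the sketch below starts from an intercept-only model and repeatedly adds the candidate term that lowers the residual sum of squares the most, stopping once the improvement becomes negligible. It assumes NumPy is available. The function name, the `tol` stopping rule and the returned mean squared residual are choices made for this example and are not taken from the package.

```python
import numpy as np


def forward_stepwise(X, y, tol=1e-8):
    """Select columns of X one at a time by largest reduction in RSS."""
    n_sam, n_terms = X.shape
    selected = []

    # Start with an intercept-only model
    design = np.ones((n_sam, 1))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    tss = rss if rss > 0 else 1.0

    while len(selected) < n_terms:
        best = None
        for j in range(n_terms):
            if j in selected:
                continue
            # Fit the current model extended with candidate term j
            trial = np.column_stack([design, X[:, j]])
            c, *_ = np.linalg.lstsq(trial, y, rcond=None)
            trial_rss = float(np.sum((y - trial @ c) ** 2))
            if best is None or trial_rss < best[0]:
                best = (trial_rss, j, trial, c)

        # Stop when the best candidate no longer improves the fit noticeably
        if rss - best[0] <= tol * tss:
            break
        rss, j_best, design, coef = best
        selected.append(j_best)

    # coef[0] is the intercept; coef[1:] follow the order of 'selected'
    return selected, coef, rss / n_sam
```

Only the terms that end up in `selected` have non-zero coefficients, which is the same idea behind storing the indices and powers of the non-zero polynomial terms separately.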

_evaluate
(emul_i, par_set)[source]¶ Evaluates the emulator systems emul_s_seq at iteration emul_i for given par_set.
Parameters:  emul_i (int) – Number indicating the requested emulator iteration.
 emul_s_seq (list of int) – List of numbers indicating the requested emulator systems.
 par_set (1D ndarray object) – Model parameter value set to evaluate the emulator at.
Returns:

_get_active_par
(emul_i, emul_s_seq)[source]¶ Determines the active parameters to be used for every emulator system listed in emul_s_seq in the provided emulator iteration emul_i. Uses backwards stepwise elimination to determine the set of active parameters. The polynomial order that is used in the stepwise elimination depends on poly_order.
Parameters:  emul_i (int) – Number indicating the requested emulator iteration.
 emul_s_seq (list of int) – List of numbers indicating the requested emulator systems.
Generates (for every emulator system)
 active_par_data : 1D ndarray object
 Array containing the indices of all the parameters that are active in the emulator iteration emul_i.

_get_adj_exp
(emul_i, emul_s_seq, par_set, cov_vec)[source]¶ Calculates the adjusted emulator expectation values for requested emulator systems emul_s_seq at a given emulator iteration emul_i for specified parameter set par_set and corresponding covariance vector cov_vec.
Parameters:  emul_i (int) – Number indicating the requested emulator iteration.
 emul_s_seq (list of int) – List of numbers indicating the requested emulator systems.
 par_set (1D ndarray object) – Model parameter value set to calculate the adjusted emulator expectation for.
 cov_vec (2D ndarray object) – Covariance vector corresponding to par_set.
Returns: adj_exp_val (1D ndarray object) – Adjusted emulator expectation value for all requested emulator systems on this MPI rank.
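For orientation, the adjusted expectation can be written out under the assumption that the emulator follows the standard Bayes linear update, which this section does not state explicitly. Here D denotes the vector of known model outputs at the model evaluation samples and f(x) the emulated output at par_set. Reading Cov(f(x), D) as cov_vec and Var(D)^{-1}(D - E(D)) as the precalculated exp_dot_term is our interpretation of the descriptions in this section.

```latex
\mathrm{E}_D\bigl(f(x)\bigr)
  = \mathrm{E}\bigl(f(x)\bigr)
  + \mathrm{Cov}\bigl(f(x), D\bigr)\,\mathrm{Var}(D)^{-1}\,\bigl(D - \mathrm{E}(D)\bigr)
```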

_get_adj_var
(emul_i, emul_s_seq, par_set, cov_vec)[source]¶ Calculates the adjusted emulator variance values for requested emulator systems emul_s_seq at a given emulator iteration emul_i for specified parameter set par_set and corresponding covariance vector cov_vec.
Parameters:  emul_i (int) – Number indicating the requested emulator iteration.
 emul_s_seq (list of int) – List of numbers indicating the requested emulator systems.
 par_set (1D ndarray object) – Model parameter value set to calculate the adjusted emulator variance for.
 cov_vec (2D ndarray object) – Covariance vector corresponding to par_set.
Returns: adj_var_val (1D ndarray object) – Adjusted emulator variance value for all requested emulator systems on this MPI rank.
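Assuming the standard Bayes linear update (which this section does not state explicitly), the adjusted variance takes the form below. D denotes the vector of known model outputs at the model evaluation samples and f(x) the emulated output at par_set. Reading Cov(f(x), D) as cov_vec and Var(D)^{-1} as the stored inverse covariance matrix is our interpretation.

```latex
\mathrm{Var}_D\bigl(f(x)\bigr)
  = \mathrm{Var}\bigl(f(x)\bigr)
  - \mathrm{Cov}\bigl(f(x), D\bigr)\,\mathrm{Var}(D)^{-1}\,\mathrm{Cov}\bigl(D, f(x)\bigr)
```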

_get_cov
(emul_i, emul_s_seq, par_set1, par_set2)[source]¶ Calculates the full emulator covariances for requested emulator systems emul_s_seq at emulator iteration emul_i for given parameter sets par_set1 and par_set2. The contributions to these covariances depend on method.
Parameters:  emul_i (int) – Number indicating the requested emulator iteration.
 emul_s_seq (list of int) – List of numbers indicating the requested emulator systems.
 par_set1, par_set2 (1D ndarray object or None) – If par_set1 and par_set2 are both not None, calculate covariances for par_set1 with par_set2. If par_set1 is not None and par_set2 is None, calculate covariances for par_set1 with sam_set (covariance vector). If par_set1 and par_set2 are both None, calculate covariances for sam_set (covariance matrix). When not None, par_set is the model parameter value set to calculate the covariances for.
Returns: cov (1D, 2D or 3D ndarray object) – Depending on the arguments provided, a covariance value, vector or matrix for requested emulator systems.

_get_cov_matrix
(emul_i, emul_s_seq)[source]¶ Calculates the (inverse) matrix of covariances between known model evaluation samples for requested emulator systems emul_s_seq at emulator iteration emul_i.
Parameters:  emul_i (int) – Number indicating the requested emulator iteration.
 emul_s_seq (list of int) – List of numbers indicating the requested emulator systems.
Generates

_get_data_idx_flat
(emul_i)[source]¶ Obtains the data point identifiers data_idx that are assigned to each MPI rank for the provided emulator iteration emul_i and returns them as a single list. The number of data points assigned to each MPI rank is also returned.
Parameters: emul_i (int) – Number indicating the requested emulator iteration.
Returns:  n_data_rank (list of int) – The number of data points each MPI rank has assigned to it for given emul_i. These values can be used to split data_idx_flat up again into MPI rank-specific data point identifiers.
 data_idx_flat (list of tuples) – The data point identifiers that are assigned to an MPI rank for given emul_i.
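The relation between the two return values can be shown with a small sketch. Both function names and the nested-list input are hypothetical, and the sketch leaves out all MPI communication.

```python
def get_data_idx_flat(data_idx_per_rank):
    """Flatten per-rank data identifiers and record how many each rank has."""
    n_data_rank = [len(rank_idx) for rank_idx in data_idx_per_rank]
    data_idx_flat = [idx for rank_idx in data_idx_per_rank
                     for idx in rank_idx]
    return n_data_rank, data_idx_flat


def split_data_idx_flat(n_data_rank, data_idx_flat):
    """Undo the flattening using the per-rank counts."""
    it = iter(data_idx_flat)
    return [[next(it) for _ in range(n)] for n in n_data_rank]


per_rank = [[('A', 1), ('B', 2)], [('C', 3)], []]
n_data_rank, flat = get_data_idx_flat(per_rank)
print(n_data_rank, flat)
```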

_get_default_parameters
()[source]¶ Generates a dict containing default values for all emulator parameters.
Returns: par_dict (dict) – Dict containing all default emulator parameter values.

_get_emul_i
(emul_i, cur_iter=True)[source]¶ Checks if the provided emulator iteration emul_i can be requested or replaces it if None was provided. This method requires all MPI ranks to call it simultaneously.
Parameters: emul_i (int or None) – Number indicating the requested emulator iteration.
Other Parameters: cur_iter (bool) – Bool determining whether the current (True) or the next (False) emulator iteration is requested.
Returns: emul_i (int) – The requested emulator iteration that passed the check.
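One plausible reading of this check is sketched below. It assumes that None is replaced by the last constructed iteration (or the one after it when cur_iter is False) and that any other value must lie between 1 and that number. The explicit `last_emul_i` argument and the use of ValueError are hypothetical, and the MPI synchronisation is not modelled.

```python
def get_emul_i(emul_i, last_emul_i, cur_iter=True):
    """Sketch of validating a requested emulator iteration."""
    # Highest iteration that may be requested
    max_emul_i = last_emul_i if cur_iter else last_emul_i + 1

    if emul_i is None:
        # Replace None by the current (or next) iteration
        if max_emul_i < 1:
            raise ValueError("No emulator iteration is available yet.")
        return max_emul_i

    if not isinstance(emul_i, int) or not 1 <= emul_i <= max_emul_i:
        raise ValueError("Requested emulator iteration %r is not available "
                         "(valid range: 1 to %d)." % (emul_i, max_emul_i))
    return emul_i
```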

_get_emul_space
(emul_i)[source]¶ Returns the boundaries of the hypercube that encloses the parameter space in which the provided emulator iteration emul_i is defined.
Parameters: emul_i (int) – Number indicating the requested emulator iteration.
Returns: emul_space (2D ndarray object) – The requested hypercube boundaries. If emul_i == 1, this is equal to the defined model parameter space.
Note
The parameter space over which an emulator iteration is defined is always equal to the plausible space of the previous iteration.

_get_exp_dot_term
(emul_i, emul_s_seq)[source]¶ Pre-calculates the second expectation adjustment dot-term for requested emulator systems emul_s_seq at a given emulator iteration emul_i for all model evaluation samples and saves it for later use.
Parameters:  emul_i (int) – Number indicating the requested emulator iteration.
 emul_s_seq (list of int) – List of numbers indicating the requested emulator systems.
Generates
 exp_dot_term : 2D ndarray object
 2D array containing the pre-calculated values for the second adjustment dot-term of the adjusted expectation for requested emulator systems.

_get_inv_matrix
(matrix)[source]¶ Calculates the inverse of a given matrix. Right now only uses the pinv() function.
Parameters: matrix (2D array_like) – Matrix to be inverted.
Returns: matrix_inv (2D ndarray object) – Inverse of the given matrix.
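A minimal sketch of such an inversion is given below, assuming that the pinv() referred to is `numpy.linalg.pinv` (the Moore-Penrose pseudo-inverse). The function name is hypothetical.

```python
import numpy as np


def get_inv_matrix(matrix):
    """Return the (pseudo-)inverse of a 2D array_like."""
    # pinv equals the ordinary inverse for well-conditioned square matrices
    # and still returns a result when the matrix is singular
    return np.linalg.pinv(np.asarray(matrix, dtype=float))


print(get_inv_matrix([[2.0, 0.0], [0.0, 4.0]]))
```

A pseudo-inverse is a forgiving choice for covariance matrices, which can become numerically singular when two samples lie very close together.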

_get_poly_term_str
(active_par, poly_power)[source]¶ Returns the string representation of a polynomial term given by the provided active_par and poly_power.
Parameters:  active_par (list of int) – List containing the indices of the parameters whose polynomial powers are given in poly_power.
 poly_power (list of int) – List with the powers of the requested polynomial term.
Returns: poly_term (str) – String representation of the requested polynomial term.
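The exact string format is not given here, so the sketch below uses an illustrative one: parameters are written as `x<index>`, powers as `^<power>`, and factors are joined with `*`. Both the function name and this format are assumptions.

```python
def get_poly_term_str(active_par, poly_power):
    """Build a readable string for one polynomial term (illustrative format)."""
    factors = []
    for par, power in zip(active_par, poly_power):
        if power == 0:
            # A zero power contributes nothing to the term
            continue
        factors.append("x%d" % par if power == 1 else "x%d^%d" % (par, power))
    # A term in which all powers are zero is the constant term
    return "*".join(factors) if factors else "1"


print(get_poly_term_str([0, 2], [2, 1]))
```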

_get_prior_exp
(emul_i, emul_s_seq, par_set)[source]¶ Calculates the prior expectation value for requested emulator systems emul_s_seq at a given emulator iteration emul_i for specified parameter set par_set. This expectation depends on method.
Parameters:  emul_i (int) – Number indicating the requested emulator iteration.
 emul_s_seq (list of int) – List of numbers indicating the requested emulator systems.
 par_set (1D ndarray object or None) – If None, calculate the prior expectation values of sam_set. If not None, calculate the prior expectation value for the given model parameter value set.
Returns: prior_exp (1D or 2D ndarray object) – Prior expectation values for either sam_set or par_set for requested emulator systems.

_get_regr_cov
(emul_i, emul_s_seq, par_set1, par_set2)[source]¶ Calculates the covariances of the regression function for requested emulator systems emul_s_seq at emulator iteration emul_i for given parameter sets par_set1 and par_set2.
Parameters:  emul_i (int) – Number indicating the requested emulator iteration.
 emul_s_seq (list of int) – List of numbers indicating the requested emulator systems.
 par_set1, par_set2 (1D ndarray object or None) – If par_set1 and par_set2 are both not None, calculate regression covariances for par_set1 with par_set2. If par_set1 is not None and par_set2 is None, calculate regression covariances for par_set1 with sam_set (covariance vector). If par_set1 and par_set2 are both None, calculate regression covariances for sam_set (covariance matrix). When not None, par_set is the model parameter value set to calculate the regression covariances for.
Returns: regr_cov (1D, 2D or 3D ndarray object) – Depending on the arguments provided, a regression covariance value, vector or matrix for requested emulator systems.

_get_rsdl_var
(emul_i, emul_s_seq)[source]¶ Splits up the calculated residual variances for requested emulator systems emul_s_seq at emulator iteration emul_i into active and passive contributions.
Parameters:  emul_i (int) – Number indicating the requested emulator iteration.
 emul_s_seq (list of int) – List of numbers indicating the requested emulator systems.
Generates
 act_rsdl_var : list of float
 List containing the active portions of the residual variances.
 pas_rsdl_var : list of float
 List containing the passive portions of the residual variances.
If f_infl is not zero, this also includes the inflated residual variance values.

_load_data
(emul_i)[source]¶ Loads in all the important emulator data up to emulator iteration emul_i into memory.
Parameters: emul_i (int) – Number indicating the requested emulator iteration.
Generates
All relevant emulator data up to emulator iteration emul_i is loaded into memory.

_load_emulator
(modellink_obj)[source]¶ Checks if the provided working directory contains a constructed emulator and loads in the emulator data accordingly.
Parameters: modellink_obj (ModelLink object) – Instance of the ModelLink class that links the emulated model to this Pipeline object.

_prepare_new_iteration
(emul_i)[source]¶ Prepares the emulator for the construction of a new iteration emul_i. Checks if this iteration can be prepared or if it has been prepared before, and acts accordingly.
Parameters: emul_i (int) – Number indicating the requested emulator iteration.
Returns: reload (bool) – Bool indicating whether or not the controller rank of the Pipeline instance needs to reload its data.
Generates
A new group in the master HDF5-file with the emulator iteration as its name, containing subgroups corresponding to all emulator systems that will be used in this iteration.
Notes
Preparing an iteration that has been prepared before causes that iteration and all subsequent iterations of the emulator to be deleted. A check is carried out to see if it was necessary to re-prepare the requested iteration and a warning is given if this check fails.

_read_data_idx
(emul_s_group)[source]¶ Reads in and combines the parts of the data point identifier that is assigned to the provided emul_s_group.
Parameters: emul_s_group (Group object) – The HDF5-group from which the data point identifier needs to be read in.
Returns: data_idx (tuple of {int, float, str}) – The combined data point identifier.

_retrieve_parameters
()[source]¶ Reads in the emulator parameters from the provided working directory and saves them in the current Emulator instance.

_save_data
(emul_i, lemul_s, data_dict)[source]¶ Saves a given data dict {keyword: data} at the given emulator iteration emul_i and local emulator system lemul_s to the HDF5-file and as a data attribute to the current Emulator instance.
Parameters:  emul_i (int) – Number indicating the requested emulator iteration.
 lemul_s (int or None) – Number indicating the requested local emulator system. If None, use the master emulator file instead.
 data_dict (dict) – Dict containing the data that needs to be saved to the HDF5-file.
Keyword Arguments:  keyword ({‘active_par’; ‘active_par_data’; ‘cov_mat’; ‘exp_dot_term’; ‘mod_real_set’; ‘regression’; ‘rsdl_var’}) – String specifying the type of data that needs to be saved.
 data ({int; float; str; array_like} or dict) – The actual data that needs to be saved at data keyword keyword. If dict, save every item individually.
Generates
The specified data is saved to the HDF5-file.

_set_mock_data
()[source]¶ Loads previously used mock data into the ModelLink object, overwriting the parameter estimates, data values, data errors, data spaces and data identifiers with their mock equivalents.
Generates
Overwrites the corresponding ModelLink class properties with the previously used values (taken from the first emulator iteration).

_set_modellink
(modellink_obj, modellink_loaded)[source]¶ Sets the ModelLink object that will be used for constructing this emulator. If a constructed emulator is present, checks if the provided modellink_obj argument matches the ModelLink subclass used to construct it.
Parameters:  modellink_obj (ModelLink object) – Instance of the ModelLink class that links the emulated model to this Pipeline object. The provided ModelLink object must match the one used to construct the loaded emulator.
 modellink_loaded (str or None) – If str, the name of the ModelLink subclass that was used to construct the loaded emulator. If None, no emulator is loaded.

_set_parameters
()[source]¶ Sets the Emulator parameters from the prism_dict property and saves them in the current Emulator instance.

_set_sam_set_data
(emul_i, sam_set)[source]¶ Sets the provided sam_set as the iteration data at the given emulator iteration emul_i.
Parameters:  emul_i (int) – Number indicating the requested emulator iteration.
 sam_set (2D ndarray object) – Array containing the model evaluation samples for emulator iteration emul_i.

_write_data_idx
(emul_s_group, data_idx)[source]¶ Splits a given data_idx up into individual parts and saves it as an attribute to the provided emul_s_group.
Parameters:  emul_s_group (Group object) – The HDF5-group to which the data point identifier needs to be saved.
 data_idx (tuple of {int, float, str}) – The data point identifier to be saved.
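The split-and-combine round trip performed by _write_data_idx and _read_data_idx can be sketched as follows. A plain dict stands in for the attributes of an HDF5-group, and the attribute names `data_idx_len` and `data_idx_<i>` are hypothetical. Storing each part separately lets a mixed tuple of int, float and str keep its individual types.

```python
def write_data_idx(attrs, data_idx):
    """Store every part of a data point identifier as its own attribute."""
    attrs["data_idx_len"] = len(data_idx)
    for i, part in enumerate(data_idx):
        attrs["data_idx_%d" % i] = part


def read_data_idx(attrs):
    """Combine the stored parts back into a single identifier tuple."""
    n_parts = attrs["data_idx_len"]
    return tuple(attrs["data_idx_%d" % i] for i in range(n_parts))


group_attrs = {}  # stand-in for the attributes of an HDF5-group
write_data_idx(group_attrs, (1, 2.5, "A"))
print(read_data_idx(group_attrs))
```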

__weakref__
¶ list of weak references to the object (if defined)

act_rsdl_var
¶ The active contribution of the residual variance of every emulator system on this MPI rank. Obtained from either rsdl_var (regression) or sigma (Gaussian).
Type: dict of float

active_emul_s
¶ The indices of the emulator systems on this MPI rank that are active.
Type: list of int

active_par
¶ The model parameter names that are considered active. Only available on the controller rank.
Type: list of str

active_par_data
¶ The model parameter names that are considered active for every emulator system on this MPI rank.
Type: dict of lists

ccheck
¶ The emulator system components that are still required to complete the construction of an emulator iteration on this MPI rank. The controller rank additionally lists the required components that are emulator iteration specific (‘mod_real_set’ and ‘active_par’).
Type: list of str

cov_mat_inv
¶ The inverses of the covariance matrices for every emulator system on this MPI rank.
Type: dict of ndarray

data_idx_to_core
¶ List of the data identifiers that were assigned to the emulator systems listed in emul_s_to_core. Only available on the controller rank.
Type: list of lists

emul_i
¶ The last emulator iteration that is fully constructed for all emulator systems on this MPI rank.
Type: int

emul_s
¶ The indices of the emulator systems that are assigned to this MPI rank.
Type: list of int

emul_s_to_core
¶ List of the indices of the emulator systems that are assigned to every MPI rank. Only available on the controller rank.
Type: list of lists

emul_space
¶ The boundaries of the hypercube that encloses the parameter space in which the specified emulator iteration is defined. This is always equal to the plausible space of the previous iteration.
Type: dict of ndarray

emul_type
¶ The type of emulator that is currently loaded. This determines the way in which the Pipeline instance will treat this Emulator instance.
Type: str

exp_dot_term
¶ The second expectation adjustment dot-term values of all model evaluation samples for every emulator system on this MPI rank.
Type: dict of ndarray

f_infl
¶ The residual variance inflation factor. The prior variance of all known samples is inflated by this factor multiplied with rsdl_var (regression) or sigma (Gaussian). If this value is zero, no variance inflation is performed.
Type: float

l_corr
¶ The Gaussian correlation lengths for all model parameters. The correlation length of a model parameter is defined as the maximum distance between two values of that parameter within which the Gaussian contribution to the correlation between the values is still significant.
Type: dict of float
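To make the role of a correlation length concrete, the sketch below evaluates one common Gaussian (squared-exponential) correlation function. Whether the package uses exactly this form and normalisation is an assumption. The function name and the use of plain lists for the parameter sets and correlation lengths are hypothetical.

```python
import math


def gaussian_corr(par_set1, par_set2, l_corr):
    """Squared-exponential correlation between two parameter sets."""
    # Sum the squared differences, each scaled by its correlation length
    scaled_sq_dist = sum(((a - b) / l) ** 2
                         for a, b, l in zip(par_set1, par_set2, l_corr))
    return math.exp(-scaled_sq_dist)


# Two sets that differ by exactly one correlation length in one parameter
print(gaussian_corr([0.0, 1.0], [0.3, 1.0], [0.3, 0.5]))
```

In this form the correlation drops to exp(-1) when two values are one correlation length apart, and falls off quickly beyond that.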

method
¶ The emulation method to use for constructing the emulator. Possible values are ‘gaussian’, ‘regression’ and ‘full’.
Type: str

mod_set
¶ The model outputs corresponding to the samples in sam_set for every emulator system on this MPI rank.
Type: dict of ndarray

n_cross_val
¶ Number of (k-fold) cross-validations that are used for determining the quality of the regression process. It is set to zero if cross-validations are not used. If method == ‘gaussian’ and do_active_anal is False, this number is not required.
Type: int

n_emul_s_tot
¶ Total number of emulator systems assigned to all MPI ranks combined. Only available on the controller rank.
Type: int

n_sam
¶ Number of model evaluation samples that have been/will be used to construct an emulator iteration.
Type: int

pas_rsdl_var
¶ The passive contribution of the residual variance of every emulator system on this MPI rank. If f_infl is not zero, this also includes the inflated residual variance value. Obtained from either rsdl_var (regression) or sigma (Gaussian).
Type: dict of float

poly_coef
¶ The non-zero coefficients of the polynomial terms in the regression function for every emulator system on this MPI rank. Empty if method == ‘gaussian’.
Type: list of ndarray

poly_coef_cov
¶ The covariances of all coefficients in poly_coef for every emulator system on this MPI rank. Empty if method == ‘gaussian’ or use_regr_cov is False.
Type: list of ndarray

poly_idx
¶ The indices of all polynomial terms with non-zero coefficients in the regression function for every emulator system on this MPI rank. Empty if method == ‘gaussian’.
Type: list of ndarray

poly_order
¶ Polynomial order that is considered for the regression process. If method == ‘gaussian’ and do_active_anal is False, this number is not required.
Type: int

poly_powers
¶ The powers of all polynomial terms with non-zero coefficients in the regression function for every emulator system on this MPI rank. Empty if method == ‘gaussian’.
Type: list of ndarray

poly_terms
¶ Overview of all polynomial terms with non-zero coefficients in the regression function for every emulator system on this MPI rank. Empty if method == ‘gaussian’.
This is a human-readable representation of poly_coef plus poly_powers. Given its formatting, using it for any operations is not advised.
Type: dict of dict

prism_version
¶ The version of PRISM that was used to construct the emulator that is currently loaded.
Type: str

rsdl_var
¶ The residual variance of every emulator system on this MPI rank. Obtained from the regression process and replaces the Gaussian sigma. Empty if method == ‘gaussian’.
Type: dict of float

sam_set
¶ The model evaluation samples that have been/will be used to construct the specified emulator iteration.
Type: list of dict

sigma
¶ Value of the Gaussian sigma. If method != ‘gaussian’, this value is not required, since it is obtained from the regression process instead.
Type: float
