ModelLink

class prism.modellink.ModelLink(*, model_parameters=None, model_data=None)

Provides an abstract base class definition that allows the Pipeline class to be linked to any model/test object of choice. Every model wrapper used in the Pipeline class must be an instance of the ModelLink class.

Description

The ModelLink class is an abstract base class, which forms the base for wrapping a model and allowing PRISM to use it effectively. Because every model must be wrapped in a user-made ModelLink subclass, several tools are provided to make this process as versatile as possible.

The ModelLink class uses three properties that define how the subclass is used by PRISM: name, call_type and MPI_call. The first defines the name of the subclass, which PRISM uses to identify the subclass and to check that a different subclass was not used by accident. The other two are flags that determine how the call_model() method should be called. These three properties can be set anywhere during the initialization of the ModelLink subclass; any property that is not modified is set to its default value.

Every ModelLink subclass needs to be provided with two different data sets: model parameters and model data. The model parameters define which parameters the model takes, what their names are and within what range of values each parameter must lie. The model data, on the other hand, states where in a model realization a data value must be retrieved and compared with a provided observational value. One can think of the model data as the observational constraints used to calculate the likelihood in a Bayesian analysis.

The model parameters and model data can be set in two different ways. They can be hard-coded into the ModelLink subclass by altering the get_default_model_parameters() and get_default_model_data() methods or set by providing them during class initialization. A combination of both is also possible. More details on this can be found in __init__().

The ModelLink class has two abstract methods that must be overridden before the subclass can be initialized. The call_model() method is the most important one, as it provides PRISM with a way of calling the model wrapped in the ModelLink subclass. The get_md_var() method allows PRISM to calculate the model discrepancy variance.

Notes

The __init__() method may be extended by the ModelLink subclass, but the superclass version must always be called.

If required, one can use the test_subclass() function to test whether a ModelLink subclass functions correctly.

_ModelLink__set_model_data(add_model_data)

Generates the model data properties from the default model data and the additional input argument add_model_data.

Parameters:	add_model_data (array_like, dict, str or None) – Anything that can be converted to a dict that provides non-default model data information or None if only default data is used from `get_default_model_data()`.

Generates

n_data : int: Number of provided data points.
data_val : list: List with values of provided data points.
data_err : list of lists: List with upper and lower \(1\sigma\)-confidence levels of provided data points.
data_spc : list: List with types of value space ({‘lin’, ‘log’, ‘ln’}) of provided data points.
data_idx : list of tuples: List with user-defined data point identifiers.

_ModelLink__set_model_parameters(add_model_parameters)

Generates the model parameter properties from the default model parameters and the additional input argument add_model_parameters.

Parameters:	add_model_parameters (array_like, dict, str or None) – Anything that can be converted to a dict that provides non-default model parameter information or None if only default information is used from `get_default_model_parameters()`.

Generates

n_par : int: Number of model parameters.
par_name : list: List with model parameter names.
par_rng : ndarray object: Array containing the lower and upper values of the model parameters.
par_est : list: List containing user-defined estimated values of the model parameters. Contains None in places where estimates were not provided.

__init__(*, model_parameters=None, model_data=None)

Initialize an instance of the ModelLink subclass.

Other Parameters:

model_parameters, model_data (array_like, dict, str or None. Default: None) – Anything that can be converted to a dict that provides non-default model parameter/data information, or None if only default information is used from get_default_model_parameters() or get_default_model_data(). For more information on the layout of these dicts, see Notes.

If array_like, dict(model_parameters/model_data) must generate a dict with the correct layout. If dict, the dict itself must have the correct layout. If str, the string must be the path to a file containing the dict keys in the first column and the dict values in the second column, which combined generate a dict with the correct layout.

Notes (model_parameters)

The model parameters provide this ModelLink subclass with the names, ranges and estimates of all model parameters that need to be explored.

The model parameters dict must have the parameter names as its keys, and as its values a 1D list containing the lower bound, the upper bound and, if applicable, the estimate of that parameter. It is not required to provide an estimate for every parameter. The estimates are used to draw illustrative lines when making projection figures. An example of a model parameters file can be found in the ‘data’ folder of the PRISM package. If required, one can use the convert_parameters() function to validate its formatting.

Formatting :: {par_name: [lower_bnd, upper_bnd, par_est]}
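As an illustration of this layout, the sketch below builds a hypothetical model parameters dict (the parameter names A, B and C and all values are made up) and splits it into the names, ranges and estimates that the par_name, par_rng and par_est properties hold. split_model_parameters() is a simplified, hypothetical stand-in written for this page, not PRISM's own conversion code; convert_parameters() remains the function to use for validating real input.

```python
import numpy as np

# Hypothetical model parameters: {par_name: [lower_bnd, upper_bnd, par_est]}.
# Names and values are illustrative only; the estimate may be omitted.
model_parameters = {
    'A': [1, 5, 2.5],
    'B': [-1, 3],        # no estimate provided
    'C': [0, 10, 6],
}


def split_model_parameters(par_dict):
    """Split a model parameters dict into names, ranges and estimates.

    Hypothetical sketch mirroring what par_name, par_rng and par_est hold.
    """
    par_name, par_rng, par_est = [], [], []
    for name, values in par_dict.items():
        lower, upper = float(values[0]), float(values[1])
        # Simple sanity check added for this sketch
        if not lower < upper:
            raise ValueError(f"Invalid range for parameter {name!r}")
        # Use None where no estimate was provided
        est = float(values[2]) if len(values) > 2 and values[2] is not None else None
        par_name.append(name)
        par_rng.append([lower, upper])
        par_est.append(est)
    return par_name, np.array(par_rng), par_est


par_name, par_rng, par_est = split_model_parameters(model_parameters)
```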

Notes (model_data)

The model data provides this ModelLink subclass with the observational data points that need to be used to constrain this model.

The model data dict must have the data identifiers (data_idx) as its keys, and as its values a 1D list containing the data value (data_val), the data errors (data_err) and the data space (data_spc).

If the data errors are given with one value, then the data points are assumed to have a centered \(1\sigma\)-confidence interval. If the data errors are given with two values, then the data points are assumed to have a \(1\sigma\)-confidence interval defined by the provided upper and lower errors.

Each data space is one of five strings ({‘lin’, ‘log’ or ‘log_10’, ‘ln’ or ‘log_e’}) indicating in which of the three value spaces (linear, log, ln) the data value is given. It defaults to ‘lin’ if it is not provided.

The data identifier is a sequence of bools, ints, floats and strings that is unique for every data point. PRISM uses it to identify a data point, which is required in some cases (like MPI), while the model itself can use it as a description of the operations required to extract the data point from the model output. It can be provided as any sequence of any length for any data point. If a sequence contains a single element, it is replaced by just that element instead of a tuple.

A simple example of a data identifier is the model input at which the data point is evaluated, such that \(f(\text{data\_idx}) = \text{data\_val}\), where the output of the model is given by \(f(x)\).

An example of a model data file can be found in the ‘data’ folder of the PRISM package. If required, one can use the convert_data() function to validate its formatting.

Formatting :

{(data_idx_0, data_idx_1, ..., data_idx_n): [data_val, data_err, data_spc]}

or

{(data_idx_0, data_idx_1, ..., data_idx_n): [data_val, upper_data_err, lower_data_err, data_spc]}
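The sketch below applies these rules to a hypothetical model data dict, showing both error formats, the default value space and the unwrapping of single-element identifiers. The identifiers and values are invented for illustration, and split_model_data() is a hypothetical helper rather than part of PRISM; convert_data() is the function to use for validating real input.

```python
# Hypothetical model data; identifiers and values are made up.
model_data = {
    1: [1.0, 0.1],                     # centered error, space defaults to 'lin'
    (2, 'x'): [2.0, 0.2, 0.1, 'lin'],  # separate upper and lower errors
    (3,): [0.5, 0.05, 'log_10'],       # single-element identifier, log10 space
}

# Accepted value-space strings and the space each one refers to
SPACE_ALIASES = {'lin': 'lin', 'log': 'log', 'log_10': 'log',
                 'ln': 'ln', 'log_e': 'ln'}


def split_model_data(data_dict):
    """Split a model data dict into data_idx, data_val, data_err and data_spc.

    Hypothetical sketch following the rules in the Notes above.
    """
    data_idx, data_val, data_err, data_spc = [], [], [], []
    for idx, values in data_dict.items():
        values = list(values)
        # An optional trailing string gives the value space (default 'lin')
        spc = values.pop() if isinstance(values[-1], str) else 'lin'
        val, *errs = values
        if len(errs) == 1:
            errs = [errs[0], errs[0]]  # centered 1-sigma interval
        elif len(errs) != 2:
            raise ValueError(f"Data point {idx!r} needs one or two errors")
        # A single-element identifier is replaced by that element
        if isinstance(idx, tuple) and len(idx) == 1:
            idx = idx[0]
        data_idx.append(idx)
        data_val.append(float(val))
        data_err.append([float(e) for e in errs])  # [upper, lower]
        data_spc.append(SPACE_ALIASES[spc])
    return data_idx, data_val, data_err, data_spc
```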

_check_md_var(md_var, name)

Checks validity of provided set of model discrepancy variances md_var in this ModelLink instance.

Parameters:	md_var (1D or 2D array_like or dict) – Model discrepancy variance set to validate in this `ModelLink` instance.
	name (str) – The name of the model discrepancy set, which is used in the error message if the validation fails.
Returns:	md_var (2D `ndarray` object) – The (converted) provided md_var if the validation was successful. If md_var was a dict, it will be converted to a `ndarray` object.

_check_mod_set(mod_set, name)

Checks validity of provided set of model outputs mod_set in this ModelLink instance.

Parameters:	mod_set (1D or 2D array_like or dict) – Model output (set) to validate in this `ModelLink` instance.
	name (str) – The name of the model output (set), which is used in the error message if the validation fails.
Returns:	mod_set (1D or 2D `ndarray` object) – The provided mod_set if the validation was successful. If mod_set was a dict, it will be converted to a `ndarray` object (sorted on `data_idx`).
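One plausible reading of the dict conversion, assuming the columns of the resulting array follow the order of the identifiers in data_idx, is sketched below. mod_dict_to_array() is a hypothetical helper, not the method's actual implementation; scalar values give a 1D array and 1D values give an array of shape (n_sam, n_data).

```python
import numpy as np


def mod_dict_to_array(mod_dict, data_idx):
    """Order a dict of model outputs to match the identifiers in data_idx.

    Hypothetical sketch: values may be scalars (one sample) or 1D
    sequences (one value per sample).
    """
    missing = [idx for idx in data_idx if idx not in mod_dict]
    if missing:
        raise KeyError(f"Missing model output for data_idx {missing}")
    columns = [np.asarray(mod_dict[idx], dtype=float) for idx in data_idx]
    # Stack along the last axis so that rows are samples, columns data points
    return np.stack(columns, axis=-1)
```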

_check_sam_set(sam_set, name)

Checks validity of provided set of model parameter samples sam_set in this ModelLink instance.

Parameters:	sam_set (1D or 2D array_like or dict) – Parameter/sample set to validate in this `ModelLink` instance.
	name (str) – The name of the parameter/sample set, which is used in the error message if the validation fails.
Returns:	sam_set (1D or 2D `ndarray` object) – The provided sam_set if the validation was successful. If sam_set was a dict, it will be converted to a `ndarray` object.

_get_backup_path(emul_i, suffix)

Returns the absolute path to a backup file made by this ModelLink instance, using the provided emul_i and suffix.

This method is used by the _make_backup() and _read_backup() methods, and should not be called directly.

Parameters:	emul_i (int) – The emulator iteration for which a backup filepath is needed.
	suffix (str or None) – If str, determine the path to the associated backup file using the provided suffix; if suffix is empty, obtain the last created backup file. If None, create a new path to a backup file.
Returns:	filepath (str) – Absolute path to requested backup file.

_get_model_par_seq(par_seq, name)

Converts a provided sequence par_seq of model parameter names and indices to a list of indices, removes duplicates and checks if every provided name/index is valid.

Parameters:	par_seq (1D array_like of {int, str}) – A sequence of integers and strings determining which model parameters need to be used for a certain operation.
	name (str) – A string stating the name of the variable the result of this method will be stored in. Used for error messages.
Returns:	par_seq_conv (list of int) – The provided sequence par_seq converted to a sorted list of model parameter indices.
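The conversion described here can be sketched as follows. get_par_indices() is a hypothetical helper that takes the list of parameter names explicitly; accepting only non-negative Python ints as indices is an assumption of this sketch.

```python
def get_par_indices(par_seq, par_name):
    """Convert parameter names and/or indices to a sorted list of unique indices.

    Hypothetical sketch; par_name is the list of model parameter names.
    """
    indices = set()  # a set removes duplicates
    for item in par_seq:
        if isinstance(item, str):
            if item not in par_name:
                raise ValueError(f"Unknown model parameter name {item!r}")
            indices.add(par_name.index(item))
        elif isinstance(item, int) and not isinstance(item, bool):
            if not 0 <= item < len(par_name):
                raise ValueError(f"Model parameter index {item} is out of range")
            indices.add(item)
        else:
            raise TypeError(f"Invalid model parameter identifier {item!r}")
    return sorted(indices)
```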

_get_sam_space(sam_set)

Returns the boundaries of the hypercube that encloses the parameter space in which the provided sam_set is defined.

The main use for this function is to determine which part of model parameter space the provided sam_set was likely sampled from. Because of this, extra spacing is added to the boundaries to reduce the influence of the sampling method used.

Parameters:	sam_set (1D or 2D array_like or dict) – Parameter/sample set for which an enclosing hypercube is requested.
Returns:	sam_space (2D `ndarray` object) – The requested hypercube boundaries.
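A minimal sketch of such an enclosing hypercube is given below. The padding rule (widening each side by a fixed fraction pad_frac of the sampled width) is a hypothetical choice made for illustration; the amount of extra spacing PRISM actually adds is not specified here.

```python
import numpy as np


def enclosing_hypercube(sam_set, pad_frac=0.1):
    """Return [lower, upper] bounds per parameter that enclose sam_set.

    Hypothetical sketch: each side is widened by pad_frac times the
    sampled width in that dimension.
    """
    sam_set = np.atleast_2d(np.asarray(sam_set, dtype=float))
    lower = sam_set.min(axis=0)
    upper = sam_set.max(axis=0)
    pad = pad_frac * (upper - lower)
    # One row per parameter: [lower_bnd, upper_bnd]
    return np.column_stack([lower - pad, upper + pad])
```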

_make_backup(*args, **kwargs)

WARNING: This is an advanced utility method and will probably not work unless used properly. Use with caution!

Creates an HDF5-file backup of the provided args and kwargs when called by the call_model() method or any of its inner functions. Additionally, the backup will contain the emul_i, par_set and data_idx values that were passed to the call_model() method. It also contains the version of PRISM that made the backup. The backup can be restored using the _read_backup() method.

If incorrect use of this method is detected, a RequestWarning is raised (and the method returns) rather than a RequestError, in order not to disrupt the call to call_model().

Parameters:	args (positional arguments) – All positional arguments that must be stored in the backup file.
	kwargs (keyword arguments) – All keyword arguments that must be stored in the backup file.

Notes

The name of the created backup file contains the value of emul_i, name and a random string to avoid replacing an already existing backup file.

The saved emul_i, par_set and data_idx are the values these variables have locally in the call_model() method at the point this method is called. Because of this, making any changes to them may cause problems and is therefore heavily discouraged. If changes are necessary, it is advised to copy them to a different variable first.

_read_backup(emul_i, *, suffix=None)

Reads in a backup HDF5-file created by the _make_backup() method, using the provided emul_i and the value of name.

Parameters:	emul_i (int) – The emulator iteration that was provided to the `call_model()` method when the backup was made.
Other Parameters:	suffix (str or None. Default: None) – The suffix of the backup file (everything between parentheses) that needs to be read. If None or empty, the last created backup will be read.
Returns:	filename (str) – The absolute path to the backup file that has been read.
	data (dict with keys (‘emul_i’, ‘prism_version’, ‘par_set’, ‘data_idx’, ‘args’, ‘kwargs’)) – A dict containing the data that was provided to the `_make_backup()` method.

_to_par_space(sam_set)

Converts provided sam_set from unit space ([0, 1]) to parameter space ([lower_bnd, upper_bnd]).

_to_unit_space(sam_set)

Converts provided sam_set from parameter space ([lower_bnd, upper_bnd]) to unit space ([0, 1]).
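Assuming a linear rescaling per parameter, the two conversions are inverses of each other and can be sketched as below, with par_rng holding one [lower_bnd, upper_bnd] row per parameter. The function names are hypothetical and the linear mapping is an assumption of this sketch.

```python
import numpy as np


def to_par_space(sam_set, par_rng):
    """Map samples from the unit hypercube [0, 1] to [lower_bnd, upper_bnd]."""
    par_rng = np.asarray(par_rng, dtype=float)
    lower, width = par_rng[:, 0], par_rng[:, 1] - par_rng[:, 0]
    return lower + np.asarray(sam_set, dtype=float) * width


def to_unit_space(sam_set, par_rng):
    """Map samples from [lower_bnd, upper_bnd] back to the unit hypercube."""
    par_rng = np.asarray(par_rng, dtype=float)
    lower, width = par_rng[:, 0], par_rng[:, 1] - par_rng[:, 0]
    return (np.asarray(sam_set, dtype=float) - lower) / width
```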

call_model(emul_i, par_set, data_idx)

Calls the model wrapped in this ModelLink subclass at emulator iteration emul_i for model parameter values par_set and returns the data points corresponding to data_idx.

This method is called with keyword arguments only.

This is an abstract method and must be overridden by the ModelLink subclass.

Parameters:

emul_i (int) – Number indicating the requested emulator iteration.
par_set (dict of float64) – Dict containing the values for all model parameters corresponding to the requested model realization(s). If the model is single-called, each dict item is formatted as {par_name: par_val}. If it is multi-called, each item is formatted as {par_name: [par_val_1, par_val_2, ..., par_val_n]}.
data_idx (list of tuples) – List containing the user-defined data point identifiers corresponding to the requested data points.

Returns:

data_val (1D or 2D array_like or dict) – Array containing the data values corresponding to the requested data points generated by the requested model realization(s). If the model is multi-called, data_val is of shape (n_sam, n_data). If dict, it has the identifiers in data_idx as its keys with either scalars or 1D array_likes as its values.

Note

If this model is multi-called, then the parameter sets in the provided par_set dict will be sorted in the order of the model parameters (i.e., sorted on the values of the first parameter first, then on those of the second parameter, etc.).
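The sketch below shows what a call_model() body could look like for a hypothetical model \(f(x) = A + 0.1\,B\sin(C x)\) that uses each data identifier directly as the input x. The class name and the model are invented, and the ModelLink base class is left out so that the sketch runs without PRISM installed; a real wrapper would subclass prism.modellink.ModelLink and also override get_md_var(). NumPy broadcasting lets the same code handle a single-called par_set (scalar values) and a multi-called one (one value per sample).

```python
import numpy as np


class SineModelSketch:
    """Hypothetical stand-in for a ModelLink subclass (base class omitted)."""

    def call_model(self, emul_i, par_set, data_idx):
        # Parameter values are scalars (single call) or 1D arrays (multi call)
        A = np.asarray(par_set['A'], dtype=float)
        B = np.asarray(par_set['B'], dtype=float)
        C = np.asarray(par_set['C'], dtype=float)
        # Each data identifier is used directly as the model input x
        x = np.asarray(data_idx, dtype=float)
        # Single call gives shape (n_data,); multi call gives (n_sam, n_data)
        return A[..., None] + 0.1 * B[..., None] * np.sin(C[..., None] * x)
```

Usage: SineModelSketch().call_model(emul_i=1, par_set={'A': 1.0, 'B': 2.0, 'C': 1.0}, data_idx=[0.0, 1.0]) returns one value per data identifier.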

get_default_model_data()

Returns the default model data to use for every instance of this ModelLink subclass. By default, returns _default_model_data.

get_default_model_parameters()

Returns the default model parameters to use for every instance of this ModelLink subclass. By default, returns _default_model_parameters.

get_md_var(emul_i, par_set, data_idx)

Calculates the linear model discrepancy variance at a given emulator iteration emul_i for model parameter values par_set and given data points data_idx for the model wrapped in this ModelLink subclass.

This method is always single-called by one MPI rank, with keyword arguments only.

This is an abstract method and must be overridden by the ModelLink subclass.

Parameters:

emul_i (int) – Number indicating the requested emulator iteration.
par_set (dict of float64) – Dict containing the values for all model parameters corresponding to the requested model realization.
data_idx (list of tuples) – List containing the user-defined data point identifiers corresponding to the requested data points.

Returns:

md_var (1D or 2D array_like or dict) – Array containing the linear model discrepancy variance values corresponding to the requested data points. If 1D array_like, data is assumed to have a centered \(1\sigma\)-confidence interval. If 2D array_like, the values determine the upper and lower variances and the array is of shape (n_data, 2). If dict, it has the identifiers in data_idx as its keys with either scalars or 1D array_likes of length 2 as its values.

Notes

The returned model discrepancy variance values must be in linear form, even for those data values that are returned in logarithmic form by the call_model() method. Otherwise, the emulation process may not converge properly.
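As an illustration of the linear-form requirement, the sketch below returns a dict of variances for a hypothetical model discrepancy of 10% of each data value, first converting values stored in ‘log’ or ‘ln’ space back to linear values. Both the 10% figure and the DiscrepancySketch class (which stands in for a ModelLink subclass holding data_idx, data_val and data_spc) are assumptions made for this example.

```python
import numpy as np


class DiscrepancySketch:
    """Hypothetical stand-in holding data properties like a ModelLink subclass."""

    def __init__(self, data_idx, data_val, data_spc):
        self.data_idx = list(data_idx)
        self.data_val = list(data_val)
        self.data_spc = list(data_spc)

    def get_md_var(self, emul_i, par_set, data_idx):
        # Assumed discrepancy: 10% of each data value, expressed as a
        # linear variance even when the value is stored in log form
        md_var = {}
        for idx in data_idx:
            i = self.data_idx.index(idx)
            val, spc = self.data_val[i], self.data_spc[i]
            if spc == 'log':
                val = 10.0**val
            elif spc == 'ln':
                val = np.exp(val)
            md_var[idx] = (0.1 * val)**2
        return md_var
```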

get_str_repr()

Returns a list of string representations of all additional input arguments with which this ModelLink subclass was initialized.

MPI_call

Whether call_model() can/should be called by all MPI ranks simultaneously instead of by the controller. By default, only the controller rank calls the model (False).

Type:	bool

_default_model_data

The default model data to use for every instance of this ModelLink subclass.

Type:	dict

_default_model_parameters

The default model parameters to use for every instance of this ModelLink subclass.

Type:	dict

call_type

String indicating whether call_model() should be supplied with a single evaluation sample (‘single’) or a set of samples (‘multi’), or can be supplied with both (‘hybrid’). By default, single model calls are requested (‘single’).

Type:	str
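Based on the descriptions of call_type, single_call and multi_call on this page, the three call_type values map onto the two flags as sketched below; call_flags() is a hypothetical helper, not part of PRISM.

```python
def call_flags(call_type):
    """Return (single_call, multi_call) for a call_type string.

    Hypothetical sketch of the relation between call_type and the flags.
    """
    flags = {'single': (True, False),
             'multi': (False, True),
             'hybrid': (True, True)}
    try:
        return flags[call_type]
    except KeyError:
        raise ValueError(f"Invalid call_type {call_type!r}") from None
```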

data_err

The upper and lower \(1\sigma\)-confidence levels of provided data points.

Type:	list of lists of float

data_idx

The user-defined data point identifiers.

Type:	list of tuples

data_spc

The types of value space ({‘lin’, ‘log’, ‘ln’}) of provided data points.

Type:	list of str

data_val

The values of provided data points.

Type:	list of float

multi_call

Whether call_model() can/should be supplied with a set of evaluation samples. At least one of single_call and multi_call must be True. By default, single model calls are requested (False).

Type:	bool

n_data

Number of provided data points.

Type:	int

n_par

Number of model parameters.

Type:	int

name

Name associated with an instance of this ModelLink subclass. By default, it is set to the name of this ModelLink subclass. It can be set manually to allow for more user control.

Type:	str

par_est

The user-defined estimated values of the model parameters. Contains None in places where estimates were not provided.

Type:	list of {float, None}

par_name

List with model parameter names.

Type:	list of str

par_rng

The lower and upper values of the model parameters.

Type:	`ndarray`

single_call

Whether call_model() can/should be supplied with a single evaluation sample. At least one of single_call and multi_call must be True. By default, single model calls are requested (True).

Type:	bool