ModelLink¶
-
class
prism.modellink.
ModelLink
(*, model_parameters=None, model_data=None)[source]¶ Provides an abstract base class definition that allows the
Pipeline class to be linked to any model/test object of choice. Every model wrapper used in the Pipeline class must be an instance of the ModelLink class.
Description
The ModelLink class is an abstract base class, which forms the base for wrapping a model and allowing PRISM to use it effectively. Because every model must be wrapped in a user-made ModelLink subclass, several tools are provided to make this as versatile as possible.
The ModelLink class uses three properties that define the way the subclass will be used by PRISM: name, call_type and MPI_call. The first defines the name of the subclass, which PRISM uses to identify the subclass and to check that a different subclass was not used by accident. The other two are flags that determine how the call_model() method should be used. These three properties can be set anywhere during the initialization of the ModelLink subclass; any that are not modified are set to a default value.
Every ModelLink subclass needs to be provided with two different data sets: model parameters and model data. The model parameters define which parameters the model can take, what their names are and in what value range each parameter must lie. The model data, on the other hand, states where in a model realization a data value must be retrieved and compared with a provided observational value. One can think of the model data as the observational constraints used to calculate the likelihood in a Bayesian analysis.
The model parameters and model data can be set in two different ways. They can be hard-coded into the ModelLink subclass by altering the get_default_model_parameters() and get_default_model_data() methods, or set by providing them during class initialization. A combination of both is also possible. More details on this can be found in __init__().
The ModelLink class has two abstract methods that must be overridden before the subclass can be initialized. The call_model() method is the most important method, as it provides PRISM with a way of calling the model wrapped in the ModelLink subclass. The get_md_var() method allows PRISM to calculate the model discrepancy variance.
Notes
The __init__() method may be extended by the ModelLink subclass, but the superclass version must always be called.
If required, one can use the test_subclass() function to test a ModelLink subclass for correct functionality.
-
_ModelLink__set_model_data
(add_model_data)¶ Generates the model data properties from the default model data and the additional input argument add_model_data.
Parameters: add_model_data (array_like, dict, str or None) – Anything that can be converted to a dict that provides non-default model data information, or None if only the default data from get_default_model_data() is used.
Generates
- n_data : int
- Number of provided data points.
- data_val : list
- List with values of provided data points.
- data_err : list of lists
- List with upper and lower \(1\sigma\)-confidence levels of provided data points.
- data_spc : list
- List with types of value space ({‘lin’, ‘log’, ‘ln’}) of provided data points.
- data_idx : list of tuples
- List with user-defined data point identifiers.
-
_ModelLink__set_model_parameters
(add_model_parameters)¶ Generates the model parameter properties from the default model parameters and the additional input argument add_model_parameters.
Parameters: add_model_parameters (array_like, dict, str or None) – Anything that can be converted to a dict that provides non-default model parameters information, or None if only the default information from get_default_model_parameters() is used.
Generates
- n_par : int
- Number of model parameters.
- par_name : list
- List with model parameter names.
- par_rng : ndarray object
- Array containing the lower and upper values of the model parameters.
- par_est : list
- List containing user-defined estimated values of the model parameters. Contains None in places where estimates were not provided.
-
__init__
(*, model_parameters=None, model_data=None)[source]¶ Initialize an instance of the
ModelLink subclass.
Other Parameters: model_parameters, model_data (array_like, dict, str or None. Default: None) – Anything that can be converted to a dict that provides non-default model parameters/data information, or None if only the default information from get_default_model_parameters() or get_default_model_data() is used. For more information on the layout of these dicts, see Notes.
If array_like, dict(model_parameters/model_data) must generate a dict with the correct layout. If dict, the dict itself must have the correct layout. If str, the string must be the path to a file containing the dict keys in the first column and the dict values in the second column, which combined generate a dict with the correct layout.
Notes (model_parameters)
The model parameters provide this ModelLink subclass with the names, ranges and estimates of all model parameters that need to be explored.
The model parameters dict must have the name of each parameter as the key, and as the value a 1D list containing the lower bound, the upper bound and, if applicable, the estimate of this parameter. It is not required to provide an estimate for every parameter. The estimates are used to draw illustrative lines when making projection figures. An example of a model parameters file can be found in the ‘data’ folder of the PRISM package. If required, one can use the convert_parameters() function to validate the formatting.
- Formatting :
{par_name: [lower_bnd, upper_bnd, par_est]}
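As an illustration of this layout, the following minimal sketch builds such a dict and unpacks it into names, ranges and estimates. The parameter names `A` and `B`, the helper `split_parameters` and the ordering by name are assumptions made for this example only; this is not PRISM's own convert_parameters() function.

```python
# Hypothetical parameter names and ranges, chosen only for illustration.
model_parameters = {
    "A": [1.0, 10.0, 4.0],  # lower bound, upper bound, estimate
    "B": [0.0, 5.0],        # lower bound, upper bound, no estimate
}


def split_parameters(par_dict):
    """Unpack {par_name: [lower_bnd, upper_bnd, par_est]} into three lists."""
    names = sorted(par_dict)  # assumption: parameters are ordered by name
    ranges, estimates = [], []
    for par in names:
        lower, upper, *rest = par_dict[par]
        if lower >= upper:
            raise ValueError(f"{par!r}: lower bound must be below upper bound")
        ranges.append([lower, upper])
        # A missing estimate is stored as None
        estimates.append(rest[0] if rest else None)
    return names, ranges, estimates


par_name, par_rng, par_est = split_parameters(model_parameters)
```

The three returned lists loosely mirror the par_name, par_rng and par_est properties described for this class, with None marking a parameter without an estimate.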
Notes (model_data)
The model data provides this ModelLink subclass with the observational data points that are used to constrain this model.
The model data dict must have the data identifiers (data_idx) as the keys, and as the value a 1D list containing the data value (data_val), the data errors (data_err) and the data space (data_spc).
If the data errors are given with one value, then the data points are assumed to have a centered \(1\sigma\)-confidence interval. If the data errors are given with two values, then the data points are assumed to have a \(1\sigma\)-confidence interval defined by the provided upper and lower errors.
The data space is one of five strings ({‘lin’, ‘log’ or ‘log_10’, ‘ln’ or ‘log_e’}) indicating in which of the three value spaces (linear, log, ln) the data value is given. It defaults to ‘lin’ if it is not provided.
The data identifier is a sequence of bools, ints, floats and strings that is unique for every data point. PRISM uses it to identify a data point, which is required in some cases (like MPI), while the model itself can use it as a description of the operations required to extract the data point from the model output. It can be provided as any sequence of any length for any data point. If a sequence contains a single element, it is replaced by just that element instead of a tuple.
A simple example of a data identifier is \(f(\text{data_idx}) = \text{data_val}\), where the output of the model is given by \(f(x)\).
An example of a model data file can be found in the ‘data’ folder of the PRISM package. If required, one can use the convert_data() function to validate the formatting.
- Formatting :
{(data_idx_0, data_idx_1, ..., data_idx_n): [data_val,
data_err, data_spc]}
or
{(data_idx_0, data_idx_1, ..., data_idx_n): [data_val,
upper_data_err, lower_data_err, data_spc]}
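The two layouts above can be illustrated with a small sketch that normalizes one entry into a data value, an [upper, lower] error pair and a data space. The identifiers, the values and the helper `normalize_entry` are hypothetical, and the alias mapping is inferred from the five accepted strings; this is not PRISM's own convert_data() function.

```python
# Hypothetical model data, chosen only for illustration.
model_data = {
    1: [3.0, 0.1, "lin"],                # single identifier, centered error
    ("x", 2.5): [1.2, 0.3, 0.2, "log"],  # tuple identifier, upper and lower error
    3: [4.5, 0.2],                       # data space omitted, defaults to 'lin'
}

# The five accepted strings mapped onto the three value spaces
SPACE_ALIASES = {"lin": "lin", "log": "log", "log_10": "log",
                 "ln": "ln", "log_e": "ln"}


def normalize_entry(entry):
    """Return (data_val, [upper_err, lower_err], data_spc) for one entry."""
    values = list(entry)
    # The data space is the optional trailing string
    spc = SPACE_ALIASES[values.pop()] if isinstance(values[-1], str) else "lin"
    val, *errs = values
    if len(errs) == 1:
        errs = [errs[0], errs[0]]  # centered interval: same error on both sides
    elif len(errs) != 2:
        raise ValueError("expected one or two data errors")
    return val, errs, spc


normalized = {idx: normalize_entry(entry) for idx, entry in model_data.items()}
```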
-
_check_md_var
(md_var, name)[source]¶ Checks validity of provided set of model discrepancy variances md_var in this
ModelLink instance.
Parameters: - md_var (1D or 2D array_like or dict) – Model discrepancy variance set to validate in this ModelLink instance.
- name (str) – The name of the model discrepancy set, which is used in the error message if the validation fails.
Returns: md_var (2D ndarray object) – The (converted) provided md_var if the validation was successful. If md_var was a dict, it will be converted to an ndarray object.
-
_check_mod_set
(mod_set, name)[source]¶ Checks validity of provided set of model outputs mod_set in this
ModelLink instance.
Parameters: - mod_set (1D or 2D array_like or dict) – Model output (set) to validate in this ModelLink instance.
- name (str) – The name of the model output (set), which is used in the error message if the validation fails.
Returns: mod_set (1D or 2D ndarray object) – The provided mod_set if the validation was successful. If mod_set was a dict, it will be converted to an ndarray object (sorted on data_idx).
-
_check_sam_set
(sam_set, name)[source]¶ Checks validity of provided set of model parameter samples sam_set in this
ModelLink instance.
Parameters: - sam_set (1D or 2D array_like or dict) – Parameter/sample set to validate in this ModelLink instance.
- name (str) – The name of the parameter/sample set, which is used in the error message if the validation fails.
Returns: sam_set (1D or 2D ndarray object) – The provided sam_set if the validation was successful. If sam_set was a dict, it will be converted to an ndarray object.
-
_get_backup_path
(emul_i, suffix)[source]¶ Returns the absolute path to a backup file made by this
ModelLink instance, using the provided emul_i and suffix.
This method is used by the _make_backup() and _read_backup() methods, and should not be called directly.
Parameters: - emul_i (int) – The emulator iteration for which a backup filepath is needed.
- suffix (str or None) – If str, determine path to associated backup file using provided suffix. If suffix is empty, obtain last created backup file. If None, create a new path to a backup file.
Returns: filepath (str) – Absolute path to requested backup file.
-
_get_model_par_seq
(par_seq, name)[source]¶ Converts a provided sequence par_seq of model parameter names and indices to a list of indices, removes duplicates and checks if every provided name/index is valid.
Parameters: - par_seq (1D array_like of {int, str}) – A sequence of integers and strings determining which model parameters need to be used for a certain operation.
- name (str) – A string stating the name of the variable the result of this method will be stored in. Used for error messages.
Returns: par_seq_conv (list of int) – The provided sequence par_seq converted to a sorted list of model parameter indices.
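The conversion described here can be sketched in a few lines. This is only an illustration of the idea, not the method's actual implementation: the function name `par_seq_to_indices`, the zero-based integer indices and the error type are all assumptions.

```python
def par_seq_to_indices(par_seq, par_name):
    """Convert a mix of parameter names and indices to sorted unique indices."""
    indices = set()  # a set removes duplicates
    for item in par_seq:
        if isinstance(item, str):
            if item not in par_name:
                raise ValueError(f"unknown parameter name {item!r}")
            indices.add(par_name.index(item))
        else:
            # Assumption: integer entries are zero-based indices
            if not 0 <= item < len(par_name):
                raise ValueError(f"parameter index {item} is out of range")
            indices.add(item)
    return sorted(indices)


# Hypothetical parameter names; "B" and 1 refer to the same parameter
par_seq_conv = par_seq_to_indices(["B", 0, "A", 1], ["A", "B", "C"])
```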
-
_get_sam_space
(sam_set)[source]¶ Returns the boundaries of the hypercube that encloses the parameter space in which the provided sam_set is defined.
The main use for this function is to determine what part of model parameter space was likely sampled from in order to obtain the provided sam_set. Because of this, extra spacing is added to the boundaries to reduce the effect of the used sampling method.
Parameters: sam_set (1D or 2D array_like or dict) – Parameter/sample set for which an enclosing hypercube is requested.
Returns: sam_space (2D ndarray object) – The requested hypercube boundaries.
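A rough sketch of the idea, under stated assumptions: the function name `enclosing_hypercube`, the `pad` fraction and the rule of widening each side by a fraction of the sampled extent are hypothetical. The text above does not state how much spacing this method actually adds or how the result is laid out.

```python
def enclosing_hypercube(sam_set, pad=0.1):
    """Return [[lower, upper], ...] per parameter, widened on both sides.

    sam_set is a list of samples, each a list with one value per parameter.
    pad is a hypothetical fraction of the sampled extent added as spacing.
    """
    space = []
    for column in zip(*sam_set):  # one column per parameter
        lower, upper = min(column), max(column)
        margin = pad * (upper - lower)
        space.append([lower - margin, upper + margin])
    return space


# Two samples of two parameters
sam_space = enclosing_hypercube([[1.0, 2.0], [3.0, 6.0]], pad=0.25)
```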
-
_make_backup
(*args, **kwargs)[source]¶ WARNING: This is an advanced utility method and probably will not work unless used properly. Use with caution!
Creates an HDF5-file backup of the provided args and kwargs when called by the call_model() method or any of its inner functions. Additionally, the backup will contain the emul_i, par_set and data_idx values that were passed to the call_model() method. It also contains the version of PRISM that made the backup. The backup can be restored using the _read_backup() method.
If it is detected that this method is used incorrectly, a RequestWarning is raised (and the method returns) rather than a RequestError, in order to not disrupt the call to call_model().
Parameters: - args (positional arguments) – All positional arguments that must be stored in the backup file.
- kwargs (keyword arguments) – All keyword arguments that must be stored in the backup file.
Notes
The name of the created backup file contains the value of emul_i, name and a random string to avoid replacing an already existing backup file.
The saved emul_i, par_set and data_idx are the values these variables have locally in the call_model() method at the point this method is called. Because of this, making any changes to them may cause problems and is therefore heavily discouraged. If changes are necessary, it is advised to copy them to a different variable first.
-
_read_backup
(emul_i, *, suffix=None)[source]¶ Reads in a backup HDF5-file created by the
_make_backup() method, using the provided emul_i and the value of name.
Parameters: emul_i (int) – The emulator iteration that was provided to the call_model() method when the backup was made.
Other Parameters: suffix (str or None. Default: None) – The suffix of the backup file (everything between parentheses) that needs to be read. If None or empty, the last created backup will be read.
Returns: - filename (str) – The absolute path to the backup file that has been read.
- data (dict with keys (‘emul_i’, ‘prism_version’, ‘par_set’, ‘data_idx’, ‘args’, ‘kwargs’)) – A dict containing the data that was provided to the _make_backup() method.
-
_to_par_space
(sam_set)[source]¶ Converts provided sam_set from unit space ([0, 1]) to parameter space ([lower_bnd, upper_bnd]).
-
_to_unit_space
(sam_set)[source]¶ Converts provided sam_set from parameter space ([lower_bnd, upper_bnd]) to unit space ([0, 1]).
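The two conversions are inverses of each other. A minimal sketch for a single sample, assuming a linear rescaling per parameter (which the interval notation suggests, but which is not spelled out here); the function names and the `par_rng` argument are illustrative stand-ins for the methods and the par_rng property:

```python
def to_unit_space(sample, par_rng):
    """Map each value from [lower_bnd, upper_bnd] to [0, 1]."""
    return [(x - lo) / (hi - lo) for x, (lo, hi) in zip(sample, par_rng)]


def to_par_space(unit_sample, par_rng):
    """Map each value from [0, 1] back to [lower_bnd, upper_bnd]."""
    return [lo + u * (hi - lo) for u, (lo, hi) in zip(unit_sample, par_rng)]


# Hypothetical ranges for two parameters
par_rng = [[1.0, 9.0], [0.0, 5.0]]
unit = to_unit_space([4.0, 2.5], par_rng)
back = to_par_space(unit, par_rng)
```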
-
call_model
(emul_i, par_set, data_idx)[source]¶ Calls the model wrapped in this
ModelLink subclass at emulator iteration emul_i for model parameter values par_set and returns the data points corresponding to data_idx.
This method is called with solely keyword arguments.
This is an abstract method and must be overridden by the ModelLink subclass.
Parameters: - emul_i (int) – Number indicating the requested emulator iteration.
- par_set (dict of float64) – Dict containing the values for all model parameters corresponding to the requested model realization(s). If the model is single-called, each dict item is formatted as {par_name: par_val}. If multi-called, it is formatted as {par_name: [par_val_1, par_val_2, ..., par_val_n]}.
- data_idx (list of tuples) – List containing the user-defined data point identifiers corresponding to the requested data points.
Returns: data_val (1D or 2D array_like or dict) – Array containing the data values corresponding to the requested data points generated by the requested model realization(s). If the model is multi-called, data_val is of shape (n_sam, n_data). If dict, it has the identifiers in data_idx as its keys with either scalars or 1D array_likes as its values.
Note
If this model is multi-called, then the parameter sets in the provided par_set dict will be sorted in order of parameter name (e.g., sort on first parameter first, then on second parameter, etc.).
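To make the single-called and multi-called formats concrete, here is a minimal sketch of a call_model() body for a hypothetical straight-line model y = A + B*x, where each data identifier is the x value at which the model is evaluated. The class `LineWrapperSketch` is a stand-in that deliberately does not import or inherit from PRISM; in a real wrapper this method would live on a ModelLink subclass, and the multi-called values may arrive as arrays rather than lists.

```python
class LineWrapperSketch:
    """Stand-in for a ModelLink subclass (hypothetical, no PRISM import)."""

    def call_model(self, emul_i, par_set, data_idx):
        a, b = par_set["A"], par_set["B"]
        if not isinstance(a, (list, tuple)):
            # Single-called: {par_name: par_val} -> 1D result of length n_data
            return [a + b * x for x in data_idx]
        # Multi-called: {par_name: [par_val_1, ...]} -> shape (n_sam, n_data)
        return [[a_i + b_i * x for x in data_idx] for a_i, b_i in zip(a, b)]


model = LineWrapperSketch()
# The method is called with keyword arguments only
single = model.call_model(emul_i=1, par_set={"A": 1.0, "B": 2.0},
                          data_idx=[1, 3])
multi = model.call_model(emul_i=1, par_set={"A": [1.0, 0.0], "B": [2.0, 1.0]},
                         data_idx=[1, 3])
```

Each row of the multi-called result corresponds to one parameter set, and each column to one entry of data_idx.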
-
get_default_model_data
()[source]¶ Returns the default model data to use for every instance of this
ModelLink subclass. By default, returns _default_model_data.
-
get_default_model_parameters
()[source]¶ Returns the default model parameters to use for every instance of this
ModelLink subclass. By default, returns _default_model_parameters.
-
get_md_var
(emul_i, par_set, data_idx)[source]¶ Calculates the linear model discrepancy variance at a given emulator iteration emul_i for model parameter values par_set and given data points data_idx for the model wrapped in this
ModelLink subclass.
This method is always single-called by one MPI rank with solely keyword arguments.
This is an abstract method and must be overridden by the ModelLink subclass.
Parameters: - emul_i (int) – Number indicating the requested emulator iteration.
- par_set (dict of float64) – Dict containing the values for all model parameters corresponding to the requested model realization.
- data_idx (list of tuples) – List containing the user-defined data point identifiers corresponding to the requested data points.
Returns: md_var (1D or 2D array_like or dict) – Array containing the linear model discrepancy variance values corresponding to the requested data points. If 1D array_like, data is assumed to have a centered one sigma confidence interval. If 2D array_like, the values determine the upper and lower variances and the array is of shape (n_data, 2). If dict, it has the identifiers in data_idx as its keys with either scalars or 1D array_likes of length 2 as its values.
Notes
The returned model discrepancy variance values must be of linear form, even for those data values that are returned in logarithmic form by the
call_model()
method. If not, the possibility exists that the emulation process will not converge properly.
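The accepted array-like return shapes can be sketched as follows. The variance values are arbitrary placeholders and the function names are hypothetical; a real wrapper would return one of these shapes from the get_md_var() method of its ModelLink subclass, with values chosen for the model at hand.

```python
def md_var_centered(emul_i, par_set, data_idx, var=0.01):
    """1D form: one linear variance per data point (centered interval)."""
    return [var for _ in data_idx]


def md_var_asymmetric(emul_i, par_set, data_idx, upper=0.02, lower=0.01):
    """2D form of shape (n_data, 2): upper and lower linear variances."""
    return [[upper, lower] for _ in data_idx]


# Hypothetical single parameter set and three data identifiers
par_set = {"A": 1.0, "B": 2.0}
data_idx = [1, 3, 4]
centered = md_var_centered(emul_i=1, par_set=par_set, data_idx=data_idx)
asymmetric = md_var_asymmetric(emul_i=1, par_set=par_set, data_idx=data_idx)
```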
-
get_str_repr
()[source]¶ Returns a list of string representations of all additional input arguments with which this
ModelLink
subclass was initialized.
-
MPI_call
¶ Whether
call_model()
can/should be called by all MPI ranks simultaneously instead of by the controller. By default, only the controller rank calls the model (False).
Type: bool
-
__weakref__
¶ list of weak references to the object (if defined)
-
_default_model_data
¶ The default model data to use for every instance of this
ModelLink
subclass.
Type: dict
-
_default_model_parameters
¶ The default model parameters to use for every instance of this
ModelLink
subclass.
Type: dict
-
call_type
¶ String indicating whether
call_model()
should be supplied with a single evaluation sample (‘single’) or a set of samples (‘multi’), or can be supplied with both (‘hybrid’). By default, single model calls are requested (‘single’).
Type: str
-
data_err
¶ The upper and lower \(1\sigma\)-confidence levels of provided data points.
Type: list of float
-
data_idx
¶ The user-defined data point identifiers.
Type: list of tuples
-
data_spc
¶ The types of value space ({‘lin’, ‘log’, ‘ln’}) of provided data points.
Type: list of str
-
data_val
¶ The values of provided data points.
Type: list of float
-
multi_call
¶ Whether
call_model()
can/should be supplied with a set of evaluation samples. At least one of single_call and multi_call must be True. By default, single model calls are requested (False).
Type: bool
-
name
¶ Name associated with an instance of this
ModelLink
subclass. By default, it is set to the name of this ModelLink subclass. Can be manually manipulated to allow for more user control.
Type: str
-
par_est
¶ The user-defined estimated values of the model parameters. Contains None in places where estimates were not provided.
Type: dict of {float, None}
-
par_name
¶ List with model parameter names.
Type: list of str
-
single_call
¶ Whether
call_model()
can/should be supplied with a single evaluation sample. At least one of single_call and multi_call must be True. By default, single model calls are requested (True).
Type: bool
-