Concepts in fmu-ensemble

There are some concepts in fmu-ensemble that one needs to understand at some point

Realization

A realization is a single model run. It usually includes one Eclipse run, but that is not a requirement. As long as you have a done some computational job that has left some (supported) files on the filesystem, it can be possible to load results from the realization using fmu-ensemble.

ScratchRealization

The class ScratchRealization is a Python object that can load realization results (and input) from the filesystem from a single realization, typically located on /scratch. You can ask the object to load and thereby internalize data, with the load_*() functions. The internalization is an important concept. All data that you internalize will be stored in the object, it can be easily reaccessed, statistics can be computed. The object is tied to the filesystem, and will become unstable if files are deleted from disk. If you need some persistence of the object, it must be converted to a VirtualRealization (see below)

Additionally, you may find get_*() functions that can access certain datatypes. Common for these is that they will not modify the object, it is a read-once operation. This is particularly relevant for Eclipse summary data, where you at different times may ask for different subsets and at different sampling frequencies, but do not want to internalize all the data into the object.

VirtualRealization

A VirtualRealization is typically a ScratchRealization where you have removed the tie to the location on the original filesystem. The VirtualRealization object only knows of the data that was internalized while being a ScratchRealization, and the main access function is get_df(). When a realization is a ScratchRealization, you can always ask for any time-resolution for Eclipse data, or any other CSV file that the scratch directory contains. A VirtualRealization will have to interpolate if you ask for daily summary data, in case you only internalized yearly or monthly before making a VirtualRealization.

Another typical source for a virtual realization is a calculated realization, either as a point-wise statistical aggregate of a collection of realizations (ensemble), or maybe from a linear combination of realizations (then coming from the object RealizationCombination. The reason for naming this a VirtualRealization is due to the possibility to handle computed realizations fully analogous to ScratchRealizations.

Virtual realizations have features to write their internal data to disk, the to_disk() feature. This can be used to store stripped down versions of realizations, or to be able to store computed realizations. You may both write to disk into a file structure that would resemble the original realization (and you can edit the files if you are bold enough). The virtual realization can later be instantiated from the dumped disk structure by load_disk(). Another variant for storage is to use to_json() which will dump all data in as a json datatype, for which there probably exists use cases.

RealizationCombination

The object RealizationCombination is an object that the user will not observe directly, but it will work under the hood every time arithmetic operations on realizations are done.

Ensemble

An ensemble is a collection of realizations, where the realizations share some common features making it relevant to do statistical aggregations over them. This does not necessarily exclude random collections of realizations, but if the realizations do not share anything, they can always be collected in simple Python lists as well.

A requirement is that each member of the ensembled are referred to by an integer, and will always be present in the column REAL in aggregated dataframes.

ScratchEnsemble

A ScratchEnsemble is an ensemble that is initialized from a directory of realization-runs on the file system, typically on /scratch. This object can do a full initialization of all ScratchRealization in a specific directory, and collect them into a ensemble object. The ScratchEnsemble object itself is very light, it is more or less only a list of ScratchRealization objects, and support functions for running operations on all of them. Whenever you ask for an aggregated dataset, data aggregation over all realizations is performed on the fly.

VirtualEnsemble

Analogous to the relationship between ScratchRealization and VirtualRealization, a VirtualEnsemble is an ensemble with no strings attached to the original filesystem. All data from its underlying realizations is aggregated and full dataframes are stored. The object is able to construct new VirtualRealization objects from its data, both by picking by index, or from statistical aggregations.

You can ask a VirtualEnsemble for get_smry() in which it will try its best to locate internalized Eclipse summary data, and then interpolate the data to your chosen time index. This is opposed to a ScratchEnsemble which in that case would go back to the original binary files and give the correct answer.

Since all data is aggregated, VirtualEnsembles may work faster than ScratchEnsemble. The recommended procedure would then be to initialize a ScratchEnsemble, internalize all data you need using load_*(), and then make a VirtualEnsemble object using .to_virtual(). Also, if you want to add additional ensemble data, aggregated by your own ad-hoc code, you can only do that on VirtualEnsemble, which has an .append() function. It is also possible to replace existing aggregated data in a VirtualEnsemble, for example making new summary vectors that are combinations of other data.

EnsembleCombination

Whenever you try do add or substract ensembles, the objects you get in return are of type EnsembleCombination. These objects act as ensembles, but its data is always a combination of the data in two or more ensembles (or a single ensemble scaled by a scalar).

Calculating a combination of ensembles can be computationally expensive, depending on the amount of data requested and included. The actual combination of numbers is not done until you actually ask for it. That means that initialization of EnsembleCombination is fast, but when you ask for its data, it might take time. If you want all data to be evaluated in one go, you ask the object for a VirtualEnsemble using the function .to_virtual(), which means that all internalized data is evaluated and returned to you for further access and/or storage.

The implementation of the linear algebra over ensembles and realizations is accomplished using a Binary Expression Tree, with ScratchEnsemble or VirtualEnsemble at the leaf nodes.