Usage
=====

fmu-ensemble is designed for use in several scenarios:

* Interactive use in the (i)python interpreter or Jupyter
* Part of an ERT workflow, typically after the ensemble is finished,
  as a *POST_WORKFLOW_HOOK*
* Part of other scripts or utilities, either for analysis or for
  preparatory work before e.g. a webviz instance is generated

As an introduction to the module, we will go through interactive usage
in the Python interpreter. Whether you use ipython or Jupyter does not
matter, but `ipython` is recommended over plain `python`.

Prerequisites
-------------

Basic knowledge of Python is needed to use the module. For simple use,
copy-pasting from other projects will take you far. To go further, it is
strongly recommended to spend time learning the `Pandas`_ library, which
lets you do a lot of data processing and handling in very little Python
code. Most data is exposed as Pandas DataFrames.

.. _Pandas: https://pandas.pydata.org/

Basic interactive usage
-----------------------

Loading an ensemble
^^^^^^^^^^^^^^^^^^^

An ensemble must first be loaded from the filesystem (typically
`/scratch`) into Python's memory.

.. code-block:: python

    from fmu import ensemble

    ens = ensemble.ScratchEnsemble('reek_r001_iter0',
        '/scratch/fmustandmat/r001_reek_scratch/realization-*/iter-3')

    # Type the object name to check what you got
    ens
    # the output should be something like
    # <a short representation of the ensemble object>

You name your ensemble in the first argument. This name is used when you
combine the ensemble with other ensembles into an ``EnsembleSet``. The
path is where your realization roots are on the filesystem. The
realization root is also called RUNPATH in ERT terminology, and is where
you find the ``STATUS`` file, among others. When you initialize single
ensembles, ensure you do not mix up ``iter-3`` with ``iter-*``; the
latter only makes sense when you initialize an *EnsembleSet*, see below.

When a `ScratchEnsemble` object is initialized, only rudimentary loading
of the ensemble is performed, like loading ``STATUS`` and
``parameters.txt``. The intention is that this operation should be fast,
so any heavy data parsing is deferred until the user requests it.

When an ensemble is loaded into memory, you can ask for certain
properties:

.. code-block:: python

    # Obtain a Pandas DataFrame of the parameters
    ens.parameters

    # Unique realization indices with parameters.txt
    ens.parameters['REAL'].unique()

    # List of parameters available:
    ens.parameters.columns

Loading multiple ensembles
^^^^^^^^^^^^^^^^^^^^^^^^^^

If you have multiple ensembles in the typical ``realization-*/iter-*``
directory structure, you can load all these ensembles in one go:

.. code-block:: python

    ens_set = ensemble.EnsembleSet('hm_attempt01',
        frompath='/scratch/fmustandmat/r001_reek_scratch/')

This will look for realizations and iterations and group them
accordingly. Ensemble names will be inferred from the iteration
directory level, and will be named `iter-X`.

If you have run prediction ensembles, which do not match `iter-X` in the
directory name, you have to add them manually to the ensemble set:

.. code-block:: python

    # Augment the existing ens_set object
    ens_set.add_ensemble(ensemble.ScratchEnsemble('pred-dg3',
        '/scratch/fmustandmat/r001_reek_scratch/realization-*/pred-dg3/'))

EnsembleSet objects can be treated almost as Ensemble objects.
Operations on ensemble sets will typically be applied to each member
ensemble. One difference is that aggregated data structures always have
an extra column called ``ENSEMBLE`` that contains the ensemble names.
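As a minimal sketch of this, assuming the ``ens_set`` object from above,
and that parameter aggregation behaves the same on ensemble sets as on
single ensembles:

.. code-block:: python

    # Aggregated parameters across the whole set; the ENSEMBLE column
    # tells you which ensemble each row originates from
    params = ens_set.parameters
    params['ENSEMBLE'].unique()  # e.g. iteration and prediction names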
If you have exported a "runpath file" from ERT, you can initialize an
EnsembleSet from that file with:

.. code-block:: python

    # Load from an ERT runpath file
    ens_set = ensemble.EnsembleSet('hm', runpath='/foo/bar/ert-runpath-file')

The realization and iteration integers are taken directly from the
information in this file. For runpath files with only one ensemble, it
is also possible to initialize ScratchEnsembles directly.

It is possible to load directory structures like ``iter_*/real_*``, but
you will need to look more closely into the API for the EnsembleSet
object, and provide regular expressions for determining the iteration
names and realization indices.

Obtaining warning and error messages
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Application/script authors can configure logging output to the console
by e.g.

.. code-block:: python

    import logging
    logging.basicConfig(level=logging.INFO)

See the documentation on `Python logging`_ for more details.

.. _Python logging: https://docs.python.org/3/library/logging.html

Reading Eclipse data
^^^^^^^^^^^^^^^^^^^^

The ensemble class has specific support for parsing binary files
produced by reservoir simulators outputting the Eclipse binary format.
This support is provided through `resdata`_.

.. _resdata: https://github.com/equinor/resdata

.. code-block:: python

    # Get a dataframe with monthly summary data for all field vectors
    # and all well vectors
    smry = ens.get_smry(column_keys=['F*', 'W*'], time_index='monthly')

The Python object ``smry`` is now a Pandas DataFrame (a table)
containing the summary data you requested. Each row holds the values for
a specific realization at a specific time. Pandas DataFrames can easily
be written to disk as CSV files, using e.g.
``smry.to_csv('summaryvectors.csv', index=False)``.

For `time_index` you may also try `yearly`, `daily` or `raw`. Check the
function documentation for further possibilities. If you replace
`get_smry` with `load_smry`, the same dataframe will also be
internalized, see below.

By default, Eclipse summary files will be searched for in
`eclipse/model`, among files with the suffix `*.UNSMRY`. If you either
have multiple `UNSMRY` files in that directory, or have them in a
different directory, you need to hint at the exact location beforehand,
using the *file discovery* (`find_files()`) feature. If your Eclipse
output files are at the realization root (the old standard), you only
need to issue

.. code-block:: python

    ens.find_files("*.UNSMRY")

prior to running `load_smry()`. If your problem is multiple Eclipse runs
in the same directory, you have to explicitly discover the full path for
the file in the call to `find_files()`. If you have used the
`runpathfile` feature of ensemble initialization, file discovery of the
correct `UNSMRY` file is done automatically.

Rate handling in Eclipse summary vectors
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Eclipse summary vectors of *rate* type (oil rate, water rate etc.) must
be interpreted carefully. A value of e.g. `FOPR` at a specific date
means that the value is valid backwards in time, until the prior point
in time where data is available. For correct rates, you must use the
`raw` time index for `get_smry()`; anything else will only give you an
approximation. Also, you cannot assume that summing the rates at every
point in time corresponds to the associated cumulative summary vector,
e.g. `FOPT`, as there are multiple features in play here, such as
efficiency factors.
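As a short sketch of the distinction, using the ``ens`` object from
above:

.. code-block:: python

    # Rates on the raw time index are exact; any resampled time index
    # (yearly, monthly, ...) only approximates the rates
    raw_rates = ens.get_smry(column_keys=['FOPR'], time_index='raw')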
It is however possible to ask an ensemble or realization to compute
so-called "volumetric rates", which are computed from the cumulative
columns only, e.g. `FOPT`; Eclipse summary rate data is ignored in this
computation. You can then ask for the "volumetric rate" of `FOPT` at
various time indices: yearly will give you yearly volumes, monthly will
give monthly volumes, etc. The data is returned as `FOPR`, but you must
be careful not to mix its meaning with the original `FOPR`. It is also
possible to supply a custom time index (with arbitrary time between each
index), where the volumetric rates are scaled to correspond to
daily/monthly/yearly rates. These will sum up to the cumulative vector
given correct integration (with time interval length weighting).

.. code-block:: python

    # Examples for volumetric rate computations, yearly rates:
    yearly_volumes = ens.get_volumetric_rates(column_keys='FOPT',
                                              time_index='yearly')

    # For each month, compute the average daily rate:
    daily_rates = ens.get_volumetric_rates(column_keys='FOPT',
                                           time_index='monthly',
                                           time_unit='days')

Internalized data
^^^^^^^^^^^^^^^^^

The ensemble object (which holds a collection of realization objects)
will internalize the data it reads if and when you call one of the
``load_*()`` functions, meaning that it will keep the dataframes
produced in memory for later retrieval. You can ask the ensemble object
what data it currently contains by calling ``ens.keys()`` (this call is
forwarded to each realization, and you will see all keys that are
present in at least one realization). Note that for ScratchEnsemble
objects, the data is held in each realization object and aggregated upon
request.

The ensemble object is able to aggregate any data that its realizations
contain, using the general function ``get_df()``. When we asked for the
ensemble parameters above, what actually happened was a call to
``get_df('parameters.txt')``. In the objects, these dataframes are
stored with filenames as keys. When checking ``keys()`` after having run
``load_smry()``, you will see a pathname in front of
``unsmry--monthly.csv``, which is where the dataframe will be written if
you want to dump a realization or ensemble to disk. For convenience in
interactive use, you do not need to write the entire pathname when
calling ``get_df()``, but only when there is no ambiguity. You may also
skip the extension ``.csv`` or ``.txt``.

Reading data from text files
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Text files in this context are a special case of text files arranged
similarly to the already mentioned ``parameters.txt``:

.. code-block:: text

    SOMEKEY somevalue
    ANOTHERKEY 3.14
    etc..

Think of the values in such text files as scalar values for each
realization, but you can put anything into them. You can use as many of
these kinds of text files as you want, in order to categorize inputs
and/or outputs. As an example, put any scalar results that you produce
through any code into a file called ``outputs.txt`` in every realization
directory, and call ``myensembleobject.load_txt('outputs.txt')``.

Scalar data
^^^^^^^^^^^

There is support for text files containing only one value, either a
string or a number. There should be nothing other than the value itself
in the text file, except for comments after a comment character.

.. code-block:: python

    ens.load_scalar('npv.txt')

You are advised to add the option `convert_numeric=True` when the values
are actually numeric. This ensures that the loaded data is interpreted
as numbers, and discarded if not.
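A minimal sketch of this, assuming a numeric value in each ``npv.txt``:

.. code-block:: python

    # Interpret the scalar values as numbers; non-numeric values
    # are discarded rather than kept as strings
    ens.load_scalar('npv.txt', convert_numeric=True)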
When strings are present in erroneous realizations, aggregation will
break down, as the data for all realizations will then be treated as
strings.

Scalar data will be aggregated to ensembles and ensemble sets. When
aggregated, a dataframe is returned with the realization index in the
first column and the values in the second column. The value column has
the same name as the filename.

.. code-block:: python

    npv = ens.get_df('npv.txt')
    # A DataFrame is returned, with the columns 'REAL' and 'npv.txt'

    npv_values = npv['npv.txt']
    # Need to say 'npv.txt' once more to get to the column values.

Reading tabular data from CSV files
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

CSV files are handled analogously to txt files, in that you read them in
by ``load_csv(filename)`` (where ``filename`` is the filename local to
each realization). The data will be stored with the filename as the key,
and you can get back the aggregated data set using ``get_df(filename)``.
In aggregations from ensembles, the first column will always be
``REAL``, which is the realization index. The remaining columns will be
from the CSV data you loaded.

In case you need to clean up imported files, it is possible to delete
columns and rows from internalized dataframes through the `drop()`
functionality. For an ensemble object called `ens`, you may issue the
following:

.. code-block:: python

    ens.drop('parameters.txt', key='BOGUSDATA')
    ens.drop('parameters.txt', keys=['FOO1', 'FOO2', 'FOO3'])
    ens.drop('geo_gas_volumes.csv', rowcontains='Totals')
    # Deletes all rows with 'Totals' anywhere.
    ens.drop('geo_oil_volumes.csv', column='Giip')
    ens.drop('unsmry--monthly', rowcontains='2000-01-01')  # Enter dates as strings

When called on a `ScratchEnsemble` object, the drops occur in each
linked realization object, while on virtual ensembles they occur
directly in the internalized dataframes.

Reading simulation grid data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Static and dynamic simulation grid data can be read and aggregated from
the ensemble and returned as a DataFrame. The current implementation can
be slow for large grid models and/or ensembles with many realizations.

.. code-block:: python

    # Find the report number corresponding to the date you want to extract
    ens.get_unrst_report_dates()

    # Extract the mean of the following properties at report step 4
    ens.get_eclgrid(props=['PERMX', 'FLOWATI+', 'FLOWATJ+'],
                    report=4, agg='mean')

When called, `get_eclgrid()` reads the grid (geometry) from one
realization. Then, depending on whether the requested properties are
static or dynamic, the corresponding `*INIT` or `*UNRST` file will be
read for all successful realizations in the ensemble. The user can
specify how the results should be aggregated; currently the supported
options are `mean` and `std`.

Filtering realizations
^^^^^^^^^^^^^^^^^^^^^^

In an ensemble, realizations can be filtered out based on certain
properties. Filtering is relevant both for removing realizations that
have failed somewhere in the process, and for extracting subsets with
certain properties (by values). Generally, fmu.ensemble is very
permissive of realizations with close to no data; it is the user's
responsibility to filter those out if needed.

The filtering function `filter()` can be used both to do in-place
filtering, and to return VirtualEnsemble objects containing those
realizations that matched the criterion. Examples:
.. code-block:: python

    # Assuming an ensemble where yearly summary data is loaded,
    # throw away all realizations that did not reach a certain date
    ens.filter('unsmry--yearly', column='DATE', columncontains='2030-01-01')

    # Extract the subset for a specific sensitivity.
    vens = ens.filter('parameters.txt', key='DRAINAGE_STRATEGY',
                      value='Depletion', inplace=False)

    # Remove all realizations where a specific output file
    # (that we have tried to internalize) is missing
    ens.filter('geo_oil_1.csv')

Filtering with comparators other than equivalence is not implemented.
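The object returned from a non-inplace filter can be used for further
aggregation. A minimal sketch, assuming the ``vens`` object obtained
above supports aggregation like an ensemble:

.. code-block:: python

    # Aggregate data from the filtered subset only
    depletion_params = vens.get_df('parameters.txt')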