Advanced usage
Merging data
The CSV_EXPORT1 workflow built into ERT performs one of the specific tasks that can now be accomplished using this module. That CSV export is just a merge of the dataframes coming from parameters.txt and the Eclipse summary data.
import pandas as pd
# reekensemble is assumed to be an already initialized ScratchEnsemble
smry = reekensemble.load_smry(time_index='monthly')
params = reekensemble.parameters
# Match the two tables where the value of REAL is identical:
smry_params = pd.merge(smry, params)
smry_params.to_csv('smry_params.csv', index=False)
For finer control, you can specify exactly which summary vectors you want to include, the time resolution, and perhaps also a subset of the parameters. For example, if you have computed any kind of scalar data per realization and put that into outputs.txt, you can merge with load_txt('outputs.txt') instead of params in the code above.
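The merge mechanics themselves are plain pandas: dataframes sharing a REAL column are joined on that column. A self-contained sketch with synthetic data standing in for the ensemble tables (in real usage these would come from load_smry() and load_txt('outputs.txt')):

```python
import pandas as pd

# Synthetic stand-ins for ensemble data:
smry = pd.DataFrame({'REAL': [0, 0, 1, 1],
                     'DATE': ['2001-01-01', '2001-02-01'] * 2,
                     'FOPT': [10.0, 20.0, 11.0, 19.0]})
outputs = pd.DataFrame({'REAL': [0, 1], 'RECOVERY': [0.41, 0.39]})

# pd.merge() joins on the shared column REAL, repeating the scalar
# outputs for every time step of the matching realization:
smry_outputs = pd.merge(smry, outputs)
print(smry_outputs)
```

Scalar per-realization data is thus broadcast onto every time step of the corresponding realization, which is exactly what the CSV export needs.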
Statistics over ensembles
Statistics over ensembles can be computed by aggregating their data and presenting them in realization-like objects. A “mean” of an ensemble can be computed for all data the ensemble contains, and the result is something that we treat like a realization. It is important to realize that this result is not a realization, only something we can treat as one, and the VirtualRealization object is used for this purpose. This “mean realization” is not a physical realization where all numbers make physical sense; it just gives you the mean of all the data, over the realizations for scalars, and for each point in time for time series.
As an example, oil, gas, and pressure profiles in a statistical realization will typically not be compatible, in that it can be physically impossible to obtain these profiles in reality. Use and interpret with care!
Supported statistical aggregations are mean, median, min, max, std, var and pXX, where XX is a number between 00 and 99 giving the percentile you want. You access this functionality through the function agg() in the ensemble objects.
Pandas and Numpy are used under the hood for all computations. The quantile integers you supply are forwarded directly to Pandas; beware that this is the opposite of the usual subsurface understanding of e.g. a “high case p10 profile”. Translate p10 to p90 in your client code if needed.
ens = ensemble.ScratchEnsemble('ref-ensemble',
'/scratch/foo/r018-ref-case/realization-*/iter-3')
mean = ens.agg('mean')
# What is the mean value of "FWL" in parameters.txt?
print(mean['parameters.txt']['FWL'])
# What is the mean ultimate FOPT?
print(mean['unsmry--monthly']['FOPT'].iloc[-1])
# (.iloc[-1] is Pandas functionality for accessing the last row)
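The quantile convention mentioned above can be illustrated with plain Pandas, which is what fmu.ensemble uses under the hood (a self-contained sketch with synthetic numbers, not using the ensemble objects):

```python
import pandas as pd

# Five ultimate FOPT values, one per realization (synthetic numbers):
fopt = pd.Series([100.0, 120.0, 140.0, 160.0, 180.0])

# Pandas puts p90 at the high end of the distribution:
print(fopt.quantile(0.9))  # -> 172.0
# A subsurface "high case p10 profile" therefore corresponds to agg('p90').
```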
Comparing realizations or ensembles
Any linear combination of ensembles or realizations can be computed, in a pointwise manner. The combination is performed on the data that is shared between its operands.
Computing the sum of two realizations is only a matter of adding them in your Python interpreter. The end result is an object you can treat similar to a realization, asking for its data using get_df(), or asking for the summary data using get_smry(). Eclipse time series will be combined pointwise in time, but only on shared time steps. It is therefore recommended to interpolate them to e.g. a monthly time interval prior to combination.
Ensembles can be linearly combined analogously to realizations, and will be matched on the realization index REAL.
refens = ensemble.ScratchEnsemble('ref-ensemble',
'/scratch/foo/r018-ref-case/realization-*/iter-3')
iorens = ensemble.ScratchEnsemble('ior-ensemble',
'/scratch/foo/r018-ior-case/realization-*/pred')
# Calculate the delta ensemble
deltaens = iorens - refens
# Obtain the field gain:
fieldgain = deltaens.get_smry(column_keys=['FOPT'],
time_index='monthly')
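The matching underlying such a delta ensemble can be illustrated with plain pandas; fmu.ensemble does the equivalent matching on REAL (and on shared time steps for time series). A self-contained sketch with synthetic data:

```python
import pandas as pd

# Synthetic ultimate FOPT per realization for a reference and an IOR case:
ref = pd.DataFrame({'REAL': [0, 1], 'FOPT': [100.0, 110.0]})
ior = pd.DataFrame({'REAL': [0, 1], 'FOPT': [130.0, 125.0]})

# Match on REAL, then subtract pointwise to get the per-realization gain:
merged = pd.merge(ior, ref, on='REAL', suffixes=('_ior', '_ref'))
merged['FOPT_gain'] = merged['FOPT_ior'] - merged['FOPT_ref']
print(merged[['REAL', 'FOPT_gain']])
```

Realizations present in only one of the operands drop out of the inner merge, which is why only shared data appears in the combination.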
If the ensembles you want to combine cannot be compared realization by realization, or do not even contain the same number of realizations, you should first aggregate the ensembles (mean or any other aggregation), and then construct delta objects from the statistical realizations.
Remember that this library does not help you in interpreting these results correctly, it only gives you the opportunity to calculate them!
Working with observations
Observations for history matching can be loaded, and computations (comparisons) of observed data versus simulated data can be performed.
The Observations object can be initialized from YAML files or from a Python dictionary.
If you are opting for simple usage, just being able to compare FOPT versus FOPTH in your ensemble, your observation config could look like:
# Eclipse summary vectors compared with allocated summary vectors
smryh:
  - key: FOPT
    histvec: FOPTH
    time_index: monthly  # or yearly, daily, raw or last, or an ISO-date
This file can be loaded in Python:
# Assume the yaml above has been put in a file:
obs = ensemble.Observations('fopt-obs.yml')
Alternatively, it is possible to initialize this directly without the filesystem:
obs = ensemble.Observations({'smryh': [{'key': 'FOPT',
'histvec': 'FOPTH', 'time_index': 'last'}]})
# Load an ensemble we want to analyze
ens = ensemble.ScratchEnsemble('hmcandidate',
'/scratch/foo/something/realization-*/iter-3')
# Perform calculation of misfit
# A dataframe with computed mismatches is returned.
# We only have one "observation" for each realization, so
# only one row pr. realization is returned.
misfit = obs.mismatch(ens)
# Sort ascending by L1 (absolute error) and print the realization
# indices of the first five:
print(misfit.sort_values('L1').head()['REAL'].values)
# Will return f.ex:
# [ 38 26 100 71 57]
For comparisons with single measured values (recommended for history matching), use the YAML syntax:
smry:
  # Mandatory elements per entry: key and observations
  - key: WBP4:OP_1
    # This is a global comment regarding this set of observations
    comment: "Shut-in pressures converted from well head conditions"
    observations:
      # Mandatory elements per entry in ecl_vector observations: value, error, date
      - {value: 251, error: 4, date: 2001-01-01}
      - {value: 251, error: 10, date: 2002-01-01}
      - {value: 251, error: 10, date: 2003-01-01,
         comment: First measurement after sensor drift correction}
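The YAML above parses to an ordinary Python structure, which you could equally hand directly to Observations() as in the smryh example earlier. A self-contained check of the structure using PyYAML (assumed available; note that YAML dates parse to datetime.date objects):

```python
import yaml  # PyYAML

config = yaml.safe_load("""
smry:
  - key: WBP4:OP_1
    observations:
      - {value: 251, error: 4, date: 2001-01-01}
      - {value: 251, error: 10, date: 2002-01-01}
""")
print(config['smry'][0]['key'])                 # WBP4:OP_1
print(len(config['smry'][0]['observations']))   # 2
```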
Representative realizations
It is possible to utilize the observation support for calculating similarity between realizations. An example of this is to create a “mean” realization by use of the aggregation functionality (or p10, p90 etc.) and then rank the ensemble members by how similar they are to this aggregated realization. It is possible to pick certain summary data from the virtual realization as “observations”, and calculate mismatches. For this, a utility function load_smry() is provided by the Observations object to load “virtual” observations from an existing realization. If you then use the Observations object to compute mismatches, and then rank realizations by the mismatch, you can pick the realization that is closest to your statistic of choice.
# Load an ensemble we want to analyze
ens = ensemble.ScratchEnsemble('hmensemble',
'/scratch/foo/something/realization-*/iter-3')
ens.load_smry(column_keys=['FOPT'], time_index='yearly')
# Calculate a "mean" realization
mean = ens.agg('mean')
# Create an empty observation object
obs = ensemble.Observations({})
# Load data from the mean realization as virtual observations:
obs.load_smry(mean, 'FOPT', time_index='yearly')
# Calculate the difference between the ensemble members and the
# mean realization:
mis = obs.mismatch(ens)
# Group mismatch data by realization, and pick the realization
# index with the smallest sum of squared errors ('L2')
closest_to_mean = mis.groupby('REAL').sum(numeric_only=True)['L2']\
.sort_values()\
.index\
.values[0]
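The final selection step is plain pandas. With a synthetic mismatch table, the same logic can be checked standalone (idxmin() is an equivalent shortcut for sorting and picking the first index):

```python
import pandas as pd

# Synthetic mismatch table: two observation rows per realization index:
mis = pd.DataFrame({'REAL': [0, 0, 1, 1, 2, 2],
                    'L2': [4.0, 1.0, 0.5, 0.5, 9.0, 1.0]})

# Sum squared errors per realization, then pick the smallest:
sums = mis.groupby('REAL').sum(numeric_only=True)['L2']
closest_to_mean = sums.idxmin()
print(closest_to_mean)  # -> 1 (total L2 of 1.0)
```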
Custom compute functions for each realization
If you have a custom Python function that works on Realization objects producing some dataframe, you can have the Ensemble object apply this function to each realization in turn (potentially in parallel).
Note that the same can be accomplished if you are able to produce the same dataframe and export it to a CSV file in every realization, and then use load_csv() on the ensemble object. But this requires the CSV file to be precomputed and dumped in every realization directory, which is not always practical.
Assume first we have a function that is able to produce such a table when given a ScratchRealization object (the function can choose freely what information in the realization object to use, potentially only the directory).
| ZONE | PORV | VOLUME | Z | PERMX |
|---|---|---|---|---|
| LowerReek | 91526392.0 | 504504371.3 | 1595.7 | 1079.4 |
| MidReek | 157574096.0 | 938721683.9 | 1585.9 | 877.4 |
| UpperReek | 142346336.0 | 1036154565.6 | 1570.7 | 493.1 |
where PORV and VOLUME are sums over each zone, Z is the minimum (thus the apex per zone) and PERMX is an arithmetic mean. In the language of res2df this could be done with code like this:
from res2df import grid, ResdataFiles
resdatafiles = ResdataFiles('MYDATADECK.DATA') # There is a file zones.lyr alongside this.
grid_df = grid.df(resdatafiles) # Produce a dataframe with one row pr. cell
my_aggregators = {'PORV': 'sum', 'VOLUME': 'sum', 'Z': 'min', 'PERMX': 'mean'}
stats_df = grid_df.groupby("ZONE").agg(my_aggregators)
print(stats_df)
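The groupby/agg step in the snippet above can be verified standalone, without res2df or a data deck, using a synthetic stand-in for the per-cell dataframe:

```python
import pandas as pd

# Synthetic stand-in for the per-cell dataframe from res2df's grid.df():
grid_df = pd.DataFrame({'ZONE': ['A', 'A', 'B'],
                        'PORV': [1.0, 2.0, 4.0],
                        'VOLUME': [10.0, 20.0, 40.0],
                        'Z': [1510.0, 1500.0, 1600.0],
                        'PERMX': [100.0, 300.0, 50.0]})

# One aggregator per column: sums for volumes, min for depth, mean for perm:
my_aggregators = {'PORV': 'sum', 'VOLUME': 'sum', 'Z': 'min', 'PERMX': 'mean'}
stats_df = grid_df.groupby('ZONE').agg(my_aggregators).reset_index()
print(stats_df)
```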
ScratchRealization objects contain the method runpath() which gives the full path to the directory the realization resides in; this can be used freely by your function. For easier coupling with res2df, the function get_resdatafiles() is provided.
To be able to inject the res2df lines above into the API of fmu.ensemble and the apply() function, we need to put them into a wrapper function. This wrapper function will always receive a Realization object as a named argument, and it must return a dataframe. The wrapper function can look like this:
from res2df import grid, ResdataFiles
def my_realization_stats(args):
"""A custom function for performing a particular calculation
on every realization
Args:
args (dict): A dictionary with parameters to my custom function.
The keys 'realization' and 'localpath' are reserved for fmu.ensemble."""
realization = args["realization"] # Provided by fmu.ensemble apply()
resdatafiles = realization.get_resdatafiles()
grid_df = grid.df(resdatafiles)
my_aggregators = {'PORV': 'sum', 'VOLUME': 'sum', 'Z': 'min', 'PERMX': 'mean'}
stats_df = grid_df.groupby("ZONE").agg(my_aggregators)
return stats_df.reset_index() # Zone names are in the index, lost if not reset.
You are free to code your wrapper function in a way that suits both usage in apply() and interactive usage. Your wrapper function can for example behave differently if the “realization” key is not present in the args dictionary given as input.
When this function is defined, and your ensemble is initialized, you can call this function on every realization as in the following (this would work on EnsembleSets also):
from fmu.ensemble import ScratchEnsemble
ensemble = ScratchEnsemble("test", "testcase/realization-*/iter-0")
ensemble.apply(my_realization_stats, localpath="zonestats.csv")
For interactive test runs on single realizations, you can run my_realization_stats({"realization": ens[0]}) if ens is a ScratchEnsemble object.
After the apply() operation above, the data for each realization resides in each realization object under the key zonestats.csv. We can obtain the data for all realizations (aggregated vertically by concatenation) by asking ensemble.get_df("zonestats.csv"). Further aggregation to the ensemble level can be done with the agg() function, which returns a VirtualRealization object from an Ensemble object. If we want only the aggregated table for our particular custom function, we can aggregate the ensemble only for that particular datatype:
In [1]: mean_realization = ensemble.agg("mean", keylist="zonestats.csv")
In [2]: mean_realization.get_df("zonestats.csv")
Out [2]:
ZONE PERMX VOLUME Z PORV
0 LowerReek 1105.552718 6.070259e+08 1599.113406 109885776.0
1 MidReek 966.315122 9.608399e+08 1586.559663 161875872.0
2 UpperReek 592.824625 1.028779e+09 1571.164775 148655376.0