Advanced usage
Merging data
The CSV_EXPORT1 workflow built into ERT performs one of the specific tasks that can now be accomplished using this module. That CSV export is just a merge of the dataframes coming from parameters.txt and the Eclipse summary data.
import pandas as pd
# reekensemble is assumed to be an already initialized ScratchEnsemble
smry = reekensemble.load_smry(time_index='monthly')
params = reekensemble.parameters
# Match the two tables where the value of REAL is identical:
smry_params = pd.merge(smry, params)
smry_params.to_csv('smry_params.csv', index=False)
For finer control, you can specify exactly which summary vectors you want to include, the time resolution, and perhaps also a subset of the parameters. For example, if you have computed any kind of scalar data per realization and put that into outputs.txt, you can merge with load_txt('outputs.txt') instead of params in the code above.
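The merge mechanics themselves are plain pandas: dataframes sharing a REAL column are joined on that column. A self-contained sketch with synthetic data standing in for the ensemble tables (in real usage these would come from load_smry() and load_txt('outputs.txt')):

```python
import pandas as pd

# Synthetic stand-ins for ensemble data:
smry = pd.DataFrame({'REAL': [0, 0, 1, 1],
                     'DATE': ['2001-01-01', '2001-02-01'] * 2,
                     'FOPT': [10.0, 20.0, 11.0, 19.0]})
outputs = pd.DataFrame({'REAL': [0, 1], 'RECOVERY': [0.41, 0.39]})

# pd.merge() joins on the shared column REAL, repeating the scalar
# outputs for every time step of the matching realization:
smry_outputs = pd.merge(smry, outputs)
print(smry_outputs)
```

Scalar per-realization data is thus broadcast onto every time step of the corresponding realization, which is exactly what the CSV export needs.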
Statistics over ensembles
Statistics over ensembles can be computed by aggregating their data and presenting them in realization-like objects. A “mean” of an ensemble can be computed for all data the ensemble contains, and the result is something that we treat like a realization. It is important to realize that this result is not a realization, only something we can treat as one, and the VirtualRealization object is used for this purpose. This “mean realization” is not a physical realization where all numbers make physical sense; it just gives you the mean of all the data, over the realizations for scalars, and for each point in time for time series.
As an example, oil, gas, and pressure profiles in a statistical realization will typically not be compatible, in that it can be physically impossible to obtain these profiles in reality. Use and interpret with care!
Supported statistical aggregations are mean, median, min, max, std, var and pXX, where XX is a number between 00 and 99 giving the percentile you want. You access this functionality through the function agg() in the ensemble objects.
Pandas and Numpy are used under the hood for all computations. The quantile integers you supply are forwarded directly to Pandas; beware that this is the opposite of the usual subsurface understanding of e.g. a “high case p10 profile”. Translate p10 to p90 in your client code if needed.
ens = ensemble.ScratchEnsemble('ref-ensemble',
'/scratch/foo/r018-ref-case/realization-*/iter-3')
mean = ens.agg('mean')
# What is the mean value of "FWL" in parameters.txt?
print(mean['parameters.txt']['FWL'])
# What is the mean ultimate FOPT?
print(mean['unsmry--monthly']['FOPT'].iloc[-1])
# (.iloc[-1] is Pandas functionality for accessing the last row)
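The quantile convention mentioned above can be illustrated with plain Pandas, which is what fmu.ensemble uses under the hood (a self-contained sketch with synthetic numbers, not using the ensemble objects):

```python
import pandas as pd

# Five ultimate FOPT values, one per realization (synthetic numbers):
fopt = pd.Series([100.0, 120.0, 140.0, 160.0, 180.0])

# Pandas puts p90 at the high end of the distribution:
print(fopt.quantile(0.9))  # -> 172.0
# A subsurface "high case p10 profile" therefore corresponds to agg('p90').
```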
Comparing realizations or ensembles
Any linear combination of ensembles or realizations can be computed, in a pointwise manner. The combination is performed on the data that is shared between its operands.
Computing the sum of two realizations is only a matter of adding them in your Python interpreter. The end result is an object you can treat similar to a realization, asking for its data using get_df(), or asking for the summary data using get_smry(). Eclipse time series will be combined pointwise in time, but only on shared time steps. It is therefore recommended to interpolate them to e.g. a monthly time interval prior to combination.
Ensembles can be linearly combined analogously to realizations, and will be matched on the realization index REAL.
refens = ensemble.ScratchEnsemble('ref-ensemble',
'/scratch/foo/r018-ref-case/realization-*/iter-3')
iorens = ensemble.ScratchEnsemble('ior-ensemble',
'/scratch/foo/r018-ior-case/realization-*/pred')
# Calculate the delta ensemble
deltaens = iorens - refens
# Obtain the field gain:
fieldgain = deltaens.get_smry(column_keys=['FOPT'],
time_index='monthly')
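The matching underlying such a delta ensemble can be illustrated with plain pandas; fmu.ensemble does the equivalent matching on REAL (and on shared time steps for time series). A self-contained sketch with synthetic data:

```python
import pandas as pd

# Synthetic ultimate FOPT per realization for a reference and an IOR case:
ref = pd.DataFrame({'REAL': [0, 1], 'FOPT': [100.0, 110.0]})
ior = pd.DataFrame({'REAL': [0, 1], 'FOPT': [130.0, 125.0]})

# Match on REAL, then subtract pointwise to get the per-realization gain:
merged = pd.merge(ior, ref, on='REAL', suffixes=('_ior', '_ref'))
merged['FOPT_gain'] = merged['FOPT_ior'] - merged['FOPT_ref']
print(merged[['REAL', 'FOPT_gain']])
```

Realizations present in only one of the operands drop out of the inner merge, which is why only shared data appears in the combination.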
If the ensembles you want to combine cannot be compared realization by realization, or do not even contain the same number of realizations, you should first aggregate the ensembles (mean or any other aggregation), and then construct delta objects from the statistical realizations.
Remember that this library does not help you in interpreting these results correctly, it only gives you the opportunity to calculate them!
Working with observations
Observations for history matching can be loaded, and computations (comparisons) of observed data versus simulated data can be performed.
The Observations object can be initialized from YAML files or from a Python dictionary.
If you are opting for simple usage, just being able to compare FOPT versus FOPTH in your ensemble, your observation config could look like:
# Eclipse summary vectors compared with allocated summary vectors
smryh:
  - key: FOPT
    histvec: FOPTH
    time_index: monthly  # or yearly, daily, raw or last, or an ISO-date
This file can be loaded in Python:
# Assume the yaml above has been put in a file:
obs = ensemble.Observations('fopt-obs.yml')
Alternatively, it is possible to initialize this directly without the filesystem:
obs = ensemble.Observations({'smryh': [{'key': 'FOPT',
'histvec': 'FOPTH', 'time_index': 'last'}]})
# Load an ensemble we want to analyze
ens = ensemble.ScratchEnsemble('hmcandidate',
'/scratch/foo/something/realization-*/iter-3')
# Perform calculation of misfit
# A dataframe with computed mismatches is returned.
# We only have one "observation" for each realization, so
# only one row pr. realization is returned.
misfit = obs.mismatch(ens)
# Sort ascending by L1 (absolute error) and print the realization
# indices of the first five:
print(misfit.sort_values('L1').head()['REAL'].values)
# Will return f.ex:
# [ 38 26 100 71 57]
For comparisons with single measured values (recommended for history matching), use the YAML syntax:
smry:
  # Mandatory elements per entry: key and observations
  - key: WBP4:OP_1
    # This is a global comment regarding this set of observations
    comment: "Shut-in pressures converted from well head conditions"
    observations:
      # Mandatory elements per entry in ecl_vector observations: value, error, date
      - {value: 251, error: 4, date: 2001-01-01}
      - {value: 251, error: 10, date: 2002-01-01}
      - {value: 251, error: 10, date: 2003-01-01,
         comment: First measurement after sensor drift correction}
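The YAML above parses to an ordinary Python structure, which you could equally hand directly to Observations() as in the smryh example earlier. A self-contained check of the structure using PyYAML (assumed available; note that YAML dates parse to datetime.date objects):

```python
import yaml  # PyYAML

config = yaml.safe_load("""
smry:
  - key: WBP4:OP_1
    observations:
      - {value: 251, error: 4, date: 2001-01-01}
      - {value: 251, error: 10, date: 2002-01-01}
""")
print(config['smry'][0]['key'])                 # WBP4:OP_1
print(len(config['smry'][0]['observations']))   # 2
```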
Representative realizations
It is possible to utilize the observation support for calculating similarity between realizations. An example of this is to create a “mean” realization by use of the aggregation functionality (or p10, p90 etc.) and then rank the ensemble members by how similar they are to this aggregated realization. It is possible to pick certain summary data from the virtual realization as “observations”, and calculate mismatches. For this, a utility function load_smry() is provided by the Observations object to load “virtual” observations from an existing realization. If you then use the Observations object to compute mismatches, and then rank realizations by the mismatch, you can pick the realization that is closest to your statistic of choice.
# Load an ensemble we want to analyze
ens = ensemble.ScratchEnsemble('hmensemble',
'/scratch/foo/something/realization-*/iter-3')
ens.load_smry(column_keys=['FOPT'], time_index='yearly')
# Calculate a "mean" realization
mean = ens.agg('mean')
# Create an empty observation object
obs = ensemble.Observations({})
# Load data from the mean realization as virtual observations:
obs.load_smry(mean, 'FOPT', time_index='yearly')
# Calculate the difference between the ensemble members and the
# mean realization:
mis = obs.mismatch(ens)
# Group mismatch data by realization, and pick the realization
# index with the smallest sum of squared errors ('L2')
closest_to_mean = mis.groupby('REAL').sum(numeric_only=True)['L2']\
.sort_values()\
.index\
.values[0]
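The final selection step is plain pandas. With a synthetic mismatch table, the same logic can be checked standalone (idxmin() is an equivalent shortcut for sorting and picking the first index):

```python
import pandas as pd

# Synthetic mismatch table: two observation rows per realization index:
mis = pd.DataFrame({'REAL': [0, 0, 1, 1, 2, 2],
                    'L2': [4.0, 1.0, 0.5, 0.5, 9.0, 1.0]})

# Sum squared errors per realization, then pick the smallest:
sums = mis.groupby('REAL').sum(numeric_only=True)['L2']
closest_to_mean = sums.idxmin()
print(closest_to_mean)  # -> 1 (total L2 of 1.0)
```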
Custom compute functions for each realization
If you have a custom Python function that works on Realization objects producing some dataframe, you can have the Ensemble object apply this function to each realization in turn (potentially in parallel).
Note that the same can be accomplished if you are able to produce the same dataframe and export it to a CSV file in every realization, and then use load_csv() on the ensemble object. But this requires the CSV file to be precomputed and dumped in every realization directory, which is not always practical.
Assume first we have a function that is able to produce such a table when given a ScratchRealization object (the function can choose freely what information in the realization object to use, potentially only the directory).
| ZONE | PORV | VOLUME | Z | PERMX |
|---|---|---|---|---|
| LowerReek | 91526392.0 | 504504371.3 | 1595.7 | 1079.4 |
| MidReek | 157574096.0 | 938721683.9 | 1585.9 | 877.4 |
| UpperReek | 142346336.0 | 1036154565.6 | 1570.7 | 493.1 |
where PORV and VOLUME are sums over each zone, Z is the minimum (thus the apex per zone) and PERMX is an arithmetic mean. In the language of res2df this could be done with code like this:
from res2df import grid, ResdataFiles
resdatafiles = ResdataFiles('MYDATADECK.DATA') # There is a file zones.lyr alongside this.
grid_df = grid.df(resdatafiles) # Produce a dataframe with one row pr. cell
my_aggregators = {'PORV': 'sum', 'VOLUME': 'sum', 'Z': 'min', 'PERMX': 'mean'}
stats_df = grid_df.groupby("ZONE").agg(my_aggregators)
print(stats_df)
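The groupby/agg step in the snippet above can be verified standalone, without res2df or a data deck, using a synthetic stand-in for the per-cell dataframe:

```python
import pandas as pd

# Synthetic stand-in for the per-cell dataframe from res2df's grid.df():
grid_df = pd.DataFrame({'ZONE': ['A', 'A', 'B'],
                        'PORV': [1.0, 2.0, 4.0],
                        'VOLUME': [10.0, 20.0, 40.0],
                        'Z': [1510.0, 1500.0, 1600.0],
                        'PERMX': [100.0, 300.0, 50.0]})

# One aggregator per column: sums for volumes, min for depth, mean for perm:
my_aggregators = {'PORV': 'sum', 'VOLUME': 'sum', 'Z': 'min', 'PERMX': 'mean'}
stats_df = grid_df.groupby('ZONE').agg(my_aggregators).reset_index()
print(stats_df)
```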
ScratchRealization objects contain the method runpath() which gives the full path to the directory the realization resides in; this can be used freely by your function. For easier coupling with res2df, the function get_resdatafiles() is provided.
To be able to inject the res2df lines above into the API of fmu.ensemble and the apply() function, we need to put them into a wrapper function. This wrapper function will always receive a Realization object as a named argument, and it must return a dataframe. The wrapper function can look like this:
from res2df import grid, ResdataFiles
def my_realization_stats(args):
"""A custom function for performing a particular calculation
on every realization
Args:
args (dict): A dictionary with parameters to my custom function.
The keys 'realization' and 'localpath' are reserved for fmu.ensemble."""
realization = args["realization"] # Provided by fmu.ensemble apply()
resdatafiles = realization.get_resdatafiles()
grid_df = grid.df(resdatafiles)
my_aggregators = {'PORV': 'sum', 'VOLUME': 'sum', 'Z': 'min', 'PERMX': 'mean'}
stats_df = grid_df.groupby("ZONE").agg(my_aggregators)
return stats_df.reset_index() # Zone names are in the index, lost if not reset.
You are free to code your wrapper function in a way that suits both usage in apply() and interactive usage. Your wrapper function can for example behave differently if the “realization” key is not present in the args dictionary given as input.
When this function is defined, and your ensemble is initialized, you can call this function on every realization as in the following (this would work on EnsembleSets also):
from fmu.ensemble import ScratchEnsemble
ensemble = ScratchEnsemble("test", "testcase/realization-*/iter-0")
ensemble.apply(my_realization_stats, localpath="zonestats.csv")
For interactive test runs on single realizations, you can run my_realization_stats({"realization": ens[0]}) if ens is a ScratchEnsemble object.
After the apply() operation above, the data for each realization resides in each realization object under the key zonestats.csv. We can obtain the data for all realizations (aggregated vertically by concatenation) by asking ensemble.get_df("zonestats.csv"). Further aggregation to the ensemble level can be done with the agg() function, which returns a VirtualRealization object from an Ensemble object. If we want only the aggregated table for our particular custom function, we can aggregate the ensemble only for that particular datatype:
In [1]: mean_realization = ensemble.agg("mean", keylist="zonestats.csv")
In [2]: mean_realization.get_df("zonestats.csv")
Out [2]:
ZONE PERMX VOLUME Z PORV
0 LowerReek 1105.552718 6.070259e+08 1599.113406 109885776.0
1 MidReek 966.315122 9.608399e+08 1586.559663 161875872.0
2 UpperReek 592.824625 1.028779e+09 1571.164775 148655376.0