fmu.ensemble package

Subpackages

Submodules

fmu.ensemble.ensemble module

Module containing the ScratchEnsemble class

class fmu.ensemble.ensemble.ScratchEnsemble(ensemble_name, paths=None, realidxregexp=None, runpathfile=None, runpathfilter=None, autodiscovery=True, manifest=None, batch=None)[source]

Bases: object

An ensemble is a collection of Realizations.

Ensembles are initialized from path(s) pointing to filesystem locations containing realizations.

Ensemble objects can be grouped into EnsembleSet.

Realizations in an ensemble are uniquely determined by their realization index (integer).

Example for initialization:

>>> from fmu import ensemble
>>> ens = ensemble.ScratchEnsemble('ensemblename',
            '/scratch/fmu/foobert/r089/casename/realization-*/iter-0')

Upon initialization, only a subset of the files on disk will be discovered. More files must be explicitly discovered and/or loaded.

Parameters:
  • ensemble_name (str) – Name identifier for the ensemble. Optionally kept consistent with e.g. iter-0 in the path.

  • paths (list/str) – String or list of strings with wildcards to file system. Absolute or relative paths. If omitted, ensemble will be empty unless runpathfile is used.

  • realidxregexp (str or regexp) – used to deduce the realization index from the file path. The default is tailored for realization-X.

  • runpathfile (str) – Filename (absolute or relative) of an ERT runpath file, consisting of four space separated text fields, first column is realization index, second column is absolute or relative path to a realization RUNPATH, third column is the basename of the Eclipse simulation, relative to RUNPATH. Fourth column is not used.

  • runpathfilter (str) – If supplied, only the runpaths in the runpathfile containing this string will be included. Use this to select only a specific realization, for example.

  • autodiscovery (boolean) – True by default, means that the class can try to autodiscover data in the realization. Turn off to gain more fine-tuned control.

  • manifest – dict or filename to use for manifest. If filename, it must be a yaml-file that will be parsed to a single dict.

  • batch (dict) – List of functions (load_*) that should be run at time of initialization for each realization. Each element is a length 1 dictionary with the function name to run as the key, and each key's value should be the function arguments as a dict.
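Example of initialization with a batch argument. This is a minimal sketch; the chosen load functions and file paths are illustrative:

>>> ens = ensemble.ScratchEnsemble(
...     'ensemblename',
...     '/scratch/fmu/foobert/r089/casename/realization-*/iter-0',
...     batch=[
...         {'load_smry': {'column_keys': 'FOPT', 'time_index': 'yearly'}},
...         {'load_csv': {'localpath': 'share/results/tables/volumes.csv'}},
...     ])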

keys()[source]

Return the union of all keys available in realizations.

Keys refer to the realization datastore of internalized data. The datastore is a dictionary of dataframes or dicts. Examples would be parameters.txt, STATUS, share/results/tables/unsmry--monthly.csv

add_realizations(paths, realidxregexp=None, autodiscovery=True, batch=None)[source]

Utility function to add realizations to the ensemble.

Realizations are identified by their integer index. If the realization index already exists, it will be replaced when calling this function.

This function passes on initialization to ScratchRealization and stores a reference to those generated objects.

Parameters:
  • paths (list/str) – String or list of strings with wildcards to file system. Absolute or relative paths.

  • autodiscovery (boolean) – whether files can be attempted auto-discovered

  • batch (list) – Batch commands sent to each realization.

Returns:

Number of realizations successfully added.

Return type:

count (int)

add_from_runpathfile(runpath, runpathfilter=None, batch=None)[source]

Add realizations from a runpath file typically coming from ERT.

The runpath file is a space separated table with the columns:

  • index - integer with realization index

  • runpath - string with the full path to the realization

  • eclbase - ECLBASE within the runpath (location of DATA file minus the trailing ‘.DATA’)

  • iter - integer with the iteration number.

Parameters:
  • runpath (str) – Filename, absolute or relative, or a Pandas DataFrame parsed from a runpath file

  • runpathfilter (str) – If supplied, only runpaths containing this string will be included. Default is None, which means no filter.

  • batch (list) – Batch commands to be sent to each realization.

Returns:

Number of successfully added realizations.

Return type:

int
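Usage sketch, assuming a runpath file as described above (the filename is hypothetical):

>>> ens = ensemble.ScratchEnsemble('ensemblename')
>>> count = ens.add_from_runpathfile('runpath_file.txt',
...                                  runpathfilter='iter-0')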

remove_data(localpaths)[source]

Remove certain datatypes from each realization's datastore. This modifies the underlying realization objects, and is equivalent to

>>> del realization[localpath]

on each realization in the ensemble.

Parameters:

localpaths (string) – Full localpaths to the data, or list of strings.

remove_realizations(realindices)[source]

Remove specific realizations from the ensemble

Parameters:

realindices (int or list of ints) – The realization indices to be removed

to_virtual(name=None)[source]

Convert the ScratchEnsemble to a VirtualEnsemble.

This means that all imported data in each realization is aggregated and stored as dataframes in the returned VirtualEnsemble.

Unless specified, the VirtualEnsemble object will have the same ‘name’ as the ScratchEnsemble.

Parameters:

name (str) – Name of the ensemble as virtualized.

to_disk(filesystempath, delete=False, dumpcsv=True, dumpparquet=True)[source]

Dump ensemble data to a directory on disk.

The ScratchEnsemble is first converted to a VirtualEnsemble, which is then dumped to disk. This function is a convenience wrapper for to_disk() in VirtualEnsemble.

property manifest

Get the manifest of the ensemble. The manifest is nothing but a Python dictionary with unspecified content.

Returns:

dict

property parameters

Build a dataframe of the information in each realization's parameters.txt.

If no realizations have the file, an empty dataframe is returned.

Returns:

pd.DataFrame

load_scalar(localpath, convert_numeric=False, force_reread=False)[source]

Parse a single value from a file for each realization.

The value can be a string or a number.

Empty files are treated as existing, with an empty string as the value, different from non-existing files.

Parsing is performed individually in each realization.

Parameters:
  • localpath (str) – path to the text file, relative to each realization

  • convert_numeric (boolean) – If set to True, assume that the value is numerical, and treat strings as errors.

  • force_reread (boolean) – Force reread from file system. If False, repeated calls to this function will return cached results.

Returns:

Aggregated data over the ensemble. The column ‘REAL’ signifies the realization indices, and a column with the same name as the localpath filename contains the data.

Return type:

pd.DataFrame
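Usage sketch, assuming each realization contains a hypothetical single-value file npv.txt:

>>> df = ens.load_scalar('npv.txt', convert_numeric=True)
>>> df.columns.tolist()  # the data column is named after the file
['REAL', 'npv.txt']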

load_txt(localpath, convert_numeric=True, force_reread=False)[source]

Parse a key-value text file from disk and internalize data

Parses text files on the form

<key> <value>

in each line.

Parsing is performed individually in each realization.
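Usage sketch for a hypothetical key-value file outputs.txt:

>>> ens.load_txt('outputs.txt')
>>> df = ens.get_df('outputs.txt')  # aggregated over realizations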

load_csv(localpath, convert_numeric=True, force_reread=False)[source]

For each realization, load a CSV.

The CSV file must be present in at least one realization. Parsing is done individually for each realization; aggregation happens on demand (through get_df()) and when this function returns.

Parameters:
  • localpath (str) – path to the text file, relative to each realization

  • convert_numeric (boolean) – If set to True, numerical columns will be searched for and have their dtype set to integers or floats. If scalars, only numerical data will be loaded.

  • force_reread (boolean) – Force reread from file system. If False, repeated calls to this function will return cached results.

Returns:

aggregation of the loaded CSV files. The column ‘REAL’ distinguishes each realization's data.

Return type:

pd.DataFrame
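Usage sketch; the CSV path is an example of a typical FMU location:

>>> vols = ens.load_csv('share/results/volumes/simulator_volume_fipnum.csv')
>>> vols['REAL'].unique()  # one entry per realization having the file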

load_file(localpath, fformat, convert_numeric=False, force_reread=False)[source]

Function for calling load_file() in every realization

This function may utilize multithreading.

Parameters:
  • localpath (str) – path to the text file, relative to each realization

  • fformat (str) – string identifying the file format. Supports ‘txt’ and ‘csv’.

  • convert_numeric (boolean) – If set to True, numerical columns will be searched for and have their dtype set to integers or floats. If scalars, only numerical data will be loaded.

  • force_reread (boolean) – Force reread from file system. If False, repeated calls to this function will return cached results.

Returns:

with loaded data aggregated. The column ‘REAL’ distinguishes each realization's data.

Return type:

pd.DataFrame

find_files(paths, metadata=None, metayaml=False)[source]

Discover realization files. The files dataframes for each realization will be updated.

Certain functionality requires up-front file discovery, e.g. ensemble archiving and ensemble arithmetic.

CSV files for single use do not have to be discovered.

Files containing double dashes ‘--’ indicate that the double dashes separate different components with meaning in the filename. The components are extracted and put into additional columns “COMP1”, “COMP2”, etc. The filetype extension (after the last dot) will be removed from the last component.

Parameters:
  • paths (str or list of str) – Filenames (will be globbed) that are relative to the realization directory.

  • metadata (dict) – metadata to assign for the discovered files. The keys will be columns, and its values will be assigned as column values for the discovered files.

  • metayaml (boolean) – Additional possibility of adding metadata from associated yaml files. Yaml files to be associated to a specific discovered file can have an optional dot in front, and must end in .yml, added to the discovered filename. The yaml file will be loaded as a dict, and have its keys flattened using the separator ‘--’. Flattened keys are then used as column headers in the returned dataframe.

Returns:

with the slice of discovered files in each realization, tagged with the realization index in the column REAL. Empty dataframe if no files are found.

Return type:

pd.DataFrame
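Usage sketch; the glob pattern and metadata are hypothetical:

>>> found = ens.find_files('share/results/maps/*.gri',
...                        metadata={'CONTENT': 'depthsurface'})
>>> found[['REAL', 'LOCALPATH', 'CONTENT']].head()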

get_smrykeys(vector_match=None)[source]

Return the union of all Eclipse Summary vector names over all realizations.

If any requested key/pattern does not match anything, it is silently ignored.

Parameters:

vector_match (str or list of str) – Wildcards for vectors to obtain. If None, all vectors are returned

Returns:

Matched summary vectors. Empty list if no summary file or no matched summary file vectors.

Return type:

list of str

get_smry_meta(column_keys=None)[source]

Provide metadata for summary data vectors.

A dictionary indexed by summary vector names is returned, and each value is another dictionary with potentially the following metadata types:
  • unit (string)
  • is_total (bool)
  • is_rate (bool)
  • is_historical (bool)
  • get_num (int) (only provided if not None)
  • keyword (str)
  • wgname (str or None)

The requested columns are asked for over the entire ensemble, and if necessary all realizations will be checked to obtain the metadata for a specific key. If metadata differ between realizations, behaviour is undefined.

Parameters:

column_keys (list or str) – Column key wildcards.

Returns:

dict of dict with metadata information

get_df(localpath, merge=None)[source]

Load data from each realization and aggregate (vertically)

Data must already have been internalized using a load_*() function.

Each row is tagged by the realization index in the column ‘REAL’

The localpath argument can be shortened, as it will be looked up using the function shortcut2path()

Parameters:
  • localpath (str) – refers to the internalized name.

  • merge (list or str) – refers to additional localpaths which will be merged into the dataframe for every realization

Returns:

Merged data from each realization.

Realizations with missing data are ignored.

Return type:

pd.DataFrame

Raises:

KeyError if no data is found in any realization.
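Usage sketch, assuming yearly summary data has been internalized first:

>>> ens.load_smry(time_index='yearly', column_keys='FOPT')
>>> df = ens.get_df('unsmry--yearly', merge='parameters.txt')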

load_smry(time_index='raw', column_keys=None, stacked=None, cache_eclsum=None, start_date=None, end_date=None, include_restart=True)[source]

Fetch and internalize summary data from all realizations.

The fetched summary data will be cached/internalized by each realization object, and can be retrieved through get_df().

The name of the internalized dataframe is “unsmry--” + a string for the time index, ‘monthly’, ‘yearly’, ‘daily’ or ‘raw’.

Multiple calls to this function with different time indices will lead to multiple internalized dataframes being stored, so your ensemble can contain both a yearly and a monthly dataset. There is no requirement for the column_keys to be consistent, but care should be taken if they differ.

If you create a virtual ensemble of this ensemble object, all internalized summary data will be kept, as opposed to if you have retrieved it through get_smry()

Wraps around Realization.load_smry() which wraps around resdata.summary.Summary.pandas_frame()

Beware that the default time_index for ensembles is ‘monthly’, differing from realizations which use raw dates by default.

Parameters:
  • time_index (str or list of DateTime) – If defaulted, the raw Eclipse report times will be used. If a string is supplied, it is passed to get_smry_dates() in order to obtain a time index, typically ‘monthly’, ‘daily’ or ‘yearly’.

  • column_keys (str or list of str) – column key wildcards. Default is ‘*’ which will match all vectors in the Eclipse output.

  • stacked (boolean) – determining the dataframe layout. If true, the realization index is a column, and dates are repeated for each realization in the DATES column. If false, a dictionary of dataframes is returned, indexed by vector name, and with realization index as columns. This only works when time_index is the same for all realizations. Not implemented yet!

  • cache_eclsum (boolean) – Boolean for whether we should cache the EclSum objects. Set to False if you cannot keep all EclSum files in memory simultaneously

  • start_date (str or date) – First date to include. Dates prior to this date will be dropped, supplied start_date will always be included. Overridden if time_index is ‘first’ or ‘last’. If string, use ISO-format, YYYY-MM-DD.

  • end_date (str or date) – Last date to be included. Dates past this date will be dropped, supplied end_date will always be included. Overridden if time_index is ‘first’ or ‘last’. If string, use ISO-format, YYYY-MM-DD.

  • include_restart (boolean) – boolean sent to resdata for whether restart files should be traversed.

Returns:

Summary vectors for the ensemble, or a dict of dataframes if stacked=False.

Return type:

pd.DataFrame
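Usage sketch illustrating that different time indices are internalized as separate datasets:

>>> ens.load_smry(time_index='monthly', column_keys='F*')
>>> ens.load_smry(time_index='yearly', column_keys='FOPT')
>>> monthly = ens.get_df('unsmry--monthly')
>>> yearly = ens.get_df('unsmry--yearly')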

get_volumetric_rates(column_keys=None, time_index=None, time_unit=None)[source]

Compute volumetric rates from cumulative summary vectors

Column names that are not referring to cumulative summary vectors are silently ignored.

A dataframe is returned with volumetric rates, that is, rate values that can be summed up to the cumulative version. The ‘T’ in the column name is switched with ‘R’. If you ask for FOPT, you will get FOPR in the returned dataframe.

Rates in the returned dataframe are valid forwards in time, as opposed to rates coming directly from the Eclipse simulator, which are valid backwards in time.

If time_unit is set, the rates will be scaled to represent either daily, monthly or yearly rates. These will sum up to the cumulative as long as you multiply with the correct number of days, months or years between each consecutive date index. Month lengths and leap years are correctly handled.

Parameters:
  • column_keys (str or list of str) – cumulative summary vectors

  • time_index (str or list of datetimes) –

  • time_unit – str or None. If None, the rates returned will be the difference in cumulative between each included time step (where the time interval can vary arbitrarily) If set to ‘days’, ‘months’ or ‘years’, the rates will be scaled to represent a daily, monthly or yearly rate that is compatible with the date index and the cumulative data.

Returns:

Analogous to the dataframe returned by get_smry(). Empty dataframe if no data found.

Return type:

pd.DataFrame
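Usage sketch; asking for the cumulative FOPT yields the computed rate FOPR:

>>> rates = ens.get_volumetric_rates(column_keys='FOPT',
...                                  time_index='yearly', time_unit='days')
>>> 'FOPR' in rates.columns
True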

filter(localpath, inplace=True, **kwargs)[source]

Filter realizations or data within realizations

Calling this function can return a copy with fewer realizations, or remove realizations from the current object.

Typical usage is to require that parameters.txt is present, or that the OK file is present.

It is also possible to require a certain scalar to have a specific value, for example filtering on a specific sensitivity case.

Parameters:
  • localpath (string) – pointing to the data for which the filtering applies. If no other arguments are given, only realizations containing this data key are kept.

  • key (str) – A certain key within a realization dictionary that is required to be present. If a value is also provided, this key must be equal to this value

  • value (str, int or float) – The value a certain key must equal. Floating point comparisons are not robust.

  • column (str) – Name of a column in tabular data. If columncontains is not specified, this means that this column must be present

  • columncontains (str, int or float) – A value that the specific column must include.

  • inplace – Indicating if the current object should have its realizations stripped, or if a copy should be returned. Default true.
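Usage sketches; the parameter key and value in the second example are hypothetical:

>>> ens.filter('OK')  # keep only realizations where the OK file is present
>>> subset = ens.filter('parameters.txt', key='SENSNAME',
...                     value='faultseal', inplace=False)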

drop(localpath, **kwargs)[source]

Delete elements from internalized data.

Shortcuts are allowed for localpath. If the data pointed to is a DataFrame, you can delete columns, or rows containing certain elements

If the data pointed to is a dictionary, keys can be deleted.

Parameters:
  • localpath – string, path to internalized data. If no other options are supplied, that dataset is deleted in its entirety

  • column – string with a column name to drop. Only for dataframes

  • columns – list of strings with column names to delete

  • rowcontains – rows where one column contains this string will be dropped. The comparison is on strings only, and all cells in the dataframe are converted to strings for the comparison. Thus it might work on dates, but be careful with numbers.

  • key – string with a keyname in a dictionary. Will not work for dataframes

  • keys – list of strings of keys to delete from a dictionary
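Usage sketches; the column and key names are hypothetical:

>>> ens.drop('unsmry--monthly', column='WOPR:A-3')
>>> ens.drop('parameters.txt', key='RMS_SEED')
>>> ens.drop('share/results/tables/somefile.csv')  # delete the entire dataset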

process_batch(batch=None)[source]

Process a list of functions to run/apply

This is equivalent to calling each function individually, but this enables more efficient concurrency. It is meant to be used for functions that modify the realization object, not for functions that already return a dataframe.

Parameters:

batch (list) – Each list element is a dictionary with one key, being the name of a function to run, and the value for that key is a dict with the keyword arguments to be supplied to that function.

Returns:

This ensemble object (self), so that it can be picked up by ProcessPoolExecutor and pickled.

Return type:

ScratchEnsemble

apply(callback, **kwargs)[source]

Callback functionality: apply a function to every realization

The supplied function handle will be handed over to each underlying realization object. The function supplied must return a Pandas DataFrame. The function can obtain the realization object in the kwargs dictionary through the key ‘realization’.

Parameters:
  • callback – function handle

  • kwargs – dictionary where ‘realization’ and ‘localpath’ are reserved, will be forwarded to the callback function

  • localpath – str, optional if the data is to be internalized in each realization object.

Returns:

pd.DataFrame, aggregated result of the supplied function on each realization.
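A sketch of a callback, assuming yearly summary data has been internalized in each realization; the function and column names are hypothetical:

>>> import pandas as pd
>>> def fopt_final(**kwargs):
...     real = kwargs['realization']  # the realization object is handed over
...     smry = real.get_df('unsmry--yearly')
...     return pd.DataFrame([{'FOPT_FINAL': smry['FOPT'].iloc[-1]}])
>>> result = ens.apply(fopt_final, localpath='fopt_final')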

get_smry_dates(freq='monthly', normalize=True, start_date=None, end_date=None, cache_eclsum=None, include_restart=True)[source]

Return list of datetimes for an ensemble according to frequency

Parameters:
  • freq (str) – String denoting the requested frequency for the returned list of datetimes. ‘report’ or ‘raw’ will yield the sorted union of all valid timesteps for all realizations. Other valid options are ‘daily’, ‘monthly’ and ‘yearly’. ‘first’ will give out the first date (minimum). ‘last’ will give out the last date (maximum).

  • normalize (boolean) – Whether to normalize backwards at the start and forwards at the end, to ensure the raw date range is covered.

  • start_date (str or date) – First date to include. Dates prior to this date will be dropped, supplied start_date will always be included. Overrides normalized dates. Overridden if freq is ‘first’ or ‘last’. If string, use ISO-format, YYYY-MM-DD.

  • end_date (str or date) – Last date to be included. Dates past this date will be dropped, supplied end_date will always be included. Overrides normalized dates. Overridden if freq is ‘first’ or ‘last’. If string, use ISO-format, YYYY-MM-DD.

  • include_restart (boolean) – Sent to resdata for whether restart files should be traversed.

Returns:

list of datetimes. Empty list if no data found.

get_smry_stats(column_keys=None, time_index='monthly', quantiles=None, cache_eclsum=None, start_date=None, end_date=None)[source]

Function to extract the ensemble statistics (Mean, Min, Max, P10, P90) for a set of simulation summary vectors (column key).

Compared to the agg() function, this function only works on summary data (time series), and will only operate on actually requested data, independent of what is internalized. It accesses the summary files directly and can thus obtain data at any time frequency.

Parameters:
  • column_keys – list of column key wildcards

  • time_index – list of DateTime if interpolation is wanted. Default is None, which returns the raw Eclipse report times. If a string is supplied, it is passed to get_smry_dates() in order to obtain a time index.

  • quantiles – list of ints between 0 and 100 for which quantiles to compute. Quantiles refer to scientific standard, which is opposite to the oil industry convention. Ask for p10 if you need the oil industry p90.

  • cache_eclsum – boolean for whether to keep the loaded EclSum object in memory after data has been loaded.

  • start_date – str or date with first date to include. Dates prior to this date will be dropped, supplied start_date will always be included. Overridden if time_index is ‘first’ or ‘last’. If string, use ISO-format, YYYY-MM-DD.

  • end_date – str or date with last date to be included. Dates past this date will be dropped, supplied end_date will always be included. Overridden if time_index is ‘first’ or ‘last’. If string, use ISO-format, YYYY-MM-DD.

Returns:

A MultiIndex dataframe. Outer index is ‘minimum’, ‘maximum’, ‘mean’, ‘p10’, ‘p90’, inner index are the dates. Column names are the different vectors. Quantiles refer to the scientific standard, opposite to the oil industry convention. If quantiles are explicitly supplied, the ‘pXX’ strings in the outer index are changed accordingly. If no data is found, return empty DataFrame.
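Usage sketch:

>>> stats = ens.get_smry_stats(column_keys='FOPT', time_index='monthly')
>>> stats.loc['p90']['FOPT']  # scientific p90, i.e. the oil industry p10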

get_wellnames(well_match=None)[source]

Return the union of all Eclipse Summary well names over all realizations. In addition, a list can be returned based on matches to an input string pattern.

Parameters:

well_match – Optional. String (or list of strings) with wildcard filter. If None, all wells are returned

Returns:

list of strings with eclipse well names. Empty list if no summary file or no matched well names.

get_groupnames(group_match=None)[source]

Return the union of all Eclipse Summary group names over all realizations.

Optionally, the group names can be filtered.

Parameters:

group_match – Optional. String (or list of strings) with wildcard filter (globbing). If None, all groups are returned. An empty string does not match anything.

Returns:

list of strings with Eclipse group names. Empty list if no summary file or no matched group names.

agg(aggregation, keylist=None, excludekeys=None)[source]

Aggregate the ensemble data into one VirtualRealization

All data will be attempted aggregated. String data will typically be dropped in the result.

Parameters:
  • aggregation – string, supported modes are ‘mean’, ‘median’, ‘p10’, ‘p90’, ‘min’, ‘max’, ‘std’, ‘var’, and ‘pXX’ where XX is a number

  • keylist – list of strings, indicating which keys in the internal datastore to include. If the list is empty (default), an attempt will be made to include all data.

  • excludekeys – list of strings that should be excluded if keylist is empty, otherwise ignored

Returns:

VirtualRealization. Its name will include the aggregation operator

WARNING: This code is duplicated in virtualensemble.py
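Usage sketch; the returned objects are VirtualRealizations:

>>> mean_real = ens.agg('mean')
>>> p15_real = ens.agg('p15')  # any 'pXX' quantile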

property files

Return a concatenation of the files dataframes in each realization

property name

The ensemble name.

get_realindices()[source]

Return the integer indices for realizations in this ensemble

Returns:

list of integers

get_smry(time_index=None, column_keys=None, cache_eclsum=None, start_date=None, end_date=None, include_restart=True)[source]

Aggregates summary data from all realizations.

Wraps around Realization.get_smry() which wraps around resdata.summary.Summary.pandas_frame()

Parameters:
  • time_index – list of DateTime if interpolation is wanted. Default is None, which returns the raw Eclipse report times. If a string with an ISO-8601 date is supplied, that date is used directly; otherwise the string is assumed to indicate a wanted frequency for dates (daily, weekly, monthly, yearly) that will be sent to get_smry_dates()

  • column_keys – list of column key wildcards

  • cache_eclsum – boolean for whether to cache the EclSum objects. Defaults to True. Set to False if not enough memory to keep all summary files in memory.

  • start_date – str or date with first date to include. Dates prior to this date will be dropped, supplied start_date will always be included. Overridden if time_index is ‘first’ or ‘last’.

  • end_date – str or date with last date to be included. Dates past this date will be dropped, supplied end_date will always be included. Overridden if time_index is ‘first’ or ‘last’.

  • include_restart – boolean sent to resdata for whether restart files should be traversed.

Returns:

A DataFrame of summary vectors for the ensemble. The column REAL with integers is added to distinguish realizations. If there are no realizations, an empty DataFrame is returned.

get_eclgrid(props, report=0, agg='mean', active_only=False)[source]

Returns the grid (i,j,k) and (x,y), and any requested init and/or unrst property. The values are aggregated over the ensemble (mean/std currently supported).

Parameters:
  • props – list of column key wildcards

  • report – int. For unrst props only. Report step for a given date. Use the function get_unrst_report_dates to get an overview of the available report steps.

  • agg – String. “mean” or “std”.

  • active_only – bool. True to use active cells only.

Returns:

A dictionary, indexed by grid attribute, where each attribute maps to a list with one value per grid cell.

property global_active

A ResdataKW with, for each cell, the number of realizations where the cell is active.


property global_size

Global size of the realizations in the ensemble. See fmu_postprocessing.modelling.Realization.global_size().


property init_keys

Return all keys available in the Eclipse INIT file

property unrst_keys

Return keys available in the Eclipse UNRST file

get_unrst_report_dates()[source]

Returns UNRST report step and the corresponding date

get_init(prop, agg)[source]
Parameters:

prop – A time-independent property.

Returns:

Dictionary with mean or std_dev as keys, and corresponding values for given property as values.

Raises:

ValueError – If prop is not found.

get_unrst(prop, report, agg)[source]
Parameters:

prop – A time dependent property, see fmu_postprocessing.modelling.SimulationGrid.TIME_DEPENDENT.

Returns:

Dictionary with mean and std_dev as keys, and corresponding values for given property as values.

Raises:

ValueError – If prop is not in TIME_DEPENDENT.

fmu.ensemble.ensemblecombination module

Module for handling linear combinations of ensembles

class fmu.ensemble.ensemblecombination.EnsembleCombination(ref, scale=None, add=None, sub=None)[source]

Bases: object

The class is used to perform linear operations on ensembles.

When instantiated, the linear combination will not actually be computed until the results are asked for (lazy evaluation).
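A minimal construction sketch, assuming ens_a and ens_b are two compatible ensembles (the names are hypothetical); ensemble arithmetic such as subtraction is the typical way such combinations arise:

>>> from fmu.ensemble.ensemblecombination import EnsembleCombination
>>> delta = EnsembleCombination(ens_a, sub=ens_b)  # lazy, nothing computed yet
>>> diff = delta.get_smry(column_keys='FOPT', time_index='monthly')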

keys()[source]

Return the intersection of the keys available in the reference ensemble (or combination) and in the other operand.

get_df(localpath, merge=None)[source]

Obtain given data from the ensemble combination, doing the actual computation of the ensembles on the fly.

Warning: In order to add dataframes together meaningfully, using pandas.add, the index of the frames must be correctly set, and this can be tricky for some datatypes (e.g. a volumetrics table where you want to add together volumes for the correct zone and fault segment).

If you have the columns “REAL”, “DATE”, “ZONE” and/or “REGION”, it will be regarded as an index column.

Parameters:
  • localpath (str) – refers to the internalized name of the data wanted in each ensemble.

  • merge (list or str) – Optional data to be merged in. The merge will happen as deep as possible (in realization objects in the case of ScratchEnsembles), and all ensemble combination computations happen after merging. Be careful with index guessing and merged data.

to_virtual(keyfilter=None)[source]

Evaluate the current linear combination and return as a virtual ensemble.

Parameters:

keyfilter (list or str) – If supplied, only keys matching wildcards in this argument will be included. Use this for speed reasons when only some data is needed. Default is to include everything. If you supply “unsmry”, it will match every key that includes this string by prepending and appending ‘*’ to your pattern

Returns:

VirtualEnsemble

get_smry_dates(freq='monthly', normalize=True, start_date=None, end_date=None)[source]

Create a union of dates available in the involved ensembles

get_smry(column_keys=None, time_index=None)[source]

Loads the Eclipse summary data directly from the underlying ensemble data. The ensembles can be ScratchEnsemble or VirtualEnsemble; if scratch, binary summary files are accessed directly, and if virtual, summary data must have been loaded earlier.

Parameters:
  • column_keys (str or list) – column key wildcards. Default is ‘*’, which will match all vectors in the Eclipse output.

  • time_index (str or list of DateTime) – time_index mnemonic or a list of explicit datetime at which the summary data is requested (interpolated or extrapolated)

Returns:

pd.DataFrame. Indexed by rows, has at least the columns REAL and DATE if not empty.

get_smry_stats(column_keys=None, time_index='monthly')[source]

Function to extract the ensemble statistics (Mean, Min, Max, P10, P90) for a set of simulation summary vectors (column key).

Compared to the agg() function, this function only works on summary data (time series), and will only operate on actually requested data, independent of what is internalized. It accesses the summary files directly and can thus obtain data at any time frequency.

Parameters:
  • column_keys – list of column key wildcards

  • time_index – list of DateTime if interpolation is wanted. Default is None, which returns the raw Eclipse report times. If a string is supplied, it is passed to get_smry_dates() in order to obtain a time index.

Returns:

A MultiIndex dataframe. Outer index is ‘minimum’, ‘maximum’, ‘mean’, ‘p10’, ‘p90’, inner index are the dates. Column names are the different vectors. Quantiles follow the scientific standard, opposite to the oil industry convention.

TODO: add warning message when failed realizations are removed

get_smry_meta(column_keys=None)[source]

Provide metadata for summary data vectors.

A dictionary indexed by summary vector names is returned, and each value is another dictionary with potentially the following metadata types:
  • unit (string)
  • is_total (bool)
  • is_rate (bool)
  • is_historical (bool)
  • get_num (int) (only provided if not None)
  • keyword (str)
  • wgname (str or None)

Parameters:

column_keys – List or str of column key wildcards

agg(aggregation, keylist=None, excludekeys=None)[source]

Aggregator, this is a wrapper that will call .to_virtual() on your behalf and call the corresponding agg() in VirtualEnsemble.

get_volumetric_rates(column_keys=None, time_index='monthly', time_unit=None)[source]

Compute volumetric rates from cumulative summary vectors.

Column names that are not referring to cumulative summary vectors are silently ignored.

A dataframe is returned with volumetric rates, that is, rate values that can be summed up to the cumulative version. The ‘T’ in the column name is switched with ‘R’. If you ask for FOPT, you will get FOPR in the returned dataframe.

Rates in the returned dataframe are valid forwards in time, as opposed to rates coming directly from the Eclipse simulator, which are valid backwards in time.

If time_unit is set, the rates will be scaled to represent either daily, monthly or yearly rates. These will sum up to the cumulative as long as you multiply with the correct number of days, months or years between each consecutive date index. Month lengths and leap years are correctly handled.

Parameters:
  • column_keys – str or list of strings, cumulative summary vectors

  • time_index – str or list of datetimes

  • time_unit – str or None. If None, the rates returned will be the difference in cumulative between each included time step (where the time interval can vary arbitrarily) If set to ‘days’, ‘months’ or ‘years’, the rates will be scaled to represent a daily, monthly or yearly rate that is compatible with the date index and the cumulative data.

property parameters

Return parameters from the ensemble as a class property

get_realindices()[source]

Return the integer indices for realizations in this ensemble

There is no guarantee that all realizations returned here will be valid for all datatypes after computation.

Returns:

list of integers

fmu.ensemble.ensembleset module

Module for book-keeping and aggregation of ensembles

class fmu.ensemble.ensembleset.EnsembleSet(name=None, ensembles=None, frompath=None, runpathfile=None, realidxregexp=None, iterregexp=None, batchregexp=None, autodiscovery=True, batch=None)[source]

Bases: object

An ensemble set is any collection of ensemble objects

Ensemble objects are ScratchEnsembles or VirtualEnsembles.

There is support for initializing from a file structure with both iterations and batches, but the concept of iterations and batches is not kept in an EnsembleSet; there, each ensemble is uniquely identified by its ensemble name. To keep the iteration (and batch) concept, it must be embedded into the ensemble name.

The init method will make an ensemble set, either as empty, or from a list of already initialized ensembles, or directly from the filesystem, or from an ERT runpath file. Only one of these initialization modes can be used.

Parameters:
  • name – Chosen name for the ensemble set. Can be used if aggregated at a higher level.

  • ensembles – list of Ensemble objects. Can be omitted.

  • frompath – string or list of strings with filesystem paths. Will be globbed by default. If no realizations or iterations are detected after globbing, the standard glob ‘realization-*/iter-*’ will be used.

  • runpathfile – string with path to an ert runpath file which will be used to lookup realizations and iterations.

  • realidxregexp – regular expression object that will be used to determine the realization index (must be integer) from a path component (split by /). The default fits realization-*

  • iterregexp – similar to realidxregexp, and result will always be treated as a string.

  • batchregexp – similar to iterregexp, for future support of an extra level similar to iterations

  • autodiscovery – boolean, sent to initializing Realization objects, instructing them on whether certain files should be auto-discovered.

  • batch (dict) – List of functions (load_*) that should be run at time of initialization for each realization. Each element is a length 1 dictionary with the function name to run as the key, and each key's value should be the function arguments as a dict.
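Initialization sketch; the path is hypothetical, and the resulting ensemble names depend on the iterations found on disk:

>>> from fmu.ensemble.ensembleset import EnsembleSet
>>> ensset = EnsembleSet('mycase',
...     frompath='/scratch/fmu/foobert/r089/casename')
>>> ensset.ensemblenames  # e.g. ['iter-0', 'iter-1']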

property name

Return the name of the ensembleset, as initialized

property ensemblenames

Return a list of named ensembles in this set

keys()[source]

Return the union of all keys available in the ensembles.

Keys refer to the realization datastore, a dictionary of dataframes or dicts.

add_ensembles_frompath(paths, realidxregexp=None, iterregexp=None, batchregexp=None, autodiscovery=True, batch=None)[source]

Convenience function for adding multiple ensembles.

Parameters:
  • paths – str or list of strings with path to the directory containing the realization-*/iter-* structure

  • realidxregexp – Supply a regexp that can extract the realization index as an integer from path components. The expression will be tested on individual path components from right to left.

  • iterregexp – Similar to realidxregexp, but is allowed to match strings.

  • batchregexp – Similar to realidxregexp, but is allowed to match strings.

  • autodiscovery – boolean, sent to initializing Realization objects, instructing them on whether certain files should be auto-discovered.

  • batch (dict) – List of functions (load_*) that should be run at time of initialization for each realization. Each element is a length 1 dictionary with the function name to run as the key, and each key's value should be the function arguments as a dict.

add_ensembles_fromrunpath(runpathfile, batch=None)[source]

Add one or many ensembles from an ERT runpath file.

autodiscovery is not an argument; it is set to False for runpath files, since the location of the UNSMRY file is given in the runpath file.

add_ensemble(ensembleobject)[source]

Add a single ensemble to the ensemble set

Name is taken from the ensembleobject.

property parameters

Build a dataframe of the information in each realization's parameters.txt.

If no realizations have the file, an empty dataframe is returned.

Returns:

pd.DataFrame

load_scalar(localpath, convert_numeric=False, force_reread=False)[source]

Parse a single value from a file

The value can be a string or a number. Empty files are treated as existing, with an empty string as the value, different from non-existing files.

Parsing is performed individually in each ensemble and realization

load_txt(localpath, convert_numeric=True, force_reread=False)[source]

Parse and internalize a txt-file from disk

Parses text files on the form <key> <value> in each line.

load_csv(localpath, convert_numeric=True, force_reread=False)[source]

Parse and internalize a CSV file from disk

load_file(localpath, fformat, convert_numeric=True, force_reread=False)[source]

Internal function for load_*()

get_df(localpath, merge=None)[source]

Collect contents of dataframes from each ensemble

Parameters:
  • localpath (str) – path to the text file, relative to each realization

  • merge (list or str) – refer to additional localpath(s) which will be merged into the dataframe for every ensemble/realization. Merging happens before aggregation.

drop(localpath, **kwargs)[source]

Delete elements from internalized data.

Shortcuts are allowed for localpath. If the data pointed to is a DataFrame, you can delete columns, or rows containing certain elements

If the data pointed to is a dictionary, keys can be deleted.

Parameters:
  • localpath – string, path to internalized data. If no other options are supplied, that dataset is deleted in its entirety

  • column – string with a column name to drop. Only for dataframes

  • columns – list of strings with column names to delete

  • rowcontains – rows where one column contains this string will be dropped. The comparison is on strings only, and all cells in the dataframe are converted to strings for the comparison. Thus it might work on dates, but be careful with numbers.

  • key – string with a keyname in a dictionary. Will not work for dataframes

  • keys – list of strings of keys to delete from a dictionary

remove_data(localpaths)[source]

Remove certain datatypes from each ensemble's/realization's datastore. This modifies the underlying realization objects, and is equivalent to

>>> del realization[localpath]

on each realization in each ensemble.

Parameters:

localpaths (string) – Full localpath to the data, or list of strings.

process_batch(batch=None)[source]

Process a list of functions to run/apply

This is equivalent to calling each function individually, but this enables more efficient concurrency. It is meant to be used for functions that modify the realization object, not for functions that already return a dataframe.

Parameters:

batch (list) – Each list element is a dictionary with one key, being the name of a function to run, and the value for that key is a dict with the keyword arguments to be supplied to that function.

apply(callback, **kwargs)[source]

Callback functionality: apply a function to every realization

The supplied function handle will be handed over to each underlying ScratchEnsemble object, which in turn will hand it over to its realization objects. The function supplied must return a Pandas DataFrame. The function can obtain the realization object in the kwargs dictionary through the key ‘realization’.

Any VirtualEnsembles are ignored. Operations on dataframes in VirtualEnsembles can be done using the apply() functionality in pd.DataFrame

Parameters:
  • callback – function handle

  • kwargs – dictionary where ‘realization’ and ‘localpath’ are reserved, will be forwarded to the callback function

  • localpath – str, optional if the data is to be internalized in each realization object.

Returns:

pd.DataFrame, aggregated result of the supplied function on each realization.

shortcut2path(shortpath)[source]

Convert short pathnames to fully qualified pathnames within the datastore.

If the fully qualified localpath is

‘share/results/volumes/simulator_volume_fipnum.csv’

then you can also access this with these alternatives:
  • simulator_volume_fipnum

  • simulator_volume_fipnum.csv

  • share/results/volumes/simulator_volume_fipnum

but only as long as there is no ambiguity. In case of ambiguity, the shortpath will be returned.

CODE DUPLICATION from realization.py

get_csv_deprecated(filename)[source]

Load CSV data from each realization in each ensemble, and aggregate.

Parameters:

filename – string, filename local to realization

Returns:

Merged CSV from each realization.

Realizations with missing data are ignored. Empty dataframe if no data is found.

Return type:

dataframe

load_smry(time_index='raw', column_keys=None, cache_eclsum=None, start_date=None, end_date=None)[source]

Fetch summary data from all ensembles

Wraps around Ensemble.load_smry() which wraps Realization.load_smry(), which wraps resdata.summary.Summary.pandas_frame()

The time index is determined at realization level. If you ask for ‘monthly’, you will from each realization get its months. At ensemble or ensembleset level, the number of monthly report dates can vary between realizations.

The per-realization results will be cached by each realization object, and can be retrieved through get_df().

Parameters:
  • time_index – list of DateTime if interpolation is wanted. Default is ‘raw’, which returns the raw Eclipse report times. If a string is supplied, it is passed to get_smry_dates() in order to obtain a time index.

  • column_keys – list of column key wildcards

  • cache_eclsum – Boolean for whether we should cache the Summary objects. Set to False if you cannot keep all Summary files in memory simultaneously

  • start_date – str or date with first date to include. Dates prior to this date will be dropped, supplied start_date will always be included. Overridden if time_index is ‘first’ or ‘last’.

  • end_date – str or date with last date to be included. Dates past this date will be dropped, supplied end_date will always be included. Overridden if time_index is ‘first’ or ‘last’.

Returns:

A DataFrame of summary vectors for the ensembleset. The column ‘ENSEMBLE’ will denote each ensemble's name.

get_smry(time_index=None, column_keys=None, cache_eclsum=None, start_date=None, end_date=None)[source]

Aggregates summary data from all ensembles

Wraps around Ensemble.get_smry(), which wraps around Realization.get_smry() which wraps around resdata.summary.Summary.pandas_frame()

Parameters:
  • time_index – list of DateTime if interpolation is wanted. Default is None, which returns the raw Eclipse report times. If a string is supplied, it is passed to get_smry_dates() in order to obtain a time index.

  • column_keys – list of column key wildcards

  • cache_eclsum – boolean for whether to cache the Summary objects. Defaults to False. Set to True if there is enough memory to keep all realizations summary files in memory at once. This will speed up subsequent operations

  • start_date – str or date with first date to include. Dates prior to this date will be dropped, supplied start_date will always be included. Overridden if time_index is ‘first’ or ‘last’.

  • end_date – str or date with last date to be included. Dates past this date will be dropped, supplied end_date will always be included. Overridden if time_index is ‘first’ or ‘last’.

Returns:

A DataFrame of summary vectors for the EnsembleSet. The column ENSEMBLE will distinguish the different ensembles by their respective names.

get_smry_dates(freq='monthly', cache_eclsum=None, start_date=None, end_date=None)[source]

Return list of datetimes from an ensembleset

Datetimes from each realization in each ensemble can be returned raw, or be resampled.

Parameters:
  • freq (str) – String denoting the requested frequency for the returned list of datetimes. ‘report’ will yield the sorted union of all valid timesteps for all realizations. Other valid options are ‘daily’, ‘monthly’ and ‘yearly’.

  • cache_eclsum (boolean) – Whether we should cache the Summary objects. Set to False if you cannot keep all Summary files in memory simultaneously.

  • start_date (str or date) – First date to include. Dates prior to this date will be dropped, supplied start_date will always be included. Overridden if time_index is ‘first’ or ‘last’.

  • end_date (str or date) – Last date to be included. Dates past this date will be dropped, supplied end_date will always be included. Overridden if time_index is ‘first’ or ‘last’.

Returns:

list of datetime.date.

get_wellnames(well_match=None)[source]

Return the union of all Eclipse summary well names over all ensembles' realizations.

Optionally, the well names can be filtered.

Parameters:

well_match – Optional. String (or list of strings) with wildcard filter (globbing). If None, all wells are returned. An empty string will not match anything.

Returns:

list of strings with eclipse well names. Empty list if no summary file or no matched well names.

fmu.ensemble.etc module

This module is deprecated and will be removed in fmu-ensemble v2.0.0

class fmu.ensemble.etc.Interaction[source]

Bases: object

System for handling interaction; dialogues and messages in FMU.

This module cooperates with the standard Python logging module.

property logginglevel

Set or return a logging level property, e.g. logging.CRITICAL

property numericallogginglevel

Return a numerical logging level (read only)

property loggingformatlevel

Set logging format (for future use)

property loggingformat

Returns the format string to be used in logging

property tmpdir

Get and set tmpdir for testing

static print_fmu_header(appname, appversion, info=None)[source]

Prints a banner for an FMU app to STDOUT.

Parameters:
  • appname (str) – Name of application.

  • appversion (str) – Version of application on the form ‘3.2.1’

  • info (str, optional) – More info, e.g. if beta release

Example:

fmux.print_fmu_header('fmu.ensemble', '0.2.1', info='Beta release!')

basiclogger(name, level=None)[source]

Initiate the logger by some default settings.

static functionlogger(name)[source]

Get the logger for functions (not top level).

testsetup(path='TMP')[source]

Basic setup for FMU testing (developer only; relevant for tests)

static timer(*args)[source]

Without args, return the time; with a time as arg, return the difference.

Example:

time1 = timer()
for i in range(10000):
    i = i + 1
time2 = timer(time1)
print('Execution took {} seconds'.format(time2))

echo(string)[source]

Show info at runtime (for user scripts)

warn(string)[source]

Show warnings at Runtime (pure user info/warns).

warning(string)

Show warnings at Runtime (pure user info/warns).

error(string)[source]

Issue an error, will not exit system by default

critical(string, sysexit=True)[source]

Issue a critical error, default is SystemExit.

get_callerinfo(caller, frame)[source]

Get caller info for logging (developer stuff)

fmu.ensemble.observations module

Observations support and related calculations

class fmu.ensemble.observations.Observations(observations)[source]

Bases: object

Represents a set of observations and the ability to compare realizations and ensembles to the observations

The primary data structure is a dictionary holding actual observations; this can typically be loaded from a YAML file.

Key functionality is the ability to compute the mismatch per observation and present the computed data as a Pandas DataFrame. If run on ensembles, every row will be tagged by which realization index the data was computed for.

An observation unit is a concept for the observation and points to something we define as a “single” observation. It can be one value for one datatype at a specific date, but in the case of an Eclipse summary vector, it can also be a time series. Mismatches will be computed per observation unit.

Pay attention to mismatch versus misfit. Here, mismatch is used for individual observation units, while misfit is used as a single number for whole realizations.

Important: Using time-series as observations is not recommended in assisted history match. Pick individual uncorrelated data points at relevant points in time instead.

The type of observations supported must follow the datatypes that the realizations and ensemble objects are able to internalize.

mismatch(ens_or_real)[source]

Compute the mismatch from the current observation set to the incoming ensemble or realization.

In the case of an ensemble, it will calculate individually for every realization, and aggregate the results.

Returns:

dataframe with REAL (only if ensemble), OBSKEY, DATE, L1, L2. One row for every observation unit.
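Usage sketch, assuming the constructor accepts a YAML filename as indicated above (the filename is hypothetical):

>>> from fmu.ensemble.observations import Observations
>>> obs = Observations('observations.yml')
>>> mis = obs.mismatch(ens)
>>> mis.groupby('OBSKEY')['L2'].sum()  # L2 mismatch summed per observation key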

load_smry(realization, smryvector, time_index='yearly', smryerror=None)[source]

Add an observation unit from a VirtualRealization or ScratchRealization, being a specific summary vector, picking values with the specified time resolution.

This can be used to compare similarity between realizations, by viewing simulated results as “observations”. A use case is to rank all realizations in an ensemble by their similarity to a certain mean profile, e.g. FOPT.

The result of the function is an observation unit added to the smry observations, with values at every date.

Parameters:
  • realization – ScratchRealization or VirtualRealization containing data for constructing the virtual observation

  • smryvector – string with a name of a specific summary vector to be used

  • time_index – string with time resolution, typically ‘yearly’ or ‘monthly’.

  • smryerror – float, constant value to be used as the measurement error for every date.

property empty

Decide if the observation set is empty

An empty observation set has zero observation units.

keys()[source]

Return a list of observation units present.

This list might change into a dataframe in the future, but calling len() on its results should always return the number of observation units.

to_ert2observations()[source]

Convert the observation set to an observation file for use with Ert 2.x.

Returns: multiline string

to_yaml()[source]

Convert the current observations to YAML format

Returns:

Multiline YAML string.

Return type:

string

to_disk(filename)[source]

Write the current observation object to disk

In YAML-format. If a new observation object is instantiated from the outputted filename, it should yield identical results in mismatch calculation.

Directory structure will be created if not existing. Existing file will be overwritten.

Parameters:

filename (str) – path and filename to be written to

fmu.ensemble.realization module

Module for the ScratchRealization class

A realization is a set of results from one subsurface model realization. A realization can be either defined from its output files from the FMU run on the file system, it can be computed from other realizations, or it can be an archived realization.

class fmu.ensemble.realization.ScratchRealization(path, realidxregexp=None, index=None, autodiscovery=True, batch=None)[source]

Bases: object

A representation of results still present on disk

ScratchRealizations point to the filesystem for their contents.

A realization must at least contain a STATUS file. Additionally, jobs.json and parameters.txt will be attempted loaded by default.

The realization is defined by the pointers to the filesystem. When asked for, this object will return data from the filesystem (or from cache if already computed).

The files dataframe is the central filesystem pointer repository for the object. It will at least contain the columns:
  • FULLPATH – absolute path to a file
  • FILETYPE – filename extension (after last dot)
  • LOCALPATH – relative filename inside the realization directory
  • BASENAME – filename only. No path. Includes extension.

This dataframe is available as a read-only property from the object

Parameters:
  • path (str) – absolute or relative path to a directory containing a realizations files.

  • realidxregexp (re/str) – a compiled regular expression which is used to determine the realization index (integer) from the path. First match is the index. Default: realization-(\d+). Only needs to match path components. If a string is supplied, it will be attempted compiled into a regular expression.

  • index (int) – the realization index to be used, will override anything else.

  • autodiscovery (boolean) – whether the realization should try to auto-discover certain data (UNSMRY files in standard location)

  • batch (dict) – List of functions (load_*) that should be run at time of initialization. Each element is a length 1 dictionary with the function name to run as the key and each keys value should be the function arguments as a dict.
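Initialization sketch; the path and the parameter key are hypothetical:

>>> from fmu.ensemble.realization import ScratchRealization
>>> real = ScratchRealization(
...     '/scratch/fmu/foobert/r089/casename/realization-3/iter-0')
>>> real.parameters['RMS_SEED']  # parameters.txt is loaded by default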

process_batch(batch)[source]

Process a list of functions to run/apply

This is equivalent to calling each function individually, but this enables more efficient concurrency. It is meant to be used for functions that modify the realization object, not for functions that already return a dataframe.

Parameters:

batch (list) – Each list element is a dictionary with one key, being the name of a function to run, and the value for that key is a dict with the keyword arguments to be supplied to that function.

Returns:

This realization object (self), so that it can be picked up by ProcessPoolExecutor and pickled.

Return type:

ScratchRealization

runpath()[source]

Return the runpath (“root”) of the realization

Returns:

the filesystem path which at least existed at the time of object initialization.

Return type:

str

to_virtual(name=None, deepcopy=True)[source]

Convert the current ScratchRealization object to a VirtualRealization

Parameters:
  • name (str) – used as label

  • deepcopy (boolean) – Set to true if you want to continue to manipulate the ScratchRealization object afterwards without affecting the virtual realization. Defaults to True. False will give faster execution.

load_file(localpath, fformat, convert_numeric=True, force_reread=False)[source]

Parse and internalize files from disk.

Several file formats are supported:
  • txt – one key-value pair per line
  • csv
  • scalar – one number or one string in the first line

load_scalar(localpath, convert_numeric=False, force_reread=False, comment=None, skip_blank_lines=True, skipinitialspace=True)[source]

Parse a single value from a file.

The value can be a string or a number.

Empty files are treated as existing, with an empty string as the value, different from non-existing files.

pandas.read_table() is used to parse the contents; the args ‘comment’, ‘skip_blank_lines’, and ‘skipinitialspace’ are passed on to that function.

Parameters:
  • localpath – path to the file, local to the realization

  • convert_numeric – If True, non-numerical content will be thrown away

  • force_reread – Reread the data from disk.

Returns:

the value read from the file.

Return type:

str/number

load_txt(localpath, convert_numeric=True, force_reread=False)[source]

Parse a txt file with <key> <value> in each line.

The txt file will be internalized in a dict and will be stored if the object is archived. Recommended file extension is ‘txt’.

Common usage is internalization of parameters.txt which happens by default, but this can be used for all txt files.

The parsed data is returned as a dict. At the ensemble level the same function returns a dataframe.

There is no getter for the constructed data; access the class variable keyvaluedata directly, or rerun this function. (An exception is parameters.txt, for which there is a property called ‘parameters’.)

Values with spaces are not supported; this is similar to ERT’s CSV_EXPORT1. The remainder of the string will be silently ignored.

Parameters:
  • localpath – path, local to the realization, of the txt file

  • convert_numeric – defaults to True; will try to parse all values as integers, then floats, with strings as the last resort.

  • force_reread – Force reread from the file system. If False, repeated calls to this function will return cached results.

Returns:

Dictionary with the parsed values. Values will be returned as integers, floats or strings. If convert_numeric is False, all values are strings.

Return type:

dict
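
Example (a sketch; the filename and its keys are hypothetical):

>>> real.load_txt("outputs.txt")
{'SEGMENT': 'north', 'FAULTBLOCKS': 4}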

load_csv(localpath, convert_numeric=True, force_reread=False)[source]

Parse a CSV file as a DataFrame

Data will be stored as a DataFrame for later access or storage.

Filename is relative to realization root.

Parameters:
  • localpath – path, local to the realization, of the CSV file

  • convert_numeric – defaults to True; will try to parse all values as integers, then floats, with strings as the last resort.

  • force_reread – Force reread from the file system. If False, repeated calls to this function will return cached results.

Returns:

The CSV file loaded. An empty dataframe if the file is not present.

Return type:

dataframe

load_status()[source]

Collects the contents of the STATUS files and returns them as a dataframe, with information from jobs.json added if available.

Each row in the dataframe is a finished FORWARD_MODEL. The STATUS files are parsed and information is extracted. Job duration is calculated, but jobs lasting more than 24 hours get incorrect durations.

Returns:

A dataframe with information from the STATUS files. Each row represents one job in one of the realizations.

apply(callback, **kwargs)[source]

Callback functionality

A function handle can be supplied which will be executed on this realization. The function supplied must return a Pandas DataFrame. The function can accept an additional kwargs dictionary with extra information. A special key in the kwargs data is ‘realization’, which will hold the current realization object. The key ‘localpath’ is also reserved for use inside apply(), as it is used as the name for the internalized data.

If the key ‘dumptofile’ is a boolean and set to True, the resulting dataframe is also attempted written to disk using the supplied ‘localpath’.

Parameters:

**kwargs (dict) – supplied to the callback function, in which the key ‘localpath’ also points to the name used for data internalization.
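
Example (a minimal sketch, assuming the callback receives the kwargs as a single dict as described above; the localpath and column name are hypothetical):

>>> import pandas as pd
>>> def cellcount(kwargs):
...     real = kwargs["realization"]  # injected by apply()
...     return pd.DataFrame([{"CELLS": real.global_size}])
>>> df = real.apply(cellcount, localpath="share/results/tables/cellcount.csv")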

keys()[source]

Access the keys of the internal data structure

get_df(localpath, merge=None)[source]

Access the internal datastore which contains dataframes or dicts or scalars.

The localpath argument can be shortened, as it will be looked up using the function shortcut2path()

Parameters:
  • localpath (str) – the identifier of the data requested

  • merge (list or str) – identifier/localpath of some data to be merged in, typically ‘parameters.txt’. Will only work when the return type is a dataframe. If a list is supplied, order can matter.

Returns:

dataframe or dictionary.

Raises:
  • KeyError if data is not found.

  • TypeError if data in localpath or merge is not of a mergeable type
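
Example (a sketch, assuming yearly summary data and parameters.txt have already been internalized):

>>> # ‘unsmry--yearly’ resolves to ‘share/results/tables/unsmry--yearly.csv’
>>> df = real.get_df("unsmry--yearly", merge="parameters.txt")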

find_files(paths, metadata=None, metayaml=False)[source]

Discover realization files. The files dataframe will be updated.

Certain functionality requires up-front file discovery, e.g. ensemble archiving and ensemble arithmetic.

CSV files for single use do not have to be discovered.

Filenames containing double dashes ‘--’ indicate that the double dashes separate meaningful components in the filename. The components are extracted and put into additional columns “COMP1”, “COMP2”, etc. The filetype extension (after the last dot) is removed from the last component.

Parameters:
  • paths – str or list of str with filenames (will be globbed) that are relative to the realization directory.

  • metadata – dict with metadata to assign for the discovered files. The keys will be columns, and its values will be assigned as column values for the discovered files. During rediscovery of files, old metadata will be removed.

  • metayaml – Additional possibility of adding metadata from associated yaml files. A yaml file is associated with a specific discovered file when its name is the discovered filename with .yml appended, optionally with a dot in front. The yaml file will be loaded as a dict, and have its keys flattened using the separator ‘--’. Flattened keys are then used as column headers in the returned dataframe.

Returns:

A slice of the internalized dataframe corresponding to the discovered files (will be included even if it has been discovered earlier)
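
Example (a sketch; the glob pattern and the metadata are hypothetical):

>>> discovered = real.find_files("share/results/maps/*.gri", metadata={"source": "rms"})

The returned dataframe slice will contain a ‘source’ column holding ‘rms’ for the discovered files.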

property parameters

Access the data obtained from parameters.txt

Returns:

dict with data from parameters.txt

get_eclfiles()[source]

get_eclfiles is deprecated as ecl2df has been renamed to res2df. Use the function get_resdatafiles together with res2df instead.

get_resdatafiles()[source]

Return a res2df.ResdataFiles object to connect to the res2df package.

If autodiscovery is enabled, it will search for a DATA file in the standard location eclipse/model/…DATA.

If you have multiple DATA files, you must discover the one you need explicitly before calling this function, example:

>>> real = ScratchRealization("myrealpath")
>>> real.find_files("eclipse/model/MYMODELPREDICTION.DATA")
Returns:

res2df.ResdataFiles. None if nothing found

get_eclsum(cache=True, include_restart=True)[source]

Fetch the Eclipse Summary file from the realization and return it as an EclSum object.

Unless the UNSMRY file has been discovered, it will pick the file from the glob eclipse/model/*UNSMRY, as long as autodiscovery is not turned off when the realization object was initialized.

If you have multiple UNSMRY files in eclipse/model, turning off autodiscovery is strongly recommended.

Parameters:
  • cache – boolean indicating whether we should keep an object reference to the EclSum object. Set to False if you need to conserve memory.

  • include_restart – boolean sent to resdata for whether restart files should be traversed.

Returns:

object representing the summary file. None if nothing was found.

Return type:

EclSum

load_smry(time_index='raw', column_keys=None, cache_eclsum=None, start_date=None, end_date=None, include_restart=True)[source]

Produce dataframe from Summary data from the realization

When this function is called, the dataframe will be internalized. Internalization of summary data in a realization object supports different time_index, but there is no handling of multiple sets of column_keys. The cached data will be called

‘share/results/tables/unsmry--<time_index>.csv’

where <time_index> is among ‘yearly’, ‘monthly’, ‘daily’, ‘first’, ‘last’ or ‘raw’ (meaning the raw dates in the SMRY file), depending on the chosen time_index. If a custom time_index (list of datetime) was supplied, <time_index> will be called ‘custom’.

Wraps resdata.summary.Summary.pandas_frame()

See also get_smry()

Parameters:
  • time_index – string indicating a resampling frequency, ‘yearly’, ‘monthly’, ‘daily’, ‘first’, ‘last’ or ‘raw’, the latter will return the simulated report steps (also default). If a list of DateTime is supplied, data will be resampled to these.

  • column_keys – list of column key wildcards. None means everything.

  • cache_eclsum – boolean for whether to keep the loaded EclSum object in memory after data has been loaded.

  • start_date – str or date with first date to include. Dates prior to this date will be dropped, supplied start_date will always be included. Overridden if time_index is ‘first’ or ‘last’.

  • end_date – str or date with last date to be included. Dates past this date will be dropped, supplied end_date will always be included. Overridden if time_index is ‘first’ or ‘last’.

  • include_restart – boolean sent to resdata for whether restart files should be traversed.

Returns:

DataFrame with summary keys as columns and dates as indices. Empty dataframe if no summary is available or the column keys do not exist.
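
Example (a sketch; the summary vectors are hypothetical):

>>> _ = real.load_smry(column_keys=["FOPT", "FOPR"], time_index="monthly")
>>> monthly = real.get_df("share/results/tables/unsmry--monthly.csv")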

get_smry(time_index=None, column_keys=None, cache_eclsum=None, start_date=None, end_date=None, include_restart=True)[source]

Wrapper for EclSum.pandas_frame

This gives access to the underlying data on disk without touching internalized dataframes.

Parameters:
  • time_index – string indicating a resampling frequency, ‘yearly’, ‘monthly’, ‘daily’, ‘first’, ‘last’ or ‘raw’, the latter will return the simulated report steps (also default). If a list of DateTime is supplied, data will be resampled to these. If a date in ISO-8601 format is supplied, that is used as a single date.

  • column_keys – list of column key wildcards. None means everything.

  • cache_eclsum – boolean for whether to keep the loaded EclSum object in memory after data has been loaded.

  • start_date – str or date with first date to include. Dates prior to this date will be dropped, supplied start_date will always be included. Overridden if time_index is ‘first’ or ‘last’.

  • end_date – str or date with last date to be included. Dates past this date will be dropped, supplied end_date will always be included. Overridden if time_index is ‘first’ or ‘last’.

Returns an empty dataframe if there is no summary file, or if the column_keys do not exist.

get_smry_meta(column_keys=None)[source]

Provide metadata for summary data vectors.

A dictionary indexed by summary vector names is returned, and each value is another dictionary with potentially the following metadata types:
  • unit (string)

  • is_total (bool)

  • is_rate (bool)

  • is_historical (bool)

  • get_num (int) (only provided if not None)

  • keyword (str)

  • wgname (str or None)

Parameters:

column_keys – List or str of column key wildcards

get_volumetric_rates(column_keys=None, time_index=None, time_unit=None)[source]

Compute volumetric rates from cumulative summary vectors

See fmu.ensemble.util.compute_volumetric_rates()

get_smryvalues(props_wildcard=None)[source]

Fetch selected vectors from Eclipse Summary data.

Parameters:

props_wildcard – string or list of strings with vector wildcards

Returns:

a dataframe with values. Raw times from UNSMRY. Empty dataframe if no summary file data available

get_smry_dates(freq='monthly', normalize=True, start_date=None, end_date=None, include_restart=True)[source]

Return list of datetimes available in the realization

Parameters:
  • freq – string denoting requested frequency for the returned list of datetime. ‘report’ will yield the sorted union of all valid timesteps for all realizations. Other valid options are ‘daily’, ‘weekly’, ‘monthly’ and ‘yearly’. ‘first’ will give out the first date (minimum) and ‘last’ will give out the last date (maximum), both as lists with one element.

  • normalize – Whether to normalize backwards at the start and forwards at the end to ensure the raw date range is covered.

  • start_date – str or date with the first date to include. Dates prior to this date will be dropped; the supplied start_date will always be included. Overrides normalized dates. Overridden if freq is ‘first’ or ‘last’.

  • end_date – str or date with last date to be included. Dates past this date will be dropped, supplied end_date will always be included. Overrides normalized dates. Overridden if freq is ‘first’ or ‘last’.

Returns:

list of datetimes. None if no summary data is available.

contains(localpath, **kwargs)[source]

Boolean function for asking the realization for presence of certain data types and possibly data values.

Parameters:
  • localpath – string pointing to the data for which the query applies. If no other arguments are supplied, only the presence of this data key is checked.

  • key – A certain key within a realization dictionary that is required to be present. If a value is also provided, this key must be equal to this value. If localpath is not a dictionary, this will raise a ValueError

  • value – The value a certain key must equal. Floating point comparisons are not robust. Only relevant for dictionaries

  • column – Name of a column in tabular data. If columncontains is not specified, this means that this column must be present

  • columncontains – A value that the specific column must include.

Returns:

True if the data is present and fulfilling any criteria.

Return type:

boolean
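
Example (a sketch; the keys and column names are hypothetical):

>>> real.contains("parameters.txt", key="RMS_SEED")
True
>>> real.contains("unsmry--yearly", column="FOPT")
True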

drop(localpath, **kwargs)[source]

Delete elements from internalized data.

Shortcuts are allowed for localpath. If the data pointed to is a DataFrame, you can delete columns, or rows containing certain elements

If the data pointed to is a dictionary, keys can be deleted.

Parameters:
  • localpath – string, path to internalized data. If no other options are supplied, that dataset is deleted in its entirety

  • column – string with a column name to drop. Only for dataframes

  • columns – list of strings with column names to delete

  • rowcontains – rows where one column contains this string will be dropped. The comparison is on strings only, and all cells in the dataframe are converted to strings for the comparison. Thus it might work on dates, but be careful with numbers.

  • key – string with a keyname in a dictionary. Will not work for dataframes

  • keys – list of strings of keys to delete from a dictionary
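
Example (a sketch; the localpaths, column and key names are hypothetical):

>>> real.drop("unsmry--monthly", column="FWCT")
>>> real.drop("parameters.txt", key="RMS_SEED")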

get_init()[source]
Returns:

init file of the realization.

get_unrst()[source]
Returns:

restart file of the realization.

get_grid_index(active_only)[source]

Return the grid index in a pandas dataframe.

get_grid_corners(grid_index)[source]

Return a dataframe with the x, y, z for the 8 grid corners of corner-point cells.

get_grid_centre(grid_index)[source]

Return the grid centre of corner-point-cells, x, y and z in distinct columns

get_grid()[source]
Returns:

grid file of the realization.

property global_size

Number of cells in the realization.

property actnum

EclKw of ints showing which cells are active. Active cells are given value 1, while inactive cells have value 0.

property report_dates

List of DateTime.DateTime for which values are reported.

get_global_init_keyword(prop)[source]
Parameters:

prop – A name of a keyword in the realization’s init file.

Returns:

The EclKw of the given name. Length is global_size; non-active cells are given value 0.

get_global_unrst_keyword(prop, report)[source]
Parameters:

prop – A name of a keyword in the realization’s restart file.

Returns:

The EclKw of the given name. Length is global_size; non-active cells are given value 0.

fmu.ensemble.realizationcombination module

Module for handling linear combinations of realizations.

class fmu.ensemble.realizationcombination.RealizationCombination(ref, scale=None, add=None, sub=None)[source]

Bases: object

The class is used to perform linear operations on realizations.

When instantiated, the linear combination will not actually be computed before the results are asked for (lazy evaluation).
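
Example (a minimal sketch, assuming two ScratchRealization objects real_a and real_b with the same internalized data):

>>> from fmu.ensemble.realizationcombination import RealizationCombination
>>> delta = RealizationCombination(ref=real_a, sub=real_b)
>>> diff = delta.get_df("unsmry--yearly")  # computation happens only here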

keys()[source]

Return the intersection of all keys available in the reference realization (or combination) and the other operand.

get_df(localpath, merge=None)[source]

Obtain given data from the realization combination, computing the realization data on the fly.

Warning: In order to add dataframes together with meaning, using pandas.add, the index of the frames must be correctly set, and this can be tricky for some datatypes (f.ex. volumetrics table where you want to add together volumes for correct zone and fault segment).

If you have the columns “DATE”, “ZONE” and/or “REGION”, it will be regarded as an index column.

Parameters:
  • localpath (str) – refers to the internalized name of the data wanted in the realizations.

  • merge (list or str) – Optional data to be merged in for the data. The merge will happen before the combination. Be careful with index guessing and merged data.

Returns:

pd.DataFrame, str, float, int or dict. None if the datatype is a string which we cannot combine.

Raises:
  • KeyError if data is not found. This can also happen for the requested data to merge in.

  • TypeError if scalar values are strings and they are multiplied with a scalar.

to_virtual(keyfilter=None)[source]

Evaluate the current linear combination and return as a VirtualRealization.

Parameters:

keyfilter (list or str) – If supplied, only keys matching wildcards in this argument will be included. Use this for speed reasons when only some data is needed. Default is to include everything. If you supply “unsmry”, it will match every key that includes this string by prepending and appending ‘*’ to your pattern

Returns:

VirtualRealization

get_smry_dates(freq='monthly', normalize=True, start_date=None, end_date=None)[source]

Create a union of dates available in the involved realizations.

get_smry(column_keys=None, time_index=None)[source]

Loads the Eclipse summary data directly from the underlying realization data.

Parameters:
  • column_keys (str or list) – column key wildcards. Default is ‘*’, which will match all vectors in the Eclipse output.

  • time_index (str or list of DateTime) – time_index mnemonic or a list of explicit datetime at which the summary data is requested (interpolated or extrapolated)

Returns:

Indexed rows, with at least the column DATE.

Return type:

pd.DataFrame

get_smry_meta(column_keys=None)[source]

Provide metadata for summary data vectors.

A dictionary indexed by summary vector names is returned, and each value is another dictionary with potentially the following metadata types:
  • unit (string)

  • is_total (bool)

  • is_rate (bool)

  • is_historical (bool)

  • get_num (int) (only provided if not None)

  • keyword (str)

  • wgname (str or None)

Parameters:

column_keys – List or str of column key wildcards

property parameters

Access the data obtained from parameters.txt

Returns:

dict with data from parameters.txt

fmu.ensemble.version module

fmu.ensemble.virtualensemble module

Module containing a VirtualEnsemble class

class fmu.ensemble.virtualensemble.VirtualEnsemble(name=None, data=None, longdescription=None, fromdisk=None, lazy_load=False, manifest=None)[source]

Bases: object

A computed or archived ensemble

Computed or archived, there is no link to the original dataset(s) that once was on a file system.

Contrary to a ScratchEnsemble, which contains realization objects with individual data, a VirtualEnsemble stores aggregated dataframes for its data. The column REAL will in all dataframes signify the realization index.

Initialization of VirtualEnsembles is typically done by other code, such as to_virtual() in a ScratchEnsemble.

Parameters:
  • name – string, can be chosen freely

  • data – dict with data to initialize with. Defaults to empty

  • longdescription – string, free form multiline description.

  • fromdisk – string with filesystem path, from which we will try to initialize the ensemble from files on disk.

  • lazy_load (boolean) – If True and the ensemble is initialized from disk, dataframes will be lazily loaded from disk.

  • manifest – dict with any information about the ensemble

get_realindices()[source]

Return the integer indices for realizations in this ensemble

Returns:

list of integers

update_realindices()[source]

Update the internal list of known realization indices

Anything that adds or removes realizations must take responsibility for keeping that list consistent.

If there is a dataframe missing the REAL column, this will intentionally raise an error.

keys()[source]

Return all keys in the internal datastore

The keys are also called localpaths, and resemble the filenames they would be written to if dumped to disk, as well as the filenames from which they were originally loaded in a ScratchEnsemble.

lazy_keys()[source]

Return keys that are not yet loaded, but will be loaded on demand

shortcut2path(shortpath, keys=None)[source]

Convert short pathnames to fully qualified pathnames within the datastore.

If the fully qualified localpath is

‘share/results/volumes/simulator_volume_fipnum.csv’

then you can also access this with these alternatives:
  • simulator_volume_fipnum

  • simulator_volume_fipnum.csv

  • share/results/volumes/simulator_volume_fipnum

but only as long as there is no ambiguity. In case of ambiguity, the shortpath will be returned.
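
Example (a sketch, assuming a VirtualEnsemble ‘vens’ containing the localpath used above):

>>> vens.shortcut2path("simulator_volume_fipnum")
'share/results/volumes/simulator_volume_fipnum.csv'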

get_realization(realindex)[source]

Return a virtual realization object, with data taken from the virtual ensemble. Each dataframe in the ensemble will be sliced by the REAL column.

Parameters:

realindex – integer for the realization.

Returns:

VirtualRealization, populated with data.

add_realization(realization, realidx=None, overwrite=False)[source]

Add a realization. A ScratchRealization will be effectively converted to a virtual realization.

A ScratchRealization knows its realization index, and that index will be used unless realidx is supplied. A VirtualRealization does not always have an index, in which case it must be supplied.

Unless overwrite is True, a ValueError will be raised if the realization index already exists.

Parameters:
  • overwrite – boolean whether an existing realization with the same index should be removed prior to adding

  • realidx – Override the realization index for incoming realization. Necessary for VirtualRealization.

remove_realizations(deleteindices)[source]

Remove realizations from internal data

This will remove all rows in all internalized data belonging to the set of supplied indices.

Parameters:

deleteindices – int or list of ints, realization indices to remove

remove_data(localpaths)[source]

Remove a certain datatype from the internal datastore

Parameters:

localpaths – string or list of strings, fully qualified localpath (no shorthand allowed)

agg(aggregation, keylist=None, excludekeys=None)[source]

Aggregate the ensemble data into a VirtualRealization

Aggregation is attempted for all data. String data will typically be dropped from the result.

Parameters:
  • aggregation – string, supported modes are ‘mean’, ‘median’, ‘p10’, ‘p90’, ‘min’, ‘max’, ‘std’, ‘var’ and ‘pXX’ where XX is a number.

  • keylist – list of strings, indicating which keys in the internal datastore to include. If the list is empty (the default), all data will be included if possible.

  • excludekeys – list of strings that should be excluded if keylist is empty, otherwise ignored.

Returns:

VirtualRealization. Its name will include the aggregation operator

WARNING: CODE DUPLICATION from ensemble.py
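
Example (a sketch, assuming a populated VirtualEnsemble ‘vens’):

>>> mean_real = vens.agg("mean")
>>> p90_real = vens.agg("p90")  # scientific p90
>>> params = mean_real.get_df("parameters.txt")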

append(key, dataframe, overwrite=False)[source]

Append a dataframe to the internal datastore

Incoming dataframe MUST have a column called ‘REAL’ which refers to the realization indices already known to the object.

Parameters:
  • key – name (localpath) for the data; this will be the name under which the dataframe is stored, for later retrieval via get_df().

  • dataframe – a Pandas DataFrame with a REAL column

  • overwrite – boolean, set to True if existing data is to be overwritten. Defaults to False, which will only issue a warning if the dataset already exists.

to_disk(filesystempath, delete=False, dumpcsv=True, dumpparquet=True, includefiles=False, symlinks=False)[source]

Dump all data to disk, in a retrievable manner.

Unless dumpcsv is set to False, all data is dumped to CSV files; even when dumpcsv is False, a CSV file is written for data that cannot be dumped as parquet.

Unless dumpparquet is set to False, all data is attempted dumped as Parquet files. If parquet dumping fails for some reason, a CSV file is always left behind.

dumpcsv and dumpparquet cannot be False at the same time.

Parameters:
  • filesystempath – string with a directory, absolute or relative. If it exists already it must be empty, or delete must be True.

  • delete – boolean for whether an existing directory will be cleared before data is dumped.

  • dumpcsv – boolean for whether CSV files should be written.

  • dumpparquet – boolean for whether parquet files should be written

  • includefiles (boolean) – If set to True, files in the files dataframe will be included in the disk-dump.

  • symlinks (boolean) – If includefiles is True, setting this to True means that only symlinking will take place, not full copy.

from_disk(filesystempath, fmt='parquet', lazy_load=False)[source]

Load data from disk.

Data must be written the way to_disk() would have written it. As long as you follow that convention, you can add data manually to the filesystem and load it into a VirtualEnsemble.

Any DataFrame not containing a column called ‘REAL’ with integers will be ignored.

Parameters:
  • filesystempath (string) – path to a directory that was written by VirtualEnsemble.to_disk().

  • fmt (string) – the preferred format to load, must be either ‘csv’ or ‘parquet’. If you say ‘csv’, parquet files will always be ignored. If you say ‘parquet’, corresponding CSV files will still be parsed; delete them if you really don’t want them.

  • lazy_load (bool) – If True, loading of dataframes from disk will be postponed until get_df() is actually called.
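
Example of a round trip (a sketch; the directory name is hypothetical):

>>> vens.to_disk("/tmp/virtens_dump", delete=True)
>>> from fmu.ensemble.virtualensemble import VirtualEnsemble
>>> restored = VirtualEnsemble(fromdisk="/tmp/virtens_dump", lazy_load=True)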

get_df(localpath, merge=None)[source]

Access the internal datastore which contains dataframes or dicts

The localpath argument can be shortened, as it will be looked up using the function shortcut2path()

Parameters:
  • localpath – the identifier of the data requested

  • merge (list or str) – refer to an additional localpath which will be merged into the dataframe for every realization

Returns:

dataframe or dictionary

Raises:

KeyError if no data is found

get_smry(column_keys=None, time_index='monthly')[source]

Function analogous to the EclSum direct getters in ScratchEnsemble, but here we have to resort to what we have internalized.

This will perform interpolation in each realization’s data to the requested time_index. This is done by creating VirtualRealization objects for all realizations, which can do the interpolation; the results are merged and returned. This creates some overhead, so if you do not need the interpolation, stick with get_df() instead.

get_smry_stats(column_keys=None, time_index='monthly', quantiles=None)[source]

Function to extract the ensemble statistics (Mean, Min, Max, P10, P90) for a set of simulation summary vectors (column key).

Compared to the agg() function, this function only works on summary data (time series), and will only operate on the actually requested data, independent of what is internalized.

In a virtual ensemble, this function can only provide data that has been internalized, and there is no resampling functionality yet.

Parameters:
  • column_keys – list of column key wildcards. Defaults to match all available columns

  • time_index – list of DateTime if interpolation is wanted. Default is None, which returns the raw Eclipse report times. If a string is supplied, that string is passed to get_smry_dates() in order to obtain a time index.

  • quantiles – list of ints between 0 and 100 for which quantiles to compute. Quantiles follow the scientific standard; for the oil industry P10 you should ask for p90.

Returns:

A MultiIndex dataframe. The outer index is ‘minimum’, ‘maximum’, ‘mean’, ‘p10’, ‘p90’; the inner index holds the dates. Column names are the different vectors. The column ‘p10’ represents the scientific p10, not the oil industry P10, for which you have to ask for p90.
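
Example (a sketch, assuming monthly summary data has been internalized):

>>> stats = vens.get_smry_stats(column_keys="FOPT", time_index="monthly")
>>> p10 = stats.loc["p10"]  # scientific p10, i.e. the oil industry P90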

get_volumetric_rates(column_keys=None, time_index='monthly', time_unit=None)[source]

Compute volumetric rates from internalized cumulative summary vectors

Column names that are not referring to cumulative summary vectors are silently ignored.

A DataFrame is returned with volumetric rates, i.e. rate values that can be summed up to the cumulative version. The ‘T’ in the column name is switched with ‘R’. If you ask for FOPT, you will get FOPR in the returned dataframe.

Rates in the returned dataframe are valid forwards in time, as opposed to rates coming directly from the Eclipse simulator, which are valid backwards in time.

If time_unit is set, the rates will be scaled to represent either daily, monthly or yearly rates. These will sum up to the cumulative as long as you multiply with the correct number of days, months or year between each consecutive date index. Month lengths and leap years are correctly handled.

Parameters:
  • column_keys – str or list of strings, cumulative summary vectors

  • time_index – str or list of datetimes

  • time_unit – str or None. If None, the rates returned will be the difference in cumulative values between each included time step (where the time interval can vary arbitrarily). If set to ‘days’, ‘months’ or ‘years’, the rates will be scaled to represent a daily, monthly or yearly rate that is compatible with the date index and the cumulative data.
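
Example (a sketch; the summary vector is hypothetical):

>>> rates = vens.get_volumetric_rates(
...     column_keys="FOPT", time_index="monthly", time_unit="days")

The returned dataframe will hold a FOPR column with daily rates that are compatible with the monthly date index.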

get_smry_meta(column_keys=None)[source]

Provide metadata for summary data vectors.

A dictionary indexed by summary vector names is returned, and each value is another dictionary with potentially the following metadata types:
  • unit (string)

  • is_total (bool)

  • is_rate (bool)

  • is_historical (bool)

  • get_num (int) (only provided if not None)

  • keyword (str)

  • wgname (str or None)

This data is produced from loaded summary dataframes upon ensemble virtualization.

Parameters:

column_keys (list or str) – Column key wildcards.

Returns:

dict of dict with metadata.

property files

Access the list of internalized files as they came from a ScratchEnsemble. Might be empty

Returns:

pd.DataFrame. Empty if no files are meaningful.

property manifest

Get the manifest of the ensemble. The manifest is nothing but a dictionary with unspecified content

Returns:

dict

property parameters

Quick access to parameters

property name

The name of the virtual ensemble as set during initialization

fmu.ensemble.virtualrealization module

Contains the VirtualRealization class

class fmu.ensemble.virtualrealization.VirtualRealization(description=None, data=None, longdescription=None)[source]

Bases: object

A computed or archived realization.

Computed or archived, one cannot assume to have access to the file system containing original data.

Data tables that in a ScratchRealization were available through the files dataframe are now available as dataframes in a dict, keyed by the localpath in the files dataframe from the ScratchRealization.

keys()[source]

Return the keys of all data in internal datastore

append(key, dataframe, overwrite=False)[source]

Append data to the datastore.

No checks are performed on the incoming dataframe. If the key exists, nothing will be appended unless overwrite is set to True.

to_disk(filesystempath, delete=False)[source]

Write the virtual realization to the filesystem.

All data will be dumped to the requested directory according to their localpaths (keys).

Parameters:
  • filesystempath – string with a directory, absolute or relative. If it exists already, it must be empty, otherwise we give up.

  • delete – boolean, if True, existing directory at the filesystempath will be deleted before export.

load_disk(filesystempath)[source]

Load data for a virtual realization from disk.

Existing data in the current object will be wiped; this function is intended for initialization.

WARNING: This code is really shaky. We need metafiles written by to_json() for robust parsing of files on disk; f.ex., are txt files really key-value data (dicts) or CSV files?

Currently, the file format is guessed based on the contents of the first two lines:
  • CSV files contain commas, and more than one line

  • key-value files contain two space-separated values, and at least one line

  • scalar files contain only one item and one line

Parameters:

filesystempath – path to a directory that to_disk() has written to (or a really careful user)

to_json()[source]

Dump realization data to json.

Resulting json string is compatible with the accompanying load_json() function

get_df(localpath, merge=None)[source]

Access the internal datastore which contains dataframes, dicts or scalars.

The localpath argument can be shortened, as it will be looked up using the function shortcut2path()

Parameters:
  • localpath (str) – the identifier of the data requested

  • merge (list or str) – refer to additional localpath(s) which will be merged into the dataframe.

Returns:

dataframe, dictionary, float, str or integer

Raises:

KeyError if data is not found.

get_volumetric_rates(column_keys=None, time_index=None, time_unit=None)[source]

Compute volumetric rates from cumulative summary vectors

See fmu.ensemble.util.compute_volumetric_rates()

get_smry(column_keys=None, time_index='monthly')[source]

Analog function to get_smry() in ScratchRealization

Accesses the internalized summary data and performs interpolation if needed.

Returns data for those columns that are known; a warning is issued for unknown columns.

BUG: If some columns are available only in certain dataframes, we might miss them (e.g. we ask for yearly FOPT, and we have yearly smry with only WOPT data, and FOPT is only in daily smry). Resolution is perhaps to merge all relevant data upfront.

Parameters:
  • column_keys – str or list of str with column names, may contain wildcards (glob-style). Default is to match every key that is known (contrary to behaviour in a ScratchRealization)

  • time_index – str or list of datetimes

get_smry_dates(freq='monthly', normalize=False)[source]

Return list of datetimes available in the realization

Similar to the function in ScratchRealization, but the start and end dates are taken from internalized smry dataframes.

Parameters:
  • freq – string denoting the requested frequency for the list of datetimes: ‘daily’, ‘weekly’, ‘monthly’ or ‘yearly’. ‘first’ will give out the first date (minimum) and ‘last’ will give out the last date (maximum), both as lists with one element.

  • normalize – Whether to normalize backwards at the start and forwards at the end to ensure the entire date range is covered.

Returns:

list of datetimes. Empty if no summary data is available.

get_smry_meta(column_keys=None)[source]

Provide metadata for summary data vectors.

A dictionary indexed by summary vector names is returned, and each value is another dictionary with potentially the following metadata types: * unit (string) * is_total (bool) * is_rate (bool) * is_historical (bool) * get_num (int) (only provided if not None) * keyword (str) * wgname (str or None)

Parameters:

column_keys (list or str) – Column key wildcards.

Returns:

dict of dict with metadata information

property parameters

Convenience getter for parameters.txt

property name

Return the name of the realization

fmu.ensemble.virtualrealization.smry_cumulative(column_keys)[source]

Determine whether smry vectors are cumulative

Returns list of booleans, indicating whether a certain column_key in summary dataframes corresponds to a cumulative column.

The current implementation checks for the letter ‘T’ in the column key, but this behaviour is not guaranteed in the future, in case the cumulative information gets internalized

Warning: This code is duplicated in realization.py, even though a ScratchRealization has access to the EclSum object which can give the true answer

Parameters:

column_keys – str or list of strings with summary vector names

Returns:

list of booleans, corresponding to each inputted summary vector name.
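
Example, following the letter-‘T’ heuristic described above:

>>> from fmu.ensemble.virtualrealization import smry_cumulative
>>> smry_cumulative(["FOPT", "FOPR"])
[True, False]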

Module contents

Top-level package for fmu.ensemble