dataio package

Top-level package for fmu-dataio

class AggregatedData(aggregation_id=None, casepath=None, source_metadata=<factory>, name='', operation='', tagname='', verbosity='DEPRECATED')

Bases: object

Instantiate an AggregatedData object.

Parameters:
  • aggregation_id (Optional[str]) – Give an explicit ID for the aggregation. If None, an ID will be made based on existing realization uuids.

  • casepath (Union[str, Path, None]) – The root folder to the case, default is None. If None, the casepath is derived from the first input metadata paths (cf. source_metadata) if possible. If given explicitly, the physical casepath folder must exist in advance, otherwise a ValueError will be raised.

  • source_metadata (list) – A list of individual metadata dictionaries, one per input element that forms the aggregation, each taken from that element's valid metadata.

  • operation (str) – A string that describes the operation, e.g. “mean”. This is mandatory and there is no default.

  • tagname (str) – Additional name, as part of file name

aggregation_id: Optional[str] = None
casepath: Union[str, Path, None] = None
export(obj, **kwargs)

Export aggregated file with metadata to file.

Parameters:
  • obj – Aggregated object to export, e.g. an XTGeo RegularSurface

  • **kwargs – See AggregatedData() arguments; the initial settings will be overridden by the settings given here.

Returns:

full path to exported item.

Return type:

String

generate_aggregation_metadata(obj, compute_md5=True, skip_null=True, **kwargs)

Alias method name, see generate_metadata

generate_metadata(obj, compute_md5=True, skip_null=True, **kwargs)

Generate metadata for the aggregated data.

This is a different and much simpler operation than the ExportData() version, since most metadata for each input element are already known. Hence, the metadata for the first element in the input list is used as a template.
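
The template idea can be illustrated with a minimal sketch. It is not the fmu-dataio implementation: the function name build_aggregated_metadata and the key layout ('fmu', 'realization', 'aggregation') are assumptions made for illustration only.

```python
import copy


def build_aggregated_metadata(source_metadata, operation, aggregation_id):
    """Sketch: reuse the first input element's metadata as a template."""
    if not source_metadata:
        raise ValueError("source_metadata must contain at least one element")
    if not operation:
        raise ValueError("operation is mandatory, e.g. 'mean'")

    # Deep copy so that the input metadata dictionaries are left untouched.
    template = copy.deepcopy(source_metadata[0])

    # Assumed layout: per-realization information lives under 'fmu'.
    fmu = template.setdefault("fmu", {})
    fmu.pop("realization", None)  # no longer valid for an aggregate
    fmu["aggregation"] = {"id": aggregation_id, "operation": operation}
    return template


inputs = [
    {"fmu": {"realization": {"id": 0}}, "data": {"name": "TopOne"}},
    {"fmu": {"realization": {"id": 1}}, "data": {"name": "TopOne"}},
]
meta = build_aggregated_metadata(inputs, operation="mean", aggregation_id="agg-1")
```

Everything outside the 'fmu' block is carried over unchanged from the first element, which is why this is cheaper than generating metadata from scratch.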

Parameters:
  • obj – The map, 3D grid, table, etc. instance.

  • compute_md5 – If True, an md5 sum for the file will be created. This involves a temporary export of the data, and may be time consuming for large data.

  • skip_null – This input parameter has been deprecated. If set to False, a deprecation warning will be raised.

  • **kwargs – See AggregatedData() arguments; the initial settings will be overridden by the settings given here.
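
To show why compute_md5 can be costly for large data, the following sketch writes a payload to a temporary file and hashes it in chunks. It uses only the Python standard library; the function name and the use of raw bytes in place of a real export are assumptions for illustration, not the library's code.

```python
import hashlib
import tempfile
from pathlib import Path


def md5sum_via_temporary_export(payload: bytes, chunk_size: int = 65536) -> str:
    """Write payload to a temporary file and return that file's MD5 hex digest."""
    with tempfile.TemporaryDirectory() as tmpdir:
        tmpfile = Path(tmpdir) / "export.bin"
        tmpfile.write_bytes(payload)  # stands in for the real export step

        digest = hashlib.md5()
        with tmpfile.open("rb") as stream:
            # Read in chunks so that large files need not fit in memory.
            for chunk in iter(lambda: stream.read(chunk_size), b""):
                digest.update(chunk)
    return digest.hexdigest()


checksum = md5sum_via_temporary_export(b"hello")
```

Both the write and the read scale with the size of the data, which is the time cost the compute_md5 description refers to.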

meta_format: ClassVar[Optional[Literal['yaml', 'json']]] = None
name: str = ''
operation: str = ''
tagname: str = ''
verbosity: str = 'DEPRECATED'
source_metadata: list
class ExportData(access_ssdl=<factory>, aggregation=False, casepath=None, classification=None, config=<factory>, content=None, content_metadata=None, depth_reference=None, domain_reference='msl', description='', display_name=None, fmu_context=None, forcefolder='', geometry=None, grid_model=None, is_observation=False, is_prediction=True, name='', undef_is_zero=False, parent='', preprocessed=False, realization=None, rep_include=None, reuse_metadata_rule=None, runpath=None, subfolder='', tagname='', timedata=None, unit='', verbosity='DEPRECATED', vertical_domain='depth', workflow=None, table_index=None, _classification=Classification.internal)

Bases: object

Class for exporting data with rich metadata in FMU.

This class sets up the general metadata content to be applied in export. For example:

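import xtgeo
import fmu.dataio as dataio

# PRJ, POL_FOLDER and CFG are assumed to be defined earlier in the script.
for name in ["TopOne", "TopTwo", "TopThree"]:
    poly = xtgeo.polygons_from_roxar(PRJ, name, POL_FOLDER)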

    ed = dataio.ExportData(
        config=CFG,
        content="depth",
        unit="m",
        vertical_domain="depth",
        domain_reference="msl",
        timedata=None,
        is_prediction=True,
        is_observation=False,
        tagname="faultlines",
        workflow="rms structural model",
        name=name
    )
    out = ed.export(poly)

A note on ‘pwd’, ‘rootpath’ and ‘casepath’: The ‘pwd’ is the process working directory, which is the folder where the process (script) starts. The ‘rootpath’ is the folder that relative file names are relative to, and it is normally auto-detected. The user can, however, force the actual rootpath by providing the input casepath. When running an RMS project interactively on disk:

/project/foo/resmod/ff/2022.1.0/rms/model                   << pwd
/project/foo/resmod/ff/2022.1.0/                            << rootpath

A file:

/project/foo/resmod/ff/2022.1.0/share/results/maps/xx.gri   << example absolute
                                share/results/maps/xx.gri   << example relative

When running an ERT forward job using a normal ERT job (e.g. a script):

/scratch/nn/case/realization-44/iter-2                      << pwd
/scratch/nn/case                                            << rootpath

A file:

/scratch/nn/case/realization-44/iter-2/share/results/maps/xx.gri  << absolute
                 realization-44/iter-2/share/results/maps/xx.gri  << relative

When running an ERT forward job that is executed from RMS:

/scratch/nn/case/realization-44/iter-2/rms/model            << pwd
/scratch/nn/case                                            << rootpath

A file:

/scratch/nn/case/realization-44/iter-2/share/results/maps/xx.gri  << absolute
                 realization-44/iter-2/share/results/maps/xx.gri  << relative
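
The relation between the absolute and relative file names in the examples above can be reproduced with pathlib. This is a small sketch for illustration; the helper name relative_to_rootpath is hypothetical and is not part of fmu-dataio.

```python
from pathlib import PurePosixPath


def relative_to_rootpath(absolute: str, rootpath: str) -> str:
    """Return the file name relative to rootpath, as in the examples above."""
    return str(PurePosixPath(absolute).relative_to(PurePosixPath(rootpath)))


# ERT forward job: rootpath is the case folder.
ert_relative = relative_to_rootpath(
    "/scratch/nn/case/realization-44/iter-2/share/results/maps/xx.gri",
    "/scratch/nn/case",
)

# Interactive RMS project: rootpath is the revision folder.
rms_relative = relative_to_rootpath(
    "/project/foo/resmod/ff/2022.1.0/share/results/maps/xx.gri",
    "/project/foo/resmod/ff/2022.1.0/",
)
```

Note that the pwd plays no role in this computation; only the rootpath determines the relative name.
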
Parameters:
  • access_ssdl (dict) – Optional. A dictionary that will overwrite or append to the default ssdl settings read from the config. Example: {"access_level": "restricted", "rep_include": False}. Deprecated and replaced by the ‘classification’ and ‘rep_include’ arguments.

  • casepath (Union[str, Path, None]) – Optional path to a case directory that contains valid case metadata “fmu_case.yml” in the folder “<casepath>/share/metadata/”. Note that for the fmu_context case the casepath is required, while for the fmu_context realization an attempt will be made to infer it from an ERT environment variable.

  • classification (Optional[str]) – Optional. Security classification level of the data object. If present it will override the default found in the config. Valid values are either “restricted” or “internal”.

  • config (dict | GlobalConfiguration) – Required in order to produce valid metadata, either as key (here) or through an environment variable. A dictionary with static settings. In the standard case this is read from FMU global variables (via fmuconfig). The dictionary must contain some predefined main level keys to work with fmu-dataio. If the key is missing or the key value is None, the environment variable FMU_GLOBAL_CONFIG is used to locate the file. If the file cannot be found, a UserWarning is issued. If a valid config is provided and FMU_GLOBAL_CONFIG is set in addition, the latter will be used. Note that this key shall be set while initializing the instance, i.e. it cannot be used in generate_metadata() or export(). Note also: if missing or empty, export() may still be done, but without a metadata file (this feature may change in future releases).

  • content (Union[dict, str, None]) – A required string describing the content of the data, e.g. ‘volumes’. Content is checked against a white-list for validation! Some contents, like ‘seismic’, require additional information. This should be provided through the ‘content_metadata’ argument.

  • content_metadata (Optional[dict]) – Optional. Dictionary with additional information about the provided content. Only required for some contents, e.g. ‘seismic’. Example: {"attribute": "amplitude", "calculation": "mean"}.

  • fmu_context (Optional[str]) – Optional string with value realization or case. If not explicitly given it will be inferred based on the presence of ERT environment variables. The fmu_context realization will export data per realization, and should be used in normal ERT forward models, while the fmu_context case will export data relative to the case directory. Note that for the fmu_context case the case directory needs to be provided through the argument casepath.

  • domain_reference (str) – Optional, reference for the vertical scale of the data. Valid references are “msl”, “sb” and “rkb”, and the default is “msl”. Note: use the vertical_domain key to set the domain (depth or time).

  • description (Union[str, list]) – A multiline description of the data either as a string or a list of strings.

  • display_name (Optional[str]) – Optional, set name for clients to use when visualizing.

  • forcefolder (str) – This setting shall only be used as an exception. It makes it possible to output to a non-standard folder relative to casepath/rootpath, depending on both fmu_context and the is_observation boolean value. A typical use case is forcefolder="seismic", which will replace the “cubes” standard folder for Cube output with “seismic”. Use with care and avoid if possible!

  • geometry (Optional[str]) – Optional, and for grid properties only, which may need a reference to the 3D grid geometry object. The value shall point to an existing file which is already exported with dataio, and hence has an associated metadata file. The grid name will be derived from the grid metadata, if present, and applied as part of the gridproperty file name (same behaviour as the parent key; replacing this). Note that this key may replace the usage of both the parent key and the grid_model key in the near future.

  • grid_model (Optional[str]) – Currently allowed but planned for deprecation. See geometry.

  • table_index (Optional[list]) – This applies to Pandas (table) data only, and is a list of the column names to use as index columns e.g. [“ZONE”, “REGION”].

  • is_prediction (bool) – True (default) if model prediction data

  • is_observation (bool) – Default is False. If True, then disk storage will be in the “share/observations” folder, otherwise in “share/results”. An exception arises if preprocessed=True: the folder will then be set to “share/preprocessed” irrespective of the value of is_observation.

  • name (str) – Optional but recommended. The name of the object. If not set, an attempt is made to infer it from the xtgeo/pandas/… object. The name is then checked against the stratigraphy list, and replaced with the official stratigraphic name if found in the static metadata stratigraphy. For example, if “TopValysar” is the model name and the actual name is “Valysar Top Fm.”, the latter name will be used.

  • parent (str) – Optional. This key is required for datatype GridProperty, unless the geometry is given, and refers to the name of the grid geometry. It will only be added in the filename, and not as a genuine metadata entry. This key is a candidate for deprecation, and users shall use the geometry key instead. If both parent and geometry are given, the grid name derived from the geometry object will take precedence.

  • preprocessed (bool) – Default is False. If True, the data exported are output to a dedicated “share/preprocessed” folder, and metadata can be partially re-used in an ERT model run using the ExportPreprocessedData class.

  • rep_include (Optional[bool]) – Optional. If True then the data object will be available in REP. Default is False.

  • runpath (Union[str, Path, None]) – TODO! Optional and deprecated. The relative location of the current run root. In most cases it will be auto-detected, assuming that FMU folder conventions are followed. For an ERT run this is e.g. /scratch/xx/nn/case/realization-0/iter-0/, while in a revision on the project disk it will be the revision root, e.g. /project/xx/resmod/ff/21.1.0/.

  • subfolder (str) – It is possible to set one level of subfolders for file output. The input should only accept a single folder name, i.e. no paths. If paths are present, a deprecation warning will be raised.

  • tagname (str) – This is a short tag description which will be a part of the file name.

  • timedata (Union[List[str], List[List[str]], None]) – Optional. List of dates, where the dates are strings of the form ‘YYYYMMDD’, for example ["20200101"]. A maximum of two dates can be given; the oldest date will be set as t0 in the metadata and the latest date as t1. Note that it is also possible to provide a label for each date by using a list of lists, e.g. [["20200101", "monitor"], ["20180101", "base"]].

  • vertical_domain (Union[str, dict]) – Optional. String with vertical domain either “time” or “depth” (default). It is also possible to provide a reference for the vertical scale, see the domain_reference key. Note that if the content is “depth” or “time” the vertical_domain will be set accordingly.

  • workflow (Union[str, Dict[str, str], None]) – Short tag description of workflow (as description)

  • undef_is_zero (bool) – Flags that NaNs should be considered as zero in aggregations
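
The timedata rule in the list above (at most two dates, oldest becomes t0, latest becomes t1, optional labels) can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the function name split_timedata and the 'value'/'label' output keys are hypothetical.

```python
from datetime import datetime


def split_timedata(timedata):
    """Return (t0, t1); each is a dict with 'value' and 'label', or None."""
    if not timedata:
        return None, None
    if len(timedata) > 2:
        raise ValueError("A maximum of two dates can be given")

    entries = []
    for item in timedata:
        # Accept either "YYYYMMDD" or ["YYYYMMDD", "label"].
        if isinstance(item, str):
            date, label = item, None
        else:
            date = item[0]
            label = item[1] if len(item) > 1 else None
        parsed = datetime.strptime(str(date), "%Y%m%d")  # rejects malformed dates
        entries.append({"value": parsed.date().isoformat(), "label": label})

    # ISO dates sort chronologically, so the oldest entry comes first.
    entries.sort(key=lambda entry: entry["value"])
    t0 = entries[0]
    t1 = entries[1] if len(entries) == 2 else None
    return t0, t1


t0, t1 = split_timedata([["20200101", "monitor"], ["20180101", "base"]])
```

In this example the input order is monitor first, base second, yet t0 ends up being the 2018 base date because ordering is by date rather than by position.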

aggregation: bool = False
allow_forcefolder_absolute: ClassVar[bool] = False
arrow_fformat: ClassVar[str] = 'parquet'
case_folder: ClassVar[str] = 'share/metadata'
casepath: Union[str, Path, None] = None
classification: Optional[str] = None
content: Union[dict, str, None] = None
content_metadata: Optional[dict] = None
createfolder: ClassVar[bool] = True
cube_fformat: ClassVar[str] = 'segy'
depth_reference: Optional[str] = None
description: Union[str, list] = ''
dict_fformat: ClassVar[str] = 'json'
display_name: Optional[str] = None
domain_reference: str = 'msl'
export(obj, return_symlink=False, **kwargs)

Export data objects of ‘known’ type to FMU storage solution with metadata.

This function will also collect the data-specific class metadata. For “classic” files, the metadata will be stored in a YAML file with the same name stem as the data, but with a . in front and “yml” as suffix, e.g.:

top_volantis--depth.gri
.top_volantis--depth.gri.yml
Parameters:
  • obj – XTGeo instance, a Pandas Dataframe instance or other supported object.

  • **kwargs – Using other ExportData() input keys is now deprecated, input the arguments when initializing the ExportData() instance instead.

Returns:

full path to exported item.

Return type:

String
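
The metadata file naming described for export() (a leading dot and a “yml” suffix next to the data file) can be expressed as a short path computation. The helper name metadata_path_for is hypothetical and only illustrates the convention.

```python
from pathlib import PurePosixPath


def metadata_path_for(datafile: str, suffix: str = "yml") -> str:
    """Return the metadata file name that sits next to the given data file."""
    path = PurePosixPath(datafile)
    # Same folder, same full file name, with a dot in front and the suffix appended.
    return str(path.with_name(f".{path.name}.{suffix}"))


meta_file = metadata_path_for("share/results/maps/top_volantis--depth.gri")
```

Because of the leading dot, the metadata file is hidden in a default directory listing on Unix-like systems.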

filename_timedata_reverse: ClassVar[bool] = False
fmu_context: Optional[str] = None
forcefolder: str = ''
generate_metadata(obj, compute_md5=True, **kwargs)

Generate and return the complete metadata for a provided object.

An object may be a map, 3D grid, cube, table, etc which is of a known and supported type.

Examples of such known types are XTGeo objects (e.g. a RegularSurface), a Pandas Dataframe, a PyArrow table, etc.

Parameters:
  • obj – XTGeo instance, a Pandas Dataframe instance or other supported object.

  • compute_md5 – Deprecated, a MD5 checksum will always be computed.

  • **kwargs – Using other ExportData() input keys is now deprecated, input the arguments when initializing the ExportData() instance instead.

Returns:

A dictionary with all metadata.

geometry: Optional[str] = None
grid_fformat: ClassVar[str] = 'roff'
grid_model: Optional[str] = None
include_ertjobs: ClassVar[bool] = False
is_observation: bool = False
is_prediction: bool = True
legacy_time_format: ClassVar[bool] = False
meta_format: ClassVar[Optional[Literal['yaml', 'json']]] = None
name: str = ''
parent: str = ''
points_fformat: ClassVar[str] = 'csv'
polygons_fformat: ClassVar[str] = 'csv'
preprocessed: bool = False
realization: Optional[int] = None
rep_include: Optional[bool] = None
reuse_metadata_rule: Optional[str] = None
runpath: Union[str, Path, None] = None
subfolder: str = ''
surface_fformat: ClassVar[str] = 'irap_binary'
table_fformat: ClassVar[str] = 'csv'
table_include_index: ClassVar[bool] = False
table_index: Optional[list] = None
tagname: str = ''
timedata: Union[List[str], List[List[str]], None] = None
undef_is_zero: bool = False
unit: Optional[str] = ''
verbosity: str = 'DEPRECATED'
verifyfolder: ClassVar[bool] = True
vertical_domain: Union[str, dict] = 'depth'
workflow: Union[str, Dict[str, str], None] = None
access_ssdl: dict
config: dict | GlobalConfiguration
class CreateCaseMetadata(config, rootfolder, casename, caseuser, description=None)

Bases: object

Create metadata for an FMU Case.

In ERT this is typically run as a hook workflow in advance.

Parameters:
  • config (dict) – A configuration dictionary. In the standard case this is read from FMU global variables (via fmuconfig). The dictionary must contain some predefined main level keys. If config is None or the env variable FMU_GLOBAL_CONFIG pointing to a file is provided, then it will attempt to parse that file instead.

  • rootfolder (str | Path) – Absolute path to the case root, including case name.

  • casename (str) – Name of case (experiment)

  • caseuser (str) – Username provided

  • description (Optional) – Description text as string or list of strings.

description: Union[str, list, None] = None
export()

Export case metadata to file.

Return type:

str

Returns:

Full path of metadata file.

generate_metadata()

Generate case metadata.

Return type:

dict

Returns:

A dictionary with case metadata or an empty dictionary if the metadata already exists.

config: dict
rootfolder: str | Path
casename: str
caseuser: str
read_metadata(filename)

Read the metadata as a dictionary given a filename.

If the filename is e.g. /some/path/mymap.gri, the associated metafile will be /some/path/.mymap.gri.yml (or json?)

Parameters:

filename (str | Path) – The full path filename to the data-object.

Return type:

dict

Returns:

A dictionary with metadata read from the associated metadata file.

class ExportPreprocessedData(casepath, is_observation=True, _fmudata=None)

Bases: object

Export a preprocessed file and its metadata into a FMU run at case level.

The existing metadata will be validated and three fields will be updated:

  • The ‘fmu’ block will be added with information about the existing FMU/ERT run.

  • The ‘file’ block will be updated with new file paths.

  • The ‘tracklog’ block will be extended with a new event tagged “merged”.

Note that it is important that the preprocessed data have been created upfront with the ExportData class using the argument fmu_context='preprocessed'. This ensures that the file and metadata are stored in the ‘share/preprocessed/’ folder.
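
The three metadata updates described for this class can be sketched on plain dictionaries. This is not the fmu-dataio implementation: the function name merge_preprocessed_metadata and the key names inside the 'file' and 'tracklog' blocks ('relative_path', 'absolute_path', 'datetime', 'event') are assumptions for illustration.

```python
import copy
from datetime import datetime, timezone


def merge_preprocessed_metadata(existing, fmu_block, relative_path, absolute_path):
    """Sketch: return updated metadata without modifying the existing dict."""
    meta = copy.deepcopy(existing)

    # 1. Add the 'fmu' block describing the current FMU/ERT run.
    meta["fmu"] = fmu_block

    # 2. Point the 'file' block at the new location.
    file_block = meta.setdefault("file", {})
    file_block["relative_path"] = relative_path
    file_block["absolute_path"] = absolute_path

    # 3. Extend the 'tracklog' with a new event tagged "merged".
    meta.setdefault("tracklog", []).append(
        {"datetime": datetime.now(timezone.utc).isoformat(), "event": "merged"}
    )
    return meta


old = {
    "file": {"relative_path": "share/preprocessed/maps/xx.gri"},
    "tracklog": [{"event": "created"}],
}
new = merge_preprocessed_metadata(
    old,
    fmu_block={"case": {"name": "case"}},
    relative_path="share/observations/maps/xx.gri",
    absolute_path="/scratch/nn/case/share/observations/maps/xx.gri",
)
```

The earlier tracklog entries are preserved, so the history of the preprocessed object remains visible after the merge.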

Parameters:
  • casepath (str | Path) – Required casepath for the active ERT experiment. The case needs to contain valid case metadata i.e. the ERT workflow ‘WF_CREATE_CASE_METADATA’ has been run prior to using this class.

  • is_observation (bool) – Default is True. If True, then disk storage will be in the “casepath/share/observations” folder, otherwise in “casepath/share/results”.
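
The effect of is_observation on the target folder can be sketched as below. The helper name target_folder is hypothetical, and the “results” folder name is taken from the file path examples elsewhere on this page.

```python
from pathlib import PurePosixPath


def target_folder(casepath: str, is_observation: bool = True) -> str:
    """Return the case-level folder that the preprocessed data is exported to."""
    subfolder = "observations" if is_observation else "results"
    return str(PurePosixPath(casepath) / "share" / subfolder)


observations_dir = target_folder("/scratch/nn/case")
results_dir = target_folder("/scratch/nn/case", is_observation=False)
```

Note that the default differs from ExportData, where is_observation defaults to False.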

export(obj)

Re-export preprocessed file with updated metadata. If existing metadata can’t be found or it is outdated, the file will still be copied but metadata will not be created.

Return type:

str

Returns:

Full path of exported object file.

generate_metadata(obj)

Generate updated metadata for the preprocessed data.

Return type:

dict

Returns:

A dictionary with all metadata.

is_observation: bool = True
casepath: str | Path
