dataio.dataio module

Module for DataIO class.

The metadata spec is documented as a JSON schema, stored under schema/.

read_metadata(filename)[source]

Read the metadata as a dictionary given a filename.

If the filename is e.g. /some/path/mymap.gri, the associated metadata file will be /some/path/.mymap.gri.yml (or possibly a .json equivalent).

Parameters:

filename (str | Path) – The full path filename to the data-object.

Return type:

dict

Returns:

A dictionary with metadata read from the associated metadata file.
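
The naming convention described above can be sketched with pathlib. This is a minimal illustration under stated assumptions, not the library's implementation: the helper name metadata_path_for is hypothetical, only the ".yml" variant is covered, and PurePosixPath is used so the result is the same on any platform.

```python
from pathlib import PurePosixPath


def metadata_path_for(filename):
    """Return the sidecar metadata path for a data file (hypothetical helper)."""
    path = PurePosixPath(filename)
    # Same folder, a leading dot, the original file name, then ".yml" appended.
    return path.with_name(f".{path.name}.yml")


meta = metadata_path_for("/some/path/mymap.gri")
print(meta)  # /some/path/.mymap.gri.yml
```

The data file keeps its own suffix (".gri"), so the metadata file name ends in ".gri.yml" rather than replacing the original extension.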

class ExportData(access_ssdl=<factory>, aggregation=False, casepath=None, classification=None, config=<factory>, content=None, content_metadata=None, depth_reference=None, domain_reference='msl', description='', display_name=None, fmu_context=None, forcefolder='', geometry=None, grid_model=None, is_observation=False, is_prediction=True, name='', undef_is_zero=False, parent='', preprocessed=False, realization=None, rep_include=None, reuse_metadata_rule=None, runpath=None, subfolder='', tagname='', timedata=None, unit='', verbosity='DEPRECATED', vertical_domain='depth', workflow=None, table_index=None, _classification=Classification.internal)[source]

Bases: object

Class for exporting data with rich metadata in FMU.

This class sets up the general metadata content to be applied in export. For example:

import xtgeo
import fmu.dataio as dataio

# PRJ, POL_FOLDER and CFG are assumed to be defined by the caller
# (RMS project, polygon folder and global config, respectively).
for name in ["TopOne", "TopTwo", "TopThree"]:
    poly = xtgeo.polygons_from_roxar(PRJ, name, POL_FOLDER)

    ed = dataio.ExportData(
        config=CFG,
        content="depth",
        unit="m",
        vertical_domain="depth",
        domain_reference="msl",
        timedata=None,
        is_prediction=True,
        is_observation=False,
        tagname="faultlines",
        workflow="rms structural model",
        name=name,
    )
    out = ed.export(poly)

A note on ‘pwd’, ‘rootpath’ and ‘casepath’: The ‘pwd’ is the process working directory, i.e. the folder where the process (script) starts. The ‘rootpath’ is the folder that relative file names are relative to, and it is normally auto-detected. The user can, however, force the ‘actual’ rootpath by providing the casepath input. When running an RMS project interactively on disk:

/project/foo/resmod/ff/2022.1.0/rms/model                   << pwd
/project/foo/resmod/ff/2022.1.0/                            << rootpath

A file:

/project/foo/resmod/ff/2022.1.0/share/results/maps/xx.gri   << example absolute
                                share/results/maps/xx.gri   << example relative

When running an ERT forward job using a normal ERT job (e.g. a script):

/scratch/nn/case/realization-44/iter-2                      << pwd
/scratch/nn/case                                            << rootpath

A file:

/scratch/nn/case/realization-44/iter-2/share/results/maps/xx.gri  << absolute
                 realization-44/iter-2/share/results/maps/xx.gri  << relative

When running an ERT forward job that is executed from RMS:

/scratch/nn/case/realization-44/iter-2/rms/model            << pwd
/scratch/nn/case                                            << rootpath

A file:

/scratch/nn/case/realization-44/iter-2/share/results/maps/xx.gri  << absolute
                 realization-44/iter-2/share/results/maps/xx.gri  << relative
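
The relation between the absolute and relative file names in the examples above can be reproduced with pathlib. This is only a sketch of the path arithmetic, using the ERT example paths from this section; it does not show how the library itself detects the rootpath.

```python
from pathlib import PurePosixPath

# Paths taken from the ERT forward job example above.
rootpath = PurePosixPath("/scratch/nn/case")
absolute = PurePosixPath(
    "/scratch/nn/case/realization-44/iter-2/share/results/maps/xx.gri"
)

# The relative file name is the absolute name with the rootpath stripped.
relative = absolute.relative_to(rootpath)
print(relative)  # realization-44/iter-2/share/results/maps/xx.gri
```

Note that the pwd does not enter the calculation: the same relative name results whether the job starts in the realization folder or in its rms/model subfolder.
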
Parameters:
  • access_ssdl (dict) – Optional. A dictionary that will overwrite or append to the default ssdl settings read from the config. Example: {"access_level": "restricted", "rep_include": False} Deprecated and replaced by ‘classification’ and ‘rep_include’ arguments.

  • casepath (Union[str, Path, None]) – Optional path to a case directory that contains valid case metadata “fmu_case.yml” in the folder “<casepath>/share/metadata/”. Note that for the fmu_context case the casepath is required, while for the fmu_context realization an attempt is made to infer it from an ERT environment variable.

  • classification (Optional[str]) – Optional. Security classification level of the data object. If present it will override the default found in the config. Valid values are either “restricted” or “internal”.

  • config (dict | GlobalConfiguration) – Required in order to produce valid metadata, either as a key (here) or through an environment variable. A dictionary with static settings. In the standard case this is read from FMU global variables (via fmuconfig). The dictionary must contain some predefined main level keys to work with fmu-dataio. If the key is missing or the key value is None, the environment variable FMU_GLOBAL_CONFIG is used to locate the file. If the file cannot be found, a UserWarning is issued. If both a valid config and FMU_GLOBAL_CONFIG are provided, the latter will be used. Note that this key shall be set while initializing the instance, i.e. it cannot be used in generate_metadata() or export(). Note also: If missing or empty, export() may still be done, but without a metadata file (this feature may change in future releases).

  • content (Union[dict, str, None]) – A required string describing the content of the data, e.g. ‘volumes’. Content is checked against a white-list for validation! Some contents, like ‘seismic’, require additional information. This should be provided through the ‘content_metadata’ argument.

  • content_metadata (Optional[dict]) – Optional. Dictionary with additional information about the provided content. Only required for some contents, e.g. ‘seismic’. Example {“attribute”: “amplitude”, “calculation”: “mean”}.

  • fmu_context (Optional[str]) – Optional string with value realization or case. If not explicitly given it will be inferred based on the presence of ERT environment variables. The fmu_context realization will export data per realization, and should be used in normal ERT forward models, while the fmu_context case will export data relative to the case directory. Note that for the fmu_context case the case directory needs to be provided through the argument casepath.

  • domain_reference (str) – Optional, reference for the vertical scale of the data. Valid references are “msl”/“sb”/“rkb”, and the default is “msl”. Note: use the vertical_domain key to set the domain (depth or time).

  • description (Union[str, list]) – A multiline description of the data either as a string or a list of strings.

  • display_name (Optional[str]) – Optional, set name for clients to use when visualizing.

  • forcefolder (str) – This setting shall only be used as an exception. It makes it possible to output to a non-standard folder relative to casepath/rootpath, depending on both the fmu_context and the is_observation boolean value. A typical use-case is forcefolder=“seismic”, which will replace the “cubes” standard folder for Cube output with “seismic”. Use with care and avoid if possible!

  • geometry (Optional[str]) – Optional, and for grid properties only, which may need a reference to the 3D grid geometry object. The value shall point to an existing file which is already exported with dataio, and hence has an associated metadata file. The grid name will be derived from the grid metadata, if present, and applied as part of the gridproperty file name (same behaviour as the parent key; replacing this). Note that this key may replace the usage of both the parent key and the grid_model key in the near future.

  • grid_model (Optional[str]) – Currently allowed but planned for deprecation. See geometry.

  • table_index (Optional[list]) – This applies to Pandas (table) data only, and is a list of the column names to use as index columns e.g. [“ZONE”, “REGION”].

  • is_prediction (bool) – True (default) if the data is model prediction data.

  • is_observation (bool) – Default is False. If True, then disk storage will be in the “share/observations” folder, otherwise in “share/results”. An exception arises if preprocessed=True: the folder will then be set to “share/preprocessed” irrespective of the value of is_observation.

  • name (str) – Optional but recommended. The name of the object. If not set, an attempt is made to infer it from the xtgeo/pandas/… object. The name is then checked against the stratigraphy list, and is replaced with the official stratigraphic name if found in the static metadata stratigraphy. For example, if “TopValysar” is the model name and the actual name is “Valysar Top Fm.”, the latter name will be used.

  • parent (str) – Optional. This key is required for datatype GridProperty, unless the geometry is given, and refers to the name of the grid geometry. It will only be added in the filename, and not as a genuine metadata entry. This key is a candidate for deprecation, and users shall use the geometry key instead. If both parent and geometry are given, the grid name derived from the geometry object will have precedence.

  • preprocessed (bool) – Default is False. If True, the data exported are output to a dedicated “share/preprocessed” folder, and metadata can be partially re-used in an ERT model run using the ExportPreprocessedData class.

  • rep_include (Optional[bool]) – Optional. If True then the data object will be available in REP. Default is False.

  • runpath (Union[str, Path, None]) – Optional and deprecated. The relative location of the current run root. In most cases it is auto-detected, assuming that FMU folder conventions are followed. For an ERT run this is e.g. /scratch/xx/nn/case/realization-0/iter-0/, while in a revision on the project disk it will be the revision root, e.g. /project/xx/resmod/ff/21.1.0/.

  • subfolder (str) – It is possible to set one level of subfolders for file output. Only a single folder name should be given, i.e. no paths. If paths are present, a deprecation warning will be raised.

  • tagname (str) – This is a short tag description which will be a part of the file name.

  • timedata (Union[List[str], List[List[str]], None]) – Optional. List of dates, where the dates are strings of the form ‘YYYYMMDD’, for example [‘20200101’]. A maximum of two dates can be input; the oldest date will be set as t0 in the metadata and the latest date will be t1. Note that it is also possible to provide a label for each date by using a list of lists, e.g. [[“20200101”, “monitor”], [“20180101”, “base”]].

  • vertical_domain (Union[str, dict]) – Optional. String with vertical domain either “time” or “depth” (default). It is also possible to provide a reference for the vertical scale, see the domain_reference key. Note that if the content is “depth” or “time” the vertical_domain will be set accordingly.

  • workflow (Union[str, Dict[str, str], None]) – Short tag description of the workflow (as description)

  • undef_is_zero (bool) – Flags that NaNs should be considered as zero in aggregations
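
The t0/t1 rule described for the timedata key can be sketched in plain Python. This is a minimal illustration, assuming dates are given as ‘YYYYMMDD’ strings; the helper name split_timedata is hypothetical and is not part of the library.

```python
def split_timedata(timedata):
    """Order up to two dates into (t0, t1); hypothetical helper."""
    # Accept both ["20200101"] and [["20200101", "monitor"]] forms.
    entries = [[item] if isinstance(item, str) else list(item) for item in timedata]
    if not 1 <= len(entries) <= 2:
        raise ValueError("timedata must contain one or two dates")
    # 'YYYYMMDD' strings sort chronologically when compared as text.
    entries.sort(key=lambda entry: entry[0])
    t0 = entries[0]
    t1 = entries[1] if len(entries) == 2 else None
    return t0, t1


t0, t1 = split_timedata([["20200101", "monitor"], ["20180101", "base"]])
print(t0)  # ['20180101', 'base']
print(t1)  # ['20200101', 'monitor']
```

The input order does not matter: the oldest date ends up as t0 regardless of where it appears in the list.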

allow_forcefolder_absolute: ClassVar[bool] = False
arrow_fformat: ClassVar[str] = 'parquet'
case_folder: ClassVar[str] = 'share/metadata'
createfolder: ClassVar[bool] = True
cube_fformat: ClassVar[str] = 'segy'
filename_timedata_reverse: ClassVar[bool] = False
grid_fformat: ClassVar[str] = 'roff'
include_ertjobs: ClassVar[bool] = False
legacy_time_format: ClassVar[bool] = False
meta_format: ClassVar[Optional[Literal['yaml', 'json']]] = None
polygons_fformat: ClassVar[str] = 'csv'
points_fformat: ClassVar[str] = 'csv'
surface_fformat: ClassVar[str] = 'irap_binary'
table_fformat: ClassVar[str] = 'csv'
dict_fformat: ClassVar[str] = 'json'
table_include_index: ClassVar[bool] = False
verifyfolder: ClassVar[bool] = True
access_ssdl: dict
aggregation: bool = False
casepath: Union[str, Path, None] = None
classification: Optional[str] = None
config: dict | GlobalConfiguration
content: Union[dict, str, None] = None
content_metadata: Optional[dict] = None
depth_reference: Optional[str] = None
domain_reference: str = 'msl'
description: Union[str, list] = ''
display_name: Optional[str] = None
fmu_context: Optional[str] = None
forcefolder: str = ''
geometry: Optional[str] = None
grid_model: Optional[str] = None
is_observation: bool = False
is_prediction: bool = True
name: str = ''
undef_is_zero: bool = False
parent: str = ''
preprocessed: bool = False
realization: Optional[int] = None
rep_include: Optional[bool] = None
reuse_metadata_rule: Optional[str] = None
runpath: Union[str, Path, None] = None
subfolder: str = ''
tagname: str = ''
timedata: Union[List[str], List[List[str]], None] = None
unit: Optional[str] = ''
verbosity: str = 'DEPRECATED'
vertical_domain: Union[str, dict] = 'depth'
workflow: Union[str, Dict[str, str], None] = None
table_index: Optional[list] = None
generate_metadata(obj, compute_md5=True, **kwargs)[source]

Generate and return the complete metadata for a provided object.

An object may be a map, 3D grid, cube, table, etc which is of a known and supported type.

Examples of such known types are XTGeo objects (e.g. a RegularSurface), a Pandas Dataframe, a PyArrow table, etc.

Parameters:
  • obj – XTGeo instance, a Pandas Dataframe instance or other supported object.

  • compute_md5 – Deprecated, an MD5 checksum will always be computed.

  • **kwargs – Using other ExportData() input keys is now deprecated; pass the arguments when initializing the ExportData() instance instead.

Returns:

A dictionary with all metadata.

export(obj, return_symlink=False, **kwargs)[source]

Export data objects of ‘known’ type to FMU storage solution with metadata.

This function will also collect the data-specific class metadata. For “classic” files, the metadata will be stored in a YAML file with the same name stem as the data, but with a . in front and “yml” as suffix, e.g.:

top_volantis--depth.gri
.top_volantis--depth.gri.yml
Parameters:
  • obj – XTGeo instance, a Pandas Dataframe instance or other supported object.

  • **kwargs – Using other ExportData() input keys is now deprecated; pass the arguments when initializing the ExportData() instance instead.

Returns:

Full path to the exported item.

Return type:

str