dataio.dataio module
Module for DataIO class.
The metadata spec is documented as a JSON schema, stored under schema/.
- read_metadata(filename)
Read the metadata as a dictionary given a filename.
If the filename is e.g. /some/path/mymap.gri, the associated metadata file will be /some/path/.mymap.gri.yml (or possibly json).
- Parameters:
filename (str | Path) – The full path to the data object.
- Return type:
dict
- Returns:
A dictionary with metadata read from the associated metadata file.
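The naming rule described above can be illustrated with a small sketch. The helper name `metadata_path_for` is hypothetical and not part of the library; it only mirrors the documented convention (same folder, a leading dot, and a `.yml` suffix) and assumes the YAML variant.

```python
from pathlib import PurePosixPath


def metadata_path_for(filename):
    """Return the hidden metadata path for a data file (hypothetical helper)."""
    path = PurePosixPath(filename)
    # Same folder; the file name gets a leading dot and ".yml" appended.
    return path.with_name(f".{path.name}.yml")


print(metadata_path_for("/some/path/mymap.gri"))
```

`PurePosixPath` is used so the sketch behaves the same on any platform; the real function reads and parses the file, which is not shown here.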
- class ExportData(access_ssdl=<factory>, aggregation=False, casepath=None, classification=None, config=<factory>, content=None, content_metadata=None, depth_reference=None, domain_reference='msl', description='', display_name=None, fmu_context=None, forcefolder='', geometry=None, grid_model=None, is_observation=False, is_prediction=True, name='', undef_is_zero=False, parent='', preprocessed=False, realization=None, rep_include=None, reuse_metadata_rule=None, runpath=None, subfolder='', tagname='', timedata=None, unit='', verbosity='DEPRECATED', vertical_domain='depth', workflow=None, table_index=None, _classification=Classification.internal)
Bases: object
Class for exporting data with rich metadata in FMU.
This class sets up the general metadata content to be applied in export. For example:
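    import xtgeo
    from fmu import dataio

    # PRJ, POL_FOLDER and CFG are assumed to be defined by the caller.
    for name in ["TopOne", "TopTwo", "TopThree"]:
        # Read the polygons for this horizon from the RMS project.
        poly = xtgeo.polygons_from_roxar(PRJ, name, POL_FOLDER)
        ed = dataio.ExportData(
            config=CFG,
            content="depth",
            unit="m",
            vertical_domain="depth",
            domain_reference="msl",
            timedata=None,
            is_prediction=True,
            is_observation=False,
            tagname="faultlines",
            workflow="rms structural model",
            name=name,
        )
        # Export the data together with its metadata.
        out = ed.export(poly)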
A note on ‘pwd’, ‘rootpath’ and ‘casepath’: The ‘pwd’ is the process working directory, which is the folder where the process (script) starts. The ‘rootpath’ is the folder that relative file names are relative to, and it is normally auto-detected. The user can, however, force the ‘actual’ rootpath by providing the input casepath. When running an RMS project interactively on disk:
    /project/foo/resmod/ff/2022.1.0/rms/model   << pwd
    /project/foo/resmod/ff/2022.1.0/            << rootpath

    A file:

    /project/foo/resmod/ff/2022.1.0/share/results/maps/xx.gri  << example absolute
                                    share/results/maps/xx.gri  << example relative
When running an ERT forward job using a normal ERT job (e.g. a script):
    /scratch/nn/case/realization-44/iter-2   << pwd
    /scratch/nn/case                         << rootpath

    A file:

    /scratch/nn/case/realization-44/iter-2/share/results/maps/xx.gri  << absolute
                     realization-44/iter-2/share/results/maps/xx.gri  << relative
When running an ERT forward job but here executed from RMS:
    /scratch/nn/case/realization-44/iter-2/rms/model   << pwd
    /scratch/nn/case                                   << rootpath

    A file:

    /scratch/nn/case/realization-44/iter-2/share/results/maps/xx.gri  << absolute
                     realization-44/iter-2/share/results/maps/xx.gri  << relative
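The relation between absolute and relative file names in the listings above can be sketched with the standard library. The helper `relative_to_rootpath` is hypothetical and only illustrates the path arithmetic; how the rootpath is auto-detected is not shown and is not assumed here.

```python
from pathlib import PurePosixPath


def relative_to_rootpath(absolute_file, rootpath):
    """Express an absolute file name relative to the rootpath (hypothetical)."""
    # relative_to() raises ValueError if the file is not below rootpath.
    return PurePosixPath(absolute_file).relative_to(PurePosixPath(rootpath))


# ERT forward job: rootpath is the case folder.
ert_file = "/scratch/nn/case/realization-44/iter-2/share/results/maps/xx.gri"
print(relative_to_rootpath(ert_file, "/scratch/nn/case"))

# Interactive RMS project: rootpath is the revision folder.
rms_file = "/project/foo/resmod/ff/2022.1.0/share/results/maps/xx.gri"
print(relative_to_rootpath(rms_file, "/project/foo/resmod/ff/2022.1.0/"))
```

Note that the pwd plays no role in this calculation: the relative name is the same whether the job runs from the realization folder or from rms/model.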
- Parameters:
access_ssdl (dict) – Optional. A dictionary that will overwrite or append to the default ssdl settings read from the config. Example: {"access_level": "restricted", "rep_include": False}. Deprecated and replaced by the ‘classification’ and ‘rep_include’ arguments.
casepath (Union[str, Path, None]) – Optional path to a case directory that contains valid case metadata “fmu_case.yml” in the folder “<casepath>/share/metadata/”. Note that for the fmu_context case the casepath is required, while for the fmu_context realization an attempt is made to infer it from an ERT environment variable.
classification (Optional[str]) – Optional. Security classification level of the data object. If present it will override the default found in the config. Valid values are either “restricted” or “internal”.
config (dict | GlobalConfiguration) – Required in order to produce valid metadata, either as a key (here) or through an environment variable. A dictionary with static settings. In the standard case this is read from FMU global variables (via fmuconfig). The dictionary must contain some predefined main-level keys to work with fmu-dataio. If the key is missing or the key value is None, the environment variable FMU_GLOBAL_CONFIG is used to locate the file. If the file cannot be found, a UserWarning is issued. If both a valid config and FMU_GLOBAL_CONFIG are provided, the latter will be used. Note that this key shall be set while initializing the instance, i.e. it cannot be used in generate_metadata() or export(). Note also: if missing or empty, export() may still be done, but without a metadata file (this feature may change in future releases).
content (Union[dict, str, None]) – A required string describing the content of the data, e.g. ‘volumes’. Content is checked against a white-list for validation! Some contents, like ‘seismic’, require additional information. This should be provided through the ‘content_metadata’ argument.
content_metadata (Optional[dict]) – Optional. Dictionary with additional information about the provided content. Only required for some contents, e.g. ‘seismic’. Example: {"attribute": "amplitude", "calculation": "mean"}.
fmu_context (Optional[str]) – Optional string with value realization or case. If not explicitly given it will be inferred based on the presence of ERT environment variables. The fmu_context realization will export data per realization, and should be used in normal ERT forward models, while the fmu_context case will export data relative to the case directory. Note that for the fmu_context case the case directory needs to be provided through the argument casepath.
domain_reference (str) – Optional, reference for the vertical scale of the data. Valid references are “msl”/“sb”/“rkb”, and the default is “msl”. Note: use the vertical_domain key to set the domain (depth or time).
description (Union[str, list]) – A multiline description of the data, either as a string or a list of strings.
display_name (Optional[str]) – Optional, set name for clients to use when visualizing.
forcefolder (str) – This setting shall only be used as an exception, and makes it possible to output to a non-standard folder relative to casepath/rootpath, as dependent on both the fmu_context and the is_observation boolean value. A typical use case is forcefolder="seismic", which will replace the standard “cubes” folder for Cube output with “seismic”. Use with care and avoid if possible!
geometry (Optional[str]) – Optional, and for grid properties only, which may need a reference to the 3D grid geometry object. The value shall point to an existing file which is already exported with dataio, and hence has an associated metadata file. The grid name will be derived from the grid metadata, if present, and applied as part of the gridproperty file name (same behaviour as the parent key; replacing this). Note that this key may replace the usage of both the parent key and the grid_model key in the near future.
grid_model (Optional[str]) – Currently allowed but planned for deprecation. See geometry.
table_index (Optional[list]) – This applies to Pandas (table) data only, and is a list of the column names to use as index columns, e.g. ["ZONE", "REGION"].
is_prediction (bool) – True (default) if model prediction data.
is_observation (bool) – Default is False. If True, then disk storage will be in the “share/observations” folder, otherwise in “share/results”. An exception arises if preprocessed=True; then the folder will be set to “share/preprocessed” irrespective of the value of is_observation.
name (str) – Optional but recommended. The name of the object. If not set, an attempt is made to infer it from the xtgeo/pandas/… object. The name is then checked against the stratigraphy list, and replaced with the official stratigraphic name if found in the static metadata stratigraphy. For example, if “TopValysar” is the model name and the actual name is “Valysar Top Fm.”, the latter name will be used.
parent (str) – Optional. This key is required for datatype GridProperty, unless the geometry is given, and refers to the name of the grid geometry. It will only be added in the filename, and not as a genuine metadata entry. This key is a candidate for deprecation, and users shall use the geometry key instead. If both parent and geometry are given, the grid name derived from the geometry object will have precedence.
preprocessed (bool) – Default is False. If True, the exported data are output to a dedicated “share/preprocessed” folder, and metadata can be partially re-used in an ERT model run using the ExportPreprocessedData class.
rep_include (Optional[bool]) – Optional. If True then the data object will be available in REP. Default is False.
runpath (Union[str, Path, None]) – TODO! Optional and deprecated. The relative location of the current run root. Optional and will in most cases be auto-detected, assuming that FMU folder conventions are followed. For an ERT run e.g. /scratch/xx/nn/case/realization-0/iter-0/, while in a revision at the project disk it will be the revision root, e.g. /project/xx/resmod/ff/21.1.0/.
subfolder (str) – It is possible to set one level of subfolders for file output. The input should only be a single folder name, i.e. no paths. If paths are present, a deprecation warning will be raised.
tagname (str) – A short tag description which will be part of the file name.
timedata (Union[List[str], List[List[str]], None]) – Optional. List of dates, where the dates are strings of the form ‘YYYYMMDD’, for example ["20200101"]. A maximum of two dates can be input; the oldest date will be set as t0 in the metadata and the latest date as t1. Note that it is also possible to provide a label for each date by using a list of lists, e.g. [["20200101", "monitor"], ["20180101", "base"]].
vertical_domain (Union[str, dict]) – Optional. String with vertical domain, either “time” or “depth” (default). It is also possible to provide a reference for the vertical scale, see the domain_reference key. Note that if the content is “depth” or “time” the vertical_domain will be set accordingly.
workflow (Union[str, Dict[str, str], None]) – Short tag description of the workflow (as description).
undef_is_zero (bool) – Flags that NaNs should be considered as zero in aggregations.
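Two of the rules in the parameter list lend themselves to a short sketch: the choice of share folder from is_observation and preprocessed, and the ordering of timedata into t0 and t1. Both helpers below are hypothetical illustrations and not the library's implementation; the standard folder is assumed to be "share/results", as in the path examples earlier.

```python
def share_folder(is_observation=False, preprocessed=False):
    """Pick the top-level share folder (hypothetical helper)."""
    # preprocessed wins, irrespective of the value of is_observation.
    if preprocessed:
        return "share/preprocessed"
    return "share/observations" if is_observation else "share/results"


def split_timedata(timedata):
    """Return (t0, t1) as (date, label) tuples (hypothetical helper)."""
    if not timedata:
        return None, None
    if len(timedata) > 2:
        raise ValueError("A maximum of two dates can be input")
    entries = []
    for item in timedata:
        if isinstance(item, str):
            # Plain "YYYYMMDD" string without a label.
            entries.append((item, None))
        else:
            # ["YYYYMMDD", "label"] pair; the label is optional.
            entries.append((item[0], item[1] if len(item) > 1 else None))
    # "YYYYMMDD" strings sort chronologically when compared as text.
    entries.sort(key=lambda entry: entry[0])
    t0 = entries[0]
    t1 = entries[1] if len(entries) == 2 else None
    return t0, t1


print(share_folder(is_observation=True, preprocessed=True))
print(split_timedata([["20200101", "monitor"], ["20180101", "base"]]))
```

The second call shows that the input order does not matter: the oldest date ends up as t0 even when it is listed last.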
- allow_forcefolder_absolute: ClassVar[bool] = False
- arrow_fformat: ClassVar[str] = 'parquet'
- case_folder: ClassVar[str] = 'share/metadata'
- createfolder: ClassVar[bool] = True
- cube_fformat: ClassVar[str] = 'segy'
- filename_timedata_reverse: ClassVar[bool] = False
- grid_fformat: ClassVar[str] = 'roff'
- include_ertjobs: ClassVar[bool] = False
- legacy_time_format: ClassVar[bool] = False
- meta_format: ClassVar[Optional[Literal['yaml', 'json']]] = None
- polygons_fformat: ClassVar[str] = 'csv'
- points_fformat: ClassVar[str] = 'csv'
- surface_fformat: ClassVar[str] = 'irap_binary'
- table_fformat: ClassVar[str] = 'csv'
- dict_fformat: ClassVar[str] = 'json'
- table_include_index: ClassVar[bool] = False
- verifyfolder: ClassVar[bool] = True
- access_ssdl: dict
- aggregation: bool = False
- casepath: Union[str, Path, None] = None
- classification: Optional[str] = None
- config: dict | GlobalConfiguration
- content: Union[dict, str, None] = None
- content_metadata: Optional[dict] = None
- depth_reference: Optional[str] = None
- domain_reference: str = 'msl'
- description: Union[str, list] = ''
- display_name: Optional[str] = None
- fmu_context: Optional[str] = None
- forcefolder: str = ''
- geometry: Optional[str] = None
- grid_model: Optional[str] = None
- is_observation: bool = False
- is_prediction: bool = True
- name: str = ''
- undef_is_zero: bool = False
- parent: str = ''
- preprocessed: bool = False
- realization: Optional[int] = None
- rep_include: Optional[bool] = None
- reuse_metadata_rule: Optional[str] = None
- runpath: Union[str, Path, None] = None
- subfolder: str = ''
- tagname: str = ''
- timedata: Union[List[str], List[List[str]], None] = None
- unit: Optional[str] = ''
- verbosity: str = 'DEPRECATED'
- vertical_domain: Union[str, dict] = 'depth'
- workflow: Union[str, Dict[str, str], None] = None
- table_index: Optional[list] = None
- generate_metadata(obj, compute_md5=True, **kwargs)
Generate and return the complete metadata for a provided object.
An object may be a map, 3D grid, cube, table, etc., provided it is of a known and supported type.
Examples of such known types are XTGeo objects (e.g. a RegularSurface), a Pandas Dataframe, a PyArrow table, etc.
- Parameters:
obj – XTGeo instance, a Pandas Dataframe instance or other supported object.
compute_md5 – Deprecated, an MD5 checksum will always be computed.
**kwargs – Passing other ExportData() input keys here is now deprecated; provide the arguments when initializing the ExportData() instance instead.
- Returns:
A dictionary with all metadata.
- export(obj, return_symlink=False, **kwargs)
Export data objects of ‘known’ type to FMU storage solution with metadata.
This function will also collect the data-specific class metadata. For “classic” files, the metadata will be stored in a YAML file with the same name stem as the data, but with a leading “.” and “yml” as suffix, e.g.:
    top_volantis--depth.gri
    .top_volantis--depth.gri.yml
- Parameters:
obj – XTGeo instance, a Pandas Dataframe instance or other supported object.
**kwargs – Passing other ExportData() input keys here is now deprecated; provide the arguments when initializing the ExportData() instance instead.
- Returns:
Full path to the exported item.
- Return type:
str