The FMU results data model¶
This section describes the data model used for FMU results when exporting with fmu-dataio. For the time being, the data model is hosted as part of fmu-dataio.
The data model described herein is new and shiny, and experimental in many aspects. Any feedback on this is greatly appreciated. The most effective feedback comes from applying the data model and using the resulting metadata.
The FMU data model is described using a Pydantic model which programmatically generates a JSON Schema.
This schema contains rules and definitions for all attributes in the data model. This means, in practice, that outgoing metadata from FMU needs to comply with the schema. If data is uploaded to e.g. Sumo, validation will be done on the incoming data to ensure consistency.
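As an illustration, a consumer can validate a metadata document against the published schema. Below is a minimal sketch, assuming the third-party jsonschema, PyYAML and requests packages are available; the schema URL is taken from the metadata example further down, and the metadata file name is hypothetical.

import requests
import yaml
from jsonschema import ValidationError, validate

SCHEMA_URL = "https://main-fmu-schemas-dev.radix.equinor.com/schemas/0.8.0/fmu_results.json"

# Fetch the schema and load a metadata document exported by fmu-dataio.
schema = requests.get(SCHEMA_URL, timeout=30).json()
with open("my_exported_metadata.yml") as stream:  # hypothetical file name
    metadata = yaml.safe_load(stream)

try:
    validate(instance=metadata, schema=schema)
    print("Metadata complies with the schema.")
except ValidationError as err:
    print(f"Metadata does not comply: {err.message}")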
Data model documentation¶
There are two closely related data models represented here: metadata generated from an FMU realization and metadata generated on a case level. The structure and documentation of these two models can be inspected below.
The FMU metadata model for a given data object.
The FMU metadata model for an FMU case.
About the data model¶
Why is it made?¶
FMU is a mighty system developed by and for the subsurface community in Equinor, to make reservoir modeling more efficient, less error-prone and more repeatable with higher quality, mainly through automation of cross-disciplinary workflows. It combines off-the-shelf software with in-house components such as the ERT orchestrator.
FMU is increasingly defined by the data it produces, and direct and indirect dependencies on output from FMU are increasing. When FMU results started to be regularly transferred to cloud storage for direct consumption from 2017/2018 and onwards, the need for stable metadata on outgoing data became imminent. Local development on Johan Sverdrup was initiated to cater for the digital ecosystem evolving in and around that particular project, and the need for generalization became apparent with the development of Sumo, Webviz and other initiatives.
The purpose of the data model is to cater for the existing dependencies, as well as to enable more direct usage of FMU results in different contexts. A secondary objective of this data model is to create a normalization layer between the components that create data and the components that use those data. The data model is designed so that it can also be adapted to sources of data other than FMU.
Scope of this data model¶
This data model covers data produced by FMU workflows. This includes data generated by direct runs of model templates, data produced by pre-processing workflows, data produced in individual realizations or hooked workflows, and data produced by post-processing workflows.
Note
An example of a pre-processing workflow is a set of jobs modifying selected input data for later use in the FMU workflows and/or for comparison with other results in a QC context.
Note
An example of a post-processing workflow is a script that aggregates results across many realizations and/or iterations of an FMU case.
This data model covers data that, in the FMU context, can be linked to a specific case.
Note that e.g. ERT and other components will, and should, have their own data models to cater for their needs. It is not the intention of this data model to cover all aspects of data in the FMU context. The scope is primarily data going out of FMU to be used elsewhere.
A denormalized data model¶
The data model used for FMU results is a denormalized data model, at least to a certain point. This means that static data will be repeated many times. Example: Each exported data object contains basic information about the FMU case it belongs to, such as a unique ID for this case, its name, the user that made it, which model template was used, etc. This information is stored in every exported .yml file. This may seem counterintuitive, and differs from a relational database (where this information would typically be stored once, and referred to when needed).
There are a few reasons for choosing a denormalized data model:
First, the components for creating and maintaining a relational database containing these data are not in place, and would be difficult to implement quickly. Also, the nature of data in an FMU context is very distributed, with data spread across many files and folders (currently).
Second, a denormalized data model enables us to utilize search engine technologies for indexing, which is not efficient for a normalized data model. The penalty of duplicating metadata across many individual files is repaid in speed and ease of use.
Note
The data model is only denormalized to a certain point. Most likely, it is better described as a hybrid. Example: The concept of a case is used in the FMU context. In the outgoing metadata for FMU results, some information about the current case is included. However, details about the case are out of scope. For these, a consumer would have to refer to the owner of the case definition. In the FMU context, this will be the workflow manager (ERT).
Standardized vs anarchy¶
Creating a data model for FMU results brings with it some standardization. In essence, this represents the next evolution of the existing FMU standard. We haven’t called it “FMU standard 2.0” because although this would resonate with many people, many would find it revolting. But, sure, if you are so inclined you are allowed to think of it this way. The FMU standard 1.0 is centered around folder structure and file names - a prerequisite for standardization in the good old days when files were files, folders were folders, and data could be consumed by double-clicking, or by traversing the mounted file system.
With the transition to a cloud-native state come numerous opportunities - but also great responsibilities. Some of them are visible in the data model, and the data model is in itself a testament to the most important of them: We need to get our data straight.
There are many challenges. Aligning with everyone and everything is one; we probably won’t succeed with that in the first iteration(s). Materializing metadata effectively, and without hassle, during FMU runs (meaning that everything must be fully automated) is another - this is what fmu-dataio solves. But finding the balance between retaining flexibility and enforcing a standard is perhaps the trickiest of all.
This data model has been designed with the great flexibility of FMU in mind. If you are a geologist on an asset using FMU for something important, you need to be able to export any data from your workflow and use that data without having to wait for someone else to rebuild something. For FMU, one glove certainly does not fit all, and this has been taken into account. While the data model and the associated validation will set some requirements that you need to follow, you are still free to do more or less what you want.
We do, however, STRONGLY ENCOURAGE you to not invent too many private wheels. The risk is that your data cannot be used by others.
The materialized metadata has a nested structure which can be represented as Python dictionaries, or in YAML or JSON formats. The root level contains only a few key attributes, most of which are nested sub-dictionaries.
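For orientation, this is the rough shape of a data-object metadata document, expressed here as a Python dictionary. The keys are taken from the full metadata example further down; values are elided.

# Root level of a data-object metadata document (values elided).
metadata = {
    "$schema": ...,     # URL of the schema this document complies with
    "version": ...,     # version of the metadata definitions
    "source": "fmu",    # always "fmu" when produced by FMU
    "tracklog": ...,    # list of events (created, updated, ...)
    "class": ...,       # main identifier of the data type, e.g. "surface"
    "fmu": ...,         # FMU context: model, case, iteration, realization
    "file": ...,        # relative/absolute paths, checksum, size
    "data": ...,        # description of the actual data object
    "display": ...,     # display hints for consumers
    "access": ...,      # access control information
    "masterdata": ...,  # references to masterdata, e.g. SMDA
}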
Relations to other data models¶
The data model for FMU results is designed with generalization in mind. While in practice this data model covers data produced by, or in direct relation to, an FMU workflow, in theory it relates more to subsurface predictive modeling in general than to FMU specifically.
In Equinor, FMU is the primary system for creating, maintaining and using 3D predictive numerical models for the subsurface. Therefore, FMU is the main use case for this data model.
There are plenty of other data models in play in the complex world of subsurface predictive modeling. Each software applies its own data model, and in FMU this encompasses multiple different systems.
Similarly, there are other data models in the larger scope where FMU workflows represent one out of many providers/consumers of data. A significant motivation for defining this data model is to ensure consistency towards other systems and enable stable conditions for integration.
fmu-dataio has three important roles in this context (see the sketch after this list):
Be a translating layer between individual softwares’ data models and the FMU results data model.
Enable fully automated materialization of metadata during FMU runs (hundreds of thousands of files being made).
Abstract the FMU results data model through Python methods and functions, allowing them to be embedded into other systems - helping maintain a centralized definition of this data model.
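To illustrate the last two roles, here is a hedged sketch of exporting a surface with metadata from inside an FMU workflow. The argument names follow common fmu-dataio usage, but consult the fmu-dataio API documentation for the authoritative signatures; CFG (the parsed global configuration) and surface (an xtgeo surface object) are assumed to exist.

# Minimal sketch: export one surface with metadata via fmu-dataio.
from fmu.dataio import ExportData

exd = ExportData(
    config=CFG,                    # global config, incl. masterdata and access
    name="volantis_gp_top",        # official name of the object
    content="depth",               # white-listed content type
    tagname="ds_extract_geogrid",  # provides uniqueness; part of the file name
    unit="m",
)
filepath = exd.export(surface)     # writes data + metadata, returns the path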
The parent/child principle¶
In the FMU results data model, the traditional hierarchy of an FMU setup is not continued. An individual file produced by an FMU workflow and exported to disk can be seen in relation to a hierarchy looking something like this: case > iteration > realization > file.
Many reading this will instinctively disagree with this definition, and significant confusion arises from trying to have meaningful discussions around this. There is no unified definition of this hierarchy (despite many claiming to have such a definition).
In the FMU results data model, this hierarchy is flattened down to two levels: the parent (case) and the children of that parent (files). From this, it follows that the most fundamental definition in this context is a case. To a large degree, this definition belongs to the ERT workflow manager in the FMU context. For now, however, the case definitions are extracted by proxy from the file structure and from arguments passed to fmu-dataio.
Significant confusion can also arise from discussing the definition of a case, and the validity of this hierarchy, of course. But the consensus (albeit probably a local minimum) is that this serves the needs.
Each file produced in relation to an FMU case (meaning before, during or after) is tagged with information about the case - signalling that this entity belongs to this case. It is not the intention of the FMU results data model to maintain all information about a case, and in the future it is expected that ERT will serve case information beyond the basics.
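In practice, a consumer can therefore collect all children of a case simply by matching on fmu.case.uuid. A minimal sketch, assuming metadata_files is an iterable of paths to exported YAML metadata files:

# Group exported metadata documents by their parent case.
from collections import defaultdict

import yaml

def group_by_case(metadata_files):
    cases = defaultdict(list)
    for path in metadata_files:
        with open(path) as stream:
            meta = yaml.safe_load(stream)
        case_uuid = meta["fmu"]["case"]["uuid"]  # fmu.case.uuid
        cases[case_uuid].append(meta["file"]["relative_path"])
    return cases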
Note
Dot notation - we like it and use it. This is what it means:
The metadata structure is a dictionary-like structure, e.g.
{
"myfirstkey": {
"mykey": "myvalue",
"anotherkey": "anothervalue"
}
}
Referring to keys deep inside a nested dictionary can be tedious to write out. With dot notation, we can refer to mykey in the example above as myfirstkey.mykey. This is then a pointer to myvalue. You will see dot notation in the explanations of the various metadata blocks below: now you know what it means!
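A small helper showing how such a dotted path can be resolved against a metadata dictionary in Python:

# Resolve a dotted path such as "myfirstkey.mykey" in a nested dictionary.
from functools import reduce

def get_by_dot_path(metadata: dict, dotted_path: str):
    return reduce(lambda block, key: block[key], dotted_path.split("."), metadata)

example = {"myfirstkey": {"mykey": "myvalue", "anotherkey": "anothervalue"}}
assert get_by_dot_path(example, "myfirstkey.mykey") == "myvalue"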
Weaknesses¶
uniqueness
The data model currently has challenges with respect to ensuring uniqueness. Uniqueness is hard in this context, as a centralized data model cannot (and should not!) dictate or define in detail which data an FMU user should be able to export from local workflows.
understanding validation errors
When validating against the current schema, understanding the reasons for non-validation can be tricky. The root cause of this is the use of conditional logic in the schemas - a functionality JSON Schema is not designed for. See Logical rules below.
Logical rules¶
The schema contains some logical rules which are applied during validation. These are rules of the type “if this, then that”. They are, however, not explicitly written (nor readable) as such directly. This type of logic is implemented in the schema by explicitly generating subschemas that A) are only valid for specific conditions, and B) contain the requirements for that specific situation. In this manner, one can ensure that if a specific condition is met, the associated requirements for that condition are used.
Example:
"oneOf": [
{
"$comment": "Conditional schema A - 'if class == case make myproperty required'",
"required": [
"myproperty"
],
"properties": {
"class": {
"enum": ["case"]
},
"myproperty": {
"type": "string",
"example": "sometext"
}
}
},
{
"$comment": "Conditional schema B - 'if class != case do NOT make myproperty required'",
"properties": {
"myproperty": {
"type": "string",
"example": "sometext"
}
}
}
]
For metadata describing a case, the requirements are different compared to metadata describing data objects.

For selected contents, a content-specific block under data is required. This is implemented for fluid_contact, field_outline and seismic.
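As a hedged illustration (expressed as a Python dictionary), a metadata document with content seismic carries a seismic block under data, while e.g. content depth currently has no content-specific block. Only data.seismic.attribute is confirmed by the contractual attributes listed below; other keys in the block are not shown here.

# Sketch: content-specific block under "data" for seismic content.
data_block = {
    "content": "seismic",
    "seismic": {
        "attribute": "amplitude",  # data.seismic.attribute (contractual)
    },
}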
Validation of data¶
When fmu-dataio exports data from FMU workflows, it produces a pair of data + metadata. The two are considered one entity. Data consumers who wish to validate the correct match of data and metadata can do so by verifying that file.checksum_md5 can be recreated from the data object only. Metadata is not considered when generating the checksum.

This checksum is the string representation of the hash created using RSA’s MD5 algorithm. The hash is created from the file that fmu-dataio exported. In most cases, this is the same file that is provided to the consumer. However, there are some exceptions:
Seismic data may be transformed to other formats when stored outside the FMU context, and the checksum may then be invalid.
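A minimal sketch of such a verification, recreating the MD5 hex digest from the data file alone using Python’s standard hashlib:

# Verify that a data file matches file.checksum_md5 from its metadata.
import hashlib

import yaml

def verify_checksum(data_path: str, metadata_path: str) -> bool:
    with open(data_path, "rb") as stream:
        digest = hashlib.md5(stream.read()).hexdigest()
    with open(metadata_path) as stream:
        metadata = yaml.safe_load(stream)
    return digest == metadata["file"]["checksum_md5"]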
Changes and revisions¶
The only constant is change, as we know, and in the case of the FMU results data model - definitely so. The learning component here is huge, and there will be iterations. This poses a challenge, given that there are existing dependencies on top of this data model already, and more are arriving.
To handle this, two important concepts have been introduced.
Versioning. The current version of the FMU metadata is 0.9.0.
Contractual attributes. Within the FMU ecosystem, we need to retain the ability to do rapid changes to the data model. As we are in early days, unknowns will become knowns and unknown unknowns will become known unknowns. However, from the outside perspective some stability is required. Therefore, we have labelled some key attributes as contractual. They are listed at the top of the schema. This is not to say that they will never change - but they should not change erratically, and when we need to change them, this needs to be subject to alignment.
Contractual attributes¶
The following attributes are contractual:
access
class
data.alias
data.bbox
data.content
data.format
data.geometry
data.grid_model
data.is_observation
data.is_prediction
data.name
data.offset
data.seismic.attribute
data.spec.columns
data.standard_result.name
data.stratigraphic
data.stratigraphic_alias
data.tagname
data.time
data.vertical_domain
file.checksum_md5
file.relative_path
file.size_bytes
fmu.aggregation.operation
fmu.aggregation.realization_ids
fmu.case
fmu.context.stage
fmu.iteration.name
fmu.iteration.uuid
fmu.model
fmu.realization.id
fmu.realization.is_reference
fmu.realization.name
fmu.realization.uuid
fmu.workflow
masterdata
source
tracklog.datetime
tracklog.event
tracklog.user.id
version
Metadata example¶
Expand below to see a full example of valid metadata for a surface exported from FMU.
# Example metadata for a depth surface.
$schema: https://main-fmu-schemas-dev.radix.equinor.com/schemas/0.8.0/fmu_results.json
version: "0.8.0" # including for REP and others to build logic on.
source: fmu # always included, and always this value if FMU made it. For quick identification for external consumers.
tracklog:
- datetime: 2020-10-28T14:28:02
user:
id: peesv
event: created
sysinfo:
fmu-dataio:
version: 1.2.3
komodo: # only added when running in Komodo environment
version: 2023.12.05-py38
- datetime: 2020-10-28T14:46:14
user:
id: peesv
event: updated
class: surface # class is the main identifier of the data type.
fmu: # the fmu-block contains information directly related to the FMU context
model:
name: ff
revision: 21.0.0.dev
description:
- detailed description
- optional
workflow:
reference: rms/structural_model
case:
name: MyCaseName
uuid: 8bb56d60-8758-481a-89a4-6bac8561d38e
user:
id: jriv # $USER from ERT
description:
- yet other detailed description
- optional
iteration:
id: 0 # always an int, will be 0 for e.g. "pred"
uuid: 4b939310-34b1-4179-802c-49460bc0f799 # hash of case.uuid + iteration.id
name: "iter-0" # /"pred"
restart_from: 15ce3b84-766f-4c93-9050-b154861f9100 # fmu.iteration.uuid for another iteration
realization:
id: 33
uuid: 29a15b21-ce13-471b-9a4a-0f791552aa51
name: "realization-33"
parameters: # directly pass parameters.txt. This is potentially a lot of content, only a stub is included here.
SENSNAME: faultseal
SENSCASE: low
RMS_SEED: 1006
INIT_FILES:
PERM_FLUVCHAN_E1_NORM: 0.748433
PERM_FLUVCHAN_E21_NORM: 0.782068
KVKH_CHANNEL: 0.6
KVKH_US: 0.6
FAULT_SEAL_SCALING: 0.1
FWL_CENTRAL: 1677
context:
stage: realization
file:
relative_path: realization-33/iter-0/share/results/maps/volantis_gp_base--amplitude.gri # case-relative
absolute_path: /some/absolute/path/realization-33/iter-0/share/results/maps/volantis_gp_base--amplitude.gri
checksum_md5: fa4d055b113ae5282796e328cde0ffa4 # checksum of the file, not the data.
size_bytes: 132321
data: # The data block describes the actual data (e.g. surface). Only present in data objects
content: depth # white-listed and standardized
# if stratigraphic, name must match the strat column. This is the official name of this surface.
name: volantis_top-volantis_base
stratigraphic: false # if true, this is a stratigraphic surface found in the strat column
offset: 0.0 # to be used if a specific horizon is represented with an offset.
top: # not required, but allowed
name: volantis_gp_top
stratigraphic: true
offset: 2.0
base:
name: volantis_gp_top
stratigraphic: true
offset: 8.3
stratigraphic_alias: # other stratigraphic entities this corresponds to in the strat column, e.g. Top Viking vs Top Draupne. Ref issue.
- SomeName Fm. 1 Top
alias: # other known-as names, such as name used inside RMS etc
- somename_fm_1_top
- top_somename
# tagname is flexible. The tag is intended primarily for providing uniqueness. The tagname will also be part of the outgoing file name on disk.
tagname: ds_extract_geogrid
# no content-specific attribute for "depth" but can come in the future
properties: # what the values actually show. List, only one for IRAP Binary surfaces. Multiple for 3d grid or multi-parameter surfaces. First is geometry.
- name: PropertyName
attribute: owc
is_discrete: false # to be used for discrete values in surfaces.
calculation: null # max/min/rms/var/maxpos/sum/etc
format: irap_binary
layout: regular # / cornerpoint / structured / etc
unit: m
vertical_domain: depth # / time / null
depth_reference: msl # / seabed / etc
grid_model: # Making this an object to allow for expanding in the future
name: MyGrid # important for data identification, also important for other data types
spec: # class/layout dependent, optional? Can spec be expanded to work for all data types?
ncol: 281
nrow: 441
nlay: 333
xori: 461499.9997558594
yori: 5926500.0
xinc: 25.0
yinc: 25.0
yflip: 1
rotation: 30.000000834826057
undef: 1.0e+33 # Allow both number and string
bbox:
xmin: 456012.5003497944
xmax: 467540.52762886323
ymin: 5926499.999511719
ymax: 5939492.128326312
zmin: 1244.039
zmax: 2302.683
is_prediction: true # A mechanism for separating pure QC output from actual predictions
is_observation: true # Used for 4D data currently but also valid for other data?
undef_is_zero: false # Used to flag if undef should be considered 0.0 in statistics
description:
- Depth surfaces extracted from the structural model
- Made in an FMU workflow
display:
name: Top Volantis
subtitle: Some subtitle
line:
show: true
color: black
style: solid
points:
show: false
color: null
contours:
show: true
color: black
increment: 20
fill:
show: true
color: black # color and colormap are mutually exclusive in practice, so clients must choose.
colormap: gist_earth
display_min: 1221.3
display_max: 1900.0
access:
asset:
name: Drogon
ssdl:
access_level: internal
rep_include: true
classification: internal
masterdata:
smda:
country:
- identifier: Norway
uuid: ad214d85-8a1d-19da-e053-c918a4889309
discovery:
- short_identifier: DROGON
uuid: 00000000-0000-0000-0000-000000000000 # mock uuid for Drogon
field:
- identifier: DROGON
uuid: 00000000-0000-0000-0000-000000000000 # mock uuid for Drogon
coordinate_system:
identifier: ST_WGS84_UTM37N_P32637
uuid: ad214d85-dac7-19da-e053-c918a4889309
stratigraphic_column:
identifier: DROGON_2020
uuid: 00000000-0000-0000-0000-000000000000 # mock uuid for Drogon
You will find more examples in the fmu-dataio GitHub repository.
FAQ¶
We won’t claim that these questions are really very frequently asked, but these are some key questions you may have along the way.
My existing FMU workflow does not produce any metadata. Now I am told that it has to. What do I do?
First step: Start using fmu-dataio in your workflow. You will get a lot for free using it, amongst other things, metadata will start to appear from your workflow. To get started with fmu-dataio, see the overview section.
This data model is not what I would have chosen. How can I change it?
The FMU community (almost always) builds what the FMU community wants. The first step would be to define what you are unhappy with, preferably formulated as an issue in the fmu-dataio github repository.
This data model allows me to create a smashing data visualisation component, but I fear that it is so immature that it will not be stable - will it change all the time?
Yes, and no. It is definitely experimental and these are early days. Therefore, changes will occur as learning happens. Part of that learning comes from the development of components utilizing the data model, so your feedback may contribute to evolving this data model. However, you should not expect erratic changes. The concept of contractual attributes was introduced for this exact purpose. We have also chosen to version the metadata - partly to clearly separate it from previous versions, but also to allow smooth evolution going forward. We don’t yet know exactly how this will be done in practice, but perhaps you will tell us!