Updating schemas

This document contains instructions and guidelines for making changes to schemas.

Changing a schema

This section collects some things to do, or not to do, when adding fields to or changing fields in a schema. Some of these are generic, while others are specific to how Pydantic translates its models into JSON schemas.

Tips and guidelines

  • Add a docstring to every class and field.

    These docstrings are built into the data model documentation and included in the schema as description fields. Try to align them with existing examples!

  • Avoid free text fields as much as possible.

    Use string enums if the string should come from a controlled vocabulary. If a string should follow a particular form, apply a regex pattern if possible; the first sketch after this list shows both.

  • Prefer to make fields required.

    At the time of writing we have to annotate optional fields with Optional, but the term is slightly deceiving. In the schema this makes the field nullable, meaning it is still a required field whose value is either some type T or null.

    A truly optional field requires a union of two types: one that has the field and one that does not.

  • Ensure Optional fields have default=None.

    Issues arise in the generated JSON schema when an optional field is not given a default of None. A test should catch it if you forget, but try to remember to do it anyway.

  • Apply numerical validation.

    Many numeric fields have known ranges because they represent physical or geometric quantities: a cube cannot have a z depth of 0, and a thickness cannot be negative. (Or can it? 🧐) Pydantic's Field makes this sort of validation simple to add; see its numeric constraints documentation.

  • Treat union ordering as important.

    Pydantic has different modes for resolving union types. When you create a field with a union type, remember that validating it may be more nuanced than it appears; the second sketch after this list shows one way member order can matter.
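
The tips above map onto just a few lines of Pydantic. Below is a minimal, hypothetical sketch (assuming Pydantic v2; the class and field names are invented for illustration and are not part of fmu-dataio):

    from enum import Enum
    from typing import Optional

    from pydantic import BaseModel, ConfigDict, Field


    class FluidContact(str, Enum):
        """A controlled vocabulary used instead of a free text field."""

        OWC = "owc"
        GOC = "goc"


    class CubeSpec(BaseModel):
        """A hypothetical cube specification illustrating the guidelines above."""

        # Assumption: this setting is what turns the field docstrings below into
        # schema description fields (available in Pydantic >= 2.7).
        model_config = ConfigDict(use_attribute_docstrings=True)

        contact: FluidContact
        """Required field constrained to a controlled vocabulary."""

        name: str = Field(pattern=r"^[a-z0-9_]+$")
        """Required string whose form is enforced with a regex."""

        zdepth: float = Field(gt=0.0)
        """Numeric constraint: a cube cannot have a z depth of 0."""

        source: Optional[str] = Field(default=None)
        """Nullable rather than truly optional: present in the schema, but may be null."""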
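
For union ordering, a second sketch (again with invented names) shows how Pydantic v2's union mode interacts with the order of the members; see Pydantic's documentation on unions for the authoritative behaviour:

    from typing import Union

    from pydantic import BaseModel, Field


    class Measurement(BaseModel):
        """Hypothetical model showing that union member order can matter."""

        # The default "smart" mode tries to find the best-matching member, while
        # "left_to_right" accepts the first member that validates. With the
        # ordering below, Measurement(value=3) would come back as value=3.0,
        # because float is tried (and succeeds) before int.
        value: Union[float, int] = Field(union_mode="left_to_right")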

Running the update script

To update schemas, use the tool included in fmu-dataio:

./tools/update-schemas

If any schemas have changed, this command will fail and alert you that a version bump is required for those schemas. You can also include the --diff flag to display what has changed.

Under normal circumstances this means you must update the schema version in accordance with the schema versioning protocol. Schema versions live in the code as a class variable constant, VERSION, on the schema class derived from SchemaBase. Changing a schema version is cause for discussion among the team and possibly stakeholders.
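
As a loose, hypothetical sketch of what that looks like (SchemaBase here is a stand-in stub; the real base class, names, and values live in the fmu-dataio code base):

    class SchemaBase:
        """Stand-in stub for fmu-dataio's schema base class."""

        VERSION: str


    class ExampleResultsSchema(SchemaBase):
        """A hypothetical schema; only the placement of VERSION is the point."""

        # Bump this constant according to the schema versioning protocol when
        # ./tools/update-schemas reports a change.
        VERSION: str = "1.2.0"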

Under some circumstances you may need to force-update the schemas with some changes. Use the --force flag for this.

Preparing schemas for production

Preparing schemas for production involves a few steps.

  1. Check out the commit or tagged version from upstream/main with the correct schema changes:

    git fetch upstream
    git checkout 3.2.1  # for a tagged version
    # or
    git checkout 1a2b3c4  # for a particular commit
    
  2. Branch off from this

    git checkout -b schema-release-2030.01
    
  3. Rebase onto the upstream/staging branch

    git rebase upstream/staging
    git log  # Ensure it looks right
    
  4. Check that the schema URLs are changing to the production version

    ./tools/update-schemas --prod --diff
    
  5. Update the schemas for production

    ./tools/update-schemas --prod --force
    
  6. Build and inspect the documentation relevant for the schema locally and ensure the information (version, fields, etc.) is up to date. Append the changes that occurred to the changelog of each schema.

    sphinx-build -b html docs/src build/docs/html -j auto
    open build/docs/html/index.html  # on macOS
    
  7. Commit and push to your fork. Create a PR that merges into the staging (!) branch, not the main branch.

    git add schemas/ docs/
    git commit -m "REL: Update schema X"
    git push -u origin HEAD
    
  8. Carefully look over the changes in the schema to ensure nothing looks out of order (typos, URLs pointing to dev rather than prod, etc.).

  9. Once merged, this will create a new build in the staging Radix environment. Once tested and validated by all relevant parties, it can be promoted to the production environment.