The qcproperties class
==================================

The ``qcproperties`` class provides a set of methods for extracting property
statistics from 3D grids, raw wells and blocked wells.

Statistics can be extracted for both continuous and discrete properties, and the
statistics that are calculated depend on the property type. The property type is
auto-detected.

If several methods of statistics extraction have been run within the instance,
a merged dataframe is available through the ``dataframe`` property.

The methods for statistics extraction can be run individually, or a
yaml-configuration file can be used to enable an automatic run of the methods.

All methods can be run from either RMS python, or from files (e.g. from an ERT job).

XTGeo is used to get a dataframe from the input parameter data. XTGeo data is
reused within the instance to increase performance.


Methods for extracting property statistics
-----------------------------------------------

Three methods exist for extracting property statistics. Which method to select
depends on the input data source (3D grid properties, wells or blocked wells).
A fourth method, ``from_yaml``, runs the other three from a configuration file.
Arguments for the methods are similar and are described in the sections below.

* ``get_grid_statistics``: Extracts property statistics from 3D grid data.
* ``get_well_statistics``: Extracts property statistics from well logs.
* ``get_bwell_statistics``: Extracts property statistics from blocked well logs.
* ``from_yaml``: Uses a yaml-configuration file to enable an automatic run of the methods above.

All methods return a pandas DataFrame for the run in question. If several methods
of statistics extraction have been run within the instance, a merged dataframe is
available through the ``dataframe`` property.

.. seealso:: The `Using yaml input for auto execution` section for a description of
             how to use a yaml-configuration file to run the different methods automatically.

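
As a rough illustration of how the statistics differ between property types, the
sketch below computes an average and a standard deviation for a continuous property
and percentages for a discrete one. It is not the ``QCProperties`` implementation:
the function names and sample values are hypothetical, weights are ignored, and the
population standard deviation is used only for the sake of the example.

.. code-block:: python

    from collections import Counter
    from statistics import mean, pstdev


    def continuous_stats(values):
        """Return average and standard deviation for a continuous property."""
        return {"avg": mean(values), "stddev": pstdev(values)}


    def discrete_percentages(codes):
        """Return the percentage of samples for each discrete code name."""
        counts = Counter(codes)
        total = sum(counts.values())
        return {code: 100.0 * n / total for code, n in counts.items()}


    # Hypothetical samples, not taken from any real model
    poro = [0.20, 0.25, 0.30]
    facies = ["Sand", "Sand", "Shale", "Sand"]

    print(continuous_stats(poro))
    print(discrete_percentages(facies))
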

Other methods
^^^^^^^^^^^^^^^

Note: The methods below are only applicable if at least one method for extracting
statistics has been run within the QCProperties instance.

dataframe
  A merged dataframe with statistical data for **continuous** properties from all
  runs of statistics extraction within the instance.

to_csv
  Used to write the dataframe with statistics to a csv-file. Takes one argument:

  ``csvfile``: String with desired filename (required).


Arguments
^^^^^^^^^^

The input `data` is given as a python dictionary (or a YAML file) and will be
somewhat different for the three methods, and for the two run environments
(inside/outside RMS).

**Input arguments:**

* ``data``: The input data as a Python dictionary (required). See valid keys below.
* ``project``: Required for usage inside RMS

**Valid fields in the 'data' argument:**

Method specific fields:

grid
  Name of grid icon in RMS, or name of grid file if run outside RMS.
  Required with the ``get_grid_statistics`` method.

wells
  Required with the ``get_well_statistics`` and the ``get_bwell_statistics`` methods.
  Outside RMS, "wells" is a list of files on RMS ascii well format. Inside RMS, "wells"
  is a dictionary with 3 fields that depend on the method:

  **get_well_statistics**:

  ``names``: List of wellnames (optional). Default is all wells.

  ``logrun``: Name of logrun.

  ``trajectory``: Name of trajectory.

  **get_bwell_statistics**:

  ``names``: List of wellnames (optional). Default is all wells.

  ``bwname``: Name of BW object in RMS.

  ``grid``: Name of grid that contains the BW object.

  .. note::
     Wildcards are supported when running from files, and valid Python regular
     expressions are supported in "names", see examples.

Common fields:

properties
  Properties to compute statistics for. Both continuous and discrete properties are
  supported. Standard statistics, e.g. "avg" and "stddev", are computed for continuous
  properties, while percentages are calculated for discrete properties.
  Can be given as a list or as a dictionary.
  If given as a dictionary, the key will be the column name in the output dataframe,
  and the value will be a dictionary with valid options:

  ``name``: The actual name (or path) of the property / log.

  ``weight``: A weight parameter (name or path if outside RMS) (optional)

  ``pfile``: Name (or path) to file containing the parameter e.g. INIT file (optional)

selectors
  Selectors are discrete properties/logs, e.g. Zone, that are used to extract
  statistics for groups of the data (optional). Can be given as a list or as a
  dictionary. If given as a dictionary, the key will be the column name in the output
  dataframe, and the value will be a dictionary with valid options:

  ``name``: The actual name (or path) of the property / log.

  ``include``: List of values to include (optional)

  ``exclude``: List of values to exclude (optional)

  ``codes``: A dictionary of codenames to update some/all existing codenames (optional).

  ``pfile``: Name (or path) to file containing the parameter e.g. INIT file (optional)

  .. note::
     The "codes" field can be used to merge code values that the user wants to extract
     combined statistics from. This is done by setting the same name on several code
     values, as it is the name that is used to group the data.

filters
  Dictionary with additional filters (optional). The key is the name (or path) to the
  filter parameter / log, and the value is a dictionary with options:

  ``include``: List of values to include for discrete parameters

  ``exclude``: List of values to exclude for discrete parameters

  ``range``: List with two entries, defining minimum and maximum values to use for
  continuous parameters

  ``pfile``: Name (or path) to file containing the parameter e.g. INIT file

  .. note::
     If a selector or property is input as a filter, this will override any existing
     filters specified directly on the selector/property.

  .. seealso::
     Option ``"multiple_filters"`` below which can be used to extract statistics
     multiple times with different filters.

multiple_filters
  Option that can be used to extract statistics multiple times with different filters
  (optional). The input is a dictionary where the keys are the "name" (ID string) for
  the dataset, and the value is the dictionary of filters (same format as ``filters``
  above). See examples.

path
  Path to where files are located (optional)

selector_combos
  Bool to turn on/off calculation of statistics for every combination of selectors
  (optional). Default is True. For example, if True and both a ZONE and a REGION
  parameter are given as selectors, statistics for three groups will be calculated:
  ``["ZONE", "REGION"], ["ZONE"] and ["REGION"]``. If False the data will only be
  extracted for one group: ``["ZONE", "REGION"]``, hence no data is available if the
  user wants to evaluate statistics per ZONE (or REGION) for the global grid.
  Depending on the number of selectors and the size of the grid, this process may be
  time consuming.

source
  Source string (optional). Default values depend on the method being executed:

  * For **grid statistics** default is the `gridname`
  * For **blocked wells statistics** default is the `name of the blocked wells object`
    if inside RMS and `bwells` if outside
  * For **well statistics** default is `wells`

name
  ID string for the dataset (optional). Recommended; if not given it will be set
  equal to the source string.

verbosity
  Level of output while running: None, "info" or "debug" (optional). Default is None.


Examples
^^^^^^^^^

get_grid_statistics examples
""""""""""""""""""""""""""""""""

**Example in RMS (continuous properties - basic):**

Example extracting statistics for porosity and permeability for each zone and facies.
The result is written to csv.

.. code-block:: python

    from fmu.tools import QCProperties

    GRID = "GeoGrid"
    PROPERTIES = ["Poro", "Perm"]
    SELECTORS = ["Zone", "Facies"]
    REPORT = "../output/qc/somefile.csv"


    def extract_statistics():

        qcp = QCProperties()

        usedata = {
            "properties": PROPERTIES,
            "selectors": SELECTORS,
            "grid": GRID,
            "verbosity": 1,
        }

        # "project" is the RMS project object available inside RMS python
        qcp.get_grid_statistics(data=usedata, project=project)

        qcp.to_csv(REPORT)


    if __name__ == "__main__":
        extract_statistics()
        print("Done")


**Example in RMS (continuous properties - more settings):**

Example extracting statistics for porosity per region. Filters are used to extract
statistics for the HC zone and the water zone separately. Statistics will be combined
for regions with code values 2 and 3. The property is weighted on a Total_bulk
parameter. The result is written to csv.

.. code-block:: python

    from fmu.tools import QCProperties

    GRID = "GeoGrid"
    PROPERTIES = {
        "PORO": {"name": "PHIT", "weight": "Total_bulk"},
    }
    SELECTORS = {
        "REGION": {
            "name": "Regions",
            "exclude": ["Surroundings"],
            "codes": {2: "NS", 3: "NS",},
        }
    }
    REPORT = "../output/qc/continous_stats.csv"

    FLUID_FILTERS = {
        "HC_zone": {"Fluid": {"include": ["oil", "gas"]}},
        "Water_zone": {"Fluid": {"include": ["water"]}},
    }


    def extract_statistics():

        qcp = QCProperties()

        usedata = {
            "properties": PROPERTIES,
            "selectors": SELECTORS,
            "grid": GRID,
            "multiple_filters": FLUID_FILTERS,
            "verbosity": 1,
        }

        qcp.get_grid_statistics(data=usedata, project=project)

        qcp.to_csv(REPORT)


    if __name__ == "__main__":
        extract_statistics()
        print("Done")

.. note::
   The code is executed twice, filtering on the HC zone first and then on the water
   zone in a second run. Alternatively, the fluid parameter could have been used as
   a selector, for extracting statistics in one run.


**Example in RMS (discrete properties):**

Example extracting statistics for a discrete facies parameter for each region.
The facies parameter is weighted on a Total_bulk parameter. The result is written to csv.

.. code-block:: python

    from fmu.tools import QCProperties

    GRID = "GeoGrid"
    PROPERTIES = {
        "FACIES": {"name": "Facies", "weight": "Total_bulk"},
    }
    SELECTORS = ["Regions"]
    REPORT = "../output/qc/discrete_stats.csv"


    def extract_statistics():

        qcp = QCProperties()

        usedata = {
            "properties": PROPERTIES,
            "selectors": SELECTORS,
            "grid": GRID,
            "verbosity": 1,
        }

        qcp.get_grid_statistics(data=usedata, project=project)

        qcp.to_csv(REPORT)


    if __name__ == "__main__":
        extract_statistics()
        print("Done")


**Example when executed from files:**

.. code-block:: python

    from fmu.tools import QCProperties

    PATH = "../input/qc/"
    GRID = "grid.roff"
    PROPERTIES = {"PORO": {"name": "poro.roff"}}
    SELECTORS = {
        "ZONE": {
            "name": "zone.roff",
        },
        "FACIES": {
            "name": "facies.roff",
            "exclude": ["Carbonate"],
        },
    }
    REPORT = "../output/qc/somefile.csv"


    def extract_statistics():

        qcp = QCProperties()

        usedata = {
            "properties": PROPERTIES,
            "selectors": SELECTORS,
            "path": PATH,
            "grid": GRID,
            "name": "MYDATA",
        }

        # No "project" argument is needed when running from files
        qcp.get_grid_statistics(data=usedata)

        qcp.to_csv(REPORT)


    if __name__ == "__main__":
        extract_statistics()


**Example when executed from file using Eclipse INIT-file as input:**

.. code-block:: python

    from fmu.tools import QCProperties

    PATH = "../input/qc/"
    GRID = "ECLIPSE.EGRID"
    PROPERTIES = {"PERMX": {"name": "PERMX", "pfile": "ECLIPSE.INIT"}}
    SELECTORS = {
        "FIPNUM": {
            "name": "FIPNUM",
            "pfile": "ECLIPSE.INIT"
        },
    }
    REPORT = "../output/qc/somefile.csv"


    def extract_statistics():

        qcp = QCProperties()

        usedata = {
            "properties": PROPERTIES,
            "selectors": SELECTORS,
            "path": PATH,
            "grid": GRID,
            "name": "from_eclipse",
        }

        qcp.get_grid_statistics(data=usedata)

        qcp.to_csv(REPORT)


    if __name__ == "__main__":
        extract_statistics()


get_well_statistics examples
""""""""""""""""""""""""""""""""

**Example in RMS:**

Example extracting statistics for permeability for each zone and facies.
All wells starting with 33_10 and all 34_11 wells containing "A" will be included
in the statistics. Note the use of python regular expressions!
The result is written to csv.

.. code-block:: python

    from fmu.tools import QCProperties

    WELLS = {
        "names": ["33_10.*", "34_11-.*A.*"],
        "logrun": "log",
        "trajectory": "Drilled trajectory",
    }
    PROPERTIES = {"PERM": {"name": "Klogh"}}
    SELECTORS = ["Zonelog", "Facies_log"]
    REPORT = "../output/qc/somefile.csv"


    def extract_statistics():

        qcp = QCProperties()

        usedata = {
            "properties": PROPERTIES,
            "selectors": SELECTORS,
            "wells": WELLS,
        }

        qcp.get_well_statistics(data=usedata, project=project)

        qcp.to_csv(REPORT)


    if __name__ == "__main__":
        extract_statistics()
        print("Done")


**Example when executed from files:**

Example extracting statistics for permeability for each zone and facies.
Statistics are first extracted for wells starting with "34_10-A", then for wells
starting with "34_10-B" in a subsequent run. The result is written to csv.

.. code-block:: python

    from fmu.tools import QCProperties

    WELLS = ["34_10-A.*"]
    PATH = "../input/qc/"
    PROPERTIES = ["Phit", "Klogh"]
    SELECTORS = ["Zonelog", "Facies_log"]
    REPORT = "../output/qc/somefile.csv"


    def extract_statistics():

        qcp = QCProperties()

        usedata = {
            "properties": PROPERTIES,
            "selectors": SELECTORS,
            "wells": WELLS,
            "path": PATH,
            "name": "A-wells",
        }
        qcp.get_well_statistics(data=usedata)

        # Second run on the B-wells; same settings, different wells and name
        usedata2 = usedata.copy()
        usedata2["wells"] = ["34_10-B.*"]
        usedata2["name"] = "B-wells"

        qcp.get_well_statistics(data=usedata2)

        qcp.to_csv(REPORT)


    if __name__ == "__main__":
        extract_statistics()


get_bwell_statistics examples
""""""""""""""""""""""""""""""""

**Example in RMS:**

Example extracting statistics for permeability for each zone and facies.
All blocked wells will be included in the statistics. The result is written to csv.

.. code-block:: python

    from fmu.tools import QCProperties

    WELLS = {
        "bwname": "BW",
        "grid": "GeoGrid",
    }
    PROPERTIES = {"PERM": {"name": "Klogh"}}
    SELECTORS = ["Zonelog", "Facies_log"]
    REPORT = "../output/qc/somefile.csv"


    def extract_statistics():

        qcp = QCProperties()

        usedata = {
            "properties": PROPERTIES,
            "selectors": SELECTORS,
            "wells": WELLS,
        }

        qcp.get_bwell_statistics(data=usedata, project=project)

        qcp.to_csv(REPORT)


    if __name__ == "__main__":
        extract_statistics()
        print("Done")


**Example when executed from files:**

To come....


Comparison of data from different sources
-------------------------------------------

Advice when comparing data from different sources
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When extracting statistics from different sources, the following tips make the data
easier to compare in the post-analysis, e.g. in WebViz:

* Input "properties" and "selectors" as dictionaries and keep property and selector
  keys identical between the sources. The keys will be the names seen in the dataframe.

* Try to use the same selectors for all sources.

* Keep the option "selector_combos" at True to get as much overlapping data as possible.
  For example, if well statistics only have ZONE as selector and the grid properties
  are calculated with selectors ZONE and REGION and "selector_combos" is True, the
  ZONE level statistics can be compared.

* Use the "codes" field on the selectors to align and match the codenames for each
  selector. For example, if the zone codes are coarser in the grid than in the zonelogs
  from the wells, this field can be used to merge codes in the zonelog together under
  one name.


Example
^^^^^^^^^

The example below collects statistical data from four different sources and writes
the result to a csv-file.
Several steps have been taken to ensure consistency between the sources, making the
resulting csv-file easy to compare:

* "Poro" and "Perm" will be the property names
* "ZONE" will be the column name for the selector
* The zone codes "UpperReek", "MidReek", "LowerReek" are present in the two grids.
  To get the same codes in the wells, the codes are updated and redundant codes are
  excluded.

.. code-block:: python

    from fmu.tools import QCProperties

    REPORT = "../output/qc/somefile.csv"

    GEOGRIDDATA = {
        "properties": ["Poro", "Perm"],
        "selectors": {"ZONE": {"name": "Zone"}},
        "grid": "GeoGrid",
    }

    SIMGRIDDATA = {
        "properties": {"Poro": {"name": "PORO"}, "Perm": {"name": "PERMX"}},
        "selectors": {"ZONE": {"name": "Zone"}},
        "grid": "SimGrid",
    }

    BWDATA = {
        "properties": {"Poro": {"name": "Phit"}, "Perm": {"name": "Klogh"}},
        "selectors": {
            "ZONE": {
                "name": "Zonelog",
                "codes": {1: "UpperReek", 2: "MidReek", 3: "LowerReek"},
                "exclude": ["Above_TopUpperReek", "Below_BaseLowerReek"],
            }
        },
        "wells": {"bwname": "BW", "grid": "GeoGrid"},
    }

    # Same properties and selectors as for the blocked wells; only "wells" differs
    WDATA = BWDATA.copy()
    WDATA["wells"] = {"logrun": "log", "trajectory": "Drilled trajectory"}


    def extract_statistics():

        qcp = QCProperties()

        qcp.get_grid_statistics(data=GEOGRIDDATA, project=project)
        qcp.get_grid_statistics(data=SIMGRIDDATA, project=project)
        qcp.get_bwell_statistics(data=BWDATA, project=project)
        qcp.get_well_statistics(data=WDATA, project=project)

        qcp.to_csv(REPORT)


    if __name__ == "__main__":
        extract_statistics()

.. seealso:: The section below for an example of using the same configuration but
             with yaml-input.


Using yaml input for auto execution
-----------------------------------

A yaml-configuration file can be used with the method ``from_yaml`` to enable an
automatic run of the methods. This is especially useful if the user wants to run
multiple extractions of statistics with minimal code input.

The code evaluates which method to execute based on the value of the first level in the yaml file.
The second level is a list of input 'data' objects, and statistics will be calculated
for each list element.

**Three fields are available for the first level:**

* ``grid``: the get_grid_statistics method is executed on elements in this level
* ``wells``: the get_well_statistics method is executed on elements in this level
* ``blockedwells``: the get_bwell_statistics method is executed on elements in this level


Example in RMS with setting from a YAML file:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Example using yaml input in RMS for extracting statistics for porosity and
permeability from four data sources (geogrid, simgrid, wells and blocked wells).
The resulting combined dataframe is written to csv.

.. code-block:: python

    from fmu.tools import QCProperties

    YAML_PATH = "../input/qc/somefile.yml"
    REPORT = "../output/qc/somefile.csv"


    def extract_statistics():

        qcp = QCProperties()
        qcp.from_yaml(YAML_PATH, project=project)

        qcp.to_csv(REPORT)


    if __name__ == "__main__":
        extract_statistics()

The YAML file may in this case look like:

.. code-block:: yaml

    grid:
      - grid: GeoGrid
        properties:
          - Poro
          - Perm
        selectors:
          ZONE:
            name: Zone

      - grid: SimGrid
        properties:
          Poro:
            name: PORO
          Perm:
            name: PERMX
        selectors:
          ZONE:
            name: Zone

    wells:
      - wells:
          logrun: log
          trajectory: Drilled trajectory
        properties:
          Poro:
            name: Phit
          Perm:
            name: Klogh
        selectors:
          ZONE:
            name: Zonelog
            codes:
              1: UpperReek
              2: MidReek
              3: LowerReek
            exclude:
              - Above_TopUpperReek
              - Below_BaseLowerReek

    blockedwells:
      - wells:
          grid: GeoGrid
          bwname: BW
        properties:
          Poro:
            name: Phit
          Perm:
            name: Klogh
        selectors:
          ZONE:
            name: Zonelog
            codes:
              1: UpperReek
              2: MidReek
              3: LowerReek
            exclude:
              - Above_TopUpperReek
              - Below_BaseLowerReek


Additional Notes
---------------------

Advice on performance
^^^^^^^^^^^^^^^^^^^^^^^^^

Several settings influence performance:

* Filters can be used to remove unnecessary data. This limits the input data before
  statistics are calculated and speeds up execution.
* If many selectors are used, the option ``selector_combos`` can have a high impact
  on performance.


Comparison with statistics in RMS
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* To avoid bias in the calculation, the code removes duplicates from both well and
  blocked well data before calculating statistics. Duplicates are data points that
  have the same coordinates and property values. For blocked wells this refers to
  cells that are penetrated by multiple wells; for raw wells this can happen if
  branches of multilateral wells have overlapping logs. RMS does the same when
  calculating statistics for blocked wells, so statistical values extracted with this
  code will be identical to those from RMS. However, RMS does not remove duplicates
  when calculating statistics for raw wells, so minor differences in statistical
  values are possible.
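
The effect of duplicate removal on an average can be sketched with the standard
library alone. This is only an illustration of the idea described above, not the
code used by ``QCProperties``; the sample layout ``(x, y, z, value)`` and all numbers
are hypothetical.

.. code-block:: python

    from statistics import mean

    # Hypothetical log samples as (x, y, z, value). The second and third rows
    # are the same point, e.g. logged in two overlapping branches.
    samples = [
        (100.0, 200.0, 1500.0, 0.20),
        (101.0, 201.0, 1501.0, 0.30),
        (101.0, 201.0, 1501.0, 0.30),
        (102.0, 202.0, 1502.0, 0.10),
    ]

    # dict.fromkeys keeps the first occurrence of each row and preserves order,
    # so rows with identical coordinates and value collapse to one.
    unique_samples = list(dict.fromkeys(samples))

    avg_with_duplicates = mean(value for *_, value in samples)
    avg_without_duplicates = mean(value for *_, value in unique_samples)

    print(round(avg_with_duplicates, 3))
    print(round(avg_without_duplicates, 3))

In this made-up case the duplicated high value pulls the average up, which is the
kind of bias the duplicate removal is meant to avoid.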