openghg_inversions.hbmcmc.hbmcmc#

Contains functions for running all steps of the MCMC inversion using PyMC.

This module handles getting data, filtering, applying basis functions, sampling, and processing the outputs.

Notes

If not using on an HPC in the terminal you should do:

export OPENBLAS_NUM_THREADS=XX

and/or:

export OMP_NUM_THREADS=XX

where XX is the number of chains you are running.

If running in Spyder do this before launching Spyder, else you will use every available thread. Apart from being annoying it will also slow down your run due to unnecessary forking.

RHIME with OpenGHG expects ALL data to already be included in the object stores and for the paths to object stores to already be set in the users OpenGHG config file (default location: ~/.openghg/openghg.conf).

openghg_inversions.hbmcmc.hbmcmc.fixedbasisMCMC(species: str, sites: list[str], domain: str, averaging_period: list[str], start_date: str, end_date: str, outputpath: str, outputname: str, bc_store: str = 'user', obs_store: str = 'user', footprint_store: str = 'user', emissions_store: str = 'user', met_model: list | None = None, fp_model: str | None = None, fp_height: list[str] | None = None, fp_species: str | None = None, emissions_name: list[str] | None = None, inlet: list[str] | None = None, instrument: list[str] | None = None, max_level: int | None = None, calibration_scale: str | None = None, obs_data_level: list | None = None, platform: list[str | None] | str | None = None, use_tracer: bool = False, use_bc: bool = True, fp_basis_case: str | None = None, basis_directory: str | None = None, bc_basis_case: str = 'NESW', bc_basis_directory: str | None = None, country_file: str | None = None, bc_input: str | None = None, basis_algorithm: str = 'weighted', nbasis: int = 100, xprior: dict = {'lower': 0.0, 'mu': 1.0, 'pdf': 'truncatednormal', 'sigma': 1.0}, bcprior: dict = {'lower': 0.0, 'mu': 1.0, 'pdf': 'truncatednormal', 'sigma': 0.1}, sigprior: dict = {'lower': 0.1, 'pdf': 'uniform', 'upper': 3}, offsetprior: dict = {'mu': 0, 'pdf': 'normal', 'sigma': 1}, offset_args: dict | None = None, nit: int = 250000, burn: int = 50000, tune: int = 125000, nchain: int = 2, filters: None | list | dict[str, list[str] | None] = None, fix_basis_outer_regions: bool = False, averaging_error: bool = True, bc_freq: str | None = None, sigma_freq: str | None = None, sigma_per_site: bool = True, country_unit_prefix: str | None = None, add_offset: bool = False, verbose: bool = False, reload_merged_data: bool = False, save_merged_data: bool = False, merged_data_dir: str | None = None, merged_data_name: str | None = None, basis_output_path: str | None = None, save_trace: str | Path | bool = False, save_inversion_output: str | Path | bool = False, min_error: Literal['percentile', 'residual'] | None | float = 0.0, calculate_min_error: Literal['percentile', 'residual'] | None = None, min_error_options: dict | None = None, output_format: Literal['hbmcmc', 'paris', 'basic', 'merged_data', 'inv_out', 'mcmc_args', 'mcmc_results'] = 'hbmcmc', paris_postprocessing: bool = False, paris_postprocessing_kwargs: dict | None = None, power: dict | float = 1.99, **kwargs) Dataset | dict#

Script to run hierarchical Bayesian MCMC (RHIME) for inference of emissions.

Uses PyMC to solve the inverse problem. Saves an output from the inversion code using inferpymc_postprocessouts.

Parameters:
  • species – Atmospheric trace gas species of interest (e.g. ‘co2’).

  • sites – List of measurement site names.

  • domain – Model domain. (NB. Does not necessarily correspond to the inversion domain).

  • averaging_period – Averaging period of observations (must match number of sites).

  • start_date – Start time of inversion: “YYYY-mm-dd”.

  • end_date – End time of inversion: “YYYY-mm-dd”.

  • outputname – Unique identifier for output/run name.

  • outputpath – Path to where output should be saved.

  • bc_store – Name of object store containing boundary conditions files.

  • obs_store – Name of object store containing measurements files.

  • footprint_store – Name of object store containing footprints files.

  • emissions_store – Name of object store containing emissions/flux files.

  • met_model – Meteorological model used in the LPDM (e.g. ‘ukv’).

  • fp_model – LPDM used for generating footprints (e.g. ‘NAME’).

  • fp_height – Inlet height modelled for sites in LPDM (must match number of sites).

  • fp_species – Species name associated with footprints in the object store.

  • emissions_name – List of keyword “source” args used for retrieving emissions files from ‘emissions_store’.

  • inlet – Specific inlet height for the site (must match number of sites).

  • instrument – Specific instrument for the site (must match number of sites).

  • max_level – Maximum atmospheric level to extract. Only needed if using satellite data.

  • calibration_scale – Calibration scale to use for measurements data.

  • obs_data_level – Data quality level for measurements data. (must match number of sites).

  • use_tracer – Option to use inverse model that uses tracers of species (e.g. d13C, CO, C2H4).

  • use_bc – When True, use and infer boundary conditions.

  • fp_basis_case – Name of basis function to use for emission.

  • basis_directory – Directory containing the basis function.

  • bc_basis_case – Name of basis case type for boundary conditions (NOTE, I don’t think that currently you can do anything apart from scaling NSEW boundary conditions if you want to scale these monthly).

  • bc_basis_directory – Directory containing the boundary condition basis functions (e.g. files starting with “NESW”).

  • country_file – Path to the country definition file.

  • bc_input – Variable for calling BC data from ‘bc_store’ - equivalent of ‘emissions_name’ for fluxes.

  • basis_algorithm – Select basis function algorithm for creating basis function file for emissions on the fly. Options include “quadtree” or “weighted”. Defaults to “weighted” which distinguishes between land-sea regions.

  • nbasis – Number of basis functions that you want if using quadtree derived basis function. This will optimise to closest value that fits with quadtree splitting algorithm, i.e. nbasis % 4 = 1.

  • xprior – Dictionary containing information about the prior PDF for emissions. The entry “pdf” is the name of the analytical PDF used, see https://docs.pymc.io/api/distributions/continuous.html for PDFs built into pymc3, although they may have to be coded into the script. The other entries in the dictionary should correspond to the shape parameters describing that PDF as the online documentation, e.g. N(1,1**2) would be: xprior={pdf:”normal”, “mu”:1, “sd”:1}. Note that the standard deviation should be used rather than the precision. Currently all variables are considered iid.

  • bcprior – Same as xprior but for boundary conditions.

  • sigprior – Same as xprior but for model error.

  • offsetprior – Same as xprior but for bias offset. Only used is addoffset=True.

  • offset_args – Dictionary of args to pass to make_offset. For instance {“drop_first”: False} will put an offset on all site (rather than using 0 offset for the first site).

  • nit – Number of iterations for MCMC.

  • burn – Number of iterations to burn/discard in MCMC.

  • tune – Number of iterations to use to tune step size.

  • nchain – Number of independent chains to run (there is no way at all of knowing whether your distribution has converged by running only one chain).

  • filters – List of filters to apply to all sites, or dictionary with sites as keys and a list of filters for each site, e.g. filters = {“MHD”: [“pblh_inlet_diff”, “pblh_min”], “JFJ”: None}.

  • fix_basis_outer_regions – When set to True uses InTEM regions to derive basis functions for inner region. Default False.

  • averaging_error – Adds the variability in the averaging period to the measurement error if set to True.

  • bc_freq – The period over which the baseline is estimated. Set to “monthly” to estimate per calendar month; set to a number of days, as e.g. “30D” for 30 days; or set to None to estimate to have one scaling for the whole inversion period.

  • sigma_freq – As bc_freq, but for model sigma.

  • sigma_per_site – Whether a model sigma value will be calculated for each site independently (True) or all sites together (False). Default: True.

  • country_unit_prefix – A prefix for scaling the country emissions. Current options are: ‘T’ will scale to Tg, ‘G’ to Gg, ‘M’ to Mg, ‘P’ to Pg. To add additional options add to convert.prefix. Default is None and no scaling will be applied (output in g).

  • add_offset – Add an offset (intercept) to all sites but the first in the site list. Default False.

  • verbose – When True, prints progress bar of mcmc.inferpymc.

  • reload_merged_data – If True, reads fp_all object from a pickle file, instead of rerunning get_data.

  • save_merged_data – If True, saves the merged data object (fp_all) as a pickle file.

  • merged_data_dir – Path to a directory of merged data objects. For saving to or reading from.

  • merged_data_name – Name of files in which are the merged data objects. For saving to or reading from.

  • basis_output_path – If set, save the basis functions to this path. Used for testing.

  • save_trace – If True, save arviz InferenceData trace to outputpath. Alternatively, a file path (including file name and extension) can be passed, and the trace will be saved there.

  • min_error – If float, the value represents the minimum error. Otherwise, compute min model error using the “residual” method or the “percentile” method. (See openghg_inversions.model_error.py for details.) Combines the functionality of the previous min_error and calculate_min_error parameters. None only an option to accommodate old ini files.

  • calculate_min_error – Is deprecated and will be removed in a future update.

  • min_error_options – Dictionary of additional arguments to pass the function used to calculate min. model error (as specified by min_error).

  • output_format

    Select what is returned/saved by inversion:

    • ”hbmcmc”: (default) return the results of inferpymc_postprocessouts, and save result as netCDF

    • ”merged_data”: return fp_all dictionary, no further processing and inversion not run

    • ”inv_out”: return InversionOutput object

    • ”basic”: return basic output created by new postprocessing submodule

    • ”paris”: return flux and concentration datasets with PARIS formatting; these are also saved as netCDF files in the directory outputpath

    • ”mcmc_args”: return the arguments passed to fixedbasisMCMC, but do not run the inversion

    • ”mcmc_results”: return the results of fixedbasisMCMC with no further processing

  • paris_postprocessing_kwargs – Dict of kwargs to pass to make_paris_outputs.

  • power – Power to raise pollution event size to if using pollution events from obs. Default is 1.99.

Returns:

Results from the inversion in a Dataset if skip_post_processing==False,

in a dictionary if True.

Return type:

xr.Dataset | dict

openghg_inversions.hbmcmc.hbmcmc.rerun_output(input_file: str, outputname: str, outputpath: str, verbose: bool = False) None#

Rerun the MCMC code using inputs from a previous output.

This allows reproducibility of results without the need to transfer all raw input files.

Parameters:
  • input_file – Full path to previously written ncdf file.

  • outputname – Unique identifier new for output/run name.

  • outputpath – Path to where output should be saved.

  • verbose – When True, prints progress bar of mcmc.inferpymc.

Note

At the moment fluxapriori in the output is the mean apriori flux over the inversion period and so will not be identical to the original a priori flux, if it varies over the inversion period.