openghg_inversions.hbmcmc.hbmcmc_post_process#
Script to process HBMCMC (RHIME) output.
This module includes functions for processing and visualizing HBMCMC inversion results.
Functions#
- write_netcdf, append_netcdf
Write output to netCDF files.
- plot_scaling
Plot posterior scaling map.
- regions_histogram
Plot histogram of number of regions.
- country_emissions
Calculate emissions from given list of countries. Currently hard-wired for methane.
- openghg_inversions.hbmcmc.hbmcmc_post_process.calculate_DIC(ds, silence=False)#
Calculates the Deviance information criterion (DIC) for an inversion. It does this using two different definitions: 1) Spiegelhalter et al. (2002) https://doi.org/10.1111/1467-9868.00353 2) Gelman et al. (2004) http://www.stat.columbia.edu/~gelman/research/published/waic_understand3.pdf.
The DIC is similar to metrics like AIC (Akaike Information Criterion) but better suited to hierarchical models and MCMC. What this is useful for is for testing things such as whether increasing the number of basis functions improves things, or is it just fitting to the noise and making uncertainty larger. The lower the DIC the better. It should be noted that this doesn’t give a hard and fast description of the suitability of a statistical model but is the most useful indicator I can find.
- openghg_inversions.hbmcmc.hbmcmc_post_process.check_missing_dates(filenames, dates, labels=[])#
Check for missing dates from a list of filenames.
- Parameters:
filenames – List of filenames to check.
dates – List of expected dates to check in filenames.
labels – List of labels for the dates that do match.
- Returns:
- (dates, labels) - List of dates with matching filenames and
corresponding list of labels. labels associated with dates, as specified in input
- Return type:
- openghg_inversions.hbmcmc.hbmcmc_post_process.check_platform(site: str, network: str | None = None) str | None#
This function extracts platform (if specified) for the site from site_info.json file. network can be specified if site is associated with more than one network. If not specified, the first network will be used by default.
- Parameters:
site – Site code (if applicable) or name
network – If a site is part of multiple networks, will select a given one and check the platform
- Returns:
- platform type (e.g. “site”, “satellite”, “aircraft”) if specified by site_info.json,
otherwise None
- Return type:
str or None
- openghg_inversions.hbmcmc.hbmcmc_post_process.country_emissions(ds, species, domain, country_file=None, country_unit_prefix=None, countries=None)#
Extract individual country emissions from a dataset.
- Parameters:
ds – Output dataset from HBMCMC inversion.
species – Species run in the inversion (e.g. ‘hfc23’).
domain – Domain over which the inversion was run (e.g. ‘EASTASIA’).
country_file – Country file from which to extract country definitions. Defaults to None, in which case the function looks for it in ‘data/countries/[domain]’ using the utils.get_country function.
country_unit_prefix – Prefix for which to report emissions in (e.g. ‘G’ for Gg). Conversion done by convert.prefix. Defaults to None, in which case emissions are reported in g.
countries – Array of country names for which to calculate emissions for. Defaults to None, in which case these are extracted from the country file.
- Returns:
- (cntrymean, cntry68, cntry95, cntryprior, cntrymode) where:
cntrymean: 1D array of mean emissions from each country
cntry68: 2D array of 68% CI emissions from each country
cntry95: 2D array of 95% CI emissions from each country
cntryprior: 1D array of prior emissions from each country
cntrymode: 1D array of mode emissions from each country
- Return type:
- openghg_inversions.hbmcmc.hbmcmc_post_process.country_emissions_mult(ds_list, species, domain, country_file=None, country_unit_prefix=None, countries=None)#
Calculate country emissions across multiple datasets.
See process.country_emissions() function for details of inputs.
- Returns:
- (cntrymean_arr, cntry68_arr, cntry95_arr, cntryprior_arr) where:
cntrymean_arr: Array of country means for each ds, with size [number of ds x number of countries]
- cntry68_arr: Array of 68th percentile upper and lower bounds of country emissions for each ds,
with size [number of ds x number of countries x 2]
- cntry95_arr: Array of 95th percentile upper and lower bounds of country emissions for each ds,
with size [number of ds x number of countries x 2]
cntryprior_arr: Array of country priors for each ds, with size [number of ds x number of countries].
- Return type:
- openghg_inversions.hbmcmc.hbmcmc_post_process.define_stations(ds: Dataset, sites: list[str] | None = None, use_site_info: bool | None = False) dict | None#
Define latitude and longitude values for each site in a dataset.
The output can be passed directly as the ‘stations’ argument in plot_map.
If sites is not specified, if the platform for the site is listed as “aircraft” or “satellite” in the site_info.json file then no values are included in the stations dictionary for this site.
- Parameters:
ds –
Output from run_hbmcmc() function. Expects dataset to contain:
sitelons: Longitude values for each site. Dimension = len(sites)
sitelats: Latitude values for each site. Dimension = len(sites)
y_site: Site identifier for each measurement. Dimension = nmeasure
sites – List of sites to look for within dataset. If not specified, the sites will be extracted from the input dataset assuming a data variable “sites” is included within the dataset.
use_site_info – Use positions from openghg_defs rather than extract them from the tdmcmc dataset. Default = False.
- Returns:
Dictionary containing sitelats, sitelons for each site.
- Return type:
dict or None
- openghg_inversions.hbmcmc.hbmcmc_post_process.extract_hbmcmc_files(directory, species, domain, runname, dates, return_filenames=False)#
Find hbmcmc output filenames based on naming convention.
Naming convention: “directory”/”species”+”domain”+”runname”_”date”.nc” Open as xarray.Dataset objects and return as a list.
- Parameters:
directory – Path to output directory where hbmcmc files are written.
species – Species of inversion (e.g. “hfc23”).
domain – Domain of inversion (e.g. “EASTASIA”).
runname – Name of run (as specified in .ini file).
dates – List of dates of the inversion, as specified at the top of the .ini file and in the output file name.
return_filenames – Whether to return the filenames. Defaults to False.
- Returns:
- If return_filenames is True, returns (ds_list, filenames), otherwise just ds_list.
ds_list: List of xarray datasets matching the input parameters
filenames: List of filenames (only if return_filenames is True)
- Return type:
- openghg_inversions.hbmcmc.hbmcmc_post_process.open_ds(path)#
Function efficiently opens xr datasets.
- Parameters:
path (str) – path to xarray dataset
- Returns:
dataset
- Return type:
xr.Dataset
- openghg_inversions.hbmcmc.hbmcmc_post_process.plot_abs_map(ds_list, species, lat=None, lon=None, grid=True, clevels=None, divergeCentre=None, cmap=<matplotlib.colors.LinearSegmentedColormap object>, borders=True, labels=None, plot_stations=True, use_site_info=False, smooth=False, out_filename=None, fignum=None, title=None, extend='both', figsize=None, flux_data_var='fluxmode')#
Plot 2D map(s) of posterior x in g/m2/s.
- Parameters:
ds_list –
List of xarray.Dataset objects. Each dataset is an output from run_tdmcmc() function (tdmcmc_inputs.py script). Expects each data set to contain:
x_post_vit: posterior values for each iteration flattened along lat-lon axis. Dimensions = nIt x NGrid (nlat x nlon)
q_ap: a priori flux values on a latitude x longitude grid. Dimensions = nlat x nlon
lat – Data array of lat values to plot over - must match values in ds exactly.
lon – Data array of lon values to plot over - must match values in ds exactly.
species – Species for the tdmcmc output.
grid – Whether to plot the posterior on one figure as a grid or on individual plots.
labels – Can specify either one label for all plots (str) or a different label for each plot. If list is specified, it must match number of datasets in ds_list.
plot_stations (bool, optional) – Plot site positions on the output map. Will not plot aircraft or satellite positions.
use_site_info (bool, optional) – If plotting site positions, use positions from site_info.json file rather than extract them from the tdmcmc dataset. Default = False.
flux_data_var (str) – Which measure of flux distribution (i.e. mode) to plot. Defaults to ‘fluxmode’, which is the only one in the output files currently
inputs. (See plot_map() function for definition of remaining)
- Returns:
None
- If out_filename specified:
Plot is written to file
- Otherwise:
Plot is displayed interactively
- openghg_inversions.hbmcmc.hbmcmc_post_process.plot_country_timeseries(country_mean, country_CI, country_prior, d0, country_label='', prior_label='Prior', posterior_label='Posterior', y_label='emissions', units='g', figsize=(7, 3))#
Plot timeseries of country emissions. Requires more than one time stamp.
- Parameters:
country_mean (data array) – A 1D array of emissions over time
country_CI (data array) – A 2D array of upper and lower bounds of emissions over time
country_prior (data array) – A 1D array of prior emissions over time
d0 (data array) – 1D array of time stamps. Must match the length of country_mean, country_CI and country_prior
country_label (str) – Label for the country being plotted. Defaults to empty string
prior_label (str) – Label for the prior emissions. Defaults to “Prior”
posterior_label (str) – Label for the posterior emissions. Defaults to “Posterior”
y_label (str) – Label for the y-axis (which is labelled [country_label] [y_label]). Defaults to “emissions”
units (str) – Units for labelling of y-axis. Defaults to “g”
figsize (tuple) – Size of figure. Defaults to (7,3).
- openghg_inversions.hbmcmc.hbmcmc_post_process.plot_diff_map(ds_list, species, lat=None, lon=None, grid=True, clevels=None, divergeCentre=None, centre_zero=True, cmap=<matplotlib.colors.LinearSegmentedColormap object>, borders=True, labels=None, plot_stations=True, use_site_info=False, smooth=False, out_filename=None, fignum=None, title=None, extend='both', figsize=None, flux_data_var='fluxmode')#
Plot 2D map(s) of the difference between the prior and posterior x in g/m2/s.
- Parameters:
ds_list –
List of xarray.Dataset objects. Each dataset is an output from run_tdmcmc() function (tdmcmc_inputs.py script). Expects each data set to contain:
x_post_vit: posterior values for each iteration flattened along lat-lon axis. Dimensions = nIt x NGrid (nlat x nlon)
q_ap: a priori flux values on a latitude x longitude grid. Dimensions = nlat x nlon
lat – Data array of lat values to plot over - must match values in ds exactly.
lon – Data array of lon values to plot over - must match values in ds exactly.
species – Species for the tdmcmc output.
grid – Whether to plot the posterior on one figure as a grid or on individual plots.
labels (str/list) – Can specify either one label for all plots (str) or a different label for each plot. If list is specified, it must match number of datasets in ds_list.
plot_stations (bool, optional) – Plot site positions on the output map. Will not plot aircraft or satellite positions.
use_site_info (bool, optional) – If plotting site positions, use positions from site_info.json file rather than extract them from the tdmcmc dataset. Default = False.
flux_data_var (str) – Which measure of flux distribution (i.e. mode) to plot. Defaults to ‘fluxmode’, which is the only one in the output files currently
inputs. (See plot_map() function for definition of remaining)
- Returns:
None
- If out_filename specified:
Plot is written to file
- Otherwise:
Plot is displayed interactively
- openghg_inversions.hbmcmc.hbmcmc_post_process.plot_map(data, lon, lat, clevels=None, divergeCentre=None, cmap=<matplotlib.colors.LinearSegmentedColormap object>, borders=True, label=None, smooth=False, out_filename=None, stations=None, fignum=None, title=None, extend='both', figsize=None, fig=None, ax=None, show=True)#
Plot 2D map of data.
e.g. scaling map of posterior x i.e. degree of scaling applied to prior emissions. Mainly used within the wrappers of plot_abs_map, plot_diff_map etc.
- Parameters:
data – 2D (lat,lon) array of whatever you want.
lon – Longitude array matching to data grid.
lat (numpy.array) – Latitude array matching to data grid
clevels (numpy.array, optional) – Array of contour levels; if None, uses the set_clevels function
divergeCentre (float/None, optional) –
- Default is None, to replicate original clevels behaviour.
If given a float, this value is used to manually set the centre value of a diverging cmap, while using the min and max values of clevels as the min and max values of the cmap.
cmap (matplotlib.cm, optional) – Colormap object; defaults to Red Blue reverse (plt.cm.RdBu_r)
borders (bool, optional) – Add country borders as well as coastlines. Default = True.
label (str, optional) – Label to appear underneath the colorbar. Default = None
smooth (bool, optional) – If True plot smooth contours; otherwise use pcolormesh. Default = False.
out_filename (str, optional) – Output filename. If this is specified the plot will be written to file. Will be shown interactively otherwise (if show=True). Default = None.
stations (dict, optional) –
Default is None. If specified needs to be a dictionary containing the list of sites and site locations for each site. For example:
{“sites”: [‘MHD’, ‘TAC’], “MHDlons”: -9.02, “MHDlats”: 55.2, “TAClons”: etc…}
This is the default output from the define_stations function, but can’t default to this as the data argument doesn’t have a sitenames attribute
fignum (int, optional) – Figure number for created plot. Default = None
title (str, optional) – Title for the plot or sub-plot. Default = None
extend (str, optional) – Extend colorbar for out-of-range values. Options are [ ‘neither’ | ‘both’ | ‘min’ | ‘max’ ] Default = “both”. Set to “neither” to not extend.
figsize (tuple/None, optional) – Figure size tuple if creating a new fig object. Default = None.
fig (matplotlib.pyplot.Figure, optional) – Figure object for plot. If not specified this will be created. Default = None
ax (matplotlib.pyplot.Axes, optional) – Axes object for plot domain. If not specified this will be created. Default = None
show (bool, optional) – Whether to plot immediately upon completion of plotting within this function. Note that out_filename supercedes this option and plot will be written to file even if this is set to True. Default = True.
- Returns:
None
- If out_filename is None:
Created plot is saved to file
- Else:
Plot is displayed interactively
- openghg_inversions.hbmcmc.hbmcmc_post_process.plot_map_mult(data_all, lon, lat, grid=True, subplot='auto', clevels=None, divergeCentre=None, centre_zero=False, cmap=<matplotlib.colors.LinearSegmentedColormap object>, borders=True, labels=None, smooth=False, out_filename=None, stations=None, fignum=None, title=None, extend='both', figsize=None)#
Use plot_map function to plot a set of maps either on a grid or as separate figures.
If plotting on a grid the subplots are either determined automatically based on shape of input or using subplot input.
- Expect data_all to either be:
a numpy.array of the shape: nlat x nlon (x ngrid)
list of numpy.array objects each of shape nlat x nlon.
Either the ngrid dimension or the len of the list is taken as the number of panels to include on the plot.
- Parameters:
data_all – Multiple lat-lon grids to be plotted on one figure as a set of sub-plots or as multiple figures. Can either be a list of grids or an array of dimension nlat x nlon (x ngrid).
lon – Longitude array matching to longitude points in each grid in grid_data.
lat – Latitude array matching to longitude points in each grid in grid_data.
grid – Whether to plot on a grid. Default = True.
subplot – If grid is True, subplot grid to use. If this is set to “auto” this will be automatically determined based on the size of ngrid (see subplot_fmt() function). Otherwise, this should be a two item list of [nrows, ncols]. Default = “auto”.
labels – Can specify either one label for all plots (str) or a different label for each plot as a list. If list is specified, it must match ngrid length.
Note
See plot_map() function for definition of remaining inputs.
- Returns:
None. If out_filename specified, plot is written to file. Otherwise, plot is displayed interactively.
- openghg_inversions.hbmcmc.hbmcmc_post_process.plot_multi_country_timeseries(ds_list, species, domain, countries, country_file, d0, country_unit_prefix=None, plot_prior=True, figsize=(7, 3))#
Plot country emissions timeseries for given inversions and specified countries.
- Parameters:
ds_list (list) – list of RHIME hbmcmc output datasets
species (str) – species for which the inversion was run
domain (str) – domain over which the inversion was run
countries (list) – list of countries to plot (must be in countryfile)
country_file (filepath) – country file
d0 (array) – 1D array of dates corresponding to the ds_list
country_unit_prefix (str) – prefix for which to report emissions in (e.g. ‘G’ for Gg). Conversion done by convert.prefix. Defaults to None, in which case emissions are reported in g
plot_prior (bool) – whether to plot the prior emissions. defaults to True
figsize (tuple) – figure size (defaults to (7,3))
- openghg_inversions.hbmcmc.hbmcmc_post_process.plot_scale_map(ds_list, lat=None, lon=None, grid=True, clevels=None, divergeCentre=None, centre_zero=False, cmap=<matplotlib.colors.LinearSegmentedColormap object>, borders=True, labels=None, plot_stations=True, use_site_info=False, smooth=False, out_filename=None, fignum=None, title=None, extend='both', figsize=None)#
Plot 2D scaling map(s) of posterior x.
This is the degree of scaling which has been applied to prior emissions.
- Parameters:
ds_list –
List of xarray.Dataset objects. Each dataset is an output from run_tdmcmc() function (tdmcmc_inputs.py script). Expects each data set to contain:
x_post_vit: posterior values for each iteration flattened along lat-lon axis. Dimensions = nIt x NGrid (nlat x nlon)
lat – Data array of lat values to plot over - must match values in ds exactly.
lon – Data array of lon values to plot over - must match values in ds exactly.
grid – Whether to plot the posterior on one figure as a grid or on individual plots.
labels – Can specify either one label for all plots (str) or a different label for each plot. If list is specified, it must match number of datasets in ds_list.
plot_stations – Whether to plot station locations. Plot site positions on the output map. Will not plot aircraft or satellite positions.
use_site_info (bool, optional) – If plotting site positions, use positions from site_info.json file rather than extract them from the tdmcmc dataset. Default = False.
inputs. (See plot_map() function for definition of remaining)
- Returns:
None
- If out_filename specified:
Plot is written to file
- Otherwise:
Plot is displayed interactively
- openghg_inversions.hbmcmc.hbmcmc_post_process.plot_timeseries(ds, fig_text=None, ylim=None, out_filename=None, doplot=True, figsize=None, plot_prior=False, plot_bc_prior=False)#
Plot measurement timeseries of posterior and observed measurements Requires post_mcmc xr dataset. Can plot to console or save to file. Plots 95% CI by default. For future: incorporate model & measurement uncertainty Plots separate subplots for each of the measurement sites - hopefully!
- Parameters:
ds (xarray dataset) – dataset output from run_tdmcmc script
fig_text (String) – e.g. “CH$_{4}$ mole fraction (ppb)”. Defaults to None
ylim (array) – y-axis limits [ymin,ymax]. If not specified, set automatically
out_filename (string) – Filename to save file, if specified. Defaults to None (figure isn’t saved)
doplot (bool) – Plot to console? (optional, defaults to True)
figsize (tuple) – Specify size of figure as a two-item tuple.
plot_prior (bool) – Plot mole fraction prior.
plot_bc_prior (bool) – Plot inner boundary conditions prior.
- openghg_inversions.hbmcmc.hbmcmc_post_process.set_clevels(data, num_tick=20.0, tick=None, centre_zero=False, above_zero=False, rescale=False, robust=False)#
The set_clevels function defines a set of contour levels for plotting based on the inputs values.
- Parameters:
data (iterable) – Data which will be plotted.
num_tick (int) – Number of ticks on axis within levels. Either this or tick should be specified. Default = 20
tick (int/None) – Tick interval to use between minimum and maximum data values. Either this or num_tick should be specified. Default = None i.e. use num_tick rather than set an explicit tick interval
centre_zero (bool, optional) – Explictly centre levels around zero. Default = False.
above_zero (bool, optional) – Explicitly set clevels based on percentiles of positive data only. Default = False
rescale (bool, optional) – Rescale according to the most appropriate unit. This will rescale based on 10^3 and return the scaling factor used. Default = False
robust (bool, optional) – Based on xarray plotting. This finds the 2nd and 98th percentiles (rather than min and max) to account for any outliers which would cause the range to be too large. Default = False
- Returns:
Array of levels values based on min and max of input data. Also returns scaling factor if rescale=True.
- Return type:
np.array[,float]
- openghg_inversions.hbmcmc.hbmcmc_post_process.subplot_fmt(num: int, row_dims: list[int] = [3, 2, 4], fill: bool | None = False) tuple[int, int]#
Decide the placement of a grid of figures dependent on the number.
The row_dims input determines which placement is preferable for the user.
- Parameters:
num – Number of figures to be placed.
row_dims –
Row dimensions in order of preference. For the default row_dims=[3,2,4] the preferences of placement is as follows:
equal rows of 3
equal rows of 2
equal rows of 4
If none of the above are possible the format will be num x number of columns if fill is True or the configuration suitable for num+1 if fill is False.
fill – All panels in subplot must be filled. If not, for uneven numbers an extra panel will be added which will be left blank when plotting. Default = False (i.e. allow an empty panel to be included within subplot).
- Returns:
2 item tuple containing the row number and column number for the subplots.
- Return type:
- openghg_inversions.hbmcmc.hbmcmc_post_process.unbiasedDivergingCmap(data, zero=0, minValue=None, maxValue=None)#
Calculate the normalisation of a diverging cbar around a given value.
Prevents bias due to asymmetry in data affecting scale.
- Parameters:
data – numpy array of data to plot
zero – the centre value of the cmap
minValue – smallest value to use in calculation
maxValue – largest value to use in calculation
- Returns:
a normalization function to be fed into plot