openghg_inversions.postprocessing.stats#

Functions for computing statistics on datasets.

class openghg_inversions.postprocessing.stats.StatsFunction(name, func, params)#

Bases: tuple

Tuple holding a stats function and the parameters it accepts.

Storing the parameters the stats function accepts helps with passing optional arguments to the stats functions without having to specify extra parameters for each stats function separately.

func#

Alias for field number 1

name#

Alias for field number 0

params#

Alias for field number 2
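
As a rough illustration of the idea, the sketch below builds a tuple with the same three fields and fills `params` by inspecting a function's signature. It is not the library's source: the helper `make_stats_function` and the toy function `sample_mean` are hypothetical, and reading parameter names with `inspect.signature` is an assumption about how `params` could be populated.

```python
import inspect
from collections import namedtuple

# Hypothetical re-creation of the tuple described above (not the library's code).
StatsFunction = namedtuple("StatsFunction", ["name", "func", "params"])


def sample_mean(values, sample_dim="draw"):
    """Toy stats function; `sample_dim` is unused and only illustrates a parameter."""
    return sum(values) / len(values)


def make_stats_function(func):
    """Bundle a function with its name and the names of the parameters it accepts."""
    params = list(inspect.signature(func).parameters)
    return StatsFunction(func.__name__, func, params)


sf = make_stats_function(sample_mean)
# Fields are available by name or by position: name is 0, func is 1, params is 2.
```

With the accepted parameter names stored next to the function, a caller can check whether an optional argument applies before passing it on.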

openghg_inversions.postprocessing.stats.calculate_stats(ds: Dataset, stats: list[str] = ['mean', 'quantiles'], **kwargs) Dataset#

Calculate stats on dataset.

Parameters:
  • ds – dataset to calculate stats on.

  • stats – list of stats to calculate.

  • **kwargs – keyword arguments to pass to the stats functions. Each argument is passed to every stats function that accepts a parameter of that name. To pass an argument to one specific stats function only, use <stats func name>__<key>=<value> (note the double underscore). For instance, mode_kde__chunk_dim="country" would specify chunk_dim only for the stats function "mode_kde".

Returns:

dataset containing all stats calculated on all variables in input dataset.

Raises:

ValueError – if a statistic in stats is not found in the registry.
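
The keyword routing described for `**kwargs` can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the helper `select_kwargs` is hypothetical, and letting a prefixed argument override a plain one of the same name is an assumed precedence rule.

```python
def select_kwargs(stat_name, params, kwargs):
    """Pick the keyword arguments that apply to one stats function.

    Assumed rules (not confirmed by the library source):
    - `key=value` is passed if `key` is one of the function's parameters;
    - `<stat_name>__key=value` is passed only to `stat_name` and overrides
      a plain `key` if both are given.
    """
    # Plain keys: keep only those the function accepts.
    selected = {k: v for k, v in kwargs.items() if k in params}

    # Prefixed keys: strip "<stat_name>__" and keep the ones that match.
    prefix = f"{stat_name}__"
    for k, v in kwargs.items():
        if k.startswith(prefix):
            key = k[len(prefix):]
            if key in params:
                selected[key] = v
    return selected


kwargs = {"sample_dim": "draw", "mode_kde__chunk_dim": "country", "thin": 2}

mode_kde_kwargs = select_kwargs(
    "mode_kde", ["ds", "sample_dim", "chunk_dim", "chunk_size"], kwargs
)
mode_kwargs = select_kwargs("mode", ["ds", "sample_dim", "thin"], kwargs)
```

In this sketch `chunk_dim` reaches only `mode_kde`, `thin` reaches only `mode`, and `sample_dim` reaches both.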

openghg_inversions.postprocessing.stats.hdi(ds: Dataset, hdi_prob: float | Iterable[float] = 0.68, sample_dim: str = 'draw')#

Compute highest density interval with the given probabilities.
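
For intuition, a highest density interval of a one-dimensional sample can be approximated by the narrowest window of sorted samples that contains the requested fraction of them. The sketch below works on a plain list and is only an illustration: the function `hdi_1d` is hypothetical, and how the library itself computes the interval (including how it rounds the sample count) is not stated here.

```python
import math


def hdi_1d(samples, hdi_prob=0.68):
    """Return (low, high) of the narrowest interval holding `hdi_prob` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    # Number of samples that must fall inside the interval (assumed rounding rule).
    k = max(1, math.ceil(hdi_prob * n))
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


samples = [0.0, 1.0, 1.1, 1.2, 1.3, 5.0, 9.0, 10.0]
low, high = hdi_1d(samples, hdi_prob=0.5)
```

Unlike an equal-tailed quantile interval, this interval follows the densest part of the sample, so it can be asymmetric around the median.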

openghg_inversions.postprocessing.stats.mean(ds: Dataset, sample_dim='draw')#

Compute sample mean.

openghg_inversions.postprocessing.stats.median(ds: Dataset, sample_dim='draw')#

Compute sample median.

openghg_inversions.postprocessing.stats.mode(ds: Dataset, sample_dim='draw', thin: int = 1)#

Approximate the mode by the midpoint of the shortest interval containing k samples.

The slowest step is sorting. Even so, this is over 30x faster than computing the KDE, unless the KDE version is parallelised by chunking the input.

Thinning by an integer factor produces a corresponding speedup. For instance, if thin = 2 is passed, the running time is roughly halved.
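
The estimator described above can be sketched on a plain list of samples. This is a minimal illustration, not the library's code: the function `shortest_interval_mode` is hypothetical, `k` is taken as an explicit argument because the documentation does not say how it is chosen, and thinning is assumed to mean keeping every `thin`-th sample before sorting.

```python
def shortest_interval_mode(samples, k, thin=1):
    """Midpoint of the narrowest window of k sorted samples, after thinning."""
    # Keep every `thin`-th sample, then sort; sorting dominates the cost.
    xs = sorted(samples[::thin])
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of thinned samples")
    # Find the window of k consecutive sorted samples with the smallest width.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return 0.5 * (xs[start] + xs[start + k - 1])


samples = [9.0, 2.0, 2.5, 0.0, 3.0, 6.0, 12.0, 15.0]
mode_full = shortest_interval_mode(samples, k=3)
mode_thinned = shortest_interval_mode(samples, k=2, thin=2)
```

Thinning by 2 halves the number of values to sort, which is where the speedup comes from, at the cost of a coarser estimate.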

openghg_inversions.postprocessing.stats.mode_kde(ds: Dataset, sample_dim='draw', chunk_dim: str | None = None, chunk_size: int = 10) Dataset#

Calculate the (KDE smoothed) mode of a data array containing MCMC samples.

This can be parallelized if you chunk the DataArray first, e.g.

>>> da_chunked = da.chunk({"basis_region": 10})

openghg_inversions.postprocessing.stats.quantiles(ds: Dataset, quantiles: Sequence[float] = [0.159, 0.841], sample_dim: str = 'draw') Dataset#

Compute quantiles.

Parameters:
  • ds – input dataset; must have dimension specified by sample_dim (default is “draw”)

  • quantiles – sequence of quantiles to compute; default values correspond to mean +/- 1 stdev for a normally distributed sequence.

  • sample_dim – dimension to compute quantiles over; defaults to “draw”, which is the default sample dimension for PyMC outputs.

Returns:

xr.Dataset of specified quantiles, with a new quantile dimension.
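
The default quantiles can be checked against the standard normal distribution: the probability mass below mean - 1 stdev and below mean + 1 stdev is about 0.159 and 0.841. The short check below uses only the standard library; the helper `normal_cdf` is introduced here for illustration.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


lower = normal_cdf(-1.0)  # probability mass below mean - 1 stdev
upper = normal_cdf(1.0)   # probability mass below mean + 1 stdev
```

Rounded to three decimals, `lower` and `upper` match the defaults `[0.159, 0.841]`, so the default interval covers roughly the central 68% of a normally distributed sample.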

openghg_inversions.postprocessing.stats.register_stat(stat: Callable) Callable#

Decorator function to register stats functions.

Parameters:

stat – stats function to register

Returns:

stat, the input function (no modifications made)
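
A registering decorator of this kind can be sketched as below. This is an assumption-laden illustration rather than the library's code: the dictionary `STATS_REGISTRY`, the lookup helper `lookup_stat`, and the toy function `sample_range` are hypothetical, and keying the registry by function name is assumed.

```python
from typing import Callable

# Hypothetical registry; how the library actually stores registered functions
# is not shown on this page.
STATS_REGISTRY: dict[str, Callable] = {}


def register_stat(stat: Callable) -> Callable:
    """Record `stat` under its function name and return it unchanged."""
    STATS_REGISTRY[stat.__name__] = stat
    return stat


@register_stat
def sample_range(values):
    """Toy statistic: difference between the largest and smallest value."""
    return max(values) - min(values)


def lookup_stat(name: str) -> Callable:
    """Fetch a registered statistic, raising ValueError if it is unknown."""
    try:
        return STATS_REGISTRY[name]
    except KeyError:
        raise ValueError(f"statistic {name!r} not found in the registry") from None
```

Because the decorator returns its input unmodified, the decorated function can still be called directly as well as looked up by name.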

openghg_inversions.postprocessing.stats.stdev(ds: Dataset, sample_dim='draw')#

Compute sample standard deviation.