openghg_inversions.inversion_inputs

Functions for creating the inputs needed by PyMC.

openghg_inversions.inversion_inputs.add_min_error(ds: Dataset, fp_data: dict[str, Any], min_error: str | dict[str, float] | float = 0.0, min_error_per_site: bool = True) → Dataset

Add min_error to combined Dataset.

openghg_inversions.inversion_inputs.add_site_indicator(ds: Dataset, sort: bool = False) → Dataset

Add site_indicator and site_names data variables.

openghg_inversions.inversion_inputs.concat_gather_data_arrays(da_dict: Mapping[Hashable, DataArray], key_dim: str, ragged_dim: str, stack_dim: str | None = None, **concat_kwargs) → DataArray

Concatenate DataArrays by gathering along ragged coordinate.

For example, if the keys are site codes and the ragged dimension is time, then the “stacked dimension” will be the usual nmeasure coordinate.

Parameters:
  • da_dict – dictionary of DataArrays

  • key_dim – dimension name for the keys of the dictionary

  • ragged_dim – name of the ragged dimension

  • stack_dim – name for the “stacked” multi-index dimension

  • **concat_kwargs – arguments to pass to xr.concat

Returns:

Combined DataArray with new stacked dimension.
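To make the resulting layout concrete, here is a minimal sketch of the idea rather than the library's implementation. It assumes 1-D inputs indexed only by time, assembles the result by hand with NumPy, and keeps site and time as plain coordinates on nmeasure instead of building a multi-index. The helper name gather_1d_sketch and the sample values are made up for the example.

```python
import numpy as np
import xarray as xr


def gather_1d_sketch(da_dict, key_dim="site", ragged_dim="time", stack_dim="nmeasure"):
    """Illustrative only: flatten 1-D DataArrays of differing length into one."""
    values, keys, ragged = [], [], []
    for key, da in da_dict.items():
        values.append(da.values)
        # Repeat the dictionary key once per element along the ragged dimension.
        keys.append(np.full(da.sizes[ragged_dim], key))
        ragged.append(da[ragged_dim].values)
    return xr.DataArray(
        np.concatenate(values),
        dims=stack_dim,
        coords={
            key_dim: (stack_dim, np.concatenate(keys)),
            ragged_dim: (stack_dim, np.concatenate(ragged)),
        },
    )


# Two sites with different numbers of observations (hypothetical values).
tac = xr.DataArray(
    [1.0, 2.0, 3.0],
    dims="time",
    coords={"time": np.array(["2019-01-01", "2019-01-02", "2019-01-03"], dtype="datetime64[ns]")},
)
mhd = xr.DataArray(
    [10.0, 20.0],
    dims="time",
    coords={"time": np.array(["2019-01-01", "2019-01-03"], dtype="datetime64[ns]")},
)

combined = gather_1d_sketch({"TAC": tac, "MHD": mhd})
```

Here combined has a single nmeasure dimension of length 5, and each element carries both the site it came from and its original time.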

openghg_inversions.inversion_inputs.concat_gather_datasets(ds_dict: Mapping[Hashable, Dataset], key_dim: str, ragged_dim: str, stack_dim: str | None = None, **concat_kwargs) → Dataset

Concatenate dictionary of xr.Datasets by gathering ragged coordinates.

This assumes that all datasets have the same data variables.

TODO: need to handle missing data variables.
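Since the function assumes matching data variables, it may help to check that assumption before calling it. The following is a hedged sketch of such a check, not part of openghg_inversions: the helper find_data_var_mismatches and the variable names mf and mf_error are hypothetical.

```python
import xarray as xr


def find_data_var_mismatches(ds_dict):
    """Return {key: (missing, extra)} for datasets whose variables differ from the first."""
    items = list(ds_dict.items())
    if not items:
        return {}
    # Use the first dataset's variables as the reference set.
    reference = set(items[0][1].data_vars)
    mismatches = {}
    for key, ds in items[1:]:
        names = set(ds.data_vars)
        if names != reference:
            mismatches[key] = (sorted(reference - names), sorted(names - reference))
    return mismatches


ds_tac = xr.Dataset({"mf": ("time", [1.0, 2.0]), "mf_error": ("time", [0.1, 0.1])})
ds_mhd = xr.Dataset({"mf": ("time", [3.0])})

problems = find_data_var_mismatches({"TAC": ds_tac, "MHD": ds_mhd})
```

An empty result means every dataset has the same data variables as the first one.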

openghg_inversions.inversion_inputs.concat_gather_datatree(dt: DataTree, key_dim: str, ragged_dim: str, stack_dim: str | None = None, **concat_kwargs) → Dataset

Concatenate xr.DataTree children by gathering ragged coordinates.

This assumes that all children have the same data variables.

openghg_inversions.inversion_inputs.make_freq_indicator(time: DataArray, freq: Literal['monthly'] | str, *, anchor_time: str | datetime | datetime64 | Timestamp | None = None) → DataArray

openghg_inversions.inversion_inputs.make_inv_inputs(fp_data: dict[str, Any], sites: list[str] | None = None, bc_freq: Literal['monthly'] | str | None = None, sigma_freq: Literal['monthly'] | str | None = None, min_error: str | dict[str, float] | float = 0.0, min_error_per_site: bool = True, start_date: str | datetime | datetime64 | Timestamp | None = None) → Dataset

openghg_inversions.inversion_inputs.make_sigma_freq(time: DataArray, freq: Literal['monthly'] | str | None = None, anchor_time: str | datetime | datetime64 | Timestamp | None = None) → DataArray

openghg_inversions.inversion_inputs.make_site_indicator(site_coord: DataArray) → DataArray

Make site_indicator from DataArray of site names.

For instance, the values ["TAC", "TAC", "MHD"] would be converted to [0, 0, 1].
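The mapping in this example can be reproduced with plain Python and NumPy. This is only a sketch: it assumes codes are assigned in order of first appearance, which is what the example values imply, and the per-site numbers are invented to show one way such an indicator might be used to expand per-site values to per-observation values.

```python
import numpy as np

sites = ["TAC", "TAC", "MHD"]

# Assign each site the next unused integer the first time it is seen.
code_for_site = {}
site_indicator = np.array([code_for_site.setdefault(s, len(code_for_site)) for s in sites])
site_names = list(code_for_site)

# Hypothetical per-site values, ordered like site_names.
per_site_sigma = np.array([0.5, 2.0])
# Indexing with the indicator repeats each site's value for its observations.
per_obs_sigma = per_site_sigma[site_indicator]
```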

openghg_inversions.inversion_inputs.make_site_names(site_coord: DataArray) → DataArray

Make site names DataArray corresponding to site indicator.

openghg_inversions.inversion_inputs.transform_bc(ds: Dataset, freq: Literal['monthly'] | str | None = None, anchor_time: str | datetime | datetime64 | Timestamp | None = None) → Dataset

Transform ds so that ds.H_bc is expressed in (curtain, period) coordinates.

openghg_inversions.inversion_inputs.xr_factorize(da: DataArray, indicator_name: str, label_name: str, label_dim: str, sort: bool = False) → Dataset

Create Dataset with integer indicators and labels for DataArray.

Parameters:
  • da – DataArray to find indicator for.

  • indicator_name – name for indicator data variable

  • label_name – name for label data variable

  • label_dim – dimension for labels

  • sort – if True, the labels will be sorted and the indicator shuffled accordingly

Returns:

Dataset with indicator and label data variables.
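The effect of the parameters above can be sketched with pandas.factorize, which returns integer codes and unique labels and accepts a sort flag. This is an illustration under assumptions, not the library's code: it handles 1-D input only, and the helper name factorize_sketch and the dimension name nsite are hypothetical.

```python
import pandas as pd
import xarray as xr


def factorize_sketch(da, indicator_name, label_name, label_dim, sort=False):
    """Illustrative only: integer codes plus unique labels for a 1-D DataArray."""
    codes, labels = pd.factorize(da.values, sort=sort)
    return xr.Dataset(
        {
            # Codes keep the dimension of the input array.
            indicator_name: (da.dims, codes),
            # Labels live on their own, shorter dimension.
            label_name: ((label_dim,), labels),
        }
    )


site = xr.DataArray(["TAC", "TAC", "MHD"], dims="nmeasure")
unsorted = factorize_sketch(site, "site_indicator", "site_names", "nsite")
in_order = factorize_sketch(site, "site_indicator", "site_names", "nsite", sort=True)
```

With sort=False the labels follow order of first appearance. With sort=True they are sorted alphabetically and the codes are renumbered to match, so indexing the labels by the indicator recovers the original values in both cases.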

openghg_inversions.inversion_inputs.xr_unique_inv(da: DataArray, sort: bool = True) → DataArray