openghg_inversions.models.coords

Helpers for managing xarray and PyMC coordinate mismatches.

Xarray coordinates can contain rich objects, including MultiIndex coordinates from stacked or ragged dimensions such as nmeasure, which represents stacked (site, time) observations. PyMC does not reliably accept all such objects as model coordinates, so model construction should use sanitized, PyMC-safe coordinates.

The current sanitization policy is intentionally simple: convert each known dimension coordinate to a range index. The original scientific coordinates are stored separately so they can later be restored onto ArviZ InferenceData.

class openghg_inversions.models.coords.CoordRegistry(pymc_coords: dict[str, numpy.ndarray] = <factory>, original_coords: dict[str, typing.Any] = <factory>, auxiliary_coords: dict[str, xarray.core.dataarray.DataArray] = <factory>)

Bases: object

Track scientific and PyMC-safe coordinates for a model.

Variables:
  • pymc_coords (dict[str, numpy.ndarray]) – Sanitized coordinates actually registered with PyMC.

  • original_coords (dict[str, Any]) – Original scientific coordinates keyed by model dimension name.

  • auxiliary_coords (dict[str, xarray.core.dataarray.DataArray]) – Additional non-dimension coordinates attached to model dimensions, such as exploded time or site coordinates derived from a stacked nmeasure MultiIndex.
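To illustrate what "exploded" auxiliary coordinates look like, here is a minimal sketch using only NumPy. It assumes the stacked nmeasure coordinate can be viewed as a sequence of (site, time) pairs. The helper name explode_stacked and the sample values are hypothetical and are not part of openghg_inversions.

```python
import numpy as np


def explode_stacked(pairs):
    """Split a sequence of (site, time) pairs into one array per level.

    Hypothetical helper, for illustration only.
    """
    sites, times = zip(*pairs)
    return {
        "site": np.array(sites),
        "time": np.array(times, dtype="datetime64[D]"),
    }


# Illustrative stacked coordinate: the two sites contribute different
# numbers of observations, so the dimension is ragged.
nmeasure = [
    ("MHD", "2019-01-01"),
    ("MHD", "2019-01-02"),
    ("TAC", "2019-01-01"),
]

aux = explode_stacked(nmeasure)
print(aux["site"])
print(aux["time"])
```

Each resulting array has the same length as nmeasure, so it can sit alongside the range index as a non-dimension coordinate on that dimension.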

add(coords: dict[str, Any] | Coordinates, *, model_dims: tuple[str, ...] | list[str] | set[str] | None = None) → None

Register model and auxiliary coordinates with consistency checks.

Parameters:
  • coords – Coordinate mapping or xarray coordinate container to register.

  • model_dims – Optional subset of model dimensions represented by the current data variable. Auxiliary coordinates attached to these dimensions are also preserved when possible.

Raises:

ValueError – If the same coordinate name is registered more than once with conflicting lengths, shapes, or values.
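The consistency check can be pictured with the following sketch. It is not the library's implementation: register_coord is a hypothetical stand-in that works on a plain dict, and it assumes that re-registering identical values is accepted silently, which the description above implies but does not state outright.

```python
import numpy as np


def register_coord(registry, name, values):
    """Store values under name, or raise if they conflict with an earlier entry.

    Hypothetical stand-in for the check described above.
    """
    new = np.asarray(values)
    if name in registry:
        old = registry[name]
        if old.shape != new.shape or not np.array_equal(old, new):
            raise ValueError(f"Conflicting values for coordinate {name!r}")
        return  # identical re-registration is a no-op (assumed behaviour)
    registry[name] = new


registry = {}
register_coord(registry, "basis_region", ["r0", "r1", "r2"])
register_coord(registry, "basis_region", ["r0", "r1", "r2"])  # same values: accepted

try:
    register_coord(registry, "basis_region", ["r0", "r1"])  # different length
except ValueError as err:
    conflict_message = str(err)
    print(conflict_message)
```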

auxiliary_coords: dict[str, DataArray]
original_coords: dict[str, Any]
pymc_coords: dict[str, ndarray]

openghg_inversions.models.coords.add_coords(coords: dict[str, ndarray] | Coordinates, *, model_dims: tuple[str, ...] | list[str] | set[str] | None = None) → None

Register coordinates on the active model and capture scientific metadata.

Parameters:
  • coords – Coordinate mapping or xarray coordinate container to register.

  • model_dims – Optional subset of model dimensions represented by the current data variable. When provided, auxiliary coordinates attached to those dimensions are also stored in the registry.

This helper must be called inside an active pm.Model context.

openghg_inversions.models.coords.attach_coord_registry(model: Model, registry: CoordRegistry) → None

Attach a coordinate registry to a PyMC model.

openghg_inversions.models.coords.get_coord_registry(model: Model) → CoordRegistry | None

Return the coordinate registry attached to a PyMC model, if any.

openghg_inversions.models.coords.restore_inferencedata_coords(idata: InferenceData, coords_or_registry: CoordRegistry | dict[str, Any]) → InferenceData

Restore saved scientific coordinates onto matching InferenceData groups.

Parameters:
  • idata – Inference data object returned by sampling.

  • coords_or_registry – Either a CoordRegistry or a legacy mapping of original coordinates keyed by dimension name.

Returns:

The same InferenceData object with compatible original coordinates and auxiliary coordinates restored onto its xarray groups.
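The restoration step can be sketched on plain dictionaries, without ArviZ or xarray. This is a simplification under stated assumptions: restore_coords is a hypothetical helper, "compatible" is taken to mean that the dimension exists in the group and the lengths agree, and the sketch returns a new dict rather than modifying an InferenceData object.

```python
import numpy as np


def restore_coords(group_coords, original_coords):
    """Return a copy of group_coords with range indexes swapped for originals.

    Hypothetical helper: a coordinate is restored only when the dimension is
    present in the group and the lengths agree.
    """
    restored = dict(group_coords)
    for dim, original in original_coords.items():
        original = np.asarray(original)
        if dim in restored and len(restored[dim]) == len(original):
            restored[dim] = original
    return restored


# Coordinates as a sampled group might carry them: plain range indexes.
posterior_coords = {"chain": np.arange(2), "site": np.arange(3)}

# Original labels saved when the model was built (illustrative values).
originals = {"site": ["MHD", "TAC", "RGL"], "time": ["2019-01-01"]}

restored = restore_coords(posterior_coords, originals)
print(restored["site"])
```

Here "time" is skipped because the group has no such dimension, and "chain" is left alone because no original was saved for it.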

openghg_inversions.models.coords.sanitize_coords_for_pymc(coords: dict[str, Any] | Coordinates | object, *, model_dims: tuple[str, ...] | list[str] | set[str] | None = None) → dict[str, ndarray]

Convert coordinate metadata into the range-based format to use with PyMC.

PyMC accepts fewer coordinate types than xarray, so for simplicity we convert every coordinate to a range coordinate and use only those range coordinates with PyMC.

Parameters:
  • coords – Coordinate mapping or xarray coordinate container.

  • model_dims – Optional subset of dimensions to sanitize. When omitted, all dimensions found in coords are considered.

Returns:

A mapping from model dimension name to a simple np.arange index of the corresponding length.
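As a rough picture of the documented policy, the sketch below re-implements it for plain dicts only. The name to_range_coords is hypothetical, and the real function also accepts xarray coordinate containers, which this sketch does not handle.

```python
import numpy as np


def to_range_coords(coords, model_dims=None):
    """Map each selected dimension to np.arange(len(values)).

    Hypothetical re-implementation for plain dicts, for illustration only.
    """
    if model_dims is None:
        selected = list(coords)
    else:
        wanted = set(model_dims)
        selected = [dim for dim in coords if dim in wanted]
    return {dim: np.arange(len(coords[dim])) for dim in selected}


# Illustrative coordinates: one stacked dimension and one simple dimension.
coords = {
    "nmeasure": [("MHD", "2019-01-01"), ("MHD", "2019-01-02"), ("TAC", "2019-01-01")],
    "basis_region": ["r0", "r1"],
}

all_dims = to_range_coords(coords)
only_obs = to_range_coords(coords, model_dims=("nmeasure",))
print(all_dims)
print(only_obs)
```

The tuples in nmeasure never reach the output: only the length of each coordinate is used, which is what makes the result safe to hand to PyMC.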