openghg_inversions.models

Reusable model-building helpers for OpenGHG inversions.

class openghg_inversions.models.CoordRegistry(pymc_coords: dict[str, ndarray] = <factory>, original_coords: dict[str, Any] = <factory>, auxiliary_coords: dict[str, DataArray] = <factory>)

Bases: object

Track scientific and PyMC-safe coordinates for a model.

Variables:
  • pymc_coords (dict[str, numpy.ndarray]) – Sanitized coordinates actually registered with PyMC.

  • original_coords (dict[str, Any]) – Original scientific coordinates keyed by model dimension name.

  • auxiliary_coords (dict[str, xarray.core.dataarray.DataArray]) – Additional non-dimension coordinates attached to model dimensions, such as exploded time or site coordinates derived from a stacked nmeasure MultiIndex.

add(coords: dict[str, Any] | Coordinates, *, model_dims: tuple[str, ...] | list[str] | set[str] | None = None) → None

Register model and auxiliary coordinates with consistency checks.

Parameters:
  • coords – Coordinate mapping or xarray coordinate container to register.

  • model_dims – Optional subset of model dimensions represented by the current data variable. Auxiliary coordinates attached to these dimensions are also preserved when possible.

Raises:

ValueError – If the same coordinate name is registered more than once with conflicting lengths, shapes, or values.
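The consistency rule can be illustrated with a small sketch that uses only the standard library. MiniCoordRegistry is a hypothetical stand-in, not the CoordRegistry implementation. It assumes that re-registering identical values is accepted and that any difference in length or values raises ValueError. The site codes are illustrative.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class MiniCoordRegistry:
    """Hypothetical stand-in for illustration; not the library's CoordRegistry."""

    original_coords: dict[str, list[Any]] = field(default_factory=dict)

    def add(self, coords: dict[str, Any]) -> None:
        for name, values in coords.items():
            new_values = list(values)
            existing = self.original_coords.get(name)
            if existing is None:
                # First registration of this dimension name.
                self.original_coords[name] = new_values
            elif existing != new_values:
                # Same name, different length or values: refuse to overwrite.
                raise ValueError(f"conflicting coordinate values for {name!r}")


registry = MiniCoordRegistry()
registry.add({"site": ["MHD", "TAC"]})
registry.add({"site": ["MHD", "TAC"]})  # identical values: accepted

try:
    registry.add({"site": ["MHD", "TAC", "RGL"]})  # different length
    conflict_detected = False
except ValueError:
    conflict_detected = True
```

Failing on the first conflict, rather than silently overwriting, keeps every data variable that shares a dimension aligned to the same coordinate values.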

auxiliary_coords: dict[str, DataArray]
original_coords: dict[str, Any]
pymc_coords: dict[str, ndarray]

class openghg_inversions.models.LinearComponentResult(data: TensorVariable, latent: TensorVariable, output: TensorVariable)

Bases: object

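Container for the tensors created by add_linear_component: the registered data tensor, the effective latent variable, and the aligned output tensor.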

data: TensorVariable
latent: TensorVariable
output: TensorVariable

openghg_inversions.models.add_coords(coords: dict[str, ndarray] | Coordinates, *, model_dims: tuple[str, ...] | list[str] | set[str] | None = None) → None

Register coordinates on the active model and capture scientific metadata.

Parameters:
  • coords – Coordinate mapping or xarray coordinate container to register.

  • model_dims – Optional subset of model dimensions represented by the current data variable. When provided, auxiliary coordinates attached to those dimensions are also stored in the registry.

This helper must be called inside an active pm.Model context.

openghg_inversions.models.add_inferpymc_likelihood_component(data: Dataset, /, mu: TensorVariable, mu_bc: TensorVariable | None, sigprior: dict, offset: TensorVariable | None = None, power: dict | float = 1.99, pollution_events_from_obs: bool = False, no_model_error: bool = False, sigma_per_site: bool = True, output_dim: str = 'nmeasure') → TensorVariable

Add the inferpymc observation model.

mu is the non-baseline forward-model contribution. mu_bc is the baseline contribution, usually H_bc @ bc, plus offset if applicable.

Parameters:
  • data – Canonical inferpymc input dataset.

  • mu – Non-baseline forward-model contribution.

  • mu_bc – Baseline contribution, if present.

  • sigprior – Prior specification for sigma.

  • offset – Optional aligned offset term.

  • power – Scalar or prior specification controlling pollution-event scaling.

  • pollution_events_from_obs – Whether to derive pollution events from the observations instead of mu.

  • no_model_error – Whether to bypass the model-error term.

  • sigma_per_site – Whether sigma varies by site.

  • output_dim – Observation/output dimension name.

Returns:

The epsilon deterministic variable used by the observation model.

openghg_inversions.models.add_linear_component(data: DataArray, /, data_name: str, prior_args: dict, var_name: str, output_name: str, output_dim: str = 'nmeasure', compute_deterministic: bool = True) → LinearComponentResult

Add a linear latent component and its aligned forward-model contribution.

Parameters:
  • data – Sensitivity matrix or other linear data term.

  • data_name – Name used when registering the data as pm.Data.

  • prior_args – Prior specification for the latent random variable.

  • var_name – Name for the latent random variable.

  • output_name – Name for the aligned deterministic output.

  • output_dim – Observation/output dimension name.

  • compute_deterministic – Whether to wrap the aligned output in pm.Deterministic.

Returns:

A LinearComponentResult containing the registered data tensor, the effective latent variable, and the aligned output tensor.
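As a plain-Python illustration of what a linear component computes, the sketch below assumes the aligned output is the matrix-vector product of the sensitivity matrix with the latent vector, in the spirit of the H_bc @ bc notation used for the baseline term. The function name, the matrix orientation (rows as observations), and all numbers are assumptions made for the example. No PyMC objects are involved.

```python
def linear_contribution(sensitivity: list[list[float]], latent: list[float]) -> list[float]:
    """Return one output value per observation (row) as a dot product with `latent`."""
    for row in sensitivity:
        if len(row) != len(latent):
            raise ValueError("each sensitivity row needs one entry per latent element")
    return [sum(h * x for h, x in zip(row, latent)) for row in sensitivity]


# Three observations (rows) and two latent scaling factors (columns); values are illustrative.
H = [
    [1.0, 2.0],
    [0.5, 0.0],
    [0.0, 4.0],
]
x = [2.0, 0.5]

mu = linear_contribution(H, x)  # one value per observation
```

In the real helper the same product would be built from PyMC tensors, so the output stays symbolic until sampling.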

openghg_inversions.models.add_model_data(data: DataArray, name: str | None = None) → TensorVariable

Add labelled xarray data to the active PyMC model.

Parameters:
  • data – Xarray data to register as pm.Data.

  • name – Optional PyMC variable name. If omitted, data.name is used.

Returns:

The registered pm.Data tensor for data.

Raises:

ValueError – If no name can be determined for the data variable.
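The name-resolution rule described above can be sketched without PyMC or xarray. NamedArray and resolve_data_name are hypothetical names used only for this illustration. The sketch assumes that an explicit name takes precedence over data.name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NamedArray:
    """Hypothetical stand-in for an xarray.DataArray; only `name` matters here."""

    name: Optional[str] = None


def resolve_data_name(data: NamedArray, name: Optional[str] = None) -> str:
    """Prefer the explicit name, fall back to data.name, otherwise fail."""
    resolved = name if name is not None else data.name
    if resolved is None:
        raise ValueError("no name given and data.name is not set")
    return resolved


explicit = resolve_data_name(NamedArray(name="H"), name="H_override")
fallback = resolve_data_name(NamedArray(name="H"))

try:
    resolve_data_name(NamedArray())
    unnamed_rejected = False
except ValueError:
    unnamed_rejected = True
```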

openghg_inversions.models.add_offset_component(site_indicator: DataArray, /, prior_args: dict, offset_freq_indicator: DataArray | ndarray | None = None, offset_freq: str | None = None, var_name: str = 'offset_latent', output_name: str = 'offset', output_dim: str = 'nmeasure', drop_first: bool = False) → TensorVariable

Add a site-only or site-by-period offset component.

Parameters:
  • site_indicator – Observation-aligned site indicator.

  • prior_args – Prior specification for the offset latent variable.

  • offset_freq_indicator – Optional explicit observation-aligned offset frequency indicator.

  • offset_freq – Optional frequency string used to derive an indicator when offset_freq_indicator is not provided.

  • var_name – Name for the latent offset variable.

  • output_name – Name for the aligned deterministic offset output.

  • output_dim – Observation/output dimension name.

  • drop_first – Whether to omit the first site indicator column.

Returns:

The aligned offset deterministic variable.
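A minimal sketch of the site-only case, in plain Python. It assumes the site indicator is a one-hot matrix with one row per observation and one column per site, and that dropping the first column leaves the first site with no offset. Both points are assumptions made for illustration, and align_offsets is a hypothetical name, not part of the library.

```python
def align_offsets(site_indicator, latent, drop_first=False):
    """Map per-site offsets onto observations using a one-hot indicator matrix."""
    rows = [row[1:] for row in site_indicator] if drop_first else site_indicator
    for row in rows:
        if len(row) != len(latent):
            raise ValueError("need one latent offset per retained site column")
    return [sum(flag * value for flag, value in zip(row, latent)) for row in rows]


# Rows are observations, columns are sites A, B, C (illustrative values).
site_indicator = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]

all_sites = align_offsets(site_indicator, [1.5, -0.5, 2.0])
without_first = align_offsets(site_indicator, [-0.5, 2.0], drop_first=True)
```

With drop_first=True the latent vector has one fewer element, and observations from the first site receive an offset of zero in this sketch.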

openghg_inversions.models.add_sigma_component(site_indicator: DataArray, /, prior_args: dict, sigma_freq_index: DataArray | None = None, sigma_freq: str | None = None, var_name: str = 'sigma', output_name: str | None = None, per_site: bool = True, output_dim: str = 'nmeasure', compute_deterministic: bool = False) → TensorVariable

Add inferpymc-compatible sigma terms and align them to observations.

Parameters:
  • site_indicator – Observation-aligned site indicator.

  • prior_args – Prior specification for the sigma random variable.

  • sigma_freq_index – Optional explicit observation-aligned frequency indicator.

  • sigma_freq – Optional frequency string used to derive an indicator when sigma_freq_index is not provided.

  • var_name – Name for the latent sigma random variable.

  • output_name – Optional name for an observation-aligned deterministic output.

  • per_site – Whether sigma varies by site.

  • output_dim – Observation/output dimension name.

  • compute_deterministic – Whether to register the aligned sigma term as a deterministic variable.

Returns:

The observation-aligned sigma tensor or deterministic variable.

Raises:

ValueError – If no frequency information is available.

openghg_inversions.models.attach_coord_registry(model: Model, registry: CoordRegistry) → None

Attach a coordinate registry to a PyMC model.

openghg_inversions.models.get_coord_registry(model: Model) → CoordRegistry | None

Return the coordinate registry attached to a PyMC model, if any.

openghg_inversions.models.parse_prior(name: str, prior_params: dict[str, str | float | bool], **kwargs) → TensorVariable

Create a continuous PyMC prior from a prior-parameter dictionary.

Parameters:
  • name – Name of the user-facing PyMC variable to create.

  • prior_params – Prior specification including pdf and any distribution parameters accepted by the chosen PyMC distribution.

  • **kwargs – Additional keyword arguments forwarded to the created PyMC variable, such as dims.

Returns:

The created PyMC random variable or deterministic transform.

Raises:

ValueError – If prior_params["pdf"] does not name a supported PyMC continuous distribution.

This helper must be called inside an active pm.Model context because it registers the created variable with the current model.
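The shape of a prior-parameter dictionary, and the validation of its pdf entry, can be sketched in plain Python. split_prior_spec and SUPPORTED_PDFS are hypothetical. The set of names is an illustrative subset, and the spellings the library accepts may differ. The sketch stops before creating any PyMC variable.

```python
# Illustrative subset only; the names the library accepts may be spelled differently.
SUPPORTED_PDFS = {"normal", "halfnormal", "uniform"}


def split_prior_spec(prior_params: dict) -> tuple[str, dict]:
    """Separate the distribution name from its parameters, validating the name."""
    params = dict(prior_params)  # copy so the caller's dict is not mutated
    pdf = params.pop("pdf", None)
    if pdf not in SUPPORTED_PDFS:
        raise ValueError(f"unsupported pdf: {pdf!r}")
    return pdf, params


spec = {"pdf": "normal", "mu": 1.0, "sigma": 0.5}
pdf_name, dist_kwargs = split_prior_spec(spec)

try:
    split_prior_spec({"pdf": "not_a_distribution"})
    bad_pdf_rejected = False
except ValueError:
    bad_pdf_rejected = True
```

In this sketch the remaining entries would be passed to the chosen distribution as keyword arguments, alongside extras such as dims.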

openghg_inversions.models.restore_inferencedata_coords(idata: InferenceData, coords_or_registry: CoordRegistry | dict[str, Any]) → InferenceData

Restore saved scientific coordinates onto matching InferenceData groups.

Parameters:
  • idata – Inference data object returned by sampling.

  • coords_or_registry – Either a CoordRegistry or a legacy mapping of original coordinates keyed by dimension name.

Returns:

The same InferenceData object with compatible original coordinates and auxiliary coordinates restored onto its xarray groups.
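The restore step can be pictured with plain dictionaries. This sketch assumes that sampled dimensions carry integer positions and that an original coordinate is "compatible" when its length matches the sampled dimension. Both are assumptions made for illustration. restore_coords is a hypothetical helper and does not touch a real InferenceData object.

```python
def restore_coords(sampled_dims: dict[str, list], original_coords: dict[str, list]) -> dict[str, list]:
    """Swap positional indices for saved original values where the lengths agree."""
    restored = {}
    for dim, positions in sampled_dims.items():
        original = original_coords.get(dim)
        if original is not None and len(original) == len(positions):
            restored[dim] = list(original)  # compatible: restore saved values
        else:
            restored[dim] = list(positions)  # unknown or mismatched: leave as sampled
    return restored


sampled = {"nmeasure": [0, 1, 2], "chain": [0, 1]}
originals = {
    "nmeasure": ["2019-01-01T00", "2019-01-01T01", "2019-01-01T02"],
    "site": ["MHD", "TAC"],  # not a sampled dimension, so it is ignored
}

restored = restore_coords(sampled, originals)
```

Dimensions without a saved original, such as chain here, pass through unchanged.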