openghg_inversions.array_ops#

General methods for xarray Datasets and DataArrays.

The functions here are not specific to OpenGHG inversions: they add functionality missing from xarray. These functions should accept xarray Datasets and DataArrays, and return either a Dataset or a DataArray.

Functions#

get_xr_dummies

Applies pandas get_dummies to xarray DataArrays.

sparse_xr_dot

Multiplies a Dataset or DataArray by a DataArray with a sparse underlying array. The built-in xarray operator @ can fail when sparse and dask arrays are mixed, and it does not support DataArray @ Dataset.

openghg_inversions.array_ops.align_sparse_lat_lon(sparse_da: DataArray, other_array: DataWithCoords) DataArray#

Align lat/lon coordinates of sparse_da with lat/lon coordinates from other_array.

NOTE: This is a workaround for an xarray issue: pydata/xarray#3445

Parameters:
  • sparse_da – xarray DataArray with sparse underlying array

  • other_array – xarray Dataset or DataArray whose lat/lon coordinates should be used to replace the lat/lon coordinates in sparse_da

Returns:

copy of sparse_da with lat/lon coords from other_array

Return type:

xr.DataArray

openghg_inversions.array_ops.get_xr_dummies(da: DataArray, categories: Sequence[Any] | Index | DataArray | ndarray | None = None, cat_dim: str = 'categories', return_sparse: bool = True) DataArray#

Create 0-1 dummy matrix from DataArray with values that correspond to categories.

If the values of da are the integers 0 to N, then the result has N + 1 columns, and the (i, j) entry of the result is 1 if da[i] == j and 0 otherwise.

This function works like the pandas function get_dummies, but preserves the coordinates of the input data and allows the user to specify coordinates for the categories used to make the “dummies” (or “one-hot encoding”).
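The 0-1 encoding described above can be sketched with plain NumPy. This is only an illustration of the dummy-matrix idea, not the implementation of get_xr_dummies: the helper name one_hot and the fallback to sorted distinct values are assumptions, and the sketch returns a dense array without any xarray coordinates.

```python
import numpy as np


def one_hot(values, categories=None):
    """Dense 0-1 dummy matrix: entry (i, j) is 1 where values[i] == categories[j]."""
    values = np.asarray(values)
    if categories is None:
        # Assumption: fall back to the sorted distinct values of the input.
        categories = np.unique(values)
    categories = np.asarray(categories)
    # Broadcasting compares every value against every category.
    return (values[:, None] == categories[None, :]).astype(int)


# Hypothetical input: four grid cells assigned to regions 0, 1 and 2.
region = np.array([0, 2, 1, 2])
dummies = one_hot(region)
print(dummies)
```

Each row contains exactly one 1, in the column of the category that the corresponding input value belongs to.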

Parameters:
  • da – DataArray encoding categories.

  • categories – optional coordinates for categories.

  • cat_dim – dimension for categories coordinate

  • return_sparse – if True, store values in a sparse.COO matrix

Returns:

Dummy matrix corresponding to the input vector. Its dimensions are the same as the input DataArray, plus an additional “categories” dimension, which has one value for each distinct value in the input DataArray.

openghg_inversions.array_ops.sparse_xr_dot(da1: DataArray, da2: DataArray, dim: list[str] | None = None) DataArray#
openghg_inversions.array_ops.sparse_xr_dot(da1: DataArray, da2: Dataset, dim: list[str] | None = None) Dataset

Compute the matrix “dot” of a DataArray with sparse.COO values and a second DataArray or Dataset.

This multiplies and sums over all common dimensions of the input DataArrays, and preserves the coordinates and dimensions that are not summed over.

Common dimensions are automatically selected by name. The input arrays must have at least one dimension in common. All matching dimensions will be used for multiplication.
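As a rough dense analogue of this contraction, the following NumPy sketch multiplies two arrays and sums over the dimensions they share. The array names and dimension labels (time, lat, lon) are hypothetical, and np.einsum matches dimensions by position in the subscript string rather than by name, so this shows only the arithmetic, not the name-based matching or the sparse handling.

```python
import numpy as np

# Hypothetical dense inputs: `footprint` has dims (time, lat, lon) and
# `flux` has dims (lat, lon), so the common dims are lat and lon.
footprint = np.arange(24.0).reshape(2, 3, 4)
flux = np.ones((3, 4))

# Multiply and sum over the shared dims; the unshared "time" dim is kept.
result = np.einsum("tij,ij->t", footprint, flux)
print(result.shape)
```

The result has one value per time step, because time is the only dimension that was not summed over.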

Compared to just using da1 @ da2, this function has two advantages:

  1. If da1 is sparse but not a dask array, then da1 @ da2 will fail if da2 is a dask array.

  2. da2 can be a Dataset, whereas DataArray @ Dataset is currently not allowed by xarray.

Parameters:
  • da1 – xr.DataArray to multiply and sum along common dimensions.

  • da2 – xr.DataArray or xr.Dataset to multiply and sum along common dimensions.

  • dim – optional list of dimensions to sum over; if None, then all common dimensions are summed over.
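The effect of restricting the summed dimensions can be illustrated with a dense NumPy sketch. It assumes that a common dimension left out of dim is kept in the result, as in xarray's own dot; the arrays and the dimension labels x, y, z are hypothetical.

```python
import numpy as np

a = np.arange(6.0).reshape(2, 3)       # dims (x, y)
b = np.arange(24.0).reshape(2, 3, 4)   # dims (x, y, z)

# Analogue of dim=None: sum over every common dim (x and y), leaving z.
all_common = np.einsum("xy,xyz->z", a, b)

# Analogue of dim=["y"]: sum over y only; x is common but kept.
only_y = np.einsum("xy,xyz->xz", a, b)

print(all_common.shape, only_y.shape)
```

Summing only_y over its remaining x axis recovers all_common, which is one way to check that the two contractions are consistent.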

Returns:

Dataset or DataArray containing the result of matrix/tensor multiplication.

The type that is returned will be the same as the type of da2.

Return type:

xr.Dataset or xr.DataArray

openghg_inversions.array_ops.to_dense(da: DataArray) DataArray#

Convert a DataArray with a sparse underlying array to one backed by numpy arrays.

If the data array has chunks, these are preserved, and the array underlying each chunk is converted. Does nothing if the chunks are already numpy arrays.