openghg_inversions.array_ops#
General methods for xarray Datasets and DataArrays.
The functions here are not specific to OpenGHG inversions: they add functionality missing from xarray. These functions should accept xarray Datasets and DataArrays, and return either a Dataset or a DataArray.
Functions#
- get_xr_dummies
Applies pandas get_dummies to xarray DataArrays.
- sparse_xr_dot
Multiplies a Dataset or DataArray by a DataArray with sparse underlying array. The built-in xarray functionality doesn’t work correctly.
- openghg_inversions.array_ops.align_sparse_lat_lon(sparse_da: DataArray, other_array: DataWithCoords) DataArray#
Align lat/lon coordinates of sparse_da with lat/lon coordinates from other_array.
NOTE: This is a work-around for an xarray Issue: pydata/xarray#3445
- Parameters:
sparse_da – xarray DataArray with sparse underlying array
other_array – xarray Dataset or DataArray whose lat/lon coordinates should be used to replace the lat/lon coordinates in sparse_da
- Returns:
copy of sparse_da with lat/lon coords from other_array
- Return type:
xr.DataArray
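The idea of replacing one array's lat/lon coordinates with another's can be sketched with plain xarray. This is a minimal illustration on dense arrays, assuming the two grids have the same shape and differ only by floating-point noise. It does not call align_sparse_lat_lon, and the real function may work differently internally. The names da, other and aligned are made up for the example.

```python
import numpy as np
import xarray as xr

lat = np.array([10.0, 20.0])
lon = np.array([0.0, 5.0, 10.0])

# Reference array whose lat/lon coordinates we want to reuse.
other = xr.DataArray(
    np.zeros((2, 3)), dims=("lat", "lon"), coords={"lat": lat, "lon": lon}
)

# Array on nominally the same grid, but with slightly perturbed coordinates.
da = xr.DataArray(
    np.ones((2, 3)),
    dims=("lat", "lon"),
    coords={"lat": lat + 1e-9, "lon": lon - 1e-9},
)

# With mismatched labels, xarray's default (inner join) arithmetic
# finds no shared coordinate values.
mismatched = da + other

# Overwrite the lat/lon labels with the reference values (returns a copy).
aligned = da.assign_coords(lat=other["lat"].values, lon=other["lon"].values)
total = aligned + other
```

After the coordinates are replaced, arithmetic between the two arrays covers the full grid instead of an empty intersection.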
- openghg_inversions.array_ops.get_xr_dummies(da: DataArray, categories: Sequence[Any] | Index | DataArray | ndarray | None = None, cat_dim: str = 'categories', return_sparse: bool = True) DataArray#
Create 0-1 dummy matrix from DataArray with values that correspond to categories.
If the values of da are integers 0-N, then the result has N + 1 columns, and the (i, j) entry of the result is 1 if da[i] == j, and is 0 otherwise.
This function works like the pandas function get_dummies, but preserves the coordinates of the input data and allows the user to specify coordinates for the categories used to make the “dummies” (or “one-hot encoding”).
- Parameters:
da – DataArray encoding categories.
categories – optional coordinates for categories.
cat_dim – dimension for categories coordinate
return_sparse – if True, store values in sparse.COO matrix
- Returns:
- Dummy matrix corresponding to the input vector. Its dimensions are the same as the input DataArray, plus an additional dimension named by cat_dim (“categories” by default), which has one value for each distinct value in the input DataArray.
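The 0-1 encoding described above can be reproduced on a small dense example using xarray broadcasting. This is only a sketch of the semantics, not the implementation of get_xr_dummies. The dimension name site and the variable names are assumptions chosen for illustration, and the result here is dense rather than sparse.

```python
import xarray as xr

# Input vector: each site is assigned to one of the categories 0, 1, 2.
da = xr.DataArray(
    [0, 2, 1, 2], dims="site", coords={"site": ["a", "b", "c", "d"]}
)

# One coordinate value per category, on its own "categories" dimension.
categories = xr.DataArray(
    [0, 1, 2], dims="categories", coords={"categories": [0, 1, 2]}
)

# Broadcasting the comparison gives entry (i, j) == 1 exactly when da[i] == j.
dummies = (da == categories).astype(int)
```

The result keeps the site coordinate of the input and gains a categories dimension with one column per category, which matches the shape described in the Returns entry.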
- openghg_inversions.array_ops.sparse_xr_dot(da1: DataArray, da2: DataArray, dim: list[str] | None = None) DataArray#
- openghg_inversions.array_ops.sparse_xr_dot(da1: DataArray, da2: Dataset, dim: list[str] | None = None) Dataset
Compute the matrix “dot” of a DataArray with sparse.COO values and another DataArray or Dataset.
This multiplies and sums over all common dimensions of the input DataArrays, and preserves the coordinates and dimensions that are not summed over.
Common dimensions are automatically selected by name. The input arrays must have at least one dimension in common. All matching dimensions will be used for multiplication.
Compared to just using da1 @ da2, this function has two advantages:
1. if da1 is sparse but not a dask array, then da1 @ da2 will fail if da2 is a dask array
2. da2 can be a Dataset, and currently DataArray @ Dataset is not allowed by xarray
- Parameters:
da1 – xr.DataArray to multiply and sum along common dimensions.
da2 – xr.DataArray or xr.Dataset to multiply and sum along common dimensions.
dim – optional list of dimensions to sum over; if None, then all common dimensions are summed over.
- Returns:
- Dataset or DataArray containing the result of matrix/tensor multiplication.
The type that is returned will be the same as the type of da2.
- Return type:
xr.Dataset or xr.DataArray
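The "multiply and sum over common dimensions" behaviour can be illustrated with dense arrays and ordinary xarray arithmetic. This is a hedged sketch of the result the documentation describes, not the sparse implementation of sparse_xr_dot. The names basis, flux and ds are hypothetical.

```python
import numpy as np
import xarray as xr

regions = ["a", "b", "c"]

# Dense stand-in for a 0-1 matrix mapping regions to two categories.
basis = xr.DataArray(
    np.array([[1, 0], [0, 1], [0, 1]]),
    dims=("region", "categories"),
    coords={"region": regions, "categories": [0, 1]},
)
flux = xr.DataArray([1.0, 2.0, 3.0], dims="region", coords={"region": regions})

# Common dimensions are selected by name.
common = [d for d in basis.dims if d in flux.dims]

# Multiply, then sum over the common dimensions; "categories" is preserved.
result = (basis * flux).sum(common)

# The same pattern applied to a Dataset returns a Dataset,
# with the operation applied to each data variable.
ds = xr.Dataset({"flux": flux, "flux_sq": flux**2})
ds_result = (basis * ds).sum(common)
```

Here region is the only shared dimension, so it is summed away, and the return type follows the type of the second operand.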