openghg_inversions.inversion_data.serialise#
Functions for saving and loading data used for inversions.
_save_merged_data saves the fp_all dict created by get_data.data_processing_surface_notracer to disk (as a pickle file, netCDF, or zarr store).
load_merged_data restores the fp_all dict from any of these saved formats.
make_combined_scenario converts the fp_all dict into an xr.Dataset.
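A minimal usage sketch, assuming merged data has already been written by _save_merged_data; the directory, species, dates, and output name below are placeholders:

```python
from openghg_inversions.inversion_data.serialise import (
    fp_all_from_dataset,
    load_merged_data,
    make_combined_scenario,
)

# Restore the fp_all dict written earlier by _save_merged_data (directory,
# species, start date, and output name are illustrative).
fp_all = load_merged_data(
    "/path/to/merged_data",
    species="ch4",
    start_date="2019-01-01",
    output_name="my_run",
)

# Flatten the per-site model scenarios, fluxes, and boundary conditions into a
# single xr.Dataset, then recover an fp_all-style dict from it (attributes may
# differ after the round trip).
combined = make_combined_scenario(fp_all)
fp_all_recovered = fp_all_from_dataset(combined)
```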
- openghg_inversions.inversion_data.serialise.combine_scenario_attrs(attrs_list: list[dict[str, Any]], context) → dict[str, Any]#
Combine attributes when concatenating scenarios from different sites.
The scenario datasets (ModelScenario.scenario) used in make_combined_scenario have the key “scenario” added to their attributes as a flag, so that this function can treat the dataset attributes and the data variable attributes differently.
- TODO: add ‘time_period’, ‘high_time/spatial_resolution’, ‘short_lifetime’, ‘heights’?
Is ‘time_period’ from the footprint? Need to check model scenario…
- Parameters:
attrs_list – list of attributes from datasets being concatenated
context – additional parameter supplied by concatenate (xarray requires this argument and passes it automatically)
- Returns:
dict that will be used as attributes for concatenated dataset
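This makes the function usable as a combine_attrs callback. Below is a minimal sketch of how such a callback can be passed to xarray.concat; the toy datasets, the “site” dimension name, and the attribute values are illustrative and not the actual concatenation performed inside make_combined_scenario:

```python
import xarray as xr

from openghg_inversions.inversion_data.serialise import combine_scenario_attrs

# Two toy per-site scenario datasets; the "scenario" attribute stands in for
# the flag described above (its value here is purely illustrative).
mhd = xr.Dataset({"mf": ("time", [1.0, 2.0])}, attrs={"scenario": "True", "site": "MHD"})
tac = xr.Dataset({"mf": ("time", [3.0, 4.0])}, attrs={"scenario": "True", "site": "TAC"})

# xarray calls combine_scenario_attrs(attrs_list, context) whenever it needs to
# combine attributes from the objects being concatenated.
combined = xr.concat([mhd, tac], dim="site", combine_attrs=combine_scenario_attrs)
```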
- openghg_inversions.inversion_data.serialise.fp_all_from_dataset(ds: Dataset) → dict#
Recover “fp_all” dictionary from “combined scenario” dataset.
This is the inverse of make_combined_scenario, except that the attributes of the scenarios, fluxes, and boundary conditions may be different.
- Parameters:
ds – dataset created by make_combined_scenario
- Returns:
dictionary containing model scenarios keyed by site, as well as flux and boundary conditions.
- openghg_inversions.inversion_data.serialise.load_merged_data(merged_data_dir: str | Path, species: str | None = None, start_date: str | None = None, output_name: str | None = None, merged_data_name: str | None = None, output_format: Literal['pickle', 'netcdf', 'zarr', 'zarr.zip'] | None = None) → dict#
Load fp_all dictionary from a file in merged_data_dir.
The name of the merged data file can be specified using merged_data_name, or a standard name will be constructed from species, start_date, and output_name.
If merged_data_name is not given, then species, start_date, and output_name must be provided.
If a format is not specified, this function tries to find a compatible format of merged data automatically: it first checks for data in “zarr” format, then netCDF, and finally pickle.
- Parameters:
merged_data_dir – path to the directory where the merged data was saved
species – species of inversion
start_date – start date of inversion period
output_name – output name parameter used for inversion run
merged_data_name – name of the saved merged data file to load.
output_format – format of data to load (if not specified, this will be inferred).
- Returns:
fp_all dictionary
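If the saved file name is already known, it can be given directly and the format pinned instead of inferred; the file name below is hypothetical:

```python
from openghg_inversions.inversion_data.serialise import load_merged_data

# Load by explicit name, skipping the standard-name construction and the
# zarr -> netCDF -> pickle format search.
fp_all = load_merged_data(
    "/path/to/merged_data",
    merged_data_name="ch4_2019-01-01_my_run_merged-data",  # hypothetical file name
    output_format="zarr",
)
```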
- openghg_inversions.inversion_data.serialise.make_combined_scenario(fp_all: dict) → Dataset#
Combine scenarios and merge in fluxes and boundary conditions.
If the fluxes and boundary conditions have only a single coordinate along their “time” dimension, then “time” will be dropped.
Otherwise, the time axes of the fluxes and boundary conditions are assumed to have the same length as the time axis of the model scenarios.
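A short sketch of this time-handling rule using plain xarray; it illustrates the rule only and is not the function’s implementation, and the variable names are illustrative:

```python
import numpy as np
import xarray as xr

times = np.array(["2019-01-01", "2019-01-02", "2019-01-03"], dtype="datetime64[ns]")

# Flux with a single time step: its "time" dimension carries no extra
# information, so it can be dropped before merging with the scenarios.
flux_single = xr.Dataset(
    {"flux": (("time", "lat", "lon"), np.ones((1, 2, 2)))},
    coords={"time": times[:1]},
)
flux_single = flux_single.squeeze("time", drop=True)  # "time" removed

# Time-resolved flux: its time axis must match the length of the scenario time
# axis so the variables can be combined without reindexing.
flux_resolved = xr.Dataset(
    {"flux": (("time", "lat", "lon"), np.ones((3, 2, 2)))},
    coords={"time": times},
)
assert flux_resolved.sizes["time"] == len(times)
```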