openghg_inversions.filters

Functions for filtering data.

All filters are accessed and applied to data via the filtering function.

New filters are registered using @register_filter. A filter function should accept as arguments an xr.Dataset and a bool called “keep_missing”.

To see the available filters call list_filters.

openghg_inversions.filters.daily_median(dataset: Dataset, keep_missing: bool = False) → Dataset

Resample data to daily frequency and use daily median values.

Parameters:
  • dataset – dataset to filter

  • keep_missing – if True, time points excluded by the filter are kept as missing values rather than dropped

Returns:

filtered dataset
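
As a rough illustration of what taking a daily median means, the sketch below groups timestamped values by calendar day using only the standard library. It is not the library's implementation: the real filter operates on an xr.Dataset, and the helper name daily_median_sketch and the list-of-tuples input are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def daily_median_sketch(samples):
    """Return {date: median of that day's values} for (datetime, value) pairs."""
    by_day = defaultdict(list)
    for timestamp, value in samples:
        by_day[timestamp.date()].append(value)
    # One output value per calendar day, in chronological order.
    return {day: median(values) for day, values in sorted(by_day.items())}


# Made-up values, purely for illustration.
samples = [
    (datetime(2020, 1, 1, 11), 1900.0),
    (datetime(2020, 1, 1, 13), 1910.0),
    (datetime(2020, 1, 1, 15), 1950.0),
    (datetime(2020, 1, 2, 12), 1920.0),
]
daily = daily_median_sketch(samples)
```

The median, unlike the mean, is not pulled upwards by the single high value at 15:00 on the first day.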

openghg_inversions.filters.daytime(dataset: Dataset, keep_missing: bool = False) → Dataset

Subset during daytime hours (11:00-15:00).

Parameters:
  • dataset – dataset to filter

  • keep_missing – if True, time points excluded by the filter are kept as missing values rather than dropped

Returns:

filtered dataset

openghg_inversions.filters.daytime9to5(dataset: Dataset, keep_missing: bool = False) → Dataset

Subset during daytime hours (9:00-17:00).

Parameters:
  • dataset – dataset to filter

  • keep_missing – if True, time points excluded by the filter are kept as missing values rather than dropped

Returns:

filtered dataset

openghg_inversions.filters.filtering(datasets_in: dict, filters: str | None | dict[str, list[str | None]] | list[str | None], keep_missing: bool = False) → dict

Applies time filtering to all datasets in datasets_in.

If filters is a list, the same filters are applied to all sites. If filters is a dict with site codes as keys, then the filters applied to each site depend on the list supplied for that site.

In any case, filters supplied in a list are applied in the order given. For example, to take a daily median of daytime values only, list “daytime” before “daily_median”:

    datasets_dictionary = filtering(datasets_dictionary, ["daytime", "daily_median"])

If a site in datasets_in is not in filters, then no filters are applied to that site.

Parameters:
  • datasets_in – dictionary of datasets containing output from ModelScenario.footprints_merge().

  • filters – filters to apply to the datasets. Either a list of filters, which will be applied to every site, or a dictionary of lists of the form {<site code>: [filter1, filter2, …]}, with specific filters to be applied at each site. Use the list_filters function to list available filters.

  • keep_missing – if True, time points excluded by the filters are kept as missing data rather than dropped

Returns:

dict in same format as datasets_in, with filters applied
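
The dispatch rules described above (one list for all sites, or a dict of per-site lists, applied in order) can be sketched as follows. This is a simplified stand-in, not the library's code: the function apply_filters_sketch, the explicit registry argument, the toy filters, and the use of plain lists instead of datasets are all assumptions made for illustration. “TAC” and “MHD” are used only as example site keys.

```python
def apply_filters_sketch(datasets_in, filters, registry):
    """Apply named filters per site; `registry` maps filter names to callables."""
    if filters is None:
        return dict(datasets_in)
    if isinstance(filters, str):
        filters = [filters]
    if not isinstance(filters, dict):
        # A single list applies to every site.
        filters = {site: filters for site in datasets_in}

    datasets_out = {}
    for site, data in datasets_in.items():
        # Sites missing from `filters` get an empty list, so they pass through.
        for name in filters.get(site) or []:
            if name is not None:
                data = registry[name](data)
        datasets_out[site] = data
    return datasets_out


# Toy filters on lists of numbers, standing in for filters on datasets.
registry = {
    "drop_negative": lambda values: [v for v in values if v >= 0],
    "first_two": lambda values: values[:2],
}
datasets = {"TAC": [-1, 5, 3, 8], "MHD": [4, -2, 6]}

same_for_all = apply_filters_sketch(datasets, ["drop_negative", "first_two"], registry)
per_site = apply_filters_sketch(datasets, {"TAC": ["first_two", "drop_negative"]}, registry)
```

In the second call the two filters run in the opposite order for “TAC”, which changes the result, and “MHD” is returned untouched because it has no entry in the dict.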

openghg_inversions.filters.list_filters() → None

Print a list of the available filters with a short description.

openghg_inversions.filters.local_influence(dataset: Dataset, keep_missing: bool = False) → Dataset

Subset for times when “local influence” is below threshold.

Local influence is expressed as a fraction of the sum over the entire footprint domain.

Parameters:
  • dataset – dataset to filter

  • keep_missing – if True, time points excluded by the filter are kept as missing values rather than dropped

Returns:

filtered dataset

openghg_inversions.filters.nighttime(dataset: Dataset, keep_missing: bool = False) → Dataset

Subset during nighttime hours (23:00 - 03:00).

Parameters:
  • dataset – dataset to filter

  • keep_missing – if True, time points excluded by the filter are kept as missing values rather than dropped

Returns:

filtered dataset
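
A nighttime window such as 23:00 - 03:00 crosses midnight, so a plain start <= hour <= end test would select nothing. The sketch below shows one way to handle both wrapping and non-wrapping windows. The helper in_hour_window is hypothetical, and treating the end hour as inclusive is an assumption, not something this page states.

```python
from datetime import datetime


def in_hour_window(hour, start, end):
    """True if `hour` lies in the inclusive window [start, end], wrapping past midnight."""
    if start <= end:
        return start <= hour <= end
    # Window crosses midnight, e.g. 23 -> 3 covers 23, 0, 1, 2, 3.
    return hour >= start or hour <= end


# One timestamp per hour of a single day.
times = [datetime(2020, 1, 1, h) for h in range(24)]
night_hours = [t.hour for t in times if in_hour_window(t.hour, 23, 3)]
day_hours = [t.hour for t in times if in_hour_window(t.hour, 11, 15)]
```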

openghg_inversions.filters.noon(dataset: Dataset, keep_missing: bool = False) → Dataset

Select only 12pm data.

Parameters:
  • dataset – dataset to filter

  • keep_missing – if True, time points excluded by the filter are kept as missing values rather than dropped

Returns:

filtered dataset

openghg_inversions.filters.pblh(dataset: Dataset, keep_missing: bool = False) → Dataset

Deprecated: pblh is now called pblh_inlet_diff.

openghg_inversions.filters.pblh_inlet_diff(dataset: Dataset, diff_threshold: float = 50.0, keep_missing: bool = False) → Dataset

Subset for times when observations are taken at a height at least 50 m (the default diff_threshold) below the PBLH.

Parameters:
  • dataset – dataset to filter

  • diff_threshold – filter will discard times where obs. are taken at a height of less than diff_threshold below PBLH

  • keep_missing – if True, time points excluded by the filter are kept as missing values rather than dropped

Returns:

filtered dataset

TODO: need way to pass diff_threshold to filter
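
Following the diff_threshold description above, the selection can be pictured as a comparison between the PBLH and the inlet height at each time step. The sketch below is only an illustration on plain lists: the function keep_below_pblh and its arguments are hypothetical, and keeping a time step when the gap exactly equals the threshold is an assumption.

```python
def keep_below_pblh(inlet_height, pblh_values, diff_threshold=50.0):
    """Return indices of times where the inlet is at least diff_threshold below the PBLH."""
    return [
        i for i, pblh in enumerate(pblh_values)
        if pblh - inlet_height >= diff_threshold
    ]


# Made-up boundary layer heights in metres, one value per time step.
pblh_values = [120.0, 160.0, 400.0, 90.0]
kept_default = keep_below_pblh(inlet_height=100.0, pblh_values=pblh_values)
kept_relaxed = keep_below_pblh(inlet_height=100.0, pblh_values=pblh_values, diff_threshold=10.0)
```

With the default threshold the first time step is discarded because the inlet sits only 20 m below the PBLH, and the last is discarded because the inlet is above the PBLH.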

openghg_inversions.filters.pblh_min(dataset: Dataset, pblh_threshold: float = 200.0, keep_missing: bool = False) → Dataset

Subset for times when the PBLH is greater than 200 m.

Parameters:
  • dataset – dataset to filter

  • pblh_threshold – filter will discard times where PBLH/atmosphere boundary layer thickness is below pblh_threshold

  • keep_missing – if True, time points excluded by the filter are kept as missing values rather than dropped

Returns:

filtered dataset

TODO: need way to pass pblh_threshold to filter

openghg_inversions.filters.register_filter(filt: Callable) → Callable

Decorator function to register filters.

Parameters:

filt – filter function to register

Returns:

filt, the input function (no modifications made)

For instance, the following use of register_filter as a decorator adds my_new_filter to the filtering_functions dictionary, under the key “my_new_filter”:

>>> @register_filter
... def my_new_filter(data):
...     return data
>>> "my_new_filter" in filtering_functions
True

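
For readers unfamiliar with the pattern, a registering decorator of this kind can be written in a few lines. The sketch below is a self-contained approximation based only on the behaviour described above (the function is stored under its own name and returned unchanged); it is not copied from the library, and the example filter keep_everything is hypothetical.

```python
from typing import Callable

# Stand-in for the module-level registry of filters.
filtering_functions: dict[str, Callable] = {}


def register_filter(filt: Callable) -> Callable:
    """Store `filt` under its own name and return it unchanged."""
    filtering_functions[filt.__name__] = filt
    return filt


@register_filter
def keep_everything(dataset, keep_missing=False):
    """Trivial filter with the expected signature; returns its input."""
    return dataset


names = sorted(filtering_functions)
```

Because the decorator returns the original function, keep_everything can still be called directly as well as looked up by name.
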
openghg_inversions.filters.six_hr_mean(dataset: Dataset, keep_missing: bool = False) → Dataset

Resample data to 6h frequency and use 6h mean values.

Parameters:
  • dataset – dataset to filter

  • keep_missing – if True, time points excluded by the filter are kept as missing values rather than dropped

Returns:

filtered dataset