openghg_inversions.filters
Functions for filtering data.
All filters are accessed and applied to data via the filtering function.
New filters are registered using @register_filter. A filter function should accept as arguments an xr.Dataset and a bool called keep_missing.
To see the available filters, call list_filters.
- openghg_inversions.filters.daily_median(dataset: Dataset, keep_missing: bool = False) -> Dataset
Resample data to daily frequency and use daily median values.
- Parameters:
dataset – dataset to filter
keep_missing – if True, keep the time points removed by the filter as missing values instead of dropping them
- Returns:
filtered dataset
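As a rough illustration of what taking daily medians means, the sketch below groups timestamped values by calendar day and takes the median of each group. It uses plain Python lists in place of an xr.Dataset, and daily_median_sketch is a hypothetical helper written for this example, not the library's implementation.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def daily_median_sketch(times, values):
    """Return (day, median) pairs, one per calendar day, in date order."""
    by_day = defaultdict(list)
    for t, v in zip(times, values):
        by_day[t.date()].append(v)
    return [(day, median(vals)) for day, vals in sorted(by_day.items())]


# Illustrative data: three points on 1 January, two on 2 January.
times = [
    datetime(2020, 1, 1, 3),
    datetime(2020, 1, 1, 12),
    datetime(2020, 1, 1, 21),
    datetime(2020, 1, 2, 6),
    datetime(2020, 1, 2, 18),
]
values = [1.0, 5.0, 2.0, 10.0, 20.0]

daily = daily_median_sketch(times, values)
```

Each day contributes exactly one value to the result, however many time points it contained.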
- openghg_inversions.filters.daytime(dataset: Dataset, keep_missing: bool = False) -> Dataset
Subset during daytime hours (11:00-15:00).
- Parameters:
dataset – dataset to filter
keep_missing – if True, keep the time points removed by the filter as missing values instead of dropping them
- Returns:
filtered dataset
- openghg_inversions.filters.daytime9to5(dataset: Dataset, keep_missing: bool = False) -> Dataset
Subset during daytime hours (9:00-17:00).
- Parameters:
dataset – dataset to filter
keep_missing – if True, keep the time points removed by the filter as missing values instead of dropping them
- Returns:
filtered dataset
- openghg_inversions.filters.filtering(datasets_in: dict, filters: str | None | dict[str, list[str | None]] | list[str | None], keep_missing: bool = False) -> dict
Applies time filtering to all datasets in datasets_in.
If filters is a list, the same filters are applied to all sites. If filters is a dict with site codes as keys, then the filters applied to each site depend on the list supplied for that site.
In any case, filters supplied in a list are applied in order. For example, if you wanted a daily median of daytime values, you could do this:
    datasets_dictionary = filtering(datasets_dictionary, ["daytime", "daily_median"])
Filters are applied in the order listed, so to take the daily median of daytime values only, "daytime" must come before "daily_median" in the list.
If a site in datasets_in is not in filters, then no filters are applied to that site.
- Parameters:
datasets_in – dictionary of datasets containing output from ModelScenario.footprints_merge().
filters – filters to apply to the datasets. Either a list of filters, which will be applied to every site, or a dictionary of lists of the form {<site code>: [filter1, filter2, …]}, with specific filters to be applied at each site. Use the list_filters function to list available filters.
keep_missing – if True, keep the time points removed by the filters as missing data instead of dropping them
- Returns:
dict in same format as datasets_in, with filters applied
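The rules for interpreting the filters argument can be sketched as follows. resolve_filters is a hypothetical helper written for illustration; it is not part of openghg_inversions. Treating a single string or None like a one-element list, and skipping None entries, are assumptions based on the type signature. The site codes are example values.

```python
def resolve_filters(sites, filters):
    """Map each site code to the ordered list of filter names to apply."""
    if isinstance(filters, dict):
        # Sites missing from the dict get no filters.
        per_site = {site: filters.get(site) or [] for site in sites}
    else:
        # A single name (or None) is treated like a one-element list (assumption).
        as_list = filters if isinstance(filters, list) else [filters]
        per_site = {site: list(as_list) for site in sites}
    # None entries are assumed to mean "no filter" and are skipped.
    return {
        site: [name for name in names if name is not None]
        for site, names in per_site.items()
    }


sites = ["MHD", "TAC"]  # example site codes

# A list applies the same filters, in the same order, to every site.
same_for_all = resolve_filters(sites, ["daytime", "daily_median"])

# A dict applies filters per site; "TAC" is absent, so it gets none.
per_site = resolve_filters(sites, {"MHD": ["daytime", "daily_median"]})
```

Resolving the argument into one ordered list per site up front keeps the loop that applies the filters simple.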
- openghg_inversions.filters.list_filters() -> None
Print a list of the available filters with a short description.
- openghg_inversions.filters.local_influence(dataset: Dataset, keep_missing: bool = False) -> Dataset
Subset for times when “local influence” is below threshold.
Local influence is expressed as a fraction of the sum over the entire footprint domain.
- Parameters:
dataset – dataset to filter
keep_missing – if True, keep the time points removed by the filter as missing values instead of dropping them
- Returns:
filtered dataset
- openghg_inversions.filters.nighttime(dataset: Dataset, keep_missing: bool = False) -> Dataset
Subset during nighttime hours (23:00 - 03:00).
- Parameters:
dataset – dataset to filter
keep_missing – if True, keep the time points removed by the filter as missing values instead of dropping them
- Returns:
filtered dataset
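Because the nighttime window spans midnight, a plain start <= hour <= end test does not cover it. The sketch below shows one way to check an hour against a window that may wrap past midnight. in_hour_window is a hypothetical helper, and treating both end hours as inclusive is an assumption; how the library derives the hour of each time point is not covered here.

```python
def in_hour_window(hour, start, end):
    """Return True if hour lies in [start, end], allowing the window to wrap past midnight."""
    if start <= end:
        return start <= hour <= end
    # Wrapped window, e.g. 23 -> 3 covers 23, 0, 1, 2 and 3.
    return hour >= start or hour <= end


hours = list(range(24))
daytime_hours = [h for h in hours if in_hour_window(h, 11, 15)]
nighttime_hours = [h for h in hours if in_hour_window(h, 23, 3)]
```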
- openghg_inversions.filters.noon(dataset: Dataset, keep_missing: bool = False) -> Dataset
Select only 12pm data.
- Parameters:
dataset – dataset to filter
keep_missing – if True, keep the time points removed by the filter as missing values instead of dropping them
- Returns:
filtered dataset
- openghg_inversions.filters.pblh(dataset: Dataset, keep_missing: bool = False) -> Dataset
Deprecated: pblh is now called pblh_inlet_diff.
- openghg_inversions.filters.pblh_inlet_diff(dataset: Dataset, diff_threshold: float = 50.0, keep_missing: bool = False) -> Dataset
Subset for times when observations are taken at a height at least 50 m (the default diff_threshold) below the PBLH.
- Parameters:
dataset – dataset to filter
diff_threshold – filter will discard times where obs. are taken at a height of less than diff_threshold below PBLH
keep_missing – if True, keep the time points removed by the filter as missing values instead of dropping them
- Returns:
filtered dataset
TODO: need way to pass diff_threshold to filter
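The selection rule described for diff_threshold can be sketched with plain numbers. This assumes the filter compares a single inlet height against the PBLH at each time point. keep_time and the sample values are hypothetical, and the handling of the exact boundary case follows the parameter description above rather than the library's source.

```python
def keep_time(inlet_height, pblh, diff_threshold=50.0):
    """Keep a time point unless the inlet is less than diff_threshold below the PBLH."""
    return (pblh - inlet_height) >= diff_threshold


inlet_height = 100.0  # metres (illustrative)
pblh_series = [120.0, 149.0, 150.0, 400.0]  # metres (illustrative)

# Times with the PBLH less than 50 m above the inlet are discarded.
kept = [pblh for pblh in pblh_series if keep_time(inlet_height, pblh)]
```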
- openghg_inversions.filters.pblh_min(dataset: Dataset, pblh_threshold: float = 200.0, keep_missing: bool = False) -> Dataset
Subset for times when the PBLH is greater than 200 m (the default pblh_threshold).
- Parameters:
dataset – dataset to filter
pblh_threshold – filter will discard times where PBLH/atmosphere boundary layer thickness is below pblh_threshold
keep_missing – if True, keep the time points removed by the filter as missing values instead of dropping them
- Returns:
filtered dataset
TODO: need way to pass pblh_threshold to filter
- openghg_inversions.filters.register_filter(filt: Callable) -> Callable
Decorator function to register filters.
- Parameters:
filt – filter function to register
- Returns:
filt, the input function (no modifications made)
For instance, the following use of register_filter as a decorator adds my_new_filter to the filtering_functions dictionary, under the key “my_new_filter”:
>>> @register_filter
... def my_new_filter(data):
...     return data
>>> "my_new_filter" in filtering_functions
True
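To make the mechanism concrete, here is a minimal, self-contained sketch of a registry decorator of this kind. It is a stand-in written for illustration, not the library's source: it assumes only what is stated above, namely that the decorator stores the function in a filtering_functions dictionary under its own name and returns it unchanged. Plain lists stand in for datasets, and drop_negative, first_two and apply_filters are hypothetical names.

```python
from typing import Callable

# Stand-in registry mapping filter names to filter functions.
filtering_functions: dict[str, Callable] = {}


def register_filter(filt: Callable) -> Callable:
    """Store filt under its function name and return it unmodified."""
    filtering_functions[filt.__name__] = filt
    return filt


@register_filter
def drop_negative(data):
    """Hypothetical filter: remove negative values."""
    return [x for x in data if x >= 0]


@register_filter
def first_two(data):
    """Hypothetical filter: keep only the first two values."""
    return data[:2]


def apply_filters(data, names):
    """Look up each filter by name and apply them in the order given."""
    for name in names:
        data = filtering_functions[name](data)
    return data


result = apply_filters([3, -1, 4, -1, 5], ["drop_negative", "first_two"])
reversed_result = apply_filters([3, -1, 4, -1, 5], ["first_two", "drop_negative"])
```

Looking filters up by name is what lets callers pass strings such as "daytime", and the two results differ because the order of the names determines the order of application.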