openghg_inversions.model_error

Functions for computing estimates of model error.

openghg_inversions.model_error.percentile_error_method(ds_dict: dict[str, Dataset]) → ndarray

Compute estimate of minimum model error using percentile error method.

This is a simple method to estimate the minimum model error (i.e. the model error used at baseline points). For each site, it takes the monthly median measured mole fraction (mf) and subtracts the monthly 5th percentile of the measured mf, then calculates the annual mean of these monthly values. The reasoning is that transport error might produce modelled enhancements at baseline points, even with an accurate flux map, so this gives a rough estimate of the likely size of such an enhancement.

Parameters:

ds_dict – dictionary of combined scenario datasets, keyed by site codes.

Returns:

estimated value(s) for model error.

Return type:

np.ndarray
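
The percentile calculation can be sketched with plain NumPy. This is not the library's implementation: instead of a dict of xarray Datasets, the hypothetical helper `percentile_error_sketch` takes a dict mapping site codes to `(times, mf)` arrays, and it returns one value per site, which is an assumption about the output shape.

```python
import numpy as np


def percentile_error_sketch(
    obs: dict[str, tuple[np.ndarray, np.ndarray]],
) -> np.ndarray:
    """Per-site minimum model error: mean over months of (median - 5th percentile).

    `obs` maps a site code to (times, mf), where `times` is a datetime64 array
    and `mf` holds measured mole fractions (hypothetical input layout).
    """
    errors = []
    for times, mf in obs.values():
        mf = np.asarray(mf, dtype=float)
        # Truncate timestamps to calendar months so observations can be grouped by month
        months = np.asarray(times).astype("datetime64[M]")
        monthly_spread = []
        for month in np.unique(months):
            values = mf[months == month]
            # Gap between typical values and low (baseline-like) values this month
            monthly_spread.append(np.median(values) - np.percentile(values, 5))
        # Mean of the monthly values (the annual mean when given a year of data)
        errors.append(np.mean(monthly_spread))
    return np.array(errors)


# Synthetic example: two months of hourly data for one hypothetical site
jan = np.datetime64("2019-01-01T00") + np.arange(101)
feb = np.datetime64("2019-02-01T00") + np.arange(101)
times = np.concatenate([jan, feb])
mf = np.concatenate([np.arange(101.0), np.arange(101.0) + 10.0])
print(percentile_error_sketch({"SITE": (times, mf)}))
```

Each synthetic month has a median 45 units above its 5th percentile, so the sketch returns an estimate of 45 for the site.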

openghg_inversions.model_error.residual_error_method(ds_dict: dict[str, Dataset], robust: bool = False, by_site: bool = False) → ndarray

Compute estimate of model error using residual error method.

This method is explained in “Modeling of Atmospheric Chemistry” by Brasseur and Jacob, Box 11.2 on p.499-500, following “Comparative inverse analysis of satellite (MOPITT) and aircraft (TRACE-P) observations to estimate Asian sources of carbon monoxide” by Heald, Jacob, Jones, et al. (Journal of Geophysical Research, vol. 109, 2004).

Roughly, we assume that the observations y are equal to the modelled observations y_mod (mf_mod + bc_mod), plus a bias term b and instrument, representation, and model errors:

y = y_mod + b + err_I + err_R + err_M

Assuming the errors have mean zero, mean(y - y_mod) estimates the bias b, so subtracting it gives

(y - y_mod) - mean(y - y_mod) = err_I + err_R + err_M (*)

where the mean is taken over all observations.

Calculating the RMS of the LHS of (*) gives us an estimate for

sqrt(sigma_I^2 + sigma_R^2 + sigma_M^2),

where sigma_I is the standard deviation of err_I, and so on.

Thus a rough estimate for sigma_M is the RMS of the LHS of (*), possibly after subtracting the variances of the instrument/observation and averaging errors in quadrature (this is not implemented here).
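
The derivation can be checked numerically. The sketch below simulates observations from y = y_mod + b + err_I + err_R + err_M with independent, mean-zero Gaussian errors (the standard deviations and bias are illustrative, not taken from real data), and checks that the RMS of the LHS of (*) is close to sqrt(sigma_I^2 + sigma_R^2 + sigma_M^2) and equals the standard deviation of y - y_mod. The final quadrature subtraction, which assumes sigma_I and sigma_R are known, is a hypothetical extension and is not something residual_error_method does.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
sigma_I, sigma_R, sigma_M = 0.5, 1.0, 2.0  # illustrative error standard deviations
bias = 3.0                                 # illustrative bias term b

y_mod = rng.uniform(400.0, 420.0, size=n)  # synthetic modelled mole fractions
y = (
    y_mod
    + bias
    + rng.normal(0.0, sigma_I, size=n)  # instrument error
    + rng.normal(0.0, sigma_R, size=n)  # representation error
    + rng.normal(0.0, sigma_M, size=n)  # model error
)

resid = y - y_mod
# LHS of (*): subtract the mean residual, which estimates the bias b
centred = resid - resid.mean()
rms = np.sqrt(np.mean(centred**2))

# What the RMS should approximate: sqrt(0.25 + 1 + 4), about 2.29
expected_total = np.sqrt(sigma_I**2 + sigma_R**2 + sigma_M**2)

# Hypothetical extension: remove known instrument/representation variances in quadrature
sigma_M_est = np.sqrt(rms**2 - sigma_I**2 - sigma_R**2)
print(round(rms, 2), round(sigma_M_est, 2))
```

Because the mean in (*) is taken over all of the simulated observations, rms is exactly the population standard deviation of resid (np.std with its default ddof=0).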

Note: in the “non-robust” case, we are computing the standard deviation of y - y_mod. The mean on the LHS of equation (*) could instead be taken over a subset of the observations, in which case the computed value is not a standard deviation. We wrote the derivation this way to match Brasseur and Jacob.

Parameters:

ds_dict – dictionary of combined scenario datasets, keyed by site codes.
robust – if True, use the “median absolute deviation” (https://en.wikipedia.org/wiki/Median_absolute_deviation) instead of the standard deviation. MAD is a measure of spread, similar to standard deviation, but is more robust to outliers.
by_site – if True, return an array with one minimum error value per site.

Returns:

estimated value(s) for model error.

Return type:

np.ndarray
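
The robust and by_site options can be illustrated with a small sketch. The helper residual_error_sketch is hypothetical: it takes precomputed residuals y - y_mod per site rather than the scenario datasets, uses the unscaled median absolute deviation (whether the library rescales MAD, e.g. by about 1.4826 to match the standard deviation for Gaussian data, is not stated here), and, when by_site is False, pools all sites and returns a one-element array, which is an assumption about the output shape.

```python
import numpy as np


def residual_error_sketch(
    residuals: dict[str, np.ndarray], robust: bool = False, by_site: bool = False
) -> np.ndarray:
    """Spread of y - y_mod, either pooled over all sites or one value per site.

    `residuals` maps site codes to arrays of y - y_mod (hypothetical input layout).
    """

    def spread(r: np.ndarray) -> float:
        r = np.asarray(r, dtype=float)
        if robust:
            # Median absolute deviation: median distance from the median (unscaled)
            return float(np.median(np.abs(r - np.median(r))))
        # RMS of residuals after removing their mean, i.e. the population std
        return float(np.sqrt(np.mean((r - np.mean(r)) ** 2)))

    if by_site:
        return np.array([spread(r) for r in residuals.values()])
    # Pool observations from all sites, so the mean in (*) is over all observations
    return np.array([spread(np.concatenate(list(residuals.values())))])


res = {
    "A": np.array([1.0, 2.0, 3.0, 4.0, 100.0]),  # one large outlier
    "B": np.array([0.0, 0.0, 1.0, 1.0]),
}
print(residual_error_sketch(res, robust=True, by_site=True))
print(residual_error_sketch(res, robust=False, by_site=True))
```

For site A, the single outlier inflates the standard deviation to about 39 while the MAD stays at 1, which is why the robust option can be preferable when a few residuals are extreme.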

openghg_inversions.model_error.setup_min_error(min_error: ndarray, siteindicator: ndarray) → ndarray

Given a min_error vector whose length equals the number of sites, create a vector aligned with the observations stacked by site.
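
One way to perform this alignment is NumPy fancy indexing. The sketch below assumes that siteindicator[i] is the integer index (into min_error) of the site that produced observation i; that encoding, and the hypothetical name setup_min_error_sketch, are assumptions rather than a description of the library's code.

```python
import numpy as np


def setup_min_error_sketch(min_error: np.ndarray, siteindicator: np.ndarray) -> np.ndarray:
    """Expand one error value per site into one value per observation.

    Assumes siteindicator[i] is the integer site index of observation i.
    """
    min_error = np.asarray(min_error, dtype=float)
    # Cast to int in case the indicator is stored as floats (an assumption)
    site_idx = np.asarray(siteindicator).astype(int)
    # Fancy indexing repeats each site's value for every observation from that site
    return min_error[site_idx]


min_error = np.array([0.1, 0.5, 2.0])         # one value per site
siteindicator = np.array([0, 0, 1, 2, 2, 2])  # observations stacked by site
print(setup_min_error_sketch(min_error, siteindicator))
```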
