Tutorial: advanced real-world catalogs
The scripts in this section are optional, larger examples for existing ACRG data trees. They demonstrate external-reference catalogs well, but they are intentionally not the place to start learning.
Both scripts can work from a mounted filesystem or from the saved output of
ls -R. Listing mode records paths and filename-derived metadata only;
mounted-scan mode can also inspect file sizes, archive contents, and optional
netCDF summaries.
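To illustrate what listing mode has to work with, here is a minimal sketch of turning saved ls -R output back into file paths. It is not the scripts' own parser: the function name paths_from_ls_r is hypothetical, and the sketch assumes plain ls -R output (no -l columns) and file names that do not end in a colon.

```python
from pathlib import PurePosixPath


def paths_from_ls_r(listing_text: str) -> list[str]:
    """Rebuild file paths from plain `ls -R` output (hypothetical helper)."""
    paths: list[str] = []
    directories: set[str] = set()
    current_dir = PurePosixPath(".")
    for raw_line in listing_text.splitlines():
        line = raw_line.rstrip()
        if not line:
            # Blank lines only separate directory blocks.
            continue
        if line.endswith(":"):
            # A header such as "./CO2/2019:" opens a new directory block.
            current_dir = PurePosixPath(line[:-1])
            directories.add(str(current_dir))
            continue
        paths.append(str(current_dir / line))
    # Subdirectories appear both as entries and as headers; keep files only.
    return [path for path in paths if path not in directories]


listing = ".:\nCO2\nreadme.txt\n\n./CO2:\n2019\n\n./CO2/2019:\nco2_2019.nc\n"
print(paths_from_ls_r(listing))
```

Because a listing carries no sizes or file contents, anything beyond the path has to be derived from directory and file names, which is why listing mode records filename-derived metadata only.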
ACRG flux catalog
The flux script catalogs files under a tree like /group/chem/acrg/ES/fluxes
as external path-backed records. It infers fields such as top_collection,
product, species, domain, sector, temporal_resolution, year,
archive_format, and file_role.
Run it from a saved listing:
uv run python examples/catalog_fluxes.py catalog /tmp/ogcat-fluxes --listing /path/to/eric_fluxes_recursive_ls.txt --no-enrich
Run it from a mounted source tree:
uv run python examples/catalog_fluxes.py catalog /tmp/ogcat-fluxes --source-root /group/chem/acrg/ES/fluxes
The script uses batched add_artifacts() calls with ArtifactLocator.path(...)
and storage_mode="external", so source files stay where they are.
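The batching idea can be sketched independently of the catalog API. The helper below is an assumption for illustration, not code from the flux script: batched and the batch size are hypothetical, and the add_artifacts() call itself is left out because its signature is not shown here.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    # islice pulls the next `size` items; an empty list means we are done.
    while chunk := list(islice(iterator, size)):
        yield chunk


# Illustrative paths only; a real run would take them from the scan or listing.
example_paths = [f"/group/chem/acrg/ES/fluxes/file_{i}.nc" for i in range(5)]
for batch in batched(example_paths, 2):
    # Each batch would be handed to one add_artifacts() call.
    print(len(batch), batch[0])
```

Grouping records this way keeps memory use bounded on large trees and reduces the number of catalog writes compared with adding one record at a time.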
ACRG NAME footprint catalog
The footprint script catalogs monthly NAME footprint files under a tree like
/group/chem/acrg/LPDM/fp_NAME. It infers fields such as site, inlet,
model, met_model, domain, species, year, month, and start_date.
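As a rough illustration of filename-derived metadata, the sketch below parses a subset of these fields from a file name. The naming pattern SITE-INLET_METMODEL_DOMAIN_YYYYMM.nc is an assumption made for this example, as are the function name and the sample values other than the site code; the real script may use different rules and also draws on the directory layout.

```python
import re
from datetime import date
from typing import Optional

# Assumed pattern: SITE-INLET_METMODEL_DOMAIN_YYYYMM.nc (hypothetical).
FOOTPRINT_NAME = re.compile(
    r"^(?P<site>[A-Za-z0-9]+)-(?P<inlet>[^_]+)"
    r"_(?P<met_model>[^_]+)_(?P<domain>[^_]+)"
    r"_(?P<year>\d{4})(?P<month>\d{2})\.nc$"
)


def infer_footprint_metadata(filename: str) -> Optional[dict]:
    """Return metadata inferred from a file name, or None if it does not match."""
    match = FOOTPRINT_NAME.match(filename)
    if match is None:
        return None
    fields: dict = match.groupdict()
    year, month = int(fields["year"]), int(fields["month"])
    if not 1 <= month <= 12:
        # Reject names whose month digits are not a valid calendar month.
        return None
    fields["year"] = year
    fields["month"] = month
    # Monthly files start on the first day of their month.
    fields["start_date"] = date(year, month, 1).isoformat()
    return fields


print(infer_footprint_metadata("BCOB-10magl_UKV_EUROPE_201901.nc"))
```

Returning None for names that do not match lets a caller skip or log unexpected files instead of recording partial metadata.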
Run it from a saved listing:
uv run python examples/catalog_acrg_name_footprints.py /tmp/ogcat-footprints --listing /path/to/fp_name_recursive_ls.txt
Run it from a mounted source tree:
uv run python examples/catalog_acrg_name_footprints.py /tmp/ogcat-footprints --source-root /group/chem/acrg/LPDM/fp_NAME
The footprint script uses a named metadata schema for required fields and stores records as external references. It does not copy netCDF files into the catalog.
After building
Inspect records with the CLI:
uv run ogcat info --catalog /tmp/ogcat-fluxes
uv run ogcat search --catalog /tmp/ogcat-fluxes species=CO2 --fields id,product,year,path
uv run ogcat search --catalog /tmp/ogcat-footprints site=BCOB --fields id,domain,year,month,path