Locators and storage

A locator tells ogcat where a catalogued artifact lives. The locator is stored in the catalog record alongside the metadata and is independent of how the file ended up there.

Locator kinds

path : A local filesystem path. Managed files ingested with add_file() use this kind. Path-backed records support ogcat.CatalogRecord.path() and the ogcat path CLI command.

urlpath : An fsspec-addressable URL path, such as ssh://host/path/file.nc or s3://bucket/path/store.zarr. These locators are interpreted only when fsspec-backed storage behavior is requested.

uri : An external reference that ogcat records but does not manage or inspect. Use this for DOI, FTP, HTTP, ICOS, object-store, or project-specific references that domain code will interpret later.

opaque : A placeholder used when the locator is not yet set or when no path is applicable. You will not normally see this in practice.

Other project-specific kinds can be stored by constructing ogcat.ArtifactLocator directly, but ogcat does not interpret them beyond recording the string value.
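
The kinds above suggest a simple dispatch in domain code. The following is a minimal sketch, not ogcat code: the Locator dataclass is a hypothetical stand-in for ogcat.ArtifactLocator (only its kind and value fields are taken from this page), and describe() is an illustrative helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locator:
    """Hypothetical stand-in for ogcat.ArtifactLocator: a kind plus a string value."""

    kind: str
    value: str


def describe(locator: Locator) -> str:
    """Summarise how domain code might treat each locator kind."""
    if locator.kind == "path":
        return f"local file: {locator.value}"
    if locator.kind == "urlpath":
        # The text before "://" names the fsspec protocol, e.g. "s3" or "ssh".
        protocol = locator.value.split("://", 1)[0]
        return f"fsspec target via {protocol!r}"
    if locator.kind == "uri":
        return "external reference, left to domain code"
    if locator.kind == "opaque":
        return "no usable location"
    # Any other kind is project-specific and only recorded as a string.
    return f"project-specific kind {locator.kind!r}"


print(describe(Locator("path", "/data/shared/flux.nc")))
print(describe(Locator("urlpath", "s3://bucket/path/store.zarr")))
print(describe(Locator("uri", "ftp://example.org/data/file.nc")))
```

The final branch mirrors the rule that unknown kinds are stored but never interpreted.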

Managed files

catalog.add_file() copies or moves the source file into the catalog’s files/ tree and records a path locator pointing at the stored copy.

from pathlib import Path

# catalog is an existing ogcat catalog object
record = catalog.add_file(
    Path("data.nc"),
    metadata={"species": "CO2"},
    operation="copy",     # or "move"
)
print(record.path())      # path inside files/

The storage location is derived from directory and filename templates stored in catalog.json. The defaults are:

directory: {year_added}/{original_stem}
filename:  {title_slug|original_stem}{original_suffix}
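
To make the default templates concrete, here is a small sketch of how such templates could be rendered. It is not ogcat's renderer: render_template() is a hypothetical helper, the reading of | as "use the first field that has a value" is an assumption based on the template syntax, and the field values are invented for illustration.

```python
import re

# Matches one {field} or {first|second|...} placeholder.
_PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def render_template(template: str, values: dict) -> str:
    """Fill placeholders, trying '|'-separated field names left to right."""

    def substitute(match: re.Match) -> str:
        names = [name.strip() for name in match.group(1).split("|")]
        for name in names:
            value = values.get(name)
            if value:  # skip missing or empty values
                return str(value)
        raise KeyError(f"no value for any of {names}")

    return _PLACEHOLDER.sub(substitute, template)


# Illustrative values only; no title is set, so title_slug is absent.
values = {"year_added": "2024", "original_stem": "data", "original_suffix": ".nc"}

directory = render_template("{year_added}/{original_stem}", values)
filename = render_template("{title_slug|original_stem}{original_suffix}", values)
print(f"{directory}/{filename}")  # 2024/data/data.nc
```

Under the same assumption, adding "title_slug": "example" to the values would yield the filename example.nc.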

Storage plans

Catalog.plan_artifact_storage() performs the planning part of an add operation without writing data or inserting a record. It validates metadata, applies the same naming templates, lets locator-resolution hooks adjust the result, and returns a StoragePlan.

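from pathlib import Path

# Nothing is written and no record is inserted; only the plan is computed.
plan = catalog.plan_artifact_storage(
    Path("incoming/example.nc"),
    metadata={"title": "example"},
    write_mode="copy",
)
print(plan.locator)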

StoragePlan describes storage only. It carries the resolved locator, target kind, write mode, storage-relative path, resolved directory, and resolved filename. It does not carry record metadata; pass metadata again to add_artifact(...) when turning a storage plan into a record.

Overriding template-derived storage paths

Pass an explicit locator when the correct target path is known and should not be derived from the schema naming templates. This is useful when the physical source filename is not the filename that should be stored, such as a .zip archive that contains a single .nc member.

from pathlib import Path

from ogcat import ArtifactLocator, UnzipSingleFileArtifactWriter, path_source

archive_path = Path("incoming/GCP-GridFEDv2023.1_2018.zip")
target_path = catalog.root / "files" / "flux/raw/GridFED/v2023.1/co2-o2/GCP-GridFEDv2023.1_2018.nc"

plan = catalog.plan_artifact_storage(
    archive_path,
    record_type="raw_flux",
    locator=ArtifactLocator.from_path(target_path),
    target_kind="file",
    write_mode="write",
    metadata={"product": "GridFED", "version": "v2023.1", "species": "co2-o2", "year": 2018},
)

record = catalog.add_artifact(
    record_type="raw_flux",
    storage_plan=plan,
    metadata={"product": "GridFED", "version": "v2023.1", "species": "co2-o2", "year": 2018},
    source=path_source(archive_path, kind="zip_file"),
    artifact_writer=UnzipSingleFileArtifactWriter(),
)

When a locator is supplied, plan_artifact_storage(...) still validates metadata and exposes the planned locator to hooks, but it does not render the schema directory and filename templates. The resulting record uses the explicit locator from the plan.

Hook timing matters. before_validate_metadata runs before planning, so it receives neither context.planned_locators nor context.storage_plan. resolve_artifact_locator receives the proposed locators in context.planned_locators and can return the locator to use for the artifact being added. After that hook returns, ogcat builds the final StoragePlan and exposes it to later hooks and artifact writers as context.storage_plan.

The plan lets domain code materialise a generic artifact, such as a directory of NetCDF files or a .zarr store, while ogcat core only records the locator. Artifact writers remain responsible for filesystem side effects and rollback registration.
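
The ordering can be pictured with a toy model. This sketch does not use ogcat and does not reproduce its hook signatures: simulate_add() and prefer_zarr() are hypothetical, the context is a plain SimpleNamespace, and modelling planned_locators as a list of strings is an assumption. Only the hook names and the two context attributes come from the description above.

```python
from types import SimpleNamespace


def simulate_add(resolve_hook, proposed_locator):
    """Toy model of the hook order; not ogcat's implementation."""
    log = []
    context = SimpleNamespace()

    # 1. before_validate_metadata: planning has not started, so the context
    #    carries neither planned_locators nor storage_plan.
    log.append(("before_validate_metadata", sorted(vars(context))))

    # 2. resolve_artifact_locator: sees the proposal and may replace it.
    context.planned_locators = [proposed_locator]  # list shape is an assumption
    log.append(("resolve_artifact_locator", sorted(vars(context))))
    chosen = resolve_hook(context) or proposed_locator

    # 3. Later hooks and artifact writers: the final plan is available.
    context.storage_plan = SimpleNamespace(locator=chosen)
    log.append(("artifact_writer", sorted(vars(context))))
    return context.storage_plan, log


def prefer_zarr(context):
    """Hypothetical hook: store the artifact as a .zarr store instead."""
    return context.planned_locators[0].replace(".nc", ".zarr")


plan, log = simulate_add(prefer_zarr, "files/2024/example/example.nc")
print(plan.locator)
for stage, attributes in log:
    print(stage, attributes)
```

The log shows an empty context at the first stage, planned_locators at the second, and both attributes once the writer runs.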

External references

To catalog a file that should stay in place, use add_artifact() with a path locator and record_type="external_reference".

from ogcat import ArtifactLocator

catalog.add_artifact(
    record_type="external_reference",
    locator=ArtifactLocator.from_path("/data/shared/flux.nc"),
    metadata={"species": "CO2"},
)

The file is not copied or moved. ogcat records only the path and the metadata.

For non-local references, use a uri locator when ogcat should not check or manage the target:

catalog.add_artifact(
    record_type="external_reference",
    locator=ArtifactLocator(kind="uri", value="ftp://example.org/data/file.nc"),
    storage_mode="external",
)

Use ArtifactLocator.from_urlpath(...) when the location should be interpreted by fsspec-backed storage adapters. fsspec is an optional dependency: install it through the ogcat[fsspec] extra before a writer performs fsspec-backed storage work.

Catalog layout

<catalog-root>/
  catalog.json      catalog specification and schemas
  db.json           TinyDB record store
  files/            managed file storage tree
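
A quick way to sanity-check a directory against this layout is sketched below. It uses only the standard library; missing_layout_entries() is a hypothetical helper, not part of ogcat, and it checks only that the three entries exist, not their contents.

```python
import tempfile
from pathlib import Path

# Entry names from the layout above, with the expected entry type.
EXPECTED_ENTRIES = {"catalog.json": "file", "db.json": "file", "files": "dir"}


def missing_layout_entries(root: Path) -> list:
    """Return the expected entries that are absent or of the wrong type."""
    missing = []
    for name, kind in EXPECTED_ENTRIES.items():
        entry = root / name
        present = entry.is_file() if kind == "file" else entry.is_dir()
        if not present:
            missing.append(name)
    return missing


# Demonstration on a throwaway directory that lacks db.json.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "catalog.json").write_text("{}")
    (root / "files").mkdir()
    result = missing_layout_entries(root)

print(result)  # ['db.json']
```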