Catalog API

class ogcat.Catalog(root, spec, repository, hook_manager=<factory>)

Bases: object

User-facing API bound to one catalog root.

Parameters:
  • root (Path) – Root directory containing catalog.json, db.json, and managed files.

  • spec (CatalogSpec) – Catalog specification loaded from or written to catalog.json.

  • repository (CatalogRepository) – Record storage backend.

  • hook_manager (HookManager) – Dispatcher for lifecycle hooks.

root: Path
spec: CatalogSpec
repository: CatalogRepository
hook_manager: HookManager
classmethod create(root, spec, *, plugins=None, hooks=None)

Create a catalog directory and write its specification.

Parameters:
  • root (str | Path) – Directory to create or reuse for the catalog.

  • spec (CatalogSpec) – Catalog specification to persist.

  • plugins (PluginRegistry | None) – Optional plugin registry used to build a hook manager.

  • hooks (HookManager | None) – Optional hook manager. Pass either plugins or hooks, not both.

Return type:

Catalog

Returns:

Open catalog instance bound to root.

Raises:

ValueError – If the configured backend is unsupported, or both plugins and hooks are supplied.

classmethod open(root, *, plugins=None, hooks=None)

Open an existing catalog from disk.

Parameters:
  • root (str | Path) – Existing catalog root containing catalog.json.

  • plugins (PluginRegistry | None) – Optional plugin registry used to build a hook manager.

  • hooks (HookManager | None) – Optional hook manager. Pass either plugins or hooks, not both.

Return type:

Catalog

Returns:

Open catalog instance bound to root.

Raises:
  • FileNotFoundError – If catalog.json is missing.

  • ValueError – If the configured backend is unsupported, or both plugins and hooks are supplied.
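
The two argument checks documented for create() and open() can be illustrated with a small stand-alone sketch. It does not import ogcat; resolve_hook_source and require_catalog_json are hypothetical helper names, and the real classmethods may perform these checks differently.

```python
from pathlib import Path
from typing import Optional


def resolve_hook_source(plugins: Optional[object] = None,
                        hooks: Optional[object] = None) -> Optional[object]:
    """Hypothetical helper: accept at most one of plugins / hooks."""
    if plugins is not None and hooks is not None:
        raise ValueError("pass either plugins or hooks, not both")
    return hooks if hooks is not None else plugins


def require_catalog_json(root) -> Path:
    """Hypothetical helper: fail early when catalog.json is absent."""
    spec_path = Path(root) / "catalog.json"
    if not spec_path.is_file():
        raise FileNotFoundError(spec_path)
    return spec_path
```

Checking the arguments before touching the disk keeps a misconfigured call from leaving a half-initialised catalog root behind.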

add_file(path, metadata=None, operation=None, record_type=None)

Add a local file using managed copy or move.

Parameters:
  • path (str | Path) – Source file to ingest.

  • metadata (MetadataDict | None) – JSON-compatible user metadata.

  • operation (str | None) – "copy" or "move". Defaults to the catalog spec.

  • record_type (str | None) – Optional named schema to validate against.

Return type:

CatalogRecord

Returns:

Persisted catalog record.

Raises:
  • TypeError – If metadata is not a dictionary.

  • ValueError – If validation fails, the operation is unsupported, or record_type names an unknown schema.
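
The difference between the "copy" and "move" operations can be sketched with the standard library alone. This is an illustration of the two modes, not ogcat's ingest code; ingest and dest_dir are hypothetical names, and the real method also validates metadata and writes a catalog record.

```python
import shutil
from pathlib import Path


def ingest(src, dest_dir, operation="copy") -> Path:
    """Place src under dest_dir by copying or moving it (sketch only)."""
    src = Path(src)
    dest_dir = Path(dest_dir)
    if operation not in ("copy", "move"):
        raise ValueError(f"unsupported operation: {operation!r}")
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / src.name
    if operation == "copy":
        shutil.copy2(src, target)           # source file stays in place
    else:
        shutil.move(str(src), str(target))  # source file is relocated
    return target
```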

plan_artifact_storage(path=None, *, record_type=None, metadata=None, locator=None, target_kind='file', write_mode=None, ogcat_owned=True, storage_root=None)

Plan artifact storage without writing data or a catalog record.

Parameters:
  • path (str | Path | None) – Optional local source path used for naming and copy/move plans.

  • record_type (str | None) – Optional named schema to validate and use for naming.

  • metadata (MetadataDict | None) – JSON-compatible user metadata.

  • locator (ArtifactLocator | None) – Optional pre-resolved target locator. When omitted, schema naming templates are rendered under storage_root or this catalog’s managed files root.

  • target_kind (Literal['file', 'directory']) – Whether the target is a file-like or directory-like artifact.

  • write_mode (Optional[Literal['copy', 'move', 'write', 'reference']]) – Desired materialisation mode. Defaults to "write" for owned artifacts and "reference" otherwise.

  • ogcat_owned (bool) – Whether ogcat should treat the target as managed.

  • storage_root (str | Path | None) – Optional local root or fsspec URL root for rendered template targets.

Return type:

StoragePlan

Returns:

Planned storage decision.
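
The documented default for write_mode ("write" for owned artifacts, "reference" otherwise) amounts to a small decision rule. The sketch below is a stand-alone illustration; resolve_write_mode is a hypothetical name, and raising ValueError for an unknown mode is an assumption rather than documented behaviour.

```python
from typing import Optional

VALID_MODES = ("copy", "move", "write", "reference")


def resolve_write_mode(write_mode: Optional[str] = None,
                       ogcat_owned: bool = True) -> str:
    """Pick the effective write mode (sketch of the documented default)."""
    if write_mode is None:
        return "write" if ogcat_owned else "reference"
    if write_mode not in VALID_MODES:
        # Assumption: the source does not say how unknown modes are handled.
        raise ValueError(f"unknown write_mode: {write_mode!r}")
    return write_mode
```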

add_artifact(*, record_type, locator=None, storage_plan=None, metadata=None, storage_mode=None, original_path=None, original_filename=None, suffixes=None, derived_metadata=None, naming_metadata=None, time_added=None, source=None, artifact_writer=None, transaction=None)

Add an artifact record and optionally materialise planned storage.

This is the minimal general record API. add_file() remains the convenience wrapper for managed ingest: it prepares a path-backed locator and delegates through the same lifecycle.

Parameters:
  • record_type (str) – Logical type of record to create.

  • locator (ArtifactLocator | None) – Artifact locator to store with the record. Required unless storage_plan is supplied.

  • storage_plan (StoragePlan | None) – Optional planned storage decision to use instead of a standalone locator.

  • metadata (MetadataDict | None) – JSON-compatible user metadata.

  • storage_mode (str | None) – Optional description of the storage mode, such as "external".

  • original_path (str | Path | None) – Optional source path or URI.

  • original_filename (str | None) – Optional source filename.

  • suffixes (list[str] | None) – Optional suffix list for the source artifact.

  • derived_metadata (MetadataDict | None) – Optional derived metadata to persist.

  • naming_metadata (MetadataDict | None) – Optional naming metadata to persist.

  • time_added (str | None) – Optional timestamp override.

  • source (OperationSource | None) – Optional operation source for hooks and writers.

  • artifact_writer (ArtifactWriter | None) – Optional writer that materialises data before the record is written.

  • transaction (UnitOfWork | None) – Optional caller-owned unit of work.

Return type:

CatalogRecord

Returns:

Persisted or staged catalog record.

Raises:
  • TypeError – If metadata or writer inputs are invalid.

  • ValueError – If validation fails or the transaction belongs to a different repository.

transaction()

Create a best-effort unit of work for composed catalog operations.

The current TinyDB backend uses staged writes and compensating rollback actions. This context manager does not provide true database transactions or ACID semantics.

Return type:

Iterator[UnitOfWork]
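
To make "staged writes and compensating rollback actions" concrete, here is a minimal stand-alone sketch of the pattern. SketchUnitOfWork and sketch_transaction are hypothetical names and are not ogcat's UnitOfWork; the sketch only shows why such a scheme is best-effort rather than ACID.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List, Tuple

Action = Callable[[], None]


class SketchUnitOfWork:
    """Collects (apply, undo) pairs and applies them on commit."""

    def __init__(self) -> None:
        self._staged: List[Tuple[Action, Action]] = []

    def stage(self, apply: Action, undo: Action) -> None:
        self._staged.append((apply, undo))

    def commit(self) -> None:
        applied: List[Action] = []
        try:
            for apply, undo in self._staged:
                apply()
                applied.append(undo)
        except Exception:
            # Compensate in reverse order; an undo that fails here would
            # leave partial state, which is why this is only best-effort.
            for undo in reversed(applied):
                undo()
            raise
        finally:
            self._staged.clear()


@contextmanager
def sketch_transaction() -> Iterator[SketchUnitOfWork]:
    uow = SketchUnitOfWork()
    yield uow      # an exception in the body skips commit entirely
    uow.commit()
```

Nothing is written until the block exits cleanly, and a failure during commit triggers the undo actions of the writes that already succeeded.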

add_artifacts(artifacts)

Add multiple artifact records.

Each item is a dictionary of the keyword arguments accepted by add_artifact(). Items are added one at a time so hooks and artifact writers run consistently for each record. The batch is not atomic: earlier items remain committed if a later item fails.

Parameters:

artifacts (list[dict[str, object]]) – List of dictionaries accepted by add_artifact().

Return type:

list[CatalogRecord]

Returns:

Persisted records in input order.
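
The one-at-a-time behaviour described above can be sketched as a plain loop. add_each and add_one are hypothetical names; add_one stands in for a bound add_artifact() call and is not part of ogcat.

```python
from typing import Callable, Dict, List


def add_each(items: List[Dict[str, object]],
             add_one: Callable[..., object]) -> List[object]:
    """Add items sequentially; a failure stops the loop without undoing
    the items that were already added (sketch only)."""
    results = []
    for item in items:
        results.append(add_one(**item))
    return results
```

A caller that needs all-or-nothing behaviour would have to wrap the calls in its own unit of work instead of relying on the batch helper.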

search(query=None, *, where=None, contains=None, regex=None, match=None, exists=None, missing=None, ignore_case=False, as_record_set=False)

Search catalog records using backend-neutral query semantics.

Parameters:
  • query (SearchQuery | None) – Optional pre-built search query.

  • where (dict[str, object] | None) – Equality filters.

  • contains (dict[str, object] | None) – Substring or list-membership filters.

  • regex (dict[str, str] | None) – Regular-expression filters.

  • match (dict[str, str] | None) – Glob or substring filters.

  • exists (Sequence[str] | None) – Fields that must be present.

  • missing (Sequence[str] | None) – Fields that must be absent.

  • ignore_case (bool) – Whether string comparisons should be case-insensitive.

  • as_record_set (bool) – Return a CatalogRecordSet instead of a list.

Return type:

list[CatalogRecord] | CatalogRecordSet

Returns:

Matching records, either as a list or record-set view.
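
The filter keywords can be read as predicates that must all hold for a record to match. The sketch below evaluates them over plain dictionaries. It is an approximation under stated assumptions, not ogcat's query engine: dotted field paths, glob-only handling of match, and the helper names lookup and matches are all hypothetical.

```python
import fnmatch
import re

_ABSENT = object()


def lookup(record, field):
    """Resolve a dotted field path in nested dicts (dotted paths are an assumption)."""
    current = record
    for part in field.split("."):
        if not isinstance(current, dict) or part not in current:
            return _ABSENT
        current = current[part]
    return current


def _norm(value, ignore_case):
    return value.lower() if ignore_case and isinstance(value, str) else value


def matches(record, *, where=None, contains=None, regex=None, match=None,
            exists=None, missing=None, ignore_case=False):
    """Return True when every supplied filter accepts the record."""
    for field, expected in (where or {}).items():          # equality
        if _norm(lookup(record, field), ignore_case) != _norm(expected, ignore_case):
            return False
    for field, needle in (contains or {}).items():         # substring or membership
        value = lookup(record, field)
        if isinstance(value, str) and isinstance(needle, str):
            if _norm(needle, ignore_case) not in _norm(value, ignore_case):
                return False
        elif isinstance(value, list):
            if _norm(needle, ignore_case) not in [_norm(v, ignore_case) for v in value]:
                return False
        else:
            return False
    flags = re.IGNORECASE if ignore_case else 0
    for field, pattern in (regex or {}).items():            # regular expression
        value = lookup(record, field)
        if not isinstance(value, str) or re.search(pattern, value, flags) is None:
            return False
    for field, pattern in (match or {}).items():            # glob only in this sketch
        value = lookup(record, field)
        if not isinstance(value, str) or not fnmatch.fnmatchcase(
                _norm(value, ignore_case), _norm(pattern, ignore_case)):
            return False
    if any(lookup(record, f) is _ABSENT for f in (exists or ())):
        return False
    if any(lookup(record, f) is not _ABSENT for f in (missing or ())):
        return False
    return True
```

Under these assumptions, filtering a list of records is a comprehension such as `[r for r in records if matches(r, where={"metadata.site": "alpha"}, ignore_case=True)]`.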

record_set(records)

Wrap records in a sequence-like container.

Parameters:

records (Sequence[CatalogRecord]) – Records to expose through CatalogRecordSet helpers.

Return type:

CatalogRecordSet

Returns:

Record set using this catalog’s field resolution order.

describe()

Return a serialisable summary of catalog configuration and contents.

Return type:

dict[str, object]

list_metadata_fields(record_type=None)

Return serialisable metadata field descriptions for a schema.

Return type:

list[MetadataDict]

list_record_fields()

Return discoverable field paths present in stored records.

Return type:

list[str]
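
One way to picture "discoverable field paths" is to flatten nested record dictionaries into path strings. The sketch below assumes dotted path notation and sorted output, neither of which is confirmed by this reference; field_paths is a hypothetical name.

```python
from typing import Dict, Iterable, List


def field_paths(records: Iterable[Dict[str, object]]) -> List[str]:
    """Collect the paths of all leaf fields seen in any record (sketch only)."""
    found = set()

    def walk(value: Dict[str, object], prefix: str) -> None:
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            if isinstance(child, dict) and child:
                walk(child, path)   # descend into non-empty nested dicts
            else:
                found.add(path)     # leaf value (or empty dict)

    for record in records:
        walk(record, "")
    return sorted(found)
```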

unique_values(field)

Return unique scalar values present for a field across stored records.

Return type:

list[JsonValue]
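
Collecting unique scalar values can be sketched over plain dictionaries as follows. The sketch handles top-level fields only, keeps first-seen order, and treats None as a scalar; all three are assumptions, and unique_scalars is a hypothetical name.

```python
from typing import Dict, Iterable, List

_SCALARS = (str, int, float, bool, type(None))


def unique_scalars(records: Iterable[Dict[str, object]], field: str) -> List[object]:
    """Return distinct scalar values of a top-level field, in first-seen order."""
    seen = set()
    ordered = []
    for record in records:
        if field not in record:
            continue
        value = record[field]
        if not isinstance(value, _SCALARS):
            continue                      # lists and dicts are not scalars
        key = (type(value), value)        # keeps 1 and True distinct
        if key not in seen:
            seen.add(key)
            ordered.append(value)
    return ordered
```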

get_schema(record_type=None)

Return a serialisable schema description.

Return type:

dict[str, object]

list_record_schemas()

Return available named record schema names.

Return type:

list[str]

get(record_id)

Get a record by id.

Return type:

CatalogRecord | None

path(record_id)

Return the stored path for a path-backed record, if present.

Return type:

Path | None

add_record_schema(name, schema, *, overwrite=False)

Add or replace a record schema in the catalog spec.

Parameters:
  • name (str) – Record schema name.

  • schema (RecordSchema | dict[str, object]) – Schema object or serialised schema dictionary.

  • overwrite (bool) – Whether an existing schema may be replaced.

Return type:

None

Raises:
  • ValueError – If the schema already exists and overwrite is false, or if the resulting spec is invalid.

  • TypeError – If schema is not a valid schema object.
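
The overwrite guard can be sketched with a plain dictionary standing in for the catalog spec's schema table. add_schema is a hypothetical name, and the sketch accepts only dictionaries, whereas the real method also accepts RecordSchema objects and revalidates the resulting spec.

```python
from typing import Dict


def add_schema(schemas: Dict[str, dict], name: str, schema: object,
               *, overwrite: bool = False) -> None:
    """Register a schema unless it would silently replace an existing one."""
    if not isinstance(schema, dict):
        raise TypeError("schema must be a dictionary in this sketch")
    if name in schemas and not overwrite:
        raise ValueError(f"schema {name!r} already exists; pass overwrite=True")
    schemas[name] = schema
```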

set_default_record_schema(name)

Set the default record schema by name.

Return type:

None

update_spec(**fields)

Update simple catalog spec fields and persist catalog.json.

Supported fields are catalog_name, default_operation, and field_resolution_order. files_root changes require a dedicated migration operation and are intentionally rejected here.

Return type:

None
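
The field restrictions described for update_spec() can be pictured as an allowlist check. The sketch below works on a plain dictionary; apply_spec_update is a hypothetical name, and raising ValueError for rejected fields is an assumption, since the exception type is not stated here.

```python
from typing import Dict

ALLOWED_FIELDS = ("catalog_name", "default_operation", "field_resolution_order")


def apply_spec_update(spec: Dict[str, object], **fields: object) -> Dict[str, object]:
    """Return a copy of spec with the supported simple fields updated."""
    if "files_root" in fields:
        # Assumption: the exception type is not documented.
        raise ValueError("files_root changes require a dedicated migration operation")
    unknown = sorted(set(fields) - set(ALLOWED_FIELDS))
    if unknown:
        raise ValueError(f"unsupported spec fields: {unknown}")
    updated = dict(spec)
    updated.update(fields)
    return updated
```

Returning a copy rather than mutating in place means a rejected update leaves the original spec untouched.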