Tutorial: basic catalog
This tutorial builds a small catalog with two named schemas, adds both managed files and external references, searches across records, and updates metadata through the repository API.
Setup
From the repository root:
uv sync
Create a catalog with multiple schemas
from pathlib import Path
from tempfile import TemporaryDirectory

from ogcat import ArtifactLocator, Catalog, CatalogSpec, MetadataFieldDescription, RecordSchema

measurement_schema = RecordSchema(
    description="Managed local measurement files.",
    metadata_fields=[
        MetadataFieldDescription(name="title", description="Human-readable title.", required=True),
        MetadataFieldDescription(name="site", description="Site code.", required=True),
        MetadataFieldDescription(name="species", description="Species code.", required=True),
        MetadataFieldDescription(name="year", description="Calendar year.", value_types=["int"]),
    ],
)

reference_schema = RecordSchema(
    description="References to external artifacts that ogcat does not copy.",
    metadata_fields=[
        MetadataFieldDescription(name="title", description="Reference title.", required=True),
        MetadataFieldDescription(name="kind", description="Reference kind.", required=True),
        MetadataFieldDescription(name="topic", description="Searchable topic."),
    ],
)

spec = CatalogSpec(
    catalog_name="tutorial",
    record_schemas={
        "measurement": measurement_schema,
        "reference": reference_schema,
    },
)
Add files and references
with TemporaryDirectory(prefix="ogcat-tutorial-") as tmp:
    root = Path(tmp)
    source_dir = root / "source"
    source_dir.mkdir()

    catalog = Catalog.create(root / "catalog", spec)

    ch4_file = source_dir / "mhd_ch4_2024.txt"
    ch4_file.write_text("demo methane data", encoding="utf-8")

    measurement = catalog.add_file(
        ch4_file,
        record_type="measurement",
        metadata={
            "title": "MHD methane observations",
            "site": "MHD",
            "species": "CH4",
            "year": 2024,
        },
    )

    reference = catalog.add_artifact(
        record_type="reference",
        locator=ArtifactLocator(kind="uri", value="https://example.org/mhd-method"),
        metadata={
            "title": "MHD processing method",
            "kind": "method-note",
            "topic": "methane",
        },
        storage_mode="external",
    )
add_file() copies the source file into the catalog’s managed files/ tree.
add_artifact() records a locator and metadata without copying or moving data.
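To make the distinction concrete, here is a minimal standard-library sketch of the two storage styles. It does not use ogcat and is not how ogcat is implemented: add_managed, add_external, the "managed" label, and the dict layout are hypothetical names chosen for illustration only.

```python
import shutil
from pathlib import Path
from tempfile import TemporaryDirectory


def add_managed(catalog_root: Path, source: Path) -> dict:
    """Hypothetical helper: copy the source into <catalog_root>/files/."""
    files_dir = catalog_root / "files"
    files_dir.mkdir(parents=True, exist_ok=True)
    target = files_dir / source.name
    shutil.copy2(source, target)  # copy, so the source file stays where it was
    return {"storage_mode": "managed", "path": str(target)}


def add_external(kind: str, value: str) -> dict:
    """Hypothetical helper: record a locator only; no bytes are copied or moved."""
    return {"storage_mode": "external", "locator": {"kind": kind, "value": value}}


with TemporaryDirectory() as tmp:
    root = Path(tmp)
    source = root / "mhd_ch4_2024.txt"
    source.write_text("demo methane data", encoding="utf-8")

    managed = add_managed(root / "catalog", source)
    external = add_external("uri", "https://example.org/mhd-method")

    # Capture the state while the temporary directory still exists.
    managed_copy_exists = Path(managed["path"]).exists()
    source_still_exists = source.exists()
    files_in_catalog = sorted(p.name for p in (root / "catalog" / "files").iterdir())
```

In this sketch the managed record points at a copy inside the catalog tree, while the external record carries only the locator, so the catalog never touches the referenced data.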
Search and inspect records
    # Still inside the `with TemporaryDirectory(...)` block from the previous step.
    ch4_records = catalog.search(where={"species": "CH4"})
    topic_matches = catalog.search(contains={"topic": "methane"}, ignore_case=True)

    print(ch4_records[0].path())
    print(topic_matches[0].locator.value)
Unqualified fields such as species and topic are resolved across top-level
record fields, user_metadata, and derived_metadata. Use dotted paths such as
user_metadata.species when you need to be explicit.
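The resolution rule can be pictured with a small plain-Python sketch over nested dicts. This is an illustration, not ogcat's code: resolve_field is a hypothetical helper, and the lookup order (top-level, then user_metadata, then derived_metadata) is an assumption, since the tutorial names the three places but not their precedence.

```python
from typing import Any

# Assumed search order for unqualified names after the top-level fields.
SECTIONS = ("user_metadata", "derived_metadata")


def resolve_field(record: dict, field: str) -> Any:
    """Hypothetical lookup: return the value for `field`, or None if absent."""
    if "." in field:
        # Dotted path: walk the nested dicts exactly as written.
        current: Any = record
        for part in field.split("."):
            if not isinstance(current, dict) or part not in current:
                return None
            current = current[part]
        return current
    # Unqualified name: try top-level fields first, then each metadata section.
    if field in record:
        return record[field]
    for section in SECTIONS:
        section_values = record.get(section, {})
        if field in section_values:
            return section_values[field]
    return None


record = {
    "id": "rec-1",
    "user_metadata": {"species": "CH4", "year": 2024},
    "derived_metadata": {"species": "methane", "size_bytes": 17},
}

unqualified = resolve_field(record, "species")
explicit = resolve_field(record, "derived_metadata.species")
derived_only = resolve_field(record, "size_bytes")
missing = resolve_field(record, "quality_flag")
```

When the same name appears in more than one place, as species does here, an unqualified lookup returns whichever match the precedence rule reaches first. A dotted path removes that ambiguity.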
Modify metadata with the repository
Catalog.repository is the low-level record store. Use it when you need to
update an existing record in place.
    # Still inside the `with TemporaryDirectory(...)` block.
    saved = catalog.get(measurement.id)
    assert saved is not None

    saved.user_metadata["quality_flag"] = "reviewed"
    catalog.repository.update(saved)

    reviewed = catalog.search(where={"quality_flag": "reviewed"})
    assert reviewed[0].id == measurement.id
Repository updates replace the stored record with the modified
CatalogRecord. Keep record IDs and reserved fields intact unless you are
intentionally changing them.
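Replace semantics are why the fetch, modify, write-back pattern matters. The sketch below uses a hypothetical in-memory store (InMemoryRepository is not part of ogcat, and plain dicts stand in for CatalogRecord) and assumes an update swaps the whole stored record rather than merging fields.

```python
import copy
from typing import Optional


class InMemoryRepository:
    """Hypothetical stand-in for a record store; not ogcat's repository."""

    def __init__(self) -> None:
        self._records: dict = {}

    def add(self, record: dict) -> None:
        self._records[record["id"]] = copy.deepcopy(record)

    def get(self, record_id: str) -> Optional[dict]:
        stored = self._records.get(record_id)
        return copy.deepcopy(stored) if stored is not None else None

    def update(self, record: dict) -> None:
        if record["id"] not in self._records:
            raise KeyError(record["id"])
        # Whole-record replacement: nothing from the old version is merged in.
        self._records[record["id"]] = copy.deepcopy(record)


repo = InMemoryRepository()
repo.add({"id": "rec-1", "user_metadata": {"title": "MHD methane observations", "species": "CH4"}})

# Safe pattern: fetch the stored record, modify it, write it back.
saved = repo.get("rec-1")
saved["user_metadata"]["quality_flag"] = "reviewed"
repo.update(saved)
after_safe = repo.get("rec-1")

# Risky pattern: a freshly built record drops every field it does not mention.
repo.update({"id": "rec-1", "user_metadata": {"quality_flag": "rejected"}})
after_partial = repo.get("rec-1")
```

Under that assumption, the second update silently loses title and species, which is the kind of accidental change the note above warns against.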
CLI equivalents
uv run ogcat init /tmp/tutorial-catalog --name tutorial
uv run ogcat add ./mhd_ch4_2024.txt --catalog /tmp/tutorial-catalog --meta title="MHD methane observations" site=MHD species=CH4 year=2024
uv run ogcat search --catalog /tmp/tutorial-catalog species=CH4 --fields id,title,species,path