Typed Record Schemas
ogcat keeps schemas deliberately small. A catalog stores all schemas in a
record_schemas mapping and uses default_record_schema to identify the
broad, heterogeneous ingest fallback. Each RecordSchema can describe metadata
fields, a directory template, a filename template, and a short description.
Metadata field descriptions can also carry lightweight type names. These are serialised as human-readable schema hints for now; they are not enforced by the catalog core.
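The shape described above can be pictured with plain dataclasses. This is a simplified stand-in, not ogcat's actual class definitions: the class names `FieldSketch`, `SchemaSketch`, and `SpecSketch`, the `fallback_schema()` helper, and the `{site}` placeholder syntax are all assumptions made for illustration. Only the attribute names `record_schemas`, `default_record_schema`, `metadata_fields`, `directory_template`, `filename_template`, and `value_types` come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class FieldSketch:
    """Hypothetical stand-in for a metadata field description."""
    name: str
    description: str = ""
    required: bool = False
    # Lightweight type hint labels such as "str" or "date".
    value_types: list[str] = field(default_factory=list)


@dataclass
class SchemaSketch:
    """Hypothetical stand-in for a record schema."""
    description: str = ""
    metadata_fields: list[FieldSketch] = field(default_factory=list)
    directory_template: str = ""
    filename_template: str = ""


@dataclass
class SpecSketch:
    """Hypothetical stand-in for the catalog spec."""
    record_schemas: dict[str, SchemaSketch]
    default_record_schema: str

    def fallback_schema(self) -> SchemaSketch:
        # The default name is just a key into the same mapping.
        return self.record_schemas[self.default_record_schema]


spec = SpecSketch(
    record_schemas={
        "generic": SchemaSketch(description="Broad, free-form ingest fallback."),
        "flux": SchemaSketch(
            description="Flux measurements.",
            metadata_fields=[FieldSketch(name="site", required=True, value_types=["str"])],
            # Placeholder syntax below is assumed, not taken from ogcat.
            directory_template="flux/{site}",
            filename_template="{site}.csv",
        ),
    },
    default_record_schema="generic",
)
```

Keeping the default as a name that points into `record_schemas`, rather than as a separate object, means there is only one place where any schema is defined.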
The configured default_record_schema is the source of truth for broad catalog
behavior. Earlier MVP top-level fields such as metadata_fields,
directory_template, filename_template, and default_schema were removed
before any real catalog migration burden existed, which keeps CatalogSpec
smaller and avoids parallel compatibility state.
For this first pass, record type and schema name are the same concept only where
a named schema exists. Catalog.add_file(..., record_type="flux") selects the
flux schema and raises a clear error if that named schema is missing. Generic
artifact records can still use arbitrary record types; they fall back to the
default schema unless a matching named schema is present.
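One possible reading of the lookup rule above is sketched here with plain dictionaries. The function `resolve_schema_name` and its `require_named` flag are hypothetical and exist only for this illustration; ogcat may decide between the two paths differently.

```python
def resolve_schema_name(record_schemas, default_record_schema, record_type, *, require_named=False):
    """Pick the schema name for a record type (illustrative only)."""
    if record_type in record_schemas:
        # A named schema matching the record type always wins.
        return record_type
    if require_named:
        # Typed path: the caller asked for a specific named schema.
        raise KeyError(
            f"no record schema named {record_type!r}; "
            f"known schemas: {sorted(record_schemas)}"
        )
    # Generic path: arbitrary record types use the default schema.
    return default_record_schema


schemas = {"generic": {}, "flux": {}}

typed = resolve_schema_name(schemas, "generic", "flux", require_named=True)
generic = resolve_schema_name(schemas, "generic", "notebook")
```

Here `typed` is `"flux"` and `generic` is `"generic"`; asking for an unknown type with `require_named=True` raises `KeyError`.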
Validation remains lightweight and schema-driven. Required metadata fields are
checked when a record is added. If a schema supplies value_types, the
corresponding metadata values are also validated on add, and records with
incompatible metadata can be rejected.
Callers can use ogcat.validate_metadata() or ogcat.validate_record() to get
structured validation reports for CLI output, tests, or plugin code.
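To make the idea of a structured report concrete, here is a minimal self-contained sketch of a required-field check. It does not use ogcat: `IssueSketch`, `ReportSketch`, `check_required`, and the `"missing_required"` code are hypothetical names, and the only detail taken from the text is that a report exposes an `ok` flag.

```python
from dataclasses import dataclass, field


@dataclass
class IssueSketch:
    """Hypothetical stand-in for one validation issue."""
    field_name: str
    code: str
    message: str


@dataclass
class ReportSketch:
    """Hypothetical stand-in for a validation report."""
    issues: list[IssueSketch] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # A report is ok only when no issues were collected.
        return not self.issues


def check_required(metadata: dict, required_fields: list[str]) -> ReportSketch:
    """Collect one issue per required field that is absent or None."""
    report = ReportSketch()
    for name in required_fields:
        if metadata.get(name) is None:
            report.issues.append(
                IssueSketch(name, "missing_required", f"required field {name!r} is missing")
            )
    return report


good = check_required({"title": "Example"}, ["title"])
bad = check_required({"extra": 1}, ["title"])
```

Returning a report instead of raising on the first problem lets a CLI or test print every issue at once, which is the main reason to prefer this shape.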
Unknown metadata is allowed by default so broad catalogs can stay free-form:
from ogcat import RecordSchema, validate_metadata
schema = RecordSchema()
report = validate_metadata({"title": "Example", "extra": "allowed"}, schema)
assert report.ok
Project catalogs can opt into strict unknown-field handling by setting
allow_unknown_metadata=False on a schema and calling validation with
strict=True:
from ogcat import MetadataFieldDescription, RecordSchema, validate_metadata

schema = RecordSchema(
    metadata_fields=[
        MetadataFieldDescription(name="title", description="Short title.", required=True),
    ],
    allow_unknown_metadata=False,
)
report = validate_metadata({"title": "Example", "extra": "blocked"}, schema, strict=True)
assert not report.ok
Field descriptions can also include simple value_types labels such as str,
int, number, bool, date, datetime, list[str], and dict. These are
validated internally with Pydantic while the public runtime objects remain
dataclasses. Schema authors can call validate_schema() or validate_spec() to
catch unsupported type labels without causing repeated warnings during regular
record validation. Domain-specific checks should live in plugins or caller code
and can append their own ValidationIssue objects to the same report format.
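The sketch below illustrates the two steps mentioned above, checking labels once up front and then checking values, using only the standard library. It swaps in plain `isinstance` checks instead of Pydantic, so it is likely stricter than the real behavior (for example, it does not accept ISO strings for `date`). The names `CHECKS`, `unsupported_labels`, and `value_matches` are hypothetical.

```python
from datetime import date, datetime

# Label -> predicate. bool is excluded from "int" and "number" because
# bool is a subclass of int in Python; datetime is excluded from "date"
# because datetime is a subclass of date.
CHECKS = {
    "str": lambda v: isinstance(v, str),
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "bool": lambda v: isinstance(v, bool),
    "date": lambda v: isinstance(v, date) and not isinstance(v, datetime),
    "datetime": lambda v: isinstance(v, datetime),
    "list[str]": lambda v: isinstance(v, list) and all(isinstance(i, str) for i in v),
    "dict": lambda v: isinstance(v, dict),
}


def unsupported_labels(labels):
    """Schema-time check: return the labels this sketch cannot validate."""
    return [label for label in labels if label not in CHECKS]


def value_matches(value, labels):
    """Record-time check: True if the value satisfies any supported label."""
    supported = [label for label in labels if label in CHECKS]
    if not supported:
        # No usable labels means no type constraint.
        return True
    return any(CHECKS[label](value) for label in supported)
```

Running `unsupported_labels` once when a schema is authored, and silently skipping unknown labels in `value_matches`, mirrors the goal of surfacing bad labels early rather than on every record.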