CLI & Operations
This document covers day-to-day workflows, the operational safety model, configuration, and troubleshooting.
Operational Model
The Dataset API is immutable from the perspective of clients. All changes happen through controlled workflows: - pipelines change physical data - CLI imports catalogue metadata and reconciles state - policies are updated in OPA
This prevents “clickops” drift.
CLI: Core Workflows
1) Validate a catalogue
Validation should happen before import.
dataset-cli validate catalogue catalogue.yaml
What validation should check: - YAML schema correctness - required fields - duplicate dataset_ids - invalid access_level values - unresolved references (optional)
2) Import a catalogue
dataset-cli import catalogue catalogue.yaml
Expected output: - created / updated counts - per-dataset errors (if partial success supported) - cleanup summary (if enabled)
3) Dry-run import
dataset-cli import catalogue catalogue.yaml --dry-run
Dry-run should print: - final selected dataset_ids after filters - planned create/update/delete actions
4) Export lineage candidates
dataset-cli export openlineage
Expected: - fetch from lineage backend - map lineage → dataset_ids - produce a governance/candidate YAML for curation
Filters & Selection
Use include/exclude patterns:
- +pattern include
- -pattern exclude
Example:
dataset-cli import catalogue catalogue.yaml \
--filter +datasets.*.gold.* \
--filter -datasets.*.gold.experimental_*
Configuration Surfaces
Typical configuration areas: - API base URL - DB connection - OPA URL - lineage backend URL/credentials - query limits/timeouts - public catalogue exposure rules
Keep configuration environment-driven and documented.
Troubleshooting Guide
“Dataset not found”
- dataset_id missing from catalogue DB
- import filters excluded it
- import failed validation
- cleanup removed it due to missing physical table
“Forbidden (403)”
- JWT missing groups/roles required by OPA policy
- service account not mapped in policy
- access_level is
restrictedand policy denies - token missing scope used by policy
“SQL rejected”
- non-SELECT statement
- unknown function
- references non-catalogued table
- multiple statements detected
“Schema endpoint fails”
- physical table missing or renamed
- reflection permissions insufficient
- unsupported column type mapping
“Lineage export is empty”
- lineage backend not receiving OpenLineage events
- wrong namespace filters
- wrong time window
- wrong Marquez project or API base URL
Operational Checks (Recommended)
- periodic import dry-run in CI
- periodic cleanup run (or as part of import)
- dashboards for:
- authorization denies
- validation rejects
- query latency
- lineage ingestion lag