Architecture
Overview
celine-grid is a FastAPI Backend-for-Frontend. It orchestrates three external concerns:
- Read path — proxy grid resilience data (wind/heat risk, substations, filters, summary) from the Digital Twin to the frontend, enforcing DSO network-ownership at the request boundary.
- CIM topology path — proxy ValueFetcherSpec-backed queries (tile-index, shapes, risks, nowcasting, trendline) from the Digital Twin, with lightweight presentation transformations (GeoJSON assembly for shapes).
- Write path — persist per-user alert rules and notification settings in PostgreSQL, then dispatch nudging events to the nudging-tool when the grid-resilience-flow pipeline completes.
Frontend ──► celine-grid BFF ──► Digital Twin API (read)
│
├──► PostgreSQL (alert rules, notification settings)
└──► nudging-tool (alert dispatch)
MQTT broker ──► pipeline_listener ──► alert_dispatcher ──► nudging-tool
Request lifecycle
Every protected request passes through two security layers before reaching a route handler:
PolicyMiddleware(security/middleware.py) — rejects requests with no recognisable token with401before any route handler is invoked. Public paths (/health,/api/docs,/api/redoc,/api/openapi.json) bypass this check.- FastAPI dependency (
api/deps.py) — decodes the JWT viaJwtUser.from_token, then delegates toGridAccessPolicyto perform an OPA evaluation for the specific action.
Authentication
Tokens are accepted from two sources (checked in order):
x-auth-request-access-tokenheader — set by OAuth2 Proxy in productionAuthorization: Bearer <token>header — used for direct API calls / service accounts
The header name is configurable via JWT_HEADER_NAME.
Authorization — OPA policy
policies/grid.rego (package celine.grid.access) defines three actions:
| Action | Who may proceed |
|---|---|
read |
DSO users whose org alias matches the requested network_id; service accounts with grid.read or grid.admin scope |
alerts.read |
Any authenticated non-service user (ownership enforced at DB query level by user_id = sub) |
alerts.write |
DSO users with grid.alerts.write or grid.admin scope |
The GridAccessPolicy class (security/policy.py) loads the Rego bundle once at import time and evaluates decisions per request. When the policy engine is unavailable (e.g. local dev without OPA), the policy falls back to permissive — allow=True.
DSO network identity comes from the Keycloak organisation claim on the JWT. The first organisation with type=dso becomes the user's network_id; resolve_dso_network() raises HTTP 403 if no such organisation is present.
Grid data proxy
All grid endpoints live under /api/grid/{network_id}/. The network_id path parameter is validated against the user's DSO org alias (or scope for service accounts) before the request is forwarded to the Digital Twin via celine.sdk.dt.DTClient. DTApiError responses are mapped to appropriate HTTP status codes.
Available data surfaces:
- Wind —
wind/map,wind/bosco,wind/alert-distribution,wind/trend - Heat —
heat/map,heat/alert-distribution,heat/trend - Substations —
substations/map - Metadata —
filters,summary - CIM topology —
tile-index,shapes,risks,risks-now,trendline
The CIM topology endpoints use ValueFetcherSpec-backed queries. tile-index returns the tile catalog for progressive loading. shapes assembles CIM asset topology as a GeoJSON FeatureCollection, parsing per-row feature_geojson and building Feature objects (a lightweight presentation transformation). risks returns date-filtered risk rows per vector. risks-now returns current nowcasting observations without date filtering. trendline returns daily risk percentage over a date_from/date_to range.
tile-index and shapes set Cache-Control: public, max-age=3600 for frontend caching.
Each DT call is authenticated using client-credentials OIDC flow (OidcClientCredentialsProvider).
Alert rules
Alert rules are stored in the alert_rules PostgreSQL table. Each rule belongs to a user_id (JWT sub) and a network_id, and carries:
risk_types— JSON array of"wind"/"heat"(or both)threshold—WARNINGorALERTactive— whether the rule participates in dispatch evaluationrecipients— optional override email list
Rule ownership is enforced at the SQL query level (WHERE user_id = :sub), not only at the OPA layer.
Pipeline listener and alert dispatch
services/pipeline_listener.py subscribes to celine/pipelines/runs/+ on startup. When a PipelineRunEvent with status=completed and flow=grid-resilience-flow arrives, it calls dispatch_grid_alerts().
The dispatcher (services/alert_dispatcher.py):
- Fetches current
wind_alert_distributionandheat_alert_distributionfrom the DT for the event'snamespace(=network_id). - Loads all active
AlertRulerows for thatnetwork_id. - For each rule, checks whether the distribution contains events at or above the rule's threshold (
WARNINGfloor ={WARNING, ALERT};ALERTfloor ={ALERT}). - For each triggered rule, emits a
grid_alertDigitalTwinEventto the nudging-tool viaNudgingAdminClient.
If the MQTT broker is unavailable at startup, the listener logs a warning and the rest of the service continues to operate normally.
Database
PostgreSQL (async via SQLAlchemy + asyncpg). Two tables:
| Table | Purpose |
|---|---|
alert_rules |
Per-user grid alert rules |
notification_settings |
Per-user global notification preferences (email recipients, webhook URL) |
Migrations are managed by Alembic in the alembic/ directory. The docker-compose.yml migrate service runs alembic upgrade head before the API container starts.
Service dependencies
| Dependency | Role | Required |
|---|---|---|
| PostgreSQL | Alert rules and notification settings storage | Yes |
Digital Twin API (digital-twin) |
Grid risk data source | Yes (grid endpoints return 503 if unconfigured) |
| nudging-tool | Alert delivery | Yes (dispatch silently degrades on send failures) |
| MQTT broker | Pipeline completion events | No (startup warning; alert dispatch is inactive) |
OPA / celine.sdk.policies |
Fine-grained access control | No (permissive fallback) |
Key design decisions
DSO org alias as network_id — the Keycloak organisation alias is used directly as the network_id for DT queries. No mapping table is needed; org management in Keycloak is the single source of truth.
Permissive OPA fallback — the policy engine falls back to allow-all when unavailable so development environments without a running OPA instance stay functional. Production deployments always have the policies directory present in the container.
Pipeline listener vs polling — alert dispatch is event-driven (MQTT) rather than scheduled. This avoids unnecessary DT queries and ensures alerts fire promptly after each pipeline run without coupling the BFF to a scheduler.
Thin BFF pattern — celine-grid performs no risk calculations. All domain logic lives in the Digital Twin. The BFF's job is routing, authentication, authorization, and persistence of user preferences.