Architecture

This document describes the authorization model, system architecture, and design decisions of the CELINE Policy Service.

System Overview

┌─────────────────────────────────────────────────────────────────────────┐
│                         CELINE Platform                                  │
│                                                                          │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐    │
│  │ digital-twin│  │  pipelines  │  │ rec-registry│  │   nudging   │    │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘    │
│         │                │                │                │            │
│         └────────────────┴────────────────┴────────────────┘            │
│                                   │                                      │
│                                   ▼                                      │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                      Policy Service                                │  │
│  │                                                                    │  │
│  │   ┌─────────────┐   ┌─────────────┐   ┌─────────────────────┐    │  │
│  │   │ JWT Validate│──▶│  Subject    │──▶│   OPA Engine        │    │  │
│  │   │ (JWKS cache)│   │  Extract    │   │   (regorus)         │    │  │
│  │   └─────────────┘   └─────────────┘   └──────────┬──────────┘    │  │
│  │                                                   │               │  │
│  │   ┌─────────────┐   ┌─────────────┐              │               │  │
│  │   │ Audit Log   │◀──│  Decision   │◀─────────────┘               │  │
│  │   │ (structured)│   │  Cache      │                              │  │
│  │   └─────────────┘   └─────────────┘                              │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                   │                                      │
│                                   ▼                                      │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                       Keycloak (IdP)                               │  │
│  │   Users, Groups, Service Accounts, OAuth Clients, Scopes          │  │
│  └───────────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────┘

Authorization Model

The Dual-Check Model

The CELINE authorization model enforces two independent checks that must both pass:

                    ┌─────────────────────────────────────┐
                    │          Authorization              │
                    │                                     │
                    │   ┌───────────┐   ┌───────────┐    │
                    │   │   User    │   │  Client   │    │
                    │   │  Groups   │   │  Scopes   │    │
                    │   │ (roles)   │   │  (OAuth)  │    │
                    │   └─────┬─────┘   └─────┬─────┘    │
                    │         │               │          │
                    │         └───────┬───────┘          │
                    │                 │                  │
                    │                 ▼                  │
                    │         ┌───────────────┐         │
                    │         │  INTERSECTION │         │
                    │         │   (both must  │         │
                    │         │    pass)      │         │
                    │         └───────────────┘         │
                    └─────────────────────────────────────┘

Why this model?

User Groups define what a human user is allowed to do (based on their role)
Client Scopes define what the requesting application is allowed to do

The intersection prevents privilege escalation: even if a user is an admin, a low-privilege client (like a public dashboard) cannot access admin-only resources.

Subject Types

Type	Identification	Authorization Source
User	JWT `sub` claim	Group hierarchy + Client scopes
Service	JWT `client_id` claim (no `sub`)	Client scopes only
Anonymous	No JWT provided	Limited to open resources

Group Hierarchy

Users are assigned to groups in Keycloak. Groups have a hierarchy:

admins    (level 4) ─── Full platform access
    │
managers  (level 3) ─── Operational access, simulations
    │
editors   (level 2) ─── Write access to non-restricted resources
    │
viewers   (level 1) ─── Read-only access to internal resources
    │
(none)    (level 0) ─── Anonymous / no group membership

Higher levels inherit all permissions of lower levels.

Resource Types

Resource	Policy Package	Description
`dataset`	`celine.dataset.access`	Data access control with row-level filtering
`pipeline`	`celine.pipeline.state`	Pipeline state machine transitions
`dt`	`celine.dt.access`	Digital twin API access
`topic`	`celine.mqtt.acl`	MQTT topic publish/subscribe
`userdata`	`celine.userdata.access`	User-owned resources

Policy Engine

Why OPA?

Open Policy Agent provides:

Declarative policies — Rules expressed in Rego, not code
Testable — Policies can be unit tested with opa test
Decoupled — Policy changes don't require service redeployment
Industry standard — CNCF graduated project, widely adopted

Embedded vs. Sidecar

The policy service uses embedded OPA (via regorus, a Rust implementation):

Approach	Latency	Deployment	Best For
Embedded (current)	~0.1-0.5ms	Single service	Centralized, moderate scale
Sidecar per service	~0.1ms	Container per service	High throughput, low latency
Remote OPA	~1-5ms	Separate deployment	Shared policies, simple services

For high-throughput services, the architecture can evolve to sidecars that pull policy bundles from this central service.

Policy Packages

policies/celine/
├── common/
│   ├── subject.rego      # is_user, is_service, has_scope(), in_group()
│   └── access_levels.rego # level_value(), is_open(), etc.
├── dataset/
│   ├── access.rego       # allow, reason, filters
│   ├── row_filter.rego   # Row-level security filters
│   └── access_test.rego  # Policy unit tests
├── pipeline/
│   └── state.rego        # State machine validation
├── dt/
│   └── access.rego       # Digital twin access
├── mqtt/
│   └── acl.rego          # Topic ACLs
└── userdata/
    └── access.rego       # User data ownership

Policy Input Structure

All policies receive a standardized input:

{
  "subject": {
    "id": "user-123",
    "type": "user",
    "groups": ["viewers", "editors"],
    "scopes": ["dataset.query", "dt.read"],
    "claims": { /* raw JWT claims */ }
  },
  "resource": {
    "type": "dataset",
    "id": "ds-456",
    "attributes": {
      "access_level": "internal"
    }
  },
  "action": {
    "name": "read",
    "context": {}
  },
  "environment": {
    "request_id": "req-789",
    "timestamp": 1706745600
  }
}

Policy Output Structure

{
  "allow": true,
  "reason": "user has viewer access and client has dataset.query scope",
  "filters": [
    {"field": "organization_id", "operator": "eq", "value": "org-123"}
  ]
}

Request Flow

1. JWT Validation

Incoming Request
       │
       ▼
┌─────────────────┐
│ Extract Bearer  │
│ token from      │
│ Authorization   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐     ┌─────────────────┐
│ Fetch JWKS      │◀───▶│ JWKS Cache      │
│ (if needed)     │     │ (1 hour TTL)    │
└────────┬────────┘     └─────────────────┘
         │
         ▼
┌─────────────────┐
│ Validate JWT    │
│ - signature     │
│ - expiry        │
│ - issuer        │
└────────┬────────┘
         │
         ▼
   Valid Claims

2. Subject Extraction

# Simplified logic
def extract_subject(claims: dict) -> Subject:
    # Service account detection
    if "client_id" in claims and "sub" not in claims:
        return Subject(
            type="service",
            id=claims["client_id"],
            scopes=claims.get("scope", "").split(),
        )

    # User detection
    return Subject(
        type="user",
        id=claims["sub"],
        groups=extract_groups(claims),  # From realm_access.roles + groups
        scopes=claims.get("scope", "").split(),
    )

3. Policy Evaluation

Subject + Resource + Action
           │
           ▼
┌──────────────────────┐
│ Check Decision Cache │
└──────────┬───────────┘
           │
     ┌─────┴─────┐
     │  Cache    │
     │  Hit?     │
     └─────┬─────┘
       No  │  Yes
           │    └──────▶ Return Cached Decision
           ▼
┌──────────────────────┐
│ Build Policy Input   │
│ (JSON document)      │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│ OPA Evaluate         │
│ "data.celine.{type}" │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│ Cache Decision       │
│ (LRU + TTL)          │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│ Audit Log            │
│ (structured JSON)    │
└──────────┬───────────┘
           │
           ▼
      Return Decision

Caching Strategy

Decision Cache

Identical requests return cached decisions:

Setting	Default	Description
`DECISION_CACHE_ENABLED`	`true`	Enable/disable caching
`DECISION_CACHE_TTL_SECONDS`	`300`	Time-to-live for cached decisions
`DECISION_CACHE_MAXSIZE`	`10000`	Maximum cache entries

Cache key: hash(policy_package + policy_input)

JWKS Cache

Public keys are cached to avoid fetching on every request:

Setting	Default	Description
`JWKS_CACHE_TTL_SECONDS`	`3600`	Key cache TTL

Automatic refresh on: - TTL expiry - Unknown key ID (kid) in token

Audit Logging

All decisions are logged with structured JSON:

{
  "timestamp": "2024-01-31T12:00:00Z",
  "event": "policy_decision",
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "allowed": true,
  "policy": "celine.dataset.access",
  "subject_id": "user-123",
  "subject_type": "user",
  "resource_type": "dataset",
  "resource_id": "ds-456",
  "action": "read",
  "source_service": "digital-twin",
  "latency_ms": 0.42,
  "cached": false
}

Security Considerations

Zero Trust Compliance

Principle	Implementation
Never trust, always verify	Every request requires valid JWT
Least privilege	Scopes limit what each client can do
Assume breach	Service-to-service requires auth
Defense in depth	User groups + Client scopes

Token Security

JWTs validated with RS256 signatures
Issuer (iss) claim verified against Keycloak
Expiry (exp) enforced
No token storage — stateless validation

Recommendations for Production

mTLS between services and policy service
Network segmentation — policy service not publicly accessible
Audit log forwarding to SIEM
Rate limiting on policy endpoints
Secret rotation for OAuth clients

Performance Characteristics

Metric	Typical Value
Policy evaluation	0.1 - 0.5 ms
JWT validation (cached JWKS)	0.5 - 1 ms
Full request (uncached)	2 - 5 ms
Full request (cached)	< 1 ms
Throughput	5,000+ req/sec (single instance)

Future Considerations

Scaling Options

Horizontal scaling — Multiple policy service instances behind load balancer
OPA sidecars — Deploy OPA alongside high-throughput services
Policy bundles — Central service serves bundles to distributed OPA instances

Potential Enhancements

GraphQL authorization integration
Relationship-based access control (ReBAC) for complex hierarchies
Policy versioning and rollback
A/B testing for policy changes