# Deployment
This document covers configuration, deployment options, and production considerations for the CELINE Policy Service.
## Configuration

All configuration is via environment variables with the `CELINE_` prefix.

### Core Settings

| Variable | Default | Description |
|---|---|---|
| `CELINE_ENVIRONMENT` | `development` | Environment name (`development`, `staging`, `production`) |
| `CELINE_LOG_LEVEL` | `INFO` | Logging level (`DEBUG`, `INFO`, `WARNING`, `ERROR`) |
### OIDC / JWT Settings

| Variable | Default | Description |
|---|---|---|
| `CELINE_OIDC_ISSUER` | `http://keycloak:8080/realms/celine` | JWT issuer URL |
| `CELINE_OIDC_AUDIENCE` | (none) | Expected audience claim (optional) |
| `CELINE_JWKS_CACHE_TTL_SECONDS` | `3600` | JWKS cache TTL (seconds) |
| `CELINE_JWT_ALGORITHMS` | `["RS256"]` | Allowed JWT algorithms |
### Policy Engine Settings

| Variable | Default | Description |
|---|---|---|
| `CELINE_POLICIES_DIR` | `policies` | Path to Rego policies |
| `CELINE_DATA_DIR` | `policies/data` | Path to policy data (JSON) |
### Decision Cache Settings

| Variable | Default | Description |
|---|---|---|
| `CELINE_DECISION_CACHE_ENABLED` | `true` | Enable decision caching |
| `CELINE_DECISION_CACHE_TTL_SECONDS` | `300` | Cache TTL (seconds) |
| `CELINE_DECISION_CACHE_MAXSIZE` | `10000` | Maximum cache entries |
### Audit Settings

| Variable | Default | Description |
|---|---|---|
| `CELINE_AUDIT_ENABLED` | `true` | Enable audit logging |
| `CELINE_AUDIT_LOG_INPUTS` | `true` | Log full policy inputs |
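Taken together, the tables above amount to a small settings schema. The sketch below shows how `CELINE_`-prefixed variables resolve against the documented defaults; the real service presumably uses a settings library, so `load_settings` and `DEFAULTS` are illustrative names only:

```python
import os

# Defaults taken from the configuration tables above.
DEFAULTS = {
    "ENVIRONMENT": "development",
    "LOG_LEVEL": "INFO",
    "OIDC_ISSUER": "http://keycloak:8080/realms/celine",
    "JWKS_CACHE_TTL_SECONDS": "3600",
    "DECISION_CACHE_ENABLED": "true",
    "DECISION_CACHE_TTL_SECONDS": "300",
    "DECISION_CACHE_MAXSIZE": "10000",
    "AUDIT_ENABLED": "true",
}


def load_settings(env=os.environ) -> dict:
    """Resolve CELINE_-prefixed variables against the documented defaults,
    coercing integers and booleans."""
    raw = {k: env.get(f"CELINE_{k}", v) for k, v in DEFAULTS.items()}
    return {
        **raw,
        "JWKS_CACHE_TTL_SECONDS": int(raw["JWKS_CACHE_TTL_SECONDS"]),
        "DECISION_CACHE_TTL_SECONDS": int(raw["DECISION_CACHE_TTL_SECONDS"]),
        "DECISION_CACHE_MAXSIZE": int(raw["DECISION_CACHE_MAXSIZE"]),
        "DECISION_CACHE_ENABLED": raw["DECISION_CACHE_ENABLED"].lower() == "true",
        "AUDIT_ENABLED": raw["AUDIT_ENABLED"].lower() == "true",
    }


# Overrides win, everything else falls back to the documented default:
settings = load_settings({"CELINE_ENVIRONMENT": "production",
                          "CELINE_DECISION_CACHE_TTL_SECONDS": "60"})
```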
## Docker Deployment

### Dockerfile
```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.12-slim AS builder
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
WORKDIR /app
COPY . /app
RUN uv sync --frozen

FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /app/.venv /app/.venv
COPY --from=builder /app/policies /app/policies
COPY --from=builder /app/src /app/src
ENV PATH="/app/.venv/bin:$PATH"
ENV PYTHONUNBUFFERED=1
ENV CELINE_POLICIES_DIR=/app/policies
ENV CELINE_DATA_DIR=/app/policies/data
EXPOSE 8009
HEALTHCHECK --interval=30s --timeout=5s --start-period=5s --retries=3 \
    CMD python -c "import httpx; httpx.get('http://localhost:8009/health').raise_for_status()"
# create_app is an application factory, so uvicorn needs --factory
CMD ["uvicorn", "celine.policies.main:create_app", "--factory", "--host", "0.0.0.0", "--port", "8009"]
```
### Docker Compose (Development)
```yaml
version: "3.8"

services:
  policy-service:
    build: .
    ports:
      - "8009:8009"
    environment:
      - CELINE_ENVIRONMENT=development
      - CELINE_LOG_LEVEL=DEBUG
      - CELINE_OIDC_ISSUER=http://keycloak:8080/realms/celine
      - CELINE_DECISION_CACHE_TTL_SECONDS=60
    volumes:
      - ./policies:/app/policies:ro
    depends_on:
      keycloak:
        condition: service_healthy
    healthcheck:
      # python:3.12-slim ships without curl, so reuse the httpx-based check
      test: ["CMD", "python", "-c", "import httpx; httpx.get('http://localhost:8009/health').raise_for_status()"]
      interval: 10s
      timeout: 5s
      retries: 3

  keycloak:
    image: quay.io/keycloak/keycloak:latest
    command: start-dev --import-realm
    environment:
      - KC_BOOTSTRAP_ADMIN_USERNAME=admin
      - KC_BOOTSTRAP_ADMIN_PASSWORD=admin
      - KC_HTTP_PORT=8080
      - KC_HOSTNAME=keycloak
    volumes:
      - ./config/keycloak/import:/opt/keycloak/data/import:ro
    ports:
      - "8080:8080"
    healthcheck:
      test: ["CMD-SHELL", "exec 3<>/dev/tcp/localhost/8080"]
      interval: 5s
      timeout: 5s
      retries: 15

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
```
## Kubernetes Deployment

### Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: policy-service
  labels:
    app: policy-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: policy-service
  template:
    metadata:
      labels:
        app: policy-service
    spec:
      containers:
        - name: policy-service
          image: celine/policy-service:latest
          ports:
            - containerPort: 8009
          env:
            - name: CELINE_ENVIRONMENT
              value: production
            - name: CELINE_LOG_LEVEL
              value: INFO
            - name: CELINE_OIDC_ISSUER
              valueFrom:
                configMapKeyRef:
                  name: policy-service-config
                  key: oidc-issuer
            - name: CELINE_DECISION_CACHE_ENABLED
              value: "true"
            - name: CELINE_DECISION_CACHE_TTL_SECONDS
              value: "300"
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8009
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8009
            initialDelaySeconds: 5
            periodSeconds: 5
          volumeMounts:
            - name: policies
              mountPath: /app/policies
              readOnly: true
      volumes:
        - name: policies
          configMap:
            name: policy-service-policies
```
### Service

```yaml
apiVersion: v1
kind: Service
metadata:
  name: policy-service
spec:
  selector:
    app: policy-service
  ports:
    - port: 8009
      targetPort: 8009
  type: ClusterIP
```
### ConfigMap for Policies

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: policy-service-policies
data:
  # Policies loaded from files
  # Or use a volume mount from a git-sync sidecar
```
### HorizontalPodAutoscaler

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: policy-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: policy-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
## Production Checklist

### Security
- [ ] TLS enabled between services and policy service
- [ ] Network policies restricting access to policy service
- [ ] Secrets management for any sensitive configuration
- [ ] Audit logs forwarded to centralized logging (SIEM)
- [ ] Rate limiting configured at ingress/load balancer
- [ ] OIDC issuer uses HTTPS in production
### Reliability
- [ ] Multiple replicas (minimum 2 for HA)
- [ ] Health checks configured for liveness and readiness
- [ ] Resource limits set appropriately
- [ ] PodDisruptionBudget configured
- [ ] Graceful shutdown handling
### Monitoring
- [ ] Metrics endpoint exposed (Prometheus format)
- [ ] Dashboards for request rate, latency, error rate
- [ ] Alerts for high error rate, latency spikes
- [ ] Log aggregation configured
### Performance
- [ ] Decision cache enabled with appropriate TTL
- [ ] JWKS cache TTL appropriate for key rotation frequency
- [ ] Connection pooling if using external data sources
- [ ] Load testing completed
## Monitoring

### Prometheus Metrics

The service exposes metrics at `/metrics` (if enabled):
```text
# Request metrics
policy_requests_total{endpoint="/authorize", status="200"} 12345
policy_request_duration_seconds{endpoint="/authorize", quantile="0.99"} 0.005

# Decision metrics
policy_decisions_total{policy="celine.dataset.access", allowed="true"} 10000
policy_decisions_total{policy="celine.dataset.access", allowed="false"} 500

# Cache metrics
policy_cache_hits_total 8000
policy_cache_misses_total 4500

# Engine metrics
policy_evaluation_duration_seconds{quantile="0.99"} 0.001
```
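As a sanity check on the sample counters above, the implied cache hit rate works out to 64%, comfortably above the 50% alerting threshold used in the rules below:

```python
# Sample counter values from the metrics above
hits, misses = 8_000, 4_500
hit_rate = hits / (hits + misses)  # 8000 / 12500
print(f"{hit_rate:.2%}")  # 64.00%
```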
### Grafana Dashboard
Key panels to include:
- Request Rate — Requests per second by endpoint
- Latency — P50, P95, P99 latency
- Error Rate — 4xx and 5xx responses
- Decision Distribution — Allow vs deny by policy
- Cache Hit Rate — Cache effectiveness
- Policy Evaluation Time — OPA performance
### Alerting Rules
```yaml
groups:
  - name: policy-service
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(policy_requests_total{status=~"5.."}[5m]))
            / sum(rate(policy_requests_total[5m])) > 0.01
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Policy service error rate > 1%"
      - alert: HighLatency
        expr: |
          histogram_quantile(0.99, rate(policy_request_duration_seconds_bucket[5m])) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Policy service P99 latency > 100ms"
      - alert: LowCacheHitRate
        # Use rate() rather than raw counters: a lifetime ratio never recovers
        # from a bad period, so the alert would stay stuck.
        expr: |
          rate(policy_cache_hits_total[15m])
            / (rate(policy_cache_hits_total[15m]) + rate(policy_cache_misses_total[15m])) < 0.5
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Policy cache hit rate < 50%"
```
## Troubleshooting

### Service Won't Start
Check policies are valid:
```bash
opa check policies/
opa test policies/ -v
```
Check Keycloak connectivity:
```bash
curl http://keycloak:8080/realms/celine/.well-known/openid-configuration
```
### JWT Validation Failing
Decode and inspect the token:
```bash
# Decode JWT payload (without verification)
echo $TOKEN | cut -d. -f2 | base64 -d | jq
```
Check issuer matches:
```bash
# Token issuer
echo $TOKEN | cut -d. -f2 | base64 -d | jq -r '.iss'

# Expected issuer
echo $CELINE_OIDC_ISSUER
```
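If `base64 -d` complains about invalid input, that is because JWTs use unpadded base64url encoding. A small Python helper (illustrative, for debugging only — it deliberately skips signature verification) restores the padding first:

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature (debugging only).
    JWTs use unpadded base64url, which plain `base64 -d` may reject."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))


# Throwaway unsigned token purely for illustration:
claims = {"iss": "http://keycloak:8080/realms/celine", "aud": "celine"}
payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).rstrip(b"=").decode()
token = f"eyJhbGciOiJub25lIn0.{payload}."
print(decode_jwt_payload(token)["iss"])
```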
### Policy Evaluation Errors
Test policy locally:
```bash
opa eval -d policies/ -i input.json "data.celine.dataset.access"
```
Check policy logs:
```bash
# Enable debug logging (create_app is a factory, so pass --factory)
CELINE_LOG_LEVEL=DEBUG uvicorn celine.policies.main:create_app --factory
```
### High Latency
Check cache effectiveness:
```bash
curl http://localhost:8009/ready | jq '.details.cache'
```
Profile OPA evaluation:
```bash
opa eval -d policies/ -i input.json --profile "data.celine.dataset.access"
```
## Upgrades and Rollbacks

### Policy Updates
Policies can be hot-reloaded without restart:
```bash
# Update policies on disk, then:
curl -X POST http://localhost:8009/reload
```
### Service Updates
1. Deploy the new version alongside the existing one
2. Run smoke tests against the new version
3. Gradually shift traffic (if using a service mesh)
4. Monitor for errors
5. Roll back if needed by reverting the deployment
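The smoke-test step can be sketched as below. `smoke_test` is a hypothetical helper (not part of the service); the `/health` and `/ready` paths mirror the probes configured earlier in this document, and the HTTP client is injected so the same checks work with any library:

```python
def smoke_test(get) -> bool:
    """Post-deploy smoke test sketch. `get(path)` returns an HTTP status code;
    wire it to a real client (e.g. httpx) pointed at the new version."""
    for path in ("/health", "/ready"):
        status = get(path)
        assert status == 200, f"{path} returned {status}"
    return True


# Stubbed example; replace the lambda with real HTTP calls:
responses = {"/health": 200, "/ready": 200}
assert smoke_test(lambda path: responses[path])
```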
### Rollback Procedure
```bash
# Kubernetes
kubectl rollout undo deployment/policy-service

# Docker Compose
docker compose up -d --force-recreate policy-service
```