Architecture

RAG Pipeline

The assistant processes user queries through a Retrieval-Augmented Generation pipeline:

  1. Upload — files are uploaded via POST /upload, parsed, and chunked. Images are captioned via the vision model.
  2. Index — chunks are embedded and stored in Qdrant as vector documents.
  3. Chat — at query time, relevant chunks are retrieved and injected into the OpenAI prompt, along with dashboard context and page-aware context.
  4. Stream — the response is streamed back to the client via SSE.

Component Overview

| Component | Module | Role |
| --- | --- | --- |
| FastAPI application | main.py | HTTP server (create_app factory), route registration, lifespan hooks |
| RAG pipeline | rag.py | LlamaIndex index, retrieval, query engine |
| Ingestion | ingest.py | Parses uploads, splits into nodes, upserts to Qdrant |
| Auth | auth.py | JWT verification via trusted headers or JWKS |
| History | history.py | Conversation and message persistence |
| Uploads | uploads.py | File storage, attachment management |
| OpenAI streaming | openai_stream.py | SSE token streaming with LlamaIndex |
| Vision | openai_vision.py | Image captioning via OpenAI vision model |
| Dashboard context | dashboard_context.py | Enriches prompts with user data from Digital Twin |
| Training materials | training_materials.py | Indexes Markdown training materials into Qdrant |
| Training materials sync | training_materials_sync.py | Git clone/pull for training materials repo |
| Site docs | site_docs.py | Indexes documentation site content |
| Qdrant setup | qdrant_setup.py | Collection initialization and management |
| Settings | settings.py | Pydantic-settings environment configuration |
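
As an illustration of what auth.py's output looks like downstream: once a token has been verified (via trusted headers or JWKS), the subject claim identifies the user. The sketch below decodes the claim with the standard library only; it performs no signature check and assumes verification already happened upstream (e.g. in oauth2_proxy). The function name is illustrative, not the module's actual API:

```python
import base64
import json


def jwt_subject(token: str) -> str:
    """Extract the `sub` claim from a JWT payload.

    Sketch only: assumes the token was already verified upstream;
    no signature check is performed here.
    """
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return claims["sub"]
```

The real module presumably uses a JWT library for full verification; this only shows where the user identity that Conversation rows link to comes from.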

Frontend

This repository is a pure API backend. The chat UI is maintained separately in celine-frontend:

  • apps/assistant — standalone full-page assistant app

The frontend communicates with this API at apiBaseUrl.

Service Dependencies

| Service | Purpose |
| --- | --- |
| OpenAI | Chat completions, text embeddings, and vision (image captioning) |
| Qdrant | Vector storage and similarity search |
| PostgreSQL | Conversation history, attachment metadata |
| Digital Twin | Dashboard context enrichment (optional) |
| Keycloak / oauth2_proxy | JWT authentication |
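
Connection details for these services come from settings.py, which uses pydantic-settings. A stdlib-only sketch of the same env-driven idea; the variable names (OPENAI_API_KEY, QDRANT_URL, DATABASE_URL) and defaults are assumptions, not the module's actual fields:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Environment-driven configuration.

    The real settings.py uses pydantic-settings; the field and
    environment-variable names here are illustrative.
    """
    openai_api_key: str
    qdrant_url: str
    database_url: str

    @classmethod
    def from_env(cls) -> "Settings":
        return cls(
            openai_api_key=os.environ["OPENAI_API_KEY"],
            qdrant_url=os.environ.get("QDRANT_URL", "http://localhost:6333"),
            database_url=os.environ["DATABASE_URL"],
        )
```

pydantic-settings adds validation and .env-file loading on top of this pattern.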

Data Flow

Upload path:

POST /upload -> parse file -> (if image: caption via vision model) -> split into chunks -> embed (OpenAI) -> upsert (Qdrant) -> store metadata (PostgreSQL)
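
The "split into chunks" step above can be sketched as a sliding window with overlap, so neighbouring chunks share context. This is a simplified character-based version; the real ingest.py uses LlamaIndex node parsers, which split on token and sentence boundaries, and the default sizes here are assumptions:

```python
def split_into_chunks(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Sliding-window chunking with overlap (character-based sketch).

    The real pipeline uses LlamaIndex node parsers; parameters here
    are illustrative defaults.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Each resulting chunk is then embedded and upserted to Qdrant with its source-file metadata.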

Chat path:

POST /chat -> verify JWT -> load history -> load authorized attachments -> retrieve context (Qdrant) -> enrich with dashboard context -> build prompt -> stream (OpenAI SSE)
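
The "build prompt" step combines the retrieved chunks with the dashboard context before handing off to OpenAI. A hypothetical layout (in practice, prompt construction is delegated to LlamaIndex's query engine, so section names and separators here are assumptions):

```python
def build_prompt(query: str, chunks: list[str], dashboard_context: str = "") -> str:
    """Assemble a prompt from retrieved chunks and dashboard context.

    Illustrative layout only; the real pipeline delegates prompt
    construction to LlamaIndex.
    """
    parts = []
    if dashboard_context:
        parts.append(f"Dashboard context:\n{dashboard_context}")
    if chunks:
        # Separate retrieved chunks so the model can tell them apart.
        parts.append("Retrieved context:\n" + "\n---\n".join(chunks))
    parts.append(f"User question:\n{query}")
    return "\n\n".join(parts)
```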

Database Models

Persistence uses PostgreSQL, accessed asynchronously via SQLAlchemy + asyncpg; Alembic handles schema migrations. Models:

  • Conversation — linked to a user identity from the JWT subject claim
  • Message — role (user/assistant), content, timestamp, optional attachment refs
  • Attachment — file metadata: name, MIME type, Qdrant collection reference, scope (user/system)