# Architecture

## RAG Pipeline
The assistant processes user queries through a Retrieval-Augmented Generation pipeline:
- Upload — files are uploaded via POST /upload, parsed, and chunked
- Index — chunks are embedded and stored in Qdrant as vector documents
- Chat — at query time, relevant chunks are retrieved and injected into the OpenAI prompt
- Stream — the response is streamed back to the client via SSE
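The retrieval step in this pipeline can be sketched with a minimal cosine-similarity search. This is a stand-in for what Qdrant does at scale; the toy 2-d vectors and helper names here are illustrative, not the actual rag.py API:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_top_k(query_vec: list[float], chunks: list[dict], k: int = 2) -> list[dict]:
    """Rank stored chunks by similarity to the query embedding and keep the best k."""
    ranked = sorted(chunks, key=lambda c: cosine_similarity(query_vec, c["vector"]), reverse=True)
    return ranked[:k]

# Toy 2-d "embeddings" stand in for the high-dimensional OpenAI vectors.
chunks = [
    {"text": "Qdrant stores vectors", "vector": [1.0, 0.0]},
    {"text": "FastAPI serves HTTP",   "vector": [0.0, 1.0]},
    {"text": "Vectors enable search", "vector": [0.9, 0.1]},
]
top = retrieve_top_k([1.0, 0.1], chunks, k=2)
```

The retrieved chunk texts are what get injected into the OpenAI prompt as context before streaming begins.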
## Component Overview
| Component | Module | Role |
|---|---|---|
| FastAPI application | main.py | HTTP server, route registration, lifespan hooks |
| RAG pipeline | rag.py | LlamaIndex index, retrieval, query engine |
| Ingestion | ingest.py | Parses uploads, splits into nodes, upserts to Qdrant |
| Auth | auth.py | JWKS-based JWT verification, identity extraction |
| History | history.py | Conversation and message persistence |
| Uploads | uploads.py | File storage, attachment management |
| OpenAI streaming | openai_stream.py | SSE token streaming with LlamaIndex |
| Vision | openai_vision.py | Image attachment support in chat messages |
| Qdrant setup | qdrant_setup.py | Collection initialization and management |
| Settings | settings.py | Pydantic-settings environment configuration |
## Frontend
This repository is a pure API backend. The chat UI is maintained separately in celine-frontend:
- packages/assistant-ui — @celine-eu/assistant-ui Svelte component library (ChatCore, AssistantWidget, etc.)
- apps/assistant — standalone full-page assistant app
The frontend communicates with this API at apiBaseUrl. When deployed inside the participant webapp, requests are proxied through the celine-webapp BFF.
## Service Dependencies
| Service | Purpose |
|---|---|
| OpenAI | Chat completions and text embeddings |
| Qdrant | Vector storage and similarity search |
| PostgreSQL | Conversation history and attachment metadata |
| S3 / object storage | Raw file storage for uploads |
| Keycloak / JWKS endpoint | JWT public key discovery for auth |
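Each of these services is configured through environment variables read by settings.py. That module uses pydantic-settings; the stdlib-only sketch below mirrors the idea, and the variable names and defaults are illustrative rather than the actual configuration keys:

```python
import os
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Settings:
    """Environment-driven configuration, one field per service dependency.
    A simplified stand-in for the pydantic-settings class in settings.py."""
    openai_api_key: str = field(default_factory=lambda: os.environ["OPENAI_API_KEY"])
    qdrant_url: str = field(
        default_factory=lambda: os.environ.get("QDRANT_URL", "http://localhost:6333"))
    database_url: str = field(
        default_factory=lambda: os.environ.get(
            "DATABASE_URL", "postgresql+asyncpg://localhost/assistant"))
    s3_bucket: str = field(default_factory=lambda: os.environ.get("S3_BUCKET", "uploads"))
    jwks_url: str = field(default_factory=lambda: os.environ.get("JWKS_URL", ""))

os.environ.setdefault("OPENAI_API_KEY", "sk-test")  # demo value only
settings = Settings()
```

A missing required variable (here, OPENAI_API_KEY) fails fast at startup, while optional ones fall back to local-development defaults.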
## Data Flow
Upload path:
POST /upload → parse file → split into chunks → embed (OpenAI) → upsert (Qdrant) → store metadata (PostgreSQL)
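The split step can be sketched as a simple overlapping-window chunker. The real ingest.py splits into LlamaIndex nodes with sentence awareness; this character-based version only illustrates the shape of the operation, and the sizes are arbitrary:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows. Overlap preserves
    context that would otherwise be cut at chunk boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

doc = "word " * 100  # a 500-character document
chunks = chunk_text(doc, chunk_size=200, overlap=50)
```

Each chunk is then embedded and upserted to Qdrant with its source-file metadata.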
Chat path:
POST /chat → verify JWT → load history → retrieve context (Qdrant) → build prompt → stream (OpenAI SSE)
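The streaming tail of the chat path can be sketched as an async generator that wraps each model token in an SSE frame. The frame format and the done marker here are illustrative, not necessarily the exact wire format produced by openai_stream.py:

```python
import asyncio
import json

def sse_event(data: dict) -> str:
    """Format one Server-Sent Events frame: a data line plus a blank line."""
    return f"data: {json.dumps(data)}\n\n"

async def fake_token_stream():
    """Stand-in for the OpenAI token stream."""
    for token in ["Hello", ", ", "world"]:
        yield token

async def stream_chat() -> list[str]:
    """Collect SSE frames, one per token, with a terminal done marker.
    In the real app a generator like this feeds FastAPI's StreamingResponse."""
    frames = [sse_event({"delta": tok}) async for tok in fake_token_stream()]
    frames.append("data: [DONE]\n\n")
    return frames

frames = asyncio.run(stream_chat())
```

Because frames are flushed as they are produced, the client renders tokens incrementally instead of waiting for the full completion.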
## Database Models
Conversations and messages are stored in PostgreSQL using SQLAlchemy async models. Alembic handles schema migrations. The db/models.py module defines:
- Conversation — linked to a user identity from the JWT subject claim
- Message — role (user/assistant), content, timestamp, optional attachment refs
- Attachment — file metadata: name, MIME type, Qdrant collection reference