# Architecture
## RAG Pipeline
The assistant processes user queries through a Retrieval-Augmented Generation (RAG) pipeline:

- **Upload** — files are uploaded via `POST /upload`, parsed, and chunked. Images are captioned via the vision model.
- **Index** — chunks are embedded and stored in Qdrant as vector documents.
- **Chat** — at query time, relevant chunks are retrieved and injected into the OpenAI prompt, along with dashboard context and page-aware context.
- **Stream** — the response is streamed back to the client via SSE.
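The chat step, retrieving relevant chunks and injecting them into the prompt, can be sketched with an in-memory toy. A bag-of-words similarity stands in for the real OpenAI embeddings and Qdrant search, and the function names are illustrative, not the actual `rag.py` API:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (stands in for OpenAI embeddings)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Rank stored chunks by similarity to the query (stands in for Qdrant search)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]

def build_prompt(query: str, chunks: list[str]) -> str:
    """Inject the retrieved chunks into the model prompt as context."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

In the real pipeline the same shape holds, but embedding and similarity search happen inside LlamaIndex against Qdrant.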
## Component Overview
| Component | Module | Role |
|---|---|---|
| FastAPI application | `main.py` | HTTP server (`create_app` factory), route registration, lifespan hooks |
| RAG pipeline | `rag.py` | LlamaIndex index, retrieval, query engine |
| Ingestion | `ingest.py` | Parses uploads, splits into nodes, upserts to Qdrant |
| Auth | `auth.py` | JWT verification via trusted headers or JWKS |
| History | `history.py` | Conversation and message persistence |
| Uploads | `uploads.py` | File storage, attachment management |
| OpenAI streaming | `openai_stream.py` | SSE token streaming with LlamaIndex |
| Vision | `openai_vision.py` | Image captioning via OpenAI vision model |
| Dashboard context | `dashboard_context.py` | Enriches prompts with user data from Digital Twin |
| Training materials | `training_materials.py` | Indexes Markdown training materials into Qdrant |
| Training materials sync | `training_materials_sync.py` | Git clone/pull for training materials repo |
| Site docs | `site_docs.py` | Indexes documentation site content |
| Qdrant setup | `qdrant_setup.py` | Collection initialization and management |
| Settings | `settings.py` | Pydantic-settings environment configuration |
## Frontend

This repository is a pure API backend. The chat UI is maintained separately in `celine-frontend`:

- `apps/assistant` — standalone full-page assistant app

The frontend communicates with this API at `apiBaseUrl`.
## Service Dependencies
| Service | Purpose |
|---|---|
| OpenAI | Chat completions, text embeddings, and vision (image captioning) |
| Qdrant | Vector storage and similarity search |
| PostgreSQL | Conversation history, attachment metadata |
| Digital Twin | Dashboard context enrichment (optional) |
| Keycloak / oauth2_proxy | JWT authentication |
## Data Flow
Upload path:

```
POST /upload -> parse file -> (if image: caption via vision model) -> split into chunks -> embed (OpenAI) -> upsert (Qdrant) -> store metadata (PostgreSQL)
```
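The upload path is a straight composition of these steps. A minimal sketch, with placeholder functions standing in for the real parsers, the vision call, LlamaIndex node splitting, and the Qdrant/PostgreSQL writes:

```python
from dataclasses import dataclass

@dataclass
class StoredChunk:
    doc_id: str
    text: str

def caption_image(data: bytes) -> str:
    """Placeholder for the OpenAI vision call in openai_vision.py."""
    return f"[image caption for {len(data)} bytes]"

def parse(filename: str, data: bytes) -> str:
    """Stand-in for the real parsers in ingest.py."""
    if filename.endswith((".png", ".jpg")):
        return caption_image(data)  # images become text via the vision model
    return data.decode("utf-8")

def chunk(text: str, size: int = 40) -> list[str]:
    """Fixed-size character splitter; the service uses LlamaIndex node splitting."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def ingest(filename: str, data: bytes, store: list[StoredChunk]) -> int:
    """parse -> (caption) -> chunk -> 'upsert' into an in-memory list,
    standing in for embed (OpenAI) + upsert (Qdrant) + metadata (PostgreSQL)."""
    chunks = chunk(parse(filename, data))
    store.extend(StoredChunk(doc_id=filename, text=c) for c in chunks)
    return len(chunks)
```

Note how images and text converge early: once captioned, an image flows through the same chunk/embed/upsert path as any text document.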
Chat path:

```
POST /chat -> verify JWT -> load history -> load authorized attachments -> retrieve context (Qdrant) -> enrich with dashboard context -> build prompt -> stream (OpenAI SSE)
```
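The final streaming step relays model tokens to the client as Server-Sent Events. A minimal sketch of the SSE framing that `openai_stream.py` performs; the payload shape (`{"delta": ...}` deltas and a `[DONE]` sentinel) is illustrative, not the service's actual wire format:

```python
import json
from typing import Iterable, Iterator

def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Frame a stream of model tokens as Server-Sent Events.

    Each SSE event is a 'data: ...' line terminated by a blank line;
    the client reassembles the deltas into the full response.
    """
    for token in tokens:
        yield f"data: {json.dumps({'delta': token})}\n\n"
    yield "data: [DONE]\n\n"  # sentinel telling the client the stream is over
```

Because each event is flushed as it arrives from OpenAI, the client can render partial responses immediately instead of waiting for the full completion.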
## Database Models
PostgreSQL (async via SQLAlchemy + asyncpg). Alembic handles migrations. Models:
- **Conversation** — linked to a user identity from the JWT subject claim
- **Message** — role (user/assistant), content, timestamp, optional attachment refs
- **Attachment** — file metadata: name, MIME type, Qdrant collection reference, scope (user/system)
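The three models can be sketched as plain dataclasses; the real definitions are SQLAlchemy ORM models, and the field names below are illustrative rather than copied from the actual schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Conversation:
    id: int
    user_sub: str  # JWT "sub" claim identifying the owner

@dataclass
class Message:
    conversation_id: int
    role: str      # "user" or "assistant"
    content: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    attachment_ids: list = field(default_factory=list)  # optional attachment refs

@dataclass
class Attachment:
    id: int
    name: str
    mime_type: str
    qdrant_collection: str  # where this file's chunks were indexed
    scope: str = "user"     # "user" (private) or "system" (shared)
```

Tying a `Conversation` to the JWT subject claim is what lets the chat path load only history and attachments the authenticated user is authorized to see.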