Skip to content

CELINE Pipelines Tutorial

This document provides a complete, end-to-end guide to setting up, running, and deploying a CELINE pipeline application. It consolidates all prior sections into a single reference file.

Overview

A CELINE pipeline is a self-contained application that combines:

  • Meltano (ingestion / bronze)
  • dbt (staging, silver, gold)
  • Optional Python / Prefect flows
  • Governance configuration
  • OpenLineage emission (optional)
  • Container-first execution

Pipelines are designed to run identically in: - Local development - CI - Kubernetes

Creating a New Pipeline Application

CELINE provides a CLI command to scaffold a complete pipeline application with sane defaults.

From the root of your pipelines repository:

celine-utils pipeline init app demo_app

This command creates a fully functional example pipeline with:

  • Meltano project
  • dbt project structure
  • Example Prefect-compatible flow
  • Governance configuration
  • .env file
  • README

You can safely use the generated structure as-is and evolve it incrementally.

Canonical Repository Layout

After initialization, your repository will look like:

pipelines/
└── apps/
    └── demo_app/
        ├── meltano/
        ├── dbt/
        ├── flows/
        ├── governance.yaml
        ├── .env
        └── README.md

Each folder under apps/ represents a deployable pipeline application.

Baseline Docker Image (Required)

All CELINE pipelines must be built on top of the official pipeline image:

ghcr.io/celine-eu/pipeline

This image already contains: - Python + uv - celine-utils - Meltano - dbt - Prefect - OpenLineage client

Canonical Dockerfile (Per App)

Example Dockerfile for the demo_app pipeline:

ARG BASE_TAG=latest
FROM ghcr.io/celine-eu/pipeline:${BASE_TAG}

ARG APP_NAME

ENV APP_NAME=${APP_NAME}

# Enable / disable OpenLineage support
ENV OPENLINEAGE_ENABLED=false
ENV OPENLINEAGE_URL=http://marquez-api:5001
ENV OPENLINEAGE_NAMESPACE=${APP_NAME}

ENV PIPELINES_ROOT=${PIPELINES_ROOT:-/pipelines}
ENV BASE_DIR="${PIPELINES_ROOT}/apps"
ENV APP_PATH="${BASE_DIR}/${APP_NAME}"

ENV MELTANO_PROJECT_ROOT="${APP_PATH}/meltano"
ENV DBT_PROJECT_DIR="${APP_PATH}/dbt"
ENV DBT_PROFILES_DIR="${DBT_PROJECT_DIR}"

WORKDIR /pipelines

COPY ./apps/${APP_NAME} /pipelines/apps/${APP_NAME}

RUN uv sync

RUN if [ -f "${APP_PATH}/requirements.txt" ]; then       uv add --requirements "${APP_PATH}/requirements.txt";     fi

RUN if [ -d "${MELTANO_PROJECT_ROOT}" ]; then       cd "${MELTANO_PROJECT_ROOT}" &&       rm -rf .meltano &&       MELTANO_PROJECT_ROOT=$(pwd) meltano install ;     fi

RUN if [ -d "${DBT_PROJECT_DIR}" ]; then       cd "${DBT_PROJECT_DIR}" &&       rm -rf target dbt_packages .dbt &&       DBT_PROFILES_DIR=$(pwd) dbt deps ;     fi

WORKDIR ${APP_PATH}

Build Example

docker build   --build-arg APP_NAME=demo_app   -t demo-app-pipeline:latest   .

Environment Configuration

CELINE uses environment-driven configuration.

Minimal .env:

APP_NAME=demo_app

POSTGRES_HOST=postgres
POSTGRES_PORT=5432
POSTGRES_DB=datasets
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres

OPENLINEAGE_URL=http://marquez-api:5000

Resolution order: - Environment variables - .env files - Defaults

Meltano (Ingestion / Bronze)

Configure ingestion in meltano/meltano.yml.

Example job:

jobs:
  - name: import
    tasks:
      - tap-my-source target-postgres

Run manually:

celine-utils pipeline run meltano "run import"

CELINE automatically: - Executes Meltano - Discovers datasets - Emits lineage events (if enabled) - Applies governance metadata

dbt (Staging / Silver / Gold)

CELINE follows a medallion-style convention:

Layer Command
Staging celine-utils pipeline run dbt staging
Silver celine-utils pipeline run dbt silver
Gold celine-utils pipeline run dbt gold
Tests celine-utils pipeline run dbt test

During execution, CELINE:

  • Collects schema metadata
  • Captures dbt test results
  • Emits dataset-level lineage

Governance Configuration

Each pipeline includes a governance.yaml.

Example:

defaults:
  access_level: internal
  classification: green
  retention_days: 365

sources:
  datasets.ds.gold_*:
    license: CC0-1.0
    ownership:
      - name: data-team
        type: DATA_OWNER
    tags: [gold]

Governance is:

  • Pattern-based
  • Automatically resolved
  • Emitted as a custom OpenLineage facet

Generate interactively:

celine-utils governance generate marquez --app demo_app

Note Since governance are openlineage facets, disabling openlineage using OPENLINEAGE_ENABLED=false will exclude the governance tracking capabilities.

Python / Prefect Flows

Example flows/pipeline.py:

from prefect import flow
from celine.utils.pipelines.pipeline import (
    meltano_run_import,
    dbt_run_staging,
    dbt_run_silver,
    dbt_run_gold,
)

@flow
def medallion_flow():
    meltano_run_import()
    dbt_run_staging()
    dbt_run_silver()
    dbt_run_gold()

Local Prefect Setup

Start a local Prefect server:

prefect server start

Set API URL:

export PREFECT_API_URL=http://127.0.0.1:4200/api

Example prefect.yaml (Local)

name: demo-app
deployments:
  - name: demo-app-flow
    description: "Demo app medallion pipeline"
    entrypoint: flows/pipeline.py:medallion_flow

    schedule:
      cron: "0 * * * *"

    work_pool:
      name: local-process
      job_variables:
        env:
          APP_NAME: demo_app
          PIPELINES_ROOT: /pipelines
          POSTGRES_HOST: localhost
          POSTGRES_DB: datasets

Register the deployment:

prefect deploy --prefect-file prefect.yaml

Run it:

prefect deployment run demo-app/demo-app-flow

Kubernetes & Production Deployment

In Kubernetes, Prefect deployments are registered once and executed later by workers.

CELINE provides a reference infrastructure repository at https://github.com/celine-eu/infra

It includes:

  • Prefect Server
  • Prefect Workers
  • Marquez / OpenLineage
  • PostgreSQL
  • Keycloak
  • Helm charts
  • Minikube-based local setup
  • Other CELINE specific services, that can be omitted customizing the helmfiles

Local Kubernetes (Minikube)

minikube start
kubectl create namespace celine

Follow the infra repository README to deploy the full stack.

Stage Tool
Local development Prefect local server
Image build ghcr.io/celine-eu/pipeline
Flow authoring flows/*.py
Deployment config prefect.yaml
Local execution local-process work pool
Production CELINE infra Helm charts

Key Takeaways

  • Use celine-utils pipeline init to bootstrap pipelines
  • One Docker image per pipeline app
  • Always use ghcr.io/celine-eu/pipeline
  • Governance is declarative and automatic
  • Prefect deployments are configuration, not code
  • Use CELINE infra for production

Need help?

Open an issue or submit a pull request to propose improvements. Commercial support is also available.