
Open-Meteo Pipeline

Overview

The Open-Meteo pipeline ingests hourly weather forecast data from the Open-Meteo API via Meltano (tap-openmeteo) for energy community forecasting use cases.

No API key is required: Open-Meteo offers free access, and its data is published under CC BY 4.0. To change the weather model, edit the models setting in meltano/meltano.yml (currently icon_d2).


Data sources

  • Open-Meteo Forecast API (hourly weather variables, DWD ICON-D2 model)

Data is fetched from: https://open-meteo.com/
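When debugging outside Meltano, it can help to reproduce the raw request by hand. The sketch below is a minimal illustration. It assumes the public forecast endpoint https://api.open-meteo.com/v1/forecast and its latitude, longitude, hourly, models, timezone, forecast_hours and past_hours query parameters. build_forecast_url is a hypothetical helper, the coordinates are placeholders, and tap-openmeteo may build its requests differently.

```python
from urllib.parse import urlencode

# Public Open-Meteo forecast endpoint (no API key needed).
BASE_URL = "https://api.open-meteo.com/v1/forecast"

# The 7 hourly variables that end up in the silver layer.
HOURLY_VARIABLES = [
    "shortwave_radiation",
    "direct_radiation",
    "diffuse_radiation",
    "global_tilted_irradiance",
    "cloud_cover",
    "temperature_2m",
    "precipitation",
]


def build_forecast_url(latitude: float, longitude: float,
                       forecast_hours: int = 48, past_hours: int = 120) -> str:
    """Hypothetical helper: a request roughly equivalent to the tap's configuration."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "hourly": ",".join(HOURLY_VARIABLES),
        "models": "icon_d2",           # DWD ICON-D2
        "timezone": "Europe/Rome",     # timestamps returned in Italian local time
        "forecast_hours": forecast_hours,
        "past_hours": past_hours,
    }
    # urlencode percent-encodes "," and "/" (e.g. Europe%2FRome).
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder coordinates, not the pipeline's configured location.
    print(build_forecast_url(41.9, 12.5))
```

Fetching that URL (for example with urllib.request.urlopen) returns JSON whose hourly object contains a time array plus one array per requested variable.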


Pipeline architecture

Open-Meteo API
     |
     v
[Meltano] ── tap-openmeteo -> target-postgres
     |
     v
[RAW] ── raw.om_weather ── verbatim API data
     |
     v
[STAGING] ── dbt ── type casting (time -> datetime), deduplication
     |
     v
[SILVER] ── dbt ── 7 weather variables (hourly)
     |
     v
[GOLD COMPUTE] ── Python ── 29 ML features + 15 PV features
     |
     v
[GOLD] ── dbt ── type casting, tests (ds_dev_gold.om_weather_features)
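The staging model is written in dbt SQL. Purely to illustrate its logic, the cast-and-deduplicate step is sketched here in Python. The sketch assumes each raw row carries a location name, the API's local ISO-8601 time string, and a load counter or timestamp (loaded_at, a hypothetical column name). It keeps the most recently loaded row per (location, datetime). The real stg_om_weather key and tie-breaking rule may differ.

```python
from datetime import datetime


def stage_rows(raw_rows: list[dict]) -> list[dict]:
    """Cast 'time' to a datetime and keep the latest-loaded row per (location, datetime)."""
    latest: dict[tuple[str, datetime], dict] = {}
    for row in raw_rows:
        # Local, offset-free times such as "2024-06-01T13:00".
        ts = datetime.fromisoformat(row["time"])
        key = (row["location"], ts)
        kept = latest.get(key)
        # Daily runs with a 120 h past window re-fetch the same hours; newest load wins.
        if kept is None or row["loaded_at"] > kept["loaded_at"]:
            staged = {k: v for k, v in row.items() if k != "time"}
            staged["datetime"] = ts
            latest[key] = staged
    # Deterministic output: ordered by location, then time.
    return sorted(latest.values(), key=lambda r: (r["location"], r["datetime"]))


if __name__ == "__main__":
    rows = [
        {"location": "site_a", "time": "2024-06-01T13:00", "loaded_at": 1, "temperature_2m": 24.0},
        {"location": "site_a", "time": "2024-06-01T13:00", "loaded_at": 2, "temperature_2m": 24.5},
        {"location": "site_a", "time": "2024-06-01T12:00", "loaded_at": 1, "temperature_2m": 23.0},
    ]
    for r in stage_rows(rows):
        print(r["datetime"], r["temperature_2m"])
```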

Output datasets

Silver layer (ds_dev_silver.om_weather_hourly)

7 weather variables, plus the datetime key:

| Column | Unit | Description |
|---|---|---|
| datetime | timestamp | Hourly, Europe/Rome local time |
| shortwave_radiation | W/m² | Total solar radiation on a horizontal surface (GHI) |
| direct_radiation | W/m² | Direct beam solar radiation |
| diffuse_radiation | W/m² | Scattered (diffuse) solar radiation |
| global_tilted_irradiance | W/m² | Global tilted irradiance (GTI) for PV |
| cloud_cover | % | Total cloud cover |
| temperature_2m | °C | Air temperature at 2 m |
| precipitation | mm | Total precipitation |

Gold layer (ds_dev_gold.om_weather_features)

29 ML features for energy consumption forecasting:

Temporal / Fourier (11 features)

| Feature | Description |
|---|---|
| hour_sin, hour_cos | Cyclical hour encoding: sin/cos(2π · hour / 24) |
| dow_sin, dow_cos | Cyclical day-of-week encoding: sin/cos(2π · dow / 7) |
| annual_sin, annual_cos | Annual cycle encoding (period = 8766 hours, i.e. 365.25 days) |
| semi_annual_sin, semi_annual_cos | Six-month cycle encoding (period = 4383 hours) |
| is_weekend | 1 if Saturday or Sunday |
| is_holiday | 1 if Italian public holiday |
| is_daylight | 1 if hour between 6 and 20 (Italian time) |
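A minimal sketch of the temporal features for one naive Europe/Rome timestamp. Three details are assumptions, not taken from features.py. First, the annual and semi-annual phases are measured in hours since 1970-01-01. Second, is_daylight uses strict bounds (6 < hour < 20), so it is the exact complement of the is_night flag defined for heating_x_night. Third, is_holiday covers only Italian national holidays: fixed dates plus Easter Sunday and Monday, computed with the anonymous Gregorian algorithm, with no local patron-saint days. The pipeline's own holiday source may differ.

```python
import math
from datetime import date, datetime, timedelta

HOURS_PER_YEAR = 8766.0       # 365.25 days * 24 h
HOURS_PER_HALF_YEAR = 4383.0  # HOURS_PER_YEAR / 2
EPOCH = datetime(1970, 1, 1)  # assumed phase reference for the annual cycles

# Italian national holidays with a fixed date, as (month, day).
FIXED_HOLIDAYS = {(1, 1), (1, 6), (4, 25), (5, 1), (6, 2), (8, 15),
                  (11, 1), (12, 8), (12, 25), (12, 26)}


def easter_sunday(year: int) -> date:
    """Gregorian Easter Sunday (anonymous Gregorian algorithm)."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    el = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * el) // 451
    month, day = divmod(h + el - 7 * m + 114, 31)
    return date(year, month, day + 1)


def is_italian_holiday(d: date) -> bool:
    """National holidays only; local patron-saint days are not included."""
    easter = easter_sunday(d.year)
    return (d.month, d.day) in FIXED_HOLIDAYS or d in (easter, easter + timedelta(days=1))


def temporal_features(ts: datetime) -> dict[str, float]:
    """The 11 temporal/Fourier features for one naive Europe/Rome timestamp."""
    hours = (ts - EPOCH).total_seconds() / 3600.0
    dow = ts.weekday()  # Monday = 0 ... Sunday = 6
    hour_angle = 2 * math.pi * ts.hour / 24
    dow_angle = 2 * math.pi * dow / 7
    annual_angle = 2 * math.pi * hours / HOURS_PER_YEAR
    semi_angle = 2 * math.pi * hours / HOURS_PER_HALF_YEAR
    return {
        "hour_sin": math.sin(hour_angle),
        "hour_cos": math.cos(hour_angle),
        "dow_sin": math.sin(dow_angle),
        "dow_cos": math.cos(dow_angle),
        "annual_sin": math.sin(annual_angle),
        "annual_cos": math.cos(annual_angle),
        "semi_annual_sin": math.sin(semi_angle),
        "semi_annual_cos": math.cos(semi_angle),
        "is_weekend": float(dow >= 5),
        "is_holiday": float(is_italian_holiday(ts.date())),
        # Assumed strict bounds: the complement of is_night (hour >= 20 or <= 6).
        "is_daylight": float(6 < ts.hour < 20),
    }
```

For example, temporal_features(datetime(2024, 4, 1, 6)) falls on Easter Monday 2024, so is_holiday is 1.0, while hour 6 gives hour_sin = 1.0 and, under the strict-bounds assumption, is_daylight = 0.0.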

Temperature-derived (11 features)

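| Feature | Description |
|---|---|
| temperature_2m | Direct pass-through (°C) |
| heating_degree_hour | max(0, 18 - temperature_2m), base 18 °C |
| temp_rolling_mean_24h | 24 h rolling mean of temperature_2m |
| temp_rolling_std_24h | 24 h rolling standard deviation of temperature_2m |
| temp_change_rate_3h | (T[t] - T[t-3]) / 3, in °C per hour |
| thermal_inertia_12h | Exponentially weighted mean (EWM) of temperature_2m, half-life 12 h |
| temp_gradient_24h | T[t] - T[t-24] |
| heating_degree_rolling_mean_24h | 24 h rolling mean of heating_degree_hour |
| cumulative_hdd_48h | 48 h rolling sum of heating_degree_hour |
| temp_x_hour_sin | temperature_2m * hour_sin |
| heating_x_night | heating_degree_hour * is_night, where is_night = 1 if hour >= 20 or hour <= 6 |

T[t] denotes temperature_2m at hour t.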
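The temperature features can be sketched over a gap-free, time-ordered hourly series in plain Python. This is an illustration, not the code in features.py. Rolling windows return None until they are full, mirroring pandas' default min_periods. thermal_inertia_12h uses the recursive EWM form with alpha = 1 - 0.5 ** (1 / 12), which matches pandas' ewm(halflife=12, adjust=False). It can differ from adjust=True in the first few hours. The 18 °C base comes from the table above.

```python
import math
import statistics

HDD_BASE_C = 18.0  # base temperature from the table above


def rolling(values, window, fn):
    """fn over each trailing window; None until the window is full."""
    return [fn(values[i - window + 1:i + 1]) if i >= window - 1 else None
            for i in range(len(values))]


def lag_diff(values, lag):
    """values[t] - values[t - lag]; None for the first `lag` rows."""
    return [values[i] - values[i - lag] if i >= lag else None
            for i in range(len(values))]


def ewm_mean(values, halflife):
    """Recursive exponentially weighted mean, alpha = 1 - 0.5 ** (1 / halflife)."""
    alpha = 1.0 - 0.5 ** (1.0 / halflife)
    out, prev = [], None
    for v in values:
        prev = v if prev is None else alpha * v + (1.0 - alpha) * prev
        out.append(prev)
    return out


def temperature_features(temps, hours):
    """temps: gap-free hourly temperature_2m (°C) in time order; hours: local hour per row."""
    hdh = [max(0.0, HDD_BASE_C - t) for t in temps]
    is_night = [1.0 if (h >= 20 or h <= 6) else 0.0 for h in hours]
    hour_sin = [math.sin(2 * math.pi * h / 24) for h in hours]
    return {
        "temperature_2m": list(temps),
        "heating_degree_hour": hdh,
        "temp_rolling_mean_24h": rolling(temps, 24, statistics.fmean),
        "temp_rolling_std_24h": rolling(temps, 24, statistics.stdev),  # sample std (ddof=1)
        "temp_change_rate_3h": [None if d is None else d / 3 for d in lag_diff(temps, 3)],
        "thermal_inertia_12h": ewm_mean(temps, 12),
        "temp_gradient_24h": lag_diff(temps, 24),
        "heating_degree_rolling_mean_24h": rolling(hdh, 24, statistics.fmean),
        "cumulative_hdd_48h": rolling(hdh, 48, sum),
        "temp_x_hour_sin": [t * s for t, s in zip(temps, hour_sin)],
        "heating_x_night": [d * n for d, n in zip(hdh, is_night)],
    }
```

The first 47 rows of a fresh series contain at least one None. This is one reason the pipeline fetches 120 h of past data alongside the 48 h forecast.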

Radiation-derived (3 features)

| Feature | Description |
|---|---|
| shortwave_radiation | Direct pass-through (W/m²) |
| radiation_rolling_mean_24h | 24 h rolling mean |
| radiation_x_daytime | shortwave_radiation * is_daylight |

Cloud-derived (2 features)

| Feature | Description |
|---|---|
| cloud_cover | Direct pass-through (%) |
| cloud_cover_rolling_mean_24h | 24 h rolling mean |

Precipitation (1 feature)

| Feature | Description |
|---|---|
| precipitation | Direct pass-through (mm) |

Interaction (1 feature)

| Feature | Description |
|---|---|
| weekend_x_hour_cos | is_weekend * hour_cos |
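features.py keeps the model inputs in SELECTED_FEATURES. Its exact contents and order are not reproduced here. The hypothetical grouping below lists the 29 names documented above and checks the per-group counts, which is a cheap guard when adding features. cross_features shows the two product features from the radiation and interaction tables.

```python
# Hypothetical grouping of the 29 documented gold features; the contents and
# order of SELECTED_FEATURES in features.py may differ.
FEATURE_GROUPS = {
    "temporal": ["hour_sin", "hour_cos", "dow_sin", "dow_cos",
                 "annual_sin", "annual_cos", "semi_annual_sin", "semi_annual_cos",
                 "is_weekend", "is_holiday", "is_daylight"],
    "temperature": ["temperature_2m", "heating_degree_hour", "temp_rolling_mean_24h",
                    "temp_rolling_std_24h", "temp_change_rate_3h", "thermal_inertia_12h",
                    "temp_gradient_24h", "heating_degree_rolling_mean_24h",
                    "cumulative_hdd_48h", "temp_x_hour_sin", "heating_x_night"],
    "radiation": ["shortwave_radiation", "radiation_rolling_mean_24h", "radiation_x_daytime"],
    "cloud": ["cloud_cover", "cloud_cover_rolling_mean_24h"],
    "precipitation": ["precipitation"],
    "interaction": ["weekend_x_hour_cos"],
}
EXPECTED_COUNTS = {"temporal": 11, "temperature": 11, "radiation": 3,
                   "cloud": 2, "precipitation": 1, "interaction": 1}
FEATURES = [name for names in FEATURE_GROUPS.values() for name in names]


def check_feature_list() -> None:
    """Raise ValueError if a group count drifts from this README or a name repeats."""
    for group, names in FEATURE_GROUPS.items():
        if len(names) != EXPECTED_COUNTS[group]:
            raise ValueError(f"{group}: {len(names)} features, expected {EXPECTED_COUNTS[group]}")
    if len(FEATURES) != len(set(FEATURES)):
        raise ValueError("duplicate feature names")


def cross_features(row: dict) -> dict:
    """The two product features from the radiation and interaction tables."""
    return {
        "radiation_x_daytime": row["shortwave_radiation"] * row["is_daylight"],
        "weekend_x_hour_cos": row["is_weekend"] * row["hour_cos"],
    }


check_feature_list()
```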

Gold meters layer (ds_dev_gold.om_weather_features_meters)

15 PV/solar features for energy metering:

| Feature | Description |
|---|---|
| hour_sin, hour_cos | Cyclical hour encoding |
| day_of_week, month | Calendar features |
| is_weekend, is_daylight | Binary flags |
| global_tilted_irradiance | PV panel irradiance, GTI (W/m²) |
| shortwave_radiation | GHI (W/m²) |
| cloud_cover | Cloud cover (%) |
| temperature_2m | Temperature (°C) |
| clearsky_index | Ratio of actual GHI to theoretical clear-sky GHI |
| effective_solar_pv | direct_radiation + 0.9 * diffuse_radiation |
| heating_degree, cooling_degree | Thermal comfort indices |
| theoretical_prod | global_tilted_irradiance * effective_solar_pv |
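The arithmetic PV features follow directly from the table. This is a minimal sketch that assumes the caller supplies the clear-sky GHI, for example from a clear-sky model such as those in pvlib. How features.py obtains that value is not documented here. At night, when clear-sky GHI is about zero, clearsky_index falls back to 0.0; that fallback is a hypothetical choice. heating_degree and cooling_degree are left out because their base temperatures are not documented.

```python
def pv_features(direct: float, diffuse: float, gti: float,
                ghi: float, clearsky_ghi: float) -> dict[str, float]:
    """Arithmetic PV features for one hour; all irradiance inputs in W/m²."""
    effective = direct + 0.9 * diffuse  # effective_solar_pv
    # clearsky_index is undefined when clear-sky GHI is ~0 (night);
    # 0.0 is a hypothetical fallback.
    csi = ghi / clearsky_ghi if clearsky_ghi > 1e-6 else 0.0
    return {
        "effective_solar_pv": effective,
        "clearsky_index": csi,
        "theoretical_prod": gti * effective,
    }
```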

These datasets are licensed under CC BY 4.0, following the Open-Meteo source license.


Timezone handling

All datetime operations use Italian local time (Europe/Rome):

  • The tap-openmeteo config sets timezone: "Europe/Rome"
  • Feature computation works directly on the local-time datetime column
  • No UTC conversion happens at any stage
  • DST transitions (two hours per year: the nonexistent 02:00 hour in spring and the repeated 02:00 hour in autumn) are not explicitly handled; the model was trained with the same convention
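To see which hours the DST convention affects, the sketch below walks UTC hours with Python's zoneinfo module. It needs Python 3.9+ and either the system time-zone database or the tzdata package. It lists the Europe/Rome wall-clock hours that occur on a given day. On the spring-forward day, 02:00 never occurs, so the day has 23 hours. On the fall-back day, 02:00 occurs twice, so the day has 25 hours. A naive local-time representation of a complete hourly series therefore has no 02:00 row on the first day and two rows stamped 02:00 on the second, unless they are merged upstream.

```python
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo

ROME = ZoneInfo("Europe/Rome")


def local_hours(day: date) -> list[int]:
    """Europe/Rome wall-clock hours that actually occur on `day`, in order."""
    # Walk 72 UTC hours starting at the previous UTC midnight; since Rome is
    # UTC+1 or UTC+2, this window always covers the whole local day.
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc) - timedelta(days=1)
    hours = []
    for i in range(72):
        local = (start + timedelta(hours=i)).astimezone(ROME)
        if local.date() == day:
            hours.append(local.hour)
    return hours


if __name__ == "__main__":
    spring = local_hours(date(2024, 3, 31))   # EU spring-forward day
    autumn = local_hours(date(2024, 10, 27))  # EU fall-back day
    print(len(spring), 2 in spring)      # 23 False
    print(len(autumn), autumn.count(2))  # 25 2
```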

Execution

Run once via Docker

```bash
# Ensure Postgres is up
docker compose up -d datasets-db
docker compose build pipeline-om
```

Daily mode (fetches 48h forecast + 120h past):

```bash
docker compose run --rm pipeline-om python3 -c "
from flows.pipeline import om_flow
om_flow()
"
```

Run as scheduled service (daily at 06:00)

```bash
docker compose up -d pipeline-om
```

This registers the Prefect flow with cron schedule 0 6 * * *.


Configuration

Pipeline configuration is split between two files:

meltano/meltano.yml -- Extraction config

| Section | Key parameters |
|---|---|
| locations | name, latitude, longitude, timezone |
| forecast_hours / past_hours | Time window (48 h forecast, 120 h past) |
| models | Weather model selection (icon_d2) |
| hourly_variables | The 7 weather variables |
| streams_to_sync | weather_hourly |
| stream_maps | weather_hourly -> om_weather (table alias) |
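The settings in this table have to agree with the rest of the pipeline. For example, hourly_variables must match the silver columns, and the timezone must stay Europe/Rome. The sketch below mirrors the extractor settings as a Python dict and checks those invariants. The key names come from this README. The dict layout, the placeholder coordinates and validate_extractor_config are hypothetical, and the real meltano.yml nesting may differ. The stream map is written with the Meltano SDK's __alias__ key.

```python
SILVER_VARIABLES = {
    "shortwave_radiation", "direct_radiation", "diffuse_radiation",
    "global_tilted_irradiance", "cloud_cover", "temperature_2m", "precipitation",
}
REQUIRED_LOCATION_KEYS = {"name", "latitude", "longitude", "timezone"}

# Hypothetical mirror of the extractor settings; coordinates are placeholders.
EXTRACTOR_CONFIG = {
    "locations": [
        {"name": "example_site", "latitude": 41.9, "longitude": 12.5,
         "timezone": "Europe/Rome"},
    ],
    "forecast_hours": 48,
    "past_hours": 120,
    "models": "icon_d2",
    "hourly_variables": sorted(SILVER_VARIABLES),
    "streams_to_sync": ["weather_hourly"],
    "stream_maps": {"weather_hourly": {"__alias__": "om_weather"}},
}


def validate_extractor_config(cfg: dict) -> list[str]:
    """Return a list of problems; an empty list means the settings look consistent."""
    problems = []
    for loc in cfg["locations"]:
        missing = REQUIRED_LOCATION_KEYS - set(loc)
        if missing:
            problems.append(f"location {loc.get('name', '?')} is missing {sorted(missing)}")
        elif loc["timezone"] != "Europe/Rome":
            # Feature code assumes Italian local time (see Timezone handling).
            problems.append(f"location {loc['name']} does not use Europe/Rome")
    if set(cfg["hourly_variables"]) != SILVER_VARIABLES:
        problems.append("hourly_variables differ from the 7 silver-layer variables")
    if cfg["forecast_hours"] <= 0 or cfg["past_hours"] <= 0:
        problems.append("forecast_hours and past_hours must be positive")
    return problems
```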

flows/config.yaml -- Table mappings and schedule

| Section | Key parameters |
|---|---|
| silver | Silver table/schema (dbt output) |
| gold_raw / gold_raw_meters | Raw gold tables (Python feature output) |
| gold / gold_meters | Gold table/schema (dbt output) |
| schedule | Cron expression, flow name |

File structure

apps/om/
├── flows/
│   ├── config.yaml       # Table mappings and schedule
│   ├── features.py       # Gold-layer feature engineering (29 + 15 features)
│   └── pipeline.py       # Prefect tasks and flow definition
├── meltano/
│   ├── meltano.yml       # Meltano extractor/loader config (tap-openmeteo)
│   └── .gitignore        # Ignores .meltano/ runtime
├── dbt/
│   ├── models/
│   │   ├── staging/      # stg_om_weather (time->datetime, type cast, dedup)
│   │   ├── silver/       # om_weather_hourly (7 weather variables)
│   │   └── gold/         # om_weather_features (type cast + tests)
│   ├── macros/
│   │   └── cleanup_om_weather.sql  # Retention-based raw data cleanup
│   ├── dbt_project.yml
│   └── profiles.yml
├── governance.yaml       # Dataset governance metadata
└── README.md

Contributing

Contributions may include:

  • additional ML features (update features.py and SELECTED_FEATURES)
  • new locations (add them to the locations array in meltano/meltano.yml)
  • improved imputation strategies
  • additional dbt tests

Ensure that:

  • licensing remains compatible
  • derived datasets are documented in governance.yaml
  • feature parity between historical and forecast data is maintained
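One way to enforce the last point is to compare the columns of the historical and forecast feature tables against the model's feature list before training or scoring. This is a minimal sketch: missing_features is a hypothetical helper, not part of the repository, and the caller decides how to obtain the column lists.

```python
def missing_features(historical_cols, forecast_cols, selected):
    """Selected model features absent from either feature table (empty lists = parity)."""
    hist, fc = set(historical_cols), set(forecast_cols)
    return {
        "historical": [c for c in selected if c not in hist],
        "forecast": [c for c in selected if c not in fc],
    }
```

A CI test could then fail whenever either list is non-empty.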