
mesa-ducklake

Python library that tracks AVU (Attribute / Value / Unit) metadata history for iRODS collections and data objects in MESA-enabled projects, using DuckDB's DuckLake lakehouse pattern.

Repository: github.com/cyverse/mesa-ducklake
Language: Python (library)
License: Apache 2.0
Status: Pre-alpha but functional
Catalog: PostgreSQL
Data: Parquet under <project_root>/.mesa/ducklake/ in iRODS

What it does

mesa-ducklake is a library, not a standalone MCP server. Its primary consumer is mesa-mcp, which mirrors every AVU change into the DuckLake catalog so that the full metadata history is queryable and time-travelable.

The catalog uses DuckDB's DuckLake pattern: a Postgres catalog plus Parquet data files stored at <project_root>/.mesa/ducklake/ inside the iRODS project itself, so the metadata history travels with the data.
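As a rough illustration of that layout, the sketch below derives the data and backup locations from a project root. The `ducklake_layout` helper and the `DuckLakeLayout` type are hypothetical and not part of mesa-ducklake's public API. Only the `.mesa/ducklake/` and `.mesa/ducklake/backups/` locations come from this page.

```python
import posixpath
from typing import NamedTuple


class DuckLakeLayout(NamedTuple):
    """Hypothetical: where the DuckLake files sit inside an iRODS project."""
    data_dir: str     # Parquet data files
    backups_dir: str  # daily pg_dump output (see "Backup and recovery")


def ducklake_layout(project_root: str) -> DuckLakeLayout:
    # iRODS logical paths are absolute and '/'-separated, so posixpath applies.
    if not project_root.startswith("/"):
        raise ValueError(f"expected an absolute iRODS path, got {project_root!r}")
    root = project_root.rstrip("/") or "/"
    data_dir = posixpath.join(root, ".mesa", "ducklake") + "/"
    backups_dir = posixpath.join(data_dir, "backups") + "/"
    return DuckLakeLayout(data_dir, backups_dir)
```

For the project used in the examples below, `ducklake_layout("/iplant/home/alice/myproj").data_dir` is `/iplant/home/alice/myproj/.mesa/ducklake/`.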

When the catalog is and isn't used

| mesa-mcp deployment mode | DuckLake catalog |
| --- | --- |
| Mode A (hosted service) | Active: every AVU change is recorded |
| Mode B (local workstation) | Usually skipped: leave catalog_dsn blank; AVU writes still succeed but are not recorded |
| Mode C (VICE app) | Usually skipped, for the same reason as Mode B |

In the local and VICE modes, users typically don't have access to a shared Postgres catalog, so AVU writes go directly to iRODS and no history is recorded.
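The gating described above can be sketched with the iRODS write and the history write passed in as plain callables. `set_avu`, `recorder_from_config`, and the `(attribute, value, unit)` tuple are illustrative names, not mesa-mcp's actual code, and the ordering (iRODS first, history second) is an assumption.

```python
from typing import Callable, Optional, Tuple

Avu = Tuple[str, str, str]           # (attribute, value, unit)
Writer = Callable[[str, Avu], None]  # called as writer(irods_path, avu)


def recorder_from_config(
    catalog_dsn: Optional[str], factory: Callable[[str], Writer]
) -> Optional[Writer]:
    """Hypothetical: return a history recorder only when a catalog DSN is set."""
    if catalog_dsn is None or not catalog_dsn.strip():
        return None  # Modes B/C: blank catalog_dsn means no history
    return factory(catalog_dsn)  # Mode A: factory builds a catalog writer


def set_avu(
    irods_write: Writer, record_history: Optional[Writer], irods_path: str, avu: Avu
) -> bool:
    """Apply an AVU in iRODS and mirror it to history if a recorder exists.

    Returns True if the change was also recorded in the history catalog.
    """
    irods_write(irods_path, avu)  # the iRODS write happens in every mode
    if record_history is None:
        return False
    record_history(irods_path, avu)
    return True
```

In Mode A the factory would receive a non-blank DSN and return something that writes to the catalog. In Modes B and C the blank `catalog_dsn` yields `None`, so only the iRODS write happens and the AVU still lands on the object.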

Public API

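from mesa_ducklake import DuckLakeClient, AvuChange

# postgres_dsn points at the catalog database; irods_session is used for iRODS access.
with DuckLakeClient(postgres_dsn=..., irods_session=...) as client:
    # Register the iRODS collection as a tracked project.
    project = client.register_project(
        irods_path="/iplant/home/alice/myproj",
        actor="alice",
        zone="iplant",
    )
    # Record a batch of AVU changes, attributed to an actor and annotated with a note.
    snap = client.record_changes(
        project_id=project.project_id,
        actor="alice",
        changes=[AvuChange(...)],
        note="Tagged file.csv with ENVO biome",
    )
    # Read the AVUs currently attached to a collection or data object.
    avus = client.get_avus(project.project_id, irods_path=...)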

DuckLakeClient is the only supported entry point. All other modules in mesa_ducklake are internal.
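Because `AvuChange`'s fields aren't listed here, the following is only a toy in-memory model of the record/read cycle, useful for reasoning about the API without a Postgres catalog. `ToyChange`, `ToyAvuCatalog`, and the `"add"`/`"remove"` operations are hypothetical; the real client writes to the DuckLake catalog instead of a Python list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Avu = Tuple[str, str, str]  # (attribute, value, unit)


@dataclass(frozen=True)
class ToyChange:
    """Hypothetical change record; not the real AvuChange."""
    irods_path: str
    op: str  # "add" or "remove" (assumed operation kinds)
    avu: Avu


@dataclass
class ToyAvuCatalog:
    """Toy stand-in that groups each batch of changes into one numbered snapshot."""
    snapshots: List[Tuple[int, str, str, List[ToyChange]]] = field(default_factory=list)
    _current: Dict[str, Set[Avu]] = field(default_factory=dict)

    def record_changes(self, actor: str, changes: List[ToyChange], note: str = "") -> int:
        # Validate the whole batch first so a bad change leaves state untouched.
        for c in changes:
            if c.op not in ("add", "remove"):
                raise ValueError(f"unknown op {c.op!r}")
        snapshot_id = len(self.snapshots) + 1
        for c in changes:
            avus = self._current.setdefault(c.irods_path, set())
            if c.op == "add":
                avus.add(c.avu)
            else:
                avus.discard(c.avu)
        self.snapshots.append((snapshot_id, actor, note, list(changes)))
        return snapshot_id

    def get_avus(self, irods_path: str) -> List[Avu]:
        # Current state only; history lives in self.snapshots.
        return sorted(self._current.get(irods_path, set()))
```

Validating every change before applying any keeps a batch all-or-nothing, which is one plausible reading of grouping several changes under a single note.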

Time-travel queries

# Run inside the same `with DuckLakeClient(...) as client:` block as the Public API example.
# What did the metadata look like on a specific date?
historical_avus = client.get_avus_as_of(
    project_id=project.project_id,
    irods_path="/iplant/home/alice/myproj/data.csv",
    timestamp="2026-09-01T00:00:00Z",
)

See the upstream time-travel docs for more patterns.
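Conceptually, an as-of query reconstructs state from every change committed up to the cutoff. The sketch below does that by replaying a plain Python list. `LogEntry`, `parse_utc`, and `avus_as_of` are hypothetical; the real `get_avus_as_of` queries the catalog, and its internals aren't described here.

```python
from datetime import datetime
from typing import List, Set, Tuple

Avu = Tuple[str, str, str]  # (attribute, value, unit)
# Hypothetical log entry: (committed_at, irods_path, op, avu), op in {"add", "remove"}
LogEntry = Tuple[datetime, str, str, Avu]


def parse_utc(ts: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:
        raise ValueError("timestamp must include a UTC offset")
    return dt


def avus_as_of(log: List[LogEntry], irods_path: str, timestamp: str) -> List[Avu]:
    """Replay every change for one path committed at or before `timestamp`."""
    cutoff = parse_utc(timestamp)
    state: Set[Avu] = set()
    # sorted() is stable, so same-timestamp changes keep their log order.
    for committed_at, path, op, avu in sorted(log, key=lambda e: e[0]):
        if committed_at > cutoff:
            break  # sorted by time, so nothing later can qualify
        if path != irods_path:
            continue
        if op == "add":
            state.add(avu)
        elif op == "remove":
            state.discard(avu)
    return sorted(state)
```

Requiring an explicit offset avoids silently comparing naive local times against UTC commit times.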

Backup and recovery

The Postgres catalog is backed up daily via pg_dump to iRODS at <project_root>/.mesa/ducklake/backups/. The Parquet data files live in iRODS already, so the entire catalog can be reconstructed from iRODS alone.

See the upstream backup guide for recovery procedures.
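For orientation, the daily backup step could be planned as below. The function name, the dump filename pattern, and the choice of pg_dump's custom format are assumptions; only pg_dump itself and the `backups/` destination come from this page.

```python
import os
import posixpath
import tempfile
from datetime import date
from typing import List, Optional, Tuple


def plan_catalog_backup(
    dsn: str, project_root: str, day: date, local_dir: Optional[str] = None
) -> Tuple[List[str], str, str]:
    """Hypothetical: build the pg_dump command plus local and iRODS destinations."""
    filename = f"ducklake-catalog-{day.isoformat()}.dump"  # naming scheme is assumed
    local_path = os.path.join(local_dir or tempfile.gettempdir(), filename)
    # iRODS logical paths are '/'-separated regardless of the local OS.
    irods_dest = posixpath.join(
        project_root.rstrip("/"), ".mesa", "ducklake", "backups", filename
    )
    argv = [
        "pg_dump",
        "--format=custom",  # compressed archive, restorable with pg_restore
        f"--file={local_path}",
        f"--dbname={dsn}",  # a value containing '=' is treated as a connection string
    ]
    return argv, local_path, irods_dest
```

Executing the plan would be `subprocess.run(argv, check=True)` followed by an upload such as python-irodsclient's `session.data_objects.put(local_path, irods_dest)`; a custom-format dump is restored with `pg_restore`. A password embedded in the DSN is visible in the process list, so a `.pgpass` file is the safer place for it.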

Development

pip install -e ".[dev]"
pytest -q
ruff check src/ tests/