mesa-ducklake
Python library that tracks AVU (Attribute / Value / Unit) metadata history for iRODS collections and data objects in MESA-enabled projects, using DuckDB's DuckLake lakehouse pattern.
| Repository | github.com/cyverse/mesa-ducklake |
| Language | Python (library) |
| License | Apache 2.0 |
| Status | Pre-alpha but functional |
| Catalog | PostgreSQL |
| Data | Parquet under <project_root>/.mesa/ducklake/ in iRODS |
What it does
mesa-ducklake is a library, not a standalone MCP server. Its primary
consumer is mesa-mcp, which mirrors every AVU change into
the DuckLake catalog so that the full metadata history can be queried,
including as it stood at an earlier point in time.
The catalog uses DuckDB's DuckLake pattern: a Postgres catalog plus Parquet
data files stored at <project_root>/.mesa/ducklake/ inside the iRODS
project itself, so the metadata history travels with the data.
When the catalog is and isn't used
| mesa-mcp deployment mode | DuckLake catalog |
|---|---|
| Mode A — Hosted service | Active — every AVU change is recorded |
| Mode B — Local workstation | Usually skipped — leave catalog_dsn blank; AVU writes still succeed but are not recorded |
| Mode C — VICE app | Usually skipped — same reason as Mode B |
In Modes B and C (local workstation and VICE app), users typically don't have access to a shared Postgres catalog, so AVU writes go directly to iRODS and no history is recorded.
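The behavior in the table can be pictured as a small guard around the catalog write. The following is a minimal sketch, not mesa-mcp's actual code: `apply_avu_change` and its callback parameters are hypothetical names, and treating a whitespace-only `catalog_dsn` as blank is an assumption.

```python
from typing import Callable, Optional


def apply_avu_change(
    write_to_irods: Callable[[], None],
    record_in_catalog: Callable[[], None],
    catalog_dsn: Optional[str],
) -> bool:
    """Write an AVU to iRODS and mirror it into the catalog if one is configured.

    Returns True when the change was also recorded in the catalog.
    """
    # The iRODS write always happens, whatever the deployment mode.
    write_to_irods()

    # Blank or missing DSN: no catalog is configured, so no history is kept.
    if not catalog_dsn or not catalog_dsn.strip():
        return False

    record_in_catalog()
    return True
```

With a blank DSN the function still performs the iRODS write and simply reports that nothing was recorded, which matches the "writes still succeed but are not recorded" behavior of Modes B and C.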
Public API
from mesa_ducklake import DuckLakeClient, AvuChange

with DuckLakeClient(postgres_dsn=..., irods_session=...) as client:
    # Register the iRODS project whose AVU history should be tracked.
    project = client.register_project(
        irods_path="/iplant/home/alice/myproj",
        actor="alice",
        zone="iplant",
    )
    # Record one or more AVU changes together with a descriptive note.
    snap = client.record_changes(
        project_id=project.project_id,
        actor="alice",
        changes=[AvuChange(...)],
        note="Tagged file.csv with ENVO biome",
    )
    # Read back the AVUs for a path in the project.
    avus = client.get_avus(project.project_id, irods_path=...)
DuckLakeClient is the only supported entry point. All other modules in
mesa_ducklake are internal.
Time-travel queries
# What did the metadata look like on a specific date?
historical_avus = client.get_avus_as_of(
    project_id=project.project_id,
    irods_path="/iplant/home/alice/myproj/data.csv",
    timestamp="2026-09-01T00:00:00Z",
)
See the upstream time-travel docs for more patterns.
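Conceptually, an "as of" query resolves to the most recent snapshot committed at or before the requested timestamp. The sketch below illustrates that idea in plain Python. It is not how `get_avus_as_of` is implemented; the `snapshot_as_of` helper, the `(committed_at, snapshot_id)` tuple layout, and the sample snapshot list are all hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timezone
from typing import List, Optional, Tuple


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(timezone.utc)


def snapshot_as_of(snapshots: List[Tuple[str, int]], timestamp: str) -> Optional[int]:
    """Return the id of the latest snapshot committed at or before `timestamp`.

    `snapshots` must be sorted by commit time, oldest first.
    Returns None when the timestamp predates every snapshot.
    """
    commit_times = [parse_utc(committed_at) for committed_at, _ in snapshots]
    # bisect_right counts snapshots with commit time <= the target.
    idx = bisect_right(commit_times, parse_utc(timestamp))
    return snapshots[idx - 1][1] if idx else None


if __name__ == "__main__":
    history = [
        ("2026-08-01T12:00:00Z", 1),
        ("2026-08-20T09:30:00Z", 2),
        ("2026-09-05T00:00:00Z", 3),
    ]
    print(snapshot_as_of(history, "2026-09-01T00:00:00Z"))
```

Using `bisect_right` means a snapshot committed exactly at the requested instant is included in the result.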
Backup and recovery
The Postgres catalog is backed up daily via pg_dump to iRODS at
<project_root>/.mesa/ducklake/backups/. The Parquet data files live in
iRODS already, so the entire catalog can be reconstructed from iRODS alone.
See the upstream backup guide for recovery procedures.
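For orientation, a daily backup job along these lines could dump the catalog to a local file and then upload it to the backups collection. The sketch below only builds the `pg_dump` command and the target path. The custom dump format, the `catalog-<date>.dump` file name, and the helper names are assumptions; the actual job's flags and naming are not specified here, and the upload to iRODS is not shown.

```python
import posixpath
from datetime import date
from typing import List


def backup_target(project_root: str, day: date) -> str:
    """Return a dated path under <project_root>/.mesa/ducklake/backups/.

    The file name pattern is hypothetical.
    """
    root = project_root.rstrip("/")
    name = f"catalog-{day.isoformat()}.dump"
    return posixpath.join(root, ".mesa", "ducklake", "backups", name)


def pg_dump_command(catalog_dsn: str, local_file: str) -> List[str]:
    """Build a pg_dump invocation that writes a custom-format archive."""
    return [
        "pg_dump",
        "--format=custom",            # compressed archive, restorable with pg_restore
        f"--file={local_file}",       # pg_dump writes to a local file
        f"--dbname={catalog_dsn}",    # --dbname accepts a connection string
    ]


if __name__ == "__main__":
    # Hypothetical DSN and staging path, for illustration only.
    print(pg_dump_command("postgresql://catalog.example.org/mesa", "/tmp/catalog.dump"))
    print(backup_target("/iplant/home/alice/myproj", date(2026, 9, 1)))
```

Returning the command as an argument list rather than a shell string lets it be passed to `subprocess.run` without shell quoting concerns.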