iRODS Data Mesh — Goal 2

The MESA Data Mesh federates iRODS zones across institutions so authorized users can query data wherever it lives, without first copying it into a central repository. RENCI leads the Mesh under a subaward to the project, directed by Co-PI T. Russell and staffed by three RENCI research software engineers (Draughn, King, James).

Why federate?

Scientific data is institution-bound — by policy, by storage cost, by IRB / data-use agreement. The Mesh lets MESA span those boundaries without requiring data movement:

  • A microbiologist at NCEMS can analyze CyVerse-hosted sequence data and their own institution's iRODS zone in a single query.
  • An ESIIL data scientist can apply an MCP-driven workflow across partner-institution zones without first copying terabytes of imagery.
  • Data-use restrictions (provenance, access, retention) are encoded as machine-enforceable policy, not human paperwork.

Architecture

flowchart LR
    A[MCP client] -->|"ds_* / mesa_ducklake_*"| B[mesa-mcp]
    B -->|federated query| Z1[iRODS zone — CyVerse]
    B -->|federated query| Z2[iRODS zone — partner institution]
    Z1 -->|policy check| P[Policy engine — RENCI]
    Z2 -->|policy check| P
    P -->|allow / deny / transform| Z1
    P -->|allow / deny / transform| Z2
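
The request flow in the diagram can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the mesa-mcp implementation: `Zone`, `Record`, `policy_check`, `federated_query`, the `restricted` flag, and the `dua-signed` group are all hypothetical names, and the zones are in-memory stand-ins rather than real iRODS connections.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    TRANSFORM = "transform"


@dataclass(frozen=True)
class Record:
    zone: str
    path: str
    restricted: bool  # hypothetical flag, e.g. set by a data-use agreement


@dataclass
class Zone:
    """In-memory stand-in for one iRODS zone (hypothetical)."""
    name: str
    records: list

    def query(self, prefix):
        return [r for r in self.records if r.path.startswith(prefix)]


def policy_check(user_groups, record):
    """Toy policy: restricted records need the 'dua-signed' group;
    users outside the record's zone get a redacted (transformed) view."""
    if record.restricted and "dua-signed" not in user_groups:
        return Decision.DENY
    if record.zone not in user_groups:
        return Decision.TRANSFORM
    return Decision.ALLOW


def federated_query(zones, user_groups, prefix):
    """Fan the query out to every zone and apply the policy decision per record."""
    results = []
    for zone in zones:
        for record in zone.query(prefix):
            decision = policy_check(user_groups, record)
            if decision is Decision.DENY:
                continue
            path = record.path if decision is Decision.ALLOW else "<redacted>"
            results.append((record.zone, path, decision.value))
    return results


zones = [
    Zone("cyverse", [Record("cyverse", "/seq/run1.fastq", False),
                     Record("cyverse", "/seq/run2.fastq", True)]),
    Zone("partner", [Record("partner", "/seq/sample.fastq", False)]),
]
hits = federated_query(zones, user_groups={"cyverse"}, prefix="/seq")
```

In this sketch the caller never copies data between zones: each zone answers for its own records, and the policy decision determines whether a record is returned, dropped, or returned in a transformed form.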

Stack

Layer          | Technology
Federation     | Multi-zone iRODS
Authentication | CILogon / Globus Auth / ORCID
Authorization  | Policy engine prototype (RENCI)
Bridge         | Lakehouse–Mesh interoperability (joint with Goal 1)

Policy engine prototype

The RENCI team builds a policy-engine prototype that lives between the MCP servers and the underlying iRODS zones. It enforces:

  • Access policy — who can read, write, share each collection.
  • Provenance policy — every read and write event is recorded.
  • Retention policy — automatic age-based archival / deletion.
  • Data-use agreement encoding — DUAs translated into enforceable rules.
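
The four policy types above can be illustrated with a small Python sketch. It is not the RENCI prototype: `CollectionPolicy`, `PolicyEngine`, the field names, the collection path, and the `dua-2024` group are hypothetical, and sharing rules are left out for brevity.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class CollectionPolicy:
    """Hypothetical policy record for one collection."""
    readers: set
    writers: set
    retention: timedelta  # age after which data is due for archival
    dua_required_groups: set = field(default_factory=set)


@dataclass
class PolicyEngine:
    policies: dict  # collection path -> CollectionPolicy
    provenance: list = field(default_factory=list)

    def authorize(self, user, groups, collection, action, now=None):
        """Check access and DUA rules, and record the event either way."""
        now = now or datetime.now(timezone.utc)
        policy = self.policies[collection]
        allowed_users = policy.readers if action == "read" else policy.writers
        # DUA encoding: the user must hold every group the agreement requires.
        allowed = user in allowed_users and policy.dua_required_groups <= groups
        self.provenance.append((now, user, action, collection, allowed))
        return allowed

    def due_for_archival(self, collection, created, now=None):
        """Retention policy: True once the object is older than the limit."""
        now = now or datetime.now(timezone.utc)
        return now - created > self.policies[collection].retention


engine = PolicyEngine({
    "/mesa/imagery": CollectionPolicy(
        readers={"alice", "bob"}, writers={"alice"},
        retention=timedelta(days=365), dua_required_groups={"dua-2024"}),
})
t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
ok = engine.authorize("alice", {"dua-2024"}, "/mesa/imagery", "write", now=t0)
denied = engine.authorize("bob", set(), "/mesa/imagery", "read", now=t0)
```

Here `bob` is a listed reader but is still denied because he lacks the group required by the data-use agreement, and both the allowed and the denied request end up in the provenance log.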

The prototype is evaluated in a TrustedCI security review of the prototype under WBS 6.5 (Phase 2).

Federated identity

Identity provider | Use
CyVerse Keycloak  | Default for MESA's hosted services
CILogon           | Cross-institution federation (InCommon)
Globus Auth       | High-throughput data-transfer authentication
ORCID             | Researcher identity binding

Lakehouse–Mesh integration

The Mesh and Lakehouse meet at the AVU layer: every metadata operation that the Mesh exposes is mirrored into the Lakehouse catalog, so all AVU history — across zones — is queryable from a single point.
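
A minimal sketch of this mirroring idea, assuming an append-only history table: an in-memory SQLite database stands in for the Lakehouse catalog, and `avu_history`, `mirror_avu`, `history`, and the example paths are hypothetical names chosen for illustration. AVU refers to the iRODS attribute-value-unit metadata triple.

```python
import sqlite3

# In-memory SQLite stands in for the Lakehouse catalog (an assumption for illustration).
catalog = sqlite3.connect(":memory:")
catalog.execute(
    """CREATE TABLE avu_history (
           seq INTEGER PRIMARY KEY AUTOINCREMENT,
           zone TEXT, path TEXT, op TEXT,
           attribute TEXT, value TEXT, unit TEXT)"""
)


def mirror_avu(zone, path, op, attribute, value, unit=""):
    """Append one AVU operation ('set' or 'remove') to the shared history table."""
    catalog.execute(
        "INSERT INTO avu_history (zone, path, op, attribute, value, unit) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (zone, path, op, attribute, value, unit),
    )


def history(attribute):
    """Return every recorded operation on an attribute, across all zones, in order."""
    rows = catalog.execute(
        "SELECT zone, path, op, value FROM avu_history "
        "WHERE attribute = ? ORDER BY seq",
        (attribute,),
    )
    return rows.fetchall()


mirror_avu("cyverse", "/cyverse/home/a.fastq", "set", "organism", "E. coli")
mirror_avu("partner", "/partner/home/b.fastq", "set", "organism", "B. subtilis")
mirror_avu("cyverse", "/cyverse/home/a.fastq", "remove", "organism", "E. coli")
organism_events = history("organism")
```

Because operations are appended rather than overwritten, a removed AVU remains visible in the history, and one query covers both zones.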

This integration (WBS 4.6 and 3.4) is on the critical path of Phase 1 (Weeks 24–48; owners: Cao, Edgin, and Russell).

Production deferral

The prototype demonstrates federation across two zones (CyVerse + one partner). Production multi-zone deployment across the full set of leveraged partners — and TACC Corral mirroring at full scale — is deferred to the follow-on Cat I/II operations proposal.