iRODS Data Mesh: Goal 2
The MESA Data Mesh federates iRODS zones across institutions so authorized users can query data wherever it lives, without first copying it into a central repository. RENCI leads the Mesh under Co-PI T. Russell, as a subaward to the project, with three RENCI research software engineers (RSEs): Draughn, King, and James.
Why federate?
Scientific data is bound to its home institution by policy, by storage cost, and by IRB or data-use agreement (DUA). The Mesh lets MESA span those boundaries without moving the data:
- A microbiologist at NCEMS can query CyVerse-hosted sequence data together with data in their own institution's iRODS zone, in a single query.
- An ESIIL data scientist can apply an MCP-driven workflow across partner-institution zones without first copying terabytes of imagery.
- Data-use restrictions (provenance, access, retention) are encoded as machine-enforceable policy, not human paperwork.
Architecture
```mermaid
flowchart LR
    A[MCP client] -->|"ds_* / mesa_ducklake_*"| B[mesa-mcp]
    B -->|federated query| Z1["iRODS zone (CyVerse)"]
    B -->|federated query| Z2["iRODS zone (partner institution)"]
    Z1 -->|policy check| P["Policy engine (RENCI)"]
    Z2 -->|policy check| P
    P -->|allow / deny / transform| Z1
    P -->|allow / deny / transform| Z2
```
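
The request flow in the diagram can be illustrated with a minimal Python sketch. It is not the `mesa-mcp` implementation: the zones are plain in-memory lists, and `Decision`, `policy_check`, `federated_query`, the record fields (`restricted`, `readers`, `subject_id`), and the sample paths are all hypothetical names chosen for illustration. It only shows how a per-record allow / deny / transform decision could shape a query that fans out over two zones.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Decision:
    """Outcome of one policy check: allow, deny, or transform."""
    action: str
    transform: Optional[Callable[[dict], dict]] = None


def policy_check(user: str, zone: str, record: dict) -> Decision:
    """Hypothetical stand-in for the policy engine (rules are assumptions)."""
    if record.get("restricted") and user not in record.get("readers", ()):
        return Decision("deny")
    if "subject_id" in record:
        # Transform: mask a sensitive field rather than denying outright.
        return Decision("transform", lambda r: {**r, "subject_id": "REDACTED"})
    return Decision("allow")


def federated_query(user: str, zones: dict, predicate) -> list:
    """Run one predicate over every zone; no record is copied between zones."""
    results = []
    for zone_name, records in zones.items():
        for record in records:
            if not predicate(record):
                continue
            decision = policy_check(user, zone_name, record)
            if decision.action == "deny":
                continue
            if decision.action == "transform":
                record = decision.transform(record)
            results.append({"zone": zone_name, **record})
    return results


# Two toy zones standing in for CyVerse and one partner institution.
zones = {
    "cyverse": [
        {"path": "/cyverse/reads.fastq", "kind": "sequence"},
        {"path": "/cyverse/embargoed.fastq", "kind": "sequence",
         "restricted": True, "readers": ["alice"]},
    ],
    "partner": [
        {"path": "/partner/isolates.fastq", "kind": "sequence",
         "subject_id": "S-001"},
    ],
}

hits = federated_query("bob", zones, lambda r: r["kind"] == "sequence")
```

In this sketch the policy decision is applied before any result leaves its zone, so a denied record never reaches the caller and a transformed record is returned as a masked copy while the stored record stays unchanged.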
Stack
| Layer | Technology |
|---|---|
| Federation | Multi-zone iRODS |
| Authentication | CILogon / Globus Auth / ORCID |
| Authorization | Policy engine prototype (RENCI) |
| Bridge | Lakehouse–Mesh interoperability — joint with Goal 1 |
Policy engine prototype
The RENCI team is building a policy-engine prototype that sits between the MCP servers and the underlying iRODS zones. It enforces:
- Access policy: who can read, write, or share each collection.
- Provenance policy: every read and write event is recorded.
- Retention policy: automatic age-based archival or deletion.
- Data-use agreement encoding: DUAs translated into enforceable rules.
The prototype is evaluated through a TrustedCI prototype security review under WBS 6.5 (Phase 2).
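
To make the four policy types concrete, here is a minimal Python sketch of how they might be combined in one object. It is an illustration under stated assumptions, not the RENCI prototype: `CollectionPolicy`, `PolicyEngine`, the `purpose` argument used to model a DUA, and the choice to log denied attempts as well as allowed ones are all hypothetical. Sharing is omitted for brevity.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class CollectionPolicy:
    readers: set            # access policy: who may read
    writers: set            # access policy: who may write
    max_age: timedelta      # retention policy: archive once older than this
    allowed_purposes: set   # DUA encoding: permitted uses of the data


@dataclass
class PolicyEngine:
    policies: dict                               # collection path -> policy
    provenance: list = field(default_factory=list)

    def authorize(self, user, collection, op, purpose, now=None) -> bool:
        """Check access and DUA rules, then record the event."""
        now = now or datetime.now(timezone.utc)
        policy = self.policies.get(collection)
        allowed = False
        if policy is not None and op in ("read", "write"):
            members = policy.readers if op == "read" else policy.writers
            allowed = user in members and purpose in policy.allowed_purposes
        # Provenance policy: log every attempt (assumption: denials too).
        self.provenance.append({
            "time": now, "user": user, "collection": collection,
            "op": op, "purpose": purpose, "allowed": allowed,
        })
        return allowed

    def due_for_archival(self, collection, created, now) -> bool:
        """Retention policy: true once the object exceeds its maximum age."""
        return now - created > self.policies[collection].max_age


engine = PolicyEngine({
    "/zoneA/study1": CollectionPolicy(
        readers={"alice"}, writers=set(),
        max_age=timedelta(days=365), allowed_purposes={"research"},
    ),
})
```

With this setup, `engine.authorize("alice", "/zoneA/study1", "read", "research")` returns `True`, while the same read for the purpose `"commercial"` is refused because the encoded DUA does not list it. Both attempts end up in `engine.provenance`.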
Federated identity
| Identity provider | Use |
|---|---|
| CyVerse Keycloak | Default for MESA's hosted services |
| CILogon | Cross-institution federation (InCommon) |
| Globus Auth | High-throughput data-transfer authentication |
| ORCID | Researcher identity binding |
Lakehouse–Mesh integration
The Mesh and the Lakehouse meet at the AVU (attribute-value-unit) metadata layer: every metadata operation that the Mesh exposes is mirrored into the Lakehouse catalog, so AVU history across all zones is queryable from a single point.
This integration (WBS 4.6 and 3.4) is on the Phase 1 critical path (Weeks 24–48; owners: Cao, Edgin, and Russell).
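
The mirroring idea can be sketched as an append-only history table. The example below uses Python's standard-library `sqlite3` purely as a stand-in for the Lakehouse catalog. The table name `avu_history`, its columns, the helper functions, and the zone and path names are assumptions made for illustration and do not describe the actual catalog schema.

```python
import sqlite3


def open_catalog() -> sqlite3.Connection:
    """Create an in-memory stand-in for the Lakehouse catalog."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE avu_history (
            seq       INTEGER PRIMARY KEY AUTOINCREMENT,
            zone      TEXT NOT NULL,
            path      TEXT NOT NULL,
            op        TEXT NOT NULL,      -- e.g. 'add' or 'remove'
            attribute TEXT NOT NULL,
            value     TEXT NOT NULL,
            unit      TEXT NOT NULL DEFAULT ''
        )
        """
    )
    return conn


def mirror_avu_op(conn, zone, path, op, attribute, value, unit=""):
    """Append one AVU operation; rows are never updated, so history is kept."""
    conn.execute(
        "INSERT INTO avu_history (zone, path, op, attribute, value, unit) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (zone, path, op, attribute, value, unit),
    )
    conn.commit()


def history(conn, attribute):
    """Return the cross-zone history of one attribute, oldest first."""
    return conn.execute(
        "SELECT zone, path, op, value, unit FROM avu_history "
        "WHERE attribute = ? ORDER BY seq",
        (attribute,),
    ).fetchall()


catalog = open_catalog()
mirror_avu_op(catalog, "cyverseZone", "/cyverseZone/home/a/reads.fastq",
              "add", "organism", "E. coli")
mirror_avu_op(catalog, "partnerZone", "/partnerZone/home/b/plate.tif",
              "add", "organism", "E. coli")
mirror_avu_op(catalog, "cyverseZone", "/cyverseZone/home/a/reads.fastq",
              "remove", "organism", "E. coli")
rows = history(catalog, "organism")
```

Because each operation is appended rather than applied in place, a single query over `avu_history` returns both the additions and the later removal, from both zones, in the order they occurred.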
Production deferral
The prototype demonstrates federation across two zones (CyVerse plus one partner). Production multi-zone deployment across the full set of leveraged partners, along with full-scale TACC Corral mirroring, is deferred to the follow-on Cat I/II operations proposal.