Architecture Overview
MESA is built from a small set of components that together implement its three innovation goals on top of existing NSF cyberinfrastructure.
High-level architecture

    flowchart TB
        subgraph Clients["AI clients"]
            A["Claude Desktop / Claude Code"]
            B["Cursor · Cline · Continue · Codex"]
        end
        subgraph MCP["MCP server framework (Goal 3)"]
            M1["mesa-mcp<br/>CyVerse Data Store + AVU + ontology"]
            M2["formation-mcp<br/>Discovery Environment jobs"]
            M3["terrain-mcp<br/>Terrain API"]
            M4["irods-mcp-server<br/>generic iRODS access"]
        end
        subgraph Lake["Data Lakehouse (Goal 1)"]
            D1["mesa-ducklake<br/>AVU history catalog<br/>DuckLake + Iceberg + Parquet"]
            D2["Postgres catalog"]
        end
        subgraph Mesh["iRODS Data Mesh (Goal 2)"]
            I1["iRODS zone — CyVerse 5.2 PB"]
            I2["iRODS zone — partner institution"]
            P["Policy engine<br/>(RENCI)"]
        end
        subgraph Infra["Leveraged infrastructure"]
            J["Jetstream-2 — GPUs, K8s"]
            C["CyVerse — DE, Data Store"]
            U["UA HPC (cost-recovery)"]
            O["OSN Pod / TACC Corral"]
        end
        Clients --> MCP
        MCP --> Lake
        MCP --> Mesh
        Mesh --- P
        Lake --- D2
        MCP --> Infra
        Lake --> Infra
        Mesh --> Infra

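The directed arrows in the flowchart can also be read as a small dependency graph between layers. The sketch below is only an illustration of that reading: the `EDGES` table copies the `-->` arrows from the diagram, while the helper names `reachable` and `direct_dependents` are hypothetical and not part of any MESA package.

```python
from collections import deque

# Directed arrows (A --> B) copied from the flowchart above.
# The undirected links (Mesh --- P, Lake --- D2) are omitted on purpose.
EDGES = {
    "Clients": ["MCP"],
    "MCP": ["Lake", "Mesh", "Infra"],
    "Lake": ["Infra"],
    "Mesh": ["Infra"],
    "Infra": [],
}


def reachable(start):
    """Breadth-first search: every layer reachable from `start`."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in EDGES[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def direct_dependents(target):
    """Layers that have an arrow pointing straight at `target`."""
    return sorted(src for src, dsts in EDGES.items() if target in dsts)


if __name__ == "__main__":
    print(sorted(reachable("Clients")))
    print(direct_dependents("MCP"))
```

Read this way, AI clients have a direct edge only to the MCP layer; the Lakehouse, the Data Mesh, and the leveraged infrastructure are reached through it.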
Components by goal
| Goal | Component | Lead | Repo |
|---|---|---|---|
| Goal 1 — Lakehouse | DuckLake catalog | Cao | cyverse/mesa-ducklake |
| Goal 2 — Data Mesh | Federated iRODS | Russell (RENCI) | upstream iRODS + RENCI policy engine |
| Goal 3 — Agentic AI | MCP servers | Roberts | cyverse/mesa-mcp + sibling repos |
MCP servers in this project
These four MCP servers, together with the `mesa-ducklake` client library, expose MESA's component capabilities to AI clients:
| Server | Language | Layer | Status |
|---|---|---|---|
| `mesa-mcp` | Python | CyVerse Data Store + AVU + ontology + DuckLake | Pre-alpha |
| `mesa-ducklake` | Python (library) | AVU-history Lakehouse client | Pre-alpha (functional) |
| `irods-mcp-server` | Go | Generic iRODS access | Public release |
| `formation-mcp` | Go | CyVerse Formation API (DE job orchestration) | Public release |
| `terrain-mcp` | Node.js | CyVerse Terrain API | Public release |
How requests flow
A user prompt in an AI assistant typically lands as a sequence of MCP tool calls across these components:
- Discovery: the agent calls `ds_*` tools in `mesa-mcp` to list, search, and inspect iRODS collections.
- Metadata reasoning: the agent calls `mesa_ols_*` (OBO/OLS ontology lookup) to ground tags in shared vocabularies, then `mesa_ducklake_*` to read or update the AVU history catalog.
- Action: the agent calls `formation-mcp` or `terrain-mcp` to launch an analysis app on the Discovery Environment over the selected data.
- Audit: every AVU change is recorded as a versioned snapshot in the DuckLake catalog, so the metadata history is queryable and time-travelable.
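The four steps above can be sketched as a toy model. This is a minimal illustration under stated assumptions, not the MESA implementation: `AvuHistory` is an in-memory stand-in for the AVU history catalog (it does not use DuckLake), and the concrete tool names `ds_search`, `mesa_ols_lookup`, `mesa_ducklake_set_avu`, and `launch_app` are hypothetical, since the text only gives the `ds_*`, `mesa_ols_*`, and `mesa_ducklake_*` prefixes. Placing the `mesa_ols_*` and `mesa_ducklake_*` tools in `mesa-mcp` is likewise an assumption, based on that server's listed layers.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """One versioned state of the AVUs attached to a data object."""
    version: int
    avus: dict  # attribute -> (value, unit)


@dataclass
class AvuHistory:
    """In-memory stand-in for the AVU history catalog (assumption)."""
    snapshots: list = field(default_factory=list)

    def set_avu(self, attribute, value, unit=""):
        # Copy the latest state so earlier snapshots are never mutated.
        current = dict(self.snapshots[-1].avus) if self.snapshots else {}
        current[attribute] = (value, unit)
        snap = Snapshot(version=len(self.snapshots) + 1, avus=current)
        self.snapshots.append(snap)
        return snap.version

    def as_of(self, version):
        # "Time travel": the AVUs exactly as they were at `version`.
        if not 1 <= version <= len(self.snapshots):
            raise KeyError(f"no snapshot with version {version}")
        return dict(self.snapshots[version - 1].avus)


def run_flow(history):
    """Return the ordered (server, tool) calls for one hypothetical prompt."""
    calls = []
    # 1. Discovery: find candidate collections (hypothetical tool name).
    calls.append(("mesa-mcp", "ds_search"))
    # 2. Metadata reasoning: resolve an ontology term, then tag the data.
    calls.append(("mesa-mcp", "mesa_ols_lookup"))
    calls.append(("mesa-mcp", "mesa_ducklake_set_avu"))
    # 4. Audit happens as a side effect of the AVU change: a new snapshot.
    history.set_avu("organism", "NCBITaxon:3702")
    # 3. Action: launch an analysis app (hypothetical tool name).
    calls.append(("formation-mcp", "launch_app"))
    return calls


if __name__ == "__main__":
    history = AvuHistory()
    for server, tool in run_flow(history):
        print(f"{server}: {tool}")
    history.set_avu("organism", "NCBITaxon:4577")  # a later correction
    print(history.as_of(1))  # the earlier tag is still recoverable
```

The point of the snapshot list is that an update never overwrites earlier state, so a later correction to a tag leaves the previous value recoverable by version number.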
Where to read more
- Data Lakehouse (DuckLake) — Goal 1 deep dive
- iRODS Data Mesh — Goal 2 deep dive
- Agentic AI & MCP — Goal 3 deep dive
- MCP servers — one-page-per-server reference