Architecture Overview

MESA is built from a small set of components that together implement the project's three innovation goals on top of existing NSF cyberinfrastructure.

High-level architecture

flowchart TB
    subgraph Clients["AI clients"]
      A["Claude Desktop / Claude Code"]
      B["Cursor · Cline · Continue · Codex"]
    end

    subgraph MCP["MCP server framework (Goal 3)"]
      M1["mesa-mcp<br/>CyVerse Data Store + AVU + ontology"]
      M2["formation-mcp<br/>Discovery Environment jobs"]
      M3["terrain-mcp<br/>Terrain API"]
      M4["irods-mcp-server<br/>generic iRODS access"]
    end

    subgraph Lake["Data Lakehouse (Goal 1)"]
      D1["mesa-ducklake<br/>AVU history catalog<br/>DuckLake + Iceberg + Parquet"]
      D2["Postgres catalog"]
    end

    subgraph Mesh["iRODS Data Mesh (Goal 2)"]
      I1["iRODS zone — CyVerse 5.2 PB"]
      I2["iRODS zone — partner institution"]
      P["Policy engine<br/>(RENCI)"]
    end

    subgraph Infra["Leveraged infrastructure"]
      J["Jetstream-2 — GPUs, K8s"]
      C["CyVerse — DE, Data Store"]
      U["UA HPC (cost-recovery)"]
      O["OSN Pod / TACC Corral"]
    end

    Clients --> MCP
    MCP --> Lake
    MCP --> Mesh
    Mesh --- P
    Lake --- D2
    MCP --> Infra
    Lake --> Infra
    Mesh --> Infra
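
Read as a directed graph, the arrows between subgraphs in the flowchart show which layers a request can pass through. The sketch below is illustrative only: `EDGES` and `reachable` are names invented here, and only the `-->` edges between subgraphs are modeled (the undirected `---` links to the policy engine and the Postgres catalog are left out).

```python
from collections import deque

# Directed edges between the flowchart's subgraphs (the `-->` arrows only).
EDGES = {
    "Clients": ["MCP"],
    "MCP": ["Lake", "Mesh", "Infra"],
    "Lake": ["Infra"],
    "Mesh": ["Infra"],
    "Infra": [],
}


def reachable(start: str) -> set[str]:
    """Return every layer reachable from `start` by following edges (BFS)."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in EDGES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    print(sorted(reachable("Clients")))  # ['Infra', 'Lake', 'MCP', 'Mesh']
    print(sorted(reachable("Lake")))     # ['Infra']
```

AI clients reach the Lakehouse, the Data Mesh, and the leveraged infrastructure only through the MCP layer, which is the single entry point in this reading of the diagram.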

Components by goal

| Goal | Component | Lead | Repo |
| --- | --- | --- | --- |
| Goal 1: Lakehouse | DuckLake catalog | Cao | `cyverse/mesa-ducklake` |
| Goal 2: Data Mesh | Federated iRODS | Russell (RENCI) | upstream iRODS + RENCI policy engine |
| Goal 3: Agentic AI | MCP servers | Roberts | `cyverse/mesa-mcp` + sibling repos |

MCP servers in this project

These five components expose MESA's capabilities to AI clients: four MCP servers and the `mesa-ducklake` client library.

| Server | Language | Layer | Status |
| --- | --- | --- | --- |
| `mesa-mcp` | Python | CyVerse Data Store + AVU + ontology + DuckLake | Pre-alpha |
| `mesa-ducklake` | Python (library) | AVU-history Lakehouse client | Pre-alpha (functional) |
| `irods-mcp-server` | Go | Generic iRODS access | Public release |
| `formation-mcp` | Go | CyVerse Formation API (DE job orchestration) | Public release |
| `terrain-mcp` | Node.js | CyVerse Terrain API | Public release |
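
For scripting against this inventory (for example, to decide which servers are stable enough to enable in a client), the table can be held as plain data. This is a minimal sketch: the `Component` class and the `released` and `by_language` helpers are hypothetical names, not part of any MESA package, and the rows simply restate the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    language: str
    layer: str
    status: str
    is_library: bool = False  # True for a library rather than an MCP server


# Rows copied from the table above.
COMPONENTS = [
    Component("mesa-mcp", "Python",
              "CyVerse Data Store + AVU + ontology + DuckLake", "Pre-alpha"),
    Component("mesa-ducklake", "Python",
              "AVU-history Lakehouse client", "Pre-alpha (functional)",
              is_library=True),
    Component("irods-mcp-server", "Go",
              "Generic iRODS access", "Public release"),
    Component("formation-mcp", "Go",
              "CyVerse Formation API, DE job orchestration", "Public release"),
    Component("terrain-mcp", "Node.js",
              "CyVerse Terrain API", "Public release"),
]


def released() -> list[str]:
    """Names of components whose status is exactly 'Public release'."""
    return [c.name for c in COMPONENTS if c.status == "Public release"]


def by_language() -> dict[str, list[str]]:
    """Group component names by implementation language, in table order."""
    groups: dict[str, list[str]] = {}
    for c in COMPONENTS:
        groups.setdefault(c.language, []).append(c.name)
    return groups


if __name__ == "__main__":
    print(released())
    print(by_language())
```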

How requests flow

A user prompt in an AI assistant typically lands as a sequence of MCP tool calls across these components:

1. Discovery: the agent calls `ds_*` tools in `mesa-mcp` to list, search, and inspect iRODS collections.
2. Metadata reasoning: the agent calls `mesa_ols_*` tools (OBO/OLS ontology lookup) to ground tags in shared vocabularies, then `mesa_ducklake_*` tools to read or update the AVU history catalog.
3. Action: the agent calls `formation-mcp` or `terrain-mcp` to launch an analysis app in the Discovery Environment on the selected data.
4. Audit: every AVU change is recorded as a versioned snapshot in the DuckLake catalog, so the metadata history can be queried as of any earlier snapshot (time travel).
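
The audit step can be pictured as an append-only log in which every AVU change produces a new snapshot, and a read "as of" an earlier snapshot replays only the changes up to that point. The sketch below is a standard-library toy model of that idea, not the `mesa-ducklake` API or DuckLake itself: `AvuChange`, `AvuHistory`, `set_avu`, `remove_avu`, `as_of`, and the example path are all hypothetical. As a simplification, it keeps one value per attribute, whereas iRODS allows several AVUs to share an attribute name.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class AvuChange:
    snapshot: int          # monotonically increasing snapshot id
    path: str              # iRODS data object or collection path
    attribute: str
    value: Optional[str]   # None records a removal
    unit: str = ""


@dataclass
class AvuHistory:
    """Append-only log of AVU changes; nothing is ever overwritten."""
    changes: list[AvuChange] = field(default_factory=list)

    def _record(self, path: str, attribute: str,
                value: Optional[str], unit: str = "") -> int:
        snapshot = len(self.changes) + 1
        self.changes.append(AvuChange(snapshot, path, attribute, value, unit))
        return snapshot

    def set_avu(self, path: str, attribute: str,
                value: str, unit: str = "") -> int:
        return self._record(path, attribute, value, unit)

    def remove_avu(self, path: str, attribute: str) -> int:
        return self._record(path, attribute, None)

    def as_of(self, path: str,
              snapshot: Optional[int] = None) -> dict[str, tuple[str, str]]:
        """Replay changes up to `snapshot` (default: latest) for one path."""
        state: dict[str, tuple[str, str]] = {}
        for change in self.changes:
            if snapshot is not None and change.snapshot > snapshot:
                break  # changes are stored in snapshot order
            if change.path != path:
                continue
            if change.value is None:
                state.pop(change.attribute, None)
            else:
                state[change.attribute] = (change.value, change.unit)
        return state


if __name__ == "__main__":
    path = "/exampleZone/home/demo/reads.fastq"  # hypothetical path
    history = AvuHistory()
    s1 = history.set_avu(path, "organism", "Zea mays")
    s2 = history.set_avu(path, "organism", "Zea mays subsp. mays")
    s3 = history.remove_avu(path, "organism")
    print(history.as_of(path, s1))  # state after the first change
    print(history.as_of(path))      # latest state, after the removal
```

Because the log is never rewritten, an earlier metadata state can always be reconstructed, which is the property that makes the history auditable.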

Where to read more

Data Lakehouse (DuckLake) — Goal 1 deep dive
iRODS Data Mesh — Goal 2 deep dive
Agentic AI & MCP — Goal 3 deep dive
MCP servers — one-page-per-server reference