
Architecture Overview

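MESA is built from a small set of components that together implement the project's three innovation goals (a data lakehouse, a federated iRODS data mesh, and agentic AI through MCP servers) on existing, leveraged NSF cyberinfrastructure.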

High-level architecture

```mermaid
flowchart TB
    subgraph Clients["AI clients"]
      A["Claude Desktop / Claude Code"]
      B["Cursor · Cline · Continue · Codex"]
    end

    subgraph MCP["MCP server framework (Goal 3)"]
      M1["mesa-mcp<br/>CyVerse Data Store + AVU + ontology"]
      M2["formation-mcp<br/>Discovery Environment jobs"]
      M3["terrain-mcp<br/>Terrain API"]
      M4["irods-mcp-server<br/>generic iRODS access"]
    end

    subgraph Lake["Data Lakehouse (Goal 1)"]
      D1["mesa-ducklake<br/>AVU history catalog<br/>DuckLake + Iceberg + Parquet"]
      D2["Postgres catalog"]
    end

    subgraph Mesh["iRODS Data Mesh (Goal 2)"]
      I1["iRODS zone: CyVerse 5.2 PB"]
      I2["iRODS zone: partner institution"]
      P["Policy engine<br/>(RENCI)"]
    end

    subgraph Infra["Leveraged infrastructure"]
      J["Jetstream-2: GPUs, K8s"]
      C["CyVerse: DE, Data Store"]
      U["UA HPC (cost-recovery)"]
      O["OSN Pod / TACC Corral"]
    end

    Clients --> MCP
    MCP --> Lake
    MCP --> Mesh
    Mesh --- P
    Lake --- D2
    MCP --> Infra
    Lake --> Infra
    Mesh --> Infra
```

Components by goal

| Goal | Component | Lead | Repo |
|---|---|---|---|
| Goal 1: Lakehouse | DuckLake catalog | Cao | cyverse/mesa-ducklake |
| Goal 2: Data Mesh | Federated iRODS | Russell (RENCI) | upstream iRODS + RENCI policy engine |
| Goal 3: Agentic AI | MCP servers | Roberts | cyverse/mesa-mcp + sibling repos |
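The Goal 1 catalog keeps the history of iRODS AVU (attribute-value-unit) metadata, not only its current state. A minimal sketch of that idea in plain Python, assuming a simple append-only log: each batch of AVU edits becomes a numbered snapshot, and reading "as of" an older snapshot replays the log up to that point. The `AvuChange` and `AvuHistory` names, their fields, and the example path are hypothetical and are not the mesa-ducklake schema or API; the real catalog persists its history with DuckLake, Parquet, and a Postgres catalog rather than an in-memory list.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class AvuChange:
    """One AVU edit, tagged with the snapshot it belongs to (hypothetical layout)."""
    snapshot: int           # monotonically increasing snapshot id
    path: str               # iRODS logical path of a data object or collection
    attribute: str
    value: Optional[str]    # None marks removal of the attribute
    unit: str = ""


class AvuHistory:
    """Append-only AVU log that can answer 'what did the metadata look like at snapshot N?'.

    Simplification: one value per attribute, although iRODS allows repeated attributes.
    """

    def __init__(self) -> None:
        self._log: List[AvuChange] = []
        self._snapshot = 0

    def commit(self, changes: List[Tuple[str, str, Optional[str], str]]) -> int:
        """Record a batch of (path, attribute, value, unit) edits as one new snapshot."""
        self._snapshot += 1
        for path, attribute, value, unit in changes:
            self._log.append(AvuChange(self._snapshot, path, attribute, value, unit))
        return self._snapshot

    def as_of(self, path: str, snapshot: int) -> Dict[str, Tuple[str, str]]:
        """Replay the log up to and including `snapshot`; return {attribute: (value, unit)}."""
        state: Dict[str, Tuple[str, str]] = {}
        for change in self._log:
            if change.snapshot > snapshot or change.path != path:
                continue
            if change.value is None:
                state.pop(change.attribute, None)
            else:
                state[change.attribute] = (change.value, change.unit)
        return state


history = AvuHistory()
obj = "/exampleZone/home/alice/site1.csv"  # hypothetical path
s1 = history.commit([(obj, "organism", "maize", "")])
s2 = history.commit([(obj, "organism", "Zea mays", ""), (obj, "tissue", "leaf", "")])
s3 = history.commit([(obj, "tissue", None, "")])
print(history.as_of(obj, s1))  # {'organism': ('maize', '')}
print(history.as_of(obj, s3))  # {'organism': ('Zea mays', '')}
```

Replaying the whole log on every read keeps the sketch short; a production catalog would typically index or materialize snapshots instead.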

MCP servers in this project

The table lists the four MCP servers that expose MESA's component capabilities to AI clients, plus mesa-ducklake, a Python library for the AVU-history Lakehouse:

| Server | Language | Layer | Status |
|---|---|---|---|
| mesa-mcp | Python | CyVerse Data Store + AVU + ontology + DuckLake | Pre-alpha |
| mesa-ducklake | Python (library) | AVU-history Lakehouse client | Pre-alpha (functional) |
| irods-mcp-server | Go | Generic iRODS access | Public release |
| formation-mcp | Go | CyVerse Formation API (DE job orchestration) | Public release |
| terrain-mcp | Node.js | CyVerse Terrain API | Public release |
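Whatever language each server above is written in, MCP clients talk to it with JSON-RPC 2.0 messages: `tools/list` to discover tools and `tools/call` to invoke one. The Python sketch below hand-rolls just those two methods for a single hypothetical tool, `ds_list_collection`, backed by a fake in-memory listing rather than iRODS. It omits the `initialize` handshake, the stdio or HTTP transport, and the argument validation a real server needs; in practice a server would normally build on an MCP SDK rather than code like this.

```python
import json
from typing import Any, Callable, Dict


def ds_list_collection(path: str) -> str:
    """Hypothetical ds_* tool backed by a fake in-memory listing instead of iRODS."""
    fake_store = {"/exampleZone/home/alice": ["plots", "reads"]}
    return json.dumps(fake_store.get(path, []))


# Tool registry: what tools/list advertises and what tools/call dispatches to.
TOOLS: Dict[str, Dict[str, Any]] = {
    "ds_list_collection": {
        "handler": ds_list_collection,
        "description": "List the entries of an iRODS collection (hypothetical).",
        "inputSchema": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}


def _error(rid: Any, code: int, message: str) -> str:
    return json.dumps({"jsonrpc": "2.0", "id": rid, "error": {"code": code, "message": message}})


def handle(message: str) -> str:
    """Answer one JSON-RPC 2.0 request for the MCP methods tools/list and tools/call."""
    req = json.loads(message)
    rid, method, params = req.get("id"), req.get("method"), req.get("params") or {}
    result: Dict[str, Any]
    if method == "tools/list":
        result = {"tools": [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()
        ]}
    elif method == "tools/call":
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            # -32602 is JSON-RPC's "Invalid params" code
            return _error(rid, -32602, f"Unknown tool: {params.get('name')}")
        handler: Callable[..., str] = tool["handler"]
        text = handler(**params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        # -32601 is JSON-RPC's "Method not found" code
        return _error(rid, -32601, f"Method not found: {method}")
    return json.dumps({"jsonrpc": "2.0", "id": rid, "result": result})


print(handle(json.dumps({"jsonrpc": "2.0", "id": 1, "method": "tools/call",
                         "params": {"name": "ds_list_collection",
                                    "arguments": {"path": "/exampleZone/home/alice"}}})))
```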

How requests flow

A user prompt in an AI assistant typically becomes a sequence of MCP tool calls across these components:

  1. Discovery: the agent calls ds_* tools in mesa-mcp to list, search, and inspect iRODS collections.
  2. Metadata reasoning: the agent calls mesa_ols_* tools (OBO/OLS ontology lookup) to ground tags in shared vocabularies, then mesa_ducklake_* tools to read or update the AVU history catalog.
  3. Action: the agent calls formation-mcp or terrain-mcp to launch an analysis app in the Discovery Environment (DE) on the selected data.
  4. Audit: every AVU change is recorded as a versioned snapshot in the DuckLake catalog, so the metadata history can be queried as it stood at any earlier snapshot (time travel).
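Seen from the protocol, steps 1 to 3 are a series of `tools/call` requests that the client routes to whichever server provides each tool. The sketch below prints the request sequence for one such prompt, assuming hypothetical tool names, arguments, app name, and paths; only the `ds_*`, `mesa_ols_*`, and `mesa_ducklake_*` prefixes come from this page. Step 4 appears only as a comment, since the page describes it as the catalog recording AVU changes rather than as a tool call.

```python
import itertools
import json
from typing import Any, Dict, List, Tuple

Step = Tuple[str, str, Dict[str, Any]]  # (server, tool name, arguments)

# Only the ds_*, mesa_ols_*, and mesa_ducklake_* prefixes come from this page;
# every concrete tool name, argument, app, and path below is hypothetical.
READS = "/exampleZone/home/alice/reads"
PLAN: List[Step] = [
    ("mesa-mcp", "ds_search", {"query": "maize leaf RNA-seq"}),          # 1. Discovery
    ("mesa-mcp", "mesa_ols_search", {"term": "leaf"}),                   # 2a. Ontology grounding
    ("mesa-mcp", "mesa_ducklake_set_avu",                                # 2b. AVU update
     {"path": READS, "attribute": "tissue", "value": "<term id from 2a>"}),
    ("formation-mcp", "launch_app",                                      # 3. Action on the DE
     {"app": "example-aligner", "inputs": [READS]}),
    # 4. Audit: not modeled as a call; per this page, the AVU change from 2b
    #    is recorded as a versioned snapshot in the DuckLake catalog.
]


def to_requests(plan: List[Step]) -> List[Tuple[str, Dict[str, Any]]]:
    """Turn each step into a JSON-RPC 2.0 tools/call request for its server, with unique ids."""
    ids = itertools.count(1)
    return [
        (server, {"jsonrpc": "2.0", "id": next(ids), "method": "tools/call",
                  "params": {"name": tool, "arguments": args}})
        for server, tool, args in plan
    ]


for server, request in to_requests(PLAN):
    print(f"{server:14} {json.dumps(request)}")
```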

Where to read more