
Innovation Goals

MESA retains all three reviewer-praised innovation goals at full technical depth, demonstrated at prototype scale on leveraged infrastructure.

Goal 1 — Open-Source Data Lakehouse Prototype

  • Stack: DuckLake, Apache Iceberg, Parquet
  • Capabilities: time-travel versioning, sub-second analytical queries
  • Leveraged infrastructure: Jetstream-2 VMs, CyVerse production databases, UA HPC on a cost-recovery basis
  • Lead: Co-PI L. Cao with Sci. Researcher I. Choi and GRA 2 (CS Database)
  • WBS: 4.0
  • Component: Data Lakehouse

The Lakehouse stores AVU (attribute/value/unit) metadata for iRODS collections in a queryable, versioned catalog that supports time-travel queries. AI-powered metadata generation populates the catalog automatically, replacing the weeks of manual cataloging that currently delay research.
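To make time-travel versioning concrete, here is a minimal sketch of an AVU catalog in which every write opens a new snapshot and retires the previous value instead of overwriting it, so any earlier snapshot can still be queried. It uses Python's built-in `sqlite3` purely for illustration. The `avu` table, its `begin_snapshot`/`end_snapshot` columns, the `AvuCatalog` class, and the example paths are all hypothetical; this is not DuckLake's or Iceberg's actual schema or API.

```python
import sqlite3

# Hypothetical schema, for illustration only: one row per AVU triple, tagged
# with the snapshot range in which it was current (end_snapshot NULL = current).
SCHEMA = """
CREATE TABLE avu (
    coll_path      TEXT NOT NULL,
    attribute      TEXT NOT NULL,
    value          TEXT NOT NULL,
    unit           TEXT,
    begin_snapshot INTEGER NOT NULL,
    end_snapshot   INTEGER
);
"""


class AvuCatalog:
    """Toy append-only AVU catalog with snapshot-based time travel."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.executescript(SCHEMA)
        self.snapshot = 0  # id of the latest committed snapshot

    def set_avu(self, coll_path, attribute, value, unit=None):
        """Record a new value for (coll_path, attribute) as a new snapshot.

        The previous value is retired (its end_snapshot is set), not deleted,
        so earlier snapshots remain queryable.
        """
        self.snapshot += 1
        with self.db:  # commit both statements together
            self.db.execute(
                "UPDATE avu SET end_snapshot = ? "
                "WHERE coll_path = ? AND attribute = ? AND end_snapshot IS NULL",
                (self.snapshot, coll_path, attribute),
            )
            self.db.execute(
                "INSERT INTO avu VALUES (?, ?, ?, ?, ?, NULL)",
                (coll_path, attribute, value, unit, self.snapshot),
            )
        return self.snapshot

    def find(self, attribute, value, as_of=None):
        """Collections whose attribute equalled value at snapshot as_of (default: latest)."""
        snap = self.snapshot if as_of is None else as_of
        rows = self.db.execute(
            "SELECT coll_path FROM avu "
            "WHERE attribute = ? AND value = ? AND begin_snapshot <= ? "
            "AND (end_snapshot IS NULL OR end_snapshot > ?) "
            "ORDER BY coll_path",
            (attribute, value, snap, snap),
        )
        return [row[0] for row in rows]


if __name__ == "__main__":
    cat = AvuCatalog()
    s1 = cat.set_avu("/exampleZone/home/lab/plots", "species", "Zea mays")
    cat.set_avu("/exampleZone/home/lab/plots", "species", "Sorghum bicolor")
    print(cat.find("species", "Zea mays"))            # []
    print(cat.find("species", "Zea mays", as_of=s1))  # ['/exampleZone/home/lab/plots']
```

Retiring rows rather than updating them in place is what makes "as of" queries cheap in this toy; a production lakehouse applies the same idea to whole-table snapshots over Parquet files rather than to individual rows.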

Goal 2 — Federated iRODS Data Mesh Prototype

  • Stack: iRODS policy-engine prototype, CILogon / Globus Auth / ORCID federation, Lakehouse–Mesh interoperability
  • Lead: Co-PI T. Russell (RENCI) with RSEs Draughn, King, James and GRA 3 (ECE Security / Federated AuthZ)
  • WBS: 3.0
  • Component: iRODS Data Mesh

The Data Mesh federates iRODS zones across institutions so that an authorized user can query data wherever it lives, without first copying it into a central repository. The policy engine encodes data-governance rules — provenance, access, retention — as machine-enforceable policy.
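A minimal sketch of both ideas in the paragraph above, under a deliberately simplified model: governance rules are an ordinary function returning an allow/deny decision with a reason, and a federated read sends the same question to every zone and collects only the paths each zone's policy permits. The `Identity` and `DataObject` types, the three rules, and the zone and path names are hypothetical; this is not the iRODS rule language or its policy-engine API, and in-memory lists stand in for remote zones.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Identity:
    """A federated user; `subject` stands in for an identity asserted at login."""
    subject: str
    groups: frozenset


@dataclass
class DataObject:
    """Hypothetical governance metadata for one object in one iRODS zone."""
    zone: str
    path: str
    allowed_groups: frozenset
    retain_until: date


def check(action, who, obj, today, provenance_note=None):
    """Evaluate three illustrative rules: access, retention, provenance.

    Returns (allowed, reason) so a denial can be logged and explained.
    """
    if not who.groups & obj.allowed_groups:
        return False, "access: no shared group"
    if action == "delete" and today < obj.retain_until:
        return False, "retention: object is under retention"
    if action == "write" and not provenance_note:
        return False, "provenance: writes must record their provenance"
    return True, "ok"


def federated_find(zones, who, predicate, today):
    """Ask every zone the same question. Each zone applies policy locally and
    returns only matching paths, so no data is copied to a central store."""
    hits = []
    for objects in zones.values():
        for obj in objects:
            allowed, _ = check("read", who, obj, today)
            if allowed and predicate(obj):
                hits.append(f"{obj.zone}:{obj.path}")
    return sorted(hits)
```

Returning a reason with every decision is a deliberate choice in this sketch: once rules span several institutions, a denial that can be logged and explained is far easier to debug than a bare refusal.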

Goal 3 — Agentic AI Orchestration Prototype

  • Stack: vLLM model serving on Jetstream-2 GPUs, retrieval-augmented generation (RAG) pipelines, Model Context Protocol (MCP) server framework, sandbox security, multi-agent orchestration patterns
  • Lead: S. Roberts (Lead RSE) with Co-PI D. Ebert, Sci. Researcher (LLM/RAG/MCP, TBD), and GRA 1 (CS Agentic AI / MCP)
  • WBS: 2.0
  • Component: Agentic AI & MCP

MCP servers expose data-mesh and lakehouse operations as agent-callable tools. The agentic-orchestration layer manages multi-step scientific workflows end to end, from data discovery through analysis launch on the CyVerse Discovery Environment.
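The sketch below shows, in plain Python, the shape of this arrangement: tools advertised with a name, a description, and a JSON Schema for their inputs; invoked through JSON-RPC 2.0 requests; and chained by a simple orchestrator. It is modeled loosely on MCP's `tools/list` and `tools/call` methods but is not the official MCP SDK. It omits the initialization handshake, the transport, capability negotiation, and all error handling beyond an unknown-method reply. The tool names (`lakehouse_query`, `launch_analysis`), their canned results, and `run_workflow` are hypothetical.

```python
import json


# Hypothetical tool implementations with canned data; real tools would call
# the Lakehouse and the Discovery Environment instead.
def lakehouse_query(attribute, value):
    catalog = {("species", "Zea mays"): ["/exampleZone/home/lab/maize"]}
    return catalog.get((attribute, value), [])


def launch_analysis(app, inputs):
    return {"job_id": "job-0001", "app": app, "inputs": inputs}


# name -> (function, description, JSON Schema for the arguments)
TOOLS = {
    "lakehouse_query": (
        lakehouse_query,
        "Find collections whose AVU metadata has attribute=value.",
        {"type": "object",
         "properties": {"attribute": {"type": "string"}, "value": {"type": "string"}},
         "required": ["attribute", "value"]},
    ),
    "launch_analysis": (
        launch_analysis,
        "Launch an analysis app on a list of input paths.",
        {"type": "object",
         "properties": {"app": {"type": "string"},
                        "inputs": {"type": "array", "items": {"type": "string"}}},
         "required": ["app", "inputs"]},
    ),
}


def handle(raw):
    """Server side: answer one JSON-RPC 2.0 request for tools/list or tools/call."""
    req = json.loads(raw)
    if req["method"] == "tools/list":
        result = {"tools": [{"name": name, "description": desc, "inputSchema": schema}
                            for name, (_, desc, schema) in TOOLS.items()]}
    elif req["method"] == "tools/call":
        fn = TOOLS[req["params"]["name"]][0]
        output = fn(**req["params"]["arguments"])
        result = {"content": [{"type": "text", "text": json.dumps(output)}]}
    else:
        return json.dumps({"jsonrpc": "2.0", "id": req["id"],
                           "error": {"code": -32601, "message": "Method not found"}})
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})


def call_tool(name, arguments, req_id):
    """Client side: send tools/call and decode the JSON text the tool returned."""
    request = {"jsonrpc": "2.0", "id": req_id, "method": "tools/call",
               "params": {"name": name, "arguments": arguments}}
    reply = json.loads(handle(json.dumps(request)))
    return json.loads(reply["result"]["content"][0]["text"])


def run_workflow(attribute, value, app):
    """Fixed two-step plan (discover, then launch). An LLM agent would instead
    choose these calls itself from the tools/list descriptions."""
    paths = call_tool("lakehouse_query", {"attribute": attribute, "value": value}, 1)
    if not paths:
        return None  # nothing to analyze; stop rather than launch an empty job
    return call_tool("launch_analysis", {"app": app, "inputs": paths}, 2)
```

Keeping the tool descriptions and input schemas next to the functions means the same registry serves both the listing an agent reads and the dispatch that executes its choice.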

Cross-cutting: science cases & community

Phase 2 matures the prototype by applying it to three science cases and onboards an early-adopter community across NSF synthesis centers and AI institutes. See Use Cases and Early-Adopter Program.

Success criteria

Per PEP §9 (Performance Measurement and Reporting):

| Metric | Target |
|---|---|
| Prototype availability | ≥ 95% during business hours |
| Lakehouse query latency | Sub-second on benchmark queries |
| MCP / RAG median latency | < 5 s on Jetstream-2 GPUs |
| Publications acknowledging the prototype | ≥ 3 |
| Letters of support from early adopters | ≥ 8 |
| Follow-on operations proposal | Submission-ready by Month 24 |
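The two quantitative service targets can be checked mechanically from monitoring data. This sketch assumes availability is measured as the fraction of periodic uptime probes that succeed during business hours, taken here to mean Monday to Friday, 08:00 to 17:00; both the probe-based method and that window are assumptions, not definitions from PEP §9. Latency is checked against the median, as the target specifies. The function names are hypothetical.

```python
from datetime import datetime
from statistics import median

# Assumed business-hours window (not defined in the source): Mon-Fri, 08:00-17:00.
BUSINESS_DAYS = range(0, 5)           # datetime.weekday(): Monday=0 ... Friday=4
BUSINESS_START, BUSINESS_END = 8, 17  # hours; END is exclusive


def business_hours_availability(probes):
    """Percent of uptime probes that succeeded inside the business-hours window.

    probes: iterable of (datetime, is_up) pairs. Returns None if no probe
    falls inside the window.
    """
    in_window = [is_up for ts, is_up in probes
                 if ts.weekday() in BUSINESS_DAYS
                 and BUSINESS_START <= ts.hour < BUSINESS_END]
    if not in_window:
        return None
    return 100.0 * sum(in_window) / len(in_window)


def check_service_targets(probes, latencies_s):
    """Compare measurements with the >= 95% availability and < 5 s median targets.

    latencies_s must be non-empty (statistics.median raises on empty input).
    """
    availability = business_hours_availability(probes)
    median_latency = median(latencies_s)
    return {
        "availability_pct": availability,
        "availability_ok": availability is not None and availability >= 95.0,
        "median_latency_s": median_latency,
        "latency_ok": median_latency < 5.0,
    }


if __name__ == "__main__":
    monday = datetime(2024, 1, 1)  # 1 January 2024 was a Monday
    probes = [(monday.replace(hour=9), True)] * 19 + [(monday.replace(hour=10), False)]
    probes.append((monday.replace(hour=20), False))  # outside the window: ignored
    print(check_service_targets(probes, [1.2, 3.4, 6.0]))
```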