Innovation Goals

MESA retains all three innovation goals that reviewers praised, at full technical depth, and demonstrates each at prototype scale on leveraged (existing) infrastructure.

Goal 1 — Open-Source Data Lakehouse Prototype

Stack: DuckLake, Apache Iceberg, Parquet
Capabilities: time-travel versioning, sub-second analytical queries
Leveraged compute: Jetstream-2 VMs, CyVerse production databases, UA HPC cost-recovery
Lead: Co-PI L. Cao with Sci. Researcher I. Choi and GRA 2 (CS Database)
WBS: 4.0
Component: Data Lakehouse

The Lakehouse stores AVU (attribute/value/unit) metadata for iRODS collections in a queryable, time-travelable catalog. AI-powered metadata generation populates the catalog automatically, eliminating weeks of manual cataloging that currently delay research.
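The following is a minimal in-memory sketch that makes "time-travelable" concrete. It assumes a simple append-only snapshot model, loosely analogous to the table snapshots that Apache Iceberg and DuckLake maintain. The `AVUCatalog` class, its `commit` and `query` methods, and the example collection path are hypothetical illustrations, not DuckLake or iRODS APIs. A real deployment would persist snapshots as Parquet-backed table versions rather than as Python dicts.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# One AVU triple: (attribute, value, unit). iRODS allows an empty unit.
AVU = Tuple[str, str, str]


@dataclass
class AVUCatalog:
    """Hypothetical append-only AVU catalog with snapshot-based time travel.

    Every commit creates a new snapshot id; earlier snapshots stay queryable.
    """
    # snapshots[i] maps a collection path to its AVU list as of snapshot i.
    # Snapshot 0 is the empty catalog.
    snapshots: List[Dict[str, List[AVU]]] = field(default_factory=lambda: [{}])

    @property
    def current(self) -> int:
        return len(self.snapshots) - 1

    def commit(self, path: str, avus: List[AVU]) -> int:
        """Replace one collection's AVUs and return the new snapshot id."""
        # Copy the latest state so that earlier snapshots are never mutated.
        state = {p: list(v) for p, v in self.snapshots[-1].items()}
        state[path] = list(avus)
        self.snapshots.append(state)
        return self.current

    def query(self, attribute: str, value: str,
              snapshot: Optional[int] = None) -> List[str]:
        """Return paths carrying (attribute, value) as of `snapshot` (default: latest)."""
        state = self.snapshots[self.current if snapshot is None else snapshot]
        return sorted(
            path for path, avus in state.items()
            if any(a == attribute and v == value for a, v, _unit in avus)
        )


if __name__ == "__main__":
    catalog = AVUCatalog()
    run = "/exampleZone/home/lab/run1"  # hypothetical collection path
    first = catalog.commit(run, [("instrument", "MiSeq", ""), ("read_length", "150", "bp")])
    catalog.commit(run, [("instrument", "NovaSeq", "")])  # metadata corrected later
    print(catalog.query("instrument", "MiSeq"))                  # []
    print(catalog.query("instrument", "MiSeq", snapshot=first))  # ['/exampleZone/home/lab/run1']
```

The latest snapshot reflects the correction, while the earlier snapshot still answers the original query. This is the behavior that lets analyses be reproduced against the catalog as it stood when they ran.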

Goal 2 — Federated iRODS Data Mesh Prototype

Stack: iRODS policy-engine prototype, CILogon / Globus Auth / ORCID federation, Lakehouse–Mesh interoperability
Lead: Co-PI T. Russell (RENCI) with RSEs Draughn, King, James and GRA 3 (ECE Security / Federated AuthZ)
WBS: 3.0
Component: iRODS Data Mesh

The Data Mesh federates iRODS zones across institutions so that an authorized user can query data wherever it lives, without first copying it into a central repository. The policy engine encodes data-governance rules — provenance, access, retention — as machine-enforceable policy.
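Machine-enforceable governance rules can be pictured as small predicates that are evaluated for each request. The sketch below assumes three illustrative rules: group-based access, a provenance requirement for reads, and a minimum retention period before deletion. It also assumes a zone-by-zone query that filters results in place. The code is plain Python, not the iRODS rule language or any iRODS rule-engine plugin API. All class names, fields, and rule semantics are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class DataObject:
    path: str
    created: date
    has_provenance: bool          # hypothetical flag: a provenance record exists
    allowed_groups: FrozenSet[str]
    retention_days: int           # minimum number of days the object must be kept


@dataclass(frozen=True)
class Request:
    user: str
    groups: FrozenSet[str]
    action: str                   # "read" or "delete" in this sketch
    today: date


def evaluate(obj: DataObject, req: Request) -> Tuple[bool, List[str]]:
    """Apply access, provenance, and retention rules; return (allowed, reasons)."""
    reasons: List[str] = []
    # Access: the requester must share at least one group with the object.
    if not (req.groups & obj.allowed_groups):
        reasons.append("access: user is not in an allowed group")
    # Provenance (illustrative rule): objects lacking a provenance record are not served.
    if req.action == "read" and not obj.has_provenance:
        reasons.append("provenance: no provenance record")
    # Retention: deletion is refused until the retention period has elapsed.
    if req.action == "delete" and req.today < obj.created + timedelta(days=obj.retention_days):
        reasons.append("retention: retention period has not elapsed")
    return (not reasons, reasons)


def federated_query(zones: Dict[str, List[DataObject]], req: Request) -> List[str]:
    """Evaluate policy inside each zone and return only permitted paths; nothing is copied."""
    return [
        f"{zone}:{obj.path}"
        for zone, objects in sorted(zones.items())
        for obj in objects
        if evaluate(obj, req)[0]
    ]


if __name__ == "__main__":
    scan = DataObject("/zoneA/home/lab/scan.h5", date(2024, 1, 1), True, frozenset({"lab"}), 365)
    print(evaluate(scan, Request("alice", frozenset({"lab"}), "delete", date(2024, 6, 1))))
```

Returning the list of violated rules, rather than a bare yes or no, keeps each decision auditable. This matters when the same policy is enforced across institutions.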

Goal 3 — Agentic AI Orchestration Prototype

Stack: vLLM model serving on Jetstream-2 GPUs, retrieval-augmented generation (RAG) pipelines, Model Context Protocol (MCP) server framework, sandbox security, multi-agent orchestration patterns
Lead: S. Roberts (Lead RSE) with Co-PI D. Ebert, Sci. Researcher (LLM/RAG/MCP, TBD), and GRA 1 (CS Agentic AI / MCP)
WBS: 2.0
Component: Agentic AI & MCP

MCP servers expose data-mesh and lakehouse operations as agent-callable tools. The agentic-orchestration layer manages multi-step scientific workflows end-to-end — from data discovery through analysis launch on the Discovery Environment.
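The tool-exposure pattern can be sketched with the standard library alone. The request shapes are modeled loosely on MCP's JSON-RPC 2.0 `tools/list` and `tools/call` methods and on its `inputSchema` tool descriptors. However, this code does not use an MCP SDK, and its result envelope is simplified. The three tools (`mesh_locate`, `lakehouse_query`, `launch_analysis`) are hypothetical stubs. The plan in `run_workflow` is hard-coded; a real agent would let the model choose each next call.

```python
import json
from typing import Any, Callable, Dict, List, Tuple

# Registry of agent-callable tools: name -> {"descriptor": ..., "handler": ...}.
TOOLS: Dict[str, Dict[str, Any]] = {}


def tool(name: str, description: str, required: List[str]):
    """Register a handler together with a minimal JSON-Schema-style input descriptor."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = {
            "descriptor": {
                "name": name,
                "description": description,
                "inputSchema": {
                    "type": "object",
                    "properties": {arg: {"type": "string"} for arg in required},
                    "required": required,
                },
            },
            "handler": fn,
        }
        return fn
    return register


@tool("mesh_locate", "Locate a dataset across federated zones (stub)", ["dataset"])
def mesh_locate(dataset: str) -> List[str]:
    return [f"/exampleZone/home/shared/{dataset}"]


@tool("lakehouse_query", "Fetch catalog metadata for a path (stub)", ["path"])
def lakehouse_query(path: str) -> Dict[str, str]:
    return {"path": path, "format": "parquet"}


@tool("launch_analysis", "Submit an analysis job (stub)", ["app", "input_path"])
def launch_analysis(app: str, input_path: str) -> str:
    return f"submitted:{app}:{input_path}"


def handle(raw: str) -> str:
    """Serve one JSON-RPC-style request for tools/list or tools/call."""
    req = json.loads(raw)
    if req["method"] == "tools/list":
        result: Any = [entry["descriptor"] for entry in TOOLS.values()]
    elif req["method"] == "tools/call":
        name, args = req["params"]["name"], req["params"]["arguments"]
        schema = TOOLS[name]["descriptor"]["inputSchema"]
        missing = [k for k in schema["required"] if k not in args]
        if missing:
            raise ValueError(f"{name}: missing arguments {missing}")
        result = TOOLS[name]["handler"](**args)
    else:
        raise ValueError(f"unsupported method {req['method']}")
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})


def run_workflow(dataset: str) -> List[Tuple[str, Any]]:
    """Hard-coded plan: discover data, inspect its metadata, then launch an analysis."""
    trace: List[Tuple[str, Any]] = []

    def call(req_id: int, name: str, **arguments: str) -> Any:
        msg = {"jsonrpc": "2.0", "id": req_id, "method": "tools/call",
               "params": {"name": name, "arguments": arguments}}
        result = json.loads(handle(json.dumps(msg)))["result"]
        trace.append((name, result))
        return result

    paths = call(1, "mesh_locate", dataset=dataset)
    meta = call(2, "lakehouse_query", path=paths[0])
    call(3, "launch_analysis", app="qc-pipeline", input_path=meta["path"])
    return trace


if __name__ == "__main__":
    for step, output in run_workflow("soil-16S"):
        print(step, output)
```

Each step's output feeds the next step's arguments. This chaining is what the orchestration layer automates. The schema check in `handle` rejects malformed calls before any tool runs.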

Cross-cutting: science cases & community

Phase 2 matures the prototype by applying it to three science cases and onboards an early-adopter community across NSF synthesis centers and AI institutes. See Use Cases and Early-Adopter Program.

Success criteria

Per PEP §9 (Performance Measurement and Reporting):

Metric	Target
Prototype availability	≥ 95% of business hours
Lakehouse query latency	Sub-second on benchmark queries
MCP / RAG median latency	< 5 s on Jetstream-2 GPUs
Publications acknowledging the prototype	≥ 3
Letters of support from early adopters	≥ 8
Follow-on operations proposal	Submission-ready by Month 24
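Several targets in the table reduce to simple arithmetic. For illustration only, assume that "business hours" means 8 hours on each of 22 weekdays in a month. The availability target then allows 0.05 × 22 × 8 × 60 = 528 minutes of business-hour downtime per month. The sketch below checks the three quantitative service targets. It reads "sub-second" as every benchmark query finishing in under one second, which is an interpretation. The sample numbers are made up for demonstration and are not measured results.

```python
from statistics import median
from typing import Dict, List

# Targets taken from the success-criteria table (fractions and seconds).
AVAILABILITY_MIN = 0.95
LAKEHOUSE_MAX_S = 1.0       # "sub-second", read as a per-query upper bound (assumption)
MCP_RAG_MEDIAN_MAX_S = 5.0


def meets_targets(up_minutes: int, scheduled_minutes: int,
                  lakehouse_latencies_s: List[float],
                  mcp_rag_latencies_s: List[float]) -> Dict[str, bool]:
    """Return pass/fail for each quantitative service target."""
    return {
        # Availability counts only scheduled business-hour minutes.
        "availability": up_minutes / scheduled_minutes >= AVAILABILITY_MIN,
        "lakehouse_latency": max(lakehouse_latencies_s) < LAKEHOUSE_MAX_S,
        # The MCP / RAG target is on the median, so a few slow calls can still pass.
        "mcp_rag_median": median(mcp_rag_latencies_s) < MCP_RAG_MEDIAN_MAX_S,
    }


if __name__ == "__main__":
    scheduled = 22 * 8 * 60  # 10,560 business-hour minutes in the assumed month
    print(meets_targets(10_100, scheduled, [0.42, 0.87, 0.95], [3.1, 4.8, 6.2]))
    # {'availability': True, 'lakehouse_latency': True, 'mcp_rag_median': True}
```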