Innovation Goals
MESA retains all three reviewer-praised innovation goals at full technical depth, demonstrated at prototype scale on leveraged infrastructure.
Goal 1 — Open-Source Data Lakehouse Prototype
- Stack: DuckLake, Apache Iceberg, Parquet
- Capabilities: time-travel versioning, sub-second analytical queries
- Leveraged compute: Jetstream-2 VMs, CyVerse production databases, UA HPC cost-recovery
- Lead: Co-PI L. Cao with Sci. Researcher I. Choi and GRA 2 (CS Database)
- WBS: 4.0
- Component: Data Lakehouse
The Lakehouse stores AVU (attribute/value/unit) metadata for iRODS collections in a queryable, time-travelable catalog. AI-powered metadata generation populates the catalog automatically, eliminating weeks of manual cataloging that currently delay research.
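As a rough illustration of what a "time-travelable" AVU catalog means, the Python sketch below keeps an append-only log of AVU writes and answers queries as of an earlier snapshot. It is a toy in-memory model, not the DuckLake or Iceberg API. The names `AVU` and `AVUCatalog`, the integer snapshot counter, and any paths passed in are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class AVU:
    """One attribute/value/unit triple (hypothetical model, not an iRODS class)."""
    attribute: str
    value: str
    unit: str = ""


@dataclass
class AVUCatalog:
    """Append-only catalog: every write produces a new snapshot id."""
    _log: List[Tuple[int, str, AVU]] = field(default_factory=list)
    _snapshot: int = 0

    def set(self, path: str, avu: AVU) -> int:
        """Record an AVU for a collection path and return the new snapshot id."""
        self._snapshot += 1
        self._log.append((self._snapshot, path, avu))
        return self._snapshot

    def query(self, path: str, as_of: Optional[int] = None) -> Dict[str, AVU]:
        """Return attribute -> AVU for `path` as it looked at snapshot `as_of`.

        With as_of=None the latest state is returned.
        """
        limit = self._snapshot if as_of is None else as_of
        state: Dict[str, AVU] = {}
        for snap, logged_path, avu in self._log:
            if snap > limit:
                break  # the log is ordered by snapshot id, so we can stop here
            if logged_path == path:
                state[avu.attribute] = avu  # later writes override earlier ones
        return state
```

Calling `query(path, as_of=s)` with a snapshot id returned by an earlier `set` call reproduces the metadata as it stood at that point, which is the behavior the prototype would get from the table format rather than from hand-written code like this.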
Goal 2 — Federated iRODS Data Mesh Prototype
- Stack: iRODS policy-engine prototype, CILogon / Globus Auth / ORCID federation, Lakehouse–Mesh interoperability
- Lead: Co-PI T. Russell (RENCI) with RSEs Draughn, King, James and GRA 3 (ECE Security / Federated AuthZ)
- WBS: 3.0
- Component: iRODS Data Mesh
The Data Mesh federates iRODS zones across institutions so that an authorized user can query data wherever it lives, without first copying it into a central repository. The policy engine encodes data-governance rules — provenance, access, retention — as machine-enforceable policy.
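To make "machine-enforceable policy" concrete, here is a minimal Python sketch of access and retention rules expressed as data plus small check functions. It does not use the iRODS rule engine or any iRODS API. The `Policy` fields, function names, and group names are assumptions made for illustration, and provenance rules are left out for brevity.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Policy:
    """Hypothetical governance rule for one collection subtree."""
    collection_prefix: str          # subtree the rule applies to
    allowed_groups: FrozenSet[str]  # groups permitted to read
    retention_days: int             # how long objects must be kept


def is_read_allowed(policy: Policy, path: str, user_groups: Iterable[str]) -> bool:
    """Allow a read only inside the governed subtree and only for a permitted group."""
    in_scope = path.startswith(policy.collection_prefix)
    has_group = bool(policy.allowed_groups & set(user_groups))
    return in_scope and has_group


def is_past_retention(policy: Policy, created: date, today: date) -> bool:
    """True once the retention window for an object has elapsed."""
    return today > created + timedelta(days=policy.retention_days)
```

Keeping the rule as plain data is the design point: the same record can be evaluated identically in every federated zone, whatever engine ends up enforcing it.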
Goal 3 — Agentic AI Orchestration Prototype
- Stack: vLLM model serving on Jetstream-2 GPUs, retrieval-augmented generation (RAG) pipelines, Model Context Protocol (MCP) server framework, sandbox security, multi-agent orchestration patterns
- Lead: S. Roberts (Lead RSE) with Co-PI D. Ebert, Sci. Researcher (LLM/RAG/MCP, TBD), and GRA 1 (CS Agentic AI / MCP)
- WBS: 2.0
- Component: Agentic AI & MCP
MCP servers expose data-mesh and lakehouse operations as agent-callable tools. The agentic-orchestration layer manages multi-step scientific workflows end-to-end — from data discovery through analysis launch on the Discovery Environment.
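The idea of agent-callable tools can be sketched in plain Python as a registry of named functions plus a dispatcher that takes a request with a tool name and arguments. This is not the MCP SDK or wire protocol. The tool names `search_catalog` and `launch_analysis`, the in-memory catalog, and the returned job record are all hypothetical stand-ins for lakehouse and Discovery Environment operations.

```python
from typing import Any, Callable, Dict, List

# Registry mapping tool names to plain Python callables (illustrative only).
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that registers a function under a tool name."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = fn
        return fn
    return register


@tool("search_catalog")
def search_catalog(keyword: str) -> List[str]:
    """Stand-in for a lakehouse metadata query; the catalog here is made up."""
    catalog = {
        "/zone/a/maize.csv": "maize yield",
        "/zone/b/soil.csv": "soil moisture",
    }
    return sorted(path for path, desc in catalog.items() if keyword in desc)


@tool("launch_analysis")
def launch_analysis(app: str, inputs: List[str]) -> Dict[str, Any]:
    """Stand-in for submitting an analysis; nothing is actually launched."""
    return {"app": app, "inputs": inputs, "status": "submitted"}


def call_tool(request: Dict[str, Any]) -> Any:
    """Dispatch a request of the form {"name": ..., "arguments": {...}}."""
    return TOOLS[request["name"]](**request["arguments"])


def run_workflow(keyword: str, app: str) -> Dict[str, Any]:
    """Two-step workflow: discover data, then launch an analysis on the results."""
    paths = call_tool({"name": "search_catalog", "arguments": {"keyword": keyword}})
    return call_tool({"name": "launch_analysis",
                      "arguments": {"app": app, "inputs": paths}})
```

In a real deployment the orchestration layer, not a hard-coded `run_workflow`, would choose which tool to call next. The sketch only shows the shape of the hand-off from discovery to launch.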
Cross-cutting: science cases & community
Phase 2 matures the prototype into three science cases and onboards an early-adopter community across NSF synthesis centers and AI institutes. See Use Cases and Early-Adopter Program.
Success criteria
Per PEP §9 (Performance Measurement and Reporting):
| Metric | Target |
|---|---|
| Prototype availability | ≥ 95% of business hours |
| Lakehouse query latency | Sub-second on benchmark |
| MCP / RAG median latency | < 5 s on Jetstream-2 GPUs |
| Publications acknowledging the prototype | ≥ 3 |
| Letters of support from early adopters | ≥ 8 |
| Follow-on operations proposal | Submission-ready by Month 24 |
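
The two quantitative operational targets above (availability and median latency) reduce to simple arithmetic over monitoring data. The sketch below shows one way such a check might look. The function name, the input shapes, and the choice to measure availability in minutes are assumptions, not part of the PEP.

```python
from statistics import median
from typing import Dict, Sequence

AVAILABILITY_TARGET = 0.95   # fraction of business hours, from the table above
MEDIAN_LATENCY_TARGET_S = 5  # seconds, MCP / RAG target from the table above


def check_targets(latencies_s: Sequence[float],
                  up_minutes: int,
                  business_minutes: int) -> Dict[str, bool]:
    """Compare observed measurements against the two numeric targets."""
    availability = up_minutes / business_minutes
    return {
        "availability_ok": availability >= AVAILABILITY_TARGET,
        "median_latency_ok": median(latencies_s) < MEDIAN_LATENCY_TARGET_S,
    }
```

Using the median rather than the mean means a single slow request does not by itself fail the latency target.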