UC1: Agentic Discovery + Multi-Omics
Lead: S. Roberts, G. Chism, T. Swetnam · WBS: 5.0 · DOI deliverable: Month 15
Story
A bioinformatician is investigating the molecular basis of a metabolic disorder. They have access to:
- Genomic sequence data (NCEMS-affiliated cohort)
- Transcriptomic data from a partner institution
- Proteomic and metabolomic data from public repositories indexed in CyVerse
Traditionally, integrating these four omics layers takes weeks of manual data wrangling. With MESA, the researcher poses a natural-language question to an AI assistant, and the assistant orchestrates discovery, metadata reasoning, federated queries, and analysis launches across all four layers, wherever the data is hosted.
What the prototype demonstrates
- AI-powered metadata generation automatically populates the Lakehouse with ontology-grounded AVUs (attribute-value-unit triples carrying GO terms, ChEBI compounds, and ENVO contexts).
- Cross-domain integration through the federated Data Mesh joins datasets that previously could not be queried together.
- Agentic orchestration plans a multi-step analysis (QC → alignment → peak calling → differential expression → pathway enrichment) and launches each step as a Discovery Environment analysis.
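To make the first two points concrete, here is a minimal Python sketch of how ontology-grounded AVUs might let datasets from different sources be matched on a shared term. It is an illustration under stated assumptions, not the MESA implementation: the catalog layout, the dataset paths, the attribute names, and the term identifiers are all hypothetical placeholders, and using the unit field to record the source ontology is just one possible convention.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AVU:
    """An iRODS-style attribute-value-unit triple."""
    attribute: str
    value: str
    unit: str = ""  # assumption: the unit slot names the source ontology


# Hypothetical catalog: dataset path -> AVUs attached to that dataset.
# Paths and term identifiers are placeholders, not real records.
CATALOG = {
    "/example/genomics/cohort.vcf": [
        AVU("go_term", "GO:PLACEHOLDER_1", "GO"),
    ],
    "/example/transcriptomics/counts.tsv": [
        AVU("go_term", "GO:PLACEHOLDER_1", "GO"),
        AVU("envo_context", "ENVO:PLACEHOLDER_1", "ENVO"),
    ],
    "/example/metabolomics/peaks.csv": [
        AVU("chebi_compound", "CHEBI:PLACEHOLDER_1", "ChEBI"),
    ],
}


def group_by_term(catalog):
    """Map each (attribute, value) pair to the sorted dataset paths carrying it."""
    index = defaultdict(list)
    for path, avus in catalog.items():
        for avu in avus:
            index[(avu.attribute, avu.value)].append(path)
    return {key: sorted(paths) for key, paths in index.items()}


def joinable(catalog):
    """Keep only the terms shared by two or more datasets."""
    return {key: paths for key, paths in group_by_term(catalog).items()
            if len(paths) > 1}


# List the terms on which datasets from different sources could be joined.
for (attribute, value), paths in joinable(CATALOG).items():
    print(attribute, value, paths)
```

In this toy catalog only the shared GO placeholder term links two datasets, which is the kind of join key a federated query could use once metadata has been generated consistently across sources.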
Architecture
flowchart LR
A[Researcher prompt] --> Cl[MCP client]
Cl --> Me[mesa-mcp]
Me --> Mu[mesa-ducklake]
Me --> DM[iRODS Data Mesh]
Cl --> Fo[formation-mcp]
Fo --> DE[Discovery Environment apps]
DE --> RES[Results + DOI]
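The routing implied by the diagram can be sketched in Python as follows. This is a hedged illustration only: the server names come from the diagram and the analysis step names from the prototype description, but the tool names (`search_metadata`, `query_data_mesh`, `launch_analysis`), the two discovery steps, and the callable-based client interface are hypothetical and do not describe the real mesa-mcp or formation-mcp APIs.

```python
from dataclasses import dataclass

# Server names as they appear in the diagram.
MESA = "mesa-mcp"
FORMATION = "formation-mcp"

# Analysis steps listed in the prototype description.
ANALYSIS_STEPS = [
    "QC",
    "alignment",
    "peak calling",
    "differential expression",
    "pathway enrichment",
]


@dataclass(frozen=True)
class Step:
    name: str
    server: str
    tool: str  # hypothetical tool name, for illustration only


def build_plan(analysis_steps):
    """Discovery and query steps go to mesa-mcp; each analysis step goes to formation-mcp."""
    plan = [
        Step("discover datasets", MESA, "search_metadata"),
        Step("federated query", MESA, "query_data_mesh"),
    ]
    plan += [Step(name, FORMATION, "launch_analysis") for name in analysis_steps]
    return plan


def dispatch(plan, clients):
    """Run the steps in order; each client is a callable(tool, step_name) -> result."""
    return [clients[step.server](step.tool, step.name) for step in plan]


def make_stub(server):
    """Stand-in for a real MCP client; it only echoes what it was asked to do."""
    def call(tool, step_name):
        return f"{server}:{tool}:{step_name}"
    return call


results = dispatch(
    build_plan(ANALYSIS_STEPS),
    {MESA: make_stub(MESA), FORMATION: make_stub(FORMATION)},
)
for line in results:
    print(line)
```

Keeping the plan as plain data, separate from dispatch, means the same plan could be inspected or logged before anything is launched, which fits the reproducibility goal of the deliverables.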
Deliverables (Month 15)
- Reproducible Jupyter notebook on the Discovery Environment.
- Open-source helper library on GitHub.
- DOI issued through CyVerse's DataCite service.
- Tutorial integrated with the Educator Fellows program.
Status
Draft — content matures through Phase 2.