UC1 — Agentic Discovery + Multi-Omics

Lead: S. Roberts, G. Chism, T. Swetnam  ·  WBS: 5.0  ·  DOI deliverable: Month 15

Story

A bioinformatician is investigating the molecular basis of a metabolic disorder. They have access to:

  • Genomic sequence data (NCEMS-affiliated cohort)
  • Transcriptomic data from a partner institution
  • Proteomic and metabolomic data from public repositories indexed in CyVerse

Traditionally, integrating these four omics layers takes weeks of manual data wrangling. With MESA, the researcher poses a natural-language question to an AI assistant, which orchestrates discovery, metadata reasoning, federated queries, and analysis launches across all four layers.

What the prototype demonstrates

  1. AI-powered metadata generation automatically populates the Lakehouse with ontology-grounded AVUs (GO terms, ChEBI compounds, ENVO contexts).
  2. Cross-domain integration through the federated Data Mesh joins datasets that previously could not be queried together.
  3. Agentic orchestration plans a multi-step analysis (QC → alignment → peak calling → differential expression → pathway enrichment) and launches each step as a Discovery Environment analysis.
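
As a rough illustration of item 1, the sketch below turns ontology identifiers into attribute-value-unit (AVU) triples. It is a minimal sketch under stated assumptions, not MESA's schema: the attribute names (`go_term`, `chebi_compound`, `envo_context`), the use of the ontology prefix as the unit, and the `to_avus` helper are all hypothetical, and the identifiers in the usage lines only illustrate the `PREFIX:digits` format.

```python
import re
from dataclasses import dataclass

# Hypothetical mapping from ontology prefix to AVU attribute name.
# The attribute names MESA actually uses may differ.
ONTOLOGY_ATTRIBUTES = {
    "GO": "go_term",
    "CHEBI": "chebi_compound",
    "ENVO": "envo_context",
}

# Matches compact identifiers of the form PREFIX:digits, e.g. "GO:0006006".
CURIE_PATTERN = re.compile(r"^([A-Za-z]+):(\d+)$")


@dataclass(frozen=True)
class AVU:
    """One attribute-value-unit triple attached to a data object."""
    attribute: str
    value: str
    unit: str = ""


def to_avus(term_ids):
    """Convert ontology identifiers into AVUs, rejecting anything malformed."""
    avus = []
    for term_id in term_ids:
        match = CURIE_PATTERN.match(term_id)
        if match is None:
            raise ValueError(f"not a PREFIX:digits identifier: {term_id!r}")
        prefix = match.group(1).upper()
        if prefix not in ONTOLOGY_ATTRIBUTES:
            raise ValueError(f"unsupported ontology prefix: {prefix!r}")
        # Assumption: the unit field records which ontology the value came from.
        avus.append(AVU(ONTOLOGY_ATTRIBUTES[prefix], f"{prefix}:{match.group(2)}", prefix))
    return avus


if __name__ == "__main__":
    for avu in to_avus(["GO:0006006", "CHEBI:17234"]):
        print(avu)
```

Rejecting malformed or unknown identifiers before they are written keeps the generated metadata queryable by ontology, which is what "ontology-grounded" requires in practice.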

Architecture

flowchart LR
    A[Researcher prompt] --> Cl[MCP client]
    Cl --> Me[mesa-mcp]
    Me --> Mu[mesa-ducklake]
    Me --> DM[iRODS Data Mesh]
    Cl --> Fo[formation-mcp]
    Fo --> DE[Discovery Environment apps]
    DE --> RES[Results + DOI]
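
The launch side of this flow (MCP client → formation-mcp → Discovery Environment apps) can be sketched as a sequential plan runner. This is a sketch under assumptions, not the formation-mcp interface: `StubLauncher`, its `launch` method, the `run_plan` helper, and the `results/...` output naming are all hypothetical stand-ins. The step names come from the analysis plan listed under "What the prototype demonstrates".

```python
from dataclasses import dataclass, field

# Step names taken from the analysis plan described on this page.
PLAN = [
    "QC",
    "alignment",
    "peak calling",
    "differential expression",
    "pathway enrichment",
]


@dataclass
class StubLauncher:
    """Hypothetical stand-in for launching a Discovery Environment analysis."""
    launched: list = field(default_factory=list)

    def launch(self, step, inputs):
        # A real launcher would submit an analysis and wait for completion;
        # this stub only records the call and returns a made-up output location.
        self.launched.append(step)
        return f"results/{len(self.launched):02d}-{step.replace(' ', '-')}"


def run_plan(plan, initial_inputs, launcher):
    """Run steps in order, feeding each step's output to the next step."""
    inputs = list(initial_inputs)
    outputs = {}
    for step in plan:
        output = launcher.launch(step, inputs)
        outputs[step] = output
        inputs = [output]  # the next step consumes this step's result
    return outputs


if __name__ == "__main__":
    launcher = StubLauncher()
    for step, location in run_plan(PLAN, ["example-dataset"], launcher).items():
        print(f"{step}: {location}")
```

Keeping the launcher behind a small interface like this lets the plan logic be tested without a live Discovery Environment.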

Deliverables (Month 15)

  • Reproducible Jupyter notebook on the Discovery Environment.
  • Open-source helper library on GitHub.
  • DOI issued through CyVerse's DataCite service.
  • Tutorial integrated with the Educator Fellows program.

Status

Draft — content matures through Phase 2.