AI Safety
The MESA prototype is, by design, an agentic system that acts on research data on behalf of users. This policy summarizes the project's safety commitments and the controls in place during the prototype phase.
Principles
- No autonomous destructive actions. Any tool call that writes, moves, shares, or deletes data requires either an iRODS ACL the user personally holds, or an explicit user confirmation if the agent client surfaces one.
- Human-in-the-loop for paid compute. Launching a Discovery Environment analysis (via formation-mcp or terrain-mcp) requires user confirmation when the analysis would draw on a paid allocation.
- Sandbox isolation. Tool execution runs in per-request container sandboxes so a malformed or adversarial prompt cannot affect sibling sessions or escape to the host.
- Audit everything. Every tool call is logged with actor, timestamp, and arguments hash, and is available to project administrators for review.
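The confirmation and audit principles above can be sketched as follows. This is a minimal illustration and not MESA's actual implementation: the tool names in `DESTRUCTIVE_TOOLS`, the `AuditRecord` fields beyond actor, timestamp, and arguments hash, and the choice of SHA-256 over canonical JSON are all assumptions made for the example.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical labels for tools that write, move, share, or delete data.
DESTRUCTIVE_TOOLS = {"write", "move", "share", "delete"}


@dataclass(frozen=True)
class AuditRecord:
    """One log entry per tool call: actor, timestamp, and arguments hash."""
    actor: str
    tool: str
    timestamp: str
    arguments_hash: str


def hash_arguments(arguments: dict) -> str:
    """Hash a canonical JSON encoding so equal arguments give equal hashes."""
    canonical = json.dumps(arguments, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def audit(actor: str, tool: str, arguments: dict) -> AuditRecord:
    """Build the audit record for a tool call (storage is out of scope here)."""
    return AuditRecord(
        actor=actor,
        tool=tool,
        timestamp=datetime.now(timezone.utc).isoformat(),
        arguments_hash=hash_arguments(arguments),
    )


def is_allowed(tool: str, holds_acl: bool, confirmed: bool) -> bool:
    """Non-destructive tools pass; destructive ones need an ACL or a confirmation."""
    if tool not in DESTRUCTIVE_TOOLS:
        return True
    return holds_acl or confirmed
```

Logging a hash rather than the raw arguments lets administrators match identical calls across sessions without copying potentially sensitive paths or values into the log.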
Model selection
The MESA prototype prefers self-hosted, open-weight models running on Jetstream-2 GPUs (vLLM serving). Commercial-API egress (LiteLLM router) is used as a buffer for benchmarking and when a specific model materially outperforms the self-hosted options.
| Use | Model class |
|---|---|
| Hypothesis generation, summarization | Self-hosted open-weight 70B-class |
| Tool planning, multi-step orchestration | Self-hosted open-weight 70B-class or commercial frontier with user opt-in |
| Embeddings for RAG | Self-hosted sentence-transformers |
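The table can be read as a small routing policy. The sketch below encodes it under stated assumptions: the task labels, the `ModelClass` enum, and the `allowed_model_classes` function are hypothetical names chosen for illustration, and the function only reports which model classes are permitted, not which one a router would finally pick.

```python
from enum import Enum


class ModelClass(Enum):
    SELF_HOSTED_70B = "self-hosted open-weight 70B-class"
    COMMERCIAL_FRONTIER = "commercial frontier"
    SELF_HOSTED_EMBEDDINGS = "self-hosted sentence-transformers"


# Hypothetical task labels mapped to the default model class from the table.
DEFAULTS = {
    "hypothesis_generation": ModelClass.SELF_HOSTED_70B,
    "summarization": ModelClass.SELF_HOSTED_70B,
    "tool_planning": ModelClass.SELF_HOSTED_70B,
    "orchestration": ModelClass.SELF_HOSTED_70B,
    "embeddings": ModelClass.SELF_HOSTED_EMBEDDINGS,
}

# Only these uses may reach a commercial model, and only with user opt-in.
COMMERCIAL_ELIGIBLE = {"tool_planning", "orchestration"}


def allowed_model_classes(task: str, commercial_opt_in: bool = False) -> list:
    """Return the model classes permitted for a task, self-hosted first."""
    if task not in DEFAULTS:
        raise ValueError(f"unknown task: {task!r}")
    allowed = [DEFAULTS[task]]
    if commercial_opt_in and task in COMMERCIAL_ELIGIBLE:
        allowed.append(ModelClass.COMMERCIAL_FRONTIER)
    return allowed
```

Defaulting `commercial_opt_in` to `False` keeps the self-hosted path as the behavior a caller gets unless the user has explicitly enabled the commercial route.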
Prompt-injection resistance
The MCP servers MESA ships are designed assuming prompts may be adversarial:
- iRODS access is gated by the user's own bearer token, not by free-text parameters in the prompt.
- Tool calls validate arguments against typed schemas before execution.
- File contents read from iRODS are not re-interpreted as prompt instructions by the MESA servers themselves. Prompt-injection defenses are also needed at the agent layer.
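As an illustration of validating tool arguments against a typed schema before execution, here is a minimal standard-library sketch. The `ListCollectionArgs` schema and the `validate_args` helper are hypothetical and not taken from the MESA servers, which may well use a dedicated validation library instead.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ListCollectionArgs:
    """Hypothetical argument schema for an illustrative list-collection tool."""
    path: str
    limit: int = 100


def validate_args(schema, raw: dict):
    """Reject unknown keys and wrong types, then build the typed argument object."""
    declared = {f.name: f.type for f in fields(schema)}
    unknown = set(raw) - set(declared)
    if unknown:
        raise ValueError(f"unexpected arguments: {sorted(unknown)}")
    for name, value in raw.items():
        expected = declared[name]
        # bool is a subclass of int, so it is rejected explicitly for int fields.
        wrong_bool = expected is int and isinstance(value, bool)
        if wrong_bool or not isinstance(value, expected):
            raise TypeError(f"{name} must be {expected.__name__}")
    # Missing required fields surface here as a TypeError from the constructor.
    return schema(**raw)
```

Rejecting unknown keys matters as much as type checking: a prompt that tries to smuggle an extra parameter into a tool call fails before any iRODS operation runs.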
Disclosure of automation
Any artifact produced by a MESA-mediated workflow (a tagged dataset, a published analysis) carries a provenance record indicating that an AI agent was involved. This is recorded in the AVU history and surfaced in the Discovery Environment's analysis metadata.
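A provenance record of this kind could be expressed as iRODS AVU (attribute, value, unit) triples. The sketch below only builds the triples and does not call any iRODS client; the `mesa:` attribute names, the `AVU` tuple type, and the `provenance_avus` function are assumptions made for illustration, not the project's actual metadata schema.

```python
from datetime import datetime, timezone
from typing import List, NamedTuple, Optional


class AVU(NamedTuple):
    """An iRODS-style metadata triple: attribute, value, unit."""
    attribute: str
    value: str
    unit: str


def provenance_avus(
    agent_name: str,
    model_class: str,
    when: Optional[datetime] = None,
) -> List[AVU]:
    """Build hypothetical AVUs marking an artifact as produced with AI involvement."""
    stamp = (when or datetime.now(timezone.utc)).isoformat()
    return [
        AVU("mesa:ai_agent_involved", "true", ""),
        AVU("mesa:agent_name", agent_name, ""),
        AVU("mesa:model_class", model_class, ""),
        AVU("mesa:generated_at", stamp, "iso8601"),
    ]
```

Keeping the disclosure as ordinary metadata on the data object means it travels with the artifact and can be queried like any other AVU.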
What MESA does not do
- The MESA prototype does not train models on user data.
- The MESA prototype does not transmit user data to commercial APIs unless the user explicitly enables the commercial-API path for their session.
- The MESA prototype does not make medical, legal, or financial decisions on behalf of users. Outputs are research aids.
Review
This policy is reviewed twice-yearly by the AI Ethics Advisory Board and adjusted as the prototype matures.
Reporting concerns
Concerns about agent behavior — including unexpected tool calls, hallucinated outputs that look authoritative, or suspected prompt-injection successes — should be reported to the PI at tswetnam@arizona.edu and tracked in the project incident log.