Data Governance
Scope
This policy covers data stored in MESA-managed iRODS collections, DuckLake catalogs, and prototype services. It does not override CyVerse's institutional data policy; where the two differ, CyVerse policy applies to data stored in CyVerse-managed collections.
Data classification
| Class | Examples | Where it may live |
|---|---|---|
| Public | Reference datasets, published research data | Any MESA-managed storage |
| Internal | Pre-publication research data, draft analyses | MESA storage with appropriate ACLs |
| Restricted | Anything with privacy or compliance requirements (PII, PHI, ITAR, controlled-unclassified) | Not permitted in the prototype |
The MESA prototype is not approved for Restricted data: PII, PHI, ITAR, or controlled-unclassified data. Production deployment after the follow-on operations award will include the additional controls needed for these classes.
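The classification table above can be read as a simple placement rule. The following is a minimal sketch of such a check, not an actual MESA API: the names `DataClass`, `PROTOTYPE_RULES`, and `may_store_in_prototype` are hypothetical, and "appropriate ACLs" is reduced to a single boolean for illustration.

```python
from enum import Enum


class DataClass(Enum):
    """The three data classes from the classification table."""
    PUBLIC = "public"
    INTERNAL = "internal"
    RESTRICTED = "restricted"


# Hypothetical encoding of the table; assumed for illustration only.
PROTOTYPE_RULES = {
    DataClass.PUBLIC: {"permitted": True, "requires_acl": False},
    DataClass.INTERNAL: {"permitted": True, "requires_acl": True},
    DataClass.RESTRICTED: {"permitted": False, "requires_acl": True},
}


def may_store_in_prototype(data_class: DataClass, has_acl: bool = False) -> bool:
    """Return True if data of this class may live in prototype storage."""
    rule = PROTOTYPE_RULES[data_class]
    if not rule["permitted"]:
        # Restricted data is rejected regardless of ACLs.
        return False
    # Internal data needs ACLs; Public data does not.
    return has_acl or not rule["requires_acl"]
```

A caller would run a check like this before writing to a collection, for example `may_store_in_prototype(DataClass.INTERNAL, has_acl=True)`.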
Provenance
Every AVU change made through MESA tools is recorded in the Lakehouse catalog with actor and timestamp, giving a complete metadata-provenance record for changes made through those tools. Compute provenance (which analysis produced which output) is recorded in the Discovery Environment via Formation / Terrain APIs.
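To make the shape of a metadata-provenance entry concrete, here is a minimal sketch of a record that carries the actor and timestamp alongside the AVU change. The field names, the `operation` vocabulary, and the `record_avu_change` helper are assumptions for illustration; the actual Lakehouse catalog schema is not specified in this policy.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Assumed set of operations; the real catalog may use different terms.
ALLOWED_OPERATIONS = {"add", "modify", "remove"}


@dataclass(frozen=True)
class AvuChange:
    """One hypothetical provenance row for an AVU change."""
    path: str        # iRODS object or collection the AVU is attached to
    attribute: str
    value: str
    unit: str
    operation: str   # one of ALLOWED_OPERATIONS
    actor: str       # authenticated user or service account
    timestamp: str   # ISO 8601, UTC


def record_avu_change(path: str, attribute: str, value: str, unit: str,
                      operation: str, actor: str,
                      now: Optional[datetime] = None) -> AvuChange:
    """Build an immutable provenance record stamped with actor and time."""
    if operation not in ALLOWED_OPERATIONS:
        raise ValueError(f"unknown operation: {operation!r}")
    if not actor:
        raise ValueError("actor is required for provenance")
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return AvuChange(path, attribute, value, unit, operation, actor,
                     moment.isoformat())
```

Making the record frozen reflects the intent that provenance entries are appended and never edited in place.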
Access
- Access to iRODS collections follows native iRODS ACLs, optionally augmented by the RENCI policy engine (Goal 2) for cross-zone federation.
- The hosted `mesa-mcp` service authenticates through CyVerse Keycloak; bearer tokens carry the authenticated user identity to iRODS.
- Service accounts (used by automated agents) are explicitly enumerated and audited.
Retention
- Project data follows the retention policy of the owning institution.
- AVU-history catalogs (DuckLake) retain all snapshots by default. Project administrators may configure pruning for very old snapshots to manage storage cost.
- The MESA project itself retains operational logs for 90 days.
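The 90-day operational-log window can be expressed as a simple cutoff check. This is a sketch under stated assumptions: the entry layout (a dict with a `logged_at` datetime), the helper names, and the choice to keep entries that are exactly 90 days old are all hypothetical, not a description of the deployed log tooling.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Retention window stated in the policy for operational logs.
LOG_RETENTION = timedelta(days=90)


def is_expired(logged_at: datetime, now: Optional[datetime] = None) -> bool:
    """True if the entry is older than the retention window.

    Assumption: an entry exactly 90 days old is still retained.
    """
    now = now or datetime.now(timezone.utc)
    return now - logged_at > LOG_RETENTION


def prune_logs(entries: list, now: Optional[datetime] = None) -> list:
    """Return only the entries still inside the retention window."""
    return [e for e in entries if not is_expired(e["logged_at"], now)]
```

Passing `now` explicitly keeps the check deterministic, which makes a retention job easier to test and audit.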
Sharing
- Sharing inside CyVerse uses native iRODS sharing.
- Sharing across institutions uses the federated Data Mesh (Goal 2) once both institutions have onboarded a policy-engine instance.
- Public dataset publication goes through CyVerse's DataCite DOI service.
Data-use agreements
Project teams that bring data under a Data Use Agreement (DUA) work with the MESA team to encode the DUA constraints as policy in the RENCI policy engine. This is currently a manual, person-supervised process; the follow-on production system will automate parts of it.
Review
This policy is reviewed:
- By the Change Control Board annually.
- By the AI Ethics Advisory Board twice yearly.
- Whenever a new data class is requested by an early adopter.
Questions
Contact the PI at tswetnam@arizona.edu.