
Data Governance

Scope

This policy covers data stored in MESA-managed iRODS collections, DuckLake catalogs, and prototype services. It does not override CyVerse's institutional data policy: where the two differ, CyVerse policy governs data stored in CyVerse-managed collections.

Data classification

Class | Examples | Where it may live
Public | Reference datasets, published research data | Any MESA-managed storage
Internal | Pre-publication research data, draft analyses | MESA storage with appropriate ACLs
Restricted | Anything with privacy or compliance requirements (PII, PHI, ITAR, controlled unclassified information) | Not permitted in the prototype

The MESA prototype is not approved for Restricted data, including PII, PHI, ITAR-controlled data, and controlled unclassified information (CUI). Production deployment after the follow-on operations award will include the additional controls these classes require.
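The classification table can be read as a simple placement rule. The sketch below encodes that rule as a pre-upload check. It is a hypothetical helper and not part of any MESA tool. The names `DataClass` and `may_store_in_prototype` are illustrative, and the `has_appropriate_acl` flag assumes the caller has already confirmed that the destination's iRODS ACLs restrict access suitably.

```python
from enum import Enum


class DataClass(Enum):
    """The three classes from the classification table (hypothetical encoding)."""
    PUBLIC = "public"
    INTERNAL = "internal"
    RESTRICTED = "restricted"


def may_store_in_prototype(data_class: DataClass, has_appropriate_acl: bool) -> bool:
    """Return True if data of this class may be placed in MESA prototype storage.

    Public data may live anywhere in MESA-managed storage; Internal data only
    where appropriate ACLs are in place; Restricted data is never permitted
    in the prototype.
    """
    if data_class is DataClass.PUBLIC:
        return True
    if data_class is DataClass.INTERNAL:
        return has_appropriate_acl
    # Restricted (PII, PHI, ITAR, CUI): rejected regardless of ACLs.
    return False
```

Restricted data is rejected even when ACLs are present. This mirrors the table, which bars the class from the prototype outright rather than making it conditional on access controls.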

Provenance

Every AVU change made through MESA tools is recorded in the Lakehouse catalog with the acting user and a timestamp, giving a complete metadata-provenance record for changes made through those tools. Compute provenance (which analysis produced which output) is recorded in the Discovery Environment through the Formation and Terrain APIs.
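Recording each change with its actor and timestamp has a useful consequence: the metadata on any object can be replayed to an earlier point in time, and every change can be attributed to a user. The sketch below illustrates this. The `AvuChange` record layout and the `"add"`/`"remove"` operation names are assumptions for illustration; the actual Lakehouse/DuckLake schema is not described in this policy.

```python
from collections.abc import Iterable
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AvuChange:
    """One recorded AVU change (hypothetical schema)."""
    path: str          # iRODS logical path of the data object or collection
    attribute: str
    value: str
    unit: str
    operation: str     # "add" or "remove" (assumed vocabulary)
    actor: str         # authenticated user or service account
    timestamp: datetime


def avus_as_of(changes: Iterable[AvuChange], path: str,
               when: datetime) -> set[tuple[str, str, str]]:
    """Replay recorded changes to rebuild the AVU set on `path` at time `when`."""
    state: set[tuple[str, str, str]] = set()
    for change in sorted(changes, key=lambda c: c.timestamp):
        if change.path != path or change.timestamp > when:
            continue
        avu = (change.attribute, change.value, change.unit)
        if change.operation == "add":
            state.add(avu)
        elif change.operation == "remove":
            state.discard(avu)
    return state


def history_for(changes: Iterable[AvuChange],
                path: str) -> list[tuple[datetime, str, str]]:
    """List (timestamp, actor, operation) for every change to `path`, oldest first."""
    return sorted((c.timestamp, c.actor, c.operation)
                  for c in changes if c.path == path)
```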

Access

  • Access to iRODS collections follows native iRODS ACLs, optionally augmented by the RENCI policy engine (Goal 2) for cross-zone federation.
  • The hosted mesa-mcp service authenticates through CyVerse Keycloak; bearer tokens carry the authenticated user identity to iRODS.
  • Service accounts (used by automated agents) are explicitly enumerated and audited.
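The access bullets describe a standard OAuth 2.0 / OpenID Connect pattern: a client obtains an access token from Keycloak and presents it as a bearer token to mesa-mcp. The sketch below builds those two HTTP requests with the Python standard library but does not send them. The Keycloak base URL, realm name, and mesa-mcp URL are placeholders, because the real values are not given in this policy. Using the client-credentials grant for service accounts is likewise an assumption about how automated agents authenticate. The token path follows current Keycloak releases; older deployments add an `/auth` prefix.

```python
import urllib.parse
import urllib.request

# Placeholder values (hypothetical): the real CyVerse Keycloak base URL,
# realm, and mesa-mcp endpoint are not specified in this policy.
KEYCLOAK_BASE = "https://keycloak.example.org"
REALM = "example-realm"
MCP_URL = "https://mesa-mcp.example.org/mcp"


def token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build an OIDC client-credentials token request, e.g. for a service account."""
    url = f"{KEYCLOAK_BASE}/realms/{REALM}/protocol/openid-connect/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode()
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def authorized_request(access_token: str, payload: bytes) -> urllib.request.Request:
    """Attach the access token as a bearer token so mesa-mcp can identify the caller."""
    return urllib.request.Request(
        MCP_URL, data=payload, method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )
```

Sending either request with `urllib.request.urlopen` would require real endpoints and credentials. How mesa-mcp then forwards the user's identity to iRODS is internal to the service and is not shown here.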

Retention

  • Project data follows the retention policy of the owning institution.
  • AVU-history catalogs (DuckLake) retain all snapshots by default. Project administrators may configure pruning for very old snapshots to manage storage cost.
  • The MESA project itself retains operational logs for 90 days.

Sharing

  • Sharing inside CyVerse uses native iRODS sharing.
  • Sharing across institutions uses the federated Data Mesh (Goal 2) once both institutions have onboarded a policy-engine instance.
  • Public dataset publication goes through CyVerse's DataCite DOI service.

Data-use agreements

Project teams that bring data under a Data Use Agreement (DUA) work with the MESA team to encode the DUA's constraints as policies in the RENCI policy engine. This is currently a manual, human-supervised process; the follow-on production system will automate parts of it.
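To show what "encoding DUA constraints as policy" can mean in practice, the sketch below expresses three common DUA terms as a machine-checkable rule: named users, an expiry date, and a cross-zone restriction. It is written in plain Python for illustration only. It does not reflect the RENCI policy engine's actual policy language. The `DuaPolicy` fields and the `dua_allows` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DuaPolicy:
    """Hypothetical encoding of common DUA terms for one collection."""
    collection: str                  # iRODS collection covered by the DUA
    permitted_users: frozenset[str]  # users named in the agreement
    expires: date                    # access ends after this date
    allow_cross_zone: bool           # whether federated access is allowed


def covers(policy: DuaPolicy, path: str) -> bool:
    """True if `path` is the DUA collection or lies inside it."""
    root = policy.collection.rstrip("/")
    return path == root or path.startswith(root + "/")


def dua_allows(policy: DuaPolicy, user: str, path: str,
               today: date, cross_zone: bool) -> bool:
    """Check a request against the DUA terms; native iRODS ACLs still apply separately."""
    if not covers(policy, path):
        return True  # this DUA imposes no constraint on paths outside its collection
    if user not in policy.permitted_users:
        return False
    if today > policy.expires:
        return False
    if cross_zone and not policy.allow_cross_zone:
        return False
    return True
```

Each DUA clause maps to one explicit check. This keeps the encoding reviewable by the people supervising the process.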

Review

This policy is reviewed:

  • By the Change Control Board annually.
  • By the AI Ethics Advisory Board twice a year.
  • Whenever a new data class is requested by an early adopter.

Questions

Contact the PI at tswetnam@arizona.edu.