The Open Source Initiative (OSI) spent two years working with global experts to create the Open Source AI Definition (OSAID) 1.0. That process revealed a critical need: organizations working on open, fair, and public-interest AI must pay particular attention to data sharing and data governance, and establish a shared position on both.
Data governance refers to how rules for data use are created and enforced. This includes laws, standards, and social norms that guide what people can and can’t do with data. Good governance ensures fair and responsible data sharing.
In October 2024, OSI and Open Future gathered a group of experts to tackle these challenges in a two-day workshop held in Paris. This new paper, “Data Governance in Open Source AI: Enabling Responsible and Systematic Access”, is based on the outcomes of this workshop.
The paper proposes two paradigm shifts to better govern the data needed for Open Source AI.
The paper also defines six critical focus areas for advancing data governance in Open Source AI: data preparation and provenance, preference signaling and licensing, data stewards and custodians, environmental sustainability, reciprocity and compensation, and policy interventions.
The white paper calls for collective action among developers, policymakers, and civil society organizations to establish shared standards and implement solutions that balance open sharing with responsible governance. Through these efforts, Open Source AI can deliver on its promise of serving the public good while respecting the rights and interests of all stakeholders.
The report is also available on the webpage of the Open Source Initiative.