AI and the Commons: Open Data Commons Licenses and the issues of data governance

In February, our AI and the Commons community call’s guests were Melanie Dulong de Rosnay and Yaniv Benhamou. The meeting focused on their recent paper on Open Data Commons Licenses (ODCL), which proposes a standardized approach for licensing various rights in data. The licenses expand the open licensing approach by adding mechanisms for personal data and limitations on the scope of use. In addition, they combine licensing with a data trust and other collective forms of data governance.

The exploration that led to the creation of the licenses is related to questions about the use of data in AI training. Melanie and Yaniv claim that standardized licensing is a strong tool for protecting both user rights and the commons, provided that the limitations of open licenses as tools for governing complex rights in data are addressed. We have also been exploring this issue, starting with the AI_Commons case study. We have argued that Open Access and related approaches provide only minimal data governance mechanisms, which are insufficient for challenges that go beyond copyright. Dulong de Rosnay and Benhamou draw similar conclusions from the current state of data sharing: “the ethos of the commons has yet to be fully performed and refined for data.”

Both researchers hope that sharing this model and opening it to feedback from the open movement community will help fix the bugs and ensure that the Open Data Commons Licenses can indeed be used in practice.

During the call, we addressed compatibility issues between existing legal tools, such as consent mechanisms for personal data, intellectual property rights licensing, and licenses for both AI models and training data. Melanie and Yaniv explained that the Open Data Commons License is an attempt to address the fact that data is functional, which makes it impossible to disentangle its different types and the legal mechanisms that apply to each (for example, intellectual property rights and privacy rights). That is why the paper’s authors decided to use the term “license” rather than “contract” for all kinds of data that the ODCL might cover. This effort aims to reconcile open sharing with the protection of personal data and human rights, and could potentially also address the issue of consent fatigue.

The discussion also touched upon the issue of copyleft mechanisms and AI models. Yaniv argued that, in true copyleft fashion, an AI model could be pollinated with an Open Data Commons License (like the “propagating effect” of open source licenses). The idea is that introducing the license would impact (and successively change) the entire data ecosystem.

The Open Data Commons Licenses proposal offers a blueprint for implementing the vision of strong commons for AI training. Hopefully, the work on these licenses will continue, and we will see it tested in practice by communities wishing to share training data while respecting user rights.

During this conversation, we did not get to the data trust and the governance mechanisms suggested in the paper; we plan to return to the topic in another call. Stay tuned for updates.

AI and the Commons community calls are invite-only conversations. If you’d like to join them, email alicja@openfuture.eu.