The release of powerful machine learning models under open licenses was a major event in the AI/ML development space in 2022. Until then, large generative models such as GPT-3 and DALL-E were seen as a force that would concentrate digital power in the hands of a few corporations. The release of the Stable Diffusion image generation model (and other models like BLOOM and Whisper) marked a significant change.
This was a breakthrough moment for the world of open, indicating the emergence of a new field in which the principles of open are applied. This is a nascent field in which there are still no established norms for openly sharing different elements of the machine learning stack: data, model, and code. Moreover, a new norm for sharing has emerged, expressed in a new suite of RAIL licenses that aim to combine an open licensing model with rules for responsible use.
By early 2023, it had become clear that the emergence of generative AI would re-ignite the copyright debates in which free culture and access to knowledge advocates had been involved for the past two decades. Until then, public discussion about the potential harms of AI systems had focused on issues such as bias, disinformation and threats to privacy. Now, the list must include the issue of creators’ rights and rules for the reuse of creative works. This is a conversation that is familiar to open movement activists, but one that needs to move beyond its traditional framing. It is essential to understand how to balance creators’ and users’ rights in a context where creation is automated and reuse occurs in new ways.
Our research seeks to contribute to this public debate and to the emerging field of open and commons-based approaches to machine learning. We are particularly interested in the commons-based governance of datasets and models, the impact of generative AI on creativity, and the emergence of new licensing models that balance openness and responsible use.
The dependence of GFMs on digital commons has economic implications: much of the value comes from the commons, but the profits from the models and their applications may be disproportionately captured by those creating GFMs and associated products, rather than going back into enriching the commons. Some of the trained models have been open-sourced, some are available through paid APIs (such as OpenAI’s GPT-3 and other models), but many are proprietary and commercialized. It is likely that users will capture economic surplus from using GFM products, and some of them will have contributed to the commons, but there is still a question of whether there are obligations to directly compensate either the commons or those who contributed to it. In response, the paper identifies three proposals for dealing with the risks that GFMs pose to the commons. Read the full paper here: