The rise of generative artificial intelligence systems has raised a number of copyright issues. Some of the most hotly contested questions revolve around the use of copyrighted works to train AI models. One particular problem that has received relatively little attention is how AI training intersects with openly licensed works. To better understand the dynamics at play, Open Future commissioned the Institute for Information Law at the University of Amsterdam (IViR) to conduct a study on the impact of Share Alike/CopyLeft (SA/CL) licensing on machine learning and generative AI.
The study focuses on Share Alike/CopyLeft licenses because SA/CL conditions are crucial for the functioning of many large-scale collaborative projects and Digital Commons. Share Alike/CopyLeft licensing clauses provide a mechanism that many commons-based projects (such as Wikipedia) rely on to prevent the misappropriation of open knowledge resources.
In practical terms, this means that anyone who modifies, transforms, or builds upon the original material must distribute their contributions under the same license as the original.
We were, therefore, particularly interested in the extent to which such license clauses remain effective in a context where works are used to extract information that powers AI models.
Dr. Kacper Szkalej's and Prof. Dr. Martin Senftleben's research contributes to an understanding of how Share Alike and CopyLeft licensing terms work in the context of developing AI models, deploying AI systems, and using AI output.
The report provides a comprehensive legal analysis of whether and to what extent SA and CL licensing terms can provide the same protections for commons-based projects in the context of machine learning and generative AI output as they do for more traditional uses.
The report shows that the challenges to the successful use of Share Alike and CopyLeft licenses arise primarily from the design of the licenses themselves. It points out that key concepts in Creative Commons (CC) license agreements, such as "adapted material" and "technical modification", which are critical for applying Share Alike and CopyLeft conditions to trained models, curated datasets, and AI output, are largely absent from AI training workflows. In addition, most SA/CL licenses do not apply to uses that copyright exceptions and limitations (such as the TDM exceptions in Europe or fair use in the US) exempt from the need for permission. As a result, Share Alike/CopyLeft licenses are largely ineffective when materials licensed under them are used to train AI models.
In light of this, the authors conclude that communities wishing to keep relying on Share Alike and CopyLeft principles in the era of generative AI face a choice between two basic policy approaches.
The first approach would maintain the primacy of copyright exceptions. In countries and regions that exempt machine learning from copyright control, this approach leads to broad freedom to use openly licensed resources as training material for AI models. At the same time, it is likely to marginalize Share Alike obligations in the realm of literary and artistic AI output. In the EU, for example, an approach that allows TDM exceptions to override SA licensing terms implies that AI developers are free to use CC-licensed material for AI training purposes without seeking permission, and without accepting SA obligations with respect to their own AI development results.
The second approach suggested by the authors would be to use copyright strategically to extend Share Alike obligations to AI training results and AI output. To achieve this goal, it would be necessary to reserve copyright and to make the use of CC material in the world of AI-generated content subject to conditions such as SA. Following this approach, a tailored licensing solution could grant AI developers broad freedom to use CC works for training purposes. In exchange for the training permission, however, AI developers would have to accept Share Alike and CopyLeft obligations. This could include an obligation to make the trained model available under an open license. At the AI exploitation stage, AI developers could be required to ensure, through a whole chain of contractual obligations, that SA/CL terms are also attached to output generated by AI systems that use models trained on SA/CL-licensed resources.
Overall, the report provides valuable insights into the challenges of adapting certain core elements of open licensing to the new technological paradigm ushered in by generative AI. We hope that, in doing so, the report can contribute to protecting Digital Commons from unilateral value extraction by helping to reintroduce measures aimed at sustaining the curation and production of important digital public goods that have historically relied on SA/CL licensing.