AI and Creative Labor

Making generative AI work for creators and the commons

This line of our work explores the consequences of the fact that machines can now consume human creativity, reassemble it, and spit out synthetic content that closely resembles the creative output previously produced by human creators.

The arrival of powerful generative machine learning models in 2022 raised important questions about their impact on creators and other rightholders. Will generative AI systems replace human creators? How will they affect the income of creators and other cultural producers? Do AI companies have the right to use copyright-protected works as training data for their models, and if so, under what conditions? And what does the emergence of generative AI tell us about the limits of copyright?

Our work in this area is guided by the objective of making AI work for both creators and the Digital Commons.

Timeline

At this year's Blender Conference, Paul gave a talk on AI, the commons, and the limits of copyright. The talk revisits some of the arguments made in an earlier blog post with the same title, and combines them with the seven recommendations for making AI work for creators and the commons that we developed with other participants of this year's Creative Commons Summit. A recording of Paul's talk is available on the Blender YouTube channel.

Ahead of this year's Creative Commons Summit in Mexico City, Open Future and Creative Commons hosted a one-day workshop to discuss the impact of generative AI on creators and the commons. The workshop explored how legal and regulatory contexts differ around the world and how this affects the development of shared strategies for dealing with the impact of generative AI on the commons and the position of creators. Based on this discussion, and in subsequent conversations over the three days of the summit, the group identified a set of seven principles that could guide further work on creating an equitable framework for the regulation of generative AI around the world. These principles form part of the statement "Making AI work for Creators and the Commons", which was published on the Creative Commons blog on the final day of the Summit.

Today, OpenAI announced that GPTBot, the web crawler used to collect training data for its GPT series of large language models, can now be blocked via the robots.txt protocol. Site administrators can either disallow crawling of entire sites or create custom rules that allow `GPTBot` access to some parts of a site while blocking it from others. This functionality gives site owners a level of control over how their content is used by OpenAI's LLMs that they previously lacked. At first glance, OpenAI's approach follows the opt-out mechanism established by the TDM exceptions in the EU copyright framework. But on closer inspection, the model/vendor-specific nature of this approach raises more questions than it answers, as it implies that it is the responsibility of website publishers to set rules for each individual ML training crawler operating on the web, rather than setting default permissions that apply to all ML training crawlers.

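To make the two options concrete, here is a minimal sketch of what such robots.txt rules could look like and how a crawler would interpret them, checked with Python's standard `urllib.robotparser` module. The site `example.org`, the `/portfolio/` directory, and the second crawler name `OtherBot` are assumptions made up for illustration; only the `GPTBot` user-agent token comes from the announcement.

```python
from urllib.robotparser import RobotFileParser

# Rule set 1: keep GPTBot off the whole site.
BLOCK_ALL = """\
User-agent: GPTBot
Disallow: /
"""

# Rule set 2: keep GPTBot out of one (hypothetical) directory only.
BLOCK_PORTFOLIO = """\
User-agent: GPTBot
Disallow: /portfolio/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    site = "https://example.org"  # hypothetical site
    # "OtherBot" is a made-up name standing in for any other training crawler.
    for agent in ("GPTBot", "OtherBot"):
        for path in ("/about", "/portfolio/a.png"):
            print(agent, path,
                  "all:", may_crawl(BLOCK_ALL, agent, site + path),
                  "partial:", may_crawl(BLOCK_PORTFOLIO, agent, site + path))
```

Under either rule set, a crawler that is not named in the file remains free to fetch everything. This is the practical side of the concern raised above: each additional ML training crawler needs its own entry unless a shared default is agreed.
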
According to the announcement, 40,000+ individual artworks have been opted out from use for ML training via the haveibeentrained.com tool. The remaining 79 million+ opt-outs were registered through partnerships with platforms (such as ArtStation) and large rightholders (such as Shutterstock).

These opt-outs are for images included in the LAION-5B dataset used to train the Stable Diffusion text-to-image model. Stability AI has announced that the opt-outs collected by spawning.ai and made available via an API will be respected in the upcoming training of Stable Diffusion V3.

As we have previously argued, such opt-outs are supported by the EU's legal framework for machine learning, which allows rights holders to reserve the right to text and data mining carried out for all purposes except academic research undertaken by academic research institutions. Spawning.ai is the first large-scale initiative to leverage this framework to offer creators and other rights holders the ability to exclude their works from being used for machine learning training.