A Copyright Infrastructure for the Digital Age

Five years after the adoption of the 2019 Copyright Directive, copyright is back in the spotlight. The sudden emergence of generative AI systems trained on billions of copyrighted works has created considerable uncertainty among creators and other rightholders and raised new questions about how copyright interacts with these technologies.

Fortunately, the EU copyright system is well-equipped to deal with these challenges: The 2019 directive introduced two exceptions for text and data mining that provide a balanced framework for using copyrighted works when training generative AI systems. Researchers in academic research institutions and cultural heritage institutions are free to use all lawfully accessible works to train AI models for the purpose of their research. Everyone else — including commercial AI developers — can only use works that are lawfully accessible and whose rightholders have not explicitly reserved their use for text and data mining.
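The two-tier rule described above can be expressed as a short decision function. The following Python snippet is a simplified sketch of the rule as summarised in this text, not a statement of the law: the names UserType, Work, and may_use_for_training are hypothetical, and the two boolean fields assume that lawful access and the existence of an opt-out are already known, which is exactly the information that is often missing in practice.

```python
from dataclasses import dataclass
from enum import Enum


class UserType(Enum):
    """Hypothetical categories of users, following the summary in the text."""
    RESEARCH_ORGANISATION = "research_organisation"
    CULTURAL_HERITAGE_INSTITUTION = "cultural_heritage_institution"
    OTHER = "other"  # everyone else, including commercial AI developers


@dataclass(frozen=True)
class Work:
    """Assumed minimal description of a work; both fields are illustrative."""
    lawfully_accessible: bool
    tdm_rights_reserved: bool  # True if the rightholder has opted out


def may_use_for_training(user: UserType, for_research: bool, work: Work) -> bool:
    """Return True if the simplified two-tier rule permits the use."""
    # Both exceptions require lawful access to the work.
    if not work.lawfully_accessible:
        return False

    # Research exception: opt-outs do not apply to privileged institutions
    # acting for the purpose of their research.
    privileged = user in (
        UserType.RESEARCH_ORGANISATION,
        UserType.CULTURAL_HERITAGE_INSTITUTION,
    )
    if privileged and for_research:
        return True

    # General exception: available to everyone, unless rights are reserved.
    return not work.tdm_rights_reserved


if __name__ == "__main__":
    reserved = Work(lawfully_accessible=True, tdm_rights_reserved=True)
    print(may_use_for_training(UserType.RESEARCH_ORGANISATION, True, reserved))
    print(may_use_for_training(UserType.OTHER, False, reserved))
```

The function is trivial once its inputs are known. The difficulty lies in obtaining a reliable value for tdm_rights_reserved for any given work at scale.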

The result is a balanced legal framework that privileges uses of works in the public interest while allowing creators and other rightholders to control whether and how their works are used for AI training in other contexts. At the same time, this opt-out approach ensures that the vast majority of copyrighted material that is not actively managed by its creators or other rightholders can be freely used to train AI models.

The AI Act, once enacted, will build on this approach by requiring AI model developers to implement policies to comply with opt-outs by creators and other rightholders and to provide transparency about their use of copyrighted works for model training.

What is currently missing to make all of this work in practice is a set of generally accepted technical standards for expressing and managing such opt-outs, and copyright information more generally. The lack of publicly available, reliable information about the copyright status of works and the permissions granted or reserved by creators and other rightholders is increasingly hampering the ability of copyright to function in machine-to-machine contexts. It also risks undermining the viability of Europe’s balanced approach to dealing with the copyright issues raised by generative AI.

What needs to be done?

During the next mandate, the EU should therefore focus on making the existing copyright framework work in practice by investing in and supporting the creation of a copyright infrastructure that keeps the framework fit for purpose.

Increasing the amount of publicly available information on the copyright status of works and on the usage permissions granted or reserved by creators and other rightholders is an essential step toward making sure that the EU copyright rules for generative AI training work in practice and enable creators and other rightholders to control the conditions under which their works can be used. Such information is also an essential ingredient for protecting Public Domain works and other parts of the Digital Commons (such as works available under open licences).

So far, EU involvement in this space has been limited. To ensure that the EU regulatory framework for the use of copyrighted works functions in practice, the EU needs to step forward and ensure that the required technological infrastructure exists and that it is provided as a public good that serves the interests of all stakeholders: creators, rightholders, technology companies, and users (including institutional users).

Such an infrastructure must also be able to serve as a registry of Public Domain and openly licensed works that constitute the Digital Commons and must be protected from re-appropriation. Providing reliable public information on the copyright status and the licensing conditions is an essential step in removing legal uncertainties around the re-use of these works and further unlocking the societal value of the Digital Commons through initiatives like the Common European Data Space for Cultural Heritage. Ultimately, addressing the discoverability issues of copyright in the digital domain should benefit all stakeholders, including creators and other rightholders.

Europe’s opportunity

As we have argued in our policy brief on the issue, the speed at which generative AI systems are developing creates a clear and urgent need for the European Commission to provide guidance on how machine-readable opt-outs from AI training should be expressed in practice.

For the next mandate, the Commission should commit to supporting the creation of standards and protocols for AI model training compliance to assist the proper functioning of the regulatory framework provided by the 2019 Copyright Directive and the AI Act. These standards should complement (and possibly build on) existing plans for a public repository of Public Domain and openly licensed works.

To ensure that the EU copyright framework contributes to the broader goal of maintaining a balanced, transparent, and fair digital ecosystem, the Commission should also conduct a feasibility study for a more comprehensive copyright infrastructure that builds on these elements and identifies other areas where intervention is needed. Based on the outcomes of such a study, the Commission should publish a roadmap for implementing this infrastructure.