Copyright in its current form dates back to the late 19th century, well before the invention of most of the technologies we now use to access and use copyright-protected works. This has led to a situation in which copyright protection is far too broad, both because of its excessive length and because it also applies to works whose authors do not want or need copyright protection. Therefore, copyright has a discoverability problem: it is very often difficult or impossible to ask rights holders for permission because public copyright management information is lacking.
As a result, the overbroad scope of copyright protection prevents many socially beneficial uses of copyrighted works in contexts such as research, education, and the preservation of cultural heritage.
One way to address this challenge is to build publicly available registries of copyright management information, something that we have been advocating since 2022. We started to actively explore this approach in 2025 through our work on the CommonsDB registry.
Our work in this area is guided by the conviction that addressing the discoverability issues of copyright in the digital domain will contribute to a more modern copyright framework that benefits all of society, including creators and other rights holders.
No. There isn’t really a way to get a hundred million images and know where they’re coming from. It would be cool if images had metadata embedded in them about the copyright owner or something. But that’s not a thing; there’s not a registry. There’s no way to find a picture on the Internet, and then automatically trace it to an owner and then have any way of doing anything to authenticate it.
While this response sounds derisive in the context of the article (a similar statement made by OpenAI to the House of Lords was also criticized as derisive), Holz does have a point. There is indeed an urgent need for better copyright information infrastructures that allow AI model developers and others to automatically assess the copyright status of works and to clear rights. We pointed this out in our recent policy paper on best practices for opting out of ML training and in an earlier white paper on a public repository of public domain and openly licensed works.