LAION Round 2: Machine-Readable but Still Not Actionable

The Lack of Progress on TDM Opt-Outs
Analysis
December 18, 2025

Last week, the OLG Hamburg provided the first genuinely substantive judicial engagement with what constitutes a machine-readable rights reservation under Article 4(3) of the DSM Directive in an AI context. While EU courts have so far been consistent in treating the use of copyrighted works for AI training as a form of text and data mining, the question of how rightholders can validly opt out of the general TDM exception has largely remained underdeveloped. Until now, references to “machine readability” have appeared mostly in passing, without courts being required to articulate concrete criteria or to assess specific signalling mechanisms.

Prior to Hamburg, only two national courts had substantively engaged with the question of how a valid Article 4 opt-out must be expressed, and those decisions point in markedly different directions. In late 2024, in DPG Media v. HowardsHome, the Amsterdam District Court held that while Article 4(3) does not mandate a single technical standard, it nevertheless requires that a reservation be practically detectable and processable by automated systems. By contrast, in October 2025 the Danish Maritime and Commercial Court held that a prohibition on scraping or data mining set out in a publicly accessible HTML privacy or data policy could qualify as an “appropriate” reservation under Article 4(3), effectively equating public online accessibility with machine readability. Neither of these cases concerned the use of protected works for AI training.

The ruling of the OLG Hamburg in Kneschke v. LAION arises in a different procedural and analytical context. Deciding on appeal, the court largely confirmed the outcome reached by the Landgericht Hamburg in 2024: the use of the contested photographs by LAION in the construction of the LAION-5B dataset was covered not only by the German implementation of the Article 3 exception for TDM for scientific research (§ 60d UrhG), but also by the general TDM exception in Article 4 DSM (§ 44b UrhG). It is important to note, however, that while the LAION-5B dataset is used in the training of AI models, the contested reproductions themselves were not part of what is commonly understood as AI training. Rather, the photographs were used as input to an already-trained model that LAION employed to check whether the image descriptions in its dataset accurately describe the referenced images.

The LG Hamburg had confined its original analysis to Article 3, finding that LAION qualified as a scientific research organisation and that the computational analysis at issue constituted TDM for scientific purposes. It nevertheless suggested that Article 4 would in any event not apply, on the basis that the photographs had been published on a platform whose terms and conditions contained a reservation of rights, which the court considered could be regarded as machine-readable in light of the ability of large language models to process unstructured text.

Machine-readable = machine-actionable

Interestingly, in last week’s judgment the OLG Hamburg directly addressed the applicability of the German implementation of Article 4 and thus saw a need to establish explicit criteria for what makes a rights reservation machine-readable. At its core, the court argues that a rights reservation should be considered machine-readable if it can be machine-interpreted in such a way that an automated process can use it to block TDM operations. In other words, it needs to be both machine-readable and actionable (own translation):

When it comes to machine readability, it is not only important that the text can be captured by machine, but also that it can be interpreted by machine in such a way that, in an automated process, the content covered by the reservation is not processed.

This understanding is largely based on the explanatory memorandum to the German DSM implementation law, which notes that:

The purpose of the provision is, on the one hand, to give rights holders the opportunity to prohibit uses made on the basis of the statutory permission. At the same time, the provision aims to ensure that automated processes, which are a typical characteristic of text and data mining, can actually be carried out automatically in the case of content accessible online.

Although there is no equivalent justification in the recitals of the DSM Directive, this reading is fully in line with the regulatory intent behind the machine-readability requirement. That requirement only makes sense if it enables the automatic processing of opt-outs.
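What this criterion means in practice can be pictured with a minimal sketch. The Python snippet below is purely illustrative: the structure of the signal and the helper function are hypothetical, not drawn from the judgment or from any existing standard. The point is simply that a structured reservation can be acted upon by an automated pipeline without any interpretation step, whereas a reservation buried in prose terms and conditions gives that pipeline nothing to act on.

```python
# Purely illustrative sketch: the signal structure and the helper function
# are hypothetical and not taken from the judgment or any existing standard.

def may_process_for_tdm(parsed_signal: dict | None) -> bool:
    """Decide automatically whether a work may be processed for TDM.

    A structured, machine-actionable signal can be evaluated without any
    interpretation step: the crawler either proceeds or skips the work.
    """
    if parsed_signal is None:
        # No machine-actionable reservation was found for this work.
        return True
    return not parsed_signal.get("tdm_reserved", False)


# A reservation expressed only as prose in a site's terms and conditions
# cannot be evaluated this way: deciding whether "any automated use of our
# content is prohibited" covers TDM requires machine text comprehension,
# which is what the court found was not state of the art in 2021.
print(may_process_for_tdm({"tdm_reserved": True}))  # False: work is skipped
print(may_process_for_tdm(None))                    # True: no actionable opt-out
```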

According to the OLG Hamburg, machine readability is not a static property but must be assessed against technological capabilities at the time of the disputed use. This is an important caveat, as it means that the court’s determination is based on the state of the art in the second half of 2021 (i.e. before the widespread availability of large language models with the capacity to automatically extract meaning from unstructured text). Keeping this in mind, it seems doubtful whether a court would reach the same conclusion for a dispute with an otherwise identical constellation of facts in 2025.

In other words, the ruling of the OLG Hamburg tells us only that plain-language reservations of rights in terms and conditions were not machine-readable in 2021, but it tells us comparatively little about what constitutes machine readability in 2025.

A question of vocabulary?

In its judgment the court also highlights the fact that the reservation of rights in this specific case did not mention TDM verbatim, but that it had to be “interpreted to determine whether it also excludes this use, which requires machine text comprehension.” This points to a deeper challenge for implementing machine-readable opt-outs at scale: for this approach to work, there needs to be a shared understanding of key concepts. While the AI regulatory framework provides a clear definition of TDM (by reference to the definition in Article 2(2) of the DSM Directive), many other concepts that could be used to target the scope of rights reservations more precisely currently lack definitions that are commonly understood by all stakeholders involved.

As we argued in our 2023 paper “Defining best practices for opting out of ML training”, for machine-readable opt-outs to work across different standards (and possibly non-standardised instruments such as terms of use), they need to rely on a common vocabulary that ensures the intent of a rights reservation can be interpreted uniformly across different protocols and standards. This requirement follows directly from the core criterion of automated actionability articulated by the OLG Hamburg: without a shared and well-defined vocabulary, it is not possible for automated systems to reliably detect and act upon rights reservations at scale.

This point was also explicitly raised by LAION in its written submission on appeal. LAION argued that the criterion of “machine-readable” usage restrictions can only be met if such restrictions can be automatically located, understood, and correctly classified without the risk of inaccuracies or misinterpretation. From this perspective, the ability to carry out fully automated TDM—an objective explicitly pursued by both EU and German legislators—depends on the definition of a shared vocabulary of terms (described by LAION as “technical parameters”) that informs automated systems, such as crawlers, whether and under what conditions website content may be used for TDM.

An attempt to define such a vocabulary of terms is currently underway in the AI Preferences Working Group of the Internet Engineering Task Force (IETF), which is chartered to develop both a vocabulary for AI-related usage preferences and mechanisms for expressing those preferences via the Robots Exclusion Protocol (i.e. robots.txt). However, work on the vocabulary (the author of this piece is one of the editors of the vocabulary draft) has largely stalled since the summer. The most recent version of the vocabulary draft contains only two defined terms (“Foundation Model Production” and “Search”), and neither of these terms is currently close to commanding consensus.

The discussion in the AI Preferences Working Group reflects deep divisions between different sets of stakeholders. These divisions are further compounded by the fact that preference signals would need to operate against the background of widely diverging regulatory frameworks, in which the EU’s copyright framework—characterised by a clearly defined legal status of opt-outs and public-interest exceptions that are protected from technological and contractual override—stands out as an exception.

A proliferation of vocabularies

With the IETF working group moving much more slowly than originally expected, the dynamics around machine-readable opt-outs have shifted elsewhere. In September, Cloudflare launched contentsignals.org, described as an “implementation of a mechanism for allowing website publishers to declare how automated systems should use their content.” It functions as a proprietary extension of robots.txt and offers three opt-out categories (“ai-train”, “ai-input”, and “search”). Last week, the Really Simple Licensing (RSL) Collective—an initiative supported primarily by US online publishers and content delivery networks, presenting itself as a new type of collective rights management organisation—published version 1.0 of its eponymous protocol, which establishes “a standardized XML vocabulary and associated discovery and authorization mechanisms for expressing machine-readable usage, licensing, payment, and legal terms that govern how digital assets may be accessed or licensed by AI systems and automated agents.”

The RSL protocol leverages robots.txt and HTTP headers to direct crawlers to a licence file, allowing the expression of rules related to automated processing in general (“all”), any use by AI systems (“ai-all”), specific uses by AI systems (“ai-train”, “ai-input”, and “ai-index”), and the building of a search index (“search”). The RSL protocol effectively expands the vocabulary defined by Cloudflare by introducing hierarchical relationships and anchoring the entire system in an overarching “automated processing” category that functions similarly to the notion of TDM under EU copyright law.
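One way to picture this hierarchical structure is the short Python sketch below. The category names are those listed above, but the parent/child assignments and the resolution logic are my own simplification for the purpose of illustration, not an implementation of the RSL specification itself.

```python
# Simplified illustration of hierarchical usage categories. The category
# names follow the RSL documentation as described above; the parent/child
# structure and the lookup logic are an assumption made for this sketch.

CATEGORY_PARENTS = {
    "ai-train": "ai-all",
    "ai-input": "ai-all",
    "ai-index": "ai-all",
    "ai-all": "all",
    "search": "all",
}


def is_use_permitted(use: str, rules: dict[str, bool]) -> bool:
    """Walk up the category hierarchy until an explicit rule is found.

    A rule set on a broader category (e.g. "ai-all") applies to every
    narrower use beneath it unless that use carries its own rule.
    """
    current = use
    while current is not None:
        if current in rules:
            return rules[current]
        current = CATEGORY_PARENTS.get(current)
    # No applicable rule: treat the use as not explicitly reserved.
    return True


# Example: a publisher reserves all AI uses but still allows search indexing.
rules = {"ai-all": False, "search": True}
print(is_use_permitted("ai-train", rules))  # False: inherited from "ai-all"
print(is_use_permitted("search", rules))    # True: explicit rule
```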

In addition to this, there are a number of other standards and protocols, such as the TDM Reservation Protocol (TDMRep) and TDM·AI, both specifically focussed on enabling TDM opt-outs in compliance with Article 4(3), as well as broader initiatives such as C2PA and IPTC PLUS that also offer opt-out functionality based on their own vocabularies.
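Of these, TDMRep is the one most directly modelled on Article 4(3), and it takes a comparatively simple approach. The sketch below reflects my reading of the specification, which (among other mechanisms) allows a site to publish reservation rules as a JSON file at a well-known location; the path, the field names, and the crude prefix matching should all be treated as assumptions to be checked against the spec rather than as a reference implementation.

```python
# Rough sketch of checking a TDMRep-style reservation file. The well-known
# path, field names, and simplified path matching are assumptions based on
# my reading of the TDMRep specification; verify against the spec itself.
import json
from urllib.parse import urljoin
from urllib.request import urlopen


def fetch_tdmrep_rules(site: str) -> list[dict]:
    """Fetch the site's TDMRep rules, returning an empty list if absent."""
    try:
        with urlopen(urljoin(site, "/.well-known/tdmrep.json"), timeout=10) as resp:
            return json.load(resp)
    except Exception:
        return []


def tdm_reserved(path: str, rules: list[dict]) -> bool:
    """Return True if any rule matching the path declares a TDM reservation."""
    return any(
        rule.get("tdm-reservation") == 1
        for rule in rules
        # Simplified matching: real implementations need proper pattern rules.
        if path.startswith(rule.get("location", "/"))
    )


rules = fetch_tdmrep_rules("https://example.com")
print(tdm_reserved("/images/photo-123.jpg", rules))
```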

Against the backdrop of this fragmented landscape, the European Commission is currently running a consultation on protocols for reserving rights from text and data mining under the AI Act and the GPAI Code of Practice. The consultation is aimed at drawing up a list of generally agreed machine-readable protocols that GPAI model providers will have to comply with in order to meet their obligations under the AI Act.
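Whatever list the consultation produces, GPAI model providers will in practice have to normalise signals expressed in several of these vocabularies into a single internal decision. The sketch below illustrates what that entails; the crosswalk table is entirely hypothetical, and the fact that it has to be invented for this example is precisely the gap that a shared vocabulary would close.

```python
# Hypothetical illustration only: this crosswalk is not an official mapping
# between these vocabularies; no agreed mapping of this kind exists today.

# Protocol-specific terms mapped onto a single internal category.
TERM_CROSSWALK = {
    ("content-signals", "ai-train"): "ai-training",
    ("rsl", "ai-train"): "ai-training",
    ("ietf-aipref-draft", "Foundation Model Production"): "ai-training",
    ("tdmrep", "tdm-reservation"): "text-and-data-mining",
}


def normalise(protocol: str, term: str) -> str | None:
    """Map a protocol-specific term onto a shared internal category.

    Every entry in the table above is a guess about intent; without an
    agreed vocabulary, different implementers will guess differently.
    """
    return TERM_CROSSWALK.get((protocol, term))


print(normalise("rsl", "ai-train"))  # "ai-training"
print(normalise("rsl", "ai-index"))  # None: unmapped term, intent unclear
```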

Seen in this light, Kneschke v. LAION illustrates both the value and the limits of judicial intervention. The OLG Hamburg draws an important boundary by insisting that “machine-readable” opt-outs must be machine-actionable in practice, not merely intelligible in theory. But courts can only assess concrete constellations of fact against past technological capabilities; they cannot, on their own, supply the shared vocabularies and interoperable signalling mechanisms that automated compliance at scale would require.

As a result, while courts can clarify the legal boundary conditions, the question of how machine-readable opt-outs function in practice is being taken up in standard-setting processes. Whether this shift leads to interoperable, open standards developed in inclusive technical fora, or to de facto standards shaped by individual platforms and vendors, will be decisive for how the Article 4(3) opt-out mechanism works in practice. The Kneschke v. LAION ruling leaves open the question of who will define the technical meaning of “machine-readable” going forward, and under what governance model.

For now, we are faced with a real risk that, in the absence of shared and openly governed standards, machine-readable opt-outs will become the foundation for a new intermediary layer that normalises pay-per-crawl or pay-per-click access to the open web.

This analysis was first published in two parts on Kluwer Copyright Blog: part 1 and part 2.

Paul Keller