GEMA v OpenAI: Memory is Fragile. Garbage Lasts Forever.

Opinion
November 13, 2025

Earlier this week, the Landgericht München issued its ruling in the first EU lawsuit concerning the training of generative AI models. Having now read the GEMA v. OpenAI judgment from start to finish, I think what matters most for the discussion about the relationship between AI training and the EU copyright framework is that the court sets out two clear points: when the TDM exceptions apply to model development, and why memorization of training data falls outside them.

This hasn’t stopped some commentators from confidently insisting that the ruling shows the TDM exceptions don’t enable AI training. But the judgment actually reaffirms that they do—provided the training doesn’t lead to memorization.

GEMA’s argument is straightforward: they claim (and demonstrate) that GPT‑4 and 4o were trained on song texts from writers they represent, and that these same texts can be reproduced by users of ChatGPT through fairly simple prompts. In their view, this constitutes unauthorized reproduction and unauthorized making available to the public. For good measure, they added a few rather implausible assertions about how some less‑than‑perfect chatbot outputs might infringe the personality rights of the authors. The court spends roughly twenty pages summarizing all of this, including this gem of a sentence[1]:

Die Outputs seien keine Doppelschöpfungen, weil diese nicht durch einen magischen Vorgang begründet würden, sondern kausal durch das Training mit den entsprechenden Werken und deren Memorisierung, also durch Speicherung im Modell.

OpenAI’s defence strategy can be charitably described as “comprehensive”: they argued almost everything except that their models are powered by sorcery. Beyond asserting that GEMA lacked standing, they claimed that they could rely on the research exception in Article 3 of the CDSM Directive, that any problematic outputs were merely hallucinations, and that the models were simply quoting, producing pastiches, or generating private copies. The court dismissed these arguments in short order, dryly noting—among other things—that AI models do not have rights because they are not human. In the end, essentially none of OpenAI’s arguments survived the court’s scrutiny.

The court’s position: training is allowed, memorization is not

This brings us to the core of the judgment. The court’s reasoning is remarkably straightforward: if model developers want to rely on the TDM exceptions, they must ensure that training does not result in the model storing protected works—in whole or in part—within its parameters. In other words, the court affirms that training on lawfully accessible and not-opted-out works falls squarely under the TDM exception when the training process does not lead to memorization. However, if the model does memorize works from the training data, then the exception no longer applies. As the court puts it (translation from German original):

If memorization of training data cannot be prevented using state-of-the-art technology, the training of models with copyright-protected training data is not covered by the text and data mining exception.

To me, the court’s reasoning is coherent. It aligns with the way I have understood the relationship between AI training and the TDM exceptions since I first wrote about this, and it confirms the basic principle that training is permissible so long as it does not result in memorization of protected works.

What the court is really parsing here is the boundary between learning and leakage—the difference between a system that abstracts information and one that quietly preserves the debris of what it was fed.

Although this is only a ruling from a lower-level court, anyone still insisting that the TDM exceptions do not apply to AI training now has to reckon with the fact that the court treats that position as legally untenable. And given that the judgment is widely being read as a reaffirmation of copyright protection in the context of AI, it even offers critics a face-saving way to drop their increasingly strained objections.

At the same time, the judgment puts the responsibility squarely on AI model developers and companies. They need to demonstrate that the models they are building are nothing more than abstract mathematical representations of digitized human knowledge and that they can function without sneakily storing a bit of extra juicy training data.

While I hope this is achievable, I remain sceptical that developers will be able to demonstrate it in a way that satisfies the evidentiary standards of the legal profession.

In more practical terms, I assume that this will require the development of some kind of shared minimum threshold for what counts as memorization. And that threshold will need to take seriously the idea of de minimis use. As noted by Technollama, some of the evidence brought forward by GEMA looks like scraping the bottom of the honeypot: the fact that a model can reproduce a fragment of fifteen words that many of us could memorize as teenagers is not, on its own, a good reason to bring the full force of copyright to bear on the problem.
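
To make this concrete, here is a minimal sketch, in Python, of what such a threshold test could look like. Everything in it is an assumption for the sake of illustration: the fifteen-word threshold simply mirrors the fragment length mentioned above, and any real standard would also have to deal with normalization, paraphrase, and comparison at corpus scale.

```python
from difflib import SequenceMatcher

# Hypothetical de minimis threshold, in words: verbatim fragments up to
# this length are tolerated. Fifteen mirrors the fragment discussed in
# the text; it is an illustrative value, not a legally established one.
DE_MINIMIS_WORDS = 15

def longest_verbatim_overlap(output: str, work: str) -> int:
    """Length, in words, of the longest word-for-word run that a model
    output shares with a protected work."""
    out_words = output.lower().split()
    work_words = work.lower().split()
    # autojunk=False stops the matcher from discarding frequent words.
    matcher = SequenceMatcher(None, out_words, work_words, autojunk=False)
    match = matcher.find_longest_match(0, len(out_words), 0, len(work_words))
    return match.size

def looks_memorized(output: str, work: str) -> bool:
    """Flag an output whose verbatim overlap with the work exceeds the
    assumed de minimis threshold."""
    return longest_verbatim_overlap(output, work) > DE_MINIMIS_WORDS
```

Crude as it is, the shape of the test is the important part: compare outputs against protected works, tolerate short fragments, and treat only longer verbatim runs as evidence of memorization.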

The unresolved opt-out question

Even if a sensible de minimis threshold emerges, this still leaves the unresolved issue of how rightholders can effectively communicate that their works should not be used for commercial AI training.

What is notable here is that, although the court affirms the applicability of the Article 4 TDM exception to AI training (when implemented correctly), it says almost nothing about the hotly debated issue of machine-readable opt-outs. OpenAI argued that it was entitled to use the song texts because they had been included in datasets compiled in accordance with robots.txt exclusions, but the court did not need to address this point. Once it found that the model itself contained reproductions of protected works, the opt-out question became irrelevant.

Even so, several passages imply that, had the court been required to rule on the matter, it would have adopted a more expansive interpretation of “machine-readable” than mere adherence to formal technical standards like robots.txt. This aligns with the trend evident in the LAION judgment and in a recent case from Denmark: European courts are increasingly willing to interpret machine-readability in a flexible and practical way. The responsibility for this gap lies with the AI companies themselves, whose inability or unwillingness to agree on a standard is now leaving them increasingly exposed on this front as well.
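
To illustrate how little the formal standard actually demands, here is a sketch of the robots.txt check that dataset crawlers typically perform, using Python’s standard library. The user agent and URLs are placeholders for illustration. Under the more expansive reading of “machine-readable” suggested by the LAION judgment, a reservation expressed in plain language on a website could also count, which is exactly the kind of signal this check ignores.

```python
from urllib import robotparser

# Placeholder crawler identity and target URL, for illustration only.
USER_AGENT = "ExampleTrainingBot"
PAGE_URL = "https://example.com/lyrics/some-song"

# Fetch and parse the site's robots.txt with the standard library parser.
parser = robotparser.RobotFileParser()
parser.set_url("https://example.com/robots.txt")
parser.read()

# A crawler that treats robots.txt as the sole opt-out mechanism skips
# any URL the file disallows for its user agent; this is the whole check.
if parser.can_fetch(USER_AGENT, PAGE_URL):
    print("robots.txt permits fetching this page")
else:
    print("robots.txt opts this page out")
```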

Paul Keller

Footnotes

  1. This is so much more beautiful in the original German. For anyone who does not read German, here is a machine translation into English: The outputs are not double creations because they are not based on a magical process, but rather causally on training with the corresponding works and their memorization, i.e., storage in the model.