A Step Forward, But Not Far Enough: The EU’s AI Transparency Template

Opinion
July 30, 2025

The EU’s first attempt to require data transparency from AI companies has arrived, but it may not deliver the accountability it promises. The EU AI Office has released the official Template for the Public Summary of Training Content for general-purpose AI (GPAI) models (the Transparency Template), a document intended to give effect to Article 53(1)(d) of the AI Act, which requires developers of GPAI models to publish a “sufficiently detailed summary” of the data used to train their models.

For months leading up to this release, Open Future collaborated closely with the Mozilla Foundation and gathered extensive input from experts across academia, civil society, and industry to demonstrate what a meaningful transparency obligation could look like in practice. Together, we developed and published a policy brief and a blueprint template.

Where Our Work Influenced the Template

The influence of this collective effort is visible in the EU AI Office’s final template. Our framework for identifying the legitimate interests at stake (copyright, privacy and data protection, academic freedom, anti-discrimination, consumer protection, and fair competition) is explicitly reflected in the Explanatory Notice that accompanies the template. Likewise, our insistence that disclosure should cover all stages of model development is mirrored in the AI Office’s scope, which extends from pre-training through fine-tuning to post-training alignment. Similarly, the categorization of data sources adopted by the AI Office template follows the taxonomy we proposed.

Critical Compromises and Omissions

While the conceptual framework clearly reflects our work, much of the substance has been watered down and the level of detail reduced. Instead of our proposed precise quantitative disclosures of dataset size and composition, the AI Office settled for broad, approximate ranges. Where our blueprint advocated for strong transparency on licensed data, the official template requires only minimal disclosure.

A prime example of how the official template creates unnecessary ambiguity lies in its handling of scraped internet content. Our blueprint proposed a straightforward requirement: “a weighted list of the top 5 percent or 100,000 domains by data modality (e.g., text, images, video).” This precise formulation specified that percentages should be calculated separately for each modality, recognizing that different types of content come from different digital environments.

The official template, however, introduces confusing language that undermines this clarity. It requires providers to list “the top 10% of all domain names determined by the size of content scraped (in a representative manner across all modalities where applicable).” The crux of the problem lies in replacing our clear specification “by data modality” with the vague phrase “in a representative manner across all modalities where applicable.” While our language mandated separate calculations for each modality, the template’s phrasing allows multiple interpretations. As it stands now, the template’s language could be interpreted as requiring only the top 10% overall—a single aggregated list that could completely obscure modality-specific sources.

This ambiguity has serious implications. Consider a multimodal AI model where text data dominates the training corpus, with smaller but significant portions of image-text pairs and video content. Under an overall calculation, text-heavy domains would appear in the top 10% due to sheer volume, while specialized sources crucial for visual capabilities might remain completely invisible despite being essential for the model’s non-text performance. The template also fails to specify what constitutes a ‘representative’ metric when comparing fundamentally different data types. Should companies measure by file size, number of tokens, processing time, or impact on model performance? The template provides no guidance, leaving companies free to choose metrics that minimize their disclosure obligations.

Meaningful transparency requires separate top 10% lists for each modality, published together. This approach would reveal the top sources for each modality (going back to the above example: the top text sources, the top image sources, and the top video sources), providing a complete picture of where different types of training content originated and enabling stakeholders to assess modality-specific risks and capabilities.
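To make the stakes of this interpretive choice concrete, here is a minimal sketch in Python of the two readings. The domain names, byte counts, and the `top_share` helper are invented for illustration and do not reflect any real provider’s corpus or the template’s prescribed method.

```python
from collections import defaultdict

# (domain, modality, bytes_scraped) -- invented values for illustration only
corpus = [
    ("big-text-site.example", "text", 9_000_000_000),
    ("news-archive.example", "text", 7_500_000_000),
    ("forum-dump.example", "text", 6_000_000_000),
    ("photo-library.example", "image", 800_000_000),
    ("stock-images.example", "image", 600_000_000),
    ("video-platform.example", "video", 400_000_000),
]

def top_share(rows, share=0.10):
    """Return the domains in the top `share` fraction when ranked by
    size of scraped content (always returning at least one domain)."""
    ranked = sorted(rows, key=lambda r: r[1], reverse=True)
    count = max(1, int(len(ranked) * share))
    return [domain for domain, _ in ranked[:count]]

# Reading 1: a single aggregated list across all modalities.
aggregated = top_share([(d, size) for d, _, size in corpus])

# Reading 2: a separate list per modality, as our blueprint proposed.
by_modality = defaultdict(list)
for domain, modality, size in corpus:
    by_modality[modality].append((domain, size))
per_modality = {m: top_share(rows) for m, rows in by_modality.items()}

print("aggregated:", aggregated)        # only the dominant text domain appears
print("per modality:", per_modality)    # image and video sources become visible
```

Under the aggregated reading, the disclosure names only the text-dominant domain; the per-modality reading surfaces the image and video sources that shape the model’s non-text capabilities.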

Another telling sign of the extent to which the AI Office compromised is the lack of transparency on data processing. Where we called for technical specifications (filtering processes, anonymization techniques, sampling methodologies, and annotation practices, documented in sufficient depth for bias research and meaningful auditing), the AI Office opted for ‘general descriptions.’

This opens the door to corporate boilerplate and essentially meaningless responses. When asked about handling illegal content, companies could simply state ‘We implemented appropriate state-of-the-art filtering mechanisms’ and technically comply with the requirement. The concrete consequences are severe: researchers studying gender bias cannot determine whether training data was filtered for gender-balanced representation, and individuals cannot verify whether their social media posts were anonymized or scraped wholesale.

The Political Calculus Behind the Choices

The political calculus behind these choices is clear. In the face of strong industry lobbying, the EU AI Office prioritized regulatory simplicity and trade secret protection over the depth of disclosure needed for meaningful transparency.

This reflects a broader shift in Brussels: a growing appetite for simplification and a retreat from the EU’s earlier role as a rulemaker willing to set high standards for digital governance. But experience from other areas of digital policy shows that it is openness that fosters trust, enables competition, and promotes long-term innovation. Secrecy, by contrast, entrenches incumbents, shields companies from accountability, and risks slowing the very technological progress the EU hopes to accelerate. The irony is stark: by weakening transparency obligations under pressure from major AI developers, the EU risks undermining the competitiveness and legitimacy of its own AI sector.

Our blueprint remains a roadmap for a better approach. It demonstrates that robust transparency is both feasible and necessary if the EU is serious about building an AI future that aligns with fundamental rights. The EU AI Office’s template may be a first step, but it is a compromised one. The real task now is to ensure that the spirit of Article 53—transparency in the service of rights—is not lost.

A Concerning Timeline Loophole

An important caveat in the template’s rollout reveals another concerning compromise. New AI models released after August 2025 must comply with the transparency obligation immediately, but providers of models already on the market have until August 2027 to publish their data summaries. This means that the general-purpose AI systems currently dominating the market—such as GPT‑4 (deployed in ChatGPT), Claude, and Gemini—enjoy a two‑year grace period. Moreover, there is a risk that providers might attempt to classify upcoming versions, such as hypothetical “4.5” or “3.1” releases, not as new models but as incremental updates, potentially extending this grace period even further and weakening the transparency requirements.

Enforcement by the AI Office will not begin until August 2026—a full year after transparency obligations formally take effect for new models. All of this creates a window where providers may delay or minimize disclosures without immediate consequences, resulting in a confusing, staggered system where transparency requirements apply unevenly across AI models.

Looking Forward

The result is a framework that looks good in theory, but may fall short of delivering its full potential in practice. Without sufficiently detailed documentation of training data and processing methods, compounded by the timing loophole, the transparency template risks providing only limited accountability.

The immediate task is to ensure that once GPAI providers begin publishing their summaries, civil society, researchers, and supervisory bodies test whether these disclosures provide meaningful accountability or merely the illusion of transparency. We will monitor how the EU AI Office’s template is implemented in practice. As the first summaries are published, we will assess whether they truly “facilitate parties with legitimate interests, including copyright holders, to exercise and enforce their rights under Union law,” as the AI Act requires. Whenever necessary, we will advocate for improvements in future iterations of the template. And if the first wave of disclosures proves superficial, the AI Office must be ready to correct course quickly.

Zuzanna Warso