“Sufficiently detailed summary” — v2.0 of the blueprint for GPAI training data

In June, together with the Mozilla Foundation, we published a policy brief on the AI Act’s new requirement for providers of general-purpose AI (GPAI) models to disclose information about the content used to train them. The brief included a proposed template for the “sufficiently detailed summary” of the training data.

Today, we are presenting a revised version of the blueprint, which has been further developed and refined based on the feedback we received following the publication of the policy brief and a workshop with experts from industry, academia, and civil society in September 2024.

The core purpose of the blueprint remains the same: to provide meaningful and comprehensive information that enables parties with legitimate interests to exercise and enforce their rights. We have, however, changed the structure and wording of the template to make it clearer for potential users and to ensure that it is viable in practice.

One of the main changes concerns the relationship between data sources, meaning the origins of the data, and datasets, meaning the processed and filtered collections of data derived from those sources. In the new version of the template, we have strengthened the connection between these two categories and suggest that GPAI providers disclose the specific data sources for each dataset.

We hope that this revised blueprint will help inform the AI Office’s work and serve as a valuable contribution to the consultations on the Code of Practice that outlines rules for general-purpose AI providers.

View the blueprint