This research report was written by students of the Glushko & Samuelson Information Law and Policy (ILP) Lab at the Law Faculty of the University of Amsterdam in collaboration with Open Future. In April of this year, Open Future approached the ILP Lab to take a closer look at the legal status of the datasets produced by the European participants in the Google Books project.
This request stems from the observation that, as generative AI emerges as a new technological paradigm, Google’s exclusive access to a large part of the European public domain is likely to give it a significant competitive advantage. Such an outcome is difficult to reconcile with both the public domain status of these collections and policies aimed at providing a level playing field for smaller and European AI developers. Against this background, the report analyses the exclusivity clauses and their relationship to the relevant provisions of the 2019 Open Data Directive that seek to limit such arrangements.
Under these agreements, both Google and the European libraries can use digital copies of the books. The agreements contain a clause restricting the commercial use of these digital copies by others. These clauses are often agreed for up to 15 years, so that for the duration of this period, i.e. until 2025 at the latest, Google and the libraries are the only parties with unrestricted use of the copies.
Taken together, the digitized books form a large digital dataset drawn from the collections of different European libraries. Through these clauses Google has exclusive access to this dataset, which has put it in an advantageous, if not monopolistic, position to use the dataset for training Artificial Intelligence (AI) models.
These exclusivity clauses are potentially problematic, especially in light of EU law. In particular, the 2019 Open Data Directive (ODD) explicitly regulates these types of clauses. According to this Directive, an exclusivity clause may in itself be necessary for a private partner to recoup its investment. However, recital 49 states that the period of exclusivity should be limited to as short a time as possible in order to comply with the principle that public domain material should stay in the public domain once it is digitized. This period should not exceed 10 years, and if it does, it should be subject to review.