A Public Option for AI Development

The European Union (EU) is at the forefront of regulatory efforts to limit the dominance of large technology companies, as evidenced by the Digital Services Act, the Digital Markets Act, and the proposed AI Act. While such regulatory interventions are undoubtedly important to address power imbalances and protect people from technological harm, true democratization of AI development will require a more comprehensive approach to technology governance.

Artificial intelligence is widely regarded as the next frontier of market concentration in the internet economy. As AI applications continue to reshape industries and society, the current trajectory reveals a critical bottleneck: reliance on private infrastructure.

Large technology companies, many of which have been targeted by the EU regulations listed above, have disproportionate control over resources critical to AI development. These resources include computing power, data storage capabilities, datasets, and the products and services into which AI can be integrated. This dominance contributes to a landscape where access is limited, benefits accrue to a select few, and the shaping of the technology is primarily driven by corporate interests. In this situation, society acts merely as a consumer of technologies and services that are often designed without regard to its best interests.

This reliance on private infrastructure is a significant barrier to the democratization of AI. Breaking away from this dependency on Big Tech is critical to ensuring Europe’s digital sovereignty and fostering a more inclusive and diverse AI landscape.

What needs to be done?

To tackle these issues, the EU must make strategic investments in the resources and technologies needed to develop AI. These investments should aim not only to reduce current market concentration but also to enable a broader range of actors to contribute to and benefit from advances in AI technology. The goal should be to empower these diverse stakeholders to shape the future trajectory of AI development, ensuring that AI not only avoids harming society but also meets societal needs, and that its benefits are widely shared. Aligning AI advances with broader goals of social progress and sustainability requires public investment in this technology rather than leaving its fate in the hands of private companies.

The European Commission has responded to this challenge, outlining its ambitions in the AI Innovation Package to support artificial intelligence startups and SMEs, presented in January 2024. The interventions and support actions outlined in the Communication on EU AI Start-Up and Innovation, which is part of the Package, address three key bottlenecks: data, computing capacity, and talent. The Commission promises additional investment in computing capacity through the creation of “AI factories,” building on the existing EuroHPC supercomputing facilities, and mobilizing support for startups working on generative AI through the Horizon Europe program. However, support for the startup ecosystem is not enough to reduce dependency on large technology companies, as the exit strategy for many of these startups continues to be acquisition by Big Tech. To address this challenge, the EU’s efforts must focus on support for independent open source AI research and the development of large-scale artificial intelligence models designed to address pressing societal challenges.

The AI Innovation Communication identifies a number of key initiatives, such as the creation of an Alliance for Language Technologies European Digital Infrastructure Consortium and a commitment from the European institutions to provide language resources. These initiatives should be implemented in the form of datasets governed as Digital Commons, meaning that they should be shared in the public interest, with democratic and collective oversight. It will also be important to develop mechanisms that ensure a fair “give back” to the creators, rights holders, and communities involved in the creation of these resources.

To handle the data bottleneck, the EU should support the creation of trusted, commons-based datasets for AI. The relative scarcity of openly available training datasets currently makes it difficult for independent open source AI developers to compete with Big Tech, which often has access to vast amounts of proprietary data in addition to data scraped from the public internet.

Europe’s opportunity

Building on the initiatives outlined in the AI Innovation Package, the next European Commission should focus on building commons-based datasets that can be used for training large-scale artificial intelligence models designed to address pressing societal challenges. This work should leverage the Common European Cultural Heritage Data Space and focus on ensuring that Europe’s rich linguistic and cultural heritage can feed into the development of open source AI models. A first step in this direction would be to open up the vast collections of public domain books held by libraries across the EU that were digitized as part of the Google Books project (and are currently exclusively available to Google to train its AI systems). However, the ambition needs to go further and explore ways to make more recent, in-copyright works accessible while at the same time creating revenue streams from commercial users of such datasets to support their maintenance and ensure compensation to participating rights holders.