Public AI

Online Event on Public AI

May 20, 2025

Open Future and Bertelsmann Stiftung hosted an online event that brought together policymakers, industry representatives, and civil society to explore how Public AI can create more accessible and accountable technology.

May 20, 2025

May 20, 2025

White Paper on Public AI

Today's AI landscape is dominated by a few tech companies. This White Paper presents an alternative: Public AI systems built on transparency, democratic governance, and open access to critical infrastructure.

April 22, 2025

Gutting AI transparency in the name of deregulation will not help Europe

April 22, 2025

The EU's rush to simplify AI regulation risks weakening transparency rules. This regulatory shift threatens core principles needed for responsible AI development.

April 10, 2025

From AI Factories to Public Value

April 10, 2025

The AI Continent Action Plan needs a stronger vision of purposeful AI deployment if it wants to achieve more than just boost commercial AI development.

December 11, 2024

Advancing Training Data Transparency in the EU AI Act

December 11, 2024

While the EU AI Act is the most significant opportunity to advance AI transparency, this analysis argues what information should be disclosed in a "sufficiently detailed summary", who it matters to, and why trade secrets cannot be used as an excuse to undermine transparency.

November 28, 2024

November 28, 2024

AI Speaks Polish

This report analyzes the development of Polish small language models and the ecosystem in which this happens. The case studies show alternatives to large commercial models that are built as Digital Commons, and can help fight concentrations of power in AI.

November 20, 2024

From Code to Conduct: Insights from a Mozilla Morning

November 20, 2024

The Mozilla Foundation and Open Future co-hosted an event with policymakers, industry representatives, and civil society to explore how to make content used by AI more transparent.

November 4, 2024

PD12M: a fully open image training dataset with community governance

Spawning has released PD12M, a fully open dataset consisting of 12.4 million image-caption pairs. The dataset exclusively consists of public domain and CC0 licensed images that have been obtained from Wikimedia Commons, a large number of cultural heritage organizations, and the iNaturalist website. From the paper accompanying the release:

We present Public Domain 12M (PD12M), a dataset of 12.4 million high-quality public domain and CC0-licensed images with synthetic captions, designed for training text-to-image models. PD12M is the largest public domain image-text dataset to date, with sufficient size to train foundation models while minimizing copyright concerns. Through the Source.Plus platform, we also introduce novel, community-driven dataset governance mechanisms that reduce harm and support reproducibility over time.

The release of PD12M is remarkable not only given the size of the fully open dataset but also because of the holistic approach that Spawning has taken. Via the source.plus platform, Spawning provides community-based governance mechanisms. In addition, the platform also provides an exemplary level of transparency regarding the sources of the images included in the dataset. The release of PD12M is exciting not only because it builds on our ideas for a public data commons but also because Spawning sees the release of the dataset as a first step towards offering a foundational public domain image model with no IP concerns, that will help artists to fine-tune, and own, their own models on their own terms.

November 1, 2024

The Open Source AI Definition is a step forward in defining openness in AI

November 1, 2024

This week, the Open Source Initiative released its definition of open source AI. This analysis considers its significance as a standard, its limitations, and the need for a broader community norm.

October 29, 2024

Museums and AI: Balancing Innovation and Integrity

October 29, 2024

This video provocation, presented at the European Heritage Hub Forum, focuses on AI systems' implications for the role of cultural heritage institutions.

October 10, 2024

October 10, 2024

LAION vs Kneschke

The Landgericht Hamburg's decision to allow LAION to include a photographer's image in the LAION-5B training dataset empowers non-profit providers of public training datasets, which play a critical role in making AI training more transparent.

October 3, 2024

Democratizing AI for the Public Good

October 3, 2024

This policy brief considers open-source AI as a digital public good and provides recommendations for progressing toward AI democratization and accelerating the attainment of the SDGs.

September 19, 2024

“Sufficiently detailed summary” — v2.0 of the blueprint for GPAI training data

September 19, 2024

We hope that this revised blueprint will help inform the AI Office's work and serve as a valuable contribution to the consultations on the Code of Practice, outlining rules for general-purpose AI providers.

July 4, 2024

AI and the Commons: the heritage sector

July 4, 2024

At the June AI and the Commons community call, we discussed the heritage sector’s relationship to AI (generative and analytical). Our guests were Dr. Mathilde Pavis, an expert in intellectual property law, ethics, and new technologies, and Mike Weinberg, Executive Director of NYU's Engelberg Center for Innovation Law and Policy.

June 19, 2024

Alignment Assembly Report Launch

June 19, 2024

If you are interested in learning more about the outcomes of the Alignment Assembly process and would like to participate in further discussions on this topic, register to join the report's online launch.

June 19, 2024

Towards Robust Training Data Transparency

June 19, 2024

This paper and the accompanying blueprint of the transparency template that the AI office is tasked to develop is a collaborative effort of Open Future and Mozilla Foundation, drawing on input from experts.

June 13, 2024

The Impact of Share Alike/CopyLeft Licensing on Generative AI

June 13, 2024

The study explores the impact of Share Alike/CopyLeft licensing on machine learning and generative AI.

June 12, 2024

Alignment Assembly on AI and the Commons — outcomes and learnings

June 12, 2024

This report captures learnings from the Alignment Assembly on AI and the Commons, a six-week online deliberation of open movement activists, creators, and organizations about regulating generative AI.

May 21, 2024

Source.plus — an image commons for generative AI

Last week, Spawning launched source.plus, a platform for “curating, enriching and downloading non-infringing media collections in bulk for AI training.” This is a significant step in addressing a host of issues with AI training datasets, such as LAION or face recognition training datasets.

The aim of this experimental platform is to demonstrate that licensed content is not the only viable solution:

This means the most conscientious developers and most affected communities are often on the sidelines of this rapidly developing field, whereas these are the very groups that need to be steering its evolution, and they too should be able to benefit from participation with AI.

Its real value lies not just in the volume of aggregated media files but in how they are curated and governed. It is an interface to established collections, it introduces additional mechanisms – many of which we have proposed in our recent white paper, Commons-based governance of data sets for AI training. For example, source.plus is the first collection to offer an “opt-out” mechanism. Spawning is also planning to introduce value-sharing mechanisms, including paid collections of in-copyright works and a donation mechanism that supports cultural heritage institutions.

May 17, 2024

Democratic governance of AI systems and datasets

May 17, 2024

The brief outlines a policy agenda that addresses concentrations of power in AI through policies supporting democratic governance of these technologies. It was written together with other organizations by invitation from Think7, the think tank of the Italian G7 Presidency.

May 2, 2024

Common Corpus: building AI as Commons

May 2, 2024

At our April AI and the Commons community call, we heard from Pierre-Carl Langlais who talked about ways in which generative AI models can be designed and built as a Commons.

April 8, 2024

Towards a Books Data Commons for AI Training

April 8, 2024

This white paper describes ways of building a books data commons: a responsibly designed, broadly accessible data set of digitized books to be used in training AI models.

April 4, 2024

AI and the Commons: Participation in the AI Governance

April 4, 2024

During the last AI and the Commons call, we spoke to Tim Davies about including the public in AI governance. In his presentation, Tim talked about Connected by Data’s People’s Panel on AI.

March 27, 2024

Seeing like an algorithm: A closer look at LAION 5B

Researchers at Knowing Machines have published Models all the way down, a visual investigation that takes a detailed look at the construction of the LAION 5B dataset "to better understand its contents, implications, and entanglements.” The investigation provides detailed insight into the internal structure and strategies used to build one of the largest and most influential training datasets used to train the current crop of image generation models. Among other things, the researchers show that the model's curators relied heavily on algorithmic selection to assemble the model, and as a result…

…there is a circularity inherent to the authoring of AI training sets. [...] Because they need to be so large, their construction necessarily involves the use of other models, which themselves were trained on algorithmically curated training sets. [...] There are models on top of models, and trainings sets on top of training sets. Omissions and biases and blind spots from these stacked-up models and training sets shape all of the resulting new models and new training sets.

One of the key takeaways from the researchers (who, for all their critical observations, give LAION credit for releasing the dataset as open data) is that we need more dataset transparency to understand the structural configuration of today's generative AI systems, which is very much in line with what we’ve been advocating for in the context of the AI Act and will continue to push for in the implementation of the Act.

Screenshot from Models all the way down, © Knowing Machines

March 20, 2024

Common Corpus public domain data set released

A group of AI researchers coordinated by the French start-up Pleias wants to challenge the belief that you need copyrighted materials to train an LLM that competes with the models developed by leading AI companies. Yesterday, they released what has been dubbed the largest open AI training data set consisting entirely of public-domain texts. The collection is called “Common Corpus” and is available on Hugging Face for download. The resource is multilingual – besides English, it includes the largest open collections in French, German, Spanish, Dutch, and Italian, as well as collections for other languages.

Training data is a key resource for developing AI systems. Until very recently, it was commonly believed that LLMs, such as those behind popular services such as ChatGPT or Bard, could not be trained without relying on copyrighted content. If this is the case, access to high-quality data may continue to be a significant barrier for independent AI developers seeking to compete in the LLM market.

Datasets consisting only of public domain texts have significant limitations, the most important being that they miss more contemporary information because they are comprised of historical sources or older publications where copyrights have already expired. It remains to be seen whether public domain datasets can indeed compete with datasets containing more contemporary content that is protected by copyright.

March 7, 2024

AI Act fails to set meaningful dataset transparency standards for open source AI

March 7, 2024

There is an urgent need to address the issue and set a clear standard for transparency with regard to AI training and access to training datasets. The European policymakers avoided answering the questions that could help ensure openness in the context of AI development.

March 6, 2024

AI and the Commons: Open Data Commons Licences and the issues of data governance

March 6, 2024

The post summarizes the AI and the Commons Community call focused on a recent paper by Melanie Dulong de Rosnay and Yaniv Benhamou.

March 4, 2024

Friction in AI Governance: Performing Participation

March 4, 2024

In this article, Nadia takes a closer look at and debunks a few popular participation practices.

February 27, 2024

“It’s infrastructure, stupid”

February 27, 2024

The Microsoft-Mistral partnership proves that without infrastructure independence, Europe’s digital sovereignty will remain just a pipe dream.

February 16, 2024

AI and the Commons: building AI datasets for the future

February 16, 2024

At our first 2024 AI and the Commons Community Call, we were joined by Eryk Salvaggio, an interdisciplinary researcher, lecturer, and artist who works with digital media and AI.

February 2, 2024

Without public funding, Europe won’t get AI that serves the public

Last week, the Commission published the AI Innovation Package to support Artificial Intelligence startups and SMEs. The measures listed in the package include facilitating access to AI-focused supercomputers, which is expected to help expand the use of AI to a wide range of users, including European start-ups and SMEs. An article in Science|Business rightly pointed out that the plan outlined by the Commission suggests that it is pinning its hopes on private companies to keep the EU competitive in AI.

Putting faith in private actors is not sufficient. The way to address the imbalance of power and market concentration must also include investing in the development of systems that serve society and have the best interests of people and the planet at their core. At the moment, it doesn't seem that this approach will be implemented in the field of AI. Whether we can expect any efforts to create a public option for AI in Europe remains to be seen. Some public interests, such as ensuring diversity and transparency in the datasets that train AI models, are simply not always aligned with the interests of corporations who might favor the fastest and cheapest solutions. This is where public authorities, civil society and the large communities of scientists and practitioners working on AI in Europe have a role to play.

Science | Business reported that German Research Minister Bettina Stark-Watzinger believes that "no state or association of states can match the investments made by large corporations like Microsoft or Google with public investments.”

This suggests that the German government throws in the towel and assumes that private actors are equipped and have the intention to develop digital services that serve the public interest and allow people to enjoy their fundamental rights.This approach is somewhat disappointing, and given the example of private social media platforms that fail to fulfill the role of digital public spaces,, it does not appear to be appropriate. To put it simply, without public funding, European society won't get AI that serves the public.

February 1, 2024

Alignment Assembly on AI and the Commons

February 1, 2024

Open Future is hosting an asynchronous, virtual alignment assembly for the open movement to explore principles and considerations for regulating generative AI. We hope to reach 500 participants, spread across different fields of open and coming from different regions of the world.

January 11, 2024

AI and the Commons: the paradox of open (for business)

January 11, 2024

To discuss the work presented in “Open (For Business): Big Tech, Concentrated Power, and the Political Economy of Open AI," we have invited its authors to join our monthly community calls, which explore the intersection of AI and the Commons.

December 19, 2023

AI and the Commons: the Wikimedia movement

December 19, 2023

As a part of exploring the relationship between generative AI systems and the commons, we have been looking closely at the approach taken on Wikipedia.

December 18, 2023

Considerations for Governing Open Foundation Models

Last week, Stanford HAI, Stanford CRFM, Princeton CITP, and RegLab released a Policy Brief on Considerations for Governing Open Foundation Models.

The brief outlines the current evidence on the risks of open foundation models (FMs) and offers some recommendations for policymakers on how to think about the risks of open FMs. The brief argues that open FMs - defined by the authors as "models with widely available weights" — "provide significant benefits by combating market concentration, catalyzing innovation, and improving transparency." The authors therefore conclude that "policymakers should explicitly consider the potential unintended consequences of AI regulation on the vibrant innovation ecosystem around open foundation models."

The policy brief also points out that despite widespread concern about the dangers of open foundation models that has dominated policy discussions, "the existing evidence on the marginal risk of open foundation models remains quite limited. The key questions for understanding their impact are the risks posed by open models relative to the risks posed by other models (the marginal risk):

To what extent do open endowment models increase risk relative to (a) closed endowment models or (b) pre-existing technologies such as search engines?

Coming a few days after the final compromise on the EU AI Act, the policy brief provides further support for the AI Act's approach of providing targeted exemptions for open (source) AI developers. The final compromise on the AI Act sidesteps policies - liability for downstream harm, licensing of model developers - that the authors of the policy brief see as particularly problematic for open AI developers. As we argued in our analysis of the Act, the overall approach to open source AI development in the AI Act is quite sound, although there is still room for improvement by getting some of the details of the transparency obligations right.

Read the policy paper

December 14, 2023

A Frankenstein-like approach: open source in the AI act

December 14, 2023

Late last week, the European Commission, the Member States, and the European Parliament reached a deal on the AI Act. The current compromise is a combination of tiered obligations and a limited open source exemption which creates a situation where open source AI models can get away with being less transparent and less well-documented than proprietary GPAI models.

December 7, 2023

Data Governance in the Age of Generative AI

December 7, 2023

Open Future partnered with the Elliott School of International Affairs at George Washington University, which organized the event.

November 15, 2023

Friction in AI Governance: there’s more to it than breaking servers

November 15, 2023

In this article, Nadia Nadesan examines collective bargaining as an essential element of AI governance.

October 25, 2023

Falcon 180B, open source AI and control over compute

October 25, 2023

This opinion takes a closer look at how the Falcon 180B model is licensed and is a part of our exploration of the emergent standards for the sharing of AI models.

October 15, 2023

CSCW workshop on licensing and web scraping

As part of the CSCW 2023 conference, Alek co-organized a workshop titled “Can Licensing Mitigate the Negative Implications of Commercial Web Scraping?”. Representatives of several research institutions, Hugging Face, Creative Commons, RAIL, and Hippo AI participated in the conversation. You can read the short paper outlining the ideas behind the workshop in the ACM digital library.

October 12, 2023

Code is speech, and speech is free

October 12, 2023

Some experts believe that open-sourcing AI increases the risk of malicious use. In this opinion, we argue that calls for regulators to intervene and limit the possibility of open-sourcing AI models must consider the impact on freedom of expression.

September 26, 2023

Talk on Commons-based AI dataset governance for OSI

Zuzanna and Alek gave a talk on commons-based governance of AI datasets, as part of this year’s Deep Dive on AI webinar series, organized by the Open Source Initiative. The webinars are part of OSI’s initiative to define a new standard for Open Source AI systems, in which we are participating. The talk highlighted the importance of strong standards for data sharing that should be part of a community standard for open source AI. You can watch the video here.

September 15, 2023

Open Source, AI and the Paradox of Open

September 15, 2023

We agree with Widder, West, and Whittaker that openness alone will not democratize AI. However, it is clear to us that any alternative to current Big Tech-driven AI must be, among other things, open.

August 11, 2023

The Mirage of Open-Source AI: Analyzing Meta’s Llama 2 Release Strategy

August 11, 2023

In this analysis, I review the Llama 2 release strategy and show its non-compliance with the open-source standard. Furthermore, I explain how this case demonstrates the need for more robust governance that mandates training data transparency.

August 1, 2023

Mozilla position on Openness & AI in the AI Act

The Mozilla Foundation published a blog post outlining its ideas on Fostering Innovation & Accountability in the EU’s AI Act. In this blogpost they highlight our recent paper on Supporting Open Source and Open Science in the EU AI Act and make two recommendations for EU lawmakers working on finalizing the AI act that echo some of our own recommendations:

The AI Act should allow for proportional obligations in the case of open source projects while creating strong guardrails to ensure they are not exploited to hide from legitimate regulatory scrutiny.
The AI Act should provide clarity on the criteria by which a project will be judged to determine whether it has crossed the “commercialization” threshold, including revenue.

In a third recommendation, Mozilla highlights the importance of definitional clarity when it comes to regulating open source AI systems. Here Mozilla suggests maintaining a strict definition (that would exclude newer licenses like the RAIL family of licenses) and clarifying which components would need to be licensed under an open license for a system to be considered to be an open source AI system. According to Mozilla this should indicatively apply to models, weights and training data.

July 26, 2023

Supporting Open Source and Open Science in the EU AI Act

July 26, 2023

Today — together with Hugging Face, Eleuther.ai, LAION, GitHub, and Creative Commons, we publish a statement on Supporting Open Source and Open Science in the EU AI Act. We strongly believe that open source and open science are the building blocks of trustworthy AI and should be promoted in the EU.

July 7, 2023

Stewarding the sum of all knowledge in the age of AI

July 7, 2023

We need a more holistic approach that considers how machine learning technologies impact Wikimedia — changes to editing, disintermediation of users, and governance of free knowledge as a resource used in AI training. These changes call for an overall strategy that balances the need to protect the organization from negative impact and harms with the need to deploy new technologies in productive ways to help build the digital commons.

July 3, 2023

Friction and AI Governance: Institutional Intermediaries

July 3, 2023

This article examines an example from the global women's rights movement of how organizations and institutions support local actors to participate in transnational AI governance and challenge top-down structures and mechanisms.

May 11, 2023

Undermining the foundation of open source AI?

May 11, 2023

Today, the European Parliament's IMCO and LIBE committees adopted their joint report on the proposed AI Act. The text includes additional safeguards for fundamental rights and an overall more cautious approach to AI. In this post, we provide an in-depth analysis of the implications of the text for open source AI development.

May 4, 2023

How Wikipedia can shape the future of AI

May 4, 2023

The following piece is the first part of a case study on how Wikipedia is positioned to address the challenges of open AI development. It spells out the general argument, which will be followed by more specific suggestions on how a wikiAI mission could look like.

May 2, 2023

Concerns Over the Impact of the AI Act on Open-Source R&D: LAION’s open letter

May 2, 2023

Establishing a regulatory framework that achieves the dual objectives of protecting open-source AI systems and mitigating risks of potential harm is a critical imperative for the European Union. Especially since open-source, publicly supported AI systems are crucial digital public infrastructures that would ensure Europe’s sovereignty.

April 25, 2023

Clément Perarnaud on the role of standards in making the AI Act operational

The European Union's upcoming AI Act will require adequate standards to become fully operational, and much work is required to ensure that the standardization process does not conflict with the Act's inclusion and transparency objectives.

The process will be led by the European Committee for Standardization (CEN) and the European Committee for Electrotechnical Standardization (CENELEC). In the past, they have been criticized for their secrecy and lack of transparency. The standards must be made public, but some fear that the private sector will have too much control over the process, which could have an impact on human rights. The standards' nature and scope will also have geopolitical implications, with some calling for greater international cooperation.

Standards will be essential in enforcing the EU's AI legislation, and CEN-CENELEC will have just two years to formulate and agree on a series of AI standards.

April 13, 2023

LAION petitions for a European public AI mission

April 13, 2023

The LAION proposal calls for a public research facility capable of building large-scale artificial intelligence models. It offers an alternative to corporate development of AI, in which responsible use is ensured in open source environments through the involvement of democratically elected institutions.

April 7, 2023

Exploring the Intersection of Openness and AI

April 7, 2023

The rapid advancements in AI challenge the concept of openness on the internet, as companies use publicly available data to their advantage, frequently disregarding the concerns and welfare of other parties, such as artists and content creators, and the impacts of the tools they make available for use. There is a growing realization that the […]

March 30, 2023

AI is already out there. We need commons-based governance, not a moratorium

March 30, 2023

The Future of Life Institute published an open letter asking for a moratorium on generative AI development. Yet social harms caused by AI will not be addressed in this way. Instead, commons-based governance of existing AI systems is needed.

March 10, 2023

Helberger and Diakopoulos on the AI Act and ChatGPT

Natali Helberger and Nicholas Diakopoulos have published an article titled "ChatGPT and the AI Act" in the Internet Policy Review. The article argues that the AI Act’s risk-based approach is not suitable for regulating generative AI due to two characteristics of such systems: their scale and broad context of use. These characteristics make it challenging to regulate them based on clear distinctions of risk and no-risk categories.

The article is relevant to us in the context of open source, general-purpose AI systems, and their potential regulation.

Helberger and Diakopoulos propose looking for inspiration in the Digital Services Act (DSA), which lays down obligations on mitigating systemic risks. A similar argument was made by Philipp Hacker, Andreas Engel, and Theresa List in their analysis.

Interestingly, the authors also point out that providers of generative AI models are currently making efforts to define risky or prohibited uses through contractual clauses. While they argue that “a complex system of private ordering could defy the broader purpose of the AI Act to promote legal certainty, foreseeability, and standardisation,” it is worth considering how regulation and private ordering (through RAIL licenses, which we previously analyzed) can contribute to the overall governance of these models.

February 7, 2023

Generative AI and the Digital Commons

The Collective Intelligence Project has published a new working paper by Saffron Huang and Divya Siddarth that discusses the impact of Generative Foundation Models (GFMs) on the digital commons. One of the key concerns raised by the authors is that GFMs are largely extractive in their relationship to the Digital Commons:

The dependence of GFMs on digital commons has economic implications: much of the value comes from the commons, but the profits of the models and their applications may be disproportionately captured by those creating GFMs and associated products, rather than going back into enriching the commons. Some of the trained models have been open-sourced, some are available through paid APIs (such as OpenAI’s GPT-3 and other models), but many are proprietary and commercialized. It is likely that users will capture economic surplus from using GFM products, and some of them will have contributed to the commons, but there is still a question of whether there are obligations to directly compensate either the commons or those who contributed to it.In response, the paper identifies three proposals for dealing with the risks that GFMs pose to the commons.

In response, the paper identifies three proposals for dealing with the risks that GFMs pose to the commons. Read the full paper here:

Generative AI and the Digital Commons

February 7, 2023

The Growth of responsible AI licensing

February 7, 2023

The RAIL licenses are gaining ground, but permissive sharing is still the prominent norm governing the sharing of ML models on huggingface.co. This analysis aims at understanding how licenses are used by developers making ML model-related code and or data publicly available.

December 13, 2022

How will the AI Act deal with open source AI systems?

December 13, 2022

None of the approaches dealing with open source AI systems in the AI Act address the concerns related to chilling effects on open source AI development so far. The Parliament still has the opportunity to address these concerns without jeopardizing the AI Act’s overall regulatory objective by leveraging on the inherent transparency of open source, writes Paul Keller.

September 9, 2022

Notes on BLOOM, RAIL, and openness of AI

The launch of BLOOM, an open language model capable of generating text, and the related RAIL open licenses by BigScience, together with the launch of Stable Diffusion, a text-to-image language, shows that a new approach to open licensing is emerging. In Notes on BLOOM, RAIL, and openness of AI, Alek outlines the challenges to established ways of understanding open faced by AI researchers, as they aim to enforce their vision of not just open, but also responsible AI.

Notes

September 9, 2022