Exploring commons-based approaches to machine learning
This line of work explores the intersections of AI and openness. The release of powerful AI models under open licenses was a breakthrough moment for the open movement, signaling the emergence of a new field in which its principles are applied. The field is still nascent, with no established norms for openly sharing the different elements of the machine learning stack: data, models, and code. Addressing this gap requires governance mechanisms that uphold open sharing while addressing power imbalances and safeguarding digital rights.
The argument for open sharing of AI components is familiar to open movement activists. Key governance debates revolve around datasets and model licensing. But there is a need to better understand the benefits and risks and to explore new frameworks for sharing. There is a balance to be struck between openness and responsible use.
Our work in this area is guided by the insight that commons-based models and approaches offer a solution to this challenge.
Last week, the Commission published the AI Innovation Package to support Artificial Intelligence startups and SMEs. The measures listed in the package include facilitating access to AI-focused supercomputers, which is expected to expand the use of AI to a wider range of users, including European startups and SMEs. An article in Science|Business rightly pointed out that the plan outlined by the Commission suggests that it is pinning its hopes on private companies to keep the EU competitive in AI.
Putting faith in private actors is not sufficient. Addressing the imbalance of power and market concentration must also include investing in the development of systems that serve society and have the best interests of people and the planet at their core. At the moment, it does not seem that this approach will be applied in the field of AI, and whether we can expect any efforts to create a public option for AI in Europe remains to be seen. Some public interests, such as ensuring diversity and transparency in the datasets that train AI models, are simply not always aligned with the interests of corporations, which may favor the fastest and cheapest solutions. This is where public authorities, civil society, and the large communities of scientists and practitioners working on AI in Europe have a role to play.
Science|Business reported that German Research Minister Bettina Stark-Watzinger believes that “no state or association of states can match the investments made by large corporations like Microsoft or Google with public investments.”
This suggests that the German government is throwing in the towel and assuming that private actors are equipped, and intend, to develop digital services that serve the public interest and allow people to enjoy their fundamental rights. This approach is disappointing and, given the example of private social media platforms that fail to fulfill the role of digital public spaces, does not appear to be warranted. To put it simply, without public funding, European society won't get AI that serves the public.
Open Future is hosting an asynchronous, virtual alignment assembly for the open movement to explore principles and considerations for regulating generative AI. We hope to reach 500 participants, spread across different fields of open and coming from different regions of the world.
To discuss the work presented in “Open (For Business): Big Tech, Concentrated Power, and the Political Economy of Open AI,” we have invited its authors to join our monthly community calls, which explore the intersection of AI and the Commons.
Last week, Stanford HAI, Stanford CRFM, Princeton CITP, and RegLab released a Policy Brief on Considerations for Governing Open Foundation Models.
The brief outlines the current evidence on the risks of open foundation models (FMs) and offers recommendations for how policymakers should think about those risks. It argues that open FMs (defined by the authors as "models with widely available weights") "provide significant benefits by combating market concentration, catalyzing innovation, and improving transparency." The authors therefore conclude that "policymakers should explicitly consider the potential unintended consequences of AI regulation on the vibrant innovation ecosystem around open foundation models."
The policy brief also points out that, despite the widespread concern about the dangers of open foundation models that has dominated policy discussions, "the existing evidence on the marginal risk of open foundation models remains quite limited." The key question for understanding their impact is the marginal risk: the risk posed by open models relative to the risk posed by other models and technologies:
To what extent do open foundation models increase risk relative to (a) closed foundation models or (b) pre-existing technologies such as search engines?
Coming a few days after the final compromise on the EU AI Act, the policy brief provides further support for the Act's approach of providing targeted exemptions for open (source) AI developers. The final compromise sidesteps the policies that the authors of the brief see as particularly problematic for open AI developers, such as liability for downstream harm and licensing of model developers. As we argued in our analysis of the Act, its overall approach to open source AI development is quite sound, although there is still room for improvement in getting the details of the transparency obligations right.
Late last week, the European Commission, the Member States, and the European Parliament reached a deal on the AI Act. The current compromise combines tiered obligations with a limited open source exemption, creating a situation in which open source AI models can get away with being less transparent and less well documented than proprietary GPAI models.
As part of the CSCW 2023 conference, Alek co-organized a workshop titled “Can Licensing Mitigate the Negative Implications of Commercial Web Scraping?”. Representatives of several research institutions, Hugging Face, Creative Commons, RAIL, and Hippo AI participated in the conversation. You can read the short paper outlining the ideas behind the workshop in the ACM Digital Library.
Some experts believe that open-sourcing AI increases the risk of malicious use. In this opinion piece, we argue that calls for regulators to intervene and limit the possibility of open-sourcing AI models must consider the impact on freedom of expression.
Zuzanna and Alek gave a talk on commons-based governance of AI datasets, as part of this year’s Deep Dive on AI webinar series, organized by the Open Source Initiative. The webinars are part of OSI’s initiative to define a new standard for Open Source AI systems, in which we are participating. The talk highlighted the importance of strong standards for data sharing that should be part of a community standard for open source AI. You can watch the video here.
In this analysis, I review the Llama 2 release strategy and show its non-compliance with the open-source standard. Furthermore, I explain how this case demonstrates the need for more robust governance that mandates training data transparency.
The AI Act should allow for proportional obligations in the case of open source projects while creating strong guardrails to ensure they are not exploited to hide from legitimate regulatory scrutiny.
The AI Act should provide clarity on the criteria by which a project will be judged to determine whether it has crossed the “commercialization” threshold, including revenue.
In its third recommendation, Mozilla highlights the importance of definitional clarity when it comes to regulating open source AI systems. Here, Mozilla suggests maintaining a strict definition (one that would exclude newer licenses like the RAIL family) and clarifying which components would need to be released under an open license for a system to be considered an open source AI system. According to Mozilla, this should indicatively apply to models, weights, and training data.
Today, together with Hugging Face, Eleuther.ai, LAION, GitHub, and Creative Commons, we publish a statement on Supporting Open Source and Open Science in the EU AI Act. We strongly believe that open source and open science are the building blocks of trustworthy AI and should be promoted in the EU.
We need a more holistic approach that considers how machine learning technologies impact Wikimedia — changes to editing, disintermediation of users, and governance of free knowledge as a resource used in AI training. These changes call for an overall strategy that balances the need to protect the organization from negative impact and harms with the need to deploy new technologies in productive ways to help build the digital commons.
This article examines an example from the global women's rights movement of how organizations and institutions support local actors to participate in transnational AI governance and challenge top-down structures and mechanisms.
Today, the European Parliament's IMCO and LIBE committees adopted their joint report on the proposed AI Act. The text includes additional safeguards for fundamental rights and an overall more cautious approach to AI. In this post, we provide an in-depth analysis of the implications of the text for open source AI development.
The following piece is the first part of a case study on how Wikipedia is positioned to address the challenges of open AI development. It spells out the general argument, which will be followed by more specific suggestions on what a wikiAI mission could look like.
Establishing a regulatory framework that achieves the dual objectives of protecting open-source AI systems and mitigating the risks of potential harm is a critical imperative for the European Union, especially since open-source, publicly supported AI systems are crucial digital public infrastructure for ensuring Europe's sovereignty.
The European Union's upcoming AI Act will require adequate standards to become fully operational, and much work is required to ensure that the standardization process does not conflict with the Act's inclusion and transparency objectives.
The process will be led by the European Committee for Standardization (CEN) and the European Committee for Electrotechnical Standardization (CENELEC), both of which have been criticized in the past for the opacity of their processes. The standards must be made public, but some fear that the private sector will have too much control over the process, which could have an impact on human rights. The standards' nature and scope will also have geopolitical implications, with some calling for greater international cooperation.
Standards will be essential in enforcing the EU's AI legislation, and CEN-CENELEC will have just two years to formulate and agree on a series of AI standards.
The LAION proposal calls for a public research facility capable of building large-scale artificial intelligence models. It offers an alternative to corporate development of AI, in which responsible use is ensured in open source environments through the involvement of democratically elected institutions.
The rapid advancements in AI challenge the concept of openness on the internet. Companies use publicly available data to their advantage, frequently disregarding both the concerns and welfare of other parties, such as artists and content creators, and the impacts of the tools they make available. There is a growing realization that the […]
The Future of Life Institute published an open letter asking for a moratorium on generative AI development. Yet social harms caused by AI will not be addressed in this way. Instead, commons-based governance of existing AI systems is needed.
Natali Helberger and Nicholas Diakopoulos have published an article titled "ChatGPT and the AI Act" in the Internet Policy Review. The article argues that the AI Act’s risk-based approach is not suitable for regulating generative AI due to two characteristics of such systems: their scale and broad context of use. These characteristics make it challenging to regulate them based on clear distinctions of risk and no-risk categories.
Helberger and Diakopoulos propose looking for inspiration in the Digital Services Act (DSA), which lays down obligations on mitigating systemic risks. A similar argument was made by Philipp Hacker, Andreas Engel, and Theresa List in their analysis.
Interestingly, the authors also point out that providers of generative AI models are currently making efforts to define risky or prohibited uses through contractual clauses. While they argue that “a complex system of private ordering could defy the broader purpose of the AI Act to promote legal certainty, foreseeability, and standardisation,” it is worth considering how regulation and private ordering (through RAIL licenses, which we previously analyzed) can contribute to the overall governance of these models.
The Collective Intelligence Project has published a new working paper by Saffron Huang and Divya Siddarth that discusses the impact of Generative Foundation Models (GFMs) on the digital commons. One of the key concerns raised by the authors is that GFMs are largely extractive in their relationship to the Digital Commons:
The dependence of GFMs on digital commons has economic implications: much of the value comes from the commons, but the profits of the models and their applications may be disproportionately captured by those creating GFMs and associated products, rather than going back into enriching the commons. Some of the trained models have been open-sourced, some are available through paid APIs (such as OpenAI’s GPT-3 and other models), but many are proprietary and commercialized. It is likely that users will capture economic surplus from using GFM products, and some of them will have contributed to the commons, but there is still a question of whether there are obligations to directly compensate either the commons or those who contributed to it.
In response, the paper identifies three proposals for dealing with the risks that GFMs pose to the commons. Read the full paper here:
The RAIL licenses are gaining ground, but permissive sharing is still the predominant norm governing the sharing of ML models on huggingface.co. This analysis aims to understand how licenses are used by developers making ML model-related code and/or data publicly available.
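As an illustration of how such an analysis can be approached, here is a minimal sketch that queries license metadata through the `huggingface_hub` Python library. The specific license tags used below are illustrative assumptions, not the taxonomy used in the analysis itself:

```python
# Minimal sketch: surveying license usage on huggingface.co.
# Assumes the huggingface_hub library is installed (pip install huggingface_hub).
# The license tags below are illustrative examples, not an exhaustive taxonomy.
from collections import Counter

from huggingface_hub import HfApi

api = HfApi()
counts = Counter()

# The Hub exposes licenses as tags of the form "license:<id>", which can be
# passed to list_models() as a filter. Each query is capped to keep it cheap.
for license_tag in ["apache-2.0", "mit", "openrail", "bigscience-openrail-m"]:
    models = api.list_models(filter=f"license:{license_tag}", limit=1000)
    counts[license_tag] = sum(1 for _ in models)

for tag, n in counts.most_common():
    print(f"license:{tag}: {n} models (capped at 1,000 per query)")
```

A fuller analysis would iterate over all models and read the license field from each model's metadata, rather than sampling a fixed list of tags.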
So far, none of the approaches to open source AI systems in the AI Act addresses the concerns about chilling effects on open source AI development. The Parliament still has the opportunity to address these concerns without jeopardizing the AI Act’s overall regulatory objective by leveraging the inherent transparency of open source, writes Paul Keller.
The launch of BLOOM, an open language model capable of generating text, and the related RAIL open licenses by BigScience, together with the launch of Stable Diffusion, a text-to-image model, shows that a new approach to open licensing is emerging. In Notes on BLOOM, RAIL, and openness of AI, Alek outlines how AI researchers are challenging established ways of understanding openness as they aim to enforce their vision of not just open, but also responsible AI.
Instead of analyzing the functioning of image generators through the lens of copyright, we should ask ourselves a normative question: why would we want copyright to apply to the visual output of these generators?