AI and the Commons

Exploring commons-based approaches to machine learning

This line of work explores the intersections between AI and openness. The release of powerful AI models under open licenses was a breakthrough moment for the open movement, signaling the emergence of a new field in which the principles of openness are applied. The field is still nascent, with no established norms for openly sharing the different elements of the machine learning stack: data, models, and code. Addressing this gap requires governance mechanisms that uphold open sharing while countering power imbalances and safeguarding digital rights.

The argument for open sharing of AI components is familiar to open movement activists, and key governance debates revolve around dataset and model licensing. But the benefits and risks need to be better understood, and new frameworks for sharing explored: a balance has to be struck between openness and responsible use.

Our work in this area is guided by the insight that commons-based models and approaches offer a solution to this challenge.

Timeline

As part of the CSCW 2023 conference, Alek co-organized a workshop titled “Can Licensing Mitigate the Negative Implications of Commercial Web Scraping?”. Representatives of several research institutions, Hugging Face, Creative Commons, RAIL, and Hippo AI participated in the conversation. You can read the short paper outlining the ideas behind the workshop in the ACM Digital Library.
Zuzanna and Alek gave a talk on commons-based governance of AI datasets as part of this year’s Deep Dive on AI webinar series, organized by the Open Source Initiative. The webinars are part of OSI’s effort to define a new standard for Open Source AI systems, in which we are participating. The talk highlighted the importance of strong data-sharing standards as part of any community standard for open source AI. You can watch the video here.
The Mozilla Foundation published a blog post outlining its ideas on Fostering Innovation & Accountability in the EU’s AI Act. In the blog post, Mozilla highlights our recent paper on Supporting Open Source and Open Science in the EU AI Act and makes two recommendations for EU lawmakers finalizing the AI Act that echo some of our own:
  1. The AI Act should allow for proportional obligations in the case of open source projects while creating strong guardrails to ensure they are not exploited to hide from legitimate regulatory scrutiny.
  2. The AI Act should provide clarity on the criteria by which a project will be judged to determine whether it has crossed the “commercialization” threshold, including revenue.
In a third recommendation, Mozilla highlights the importance of definitional clarity when it comes to regulating open source AI systems. Here Mozilla suggests maintaining a strict definition (one that would exclude newer licenses such as the RAIL family) and clarifying which components would need to be released under an open license for a system to be considered an open source AI system. According to Mozilla, this should, indicatively, cover models, weights, and training data.
The European Union's upcoming AI Act will require adequate standards to become fully operational, and much work is needed to ensure that the standardization process does not undermine the Act's inclusion and transparency objectives. The process will be led by the European Committee for Standardization (CEN) and the European Committee for Electrotechnical Standardization (CENELEC), both of which have been criticized in the past for the opacity of their processes. The standards must be made public, but some fear that the private sector will exert too much control over the process, with potential consequences for human rights. The nature and scope of the standards will also have geopolitical implications, with some calling for greater international cooperation. Standards will be essential in enforcing the EU's AI legislation, and CEN-CENELEC will have just two years to formulate and agree on a series of AI standards.
Natali Helberger and Nicholas Diakopoulos have published an article titled "ChatGPT and the AI Act" in Internet Policy Review. The article argues that the AI Act’s risk-based approach is not suitable for regulating generative AI because of two characteristics of such systems: their scale and their broad context of use. These characteristics make it difficult to regulate them on the basis of a clear distinction between risk and no-risk categories.

The article is relevant to us in the context of open source, general-purpose AI systems, and their potential regulation.

Helberger and Diakopoulos propose looking for inspiration in the Digital Services Act (DSA), which lays down obligations on mitigating systemic risks. A similar argument was made by Philipp Hacker, Andreas Engel, and Theresa List in their analysis.

Interestingly, the authors also point out that providers of generative AI models are currently making efforts to define risky or prohibited uses through contractual clauses. While they argue that “a complex system of private ordering could defy the broader purpose of the AI Act to promote legal certainty, foreseeability, and standardisation,” it is worth considering how regulation and private ordering (through RAIL licenses, which we previously analyzed) can contribute to the overall governance of these models.
The Collective Intelligence Project has published a new working paper by Saffron Huang and Divya Siddarth that discusses the impact of Generative Foundation Models (GFMs) on the digital commons. One of the key concerns raised by the authors is that GFMs are largely extractive in their relationship to the digital commons:
The dependence of GFMs on digital commons has economic implications: much of the value comes from the commons, but the profits of the models and their applications may be disproportionately captured by those creating GFMs and associated products, rather than going back into enriching the commons. Some of the trained models have been open-sourced, some are available through paid APIs (such as OpenAI’s GPT-3 and other models), but many are proprietary and commercialized. It is likely that users will capture economic surplus from using GFM products, and some of them will have contributed to the commons, but there is still a question of whether there are obligations to directly compensate either the commons or those who contributed to it.
In response, the paper identifies three proposals for dealing with the risks that GFMs pose to the commons. Read the full paper here.
The launch of BLOOM, an open language model capable of generating text, and the related RAIL open licenses by BigScience, together with the launch of Stable Diffusion, a text-to-image model, show that a new approach to open licensing is emerging. In Notes on BLOOM, RAIL, and openness of AI, Alek outlines the challenges to established ways of understanding openness that AI researchers face as they aim to enforce their vision of not just open, but also responsible AI.