As a part of exploring the relationship between generative AI systems and the commons, we have been looking closely at the approach taken by the stewards of Wikipedia.
Wikipedia is operated by the nonprofit Wikimedia Foundation, and its content is curated and maintained by a global community of volunteer editors (often referred to as “Wikimedians”). As the steward of one of the world’s largest free knowledge repositories, if not the largest, the Wikimedia movement is uniquely positioned to formulate a response to emerging AI systems. Wikipedia is also a critical component of many AI training datasets.
During our latest AI and the Commons community call, we talked to Maryana Pinchuk and Mike Pham from the Wikimedia Foundation’s Future Audiences team. Their goal is to explore strategies and solutions that extend the reach of Wikipedia and, in particular, to develop new channels through which Wikimedia projects can be accessed and used. We learned that Wikimedians have adopted a set of principles guiding their work with AI. The key tenet is that knowledge is a human endeavor, and that the development of Wikipedia and its sister projects depends on human collaboration and consensus. Machine learning tools can augment the process, but AI cannot replace volunteer Wikimedians.
At a time when many conversations, including among open advocates, hinge on an implicit assumption that AI-driven solutions are inevitable, and when important actors in the broader public debate forecast the arrival of superhuman technologies, the level-headed approach adopted by the Wikimedia movement is refreshing. It creates space for innovation while acknowledging that the core of Wikimedia projects lies in building a verifiable knowledge base, not in over-experimenting with AI. After all, the hype around AI may well prove overblown, as it has with many previous emerging technologies.
The three principles that drive the Foundation’s deployment of machine learning solutions in Wikimedia projects are sustainability, equity, and transparency. Translated into requirements for specific tools, they set a high standard. In particular, the commitment to equity and transparency means that the Wikimedia Foundation needs to address knowledge gaps and biases in existing AI tools.
The Wikimedia Foundation’s principle-driven approach to deployment is well illustrated by MinT, its new machine-assisted translation tool. Its key assumption is that human editors stay in the loop. A commitment to open source led the Wikimedia Foundation to build the service on available open-source models as well as OPUS, an open corpus of freely available parallel texts. One of the project’s main objectives is to provide tools that effectively translate text into low-resource languages, fulfilling the Wikimedia movement’s principle of knowledge equity. It’s worth noting that outputs generated with the help of the tool are added back to the OPUS corpus and, as synthetic texts reviewed and edited by humans, are especially valuable as training data for those languages.
The MinT initiative demonstrates how open-source AI models can be deployed in practice and in what ways they can be useful. It comes at a time when the space of generative AI services is dominated by a small number of commercial offerings. And, true to its original mission, the Wikimedia Foundation is offering alternatives based on its vision of free knowledge for all.
In an earlier opinion piece about the Wikimedia movement and AI, I argued that the Wikimedia Foundation should “build its own generative model. One that is open but also governed democratically and designed responsibly. And one that gives Wikimedia volunteers full knowledge and control over the whole system”. The machine-learning-powered translation tools could be seen as the first steps towards such a model. It is inspiring to see the Wikimedia Foundation prioritize equity and transparency when building new machine learning tools, and even select its GPUs based on the availability of non-proprietary drivers.
Hopefully, this can lead to the creation of localized models that democratize AI, much as Te Hiku Media is doing by building sovereign Māori language models. This work also supports efforts to define standards for the AI commons, which could be grounded in open-source development principles and enriched by considerations of transparency, sustainability, and other aspects of responsible AI use and deployment.
A lot of our thinking about openness and machine learning focuses on policy and legal debates, but there are also crucial lessons to be learned from actually deploying these systems. It was fascinating to learn about the product-driven strategy adopted by the Wikimedia Foundation and the principles designed to protect its critical assumption: Wikimedia projects are ultimately human, and can be augmented and improved, but not replaced, by the outputs of generative models.
This text is inspired by a conversation with members of the Wikimedia Foundation team, which was part of our ongoing series of community calls called AI and the Commons. If you’d like to join them, email firstname.lastname@example.org.