Last year, we published—together with Europeana—an impulse paper on Publishing Cultural Heritage Data in the Age of AI, which at its core proposed a conditional access framework. We published this paper as a discussion starter. Two recent pieces—Dave Hansen’s Library and Archives 101: AI and the False Promise of Control and Farzaneh Badii’s No One Should Control The Internet After AI: Freedom To Build Cleopatra GPT, both of which cite our work in their opening sentences—engage with the ideas we put forward and raise concerns that deserve a response. This post responds to those concerns, corrects some mischaracterizations of conditional access models, and works toward a clearer articulation of what conditional access should mean in the cultural heritage context.
Both pieces are at their core critiques of conditional access, though they engage with it differently. Hansen’s central argument is that conditions on access that cannot reasonably be met are not conditions at all—they are denials of access in disguise. He develops this through a close reading of the University of Virginia (UVA) Archival AI Protocol, which requires, among other things, that AI systems demonstrate training-stage provenance tracing and accept an institutional “Right to Stop”—the ability to demand decommission of a model after the fact. Hansen convincingly argues that such requirements cannot be met by any current or likely future technology, and that framing them as access conditions rather than prohibitions is intellectually dishonest. More broadly, he warns that libraries and archives risk remaking themselves as gatekeepers in the mold of commercial rights holders, adopting the “consent, credit, and compensation” logic of the publishing industry rather than their own foundational values of open access and free inquiry. Access restrictions designed to constrain large corporations, he argues, will in practice fall hardest on smaller, less-resourced actors.
Badii’s critique operates on similar terrain but engages more directly with our impulse paper. Her core claim is that conditional access is a form of control, and that control over data infrastructure is incompatible with the open internet values that have historically made cultural heritage data useful to the world. She argues that authentication requirements, contractual frameworks, and institutional gatekeeping create capacity barriers that fall disproportionately on actors outside well-resourced institutional contexts—researchers in the Global South, independent developers, or small organizations without legal departments. To illustrate this, she constructs a hypothetical actor, “Marwa,” a researcher building an AI application with limited resources, and argues that conditional access frameworks would exclude her. Her preferred alternative is purely technical and load-based access management: rate limiting, throttling, and similar measures that are neutral as to user identity and purpose.
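Badii's preferred mechanism, load-based access management that is blind to user identity, is commonly implemented as a token bucket rate limiter. The sketch below is a minimal, hypothetical illustration of that idea (the class, parameters, and values are ours, not drawn from any system Badii describes): every client is throttled by request volume alone, with no questions asked about who they are or what they intend to build.

```python
import time

class TokenBucket:
    """Identity-neutral rate limiter: every client is treated identically,
    regardless of affiliation, purpose, or commercial status."""

    def __init__(self, rate_per_sec: float, capacity: int):
        self.rate = rate_per_sec       # tokens refilled per second
        self.capacity = capacity       # maximum burst size
        self.tokens = float(capacity)  # start with a full bucket
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill tokens proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        # Serve the request only if at least one token is available.
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate_per_sec=5, capacity=10)
print(bucket.allow())  # True: the first request fits within the burst capacity
```

The design choice worth noting is that nothing in this mechanism can distinguish Marwa from a hyperscaler's crawler; it limits load, not actors, which is precisely why Badii prefers it and why it cannot, on its own, address asymmetries in extraction capacity.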
Notably, both authors point to Wikimedia Enterprise (WME) as a preferable model—one that is, structurally, a conditional access product that differentiates between user categories, charges large commercial actors, and offers preferential terms to public interest users. Badii does draw a distinction: she accepts WME-style differentiation because it provides a genuinely new service layer while preserving open access, and objects to differentiation that merely gates existing public content without adding new infrastructure. This is a meaningful and useful distinction—one we take as a design requirement rather than a criticism, and one that our proposal is explicitly built to satisfy. The real disagreement is therefore not about whether differentiated access is acceptable in principle, but about whether the conditions proposed are legitimate and whether the infrastructure is genuinely new.
Both pieces treat conditional access as a restriction on existing baseline access to collections—a departure from open access norms. This mischaracterizes what we are proposing. Maintaining open access to collections is not in tension with a conditional access framework; in the framework we propose, it is a necessary precondition. What we are proposing is a governance framework for a new layer of data infrastructure—dataset-level bulk exports, high-throughput APIs—that institutions are building, or being encouraged to build, in response to growing demand from AI developers, at significant cost on top of their existing open collections. Attaching conditions to new infrastructure is not the same as restricting existing access.
In practice, this means that conditional access frameworks operate across two distinct layers. The first layer is baseline access: browsing, search, and rate-limited APIs that remain open to everyone, with no identification, contracts, or conditions attached. The second layer is dedicated bulk access infrastructure, such as dataset-level exports and high-throughput APIs, and it is only at this layer that conditions apply.
This distinction shows where Badii’s critique misses its target. Her “Marwa” is constructed as simultaneously a low-resource individual developer and someone engaged in large-scale data collection. In reality, these two profiles do not coexist in the same actor. A technically competent individual working at a modest scale would have access to rate-limited APIs—the same ones available to everyone—requiring no institutional affiliation, no contract, and no legal review. Conditional access, in the sense of conditions that could exclude her, simply does not apply at this layer.
The only version of Marwa who encounters the bulk access layer is one who needs data volumes that exceed what any individual without significant compute and infrastructure can actually use. At that point, she is no longer the figure Badii is invoking: an actor operating at this scale has, by definition, the engineering capacity, the compute infrastructure, and the institutional backing that makes engaging with conditionalities feasible.
A carefully designed conditional access framework should be able to accommodate both profiles on their own terms—and that is precisely what the two-layer structure is designed to do.
In this sense, Badii’s Marwa hypothetical, despite its structural weaknesses, serves a useful purpose: it is a reminder that any framework that claims to serve public interest users must be able to demonstrate, concretely and operationally, that a low-resource individual developer encounters nothing more than a rate-limited API—and that the public interest tier of any bulk access product is genuinely accessible without institutional backing or legal resources. It also confirms that the right test for conditional access is whether new infrastructure is being provided, not whether existing public content is being gated—a distinction our two-layer model is explicitly built around. We return to what this requires in the final section.
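The two-layer structure described above amounts, in code terms, to a simple routing decision: low-volume requests are served by the open, identity-neutral layer, and only bulk requests ever encounter conditions. The sketch below is purely illustrative; the threshold, tier names, and function are hypothetical assumptions of ours, not part of any existing institutional API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold: requests below this volume never touch the
# conditional layer and require no identification at all.
BULK_THRESHOLD_RECORDS = 100_000

@dataclass
class AccessRequest:
    records_requested: int
    credential: Optional[str] = None  # only relevant at the bulk layer

def route(request: AccessRequest) -> str:
    # Layer 1: open baseline access, identity-neutral and rate-limited only.
    if request.records_requested < BULK_THRESHOLD_RECORDS:
        return "open-api"
    # Layer 2: bulk access infrastructure. Conditions apply here, including
    # a free public interest tier for research institutions, CSOs, and
    # open source projects.
    if request.credential in {"public-interest-tier", "commercial-tier"}:
        return "bulk-export"
    return "bulk-access-requires-registration"

print(route(AccessRequest(records_requested=500)))  # open-api
```

Under this sketch, the low-resource Marwa never leaves the first branch, which is the operational test the hypothetical usefully sets.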
The main problem with Hansen’s critique of conditional access is a different one. His critique rests on an understanding of the current situation as continuous with the historical role and situatedness of libraries and, by extension, cultural heritage institutions. But the situation in which cultural heritage institutions find themselves in the age of AI is the result of a break in the very historical continuum that underpins Hansen’s analysis. Libraries and other cultural heritage institutions have not previously faced systematic, large-scale extraction of value from their collections by a small number of the world’s most capitalized companies—a dynamic for which the historical experience of public libraries offers no real precedent. This means that the pre-AI values framework cannot simply be applied unchanged—not because the values are wrong, but because the situation is materially different from what it was before.
Hansen acknowledges the practical burden that large-scale AI extraction places on library infrastructure—he describes scraping bots as effectively conducting DDoS attacks and notes the logistical and financial difference between an undergraduate requesting a few books and an AI company seeking access to thousands of works. But he stops short of framing this as a structural asymmetry that his values-based argument needs to account for. Rather than asking which access model best serves open access values under conditions where a small number of extraordinarily well-resourced actors can extract value from public collections at industrial scale, he treats the practical burdens as a technical infrastructure problem and historical library values as settling the normative question.
But establishing that libraries should support open access and free inquiry is the starting point for deciding how to respond to structural asymmetry in AI data extraction—not a substitute for that analysis.
On several of the concerns he raises about conditional access, Hansen is correct—and acknowledging this helps clarify what a well-designed framework should look like.
Hansen is right to point out that commercial/non-commercial is the wrong axis for differentiation. The better criterion is organizational scale and capacity. Large commercial actors pay; research institutions, NGOs, and open source projects access the same product under permissive or free standardized terms.
He is also right that good actors comply and bad actors do not. But this is problematic primarily in models based on contractual restrictions, and less so when it comes to infrastructure-based differentiation. Wikimedia Enterprise demonstrates this in practice: once a well-structured, reliable bulk access product exists, commercial actors use it rather than scrape—not out of goodwill but because purpose-designed access mechanisms are more efficient than scraping at scale.
In summary, Hansen’s concerns point to two concrete design principles for any conditional access framework: differentiation by organizational scale and capacity rather than by commercial intent, and standardized published access tiers rather than discretionary case-by-case negotiation. A framework built on these principles avoids becoming the gatekeeper that Badii and Hansen rightly warn against.
Hansen concludes his piece by stating that “asserting control, demanding compensation, and conditioning access on the institution’s ability to dictate the terms of downstream use — that is not what libraries and archives are for”. He is right that control and compensation are the wrong objectives. But this framing misses a critical distinction. Well-designed conditional access frameworks are not about control or compensation—they are about restoring reciprocity in response to a structural shift in how cultural heritage data is being extracted and used at scale. Put differently, the question is not whether institutions should control their collections but whether it is legitimate to expect those who extract disproportionate value from a shared resource to contribute to its sustainability.
In this context, it is important to recall that cultural heritage collections are a commons built through decades of public investment and stewardship. Unconditional bulk access by a small number of extraordinarily well-resourced actors facilitates a dynamic that turns these commons—together with virtually all publicly available information online—into a one-way input for private value extraction, effectively privatizing collective cultural wealth built up over decades of public investment.
This dynamic also makes clear when conditional access frameworks are merited: conditions at the bulk level are only justified where clear additional value is being created. There would be no dedicated bulk access infrastructure if there were no demand from AI companies. But satisfying that demand is not a core mission of cultural heritage institutions—it is an additional service built on top of their existing infrastructure, whose provision needs to be balanced against serving their existing users and public interest missions. It is therefore legitimate to seek contributions that sustain the creation and maintenance of such services, including the underlying digitization and data preparation work that feeds them.
At the same time, reciprocity must not result in withholding access from those who either create limited demand on infrastructure or require access as part of their public interest missions. Baseline access at the first layer must remain open. Any bulk access product at the second layer must include a public interest tier that is genuinely free and operationally simple—standardized terms, automatic eligibility for research institutions, CSOs, and open source projects, with no legal department required. Wikimedia Enterprise has set a useful precedent here: from the outset, it granted free access to actors like the Internet Archive, treating this not as an exception but as a core feature of the model.
These principles point toward a concrete framework, and it is no coincidence that both Hansen and Badii, despite their criticisms of conditional access, point to Wikimedia Enterprise as a workable model. A conditional access framework for cultural heritage data should be organized around four design principles:

- Open baseline access is preserved: browsing, search, and rate-limited APIs remain unconditionally open to everyone.
- Conditions attach only to genuinely new infrastructure, such as dataset-level bulk exports and high-throughput APIs, never to existing public access.
- Differentiation is by organizational scale and capacity, not by commercial intent.
- Access tiers are standardized and published, including a free, operationally simple public interest tier, rather than negotiated case by case.
These principles are not untested speculation. WME shows that differentiating between user categories while keeping content openly available is operationally feasible. Whether it can generate meaningful returns at the scale of European cultural heritage data is an open question. Any attempt to answer it will require collective action among institutions rather than institution-level policies developed in isolation. But the normative case is clear: in a situation where a small number of extraordinarily well-resourced actors can extract disproportionate value from collections built through decades of public investment, unconditional openness is not a defence of the commons. Reciprocity is.