Eryk Salvaggio, at Tech Policy Press, writes about generative AI as a disruptive force for creative work. The underlying, unique driver, says Salvaggio, is that AI models produce synthetic works from “vast streams of creative writing, photographs, and drawings shared online.” At Open Future, we also believe that there is a need to focus not just on the models but also on the sources of AI training data – the content that machine learning technologies feed upon.
Salvaggio’s piece is also relevant for another reason. In his analysis of the impact of generative AI on creativity, he quickly focuses on the field of free culture enabled by Creative Commons licenses. Salvaggio makes a crucial observation that:
“Copyright is insufficient to understand these issues. We also need to consider the role of data rights: the protections we offer to people who share information and artistic expression online.”
In our AI_Commons work, we reached similar conclusions – although, in that case, our focus was on photographs seen not just as creative works but also as content that expresses personal data about a person. Salvaggio rightly notes a tension between stewarding a commons that can be accessed and reused and ensuring data rights, since protecting those rights quickly leads to limiting access.
In the final report for our AI_Commons research, we argued that the case suggests a need to shift from the by-now traditional approach to open sharing toward more complex frameworks based on the ideas of the commons. These would aim to offer stronger protections while retaining the spirit of sharing.
Over the last year, we observed this shift happening, with initiatives like the Responsible AI Licenses (RAIL) or mechanisms for opting out of AI training datasets. These initiatives are close in spirit to the by-now traditional vision of openness. For example, creators of RAIL licenses position their project in relation to open science and open source, and Spawning.ai frames its vision in the context of free culture.
This could be called the post-copyright moment for proponents of different forms of sharing, in different fields of human work and creativity.
With our Data Commons Primer, we proposed this kind of approach in the context of data governance: it encompasses traditional tools of openness, such as free licenses, but adds other mechanisms that are not part of the copyright activism toolkit.
Salvaggio agrees on this point, noting that we need to re-evaluate these approaches and investigate a deepening tension between open access and consent. Underpinning this re-evaluation is a growing understanding of the specific impact of generative AI models on the commons: they are technologies that depend on the commons but, simultaneously, can have a chilling effect on contributions or, even worse, “dilute” the commons with low-quality, synthetic content. This forces stewards of the digital commons to consider one more novel distinction that falls beyond the vocabulary offered by copyright: human-made, “genuine” content versus synthetic media.
Salvaggio ends his piece by proposing elements of a new legal framework to policymakers. These include opt-out mechanisms on content-sharing platforms, affirming data rights (alongside copyrights), and empowering data consent. Yet governance of generative AI, as it impacts the commons, should not just be a matter of state regulation – Creative Commons licensing, to which Salvaggio refers, is an example of effective, voluntary self-governance. There is a growing need for a new set of community-based principles and a governance framework for the digital commons, which together combine the achievements of free culture with care for other rights and balance sharing with consent.