Copyright rules for AI generated visuals will determine the future of synthetic worlds

Over the past few months, following the public availability of a number of relatively sophisticated image generators^{[1]For example, DALL·E 2, Stable Diffusion or Midjourney, all of which are based on machine learning models. I am consciously avoiding describing these generators as powered by “Artificial Intelligence” throughout this text.}, there has been a resurgence of discussion about whether the output of these generators can be considered art and whether that means it should be eligible for copyright protection. If this is the case, who should own the resulting copyrights? While this is an intriguing question to anyone thinking about the evolution of copyright, it is fundamentally the wrong question to ask. Instead of analyzing the functioning of these image generators through the lens of copyright, we should ask ourselves a normative question: Why should we want copyright to apply to the visual output of these generators? And to be able to answer this question, we need to make an effort to understand the societal consequences of awarding exclusive rights to generated visuals.

Before we do this we need to understand how these generators work. On a fundamental level, they are — like all other computers — copying machines. While there is a lot of hype and awe around the new crop of image generators that can indeed generate stunning visual output in reaction to textual prompts fed into them, this does not mean that they are somehow capable of independently creating works of art.

The current crop of generators mainly assembles new visuals by combining various elements of the visuals they have been trained on^{[2]See this recent blog post by Andy Baio for an exploration of 12 Million of the 2.3 Billion Images Used to Train Stable Diffusion’s Image Generator.}. Such synthetic combinations can be extremely impressive in terms of technique, but they fail to rise to the level of independently expressing concepts. The current generators tend to function best when asked to combine concepts that are well represented in their training data, but they fail (often miserably) when asked to illustrate abstract concepts, especially when there are no prevalent pre-existing visual representations of these concepts in their training data^{[3] Given that the training data for most of these generators is essentially all images that are freely available on the internet, this replicates the difficulty human creators have in depicting abstract concepts.}.

Example 1: DALL·E 2 output in response to the prompt “an open future where there is abundant access to knowledge and culture”

Example 2: DALL·E 2 output in response to the prompt “a high-speed train in the style of Piet Mondrian”

These examples clearly illustrate that, at least with the current generation of image generators, the main determinant of the quality (or rather the coherence?) of the output is not the amount of “intelligence” ascribed to these systems, but rather the presence of relevant sets of relatively coherent visual representations within the training data.

In their current form, these generators are essentially very sophisticated probabilistic copying machines that create visual output that is derived from the visual input they have been trained on.

But copyright?!

Discussions on the copyright implications of these image generators tend to center on two different aspects: questions related to the input of these generators (both the training data and the prompts fed into them) and questions related to their visual output.

When analyzing the output of these generators from a copyright perspective, a number of questions arise: does the output of these generators qualify for copyright protection? And if so, to whom would these rights belong? To the operators of the generators (sometimes referred to as “AI whisperers”) or to the entities who have built these generators? And what about the rights of authors of protected works that have been used as training data (and that can be referenced by including “in the style of” qualifiers in the prompts fed to the generators)?

Asking any of these questions assumes that the output of these generators should be treated as works of art — after all, that is what copyright protection applies to. And while it may seem evident to most observers that these generators produce artworks — see, for example, this recent case where the output of a generator was entered into an art competition and won the first prize — we should not simply assume this.

It is almost certain that the current use of these generators, producing static visuals that illustrate blog posts such as this one or that are shared on social media because of their novelty, is not what this technology will be used for in the future. The novelty of the “Teddy bears working on new AI research on the moon in the 1980s” type of images will almost certainly diminish over time, but the technology is here to stay.

Some technology observers have pointed out that one of the most intriguing use cases for this technology will be in the fields of Virtual Reality and other immersive visual experiences, as image generators that can create convincing visuals based on text inputs will make it economically feasible to offer open-ended VR environments:

In the very long run this points to a metaverse vision that is much less deterministic than your typical video game, yet much richer than what is generated on social media. Imagine environments that are not drawn by artists but rather created by AI: this not only increases the possibilities, but crucially, decreases the costs.

This idea points to another way that we can conceptualize the output of generators, namely as digital environments instead of works of art. And, if we accept this way of understanding computer-generated visuals, it becomes clear that bestowing copyright protection on them just because they resemble works of art created by humans would be incredibly short-sighted: it would point to a future where the very environments that we spend increasing amounts of time in will be governed by an inflexible set of legal rules that would reconstitute concepts of ownership and control that do not make sense in these new spaces.

So the real question we should ask in response to the emergence of this new class of visual creations is not whether copyright applies to them, but rather whether treating them as copyrighted works could result in societal harm. After all, the copyright framework that has emerged in the past 150 years is not a natural law but rather a legal framework that has been created by humans to serve a societal objective. So the question we need to ask is which societal objective would be served by granting strong exclusive rights over the output of machines. The answer to this question may very well be that no societal objective is served by granting anyone strong, long-lasting exclusive rights over ephemeral environments that are created by sophisticated copying machines.

