Filling the governance vacuum on the use of information commons for AI training
January 12, 2023

This publication is the final report from our AI_Commons activity, conducted in 2021-2022. The report summarizes our findings and offers recommendations for commons-based governance of AI datasets.

The use of openly licensed photos of faces for training AI facial recognition systems has been raised in recent years as one of the controversial use cases for Creative Commons-licensed content.

The case created an opportunity to ask essential questions about the challenges that open licensing faces today: privacy, exploitation of the commons at massive scale, and unexpected or unintended uses of openly licensed works.

Our AI_Commons work was an exploration of how AI training datasets, and openly licensed works included in those datasets, can be better governed and shared as a commons.

As part of this activity, we commissioned Adam Harvey to conduct a study on the use of Creative Commons licenses for AI training datasets, and Selkie Study to conduct research on the use of openly licensed photographs and machine learning. In addition, Aniek Kempeneers conducted a study of design solutions for the case as her MSc graduation project in the DCODE Labs at the Delft University of Technology. We also published an in-depth white paper on the implications of face recognition training with CC-licensed photographs. (See the full timeline of this activity.)


The authors want to thank the experts who contributed their ideas and feedback to this research project: Peter Cihon, Jennifer Ding, Carlos Muñoz Ferrandis, David Kanter, Jennifer Lee, Mike Linksvayer, Ben MacAskill, Roger MacDonald, Jacob Rogers, Cari Spivack, Paul Stacey, Barry Threw, Luis Villa, Kat Walsh.

Alek Tarkowski
Zuzanna Warso