The use of openly licensed photos of faces for the purpose of training AI facial recognition systems has been raised in recent years as one of the controversial use cases for Creative Commons-licensed content.
Since the case received media attention in 2019, it has often been cited as an example of the inherent conflict between openness and privacy, and of the extraction of value from the commons by corporations.
With AI_Commons, we explored how AI training datasets and openly licensed works included in those datasets can be better governed and shared as a commons.
The case created an opportunity to ask essential questions about the challenges that open licensing faces today: privacy, exploitation of the commons at massive scale, and unexpected and unintended uses of openly licensed works.
As part of this activity, we also commissioned Adam Harvey to conduct a study on the use of Creative Commons licenses for AI training datasets and Selkie Study to conduct research on the use of openly licensed photographs and machine learning. Furthermore, Aniek Kempeneers conducted a study of design solutions for the case as her MSc graduation project in the DCODE Labs at the Delft University of Technology. In addition, we published an in-depth white paper on understanding the implications of face recognition training with CC-licensed photographs.
AI_Commons ended with the publication of our report “AI_Commons – Filling the governance vacuum on the use of information commons for AI training.” The report summarizes our findings and offers recommendations for commons-based governance of AI datasets.