Last week, Spawning launched source.plus, a platform for “curating, enriching and downloading non-infringing media collections in bulk for AI training.” This is a significant step in addressing a host of issues with AI training datasets, such as LAION or face recognition training datasets.
The aim of this experimental platform is to demonstrate that licensed content is not the only viable solution:
This means the most conscientious developers and most affected communities are often on the sidelines of this rapidly developing field, whereas these are the very groups that need to be steering its evolution, and they too should be able to benefit from participation with AI.
Its real value lies not just in the volume of aggregated media files but in how they are curated and governed. Besides serving as an interface to established collections, it introduces additional mechanisms – many of which we have proposed in our recent white paper, Commons-based governance of data sets for AI training. For example, source.plus is the first collection to offer an “opt-out” mechanism. Spawning is also planning to introduce value-sharing mechanisms, including paid collections of in-copyright works and a donation mechanism that supports cultural heritage institutions.