A community-based approach for high-value datasets

On 24 May, the Commission opened a public consultation on the availability of public datasets. Its aim is to define a list of high-value datasets to be made publicly available for reuse under the rules set out in the Open Data Directive. In line with this Directive, the purpose of opening up these datasets is

“to ensure that public data of highest socio-economic potential are made available for re-use with minimal legal and technical restriction and free of charge”. (draft recital 2)

The release of these high-value public datasets builds on over a decade of Open Data policies, but it should also be read in the context of the European strategy for data, which aims to foster access to and reuse of data in Europe. If adopted, the implementing regulation will identify a list of high-value datasets to be shared as open data, increasing data availability in the Union.

In light of this prospect, it is worth considering an alternative way of conceptualizing high-value datasets that has been proposed in India. Following the publication of the 2020 Report by the Committee of Experts on Non-Personal Data Governance Framework, the Indian government began developing a “community-based approach” that treats a series of high-value datasets not as open data but as public goods, managed by data trustees on behalf of the communities whose collective rights are represented in those datasets.

As such, the Indian framework does not only try to realize the highest socio-economic potential of data; it also attempts to maximize social value and to rethink the role that data can play in our societies by embedding public interest considerations in the identification and use of high-value datasets. In doing so, it broadens the scope of high-value datasets beyond public data to include non-personal data held by the private sector.

A community-based approach to high-value datasets

The 2020 Report defines a community as “any group of people that are bound by common interests and purposes and involved in social and/or economic interactions” (p. 16). On the basis of this broad definition, the Report grants communities two rights so that they can reap the benefits of non-personal data processing. First, a community has the right to derive economic and other value from data and to maximize its benefits; second, it has the right to eliminate or minimize the harms that can arise from data processing and sharing.

To realize these rights, the 2020 Report explicitly defines data as a non-rivalrous resource rather than a private asset, “where the value of data may be consumed by several organizations and communities, without degrading its value to the relevant community” (p. 16). It also identifies two central mechanisms for serving community interests: high-value datasets and a data trustee.

A high-value dataset is defined as “a dataset that is beneficial to the community at large and shared as a public good” (p. 18). Such datasets are deemed to exist in areas that benefit the community, including, but not limited to, policymaking, job creation, business creation, research and education, innovation, poverty alleviation, financial inclusion, and healthcare. A high-value dataset is established through data sharing for a “public good purpose”, under which data may be requested for community uses. High-value datasets aggregate both private and public sector data, as long as such data support societal objectives that benefit the community at large.

Each dataset is curated by a data trustee, a public sector or non-profit organization tasked with the “exercise of the rights of the community over non-personal data collected in these high-value datasets” (p. 17). This responsibility translates into a threefold duty of care: the data trustee must ensure, first, that high-value datasets are used only in the interest of the community; second, that no harm can arise from the re-identification of users’ non-personal data; and, third, that redressal mechanisms are available to the community in case of harm. These duties mirror the two rights the community enjoys over its data.

To compile and update datasets, the expert group grants the data trustee the unique prerogative of requesting data from public or private sector organizations, as long as such data is relevant to a high-value dataset. In turn, private and public sector organizations must provide non-discriminatory access to that data. Likewise, the trustee is obliged to give access to any organization registered in India, provided it is not an individual.

European communities for European data governance?

A community-based approach to data governance is an important framework that could be implemented as one of the governance mechanisms of the European strategy for data. The Indian approach differs from the European framework in that it provides conditions for treating not only public data as public goods, while empowering users with collective rights to ensure that they benefit from data processing. Yet the two need not be seen as opposed; they can be complementary, especially since the European strategy for data aims to reconsider the role that data should play in fulfilling societal objectives and therefore touches on important questions of value distribution.

It is on this point that the European approach is, for now, failing to deliver. As Maximilian Grafenstein recently argued, the EU is struggling to achieve its data governance ambitions because of an inherent difficulty in reconciling its traditional free-market approach with so-called “data sharing prescriptions”, that is, measures that simply strive to make more data available without questioning underlying market dynamics. The European approach to high-value datasets fits squarely within this critique: while open data enables greater access to and reuse of data, experience with this model shows that it often does not shift the underlying power dynamics. By contrast, the Indian approach focuses on maximizing societal value by rethinking, through a community-based approach, the role that data can play in our societies.

The Commission’s ambition to develop a novel European way of data governance requires not only a market blueprint but also the establishment of communities that can reap the benefits of data sharing and use. The Indian approach provides a framework that helps maximize the collective benefits of data access and reuse in the public interest. In the European context, this framework could be applied to data intermediaries and data cooperatives. A clear community-based approach, such as the Indian framework, would offer a model for the collective governance of data rights while avoiding the recentralization of market power (an issue I analyzed earlier in this opinion on the Data Governance Act).