European Health Data Space and the Artificial Intelligence Act proposal: are they tangled?

by Mariana A. Rissetto

The author is (partly) funded by the TVB-Cloud Project. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 826421.

Artificial Intelligence (AI) systems are built and trained on data, and a bridge is being built at the European Union (EU) level to facilitate access to health data, in particular through the European Health Data Space. In this context, what do the European Union Artificial Intelligence Act proposal[i] (AIA proposal) and the European Health Data Space initiative (EHDS) foresee for their interplay?

This question arises in the context of a data-driven economy and a boost in AI regulation at the EU level, specifically in health & healthcare. There, the value of personal data, and of health data in particular, is skyrocketing, because such data are used to improve the diagnosis, prognosis, treatment and prediction of diseases and to enhance research and innovation.

Let’s assume that both the EHDS initiative and the AIA proposal acquire binding legal status at the EU level. Under this assumption, an interdisciplinary and multi-jurisdictional research project wants to avail itself of the access to health data facilitated by an established EHDS. Consider a project similar to the TVB-Cloud Project, a Horizon 2020 EU-funded research project that aims to develop a platform to integrate and analyse data by applying machine-learning techniques, with the goal of predicting and diagnosing neurodegenerative diseases[ii]. Such a project relies on access to health data to conduct its machine-learning research. Posed from a practical point of view, the question above therefore becomes: can a research project applying high-risk AI systems access health data through a hypothetically established EHDS, and if so, how?

Context

The EU Proposal for a Regulation on European data governance (‘DGA proposal’)[iii] was tabled in response to the regulatory and practical obstacles to the use of health data in the EU, in particular regarding the accessibility and interoperability of health data. The DGA proposal creates a European Data Space, a tool that supports the objective of creating a single market for data and foresees dedicated data spaces. One of them, the EHDS, focused on health data, is on its way to becoming a legislative proposal. As part of the preparatory work to help shape the planned legislative initiative, an open public consultation on the EHDS structure was launched; its results are still pending analysis and publication[iv].

In parallel, the AIA proposal was tabled, among other reasons, in response to the global AI boom and the need for regulation, with particular attention to high-risk AI systems.

The EHDS initiative and the AIA proposal are certainly intertwined, since health data, and access to such data, are pivotal for training, validating and testing trustworthy and well-functioning AI systems. In particular, as the AIA proposal suggests, the nascent EHDS is conceived as a space from which high-quality data can be retrieved for the purpose of developing trustworthy AI systems.

It cannot be denied that an interplay between the EHDS and the AIA proposal is foreseen. The EHDS is envisaged ‘as a system for data exchange and access which is governed by common rules, procedures and technical standards to ensure that health data can be accessed within and between Member States[…]’[v], and one of its objectives is to ‘provide access to datasets necessary to make successful use of emerging responsible, human centred artificial intelligence and machine learning techniques to drive innovation in healthcare’[vi].

This objective is reflected in Recital 45 of the AIA proposal, which states that, for the development of AI systems, certain actors should be able to access and use high-quality data. It further states that common European data spaces, such as the EHDS, will provide trustful, accountable and non-discriminatory access to high-quality data for the training, validation and testing of AI systems.

The gist of this recital is that the availability, accessibility, quality and interoperability of data are key to making this compliance-like tool work. The recital identifies the EHDS as a suitable data platform/hub to provide these features for training AI systems with health data.

Tangled to detangled

To model the interplay between the EHDS and the AIA proposal, a number of considerations should be addressed, ideally before the EHDS legislative proposal materializes.

A first consideration is the nature of recitals per se. Because recitals are a non-operative part of a regulation, there is a risk that the EHDS will not actually be used as a data hub for testing AI systems. This raises the question of whether this mere reference could turn into a dead-end interaction, to the detriment of the harmonized and trustworthy training of AI systems.
There are strong arguments in favour of an EHDS as a separate space with its own regulatory scheme, complementing the European Data Space set forth by the DGA proposal with health-specific rules. Nevertheless, how is this reflected in the AIA proposal? Should the specificity of AI systems that work with health data, and that therefore request access to health data from the EHDS, also be addressed in the AIA proposal? Is there a need for a dedicated set of rules addressing the specificity of AI systems applied to health & healthcare?
Data quality and data quantity. The AIA proposal builds on an ‘ecosystem of excellence and an ecosystem of trust for AI in Europe’[vii], and therefore the ‘availability of high-quality health data and the possibility of using, combining and reusing data from various sources in line with the EU acquis, […], are essential prerequisites for the development and deployment of AI systems’[viii]. Concerning data quality, the expectation of high-quality (health) data for the development and deployment of AI systems carries over to the EHDS, which should guarantee the same data standards if such systems are to use health data from the EHDS. Interestingly, the AIA proposal does not define the requirements for data to be considered high quality, which may create a terminological and interpretative issue for the interaction between the EHDS and AI systems. The EHDS incorporates the FAIR-ification of data/datasets in its planned governance, which does not necessarily encompass what the AIA proposal understands as high-quality health data. High-quality data are essential for AI and innovation in order to avoid discriminatory biases. To achieve this, datasets should be built from large volumes of varied data. Big data, as characterised by its three to five ‘Vs’[ix], therefore seems to be a requirement for the proper training, validation and verification of AI systems. This is, again, a conceptual and terminological issue that the AIA proposal does not address, raising the questions of when data are sufficient to count as high-quality data and what volume of data the EHDS is expected to make available.
The re-use of health data for the purposes of Recital 45 of the AIA proposal raises data protection concerns. In a nutshell, the problems identified in the Study on health data processing in light of the GDPR, conducted by the EC, include ‘low re-use of health data; fragmented implementation of GDPR for processing in health and research; cumbersome cross-border access to health data, lack of coherent mechanisms and procedures across EU; and fragmented digital infrastructures’[x]. Here, the planned consultation with the European Data Protection Supervisor might shed some light on the problems raised. A related question is whether these issues, which go beyond the scope of the EHDS and have been under discussion for many years, will be addressed before the EHDS is implemented.
As regards the planned EHDS governance in relation to AI, are specific requirements foreseen for requesting access to training data in line with Article 10 of the AIA proposal, in particular for high-risk AI systems?
AI specificity of the health & healthcare sector. It is argued that, for AI systems to be applied in a trustworthy way in the health & healthcare sector, certain concerns raised by patients and by the healthcare sector need to be put to rest. These concerns include transparency and trust, the liability of healthcare workers, the human approach in diagnosis and treatment, and the information both actors have about whether AI is involved[xi]. They diminish the trustworthiness of AI systems and could therefore be an obstacle when requesting access to health data through the EHDS. In this regard, will the EHDS governance take such concerns into account if health data are used for training AI systems?
The role of the EHDS vis-à-vis AI systems. Section 3 of the above-mentioned open consultation addresses the development and use of AI systems in healthcare and suggests measures to ‘facilitate sharing and use of data sets for the development and testing of Artificial Intelligence in healthcare’[xii] and to ‘ensure collaboration and education between Artificial Intelligence developers and healthcare professionals’[xiii]. One point in the consultation that will also bear on the interaction between the EHDS and the AIA proposal, as regards making data available for developing and training AI systems in healthcare, is the definition of the role of the EHDS[xiv].

The interplay: a way forward?

The AIA proposal envisages the EHDS as one possible data platform/hub for training, validating and testing AI systems with health data. The interplay between these two instruments (the AIA proposal and the EHDS initiative) raises preliminary considerations that should ideally be detangled before the EHDS legislative proposal is drafted.

Notwithstanding the European Commission’s efforts to address some of these considerations through the public consultation, many remain open. Data protection concerns are one example, and they go beyond the EHDS initiative and the AIA proposal themselves. The question remains how these concerns will be addressed in the context of a safe EHDS and trustworthy AI systems.