Key takeaways:
- Federated learning breaks the data paradox. It makes it possible to train AI models without centralising data, allowing organisations to combine performance with privacy.
- Data stays local; knowledge is shared. Rather than collecting raw data, only model updates are exchanged, significantly reducing the risk of data breaches.
- Data protection by design is at the core. Federated learning aligns closely with GDPR principles such as data minimisation and data protection by design.
- Technical and legal challenges remain. These include securing model updates, variable data quality, and ambiguity around responsibilities between parties.
Introduction: the data paradox.
Artificial intelligence needs three ingredients: an algorithm, computing power, and data. As a rule, the more data is available, the better AI models can learn, recognise patterns, and make predictions. This has been the main driver of organisations' growing need to collect and centralise data.
At the same time, concerns are growing. Privacy risks, stricter regulation such as the GDPR, and heightened attention to data security make it increasingly difficult to simply bring large volumes of sensitive data together in one central location.
A clear paradox emerges: we need a great deal of data to build powerful AI, but we cannot, or do not want to, centralise it freely.
Federated learning offers an innovative solution. In simple terms: instead of bringing data to the model, the model is brought to the data.
In this blog post, we answer the following questions about federated learning:
- What exactly is federated learning?
- How does federated learning work in practice?
- Why is federated learning relevant from a GDPR perspective?
- Where is federated learning already being applied today?
- What are the challenges and key considerations?
What exactly is federated learning?
Federated learning is a machine learning technique in which models are trained across multiple distributed datasets, without the underlying data being shared or centralised.
The core principle is simple but powerful:
- The AI model is sent to where the data resides (for example, a device or an organisation).
- The model is trained locally on the available data.
- Only the model updates (such as updated weights or gradients) are sent back to a central server.
This is fundamentally different from classical machine learning, where all data is first collected in a central database and then used to train a model. Federated learning reverses this process, keeping data local.
How does federated learning work in practice?
Federated learning follows a cyclical, distributed training process in which multiple parties collaborate without sharing their data. A step-by-step breakdown:
- A global model is developed and shared with various participants (for example, devices or organisations).
- Each participant trains the model locally on their own data.
- The local model updates are sent back to a central server.
- The server aggregates these updates into an improved global model.
- The updated model is shared again with all participants.
This process is repeated multiple times, allowing the model to improve gradually.
Aggregation typically uses techniques such as weighted averaging, where the updates from different participants are combined into a single model without the server ever receiving the underlying raw data.
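The cycle described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production implementation: it assumes a toy one-feature linear model, synthetic data, and averaging weighted by each participant's number of samples (often called federated averaging). The function names `local_train`, `aggregate`, `federated_training`, and `make_data`, as well as all hyperparameters, are hypothetical choices made for this example.

```python
import random


def local_train(weights, data, lr=0.05, epochs=5):
    """Train the model y = w * x + b on one participant's local data only."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            error = (w * x + b) - y
            w -= lr * error * x  # gradient step for the slope
            b -= lr * error      # gradient step for the intercept
    return [w, b]


def aggregate(updates, sizes):
    """Average parameter lists, weighting each by its local sample count."""
    total = sum(sizes)
    n_params = len(updates[0])
    return [
        sum(update[i] * n for update, n in zip(updates, sizes)) / total
        for i in range(n_params)
    ]


def federated_training(participants, rounds=50):
    """Run the share / train locally / aggregate cycle for a number of rounds."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        # Each participant receives the global model and trains on its own data.
        updates = [local_train(global_weights, data) for data in participants]
        sizes = [len(data) for data in participants]
        # The server only ever sees parameters and sample counts, not records.
        global_weights = aggregate(updates, sizes)
    return global_weights


def make_data(n, rng):
    """Synthetic local dataset (assumption): y = 2x + 1 plus a little noise."""
    xs = (rng.uniform(-1, 1) for _ in range(n))
    return [(x, 2 * x + 1 + rng.gauss(0, 0.05)) for x in xs]


rng = random.Random(42)
participants = [make_data(n, rng) for n in (20, 50, 100)]
w, b = federated_training(participants)
print(f"learned slope {w:.2f}, intercept {b:.2f}")  # should be close to 2 and 1
```

In this sketch the only values passed to `aggregate` are the locally trained parameters and the size of each local dataset; the `(x, y)` records themselves are used solely inside `local_train`.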
There’s one crucial point to note: the raw data never leaves the local environment. Only derived information (model parameters) is shared.
In practice, federated learning is already being used on smartphones, in hospitals, and in IoT environments, all of which we explore further below.
Why is federated learning relevant from a GDPR perspective?
Federated learning aligns well with several core GDPR principles:
- Data minimisation: no central dataset is built up, meaning fewer personal data are collected and processed.
- Data protection by design and by default: privacy protection is built into the architecture of the system itself.
- Limitation of transfers: personal data remain local and are not systematically shared between parties.
- Reduced breach risk: there is no single central repository holding all the data, and therefore no single point of failure.
That said, federated learning is not a free pass to process personal data without further analysis or safeguards. Organisations must still carefully assess whether their application is compliant.
In many cases, a Data Protection Impact Assessment (DPIA) remains necessary. Federated learning is often deployed in high-risk contexts, such as large-scale processing, use of sensitive data (such as health data), or innovative technologies. A DPIA helps to map these risks systematically and define appropriate measures.
In addition, model updates can themselves leak information under certain circumstances, which requires additional security measures.
For example, in a so-called model inversion attack, an attacker tries to reconstruct sensitive information, such as characteristics of the training data, from the shared model parameters. Membership inference attacks can also reveal whether certain data were included in the training set. These risks call for additional safeguards, including encryption and secure aggregation.
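To give an intuition for secure aggregation, the sketch below shows one well-known idea behind it: pairwise masking. Each pair of participants agrees on a random mask; one adds it to their update and the other subtracts it, so the masks cancel out in the sum while each individual masked update looks random to the server. This is a simplified illustration under stated assumptions: updates are taken to be non-negative integers (for example, quantised weights), the names `mask_updates` and `server_sum` are hypothetical, and a single function generates all masks purely for demonstration. Real protocols derive the masks from secrets shared between each pair and also handle participants dropping out.

```python
import random

MODULUS = 2**32  # all arithmetic is done modulo this value


def mask_updates(updates, rng):
    """Return masked copies of integer update vectors using pairwise masks."""
    n = len(updates)
    dim = len(updates[0])
    masked = [list(update) for update in updates]
    for i in range(n):
        for j in range(i + 1, n):
            # Assumption: in practice only participants i and j know this mask.
            mask = [rng.randrange(MODULUS) for _ in range(dim)]
            for k in range(dim):
                masked[i][k] = (masked[i][k] + mask[k]) % MODULUS  # i adds
                masked[j][k] = (masked[j][k] - mask[k]) % MODULUS  # j subtracts
    return masked


def server_sum(masked):
    """Sum the masked vectors; every mask appears once with + and once with -."""
    dim = len(masked[0])
    return [sum(vector[k] for vector in masked) % MODULUS for k in range(dim)]


updates = [[3, 10], [5, 20], [7, 30]]  # hypothetical quantised updates
masked = mask_updates(updates, random.Random(0))
print(masked[0])           # looks unrelated to [3, 10]
print(server_sum(masked))  # the true element-wise sum of the updates
```

The server learns the sum of the updates, which is all it needs for averaging, but not the contribution of any single participant.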
Where is federated learning already being applied today?
Federated learning is no longer a future concept. It is already being used across several domains:
- Mobile devices: keyboards and voice assistants improve their performance based on user behaviour, without that data ever leaving the device.
- Healthcare: hospitals can jointly train AI models on medical data without exchanging patient records.
- Internet of Things (IoT): sensors and smart devices train models on distributed data in real time.
These applications demonstrate how federated learning can deliver concrete benefits without compromising privacy.
What are the challenges and key considerations?
While federated learning offers clear advantages in terms of privacy and data management, it also brings a number of important challenges:
- Uneven data quality: participants often hold different types and qualities of data. This can reduce the accuracy of the global model or skew its outputs, as some datasets carry more weight or are not representative.
- More complex infrastructure: setting up and managing federated systems is technically more demanding than training a model centrally.
- Security of model updates: updates can potentially leak information or be manipulated.
- Limitations in applicability: not every type of AI application is suited to federated learning. Use cases that depend heavily on centralised, consistent datasets or real-time access to all data (such as certain forms of fraud detection or global optimisation problems) are less suitable. Federated learning works better in scenarios with clearly separated data sources, such as mobile devices or distinct organisations.
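The first challenge in this list, uneven data, can be made concrete with a little arithmetic. The sketch below assumes that aggregation weights each participant by its number of samples, and uses purely hypothetical figures: two small participants and one very large one, each reporting a single model parameter.

```python
def weighted_average(values, sizes):
    """Average the values, weighting each by its participant's sample count."""
    total = sum(sizes)
    return sum(value * n for value, n in zip(values, sizes)) / total


local_values = [0.2, 0.3, 0.9]   # hypothetical parameter from each participant
sizes = [100, 100, 9800]         # hypothetical local dataset sizes

global_value = weighted_average(local_values, sizes)
unweighted = sum(local_values) / len(local_values)

print(f"weighted by data volume: {global_value:.3f}")  # approximately 0.887
print(f"simple mean:             {unweighted:.3f}")    # approximately 0.467
```

With these figures the global parameter ends up almost identical to that of the largest participant. If that participant's data is not representative, the global model inherits the skew, which is why the weighting scheme deserves attention when the collaboration is set up.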
There are also legal and organisational considerations, including:
- Agreements on security, incident management, and updates.
- Transparency towards data subjects.
- Control mechanisms within the collaboration.
- The extent to which model updates or the global model itself may qualify as personal data.
- How to assess the division of responsibilities when parties jointly determine the conditions of training.
- The practical exercise of data subjects’ rights (such as the right to access or erasure) in a federated learning context.
These governance and privacy aspects are crucial to implementing federated learning in a compliant and sustainable way.
Conclusion: federated learning is a step towards responsible AI.
Federated learning offers a promising way to develop powerful AI without centralising data. It helps organisations to limit risks related to privacy and data security.
At the same time, it does not replace sound data governance or legal compliance. It is a building block within a broader approach to data protection by design.
Organisations looking to deploy AI would do well to consider technology and compliance together. Federated learning can be one of the key building blocks in that approach, and it has an important role to play in the evolution towards more responsible and sustainable AI systems.
Questions about federated learning or how to use AI in a compliant way? Our legal professionals are here to assist.