Federated Learning: Privacy-Preserving AI
Artificial intelligence (AI) has become ubiquitous in our daily lives, powering applications such as spam filters, chatbots, and recommendation systems. However, these applications rely on massive amounts of data, often collected from users' personal devices or online activities, to train and improve their AI models. This raises serious concerns about data privacy, security, and ownership, as users may not want to share their sensitive information with third parties, or may face legal or ethical restrictions on doing so. Moreover, centralizing data in one place also poses challenges for scalability, efficiency, and robustness, as data may be too large, too diverse, or too dynamic to be transferred and processed by a single server.
Federated learning is a novel machine learning technique that aims to address these challenges by enabling multiple parties to collaboratively train a common AI model without exchanging their data. Instead of sending their data to a central server, each party (also called a node or a client) trains a local model on its data and periodically communicates with other nodes to update a global model that is shared by all nodes. This way, federated learning preserves data privacy and security, while also leveraging the distributed and heterogeneous nature of data sources.
In this article, we will introduce the concept and principles of federated learning, discuss its advantages and challenges, and review some of its applications and use cases.
Federated Learning: Definition and Formulation
The term "federated learning" was coined by Google in 2016, although the idea of distributed and collaborative learning had been explored before. Federated learning can be seen as a generalization of distributed learning, where multiple nodes cooperate to train a single model on multiple datasets. However, unlike distributed learning, which assumes that the datasets are independent and identically distributed (i.i.d.) and roughly the same size, federated learning makes no such assumptions: it is designed to handle heterogeneous, non-i.i.d. datasets that may vary significantly in size and quality across nodes. Moreover, federated learning also considers the unreliability and resource constraints of the nodes, which may be subject to failures, dropouts, or limited computational power and communication bandwidth.
The objective function for federated learning can be written as follows:
$$\min_{\mathbf{w}} \sum_{k=1}^K \alpha_k f_k(\mathbf{w})$$
where $K$ is the number of nodes, $\mathbf{w}$ are the weights of the global model, $\alpha_k$ are the weights assigned to each node (usually proportional to the size or quality of its dataset), and $f_k(\mathbf{w})$ is the local objective function of node $k$, which measures how well the global model fits its local dataset.
The goal of federated learning is to find the optimal $\mathbf{w}$ that minimizes the sum of the local objective functions over all nodes. However, since the nodes do not share their data with a central server, they cannot directly compute or optimize this objective function. Instead, they use an iterative algorithm that alternates between two steps: local computation and global aggregation.
In the local computation step, each node updates its local model using its dataset and the current global model. This can be done by applying any standard machine learning algorithm, such as stochastic gradient descent (SGD), to mini-batches of the node's local dataset. The result is a new set of local model weights for each node.
In the global aggregation step, each node sends its local model weights to a central server or coordinator, which aggregates them to produce a new global model. This can be done by taking a weighted average of the local model weights, or by using more sophisticated methods such as secure aggregation or differential privacy. The new global model is then broadcast back to all nodes for the next round of local computation.
This process is repeated until some convergence criterion is met, such as reaching a maximum number of iterations or achieving a desired level of accuracy.
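To make this loop concrete, here is a minimal sketch of the federated averaging scheme described above, written in NumPy. The linear model, learning rate, and synthetic datasets are placeholder assumptions for illustration; a real system would add client sampling, fault handling, and the privacy mechanisms discussed later.

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    """Local computation: a few epochs of gradient descent on one node's
    data (plain least-squares regression as a stand-in model)."""
    w = w.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of the MSE loss
        w -= lr * grad
    return w

def federated_averaging(nodes, dim, rounds=20):
    """Alternate local computation and global aggregation (FedAvg-style)."""
    w_global = np.zeros(dim)
    sizes = np.array([len(y) for _, y in nodes], dtype=float)
    alphas = sizes / sizes.sum()            # alpha_k proportional to data size
    for _ in range(rounds):
        # Each node starts from the current global model.
        local_ws = [local_update(w_global, X, y) for X, y in nodes]
        # A weighted average of the local weights gives the new global model.
        w_global = sum(a * w for a, w in zip(alphas, local_ws))
    return w_global

# Toy usage: three nodes with differently sized synthetic datasets.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
nodes = []
for n in (50, 80, 120):
    X = rng.normal(size=(n, 2))
    nodes.append((X, X @ true_w + 0.1 * rng.normal(size=n)))
print(federated_averaging(nodes, dim=2))    # converges toward [2, -1]
```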
Federated Learning: Advantages and Challenges
Federated learning offers several advantages over traditional centralized machine learning techniques:
Privacy preservation: Federated learning does not require nodes to share their data with anyone else, thus protecting their privacy and confidentiality. Moreover, federated learning can also incorporate additional mechanisms such as encryption or noise injection to further enhance data security and prevent malicious attacks.
Data ownership: Federated learning respects the ownership and sovereignty of data owners, who can retain full control over their data and decide whether and how to participate in federated learning. This can also facilitate compliance with data regulations such as GDPR or CCPA.
Data diversity: Federated learning can leverage the rich and diverse data sources that are distributed across different nodes, which may capture different aspects or perspectives of the problem domain. This can improve the generalization and robustness of the global model, as well as enable personalized or customized models for each node.
Data efficiency: Federated learning can reduce the communication and storage costs associated with transferring and centralizing large amounts of data. Instead, only model parameters, which are typically much smaller than data, need to be exchanged between nodes. This can also improve the latency and responsiveness of the system, as nodes can train and update their models locally without waiting for data synchronization.
However, federated learning also faces several challenges and limitations that need to be addressed:
System heterogeneity: Federated learning needs to cope with the heterogeneity and variability of the nodes, which may have different hardware capabilities, network conditions, data availability, and participation rates. This can affect the performance and convergence of federated learning, as well as introduce bias or inconsistency in the global model.
Algorithm design: Federated learning requires novel algorithms and techniques that can handle the distributed and decentralized nature of the system, as well as the privacy and security constraints. For example, federated learning needs to balance the trade-off between local computation and global aggregation, as well as optimize the communication efficiency and reliability between nodes.
Evaluation and validation: Federated learning poses new challenges for evaluating and validating the quality and reliability of the global model, as well as the contribution and impact of each node. For example, federated learning needs to account for the non-i.i.d. and heterogeneous nature of the datasets, as well as the potential influence of malicious or faulty nodes.
Federated Learning: Applications and Use Cases
Federated learning has a wide range of applications and use cases across various domains and industries. Some examples are:
Mobile devices: Federated learning can enable mobile devices such as smartphones or tablets to collaboratively train AI models without sharing their data, such as contacts, messages, photos, or location. This can improve the functionality and user experience of applications such as keyboard prediction, voice recognition, or face detection.
Internet of Things: Federated learning can enable IoT devices such as sensors or cameras to collaboratively train AI models without sharing their raw data, which may be too large or sensitive to transmit. This can improve the performance and efficiency of applications such as smart home, smart city, or smart grid.
Healthcare: Federated learning can enable healthcare providers or researchers to collaboratively train AI models without sharing their patient data, which may be subject to strict privacy regulations or ethical concerns. This can improve the quality and accessibility of healthcare services such as diagnosis, treatment, or drug discovery.
Horizontal vs. Vertical Federated Learning
Horizontal federated learning is a type of federated learning where the nodes have the same or similar features, but different samples of data. For example, different hospitals may hold the same kinds of medical records for different patients, and they want to train a model to predict the risk of a disease. In horizontal federated learning, the nodes can share their model parameters without revealing their data, and learn from each other's samples.
Vertical federated learning is a type of federated learning where the nodes have different features, but share the same or overlapping samples of data. For example, different companies may have different information about the same customers, such as demographic, behavioural, or financial data, and they want to train a model to optimize their marketing strategies. In vertical federated learning, the nodes cannot directly share their model parameters, because they have different feature spaces. Instead, they need techniques to align their overlapping samples and aggregate their gradients without revealing their data, so that they can learn from each other's features.
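As a toy illustration of the sample-alignment step, the sketch below intersects salted hashes of record IDs so that two parties can discover which samples they share without exchanging raw identifiers. This is a simplified stand-in for the private set intersection protocols used in practice (a shared salted hash leaks more than true PSI), and the IDs and salt are invented for the example.

```python
import hashlib

def blind(ids, salt):
    """Hash each record ID with a shared salt so raw IDs are not exchanged.
    (Real systems use private set intersection, which is strictly stronger.)"""
    return {hashlib.sha256((salt + i).encode()).hexdigest(): i for i in ids}

# Hypothetical customer IDs held by two companies.
party_a = ["u001", "u002", "u003", "u004"]   # holds demographic features
party_b = ["u002", "u004", "u005"]           # holds financial features
salt = "shared-secret-salt"                  # agreed upon out of band

blinded_a = blind(party_a, salt)
blinded_b = blind(party_b, salt)

# Each party can compute the overlap from the blinded sets alone.
shared = set(blinded_a) & set(blinded_b)
print(sorted(blinded_a[h] for h in shared))  # ['u002', 'u004']
```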
Some applications of horizontal federated learning are:
Mobile keyboard prediction: Google uses horizontal federated learning to improve the accuracy and personalization of its keyboard app, Gboard, without collecting users' typing data. Users can train a local model on their device based on their typing patterns and preferences, and then share the model updates with Google's server, which aggregates them to produce a global model that is shared by all users.
Medical diagnosis: Hospitals or clinics can use horizontal federated learning to train a common model for diagnosing diseases or predicting outcomes, without sharing their patient data. For example, a study used horizontal federated learning to train a deep neural network for brain tumour segmentation, using data from multiple institutions. The results showed that the federated model achieved comparable performance to the centralized model while preserving data privacy and security.
Smart grid load forecasting: Electric utilities can use horizontal federated learning to train a model for forecasting the electricity load demand, without sharing their customer data. For example, a paper proposed a horizontal federated learning framework for smart grid load forecasting, using data from different regions. The framework used a recurrent neural network to capture the temporal patterns of the load demand, and a secure aggregation protocol to protect the data privacy. The experiments showed that the federated model outperformed the local models and achieved similar accuracy to the centralized model.
Some challenges of horizontal federated learning are:
Data heterogeneity: The data distribution across different nodes may be non-i.i.d. and unbalanced, meaning that some nodes may have more or less data, or different data characteristics, than others. This can cause bias or inconsistency in the global model, as well as reduce the convergence speed and accuracy of federated learning.
Communication efficiency: The communication between nodes and the central server may be costly, unreliable, or constrained by bandwidth or latency. This can affect the performance and scalability of federated learning, as well as introduce security risks or privacy leaks. Therefore, federated learning needs to optimize the communication frequency, size, and reliability of model updates.
Privacy preservation: The model updates exchanged between nodes and the central server may contain sensitive information about the data or the nodes. This can expose the nodes to potential attacks or breaches from malicious parties, such as eavesdropping, tampering, or inference. Therefore, federated learning needs to incorporate additional mechanisms such as encryption, noise injection, or secure aggregation to protect the privacy and security of the model updates.
In summary, horizontal and vertical federated learning differ in how the data is distributed and partitioned across the nodes: horizontal federated learning partitions by sample over a shared feature space, while vertical federated learning partitions by feature over a shared sample space.
Horizontal federated learning is more suitable for scenarios where the data is homogeneous and abundant across the nodes, while vertical federated learning is more suitable for scenarios where the data is heterogeneous and scarce across the nodes. Horizontal federated learning can achieve higher communication efficiency and privacy preservation, because all nodes share the same model architecture and a training round does not depend on the participation of every party. However, vertical federated learning can leverage more diverse and complementary data sources, which can improve the performance and robustness of the global model.
Differential privacy and secure aggregation
Differential privacy and secure aggregation are two techniques that can be used to enhance the privacy and security of federated learning. They have different goals and methods, but they can also be combined to achieve stronger guarantees.
Differential privacy (DP) is a mathematical framework that quantifies the privacy loss of a data analysis or a machine learning algorithm. It provides a formal guarantee that the output of the algorithm is not significantly affected by the presence or absence of any individual data point. This means that an adversary who observes the output cannot infer much information about any specific data point. DP can be achieved by adding carefully calibrated noise to the data or the output, or by using subsampling techniques. DP has two parameters: epsilon and delta, which measure the privacy loss and the probability of failure, respectively. Smaller values of epsilon and delta imply stronger privacy guarantees, but also lower utility or accuracy.
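For example, a model update can be released with an (epsilon, delta)-DP guarantee by clipping its norm and adding calibrated Gaussian noise. The sketch below follows the standard Gaussian mechanism; the clipping bound and privacy parameters are arbitrary illustrative choices.

```python
import numpy as np

def dp_sanitize(update, clip_norm=1.0, epsilon=0.5, delta=1e-5, rng=None):
    """Clip an update to bound its L2 sensitivity, then add Gaussian noise.

    The noise scale follows the standard Gaussian-mechanism calibration
    sigma = clip_norm * sqrt(2 * ln(1.25 / delta)) / epsilon (for epsilon < 1):
    smaller epsilon and delta mean more noise and stronger privacy.
    """
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    sigma = clip_norm * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return clipped + rng.normal(0.0, sigma, size=update.shape)

print(dp_sanitize(np.array([0.8, -2.4, 0.3])))
```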
Secure aggregation (SecAgg) is a cryptographic protocol that allows multiple parties to compute the sum of their inputs without revealing their inputs to anyone else, not even a central server. SecAgg can be used in federated learning to aggregate the model updates from different nodes without exposing their local data or gradients. SecAgg relies on secret sharing and multiparty computation techniques, which split each input into random shares and distribute them among the other parties. The parties then exchange and combine their shares to obtain the sum of the inputs, without learning anything else. SecAgg has two parameters: group size and bit length, which measure the number of parties involved in each aggregation and the number of bits used to represent each input, respectively. Larger values of group size and bit length imply stronger security guarantees, but also higher communication and computation costs.
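The core idea can be illustrated with pairwise additive masks: each pair of nodes agrees on a random mask that one adds and the other subtracts, so every masked update looks random to the server while the masks cancel in the sum. The toy sketch below omits the key agreement and secret sharing that real SecAgg uses to tolerate dropouts.

```python
import numpy as np

rng = np.random.default_rng(42)
updates = [np.array([1.0, 2.0]), np.array([0.5, -1.0]), np.array([2.0, 0.0])]
n = len(updates)

# Every pair (i, j) with i < j shares a random mask; node i adds it and
# node j subtracts it, so all masks cancel in the aggregate.
masked = [u.copy() for u in updates]
for i in range(n):
    for j in range(i + 1, n):
        mask = rng.normal(size=updates[0].shape)
        masked[i] += mask
        masked[j] -= mask

# The server sees only masked updates, yet their sum equals the true sum.
print(sum(masked))           # [3.5, 1.0]
print(sum(updates))          # identical
```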
DP and SecAgg can be used together in federated learning to achieve both differential privacy and malicious security. This means that the model updates are protected from both passive and active adversaries, who may try to observe or tamper with the data or the protocol. The combination of DP and SecAgg can be done by applying DP noise to each model update before sending it to SecAgg, or by applying DP noise to the aggregated model update after SecAgg. The trade-off between privacy, security, and utility depends on the choice of parameters for both DP and SecAgg, as well as the properties of the data and the model.
Homomorphic encryption is another technique that can be used alongside differential privacy to enhance the privacy and security of federated learning. The two have different goals and methods, but they can also be combined to achieve stronger guarantees.
Homomorphic encryption (HE) is a cryptographic technique that allows performing computations on encrypted data without decrypting it. HE can ensure information security since it preserves the structure and semantics of the data during encryption and decryption. Moreover, HE can provide lossless data privacy protection. However, the HE mechanism needs to address the problem of substantial computational overhead, which causes it to suffer from issues such as low efficiency, large keys, and ciphertext explosion.
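As a sketch of how additively homomorphic encryption can aggregate updates, the snippet below uses the python-paillier library (the phe package, assumed to be installed) with scalar updates and a single key pair held by the coordinator; real deployments use threshold or multi-key schemes so that no single party can decrypt individual updates.

```python
# pip install phe  (python-paillier; assumed available for this sketch)
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

# Each node encrypts its (scalar) model update with the shared public key.
updates = [0.8, -0.3, 1.1]
encrypted = [public_key.encrypt(u) for u in updates]

# Paillier is additively homomorphic: the server can sum the ciphertexts
# without ever decrypting an individual update.
encrypted_sum = encrypted[0] + encrypted[1] + encrypted[2]

# Only the key holder can decrypt; here that is the coordinator for brevity.
print(private_key.decrypt(encrypted_sum))   # 1.6 (up to encoding error)
```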
DP and HE can be used together in federated learning to achieve both differential privacy and malicious security. This means that the model updates are protected from both passive and active adversaries, who may try to observe or tamper with the data or the protocol. The combination of DP and HE can be done by applying DP noise to each model update before it is encrypted, or by applying DP noise to the aggregated model update after decryption. The trade-off between privacy, security, and utility depends on the choice of parameters for both DP and HE, as well as the properties of the data and the model.
Some other privacy-preserving techniques in federated learning are:
Shuffling: Shuffling is a technique that randomly permutes the order of the model updates before sending them to the central server. This can prevent the server from linking the updates to the corresponding nodes, and thus protect the nodes' identities and data distributions. Shuffling can also reduce the communication cost and latency of federated learning, as it allows for more efficient compression and encoding schemes.
Split learning: Split learning is a technique that splits the model into two parts: a local part and a global part. Each node trains the local part on its data and then sends the intermediate activations to the central server, which trains the global part on the aggregated activations. This can reduce the amount of information revealed by the model updates, as well as the computation and communication costs of federated learning.
Federated dropout: Federated dropout is a technique that randomly drops out some of the model parameters during each round of federated learning. This can introduce randomness and noise into the model updates, which can enhance the privacy and robustness of federated learning. Federated dropout can also improve the generalization and diversity of the global model, as it prevents overfitting and encourages exploration.
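A minimal sketch of the federated dropout idea follows, assuming a shared random mask drawn by the server each round: dropped coordinates are neither trained nor transmitted, which both perturbs the updates and shrinks communication. The keep probability and stand-in gradients are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)

def federated_dropout_round(w_global, node_grads, keep_prob=0.75, lr=0.1):
    """One round in which only a random subset of parameters is updated.

    Dropped coordinates are neither trained nor transmitted this round,
    which adds randomness to the updates and reduces communication.
    """
    mask = rng.random(w_global.shape) < keep_prob   # True = parameter kept
    # Each node would send only its kept coordinates; we average those.
    avg_grad = np.mean([g * mask for g in node_grads], axis=0)
    return w_global - lr * avg_grad                 # dropped coords unchanged

w = np.zeros(5)
grads = [rng.normal(size=5) for _ in range(3)]      # stand-in local gradients
print(federated_dropout_round(w, grads))
```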
Conclusion
Federated learning is a promising machine learning technique that enables multiple parties to collaboratively train a common AI model without exchanging their data. This preserves data privacy and security while also leveraging data diversity and efficiency. However, federated learning faces several challenges and limitations that need to be addressed by novel algorithms and techniques, and it remains an active and evolving research field with many open problems and opportunities for future work. Federated learning has a wide range of applications and use cases across various domains and industries, such as mobile devices, IoT, and healthcare. We hope this article has provided a comprehensive introduction and overview of federated learning, and inspired you to learn more about this exciting topic.