Introduction
Federated learning has emerged as a promising solution to the growing concerns about data privacy and security in machine learning applications. This distributed learning paradigm enables multiple clients to collaboratively train a shared model without directly sharing their private data. While traditional federated learning approaches have shown success in scenarios with independent and identically distributed (IID) data, they face significant challenges when dealing with highly heterogeneous, non-IID data distributions commonly encountered in real-world applications such as healthcare diagnostics.
The conventional federated learning framework, exemplified by algorithms like FedAvg, suffers from two major limitations in non-IID settings: poor model convergence and lack of personalized solutions. These limitations stem from the fundamental assumption that a single global model can adequately serve all clients, despite their potentially diverse data characteristics and task requirements. In medical diagnosis scenarios, for instance, different hospitals may serve distinct patient populations with varying demographics, disease prevalence, and medical equipment, making a one-size-fits-all model suboptimal for each institution’s specific needs.
To address these challenges, personalized federated learning (PFL) has gained increasing attention. Unlike traditional approaches that focus on learning a single global model, PFL aims to develop customized models for each client while still benefiting from collaborative training. Existing PFL methods can be broadly categorized into two groups: those that operate at the model level by combining local and global models, and those that employ model decoupling techniques to separate shared and personalized components.
The proposed FedAM (Attention-driven Feature Separation Method for Personalized Federated Learning) represents a significant advancement in PFL by introducing a more sophisticated approach to handling client-specific information. FedAM builds upon model decoupling techniques but goes further by incorporating attention mechanisms to dynamically separate global and personalized features at the data level. This approach provides finer-grained control over information sharing and personalization compared to previous methods that primarily focused on parameter separation at the model level.
Related Work
The field of personalized federated learning has seen rapid development in recent years, with various approaches proposed to address the challenges of non-IID data. Early PFL techniques focused on aggregating locally trained personalized models with the global model to enhance performance. These methods operated at the complete model level, treating each client’s model as a monolithic entity during the personalization process.
More advanced approaches introduced the concept of model decoupling, where the neural network architecture is explicitly divided into shared (global) and personalized components. FedPer pioneered this direction by separating the model into a shared feature extractor and client-specific heads. This architecture allowed clients to maintain personalized classification layers while benefiting from a collaboratively learned feature representation. Building on this idea, FedRep further refined the approach by alternating between updating shared and personalized components during training. FedRoD introduced the concept of learning two distinct heads with different objectives to bridge the gap between traditional federated learning and PFL.
While these model decoupling methods demonstrated improvements over conventional approaches, they still faced limitations in fully capturing and utilizing the rich information embedded in client data. The parameters of neural networks represent highly compressed and abstracted versions of the original data, making it challenging to completely preserve and reflect the specific information contained in each client’s dataset. This limitation motivated researchers to explore approaches that operate directly on data features rather than just model parameters.
Recent work in this direction includes FedCP, which generates conditional policies for each data sample to separate global and personalized information before processing them with respective heads. Although FedCP represented progress in data-level processing, it still had shortcomings in effectively balancing and separating these information types. The attention mechanism, widely successful in various deep learning domains, offers a promising solution to these challenges due to its ability to dynamically focus on relevant parts of input data.
Attention mechanisms have proven particularly effective in handling distribution differences in both computer vision and natural language processing tasks. In computer vision, attention helps models distinguish and extract both global and specific features from images. For natural language processing, attention enables models to process text data with significant distribution variations while maintaining both global information coherence and local relevance. These successful applications provide theoretical foundations for incorporating attention mechanisms into personalized federated learning frameworks.
Methodology
The FedAM framework introduces a novel architecture that combines model decoupling with attention-driven feature separation to achieve adaptive, dynamic partitioning of global and personalized information. The system decomposes the neural network model into five distinct components: global feature extractor, global head, personalized feature extractor, personalized head, and attention module. This decomposition provides more granular control over information flow compared to previous approaches that typically divided models into just two or three parts.
At the beginning of each federated learning round, the server distributes global model parameters (including both feature extractor and head components) along with the attention module to participating clients. Each client initializes its local model using these global parameters while maintaining separate personalized components. During local training, the global parameters remain frozen to preserve their generalizability, while the personalized components and attention module are updated to adapt to the client’s specific data distribution.
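The round protocol above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, parameter dictionary keys, and the externally supplied gradient function are all assumptions made for clarity.

```python
import numpy as np

# Minimal sketch of one FedAM client round (names are illustrative, not the
# authors' code). The client overwrites its global components with the server's
# parameters, freezes them, and updates only the personalized components and
# the attention module during local training.

def client_round(server_params, local_params, grad_fn, lr=0.1):
    """server_params: {'global_extractor', 'global_head', 'attention'}
       local_params:  {'pers_extractor', 'pers_head'} (kept on the client)."""
    # 1. Initialize from the server; the global parts stay frozen locally.
    frozen = {k: v.copy() for k, v in server_params.items()}
    # 2. Local SGD on the trainable (personalized + attention) parts only.
    trainable = {**local_params, 'attention': frozen.pop('attention')}
    for _ in range(1):                      # 1 local epoch in the paper's default
        grads = grad_fn(frozen, trainable)  # gradients w.r.t. trainable params
        for k in trainable:
            trainable[k] -= lr * grads[k]
    # 3. Parameters, not raw data, are returned for server aggregation.
    return frozen, trainable
```

The key design point the sketch captures is that the frozen global copy is never touched by local gradients, so its generalizability survives the round.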
The attention module represents the core innovation of FedAM, enabling dynamic, sample-specific feature separation. This module consists of two linear layers with ReLU activation and softmax output, generating two complementary weight matrices that partition input features into global and personalized components. The attention weights are computed based on contextual information that combines both sample features and client-specific characteristics derived from the personalized head parameters.
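A small numpy sketch of such a module is given below. The layer sizes and the exact form of the context vector are assumptions for illustration; what matters is the structure: two linear layers with ReLU, then a softmax over two logits per feature dimension, which makes the two weight vectors complementary by construction.

```python
import numpy as np

# Illustrative attention module: two linear layers with ReLU, then a softmax
# over two logits per feature dimension, yielding complementary
# global/personalized weights that sum to 1 elementwise. Shapes are assumed.

rng = np.random.default_rng(0)
d = 8                                  # feature dimension (assumed)
W1, b1 = rng.normal(size=(d, d)), np.zeros(d)
W2, b2 = rng.normal(size=(d, 2 * d)), np.zeros(2 * d)

def attention_weights(context):
    """context: (d,) vector combining sample features and client-specific info."""
    h = np.maximum(context @ W1 + b1, 0.0)          # linear + ReLU
    logits = (h @ W2 + b2).reshape(d, 2)            # two logits per dimension
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    w = e / e.sum(axis=1, keepdims=True)            # row-wise softmax
    return w[:, 0], w[:, 1]                         # (w_global, w_personal)
```

Because the softmax is taken over exactly two logits per dimension, the two returned weight vectors always sum to one elementwise, so no feature information is dropped or double-counted during separation.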
The separation process works by applying the attention-generated weights to the feature vectors produced by the personalized feature extractor. This results in two modified feature representations: one emphasizing globally relevant aspects and the other focusing on client-specific characteristics. These separated features are then processed by their respective heads (global or personalized) to produce predictions that are combined for the final output.
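Concretely, the separation-and-combination step can be sketched as below, with linear heads standing in for the actual head networks (an assumption for brevity):

```python
import numpy as np

# Sketch of the separation step. The personalized extractor's feature vector is
# split by the attention weights into a global view and a personalized view,
# each view passes through its own head, and the two logits are summed.
# Linear heads are a simplifying assumption.

def separate_and_predict(feat, w_global, w_personal, global_head, personal_head):
    f_g = w_global * feat        # globally relevant part of the features
    f_p = w_personal * feat      # client-specific part
    return f_g @ global_head + f_p @ personal_head   # combined logits
```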
To maintain harmony between global and personalized components, FedAM introduces a correlation alignment loss term. This term helps align the feature distributions produced by the personalized extractor with those from the global extractor, preventing excessive divergence that could degrade performance. The alignment is achieved by minimizing differences in feature correlations rather than enforcing strict similarity, allowing for flexible adaptation while preserving beneficial global characteristics.
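A minimal correlation-alignment term in this spirit (following the well-known CORAL formulation; the paper's exact loss may differ) compares the feature covariance matrices of the two extractors rather than the features themselves:

```python
import numpy as np

# Correlation-alignment loss sketch: penalize the squared Frobenius distance
# between the covariance matrices of the two feature batches. Matching
# second-order statistics, not individual features, leaves room for flexible
# per-client adaptation.

def correlation_alignment_loss(feats_personal, feats_global):
    """feats_*: (n_samples, d) feature batches from the two extractors."""
    def cov(x):
        xc = x - x.mean(axis=0, keepdims=True)
        return xc.T @ xc / max(x.shape[0] - 1, 1)
    d = feats_personal.shape[1]
    diff = cov(feats_personal) - cov(feats_global)
    return float(np.sum(diff ** 2)) / (4.0 * d * d)  # normalized Frobenius norm
```

Note that two batches with identical correlation structure incur zero loss even if the individual feature vectors differ, which is exactly the "align distributions, not samples" behavior described above.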
The federated aggregation process in FedAM carefully combines contributions from different clients to update the global model. The personalized feature extractors and attention modules from clients are directly averaged to form new global versions of these components. For the global head, FedAM employs a weighted combination of both global and personalized heads from clients, striking a balance between maintaining generalizability and incorporating useful personalized adaptations.
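The server-side update can be sketched as follows; the equal-weight averaging and the 50/50 head-mixing coefficient are assumptions, since the paper's exact weighting is not reproduced here:

```python
import numpy as np

# Server aggregation sketch: personalized extractors and attention modules are
# averaged directly into new global versions; the new global head mixes each
# client's global and personalized heads before averaging. head_mix is an
# assumed hyperparameter.

def aggregate(client_updates, head_mix=0.5):
    """client_updates: list of dicts with keys
       'pers_extractor', 'attention', 'global_head', 'pers_head'."""
    n = len(client_updates)
    new_extractor = sum(u['pers_extractor'] for u in client_updates) / n
    new_attention = sum(u['attention'] for u in client_updates) / n
    new_head = sum(head_mix * u['global_head'] + (1 - head_mix) * u['pers_head']
                   for u in client_updates) / n
    return {'global_extractor': new_extractor,
            'global_head': new_head,
            'attention': new_attention}
```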
This comprehensive approach enables FedAM to achieve superior performance in several key aspects. The attention mechanism provides fine-grained control over feature separation, allowing the model to dynamically adjust the blend of global and personalized information for each sample. The correlation alignment ensures that personalization doesn’t come at the cost of losing beneficial global patterns. The sophisticated aggregation scheme facilitates effective knowledge sharing while respecting the diversity of client needs.
Experimental Evaluation
The effectiveness of FedAM was rigorously evaluated through extensive experiments on multiple standard datasets and compared against several state-of-the-art federated learning approaches. The evaluation covered four widely used image classification benchmarks: MNIST, CIFAR-10, CIFAR-100, and Tiny-ImageNet. To assess performance across different model architectures, tests were conducted using both a 4-layer convolutional neural network and the more complex ResNet-18 backbone.
Experimental settings simulated real-world non-IID conditions by partitioning data among clients with a Dirichlet distribution, using concentration parameter β=0.1 to create highly heterogeneous label distributions. Each client’s data was split into training (75%) and testing (25%) sets. Default configurations used 20 clients with full participation (ρ=1), a local batch size of 10, and 1 local epoch per round, running for 2000 communication rounds to ensure convergence.
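The Dirichlet-based partition used here follows a standard recipe: for each class, a Dirichlet(β) draw over clients decides what fraction of that class's samples each client receives, and small β (0.1) makes each client's class mix highly skewed. A sketch:

```python
import numpy as np

# Standard Dirichlet non-IID partition: per class, draw client proportions from
# Dirichlet(beta) and slice that class's (shuffled) sample indices accordingly.
# Smaller beta -> more heterogeneous per-client label distributions.

def dirichlet_partition(labels, n_clients, beta=0.1, seed=0):
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        props = rng.dirichlet([beta] * n_clients)            # client proportions
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices
```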
FedAM matched or exceeded the strongest baselines on every dataset and architecture. With the 4-layer CNN, it improved on the best baseline by 0.01% on MNIST (99.72% vs 99.71%), 0.64% on CIFAR-10 (92.11% vs 91.47%), 0.06% on CIFAR-100 (59.70% vs 59.64%), and 0.06% on Tiny-ImageNet (43.59% vs 43.53%). The advantage was far more pronounced with ResNet-18 on Tiny-ImageNet, where FedAM led by 7.2 percentage points (51.59% vs 44.39%).
Analysis revealed several factors contributing to FedAM’s success. The attention mechanism’s dynamic feature separation proved particularly effective in handling non-IID data, allowing appropriate balancing of global and local information for each sample. The correlation alignment loss successfully maintained harmony between personalized adaptations and global knowledge. The method also showed robustness to varying numbers of clients and different local training configurations.
Communication efficiency studies showed FedAM’s practical viability. While introducing some additional computation due to personalized components, FedAM’s per-iteration time (2.31 minutes) was lower than recent baselines FedRep (4.09 minutes) and FedCP (2.75 minutes). Notably, with ResNet-18, FedAM increased accuracy by 16.21% while only adding 1.35% to communication costs compared to FedAvg.
Scalability tests examined performance with varying numbers of clients (10 to 100). As client count increased (making each client’s data more sparse), FedAM maintained superior performance, outperforming the best baseline by 3.30% (N=10), 4.16% (N=30), 6.85% (N=50), and 3.32% (N=100) on CIFAR-100. This demonstrates FedAM’s ability to handle data sparsity challenges in large-scale deployments.
Robustness evaluations simulated real-world scenarios with client dropouts, randomly selecting subsets of clients (10 or 30 out of 50) to participate in each round. FedAM degraded gracefully, remaining competitive with only 20% client participation (48.35% accuracy vs 51.27% for FedCP), reaching 58.02% with 60% participation and 62.49% with full participation.
Additional experiments investigated the impact of increasing local computation (more epochs per round). While most methods suffered from client drift with more local training, FedAM maintained relatively stable performance, decreasing only from 92.31% (5 epochs) to 88.17% (40 epochs) on CIFAR-10, still outperforming alternatives. This suggests FedAM’s suitability for scenarios where communication costs necessitate more local computation.
Discussion and Future Directions
The experimental results demonstrate that FedAM effectively addresses key challenges in personalized federated learning, particularly in handling highly heterogeneous data distributions. The success of FedAM can be attributed to its comprehensive approach that combines model decoupling with attention-driven feature separation and correlation-based alignment. This combination allows for more nuanced handling of global and personalized information compared to previous methods that operated primarily at either the model or data level.
One of FedAM’s most significant advantages is its ability to dynamically adjust the balance between global and personalized features for each individual sample through the attention mechanism. This sample-specific adaptation provides finer-grained control than methods that apply uniform personalization strategies across all data from a client. The attention weights effectively identify which features should be treated as globally relevant versus client-specific, leading to more effective knowledge sharing while preserving important local characteristics.
The correlation alignment loss represents another important innovation, addressing a critical challenge in personalized federated learning: maintaining an appropriate relationship between personalized adaptations and global knowledge. By aligning feature correlations rather than enforcing strict similarity, this approach allows for flexible personalization while preventing harmful divergence that could degrade model performance. This balance is particularly crucial in applications like medical diagnosis, where both general medical knowledge and institution-specific patterns contribute to accurate predictions.
Despite its strengths, FedAM has certain limitations that point to valuable directions for future research. Currently, FedAM focuses on client-side personalization without explicitly modeling inter-client relationships at the server level. Incorporating server-side processing that considers similarities and differences between clients could further enhance performance, especially in scenarios with complex client relationships or hierarchical data distributions.
Another promising direction involves developing more sophisticated attention mechanisms specifically tailored for federated learning scenarios. Current attention modules in FedAM, while effective, were adapted from single-model paradigms. Designing attention mechanisms that explicitly account for federated-specific considerations like communication efficiency, privacy preservation, and robustness to client heterogeneity could yield additional improvements.
The performance degradation observed with increased local computation (more training epochs per round) suggests room for improvement in handling client drift during extended local training. Techniques that better preserve global knowledge during local updates or more intelligently regulate the pace of personalization could help maintain stability in communication-constrained scenarios.
Practical deployment considerations also warrant attention. While FedAM showed reasonable communication efficiency, further optimizations could make it more suitable for resource-constrained edge devices. Techniques like attention module compression or selective parameter updating could reduce communication and computation overhead without significantly compromising performance.
Finally, extending FedAM’s principles beyond image classification to other domains such as natural language processing, time-series analysis, or multimodal learning represents an important direction. Each domain presents unique challenges in feature separation and personalization that may require adaptations of the core FedAM approach while maintaining its fundamental advantages.
Conclusion
FedAM represents a significant advancement in personalized federated learning, introducing a novel approach that combines model decoupling with attention-driven feature separation. By dynamically partitioning features into global and personalized components at the data level, FedAM achieves more nuanced and effective personalization compared to methods that operate solely at the model parameter level. The incorporation of correlation alignment further enhances performance by maintaining an optimal balance between preserving valuable global patterns and adapting to local data characteristics.
Comprehensive experiments across multiple datasets and model architectures demonstrate FedAM’s superiority over existing approaches in handling non-IID data distributions. The method shows particular strength in scenarios with high data heterogeneity while maintaining reasonable communication efficiency and robustness to practical challenges like client dropouts and varying participation rates.
The success of FedAM has important implications for real-world applications, especially in privacy-sensitive domains like healthcare. Medical diagnostic systems can benefit from FedAM’s ability to share global diagnostic knowledge while adapting to institution-specific patterns in patient populations, equipment characteristics, and diagnostic procedures. Similar advantages apply to other domains requiring collaborative learning with diverse data sources, such as financial services, smart cities, and IoT applications.
While demonstrating significant improvements over existing methods, FedAM also highlights promising directions for future research in personalized federated learning. Extensions addressing server-side client relationship modeling, specialized attention mechanisms for federated scenarios, and adaptations for various data types and tasks can build upon FedAM’s foundation to further advance the field.
As the demand for privacy-preserving collaborative learning continues to grow across industries, approaches like FedAM that effectively balance shared knowledge and personalized adaptation will become increasingly valuable. The attention-driven feature separation paradigm introduced by FedAM offers a powerful framework for developing the next generation of personalized federated learning systems that can handle the complexities of real-world data while respecting privacy and security requirements.
doi:10.19734/j.issn.1001-3695.2024.09.0325