A Comprehensive Overview of Similarity-Based Personalized Federated Learning Model Aggregation Framework
Introduction
Federated Learning (FL) is a distributed machine learning paradigm that enables multiple clients to collaboratively train a shared model without sharing their raw data. This approach addresses privacy concerns while allowing participants to benefit from collective knowledge. However, traditional FL frameworks, such as Federated Averaging (FedAvg), face significant challenges when dealing with data heterogeneity across clients. Data heterogeneity, particularly feature distribution shift, occurs when clients possess data with varying statistical properties, leading to degraded model performance and slow convergence.
To mitigate these challenges, personalized federated learning has emerged as a promising solution. Personalized FL aims to tailor models to individual clients by balancing global knowledge with local data characteristics. However, existing methods struggle to effectively combine global and local information, often requiring prior knowledge of data distributions or extensive hyperparameter tuning.
This paper introduces two novel frameworks: FedPG (Federated Learning with Personalized Global Model) and FedPGS (Federated Learning with Personalized Global Model and Scheduled Personalization). These frameworks leverage model similarity to dynamically adjust aggregation weights, enabling personalized model updates without compromising privacy. Additionally, the proposed methods enhance robustness against malicious devices by reducing their influence during aggregation.
Background and Motivation
Challenges in Traditional Federated Learning
FedAvg, the most widely used FL algorithm, aggregates model updates by weighting them according to each client’s data volume. While effective in homogeneous settings, FedAvg performs poorly when clients exhibit feature distribution shifts—where identical labels have different feature distributions. For example, handwritten digits vary in style across individuals, and medical data differs across hospitals due to patient demographics. Such heterogeneity causes conflicting model updates, leading to biased global models and slow convergence.
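The data-volume weighting described above can be sketched as follows (a minimal illustration, not the reference FedAvg implementation; function and variable names are hypothetical):

```python
import numpy as np

def fedavg(client_params, data_sizes):
    """FedAvg-style aggregation: average client parameter vectors,
    weighting each client by its share of the total data volume.

    client_params: list of 1-D parameter vectors, one per client.
    data_sizes: number of local training samples per client.
    """
    weights = np.asarray(data_sizes, dtype=float)
    weights /= weights.sum()  # n_k / sum(n_j)
    params = np.asarray(client_params, dtype=float)
    # Weighted sum of parameter vectors -> single global model
    return weights @ params
```

Under feature distribution shift, this single weighted average is exactly what goes wrong: conflicting updates from dissimilar clients are blended into one biased global model.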
Personalized Federated Learning
Personalized FL addresses data heterogeneity by allowing clients to maintain distinct models. Existing approaches fall into two categories:
- Global Model Fine-tuning: Clients fine-tune a shared global model on their local data.
- Direct Personalized Aggregation: Clients receive custom models tailored during aggregation.
While methods like FedProx and MOON improve robustness, they still rely on a single global model. Other techniques, such as FedPer and LG-FedAvg, partition models into shared and personalized layers but require architectural modifications. FedProto and FCCL use feature alignment but depend on auxiliary data or complex constraints.
Despite these advances, a key limitation remains: most methods assume prior knowledge of data distributions or require costly hyperparameter tuning. Additionally, they are vulnerable to malicious clients submitting falsified updates.
Proposed Framework: FedPG and FedPGS
Core Idea
FedPG and FedPGS address these limitations by using cosine similarity between model updates to determine aggregation weights. This approach ensures that clients with similar feature distributions contribute more to each other’s personalized models. The frameworks operate as follows:
- Model Update Calculation: Each client computes its model update relative to the previous global model.
- Cosine Similarity Measurement: The server calculates pairwise similarities between updates.
- Normalized Weight Assignment: Similarity scores are normalized using a softmax function with a smoothing coefficient (τ).
- Personalized Aggregation: Clients receive weighted averages of all models, emphasizing those with similar updates.
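The four steps above can be sketched in a few lines of NumPy (an illustrative reconstruction under assumed conventions, not the authors' code; `personalized_aggregate`, `updates`, and `models` are hypothetical names):

```python
import numpy as np

def personalized_aggregate(updates, models, tau=0.5):
    """Similarity-weighted personalized aggregation (sketch).

    updates: (K, d) array of client updates relative to the previous
             global model (step 1).
    models:  (K, d) array of the clients' current model parameters.
    tau:     smoothing coefficient for the softmax (step 3).
    Returns a (K, d) array: one personalized model per client (step 4).
    """
    U = np.asarray(updates, dtype=float)
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    Un = U / np.clip(norms, 1e-12, None)
    S = Un @ Un.T                        # step 2: pairwise cosine similarity
    W = np.exp(S / tau)                  # step 3: softmax with temperature tau
    W /= W.sum(axis=1, keepdims=True)    # rows sum to 1
    return W @ np.asarray(models, dtype=float)  # step 4: weighted averages
```

Each row of `W` is one client's personal mixing distribution over all clients, so clients whose updates point in similar directions contribute more to each other's models.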
Smoothing Coefficient (τ)
The smoothing coefficient τ controls the balance between global and local information:
• Small τ: Sharpens weight distribution, emphasizing local similarities.
• Large τ: Flattens weights, promoting global consensus.
FedPG requires manual tuning of τ, while FedPGS automates this process by scheduling τ to decrease over training rounds. Early rounds use large τ to capture global patterns, while later rounds reduce τ to refine local adaptations.
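One plausible way to realize the FedPGS schedule is a simple decay from a large to a small τ (the exact schedule form is an assumption here, not the paper's formula; all names are illustrative):

```python
import numpy as np

def scheduled_tau(round_idx, total_rounds, tau_start=1.0, tau_end=0.1):
    """Linearly decay tau from tau_start (global consensus) toward
    tau_end (local refinement) over the training rounds."""
    frac = round_idx / max(total_rounds - 1, 1)
    return tau_start + frac * (tau_end - tau_start)

def softmax_weights(sims, tau):
    """Softmax over similarity scores with smoothing coefficient tau."""
    z = np.exp(np.asarray(sims, dtype=float) / tau)
    return z / z.sum()

# Same similarities, different rounds: early rounds give flatter
# (more global) weights, late rounds give sharper (more local) ones.
early = softmax_weights([0.9, 0.1], scheduled_tau(0, 100))
late = softmax_weights([0.9, 0.1], scheduled_tau(99, 100))
```

This reproduces the behavior described above: a large early τ flattens the weight distribution toward a global average, while a small late τ concentrates weight on the most similar clients.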
Robustness Against Malicious Devices
Malicious clients can disrupt FL by submitting falsified updates. FedPG and FedPGS inherently reduce their impact because:
• Malicious updates exhibit low similarity to legitimate ones, receiving negligible weights.
• The frameworks do not require predefining the number of malicious clients, unlike methods like Multi-KRUM.
Additionally, integrating blockchain and IPFS ensures model integrity and traceability, enabling post-hoc identification of malicious actors.
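A toy check makes the first point concrete (hypothetical numbers, not experimental data): a sign-flipped malicious update has strongly negative cosine similarity to legitimate updates, so the softmax assigns it a negligible weight.

```python
import numpy as np

updates = np.array([
    [1.0, 0.9],    # client 0: legitimate
    [0.9, 1.0],    # client 1: legitimate, similar direction
    [-1.0, -1.0],  # client 2: malicious (sign-flipped update)
])
# Pairwise cosine similarities between normalized updates
Un = updates / np.linalg.norm(updates, axis=1, keepdims=True)
S = Un @ Un.T
# Softmax weighting with smoothing coefficient tau
tau = 0.2
W = np.exp(S / tau)
W /= W.sum(axis=1, keepdims=True)
# Row i of W: how much client i's personalized model draws from each client.
```

No threshold or count of attackers is needed: the malicious row simply receives almost no mass in the legitimate clients' mixing weights.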
Experimental Validation
Datasets and Setup
Experiments were conducted on:
- Digits: Comprising MNIST, USPS, SVHN, and Synthetic Digits, featuring significant feature shifts.
- OfficeCaltech-10: Containing images from Amazon, Caltech, DSLR, and Webcam domains.
Models included a CNN for Digits and ResNet10 for OfficeCaltech-10. Federated settings involved non-IID data splits with varying client counts.
Key Findings
- Performance Improvement:
  • FedPG and FedPGS improved accuracy by 1.20–11.50 percentage points over FedAvg, FedProx, and FedProtoAvg.
  • FedPG (optimal τ) outperformed FedPGS in most cases, but FedPGS achieved comparable results without manual tuning.
- Impact of Smoothing Coefficient:
  • Without τ tuning, FedPG sometimes underperformed baseline methods (e.g., FedProx-PG with τ=1.0 was worse than FedProx).
  • Optimal τ values (e.g., 0.2 for FedAvg-PG) significantly boosted accuracy.
- Malicious Device Resistance:
  • With 25–30% malicious clients, FedPG maintained high accuracy, while FedAvg's performance dropped sharply.
  • FedPGS initially suffered from malicious influence but recovered as τ decreased.
- Generalizability:
  • On MNIST (no feature shift), FedPGS matched FedAvg's accuracy, demonstrating its adaptability.
Conclusion
FedPG and FedPGS offer a robust solution to feature distribution shifts in FL by leveraging model similarity for personalized aggregation. Key advantages include:
• No need for data distribution knowledge: Works purely on model updates.
• Flexible personalization: Adjustable τ balances global and local information.
• Automation: FedPGS eliminates manual τ tuning.
• Security: Resists malicious clients and supports blockchain-based auditing.
Future work could extend these frameworks to scenarios with both feature and volume heterogeneity and explore applications in model-heterogeneous FL.
doi.org/10.19734/j.issn.1001-3695.2024.06.0205