Match-Based Model Offloading for Edge Federated Learning

Introduction

The rapid development of the Internet of Things (IoT) has led to an exponential increase in data generated at the network edge. Traditional cloud computing architectures face significant challenges in handling this massive volume of data due to high communication overhead, latency, and privacy concerns. Federated learning (FL) has emerged as a promising solution, enabling collaborative model training across distributed devices without sharing raw data. However, in edge computing environments, the heterogeneity of device resources introduces the “straggler effect,” where slower devices significantly delay the overall training process.

This paper proposes Fed-MBMO (Federated Learning with Match-Based Model Offloading), a novel approach to mitigate the straggler effect in edge federated learning. By leveraging performance profiling, model freezing, and optimal client matching, Fed-MBMO accelerates training while maintaining model accuracy. The key contributions include:

  1. A model offloading strategy that reduces weak clients' local training time by freezing their feature layers.
  2. A bipartite graph-based matching algorithm to optimize model offloading between strong and weak clients.
  3. Comprehensive experiments demonstrating significant improvements in training efficiency compared to existing methods.

Background and Motivation

Edge Federated Learning

Federated learning enables decentralized model training by aggregating local updates from participating devices. In edge computing, FL leverages the computational capabilities of edge devices, reducing reliance on centralized cloud servers. However, edge devices exhibit varying computational power, memory, and network conditions, leading to the straggler problem.

The Straggler Effect

Stragglers are devices with limited computational resources that take significantly longer to complete training tasks. In synchronous FL, the central server must wait for all clients to finish before aggregation, making stragglers a major bottleneck. Existing solutions include asynchronous training, client selection, and computation offloading, but these approaches often compromise model accuracy or fail to fully address resource heterogeneity.

Challenges in Model Offloading

Offloading model training from weak to strong clients can alleviate stragglers, but several challenges arise:
• Data Heterogeneity: Non-IID (non-independent and identically distributed) data distributions across clients can degrade model performance when offloading.

• Communication Overhead: Frequent model transfers between clients increase network load.

• Optimal Matching: Identifying the best pairs of strong and weak clients to minimize training time while preserving model quality is non-trivial.

Fed-MBMO Methodology

Client Classification

Fed-MBMO begins by classifying clients into strong and weak categories based on their computational performance. Each client executes a profiling phase, measuring the time taken for different training stages (forward propagation, backward propagation, etc.). The median compute time (MCT) is used as a threshold:
• Strong Clients: Devices with expected training times ≤ MCT.

• Weak Clients: Devices with expected training times > MCT.

• Extremely Weak Clients: Devices that remain slow even after freezing feature layers.
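
The three-way classification above can be sketched in a few lines. This is a minimal sketch, not the paper's exact procedure: the function name, the input dictionaries, and the rule that an extremely weak client is one whose expected time still exceeds the MCT even with frozen feature layers are illustrative assumptions.

```python
from statistics import median

def classify_clients(expected_times, frozen_times):
    """Split clients by the median compute time (MCT).

    expected_times: dict client_id -> expected full-training time (s)
    frozen_times:   dict client_id -> expected time with feature layers frozen (s)
    """
    mct = median(expected_times.values())
    strong, weak, extremely_weak = [], [], []
    for cid, t in expected_times.items():
        if t <= mct:
            strong.append(cid)              # fast enough as-is
        elif frozen_times[cid] <= mct:
            weak.append(cid)                # freezing brings it under the threshold
        else:
            extremely_weak.append(cid)      # slow even with frozen feature layers
    return strong, weak, extremely_weak
```

With a profile of four clients, the median splits the population in half, and freezing decides which slow clients are merely weak versus extremely weak.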

Model Training Phases

The training process in convolutional neural networks (CNNs) consists of four phases:

  1. Forward propagation through feature layers (FF).
  2. Forward propagation through fully connected layers (FC).
  3. Backward propagation through fully connected layers (BC).
  4. Backward propagation through feature layers (BF).

Profiling reveals that BF is the most time-consuming phase, accounting for 52-59% of total training time. Fed-MBMO reduces training time for weak clients by freezing feature layers, eliminating BF computations.
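
Given per-phase timings, the saving from freezing is simply the BF share of the per-batch time, since FF is still needed to feed the fully connected layers. A minimal sketch with hypothetical timings (chosen so that BF dominates, consistent with the profiling above):

```python
def frozen_speedup(phase_times):
    """Fraction of per-mini-batch time saved when feature layers are frozen.

    Freezing eliminates backward propagation through the feature layers (BF);
    forward propagation (FF) still runs to produce inputs for the FC layers.
    """
    total = sum(phase_times.values())
    return phase_times["BF"] / total

# Hypothetical per-mini-batch timings (ms), not measurements from the paper.
timings = {"FF": 20.0, "FC": 5.0, "BC": 10.0, "BF": 45.0}
print(f"freezing saves {frozen_speedup(timings):.0%} of per-batch time")
```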

Model Offloading Strategy

Weak clients offload their models to strong clients for additional training. The key steps include:

  1. Freezing Feature Layers: Weak clients stop updating feature layers after a certain number of mini-batches, reducing local training time.
  2. Offloading to Strong Clients: Frozen models are sent to strong clients, which perform extra training iterations.
  3. Model Reconstruction: The strong client’s feature layers are combined with the weak client’s fully connected layers to form the final model.
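
Step 3 amounts to a key-wise merge of two parameter sets. A minimal sketch, assuming models are exchanged as name-to-weights mappings (the function and parameter names are illustrative):

```python
def reconstruct_model(strong_params, weak_params, feature_keys):
    """Rebuild the weak client's model after offloading.

    Feature layers are taken from the strong client (which performed the
    extra training); fully connected layers are kept from the weak client.
    """
    return {name: (strong_params[name] if name in feature_keys else weak_params[name])
            for name in weak_params}

strong = {"conv1": "S1", "conv2": "S2", "fc": "S_fc"}
weak = {"conv1": "W1", "conv2": "W2", "fc": "W_fc"}
merged = reconstruct_model(strong, weak, feature_keys={"conv1", "conv2"})
```

Here `merged` carries the strong client's convolutional weights and the weak client's classifier head, matching the reconstruction rule in step 3.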

Optimal Client Matching

The offloading problem is formulated as a bipartite graph matching problem, where the goal is to minimize both training time and model dissimilarity. The cost function considers:
• Task Completion Time: The expected time for a weak-strong pair to complete training.

• Feature Layer Similarity: Cosine similarity between feature layers of weak and strong clients to ensure compatibility.

The Kuhn-Munkres (KM) algorithm is used to find the optimal matching, iteratively adjusting the cost matrix to balance time and accuracy.
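
The cost structure can be sketched as follows. The blend of completion time and feature-layer dissimilarity with a weight α mirrors the trade-off described above, but the normalization of times and all names here are assumptions; for clarity the sketch enumerates assignments by brute force rather than running Kuhn-Munkres, which finds the same minimum-cost matching in O(n³).

```python
from itertools import permutations
from math import sqrt

def cosine_sim(u, v):
    """Cosine similarity between two flattened feature-layer weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def pair_cost(norm_time, sim, alpha=0.5):
    # Blend normalized completion time with dissimilarity (1 - cosine similarity).
    return alpha * norm_time + (1 - alpha) * (1 - sim)

def match_clients(cost):
    """Minimum-cost matching of weak clients (rows) to strong clients (columns).

    Brute force over permutations for illustration; the Kuhn-Munkres
    (Hungarian) algorithm solves the same assignment problem efficiently.
    """
    n = len(cost)
    best_total, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_total, best_perm = total, perm
    return list(best_perm), best_total
```

For example, with cost matrix `[[0.2, 0.9], [0.8, 0.3]]`, weak client 0 is paired with strong client 0 and weak client 1 with strong client 1, for a total cost of 0.5.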

Experimental Evaluation

Setup

Experiments were conducted using Docker containers to simulate heterogeneous edge devices with varying CPU allocations (0.1 to 1.0 cores). Four datasets were used: MNIST, Fashion-MNIST, CIFAR-10, and EMNIST-B, under both IID and Non-IID (Dirichlet distribution) settings.

Results

  1. Training Time Reduction:
    • Fed-MBMO reduces training time by an average of 46.65% compared to FedAvg, 12.66% compared to FedUE, and 38.07% compared to Aergia.

    • The reduction is most significant in resource-extreme heterogeneous settings.

  2. Model Accuracy:
    • In IID settings, Fed-MBMO achieves comparable or better accuracy than baselines.

    • In Non-IID settings, accuracy slightly decreases (1.37-2.36% for EMNIST-B) due to feature layer mismatches.

  3. Client Waiting Time:
    • Fed-MBMO minimizes the mean waiting time (MWT) across clients, ensuring no single straggler dominates the training process.

Impact of Key Parameters

  1. Weight Parameter (α):
    • Higher α prioritizes faster training, while lower α emphasizes model accuracy.

    • A balanced α (e.g., 0.5) achieves the best trade-off.

  2. CPU Heterogeneity:
    • Greater variance in CPU allocations improves model accuracy but increases training time.

    • Total computational power (sum of CPU cores) has a linear effect on training speed.

Applications and Future Directions

Fed-MBMO is particularly beneficial in scenarios where real-time model updates are critical, such as:
• Healthcare IoT: Wearable devices can quickly adapt to new health data without compromising privacy.

• Autonomous Vehicles: Edge devices in vehicles can rapidly update obstacle detection models.

Future work could explore:
• Dynamic Client Matching: Adapting to changing device conditions in real-time.

• Multi-Objective Optimization: Jointly optimizing for time, accuracy, and energy consumption.

• Broader Non-IID Settings: Extending the approach to more complex data distributions.

Conclusion

Fed-MBMO addresses the straggler problem in edge federated learning through a novel combination of model freezing, offloading, and optimal client matching. By reducing the computational burden on weak clients and leveraging strong clients’ resources, it significantly accelerates training without sacrificing model accuracy. The KM-based matching algorithm ensures efficient offloading, while experimental results validate its superiority over existing methods. Fed-MBMO represents a scalable and practical solution for resource-constrained edge environments, paving the way for faster and more efficient federated learning systems.

DOI: doi.org/10.19734/j.issn.1001-3695.2024.06.0199
