Edge-Cloud Task Offloading and Resource Allocation Using Deep Reinforcement Learning

Introduction

The rapid development of the Internet of Things (IoT) has led to an exponential increase in embedded devices, which are expected to reach nearly 500 billion by 2030. However, these devices often have limited computational power, storage, and battery life, making it challenging to handle resource-intensive tasks locally. Traditional cloud computing provides a solution by offloading tasks to remote data centers, but the long-distance communication introduces significant latency, which is unsuitable for delay-sensitive applications such as remote healthcare and traffic monitoring.

Edge computing has emerged as a complementary solution, offering lower latency and higher bandwidth by processing tasks closer to the data source. Edge servers (ES) are geographically distributed and closer to IoT devices than cloud servers (CS), but they have limited computational resources compared to cloud data centers. Therefore, an efficient task offloading strategy must balance between edge and cloud computing to optimize performance and cost.

Task offloading in edge-cloud environments is particularly challenging due to the highly dynamic and stochastic nature of the system. Tasks vary in computational requirements, deadlines, and dependencies, making it difficult to determine the optimal offloading decision. Traditional heuristic and rule-based approaches struggle to adapt to these complexities, while Q-table-based reinforcement learning (RL) methods suffer from scalability issues in high-dimensional state spaces.

To address these challenges, this paper proposes a novel deep reinforcement learning (DRL) algorithm called Novel Dueling and Double Deep Q-Network (ND3QN). ND3QN combines the strengths of Dueling DQN and Double DQN to improve learning efficiency and stability. It introduces an enhanced state representation that captures dynamic environmental information and a refined reward function to guide the learning process effectively. Additionally, ND3QN supports fine-grained offloading, where tasks are assigned to specific virtual machines (VMs) rather than entire servers, improving resource utilization and meeting multi-tenant requirements.

System Model and Problem Formulation

System Architecture

The system consists of IoT devices, edge servers (ES), and cloud servers (CS). IoT devices connect to the nearest ES via a local area network (LAN), while edge servers communicate with each other over high-speed fiber links and with the CS over a wide-area network (WAN). Each ES and CS hosts multiple heterogeneous VMs, which execute offloaded tasks. A centralized scheduler on the ES collects VM resource information and uses ND3QN to make offloading decisions.

Task Model

Each task is represented as a tuple containing:
• Length: The number of instructions required for execution.

• File Size: The data size, affecting storage and transmission requirements.

• Deadline: The maximum allowable completion time.

• RAM: The memory required for execution.

• CPU: The number of CPU cores needed.

Tasks are generated continuously, and each offloading decision is treated as a discrete time step in a sequential decision-making process.
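The task tuple above can be sketched as a small data structure; the field names and units here are illustrative, not the paper's exact notation:

```python
from dataclasses import dataclass

@dataclass
class Task:
    length: int      # instructions required for execution (e.g., in MI)
    file_size: int   # data size to transfer (e.g., in MB)
    deadline: float  # maximum allowable completion time (s)
    ram: int         # memory required (MB)
    cpu: int         # CPU cores required

# One task arriving at the scheduler at a given time step
task = Task(length=4000, file_size=50, deadline=2.0, ram=512, cpu=2)
```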

Resource Model

The system includes multiple ES and CS, each hosting several VMs with varying computational capabilities, memory, storage, and cost. Tasks are offloaded to VMs in a non-preemptive, first-come-first-served (FCFS) manner.

Completion Time and Cost Models

The completion time of a task consists of:

  1. Transmission time: The time taken to offload the task to the target VM and return results.
  2. Queueing time: The waiting time in the VM’s execution queue.
  3. Execution time: The actual processing time on the VM.

The cost of offloading depends on the VM’s pricing model and the total execution time.
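Under simple assumptions (a fixed transmission bandwidth, FCFS queueing with a known waiting time, and per-second VM pricing), the three completion-time components and the cost can be sketched as follows; the function signatures and parameter names are illustrative, not the paper's:

```python
def transmission_time(file_size_mb: float, bandwidth_mbps: float) -> float:
    """Time to ship the task's input data to the VM (result return ignored)."""
    return file_size_mb * 8 / bandwidth_mbps  # MB -> Mb, divided by Mb/s

def execution_time(length_mi: float, vm_mips: float) -> float:
    """Processing time: instruction count divided by VM speed."""
    return length_mi / vm_mips

def completion_time(file_size_mb: float, length_mi: float,
                    bandwidth_mbps: float, vm_mips: float,
                    queue_wait_s: float) -> float:
    """Sum of the three components listed above."""
    return (transmission_time(file_size_mb, bandwidth_mbps)
            + queue_wait_s
            + execution_time(length_mi, vm_mips))

def monetary_cost(exec_time_s: float, price_per_s: float) -> float:
    """Cost under a simple per-second pricing model."""
    return exec_time_s * price_per_s
```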

Optimization Objective

The goal is to minimize a weighted cost function that balances completion time and monetary cost:
Weighted Cost = ω₁ × Completion Time + ω₂ × Monetary Cost

where ω₁ and ω₂ are user-defined weights.

Constraints ensure that tasks meet deadlines and do not exceed VM resource limits (CPU, RAM, storage). If constraints are violated, the task is discarded.
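The objective and its feasibility check can be sketched as below; the dictionary keys and default weights are assumptions for illustration:

```python
def weighted_cost(completion_time: float, monetary_cost: float,
                  w1: float = 0.5, w2: float = 0.5) -> float:
    """Weighted Cost = w1 * Completion Time + w2 * Monetary Cost."""
    return w1 * completion_time + w2 * monetary_cost

def is_feasible(task: dict, vm: dict, completion_time: float) -> bool:
    """A placement is valid only if the deadline and the VM's free
    resources are all respected; otherwise the task is discarded."""
    return (completion_time <= task["deadline"]
            and task["ram"] <= vm["free_ram"]
            and task["cpu"] <= vm["free_cpu"])
```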

ND3QN Algorithm

Key Innovations

  1. Enhanced State Representation:
    • The state includes both current and previous time-step information about VM resources and task characteristics.

    • This dynamic state representation allows the algorithm to capture environmental changes efficiently without requiring complex neural networks.

  2. Improved Reward Function:
    • The reward function provides immediate feedback on offloading decisions.

    • If constraints are satisfied, the reward is based on the negative weighted cost.

    • If constraints are violated, a penalty is applied, discouraging invalid decisions.

  3. Fine-Grained Offloading:
    • Tasks are assigned to specific VMs rather than entire servers, improving resource utilization and isolation.
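The reward rule described in point 2 can be sketched in a few lines; the penalty magnitude here is an arbitrary illustrative value, not the paper's:

```python
PENALTY = -100.0  # illustrative penalty for an invalid placement

def reward(feasible: bool, w_cost: float) -> float:
    # Valid decisions earn the negative weighted cost, so cheaper and
    # faster placements score higher; invalid ones get a flat penalty
    # that discourages the agent from violating constraints.
    return -w_cost if feasible else PENALTY
```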

Network Architecture

ND3QN uses two neural networks:
• Current Network: Continuously updated during training.

• Target Network: A delayed copy of the current network to stabilize training.

Each network consists of:
• Advantage Stream: Estimates the relative value of each action.

• Value Stream: Estimates the overall state value.

The final Q-value is computed by combining these streams, reducing overestimation bias and improving learning stability.
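The standard dueling combination of the two streams (which, assuming the usual formulation, subtracts the mean advantage so that V and A remain identifiable) can be sketched as:

```python
import numpy as np

def dueling_q(value: float, advantages: np.ndarray) -> np.ndarray:
    # Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a))
    # Centering the advantages prevents V and A from drifting freely.
    return value + (advantages - advantages.mean())

q = dueling_q(2.0, np.array([1.0, 0.0, -1.0]))
# q == [3.0, 2.0, 1.0]
```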

Training Framework

  1. Exploration vs. Exploitation:
    • During training, the algorithm uses an ε-greedy policy to balance exploration (random actions) and exploitation (best-known actions).

    • The exploration rate ε decays over time to favor exploitation as learning progresses.

  2. Experience Replay:
    • Past experiences (state, action, reward, next state) are stored in a buffer and sampled in batches for training.

    • This breaks temporal correlations and improves learning efficiency.

  3. Loss Function:
    • The algorithm minimizes the difference between predicted Q-values and target Q-values using mean squared error.
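The three training pieces above can be sketched together: ε-greedy selection, a replay buffer, and the Double-DQN target (the value of the action picked by the current network, evaluated by the target network). Here `q_next_current` and `q_next_target` stand in for the two networks' Q-value outputs; names and the discount factor are illustrative:

```python
import random
from collections import deque

import numpy as np

buffer = deque(maxlen=10_000)  # experience replay buffer of (s, a, r, s') tuples

def select_action(q_values: np.ndarray, epsilon: float) -> int:
    """Epsilon-greedy: explore with probability epsilon, else exploit."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return int(np.argmax(q_values))

def double_dqn_target(reward: float, q_next_current: np.ndarray,
                      q_next_target: np.ndarray, gamma: float = 0.99,
                      done: bool = False) -> float:
    """Double-DQN TD target: current net selects, target net evaluates."""
    if done:
        return reward
    a_star = int(np.argmax(q_next_current))        # action chosen by current net
    return reward + gamma * q_next_target[a_star]  # value from target net
```

The mean squared error between this target and the current network's prediction is what gets minimized at each training step.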

Complexity Analysis

• Time Complexity: Depends on the number of VMs and neural network layers.

• Space Complexity: Determined by the neural network size and experience replay buffer.

Experimental Results

Simulation Setup

Experiments were conducted using CloudSim, a Java-based simulation tool for edge-cloud environments. The setup included:
• 4 Edge Servers, each with 5 heterogeneous VMs.

• 1 Cloud Server, with 3 high-performance VMs.

• Task Parameters: Varied in length, file size, deadline, RAM, and CPU requirements.

Hyperparameter Tuning

  1. Exploration Rate (ε):
    • Optimal performance was achieved at ε = 0.2, balancing exploration and convergence speed.

  2. Learning Rate (lr):
    • A learning rate of 0.0001 provided stable convergence without oscillations.

Performance Comparison

ND3QN was compared against three baseline algorithms:

  1. Random Policy: Randomly selects a VM for offloading.
  2. Edge-Only (OE): Always offloads tasks to edge servers.
  3. Double DQN (DDQN): A standard DRL baseline.

Training Performance
• ND3QN achieved higher cumulative rewards and lower task discard rates than baselines.

• It converged faster and more stably than DDQN.

Testing Performance
• ND3QN maintained superior performance across different task arrival rates.

• It consistently minimized weighted cost while meeting deadlines.

Ablation Study

To validate the contributions of the enhanced state and reward function, three variants were tested:

  1. Baseline Algorithm (BA): No state or reward improvements.
  2. BA + Reward Improvement (RI): Only the refined reward function.
  3. BA + State Improvement (SI): Only the dynamic state representation.

Results confirmed that both improvements contribute to better performance, with the full ND3QN (BA + RI + SI) achieving the lowest task discard rate.

Conclusion

This paper presented ND3QN, a deep reinforcement learning-based approach for efficient task offloading and resource allocation in edge-cloud environments. By combining Dueling DQN and Double DQN techniques, ND3QN improves learning stability and reduces overestimation bias. The enhanced state representation and reward function further optimize decision-making, while fine-grained offloading improves resource utilization.

Experimental results demonstrated that ND3QN outperforms baseline algorithms in terms of convergence speed, task completion time, and cost efficiency. The ablation study validated the effectiveness of the proposed enhancements.

Future work will explore extending the cost model to include energy consumption and allowing local task execution. Additionally, constraint prioritization could further refine the penalty mechanism.

doi.org/10.19734/j.issn.1001-3695.2024.07.0228
