Deep Reinforcement Learning-Based Dynamic Service Function Chain Deployment in IoT-Mobile Edge Computing Networks
Introduction
The rapid proliferation of Internet of Things (IoT) devices has led to an increasing number of heterogeneous, computation-intensive, and latency-sensitive service requests. These requests often require multiple network functions (NFs), such as firewalls, deep packet inspection, intrusion prevention systems, and load balancers. Network Function Virtualization (NFV) enables these functions to be decoupled from dedicated hardware and deployed as software applications on virtual machines (VMs), known as Virtual Network Functions (VNFs). This flexibility allows VNFs to adapt to dynamic network environments and meet diverse service demands.
In addition, Software-Defined Networking (SDN) decouples the control plane from the data plane, enabling centralized management of IoT services. However, many IoT applications demand lower latency and more computational resources than remote cloud data centers can provide. Mobile Edge Computing (MEC) addresses these requirements by deploying VNFs at the network edge, reducing end-to-end latency and providing computational support for IoT devices.
Real-time service requests from IoT devices often require traversal through predefined sequences of VNFs, forming Service Function Chains (SFCs). These requests are referred to as IoT-SFC Requests (IoT-SFCRs). Due to limited edge cloud resources, SDN controllers must dynamically place VNFs and determine optimal routing strategies for IoT-SFCRs—a problem known as dynamic SFC deployment.
Traditional optimization methods, such as heuristic algorithms, struggle to efficiently solve dynamic SFC deployment in large-scale IoT-MEC networks. These methods often require excessive computational resources and may converge to suboptimal solutions in highly dynamic environments. Recent studies have explored machine learning approaches, including Deep Reinforcement Learning (DRL), for traffic routing, VNF placement, and SFC orchestration. However, many existing solutions fail to consider resource consumption costs, Quality of Service (QoS) requirements, or the global optimization of VNF placement.
This paper introduces a novel DRL-based approach for dynamic SFC deployment in IoT-MEC networks. The proposed method decomposes the problem into two subproblems: VNF placement and routing path determination. A Markov Decision Process (MDP) models the network state transitions, and a Double Deep Q-Network (DDQN) algorithm is employed to optimize VNF selection and routing decisions. The approach minimizes the weighted sum of resource consumption costs and end-to-end delay while ensuring network load balancing.
System Model and Problem Formulation
Physical Network
The IoT-MEC network is represented as an undirected graph, where nodes include edge clouds (cloudlets) and links connecting these nodes. Each cloudlet has limited CPU capacity and processing latency, while each link has bandwidth capacity and transmission delay. The network also includes an SDN controller responsible for receiving IoT-SFCR information and orchestrating SFC deployments.
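As a minimal sketch of this model (the attribute names and example values are illustrative assumptions, not taken from the paper), the physical network can be captured as an attributed undirected graph:

```python
# Sketch of the IoT-MEC physical network as an undirected graph.
# Attribute names and example values are illustrative assumptions.
import networkx as nx

net = nx.Graph()

# Cloudlet nodes carry CPU capacity and processing latency.
net.add_node("c1", cpu_capacity=32.0, proc_delay_ms=2.0, is_cloudlet=True)
net.add_node("c2", cpu_capacity=16.0, proc_delay_ms=3.0, is_cloudlet=True)
net.add_node("s1", is_cloudlet=False)  # plain forwarding node

# Links carry bandwidth capacity and transmission delay.
net.add_edge("c1", "s1", bandwidth=100.0, delay_ms=1.5)
net.add_edge("s1", "c2", bandwidth=100.0, delay_ms=2.0)
```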
Service Function Chain Requests
An IoT-SFCR consists of a source node, a destination node, and an ordered set of VNFs that must be traversed. Each request specifies CPU and bandwidth requirements, as well as a maximum tolerable end-to-end delay. The dynamic nature of IoT networks, coupled with random service request arrivals, makes SFC deployment a challenging problem.
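A possible in-memory representation of such a request is sketched below; the field names are our own assumptions based on the description above.

```python
# Illustrative representation of an IoT-SFC request; field names are assumptions.
from dataclasses import dataclass
from typing import List

@dataclass
class IoTSFCR:
    src: str               # source node (e.g. an IoT gateway)
    dst: str               # destination node
    vnf_chain: List[str]   # ordered VNF types, e.g. ["firewall", "dpi", "lb"]
    cpu_demand: float      # CPU units required per VNF instance
    bw_demand: float       # bandwidth required along the routing path
    max_delay_ms: float    # maximum tolerable end-to-end delay

req = IoTSFCR("s1", "d1", ["firewall", "dpi", "lb"], 2.0, 10.0, 50.0)
```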
Problem Decomposition
To simplify the dynamic SFC deployment problem, it is decomposed into two subproblems:
- VNF Placement Subproblem: Determines the optimal cloudlets for hosting required VNF instances.
- Routing Path Subproblem: Selects the best paths to connect source, destination, and intermediate VNF instances while meeting bandwidth and latency constraints.
Optimization Objectives
The primary objective is to minimize the weighted sum of resource consumption costs and end-to-end delay. Resource costs include VNF deployment expenses, computational resource usage, and penalties for rejected requests. The optimization must also ensure that CPU and bandwidth constraints are satisfied and that end-to-end latency remains within acceptable limits.
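Written in generic notation (the symbols below are our own, not the paper's), the objective is a weighted sum of total resource cost and end-to-end delay, subject to node CPU, link bandwidth, and per-request delay constraints. Here C_total aggregates deployment cost, CPU usage cost, and rejection penalties; F_v is the set of VNF instances on cloudlet v; R_e is the set of requests routed over link e.

```latex
% Objective: weighted sum of total resource cost and end-to-end delay
\min \;\; \alpha \, C_{\mathrm{total}} + \beta \, D_{\mathrm{e2e}},
\qquad
C_{\mathrm{total}} = C_{\mathrm{deploy}} + C_{\mathrm{cpu}} + C_{\mathrm{reject}}

% Constraints: node CPU, link bandwidth, and per-request delay
\sum_{f \in \mathcal{F}_v} c_f \le C_v \;\; \forall v,
\qquad
\sum_{r \in \mathcal{R}_e} b_r \le B_e \;\; \forall e,
\qquad
D_{\mathrm{e2e}}(r) \le D^{\max}_r \;\; \forall r
```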
Deep Reinforcement Learning for Dynamic SFC Deployment
Overview of the Proposed Solution
The proposed solution leverages DRL to dynamically adapt to network changes and optimize SFC deployment. The framework consists of two neural networks:
- VNF Selection Network: Identifies the best cloudlets for VNF placement.
- SFC Path Search Network (SPSN): Determines optimal routing paths between selected cloudlets.
The SDN controller collects network state information and IoT-SFCR details, feeding them into the VNF selection network. The network outputs Q-values for the possible cloudlet combinations, and the agent selects the action with the highest Q-value. If a selected cloudlet does not yet host the required VNF instance, the instance is deployed on that cloudlet dynamically.
For routing, the SPSN evaluates potential paths and selects the top-k candidates based on Q-values. A heuristic algorithm then verifies resource availability and latency constraints to determine feasible paths.
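A minimal sketch of the top-k step is shown below; the assumption that the SPSN emits one Q-value per candidate path is ours.

```python
# Sketch: keep the k candidate paths with the highest SPSN Q-values.
# The shape of `q_values` (one entry per candidate path) is an assumption.
import numpy as np

def top_k_paths(q_values, candidate_paths, k=3):
    order = np.argsort(q_values)[::-1]      # indices sorted by descending Q-value
    return [candidate_paths[i] for i in order[:k]]
```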
Markov Decision Process Model
The dynamic SFC deployment problem is modeled as an MDP, defined by states, actions, and rewards.
State Representation
The state observed by the agent includes (assembled into a single feature vector in the sketch after this list):
• Remaining CPU resource ratios of cloudlets.
• CPU requirements of requested VNFs.
• Maximum tolerable delay of the IoT-SFCR.
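A possible flattening of these quantities into one vector is sketched below; the ordering and normalization are illustrative assumptions.

```python
# Sketch of how the MDP state could be flattened into one feature vector.
# The ordering and normalization are illustrative assumptions.
import numpy as np

def build_state(cpu_free, cpu_capacity, vnf_cpu_demands, max_delay_ms):
    cpu_ratios = [f / c for f, c in zip(cpu_free, cpu_capacity)]  # remaining CPU ratio per cloudlet
    return np.array(cpu_ratios + list(vnf_cpu_demands) + [max_delay_ms], dtype=np.float32)

state = build_state([12.0, 6.0], [32.0, 16.0], [2.0, 1.5, 3.0], 50.0)
```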
Action Space
For VNF placement, actions represent all possible cloudlet combinations for hosting required VNFs. For routing, actions correspond to candidate paths between cloudlets, filtered by maximum hop count to avoid excessive latency.
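The two action spaces could be enumerated as follows; function and parameter names are illustrative.

```python
# Sketch of the two action spaces; names are illustrative.
from itertools import product
import networkx as nx

def placement_actions(cloudlet_ids, num_vnfs):
    # every ordered assignment of the required VNFs to cloudlets
    return list(product(cloudlet_ids, repeat=num_vnfs))

def routing_actions(net, src, dst, max_hops):
    # candidate paths between two nodes, capped by hop count to bound latency
    return list(nx.all_simple_paths(net, src, dst, cutoff=max_hops))
```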
Reward Function
The reward function balances resource consumption and latency (a numeric sketch follows this list):
• VNF Selection Network Reward: Penalizes high latency and resource costs while encouraging efficient VNF placement.
• SPSN Reward: Combines path latency and bandwidth usage costs, weighted to ensure stability during training.
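The sketch below illustrates one way to shape these rewards; the weights and the negative-cost form are assumptions rather than the paper's values.

```python
# Illustrative reward shaping; the weights and negative-cost form are assumptions.
def vnf_selection_reward(resource_cost, delay_ms, w_cost=0.5, w_delay=0.5):
    # cheaper, lower-latency placements receive higher (less negative) rewards
    return -(w_cost * resource_cost + w_delay * delay_ms)

def spsn_reward(path_delay_ms, bw_cost, w_delay=0.5, w_bw=0.5):
    return -(w_delay * path_delay_ms + w_bw * bw_cost)
```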
Training Algorithm
The DDQN algorithm is used to train both neural networks. Experience replay stores past transitions (state, action, reward, next state) to break correlations in training data. The online network is updated via gradient descent, while the target network periodically synchronizes its parameters with the online network to stabilize training.
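A minimal sketch of the DDQN update with experience replay and a target network is given below (PyTorch here; the network architecture and hyperparameters are our assumptions, not the paper's).

```python
# Minimal DDQN update sketch; architecture and hyperparameters are assumptions.
import random
from collections import deque
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, state_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                 nn.Linear(128, num_actions))
    def forward(self, x):
        return self.net(x)

def ddqn_update(online, target, replay, optimizer, batch_size=32, gamma=0.99):
    states, actions, rewards, next_states = zip(*random.sample(replay, batch_size))
    s  = torch.stack(states)
    a  = torch.tensor(actions).unsqueeze(1)
    r  = torch.tensor(rewards, dtype=torch.float32)
    s2 = torch.stack(next_states)

    # Double DQN: the online network selects the next action,
    # the target network evaluates it.
    next_a = online(s2).argmax(dim=1, keepdim=True)
    target_q = r + gamma * target(s2).gather(1, next_a).squeeze(1)

    q = online(s).gather(1, a).squeeze(1)
    loss = nn.functional.mse_loss(q, target_q.detach())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Experience replay: replay = deque(maxlen=10000); every C steps the target
# network is synchronized via target.load_state_dict(online.state_dict()).
```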
Algorithm Implementation
DRL-SFCD Execution Process
The DRL-SFCD algorithm operates in four stages:
- VNF Instance Selection: The VNF selection network identifies the best cloudlets for hosting required VNFs.
- Path Evaluation: The SPSN ranks potential routing paths based on Q-values.
- Feasibility Check: A heuristic algorithm verifies resource availability and latency constraints for the top-k paths.
- Execution: The best feasible path is deployed in the actual network.
Heuristic Algorithm for Path Selection
The heuristic algorithm constructs a simulated environment to evaluate path feasibility. It checks bandwidth and latency constraints, computes rewards for feasible paths, and selects the path with the highest reward for real-world deployment.
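One way such a check could look is sketched below; it reuses the illustrative link attributes from the graph sketch above, and the reward expression is an assumed stand-in for the paper's.

```python
# Sketch of the feasibility check over the top-k candidate paths.
# Link attributes follow the earlier graph sketch; the reward shape is assumed.
def select_feasible_path(net, request, candidate_paths, w_delay=0.5, w_bw=0.5):
    best_path, best_reward = None, float("-inf")
    for path in candidate_paths:
        edges = list(zip(path[:-1], path[1:]))
        delay = sum(net.edges[u, v]["delay_ms"] for u, v in edges)
        free_bw = min(net.edges[u, v]["bandwidth"] for u, v in edges)
        if free_bw >= request.bw_demand and delay <= request.max_delay_ms:
            reward = -(w_delay * delay + w_bw * request.bw_demand * len(edges))
            if reward > best_reward:
                best_path, best_reward = path, reward
    return best_path  # None means no feasible path, so the request is rejected
```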
Simulation and Performance Evaluation
Simulation Setup
Three network topologies were used for evaluation:
- Random Network: Each pair of nodes connected with probability 0.4.
- Small-World Network: High clustering with short path lengths.
- Scale-Free Network: Power-law degree distribution.
Each topology included 24 nodes, with 8 designated as cloudlets. Six VNF types were considered, each with specific CPU requirements. IoT-SFCRs were randomly generated, each requiring traversal of three VNFs with bandwidth and latency constraints.
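These three topologies correspond to standard random-graph families; a sketch of how such instances could be generated is shown below (only the 24-node size and the 0.4 edge probability come from the setup above, the remaining parameters are assumptions).

```python
# Sketch of topology generation with networkx; small-world and scale-free
# parameters (k, p, m) are assumptions, not the paper's values.
import networkx as nx

n = 24
random_net      = nx.erdos_renyi_graph(n, p=0.4)          # random network
small_world_net = nx.watts_strogatz_graph(n, k=4, p=0.3)  # small-world network
scale_free_net  = nx.barabasi_albert_graph(n, m=2)        # scale-free network
```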
Comparative Algorithms
The proposed DRL-SFCD was compared against:
- Random Method: Randomly selects cloudlets and paths.
- Delay-Least-Greedy Method: Prioritizes low-latency paths.
- DQL-SP: Combines DRL for VNF placement with Dijkstra’s algorithm for routing.
Results and Analysis
Success Rate
DRL-SFCD consistently achieved higher IoT-SFCR acceptance rates across all topologies. The DQL-SP method performed comparably but degraded with increasing request volumes due to suboptimal resource balancing.
Average Reward
DRL-SFCD maximized rewards by efficiently balancing latency and resource costs. The random and greedy methods exhibited significant performance drops under high load, while DRL-SFCD maintained stable performance.
Key Findings
• In random networks, DRL-SFCD improved success rates by 17% over baseline methods.
• In small-world and scale-free networks, DRL-SFCD achieved 23.8% higher average rewards.
• The heuristic-aided path selection ensured feasible deployments even under resource constraints.
Conclusion
This paper presented DRL-SFCD, a DRL-based algorithm for dynamic SFC deployment in IoT-MEC networks. By decomposing the problem into VNF placement and routing subproblems, the approach efficiently optimized resource usage and latency. The integration of DDQN with heuristic path selection ensured robust performance across diverse network topologies.
Simulation results demonstrated that DRL-SFCD outperformed existing methods in both success rate and average reward. The algorithm’s adaptability to dynamic network conditions makes it a promising solution for real-world IoT-MEC deployments. Future work could explore multi-agent DRL for distributed SFC orchestration and enhanced scalability.
doi.org/10.19734/j.issn.1001-3695.2024.06.0222