Multi-Objective Deep Reinforcement Learning for Fire Facility Location Problems
Introduction
Fire emergencies pose significant threats to public safety, often resulting in substantial human casualties and property damage when responses are delayed. The strategic placement of firefighting facilities is crucial for minimizing response times and ensuring effective emergency coverage. Traditional approaches to fire station location problems have primarily focused on single objectives such as minimizing response time or construction costs. However, these models often neglect important factors such as the psychological impact on citizens during rescue operations. This paper presents a comprehensive three-objective optimization model that simultaneously considers response timeliness, public anxiety during rescue waits, and facility construction costs.
The fire facility location problem belongs to the class of NP-hard combinatorial optimization problems, making exact solution methods computationally intractable for large-scale instances. While heuristic algorithms have been commonly employed, they frequently suffer from local optima entrapment and slow convergence in high-dimensional search spaces. Recent advances in deep reinforcement learning (DRL) have demonstrated promising results in solving complex combinatorial problems by learning effective search policies. Unlike traditional heuristics, DRL-based approaches can adaptively select optimization operators during the search process, leading to faster convergence and better solution quality.
This work introduces a novel Multi-Objective Deep Reinforcement Learning (MDRL) framework that leverages operator learning to solve the fire facility location problem. The model incorporates two distinct reward calculation methods: Advantage Disparity-Driven Reward (MDRL-AD) for small-scale problems and Dominance-Evaluation Reward (MDRL-DE) for large-scale instances. Through extensive computational experiments across different problem scales and a real-world case study, the proposed approach demonstrates superior performance compared to state-of-the-art multi-objective optimization algorithms.
Problem Formulation
The fire facility location problem addressed in this study considers three critical objectives that reflect practical requirements for urban emergency service planning. The first objective minimizes the maximum emergency response time across all demand points, ensuring that no area suffers from excessively delayed firefighting assistance. The second objective addresses the psychological impact on citizens during rescue operations by minimizing the total anxiety level of the population. This innovative aspect models public anxiety using a sigmoid function that increases with longer waiting times, recognizing that prolonged uncertainty during emergencies can lead to panic and disorder. The third objective focuses on economic efficiency by minimizing the total construction cost of new fire facilities.
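The sigmoid anxiety objective can be sketched as follows. This is a minimal illustration of a logistic anxiety function that grows with waiting time; the steepness `k` and midpoint `t0` are assumed illustrative parameters, not values from the paper.

```python
import math

def anxiety(wait_time: float, k: float = 0.05, t0: float = 180.0) -> float:
    """Sigmoid anxiety level as a function of waiting time (seconds).

    k (steepness) and t0 (midpoint of the curve) are illustrative
    parameters, not calibrated values from the study.
    """
    return 1.0 / (1.0 + math.exp(-k * (wait_time - t0)))

def total_anxiety(wait_times, populations):
    """Population-weighted anxiety summed over all demand points."""
    return sum(p * anxiety(t) for t, p in zip(wait_times, populations))

# Longer waits yield strictly higher anxiety.
assert anxiety(60.0) < anxiety(180.0) < anxiety(300.0)
```

The sigmoid shape captures the qualitative claim in the text: anxiety is low for short waits, rises sharply around some tolerance threshold, and saturates for very long waits.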
The model distinguishes between two types of fire facilities: standard fire stations and mini fire stations. Standard stations offer broader coverage and faster response capabilities but require higher construction costs and more space. Mini stations provide more flexible deployment options with lower costs but have limited service ranges and slower response speeds. The formulation also incorporates existing fire stations into the optimization framework, recognizing their continued service value while determining optimal locations for new facilities.
Key constraints in the model ensure practical feasibility of solutions. Each demand point must be served by a minimum number of facilities to provide redundancy in emergency coverage. Response time constraints guarantee that all areas receive service within acceptable time limits. The total number of new facilities is bounded to prevent excessive infrastructure investment while maintaining adequate coverage. The model carefully handles allocation variables to ensure that only established facilities can provide service to demand points.
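The three constraint families above can be expressed as a single feasibility check. The sketch below assumes precomputed coverage counts and best response times per demand point; the thresholds `min_cover`, `t_max`, and `max_new` are illustrative, not the paper's parameter values.

```python
def is_feasible(coverage, response_times, n_new,
                min_cover: int = 2, t_max: float = 300.0,
                max_new: int = 10) -> bool:
    """Check one candidate solution against the model's constraint families.

    coverage[j]       -- number of facilities able to serve demand point j
    response_times[j] -- best achievable response time for j (seconds)
    n_new             -- number of newly established facilities

    min_cover, t_max, and max_new are assumed illustrative thresholds.
    """
    return (all(c >= min_cover for c in coverage)          # redundancy
            and all(t <= t_max for t in response_times)    # time limit
            and n_new <= max_new)                          # budget bound
```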
Methodology
The proposed Multi-Objective Deep Reinforcement Learning (MDRL) framework approaches the fire facility location problem as a Markov Decision Process, where an intelligent agent learns to select optimal improvement operators through interaction with the solution space. The system consists of five core components: state representation, policy network, action space (optimization operators), reward function, and return calculation.
The state representation combines static and dynamic information about the problem instance. Static features include facility candidate locations, construction costs, demand point characteristics, and existing infrastructure. Dynamic features capture the current solution state, historical operator selections, and their performance impacts. This comprehensive state encoding enables the policy network to make informed decisions based on both the problem structure and search trajectory.
The policy network architecture employs a transformer-based encoder-decoder structure with multiple attention heads. The encoder processes facility and demand point information through successive layers of self-attention and feed-forward networks, while the decoder incorporates historical search patterns and masking mechanisms. The network outputs probability distributions over available optimization operators, using a scaled softmax function to prevent gradient vanishing issues.
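One common realization of such a scaled softmax, used here purely as an assumed sketch, clips logits with C·tanh before normalizing (as in pointer-network-style decoders); the paper's exact scaling may differ, and C = 10 is an assumed constant.

```python
import math

def scaled_softmax(logits, c: float = 10.0):
    """Softmax over operator logits after C*tanh clipping.

    Clipping bounds logits to [-c, c], so no operator's probability
    collapses toward zero and gradients through the softmax stay usable.
    c = 10 is an assumed value, not one taken from the paper.
    """
    clipped = [c * math.tanh(x) for x in logits]
    m = max(clipped)                         # subtract max for stability
    exps = [math.exp(x - m) for x in clipped]
    z = sum(exps)
    return [e / z for e in exps]

probs = scaled_softmax([2.0, -1.0, 0.5])
assert abs(sum(probs) - 1.0) < 1e-9
```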
Two distinct sets of optimization operators form the action space. Problem Instance-Oriented Operators (PIOO) include specialized swap, flip, and break operators designed specifically for fire facility location characteristics. These operators consider factors like maximum response times and construction costs when modifying solutions. Decoding Scale-Oriented Operators (DSOO) provide more general-purpose neighborhood search operations that automatically adjust their perturbation intensity based on problem size.
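A schematic swap operator over a binary location vector illustrates the kind of move involved. This sketch is deliberately generic: the paper's PIOO swap additionally uses response-time and cost information to bias which facility pair is chosen, which is omitted here.

```python
import random

def swap_operator(solution, rng=random):
    """Swap one opened facility with one closed candidate site.

    `solution` is a list of 0/1 decisions over candidate sites. This is
    a generic neighborhood move; the paper's problem-oriented operators
    would bias the choice of (i, j) using domain information.
    """
    opened = [i for i, x in enumerate(solution) if x == 1]
    closed = [i for i, x in enumerate(solution) if x == 0]
    if not opened or not closed:
        return solution[:]          # no valid swap available
    neighbor = solution[:]
    i, j = rng.choice(opened), rng.choice(closed)
    neighbor[i], neighbor[j] = 0, 1
    return neighbor
```

Because the move closes exactly one site and opens exactly one, it preserves the facility count, which keeps the budget-related constraints easy to maintain during search.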
For reward calculation in multi-objective optimization, the framework implements two complementary approaches. The Advantage Disparity-Driven Reward (ADR) method evaluates improvements in weighted combinations of normalized objective values, suitable for small-scale problems where objective trade-offs are more straightforward. The Dominance-Evaluation Reward (DER) method directly assesses Pareto dominance relationships between solutions, incorporating an innovative incentive factor that maintains learning stability as solutions approach the Pareto frontier. This approach proves particularly effective for large-scale problems where simple weighted combinations may not adequately capture complex objective interactions.
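The dominance-based reward can be sketched with a standard Pareto-dominance check. The reward values below, and the small bonus standing in for the paper's incentive factor, are assumed for illustration only.

```python
def dominates(a, b) -> bool:
    """True if objective vector a Pareto-dominates b (all minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def dominance_reward(new_obj, old_obj, incentive: float = 0.1) -> float:
    """Reward for a search move based on the dominance relation.

    +1 if the move produced a dominating solution, -1 if it was
    dominated, and a small assumed incentive for mutually non-dominated
    moves, which become frequent as the search nears the Pareto front.
    """
    if dominates(new_obj, old_obj):
        return 1.0
    if dominates(old_obj, new_obj):
        return -1.0
    return incentive  # non-dominated trade-off move
```

The incentive term mirrors the stability argument in the text: without it, late-stage moves that trade one objective for another would receive no signal and learning would stall near the frontier.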
The training process employs a modified REINFORCE algorithm with baseline subtraction to reduce variance in policy gradient updates. The model is trained on diverse problem instances with carefully designed initial solutions that balance solution quality and diversity. The training incorporates mechanisms to handle constraints, ensuring feasible solutions throughout the learning process.
Computational Experiments
The experimental evaluation comprehensively assesses the performance of the proposed MDRL framework across different problem scales and compares it against three state-of-the-art multi-objective optimization algorithms: an improved NSGA-II genetic algorithm, an enhanced MOPSO particle swarm optimization method, and the Learning to Improve (L2I) deep reinforcement learning approach.
Small-scale validation tests confirm that the MDRL algorithm successfully recovers optimal solutions identified by exact methods while additionally discovering numerous non-dominated alternatives. For medium and large-scale instances, extensive testing demonstrates MDRL’s superior performance across four key metrics: Hypervolume (measuring convergence to the true Pareto front), Spacing (assessing solution diversity), Ω-dominance (quantifying Pareto front coverage), and IGD (combining convergence and diversity assessment).
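Of the four metrics, IGD is the most compact to state: the mean distance from each point on a reference (ideally true) Pareto front to its nearest point in the approximation front, with lower values indicating both better convergence and better coverage. A minimal sketch:

```python
import math

def igd(reference_front, approx_front) -> float:
    """Inverted Generational Distance (lower is better).

    Mean Euclidean distance from each reference-front point to its
    nearest neighbor in the approximation front.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return (sum(min(dist(r, s) for s in approx_front)
                for r in reference_front)
            / len(reference_front))

# A front evaluated against itself has zero IGD.
assert igd([[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [1.0, 1.0]]) == 0.0
```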
The comparison between operator sets reveals that problem-specific PIOO operators consistently outperform more generic DSOO operators, particularly as problem size increases. This advantage stems from PIOO’s ability to incorporate domain knowledge about facility location characteristics into the search process. The evaluation also demonstrates that MDRL-DE’s dominance-based reward calculation significantly outperforms MDRL-AD’s weighted-sum approach for large-scale instances, while both methods show comparable performance on smaller problems.
A notable finding is MDRL’s exceptional scalability. While traditional algorithms like NSGA-II and MOPSO show degraded performance as problem size grows beyond 200 candidate locations, MDRL maintains high solution quality even for very large instances with 500 candidate sites. This scalability makes the approach particularly suitable for real-world urban planning scenarios that typically involve numerous potential facility locations and demand points.
Real-World Case Study
The practical applicability of MDRL is demonstrated through a detailed case study of fire facility placement in Shanghai’s Fengxian New City district. The study incorporates real geographical data, existing fire station locations, population density estimates from foot traffic analysis, and actual construction cost parameters. The model considers 83 candidate locations (23 standard stations and 60 mini stations) serving 36 demand points across the urban area.
Analysis of current coverage reveals that 8 demand points exceed acceptable response time thresholds, while 13 points lack sufficient facility coverage according to safety standards. The MDRL algorithm identifies 44 non-dominated solutions that provide various trade-offs between maximum response time, public anxiety reduction, and infrastructure investment. A selected optimal solution suggests establishing 3 new standard fire stations and 4 mini stations, achieving a maximum response time of 277.2 seconds (within the 5-minute emergency standard) at a total cost of 32.4 million yuan.
Comparative analysis shows MDRL’s superior performance in the real-world scenario, discovering significantly more non-dominated solutions than alternative methods while achieving better Hypervolume and IGD metrics. The algorithm’s ability to handle mixed facility types (standard and mini stations) and incorporate existing infrastructure demonstrates its practical utility for urban planning decision-makers.
Conclusion
This paper presents a comprehensive multi-objective optimization framework for fire facility location problems that advances current methodologies in several important directions. By incorporating public anxiety as an explicit optimization objective alongside traditional response time and cost considerations, the model provides a more holistic approach to emergency service planning. The innovative MDRL algorithm demonstrates how deep reinforcement learning can effectively solve complex combinatorial optimization problems through adaptive operator selection.
The proposed approach offers several advantages over conventional methods. The integration of domain knowledge through specialized optimization operators enables more efficient search processes. The dual reward calculation strategies (ADR and DER) provide flexibility for different problem scales. The transformer-based policy network effectively captures complex relationships in facility location problems while maintaining computational efficiency.
Experimental results across various problem scales and the real-world case study validate MDRL’s superior performance in terms of solution quality, diversity, and computational efficiency. The algorithm’s strong scalability makes it particularly valuable for large urban areas where traditional methods become computationally prohibitive. Future research directions include extending the framework to dynamic fire vehicle dispatching problems and investigating transfer learning capabilities across different urban configurations.
The practical implications of this work are significant for urban planners and emergency service providers. By simultaneously optimizing response capability, psychological impact, and economic efficiency, the model supports data-driven decision making for fire facility placement. The ability to generate multiple non-dominated solutions allows stakeholders to evaluate various trade-offs and select implementations that best align with local priorities and constraints.
DOI: 10.19734/j.issn.1001-3695.2024.06.0276