Short-Term Power Load Forecasting Based on Component-Aware Dynamic Graph Transformer

Accurate short-term load forecasting is crucial for the stable operation and effective scheduling of power systems. However, the nonlinear and non-stationary nature of power load data often leads to low prediction accuracy. While decomposition techniques can reduce the impact of non-stationarity and improve forecasting performance, existing decomposition-based methods fail to capture relationships between decomposed components and significantly increase computational time. To address these challenges, this paper introduces the Component-Aware Dynamic Graph Transformer (CDGT) model, which integrates improved decomposition techniques with advanced deep learning architectures for more efficient and accurate load forecasting.

Introduction

Power load forecasting plays a vital role in modern power systems, where fluctuations and uncertainties in electricity demand require precise predictions to ensure grid reliability and efficiency. Short-term load forecasting, which predicts electricity demand from hours to days ahead, enables power dispatch centers to optimize generation planning, allocate resources effectively, and respond to unexpected events. Additionally, accurate load forecasting supports electricity market operations by facilitating fair pricing and risk management. However, the non-stationary and nonlinear characteristics of power load data make forecasting particularly challenging.

Traditional statistical models, such as ARIMA and exponential smoothing, have been widely used but struggle with nonlinear relationships. Machine learning methods like SVM and XGBoost offer better adaptability but often suffer from overfitting and inefficient feature extraction. Deep learning models, including RNNs, LSTMs, and GRUs, have improved forecasting by capturing temporal dependencies, yet they still neglect inter-sequence relationships. Transformer-based models, with their self-attention mechanisms, have shown promise in sequence modeling but often mix features across channels, leading to suboptimal performance.

This paper proposes CDGT, a novel approach that combines optimized decomposition with dynamic graph-based modeling and Transformer architectures. The key contributions include:

  1. Improved Decomposition with JSSAO-VMD: An enhanced Snow Ablation Optimizer (JSSAO) tunes the VMD parameters (the number of modes K and the penalty factor α), improving decomposition quality.
  2. Component-Aware Dynamic Graph Modeling: Graph neural networks (GNNs) automatically learn relationships between decomposed components.
  3. Channel-Independent Transformer with EMA Attention: A modified Transformer processes each component independently, with exponential-moving-average (EMA) attention, computed in the frequency domain, strengthening local dependency modeling.
  4. Efficient Multi-Component Prediction: The model outputs all component predictions simultaneously, reducing computational overhead.

Methodology

Optimized Variational Mode Decomposition

Variational Mode Decomposition (VMD) is a powerful technique for decomposing complex signals into intrinsic mode functions (IMFs) with distinct frequency characteristics. Unlike empirical mode decomposition (EMD), VMD avoids mode mixing and provides better noise robustness. However, its performance heavily depends on two key parameters: the number of decomposition modes (K) and the penalty factor (α). Manual selection of these parameters often leads to suboptimal decomposition.
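
For concreteness, the sketch below runs a plain VMD on a synthetic series using the third-party vmdpy package; the library choice and all settings other than K and α are illustrative assumptions, not the paper's implementation.

```python
# A plain VMD run on a synthetic stand-in for a load series, using the
# third-party vmdpy package (an assumed library choice, not the paper's).
import numpy as np
from vmdpy import VMD

t = np.linspace(0, 1, 1000)
load = (np.sin(2 * np.pi * 5 * t)              # slow trend-like component
        + 0.5 * np.sin(2 * np.pi * 40 * t)     # faster fluctuation
        + 0.1 * np.random.randn(t.size))       # noise

K, alpha = 6, 1102      # the two parameters JSSAO optimizes (Morocco values)
tau, DC, init, tol = 0.0, 0, 1, 1e-7           # typical vmdpy settings

# u: (K, N) array of IMFs; omega: per-iteration center frequencies.
u, u_hat, omega = VMD(load, alpha, tau, K, DC, init, tol)
print(u.shape)          # (6, 1000)
```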

To address this, the paper introduces the Jointly Searched and Stochastic Perturbed Snow Ablation Optimizer (JSSAO), an improved version of the Snow Ablation Optimizer (SAO). The enhancements include:
• Initial Population Improvement: Using a good point set to ensure uniform distribution and enhance diversity.

• Global and Local Search Balancing: Brownian motion (a Gaussian random walk) simulates snow sublimation for global exploration, while a degree-day melt model refines local exploitation.

• Dual-Population Mechanism: Separate populations handle exploration and exploitation, with dynamic size adjustments.

• Random Perturbation and Joint Opposite Selection: These strategies prevent premature convergence and improve search efficiency.

The fitness function for the optimization is the envelope entropy of the decomposed modes, which measures signal complexity and noise level; lower entropy indicates cleaner, more informative modes and thus better decomposition quality.
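
A sketch of such a fitness function, assuming the common Hilbert-envelope formulation of envelope entropy (the paper's exact normalization may differ):

```python
# Envelope-entropy fitness over a set of IMFs (Hilbert-envelope form).
import numpy as np
from scipy.signal import hilbert

def envelope_entropy(imf: np.ndarray) -> float:
    """Shannon entropy of the normalized Hilbert envelope of one IMF."""
    env = np.abs(hilbert(imf))       # instantaneous amplitude envelope
    p = env / env.sum()              # normalize into a probability mass
    return float(-np.sum(p * np.log(p + 1e-12)))

def fitness(imfs: np.ndarray) -> float:
    """Quantity for JSSAO to minimize: the best (lowest) envelope entropy
    among the K modes, a common choice for VMD parameter selection."""
    return min(envelope_entropy(imf) for imf in imfs)
```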

Component-Aware Dynamic Graph Modeling

After decomposition, the model must capture relationships between different IMFs. Traditional approaches process each component separately, ignoring potential interdependencies. CDGT addresses this by representing IMFs as nodes in a dynamic graph, where edges represent learned relationships.

The graph structure is learned end-to-end using attention mechanisms:

  1. Dynamic Graph Learning: A self-attention mechanism computes similarity scores between components, forming a weighted adjacency matrix.
  2. Global Message Tokens: Learnable tokens aggregate global information from each component, enhancing interaction.
  3. Graph Message Passing: A GNN propagates information across the graph, refining component representations based on their neighbors.

This approach allows the model to exploit both intra-component patterns and inter-component dependencies, improving forecasting accuracy.
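
A minimal PyTorch sketch of this mechanism, with illustrative shapes and layer names (not the paper's exact architecture): a self-attention score matrix doubles as a weighted adjacency, followed by one round of message passing.

```python
# Attention-derived dynamic graph over K component embeddings
# (illustrative shapes and names, not the paper's exact code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicGraphLayer(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)    # query projection
        self.k = nn.Linear(d_model, d_model)    # key projection
        self.msg = nn.Linear(d_model, d_model)  # message transform

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, K, d_model), one embedding per decomposed component.
        scores = self.q(h) @ self.k(h).transpose(-2, -1)      # (batch, K, K)
        adj = F.softmax(scores / h.size(-1) ** 0.5, dim=-1)   # soft adjacency
        return h + adj @ self.msg(h)   # one message-passing step + residual

layer = DynamicGraphLayer(d_model=64)
out = layer(torch.randn(8, 9, 64))     # e.g. K=9 IMFs (Australian setting)
```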

Channel-Independent Transformer with EMA Attention

To process decomposed components efficiently, CDGT adopts a channel-independent Transformer architecture. Unlike standard Transformers that mix features across channels, this approach treats each component as an independent input, preventing irrelevant feature interactions.
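
In practice, channel independence is often implemented by folding the component axis into the batch axis before patching, as in PatchTST; the sketch below assumes that style, with illustrative shapes.

```python
# Channel-independent patching: fold the K components into the batch axis
# so each component is encoded on its own (illustrative shapes).
import torch

batch, K, seq_len, patch_len = 8, 9, 336, 16
x = torch.randn(batch, K, seq_len)             # K decomposed components

x = x.reshape(batch * K, seq_len)              # components become samples
patches = x.unfold(-1, patch_len, patch_len)   # non-overlapping patches
print(patches.shape)                           # torch.Size([72, 21, 16])
```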

Key modifications include:
• Patch-Based Processing: Input sequences are divided into patches to reduce computational complexity.

• Positional Encoding: Sine-cosine embeddings preserve temporal order.

• EMA Attention: Exponential moving average attention emphasizes recent observations, improving local dependency modeling.

The EMA mechanism is computed efficiently in the frequency domain using FFT, maintaining scalability for long sequences.
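
A sketch of how such an EMA can be applied via FFT in O(L log L); the decay kernel here is an assumed form, not necessarily the paper's exact parameterization.

```python
# Causal EMA applied via FFT in O(L log L) (assumed kernel form).
import torch

def ema_fft(x: torch.Tensor, alpha: float = 0.1) -> torch.Tensor:
    # x: (..., L). Kernel w[k] = alpha * (1 - alpha)**k decays with lag k,
    # so recent observations dominate the weighted average.
    L = x.size(-1)
    kernel = alpha * (1 - alpha) ** torch.arange(L, dtype=x.dtype)
    n = 2 * L                                  # zero-pad to avoid wrap-around
    X = torch.fft.rfft(x, n=n)
    W = torch.fft.rfft(kernel, n=n)
    return torch.fft.irfft(X * W, n=n)[..., :L]  # keep the causal prefix

smoothed = ema_fft(torch.randn(4, 96))         # batch of 4, length-96 series
```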

Loss Function and Training

The model employs a signal-decay-aware loss function that assigns higher weights to recent prediction errors, aligning with the intuition that near-future predictions are more reliable. This helps mitigate overfitting to noisy or irrelevant long-term patterns.
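
A sketch of one way to realize such a loss, assuming a geometric decay over the forecast horizon (the paper's exact weighting schedule is not specified here):

```python
# Horizon-weighted MSE: nearer steps get larger weights (assumed schedule).
import torch

def decay_weighted_mse(pred: torch.Tensor, target: torch.Tensor,
                       gamma: float = 0.9) -> torch.Tensor:
    # pred, target: (batch, horizon). Step h is weighted by gamma**h.
    horizon = pred.size(-1)
    w = gamma ** torch.arange(horizon, dtype=pred.dtype)
    w = w / w.sum()                            # normalize the weights
    return ((pred - target) ** 2 * w).sum(dim=-1).mean()
```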

Experiments and Results

Datasets and Setup

Experiments were conducted on two public datasets:

  1. Australian Electricity Load Data: Half-hourly recordings from 2009–2010, including weather variables.
  2. Moroccan Electricity Load Data: 10-minute recordings from 2017, with meteorological data.

The CDGT model was compared against state-of-the-art baselines, including Crossformer, ETSformer, FEDformer, PatchTST, LightTS, and SageFormer. Evaluation metrics included MAE, RMSE, and MAPE.
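
For reference, these metrics follow their standard definitions (MAPE assumes strictly nonzero targets):

```python
# Standard definitions of the reported metrics.
import numpy as np

def mae(y, yhat):
    return np.mean(np.abs(y - yhat))

def rmse(y, yhat):
    return np.sqrt(np.mean((y - yhat) ** 2))

def mape(y, yhat):
    return np.mean(np.abs((y - yhat) / y)) * 100   # percent; y must be nonzero
```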

Key Findings

  1. Decomposition Optimization: JSSAO-VMD achieved superior decomposition compared to CEEMDAN and standard VMD, with optimized parameters (K=9, α=2076 for Australia; K=6, α=1102 for Morocco).
  2. Ablation Studies: Removing any CDGT component (graph modeling, channel independence, EMA attention, or decay-aware loss) degraded performance, confirming their necessity.
  3. Efficiency: CDGT reduced training time by 81.15% (Australia) and 79.52% (Morocco) compared to sequential component processing.
  4. Forecasting Accuracy:
    • Australia: MAE of 0.377 GW (5.51–21.62% better than baselines).

    • Morocco: MAE of 1.075 MW (15.02–75.49% improvement).

The model also outperformed decomposition-based hybrid methods, achieving 16.04–31.08% lower MAE with significantly less computation.

Conclusion

The CDGT model introduces a novel framework for short-term load forecasting by integrating optimized signal decomposition, dynamic graph-based relationship learning, and channel-independent Transformer architectures. Key advantages include:
• Enhanced Decomposition: JSSAO-VMD adaptively tunes decomposition parameters for improved IMF quality.

• Relationship Awareness: GNNs capture complex interactions between components, avoiding information loss.

• Computational Efficiency: Simultaneous multi-component prediction reduces runtime versus sequential methods.

Future work may explore integrating weather variables more effectively and refining patch-level attention mechanisms. The proposed approach demonstrates significant potential for practical power system applications, balancing accuracy and efficiency.

doi.org/10.19734/j.issn.1001-3695.2024.07.0231
