Integrating Contrastive Learning for Dual-Branch Multivariate Time Series Anomaly Detection

Introduction

Multivariate time series anomaly detection plays a crucial role in maintaining the operational integrity of complex industrial systems. The ability to accurately identify anomalous patterns across numerous interconnected devices presents a significant challenge due to the dynamic dependencies between entities and the interference caused by anomalous data during reconstruction. Traditional approaches often struggle to capture these dynamic relationships effectively, leading to suboptimal performance in distinguishing between normal and abnormal patterns.

This paper introduces a novel approach called Integrating Contrastive Learning Dual-Branch Anomaly Detection (CLDAD), which leverages contrastive learning to enhance the distinction between normal and anomalous representations. The method combines graph structure learning with temporal dependency modeling to extract both spatial and temporal relationships in multivariate time series data. By incorporating multi-scale feature fusion and joint contrastive training, CLDAD significantly improves anomaly detection performance across various industrial datasets.

Background and Challenges

Multivariate Time Series Anomaly Detection

Multivariate time series data consists of observations collected from multiple sensors or variables over time. Detecting anomalies in such data is essential for applications like industrial monitoring, cybersecurity, and predictive maintenance. However, several challenges complicate this task:

  1. Dynamic Dependencies – Entities in industrial systems often exhibit time-varying relationships that are difficult to model.
  2. Anomaly Contamination – Training data may contain anomalies, leading models to inadvertently learn abnormal patterns.
  3. Temporal and Spatial Complexity – Time series data contains both sequential dependencies and inter-variable correlations, requiring models to capture both aspects effectively.

Existing Approaches and Limitations

Current anomaly detection methods can be broadly categorized into:

• Reconstruction-based models (e.g., autoencoders, GANs) – These attempt to reconstruct normal behavior and flag large deviations as anomalies (a minimal scoring sketch appears at the end of this subsection). However, they may fail when anomalies are present in the training data.

• Prediction-based models (e.g., LSTM, GNNs) – These forecast future values and detect anomalies based on prediction errors. While effective for temporal modeling, they may struggle with complex nonlinear patterns.

• Discriminative models – These rely on labeled data to learn the boundary between normal and abnormal instances. However, labeled anomalies are often scarce in real-world scenarios.

Additionally, graph neural networks (GNNs) have been applied to model entity dependencies but face challenges in constructing appropriate graph structures and balancing spatial and temporal feature extraction.
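
To make the reconstruction-based category concrete, here is a minimal scoring sketch (not CLDAD itself; the model sizes and the mean-squared-error score are illustrative assumptions): an autoencoder is fit to presumed-normal windows, and each test window is scored by how poorly it reconstructs.

```python
import torch
import torch.nn as nn

class WindowAutoencoder(nn.Module):
    """Toy autoencoder over flattened windows; sizes are illustrative."""
    def __init__(self, n_features: int, window: int, latent: int = 16):
        super().__init__()
        d = n_features * window
        self.encoder = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 64), nn.ReLU(), nn.Linear(64, d))

    def forward(self, x):                          # x: (batch, window, n_features)
        flat = x.flatten(start_dim=1)
        return self.decoder(self.encoder(flat)).view_as(x)

def anomaly_scores(model: WindowAutoencoder, x: torch.Tensor) -> torch.Tensor:
    """Mean squared reconstruction error per window; higher = more anomalous."""
    with torch.no_grad():
        recon = model(x)
    return ((recon - x) ** 2).mean(dim=(1, 2))
```

If anomalies contaminate the training set, the autoencoder learns to reconstruct them as well, which is precisely the failure mode noted above and one of the problems CLDAD's contrastive training is designed to mitigate.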

Methodology

Overview of CLDAD

CLDAD is an end-to-end contrastive learning framework designed to address the limitations of existing methods. The key innovations include:

  1. Dual-Branch Architecture – Separately captures spatial (entity) and temporal dependencies.
  2. Graph Structure Learning and Enhancement – Dynamically models inter-entity relationships and enhances graph features.
  3. Block Reassembly and Multi-Scale Fusion – Extracts features at different scales to improve robustness.
  4. Joint Contrastive Training – Amplifies the distinction between normal and abnormal patterns.

Spatial Relationship Extraction

The spatial branch focuses on learning dynamic dependencies between entities (e.g., sensors). Given a multivariate time series window, the method:

  1. Learns a Dynamic Graph – Uses self-attention to compute pairwise relationships between entities, constructing an adjacency matrix that evolves over time.
  2. Enhances Graph Features – Applies residual linear layers to amplify node features while preserving original graph characteristics.
  3. Performs Block Reassembly – Splits the learned graphs into smaller blocks and reassembles them to capture multi-scale spatial relationships.
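
The sketch below shows one plausible PyTorch realization of these three steps. The attention form, the residual enhancement layer, and the reading of block reassembly (split the entity axis into blocks, permute, and recombine) are assumptions for illustration, not the authors' exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialBranch(nn.Module):
    """Illustrative spatial branch: dynamic graph via self-attention,
    plus residual feature enhancement. Sizes are assumptions."""
    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.enhance = nn.Linear(d_model, d_model)

    def learn_graph(self, h):                      # h: (n_entities, d_model)
        # Scaled dot-product attention over entities gives a row-normalized
        # adjacency matrix that changes with each window.
        scores = self.q(h) @ self.k(h).T / h.size(-1) ** 0.5
        return F.softmax(scores, dim=-1)           # (n_entities, n_entities)

    def enhance_features(self, h):
        # Residual linear layer: amplify node features while keeping the
        # original signal intact.
        return h + F.relu(self.enhance(h))

def block_reassemble(x: torch.Tensor, block: int) -> torch.Tensor:
    """Split the entity axis into blocks and recombine them in a new order,
    exposing relationships at a coarser scale (one reading of 'reassembly')."""
    n = x.size(0) - x.size(0) % block              # drop the ragged remainder
    blocks = x[:n].reshape(-1, block, x.size(-1))
    perm = torch.randperm(blocks.size(0))
    return blocks[perm].reshape(n, x.size(-1))
```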

Temporal Relationship Extraction

The temporal branch employs Long Short-Term Memory (LSTM) networks to model sequential dependencies. The process involves:

  1. Encoding Time Series Windows – Each window is processed by an LSTM to generate hidden state representations that encapsulate temporal patterns.
  2. Block Reassembly for Multi-Scale Analysis – Similar to the spatial branch, temporal encodings are split and reassembled to analyze different time scales.
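
A corresponding sketch of the temporal branch, with the hidden size as an illustrative assumption:

```python
import torch.nn as nn

class TemporalBranch(nn.Module):
    """Illustrative temporal branch: an LSTM encodes each sliding window."""
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)

    def forward(self, x):                          # x: (batch, window, n_features)
        out, _ = self.lstm(x)
        # 'out' holds one hidden state per time step; these encodings can then
        # be split into blocks and reassembled, mirroring the spatial branch.
        return out                                 # (batch, window, hidden)
```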

Fusion and Contrastive Learning

The spatial and temporal features are fused using graph convolutional operations:

  1. Graph Convolutional Fusion – Combines spatial and temporal representations through two-layer graph convolutions, enhancing feature interactions.
  2. Joint Contrastive Training – Computes two loss terms:
    • Reconstruction Loss – Penalizes the discrepancy between the reassembled and the original fused representations.

    • Cross-Branch Contrastive Loss – Ensures consistency between spatial and temporal branches while amplifying differences between normal and abnormal patterns.

The final anomaly score is derived from the combined losses, with higher scores indicating a greater likelihood of anomalies.
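
One plausible reading of the fusion and training steps is sketched below: a two-layer graph convolution (using a simple A·X·W propagation rule as a stand-in) fuses the branch outputs, and the joint loss combines an MSE reconstruction term with an InfoNCE-style cross-branch term. The loss weighting alpha and temperature tau are assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNFusion(nn.Module):
    """Illustrative two-layer graph-convolutional fusion of the two branches."""
    def __init__(self, d_in: int, d_hidden: int, d_out: int):
        super().__init__()
        self.w1 = nn.Linear(d_in, d_hidden)
        self.w2 = nn.Linear(d_hidden, d_out)

    def forward(self, adj, h_spatial, h_temporal):
        h = torch.cat([h_spatial, h_temporal], dim=-1)   # fuse branch features
        h = F.relu(adj @ self.w1(h))                     # layer 1: propagate
        return adj @ self.w2(h)                          # layer 2: propagate

def joint_loss(z_orig, z_reassembled, z_spatial, z_temporal, alpha=0.5, tau=0.1):
    """Reconstruction term + InfoNCE-style cross-branch contrastive term."""
    recon = F.mse_loss(z_reassembled, z_orig)
    zs = F.normalize(z_spatial, dim=-1)
    zt = F.normalize(z_temporal, dim=-1)
    logits = zs @ zt.T / tau                             # pairwise similarities
    labels = torch.arange(zs.size(0), device=logits.device)
    contrast = F.cross_entropy(logits, labels)           # matched rows = positives
    return alpha * recon + (1 - alpha) * contrast
```

Under this reading, the per-window loss value doubles as the anomaly score: windows whose representations resist reconstruction and cross-branch agreement score higher.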

Experiments and Results

Datasets

CLDAD was evaluated on four public industrial datasets:

  1. SWaT – A water treatment plant dataset with sensor and actuator readings.
  2. WADI – A water distribution system dataset containing both normal and attack scenarios.
  3. SMAP – NASA’s soil moisture monitoring dataset with expert-labeled anomalies.
  4. MSL – Data from the Mars Science Laboratory rover, capturing sensor measurements.

These datasets vary in dimensionality, anomaly ratios, and complexity, providing a comprehensive testbed for evaluation.

Comparative Methods

CLDAD was compared against six state-of-the-art methods:

  1. OmniAnomaly – A reconstruction-based model using variational autoencoders.
  2. MAD-GAN – A GAN-based anomaly detection approach.
  3. USAD – An adversarial training model with dual decoders.
  4. GDN – A graph-based method leveraging entity embeddings.
  5. FuSAGNet – A hybrid model combining reconstruction and prediction.
  6. MTGFLOW – A density estimation model with dynamic graph learning.

Performance Metrics

The evaluation used three standard metrics:

• Precision (Prec) – The proportion of detected anomalies that are true anomalies.

• Recall (Rec) – The proportion of true anomalies correctly identified.

• F1-Score (F1) – The harmonic mean of precision and recall.
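
For reference, these are the standard definitions, with TP, FP, and FN the true-positive, false-positive, and false-negative counts:

```latex
\mathrm{Prec} = \frac{TP}{TP + FP}, \qquad
\mathrm{Rec}  = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \mathrm{Prec} \cdot \mathrm{Rec}}{\mathrm{Prec} + \mathrm{Rec}}
```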

Results

CLDAD achieved the highest F1-scores across all datasets:

• SWaT: 91.63%

• WADI: 90.60%

• SMAP: 90.06%

• MSL: 93.69%

These results represent an average improvement of 1.52 percentage points over the next best method (MTGFLOW). Key observations include:

  1. Superior Precision – CLDAD consistently achieved the highest precision, indicating fewer false positives.
  2. Robustness to Anomaly Contamination – Unlike reconstruction-based models, CLDAD’s contrastive learning framework minimized the impact of anomalies in training data.
  3. Effective Multi-Scale Learning – The block reassembly strategy improved feature extraction across different time scales.

Ablation Studies

To validate the contributions of key components, two ablation experiments were conducted:

  1. Removing Graph Enhancement (w/o GA) – The F1-score dropped by 2.2–3.57%, confirming the importance of residual feature amplification.
  2. Removing Contrastive Loss (w/o RCL) – Performance declined by 7.61–8.53%, highlighting the critical role of joint contrastive training.

Hyperparameter Sensitivity

Experiments on window size, graph enhancement layers, block size, and batch size revealed:

  1. Optimal Block Size – A block size of 4–6 yielded the best results.
  2. Stable Performance – The model was relatively insensitive to graph enhancement layers and window size variations.

Computational Efficiency

CLDAD demonstrated a favorable balance between runtime and memory usage:

• Runtime: 344 seconds (second only to GDN).

• GPU Memory: 10.02 GB (lower than FuSAGNet and MTGFLOW).

Visualization and Case Study

Graph Feature Enhancement

A comparison of adjacency matrix eigenvalues before and after graph enhancement showed increased feature separation, confirming the effectiveness of residual linear layers.

Anomaly Detection Visualization

On the MSL dataset, CLDAD successfully identified anomalies in sensor readings, with anomaly scores spiking during the ground-truth anomaly periods. This demonstrated the model's ability to distinguish subtle deviations from normal behavior.

Conclusion

CLDAD presents a robust and efficient framework for multivariate time series anomaly detection by integrating contrastive learning with dynamic graph modeling. The dual-branch architecture effectively captures both spatial and temporal dependencies, while block reassembly and joint contrastive training enhance feature discrimination. Experimental results on industrial datasets demonstrate significant improvements over existing methods, particularly in precision and F1-score.

Future work will focus on optimizing feature dimensionality reduction and exploring alternative fusion strategies to further enhance model performance.

DOI: https://doi.org/10.19734/j.issn.1001-3695.2024.07.0286
